LW - Towards a formalization of the agent structure problem by Alex Altair

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Towards a formalization of the agent structure problem, published by Alex Altair on April 30, 2024 on LessWrong.

In Clarifying the Agent-Like Structure Problem (2022), John Wentworth describes a hypothetical instance of what he calls a selection theorem. In Scott Garrabrant's words, the question is: does agent-like behavior imply agent-like architecture? That is, if we take some class of behaving things and apply a filter for agent-like behavior, do we end up selecting things with agent-like architecture (or structure)? Of course, this question is heavily under-specified. So another way to ask it is: under which conditions does agent-like behavior imply agent-like structure? And do those conditions feel like they formally encapsulate a naturally occurring condition?

For the Q1 2024 cohort of AI Safety Camp, I was a Research Lead for a team of six people, and we worked a few hours a week to better understand and make progress on this idea. The teammates[1] were Einar Urdshals, Tyler Tracy, Jasmina Nasufi, Mateusz Bagiński, Amaury Lorin, and Alfred Harwood. The AISC project duration was too short to find and prove a theorem version of the problem. Instead, we investigated questions like: What existing literature is related to this question? What are the implications of using different types of environment classes? What could "structure" mean, mathematically? What could "modular" mean? What could it mean, mathematically, for something to be a model of something else? What could a "planning module" look like, and how does it relate to "search"? Can the space of agent-like things be broken up into sub-types? What exactly is a "heuristic"? Other posts on our progress may come out later. For this post, I'd simply like to help concretize the problem that we wish to make progress on.

What are "agent behavior" and "agent structure"? When we say that something exhibits agent behavior, we mean that it seems to make the trajectory of the system go a certain way: instead of the "default" way the system might evolve over time, the presence of this agent-like thing makes it go some other way. The more specific a target it seems to hit, the more agentically we say it behaves. On LessWrong, the word "optimization" is often used for this type of system behavior. So that's the behavior we're gesturing toward.

Seeing this behavior, one might say that the thing seems to want something and tries to get it. It seems to somehow choose actions which steer the future toward the thing that it wants. If it does this across a wide range of environments, then it seems like it must be paying attention to what happens around it, using that information to infer how the world around it works, and using that model of the world to figure out which actions are more likely to lead to the outcomes it wants. This is a vague description of a type of structure; that is, it's a description of a type of process happening inside the agent-like thing. So, exactly when does the observation that something robustly optimizes imply that it has this kind of process going on inside it?
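To make the "narrow target" intuition a bit more concrete, here is a minimal toy sketch. It is my own illustrative assumption, not anything from the post: it compares how spread out the final states of a simple random-walk system are under its default dynamics versus under a hypothetical agent-like policy that steers toward a particular state.

```python
import random
from collections import Counter

def step_default(state):
    """Default dynamics: the system drifts randomly."""
    return state + random.choice([-1, 1])

def step_agentic(state, target=0):
    """Agent-like dynamics: each step moves the system toward a target state."""
    if state == target:
        return state
    return state + (1 if state < target else -1)

def final_states(step_fn, n_trials=10_000, horizon=30):
    """Roll the system forward many times and record where it ends up."""
    counts = Counter()
    for _ in range(n_trials):
        state = random.randint(-10, 10)
        for _ in range(horizon):
            state = step_fn(state)
        counts[state] += 1
    return counts

if __name__ == "__main__":
    # The agent-like dynamics concentrate the outcomes onto a single state,
    # while the default dynamics spread over dozens of states.
    print("distinct final states (default):", len(final_states(step_default)))
    print("distinct final states (agentic):", len(final_states(step_agentic)))
```

The point of the toy is only that the agent-like dynamics land the system in a far more specific set of final states than the default dynamics do, which is the sense in which the behavior looks more agentic.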
Our slightly more specific working hypothesis for agent-like structure consists of three parts: a world-model, a planning module, and a representation of the agent's values. The world-model works very roughly like Bayesian inference: it starts out ignorant about which world it's in, and updates as observations come in. The planning module somehow identifies candidate actions and then uses the world-model to predict their outcomes. The representation of the agent's values is used to select which outcome is preferred, and the agent then takes the corresponding action. This may sound to you like an algorithm for utility maximization. But a big part of the idea behind the agent structure problem is that there is a much l...
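As one rough illustration of that three-part decomposition, here is a hypothetical sketch of my own; the class name, the two-hypothesis example, and all parameters are assumptions for illustration, not the formalization the post is working toward. A world-model keeps a Bayesian posterior over candidate environments, a planning module predicts the outcome of each candidate action under that posterior, and a value function selects the preferred outcome. Written this way it is essentially a small expected-utility maximizer, which is exactly the resemblance the text flags.

```python
from typing import Callable, Dict, Hashable, Iterable

Action = Hashable
Observation = Hashable
Outcome = Hashable

class SketchAgent:
    """Illustrative world-model / planner / values decomposition (not the post's formalism)."""

    def __init__(self,
                 hypotheses: Dict[str, Callable[[Action], Outcome]],
                 likelihood: Callable[[str, Observation], float],
                 value: Callable[[Outcome], float]):
        # World-model: a distribution over hypotheses about which environment
        # the agent is in, starting out uniform (i.e. ignorant).
        self.posterior = {h: 1.0 / len(hypotheses) for h in hypotheses}
        self.hypotheses = hypotheses  # each hypothesis maps an action to a predicted outcome
        self.likelihood = likelihood  # P(observation | hypothesis)
        self.value = value            # representation of the agent's values

    def observe(self, obs: Observation) -> None:
        """World-model update: reweight hypotheses by how well they explain obs (Bayes)."""
        weighted = {h: p * self.likelihood(h, obs) for h, p in self.posterior.items()}
        total = sum(weighted.values()) or 1.0
        self.posterior = {h: w / total for h, w in weighted.items()}

    def act(self, candidate_actions: Iterable[Action]) -> Action:
        """Planning module: predict each action's outcome under the world-model,
        score the outcomes with the value representation, take the best action."""
        def expected_value(action: Action) -> float:
            return sum(p * self.value(self.hypotheses[h](action))
                       for h, p in self.posterior.items())
        return max(candidate_actions, key=expected_value)

if __name__ == "__main__":
    # Hypothetical two-lever example: the agent is unsure which lever pays out.
    agent = SketchAgent(
        hypotheses={
            "left_pays":  lambda a: 1 if a == "left" else 0,
            "right_pays": lambda a: 1 if a == "right" else 0,
        },
        likelihood=lambda h, obs: 0.9 if obs == h else 0.1,
        value=float,
    )
    agent.observe("right_pays")          # evidence favoring one hypothesis
    print(agent.act(["left", "right"]))  # -> "right"
```

The sketch only gives the three pieces concrete referents; the question the post is after is when observed behavior alone forces something like this internal structure to exist.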