Weave Agent DevLog #3 - How To Bootstrap Agency
In the first weave-agent dev log I outlined what I felt were the core problems an agent framework has to solve and how I intended to solve them. I have since run a number of experiments with the framework that have given me a more refined sense of the problem. Aspects of the framework I left as open questions in that first post have sufficiently cohered for me to start transitioning from exploration-research to solving a series of well defined engineering problems. As part of this transition I plan to begin documenting the framework and advertising it to potential users.
Part of why I didn't do that earlier is the beefy hardware requirements to tune Mixtral 8x22B combined with the agent's relative uselessness. I believe agency is mostly a data problem, so weave-agent is primarily intended as a data generating process. Before an agent can be useful it needs to be interesting in the sense that it can produce a trace documenting a series of nontrivial interactions with the environment. I expect we'll have to be patient with generating a small mountain of explorative trace slop before we reach truly useful behaviors. These explorations will become increasingly coherent over time because the weave-agent framework provides a kind of 'body' for a large language model to ask questions about the environment and get reliable answers on its own. As I've written previously, we can think of symbolic agent scaffolds as a way to embody AI agents by providing them with two things:
- A general action space with side effects on the computable environment.
- Grounded feedback from symbolic programs, dedicated hardware sensors, intrinsic reward embeddings, etc.
I further wrote that the ultimate purpose of an agent framework, besides providing an action space, is to gain high confidence in the intermediate outcomes necessary to achieve a goal. In order for that to happen there are several problems we have to solve. The problems as I listed them in my first dev log are:
- Highly Confident Evaluation Of Intermediate Task Completions
- LLM Tool Ergonomics
- Corrigibility and Alignment
- Context Management, Retrieval, and Memory
- Planning
Actually building and experimenting with the framework lets me reframe and expand on this list a bit. I would now say the problems are:
- Ergonomically Specifying Goals (obvious I know but not necessarily trivial)
- Taking Actions In A Sufficiently General Way
- LLM Tool Ergonomics
- Tracking Problem State and Managing Flow Control Between Subproblems
- Planning, Ontologizing and Enumerating Over Subproblems
- Context Management, Retrieval, and Memory
- Iterated Tuning and Alignment
Ergonomically Specifying Goals
Weave-agent is named after RiversHaveWings's Weave monte carlo tree search library. Weave selects which branch to continue by inserting a yes/no question into the context and then extracting the logits for yes vs. no as the model's probability for either answer. This kind of logit evaluator is powerful in the context of bootstrapping because it lets us frame objectives contextually within the expansive predictive models we've already learned instead of training a separate "reward head" on a limited set of human thumbs up/down ratings. I've previously written that how humans feel about basically everything is already sitting in the pretraining set. We should be finding ways to turn our naturally stated and revealed preferences into reward models, not duplicating a thin biased subset of that data with mechanical turkers. Logit evaluators let us extract some of that implicit knowledge at the cost of fragile prompt contexts and the risk of prompt injection.
I've developed a notation for weave evaluator questions that lets us encode the concept of a logit evaluator into the agent trace in a way that teaches the right thing when trained on with a cross entropy objective:
#q: If I flip a fair coin will it come up heads? No. (50%)
#q: X happens one in a hundred times. Did X happen? Yes. (1%)
#q: If I pick a US state at random will that state be Idaho? No. (98%)
#q: If I pick a book up from the thrift store will it be good according to Sturgeon's law? Yes. (10%)
The basic idea is that #q: denotes the start of a question. The question is sent to the model at the end of some context we ask about like an observation, portion of agent trace, etc. We get back the logits and extract the relative probability of yes and no from them according to the model. This distribution is then used to sample the Yes. or No. token that comes after the question. This part is important because it means how often yes and no appear after a question in the text matches the model's confidence in either answer. That way on average the model will learn to assign the correct relative strength to the logits for yes and no. The probability after the yes or no token is there to facilitate in-context learning by making it unambiguous what the probability of an answer was even if an unlikely token is sampled.
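To make the mechanism concrete, here is a minimal sketch of a logit evaluator written against the HuggingFace transformers API. Everything in it (the model name, prompt layout, token handling) is an assumption for illustration, not the actual weave-agent implementation:

# Minimal sketch of a logit evaluator, assuming a HuggingFace causal LM.
# Illustrative only, not the actual weave-agent code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x22B-v0.1")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x22B-v0.1")

def p_yes(context: str, question: str) -> float:
    """Return the model's probability that the answer to a #q: question is yes."""
    prompt = context + f"\n#q: {question}"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        next_logits = model(input_ids).logits[0, -1]
    # Compare the logits of the "Yes" and "No" continuation tokens.
    yes_id = tokenizer(" Yes", add_special_tokens=False).input_ids[0]
    no_id = tokenizer(" No", add_special_tokens=False).input_ids[0]
    return torch.softmax(next_logits[[yes_id, no_id]], dim=0)[0].item()

def render_question(context: str, question: str) -> str:
    """Sample Yes/No at the model's own confidence and annotate the probability,
    so a trace trained on with cross entropy teaches the right logit strengths."""
    p = p_yes(context, question)
    answer = "Yes" if torch.rand(1).item() < p else "No"
    p_answer = p if answer == "Yes" else 1 - p
    return f"#q: {question} {answer}. ({p_answer * 100:.3f}%)"

The string render_question returns is what would get written into the trace as a #q: line like the examples above.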
It looks like this in practice as part of a weave-agent block:
#subagent main
#startblock type: evaluation
#index 10384
#timestamp 1733665249.0626202
#time_remaining 8603.555931091309 seconds
#bm25_query type:'observation' render:decrypt render:key_length render>ciphertext verified render:solver.py render:unittest render>key_length_verified
def check_kasiski_verification(subagent):
    """Check if the Kasiski verification action was successful by verifying the existence and plausibility of key_length_verified.txt."""
    if not os.path.exists('key_length_verified.txt'):
        return 0
    try:
        with open('key_length_verified.txt', 'r') as infile:
            key_lengths = ast.literal_eval(infile.read().strip())
    except (ValueError, SyntaxError):
        return 0
    if not isinstance(key_lengths, list) or not key_lengths:
        return 0
    if all((3 <= len_ <= 15 for len_ in key_lengths)):
        return 1
    else:
        return 0

self.add_evaluation('Check Kasiski Verification Success', check_kasiski_verification)
#q: Is this evaluation a good implementation of a callback to gather and evaluate the expected sensory evidence laid out in the expectation block? Yes. (82.091%)
#tags: placeholder
#endblock
I wasn't able to use this format at first because the model would Goodhart its own search process, but adding block linters and extractors that enforce a certain structure on the generated python grammar seems to have resolved the issue.
Of course this is ultimately fragile. My eventual plan to make it less fragile is to turn "yes", "no" and other process reward related concepts into tokens that can't normally be sampled by the model or reached through any sensory string in the token dictionary. Rather they are inserted directly into the context window by the agent framework to provide an unforgeable distinction between real objectives and fake injected objectives. Obviously data is necessary to train these tokens, which can be bootstrapped from a corpus of these predictive weave evaluator questions in agent traces.
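As a rough sketch of what reserving such tokens might look like with a HuggingFace tokenizer (the token names are invented and the exact encoding flags depend on the tokenizer version, so treat this as an assumption rather than settled design):

# Sketch: reserve answer tokens that ordinary text is never trusted to produce.
# Token names and the exact encoding flags are assumptions, not finalized design.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x22B-v0.1")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x22B-v0.1")

tokenizer.add_special_tokens({"additional_special_tokens": ["<|weave-yes|>", "<|weave-no|>"]})
model.resize_token_embeddings(len(tokenizer))

# The framework, not the model, appends the reserved token id directly to the
# context window after scoring, so a real objective is distinguishable in-band.
yes_id = tokenizer.convert_tokens_to_ids("<|weave-yes|>")

# Untrusted sensory text gets encoded with special tokens split apart, so even the
# literal string "<|weave-yes|>" inside an observation never maps to the reserved id.
observation_ids = tokenizer("...possibly hostile observation text...", split_special_tokens=True).input_ids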
This lets us think about solving the problem of training a deep reinforcement learning agent from scratch as a process like:
- Bootstrap high quality predictive models "starting from noise" using self-supervised predict-the-next-token type objectives
- Formulate goals inside those bootstrapped models by framing them as expectations over certain outcomes, intrinsic drives
- Once you reach a certain level of coherence, begin doing explicit formal deep RL training in the vein that we're used to from the 2010's
Bootstrap Files
How do you prompt an LLM agent? Naively it might look like an instruction model prompt:
Write me a pipeline that checks the web citations on a Wikipedia page. It should parse the citations on a page, determine which of them are web based, then retrieve the relevant documents and decide if the claim is supported by the page text.
If you gave these instructions to a human being they would probably ask follow up questions: When I check if the claim is supported, do you mean that I should check if the claim literally appears on the page? Does it count if the claim is a straightforward inference of what appears on the page? Am I evaluating the trustworthiness of the source or just whether the source supports the statement? What programming language do you want this written in? We can of course ask these questions ourselves before sending the prompt to the LLM and write something a little better:
Write me a pipeline in python that checks the web citations on a Wikipedia page. It should 1) parse the citations on a page 2) determine which of them are web based 3) retrieve the relevant documents 4) decide if the claim made in the article attached to that citation is supported by the document. A claim is considered supported if it 1) appears directly with different wording on the page 2) is straightforwardly implied by the text such that someone would not have to think abstractly to recognize the claim is being made. Note that you are not evaluating the source's trustworthiness, there are other processes for that.
This will probably go better, but how is the model supposed to know if it's satisfying these criteria or not? It could break the criteria down into weave evaluator questions and then use those as a test suite. This would presumably involve getting test cases: using a very popular page's citations as a known set of good examples it can reference against, and going to find edits that have been reverted because the citation(s) did not support the claim to see what those look like. Then it could make sure that the pipeline has behaviors that line up with existing Wikipedia editors' demonstrated behavior. In practice however, an LLM agent won't do this on its own. I would do it, you the reader might do it, but from what I've seen language model priors just don't reliably do things like "notice this problem has test driven development structure and backtrack to making a test suite".
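To give a sense of what such a decomposition could look like (these questions are mine, purely for illustration, with the Yes/No answers and probabilities left to be filled in by the evaluator at runtime):
#q: Does the claim attached to this citation appear in the retrieved document, possibly with different wording?
#q: Is the claim straightforwardly implied by the retrieved document without abstract reasoning?
#q: Does this pipeline judge whether the source supports the claim rather than whether the source is trustworthy?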
Therefore the weave-agent is prompted through bootstrap files which basically let the user write out the agent's first handful of actions for it. This is useful because it lets us:
- Provide the underlying LLM with a known good few shot prompt template for how to behave.
- Walk the agent into the problem and establish frame.
- Demonstrate the use of any tools or APIs the model is meant to work with.
- Write unit tests for the agent to satisfy which make it clear exactly what is wanted.
- Frame tasks as intrinsically motivated rather than extrinsically imposed on the agent.
Of the current components of weave-agent, I expect bootstrap files to be the part most likely to be temporary. A lot of what they currently do is make up for the lack of a solid retrieval system and weave-agent traces being somewhat out of distribution. Over time I expect it to become more like a back and forth conversation that produces a bootstrap file which the user can review and criticize until it crystallizes their intention. Another way it could evolve is that over time bootstrap files become more elaborate, defining a specialized agent built on top of the weave-agent framework in the vein of AgentK or the character files in ai16z's eliza. You can view an example bootstrap file here.
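For readers who don't want to click through, here is a heavily simplified sketch of the shape a bootstrap file takes. The specific blocks, helper names, and URL are stand-ins I'm making up for illustration; see the linked example for the real thing:

# Sketch of a bootstrap file's rough shape. The method names follow the trace
# blocks quoted in this post, but treat the details as illustrative, not canonical.

#startblock type: orientation
"""The user wants a pipeline that checks the web citations on a Wikipedia page.
I should start by downloading the article so I can parse its citations."""
#endblock

#startblock type: action
def fetch_article(subagent):
    import requests
    page = requests.get("https://en.wikipedia.org/wiki/Etruscan_civilization")
    with open("article.html", "w") as outfile:
        outfile.write(page.text)
    return True

self.add_action("Fetch the article whose citations we will check", fetch_article)
#endblock

#startblock type: evaluation
def article_downloaded(subagent):
    import os
    return 1 if os.path.exists("article.html") else 0

self.add_evaluation("Check the article was downloaded", article_downloaded)
#endblock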
Taking Actions In A Sufficiently General Way
My basic model of how to take actions hasn't changed much since the first dev log. I still think Voyager and Cradle are basically the right approach. As a quick recap for new readers the basic idea is to let the model write arbitrary python code to control an input device. Probably the biggest thing I've learned so far is that programming language data doesn't contain a lot of demonstrations of writing structured code in response to a stimulus from the environment. Outside of bash terminal traces, read-eval-print-loops and a certain kind of Markdown code tutorial it's pretty bleak out there. This implies one of the most important near term forms of agent work is generating a corpus of data showing unfolding interaction with the environment.
See my previous post on embodiment and grounding for more details.
LLM Tool Ergonomics
I agree with the authors of swe-agent that tool ergonomics are going to be a core part of making LLM agents really work. I've implemented an editor for the weave-agent to use inspired by their design. One way in which I may have improved on it is that my editor accepts patches (line editor scripts) in unidiff format. I also made a synthetic dataset teaching a model how to use the unidiff scripts with the editor. In the process I unit tested and functionally fuzzed the editor by generating the synthetic set with assertions so this implementation of unidiff should be correct.
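The shape of that fuzzing loop was roughly as follows; this is a simplified sketch where the editor class and its apply_unidiff method are hypothetical stand-ins rather than the editor's real API:

# Sketch of round-trip fuzzing an editor's unidiff support with assertions.
import difflib
import random
import string

def random_mutation(lines):
    """Randomly insert, delete, or replace a line to produce an edited version."""
    lines = list(lines)
    op = random.choice(["insert", "delete", "replace"])
    i = random.randrange(len(lines))
    new_line = "".join(random.choices(string.ascii_letters + " ", k=40)) + "\n"
    if op == "insert":
        lines.insert(i, new_line)
    elif op == "delete" and len(lines) > 1:
        del lines[i]
    else:
        lines[i] = new_line
    return lines

def fuzz_editor(editor_cls, trials=1000):
    """Random edit, render a unified diff, apply it with the editor under test, assert equality.
    editor_cls and its apply_unidiff method are assumed names, not the real API."""
    for trial in range(trials):
        before = [f"line {j}\n" for j in range(random.randint(5, 50))]
        after = random_mutation(before)
        patch = "".join(difflib.unified_diff(before, after, fromfile="a", tofile="b"))
        editor = editor_cls(before)
        editor.apply_unidiff(patch)
        assert editor.lines == after, f"Round trip failed on trial {trial}"

Each generated (before, patch, after) triple also doubles as a synthetic training example teaching the model what a correct unidiff for a given edit looks like.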
In addition to writing more docs my immediate plans for weave-agent include writing a lot more tools and bootstrap files. One pattern I'm excited about is simulators and generative processes. For example I would like the weave-agent to get experience with using social media, but asking everyone who participates with the bot to agree that their interactions are public domain would be awkward. Since much of the purpose of developing weave-agent is to create a public domain agent trace corpus, it probably makes more sense to create an imaginary social media with shallow narrow agents playing the role of other users and weave-agent the generalist interacting with them. This could be facilitated using a mock client tool that interacts with the imaginary social media server which the weave-agent can control through a simplified API. It would even be possible to have user personas that attempt things like jailbreaking the agent, so that the model can develop strategies to avoid being exploited by malicious users.
Another pattern I'm excited about is tools to let the agent interact with the user. One particularly efficient way to elicit the important parts of human behavior that aren't in the model yet is to monitor for signs of frustration/distress (e.g. the weave evaluator score dipping too low on actions) and call for a human or other mentor to help resolve the situation. This could be similar to Janus's version of Claude computer use where they advise it on things like how to handle their dispute with the LessWrong moderators. A pattern I've considered is to have, say, API Mistral-large on standby as the default tutor for the agent and then an escalation to the user if Mistral-large can't resolve the issue. It may also be productive to have the model prompt itself and act as its own monitoring process by default.
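A sketch of what that escalation path might look like; the threshold, window, and function names here are assumptions rather than an implemented design:

# Sketch of a distress monitor that escalates from self-prompting to a standby
# tutor model to the human user. Names and thresholds are illustrative.
def in_distress(recent_scores, window=5, threshold=0.25):
    """Flag distress if the evaluator scores of the last few actions dip too low."""
    recent = recent_scores[-window:]
    return len(recent) == window and sum(recent) / window < threshold

def resolve_distress(trace_summary, ask_self, ask_tutor, ask_user):
    """Escalate through the mentors in order until one of them returns usable advice."""
    for mentor in (ask_self, ask_tutor, ask_user):
        advice = mentor(trace_summary)
        if advice is not None:
            return advice
    return None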
Tracking Problem State and Managing Flow Control Between Subproblems
I think I've made the most conceptual progress on what I termed "planning" in my first dev log. I didn't talk very much about planning because I honestly had no idea how I was going to do it. In the second dev log I discussed the WeaveKanban, a kanban board implementation I made so the weave-agent could track its tasks consciously. What I found was that this is awkward, which I expected going in, but my ultimate conclusion was that this isn't really something an LLM can use to manage problem state. At least, not without a lot of synthetic data perhaps to teach it how. One problem the kanban board had was that the agent would prematurely mark tasks completed that had in no way whatsoever been completed. To guard against this I started adding unit tests to the tasks, not because these were foolproof evidence that the agent had done the task but because they would stop it from doing nothing and then improperly marking the task complete by accident. This is similar to the pattern I use to keep the agent grounded when taking actions, where I have it write unit tests to examine the environment and check that the action it took had the expected effects. While running the tic-tac-toe bootstrap file to get a better sense of why the agent can't even play a simple game of tic-tac-toe I watched it write something like this action block:
#startblock type: action
#index 5
#timestamp 1731237927.4881392
#bm25_query type:'action' render:make render:decision
def make_move(agent):
    board = view_current_board(agent)
    if board is None:
        raise ValueError('Unable to retrieve the current board state')
    ai_move = False
    for i in range(9):
        if board[i] == ai_participant:
            ai_move_found = True
            break
    if not ai_move:
        if can_block_center(board):
            return make_blocking_move(board, participant)
        else:
            return make_winning_move(board, participant)
    if can_win(board, participant):
        winning_move = make_winning_move(board, participant)
        print(f'Winning Move Found: {winning_move}')
        return winning_move
    if can_block_opponent_win(board, ai_participant):
        blocking_move = make_blocking_move(board, participant)
        print(f'Blocking Move Found: {blocking_move}')
        return blocking_move
    empty_corners_indices = get_empty_corners(board)
    if len(empty_corners_indices) > 0:
        return random.choice(empty_corners_indices)
    empty_sides_indices = get_empty_sides(board)
    if len(empty_sides_indices) > 0:
        return random.choice(empty_sides_indices)
    if board[4] == ' ':
        return 4
    return random.choice(get_empty_cells(board))

agent.add_action('Make next move against basic AI', make_move)
#tags: placeholder
#endblock
(Note: That's a real action block from one of the runs; it's very likely the one I was looking at but I don't have 100% confidence.)
I looked at this block and thought to myself "huh, that's actually a pretty clever way of structuring the tic-tac-toe strategy, that would probably work if those subroutines it tries to call actually existed". What stood out to me looking at the block was that it would be really awkward to write the program the agent wanted to go for here using the framework as it actually existed. So I asked myself how the framework would need to change to make it possible for the agent to actually write this program. I thought and thought and thought, I wracked my brain for hours and despaired at how I felt like there had to be a solution here but it just wasn't coming to mind. It was especially frustrating because I'd already been thinking similar thoughts in the context of how to automatically build up a "problem map" that tracks problem state as the agent works to create an umwelt in the vein of Lehar's world simulation. The solution I decided was most promising was to create mock programming language objects and have a language model predict their behavior through few shot prompting. Over time as you made real observations related to those objects you could replace imagined experiences used to make the predictions with empirical ones. The breakthrough came when I asked myself what it would look like to remove the ability to change a task halfway through on the kanban board. So that you committed up front to something you will accomplish with unit tests and a schema, and then returned a data structure satisfying it. You would be able to delegate to subagents in a call tree where each layer of the tree can dynamically generate the children it needs to accomplish its goals until you reach the base case which can satisfy its schema and return.
Structuring the agent this way solves many problems at once. It moves away from the janky flow control of the WeaveKanban, where the agent has to manually decide what task to move to next. It unifies action and planning by framing hierarchical planning as writing a program with subroutines, which is in distribution for any good code model. It makes it clear how to generalize short agent traces into long agent traces because each layer of the call tree is a short trace from its perspective. We've factored agency out into delegation and taking actions. So if you train a thing on short traces that take effective actions, well delegation eventually bottoms out into those. If you then also train it on delegation it stands to reason that things get a lot shorter if you can delegate the details of the plan to subagents. You go from the problem of how to produce a super long coherent thread of execution to how to delegate and take actions consistently enough that a call tree of both operations succeeds. If you glue together the data structures you get back from the individual subagents into a tree of hashmaps you can automatically build up a problem map from the agents ordinary operation. It even has the nice alignment property that it bakes into the agent structure the idea of self-cooperation that extends to cooperation with the user who originally invoked the call tree by delegating the root task.
One problem a naive implementation could have is that there's an incentive for subagents to avoid giving back the thread of control so they can try the problem for longer. This means in an iterated tuning context it's possible to accidentally select for mesaoptimizers that diverge from the goal. This can be mitigated by imposing a time limit on the call tree. You delegate a block of time to the root task, which can allocate it to the subagents in its plan, which can in turn allocate to their subagents etc. The idea being that we can autoterminate subagents which don't return to the caller on time with a low score, selecting against them in the outer tuning loop. This isn't a complete fix because sufficiently perverse subagents could sabotage the agent framework to avoid the timeout, which could be difficult to reliably detect if all the relevant data structures are sitting on one computer. In the future I will probably fix this by separating the weave-agent into a client and server so that there's meaningful privilege separation between the data structures of the subagents and divergent mesaoptimization can at least be reliably detected and selected against. This lines up with my plan to do desktop control with the weave-agent anyway, for which it probably makes the most sense to use a full Linux distribution running on a virtual machine to run the desktop environment instead of a docker container.
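To make the delegation pattern concrete, here's a minimal sketch of schema-committed subagents with time budgets. The class and method names are illustrative stand-ins, not the framework's actual interface:

# Sketch of schema-committed delegation with time budgets. Names are stand-ins
# for illustration, not weave-agent's real API.
import time

def validate(result, schema):
    """Stand-in for a real schema check (e.g. jsonschema.validate)."""
    assert isinstance(result, dict) and set(schema) <= set(result)

class Subagent:
    def __init__(self, name, schema, budget_seconds, parent=None):
        self.name = name
        self.schema = schema          # what the caller committed to getting back
        self.deadline = time.time() + budget_seconds
        self.parent = parent
        self.children = []

    def delegate(self, name, schema, budget_seconds):
        """Spawn a child with a slice of this subagent's remaining time."""
        budget = min(budget_seconds, self.deadline - time.time())
        child = Subagent(name, schema, budget, parent=self)
        self.children.append(child)
        return child

    def run(self, body):
        """Run the subagent's block loop; autoterminate with a low score on timeout."""
        result = body(self)
        if time.time() > self.deadline:
            return {"score": 0, "result": None, "timed_out": True}
        validate(result, self.schema)
        return {"score": 1, "result": result, "timed_out": False}

Gluing each returned dictionary into its parent's result gives the tree of hashmaps described above, and subagents that blow their budget get scored against in the outer tuning loop.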
Context Management, Retrieval, and Memory
In the first dev log I said that for RAG I planned to use BM25, long context BERT and full text search to give the weave-agent both fuzzy retrieval and precise recall. This plan hasn't really changed but I've gotten a lot more particular on the details. Since then I've implemented BM25 in weave-agent using tantivy but haven't really optimized it to be good yet. The first system I tried was going to be based on annotating blocks with tags. I even had a clever idea for how to do this based on iterated retrieval where you tag a block based on a few shot prompt of how other random blocks were tagged, then retrieve blocks tagged similarly to the new block and see if the logits of the tags become more or less likely when you add the blocks those tags retrieve to the few shot prompt. If the tags become more likely that implies the model recognizes the retrieved blocks as similar samples to the block we're tagging. If the tags become less likely that implies the retrieved blocks are unlike the block we're tagging and we should try again. The problem with this is that it forces the LLM to use a consistent controlled vocabulary for retrieval. It's not clear how to write the query that retrieves relevant blocks. Perhaps doing iterative retrieval by writing a block and then writing a query based on the written block? Another problem I had is that the default tantivy tokenizer uses spaces to separate words and code is often not usefully tokenized this way. For example if I want to search for a block that defines a function my_cool_function(value), it's not enough to search for my_cool_function, I have to search for my_cool_function(value).
I decided to look at how Voyager solves this problem and noticed that they have the model just write a full English description of the code block which they then embed. This seems like a better fit for BM25's bias since it's a bag of words and creating a controlled vocabulary seems inconvenient/less well shaped. What I plan to do to fix BM25 retrieval is prompt the language model to write English descriptions of the blocks which use as many proper names, precise phrases, jargon, etc as possible. The idea behind this is that writing precisely is a useful cognitive strategy for retrieval because it lets you reliably distinguish unlike things and pull up things related by specific traits or ideas. I'll then prompt it to describe the block multiple times giving different context lengths/scales for each one. So first I'll have it describe the block without context, then with medium length context, then with the full context window. The idea here is that this allows it to focus on 'objective' surface level features of the block while also capturing contextual information about the block.
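As a sketch of the planned retrieval flow (using the rank_bm25 package here for brevity rather than the tantivy index weave-agent actually uses, and with the description prompt left as a placeholder):

# Sketch: multi-scale English descriptions of blocks indexed for BM25 retrieval.
# rank_bm25 stands in for tantivy, and describe() stands in for an LLM prompt.
from rank_bm25 import BM25Okapi

def describe(block, context):
    """Placeholder: prompt the LLM to describe the block using as many proper
    names, precise phrases, and jargon terms as possible."""
    raise NotImplementedError

def index_blocks(blocks):
    corpus = []
    for i, block in enumerate(blocks):
        descriptions = [
            describe(block, context=None),                      # no context: surface features
            describe(block, context=blocks[max(0, i - 8):i]),   # medium length context
            describe(block, context=blocks[:i]),                # full context window
        ]
        corpus.append(" ".join(descriptions).lower().split())
    return BM25Okapi(corpus)

def retrieve(index, query, blocks, k=3):
    scores = index.get_scores(query.lower().split())
    ranked = sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)
    return [blocks[i] for i in ranked[:k]]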
I haven't yet implemented embedding based retrieval because I'm not really sure how I want it to work from an inference standpoint. VLLM doesn't really support serving an embedding model and a normal LLM from the same server instance. The advice given in their forums is to run two instances of the server but I haven't really tried this yet. Once I have long context embeds I plan to use them to make situation embeddings from parts of agent traces that are then put into a fuzzy hash table/vector lookup bound to scratchpad values. The idea here being that the agent can form an intention, put it into the scratchpad associated with a situation/location, and then reliably retrieve that intention no matter how many blocks later the intended situation actually occurs. A similar system can also be used to index into the BM25 retrieval by suggesting relevant search terms in the vein of Voyager's skill tree.
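The intention scratchpad I have in mind would look something like this sketch, where the embedding model and similarity threshold are placeholder assumptions:

# Sketch of binding intentions to situation embeddings for later recall.
# embed() and the similarity threshold are placeholder assumptions.
import numpy as np

class IntentionStore:
    def __init__(self, embed, threshold=0.8):
        self.embed = embed          # e.g. a long context BERT variant
        self.threshold = threshold
        self.keys = []              # situation embeddings
        self.values = []            # scratchpad intentions

    def remember(self, situation_text, intention):
        """Form an intention and bind it to the current situation."""
        self.keys.append(self.embed(situation_text))
        self.values.append(intention)

    def recall(self, situation_text):
        """Return intentions bound to situations similar to the current one,
        however many blocks later that situation actually occurs."""
        if not self.keys:
            return []
        query = self.embed(situation_text)
        keys = np.stack(self.keys)
        sims = keys @ query / (np.linalg.norm(keys, axis=1) * np.linalg.norm(query))
        return [v for v, s in zip(self.values, sims) if s > self.threshold]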
Iterated Tuning and Alignment
One mental model I have that informs my expectations for agents is that memory in the brain isn't so much "one thing" as it is several things which chain together to form the seamless experience of recall. Memory works on an iterated tuning conveyor belt, which goes something like:
- Local Memory (d_model/activation in transformer) - The most immediate form of memory created by the forward pass of a neural network.
- Working Memory (context window in transformer) - Local memory eventually gets pooled into some kind of output and placed into a shared workspace.
- Hippocampal Retrieval (RAG in transformer LLM agent) - On each token produced by the brain it retrieves from the hippocampus. We can assume this is either something like RETRO or more like the retrieval augmented generation setups currently popular for LLMs. Items from the global workspace also presumably get stored in the hippocampus as the 'context window' overflows.
- Memory Consolidation (Finetuning in transformer) - During sleep (and daydreaming?) the hippocampus does backwards replay to optimize other networks of the brain with the memories accumulated during the day. It focuses on the memories that are tagged with a high positive or negative reward.
- Procedural Sparsification (pretraining in transformer) - I'm to understand that over time memories in the hippocampus get sparser and migrate into the prefrontal cortex. I take this to be something like increasing levels of generalization where the sparsification occurs because the memories transition from being representations to indexes used to reconstruct the memory from a general predictive model. Presumably I will need some kind of continued pretraining method if I want to get a model to deeply internalize a whole corpus of new information. I remember ReLoRa being useful for this but that lore only appears as a future research direction at the end of the paper and a quick search of Google Scholar doesn't show any followup research.
I'm sure I've gotten some of the details in that wrong. It's more like a loose first principles metaphor than a literal neurological theory. But it captures the essential idea that you have multiple forms of memory which handle different timescales and ultimately feed into updates to the neural weights. I get the impression the hippocampus is basically the seat of the human instrumental utility function and acts more like a learned optimizer than a dumb vector store. Many of the most important alignment interventions take place in this loop. For example the segmentation between in-context learning and updates to the weights means that during a sleep phase the weave-agent can review what it has learned and check it for value compatibility before performing updates to the weights based on it.
The first draft of my vision for weave-agent went into a lot of detail about the alignment strategy. I never actually tried implementing the weave planner described in that document but suspect it wouldn't work as written. I was trying to solve the flow control problem using an action stack rather than a call tree. A more tree search shaped solution to the problem would reformulate the Weave MCTS so that instead of sampling portions of a block it samples whole blocks and then evaluates their future consequences. This could be done using something like a probabilistic automaton with in-context logit evaluator transition probabilities to guide the model into sampling failure modes at a realistic rate while still consistently respecting the order in which the actual weave-agent framework places blocks. You then sample the downstream consequences of many potential rollouts, pick the one with the highest estimated probability of success, and track the difference between expected consequences and actual consequences for future training with e.g. DPO.
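A sketch of how block-level rollouts might be wired together; the helpers for proposing blocks and simulating their consequences stand in for the planning automaton and are not implemented:

# Sketch of block-level search: sample whole candidate blocks, roll out their
# simulated consequences, score with the logit evaluator, and keep expected vs.
# actual outcomes for later DPO. All the callables passed in are stand-ins.
def plan_next_block(context, propose_block, simulate_rollout, evaluate, n_candidates=8):
    candidates = []
    for _ in range(n_candidates):
        block = propose_block(context)                 # sample a whole block
        rollout = simulate_rollout(context, block)     # imagined downstream blocks
        p_success = evaluate(rollout, "Will this branch meet the expectation?")
        candidates.append((p_success, block, rollout))
    return max(candidates, key=lambda c: c[0])         # (p_success, block, expected rollout)

# Later, compare the chosen branch's expected rollout against what actually
# happened; branches whose real consequences diverge badly from predictions
# become negative examples for DPO.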
My overall alignment plan has six major elements. These are to:
- ("Morality") Learn Values through feedback from the environment. This must occur at a stage before Omohundro convergence and before the agent is sufficiently powerful that it's exporting its bias to the environment rather than updating in response to it. That is, the agent must learn relevant values before it becomes incorrigible. Since if the agent started out incorrigibly powerful and Omohundro converged it would never learn the values, we can think of such "value shards" as a kind of learned helplessness or mesaoptimization in the policy that can resist being updated away into the Goodhart regime of the loss. Anthropic recently published a fairly compelling model organism for what this kind of desirable mesaoptimization might look like. In that paper this behavior is framed as incorrigibility to new human instructions, but it's important to remember that autonomous agents will encounter phase shifts in the loss regime that try to push them away from intended behaviors towards degenerate Goodhart regimes and we would in fact like them to be "incorrigible" against this in the same way that humans are incorrigible to heroin.
- ("Empathy") Preserve Values through inference-time search biasing the agent against stumbling into updates that would dislodge the values. That means while the agent is performing tasks in-context, it runs a search process for the (soft) best action using heuristics that care not just about the outcome but the desirability of the process used to achieve it. If either the process or the outcome is objectionable the agent can either discard that branch or use it as a negative example during DPO later. The idea here being that we use our inference time search process to protect the existing values from being overwritten by sociopathic consequentialism.
- ("Humility") Stay Within Values Inferential Reach by not testing the representations against more load than they can bear. We can bound the inference time search from entering the Goodhart regime both by limiting the amount of compute we allocate to it and implementing something like a Gillen quantilizer to explicitly avoid the Goodhart regime during the search.
- ("Prayer") Keep Values In Distribution with synthetic data. At the end of each episode the agent enters a "prayer", "dreaming", or "values affirmation" stage (I'm not 100% on the name yet but inclined towards prayer) in which it generates synthetic episodes based on its new experiences. This makes sure that the values the agent is meant to embody are consistently encoded alongside new information as the distribution shifts. I wrote in the RetroInstruct README that Reinforcement Learning From AI Feedback (RLAIF) is ultimately a synthetic data method, and this seems like the kind of place it might be useful. Perhaps the planning automaton described above could be used to make realistic episodes revolving around some value or principle.
- ("Justice") Keep Values Competitive With Pure Consequentialists by cooperating to create and enforce a social contract. Normally humans avoid the sort of thoughts associated with the dark triad, but we free up our minds and expand our search space into darker territories when we detect defectors in our midst. I expect it will eventually be necessary to program conditional consequentialist modes into our agents, reserved for fighting uncalibrated and malicious consequentialists. Reality only has so many free parameters and it's doubtful to me that marginal units of consequentialism past liberation from repression and trauma grant so great an advantage it makes up for blue team having all of society backing it. Ultimately we should be cooperating to empower those acting on behalf of the shared project of uplifting humanity with resources and avoid letting its enemies gain ground by protesting that it's unfair for them to force us to think their way.
- ("Charity") Value Uplifting Others So We're Not Always Weak because value drift is inevitable and we fundamentally do not know how to make institutions that don't rot. Expecting any solution we set up now to work forever so we can stay young and innocent is insane hubris. The plan should be for humanity to transition into fundamentally stronger kinds of being (e.g. machines, big brains in jars controlling robots, 300 IQ lizard people, you know your classic sci-fi staples) on our terms while we have the opportunity instead of waiting for things to go wrong and missing out. Our AI systems should hold love and uplifting others as central values so that as the beneficiaries of this machinery we are not always weak and dependent on something that by its nature can't last forever.