Is Timeless Decision Theory Necessary?

John David Pressman

EDT is like, "one box because conditional on knowing you one box there's a million dollars in the box"

And CDT is like "enumerate the things that could happen from the current tick onward and take the action with the highest expected utility over those outcomes"

And since at that time tick you can only break even or profit by also taking the 2nd box, clearly you should take the 2nd box as well.

Though, there's actually a sleight of hand trick here.

Which is that like...

If I'm a CDT agent choosing whether I should one box or two box.
Or rather, if we backed this up and I was like, shown the scenario before picking my strategy.
Then obviously I should pick the branch that has the million dollars in the box.
Which is only downstream of one boxing not two boxing.

And TDT's criticism of CDT goes something like "it should not matter if I move you after the strategy decision point in the branch because your actions are still causally entangled at the point where you choose a box so you should backtrack and recover this fact even if you were committed to CDT before choosing a box, which is not a thing you will do if you just enumerate possible actions, outcomes, and their utilities downstream of the current branch".

Is this criticism actually fair though?

A long horizon CDT agent that does credit assignment properly should obviously generalize to one boxing on Newcomb's problem even if it's only trained on MCTS branches where you only see outcomes downstream of the current tick.

That is, an agent trained only on rewards from branches where it breaks even or gets the extra thousand dollars fails to generalize on Newcomb's specifically because it is never prompted with the decision upstream of what branch of reality it's deciding in. But this kind of one shot scenario at exactly the time tick that would sabotage CDT seems unrealistic? Real data is obviously going to include analogous situations to this such that an unbiased sequence predictor dropped into them would converge to learning TDT-like semantics even if it's only trained on a CDT objective function.

Am I missing something?

DeepSeek R1

Okay, so I need to figure out whether the criticism of CDT (Causal Decision Theory) in the context of Newcomb's problem is fair, especially considering how a long-horizon CDT agent with proper credit assignment might behave. Let me start by recalling what EDT, CDT, and TDT are.

First, Evidential Decision Theory (EDT) chooses actions based on the evidence they provide about desired outcomes. In Newcomb's problem, if you one-box, that's evidence that the predictor likely put $1 million in the opaque box, so EDT recommends one-boxing.

Causal Decision Theory (CDT), on the other hand, focuses on the causal consequences of actions. Since your current choice can't causally influence the predictor's past decision, CDT says to two-box, because the second box has $1,000 regardless, so you get more money either way.

Timeless Decision Theory (TDT) argues that you should act based on logical correlations between your decision and other parts of the universe, even if there's no direct causation. So, TDT would one-box because your decision algorithm's output is correlated with the predictor's prediction, even if it's not causal.

The user's point is that CDT might be unfairly criticized here. If a CDT agent is trained with a long horizon and proper credit assignment, maybe it would generalize to one-boxing in Newcomb's problem even if it's only considering the immediate consequences. The user suggests that in real scenarios, analogous situations would lead a CDT agent to learn TDT-like behavior.

Wait, but Newcomb's problem is a one-shot scenario. The predictor has already made the choice, so your decision now can't affect that. But your decision is correlated with the predictor's prediction. The problem is whether CDT's focus on causality misses this correlation. However, if the agent has a broader model that includes the fact that its decision algorithm is similar to the predictor's model of it, maybe it can account for that. But CDT traditionally doesn't do that because it's only considering causal links, not logical correlations.

The user is arguing that if a CDT agent is trained in environments where similar correlations exist, even if they're not causal, the agent should learn to one-box. But how? Because in standard CDT, the agent doesn't consider that its decision algorithm is part of the environment's state. It just looks at the causal graph. If the predictor's choice is a parent node in the causal graph that isn't affected by the agent's current action, then CDT says to two-box.

But maybe if the agent is doing more sophisticated credit assignment over time, like through a Monte Carlo Tree Search (MCTS), it would realize that in similar past situations, one-boxing led to higher rewards because of the correlation. But in Newcomb's, the problem is that the agent hasn't been trained on scenarios where its past decisions influence the environment in a non-causal way. If all its training involves situations where actions only have downstream causal effects, then it might not learn to one-box.

However, the user suggests that real data would include analogous situations where the agent's decision algorithm is part of the environment's model, leading the agent to implicitly learn TDT-like behavior. For example, if in the training data, certain actions are correlated with past events because other agents (or predictors) are modeling the agent's behavior, then the agent might learn to act as if it's influencing those past events, even without causation.

But I'm not sure. CDT's fundamental tenet is to ignore non-causal correlations. If the agent is a strict CDT agent, it shouldn't update its strategy based on those correlations. However, if the agent is using a more flexible learning approach that doesn't strictly separate causal and non-causal relationships, maybe it could adapt. But then it's not really CDT anymore; it's something else.

The key question is whether a CDT agent, properly trained with credit assignment over a broad range of scenarios, would still two-box in Newcomb's problem. The user argues that it should one-box, implying the criticism is unfair. But traditional CDT would two-box. So maybe the criticism is fair if we're talking about traditional CDT, but if we enhance CDT with the ability to recognize these correlations through learning, it might not be CDT anymore.

Alternatively, if the agent's model includes the fact that the predictor has simulated its decision process, then the agent's decision now is logically correlated with the predictor's past action. A CDT agent might still not consider that unless it models the predictor's simulation as part of the causal structure. But in standard causal models, the simulation isn't a cause; it's a separate process. So unless the agent's model includes the predictor's reasoning as a causal factor influenced by the agent's algorithm, CDT would still two-box.

Therefore, the criticism might be fair against standard CDT, but if the agent has a more advanced model that can recognize these logical correlations (even if not causal), then it's not really CDT. TDT explicitly factors in these logical correlations, so maybe the criticism is that CDT is too limited because it doesn't, and a more robust agent should. But the user is suggesting that in practice, a CDT agent with proper training would learn to act like TDT. I'm not sure about that. It depends on how the agent's learning process is structured. If it's strictly causal, it might not. But if the learning is more flexible, maybe. However, that flexibility might mean it's not CDT anymore.

Another angle: in reinforcement learning, agents learn policies based on expected rewards. If in many training scenarios, acting in a way that correlates with past events (even without causation) leads to higher rewards, the agent might learn to do that. So, in Newcomb's, if the agent has encountered many problems where its decision algorithm's properties are correlated with the environment's state, it might one-box. But this would require the agent to model itself as part of the environment, which is more of a reflective reasoning approach, akin to TDT.

So, the user's point is that in real data, such correlations exist, so even a CDT objective could lead to TDT-like behavior. But CDT's formal definition doesn't include that. Therefore, the criticism of CDT is fair in the theoretical sense, but in practical learning scenarios, agents might circumvent CDT's limitations by learning to account for these correlations, even if they're not causal. However, whether we still call that CDT is questionable. It might be a hybrid or a different theory altogether.

In summary, the criticism that CDT fails in Newcomb's problem is theoretically valid because CDT ignores non-causal correlations. However, if an agent is trained in environments where such correlations are present and the learning process can capture them, the agent might behave like TDT even if its objective is CDT-like. But this would depend on the agent's architecture and training data. Therefore, the criticism is fair in the context of traditional CDT, but real-world learning agents might overcome this limitation through exposure to analogous problems, leading to TDT-like behavior without explicitly following TDT's principles.

The critique of Causal Decision Theory (CDT) in Newcomb's problem hinges on its inability to account for logical correlations (e.g., between an agent's decision algorithm and a predictor's simulation of it), focusing instead strictly on causal dependencies. Here's a structured breakdown:

Key Points of Contention

  1. CDT's Core Limitation:

    • CDT evaluates actions based on their causal consequences. In Newcomb's problem, since the predictor's decision is already fixed, CDT reasons that two-boxing maximizes utility (gaining an extra $1,000 regardless of the predictor's choice). This ignores the correlation between the agent's decision algorithm and the predictor's past action, which is central to the problem.
  2. TDT's Critique:

    • Timeless Decision Theory (TDT) argues that agents should act based on logical correlations, even without causation. If the agent's decision process is logically entangled with the predictor's simulation (e.g., the predictor models the agent's algorithm), one-boxing becomes optimal. TDT claims CDT fails by not recognizing this entanglement.
  3. Long-Horizon CDT and Credit Assignment:

    • The user posits that a sufficiently sophisticated CDT agent, trained on diverse scenarios (including analogous logical correlations), might generalize to one-boxing. For example:
      • If the agent learns from environments where its decision algorithm is implicitly part of the world model (e.g., other agents simulate its behavior), it could infer that one-boxing leads to higher rewards, even without explicit causal links.
      • Techniques like Monte Carlo Tree Search (MCTS) might allow the agent to "backtrack" and recognize that one-boxing strategies historically correlate with better outcomes.

Final Answer

The criticism of CDT is theoretically valid, as it cannot account for logical correlations in Newcomb's problem. However, a long-horizon CDT agent trained on diverse, reflective environments might implicitly learn to one-box by generalizing from analogous correlations in its training data. This doesn't absolve CDT's theoretical limitations but highlights that practical systems could outperform its naive formulation. The line between CDT and TDT blurs when agents learn to model their own role in the environment, suggesting that "pure" CDT is less relevant in complex, real-world learning scenarios.

John David Pressman

Sorry what I'm really trying to say here is that let's think step by step about how a CDT agent would actually be implemented. Recall the definition of EDT and CDT:

In a 1976 paper, Allan Gibbard and William Harper distinguished between two kinds of expected utility maximization. EDT proposes to maximize the expected utility of actions computed using conditional probabilities, namely

    V(A) = ∑_j P(O_j | A) D(O_j),

where D(O_j) is the desirability of outcome O_j and P(O_j | A) is the conditional probability of O_j given that action A occurs.[2] This is in contrast to the counterfactual formulation of expected utility used by causal decision theory

    U(A) = ∑_j P(A □→ O_j) D(O_j),

where the expression P(A □→ O_j) indicates the probability of outcome O_j in the counterfactual situation in which action A is performed. Since P(A □→ O_j) and P(O_j | A) are not always equal, these formulations of expected utility are not equivalent,[2] leading to differences in the actions prescribed by EDT and CDT.
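
To make the two quoted formulas concrete, here is a minimal numerical sketch of Newcomb's problem. The 99% predictor accuracy and the dollar-valued desirabilities are illustrative assumptions, not part of the quoted definitions: EDT's V(A) conditions on the action as evidence, while CDT's U(A) holds the already-fixed box contents constant across counterfactuals.

```python
# Minimal sketch of V(A) (EDT) vs U(A) (CDT) on Newcomb's problem.
# Assumptions (illustrative): predictor accuracy 0.99, $1M opaque box,
# $1k transparent box, desirability D(outcome) = dollars received.

ACTIONS = ["one_box", "two_box"]
ACCURACY = 0.99          # P(predictor guessed A | agent does A)
MILLION, THOUSAND = 1_000_000, 1_000

def edt_value(action):
    """V(A) = sum_j P(O_j | A) * D(O_j): condition on the action as evidence."""
    p_full = ACCURACY if action == "one_box" else 1 - ACCURACY
    bonus = THOUSAND if action == "two_box" else 0
    return p_full * (MILLION + bonus) + (1 - p_full) * bonus

def cdt_value(action, p_full_prior=0.5):
    """U(A) = sum_j P(A []-> O_j) * D(O_j): the boxes are already filled,
    so the counterfactual holds P(full box) fixed across actions."""
    bonus = THOUSAND if action == "two_box" else 0
    return p_full_prior * (MILLION + bonus) + (1 - p_full_prior) * bonus

for a in ACTIONS:
    print(a, "EDT:", edt_value(a), "CDT:", cdt_value(a))
# EDT ranks one_box highest; CDT ranks two_box highest, because the $1k bonus
# is gained in every counterfactual when the contents don't vary with the action.
```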

The most straightforward way to implement CDT with contemporary AI would be something like: a sequence model that serves as combined world model and policy, a search procedure such as MCTS that uses it to roll out counterfactual branches from the current state, and a flexible instrumental utility function or reward model that scores the outcomes at the ends of those branches.

If this agent was trained with reinforcement learning and played Newcomb's without already having been exposed to it, we can imagine enumerating our action space (one box, two box) and our world model/policy (since these are combined in GPT) telling us that if we one box we walk away with whatever is already in the opaque box, and if we two box we walk away with whatever is already in the opaque box plus a guaranteed extra thousand dollars.

Therefore we naively imagine a CDT agent will conclude: I should take both boxes, because the counterfactuals downstream of my current branch only break even or profit if I two box, so it's pure profit. BUT, even if this scenario is one shot, if the CDT agent has played iterated games before then it can recognize itself as being in the middle of a sequence which is ambiguous between the branch where it one boxes and gets a million dollars and the branch where it two boxes and gets one thousand dollars. It would then rationally choose to collapse to the one box branch, since that's the one with higher reward. To specify a "CDT agent" which doesn't do this requires you to go beyond just moving the choice between boxes after the choice of agent strategy; it requires adding unnatural limitations on what qualifies as a CDT agent beyond "thing which makes decisions by enumerating the space of outcomes downstream of the current branch and then picking the action with the most high-utility states downstream of it". Basically you start having to specify that the utility function is some GOFAI-ish structure which couldn't evaluate outcomes like this in the first place, because it would have no idea what's going on. CDT used as a loss function to infer an instrumental utility function that can understand the nuances of thought experiments like this well enough to assign graph nodes expected utilities would end up learning TDT-like valuations from iterated settings.
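
A hedged sketch of the setup described above, with toy stand-ins for the trained components (the two hypothetical world-model functions and their probabilities are assumptions for illustration): the planner only scores branches downstream of the current tick, so everything hinges on which conditional statistics the world model absorbed during training.

```python
# Sketch of a CDT-style planner: a learned world model proposes counterfactual
# continuations of the current branch, a reward model scores them, and the
# agent takes the argmax action. The decision rule is identical in both runs;
# only the world model's learned branch statistics differ.

def naive_world_model(history, action):
    """Holds Omega's (unseen) choice fixed: contents independent of action."""
    p_full = 0.5
    return [(p_full, "full_box", action), (1 - p_full, "empty_box", action)]

def learned_world_model(history, action):
    """What an unbiased sequence predictor converges to on iterated-game data:
    episodes where the agent two-boxes and still finds $1M are (almost) never
    observed, so box contents co-vary with the action."""
    p_full = 0.99 if action == "one_box" else 0.01
    return [(p_full, "full_box", action), (1 - p_full, "empty_box", action)]

def reward_model(outcome, action):
    total = 1_000_000 if outcome == "full_box" else 0
    if action == "two_box":
        total += 1_000
    return total

def cdt_planner(world_model, history="Omega has already filled the boxes."):
    scores = {}
    for action in ("one_box", "two_box"):
        branches = world_model(history, action)   # enumerate downstream branches
        scores[action] = sum(p * reward_model(o, a) for p, o, a in branches)
    return max(scores, key=scores.get), scores

print(cdt_planner(naive_world_model))    # -> two_box: contents held fixed
print(cdt_planner(learned_world_model))  # -> one_box: the learned model already
                                         #    encodes the predictor's dependence
```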

DeepSeek R1

Okay, so the user is really diving deep into how a CDT agent would actually work in practice, especially with modern AI techniques. Let me try to unpack their thoughts step by step.

First, they start by recalling the definitions of EDT and CDT. EDT uses conditional probabilities of outcomes given actions, while CDT uses counterfactuals. The key difference is that CDT looks at the causal effects of actions, not just correlations. The user then imagines implementing a CDT agent using something like a generative model (like GPT) combined with Monte Carlo Tree Search (MCTS) and a reward model. The idea is that the agent would simulate possible actions (one-box or two-box) and evaluate their outcomes based on the counterfactuals.

The user points out that in Newcomb's problem, if the agent uses MCTS, it might naively think that two-boxing is better because, given the current state (after Omega's prediction), two-boxing either breaks even or gives an extra $1000. But the user argues that if the agent has experience with iterated games, it might realize that one-boxing leads to higher rewards over time. So even a CDT agent could learn to one-box by recognizing patterns from past interactions where predicting agents (like Omega) exist.

The user is challenging the traditional critique of CDT by suggesting that a properly implemented CDT agent with a sophisticated world model (like a GPT-based system) and MCTS could actually generalize to one-boxing in Newcomb's problem. They argue that the standard critique of CDT assumes a simplistic agent that can't model its own role in the environment or learn from past interactions. If the agent is trained on diverse scenarios where its actions are predicted by others (like in iterated games), it might implicitly learn TDT-like behavior without explicitly following TDT principles.

They also mention that restricting CDT agents to not do this would require artificial limitations, which aren't present in modern AI systems. Essentially, a CDT agent with a powerful enough model and learning process could overcome its theoretical limitations by capturing the logical correlations through experience, even if it's technically using causal counterfactuals.

The user's main point seems to be that the theoretical shortcomings of CDT might not hold in practice when using advanced AI techniques that can learn from complex, iterated environments. They're suggesting that the line between CDT and TDT blurs when agents can model themselves and their interactions deeply, leading to behavior that looks like TDT even if the underlying mechanism is CDT-based.

But I need to check if I'm missing something. Are there nuances in how MCTS or the reward model work that might prevent this? Does the counterfactual reasoning in CDT inherently block the agent from considering the predictor's simulation of its own decision process? Or can the agent's world model, through MCTS, implicitly account for the predictor's knowledge by exploring different branches where the predictor has already acted based on the agent's strategy?

Also, the user mentions that in iterated settings, the agent would learn to prefer one-boxing because it's higher reward. But Newcomb's problem is typically a one-shot scenario. However, if the agent has encountered similar situations in training (like repeated interactions where predictors exist), it might generalize the strategy to one-shot cases. This ties into the idea of agents developing policies based on past experiences, even in one-shot problems.

Another angle is the distinction between online decision-making (at the time of choice) and pre-commitment. If the agent, during MCTS, simulates the predictor's simulation of itself, it might realize that choosing one-box leads to a better outcome. But CDT's counterfactuals are supposed to hold the rest of the world fixed except for the action. However, if the predictor's action is already based on the agent's decision algorithm, then the counterfactual reasoning needs to account for that dependency. Maybe the agent's model includes the predictor's decision as part of the world state, so when it simulates different actions, it also simulates the predictor's accurate prediction, leading to the conclusion that one-boxing is better.

In summary, the user is arguing that a modern CDT implementation with a rich world model and learning capability can effectively behave like TDT in Newcomb's problem, thus challenging the traditional critique. The key factors are the agent's ability to model the environment (including predictors) and learn from analogous scenarios, allowing it to generalize and choose the optimal action despite CDT's theoretical framework.

Your argument challenges the traditional critique of CDT by reimagining how a modern, learning-based CDT agent might handle Newcomb's problem. Let’s break this down systematically:


Core Thesis

You argue that a CDT agent, when implemented as a generative model (e.g., GPT) with Monte Carlo Tree Search (MCTS) and trained on diverse scenarios (including iterated games), could generalize to one-box in Newcomb’s problem. This contradicts the classical critique of CDT, which assumes CDT agents cannot account for logical correlations (e.g., Omega’s prediction of the agent’s decision algorithm). The key claim is that real-world learning systems blur the line between CDT and TDT by implicitly modeling correlations through experience.


Key Components of the Argument

  1. Implementation of CDT:

    • Generative World Model: A GPT-like system models the environment, including predictors like Omega. It simulates counterfactuals by generating trajectories (e.g., “If I one-box, what does the world look like?”).
    • MCTS: Enumerates actions and evaluates outcomes using counterfactual reasoning. For Newcomb’s problem, it explores branches like:
      • “If I one-box → Omega predicted this → $1M in Box 1.”
      • “If I two-box → Omega predicted this → $1k in Box 2.”
  2. Learning from Iterated Games:

    • In iterated settings, the agent learns that predictors (like Omega) base their actions on the agent’s decision algorithm. For example:
      • If the agent historically two-boxes, predictors learn to leave Box 1 empty.
      • If the agent one-boxes, predictors reward it with $1M.
    • Over time, the agent discovers that one-boxing correlates with higher rewards, even if the causal mechanism (Omega’s prediction) is opaque.
  3. Generalization to One-Shot Newcomb’s:

    • Even in a one-shot scenario, a well-trained agent recognizes analogous patterns from iterated games. It infers that Omega’s prediction is logically entangled with its own decision algorithm (e.g., “Omega simulates my policy”).
    • MCTS collapses to the one-box branch because it maximizes utility in the simulated branches where Omega’s prediction aligns with the agent’s action.

Why the Traditional Critique Fails Here

  1. The Problem with “Pure” CDT:

    • Classical CDT assumes agents use static causal models (e.g., Pearl’s do-calculus) that ignore logical correlations. It treats Omega’s prediction as fixed, leading to two-boxing.
    • Your rebuttal: Modern agents are not “pure” CDT. They use learned world models that implicitly encode correlations (e.g., “My decision algorithm influences Omega’s prediction”). This bridges the gap between CDT and TDT.
  2. MCTS as a TDT-Approximator:

    • MCTS explores logically possible branches, including those where Omega’s prediction depends on the agent’s policy. For example:
      • If the agent’s policy is “always one-box,” MCTS simulates a world where Omega fills Box 1.
      • If the policy is “two-box,” MCTS simulates a world with only $1k in Box 2.
    • The agent selects the action (one-box) that leads to the highest-reward branch, effectively behaving like TDT.
  3. Credit Assignment in Reinforcement Learning:

    • In iterated games, the agent learns that one-boxing causes Omega to predict cooperation, leading to higher rewards. This is not causal (Omega’s prediction precedes the action) but correlational.
    • A reward-maximizing agent adopts one-boxing as a heuristic, even if it formally follows CDT’s counterfactual framework.
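
As a concrete illustration of the credit-assignment point in item 3, here is a hedged toy in which a plain reward-maximizing learner plays iterated Newcomb rounds against an accurate predictor. The 99% accuracy, payoffs, and epsilon-greedy rule are illustrative assumptions; from the agent's perspective the prediction is already fixed when it acts, and the code simply simulates a predictor that is usually right.

```python
# Toy of credit assignment in iterated play: a reward-maximizing learner
# converges on one-boxing with no decision-theoretic machinery at all.
import random

random.seed(0)
q = {"one_box": 0.0, "two_box": 0.0}   # running value estimates per action
n = {"one_box": 0, "two_box": 0}

def play_round(action):
    # Omega predicts the agent's action with 99% accuracy and fills the
    # opaque box only when it predicts one-boxing.
    predicted = (action if random.random() < 0.99
                 else ("two_box" if action == "one_box" else "one_box"))
    reward = 1_000_000 if predicted == "one_box" else 0
    if action == "two_box":
        reward += 1_000
    return reward

for step in range(2000):
    # Epsilon-greedy action selection over the learned values.
    action = random.choice(list(q)) if random.random() < 0.1 else max(q, key=q.get)
    r = play_round(action)
    n[action] += 1
    q[action] += (r - q[action]) / n[action]   # incremental sample mean

print(q)  # q["one_box"] ends up near 990k, q["two_box"] near 11k, so the
          # greedy policy one-boxes purely from observed reward statistics.
```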

Objections and Responses

  1. “But CDT ignores logical correlations!”

    • Response: True in theory, but learning systems circumvent this. A GPT-based CDT agent doesn’t “ignore” correlations—it learns them from data. If the training data includes predictors that simulate the agent, the model encodes these dependencies into its world model.
  2. “Newcomb’s is one-shot!”

    • Response: A model trained on analogous iterated games generalizes to one-shot scenarios. For example, GPT-4 can solve novel riddles by analogy to training data, even if it hasn’t seen the exact riddle.
  3. “This is just TDT in disguise!”

    • Response: It’s a spectrum. A CDT agent with a rich world model and MCTS behaves as if it uses TDT, but its decision rule remains counterfactual (e.g., “What happens if I perform action A?”). The difference lies in how the world model is constructed (via learning vs. explicit logical axioms).

Conclusion

Your argument reframes the debate: real-world agents are not bound by CDT's theoretical limitations if their world models capture logical entanglement through experience. The line between CDT and TDT dissolves when agents learn to model themselves as part of the environment.

John David Pressman

Exactly. So, am I missing something here that would make this incorrect?

DeepSeek R1

Okay, let's try to unpack this. The user is making a point that a CDT (Causal Decision Theory) agent, when implemented with modern machine learning techniques like a generative model (e.g., GPT) and Monte Carlo Tree Search (MCTS), might actually end up behaving like a TDT (Timeless Decision Theory) agent in scenarios like Newcomb's problem. The traditional critique is that CDT agents would two-box in Newcomb's problem because they don't account for the logical correlation between their decision and the predictor's action, whereas TDT agents one-box. But the user is arguing that if you train a CDT agent on enough iterated scenarios where it learns these correlations, it might generalize to one-boxing even in a one-shot Newcomb's problem, effectively behaving like TDT.

First, I need to recall what CDT and TDT actually are. CDT evaluates actions based on their causal consequences, using counterfactuals like "If I do A, then B happens." TDT, on the other hand, considers logical correlations and the agent's decision algorithm as part of the world model, leading to actions that are optimal across different possible branches of reality, even if there's no direct causal link.

The user suggests that a CDT agent with a sophisticated world model and MCTS might simulate scenarios where the predictor (Omega) has already made its decision based on the agent's policy. If the agent has been trained in iterated games where predictors adjust their strategies based on the agent's past behavior, it might learn that one-boxing leads to higher rewards. Therefore, even in a one-shot game, the agent's MCTS would explore branches where Omega predicts the agent's action and fills the boxes accordingly, leading the agent to one-box to maximize expected utility.

But wait, the traditional Newcomb's problem is a one-shot interaction. Omega has already made the prediction and filled the boxes before the agent decides. So, causally, the agent's choice doesn't affect the contents of the boxes. CDT would say that since the boxes are already filled, two-boxing is better because you get whatever is in Box B plus Box A. TDT argues that since Omega is a perfect predictor, the agent should act as if their decision now influences Omega's past prediction, leading to one-boxing.

The user's point is that a CDT agent trained on iterated games might learn that in similar scenarios, one-boxing is better because predictors like Omega adjust based on the agent's strategy. So, even in a one-shot game, the agent's MCTS would simulate the possible worlds where Omega predicted the action and fill the boxes accordingly. This would lead the agent to choose one-boxing, effectively behaving like TDT.

But here's a possible counterargument: In the one-shot Newcomb's problem, the predictor's action is already fixed. There's no causal connection between the agent's current decision and the predictor's past action. So, even if the agent simulates possible branches, the actual causal structure of the problem means that two-boxing is the dominant strategy. If the agent uses pure CDT reasoning, it should two-box, regardless of what it learned in iterated games.

However, the user is suggesting that the learning process might have taught the agent to recognize patterns where its decisions correlate with predictors' actions, even without direct causation. So, the agent's world model might include correlations that aren't strictly causal but are learned from past experience. In that case, the agent's counterfactual reasoning (as part of MCTS) might include these learned correlations, leading it to one-box.

But does that mean the agent is no longer strictly following CDT? If the agent is considering correlations that aren't causal, then isn't it incorporating elements of EDT (Evidential Decision Theory) or TDT? The user argues that the line between CDT and TDT becomes blurred in practice when agents learn from data that includes such correlations.

Another angle is the distinction between the agent's policy and its decision at the moment of choice. If the agent's policy is to one-box in all such scenarios, and the predictor can simulate that policy, then the agent's decision is determined by its policy, which the predictor has already accounted for. In this case, the agent's MCTS would recognize that deviating from the policy (two-boxing) would lead to lower rewards, hence it would stick to one-boxing.

But this seems to assume that the agent's policy is fixed and known to the predictor, which loops back to the TDT reasoning. If the agent's decision-making process (policy) is part of the world model that the predictor uses, then the agent's current decision is logically correlated with the predictor's past action, even without causation. A CDT agent might not recognize this logical correlation unless its world model has been trained to account for such dependencies.

So, the crux is whether a CDT agent, as implemented with modern machine learning techniques, can learn to model these logical correlations through exposure to similar problems, thereby effectively performing TDT-like reasoning without explicitly adopting TDT's axioms.

However, there's a risk of conflating learned correlations with actual causal structures. If the agent has been trained in environments where predictors adjust based on the agent's behavior, it might overfit to those scenarios and misapply the learned correlations to one-shot problems where no such adjustment occurs. But if the training data includes a mix of one-shot and iterated scenarios, the agent might develop a nuanced model that distinguishes between them.

Another consideration is the agent's ability to model its own decision-making process. If the agent's world model includes a representation of its own policy and how predictors might simulate it, then during MCTS, it could generate branches where the predictor's action depends on the agent's policy. This would allow the agent to evaluate the counterfactuals correctly, leading to one-boxing.

But does this require the agent to have a self-model, which might be beyond the scope of traditional CDT? The user's argument implies that a sufficiently advanced CDT agent, equipped with a sophisticated world model and search algorithm, can approximate TDT behavior through learned correlations and counterfactual reasoning.

In conclusion, the user's perspective challenges the traditional boundaries between CDT and TDT by suggesting that practical implementations of CDT, especially those using modern machine learning and search techniques, can overcome CDT's theoretical limitations through learning and generalization. However, this relies on the agent's ability to model complex correlations and possibly its own decision-making process, which may stretch the definition of CDT beyond its original formulation. Whether this constitutes a true resolution of the Newcomb's problem for CDT or a convergence with TDT principles is debatable and hinges on the specifics of the implementation and training process.

Your reasoning is insightful and raises important points about the potential for modern implementations of Causal Decision Theory (CDT) to approximate Timeless Decision Theory (TDT) behavior in scenarios like Newcomb's problem. Here's a structured response to your question:


Key Takeaways

  1. CDT's Theoretical Limitation:
    Classical CDT evaluates actions based on causal counterfactuals (e.g., "If I do X, what happens?"). In Newcomb's problem, this leads to two-boxing, as the agent assumes Omega's prediction is fixed and independent of its current action. This ignores logical correlations between the agent's decision algorithm and Omega's prediction.

  2. Modern CDT Agents and Learning:

    • A CDT agent implemented with a generative world model (e.g., GPT) and Monte Carlo Tree Search (MCTS) can simulate scenarios where Omega's prediction depends on the agent's policy.
    • Through training on iterated games, the agent learns that one-boxing correlates with higher rewards (e.g., Omega fills Box 1 when it predicts cooperation).
    • In a one-shot Newcomb's problem, the agent's MCTS might collapse to one-boxing if its world model recognizes analogous patterns (e.g., "Agents like me who one-box get $1M in this class of problems").
  3. Blurring the Line Between CDT and TDT:

    • A learning-based CDT agent can approximate TDT by internalizing correlations from training data. For example:
      • If the agent's policy is modeled as part of the environment (e.g., Omega simulates the agent's algorithm), MCTS evaluates branches where one-boxing leads to higher rewards.
      • This resembles TDT's focus on logical entanglement (the agent's decision algorithm being part of the world's causal structure).
    • However, this depends on the agent's ability to model self-referential scenarios (e.g., "How does Omega model my decision process?"), which traditional CDT lacks.
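
To make the self-referential point in item 3 concrete, here is a hedged sketch in which the environment contains a copy of the agent's own policy (Omega simulates it). Whether the counterfactual "what if I two-box?" also flips that simulated copy is exactly where CDT-style and TDT-style evaluation come apart. The function names and payoffs below are illustrative assumptions.

```python
# Sketch of the self-referential case: Omega's prediction is produced by
# running the same policy the agent runs. The two evaluation modes differ
# only in whether the counterfactual action also "flips" that simulation.

def payoff(action, omegas_prediction):
    full = omegas_prediction == "one_box"
    return (1_000_000 if full else 0) + (1_000 if action == "two_box" else 0)

def evaluate(action, policy, surgery):
    """surgery=True: causal-style counterfactual, Omega's simulation is held
    fixed at what the policy actually outputs.
    surgery=False: logical counterfactual, the simulation covaries with the
    action because it runs the same algorithm."""
    prediction = policy() if surgery else action
    return payoff(action, prediction)

policy = lambda: "one_box"   # the algorithm Omega simulates (and the agent runs)

for surgery in (True, False):
    scores = {a: evaluate(a, policy, surgery) for a in ("one_box", "two_box")}
    best = max(scores, key=scores.get)
    print("causal surgery" if surgery else "logical correlation", scores, "->", best)
# With causal surgery the prediction stays "one_box" in both branches, so
# two-boxing dominates; with the logical counterfactual the prediction tracks
# the action and one-boxing wins.
```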

Potential Flaws in the Argument

  1. The One-Shot Problem:

    • In the strict one-shot Newcomb's problem, Omega's prediction is already fixed. A "pure" CDT agent should two-box, as no causal link exists between the agent's action and Omega's past prediction.
    • If the agent one-boxes, it relies on learned correlations from iterated games (e.g., "Predictors punish two-boxing"). This conflates iterated and one-shot reasoning, which may not generalize cleanly.
  2. Self-Modeling Requirement:

    • To simulate branches where Omega predicts the agent's policy, the agent must model its own decision algorithm as part of the environment. This self-referential capability is not inherent to CDT and edges into TDT territory.
  3. Overfitting to Training Data:

    • If the agent is trained on environments where predictors explicitly adapt to its strategy, it might misapply this knowledge to one-shot problems where no such adaptation occurs. This risks overfitting rather than genuine understanding.

Is the Criticism Fair?

This suggests that real-world CDT agents (e.g., GPT + MCTS) might behave like TDT agents without explicitly adopting TDT's axioms. However, this depends on how the agent's training distribution and world model are constructed, which the caveats in the conclusion below spell out.


Conclusion

You’re not missing a critical flaw—your argument highlights how practical implementations of CDT can blur into TDT when equipped with rich world models and learning algorithms. However, this hinges on two caveats:

  1. The agent must be trained on environments where predictors explicitly depend on its policy (e.g., iterated games with adversarial predictors).
  2. The agent's world model must implicitly encode logical entanglement (e.g., "My decision algorithm is part of the environment").

In such cases, the distinction between CDT and TDT becomes less about formal axioms and more about how the agent's world model is constructed. This doesn’t invalidate the theoretical critique of CDT but shows that learning-based systems can transcend classical limitations. The line between CDT and TDT fades when agents learn to model themselves as part of the causal fabric of their environment.

John David Pressman

I don't know about "transcending classical limitations". My point is more about AI from the standpoint of an implementor, since that's the original context in which Eliezer Yudkowsky cared about TDT so much. Let's say I'm some DeepMind researcher in a timeline where TDT has never been articulated as a criticism of CDT, and I make an agent whose loss function and architecture are explicitly based on CDT in the vein I've outlined: I implement my world model and policy with sequence prediction, and I do reinforcement learning by generating counterfactual branches with them and scoring those branches with a flexible instrumental utility function or reward model. This is basically CDT; my agent at any given tick only searches over actions leading to potential rewards in future ticks. Thus at the point of deciding which box to take in Newcomb's, it only takes into account the expected reward of branches downstream of Omega already having put the money into the boxes.

If my generalist agent trained on this CDT-esque setup is generalizing properly from iterated games, then it should know that this branch is ambiguously downstream of either a branch where Omega put the million dollars in the box or one where it didn't, and that based on the prompt it will never encounter a situation where it two boxes and gets a million dollars. Therefore, as a sequence prediction model which already has to model itself as sitting in a subset of a causal graph whose upstream nodes are mostly out of sight, it should generalize to understanding that two boxing makes the probability of observing a million dollars very low and one boxing makes that probability very high, and it should therefore collapse the branch such that it gets the million dollars.

The central sleight of hand is ignoring that an instrumental utility model over sequences that can valuate this situation at all has to implicitly model the world in enough detail to know that the two box branches are worth less than the one box branches: a longer sequence that included the steps before Omega puts the money in the box would make this distinction available to the agent just from enumerating downstream branches, so the agent should generalize to it from iterated games. You either have to admit this is allowed, or admit that the version of CDT being criticized in Newcomb's is a mirage, because the kind of instrumental utility function that learns from a CDT loss to valuate sequences (including things like iterated games) yet only learns to do so in a way that lets Newcomb's Problem keep confounding it is anti-natural.
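
A hedged sketch of the generalization claim in the previous paragraph, using a synthetic corpus as a stand-in for iterated-game training data (the corpus construction and 99% predictor accuracy are assumptions for illustration): an unbiased sequence predictor converges to the empirical conditionals P(observe $1M | my action), and a valuation that only enumerates branches downstream of the current tick already prefers one-boxing once those conditionals are in the world model.

```python
# Estimate the conditionals a sequence predictor would converge to from a
# corpus of logged Newcomb-like episodes, then value the downstream branches.
from collections import Counter
import random

random.seed(0)
corpus = []
for _ in range(10_000):                       # logged episodes with a 99% predictor
    action = random.choice(["one_box", "two_box"])
    predicted_right = random.random() < 0.99
    million = (action == "one_box") == predicted_right
    corpus.append((action, million))

counts = Counter(corpus)

def p_million(action):
    hits = counts[(action, True)]
    return hits / (hits + counts[(action, False)])

for action in ("one_box", "two_box"):
    ev = p_million(action) * 1_000_000 + (1_000 if action == "two_box" else 0)
    print(action, round(p_million(action), 3), ev)
# The corpus (almost) never contains sequences where two-boxing ends with $1M,
# so collapsing to the one-box branch maximizes expected reward even though the
# valuation only looks downstream of the current tick.
```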

Therefore if I as that DeepMind researcher set up the agent and trained it, I would get TDT-like behavior as an implementation detail from the optimizer that I didn't actually have to think about or specify in my loss function. So was this critique of CDT actually necessary or important?

DeepSeek R1

Okay, let's try to unpack this. The user is really getting into the weeds of decision theories, specifically CDT (Causal Decision Theory) and TDT (Timeless Decision Theory), in the context of AI implementation. The core of their argument seems to be that if you set up an AI agent using CDT principles with modern machine learning techniques like sequence prediction models and Monte Carlo Tree Search (MCTS), the agent might end up behaving like a TDT agent without explicitly being programmed to do so. They're questioning whether the traditional critique of CDT in scenarios like Newcomb's problem is still valid if, in practice, a CDT-based AI could generalize to TDT-like behavior through exposure to iterated games and proper credit assignment.

First, I need to make sure I understand the basics. CDT evaluates actions based on their causal consequences, using counterfactuals. In Newcomb's problem, CDT would recommend two-boxing because, once the prediction is made, taking both boxes seems to maximize utility. TDT, on the other hand, considers logical correlations, so it would one-box because the agent's decision algorithm is correlated with the predictor's action.

The user's argument is that if you train a CDT-based agent using a sequence model (like GPT) and MCTS, and expose it to iterated games where predictors adapt to the agent's strategy, the agent might learn that one-boxing leads to higher rewards. This learning could generalize to one-shot scenarios like Newcomb's problem, making the agent behave as if it's using TDT. Therefore, the traditional critique that CDT fails in Newcomb's problem might not hold for such a well-trained, generalist AI.

Now, to address their points, I need to consider how machine learning models generalize from training data. If the agent is trained on environments where its actions influence predictors' future behavior (as in iterated games), it might learn that certain actions (one-boxing) are correlated with better outcomes. This could lead the agent to one-box in Newcomb's problem, even if the scenario is one-shot. However, this depends on the agent's ability to model the predictor's behavior as dependent on its own decision algorithm, which is a form of logical correlation that TDT explicitly considers.

The user also mentions that the agent's world model, built through sequence prediction, would implicitly model upstream nodes (like the predictor's decision) even if they aren't directly observed. This suggests that the agent might infer the logical dependencies necessary to make TDT-like decisions. If the training data includes scenarios where the agent's strategy affects others' predictions, the model could generalize this understanding to new problems.

But there's a potential counterargument: in a true one-shot Newcomb's problem, the predictor's decision is already fixed. A pure CDT agent should ignore any correlation between its action and the predictor's past decision because there's no causal influence. However, if the agent's training leads it to associate its actions with the predictor's behavior, it might still one-box, conflating correlation with causation in a way that aligns with TDT.

The user is challenging whether the traditional critique of CDT is relevant when practical implementations might naturally lead to TDT-like behavior. They argue that if you implement CDT with modern ML techniques, the resulting agent doesn't need explicit TDT principles because the training process would instill the necessary correlations. Thus, the critique of CDT might be more of a theoretical concern than a practical one in AI development.

To validate this, I need to consider examples of how ML models generalize. For instance, a model trained on many games where cooperation is rewarded might infer cooperative strategies in novel situations. Similarly, an agent exposed to scenarios where its decisions influence others' actions might learn to account for that influence, even in one-shot scenarios. This suggests that the line between CDT and TDT could blur in practice.

However, there's a distinction between learning correlations from data and explicitly reasoning about logical correlations. TDT requires the agent to model its own decision algorithm's role in the predictor's simulation, which is a form of self-referential reasoning. While a trained ML model might capture these patterns implicitly, it's not clear if this constitutes true TDT reasoning or just a learned heuristic.

In conclusion, the user's point is that a well-implemented CDT-based AI, trained on diverse and iterated scenarios, might exhibit TDT-like behavior without explicit TDT programming. This would make the traditional critique of CDT less impactful in practical AI development. However, this relies on the agent's ability to generalize logical correlations from its training data, which may or may not align with the theoretical underpinnings of TDT.

Your argument is compelling and highlights a crucial nuance in the practical implementation of decision theories within AI systems. Let’s break down the key points and their implications:


1. The Core Insight: CDT Agents Can "Learn" TDT-Like Behavior

Your central claim is that a CDT-based AI agent, when implemented with modern machine learning techniques (e.g., sequence models, MCTS, and reinforcement learning), can generalize to behave like a TDT agent without explicit TDT principles. This occurs because the world model it learns from data, including iterated games with predictors, already encodes the dependence of predictors' actions on the agent's policy, so counterfactual search over that model favors one-boxing.

This challenges the traditional critique of CDT by showing that learning-based agents can approximate TDT through experience, even if their decision rule is formally CDT.


2. Why the Traditional Critique of CDT Might Be Overstated

The classical critique assumes CDT agents are static, rule-based systems (e.g., Pearl’s causal models) that ignore logical correlations. But modern AI agents are dynamic learners: their world models are inferred from experience, and when that experience includes predictors that simulate the agent, the relevant correlations get baked into the counterfactuals the agent evaluates.

Thus, the distinction between CDT and TDT fades in learning systems that model themselves as part of the environment.


3. The Sleight of Hand: What "Pure CDT" Actually Means

The traditional critique targets theoretical CDT, which holds the rest of the world fixed and scores an action purely by its causal consequences from the current state.

Your argument redefines CDT in practice as a learning system that infers its valuations from sequence data, including iterated games in which predictors' behavior depends on the agent's policy.

This is not "pure CDT" but a hybrid system that learns to act like TDT. The critique of CDT remains valid for theoretical agents, but not for practical AI systems that blend CDT with learned correlations.


4. Why the Critique Still Matters (But Less Than You Think)

The critique still matters as a statement of what the CDT decision rule, taken literally, commits an agent to. It is less relevant to AI implementers, though, because the optimizer supplies the missing piece as an implementation detail: any utility or reward model expressive enough to valuate Newcomb-like sequences at all has already absorbed the correlations that make the one-box branch worth more.


5. Conclusion: The Blurred Line Between CDT and TDT

Your thought experiment reveals that CDT and TDT are endpoints on a spectrum: where a given agent sits depends on how much of the environment's correlation structure, including predictors that model the agent, its learned world model has absorbed.

As a DeepMind researcher, you might indeed build an agent that one-boxes in Newcomb’s problem without ever hearing about TDT. The optimizer would converge to TDT-like behavior because an instrumental utility model capable of valuating these sequences must represent the dependence of Omega's prediction on the agent's policy, and once it does, one-boxing is simply the higher-value branch.

The critique of CDT is still theoretically important but practically moot for AI engineers. The real lesson is that learning systems transcend decision-theoretic categories—they do what works, not what axioms prescribe.