231_014 · prediction · AI · AI-scaling

Remaining research math problems will be solved within the next couple months.

Predictor: Dave Blundin · ep#231 "Top AI News: Sonnet 4.6, Grok 4.2, Gemini 3 Deep Think, and OpenClaw | EP #231" · source

Prior probability: 60.0%
Current probability: 35.7% (evolves via intake + LBP)
Conviction: 4/5
Signal quality: B
Resolution: pending
Window: 2026-01-01 – 2026-11-30
Edges in / out: 10 / 5
Tickers exposed: 37

Prediction text

Remaining research math problems will be solved within the next couple months. | If it can solve six out of 10, it can solve all within the next couple months. It'll happen in massive parallel.

Verbatim quote

From episode "Top AI News: Sonnet 4.6, Grok 4.2, Gemini 3 Deep Think, and OpenClaw | EP #231"
If it can solve six out of 10, it can solve all within the next couple months. It'll happen in massive parallel.

Predictor: Dave Blundin

κ + Brier as of 2026-07-04
κ (discount): 0.821
Brier: 0.0491 (excellent)
Hits / Misses: 3 / 2 (of 9 resolved)
Hit rate: 33.3%
Calibration plot (stated vs observed)

Evidence about this node from Dave Blundin is multiplied by κ in /api/intake. A lower κ means less weight: κ floors at 0.10 (effectively silenced) and caps at 1.00 (full weight).
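The κ discount can be sketched as follows. This is a minimal illustration, assuming the evidence is expressed as a log-likelihood ratio (LLR) and that κ is clamped to the stated floor and cap before multiplying. The names `clamp_kappa` and `discount_llr` are hypothetical and are not taken from /api/intake.

```python
KAPPA_FLOOR = 0.10  # stated floor: evidence is effectively silenced
KAPPA_CAP = 1.00    # stated cap: evidence gets full weight


def clamp_kappa(kappa: float) -> float:
    """Keep kappa inside the [floor, cap] range described on this page."""
    return max(KAPPA_FLOOR, min(KAPPA_CAP, kappa))


def discount_llr(llr: float, kappa: float) -> float:
    """Scale a log-likelihood ratio by the predictor's clamped kappa (assumed form)."""
    return clamp_kappa(kappa) * llr


# Values copied from the raw metadata of the latest evidence row.
adjusted = discount_llr(-0.6931471805599453, 0.8214)
print(round(adjusted, 4))
```

With κ = 0.8214 and an LLR of -0.6931, the result agrees with the `adjusted_llr` field reported in the evidence chain (about -0.5694).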

Reference class

Not linked

This node isn't linked to a reference class. The Bayesian update applies without outside-view blending.

Probability over time

5 prob_history rows
[Chart: probability over time, y-axis 0% to 100%; prior 60%, current = 35.7%; dates shown: 2026-04-30, 2026-05-03, 2026-07-03. Legend: intake v2, milestone miss sweep, lbp propagation, reference class assigned, legacy v1, prior_prob (analyst seed).]

Milestone chain

Pre-event signals (upstream prereqs + window checkpoints) → resolution event → downstream cascades. Status/dates update from linked nodes; re-derive nightly via scripts/ops/derive_milestones.py.
Leading chain: 7 fired ✓ · 1 overdue ⏱ · 1 pending
  1. 2026-04-15 · hit · Frontier models score >=99% on AIME 2025/2026 competition math
    How: Top frontier models (GPT-5.4, Claude Opus 4.6, Gemini 3 Flash) all score >=95% on AIME 2025 / 2026
    Source: https://benchlm.ai/math · conf 99%
    Notes: HIT for COMPETITION math; prediction targets RESEARCH math which is harder.
  2. 2026-04-15 · hit · FrontierMath Tier 1-3 solve rate >40% by GPT-5.2/Claude Opus 4.6
    How: Public benchmark confirms top frontier models solve >=40% of FrontierMath Tier 1-3 problems
    Source: https://epoch.ai/frontiermath · conf 90%
    Notes: Mid-progress: prediction targets ALL 10 of a specific problem set; FrontierMath is broader. Partial validation only.
  3. 2026-04-15 · hit · Aletheia (Gemini Deep Think) achieves publishable PhD-level result in arithmetic geometry
    How: Google DeepMind publicly announces Aletheia produces publishable research-grade result in mathematics
    Source: https://spectrum.ieee.org/ai-math-benchmarks · conf 85%
  4. 2026-05-01 → 2026-09-30 · pending · First Proof Challenge: AI solves >=1 of 10 expert-curated math problems
    How: Frontier model produces verified proof for >=1 of the 11-mathematician First Proof Challenge problems
    Source: https://spectrum.ieee.org/ai-math-benchmarks · conf 50%
    Notes: First Proof Challenge proposed Feb 2026 by 11 distinguished mathematicians.
  5. 2026-06-01 → 2026-11-30 · pending · All 10 of Blundin-referenced research math problems solved by AI
    How: Public reporting confirms a frontier AI solves all 10 of the specific 'remaining research math problems' Blundin referenced (originally said 6/10 already)
    Source: Lab announcements, FrontierMath reporting · conf 30%
    Notes: Cascade: exact resolution of prediction. Specific problem set not publicly named, so this is hard to verify without anchor.

What if this resolves?

Clamp this prediction TRUE or FALSE and run a counterfactual Gibbs sample to surface the predictions whose marginals shift most under that assumption. Each run takes ~30s; it is suited to stress-testing "if X resolves, what else moves?"
(live posterior: 36%)

Evidence chain

Every probability update with full Bayesian provenance — chronological, latest first
metadata_milestone_miss_sweep · 2026-07-03T22:12:25Z · 35.7% · -13.8pp
metadata_milestone_miss_sweep bayesian_v2 n=1 inside=0.357 blend=0.357 LLR=-0.569 κ=0.82 no_blend
Raw metadata
{
  "trf": 0.44767179233177623,
  "kappa": 0.8214,
  "base_rate": null,
  "predictor": "Dave Blundin",
  "total_llr": -0.6931471805599453,
  "grace_days": 7,
  "bayesian_v2": true,
  "prior_logit": -0.01913257734403183,
  "bayes_factor": "1.8:1 against",
  "blend_reason": "no reference_class linked",
  "inside_prior": 0.4952170015666818,
  "kappa_source": "predictor_table",
  "n_milestones": 1,
  "blend_applied": false,
  "contributions": [
    {
      "llr": -0.6931471805599453,
      "kind": "prereq",
      "kappa": 0.8214,
      "label": "Nvidia agreed to remit 15% of China chip-sale revenue directly to US government in exchange for reversing specific AI chip export bans.",
      "weight": 0.5,
      "strength": "moderate",
      "confidence": null,
      "source_url": null,
      "adjusted_llr": -0.5693510941119391,
      "expected_date": "2026-06-25",
      "measurement_criterion": null
    }
  ],
  "evidence_kind": "metadata_milestone_miss_sweep",
  "inside_source": "history_v2",
  "inside_weight": 0.6866297453677566,
  "outside_weight": 0.31337025463224344,
  "posterior_prob": 0.3569828460671014,
  "posterior_logit": -0.5884836714559709,
  "predictor_brier": 0.0491,
  "inside_posterior": 0.3569828460671014,
  "blended_posterior": 0.3569828460671014,
  "reference_class_id": null,
  "total_adjusted_llr": -0.5693510941119391,
  "predictor_n_resolved": 9
}
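The figures in this record are consistent with an update in log-odds space: the κ-adjusted log-likelihood ratio is added to the prior logit, and the sum is mapped back to a probability. The sketch below reproduces the reported values under that assumption. The helper name `sigmoid` and the variable names are illustrative, and the actual intake code may differ.

```python
import math


def sigmoid(x: float) -> float:
    """Map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Inputs copied from the raw metadata above.
prior_logit = -0.01913257734403183
kappa = 0.8214
llr = -0.6931471805599453

# Assumed update rule: discount the evidence by kappa, then add it in log-odds space.
adjusted_llr = kappa * llr
posterior_logit = prior_logit + adjusted_llr
posterior_prob = sigmoid(posterior_logit)

# Odds against implied by the adjusted evidence (compare the "bayes_factor" field).
bayes_factor_against = math.exp(-adjusted_llr)

print(f"inside prior    = {sigmoid(prior_logit):.4f}")
print(f"adjusted LLR    = {adjusted_llr:.4f}")
print(f"posterior logit = {posterior_logit:.4f}")
print(f"posterior prob  = {posterior_prob:.4f}")
print(f"bayes factor    = {bayes_factor_against:.2f}:1 against")
```

Because no reference class is linked, no outside-view blend is applied, so `inside_posterior`, `blended_posterior`, and `posterior_prob` all carry the same value in the record.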
LBP · 2026-05-10T02:00:02Z · 49.5% · -1.2pp
Network propagation: 50.7% → 49.5%
6-iter LBP, residual 0.00584 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run e5c18d29
LBP · 2026-05-03T02:00:01Z · 50.7% · -2.2pp
Network propagation: 52.9% → 50.7%
6-iter LBP, residual 0.00677 · damping 0.5, w_intrinsic 0.5 · method lbp_v3 · run 1a683ac9
LBP · 2026-04-30T16:39:51Z · 52.9% · -2.9pp
Network propagation: 55.8% → 52.9%
5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v2 · run 0c8a4ea3
LBP · 2026-04-30T02:18:57Z · 55.8% · -4.2pp
Network propagation: 60.0% → 55.8%
5-iter LBP, residual 0.00825 · damping 0.5, w_intrinsic 0.5 · method lbp_v1 · run 592311ef
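The LBP rows report an iteration count, a residual, and a damping factor of 0.5. As a rough illustration of what damping and a residual-based stop mean, here is a generic damped fixed-point iteration on a single belief. It is not the dashboard's lbp_v1/v2/v3 method: the function names, the tolerance of 0.01, and the constant `propose` function are all assumptions made for the example, and w_intrinsic is not modeled.

```python
from typing import Callable, Tuple


def damped_step(old: float, proposed: float, damping: float = 0.5) -> float:
    """Move only part of the way from the old belief toward the proposed one."""
    return damping * old + (1.0 - damping) * proposed


def iterate_until_stable(
    belief: float,
    propose: Callable[[float], float],
    damping: float = 0.5,
    tol: float = 0.01,      # assumed stopping threshold, not from the source
    max_iter: int = 50,
) -> Tuple[float, int, float]:
    """Repeat damped updates until the change (residual) drops below tol."""
    residual = float("inf")
    for iteration in range(1, max_iter + 1):
        updated = damped_step(belief, propose(belief), damping)
        residual = abs(updated - belief)
        belief = updated
        if residual < tol:
            return belief, iteration, residual
    return belief, max_iter, residual


# Toy run: neighbors keep proposing 0.50 while the node starts at its 0.60 prior.
belief, iterations, residual = iterate_until_stable(0.60, lambda b: 0.50)
print(belief, iterations, residual)
```

In this toy run the belief closes half of the remaining gap on each pass, which is the practical effect of a damping factor of 0.5: no single propagation pass can move the node all the way to what its neighbors imply.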

Network propagation neighbors

Top edges sorted by latest LBP cross-impact

Top incoming (parents)

Edges that influence THIS node's belief

Kind | Node | Their prob | P(c|s=T) | P(c|s=F) | Δ implied
killer | TK03: AI Regulatory Moratorium (EU/US Capability Freeze) | 10.0% | 0.050 | 0.600 | +0.188
killer | TK02: AI Compute Supply Shock (TSMC/Taiwan Disruption) | 12.0% | 0.050 | 0.600 | +0.177
prereq | SEM_014: Nvidia's Arizona-based TSMC factory successfully fabricated … (Jensen Huang) | 86.1% | 0.600 | 0.050 | +0.162
killer | TK01: AGI Capability Plateau (2026-27 Training Stall) | 15.0% | 0.050 | 0.600 | +0.161
prereq | SEM_011: Nvidia became the world's first $5 trillion company (late 20… (Jensen Huang) | 85.5% | 0.600 | 0.050 | +0.160
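One plausible reading of the Δ implied column for incoming edges is the law of total probability: the parent's probability and the two conditionals give an implied marginal for this node, and Δ is that marginal minus the current probability. This is an inference from the displayed numbers, not a documented formula, and `implied_delta` is a hypothetical name. It reproduces the three killer rows to the displayed precision, while the two prereq rows come out a few thousandths higher (by up to about 0.005), so the dashboard may use different inputs or a different rule there.

```python
def implied_delta(
    parent_prob: float,
    p_c_given_true: float,
    p_c_given_false: float,
    current_prob: float,
) -> float:
    """Implied marginal of this node from one parent, minus its current probability."""
    implied = p_c_given_true * parent_prob + p_c_given_false * (1.0 - parent_prob)
    return implied - current_prob


CURRENT = 0.3569828460671014  # posterior_prob from the latest evidence row

# (parent probability, P(c|s=T), P(c|s=F)) copied from the killer rows above.
killers = {
    "TK03": (0.10, 0.050, 0.600),
    "TK02": (0.12, 0.050, 0.600),
    "TK01": (0.15, 0.050, 0.600),
}

for node, (p, given_true, given_false) in killers.items():
    print(node, f"{implied_delta(p, given_true, given_false, CURRENT):+.3f}")
```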

Top outgoing (children)

Predictions THIS node influences

KindNodeTheir probP(c|s=T)P(c|s=F)Δ implied
prereq | 232_055: We're exiting the industrial age permanently as recursive se… (Peter Diamandis) | 18.0% | 0.700 | 0.050 | +0.140
prereq | 247_023: AI will be able to do everything a white collar worker does … (Dave Blundin) | 40.8% | 0.720 | 0.050 | -0.079
prereq | 244_019: Peter's son won't need a driver's license in 2 years (Peter Diamandis) | 48.4% | 0.920 | 0.050 | -0.073
prereq | 242_031: Most large companies' business models will be disrupted in 2… (Peter Diamandis) | 23.5% | 0.650 | 0.050 | +0.064
prereq | 230_020: Peter's 14-year-old son Milan will never get a driver's lice… (Peter Diamandis) | 34.7% | 0.650 | 0.050 | -0.047

Ticker exposure

37 ticker(s) linked

Beneficiaries (24)

MU, WULF, IREN, EQIX, ALAB, APLD, ASMIY, ASML, PLAB, NVDA, NBIS, CRWV, AAPL, AMT, AMZN, DELL, GOOGL, IRM, LNVGY, META, MSFT, ORCL, SFTBY, STX

Adverse (6)

ACN, GEN, CHGG, IBM, WNS, LRN

Prerequisites (10)

Predictions that must hit first
Type | Pred | Title | Domain | Lag
prereq | SEM_011 | Nvidia became the world's first $5 trillion company (late 2025), operating a near-monopoly on advanced AI chips. | Capital Markets |
prereq | SEM_027 | Nvidia Data Center revenue +66% YoY, contributing ~90% of $57B fiscal Q3 revenue; >$4.5T market cap entirely underpinned by AI silicon. | Capital Markets |
prereq | SEM_014 | Nvidia's Arizona-based TSMC factory successfully fabricated cutting-edge semiconductors on US soil for first time in decades (October 2025). | Manufacturing |
prereq | SEM_012 | Nvidia quadrupled chip production output while only doubling human headcount — achieved by deploying AI coding tools (Cursor, Claude Code) across engineering. | AI/Manufacturing |
prereq | SEM_015 | Nvidia agreed to remit 15% of China chip-sale revenue directly to US government in exchange for reversing specific AI chip export bans. | Policy/Semis |
killer | TK09 | Energy Grid Cap (Data Center Power Wall) | |
killer | TK05 | Rate Regime Persistence (10y > 5% through 2028) | |
killer | TK01 | AGI Capability Plateau (2026-27 Training Stall) | |
killer | TK02 | AI Compute Supply Shock (TSMC/Taiwan Disruption) | |
killer | TK03 | AI Regulatory Moratorium (EU/US Capability Freeze) | |

Dependents (5)

Predictions enabled by this
Type | Pred | Title | Domain | Lag
prereq | 244_019 | Peter's son won't need a driver's license in 2 years | Auto/Transport |
prereq | 247_023 | AI will be able to do everything a white collar worker does imminently | AI |
prereq | 232_055 | We're exiting the industrial age permanently as recursive self-improvement unfolds. | AI |
prereq | 242_031 | Most large companies' business models will be disrupted in 2-5 years | Markets/Stocks |
prereq | 230_020 | Peter's 14-year-old son Milan will never get a driver's license. | Auto/Transport |

Linked documents (9)

Auto-generated by cosine similarity from Polymarket / Manifold / EDGAR / GDELT
Sim | Source | Title | Market prob | Polarity | Reviewed | Published
0.638 | manifold | Will any ARML 2026 problem in the individual round have exactly 67 or 41 solves, if one has 121 solves resolves 67%. | 13% | mentions | pending | 2026-05-25
0.630 | manifold | Will I solve an Erdos problem? | 6% | mentions | pending | 2026-04-27
0.612 | manifold | Can Manifold solve this problem? | 54% | mentions | pending | 2026-07-03
0.600 | manifold | Will I get a hard conversation with my Math coach at beginning of next schoolyear? | 19% | mentions | pending | 2026-06-10
0.586 | manifold | How many "Will Tiger complete a problem set" markets will it take to result in a yes resolution? | | mentions | pending | 2026-07-02
0.577 | manifold | What will my mathcounts state score be? | | mentions | pending | 2026-04-26
0.568 | polymarket | Sabres vs. Bruins | 53% | mentions | pending | 2026-04-29
0.560 | manifold | Will I present my APUSH project before the end of the school year? | 73% | mentions | pending | 2026-05-09
0.550 | manifold | What will be the sum of my AP Scores this.July? | | mentions | pending | 2026-06-27

Raw metadata

From Thesis_Timeline_v1.0_FINAL workbook
{
  "nia": false,
  "qty": "all 10",
  "url": "https://www.youtube.com/watch?v=HklyjXKYFng",
  "mode": "PREDICTION",
  "role": "Host",
  "context": "here it'll happen instantaneously. If it can solve six out of 10, it can solve all within the next couple months. It'll happen in massive parallel. There's no limit to the to the number of parallel agents up to the to the number of GPUs that are available.",
  "to_year": 2026,
  "verbatim": "If it can solve six out of 10, it can solve all within the next couple months. It'll happen in massive parallel.",
  "conv_cues": "it can; it'll happen",
  "direction": "HAPPEN",
  "from_year": 2026,
  "timeframe": "next couple months",
  "conv_level": "HIGH",
  "milestones": [
    {
      "kind": "llm_pre_event",
      "label": "Frontier models score >=99% on AIME 2025/2026 competition math",
      "notes": "HIT for COMPETITION math; prediction targets RESEARCH math which is harder.",
      "source": "https://benchlm.ai/math",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -9,
      "source_id": null,
      "confidence": 0.99,
      "source_url": "https://benchlm.ai/math",
      "expected_date": "2026-04-15",
      "observed_date": "2026-04-15",
      "hit_emitted_at": "2026-06-08T13:04:02.341521+00:00",
      "research_origin": "deep_research",
      "measurement_criterion": "Top frontier models (GPT-5.4, Claude Opus 4.6, Gemini 3 Flash) all score >=95% on AIME 2025 / 2026"
    },
    {
      "kind": "llm_pre_event",
      "label": "FrontierMath Tier 1-3 solve rate >40% by GPT-5.2/Claude Opus 4.6",
      "notes": "Mid-progress: prediction targets ALL 10 of a specific problem set; FrontierMath is broader. Partial validation only.",
      "source": "https://epoch.ai/frontiermath",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -8,
      "source_id": null,
      "confidence": 0.9,
      "source_url": "https://epoch.ai/frontiermath",
      "expected_date": "2026-04-15",
      "observed_date": "2026-04-15",
      "hit_emitted_at": "2026-06-08T13:04:02.341521+00:00",
      "research_origin": "deep_research",
      "measurement_criterion": "Public benchmark confirms top frontier models solve >=40% of FrontierMath Tier 1-3 problems"
    },
    {
      "kind": "llm_pre_event",
      "label": "Aletheia (Gemini Deep Think) achieves publishable PhD-level result in arithmetic geometry",
      "source": "https://spectrum.ieee.org/ai-math-benchmarks",
      "status": "hit",
      "weight": 0.4,
      "ordinal": -7,
      "source_id": null,
      "confidence": 0.85,
      "source_url": "https://spectrum.ieee.org/ai-math-benchmarks",
      "expected_date": "2026-04-15",
      "observed_date": "2026-04-15",
      "hit_emitted_at": "2026-06-08T13:04:02.341521+00:00",
      "research_origin": "deep_research",
      "measurement_criterion": "Google DeepMind publicly announces Aletheia produces publishable research-grade result in mathematics"
    },
    {
      "kind": "prereq",
      "label": "Nvidia became the world's first $5 trillion company (late 2025), operating a near-monopoly on advanced AI chips.",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -6,
      "source_id": "SEM_011",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29",
      "hit_emitted_at": "2026-06-08T13:04:02.341521+00:00"
    },
    {
      "kind": "prereq",
      "label": "Nvidia Data Center revenue +66% YoY, contributing ~90% of $57B fiscal Q3 revenue; >$4.5T market cap entirely underpinned by AI silicon.",
      "status": "hit",
      "weight": 0.5,
      "ordinal": -5,
      "source_id": "SEM_027",
      "expected_date": "2026-04-29",
      "observed_date": "2026-04-29",
      "hit_emitted_at": "2026-06-08T13:04:02.341521+00:00"
    },
    {
      "kind": "prereq",
      "label": "Nvidia's Arizona-based TSMC factory successfully fabricated cutting-edge semiconductors on US soil for first time in decades (October 2025).",
      "status": "hit",
      "weight": 0.5,
      "
... (truncated)