Episodes

  • (Voiceover) OpenAI's o3: The grand finale of AI in 2024
    2024/12/20

    Original post:

    https://www.interconnects.ai/p/openais-o3-the-2024-finale-of-ai

    Chapters

    00:00 Introduction

    02:51 o3 overview

    05:57 Solving the Abstraction and Reasoning Corpus (ARC)

    10:41 o3’s architecture, cost, and training (hint: still no tree search)

    16:36 2024: RL returns

    Figures

    Fig 1, Frontier Math results

    Fig 2, Coding results

    Fig 3, ARC AGI results

    Fig 4, ARC AGI result details

    Fig 5, ARC AGI example 1

    Fig 6, ARC AGI example in text

    Fig 7, ARC AGI example “easy”



    Get full access to Interconnects at www.interconnects.ai/subscribe
    18 min
  • (Voiceover) The AI agent spectrum
    2024/12/18

    Original post: https://www.interconnects.ai/p/the-ai-agent-spectrum

    Chapters

    00:00 Introduction

    03:24 Agent cartography

    08:02 Questions for the near future

    Figures

    Fig 1. multiple feedbacks diagram



    11 min
  • (Voiceover) OpenAI's Reinforcement Finetuning and RL for the masses
    2024/12/11

    Original post:

    https://www.interconnects.ai/p/openais-reinforcement-finetuning

    Chapters

    00:00 Introduction

    04:19 The impact of reinforcement finetuning’s existence

    07:29 Hypotheses on reinforcement finetuning’s implementation

    Figures

    Fig. 1, Yann’s Cake

    Fig. 2, Grader config

    Fig. 3, RLVR learning curves



    13 min
  • Interviewing Finbarr Timbers on the "We are So Back" Era of Reinforcement Learning
    2024/12/05
    Finbarr Timbers is an AI researcher who writes Artificial Fintelligence, one of the technical AI blogs I’ve been recommending for a long time, and has a variety of experiences at top AI labs including DeepMind and Midjourney. The goal of this interview was to do a few things:

    * Revisit what reinforcement learning (RL) actually is, its origins, and its motivations.
    * Contextualize the major breakthroughs of deep RL in the last decade, from DQN for Atari to AlphaZero to ChatGPT. How could we have seen the resurgence coming? (See the timeline below for the major events we cover.)
    * Modern uses for RL, o1, RLHF, and the future of finetuning all ML models.
    * Address some of the critiques like “RL doesn’t work yet.”

    It was a fun one. Listen on Apple Podcasts, Spotify, YouTube, and wherever you get your podcasts. For other Interconnects interviews, go here.

    Timeline of RL and what was happening at the time

    In the last decade of deep RL, there have been a few phases.

    * Era 1: Deep RL fundamentals, when modern algorithms were designed and proven.
    * Era 2: Major projects, including AlphaZero, OpenAI Five, and all the projects that put RL on the map.
    * Era 3: Slowdown, when DeepMind and OpenAI no longer had the major RL projects and cultural relevance declined.
    * Era 4: RLHF & widening success, RL’s new life post ChatGPT.

    Covering these are the following events. This is incomplete, but enough to inspire a conversation.

    Early era: TD Gammon, REINFORCE, etc.
    2013: Deep Q Learning (Atari)
    2014: Google acquires DeepMind
    2016: AlphaGo defeats Lee Sedol
    2017: PPO paper, AlphaZero (no human data)
    2018: OpenAI Five, GPT-2
    2019: AlphaStar, robotic sim2real with RL early papers (see blog post)
    2020: MuZero
    2021: Decision Transformer
    2022: ChatGPT, sim2real continues.
    2023: Scaling laws for RL (blog post), doubt of RL
    2024: o1, post-training, RL’s bloom

    Chapters

    * [00:00:00] Introduction
    * [00:02:14] Reinforcement Learning Fundamentals
    * [00:09:03] The Bitter Lesson
    * [00:12:07] Reward Modeling and Its Challenges in RL
    * [00:16:03] Historical Milestones in Deep RL
    * [00:21:18] OpenAI Five and Challenges in Complex RL Environments
    * [00:25:24] Recent-ish Developments in RL: MuZero, Decision Transformer, and RLHF
    * [00:30:29] OpenAI's o1 and Exploration in Language Models
    * [00:40:00] Tülu 3 and Challenges in RL Training for Language Models
    * [00:46:48] Comparing Different AI Assistants
    * [00:49:44] Management in AI Research
    * [00:55:30] Building Effective AI Teams
    * [01:01:55] The Need for Personal Branding

    We mention

    * o1 (OpenAI model)
    * Rich Sutton
    * University of Alberta
    * London School of Economics
    * IBM’s Deep Blue
    * Alberta Machine Intelligence Institute (AMII)
    * John Schulman
    * Claude (Anthropic's AI assistant)
    * Logan Kilpatrick
    * Bard (Google's AI assistant)
    * DeepSeek R1 Lite
    * Scale AI
    * OLMo (AI2's language model)
    * Golden Gate Claude
    1 hr 9 min
  • (Voiceover) OpenAI's o1 using "search" was a PSYOP
    2024/12/04

    Original post: https://www.interconnects.ai/p/openais-o1-using-search-was-a-psyop

    Figures

    Figure 0: OpenAI’s seminal test-time compute plot

    Figure 1: Setup for bucketed evals

    Figure 2: Evals with correctness labels

    Figure 3: Grouped evals

    Figure 4: Hypothetical inference scaling law



    12 min
  • (Voiceover) OLMo 2 and building effective teams for training language models
    2024/11/26

    Full post:

    https://www.interconnects.ai/p/olmo-2-and-building-language-model-training

    OLMo 2 demo: https://playground.allenai.org/

    OLMo 2 artifacts: https://huggingface.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc

    Chapters

    00:00 Building AI Teams

    06:35 OLMo 2

    Figures

    Fig 1, pretrain plot: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/olmo2/pretrain.webp

    Fig 2, pretrain table: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/olmo2/pretrain-table.webp

    Fig 3, post-train table: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/olmo2/postrain-table.webp



    10 min
  • (Voiceover) Tülu 3: The next era in open post-training
    2024/11/21

    Original post: https://www.interconnects.ai/p/tulu-3

    Chapters

    00:00 History

    05:44 Technical details sneak peek

    Figures

    Fig 1, results: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/tulu3-img/results.webp

    Fig 2, overview: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/tulu3-img/overview.webp

    Fig 3, preferences: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/tulu3-img/preferences.webp

    Fig 4, RLVR: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/tulu3-img/rlvr.webp



    8 min
  • (Voiceover) Scaling realities
    2024/11/14

    Original post: https://www.interconnects.ai/p/scaling-realities



    4 min