
Gradient Descent - Podcast about AI and Data

By: Wisecube AI

About this content

"Gradient Descent" is a podcast that delves into the depths of artificial intelligence and data science. Hosted by Vishnu Vettrivel (Founder of Wisecube AI) and Alex Thomas (Principal Data Scientist), the show explores the latest trends, innovations, and practical applications in AI and data science. Join us to learn more about how these technologies are shaping our future.
Episodes
  • LLM Fine-Tuning: RLHF vs DPO and beyond
    2025/05/13

    In this episode of Gradient Descent, we explore two competing approaches to fine-tuning LLMs: Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO). Dive into the mechanics of RLHF, its computational challenges, and how DPO simplifies the process by eliminating the need for a separate reward model. We also discuss supervised fine-tuning, emerging methods like Identity Preference Optimization (IPO) and Kahneman-Tversky Optimization (KTO), and their real-world applications in models like Llama 3 and Mistral. Learn practical LLM optimization strategies, including task modularization to boost performance without extensive fine-tuning.
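
    As a companion to the discussion, here is a minimal sketch of the DPO objective in PyTorch. This is our illustration, not code from the episode: the function and argument names are our own, and beta=0.1 is just a common default. It shows the key point above: preferences are optimized directly from log-probability ratios against a frozen reference model, with no separately trained reward model.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        """Direct Preference Optimization loss (Rafailov et al., 2023) [2].

        Each argument is the summed log-probability of a preferred ("chosen")
        or dispreferred ("rejected") completion, under either the policy being
        tuned or the frozen reference model. beta controls how far the policy
        may drift from the reference.
        """
        # Implicit rewards: scaled log-ratios of policy vs. reference.
        chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
        # Maximize the margin between preferred and rejected completions.
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Dummy log-probs for a batch of two preference pairs:
    print(dpo_loss(torch.tensor([-4.0, -3.5]), torch.tensor([-5.0, -4.8]),
                   torch.tensor([-4.2, -3.9]), torch.tensor([-4.9, -4.7])))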


    Timestamps:

    Intro - 00:00

    Overview of LLM Fine-Tuning - 00:48

    Deep Dive into RLHF - 02:46

    Supervised Fine-Tuning vs. RLHF - 10:38

    DPO and Other RLHF Alternatives - 14:43

    Real-World Applications in Frontier Models - 22:23

    Practical Tips for LLM Optimization - 25:18

    Closing Thoughts - 36:05


    References:

    [1] Training language models to follow instructions with human feedback https://arxiv.org/abs/2203.02155

    [2] Direct Preference Optimization: Your Language Model is Secretly a Reward Model https://arxiv.org/abs/2305.18290

    [3] Hugging Face Blog on DPO: Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO) https://huggingface.co/blog/ariG23498/rlhf-to-dpo

    [4] Comparative Analysis: RLHF and DPO Compared https://crowdworks.blog/en/rlhf-and-dpo-compared/

    [5] YouTube Explanation: How to fine-tune LLMs directly without reinforcement learning https://www.youtube.com/watch?v=k2pD3k1485A


    Listen on:

    • Apple Podcasts:

    https://podcasts.apple.com/us/podcast/gradient-descent-podcast-about-ai-and-data/id1801323847

    • Spotify:

    https://open.spotify.com/show/1nG58pwg2Dv6oAhCTzab55

    • Amazon Music:

    https://music.amazon.com/podcasts/79f6ed45-ef49-4919-bebc-e746e0afe94c/gradient-descent---podcast-about-ai-and-data


    Our solutions:

    - https://askpythia.ai/ - LLM Hallucination Detection Tool

    - https://www.wisecube.ai - Wisecube AI platform for large-scale biomedical knowledge analysis


    Follow us:

    - Pythia Website: https://askpythia.ai/

    - Wisecube Website: https://www.wisecube.ai

    - LinkedIn: https://www.linkedin.com/company/wisecube/

    - Facebook: https://www.facebook.com/wisecubeai

    - Twitter: https://x.com/wisecubeai

    - Reddit: https://www.reddit.com/r/pythia/

    - GitHub: https://github.com/wisecubeai


    #FineTuning #LLM #DeepLearning #RLHF #DPO #AI #MachineLearning #AIDevelopment

    38 min
  • The Future of Prompt Engineering: Prompts to Programs
    2025/04/29

    Explore the evolution of prompt engineering in this episode of Gradient Descent. Manual prompt tuning — slow, brittle, and hard to scale — is giving way to DSPy, a framework that turns LLM prompting into a structured, programmable, and optimizable process.

    Learn how DSPy’s modular approach — with Signatures, Modules, and Optimizers — enables LLMs to tackle complex tasks like multi-hop reasoning and math problem solving, achieving accuracy comparable to much larger models. We also dive into real-world examples, optimization strategies, and why the future of prompting looks a lot more like programming.
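
    To make the "prompts to programs" shift concrete, here is a small DSPy-style sketch. It is our illustration rather than code from the episode; the model name is a placeholder, and the field syntax follows recent DSPy releases, so details may differ in your version.

    import dspy

    # Placeholder model/endpoint; substitute whatever LM you use.
    dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

    # Signature: a declarative spec of inputs and outputs,
    # replacing a hand-tuned prompt string.
    class AnswerQuestion(dspy.Signature):
        """Answer the question in one short sentence."""
        question: str = dspy.InputField()
        answer: str = dspy.OutputField()

    # Module: a reusable prompting strategy (here, chain of thought)
    # built from the signature.
    qa = dspy.ChainOfThought(AnswerQuestion)
    print(qa(question="What does gradient descent minimize?").answer)

    From there, a DSPy Optimizer (e.g., BootstrapFewShot) can tune the module's prompts and demonstrations against a metric, which is what makes the pipeline optimizable rather than hand-tweaked.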


    Listen to our podcast on these platforms:

    • YouTube: https://youtube.com/@WisecubeAI/podcasts

    • Apple Podcasts: https://apple.co/4kPMxZf

    • Spotify: https://open.spotify.com/show/1nG58pwg2Dv6oAhCTzab55

    • Amazon Music: https://bit.ly/4izpdO2


    Mentioned Materials:

    • DSPy Paper - https://arxiv.org/abs/2310.03714

    • DSPy official site - https://dspy.ai/

    • DSPy GitHub - https://github.com/stanfordnlp/dspy

    • LLM abstractions guide - https://www.twosigma.com/articles/a-guide-to-large-language-model-abstractions/


    Our solutions:

    - https://askpythia.ai/ - LLM Hallucination Detection Tool

    - https://www.wisecube.ai - Wisecube AI platform for large-scale biomedical knowledge analysis


    Follow us:

    - Pythia Website: https://askpythia.ai/

    - Wisecube Website: https://www.wisecube.ai

    - LinkedIn: https://www.linkedin.com/company/wisecube/

    - Facebook: https://www.facebook.com/wisecubeai

    - Twitter: https://x.com/wisecubeai

    - Reddit: https://www.reddit.com/r/pythia/

    - GitHub: https://github.com/wisecubeai


    #AI #PromptEngineering #DSPy #MachineLearning #LLM #ArtificialIntelligence #AIdevelopment

    36 min
  • Agentic AI – Hype or the Next Step in AI Evolution?
    2025/04/12

    Let’s dive into Agentic AI, guided by the "Cognitive Architectures for Language Agents" (CoALA) paper. What defines an agentic system? How does it plan, leverage memory, and execute tasks? We explore semantic, episodic, and procedural memory, discuss decision-making loops, and examine how agents integrate with external APIs (think LangGraph). Learn how AI tackles complex automation — from code generation to playing Minecraft — and why designing robust action spaces is key to scaling systems. We also touch on challenges like memory updates and the ethics of agentic AI. Get actionable insight…
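
    To pin down the CoALA vocabulary, here is a toy decision loop in plain Python. It is purely illustrative (our sketch, not from the paper or the episode): the three memories are simple data structures, and the action space has just two hypothetical actions.

    # Toy CoALA-style agent: names and structure are illustrative only.
    semantic_memory = {"facts": ["Paris is the capital of France"]}  # world knowledge
    episodic_memory = []                                             # past experiences
    procedural_memory = {"lookup": lambda q: semantic_memory["facts"]}  # skills/tools

    def agent_step(observation: str) -> str:
        """One pass of the decision loop: observe, remember, decide, act."""
        episodic_memory.append(observation)  # store the experience for later recall
        # Decide: pick an action from the (tiny) action space.
        if "?" in observation:  # questions trigger the retrieval skill
            facts = procedural_memory["lookup"](observation)
            return "Recalled: " + "; ".join(facts)
        return "Acknowledged: " + observation

    print(agent_step("What is the capital of France?"))

    A framework like LangGraph plays the role of this hand-rolled loop at scale, wiring memory stores and tool calls into an explicit graph.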

    🔗 Links to the CoALA paper, LangGraph, and more in the description.

    🔔 Subscribe to stay updated with Gradient Descent!


    Listen on:

    • YouTube: https://youtube.com/@WisecubeAI/podcasts

    • Apple Podcasts: https://apple.co/4kPMxZf

    • Spotify: https://open.spotify.com/show/1nG58pwg2Dv6oAhCTzab55

    • Amazon Music: https://bit.ly/4izpdO2


    Mentioned Materials:

    • Cognitive Architectures for Language Agents (CoALA) - https://arxiv.org/abs/2309.02427

    • Memory for agents - https://blog.langchain.dev/memory-for-agents/

    • LangChain - https://python.langchain.com/docs/introduction/

    • LangGraph - https://langchain-ai.github.io/langgraph/


    Our solutions:

    - https://askpythia.ai/ - LLM Hallucination Detection Tool

    - https://www.wisecube.ai - Wisecube AI platform for large-scale analysis of biomedical publications, clinical trials, and protein and chemical databases


    Follow us:

    - Pythia Website: https://askpythia.ai/

    - Wisecube Website: https://www.wisecube.ai

    - LinkedIn: https://www.linkedin.com/company/wisecube/

    - Facebook: https://www.facebook.com/wisecubeai

    - X: https://x.com/wisecubeai

    - Reddit: https://www.reddit.com/r/pythia/

    - GitHub: https://github.com/wisecubeai


    #AgenticAI #FutureOfAI #AIInnovation #ArtificialIntelligence #MachineLearning #DeepLearning #LLM

    41 min
