• Kyle Kranen: End Points, Optimizing LLMs, GNNs, Foundation Models - AI Portfolio Podcast #011

  • 2024/10/19
  • Length: 1 hr 30 min
  • Podcast

  • Summary

  • Get 1000 free inference requests for LLMs on build.nvidia.com
    Kyle Kranen is an engineering leader at NVIDIA working at the forefront of deep learning, real-world applications, and production. In this episode, Kyle shares his expertise on optimizing large language models (LLMs) for deployment and explores the complexities of scaling and parallelism.

    📲 Kyle Kranen Socials:
    LinkedIn: https://www.linkedin.com/in/kyle-kranen/
    Twitter: https://x.com/kranenkyle

    📲 Mark Moyou, PhD Socials:
    LinkedIn: https://www.linkedin.com/in/markmoyou/
    Twitter: https://twitter.com/MarkMoyou

    📗 Chapters
    [00:00] Intro
    [01:26] Optimizing LLMs for deployment
    [10:23] Economy of Scale (Batch Size)
    [13:18] Data Parallelism
    [14:30] Kernels on GPUs
    [18:48] Hardest part of optimizing
    [22:26] Choosing hardware for LLM
    [31:33] Storage and Networking - Analyzing Performance
    [32:33] Minimum size of model where tensor parallel gives you advantage
    [35:20] Director Level folks thinking about deploying LLM
    [37:29] Kyle is working on AI foundation models
    [40:38] Deploying Models with endpoints
    [42:43] Fine-Tuning, Deploying LoRAs
    [45:02] SteerLM
    [48:09] KV Cache
    [51:43] Advice for people deploying reasonable and large-scale LLMs
    [58:08] Graph Neural Networks
    [01:00:04] GNNs
    [01:04:22] Using GPUs to do GNNs
    [01:08:25] Starting your GNN journey
    [01:12:51] Career Optimization Function
    [01:14:46] Solving Hard Problems
    [01:16:20] Maintaining Technical Skills
    [01:20:53] Deep learning expert
    [01:26:00] Rapid Round
