Kyle Kranen: End Points, Optimizing LLMs, GNNs, Foundation Models - AI Portfolio Podcast


Business Mark Moyou, PhD Mark Moyou

Content provided by Mark Moyou, PhD and Mark Moyou. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Mark Moyou, PhD and Mark Moyou or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://sv.player.fm/legal.


Published 3 months ago · Duration 1:30:10


Kyle Kranen is an engineering leader at NVIDIA working at the forefront of deep learning, real-world applications, and production. In this episode, Kyle shares his expertise on optimizing large language models (LLMs) for deployment and explores the complexities of scaling and parallelism.
📲 Kyle Kranen Socials:
LinkedIn: https://www.linkedin.com/in/kyle-kranen/
Twitter: https://x.com/kranenkyle
📲 Mark Moyou, PhD Socials:
LinkedIn: https://www.linkedin.com/in/markmoyou/
Twitter: https://twitter.com/MarkMoyou
📗 Chapters
[00:00] Intro
[01:26] Optimizing LLM for deployment
[10:23] Economy of Scale (Batch Size)
[13:18] Data Parallelism
[14:30] Kernel
[18:48] Hardest part of optimizing
[22:26] Choosing hardware for LLM
[31:33] Storage and Networking - Analyzing Performance
[32:33] Minimum model size where tensor parallelism gives you an advantage
[35:20] Director Level folks thinking about deploying LLM
[37:29] Kyle is working on AI foundation models
[40:38] Deploying Models with endpoints
[42:43] Fine-Tuning, Deploying LoRAs
[45:02] Stare LM
[48:09] KV Cache
[51:43] Advice for people deploying reasonable and large-scale LLMs
[58:08] Graph Neural Network
[01:00:04] GNNs
[01:04:22] Using GPUs to do GNNs
[01:08:25] Starting GNN journey
[01:12:51] Career Optimization Function
[01:14:46] Solving Hard Problems
[01:16:20] Maintaining Technical Skills
[01:20:53] Deep learning expert
[01:26:00] Rapid Round


14 episodes

