Artwork

Innehåll tillhandahållet av Mark Moyou, PhD and Mark Moyou. Allt poddinnehåll inklusive avsnitt, grafik och podcastbeskrivningar laddas upp och tillhandahålls direkt av Mark Moyou, PhD and Mark Moyou eller deras podcastplattformspartner. Om du tror att någon använder ditt upphovsrättsskyddade verk utan din tillåtelse kan du följa processen som beskrivs här https://sv.player.fm/legal.
Player FM - Podcast-app
Gå offline med appen Player FM !

Kyle Kranen: End Points, Optimizing LLMs, JNNs, Foundation Models - AI Portfolio Podcast

1:30:10
 
Dela
 

Manage episode 437345386 series 3596668
Innehåll tillhandahållet av Mark Moyou, PhD and Mark Moyou. Allt poddinnehåll inklusive avsnitt, grafik och podcastbeskrivningar laddas upp och tillhandahålls direkt av Mark Moyou, PhD and Mark Moyou eller deras podcastplattformspartner. Om du tror att någon använder ditt upphovsrättsskyddade verk utan din tillåtelse kan du följa processen som beskrivs här https://sv.player.fm/legal.

Kyle Kranen, an engineering leader at NVIDIA, who is at the forefront of deep learning, real-world applications, and production. Kyle shares his expertise on optimizing large language models (LLMs) for deployment, exploring the complexities of scaling and parallelism.
📲 Kyle Kranen Socials:
LinkedIn: https://www.linkedin.com/in/kyle-kranen/
Twitter: https://x.com/kranenkyle
📲 Mark Moyou, PhD Socials:
LinkedIn: https://www.linkedin.com/in/markmoyou/
Twitter: https://twitter.com/MarkMoyou
📗 Chapters
[00:00] Intro
[01:26] Optimizing LLM for deployment
[10:23] Economy of Scale (Batch Size)
[13:18] Data Parallelism
[14:30] Kernel
[18:48] Hardest part of optimizing
[22:26] Choosing hardware for LLM
[31:33] Storage and Networking - Analyzing Performance
[32:33] Minimum size of model where tensor parallel gives you advantage
[35:20] Director Level folks thinking about deploying LLM
[37:29] Kyle is working on AI foundation models
[40:38] Deploying Models with endpoints
[42:43] Fine Tuning, Deploying Loras
[45:02] Stare LM
[48:09] KV Cache
[51:43] Advice for people for deploying reasonable and large scale LLMs
[58:08] Graph Neural Network
[01:00:04] JNNs
[01:04:22] Using GPUs to do JNNs
[01:08:25] Starting JNN journey
[01:12:51] Career Optimization Function
[01:14:46] Solving Hard Problems
[01:16:20] Maintaining Technical Skills
[01:20:53] Deep learning expert
[01:26:00] Rapid Round

  continue reading

14 episoder

Artwork
iconDela
 
Manage episode 437345386 series 3596668
Innehåll tillhandahållet av Mark Moyou, PhD and Mark Moyou. Allt poddinnehåll inklusive avsnitt, grafik och podcastbeskrivningar laddas upp och tillhandahålls direkt av Mark Moyou, PhD and Mark Moyou eller deras podcastplattformspartner. Om du tror att någon använder ditt upphovsrättsskyddade verk utan din tillåtelse kan du följa processen som beskrivs här https://sv.player.fm/legal.

Kyle Kranen, an engineering leader at NVIDIA, who is at the forefront of deep learning, real-world applications, and production. Kyle shares his expertise on optimizing large language models (LLMs) for deployment, exploring the complexities of scaling and parallelism.
📲 Kyle Kranen Socials:
LinkedIn: https://www.linkedin.com/in/kyle-kranen/
Twitter: https://x.com/kranenkyle
📲 Mark Moyou, PhD Socials:
LinkedIn: https://www.linkedin.com/in/markmoyou/
Twitter: https://twitter.com/MarkMoyou
📗 Chapters
[00:00] Intro
[01:26] Optimizing LLM for deployment
[10:23] Economy of Scale (Batch Size)
[13:18] Data Parallelism
[14:30] Kernel
[18:48] Hardest part of optimizing
[22:26] Choosing hardware for LLM
[31:33] Storage and Networking - Analyzing Performance
[32:33] Minimum size of model where tensor parallel gives you advantage
[35:20] Director Level folks thinking about deploying LLM
[37:29] Kyle is working on AI foundation models
[40:38] Deploying Models with endpoints
[42:43] Fine Tuning, Deploying Loras
[45:02] Stare LM
[48:09] KV Cache
[51:43] Advice for people for deploying reasonable and large scale LLMs
[58:08] Graph Neural Network
[01:00:04] JNNs
[01:04:22] Using GPUs to do JNNs
[01:08:25] Starting JNN journey
[01:12:51] Career Optimization Function
[01:14:46] Solving Hard Problems
[01:16:20] Maintaining Technical Skills
[01:20:53] Deep learning expert
[01:26:00] Rapid Round

  continue reading

14 episoder

Wszystkie odcinki

×
 
Loading …

Välkommen till Player FM

Player FM scannar webben för högkvalitativa podcasts för dig att njuta av nu direkt. Den är den bästa podcast-appen och den fungerar med Android, Iphone och webben. Bli medlem för att synka prenumerationer mellan enheter.

 

Snabbguide