Data, data, everywhere - enough for AGI?

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

Content provided by Turpentine, Erik Torenberg, and Nathan Labenz. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Turpentine, Erik Torenberg, and Nathan Labenz or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://sv.player.fm/legal.

1M ago 1:01:40


In this episode, Nathan and Nick dive deep into the data requirements for achieving Artificial General Intelligence (AGI). They explore the current paradigms, the role of data in approximating intelligence, and the scaling trends for GPT models. The discussion covers a range of datasets, from email and Twitter to YouTube and genomic data, and analyzes the feasibility of reaching the target of 100 trillion high-quality tokens. The bull case suggests an abundance of data, while the bear case highlights the limits on high-quality data, prompting an exploration of what makes data good for AI and the potential for AI to generate its own data.

Chapters

(00:00) Introduction

(05:04) Scaling Hypothesis of Intelligence

(07:32) Is There Enough High Quality Data?

(10:19) Algorithms Impacting Data Requirements

(17:42) Sponsor: Omneky

(18:04) Estimating High Quality Token Requirements

(24:07) Astronomy and YouTube Data Scale

(29:42) Genomics Data

(37:58) Sponsors: Brave / Plumb / Squad

(41:16) Code Datasets and Synthetic Data

(45:48) The Bear Case: Quality and Usability of Data

(50:54) Investment Trends and Compute Efficiency

(54:19) Training Run

(57:21) Synthetic Data Generation and Self-Play

135 episodes

#Tech #Society #Entrepreneur #Business #Turpentine #Erik Torenberg #Nathan Labenz #AI #Artificial Intelligence #Founders