Artwork

Innehåll tillhandahållet av Michaël Trazzi. Allt poddinnehåll inklusive avsnitt, grafik och podcastbeskrivningar laddas upp och tillhandahålls direkt av Michaël Trazzi eller deras podcastplattformspartner. Om du tror att någon använder ditt upphovsrättsskyddade verk utan din tillåtelse kan du följa processen som beskrivs här https://sv.player.fm/legal.
Player FM - Podcast-app
Gå offline med appen Player FM !

[Crosspost] Adam Gleave on Vulnerabilities in GPT-4 APIs (+ extra Nathan Labenz interview)

2:16:08
 
Dela
 

Manage episode 418754446 series 2966339
Innehåll tillhandahållet av Michaël Trazzi. Allt poddinnehåll inklusive avsnitt, grafik och podcastbeskrivningar laddas upp och tillhandahålls direkt av Michaël Trazzi eller deras podcastplattformspartner. Om du tror att någon använder ditt upphovsrättsskyddade verk utan din tillåtelse kan du följa processen som beskrivs här https://sv.player.fm/legal.

This is a special crosspost episode where Adam Gleave is interviewed by Nathan Labenz from the Cognitive Revolution. At the end I also have a discussion with Nathan Labenz about his takes on AI.

Adam Gleave is the founder of Far AI, and with Nathan they discuss finding vulnerabilities in GPT-4's fine-tuning and Assistant PIs, Far AI's work exposing exploitable flaws in "superhuman" Go AIs through innovative adversarial strategies, accidental jailbreaking by naive developers during fine-tuning, and more.

OUTLINE

(00:00) Intro

(02:57) NATHAN INTERVIEWS ADAM GLEAVE: FAR.AI's Mission

(05:33) Unveiling the Vulnerabilities in GPT-4's Fine Tuning and Assistance APIs

(11:48) Divergence Between The Growth Of System Capability And The Improvement Of Control

(13:15) Finding Substantial Vulnerabilities

(14:55) Exploiting GPT 4 APIs: Accidentally jailbreaking a model

(18:51) On Fine Tuned Attacks and Targeted Misinformation

(24:32) Malicious Code Generation

(27:12) Discovering Private Emails

(29:46) Harmful Assistants

(33:56) Hijacking the Assistant Based on the Knowledge Base

(36:41) The Ethical Dilemma of AI Vulnerability Disclosure

(46:34) Exploring AI's Ethical Boundaries and Industry Standards

(47:47) The Dangers of AI in Unregulated Applications

(49:30) AI Safety Across Different Domains

(51:09) Strategies for Enhancing AI Safety and Responsibility

(52:58) Taxonomy of Affordances and Minimal Best Practices for Application Developers

(57:21) Open Source in AI Safety and Ethics

(1:02:20) Vulnerabilities of Superhuman Go playing AIs

(1:23:28) Variation on AlphaZero Style Self-Play

(1:31:37) The Future of AI: Scaling Laws and Adversarial Robustness

(1:37:21) MICHAEL TRAZZI INTERVIEWS NATHAN LABENZ

(1:37:33) Nathan’s background

(01:39:44) Where does Nathan fall in the Eliezer to Kurzweil spectrum

(01:47:52) AI in biology could spiral out of control

(01:56:20) Bioweapons

(02:01:10) Adoption Accelerationist, Hyperscaling Pauser

(02:06:26) Current Harms vs. Future Harms, risk tolerance

(02:11:58) Jailbreaks, Nathan’s experiments with Claude

The cognitive revolution: https://www.cognitiverevolution.ai/

Exploiting Novel GPT-4 APIs: https://far.ai/publication/pelrine2023novelapis/

Advesarial Policies Beat Superhuman Go AIs: https://far.ai/publication/wang2022adversarial/

  continue reading

54 episoder

Artwork
iconDela
 
Manage episode 418754446 series 2966339
Innehåll tillhandahållet av Michaël Trazzi. Allt poddinnehåll inklusive avsnitt, grafik och podcastbeskrivningar laddas upp och tillhandahålls direkt av Michaël Trazzi eller deras podcastplattformspartner. Om du tror att någon använder ditt upphovsrättsskyddade verk utan din tillåtelse kan du följa processen som beskrivs här https://sv.player.fm/legal.

This is a special crosspost episode where Adam Gleave is interviewed by Nathan Labenz from the Cognitive Revolution. At the end I also have a discussion with Nathan Labenz about his takes on AI.

Adam Gleave is the founder of Far AI, and with Nathan they discuss finding vulnerabilities in GPT-4's fine-tuning and Assistant PIs, Far AI's work exposing exploitable flaws in "superhuman" Go AIs through innovative adversarial strategies, accidental jailbreaking by naive developers during fine-tuning, and more.

OUTLINE

(00:00) Intro

(02:57) NATHAN INTERVIEWS ADAM GLEAVE: FAR.AI's Mission

(05:33) Unveiling the Vulnerabilities in GPT-4's Fine Tuning and Assistance APIs

(11:48) Divergence Between The Growth Of System Capability And The Improvement Of Control

(13:15) Finding Substantial Vulnerabilities

(14:55) Exploiting GPT 4 APIs: Accidentally jailbreaking a model

(18:51) On Fine Tuned Attacks and Targeted Misinformation

(24:32) Malicious Code Generation

(27:12) Discovering Private Emails

(29:46) Harmful Assistants

(33:56) Hijacking the Assistant Based on the Knowledge Base

(36:41) The Ethical Dilemma of AI Vulnerability Disclosure

(46:34) Exploring AI's Ethical Boundaries and Industry Standards

(47:47) The Dangers of AI in Unregulated Applications

(49:30) AI Safety Across Different Domains

(51:09) Strategies for Enhancing AI Safety and Responsibility

(52:58) Taxonomy of Affordances and Minimal Best Practices for Application Developers

(57:21) Open Source in AI Safety and Ethics

(1:02:20) Vulnerabilities of Superhuman Go playing AIs

(1:23:28) Variation on AlphaZero Style Self-Play

(1:31:37) The Future of AI: Scaling Laws and Adversarial Robustness

(1:37:21) MICHAEL TRAZZI INTERVIEWS NATHAN LABENZ

(1:37:33) Nathan’s background

(01:39:44) Where does Nathan fall in the Eliezer to Kurzweil spectrum

(01:47:52) AI in biology could spiral out of control

(01:56:20) Bioweapons

(02:01:10) Adoption Accelerationist, Hyperscaling Pauser

(02:06:26) Current Harms vs. Future Harms, risk tolerance

(02:11:58) Jailbreaks, Nathan’s experiments with Claude

The cognitive revolution: https://www.cognitiverevolution.ai/

Exploiting Novel GPT-4 APIs: https://far.ai/publication/pelrine2023novelapis/

Advesarial Policies Beat Superhuman Go AIs: https://far.ai/publication/wang2022adversarial/

  continue reading

54 episoder

Alla avsnitt

×
 
Loading …

Välkommen till Player FM

Player FM scannar webben för högkvalitativa podcasts för dig att njuta av nu direkt. Den är den bästa podcast-appen och den fungerar med Android, Iphone och webben. Bli medlem för att synka prenumerationer mellan enheter.

 

Snabbguide