AI Peer Review

I pick interesting tech papers that catch my attention and use Google's NotebookLM to turn them into engaging, easy-to-listen-to audio, which makes it enjoyable to keep up with cutting-edge research. Consider it peer-reviewed research, analyzed by AI peers. All the audio is produced entirely by Google's NotebookLM.


There Will Be a Scientific Theory of Deep Learning

00:23:29

May 6, 2026

There Will Be a Scientific Theory of Deep Learning [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate

00:20:06

May 3, 2026

BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

00:20:53

Apr 29, 2026

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

00:20:28

Apr 23, 2026

Direct Preference Optimization: Your Language Model is Secretly a Reward Model [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

00:20:38

Apr 7, 2026

TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

Refusal in Language Models Is Mediated by a Single Direction

00:19:27

Apr 3, 2026

Refusal in Language Models Is Mediated by a Single Direction [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

Mamba-3: Improved Sequence Modeling using State Space Principles

00:19:19

Apr 1, 2026

Mamba-3: Improved Sequence Modeling using State Space Principles [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning

00:22:19

Mar 24, 2026

V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

PaperBanana: Automating Academic Illustration for AI Scientists

00:16:18

Mar 23, 2026

PaperBanana: Automating Academic Illustration for AI Scientists [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

mHC: Manifold-Constrained Hyper-Connections

00:18:30

Feb 24, 2026

mHC: Manifold-Constrained Hyper-Connections [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

00:13:36

Feb 1, 2026

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings

00:12:41

Jan 12, 2026

Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

VL-JEPA: Joint Embedding Predictive Architecture for Vision-language

00:15:37

Dec 17, 2025

VL-JEPA: Joint Embedding Predictive Architecture for Vision-language [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

Self-Improving VLM Judges Without Human Annotations

00:11:41

Dec 10, 2025

Self-Improving VLM Judges Without Human Annotations [paper link]. All the audio was generated using AI by Google’s NotebookLM.

PAN: A World Model for General, Interactable, and Long-Horizon World Simulation

00:09:58

Nov 26, 2025

PAN: A World Model for General, Interactable, and Long-Horizon World Simulation [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

00:13:59

Nov 23, 2025

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

00:12:21

Nov 19, 2025

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents

00:16:30

Nov 16, 2025

GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

Scaling Agent Learning via Experience Synthesis

00:16:39

Nov 12, 2025

Scaling Agent Learning via Experience Synthesis [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper

00:17:37

Nov 9, 2025

Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

Reasoning with Sampling: Your Base Model is Smarter Than You Think

00:13:41

Nov 2, 2025

Reasoning with Sampling: Your Base Model is Smarter Than You Think [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play

00:16:40

Oct 29, 2025

Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play [paper link]. All the audio was generated using AI by Google’s NotebookLM.

DeepSeek-OCR: Contexts Optical Compression

00:14:03

Oct 26, 2025

DeepSeek-OCR: Contexts Optical Compression [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation

00:17:21

Oct 22, 2025

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation [paper link]. All the audio was generated using AI by Google’s NotebookLM.

Generalized Parallel Scaling with Interdependent Generations

00:13:54

Oct 19, 2025

Generalized Parallel Scaling with Interdependent Generations [paper link]. All the audio was generated using AI by Google’s NotebookLM.

A Temporal Convolutional Network-Based Approach and a Benchmark Dataset for Colonoscopy Video Temporal Segmentation

00:16:41

Oct 16, 2025

A Temporal Convolutional Network-Based Approach and a Benchmark Dataset for Colonoscopy Video Temporal Segmentation [https://arxiv.org/pdf/2502.03430]. All the audio was generated using AI by Google’s NotebookLM.

Learning to Reason for Hallucination Span Detection

00:15:02

Oct 15, 2025

Learning to Reason for Hallucination Span Detection [paper link]. All the audio was generated using AI by Google’s NotebookLM.

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

00:16:52

Oct 12, 2025

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

RLP: Reinforcement as a Pretraining Objective

00:14:54

Oct 12, 2025

RLP: Reinforcement as a Pretraining Objective [paper link]. All the audio was generated using AI by Google’s NotebookLM.

Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation

00:15:27

Oct 8, 2025

Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation [paper link]. All the audio was generated using AI by Google’s NotebookLM.

CWM: An Open-Weights LLM for Research on Code Generation with World Models

00:15:11

Sep 29, 2025

CWM: An Open-Weights LLM for Research on Code Generation with World Models [paper link]. All the audio was generated using AI by Google’s NotebookLM.

Transition Models: Rethinking the Generative Learning Objective

00:18:50

Sep 18, 2025

Transition Models: Rethinking the Generative Learning Objective [paper link]. All the audio was generated using AI by Google’s NotebookLM.

Planning with Reasoning using Vision Language World Model

00:17:14

Sep 14, 2025

Planning with Reasoning using Vision Language World Model [paper link]. All the audio was generated using AI by Google’s NotebookLM.

FastVLM: Efficient Vision Encoding for Vision Language Models

00:15:16

Sep 10, 2025

FastVLM: Efficient Vision Encoding for Vision Language Models [paper link]. All the audio was generated using AI by Google’s NotebookLM.

Hierarchical Reasoning Model

00:20:43

Sep 7, 2025

Hierarchical Reasoning Model [https://arxiv.org/pdf/2506.21734]. All the audio was generated using AI by Google’s NotebookLM.

RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models

00:16:29

Sep 3, 2025

RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models [https://arxiv.org/pdf/2412.07679]. All the audio was generated using AI by Google’s NotebookLM.

Self-Rewarding Vision-Language Model via Reasoning Decomposition

00:23:17

Aug 31, 2025

Self-Rewarding Vision-Language Model via Reasoning Decomposition [https://arxiv.org/pdf/2508.19652]. All the audio was generated using AI by Google’s NotebookLM.

TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs

00:12:22

Aug 27, 2025

TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs [https://arxiv.org/pdf/2507.21584v2]. All the audio was generated using AI by Google’s NotebookLM.

Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents

00:15:46

Aug 24, 2025

Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents [https://arxiv.org/pdf/2508.05954]. All the audio was generated using AI by Google’s NotebookLM.

ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability

00:18:08

Aug 20, 2025

ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability [⁠https://arxiv.org/pdf/2508.07050⁠]. All the audio was generated using AI by Google’s NotebookLM.

DINOv3: Self-supervised learning for vision at unprecedented scale

00:21:30

Aug 17, 2025

DINOv3: Self-supervised learning for vision at unprecedented scale [https://arxiv.org/pdf/2508.10104]. All the audio was generated using AI by Google’s NotebookLM.

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

00:16:41

Jul 6, 2025

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling [https://www.arxiv.org/pdf/2501.00574]. All the audio was generated using AI by Google’s NotebookLM.

Kwai Keye-VL Technical Report

00:17:26

Jul 3, 2025

Kwai Keye-VL Technical Report [https://arxiv.org/pdf/2507.01949]. All the audio was generated using AI by Google’s NotebookLM.

Agent-as-a-Judge: Evaluate Agents with Agents

00:10:54

Jul 1, 2025

A deep dive into Agent-as-a-Judge: Evaluate Agents with Agents [https://arxiv.org/pdf/2410.10934]. All the audio was generated using AI by Google's NotebookLM.

Sequential Diagnosis with Language Models

00:15:28

Jun 30, 2025

Sequential Diagnosis with Language Models [https://arxiv.org/pdf/2506.22405]. All the audio was generated using AI by Google’s NotebookLM.

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

00:16:52

Jun 22, 2025

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models [⁠paper link⁠]. All the audio was generated using AI by Google’s NotebookLM.

Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers

00:19:51

Jun 21, 2025

Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers [https://arxiv.org/abs/2506.14702]. All the audio was generated using AI by Google’s NotebookLM.

TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations

00:16:03

Jun 16, 2025

A deep dive into TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations [https://arxiv.org/pdf/2505.18125]. All the audio was generated using AI by Google's NotebookLM.

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

00:43:35

Jun 13, 2025

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning [https://ai.meta.com/research/publications/v-jepa-2-self-supervised-video-models-enable-understanding-prediction-and-planning/]. All the…

Rate-In: Information-Driven Adaptive Dropout Rates for Improved Inference-Time Uncertainty Estimation

00:15:44

Jun 12, 2025

Rate-In: Information-Driven Adaptive Dropout Rates for Improved Inference-Time Uncertainty Estimation [https://arxiv.org/pdf/2412.07169]. All the audio was generated using AI by Google's NotebookLM.

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning

00:39:43

Jun 11, 2025

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning [https://arxiv.org/pdf/2506.07044]. All the audio was generated using AI by Google’s NotebookLM.

MiniCPM4: Efficient LLMs for End Devices

00:18:42

Jun 11, 2025

MiniCPM4: Efficient LLMs for End Devices [https://arxiv.org/pdf/2506.07900]. All the audio was generated using AI by Google’s NotebookLM.

Continuous Thought Machines: Neural Dynamics for AI

00:49:27

Jun 6, 2025

Continuous Thought Machines: Neural Dynamics for AI [https://arxiv.org/pdf/2505.05522]. All the audio was generated using AI by Google’s NotebookLM.

Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies

00:34:37

Jun 5, 2025

Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies [https://arxiv.org/pdf/2502.02533]. All the audio was generated using AI by Google’s NotebookLM.

How much do language models memorize?

00:50:37

Jun 4, 2025

How much do language models memorize? [https://arxiv.org/pdf/2505.24832]. All the audio was generated using AI by Google’s NotebookLM.

SAM 2: Segment Anything in Images and Videos

00:14:49

May 31, 2025

SAM 2: Segment Anything in Images and Videos [https://arxiv.org/pdf/2408.00714]. All the audio was generated using AI by Google’s NotebookLM.

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

00:13:46

May 30, 2025

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection [https://arxiv.org/pdf/2303.05499]. All the audio was generated using AI by Google’s NotebookLM.

YOLO-World: Real-Time Open-Vocabulary Detection

00:13:29

May 29, 2025

YOLO-World: Real-Time Open-Vocabulary Detection [https://arxiv.org/pdf/2401.17270]. All the audio was generated using AI by Google’s NotebookLM.

QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design

00:12:55

May 28, 2025

QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design [https://arxiv.org/pdf/2505.16175]. All the audio was generated using AI by Google’s NotebookLM.

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

00:12:22

May 27, 2025

A deep dive into Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges [https://arxiv.org/pdf/2406.12624]. All the audio was generated using AI by Google's NotebookLM.

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

00:14:52

May 27, 2025

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning [https://arxiv.org/abs/2505.02835]. All the audio was generated using AI by Google’s NotebookLM.

DiffTAD: Temporal Action Detection with Denoising Diffusion

00:15:09

May 26, 2025

DiffTAD: Temporal Action Detection with Denoising Diffusion [⁠https://arxiv.org/abs/2303.14863⁠]. All the audio was generated using AI by Google’s NotebookLM.

Sigmoid Loss for Language-Image Pre-training

00:12:23

May 25, 2025

A deep dive into Sigmoid Loss for Language-Image Pre-training [https://arxiv.org/pdf/2303.15343]. All the audio was generated using AI by Google's NotebookLM.

UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning

00:13:55

May 22, 2025

A deep dive into UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning [https://arxiv.org/pdf/2505.14231]. All the audio was generated using AI by Google's NotebookLM.

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

00:13:24

May 22, 2025

A deep dive into VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning. This academic paper introduces VideoChat-R1, a video-language model enhanced through Reinforcement Fine-Tuning (RFT) using …

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

00:14:08

May 22, 2025

A deep dive into VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding [https://arxiv.org/pdf/2501.13106]. All the audio was generated using AI by Google's NotebookLM.

Parallel Scaling Law for Language Models

00:18:11

May 21, 2025

A deep dive into Parallel Scaling Law for Language Models [https://arxiv.org/pdf/2505.10475]. All the audio was generated using AI by Google's NotebookLM.

Perception Encoder: The best visual embeddings are not at the output of the network

00:25:37

May 21, 2025

A deep dive into Perception Encoder: The best visual embeddings are not at the output of the network [https://ai.meta.com/research/publications/perception-encoder-the-best-visual-embeddings-are-not-at-the-output-of-the-n…

Qwen2.5-VL Technical Report

00:17:13

May 20, 2025

A deep dive into Qwen2.5-VL Technical Report [https://arxiv.org/pdf/2502.13923]. All the audio was generated using AI by Google's NotebookLM.