AI Peer Review
Interesting tech papers that catch my attention, transformed with Google's NotebookLM into engaging, easy-to-listen-to audio that makes it enjoyable to keep up with cutting-edge research. Consider it peer-reviewed research, analyzed by AI peers. All the audio is produced entirely by Google's NotebookLM.
There Will Be a Scientific Theory of Deep Learning
00:23:29 May 6, 2026
There Will Be a Scientific Theory of Deep Learning [paper link]. All the audio was generated using AI by Google’s NotebookLM.
BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate
00:20:06 May 3, 2026
BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
00:20:53 Apr 29, 2026
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
00:20:28 Apr 23, 2026
Direct Preference Optimization: Your Language Model is Secretly a Reward Model [paper link]. All the audio was generated using AI by Google’s NotebookLM.
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
00:20:38 Apr 7, 2026
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Refusal in Language Models Is Mediated by a Single Direction
00:19:27 Apr 3, 2026
Refusal in Language Models Is Mediated by a Single Direction [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Mamba-3: Improved Sequence Modeling using State Space Principles
00:19:19 Apr 1, 2026
Mamba-3: Improved Sequence Modeling using State Space Principles [paper link]. All the audio was generated using AI by Google’s NotebookLM.
V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning
00:22:19 Mar 24, 2026
V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning [paper link]. All the audio was generated using AI by Google’s NotebookLM.
PaperBanana: Automating Academic Illustration for AI Scientists
00:16:18 Mar 23, 2026
PaperBanana: Automating Academic Illustration for AI Scientists [paper link]. All the audio was generated using AI by Google’s NotebookLM.
mHC: Manifold-Constrained Hyper-Connections
00:18:30 Feb 24, 2026
mHC: Manifold-Constrained Hyper-Connections [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision
00:13:36 Feb 1, 2026
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings
00:12:41 Jan 12, 2026
Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings [paper link]. All the audio was generated using AI by Google’s NotebookLM.
VL-JEPA: Joint Embedding Predictive Architecture for Vision-language
00:15:37 Dec 17, 2025
VL-JEPA: Joint Embedding Predictive Architecture for Vision-language [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Self-Improving VLM Judges Without Human Annotations
00:11:41 Dec 10, 2025
Self-Improving VLM Judges Without Human Annotations [paper link]. All the audio was generated using AI by Google’s NotebookLM.
PAN: A World Model for General, Interactable, and Long-Horizon World Simulation
00:09:58 Nov 26, 2025
PAN: A World Model for General, Interactable, and Long-Horizon World Simulation [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
00:13:59 Nov 23, 2025
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm [paper link]. All the audio was generated using AI by Google’s NotebookLM.
LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
00:12:21 Nov 19, 2025
LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics [paper link]. All the audio was generated using AI by Google’s NotebookLM.
GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents
00:16:30 Nov 16, 2025
GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Scaling Agent Learning via Experience Synthesis
00:16:39 Nov 12, 2025
Scaling Agent Learning via Experience Synthesis [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper
00:17:37 Nov 9, 2025
Jr. AI Scientist and Its Risk Report: Autonomous Scientific Exploration from a Baseline Paper [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Reasoning with Sampling: Your Base Model is Smarter Than You Think
00:13:41 Nov 2, 2025
Reasoning with Sampling: Your Base Model is Smarter Than You Think [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
00:16:40 Oct 29, 2025
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play [paper link]. All the audio was generated using AI by Google’s NotebookLM.
DeepSeek-OCR: Contexts Optical Compression
00:14:03 Oct 26, 2025
DeepSeek-OCR: Contexts Optical Compression [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
00:17:21 Oct 22, 2025
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Generalized Parallel Scaling with Interdependent Generations
00:13:54 Oct 19, 2025
Generalized Parallel Scaling with Interdependent Generations [paper link]. All the audio was generated using AI by Google’s NotebookLM.
A Temporal Convolutional Network-Based Approach and a Benchmark Dataset for Colonoscopy Video Temporal Segmentation
00:16:41 Oct 16, 2025
A Temporal Convolutional Network-Based Approach and a Benchmark Dataset for Colonoscopy Video Temporal Segmentation [https://arxiv.org/pdf/2502.03430]. All the audio was generated using AI by Google’s NotebookLM.
Learning to Reason for Hallucination Span Detection
00:15:02 Oct 15, 2025
Learning to Reason for Hallucination Span Detection [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
00:16:52 Oct 12, 2025
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models [paper link]. All the audio was generated using AI by Google’s NotebookLM.
RLP: Reinforcement as a Pretraining Objective
00:14:54 Oct 12, 2025
RLP: Reinforcement as a Pretraining Objective [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation
00:15:27 Oct 8, 2025
Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation [paper link]. All the audio was generated using AI by Google’s NotebookLM.
CWM: An Open-Weights LLM for Research on Code Generation with World Models
00:15:11 Sep 29, 2025
CWM: An Open-Weights LLM for Research on Code Generation with World Models [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Transition Models: Rethinking the Generative Learning Objective
00:18:50 Sep 18, 2025
Transition Models: Rethinking the Generative Learning Objective [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Planning with Reasoning using Vision Language World Model
00:17:14 Sep 14, 2025
Planning with Reasoning using Vision Language World Model [paper link]. All the audio was generated using AI by Google’s NotebookLM.
FastVLM: Efficient Vision Encoding for Vision Language Models
00:15:16 Sep 10, 2025
FastVLM: Efficient Vision Encoding for Vision Language Models [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Hierarchical Reasoning Model
00:20:43 Sep 7, 2025
Hierarchical Reasoning Model [https://arxiv.org/pdf/2506.21734]. All the audio was generated using AI by Google’s NotebookLM.
RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models
00:16:29 Sep 3, 2025
RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models [https://arxiv.org/pdf/2412.07679]. All the audio was generated using AI by Google’s NotebookLM.
Self-Rewarding Vision-Language Model via Reasoning Decomposition
00:23:17 Aug 31, 2025
Self-Rewarding Vision-Language Model via Reasoning Decomposition [https://arxiv.org/pdf/2508.19652]. All the audio was generated using AI by Google’s NotebookLM.
TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs
00:12:22 Aug 27, 2025
TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs [https://arxiv.org/pdf/2507.21584v2]. All the audio was generated using AI by Google’s NotebookLM.
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents
00:15:46 Aug 24, 2025
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents [https://arxiv.org/pdf/2508.05954]. All the audio was generated using AI by Google’s NotebookLM.
ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability
00:18:08 Aug 20, 2025
ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability [https://arxiv.org/pdf/2508.07050]. All the audio was generated using AI by Google’s NotebookLM.
DINOv3: Self-supervised learning for vision at unprecedented scale
00:21:30 Aug 17, 2025
DINOv3: Self-supervised learning for vision at unprecedented scale [https://arxiv.org/pdf/2508.10104]. All the audio was generated using AI by Google’s NotebookLM.
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
00:16:41 Jul 6, 2025
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling [https://www.arxiv.org/pdf/2501.00574]. All the audio was generated using AI by Google’s NotebookLM.
Kwai Keye-VL Technical Report
00:17:26 Jul 3, 2025
Kwai Keye-VL Technical Report [https://arxiv.org/pdf/2507.01949]. All the audio was generated using AI by Google’s NotebookLM.
Agent-as-a-Judge: Evaluate Agents with Agents
00:10:54 Jul 1, 2025
A deep dive into Agent-as-a-Judge: Evaluate Agents with Agents [https://arxiv.org/pdf/2410.10934]. All the audio was generated using AI by Google's NotebookLM.
Sequential Diagnosis with Language Models
00:15:28 Jun 30, 2025
Sequential Diagnosis with Language Models [https://arxiv.org/pdf/2506.22405]. All the audio was generated using AI by Google’s NotebookLM.
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
00:16:52 Jun 22, 2025
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models [paper link]. All the audio was generated using AI by Google’s NotebookLM.
Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers
00:19:51 Jun 21, 2025
Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers [https://arxiv.org/abs/2506.14702]. All the audio was generated using AI by Google’s NotebookLM.
TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations
00:16:03 Jun 16, 2025
A deep dive into TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations [https://arxiv.org/pdf/2505.18125]. All the audio was generated using AI by Google's NotebookLM.
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
00:43:35 Jun 13, 2025
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning [https://ai.meta.com/research/publications/v-jepa-2-self-supervised-video-models-enable-understanding-prediction-and-planning/]. All the…
Rate-In: Information-Driven Adaptive Dropout Rates for Improved Inference-Time Uncertainty Estimation
00:15:44 Jun 12, 2025
Rate-In: Information-Driven Adaptive Dropout Rates for Improved Inference-Time Uncertainty Estimation [https://arxiv.org/pdf/2412.07169]. All the audio was generated using AI by Google's NotebookLM.
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
00:39:43 Jun 11, 2025
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning [https://arxiv.org/pdf/2506.07044]. All the audio was generated using AI by Google’s NotebookLM.
MiniCPM4: Efficient LLMs for End Devices
00:18:42 Jun 11, 2025
MiniCPM4: Efficient LLMs for End Devices [https://arxiv.org/pdf/2506.07900]. All the audio was generated using AI by Google’s NotebookLM.
Continuous Thought Machines: Neural Dynamics for AI
00:49:27 Jun 6, 2025
Continuous Thought Machines: Neural Dynamics for AI [https://arxiv.org/pdf/2505.05522]. All the audio was generated using AI by Google’s NotebookLM.
Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies
00:34:37 Jun 5, 2025
Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies [https://arxiv.org/pdf/2502.02533]. All the audio was generated using AI by Google’s NotebookLM.
How much do language models memorize?
00:50:37 Jun 4, 2025
How much do language models memorize? [https://arxiv.org/pdf/2505.24832]. All the audio was generated using AI by Google’s NotebookLM.
SAM 2: Segment Anything in Images and Videos
00:14:49 May 31, 2025
SAM 2: Segment Anything in Images and Videos [https://arxiv.org/pdf/2408.00714]. All the audio was generated using AI by Google’s NotebookLM.
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
00:13:46 May 30, 2025
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection [https://arxiv.org/pdf/2303.05499]. All the audio was generated using AI by Google’s NotebookLM.
YOLO-World: Real-Time Open-Vocabulary Detection
00:13:29 May 29, 2025
YOLO-World: Real-Time Open-Vocabulary Detection [https://arxiv.org/pdf/2401.17270]. All the audio was generated using AI by Google’s NotebookLM.
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design
00:12:55 May 28, 2025
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design [https://arxiv.org/pdf/2505.16175]. All the audio was generated using AI by Google’s NotebookLM.
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
00:12:22 May 27, 2025
A deep dive into Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges [https://arxiv.org/pdf/2406.12624]. All the audio was generated using AI by Google's NotebookLM.
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
00:14:52 May 27, 2025
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning [https://arxiv.org/abs/2505.02835]. All the audio was generated using AI by Google’s NotebookLM.
DiffTAD: Temporal Action Detection with Denoising Diffusion
00:15:09 May 26, 2025
DiffTAD: Temporal Action Detection with Denoising Diffusion [https://arxiv.org/abs/2303.14863]. All the audio was generated using AI by Google’s NotebookLM.
Sigmoid Loss for Language-Image Pre-training
00:12:23 May 25, 2025
A deep dive into Sigmoid Loss for Language-Image Pre-training [https://arxiv.org/pdf/2303.15343]. All the audio was generated using AI by Google's NotebookLM.
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
00:13:55 May 22, 2025
A deep dive into UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning [https://arxiv.org/pdf/2505.14231]. All the audio was generated using AI by Google's NotebookLM.
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
00:13:24 May 22, 2025
A deep dive into VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning. This academic paper introduces VideoChat-R1, a video-language model enhanced through Reinforcement Fine-Tuning (RFT) using …
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
00:14:08 May 22, 2025
A deep dive into VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding [https://arxiv.org/pdf/2501.13106]. All the audio was generated using AI by Google's NotebookLM.
Parallel Scaling Law for Language Models
00:18:11 May 21, 2025
A deep dive into Parallel Scaling Law for Language Models [https://arxiv.org/pdf/2505.10475]. All the audio was generated using AI by Google's NotebookLM.
Perception Encoder: The best visual embeddings are not at the output of the network
00:25:37 May 21, 2025
A deep dive into Perception Encoder: The best visual embeddings are not at the output of the network [https://ai.meta.com/research/publications/perception-encoder-the-best-visual-embeddings-are-not-at-the-output-of-the-n…
Qwen2.5-VL Technical Report
00:17:13 May 20, 2025
A deep dive into Qwen2.5-VL Technical Report [https://arxiv.org/pdf/2502.13923]. All the audio was generated using AI by Google's NotebookLM.