News
A New Study from Harvard and Perplexity Finds AI Agents Perform 26 Minutes of Autonomous Work per Session vs 33 Seconds for Search
1+ hour, 18+ min ago (658+ words) A new paper measures what changes when users move from a search assistant to an autonomous agent. A new working paper from Perplexity and Harvard offers field evidence on what AI agents do to knowledge work. It draws on production…...
Xiaomi MiMo and TileRT Push a 1-Trillion-Parameter Model Past 1000 Tokens Per Second on Commodity GPUs
15+ hour, 24+ min ago (182+ words) The second layer is DFlash speculative decoding, covered in detail below. The third layer is TileRT, the system that executes everything on the GPU. Each technique alone is not enough. The 1000 TPS result needs all three aligned tightly. Xiaomi…...
Microsoft AI Introduces MAI-Transcribe-1.5: 2.4% WER on Artificial Analysis, Best-in-Class FLEURS Accuracy, and Up to 5x Faster Long-Audio Transcription
23+ hour, 17+ min ago (498+ words) Last week, Microsoft AI announced MAI-Transcribe-1.5. It is the second iteration of the company's in-house speech-to-text family. The model targets accuracy across 43 languages, accents, and noisy environments. The Microsoft team positions it for production transcription workloads. MAI-Transcribe-1.5 is an…...
Google's New Colab CLI Lets Developers and AI Agents Run Python on Remote Colab GPUs and TPUs From the Terminal
2+ day, 10+ hour ago (549+ words) This week, the Google AI team released the Colab CLI. The tool connects your local terminal to remote Colab runtimes. It lets developers and AI agents run code on cloud GPUs and TPUs. You stay in your terminal the entire time. The…...
NVIDIA Releases Nemotron 3.5 ASR: A 600M-Parameter Cache-Aware Streaming Model Transcribing 40 Language-Locales in Real Time
3+ day, 18+ min ago (490+ words) NVIDIA's Nemotron Speech team has released Nemotron 3.5 ASR. It is a 600M-parameter streaming Automatic Speech Recognition (ASR) model. A single checkpoint transcribes 40 language-locales in real time. Punctuation and capitalization are built in natively. The model ships as open weights on…...
Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory
3+ day, 13+ hour ago (679+ words) Google DeepMind released Quantization-Aware Training (QAT) checkpoints for the Gemma 4 family. The release targets local deployment on edge devices and consumer GPUs. It follows the Gemma 4 launch in April and a 12B model two days earlier. We compared the…...
NVIDIA AI Releases Dynamo Snapshot: A CRIU-Based Fast Startup System for AI Inference on Kubernetes
3+ day, 21+ hour ago (551+ words) In production inference deployments, demand fluctuates over time, requiring inference replicas to scale elastically. Cold-starting inference workloads on Kubernetes can take several minutes. During that time, GPUs are allocated but idle, generating no tokens and serving no requests. To address…...
15 Best Vibe Coding Tools in 2026 Compared: Pricing, Features, and Best Fit
4+ day, 6+ min ago (1283+ words) AI-first development is changing how software gets built. A new approach called "vibe coding" sits at the center of that shift. Developers describe what they want in plain language. An AI agent turns that description into working software. The term…...
NVIDIA AI Releases Nemotron 3 Ultra: An Open 550B Mixture-of-Experts Hybrid Mamba-Transformer for Long-Running Agents
4+ day, 10+ hour ago (1267+ words) NVIDIA has released Nemotron 3 Ultra, the largest model in its Nemotron 3 family. It targets a specific problem: long-running agents that plan, call tools, and reason across many turns. As agents run longer, token counts grow and inference cost climbs. Nemotron…...
Miso Labs Releases Miso TTS: An 8B Emotive Text-to-Speech Model with Open Weights
5+ day, 2+ min ago (563+ words) Miso Labs has released Miso TTS, an open-weights 8-billion-parameter text-to-speech model. It generates expressive speech from both text and audio context. The model uses residual vector quantization (RVQ) to widen its sonic range. This avoids scaling a single flat vocabulary…...