Google I/O 2026: Gemini Goes Agentic
Google's annual developer conference, held yesterday in Mountain View, was dominated by one theme: AI agents that don't just answer questions — they take action. The star of the keynote was Gemini Spark, which Google describes as a "24/7 AI agent" and personal digital partner capable of navigating your digital life on your behalf.
Alongside Spark, Google unveiled Gemini 3.5 Flash, now generally available, which surpasses the previous 3.1 Pro in coding, agentic tasks, and multimodal benchmarks while running 4x faster than comparable frontier models. Pricing comes in at $1.50/$9 per million input/output tokens with a 1M context window. A more powerful Gemini 3.5 Pro is currently in testing and expected next month.
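For a sense of what those rates mean per request, here is a rough back-of-the-envelope sketch in Python. It uses only the two list prices quoted above; the function name and the sample token counts are made up for illustration, and it ignores any caching discounts, batch pricing, or tiering that may apply in practice.

```python
# List prices quoted in the article, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 1.50
OUTPUT_USD_PER_MTOK = 9.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the quoted list prices."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical workload: 200,000 input tokens and 10,000 output tokens.
cost = request_cost_usd(200_000, 10_000)
print(f"${cost:.2f}")
```

At these rates output tokens cost six times as much as input tokens, so long generations can add up even when prompts are large.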
The company also introduced Gemini Omni, a new model series that accepts image, audio, video, and text input and can output video grounded in real-world knowledge — a clear push toward truly multimodal AI creation tools.
On the hardware side, Google announced the first Android XR audio glasses, arriving this fall in partnership with Samsung, Qualcomm, Gentle Monster, and Warby Parker. The glasses will put Gemini AI in your line of sight for real-time navigation, translation, and object recognition — and notably, they'll pair with iPhones too. Google also revealed Googlebook, a new premium hardware line built with Gemini at its core.
A new AI Ultra subscription tier at $100/month targets developers and power users, bundling access to the latest Gemini models including Spark.
Sources: Tom's Guide, 9to5Google, Android Central
Four Chinese Labs, 12 Days, One Message: The Open-Weights Price War Is Here
In a striking cluster of releases, four Chinese AI labs shipped open-weights coding models within a 12-day window in late April and early May: Z.ai's GLM-5.1, MiniMax M2.7, Moonshot's Kimi K2.6, and DeepSeek V4 (in Pro and Flash variants).
The models are not just cheap alternatives — they're genuinely competitive. Kimi K2.6 scored 87 on coding benchmarks with native 300-agent swarm support, while DeepSeek V4 Pro hit 89 (via its custom DeepClaude harness) and offers a 1M-token context window. All four run inference at less than one-third the cost of Western frontier models like Claude Opus.
With DeepSeek V4 Pro discounted by 75% through May 31, the inference price war is accelerating faster than many in the industry expected. For developers building agentic applications that depend on high-volume inference, the economics are shifting quickly.
Sources: Abhishek Gautam, Artificial Analysis, CoderSera
Claude Mythos Preview Passes Landmark Cybersecurity Test — And Stays Under Lock
Anthropic's Claude Mythos Preview, announced on April 7, became the first AI model to clear the UK AI Security Institute's "The Last Ones" (TLO) evaluation — a 32-step corporate network attack simulation that typically takes human experts 20 hours to complete. Mythos solved the full chain from reconnaissance to domain takeover in 3 out of 10 attempts, averaging 22 of 32 steps across all runs.
Even more striking: Mythos autonomously discovered and exploited CVE-2026-4747, a 17-year-old remote code execution vulnerability in FreeBSD's NFS implementation — a genuine zero-day finding by an AI system.
Anthropic chose not to release Mythos publicly, citing the cybersecurity implications. Instead, the company launched Project Glasswing, an industry consortium focused on finding and fixing vulnerabilities in foundational systems, and granted monitored access to over 40 organizations maintaining critical software.
The Mythos situation has directly accelerated policy action: it was cited as a catalyst for the US government's new pre-release AI testing agreements (see below).
Sources: Anthropic Red Team, UK AISI, Bishop Fox
US Government Secures Pre-Release AI Testing Agreements with Microsoft, Google, and xAI
The National Institute of Standards and Technology (NIST) announced that Microsoft, Google, and xAI have agreed to provide early access to unreleased AI models for government security evaluation — joining OpenAI and Anthropic, who signed similar agreements in 2024.
The agreements allow NIST's Center for AI Standards and Innovation, within the US Department of Commerce, to evaluate new models for national security and public safety risks before they reach the public. The center has already completed more than 40 AI model evaluations.
The move was widely seen as a direct response to the capabilities demonstrated by Claude Mythos Preview, which pushed concerns about AI's offensive cyber potential to a tipping point. While not a formal regulatory mandate, the voluntary agreements signal a growing consensus that frontier AI models need independent review before deployment.
Sources: CNN, Washington Post, CNBC
AWS Gives AI Agents Their Own Wallets with AgentCore Payments
Amazon Web Services launched Amazon Bedrock AgentCore Payments in preview, built in partnership with Coinbase and Stripe. The system enables AI agents to autonomously complete stablecoin-based micropayments while executing tasks — purchasing API access, data feeds, and paywalled content on behalf of users.
Transactions settle in roughly 200 milliseconds using USDC on Base, an Ethereum layer-2 network, and on Solana. Developers can choose between a Coinbase wallet or Stripe's Privy wallet, and end users fund their wallets via stablecoin or fiat debit card.
Crucially, security controls are baked in: agents can only transact with explicit user authorization, and spending limits are enforced per session. Early testers include Warner Bros. Discovery, with plans to expand to larger transactions like hotel bookings and travel reservations.
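To make the two controls described above concrete (explicit user authorization and a per-session spending cap), here is a minimal Python sketch. It is purely illustrative: `SessionBudget` and `try_pay` are hypothetical names, not part of the AgentCore Payments API, and a real system would enforce these checks server-side rather than inside the agent.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class SessionBudget:
    """Hypothetical per-session spending guard; not the AgentCore API."""

    limit: Decimal
    user_authorized: bool = False
    spent: Decimal = Decimal("0")

    def try_pay(self, amount: Decimal) -> bool:
        """Record the payment and return True only if it is allowed."""
        if not self.user_authorized:
            return False  # no explicit user authorization
        if amount <= 0 or self.spent + amount > self.limit:
            return False  # invalid amount, or it would exceed the session cap
        self.spent += amount
        return True


budget = SessionBudget(limit=Decimal("1.00"), user_authorized=True)
print(budget.try_pay(Decimal("0.60")))  # fits within the 1.00 cap
print(budget.try_pay(Decimal("0.60")))  # rejected: would bring the total to 1.20
print(budget.spent)
```

The sketch uses `Decimal` rather than floats so that many small micropayments do not accumulate rounding error against the cap.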
The launch represents a significant step toward what AWS calls the "agentic economy" — a world where AI agents operate as independent economic actors within user-defined boundaries.