The Engines Powering Autonomy: Gemini 3.5 Flash and Spark

Source: Google (Gemini Spark keynote presentation, Google I/O 2026)
Antigravity 2.0 runs many agents in parallel, and Google needed a model fast enough to keep up with them. It made Gemini 3.5 Flash the default orchestration engine. The model was not built for deep or philosophical conversations. It was engineered for raw sequential throughput and fast tool execution.
It generates output at 289 tokens per second, roughly four times faster than comparable frontier models. That speed matters because agent latency compounds: each read, evaluate, and generate cycle waits on the one before it, so slow generation is multiplied across dozens of loops.
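The compounding effect is easy to see with rough arithmetic. The sketch below is an illustration only: the 289 tokens-per-second figure is the one quoted above, while the cycle count, the tokens per cycle, and the exact "one quarter of the speed" baseline are assumed values chosen for the example. It also ignores network and tool-execution time.

```python
# Back-of-the-envelope estimate of generation time across a sequential agent loop.
# Only FLASH_TOKENS_PER_SEC comes from the article; every other number is an
# assumption picked for illustration.

FLASH_TOKENS_PER_SEC = 289.0                         # figure quoted in the article
BASELINE_TOKENS_PER_SEC = FLASH_TOKENS_PER_SEC / 4   # "roughly four times" treated as exact


def loop_generation_seconds(cycles: int, tokens_per_cycle: int, tokens_per_sec: float) -> float:
    """Total time spent generating output when each cycle waits for the previous one."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return cycles * tokens_per_cycle / tokens_per_sec


if __name__ == "__main__":
    cycles = 40              # hypothetical: "dozens" of read/evaluate/generate cycles
    tokens_per_cycle = 500   # hypothetical output size per cycle

    fast = loop_generation_seconds(cycles, tokens_per_cycle, FLASH_TOKENS_PER_SEC)
    slow = loop_generation_seconds(cycles, tokens_per_cycle, BASELINE_TOKENS_PER_SEC)
    print(f"289 tok/s:     {fast:7.1f} s")
    print(f"quarter speed: {slow:7.1f} s")
    print(f"difference:    {slow - fast:7.1f} s")
```

Because the cycles run one after another, the gap between the two totals grows linearly with the number of cycles, which is why per-token speed dominates in long agent loops.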
The model has a 1-million-token input window and an expanded 64k-token output capacity. It also performs well on real-world execution benchmarks, scoring 76.2% on Terminal-Bench 2.1.
The Python SDK Architecture
To build custom agents programmatically, use the Google Antigravity Python SDK (google-antigravity). It embeds the Go runtime binary directly within the PyPI wheel and operates via an asynchronous context manager. By default, it enforces a fail-closed, read-only security posture.
Write access must be enabled explicitly via CapabilitiesConfig(). PreToolCallDecideHook instances then evaluate that access at tool-call time, which guards against Time-of-Check to Time-of-Use (TOCTOU) vulnerabilities.

    import asyncio

    from google.antigravity import Agent, LocalAgentConfig, CapabilitiesConfig
    from google.antigravity.hooks import policy

    # Allow run_command, then deny every other tool.
    policies = [
        policy.allow("run_command"),
        policy.deny("*"),
    ]


    async def main():
        config = LocalAgentConfig(
            system_instructions="You are an expert architecture assistant.",
            capabilities=CapabilitiesConfig(),
            policies=policies,
        )
        # The agent is used through an asynchronous context manager.
        async with Agent(config) as agent:
            response = await agent.chat("Initialize deployment script.")
            # Print the reply as it streams in.
            async for token in response:
                print(token, end="")


    if __name__ == "__main__":
        asyncio.run(main())

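The policy list in the example above reads as an ordered allow/deny list that ends in a catch-all deny. How the SDK evaluates it internally is not described here, so the following is a standalone sketch of first-match-wins evaluation with a fail-closed default, using only the standard library. The `Rule`, `allow`, `deny`, and `decide` names are hypothetical and are not part of google-antigravity.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Sequence


@dataclass(frozen=True)
class Rule:
    """Hypothetical policy rule: a glob pattern over tool names plus a verdict."""
    pattern: str
    allowed: bool


def allow(pattern: str) -> Rule:
    return Rule(pattern, True)


def deny(pattern: str) -> Rule:
    return Rule(pattern, False)


def decide(tool_name: str, rules: Sequence[Rule]) -> bool:
    """Return the verdict of the first matching rule; deny if nothing matches."""
    for rule in rules:
        if fnmatchcase(tool_name, rule.pattern):
            return rule.allowed
    return False  # fail closed: a tool no rule mentions is never permitted


if __name__ == "__main__":
    rules = [allow("run_command"), deny("*")]
    for tool in ("run_command", "write_file"):
        print(tool, "->", "allow" if decide(tool, rules) else "deny")
```

Under this scheme the order of the list matters: placing the catch-all deny first would block every tool, including run_command.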
Gemini Spark: Breaking the Session Loop
While Antigravity handles local development, Gemini Spark takes over continuous execution. Spark agents run on dedicated Google Cloud virtual machines around the clock. They are completely detached from your local hardware state.
Spark uses Memory Banks to prevent context amnesia over long-horizon tasks. Instead of running slow vector searches across massive databases during live sessions, Spark continuously extracts data into rigid STRUCTURED_PROFILE schemas.
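The idea of extracting into a fixed schema, rather than searching at session time, can be sketched as follows. The field names are invented for illustration: the contents of a real STRUCTURED_PROFILE are not listed here, and this is not Spark's implementation.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class StructuredProfile:
    """Hypothetical fixed schema; all field names are assumptions."""
    preferred_language: Optional[str] = None
    timezone: Optional[str] = None
    recurring_tasks: List[str] = field(default_factory=list)


def merge_observation(profile: StructuredProfile, observation: dict) -> StructuredProfile:
    """Fold one extracted observation into the profile, rejecting unknown keys."""
    unknown = set(observation) - set(asdict(profile))
    if unknown:
        raise KeyError(f"not in schema: {sorted(unknown)}")
    for key, value in observation.items():
        if key == "recurring_tasks":
            # Append new tasks without duplicating existing ones.
            for task in value:
                if task not in profile.recurring_tasks:
                    profile.recurring_tasks.append(task)
        else:
            setattr(profile, key, value)
    return profile


if __name__ == "__main__":
    profile = StructuredProfile()
    merge_observation(profile, {"timezone": "Europe/Berlin"})
    merge_observation(profile, {"recurring_tasks": ["weekly report"]})
    print(asdict(profile))
```

With a rigid schema, reading a fact back during a live session is a plain attribute lookup, and anything that does not fit the schema is rejected at extraction time instead of surfacing later as a fuzzy search result.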
When Spark shifts from observation to action (like making an autonomous purchase), it relies on the Agent Payments Protocol (AP2). AP2 separates the generative model from raw financial data. It requires a dual cryptographic mandate: one mandate handles the cart intent, and the other handles the payment processor. This guarantees the agent never actually touches raw credit card numbers.
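The separation can be illustrated with a toy model. This is not the AP2 wire format or its cryptography: HMAC with hard-coded demo keys stands in for whatever signature scheme the protocol actually specifies, and every name below (`sign`, `verify`, `authorize_purchase`, the mandate fields) is hypothetical. The point is only that a purchase proceeds when two independently signed mandates both verify, and that the data the agent handles carries an opaque token instead of a card number.

```python
import hashlib
import hmac
import json

# Demo keys only. A real system would use asymmetric signatures and managed keys.
USER_KEY = b"demo-user-key"
PROCESSOR_KEY = b"demo-processor-key"


def sign(payload: dict, key: bytes) -> str:
    """Sign a canonical JSON encoding of the payload with HMAC-SHA256."""
    message = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(payload: dict, signature: str, key: bytes) -> bool:
    """Constant-time comparison of the expected and presented signatures."""
    return hmac.compare_digest(sign(payload, key), signature)


def authorize_purchase(cart: dict, cart_sig: str, payment: dict, payment_sig: str) -> bool:
    """Approve only if both mandates verify and refer to the same cart."""
    return (
        verify(cart, cart_sig, USER_KEY)
        and verify(payment, payment_sig, PROCESSOR_KEY)
        and payment.get("cart_id") == cart.get("cart_id")
    )


if __name__ == "__main__":
    cart = {"cart_id": "c-1", "items": ["usb-c cable"], "total_cents": 1299}
    # The payment mandate carries an opaque token, never a card number.
    payment = {"cart_id": "c-1", "payment_token": "tok_demo"}

    cart_sig = sign(cart, USER_KEY)
    payment_sig = sign(payment, PROCESSOR_KEY)
    print("valid mandates:", authorize_purchase(cart, cart_sig, payment, payment_sig))

    tampered = dict(cart, total_cents=1)
    print("tampered cart: ", authorize_purchase(tampered, cart_sig, payment, payment_sig))
```

Changing any field of the cart after it was signed invalidates the cart mandate, so the second check fails even though the payment mandate is untouched.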