A voice AI agent that books, pays, and splits costs in crypto, without anyone having to handle crypto!

"Give your car a wallet."
A voice-first AI agent for group road trips, built for ETHGlobal Cannes 2026. Friends pool USDC into a shared on-chain treasury on Arc testnet, then CESTA (Crypto-Enabled Settlement & Transaction Autonomous-agent) — a Claude-powered AI agent — manages the entire trip through voice: finding stops, booking hotels, paying tolls, and splitting costs, all without anyone touching their phone.
The problem: Group road trips mean juggling 5–7 apps — Maps, Yelp, Booking.com, Venmo, a group chat — while the driver can't safely use any of them. Nobody tracks who paid what, and splitting at the end is painful.
The solution: One voice interface. You talk, CESTA pays.
For merchants that don't accept crypto, CESTA uses an x402 TEE card bridge: it pays on-chain in USDC/EURC via x402, and a Trusted Execution Environment issues a one-time Stripe virtual card with the exact spending limit. The card credentials are encrypted directly to CESTA's key, so no operator ever sees them. The result is real-world bookings paid with crypto, verifiably. A browser agent then completes the booking automatically.
A Next.js frontend handles wallet connection (Reown AppKit + SIWE) and shows a real-time dashboard — treasury balance, spending by category, and group voting UI for large purchases. The GroupTreasury smart contract on Arc enforces per-transaction caps, category budgets, and 2-of-N group voting for high-value spends.
CESTA is built around a Claude Code agent session with six MCP tool servers, each a separate process (stdio transport) that the agent calls as tools. The agent's persona and rules live in agent/CLAUDE.md; the server wiring is in agent/.mcp.json. This architecture means each capability (payments, storage, compute, voice) is fully isolated and independently testable.
Arc powers all payments. GroupTreasury.sol holds pooled USDC with category budgets, per-transaction caps, and a group voting mechanism for large spends. The treasury MCP server uses viem to call the contract directly. Nanopayments are gas-free micro-transactions recorded on-chain — used for tolls ($2.60), parking, and per-query data API calls. The agent has an ERC-8004 identity on Arc's IdentityRegistry.
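The treasury rules above (category budgets, a per-transaction cap, and a vote for large spends) can be modelled off-chain in a few lines. This is only a sketch under stated assumptions: the type names, the `checkSpend` function, the budget figures, and the order of the checks are hypothetical and are not taken from GroupTreasury.sol, whose interface is not shown here. Amounts are integers in USDC base units (6 decimals).

```typescript
// Off-chain model of the spending policy. All names and numbers are
// illustrative assumptions, not the real GroupTreasury.sol interface.
type Category = "food" | "lodging" | "tolls" | "fuel";

interface Policy {
  perTxCap: number;                         // largest spend allowed without a vote
  categoryBudget: Record<Category, number>; // total budget per category
  votesRequired: number;                    // the "2" in 2-of-N
}

interface SpendRequest {
  category: Category;
  amount: number;
  approvals: string[]; // addresses of members who voted yes
}

type Decision = "approved" | "needs_vote" | "over_budget";

// Convert dollars to USDC base units (6 decimals).
const usdc = (dollars: number): number => Math.round(dollars * 1000000);

function checkSpend(
  policy: Policy,
  spent: Record<Category, number>,
  req: SpendRequest,
): Decision {
  // Budget check first: a vote cannot override an exhausted category budget.
  const remaining = policy.categoryBudget[req.category] - spent[req.category];
  if (req.amount > remaining) return "over_budget";
  // Spends above the cap need enough approvals.
  if (req.amount > policy.perTxCap && req.approvals.length < policy.votesRequired) {
    return "needs_vote";
  }
  return "approved";
}

const policy: Policy = {
  perTxCap: usdc(100),
  categoryBudget: { food: usdc(100), lodging: usdc(600), tolls: usdc(50), fuel: usdc(200) },
  votesRequired: 2,
};
const spent: Record<Category, number> = { food: usdc(50), lodging: 0, tolls: 0, fuel: 0 };

// A $2.60 toll is under the cap and within budget: "approved"
const tollDecision = checkSpend(policy, spent, { category: "tolls", amount: usdc(2.6), approvals: [] });
// A $240 hotel with one approval is above the cap: "needs_vote"
const hotelDecision = checkSpend(policy, spent, { category: "lodging", amount: usdc(240), approvals: ["0xA"] });
```

In the project itself these checks are enforced by the contract; an off-chain mirror like this would only be useful for previewing a decision before submitting a transaction.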
0G is used in two ways: 0G Storage (KV store) persists all trip data — preferences, booking confirmations, receipts, itinerary — accessible across sessions and independent of our orchestrator. 0G Compute provides TEE-verified inference: when the agent compares hotel options, it calls verified_evaluate() and gets back a cryptographically signed evaluation proving an audited model actually produced the recommendation, not fabricated output. The agent also has an ERC-7857 iNFT on 0G Galileo for identity and reputation.
The TEE bridge is the most technically novel part. Real merchants don't accept crypto, so we built a TEE server (Express + Stripe Issuing) that converts on-chain EURC into one-time Stripe virtual cards. The flow: agent requests a card → TEE returns HTTP 402 → x402-pay MCP signs payment on Arc → TEE verifies on-chain → issues a Stripe card with the exact spending limit → ECIES-encrypts credentials using the agent's public key → returns encrypted card + TEE-signed receipt. The operator running the TEE never sees card numbers. Integrity is auditable: GET /v1/attestation returns a code hash matching the public Docker image SHA. The tee-web-agent MCP (Python + browser-use) then decrypts the card and uses browser automation to complete checkout on any real-world merchant site.
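The request, 402, pay, retry sequence in the flow above can be illustrated with a small in-memory simulation. It is a sketch, not the bridge's code: the chain is replaced by a plain object, ECIES encryption, the TEE-signed receipt, and attestation are left out, and every name (`requestCard`, `payOnChain`, `obtainCard`, `BRIDGE_ADDRESS`) is a hypothetical stand-in. Amounts are assumed to be integer EURC cents.

```typescript
// In-memory simulation of the pay-then-retry handshake. Names are hypothetical.
interface PaymentRequired { status: 402; payTo: string; amount: number; asset: "EURC" }
interface CardIssued { status: 200; cardId: string; spendingLimit: number }
type BridgeResponse = PaymentRequired | CardIssued;

// Stand-in for the chain: transaction hash -> settled transfer.
interface Transfer { to: string; amount: number }
const ledger: Record<string, Transfer> = {};
let txCounter = 0;

function payOnChain(to: string, amount: number): string {
  txCounter += 1;
  const txHash = "0xtx" + txCounter;
  ledger[txHash] = { to, amount };
  return txHash;
}

const BRIDGE_ADDRESS = "0xBRIDGE";
const redeemed: Record<string, boolean> = {}; // payments already turned into a card

// Bridge side: demand payment, or verify it and issue a card.
function requestCard(amount: number, paymentTx?: string): BridgeResponse {
  const demand: PaymentRequired = { status: 402, payTo: BRIDGE_ADDRESS, amount, asset: "EURC" };
  if (paymentTx === undefined) return demand;

  const transfer = ledger[paymentTx];
  const valid =
    transfer !== undefined &&
    transfer.to === BRIDGE_ADDRESS &&
    transfer.amount === amount &&
    redeemed[paymentTx] !== true; // each payment buys exactly one card
  if (!valid) return demand;

  redeemed[paymentTx] = true;
  // The card limit equals the verified payment amount.
  return { status: 200, cardId: "card_" + paymentTx, spendingLimit: amount };
}

// Agent side: ask, pay what the 402 response demands, then retry with proof.
function obtainCard(amount: number): CardIssued {
  const first = requestCard(amount);
  if (first.status === 200) return first;
  const txHash = payOnChain(first.payTo, first.amount);
  const second = requestCard(amount, txHash);
  if (second.status !== 200) throw new Error("payment was not accepted");
  return second;
}

const card = obtainCard(18000); // a card limited to 180.00 EURC
```

The replay check matters in a design like this: without it, a single on-chain payment could be presented repeatedly to mint several cards.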
Voice pipeline: the orchestrator receives raw audio via HTTP POST, sends it to a GPU VM running Whisper for speech-to-text (STT), forwards the transcript to Claude Code via the voice-channel MCP (HTTP→stdio bridge), and converts the agent's voice_reply() response to audio via Kokoro TTS. The voice-channel MCP is a dual-interface server: it accepts HTTP requests from the orchestrator and exposes them as channel messages to Claude Code, resolving each HTTP response only when the agent calls voice_reply() with the matching request_id.
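The correlation between an incoming HTTP request and the agent's later voice_reply() call can be sketched with a map of pending promises keyed by request_id. This is a minimal illustration, assuming an in-process map and hypothetical function names (`handleTranscript`, `voiceReply`, `deliver`); the real server's HTTP and MCP plumbing, timeouts, and error handling are not shown in the write-up and are omitted here.

```typescript
// Sketch of request/reply correlation by request_id. Names are hypothetical.
type Resolver = (replyText: string) => void;
const pending = new Map<string, Resolver>();
let nextId = 0;

// HTTP side: register the request, hand the transcript to the agent, and
// return a promise that settles only when the matching reply arrives.
function handleTranscript(
  transcript: string,
  deliver: (requestId: string, text: string) => void,
): Promise<string> {
  nextId += 1;
  const requestId = "req-" + nextId;
  const reply = new Promise<string>((resolve) => {
    pending.set(requestId, resolve);
  });
  // Deliver after registering, so even an immediate reply finds its resolver.
  deliver(requestId, transcript);
  return reply;
}

// Agent side: what a voice_reply(request_id, text) tool handler might do.
// Returns false for unknown or already-answered request ids.
function voiceReply(requestId: string, text: string): boolean {
  const resolve = pending.get(requestId);
  if (resolve === undefined) return false;
  pending.delete(requestId);
  resolve(text);
  return true;
}
```

A production version would likely also reject the promise after a timeout, so that an HTTP request is not held open indefinitely if the agent never replies.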

