Architecture

End-to-End Flow

One browser tool call traverses five layers — from the main agent delegating via task, through the schema-validated WebSocket envelope, into the extension’s consent gate, down the CDP branch matching the action, and onto the real Chrome tab. The three load-bearing mechanisms (security invariants, uid cache lifecycle, obstacle detection) are called out at the bottom because they span layers.

Read it top-to-bottom for “what happens when the agent calls click({ uid: 'e3' })”; read the bottom cards side-to-side for “which invariants make this safe and resilient.” The rest of this page zooms into each layer.

Runtime Topology

Three zones participate on every browser tool call: the agent backend (server or desktop), the Chrome Extension (MV3 service worker), and the target tab (the user’s logged-in page).

Backend owns the agent loop. The browser tool receives an action, looks up context.browserBridge, and calls pool.request(userId, action). The pool serializes the request over the live WebSocket to the extension.
Extension owns the enforcement point. ws-client receives the request; session-manager checks the domain blocklist against the target tab’s current domain and auto-creates a session on first action; action-dispatcher maps the action to a CDP command and runs it via debugger-controller.
Target Tab is the real, user-controlled Chrome tab. CDP Input.dispatchMouseEvent produces events with event.isTrusted = true, which is what Gmail, Google Accounts, and most enterprise SaaS require.

Only the background service worker holds the WebSocket. Popup, options, and content scripts relay through it via chrome.runtime.sendMessage, never directly.

Shared pool — one implementation, two platforms

The backend is cross-platform. The connection pool, per-user bridge factory, and BrowserBridgeSocket abstraction all live in @zapvol/backend/infra/browser-bridge-{pool,bridge}.ts:

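```typescript
// BrowserAction and BrowserBridgeActionResult are the shared action/result types (defined elsewhere).
export interface BrowserBridgeSocket {
  send(data: string): void;
  close(code?: number, reason?: string): void;
}

export interface BrowserBridgePool {
  isConnected(userId: string): boolean;
  // Sends one action to the user's extension and resolves with its response envelope.
  request(userId: string, action: BrowserAction): Promise<BrowserBridgeActionResult>;
  // Registers / removes the live socket for a user.
  attach(userId: string, ws: BrowserBridgeSocket): void;
  detach(userId: string, reason?: string): void;
  // Feeds each raw inbound WS frame to the pool so responses can be matched to requests.
  handleMessage(userId: string, raw: string): void;
  getStats(): { connections: number; inFlight: number };
}
```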

Both Hono’s WSContext (server) and the ws package’s WebSocket (desktop Electron main) structurally satisfy BrowserBridgeSocket — no adapter needed. Only the handshake routing differs per platform:

Platform	Endpoint	Auth
Server	`wss://<host>/ws/browser`	Pairing token returned by `/api/browser-extension/pairing-token`
Desktop	`ws://127.0.0.1:48123`	Pairing token file in Electron’s `userData`
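Because the interface is structural, a test double needs no base class or adapter either. The sketch below assumes a hypothetical `LoopbackSocket` used only in tests: it records outgoing frames and remembers how it was closed, and it type-checks as a `BrowserBridgeSocket` purely by shape, the same way Hono's `WSContext` and the `ws` package's `WebSocket` do.

```typescript
// Same shape as the interface above: anything with matching send/close fits.
interface BrowserBridgeSocket {
  send(data: string): void;
  close(code?: number, reason?: string): void;
}

// Hypothetical test double: records frames instead of writing to a network.
class LoopbackSocket {
  readonly sent: string[] = [];
  closed: { code: number | undefined; reason: string | undefined } | null = null;

  send(data: string): void {
    if (this.closed) throw new Error("socket already closed");
    this.sent.push(data);
  }

  close(code?: number, reason?: string): void {
    this.closed = { code, reason };
  }
}

// No `implements` clause: TypeScript checks the shape at the assignment.
const socket: BrowserBridgeSocket = new LoopbackSocket();
socket.send(JSON.stringify({ type: "request", id: "1", action: { type: "click", uid: "e3" } }));
socket.close(1000, "test done");
```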

Extension UI Layering (5 layers)

The extension’s popup and options UI strictly mirrors @zapvol/app’s Contract → Service → Context → Hooks → UI pattern. UI components never call chrome.runtime.* or chrome.storage.* directly.

Layer	Path	Role
Contract	`src/contracts/bridge-service.ts`	`BridgeService` interface + `BridgeState` type — single source of truth
Services	`src/services/bridge-service-local.ts`	Background side: composes `SessionManager` + `WsClient` into the service
	`src/services/bridge-service-runtime.ts`	UI side: proxies every contract method over `chrome.runtime.sendMessage`
Context	`src/context/bridge-context.tsx`	React provider + `useBridgeService()` injection hook
Hooks	`src/hooks/use-bridge-state.ts`	Live `BridgeState` snapshot via `subscribeState` + `getState`
	`src/hooks/use-bridge-config.ts`	Config read / save / clear with loading + error state
	`src/hooks/use-blocklist.ts`	Blocklist list / add / remove with refresh
UI	`src/entrypoints/popup/App.tsx`	Activity monitor — session stream, per-tab stop, global “Stop all”
	`src/entrypoints/options/App.tsx`	Connection + blocklist + audit log
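The layering pays off in tests: hooks depend only on the contract, so a test can hand them an in-memory implementation. This is a minimal sketch that assumes a pared-down `BridgeService` with only the `getState` / `subscribeState` pair named in the hooks row; the `BridgeState` fields and the `createMockBridgeService` factory are hypothetical, not the extension's actual contract.

```typescript
// Hypothetical, pared-down contract; the real BridgeService has more methods.
interface BridgeState {
  connected: boolean;
  activeSessionCount: number;
}

interface BridgeService {
  getState(): BridgeState;
  // Returns an unsubscribe function.
  subscribeState(listener: (state: BridgeState) => void): () => void;
}

// In-memory implementation for tests; `emit` lets the test drive state changes.
function createMockBridgeService(initial: BridgeState) {
  let state = initial;
  const listeners = new Set<(s: BridgeState) => void>();
  const service: BridgeService = {
    getState: () => state,
    subscribeState(listener) {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
  const emit = (next: BridgeState) => {
    state = next;
    for (const l of listeners) l(next);
  };
  return { service, emit };
}

// Roughly what a hook like useBridgeState does: read once, then follow updates.
const { service, emit } = createMockBridgeService({ connected: false, activeSessionCount: 0 });
const seen: BridgeState[] = [service.getState()];
const unsubscribe = service.subscribeState((s) => seen.push(s));
emit({ connected: true, activeSessionCount: 1 });
unsubscribe();
emit({ connected: true, activeSessionCount: 2 }); // not observed after unsubscribe
```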

Cross-process messaging (popup ↔ background) is wrapped in two files at the protocol boundary:

src/runtime-protocol.ts defines the typed request/broadcast shapes, tagged with kind: "bridge_request" | "bridge_broadcast" so they don't clash with other extension message buses.
src/runtime-handler.ts is the background-side chrome.runtime.onMessage listener that dispatches into the local service and broadcasts state changes back.
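Tagging with `kind` lets the background listener skip messages that belong to other buses. A minimal sketch of that guard follows; only the two `kind` tags come from the protocol described above, while the payload fields (`method` / `args` on requests, `state` on broadcasts) and the `route` helper are hypothetical.

```typescript
// Only the `kind` tags are from the protocol; the payload fields are illustrative.
type BridgeRequest = { kind: "bridge_request"; method: string; args: unknown[] };
type BridgeBroadcast = { kind: "bridge_broadcast"; state: unknown };
type BridgeRuntimeMessage = BridgeRequest | BridgeBroadcast;

// chrome.runtime.onMessage delivers untyped values, so narrow before dispatching.
function isBridgeMessage(msg: unknown): msg is BridgeRuntimeMessage {
  if (typeof msg !== "object" || msg === null) return false;
  const kind = (msg as { kind?: unknown }).kind;
  return kind === "bridge_request" || kind === "bridge_broadcast";
}

// Background-side routing: unrelated messages fall through untouched.
function route(msg: unknown): "request" | "broadcast" | "ignored" {
  if (!isBridgeMessage(msg)) return "ignored";
  return msg.kind === "bridge_request" ? "request" : "broadcast";
}
```

Returning early for untagged messages keeps other `onMessage` listeners in the extension free to handle them.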

End-to-End Data Flow — one `click({ uid })` action

A concrete walkthrough that complements the hero diagram above. It shows the types that cross each boundary along the v2 uid path; the selector path follows the same control flow, except that it resolves the click center via querySelector instead of DOM.getBoxModel.

1. The browser subagent emits a tool call: browser({ action: { type: "click", uid: "e3", tabId: 42 } }). The subagent learned uid: "e3" from the previous extract's elements array.
2. The browser tool's execute forwards to context.browserBridge.request(action, abortSignal).
3. The per-user bridge calls pool.request(userId, action), which generates a UUID, sends { type: "request", id, action } over the live WS, and stores a pending Promise with a per-action pool timeout and an abort-listener cleanup.
4. The extension's ws-client receives the message, validates it against browserBridgeMessageSchema, and routes it to the request handler registered in background.ts.
5. action-dispatcher runs buildTarget({ uid: "e3" }) → { uid: "e3" }, then resolveTabAndDomain (an explicit tabId short-circuits the lookup), then sessionManager.isDomainBlocked(domain) (the blocklist check), then sessionManager.ensureSession(tabId, domain), which creates a session on the first action, reuses it on subsequent ones, and silently updates session.domain after a mid-session navigation.
6. debuggerController.click(tabId, { uid: "e3" }). The controller looks up uidCaches.get(tabId).get("e3") → backendNodeId: 1829, calls DOM.scrollIntoViewIfNeeded({ backendNodeId: 1829 }) and DOM.getBoxModel({ backendNodeId: 1829 }) to get the center (x, y), then sends Input.dispatchMouseEvent mousePressed followed by mouseReleased at that point. The dispatcher calls sessionManager.recordAction(tabId) to bump actionCount and lastActionAt for the audit log.
7. CDP resolves; the dispatcher returns { result: { ok: true } }, and the extension sends { type: "response", id, result: { ok: true } } back over the WS.
8. The pool resolves the pending Promise; the browser tool's execute pushes a task_milestone event and returns { ok: true, action: "click", result: { ok: true } } to the subagent's loop.
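Step 6 condenses into a short CDP sequence. This is a minimal sketch that assumes a hypothetical `CdpSend` function standing in for the extension's debugger wrapper (in an MV3 extension such a wrapper would sit on `chrome.debugger.sendCommand`). `DOM.scrollIntoViewIfNeeded`, `DOM.getBoxModel`, and `Input.dispatchMouseEvent` and the parameters used here are real CDP; `clickByUid` and `quadCenter` are illustrative names, and the error shape mirrors the `element_stale` code.

```typescript
// Hypothetical transport: (method, params) → CDP result.
type CdpSend = (method: string, params: Record<string, unknown>) => Promise<any>;

type ClickResult = { ok: true } | { error: { code: "element_stale"; message: string } };

// DOM.getBoxModel returns each quad as 8 numbers: x1,y1, x2,y2, x3,y3, x4,y4.
function quadCenter(quad: number[]): { x: number; y: number } {
  let x = 0;
  let y = 0;
  for (let i = 0; i < 8; i += 2) {
    x += quad[i];
    y += quad[i + 1];
  }
  return { x: x / 4, y: y / 4 };
}

async function clickByUid(
  send: CdpSend,
  uidCache: Map<string, number>, // uid → backendNodeId for one tab
  uid: string,
): Promise<ClickResult> {
  const backendNodeId = uidCache.get(uid);
  if (backendNodeId === undefined) {
    return { error: { code: "element_stale", message: `uid ${uid} not in cache; extract first` } };
  }
  // If the node has left the DOM, CDP rejects here. The text maps that case to
  // element_not_found; this sketch simply lets the rejection propagate.
  await send("DOM.scrollIntoViewIfNeeded", { backendNodeId });
  const { model } = await send("DOM.getBoxModel", { backendNodeId });
  const { x, y } = quadCenter(model.content);
  // A press followed by a release at the same point is one left click.
  for (const type of ["mousePressed", "mouseReleased"]) {
    await send("Input.dispatchMouseEvent", { type, x, y, button: "left", clickCount: 1 });
  }
  return { ok: true };
}
```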

Where it breaks. Step 5 can fail with domain_blocked (terminal — target on blocklist) or tab_not_found; step 6 can fail with element_stale (uid not in cache — agent must extract first) or element_not_found (cached backendNodeId resolved to a node no longer in the DOM). Every failure lands on the same response envelope; the agent treats each code per the contract in Session Model → Error-handling contract.
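The step 5 gate can be sketched in isolation. This sketch assumes a hypothetical in-memory `SessionManager` holding a `Set` of blocked domains and a `Map` of sessions keyed by `tabId`; the method names follow the walkthrough (`isDomainBlocked`, `ensureSession`, `recordAction`), but their signatures and bodies, the `gate` function, and the `tabDomains` stand-in for `resolveTabAndDomain` are illustrative. Failures come back as the shared `{ error: { code, message } }` envelope instead of thrown exceptions.

```typescript
type ErrorCode = "domain_blocked" | "tab_not_found";
type DispatchResult = { result: { ok: true } } | { error: { code: ErrorCode; message: string } };

interface Session {
  tabId: number;
  domain: string;
  actionCount: number;
  lastActionAt: number;
}

class SessionManager {
  private sessions = new Map<number, Session>();
  constructor(private blocked: Set<string>) {}

  isDomainBlocked(domain: string): boolean {
    return this.blocked.has(domain);
  }

  // Creates on the first action, reuses afterwards, follows mid-session navigation.
  ensureSession(tabId: number, domain: string): Session {
    let s = this.sessions.get(tabId);
    if (!s) {
      s = { tabId, domain, actionCount: 0, lastActionAt: 0 };
      this.sessions.set(tabId, s);
    } else {
      s.domain = domain;
    }
    return s;
  }

  recordAction(tabId: number): void {
    const s = this.sessions.get(tabId);
    if (s) {
      s.actionCount += 1;
      s.lastActionAt = Date.now();
    }
  }

  getSession(tabId: number): Session | undefined {
    return this.sessions.get(tabId);
  }
}

// tabDomains: hypothetical stand-in for resolveTabAndDomain (tabId → current domain).
function gate(sm: SessionManager, tabDomains: Map<number, string>, tabId: number): DispatchResult {
  const domain = tabDomains.get(tabId);
  if (domain === undefined) {
    return { error: { code: "tab_not_found", message: `no tab ${tabId}` } };
  }
  if (sm.isDomainBlocked(domain)) {
    return { error: { code: "domain_blocked", message: `${domain} is blocklisted` } };
  }
  sm.ensureSession(tabId, domain);
  // ...the CDP action (step 6) would run here...
  sm.recordAction(tabId);
  return { result: { ok: true } };
}
```

Because the blocklist check runs before `ensureSession`, a blocklisted domain never gets a session at all.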

Cancellation

Two independent abort paths reach an in-flight BUA action cleanly:

Extension-side (user-initiated): popup per-session Stop now, the global red Stop all, or Chrome’s debugger-bar Cancel → extension detaches the debugger, fires session_ended (and global_stop for Stop all), and any subsequent action on that tab fails with session_not_found. Adding the target domain to the blocklist ends the session the same way plus domain_blocked errors for new attempts.
Backend-side (agent-run cancelled): the AI SDK passes an AbortSignal to each tool’s execute. The browser tool forwards it to bridge.request(action, signal) → BrowserBridgePool.request attaches an abort listener to its pending-request map. When the signal fires, the pool immediately resolves the pending promise with { error: { code: "internal_error", message: "aborted by caller" } }, clears the timeout, and detaches the listener. Any response that arrives late from the extension is logged and dropped as stray.

Both paths use the same error envelope, so the subagent’s error-handling contract applies uniformly — no new code path in the agent loop.
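The backend half of cancellation, and the per-action timeout from step 3 of the walkthrough, both meet in the pool's pending-request map. This is a minimal sketch assuming a hypothetical single-user `PendingRequests` class covering only `request` and `handleMessage`. The error envelope, the `"aborted by caller"` message, and dropping late responses follow the text; the counter-based ids (the pool itself uses UUIDs), the `internal_error` code for timeouts, the guard for an already-aborted signal, and the boolean return of `handleMessage` are assumptions made for the sketch.

```typescript
type BridgeError = { code: string; message: string };
type ActionResult = { result: unknown } | { error: BridgeError };

const abortedResult = (): ActionResult => ({
  error: { code: "internal_error", message: "aborted by caller" },
});

interface Pending {
  resolve: (r: ActionResult) => void;
  timer: ReturnType<typeof setTimeout>;
  signal?: AbortSignal;
  onAbort?: () => void;
}

class PendingRequests {
  private pending = new Map<string, Pending>();
  private nextId = 0;

  // send: writes one frame to the live socket (e.g. a BrowserBridgeSocket's send).
  constructor(private send: (frame: string) => void, private timeoutMs: number) {}

  request(action: unknown, signal?: AbortSignal): Promise<ActionResult> {
    const id = String(++this.nextId); // the real pool generates a UUID
    return new Promise<ActionResult>((resolve) => {
      // Sketch-only guard: an already-aborted signal never fires "abort" again.
      if (signal?.aborted) return resolve(abortedResult());
      const timer = setTimeout(
        () => this.settle(id, { error: { code: "internal_error", message: "timed out" } }),
        this.timeoutMs,
      );
      const entry: Pending = { resolve, timer, signal };
      if (signal) {
        entry.onAbort = () => this.settle(id, abortedResult());
        signal.addEventListener("abort", entry.onAbort, { once: true });
      }
      this.pending.set(id, entry);
      this.send(JSON.stringify({ type: "request", id, action }));
    });
  }

  // Returns false for stray frames (unknown or already-settled ids), which are dropped.
  handleMessage(raw: string): boolean {
    const msg = JSON.parse(raw) as { type?: string; id?: string; result?: unknown; error?: BridgeError } | null;
    if (!msg || msg.type !== "response" || msg.id === undefined || !this.pending.has(msg.id)) return false;
    this.settle(msg.id, msg.error ? { error: msg.error } : { result: msg.result });
    return true;
  }

  get inFlight(): number {
    return this.pending.size;
  }

  // Settle exactly once: remove the entry, clear the timer, detach the abort listener.
  private settle(id: string, outcome: ActionResult): void {
    const entry = this.pending.get(id);
    if (!entry) return;
    this.pending.delete(id);
    clearTimeout(entry.timer);
    if (entry.signal && entry.onAbort) entry.signal.removeEventListener("abort", entry.onAbort);
    entry.resolve(outcome);
  }
}
```

Routing response, timeout, and abort through one `settle` function means whichever arrives first wins; the others find no entry and do nothing.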

State Ownership

State	Owner	Storage
Pairing config	Extension	`chrome.storage.local` key `bridge_config`
Domain blocklist entries	Extension	`chrome.storage.local` key `blocklist`
Active sessions (keyed by tabId; multiple per domain allowed)	Extension	`chrome.storage.local` key `activeSessions`
BUA window id	Extension	`chrome.storage.local` key `buaWindowId`
Session audit log (≤ 1000 entries)	Extension	`chrome.storage.local` key `sessionHistory`
In-flight WS requests	Pool (backend, in-memory)	`Map<userId, Map<requestId, Pending>>`
Connection per user	Pool (backend, in-memory)	`Map<userId, WSContext | WebSocket>`
CDP attachment set	`debugger-controller`	In-memory `Set<tabId>`
uid → backendNodeId cache (per tab)	`debugger-controller`	In-memory `Map<tabId, Map<uid, backendNodeId>>` — cleared on `Page.frameNavigated` or `navigate`
`evaluateInProgress` set (guards dialog auto-dismiss)	`debugger-controller`	In-memory `Set<tabId>`
`SessionEvent` subscribers	`session-manager` module-scope	In-memory `Set<listener>` (re-attached on SW wake)
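The uid cache row amounts to a two-level map with one invalidation rule. This is a minimal sketch assuming a hypothetical `UidCache` class; the `Map<tabId, Map<uid, backendNodeId>>` shape and clearing on navigation come from the table, while the method names, and the choice that `remember` merges new uids into the tab's map, are assumptions.

```typescript
type TabId = number;
type Uid = string;
type BackendNodeId = number;

// Hypothetical per-tab cache: uid → backendNodeId, filled by extract.
class UidCache {
  private byTab = new Map<TabId, Map<Uid, BackendNodeId>>();

  // Record the uids an extract handed to the agent for this tab.
  remember(tabId: TabId, entries: Iterable<readonly [Uid, BackendNodeId]>): void {
    let tab = this.byTab.get(tabId);
    if (!tab) {
      tab = new Map();
      this.byTab.set(tabId, tab);
    }
    for (const [uid, nodeId] of entries) tab.set(uid, nodeId);
  }

  // undefined → the action fails with element_stale and the agent must extract again.
  lookup(tabId: TabId, uid: Uid): BackendNodeId | undefined {
    return this.byTab.get(tabId)?.get(uid);
  }

  // Call from the Page.frameNavigated handler and after a navigate action, so
  // backendNodeIds from the previous document are never reused.
  clearTab(tabId: TabId): void {
    this.byTab.delete(tabId);
  }
}
```

Clearing on navigation is what turns a uid reused across a page change into an explicit `element_stale` rather than a click on whatever node now carries that id.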

The background service worker can be terminated by Chrome at any time. When it restarts, the bridge client reconnects and extension-side state is rehydrated from chrome.storage.local. Any agent request in flight at the moment of SW termination is rejected on the backend side with an internal_error, so the agent learns of the failure and can choose to retry or stop.

Why this shape

One contract, two implementations — keeps the popup and options page honest about their dependency. Swapping the runtime for a mock in tests is a single factory call.
Pool in shared backend — @zapvol/server and @zapvol/desktop both get identical enforcement behavior. Fixing a timeout bug in one fixes it everywhere.
Scope check in the extension, not the backend — the extension is the only code that sees the live tab. It can enforce “this domain is not blocklisted AND this tab is still open” atomically before dispatching. The backend trusts the extension’s answer and surfaces errors to the agent.
One tool, action enum — mirrors Anthropic’s computer-use shape and keeps the prompt tractable (13 actions in a single JSON schema the LLM has to understand). Individual tools per action would explode the prompt.