Features overview

Why Zapvol ships a Chrome extension as a first-class BUA surface, what it unlocks, and how it differs from cloud headless-browser alternatives

The Problem

An agent that reaches out to the web can cover public content (search, scrape, structured APIs) but hits a wall on anything that requires authentication. Gmail, Notion, internal portals, banking dashboards, tenant-specific SaaS — these all live behind cookies, OAuth sessions, and MFA flows that the agent does not possess.

The conventional workaround is a cloud headless browser — Browserbase, Steel, a managed Playwright cluster. These are great for public web automation. They are useless when the task is “read my calendar” or “export my invoices from this portal”: the user’s session exists in the user’s own browser, and cloud workers don’t have it.

The Answer — Browser Use Agent (BUA)

Zapvol pairs a Chrome MV3 extension with the agent so the agent can act inside the user’s already-logged-in browser. The extension proxies high-level actions from the agent into real Chrome DevTools Protocol (CDP) commands on a tab the user has explicitly approved. This is the same capability pattern as Anthropic’s computer-use tool, scoped to a single logged-in browser tab instead of an entire remote desktop.

Figure: Browser Use Agent (BUA). Agent action → extension → user's already-logged-in tab.

  1. Zapvol Agent Backend (server / desktop): runs the agent loop, picks an action.
  2. WebSocket: paired connection.
  3. Chrome Extension (MV3): the Background SW runs the allowlist + session TTL check before every action, then issues CDP Input.dispatchMouseEvent / dispatchKeyEvent → event.isTrusted = true.
  4. User's logged-in Chrome tab: the user's own cookies / OAuth / MFA, intact.

Chrome shows the "Zapvol Browser Bridge is debugging this browser" bar throughout; it is not suppressible.

Why a dedicated extension

Alternative | Why it doesn’t fit this use case
Cloud headless Playwright (Browserbase, …) | Runs in a clean profile; no access to the user’s cookies / MFA state
Native messaging host | Requires a separately installed binary + plist registration
Bookmarklet / page-injected script | Fake events fail event.isTrusted checks (see below)
Electron webview inside the app | Does not reuse the user’s default Chrome profile; parallel login needed

Why CDP, not page-injected scripts

Modern auth-gated sites (Gmail, Google Accounts, most enterprise SaaS) gate sensitive input handlers on event.isTrusted === true. Any event synthesized from page JavaScript — element.click(), a constructed MouseEvent, a bookmarklet — arrives with isTrusted = false and is silently ignored.

Only two event sources produce trusted events: the user’s real hardware, and chrome.debugger’s Input.dispatchMouseEvent / Input.dispatchKeyEvent CDP commands. The second path is available only to an installed Chrome extension with the debugger permission. That single constraint rules out every alternative above and is the reason BUA has to ship as an extension, not a library the agent calls into.
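The trusted-input path described above can be sketched in a few lines. This is a minimal illustration, not Zapvol's actual code: the helper names buildClickCommands and trustedClick are hypothetical, and the send parameter stands in for chrome.debugger.sendCommand so the sketch can run outside an extension. It assumes the debugger is already attached to the tab.

```typescript
// Minimal shape of the one chrome.debugger call this sketch needs.
// In an extension, pass (t, m, p) => chrome.debugger.sendCommand(t, m, p).
type SendCommand = (
  target: { tabId: number },
  method: string,
  params: Record<string, unknown>,
) => Promise<unknown>;

interface CdpCommand {
  method: string;
  params: Record<string, unknown>;
}

// Build the press/release pair that makes up one left click at (x, y).
// Coordinates are CSS pixels relative to the viewport, as CDP expects.
function buildClickCommands(x: number, y: number): CdpCommand[] {
  const base = { x, y, button: "left", clickCount: 1 };
  return [
    { method: "Input.dispatchMouseEvent", params: { type: "mousePressed", ...base } },
    { method: "Input.dispatchMouseEvent", params: { type: "mouseReleased", ...base } },
  ];
}

// Send the click, in order, to a tab the debugger is already attached to.
async function trustedClick(
  send: SendCommand,
  tabId: number,
  x: number,
  y: number,
): Promise<void> {
  for (const cmd of buildClickCommands(x, y)) {
    await send({ tabId }, cmd.method, cmd.params);
  }
}
```

Because the events are injected by the browser rather than constructed in page JavaScript, handlers on the page observe them as trusted. Keeping command construction separate from sending also makes the payloads easy to inspect or log before they reach the tab.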

Internal tool, UX-first

BUA is deployed to employees, not the public. The design optimizes for silent, zero-friction execution: no per-domain consent prompts, no session countdown, no approval gates. An internal product where every agent action required a click to trust would just train employees to click through blindly. Instead, safety rests on three invariants:

  1. Full audit trail — every session start and end is appended to an append-only log (sessionHistory, 1000 entries, surfaced in Options). End entries carry reason, duration, and actionCount. In parallel, lifecycle WS events (session_started, session_ended, domain_blocked, tab_closed, global_stop) flow to the backend for operational observability; per-action outcomes are recorded separately by the agent subsystem as task_milestone events. Accountability is post-hoc, not pre-consent.
  2. User-initiated global kill — the popup exposes a single red “Stop all” button that immediately detaches every active debugger, ends every session, and fires a global_stop WS event so the backend short-circuits any in-flight requests. Always one click away.
  3. Clean lifecycle — sessions end automatically on tab close, system idle (30 min), screen lock, Chrome’s debugger-bar Cancel, SW unload, and blocklist addition. No TTL countdown; the agent runs until the work is done or the user intervenes.
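The audit trail in the first invariant can be pictured as a capped, append-only list. The sketch below is an assumption-heavy illustration: the entry shape, the EndReason names, and the choice to drop the oldest entries once the 1000-entry cap is reached are all hypothetical, chosen only to match the fields and end conditions named above.

```typescript
// Hypothetical end reasons, mirroring the lifecycle conditions listed above.
type EndReason =
  | "tab_closed"
  | "system_idle"
  | "screen_locked"
  | "debugger_cancelled"
  | "sw_unload"
  | "domain_blocked"
  | "global_stop";

type HistoryEntry =
  | { kind: "start"; sessionId: string; domain: string; at: number }
  | {
      kind: "end";
      sessionId: string;
      at: number;
      reason: EndReason;
      durationMs: number;
      actionCount: number;
    };

const MAX_ENTRIES = 1000; // cap stated in the text; eviction policy is assumed

// Append without mutating existing entries; trim the oldest past the cap.
function appendEntry(history: readonly HistoryEntry[], entry: HistoryEntry): HistoryEntry[] {
  const next = [...history, entry];
  return next.length > MAX_ENTRIES ? next.slice(next.length - MAX_ENTRIES) : next;
}

// Record a session end, deriving duration from the matching start entry.
// If the start entry has already been trimmed, duration falls back to 0.
function endSession(
  history: readonly HistoryEntry[],
  sessionId: string,
  reason: EndReason,
  actionCount: number,
  now: number,
): HistoryEntry[] {
  const start = history.find((e) => e.kind === "start" && e.sessionId === sessionId);
  const durationMs = start ? now - start.at : 0;
  return appendEntry(history, { kind: "end", sessionId, at: now, reason, durationMs, actionCount });
}
```

Entries are never edited in place, which is what makes the log usable for post-hoc accountability: a session's record is the pair of its start and end entries.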

Chrome itself renders the yellow “Zapvol Browser Bridge is debugging this browser” bar whenever chrome.debugger is attached. This bar is not suppressible from extension code — treating it as a feature is the correct frame: the user sees, continuously, every moment the agent has remote control. Employee onboarding documents it once (“yellow bar = BUA active; click Cancel to stop”).

What is deliberately not here

  • No per-domain allowlist UI. Any domain is implicitly allowed. A domain blocklist exists in Options (empty by default) for explicit deny cases — banking, payroll, compliance-sensitive tools — and is checked before every action.
  • No Approve / Trust buttons anywhere. The popup is an activity monitor, not an authorization center.
  • No TTL countdown. Sessions live as long as the tab is open and the system is active.
  • No “Trust this site” floating-ball prompt. The floating ball renders only when the agent is actively working on the tab; idle state is invisible.
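The blocklist check that runs before every action could look roughly like the following. This is a sketch under stated assumptions: the function name isBlocked is hypothetical, matching subdomains of a blocked domain is an assumed policy, and so is failing closed when the URL cannot be parsed.

```typescript
// Return true when the URL's host is a blocked domain or one of its subdomains.
function isBlocked(url: string, blocklist: readonly string[]): boolean {
  let host: string;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return true; // assumption: an unparsable URL is treated as blocked
  }
  return blocklist.some((entry) => {
    const domain = entry.trim().toLowerCase();
    if (domain === "") return false; // ignore empty lines in the Options list
    // The leading dot prevents "notbank.example" from matching "bank.example".
    return host === domain || host.endsWith("." + domain);
  });
}
```

With an empty blocklist (the default), every parsable URL passes, which matches the "any domain is implicitly allowed" posture described above.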

A public-facing variant of BUA would need to add a consent-layer UI back on top of this base. That is a different product, not a reconfiguration of this one. The Chrome Web Store also wouldn’t accept the internal build, given its silent-execution posture and <all_urls> host permission. Distribution is internal-only, via GitHub Release / IT-managed listings.

The Tool Surface

The main agent does not see the browser tool directly. Instead, BUA is packaged as a built-in subagent: the main agent delegates via the standard task tool with subagent_type: "browser", and the browser subagent runs the multi-step loop (extract → reason → click → …) in an isolated context, returning a single summary.

The browser tool itself — one tool with a 13-action discriminated union (navigate, click, type, hover, evaluate, screenshot, extract, and six more) — lives inside the subagent’s private toolkit. Naming mirrors Browser Use so the LLM’s prior familiarity transfers. Multi-step flows are composed by calling the tool repeatedly, keeping retry and cancellation semantics simple.
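A discriminated union of this kind might be typed as below. Only the seven action names given above come from this page; every parameter name (url, selector, text, expression, instruction) is a hypothetical placeholder, and the six remaining actions are left out rather than guessed. The authoritative shapes are in Protocol.

```typescript
// Partial sketch: the seven actions named on this page, with assumed parameters.
// The six other actions are not listed here and are deliberately omitted.
type BrowserAction =
  | { action: "navigate"; url: string }
  | { action: "click"; selector: string }
  | { action: "type"; selector: string; text: string }
  | { action: "hover"; selector: string }
  | { action: "evaluate"; expression: string }
  | { action: "screenshot" }
  | { action: "extract"; instruction: string };

// Narrowing on the "action" tag gives each branch its own parameter types.
function describeAction(a: BrowserAction): string {
  switch (a.action) {
    case "navigate":
      return `navigate to ${a.url}`;
    case "click":
      return `click ${a.selector}`;
    case "type":
      return `type "${a.text}" into ${a.selector}`;
    case "hover":
      return `hover ${a.selector}`;
    case "evaluate":
      return `evaluate ${a.expression}`;
    case "screenshot":
      return "take screenshot";
    case "extract":
      return `extract: ${a.instruction}`;
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a case is missed.
      const unreachable: never = a;
      return unreachable;
    }
  }
}
```

One tool with a tagged union keeps the schema the model sees small, while the type checker still rejects a click without a target or a navigate without a URL.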

Why the wrapper: BUA’s useful unit is a loop, not a single call. A typical “extract 3 candidates” task takes 17 browser actions. If the main agent ran that loop directly, every step would re-send the main dialogue context — cost grows with main-conversation length. The subagent gives BUA its own context so the main thread stays clean: one delegation in, one summary out.

The full action list, parameters, return shapes, and error codes live in Protocol; the subagent wrapper and context isolation are covered in Browser Subagent.

What the user sees

Four surfaces, all intentionally prominent but none of them gatekeepers:

  • Chrome’s yellow debug bar — system-level, non-suppressible, shown the entire time the extension is attached.
  • The extension popup — activity monitor listing every active session (domain, tab, action count), a red “Stop all” global kill switch, the connection state to the Zapvol backend, and a Show / Hide toggle for the BUA window. No Approve buttons; the popup is for watching and stopping, not for granting.
  • The BUA window — when the agent opens its own tabs, they land in a dedicated minimized Chrome window so the user’s main browser keeps its focus. The user can surface it any time with one click from the popup.
  • The floating ball — a small in-page indicator that renders only when the agent has an active session on this tab. Click to see the agent’s progress on this page and send a follow-up instruction. Hidden on idle tabs.

The user’s contract with the agent is legible at a glance: which site, which tab, how many actions, how to cut it off. See Session Model for multi-session semantics and the BUA window’s invariants.
