Features overview
Why Zapvol ships a Chrome extension as a first-class BUA surface, what it unlocks, and how it differs from cloud headless-browser alternatives
The Problem
An agent that reaches out to the web can cover public content (search, scrape, structured APIs) but hits a wall on anything that requires authentication. Gmail, Notion, internal portals, banking dashboards, tenant-specific SaaS — these all live behind cookies, OAuth sessions, and MFA flows that the agent does not possess.
The conventional workaround is a cloud headless browser — Browserbase, Steel, a managed Playwright cluster. These are great for public web automation. They are useless when the task is “read my calendar” or “export my invoices from this portal”: the user’s session exists in the user’s own browser, and cloud workers don’t have it.
The Answer — Browser Use Agent (BUA)
Zapvol pairs a Chrome MV3 extension with the agent so the agent can act inside the user’s already-logged-in browser. The extension proxies high-level actions from the agent into real Chrome DevTools Protocol (CDP) commands on a tab the user has explicitly approved. This is the same capability pattern as Anthropic’s computer-use tool, scoped to a single logged-in browser tab instead of an entire remote desktop.
Why a dedicated extension
| Alternative | Why it doesn’t fit this use case |
|---|---|
| Cloud headless Playwright (Browserbase, …) | Runs in a clean profile — no access to the user’s cookies / MFA state |
| Native messaging host | Requires a separately installed binary + host-manifest registration on every machine |
| Bookmarklet / page-injected script | Fake events fail event.isTrusted checks — see below |
| Electron webview inside the app | Does not reuse the user’s default Chrome profile; parallel login needed |
Why CDP, not page-injected scripts
Modern auth-gated sites (Gmail, Google Accounts, most enterprise SaaS) gate sensitive input handlers on
event.isTrusted === true. Any event synthesized from page JavaScript — element.click(), a constructed MouseEvent,
a bookmarklet — arrives with isTrusted = false and is silently ignored.
Only two event sources produce trusted events: the user’s real hardware, and chrome.debugger’s
Input.dispatchMouseEvent / Input.dispatchKeyEvent CDP commands. The second path is available only to an installed
Chrome extension with the debugger permission. That single constraint rules out every alternative above and is the
reason BUA has to ship as an extension, not a library the agent calls into.
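A minimal sketch of what a trusted click looks like at the CDP level, assuming the extension has already attached to the tab. The names buildClickCommands, dispatchTrustedClick, and SendCommand are hypothetical and are not Zapvol's API; only the CDP method Input.dispatchMouseEvent and its type, x, y, button, and clickCount parameters come from the protocol itself. The transport is injected so the command sequence can be inspected without a browser.

```typescript
// Sketch only: function and type names here are illustrative assumptions.
// The CDP method and parameter names follow the Chrome DevTools Protocol Input domain.

interface CdpCommand {
  method: "Input.dispatchMouseEvent";
  params: {
    type: "mouseMoved" | "mousePressed" | "mouseReleased";
    x: number;
    y: number;
    button: "none" | "left";
    clickCount: number;
  };
}

// Injected transport; in an extension this would wrap chrome.debugger.sendCommand.
type SendCommand = (method: string, params: object) => Promise<unknown>;

// A left click is modeled as move, press, release at the same viewport coordinates.
function buildClickCommands(x: number, y: number): CdpCommand[] {
  return [
    { method: "Input.dispatchMouseEvent", params: { type: "mouseMoved", x, y, button: "none", clickCount: 0 } },
    { method: "Input.dispatchMouseEvent", params: { type: "mousePressed", x, y, button: "left", clickCount: 1 } },
    { method: "Input.dispatchMouseEvent", params: { type: "mouseReleased", x, y, button: "left", clickCount: 1 } },
  ];
}

// Sends the commands in order, waiting for each one before issuing the next.
async function dispatchTrustedClick(send: SendCommand, x: number, y: number): Promise<void> {
  for (const command of buildClickCommands(x, y)) {
    await send(command.method, command.params);
  }
}

// Inside an extension holding the "debugger" permission, after chrome.debugger.attach:
//   const send: SendCommand = (method, params) =>
//     chrome.debugger.sendCommand({ tabId }, method, params);
//   await dispatchTrustedClick(send, 120, 340);
```

Because the events originate from the browser's input pipeline rather than page JavaScript, handlers on the page receive them with isTrusted set to true.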
Internal tool, UX-first
BUA is deployed to employees, not the public. The design optimizes for silent, zero-friction execution: no per-domain consent prompts, no session countdown, no approval gates. An internal product where every agent action required a click to trust would just train employees to click through blindly. Instead, safety rests on three invariants:
- Full audit trail: every session start and end is appended to an append-only log (sessionHistory, 1000 entries, surfaced in Options). End entries carry reason, duration, and actionCount. In parallel, lifecycle WS events (session_started, session_ended, domain_blocked, tab_closed, global_stop) flow to the backend for operational observability; per-action outcomes are recorded separately by the agent subsystem as task_milestone events. Accountability is post-hoc, not pre-consent.
- User-initiated global kill: the popup exposes a single red “Stop all” button that immediately detaches every active debugger, ends every session, and fires a global_stop WS event so the backend short-circuits any in-flight requests. Always one click away.
- Clean lifecycle: sessions end automatically on tab close, system idle (30 min), screen lock, Chrome’s debugger-bar Cancel, SW unload, and blocklist addition. No TTL countdown; the agent runs until the work is done or the user intervenes.
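The first two invariants can be sketched as a capped append-only log plus a stop-all routine. Everything below is an assumption modeled on the description above: the SessionHistory class, the SessionEntry and ActiveSession shapes, the millisecond units, and the choice to drop the oldest entries once the 1000-entry cap is reached. Persistence and the real debugger and WebSocket calls are left out and replaced by injected callbacks.

```typescript
// Sketch only: types, field names, and the drop-oldest policy are illustrative assumptions.

type SessionEntry =
  | { kind: "start"; sessionId: string; domain: string; at: number }
  | { kind: "end"; sessionId: string; reason: string; durationMs: number; actionCount: number; at: number };

class SessionHistory {
  private readonly entries: SessionEntry[] = [];
  private readonly capacity: number;

  constructor(capacity: number = 1000) {
    this.capacity = capacity;
  }

  // Entries are only ever appended; once over capacity the oldest are discarded.
  append(entry: SessionEntry): void {
    this.entries.push(entry);
    if (this.entries.length > this.capacity) {
      this.entries.splice(0, this.entries.length - this.capacity);
    }
  }

  list(): readonly SessionEntry[] {
    return this.entries;
  }
}

interface ActiveSession {
  sessionId: string;
  tabId: number;
  startedAt: number;
  actionCount: number;
}

// Detaches every session, records an end entry for each, then emits one global_stop event.
function stopAll(
  active: Map<string, ActiveSession>,
  history: SessionHistory,
  detach: (tabId: number) => void,
  emit: (event: string) => void,
  now: number,
): void {
  for (const session of active.values()) {
    detach(session.tabId);
    history.append({
      kind: "end",
      sessionId: session.sessionId,
      reason: "global_stop",
      durationMs: now - session.startedAt,
      actionCount: session.actionCount,
      at: now,
    });
  }
  active.clear();
  emit("global_stop");
}
```

In a real extension, detach would presumably wrap chrome.debugger.detach and emit would write to the backend WebSocket; both are stubbed here so the bookkeeping stays visible.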
Chrome itself renders the yellow “Zapvol Browser Bridge is debugging this browser” bar whenever chrome.debugger is
attached. This bar is not suppressible from extension code — treating it as a feature is the correct frame: the user
sees, continuously, every moment the agent has remote control. Employee onboarding documents it once (“yellow bar =
BUA active; click Cancel to stop”).
What is deliberately not here
- No per-domain allowlist UI. Any domain is implicitly allowed. A domain blocklist exists in Options (empty by default) for explicit deny cases — banking, payroll, compliance-sensitive tools — and is checked before every action.
- No Approve / Trust buttons anywhere. The popup is an activity monitor, not an authorization center.
- No TTL countdown. Sessions live as long as the tab is open and the system is active.
- No “Trust this site” floating-ball prompt. The floating ball renders only when the agent is actively working on the tab; idle state is invisible.
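The blocklist check that runs before every action could look roughly like the sketch below. The helper name isBlocked, the suffix-matching rule (a blocked domain also covers its subdomains), and the decision to treat unparseable URLs as blocked are all assumptions; the source only states that the list is empty by default and is consulted before each action.

```typescript
// Sketch only: isBlocked and its matching rules are illustrative assumptions.

function isBlocked(url: string, blocklist: readonly string[]): boolean {
  let hostname: string;
  try {
    hostname = new URL(url).hostname.toLowerCase();
  } catch {
    // Assumption: fail closed when the URL cannot be parsed.
    return true;
  }
  return blocklist.some((entry) => {
    const domain = entry.trim().toLowerCase();
    if (domain === "") return false;
    // Exact host match, or a subdomain of the blocked domain.
    return hostname === domain || hostname.endsWith("." + domain);
  });
}
```

With an empty blocklist every well-formed URL passes, which matches the "any domain is implicitly allowed" default.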
A public-facing variant of BUA would need to layer a consent UI on top of this base. That is a different product,
not a reconfiguration of this one. The Chrome Web Store also would not accept the internal build with its silent-execution
posture and <all_urls> host permission. Distribution is internal-only via GitHub Release / IT-managed listings.
The Tool Surface
The main agent does not see the browser tool directly. Instead, BUA is packaged as a built-in subagent: the
main agent delegates via the standard task tool with subagent_type: "browser", and the browser subagent runs the
multi-step loop (extract → reason → click → …) in an isolated context, returning a single summary.
The browser tool itself — one tool with a 13-action discriminated union (navigate, click, type, hover, evaluate,
screenshot, extract, and six more) — lives inside the subagent’s private toolkit. Naming mirrors
Browser Use so the LLM’s prior familiarity transfers. Multi-step flows are
composed by calling the tool repeatedly, keeping retry and cancellation semantics simple.
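As a rough illustration of the "one tool, discriminated union" shape, the sketch below models four of the named actions. The discriminant field name (action), the per-action parameters (url, selector, text), and the describeAction helper are assumptions for illustration; the actual parameters and return shapes are defined in Protocol.

```typescript
// Sketch only: field names and parameters are illustrative, not the real protocol.

type BrowserAction =
  | { action: "navigate"; url: string }
  | { action: "click"; selector: string }
  | { action: "type"; selector: string; text: string }
  | { action: "screenshot" };

// The switch narrows on the discriminant, so each branch sees only its own fields.
function describeAction(input: BrowserAction): string {
  switch (input.action) {
    case "navigate":
      return `navigate to ${input.url}`;
    case "click":
      return `click ${input.selector}`;
    case "type":
      return `type ${JSON.stringify(input.text)} into ${input.selector}`;
    case "screenshot":
      return "capture screenshot";
    default: {
      // Compile-time exhaustiveness check: adding a variant without a case fails here.
      const unreachable: never = input;
      return unreachable;
    }
  }
}
```

A single tool with a tagged input keeps the tool list short for the model, while the type system still rejects a click without a selector or a navigate without a URL.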
Why the wrapper: BUA’s useful unit is a loop, not a single call. A typical “extract 3 candidates” task takes 17 browser actions. If the main agent ran that loop directly, every step would re-send the main dialogue context — cost grows with main-conversation length. The subagent gives BUA its own context so the main thread stays clean: one delegation in, one summary out.
The full action list, parameters, return shapes, and error codes live in Protocol; the subagent wrapper and context isolation are covered in Browser Subagent.
What the user sees
Four surfaces, all intentionally prominent but none of them gatekeepers:
- Chrome’s yellow debug bar — system-level, non-suppressible, shown the entire time the extension is attached.
- The extension popup — activity monitor listing every active session (domain, tab, action count), a red “Stop all” global kill switch, the connection state to the Zapvol backend, and a Show / Hide toggle for the BUA window. No Approve buttons; the popup is for watching and stopping, not for granting.
- The BUA window — when the agent opens its own tabs, they land in a dedicated minimized Chrome window so the user’s main browser keeps its focus. The user can surface it any time with one click from the popup.
- The floating ball — a small in-page indicator that renders only when the agent has an active session on this tab. Click to see the agent’s progress on this page and send a follow-up instruction. Hidden on idle tabs.
The user’s contract with the agent is legible at a glance: which site, which tab, how many actions, how to cut it off. See Session Model for multi-session semantics and the BUA window’s invariants.