CDPEx (CDPEx v0.5.0)


OTP-native Chrome DevTools Protocol (CDP) browser automation for Elixir.

CDPEx launches a headless Chrome process and drives it directly over the Chrome DevTools Protocol on a Mint.WebSocket connection — no ChromeDriver and no Node.js. Browsers and their WebSocket connections are supervised processes, so a Chrome crash surfaces to callers as {:error, reason} rather than a hung session.

This module is the high-level facade. See CDPEx.Page for page operations.

Example

{:ok, browser} = CDPEx.launch()
{:ok, page} = CDPEx.new_page(browser)
{:ok, _page} = CDPEx.Page.navigate(page, "https://example.com")
{:ok, html} = CDPEx.Page.html(page)
:ok = CDPEx.stop(browser)

Or, resource-safe, with with_page/3:

CDPEx.with_page([], fn page ->
  {:ok, _} = CDPEx.Page.navigate(page, "https://example.com")
  CDPEx.Page.html(page)
end)

Observability is via :telemetry — see CDPEx.Telemetry for the event taxonomy (launch / navigate spans, page open/close, and error events). Silent by default.

Status

Pages default to one WebSocket each (strong crash isolation); opt into sessionId multiplexing (many pages over the single browser socket) with new_page(browser, transport: :session), trading isolation for fewer sockets. Connection pooling, network interception, and stealth remain out of scope.

Summary

Types

error_reason()
The reason shapes that appear in {:error, reason} across CDPEx.

Functions

close_page(browser, page)
Closes a page opened with new_page/2.

launch(opts \\ [])
Launches a headless Chrome browser and returns {:ok, pid} with the browser process pid on success.

new_page(browser, opts \\ [])
Opens a new page. See CDPEx.Browser.new_page/2 for options.

stop(browser)
Stops a browser started with launch/1, closing all pages and killing Chrome.

with_page(browser_or_opts, fun, opts \\ [])
Runs fun with a fresh page, guaranteeing the page (and, when given launch options, the browser) is cleaned up afterwards, even if fun raises.

Types

error_reason()

@type error_reason() ::
  CDPEx.Connection.call_error()
  | CDPEx.Chrome.launch_error()
  | :timeout
  | :unknown_page
  | :already_authenticated
  | :already_intercepting
  | {:timeout, :await_event}
  | {:conflict, :authenticated | :intercepting}
  | {:navigate, String.t()}
  | {:no_document_response, String.t()}
  | {:selector_not_found, String.t()}
  | {:evaluate_exception, term()}
  | {:unexpected_evaluate, term()}
  | {:invalid_args, term()}
  | {:invalid_source, term()}
  | {:invalid_error_reason, term()}
  | {:invalid_transport, term()}
  | {:unsupported_transport, term()}
  | {:invalid_response_body, String.t()}
  | {:invalid_pdf_data, String.t()}
  | {:invalid_screenshot_data, String.t()}
  | {:write_failed, term()}

The reason shapes that appear in {:error, reason} across CDPEx.

Error reasons are part of the public contract — pattern-match the tagged kinds ({:cdp_error, …}, {:timeout, …}, {:ws_closed, …}, …); their payloads (a CDP method, an exit status, a stderr/contents excerpt) are open and may gain detail.

The only bare, context-free reasons are :noproc, the high-level :timeout, :unknown_page, :already_authenticated, and :already_intercepting — self-describing control-flow outcomes with no payload to carry, the way GenServer uses :noproc. Validation failures that do have offending data to surface are tagged instead ({:invalid_response_body, excerpt}, {:invalid_pdf_data, excerpt}, {:invalid_screenshot_data, excerpt}).

Only part of this union is machine-checked: CDPEx.Connection.call_error/0 and CDPEx.Chrome.launch_error/0 are precisely specced on call/5 / launch/1, so Dialyzer catches a shape change in those at the source. The remaining members — the page-level tagged kinds and bare atoms — are hand-maintained documentation: the union itself is referenced by no @spec, so it is best-effort and not closed (kinds such as {:cdp_error, method, payload} also wrap arbitrary CDP data, and a renamed page-level producer would drift silently).

Two timeout shapes, by layer: the low-level CDPEx.Connection.call/5 and await_event/4 return {:timeout, context} (a CDP method, or :await_event), while the high-level CDPEx.Page wait_for_* functions and CDPEx.Pool.checkout/2 return a bare :timeout ("the awaited condition didn't happen in time").

A WebSocket frame that fails to decode is not a standalone reason: the connection stops on the decode failure, so callers observe it nested, as {:ws_closed, {:ws_decode, _}}.
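
As an illustration of matching on the tagged kinds described above, the following is a minimal sketch of a classifier over error reasons. The module name ErrorTriage and the category atoms it returns are hypothetical and not part of CDPEx; only the reason shapes come from this page. Because the union is not closed, the sketch keeps a catch-all clause.

```elixir
defmodule ErrorTriage do
  @moduledoc false
  # Hypothetical helper: maps a CDPEx error reason to a coarse category.
  # The category atoms are invented for this sketch.

  # Low-level timeout: context is a CDP method or :await_event.
  def classify({:timeout, _context}), do: :timeout
  # High-level bare timeout ("the awaited condition didn't happen in time").
  def classify(:timeout), do: :timeout
  # A frame that failed to decode is observed nested inside :ws_closed,
  # so this clause must come before the general :ws_closed clause.
  def classify({:ws_closed, {:ws_decode, _detail}}), do: :bad_frame
  def classify({:ws_closed, _reason}), do: :disconnected
  def classify(:noproc), do: :disconnected
  def classify({:cdp_error, _method, _payload}), do: :protocol_error
  def classify({:selector_not_found, _selector}), do: :caller_error
  def classify(:unknown_page), do: :caller_error
  # The union is documented as open, so unknown reasons need a fallback.
  def classify(_other), do: :unknown
end
```

A caller might use it as `case CDPEx.Page.html(page) do {:error, reason} -> ErrorTriage.classify(reason); ok -> ok end`, matching on the tag and ignoring the payload, which may gain detail over time.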

Functions

close_page(browser, page)

@spec close_page(pid(), CDPEx.Page.t()) :: :ok | {:error, :unknown_page}

Closes a page opened with new_page/2.

Returns {:error, :unknown_page} if page was not opened on browser.
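
When with_page/3 does not fit, a page can be opened and closed by hand. The following is a minimal sketch that uses only the calls shown on this page (new_page/2, close_page/2, CDPEx.Page.navigate/2, CDPEx.Page.html/1). The module name ManualPage is hypothetical, and the sketch assumes that these calls return {:error, reason} on failure.

```elixir
defmodule ManualPage do
  @moduledoc false
  # Hypothetical helper: opens a page on an existing browser, returns the HTML
  # of `url`, and closes the page afterwards even if the body raises.
  def fetch_html(browser, url) do
    with {:ok, page} <- CDPEx.new_page(browser) do
      try do
        with {:ok, _page} <- CDPEx.Page.navigate(page, url) do
          CDPEx.Page.html(page)
        end
      after
        # close_page/2 returns :ok | {:error, :unknown_page}; the result is
        # deliberately ignored here because the page is being discarded anyway.
        _ = CDPEx.close_page(browser, page)
      end
    end
  end
end
```

Any non-matching result from new_page/2 or navigate/2 falls through the `with` unchanged, so the caller sees the original error tuple.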

launch(opts \\ [])

@spec launch(keyword()) :: GenServer.on_start()

Launches a headless Chrome browser and returns {:ok, pid} with the browser process pid on success.

Accepts the launch options documented in CDPEx.Chrome (e.g. :headless, :chrome_binary, :extra_args, :window_size, :launch_timeout). On slow cold-start hosts (e.g. headless Chrome in a constrained container), increase :launch_timeout; it is a ceiling, not a fixed wait. For long-lived use, prefer putting CDPEx.Browser under your own supervisor with a :shutdown timeout.
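
For the supervised, long-lived case, a child specification along the following lines may be a reasonable starting point. This is a sketch under stated assumptions: it assumes CDPEx.Browser exposes start_link/1 taking the launch options, which this page does not confirm, and the 10_000 ms shutdown value and the MyApp.BrowserSupervisor name are placeholders.

```elixir
defmodule MyApp.BrowserSupervisor do
  @moduledoc false
  use Supervisor

  def start_link(launch_opts \\ []) do
    Supervisor.start_link(__MODULE__, launch_opts, name: __MODULE__)
  end

  @impl true
  def init(launch_opts) do
    Supervisor.init([browser_child_spec(launch_opts)], strategy: :one_for_one)
  end

  # ASSUMPTION: CDPEx.Browser.start_link/1 exists and accepts launch options.
  # Check the CDPEx.Browser documentation before relying on this.
  def browser_child_spec(launch_opts) do
    %{
      id: CDPEx.Browser,
      start: {CDPEx.Browser, :start_link, [launch_opts]},
      # Placeholder: time (in ms) the supervisor waits for the browser
      # to shut down before killing it.
      shutdown: 10_000,
      restart: :permanent,
      type: :worker
    }
  end
end
```

An explicit map is used so that the :shutdown value is visible in one place; if CDPEx.Browser defines its own child_spec/1, Supervisor.child_spec/2 with a shutdown override would be an alternative.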

new_page(browser, opts \\ [])

@spec new_page(
  pid(),
  keyword()
) :: {:ok, CDPEx.Page.t()} | {:error, term()}

Opens a new page. See CDPEx.Browser.new_page/2 for options.

stop(browser)

@spec stop(pid()) :: :ok

Stops a browser started with launch/1, closing all pages and killing Chrome.

with_page(browser_or_opts, fun, opts \\ [])

@spec with_page(pid() | keyword(), (CDPEx.Page.t() -> result), keyword()) ::
  result | {:error, term()}
when result: var

Runs fun with a fresh page, guaranteeing the page (and, when given launch options, the browser) is cleaned up afterwards — even if fun raises.

Pass an existing browser pid to reuse it, or a keyword list of launch options to spin up a throwaway browser for the duration of the call. Returns whatever fun returns, or {:error, reason} if the page/browser could not be created.

With launch options, the throwaway browser is linked but contained: if it crashes during the call (e.g. its connection drops), with_page returns {:error, reason} instead of letting the crash propagate to the caller. To do that, it traps exits in the calling process for the duration of the call. Only the browser's own {:EXIT, _, _} message is drained. If a foreign process linked to the caller exits during this window, its exit is delivered as a message and left in the caller's mailbox, so a caller that links other processes and relies on un-trapped exit propagation should pass a pre-launched browser pid instead. On slow cold-start hosts, increase :launch_timeout (a ceiling, not a fixed wait).

# against an existing browser
CDPEx.with_page(browser, fn page ->
  {:ok, _} = CDPEx.Page.navigate(page, "https://example.com")
  CDPEx.Page.html(page)
end)

# throwaway browser + page
CDPEx.with_page([headless: true], &CDPEx.Page.html/1)
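
The exit-trapping caveat above follows from general OTP behaviour and can be reproduced without CDPEx. The sketch below is not CDPEx's implementation; the TrapExitDemo module is hypothetical and only shows that, while exits are trapped, a linked process's abnormal exit arrives as an {:EXIT, pid, reason} message instead of terminating the caller.

```elixir
defmodule TrapExitDemo do
  @moduledoc false

  # Runs `fun` with exits trapped, then restores the previous flag value.
  def with_trapped_exits(fun) do
    previous = Process.flag(:trap_exit, true)

    try do
      fun.()
    after
      Process.flag(:trap_exit, previous)
    end
  end

  # Links a "foreign" process that exits abnormally inside the window and
  # reads the resulting message to show where the exit ends up.
  def demo do
    with_trapped_exits(fn ->
      foreign = spawn_link(fn -> exit(:boom) end)

      receive do
        {:EXIT, ^foreign, reason} -> {:exit_message, reason}
      after
        1_000 -> :no_message
      end
    end)
  end
end
```

Here demo/0 reads the message itself; a function that drains only its own child's exit would instead leave such a message in the caller's mailbox, which is the situation the paragraph above warns about.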