Chrome DevTools Protocol (CDP) scraper for Elixir. Spawn headless Chromium, navigate pages, evaluate JavaScript, and extract data — all through a clean, idiomatic Elixir API.

Installation

Add :automator to your dependencies:

def deps do
  [
    {:automator, "~> 0.1.0"}
  ]
end

Requires Chromium installed and available on PATH as chromium.

Quick Start

# Start a scraper (spawns Chromium + connects automatically)
{:ok, scraper} = Automator.Scraper.start_link()

# Navigate to a page
Automator.Scraper.navigate(scraper, "https://example.com")

# Evaluate JavaScript
title = Automator.Scraper.eval(scraper, "document.title")
# => "Example Domain"

# Wait for an element to appear
Automator.Scraper.wait_for_selector(scraper, "h1")

# Click an element
Automator.Scraper.click(scraper, "a")

# Take a screenshot (returns base64)
%{"data" => base64} = Automator.Scraper.screenshot(scraper)
File.write!("page.png", Base.decode64!(base64))

# Set cookies
Automator.Scraper.set_cookie(scraper, "session", "abc123", ".example.com")

# Cleanup
Automator.Scraper.stop(scraper)

Architecture

Automator has three layers, from high-level to low-level:


  Automator.Scraper  (GenServer)            Primary API
  Manages browser + page, simple fns     

  Automator.Client   (WebSockex)            Raw CDP commands
  WebSocket JSON-RPC client              

  Automator.Chromium (Process mgmt)         Browser lifecycle
  Spawns/kills headless Chromium         

Most users only need Automator.Scraper. Use Client when you need direct access to CDP domains not exposed by the scraper. Use Chromium when you want to manage the browser lifecycle yourself.

API Reference

Automator.Scraper

High-level scraping API. A GenServer that owns a Chromium instance and a page-level WebSocket connection.

start_link/0

Spawns headless Chromium and connects to a blank page.

{:ok, scraper} = Automator.Scraper.start_link()

Returns {:ok, pid}.

navigate/2

Navigates to a URL. Waits ~1 second for the page to load before returning.

Automator.Scraper.navigate(scraper, "https://example.com")

Parameters:

ParamTypeDescription
pidpidScraper process
urlString.t()URL to navigate to

eval/2

Evaluates JavaScript in the page context. Supports async/await — promises are awaited automatically.

Automator.Scraper.eval(scraper, "document.title")
# => "Example Domain"

Automator.Scraper.eval(scraper, "document.querySelectorAll('a').length")
# => 1

Automator.Scraper.eval(scraper, "Array.from(document.querySelectorAll('a')).map(a => a.href)")
# => ["https://www.iana.org/domains/example"]

# Async example
Automator.Scraper.eval(scraper, """
  await fetch('/api/data').then(r => r.json())
""")

Parameters:

ParamTypeDescription
pidpidScraper process
jsString.t()JavaScript expression

Returns: The JavaScript result value, converted to an Elixir term.

click/2

Clicks an element matching a CSS selector.

Automator.Scraper.click(scraper, "button.submit")
# => true

Automator.Scraper.click(scraper, ".nonexistent")
# => false

Parameters:

ParamTypeDescription
pidpidScraper process
selectorString.t()CSS selector

Returns: true if element found and clicked, false otherwise.

wait_for_selector/3

Waits for an element to appear in the DOM using a MutationObserver (not polling).

Automator.Scraper.wait_for_selector(scraper, "h1")
# => :ok

Automator.Scraper.wait_for_selector(scraper, ".dynamic-content", 5000)
# => :ok

Automator.Scraper.wait_for_selector(scraper, ".nonexistent", 1000)
# => {:error, "selector .nonexistent not found within 1000ms"}

Parameters:

ParamTypeDefaultDescription
pidpidScraper process
selectorString.t()CSS selector
timeoutinteger()10_000Max wait time in ms

Returns: :ok or {:error, reason}.

screenshot/1

Captures a screenshot of the current page as a base64-encoded PNG.

%{"data" => base64} = Automator.Scraper.screenshot(scraper)
File.write!("screenshot.png", Base.decode64!(base64))

Parameters:

ParamTypeDescription
pidpidScraper process

Returns: %{"data" => base64_string}.

set_cookie/4

Sets a cookie for the given domain.

Automator.Scraper.set_cookie(scraper, "session", "abc123", ".example.com")
# => %{"success" => true}

Parameters:

ParamTypeDescription
pidpidScraper process
nameString.t()Cookie name
valueString.t()Cookie value
domainString.t()Cookie domain (e.g., ".example.com")

stop/1

Stops the scraper and kills the Chromium process.

Automator.Scraper.stop(scraper)
# => :ok

Automator.Chromium

Low-level browser process management. Use this when you want to manage the Chromium lifecycle yourself and connect multiple clients.

spawn/0

Launches headless Chromium on an available port.

browser = Automator.Chromium.spawn()
# => %{
#   chromium: #Port<0.5>,
#   os_pid: 12345,
#   port: 9222,
#   ws_url: "ws://localhost:9222/devtools/browser/..."
# }

Flags used:

FlagValue
--headlessnew
--no-sandbox
--disable-gpu
--window-size1920,1080
--remote-debugging-portauto-detected

Returns: A map with :chromium (port ref), :os_pid, :port, and :ws_url.

kill/1

Kills the Chromium process by OS PID.

browser = Automator.Chromium.spawn()
Automator.Chromium.kill(browser)

Parameters:

ParamTypeDescription
browsermap()Map returned by spawn/0

Automator.Client

Low-level WebSocket client for sending raw CDP commands. Use this when you need access to CDP domains not exposed by Scraper.

start_link/1

Connects to a Chromium WebSocket debugger URL.

{:ok, client} = Automator.Client.start_link("ws://localhost:9222/devtools/browser/...")

Parameters:

ParamTypeDescription
ws_urlString.t()WebSocket URL from Chromium.spawn().ws_url or /json endpoint

send_command/3

Sends a CDP command and blocks until the response arrives.

# Browser-level command
{:ok, result} = Automator.Client.send_command(client, "Browser.getVersion")
IO.inspect(result["product"])
# => "Chrome/145.0.7632.159"

# Page-level command
{:ok, page_client} = Automator.Client.start_link(page_ws_url)
{:ok, _} = Automator.Client.send_command(page_client, "Page.navigate", %{url: "https://example.com"})

# With parameters
{:ok, result} = Automator.Client.send_command(page_client, "Runtime.evaluate", %{
  expression: "document.title",
  returnByValue: true
})

Parameters:

ParamTypeDefaultDescription
pidpidClient process
methodString.t()CDP method name
paramsmap()%{}Command parameters

Returns: {:ok, result} or {:error, error}.

See the CDP protocol documentation for all available domains and methods.

Common Patterns

Scraping a list of items

{:ok, scraper} = Automator.Scraper.start_link()
Automator.Scraper.navigate(scraper, "https://example.com/products")

items = Automator.Scraper.eval(scraper, """
  Array.from(document.querySelectorAll('.product')).map(el => ({
    name: el.querySelector('.name').textContent,
    price: el.querySelector('.price').textContent,
    url: el.querySelector('a').href
  }))
""")

Automator.Scraper.stop(scraper)

Waiting for dynamic content

{:ok, scraper} = Automator.Scraper.start_link()
Automator.Scraper.navigate(scraper, "https://example.com")

# Wait for SPA to render
Automator.Scraper.wait_for_selector(scraper, ".app-root", 15_000)

# Interact
Automator.Scraper.click(scraper, "button.load-more")
Automator.Scraper.wait_for_selector(scraper, ".item:nth-child(20)", 10_000)

# Extract
data = Automator.Scraper.eval(scraper, "window.__INITIAL_STATE__")

Automator.Scraper.stop(scraper)

Using cookies for authenticated sessions

{:ok, scraper} = Automator.Scraper.start_link()

# Set auth cookie
Automator.Scraper.set_cookie(scraper, "auth_token", "secret", ".example.com")

# Navigate — already authenticated
Automator.Scraper.navigate(scraper, "https://example.com/dashboard")
profile = Automator.Scraper.eval(scraper, "document.querySelector('.profile').textContent")

Automator.Scraper.stop(scraper)

Raw CDP access for advanced use cases

# Start scraper for browser management
{:ok, scraper} = Automator.Scraper.start_link()
Automator.Scraper.navigate(scraper, "https://example.com")

# Access performance metrics via CDP
Automator.Scraper.eval(scraper, "performance.getEntriesByType('navigation')[0]")

# Or use Client directly for any CDP domain
# (e.g., Network, DOM, CSS, Accessibility, etc.)
Automator.Scraper.stop(scraper)

CDP Domains

Through Automator.Client.send_command/3, you have access to the full Chrome DevTools Protocol. Commonly useful domains:

DomainUse case
PageNavigation, screenshots, lifecycle events
RuntimeJavaScript evaluation, object inspection
DOMDOM tree traversal, node manipulation
NetworkRequest/response interception, cookies
CSSStylesheet inspection, computed styles
InputMouse/keyboard simulation
EmulationDevice emulation, viewport, geolocation
BrowserBrowser info, window management
TargetTab/page management

See the full CDP reference for every method.

License

MIT