Automator.Scraper (automator v0.1.2)


High-level scraping API that manages a Chromium instance and page connection.

This is the primary interface for web scraping. It spawns a headless Chromium browser, connects to a page, and provides simple functions for navigation, JavaScript evaluation, and interaction.

Example

{:ok, scraper} = Automator.Scraper.start_link()

Automator.Scraper.navigate(scraper, "https://example.com")
title = Automator.Scraper.eval(scraper, "document.title")
# => "Example Domain"

Automator.Scraper.wait_for_selector(scraper, "h1")
Automator.Scraper.click(scraper, "a")
Automator.Scraper.screenshot(scraper, "page.png")

Automator.Scraper.stop(scraper)

Architecture

Scraper is a GenServer that owns:

  1. A headless Chromium process (via Automator.Chromium.spawn/0)
  2. A WebSocket connection to a page target (via Automator.Client)

When you call stop/1, the Chromium process is killed and the GenServer terminates.
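
Because the scraper is an ordinary GenServer, standard OTP process tooling applies. A sketch of monitoring it so a caller notices when the scraper (and the browser it owns) goes away; this requires a Chromium binary that Automator.Chromium can spawn:

```elixir
# Monitor the scraper process: when it terminates, its Chromium
# process has been killed along with it (per the lifecycle above).
{:ok, scraper} = Automator.Scraper.start_link()
ref = Process.monitor(scraper)

Automator.Scraper.stop(scraper)

receive do
  {:DOWN, ^ref, :process, ^scraper, _reason} -> :browser_gone
end
```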

CDP Commands Used

Function                      CDP Method
navigate/2                    Page.navigate
eval/2                        Runtime.evaluate
click/2                       Runtime.evaluate (with document.querySelector)
wait_for_selector/3           Runtime.evaluate (with a MutationObserver)
screenshot/1, screenshot/2    Page.captureScreenshot
set_cookie/4                  Network.setCookie

For raw CDP access beyond these methods, use Automator.Client directly.

Summary

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

click(pid, selector)

Clicks an element matching the CSS selector.

eval(pid, js)

Evaluates JavaScript in the page context and returns the result value.

init(args)

Callback implementation for GenServer.init/1.

navigate(pid, url)

Navigates to the given URL.

screenshot(pid)

Captures a screenshot of the current page.

screenshot(pid, path)

Captures a screenshot and writes it to the given file path.

set_cookie(pid, name, value, domain)

Sets a cookie for the given domain.

start_link(opts \\ [])

Starts a new scraper by spawning Chromium and connecting to a page.

stop(pid)

Stops the scraper, killing the Chromium process.

wait_for_selector(pid, selector, timeout \\ 10000)

Waits for an element matching the CSS selector to appear in the DOM.

Types

t()

@type t() :: %Automator.Scraper{browser: Automator.Chromium.t(), client: pid()}

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.
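
A sketch of starting the scraper under a supervision tree; the keyword list in the child tuple is forwarded to start_link/1:

```elixir
# If the scraper crashes, the supervisor restarts it, which spawns
# a fresh Chromium instance and reconnects to a page.
children = [
  {Automator.Scraper, []}
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)
```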

click(pid, selector)

Clicks an element matching the CSS selector.

Uses document.querySelector to find the element and calls .click() on it.

Parameters

  • pid - The scraper process
  • selector - A CSS selector string

Returns

true if the element was found and clicked, false otherwise.

Example

Automator.Scraper.click(scraper, "button.submit")
# => true

Automator.Scraper.click(scraper, "a[href='/next']")
# => true

eval(pid, js)

Evaluates JavaScript in the page context and returns the result value.

Uses Runtime.evaluate with awaitPromise: true and returnByValue: true, so async functions and promises are awaited, and the actual value is returned (not a RemoteObject reference).

Parameters

  • pid - The scraper process
  • js - The JavaScript expression to evaluate

Returns

The JavaScript result value, converted to an Elixir term.

Example

Automator.Scraper.eval(scraper, "document.title")
# => "Example Domain"

Automator.Scraper.eval(scraper, "document.querySelectorAll('a').length")
# => 1

Automator.Scraper.eval(scraper, "Array.from(document.querySelectorAll('a')).map(a => a.href)")
# => ["https://www.iana.org/domains/example"]
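
Because awaitPromise is set, promise-returning expressions resolve before the call returns. A sketch (the URL is purely illustrative):

```elixir
# fetch/1 returns a promise; Runtime.evaluate awaits it, so eval/2
# returns the resolved status code rather than a pending promise.
status = Automator.Scraper.eval(scraper, "fetch('https://example.com').then(r => r.status)")
```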

init(args)

Callback implementation for GenServer.init/1.

navigate(pid, url)

Navigates to the given URL.

Parameters

  • pid - The scraper process
  • url - The URL to navigate to

Example

Automator.Scraper.navigate(scraper, "https://example.com")

screenshot(pid)

Captures a screenshot of the current page.

Returns a map with a "data" key containing the base64-encoded PNG image. See also screenshot/2 to write directly to a file.

Parameters

  • pid - The scraper process

Example

%{"data" => base64} = Automator.Scraper.screenshot(scraper)
File.write!("screenshot.png", Base.decode64!(base64))

screenshot(pid, path)

Captures a screenshot and writes it to the given file path.

Decodes the base64 PNG data and writes it directly to disk.

Parameters

  • pid - The scraper process
  • path - File path to write the PNG to

Example

Automator.Scraper.screenshot(scraper, "screenshot.png")
# => :ok

set_cookie(pid, name, value, domain)

Sets a cookie for the given domain.

Parameters

  • pid - The scraper process
  • name - The cookie name
  • value - The cookie value
  • domain - The cookie domain (e.g., ".example.com")

Example

Automator.Scraper.set_cookie(scraper, "session", "abc123", ".example.com")
# => %{"success" => true}

start_link(opts \\ [])

Starts a new scraper by spawning Chromium and connecting to a page.

Returns {:ok, pid} where pid is the scraper process.

Example

{:ok, scraper} = Automator.Scraper.start_link()

stop(pid)

Stops the scraper, killing the Chromium process.

Example

Automator.Scraper.stop(scraper)

wait_for_selector(pid, selector, timeout \\ 10000)

Waits for an element matching the CSS selector to appear in the DOM.

Uses a MutationObserver to react immediately when the element is added, rather than polling the DOM. Times out after the given number of milliseconds.

Parameters

  • pid - The scraper process
  • selector - A CSS selector string
  • timeout - Maximum wait time in milliseconds (default: 10,000)

Returns

  • :ok - The element was found
  • {:error, reason} - The element was not found within the timeout

Example

Automator.Scraper.wait_for_selector(scraper, "h1")
# => :ok

Automator.Scraper.wait_for_selector(scraper, ".dynamic-content", 5000)
# => :ok

Automator.Scraper.wait_for_selector(scraper, ".nonexistent", 1000)
# => {:error, "selector .nonexistent not found within 1000ms"}
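
The functions above compose into a typical wait-then-extract flow; a sketch using only functions documented on this page:

```elixir
# Start a browser, load a page, wait for content, then pull data out.
{:ok, scraper} = Automator.Scraper.start_link()
Automator.Scraper.navigate(scraper, "https://example.com")

links =
  case Automator.Scraper.wait_for_selector(scraper, "a", 5_000) do
    :ok ->
      Automator.Scraper.eval(
        scraper,
        "Array.from(document.querySelectorAll('a')).map(a => a.href)"
      )

    {:error, _reason} ->
      []
  end

Automator.Scraper.stop(scraper)
links
```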