High-level scraping API that manages a Chromium instance and page connection.
This is the primary interface for web scraping. It spawns a headless Chromium browser, connects to a page, and provides simple functions for navigation, JavaScript evaluation, and interaction.
Example
{:ok, scraper} = Automator.Scraper.start_link()
Automator.Scraper.navigate(scraper, "https://example.com")
title = Automator.Scraper.eval(scraper, "document.title")
# => "Example Domain"
Automator.Scraper.wait_for_selector(scraper, "h1")
Automator.Scraper.click(scraper, "a")
Automator.Scraper.screenshot(scraper, "page.png")
Automator.Scraper.stop(scraper)

Architecture
Scraper is a GenServer that owns:
- A headless Chromium process (via Automator.Chromium.spawn/0)
- A WebSocket connection to a page target (via Automator.Client)
When you call stop/1, the Chromium process is killed and the GenServer
terminates.
CDP Commands Used
| Function | CDP Method |
|---|---|
| navigate/2 | Page.navigate |
| eval/2 | Runtime.evaluate |
| click/2 | Runtime.evaluate (with document.querySelector) |
| wait_for_selector/3 | Runtime.evaluate (with MutationObserver) |
| screenshot/1, screenshot/2 | Page.captureScreenshot |
| set_cookie/4 | Network.setCookie |
For raw CDP access beyond these methods, use Automator.Client directly.
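As a rough illustration of what raw access might look like (the Automator.Client API is not documented here, so the call/3 function name and shape below are assumptions, not the real interface):

```elixir
# Hypothetical sketch only: assumes Automator.Client exposes a call/3-style
# function that sends a raw CDP command with a params map and returns the
# decoded result. Check the Automator.Client docs for the actual API.
{:ok, result} = Automator.Client.call(client, "Page.printToPDF", %{"landscape" => true})
```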
Summary
Functions
Returns a specification to start this module under a supervisor.
Clicks an element matching the CSS selector.
Evaluates JavaScript in the page context and returns the result value.
Callback implementation for GenServer.init/1.
Navigates to the given URL.
Captures a screenshot of the current page.
Captures a screenshot and writes it to the given file path.
Sets a cookie for the given domain.
Starts a new scraper by spawning Chromium and connecting to a page.
Stops the scraper, killing the Chromium process.
Waits for an element matching the CSS selector to appear in the DOM.
Functions
Returns a specification to start this module under a supervisor.
See Supervisor.
Clicks an element matching the CSS selector.
Uses document.querySelector to find the element and calls .click() on it.
Parameters
- pid - The scraper process
- selector - A CSS selector string
Returns
true if the element was found and clicked, false otherwise.
Example
Automator.Scraper.click(scraper, "button.submit")
# => true
Automator.Scraper.click(scraper, "a[href='/next']")
# => true
Evaluates JavaScript in the page context and returns the result value.
Uses Runtime.evaluate with awaitPromise: true and returnByValue: true,
so async functions and promises are awaited, and the actual value is returned
(not a RemoteObject reference).
Parameters
- pid - The scraper process
- js - The JavaScript expression to evaluate
Returns
The JavaScript result value, converted to an Elixir term.
Example
Automator.Scraper.eval(scraper, "document.title")
# => "Example Domain"
Automator.Scraper.eval(scraper, "document.querySelectorAll('a').length")
# => 1
Automator.Scraper.eval(scraper, "Array.from(document.querySelectorAll('a')).map(a => a.href)")
# => ["https://www.iana.org/domains/example"]
Callback implementation for GenServer.init/1.
Captures a screenshot of the current page.
Returns a map with a "data" key containing the base64-encoded PNG image.
See also screenshot/2 to write directly to a file.
Parameters
- pid - The scraper process
Example
%{"data" => base64} = Automator.Scraper.screenshot(scraper)
File.write!("screenshot.png", Base.decode64!(base64))
Captures a screenshot and writes it to the given file path.
Decodes the base64 PNG data and writes it directly to disk.
Parameters
- pid - The scraper process
- path - File path to write the PNG to
Example
Automator.Scraper.screenshot(scraper, "screenshot.png")
# => :ok
Sets a cookie for the given domain.
Parameters
- pid - The scraper process
- name - The cookie name
- value - The cookie value
- domain - The cookie domain (e.g., ".example.com")
Example
Automator.Scraper.set_cookie(scraper, "session", "abc123", ".example.com")
# => %{"success" => true}
Starts a new scraper by spawning Chromium and connecting to a page.
Returns {:ok, pid} where pid is the scraper process.
Example
{:ok, scraper} = Automator.Scraper.start_link()
Stops the scraper, killing the Chromium process.
Example
Automator.Scraper.stop(scraper)
Waits for an element matching the CSS selector to appear in the DOM.
Uses a MutationObserver to react immediately when the element is added,
rather than polling. Times out after the given milliseconds.
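The injected script is not reproduced in these docs, but a minimal JavaScript sketch of the MutationObserver technique described above (illustrative only, not the module's exact code) could look like:

```javascript
// Runs in the browser page context. Resolves as soon as a matching
// element exists; rejects when the timeout elapses first.
function waitForSelector(selector, timeoutMs) {
  return new Promise((resolve, reject) => {
    if (document.querySelector(selector)) return resolve();
    let timer;
    const observer = new MutationObserver(() => {
      if (document.querySelector(selector)) {
        observer.disconnect();
        clearTimeout(timer);
        resolve();
      }
    });
    timer = setTimeout(() => {
      observer.disconnect();
      reject(new Error(`selector ${selector} not found within ${timeoutMs}ms`));
    }, timeoutMs);
    observer.observe(document.documentElement, { childList: true, subtree: true });
  });
}
```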
Parameters
- pid - The scraper process
- selector - A CSS selector string
- timeout - Maximum wait time in milliseconds (default: 10,000)
Returns
- :ok - The element was found
- {:error, reason} - The element was not found within the timeout
Example
Automator.Scraper.wait_for_selector(scraper, "h1")
# => :ok
Automator.Scraper.wait_for_selector(scraper, ".dynamic-content", 5000)
# => :ok
Automator.Scraper.wait_for_selector(scraper, ".nonexistent", 1000)
# => {:error, "selector .nonexistent not found within 1000ms"}