Kreuzcrawl (kreuzcrawl v0.1.1)


High-level API for kreuzcrawl.

Summary

Functions

batch_crawl_async(engine, urls)
Crawl multiple seed URLs concurrently, each following links to the configured depth.

batch_scrape_async(engine, urls)
Scrape multiple URLs concurrently.

browserconfig_default()
Returns the default browser configuration.

crawl_async(engine, url)
Crawl a website starting from url, following links up to the configured depth.

crawlconfig_default()
Returns the default crawl configuration.

crawlconfig_validate(obj)
Validate the configuration, returning an error if any values are invalid.

crawlresult_unique_normalized_urls(obj)
Returns the count of unique normalized URLs encountered during crawling.

create_engine()
Create a new crawl engine with the default configuration.

create_engine(config)
Create a new crawl engine with the given configuration.

map_urls_async(engine, url)
Discover all pages on a website by following links and sitemaps.

scrape_async(engine, url)
Scrape a single URL, returning extracted page data.

Functions

batch_crawl_async(engine, urls)

@spec batch_crawl_async(reference(), [String.t()]) :: [String.t() | nil]

Crawl multiple seed URLs concurrently, each following links to the configured depth.
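A minimal usage sketch. The module name Kreuzcrawl is an assumption, as is the shape of the results (the String.t() return values are assumed here to be serialized page data):

```elixir
# Assumes the Kreuzcrawl module exposes the functions documented here.
{:ok, engine} = Kreuzcrawl.create_engine()

results =
  Kreuzcrawl.batch_crawl_async(engine, [
    "https://example.com",
    "https://example.org"
  ])

# One entry per seed URL; nil marks a seed that could not be crawled.
Enum.each(results, fn
  nil -> IO.puts("crawl failed")
  data -> IO.puts("crawled #{byte_size(data)} bytes")
end)
```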

batch_scrape_async(engine, urls)

@spec batch_scrape_async(reference(), [String.t()]) :: [String.t() | nil]

Scrape multiple URLs concurrently.

browserconfig_default()

@spec browserconfig_default() :: String.t() | nil

Returns the default browser configuration.

crawl_async(engine, url)

@spec crawl_async(reference(), String.t()) ::
  {:ok, String.t() | nil} | {:error, String.t()}

Crawl a website starting from url, following links up to the configured depth.
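A sketch of a single crawl, assuming the module name Kreuzcrawl and that the {:ok, nil} branch means no result was produced:

```elixir
{:ok, engine} = Kreuzcrawl.create_engine()

case Kreuzcrawl.crawl_async(engine, "https://example.com") do
  {:ok, nil} -> IO.puts("no result")
  {:ok, result} -> IO.puts("crawl result: #{String.slice(result, 0, 80)}...")
  {:error, reason} -> IO.puts("crawl failed: #{reason}")
end
```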

crawlconfig_default()

@spec crawlconfig_default() :: String.t() | nil

Returns the default crawl configuration.

crawlconfig_validate(obj)

@spec crawlconfig_validate(map()) :: {:ok, nil} | {:error, String.t()}

Validate the configuration, returning an error if any values are invalid.
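A sketch of validating a configuration map before use. The config keys shown are hypothetical; consult crawlconfig_default/0 for the actual shape:

```elixir
# Hypothetical keys, for illustration only.
config = %{"max_depth" => 2, "max_pages" => 100}

case Kreuzcrawl.crawlconfig_validate(config) do
  {:ok, nil} -> IO.puts("config is valid")
  {:error, reason} -> IO.puts("invalid config: #{reason}")
end
```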

crawlresult_unique_normalized_urls(obj)

@spec crawlresult_unique_normalized_urls(map()) :: non_neg_integer()

Returns the count of unique normalized URLs encountered during crawling.

create_engine()

@spec create_engine() :: {:ok, reference()} | {:error, String.t()}

Create a new crawl engine with the default configuration.

create_engine(config)

@spec create_engine(String.t() | nil) ::
  {:ok, reference()} | {:error, String.t()}

Create a new crawl engine with the given configuration.
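A sketch of both ways to create an engine. It is an assumption here that the String.t() argument is a serialized configuration such as the one returned by crawlconfig_default/0:

```elixir
# Passing nil falls back to the default configuration.
{:ok, default_engine} = Kreuzcrawl.create_engine(nil)

# Assumed: the string form is a serialized config, e.g. from
# crawlconfig_default/0.
config = Kreuzcrawl.crawlconfig_default()
{:ok, engine} = Kreuzcrawl.create_engine(config)
```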

map_urls_async(engine, url)

@spec map_urls_async(reference(), String.t()) ::
  {:ok, String.t() | nil} | {:error, String.t()}

Discover all pages on a website by following links and sitemaps.

scrape_async(engine, url)

@spec scrape_async(reference(), String.t()) ::
  {:ok, String.t() | nil} | {:error, String.t()}

Scrape a single URL, returning extracted page data.
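A sketch of scraping one page, assuming the module name Kreuzcrawl and that the returned string is serialized page data:

```elixir
{:ok, engine} = Kreuzcrawl.create_engine()

case Kreuzcrawl.scrape_async(engine, "https://example.com") do
  {:ok, nil} -> IO.puts("nothing extracted")
  {:ok, page} -> IO.puts("scraped #{byte_size(page)} bytes")
  {:error, reason} -> IO.puts("scrape failed: #{reason}")
end
```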