Kreuzcrawl (kreuzcrawl v0.3.0-rc.43)

Copy Markdown

High-level API for kreuzcrawl

Summary

Functions

Crawl multiple seed URLs concurrently, each following links to configured depth.

Streaming batch_crawl_stream — returns an Enumerable of decoded chunk maps.

Scrape multiple URLs concurrently.

Crawl a website starting from url, following links up to the configured depth.

Streaming crawl_stream — returns an Enumerable of decoded chunk maps.

Create a new crawl engine with the given configuration.

Create a new crawl engine with the given configuration.

Convert markdown links to numbered citations.

Execute browser actions on a single page.

Discover all pages on a website by following links and sitemaps.

Scrape a single URL, returning extracted page data.

Functions

batch_crawl_async(engine, urls)

@spec batch_crawl_async(reference(), [String.t()]) ::
  {:ok, map()} | {:error, atom(), String.t()}

Crawl multiple seed URLs concurrently, each following links to configured depth.

batch_crawl_stream(client, req)

Streaming batch_crawl_stream — returns an Enumerable of decoded chunk maps.

batch_scrape_async(engine, urls)

@spec batch_scrape_async(reference(), [String.t()]) ::
  {:ok, map()} | {:error, atom(), String.t()}

Scrape multiple URLs concurrently.

crawl_async(engine, url)

@spec crawl_async(reference(), String.t()) ::
  {:ok, map()} | {:error, atom(), String.t()}

Crawl a website starting from url, following links up to the configured depth.

crawl_stream(client, req)

Streaming crawl_stream — returns an Enumerable of decoded chunk maps.

create_engine()

@spec create_engine() :: {:ok, reference()} | {:error, atom(), String.t()}

Create a new crawl engine with the given configuration.

create_engine(config)

@spec create_engine(String.t() | nil) ::
  {:ok, reference()} | {:error, atom(), String.t()}

Create a new crawl engine with the given configuration.

generate_citations(markdown)

@spec generate_citations(String.t()) :: map()

Convert markdown links to numbered citations.

interact_async(engine, url, actions)

@spec interact_async(reference(), String.t(), [map()]) ::
  {:ok, map()} | {:error, atom(), String.t()}

Execute browser actions on a single page.

map_urls_async(engine, url)

@spec map_urls_async(reference(), String.t()) ::
  {:ok, map()} | {:error, atom(), String.t()}

Discover all pages on a website by following links and sitemaps.

scrape_async(engine, url)

@spec scrape_async(reference(), String.t()) ::
  {:ok, map()} | {:error, atom(), String.t()}

Scrape a single URL, returning extracted page data.