Kreuzcrawl.DownloadedDocument (kreuzcrawl v0.3.0-rc.37)

A downloaded non-HTML document (PDF, DOCX, image, code file, etc.).

When the crawler encounters non-HTML content and download_documents is enabled, it downloads the raw bytes and populates this struct instead of skipping the resource.

Types

t()

@type t() :: %Kreuzcrawl.DownloadedDocument{
  content_hash: String.t() | nil,
  filename: String.t() | nil,
  headers: map(),
  mime_type: String.t() | nil,
  size: non_neg_integer(),
  url: String.t() | nil
}
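
As a rough illustration of working with these fields, the sketch below pattern matches on the struct and falls back gracefully when the nil-able fields are missing. To stay runnable without the library it uses a stand-in struct, Sketch.DownloadedDocument, that only mirrors the field names in the typespec above; the helpers summary/1 and pdf?/1 are hypothetical and not part of kreuzcrawl. It also assumes size is a byte count, which the typespec does not state explicitly.

```elixir
# Stand-in for illustration only: mirrors the fields of
# %Kreuzcrawl.DownloadedDocument{} so the sketch runs without the library.
defmodule Sketch.DownloadedDocument do
  defstruct content_hash: nil,
            filename: nil,
            headers: %{},
            mime_type: nil,
            size: 0,
            url: nil
end

defmodule Sketch.Describe do
  alias Sketch.DownloadedDocument

  # Hypothetical helper: one-line summary with fallbacks for nil-able fields.
  def summary(%DownloadedDocument{} = doc) do
    name = doc.filename || doc.url || "(unnamed)"
    type = doc.mime_type || "unknown type"
    "#{name} (#{type}, #{doc.size} bytes)"
  end

  # Hypothetical helper: true only when mime_type is a binary starting
  # with "application/pdf"; a nil mime_type falls through to false.
  def pdf?(%DownloadedDocument{mime_type: "application/pdf" <> _}), do: true
  def pdf?(%DownloadedDocument{}), do: false
end

# Build the stand-in at runtime with struct!/2 and print its summary.
doc =
  struct!(Sketch.DownloadedDocument,
    filename: "report.pdf",
    mime_type: "application/pdf",
    size: 1024
  )

IO.puts(Sketch.Describe.summary(doc))
```

With the real library, the same function heads would presumably match on %Kreuzcrawl.DownloadedDocument{} instead of the stand-in. Because content_hash, filename, mime_type, and url may all be nil, matching on the struct and supplying explicit fallbacks avoids surprises when a server omits the corresponding metadata.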