scrapy_cloud_ex v0.1.0 ScrapyCloudEx.Endpoints.Storage.Requests

Wraps the Requests endpoint.

The requests API allows you to work with request and response data from your crawls.

Summary

Types

A request object

Functions

Retrieves request data for a given job

Retrieves request stats for a given job

Types

request_object()
request_object() :: %{required(String.t()) => integer() | String.t()}

A request object.

Map with the following keys:

  • "time" - request start timestamp in milliseconds (integer/0).
  • "method" - HTTP method. Defaults to "GET" (String.t/0).
  • "url" - request URL (String.t/0).
  • "status" - HTTP response code (integer/0).
  • "duration" - request duration in milliseconds (integer/0).
  • "rs" - response size in bytes (integer/0).
  • "parent" - index of the parent request (integer/0).
  • "fp" - request fingerprint (String.t/0).
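
As a minimal sketch of working with such a map, the snippet below builds a hypothetical request object (every value is invented for illustration) and reads a few of its keys. It assumes that "Defaults to "GET"" means the "method" key may be absent, which the source does not state explicitly.

```elixir
# Hypothetical request object; all values are made up for illustration.
request = %{
  "time" => 1_351_521_736_957,
  "method" => "GET",
  "url" => "http://example.com/page",
  "status" => 200,
  "duration" => 12,
  "rs" => 1024,
  "parent" => 0,
  "fp" => "1234567890abcdef"
}

# "time" is expressed in milliseconds, so the unit is passed explicitly.
started_at = DateTime.from_unix!(request["time"], :millisecond)

# Assumption: "method" may be missing, in which case "GET" applies.
method = Map.get(request, "method", "GET")

IO.puts("#{method} #{request["url"]} -> #{request["status"]}")
IO.puts("started at #{DateTime.to_iso8601(started_at)}, took #{request["duration"]} ms")
```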

Functions

get(api_key, composite_id, params \\ [], opts \\ [])

Retrieves request data for a given job.

The composite_id may have up to 4 sections: the first 3 refer to the project, spider, and job ids, and the last refers to the request number.

The following parameters are supported in the params argument:

  • :format - the format to be used for returning results. Can be :json or :jl. Defaults to :json.

  • :pagination - pagination parameters.

  • :meta - meta parameters to show.

  • :nodata - if set, no data will be returned other than specified :meta keys.

The opts value is documented here.

A warning will be logged if the composite_id has fewer than 4 sections and no pagination parameters were provided.
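
The relationship between the number of composite_id sections and the warning condition can be sketched as follows. CompositeIdSketch is a hypothetical helper written for this illustration, not part of ScrapyCloudEx, and the shape of the :pagination value (a keyword list with :count) is an assumption. The check only approximates "no pagination parameters were provided" by testing whether the key is present.

```elixir
defmodule CompositeIdSketch do
  # Hypothetical helper, not part of ScrapyCloudEx.
  @levels [:project, :spider, :job, :request]

  # Returns the level a composite id addresses, based on its section count.
  def level(composite_id) do
    count = section_count(composite_id)

    if count in 1..4 do
      {:ok, Enum.at(@levels, count - 1)}
    else
      {:error, :invalid_id}
    end
  end

  # Mirrors the documented warning condition: fewer than 4 sections
  # and no :pagination key in the params.
  def unbounded?(composite_id, params \\ []) do
    section_count(composite_id) < 4 and not Keyword.has_key?(params, :pagination)
  end

  defp section_count(composite_id) do
    composite_id |> String.split("/") |> length()
  end
end

IO.inspect(CompositeIdSketch.level("14/13/12"))
IO.inspect(CompositeIdSketch.level("14/13/12/3456"))
IO.inspect(CompositeIdSketch.unbounded?("14/13/12"))
# Assumed pagination shape; see the pagination documentation for the real keys.
IO.inspect(CompositeIdSketch.unbounded?("14/13/12", pagination: [count: 100]))
```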

See docs here and here.

Example

ScrapyCloudEx.Endpoints.Storage.Requests.get("API_KEY", "14")
ScrapyCloudEx.Endpoints.Storage.Requests.get("API_KEY", "14/13")
ScrapyCloudEx.Endpoints.Storage.Requests.get("API_KEY", "14/13/12")
ScrapyCloudEx.Endpoints.Storage.Requests.get("API_KEY", "14/13/12/3456")

stats(api_key, composite_id, opts \\ [])

Retrieves request stats for a given job.

The composite_id must have 3 sections (i.e. refer to a job).

The opts value is documented here.

The response will contain the following information:

Field                 Description
counts[field]         The number of times the field occurs.
totals.input_bytes    The total size of all requests in bytes.
totals.input_values   The total number of requests.

See docs here.

Example

ScrapyCloudEx.Endpoints.Storage.Requests.stats("API_KEY", "14/13/12")

Example return value

%{
 "counts" => %{
   "duration" => 2888,
   "fp" => 2888,
   "method" => 2888,
   "parent" => 2886,
   "rs" => 2888,
   "status" => 2888,
   "url" => 2888
 },
 "totals" => %{"input_bytes" => 374000, "input_values" => 2888}
}
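
One way to read such a value, sketched here against the example map above rather than a live response: comparing each entry in "counts" with "totals.input_values" shows which fields are absent from some requests, and dividing the byte total by the request count gives an average size. The variable names are illustrative only.

```elixir
# The example return value from above, copied verbatim.
stats = %{
  "counts" => %{
    "duration" => 2888,
    "fp" => 2888,
    "method" => 2888,
    "parent" => 2886,
    "rs" => 2888,
    "status" => 2888,
    "url" => 2888
  },
  "totals" => %{"input_bytes" => 374000, "input_values" => 2888}
}

total = get_in(stats, ["totals", "input_values"])

# Fields that occur fewer times than there are requests, with the shortfall.
missing =
  stats["counts"]
  |> Enum.filter(fn {_field, count} -> count < total end)
  |> Map.new(fn {field, count} -> {field, total - count} end)

# Average size per request in bytes.
avg_bytes = get_in(stats, ["totals", "input_bytes"]) / total

IO.inspect(missing)
IO.inspect(Float.round(avg_bytes, 1))
```

In this example only "parent" falls short of the request total, by 2.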