scrapy_cloud_ex v0.1.1 ScrapyCloudEx.Endpoints.Storage.JobQ

Wraps the JobQ endpoint.

The JobQ API allows you to retrieve finished jobs from the queue.

Summary

Functions

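count(api_key, project_id, params \\ [], opts \\ [])

Counts the jobs for the specified project

list(api_key, project_id, params \\ [], opts \\ [])

Lists the jobs for the specified project, ordered from most recent to oldest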

Functions

count(api_key, project_id, params \\ [], opts \\ [])

Counts the jobs for the specified project.

The following parameters are supported in the params argument:

  • :spider - the spider name.

  • :state - return jobs with specified state. Supported values: "pending", "running", "finished", "deleted".

  • :startts - UNIX timestamp at which to begin results, in milliseconds.

  • :endts - UNIX timestamp at which to end results, in milliseconds.

  • :count - limit results by a given number of jobs.

  • :has_tag - return jobs with specified tag. May be given multiple times, and will behave as a logical OR operation among the values.

  • :lacks_tag - return jobs that lack specified tag. May be given multiple times, and will behave as a logical AND operation among the values.

The opts value is documented here.

See docs here.

Example

ScrapyCloudEx.Endpoints.Storage.JobQ.count("API_KEY", "14", state: "running", has_tag: "sometag")
# {:ok, 4}
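
The :has_tag and :lacks_tag parameters "may be given multiple times". Since params is a keyword list, and Elixir keyword lists allow repeated keys, one plausible way to write this is to repeat the key once per value. The sketch below only demonstrates the keyword-list behaviour; the tag names are made up, and passing the resulting list on to count/4 assumes the library encodes repeated keys as repeated query parameters.

```elixir
# Keyword lists may contain the same key more than once.
# Tag names here are hypothetical.
params = [
  state: "finished",
  has_tag: "nightly",
  has_tag: "retry",
  lacks_tag: "ignore"
]

# All values given for a repeated key, in order:
Keyword.get_values(params, :has_tag)
# ["nightly", "retry"]

# Assumed usage (requires a real API key and project):
# ScrapyCloudEx.Endpoints.Storage.JobQ.count("API_KEY", "14", params)
```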

list(api_key, project_id, params \\ [], opts \\ [])

Lists the jobs for the specified project, ordered from most recent to oldest.

The following parameters are supported in the params argument:

  • :format - the format to be used for returning results. Can be :json or :jl. Defaults to :json.

  • :pagination - the :count pagination parameter is supported.

  • :spider - the spider name.

  • :state - return jobs with specified state. Supported values: "pending", "running", "finished", "deleted".

  • :startts - UNIX timestamp at which to begin results, in milliseconds.

  • :endts - UNIX timestamp at which to end results, in milliseconds.

  • :start - offset of initial jobs to skip in returned results.

  • :stop - job key at which to stop showing results.

  • :key - job key for which to get job data. May be given multiple times.

  • :has_tag - return jobs with specified tag. May be given multiple times, and will behave as a logical OR operation among the values.

  • :lacks_tag - return jobs that lack specified tag. May be given multiple times, and will behave as a logical AND operation among the values.

The opts value is documented here.

See docs here.

List jobs finished between two timestamps

If you pass the startts and endts parameters, the API will return only the jobs finished between them.

ScrapyCloudEx.Endpoints.Storage.JobQ.list("API_KEY", 53, startts: 1359774955431, endts: 1359774955440)
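
Because :startts and :endts are UNIX timestamps in milliseconds, it can be convenient to derive them from DateTime values instead of writing the integers by hand. The following is a minimal sketch; JobQWindow is a hypothetical helper module and is not part of scrapy_cloud_ex.

```elixir
defmodule JobQWindow do
  # Hypothetical helper, not part of scrapy_cloud_ex.
  # Builds the :startts / :endts params from two DateTime structs.
  def params(%DateTime{} = from, %DateTime{} = to) do
    [
      startts: DateTime.to_unix(from, :millisecond),
      endts: DateTime.to_unix(to, :millisecond)
    ]
  end
end

{:ok, from, 0} = DateTime.from_iso8601("2013-02-02T03:15:55Z")
{:ok, to, 0} = DateTime.from_iso8601("2013-02-02T04:15:55Z")

window = JobQWindow.params(from, to)
# [startts: 1359774955000, endts: 1359778555000]

# Assumed usage (requires a real API key and project):
# ScrapyCloudEx.Endpoints.Storage.JobQ.list("API_KEY", 53, window)
```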

Retrieve jobs finished after some job

JobQ returns the list of jobs with the most recently finished first. It is recommended to store the key of the most recently finished job alongside the downloaded data. When you want to update your data later on, you can list the jobs again and stop at the previously downloaded job by passing its key in the :stop parameter.

ScrapyCloudEx.Endpoints.Storage.JobQ.list("API_KEY", 53, stop: "53/7/81")
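
The incremental update described above can be sketched as a small checkpointing helper. JobQSync is a hypothetical module, and the job maps are trimmed to the "key" field for brevity. Whether the job named by :stop is itself included in the response is not stated here, so the sketch also filters on the client side, which works in either case.

```elixir
defmodule JobQSync do
  # Hypothetical helper, not part of scrapy_cloud_ex.

  # Jobs arrive most recently finished first, so everything before the
  # last key we already processed is new.
  def new_jobs(jobs, last_seen_key) do
    Enum.take_while(jobs, fn job -> job["key"] != last_seen_key end)
  end

  # The key to store for the next run: the newest job in the listing,
  # or the previous checkpoint when the listing is empty.
  def next_checkpoint([], last_seen_key), do: last_seen_key
  def next_checkpoint([%{"key" => key} | _rest], _last_seen_key), do: key
end

# Stand-in for the list inside an {:ok, jobs} result from list/4.
jobs = [
  %{"key" => "53/7/83"},
  %{"key" => "53/7/82"},
  %{"key" => "53/7/81"}
]

fresh = JobQSync.new_jobs(jobs, "53/7/81")
checkpoint = JobQSync.next_checkpoint(jobs, "53/7/81")
```

With the sample data, fresh holds the jobs "53/7/83" and "53/7/82", and checkpoint is "53/7/83".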

Example return value

{:ok, [
  %{
    "close_reason" => "cancelled",
    "elapsed" => 485061225,
    "errors" => 1,
    "finished_time" => 1540745154657,
    "items" => 2783,
    "key" => "345675/1/26",
    "logs" => 20,
    "pages" => 2888,
    "pending_time" => 1540744974169,
    "running_time" => 1540744974190,
    "spider" => "sixbid.com",
    "state" => "finished",
    "ts" => 1540745141316,
    "version" => "5ef2169-master"
  }
]}
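
The time fields in the return value look like UNIX timestamps in milliseconds, matching the :startts and :endts parameters. Under that assumption, a finished job can be turned into a more readable summary as sketched below. JobQSummary is a hypothetical helper, and the meaning of each field is inferred from its name rather than taken from the API documentation.

```elixir
defmodule JobQSummary do
  # Hypothetical helper, not part of scrapy_cloud_ex.
  # Assumes "running_time" and "finished_time" are millisecond UNIX
  # timestamps and that both are present (i.e. the job has finished).
  def summarize(%{"key" => key, "running_time" => started, "finished_time" => finished} = job) do
    %{
      key: key,
      state: job["state"],
      finished_at: DateTime.from_unix!(finished, :millisecond),
      # Time between the job starting to run and finishing.
      run_ms: finished - started
    }
  end
end

job = %{
  "key" => "345675/1/26",
  "state" => "finished",
  "running_time" => 1540744974190,
  "finished_time" => 1540745154657
}

summary = JobQSummary.summarize(job)
# summary.run_ms is 180467, i.e. roughly three minutes
```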