scrapy_cloud_ex v0.1.2 ScrapyCloudEx.Endpoints.App.Jobs

Wraps the Jobs endpoint.

The jobs API makes it easy to work with your spider’s jobs and lets you schedule, stop, update and delete them.

Summary

Types

A function to encode job settings to JSON

Functions

Retrieves job information for a given project, spider, or specific job

Types

encoder_fun()
encoder_fun() :: (term() -> {:ok, String.t()} | {:error, any()})

A function to encode job settings to JSON.

This function will be given the job settings provided to run/5 so they can be encoded into a JSON string.
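
As a minimal sketch of a function matching the encoder_fun() type, the hypothetical module below (FlatJsonEncoder is not part of the library) encodes a flat map of string keys to string or numeric values. It is illustrative only: it does not escape special characters and rejects nested values.

```elixir
defmodule FlatJsonEncoder do
  # Hypothetical helper, not part of scrapy_cloud_ex.
  # Handles only flat maps with string keys and string or numeric values;
  # strings are NOT escaped, so this is unsuitable for arbitrary input.
  @spec encode(term()) :: {:ok, String.t()} | {:error, any()}
  def encode(settings) when is_map(settings) do
    settings
    |> Enum.reduce_while([], fn
      {key, value}, acc when is_binary(key) and is_binary(value) ->
        {:cont, [~s("#{key}":"#{value}") | acc]}

      {key, value}, acc when is_binary(key) and is_number(value) ->
        {:cont, [~s("#{key}":#{value}) | acc]}

      pair, _acc ->
        # Stop at the first entry this sketch cannot encode.
        {:halt, {:error, {:unsupported, pair}}}
    end)
    |> case do
      {:error, _} = error -> error
      pairs -> {:ok, "{" <> (pairs |> Enum.reverse() |> Enum.join(",")) <> "}"}
    end
  end

  def encode(other), do: {:error, {:not_a_map, other}}
end

FlatJsonEncoder.encode(%{"SETTING1" => "value1", "SETTING2" => 2})
# {:ok, ~s({"SETTING1":"value1","SETTING2":2})}
```

Such a function would be passed as &FlatJsonEncoder.encode/1 under the :encoder key in opts. In practice, the encode function of an established JSON library with the same return shape is likely a better choice than a hand-written encoder.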

Functions

delete(api_key, project_id, job_or_jobs, opts \\ [])

Deletes one or more jobs.

Each job ID in job_or_jobs must have at least 3 sections (for example "123/1/2").
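
The section requirement can be checked locally before calling the API. The sketch below assumes sections are separated by "/", as the example IDs suggest; JobIdCheck is a hypothetical helper, not part of the library.

```elixir
defmodule JobIdCheck do
  # Hypothetical helper. Assumption: sections are "/"-separated, as in "123/1/2".
  def valid?(job_id) when is_binary(job_id) do
    sections = String.split(job_id, "/")
    # At least 3 sections, none of them empty.
    length(sections) >= 3 and Enum.all?(sections, &(&1 != ""))
  end

  def valid?(_other), do: false

  # Accepts a single ID or a list of IDs and returns {valid, invalid}.
  def split_valid(job_or_jobs) do
    job_or_jobs |> List.wrap() |> Enum.split_with(&valid?/1)
  end
end

JobIdCheck.split_valid(["123/1/1", "123/1"])
# {["123/1/1"], ["123/1"]}
```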

The opts value is documented here.

See docs here.

Example

ScrapyCloudEx.Endpoints.App.Jobs.delete("API_KEY", "123", ["123/1/1", "123/1/2"])
# {:ok, %{"count" => 2, "status" => "ok"}}

list(api_key, project_id, params \\ [], opts \\ [])

Retrieves job information for a given project, spider, or specific job.

The following parameters are supported in the params argument:

  • :format - the format to be used for returning results. Can be :json or :jl. Defaults to :json.

  • :pagination - the pagination params: a keyword list with optional :count and :offset integer values, where :count indicates the desired number of results per page and :offset the offset to retrieve specific records.

  • :job - the job id.

  • :spider - the spider name.

  • :state - return jobs with specified state. Supported values: "pending", "running", "finished", "deleted".

  • :has_tag - return jobs with specified tag. May be given multiple times, and will behave as a logical OR operation among the values.

  • :lacks_tag - return jobs that lack specified tag. May be given multiple times, and will behave as a logical AND operation among the values.
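
Parameters that "may be given multiple times" rely on keyword lists allowing duplicate keys. The sketch below builds such a params list and then models the documented OR/AND tag semantics with a local predicate. The tag values are made up for illustration, and the predicate only mirrors the description above; the real filtering happens on the server.

```elixir
params = [
  state: "finished",
  has_tag: "consumed",
  has_tag: "reviewed",
  lacks_tag: "broken",
  pagination: [count: 10, offset: 20]
]

# Duplicate keys are preserved in order; Keyword.get_values/2 returns all of them.
has = Keyword.get_values(params, :has_tag)
# ["consumed", "reviewed"]
lacks = Keyword.get_values(params, :lacks_tag)
# ["broken"]

# Local model of the documented semantics:
# a job matches if it has ANY of the has_tag values and lacks ALL of the lacks_tag values.
matches? = fn job_tags, has, lacks ->
  (has == [] or Enum.any?(has, &(&1 in job_tags))) and
    Enum.all?(lacks, &(&1 not in job_tags))
end

matches?.(["consumed", "foo"], has, lacks)
# true
matches?.(["consumed", "broken"], has, lacks)
# false
```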

The opts value is documented here.

See docs here.

Examples

# Retrieve the latest 3 finished jobs for "somespider" spider
params = [spider: "somespider", state: "finished", pagination: [count: 3]]
ScrapyCloudEx.Endpoints.App.Jobs.list("API_KEY", "123", params)

# Retrieve all running jobs
ScrapyCloudEx.Endpoints.App.Jobs.list("API_KEY", "123", state: "running")

# Retrieve 10 jobs with the tag "consumed"
ScrapyCloudEx.Endpoints.App.Jobs.list("API_KEY", "123", has_tag: "consumed", pagination: [count: 10])

Example return value

{:ok,
   %{
     "status" => "ok",
     "count" => 2,
     "total" => 2,
     "jobs" => [
       %{
         "close_reason" => "cancelled",
         "elapsed" => 124138,
         "errors_count" => 0,
         "id" => "123/1/3",
         "items_scraped" => 620,
         "logs" => 17,
         "priority" => 2,
         "responses_received" => 670,
         "spider" => "somespider",
         "spider_type" => "manual",
         "started_time" => "2018-10-03T07:06:07",
         "state" => "finished",
         "tags" => ["foo"],
         "updated_time" => "2018-10-03T07:07:42",
         "version" => "5ef3139-master"
       },
       %{
         "close_reason" => "cancelled",
         "elapsed" => 483843779,
         "errors_count" => 1,
         "id" => "123/1/2",
         "items_scraped" => 2783,
         "logs" => 20,
         "priority" => 3,
         "responses_received" => 2888,
         "spider" => "somespider",
         "spider_args" => %{"spiderarg1" => "example"},
         "spider_type" => "manual",
         "started_time" => "2018-10-23T16:42:54",
         "state" => "finished",
         "tags" => ["bar", "foo"],
         "updated_time" => "2018-10-23T16:45:54",
         "version" => "5ef3139-master"
       }
     ]
   }
 }
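
A result of this shape can be consumed with ordinary pattern matching. The sketch below uses a trimmed copy of the sample above. The {:error, reason} clause is an assumption, since the error shape is not shown in this section.

```elixir
# Trimmed copy of the example return value (only the fields used here).
response =
  {:ok,
   %{
     "status" => "ok",
     "count" => 2,
     "total" => 2,
     "jobs" => [
       %{"id" => "123/1/3", "items_scraped" => 620, "tags" => ["foo"]},
       %{"id" => "123/1/2", "items_scraped" => 2783, "tags" => ["bar", "foo"]}
     ]
   }}

summary =
  case response do
    {:ok, %{"jobs" => jobs}} ->
      %{
        ids: Enum.map(jobs, & &1["id"]),
        items_scraped: jobs |> Enum.map(& &1["items_scraped"]) |> Enum.sum()
      }

    # Assumed error shape; not documented in this section.
    {:error, reason} ->
      {:error, reason}
  end

# summary is %{ids: ["123/1/3", "123/1/2"], items_scraped: 3403}
```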

run(api_key, project_id, spider_name, params \\ [], opts \\ [])

Schedules a job for a given spider.

The following parameters are supported in the params argument:

  • :add_tag - add the specified tag to the job. May be given multiple times.

  • :job_settings - job settings to be proxied to the job. This value can be provided as a string representation of a JSON object, or as an Elixir term. If a term is provided, an accompanying encoding function (of type encoder_fun/0) must be provided with the :encoder key within opts.

  • :priority - job priority. Supports values in the 0..4 range (where 4 is highest priority). Defaults to 2.

  • :units - number of units to use for the job. Supports values in the 1..6 range.

Any other parameter will be treated as a spider argument.
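
The documented ranges for :priority and :units can be checked before scheduling. The sketch below is a hypothetical pre-flight helper (RunParams is not part of the library, and whether the library performs similar validation itself is not stated here). Unknown keys pass through untouched, since they are treated as spider arguments.

```elixir
defmodule RunParams do
  # Hypothetical helper: validates only the ranges documented above.
  def validate(params) do
    with :ok <- check_range(params, :priority, 0..4),
         :ok <- check_range(params, :units, 1..6) do
      {:ok, params}
    end
  end

  defp check_range(params, key, range) do
    case Keyword.fetch(params, key) do
      # Parameter not given: nothing to check.
      :error -> :ok
      {:ok, value} when is_integer(value) ->
        if value in range, do: :ok, else: {:error, {key, value}}
      {:ok, value} -> {:error, {key, value}}
    end
  end
end

RunParams.validate(priority: 3, units: 1, spiderarg1: "example")
# {:ok, [priority: 3, units: 1, spiderarg1: "example"]}
RunParams.validate(priority: 7)
# {:error, {:priority, 7}}
```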

The opts value is documented here.

See docs here.

Example

settings = [job_settings: ~s({ "SETTING1": "value1", "SETTING2": "value2" })]
tags = [add_tag: "sometag", add_tag: "othertag"]
params = [priority: 3, units: 1, spiderarg1: "example"] ++ tags ++ settings
ScrapyCloudEx.Endpoints.App.Jobs.run("API_KEY", "123", "somespider", params)
# {:ok, %{"jobid" => "123/1/4", "status" => "ok"}}

stop(api_key, project_id, job_or_jobs, opts \\ [])

Stops one or more running jobs.

Each job ID in job_or_jobs must have at least 3 sections (for example "123/1/2").

The opts value is documented here.

See docs here.

Example

ScrapyCloudEx.Endpoints.App.Jobs.stop("API_KEY", "123", ["123/1/1", "123/1/2"])
# {:ok, %{"status" => "ok"}}

update(api_key, project_id, job_or_jobs, params \\ [], opts \\ [])

Updates information about jobs.

Each job ID in job_or_jobs must have at least 3 sections (for example "123/1/2").

The following parameters are supported in the params argument:

  • :add_tag - add specified tag to the job(s). May be given multiple times.

  • :remove_tag - remove specified tag from the job(s). May be given multiple times.
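
When the tags come from lists rather than literals, the repeated keys can be generated. This is a small sketch using only the standard library; the variable names are illustrative.

```elixir
to_add = ["foo", "bar"]
to_remove = ["sometag"]

# Each tag becomes its own {key, value} pair, so keys repeat as needed.
params =
  Enum.map(to_add, &{:add_tag, &1}) ++
    Enum.map(to_remove, &{:remove_tag, &1})

# params is [add_tag: "foo", add_tag: "bar", remove_tag: "sometag"]
```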

The opts value is documented here.

See docs here.

Example

params = [add_tag: "foo", add_tag: "bar", remove_tag: "sometag"]
ScrapyCloudEx.Endpoints.App.Jobs.update("API_KEY", "123", ["123/1/1", "123/1/2"], params)
# {:ok, %{"count" => 2, "status" => "ok"}}