scrapy_cloud_ex v0.1.0 ScrapyCloudEx.Endpoints.App.Jobs
Wraps the Jobs endpoint.
The jobs API makes it easy to work with your spider’s jobs and lets you schedule, stop, update and delete them.
Summary
Types
encoder_fun - A function to encode job settings to JSON
Functions
delete - Deletes one or more jobs
list - Retrieves job information for a given project, spider, or specific job
run - Schedules a job for a given spider
stop - Stops one or more running jobs
update - Updates information about jobs
Types
encoder_fun
A function to encode job settings to JSON.
This function will be given the job settings provided to run/5 so they can be encoded into a JSON string.
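As a rough illustration of what such an encoder might look like, the sketch below hand-rolls a JSON encoder for a flat map of string keys and string values. The module name EncoderSketch is hypothetical, and the {:ok, json} return shape is an assumption, since the exact type definition is not reproduced on this page. In practice a JSON library would be the more robust choice.

```elixir
defmodule EncoderSketch do
  # Hypothetical encoder: turns a flat map of string keys and string values
  # into a JSON object string. The {:ok, json} return shape is an assumption.
  def encode(settings) when is_map(settings) do
    body =
      settings
      |> Enum.sort()
      |> Enum.map(fn {key, value} -> ~s|"#{escape(key)}":"#{escape(value)}"| end)
      |> Enum.join(",")

    {:ok, "{" <> body <> "}"}
  end

  # Escapes only backslashes and double quotes; control characters and
  # nested values are deliberately out of scope for this sketch.
  defp escape(value) do
    value
    |> to_string()
    |> String.replace("\\", "\\\\")
    |> String.replace("\"", "\\\"")
  end
end
```

A function like this would be passed under the :encoder key within opts (for instance encoder: &EncoderSketch.encode/1) when :job_settings is given as an Elixir term rather than a JSON string.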
Functions
delete
Deletes one or more jobs.
The job ids in job_or_jobs must have at least 3 slash-separated sections (for example "123/1/1" rather than "123/1").
The opts value is documented here.
See docs here.
Example
ScrapyCloudEx.Endpoints.App.Jobs.delete("API_KEY", "123", ["123/1/1", "123/1/2"])
# {:ok, %{"count" => 2, "status" => "ok"}}
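The "at least 3 sections" requirement can be checked locally before any request is made. The following is a minimal sketch, not part of the library: the module JobIdSketch and its functions are hypothetical, and treating empty sections as invalid is an extra assumption.

```elixir
defmodule JobIdSketch do
  # Returns true when the id has at least three non-empty "/"-separated sections.
  # Rejecting empty sections (as in "123//1") is an assumption of this sketch.
  def full_job_id?(id) when is_binary(id) do
    sections = String.split(id, "/")
    length(sections) >= 3 and Enum.all?(sections, &(&1 != ""))
  end

  # Splits a list of ids into {valid, invalid} so that invalid ones can be
  # reported before calling delete, stop, or update.
  def partition(ids) when is_list(ids) do
    Enum.split_with(ids, &full_job_id?/1)
  end
end
```

The same check would apply to the job_or_jobs argument of stop and update, which carry the same requirement.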
list
Retrieves job information for a given project, spider, or specific job.
The following parameters are supported in the params argument:
:format - the format to be used for returning results. Can be :json or :jl. Defaults to :json.
:pagination - the pagination params: a keyword list with optional :count and :offset integer values, where :count indicates the desired number of results per page and :offset the offset to retrieve specific records.
:job - the job id.
:spider - the spider name.
:state - return jobs with the specified state. Supported values: "pending", "running", "finished", "deleted".
:has_tag - return jobs with the specified tag. May be given multiple times, and will behave as a logical OR operation among the values.
:lacks_tag - return jobs that lack the specified tag. May be given multiple times, and will behave as a logical AND operation among the values.
The opts value is documented here.
See docs here.
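The OR/AND behaviour of repeated :has_tag and :lacks_tag values can be easier to see in code. The sketch below mimics that logic locally against a job's tag list; it is only an illustration of the semantics described above (the real filtering happens on the server), and the module name TagFilterSketch is hypothetical.

```elixir
defmodule TagFilterSketch do
  # Local illustration of the documented filter semantics for one job's tags.
  def matches?(job_tags, params) when is_list(job_tags) and is_list(params) do
    # Keyword lists may repeat a key; get_values/2 collects every occurrence.
    has = Keyword.get_values(params, :has_tag)
    lacks = Keyword.get_values(params, :lacks_tag)

    # :has_tag values are ORed: one matching tag is enough.
    # No :has_tag at all means no constraint.
    has_ok = has == [] or Enum.any?(has, &(&1 in job_tags))

    # :lacks_tag values are ANDed: every listed tag must be absent.
    lacks_ok = Enum.all?(lacks, &(&1 not in job_tags))

    has_ok and lacks_ok
  end
end
```

With params = [has_tag: "foo", has_tag: "bar", lacks_tag: "consumed"], a job tagged only "foo" matches, while a job tagged "foo" and "consumed" does not.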
Examples
# Retrieve the latest 3 finished jobs for "somespider" spider
params = [spider: "somespider", state: "finished", pagination: [count: 3]]
ScrapyCloudEx.Endpoints.App.Jobs.list("API_KEY", "123", params)
# Retrieve all running jobs
ScrapyCloudEx.Endpoints.App.Jobs.list("API_KEY", "123", state: "running")
# Retrieve 10 jobs with the tag "consumed"
ScrapyCloudEx.Endpoints.App.Jobs.list("API_KEY", "123", has_tag: "consumed", pagination: [count: 10])
Example return value
{:ok,
 %{
   "status" => "ok",
   "count" => 2,
   "total" => 2,
   "jobs" => [
     %{
       "close_reason" => "cancelled",
       "elapsed" => 124138,
       "errors_count" => 0,
       "id" => "123/1/3",
       "items_scraped" => 620,
       "logs" => 17,
       "priority" => 2,
       "responses_received" => 670,
       "spider" => "somespider",
       "spider_type" => "manual",
       "started_time" => "2018-10-03T07:06:07",
       "state" => "finished",
       "tags" => ["foo"],
       "updated_time" => "2018-10-03T07:07:42",
       "version" => "5ef3139-master"
     },
     %{
       "close_reason" => "cancelled",
       "elapsed" => 483843779,
       "errors_count" => 1,
       "id" => "123/1/2",
       "items_scraped" => 2783,
       "logs" => 20,
       "priority" => 3,
       "responses_received" => 2888,
       "spider" => "somespider",
       "spider_args" => %{"spiderarg1" => "example"},
       "spider_type" => "manual",
       "started_time" => "2018-10-23T16:42:54",
       "state" => "finished",
       "tags" => ["bar", "foo"],
       "updated_time" => "2018-10-23T16:45:54",
       "version" => "5ef3139-master"
     }
   ]
 }
}
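A response shaped like the example return value above can be consumed with ordinary pattern matching. The sketch below is one possible way to do so; the module name JobsResponseSketch is hypothetical, and nothing is assumed about the error format beyond "anything that does not match". Note that some keys (such as "spider_args") appear in only one of the two example jobs, so optional keys are read with a default.

```elixir
defmodule JobsResponseSketch do
  # Extracts {id, items_scraped} pairs from a successful response.
  def summarize({:ok, %{"status" => "ok", "jobs" => jobs}}) when is_list(jobs) do
    pairs =
      Enum.map(jobs, fn job ->
        # Map.get/3 with a default guards against keys missing from a job.
        {Map.get(job, "id"), Map.get(job, "items_scraped", 0)}
      end)

    {:ok, pairs}
  end

  # Any other shape is handed back untouched, wrapped as an error.
  def summarize(other), do: {:error, other}
end
```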
run
Schedules a job for a given spider.
The following parameters are supported in the params argument:
:add_tag - add the specified tag to the job. May be given multiple times.
:job_settings - job settings to be proxied to the job. This value can be provided as a string representation of a JSON object, or as an Elixir term. If a term is provided, an accompanying encoding function (of type encoder_fun/0) must be provided with the :encoder key within opts.
:priority - job priority. Supports values in the 0..4 range (where 4 is highest priority). Defaults to 2.
:units - number of units to use for the job. Supports values in the 1..6 range.
Any other parameter will be treated as a spider argument.
The opts value is documented here.
See docs here.
Example
settings = [job_settings: ~s({ "SETTING1": "value1", "SETTING2": "value2" })]
tags = [add_tag: "sometag", add_tag: "othertag"]
params = [priority: 3, units: 1, spiderarg1: "example"] ++ tags ++ settings
ScrapyCloudEx.Endpoints.App.Jobs.run("API_KEY", "123", "somespider", params)
# {:ok, %{"jobid" => "123/1/4", "status" => "ok"}}
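Since any parameter outside the documented set is treated as a spider argument, it can help to see which keys fall on which side before scheduling a job. The sketch below separates the two groups and checks the documented :priority and :units ranges. It is an illustration only: the module RunParamsSketch is hypothetical and does not reflect how the library validates its input.

```elixir
defmodule RunParamsSketch do
  # The parameters documented for run; everything else is a spider argument.
  @known [:add_tag, :job_settings, :priority, :units]

  # Returns {documented_params, spider_args}, preserving repeated keys.
  def split(params) when is_list(params), do: Keyword.split(params, @known)

  # Checks the documented ranges; a missing key is accepted as-is.
  def valid_ranges?(params) when is_list(params) do
    in_range?(params, :priority, 0..4) and in_range?(params, :units, 1..6)
  end

  defp in_range?(params, key, range) do
    case Keyword.fetch(params, key) do
      {:ok, value} -> value in range
      :error -> true
    end
  end
end
```

For the params built in the example above, split/1 would report spiderarg1: "example" as the only spider argument.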
stop
Stops one or more running jobs.
The job ids in job_or_jobs must have at least 3 slash-separated sections (for example "123/1/1" rather than "123/1").
The opts value is documented here.
See docs here.
Example
ScrapyCloudEx.Endpoints.App.Jobs.stop("API_KEY", "123", ["123/1/1", "123/1/2"])
# {:ok, %{"status" => "ok"}}
update
Updates information about jobs.
The job ids in job_or_jobs must have at least 3 slash-separated sections (for example "123/1/1" rather than "123/1").
The following parameters are supported in the params argument:
:add_tag - add the specified tag to the job(s). May be given multiple times.
:remove_tag - remove the specified tag from the job(s). May be given multiple times.
The opts value is documented here.
See docs here.
Example
params = [add_tag: "foo", add_tag: "bar", remove_tag: "sometag"]
ScrapyCloudEx.Endpoints.App.Jobs.update("API_KEY", "123", ["123/1/1", "123/1/2"], params)
# {:ok, %{"count" => 2, "status" => "ok"}}
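When the tags to add and remove are held in plain lists, the repeated-key keyword list used in the example can be generated rather than written by hand. A small sketch, with the hypothetical module name TagParamsSketch:

```elixir
defmodule TagParamsSketch do
  # Expands two tag lists into a keyword list with one :add_tag or
  # :remove_tag entry per tag, in the order given.
  def build(add, remove) when is_list(add) and is_list(remove) do
    Enum.map(add, &{:add_tag, &1}) ++ Enum.map(remove, &{:remove_tag, &1})
  end
end
```

TagParamsSketch.build(["foo", "bar"], ["sometag"]) yields the same params as in the example above.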