Crawly v0.2.0 Crawly.RequestsStorage
Request storage: a module responsible for storing URLs for crawling.
                ┌──────────────────┐
                │                  │             ┌------------------┐
                │ RequestsStorage  <─────────────┤ From crawlers1,2 │
                │                  │             └------------------┘
                └─────────┬────────┘
                          │
                          │
            ┌─────────────▼─────────────┐
            │                           │
┌───────────▼──────────┐    ┌───────────▼──────────┐
│RequestsStorageWorker1│    │RequestsStorageWorker2│
│      (Crawler1)      │    │      (Crawler2)      │
└──────────────────────┘    └──────────────────────┘
All requests go through a single RequestsStorage process, which quickly looks up the worker responsible for the given crawler. That worker then stores the request.
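The routing described above can be sketched as follows. This is a minimal illustration, not Crawly's implementation: the `StorageSketch.Router` and `StorageSketch.Worker` modules are hypothetical names, the worker keeps its requests in a plain list, and the router forwards calls synchronously for simplicity.

```elixir
defmodule StorageSketch.Worker do
  # Hypothetical per-spider worker: keeps its requests in a plain list.
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, [])

  @impl true
  def init(requests), do: {:ok, requests}

  @impl true
  def handle_call({:store, request}, _from, requests) do
    # Newest request goes to the front, so this sketch pops in LIFO order.
    {:reply, :ok, [request | requests]}
  end

  def handle_call(:pop, _from, []), do: {:reply, nil, []}
  def handle_call(:pop, _from, [request | rest]), do: {:reply, request, rest}
end

defmodule StorageSketch.Router do
  # Hypothetical front process: maps spider names to worker pids.
  use GenServer

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
  end

  def start_worker(spider), do: GenServer.call(__MODULE__, {:start_worker, spider})
  def store(spider, request), do: GenServer.call(__MODULE__, {:forward, spider, {:store, request}})
  def pop(spider), do: GenServer.call(__MODULE__, {:forward, spider, :pop})

  @impl true
  def init(workers), do: {:ok, workers}

  @impl true
  def handle_call({:start_worker, spider}, _from, workers) do
    {:ok, pid} = StorageSketch.Worker.start_link()
    {:reply, {:ok, pid}, Map.put(workers, spider, pid)}
  end

  def handle_call({:forward, spider, message}, _from, workers) do
    # Find the worker for this spider, or report that none is running.
    reply =
      case Map.fetch(workers, spider) do
        {:ok, pid} -> GenServer.call(pid, message)
        :error -> {:error, :storage_worker_not_running}
      end

    {:reply, reply, workers}
  end
end
```

In this sketch a call for a spider without a worker returns `{:error, :storage_worker_not_running}`, mirroring the error shape in the specs on this page. Forwarding with a nested `GenServer.call/2` keeps the example short but blocks the router while the worker replies; a production design would likely avoid that.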
Summary
Functions
Returns a specification to start this module under a supervisor.
Invoked when the server is started. start_link/3 or start/3 will block until it returns.
Pop a request out of requests storage
Starts a worker for a given spider
Get statistics from the requests storage
Store request in related child worker
Functions
child_spec(arg)
Returns a specification to start this module under a supervisor.
See Supervisor.
init(args)
Invoked when the server is started. start_link/3 or start/3 will block until it returns.
args is the argument term (second argument) passed to start_link/3.
Returning {:ok, state} will cause start_link/3 to return {:ok, pid} and the process to enter its loop.
Returning {:ok, state, timeout} is similar to {:ok, state} except handle_info(:timeout, state) will be called after timeout milliseconds if no messages are received within the timeout.
Returning {:ok, state, :hibernate} is similar to {:ok, state} except the process is hibernated before entering the loop. See handle_call/3 for more information on hibernation.
Returning {:ok, state, {:continue, continue}} is similar to {:ok, state} except that immediately after entering the loop the handle_continue/2 callback will be invoked with the value continue as first argument.
Returning :ignore will cause start_link/3 to return :ignore and the process will exit normally without entering the loop or calling terminate/2. If used when part of a supervision tree the parent supervisor will not fail to start nor immediately try to restart the GenServer. The remainder of the supervision tree will be started and so the GenServer should not be required by other processes. It can be started later with Supervisor.restart_child/2 as the child specification is saved in the parent supervisor. The main use cases for this are:
- The GenServer is disabled by configuration but might be enabled later.
- An error occurred and it will be handled by a different mechanism than the Supervisor. Likely this approach involves calling Supervisor.restart_child/2 after a delay to attempt a restart.
Returning {:stop, reason} will cause start_link/3 to return {:error, reason} and the process to exit with reason reason without entering the loop or calling terminate/2.
Callback implementation for GenServer.init/1.
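To make the return values above concrete, here is a small sketch of a GenServer whose init/1 exercises three of them. `InitSketch` is a hypothetical module written for illustration only; it is not how Crawly.RequestsStorage initialises itself.

```elixir
defmodule InitSketch do
  # Hypothetical GenServer used only to illustrate init/1 return values.
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  # start_link/1 returns :ignore; the process exits normally.
  def init(:disabled), do: :ignore

  # start_link/1 returns {:error, reason}; the process exits with reason.
  def init({:bad, reason}), do: {:stop, reason}

  # start_link/1 returns {:ok, pid}; handle_continue/2 runs before any
  # other message is processed.
  def init(arg), do: {:ok, %{arg: arg, loaded: false}, {:continue, :load}}

  @impl true
  def handle_continue(:load, state), do: {:noreply, %{state | loaded: true}}

  @impl true
  def handle_call(:state, _from, state), do: {:reply, state, state}
end
```

Because start_link/1 links the new process to the caller, the `{:stop, reason}` case also sends an exit signal to the caller. A caller that wants to inspect `{:error, reason}` without crashing needs to trap exits first, for example with `Process.flag(:trap_exit, true)`.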
pop(spider_name)
pop(spider_name) :: result
when spider_name: atom(),
result: nil | Crawly.Request.t() | {:error, :storage_worker_not_running}
Pop a request out of requests storage
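The spec above gives three result shapes for pop/1: nil, a request, or `{:error, :storage_worker_not_running}`. One way to handle all three is sketched below. `DrainSketch.drain/3` is a hypothetical helper, not part of Crawly, and the reading of nil as "nothing left to pop" is an assumption drawn from the spec rather than from documented behaviour. The storage module is passed as an argument so the helper can be tried against a stand-in.

```elixir
defmodule DrainSketch do
  # Hypothetical helper, not part of Crawly. `storage` is any module that
  # exposes pop/1 with the result shapes shown in the spec above.
  def drain(spider_name, storage \\ Crawly.RequestsStorage, acc \\ []) do
    case storage.pop(spider_name) do
      # Assumed meaning: no more requests are stored for this spider.
      nil ->
        {:ok, Enum.reverse(acc)}

      # No worker has been started for this spider; pass the error through.
      {:error, :storage_worker_not_running} = error ->
        error

      # Anything else is treated as a request; keep popping.
      request ->
        drain(spider_name, storage, [request | acc])
    end
  end
end
```

Calling `DrainSketch.drain(:my_spider)` would use Crawly.RequestsStorage and so requires Crawly to be running with a worker started for that spider; `:my_spider` is a placeholder name.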
start_link(list)
start_worker(spider_name)
Starts a worker for a given spider
stats(spider_name)
stats(spider_name) :: result
when spider_name: atom(),
result:
{:stored_requests, non_neg_integer()}
| {:error, :storage_worker_not_running}
Get statistics from the requests storage
store(spider_name, requests)
store(spider_name, requests) :: result
when spider_name: atom(),
requests: [Crawly.Request.t()],
result: :ok | {:error, :storage_worker_not_running}
store(spider_name, request) :: :ok
when spider_name: atom(), request: Crawly.Request.t()
Stores a request, or a list of requests, in the child worker for the given spider.