SpiderMan.start

You're seeing just the function start; go back to the SpiderMan module for more information.

start(spider, settings \\ [])


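Starts a spider with the given settings, a keyword list of the options documented below. Options that are left out take their default value, where one is listed.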

Settings

  • :log2file - The default value is true.

  • :status - The default value is :running.

  • :spider_module

  • :ets_file

  • :downloader_options - See Downloader options.

  • :spider_options - See Spider options.

  • :item_processor_options - See ItemProcessor options.
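A minimal sketch of a settings keyword list, assuming you already have a spider module. The module name MyCrawler.Settings is hypothetical; every key comes from the Settings list above, and top-level keys that are left out keep their documented defaults (:status, for example, stays :running).

```elixir
defmodule MyCrawler.Settings do
  # Hypothetical helper: builds the keyword list passed as the second
  # argument of SpiderMan.start/2. Only documented keys are used.
  def build do
    [
      # Documented default is true; turned off here for illustration.
      log2file: false,
      # Nested options for the downloader stage (see Downloader options).
      downloader_options: [processor: [concurrency: 4]],
      # Nested options for the item processor (see ItemProcessor options).
      item_processor_options: [storage: SpiderMan.Storage.JsonLines]
    ]
  end
end
```

The result would then be passed along as SpiderMan.start(MySpider, MyCrawler.Settings.build()), where MySpider stands for your own spider module (a placeholder name here).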

Downloader options

  • :requester - The default value is {SpiderMan.Requester.Finch, []}.

  • :producer - The default value is SpiderMan.Producer.ETS.

  • :context - The default value is %{}.

  • :processor - The default value is [max_demand: 1].

    • :stages
    • :concurrency - The default value is 8.
    • :min_demand
    • :max_demand - The default value is 10.
    • :partition_by
    • :spawn_opt
    • :hibernate_after
  • :rate_limiting - The default value is [allowed_messages: 10, interval: 1000].

    • :allowed_messages - Required.
    • :interval - Required.
  • :pipelines - The default value is [SpiderMan.Pipeline.DuplicateFilter].

  • :post_pipelines - The default value is [].
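The downloader's default [allowed_messages: 10, interval: 1000] allows at most 10 messages per interval. Assuming these keys are forwarded to Broadway's :rate_limiting option (which takes the same two required keys and measures :interval in milliseconds), that works out to at most 10 requests per second, or 600 per minute. The fixed-window model below only makes that arithmetic concrete; it is a simplification, not Broadway's actual limiter, and the module RateWindow is hypothetical.

```elixir
defmodule RateWindow do
  # Hypothetical fixed-window model of :rate_limiting: at most `allowed`
  # requests start in each window of `interval` ms; extra requests wait for
  # a later window instead of being dropped. Returns, for each of `n` queued
  # requests, the start time (ms) of the window it is sent in.
  def schedule(n, allowed, interval)
      when is_integer(n) and n >= 0 and allowed > 0 and interval > 0 do
    # 0..(n - 1)//1 is empty when n == 0 (stepped ranges need Elixir 1.12+).
    Enum.map(0..(n - 1)//1, fn i -> div(i, allowed) * interval end)
  end

  # Upper bound on requests per minute for a given setting.
  def max_per_minute(allowed, interval), do: div(allowed * 60_000, interval)
end
```

With the defaults, RateWindow.schedule(25, 10, 1000) places the first 10 requests at 0 ms, the next 10 at 1000 ms and the last 5 at 2000 ms, and RateWindow.max_per_minute(10, 1000) gives 600.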

Spider options

  • :producer - The default value is SpiderMan.Producer.ETS.

  • :context - The default value is %{}.

  • :processor - The default value is [max_demand: 1].

    • :stages
    • :concurrency - The default value is 8.
    • :min_demand
    • :max_demand - The default value is 10.
    • :partition_by
    • :spawn_opt
    • :hibernate_after
  • :rate_limiting

    • :allowed_messages - Required.
    • :interval - Required.
  • :pipelines - The default value is [].

  • :post_pipelines - The default value is [].

Batchers options

  • :concurrency - The default value is 1.

  • :batch_size - The default value is 100.

  • :batch_timeout - The default value is 1000.

  • :partition_by

  • :spawn_opt

  • :hibernate_after
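A batcher groups items before they are processed together. Assuming Broadway-style semantics (a batch is emitted once it holds :batch_size items, or once :batch_timeout milliseconds have passed since its first item arrived), the hypothetical BatchModel module below simulates which items end up in the same batch. It is an illustrative model, not SpiderMan's code.

```elixir
defmodule BatchModel do
  # Hypothetical model of batcher flushing. `arrivals` is a list of
  # {timestamp_ms, item} tuples sorted by timestamp; returns the batches
  # in the order they would be emitted.
  def batches(arrivals, batch_size, batch_timeout) do
    {done, current, _started} =
      Enum.reduce(arrivals, {[], [], nil}, fn {ts, item}, {done, current, started} ->
        # Flush the open batch first if its timeout expired before this item.
        {done, current, started} =
          if current != [] and ts - started >= batch_timeout do
            {[Enum.reverse(current) | done], [], nil}
          else
            {done, current, started}
          end

        current = [item | current]
        # The timeout clock starts with the first item of a batch.
        started = started || ts

        # Flush as soon as the batch reaches batch_size.
        if length(current) == batch_size do
          {[Enum.reverse(current) | done], [], nil}
        else
          {done, current, started}
        end
      end)

    # A trailing partial batch would be flushed when its timeout fires.
    done = if current == [], do: done, else: [Enum.reverse(current) | done]
    Enum.reverse(done)
  end
end
```

For example, 120 items arriving at once with batch_size: 50 (the value in the default :batchers setting) give batches of 50, 50 and 20, the last one flushed by the timeout.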

ItemProcessor options

  • :storage - The default value is SpiderMan.Storage.JsonLines.

  • :batchers - The default value is [default: [concurrency: 1, batch_size: 50, batch_timeout: 1000]]. Each batcher is configured with the Batchers options above.

  • :producer - The default value is SpiderMan.Producer.ETS.

  • :context - The default value is %{}.

  • :processor - The default value is [].

    • :stages
    • :concurrency - The default value is 8.
    • :min_demand
    • :max_demand - The default value is 10.
    • :partition_by
    • :spawn_opt
    • :hibernate_after
  • :rate_limiting

    • :allowed_messages - Required.
    • :interval - Required.
  • :pipelines - The default value is [SpiderMan.Pipeline.DuplicateFilter].

  • :post_pipelines - The default value is [].
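The :batchers default defines a single batcher named :default. The sketch below overrides it with larger, less frequent batches; the module name and the chosen numbers are illustrative only, and the batcher's keyword list sticks to keys from the Batchers options list.

```elixir
defmodule MyCrawler.ItemProcessorSettings do
  # Hypothetical helper for the :item_processor_options setting.
  def build do
    [
      # Same as the documented default storage.
      storage: SpiderMan.Storage.JsonLines,
      batchers: [
        # Replaces the documented [concurrency: 1, batch_size: 50, batch_timeout: 1000].
        default: [concurrency: 1, batch_size: 200, batch_timeout: 5000]
      ]
    ]
  end
end
```

It would be passed under the :item_processor_options key of the settings, e.g. SpiderMan.start(MySpider, item_processor_options: MyCrawler.ItemProcessorSettings.build()), with MySpider again a placeholder.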