SpiderMan.Configuration (spider_man v0.3.4)

Handles settings for spiders.

Startup Spiders

config :spider_man, :spiders, [
  SpiderA,
  {SpiderB, settings = [...]},
  ...
]

All spiders defined under :spiders start automatically when the :spider_man application starts.

Global Settings

config :spider_man, global_settings: settings = [...]

These settings apply to all spiders.

Settings for a Spider in config files

config :spider_man, SpiderA, settings = [...]

These settings apply only to SpiderA.
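As an illustration, the two forms above might be combined in a single config file like this. It is a sketch only: SpiderA is a placeholder module name, the option values are made up, and the option keys are taken from the Default Settings listed in this document. Whether a partial override keeps the remaining defaults depends on how the library merges settings, which is not stated here.

```elixir
# config/config.exs
import Config

# Global settings: apply to every spider unless a higher-priority
# source overrides them. The values are illustrative.
config :spider_man,
  global_settings: [
    downloader_options: [
      rate_limiting: [allowed_messages: 5, interval: 1000]
    ]
  ]

# Per-spider settings: apply only to SpiderA (placeholder name).
config :spider_man, SpiderA,
  item_processor_options: [
    batchers: [default: [concurrency: 1, batch_size: 100, batch_timeout: 1000]]
  ]
```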

Default Settings

[
  downloader_options: [
    producer: SpiderMan.Producer.ETS,
    processor: [max_demand: 1],
    rate_limiting: [allowed_messages: 10, interval: 1000],
    pipelines: [SpiderMan.Pipeline.DuplicateFilter],
    post_pipelines: [],
    context: %{}
  ],
  spider_options: [
    producer: SpiderMan.Producer.ETS,
    processor: [max_demand: 1],
    pipelines: [],
    post_pipelines: [],
    context: %{}
  ],
  item_processor_options: [
    producer: SpiderMan.Producer.ETS,
    storage: SpiderMan.Storage.JsonLines,
    pipelines: [SpiderMan.Pipeline.DuplicateFilter],
    post_pipelines: [],
    context: %{},
    batchers: [default: [concurrency: 1, batch_size: 50, batch_timeout: 1000]]
  ]
]

Settings Priority

  1. Settings given to the Spider directly:
     1.1 Settings defined in :spiders for the Spider.
     1.2 Settings passed as the second argument to SpiderMan.start/2.
  2. Settings returned by the callback function SpiderModule.settings/0.
  3. Settings for the Spider in config files.
  4. Global Settings.
  5. Default Settings.
  4. Global Settings.
  5. Default Settings.
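The priority order can be pictured as layering keyword lists, with higher-priority layers overriding lower ones. The sketch below is not SpiderMan's implementation: SettingsPrioritySketch, deep_merge/2, and resolve/1 are hypothetical names. It assumes that the list above runs from highest to lowest priority and that nested keyword lists are merged key by key, neither of which this document confirms.

```elixir
defmodule SettingsPrioritySketch do
  # Hypothetical helper, not part of SpiderMan.
  # Merges two keyword lists; on a key conflict the value from `high` wins,
  # except that two non-empty keyword lists are merged recursively.
  def deep_merge(low, high) do
    Keyword.merge(low, high, fn _key, l, h ->
      if Keyword.keyword?(l) and Keyword.keyword?(h) and l != [] and h != [] do
        deep_merge(l, h)
      else
        h
      end
    end)
  end

  # Takes the layers ordered from highest to lowest priority and folds
  # them together starting from the lowest.
  def resolve(layers_high_to_low) do
    layers_high_to_low
    |> Enum.reverse()
    |> Enum.reduce([], fn layer, acc -> deep_merge(acc, layer) end)
  end
end

# Illustrative layers; only the keys come from the Default Settings.
default = [
  downloader_options: [
    processor: [max_demand: 1],
    rate_limiting: [allowed_messages: 10, interval: 1000]
  ]
]

global = [downloader_options: [rate_limiting: [allowed_messages: 5, interval: 1000]]]
direct = [downloader_options: [processor: [max_demand: 2]]]

# Order: direct, settings/0 callback, per-spider config, global, default.
resolved = SettingsPrioritySketch.resolve([direct, [], [], global, default])

IO.inspect(get_in(resolved, [:downloader_options, :processor, :max_demand]))
IO.inspect(get_in(resolved, [:downloader_options, :rate_limiting, :allowed_messages]))
```

In this sketch the directly supplied max_demand replaces the default, the global rate limit replaces the default rate limit, and keys that no higher layer mentions keep their default values.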

Functions

configuration_docs()

configuration_spec()

validate_pipeline(v)

validate_settings!(spider, spider_settings)