Configuration Overview
Forager uses a single TOML file per crawl package, stored at data/{name}/{name}.toml. The config structure mirrors the pipeline stages:
| Section | Purpose |
|---|---|
| [target] | Crawl identity (package name) |
| [score] | What to look for: keywords, groups, semantics |
| [fetch] | How to get pages: seeds, concurrency, filters |
| [extract] | What to pull from pages: CSS selectors |
| [frontier] | Frontier management: tree depth, domain caps |
| [dqn] | RL agent: replay buffer, epsilon, training |
Param<T> syntax
All numeric parameters support three modes via Param<T>:
Fixed – plain scalar, never changes:
relevance_threshold = 0.5
Auto – starts at the given value, the learner adjusts it freely:
relevance_threshold = { value = 0.1, mode = "auto" }
Range – learner adjusts within clamped bounds:
weight = { value = 0.7, mode = "range", min = 0.3, max = 0.9 }
Fixed params ignore learn() calls entirely. Auto and range params update when the statistical learner or DQN agent provides new values.
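The three modes can be illustrated with a small Python model. This is only a sketch of the behaviour described above, not Forager's actual Param<T> type: the `Param` class, its `from_toml` constructor, and the `lo`/`hi` field names are assumptions made for illustration. It takes values in the shape a TOML parser would produce (a plain number or a table).

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Param:
    """Hypothetical stand-in for Param<T>; not Forager's real implementation."""

    value: float
    mode: str = "fixed"         # "fixed", "auto", or "range"
    lo: Optional[float] = None  # lower bound, used only in "range" mode
    hi: Optional[float] = None  # upper bound, used only in "range" mode

    @classmethod
    def from_toml(cls, raw: Union[int, float, dict]) -> "Param":
        # A plain scalar is the fixed form; a table carries value, mode, bounds.
        if isinstance(raw, dict):
            return cls(raw["value"], raw["mode"], raw.get("min"), raw.get("max"))
        return cls(raw)

    def learn(self, proposed: float) -> None:
        if self.mode == "fixed":
            return  # fixed params ignore learn() entirely
        if self.mode == "range":
            # Clamp the learner's proposal into [lo, hi].
            proposed = max(self.lo, min(self.hi, proposed))
        self.value = proposed


fixed = Param.from_toml(0.5)
auto = Param.from_toml({"value": 0.1, "mode": "auto"})
ranged = Param.from_toml({"value": 0.7, "mode": "range", "min": 0.3, "max": 0.9})

for param in (fixed, auto, ranged):
    param.learn(1.5)

print(fixed.value, auto.value, ranged.value)
```

After the same `learn(1.5)` call, the fixed value is unchanged, the auto value follows the proposal, and the range value stops at its upper bound.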
Minimal config
Only [target], [score], and [fetch] are required; every other section falls back to its defaults. A minimal config looks like this:
[target]
name = "my-crawl"
[score]
terms = [{ text = "ecology", weight = 2.0 }]
[fetch]
seed_urls = ["https://example.com"]
File layout
data/
  my-crawl/
    my-crawl.toml      # config
    my-crawl.grafeo    # database (auto-created)
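The layout follows directly from the package name, so both paths can be derived from it. The helper below is a sketch under that assumption; `package_paths` and its `root` parameter are illustrative names, not part of Forager.

```python
from pathlib import PurePosixPath


def package_paths(name: str, root: str = "data") -> dict[str, PurePosixPath]:
    """Derive the config and database paths for a crawl package (hypothetical)."""
    package_dir = PurePosixPath(root) / name
    return {
        "config": package_dir / f"{name}.toml",       # data/{name}/{name}.toml
        "database": package_dir / f"{name}.grafeo",   # created automatically by Forager
    }


paths = package_paths("my-crawl")
print(paths["config"])
print(paths["database"])
```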