Configuration Overview

Forager uses a single TOML file per crawl package, stored at data/{name}/{name}.toml, where {name} is the package name set in [target]. The config sections mirror the pipeline stages:

| Section | Purpose |
|---|---|
| `[target]` | Crawl identity (package name) |
| `[score]` | What to look for: keywords, groups, semantics |
| `[fetch]` | How to get pages: seeds, concurrency, filters |
| `[extract]` | What to pull from pages: CSS selectors |
| `[frontier]` | Frontier management: tree depth, domain caps |
| `[dqn]` | RL agent: replay buffer, epsilon, training |

Param<T> syntax

All numeric parameters support three modes via Param<T>:

Fixed – plain scalar, never changes:

relevance_threshold = 0.5

Auto – starts at the given value, the learner adjusts it freely:

relevance_threshold = { value = 0.1, mode = "auto" }

Range – learner adjusts within clamped bounds:

weight = { value = 0.7, mode = "range", min = 0.3, max = 0.9 }

Fixed params ignore learn() calls entirely. Auto and range params update when the statistical learner or DQN agent provides new values.
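
The three modes can be pictured as a small enum. The following is a minimal sketch of the behavior described above, not Forager's actual type: the variant names, the `value()` accessor, and the exact `learn()` signature are assumptions made for illustration.

```rust
/// Hypothetical model of the three Param modes (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Param<T> {
    /// Plain scalar; never changes.
    Fixed(T),
    /// Starts at the given value; the learner may move it freely.
    Auto(T),
    /// The learner may move it, but only within [min, max].
    Range { value: T, min: T, max: T },
}

impl<T: Copy + PartialOrd> Param<T> {
    /// Current value, regardless of mode.
    fn value(&self) -> T {
        match *self {
            Param::Fixed(v) | Param::Auto(v) => v,
            Param::Range { value, .. } => value,
        }
    }

    /// Apply a value proposed by a learner (assumed signature).
    fn learn(&mut self, proposed: T) {
        match self {
            // Fixed params ignore learn() entirely.
            Param::Fixed(_) => {}
            // Auto params accept whatever the learner proposes.
            Param::Auto(v) => *v = proposed,
            // Range params clamp the proposal to the configured bounds.
            Param::Range { value, min, max } => {
                *value = if proposed < *min {
                    *min
                } else if proposed > *max {
                    *max
                } else {
                    proposed
                };
            }
        }
    }
}

fn main() {
    // Mirrors the three TOML examples above.
    let mut fixed = Param::Fixed(0.5);
    let mut auto = Param::Auto(0.1);
    let mut ranged = Param::Range { value: 0.7, min: 0.3, max: 0.9 };

    fixed.learn(0.9);
    auto.learn(0.25);
    ranged.learn(1.5); // above max, so it is clamped

    println!("{:?} {:?} {:?}", fixed.value(), auto.value(), ranged.value());
}
```

Under this model, a range param never leaves its bounds no matter what the learner proposes, which is the practical difference from auto mode.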

Only [target], [score], and [fetch] are required; every other section falls back to its defaults. A minimal config looks like this:

[target]
name = "my-crawl"

[score]
terms = [{ text = "ecology", weight = 2.0 }]

[fetch]
seed_urls = ["https://example.com"]
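
As a rough illustration of the required-sections rule, the sketch below scans config text for `[section]` headers and reports which required ones are missing. It is a deliberately naive line scan, not a TOML parser and not how Forager validates configs; the function name `missing_sections` is hypothetical.

```rust
/// Sections the text above lists as required.
const REQUIRED: [&str; 3] = ["target", "score", "fetch"];

/// Return the required sections that do not appear as `[name]` headers.
/// Naive by design: it ignores trailing comments, `[[array]]` tables,
/// and multi-line values that happen to start with `[`.
fn missing_sections(toml_text: &str) -> Vec<&'static str> {
    let present: Vec<&str> = toml_text
        .lines()
        .map(str::trim)
        .filter(|l| l.starts_with('[') && !l.starts_with("[[") && l.ends_with(']'))
        .map(|l| &l[1..l.len() - 1])
        .collect();

    REQUIRED
        .iter()
        .copied()
        .filter(|name| !present.contains(name))
        .collect()
}

fn main() {
    // The minimal config from above, with [fetch] left out on purpose.
    let cfg = "[target]\nname = \"my-crawl\"\n\n[score]\nterms = [{ text = \"ecology\", weight = 2.0 }]\n";
    println!("missing: {:?}", missing_sections(cfg));
}
```

A real check would more likely deserialize the file with a TOML library and let missing required fields surface as parse errors.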

With name = "my-crawl", the package directory looks like this:

data/
  my-crawl/
    my-crawl.toml      # config
    my-crawl.grafeo    # database (auto-created)
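
The naming convention is mechanical enough to derive both paths from the package name. The helper below is a small sketch of that convention only; `package_paths` is a hypothetical name, and the `data` root is taken from the layout shown above rather than from any documented setting.

```rust
use std::path::{Path, PathBuf};

/// Build the config and database paths for a crawl package,
/// following the data/{name}/{name}.toml convention.
fn package_paths(data_root: &Path, name: &str) -> (PathBuf, PathBuf) {
    let dir = data_root.join(name);
    let config = dir.join(format!("{name}.toml"));
    let database = dir.join(format!("{name}.grafeo"));
    (config, database)
}

fn main() {
    let (config, database) = package_paths(Path::new("data"), "my-crawl");
    println!("config:   {}", config.display());
    println!("database: {}", database.display());
}
```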