[tune] – Tune Configuration

The [tune] section controls the DQN training loop and experience replay. See Tune for a detailed explanation of the learning process.

Fields

```toml
[tune]
replay_capacity    = 50000
batch_size         = 64
learning_rate      = 0.001
gamma              = 0.99
lr_decay           = 0.995
replay_period      = 4
target_update_freq = 100
min_replay_size    = 500
per_alpha          = { value = 0.6, mode = "auto" }
per_epsilon        = { value = 0.0001, mode = "fixed" }

[tune.epsilon]
start       = 1.0
end         = 0.05
decay_steps = 500
```
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `replay_capacity` | `usize` | `50000` | Maximum transitions stored in the replay buffer |
| `batch_size` | `usize` | `64` | Transitions sampled per training step |
| `learning_rate` | `f64` | `0.001` | Adam optimiser learning rate |
| `gamma` | `f64` | `0.99` | Discount factor for future rewards |
| `lr_decay` | `f64` | `0.995` | Learning rate decay multiplier applied each round |
| `replay_period` | `usize` | `4` | Train every N transitions (not every round) |
| `target_update_freq` | `usize` | `100` | Rounds between target network updates |
| `min_replay_size` | `usize` | `500` | Minimum buffer size before training begins |
| `per_alpha` | `Param<f64>` | `auto(0.6)` | PER prioritisation exponent (0 = uniform, 1 = full) |
| `per_epsilon` | `Param<f64>` | `fixed(1e-4)` | Small constant added to TD error to prevent zero priority |

Replay buffer

The replay buffer stores (state, action, reward, next_state) transitions from every page fetched. Once it reaches replay_capacity, old transitions are evicted. Training does not start until the buffer has at least min_replay_size entries, ensuring the agent has enough experience for meaningful gradient updates.
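The eviction and readiness logic described above can be sketched as a simple double-ended queue. This is an illustrative sketch only; the `Transition` and `ReplayBuffer` names and fields are assumptions, not the crawler's actual types:

```rust
// Minimal replay-buffer sketch: fixed capacity, oldest-first eviction,
// and a readiness gate corresponding to `min_replay_size`.
use std::collections::VecDeque;

struct Transition {
    state: Vec<f64>,
    action: usize,
    reward: f64,
    next_state: Vec<f64>,
}

struct ReplayBuffer {
    buf: VecDeque<Transition>,
    capacity: usize, // replay_capacity
    min_size: usize, // min_replay_size
}

impl ReplayBuffer {
    fn new(capacity: usize, min_size: usize) -> Self {
        Self { buf: VecDeque::with_capacity(capacity), capacity, min_size }
    }

    fn push(&mut self, t: Transition) {
        if self.buf.len() == self.capacity {
            self.buf.pop_front(); // evict the oldest transition
        }
        self.buf.push_back(t);
    }

    /// Training runs only once at least `min_replay_size` entries exist.
    fn ready(&self) -> bool {
        self.buf.len() >= self.min_size
    }
}
```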

Learning rate

learning_rate is the initial rate for the Adam optimiser. lr_decay multiplies it after each round, producing exponential decay. This lets the agent make large updates early (when it knows little) and fine-tune later.
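This schedule is plain exponential decay. As a sketch (a hypothetical helper, not the crawler's code), the effective rate at a given round is:

```rust
// Effective learning rate after `round` applications of `lr_decay`.
// Hypothetical helper mirroring the config defaults.
fn lr_at_round(initial_lr: f64, lr_decay: f64, round: u32) -> f64 {
    initial_lr * lr_decay.powi(round as i32)
}
```

With the defaults (`learning_rate = 0.001`, `lr_decay = 0.995`), the rate after 100 rounds is about 0.0006.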

Discount factor

gamma controls how much the agent values future rewards versus immediate ones. At 0.99, a reward 100 steps in the future is worth about 37% of an immediate reward. Lower values make the agent more short-sighted.
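The 37% figure is just `gamma` raised to the number of steps; a one-line sketch (hypothetical helper):

```rust
// Present value of a reward arriving `steps` rounds in the future.
fn discounted(reward: f64, gamma: f64, steps: u32) -> f64 {
    gamma.powi(steps as i32) * reward
}
```

At `gamma = 0.99` and 100 steps, `0.99^100 ≈ 0.366`, i.e. roughly 37% of an immediate reward.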

[tune.epsilon]

The epsilon-greedy exploration schedule.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `start` | `f64` | `1.0` | Initial exploration rate (100% random) |
| `end` | `f64` | `0.05` | Final exploration rate |
| `decay_steps` | `usize` | `500` | Rounds over which epsilon decays |

Epsilon decays linearly from start to end over decay_steps rounds. At start = 1.0, every action in round 1 is random; once decay_steps rounds have passed, epsilon holds at end, so only 5% of actions are random.
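The linear schedule can be sketched as follows (a hypothetical helper, not the crawler's implementation):

```rust
// Linear epsilon interpolation from `start` to `end` over `decay_steps`
// rounds, clamped at `end` afterwards. Hypothetical helper.
fn epsilon_at(start: f64, end: f64, decay_steps: usize, round: usize) -> f64 {
    if round >= decay_steps {
        return end;
    }
    start + (end - start) * (round as f64 / decay_steps as f64)
}
```

With the defaults, epsilon is 1.0 at round 0, about 0.525 halfway through, and 0.05 from round 500 onwards.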

PER parameters

per_alpha controls how aggressively the replay buffer favours high-error transitions. At 0.0, all transitions are sampled uniformly. At 1.0, sampling is fully proportional to TD error. The default of 0.6 is a common sweet spot. In auto mode, the crawler adjusts this based on training stability.

per_epsilon is a small constant added to every transition’s priority so that no transition ever has exactly zero probability of being sampled. This is fixed at 1e-4 and rarely needs changing.
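The interaction of these two parameters follows the standard PER priority formula, `p = (|td_error| + per_epsilon)^per_alpha`, with sampling probability proportional to `p`. A sketch under that assumption (hypothetical helper, not the crawler's sampler):

```rust
// Per-transition sampling probabilities for prioritised replay:
// priority = (|td_error| + eps)^alpha, normalised over the batch.
// At alpha = 0 this degenerates to uniform sampling.
fn sample_probs(td_errors: &[f64], alpha: f64, eps: f64) -> Vec<f64> {
    let priorities: Vec<f64> = td_errors
        .iter()
        .map(|e| (e.abs() + eps).powf(alpha))
        .collect();
    let total: f64 = priorities.iter().sum();
    priorities.iter().map(|p| p / total).collect()
}
```

Note how `eps` guarantees a nonzero priority even when the TD error is exactly zero, so every transition keeps some chance of being replayed.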