[score] – Scoring Configuration
The [score] section defines what the crawler considers relevant. Scoring combines keyword matching (term groups) with semantic similarity (embeddings). See Score for a detailed explanation of how scoring works.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
terms | [KeywordTerm] | [] | Flat term list (used when no groups) |
groups | [[TermGroup]] | [] | Multiplicative term groups with weights |
relevance_threshold | Param<f64> | auto(0.1) | Minimum score to count a page as relevant |
lang_penalty | Param<f64> | auto(0.0) | Score penalty for pages in non-target languages |
When groups is non-empty, flat terms are ignored. A page must match all required groups to score well – groups are multiplicative, not additive.
Language penalty
lang_penalty is applied when a page is detected in a language not listed in fetch.languages. In auto mode, the model learns the right penalty strength. For monolingual crawls it tends to ramp up; for multilingual crawls it stays low. A value of 0.0 means no penalty.
Term import
The terms_file field has been removed. Use forager import to load terms from CSV into the database. Config terms always take priority over DB-imported terms during merge.
KeywordTerm
{ text = "process philosophy", weight = 3.0 }
| Field | Type | Default | Description |
|---|---|---|---|
text | str | – | Search phrase |
weight | f64 | 1.0 | Multiplier for this term |
TermGroup
[[score.groups]]
name = "philosophy"
required = true
weight = 1.0
terms = [
{ text = "process philosophy", weight = 3.0 },
{ text = "Whitehead", weight = 2.5 },
]
| Field | Type | Default | Description |
|---|---|---|---|
name | str | – | Group identifier |
required | bool | true | Page must match this group to score well |
weight | f64 | 1.0 | Group-level multiplier |
terms | [KeywordTerm] | – | Terms belonging to this group |
Group example
Two required groups ensure pages must match both topic and program type:
[[score.groups]]
name = "philosophy"
required = true
weight = 1.0
terms = [
{ text = "process philosophy", weight = 3.0 },
{ text = "Whitehead", weight = 2.5 },
{ text = "continental philosophy", weight = 2.0 },
]
[[score.groups]]
name = "program"
required = true
weight = 2.0
terms = [
{ text = "master programme", weight = 3.0 },
{ text = "ECTS", weight = 2.0 },
{ text = "postgraduate", weight = 1.5 },
]
A Wikipedia article on Whitehead (philosophy only, no program terms) scores near zero. An MA in Computer Science (program only) scores near zero. An MA in Process Philosophy (both groups) scores high.
[score.semantic]
Embedding-based similarity using a reference description and optional anti-reference.
[score.semantic]
reference = """
I am looking for a European master's programme in process philosophy...
"""
anti_reference = """
Analytic philosophy focused on formal logic...
"""
weight = { value = 0.7, mode = "range", min = 0.3, max = 0.9 }
anti_weight = { value = 0.3, mode = "range", min = 0.1, max = 0.5 }
max_text_len = 2000
reference_blend = { value = 0.1, mode = "range", min = 0.0, max = 0.3 }
| Field | Type | Default | Description |
|---|---|---|---|
reference | str | – | Natural language description of what you want |
anti_reference | str? | None | Description of what you do not want |
weight | Param<f64> | range(0.7, 0.3, 0.9) | How much semantic similarity counts in final score |
anti_weight | Param<f64> | range(0.3, 0.1, 0.5) | Penalty weight for anti-reference similarity |
max_text_len | usize | 2000 | Max characters of page text sent to embedder |
reference_blend | Param<f64> | range(0.1, 0.0, 0.3) | Blend rate: how fast reference adapts toward relevant pages. 0.0 = static, 1.0 = fully replace each round |
[score.semantic.signals]
Controls how much weight each page region gets in the semantic score.
[score.semantic.signals]
title = { value = 0.4, mode = "auto" }
heading = { value = 0.3, mode = "auto" }
body = { value = 0.3, mode = "auto" }
| Field | Type | Default | Description |
|---|---|---|---|
title | Param<f64> | auto(0.4) | Weight for <title> content |
heading | Param<f64> | auto(0.3) | Weight for heading elements |
body | Param<f64> | auto(0.3) | Weight for body text |