[score] – Scoring Configuration

The [score] section defines what the crawler considers relevant. Scoring combines keyword matching (term groups) with semantic similarity (embeddings). See Score for a detailed explanation of how scoring works.

Fields

| Field | Type | Default | Description |
|---|---|---|---|
| terms | [KeywordTerm] | [] | Flat term list (used when no groups) |
| groups | [[TermGroup]] | [] | Multiplicative term groups with weights |
| relevance_threshold | Param<f64> | auto(0.1) | Minimum score to count a page as relevant |
| lang_penalty | Param<f64> | auto(0.0) | Score penalty for pages in non-target languages |

When groups is non-empty, flat terms are ignored. A page must match all required groups to score well – groups are multiplicative, not additive.

Language penalty

lang_penalty is applied when a page is detected in a language not listed in fetch.languages. In auto mode, the model learns the penalty strength: for monolingual crawls the learned penalty tends to increase, while for multilingual crawls it stays low. A value of 0.0 means no penalty.
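As a rough illustration of how such a penalty could act on a score, here is a minimal sketch. It assumes the penalty is a multiplicative discount (score × (1 − lang_penalty)); the text above only guarantees that 0.0 means no penalty, so the exact formula is an assumption. The function name and arguments are hypothetical and not part of the crawler's API.

```python
def apply_lang_penalty(score, page_lang, target_langs, lang_penalty):
    """Discount the score of a page whose language is not a target language.

    Assumption: the penalty is a multiplicative discount. The real scorer
    may combine the penalty with the score differently.
    """
    if page_lang in target_langs:
        return score  # page is in a target language: no penalty
    return score * (1.0 - lang_penalty)


# A French page in an English/German crawl, with a penalty of 0.5
penalized = apply_lang_penalty(0.8, "fr", ["en", "de"], 0.5)
# A penalty of 0.0 leaves the score untouched
unpenalized = apply_lang_penalty(0.8, "fr", ["en", "de"], 0.0)
```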

Term import

The terms_file field has been removed. Use forager import to load terms from CSV into the database. Config terms always take priority over DB-imported terms during merge.
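The merge priority can be pictured with a small sketch. It assumes terms are identified by their exact text, which the description above does not state; the function and the dictionary layout are hypothetical illustrations, not the tool's actual merge code.

```python
def merge_terms(config_terms, db_terms):
    """Merge two term lists; config entries win on conflicting text.

    Assumption: two terms are "the same" when their text is identical.
    """
    merged = {t["text"]: t for t in db_terms}
    # Config terms are applied last, so they overwrite DB-imported ones.
    merged.update({t["text"]: t for t in config_terms})
    return list(merged.values())


db_terms = [
    {"text": "Whitehead", "weight": 1.0},
    {"text": "ECTS", "weight": 2.0},
]
config_terms = [{"text": "Whitehead", "weight": 2.5}]

merged = merge_terms(config_terms, db_terms)
weights = {t["text"]: t["weight"] for t in merged}
```

Here the config weight for "Whitehead" replaces the imported one, while "ECTS" is kept from the database.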

KeywordTerm

{ text = "process philosophy", weight = 3.0 }

| Field | Type | Default | Description |
|---|---|---|---|
| text | str | | Search phrase |
| weight | f64 | 1.0 | Multiplier for this term |

TermGroup

[[score.groups]]
name = "philosophy"
required = true
weight = 1.0
terms = [
    { text = "process philosophy", weight = 3.0 },
    { text = "Whitehead", weight = 2.5 },
]

| Field | Type | Default | Description |
|---|---|---|---|
| name | str | | Group identifier |
| required | bool | true | Page must match this group to score well |
| weight | f64 | 1.0 | Group-level multiplier |
| terms | [KeywordTerm] | | Terms belonging to this group |

Group example

Two required groups ensure that a page must match both the topic and the program type:

[[score.groups]]
name = "philosophy"
required = true
weight = 1.0
terms = [
    { text = "process philosophy", weight = 3.0 },
    { text = "Whitehead", weight = 2.5 },
    { text = "continental philosophy", weight = 2.0 },
]

[[score.groups]]
name = "program"
required = true
weight = 2.0
terms = [
    { text = "master programme", weight = 3.0 },
    { text = "ECTS", weight = 2.0 },
    { text = "postgraduate", weight = 1.5 },
]

A Wikipedia article on Whitehead (philosophy only, no program terms) scores near zero. An MA in Computer Science (program only) scores near zero. An MA in Process Philosophy (both groups) scores high.
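The multiplicative behaviour can be reproduced with a small sketch. It assumes a group's score is its weight times the sum of the weights of its matched terms, that matching is a case-insensitive substring test, and that an unmatched optional group is simply skipped. None of these details are specified here, so treat the formula as an illustration only; in this simplified version a missing required group gives exactly zero rather than "near zero".

```python
from dataclasses import dataclass


@dataclass
class Term:
    text: str
    weight: float = 1.0


@dataclass
class Group:
    name: str
    terms: list
    required: bool = True
    weight: float = 1.0


def group_score(group, page_text):
    """Group weight times the summed weights of matched terms (assumed formula)."""
    text = page_text.lower()
    matched = sum(t.weight for t in group.terms if t.text.lower() in text)
    return group.weight * matched


def keyword_score(groups, page_text):
    """Multiply group scores: one unmatched required group zeroes the result."""
    score = 1.0
    for g in groups:
        s = group_score(g, page_text)
        if s == 0.0 and not g.required:
            continue  # assumption: an unmatched optional group is neutral
        score *= s
    return score


philosophy = Group("philosophy", [
    Term("process philosophy", 3.0),
    Term("Whitehead", 2.5),
    Term("continental philosophy", 2.0),
])
program = Group("program", [
    Term("master programme", 3.0),
    Term("ECTS", 2.0),
    Term("postgraduate", 1.5),
], weight=2.0)
groups = [philosophy, program]

wiki = keyword_score(groups, "Alfred North Whitehead developed process philosophy.")
cs = keyword_score(groups, "Master programme in Computer Science, 120 ECTS credits.")
both = keyword_score(groups, "Master programme in process philosophy, 120 ECTS credits.")
# wiki == 0.0, cs == 0.0, both == 30.0
```

An additive scheme would give the Wikipedia article a respectable score from the philosophy terms alone; multiplying is what forces both groups to contribute.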

[score.semantic]

Embedding-based similarity using a reference description and optional anti-reference.

[score.semantic]
reference = """
I am looking for a European master's programme in process philosophy...
"""
anti_reference = """
Analytic philosophy focused on formal logic...
"""
weight = { value = 0.7, mode = "range", min = 0.3, max = 0.9 }
anti_weight = { value = 0.3, mode = "range", min = 0.1, max = 0.5 }
max_text_len = 2000
reference_blend = { value = 0.1, mode = "range", min = 0.0, max = 0.3 }

| Field | Type | Default | Description |
|---|---|---|---|
| reference | str | | Natural language description of what you want |
| anti_reference | str? | None | Description of what you do not want |
| weight | Param<f64> | range(0.7, 0.3, 0.9) | How much semantic similarity counts in final score |
| anti_weight | Param<f64> | range(0.3, 0.1, 0.5) | Penalty weight for anti-reference similarity |
| max_text_len | usize | 2000 | Max characters of page text sent to embedder |
| reference_blend | Param<f64> | range(0.1, 0.0, 0.3) | Blend rate: how fast reference adapts toward relevant pages. 0.0 = static, 1.0 = fully replace each round |
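To make weight, anti_weight and reference_blend more concrete, here is a minimal sketch over plain Python lists standing in for embedding vectors. It assumes cosine similarity, a score of weight × sim(reference) − anti_weight × sim(anti_reference), and a blend that moves the reference toward the mean embedding of the relevant pages. Only the blend endpoints (0.0 = static, 1.0 = fully replace) come from the table above; everything else, including the function names, is an assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def semantic_score(page, reference, anti_reference=None,
                   weight=0.7, anti_weight=0.3):
    """Reward similarity to the reference, penalize similarity to the anti-reference."""
    score = weight * cosine(page, reference)
    if anti_reference is not None:
        score -= anti_weight * cosine(page, anti_reference)
    return score


def blend_reference(reference, relevant_pages, blend=0.1):
    """Move the reference toward the centroid of this round's relevant pages."""
    if not relevant_pages:
        return list(reference)
    n = len(relevant_pages)
    centroid = [sum(col) / n for col in zip(*relevant_pages)]
    return [(1.0 - blend) * r + blend * c for r, c in zip(reference, centroid)]


reference = [1.0, 0.0]
anti_reference = [0.0, 1.0]

on_topic = semantic_score([1.0, 0.0], reference, anti_reference)
off_topic = semantic_score([0.0, 1.0], reference, anti_reference)

static = blend_reference(reference, [[0.0, 1.0]], blend=0.0)
replaced = blend_reference(reference, [[0.0, 1.0]], blend=1.0)
```

With blend = 0.0 the reference never moves, and with blend = 1.0 it is replaced by the centroid each round, which matches the two endpoints described for reference_blend.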

[score.semantic.signals]

Controls how much weight each page region gets in the semantic score.

[score.semantic.signals]
title   = { value = 0.4, mode = "auto" }
heading = { value = 0.3, mode = "auto" }
body    = { value = 0.3, mode = "auto" }

| Field | Type | Default | Description |
|---|---|---|---|
| title | Param<f64> | auto(0.4) | Weight for <title> content |
| heading | Param<f64> | auto(0.3) | Weight for heading elements |
| body | Param<f64> | auto(0.3) | Weight for body text |
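One plausible reading of these weights is a weighted average of per-region similarities. The sketch below assumes exactly that, including normalization by the sum of the weights; the actual combination rule is not documented here, and the function name is hypothetical.

```python
def combine_signals(similarities, weights):
    """Weighted average of per-region similarities (assumed combination rule)."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w * similarities.get(region, 0.0)
               for region, w in weights.items()) / total


weights = {"title": 0.4, "heading": 0.3, "body": 0.3}

# A page whose title matches strongly but whose body does not
combined = combine_signals({"title": 1.0, "heading": 0.5, "body": 0.0}, weights)
```

Normalizing by the weight total keeps the result on the same scale as the inputs even if auto mode moves the weights away from values that sum to 1.0.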