Graph Database Model

Forager uses GrafeoDB, an embedded graph database. All state is stored in data/{name}/{name}.grafeo. The schema consists of eight node types and one edge type.

The database can be queried in three ways:

  • From the CLI: forager query <pkg> "<GQL>" for ad-hoc queries from the terminal.
  • From Julia/C/etc.: forager-db builds as a shared library (libforager_db.so) with a C FFI. See Crates for the API.
  • From Rust: use the Db struct directly, with forager-db as a library dependency.
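As a rough illustration of the CLI route, the sketch below assembles a forager query invocation from Rust using only the standard library. The query text, the constant name PAGES_IN_FINISHED_RUNS, and the helper forager_query are hypothetical, and the exact GQL dialect GrafeoDB accepts is an assumption; only the forager query <pkg> "<GQL>" shape comes from the text above.

```rust
use std::process::Command;

/// Hypothetical example query: URLs and scores of pages from finished runs.
/// Assumes GrafeoDB accepts this MATCH/WHERE/RETURN form; adjust as needed.
pub const PAGES_IN_FINISHED_RUNS: &str =
    "MATCH (p:Page)-[:BELONGS_TO]->(r:CrawlRun) \
     WHERE r.status = 'finished' \
     RETURN p.url, p.score";

/// Builds (but does not run) the command `forager query <pkg> "<GQL>"`.
/// Passing the query as a single argument avoids shell quoting issues.
pub fn forager_query(pkg: &str, gql: &str) -> Command {
    let mut cmd = Command::new("forager");
    cmd.arg("query").arg(pkg).arg(gql);
    cmd
}
```

Calling .status() or .output() on the returned Command would execute it, provided the forager binary is on the PATH.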

Node types

:CrawlRun

One node per crawl execution.

| Property | Type | Description |
|---|---|---|
| uid | string | UUID identifying this run |
| config_name | string | Package name from [target].name |
| started_at | string | ISO 8601 timestamp |
| finished_at | string | ISO 8601 timestamp (set on finish) |
| status | string | One of "running", "finished", "failed" |
| pages_crawled | int | Total pages fetched in this run |
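Because status is stored as a plain string, client code may want to map it onto a closed set of values. The following is a minimal sketch of such a mapping; the RunStatus enum and its methods are hypothetical and not part of forager-db, and only the three string values come from the table above.

```rust
use std::str::FromStr;

/// Hypothetical typed view of the CrawlRun `status` property.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RunStatus {
    Running,
    Finished,
    Failed,
}

impl FromStr for RunStatus {
    type Err = String;

    /// Accepts exactly the three documented values; anything else is an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "running" => Ok(RunStatus::Running),
            "finished" => Ok(RunStatus::Finished),
            "failed" => Ok(RunStatus::Failed),
            other => Err(format!("unknown CrawlRun status: {other:?}")),
        }
    }
}

impl RunStatus {
    /// Returns the string form as stored in the database.
    pub fn as_str(self) -> &'static str {
        match self {
            RunStatus::Running => "running",
            RunStatus::Finished => "finished",
            RunStatus::Failed => "failed",
        }
    }
}
```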

:Page

One node per fetched page. Linked to its CrawlRun via [:BELONGS_TO].

| Property | Type | Description |
|---|---|---|
| uid | string | UUID |
| crawl_run_uid | string | Foreign key to CrawlRun |
| url | string | Normalized URL |
| status_code | int | HTTP status code |
| html | string | Raw HTML body |
| fetched_at | string | ISO 8601 timestamp |
| score | float | Relevance score (set after scoring) |
| term_hits | string | JSON array of matched term texts |
| scored_at | string | ISO 8601 timestamp |
| embedding | vector | 384-dim float vector (sentence embedding) |
| extract_{name} | string | Extracted field values (one per extract field) |

:Term

Imported or config-defined search terms, stored per package.

| Property | Type | Description |
|---|---|---|
| config_name | string | Package name |
| group | string | Term group name |
| text | string | The search phrase |
| weight | float | Term weight |
| embedding | vector | 384-dim embedding of the term text |

:Transition

DQN training data. Each transition records a state, reward, and available next actions.

| Property | Type | Description |
|---|---|---|
| config_name | string | Package name |
| features | string | Comma-separated float vector (11-dim) |
| reward | float | Observed reward for this transition |
| next_actions | string | JSON array of available next action features |
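Since features is stored as a comma-separated string rather than a native vector, a reader has to decode it before use. The sketch below shows one way to do that with the standard library; parse_features and FEATURE_DIM are hypothetical names, and the choice of f32 and the tolerance for whitespace around commas are assumptions, since the source only says "float" and "comma-separated".

```rust
/// Dimensionality of the `features` vector, per the table above.
pub const FEATURE_DIM: usize = 11;

/// Decodes a comma-separated `features` string into a fixed-size array.
/// Returns an error if a value is not a float or the count is not 11.
pub fn parse_features(raw: &str) -> Result<[f32; FEATURE_DIM], String> {
    let mut out = [0.0f32; FEATURE_DIM];
    let mut count = 0;
    for (i, part) in raw.split(',').enumerate() {
        if i >= FEATURE_DIM {
            return Err(format!("expected {FEATURE_DIM} features, found more"));
        }
        // Assumption: whitespace around each value is not significant.
        out[i] = part
            .trim()
            .parse::<f32>()
            .map_err(|e| format!("feature {i} ({part:?}): {e}"))?;
        count = i + 1;
    }
    if count != FEATURE_DIM {
        return Err(format!("expected {FEATURE_DIM} features, found {count}"));
    }
    Ok(out)
}
```

Returning a fixed-size array rather than a Vec makes the 11-dim invariant part of the type, so downstream code does not need to re-check the length.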

:Domain

Per-domain crawl statistics, saved after each run.

| Property | Type | Description |
|---|---|---|
| config_name | string | Package name |
| name | string | Domain name (e.g., example.com) |
| fetches | int | Total fetch attempts |
| successes | int | Successful fetches (HTTP 2xx) |
| reward_sum | float | Cumulative reward from this domain |
| avg_fetch_ms | float | Average fetch latency, in milliseconds |
| avg_parse_ms | float | Average parse time, in milliseconds |
| avg_html_bytes | float | Average response body size, in bytes |
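The counters above can be combined into simple derived figures on the client side. The sketch below is illustrative only: the DomainStats struct and both methods are hypothetical, and whether Forager itself computes a success rate or a mean reward per fetch is not stated in this document.

```rust
/// Hypothetical client-side copy of a few :Domain properties.
pub struct DomainStats {
    pub fetches: u64,
    pub successes: u64,
    pub reward_sum: f64,
}

impl DomainStats {
    /// Fraction of fetch attempts that returned HTTP 2xx.
    /// Returns None when no fetches were attempted, to avoid dividing by zero.
    pub fn success_rate(&self) -> Option<f64> {
        if self.fetches == 0 {
            None
        } else {
            Some(self.successes as f64 / self.fetches as f64)
        }
    }

    /// Cumulative reward divided by fetch attempts (an assumed metric).
    pub fn mean_reward(&self) -> Option<f64> {
        if self.fetches == 0 {
            None
        } else {
            Some(self.reward_sum / self.fetches as f64)
        }
    }
}
```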

:Model

Persisted DQN network weights and training state. One node per package (replaced on save).

| Property | Type | Description |
|---|---|---|
| config_name | string | Package name |
| dqn_weights | string | Serialized network weights |
| epsilon | float | Current epsilon value |
| steps | int | Total training steps completed |

:ParamGroup

Persisted adaptive parameter groups. One node per (package, stage) pair.

| Property | Type | Description |
|---|---|---|
| config_name | string | Package name |
| group_key | string | Stage key (score, fetch, parse, select, tune) |
| json | string | JSON-serialized param group + learner state |

:Frontier

Persisted frontier tree state. One node per package (replaced on save).

| Property | Type | Description |
|---|---|---|
| config_name | string | Package name |
| tree_json | string | JSON-serialized frontier tree |

Edge types

[:BELONGS_TO]

Connects :Page to :CrawlRun. Direction: (Page)-[:BELONGS_TO]->(CrawlRun).

Indexes

  • HNSW vector index on Page.embedding: 384 dimensions, cosine similarity. Created by Db::init_schema(). Enables approximate nearest-neighbor queries over page embeddings.
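To make the similarity measure concrete, the sketch below computes cosine similarity between two embeddings and does an exact brute-force nearest-neighbor scan. This is not how the HNSW index works internally; it is the exact search that an approximate index trades away for speed, and it can serve as a baseline when checking results on small data. The function names are hypothetical, and f32 elements are an assumption.

```rust
/// Cosine similarity of two equal-length vectors.
/// Returns None on a length mismatch or if either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let (mut dot, mut norm_a, mut norm_b) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a.sqrt() * norm_b.sqrt()))
}

/// Exact nearest neighbor by linear scan: index of the candidate with the
/// highest cosine similarity to `query`. Candidates that cannot be compared
/// (wrong length, zero norm) are skipped.
pub fn nearest(query: &[f32], candidates: &[Vec<f32>]) -> Option<usize> {
    candidates
        .iter()
        .enumerate()
        .filter_map(|(i, c)| cosine_similarity(query, c).map(|s| (i, s)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

The linear scan costs one pass over every stored embedding per query, which is why an approximate index becomes attractive as the number of pages grows.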