Graph Database Model
Forager uses GrafeoDB, an embedded graph database. All state is stored in data/{name}/{name}.grafeo. The schema consists of eight node types and one edge type.
The database can be queried in three ways:
- From the CLI: forager query <pkg> "<GQL>" for ad-hoc queries from the terminal.
- From Julia/C/etc.: forager-db builds as a shared library (libforager_db.so) with a C FFI. See Crates for the API.
- From Rust: use the Db struct directly via forager-db as a library dependency.
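
As an illustration of the CLI route, the following Python sketch assembles the forager query <pkg> "<GQL>" invocation as an argument list. The package name, the helper names build_query_argv and run_query, and the query text are assumptions made for illustration. The query uses generic GQL pattern syntax over the :Page and :CrawlRun types, and GrafeoDB's dialect may differ.

```python
import subprocess

# Hypothetical query text: generic GQL pattern syntax, not verified against GrafeoDB.
EXAMPLE_GQL = (
    "MATCH (p:Page)-[:BELONGS_TO]->(r:CrawlRun) "
    "WHERE r.status = 'finished' "
    "RETURN p.url, p.score"
)


def build_query_argv(pkg: str, gql: str) -> list[str]:
    """Build the argument list for: forager query <pkg> "<GQL>"."""
    return ["forager", "query", pkg, gql]


def run_query(pkg: str, gql: str) -> str:
    """Run the CLI and return its stdout as text (the output format is not assumed)."""
    result = subprocess.run(
        build_query_argv(pkg, gql),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the query as a single list element, rather than through a shell string, avoids quoting problems when the GQL text itself contains quotes.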
Node types
:CrawlRun
One node per crawl execution.
| Property | Type | Description |
|---|---|---|
| uid | string | UUID identifying this run |
| config_name | string | Package name from [target].name |
| started_at | string | ISO 8601 timestamp |
| finished_at | string | ISO 8601 timestamp (set on finish) |
| status | string | "running", "finished", "failed" |
| pages_crawled | int | Total pages fetched in this run |
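
Because started_at and finished_at are stored as ISO 8601 strings, run duration and throughput have to be derived by the consumer. The sketch below shows one way to do that in Python. The property values are made up, and it assumes the timestamps carry an explicit UTC offset such as +00:00 (a trailing Z is only accepted by datetime.fromisoformat from Python 3.11 onward).

```python
from datetime import datetime, timedelta
from typing import Optional


def run_duration(started_at: str, finished_at: Optional[str]) -> Optional[timedelta]:
    """Return finished_at - started_at, or None while the run has no finish time."""
    if not finished_at:
        return None
    return datetime.fromisoformat(finished_at) - datetime.fromisoformat(started_at)


# Hypothetical property values for one :CrawlRun node.
run = {
    "started_at": "2024-05-01T12:00:00+00:00",
    "finished_at": "2024-05-01T12:03:30+00:00",
    "status": "finished",
    "pages_crawled": 420,
}

elapsed = run_duration(run["started_at"], run["finished_at"])
pages_per_second = run["pages_crawled"] / elapsed.total_seconds()
```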
:Page
One node per fetched page. Linked to its CrawlRun via [:BELONGS_TO].
| Property | Type | Description |
|---|---|---|
| uid | string | UUID |
| crawl_run_uid | string | Foreign key to CrawlRun |
| url | string | Normalized URL |
| status_code | int | HTTP status code |
| html | string | Raw HTML body |
| fetched_at | string | ISO 8601 timestamp |
| score | float | Relevance score (set after scoring) |
| term_hits | string | JSON array of matched term texts |
| scored_at | string | ISO 8601 timestamp |
| embedding | vector | 384-dim float vector (sentence embedding) |
| extract_{name} | string | Extracted field values (one per extract field) |
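
Two :Page properties need decoding on the consumer side: term_hits is a JSON array stored as a string, and extracted fields are spread across properties named extract_{name}. A minimal Python sketch of that decoding follows. The property map is a made-up example, and how it is obtained (CLI output, FFI, or otherwise) is left open.

```python
import json

EXTRACT_PREFIX = "extract_"


def decode_page(props: dict) -> dict:
    """Decode term_hits and gather extract_{name} properties from a :Page property map."""
    # term_hits may be missing or empty on pages that have not been scored yet.
    term_hits = json.loads(props.get("term_hits") or "[]")
    extracts = {
        key[len(EXTRACT_PREFIX):]: value
        for key, value in props.items()
        if key.startswith(EXTRACT_PREFIX)
    }
    return {
        "url": props["url"],
        "score": props.get("score"),
        "term_hits": term_hits,
        "extracts": extracts,
    }


# Hypothetical property map for one scored page.
page = {
    "url": "https://example.com/a",
    "score": 0.75,
    "term_hits": '["graph database", "crawler"]',
    "extract_title": "Example",
}
decoded = decode_page(page)
```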
:Term
Imported or config-defined search terms, stored per package.
| Property | Type | Description |
|---|---|---|
| config_name | string | Package name |
| group | string | Term group name |
| text | string | The search phrase |
| weight | float | Term weight |
| embedding | vector | 384-dim embedding of the term text |
:Transition
DQN training data. Each transition records a state, reward, and available next actions.
| Property | Type | Description |
|---|---|---|
| config_name | string | Package name |
| features | string | Comma-separated float vector (11-dim) |
| reward | float | Observed reward for this transition |
| next_actions | string | JSON array of available next action features |
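
Both features and next_actions are stored as strings, so a consumer has to parse them before use. The Python sketch below parses the comma-separated feature vector and checks its length against the documented 11 dimensions. The shape of each next_actions element is an assumption: the table only says the property is a JSON array, and the example treats every element as a list of floats.

```python
import json

FEATURE_DIM = 11  # dimensionality documented for the features property


def parse_features(raw: str) -> list[float]:
    """Parse a comma-separated float vector and validate its length."""
    values = [float(part) for part in raw.split(",")]
    if len(values) != FEATURE_DIM:
        raise ValueError(f"expected {FEATURE_DIM} features, got {len(values)}")
    return values


def parse_transition(props: dict) -> dict:
    """Decode the string-encoded properties of a :Transition property map."""
    return {
        "features": parse_features(props["features"]),
        "reward": float(props["reward"]),
        # Assumption: each element is itself a list of floats.
        "next_actions": json.loads(props["next_actions"]),
    }


# Hypothetical property map: features 0.0, 0.1, ..., 1.0 and two candidate actions.
row = {
    "features": ",".join(str(i / 10) for i in range(FEATURE_DIM)),
    "reward": 0.5,
    "next_actions": "[[0.1, 0.2], [0.3, 0.4]]",
}
transition = parse_transition(row)
```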
:Domain
Per-domain crawl statistics, saved after each run.
| Property | Type | Description |
|---|---|---|
| config_name | string | Package name |
| name | string | Domain name (e.g., example.com) |
| fetches | int | Total fetch attempts |
| successes | int | Successful fetches (HTTP 2xx) |
| reward_sum | float | Cumulative reward from this domain |
| avg_fetch_ms | float | Average fetch latency |
| avg_parse_ms | float | Average parse time |
| avg_html_bytes | float | Average response body size |
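
The counters above are raw totals, so ratios such as a success rate have to be computed by the reader. The following Python sketch derives two of them. Both metrics are illustrative choices made here, not values that Forager is stated to compute, and the numbers are made up.

```python
def domain_summary(props: dict) -> dict:
    """Derive per-domain ratios from the raw :Domain counters."""
    fetches = props["fetches"]
    if fetches == 0:
        # No fetch attempts yet: report zeros instead of dividing by zero.
        return {"success_rate": 0.0, "avg_reward": 0.0}
    return {
        "success_rate": props["successes"] / fetches,  # share of HTTP 2xx responses
        "avg_reward": props["reward_sum"] / fetches,   # mean reward per fetch attempt
    }


# Hypothetical counters for one domain.
domain = {"name": "example.com", "fetches": 40, "successes": 30, "reward_sum": 10.0}
summary = domain_summary(domain)
```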
:Model
Persisted DQN network weights and training state. One node per package (replaced on save).
| Property | Type | Description |
|---|---|---|
| config_name | string | Package name |
| dqn_weights | string | Serialized network weights |
| epsilon | float | Current epsilon value |
| steps | int | Total training steps completed |
:ParamGroup
Persisted adaptive parameter groups. One node per (package, stage) pair.
| Property | Type | Description |
|---|---|---|
| config_name | string | Package name |
| group_key | string | Stage key (score, fetch, parse, select, tune) |
| json | string | JSON-serialized param group + learner state |
:Frontier
Persisted frontier tree state. One node per package (replaced on save).
| Property | Type | Description |
|---|---|---|
| config_name | string | Package name |
| tree_json | string | JSON-serialized frontier tree |
Edge types
[:BELONGS_TO]
Connects :Page to :CrawlRun. Direction: (Page)-[:BELONGS_TO]->(CrawlRun).
Indexes
- HNSW vector index on Page.embedding: 384 dimensions, cosine similarity. Created by Db::init_schema(). Enables approximate nearest-neighbor queries over page embeddings.
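
For reference, the ranking that an HNSW index approximates can be computed exactly by scoring every candidate with cosine similarity. The Python sketch below does this by brute force. It is not how GrafeoDB executes such queries, and the toy 3-dimensional vectors and page ids stand in for real 384-dimensional embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query: list[float], candidates: dict[str, list[float]], k: int) -> list[str]:
    """Exact top-k candidate ids by cosine similarity to the query, best first."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [key for key, _ in ranked[:k]]


# Hypothetical page ids with toy embeddings.
pages = {
    "page-a": [1.0, 0.0, 0.0],
    "page-b": [0.0, 1.0, 0.0],
    "page-c": [1.0, 1.0, 0.0],
}
top_two = nearest([1.0, 0.0, 0.0], pages, k=2)
```

The brute-force scan costs one similarity computation per stored page, which is the cost an approximate index is meant to avoid on large crawls.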