Roadmap

v0.1.0 — RELEASED

All phases complete. Released at https://github.com/civitas-io/prx/releases/tag/v0.1.0

Phase 0 — Foundation

| Deliverable | Status |
| --- | --- |
| Project scaffold (Cargo, CI, clippy/fmt) | Done |
| Tree-sitter integration (14 grammars, chunking, AST parsing) | Done |
| Model2Vec inference (pure Rust, safetensors + ndarray, float16) | Done |
| BM25 implementation (compound identifier tokenization, CSC sparse matrix) | Done |
| JSON/JSONL output framework | Done |
| Token counting (cl100k_base, fast + exact modes) | Done |
| Content hashing (xxh3) | Done |
| File walking (ignore crate, .prxignore) | Done |
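The "compound identifier tokenization" item can be illustrated with a minimal sketch. It assumes the goal is to split camelCase, PascalCase, and snake_case names into lowercase sub-tokens before BM25 indexing; the function name `split_identifier` and the exact boundary rules are hypothetical and not taken from the prx source.

```rust
/// Split a compound identifier into lowercase sub-tokens.
/// Hypothetical helper: the boundary rules are an assumption, not prx's actual tokenizer.
fn split_identifier(ident: &str) -> Vec<String> {
    let chars: Vec<char> = ident.chars().collect();
    let mut tokens = Vec::new();
    let mut current = String::new();

    for (i, &c) in chars.iter().enumerate() {
        if !c.is_alphanumeric() {
            // Separators such as '_' or '-' end the current token.
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            continue;
        }
        if c.is_uppercase() && !current.is_empty() {
            let prev = chars[i - 1];
            let next_is_lower = chars.get(i + 1).map_or(false, |n| n.is_lowercase());
            // Boundary at lower->Upper ("parseJson") or at the end of an
            // acronym ("HTTPServer" -> "http", "server").
            if prev.is_lowercase() || prev.is_numeric() || (prev.is_uppercase() && next_is_lower) {
                tokens.push(std::mem::take(&mut current));
            }
        }
        current.extend(c.to_lowercase());
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

fn main() {
    for ident in ["find_workspace_root", "parseHTTPResponse", "ParsedResult"] {
        println!("{ident} -> {:?}", split_identifier(ident));
    }
}
```

Indexing the sub-tokens alongside the full identifier lets a literal query such as `workspace` match `find_workspace_root` without a substring scan.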

Phase 1 — Core Tools

| Command | Status |
| --- | --- |
| prx search (literal + semantic + structural, RRF fusion, 5-stage reranking) | Done |
| prx read (--lines, --snap, --skeleton, --outline, --hash, --budget) | Done |
| prx find (tree+flat, --pattern, --depth, --changed-since, --related-to) | Done |
| prx exists (bloom filter O(1)) | Done |
| prx outline (file + directory mode) | Done |
| Search auto-detection (literal vs semantic vs structural) | Done |
| Continuation tokens for pagination | Done |
| Budget enforcement | Done |
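The RRF fusion step in `prx search` can be sketched as follows. This is the textbook Reciprocal Rank Fusion formula, score(d) = Σ 1 / (k + rank(d)), and not necessarily how prx implements it; the constant `k = 60` and the function name `rrf_fuse` are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Fuse several ranked result lists with Reciprocal Rank Fusion.
/// `k` dampens the influence of top ranks; 60 is a common default, assumed here.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            let rank = (i + 1) as f64; // ranks are 1-based
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    // Hypothetical result lists from the literal and semantic retrievers.
    let literal = vec!["a.rs", "b.rs", "c.rs"];
    let semantic = vec!["b.rs", "d.rs", "a.rs"];
    for (id, score) in rrf_fuse(&[literal, semantic], 60.0) {
        println!("{id}: {score:.5}");
    }
}
```

RRF only needs ranks, so the literal, semantic, and structural retrievers can be combined without calibrating their raw scores against each other.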

Phase 2 — Edit, Diff, Integration

| Command | Status |
| --- | --- |
| prx edit (literal/regex, dry-run, --apply, --in-function, syntax validation) | Done |
| prx diff (git diff, function attribution, semantic notes, --stat-only) | Done |
| prx run (9 parsers: cargo test/build/clippy, pytest, go test, jest/vitest, tsc, eslint) | Done |
| prx index (persistent to .prx/index/, --rebuild, --stats, --watch) | Done |
| prx batch (JSONL stdin dispatch) | Done |
| prx stats (token savings dashboard, PRX_STATS_FILE env) | Done |
| prx init (AGENTS.md snippet, cursor/codex/opencode/claude-code configs) | Done |
| prx mcp (MCP server over stdio, 6 tools) | Done |

Phase 3 — Polish, Benchmark, Release

| Area | Status |
| --- | --- |
| Cross-platform CI (Linux, macOS, Windows) | Done |
| Float16 model conversion (77MB → 48MB binary) | Done |
| Model2Vec vocabulary loading (real tokenizer, 61,826 tokens) | Done |
| GitHub Actions release pipeline (5 targets) | Done |
| Apache 2.0 license | Done |
| Documentation (21 docs, ~5,000 lines) | Done |
| 300 tests (256 unit + 44 E2E), 84% coverage | Done |

v0.1.0 Stats

| Metric | Value |
| --- | --- |
| Commands | 13 |
| Tests | 300 |
| Coverage | 84% |
| Languages | 14 (tree-sitter grammars) |
| Release binary | ~49 MB |
| Tool parsers (prx run) | 9 |

v0.1.1 — Reliability — RELEASED

| Item | Status |
| --- | --- |
| Graceful fallback (catch_unwind + fallback to grep/cat/find on internal errors) | Done |
| Error logging (~/.prx/errors.jsonl captures every fallback) | Done |
| Real-world telemetry (prx stats --compare shows per-command savings) | Done |
| Synthetic benchmarks (prx bench runs side-by-side comparisons) | Done |
| Pre-commit hook (mirrors CI checks: fmt + clippy + tests) | Done |

v0.2.0 — Context Intelligence — RELEASED

Session and Caching

| Item | Status | Description |
| --- | --- | --- |
| --if-changed HASH | Done | Stateless conditional read. Agent passes previous hash, gets 48-token stub if unchanged. 99% reduction on re-reads. |
| File reference IDs | Planned | Assign sequential IDs (F1, F2…) to files in a session. Accept F1 as path alias. |
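The stateless conditional read behind `--if-changed HASH` can be sketched as below. The `ReadResult` type and function names are hypothetical, and `DefaultHasher` from the standard library stands in for xxh3 only to keep the example dependency-free; it is not the hash prx uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical response shape: a small stub when nothing changed, full content otherwise.
#[derive(Debug, PartialEq)]
enum ReadResult {
    Unchanged { hash: String },
    Content { hash: String, text: String },
}

/// Stand-in content hash (prx uses xxh3; DefaultHasher is an assumption for this sketch).
fn content_hash(text: &str) -> String {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// The caller supplies the hash it saw last time; no server-side session state is needed.
fn read_if_changed(text: &str, previous: Option<&str>) -> ReadResult {
    let hash = content_hash(text);
    match previous {
        Some(prev) if prev == hash => ReadResult::Unchanged { hash },
        _ => ReadResult::Content { hash, text: text.to_string() },
    }
}

fn main() {
    let file = "fn main() {}";
    let first = read_if_changed(file, None);
    println!("first read:  {first:?}");
    if let ReadResult::Content { hash, .. } = &first {
        println!("second read: {:?}", read_if_changed(file, Some(hash.as_str())));
    }
}
```

Because the agent carries the hash, the tool stays stateless: any process can answer the re-read without knowing what was served before.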

Read Modes

| Item | Status | Description |
| --- | --- | --- |
| --mode aggressive | Done | Tree-sitter comment stripping + blank line collapse. 1-19% savings. |
| --mode diff | Done | Changed lines vs git HEAD only. 80-97% savings on modified files. |
| --mode entropy | Done | Pattern-based repetitive line filter. 5-87% savings (86% on generated structs). |
| Auto mode for read | Planned | Auto-select best read mode based on file size, type, and cache state. |

Search Improvements

| Item | Status | Description |
| --- | --- | --- |
| Graph proximity boost | Done | Import graph from 7 languages via regex. BFS 2-hop neighborhood. 0.25x additive boost with hop decay. Persisted to imports.bin. |
| MMR diversity | Planned | Maximal Marginal Relevance in reranking. |
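A rough sketch of the graph proximity boost: a breadth-first search bounded to 2 hops over an import graph, followed by an additive boost that decays with distance. The decay rule (`0.25 / hops`), the directed edges, and all names here are assumptions for illustration; the roadmap only states "0.25x additive boost with hop decay".

```rust
use std::collections::{HashMap, VecDeque};

/// BFS from `seed`, recording hop distance for every file within `max_hops`.
fn hop_distances<'a>(
    graph: &HashMap<&'a str, Vec<&'a str>>,
    seed: &'a str,
    max_hops: u32,
) -> HashMap<String, u32> {
    let mut dist: HashMap<String, u32> = HashMap::new();
    let mut queue: VecDeque<(&str, u32)> = VecDeque::new();
    dist.insert(seed.to_string(), 0);
    queue.push_back((seed, 0));

    while let Some((node, d)) = queue.pop_front() {
        if d == max_hops {
            continue; // do not expand past the neighborhood radius
        }
        for &next in graph.get(node).into_iter().flatten() {
            if !dist.contains_key(next) {
                dist.insert(next.to_string(), d + 1);
                queue.push_back((next, d + 1));
            }
        }
    }
    dist
}

/// Additive boost with hop decay (assumed form: 0.25 at 1 hop, 0.125 at 2 hops).
fn proximity_boost(hops: u32) -> f64 {
    if hops == 0 { 0.0 } else { 0.25 / hops as f64 }
}

fn main() {
    // Hypothetical import graph: a.rs imports b.rs, b.rs imports c.rs, c.rs imports d.rs.
    let graph = HashMap::from([
        ("a.rs", vec!["b.rs"]),
        ("b.rs", vec!["c.rs"]),
        ("c.rs", vec!["d.rs"]),
    ]);
    let dist = hop_distances(&graph, "a.rs", 2);
    for file in ["b.rs", "c.rs", "d.rs"] {
        match dist.get(file) {
            Some(&h) => println!("{file}: {h} hop(s), boost {:.3}", proximity_boost(h)),
            None => println!("{file}: outside the 2-hop neighborhood, no boost"),
        }
    }
}
```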

v0.2.0 Stats

| Metric | Value |
| --- | --- |
| Tests | 353 (304 unit + 49 E2E) |
| New modules | 3 (imports.rs, graph.rs, proximity.rs) |
| New features | 5 (--if-changed, 3 read modes, proximity boost) |

v0.3.0 — Reliability and Search Quality — RELEASED

Reliability

| Item | Status | Description |
| --- | --- | --- |
| MCP server E2E tests | Done | 8 E2E tests covering initialize, tools/list, tools/call for all 6 MCP tools. |
| Incremental indexing | Done | Skip unchanged files via hash comparison. Reports files_changed/files_unchanged. |
| Real criterion benchmarks | Done | 5 search benchmarks + 3 chunking benchmarks. |
| NDCG@10 measurement | Done | 50-query labeled dataset on prx (NDCG@10=0.639) + 49-query dataset on external production codebase (NDCG@10=0.451). |
| Structural search validation | Done | Warns when pattern compiles but matches 0 files, or when pattern fails to compile for all languages. |
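For readers unfamiliar with the metric, NDCG@10 can be computed as in the sketch below. It uses the linear-gain form, DCG = Σ rel_i / log2(i + 1), which coincides with the exponential-gain form for binary labels; whether prx's bench harness uses this exact variant is an assumption, and the function names are illustrative.

```rust
/// Discounted cumulative gain over the first `k` results (linear gain).
fn dcg(rels: &[f64], k: usize) -> f64 {
    rels.iter()
        .take(k)
        .enumerate()
        // Position i is 0-based, so the discount is log2(rank + 1) = log2(i + 2).
        .map(|(i, rel)| rel / ((i as f64) + 2.0).log2())
        .sum()
}

/// NDCG@k: DCG of the returned ranking divided by the DCG of the ideal ranking.
/// `ranked` holds the relevance of each returned result in order;
/// `labeled` holds the relevance of every labeled document for the query.
fn ndcg_at_k(ranked: &[f64], labeled: &[f64], k: usize) -> f64 {
    let mut ideal = labeled.to_vec();
    ideal.sort_by(|a, b| b.partial_cmp(a).unwrap());
    let idcg = dcg(&ideal, k);
    if idcg == 0.0 { 0.0 } else { dcg(ranked, k) / idcg }
}

fn main() {
    // Two relevant documents exist; the search returned them at ranks 1 and 3.
    let ranked = [1.0, 0.0, 1.0];
    let labeled = [1.0, 1.0];
    println!("NDCG@10 = {:.3}", ndcg_at_k(&ranked, &labeled, 10));
}
```

A dataset-level score such as the 0.639 reported above would then be the mean of this value over all labeled queries.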

Search Quality

Measured NDCG@10: 0.639 (self), 0.451 (external production codebase). Target: 0.70+ on unfamiliar codebases.

| Item | Status | Description |
| --- | --- | --- |
| Symbol-query ranking overhaul | Done | 12x definition boost for symbol queries, import-line penalty (0.2x), improved definition detection for Python/TS. |
| Chunk header enrichment | Done | BM25 enrichment now prepends [lang] file_path stem_tokens to each chunk. |
| Persistent dense index | Done | Embeddings computed at index time, stored as embeddings.bin. |
| Sharper mode detection | Done | Symbol queries: alpha=0.1 (near-pure BM25). NL queries: alpha=0.6. Static synonym dict (18 pairs). |
| Reranker weight tuning | Done | Definition boost 3→4 (NL), 8→12 (symbol). Stem match 1.0→1.5. |
| Chunk overlap | Done | 200-byte overlap between chunks, snapped to line boundaries. |
| Embedding model upgrade | Done | Evaluated 3 models: potion-retrieval-32M selected (+7% NDCG). |
| Symbol index | Done | Map each symbol to definition location + reference count. Symbol NDCG: 0.263 → 0.619. |
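The alpha values in "Sharper mode detection" suggest a convex blend of lexical and dense scores. The sketch below assumes `score = alpha * dense + (1 - alpha) * bm25` with both inputs normalized to [0, 1], which matches "alpha=0.1 (near-pure BM25)"; the classification heuristic and all names are hypothetical and much cruder than whatever prx actually does.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum QueryKind {
    Symbol,
    NaturalLanguage,
}

/// Crude stand-in heuristic: a single token made of identifier characters is a symbol query.
fn classify(query: &str) -> QueryKind {
    let q = query.trim();
    let single_token = !q.is_empty() && !q.contains(char::is_whitespace);
    let ident_chars = q.chars().all(|c| c.is_alphanumeric() || matches!(c, '_' | ':' | '.'));
    if single_token && ident_chars {
        QueryKind::Symbol
    } else {
        QueryKind::NaturalLanguage
    }
}

/// Dense-score weight per query kind (values taken from the roadmap table).
fn alpha_for(kind: QueryKind) -> f64 {
    match kind {
        QueryKind::Symbol => 0.1,
        QueryKind::NaturalLanguage => 0.6,
    }
}

/// Assumed blend: alpha weights the dense score, (1 - alpha) weights BM25.
fn hybrid_score(bm25: f64, dense: f64, alpha: f64) -> f64 {
    alpha * dense + (1.0 - alpha) * bm25
}

fn main() {
    for query in ["find_workspace_root", "how does budget enforcement work"] {
        let kind = classify(query);
        let alpha = alpha_for(kind);
        // Hypothetical normalized scores for one candidate chunk.
        let score = hybrid_score(0.8, 0.4, alpha);
        println!("{query:?}: {kind:?}, alpha={alpha}, score={score:.2}");
    }
}
```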

v0.4.0 — Run Parsers and Project Intelligence — RELEASED

Run Parsers

13 new parsers implemented, bringing the total to 22.

| Parser | Tool | Status |
| --- | --- | --- |
| terraform | plan, apply | Done |
| kubectl | describe, get | Done |
| kubectl-logs | logs (+ docker logs) | Done |
| docker-build | build | Done |
| mvn | test, build | Done |
| gradle | build, test | Done |
| dotnet | test, build | Done |
| mypy | type check | Done |
| npm-ls | npm list | Done |
| git-log | log | Done |
| pytest-cov | pytest --cov, coverage report | Done |
| go-cover | go test -cover | Done |
| jest-cov | jest --coverage, c8 | Done |

Project Intelligence

| Item | Status | Description |
| --- | --- | --- |
| prx context | Done | Assemble context packages: search + read + outline in one call. |
| prx impact | Done | Reverse dependency analysis using the import graph. |

Security CI

| Item | Status |
| --- | --- |
| cargo audit in CI | Done |
| cargo deny in CI | Done |

v0.5.x — Current Development

v0.5.0 — Features

| Item | Status | Description |
| --- | --- | --- |
| prx run --auto-json | Done | Auto-inject --json flags for tools with structured output. |
| Tree-sitter import extraction | Done | Replace regex imports with tree-sitter AST queries. |
| Import language coverage | Done | bash, CSS, HTML import extraction added. |

v0.5.1 — Build and Security

| Item | Status | Description |
| --- | --- | --- |
| Self-contained build (build.rs) | Done | cargo build works without make models or Python. SHA-256 pinned artifacts. |
| Migrate off bincode | Done | Replace bincode (RUSTSEC-2025-0141) with postcard for all index serialization. |

v0.5.4 — Lean-Down Refactoring

| Item | Status | Description |
| --- | --- | --- |
| define_regex! macro | Done | Reduce 3-line LazyLock<Regex> statics to 1-line macro calls across 22 parsers. ~130 lines saved. |
| ParsedResult::new() constructor | Done | Replace 10-line struct literals with 1-line constructor calls across 22 parsers. ~200 lines saved. |
| Extract src/workspace.rs | Done | Deduplicate find_workspace_root(), relative_path(), is_test_file(). ~73 lines saved. |

v0.5.5 — Index Performance and Test Coverage (Current)

| Item | Priority | Status | Description |
| --- | --- | --- | --- |
| Parallel embedding (rayon) | High | Done | Embed chunks in parallel during indexing. ~300s → ~100s on 4-core for 55k chunks. |
| Parallel chunking | High | Done | Parse and chunk files in parallel during indexing. |
| Parallel import extraction | Medium | Done | Extract imports per-file in parallel during ImportGraph::build_full. |
| E2E coverage for search.rs | High | In progress | Cover hybrid/semantic search paths (47.6% → 80%+). |
| E2E coverage for mcp.rs | High | In progress | Cover remaining MCP tool paths (51.4% → 80%+). |
| E2E coverage for run.rs | Medium | Planned | Cover external command execution paths (63.1% → 80%+). |
| E2E coverage for init.rs | Medium | Planned | Cover config generation paths (59.8% → 80%+). |
| Test helpers (tests/helpers/) | Medium | Planned | Extract run_prx(), test_dir() helpers. ~300 lines saved. |

v0.5.6 — Memory-Mapped Index

| Item | Priority | Description |
| --- | --- | --- |
| Memory-mapped index files | High | Use mmap instead of read-to-vec for chunks.bin, bm25.bin, embeddings.bin. The OS page cache keeps recently used index pages in memory across queries. |
| bench-ndcg --plain | Medium | Human-readable table output for terminal use. |
| bench-ndcg load-once | Medium | Load index once, query N times. |

v0.5.7 — Public Benchmark Suite

| Item | Priority | Description |
| --- | --- | --- |
| Query generation for 8 pinned repos | High | 25 labeled queries per repo (flask, ripgrep, fastify, cargo, django, kafka, terraform, vscode). 200 total queries across 6 languages, 3 size tiers. |
| benchmark.yml CI workflow | High | Clone repos at pinned SHAs, build index, run NDCG, compare to baseline, fail on regression >0.05. |
| Results dashboard | Medium | benchmarks/results/ with per-release JSON. |
| Expand to 40-50 queries per repo | Medium | 25 queries gives ±0.05-0.08 standard error. 40-50 narrows to ±0.03, enabling tighter CI gate. |
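The regression gate and the standard-error reasoning above can be sketched together. The standard error of a mean is the sample standard deviation divided by √n, so doubling the query count shrinks it by a factor of about 1.41. Function names and the sample scores are illustrative assumptions; the 0.05 threshold comes from the table.

```rust
/// Standard error of the mean of per-query NDCG scores: sample std dev / sqrt(n).
fn standard_error(scores: &[f64]) -> f64 {
    if scores.len() < 2 {
        return 0.0;
    }
    let n = scores.len() as f64;
    let mean = scores.iter().sum::<f64>() / n;
    let variance = scores.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (variance / n).sqrt()
}

/// Fail when the current mean NDCG drops below the baseline by more than `max_drop`.
fn regression_gate(baseline: f64, current: f64, max_drop: f64) -> Result<(), String> {
    let drop = baseline - current;
    if drop > max_drop {
        Err(format!("NDCG regressed by {drop:.3} (limit {max_drop:.3})"))
    } else {
        Ok(())
    }
}

fn main() {
    // Hypothetical per-query scores for one repo.
    let scores = [0.2, 0.4, 0.6, 0.8];
    println!("standard error: {:.4}", standard_error(&scores));
    println!("gate 0.60 -> 0.56: {:?}", regression_gate(0.60, 0.56, 0.05));
    println!("gate 0.60 -> 0.54: {:?}", regression_gate(0.60, 0.54, 0.05));
}
```

A gate threshold tighter than the standard error would fail on noise alone, which is why the query count has to grow before the threshold can shrink.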

Repository matrix:

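| Size | Repo | Language | LOC |
| --- | --- | --- | --- |
| Small | pallets/flask | Python | 15K |
| Small | BurntSushi/ripgrep | Rust | 25K |
| Small | fastify/fastify | TypeScript | 15K |
| Medium | rust-lang/cargo | Rust | 150K |
| Medium | django/django | Python | 300K |
| Medium | apache/kafka | Java | 500K |
| Large | hashicorp/terraform | Go | 2M |
| Large | microsoft/vscode | TypeScript | 1M |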

v0.5.8 — Documentation Site [DONE]

| Item | Priority | Status | Notes |
| --- | --- | --- | --- |
| Documentation site (mdBook) | High | Done | 33 pages at civitas-io.github.io/prx/. |
| deploy-docs.yml workflow | High | Done | Auto-deploy on push to main. |
| Docs cleanup | Medium | Done | book/ is single source of truth, docs/ archived. |

v0.5.9 — Distribution [DONE]

| Item | Priority | Status | Notes |
| --- | --- | --- | --- |
| cargo publish | High | Done | crates.io/crates/prx. Install with cargo install prx. |
| Homebrew formula | High | Done | brew install civitas-io/tap/prx. Tap: civitas-io/homebrew-tap. |
| build.rs OUT_DIR fix | High | Done | Models download to OUT_DIR, crate is 171 KB compressed. |
| npm wrapper | Medium | Deferred | npx prx for JS/TS agents. |
| pip wrapper | Medium | Deferred | pip install prx for Python agents. |

v0.5.10 — Additional Grammars

| Item | Priority | Description |
| --- | --- | --- |
| Kotlin grammar | Medium | tree-sitter-kotlin + import/outline extraction |
| Swift grammar | Medium | tree-sitter-swift + import/outline extraction |
| C# grammar | Medium | tree-sitter-c-sharp + import/outline extraction |
| PHP grammar | Medium | tree-sitter-php + import/outline extraction |
| Elixir grammar | Medium | tree-sitter-elixir + import/outline extraction |

v0.6.0 — Model Tiering

Benchmark data (v0.5.7) shows the 32M general-purpose model works for small codebases (NDCG@10 0.5-0.7) but degrades on medium (0.3-0.4) and large (0.2-0.3). Code-specific models distilled via Model2Vec can close this gap while keeping pure-Rust inference.

| Item | Priority | Description |
| --- | --- | --- |
| Expand benchmark to 40-50 queries per repo | High | 25 queries gives ±0.05-0.08 noise, so tighter baselines are needed before evaluating new models. Prioritize medium/large repos (django, kafka, terraform, vscode). |
| Distill code-specific Model2Vec models | High | Distill CodeSage-v2-Base (356M) and/or all-mpnet-base-v2 (109M) into Model2Vec format (256d, f16). ~30 sec distillation, ~8 MB output. Benchmark against expanded query suite. |
| prx index --model flag | High | Support --model builtin (default), --model standard, --model large. Download on first use to ~/.prx/models/. |
| Repo analysis + model recommendation | High | After prx index, emit a hint if repo has >3K files: "For better semantic search, try prx index --model standard". |
| Model download infrastructure | High | SHA-256 pinned downloads from HuggingFace or GitHub Releases. Offline via PRX_MODELS_DIR. Progress bar. |
| Benchmark regression gate tightening | Medium | With 40-50 queries, tighten CI gate from 0.05 to 0.02 regression threshold. |

Model tiers:

| Tier | Model | Size | Target | NDCG@10 (expected) |
| --- | --- | --- | --- | --- |
| builtin | potion-retrieval-32M (current) | 32 MB embedded | <3K files | 0.5-0.7 |
| standard | CodeSage-Base-M2V-256 | ~8 MB download | 3K-10K files | 0.5-0.6 (est.) |
| large | Jina-Code-v3-M2V-512 | ~30-60 MB download | 10K+ files | 0.4-0.5 (est.) |
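The planned model recommendation could map file counts to tiers roughly as follows. This is a sketch of a feature that is not yet implemented: the type and function names are hypothetical, and the handling of the exact boundaries (3,000 and 10,000 files) is an assumption, since the table only gives ranges.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ModelTier {
    Builtin,
    Standard,
    Large,
}

impl ModelTier {
    /// Value for the planned `--model` flag.
    fn as_flag(self) -> &'static str {
        match self {
            ModelTier::Builtin => "builtin",
            ModelTier::Standard => "standard",
            ModelTier::Large => "large",
        }
    }
}

/// Pick a tier from the indexed file count (boundary handling is assumed).
fn recommend_tier(file_count: usize) -> ModelTier {
    match file_count {
        0..=2_999 => ModelTier::Builtin,
        3_000..=9_999 => ModelTier::Standard,
        _ => ModelTier::Large,
    }
}

/// Emit a hint only when a larger model than the default would be recommended.
fn model_hint(file_count: usize) -> Option<String> {
    match recommend_tier(file_count) {
        ModelTier::Builtin => None,
        tier => Some(format!(
            "For better semantic search, try prx index --model {}",
            tier.as_flag()
        )),
    }
}

fn main() {
    for count in [1_200, 4_500, 25_000] {
        match model_hint(count) {
            Some(hint) => println!("{count} files: {hint}"),
            None => println!("{count} files: builtin model is sufficient"),
        }
    }
}
```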

Version Compatibility

CLI flags and JSON output schemas may change between minor versions. All breaking changes are documented in CHANGELOG.md with migration guides. JSON output includes a version field for programmatic detection.