All phases complete. Released at https://github.com/civitas-io/prx/releases/tag/v0.1.0

| Deliverable | Status |
| --- | --- |
| Project scaffold (Cargo, CI, clippy/fmt) | Done |
| Tree-sitter integration (14 grammars, chunking, AST parsing) | Done |
| Model2Vec inference (pure Rust, safetensors + ndarray, float16) | Done |
| BM25 implementation (compound identifier tokenization, CSC sparse matrix) | Done |
| JSON/JSONL output framework | Done |
| Token counting (cl100k_base, fast + exact modes) | Done |
| Content hashing (xxh3) | Done |
| File walking (ignore crate, `.prxignore`) | Done |

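The compound identifier tokenization listed for BM25 can be pictured roughly as follows. This is a minimal sketch, not the prx implementation: the function name `tokenize_identifier` and the exact splitting rules (camelCase boundaries, acronym runs, non-alphanumeric separators, plus the whole identifier as an extra token) are assumptions chosen for illustration.

```rust
/// Split an identifier such as `parseHttpRequest` or `max_chunk_size`
/// into lowercase sub-tokens. Hypothetical helper, not taken from prx.
fn tokenize_identifier(ident: &str) -> Vec<String> {
    let chars: Vec<char> = ident.chars().collect();
    let mut tokens: Vec<String> = Vec::new();
    let mut current = String::new();

    for (i, &c) in chars.iter().enumerate() {
        if !c.is_alphanumeric() {
            // Separators such as `_`, `-`, `.` or `:` close the current part.
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            continue;
        }
        // camelCase boundary: lowercase followed by uppercase.
        let prev_lower = i > 0 && chars[i - 1].is_lowercase();
        // Acronym boundary: `HTTPServer` splits before the `S`.
        let acronym_end = i > 0
            && chars[i - 1].is_uppercase()
            && chars.get(i + 1).map_or(false, |n| n.is_lowercase());
        if c.is_uppercase() && (prev_lower || acronym_end) && !current.is_empty() {
            tokens.push(std::mem::take(&mut current));
        }
        current.extend(c.to_lowercase());
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    // Assumption: also index the whole identifier so exact matches still hit.
    if tokens.len() > 1 {
        tokens.push(ident.to_lowercase());
    }
    tokens
}
```

With these rules, a query for `request` can match a chunk that only contains `parseHttpRequest`, which is the usual motivation for splitting identifiers before BM25 indexing.
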

| Command | Status |
| --- | --- |
| `prx search` (literal + semantic + structural, RRF fusion, 5-stage reranking) | Done |
| `prx read` (`--lines`, `--snap`, `--skeleton`, `--outline`, `--hash`, `--budget`) | Done |
| `prx find` (tree + flat, `--pattern`, `--depth`, `--changed-since`, `--related-to`) | Done |
| `prx exists` (bloom filter, O(1)) | Done |
| `prx outline` (file + directory mode) | Done |
| Search auto-detection (literal vs semantic vs structural) | Done |
| Continuation tokens for pagination | Done |
| Budget enforcement | Done |

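For readers unfamiliar with the RRF fusion mentioned for `prx search`, Reciprocal Rank Fusion scores each document as the sum of `1 / (k + rank)` over every ranked list it appears in. The sketch below is illustrative only: the function name `rrf_fuse`, the use of file paths as document IDs, and the constant `k = 60` (a common default in the literature) are assumptions, not details confirmed by this roadmap.

```rust
use std::collections::HashMap;

/// Fuse several best-first rankings with Reciprocal Rank Fusion.
/// Hypothetical helper; `k` dampens the influence of top ranks.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, doc) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64; // ranks are 1-based
            *scores.entry((*doc).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by name for determinism.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Because only ranks are used, the literal, semantic, and structural scorers do not need comparable score scales, which is the main reason RRF is a popular choice for hybrid search.
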

| Command | Status |
| --- | --- |
| `prx edit` (literal/regex, dry-run, `--apply`, `--in-function`, syntax validation) | Done |
| `prx diff` (git diff, function attribution, semantic notes, `--stat-only`) | Done |
| `prx run` (9 parsers: cargo test/build/clippy, pytest, go test, jest/vitest, tsc, eslint) | Done |
| `prx index` (persistent to `.prx/index/`, `--rebuild`, `--stats`, `--watch`) | Done |
| `prx batch` (JSONL stdin dispatch) | Done |
| `prx stats` (token savings dashboard, `PRX_STATS_FILE` env) | Done |
| `prx init` (AGENTS.md snippet, cursor/codex/opencode/claude-code configs) | Done |
| `prx mcp` (MCP server over stdio, 6 tools) | Done |


| Area | Status |
| --- | --- |
| Cross-platform CI (Linux, macOS, Windows) | Done |
| Float16 model conversion (77 MB → 48 MB binary) | Done |
| Model2Vec vocabulary loading (real tokenizer, 61,826 tokens) | Done |
| GitHub Actions release pipeline (5 targets) | Done |
| Apache 2.0 license | Done |
| Documentation (21 docs, ~5,000 lines) | Done |
| 300 tests (256 unit + 44 E2E), 84% coverage | Done |


| Metric | Value |
| --- | --- |
| Commands | 13 |
| Tests | 300 |
| Coverage | 84% |
| Languages | 14 (tree-sitter grammars) |
| Release binary | ~49 MB |
| Tool parsers (`prx run`) | 9 |


| Item | Status |
| --- | --- |
| Graceful fallback (`catch_unwind` + fallback to grep/cat/find on internal errors) | Done |
| Error logging (`~/.prx/errors.jsonl` captures every fallback) | Done |
| Real-world telemetry (`prx stats --compare` shows per-command savings) | Done |
| Synthetic benchmarks (`prx bench` runs side-by-side comparisons) | Done |
| Pre-commit hook (mirrors CI checks: fmt + clippy + tests) | Done |

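The graceful-fallback row can be illustrated with `std::panic::catch_unwind`, which is the standard-library primitive the table names. The wrapper below is a sketch under assumptions: `with_fallback` and `Outcome` are hypothetical names, and the fallback closure stands in for shelling out to grep, cat, or find.

```rust
use std::panic;

/// Which path produced the output. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
enum Outcome {
    Native(String),
    Fallback(String),
}

/// Run `primary`; if it panics, run `fallback` instead of crashing.
/// Note: this only works when the binary is built with unwinding panics
/// (the default), not with `panic = "abort"`.
fn with_fallback<F, G>(primary: F, fallback: G) -> Outcome
where
    F: FnOnce() -> String + panic::UnwindSafe,
    G: FnOnce() -> String,
{
    match panic::catch_unwind(primary) {
        Ok(out) => Outcome::Native(out),
        // A real tool would also log the failure here
        // (the roadmap mentions ~/.prx/errors.jsonl).
        Err(_) => Outcome::Fallback(fallback()),
    }
}
```

The design point is that an agent calling the tool always gets usable output: an internal bug degrades to plain grep-style results rather than a failed call.
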

| Item | Status | Description |
| --- | --- | --- |
| `--if-changed HASH` | Done | Stateless conditional read. Agent passes previous hash, gets 48-token stub if unchanged. 99% reduction on re-reads. |
| File reference IDs | Planned | Assign sequential IDs (F1, F2…) to files in a session. Accept F1 as path alias. |

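The `--if-changed HASH` flow is stateless because the caller, not the tool, remembers the previous hash. A minimal sketch of that logic follows. The names `read_if_changed`, `ReadResult`, and `content_hash` are hypothetical, and `DefaultHasher` from the standard library is used here only as a dependency-free stand-in for the xxh3 hash the project lists elsewhere.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Result of a conditional read. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
enum ReadResult {
    /// Small stub: the caller's cached copy is still valid.
    Unchanged { hash: String },
    /// Full content plus the hash to send with the next request.
    Content { hash: String, body: String },
}

/// Stand-in content hash (the real tool is described as using xxh3).
fn content_hash(body: &str) -> String {
    let mut hasher = DefaultHasher::new();
    body.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Return a stub when the caller's previous hash matches the current content.
fn read_if_changed(body: &str, previous_hash: Option<&str>) -> ReadResult {
    let hash = content_hash(body);
    match previous_hash {
        Some(prev) if prev == hash => ReadResult::Unchanged { hash },
        _ => ReadResult::Content { hash, body: body.to_string() },
    }
}
```

An agent would store the `hash` from the first read and pass it back on later reads; only a changed file costs a full response.
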

| Item | Status | Description |
| --- | --- | --- |
| `--mode aggressive` | Done | Tree-sitter comment stripping + blank line collapse. 1-19% savings. |
| `--mode diff` | Done | Changed lines vs git HEAD only. 80-97% savings on modified files. |
| `--mode entropy` | Done | Pattern-based repetitive line filter. 5-87% savings (86% on generated structs). |
| Auto mode for read | Planned | Auto-select best read mode based on file size, type, and cache state. |


| Item | Status | Description |
| --- | --- | --- |
| Graph proximity boost | Done | Import graph from 7 languages via regex. BFS 2-hop neighborhood. 0.25x additive boost with hop decay. Persisted to `imports.bin`. |
| MMR diversity | Planned | Maximal Marginal Relevance in reranking. |

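One way to read the graph proximity boost is as a breadth-first search over the import graph, limited to two hops, where files closer to a seed file receive a larger additive boost. The sketch below makes several assumptions that the roadmap does not spell out: the adjacency map already contains both import directions, the boost is `base * decay^(hops - 1)`, and the decay factor is 0.5. The function name `proximity_boosts` is hypothetical.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Compute additive boosts for files within `max_hops` of `seed`.
/// Hypothetical helper; the decay formula is an assumption.
fn proximity_boosts(
    graph: &HashMap<&str, Vec<&str>>,
    seed: &str,
    max_hops: u32,
    base_boost: f64,
    decay: f64,
) -> HashMap<String, f64> {
    let mut boosts = HashMap::new();
    let mut seen: HashSet<&str> = HashSet::from([seed]);
    let mut queue: VecDeque<(&str, u32)> = VecDeque::from([(seed, 0)]);

    while let Some((file, hops)) = queue.pop_front() {
        if hops == max_hops {
            continue; // do not expand past the neighborhood limit
        }
        if let Some(neighbors) = graph.get(file) {
            for &next in neighbors {
                // BFS visits each file first at its shortest hop distance.
                if seen.insert(next) {
                    let h = hops + 1;
                    let boost = base_boost * decay.powi(h as i32 - 1);
                    boosts.insert(next.to_string(), boost);
                    queue.push_back((next, h));
                }
            }
        }
    }
    boosts
}
```

With `base_boost = 0.25` and `decay = 0.5`, direct neighbors get +0.25 and second-hop files get +0.125; anything further away is left untouched.
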

| Metric | Value |
| --- | --- |
| Tests | 353 (304 unit + 49 E2E) |
| New modules | 3 (`imports.rs`, `graph.rs`, `proximity.rs`) |
| New features | 5 (`--if-changed`, 3 read modes, proximity boost) |


| Item | Status | Description |
| --- | --- | --- |
| MCP server E2E tests | Done | 8 E2E tests covering initialize, tools/list, tools/call for all 6 MCP tools. |
| Incremental indexing | Done | Skip unchanged files via hash comparison. Reports `files_changed`/`files_unchanged`. |
| Real criterion benchmarks | Done | 5 search benchmarks + 3 chunking benchmarks. |
| NDCG@10 measurement | Done | 50-query labeled dataset on prx (NDCG@10=0.639) + 49-query dataset on external production codebase (NDCG@10=0.451). |
| Structural search validation | Done | Warns when pattern compiles but matches 0 files, or when pattern fails to compile for all languages. |

Measured NDCG@10: 0.639 (self), 0.451 (external production codebase). Target: 0.70+ on unfamiliar codebases.
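For context on the NDCG@10 figures, the metric divides the discounted cumulative gain of the returned ranking by the gain of an ideal ranking. The sketch below uses the linear-gain form, `rel / log2(position + 1)`; whether the prx benchmark uses linear or exponential gain is not stated here, so treat that choice and the function names as assumptions.

```rust
/// Discounted cumulative gain over the first `k` entries (linear gain).
fn dcg(relevances: &[f64], k: usize) -> f64 {
    relevances
        .iter()
        .take(k)
        .enumerate()
        // Position i (0-based) is discounted by log2(i + 2).
        .map(|(i, rel)| rel / ((i as f64) + 2.0).log2())
        .sum()
}

/// NDCG@k for one query.
/// `ranked`: relevance labels of the returned results, in returned order.
/// `judged`: all known relevance labels for the query (any order).
fn ndcg_at_k(ranked: &[f64], judged: &[f64], k: usize) -> f64 {
    let mut ideal = judged.to_vec();
    ideal.sort_by(|a, b| b.total_cmp(a)); // best possible ordering
    let ideal_dcg = dcg(&ideal, k);
    if ideal_dcg == 0.0 {
        0.0 // no relevant documents are known for this query
    } else {
        dcg(ranked, k) / ideal_dcg
    }
}
```

A dataset-level score such as 0.639 would then be the mean of `ndcg_at_k(.., .., 10)` over all labeled queries.
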

| Item | Status | Description |
| --- | --- | --- |
| Symbol-query ranking overhaul | Done | 12x definition boost for symbol queries, import-line penalty (0.2x), improved definition detection for Python/TS. |
| Chunk header enrichment | Done | BM25 enrichment now prepends `[lang] file_path stem_tokens` to each chunk. |
| Persistent dense index | Done | Embeddings computed at index time, stored as `embeddings.bin`. |
| Sharper mode detection | Done | Symbol queries: alpha=0.1 (near-pure BM25). NL queries: alpha=0.6. Static synonym dict (18 pairs). |
| Reranker weight tuning | Done | Definition boost 3→4 (NL), 8→12 (symbol). Stem match 1.0→1.5. |
| Chunk overlap | Done | 200-byte overlap between chunks, snapped to line boundaries. |
| Embedding model upgrade | Done | Evaluated 3 models: potion-retrieval-32M selected (+7% NDCG). |
| Symbol index | Done | Map each symbol to definition location + reference count. Symbol NDCG: 0.263 → 0.619. |

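The alpha values in the mode-detection row read naturally as a linear blend in which alpha weights the dense (embedding) score and `1 - alpha` weights BM25, since alpha=0.1 is described as near-pure BM25. The sketch below shows that blend. The classification heuristic in `classify` is purely hypothetical, and both scores are assumed to be normalized to the same range before blending.

```rust
/// Query categories named in the roadmap; the enum itself is hypothetical.
#[derive(Debug, PartialEq, Clone, Copy)]
enum QueryKind {
    Symbol,
    NaturalLanguage,
}

/// Guess the query kind. Assumed heuristic: a single token that contains
/// `_`, `::`, or an interior uppercase letter looks like a symbol.
fn classify(query: &str) -> QueryKind {
    let q = query.trim();
    let single_token = !q.is_empty() && !q.contains(char::is_whitespace);
    let looks_like_code =
        q.contains('_') || q.contains("::") || q.chars().skip(1).any(|c| c.is_uppercase());
    if single_token && looks_like_code {
        QueryKind::Symbol
    } else {
        QueryKind::NaturalLanguage
    }
}

/// Alpha values taken from the table above.
fn alpha_for(kind: QueryKind) -> f64 {
    match kind {
        QueryKind::Symbol => 0.1,
        QueryKind::NaturalLanguage => 0.6,
    }
}

/// Blend normalized dense and BM25 scores.
fn hybrid_score(dense: f64, bm25: f64, alpha: f64) -> f64 {
    alpha * dense + (1.0 - alpha) * bm25
}
```

Under this reading, a symbol query such as `find_workspace_root` is ranked almost entirely by lexical match, while a natural-language question leans on the embedding score.
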
10 new parsers implemented. Total: 22 parsers.
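
| Parser | Tool | Status |
| --- | --- | --- |
| terraform | `plan`, `apply` | Done |
| kubectl | `describe`, `get` | Done |
| kubectl-logs | `logs` (+ `docker logs`) | Done |
| docker-build | `build` | Done |
| mvn | `test`, `build` | Done |
| gradle | `build`, `test` | Done |
| dotnet | `test`, `build` | Done |
| mypy | type check | Done |
| npm-ls | `npm list` | Done |
| git-log | `log` | Done |
| pytest-cov | `pytest --cov`, `coverage report` | Done |
| go-cover | `go test -cover` | Done |
| jest-cov | `jest --coverage`, `c8` | Done |
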

| Item | Status | Description |
| --- | --- | --- |
| `prx context` | Done | Assemble context packages: search + read + outline in one call. |
| `prx impact` | Done | Reverse dependency analysis using the import graph. |


| Item | Status |
| --- | --- |
| `cargo audit` in CI | Done |
| `cargo deny` in CI | Done |


| Item | Status | Description |
| --- | --- | --- |
| `prx run --auto-json` | Done | Auto-inject `--json` flags for tools with structured output. |
| Tree-sitter import extraction | Done | Replace regex imports with tree-sitter AST queries. |
| Import language coverage | Done | bash, CSS, HTML import extraction added. |


| Item | Status | Description |
| --- | --- | --- |
| Self-contained build (`build.rs`) | Done | `cargo build` works without `make models` or Python. SHA-256 pinned artifacts. |
| Migrate off bincode | Done | Replace bincode (RUSTSEC-2025-0141) with postcard for all index serialization. |


| Item | Status | Description |
| --- | --- | --- |
| `define_regex!` macro | Done | Reduce 3-line `LazyLock<Regex>` statics to 1-line macro calls across 22 parsers. ~130 lines saved. |
| `ParsedResult::new()` constructor | Done | Replace 10-line struct literals with 1-line constructor calls across 22 parsers. ~200 lines saved. |
| Extract `src/workspace.rs` | Done | Deduplicate `find_workspace_root()`, `relative_path()`, `is_test_file()`. ~73 lines saved. |


| Item | Priority | Status | Description |
| --- | --- | --- | --- |
| Parallel embedding (rayon) | High | Done | Embed chunks in parallel during indexing. ~300s → ~100s on 4-core for 55k chunks. |
| Parallel chunking | High | Done | Parse and chunk files in parallel during indexing. |
| Parallel import extraction | Medium | Done | Extract imports per-file in parallel during `ImportGraph::build_full`. |
| E2E coverage for `search.rs` | High | In progress | Cover hybrid/semantic search paths (47.6% → 80%+). |
| E2E coverage for `mcp.rs` | High | In progress | Cover remaining MCP tool paths (51.4% → 80%+). |
| E2E coverage for `run.rs` | Medium | Planned | Cover external command execution paths (63.1% → 80%+). |
| E2E coverage for `init.rs` | Medium | Planned | Cover config generation paths (59.8% → 80%+). |
| Test helpers (`tests/helpers/`) | Medium | Planned | Extract `run_prx()`, `test_dir()` helpers. ~300 lines saved. |

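The parallel embedding item is implemented with rayon according to the table. To keep the example dependency-free, the sketch below swaps in `std::thread::scope` from the standard library instead; the idea (split the chunk list into batches, embed each batch on its own worker, concatenate in order) is the same. The `embed` function is a hypothetical stand-in for real model inference.

```rust
use std::thread;

/// Stand-in for model inference: maps text to a tiny fake vector.
fn embed(chunk: &str) -> Vec<f32> {
    let spaces = chunk.bytes().filter(|b| *b == b' ').count();
    vec![chunk.len() as f32, spaces as f32]
}

/// Embed all chunks using up to `workers` scoped threads.
/// Output order matches input order.
fn embed_parallel(chunks: &[String], workers: usize) -> Vec<Vec<f32>> {
    if chunks.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division so every chunk lands in exactly one batch.
    let batch = (chunks.len() + workers - 1) / workers;

    thread::scope(|s| {
        // Spawn every worker first, then join in spawn order.
        let handles: Vec<_> = chunks
            .chunks(batch)
            .map(|part| s.spawn(move || part.iter().map(|c| embed(c)).collect::<Vec<_>>()))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("embedding worker panicked"))
            .collect()
    })
}
```

Embedding each chunk is independent of the others, which is why the work parallelizes cleanly and why a roughly 3x speedup on 4 cores, as reported above, is plausible.
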

| Item | Priority | Description |
| --- | --- | --- |
| Memory-mapped index files | High | Use mmap instead of read-to-vec for `chunks.bin`, `bm25.bin`, `embeddings.bin`. The OS handles caching, so the index stays in memory across queries. |
| `bench-ndcg --plain` | Medium | Human-readable table output for terminal use. |
| `bench-ndcg` load-once | Medium | Load index once, query N times. |


| Item | Priority | Description |
| --- | --- | --- |
| Query generation for 8 pinned repos | High | 25 labeled queries per repo (flask, ripgrep, fastify, cargo, django, kafka, terraform, vscode). 200 total queries across 6 languages, 3 size tiers. |
| `benchmark.yml` CI workflow | High | Clone repos at pinned SHAs, build index, run NDCG, compare to baseline, fail on regression >0.05. |
| Results dashboard | Medium | `benchmarks/results/` with per-release JSON. |
| Expand to 40-50 queries per repo | Medium | 25 queries gives ±0.05-0.08 standard error. 40-50 narrows to ±0.03, enabling tighter CI gate. |

Repository matrix:

| Size | Repo | Language | LOC |
| --- | --- | --- | --- |
| Small | pallets/flask | Python | 15K |
| Small | BurntSushi/ripgrep | Rust | 25K |
| Small | fastify/fastify | TypeScript | 15K |
| Medium | rust-lang/cargo | Rust | 150K |
| Medium | django/django | Python | 300K |
| Medium | apache/kafka | Java | 500K |
| Large | hashicorp/terraform | Go | 2M |
| Large | microsoft/vscode | TypeScript | 1M |

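The CI gate and the standard-error figures in this section rest on two small calculations: the standard error of the mean over per-query NDCG scores, and a threshold check against a stored baseline. The sketch below shows both. The function names are hypothetical, and the gate compares means only; whether the real workflow compares per-repo or pooled scores is not stated here.

```rust
/// Arithmetic mean of per-query scores.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Standard error of the mean: sample standard deviation / sqrt(n).
/// Returns 0.0 when fewer than two scores are available.
fn standard_error(xs: &[f64]) -> f64 {
    if xs.len() < 2 {
        return 0.0;
    }
    let n = xs.len() as f64;
    let m = mean(xs);
    let variance = xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (n - 1.0);
    (variance / n).sqrt()
}

/// True when the current score dropped below baseline by more than `threshold`.
fn regressed(baseline: f64, current: f64, threshold: f64) -> bool {
    baseline - current > threshold
}
```

Because the standard error shrinks with the square root of the query count, doubling from 25 to 50 queries reduces it by a factor of about 1.4, all else equal, which is why a larger query set is needed before the gate can be tightened.
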

| Item | Priority | Status |
| --- | --- | --- |
| Documentation site (mdBook) | High | Done (33 pages at civitas-io.github.io/prx/). |
| `deploy-docs.yml` workflow | High | Done (auto-deploy on push to main). |
| Docs cleanup | Medium | Done (`book/` is single source of truth, `docs/` archived). |


| Item | Priority | Status |
| --- | --- | --- |
| `cargo publish` | High | Done (crates.io/crates/prx; install with `cargo install prx`). |
| Homebrew formula | High | Done (`brew install civitas-io/tap/prx`; tap: civitas-io/homebrew-tap). |
| `build.rs` `OUT_DIR` fix | High | Done (models download to `OUT_DIR`; crate is 171 KB compressed). |
| npm wrapper | Medium | Deferred (`npx prx` for JS/TS agents). |
| pip wrapper | Medium | Deferred (`pip install prx` for Python agents). |


| Item | Priority | Description |
| --- | --- | --- |
| Kotlin grammar | Medium | tree-sitter-kotlin + import/outline extraction |
| Swift grammar | Medium | tree-sitter-swift + import/outline extraction |
| C# grammar | Medium | tree-sitter-c-sharp + import/outline extraction |
| PHP grammar | Medium | tree-sitter-php + import/outline extraction |
| Elixir grammar | Medium | tree-sitter-elixir + import/outline extraction |

Benchmark data (v0.5.7) shows the 32M general-purpose model works for small codebases (NDCG@10 0.5-0.7) but degrades on medium (0.3-0.4) and large (0.2-0.3). Code-specific models distilled via Model2Vec can close this gap while keeping pure-Rust inference.

| Item | Priority | Description |
| --- | --- | --- |
| Expand benchmark to 40-50 queries per repo | High | 25 queries gives ±0.05-0.08 noise, so tighter baselines are needed before evaluating new models. Prioritize medium/large repos (django, kafka, terraform, vscode). |
| Distill code-specific Model2Vec models | High | Distill CodeSage-v2-Base (356M) and/or all-mpnet-base-v2 (109M) into Model2Vec format (256d, f16). ~30 sec distillation, ~8 MB output. Benchmark against expanded query suite. |
| `prx index --model` flag | High | Support `--model builtin` (default), `--model standard`, `--model large`. Download on first use to `~/.prx/models/`. |
| Repo analysis + model recommendation | High | After `prx index`, emit a hint if repo has >3K files: "For better semantic search, try `prx index --model standard`". |
| Model download infrastructure | High | SHA-256 pinned downloads from HuggingFace or GitHub Releases. Offline via `PRX_MODELS_DIR`. Progress bar. |
| Benchmark regression gate tightening | Medium | With 40-50 queries, tighten CI gate from 0.05 to 0.02 regression threshold. |

Model tiers:

| Tier | Model | Size | Target | NDCG@10 (expected) |
| --- | --- | --- | --- | --- |
| `builtin` | potion-retrieval-32M (current) | 32 MB embedded | <3K files | 0.5-0.7 |
| `standard` | CodeSage-Base-M2V-256 | ~8 MB download | 3K-10K files | 0.5-0.6 (est.) |
| `large` | Jina-Code-v3-M2V-512 | ~30-60 MB download | 10K+ files | 0.4-0.5 (est.) |

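The tier table and the planned post-index hint suggest a simple mapping from file count to recommended model. The sketch below is one possible shape for that logic, not the planned implementation: the names `ModelTier`, `recommend_tier`, and `index_hint` are hypothetical, the treatment of the exact 3K and 10K boundaries is an assumption, and emitting a hint for the `large` tier extends the single hint message quoted above.

```rust
/// Model tiers from the table above; the enum itself is hypothetical.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ModelTier {
    Builtin,
    Standard,
    Large,
}

impl ModelTier {
    /// Value passed to the planned `--model` flag.
    fn flag(self) -> &'static str {
        match self {
            ModelTier::Builtin => "builtin",
            ModelTier::Standard => "standard",
            ModelTier::Large => "large",
        }
    }
}

/// Map a repository's file count to a tier.
/// Assumption: 3,000 and 10,000 themselves fall into the higher tier.
fn recommend_tier(file_count: usize) -> ModelTier {
    match file_count {
        0..=2_999 => ModelTier::Builtin,
        3_000..=9_999 => ModelTier::Standard,
        _ => ModelTier::Large,
    }
}

/// Hint to print after indexing, or `None` when the built-in model suffices.
fn index_hint(file_count: usize) -> Option<String> {
    match recommend_tier(file_count) {
        ModelTier::Builtin => None,
        tier => Some(format!(
            "For better semantic search, try prx index --model {}",
            tier.flag()
        )),
    }
}
```

Keeping the recommendation as a hint rather than an automatic switch matches the plan above, where larger models are downloaded only on first explicit use.
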
CLI flags and JSON output schemas may change between minor versions. All breaking changes are documented in CHANGELOG.md with migration guides. JSON output includes a version field for programmatic detection.