v0.3.4 · pip install neuralmind

Your codebase, 40–70× smaller for AI agents

NeuralMind turns any repository into a queryable neural index. AI coding agents answer code questions in ~800 tokens instead of loading 50,000+ tokens of raw source.

40–70× Token Reduction
~800 Tokens per Query
97% Cost Savings

LLMs are flying blind on large codebases

Without NeuralMind, every code question forces an AI agent to load raw source files — burning tokens and budget on irrelevant context.

Without NeuralMind

Raw file loading on every query

Tokens per query: 50,000+
Cost per query: $0.15 (Claude Sonnet) to $3.75 (GPT-4.5)
Monthly (100 queries/day, Claude Sonnet): ~$450

With NeuralMind

Smart semantic context retrieval

Tokens per query: ~800
Cost per query: $0.002 (Claude Sonnet) to $0.06 (GPT-4.5)
Monthly (100 queries/day, Claude Sonnet): ~$7
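
The two token figures above imply a reduction ratio and a savings percentage that can be checked with simple arithmetic. This is a minimal sketch using only the 50,000 and ~800 token figures quoted on this page; the function names are illustrative and not part of NeuralMind.

```python
def reduction_ratio(baseline_tokens: int, reduced_tokens: int) -> float:
    """How many times smaller the reduced context is than the baseline."""
    return baseline_tokens / reduced_tokens


def savings_fraction(ratio: float) -> float:
    """Fraction of input tokens (and input-token cost) avoided at a given ratio."""
    return 1.0 - 1.0 / ratio


# Figures quoted above: 50,000 tokens of raw source vs. ~800 tokens of context.
ratio = reduction_ratio(50_000, 800)
print(f"{ratio:.1f}x smaller, {savings_fraction(ratio):.1%} fewer tokens")

# The headline 40-70x range expressed as a percentage of tokens avoided.
for r in (40, 70):
    print(f"{r}x -> {savings_fraction(r):.1%} fewer tokens")
```

Because cost scales linearly with input tokens, the same fraction applies to the input-token portion of the bill; output tokens are not covered by this calculation.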

4-layer progressive disclosure

NeuralMind loads only what's relevant to each query. Static orientation layers always load; dynamic layers respond to your specific question.

L0 · Identity · Always Loaded · ~100 tokens
Project name, description, graph size, entry points, main patterns

L1 · Architecture Summary · Always Loaded · ~300 tokens
Module overview, key components, dependencies, data flow, top clusters

L2 · Relevant Modules · Query-Specific · ~300 tokens
Code clusters, found via community detection, that are most semantically similar to your question

L3 · Semantic Search · Query-Specific · ~300 tokens
Direct vector similarity hits, reranked by learned cooccurrence patterns
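
The layering described here can be pictured as a small assembly step: static layers are always included, dynamic layers are filled per query, and each layer is trimmed to its budget. The sketch below is an illustration under stated assumptions, not NeuralMind's implementation: the `Layer` class, `assemble_context`, and the whitespace-based token count are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    budget: int          # nominal token budget quoted for the layer
    always_loaded: bool  # True for static orientation layers


LAYERS = [
    Layer("L0 identity", 100, True),
    Layer("L1 architecture summary", 300, True),
    Layer("L2 relevant modules", 300, False),
    Layer("L3 semantic search", 300, False),
]


def rough_token_count(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokens.
    return len(text.split())


def truncate_to_budget(text: str, budget: int) -> str:
    # Keep at most `budget` words of the layer's text.
    return " ".join(text.split()[:budget])


def assemble_context(static_text: dict, dynamic_text: dict) -> str:
    """Concatenate non-empty layers in order, each trimmed to its budget."""
    parts = []
    for layer in LAYERS:
        source = static_text if layer.always_loaded else dynamic_text
        text = source.get(layer.name, "")
        if text:
            body = truncate_to_budget(text, layer.budget)
            parts.append(f"[{layer.name}]\n{body}")
    return "\n\n".join(parts)


context = assemble_context(
    {"L0 identity": "demo-service: a small payments API"},
    {"L3 semantic search": "charge_card() in billing/charges.py"},
)
print(context)
print("max budget:", sum(layer.budget for layer in LAYERS))
```

The nominal budgets add up to about 1,000 tokens, which falls inside the ~800–1100 token range quoted for `neuralmind_query`.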

Cut tokens at the source and the output

Most tools optimize only retrieval. NeuralMind compresses both what agents fetch and what they consume from tool outputs.

Phase 1 — Retrieval

What to fetch

neuralmind wakeup . (~365 tokens)
neuralmind query "?" (~800 tokens)
neuralmind skeleton <file> (5–15× cheaper)
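
To see why a skeleton view is cheaper than a full file read, it helps to look at what is left once function bodies are dropped. The sketch below uses Python's standard `ast` module to list class and function signatures for Python source only. It is an assumption-laden illustration, not the `neuralmind skeleton` implementation, and it omits the call graph and cross-file dependencies that the real command is described as returning.

```python
import ast


def skeleton(source: str) -> str:
    """Return class and function headers, indented by nesting, without bodies."""
    lines = []

    def visit(node: ast.AST, depth: int) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Only plain positional parameters are listed in this sketch.
                args = ", ".join(a.arg for a in child.args.args)
                lines.append(f"{'    ' * depth}def {child.name}({args})")
                visit(child, depth + 1)
            elif isinstance(child, ast.ClassDef):
                lines.append(f"{'    ' * depth}class {child.name}")
                visit(child, depth + 1)
            else:
                # Keep walking so definitions inside if/try blocks are found.
                visit(child, depth)

    visit(ast.parse(source), 0)
    return "\n".join(lines)


SAMPLE = '''
class Auth:
    def login(self, user, password):
        token = issue_token(user)
        return token

def issue_token(user):
    return "t-" + user
'''

print(skeleton(SAMPLE))
```

The saving grows with the length of the function bodies, since the skeleton keeps one line per definition regardless of how much code sits underneath it.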
Phase 2 — Compression

What agents see

Read (file): ~88% savings
Bash (output): ~91% savings
Grep (matches): capped at 25
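
The Grep cap is the simplest of these compression rules to picture. The following is a minimal sketch of capping a match list at 25 entries; the function name and the trailing "omitted" line are assumptions made for illustration, not NeuralMind's actual hook behavior.

```python
MAX_MATCHES = 25  # cap quoted above for Grep results


def cap_matches(matches: list[str], limit: int = MAX_MATCHES) -> list[str]:
    """Keep the first `limit` matches and note how many were dropped."""
    if len(matches) <= limit:
        return list(matches)
    omitted = len(matches) - limit
    # Hypothetical trailer so the agent knows the list was truncated.
    return matches[:limit] + [f"... {omitted} more matches omitted"]


hits = [f"src/module_{i}.py:10: def handler():" for i in range(40)]
capped = cap_matches(hits)
print(len(capped))
print(capped[-1])
```

Telling the agent how many matches were dropped lets it decide whether to narrow the search instead of assuming the list is complete.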

See exactly what agents receive

Every response includes a token footer showing real-time savings. No guesswork — you always know the efficiency of context.

Automatic on session start

Run neuralmind wakeup . once. The agent orients itself without reading a single source file.

Query-aware context

Different questions get different context. Asking about auth returns auth clusters. Asking about payments returns payment logic.

Gets smarter over time

The cooccurrence reranker learns which modules appear together in your queries and boosts their relevance automatically.
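
One way to picture cooccurrence reranking: count how often pairs of modules were returned together in past queries, then give each candidate a small boost based on how often it appeared alongside the current top hit. The sketch below follows that idea; the class, the `boost` weight, and the scoring rule are assumptions for illustration and may differ from what NeuralMind actually does.

```python
from collections import Counter
from itertools import combinations


class CooccurrenceReranker:
    def __init__(self, boost: float = 0.1):
        self.boost = boost  # hypothetical weight per observed co-occurrence
        self.pair_counts: Counter = Counter()

    def observe(self, modules: list[str]) -> None:
        """Record that these modules were returned together for one query."""
        for a, b in combinations(sorted(set(modules)), 2):
            self.pair_counts[(a, b)] += 1

    def rerank(self, scores: dict[str, float]) -> list[str]:
        """Order candidates by similarity plus a boost for co-occurring with the top hit."""
        if not scores:
            return []
        top = max(scores, key=scores.get)

        def adjusted(module: str) -> float:
            pair = tuple(sorted((module, top)))
            return scores[module] + self.boost * self.pair_counts[pair]

        return sorted(scores, key=adjusted, reverse=True)


reranker = CooccurrenceReranker()
similarity = {"auth": 0.9, "billing": 0.5, "session": 0.4}
before = reranker.rerank(similarity)

# Three past queries returned auth and session together.
for _ in range(3):
    reranker.observe(["auth", "session"])
after = reranker.rerank(similarity)

print(before)
print(after)
```

In this example "session" starts below "billing" on raw similarity and moves above it once the reranker has seen it returned alongside "auth" several times.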

What this means for your API bill

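Based on 100 queries/day (about 3,000 queries a month) at 50,000 tokens per query without NeuralMind and ~800 with it. NeuralMind runs entirely offline, so there are no additional API costs beyond your model provider.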

Model | Without NeuralMind | With NeuralMind | Monthly Savings
Claude 3.5 Sonnet | $450 / mo | $7 / mo | $443 saved
GPT-4o | $750 / mo | $12 / mo | $738 saved
Claude Opus | $2,250 / mo | $36 / mo | $2,214 saved
GPT-4.5 | $11,250 / mo | $180 / mo | $11,070 saved
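
The monthly figures can be reproduced from three inputs: tokens per query, queries per month, and a price per million input tokens. In the sketch below the per-million prices are back-derived from the table's own numbers and a 30-day month is assumed. They are assumptions rather than quoted provider prices, and output tokens are ignored.

```python
QUERIES_PER_DAY = 100
DAYS_PER_MONTH = 30  # assumption implied by the table's figures

# USD per million input tokens, back-derived from the table (assumption).
PRICE_PER_MTOK = {
    "Claude 3.5 Sonnet": 3.0,
    "GPT-4o": 5.0,
    "Claude Opus": 15.0,
    "GPT-4.5": 75.0,
}


def monthly_cost(tokens_per_query: int, price_per_mtok: float) -> float:
    """Input-token cost in USD for one month of queries."""
    queries = QUERIES_PER_DAY * DAYS_PER_MONTH
    return tokens_per_query * queries * price_per_mtok / 1_000_000


for model, price in PRICE_PER_MTOK.items():
    without = monthly_cost(50_000, price)  # raw file loading
    with_nm = monthly_cost(800, price)     # ~800-token context
    saved = without - with_nm
    print(f"{model}: ${without:,.0f} -> ${with_nm:,.0f} (saves ${saved:,.0f})")
```

Swapping in your own query volume, token counts, and current provider pricing gives an estimate for your workload.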

Works directly in Claude Desktop & Cursor

Native Model Context Protocol server. Call NeuralMind tools directly from your AI agent session — no wrappers, no middleware.

neuralmind_wakeup

Session-start orientation. Returns project context in ~365–600 tokens without reading any source files.

~400 tokens

neuralmind_query

Answer any code question. Returns L0–L3 structured context with token count and reduction ratio.

~800–1100 tokens

neuralmind_skeleton

Explore a file's functions, call graph, and cross-file dependencies without loading full source.

5–15× cheaper

neuralmind_search

Semantic entity search. Finds functions, classes, and routes by concept — ranked by similarity.

ranked results

neuralmind_build

Incremental index update. Only re-embeds changed nodes — fast after small code changes.

incremental

neuralmind_benchmark

Measure per-query token counts and reduction ratios on your actual codebase.

metrics
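
Under the Model Context Protocol, a client invokes a server tool with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `neuralmind_query`. The `question` argument name is a guess for illustration; the real parameter names come from the tool's published input schema. In practice the MCP client (Claude Desktop, Cursor) builds and sends this message for you.

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request as used by MCP."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# "question" is a hypothetical argument name, not confirmed by the source.
request = make_tool_call(
    1, "neuralmind_query", {"question": "How does authentication work?"}
)
print(request)
```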

NeuralMind vs. Heuristic-only retrieval

Both approaches reduce context. The tradeoff is retrieval quality vs. zero dependencies. NeuralMind runs fully offline — no API calls, no cloud services, no data leaves your machine.

Feature | Heuristic-only | 🧠 NeuralMind
Token reduction | ~33× (97% fewer tokens) | 40–70×
Retrieval accuracy | 70–80% top-5 | Higher (semantic)
External dependencies | None | ChromaDB (local)
Runs offline | Yes | Yes
Learns from usage | No | Cooccurrence reranking
MCP server | No | Native
PostToolUse compression | No | Phase 2 hooks
File skeleton view | No | Call graph + deps

One install.
Dramatically less context.