v0.3.4 · pip install neuralmind

Your codebase,
40–70× smaller
for AI agents

NeuralMind turns any repository into a queryable neural index. AI coding agents answer code questions in ~800 tokens instead of loading 50,000+ tokens of raw source.


40–70×

Token Reduction

~800

Tokens per Query

97%

Cost Savings

LLMs are flying blind
on large codebases

Without NeuralMind, every code question forces an AI agent to load raw source files — burning tokens and budget on irrelevant context.

Without NeuralMind

Raw file loading on every query

Tokens per query

50,000+

Cost per query (input pricing, Claude 3.5 Sonnet to GPT-4.5)

$0.15–$3.75

Monthly, Claude 3.5 Sonnet (100 queries/day)

~$450

With NeuralMind

Smart semantic context retrieval

Tokens per query

~800

Cost per query (input pricing, Claude 3.5 Sonnet to GPT-4.5)

$0.002–$0.06

Monthly, Claude 3.5 Sonnet (100 queries/day)

~$7

4-layer progressive
disclosure

NeuralMind loads only what's relevant to each query. Static orientation layers always load; dynamic layers respond to your specific question.

Identity — Always Loaded

Project name, description, graph size, entry points, main patterns

~100 tokens

Architecture Summary — Always Loaded

Module overview, key components, dependencies, data flow, top clusters

~300 tokens

Relevant Modules — Query-Specific

Code clusters, found via community detection, that are most semantically similar to your question

~300 tokens

Semantic Search — Query-Specific

Direct vector similarity hits, reranked by learned cooccurrence patterns

~300 tokens
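A minimal sketch of how the four layers could be assembled, assuming a crude 4-characters-per-token estimate and per-layer budgets taken from the approximate sizes above. Layer, build_context, and truncate_to_budget are hypothetical names for illustration, not NeuralMind's API, and the query-specific text is passed in here rather than retrieved.

```python
# Hypothetical sketch of progressive disclosure. Layer names and rough budgets
# come from the page; the Layer type, the 4-chars-per-token estimate, and the
# line-based truncation are illustrative assumptions, not NeuralMind's code.
from __future__ import annotations

from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)


@dataclass
class Layer:
    name: str
    text: str
    budget: int          # approximate token cap for this layer
    always_loaded: bool  # True for static orientation layers


def truncate_to_budget(text: str, budget: int) -> str:
    """Keep whole lines until the estimated token budget would be exceeded."""
    kept, used = [], 0
    for line in text.splitlines():
        cost = estimate_tokens(line)
        if used + cost > budget:
            break
        kept.append(line)
        used += cost
    return "\n".join(kept)


def build_context(layers: list[Layer], query: str | None = None) -> str:
    """Static layers always load; query-specific layers load only when a question is given."""
    sections = []
    for layer in layers:
        if not layer.always_loaded and query is None:
            continue
        sections.append(f"## {layer.name}\n{truncate_to_budget(layer.text, layer.budget)}")
    return "\n\n".join(sections)
```

Calling build_context(layers) with no question returns only the Identity and Architecture Summary sections, which is all a session-start orientation needs; passing a question adds the two query-specific sections.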

Cut tokens at the
source and the output

Most tools optimize only retrieval. NeuralMind compresses both what agents fetch and what they consume from tool outputs.

Phase 1 — Retrieval

What to fetch

neuralmind wakeup . → ~365 tokens

neuralmind query . "<question>" → ~800 tokens

neuralmind skeleton <file> → 5–15× cheaper
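To illustrate the idea behind a file skeleton, here is a sketch for Python files only, built on the standard-library ast module. It is a stand-in, not how NeuralMind builds its skeletons: it lists imports, classes, function signatures, and the names each function calls, but it does not resolve calls across files.

```python
# Illustrative stand-in, not NeuralMind's implementation: a compact skeleton of
# a Python file using the standard-library ast module. Lists imports, classes,
# function signatures, and the (unresolved) names each function calls.
from __future__ import annotations

import ast


def _call_name(node: ast.Call) -> str | None:
    """Return the called name for f(...) or obj.f(...), else None."""
    if isinstance(node.func, ast.Name):
        return node.func.id
    if isinstance(node.func, ast.Attribute):
        return node.func.attr
    return None


def _describe(fn: ast.FunctionDef | ast.AsyncFunctionDef, indent: str = "") -> str:
    keyword = "async def" if isinstance(fn, ast.AsyncFunctionDef) else "def"
    args = ", ".join(a.arg for a in fn.args.args)
    # Every call anywhere inside the function body, deduplicated and sorted.
    calls = sorted({_call_name(n) for n in ast.walk(fn) if isinstance(n, ast.Call)} - {None})
    suffix = f"  -> calls: {', '.join(calls)}" if calls else ""
    return f"{indent}{keyword} {fn.name}({args}){suffix}"


def skeleton(source: str) -> list[str]:
    out = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            out.append("import " + ", ".join(a.name for a in node.names))
        elif isinstance(node, ast.ImportFrom):
            module = "." * node.level + (node.module or "")
            out.append(f"from {module} import " + ", ".join(a.name for a in node.names))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            out.append(_describe(node))
        elif isinstance(node, ast.ClassDef):
            out.append(f"class {node.name}")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    out.append(_describe(item, indent="    "))
    return out
```

The import lines double as a rough list of cross-file dependencies, which is why a skeleton can stay useful while omitting every function body.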

Phase 2 — Compression

What agents see

Read (file) → ~88% savings

Bash (output) → ~91% savings

Grep (matches) → capped at 25 matches
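A sketch of what Phase 2 compression can look like, assuming tool output arrives as plain text. Only the 25-match cap for Grep comes from the page; cap_grep, squeeze_output, and the 20-line head and tail windows are hypothetical choices, and this is not NeuralMind's hook code.

```python
# Hypothetical Phase 2 compression helpers. The 25-match Grep cap is from the
# page; the function names and head/tail sizes are illustrative assumptions.
GREP_MATCH_CAP = 25


def cap_grep(matches: list[str], cap: int = GREP_MATCH_CAP) -> list[str]:
    """Keep the first `cap` matches and note how many were dropped."""
    if len(matches) <= cap:
        return matches
    return matches[:cap] + [f"... {len(matches) - cap} more matches omitted"]


def squeeze_output(output: str, head: int = 20, tail: int = 20) -> str:
    """Keep the first `head` and last `tail` lines of long command output."""
    lines = output.splitlines()
    if len(lines) <= head + tail:
        return output
    omitted = len(lines) - head - tail
    return "\n".join(lines[:head] + [f"... [{omitted} lines omitted] ..."] + lines[-tail:])
```

Keeping the tail of long output is a deliberate choice: build and test failures usually print their errors last.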

See exactly what
agents receive

Every response includes a token footer showing real-time savings. No guesswork: you always know how efficient the context is.
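A hypothetical footer formatter shows the arithmetic behind the savings figures. With the page's own numbers, about 800 tokens of context against a 50,000-token raw baseline, the ratio is 62.5×, inside the advertised 40–70× range. The function name and footer wording are assumptions; NeuralMind's actual footer may look different.

```python
# Hypothetical footer format; NeuralMind's actual footer text may differ.
def token_footer(context_tokens: int, baseline_tokens: int) -> str:
    """Summarize tokens sent versus a raw-source baseline."""
    if context_tokens <= 0 or baseline_tokens <= 0:
        raise ValueError("token counts must be positive")
    ratio = baseline_tokens / context_tokens
    saved_pct = 100 * (1 - context_tokens / baseline_tokens)
    return (f"[tokens: {context_tokens:,} vs {baseline_tokens:,} raw | "
            f"{ratio:.1f}x smaller | {saved_pct:.1f}% saved]")


print(token_footer(800, 50_000))
# [tokens: 800 vs 50,000 raw | 62.5x smaller | 98.4% saved]
```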

Automatic on session start

Run neuralmind wakeup . once. The agent orients itself without reading a single source file.

Query-aware context

Different questions get different context. Asking about auth returns auth clusters. Asking about payments returns payment logic.

Gets smarter over time

The cooccurrence reranker learns which modules appear together in your queries and boosts their relevance automatically.
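One simple way a cooccurrence reranker could work, sketched under assumptions: count how often pairs of modules were used together in past queries, then add a weighted boost to each hit for the other hits it has co-occurred with. The class name, the 0.1 weight, and the additive scoring rule are hypothetical, not NeuralMind's documented method.

```python
# Hypothetical cooccurrence reranker: learns module pairs from past queries
# and boosts hits that historically appear together. Not NeuralMind's code.
from collections import Counter
from itertools import combinations


class CooccurrenceReranker:
    def __init__(self, weight: float = 0.1):
        self.weight = weight    # assumed strength of the learned boost
        self.pairs = Counter()  # (module_a, module_b) -> times used together

    def record(self, modules: list[str]) -> None:
        """Log the set of modules one query ended up using."""
        for a, b in combinations(sorted(set(modules)), 2):
            self.pairs[(a, b)] += 1

    def _count(self, a: str, b: str) -> int:
        # Pairs are stored in sorted order, so normalize the lookup key.
        return self.pairs[(a, b) if a < b else (b, a)]

    def rerank(self, scored: dict[str, float]) -> list[tuple[str, float]]:
        """Boost each hit by how often it co-occurred with the other hits."""
        boosted = {}
        for mod, score in scored.items():
            support = sum(self._count(mod, other) for other in scored if other != mod)
            boosted[mod] = score + self.weight * support
        return sorted(boosted.items(), key=lambda kv: kv[1], reverse=True)
```

For example, after three queries that used auth.py and middleware.py together, both modules move ahead of an unrelated hit with a slightly higher raw similarity.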

neuralmind query

$ neuralmind query . "how does auth work?"

## Project: myapp
Full-stack web app. React 18, Node.js, PostgreSQL.
Knowledge Graph: 241 entities, 23 clusters

## Architecture Overview
Cluster 5 (45 entities): authenticate_user, hash_password, verify_token
Cluster 12 (23 entities): UserController, AuthMiddleware

## Relevant Code Areas
Cluster 5 (relevance: 1.73)
authenticate_user — auth.py
verify_token — auth.py

## Search Results
AuthMiddleware (0.91) — middleware.py
jwt_handler (0.85) — auth/jwt.py

What this means for
your API bill

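Based on 100 queries/day over a 30-day month, priced at each model's input-token rate, with 50,000 tokens per query without NeuralMind and ~800 with it. NeuralMind itself runs entirely offline, so it adds no API costs beyond what your model provider charges.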

Model	Without NeuralMind	With NeuralMind	Monthly Savings
Claude 3.5 Sonnet	$450 / mo	$7 / mo	$443 saved
GPT-4o	$750 / mo	$12 / mo	$738 saved
Claude Opus	$2,250 / mo	$36 / mo	$2,214 saved
GPT-4.5	$11,250 / mo	$180 / mo	$11,070 saved
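The table follows from simple arithmetic. This sketch reproduces it, assuming input-token prices of $3, $5, $15, and $75 per million tokens for the four models (the rates the figures imply) and a 30-day month. Provider prices change, so treat the constants as placeholders. The same per-query arithmetic gives the cost ranges quoted earlier: 50,000 tokens costs $0.15 at $3 per million and $3.75 at $75 per million.

```python
# Reproduce the monthly-cost table. Prices are USD per million input tokens,
# back-derived from the table; check current provider pricing before reuse.
PRICE_PER_MTOK = {
    "Claude 3.5 Sonnet": 3.00,
    "GPT-4o": 5.00,
    "Claude Opus": 15.00,
    "GPT-4.5": 75.00,
}
RAW_TOKENS = 50_000           # raw source loaded per query (from the page)
NEURALMIND_TOKENS = 800       # context per query with NeuralMind (from the page)
QUERIES_PER_MONTH = 100 * 30  # 100 queries/day over an assumed 30-day month


def monthly_cost(tokens_per_query: int, price_per_mtok: float) -> float:
    """Monthly input-token cost in USD."""
    return tokens_per_query * QUERIES_PER_MONTH * price_per_mtok / 1_000_000


for model, price in PRICE_PER_MTOK.items():
    before = monthly_cost(RAW_TOKENS, price)
    after = monthly_cost(NEURALMIND_TOKENS, price)
    print(f"{model:<18} ${before:>8,.0f}  ${after:>5,.0f}  saved ${before - after:>8,.0f}")
```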

Works directly in
Claude Desktop & Cursor

Native Model Context Protocol server. Call NeuralMind tools directly from your AI agent session — no wrappers, no middleware.
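Under the hood, an MCP client such as Claude Desktop or Cursor calls a server tool with a JSON-RPC 2.0 request whose method is tools/call and whose params carry the tool name and its arguments; after the initialize handshake, clients discover each tool's input schema with tools/list. The sketch below only builds such a request message. The argument name question is a hypothetical placeholder, since the page does not document NeuralMind's input schemas.

```python
# Build an MCP tools/call request (JSON-RPC 2.0). The client normally sends
# this for you; "question" below is a hypothetical argument name.
import itertools
import json

_request_ids = itertools.count(1)  # request ids must not be reused within a session


def tools_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize a tools/call request for the given tool and arguments."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Read the real argument schema from the server's tools/list response.
print(tools_call_request("neuralmind_query", {"question": "how does auth work?"}))
```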

neuralmind_wakeup

Session-start orientation. Returns project context in ~365–600 tokens without reading any source files.

~400 tokens

neuralmind_query

Answer any code question. Returns structured context from all four layers (L0–L3), with the token count and reduction ratio.

~800–1100 tokens

neuralmind_skeleton

Explore a file's functions, call graph, and cross-file dependencies without loading full source.

5–15× cheaper

neuralmind_search

Semantic entity search. Finds functions, classes, and routes by concept — ranked by similarity.

ranked results

neuralmind_build

Incremental index update. Only re-embeds changed nodes — fast after small code changes.

incremental

neuralmind_benchmark

Measure per-query token counts and reduction ratios on your actual codebase.

metrics
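Incremental rebuilds usually hinge on change detection. A minimal sketch, assuming each graph node has a stable id and its source text is available: hash every node, compare against the hashes stored at the last build, and re-embed only nodes that are new or whose hash changed. The "file::name" id format and plan_incremental_build are hypothetical; the page does not say how NeuralMind detects changes.

```python
# Hypothetical change detection for an incremental index update: only nodes
# whose content hash changed (or that are new) need re-embedding.
import hashlib


def content_hash(source: str) -> str:
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def plan_incremental_build(nodes: dict[str, str], previous: dict[str, str]):
    """Return (to_embed, to_delete, new_hashes).

    nodes:    node id -> current source text
    previous: node id -> content hash stored at the last build
    """
    new_hashes = {node_id: content_hash(src) for node_id, src in nodes.items()}
    to_embed = sorted(n for n, h in new_hashes.items() if previous.get(n) != h)
    to_delete = sorted(set(previous) - set(nodes))  # nodes removed from the code
    return to_embed, to_delete, new_hashes
```

Persisting new_hashes after each build is what makes the next run cheap: unchanged nodes are skipped without touching the embedding model.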

NeuralMind vs.
Heuristic-only retrieval

Both approaches reduce context. The tradeoff is retrieval quality vs. zero dependencies. NeuralMind runs fully offline — no API calls, no cloud services, no data leaves your machine.

Feature	Heuristic-only	🧠 NeuralMind
Token reduction	~33× (97% fewer tokens)	40–70×
Retrieval accuracy	70–80% top-5	Higher (semantic)
External dependencies	✓ None	ChromaDB (local)
Runs offline	✓ Yes	✓ Yes
Learns from usage	✗ No	✓ Cooccurrence reranking
MCP server	✗ No	✓ Native
PostToolUse compression	✗ No	✓ Phase 2 hooks
File skeleton view	✗ No	✓ Call graph + deps

One install.
Dramatically less context.

