Context Optimization API

Context compression for LLM pipelines

HighSNR compresses long documents to fit your token budget before they reach your LLM, cutting costs without sacrificing answer quality.

✕ No AI involved · 🔒 Zero data retention · = Same input → same output

Where it fits

LLM calls

Compress before you send

Pass a long document and a token budget. Get back only the chunks that matter. Fewer tokens sent means lower cost, faster responses, and less hallucination from noise.

RAG Memory & Embeddings

Fewer vectors, less noise

Before embedding a large corpus, compress documents first. Fewer, higher-quality chunks mean less storage, faster retrieval, and less noise in your vector store.
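In a RAG ingestion pipeline, the compression step sits between loading and embedding. A minimal sketch, where `compress` stands in for a call to the optimize endpoint and `embed` for your embedding model (both names are illustrative, not part of HighSNR):

```python
def ingest(documents, compress, embed, budget_tokens=2000):
    """Compress each document to its most relevant chunks, then embed
    only those chunks instead of the full text."""
    vectors = []
    for doc in documents:
        for chunk in compress(doc, budget_tokens):  # e.g. selected_chunks from the API
            vectors.append((chunk, embed(chunk)))
    return vectors
```

The vector store then holds fewer, denser entries, which is where the storage and retrieval savings come from.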

How it works

✕

No AI inside

No model, no black box, no randomness. You know exactly what it does.

=

Deterministic

Same document, same budget, same output. Every time. No model drift, no surprises in production.

🔒

Zero retention

Your documents are never stored, logged, or used for training. Only counters and metadata are kept.

⚡

Fast

Sub-second for most documents. No local model download, no GPU required.

Benchmark

LongBench v1 · GPT-4o · n=200 per dataset · QA F1 score

Evaluated on one multi-hop and one single-hop QA dataset. Higher is better.

HotpotQA

Config                   50%     60%     70%     80%     Full doc
No hint                  65.29   66.34   68.08   70.70   –
With query hint          67.28   68.02   69.95   70.96   –
Full context (baseline)  –       –       –       –       69.71

At 80% budget, HighSNR beats full-context F1.

Actual token ratio (output / input) – HotpotQA

Target   Mean    Median   Min     Max
50%      55.9%   55.4%    41.6%   71.7%
60%      67.9%   67.3%    55.0%   83.9%
70%      79.8%   79.1%    69.5%   99.9%
80%      91.4%   90.8%    81.1%   100.0%

Qasper

Config                   50%     60%     70%     80%     Full doc
No hint                  35.51   38.16   41.36   45.37   –
With query hint          39.87   40.76   42.97   45.21   –
Full context (baseline)  –       –       –       –       47.22

At 80% budget, HighSNR retains 96% of full-context F1 on scientific QA.

Actual token ratio (output / input) – Qasper

Target   Mean    Median   Min     Max
50%      54.7%   54.4%    37.5%   69.5%
60%      66.4%   66.2%    47.3%   79.7%
70%      78.0%   77.6%    69.2%   92.0%
80%      89.9%   89.5%    79.4%   100.0%

Actual ratios run above the target because HighSNR never cuts a chunk mid-sentence. Chunks are selected whole: if the next chunk would overflow the budget it is skipped, so most outputs land just below the target. Short documents, where a single chunk can span the full budget, pull the mean slightly above the target percentage.
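The whole-chunk selection rule can be sketched as a greedy pass over pre-split chunks (an illustration of the behavior described above, not HighSNR's actual implementation; the word-count tokenizer is a stand-in):

```python
def select_chunks(chunks, budget_tokens, n_tokens):
    """Greedily keep whole chunks, in order, that still fit the budget;
    a chunk that would overflow the budget is skipped, never truncated."""
    selected, used = [], 0
    for i, chunk in enumerate(chunks):
        cost = n_tokens(chunk)
        if used + cost <= budget_tokens:
            selected.append(i)
            used += cost
    return selected

# Crude word-count tokenizer as a stand-in for a real one.
word_count = lambda text: len(text.split())
```

With a 9-token budget over chunks of 5, 10, and 3 words, the 10-word chunk is skipped but the 3-word chunk is still taken, so the output lands just under budget rather than exactly at it.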

Latency

Live API calls · 0.5 vCPU / 1 GB · n=3,200

Tokens     Median     Mean
< 5k       770 ms     777 ms
5k–10k     1,102 ms   1,142 ms
10k–20k    1,792 ms   1,833 ms

API

One endpoint. Pass your document and a token budget. Get back the most relevant chunks.

POST /v1/optimize
curl https://api.high-snr.com/v1/optimize \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": "your long document text...",
    "budget": { "value": 2000 },
    "context_hint": "what is the main finding?"
  }'
Response
{
  "selected_chunks": [
    "Most relevant passage from your document...",
    "Second most relevant passage..."
  ],
  "selected_chunk_indices": [2, 5]
}

document

The full text to compress. Plain string.

budget.value

Max tokens in the output. Integer token budget for the selected chunks.

context_hint

Optional query string. Biases selection toward relevant chunks.
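Putting the parameters together, a stdlib-only Python client for the call above might look like this (a sketch: the endpoint and field names are as documented, but the helper functions are our own, not an official SDK):

```python
import json
import os
import urllib.request

API_URL = "https://api.high-snr.com/v1/optimize"

def build_payload(document, budget_tokens, context_hint=None):
    """Assemble the JSON body: document, budget.value, optional context_hint."""
    payload = {"document": document, "budget": {"value": budget_tokens}}
    if context_hint is not None:
        payload["context_hint"] = context_hint
    return payload

def optimize(document, budget_tokens, context_hint=None):
    """POST the document and budget; return the parsed response containing
    selected_chunks and selected_chunk_indices."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(document, budget_tokens, context_hint)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Joining `selected_chunks` with newlines gives you the compressed context to place in your LLM prompt.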