Context optimization for LLM & RAG pipelines
A million-token window doesn't make context free, and piling on more of it stops helping fast, then starts to hurt. HighSNR sends your model only the high-signal passages: fewer tokens in, the same answer out. On multi-hop QA, a better one.
A deterministic filter that sits in front of your model. Drop it in anywhere context gets too long or too noisy: no vector DB, no extra model call.
Pass a long document and a token limit. Get back only the chunks that matter. Fewer tokens sent means lower cost, faster responses, less hallucination from noise.
Before embedding a large corpus, compress documents first. Fewer, higher-quality chunks mean less storage, faster retrieval, and less noise in your vector store.
Retrieved chunks often exceed your context window. HighSNR compresses candidates to fit your budget, keeping the best passages and dropping the rest.
RAG and rerankers solve different problems. HighSNR works alongside both, or replaces them where they're overkill.
"RAG retrieves. Rerankers reorder. We cut the noise, before or after."
Pipeline fit
1. No RAG needed
single document
Long doc → HighSNR → LLM
Contract, paper, report: no vector DB required.
2. Before RAG
fewer vectors
Corpus → HighSNR → fewer chunks → embed → Vector DB
3. After RAG
signal to budget
Corpus → RAG → candidates → HighSNR → LLM
Retrieved chunks exceed your context window? HighSNR compresses them to fit.
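For the "After RAG" shape, the retrieved candidates can go straight into the pre-split `chunks` input. The sketch below only builds the request body. The field names come from the API examples on this page; the helper name `build_payload` and the choice to drop empty candidates are illustrative assumptions, not part of the HighSNR API.

```python
import json
from typing import Optional, Sequence


def build_payload(
    candidates: Sequence[str],
    max_output_tokens: int,
    context_hint: Optional[str] = None,
) -> str:
    """Serialize retrieved candidates into a JSON body for /v2/optimize.

    Hypothetical helper: only the JSON field names are taken from the docs.
    """
    if max_output_tokens <= 0:
        raise ValueError("max_output_tokens must be positive")
    # Assumption: whitespace-only candidates carry no signal, so skip them.
    chunks = [c for c in candidates if c.strip()]
    body = {"chunks": chunks, "max_output_tokens": max_output_tokens}
    if context_hint:
        # The hint is optional; passing the user's question is the obvious choice.
        body["context_hint"] = context_hint
    return json.dumps(body)
```

In a RAG pipeline the user's question is usually available at this point, so passing it as `context_hint` costs nothing extra.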
LongBench v1 · Claude Sonnet 4.5 · n=200 per dataset · QA F1 score
Evaluated across two QA datasets. X-axis: share of the document kept (token budget). Y-axis: QA F1 score, higher is better. Baseline is random chunk selection at the same budget.
HotpotQA: multi-hop QA with distractors
Qasper: dense academic papers
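To make the baseline concrete, here is a minimal sketch of random chunk selection at a fixed share of the document. It is not the benchmark's actual harness: the whitespace token count, the function names, and the skip-on-overflow rule are all assumptions made for illustration.

```python
import random
from typing import List, Sequence


def count_tokens(text: str) -> int:
    # Assumption: whitespace splitting stands in for the real tokenizer,
    # which the benchmark description does not specify.
    return len(text.split())


def random_baseline(chunks: Sequence[str], keep_share: float, seed: int = 0) -> List[str]:
    """Keep randomly chosen chunks until the token budget is used up."""
    if not 0.0 < keep_share <= 1.0:
        raise ValueError("keep_share must be in (0, 1]")
    budget = keep_share * sum(count_tokens(c) for c in chunks)
    order = list(range(len(chunks)))
    random.Random(seed).shuffle(order)  # seeded so runs are repeatable
    kept, used = [], 0
    for i in order:
        cost = count_tokens(chunks[i])
        if used + cost > budget:
            continue  # this chunk would overflow the budget; try the next one
        kept.append(i)
        used += cost
    # Return the survivors in their original document order.
    return [chunks[i] for i in sorted(kept)]
```

Any ranking method can then be compared against this baseline at the same `keep_share`, which is what the x-axis measures.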
The counterintuitive part: at the right budget, less context answers as well as the full document, sometimes better.
With a hint, HighSNR at a 50-60% budget scores 67-69 F1 on HotpotQA, exceeding the full document (66.26) with roughly half the tokens. On dense academic papers (Qasper), the hinted run reaches 48.98 F1 at a 50% budget, about 97% of the full-document score (50.69) using half the tokens.
Without a hint at low budgets (10-30%), ranking performance converges with random: there is not enough text to distinguish signal from noise without query context. Providing a context_hint lifts F1 by 10-15 points at every budget level.
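The "share of full-document score" figures are plain ratios of the F1 numbers quoted above. A small sketch of the arithmetic, where `relative_score` is an illustrative helper name:

```python
def relative_score(score: float, full_doc_score: float) -> float:
    """Return a score as a percentage of the full-document score."""
    return 100.0 * score / full_doc_score


# Qasper, hinted, 50% budget vs. full document.
qasper = relative_score(48.98, 50.69)
print(round(qasper, 1))  # 96.6, which rounds to the quoted 97%

# HotpotQA, hinted, low end of the 67-69 range vs. full document.
hotpot = relative_score(67.0, 66.26)
print(hotpot > 100.0)  # True: above the full-document score
```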
Latency
Fast enough for synchronous calls on most documents.
< 5k tokens
770 ms median
mean 777 ms
5k-10k tokens
1,102 ms median
mean 1,142 ms
10k-20k tokens
1,792 ms median
mean 1,833 ms
One POST. Send your text, a token budget, and an optional context_hint. Get back only the passages that matter. No SDK required.
v2 · document input (recommended)
curl https://api.high-snr.com/v2/optimize \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"document": "your long document text...",
"max_output_tokens": 2000,
"context_hint": "what is the main finding?"
}'
v2 · pre-split chunks input
curl https://api.high-snr.com/v2/optimize \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"chunks": ["Section one...", "Section two...", "Section three..."],
"max_output_tokens": 2000,
"context_hint": "what is the main finding?"
}'
{
"optimized_chunks": [
"Highest-signal passage from your document...",
"Second highest-signal passage..."
]
}
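The same document-input call can be made from Python with only the standard library. This is a minimal sketch rather than an official client: the URL, headers, and JSON fields mirror the curl examples above, while the function names, the `API_KEY` environment variable lookup, and the 10-second timeout are assumptions.

```python
import json
import os
import urllib.request
from typing import List, Optional

API_URL = "https://api.high-snr.com/v2/optimize"


def build_request(
    document: str,
    max_output_tokens: int,
    api_key: str,
    context_hint: Optional[str] = None,
) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = {"document": document, "max_output_tokens": max_output_tokens}
    if context_hint is not None:
        body["context_hint"] = context_hint
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_response(raw: bytes) -> List[str]:
    """Pull the passages out of the response shape shown above."""
    return json.loads(raw)["optimized_chunks"]


def optimize(
    document: str,
    max_output_tokens: int,
    context_hint: Optional[str] = None,
    timeout: float = 10.0,  # assumption: generous next to the quoted latencies
) -> List[str]:
    """Send the request; reads the key from the API_KEY environment variable."""
    req = build_request(document, max_output_tokens, os.environ["API_KEY"], context_hint)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_response(resp.read())
```

Keeping request building and response parsing separate from the network call makes both easy to unit-test without an API key.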
See full parameter reference in the docs.
Sign up, grab a key, make your first call in minutes. No card, no sales call.
Free
Need more? hello@high-snr.com