Send a document and a token budget. Get back the highest-signal chunks: contiguous passages selected by importance, ready to feed your LLM.
Pass a long document and a token limit. Get back only the chunks that matter. Fewer tokens sent means lower cost, faster responses, and less hallucination from noise.
Before embedding a large corpus, compress documents first. Fewer, higher-quality chunks mean less storage, faster retrieval, and less noise in your vector store.
Retrieved chunks often exceed your context window. HighSNR compresses candidates to fit your budget, keeping the best passages and dropping the rest.
RAG and rerankers solve different problems. HighSNR works alongside both, or replaces them where they're overkill.
"RAG retrieves. Rerankers reorder. We cut the noise, before or after."
Pipeline fit
1. No RAG needed
single document
Long doc → HighSNR → LLM
Contract, paper, report: no vector DB required.
2. Before RAG
fewer vectors
Corpus → HighSNR → fewer chunks → embed → Vector DB
3. After RAG
signal to budget
Corpus → RAG → candidates → HighSNR → LLM
Retrieved chunks exceed your context window? HighSNR compresses them to fit.
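The "After RAG" pattern is plain wiring: retrieve generously, compress to budget, then call the model. In this sketch, `retrieve`, `highsnr_optimize`, and `ask_llm` are hypothetical stand-ins for your retriever, a call to the optimize endpoint with pre-split chunks, and your LLM client; only the parameter names (`chunks`, `max_output_tokens`, `context_hint`) come from the API documented on this page.

```python
def answer(question, retrieve, highsnr_optimize, ask_llm, budget=2000):
    # 1. Retrieve more candidates than the context window can hold.
    candidates = retrieve(question, top_k=50)
    # 2. Compress the candidates to the token budget, keeping whole chunks.
    kept = highsnr_optimize(
        chunks=candidates,
        max_output_tokens=budget,
        context_hint=question,
    )
    # 3. Send only the kept passages to the LLM.
    context = "\n\n".join(kept)
    return ask_llm(f"{context}\n\nQuestion: {question}")
```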
LongBench v1 · GPT-4o · n=200 per dataset · QA F1 score
Evaluated on one multi-hop and one single-hop QA dataset. Higher is better.
HotpotQA
QA F1 score (no hint · with hint · full doc) vs. actual budget %
At a 90% budget, with hint scores 71.57, beating the full-context GPT-4o F1 of 69.71. The budget is accurate: 80% target → 79.6% actual budget used.
Qasper
QA F1 score (no hint · with hint · full doc) vs. actual budget %
At a 90% budget, with hint scores 46.25, retaining 97.9% of the full-context GPT-4o F1 (47.22). Actual compression closely tracks the target.
HighSNR never cuts a chunk mid-sentence. Chunks are selected whole: if the next chunk would exceed the budget it is skipped, so the output lands at or just below the target.
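The whole-chunk budget rule can be sketched as a greedy fill over importance-ranked chunks. The word-count token proxy and the `(score, text)` input format here are illustrative stand-ins, not HighSNR's actual tokenizer or scorer.

```python
def select_chunks(scored_chunks, budget):
    """scored_chunks: list of (score, chunk_text) pairs; budget in tokens."""
    used = 0
    selected = []
    # Consider chunks from most to least important.
    for score, chunk in sorted(scored_chunks, key=lambda p: p[0], reverse=True):
        cost = len(chunk.split())  # crude token proxy for illustration
        if used + cost <= budget:
            selected.append(chunk)
            used += cost
        # Otherwise skip the whole chunk: never cut mid-sentence, so the
        # output lands at or just below the target budget.
    return selected
```

For example, with a 4-token budget a 5-token chunk is skipped entirely rather than truncated, and a smaller, lower-ranked chunk can still be selected after it.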
Latency
Fast enough for synchronous calls on most documents.
< 5k tokens
770 ms median
mean 777 ms
5k–10k tokens
1,102 ms median
mean 1,142 ms
10k–20k tokens
1,792 ms median
mean 1,833 ms
One endpoint. Pass a document or pre-split chunks and a token limit. Get back only the passages that matter.
document input
curl https://api.high-snr.com/v1/optimize \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": "your long document text...",
    "max_output_tokens": 2000,
    "context_hint": "what is the main finding?"
  }'
pre-split chunks input
curl https://api.high-snr.com/v1/optimize \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "chunks": ["Section one...", "Section two...", "Section three..."],
    "max_output_tokens": 2000,
    "context_hint": "what is the main finding?"
  }'
{
  "selected_chunks": [
    "Highest-signal passage from your document...",
    "Second highest-signal passage..."
  ]
}
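A minimal Python client sketch using only the standard library. The endpoint URL, the `document`/`chunks`/`max_output_tokens`/`context_hint` fields, and the `selected_chunks` response key come from this page; the helper names and the lack of error handling are illustrative.

```python
import json
import os
import urllib.request

API_URL = "https://api.high-snr.com/v1/optimize"

def build_payload(doc_or_chunks, max_output_tokens, context_hint=None):
    # A list is sent as "chunks", a string as "document": the two
    # input modes shown above.
    key = "chunks" if isinstance(doc_or_chunks, list) else "document"
    payload = {key: doc_or_chunks, "max_output_tokens": max_output_tokens}
    if context_hint is not None:
        payload["context_hint"] = context_hint
    return payload

def optimize(doc_or_chunks, max_output_tokens, context_hint=None):
    body = json.dumps(
        build_payload(doc_or_chunks, max_output_tokens, context_hint)
    ).encode("utf-8")
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["selected_chunks"]
```

Either input mode works through the same function: `optimize(long_text, 2000, "what is the main finding?")` or `optimize(chunk_list, 2000)`.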
See full parameter reference in the docs.
Get started with a free allocation. No card required.
Free
Need more? hello@high-snr.com