Context Optimization API
HighSNR compresses long documents to fit your token budget before they reach your LLM, cutting costs without sacrificing answer quality.
Pass a long document and a token budget. Get back only the chunks that matter. Fewer tokens sent means lower cost, faster responses, and less hallucination from noise.
Before embedding a large corpus, compress documents first. Fewer, higher-quality chunks mean less storage, faster retrieval, and less noise in your vector store.
No model, no black box, no randomness. You know exactly what it does.
Same document, same budget, same output. Every time. No model drift, no surprises in production.
Your documents are never stored, logged, or used for training. Only counters and metadata are kept.
Sub-second for most documents. No local model download, no GPU required.
LongBench v1 · GPT-4o · n=200 per dataset · QA F1 score
Evaluated on two QA datasets, covering multi-hop (HotpotQA) and single-document scientific (Qasper) question answering. Higher is better.
HotpotQA
| Config | 50% | 60% | 70% | 80% | Full doc |
|---|---|---|---|---|---|
| No hint | 65.29 | 66.34 | 68.08 | 70.70 | – |
| With query hint | 67.28 | 68.02 | 69.95 | 70.96 | – |
| Full context (baseline) | – | – | – | – | 69.71 |
At 80% budget, HighSNR beats full-context F1.
Actual token ratio (output / input): HotpotQA
| Target | Mean | Median | Min | Max |
|---|---|---|---|---|
| 50% | 55.9% | 55.4% | 41.6% | 71.7% |
| 60% | 67.9% | 67.3% | 55.0% | 83.9% |
| 70% | 79.8% | 79.1% | 69.5% | 99.9% |
| 80% | 91.4% | 90.8% | 81.1% | 100.0% |
Qasper
| Config | 50% | 60% | 70% | 80% | Full doc |
|---|---|---|---|---|---|
| No hint | 35.51 | 38.16 | 41.36 | 45.37 | – |
| With query hint | 39.87 | 40.76 | 42.97 | 45.21 | – |
| Full context (baseline) | – | – | – | – | 47.22 |
At 80% budget, HighSNR retains 96% of full-context F1 on scientific QA.
Actual token ratio (output / input): Qasper
| Target | Mean | Median | Min | Max |
|---|---|---|---|---|
| 50% | 54.7% | 54.4% | 37.5% | 69.5% |
| 60% | 66.4% | 66.2% | 47.3% | 79.7% |
| 70% | 78.0% | 77.6% | 69.2% | 92.0% |
| 80% | 89.9% | 89.5% | 79.4% | 100.0% |
Actual ratios can exceed the target because HighSNR never cuts a chunk mid-sentence. Chunks are selected whole: if the next chunk would exceed the budget, it is skipped, so the output lands just below (not at) the budget. Short documents, where a single chunk spans the full budget, pull the mean slightly above the target percentage.
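The whole-chunk selection described above can be sketched as a greedy loop. This is illustrative only: the relevance scores and per-chunk token counts are stand-ins, not HighSNR's actual scoring or tokenizer.

```python
def select_chunks(chunks, token_counts, scores, budget):
    """Greedily pick the highest-scoring chunks whole.

    A chunk that would push total tokens past the budget is skipped
    entirely (never truncated), so the output lands just below the
    budget rather than exactly at it.
    """
    order = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    picked, used = [], 0
    for i in order:
        if used + token_counts[i] <= budget:
            picked.append(i)
            used += token_counts[i]
    picked.sort()  # restore original document order
    return picked, used

# Toy example: chunk C (700 tokens) would overshoot a 1,500-token
# budget after A and B are taken, so it is skipped whole.
chunks = ["A", "B", "C", "D"]
tokens = [400, 900, 700, 300]
scores = [0.9, 0.8, 0.7, 0.6]
indices, used = select_chunks(chunks, tokens, scores, budget=1500)
# indices == [0, 1], used == 1300 (87% of the 1,500 budget)
```

This skipping behavior is why the measured ratios in the tables above cluster near, but not exactly at, each target.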
Latency
Live API calls · 0.5 vCPU / 1 GB · n=3,200
| Document size | Median | Mean |
|---|---|---|
| < 5k tokens | 770 ms | 777 ms |
| 5k – 10k tokens | 1,102 ms | 1,142 ms |
| 10k – 20k tokens | 1,792 ms | 1,833 ms |
One endpoint. Pass your document and a token budget. Get back the most relevant chunks.
```shell
curl https://api.high-snr.com/v1/optimize \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": "your long document text...",
    "budget": { "value": 2000 },
    "context_hint": "what is the main finding?"
  }'
```
Response:

```json
{
  "selected_chunks": [
    "Most relevant passage from your document...",
    "Second most relevant passage..."
  ],
  "selected_chunk_indices": [2, 5]
}
```
| Field | Description |
|---|---|
| `document` | The full text to compress. Plain string. |
| `budget.value` | Maximum tokens in the output. Integer token budget for the selected chunks. |
| `context_hint` | Optional query string. Biases selection toward relevant chunks. |