Quickstart
Get your API key from the Console, then make your first request.
curl — v2 (recommended)
curl https://api.high-snr.com/v2/optimize \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"document": "Your long document text goes here...",
"max_output_tokens": 2000,
"context_hint": "What are the key findings?"
}'
Python
import os
import requests

API_KEY = os.environ["API_KEY"]  # same variable the curl example reads

response = requests.post(
    "https://api.high-snr.com/v2/optimize",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "document": "Your long document text goes here...",
        "max_output_tokens": 2000,
        "context_hint": "What are the key findings?",
    },
)
response.raise_for_status()  # surface HTTP errors before parsing the body
chunks = response.json()["optimized_chunks"]
Response
{
"optimized_chunks": [
"Highest signal passage from your document...",
"Second highest signal passage..."
],
// present when return_metadata: true
"metadata": {
"input_tokens": 1840,
"output_tokens": 1200,
"compression_ratio": 0.6522
},
// present when return_indices: true
"selected_chunk_indices": [0, 2, 3],
// present when return_discarded_chunks: true
"discarded_chunks": ["Low-signal passage..."],
// present when both return_indices and return_discarded_chunks: true
"discarded_chunk_indices": [1]
}
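Because several response fields appear only when the matching request flag is set, client code should not assume they exist. The following is a minimal sketch of reading a response defensively; the helper name summarize_response and the shape of the returned summary are illustrative assumptions, not part of the API.

```python
from typing import Any


def summarize_response(body: dict[str, Any]) -> dict[str, Any]:
    """Collect fields from an optimize response, tolerating absent optional keys."""
    # optimized_chunks is shown in every example response, so index it directly.
    summary: dict[str, Any] = {"chunks": body["optimized_chunks"]}

    # metadata is only present when the request set return_metadata: true.
    metadata = body.get("metadata")
    summary["compression_ratio"] = (
        metadata["compression_ratio"] if metadata is not None else None
    )

    # selected_chunk_indices is only present when return_indices: true.
    summary["selected_indices"] = body.get("selected_chunk_indices")

    # discarded_chunks is only present when return_discarded_chunks: true.
    summary["discarded"] = body.get("discarded_chunks", [])
    return summary
```

Using dict.get for the optional keys means the same function works whether or not the request asked for metadata, indices, or discarded chunks.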
v1 endpoint (backward compat)
curl https://api.high-snr.com/v1/optimize \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"document": "Your long document text goes here...",
"max_output_tokens": 2000
}'
v2 uses an improved selection algorithm — empirically validated to outperform v1 across all tested budgets. Both support context_hint for query-aware optimization.
Authentication
All API requests require a Bearer token in the Authorization header.
To get an API key:
- Sign up at console.high-snr.com
- Go to API Keys and click Create key
- Copy the key — it is shown only once
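Since the key is shown only once, a common pattern is to store it in an environment variable and build the headers from there. This is a sketch under that assumption; the helper name auth_headers is hypothetical, and the variable name API_KEY simply mirrors the curl example.

```python
import os


def auth_headers(env_var: str = "API_KEY") -> dict[str, str]:
    """Build request headers from an API key stored in an environment variable."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message instead of sending an empty Bearer token.
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the headers argument of any HTTP client call.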
Concepts
A chunk is a contiguous passage of text — typically a paragraph or a group of related sentences. When you send a document, HighSNR splits it into chunks automatically. The API selects and returns the highest-signal chunks that fit within your token budget, preserving their original order.
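To make "select within a budget, then preserve original order" concrete, here is a toy illustration. It is not HighSNR's ranking algorithm, which is not described here; the scores and per-chunk token counts are made-up inputs, and the greedy strategy is an assumption chosen for simplicity.

```python
def select_within_budget(token_counts, scores, budget):
    """Greedily pick the highest-scoring chunks that fit, then restore document order.

    token_counts[i] and scores[i] describe chunk i; budget is the token limit.
    Returns the indices of the kept chunks in their original order.
    """
    # Visit chunk indices from highest score to lowest.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    chosen, used = [], 0
    for i in ranked:
        # Keep a chunk only if it still fits in the remaining budget.
        if used + token_counts[i] <= budget:
            chosen.append(i)
            used += token_counts[i]

    # Sorting the indices puts the kept chunks back in document order.
    return sorted(chosen)
```

For example, with token counts [100, 300, 200, 150], scores [0.9, 0.1, 0.8, 0.7], and a budget of 450, the function keeps chunks 0, 2, and 3 and drops chunk 1.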
The compression ratio is output_tokens / input_tokens — the fraction of the input that was kept.
A ratio of 0.8 means 80% of the input tokens were returned (20% discarded).
Lower is more aggressive compression; higher retains more of the original.
Available when return_metadata: true.
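As a worked check of the definition, the sketch below recomputes the ratio from the token counts in the sample response (1840 input tokens, 1200 output tokens). Rounding to four decimal places is an assumption made to match the sample value.

```python
def compression_ratio(input_tokens: int, output_tokens: int) -> float:
    """Return output_tokens / input_tokens, rounded to four decimal places."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    return round(output_tokens / input_tokens, 4)


# Token counts from the sample response: 1200 / 1840
ratio = compression_ratio(1840, 1200)
```

Here ratio comes out to 0.6522, meaning about 65% of the input tokens were kept and about 35% were discarded.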
Endpoint
Two versions available. v2 is recommended — it uses an empirically validated ranking algorithm that outperforms v1 across all tested budgets.
Pass a document (or pre-split chunks) and a token budget. Both versions support context_hint for query-aware optimization. v2 adds return_scores for chunk relevance scores.
chunks.
document.
"unstructured" (default) or "structured". Structured mode preserves heading/section boundaries during splitting. Default: "unstructured".
"chunks" (default) returns an array of selected passages in optimized_chunks. "text" joins them into a single string in optimized_text. v2 only. Default: "chunks".
true.
return_metadata: adds the metadata object to the response. Default: false.
return_indices: adds selected_chunk_indices (and discarded_chunk_indices when combined with return_discarded_chunks). Default: false.
return_discarded_chunks: adds discarded_chunks, the passages that were ranked below budget. Default: false.
return_scores: adds chunk_scores for ranked (non-boundary) chunks. When include_boundaries is true, boundary chunks have no score entry. v2 only. Default: false.
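The optional flags can be assembled into a request body with a small helper. This is a sketch, not an official client: the function name build_optimize_payload is hypothetical, and it covers only the parameter names that appear on this page (document, max_output_tokens, context_hint, and the four return_* flags).

```python
from typing import Any, Optional

# Boolean flags named on this page; all default to false on the server side.
ALLOWED_FLAGS = {
    "return_metadata",
    "return_indices",
    "return_discarded_chunks",
    "return_scores",  # v2 only
}


def build_optimize_payload(
    document: str,
    max_output_tokens: int,
    context_hint: Optional[str] = None,
    **flags: bool,
) -> dict[str, Any]:
    """Build a JSON-serializable body for the optimize endpoint."""
    unknown = set(flags) - ALLOWED_FLAGS
    if unknown:
        raise ValueError(f"Unknown flags: {sorted(unknown)}")

    payload: dict[str, Any] = {
        "document": document,
        "max_output_tokens": max_output_tokens,
    }
    if context_hint:
        payload["context_hint"] = context_hint

    # Send only the flags that are switched on; omitted flags keep their default.
    payload.update({name: True for name, enabled in flags.items() if enabled})
    return payload
```

Rejecting unknown flag names locally catches typos such as return_metdata before a request is sent.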
Errors
All errors follow a standard shape:
{
"error": {
"code": "quota_exhausted",
"message": "Token quota exhausted. Email hello@high-snr.com if you need more.",
"reset_at_utc": "2026-03-17T00:00:00Z"
}
}
- Daily quota used up (resets at UTC midnight)
- Free trial period expired
- Total free tokens exhausted
reset_at_utc. Contact hello@high-snr.com for more tokens.
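A client can use the error shape above to decide how long to wait before retrying. The sketch below assumes that reset_at_utc, when present, is an ISO 8601 timestamp like the one in the example; the helper name seconds_until_reset is hypothetical, and the field is treated as optional because not every cause has a reset time.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def seconds_until_reset(
    error_body: dict[str, Any], now: Optional[datetime] = None
) -> Optional[float]:
    """Return seconds until the quota resets, or None if that cannot be determined."""
    error = error_body.get("error", {})
    if error.get("code") != "quota_exhausted":
        return None

    reset_at = error.get("reset_at_utc")
    if reset_at is None:
        return None

    # Swap the trailing "Z" for an explicit offset so fromisoformat also
    # works on Python versions older than 3.11.
    reset_time = datetime.fromisoformat(reset_at.replace("Z", "+00:00"))
    current = now if now is not None else datetime.now(timezone.utc)

    # Never return a negative wait if the reset time has already passed.
    return max(0.0, (reset_time - current).total_seconds())
```

Passing now explicitly makes the function easy to test; in production code it can be left out so the current UTC time is used.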
Quotas
Token usage is counted on input tokens sent to the API.
Free allocation
2M tokens or 14 days, whichever comes first. 250K tokens/day. No card required.
Daily quota
Resets at UTC midnight every day.
API keys
Up to 2 keys. Manage keys from the Console.
Current usage and remaining balances are visible in the Console dashboard. Need more tokens? Email hello@high-snr.com.
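The interaction between the three limits is easier to see with a little arithmetic. This sketch reads "2M" as 2,000,000 tokens and "250K" as 250,000 tokens, and assumes steady daily usage; the function name is hypothetical.

```python
import math

TOTAL_FREE_TOKENS = 2_000_000  # "2M tokens"
DAILY_TOKEN_LIMIT = 250_000    # "250K tokens/day"
TRIAL_DAYS = 14                # "14 days"


def days_until_free_allocation_ends(tokens_per_day: int) -> int:
    """Estimate how many days the free allocation lasts at a steady daily usage."""
    if tokens_per_day <= 0:
        raise ValueError("tokens_per_day must be positive")

    # Usage above the daily quota is not possible, so cap it.
    effective_daily = min(tokens_per_day, DAILY_TOKEN_LIMIT)

    # Days needed to spend the total allocation, rounded up to whole days.
    days_to_spend_total = math.ceil(TOTAL_FREE_TOKENS / effective_daily)

    # The allocation ends at whichever limit is reached first.
    return min(days_to_spend_total, TRIAL_DAYS)
```

At the full daily quota the total allocation runs out after 8 days (2,000,000 / 250,000); at 100,000 tokens per day the 14-day limit is reached first.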
Privacy & data retention
HighSNR stores zero document text. Request bodies are never logged, stored, or used for any purpose beyond producing the response.
Only counters and billing metadata are persisted (request count, token counts, timestamps). No content leaves the request/response cycle.
Benchmarks
Evaluated on LongBench v1 using Claude Sonnet 4.5 across HotpotQA (multi-hop QA with distractors) and Qasper (dense academic papers), 200 samples each. All benchmarks are fully reproducible — scripts, data, and results are published on GitHub.
HotpotQA · 50-60% budget · with hint
F1 67-69 vs full-doc baseline 66.26
Exceeds full-context at half the tokens
Qasper · 50% budget · with hint
F1 48.98 vs full-doc baseline 50.69
97% of full-context on dense papers at half the tokens
Budget guidance: 40-60% for RAG and multi-document QA (distractor-heavy); 70-90% for dense single documents where most content is relevant.
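The budget percentages can be turned into a max_output_tokens value once the document's input token count is known or estimated. The sketch below assumes the budget is a fraction of input tokens; the function name is hypothetical, and how the input tokens are counted is left to the caller.

```python
def max_output_tokens_for_budget(input_tokens: int, budget_fraction: float) -> int:
    """Convert a budget fraction (for example 0.5 for 50%) into max_output_tokens."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    if not 0 < budget_fraction <= 1:
        raise ValueError("budget_fraction must be in the range (0, 1]")

    # Truncate to a whole number of tokens, but never request fewer than one.
    return max(1, int(input_tokens * budget_fraction))
```

For a document of 1840 input tokens, a 50% budget gives max_output_tokens of 920.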
View full results and reproduction scripts on GitHub
Integrations
Use HighSNR directly from your existing LLM stack.
LangChain
Available
HighSNRDocumentCompressor and HighSNRDocumentTransformer: drop HighSNR into any LangChain pipeline with two lines of code.
Official LangChain docs listing under review.
LlamaIndex
Coming soon
Native TransformComponent and BaseNodePostprocessor for LlamaIndex pipelines. Examples coming soon.
REST API
Works with any HTTP client. See the quickstart above.