Smaller LLM context, same answer quality.

Send a document and a token budget. Get back the highest-signal chunks: contiguous passages selected by importance, ready to feed your LLM.

✕ No AI involved · 🔒 0 data retention · = Same input → same output · ⚡ Sub-second for most docs

Where it fits

LLM calls

Compress before you send

Pass a long document and a token limit. Get back only the chunks that matter. Fewer tokens sent means lower cost, faster responses, and less hallucination from noise.

RAG Memory & Embeddings

Fewer vectors, less noise

Before embedding a large corpus, compress documents first. Fewer, higher-quality chunks mean less storage, faster retrieval, and less noise in your vector store.

After retrieval

Compress what RAG returns

Retrieved chunks often exceed your context window. HighSNR compresses candidates to fit your budget, keeping the best passages and dropping the rest.

Why not just use RAG or rerankers?

RAG and rerankers solve different problems. HighSNR works alongside both โ€” or replaces them where they're overkill.

"RAG retrieves. Rerankers reorder. We cut the noise, before or after."

Pipeline fit

1. No RAG needed

single document

Long doc → HighSNR → LLM

Contract, paper, report: no vector DB required.

2. Before RAG

fewer vectors

Corpus → HighSNR → fewer chunks → embed → Vector DB

  • Smaller index
  • Faster retrieval
  • Less noise in results
  • Lower embedding costs

3. After RAG

signal to budget

Corpus → RAG → candidates → HighSNR → LLM

Retrieved chunks exceed your context window? HighSNR compresses them to fit.
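As a sketch of this step in Python (stdlib only; the field names come from the pre-split-chunks request shown in the API section), retrieved candidates map directly onto the `chunks` input:

```python
import json

def build_optimize_request(candidates, max_output_tokens, context_hint=None):
    """Build the JSON body for POST /v1/optimize from RAG candidates.

    `candidates` are the passages your retriever returned; HighSNR selects
    the highest-signal subset that fits the token budget.
    """
    body = {"chunks": candidates, "max_output_tokens": max_output_tokens}
    if context_hint is not None:
        body["context_hint"] = context_hint
    return json.dumps(body)

# Example: compress three retrieved candidates to a 2000-token budget.
payload = build_optimize_request(
    ["Section one...", "Section two...", "Section three..."],
    max_output_tokens=2000,
    context_hint="what is the main finding?",
)
```

Send `payload` as the request body with your `Authorization: Bearer` header, exactly as in the curl examples.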

Benchmark

LongBench v1 · GPT-4o · n=200 per dataset · QA F1 score

Evaluated on one multi-hop (HotpotQA) and one single-hop (Qasper) QA dataset. Higher is better.

HotpotQA

[Chart: QA F1 score (no hint · with hint · full doc) vs. actual budget %]

At a 90% budget, the with-hint configuration scores 71.57, beating the full-context GPT-4o F1 of 69.71. The budget is accurate: an 80% target yields 79.6% actual budget used.

Qasper

[Chart: QA F1 score (no hint · with hint · full doc) vs. actual budget %]

At a 90% budget, the with-hint configuration scores 46.25, retaining 97.9% of the full-context GPT-4o F1 (47.22). Actual compression closely tracks the target.

HighSNR never cuts a chunk mid-sentence. Chunks are selected whole: if the next chunk would exceed the budget, it is skipped, so the output lands at or just below the target.
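That selection rule can be sketched as a greedy pass (illustrative only; the service's importance scoring and tokenizer are not public, so `scores` and `count_tokens` here are placeholders):

```python
def select_chunks(chunks, scores, budget, count_tokens=lambda c: len(c.split())):
    """Greedy whole-chunk selection under a token budget.

    Chunks are considered in descending importance; a chunk that would
    push the total past the budget is skipped, never truncated, so the
    output lands at or just below the target.
    """
    order = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    picked, used = [], 0
    for i in order:
        cost = count_tokens(chunks[i])
        if used + cost <= budget:
            picked.append(i)
            used += cost
    # Emit survivors in original document order.
    return [chunks[i] for i in sorted(picked)]
```

With a budget of 6 placeholder tokens and chunks costing 3, 2, 4, and 1, the 4-token chunk is skipped even though it ranks second: adding it would overshoot, so selection moves on to cheaper chunks.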

Latency

Fast enough for synchronous calls on most documents.

< 5k tokens

770 ms median

mean 777 ms

5k – 10k tokens

1,102 ms median

mean 1,142 ms

10k – 20k tokens

1,792 ms median

mean 1,833 ms

View full results and reproduction scripts on GitHub

API

One endpoint. Pass a document or pre-split chunks and a token limit. Get back only the passages that matter.

document input

POST /v1/optimize
curl https://api.high-snr.com/v1/optimize \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": "your long document text...",
    "max_output_tokens": 2000,
    "context_hint": "what is the main finding?"
  }'

pre-split chunks input

POST /v1/optimize
curl https://api.high-snr.com/v1/optimize \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "chunks": ["Section one...", "Section two...", "Section three..."],
    "max_output_tokens": 2000,
    "context_hint": "what is the main finding?"
  }'
Response
{
  "selected_chunks": [
    "Highest-signal passage from your document...",
    "Second highest-signal passage..."
  ]
}

See full parameter reference in the docs.
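The same call in Python, as a stdlib-only sketch (endpoint and fields mirror the curl examples above; the `API_KEY` environment variable and the injectable `opener` are assumptions for illustration, not part of the API):

```python
import json
import os
import urllib.request

def optimize(document, max_output_tokens, context_hint=None, api_key=None,
             endpoint="https://api.high-snr.com/v1/optimize",
             opener=urllib.request.urlopen):
    """POST a document and token budget to /v1/optimize; return selected_chunks.

    `opener` defaults to urllib's opener and is injectable for testing.
    """
    body = {"document": document, "max_output_tokens": max_output_tokens}
    if context_hint is not None:
        body["context_hint"] = context_hint
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key or os.environ.get('API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )
    with opener(req) as resp:
        return json.load(resp)["selected_chunks"]
```

Join the returned chunks with blank lines and pass them to your LLM in place of the full document.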

Try it free

Get started with a free allocation. No card required.

Free

  • 2M tokens or 14 days, whichever comes first
  • 250K tokens / day
  • No card required

Need more? hello@high-snr.com