Quickstart

Get your API key from the Console, then make your first request.

curl — v2 (recommended)

POST https://api.high-snr.com/v2/optimize

curl https://api.high-snr.com/v2/optimize \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": "Your long document text goes here...",
    "max_output_tokens": 2000,
    "context_hint": "What are the key findings?"
  }'

Python

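import os

import requests

# Read the key from the environment, matching $API_KEY in the curl example.
API_KEY = os.environ["API_KEY"]

response = requests.post(
    "https://api.high-snr.com/v2/optimize",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "document": "Your long document text goes here...",
        "max_output_tokens": 2000,
        "context_hint": "What are the key findings?",
    },
    timeout=30,  # avoid hanging indefinitely on network problems
)
response.raise_for_status()  # raise on 4xx/5xx before parsing the body
chunks = response.json()["optimized_chunks"]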

Response

{
  "optimized_chunks": [
    "First selected passage from your document...",
    "Second selected passage...",
    "Third selected passage..."
  ],
  // present when return_metadata: true
  "metadata": {
    "input_tokens": 1840,
    "output_tokens": 1200,
    "compression_ratio": 0.6522
  },
  // present when return_indices: true
  "selected_chunk_indices": [0, 2, 3],
  // present when return_discarded_chunks: true
  "discarded_chunks": ["Low-signal passage..."],
  // present when both return_indices and return_discarded_chunks: true
  "discarded_chunk_indices": [1]
}
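
When both return_indices and return_discarded_chunks are set, the four arrays can be stitched back into the original chunk order. The sketch below assumes the indices are zero-based positions in the chunk list and that each index array lines up one-to-one with its text array; the docs do not state this explicitly. rebuild_order is a hypothetical helper name, and the sample payload is hand-written, not real API output.

```python
def rebuild_order(response: dict) -> list[tuple[int, str, bool]]:
    """Return (index, text, kept) tuples sorted by original chunk position.

    Assumption: each index array pairs one-to-one with its text array.
    """
    kept = zip(response["selected_chunk_indices"], response["optimized_chunks"])
    dropped = zip(response["discarded_chunk_indices"], response["discarded_chunks"])
    rows = [(i, text, True) for i, text in kept]
    rows += [(i, text, False) for i, text in dropped]
    return sorted(rows, key=lambda row: row[0])


# Hand-written sample payload for illustration only.
sample = {
    "optimized_chunks": ["intro", "finding A", "conclusion"],
    "selected_chunk_indices": [0, 2, 3],
    "discarded_chunks": ["aside"],
    "discarded_chunk_indices": [1],
}
for index, text, was_kept in rebuild_order(sample):
    print(index, "kept" if was_kept else "dropped", text)
```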

v1 endpoint (backward compat)

POST https://api.high-snr.com/v1/optimize

curl https://api.high-snr.com/v1/optimize \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": "Your long document text goes here...",
    "max_output_tokens": 2000
  }'

v2 uses an improved selection algorithm — empirically validated to outperform v1 across all tested budgets. Both support context_hint for query-aware optimization.

Authentication

All API requests require a Bearer token in the Authorization header.

Authorization: Bearer co_...

To get an API key:

1. Sign up at console.high-snr.com
2. Go to API Keys and click Create key
3. Copy the key (it is shown only once)

Concepts

A chunk is a contiguous passage of text — typically a paragraph or a group of related sentences. When you send a document, HighSNR splits it into chunks automatically. The API selects and returns the highest-signal chunks that fit within your token budget, preserving their original order.
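
To make "highest-signal chunks that fit within your token budget, preserving their original order" concrete, here is a toy greedy version. It is not HighSNR's ranking algorithm, which these docs do not describe. The scores, the token counts, and the select_chunks helper are all invented for illustration.

```python
def select_chunks(chunks, scores, token_counts, budget):
    """Toy greedy selection: take chunks by descending score while they fit,
    then return the survivors in their original document order."""
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        if used + token_counts[i] <= budget:
            chosen.append(i)
            used += token_counts[i]
    # Sorting the chosen indices restores document order.
    return [chunks[i] for i in sorted(chosen)]


chunks = ["intro", "tangent", "key result", "conclusion"]
scores = [0.4, 0.1, 0.9, 0.7]      # hypothetical relevance scores
token_counts = [50, 80, 60, 40]    # hypothetical per-chunk token counts
selected = select_chunks(chunks, scores, token_counts, budget=120)
print(selected)  # ['key result', 'conclusion']
```

Note that "key result" scores higher than "conclusion" and also comes first in the document; the output order follows document position, not score.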

The compression ratio is output_tokens / input_tokens — the fraction of the input that was kept. A ratio of 0.8 means 80% of the input tokens were returned (20% discarded). Lower is more aggressive compression; higher retains more of the original. Available when return_metadata: true.
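
As a worked example, the metadata values shown in the Quickstart response (1840 input tokens, 1200 output tokens) give the ratio below. The four-decimal rounding is an assumption inferred from that sample response, and compression_ratio is a hypothetical helper, not part of any client library.

```python
def compression_ratio(input_tokens: int, output_tokens: int) -> float:
    """Fraction of input tokens kept, rounded to four decimals (assumed)."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    return round(output_tokens / input_tokens, 4)


ratio = compression_ratio(input_tokens=1840, output_tokens=1200)
print(ratio)  # 0.6522
print(f"{1 - ratio:.1%} of input tokens discarded")  # 34.8% of input tokens discarded
```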

Endpoint

Two versions are available, v1 and v2. v2 is recommended: it uses an empirically validated ranking algorithm that outperforms v1 across all tested budgets.

Pass a document (or pre-split chunks) and a token budget. Both versions support context_hint for query-aware optimization. v2 adds return_scores for chunk relevance scores.

Request body

field type description

document string Full text to compress. Mutually exclusive with chunks.

chunks string[] Pre-split passages to rank. Use when you already have chunked text (e.g. from a RAG pipeline). Mutually exclusive with document.

max_output_tokens integer Token budget for the response. Controls how much is returned.

context_hint string Optional. A query or topic to bias selection toward relevant chunks. Max 2,000 characters.

document_type string "unstructured" (default) or "structured". Structured mode preserves heading/section boundaries during splitting.

output_format string "chunks" (default) returns an array of selected passages in optimized_chunks. "text" joins them into a single string in optimized_text. v2 only.

include_boundaries boolean Keep the first and last chunk in the output. Useful when intro/conclusion matter (e.g. summaries). Default: true.

return_metadata boolean Return token counts and compression ratio in a metadata object. Default: false.

return_indices boolean Return selected_chunk_indices (and discarded_chunk_indices when combined with return_discarded_chunks). Default: false.

return_discarded_chunks boolean Return discarded_chunks — passages that were ranked below budget. Default: false.

return_scores boolean v2 only. Return relevance scores in chunk_scores for ranked (non-boundary) chunks. When include_boundaries is true, boundary chunks have no score entry. Default: false.
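
Two of the constraints in the table can be checked client-side before sending: document and chunks are mutually exclusive, and context_hint is capped at 2,000 characters. The sketch below uses a hypothetical build_payload helper. Requiring exactly one of document or chunks is an assumption; the table only says the two cannot be combined.

```python
def build_payload(max_output_tokens, document=None, chunks=None, context_hint=None):
    """Validate the documented constraints and return a JSON-ready dict."""
    # Assumption: exactly one of document/chunks must be supplied.
    if (document is None) == (chunks is None):
        raise ValueError("pass exactly one of document or chunks")
    if context_hint is not None and len(context_hint) > 2000:
        raise ValueError("context_hint is limited to 2,000 characters")

    payload = {"max_output_tokens": max_output_tokens}
    if document is not None:
        payload["document"] = document
    else:
        payload["chunks"] = list(chunks)
    if context_hint is not None:
        payload["context_hint"] = context_hint
    return payload


payload = build_payload(
    2000,
    chunks=["passage one", "passage two"],
    context_hint="What are the key findings?",
)
print(payload)
```

The resulting dict can be passed as json=payload to requests.post, as in the Quickstart.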

Limit — Maximum input size is ~50K tokens (250K characters). Requests exceeding this are rejected with a 413.
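
A cheap pre-flight check on character count can avoid a round trip that would end in a 413. This is a minimal sketch: the 5 characters/token figure is only what the stated limit implies (250K characters for roughly 50K tokens), the server's actual tokenizer is not documented here, and both helper names are hypothetical.

```python
MAX_CHARS = 250_000    # documented character limit
CHARS_PER_TOKEN = 5    # implied by 250K characters ~ 50K tokens; an approximation


def fits_input_limit(document: str) -> bool:
    """True if the document is within the documented character limit."""
    return len(document) <= MAX_CHARS


def estimate_tokens(document: str) -> int:
    """Rough token estimate; the server-side count may differ."""
    return len(document) // CHARS_PER_TOKEN


doc = "word " * 1000  # 5,000 characters
print(fits_input_limit(doc), estimate_tokens(doc))  # True 1000
```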

Errors

All errors follow a standard shape. The reset_at_utc field is included with 402 quota_exhausted responses:

{
  "error": {
    "code": "quota_exhausted",
    "message": "Token quota exhausted. Email hello@high-snr.com if you need more.",
    "reset_at_utc": "2026-03-17T00:00:00Z"
  }
}

401 unauthorized Missing or invalid API key.

402 quota_exhausted Token quota exhausted. Returned when:

• Daily quota used up (resets at UTC midnight)
• Free trial period expired
• Total free tokens exhausted

Response includes reset_at_utc. Contact hello@high-snr.com for more tokens.

403 forbidden Key lacks the required scope.

429 rate_limited Too many requests. Limit is 60 requests/minute per key.

413 document_too_large Input exceeds ~50K tokens (250K characters).

422 validation_error Invalid request body (missing required fields or invalid field values).
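
One way to act on these errors in client code is to map the status code to an action and parse reset_at_utc when it is present. The sketch below is an assumption about sensible client behaviour, not guidance from HighSNR: retrying on 429 and waiting for the reset on 402 are choices made here, and classify_error is a hypothetical helper.

```python
from datetime import datetime


def classify_error(status: int, body: dict) -> dict:
    """Turn an error response into a small dict describing what to do next."""
    error = body.get("error", {})
    result = {
        "code": error.get("code"),
        "message": error.get("message"),
        "action": "fail",       # default: do not retry (401, 403, 413, 422)
        "reset_at": None,
    }
    if status == 429:
        result["action"] = "retry_later"     # assumed: back off, then retry
    elif status == 402:
        result["action"] = "wait_for_reset"  # assumed: pause until quota resets
        raw = error.get("reset_at_utc")
        if raw:
            # Swap the trailing "Z" for an explicit offset so that
            # fromisoformat also works on Python versions before 3.11.
            result["reset_at"] = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    return result


sample = {
    "error": {
        "code": "quota_exhausted",
        "message": "Token quota exhausted. Email hello@high-snr.com if you need more.",
        "reset_at_utc": "2026-03-17T00:00:00Z",
    }
}
info = classify_error(402, sample)
print(info["action"], info["reset_at"])
```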

Quotas

Token usage is counted on input tokens sent to the API.

Free allocation

2M tokens or 14 days, whichever comes first. 250K tokens/day. No card required.

Daily quota

Resets at UTC midnight every day.
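
If a job needs to pause until the daily quota resets, the time remaining can be computed locally. A minimal sketch with a hypothetical next_reset helper; it assumes the reset happens exactly at 00:00:00 UTC.

```python
from datetime import datetime, timedelta, timezone


def next_reset(now: datetime) -> datetime:
    """Return the next UTC midnight strictly after `now` (timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


now = datetime(2026, 3, 16, 21, 30, tzinfo=timezone.utc)
print(next_reset(now) - now)  # 2:30:00
```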

API keys

Up to 2 keys. Manage keys from the Console.

Current usage and remaining balances are visible in the Console dashboard. Need more tokens? Email hello@high-snr.com.

Privacy & data retention

HighSNR stores zero document text. Request bodies are never logged, stored, or used for any purpose beyond producing the response.

Only counters and billing metadata are persisted (request count, token counts, timestamps). No content leaves the request/response cycle.

Benchmarks

Evaluated on LongBench v1 using Claude Sonnet 4.5 across HotpotQA (multi-hop QA with distractors) and Qasper (dense academic papers), 200 samples each. All benchmarks are fully reproducible — scripts, data, and results are published on GitHub.

HotpotQA · 50-60% budget · with hint

F1 67-69 vs full-doc baseline 66.26

Exceeds the full-context baseline at roughly half the tokens

Qasper · 50% budget · with hint

F1 48.98 vs full-doc baseline 50.69

97% of the full-context F1 on dense papers at half the tokens

Budget guidance: 40-60% for RAG and multi-document QA (distractor-heavy); 70-90% for dense single documents where most content is relevant.
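
The guidance above can be turned into a starting value for max_output_tokens. The sketch below picks the midpoint of each range, which is an arbitrary choice made here rather than a recommendation from the benchmarks; the workload labels and the suggest_max_output_tokens helper are hypothetical.

```python
# Budget ranges from the guidance, as whole percentages of input tokens.
BUDGET_RANGES = {
    "rag": (40, 60),     # RAG and multi-document QA
    "dense": (70, 90),   # dense single documents
}


def suggest_max_output_tokens(input_tokens: int, workload: str) -> int:
    """Midpoint of the documented range, using integer math to avoid float drift."""
    low, high = BUDGET_RANGES[workload]
    return input_tokens * (low + high) // 200


print(suggest_max_output_tokens(4000, "rag"))    # 2000
print(suggest_max_output_tokens(4000, "dense"))  # 3200
```

Treat the result as a first guess and adjust after checking answer quality on your own documents.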

View full results and reproduction scripts on GitHub

Integrations

Use HighSNR directly from your existing LLM stack.

LangChain

Available

HighSNRDocumentCompressor and HighSNRDocumentTransformer — drop HighSNR into any LangChain pipeline with two lines of code.

pip install langchain-highsnr | Source

Official LangChain docs listing under review.

LlamaIndex

Coming soon

Native TransformComponent and BaseNodePostprocessor for LlamaIndex pipelines. Examples coming soon.

REST API

Works with any HTTP client. See the quickstart above.