FlatSearch API — Documentation

The research endpoint for AI agents. One API call replaces your scraping loop, content cleaning, and synthesis prompt.

What This Is (And Why It Exists)

FlatSearch is a managed research endpoint for AI agents and chatbots that need live internet data without building and maintaining a scraping pipeline.

We built this infrastructure for our own production applications. Those applications use roughly 40% of its capacity, so we opened the remainder to developers as a clean API. The pricing reflects our actual cost basis, not a margin-maximised SaaS model.

What you get per query:

A live web search across multiple sources
Noise filtering and boilerplate stripping
A fully synthesised markdown answer with cited sources
A structured JSON response you can drop straight into your agent's context window or return to users

No scraping logic. No token math. No citation fees. One flat rate, billed per thousand queries.

The Best of Both Worlds

You aren't locked into our synthesis. Every API response includes both the finished markdown answer and the raw sources array (containing the top 15-25 URLs and text snippets). If your agent just wants to scrape the top links directly—exactly like Serper or Exa—you can simply ignore the answer string and use the raw sources.
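For agents that only want the links, the sources array can be consumed on its own. The following is a minimal Python sketch, assuming the response has already been decoded into a dict with the `sources`, `url` and `answer` fields shown under Response Body. `source_urls` is a hypothetical helper name, not part of the API, and the de-duplication is purely defensive since the documentation does not say whether duplicates can occur.

```python
def source_urls(response: dict) -> list[str]:
    """Return the unique source URLs from a decoded response, in order.

    The synthesised `answer` field is deliberately ignored.
    """
    seen = set()
    urls = []
    for source in response.get("sources", []):
        url = source.get("url")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical, trimmed-down response used only for illustration.
example = {
    "answer": "ignored",
    "sources": [
        {"title": "A", "url": "https://example.com/a", "snippet": "..."},
        {"title": "B", "url": "https://example.com/b", "snippet": "..."},
        {"title": "A again", "url": "https://example.com/a", "snippet": "..."},
    ],
}
print(source_urls(example))
```

The returned list can then be handed to whatever fetcher your agent already uses.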

Base URL

https://flatsearchapi.com

Authentication

All requests require an API key in the header:

X-API-Key: sp_live_YOUR_KEY

Use sp_test_... keys during development. Test keys do not consume quota.
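A small Python sketch of how a client might build these headers. The helper names `auth_headers` and `is_test_key` are hypothetical, and the prefix check simply mirrors the two key formats mentioned above; the `Content-Type` header is taken from the cURL examples later in this document.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Build request headers, rejecting keys without a documented prefix."""
    if not api_key.startswith(("sp_live_", "sp_test_")):
        raise ValueError("expected a key starting with sp_live_ or sp_test_")
    return {"X-API-Key": api_key, "Content-Type": "application/json"}


def is_test_key(api_key: str) -> bool:
    """Test keys do not consume quota, so they suit development runs."""
    return api_key.startswith("sp_test_")


headers = auth_headers("sp_test_123456789")
print(headers["X-API-Key"], is_test_key("sp_test_123456789"))
```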

Fast Search — `/v1/search/fast`

$1.00 per 1,000 queries

Single-pass research and synthesis. Runs parallel source checks, strips low-quality content, and returns a clean markdown answer, typically in under 2.5 seconds.

Use this when: you're building a chatbot, autocomplete layer, customer support bot, or any agent that needs fast web context without managing a scraping pipeline.

Typical latency	1.2s – 2.5s
Sources checked	3 concurrent engines, 5 snippets each
Output	Single-pass synthesised markdown + sources array

Deep Search — `/v1/search/deep`

$2.50 per 1,000 queries

Multi-pass research loop. Generates an initial answer, critiques it internally for gaps, runs targeted follow-up searches, and cross-references before returning. Built for agents where being wrong costs more than being slow.

Use this when: you're building autonomous agents, competitive analysis tools, due diligence workflows, or anything where answer quality is the product.

Typical latency	15s – 25s
Sources checked	5 initial engines + secondary targeted refetches
Output	Multi-perspective synthesised markdown + full citation set

Request & Response Schema

Request Body

Both endpoints take the same request body:

{
  "query": "your search query string"
}

That's it. No pagination parameters, no source filters, no token budget settings. We handle the internals.
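As a sketch of what this looks like from Python's standard library, the function below prepares a request for either endpoint without sending it. `build_search_request` is a hypothetical name; the POST method is inferred from the cURL examples (which use `-d`), and the local non-empty check mirrors the validation rule in the Error Reference.

```python
import json
import urllib.request

BASE_URL = "https://flatsearchapi.com"
ENDPOINTS = {"fast": "/v1/search/fast", "deep": "/v1/search/deep"}


def build_search_request(tier: str, query: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for the given tier."""
    if tier not in ENDPOINTS:
        raise ValueError(f"unknown tier: {tier!r}")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ENDPOINTS[tier],
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("fast", "latest stable Python release", "sp_test_123456789")
print(request.full_url, request.get_method())
# Sending it would look like: urllib.request.urlopen(request, timeout=30)
# (the 30-second timeout is an assumption, chosen to cover Deep Search latency).
```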

Response Body

{
  "request_id": "sp_req_xxxx",
  "query": "your original query string",
  "answer": "Synthesised markdown answer...",
  "sources": [
    {
      "title": "Page Title",
      "url": "https://example.com/page",
      "snippet": "Sanitised text snippet from this source..."
    }
  ],
  "search_queries_generated": ["Query variant 1", "Query variant 2"],
  "response_time_seconds": 1.45,
  "tier": "fast"
}

The fields that matter for most integrations:

answer — your synthesised result, ready to inject into an LLM context window or return directly to a user.
sources — cite these or pass them downstream; already cleaned and formatted.
request_id — use this when contacting support about a specific response.
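One way to work with these three fields in Python is sketched below. The `Source` and `SearchResult` classes and the `to_context` layout are illustrative assumptions, not an official client; the sample body is the schema example from above.

```python
import json
from dataclasses import dataclass


@dataclass
class Source:
    title: str
    url: str
    snippet: str


@dataclass
class SearchResult:
    request_id: str
    answer: str
    sources: list[Source]


def parse_result(raw: str) -> SearchResult:
    """Decode a response body, keeping only the three fields listed above."""
    data = json.loads(raw)
    return SearchResult(
        request_id=data["request_id"],
        answer=data["answer"],
        sources=[Source(s["title"], s["url"], s["snippet"]) for s in data.get("sources", [])],
    )


def to_context(result: SearchResult) -> str:
    """Render the answer plus a numbered source list for an LLM prompt."""
    lines = [result.answer, "", "Sources:"]
    lines += [f"[{i}] {s.title} - {s.url}" for i, s in enumerate(result.sources, start=1)]
    return "\n".join(lines)


raw_body = """{
  "request_id": "sp_req_xxxx",
  "query": "your original query string",
  "answer": "Synthesised markdown answer...",
  "sources": [
    {"title": "Page Title", "url": "https://example.com/page",
     "snippet": "Sanitised text snippet from this source..."}
  ],
  "search_queries_generated": ["Query variant 1", "Query variant 2"],
  "response_time_seconds": 1.45,
  "tier": "fast"
}"""
result = parse_result(raw_body)
print(to_context(result))
```

Keeping `request_id` on the parsed object makes it easy to log alongside any answer you later need to raise with support.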

Real Integration Examples

Example 1: Technical Debugging Query (Fast Search)

This is what a real request and response look like for a specific, complex technical question.

Request (cURL)

curl https://flatsearchapi.com/v1/search/fast \
  -H "X-API-Key: sp_test_123456789" \
  -H "Content-Type: application/json" \
  -d '{"query": "Next.js 15 App Router authentication token verification error with Iron Session on Vercel deployment with custom subdomains, showing ERR_TOO_MANY_REDIRECTS"}'

Response (JSON)

{
  "request_id": "sp_req_tech_9876",
  "query": "Next.js 15 App Router authentication token verification error with Iron Session on Vercel deployment with custom subdomains, showing ERR_TOO_MANY_REDIRECTS",
  "answer": "The ERR_TOO_MANY_REDIRECTS error occurs because cookie storage defaults to the parent domain, causing session mismatch across subdomains. Fix:\n\n1. Set `cookieOptions.domain = '.yourdomain.com'`\n2. Configure secure headers in middleware.js\n3. Use route-specific checks instead of wildcards\n\n```javascript\nexport const sessionOptions = {\n  password: process.env.SECRET_COOKIE_PASSWORD,\n  cookieName: 'myapp_session',\n  cookieOptions: {\n    secure: process.env.NODE_ENV === 'production',\n    domain: '.yourdomain.com',\n    path: '/'\n  }\n};\n```",
  "sources": [
    {
      "title": "Iron Session cookie options configuration issues",
      "url": "https://github.com/vvo/iron-session/issues/521",
      "snippet": "Adding domain: .domain.com resolved redirect loops across subdomains on Vercel deployments."
    }
  ],
  "search_queries_generated": [
    "Next.js 15 iron session cookie domain subdomain vercel redirects",
    "iron session ERR_TOO_MANY_REDIRECTS v15 App Router"
  ],
  "response_time_seconds": 2.12,
  "tier": "fast"
}

Note the specificity of the answer. This isn't a web link dump — it's a synthesised, actionable response you can pass directly into an agent context or return to a user.

Example 2: Comparative Research Query (Deep Search)

Request (cURL)

curl https://flatsearchapi.com/v1/search/deep \
  -H "X-API-Key: sp_test_123456789" \
  -H "Content-Type: application/json" \
  -d '{"query": "Compare Blackwell B200 vs Nvidia H200 GPU architecture, focusing on memory bandwidth, FP8 TFLOPS performance, interconnect bandwidth, and availability timeline 2025/2026"}'

Response (JSON)

{
  "request_id": "sp_req_deep_1122",
  "query": "Compare Blackwell B200 vs Nvidia H200 GPU architecture, focusing on memory bandwidth, FP8 TFLOPS performance, interconnect bandwidth, and availability timeline 2025/2026",
  "answer": "### Memory Bandwidth\n- H200: 141GB HBM3e at 4.8 TB/s\n- B200: 192GB HBM3e at 8.0 TB/s (~66% increase)\n\n### FP8 Performance\n- H200: 1,979 TFLOPS\n- B200: 9,000 TFLOPS\n\n### Interconnect\n- H200: NVLink 4 at 900 GB/s\n- B200: NVLink 5 at 1.8 TB/s\n\n### Availability\n- H200: Broadly available since mid-2024\n- B200: Full production ramp throughout 2025–2026",
  "sources": [
    {
      "title": "NVIDIA Blackwell Architecture Technical Brief",
      "url": "https://www.nvidia.com/en-us/data-center/blackwell/",
      "snippet": "Blackwell delivers 20 petaflops of AI performance using new FP4/FP8 formats."
    }
  ],
  "response_time_seconds": 18.74,
  "tier": "deep"
}

The deep endpoint ran follow-up searches to cross-reference the availability timeline before returning. That accounts for the roughly 16 extra seconds compared with the Fast Search example (18.74s versus 2.12s).

Error Reference

Errors return standard JSON with an appropriate HTTP status code.

{
  "error": "ERROR_CODE",
  "message": "Human-readable description",
  "request_id": "sp_req_xxxx"
}

Code	HTTP Status	What It Means	What To Do
`INVALID_API_KEY`	401	Missing or invalid API key	Check your `X-API-Key` header. Use `sp_test_...` for development.
`SEARCH_FAILED`	503	No results retrieved for this query	Retry with exponential backoff. Persistent failures → check status page.
`LLM_FAILED`	502	Synthesis engine failed to respond	Retry. If persistent across multiple requests, contact support.
`HTTP_ERROR`	4xx	Client-side or validation error	Check request schema — query field must be a non-empty string.

On retries: both 502 and 503 are transient. A simple exponential backoff (1s, 2s, 4s) resolves the vast majority of these in practice.
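The retry guidance above can be sketched in Python as follows. `call_with_backoff` and its `send` callable are hypothetical: `send` stands in for whatever transport you use and is assumed to return the HTTP status together with the decoded JSON body. Treating any 2xx status as success is also an assumption.

```python
import time
from typing import Callable

TRANSIENT_STATUSES = {502, 503}   # LLM_FAILED and SEARCH_FAILED in the table above
BACKOFF_DELAYS = (1.0, 2.0, 4.0)  # seconds, as suggested above


def call_with_backoff(
    send: Callable[[], tuple[int, dict]],
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `send` until it succeeds, retrying only transient failures."""
    for delay in (*BACKOFF_DELAYS, None):
        status, body = send()
        if 200 <= status < 300:
            return body
        if status not in TRANSIENT_STATUSES or delay is None:
            # 4xx errors will not succeed on retry; also stop after the last delay.
            raise RuntimeError(
                f"{body.get('error')}: {body.get('message')} ({body.get('request_id')})"
            )
        sleep(delay)
    raise AssertionError("unreachable")


# Simulated transport: two transient failures, then success.
responses = iter([
    (503, {"error": "SEARCH_FAILED", "message": "...", "request_id": "sp_req_xxxx"}),
    (502, {"error": "LLM_FAILED", "message": "...", "request_id": "sp_req_xxxx"}),
    (200, {"answer": "ok"}),
])
waits = []
result = call_with_backoff(lambda: next(responses), sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter keeps the function testable without real delays; in production the default `time.sleep` applies.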

Billing & Overages

You are billed per thousand successful queries at the flat rate for your endpoint tier. Requests that return a 4xx or 5xx error are not counted against your quota.
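As a worked example of the arithmetic, the Python sketch below estimates a bill from counts of successful queries. It assumes partial thousands are billed proportionally, which this document does not confirm; `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Flat rates in USD per 1,000 successful queries, from the endpoint sections above.
RATE_PER_THOUSAND = {"fast": Decimal("1.00"), "deep": Decimal("2.50")}


def estimate_cost(successful_queries: dict[str, int]) -> Decimal:
    """Estimate the bill for a mapping of tier -> successful query count."""
    total = Decimal("0")
    for tier, count in successful_queries.items():
        total += RATE_PER_THOUSAND[tier] * count / 1000
    return total.quantize(Decimal("0.01"))


# 12,000 Fast queries and 3,000 Deep queries; failed requests are excluded.
print(estimate_cost({"fast": 12_000, "deep": 3_000}))  # 19.50
```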

On the Startup Pro plan, overages beyond your monthly credit bucket are not blocked. They continue seamlessly and bill at standard flat rates from your account balance. You will never hit a wall mid-production.

Rate Limits by Plan

Plan	Fast Search	Deep Search
Hacker (Free)	2 req/s	2 req/s
PAYG	25 req/s	25 req/s
Startup Pro	50 req/s	50 req/s
Enterprise	Custom	Custom
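To stay under these limits on the client side, requests can be spaced out evenly. The Python sketch below is one simple approach, not a FlatSearch feature: `Pacer` and the `PLAN_LIMITS` keys are hypothetical names, and how the server responds when a limit is exceeded is not covered by this document.

```python
import time
from typing import Callable

# Requests per second by plan, from the table above (Enterprise is custom).
PLAN_LIMITS = {"hacker": 2, "payg": 25, "startup_pro": 50}


class Pacer:
    """Space out calls so that no more than `rate` start per second."""

    def __init__(
        self,
        rate: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_slot = clock()

    def wait(self) -> None:
        """Block until the next request slot is free, then reserve it."""
        now = self.clock()
        if now < self.next_slot:
            self.sleep(self.next_slot - now)
            now = self.next_slot
        self.next_slot = now + self.interval


pacer = Pacer(PLAN_LIMITS["hacker"])
for query in ["first query", "second query", "third query"]:
    pacer.wait()  # on the free plan this leaves at least 0.5s between sends
    print("would send:", query)
```

The clock and sleep functions are injectable so the pacing logic can be tested without real waiting.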

A Note on Why the Pricing Is What It Is

Developers reasonably ask: how is this cheaper than the alternatives?

We pay for our infrastructure — server clusters, proxy networks, synthesis capacity — regardless of how much our primary business uses. Right now that's around 40% of total capacity. FlatSearch is how we put the remaining 60% to work.

You're not getting a discounted product. You're getting access to production infrastructure at our cost basis. That's why the pricing is flat, why credits don't expire on PAYG, and why overages are never blocked.

If you have questions about reliability, capacity, or who's behind this before committing, our LinkedIn is linked from the main site. Message us directly and we'll respond.

Support

Discord community channel (all plans)
Direct LinkedIn contact for reliability questions
Enterprise: dedicated support pager

FlatSearch API — flat-rate web search for AI agents. No token math. No surprise bills.