inndx
Scrape API · available now

The data layer for internet-scale crawling.

A scrape API for LLMs, curated datasets, and crawl-job orchestration. From a single URL to a self-hosted fleet.

inndx · scrape api

$ tempo request https://api.inndx.io/v1/scrape/https://yoururlhere.com

# Launching the new platform

Clean, readable Markdown. Nav, ads, and boilerplate stripped. Tables, code blocks, and links preserved, ready for your model.


Products

Three and a half ways into the crawl engine.

Hit a single URL, query a curated dataset, or orchestrate a full crawl, all backed by the same fetchers and the same fleet.

Scrape API · Available

URL in, clean content out.

Point it at any URL and get back clean, LLM-ready content in Markdown, HTML, or other formats. Navigation, ads, and boilerplate stripped, tables and code preserved. Pay per call with an API key, or with nothing but a crypto wallet over MPP: no account, agents welcome.

Read the docs
Datasets API · Coming soon

Skip the crawl. Query the data.

Query an ever-growing library of pre-curated datasets by API, or stream them by webhook. Continuously refreshed and ready the moment you need them.

Get notified
Crawl orchestration · Self-host

Schedule, run, and pipe full crawls.

Sitemap strategies, triggers, retries, and storage sinks, with real-time logs and a fleet that scales itself. Available today to self-host on your own infrastructure; managed templates arrive with the cloud.

Get a license
Scrape API

URL in, clean content out. Paid by the call.

Point it at any URL and get back clean, model-ready content in Markdown, HTML, or other formats. The md.inndx.io shortcut returns Markdown directly. Pay per call with a wallet over MPP, no account required.
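
As a rough illustration of the two URL shapes that appear on this page, the sketch below builds a request URL for the `/v1/scrape/` path and for the `md.inndx.io` shortcut. The helper names are hypothetical, and appending the target URL unencoded simply mirrors the examples shown here; whether the service expects percent-encoding, and how an API key or wallet payment is attached to the request, is not specified on this page.

```python
API_BASE = "https://api.inndx.io/v1/scrape/"  # path form taken from the terminal example on this page
MD_BASE = "https://md.inndx.io/"              # Markdown shortcut host named on this page


def _require_http_url(target: str) -> str:
    """Reject anything that is not an absolute http(s) URL."""
    if not target.startswith(("http://", "https://")):
        raise ValueError(f"expected an absolute http(s) URL, got {target!r}")
    return target


def scrape_url(target: str) -> str:
    """Build the scrape endpoint URL (hypothetical helper; target appended as-is)."""
    return API_BASE + _require_http_url(target)


def markdown_url(target: str) -> str:
    """Build the md.inndx.io shortcut URL (hypothetical helper; target appended as-is)."""
    return MD_BASE + _require_http_url(target)


if __name__ == "__main__":
    print(scrape_url("https://yoururlhere.com"))
    print(markdown_url("https://yoururlhere.com"))
```

The helpers only assemble strings, so they can be passed to whichever HTTP client or payment-aware CLI you already use.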

inndx · scrape
POST /v1/scrape
output · markdown · 200 OK
MPP · 0x9f…a4c2 · 0.002 USDC / call · paid

Strips nav, ads, and boilerplate · Pay with a wallet or an API key · Agents welcome
Use cases

What teams build on inndx.

From one clean URL to a full managed crawl pipeline.

LLM & RAG ingestion

Turn any URL into clean, model-ready Markdown for retrieval and context windows.

GET md.inndx.io/https://… → # markdown
scrape api · parsers
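
Once a page has come back as Markdown, a retrieval pipeline typically splits it into chunks before indexing. The sketch below shows one possible approach, not something inndx provides: `chunk_markdown` is a hypothetical helper that starts a new chunk at each heading and splits oversized sections on line boundaries. The `max_chars` default is an arbitrary assumption.

```python
import re
from typing import List

HEADING = re.compile(r"^#{1,6}\s")
FENCE = "`" * 3  # marker that opens and closes a fenced code block


def chunk_markdown(markdown: str, max_chars: int = 2000) -> List[str]:
    """Split Markdown into chunks, starting a new chunk at each heading.

    A section longer than max_chars is split further on line boundaries.
    Lines inside fenced code blocks are never treated as headings.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    in_fence = False

    def flush() -> None:
        nonlocal current, size
        text = "\n".join(current).strip()
        if text:
            chunks.append(text)
        current, size = [], 0

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        is_heading = not in_fence and HEADING.match(line) is not None
        # Start a new chunk at a heading, or when the size budget would be exceeded.
        if is_heading or (current and size + len(line) + 1 > max_chars):
            flush()
        current.append(line)
        size += len(line) + 1
    flush()
    return chunks
```

Splitting on headings keeps each chunk topically coherent, which tends to matter more for retrieval quality than hitting an exact size.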
Enterprise · self-host

Run the whole pipeline on your infrastructure.

Deploy inndx inside your own environment. Your crawls and data never leave your boundary.

Your environment, your data

The full pipeline runs inside your environment; crawled data never leaves your boundary.

Own every crawl job

Define orchestration, fetchers, parsers, and sinks; manage runs, triggers, and data maps.

Self-hosted license

Operate inndx with a dashboard and full access to the crawl machinery.

your environment · online
URLs · sitemaps → Orchestrator (schedule · rank · policy) → Fetchers → Parsers → Sinks
your storage: s3 · mongodb · webhooks
Under the hood

Everything a crawl needs, handled.

The orchestration layer takes care of the operational details.

Storage sinks

Pipe parsed results straight into your stack: databases, object storage, streaming platforms, or your warehouse.

s3 · mongodb · webhook · kafka

Sitemap strategies

Crawl by sitemap, seed list, or pattern with depth limits.
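
To make the idea of a seed list combined with a pattern and a depth limit concrete, here is a minimal sketch of breadth-first URL discovery. It is not the inndx orchestrator: `plan_crawl` and its parameters are hypothetical, and `get_links` stands in for whatever supplies outgoing links (a sitemap parser or a fetched page).

```python
import re
from collections import deque
from typing import Callable, Iterable, List


def plan_crawl(
    seeds: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    max_depth: int,
    pattern: str = r".*",
) -> List[str]:
    """Breadth-first URL discovery from a seed list, bounded by depth and a regex.

    Seeds are at depth 0. URLs that do not match the pattern (seeds included)
    are skipped, and links are only followed while depth < max_depth.
    """
    allowed = re.compile(pattern)
    seen = set()
    order: List[str] = []
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if url in seen or not allowed.search(url):
            continue
        seen.add(url)
        order.append(url)
        if depth < max_depth:
            queue.extend((link, depth + 1) for link in get_links(url))
    return order
```

Breadth-first order means shallow pages are visited before deep ones, so raising `max_depth` only appends further URLs to the plan.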

Retries & backoff

Automatic retries with polite backoff and rate control.
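
The page does not say which backoff policy the fleet uses. As an illustration of what polite backoff commonly means, the sketch below computes capped exponential delays with optional full jitter. The function name and every default value are assumptions made for the example.

```python
import random
from typing import List, Optional


def backoff_delays(
    retries: int,
    base: float = 1.0,
    factor: float = 2.0,
    cap: float = 60.0,
    jitter: bool = True,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the wait in seconds before each retry: base * factor**n, capped.

    With jitter=True each delay is drawn uniformly from [0, capped delay]
    ("full jitter"), which spreads out retries from many workers.
    """
    rng = rng or random.Random()
    delays: List[float] = []
    for attempt in range(retries):
        delay = min(cap, base * factor ** attempt)
        delays.append(rng.uniform(0.0, delay) if jitter else delay)
    return delays
```

A caller would sleep for each delay in turn between attempts; passing a seeded `random.Random` makes the jittered schedule reproducible in tests.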

Schedules & triggers

Run on a cron, a webhook, or on demand.

Self-host or managed

Run it on your infra today; managed templates arrive with the cloud.

Datasets

Skip the crawl. Query the data.

An ever-growing library of pre-curated datasets, continuously refreshed and ready by API or webhook.

News · Retail · Jobs · Finance · Social · Research

Run the crawler on your own infrastructure.

Crawl-job orchestration is available today for enterprise self-hosting. Tell us what you're building.

Contact sales · Enterprise self-hosting available now