The data layer for internet-scale crawling.
A scrape API for LLMs, curated datasets, and crawl-job orchestration. From a single URL to a self-hosted fleet.
$ tempo request https://api.inndx.io/v1/scrape/https://yoururlhere.com
# Launching the new platform
Clean, readable Markdown. Nav, ads, and boilerplate stripped. Tables, code blocks, and links preserved, ready for your model.
Three and a half ways into the crawl engine.
Hit a single URL, query a curated dataset, or orchestrate a full crawl, all backed by the same fetchers and the same fleet.
URL in, clean content out.
Point it at any URL and get back clean, LLM-ready content in Markdown, HTML, or other formats. Navigation, ads, and boilerplate are stripped; tables and code are preserved. Pay per call with an API key, or with nothing but a crypto wallet over MPP: no account, agents welcome.
Read the docs
Skip the crawl. Query the data.
Query an ever-growing library of pre-curated datasets by API, or stream them by webhook. Continuously refreshed and ready the moment you need them.
Get notified
Schedule, run, and pipe full crawls.
Sitemap strategies, triggers, retries, and storage sinks, with real-time logs and a fleet that scales itself. Available today to self-host on your own infrastructure; managed templates arrive with the cloud.
Get a license
URL in, clean content out. Paid by the call.
Point it at any URL and get back clean, model-ready content in Markdown, HTML, or other formats. The md.inndx.io shortcut returns Markdown directly. Pay per call with a wallet over MPP, no account required.
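For callers working from Python rather than a CLI, here is a minimal sketch of the JSON-body form of the scrape call. The endpoint and the `url` / `formats` fields are the ones shown on this page; the helper name `build_scrape_request` is hypothetical. The sketch only builds the request and does not send it, and it leaves out payment and authentication (API key or MPP), since the header names are not given here.

```python
import json
import urllib.request

# Endpoint as shown on this page.
SCRAPE_ENDPOINT = "https://api.inndx.io/v1/scrape"


def build_scrape_request(target_url: str, kind: str = "markdown") -> urllib.request.Request:
    """Build (but do not send) a POST request for the scrape endpoint.

    Hypothetical helper: the body mirrors the JSON used on this page.
    Authentication and payment headers are deliberately omitted.
    """
    body = {"url": target_url, "formats": [{"kind": kind}]}
    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_scrape_request("https://yoururlhere.com")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending it would be a matter of passing the request to `urllib.request.urlopen` once the required credentials are attached.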
$ tempo request -X POST https://api.inndx.io/v1/scrape \
  --json '{"url": "https://yoururlhere.com", "formats": [{"kind": "markdown"}]}'
What teams build on inndx.
From one clean URL to a full managed crawl pipeline.
LLM & RAG ingestion
Turn any URL into clean, model-ready Markdown for retrieval and context windows.
Run the whole pipeline on your infrastructure.
Deploy inndx inside your own environment. Your crawls and data never leave your boundary.
Your environment, your data
The full pipeline runs inside your environment; crawled data never leaves your boundary.
Own every crawl job
Define orchestration, fetchers, parsers, and sinks; manage runs, triggers, and data maps.
Self-hosted license
Operate inndx with a dashboard and full access to the crawl machinery.
Everything a crawl needs, handled.
The orchestration layer takes care of the operational details.
Storage sinks
Pipe parsed results straight into your stack: relational databases, object storage, streaming systems, or your warehouse.
Sitemap strategies
Crawl by sitemap, seed list, or pattern with depth limits.
Retries & backoff
Automatic retries with polite backoff and rate control.
Schedules & triggers
Run on a cron, a webhook, or on demand.
Self-host or managed
Run it on your infra, or let us run it for you.
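To make "retries with polite backoff" concrete, here is a generic sketch of capped exponential backoff with jitter. It illustrates the general technique only and is not inndx's implementation; the names `backoff_delays` and `fetch_with_retries`, and every default value, are assumptions chosen for the example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Upper bounds for each wait: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def fetch_with_retries(
    fetch: Callable[[], T],
    retries: int = 3,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch`, retrying up to `retries` times with jittered backoff.

    Illustrative only: this retries on any exception. A real crawler would
    likely retry only on transient failures (timeouts, 429, 5xx).
    """
    delays = backoff_delays(retries, base, cap)
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Jitter: wait a random time up to the current bound so that
            # many workers do not retry against a host in lockstep.
            sleep(random.uniform(0, delays[attempt]))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting; the cap keeps the wait bounded no matter how many attempts are configured.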
Skip the crawl. Query the data.
An ever-growing library of pre-curated datasets, continuously refreshed and ready by API or webhook.
Run the crawler on your own infrastructure.
Crawl-job orchestration is available today for enterprise self-hosting. Tell us what you're building.