Website Capture
Website capture endpoints convert all pages of a website into PDFs or screenshots in a single API call. The API automatically discovers pages using one of two methods — sitemap parsing or full algorithmic crawling — then processes each URL as a batch job and delivers the results as a ZIP archive.
Website capture requires a paid plan, with `crawl_mode` set to `sitemap` or `full` depending on your plan. Free plans will receive a 403 Forbidden response.
Available Endpoints
| Endpoint | Output | Description |
|---|---|---|
| `POST /v1/convert/website-to-pdf` | ZIP of PDFs | Convert every discovered page to a PDF |
| `POST /v1/convert/website-to-screenshot` | ZIP of PNGs | Screenshot every discovered page |
Discovery Modes
The API supports two methods for discovering pages on a website. The mode used depends on your plan and the optional crawl_mode parameter.
Sitemap Mode (crawl_mode: "sitemap")
Available on Starter, Pro, and Business plans.
- Fetches `{url}/sitemap.xml` and extracts all page URLs.
- If the sitemap is a sitemap index (pointing to child sitemaps), recursively fetches and parses each child sitemap.
- If no valid sitemap is found, the request fails with a 400 error.

This matches the previous behavior of the endpoint: simple and fast, but it requires the target site to have a valid sitemap.xml.
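The two sitemap shapes (flat `<urlset>` and recursive `<sitemapindex>`) can be handled with a short recursive parser. This is an illustrative client-side sketch, not the API's internal code; the injected `fetch` callable, which returns the XML body for a URL, is an assumption:

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def extract_urls(xml_text, fetch):
    """Return page URLs from a sitemap, recursing into sitemap indexes."""
    root = ET.fromstring(xml_text)
    urls = []
    if root.tag == f"{NS}sitemapindex":
        # A sitemap index points at child sitemaps; fetch and parse each one.
        for loc in root.iter(f"{NS}loc"):
            urls.extend(extract_urls(fetch(loc.text.strip()), fetch))
    elif root.tag == f"{NS}urlset":
        # A standard sitemap lists page URLs directly in <url><loc> entries.
        urls = [loc.text.strip() for loc in root.iter(f"{NS}loc")]
    return urls
```

Passing a plain dict's `.get` as `fetch` makes the recursion easy to exercise without any network access.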
Full Crawl Mode (crawl_mode: "full")
Available on Pro and Business plans only.
Full crawl uses a two-phase algorithmic discovery pipeline that can find pages even when a website has no sitemap or an incomplete one:
Phase 1 — Seed Discovery (fast, no browser):

- Fetches `robots.txt` to extract `Sitemap:` directives and disallow rules.
- Parses any sitemaps found (the standard sitemap.xml plus any sitemaps listed in robots.txt).
- Probes common sitemap paths (`/wp-sitemap.xml`, `/sitemap_index.xml`, etc.) if no sitemap was found.
- Discovers RSS/Atom feeds from the homepage or common paths (`/feed`, `/rss`) and extracts page URLs from feed entries.
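The `Sitemap:` directive extraction in phase 1 amounts to a line scan over robots.txt. A hypothetical sketch of that one step (the function name is illustrative; directive matching is case-insensitive, per common robots.txt convention):

```python
def sitemaps_from_robots(robots_txt):
    """Extract sitemap URLs from Sitemap: directives in a robots.txt body."""
    sitemaps = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own "://" survives.
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    return sitemaps
```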
Phase 2 — Algorithmic Link Crawl:
- Starting from the seed URLs, the crawler follows links on each page to discover additional pages using breadth-first search (BFS).
- The crawler automatically adapts per-page — using a fast HTTP parser for static pages and a full browser for JavaScript-rendered pages.
- Only same-domain HTML pages are followed. Binary files (images, PDFs, ZIPs), login pages, admin pages, and shopping cart URLs are automatically excluded.
- The crawl respects `robots.txt` disallow rules, detects infinite URL traps (calendars, paginated archives), and backs off on HTTP 429 rate-limit responses.
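The same-domain and exclusion rules above can be sketched as a link filter applied to every URL the BFS discovers. The extension and path-segment lists here are illustrative, not the API's exact exclusion set:

```python
from urllib.parse import urlparse

# Illustrative lists; the actual crawler's exclusions may differ.
BINARY_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".pdf", ".zip")
EXCLUDED_SEGMENTS = {"login", "admin", "cart"}

def should_follow(url, base_domain):
    """Decide whether the crawler should follow a discovered link."""
    parsed = urlparse(url)
    if parsed.netloc != base_domain:
        return False  # only same-domain pages are followed
    path = parsed.path.lower()
    if path.endswith(BINARY_EXTENSIONS):
        return False  # skip binary files
    if EXCLUDED_SEGMENTS.intersection(s for s in path.split("/") if s):
        return False  # skip login, admin, and shopping-cart pages
    return True
```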
The full crawl is capped at your plan's batch limit (Pro: 100 pages, Business: 400 pages) and has a maximum duration of 10 minutes.
Auto Mode (crawl_mode: "auto") — Default
When you don't specify a crawl_mode, the API automatically uses the best mode available for your plan:
- Starter: Sitemap mode
- Pro / Business: Full crawl mode
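The auto-mode resolution reduces to a plan lookup. A minimal sketch, where `PLAN_MODES` and the function name are assumptions for illustration:

```python
# Best discovery mode available per plan (from the plan comparison table).
PLAN_MODES = {"starter": "sitemap", "pro": "full", "business": "full"}

def resolve_crawl_mode(requested, plan):
    """Resolve "auto" to the best discovery mode the plan supports."""
    if requested == "auto":
        return PLAN_MODES[plan]
    return requested
```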
How It Works
- You provide the base URL of a website (e.g. `https://example.com`).
- The API discovers pages using the active discovery mode.
- Each discovered URL is converted using the same browser engine as `url-to-pdf`/`url-to-screenshot`, with full Clear Capture Mode support.
- All converted files are bundled into a ZIP archive and uploaded to storage.
- You receive an email notification (or webhook callback) when the job completes, with the results available on the Activity page or via presigned URL.
Request Format
curl -X POST https://api.enconvert.com/v1/convert/website-to-pdf \
-H "Authorization: Bearer YOUR_PRIVATE_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com"
}'
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | (required) | The base URL of the website to capture. |
| `crawl_mode` | string | `"auto"` | Page discovery method: `"auto"` (uses best available for your plan), `"sitemap"` (sitemap.xml only), or `"full"` (algorithmic crawl). See Discovery Modes. |
| `include_patterns` | array of strings | null | Regex patterns to whitelist URLs during full crawl. Only URLs matching at least one pattern will be followed. |
| `exclude_patterns` | array of strings | null | Regex patterns to blacklist URLs during full crawl. Matching URLs will not be crawled or included. |
| `notification_email` | string | null | Email address to notify when the job completes. If omitted, the project owner's email is used as a fallback. |
| `callback_url` | string | null | Webhook URL to receive a POST request when the job completes. See Job Notifications. |
| `output_filename` | string | null | Custom filename prefix for the output ZIP archive. |
| `auth` | object | null | HTTP Basic Auth credentials for password-protected pages. See Authenticated Pages. |
| `cookies` | array | null | Session cookies to inject before loading each page. See Authenticated Pages. |
| `headers` | object | null | Custom HTTP headers to send with every request. See Authenticated Pages. |
| `load_media` | boolean | true | Load images and media assets on each page before conversion. |
| `enable_scroll` | boolean | true | Scroll through each page to trigger lazy-loaded content. |
| `handle_sticky_header` | boolean | true | Neutralize sticky/fixed headers. |
| `handle_cookies` | boolean | true | Dismiss cookie consent banners. |
| `wait_for_images` | boolean | true | Wait for all images to finish loading. |
| `single_page` | boolean | false | Render each page as a single continuous page (PDF only). |
| `viewport_width` | integer | 1920 | Browser viewport width in pixels. |
| `viewport_height` | integer | 1080 | Browser viewport height in pixels. |
| `pdf_options` | object | null | PDF output configuration for website-to-pdf. Controls page size, orientation, margins, scale, grayscale, and headers/footers. Applied to each page in the batch. See URL Converters — PDF Options. |
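The `include_patterns`/`exclude_patterns` semantics can be modeled client-side to preview which URLs a full crawl would keep. This sketch assumes excludes take precedence over includes, consistent with the parameter descriptions above; verify the precedence against your own crawl results:

```python
import re

def url_passes(url, include_patterns=None, exclude_patterns=None):
    """Apply include/exclude regex filters to a candidate URL."""
    # Any exclude match removes the URL outright.
    if exclude_patterns and any(re.search(p, url) for p in exclude_patterns):
        return False
    # With includes set, at least one pattern must match.
    if include_patterns:
        return any(re.search(p, url) for p in include_patterns)
    return True
```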
Response
Website capture is always asynchronous. The API immediately returns an HTTP 202 Accepted response.
{
"status": "processing",
"batch_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"url_count": 42,
"total_discovered": 42,
"discovery_method": "full_crawl",
"output_format": "zip"
}
| Field | Description |
|---|---|
| `status` | Always `"processing"` on success. |
| `batch_id` | Unique identifier for tracking this batch job. |
| `url_count` | Number of pages that will be converted. |
| `total_discovered` | Total number of pages discovered during the discovery phase. |
| `discovery_method` | The discovery method used: `"sitemap"` or `"full_crawl"`. |
| `output_format` | Always `"zip"` — all pages are bundled into a single archive. |
When the job completes, you will receive a notification via email (and/or webhook if configured). The ZIP archive can be downloaded from the Activity page in the dashboard.
Sitemap Requirements (Sitemap Mode)
When using sitemap mode, the target website must have a valid sitemap.xml at its root URL. The API supports:
- Standard sitemaps (`<urlset>`) — a flat list of `<url><loc>` entries.
- Sitemap indexes (`<sitemapindex>`) — a list of child sitemaps that are recursively fetched and parsed.
If the sitemap cannot be fetched, is not valid XML, or contains no URLs, the API returns a 400 Bad Request error.
If the target site has no valid sitemap, use `crawl_mode: "full"` (Pro/Business plans) to discover pages algorithmically.
Examples
Website to PDF (Auto Mode)
Uses the best discovery method available for your plan:
curl -X POST https://api.enconvert.com/v1/convert/website-to-pdf \
-H "Authorization: Bearer YOUR_PRIVATE_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://docs.example.com",
"notification_email": "team@example.com",
"output_filename": "docs-site-backup"
}'
Full Crawl with URL Filtering
Crawl a website but only include blog pages:
curl -X POST https://api.enconvert.com/v1/convert/website-to-pdf \
-H "Authorization: Bearer YOUR_PRIVATE_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.example.com",
"crawl_mode": "full",
"include_patterns": [".*/blog/.*"],
"exclude_patterns": [".*/tag/.*", ".*/author/.*"]
}'
Sitemap Only
Force sitemap-only mode (faster, but requires sitemap.xml):
curl -X POST https://api.enconvert.com/v1/convert/website-to-screenshot \
-H "Authorization: Bearer YOUR_PRIVATE_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.example.com",
"crawl_mode": "sitemap",
"viewport_width": 1440,
"viewport_height": 900
}'
Website to PDF with PDF Options
curl -X POST https://api.enconvert.com/v1/convert/website-to-pdf \
-H "Authorization: Bearer YOUR_PRIVATE_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://docs.example.com",
"notification_email": "team@example.com",
"pdf_options": {
"page_size": "A4",
"margins": { "top": 20, "bottom": 20, "left": 15, "right": 15 },
"footer": {
"content": "<div style=\"font-size: 9px; width: 100%; text-align: center;\">Page {{page}} of {{total_pages}}</div>",
"height": 12
}
}
}'
Error Responses
No Sitemap Found (400) — Sitemap Mode
{
"detail": "Could not fetch sitemap: https://example.com/sitemap.xml returned 404"
}
No Pages Discovered (400) — Full Crawl Mode
{
"detail": "No pages discovered on https://example.com"
}
Empty Sitemap (400)
{
"detail": "No URLs found in sitemap: https://example.com/sitemap.xml"
}
Free Plan (403)
{
"detail": "Website crawling is not available on your current plan. Please upgrade to access this feature."
}
Full Crawl Not Available (403)
Returned when a Starter plan user requests crawl_mode: "full":
{
"detail": "Full website crawling requires a Pro plan or higher. Your plan supports sitemap-based crawling only."
}
Batch Limit Exceeded (403)
Returned when the number of discovered URLs exceeds your plan's batch limit.
{
"detail": "Batch size 150 exceeds your plan limit of 50 URLs per batch."
}
Plan Comparison
| Feature | Starter | Pro | Business |
|---|---|---|---|
| Sitemap mode | Yes | Yes | Yes |
| Full crawl mode | No | Yes | Yes |
| Max pages per batch | 50 | 100 | 400 |
| Discovery timeout | — | 10 min | 10 min |
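The Batch Limit Exceeded error above corresponds to a simple pre-flight check against the per-plan limits in this table. A sketch (the function name is hypothetical):

```python
# Max pages per batch, from the plan comparison table.
BATCH_LIMITS = {"starter": 50, "pro": 100, "business": 400}

def check_batch_limit(discovered, plan):
    """Raise if the discovered page count exceeds the plan's batch limit."""
    limit = BATCH_LIMITS[plan]
    if discovered > limit:
        raise ValueError(
            f"Batch size {discovered} exceeds your plan limit of {limit} URLs per batch."
        )
    return discovered
```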