Website Capture
Website capture endpoints convert all pages of a website into PDFs or screenshots in a single API call. The API automatically discovers pages using one of two methods — sitemap parsing or full algorithmic crawling — then processes each URL as a batch job and delivers the results as a ZIP archive.
Website capture requires a paid plan, with `crawl_mode` set to `sitemap` or `full` depending on your plan. Free plans will receive a 403 Forbidden response.
Available Endpoints
| Endpoint | Output | Description |
|---|---|---|
| `POST /v1/convert/website-to-pdf` | ZIP of PDFs | Convert every discovered page to a PDF |
| `POST /v1/convert/website-to-screenshot` | ZIP of PNGs | Screenshot every discovered page |
Discovery Modes
The API supports two methods for discovering pages on a website. The mode used depends on your plan and the optional crawl_mode parameter.
Sitemap Mode (crawl_mode: "sitemap")
Available on Starter, Pro, and Business plans.
- Fetches `{url}/sitemap.xml` and extracts all page URLs.
- If the sitemap is a sitemap index (pointing to child sitemaps), recursively fetches and parses each child sitemap.
- If no valid sitemap is found, the request fails with a 400 error.

This matches the previous behavior of the endpoint: simple and fast, but it requires the target site to have a valid sitemap.xml.
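The two sitemap shapes (flat `<urlset>` and recursive `<sitemapindex>`) can be handled with a short recursive parser. This is an illustrative client-side sketch, not the API's internal code; the injected `fetch` callable, which returns the XML body for a URL, is an assumption:

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def extract_urls(xml_text, fetch):
    """Return page URLs from a sitemap, recursing into sitemap indexes."""
    root = ET.fromstring(xml_text)
    urls = []
    if root.tag == f"{NS}sitemapindex":
        # A sitemap index points at child sitemaps; fetch and parse each one.
        for loc in root.iter(f"{NS}loc"):
            urls.extend(extract_urls(fetch(loc.text.strip()), fetch))
    elif root.tag == f"{NS}urlset":
        # A standard sitemap lists page URLs directly in <url><loc> entries.
        urls = [loc.text.strip() for loc in root.iter(f"{NS}loc")]
    return urls
```

Passing a plain dict's `.get` as `fetch` makes the recursion easy to exercise without any network access.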
Full Crawl Mode (crawl_mode: "full")
Available on Pro and Business plans only.
Full crawl uses a two-phase algorithmic discovery pipeline that can find pages even when a website has no sitemap or an incomplete one:
Phase 1 — Seed Discovery (fast, no browser):

- Fetches `robots.txt` to extract `Sitemap:` directives and disallow rules.
- Parses any sitemaps found (the standard sitemap.xml plus any sitemaps listed in robots.txt).
- Probes common sitemap paths (`/wp-sitemap.xml`, `/sitemap_index.xml`, etc.) if no sitemap was found.
- Discovers RSS/Atom feeds from the homepage or common paths (`/feed`, `/rss`) and extracts page URLs from feed entries.
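The `Sitemap:` directive extraction in phase 1 amounts to a line scan over robots.txt. A hypothetical sketch of that one step (the function name is illustrative; directive matching is case-insensitive, per common robots.txt convention):

```python
def sitemaps_from_robots(robots_txt):
    """Extract sitemap URLs from Sitemap: directives in a robots.txt body."""
    sitemaps = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own "://" survives.
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    return sitemaps
```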
Phase 2 — Algorithmic Link Crawl:
- Starting from the seed URLs, the crawler follows links on each page to discover additional pages using breadth-first search (BFS).
- The crawler automatically adapts per-page — using a fast HTTP parser for static pages and a full browser for JavaScript-rendered pages.
- Only same-domain HTML pages are followed. Binary files (images, PDFs, ZIPs), login pages, admin pages, and shopping cart URLs are automatically excluded.
- The crawl respects `robots.txt` disallow rules, detects infinite URL traps (calendars, paginated archives), and backs off on HTTP 429 rate-limit responses.
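The same-domain and exclusion rules above can be sketched as a link filter applied to every URL the BFS discovers. The extension and path-segment lists here are illustrative, not the API's exact exclusion set:

```python
from urllib.parse import urlparse

# Illustrative lists; the actual crawler's exclusions may differ.
BINARY_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".pdf", ".zip")
EXCLUDED_SEGMENTS = {"login", "admin", "cart"}

def should_follow(url, base_domain):
    """Decide whether the crawler should follow a discovered link."""
    parsed = urlparse(url)
    if parsed.netloc != base_domain:
        return False  # only same-domain pages are followed
    path = parsed.path.lower()
    if path.endswith(BINARY_EXTENSIONS):
        return False  # skip binary files
    if EXCLUDED_SEGMENTS.intersection(s for s in path.split("/") if s):
        return False  # skip login, admin, and shopping-cart pages
    return True
```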
The full crawl is capped at your plan's batch limit (Pro: 100 pages, Business: 400 pages) and has a maximum duration of 10 minutes.
Auto Mode (crawl_mode: "auto") — Default
When you don't specify a crawl_mode, the API automatically uses the best mode available for your plan:
- Starter: Sitemap mode
- Pro / Business: Full crawl mode
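The auto-mode resolution reduces to a plan lookup. A minimal sketch, where `PLAN_MODES` and the function name are assumptions for illustration:

```python
# Best discovery mode available per plan (from the plan comparison table).
PLAN_MODES = {"starter": "sitemap", "pro": "full", "business": "full"}

def resolve_crawl_mode(requested, plan):
    """Resolve "auto" to the best discovery mode the plan supports."""
    if requested == "auto":
        return PLAN_MODES[plan]
    return requested
```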
How It Works
- You provide the base URL of a website (e.g. `https://example.com`).
- The API discovers pages using the active discovery mode.
- Each discovered URL is converted using the same browser engine as `url-to-pdf`/`url-to-screenshot`, with full Clear Capture Mode support.
- All converted files are bundled into a ZIP archive and uploaded to storage.
- You receive an email notification (or webhook callback) when the job completes, with the results available on the Activity page or via presigned URL.
Request Format
curl -X POST https://api.enconvert.com/v1/convert/website-to-pdf \
-H "Authorization: Bearer YOUR_PRIVATE_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com"
}'
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | (required) | The base URL of the website to capture. |
| `crawl_mode` | string | `"auto"` | Page discovery method: `"auto"` (uses best available for your plan), `"sitemap"` (sitemap.xml only), or `"full"` (algorithmic crawl). See Discovery Modes. |
| `include_patterns` | array of strings | null | Regex patterns to whitelist URLs during full crawl. Only URLs matching at least one pattern will be followed. |
| `exclude_patterns` | array of strings | null | Regex patterns to blacklist URLs during full crawl. Matching URLs will not be crawled or included. |
| `notification_email` | string | null | Email address to notify when the job completes. If omitted, the project owner's email is used as a fallback. |
| `callback_url` | string | null | Webhook URL to receive a POST request when the job completes. See Job Notifications. |
| `output_filename` | string | null | Custom filename prefix for the output ZIP archive. |
| `auth` | object | null | HTTP Basic Auth credentials for password-protected pages. See Authenticated Pages. |
| `cookies` | array | null | Session cookies to inject before loading each page. See Authenticated Pages. |
| `headers` | object | null | Custom HTTP headers to send with every request. See Authenticated Pages. |
| `load_media` | boolean | true | Load images and media assets on each page before conversion. |
| `enable_scroll` | boolean | true | Scroll through each page to trigger lazy-loaded content. |
| `handle_sticky_header` | boolean | true | Neutralize sticky/fixed headers. |
| `handle_cookies` | boolean | true | Dismiss cookie consent banners. |
| `wait_for_images` | boolean | true | Wait for all images to finish loading. |
| `single_page` | boolean | false | Render each page as a single continuous page (PDF only). |
| `viewport_width` | integer | 1920 | Browser viewport width in pixels. |
| `viewport_height` | integer | 1080 | Browser viewport height in pixels. |
| `pdf_options` | object | null | PDF output configuration for website-to-pdf. Controls page size, orientation, margins, scale, grayscale, and headers/footers. Applied to each page in the batch. See URL Converters — PDF Options. |
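The `include_patterns`/`exclude_patterns` semantics can be modeled client-side to preview which URLs a full crawl would keep. This sketch assumes excludes take precedence over includes, consistent with the parameter descriptions above; verify the precedence against your own crawl results:

```python
import re

def url_passes(url, include_patterns=None, exclude_patterns=None):
    """Apply include/exclude regex filters to a candidate URL."""
    # Any exclude match removes the URL outright.
    if exclude_patterns and any(re.search(p, url) for p in exclude_patterns):
        return False
    # With includes set, at least one pattern must match.
    if include_patterns:
        return any(re.search(p, url) for p in include_patterns)
    return True
```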
Response
Website capture is always asynchronous. The API immediately returns an HTTP 202 Accepted response.
{
"status": "processing",
"batch_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"url_count": 42,
"total_discovered": 42,
"discovery_method": "full_crawl",
"output_format": "zip"
}
| Field | Description |
|---|---|
| `status` | Always `"processing"` on success. |
| `batch_id` | Unique identifier for tracking this batch job. |
| `url_count` | Number of pages that will be converted. |
| `total_discovered` | Total number of pages discovered during the discovery phase. |
| `discovery_method` | The discovery method used: `"sitemap"` or `"full_crawl"`. |
| `output_format` | Always `"zip"` — all pages are bundled into a single archive. |
When the job completes, you will receive a notification via email (and/or webhook if configured). The ZIP archive can be downloaded from the Activity page in the dashboard.
Sitemap Requirements (Sitemap Mode)
When using sitemap mode, the target website must have a valid sitemap.xml at its root URL. The API supports:
- Standard sitemaps (`<urlset>`) — a flat list of `<url><loc>` entries.
- Sitemap indexes (`<sitemapindex>`) — a list of child sitemaps that are recursively fetched and parsed.
If the sitemap cannot be fetched, is not valid XML, or contains no URLs, the API returns a 400 Bad Request error.
If the target site has no valid sitemap, use `crawl_mode: "full"` (Pro/Business plans) to discover pages algorithmically.
Examples
Website to PDF (Auto Mode)
Uses the best discovery method available for your plan:
curl -X POST https://api.enconvert.com/v1/convert/website-to-pdf \
-H "Authorization: Bearer YOUR_PRIVATE_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://docs.example.com",
"notification_email": "team@example.com",
"output_filename": "docs-site-backup"
}'
Full Crawl with URL Filtering
Crawl a website but only include blog pages:
curl -X POST https://api.enconvert.com/v1/convert/website-to-pdf \
-H "Authorization: Bearer YOUR_PRIVATE_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.example.com",
"crawl_mode": "full",
"include_patterns": [".*/blog/.*"],
"exclude_patterns": [".*/tag/.*", ".*/author/.*"]
}'
Sitemap Only
Force sitemap-only mode (faster, but requires sitemap.xml):
curl -X POST https://api.enconvert.com/v1/convert/website-to-screenshot \
-H "Authorization: Bearer YOUR_PRIVATE_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.example.com",
"crawl_mode": "sitemap",
"viewport_width": 1440,
"viewport_height": 900
}'
Website to PDF with PDF Options
curl -X POST https://api.enconvert.com/v1/convert/website-to-pdf \
-H "Authorization: Bearer YOUR_PRIVATE_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://docs.example.com",
"notification_email": "team@example.com",
"pdf_options": {
"page_size": "A4",
"margins": { "top": 20, "bottom": 20, "left": 15, "right": 15 },
"footer": {
"content": "<div style=\"font-size: 9px; width: 100%; text-align: center;\">Page {{page}} of {{total_pages}}</div>",
"height": 12
}
}
}'
Error Responses
No Sitemap Found (400) — Sitemap Mode
{
"detail": "Could not fetch sitemap: https://example.com/sitemap.xml returned 404"
}
No Pages Discovered (400) — Full Crawl Mode
{
"detail": "No pages discovered on https://example.com"
}
Empty Sitemap (400)
{
"detail": "No URLs found in sitemap: https://example.com/sitemap.xml"
}
Free Plan (403)
{
"detail": "Website crawling is not available on your current plan. Please upgrade to access this feature."
}
Full Crawl Not Available (403)
Returned when a Starter plan user requests crawl_mode: "full":
{
"detail": "Full website crawling requires a Pro plan or higher. Your plan supports sitemap-based crawling only."
}
Batch Limit Exceeded (403)
Returned when the number of discovered URLs exceeds your plan's batch limit.
{
"detail": "Batch size 150 exceeds your plan limit of 50 URLs per batch."
}
Plan Comparison
| Feature | Starter | Pro | Business |
|---|---|---|---|
| Sitemap mode | Yes | Yes | Yes |
| Full crawl mode | No | Yes | Yes |
| Max pages per batch | 50 | 100 | 400 |
| Discovery timeout | — | 10 min | 10 min |
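The Batch Limit Exceeded error above corresponds to a simple pre-flight check against the per-plan limits in this table. A sketch (the function name is hypothetical):

```python
# Max pages per batch, from the plan comparison table.
BATCH_LIMITS = {"starter": 50, "pro": 100, "business": 400}

def check_batch_limit(discovered, plan):
    """Raise if the discovered page count exceeds the plan's batch limit."""
    limit = BATCH_LIMITS[plan]
    if discovered > limit:
        raise ValueError(
            f"Batch size {discovered} exceeds your plan limit of {limit} URLs per batch."
        )
    return discovered
```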