url-to-markdown
Convert any publicly accessible URL into clean GitHub-Flavored Markdown with YAML frontmatter metadata. The page is rendered in a real browser, passed through a readability extractor to isolate the main article, then serialised to Markdown with normalised links, images, and fenced code blocks. Boilerplate (navigation, footer, aside, scripts, forms, buttons) is removed, and relative URLs are resolved to absolute URLs.
Useful for LLM ingestion pipelines, content archival, RSS-style feeds, and any workflow that needs plain-text article content without HTML noise.
Endpoint
POST /v1/convert/url-to-markdown
Content-Type: application/json
Output format: Markdown (.md, UTF-8) with a YAML frontmatter block at the top of the file containing page metadata. The output format is not configurable — Markdown with YAML frontmatter is always produced.
Authentication
This endpoint supports both private and public key authentication.
Private Key
Include your secret key in the X-API-Key header. Use this for server-to-server calls where the key is never exposed to the client.
X-API-Key: sk_live_your_private_key
Public Key with JWT
For client-side usage, first generate a JWT token using your public key, then pass it as a Bearer token.
Step 1 -- Get a token:
POST /v1/auth/token
X-API-Key: pk_live_your_public_key
Step 2 -- Use the token:
Authorization: Bearer eyJhbGciOiJIUzI1NiIs...
Request Parameters
Top-Level Parameters
| Parameter | Type | Required | Default | Description | Plan Gating |
|---|---|---|---|---|---|
url |
string or string[] |
Yes | -- | A single URL string or an array of URLs to convert. Multiple URLs require async mode. | -- |
async_mode |
boolean |
No | false |
Run the conversion asynchronously. Returns a batch_id immediately for polling. Required for batch (multiple URLs). |
Requires async access |
direct_download |
boolean |
No | false |
Return raw Markdown bytes in the response body instead of a JSON response with a presigned URL. Forced true for public keys. Incompatible with async_mode and multiple URLs. |
-- |
output_format |
boolean |
No | false |
When true with multiple URLs, bundles all output Markdown files into a single ZIP archive. Requires multiple URLs. |
Requires ZIP output access |
output_filename |
string |
No | Auto-generated | Custom filename for the output file. The .md extension is added automatically. Default format: {domain}_{timestamp}.md. |
-- |
job_id |
string |
No | -- | Client-provided job ID for timeout recovery. Public keys only. When a sync conversion exceeds reverse-proxy timeout limits, the client can poll GET /v1/convert/status/{job_id} to retrieve the result. Ignored for private keys. |
-- |
notification_email |
string |
No | Project owner email | Email address to notify when an async job completes. Private keys only. | -- |
callback_url |
string |
No | -- | Webhook URL to receive a POST request when the conversion completes. Private keys only. | Requires webhook access |
Browser & Rendering Parameters
| Parameter | Type | Required | Default | Description | Plan Gating |
|---|---|---|---|---|---|
viewport_width |
integer |
No | 1920 |
Browser viewport width in pixels. Affects responsive content and which layout variant is captured before extraction. | -- |
viewport_height |
integer |
No | 1080 |
Browser viewport height in pixels. Used as a reference for rendering and viewport-unit calculation. | -- |
load_media |
boolean |
No | true |
Wait for all images and videos to fully load before extraction. When false, extraction is faster but lazy-loaded images may have placeholder src values in the Markdown output. |
-- |
enable_scroll |
boolean |
No | true |
Scroll the page top-to-bottom to trigger lazy-loading content (IntersectionObserver-based loaders). | -- |
handle_sticky_header |
boolean |
No | true |
Detect sticky/fixed headers and scroll to top before extraction so content ordering is preserved correctly. | -- |
handle_cookies |
boolean |
No | true |
Auto-dismiss cookie consent banners (OneTrust, Cookiebot, Didomi, Usercentrics, and generic banners) before extraction. | -- |
wait_for_images |
boolean |
No | true |
Wait for all <img> elements to finish loading (5-second timeout per image) so alt text and final src values are captured correctly. |
-- |
Authentication & Custom Requests
| Parameter | Type | Required | Default | Description | Plan Gating |
|---|---|---|---|---|---|
auth |
object |
No | null |
HTTP Basic Auth credentials for the target URL. Format: {"username": "...", "password": "..."}. Cannot be used together with an Authorization custom header. |
Requires basic auth access |
cookies |
array |
No | null |
Array of cookie objects to inject before navigation. Maximum 50 cookies. Each cookie must have name, value, and either domain or url. |
Requires basic auth access |
headers |
object |
No | null |
Dictionary of custom HTTP headers sent with every request to the target URL. Maximum 20 headers. Blocked headers: host, content-length, transfer-encoding, connection, upgrade, te, trailer. |
Requires basic auth access |
single_page and pdf_options parameters from the url-to-pdf endpoint are accepted for request-shape parity but have no effect on Markdown output. Markdown has no concept of pages, margins, or orientation.
Cookie Object Schema
Each item in the cookies array must follow this structure:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
name |
string |
Yes | -- | Cookie name. |
value |
string |
Yes | -- | Cookie value. |
domain |
string |
Conditional | -- | Cookie domain. Either domain or url must be provided. |
url |
string |
Conditional | -- | URL to associate the cookie with. Either domain or url must be provided. |
path |
string |
No | "/" |
Cookie path. Defaults to "/" when domain is set. |
Response
Synchronous with Direct Download (direct_download=true)
Private key -- returns raw Markdown bytes:
HTTP 200 OK
Content-Type: text/markdown; charset=utf-8
Content-Disposition: inline; filename="example_20260421_123456789.md"
X-Object-Key: env/files/{project_id}/url-to-markdown/example_20260421_123456789.md
X-File-Size: 8421
X-Conversion-Time: 6.3
X-Filename: example_20260421_123456789.md
(UTF-8 Markdown with YAML frontmatter)
Public key -- returns JSON with a presigned URL:
{
"presigned_url": "https://spaces.example.com/...",
"object_key": "env/files/{project_id}/url-to-markdown/example_20260421_123456789.md",
"filename": "example_20260421_123456789.md",
"file_size": 8421,
"conversion_time_seconds": 6.3,
"job_id": "client-provided-id"
}
Synchronous without Direct Download (direct_download=false)
Available with private keys only.
{
"presigned_url": "https://spaces.example.com/...",
"object_key": "env/files/{project_id}/url-to-markdown/example_20260421_123456789.md",
"filename": "example_20260421_123456789.md",
"file_size": 8421,
"conversion_time_seconds": 6.3
}
Asynchronous Mode
Returns immediately with a batch_id for polling.
HTTP 202 Accepted
{
"status": "processing",
"batch_id": "550e8400-e29b-41d4-a716-446655440000",
"url_count": 5,
"output_format": "individual"
}
When output_format=true (ZIP bundling):
{
"status": "processing",
"batch_id": "550e8400-e29b-41d4-a716-446655440000",
"url_count": 5,
"output_format": "zip"
}
Job Status Polling (Public Keys Only)
For public key timeout recovery:
GET /v1/convert/status/{job_id}
Authorization: Bearer <jwt_token>
| Status | Response |
|---|---|
| Processing | {"status": "processing"} |
| Success | {"status": "success", "presigned_url": "...", "object_key": "..."} |
| Failed | {"status": "failed", "error": "..."} |
Batch Status Polling (Private Keys Only)
For async batch jobs, poll with the batch_id from the 202 response:
GET /v1/convert/batch/{batch_id}
X-API-Key: sk_live_your_private_key
Returns aggregate status, per-URL statuses, and presigned download URLs. See Batch Status Polling for the full response schema.
Webhook Callback Payload
When a callback_url is provided, Enconvert sends a POST request to that URL on completion.
Single URL job:
{
"job_id": "activity_id",
"status": "success",
"batch_id": "...",
"gcs_uri": "object_key",
"filename": "example_20260421_123456789.md",
"file_size": 8421
}
Batch job:
{
"job_id": "activity_id",
"status": "success",
"batch_id": "...",
"total_tasks": 10,
"successful_tasks": 8,
"failed_tasks": 2,
"tasks": [
{"url": "https://example.com/page1", "status": "success", "filename": "page1.md"},
{"url": "https://example.com/page2", "status": "failed", "filename": null}
]
}
Output Format
Every Markdown file starts with a YAML frontmatter block containing page metadata, followed by the extracted article body.
---
url: https://example.com/articles/my-post
title: My Post Title
description: A short summary from the meta description or og:description tag.
links:
- url: https://example.com/related
text: Related article
- url: https://example.com/about
text: About the author
images:
- url: https://example.com/hero.jpg
alt: Hero image alt text
- url: https://example.com/diagram.png
alt: Architecture diagram
---
# My Post Title
Opening paragraph of the article body, converted to GitHub-Flavored Markdown...
## A Section Heading
- List item one
- List item two
\`\`\`python
def example():
return "code blocks are fenced with language hints"
\`\`\`
[A link in the body](https://example.com/linked-page)

Frontmatter Fields
| Field | Type | Description |
|---|---|---|
url |
string |
The final URL after redirects (not always the URL you sent). |
title |
string |
The page title from <title>, falling back to the Readability-detected short title. |
description |
string |
The <meta name="description"> value, falling back to <meta property="og:description">. |
links |
array |
Every <a href> found on the page, with absolute URLs and visible anchor text. |
images |
array |
Every <img src> found on the page, with absolute URLs and alt text. |
Markdown Conventions
- Heading style: ATX (
#,##,###) - List bullets:
- - Emphasis:
*bold*,*italic*with escaped*and_in literal text - Soft line breaks: trailing two spaces (preserved in output)
- Code blocks: fenced (
```) with language hints detected fromclass="language-xxx",class="lang-xxx",class="highlight-source-xxx",data-lang, anddata-language - Links:
[text](url)when anchor text is present, autolink form<url>when the anchor is empty, anchor-only (#foo) andjavascript:links are unwrapped to plain text - Images:
, withtitlepreserved when present, falling back todata-srcwhensrcis missing (lazy-loaded images) - Horizontal rules:
---
Features
Clean Content Extraction
Enconvert uses the Readability algorithm (the same library that powers Firefox Reader View) to isolate the main article content from the rest of the page, then applies a second pass of post-processing to produce clean Markdown.
Removed before conversion:
- Navigation (
<nav>), footers (<footer>), asides (<aside>) - Scripts (
<script>,<noscript>), styles (<style>), iframes, forms, buttons - Inline SVG, canvas, and template elements
style,class,id, and allon*event handler attributes
Preserved:
- Headings, paragraphs, lists, tables, blockquotes, code blocks
- Links with their
hrefand anchor text (absolute URLs) - Images with
alt,title, and absolutesrc - Figures and figcaptions (inline images are kept inside these)
Clear Capture Mode
Before extraction, the page is rendered in a real browser and cleaned up the same way as url-to-pdf:
- Cookie consent banners -- Auto-dismissed across the main page and iframes (OneTrust, Cookiebot, Didomi, Usercentrics, and generic banners).
- Modal and popup dismissal -- Overlays closed via Escape key, ARIA close buttons, class-based close buttons, and role-based dialog buttons.
- Scroll animation reveal -- Forces visibility on elements hidden by WOW.js, AOS, ScrollReveal, GSAP ScrollTrigger, and generic animation classes.
- Sticky header handling -- Sticky/fixed headers detected and the page scrolled back to the top so content ordering is preserved.
Absolute URL Resolution
Every relative href and src in the extracted article is resolved against the final page URL (after redirects), so the Markdown output always contains absolute, clickable links — useful for LLM ingestion pipelines that would otherwise see broken relative paths.
Anchor-only links (#section), javascript: links, mailto:, and tel: links are not rewritten. Anchor-only and javascript: links are unwrapped to plain text because they have no meaning outside the original page.
Code Block Language Detection
Code blocks are fenced with a detected language hint where possible:
<pre><code class="language-python">...</code></pre> → ```python
<pre data-lang="js">...</pre> → ```js
<pre><code class="highlight-source-shell">...</code></pre> → ```shell
Classes matching language-*, lang-*, highlight-source-*, and brush:* are recognised, plus data-lang and data-language attributes on both the <pre> and its nested <code>. If no hint is found, the block is fenced without a language label.
HTTP Basic Auth
Pass auth with username and password to convert pages behind HTTP Basic Authentication.
{
"url": "https://staging.example.com/docs/article",
"auth": {
"username": "admin",
"password": "secret"
}
}
Cookie Injection
Inject up to 50 cookies before the page loads. Useful for converting member-only or locale-specific article pages.
{
"url": "https://example.com/members/post",
"cookies": [
{"name": "session_id", "value": "abc123", "domain": "example.com"},
{"name": "locale", "value": "en-US", "domain": "example.com"}
]
}
Custom Headers
Send up to 20 custom HTTP headers with every request to the target page.
{
"url": "https://example.com/api-docs",
"headers": {
"X-Custom-Token": "my-token-value",
"Accept-Language": "en-US"
}
}
Lazy Image Loading
When load_media and enable_scroll are enabled (both default to true), the converter scrolls the page slowly to trigger lazy loaders, then waits for all images to finish loading before capturing the final HTML. This ensures data-src values have been promoted to real src values and the frontmatter images list is complete.
Set load_media=false for faster extraction when you only need the text body — placeholder src values may remain in the output.
Additional Rendering Features
- Viewport unit normalization -- CSS viewport units (
vh,svh,lvh,dvh) are converted to fixed pixel values before extraction. - Stealth mode -- Browser fingerprint masking to avoid bot detection on protected pages.
- Popup interception -- Automatically closes any new browser tabs or popups triggered by the page.
- CSP bypass -- Handles Content Security Policy and Trusted Types restrictions that would otherwise block page manipulation.
Subscription Plan Gating
| Feature | Free | Starter | Pro | Enterprise |
|---|---|---|---|---|
| Basic conversion (single URL, sync) | Yes | Yes | Yes | Yes |
| Viewport and rendering options | Yes | Yes | Yes | Yes |
| Async mode | No | Yes | Yes | Yes |
| Batch processing (multiple URLs) | No | Yes | Yes | Yes |
| ZIP output bundling | No | No | Yes | Yes |
| Webhook callbacks | No | No | Yes | Yes |
| HTTP Basic Auth | No | Yes | Yes | Yes |
| Cookie injection | No | Yes | Yes | Yes |
| Custom headers | No | Yes | Yes | Yes |
| Monthly conversions | 100 | Plan-based | Plan-based | Unlimited |
| Batch size limit | 0 | Plan-based | Plan-based | Unlimited |
| File retention | 1 hour | Plan-based | Plan-based | Plan-based |
Async Mode
Asynchronous mode is useful for long-running conversions or when converting multiple URLs.
How It Works
- Send a request with
async_mode=true(or pass multiple URLs, which enables async automatically). - The API returns HTTP 202 immediately with a
batch_idandurl_count. - Each URL is converted in the background, uploaded to storage, and tracked individually.
- Monitor completion via batch status polling, email notification, or webhook callback.
Email Notification
By default, a completion email is sent to the project owner's email address. Override with notification_email:
{
"url": ["https://example.com/page1", "https://example.com/page2"],
"async_mode": true,
"notification_email": "team@example.com"
}
Webhook Callback
Provide a callback_url to receive an automatic POST notification on completion:
{
"url": ["https://example.com/page1", "https://example.com/page2"],
"async_mode": true,
"callback_url": "https://your-server.com/webhook/enconvert"
}
The webhook is sent with a 30-second timeout and considers HTTP 200, 201, 202, and 204 as successful delivery.
Batch and Bulk Processing
Convert multiple URLs in a single request. Requires async mode and a private key.
Individual Output (default)
Each URL produces a separate Markdown file:
{
"url": [
"https://example.com/post-1",
"https://example.com/post-2",
"https://example.com/post-3"
],
"async_mode": true
}
ZIP Bundle Output
Bundle all Markdown files into a single ZIP archive:
{
"url": [
"https://example.com/post-1",
"https://example.com/post-2",
"https://example.com/post-3"
],
"async_mode": true,
"output_format": true,
"output_filename": "blog-archive"
}
The ZIP file is named {output_filename}_{timestamp}.zip or batch_{timestamp}.zip if no custom name is provided.
Code Examples
Python (Private Key)
import requests
response = requests.post(
"https://api.enconvert.com/v1/convert/url-to-markdown",
headers={"X-API-Key": "sk_live_your_private_key"},
json={
"url": "https://example.com/articles/my-post",
"direct_download": True
}
)
response.raise_for_status()
markdown_text = response.content.decode("utf-8")
print(markdown_text)
PHP (Private Key)
$ch = curl_init("https://api.enconvert.com/v1/convert/url-to-markdown");
curl_setopt_array($ch, [
CURLOPT_POST => true,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HTTPHEADER => [
"Content-Type: application/json",
"X-API-Key: sk_live_your_private_key"
],
CURLOPT_POSTFIELDS => json_encode([
"url" => "https://example.com/articles/my-post"
])
]);
$response = curl_exec($ch);
curl_close($ch);
$data = json_decode($response, true);
echo $data["presigned_url"];
Node.js (Private Key)
const response = await fetch("https://api.enconvert.com/v1/convert/url-to-markdown", {
method: "POST",
headers: {
"Content-Type": "application/json",
"X-API-Key": "sk_live_your_private_key"
},
body: JSON.stringify({
url: "https://example.com/articles/my-post"
})
});
const data = await response.json();
console.log(data.presigned_url);
Go (Private Key)
package main
import (
"bytes"
"encoding/json"
"fmt"
"io"
"net/http"
)
func main() {
body, _ := json.Marshal(map[string]interface{}{
"url": "https://example.com/articles/my-post",
})
req, _ := http.NewRequest("POST", "https://api.enconvert.com/v1/convert/url-to-markdown", bytes.NewBuffer(body))
req.Header.Set("Content-Type", "application/json")
req.Header.Set("X-API-Key", "sk_live_your_private_key")
resp, _ := http.DefaultClient.Do(req)
defer resp.Body.Close()
respBody, _ := io.ReadAll(resp.Body)
fmt.Println(string(respBody))
}
JavaScript -- Browser (Public Key)
// Step 1: Get a JWT token
const tokenRes = await fetch("https://api.enconvert.com/v1/auth/token", {
method: "POST",
headers: { "X-API-Key": "pk_live_your_public_key" }
});
const { token } = await tokenRes.json();
// Step 2: Convert URL to Markdown (public keys receive raw bytes)
const convertRes = await fetch("https://api.enconvert.com/v1/convert/url-to-markdown", {
method: "POST",
headers: {
"Content-Type": "application/json",
"Authorization": `Bearer ${token}`
},
body: JSON.stringify({
url: "https://example.com/articles/my-post"
})
});
const markdown = await convertRes.text();
console.log(markdown);
React (Public Key)
import { useState } from "react";
function UrlToMarkdown() {
const [loading, setLoading] = useState(false);
const [markdown, setMarkdown] = useState("");
async function convertUrl() {
setLoading(true);
try {
// Get JWT token
const tokenRes = await fetch("https://api.enconvert.com/v1/auth/token", {
method: "POST",
headers: { "X-API-Key": "pk_live_your_public_key" }
});
const { token } = await tokenRes.json();
// Convert
const convertRes = await fetch("https://api.enconvert.com/v1/convert/url-to-markdown", {
method: "POST",
headers: {
"Content-Type": "application/json",
"Authorization": `Bearer ${token}`
},
body: JSON.stringify({ url: "https://example.com/articles/my-post" })
});
setMarkdown(await convertRes.text());
} finally {
setLoading(false);
}
}
return (
<div>
<button onClick={convertUrl} disabled={loading}>
{loading ? "Converting..." : "Convert to Markdown"}
</button>
{markdown && <pre>{markdown}</pre>}
</div>
);
}
export default UrlToMarkdown;
Error Responses
| Status | Condition |
|---|---|
400 Bad Request |
Missing or empty url parameter |
400 Bad Request |
output_format=true with a single URL (requires multiple URLs) |
400 Bad Request |
direct_download=true with multiple URLs |
400 Bad Request |
direct_download=true with async_mode=true |
400 Bad Request |
Invalid auth object (missing username or password) |
400 Bad Request |
Invalid cookies (not an array, exceeds 50 entries, missing required fields) |
400 Bad Request |
Invalid headers (not an object, exceeds 20 entries, blocked header names, non-string values) |
400 Bad Request |
Conflicting auth and Authorization custom header |
400 Bad Request |
Public key attempting multiple URLs |
401 Unauthorized |
Missing or invalid API key / JWT token |
402 Payment Required |
Monthly conversion limit reached |
402 Payment Required |
Batch would exceed remaining monthly quota |
402 Payment Required |
Storage limit reached |
403 Forbidden |
Endpoint not in the API key's allowed endpoints |
403 Forbidden |
Feature not available on current plan (async, webhook, ZIP, basic auth) |
403 Forbidden |
Batch size exceeds plan's batch limit |
404 Not Found |
Job ID not found (when polling status) |
500 Internal Server Error |
Conversion failed (browser crash, navigation error, extraction failure) |
Limits
| Limit | Value |
|---|---|
| Page navigation timeout | 60 seconds |
| Per-image load timeout | 5 seconds |
| Cookie banner dismiss timeout | 3 seconds |
| Maximum cookies per request | 50 |
| Maximum custom headers per request | 20 |
| Monthly conversions | Plan-dependent (Free: 100) |
| Batch size | Plan-dependent (Free: disabled) |
| File retention | Plan-dependent (Free: 1 hour) |
| Webhook delivery timeout | 30 seconds |