PDF OCR
PDF OCR Endpoint
POST /v1/pdf
Use this endpoint to extract text and metadata from a PDF hosted at a public URL.
#Pricing
- 1 credit per page
- minimum 1 credit
- maximum 30 credits (30-page limit)
Credits are only deducted for successful requests.
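The pricing rules above can be sketched as a small shell helper for estimating cost before you send a request. pdf_credits is a hypothetical name, not part of the API. The sketch assumes you already know the page count, whereas the service bills on the page count it detects after extraction.

```shell
# Hypothetical helper (not part of the API): estimate what a successful
# /v1/pdf request would cost, given a page count you already know.
pdf_credits() {
  pages=$1
  if [ "$pages" -gt 30 ]; then
    # Over the 30-page limit: the request is rejected with 400 and nothing is charged.
    echo 0
    return 1
  fi
  if [ "$pages" -lt 1 ]; then
    # Minimum charge is 1 credit.
    echo 1
  else
    # 1 credit per page.
    echo "$pages"
  fi
}
```

For example, pdf_credits 3 prints 3 and pdf_credits 0 prints 1, while pdf_credits 31 prints 0 and returns a non-zero status because that request would fail.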
#Request Example
curl -X POST /v1/pdf \
-H "Authorization: Bearer your-token" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/report.pdf"
}'
#Request Body
{
  "url": "https://example.com/report.pdf"
}
#Fields
- url (string, required): must be a valid public URL, and the URL path must end in .pdf
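A client can reject obviously invalid input before spending a request. The sketch below mirrors the documented rule that the URL path must end in .pdf. The function name is hypothetical, and several details are assumptions not confirmed by this page: only http and https URLs are accepted, the query string and fragment are not considered part of the path, and the .pdf check is case-sensitive.

```shell
# Hypothetical client-side pre-check for the "url" field.
# Assumptions: http/https only; query string and fragment are ignored;
# the ".pdf" suffix check is case-sensitive.
is_pdf_url() {
  url=$1
  case "$url" in
    http://*|https://*) ;;
    *) return 1 ;;
  esac
  # Drop any fragment, then any query string.
  path=${url%%\#*}
  path=${path%%\?*}
  # Strip the scheme, then everything up to the first "/" (the host).
  rest=${path#*://}
  case "$rest" in
    */*) path="/${rest#*/}" ;;
    *) return 1 ;;   # no path at all, e.g. https://example.com
  esac
  case "$path" in
    *.pdf) return 0 ;;
    *) return 1 ;;
  esac
}
```

With these assumptions, https://example.com/report.pdf?download=1 passes, while https://example.com/report.html and https://example.pdf (no path) are rejected.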
#Response
{
  "metadata": {
    "word_count": 1250,
    "character_count": 7350,
    "page_count": 3,
    "file_size": "180.42 KB",
    "creation_date": "D:20260323000000Z"
  },
  "pages": [
    {
      "page": 1,
      "text": "Extracted text from page 1"
    }
  ]
}
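The creation_date value in the example response appears to use the PDF date string format (D: followed by YYYYMMDDHHmmSS and a time-zone marker). If you need a standard timestamp, a minimal conversion sketch is shown below. The function name is hypothetical, and it assumes the value always carries all 14 digits followed by an optional Z; other forms, such as +05'00' offsets or truncated dates, print nothing.

```shell
# Hypothetical helper: convert a PDF-style date such as "D:20260323000000Z"
# to ISO 8601. Only the full 14-digit form with an optional trailing "Z"
# is handled; anything else produces no output.
pdf_date_to_iso() {
  printf '%s\n' "$1" | sed -nE \
    's/^D:([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})(Z?)$/\1-\2-\3T\4:\5:\6\7/p'
}
```

For the example response, pdf_date_to_iso "D:20260323000000Z" prints 2026-03-23T00:00:00Z.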
#Limits
- PDFs are limited to a maximum of 30 pages
- PDFs exceeding 30 pages return a 400 error and do not deduct credits
#Typical Use Cases
- extract text from reports and whitepapers
- process multi-page invoices and statements
- read content from e-books and manuals
- analyze document structure page by page
#Notes
- text is returned per page in the pages array
- metadata includes the page count when it can be detected
- cost is calculated from the detected page count after successful extraction
- scanned PDFs without embedded text may fail if text cannot be extracted
#Errors
- 422: the URL is missing, invalid, or does not point to a .pdf file
- 403: the authenticated user has insufficient credits for the PDF's page count
- 400: the PDF could not be downloaded or extracted, or it exceeds the 30-page limit
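The error codes above can be handled by checking the HTTP status that curl reports. The sketch below is illustrative only: describe_pdf_status and ocr_pdf are hypothetical names, BASE_URL and API_TOKEN are placeholder variables (the base URL is not given on this page), and treating any 2xx status as success is an assumption, since only the error codes are documented. curl's -w '%{http_code}' prints the response status, and -o saves the body to a file.

```shell
# Map a status code from /v1/pdf to a short explanation.
# Assumption: any 2xx status means success (not stated in the docs).
describe_pdf_status() {
  case "$1" in
    2??) echo "ok" ;;
    400) echo "PDF could not be downloaded or extracted, or exceeds the 30-page limit" ;;
    403) echo "insufficient credits for the PDF page count" ;;
    422) echo "URL is missing, invalid, or does not point to a .pdf file" ;;
    *)   echo "unexpected HTTP status $1" ;;
  esac
}

# POST a PDF URL, save the body to response.json, and report the outcome.
# BASE_URL and API_TOKEN are hypothetical placeholders. The URL is inserted
# into the JSON body as-is, so it must not contain double quotes or backslashes.
ocr_pdf() {
  status=$(curl -s -o response.json -w '%{http_code}' -X POST "$BASE_URL/v1/pdf" \
    -H "Authorization: Bearer $API_TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"url\": \"$1\"}")
  echo "$status: $(describe_pdf_status "$status")"
  case "$status" in
    2??) return 0 ;;
    *)   return 1 ;;
  esac
}
```

A call such as ocr_pdf "https://example.com/report.pdf" then returns non-zero for any of the documented errors, which makes it easy to use in scripts with set -e or if/else checks.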