PDF OCR

PDF OCR Endpoint

POST /v1/pdf

Use this endpoint to extract metadata and per-page text from a PDF hosted at a public URL.

#Pricing

  • 1 credit per page
  • minimum 1 credit
  • maximum 30 credits (30-page limit)

Credits are deducted only for successful requests.
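The pricing rules above can be sketched as a small helper for estimating cost before sending a request. This is an illustration only: the name credits_for is hypothetical, and charging the 1-credit minimum when the page count is zero or undetected is an assumption, not documented behavior.

```python
MAX_PAGES = 30  # page limit stated in the pricing list


def credits_for(page_count: int) -> int:
    """Estimate the credits a successful request costs for a PDF of page_count pages."""
    if page_count > MAX_PAGES:
        # Per the limits on this page, such a request is rejected and costs nothing.
        raise ValueError(f"PDF has {page_count} pages; the limit is {MAX_PAGES}")
    # 1 credit per page, never less than the 1-credit minimum
    # (assumption: the minimum also applies when no pages are detected).
    return max(1, page_count)


print(credits_for(3))   # 3
print(credits_for(30))  # 30
```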

#Request Example

curl -X POST "https://<api-host>/v1/pdf" \
  -H "Authorization: Bearer your-token" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/report.pdf"
  }'
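The same request can be sketched in Python with only the standard library. BASE_URL is a placeholder, since this page shows only the path /v1/pdf, and build_pdf_request is a hypothetical helper name. The sketch builds the request without sending it, so it runs without network access or a real token.

```python
import json
import urllib.request

# Placeholder host (assumption): replace with the real API host.
BASE_URL = "https://api.example.com"


def build_pdf_request(pdf_url: str, token: str) -> urllib.request.Request:
    """Build, but do not send, a POST /v1/pdf request."""
    body = json.dumps({"url": pdf_url}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/v1/pdf",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_pdf_request("https://example.com/report.pdf", "your-token")

# Sending the request would look like this (left commented out on purpose):
# with urllib.request.urlopen(req, timeout=30) as resp:
#     result = json.load(resp)
```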

#Request Body

{
  "url": "https://example.com/report.pdf"
}

#Fields

  • url: required, must be a valid public URL
  • the URL path must end in .pdf
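A client can check these field rules locally before spending a request. The sketch below is a rough approximation: looks_like_pdf_url is a hypothetical name, restricting schemes to http and https is an assumption, and whether the service accepts an uppercase .PDF suffix is not stated here, so the check is case-sensitive. Whether a URL is publicly reachable cannot be verified locally.

```python
from urllib.parse import urlparse


def looks_like_pdf_url(url: str) -> bool:
    """Return True if url has an http(s) scheme, a host, and a path ending in .pdf."""
    parsed = urlparse(url)
    # Assumption: only http and https count as valid public URLs.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Only the path is inspected, so a query string after .pdf is ignored.
    return parsed.path.endswith(".pdf")


print(looks_like_pdf_url("https://example.com/report.pdf"))      # True
print(looks_like_pdf_url("https://example.com/view?f=a.pdf"))    # False
```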

#Response

{
  "metadata": {
    "word_count": 1250,
    "character_count": 7350,
    "page_count": 3,
    "file_size": "180.42 KB",
    "creation_date": "D:20260323000000Z"
  },
  "pages": [
    {
      "page": 1,
      "text": "Extracted text from page 1"
    }
  ]
}
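A response shaped like the one above could be consumed as follows. The helper names full_text and parse_creation_date are hypothetical. The creation_date value resembles a PDF date string (D:YYYYMMDDHHmmSS plus a timezone marker); this sketch handles only the UTC "Z" form shown in the sample and returns None for anything else.

```python
from datetime import datetime, timezone
from typing import Optional

# Sample response copied from this page.
sample = {
    "metadata": {
        "word_count": 1250,
        "character_count": 7350,
        "page_count": 3,
        "file_size": "180.42 KB",
        "creation_date": "D:20260323000000Z",
    },
    "pages": [{"page": 1, "text": "Extracted text from page 1"}],
}


def full_text(response: dict) -> str:
    """Join per-page text in page order, separated by blank lines."""
    pages = sorted(response["pages"], key=lambda p: p["page"])
    return "\n\n".join(p["text"] for p in pages)


def parse_creation_date(value: str) -> Optional[datetime]:
    """Parse the UTC form seen in the sample; return None for other formats."""
    try:
        parsed = datetime.strptime(value, "D:%Y%m%d%H%M%SZ")
    except ValueError:
        return None
    return parsed.replace(tzinfo=timezone.utc)


print(full_text(sample))  # Extracted text from page 1
print(parse_creation_date(sample["metadata"]["creation_date"]))
```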

#Limits

  • PDFs are limited to a maximum of 30 pages
  • PDFs exceeding 30 pages return 400 and do not deduct credits

#Typical Use Cases

  • extract text from reports and whitepapers
  • process multi-page invoices and statements
  • read content from e-books and manuals
  • analyze document structure page by page

#Notes

  • text is returned per page in the pages array
  • metadata includes page count when it can be detected
  • cost is calculated from the detected page count after successful extraction
  • requests for scanned PDFs without embedded text may fail when no text can be extracted

#Errors

  • 422: URL is missing, invalid, or does not point to a .pdf file
  • 403: authenticated user has insufficient credits for the PDF page count
  • 400: PDF could not be downloaded, extracted, or exceeds the 30-page limit
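One way a client might turn these status codes into readable errors is sketched below. PdfApiError and raise_for_pdf_status are hypothetical names, and the shape of the error response body is not documented on this page, so only the status code is used.

```python
# Meanings copied from the error list on this page.
ERROR_MEANINGS = {
    422: "URL is missing, invalid, or does not point to a .pdf file",
    403: "authenticated user has insufficient credits for the PDF page count",
    400: "PDF could not be downloaded, extracted, or exceeds the 30-page limit",
}


class PdfApiError(Exception):
    """Raised for a non-2xx response from POST /v1/pdf."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"{status}: {message}")
        self.status = status


def raise_for_pdf_status(status: int) -> None:
    """Return silently on 2xx; otherwise raise PdfApiError with the documented meaning."""
    if 200 <= status < 300:
        return
    message = ERROR_MEANINGS.get(status, "status not documented for this endpoint")
    raise PdfApiError(status, message)


try:
    raise_for_pdf_status(403)
except PdfApiError as err:
    print(err)  # 403: authenticated user has insufficient credits for the PDF page count
```

Since credits are deducted only for successful requests, a caller can correct the input and retry after any of these errors without being charged for the failed attempt.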