PDF OCR API

Extract text from PDF documents with OCR. The API handles both native-text PDFs and scanned PDFs with high accuracy.

🚀 Key Features

  • Universal PDF Support - Process both native text and scanned PDFs
  • High Accuracy - Advanced text extraction with layout preservation
  • URL-Based Processing - Process files directly from URLs without uploads
  • Metadata Extraction - Get document metadata including title, author, creation date
  • Credit-Based System - Pay only for successful extractions
  • Fast Processing - Quick text extraction with minimal latency
  • Error Handling - Comprehensive validation and error responses

📋 Endpoint

POST requests only - All PDF extraction requests must use the POST method:

POST https://scrapingapi.qoest.com/v1/pdf

🔑 Authentication

All requests must include your API token in the Authorization header using Bearer authentication:

Authorization: Bearer YOUR_API_TOKEN

📊 Parameters

Parameter | Required | Type | Description
url | Yes | string | URL pointing to the PDF file

URL Format Requirements

  • Must be a valid HTTP/HTTPS URL
  • Must end with .pdf extension
  • File must be publicly accessible
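The endpoint, Authorization header, and url parameter described above can be combined into a single request object. The following is a minimal Python sketch using only the standard library. The function name build_pdf_request is a hypothetical helper, not part of any official client, and the request is only built here, not sent.

```python
import json
import urllib.request

# Endpoint taken from the documentation above.
PDF_ENDPOINT = "https://scrapingapi.qoest.com/v1/pdf"


def build_pdf_request(api_token: str, pdf_url: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request for one PDF URL.

    build_pdf_request is a hypothetical helper name used for illustration.
    """
    # The only documented body parameter is "url".
    body = json.dumps({"url": pdf_url}).encode("utf-8")
    return urllib.request.Request(
        PDF_ENDPOINT,
        data=body,
        method="POST",  # the endpoint accepts POST requests only
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )
```

To send the request, pass the returned object to urllib.request.urlopen and decode the JSON body of the response.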

💰 Pricing

Monthly Subscription Tiers

Plan | Price | Credits | Cost per Credit
Tier 1 | $10/month | 10,000 credits | $0.001
Tier 2 | $50/month | 55,000 credits | $0.0009
Tier 3 | $100/month | 115,000 credits | $0.00087
Tier 4 | $500/month | 600,000 credits | $0.00083
Tier 5 | $1,000/month | 1,250,000 credits | $0.0008

Credit Usage

Feature | Credits Required
PDF Text Extraction | 1 credit per successful extraction

Usage Examples

  • Tier 1 ($10): 10,000 PDF extractions
  • Tier 2 ($50): 55,000 PDF extractions
  • Tier 3 ($100): 115,000 PDF extractions
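Because each successful extraction costs one credit, the cost per credit and the smallest plan that covers a given monthly volume follow directly from the pricing figures above. The sketch below works through that arithmetic. The function names are hypothetical, and the prices and credit counts are copied from this page.

```python
from typing import Optional

# Plan name -> (price in USD per month, credits per month), from the pricing table.
TIERS = {
    "Tier 1": (10, 10_000),
    "Tier 2": (50, 55_000),
    "Tier 3": (100, 115_000),
    "Tier 4": (500, 600_000),
    "Tier 5": (1_000, 1_250_000),
}


def cost_per_credit(price_usd: float, credits: int) -> float:
    """Price of a single credit, i.e. of one successful extraction."""
    return price_usd / credits


def smallest_tier_for(extractions_per_month: int) -> Optional[str]:
    """Return the lowest-priced tier whose credits cover the volume, else None."""
    # TIERS is listed in ascending price order, so the first match is the cheapest.
    for name, (_price, credits) in TIERS.items():
        if credits >= extractions_per_month:
            return name
    return None


for name, (price, credits) in TIERS.items():
    print(f"{name}: ${cost_per_credit(price, credits):.5f} per extraction")
```

For example, a workload of 50,000 extractions per month does not fit in Tier 1 (10,000 credits), so smallest_tier_for returns Tier 2.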

📝 Examples

Basic PDF Text Extraction

curl --location 'https://scrapingapi.qoest.com/v1/pdf' \
--header 'Authorization: Bearer YOUR_TOKEN' \
--header 'Content-Type: application/json' \
--data '{
"url": "https://example.com/document.pdf"
}'

Response:

{
"text": "This is the extracted text from the PDF document...",
"pages": 5,
"metadata": {
"title": "Document Title",
"author": "Document Author",
"creation_date": "2024-01-15"
}
}

Processing Different PDF Types

Native Text PDFs

curl --location 'https://scrapingapi.qoest.com/v1/pdf' \
--header 'Authorization: Bearer YOUR_TOKEN' \
--header 'Content-Type: application/json' \
--data '{
"url": "https://example.com/native-text-document.pdf"
}'

Scanned PDFs

curl --location 'https://scrapingapi.qoest.com/v1/pdf' \
--header 'Authorization: Bearer YOUR_TOKEN' \
--header 'Content-Type: application/json' \
--data '{
"url": "https://example.com/scanned-document.pdf"
}'

Research Papers

curl --location 'https://scrapingapi.qoest.com/v1/pdf' \
--header 'Authorization: Bearer YOUR_TOKEN' \
--header 'Content-Type: application/json' \
--data '{
"url": "https://example.com/research-paper.pdf"
}'

📤 Response Format

Successful Response (200)

{
"text": "Extracted text content from all pages of the PDF document. This includes all readable text from the document with proper formatting and structure preserved where possible.",
"pages": 3,
"metadata": {
"title": "Document Title",
"author": "Author Name",
"creation_date": "2024-01-15",
"page_count": 3,
"file_size": "2.5MB",
"pdf_version": "1.4"
}
}
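A successful response can be parsed into a small typed structure before further use. The sketch below assumes the three top-level fields shown above (text, pages, metadata). PdfResult and parse_pdf_response are hypothetical names, and metadata is kept as a plain dictionary because the examples on this page show differing sets of metadata keys.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class PdfResult:
    """Hypothetical container for a successful extraction response."""
    text: str
    pages: int
    metadata: Dict[str, Any] = field(default_factory=dict)


def parse_pdf_response(raw_body: str) -> PdfResult:
    """Parse the JSON body of a 200 response into a PdfResult."""
    payload = json.loads(raw_body)
    return PdfResult(
        text=payload["text"],
        pages=int(payload["pages"]),
        # Metadata keys vary between examples, so a missing block is tolerated.
        metadata=payload.get("metadata") or {},
    )
```

Individual metadata values such as the title are then read with result.metadata.get("title"), which returns None when the key is absent.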

Error Responses

Validation Error (422)

{
"message": "The given data was invalid.",
"errors": {
"url": [
"The url field is required.",
"The url must be a valid URL.",
"The URL must point to a valid PDF file."
]
}
}

Insufficient Credits (403)

{
"message": "Insufficient credits"
}

Processing Failed (400)

{
"message": "Failed to extract data from PDF URL"
}

Authentication Required (401)

{
"message": "Unauthenticated."
}
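The four error responses above can be distinguished by status code, with the message and any per-field validation errors read from the JSON body. The following is a sketch of one way to do that in client code. PdfApiError, ERROR_KINDS, and classify_error are hypothetical names, and only the status codes listed on this page are mapped.

```python
import json
from typing import Dict, List, Optional

# Status codes and their meanings as listed in the error responses above.
ERROR_KINDS = {
    400: "processing_failed",
    401: "unauthenticated",
    403: "insufficient_credits",
    422: "validation_error",
}


class PdfApiError(Exception):
    """Hypothetical exception type carrying the parsed error details."""

    def __init__(self, status: int, kind: str, message: str,
                 field_errors: Optional[Dict[str, List[str]]] = None):
        super().__init__(f"{status} {kind}: {message}")
        self.status = status
        self.kind = kind
        self.message = message
        self.field_errors = field_errors or {}


def classify_error(status: int, raw_body: str) -> PdfApiError:
    """Turn a non-200 status and response body into a PdfApiError."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        payload = {}  # body was not JSON; fall back to the raw text below
    if not isinstance(payload, dict):
        payload = {}
    kind = ERROR_KINDS.get(status, "unknown")
    message = payload.get("message", raw_body)
    # Only the 422 example includes an "errors" object keyed by field name.
    return PdfApiError(status, kind, message, payload.get("errors"))
```

A caller can then branch on error.kind, for instance topping up credits on insufficient_credits and correcting the URL on validation_error.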

⚠️ Validation Rules

URL Requirements

  • Required field: Must be a valid HTTP/HTTPS URL
  • PDF URLs: Must end with .pdf extension
  • Accessibility: File must be publicly accessible without authentication
  • File size: Recommended maximum 50MB for optimal processing
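The syntactic URL requirements above can be checked locally before a request is sent, which avoids a round trip that would end in a 422 response. The sketch below covers only the scheme and the .pdf ending. Public accessibility and file size cannot be verified without fetching the file, and treating the extension case-insensitively is an assumption, since the page does not say how the service compares it. validate_pdf_url is a hypothetical helper name.

```python
from typing import List
from urllib.parse import urlparse


def validate_pdf_url(url: str) -> List[str]:
    """Return a list of problems found in the URL; an empty list means it passed."""
    problems = []
    parsed = urlparse(url)
    # Rule: must be a valid HTTP/HTTPS URL (a scheme and a host are required).
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("URL must be a valid HTTP/HTTPS URL")
    # Rule: must end with the .pdf extension.
    # Assumption: the comparison is case-insensitive.
    if not url.lower().endswith(".pdf"):
        problems.append("URL must end with .pdf")
    return problems
```

Note that under this check a URL with a query string after the file name, such as one ending in .pdf?download=1, is reported as not ending with .pdf, which matches a literal reading of the rule.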

Credit Requirements

  • Minimum balance: Must have at least 1 credit to process requests
  • Deduction timing: Credits are deducted only after successful processing
  • Failed requests: No credits deducted for failed processing attempts

🚨 Common Issues

  • Invalid URL Format: Ensure URL ends with .pdf and is publicly accessible
  • Insufficient Credits: Check credit balance before making requests
  • File Not Found: Verify the URL is correct and file exists
  • Password Protected PDFs: Remove password protection before processing
  • Large Files: Very large files may time out - consider optimizing file size
  • Authentication: Ensure Bearer token is correctly formatted and valid
  • Corrupted PDFs: Ensure PDF file is not corrupted or damaged

🎯 Use Cases

Academic Research

Extract text from research papers, books, and academic documents.

curl "https://scrapingapi.qoest.com/v1/pdf" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/research-paper.pdf"}'

Legal Document Processing

Process legal documents, contracts, and compliance materials.

curl "https://scrapingapi.qoest.com/v1/pdf" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/legal-document.pdf"}'

Business Document Analysis

Extract text from reports, proposals, and business documents.

curl "https://scrapingapi.qoest.com/v1/pdf" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/business-report.pdf"}'

Content Management

Process documents for content indexing and search functionality.

curl "https://scrapingapi.qoest.com/v1/pdf" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/manual.pdf"}'

Data Migration

Convert legacy PDF documents to searchable text formats.

curl "https://scrapingapi.qoest.com/v1/pdf" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/legacy-document.pdf"}'

📊 Best Practices

PDF Processing Tips

  • Text-based PDFs: Native-text PDFs (not scanned images) give the best results
  • Scanned PDFs: Also supported, but accuracy may be lower
  • File Size: Optimize large PDFs for faster processing
  • Password Protection: Remove password protection before processing
  • Quality: Higher quality scans produce better text extraction results

Performance Optimization

  • Batch Processing: Process multiple files in sequence rather than in parallel
  • Error Handling: Retry failed requests automatically to target 99%+ uptime
  • Credit Monitoring: Monitor credit usage to avoid service interruption
  • URL Validation: Validate URLs before sending requests
  • File Preparation: Ensure PDFs are optimized and accessible
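The sequential processing and retry advice above can be sketched as a small loop. This is an illustration under stated assumptions, not a prescribed client: process_in_sequence is a hypothetical name, the extract argument stands for whatever function actually calls the API for one URL, and the attempt count and exponential backoff delays are arbitrary example values.

```python
import time
from typing import Any, Callable, Dict, Iterable


def process_in_sequence(
    urls: Iterable[str],
    extract: Callable[[str], Any],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Process URLs one at a time, retrying each failed call with backoff.

    Returns a dict mapping each URL to its result, or to the last exception
    raised if every attempt failed.
    """
    results: Dict[str, Any] = {}
    for url in urls:  # strictly one request at a time, never in parallel
        for attempt in range(1, max_attempts + 1):
            try:
                results[url] = extract(url)
                break
            except Exception as exc:
                if attempt == max_attempts:
                    results[url] = exc  # give up and record the failure
                else:
                    # Wait 1x, 2x, 4x ... the base delay before retrying.
                    sleep(base_delay * 2 ** (attempt - 1))
    return results
```

This sketch retries on any exception for brevity. In practice it is likely worth retrying only transient failures, since a validation, authentication, or insufficient-credits error will not succeed on a second attempt.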

Supported PDF Types

  • Native Text PDFs: Best accuracy and fastest processing
  • Scanned PDFs: OCR processing with good accuracy
  • Mixed Content: PDFs with both text and images
  • Multi-page Documents: Full document processing with page count

👤 User Management

Check User Profile

curl --location 'https://scrapingapi.qoest.com/v1/me' \
--header 'Authorization: Bearer YOUR_TOKEN'

Response:

{
"user": {
"id": 1,
"name": "Your Name",
"email": "you@example.com",
"credits": 9850,
"created_at": "2024-01-15T10:00:00.000000Z",
"updated_at": "2024-01-15T10:00:00.000000Z"
}
}
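The credits field in the profile response can be used to check the balance before starting a batch, since each successful extraction consumes one credit. A minimal sketch, assuming the response shape shown above. The function names are hypothetical.

```python
import json


def remaining_credits(profile_body: str) -> int:
    """Read the credit balance from the JSON body of a /v1/me response."""
    return int(json.loads(profile_body)["user"]["credits"])


def can_process(profile_body: str, planned_extractions: int = 1) -> bool:
    """True if the balance covers the planned number of successful extractions."""
    return remaining_credits(profile_body) >= planned_extractions
```

With the example balance of 9850 credits, can_process returns True for a batch of 9,000 files and False for a batch of 10,000.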

Add Credits

curl --location 'https://scrapingapi.qoest.com/v1/add-credits' \
--header 'Authorization: Bearer YOUR_TOKEN' \
--header 'Content-Type: application/json' \
--data '{
"amount": 1000
}'

Response:

{
"message": "Credits added successfully",
"remaining_credits": 10850
}