PDF OCR API

Extract text from PDF documents using OCR. The API supports both native-text PDFs and scanned PDFs, with the best accuracy on native-text files.

🚀 Key Features

Universal PDF Support - Process both native text and scanned PDFs
High Accuracy - Advanced text extraction with layout preservation
URL-Based Processing - Process files directly from URLs without uploads
Metadata Extraction - Get document metadata including title, author, creation date
Credit-Based System - Pay only for successful extractions
Fast Processing - Quick text extraction with minimal latency
Error Handling - Comprehensive validation and error responses

📋 Endpoint

POST requests only - All PDF extraction requests must use the POST method:

POST https://scrapingapi.qoest.com/v1/pdf

🔑 Authentication

All requests must include your API token in the Authorization header using Bearer authentication:

Authorization: Bearer YOUR_TOKEN

📊 Parameters

Parameter	Required	Type	Description
`url`	Yes	string	URL pointing to PDF file

URL Format Requirements

Must be a valid HTTP/HTTPS URL
Must end with .pdf extension
File must be publicly accessible
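
The format rules above can be pre-checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name `is_acceptable_pdf_url` is hypothetical, and treating the `.pdf` suffix as case-insensitive is an assumption. Public accessibility cannot be verified locally, so it is not checked here.

```python
from urllib.parse import urlparse


def is_acceptable_pdf_url(url: str) -> bool:
    """Client-side pre-check of the documented URL format rules.

    Hypothetical helper: it checks the scheme, the host, and the .pdf
    suffix only. It cannot tell whether the file is publicly accessible.
    """
    parsed = urlparse(url)
    # Rule 1: must be a valid HTTP/HTTPS URL with a host part.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Rule 2: must end with the .pdf extension.
    # Assumption: the comparison is case-insensitive.
    return url.lower().endswith(".pdf")
```

A URL that fails this check would most likely be rejected by the API with a validation error, so skipping the request saves a round trip.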

💰 Pricing

Monthly Subscription Tiers

Plan	Price	Credits	Cost per Credit
Tier 1	$10/month	10,000 credits	$0.001
Tier 2	$50/month	55,000 credits	$0.0009
Tier 3	$100/month	115,000 credits	$0.00087
Tier 4	$500/month	600,000 credits	$0.00083
Tier 5	$1,000/month	1,250,000 credits	$0.0008
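
The "Cost per Credit" column is the monthly price divided by the credits included in the plan. The short sketch below reproduces that arithmetic from the table's own figures; the names `TIERS` and `cost_per_credit` are illustrative only.

```python
# Prices (USD per month) and credit allowances copied from the tier table.
TIERS = {
    "Tier 1": (10, 10_000),
    "Tier 2": (50, 55_000),
    "Tier 3": (100, 115_000),
    "Tier 4": (500, 600_000),
    "Tier 5": (1_000, 1_250_000),
}


def cost_per_credit(price_usd: float, credits: int) -> float:
    """Return the price of a single credit for a given plan."""
    return price_usd / credits


for name, (price, credits) in TIERS.items():
    # Five decimal places are enough to compare the tiers.
    print(f"{name}: ${cost_per_credit(price, credits):.5f} per credit")
```

The table rounds these values, so small differences in the last digit are expected.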

Credit Usage

Feature	Credits Required
PDF Text Extraction	1 credit per successful extraction

Usage Examples

Tier 1 ($10): 10,000 PDF extractions
Tier 2 ($50): 55,000 PDF extractions
Tier 3 ($100): 115,000 PDF extractions
Tier 4 ($500): 600,000 PDF extractions
Tier 5 ($1,000): 1,250,000 PDF extractions

📝 Examples

Basic PDF Text Extraction

curl --location 'https://scrapingapi.qoest.com/v1/pdf' \
--header 'Authorization: Bearer YOUR_TOKEN' \
--header 'Content-Type: application/json' \
--data '{
    "url": "https://example.com/document.pdf"
}'

Response:

{
    "text": "This is the extracted text from the PDF document...",
    "pages": 5,
    "metadata": {
        "title": "Document Title",
        "author": "Document Author",
        "creation_date": "2024-01-15"
    }
}
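
The same request can be sent from Python using only the standard library. This is a sketch under stated assumptions rather than an official client: the helper names `build_pdf_request` and `extract_pdf_text` and the 60-second timeout are hypothetical, while the endpoint, headers, and JSON body match the curl example above.

```python
import json
import urllib.request

API_URL = "https://scrapingapi.qoest.com/v1/pdf"


def build_pdf_request(pdf_url: str, token: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"url": pdf_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_pdf_text(pdf_url: str, token: str, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON response.

    Assumption: a 60 second timeout is a reasonable default; the API
    documentation does not state one.
    """
    request = build_pdf_request(pdf_url, token)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `extract_pdf_text("https://example.com/document.pdf", "YOUR_TOKEN")` should return a dictionary with the `text`, `pages`, and `metadata` keys shown in the response above.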

Processing Different PDF Types

Native Text PDFs

curl --location 'https://scrapingapi.qoest.com/v1/pdf' \
--header 'Authorization: Bearer YOUR_TOKEN' \
--header 'Content-Type: application/json' \
--data '{
    "url": "https://example.com/native-text-document.pdf"
}'

Scanned PDFs

curl --location 'https://scrapingapi.qoest.com/v1/pdf' \
--header 'Authorization: Bearer YOUR_TOKEN' \
--header 'Content-Type: application/json' \
--data '{
    "url": "https://example.com/scanned-document.pdf"
}'

Research Papers

curl --location 'https://scrapingapi.qoest.com/v1/pdf' \
--header 'Authorization: Bearer YOUR_TOKEN' \
--header 'Content-Type: application/json' \
--data '{
    "url": "https://example.com/research-paper.pdf"
}'

📤 Response Format

Successful Response (200)

{
    "text": "Extracted text content from all pages of the PDF document. This includes all readable text from the document with proper formatting and structure preserved where possible.",
    "pages": 3,
    "metadata": {
        "title": "Document Title",
        "author": "Author Name",
        "creation_date": "2024-01-15",
        "page_count": 3,
        "file_size": "2.5MB",
        "pdf_version": "1.4"
    }
}
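
A successful response can be read into a small typed structure. The sketch below is illustrative: `PdfResult` and `parse_pdf_response` are hypothetical names, and it assumes that individual metadata keys may be missing for some documents, which the documentation does not confirm either way.

```python
import json
from dataclasses import dataclass, field


@dataclass
class PdfResult:
    text: str
    pages: int
    metadata: dict = field(default_factory=dict)


def parse_pdf_response(raw: str) -> PdfResult:
    """Parse the JSON body of a 200 response."""
    payload = json.loads(raw)
    return PdfResult(
        text=payload["text"],
        pages=payload["pages"],
        # Assumption: metadata keys can be absent, so the object is kept
        # as a plain dict and read with .get() by the caller.
        metadata=payload.get("metadata") or {},
    )


sample = (
    '{"text": "Hello", "pages": 3, '
    '"metadata": {"title": "Document Title", "page_count": 3}}'
)
result = parse_pdf_response(sample)
print(result.pages, result.metadata.get("title"), result.metadata.get("author"))
```

Reading metadata through `.get()` keeps the caller from failing when a PDF carries no title or author.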

Error Responses

Validation Error (422)

{
    "message": "The given data was invalid.",
    "errors": {
        "url": [
            "The url field is required.",
            "The url must be a valid URL.",
            "The URL must point to a valid PDF file."
        ]
    }
}

Insufficient Credits (403)

{
    "message": "Insufficient credits"
}

Processing Failed (400)

{
    "message": "Failed to extract data from PDF URL"
}

Authentication Required (401)

{
    "message": "Unauthenticated."
}
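
The four error responses above can be mapped to a single exception type so that calling code branches on a name instead of a raw status code. This is a minimal sketch: `PdfApiError`, `ERROR_KINDS`, and `error_from_response` are hypothetical names, and the labels in `ERROR_KINDS` are invented for illustration. Only the status codes and JSON shapes come from the documentation.

```python
import json

# Status codes as documented; the label strings are illustrative.
ERROR_KINDS = {
    400: "processing_failed",
    401: "unauthenticated",
    403: "insufficient_credits",
    422: "validation_error",
}


class PdfApiError(Exception):
    def __init__(self, status, kind, message, field_errors=None):
        super().__init__(f"{status} {kind}: {message}")
        self.status = status
        self.kind = kind
        self.message = message
        # Only 422 responses carry per-field errors.
        self.field_errors = field_errors or {}


def error_from_response(status: int, body: str) -> PdfApiError:
    """Turn an error status and its response body into a PdfApiError."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return PdfApiError(
        status=status,
        kind=ERROR_KINDS.get(status, "unknown"),
        message=payload.get("message", ""),
        field_errors=payload.get("errors"),
    )
```

With `urllib`, the status and body would come from a caught `urllib.error.HTTPError` (`exc.code` and `exc.read()`).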

⚠️ Validation Rules

URL Requirements

Required field: Must be a valid HTTP/HTTPS URL
PDF URLs: Must end with .pdf extension
Accessibility: File must be publicly accessible without authentication
File size: Recommended maximum 50MB for optimal processing

Credit Requirements

Minimum balance: Must have at least 1 credit to process requests
Deduction timing: Credits are deducted only after successful processing
Failed requests: No credits deducted for failed processing attempts

🚨 Common Issues

Invalid URL Format: Ensure URL ends with .pdf and is publicly accessible
Insufficient Credits: Check credit balance before making requests
File Not Found: Verify the URL is correct and file exists
Password Protected PDFs: Remove password protection before processing
Large Files: Very large files may timeout - consider optimizing file size
Authentication: Ensure Bearer token is correctly formatted and valid
Corrupted PDFs: Ensure PDF file is not corrupted or damaged

🎯 Use Cases

Academic Research

Extract text from research papers, books, and academic documents.

curl "https://scrapingapi.qoest.com/v1/pdf" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/research-paper.pdf"}'

Legal Document Processing

Process legal documents, contracts, and compliance materials.

curl "https://scrapingapi.qoest.com/v1/pdf" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/legal-document.pdf"}'

Business Document Analysis

Extract text from reports, proposals, and business documents.

curl "https://scrapingapi.qoest.com/v1/pdf" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/business-report.pdf"}'

Content Management

Process documents for content indexing and search functionality.

curl "https://scrapingapi.qoest.com/v1/pdf" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/manual.pdf"}'

Data Migration

Convert legacy PDF documents to searchable text formats.

curl "https://scrapingapi.qoest.com/v1/pdf" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/legacy-document.pdf"}'

📊 Best Practices

PDF Processing Tips

Text-based PDFs: Native-text PDFs (not scanned images) give the best results
Scanned PDFs: Supported, but accuracy may be lower than for native-text PDFs
File Size: Optimize large PDFs for faster processing
Password Protection: Remove password protection before processing
Quality: Higher quality scans produce better text extraction results

Performance Optimization

Batch Processing: Process multiple files in sequence rather than in parallel
Error Handling: Retry failed requests automatically to achieve 99%+ uptime
Credit Monitoring: Monitor credit usage to avoid service interruption
URL Validation: Validate URLs before sending requests
File Preparation: Ensure PDFs are optimized and accessible
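
The batch-processing and retry advice above can be sketched as a small loop. Everything here is an assumption made for illustration: the function name `process_sequentially`, the limit of three attempts, and the exponential backoff delays are not specified by the API. The `fetch` argument stands for any function that sends one extraction request and raises on failure.

```python
import time
from typing import Callable, Dict, Iterable


def process_sequentially(
    urls: Iterable[str],
    fetch: Callable[[str], dict],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, dict]:
    """Process URLs one at a time, retrying each failed request."""
    results: Dict[str, dict] = {}
    for url in urls:
        for attempt in range(1, max_attempts + 1):
            try:
                results[url] = {"ok": True, "data": fetch(url)}
                break
            except Exception as exc:
                if attempt == max_attempts:
                    # Give up on this URL and move on to the next one.
                    results[url] = {"ok": False, "error": str(exc)}
                else:
                    # Exponential backoff: base_delay, then 2x, 4x, ...
                    sleep(base_delay * 2 ** (attempt - 1))
    return results
```

For brevity this retries on every exception. A real client would probably retry only transient failures, since a 401 or 422 response will fail again with the same input.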

Supported PDF Types

Native Text PDFs: Best accuracy and fastest processing
Scanned PDFs: OCR processing with good accuracy
Mixed Content: PDFs with both text and images
Multi-page Documents: Full document processing with page count

👤 User Management

Check User Profile

curl --location 'https://scrapingapi.qoest.com/v1/me' \
--header 'Authorization: Bearer YOUR_TOKEN'

Response:

{
    "user": {
        "id": 1,
        "name": "Your Name",
        "email": "you@example.com",
        "credits": 9850,
        "created_at": "2024-01-15T10:00:00.000000Z",
        "updated_at": "2024-01-15T10:00:00.000000Z"
    }
}
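
Because each successful extraction costs 1 credit, the `credits` field of this response can be used to decide whether a planned batch fits the current balance. The helpers below are a hypothetical sketch that works on the response body as a string. They assume the response always has the `user.credits` shape shown above.

```python
import json


def remaining_credits(profile_body: str) -> int:
    """Read the credit balance from a /v1/me response body."""
    payload = json.loads(profile_body)
    return int(payload["user"]["credits"])


def can_process(profile_body: str, planned_requests: int = 1) -> bool:
    """Return True if the balance covers the planned extractions.

    Each successful extraction costs 1 credit, so the balance must be
    at least the number of planned requests.
    """
    return remaining_credits(profile_body) >= planned_requests
```

Checking the balance before a large batch avoids a run of 403 responses partway through.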

Add Credits

curl --location 'https://scrapingapi.qoest.com/v1/add-credits' \
--header 'Authorization: Bearer YOUR_TOKEN' \
--header 'Content-Type: application/json' \
--data '{
    "amount": 1000
}'

Response:

{
    "message": "Credits added successfully",
    "remaining_credits": 10850
}

📚 Related APIs

Image OCR - Extract text from images
Web Scraping - Extract website data
Google Search - Extract search results
