April 16, 2026 · 3 min read

Bulk OCR thousands of PDFs with curl and the XRPpdf API

A step-by-step guide to batch-processing thousands of scanned PDFs into searchable documents using curl, bash, and the XRPpdf API — with webhooks, error handling, and rate-limit awareness.

api how-to engineering

You have a folder full of scanned PDFs. Maybe hundreds. Maybe thousands. You need every one of them searchable. Here's how to do it with curl, a few lines of bash, and the XRPpdf API.

Prerequisites

An XRP wallet linked at xrppdf.com
Page credits funded (Scale tier: 30 XRP → 10,000 pages at $0.0043/page)
An API key from your dashboard
curl and jq installed

Step 1: Submit all files

#!/usr/bin/env bash
set -euo pipefail
shopt -s nullglob   # an empty folder yields zero iterations, not a literal "*.pdf"

API_KEY="xrpocr_live_YOUR_KEY_HERE"
INPUT_DIR="./scanned-pdfs"
LOG_FILE="./ocr-jobs.log"

# Clear previous log
: > "$LOG_FILE"

for pdf in "$INPUT_DIR"/*.pdf; do
  filename=$(basename "$pdf")

  # "|| true" keeps one network failure from aborting the whole batch under set -e
  response=$(curl -s -X POST https://xrppdf.com/api/v1/ocr \
    -H "Authorization: Bearer $API_KEY" \
    -F "file=@$pdf" || true)

  # A non-JSON body (for example an HTML error page) leaves job_id empty
  job_id=$(echo "$response" | jq -r '.job_id // empty' 2>/dev/null || true)

  if [[ -n "$job_id" ]]; then
    echo "$filename $job_id" >> "$LOG_FILE"
    echo "✓ Submitted: $filename → $job_id"
  else
    error=$(echo "$response" | jq -r '.error // "unknown error"' 2>/dev/null || true)
    echo "✗ Failed: $filename: ${error:-no valid response}" >&2
  fi
done

echo "Done. $(wc -l < "$LOG_FILE") jobs submitted."

This loops through every PDF in ./scanned-pdfs/, submits each one, and logs the filename-to-job-ID mapping.

Step 2: Poll for results and download

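#!/usr/bin/env bash
set -euo pipefail

API_KEY="xrpocr_live_YOUR_KEY_HERE"
LOG_FILE="./ocr-jobs.log"
OUTPUT_DIR="./searchable-pdfs"
mkdir -p "$OUTPUT_DIR"

while IFS= read -r line; do
  [[ -n "$line" ]] || continue
  # The job ID is the last space-separated field, so filenames may contain spaces
  job_id=${line##* }
  filename=${line% *}
  echo -n "Checking $filename ($job_id)... "

  # Poll until done (60 attempts x 5 s = max 5 minutes per job)
  finished=false
  for attempt in $(seq 1 60); do
    status_json=$(curl -s \
      -H "Authorization: Bearer $API_KEY" \
      "https://xrppdf.com/api/v1/jobs/$job_id" || true)

    status=$(echo "$status_json" | jq -r '.status // "unknown"' 2>/dev/null || echo "unknown")

    if [[ "$status" == "complete" ]]; then
      # -f makes curl fail on an HTTP error instead of saving the error body as a PDF
      if curl -sf -o "$OUTPUT_DIR/$filename" \
        -H "Authorization: Bearer $API_KEY" \
        "https://xrppdf.com/download/$job_id"; then
        echo "✓ Downloaded"
      else
        echo "✗ Download failed" >&2
      fi
      finished=true
      break
    elif [[ "$status" == "error" ]]; then
      echo "✗ Error: $(echo "$status_json" | jq -r '.error')" >&2
      finished=true
      break
    else
      sleep 5
    fi
  done

  [[ "$finished" == true ]] || echo "✗ Timed out after 5 minutes" >&2
done < "$LOG_FILE"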

echo "Done. Results in $OUTPUT_DIR/"
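If you would rather drive the polling from Python, the same loop can be factored into a small helper. This is only a sketch: fetch_status is a hypothetical callable you supply (for example, a thin wrapper around a GET to /api/v1/jobs/<job_id>), not part of any XRPpdf client, and the terminal statuses are the two that the bash script checks for. Any other status value is treated as "still pending".

```python
import time
from typing import Callable

def poll_until_done(fetch_status: Callable[[], dict],
                    max_attempts: int = 60,
                    interval_seconds: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status() until the job is complete or errored.

    Returns the final status payload, or raises TimeoutError if no terminal
    status is seen within max_attempts polls.
    """
    for attempt in range(max_attempts):
        payload = fetch_status()
        if payload.get("status") in ("complete", "error"):
            return payload
        # No point sleeping after the last attempt; we are about to give up.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"job still pending after {max_attempts} polls")
```

Passing sleep in as a parameter keeps the helper easy to test without waiting on a real clock.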

Better approach: use webhooks

Polling works, but webhooks are cleaner — especially for large batches. Instead of looping and sleeping, let XRPpdf call you when each job finishes.

1. Register a webhook

curl -X POST https://xrppdf.com/api/webhooks \
  -H "Authorization: Bearer $SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-server.com/ocr-callback"}'

You'll get back a secret (shown once). Save it.

2. Receive callbacks

Every completed job sends a POST to your URL:

{
  "event": "job.complete",
  "job_id": "abc123",
  "status": "complete",
  "pages": 12,
  "processing_seconds": 8.4
}

Headers include an HMAC signature for verification:

X-XRPOCR-Signature: sha256=<hex>
X-XRPOCR-Timestamp: 1713456789
X-XRPOCR-Job-Id: abc123

3. Verify the signature

import hmac, hashlib

def verify_webhook(secret: str, timestamp: str, body: bytes,
                   signature: str) -> bool:
    expected = hmac.new(
        secret.encode(),
        f"{timestamp}.".encode() + body,
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(f"sha256={expected}", signature)
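Because the timestamp is part of the signed message, a receiver can also reject old deliveries to limit replay. The sketch below shows one way to do that. The 300-second tolerance is an assumption chosen for illustration, not a documented XRPpdf value, and sign and is_fresh_and_valid are hypothetical helper names. The sign helper rebuilds the header value the same way verify_webhook does, which is handy for local tests.

```python
import hashlib
import hmac
import time
from typing import Optional

def sign(secret: str, timestamp: str, body: bytes) -> str:
    """Build a header value in the "sha256=<hex>" form that verify_webhook expects."""
    digest = hmac.new(secret.encode(), f"{timestamp}.".encode() + body,
                      hashlib.sha256).hexdigest()
    return f"sha256={digest}"

def is_fresh_and_valid(secret: str, timestamp: str, body: bytes, signature: str,
                       tolerance_seconds: int = 300,
                       now: Optional[float] = None) -> bool:
    """Check the HMAC first, then reject timestamps outside the tolerance window."""
    if not hmac.compare_digest(sign(secret, timestamp, body), signature):
        return False
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False  # a non-numeric timestamp cannot be checked for freshness
    current = time.time() if now is None else now
    return abs(current - sent_at) <= tolerance_seconds
```

Verify against the raw request bytes, not a re-serialized JSON object, since any change in whitespace or key order changes the digest.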

4. Download on callback

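import os
import requests

def handle_callback(job_id: str, api_key: str, output_dir: str):
    r = requests.get(
        f"https://xrppdf.com/download/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=60,
    )
    # Fail loudly rather than saving an HTTP error body as a PDF
    r.raise_for_status()
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, f"{job_id}.pdf"), "wb") as f:
        f.write(r.content)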
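The callback payload carries only the job ID, so files saved this way lose their original names. If you kept the ocr-jobs.log file from Step 1, you can map job IDs back to filenames. A minimal sketch, assuming the "filename job_id" line format that the submit script writes; load_job_map and output_name are hypothetical helper names:

```python
from typing import Dict, Iterable

def load_job_map(lines: Iterable[str]) -> Dict[str, str]:
    """Parse "filename job_id" lines into {job_id: filename}."""
    mapping: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        # Split on the last space so filenames containing spaces stay intact.
        filename, _, job_id = line.rpartition(" ")
        if filename and job_id:
            mapping[job_id] = filename
    return mapping

def output_name(job_id: str, job_map: Dict[str, str]) -> str:
    """Prefer the original filename; fall back to "<job_id>.pdf" for unknown jobs."""
    return job_map.get(job_id, f"{job_id}.pdf")
```

In practice you would load the map once at startup, for example with open("ocr-jobs.log") as f: job_map = load_job_map(f), and call output_name inside the callback handler.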

Rate limits and concurrency

Tier	Concurrent jobs	Notes
Default	2	Good for casual use
Pro (8 XRP)	Higher limit	Contact for adjustment
Scale (30 XRP)	Up to 50	Built for batch workflows

If you hit the concurrency limit, the API returns HTTP 429 with {"error": "...", "in_flight": 2, "limit": 2}. Back off and retry.
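"Back off and retry" usually means exponential backoff with a little jitter. Here is a minimal sketch of that pattern. The send callable is a hypothetical stand-in for whatever function performs your upload and returns an (HTTP status, parsed JSON) pair, and the retry count and delays are illustrative defaults rather than values recommended by XRPpdf.

```python
import random
import time
from typing import Callable, Tuple

def submit_with_backoff(send: Callable[[], Tuple[int, dict]],
                        max_retries: int = 5,
                        base_delay: float = 1.0,
                        max_delay: float = 30.0,
                        sleep: Callable[[float], None] = time.sleep) -> Tuple[int, dict]:
    """Retry send() while it returns HTTP 429, doubling the delay each time."""
    attempt = 0
    while True:
        status, payload = send()
        # Anything other than 429 (success or a different error) is returned as-is.
        if status != 429 or attempt >= max_retries:
            return status, payload
        delay = min(max_delay, base_delay * (2 ** attempt))
        # Jitter spreads retries out so parallel workers do not retry in lockstep.
        sleep(delay + random.uniform(0, delay / 2))
        attempt += 1
```

If the last retry still returns 429, the caller gets that 429 back and can decide whether to queue the file for later.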

A simple throttle for the submit script:

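# Assumes submit_job wraps the curl call from Step 1; "wait -n" needs bash 4.3+.
# Keep MAX_CONCURRENT at or below your tier's limit.
MAX_CONCURRENT=10
active=0

for pdf in "$INPUT_DIR"/*.pdf; do
  submit_job "$pdf" &
  active=$((active + 1))   # ((active++)) returns non-zero at 0 and trips set -e

  if ((active >= MAX_CONCURRENT)); then
    wait -n || true        # one failed upload should not abort the batch
    active=$((active - 1))
  fi
done
wait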

Idempotency keys

If your network is unreliable, add an idempotency key to prevent double-processing:

curl -X POST https://xrppdf.com/api/v1/ocr \
  -H "Authorization: Bearer $API_KEY" \
  -H "Idempotency-Key: batch-2026-04-18-invoice-0042" \
  -F "file=@$pdf"

Same key within 24 hours = same response replayed. No duplicate charges.
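For a batch, the key needs to be identical when you retry the same file and different for every other file. One way to get that without tracking sequence numbers is to derive the key from a batch label plus a hash of the file contents. This is a sketch under that assumption. The key format here is made up for illustration, and idempotency_key is a hypothetical helper.

```python
import hashlib

def idempotency_key(batch_label: str, file_bytes: bytes) -> str:
    """Derive a key that is identical for retries of the same file in the same batch."""
    # 16 hex characters (64 bits) of SHA-256 is plenty to tell files in one batch apart.
    digest = hashlib.sha256(file_bytes).hexdigest()[:16]
    return f"{batch_label}-{digest}"
```

One consequence of hashing contents: two byte-identical PDFs in the same batch share a key, so the second submission would get the first one's response replayed. Include the filename in the label if you need them processed separately.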

Cost at scale

Live XRP/RLUSD feed: about $1.42 per XRP (the USD column below appears to use the unrounded rate, roughly $1.4167).

Pages	Tier	XRP cost	Approx USD
100	100-bundle	2 XRP	$2.83
1,000	Pro	8 XRP	$11.33
10,000	Scale	30 XRP	$42.50
50,000	5× Scale	150 XRP	$212.50

Credits never expire. Buy once, use over weeks or months.
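To estimate a batch that is not in the table, the arithmetic is simple enough to script. The sketch below uses only the Scale bundle size and price from the table and assumes you buy whole Scale bundles. It takes the exchange rate as a parameter because the feed is live, so results can differ from the table by a few cents. scale_cost is a hypothetical helper.

```python
import math

SCALE_PAGES = 10_000   # pages per Scale bundle, from the table above
SCALE_XRP = 30         # XRP per Scale bundle, from the table above

def scale_cost(pages: int, usd_per_xrp: float) -> dict:
    """Estimate Scale-tier cost for a page count, buying whole bundles only."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    bundles = math.ceil(pages / SCALE_PAGES)
    xrp = bundles * SCALE_XRP
    usd = xrp * usd_per_xrp
    return {
        "bundles": bundles,
        "xrp": xrp,
        "usd": round(usd, 2),
        # Effective price per purchased page, not per page actually used.
        "usd_per_page": round(usd / (bundles * SCALE_PAGES), 4),
    }
```

For example, scale_cost(12_500, 1.42) rounds up to two bundles (60 XRP), which leaves 7,500 pages of credit for later use.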

Full API docs

Everything above — endpoints, auth, webhooks, idempotency, error codes — is documented at xrppdf.com/docs.

Ready to batch-process? Get an API key: link a wallet, fund credits, start calling.
