You have a folder full of scanned PDFs. Maybe hundreds. Maybe thousands.
You need every one of them searchable. Here's how to do it with curl,
a few lines of bash, and the XRPpdf API.
Prerequisites
- An XRP wallet linked at xrppdf.com
- Page credits funded (Scale tier: 30 XRP → 10,000 pages at $0.0043/page)
- An API key from your dashboard
- curl and jq installed
Step 1: Submit all files
#!/usr/bin/env bash
set -euo pipefail
shopt -s nullglob   # an empty folder yields zero iterations, not a literal "*.pdf"

API_KEY="xrpocr_live_YOUR_KEY_HERE"
INPUT_DIR="./scanned-pdfs"
LOG_FILE="./ocr-jobs.log"

# Clear previous log
: > "$LOG_FILE"

for pdf in "$INPUT_DIR"/*.pdf; do
  filename=$(basename "$pdf")

  # Upload the file; a successful response is expected to carry a job_id
  response=$(curl -s -X POST https://xrppdf.com/api/v1/ocr \
    -H "Authorization: Bearer $API_KEY" \
    -F "file=@$pdf")
  job_id=$(echo "$response" | jq -r '.job_id // empty')

  if [[ -n "$job_id" ]]; then
    echo "$filename $job_id" >> "$LOG_FILE"
    echo "✓ Submitted: $filename → $job_id"
  else
    error=$(echo "$response" | jq -r '.error // "unknown error"')
    echo "✗ Failed: $filename — $error" >&2
  fi
done

echo "Done. $(wc -l < "$LOG_FILE") jobs submitted."
This loops through every PDF in ./scanned-pdfs/, submits each one, and
logs the filename-to-job-ID mapping.
Step 2: Poll for results and download
#!/usr/bin/env bash
set -euo pipefail

API_KEY="xrpocr_live_YOUR_KEY_HERE"
LOG_FILE="./ocr-jobs.log"
OUTPUT_DIR="./searchable-pdfs"

mkdir -p "$OUTPUT_DIR"

while IFS= read -r line; do
  # Each log line is "<filename> <job_id>"; split on the last space so
  # filenames that contain spaces stay intact.
  filename=${line% *}
  job_id=${line##* }
  echo -n "Checking $filename ($job_id)... "

  # Poll until done (60 attempts x 5 s = max 5 minutes per job)
  status="pending"
  for attempt in $(seq 1 60); do
    status_json=$(curl -s \
      -H "Authorization: Bearer $API_KEY" \
      "https://xrppdf.com/api/v1/jobs/$job_id")
    status=$(echo "$status_json" | jq -r '.status')

    if [[ "$status" == "complete" ]]; then
      curl -s -o "$OUTPUT_DIR/$filename" \
        -H "Authorization: Bearer $API_KEY" \
        "https://xrppdf.com/download/$job_id"
      echo "✓ Downloaded"
      break
    elif [[ "$status" == "error" ]]; then
      echo "✗ Error: $(echo "$status_json" | jq -r '.error')" >&2
      break
    else
      sleep 5
    fi
  done

  # Report jobs that never reached a final state within the time limit
  if [[ "$status" != "complete" && "$status" != "error" ]]; then
    echo "✗ Timed out (last status: $status)" >&2
  fi
done < "$LOG_FILE"

echo "Done. Results in $OUTPUT_DIR/"
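
If you would rather drive the polling from Python, the loop above can be sketched as a small function. This is a minimal sketch, not an official client: it assumes the status endpoint returns JSON with a `status` field that ends up as `complete` or `error`, as in the script. `fetch_status` is a hypothetical callable you supply (for example, a thin wrapper around an HTTP GET), injected so the loop can be tested without network access.

```python
import time
from typing import Callable, Dict


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], Dict],
    max_attempts: int = 60,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the job is complete or failed, or raise TimeoutError.

    fetch_status is a hypothetical callable returning the parsed JSON
    from GET /api/v1/jobs/<job_id>; it is not part of any XRPpdf SDK.
    """
    for _ in range(max_attempts):
        payload = fetch_status(job_id)
        # Stop on either final state; the caller decides what to do next.
        if payload.get("status") in ("complete", "error"):
            return payload
        sleep(delay_seconds)
    raise TimeoutError(f"job {job_id} still pending after {max_attempts} attempts")
```

Raising on timeout makes a stuck job visible instead of silently moving on to the next file.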
Better approach: use webhooks
Polling works, but webhooks are cleaner — especially for large batches. Instead of looping and sleeping, let XRPpdf call you when each job finishes.
1. Register a webhook
curl -X POST https://xrppdf.com/api/webhooks \
  -H "Authorization: Bearer $SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-server.com/ocr-callback"}'
You'll get back a secret (shown once). Save it.
2. Receive callbacks
Every completed job sends a POST to your URL:
{
  "event": "job.complete",
  "job_id": "abc123",
  "status": "complete",
  "pages": 12,
  "processing_seconds": 8.4
}
Headers include an HMAC signature for verification:
X-XRPOCR-Signature: sha256=<hex>
X-XRPOCR-Timestamp: 1713456789
X-XRPOCR-Job-Id: abc123
3. Verify the signature
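import hashlib
import hmac

def verify_webhook(secret: str, timestamp: str, body: bytes,
                   signature: str) -> bool:
    """Return True if signature matches HMAC-SHA256 over "<timestamp>.<body>"."""
    expected = hmac.new(
        secret.encode(),
        f"{timestamp}.".encode() + body,
        hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(f"sha256={expected}", signature)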
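
The check above confirms the signature, but a captured request could still be replayed later. Here is a minimal sketch that also rejects stale timestamps. It assumes `X-XRPOCR-Timestamp` is Unix seconds (the sample value looks like one). The 300-second window and the `sign` and `verify_fresh` helper names are illustrative choices, not documented XRPpdf values.

```python
import hashlib
import hmac
import time


def sign(secret: str, timestamp: str, body: bytes) -> str:
    """Build the header value in the same "sha256=<hex>" form shown above."""
    digest = hmac.new(secret.encode(), f"{timestamp}.".encode() + body, hashlib.sha256)
    return f"sha256={digest.hexdigest()}"


def verify_fresh(secret, timestamp, body, signature, now=None, tolerance=300):
    """Check the HMAC and reject timestamps outside the tolerance window.

    Assumes the timestamp header is Unix seconds; tolerance is an
    illustrative default, not a value taken from the XRPpdf docs.
    """
    now = time.time() if now is None else now
    try:
        age = abs(now - int(timestamp))
    except ValueError:
        return False  # non-numeric timestamp header
    if age > tolerance:
        return False
    return hmac.compare_digest(sign(secret, timestamp, body), signature)
```

Pass the raw request bytes as `body`. Re-serializing parsed JSON can change whitespace or key order and break the signature.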
4. Download on callback
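import requests

def handle_callback(job_id: str, api_key: str, output_dir: str):
    """Download the finished PDF for job_id into output_dir."""
    r = requests.get(
        f"https://xrppdf.com/download/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=60,
    )
    r.raise_for_status()  # don't save an error response to disk as a PDF
    with open(f"{output_dir}/{job_id}.pdf", "wb") as f:
        f.write(r.content)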
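
The callback only carries a job ID, so `handle_callback` saves files as `<job_id>.pdf`. If you kept the `ocr-jobs.log` file from Step 1, you can map each job back to its original filename. This is a rough sketch; `parse_job_log` and `output_path` are hypothetical helper names, and it assumes job IDs contain no spaces.

```python
from pathlib import Path
from typing import Dict, Iterable


def parse_job_log(lines: Iterable[str]) -> Dict[str, str]:
    """Turn "filename job_id" lines into {job_id: filename}.

    Splits on the last space so filenames containing spaces survive.
    """
    mapping = {}
    for raw in lines:
        line = raw.rstrip("\n")
        filename, _, job_id = line.rpartition(" ")
        if filename and job_id:  # skips blank or malformed lines
            mapping[job_id] = filename
    return mapping


def output_path(job_id: str, job_map: Dict[str, str], output_dir: str) -> Path:
    """Use the original filename when known, else fall back to <job_id>.pdf."""
    return Path(output_dir) / job_map.get(job_id, f"{job_id}.pdf")
```

A typical call would be `parse_job_log(open("ocr-jobs.log", encoding="utf-8"))`, done once at startup, with `output_path` used wherever the download is written.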
Rate limits and concurrency
| Tier | Concurrent jobs | Notes |
|---|---|---|
| Default | 2 | Good for casual use |
| Pro (8 XRP) | Higher limit | Contact for adjustment |
| Scale (30 XRP) | Up to 50 | Built for batch workflows |
If you hit the concurrency limit, the API returns HTTP 429 with
{"error": "...", "in_flight": 2, "limit": 2}. Back off and retry.
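
"Back off and retry" can be sketched as exponential backoff. This is a minimal illustration, not the official client behavior: `submit` is a hypothetical callable that performs one upload and returns `(status_code, parsed_json)`, and the retry count and base delay are arbitrary defaults.

```python
import time
from typing import Callable, Tuple


def submit_with_backoff(
    submit: Callable[[], Tuple[int, dict]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call submit() and retry on HTTP 429 with exponential backoff.

    submit is a hypothetical callable returning (status_code, parsed_json).
    Delays grow as base_delay * 2**attempt: 1 s, 2 s, 4 s, ...
    """
    for attempt in range(max_retries + 1):
        status, payload = submit()
        if status != 429:
            return payload  # success or a non-rate-limit error; caller inspects it
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"still rate limited after {max_retries} retries: {payload}")
```

Doubling the delay gives in-flight jobs time to finish, so retries don't hammer the API at a fixed interval.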
A simple throttle for the submit script:
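MAX_CONCURRENT=10
active=0

# submit_job is assumed to be a shell function wrapping the curl call from Step 1
for pdf in "$INPUT_DIR"/*.pdf; do
  submit_job "$pdf" &
  active=$((active + 1))   # ((active++)) returns non-zero at 0 and would trip set -e
  if ((active >= MAX_CONCURRENT)); then
    wait -n || true        # bash 4.3+; block until any one background job exits
    active=$((active - 1))
  fi
done
wait                       # let the remaining jobs finish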
Idempotency keys
If your network is unreliable, add an idempotency key to prevent double-processing:
curl -X POST https://xrppdf.com/api/v1/ocr \
  -H "Authorization: Bearer $API_KEY" \
  -H "Idempotency-Key: batch-2026-04-18-invoice-0042" \
  -F "file=@$pdf"
Same key within 24 hours = same response replayed. No duplicate charges.
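
For a batch, the key should come out the same on every retry of the same file. One way to get that is to derive it from the batch label and the file contents. This is a sketch under assumptions: the key layout and the `idempotency_key` helper are illustrative, since the example above only shows that the header carries a caller-chosen string.

```python
import hashlib


def idempotency_key(batch: str, filename: str, content: bytes) -> str:
    """Derive a stable key from the batch label, file name, and file bytes.

    The "<batch>-<stem>-<hash>" layout is an illustrative assumption,
    not a format required by the API.
    """
    digest = hashlib.sha256(content).hexdigest()[:16]
    stem = filename.rsplit(".", 1)[0]  # drop the extension
    return f"{batch}-{stem}-{digest}"
```

Because the hash covers the file bytes, re-sending an unchanged file reuses the key, while an edited file under the same name gets a fresh one.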
Cost at scale
Prices below use the live XRP/RLUSD feed rate, about $1.42 per XRP.
| Pages | Tier | XRP cost | Approx USD |
|---|---|---|---|
| 100 | 100-bundle | 2 XRP | $2.83 |
| 1,000 | Pro | 8 XRP | $11.33 |
| 10,000 | Scale | 30 XRP | $42.50 |
| 50,000 | 5× Scale | 150 XRP | $212.50 |
Credits never expire. Buy once, use over weeks or months.
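
To budget a batch of your own, the arithmetic behind the Scale rows can be sketched as follows. The bundle price and size come from the table. The exchange rate is a parameter because the feed is live, so results can differ by a few cents from the table (which appears to use a slightly lower, unrounded rate). `scale_cost` is a hypothetical helper that assumes you buy whole Scale bundles only.

```python
import math

SCALE_XRP = 30        # price of one Scale bundle, from the table above
SCALE_PAGES = 10_000  # pages included in one Scale bundle


def scale_cost(pages: int, usd_per_xrp: float):
    """Return (bundles, xrp, usd) needed to cover `pages` with whole Scale bundles."""
    bundles = math.ceil(pages / SCALE_PAGES)
    xrp = bundles * SCALE_XRP
    return bundles, xrp, round(xrp * usd_per_xrp, 2)
```

For example, `scale_cost(50_000, 1.42)` works out to 5 bundles and 150 XRP. Leftover pages from the last bundle stay on your account, since credits never expire.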
Full API docs
Everything above — endpoints, auth, webhooks, idempotency, error codes — is documented at xrppdf.com/docs.
Ready to batch-process? Get an API key: link a wallet, fund credits, start calling.