DataParse API Documentation
Welcome to the DataParse API. It runs images and PDFs through an OCR and LLM pipeline and returns structured JSON. Whether you’re extracting invoice totals, names, dates, or more complex nested data, the API shapes its output to the custom JSON schema you supply.
Note: Schemas aren’t limited to flat structures. They can define nested objects, arrays, and conditional logic, which lets you fine-tune extraction even for challenging document layouts.
Overview
Endpoint:
POST https://api.datapar.se/parse
Purpose:
Automate the extraction of key data from images and PDFs by tailoring the output structure with your custom JSON schema.
Workflow:
- Upload Media: Provide your media as base64-encoded strings or publicly accessible URLs.
- Define Schema: Send a detailed JSON schema that describes the structure and types of the fields you wish to extract.
- Receive Data: Get back a structured JSON response containing the parsed data.
Authentication
Every request must be authenticated using your API key.
Header | Required | Description |
---|---|---|
x-api-key | Yes | Your unique API key. |
Content-Type | Yes | Must be set to application/json. |
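As a minimal sketch, the two required headers can be built in one place so no request goes out without them. The helper name `buildHeaders` and the environment variable `DATAPARSE_API_KEY` are assumptions made for illustration; the API itself defines neither.

```javascript
// Hypothetical helper: returns the two headers listed in the table above.
function buildHeaders(apiKey) {
  if (typeof apiKey !== 'string' || apiKey.trim() === '') {
    throw new Error('An API key is required for every request.');
  }
  return {
    'x-api-key': apiKey,
    'Content-Type': 'application/json',
  };
}

// DATAPARSE_API_KEY is an assumed variable name, not one defined by the API:
// const headers = buildHeaders(process.env.DATAPARSE_API_KEY);
```

Reading the key from the environment rather than hard-coding it keeps it out of source control.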
Request Structure
Your request payload must include both the media and a corresponding schema that outlines the expected data structure.
Body Parameters
Field | Type | Description |
---|---|---|
media | Array | An array of media objects. Each object must include: |
type | String | The media type. Allowed values: "IMAGE" or "PDF". |
data | String | Either a base64-encoded string (with MIME prefix for images) or a publicly accessible URL. |
schema | Object | A JSON schema describing the expected output. Use detailed property definitions (including nested objects, arrays, and conditional logic) to enhance extraction accuracy. |
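The table above can be turned into a small client-side check that runs before a request is sent. This is only a sketch: `findRequestProblems` is a hypothetical name, rejecting an empty `media` array is this sketch's own assumption, and the server's actual validation may differ.

```javascript
const ALLOWED_MEDIA_TYPES = ['IMAGE', 'PDF'];

// Hypothetical pre-flight check; returns a list of problems (empty when none found).
function findRequestProblems(body) {
  if (body === null || typeof body !== 'object') {
    return ['body must be an object'];
  }
  const problems = [];
  if (!Array.isArray(body.media) || body.media.length === 0) {
    // Assumption: an empty array is treated as an error here.
    problems.push('media must be a non-empty array');
  } else {
    body.media.forEach((item, index) => {
      if (!item || !ALLOWED_MEDIA_TYPES.includes(item.type)) {
        problems.push(`media[${index}].type must be "IMAGE" or "PDF"`);
      }
      if (!item || typeof item.data !== 'string' || item.data === '') {
        problems.push(`media[${index}].data must be a non-empty string`);
      }
    });
  }
  if (body.schema === null || typeof body.schema !== 'object' || Array.isArray(body.schema)) {
    problems.push('schema must be an object');
  }
  return problems;
}
```

Returning every problem at once, rather than throwing on the first, makes it easier to fix a malformed payload in one pass.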
Quick Start Examples
Example using cURL
curl -X POST https://api.datapar.se/parse \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "media": [
      { "type": "IMAGE", "data": "..." },
      { "type": "PDF", "data": "JVBERi0xLjQK..." }
    ],
    "schema": {
      "type": "object",
      "properties": {
        "invoice_number": { "type": "string", "description": "Unique ID of the invoice." },
        "total_amount": { "type": "number", "description": "Total invoice amount in USD." }
      }
    }
  }'
Example using Node.js
const axios = require('axios');

(async () => {
  try {
    const response = await axios.post('https://api.datapar.se/parse', {
      media: [
        { type: 'IMAGE', data: '...' },
        { type: 'PDF', data: 'JVBERi0xLjQK...' }
      ],
      schema: {
        type: 'object',
        properties: {
          invoice_number: { type: 'string', description: 'Unique ID of the invoice.' },
          total_amount: { type: 'number', description: 'Total invoice amount in USD.' }
        }
      }
    }, {
      headers: {
        'x-api-key': 'YOUR_API_KEY',
        'Content-Type': 'application/json'
      }
    });
    console.log(response.data);
  } catch (error) {
    console.error(error.response ? error.response.data : error.message);
  }
})();
Response Format
The API responds with a JSON object containing either the successfully parsed data or an error message.
Successful Response
{
  "success": {
    "invoice_number": "INV-00123",
    "total_amount": 1250.75,
    "details": {
      "date": "2025-01-15",
      "items": [
        { "description": "Product A", "price": 750.50 },
        { "description": "Product B", "price": 500.25 }
      ]
    }
  },
  "failure": null
}
Error Response
{
  "success": null,
  "failure": "A descriptive error message."
}
Tip: Always check the failure field before processing the success data.
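One way to apply this check consistently is to route every response body through a single function. The sketch below assumes only the two-field shape shown in the examples above; the name `unwrapParseResult` is hypothetical.

```javascript
// Hypothetical helper: returns the parsed data, or throws if the API reported a failure.
function unwrapParseResult(body) {
  if (body.failure !== null && body.failure !== undefined) {
    throw new Error(`DataParse request failed: ${body.failure}`);
  }
  return body.success;
}

// Usage with the Node.js example above:
// const data = unwrapParseResult(response.data);
```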
Advanced Schema Capabilities
Our API supports powerful JSON Schema features to ensure that your extracted data is structured exactly as you need it. You can:
- Nest Objects & Arrays: Define deeply nested structures to mirror complex document layouts.
- Conditional Extraction: Use keywords like oneOf, anyOf, or allOf to capture data that may appear in multiple formats.
These capabilities let you shape the extracted output precisely, so even loosely structured documents come back as well-organized JSON.
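To make the two features concrete, here is a sketch of a schema that combines a nested array with a oneOf branch, written as a JavaScript object so it can be passed straight to the request. Every field name is illustrative, and how the extraction pipeline chooses between oneOf branches is not specified here.

```javascript
// Illustrative schema; every field name here is hypothetical.
const invoiceSchema = {
  type: 'object',
  properties: {
    invoice_number: { type: 'string', description: 'Unique ID of the invoice.' },
    // Conditional extraction: the payment section matches one of two shapes.
    payment: {
      description: 'How the invoice is to be paid.',
      oneOf: [
        {
          type: 'object',
          description: 'Card payment.',
          properties: {
            card_last_four: { type: 'string', description: 'Last four digits of the card.' },
          },
        },
        {
          type: 'object',
          description: 'Bank transfer.',
          properties: {
            iban: { type: 'string', description: 'IBAN of the receiving account.' },
          },
        },
      ],
    },
    // Nested array of objects for line items.
    items: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          description: { type: 'string', description: 'Description of the item.' },
          unit_price: { type: 'number', description: 'Price per unit.' },
        },
      },
    },
  },
};

// The object serializes to plain JSON for the `schema` field of the request body.
const schemaJson = JSON.stringify(invoiceSchema);
```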
JSON Schema Examples
Craft your schema carefully to ensure high extraction accuracy. Here are a few examples to get you started:
Basic Personal Info
{
  "type": "object",
  "properties": {
    "name": {
      "type": "string",
      "description": "Full legal name."
    },
    "date_of_birth": {
      "type": "string",
      "description": "Date of birth in YYYY-MM-DD format."
    }
  }
}
Receipt Parsing
{
  "type": "object",
  "properties": {
    "store_name": {
      "type": "string",
      "description": "The store’s name."
    },
    "transaction_date": {
      "type": "string",
      "description": "Date in YYYY-MM-DD format."
    },
    "total_amount": {
      "type": "number",
      "description": "Total amount in local currency."
    }
  }
}
Advanced Document Extraction
{
  "type": "object",
  "properties": {
    "invoice": {
      "type": "object",
      "properties": {
        "number": {
          "type": "string",
          "description": "Invoice number in the format INV-XXXX."
        },
        "date": {
          "type": "string",
          "description": "Invoice date in YYYY-MM-DD format."
        }
      }
    },
    "items": {
      "type": "array",
      "description": "List of purchased items with detailed breakdown.",
      "items": {
        "type": "object",
        "properties": {
          "description": {
            "type": "string",
            "description": "Description of the item."
          },
          "quantity": {
            "type": "number",
            "description": "Number of units purchased."
          },
          "unit_price": {
            "type": "number",
            "description": "Price per unit."
          }
        }
      }
    }
  }
}
Media Guidelines
Ensure your media is prepared for optimal processing:
Base64-Encoded Media
- Images: Must include the MIME prefix (e.g., data:image/png;base64,...).
- PDFs: Can be provided as raw base64 strings (e.g., JVBERi0xLjQK...).
Public URLs
- Must be directly accessible without authentication or redirects.
- Recommended for large files or when you prefer not to embed base64 data.
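The base64 rules above (MIME prefix for images, raw string for PDFs) can be captured in a small encoder. This is a sketch under the guidelines as stated; `encodeMedia` is a hypothetical name, and the caller supplies the MIME type because this sketch does not try to detect it.

```javascript
// Hypothetical helper: turns raw bytes into a media object for the request body.
function encodeMedia(type, bytes, mimeType) {
  const base64 = Buffer.from(bytes).toString('base64');
  if (type === 'IMAGE') {
    if (!mimeType) {
      throw new Error('Images need a MIME type such as image/png.');
    }
    // Images carry the data-URI style MIME prefix.
    return { type, data: `data:${mimeType};base64,${base64}` };
  }
  if (type === 'PDF') {
    // PDFs are sent as the raw base64 string.
    return { type, data: base64 };
  }
  throw new Error(`Unsupported media type: ${type}`);
}

// Usage with a local file (the path is illustrative):
// const fs = require('fs');
// const media = encodeMedia('PDF', fs.readFileSync('./invoice.pdf'));
```

For a public URL no encoding is needed; the media object is simply `{ type, data: url }`.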
Best Practices
Schema Design
- Use clear and descriptive property names.
- Leverage nested objects, arrays, and conditional logic to capture complex data accurately.
- Limit your schema to only the necessary fields to reduce processing time and minimize errors.
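If you maintain one large schema per document type, the last point can be applied by trimming it down per request. The helper below is a sketch with a hypothetical name, `pickSchemaFields`; it only handles top-level properties and leaves the original schema untouched.

```javascript
// Hypothetical helper: returns a copy of the schema limited to the named top-level fields.
function pickSchemaFields(schema, fieldNames) {
  const properties = {};
  for (const name of fieldNames) {
    if (!Object.prototype.hasOwnProperty.call(schema.properties, name)) {
      throw new Error(`Unknown schema field: ${name}`);
    }
    properties[name] = schema.properties[name];
  }
  return { ...schema, properties };
}

// Example: request only the total from a larger receipt schema.
const receiptSchema = {
  type: 'object',
  properties: {
    store_name: { type: 'string', description: 'The store name.' },
    transaction_date: { type: 'string', description: 'Date in YYYY-MM-DD format.' },
    total_amount: { type: 'number', description: 'Total amount in local currency.' },
  },
};
const totalOnly = pickSchemaFields(receiptSchema, ['total_amount']);
```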
Performance & Security
- For private or small files, prefer base64 encoding.
- For larger or public files, use URLs.
- Submit only the relevant pages or images.
Error Handling
- Always verify the presence of a failure message before processing the success data.
- Use HTTP status codes to differentiate between various error types.
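As a sketch of the second point, the function below sorts an axios-style error into broad categories using only the standard HTTP status classes. Which specific status codes this API returns is not documented here, so the 401/403 and 5xx branches are assumptions, and `describeFailure` is a hypothetical name.

```javascript
// Hypothetical helper: classifies an error thrown by axios (or any object of the same shape).
function describeFailure(error) {
  if (!error.response) {
    // No HTTP response at all: DNS failure, timeout, connection reset, and so on.
    return { kind: 'network', message: error.message };
  }
  const { status, data } = error.response;
  // Prefer the API's own failure message when the body carries one.
  const message = data && data.failure ? data.failure : `HTTP ${status}`;
  if (status === 401 || status === 403) {
    return { kind: 'auth', message };
  }
  if (status >= 500) {
    return { kind: 'server', message };
  }
  return { kind: 'client', message };
}

// Usage inside the catch block of the Node.js example:
// const failure = describeFailure(error);
// console.error(`${failure.kind}: ${failure.message}`);
```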
Common Use Cases
- Invoices & Receipts: Extract key financial information like totals, dates, and invoice numbers—including detailed line-item breakdowns.
- Identification Documents: Capture personal details such as names, dates of birth, and addresses using nested schema structures.
- Contracts & Legal Documents: Customize your schema to extract parties, references, clauses, and more, even when the data is nested or conditionally formatted.
Support
For further assistance or to discuss complex use cases, please contact our support team:
support@datapar.se