Schema Design Guide
Learn how to design effective schemas that improve parsing accuracy and extract the data you need from documents.
Why Schema Descriptions Matter
Schema descriptions are crucial for AI-powered document parsing. They help the LLM understand:
- What each field represents in the context of your document
- Expected data formats and patterns
- Business context that affects parsing decisions
- Edge cases and special handling requirements
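Because descriptions carry this context, it can help to confirm that no field has been left without one. The following is a minimal sketch in Python, assuming the schema is available as a dictionary. `find_undescribed_fields` and `example_schema` are hypothetical names used for illustration and are not part of any API.

```python
def find_undescribed_fields(schema, path=""):
    """Return dotted paths of properties that lack a non-empty description."""
    missing = []
    for name, spec in schema.get("properties", {}).items():
        field_path = f"{path}.{name}" if path else name
        if not str(spec.get("description", "")).strip():
            missing.append(field_path)
        # Recurse into nested objects and into the item schema of arrays.
        if spec.get("type") == "object":
            missing.extend(find_undescribed_fields(spec, field_path))
        elif spec.get("type") == "array" and isinstance(spec.get("items"), dict):
            missing.extend(find_undescribed_fields(spec["items"], field_path + "[]"))
    return missing


# Illustrative schema (an assumption): two fields have no description.
example_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {
            "type": "string",
            "description": "The unique invoice number or reference ID",
        },
        "amt": {"type": "number"},
        "line_items": {
            "type": "array",
            "description": "Individual items or services listed on the invoice",
            "items": {
                "type": "object",
                "properties": {"quantity": {"type": "number"}},
            },
        },
    },
}

print(find_undescribed_fields(example_schema))
# → ['amt', 'line_items[].quantity']
```

A check like this only confirms that a description exists. Whether the description is useful still has to be judged against real parsing results.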
Best Practices
1. Use Clear, Descriptive Field Names
// ❌ Poor field names
{
  "properties": {
    "f1": {"type": "string"},
    "amt": {"type": "number"}
  }
}
// ✅ Clear, descriptive names
{
  "properties": {
    "invoice_number": {"type": "string"},
    "total_amount": {"type": "number"}
  }
}

2. Add Comprehensive Descriptions
{
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "The unique invoice number or reference ID, typically found at the top of the invoice"
    },
    "vendor_name": {
      "type": "string",
      "description": "The name of the company or individual who issued the invoice"
    },
    "total_amount": {
      "type": "number",
      "description": "The final amount due including all taxes and fees, usually shown as 'Total' or 'Amount Due'"
    },
    "date": {
      "type": "string",
      "description": "The invoice date in YYYY-MM-DD format"
    }
  }
}

3. Specify Data Formats and Patterns
{
  "properties": {
    "phone_number": {
      "type": "string",
      "description": "Phone number in international format (e.g., +1-555-123-4567) or local format",
      "pattern": "^[+]?[0-9\\-\\s\\(\\)]+$"
    },
    "email": {
      "type": "string",
      "description": "Valid email address",
      "format": "email"
    },
    "amount": {
      "type": "number",
      "description": "Monetary amount as a decimal number (e.g., 1250.50 for $1,250.50)"
    }
  }
}

4. Handle Arrays and Complex Objects
{
  "properties": {
    "line_items": {
      "type": "array",
      "description": "Individual items or services listed on the invoice",
      "items": {
        "type": "object",
        "properties": {
          "description": {
            "type": "string",
            "description": "Description of the product or service"
          },
          "quantity": {
            "type": "number",
            "description": "Number of units or hours"
          },
          "unit_price": {
            "type": "number",
            "description": "Price per unit or hour"
          },
          "line_total": {
            "type": "number",
            "description": "Total for this line item (quantity × unit_price)"
          }
        }
      }
    }
  }
}

Common Patterns
Document Types and Their Key Fields
Different document types have characteristic data patterns:
- Invoices: invoice numbers, vendor info, line items, totals, dates
- Receipts: merchant info, transaction details, itemized purchases
- Business Cards: contact information, professional details
- Contracts: parties, dates, terms, values
- Forms: structured data fields, personal information
Field Naming Conventions
Use consistent, descriptive naming patterns:
{
  "properties": {
    // Use snake_case for field names
    "invoice_number": { "type": "string" },
    "total_amount": { "type": "number" },
    "due_date": { "type": "string" },

    // Be specific about what the field contains
    "vendor_name": { "type": "string" },      // Not just "name"
    "customer_email": { "type": "string" },   // Not just "email"
    "billing_address": { "type": "object" }   // Not just "address"
  }
}

Advanced Tips
1. Use Enums for Categorical Data
{
  "properties": {
    "status": {
      "type": "string",
      "description": "Current status of the document",
      "enum": ["draft", "pending", "approved", "rejected", "paid"]
    },
    "priority": {
      "type": "string",
      "description": "Priority level of the request",
      "enum": ["low", "medium", "high", "urgent"]
    }
  }
}

2. Handle Optional vs Required Fields
{
  "type": "object",
  "required": ["invoice_number", "total_amount", "date"],
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "Required: Unique invoice identifier"
    },
    "total_amount": {
      "type": "number",
      "description": "Required: Final amount due including all taxes and fees"
    },
    "date": {
      "type": "string",
      "description": "Required: The invoice date in YYYY-MM-DD format"
    },
    "notes": {
      "type": "string",
      "description": "Optional: Additional notes or comments"
    }
  }
}

3. Use Nested Objects for Complex Data
{
  "properties": {
    "billing_address": {
      "type": "object",
      "description": "Billing address information",
      "properties": {
        "street": {
          "type": "string",
          "description": "Street address"
        },
        "city": {
          "type": "string",
          "description": "City name"
        },
        "state": {
          "type": "string",
          "description": "State or province"
        },
        "postal_code": {
          "type": "string",
          "description": "ZIP or postal code"
        },
        "country": {
          "type": "string",
          "description": "Country name"
        }
      }
    }
  }
}

Testing Your Schema
- Start simple - Begin with basic fields and add complexity
- Test with real documents - Use actual invoices, receipts, etc.
- Iterate based on results - Refine descriptions based on parsing accuracy
- Handle edge cases - Consider unusual formats or missing data
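To make the iteration loop above concrete, the sketch below compares a parsed result against a schema and lists what went wrong. It is a minimal illustration and not a full JSON Schema validator: it only checks `required`, basic `type`, and `enum`. `check_result`, `invoice_schema`, and the sample results are hypothetical and invented for this example.

```python
# Maps JSON Schema type names to the Python types produced by json.loads().
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "object": dict,
    "array": list,
}


def check_result(schema, result):
    """Return a list of human-readable problems found in a parsed result."""
    problems = []
    for name in schema.get("required", []):
        if name not in result:
            problems.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name not in result:
            continue
        value = result[name]
        expected = TYPE_MAP.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly.
        if expected and (not isinstance(value, expected) or isinstance(value, bool)):
            problems.append(f"wrong type for {name}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"unexpected value for {name}: {value!r}")
    return problems


# Illustrative schema and parsed results (assumptions, not real output).
invoice_schema = {
    "type": "object",
    "required": ["invoice_number", "total_amount"],
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "status": {
            "type": "string",
            "enum": ["draft", "pending", "approved", "rejected", "paid"],
        },
    },
}

bad_values = {"invoice_number": "INV-001", "total_amount": "1250.50", "status": "overdue"}
missing_data = {"invoice_number": "INV-002"}

print(check_result(invoice_schema, bad_values))
print(check_result(invoice_schema, missing_data))
```

Problems that recur across several test documents, such as a total returned as a string, usually point to a description that needs to state the expected format more explicitly.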
Next Steps
- Browse ready-to-use schemas for common document types
- Test your schema with the quickstart guide
- Review the API Reference for technical implementation details
- Start with our free tier to experiment without cost