Haystac Platform / Chicago

Separate and identify every document before the workflow starts.

Chicago automatically splits mixed files, packets, scans, and attachments into distinct business documents, then classifies each one by meaning — so Nashville, Orion, Polaris, and your downstream systems start with clean, reliable input.

haystac.local · Chicago · mixed packet → routed documents
01 · Auto-separate: Split mixed packets into individual documents
02 · Auto-identify: Classify by meaning, not rigid templates
03 · Route correctly: Send each document to the right next step
04 · Improve downstream AI: Give extraction & reasoning cleaner inputs
Inbound packet · mixed · live
page 1 · ?
page 2 · ?
page 3 · ?
page 4 · ?
page 5 · ?
Boundaries · classified · semantic + visual
doc 1 · pp. 1–2 · Claim form · 0.96
doc 2 · p. 3 · Policy · 0.93
doc 3 · p. 4 · Medical bill · 0.91
doc 4 · p. 5 · Correspondence · 0.88
5 pages → 4 documents · Split ok
Routed downstream · workflow
Claim form → Nashville
Policy → Orion
Medical bill → Nashville
Correspondence → Polaris
✓ 4 docs routed · 0 manual sorts · case ref dec-4471-a
Where Chicago fits

Chicago is the first control point in the Haystac pipeline.

Before AI can extract facts, answer questions, or trigger workflows, it needs to know what it is looking at. Chicago answers the first two questions in every document workflow: where does one document end, and what type of document is it? Once Chicago separates and identifies the content, Nashville extracts the facts, Orion reasons over the evidence, and Polaris moves the work forward.

Chicago: classify & separate (doc type + boundaries)
Nashville: extract structured data (fields · tables)
Orion: reason over evidence (cited answers)
Polaris: trigger next action (workflow + audit)
Delivered: { JSON }

Chicago is the first layer. Without it, every other stage works from messy input — merged packets, mislabeled files, scans of unknown type. Get this step right, and every downstream step gets easier.
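
As a rough illustration of the kind of labeled record the first layer could hand downstream as JSON, the sketch below serializes the sample five-page packet shown at the top of this page. The `RoutedDocument` class and its field names are hypothetical and are not a documented Chicago schema; only the page ranges, labels, scores, and destinations come from the sample.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class RoutedDocument:
    """Hypothetical record for one separated, classified document."""
    pages: list        # 1-based page numbers from the inbound packet
    doc_type: str      # label assigned at classification
    confidence: float  # classification score
    route: str         # downstream destination


# Values taken from the sample packet: 5 pages split into 4 documents.
packet = [
    RoutedDocument([1, 2], "Claim form", 0.96, "Nashville"),
    RoutedDocument([3], "Policy", 0.93, "Orion"),
    RoutedDocument([4], "Medical bill", 0.91, "Nashville"),
    RoutedDocument([5], "Correspondence", 0.88, "Polaris"),
]

# Serialize the whole packet as a JSON array of records.
payload = json.dumps([asdict(doc) for doc in packet], indent=2)
print(payload)
```

A flat list of per-document records keeps the page-to-document mapping explicit, so a downstream consumer can check that every inbound page was assigned exactly once.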

The problem

Inbound content rarely arrives clean.

Organizations do not receive neat, single-purpose documents. They receive claim packets, loan files, application bundles, scanned batches, email attachments, faxes, supporting records, duplicate pages, and merged PDFs.

Before any system can extract or reason, someone has to separate the content and identify what each document is. When that first step is wrong, everything downstream gets worse.

01

Mixed packets slow the intake queue.

A single upload may contain forms, IDs, invoices, statements, letters, notes, and supporting evidence. Someone still has to figure out where each document begins and ends.

02

Template-based classification breaks.

Document formats change. Vendors use different layouts. Scans are noisy. Similar-looking documents may mean very different things.

03

Bad intake creates downstream errors.

The wrong split leads to missing fields. The wrong label leads to wrong routing. The wrong context leads to wrong answers and manual cleanup.

The Chicago answer

Turn messy inbound content into clean, labeled intake.

Chicago is Haystac’s document separation and classification layer. It automatically breaks mixed document batches into individual business documents and identifies each one by meaning. It does not rely only on fixed templates, brittle keyword rules, or layout matching. Chicago uses semantic and visual understanding to recognize what a document represents, even when formats vary. That means downstream systems start with the right document, in the right category, routed to the right place.

01

Separate the packet

Split merged PDFs, scanned batches, faxes, uploads, and attachments into distinct business documents.

02

Identify each document

Classify content as a claim form, policy page, medical bill, ID, application, invoice, statement, contract, letter, or supporting record.

03

Route the next step

Send each document to Nashville for extraction, Orion for reasoning, Polaris for workflow action, or an external system.

04

Keep improving

Adapt to new document types, changing formats, and evolving intake streams with less manual rule maintenance.

What it enables

Create order before extraction begins.

Split merged files into individual documents

Break large PDFs, packets, scanned batches, and uploads into separate documents that downstream systems can process correctly.

Classify every document type

Identify forms, statements, policies, letters, invoices, applications, IDs, medical records, and supporting materials.

Route documents to the right workflow

Send claims, loan documents, applications, contracts, and case records to the right extraction, review, or automation path.

Group similar unknown documents

Cluster documents by narrative and visual similarity to discover recurring document types across large repositories.

Prepare content for RAG and reasoning

Make sure Orion retrieves and reasons over the right evidence, not a mislabeled or poorly separated file.

Reduce manual mailroom sorting

Replace human-heavy intake triage with automated separation and classification at scale.
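
Of the items above, grouping similar unknown documents is the easiest to picture in code. The sketch below uses a simple greedy "leader" clustering over cosine similarity. It is a minimal illustration under stated assumptions, not Chicago's actual algorithm: the `cluster` helper, the two-dimensional toy vectors, and the 0.8 threshold are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.8):
    """Greedy leader clustering.

    Each vector joins the first cluster whose leader (its first member)
    is at least `threshold` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of input indices.
    """
    leaders = []
    clusters = []
    for idx, vec in enumerate(vectors):
        for c, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                clusters[c].append(idx)
                break
        else:
            # No existing leader was similar enough: start a new group.
            leaders.append(vec)
            clusters.append([idx])
    return clusters


# Toy document vectors standing in for real embeddings.
vectors = [[1.0, 0.0], [0.95, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
groups = cluster(vectors)
print(groups)
```

Each resulting group is a candidate recurring document type that a reviewer could inspect and name.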

How it works

From mixed intake to workflow-ready documents.

Chicago organizes inbound content before business logic is applied. It separates, classifies, labels, and routes documents so every downstream step starts from a cleaner input.

01
Receive

Take in files from the systems where work begins.

Documents arrive from scanners, digital mailrooms, file systems, ECM platforms, workflow tools, host applications, or inbound APIs.

Inbound sources · chicago.intake
scanner · SFTP · live
digital mailroom · live
ECM webhook · live
REST /v1/intake · ready
02
Detect boundaries

Find where one document ends and another begins.

Chicago separates mixed batches into distinct documents, even when a single file contains multiple forms, attachments, letters, or supporting records.

Boundary detection · chicago.split
page 1 · doc A
page 2 · doc A
boundary detected
page 3 · doc B
boundary detected
page 4 · doc C
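
One way to picture boundary detection is to compare each page with the page before it and start a new document when the similarity drops. The sketch below is illustrative only: the `split_pages` helper, the toy page vectors, and the 0.5 threshold are assumptions, and a production system would likely use richer signals than adjacent-page similarity.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def split_pages(page_vectors, threshold=0.5):
    """Group consecutive page indices into documents.

    A new document starts whenever a page's similarity to the previous
    page falls below `threshold` (a hypothetical cutoff).
    """
    if not page_vectors:
        return []
    documents = [[0]]
    for i in range(1, len(page_vectors)):
        if cosine(page_vectors[i - 1], page_vectors[i]) < threshold:
            documents.append([i])      # boundary detected
        else:
            documents[-1].append(i)    # same document continues
    return documents


# Toy 3-dimensional "embeddings" for a four-page packet.
pages = [
    [1.0, 0.1, 0.0],  # page 1
    [0.9, 0.2, 0.0],  # page 2, similar to page 1
    [0.0, 1.0, 0.1],  # page 3
    [0.1, 0.0, 1.0],  # page 4
]
docs = split_pages(pages)
print(docs)
```

With these toy vectors, the first two pages stay together and pages 3 and 4 each become their own document, mirroring the doc A / doc B / doc C example.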
03
Generate signals

Understand the document beyond surface text.

Chicago creates representations of the document’s meaning, structure, and visual patterns so classification is not limited to keywords or templates.

Multi-modal signals · chicago.encode
semantic embedding · 1024d
visual embedding · 768d
layout signature · hash
vocabulary & entities · on
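
To make the idea of combined signals concrete, here is a minimal sketch that normalizes a semantic and a visual vector, concatenates them, and derives a short layout hash. Only the 1024 and 768 dimensions come from the panel above; the `build_signals` function, the `DocumentSignals` container, the block-tuple layout format, and the use of SHA-256 are assumptions made for illustration.

```python
import hashlib
import math
from dataclasses import dataclass


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


@dataclass
class DocumentSignals:
    combined: list         # normalized semantic + visual vector
    layout_signature: str  # short hash of the coarse page layout


def build_signals(semantic, visual, layout_blocks):
    """Combine per-document signals into one record.

    `layout_blocks` is a hypothetical list of (block_type, row, col)
    tuples describing where content sits on a coarse grid.
    """
    combined = l2_normalize(semantic) + l2_normalize(visual)
    # Sort blocks so the signature does not depend on detection order.
    canonical = "|".join(f"{t}:{r}:{c}" for t, r, c in sorted(layout_blocks))
    signature = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return DocumentSignals(combined, signature)


signals = build_signals(
    semantic=[0.0] * 1023 + [2.0],   # stand-in for a 1024d embedding
    visual=[3.0] + [0.0] * 767,      # stand-in for a 768d embedding
    layout_blocks=[("table", 2, 0), ("header", 0, 0)],
)
print(len(signals.combined), signals.layout_signature)
```

Normalizing each part before concatenation keeps one modality from dominating similarity scores just because its raw values are larger.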
04
Classify by meaning

Identify what the document represents.

Documents are labeled based on semantic similarity and context, even when layouts, vendors, scans, or wording differ.

Classified output · chicago.classify
doc A · Claim form · 0.96
doc B · Policy · 0.93
doc C · Medical bill · 0.91
doc D · Correspondence · 0.88
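
Classification by semantic similarity can be sketched as a nearest-prototype lookup: compare the document's vector with one reference vector per label and keep the best match. This is a simplified stand-in, not a description of Chicago's model. The `classify` function, the three-dimensional prototype vectors, and the use of the cosine score as a confidence value are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(doc_vector, prototypes):
    """Return (label, score) for the most similar prototype vector."""
    best_label, best_score = None, -1.0
    for label, proto in prototypes.items():
        score = cosine(doc_vector, proto)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Hypothetical reference vectors, one per known document type.
prototypes = {
    "Claim form": [1.0, 0.0, 0.0],
    "Policy": [0.0, 1.0, 0.0],
    "Medical bill": [0.0, 0.0, 1.0],
}

label, score = classify([0.9, 0.1, 0.2], prototypes)
print(label, round(score, 2))
```

Because the comparison is made on vectors rather than on layout or keywords, supporting a new vendor format would mean adding or updating a prototype rather than writing a new rule.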
05
Route downstream

Send the right document to the right next step.

Outputs can be returned to the host system, routed to Nashville for extraction, sent to Orion for reasoning, used by Polaris for workflow execution, or passed into external applications.

Routing decisions · chicago.route
Claim → Nashville
Policy → Orion
Correspondence → Polaris
POST /v1/route · 200 OK
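
The routing step can be pictured as a lookup from label to destination, with a fallback for uncertain results. In the sketch below, the label-to-destination pairs follow the examples on this page. The `route` function, the `human-review` destination name, and the 0.80 confidence cutoff are hypothetical; the page mentions a human review queue as one possible destination but does not say when it is used.

```python
# Label-to-destination pairs taken from the examples on this page.
ROUTES = {
    "Claim form": "Nashville",
    "Medical bill": "Nashville",
    "Policy": "Orion",
    "Correspondence": "Polaris",
}


def route(label, confidence, min_confidence=0.80):
    """Pick a destination for a classified document.

    Unknown labels and low-confidence results go to a review queue
    instead of an automated path (an assumed policy).
    """
    if confidence < min_confidence or label not in ROUTES:
        return "human-review"
    return ROUTES[label]


for label, confidence in [("Claim form", 0.96), ("Policy", 0.55), ("Memo", 0.99)]:
    print(label, "->", route(label, confidence))
```

Keeping the mapping in data rather than in branching code means a new document type can be routed by adding one entry.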
The core difference

Traditional systems recognize what a document looks like. Chicago understands what it means.

Rules-based systems work when documents are predictable. They look for fixed layouts, keywords, barcodes, page positions, or template matches. Chicago takes a different approach. It uses embedding-based understanding to compare documents by meaning, structure, and visual context. Two invoices can look different and still be recognized as invoices. Two denial letters can use different wording and still be grouped correctly. A new vendor format can be classified without rebuilding an entire rule library.

Traditional classification | Chicago
Matches fixed templates | Classifies by semantic meaning
Depends on keywords and layout rules | Uses text, visual, and multi-modal signals
Breaks when formats change | Handles variation across vendors and sources
Requires ongoing rule maintenance | Learns and adapts with less manual setup
Struggles with noisy intake | Built for real-world document streams

Chicago does not just ask, “What does this page look like?” It asks, “What is this document?”

Downstream impact

Cleaner intake makes the whole AI pipeline more reliable.

Every downstream step depends on Chicago getting the first step right. Extraction, reasoning, routing, and automation all become more accurate when the system starts with the correct document in the correct context.

Nashville

Better input.

Separated, correctly labeled documents make field extraction more accurate and reduce manual correction.

Orion

Better evidence.

When documents are organized by meaning, retrieval-grounded answers are based on the right sources.

Chicago: cleaner intake → better everything
Polaris

Right workflow.

Correct classification helps agents route, escalate, validate, and act on the right document type.

Human teams

Fewer errors to review.

Better intake means fewer misrouted documents, fewer failed extractions, and less manual cleanup.

Capabilities

Built for noisy, inconsistent, high-volume intake.

01

Document separation

Automatically split mixed files, packets, scanned batches, and attachments into individual business documents.

02

Semantic classification

Classify documents by meaning and context instead of relying only on keywords, file names, or fixed layouts.

03

Visual and textual understanding

Use both document appearance and content meaning to identify documents more reliably.

04

Clustering and discovery

Group similar documents across large repositories to identify recurring document types and prepare content for downstream AI.

05

Routing-ready outputs

Return labeled, structured outputs that can be sent to host systems, Nashville, Orion, Polaris, or external workflows.

06

Continuous improvement

Adapt to new document types and changing intake patterns with less manual rule-building and template maintenance.

Use cases

For any workflow where documents arrive mixed together.

Insurance claims

Forms, photos, evidence — sorted at intake.

Separate claim forms, policy pages, medical bills, adjuster notes, photos, correspondence, and supporting evidence before extraction or adjudication.

~7× fewer manual sorts
Banking & lending

Loan packets, organized at intake.

Organize loan applications, IDs, income statements, bank statements, disclosures, KYC documents, and underwriting materials.

days → mins · packet triage
Government intake

Casework with clean inputs.

Separate applications, eligibility forms, notices, supporting records, appeals, permits, and case correspondence.

100% audit-traceable
Healthcare packets

Prior auth & chart organization.

Organize prior authorization packets, referrals, medical records, lab results, plan documents, and provider correspondence.

PHI stays on-prem
Digital mailroom

Automated intake at scale.

Classify and route scanned batches, inbound PDFs, email attachments, faxes, and business correspondence at intake.

24/7 · no human triage
Legacy archives

Turn archives into usable corpora.

Cluster and label large volumes of historical content so archives become usable for extraction, search, reasoning, and governance.

M+ docs at a time
OmniSuite™

Chicago creates the foundation for everything that follows.

Chicago is the first layer of OmniSuite™. It gives the rest of the platform clean, labeled, context-aware documents to work with.

Chicago
What is this document?

Separates, classifies, and routes inbound content.

Nashville
What facts does it contain?

Extracts fields, tables, entities, handwriting, clauses, and relationships.

Orion
What does the evidence say?

Reasons over structured data and verified enterprise content.

Polaris
What should happen next?

Routes, escalates, validates, and triggers workflow actions.

Chicago identifies the document. Nashville extracts the facts. Orion reasons from them. Polaris moves the work forward.

FAQ

Common questions about Chicago.

What does Chicago actually do?

Chicago separates mixed inbound content into individual business documents and classifies each document by meaning, so downstream systems know what they are processing.

Is Chicago only for digital mailrooms?

No. Digital mailroom is a strong fit, but Chicago also supports claims intake, loan processing, government case intake, healthcare packets, repository cleanup, and any workflow where documents arrive mixed together.

How is Chicago different from template-based classification?

Template systems rely on fixed layouts, keywords, or rules. Chicago uses semantic and visual understanding to classify documents by what they represent, even when formats vary.

Can Chicago separate documents inside one PDF?

Yes. Chicago is designed to detect document boundaries inside mixed files, packets, scans, and merged PDFs.

What happens after Chicago classifies a document?

The document can be routed to Nashville for extraction, Orion for reasoning, Polaris for workflow execution, a human review queue, or an external host system.

Does Chicago improve downstream accuracy?

Yes. When documents are correctly separated and labeled at intake, extraction, retrieval, reasoning, and workflow automation all start from cleaner inputs.

Ready when you are

Start with cleaner intake.

Chicago turns high-volume inbound content into separated, labeled, workflow-ready documents — so every downstream AI step starts with the right context.

We enable highly regulated organizations to build, govern, and operate domain-specific models within their own infrastructure and governance frameworks.
