Haystac Platform / Chicago

Separate and identify every document before the workflow starts.

Chicago automatically splits mixed files, packets, scans, and attachments into distinct business documents, then classifies each one by meaning — so Nashville, Orion, Polaris, and your downstream systems start with clean, reliable input.


haystac.local · Chicago · mixed packet → routed documents

Auto-separate

Split mixed packets into individual documents

Auto-identify

Classify by meaning, not rigid templates

Route correctly

Send each document to the right next step

Improve downstream AI

Give extraction & reasoning cleaner inputs

Inbound packet · mixed · live

page 1 · ?
page 2 · ?
page 3 · ?
page 4 · ?
page 5 · ?

Boundaries · classified · semantic + visual

doc 1 · pp. 1–2 · Claim form · 0.96
doc 2 · p. 3 · Policy · 0.93
doc 3 · p. 4 · Medical bill · 0.91
doc 4 · p. 5 · Correspondence · 0.88

5 pages → 4 documents · Split ok

Routed downstream · workflow

Claim form → Nashville
Policy → Orion
Medical bill → Nashville
Correspondence → Polaris

✓ 4 docs routed · 0 manual sorts · case ref dec-4471-a

Where Chicago fits

Chicago is the first control point in the Haystac pipeline.

Before AI can extract facts, answer questions, or trigger workflows, it needs to know what it is looking at. Chicago answers the first two questions in every document workflow: where does one document end, and what type of document is it? Once Chicago separates and identifies the content, Nashville extracts the facts, Orion reasons over the evidence, and Polaris moves the work forward.

Chicago: classify & separate (doc type + boundaries)
Nashville: extract structured data (fields, tables)
Orion: reason over evidence (cited answers)
Polaris: trigger next action (workflow + audit)
Delivered: { JSON }

Chicago is the first layer. Without it, every other stage works from messy input — merged packets, mislabeled files, scans of unknown type. Get this step right, and every downstream step gets easier.

The problem

Inbound content rarely arrives clean.

Organizations do not receive neat, single-purpose documents. They receive claim packets, loan files, application bundles, scanned batches, email attachments, faxes, supporting records, duplicate pages, and merged PDFs.

Before any system can extract or reason, someone has to separate the content and identify what each document is. When that first step is wrong, everything downstream gets worse.

Mixed packets slow the intake queue.

A single upload may contain forms, IDs, invoices, statements, letters, notes, and supporting evidence. Someone still has to figure out where each document begins and ends.

Template-based classification breaks.

Document formats change. Vendors use different layouts. Scans are noisy. Similar-looking documents may mean very different things.

Bad intake creates downstream errors.

The wrong split leads to missing fields. The wrong label leads to wrong routing. The wrong context leads to wrong answers and manual cleanup.

The Chicago answer

Turn messy inbound content into clean, labeled intake.

Chicago is Haystac’s document separation and classification layer. It automatically breaks mixed document batches into individual business documents and identifies each one by meaning. It does not rely only on fixed templates, brittle keyword rules, or layout matching. Chicago uses semantic and visual understanding to recognize what a document represents, even when formats vary. That means downstream systems start with the right document, in the right category, routed to the right place.

Separate the packet

Split merged PDFs, scanned batches, faxes, uploads, and attachments into distinct business documents.

Identify each document

Classify content as a claim form, policy page, medical bill, ID, application, invoice, statement, contract, letter, or supporting record.

Route the next step

Send each document to Nashville for extraction, Orion for reasoning, Polaris for workflow action, or an external system.

Keep improving

Adapt to new document types, changing formats, and evolving intake streams with less manual rule maintenance.

What it enables

Create order before extraction begins.

Split merged files into individual documents

Break large PDFs, packets, scanned batches, and uploads into separate documents that downstream systems can process correctly.

Classify every document type

Identify forms, statements, policies, letters, invoices, applications, IDs, medical records, and supporting materials.

Route documents to the right workflow

Send claims, loan documents, applications, contracts, and case records to the right extraction, review, or automation path.

Group similar unknown documents

Cluster documents by narrative and visual similarity to discover recurring document types across large repositories.

Prepare content for RAG and reasoning

Make sure Orion retrieves and reasons over the right evidence, not a mislabeled or poorly separated file.

Reduce manual mailroom sorting

Replace human-heavy intake triage with automated separation and classification at scale.
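
As a rough illustration of the "Group similar unknown documents" idea above, the sketch below clusters toy embedding vectors in a single pass: each document joins the first cluster whose seed document is similar enough, otherwise it starts a new cluster. The vectors, the `cluster_documents` name, and the 0.9 threshold are assumptions made for illustration; this page does not describe how Chicago actually clusters content.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_documents(
    vectors: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[List[int]]:
    """Group document indices by similarity to each cluster's first member."""
    clusters: List[List[int]] = []
    for index, vector in enumerate(vectors):
        for members in clusters:
            seed = vectors[members[0]]
            if cosine(seed, vector) >= threshold:
                members.append(index)
                break
        else:
            # No existing cluster was close enough: start a new one.
            clusters.append([index])
    return clusters


# Hypothetical 2-d embeddings standing in for real document vectors.
EMBEDDINGS = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
    [1.0, 0.05],
]

print(cluster_documents(EMBEDDINGS))
```

Each resulting group is a candidate document type that a person could review and name. A single-pass approach like this depends on input order, so a production system would likely use a more robust clustering method.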

How it works

From mixed intake to workflow-ready documents.

Chicago organizes inbound content before business logic is applied. It separates, classifies, labels, and routes documents so every downstream step starts from a cleaner input.

Receive

Take in files from the systems where work begins.

Documents arrive from scanners, digital mailrooms, file systems, ECM platforms, workflow tools, host applications, or inbound APIs.

Inbound sources · chicago.intake

scanner · SFTP · live
digital mailroom · live
ECM webhook · live
REST /v1/intake · ready

Detect boundaries

Find where one document ends and another begins.

Chicago separates mixed batches into distinct documents, even when a single file contains multiple forms, attachments, letters, or supporting records.

Boundary detection · chicago.split

page 1 · doc A
page 2 · doc A
boundary detected
page 3 · doc B
boundary detected
page 4 · doc C
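
One simple way to picture boundary detection is to compare each page with the page before it and start a new document when they stop resembling each other. The sketch below does that with cosine similarity over toy page vectors. The vectors, the `split_pages` name, and the 0.5 threshold are assumptions for illustration only; the page does not state which technique Chicago uses.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_pages(
    page_vectors: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[List[int]]:
    """Return documents as lists of 1-based page numbers."""
    if not page_vectors:
        return []
    documents: List[List[int]] = [[1]]
    for i in range(1, len(page_vectors)):
        similarity = cosine(page_vectors[i - 1], page_vectors[i])
        if similarity < threshold:
            # Adjacent pages differ sharply: treat this as a boundary.
            documents.append([i + 1])
        else:
            documents[-1].append(i + 1)
    return documents


# Hypothetical 3-d page vectors; real embeddings would be far larger.
PAGES = [
    [0.9, 0.1, 0.0],  # page 1
    [0.8, 0.2, 0.0],  # page 2
    [0.0, 1.0, 0.1],  # page 3
    [0.1, 0.0, 1.0],  # page 4
]

print(split_pages(PAGES))
```

With these toy vectors, pages 1 and 2 stay together and pages 3 and 4 each begin a new document, which mirrors the doc A, doc B, doc C split in the panel.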

Generate signals

Understand the document beyond surface text.

Chicago creates representations of the document’s meaning, structure, and visual patterns so classification is not limited to keywords or templates.

Multi-modal signals · chicago.encode

semantic embedding · 1024d
visual embedding · 768d
layout signature · hash
vocabulary & entities · on
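
To make the four signals concrete, here is a minimal sketch of a container that holds them together. The embedding sizes (1024 and 768) come from the panel above; the `DocumentSignals` class, the `layout_signature` helper, and the idea of hashing a coarse block grid are hypothetical and not taken from Chicago itself.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Tuple

SEMANTIC_DIM = 1024  # size shown in the panel
VISUAL_DIM = 768     # size shown in the panel


def layout_signature(blocks: List[Tuple[str, int, int]]) -> str:
    """Hash a coarse layout: (block kind, grid row, grid column) triples.

    Sorting first makes the signature independent of detection order.
    """
    canonical = "|".join(f"{kind}:{row}:{col}" for kind, row, col in sorted(blocks))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


@dataclass
class DocumentSignals:
    semantic: List[float]
    visual: List[float]
    layout: str
    entities: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject vectors of the wrong size early, before classification.
        if len(self.semantic) != SEMANTIC_DIM:
            raise ValueError(f"semantic embedding must have {SEMANTIC_DIM} values")
        if len(self.visual) != VISUAL_DIM:
            raise ValueError(f"visual embedding must have {VISUAL_DIM} values")


signature = layout_signature([("header", 0, 0), ("table", 2, 0)])
signals = DocumentSignals(
    semantic=[0.0] * SEMANTIC_DIM,  # placeholder values, not real embeddings
    visual=[0.0] * VISUAL_DIM,
    layout=signature,
    entities=["claim number", "policy holder"],
)
print(signals.layout, len(signals.semantic), len(signals.visual))
```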

Classify by meaning

Identify what the document represents.

Documents are labeled based on semantic similarity and context, even when layouts, vendors, scans, or wording differ.

Classified output · chicago.classify

doc A · Claim form · 0.96
doc B · Policy · 0.93
doc C · Medical bill · 0.91
doc D · Correspondence · 0.88
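
Classification by semantic similarity can be pictured as a nearest-prototype lookup: compare a document's embedding with one reference vector per document type and keep the closest match, along with its score. The sketch below assumes toy 3-d vectors, a hypothetical `classify` function, and an arbitrary 0.8 cutoff below which the document is left unlabeled. It illustrates the general technique, not Chicago's model.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical reference vectors, one per known document type.
PROTOTYPES: Dict[str, Sequence[float]] = {
    "Claim form": [1.0, 0.0, 0.0],
    "Policy": [0.0, 1.0, 0.0],
    "Medical bill": [0.0, 0.0, 1.0],
}


def classify(
    vector: Sequence[float],
    prototypes: Dict[str, Sequence[float]] = PROTOTYPES,
    min_score: float = 0.8,
) -> Tuple[str, float]:
    """Return (label, score); label is "unknown" if no type is close enough."""
    scored = [(label, cosine(vector, proto)) for label, proto in prototypes.items()]
    label, score = max(scored, key=lambda pair: pair[1])
    if score < min_score:
        return ("unknown", score)
    return (label, score)


print(classify([0.9, 0.1, 0.0]))  # close to the "Claim form" prototype
print(classify([0.6, 0.6, 0.5]))  # ambiguous, so it falls below the cutoff
```

Because the comparison is between vectors rather than templates, two documents with different layouts can still land on the same label if their embeddings are close.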

Route downstream

Send the right document to the right next step.

Outputs can be returned to the host system, routed to Nashville for extraction, sent to Orion for reasoning, used by Polaris for workflow execution, or passed into external applications.

Routing decisions · chicago.route

Claim → Nashville
Policy → Orion
Correspondence → Polaris

POST /v1/route · 200 OK
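
The routing step can be sketched as a lookup from document label to destination, with a fallback for anything uncertain. The label-to-destination pairs below follow the examples on this page; the `route` function, the `human-review` queue name, and the 0.85 confidence cutoff are assumptions for illustration.

```python
from typing import Dict

# Destinations taken from the examples on this page.
ROUTES: Dict[str, str] = {
    "Claim form": "Nashville",
    "Policy": "Orion",
    "Medical bill": "Nashville",
    "Correspondence": "Polaris",
}

REVIEW_QUEUE = "human-review"  # hypothetical queue name


def route(label: str, confidence: float, min_confidence: float = 0.85) -> str:
    """Pick a destination; send unknown or low-confidence documents to review."""
    if confidence < min_confidence or label not in ROUTES:
        return REVIEW_QUEUE
    return ROUTES[label]


for label, confidence in [
    ("Claim form", 0.96),
    ("Policy", 0.93),
    ("Medical bill", 0.91),
    ("Correspondence", 0.88),
    ("Correspondence", 0.60),
]:
    print(f"{label} ({confidence:.2f}) -> {route(label, confidence)}")
```

Keeping the table as data rather than branching logic means a new document type needs only one new entry.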

The core difference

Traditional systems recognize what a document looks like. Chicago understands what it means.

Rules-based systems work when documents are predictable. They look for fixed layouts, keywords, barcodes, page positions, or template matches. Chicago takes a different approach. It uses embedding-based understanding to compare documents by meaning, structure, and visual context. Two invoices can look different and still be recognized as invoices. Two denial letters can use different wording and still be grouped correctly. A new vendor format can be classified without rebuilding an entire rule library.

Traditional classification | Chicago
Matches fixed templates | Classifies by semantic meaning
Depends on keywords and layout rules | Uses text, visual, and multi-modal signals
Breaks when formats change | Handles variation across vendors and sources
Requires ongoing rule maintenance | Learns and adapts with less manual setup
Struggles with noisy intake | Built for real-world document streams

Chicago does not just ask, “What does this page look like?” It asks, “What is this document?”

Downstream impact

Cleaner intake makes the whole AI pipeline more reliable.

Every downstream step depends on Chicago getting the first step right. Extraction, reasoning, routing, and automation all become more accurate when the system starts with the correct document in the correct context.

Nashville

Better input.

Separated, correctly labeled documents make field extraction more accurate and reduce manual correction.

Orion

Better evidence.

When documents are organized by meaning, retrieval-grounded answers are based on the right sources.

Chicago: cleaner intake → better everything

Polaris

Right workflow.

Correct classification helps agents route, escalate, validate, and act on the right document type.

Human teams

Fewer errors to review.

Better intake means fewer misrouted documents, fewer failed extractions, and less manual cleanup.

Capabilities

Built for noisy, inconsistent, high-volume intake.

Document separation

Automatically split mixed files, packets, scanned batches, and attachments into individual business documents.

Semantic classification

Classify documents by meaning and context instead of relying only on keywords, file names, or fixed layouts.

Visual and textual understanding

Use both document appearance and content meaning to identify documents more reliably.

Clustering and discovery

Group similar documents across large repositories to identify recurring document types and prepare content for downstream AI.

Routing-ready outputs

Return labeled, structured outputs that can be sent to host systems, Nashville, Orion, Polaris, or external workflows.

Continuous improvement

Adapt to new document types and changing intake patterns with less manual rule-building and template maintenance.
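
To show what a routing-ready output might look like, the sketch below assembles the five-page example from the top of this page into a JSON payload. The page count, labels, confidences, destinations, and case reference come from that example; the field names, the `SeparatedDocument` class, and the `build_result` helper are hypothetical, since the page does not publish an output schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class SeparatedDocument:
    doc_id: int
    pages: List[int]
    label: str
    confidence: float
    destination: str


def build_result(case_ref: str, page_count: int, documents: List[SeparatedDocument]) -> dict:
    """Assemble a payload, checking that every page belongs to exactly one document."""
    covered = sorted(page for doc in documents for page in doc.pages)
    if covered != list(range(1, page_count + 1)):
        raise ValueError("documents must cover every page exactly once")
    return {
        "case_ref": case_ref,
        "page_count": page_count,
        "document_count": len(documents),
        "documents": [asdict(doc) for doc in documents],
    }


RESULT = build_result(
    "dec-4471-a",
    5,
    [
        SeparatedDocument(1, [1, 2], "Claim form", 0.96, "Nashville"),
        SeparatedDocument(2, [3], "Policy", 0.93, "Orion"),
        SeparatedDocument(3, [4], "Medical bill", 0.91, "Nashville"),
        SeparatedDocument(4, [5], "Correspondence", 0.88, "Polaris"),
    ],
)

print(json.dumps(RESULT, indent=2))
```

The page-coverage check is a cheap guard against the split errors described earlier, such as a page that was dropped or assigned to two documents.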

Use cases

For any workflow where documents arrive mixed together.

Insurance claims

Forms, photos, evidence — sorted at intake.

Separate claim forms, policy pages, medical bills, adjuster notes, photos, correspondence, and supporting evidence before extraction or adjudication.

~7× fewer manual sorts

Banking & lending

Loan packets, organized at intake.

Organize loan applications, IDs, income statements, bank statements, disclosures, KYC documents, and underwriting materials.

days → mins · packet triage

Government intake

Casework with clean inputs.

Separate applications, eligibility forms, notices, supporting records, appeals, permits, and case correspondence.

100% audit-traceable

Healthcare packets

Prior auth & chart organization.

Organize prior authorization packets, referrals, medical records, lab results, plan documents, and provider correspondence.

PHI stays on-prem

Digital mailroom

Automated intake at scale.

Classify and route scanned batches, inbound PDFs, email attachments, faxes, and business correspondence at intake.

24/7 · no human triage

Legacy archives

Turn archives into usable corpora.

Cluster and label large volumes of historical content so archives become usable for extraction, search, reasoning, and governance.

M+ docs at a time

OmniSuite™

Chicago creates the foundation for everything that follows.

Chicago is the first layer of OmniSuite™. It gives the rest of the platform clean, labeled, context-aware documents to work with.

Chicago

What is this document?

Separates, classifies, and routes inbound content.

Nashville

What facts does it contain?

Extracts fields, tables, entities, handwriting, clauses, and relationships.

Orion

What does the evidence say?

Reasons over structured data and verified enterprise content.

Polaris

What should happen next?

Routes, escalates, validates, and triggers workflow actions.

Chicago identifies the document. Nashville extracts the facts. Orion reasons from them. Polaris moves the work forward.

FAQ

Common questions about Chicago.

What does Chicago actually do?

Chicago separates mixed inbound content into individual business documents and classifies each document by meaning, so downstream systems know what they are processing.

Is Chicago only for digital mailrooms?

No. Digital mailroom is a strong fit, but Chicago also supports claims intake, loan processing, government case intake, healthcare packets, repository cleanup, and any workflow where documents arrive mixed together.

How is Chicago different from template-based classification?

Template systems rely on fixed layouts, keywords, or rules. Chicago uses semantic and visual understanding to classify documents by what they represent, even when formats vary.

Can Chicago separate documents inside one PDF?

Yes. Chicago is designed to detect document boundaries inside mixed files, packets, scans, and merged PDFs.

What happens after Chicago classifies a document?

The document can be routed to Nashville for extraction, Orion for reasoning, Polaris for workflow execution, a human review queue, or an external host system.

Does Chicago improve downstream accuracy?

Yes. When documents are correctly separated and labeled at intake, extraction, retrieval, reasoning, and workflow automation all start from cleaner inputs.

Ready when you are

Start with cleaner intake.

Chicago turns high-volume inbound content into separated, labeled, workflow-ready documents — so every downstream AI step starts with the right context.


We enable highly regulated organizations to build, govern, and operate domain-specific models within their own infrastructure and governance frameworks.
