
Unsiloed AI

API for parsing multimodal unstructured data

AI teams spend 6+ months building document workflows, yet fewer than 10% ever reach production. Generic LLM parsers and OCR collapse on multimodal documents with text, tables, images, and charts. Poor parsing and suboptimal chunking cripple RAG pipelines and downstream automation. Unsiloed AI has built state-of-the-art vision models that serve as the infrastructure layer for turning unstructured data into structured, queryable, and LLM-ready assets. Our APIs are already parsing hundreds of thousands of documents for startups and NASDAQ-listed enterprises, powering vertical AI solutions across industries. On public benchmarks, Unsiloed AI consistently outperforms solutions from LlamaIndex, Gemini, Mistral, and Unstructured.io, among others.
Active Founders
Adnan Abbas
Founder
Co-founder & CTO at Unsiloed AI • MIT • IIT Kharagpur. Built multi-modal models deployed at a Fortune 10 company. Previously built autonomous navigation systems at Mercedes-Benz. Launched India’s first Web 3.0 audio app while in college, scaling it to thousands of users within a month.
Aman Mishra
Founder
Co-founder & CEO at Unsiloed AI • IIT Kharagpur. Previously built an ultra-low-latency trading system moving billions at a hedge fund. Founding engineer (#1) at an SF-based stealth startup building AI copilots for firms like Goldman Sachs and Charles Schwab. Launched a P2P rental platform from his dorm room, scaling it to 7-figure ARR within 3 months.
Company Launches
Unsiloed AI: Make Unstructured Data LLM-Ready

Hey YC, we are Aman Mishra and Adnan Abbas from Unsiloed AI.

TL;DR: Unsiloed AI is building the most accurate APIs for ingesting multimodal unstructured data like PDFs, PPTs, DOCX files, tables, charts, and images, and converting it into structured Markdown and JSON for downstream LLMs and AI Agents.

We are already processing millions of pages of complex documents each week for Fortune 150 banks and NASDAQ‑listed companies, as well as early‑stage startups in accuracy-sensitive domains like finance, legal, and healthcare.

https://youtu.be/ULDK5dgfgzM

The Problem

More than 80% of enterprise data is multimodal and unstructured. AI teams spend 6+ months building accurate document‑ingestion pipelines that keep breaking.

  • Despite the many open‑source solutions out there, it’s still tough to achieve high accuracy on even mildly complex cases.
  • Traditional OCR is static and breaks when layouts change.
  • LLMs, although good at comprehension, struggle with deterministic extraction, making them unreliable for accuracy‑sensitive domains like finance and healthcare.
  • Early‑stage vertical AI teams end up becoming document AI companies and reinventing the wheel, as is evident from the 300+ conversations we’ve had with AI teams of all sizes.

Solution (What we built)

We combine vision models with OCR‑based models to accurately extract information from complex documents.

1) Pre‑processing & Segmentation

  • We segment data into texts, tables, images, and plots using specialized models for each task.
  • We use a heatmap‑based chunking technique that first generates pivot elements from the document. Pivot elements are the elements of importance, e.g., numbers and merged cells in tables.
  • This chunking strategy ensures all related pieces of information are preserved in the same chunk (e.g., a table spanning multiple pages, rows split across pages), while unrelated information is split across chunks. The result: retrieval feeds only accurate, complete‑context chunks to LLMs.
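
To make the grouping idea concrete, here is a toy Python sketch of that last property: rows of one table stay together in a single chunk even across page breaks, while low‑importance text rides along with its neighbors. Everything in it (the Element type, the pivot_score heuristic, the threshold) is illustrative, not our production pipeline.

```python
from dataclasses import dataclass

@dataclass
class Element:
    kind: str                      # "text", "table_row", "image", "plot"
    page: int
    content: str
    table_id: str | None = None    # rows of the same table share an id

def pivot_score(el: Element) -> float:
    # Toy importance heuristic: table rows and numeric content score high.
    score = 1.0 if el.kind == "table_row" else 0.0
    if any(ch.isdigit() for ch in el.content):
        score += 0.5
    return score

def chunk(elements: list[Element], threshold: float = 0.5) -> list[list[Element]]:
    """Group elements so all rows of one table land in one chunk, even when
    the table spans a page break; other text splits by pivot score."""
    chunks: list[list[Element]] = []
    tables: dict[str, list[Element]] = {}
    for el in elements:
        if el.table_id is not None:
            tables.setdefault(el.table_id, []).append(el)
        elif chunks and pivot_score(el) < threshold:
            chunks[-1].append(el)      # low-importance text rides with its neighbor
        else:
            chunks.append([el])        # a pivot element starts a fresh chunk
    return chunks + list(tables.values())

docs = [
    Element("text", 1, "Overview of results"),
    Element("table_row", 1, "Revenue | 1,204", table_id="t1"),
    Element("table_row", 2, "Costs | 980", table_id="t1"),   # same table, next page
    Element("text", 2, "Notes on methodology"),
]
print([[e.content for e in c] for c in chunk(docs)])
```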

2) Dual‑Stream Representation

After pre‑processing, we pass the segmented chunks through two parallel streams:

  • Data Stream: preserves the extracted content.
  • Layout Stream: preserves the actual layout & hierarchy (indentation, alignment, clause/sub‑clause structure).

This matters because the data is not just text and numbers; the structure itself carries meaning (e.g., a right‑aligned cell in a financial table, or the way clauses and sub‑clauses are arranged). The dual stream captures both the semantic content and the structural cues.
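
As a simplified illustration (these structures are toy stand‑ins, not our internal format), you can picture the two streams as parallel records per element:

```python
from dataclasses import dataclass

@dataclass
class DataStream:
    text: str                  # extracted content only

@dataclass
class LayoutStream:
    alignment: str             # e.g. "right" for a right-aligned financial cell
    indent_level: int          # clause / sub-clause depth
    row: int | None = None     # position inside a table, if any
    col: int | None = None

@dataclass
class DualStreamCell:
    data: DataStream
    layout: LayoutStream

# The value "(1,204)" alone is ambiguous; the layout stream records the
# structural cue (right-aligned table cell) that gives it meaning downstream.
cell = DualStreamCell(
    data=DataStream(text="(1,204)"),
    layout=LayoutStream(alignment="right", indent_level=0, row=7, col=3),
)
print(cell)
```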

3) Domain‑Specific Decoder

  • A decoder consumes both the streams and structures the outputs as per the required JSON schema or Markdown.
  • We incorporate domain‑specific ontologies (finance, healthcare, legal).
  • A built‑in RL pipeline trains the decoder when outputs involve internal terminology that is hard to capture with a general LLM.
  • We generate confidence scores for each extracted item; low‑score items are collected over time to run fine‑tuning jobs (see the sketch below).

We can also run all of this in fully air-gapped, on-premise environments for privacy-sensitive verticals.
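
To make the confidence‑score loop concrete, here is a minimal sketch; the threshold and field shapes are made up for illustration and are not our actual schema:

```python
CONFIDENCE_THRESHOLD = 0.85   # assumed cutoff for illustration, not our real value

finetune_queue: list[dict] = []

def route(extraction: dict) -> dict:
    """Keep high-confidence fields; queue low-confidence ones so they can
    feed the fine-tuning jobs described above."""
    accepted = {}
    for name, item in extraction["fields"].items():
        if item["confidence"] >= CONFIDENCE_THRESHOLD:
            accepted[name] = item["value"]
        else:
            finetune_queue.append({"field": name, **item})
    return accepted

sample = {
    "fields": {
        "net_income": {"value": "1,204,000", "confidence": 0.97, "citation": "p. 12, table 3"},
        "ebitda":     {"value": "unclear",   "confidence": 0.41, "citation": "p. 14"},
    }
}
print(route(sample))           # {'net_income': '1,204,000'}
print(len(finetune_queue))     # 1
```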

Here are some sample outputs generated by our Vision Models:

Pie chart converted to formatted Markdown.

JSON output from handwritten, scanned docs, along with confidence scoring and citations.

Our Progress:

We are already processing millions of pages for Fortune 150 banks and NASDAQ‑listed companies, as well as early‑stage startups (including 10+ YC startups) across finance, legal, and healthcare. On public benchmarks, we consistently outperform solutions from LlamaIndex, Gemini, Mistral, and Unstructured.io, among others.

Here is a representation of the volume of pages we have processed, stacked on top of each other.

Our Ask:

Parsing PDFs, images, PPTs, or Excel files for your vertical AI use case or RAG pipeline? Give Unsiloed AI a try. We turn months of ingestion work into one API call for every document type.
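
For a feel of the integration, here is a hypothetical sketch in Python; the endpoint, parameters, and response shape are placeholders, not our documented API (see unsiloed.ai for the real docs):

```python
import requests

API_URL = "https://api.unsiloed.ai/v1/parse"    # placeholder URL for illustration

with open("quarterly_report.pdf", "rb") as f:
    resp = requests.post(
        API_URL,
        headers={"Authorization": "Bearer <YOUR_API_KEY>"},
        files={"file": f},                       # PDFs, images, PPTs, Excel, ...
        data={"output_format": "markdown"},      # or a target JSON schema
    )
resp.raise_for_status()
print(resp.json())
```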

Sign up at unsiloed.ai (no credit card needed).

For any queries or feedback:

Shoot us an email at founders@unsiloed.ai
Ping us on WhatsApp/iMessage at +1 415 996 5878 (Aman)
Website: https://www.unsiloed.ai/

Unsiloed AI
Founded: 2025
Batch: Fall 2025
Team Size: 2
Status: Active
Location: San Francisco
Primary Partner: Nicolas Dessaigne