Most AI pipelines are only as good as the data we provide them with, and that usually means PDFs or other unstructured documents.
Contracts, invoices, reports... each one mixes its own layout, language, and context, and getting reliable structured data out of them is still one of the hardest unsolved problems in enterprise AI.
Parse-Flow is an open-source project we built to tackle this head-on. It puts four document processing primitives at the center of a visual workflow designer:
Parse: clean markdown and text from raw documents
Classify: assign documents to user-defined categories
Split: segment documents into typed chunks
Extract: pull structured JSON against a schema
You drag steps onto a canvas, drop in a document, and watch events stream back as the pipeline runs. Under the hood it's powered by a LlamaAgents workflow that walks your flow one step at a time, making every transition observable and every failure a first-class value.
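To make the "one step at a time" idea concrete, here is a minimal sketch in plain Python. It does not use the LlamaAgents API or any Parse-Flow code: `run_flow`, `StepOk`, `StepFailed`, and the toy `parse` / `classify` / `extract` functions are all hypothetical names made up for illustration. It only shows the pattern of emitting one event per transition and returning a failure as an ordinary value instead of letting an exception escape.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Union


@dataclass
class StepOk:
    step: str
    output: object


@dataclass
class StepFailed:
    step: str
    error: str


Event = Union[StepOk, StepFailed]
Step = tuple[str, Callable[[object], object]]


def run_flow(steps: list[Step], document: object) -> Iterator[Event]:
    """Walk the flow one step at a time, yielding one event per transition."""
    current = document
    for name, fn in steps:
        try:
            current = fn(current)
        except Exception as exc:
            # The failure becomes a regular event; the flow stops here.
            yield StepFailed(name, f"{type(exc).__name__}: {exc}")
            return
        yield StepOk(name, current)


# Toy stand-ins for the real primitives (assumptions, not the project's logic).
def parse(raw: str) -> str:
    return raw.strip()


def classify(text: str) -> dict:
    label = "invoice" if "invoice" in text.lower() else "other"
    return {"text": text, "category": label}


def extract(doc: dict) -> dict:
    if doc["category"] != "invoice":
        raise ValueError("no schema registered for this category")
    total = doc["text"].split("total:")[1].strip()
    return {"category": doc["category"], "total": total}


FLOW = [("parse", parse), ("classify", classify), ("extract", extract)]

if __name__ == "__main__":
    for event in run_flow(FLOW, "  Invoice #17 total: 42.00  "):
        print(event)
```

Because `run_flow` is a generator, a caller can forward each event to a UI as soon as it is produced, and a document that fails at `extract` still yields the successful `parse` and `classify` events before the final `StepFailed`.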
Full write-up on the architecture here: https://www.llamaindex.ai/blog/designing-a-visual-document-intelligence-workflow-with-llamaparse?utm_medium=socials&utm_source=twitter&utm_campaign=2026-jun-
Source code: http://github.com/run-llama/parse-flow
