The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
Updated Dec 17, 2025 - Python
A hub for various industry-specific schemas to be used with VLMs.
Yet another self-hosted AI voice assistant. GlaDOS' blazing-fast pipeline with Kokoro TTS voice and vision.
Redact PDF/image-based documents, Word, or CSV/XLSX files using a graphical user interface. Demo: https://huggingface.co/spaces/seanpedrickcase/document_redaction or try it with VLMs: https://huggingface.co/spaces/seanpedrickcase/document_redaction_vlm
IFTG (ImageFromTextGenerator) is a Python package that simplifies creating robust datasets for OCR models. Generate images from text, apply over 10 built-in noise effects, and customize fonts and layouts. IFTG supports all languages and offers endless noise combinations, including custom noise creation.
Document image retrieval via MCP or API for agentic systems using semantic embeddings, YOLO, and VLM classification.
DocuLingo is a powerful document parsing tool built with multimodal large language models to enhance RAG (Retrieval Augmented Generation) workflows.
The CyberTech VLM Detector is a computer vision system designed to run entirely on edge devices, without requiring cloud access. The system uses vision-language models (VLMs) to detect and locate objects in images based on natural language commands. Its development includes my creation of HIM™ and MAIC™.
A continuation of the ideas behind the SmartGrant project prototype, created during Young Scientist Hackathon 2025. A system for automated generation of smart contracts based on analysis of budget estimates, using LLMs to ensure transparency in the targeted use of grant funds.