vlm-ocr

IFTG (ImageFromTextGenerator) is a Python package that simplifies creating robust datasets for OCR models. Generate images from text, apply over 10 built-in noise effects, and customize fonts and layouts. IFTG supports all languages and offers endless noise combinations, including custom noise creation.

Updated Nov 7, 2025
Python

aeilot / OCRit

OCR it on macOS with DeepSeek-OCR

macos swift ocr ai vlm swiftui deepseek vlm-ocr

Updated Nov 24, 2025
Swift

morkev / vlm-yolo-detector

Document image retrieval via MCP or API for agentic systems using semantic embeddings, YOLO, and VLM classification.

api mcp yolo image-classification manufacturing schematics image-retrieval vlm pymupdf pdf-document-processor semantic-embedding machine-classification schematic-diagram ollama ollama-api vlm-ocr diagram-extraction vlm-yolo

Updated Jan 21, 2026
Python

Niraya666 / DocuLingo

DocuLingo is a powerful document parsing tool built with multimodal large language models to enhance RAG (Retrieval Augmented Generation) workflows.

document-converting rag vlm-ocr

Updated May 7, 2025
Python

Takk8IS / CyberTechVLMDetector

The CyberTech VLM Detector is a computer vision system designed to run entirely on edge devices, without requiring cloud access. It uses vision-language models (VLMs) to detect and locate objects in images based on natural language commands. The project description also credits the author's creation of HIM™ and MAIC™.

python camera view read detector vlm vlms takk8is takk-ag takk-design davidccavalcante vlm-ocr

Updated Jul 24, 2025
Python

Timur-SA / smart-grant

A continuation of the ideas behind the SmartGrant project prototype, created during the Young Scientist Hackathon 2025. The system automatically generates smart contracts from an analysis of budget estimates, using LLMs to ensure transparency in the targeted use of grant funds.

ml fintech grants-management llm vlm-ocr

Updated Jan 25, 2026
Python

posicube-services / KolmOCR

Korean olmOCR

document vlm vlm-ocr

Updated Jan 6, 2026
HTML

sevkaz / Vision-Language-Video-Scanner-

VLM-first video frame scanner that analyzes video frames with a vision-language model and optional OCR.

youtube ai frames vlm vlm-ocr

Updated Jan 25, 2026
Python

Here are 14 public repositories matching this topic...

bytedance / Dolphin

vlm-run / vlmrun-hub

Roots-Automation / GutenOCR

kaminoer / KokoDOS

video-db / ocr-benchmark

seanpedrick-case / doc_redaction

OmarSamirz / ImageFromTextGenerator

aeilot / OCRit

morkev / vlm-yolo-detector

Niraya666 / DocuLingo

Takk8IS / CyberTechVLMDetector

Timur-SA / smart-grant

posicube-services / KolmOCR

sevkaz / Vision-Language-Video-Scanner-
