Tags: copyleftdev/pdfvec
Release v0.1.0

Initial release of pdfvec: high-performance PDF text extraction for vectorization pipelines.

Features:
- PDF text extraction (parallel and streaming)
- Structured Document/Page API
- Text chunking (fixed, paragraph, and sentence strategies)
- Metadata extraction (title, author, dates)
- CLI tool

Performance: 40-134 MiB/s (15-143x faster than pdf-extract)
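The three chunking strategies named above can be illustrated with a minimal, generic sketch in Rust. This is not pdfvec's API. The function names (`chunk_fixed`, `chunk_paragraphs`, `chunk_sentences`) and the `size`/`overlap` parameters are hypothetical, and the paragraph and sentence rules are deliberately naive assumptions (blank-line splits; `.`, `!`, `?` followed by whitespace or end of text).

```rust
/// Hypothetical fixed-size strategy: windows of at most `size` chars,
/// each starting `size - overlap` chars after the previous one.
fn chunk_fixed(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "need size > 0 and overlap < size");
    let chars: Vec<char> = text.chars().collect(); // char-based, so UTF-8 safe
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // last window reached the end of the text
        }
        start += step;
    }
    chunks
}

/// Hypothetical paragraph strategy: split on blank lines, drop empty pieces.
fn chunk_paragraphs(text: &str) -> Vec<String> {
    text.split("\n\n")
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .map(String::from)
        .collect()
}

/// Hypothetical sentence strategy (naive): end a sentence at '.', '!' or '?'
/// when followed by whitespace or the end of the text.
fn chunk_sentences(text: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut current = String::new();
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        current.push(c);
        let at_boundary = matches!(c, '.' | '!' | '?')
            && chars.peek().map_or(true, |n| n.is_whitespace());
        if at_boundary {
            let s = current.trim();
            if !s.is_empty() {
                out.push(s.to_string());
            }
            current.clear();
        }
    }
    // Keep any trailing text that has no terminating punctuation.
    let rest = current.trim();
    if !rest.is_empty() {
        out.push(rest.to_string());
    }
    out
}
```

For example, `chunk_fixed("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`: each window shares one character with the previous one, which is a common way to keep context across chunk boundaries before embedding.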