Tags: copyleftdev/pdfvec

v0.1.1

Release v0.1.1

- Add README.md with documentation and examples
- Add LICENSE-MIT and LICENSE-APACHE
- Fix repository URL in Cargo.toml

v0.1.0

Release v0.1.0

Initial release of pdfvec: high-performance PDF text extraction for vectorization pipelines.

Features:
- PDF text extraction (parallel & streaming)
- Structured Document/Page API
- Text chunking (fixed, paragraph, sentence strategies)
- Metadata extraction (title, author, dates)
- CLI tool

Performance: 40-134 MiB/s (15-143x faster than pdf-extract)
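
To make the chunking strategies listed under Features more concrete, here is a minimal sketch of what fixed-size and paragraph chunking could look like. This is not pdfvec's actual API: the function names `chunk_fixed` and `chunk_paragraphs`, the `overlap` parameter, and the choice to count characters rather than bytes or tokens are all assumptions made for illustration.

```rust
/// Hypothetical helper (not part of pdfvec): split `text` into chunks of at
/// most `size` characters, with `overlap` characters shared between
/// consecutive chunks.
fn chunk_fixed(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0, "chunk size must be positive");
    assert!(overlap < size, "overlap must be smaller than chunk size");

    // Work on chars so a chunk boundary never splits a UTF-8 code point.
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap;
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;

    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // the last chunk reached the end of the text
        }
        start += step;
    }
    chunks
}

/// Hypothetical helper (not part of pdfvec): treat blank-line-separated
/// blocks as paragraphs and drop empty ones.
fn chunk_paragraphs(text: &str) -> Vec<String> {
    text.split("\n\n")
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .map(String::from)
        .collect()
}

fn main() {
    for chunk in chunk_fixed("abcdefghij", 4, 1) {
        println!("{chunk}");
    }
    for para in chunk_paragraphs("First paragraph.\n\nSecond paragraph.\n") {
        println!("{para}");
    }
}
```

Overlapping fixed-size chunks are a common choice in vectorization pipelines because text near a boundary then appears in both neighboring chunks. Paragraph chunking keeps semantically related sentences together, at the cost of uneven chunk sizes.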