This directory contains benchmarks, tests, and exploratory scripts for the StringZilla library. They cover the library's own internals rather than third-party alternatives.
- For comparative performance analysis, please refer to StringWars.
- To understand the distributional properties of hash functions, see HashEvals.
Benchmarks validate SIMD-accelerated backends against serial baselines and measure throughput on real-world workloads.
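The validate-then-measure pattern can be illustrated with a minimal C++ sketch that does not use StringZilla itself. The names `find_byte_serial`, `find_byte_fast`, `validate`, and `throughput_gbps` are hypothetical. `std::memchr` only stands in for a SIMD backend, on the assumption that the C library vectorizes it. The sketch first checks that both functions agree on many random inputs, and only then times the candidate.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <random>
#include <string>

// Serial baseline: scan one byte at a time; "not found" is reported as `length`.
std::size_t find_byte_serial(const char *text, std::size_t length, char needle) {
    for (std::size_t i = 0; i != length; ++i)
        if (text[i] == needle) return i;
    return length;
}

// Hypothetical "accelerated" candidate: std::memchr is usually vectorized by the C library.
std::size_t find_byte_fast(const char *text, std::size_t length, char needle) {
    const void *hit = std::memchr(text, static_cast<unsigned char>(needle), length);
    return hit ? static_cast<std::size_t>(static_cast<const char *>(hit) - text) : length;
}

// Correctness first: both functions must agree on random inputs over a small alphabet.
bool validate(std::size_t trials, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> letter('a', 'd');
    std::uniform_int_distribution<std::size_t> length(0, 300);
    for (std::size_t t = 0; t != trials; ++t) {
        std::string text(length(rng), '\0');
        for (char &c : text) c = static_cast<char>(letter(rng));
        char needle = static_cast<char>(letter(rng));
        if (find_byte_serial(text.data(), text.size(), needle) !=
            find_byte_fast(text.data(), text.size(), needle))
            return false;
    }
    return true;
}

// Throughput in GB/s; assumes `text` contains no 'z', so every call scans the whole buffer.
double throughput_gbps(std::size_t (*find)(const char *, std::size_t, char), const std::string &text,
                       int repetitions) {
    assert(repetitions > 0);
    const char *volatile data = text.data(); // volatile read per call: the scan cannot be hoisted
    volatile std::size_t sink = 0;           // volatile store per call: the scan cannot be dropped
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repetitions; ++r) sink = find(data, text.size(), 'z');
    auto stop = std::chrono::steady_clock::now();
    (void)sink;
    double seconds = std::chrono::duration<double>(stop - start).count();
    return static_cast<double>(text.size()) * repetitions / seconds / 1e9;
}
```

A typical flow would require `validate(10000, 42)` to hold before reporting `throughput_gbps(find_byte_fast, std::string(1 << 20, 'a'), 100)`. The `volatile` pointer and sink keep the compiler from hoisting or eliminating the loop-invariant call inside the timed region.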
- `bench_find.cpp` - bidirectional substring search, byte search, and byteset search
- `bench_token.cpp` - token-level operations: hashing, checksums, equality, and ordering
- `bench_sequence.cpp` - sorting, partitioning, and set intersections of string arrays
- `bench_memory.cpp` - memory operations: copies, moves, fills, and lookup table transformations
- `bench_container.cpp` - STL associative containers (`std::map`, `std::unordered_map`) with string keys
- `bench_similarities.cpp` - Levenshtein, Needleman-Wunsch, Smith-Waterman scoring on CPU
- `bench_fingerprints.cpp` - MinHash rolling fingerprints and multi-pattern search on CPU
- `bench_similarities.cu` - similarity scoring algorithms on CUDA GPUs
- `bench_fingerprints.cu` - fingerprinting algorithms on CUDA GPUs
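Byteset search may be the least familiar operation in the list above. It finds the first (or last) byte that belongs to a set of bytes, much like `std::string::find_first_of`. A minimal serial sketch, assuming a 256-bit membership bitmap, could look like the block below. The `byteset` type and the function names are hypothetical, and this is not StringZilla's SIMD kernel.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// One bit per possible byte value: 256 bits stored as four 64-bit words.
struct byteset {
    std::array<std::uint64_t, 4> words{};
    void add(unsigned char c) { words[c >> 6] |= std::uint64_t{1} << (c & 63); }
    bool contains(unsigned char c) const { return (words[c >> 6] >> (c & 63)) & 1u; }
};

byteset make_byteset(std::string_view members) {
    byteset set;
    for (char c : members) set.add(static_cast<unsigned char>(c));
    return set;
}

// Forward scan: offset of the first byte that belongs to `set`, or npos.
std::size_t find_first_in_set(std::string_view text, const byteset &set) {
    for (std::size_t i = 0; i != text.size(); ++i)
        if (set.contains(static_cast<unsigned char>(text[i]))) return i;
    return std::string_view::npos;
}

// Reverse scan, the other direction of a bidirectional search: offset of the last match, or npos.
std::size_t find_last_in_set(std::string_view text, const byteset &set) {
    for (std::size_t i = text.size(); i != 0; --i)
        if (set.contains(static_cast<unsigned char>(text[i - 1]))) return i - 1;
    return std::string_view::npos;
}
```

An accelerated backend can apply the same membership test to many bytes at once. The scalar loop here only pins down the semantics that a benchmark would compare against.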
All benchmarks are configured through environment variables. See the header comment of each file for the variables it supports.
Unit tests validate correctness across all backends and programming languages.
- `test_stringzilla.cpp` - C++ API tests against STL baselines
- `test_stringzilla.py` - Python API tests against native strings
- `test_stringzillas.cpp` - parallel CPU backend tests
- `test_stringzillas.cu` - CUDA backend tests
- `test.js` - JavaScript API tests
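Testing against STL baselines amounts to a differential test: run a candidate search and `std::string_view::find` on the same random inputs and require identical answers. In the sketch below, a Boyer-Moore-Horspool search stands in for an optimized backend. `horspool_find` and `matches_stl` are hypothetical names, and the actual tests in `test_stringzilla.cpp` may be organized differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <string_view>

// Hypothetical candidate: Boyer-Moore-Horspool substring search.
std::size_t horspool_find(std::string_view haystack, std::string_view needle) {
    if (needle.empty()) return 0; // matches std::string_view::find semantics
    if (needle.size() > haystack.size()) return std::string_view::npos;
    // Shift table: how far the window may jump when its last byte is `c`.
    std::array<std::size_t, 256> shift;
    shift.fill(needle.size());
    for (std::size_t i = 0; i + 1 < needle.size(); ++i)
        shift[static_cast<unsigned char>(needle[i])] = needle.size() - 1 - i;
    std::size_t pos = 0;
    while (pos + needle.size() <= haystack.size()) {
        if (haystack.compare(pos, needle.size(), needle) == 0) return pos;
        pos += shift[static_cast<unsigned char>(haystack[pos + needle.size() - 1])];
    }
    return std::string_view::npos;
}

// Differential test: the candidate must agree with the STL baseline on every random input.
bool matches_stl(std::size_t trials, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> letter('a', 'c');
    std::uniform_int_distribution<std::size_t> haystack_length(0, 64), needle_length(0, 6);
    for (std::size_t t = 0; t != trials; ++t) {
        std::string haystack(haystack_length(rng), '\0'), needle(needle_length(rng), '\0');
        for (char &c : haystack) c = static_cast<char>(letter(rng));
        for (char &c : needle) c = static_cast<char>(letter(rng));
        if (horspool_find(haystack, needle) != std::string_view(haystack).find(needle)) return false;
    }
    return true;
}
```

A tiny alphabet such as `{a, b, c}` makes partial matches and repeated prefixes frequent, which is where off-by-one errors in shift logic tend to surface. Empty needles and needles longer than the haystack are covered by the same random sizes.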
Jupyter notebooks for algorithm visualization and analysis.
- `explore_levenshtein.ipynb` - edit distance algorithms and diagonal traversal
- `explore_fingerprint.ipynb` - MinHash and rolling fingerprints
- `explore_unicode.ipynb` - UTF-8 handling and Unicode normalization
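Diagonal traversal of the edit-distance matrix rests on one observation. Cell (i, j) depends only on cells whose indices sum to i + j - 1 or i + j - 2, so all cells on the same anti-diagonal are independent of each other. The sketch below assumes plain byte-level Levenshtein distance with unit costs and computes the same value both row by row and diagonal by diagonal. Both function names are hypothetical, and the full matrix is kept for clarity even though three diagonals would suffice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

// Row-major Wagner-Fischer baseline with two rolling rows.
std::size_t levenshtein_rows(std::string_view a, std::string_view b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t substitution = prev[j - 1] + (a[i - 1] != b[j - 1]);
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, substitution});
        }
        std::swap(prev, curr); // `prev` now holds row i
    }
    return prev[b.size()];
}

// Anti-diagonal order: diagonal d holds every cell with i + j == d.
std::size_t levenshtein_diagonals(std::string_view a, std::string_view b) {
    const std::size_t rows = a.size() + 1, cols = b.size() + 1;
    std::vector<std::size_t> matrix(rows * cols);
    auto at = [&](std::size_t i, std::size_t j) -> std::size_t & { return matrix[i * cols + j]; };
    for (std::size_t d = 0; d + 2 <= rows + cols; ++d) { // d = 0 .. (rows - 1) + (cols - 1)
        std::size_t i_first = d < cols ? 0 : d - (cols - 1);
        std::size_t i_last = std::min(d, rows - 1);
        // Iterations of this loop are independent: they read only diagonals d - 1 and d - 2.
        for (std::size_t i = i_first; i <= i_last; ++i) {
            std::size_t j = d - i;
            if (i == 0) at(i, j) = j;
            else if (j == 0) at(i, j) = i;
            else at(i, j) = std::min({at(i - 1, j) + 1, at(i, j - 1) + 1,
                                      at(i - 1, j - 1) + (a[i - 1] != b[j - 1])});
        }
    }
    return at(rows - 1, cols - 1);
}
```

The inner loop of `levenshtein_diagonals` is the part a SIMD or GPU backend could spread across lanes or threads. In the row-major version, each cell depends on its left neighbor, which serializes the loop.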