Sassy is a library and tool for searching short strings in texts, a problem that goes by many names:
- approximate string matching,
- pattern matching,
- fuzzy searching.
The motivating application is searching short (length 20 to 100) DNA sequences in a human genome or e.g. in a set of reads. Sassy generally works well for patterns/queries up to length 1000, and supports the ASCII, DNA, and IUPAC alphabets.
It has a grep-like mode for quick human inspection, as well as search to
report locations of matches, and filter to only output (non)-matching records.
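To make the problem concrete, here is a minimal sketch of approximate string matching using the classic dynamic-programming recurrence, in which a match may start anywhere in the text for free. This is not how Sassy works internally (Sassy uses bitpacking and SIMD); the helper name `end_costs` is hypothetical and only serves the illustration.

```python
def end_costs(pattern: bytes, text: bytes) -> list[int]:
    """costs[j] = minimum edit distance between `pattern` and any
    substring of `text` that ends at (exclusive) position j."""
    # Row for the empty pattern prefix: a match may start anywhere for free.
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, start=1):
        cur = [i]  # Aligning i pattern characters against nothing costs i.
        for j, t in enumerate(text, start=1):
            cur.append(min(
                prev[j - 1] + (p != t),  # match or mismatch
                prev[j] + 1,             # pattern character missing from the text
                cur[j - 1] + 1,          # extra character in the text
            ))
        prev = cur
    return prev


costs = end_costs(b"ATCG", b"AAAATTGAAA")
# End positions where the pattern matches with at most k=1 edits.
hits = [(end, cost) for end, cost in enumerate(costs) if cost <= 1]
print(hits)  # [(7, 1)]
```

This sketch takes time proportional to the pattern length times the text length, which is the cost that bit-parallel and SIMD approaches aim to reduce.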
Feature highlights:
- Sassy uses bitpacking and SIMD (both AVX2 and NEON supported). Its main novelty is tiling these in the text direction.
- Support for overhang alignments where the pattern extends beyond the text. (See paper appendix for details.)
- Support for (case-insensitive) ASCII, DNA (ACGT), and IUPAC (=ACGT+NYR...) alphabets.
- Rust library (cargo add sassy), binary (cargo install sassy, see details below), Python bindings (pip install sassy-rs), and C bindings (see below).
See the papers, detailed docs on docs.rs, and corresponding evals in evals/:
Rick Beeloo and Ragnar Groot Koerkamp.
Sassy2: Batch Searching of Short DNA Patterns
bioRxiv, March 2026.
https://doi.org/10.64898/2026.03.10.710811
and
Rick Beeloo and Ragnar Groot Koerkamp.
Sassy: Fuzzy Searching DNA Sequences using SIMD
bioRxiv, July 2025.
https://doi.org/10.1101/2025.07.22.666207
See the latest release.
You can also get these via
cargo binstall sassy
or via conda/mamba/pixi:
conda install -c bioconda sassy
AVX-512: The prebuilt x86-64 binaries distributed via GitHub
releases and bioconda (but not PyPI) target both AVX2 and AVX-512 using cargo multivers.
Specifically, version 2 (sassy {search,crispr} --v2) is 2x faster with AVX-512.
RUSTFLAGS="-C target-cpu=native" cargo install --profile dist --git https://github.com/RagnarGrootKoerkamp/sassy sassy
Sassy uses AVX2 or NEON instructions for performance reasons, which requires either
target-cpu=native or target-cpu=x86-64-v3 on x64 machines.
See this README for details and this
blog for background.
The same restrictions apply when using the sassy library in a larger project.
Sassy2 can also use AVX-512.
This requires target-cpu=x86-64-v4, target-cpu=native on a machine
with AVX-512 support, or cargo multivers --profile dist.
Sassy requires Rust 1.91 or newer. Get it via rustup update. (Switch to
rustup if your system installation is too old.)
Sassy can be used via the CLI, or as a Rust, Python, or C library.
The library can be used to search for ASCII or DNA strings.
A larger example can be found in src/lib.rs.
// cargo add sassy
use sassy::{Searcher, Match, profiles::Iupac, Strand};
let pattern = b"ATCG";
let text = b"AAAATTGAAA";
let k = 1;
// The Iupac profile supports N and YR... characters.
// If you are sure you only have ACGT input, then `profiles::Dna` is slightly faster.
let mut searcher = Searcher::<Iupac>::new_fwd();
let matches = searcher.search(pattern, &text, k);
assert_eq!(matches.len(), 1);
assert_eq!(matches[0].text_start, 3);
assert_eq!(matches[0].text_end, 7);
assert_eq!(matches[0].cost, 1);
assert_eq!(matches[0].strand, Strand::Fwd);
assert_eq!(matches[0].cigar.to_string(), "2=1X1=");
When searching multiple equally long (<=64bp) patterns, you can pre-encode the patterns. This is around 10-20x faster for short texts (<=200bp), and 2-3x faster for longer texts.
use sassy::{Searcher, Match, profiles::Iupac, Strand};
let patterns = [b"ATG".to_vec(), b"TTT".to_vec()];
let text = b"CCCCATGCCCCTTT";
let k = 1;
let mut searcher = Searcher::<Iupac>::new_fwd();
let encoded = searcher.encode_patterns(&patterns);
let matches = searcher.search_encoded_patterns(&encoded, text, k);
assert_eq!(matches.len(), 2);
assert_eq!(matches[0].text_start, 4); // ATG
assert_eq!(matches[1].text_start, 11); // TTT
The CLI can be used via:
- sassy grep: to show nicely coloured output.
- sassy search: to write a .tsv of matching locations.
- sassy filter: to write a .fasta/.fastq of (non)-matching records.
- sassy crispr: to search for CRISPR guides.
grep, search, and filter all take the same arguments, and are implemented
by forwarding to grep. Thus, they can all be combined via e.g.
sassy grep -p ACGTCAAACCTA -k 3 --matches matches.tsv --output filtered.fastq reads.fastq.gz
Search a pattern ATGAGCA in text.fasta with ≤1 edit:
sassy search --pattern ATGAGCA -k 1 text.fasta
or search for all records of a fasta file by using --pattern-fasta <fasta-file> instead of --pattern.
The grep output is coloured:
- green shows matching characters,
- orange shows mismatches,
- red shows deleted characters (in pattern but not in text),
- blue shows inserted characters (in text but not in pattern).

patterns.fasta
>p1
ATGAGCA
>p2
TTAAATA
sassy search --pattern-fasta patterns.fasta -k 1 text.fasta
If your patterns.fasta has many (>8) patterns that are equally long and <=64bp, enable V2 with
--v2 for higher throughput:
sassy search --pattern-fasta patterns.fasta -k 1 text.fasta --v2

sassy search -p GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG -k 2 reads.fa > matches.tsv
# or
sassy search -p GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG -k 2 reads.fa --matches matches.tsv
# or
sassy grep -p GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG -k 2 reads.fa --matches matches.tsv
gives .tsv output like this:
pat_id text_id cost strand start end match_region cigar
pattern AC_000001.1__1_1 0 + 6 48 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=
pattern AC_000001.1__1_35 0 + 897 939 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=
pattern AC_000001.1__1_49 1 + 866 908 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCGCGCG 37=1X4=
pattern AC_000001.1__1_64 0 - 1267 1309 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=
pattern AC_000001.1__1_67 0 + 600 642 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=
pattern AC_000001.1__1_68 0 - 1826 1868 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=
pattern AC_000001.1__1_78 3 - 4381 4425 GTACAGAAACGAGCGGATGGAAAATAGTAGTGAGCGGCCTCGCG 23=1X1I10=1I8=
pattern AC_000001.1__1_92 0 - 6554 6596 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=
pattern AC_000001.1__1_94 0 - 6413 6455 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=
pattern AC_000001.1__1_115 2 + 2091 2131 GTACAGAAACGAGCATGGAAAGAGTAGTGAGCGCCTCGCG 14=2D26=
pattern AC_000001.1__1_118 0 - 3062 3104 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=
pattern AC_000001.1__1_123 0 + 1416 1458 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=
pattern AC_000001.1__1_127 0 + 27 69 GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG 42=
Table specification:
- pat_id: the record id of the matched pattern
- text_id: the record id of the matching text
- cost: the edit distance (non-negative integer) of the match
- strand: the strand of the match, either + for forward or - for rc matches
- start: the 0-based inclusive start of the match in the text
- end: the 0-based exclusive end of the match in the text
- match_region: the region of the text that matches the pattern, possibly reverse-complemented to 'align' with the direction of the pattern. text[start..end] for forward (+) matches and rc(text[start..end]) for reverse (-) matches.
- cigar: the CIGAR string between the pattern and match_region, in the direction of the pattern.
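As an illustration of the column definitions above, the following sketch parses one row and recomputes match_region from the text. It assumes the columns are tab-separated and named as in the header shown above; the helper names (`parse_row`, `match_region`) and the tiny example text and row are made up for the illustration and are not part of Sassy.

```python
# Plain ACGT complement; IUPAC codes are left untouched by this sketch.
COMPLEMENT = bytes.maketrans(b"ACGTacgt", b"TGCAtgca")


def reverse_complement(seq: bytes) -> bytes:
    return seq.translate(COMPLEMENT)[::-1]


def parse_row(header: str, row: str) -> dict:
    """Turn one tab-separated line into a dict keyed by column name."""
    rec = dict(zip(header.split("\t"), row.split("\t")))
    for key in ("cost", "start", "end"):
        rec[key] = int(rec[key])
    return rec


def match_region(text: bytes, rec: dict) -> bytes:
    """text[start..end] for '+' matches, rc(text[start..end]) for '-' matches."""
    region = text[rec["start"]:rec["end"]]
    return region if rec["strand"] == "+" else reverse_complement(region)


header = "pat_id\ttext_id\tcost\tstrand\tstart\tend\tmatch_region\tcigar"
row = "pattern\tread1\t0\t-\t4\t7\tATG\t3="  # made-up example row
text = b"TTTTCATGG"                          # made-up example text

rec = parse_row(header, row)
print(match_region(text, rec))  # b'ATG'
```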
Note on CIGAR strings and tracebacks: Since version 0.2.1, the alignment returned by Sassy prefers matches and mismatches, and otherwise prefers deletions over insertions, see #46. In older versions, deletions were preferred over substitutions, possibly resulting in suboptimal alignments.
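The CIGAR strings in the example output can be cross-checked against the cost, start, and end columns. The sketch below assumes the four operations that appear in this README: = (match), X (mismatch), I (character in the text but not in the pattern), and D (character in the pattern but not in the text). The function name `cigar_stats` is hypothetical.

```python
import re


def cigar_stats(cigar: str) -> tuple[int, int, int]:
    """Return (cost, pattern_length, text_length) implied by a CIGAR string."""
    cost = pattern_len = text_len = 0
    for count, op in re.findall(r"(\d+)([=XID])", cigar):
        n = int(count)
        if op != "=":
            cost += n          # every non-match operation is one edit
        if op in "=XD":
            pattern_len += n   # operations that consume pattern characters
        if op in "=XI":
            text_len += n      # operations that consume text characters
    return cost, pattern_len, text_len


# Rows taken from the example output above: (cigar, cost, start, end).
rows = [
    ("42=", 0, 6, 48),
    ("37=1X4=", 1, 866, 908),
    ("23=1X1I10=1I8=", 3, 4381, 4425),
    ("14=2D26=", 2, 2091, 2131),
]
for cigar, cost, start, end in rows:
    # The pattern has length 42; the text span is end - start.
    assert cigar_stats(cigar) == (cost, 42, end - start)
```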
Note on SAM-compatibility: The SAM format outputs the information for
reverse complement matches differently. Rather than reverse-complementing the
text to align with the pattern, it reverse-complements the pattern to align with
the text. That means the equivalent to the match_region column always reads
in the direction of the text, and likewise the cigar is oriented to
correspond to match_region, also in the direction of the text.
Use the --sam flag to get this SAM-compatible output.
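The change of orientation described above can be sketched as follows: for a reverse-strand match, reverse-complement match_region so that it reads in the direction of the text, and reverse the order of the CIGAR operations. This only illustrates the orientation change; it is not taken from Sassy's code, and whether --sam changes anything else in the output is not covered here. The function name `to_text_orientation` is hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def to_text_orientation(strand: str, match_region: str, cigar: str) -> tuple[str, str]:
    """Re-orient a pattern-direction (match_region, cigar) pair to the text direction."""
    if strand == "+":
        # Forward matches already read in the direction of the text.
        return match_region, cigar
    ops = re.findall(r"\d+[=XID]", cigar)
    return reverse_complement(match_region), "".join(reversed(ops))


print(to_text_orientation("-", "AACG", "1X3="))  # ('CGTT', '3=1X')
```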
sassy filter -p GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG -k 2 reads.fq > filtered.fq
# or
sassy filter -p GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG -k 2 reads.fq -o filtered.fq
# or
sassy grep -p GTACAGAAACGAGCGGATGGAAAGAGTAGTGAGCGCCTCGCG -k 2 reads.fq -o filtered.fq
Writes a file containing only matching records. Use --invert to only
write non-matching records.
Search for one or more guides in guides.txt:
sassy crispr --threads 8 --guide guides.txt --k 5 --max-n-frac 0.1 --output hits.tsv hg38.fasta
This allows <= k edits in the sgRNA, and the PAM (the last 3 characters of each guide) has to match exactly, unless --allow-pam-edits is given.
Use e.g. --pam-length 5 to change the default of 3.
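As a sketch of what an exact PAM match means, the snippet below splits a guide into sgRNA and PAM and compares the PAM with the last characters of a hit, treating IUPAC codes in the guide as sets of bases and ignoring case. The helper names and the small IUPAC table (only N, R, and Y besides ACGT) are assumptions for the illustration; Sassy's own handling may differ. The guide and match regions are copied from the example output in this section.

```python
# Minimal IUPAC table: which bases each guide character accepts (assumption).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "R": "AG", "Y": "CT"}


def split_guide(guide: str, pam_length: int = 3) -> tuple[str, str]:
    """Split a guide into (sgRNA, PAM); assumes pam_length >= 1."""
    return guide[:-pam_length], guide[-pam_length:]


def pam_matches_exactly(guide: str, match_region: str, pam_length: int = 3) -> bool:
    """True if the last pam_length characters of the hit match the PAM without edits."""
    _, pam = split_guide(guide, pam_length)
    site = match_region[-pam_length:].upper()
    return all(base in IUPAC[code] for code, base in zip(pam, site))


guide = "GAGTCCGAGCAGAAGAAGAANGG"
print(split_guide(guide))  # ('GAGTCCGAGCAGAAGAAGAA', 'NGG')

regions = [
    "GAGGCCACAGAGAAGAGGG",
    "gagaccgaggagaagaaaaagg",
    "GACTCGAGCATGAAGAAGAAAGG",
    "CAGTCCCAGCAGACGACGGACGG",
]
print(all(pam_matches_exactly(guide, r) for r in regions))  # True
```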
Output of the crispr command is a tab-delimited file with one row per hit, e.g.:
guide text_id cost strand start end match_region cigar
GAGTCCGAGCAGAAGAAGAANGG chr21 5 + 5024135 5024154 GAGGCCACAGAGAAGAGGG 3=1X2=1D1=1D3=1D5=1D4=
GAGTCCGAGCAGAAGAAGAANGG chr21 3 + 21087337 21087359 gagaccgaggagaagaaaaagg 3=1X5=1X7=1D5=
GAGTCCGAGCAGAAGAAGAANGG chr21 3 - 9701297 9701320 GACTCGAGCATGAAGAAGAAAGG 2=1X1=1D6=1I12=
GAGTCCGAGCAGAAGAAGAANGG chr21 5 - 46396975 46396998 CAGTCCCAGCAGACGACGGACGG 1X5=1X6=1X2=1X1=1X4=
The start and end are 0-based half-open coordinates (i.e. 0-based inclusive of the
start, but exclusive of the end), and start is always less than end
(regardless of the strand). The
match_region reported will be the sequence from the target file when strand is +, or the reverse complement
of the sequence from the target file when strand is -, so that it matches the guide sequence.
The cigar is always oriented to read left-to-right with the provided guide and match_region sequences.
Note that this searches for approximate occurrences of the guide sequence itself, and not for reverse-complement binding sites. If binding sites are to be found, please reverse-complement the input or output manually.
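Reverse-complementing a guide manually, as suggested above, needs to handle IUPAC characters such as the N in NGG. A minimal sketch using the standard IUPAC complement table is shown below; the function name is hypothetical and this is not part of Sassy's CLI.

```python
# Standard IUPAC complements: A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H;
# S, W, and N are their own complement.
_FROM = "ACGTRYSWKMBDHVN"
_TO = "TGCAYRSWMKVHDBN"
_IUPAC_COMPLEMENT = str.maketrans(_FROM + _FROM.lower(), _TO + _TO.lower())


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence that may contain IUPAC codes."""
    return seq.translate(_IUPAC_COMPLEMENT)[::-1]


guide = "GAGTCCGAGCAGAAGAAGAANGG"
print(reverse_complement(guide))  # CCNTTCTTCTTCTGCTCGGACTC
```

Applying the function twice returns the original sequence, which is a quick sanity check when preparing inputs.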
PyPI wheels can be installed with:
pip install sassy-rs
import sassy
pattern = b"ACTG"
text = b"ACGGCTACGCAGCATCATCAGCAT"
searcher = sassy.Searcher("dna") # ascii / dna / iupac
matches = searcher.search(pattern, text, k=1)
for m in matches:
    print(m)
See python/README.md for more details.
See c/README.md for details. Quick example:
#include <math.h>   // NAN
#include <string.h> // strlen
#include "sassy.h"
int main() {
    const char* pattern = "ACTG";
    const char* text = "ACGGCTACGCAGCATCATCAGCAT";
    // DNA alphabet, with reverse complement, without overhang.
    sassy_SearcherType* searcher = sassy_searcher("dna", true, NAN);
    sassy_Match* out_matches = NULL;
    size_t n_matches = search(searcher,
                              pattern, strlen(pattern),
                              text, strlen(text),
                              1, // k=1
                              &out_matches);
    // Free the matches first, then the searcher.
    sassy_matches_free(out_matches, n_matches);
    sassy_searcher_free(searcher);
}