colabfoldv

This repository is intended to replicate only the colabfold_search functionality of the local ColabFold MSA generation pipeline, with an added protein database of 129944764 non-redundant proteins tailored for viruses, especially phages. Please see dataset_curation for more details on how it was constructed.

For the full ColabFold functionality and up-to-date changes (and frankly anything outside of the use case presented below), please go to the ColabFold repository.

This repository was used to create some of the MSAs in Phold's database.

Installation

The environment below replicates the setup used to generate the MSAs for the enVhogs and PHROG singleton protein structures in Phold. You can use other, later versions of MMseqs2 if you'd like.
This assumes you have conda installed.

conda create -n colabfoldv_env -c conda-forge -c bioconda python=3.12 mmseqs2==15.6f452 pip
conda activate colabfoldv_env
git clone https://github.com/gbouras13/colabfoldv
cd colabfoldv
pip install -e .
pip install "colabfold[alphafold]"
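
Before moving on, it can help to confirm that the tools the pipeline relies on are actually on your PATH. The snippet below is a minimal sketch, not part of this repository: `check_tools` is a hypothetical helper name, and it assumes the MMseqs2 binary is called `mmseqs` and that the install exposes a `colabfold_search` command, as used later in this README.

```shell
# Hypothetical helper (not shipped with colabfoldv).
# Prints the name of every requested command that is missing from PATH,
# one per line, and returns 1 if at least one is missing.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example, run inside the activated colabfoldv_env:
#   check_tools mmseqs colabfold_search && echo "all tools found"
```

If anything is printed, the environment is probably not activated or the install step did not complete.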

Download ColabFold DBs

This will download the regular ColabFold uniref30_2302 and colabfold_envdb_202108 databases.

mkdir -p colabfoldDBs
cd colabfoldDBs
bash ../setup_databases.sh

Download the viral/phage DB

The viral/phage database is stored on Zenodo. It is 39 GB, so it may take some time to download.
I would highly recommend downloading it with aria2c (as used in MMseqs2 and Foldseek), but you can also use e.g. wget or curl.

aria2c https://zenodo.org/records/15045387/files/viral_db.tar.xz
tar -xvf viral_db.tar.xz
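
If aria2c is not installed, the same archive can be fetched with wget or curl. The following is a rough sketch of one way to choose between them: `download_cmd` is a hypothetical helper that only prints the command it would use, so you can inspect it before running it. The resume flags (`wget -c`, `curl -C -`) are my own choice for a large file and are not taken from this repository.

```shell
# Hypothetical helper (not shipped with colabfoldv).
# Prints a download command for the given URL using the first tool found
# on PATH, in the order aria2c, wget, curl. Returns 1 if none is available.
download_cmd() {
    local url="$1"
    if command -v aria2c >/dev/null 2>&1; then
        echo "aria2c $url"
    elif command -v wget >/dev/null 2>&1; then
        # -c resumes a partially downloaded file
        echo "wget -c $url"
    elif command -v curl >/dev/null 2>&1; then
        # -L follows redirects, -C - resumes, -O keeps the remote file name
        echo "curl -L -C - -O $url"
    else
        echo "no downloader (aria2c, wget, curl) found on PATH" >&2
        return 1
    fi
}

# Example:
#   download_cmd https://zenodo.org/records/15045387/files/viral_db.tar.xz
# then run the printed command, followed by: tar -xvf viral_db.tar.xz
```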

To use (change to the desired number of threads)

This will create MSAs without pairing (i.e. for monomers) for all proteins in example/NC_043029_aa.fasta using all three databases.

THREADS=72
colabfold_search example/NC_043029_aa.fasta colabfoldDBs viral_db NC_043029_aa_msas --threads $THREADS
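
Rather than hard-coding the thread count, you may prefer to derive it from the machine and to check how many proteins the input contains before launching a long search. This is only a sketch under assumptions: `count_fasta_records` and `default_threads` are hypothetical helper names, and using every available core is a default I picked for illustration, not a recommendation from this repository.

```shell
# Hypothetical helpers (not shipped with colabfoldv).

# Count FASTA records, i.e. lines that start with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Print the number of available CPU cores, falling back to 1.
default_threads() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
    fi
}

# Example:
#   THREADS="$(default_threads)"
#   echo "Searching $(count_fasta_records example/NC_043029_aa.fasta) proteins with $THREADS threads"
#   colabfold_search example/NC_043029_aa.fasta colabfoldDBs viral_db NC_043029_aa_msas --threads "$THREADS"
```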
