ATCC-Bioinformatics/AGP_methylation

AGP_methylation

This repository contains the code and scripts related to methylation basecalling of Oxford Nanopore Technologies (ONT) data for the ATCC® Genome Portal (AGP). Data can be retrieved through the API in methylbed format and are generated from ONT sequencing data with Dorado's modified-base calling pipeline. The files provide per-base methylation calls, including CpG methylation (5mC), derived from long-read sequencing.

Please note that the ATCC® Genome Portal is a living database, and some datasets were produced before current modified-basecalling technology became available (e.g., samples were sequenced on R9 flowcells prior to the release of R10).

Genomes with available methylation datasets

For a list of genomes that currently have methylation data on the AGP, refer to this table. This table will be continually updated as we add more methylation data to the AGP.

Additional Links

dorado link

https://github.com/nanoporetech/dorado/

AGP API

Methylation data can also be downloaded via the One Codex REST API. Instructions on how to use the API and download methylation data can be found here: https://github.com/ATCC-Bioinformatics/genome_portal_api

Methylation calls

The modified-base models currently available for dorado basecalling are listed at https://dorado-docs.readthedocs.io/en/latest/basecaller/mods/

As of June 2025, the models applied to datasets in the ATCC® Genome Portal include as many of the following as were compatible with the ONT sequencing available at the time each dataset was generated.

| Mod | Name | SAM Code | CHEBI |
| --- | --- | --- | --- |
| 5mC | 5-Methylcytosine | C+m | CHEBI:27551 |
| 5hmC | 5-Hydroxymethylcytosine | C+h | CHEBI:76792 |
| 4mC | N(4)-methylcytosine | C+21839 | CHEBI:21839 |
| 6mA | 6-Methyladenine | A+a | CHEBI:28871 |
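
When processing files programmatically, it can help to have the table above available as a lookup. The following is a minimal sketch; the names `MOD_CODES` and `describe_sam_code` are hypothetical and are not part of this repository.

```python
# Modification codes copied from the table above.
# Keys are (canonical base, modification code) pairs taken from the SAM Code column.
MOD_CODES = {
    ("C", "m"): ("5mC", "5-Methylcytosine", "CHEBI:27551"),
    ("C", "h"): ("5hmC", "5-Hydroxymethylcytosine", "CHEBI:76792"),
    ("C", "21839"): ("4mC", "N(4)-methylcytosine", "CHEBI:21839"),
    ("A", "a"): ("6mA", "6-Methyladenine", "CHEBI:28871"),
}


def describe_sam_code(sam_code):
    """Split a code such as 'C+m' into base and modification, then look it up.

    Raises KeyError for codes that are not in the table.
    """
    base, _, mod = sam_code.partition("+")
    return MOD_CODES[(base, mod)]


if __name__ == "__main__":
    for code in ("C+m", "C+h", "C+21839", "A+a"):
        print(code, describe_sam_code(code))
```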

File Format

methylbed: A tab-delimited format extending the BED specification, containing methylation call information. Columns include:

Chromosome, start, and end (standard BED coordinates).

Modified base type (e.g., 5mC).

Methylation probability or score.

Read ID and strand information.

Additional metadata (e.g., basecalling model version).

For a more thorough description of these columns, please see: https://github.com/nanoporetech/modkit?tab=readme-ov-file#bedmethyl-column-descriptions
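
As an illustration of reading one row, the sketch below assumes the files follow the column order given in the modkit bedMethyl description linked above (column 4 is the modified base code, column 6 the strand, column 10 the valid coverage, and column 11 the percent modified). That layout is an assumption about the AGP files rather than something this README states, and `MethylCall` and `parse_bedmethyl_line` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MethylCall:
    chrom: str
    start: int
    end: int
    mod_code: str
    strand: str
    valid_coverage: int
    percent_modified: float


def parse_bedmethyl_line(line):
    """Parse one row, assuming the modkit bedMethyl column order."""
    # Splitting on any whitespace also tolerates rows in which some
    # columns are separated by spaces instead of tabs.
    fields = line.split()
    return MethylCall(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        mod_code=fields[3],
        strand=fields[5],
        valid_coverage=int(fields[9]),
        percent_modified=float(fields[10]),
    )


if __name__ == "__main__":
    # Made-up example row, not real AGP data.
    row = "contig_1\t100\t101\tm\t20\t+\t100\t101\t255,0,0\t20\t75.00\t15\t5\t0\t0\t0\t0\t0"
    print(parse_bedmethyl_line(row))
```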

Data Source

Generated by running modkit pileup on the BAM files output by the dorado basecaller with ONT Dorado's modified basecalling enabled, using as many of the available methylation models as the sequencing technology supported at the time of data generation.

Input: Raw nanopore sequencing data (FAST5/POD5).

Output: Methylation probabilities for the available models.
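
The general shape of such a workflow can be sketched as a list of shell commands. This README does not state the exact models, options, or intermediate steps used for the AGP, so everything below is an assumption for illustration: the `sup` model shorthand, the `5mC_5hmC` modification set, the samtools sort and index steps, the file names, and the helper name `build_pipeline_commands`. The function only builds command strings and does not run anything.

```python
import shlex


def build_pipeline_commands(pod5_dir, reference, model="sup",
                            mods="5mC_5hmC", prefix="sample"):
    """Return illustrative shell commands: basecall, sort, index, pileup."""
    q = shlex.quote  # protect paths that contain spaces or shell characters
    calls = f"{prefix}.calls.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    bed = f"{prefix}.bed"
    return [
        # Basecall with modified bases and align to the reference; BAM goes to stdout.
        f"dorado basecaller {q(model)} {q(pod5_dir)} "
        f"--modified-bases {q(mods)} --reference {q(reference)} > {q(calls)}",
        # Sort and index the alignments before the pileup step.
        f"samtools sort -o {q(sorted_bam)} {q(calls)}",
        f"samtools index {q(sorted_bam)}",
        # Aggregate per-read modification calls into per-position counts.
        f"modkit pileup {q(sorted_bam)} {q(bed)}",
    ]


if __name__ == "__main__":
    for cmd in build_pipeline_commands("pod5/", "genome.fasta"):
        print(cmd)
```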

Usage

Suitable for downstream epigenetic analysis, such as differential methylation studies or visualization in genome browsers. Compatible with tools like bedtools, pybedtools, or custom scripts for processing.
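
As one example of a custom script, the sketch below computes a coverage-weighted methylation percentage per contig for a single modification code. It assumes the modkit bedMethyl column order (column 4 modification code, column 10 valid coverage, column 12 modified-call count); the function name and the `min_coverage` threshold of 10 are illustrative choices, not recommendations from this repository.

```python
from collections import defaultdict


def mean_methylation_by_contig(lines, mod_code="m", min_coverage=10):
    """Return {contig: percent modified}, pooled over sites that pass the filter."""
    modified = defaultdict(int)
    covered = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        if fields[3] != mod_code:
            continue
        n_valid = int(fields[9])   # column 10: valid coverage
        n_mod = int(fields[11])    # column 12: number of modified calls
        # Skip low-coverage sites; max(..., 1) also guards against division by zero.
        if n_valid < max(min_coverage, 1):
            continue
        modified[fields[0]] += n_mod
        covered[fields[0]] += n_valid
    return {c: 100.0 * modified[c] / covered[c] for c in covered}


if __name__ == "__main__":
    import sys
    # Usage: python script.py calls.bed
    with open(sys.argv[1]) as handle:
        for contig, pct in mean_methylation_by_contig(handle).items():
            print(f"{contig}\t{pct:.2f}")
```

Pooling counts across sites, rather than averaging per-site percentages, keeps high-coverage positions from being weighted the same as barely covered ones.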
