This repository contains code and scripts for methylation basecalling of Oxford Nanopore Technologies (ONT) data in the ATCC® Genome Portal (AGP). Methylation data can be retrieved through the API in bedMethyl ("methylbed") format. These files are generated from ONT sequencing data with the modified-base calling pipeline in ONT's Dorado basecaller. The files provide per-base modification calls derived from long-read sequencing, including CpG methylation (5mC).
Please note that the ATCC® Genome Portal is a living database. Some datasets were produced before current modified-basecalling technology became available (e.g., samples sequenced on R9 flow cells before the release of R10).
For a list of genomes that currently have methylation data on the AGP, refer to this table. This table will be continually updated as we add more methylation data to the AGP.
Dorado, ONT's basecaller, is available at https://github.com/nanoporetech/dorado/
Methylation data can also be downloaded through the One Codex REST API. Instructions for using the API to download methylation data are available at https://github.com/ATCC-Bioinformatics/genome_portal_api
The modified-base models currently available in Dorado are listed at https://dorado-docs.readthedocs.io/en/latest/basecaller/mods/
As of June 2025, each dataset in the ATCC® Genome Portal has been basecalled with as many of the following models as were compatible with the ONT sequencing technology used to generate it.
| Mod | Name | SAM Code | CHEBI |
|---|---|---|---|
| 5mC | 5-Methylcytosine | C+m | CHEBI:27551 |
| 5hmC | 5-Hydroxymethylcytosine | C+h | CHEBI:76792 |
| 4mC | N(4)-methylcytosine | C+21839 | CHEBI:21839 |
| 6mA | 6-Methyladenine | A+a | CHEBI:28871 |
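In a methylbed file, each row identifies its modification by the code that follows `C+` or `A+` in the SAM Code column above (for example, `m` or `21839`). This code can optionally be followed by a motif and offset, such as `m,CG,0`. The sketch below maps those codes back to the table. The `MOD_CODES` dictionary and the `describe_mod` function are hypothetical helpers, and the `code,motif,offset` form is assumed from modkit's motif-restricted output.

```python
# Hypothetical lookup from the modified-base code (methylbed column 4)
# to the short name, full name, and ChEBI ID listed in the table above.
MOD_CODES = {
    "m": ("5mC", "5-Methylcytosine", "CHEBI:27551"),
    "h": ("5hmC", "5-Hydroxymethylcytosine", "CHEBI:76792"),
    "21839": ("4mC", "N(4)-methylcytosine", "CHEBI:21839"),
    "a": ("6mA", "6-Methyladenine", "CHEBI:28871"),
}


def describe_mod(code_field: str) -> str:
    """Return a readable label for a column-4 value such as 'm' or 'm,CG,0'."""
    # Keep only the code; drop an optional ",motif,offset" suffix.
    code = code_field.split(",")[0]
    short, name, chebi = MOD_CODES.get(code, (code, "unknown modification", "n/a"))
    return f"{short} ({name}, {chebi})"
```

For example, `describe_mod("m,CG,0")` returns `"5mC (5-Methylcytosine, CHEBI:27551)"`. Unknown codes are passed through unchanged rather than raising an error.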
bedMethyl (methylbed): a tab-delimited format that extends the BED specification with methylation call information. Columns include:
- Chromosome (contig) name
- Start and end positions (standard 0-based, half-open BED coordinates)
- Modified base code, optionally with a motif (e.g., `m` for 5mC)
- Score and strand
- Per-position summary counts: valid coverage, percent modified, and the numbers of modified, canonical, and other calls
For a more thorough description of these columns, please see: https://github.com/nanoporetech/modkit?tab=readme-ov-file#bedmethyl-column-descriptions
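The column layout linked above can be parsed with a few lines of Python. This is a minimal sketch that assumes the standard 18-column `modkit pileup` output: fields 1 to 9 are BED-like, followed by Nvalid_cov, percent modified, Nmod, Ncanonical, and further counts. `BedMethylRecord`, `parse_bedmethyl_line`, and `read_bedmethyl` are hypothetical names, and only a subset of the columns is kept.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class BedMethylRecord:
    chrom: str
    start: int               # 0-based, inclusive
    end: int                 # 0-based, exclusive
    mod_code: str            # e.g. "m" or "m,CG,0"
    strand: str
    n_valid_cov: int         # reads with a valid call at this position
    percent_modified: float  # 100 * n_mod / n_valid_cov
    n_mod: int
    n_canonical: int


def parse_bedmethyl_line(line: str) -> BedMethylRecord:
    # split() with no argument splits on any whitespace. This accepts tab-only
    # files and also older modkit output whose trailing columns were space-separated.
    f = line.split()
    if len(f) < 13:
        raise ValueError(f"expected at least 13 columns, got {len(f)}")
    return BedMethylRecord(
        chrom=f[0],
        start=int(f[1]),
        end=int(f[2]),
        mod_code=f[3],
        strand=f[5],
        n_valid_cov=int(f[9]),
        percent_modified=float(f[10]),
        n_mod=int(f[11]),
        n_canonical=int(f[12]),
    )


def read_bedmethyl(path: str) -> Iterator[BedMethylRecord]:
    """Yield records from a methylbed file, skipping blank, comment, or track lines."""
    with open(path) as fh:
        for line in fh:
            if line.strip() and not line.startswith(("#", "track")):
                yield parse_bedmethyl_line(line)
```

Because the coordinates are 0-based and half-open, the 1-based position of a single-base record is `record.start + 1`.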
The methylbed files were generated in two steps. First, the data were basecalled with Dorado using as many modified-base models as the sequencing technology supported at the time of data generation. Then `modkit pileup` was run on the BAM files output by the Dorado basecaller.
- Input: raw nanopore sequencing data (FAST5/POD5).
- Output: per-position modification summaries (coverage, counts, and percent modified) for each available model.
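The two-step process described above can be sketched as a short list of shell commands. This is a hypothetical illustration, not the exact pipeline used for the AGP. The `sup` model, the modified-base model names, the file names, and the use of `--reference` to align during basecalling are all assumptions. The samtools sort/index step is included because `modkit pileup` expects an indexed BAM. Check the Dorado model list linked above for models that match your flow cell and chemistry.

```python
import shlex
import subprocess


def build_commands(pod5_dir: str, reference: str, mods: list[str],
                   prefix: str = "sample") -> list[str]:
    """Hypothetical basecall -> sort -> index -> pileup command sequence."""
    q = shlex.quote
    calls, sorted_bam, bed = f"{prefix}.calls.bam", f"{prefix}.sorted.bam", f"{prefix}.bedmethyl"
    return [
        # Dorado writes BAM to stdout. --modified-bases takes a space-separated
        # list of modification models, and --reference aligns the reads so that
        # calls can be placed on reference coordinates.
        f"dorado basecaller sup {q(pod5_dir)} --modified-bases {' '.join(q(m) for m in mods)} "
        f"--reference {q(reference)} > {q(calls)}",
        # modkit pileup needs a coordinate-sorted, indexed BAM.
        f"samtools sort -o {q(sorted_bam)} {q(calls)}",
        f"samtools index {q(sorted_bam)}",
        # Aggregate per-read calls into per-position methylbed rows.
        f"modkit pileup {q(sorted_bam)} {q(bed)}",
    ]


def run(commands: list[str], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(cmd)
        if not dry_run:
            # shell=True is needed for the '>' redirection in the Dorado step.
            subprocess.run(cmd, shell=True, check=True)
```

By default, `run(build_commands("pod5/", "ref.fa", ["5mCG_5hmCG", "6mA"]))` only prints the commands. Because execution goes through the shell, pass only trusted paths when `dry_run=False`.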
Usage
The methylbed files are suitable for downstream epigenetic analysis, such as differential methylation studies or visualization in genome browsers. Because they are BED-based, they can be processed with tools such as bedtools and pybedtools or with custom scripts.
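As an example of a custom script, the sketch below computes a coverage-weighted methylation level per chromosome for one modification code. It pools Nmod and Nvalid_cov across positions instead of averaging the per-position percentages. It assumes the standard `modkit pileup` column order. The function name `methylation_by_chrom` is hypothetical, and the default minimum coverage of 10 is an arbitrary illustrative threshold. This is a simple summary, not a differential methylation test.

```python
from collections import defaultdict
from typing import Iterable


def methylation_by_chrom(lines: Iterable[str], mod_code: str = "m",
                         min_cov: int = 10) -> dict[str, float]:
    """Coverage-weighted percent modified per chromosome for one mod code."""
    n_mod: dict[str, int] = defaultdict(int)
    n_valid: dict[str, int] = defaultdict(int)
    for line in lines:
        f = line.split()
        # Skip short, comment, or track lines that are not pileup records.
        if len(f) < 12 or f[0].startswith(("#", "track")):
            continue
        code = f[3].split(",")[0]  # drop an optional motif suffix
        cov = int(f[9])            # Nvalid_cov
        if code != mod_code or cov < min_cov:
            continue
        n_mod[f[0]] += int(f[11])  # Nmod
        n_valid[f[0]] += cov
    # Pooled fraction: sum(Nmod) / sum(Nvalid_cov), expressed as a percentage.
    return {c: 100.0 * n_mod[c] / n_valid[c] for c in n_valid if n_valid[c] > 0}
```

To process a file, pass the open file handle directly, for example `methylation_by_chrom(open("sample.bedmethyl"), mod_code="a")` for 6mA.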
