1. Overview
The CpGtools package provides a number of Python programs to annotate, QC, visualize, and analyze DNA methylation data generated from Illumina HumanMethylation450 BeadChip (450K) or MethylationEPIC BeadChip (850K) arrays, or from RRBS/WGBS (reduced representation or whole-genome bisulfite sequencing).
These programs can be divided into three groups:
- CpG position analysis modules
- CpG signal analysis modules
- Differential CpG analysis modules
1.1. CpG position analysis modules
These modules are primarily used to analyze the genomic locations of CpGs.
| Name | Description |
| --- | --- |
| CpG_aggregation.py | Aggregate the proportion values of CpGs located in the given genomic regions (e.g., CpG islands, promoters, exons). |
| CpG_anno_position.py | Add annotation information to CpGs according to their genomic coordinates. |
| CpG_anno_probe.py | Add annotation information to 450K/850K probes. |
| CpG_density_gene_centered.py | Generate the CpG density (count) profile over gene bodies and the upstream/downstream intergenic regions. |
| CpG_distrb_chrom.py | Calculate the distribution of CpGs over chromosomes. |
| CpG_distrb_gene_centered.py | Calculate the distribution of CpGs over gene-centered genomic regions. |
| CpG_distrb_region.py | Calculate the distribution of CpGs over user-specified genomic regions. |
| CpG_logo.py | Generate a DNA motif logo and matrices for a given set of CpGs. |
| CpG_to_gene.py | Assign CpGs to their putative target genes using an algorithm similar to GREAT. |
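The table describes CpG_aggregation.py only at a high level. To illustrate what "aggregating proportion values over regions" can mean, here is a minimal sketch under stated assumptions: the input is count data (methylated and total reads per CpG, as produced by RRBS/WGBS), regions use BED-style half-open coordinates, and counts are pooled within each region. The function `aggregate_proportions` and the pooling rule are hypothetical and for illustration only; CpGtools' own implementation may differ, for example by averaging per-CpG proportions instead.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layouts (assumptions, not CpGtools file formats):
# CpG: (chrom, 0-based position, methylated read count, total read count)
CpG = Tuple[str, int, int, int]
# Region: (chrom, start, end, name) in BED-style half-open coordinates
Region = Tuple[str, int, int, str]


def aggregate_proportions(cpgs: Iterable[CpG], regions: Iterable[Region]) -> Dict[str, float]:
    """Pool methylated/total read counts of all CpGs inside each region.

    Hypothetical helper for illustration; not part of CpGtools.
    Returns {region_name: pooled methylation proportion}. Regions
    that contain no covered CpG are omitted.
    """
    # Group CpGs by chromosome so each region only scans its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, pos, meth, total in cpgs:
        by_chrom[chrom].append((pos, meth, total))

    result = {}
    for chrom, start, end, name in regions:
        meth_sum = total_sum = 0
        for pos, meth, total in by_chrom.get(chrom, []):
            if start <= pos < end:  # half-open interval [start, end)
                meth_sum += meth
                total_sum += total
        if total_sum > 0:
            result[name] = meth_sum / total_sum
    return result


cpgs = [("chr1", 100, 8, 10), ("chr1", 150, 2, 10), ("chr2", 50, 5, 5)]
regions = [("chr1", 90, 200, "promoter_A"), ("chr2", 0, 40, "island_B")]
print(aggregate_proportions(cpgs, regions))  # {'promoter_A': 0.5}
```

Pooling counts weights each CpG by its read depth. The linear scan per region keeps the sketch short; a tool handling genome-scale input would more likely use an interval index.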
1.2. CpG signal analysis modules
These modules are primarily used to analyze the DNA methylation beta values of CpGs.
| Name | Description |
| --- | --- |
| beta_PCA.py | Perform principal component analysis (PCA) on samples. |
| beta_jitter_plot.py | Generate a jitter plot (a.k.a. strip chart) and a bean plot for each sample. |
| beta_m_conversion.py | Convert beta values into M-values, or vice versa. |
| beta_profile_gene_centered.py | Calculate the methylation profile (i.e., average beta value) for genomic regions around genes. |
| beta_profile_region.py | Calculate the methylation profile (i.e., average beta value) around user-specified genomic regions. |
| beta_stacked_barplot.py | Create a stacked bar plot for each sample, showing the proportions of CpGs whose beta values fall into [0, 0.25], [0.25, 0.5], [0.5, 0.75], and [0.75, 1]. |
| beta_stats.py | Summarize basic information on CpGs located in each genomic region. |
| beta_tSNE.py | Perform t-SNE (t-distributed stochastic neighbor embedding) analysis on samples. |
| beta_topN.py | Select the top N most variable CpGs (ranked by standard deviation) from the input file. |
| beta_trichotmize.py | Use a Bayesian Gaussian mixture model to trichotomize beta values into three statuses (‘Un-methylated’, ‘Semi-methylated’, and ‘Full-methylated’); CpGs not assigned to any of them are labeled ‘unassigned’. |
| beta_UMAP.py | Perform UMAP (Uniform Manifold Approximation and Projection) on samples. |
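Two of the operations above reduce to short formulas. The conversion performed by beta_m_conversion.py is commonly defined as M = log2(beta / (1 - beta)), with inverse beta = 2^M / (2^M + 1), and beta_topN.py ranks CpGs by standard deviation. The sketch below implements both in plain Python. The function names are hypothetical, the input layout (a dict mapping CpG IDs to per-sample beta values) is an assumption, and the choice to reject beta values of exactly 0 or 1 (where M is infinite) is also an assumption: CpGtools may instead add a small offset.

```python
import math
import statistics
from typing import Dict, List


def beta_to_m(beta: float) -> float:
    """M = log2(beta / (1 - beta)); defined only for 0 < beta < 1."""
    if not 0.0 < beta < 1.0:
        # Assumption: reject boundary values rather than applying an offset.
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """Inverse transform: beta = 2**M / (2**M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


def top_n_variable(betas: Dict[str, List[float]], n: int) -> List[str]:
    """Return the IDs of the n CpGs with the largest standard deviation across samples.

    Hypothetical helper; uses the population standard deviation.
    """
    ranked = sorted(betas, key=lambda cpg: statistics.pstdev(betas[cpg]), reverse=True)
    return ranked[:n]


print(beta_to_m(0.8))  # 2.0
print(m_to_beta(2.0))  # 0.8
print(top_n_variable({"cg1": [0.1, 0.9], "cg2": [0.5, 0.5], "cg3": [0.2, 0.4]}, 2))
# ['cg1', 'cg3']
```

A beta value of 0.5 maps to M = 0, and each unit of M doubles the methylated-to-unmethylated ratio. This is why M-values are often preferred for statistical testing, while beta values are easier to interpret.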
1.3. Differential CpG analysis modules
These modules are primarily used to identify CpGs that are differentially methylated between groups.
| Name | Description |
| --- | --- |
| dmc_Bayes.py | Differential CpG analysis using a Bayesian approach (for 450K/850K data). |
| dmc_bb.py | Differential CpG analysis using a beta-binomial model (for RRBS/WGBS count data). |
| dmc_fisher.py | Differential CpG analysis using Fisher’s exact test (for RRBS/WGBS count data). |
| dmc_glm.py | Differential CpG analysis using a generalized linear model (GLM) (for 450K/850K data). |
| dmc_logit.py | Differential CpG analysis using a logistic regression model (for RRBS/WGBS count data). |
| dmc_nonparametric.py | Differential CpG analysis using the Mann-Whitney U test for two-group comparisons and the Kruskal-Wallis H test for multiple-group comparisons. |
| dmc_ttest.py | Differential CpG analysis using a t-test (for 450K/850K data). |