Overview
=========

The **CpGtools** package provides a collection of Python programs designed to annotate, perform quality control (QC), visualize, and analyze DNA methylation data generated from the following Illumina BeadChip arrays and bisulfite sequencing assays:

- HumanMethylation450 BeadChip (450K)
- MethylationEPIC BeadChip (850K)
- Infinium MethylationEPIC v2.0 (930K)
- Reduced Representation Bisulfite Sequencing (RRBS) / Whole Genome Bisulfite Sequencing (WGBS)

The CpGtools modules are organized into four main categories:

- **CpG position analysis modules**
- **CpG signal analysis modules**
- **Differential CpG analysis modules**
- **Predictive modules** (under development)

CpG Position Analysis Modules
-----------------------------

These modules focus on analyzing CpG genomic locations and their annotations.

+-----------------------------------+------------------------------------------------------------------------------+
| Name                              | Description                                                                  |
+===================================+==============================================================================+
| ``CpG_aggregation.py``            | Aggregate proportion values of CpGs located in given genomic regions (e.g.,  |
|                                   | CpG islands, promoters, exons).                                              |
+-----------------------------------+------------------------------------------------------------------------------+
| ``CpG_anno_position.py``          | Add annotation information to CpGs according to their genomic coordinates.   |
+-----------------------------------+------------------------------------------------------------------------------+
| ``CpG_anno_probe.py``             | Add annotation information to 450K/850K probes.                              |
+-----------------------------------+------------------------------------------------------------------------------+
| ``CpG_density_gene_centered.py``  | Generate the CpG density (count) profile over the gene body and the upstream |
|                                   | and downstream intergenic regions.                                           |
+-----------------------------------+------------------------------------------------------------------------------+
| ``CpG_distrb_chrom.py``           | Calculate the distribution of CpGs over chromosomes.                         |
+-----------------------------------+------------------------------------------------------------------------------+
| ``CpG_distrb_gene_centered.py``   | Calculate the distribution of CpGs over gene-centered genomic regions.       |
+-----------------------------------+------------------------------------------------------------------------------+
| ``CpG_distrb_region.py``          | Calculate the distribution of CpGs over user-specified genomic regions.      |
+-----------------------------------+------------------------------------------------------------------------------+
| ``CpG_logo.py``                   | Generate a DNA motif logo and matrices for a given set of CpGs.              |
+-----------------------------------+------------------------------------------------------------------------------+
| ``CpG_to_gene.py``                | Assign CpGs to their putative target genes using a GREAT-like algorithm.     |
+-----------------------------------+------------------------------------------------------------------------------+

CpG Signal Analysis Modules
---------------------------

These modules analyze CpG methylation beta values across samples and genomic regions.
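
The signal modules in this section operate on beta values, which lie between 0 and 1. As a minimal sketch of what a beta-to-M conversion involves, the snippet below applies the widely used logit transform, M = log2(beta / (1 - beta)), and its inverse. It is an illustration only, not the code of ``beta_m_conversion.py``: the function names are hypothetical, and the real module may treat boundary values (a beta of exactly 0 or 1) differently.

```python
import math


def beta_to_m(beta):
    """Convert a beta value in the open interval (0, 1) to an M-value."""
    if not 0.0 < beta < 1.0:
        # log2(beta / (1 - beta)) is undefined at 0 and 1; this sketch rejects
        # such values instead of guessing an offset.
        raise ValueError("beta must be strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    """Convert an M-value back to a beta value."""
    # Two algebraically equivalent forms keep the exponent non-positive,
    # which avoids float overflow for large |m|.
    if m >= 0:
        return 1.0 / (1.0 + 2.0 ** (-m))
    return 2.0 ** m / (1.0 + 2.0 ** m)


if __name__ == "__main__":
    for beta in (0.1, 0.5, 0.9):
        m = beta_to_m(beta)
        print(f"beta={beta:.2f} -> M={m:+.3f} -> beta={m_to_beta(m):.2f}")
```

A beta value of 0.5 maps to an M-value of 0, and values above or below 0.5 map to positive or negative M-values respectively.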

+-----------------------------------+------------------------------------------------------------------------------+
| Name                              | Description                                                                  |
+===================================+==============================================================================+
| ``beta_PCA.py``                   | Perform principal component analysis (PCA) on samples.                       |
+-----------------------------------+------------------------------------------------------------------------------+
| ``beta_jitter_plot.py``           | Generate a jitter plot (a.k.a. strip chart) and a bean plot for each sample. |
+-----------------------------------+------------------------------------------------------------------------------+
| ``beta_m_conversion.py``          | Convert beta values to M-values or *vice versa*.                             |
+-----------------------------------+------------------------------------------------------------------------------+
| ``beta_profile_gene_centered.py`` | Calculate the methylation profile (i.e., average beta value) for genomic     |
|                                   | regions around genes.                                                        |
+-----------------------------------+------------------------------------------------------------------------------+
| ``beta_profile_region.py``        | Calculate the methylation profile (i.e., average beta value) around          |
|                                   | user-specified genomic regions.                                              |
+-----------------------------------+------------------------------------------------------------------------------+
| ``beta_stacked_barplot.py``       | Create a stacked barplot for each sample, showing the proportions of CpGs    |
|                                   | whose beta values fall into [0,0.25], [0.25,0.5], [0.5,0.75], and [0.75,1].  |
+-----------------------------------+------------------------------------------------------------------------------+
| ``beta_stats.py``                 | Summarize basic information on CpGs located in each genomic region.          |
+-----------------------------------+------------------------------------------------------------------------------+
| ``beta_tSNE.py``                  | Perform t-SNE (t-Distributed Stochastic Neighbor Embedding) analysis         |
|                                   | for samples.                                                                 |
+-----------------------------------+------------------------------------------------------------------------------+
| ``beta_topN.py``                  | Select the top N most variable CpGs (according to standard deviation) from   |
|                                   | the input file.                                                              |
+-----------------------------------+------------------------------------------------------------------------------+
| ``beta_trichotmize.py``           | Use a Bayesian Gaussian mixture model to trichotomize beta values into three |
|                                   | states ('Un-methylated', 'Semi-methylated', 'Full-methylated'), plus an      |
|                                   | 'unassigned' label.                                                          |
+-----------------------------------+------------------------------------------------------------------------------+
| ``beta_UMAP.py``                  | Perform UMAP (Uniform Manifold Approximation and Projection) for samples.    |
+-----------------------------------+------------------------------------------------------------------------------+
| ``beta_selectNbest.py``           | Select the K best features using ANOVA, mutual information, or the           |
|                                   | chi-squared statistic.                                                       |
+-----------------------------------+------------------------------------------------------------------------------+
| ``beta_combat.py``                | Correct batch effects using the ComBat algorithm.                            |
+-----------------------------------+------------------------------------------------------------------------------+

Differential CpG Analysis Modules
---------------------------------

These modules identify CpGs that are differentially methylated between experimental or biological groups.
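
For sequencing count data, one of the simplest such comparisons is Fisher's exact test on a 2x2 table of methylated and unmethylated read counts at a single CpG. The sketch below is a standard-library illustration of that test under the usual "sum the tables no more likely than the observed one" two-sided definition. It is not the code of ``dmc_fisher.py``; the function name, argument names, and example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(meth_a, unmeth_a, meth_b, unmeth_b):
    """Two-sided Fisher's exact test p-value for one CpG.

    Arguments are read counts: methylated and unmethylated reads in
    group A, then in group B (hypothetical names for this sketch).
    """
    row_a = meth_a + unmeth_a      # total reads in group A
    row_b = meth_b + unmeth_b      # total reads in group B
    col_meth = meth_a + meth_b     # total methylated reads
    total = comb(row_a + row_b, col_meth)  # number of possible tables

    def weight(x):
        # Number of tables with x methylated reads in group A, given the
        # fixed row and column totals (hypergeometric numerator).
        return comb(row_a, x) * comb(row_b, col_meth - x)

    observed = weight(meth_a)
    lo = max(0, col_meth - row_b)
    hi = min(row_a, col_meth)
    # Sum the weights of all tables that are no more likely than the
    # observed one; integer arithmetic keeps the comparison exact.
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / total


if __name__ == "__main__":
    # Hypothetical counts: group A has 8 methylated / 2 unmethylated reads,
    # group B has 1 methylated / 5 unmethylated reads.
    print(f"p = {fisher_exact_two_sided(8, 2, 1, 5):.4f}")
```

In practice an established implementation such as ``scipy.stats.fisher_exact`` is the more likely choice; the hand-written version is shown only to make the calculation visible.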

+-----------------------------------+------------------------------------------------------------------------------+
| Name                              | Description                                                                  |
+===================================+==============================================================================+
| ``dmc_Bayes.py``                  | Differential CpG analysis using a Bayesian approach (for 450K/850K data).    |
+-----------------------------------+------------------------------------------------------------------------------+
| ``dmc_bb.py``                     | Differential CpG analysis using the beta-binomial model                      |
|                                   | (for RRBS/WGBS count data).                                                  |
+-----------------------------------+------------------------------------------------------------------------------+
| ``dmc_fisher.py``                 | Differential CpG analysis using Fisher's exact test                          |
|                                   | (for RRBS/WGBS count data).                                                  |
+-----------------------------------+------------------------------------------------------------------------------+
| ``dmc_glm.py``                    | Differential CpG analysis using a generalized linear model (GLM)             |
|                                   | (for 450K/850K data).                                                        |
+-----------------------------------+------------------------------------------------------------------------------+
| ``dmc_logit.py``                  | Differential CpG analysis using a logistic regression model                  |
|                                   | (for RRBS/WGBS count data).                                                  |
+-----------------------------------+------------------------------------------------------------------------------+
| ``dmc_nonparametric.py``          | Differential CpG analysis using the Mann-Whitney U test for two-group        |
|                                   | comparisons and the Kruskal-Wallis H-test for multiple-group comparisons.    |
+-----------------------------------+------------------------------------------------------------------------------+
| ``dmc_ttest.py``                  | Differential CpG analysis using the t-test (for 450K/850K data).             |
+-----------------------------------+------------------------------------------------------------------------------+

Predictive Modules
------------------

These modules aim to predict phenotypes or biological attributes from DNA methylation profiles.

+-----------------------------------+------------------------------------------------------------------------------+
| Name                              | Description                                                                  |
+===================================+==============================================================================+
| ``predict_sex.py``                | Predict sex based on the semi-methylation (also known as genomic             |
|                                   | imprinting) ratio.                                                           |
+-----------------------------------+------------------------------------------------------------------------------+