26. dmc_logit.py

26.1. Description

This program performs differential CpG analysis using a logistic regression model based on methylation proportion values. It supports covariate analysis. Users can choose either the “binomial” or the “quasibinomial” family to model the data. The quasibinomial family estimates an additional parameter that indicates the amount of overdispersion.
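To illustrate what the extra quasibinomial parameter captures, here is a minimal sketch; it is not the program's own implementation. It assumes a model with only a categorical group term, in which case the fitted proportion for each sample is its group's pooled proportion, and it estimates the dispersion as the Pearson chi-square statistic divided by the residual degrees of freedom. The function name and the example counts are hypothetical.

```python
def pearson_dispersion(counts, groups):
    """Estimate overdispersion for one CpG under a group-only binomial model.

    counts: list of (methyl_count, total_count) tuples, one per sample
    groups: list of group labels, one per sample
    Assumes every pooled group proportion lies strictly between 0 and 1.
    """
    # Pool counts per group: with a group-only model, the fitted
    # proportion of each sample is its group's pooled proportion.
    pooled = {}
    for (methyl, total), group in zip(counts, groups):
        pm, pn = pooled.get(group, (0, 0))
        pooled[group] = (pm + methyl, pn + total)

    chi2 = 0.0
    for (methyl, total), group in zip(counts, groups):
        pm, pn = pooled[group]
        p = pm / pn
        # Squared Pearson residual of a binomial observation
        chi2 += (methyl - total * p) ** 2 / (total * p * (1 - p))

    # Residual degrees of freedom: samples minus fitted group parameters
    resid_df = len(counts) - len(pooled)
    return chi2 / resid_df


# Hypothetical counts for one CpG in four samples, two per group
counts = [(20, 30), (10, 30), (25, 30), (5, 30)]
groups = ["A", "A", "B", "B"]
dispersion = pearson_dispersion(counts, groups)
```

A value close to 1 is what a plain binomial model expects; values well above 1 suggest overdispersion, which is the situation the quasibinomial family is meant to accommodate.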

26.2. Options

--version

show program’s version number and exit

-h, --help

show this help message and exit

-i INPUT_FILE, --input_file=INPUT_FILE

Data file containing methylation proportions (represented by “methyl_count,total_count”, e.g. “20,30”). The 1st row contains sample IDs (must be unique) and the 1st column contains CpG positions or probe IDs (must be unique). This file can be a regular text file, a compressed file (*.gz, *.bz2), or an accessible URL.
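The layout described above can be read with a short sketch like the following. It assumes the file is tab-delimited (suggested by the .tsv extension in the example command, but not stated explicitly) and that every cell holds a valid “methyl_count,total_count” pair; missing values are not handled. The function names and the example rows are hypothetical.

```python
import csv
import io


def parse_proportion(cell):
    """Split a 'methyl_count,total_count' cell into two integers."""
    methyl, total = cell.split(",")
    methyl, total = int(methyl), int(total)
    if methyl < 0 or total < methyl:
        raise ValueError(f"invalid proportion: {cell!r}")
    return methyl, total


def read_data(handle):
    """Read the data table from an open text handle (tab-delimited assumed)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_ids = header[1:]  # 1st row: sample IDs
    if len(set(sample_ids)) != len(sample_ids):
        raise ValueError("sample IDs must be unique")

    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        cpg_id = row[0]  # 1st column: CpG position or probe ID
        if cpg_id in table:
            raise ValueError(f"duplicate CpG ID: {cpg_id}")
        table[cpg_id] = [parse_proportion(cell) for cell in row[1:]]
    return sample_ids, table


# Hypothetical two-sample example
example = "cgID\tS1\tS2\nchr1:100\t20,30\t5,40\n"
sample_ids, table = read_data(io.StringIO(example))
```

For a compressed file, the same function could be given a handle opened in text mode, for example with gzip.open(path, "rt") or bz2.open(path, "rt") from the Python standard library.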

-g GROUP_FILE, --group=GROUP_FILE

Group file defining the biological group of each sample as well as other covariates such as gender and age. The first variable is the grouping variable (must be categorical); all other variables are treated as covariates (can be categorical or continuous). Sample IDs should match those in the data file.
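As a rough sketch of the sample-ID requirement, the snippet below reads a group file and checks it against the sample IDs of the data file. The exact layout is an assumption: a comma-separated file with a header row, sample IDs in the first column, the grouping variable in the second, and any covariates after that. The function names, column names, and values are hypothetical.

```python
import csv
import io


def read_groups(handle):
    """Read sample -> group and sample -> covariates from a CSV handle."""
    reader = csv.reader(handle)
    header = next(reader)
    covariate_names = header[2:]  # columns after the grouping variable
    groups, covariates = {}, {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        sample_id = row[0]
        groups[sample_id] = row[1]
        covariates[sample_id] = dict(zip(covariate_names, row[2:]))
    return groups, covariates


def check_sample_ids(data_ids, groups):
    """Raise if any sample in the data file has no group assignment."""
    missing = [s for s in data_ids if s not in groups]
    if missing:
        raise ValueError(f"samples missing from group file: {missing}")


# Hypothetical group file with one covariate
example = "Sample,Group,Age\nS1,Ctrl,50\nS2,Case,61\n"
groups, covariates = read_groups(io.StringIO(example))
check_sample_ids(["S1", "S2"], groups)
```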

-f FAMILY_FUNC, --family=FAMILY_FUNC

Error distribution and link function to be used in the GLM. Can be integer 1 or 2, with 1 = “quasibinomial” and 2 = “binomial”. Default=1.

-o OUT_FILE, --output=OUT_FILE

The prefix of the output file.

26.3. Input files (examples)

26.4. Command

$ dmc_logit.py -i test_04_TwoGroup.tsv.gz -g test_04_TwoGroup.grp.csv -o output_quasibin
$ dmc_logit.py -i test_04_TwoGroup.tsv.gz -g test_04_TwoGroup.grp.csv -f 2 -o output_bin