29. predict_sex.py

29.1. Description

Predict sex from the semi-methylation (SM; also known as genomic imprinting) ratio of CpGs on the X chromosome. Because of X chromosome inactivation, females have a higher proportion of semi-methylated CpGs on the X chromosome than males do. With the default cutoff of 0, a log2(SM ratio) greater than 0 indicates a female, while a log2(SM ratio) less than 0 indicates a male.
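
The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code used by predict_sex.py: the description does not state how the SM ratio is computed, so the semi-methylated band (beta between 0.4 and 0.6), the choice of denominator (all other X-chromosome CpGs), and the function names `log2_sm_ratio` and `predict_sex` are assumptions made here for the example.

```python
import math


def log2_sm_ratio(betas, low=0.4, high=0.6):
    """Return log2(semi-methylated count / other count) for one sample.

    ASSUMPTION: the [low, high] band and the denominator are illustrative;
    predict_sex.py may define the SM ratio differently.
    """
    values = [b for b in betas if b is not None and not math.isnan(b)]
    semi = sum(1 for b in values if low <= b <= high)
    other = len(values) - semi
    if semi == 0 or other == 0:
        raise ValueError("ratio undefined: one class of probes is empty")
    return math.log2(semi / other)


def predict_sex(betas, cutoff=0.0):
    """Call 'Female' above the cutoff, 'Male' otherwise.

    ASSUMPTION: a value exactly equal to the cutoff is called 'Male' here;
    the documentation does not say how ties are handled.
    """
    return "Female" if log2_sm_ratio(betas) > cutoff else "Male"


# Toy beta values for X-chromosome CpGs (made up for illustration).
female_like = [0.45, 0.5, 0.55, 0.52, 0.48, 0.1, 0.9]  # 5 semi, 2 other
male_like = [0.05, 0.1, 0.9, 0.95, 0.92, 0.5, 0.08]    # 1 semi, 6 other

print(predict_sex(female_like))  # Female
print(predict_sex(male_like))    # Male
```

The `cutoff` argument plays the same role as the -c option: raising it makes a female call more conservative.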

29.2. Options

Options:
--version

show program’s version number and exit

-h, --help

show this help message and exit

-i INPUT_FILE, --input_file=INPUT_FILE

Tab-separated data frame file containing beta values. The 1st row contains sample IDs and the 1st column contains CpG IDs.

-x XPROBE_FILE, --xprobe=XPROBE_FILE

File with CpG IDs mapped to the X chromosome, with one probe listed per row.

-c CUTOFF, --cut=CUTOFF

The cutoff on log2(SM ratio) used to predict sex. A log2(SM ratio) greater than this cutoff indicates a female, while a log2(SM ratio) less than this cutoff indicates a male. default=0.0

-o OUT_FILE, --output=OUT_FILE

The prefix of the output file.
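
To make the two input formats concrete, the sketch below reads a probe list (one CpG ID per row) and a tab-separated beta table, and keeps only the X-chromosome probes for each sample. It is an illustration under stated assumptions, not the program's own parser: the helper names `read_probe_ids` and `read_x_betas`, the probe and sample IDs, and the assumption that the header row has a label in its first cell are all hypothetical. For gzip-compressed files, a handle from `gzip.open(path, "rt")` could be passed instead of the in-memory handles used here.

```python
import csv
import io


def read_probe_ids(handle):
    """Return the set of CpG IDs, one per non-empty row."""
    return {line.strip() for line in handle if line.strip()}


def read_x_betas(handle, x_probes):
    """Return {sample_id: [beta, ...]} restricted to probes in x_probes.

    ASSUMPTION: the first header cell labels the CpG ID column, so sample
    IDs start at the second cell.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_ids = header[1:]
    betas = {sid: [] for sid in sample_ids}
    for row in reader:
        if not row or row[0] not in x_probes:
            continue  # skip blank rows and probes that are not on chrX
        for sid, value in zip(sample_ids, row[1:]):
            try:
                betas[sid].append(float(value))
            except ValueError:
                pass  # skip missing or non-numeric entries such as "NA"
    return betas


# Hypothetical miniature inputs (IDs and values are made up).
table = "ID\tS1\tS2\ncgX1\t0.5\t0.1\ncgA1\t0.9\t0.9\ncgX2\t0.45\tNA\n"
x_probes = read_probe_ids(io.StringIO("cgX1\ncgX2\n"))
betas = read_x_betas(io.StringIO(table), x_probes)
print(betas)  # {'S1': [0.5, 0.45], 'S2': [0.1]}
```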

29.3. Input files (examples)

29.4. Command

predict_sex.py -x chrX_CpGs.txt.gz -i test_10.tsv.gz -o output

29.5. Output files

  • output.predicted_sex.tsv

$ cat output.predicted_sex.tsv
Sample_ID log2_SM_ratio Predicted_sex
2621  -2.249628052954919  Male
2622  -2.2671726671830674 Male
2691  1.4530581933290616  Female
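
For downstream checks, the output table can be loaded and tallied with the Python standard library. The sketch below assumes the file is tab-separated with the three column names shown above; the helper name `read_predictions` is hypothetical, and the rows are the three example lines from this section, held in memory rather than read from disk.

```python
import csv
import io
from collections import Counter


def read_predictions(handle):
    """Return a list of (sample_id, log2_sm_ratio, predicted_sex) tuples."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["Sample_ID"], float(row["log2_SM_ratio"]), row["Predicted_sex"])
        for row in reader
    ]


# The three example rows shown above, joined with tabs.
example = (
    "Sample_ID\tlog2_SM_ratio\tPredicted_sex\n"
    "2621\t-2.249628052954919\tMale\n"
    "2622\t-2.2671726671830674\tMale\n"
    "2691\t1.4530581933290616\tFemale\n"
)

predictions = read_predictions(io.StringIO(example))
counts = Counter(sex for _, _, sex in predictions)
print(counts["Male"], counts["Female"])  # 2 1
```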

29.6. Evaluation

When evaluated on Illumina HumanMethylation450 BeadChip data (GSE105018) from 832 males and 826 females, this classifier predicted sex with 100% accuracy.

[Figure: predict_sex.png]