10. beta_PCA.py

10.1. Description

This program performs PCA (principal component analysis) for samples.

Example of input data file

ID     Sample_01       Sample_02       Sample_03       Sample_04
cg_001 0.831035        0.878022        0.794427        0.880911
cg_002 0.249544        0.209949        0.234294        0.236680
cg_003 0.845065        0.843957        0.840184        0.824286
...

Example of input group file

Sample,Group
Sample_01,normal
Sample_02,normal
Sample_03,tumor
Sample_04,tumo
...

Notes

  • Rows with missing values will be removed

  • Beta values will be standardized into z scores

  • Only the first two components will be visualized

  • Variance% explained by each component will be printed to screen

Options:
--version

show program’s version number and exit

-h, --help

show this help message and exit

-i INPUT_FILE, --input_file=INPUT_FILE

Tab-separated data frame file containing beta values with the 1st row containing sample IDs and the 1st column containing CpG IDs.

-g GROUP_FILE, --group=GROUP_FILE

Comma-separated group file defining the biological groups of each sample. Different groups will be colored differently in the PCA plot. Supports a maximum of 20 groups.

-n N_COMPONENTS, --ncomponent=N_COMPONENTS

Number of components. default=2

-l, --label

If True, sample ids will be added underneath the data point. default=False

-c PLOT_CHAR, --char=PLOT_CHAR

Ploting character: 1 = ‘dot’, 2 = ‘circle’. default=1

-a PLOT_ALPHA, --alpha=PLOT_ALPHA

Opacity of dots. default=0.5

-x LEGEND_LOCATION, --loc=LEGEND_LOCATION

Location of legend panel: 1 = ‘topright’, 2 = ‘bottomright’, 3 = ‘bottomleft’, 4 = ‘topleft’. default=1

-o OUT_FILE, --output=OUT_FILE

The prefix of the output file.

10.2. Input files (examples)

10.3. Command

$beta_PCA.py -i cirrHCV_vs_normal.data.tsv -g cirrHCV_vs_normal.grp.csv -o HCV_vs_normal

10.4. Output files

  • HCV_vs_normal.PCA.r

  • HCV_vs_normal.PCA.tsv

  • HCV_vs_normal.PCA.pdf

../_images/HCV_vs_normal.PCA.png