19. beta_topN.py

19.1. Description

This program selects the top N rows from the input file, i.e., the N most variable CpGs, ranked by the standard deviation of their beta values across samples. The resulting file can be used for clustering and principal component analysis (PCA).

Example of input

CpG_ID  Sample_01       Sample_02       Sample_03       Sample_04
cg_001  0.831035        0.878022        0.794427        0.880911
cg_002  0.249544        0.209949        0.234294        0.236680
cg_003  0.845065        0.843957        0.840184        0.824286
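A minimal sketch of the ranking step follows. It assumes the selection works as described above: compute a per-row standard deviation, sort in descending order, and keep the first N. It uses Python's `statistics.stdev` (sample standard deviation) on the three example rows. Whether beta_topN.py uses the sample or population formula is an assumption here, but the choice does not change the ranking when every row has the same number of samples, because the two differ only by a constant factor. The function name `top_n_by_stdev` is hypothetical, and missing values are not handled.

```python
import statistics

# Example beta values copied from the input table above (CpG_ID -> per-sample values).
beta = {
    "cg_001": [0.831035, 0.878022, 0.794427, 0.880911],
    "cg_002": [0.249544, 0.209949, 0.234294, 0.236680],
    "cg_003": [0.845065, 0.843957, 0.840184, 0.824286],
}


def top_n_by_stdev(rows, n):
    """Hypothetical helper: return the n CpG IDs with the largest standard deviation."""
    # Sample standard deviation of each CpG across samples; assumes no missing values.
    sd = {cpg: statistics.stdev(values) for cpg, values in rows.items()}
    # Most variable CpGs first, then keep the first n.
    ranked = sorted(sd, key=sd.get, reverse=True)
    return ranked[:n]


print(top_n_by_stdev(beta, 2))  # ['cg_001', 'cg_002']
```

In this example cg_003 has the smallest spread (roughly 0.824 to 0.845) and is the first row dropped.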

19.2. Options

--version show program’s version number and exit
-h, --help show this help message and exit
-i INPUT_FILE, --input_file=INPUT_FILE
 Tab-separated data frame file containing beta values with the 1st row containing sample IDs and the 1st column containing CpG IDs.
-c COUNT
 Number of most variable CpGs (ranked by standard deviation) to keep. default=1000
-o OUT_FILE, --output=OUT_FILE
 Prefix of the output files.

19.3. Input files (examples)

19.4. Command

$ beta_topN.py -i test_05_TwoGroup.tsv.gz -c 500 -o test_05_TwoGroup

19.5. Output files

  • test_05_TwoGroup.sortedStdev.tsv
  • test_05_TwoGroup.sortedStdev.topN.tsv
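The sketch below illustrates, under stated assumptions, how a command like the one above could map to the two output files. It reads a plain or gzip-compressed TSV, sorts all rows by standard deviation, writes every row to `<prefix>.sortedStdev.tsv`, and writes the first `count` rows to `<prefix>.sortedStdev.topN.tsv`. The functions `open_text` and `beta_top_n` and their parameters are hypothetical. The file contents written here (same columns as the input, no extra standard-deviation column) are an assumption and may differ from what beta_topN.py actually produces.

```python
import csv
import gzip
import statistics


def open_text(path):
    """Open a plain or gzip-compressed text file (gzip detected by the .gz suffix)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def beta_top_n(input_file, out_prefix, count=1000):
    """Hypothetical sketch: write all rows sorted by stdev, then only the top `count` rows."""
    with open_text(input_file) as fh:
        reader = csv.reader(fh, delimiter="\t")
        header = next(reader)  # first row holds the sample IDs
        rows = []
        for fields in reader:
            if not fields:
                continue  # skip blank lines
            # Columns after the CpG ID are beta values; assumes no missing values.
            values = [float(x) for x in fields[1:]]
            rows.append((statistics.stdev(values), fields))

    # Most variable CpGs first.
    rows.sort(key=lambda r: r[0], reverse=True)

    sorted_path = f"{out_prefix}.sortedStdev.tsv"
    top_path = f"{out_prefix}.sortedStdev.topN.tsv"
    for path, subset in ((sorted_path, rows), (top_path, rows[:count])):
        with open(path, "w", newline="") as out:
            writer = csv.writer(out, delimiter="\t", lineterminator="\n")
            writer.writerow(header)
            for _, fields in subset:
                writer.writerow(fields)
    return sorted_path, top_path
```

For example, `beta_top_n("test_05_TwoGroup.tsv.gz", "test_05_TwoGroup", count=500)` mirrors the command in 19.4. Keeping the full sorted file alongside the top-N subset lets a different cutoff be taken later without recomputing the standard deviations.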