19. beta_topN.py

19.1. Description

This program selects the N rows (CpGs) with the highest standard deviation from the input file. The resulting file can be used for clustering and principal component analysis (PCA).

Example input (tab-separated):

CpG_ID  Sample_01       Sample_02       Sample_03       Sample_04
cg_001  0.831035        0.878022        0.794427        0.880911
cg_002  0.249544        0.209949        0.234294        0.236680
cg_003  0.845065        0.843957        0.840184        0.824286
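The selection step can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the tool's actual implementation. It assumes three things: every value is numeric with no missing entries, a `.gz` suffix means gzip compression, and the spread is measured with the sample standard deviation (`statistics.stdev`, n - 1 denominator). The real script may handle any of these differently. The helper names `open_table` and `top_n_by_stdev` are hypothetical.

```python
import gzip
import statistics
from typing import Iterable, List, Tuple

Row = Tuple[str, float, List[str]]  # (CpG ID, standard deviation, raw beta strings)


def open_table(path: str):
    """Open a plain or gzip-compressed TSV in text mode (assumes .gz means gzip)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def top_n_by_stdev(lines: Iterable[str], n: int) -> Tuple[List[str], List[Row]]:
    """Return the header and the n rows with the largest standard deviation."""
    it = iter(lines)
    header = next(it).rstrip("\n").split("\t")
    scored: List[Row] = []
    for line in it:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        values = [float(v) for v in fields[1:]]  # assumes no missing values
        # Sample standard deviation; the real tool may use a different estimator.
        scored.append((fields[0], statistics.stdev(values), fields[1:]))
    scored.sort(key=lambda r: r[1], reverse=True)  # most variable first
    return header, scored[:n]


# Usage with the example rows shown above (tab-separated):
example = [
    "CpG_ID\tSample_01\tSample_02\tSample_03\tSample_04\n",
    "cg_001\t0.831035\t0.878022\t0.794427\t0.880911\n",
    "cg_002\t0.249544\t0.209949\t0.234294\t0.236680\n",
    "cg_003\t0.845065\t0.843957\t0.840184\t0.824286\n",
]
header, top = top_n_by_stdev(example, 2)
for cpg_id, sd, _ in top:
    print(cpg_id, round(sd, 4))
```

For a real file, pass the opened handle instead, for example `with open_table("test_05_TwoGroup.tsv.gz") as fh: header, top = top_n_by_stdev(fh, 500)`.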

19.2. Options

Options:

`--version`	show program’s version number and exit
`-h, --help`	show this help message and exit
`-i INPUT_FILE, --input_file=INPUT_FILE`
	Tab-separated data frame file containing beta values with the 1st row containing sample IDs and the 1st column containing CpG IDs.
`-c CPG_COUNT, --count=CPG_COUNT`
	Number of most variable CpGs (ranked by standard deviation) to keep. default=1000
`-o OUT_FILE, --output=OUT_FILE`
	Prefix of the output files.
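The documented options can be reproduced with `argparse` to show how they map to parsed values. This is a hypothetical reconstruction and not the script's own parser. The destination names (`input_file`, `cpg_count`, `out_file`) are assumptions inferred from the metavars. `--version` is omitted because the version string is not documented here, and `-h/--help` is added automatically by `argparse`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented beta_topN.py options."""
    parser = argparse.ArgumentParser(
        prog="beta_topN.py",
        description="Keep the N most variable CpGs (ranked by standard deviation).",
    )
    parser.add_argument(
        "-i", "--input_file", dest="input_file", metavar="INPUT_FILE",
        help="Tab-separated beta-value table (sample IDs in row 1, CpG IDs in column 1).",
    )
    parser.add_argument(
        "-c", "--count", dest="cpg_count", metavar="CPG_COUNT", type=int, default=1000,
        help="Number of most variable CpGs to keep (default: 1000).",
    )
    parser.add_argument(
        "-o", "--output", dest="out_file", metavar="OUT_FILE",
        help="Prefix of the output files.",
    )
    return parser


# Parse the arguments used in the example command of this page.
args = build_parser().parse_args(
    ["-i", "test_05_TwoGroup.tsv.gz", "-c", "500", "-o", "test_05_TwoGroup"]
)
print(args.input_file, args.cpg_count, args.out_file)
```

When `-c` is omitted, `cpg_count` falls back to the documented default of 1000.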

19.3. Input files (examples)

test_05_TwoGroup.tsv.gz

19.4. Command

$ beta_topN.py -i test_05_TwoGroup.tsv.gz -c 500 -o test_05_TwoGroup

19.5. Output files

test_05_TwoGroup.sortedStdev.tsv
test_05_TwoGroup.sortedStdev.topN.tsv
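The two file names follow from the `-o` prefix. The sketch below derives them and writes both tables: all rows ranked by standard deviation, and the first N of them. It is a minimal illustration under stated assumptions. The naming pattern is taken from the example outputs. The descending sort order, the sample standard deviation, and the absence of an extra standard-deviation column are all assumptions, and the real files may differ. `output_paths` and `write_outputs` are hypothetical names.

```python
import os
import statistics
import tempfile
from typing import List, Tuple


def output_paths(prefix: str) -> Tuple[str, str]:
    """Derive output names from the -o prefix (pattern from the example outputs)."""
    return prefix + ".sortedStdev.tsv", prefix + ".sortedStdev.topN.tsv"


def write_outputs(prefix: str, header: List[str], rows: List[List[str]], n: int) -> Tuple[str, str]:
    """Write all rows sorted by standard deviation, then the top n rows."""
    # Assumption: descending sample standard deviation; no SD column is written.
    ranked = sorted(rows, key=lambda r: statistics.stdev([float(v) for v in r[1:]]), reverse=True)
    sorted_path, top_path = output_paths(prefix)
    for path, subset in ((sorted_path, ranked), (top_path, ranked[:n])):
        with open(path, "w") as fh:
            fh.write("\t".join(header) + "\n")
            for r in subset:
                fh.write("\t".join(r) + "\n")
    return sorted_path, top_path


header = ["CpG_ID", "Sample_01", "Sample_02", "Sample_03", "Sample_04"]
rows = [
    ["cg_001", "0.831035", "0.878022", "0.794427", "0.880911"],
    ["cg_002", "0.249544", "0.209949", "0.234294", "0.236680"],
    ["cg_003", "0.845065", "0.843957", "0.840184", "0.824286"],
]
with tempfile.TemporaryDirectory() as tmp:
    sorted_path, top_path = write_outputs(os.path.join(tmp, "test_05_TwoGroup"), header, rows, n=2)
    with open(sorted_path) as fh:
        sorted_lines = fh.read().splitlines()
    with open(top_path) as fh:
        top_lines = fh.read().splitlines()
print(os.path.basename(sorted_path), os.path.basename(top_path))
```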