# 1. Input file and data format

## 1.1. BED file

BED (Browser Extensible Data) format is commonly used to describe genomic regions. A BED file contains one line per feature, and each line has 3 to 12 columns. Coordinates are 0-based (the first base of a chromosome is numbered 0) and intervals are half-open: chromStart is included and chromEnd is excluded. Equivalently, when bases are counted from 1, the interval is left-open and right-closed. For example, the BED entry “chr1 10 15” contains the 11th, 12th, 13th, 14th and 15th bases of chromosome 1.
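The coordinate convention can be checked with a short Python sketch. The helper `bed_to_one_based` is hypothetical (not part of any tool described here); it maps a BED interval to the 1-based positions of the bases it covers.

```python
from typing import List


def bed_to_one_based(chrom_start: int, chrom_end: int) -> List[int]:
    """Return the 1-based positions covered by a BED interval (hypothetical helper).

    BED intervals are 0-based and half-open, [chromStart, chromEnd), so the
    0-based positions are chromStart .. chromEnd - 1. Adding 1 to each gives
    the 1-based positions chromStart + 1 .. chromEnd.
    """
    if chrom_start < 0 or chrom_end < chrom_start:
        raise ValueError("expected 0 <= chromStart <= chromEnd")
    return list(range(chrom_start + 1, chrom_end + 1))


# "chr1 10 15" covers the 11th to 15th bases of chromosome 1
print(bed_to_one_based(10, 15))  # [11, 12, 13, 14, 15]
# The interval length is simply chromEnd - chromStart
print(15 - 10)                   # 5
```

A convenient consequence of the half-open convention is that the length of an interval is `chromEnd - chromStart`, with no off-by-one correction.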

- BED12 file
  - The standard BED format with 12 fields. Each row describes a gene or a set of disconnected genomic regions. Details are described in the UCSC Genome Browser BED format specification.
- BED3 file
  - Has only the first three required fields (chrom, chromStart, chromEnd). Each row represents a single genomic region for which “score” and “strand” are not needed.
- BED3+ file
  - Has at least three columns (chrom, chromStart, chromEnd). Additional columns are allowed but ignored.
- BED6 file
  - Has the first six fields (chrom, chromStart, chromEnd, name, score, strand). Each row represents a single genomic region and its associated score, or a region for which “strand” information is essential.
- BED6+ file
  - Has at least six columns (chrom, chromStart, chromEnd, name, score, strand). Additional columns are allowed but ignored.
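The rule that BED3+ and BED6+ files keep only their first three or six columns can be sketched as a small line parser. `BedRecord` and `parse_bed_line` are hypothetical names; the sketch assumes fields are separated by whitespace (tabs or spaces) and that no field contains internal spaces, and it stores the score as a float.

```python
from typing import NamedTuple, Optional


class BedRecord(NamedTuple):
    chrom: str
    start: int
    end: int
    name: Optional[str] = None
    score: Optional[float] = None
    strand: Optional[str] = None


def parse_bed_line(line: str, ncols: int = 3) -> BedRecord:
    """Parse a BED3+ (ncols=3) or BED6+ (ncols=6) line; extra columns are ignored."""
    if ncols not in (3, 6):
        raise ValueError("ncols must be 3 or 6")
    fields = line.split()  # assumption: whitespace-delimited, no spaces inside fields
    if len(fields) < ncols:
        raise ValueError(f"expected at least {ncols} columns, got {len(fields)}")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if ncols == 3:
        return BedRecord(chrom, start, end)
    strand = fields[5]
    if strand not in ("+", "-", "."):
        raise ValueError(f"invalid strand: {strand!r}")
    return BedRecord(chrom, start, end, fields[3], float(fields[4]), strand)


# A BED6+ line: the seventh column ("extra") is ignored
rec = parse_bed_line("chr1\t10\t15\tcg001\t0.85\t+\textra", ncols=6)
print(rec.chrom, rec.start, rec.end, rec.strand)  # chr1 10 15 +
```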

## 1.2. Proportion values

In bisulfite sequencing (RRBS or WGBS), the methylation level of a particular CpG or region can be represented by a “proportion” value. We define the proportion value as a pair of integers separated by a comma (“,”): the first integer (m, 0 ≤ m ≤ n) is the number of methylated reads, and the second integer (n, n ≥ 0) is the total number of reads. For example:

```
0,10 1,27 2,159 # Three proportion values indicating 3 hypo-methylated loci
7,7 17,19 30,34 # Three proportion values indicating 3 hyper-methylated loci
```
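A minimal Python sketch of reading proportion values and enforcing the constraint 0 ≤ m ≤ n follows. The function name `parse_proportion` is hypothetical; it only illustrates the format defined above.

```python
from typing import Tuple


def parse_proportion(token: str) -> Tuple[int, int]:
    """Parse a proportion value "m,n" into (methylated reads, total reads)."""
    m_str, n_str = token.split(",")  # raises ValueError if not exactly one comma
    m, n = int(m_str), int(n_str)
    if not 0 <= m <= n:  # also implies n >= 0
        raise ValueError(f"expected 0 <= m <= n, got {token!r}")
    return m, n


line = "0,10 1,27 2,159"
print([parse_proportion(tok) for tok in line.split()])
# [(0, 10), (1, 27), (2, 159)]
```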

## 1.3. Beta values

The Beta-value is a value between 0 and 1 that can be interpreted as an approximation of the fraction of methylation at a given CpG or locus. It is calculated as $\beta = \frac{C}{C + U}$, where C is the “probe intensity” or “read count” of the methylated allele and U is the “probe intensity” or “read count” of the unmethylated allele. A proportion value can be converted into a Beta-value (C = m, U = n − m), but not vice versa, because the Beta-value does not retain the total read count n.
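The conversion from read counts to a Beta-value, β = C / (C + U), and the reason it cannot be reversed can be shown in a few lines of Python. `beta_value` and `proportion_to_beta` are hypothetical helper names; the sketch treats a proportion “m,n” as C = m methylated and U = n − m unmethylated reads, and rejects loci with zero coverage because β is undefined there.

```python
def beta_value(c: float, u: float) -> float:
    """Beta = C / (C + U) for methylated (C) and unmethylated (U) counts."""
    if c < 0 or u < 0 or c + u == 0:
        raise ValueError("C and U must be non-negative and not both zero")
    return c / (c + u)


def proportion_to_beta(token: str) -> float:
    """Convert a proportion value "m,n" to a Beta-value, using C = m and U = n - m."""
    m, n = (int(x) for x in token.split(","))
    if not 0 <= m <= n:
        raise ValueError(f"expected 0 <= m <= n, got {token!r}")
    return beta_value(m, n - m)


print(proportion_to_beta("17,19"))  # ≈ 0.8947
# Different proportions map to the same Beta-value, so the total read
# count n is lost and the conversion cannot be reversed:
print(proportion_to_beta("1,2"), proportion_to_beta("5,10"))  # 0.5 0.5
```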

## 1.4. M values

The M-value is the log2 ratio of the probe intensities (or read counts) of the methylated allele versus the unmethylated allele: $M = \log_2\left(\frac{C + w}{U + w}\right)$, where C is the “probe intensity” or “read count” of the methylated allele and U is the “probe intensity” or “read count” of the unmethylated allele. w is an offset (pseudo count) added to both the numerator and the denominator; it damps large swings in M when counts are small and avoids dividing by zero or taking the logarithm of zero.
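The M-value formula, M = log2((C + w) / (U + w)), is sketched below in Python. The function name `m_value` and the default offset `w = 1.0` are assumptions made for illustration only; the appropriate offset depends on the data and tool.

```python
import math


def m_value(c: float, u: float, w: float = 1.0) -> float:
    """M = log2((C + w) / (U + w)); w is a pseudo count (hypothetical default 1.0)."""
    if c + w <= 0 or u + w <= 0:
        raise ValueError("C + w and U + w must both be positive")
    return math.log2((c + w) / (u + w))


print(m_value(0, 10))  # log2(1/11) ≈ -3.459 (finite, although C = 0)
print(m_value(7, 7))   # 0.0
print(m_value(7, 0))   # log2(8/1) = 3.0 (no division by zero, although U = 0)
```

Without the offset, `m_value(0, 10, w=0)` raises `ValueError` in this sketch, since log2(0) is undefined.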

## 1.5. Convert Beta value to M value or *vice versa*

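The relationship between Beta-value and M-value is shown in the equation and figure below. Ignoring the offset (w = 0), the M-value is the base-2 logit of the Beta-value:

$$M = \log_2\left(\frac{\beta}{1 - \beta}\right), \qquad \beta = \frac{2^M}{2^M + 1}$$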
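Assuming no offset (w = 0), the two conversions M = log2(β / (1 − β)) and β = 2^M / (2^M + 1) can be sketched in Python as follows. `beta_to_m` and `m_to_beta` are hypothetical helper names. The M-value is undefined at β = 0 and β = 1, so the sketch rejects those inputs; when a nonzero offset was used to compute M from counts, these identities hold only approximately.

```python
import math


def beta_to_m(beta: float) -> float:
    """M = log2(beta / (1 - beta)); defined only for 0 < beta < 1."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must be strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """beta = 2**M / (2**M + 1), rewritten as 1 / (1 + 2**-M)."""
    return 1.0 / (1.0 + 2.0 ** (-m))


print(beta_to_m(0.5))  # 0.0
print(beta_to_m(0.8))  # ≈ 2.0, since 0.8 / 0.2 = 4
print(m_to_beta(2.0))  # 0.8
print(m_to_beta(0.0))  # 0.5
```

Writing β as 1 / (1 + 2^−M) is algebraically identical to 2^M / (2^M + 1) and avoids computing 2^M twice.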