Input file


To simulate CNVs, you need a CNV information file that specifies the CNV locations, CNV types, frequencies, and effect sizes. The file expects five columns, in this order:

Start: the start position of the CNV. Note that the position is the order (index) of the SNP site in the reference sequences, not a chromosomal position.
End: the end position of the CNV. As with Start, the position is the order (index) of the SNP site in the reference sequences, not a chromosomal position.
Type: -1 for a deletion, 1 for a two-copy duplication (two copies on a chromosome), and 2 for a three-copy duplication (three copies on a chromosome).
Frequency: the frequency of the CNV.
Effect size: the effect size of the CNV on the disease, given as an odds ratio relative to the normal copy.
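To make the column layout concrete, here is a minimal Python sketch of a reader for this format. It is not part of OmicsSIMLA: the names read_cnv_file, CnvRecord, and COPIES_PER_TYPE are hypothetical, and it assumes the columns are separated by whitespace (as in the example below) and that there is no header line. It maps each Type code to the number of copies of the segment on one chromosome (a deletion leaves 0 copies) and performs a few basic sanity checks.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical helper: copies of the segment carried on ONE chromosome for
# each Type code. A normal (unaffected) segment carries exactly one copy.
COPIES_PER_TYPE = {-1: 0, 1: 2, 2: 3}


@dataclass
class CnvRecord:
    start: int          # order of the SNP site, not a chromosomal position
    end: int            # order of the SNP site, not a chromosomal position
    cnv_type: int       # -1 (deletion), 1 (two-copy dup), 2 (three-copy dup)
    frequency: float    # frequency of this CNV
    effect_size: float  # odds ratio relative to the normal copy

    @property
    def copies(self) -> int:
        return COPIES_PER_TYPE[self.cnv_type]


def read_cnv_file(lines: Iterable[str]) -> List[CnvRecord]:
    """Parse 5-column, whitespace-separated CNV lines; blank lines are skipped."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 5:
            raise ValueError(f"line {lineno}: expected 5 columns, got {len(fields)}")
        start, end, cnv_type = int(fields[0]), int(fields[1]), int(fields[2])
        frequency, effect_size = float(fields[3]), float(fields[4])
        if start > end:
            raise ValueError(f"line {lineno}: start {start} is after end {end}")
        if cnv_type not in COPIES_PER_TYPE:
            raise ValueError(f"line {lineno}: unknown CNV type {cnv_type}")
        if not 0.0 <= frequency <= 1.0:
            raise ValueError(f"line {lineno}: frequency {frequency} is outside [0, 1]")
        records.append(CnvRecord(start, end, cnv_type, frequency, effect_size))
    return records


if __name__ == "__main__":
    example = "1 10 -1 0.2 0.667\n1 10 1 0.1 1.2\n1 10 2 0.1 1.2\n20 35 1 0.3 2\n"
    for rec in read_cnv_file(example.splitlines()):
        print(rec.start, rec.end, rec.copies, rec.frequency, rec.effect_size)
```

Passing a file object works the same way, e.g. read_cnv_file(open("cnv.txt")), since iterating a file yields its lines.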

An example of a CNV information file:

1 10 -1 0.2 0.667
1 10 1 0.1 1.2
1 10 2 0.1 1.2
20 35 1 0.3 2

The file specifies two CNV segments. The first segment spans positions 1 to 10, and the second spans positions 20 to 35. The first segment has a deletion with a frequency of 20% and an odds ratio of 0.667, a two-copy duplication with a frequency of 10% and an odds ratio of 1.2, and a three-copy duplication with a frequency of 10% and an odds ratio of 1.2. The normal copy (i.e., one copy on a chromosome) of this segment therefore has a frequency of 60% (1 - 0.2 - 0.1 - 0.1) and is the reference for the odds ratios. The second segment has a two-copy duplication with a frequency of 30% and an odds ratio of 2, and a normal copy with a frequency of 70%.

We have compiled CNV profiles of tumor tissues for 33 cancer types. They are located in the cnv folder within the profile folder downloaded from the OmicsSIMLA profile file. Each file name starts with the abbreviated name of a cancer type; for example, OV_cnv_profile.txt is the CNV profile for ovarian cancer. The cancer type corresponding to each abbreviated name can be found here.
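The normal-copy frequencies in the worked example follow from simple arithmetic: within a segment, the normal copy takes whatever frequency the listed CNVs leave over. The sketch below reproduces that calculation. The function name normal_copy_frequencies is hypothetical, and it assumes that rows with identical Start and End values describe the same segment (as in the example); it does not handle partially overlapping segments, and it rejects segments whose CNV frequencies sum to more than 1.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (start, end, type, frequency, effect_size), copied from the example file
EXAMPLE_ROWS = [
    (1, 10, -1, 0.2, 0.667),
    (1, 10, 1, 0.1, 1.2),
    (1, 10, 2, 0.1, 1.2),
    (20, 35, 1, 0.3, 2.0),
]


def normal_copy_frequencies(
    rows: Iterable[Tuple[int, int, int, float, float]]
) -> Dict[Tuple[int, int], float]:
    """Return 1 minus the summed CNV frequencies for each (start, end) segment."""
    totals: Dict[Tuple[int, int], float] = defaultdict(float)
    for start, end, _cnv_type, frequency, _effect_size in rows:
        totals[(start, end)] += frequency
    result = {}
    for segment, total in totals.items():
        if total > 1.0 + 1e-9:
            raise ValueError(f"segment {segment}: CNV frequencies sum to {total} > 1")
        # Round away floating-point noise such as 0.6000000000000001.
        result[segment] = round(1.0 - total, 6)
    return result


if __name__ == "__main__":
    print(normal_copy_frequencies(EXAMPLE_ROWS))
    # → {(1, 10): 0.6, (20, 35): 0.7}
```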

Related options: -cnvfile