Input File Formats

  • Files are tab-delimited and no header is needed unless otherwise specified
  • The alleles of one genotype are separated by one space
  • Markers / Marker genotypes are listed in ascending order of their physical positions
- PED: not applicable to HapMap data
- MAP: not applicable to HapMap data
- Forced selection
- Forced non-selection
- SNP-specific threshold

Parameters

- Default threshold for SNPs: threshold applied to SNPs that are not listed in the SNP-specific threshold file (see below)
- Max inter-SNP distance: marker pairs that lie farther apart than this distance are excluded from the R2 calculation
- MAF cut-off: only markers with a minor allele frequency (MAF) higher than this value are considered, except those that are listed in the forced selection file or that have a SNP-specific threshold greater than the Default threshold for SNPs parameter
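
As a rough illustration of how these parameters interact, the following Python sketch applies the MAF cut-off with its two exemptions and lists the marker pairs that fall within the maximum inter-SNP distance. The function names, argument names, and data layout are hypothetical and chosen only for illustration; this is not the program's actual implementation, and whether a pair exactly at the maximum distance is kept is an assumption.

```python
from itertools import combinations


def is_considered(marker, maf, maf_cutoff, forced_selection,
                  snp_thresholds, default_threshold):
    """Return True if a marker passes the MAF cut-off or is exempt from it."""
    if maf > maf_cutoff:
        return True
    # Exemption 1: the marker is listed in the forced selection file.
    if marker in forced_selection:
        return True
    # Exemption 2: the marker has a SNP-specific threshold above the default.
    return snp_thresholds.get(marker, default_threshold) > default_threshold


def pairs_within_distance(positions, max_distance):
    """Yield marker pairs no farther apart than max_distance.

    positions maps marker name -> physical position.
    Assumption: a pair exactly at max_distance is still included.
    """
    ordered = sorted(positions.items(), key=lambda item: item[1])
    for (m1, p1), (m2, p2) in combinations(ordered, 2):
        if abs(p2 - p1) <= max_distance:
            yield (m1, m2)


example_positions = {
    "rs100001": 10010000,
    "rs100002": 10020000,
    "rs100003": 10030000,
}
example_pairs = list(pairs_within_distance(example_positions, 10000))
```

Only pairs returned by `pairs_within_distance` would then enter the R2 calculation, and only markers for which `is_considered` is true would be evaluated at all.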

Sample Datasets

  • Click here to download the sample input files
  • Click here to download the sample output files

File Descriptions

  PED (in .ped extension)

     Describes the familial relationships among individuals and their genotypes at each marker. Each line represents one individual, and the genotypes are listed in the same order as the corresponding markers in the MAP file (see below).

Family ID          Unique family ID
Individual ID      Unique ID within family
Father ID          Individual ID of father, 0 = founder
Mother ID          Individual ID of mother, 0 = founder
Gender ID          1 = male and 2 = female
Affection          Reserved for association studies, 0 = unknown
Marker genotypes   Genotypes in columns; 1 = A, 2 = C, 3 = G, 4 = T, 0 = missing data

     A typical PED file looks like this:

family ID  individual ID  father ID  mother ID  sex  affection  genotype 1  genotype 2  genotype 3  . . .
4567       1              0          0          1    1          1 1         1 2         2 2         . . .
4567       2              0          0          2    1          1 1         1 1         1 2         . . .
4567       3              1          2          1    2          1 1         1 2         1 2         . . .
4567       4              1          2          2    2          1 1         1 2         2 2         . . .
4567       5              1          2          2    1          1 1         1 1         2 2         . . .
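
To make the layout concrete, here is a minimal Python sketch that parses one PED line, assuming the fields are tab-delimited and the two alleles of each genotype are separated by a single space, as described under Input File Formats. The function name `parse_ped_line`, the dictionary keys, and the use of `None` for missing alleles are illustrative choices, not part of the program.

```python
# Allele coding from the PED description; None stands for missing data
# (the use of None is an illustrative choice).
ALLELE_CODES = {"1": "A", "2": "C", "3": "G", "4": "T", "0": None}


def parse_ped_line(line):
    """Split one tab-delimited PED line into pedigree fields and genotypes."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 6:
        raise ValueError("expected at least six tab-delimited fields")
    family, individual, father, mother, sex, affection = fields[:6]
    genotypes = []
    for field in fields[6:]:
        # The two alleles of one genotype are separated by one space.
        allele1, allele2 = field.split(" ")
        genotypes.append((ALLELE_CODES[allele1], ALLELE_CODES[allele2]))
    return {
        "family": family,
        "individual": individual,
        "father": father,
        "mother": mother,
        "sex": sex,
        "affection": affection,
        "genotypes": genotypes,
    }


record = parse_ped_line("4567\t3\t1\t2\t1\t2\t1 1\t1 2\t1 2")
print(record["genotypes"])  # [('A', 'A'), ('A', 'C'), ('A', 'C')]
```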

  MAP (in .map extension)

     Describes the chromosomal position of each marker. A header line is required.

     A typical MAP file looks like this:

chromosome marker position
3 rs100001 10010000
3 rs100002 10020000
3 rs100003 10030000
3 rs100004 10040000
3 rs100005 10050000
3 rs100006 10060000
3 rs100007 10070000
3 rs100008 10080000
3 rs100009 10090000
3 rs100010 10100000
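
The following Python sketch reads a MAP file of this shape and checks that the markers appear in ascending order of physical position, as the input format requires. It assumes tab-delimited columns and a single header line; the function names `read_map` and `positions_ascending` are hypothetical helpers, not part of the program.

```python
import csv
import io


def read_map(handle):
    """Read a tab-delimited MAP file with one header line."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the required header line
    markers = []
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        chromosome, marker, position = row
        markers.append((chromosome, marker, int(position)))
    return markers


def positions_ascending(markers):
    """Check that consecutive markers on the same chromosome are in order."""
    return all(
        first[2] <= second[2]
        for first, second in zip(markers, markers[1:])
        if first[0] == second[0]
    )


sample = io.StringIO(
    "chromosome\tmarker\tposition\n"
    "3\trs100001\t10010000\n"
    "3\trs100002\t10020000\n"
)
markers = read_map(sample)
```

Running such a check before submitting the files can catch an ordering mistake early, since the genotype columns in the PED file must follow the same marker order.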

  Forced selection

     List of markers that are forced to be selected as tags, e.g. those whose genotype information is already known

     A typical forced selection file looks like this:

marker
rs100003
rs100005
rs100009

  Forced non-selection

     List of markers that must not be selected as tags and can therefore only be tagged by other markers, e.g. those with assay design problems

     A typical forced non-selection file looks like this:

marker
rs100002
rs100006
rs100007
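
A small Python sketch for loading the two marker lists and spotting markers that appear in both is given below. It assumes that the first line is a header, as in the samples shown here (pass `has_header=False` otherwise); how the program itself handles a marker listed in both files is not stated, so the check is only a sanity test. The function names are hypothetical.

```python
def read_marker_list(lines, has_header=True):
    """Return the set of marker names from a one-column list file."""
    names = [line.strip() for line in lines if line.strip()]
    if has_header and names:
        names = names[1:]  # drop the header line (assumed present)
    return set(names)


def conflicting_markers(forced_selection, forced_non_selection):
    """Return markers listed as both forced tags and forced non-tags."""
    return sorted(forced_selection & forced_non_selection)


selected = read_marker_list(["marker\n", "rs100003\n", "rs100005\n", "rs100009\n"])
excluded = read_marker_list(["marker\n", "rs100002\n", "rs100006\n", "rs100007\n"])
conflicts = conflicting_markers(selected, excluded)
```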

  SNP-specific threshold

     Each threshold (between 0 and 1 inclusive) is the minimum R2 value required for the marker to be considered tagged by another marker. Markers of particular interest, e.g. functional SNPs, should be given a higher threshold (e.g. 0.8).

     A typical SNP-specific threshold file looks like this:

marker threshold
rs100001 0.8
rs100004 0.8
rs100008 0.4
rs100010 0.3
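
The relationship between this file and the Default threshold for SNPs parameter can be sketched in Python as follows: a marker listed in the file uses its own threshold, and every other marker falls back to the default. The sketch assumes tab-delimited columns and a header line as in the sample above; `read_thresholds` and `threshold_for` are hypothetical names, and the default value 0.5 is an arbitrary example.

```python
def read_thresholds(lines, has_header=True):
    """Parse marker/threshold rows into a dict, validating the 0..1 range."""
    rows = [line.strip().split("\t") for line in lines if line.strip()]
    if has_header and rows:
        rows = rows[1:]  # drop the header line (assumed present)
    thresholds = {}
    for marker, value in rows:
        threshold = float(value)
        if not 0.0 <= threshold <= 1.0:
            raise ValueError(f"threshold for {marker} is outside [0, 1]")
        thresholds[marker] = threshold
    return thresholds


def threshold_for(marker, thresholds, default_threshold):
    """Use the SNP-specific threshold if present, else the default."""
    return thresholds.get(marker, default_threshold)


snp_thresholds = read_thresholds([
    "marker\tthreshold\n",
    "rs100001\t0.8\n",
    "rs100004\t0.8\n",
    "rs100008\t0.4\n",
    "rs100010\t0.3\n",
])
print(threshold_for("rs100001", snp_thresholds, 0.5))  # 0.8
print(threshold_for("rs100002", snp_thresholds, 0.5))  # 0.5
```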

Should you have any comments, please contact us at pcsham@hku.hk
Copyright © 2006-2010 Pak Sham. All rights reserved