|
If some SNPs have already been genotyped in the sample,
then information on these SNPs, and on those that
are in tight linkage disequilibrium with them (as
measured by R²), is already available
and does not require further tagging. Users
are therefore advised to provide a list of already-genotyped
SNPs through the forced selection file. This
forces the program to select these SNPs as tag SNPs
before it selects further tag SNPs for the remaining
untagged SNPs.
Algorithmic details:
Central to the algorithm is an asymmetric matrix of
"similarities" in which element R[ij]
determines whether SNP i is able to serve as a tag
SNP for SNP j. If SNP d has already been
genotyped, then we set R[id] = 0 for all i
≠ d, and R[dd] = 1. This guarantees
that SNP d can be tagged only by itself and not by any of
the other SNPs, thereby ensuring its inclusion as a tag SNP.
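The matrix update described above can be sketched as follows. This is only an illustration under stated assumptions: R is held as a list of lists of 0/1 values with R[i][j] = 1 meaning that SNP i can tag SNP j, and the function name force_genotyped is hypothetical, not part of the program.

```python
def force_genotyped(R, genotyped):
    """Return a copy of R in which every already-genotyped SNP d can be
    tagged only by itself: R[i][d] = 0 for all i != d, and R[d][d] = 1.

    Hypothetical helper for illustration; R[i][j] == 1 is assumed to mean
    "SNP i can serve as a tag SNP for SNP j".
    """
    forced = [row[:] for row in R]  # copy so the caller's matrix is untouched
    for d in genotyped:
        for i in range(len(forced)):
            forced[i][d] = 1 if i == d else 0  # only column d is changed
    return forced


# Toy 3-SNP matrix (made-up values).
R = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
R_forced = force_genotyped(R, genotyped=[1])
```

Only column d is modified, so in this sketch the genotyped SNP keeps its ability to tag other SNPs (row d is unchanged); it merely can no longer be tagged by anything but itself.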
Some SNPs cannot be assayed by the genotyping platform
("non-assayable" SNPs). These should not be selected as tag SNPs,
but rather should be tagged by other SNPs if possible. However,
some of them may not be taggable by any "assayable" SNP; these
are left untagged at the end of the procedure and are indicated as
such in the final output.
Algorithmic details:
In the asymmetric matrix of "similarities" R[ij],
we set R[ni] = 0 for each "non-assayable"
SNP n and all i, including i = n.
This means that SNP n is unable to tag any SNP, including
itself, thereby ensuring that none of the "non-assayable"
SNPs is selected as a tag SNP.
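A minimal sketch of this step, using the same assumed list-of-lists representation (R[i][j] = 1 meaning that SNP i can tag SNP j). The function name block_non_assayable is hypothetical.

```python
def block_non_assayable(R, non_assayable):
    """Return a copy of R in which each non-assayable SNP n tags nothing:
    R[n][i] = 0 for all i, including i == n.

    Hypothetical helper for illustration only.
    """
    blocked = [row[:] for row in R]  # copy so the caller's matrix is untouched
    for n in non_assayable:
        blocked[n] = [0] * len(blocked[n])  # zero the whole row n
    return blocked


# Toy 3-SNP matrix (made-up values); SNP 2 is treated as non-assayable.
R = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
R_blocked = block_non_assayable(R, non_assayable=[2])
```

Only row n is zeroed. Column n is left alone, so an "assayable" SNP can still tag the "non-assayable" one (here R_blocked[1][2] stays 1).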
Some "non-assayable" SNPs could be left untagged simply because
none of the "assayable" SNPs capable of tagging them
happens to be included as a tag SNP in the selection process.
To avoid this, we first check whether any of the already-genotyped SNPs can tag the
"non-assayable" SNPs. The "non-assayable"
SNPs that cannot be tagged in this way constitute the set of untagged "non-assayable"
SNPs. The following process is then repeated until no "assayable"
SNP can tag any of the untagged "non-assayable" SNPs:
Each "assayable" (but not already-genotyped) SNP is
checked for the number of untagged "non-assayable" SNPs
that it is able to tag; the one with the largest number is marked as
a SNP for forced selection (in the same way that already-genotyped
SNPs are treated). The "non-assayable" SNPs that it tags
are then removed from the set of untagged "non-assayable" SNPs.
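The repeated selection described above is a greedy covering loop, and it can be sketched as follows. This is an illustration, not the program's code: the function name and the list-of-lists matrix are assumptions, and ties are broken by lowest SNP index because the text does not say how ties are handled.

```python
def force_select_for_non_assayable(R, non_assayable, genotyped):
    """Greedily pick assayable SNPs to force-select so that as many
    non-assayable SNPs as possible end up tagged.

    R[i][j] == 1 is assumed to mean "SNP i can tag SNP j".
    Returns (forced, untagged): the SNPs marked for forced selection and
    the non-assayable SNPs that no assayable SNP can tag.
    """
    non_assayable = set(non_assayable)
    genotyped = set(genotyped)

    # Step 1: non-assayable SNPs already tagged by a genotyped SNP are covered.
    untagged = {n for n in non_assayable
                if not any(R[g][n] for g in genotyped)}

    # Candidates are assayable SNPs that are not already genotyped.
    candidates = [s for s in range(len(R))
                  if s not in non_assayable and s not in genotyped]

    forced = []
    while untagged:
        best, best_cover = None, set()
        for s in candidates:
            if s in forced:
                continue
            cover = {n for n in untagged if R[s][n]}
            if len(cover) > len(best_cover):  # ties: lowest index (assumption)
                best, best_cover = s, cover
        if best is None:  # no candidate tags any remaining SNP: stop
            break
        forced.append(best)
        untagged -= best_cover
    return forced, untagged


# Toy example with 7 SNPs (made-up values): SNP 0 is already genotyped,
# SNPs 3-6 are non-assayable, and nothing can tag SNP 6.
n_snps = 7
R = [[0] * n_snps for _ in range(n_snps)]
for i, j in [(0, 0), (1, 1), (2, 2), (0, 3), (1, 4), (2, 4), (2, 5)]:
    R[i][j] = 1

forced, untagged = force_select_for_non_assayable(
    R, non_assayable=[3, 4, 5, 6], genotyped=[0])
```

In this toy run SNP 3 is covered by the genotyped SNP 0, SNP 2 is force-selected because it tags both SNP 4 and SNP 5, and SNP 6 remains in the untagged set.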
|