WCLUSTAG is an advanced tag SNP selection program developed from Ao's algorithms (2005). It allows users to specify different R2 thresholds for SNPs of varying degrees of interest. A downloadable version is available here. To cite the program, please refer to P. C. Sham et al., Bioinformatics 23, 129 (Jan 1, 2007).

 

If some SNPs have already been genotyped in the sample, then information on these SNPs, and on those in tight linkage disequilibrium with them (as measured by R2), is already available, so they do not require further tagging. Users are therefore advised to provide a list of already-genotyped SNPs through the forced selection file. This forces the program to select these SNPs as tag SNPs before it selects further tag SNPs for the remaining untagged SNPs.

Algorithmic details:
Central to the algorithm is an asymmetric matrix of "similarities" in which element R[ij] determines whether SNP i is able to serve as a tag SNP for SNP j. If SNP d has already been genotyped, then we set R[id] = 0 for all i ≠ d, and R[dd] = 1. This guarantees that SNP d can be tagged by itself but not by any of the other SNPs, thereby ensuring its inclusion as a tag SNP.
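The forced-selection rule above can be sketched in a few lines of Python. This is only an illustration, not the WCLUSTAG source: the function name force_genotyped, the 0/1 list-of-lists representation of R, and the three-SNP example matrix are all assumptions made for the example.

```python
def force_genotyped(R, genotyped):
    """Return a copy of R in which each already-genotyped SNP d
    can be tagged only by itself (hypothetical helper).

    R[i][j] is 1 if SNP i can tag SNP j, else 0.
    """
    out = [row[:] for row in R]  # copy so the caller's matrix is untouched
    for d in genotyped:
        for i in range(len(out)):
            # Column d: only the diagonal entry R[d][d] stays at 1.
            out[i][d] = 1 if i == d else 0
    return out


# Made-up 3-SNP similarity matrix; SNP 1 is already genotyped.
R = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
forced = force_genotyped(R, genotyped=[1])
```

Only column d changes. Row d is left alone, so the genotyped SNP can still tag its neighbours, which is why those neighbours need no further tagging.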

Some SNPs may be "non-assayable", that is, they cannot be assayed by the genotyping platform. Such SNPs should not be selected as tag SNPs, but rather should be tagged by other SNPs if possible. However, some of them may not be taggable by any "assayable" SNP; these will be left untagged at the end of the procedure and indicated as such in the final output.

Algorithmic details:
In the asymmetric matrix of "similarities" R[ij], we set R[ni] = 0 for each "non-assayable" SNP n and all i, including i = n. This means that SNP n is unable to tag any SNP, including itself, thereby ensuring that none of the "non-assayable" SNPs is selected as a tag SNP.
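As a rough sketch of this rule (again an illustration under assumed names and data, not the program's actual code), zeroing row n of a 0/1 matrix removes every ability of SNP n to act as a tag while leaving column n intact, so other SNPs can still tag it:

```python
def block_non_assayable(R, non_assayable):
    """Return a copy of R in which each non-assayable SNP n
    tags nothing, not even itself (hypothetical helper).

    R[i][j] is 1 if SNP i can tag SNP j, else 0.
    """
    out = [row[:] for row in R]  # copy so the caller's matrix is untouched
    for n in non_assayable:
        out[n] = [0] * len(out[n])  # row n: SNP n can tag no SNP
    return out


# Made-up 3-SNP similarity matrix; SNP 2 is non-assayable.
R = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
blocked = block_non_assayable(R, non_assayable=[2])
```

In this example SNP 2 can no longer tag anything, but the entry for SNP 1 tagging SNP 2 is still 1, so SNP 2 can be covered if SNP 1 is selected.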

Some "non-assayable" SNPs could be left untagged simply because none of the "assayable" SNPs capable of tagging them happens to be chosen as a tag SNP during selection. To avoid this, we first check whether any of the already-genotyped SNPs can tag the "non-assayable" SNPs. The "non-assayable" SNPs that cannot be tagged in this way constitute the set of untagged "non-assayable" SNPs. The following process is then repeated until no "assayable" SNP can tag any of the untagged "non-assayable" SNPs: each "assayable" (but not already-genotyped) SNP is checked for the number of untagged "non-assayable" SNPs that it is able to tag, and the one with the largest number is marked for forced selection (in the same way that already-genotyped SNPs are treated). The "non-assayable" SNPs that it tags are then removed from the set of untagged "non-assayable" SNPs.
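The iterative procedure described above amounts to a greedy covering loop, which might be sketched as follows. The function name, the 0/1 matrix layout, the tie-breaking rule (lowest index wins) and the six-SNP example are assumptions for illustration; the text does not say how WCLUSTAG resolves ties.

```python
def pick_forced_for_non_assayable(R, non_assayable, genotyped):
    """Greedy sketch of the procedure (hypothetical helper).

    R[i][j] is 1 if SNP i can tag SNP j, else 0.
    Returns (forced, untagged): the SNPs marked for forced selection,
    starting with the already-genotyped ones, and the non-assayable
    SNPs that no assayable SNP can tag.
    """
    non_assayable = set(non_assayable)
    forced = list(genotyped)

    # Non-assayable SNPs not tagged by any already-genotyped SNP.
    untagged = {n for n in non_assayable
                if not any(R[g][n] for g in genotyped)}

    # Assayable SNPs that have not been genotyped yet.
    candidates = [i for i in range(len(R))
                  if i not in non_assayable and i not in genotyped]

    while untagged:
        best, best_cover = None, set()
        for i in candidates:
            if i in forced:
                continue
            cover = {n for n in untagged if R[i][n]}
            # Strict '>' keeps the lowest-index SNP on ties (assumption).
            if len(cover) > len(best_cover):
                best, best_cover = i, cover
        if best is None:  # no assayable SNP tags anything that remains
            break
        forced.append(best)
        untagged -= best_cover

    return forced, untagged


# Made-up example: SNPs 3, 4, 5 are non-assayable; SNP 0 is genotyped.
R = [
    [1, 0, 0, 1, 0, 0],  # SNP 0 tags SNP 3
    [0, 1, 0, 0, 1, 0],  # SNP 1 tags SNP 4
    [0, 0, 1, 1, 0, 0],  # SNP 2 tags SNP 3
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
forced, untagged = pick_forced_for_non_assayable(
    R, non_assayable=[3, 4, 5], genotyped=[0])
```

Here SNP 3 is already covered by the genotyped SNP 0, SNP 1 is added to the forced list because it covers SNP 4, and SNP 5 remains untagged because no assayable SNP can tag it.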

Should you have any comments, please contact us at pcsham@hku.hk
Copyright © 2006-2010 Pak Sham. All rights reserved