Using CpG Island Explorer to Find CpG Islands

CpG Island Explorer (CpGIE) was developed by Wang Yong, Patrick, of the Department of Zoology, University of Hong Kong. The program is written in Java and is designed for CpG island searching. A distinctive feature is that a file containing multiple DNA sequences in FASTA or GB format can be processed in one operation.

The program was introduced in the following paper:

An evaluation of new criteria for CpG islands in the human genome as gene markers

by Yong Wang and Frederick C.C. Leung, Bioinformatics. 2004 May 1;20(7): 1170-7.

The current version is V2.0. If you want to run the program on your own machine, you can download it here. Users who have not yet installed the J2SE SDK need to install it first. The J2SE is available at http://java.sun.com/j2se/1.5.0/download.jsp.

For Windows or Mac users, double-clicking the CpGIE.jar icon launches the user interface of the program. Please note that the file is an executable JAR file. If double-clicking unzips the file instead of running it, the file has been opened with the wrong program. You can solve the problem by selecting Java as the program to open the file from the right-click menu.

To launch it on Linux or Unix, first go to the directory that contains the file CpGIE.jar, then type the following command:

java -jar CpGIE.jar

If the input file is too large (in general, >5Mb), Java will report an OutOfMemoryError. You can enlarge the Java heap size with the following command:

java -Xmx***m -jar CpGIE.jar

The proper heap size (***) in this command can be estimated by multiplying the input file size by 10. For example, a heap size of 256m is sufficient for processing a 20Mb sequence.
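The rule of thumb above can be sketched as a small shell helper for Linux or Unix. This is only an illustration, not part of CpGIE: the function name estimate_heap_mb is hypothetical, it assumes the file size is given in bytes, and it treats 1 Mb as 1048576 bytes.

```shell
#!/bin/sh
# Sketch: estimate a Java heap size in megabytes as roughly 10 times the
# input file size. The function name is hypothetical, not part of CpGIE.
estimate_heap_mb() {
    bytes=$1
    # Round the file size up to whole megabytes (1 Mb taken as 1048576 bytes).
    size_mb=$(( (bytes + 1048575) / 1048576 ))
    # Apply the "file size times 10" rule of thumb.
    echo $(( size_mb * 10 ))
}

# Example with a 20 Mb input (20 * 1048576 bytes): print the launch command.
echo "java -Xmx$(estimate_heap_mb 20971520)m -jar CpGIE.jar"
# prints: java -Xmx200m -jar CpGIE.jar

# With a real file (sequences.fasta is a placeholder name) one might write:
#   heap=$(estimate_heap_mb $(wc -c < sequences.fasta))
#   java -Xmx"${heap}"m -jar CpGIE.jar
```

For the 20Mb example the estimate is 200, which agrees with the statement that a heap size of 256m is sufficient.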

If the java command does not work, try using the full path to java. For example, if the J2SE SDK is installed in C:\j2sdk1.4.1_01\bin\, type the following command:

C:\j2sdk1.4.1_01\bin\java -jar CpGIE.jar

To avoid typing the full path every time, add the directory that contains java (for example, C:\j2sdk1.4.1_01\bin) to the Path variable. In Windows XP, this variable can be edited at Start → Settings → Control Panel → System → Advanced → Environment Variables → System Variables → Path.
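On Linux or Unix, the same fallback can be sketched in a shell script: use java if it is already reachable through the PATH variable, otherwise fall back to the full path of the executable. The function name resolve_java and the fallback directory are hypothetical placeholders, not part of CpGIE.

```shell
#!/bin/sh
# Sketch: choose which java executable to use. The function name and the
# fallback directory argument are hypothetical, not part of CpGIE.
resolve_java() {
    fallback_dir=$1
    if command -v java >/dev/null 2>&1; then
        # java is already reachable through PATH.
        command -v java
    elif [ -x "$fallback_dir/java" ]; then
        # Fall back to the full path of the installed executable.
        printf '%s\n' "$fallback_dir/java"
    else
        echo "java not found; install it or adjust the fallback directory" >&2
        return 1
    fi
}

# Usage (the directory below is a placeholder for your own installation):
#   JAVA=$(resolve_java /usr/local/j2sdk/bin) && "$JAVA" -jar CpGIE.jar
```

To make the setting permanent instead, the directory can be appended to PATH in the shell start-up file, for example with a line such as export PATH="$PATH:/usr/local/j2sdk/bin" (again with a placeholder directory).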

If you do not want to run the program on your own machine, you can launch it remotely on the BIOINFO server as follows.

To launch the program on Windows, do the following:

To launch the program on a UNIX platform such as Linux, Solaris, or IRIX, do the following:

How to run the program

Upgrade history:

V1.5-2003.5.23 First launch.

V1.6-2003.6.15 Files containing multiple sequences could be processed.

V1.7-2004.9.24 A bug that prevented the program from searching for CpG islands across whole Drosophila chromosomes was fixed.

V1.8-2004.11.9 The algorithm for calculating mono- and di-nucleotide frequencies was rewritten. Ns (unknown nucleotides), if present in a DNA sequence, are ignored in the calculation.

V1.9-2005.1.24 A new function that summarizes the CpG island output was built in. The program can process DNA sequences in GB format.

V2.0-2005.6.6 A new "Open Internet Sequence" option enables users to download sequence(s) from a public database simply by using accession number(s). An Edit menu provides Copy and Paste functions. Before processing pasted sequences, please make sure that no previously read-in sequences remain loaded. You may re-launch the program to avoid the problem.

Bug reports and comments are welcome! The author’s email address is: wangyong@hkucc.hku.hk

Important Note: CpGIE is free software for non-commercial use only. It should not be redistributed or used for any commercial purpose without written permission from the author and the University of Hong Kong.