Using CpG Island Explorer to Find CpG Islands
CpG Island Explorer (CpGIE) was developed by Wang Yong, Patrick, of the
Department of Zoology, University of Hong Kong. The program is written in Java
and searches DNA sequences for CpG islands. A distinctive feature is that a
file containing multiple DNA sequences in FASTA or GenBank (GB) format can be
processed in one operation.
The program is described in the paper "An evaluation of new criteria for CpG
islands in the human genome as gene markers" by Yong Wang and Frederick C.C.
Leung, Bioinformatics, 2004 May 1; 20(7): 1170-7.
The current version is V2.0. To run the program on your own machine, you can
download it here. Users who have not yet installed the J2SE SDK must do so
first; it is available at http://java.sun.com/j2se/1.5.0/download.jsp.
For Windows or Mac users, double-clicking the CpGIE.jar icon launches the
user interface of the program. Please note that the downloaded file is an
executable JAR file. If double-clicking unzips the file instead of launching
it, the file has been opened with the wrong program. You can solve the problem
by selecting Java as the program to open the file in the right-click menu.
To launch it on Linux or Unix, first go to the directory that contains the
file CpGIE.jar, then type the following command:
java -jar CpGIE.jar
If the input file is too large (in general, >5 Mb), Java will fail with an
OutOfMemoryError. You can enlarge the Java heap size by using the following
command:
java -Xmx***m -jar CpGIE.jar
The proper heap size (***) in this command can be estimated by multiplying
the input file size by 10. For example, a heap size of 256m is sufficient for
processing a 20 Mb sequence.
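The rule of thumb above can be turned into a small shell helper. This is only a sketch and not part of CpGIE: the function name estimate_heap_mb and the file name sequences.fasta are hypothetical, and the script assumes a POSIX-compatible shell with wc available. It rounds the file size up to whole megabytes and multiplies by 10.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with CpGIE): estimate a Java heap size
# in megabytes as roughly 10 times the size of the input file.
estimate_heap_mb() {
    # $1 = path to the sequence file
    bytes=$(wc -c < "$1")
    # Round the file size up to whole megabytes.
    mb=$(( (bytes + 1048575) / 1048576 ))
    # Apply the "file size times 10" rule of thumb.
    echo $(( mb * 10 ))
}

# Example usage (sequences.fasta is a placeholder file name):
#   java -Xmx"$(estimate_heap_mb sequences.fasta)m" -jar CpGIE.jar
```

The estimate is a starting point; if OutOfMemoryError still appears, a larger value can be tried by hand.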
If the java command does not work, try using the full path to java. For
example, if the J2SE SDK is installed in C:\j2sdk1.4.1_01\bin\, type the
following command:
C:\j2sdk1.4.1_01\bin\java -jar CpGIE.jar
To avoid typing the full path every time, add the directory that contains
java (for example C:\j2sdk1.4.1_01\bin) to the Path variable. In Windows XP,
this variable can be edited at Start → Settings → Control Panel → System →
Advanced → Environment Variables → System Variables → Path.
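On Linux or Unix the same effect can be had from the shell. The following is a minimal sketch assuming a Bourne-compatible shell; the directory /usr/local/j2sdk1.4.1_01/bin is a hypothetical install location and should be replaced with wherever java actually lives on your system.

```shell
# Hypothetical install location; replace with the directory that holds java.
JAVA_BIN_DIR="/usr/local/j2sdk1.4.1_01/bin"

# Append the directory to PATH for the current shell session only.
PATH="$PATH:$JAVA_BIN_DIR"
export PATH

# Check whether the shell can now find java (prints its path or a notice).
command -v java || echo "java not found on PATH"
```

To make the change permanent, the same lines can be placed in your shell start-up file.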
If you don't want to run the program on your own machine, you can launch
it remotely on the BIOINFO server as follows.
To launch the program on the Windows platform, do the following:



To launch the program on a UNIX platform such as Linux, Solaris, or IRIX,
do the following:
How to run the program





Upgrade history:
V1.5-2003.5.23 First launch.
V1.6-2003.6.15 Files containing multiple sequences could be processed.
V1.7-2004.9.24 A bug that prevented the program from searching for CpG
islands across whole Drosophila chromosomes was fixed.
V1.8-2004.11.9 The algorithm for calculating mono- and di-nucleotide
frequencies was rewritten. Ns (unknown nucleotides), if present in a DNA
sequence, are ignored in the calculation.
V1.9-2005.1.24 A new function capable of summarizing the output of CpG
islands was built in. The program can process DNA sequences in GB format.
V2.0-2005.6.6 A new "Open Internet Sequence" option enables users to
download sequence(s) from public databases simply by entering accession
number(s). An Edit menu provides Copy and Paste functions. Please make sure
that no previously read-in sequences remain in memory before you start to
process your pasted sequences. You may re-launch the program to avoid the
problem.
Bug reports and comments are welcome! The author’s email contact address is: wangyong@hkucc.hku.hk
Important Note: CpGIE is free software for non-commercial use only. It should not be redistributed or used for any commercial purpose without written permission from the author and the University of Hong Kong.