ClustalW is a widely used bioinformatics program. Biologists and statisticians use it primarily for multiple sequence alignment. Several versions of ClustalW have been released over the course of the algorithm's development.
How to perform an alignment on ClustalW?
1. Go to the ClustalW sequence entry page. Please look over the page to become familiar with the information available and the data required to run an alignment. Each of the entry window or drop menu titles is a link to information about that part of the form. For the purposes of this game, there is no need to change any of the parameters for a ClustalW multiple alignment.
2. When performing ClustalW alignments outside of this game (http://www2.ebi.ac.uk/clustalW/), there is no need to enter an email address unless the results should be returned by email. This should only be necessary when aligning many sequences or very long sequences. The sequences provided in the game should align quickly (30 seconds to 1 minute), so no email address is needed.
3. If desired, a sequence title may be entered, but it is not required for the alignment to succeed.
4. If the alignment will only be viewed on a screen, a color alignment can be done by selecting yes in the “Color Alignment” drop menu. For the game, do not change this option. This tutorial will only explain the results obtained with a non-color alignment.
5. ClustalW can generate phylogenetic trees when a properly formatted alignment is input. The subject of phylogenetics is beyond the scope of this game, but further information can be found elsewhere on the Internet.
6. The remaining parameters influence the way the program scores and builds the alignments. They have been optimized for general alignments, so the defaults will be used.
7. Paste the sequences below into the sequence entry window. During the game, copy the preformatted sequences obtained from the game notebook “Reformated Sequences for ClustalW Input” page into the sequence entry window. Remember that these sequences should be in concatenated FASTA format.
Examples
>query
MKNTLLKLGVCVSLLGITPFVSTISSVQAERTVEH
KVIKNETGTISISQLNKNVHTELGYFSGEAVPS NGLVLNTSKGLVLVDSSWDDKLTKELIEMVEKKFKKRV TDVIITHAHADRIGGMKTLKERGIKAHSTALT AELAKKNGYEEPLGDLQSVTNLK FGNMKVETFYPGKGHTEDNIVVWLPQYQILAGGCLVKSASSKDLGNV ADAYV NEWSTSIENVLKRYGNINLVVPGHGEVGDRGLLLHTLDLLK
>gi|2984094
MGGFLFFFLLVLFSFSSEYPKHVKETLRKITDRIYGVFGVYEQVSYENRGFISNAY FYVADDGVLVVDALSTYKLGKELIESIRSVTNKPIRFLVVTHYHTDHFYGAKAFR EVGAEVIAHEWAFDYISQPSSYNFFLARKKILKEHLEGTELTPPTITLTKNLNVYLQ VGKEYKRFEVLHLCRAHTNGDIVVWIPDEKVLFSGDIVFDGRLPFLGSGNSRTWL VCLDEILKMKPRILLPGHGEALIGEKKIKEAVSWTRKYIKDLRETIRKLYEEGCDVE
CVRERINEELIKIDPSYAQVPVFFNVNPVNAYYVYFEIENEILMGE
>gi|115023|sp|P10425|
MKKNTLLKVGLCVSLLGTTQFVSTISSVQASQKVEQIVIKNETGTISISQLNKNVW VHTELGYFNGEAVPSNGLVLNTSKGLVLVDSSWDNKLTKELIEMVEKKFQKRVTD VIITHAHADRIGGITALKERGIKAHSTALTAELAKKSGYEEPLGDLQTVTNLKFGNTK VETFYPGKGHTEDNIVVWLPQYQILAGGCLVKSAEAKNLGNVADAYVNEWSTSIE
NMLKRYRNINLVVPGHGKVGDKGLLLHTLDLLK
>gi|115030|sp|P25910|
MKTVFILISMLFPVAVMAQKSVKISDDISITQLSDKVYTYVSLAEIEGWGMVPSNGM IVINNHQAALLDTPINDAQTEMLVNWVTDSLHAKVTTFIPNHWHGDCIGGLGYLQR KGVQSYANQMTIDLAKEKGLPVPEHGFTDSLTVSLDGMPLQCYYLGGGHATDNIV VWLPTENILFGGCMLKDNQATSIGNISDADVTAWPKTLDKVKAKFPSARYVVPGH GDYGGTELIEHTKQIVNQYIESTSKP
>gi|282554|pir||S25844
MTVEVREVAEGVYAYEQAPGGWCVSNAGIVVGGDGALVVDTLSTIPRARRLAEWV
DKLAAGPGRTVVNTH FHGDHAFGNQVFAPGTRIIAHEDMRSAMVTTGLALTGLWP RVDWGEIELRPPNVTFRDRLTLHVGERQVE LICVGPAHTDHDVVVWLPEERVLFAGD VVMSGVTPFALFGSVAGTLAALDRLAELEPEVVVGGHGPVAGP EVIDANRDYLRWV QRLAADAVDRRLTPLQAARRADLGAFAGLLDAERLVANLHRAHEELLGGHVRDAMEI FAELVAYNGGQLPTCLA
8. Click the “Run ClustalW” button and wait for the results.
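The example sequences in step 7 contain stray spaces and line breaks inside the residues, which is common after copying from a web page. The following is a minimal sketch of how concatenated FASTA text could be cleaned up before pasting it into the entry window. The function names `parse_fasta` and `format_fasta`, the 60-column line width, and the shortened example records are assumptions made for illustration; they are not part of ClustalW or of the game.

```python
def parse_fasta(text):
    """Parse concatenated FASTA text into a list of (header, sequence) pairs.

    Whitespace inside sequence lines is dropped, so residues that were
    split by stray spaces or line breaks are joined back together.
    """
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append("".join(line.split()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def format_fasta(records, width=60):
    """Write records back as FASTA text with fixed-width sequence lines."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


# Shortened fragments of the first two example records (illustration only).
example = "\n".join([
    ">query",
    "MKNTLLKLGVCVSLLGITPFVSTISSVQAERTVEH",
    "KVIKNETGTISISQLNKNVHTELGYFSGEAVPS NGLVLNTSKGLVLVDSSWDDKLTKELIEMVEKKFKKRV",
    ">gi|2984094",
    "MGGFLFFFLLVLFSFSSEYPKHVKETLRKITDRIYGVFGVYEQVSYENRGFISNAY FYVADDGVLVV",
])

records = parse_fasta(example)
cleaned = format_fasta(records)
print(cleaned)
```

Each record keeps its original header line, and only whitespace is removed from the residues, so the sequences themselves are not changed.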
Interpreting Results on ClustalW
1. Remember that, unlike the pairwise alignments in BLAST, which align only the best-matching segments of the hit and query sequences, ClustalW finds the best alignment over the full length of each sequence submitted. The sequences formatted by the game are the full-length sequences of the query and hits as they appear in GenBank or GenPept.
2. The first section of the ClustalW Results page contains two buttons. One returns the user to the sequence entry page to run another analysis. The other (labelled JalView) takes the user to a Java Applet alignment editor called JalView. Because not all game users have Java-ready browsers, the game does not support this option and concentrates on the alignment returned in the user's browser window. (JalView displays the ClustalW results as a color alignment with a histogram showing the degree of similarity between all sequences. Scientists can edit this alignment for publication.)
3. The next section of the result page (labelled “Pairwise Scores”) gives some information about how the program ran. It lists the names and lengths of the sequences read and the scores obtained when each sequence is aligned pairwise with every other sequence.
4. The section labelled “Your guide tree:” gives the guide tree that ClustalW uses to decide the order in which the sequences are aligned. The alignment is built by first aligning the two most similar sequences and then adding the other sequences to the alignment in descending order of similarity.
5. The next section shows the actual multiple alignment of the input sequences. Each group of lines shows the sequence names followed by 50 letters (or dashes for gaps) of each sequence. The next group shows the next 50 positions of the alignment, and so on.
6. Under the aligned protein sequences there is a line containing the characters ., :, and *.
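Because the alignment is printed in groups of 50 positions (step 5), it can help to stitch the groups back into one full-length row per sequence before reading the line of ., :, and * characters. The following is a minimal sketch, assuming the result text follows the usual Clustal layout: a line beginning with CLUSTAL, blank lines between groups, and a marker line that starts with spaces. The function name `parse_clustal` and the toy alignment (7 columns per group instead of 50) are made up for illustration.

```python
def parse_clustal(text):
    """Stitch Clustal alignment groups into full-length rows.

    Returns (rows, marks): rows maps each sequence name to its complete
    aligned string, and marks is the complete line of ., :, * characters.
    """
    rows = {}
    marks = []
    start = width = None      # column offset and width of the current group
    has_marks = False

    def close_group():
        # A group whose marker line is entirely blank still needs padding.
        if width is not None and not has_marks:
            marks.append(" " * width)

    for line in text.splitlines():
        if line.startswith("CLUSTAL"):
            continue
        if not line.strip():
            close_group()
            start = width = None
            has_marks = False
            continue
        if line[0].isspace():
            # Marker line: cut out the columns that sit under the residues.
            if start is not None:
                marks.append(line[start:start + width].ljust(width))
                has_marks = True
            continue
        parts = line.split()
        name, chunk = parts[0], parts[1]
        if start is None:
            start = line.index(chunk, len(name))
            width = len(chunk)
        rows[name] = rows.get(name, "") + chunk
    close_group()
    return rows, "".join(marks)


# Toy result text, not real ClustalW output.
toy = "\n".join([
    "CLUSTAL W (1.83) multiple sequence alignment",
    "",
    "seqA      MKT-LLK",
    "seqB      MKTALIR",
    "          *** *::",
    "",
    "seqA      GV",
    "seqB      GV",
    "          **",
])

rows, marks = parse_clustal(toy)
print(rows)
print(repr(marks))
```

The marker line is cut at the same column offset as the residues, so each character in `marks` stays lined up with its alignment column.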
Meaning of characters
These characters mean the following:
* = this column of the alignment contains identical amino acid residues in all sequences (or identical bases if DNA sequences are aligned)
: = this column of the alignment consists of different but highly conserved (very similar) amino acids
. = this column of the alignment consists of different amino acids that are somewhat similar
Blank = this column of the alignment consists of dissimilar amino acids or gaps (or different bases if DNA sequences are aligned)
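The rules above can be sketched in a few lines of code. This is an illustration under stated assumptions, not ClustalW's own implementation: the "strong" and "weak" residue groups below are the ones commonly quoted from the ClustalW documentation and should be checked against the version in use, and the names `column_symbol` and `conservation_line` are hypothetical.

```python
# Residue groups as commonly quoted from the ClustalW documentation
# (assumption: verify against the documentation of your ClustalW version).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK",
                 "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column, is_dna=False):
    """Return '*', ':', '.', or ' ' for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "                      # any gap leaves the column blank
    if len(residues) == 1:
        return "*"                      # identical in all sequences
    if is_dna:
        return " "                      # DNA columns only ever get '*'
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"                      # different but very similar
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."                      # different but somewhat similar
    return " "


def conservation_line(rows, is_dna=False):
    """Build the marker line for a list of equal-length aligned rows."""
    return "".join(column_symbol("".join(col), is_dna) for col in zip(*rows))


protein_marks = conservation_line(["LKSW", "IKGD"])
dna_marks = conservation_line(["ACGT", "ACCT"], is_dna=True)
print(repr(protein_marks))
print(repr(dna_marks))
```

In the toy protein rows, L/I falls in a strong group, K/K is identical, S/G falls only in a weak group, and W/D shares no group, so all four outcomes appear once.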
Outcomes of results
7. With aligned DNA sequences, only the character * will appear, and it means that the same base is present at that position in all sequences in the alignment.
8. There are many uses for ClustalW alignments, but for this game (review number 1 in this section) there are three main pieces of information that it will provide.
If a fair number (at least 30-40%) of similarities (., :, or *) are spread over the full lengths of the sequences, the sequences are likely to be related. If a larger number (at least 50-60%) of similarities are grouped in a segment or segments of the alignment, the sequences are likely to share one or more functional domains. If there are only a few similarities (less than 20-25%) in the alignment, the sequences are not likely to be functionally related.
In that last case, the hits seen in the BLAST output are most likely random similarities to different parts of the query sequence. This does not mean that the query sequence is not related to any of the hit sequences; it just means that it is not likely to be functionally related to any known terrestrial families of proteins.
9. For the DNA sequence multiple alignment performed for this tutorial, around 70 identical nucleotides are spread over the full lengths of the sequences. This indicates that all of the sequences are related and, if these sequences are high BLAST hits, that the sample is likely to be a terrestrial contaminant.
10. Go to this ClustalW result page to see an alignment that illustrates the last possibility in number 8 above.
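The three possibilities in number 8 can be turned into a rough check on the line of ., :, and * characters. The following is a minimal sketch. The exact cutoffs (30%, 50%, and 25%), the 50-column window used to look for a similar segment, the choice to count percentages over alignment columns, and the function names are all assumptions chosen for illustration; the text above gives only approximate ranges.

```python
def similarity_fraction(marks):
    """Fraction of alignment columns marked '.', ':' or '*'."""
    if not marks:
        return 0.0
    return sum(ch in ".:*" for ch in marks) / len(marks)


def best_window_fraction(marks, window=50):
    """Highest fraction of marked columns in any run of `window` columns."""
    hits = [ch in ".:*" for ch in marks]
    if len(hits) <= window:
        return similarity_fraction(marks)
    count = sum(hits[:window])
    best = count
    for i in range(window, len(hits)):
        # Slide the window one column to the right.
        count += hits[i] - hits[i - window]
        best = max(best, count)
    return best / window


def interpret(marks, window=50):
    """Map a marker line to one of the outcomes described in number 8."""
    overall = similarity_fraction(marks)
    local = best_window_fraction(marks, window)
    if overall >= 0.30:          # assumed cutoff for "at least 30-40%"
        return "likely related"
    if local >= 0.50:            # assumed cutoff for "at least 50-60%"
        return "likely share a functional domain"
    if overall < 0.25:           # assumed cutoff for "less than 20-25%"
        return "not likely functionally related"
    return "inconclusive"


spread_out = "*" * 40 + " " * 60
one_segment = " " * 80 + "*:" * 10
few_marks = "*" * 10 + " " * 90
print(interpret(spread_out))
print(interpret(one_segment, window=20))
print(interpret(few_marks))
```

A result in the gap between the ranges is reported as inconclusive rather than forced into one of the three outcomes.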