BioTech FYI Center - Resources

ClustalW FAQ (Frequently Asked Questions)



What type of sequences can ClustalW align?

ClustalW can align either nucleotide or protein sequences. Nucleotide sequences are aligned exactly as they are entered; the program has no option for specifying which DNA strand to use. If you need the opposite strand, the EMBOSS tool revseq can be used to reverse and/or complement nucleotide sequences before you submit them.
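As an illustration of what reversing and complementing a nucleotide sequence means, here is a minimal Python sketch. It is not revseq itself and is not guaranteed to match its behaviour in every case; the function name reverse_complement is made up for this example, and only the four bases A, C, G and T are handled (any other character, such as N, is passed through unchanged).

```python
# Complement table for the four DNA bases, in upper and lower case.
# Assumption: other characters (N, gaps, ambiguity codes) are left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse the whole string.
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGCCG"))  # prints CGGCAT
```

If some of your sequences are stored on the opposite strand, converting them this way before submission lets ClustalW align them in the same orientation as the others.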

What input formats does ClustalW accept?

The program accepts sequences in the following formats:

NBRF/PIR, EMBL/UniProt, Pearson (Fasta), GDE, ALN/ClustalW, GCG/MSF, RSF (see the Clustal help pages for details about formats).

The sequences can either be pasted into the web form or uploaded as a file. Each sequence must have a unique name; if any names are duplicated, the program will fail. The program will also fail if there are empty lines, white space or control characters between sequences or at the top of the file.
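The rules above can be checked before submission. The following is a rough Python sketch for Pearson (Fasta) input only; it is not part of ClustalW. The function name check_fasta is hypothetical, and it assumes that a sequence's name is the first word after the ">" on its header line.

```python
def check_fasta(text: str) -> list[str]:
    """Return a list of problems found in Fasta-formatted text (empty if none)."""
    problems = []
    seen_names = set()
    lines = text.split("\n")
    # A single newline at the very end of the file is normal; ignore it.
    if lines and lines[-1] == "":
        lines.pop()
    if not lines or not lines[0].startswith(">"):
        problems.append("line 1: input does not start with a '>' header")
    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            problems.append(f"line {number}: empty or whitespace-only line")
            continue
        # Control characters include tabs and stray carriage returns.
        if any(ord(ch) < 32 for ch in line):
            problems.append(f"line {number}: control character found")
        if line.startswith(">"):
            fields = line[1:].split()
            # Assumption: the name is the first word after '>'.
            name = fields[0] if fields else ""
            if not name:
                problems.append(f"line {number}: header has no sequence name")
            elif name in seen_names:
                problems.append(f"line {number}: duplicate sequence name '{name}'")
            seen_names.add(name)
    return problems


example = ">seq1\nACGT\n\n>seq1\nGGCC\n"
for problem in check_fasta(example):
    print(problem)
```

An empty list means none of these particular problems were found; it does not guarantee that the file is otherwise valid.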

What output formats does ClustalW produce?

The following output formats are available:

aln with numbers (default), aln without numbers, gcg MSF, phylip, pir and gde.

You can select the format you want in the OUTPUT section of the web form. There is also an option to specify the order in which the sequences appear in the alignment: aligned (default) or the order in which they were input.

How can I save my alignment to a file?

The alignment will appear on the results page along with details of scores and guide trees. The alignment can be obtained on its own by clicking on the alignment file option at the top (.aln). This file can be opened in a separate window or saved to a file.

Is there a limit on the number of sequences or the size of the file that I submit to ClustalW?


The input for ClustalW is limited to a maximum of 500 sequences or a 10MB file, whichever limit is reached first. When the input file or the number of sequences is large, ClustalW can run for days and in some cases may not finish at all. If you plan to submit a large amount of data, you should use the "RESULTS: email" option together with "CPU MODE: multiple".
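To check your input against these limits before submitting, you could use something like the Python sketch below. It assumes Fasta input (one ">" header per sequence) and reads "10MB" as 10 x 1024 x 1024 bytes; both are assumptions of this example, and the function name input_within_limits is hypothetical.

```python
MAX_SEQUENCES = 500
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" means 10 MiB


def input_within_limits(fasta_text: str) -> bool:
    """Return True if the text is within both the sequence and size limits."""
    # In Fasta format each sequence starts with one '>' header line.
    n_sequences = sum(
        1 for line in fasta_text.splitlines() if line.startswith(">")
    )
    # Measure the size in bytes, as the text would be stored in a file.
    n_bytes = len(fasta_text.encode("utf-8"))
    return n_sequences <= MAX_SEQUENCES and n_bytes <= MAX_BYTES


print(input_within_limits(">seq1\nACGT\n>seq2\nGGCC\n"))  # prints True
```

Input that passes this check can still take a long time to align, so the email option remains the safer choice for large jobs.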

Email jobs are allowed to run for more than 24 hours and the results are kept for a week.

What do the file extensions in my results mean?

When you submit sequences on our ClustalW submission page using the default parameters, you receive a .aln file and a .dnd file. The .aln file is the alignment. The .dnd file is a guide tree; it is not a phylogenetic tree.

To get an accurate phylogenetic tree, submit the .aln file back into the ClustalW form as the input. For this second run, choose one of the tree options: nj, phylip or dist (all methods for making phylogenetic trees). The run returns a .ph file (always) and a .dst and/or .nj file (depending on options), which are phylogenetic trees.

The .input file is the input you submitted, and the .output file contains the results produced by the run.
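For quick reference, the extensions described in this answer can be collected in a small lookup table. The Python sketch below only restates the descriptions given here; the names RESULT_FILES and describe_result_file, and the example file name, are made up for illustration.

```python
import os

# Descriptions taken from the answer above; nothing here queries ClustalW.
RESULT_FILES = {
    ".aln": "alignment",
    ".dnd": "guide tree (not a phylogenetic tree)",
    ".ph": "phylogenetic tree output (always produced by a tree run)",
    ".dst": "phylogenetic tree output (depending on options)",
    ".nj": "phylogenetic tree output (depending on options)",
    ".input": "the input you submitted",
    ".output": "results produced by the run",
}


def describe_result_file(filename: str) -> str:
    """Return a short description based on the file's extension."""
    extension = os.path.splitext(filename)[1].lower()
    return RESULT_FILES.get(extension, "unknown file type")


print(describe_result_file("myjob.dnd"))  # prints guide tree (not a phylogenetic tree)
```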

