The input FASTA file might contain characters other than "A", "G", "C" and "T", such as the symbols used to represent gaps and degenerate bases. Sequences like this cause problems in the translation steps during annotation, so PMGA does not accept them. The CleanSeq module converts a sequence with unacceptable characters into a sequence containing only "A", "G", "C" and "T".
The module counts the frequency of every character in the input sequence and replaces each degenerate base with the standard base that has the highest frequency. Please note that this is only a temporary measure. We strongly recommend using your sequencing data to determine the actual bases at these positions through read mapping.
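The frequency-based replacement can be illustrated with a minimal Python sketch. This is not the CleanSeq source code. The function name `clean_seq` is hypothetical, and the sketch makes several assumptions that the description above does not confirm: each degenerate base is resolved only among the standard bases its IUPAC code stands for, ties go to the first candidate listed, and gaps or any other unrecognized characters are dropped.

```python
from collections import Counter

# IUPAC degenerate codes and the standard bases each one can stand for.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def clean_seq(seq):
    """Return a copy of seq that contains only A, C, G and T (hypothetical helper)."""
    seq = seq.upper()
    counts = Counter(seq)  # frequency of every character in the input
    cleaned = []
    for ch in seq:
        if ch in "ACGT":
            cleaned.append(ch)
        elif ch in IUPAC:
            # Assumption: choose the most frequent base among those the code
            # allows; max() keeps the first candidate when counts are tied.
            cleaned.append(max(IUPAC[ch], key=lambda base: counts[base]))
        # Assumption: gaps ("-") and any other characters are dropped.
    return "".join(cleaned)


print(clean_seq("AAAGCTNR-"))
```

With the input `AAAGCTNR-`, "A" is the most frequent standard base, so both `N` and `R` become "A", the gap is dropped, and the sketch prints `AAAGCTAA`.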
The input mitogenome sequence should be in FASTA format. Here is a sample: FASTA File