Common Sequence Editing Issues
Under the right conditions, trace files should require very little manual editing. However, optimal conditions cannot always be met, and several issues can arise that need to be corrected.
1) Minimal Background Noise
Trace files with a small amount of background noise may need more manual editing than high-quality traces, but should still produce a reliable read.
2) Dye Blobs
If the dye blob occurs at the beginning of the trace, it can be corrected by deleting the nucleotide sequence before the blob. If it occurs in the middle of the trace, it is best to leave the nucleotide sequence ambiguous. If sequencing was performed bidirectionally, the opposite trace might be able to rescue the final sequence.
3) Low Quality Traces
Low-quality traces have poorly defined peaks, and runs of repeated bases may appear to merge into one another. These traces require significant manual editing. If possible, align them to related sequences to resolve the repeated bases.
4) Partial Co-amplification of Contaminants
Sometimes only part of a contaminating sequence is amplified. This can be corrected by deleting the sequence upstream of the "drop" in signal.
5) Double-peaks in Co-amplification of Similar Sequences
Co-amplification occurs when two or more sequences from related species are amplified simultaneously. If only a few double peaks are present in the barcode region, they can be left as ambiguous bases at the user's discretion.
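As an illustration of recording a double peak as an ambiguous base, the sketch below maps a pair of overlapping bases to its standard IUPAC ambiguity code and counts how many ambiguous positions a sequence contains. The function names are hypothetical, and how many ambiguous bases count as "a few" is left to the user.

```python
# IUPAC ambiguity codes for mixtures of two bases.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def ambiguity_code(base_a, base_b):
    """Return the IUPAC code for a double peak made of two bases."""
    a, b = base_a.upper(), base_b.upper()
    if a == b:
        return a  # not a real double peak
    return IUPAC_TWO_BASE[frozenset((a, b))]


def count_ambiguous(sequence):
    """Count positions that are neither A, C, G, T nor an alignment gap."""
    return sum(1 for base in sequence.upper() if base not in "ACGT-")
```

For example, a double peak of A and G would be entered as R, and `count_ambiguous` can be used to check how many such positions remain in the edited barcode region.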
6) Homopolymeric Tracts
Homopolymeric tracts, or repetitions of bases or sequences, are often natural occurrences and cannot be avoided. In certain situations, these tracts can result in out-of-sync sequence downstream of the tract. Traces with these tracts can often be rescued by bidirectional sequencing. To do this, delete the sequence downstream of the homopolymeric tract and align the forward and reverse traces with a reference sequence. Then manually overlap the forward and reverse traces at the homopolymeric tract.
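A minimal sketch of the first step, locating a tract and deleting the sequence downstream of it, is shown below. It only handles runs of a single base, the threshold of 8 bases is an arbitrary assumption, and the function names are hypothetical. Aligning and overlapping the forward and reverse reads would still be done in an alignment editor.

```python
import re


def find_homopolymers(sequence, min_length=8):
    """Return (start, end, base) for every run of one base at least min_length long.

    Positions are 0-based and end is exclusive. min_length=8 is an assumed
    threshold, not a value taken from the handbook.
    """
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [(m.start(), m.end(), m.group(1))
            for m in pattern.finditer(sequence.upper())]


def trim_after_first_homopolymer(sequence, min_length=8):
    """Keep the read up to the end of its first homopolymer run."""
    runs = find_homopolymers(sequence, min_length)
    if not runs:
        return sequence  # nothing to trim
    return sequence[:runs[0][1]]
```

The same trimming would be applied to the reverse read (in its own orientation) before the two reads are overlapped at the tract.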
7) Alignment Errors
Shifts in the reading frame usually mean that a nucleotide or gap was incorrectly added to one of the traces. This type of error can usually be corrected by inspecting the properly aligned section of the sequence to determine the location of the mistake.
8) Indels (Insertions/Deletions)
Indels are insertions or deletions of nucleotides in a sequence. They can occur naturally during the evolutionary life of a species or as a result of poor sequence alignment. Identifying indels often requires comparing sequences from multiple species to determine where gaps should be located. Indels are naturally occurring and correctly placed if two criteria are met:
Animal groups with known indels:
9) Stop Codons
10) PCR Primers included in the Sequence
PCR primers need to be removed from the sequence whenever possible to ensure that the proper sequence length and reading frame are achieved. Different primers are used depending on the taxonomic group being analyzed, so keeping a copy of the primer sequences is essential for recognizing and deleting them from the sequence. The standard barcode length for most animal species is 658 bp for sequences with no indels (see section 8, Indels). As long as traces contain approximately 500 bp of high-quality sequence, PCR primers should be visible at the 3' end of each trace.
Tips and Troubleshooting
Sometimes it is not possible to recognize the PCR primer in a trace. To ensure a trace file is trimmed at the correct nucleotide position, a sequence of the correct length from a closely related species can be downloaded from BOLD and aligned to the original trace. Using the BOLD sequence as a reference, the trace can be trimmed to the same starting and ending points. This is an easy way to ensure trace files are in the correct reading frame.
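When the primer sequences are known, primer removal can be sketched as below. This is a simplified illustration: it assumes an assembled sequence in the forward orientation, searches by exact match only (real primers often contain degenerate bases or sequencing errors), and uses hypothetical function names. Any primer strings passed in must come from the user's own primer records.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def trim_primers(sequence, forward_primer, reverse_primer):
    """Remove primers from an assembled, forward-oriented sequence.

    The forward primer and everything before it are removed from the start;
    the reverse complement of the reverse primer and everything after it
    are removed from the end. A primer that is not found is left alone.
    """
    seq = sequence.upper()
    fwd = forward_primer.upper()
    rev_rc = reverse_complement(reverse_primer)

    start = seq.find(fwd)
    if start != -1:
        seq = seq[start + len(fwd):]

    end = seq.rfind(rev_rc)
    if end != -1:
        seq = seq[:end]
    return seq
```

After trimming, the length of the result can be compared with the expected barcode length (658 bp for most animal species with no indels) as a quick sanity check.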
11) Reading Frame Shifts
The reading frame refers to the way a nucleotide sequence is translated into amino acids. There are 3 possible reading frames for a sequence, though only one is correct. A sequence is in the correct reading frame if translation starting at the second nucleotide results in a sequence with no stop codons.
Tips and Troubleshooting
Before translating a sequence into amino acids, it is important to ensure that the correct genetic code table is being used. Most invertebrates use a generic "invertebrate mitochondrial" translation table; vertebrates and plants, however, have their own specific tables. If the wrong table is used, false stop codons may appear in the sequence.
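The effect of the reading frame and the translation table on stop codons can be sketched as follows. The stop codon sets are those of the standard, vertebrate mitochondrial, and invertebrate mitochondrial genetic codes; the function names and the short test sequence are made up for illustration. Frame offsets are 0-based, so translation starting at the second nucleotide corresponds to frame 1.

```python
# Stop codons for three genetic codes (NCBI translation tables 1, 2 and 5).
STOP_CODONS = {
    "standard": {"TAA", "TAG", "TGA"},
    "vertebrate_mitochondrial": {"TAA", "TAG", "AGA", "AGG"},
    "invertebrate_mitochondrial": {"TAA", "TAG"},
}


def stop_positions(sequence, frame, code="invertebrate_mitochondrial"):
    """Return 0-based start positions of in-frame stop codons.

    frame is the offset (0, 1 or 2) of the first translated nucleotide.
    """
    seq = sequence.upper()
    stops = STOP_CODONS[code]
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in stops]


def open_frames(sequence, code="invertebrate_mitochondrial"):
    """Return the frame offsets that contain no stop codon."""
    return [f for f in range(3) if not stop_positions(sequence, f, code)]


# Made-up example sequence, not real barcode data.
example = "GATGTGACTAACGGCTAG"
print(open_frames(example))                 # invertebrate mitochondrial code
print(open_frames(example, "standard"))     # wrong table for this example
print(stop_positions(example, 1, "standard"))
```

In this example only frame 1 is free of stop codons under the invertebrate mitochondrial code, while the standard code reports a stop in that frame because it reads TGA as a stop codon. On a short sequence more than one frame can be open by chance, so the check is more informative on a full-length barcode.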