Common Sequence Editing Issues

Under the right conditions, trace files should require very little manual editing. However, optimal conditions cannot always be met, and several issues can arise that may need to be corrected.

1) Minimal Background Noise

Trace file with minimal background noise

Trace files with minimal background noise may need more manual editing than high quality traces, but should still produce a reliable read.

2) Dye Blobs

Dye blob at the beginning of the trace

If the dye blob occurs at the beginning of the trace, it can be corrected by deleting the nucleotide sequence up to and including the blob. If it occurs in the middle of the trace, it is best to leave the affected bases ambiguous. If sequencing was performed bidirectionally, the trace from the opposite strand may be able to rescue these bases in the final sequence.
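The two dye-blob edits can be sketched in Python as follows. This is a minimal illustration, not a BOLD tool: the function name `edit_dye_blob` is hypothetical, the blob coordinates are assumed to be read off the chromatogram by hand, and the 5' trim is assumed to run through the last base under the blob.

```python
def edit_dye_blob(seq: str, blob_start: int, blob_end: int, at_start: bool) -> str:
    """Apply a dye-blob edit to a base-called sequence (hypothetical helper).

    blob_start and blob_end are 0-based, end-exclusive positions of the bases
    under the blob, assumed to be read off the chromatogram by the user.
    """
    if not 0 <= blob_start < blob_end <= len(seq):
        raise ValueError("blob coordinates are out of range")
    if at_start:
        # Blob near the 5' end: drop everything up to the end of the blob.
        return seq[blob_end:]
    # Blob mid-trace: keep the length (and reading frame) but mark the bases as N.
    return seq[:blob_start] + "N" * (blob_end - blob_start) + seq[blob_end:]


print(edit_dye_blob("ACGTAACCGGTT", 0, 4, at_start=True))   # AACCGGTT
print(edit_dye_blob("ACGTAACCGGTT", 4, 6, at_start=False))  # ACGTNNCCGGTT
```

Masking with N rather than deleting keeps the mid-trace bases in place, so the reading frame downstream of the blob is not disturbed.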

3) Low Quality Traces

Low quality trace file

Low-quality traces have poorly defined peaks, and within runs of repeated bases the peaks may appear to merge. These traces will require significant manual editing. If possible, align them to sequences from related taxa to resolve the repeated bases.
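Base callers usually report a Phred quality score for every base, and looking at where those scores are low is one way to gauge how much editing a trace will need. The sketch below is an illustration under stated assumptions rather than part of the handbook's procedure: `low_quality_regions` is a hypothetical name, the scores are assumed to be already extracted from the trace file, and the cutoff of Q20 (a 1% estimated error rate) is an arbitrary choice.

```python
def low_quality_regions(quals: list[int], cutoff: int = 20) -> tuple[float, list[tuple[int, int]]]:
    """Return the fraction of bases below `cutoff` and the (start, end) runs of them.

    Hypothetical helper. Runs are 0-based and end-exclusive, so they can be
    used directly as slices of the base-called sequence.
    """
    if not quals:
        return 0.0, []
    regions: list[tuple[int, int]] = []
    run_start = None
    for i, q in enumerate(quals):
        if q < cutoff and run_start is None:
            run_start = i                        # a low-quality run begins
        elif q >= cutoff and run_start is not None:
            regions.append((run_start, i))       # the run ended at the previous base
            run_start = None
    if run_start is not None:
        regions.append((run_start, len(quals)))  # run extends to the end of the trace
    low = sum(1 for q in quals if q < cutoff)
    return low / len(quals), regions


print(low_quality_regions([40, 38, 12, 9, 35, 15]))  # (0.5, [(2, 4), (5, 6)])
```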

4) Partial Co-amplification of Contaminants

Co-amplification of contaminant

Sometimes only part of a contaminating sequence is amplified. This can be corrected by deleting the sequence upstream of the "drop" in signal.
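Finding the "drop" can be sketched as locating the last position where a secondary (contaminant) peak is still strong and deleting everything upstream of it. This assumes a hypothetical per-base input, the height of the secondary peak relative to the primary peak, which this handbook does not define; the helper name and the 0.3 threshold are likewise illustrative.

```python
def trim_before_signal_drop(seq: str, secondary_ratio: list[float], threshold: float = 0.3) -> str:
    """Delete the mixed-signal region upstream of the contaminant's drop in signal.

    Hypothetical helper: secondary_ratio[i] is assumed to be the
    secondary/primary peak height ratio at base i.
    """
    if len(seq) != len(secondary_ratio):
        raise ValueError("need one ratio per base")
    last_mixed = -1
    for i, ratio in enumerate(secondary_ratio):
        if ratio >= threshold:
            last_mixed = i  # contaminant signal still present at this base
    # Keep only the clean sequence downstream of the last mixed position.
    return seq[last_mixed + 1:]


print(trim_before_signal_drop("ACGTACGT", [0.5, 0.6, 0.4, 0.1, 0.05, 0.1, 0.0, 0.0]))  # TACGT
```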

5) Double-peaks in Co-amplification of Similar Sequences

Double peak caused by co-amplification

Co-amplification occurs when two or more sequences from related species are amplified simultaneously. If only a few double peaks are present in the barcode region, they can be left as ambiguous bases at the user's discretion.
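When double peaks are kept as ambiguous bases, the standard IUPAC nucleotide codes record which two bases were observed. A minimal Python sketch follows; the helper names are hypothetical, and the double-peak positions are assumed to have been identified by the user.

```python
# IUPAC codes for two-base ambiguities.
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def ambiguity_code(base1: str, base2: str) -> str:
    """Return the IUPAC code covering the two bases called under a double peak."""
    b1, b2 = base1.upper(), base2.upper()
    if b1 == b2:
        return b1  # same base twice: not really a double peak
    return IUPAC_PAIRS[frozenset((b1, b2))]


def mark_double_peaks(seq: str, double_peaks: dict[int, tuple[str, str]]) -> str:
    """Replace each 0-based position in double_peaks with its ambiguity code (hypothetical helper)."""
    bases = list(seq)
    for pos, (b1, b2) in double_peaks.items():
        bases[pos] = ambiguity_code(b1, b2)
    return "".join(bases)


print(mark_double_peaks("ACGTACGT", {1: ("C", "T"), 6: ("G", "A")}))  # AYGTACRT
```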

6) Homopolymeric Tracts

Homopolymeric tract of thymine

Homopolymeric tracts (repeats of a single base or of a short sequence) often occur naturally and cannot be avoided. In certain situations, these tracts cause the sequence downstream of the tract to fall out of sync. Traces with these tracts can often be rescued by bidirectional sequencing. To do this, delete the sequence downstream of the homopolymeric tract and align the forward and reverse traces with a reference sequence. Then manually overlap the forward and reverse traces at the homopolymeric tract.
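Locating homopolymeric tracts before editing shows where downstream sequence may drift out of sync. The sketch below only finds single-base runs; the name `find_homopolymers` and the default minimum length of 6 are illustrative assumptions, not a BOLD standard.

```python
import re


def find_homopolymers(seq: str, min_len: int = 6) -> list[tuple[int, int, str]]:
    """Return (start, end, base) for every run of one base at least min_len long.

    Hypothetical helper. Positions are 0-based and end-exclusive.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # A base followed by (min_len - 1) or more copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]


print(find_homopolymers("ACGTTTTTTTACGAAAA"))  # [(3, 10, 'T')]
```

The returned coordinates mark where each trace should be cut before the forward and reverse reads are overlapped at the tract.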

7) Alignment Errors

Misalignment of forward and reverse trace files

A shift in the reading frame usually means that a nucleotide or gap was incorrectly added to one of the traces. This type of error can usually be corrected by starting from the properly aligned section of the sequence and inspecting the alignment to find where the traces begin to disagree, which marks the location of the mistake.
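The search for the mistake can be sketched as scanning two aligned traces for the first run of consecutive mismatches: a single misplaced base or gap usually makes much of what follows disagree, while an isolated base-call difference does not. This is a rough heuristic, since shifted bases still match by chance about a quarter of the time. The function name and the run length of 3 are hypothetical choices; gap columns are skipped so that overhanging read ends are not counted.

```python
from typing import Optional


def find_shift_start(aln_a: str, aln_b: str, run_length: int = 3) -> Optional[int]:
    """Return the alignment column where a run of >= run_length mismatches begins.

    Hypothetical helper. aln_a and aln_b are equal-length aligned sequences
    (gaps as '-'), e.g. the forward trace and the reverse-complemented
    reverse trace. Returns None if no such run is found.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    run_start, run = 0, 0
    for i, (a, b) in enumerate(zip(aln_a.upper(), aln_b.upper())):
        if a == "-" or b == "-":
            continue  # gap or overhang: not compared
        if a != b:
            if run == 0:
                run_start = i  # first mismatch of a potential shifted region
            run += 1
            if run >= run_length:
                return run_start
        else:
            run = 0  # agreement resets the run
    return None


print(find_shift_start("ACGTACGTACGT", "ACGTAACGTACG"))  # 5
print(find_shift_start("ACGTACGT", "ACGAACGT"))          # None
```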

8) Indels (Insertions/Deletions)

Nucleotide sequence with real indels and corresponding amino acid translation

Indels are insertions or deletions of nucleotides in a sequence. They can arise naturally over the evolutionary history of a species, or appear as artifacts of poor sequence alignment. Identifying indels often requires comparing sequences from multiple species to determine where gaps should be placed. Indels are naturally occurring and correctly placed if two criteria are met:

All indels occur in multiples of three nucleotides.
The alignment does not contain stop codons or frameshifts (i.e., translation results in the correct COI amino acid sequence).
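The first criterion can be checked mechanically by measuring each internal gap in an aligned sequence. In this sketch (a hypothetical helper, gaps written as '-'), leading and trailing gaps are ignored because they reflect a shorter read rather than an indel. An insertion in one sequence appears as a gap in the others, so the check would be run on every sequence in the alignment; the second criterion still needs a stop-codon check on the translation.

```python
import re


def out_of_frame_gaps(aligned_seq: str) -> list[tuple[int, int]]:
    """Return (column, length) for each internal gap whose length is not a multiple of 3.

    Hypothetical helper; columns are 0-based positions in the aligned sequence.
    """
    core = aligned_seq.strip("-")                              # drop terminal gaps
    offset = len(aligned_seq) - len(aligned_seq.lstrip("-"))  # columns removed on the left
    return [
        (m.start() + offset, len(m.group()))
        for m in re.finditer(r"-+", core)
        if len(m.group()) % 3 != 0
    ]


print(out_of_frame_gaps("ATG---CCCAAA"))   # []
print(out_of_frame_gaps("--ATG-CCAAA--"))  # [(5, 1)]
```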

Animal groups with known indels:

Mollusca
Hymenoptera
Nematoda
Hemiptera

9) Stop Codons

Stop codons in the amino acid translation

Stop codons are three-nucleotide sequences that signal the termination of translation. Most sequence assembly software will detect and highlight stop codons in a sequence. Because stop codons should disappear once the sequence has been placed in the correct reading frame, they should not be present in corrected COI-5P barcode sequences (see section 11, Reading Frame Shifts, below). BOLD will flag sequences containing stop codons upon upload; these should be checked and corrected when possible.
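A stop-codon check like the one assembly software performs can be sketched as scanning codons in a chosen frame. This sketch assumes the invertebrate mitochondrial code (NCBI translation table 5), in which only TAA and TAG are stops, and defaults to starting at the second nucleotide as described in section 11; the helper name is hypothetical.

```python
# Stop codons in the invertebrate mitochondrial code (NCBI translation table 5).
INVERTEBRATE_MITO_STOPS = {"TAA", "TAG"}


def find_stop_codons(seq: str, frame: int = 2) -> list[int]:
    """Return 0-based start positions of in-frame stop codons (hypothetical helper).

    frame is 1, 2 or 3: the nucleotide (1-based) at which translation starts.
    Incomplete trailing codons are ignored.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    s = seq.upper()
    return [i for i in range(frame - 1, len(s) - 2, 3) if s[i:i + 3] in INVERTEBRATE_MITO_STOPS]


print(find_stop_codons("ATAAGGGTAGCC"))  # [1, 7]
```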

10) PCR Primers Included in the Sequence

PCR primers should be removed from the sequence whenever possible to ensure that the sequence has the proper length and reading frame. Different primers are used depending on the taxonomic group being analyzed, so keeping a copy of the primer sequences is essential for recognizing and deleting them. The standard barcode length for most animal species is 658 bp for sequences with no indels (see section 8, Indels). As long as traces contain approximately 500 bp of high-quality sequence, PCR primers should be visible at the 3' end of each trace.
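Removing primers from an assembled sequence can be sketched as cutting off the forward primer at the 5' end and the reverse complement of the reverse primer at the 3' end. This assumes exact, non-degenerate primer matches and a contig in the forward orientation; real primers are often degenerate or partially miscalled, so treat `trim_primers` (a hypothetical helper) as an illustration only.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (N is kept as N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def trim_primers(seq: str, fwd_primer: str, rev_primer: str) -> str:
    """Remove an exact forward primer match and the reverse primer's reverse complement."""
    s = seq.upper()
    f = fwd_primer.upper()
    r_rc = reverse_complement(rev_primer)
    start = s.find(f)
    start = start + len(f) if start != -1 else 0  # keep sequence after the forward primer
    end = s.rfind(r_rc)
    if end == -1 or end < start:
        end = len(s)                               # reverse primer not found: keep 3' end
    return s[start:end]


print(trim_primers("GGTCACGTACGTGCTT", "GGTC", "AAGC"))  # ACGTACGT
```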

Tips and Troubleshooting

Sometimes it is not possible to recognize the PCR primer in a trace. To ensure that a trace file is trimmed at the correct nucleotide position, download a sequence of the correct length from a closely related species from BOLD and align it to the original trace. Using the BOLD sequence as a reference, trim the trace to the same start and end points. This is an easy way to ensure trace files are in the correct reading frame.
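Once the trace and the BOLD reference are aligned, trimming to the same start and end points amounts to keeping only the alignment columns spanned by the reference. The sketch assumes a pairwise alignment is already available as two equal-length gapped strings; `trim_to_reference` is a hypothetical name.

```python
def trim_to_reference(ref_aln: str, query_aln: str) -> str:
    """Trim the aligned query to the reference's first and last base, then remove gaps.

    Hypothetical helper; both inputs are equal-length aligned strings with '-' gaps.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_cols = [i for i, c in enumerate(ref_aln) if c != "-"]
    if not ref_cols:
        raise ValueError("reference contains no bases")
    first, last = ref_cols[0], ref_cols[-1]
    return query_aln[first:last + 1].replace("-", "")


print(trim_to_reference("--ACGTACGT---", "TTACGTACGTAAA"))  # ACGTACGT
```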

11) Reading Frame Shifts

Incorrect and correct amino acid translations for the same nucleotide sequence

The reading frame refers to the way a nucleotide sequence is grouped into codons and translated into amino acids. There are three possible reading frames on a given strand, though only one is correct. For a COI-5P barcode trimmed as described in section 10, the sequence is in the correct reading frame if translation starting at the second nucleotide results in a sequence with no stop codons.
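Testing the three reading frames can be sketched by counting stop codons in each; a correctly trimmed COI-5P barcode should show none in frame 2. Like the other examples, this assumes the invertebrate mitochondrial code (NCBI table 5) and uses a hypothetical helper name.

```python
# Stop codons in the invertebrate mitochondrial code (NCBI translation table 5).
INVERTEBRATE_MITO_STOPS = {"TAA", "TAG"}


def stops_per_frame(seq: str) -> dict[int, int]:
    """Count in-frame stop codons for frames 1, 2 and 3 (1-based start nucleotide)."""
    s = seq.upper()
    return {
        frame: sum(1 for i in range(frame - 1, len(s) - 2, 3) if s[i:i + 3] in INVERTEBRATE_MITO_STOPS)
        for frame in (1, 2, 3)
    }


counts = stops_per_frame("TAACCTAGC")
print(counts)                                    # {1: 1, 2: 0, 3: 1}
print([f for f, n in counts.items() if n == 0])  # [2]
```

On short fragments several frames may happen to be stop-free, so the check is most informative on full-length sequences.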

Tips and Troubleshooting

Before translating a sequence into amino acids, make sure that the correct genetic code table is being used. Most invertebrates use the "invertebrate mitochondrial" translation table, whereas vertebrates and plants have their own specific tables. If the wrong table is used, false stop codons may appear in the sequence.
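The effect of choosing the wrong table can be shown by counting stop codons in the same frame under different codes. The stop-codon sets below are those of NCBI translation tables 1 (standard), 2 (vertebrate mitochondrial) and 5 (invertebrate mitochondrial); the function name is hypothetical, and plant-specific tables are not included.

```python
# Stop codons for a few NCBI genetic code tables.
STOPS_BY_TABLE = {
    "standard (1)": {"TAA", "TAG", "TGA"},
    "vertebrate mitochondrial (2)": {"TAA", "TAG", "AGA", "AGG"},
    "invertebrate mitochondrial (5)": {"TAA", "TAG"},
}


def stop_counts_by_table(seq: str, frame: int = 2) -> dict[str, int]:
    """Count in-frame stop codons under each table (frame = 1-based start nucleotide)."""
    s = seq.upper()
    codons = [s[i:i + 3] for i in range(frame - 1, len(s) - 2, 3)]
    return {name: sum(c in stops for c in codons) for name, stops in STOPS_BY_TABLE.items()}


print(stop_counts_by_table("ATGAAAGGC"))
# {'standard (1)': 1, 'vertebrate mitochondrial (2)': 0, 'invertebrate mitochondrial (5)': 0}
```

Here TGA encodes tryptophan in both mitochondrial codes, so the stop reported under the standard code is a false stop of exactly the kind described above.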

You are viewing an outdated version of BOLD Systems. Visit the latest version at boldsystems.org.
