BOLD Systems: Student Portal - Quick Start Guide

Video Tutorial

	1. Registering an Account An introduction to the BOLD Student Data Portal website. This video provides an overview of the system and describes how to register and create a course.
	2. Submitting Data This video follows the submission of a DNA barcode record from the specimen data all the way to the sequence. It focuses on the student interface, but it allows instructors to follow the steps students will need to undertake in order to create their records.
	3. Overseeing a course This video provides an overview of the tools available to instructors to monitor student work and participation. It also describes the steps needed in validating and approving student-generated data for publication on BOLD and GenBank.

SDP Sequence Submission

Analyzing, interpreting, and editing nucleotide sequence data are among the most challenging conceptual aspects of the DNA barcoding workflow. This supplement to the BOLD-SDP user documentation is intended to assist by providing instructors and students with an overview of dye terminator cycle sequencing, recommendations for submitting COI PCR fragments (amplicons) to a sequencing facility, guidance on how to interpret raw sequence data, and detailed instructions for assembling COI nucleotide sequences and linking them to specimen/sample records. Basic information on how to navigate BOLD-SDP and utilize its major features and functionalities can be found in the BOLD-SDP System Overview, video tutorials, and earlier sections of the Quick Start Guide. We encourage you to carefully review the information in these resources before attempting to perform the steps outlined in this supplement.

There are 4 sections in this protocol:

DNA Sequencing of COI amplicons
Submitting DNA samples For Bi-Directional Sequencing
Sequence Assembly and Editing in BOLD-SDP
Using BOLD Identification Engine

DNA Sequencing of COI amplicons

Background DNA sequencing is a procedure for determining the order in which nucleotides (adenine, guanine, cytosine, and thymine) appear in an individual gene or gene fragment, a continuous cluster of genes, a complete chromosome, or an entire genome. Over the last several decades, a variety of sequencing methods have been developed for different applications and research goals. A researcher’s selection of a particular method is based on a variety of considerations, including speed, cost, accuracy, and the length of the DNA molecule to be sequenced. Dye terminator cycle sequencing – an automated variation of Sanger sequencing – is the method of choice for DNA barcoding. This PCR-based method of automated DNA sequencing is performed at a nominal cost by both commercial and university-based sequencing facilities.

Methodology For DNA barcoding, two dye terminator sequencing reactions are performed separately for each COI amplicon. The forward sequencing reaction will determine the nucleotide sequence of the sense strand, whereas the reverse sequencing reaction will determine the nucleotide sequence of the antisense strand.
The following components are common to both the forward and reverse sequencing reactions, which are performed in separate tubes: 1) multiple copies of a double-stranded COI amplicon (the DNA template for each sequencing reaction); 2) a heat-stable DNA polymerase; 3) dNTPs (the basic building blocks of DNA); and 4) ddNTPs (fluorescently labeled terminator nucleotides that lack an –OH group at the 3’ position of the sugar ring). Unlike conventional PCR, only a single oligonucleotide primer is used for each sequencing reaction.
Each sequencing reaction progresses through the same major steps of a PCR reaction:

During the denaturation step, each sequencing reaction mixture is heated to ~96˚C to disrupt the hydrogen bonds that hold the sense and antisense strands of the COI amplicon together.
During the annealing step, each reaction mixture is lowered to ~50˚C, allowing the sequencing primer to bind to a complementary sequence on one strand of the COI amplicon. The sequencing primer that was added to the forward sequencing reaction binds or anneals to a complementary sequence on the antisense strand according to the base pairing rules. The sequencing primer that was added to the reverse sequencing reaction binds or anneals to a complementary sequence on the sense strand.
During the elongation step, each reaction mixture is raised to ~60˚C. At this temperature, a heat-stable DNA polymerase finds the 3’ end of the sequencing primer and begins joining nucleotides that are complementary to those present in the template strand. For the forward sequencing reaction, the DNA polymerase joins nucleotides that are complementary to those in the antisense strand. For the reverse sequencing reaction, the DNA polymerase joins nucleotides that are complementary to those in the sense strand.
During this step of the sequencing reaction, the DNA polymerase cannot distinguish between dNTPs and ddNTPs present in the reaction mixture. Because a higher proportion of dNTPs were added to each sequencing reaction mixture, they are more likely to be incorporated into the growing DNA chain. However, when a ddNTP lacking a 3’ –OH is incorporated, DNA synthesis stops as no new nucleotides can be added to the growing chain.
The denaturation, annealing, and elongation steps are repeated multiple times, thereby ensuring that at the conclusion of each sequencing reaction, single-stranded DNA fragments of every possible length are generated. Importantly, each fragment terminates with one of the four ddNTPs, each of which is labeled with a different fluorescent tag.

Upon completion of each sequencing reaction, the fluorescently labeled DNA fragments are separated according to size using capillary electrophoresis. As the DNA fragments migrate from smallest to largest through the capillary, they pass through a laser, which excites the fluorescent ddNTP at the terminal end of each fragment.

DNA fragments terminated by ddATP emit green light
DNA fragments terminated by ddTTP emit red light
DNA fragments terminated by ddGTP emit yellow light
DNA fragments terminated by ddCTP emit blue light

The light emitted from fluorescently labeled DNA fragments is detected by the sequencer and represented as a continuous series of colored peaks in an electropherogram or trace file. The peak from the smallest fluorescently labeled DNA fragment is represented first in the trace file, whereas the peak from the largest fragment is represented last. The information contained in a trace file will be discussed in greater detail below.
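The chain-termination process described above can be illustrated with a short Python sketch. It is a deliberately simplified model and not part of BOLD-SDP or any sequencer software: the function name simulate_read and the parameters n_molecules and dd_fraction are hypothetical, and the model ignores the primer, peak shapes, and signal noise. It only shows why a mixture of randomly terminated fragments, sorted by length, spells out the sequence of the newly synthesized strand.

```python
import random

# Dye colors as listed in this guide for each terminator nucleotide.
DYE_COLOR = {"A": "green", "T": "red", "G": "yellow", "C": "blue"}


def simulate_read(new_strand, n_molecules=5000, dd_fraction=0.05, seed=0):
    """Recover base calls from randomly terminated copies of new_strand.

    new_strand  : bases added after the primer, written 5' to 3'
    dd_fraction : assumed chance that a ddNTP (rather than a dNTP) is
                  incorporated at any given position
    """
    rng = random.Random(seed)
    terminal_base_by_length = {}
    for _ in range(n_molecules):
        for position, base in enumerate(new_strand, start=1):
            if rng.random() < dd_fraction:
                # A ddNTP was incorporated: synthesis stops here and the
                # fragment carries the dye of its final base.
                terminal_base_by_length[position] = base
                break
    # Capillary electrophoresis: the smallest fragments pass the laser first.
    ordered = sorted(terminal_base_by_length.items())
    bases = "".join(base for _, base in ordered)
    colors = [DYE_COLOR[base] for _, base in ordered]
    return bases, colors


bases, colors = simulate_read("ATGGCATTCCGA")
print(bases)
print(colors[:4])
```

Because dNTPs greatly outnumber ddNTPs in this model, most molecules extend for several bases before terminating, and a large number of molecules is needed for every fragment length to be represented.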

Submitting DNA samples For Bi-Directional Sequencing

In this segment of the DNA barcoding workflow, you will submit COI amplicons generated in your science lab to a sequencing facility for bi-directional sequencing (i.e. sequencing the sense and antisense strands of each COI amplicon). Because the reverse complement of the nucleotide sequence generated for one strand matches the nucleotide sequence generated for the other (except in non-overlapping regions), bi-directional sequencing essentially doubles the amount of sequence data available for each amplicon. As discussed in greater detail below, this duplication of information can help resolve ambiguities in nucleotide sequence data that arise from technical limitations of dye terminator cycle sequencing.
Most DNA sequencing facilities make use of online submission forms that require you to provide the following types of information for each DNA sample:

The format in which DNA samples are delivered (e.g. in tubes or multi-well plates)
To avoid sample mix-ups, we strongly recommend that you submit DNA samples in 0.5 mL (or smaller) tubes that are clearly labeled with an appropriate sample ID (see below for additional details).
The method of payment (e.g. purchase order number, credit card number, etc.) Most schools and universities will require you to obtain a purchase order (PO) number from an authorized purchasing agent.
The type of DNA sample being submitted (e.g. plasmid DNA, PCR product, etc.) Indicate that you are submitting samples containing PCR product.
The type of DNA sequencing service requested
At the conclusion of the initial PCR reaction used to generate COI amplicons, each reaction tube contained unincorporated nucleotides and primers, Taq polymerase, and magnesium. The presence of these reaction components will adversely affect the sequencing reaction and consequently produce low quality or uninterpretable sequence data. It is therefore imperative that COI amplicons be purified before they are sequenced. If your COI samples were purified, then you should request standard sequencing. If not, then you may request that the sequencing facility purify them for an additional charge. Please be advised that some sequencing facilities do not offer a purification service to their clients.
The method used to purify the DNA
If applicable, indicate the name of the vendor that supplies the reagents that your students utilized to purify each PCR reaction.
The name or ID of the DNA sample
The ID of the DNA sample should match the ID of the corresponding tissue sample from which it was obtained. We strongly discourage the practice of re-coding DNA samples as it increases the likelihood of sample mix-ups. Ensure that the side or lid of each sample tube is clearly labeled with the appropriate ID. Since COI amplicons will be sequenced bi-directionally, you should aliquot or apportion an equal volume of each DNA sample into two separate tubes (unless you are otherwise instructed by the sequencing facility). For a purified DNA sample with the ID code CMB12, one volume should be aliquoted into a tube labeled CMB12-F and an equal volume should be aliquoted into a tube labeled CMB12-R. The forward sequencing reaction will be performed using template DNA in the tube labeled CMB12-F, and the reverse sequencing reaction will be performed using template DNA in the tube labeled CMB12-R. Both codes should be entered into the sample ID column of the online submission form. You should also specify on the form that the forward sequencing primer should be used for sample CMB12-F and that the reverse sequencing primer should be used for sample CMB12-R.
The name of the sequencing primer
Your choice of sequencing primers will be dictated by the primers that your students used to generate COI amplicons during the initial PCR reaction. iBOL researchers sometimes use M13-tailed PCR primers, which contain at their 5’ ends a short string of nucleotides derived from M13 DNA. These short nucleotide sequences are viral in origin and therefore not represented in the COI template. Downstream of the M13 sequences, the primers also contain a continuous series of nucleotides that are complementary to those flanking the barcode region of the COI gene. During the annealing step of PCR, these complementary primer sequences bind to the sense and antisense strands of the COI gene according to the base pairing rules, allowing Taq polymerase to copy the intervening region of DNA during the subsequent elongation step.
Importantly, the M13 region of each primer cannot bind the template DNA during the annealing steps of PCR (because complementary nucleotides are not present in the COI gene). However, during the subsequent elongation step, these short M13 nucleotide sequences are incorporated into the 5’ ends of each newly synthesized strand of DNA along with the remainder of each primer sequence. The COI amplicons generated during PCR therefore contain different M13 nucleotide sequences at each end of the sense and antisense strands. These M13 nucleotide sequences serve as binding sites for universal sequencing primers used by most sequencing facilities. If you used M13-tailed PCR primers, the universal M13F-20 primer can be selected for the forward sequencing reaction of each COI amplicon, and the universal M13R primer can be selected for the reverse sequencing reaction of each COI amplicon.

If your PCR primers lack M13 tails, then the forward PCR primer can be selected as the forward sequencing primer and the reverse PCR primer can be selected as the reverse sequencing primer. In this case, you will need to submit an aliquot of each primer to the sequencing facility and provide the sequence of each primer with your submission.
The length (in bp) of the DNA sample to be sequenced
The standard length of the COI barcode region is ~650 bp.
The concentration of the DNA sample (in ng/µL)
In order to generate high quality sequence data, sequencing facilities generally require a DNA concentration in the range of ~5-20 ng/µL. The DNA concentration of each sample can be determined using a spectrophotometer or by running an agarose gel to compare the band intensity of a small volume of sample DNA with the band intensity of DNA concentration standards. Since standard spectrophotometers found in most educational science labs yield highly variable and inaccurate values for PCR products, the latter method is recommended.
The total amount of DNA sample
This value is determined by multiplying the DNA concentration of the sample (in ng/µL) by the volume (in µL) aliquoted into a given sample tube. In general, sequencing facilities recommend that you submit a total of ~100 ng DNA. This amount of DNA enables the sequencing facility to repeat a sequencing reaction in the event that a procedural error occurred.
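The arithmetic and labeling conventions described in the list above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the ~100 ng target is the general recommendation quoted above, and your sequencing facility's own requirements take precedence.

```python
def total_dna_ng(concentration_ng_per_ul, volume_ul):
    """Total DNA in a tube: concentration (ng/uL) multiplied by volume (uL)."""
    return concentration_ng_per_ul * volume_ul


def volume_for_target_ul(concentration_ng_per_ul, target_ng=100.0):
    """Volume (uL) needed to reach target_ng at the given concentration."""
    if concentration_ng_per_ul <= 0:
        raise ValueError("concentration must be greater than zero")
    return target_ng / concentration_ng_per_ul


def tube_labels(sample_id):
    """Labels for the forward and reverse aliquots of one DNA sample."""
    return f"{sample_id}-F", f"{sample_id}-R"


# A sample at 10 ng/uL: 10 uL per tube supplies about 100 ng.
print(total_dna_ng(10, 10))
print(volume_for_target_ul(20))
print(tube_labels("CMB12"))
```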

The section above is intended to provide you with a basic understanding of the types of information that a sequencing facility will request. Before submitting your DNA, we strongly encourage you to contact the sequencing facility for its specific submission guidelines and procedures.

Sequence Assembly and Editing in BOLD-SDP

As noted above, the fluorescently labeled DNA fragments generated during dye terminator cycle sequencing migrate sequentially through a capillary according to their size (smallest to largest) and pass by a laser. Upon exposure to the laser, fragments terminated by a ddATP emit green light, fragments terminated by ddTTP emit red light, fragments terminated by ddCTP emit blue light, and fragments terminated by ddGTP emit yellow light. The light signals are detected by the DNA sequencer, processed by a software program, and represented as a series of colored peaks in a trace file (yellow light signals emitted from DNA fragments terminated by ddGTP are represented as black peaks in the trace file to make them more readable on a white background).
The software also uses an algorithm to assign base calls (nucleotides) to each peak in the trace file and to compute a confidence or quality (Q) score for each base call. The quality score represents the level of confidence that a base call was made correctly. To compute quality scores, the algorithm examines several parameters associated with the peak shape and resolution at each position in the trace file. The resulting scores are logarithmically linked to error probabilities according to the following equation:

Q = -10 log10 P

where P represents the probability of an incorrect base call

Based on this equation, a quality score of 20 indicates that the probability of an incorrect base call is 1 in 100, a quality score of 30 indicates a probability of 1 in 1,000, and a quality score of 40 indicates a probability of 1 in 10,000. Generally speaking, base calls with quality scores less than 20 are considered unacceptable and must be edited.
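The relationship between quality scores and error probabilities can be checked with a short Python sketch. The function names below are our own and are not part of BOLD-SDP; the code simply applies the equation given above in both directions.

```python
import math


def error_probability(q):
    """Probability of an incorrect base call for quality score q."""
    return 10 ** (-q / 10)


def quality_score(p):
    """Quality score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    # Express each probability as "1 in N" for easier reading.
    print(f"Q{q}: about 1 in {round(1 / error_probability(q)):,}")
```

Each increase of 10 in the quality score corresponds to a tenfold decrease in the probability of an incorrect base call.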

Step 1. Obtain/receive sequence data from facility and organize

Many DNA sequencing facilities send their clients email notifications when sequence data are available. The notification will usually contain a link to a facility-managed website where the trace files can be downloaded. A trace file may bear one of several file extensions (e.g. .ab1, .scf, etc.), depending on the type of DNA sequencer and software used by the sequencing facility. Files with the .ab1 extension are the unedited output files created by the Applied Biosystems Sequencing Analysis Software, which is extensively used by DNA sequencing facilities.

For each COI DNA sample that you submit, you are likely to receive two trace files with the .ab1 file extension. One trace file corresponds to the forward sequencing reaction (which produced the unedited nucleotide sequence of the sense strand of the amplicon), while the other trace file corresponds to the reverse sequencing reaction (which produced the unedited sequence of the antisense strand). We recommend that for each class, you download and organize trace files into a single folder that your students can copy onto a computer desktop.

Step 2. Upload forward and reverse trace files to BOLD-SDP

Unedited trace files form an important part of a DNA barcode record as they aid in the verification of barcode sequences, which are typically assembled using the raw data contained in the forward and reverse trace files. Before assembling barcode sequences with the BOLD-SDP Sequence Editor, the corresponding trace files must first be linked to a specimen/sample through the Data Management Console of BOLD-SDP. Trace files and associated data are linked to the appropriate specimen/sample by following the steps outlined below. Please be advised that in order to link trace files to a particular specimen/sample, you must have already completed a New Specimen page for the specimen/sample in the Data Submission Console of BOLD-SDP. Consult the relevant section of the BOLD-SDP Brochure and Quick Start Guide for details.
a. Log in to the Main Student Console by clicking on the Student Console icon on the BOLD-SDP home page.

b. On the Main Student Console page, click the <Upload Traces> icon.

c. In Section A of the Upload Traces page, enter a Sample ID to connect your traces with the appropriate specimen/sample. If you forgot the Sample ID of the specimen/sample from which the trace files were generated, click the <Lookup> button to find its ID in your class record list.

d. In Section B, use the pull-down menus to select the Forward and Reverse PCR primers used to generate your COI amplicon. If necessary, consult your instructor for guidance in selecting the correct PCR primers.

e. In Section C, use the pull-down menus to select the Forward and Reverse Sequencing Primers used to generate your trace files. Next, click the <Choose File> buttons to select and upload the forward and reverse trace files from your computer. Be sure that the forward trace file appears in the top selection alongside the forward sequencing primer, and that the reverse trace file appears in the bottom selection alongside the reverse sequencing primer.

f. Be sure to select your name from the <Student Attribution> pull-down menu in the upper right-hand corner of the page so that you are credited with uploading trace files. If you are working with another student, click <Add Student> and select his/her name from the pull-down menu that appears. If necessary, this operation may be repeated to include additional student partners.

g. Click the <Submit> button to incorporate the information on this page into the specimen/sample record.

Step 3. Visualize trace files in the specimen/sample record

Before assembling barcode sequences, it is extremely important to verify that your trace files were incorporated into the correct specimen/sample record. To perform this task, follow each of the steps below:
a. Navigate to the Main Student Console page of BOLD-SDP.
b. In the right sidebar of the Main Console page, click the <View Data> icon.

c. On the Record List page for your class, locate the row for the specimen/sample that you linked to the recently uploaded trace files. Then click the Process ID link for the specimen/sample to open its sequence page in a new window.

d. The PCR primer names, sequencing primer names, and trace file names should appear in the Sequencing Runs pane. BOLD-SDP also assigns a quality designation (high, medium, low, or failed) to each of the trace files. These designations are based on the average quality scores of the base calls in each trace file. High quality trace files have a mean quality score >40, medium quality trace files have a mean quality score between 30 and 40, and low quality trace files have a mean quality score <30. Trace files with fewer than 10 base calls are designated as failed. Please be advised that it may take several hours for BOLD to assign quality designations to the trace files once they are uploaded. A low or failed designation usually indicates that a procedural error occurred in the lab prior to the submission. Sequencing facilities sometimes provide an explanation of the likely cause of the problem. Trace files with a low or failed designation cannot be used to assemble contigs using the BOLD Sequence Editor (see below for additional details).

e. To view and examine the trace files, select both check boxes that appear next to their filenames and then click the <View Trace Files> button.

f. The forward and reverse trace files are displayed in the top and bottom panes of the Trace Viewer page, respectively.
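The quality designations described in step d can be expressed as a small Python function. This is a sketch based only on the thresholds quoted in this guide; the function name is hypothetical, the handling of a mean of exactly 30 or 40 is our assumption, and the actual BOLD-SDP implementation may differ.

```python
from statistics import mean


def quality_designation(quality_scores):
    """Classify a trace file from the quality scores of its base calls."""
    if len(quality_scores) < 10:
        # Fewer than 10 base calls: the trace is designated as failed.
        return "failed"
    average = mean(quality_scores)
    if average > 40:
        return "high"
    if average >= 30:
        # Assumption: means of exactly 30 or 40 are treated as medium.
        return "medium"
    return "low"


print(quality_designation([45] * 600))
print(quality_designation([35] * 600))
print(quality_designation([12] * 600))
print(quality_designation([50] * 5))
```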

The Trace File Viewer displays quality values for individual base calls in the trace files using a histogram. The quality value for each base call can be determined by comparing the height of its shaded bar to the vertical scale that appears on the right-hand side of the trace file window. Continuous stretches of low quality base calls that appear on the 5’ and 3’ ends of each trace file are displayed in reduced opacity.

BOLD-SDP also computes quality statistics and displays them in tabular and graphical format above each trace file window. Of these statistical values, the mean and standard deviation are the most informative. The mean refers to the average quality score for the base calls in a given trace file. The standard deviation (Stdev) is a measure of how close the quality scores for the base calls are to the mean. A low standard deviation value indicates that the quality scores are clustered near the mean, whereas a high standard deviation value indicates that the quality scores are dispersed over a large range of values. Lower standard deviation values therefore indicate a greater level of consistency in the quality of base calls, which imparts a higher degree of confidence in the overall accuracy of the trace file. The frequency histograms that appear above each trace file window show the percentage of base calls that correspond to different quality scores (QV). The data displayed in these histograms provide an indication of the range of quality scores for the base calls.
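To make the meaning of these statistics concrete, the following Python sketch computes a mean, a standard deviation, and a percentage histogram for a list of quality scores. The function name and the bin width of 10 are hypothetical, and we do not know whether BOLD-SDP uses the population or the sample standard deviation; the population form is used here purely for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def quality_summary(scores, bin_width=10):
    """Mean, standard deviation, and percentage histogram of quality scores."""
    # Group each score into a bin identified by its lower edge (0, 10, 20, ...).
    counts = Counter((q // bin_width) * bin_width for q in scores)
    histogram_pct = {
        f"{start}-{start + bin_width - 1}": 100 * n / len(scores)
        for start, n in sorted(counts.items())
    }
    return {
        "mean": mean(scores),
        "stdev": pstdev(scores),
        "histogram_pct": histogram_pct,
    }


consistent = [40] * 10          # every base call has the same quality
mixed = [10, 10, 50, 50]        # lower mean, widely dispersed scores
print(quality_summary(consistent))
print(quality_summary(mixed))
```

The second list has a much larger standard deviation than the first, which reflects base calls of very uneven quality.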

The scroll bar at the bottom of each trace file window allows you to examine the sequences along their entire length. Even in the absence of quality values/scores for individual base calls in the trace files, their quality can be inferred from the resolution of their corresponding peaks. Notice that the peaks in the beginning (5’ end) of the forward trace file are broad, overlapping, and poorly resolved. This semi-transparent region of the trace file corresponds to low quality base calls, which correlate with high error probabilities.

As you scroll to the right, the un-shaded peaks appear sharp, well resolved, and non-overlapping. This region of the trace files corresponds to high quality base calls, which correlate with low error probabilities.

As you scroll even further to the right (toward the 3’ end of the trace files), the peaks become lower in amplitude and begin to broaden and overlap. This semi-transparent region of the trace file corresponds to low quality base calls.

The low quality base calls that appear at the beginning and end of a trace file arise from technical limitations of dye terminator sequencing, which are complicated and caused by a variety of interacting factors associated with capillary electrophoresis and the underlying chemistry of this particular sequencing method. Regardless of their cause, low quality base calls must be eliminated from the sequence in order to preserve its overall accuracy.

Step 4. Assembling COI barcode sequences from trace files. Important Note: Steps 4-6 must be completed in their entirety.

Although low quality or ambiguous base calls are normally found at the 5’ and 3’ ends of a trace file, they may also appear elsewhere in the sequence. If only a single trace file was generated for a given COI amplicon, it would be difficult or impossible to confidently determine the identity of a base call that is assigned a low quality score value (i.e. a value < 20). However, a second trace file contains duplicate data that can help determine its identity with a greater level of statistical certainty.

Bringing two trace files into register and displaying them in the same window enables a researcher to identify regions of agreement or disagreement in base calls. In cases where a low quality base call is found in one trace file, the researcher can find the position of the base call in the other trace file and compare the differences in quality scores. If a higher quality score value (>20) is assigned to the base call in the second trace file, then that base call is regarded as the correct nucleotide and accepted.

The algorithm that operates within the BOLD-SDP Sequence Editor (and the Trace File Viewer described above) automatically reverses the sequence of base calls and peaks in the reverse trace file (which corresponds to the sequence of the antisense strand) so that they read in the opposite direction. It then converts each base call to its complementary nucleotide and re-colors the corresponding peaks accordingly. For example, a base call of <T> that appears at the first position of the trace file above a red peak is replaced with a base call of <A> and moved to the last position above a green peak of the same shape and height as the original red peak. The confidence score assigned to the original base call is also shifted to the last position. Next, the program aligns this sequence of complementary base calls and appropriately re-colored peaks with the unaltered sequence of base calls and peaks of the forward trace file (which corresponds to the sequence of the sense strand). The largely overlapping DNA sequences are displayed in a project window of the BOLD-SDP Sequence Editor as shown below.
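The reverse-complement operation described above can be sketched in Python as follows. This is not the BOLD-SDP code; the function name is hypothetical, and the sketch handles only the base calls and their quality scores, not the peak data that the Sequence Editor also reverses and re-colors.

```python
# Translation table for complementary bases; N (ambiguous) stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement_read(bases, quals):
    """Reverse-complement a read and reverse its quality scores to match."""
    if len(bases) != len(quals):
        raise ValueError("each base call needs exactly one quality score")
    new_bases = bases.upper().translate(COMPLEMENT)[::-1]
    new_quals = list(reversed(quals))
    return new_bases, new_quals


# The <T> at the first position becomes an <A> at the last position,
# and its quality score (12) moves with it.
print(reverse_complement_read("TGGC", [12, 30, 40, 45]))
```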

The forward trace file appears in the top pane of the assembly project window along with its sequence of base calls. The reverse complement of the reverse trace file appears in the lower pane along with its corresponding base calls. For both trace files, quality scores are represented graphically in the form of a histogram, where higher bars indicate higher quality scores and vice versa. The vertical scale on the right side of each trace file histogram displays the numerical quality values.

The nucleotide sequence that appears at the bottom of the project window represents a contig – a continuous nucleotide sequence assembled from two overlapping DNA sequences (in this case, from the forward trace file and the reverse complement of the reverse trace file). The BOLD-SDP Sequence Editor compares the quality scores of base calls at every position of the trace files and accepts the base call with the higher quality score for inclusion in the contig. The bars that appear above each nucleotide in the contig are graphical representations of quality values, which are color-coded according to the legend that appears in the upper right hand corner of the window.
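The position-by-position comparison described above can be sketched in Python. The sketch rests on several assumptions that are ours rather than BOLD-SDP's: the two reads have already been aligned and are the same length, the reverse read has already been reverse-complemented, ties are resolved in favor of the forward read, and the contig simply inherits the higher of the two quality scores. The function name build_contig is hypothetical.

```python
def build_contig(fwd_bases, fwd_quals, rev_bases, rev_quals):
    """Pick, at every aligned position, the base call with the higher score."""
    lengths = {len(fwd_bases), len(fwd_quals), len(rev_bases), len(rev_quals)}
    if len(lengths) != 1:
        raise ValueError("reads and quality lists must be aligned to equal length")
    contig = []
    contig_quals = []
    for fb, fq, rb, rq in zip(fwd_bases, fwd_quals, rev_bases, rev_quals):
        if fq >= rq:
            # Forward read is at least as reliable here (ties go forward).
            contig.append(fb)
            contig_quals.append(fq)
        else:
            contig.append(rb)
            contig_quals.append(rq)
    return "".join(contig), contig_quals


# The reads disagree at positions 2 and 3; the higher score wins each time.
print(build_contig("AGGT", [40, 10, 35, 20], "ACTT", [30, 38, 12, 45]))
```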

It is important to realize that the algorithm utilized by BOLD-SDP to make these comparisons is not perfect, so students must scan through the contig to ensure that no errors were made. This task is simplified by examining the quality scores that appear over each nucleotide in the contig. These scores represent the algorithm’s confidence that the correct base call was chosen for inclusion in the contig. Low quality scores flagged with orange or red bars require human inspection.

To assemble contigs for your forward and reverse trace files, please follow the steps outlined below:
a. Log in to the Main Student Console by clicking on the Student Console icon on the BOLD-SDP home page.

b. On the Main Student Console page, click the <Add Sequence> icon.

c. In Section A of the Upload Sequence page, enter a Sample ID that corresponds to the trace files that you wish to assemble and edit. If you forgot the Sample ID of the specimen/sample, click the <Lookup> button to find its ID in your class record list.

d. In Section B of the Add Sequence page, click the <BOLD Sequence Editor> button to load the trace files associated with the specimen/sample into the BOLD-SDP Sequence Editor. If you are unable to assemble a contig using the BOLD-SDP Sequence Editor, refer to the last section of this document for guidance on how to proceed.

e. The BOLD-SDP Sequence Editor simplifies the editing process by automatically eliminating continuous stretches of low quality base calls from the contig. It is important to realize that although these base calls are not included in the contig, they are still displayed in the forward and reverse trace files in reduced opacity. The scroll bar at the bottom of the assembly project window allows you to examine the trace files and contig along their entire length (moving from 5’ to 3’).

In the BOLD-SDP Sequence Editor, start by scanning the entire length of the assembly to identify low quality bases, which are flagged with orange or red bars above the consensus sequence. Moving the mouse pointer over a base call in the trace files or consensus sequence will highlight the alignment position and display the base calls and associated quality scores/values at the top of the editor. Clicking on a base will expose the editing tool, which enables a base call to be revised, deleted, or made ambiguous. This operation is performed by selecting one of the six options in the pull-down menu (e.g. by selecting an <A>, <T>, <C>, <G>, <N>, or <->).

f. The first step in the editing process is to carefully inspect the color of the bars that appear over each nucleotide in the contig, starting from the 5’ end (left side). The bars are graphical representations of quality values, which are color-coded according to the legend that appears in the upper right-hand corner of the window. It is important to watch for quality scores < 20 (which are indicated by orange and red bars). For assembly projects performed with higher quality trace files, orange or red bars are likely to appear only at the beginning and end of each contig.

If you discover a red bar in the beginning (5’ end) of the contig, highlight the nucleotide that appears beneath it with your mouse. Notice that the corresponding base call in each trace file is also highlighted. Next, carefully inspect the quality scores for the corresponding base calls in both trace files. If the quality score for the base call is >20 in at least one trace file, the nucleotide in the contig can be regarded as correct. However, if the quality score for the base call in both trace files is < 20, then the corresponding nucleotide and all nucleotides that precede it (to the left) in the contig must be changed to an <N> using the Edit Base pull-down menu. BOLD-SDP will automatically eliminate these base calls during the final processing of your sequence.

Once this operation is performed, click and drag the scroll bar at the bottom of the project assembly window to scan for additional orange or red bars above the contig. If the quality of the trace files is high, then orange or red bars signifying low quality scores will only appear over nucleotides at the extreme 3’ end of the contig. If you discover a nucleotide with an orange or red bar in this region of the contig, click the nucleotide that appears beneath it with your mouse and follow the steps outlined above. If a nucleotide requires deletion, you must change that nucleotide and all others that follow it (to the right) in the contig to an <N> using the Edit Base pull-down menu.
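The manual editing rule for the ends of the contig can be summarized in a short Python sketch. This is our interpretation of the rule, not BOLD-SDP code: the function name and the end_window parameter (how many positions at each end are treated as "the end") are hypothetical, the reads are assumed to be aligned with the contig position by position, and a score of exactly 20 is treated as acceptable.

```python
def mask_low_quality_ends(contig, fwd_quals, rev_quals, threshold=20, end_window=50):
    """Replace unreliable bases at the 5' and 3' ends of a contig with N."""
    n = len(contig)
    if not (n == len(fwd_quals) == len(rev_quals)):
        raise ValueError("contig and quality lists must have equal length")
    # A position is unreliable only if BOTH reads score below the threshold.
    low = [f < threshold and r < threshold for f, r in zip(fwd_quals, rev_quals)]
    window = min(end_window, n // 2)

    # 5' end: mask the last unreliable position in the window and all before it.
    left = 0
    for i in range(window):
        if low[i]:
            left = i + 1

    # 3' end: mask the first unreliable position in the window and all after it.
    right = n
    for i in range(n - 1, n - window - 1, -1):
        if low[i]:
            right = i

    return "N" * left + contig[left:right] + "N" * (n - right)


fwd = [5, 25, 10, 40, 40, 40, 40, 40, 15, 8]
rev = [8, 10, 12, 40, 40, 40, 40, 30, 12, 30]
print(mask_low_quality_ends("ACGTACGTAC", fwd, rev, end_window=3))
```

Note that the last base in this example is masked even though one read scores 30 there, because it follows a position at which both reads scored below 20.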

Instructors should be aware that lower quality trace files invariably present a variety of challenging scenarios during sequence assembly and editing. For instance, low quality base calls will often occur at the same position in more central regions of the trace files. In these instances, deleting the corresponding nucleotide and all others that precede or follow it in the contig will result in a truncated sequence that does not meet the minimum barcode length of 500 nucleotides. A visual inspection of the peaks (and the environment surrounding the peaks) can sometimes provide insights into the likely identity of the base call, but these inspections are extremely subjective and require a modest degree of experience to interpret appropriately.

To simplify this segment of the barcoding workflow for students, sequence editing should only be performed on contigs assembled from two high quality trace files, or one high quality trace file and one medium quality trace file. The quality designation of each trace file can be found on the Trace Viewer page of BOLD-SDP as outlined above. In our experience, the engagement of students in editing contigs assembled from two medium quality trace files requires supervision by someone who is highly experienced in sequence analysis and editing procedures.
If you encounter difficulties in the sequence editing process, contact eBOL staff at the following URL for specific guidance:
http://www.educationandbarcoding.org/contact.php

Step 5. Inspecting contigs for the presence of STOP codons

COI is a mitochondrial gene that directs the production of a protein subunit vital for cellular respiration. All mitochondrial protein-coding genes terminate in a STOP codon, a nucleotide triplet that may take one of several forms depending on the taxon under study. During the process of transcription, the STOP codon of a protein-coding gene is transcribed into messenger RNA. At the conclusion of translation, the STOP codon binds a release factor, which signals the ribosome to dissociate and release the newly synthesized amino acid chain.
The ~650 bp region of the COI gene that you amplified by PCR is located upstream of the STOP codon found in the mitochondrial DNA template. Accordingly, STOP codons should be absent in your edited contig. The presence of a STOP codon indicates one of two likely possibilities: 1) a nucleotide was erroneously omitted in the contig, or 2) an extra nucleotide was erroneously included. Either possibility will require you to re-examine your contig for possible editing errors.
The BOLD-SDP Sequence Editor enables you to examine your sequence for the presence or absence of STOP codons. Because the COI barcode region that you amplified is also downstream of the START (ATG) codon found in the mitochondrial DNA template, the Auto Translator algorithm built into the BOLD-SDP Sequence Editor must first organize your contig into three reading frames. For reading frame 1, nucleotides are grouped into codons beginning with the first nucleotide in the contig. For reading frame 2, nucleotides are grouped into codons beginning with the second nucleotide in the contig (the first nucleotide is ignored). For reading frame 3, nucleotides are grouped into codons beginning with the third nucleotide in the contig (the first and second nucleotides are ignored). The translator then uses a translation matrix, similar to the genetic code table used in classroom settings, to determine the amino acid sequence of each reading frame. It then compares the three amino acid sequences to a database of known COI amino acid sequences to determine which reading frame is correct. The correct amino acid sequence is displayed at the bottom of the sequence editor project window.
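The grouping of a contig into three reading frames can be sketched in Python. This is not the Auto Translator itself: it does not translate codons or consult a COI database, and the function names are hypothetical. The default STOP codon set (TAA and TAG) is an assumption that fits the invertebrate mitochondrial code; the vertebrate mitochondrial code also uses AGA and AGG, so pass the set that suits your taxon.

```python
def split_into_codons(contig, frame):
    """Group a contig into codons for reading frame 1, 2, or 3.

    Frame 1 starts at the first nucleotide, frame 2 at the second,
    frame 3 at the third. Trailing nucleotides that do not fill a
    complete codon are dropped.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2, or 3")
    start = frame - 1
    usable = max(len(contig) - start, 0)
    end = start + usable - usable % 3
    return [contig[i:i + 3] for i in range(start, end, 3)]


def find_stop_codons(contig, stop_codons=("TAA", "TAG")):
    """Return {frame: [1-based start positions of STOP codons]}."""
    contig = contig.upper()
    stops = set(stop_codons)
    result = {}
    for frame in (1, 2, 3):
        codons = split_into_codons(contig, frame)
        result[frame] = [
            (frame - 1) + 3 * index + 1  # 1-based position in the contig
            for index, codon in enumerate(codons)
            if codon in stops
        ]
    return result


if __name__ == "__main__":
    print(split_into_codons("ATGGCCTAAGG", 2))
    print(find_stop_codons("ATGGCCTAAGG"))
```

A frame that returns an empty list is only a candidate for the correct reading frame; the sketch cannot tell which frame actually encodes COI.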

a. The single letter amino acid code for the translated sequence appears below the contig sequence. The line that appears above each amino acid identifies the corresponding codon in the contig sequence. To scan the sequence for the presence of a STOP codon (indicated by a red asterisk <*>), click and drag the scroll bar at the bottom of the window. If a STOP codon is found in the amino acid sequence, then an editing error was made. To find the source of the editing error, repeat the steps outlined in the section above.
b. If no STOP codons were detected in the amino acid translation of the contig, then click the <Save> button in the toolbar to enter the contig into the Add Sequence window.

c. Next, instruct BOLD-SDP to automatically trim primer sequences from your edited sequence by clicking the <Trim Primers> button in Section C.

d. Once BOLD-SDP performs the trimming function, the <Trim Primers> button will be replaced by the <Check for Contaminant> button. Clicking this button will instruct BOLD-SDP to inspect your sequence for the presence of common lab contaminants, including human contaminants.

e. Once the contaminant check is passed, click the <Submit Sequence> button to link the edited and validated barcode sequence to your specimen/sample. Be sure to select your name from the <Student Attribution> pull-down menu in the upper right-hand corner of the page before submitting your sequence.

Step 6. Verify that the edited COI sequence was incorporated into the specimen/sample record

The final step in the editing process is to verify that the edited sequence was integrated into the barcode record of the appropriate specimen/sample. To perform this function:
a. Navigate to the Main Student Console page of BOLD-SDP.

b. In the right sidebar of the Main Student Console page, click the <View Data> icon.

c. On the Record List page, locate the row for the specimen/sample that you linked to the recently uploaded sequence.

d. Click the Process ID link for the specimen/sample to open its sequence page in a new window.

The edited COI nucleotide sequence can be found in the Nucleotide Sequence pane along with associated data, including sequence length (in base pairs), sequence composition (e.g. the number of A’s, C’s, T’s, and G’s in the sequence), and the number of ambiguous characters or nucleotides (N’s).
The amino acid translation and total number of amino acid residues encoded by your COI nucleotide sequence are located in the lower left pane of the Sequence page in the Amino Acid Sequence pane.
The illustrative barcode that appears in the upper right-hand corner of the Sequence page represents each nucleotide in your barcode sequence as a different colored line. A’s are represented with green lines, T’s with red lines, C’s with blue lines, and G’s with black lines.

e. To compare the barcode sequence in your record with other barcode sequences in the BOLD species database, click the <Species DB> button that appears at the bottom of the Nucleotide Sequence pane.

A Specimen Identification Request window will open that contains different forms of information.

The Search Result pane that appears at the top of the page contains a summary statement of the search performed by the BOLD Identification System (BOLD-IDS), which is supported by the data displayed in other sections of the page.
The Identification Summary Table shows the probability (expressed as %) that the specimen/sample belongs to the taxonomic groups listed in the middle column.
The second table was generated by BOLD-IDS by comparing the COI nucleotide sequence of your specimen/sample with COI sequences of other specimens registered in the BOLD reference library. The percent nucleotide sequence similarity for the 20 closest matches is displayed along with the taxonomy for each match. Similar data for the top 100 matches is also displayed graphically on the page.
The world map at the bottom of the page shows the collection site of specimens with COI sequences that are >98% similar to the COI barcode sequence of your specimen/sample.
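To make the idea of percent nucleotide sequence similarity concrete, the following Python sketch computes a simple percent identity between two sequences that are already aligned to the same length. It is an illustration only and is not how BOLD-IDS computes its results; the function name and the choice to skip positions containing an N or a gap (-) are assumptions made for this example.

```python
def percent_identity(seq_a, seq_b):
    """Percent of comparable aligned positions at which two sequences match.

    Both sequences must already be aligned (equal length). Positions where
    either sequence has an ambiguous base (N) or a gap (-) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "N-" or b in "N-":
            continue  # not comparable
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


if __name__ == "__main__":
    print(percent_identity("ACGTACGT", "ACGTACGA"))
```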

Step 7. Review data in each specimen/sample record

The integration of trace files and edited COI sequences into a specimen/sample record completes the record assembly process in BOLD-SDP. If your project involves the creation of reference DNA barcode records, the information contained in each class record will be carefully reviewed by your instructor and submitted to scientific experts, who will review compliance with current barcode data standards. Required data standards for each specimen record minimally include:

A species name assigned by an expert (or a provisional name)
A unique specimen identifier
A retrievable voucher specimen and the name of the institution that is storing the voucher
A collection record containing the collector name, collection date, collection location, and geospatial coordinates
A COI sequence of at least 500 nucleotides in length with fewer than 1% ambiguous base calls (N’s)
The name of the PCR primers used to generate the COI amplicon
The unedited trace files
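The sequence-related items in this list (minimum length and proportion of ambiguous base calls) can be checked with a short Python sketch. The function name and the returned field names are hypothetical, and the sketch assumes that N's count toward the total sequence length, which the standard as stated above does not specify.

```python
from collections import Counter


def summarize_barcode(sequence, min_length=500, max_n_fraction=0.01):
    """Summarize a COI sequence and check it against the sequence standards.

    Assumption: N's are included in the sequence length.
    """
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    counts = Counter(seq)
    length = len(seq)
    n_count = counts.get("N", 0)
    n_fraction = n_count / length if length else 1.0

    return {
        "length": length,
        "composition": {base: counts.get(base, 0) for base in "ACGT"},
        "n_count": n_count,
        "meets_length": length >= min_length,          # at least 500 nucleotides
        "meets_ambiguity": n_fraction < max_n_fraction,  # fewer than 1% N's
    }


if __name__ == "__main__":
    report = summarize_barcode("ACGT" * 125)
    print(report)
```

The other required items (species name, voucher, collection record, primer names, and trace files) are record metadata and are outside the scope of this sketch.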

Student records that meet these standards are eligible for inclusion in the BOLD researcher workbench and for publication in GenBank. Maintained by the National Center for Biotechnology Information (NCBI), GenBank is a sequence database that contains an annotated collection of all publicly available nucleotide sequences and their protein translations.
The publication process generally takes 6 to 8 weeks. During this time, a GenBank accession number will be assigned to your record and appear on the Sequence page of your barcode record(s) as an inactive link. The accession number is a unique identifier assigned by GenBank to your COI barcode sequence. Once your data is published in GenBank, the accession number on the Sequence page becomes an active link to GenBank. Clicking the number will retrieve your barcode record from the GenBank database.

Notice that the record contains the BARCODE designation in the KEYWORD row. GenBank records with this special designation incorporate data that appears in your specimen records (e.g. taxonomy, collection event details, nucleotide sequence information, etc.). Furthermore, the AUTHOR row lists the names of the students who contributed to the creation of the record.

Step 8. Analyze Barcode Data

Regardless of whether your project involves building the BOLD reference barcode library or using the reference library to determine the identity of an unknown sample, the Sequence Analysis Console of BOLD-SDP contains a suite of informatics tools that enable you to visualize and analyze your barcode data in powerful and informative ways. Once you complete the assembly of barcode records, we encourage you to consult the BOLD-SDP User Manual for information on how to access and utilize these tools.

Using BOLD-IDS to Submit Identification Requests for Low Quality Sequence Data

If you were unable to assemble a contig using the BOLD-SDP Sequence Editor, you may nevertheless use the BOLD Identification System (BOLD-IDS) to determine the possible identity of the specimen/sample from which your sequence data were generated. Follow the steps below to submit identification requests through BOLD-SDP.
a. Navigate to the Main Student Console page of BOLD-SDP.
b. In the right sidebar of the Main Console page, click the <View Data> icon.

c. On the Record List page for your class, locate the row for the specimen/sample that is linked to the sequence that you plan to submit to BOLD-IDS. Then click the Process ID link for the specimen/sample to open its sequence page in a new window.

d. In the SEQUENCING RUNS pane of the Sequence page, use your mouse to check the boxes that appear next to each of your trace files and click the <View Trace Files> button.

e. Your trace files are now displayed in the Trace Viewer page along with their quality statistics (please be advised quality statistics are not computed for trace files that are designated as failed). Click the <View Sequence> button that appears beneath the window of the trace file with the higher mean quality score.

f. The text sequence is extracted from your trace file and displayed in a new window. Ambiguous base calls are represented by n’s. Highlight and copy the text sequence that appears in this window. Remember that the reverse trace file contains the nucleotide sequence of the antisense strand of your COI amplicon. For reverse trace files, BOLD-SDP extracts the text sequence and displays its reverse complement in the sequence window.

g. To submit your request, navigate to the home page of BOLD-SDP and click the <Identification> menu item that appears in the main toolbar.

h. Paste the text sequence that you copied into the text box on the Identification Request page.

i. From within the text box, highlight and delete ambiguous base calls (n's) on the 5' end of the sequence.

j. Next, highlight and delete ambiguous base calls (n's) on the 3' end of the sequence. Then click the <Submit> button to submit your identification request to BOLD-IDS.

k. A window containing the search results will open. Refer to Step 6 above for an explanation of the information displayed on this page. Because you submitted sequence data from only a single sequencing reaction, BOLD-IDS may be unable to return a conclusive species level match for your identification request.
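Two of the operations in the steps above, taking the reverse complement of a reverse read (step f) and removing ambiguous base calls from the 5' and 3' ends (steps i and j), can be sketched in Python. BOLD-SDP performs the reverse complement for you; the sketch only shows what that operation does. The function names are hypothetical.

```python
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Return the reverse complement of a nucleotide sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def trim_ambiguous_ends(sequence):
    """Remove runs of n/N from the 5' and 3' ends only.

    Ambiguous base calls inside the sequence are left in place.
    """
    return sequence.strip().strip("nN")


if __name__ == "__main__":
    print(reverse_complement("ATGCn"))
    print(trim_ambiguous_ends("nnNACGTnACnn"))
```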