Extended Analysis Tools
Distance Summary
It is desirable for barcodes to show very low sequence divergence within a species and significantly higher sequence divergence at higher taxonomic levels. The Distance Summary tool reports the sequence divergence between barcode sequences at the conspecific and congeneric levels.
Parameters
Various distance models and alignment algorithms are available as parameters, as well as options to filter out sequences based on sequence length or sequence issues.
Results
Comparisons are performed at the given taxonomic levels, and the frequency of each distance is plotted. One visualization is normalized by species to remove sampling bias. Details for the comparisons at the species, genus, and family levels are available through the links in the top right corner of the results page.
Sequence Composition
The frequency of DNA bases, and GC content in particular, can be a useful metric for evolutionary biologists. For example, GC content within the COI barcode region has been correlated with the GC content of the entire mitochondrial genome for many species.
Parameters
Various distance models and alignment algorithms are available as parameters, as well as options to filter out sequences based on sequence length or sequence issues. By default, GC percentages are calculated for the overall sequence composition as well as for codon positions 1, 2, and 3; any of these can be deselected if desired.
Results
The results page provides statistics on the frequency of each base (G, C, A, and T) in the selected records and can display histograms of GC content for all codon positions.
Barcode Gap Summary
The Barcode Gap Summary examines the distance to the nearest neighbour for each species in the list of selected specimens.
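As a rough illustration of the nearest-neighbour comparison, the sketch below computes, for each species, its maximum intraspecific distance and the distance to the closest sequence of any other species, then flags species whose nearest neighbour is less than 2% divergent or closer than the intraspecific distance. It assumes pre-aligned sequences and uses the uncorrected p-distance with gaps and ambiguity codes skipped; BOLD offers several distance models, and comparing against the *maximum* intraspecific distance is an assumption. The functions `p_distance` and `barcode_gap` are hypothetical and not part of BOLD.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def p_distance(a, b):
    """Uncorrected p-distance between two aligned sequences.

    Only positions where both sequences carry an unambiguous base (A, C, G, T)
    are compared; gaps and ambiguity codes are skipped. zip() stops at the
    shorter sequence, so inputs are assumed to be aligned to equal length.
    """
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID_BASES and y in VALID_BASES:
            compared += 1
            if x != y:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return diffs / compared


def barcode_gap(records, threshold=0.02):
    """Nearest-neighbour summary per species.

    `records` is a list of (species, aligned_sequence) pairs. Returns a dict
    species -> (max_intraspecific, nn_distance, nn_species, flagged). A species
    is flagged (an assumed rule) if its nearest neighbour is closer than
    `threshold` or closer than its own maximum intraspecific distance.
    """
    by_species = {}
    for species, seq in records:
        by_species.setdefault(species, []).append(seq)

    summary = {}
    for species, seqs in by_species.items():
        # Largest distance between two sequences of the same species (0 for singletons).
        max_intra = max((p_distance(a, b) for a, b in combinations(seqs, 2)), default=0.0)
        nn_distance, nn_species = float("inf"), None
        for other, other_seqs in by_species.items():
            if other == species:
                continue
            for a in seqs:
                for b in other_seqs:
                    d = p_distance(a, b)
                    if d < nn_distance:
                        nn_distance, nn_species = d, other
        flagged = nn_species is not None and (nn_distance < threshold or nn_distance < max_intra)
        summary[species] = (max_intra, nn_distance, nn_species, flagged)
    return summary
```

The same pairwise helper could also be used to collect the within-species and within-genus distance distributions that the Distance Summary reports; swapping in a different distance model would only change `p_distance`.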
Parameters
Various distance models and alignment algorithms are available as parameters, as well as options to filter out sequences based on sequence length or sequence issues.
Results
Distances are highlighted if the nearest neighbour is less than 2% divergent, or if the distance to the nearest neighbour is less than the intraspecific distance. Warnings presented by this tool can be summarized by clicking the link in the top right corner of the Barcode Gap results page.
Accumulation Curve
An accumulation curve of standardized DNA barcodes and related features provides a clear, transparent, and reproducible estimate of the diversity and sampling efficiency of areas or collections. This tool also allows users to quickly compare sampling efficiency across multiple regions and taxonomic levels.
Parameters
Each curve plots the number of species, genera, subfamilies, and/or BINs as a function of the number of samples. The Extra Info field can also be plotted, for example to graph morphotypes. Because the tool allows for multiple graphs, it can help a researcher determine which geographic regions are yielding fewer new groups (by creating multiple graphs by country, province, or region), or which taxonomic group is plateauing (by creating multiple graphs by phylum, class, order, family, or subfamily). The Extra Info field can also be used to investigate the efficiency of sampling protocols, progress in FAO regions, and so on. Sampling order can be randomized; for a large dataset, a higher degree of smoothing, obtained by running more iterations, may be optimal, although this takes longer to calculate. Order of submission can also be chosen to visualize the impact of sampling efforts.
Results
A steep slope indicates that a large fraction of the diversity remains to be discovered. A curve that flattens toward the right indicates that a reasonable number of samples have been collected and that more intensive sampling is likely to yield only a few additional groups.
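A minimal sketch of a randomized accumulation curve, assuming each sample has already been reduced to a single group label (a species name, BIN, or Extra Info value): each iteration shuffles the sampling order and records how many distinct groups have been seen after each sample, and the counts are averaged across iterations, so more iterations give a smoother curve at the cost of run time. A second function keeps the given order, analogous to plotting by order of submission. This illustrates the general technique and is not BOLD's implementation; both function names are hypothetical.

```python
import random


def accumulation_curve(labels, iterations=100, seed=None):
    """Mean number of distinct groups observed after 1, 2, ..., n samples.

    `labels` holds one group label per sample. Each iteration shuffles the
    sampling order; averaging over more iterations smooths the curve.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    rng = random.Random(seed)
    totals = [0] * len(labels)
    for _ in range(iterations):
        order = list(labels)
        rng.shuffle(order)
        seen = set()
        for i, label in enumerate(order):
            seen.add(label)
            totals[i] += len(seen)
    return [total / iterations for total in totals]


def submission_order_curve(labels):
    """Distinct groups observed after each sample, keeping the given order."""
    seen = set()
    curve = []
    for label in labels:
        seen.add(label)
        curve.append(len(seen))
    return curve
```

To get one curve per country or per order, the records could be grouped first and each group's labels passed to `accumulation_curve` separately.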
Alignment Browser
Managing sequence alignments and base calls is a critical step in any barcode analysis. To avoid the inconvenience of importing sequences into third-party software for analysis and editing, BOLD provides an integrated alignment browser that includes many features popular in other packages. Multiple alignment options, such as the MUSCLE and Kalign algorithms, as well as colourization options, are also available.
Sequence Editing
In the newest version of BOLD, the updated Alignment Browser supports editing sequences directly in the database. Users can select sequences or single bases, then right-click to see the editing options. Once editing is complete, the entire session can be submitted to upload the edited sequences to their records.
Parameters
Various distance models and alignment algorithms are available as parameters, as well as options to filter out sequences based on sequence length or sequence issues.
Diagnostic Characters
The Diagnostic Character analysis provides a means to examine nucleotide or amino acid polymorphism between sets of sequences that are grouped by taxonomic or geographic labels. More specifically, this tool identifies the consensus bases of each group, compares them with the bases in the sequences of the other groups, and then characterizes how unique each consensus base is. The purpose of this tool is to categorize consensus bases by their diagnostic potential, using the following categories:
Parameters
Because this tool performs the analysis only on the set of sequences selected by the user, the result is strongly affected by the initial data and the analysis parameters. Even the smallest change in the initial sequences, filtering options, or analysis parameters can change the consensus sequence of each group, and hence the diagnostic potential, from one analysis to the next. As a result, the interpretation of each analysis depends on all of these factors combined. In general, having more sequences per group provides a more accurate diagnosis of each group, as it reduces the problems caused by small sample sizes.
Algorithm
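As a simplified illustration of the consensus comparison described for this tool, and not BOLD's own algorithm, the sketch below computes each group's consensus base at every column of a nucleotide alignment, the fraction of the group's sequences that carry it, and the fraction of sequences in all other groups that share it. It assumes uppercase, equal-length aligned nucleotide sequences and ignores non-ACGT characters. The function names and the "fully distinct" criterion are hypothetical and do not correspond to BOLD's diagnostic categories.

```python
from collections import Counter

BASES = set("ACGT")


def consensus_uniqueness(groups):
    """Per-group, per-column consensus and how often it occurs outside the group.

    `groups` maps a group label (taxon or region) to a list of aligned
    nucleotide sequences of equal length. Each column yields
    (consensus_base, in_group_fraction, outside_fraction), counting only
    unambiguous bases. outside_fraction is None when no other group has a
    base at that column; a column with no bases in the group is None.
    """
    results = {}
    for label, seqs in groups.items():
        if not seqs:
            continue
        others = [s for other, ss in groups.items() if other != label for s in ss]
        columns = []
        for pos in range(len(seqs[0])):
            counts = Counter(s[pos] for s in seqs if s[pos] in BASES)
            if not counts:
                columns.append(None)
                continue
            # Ties go to the base encountered first.
            base, n = counts.most_common(1)[0]
            in_group = n / sum(counts.values())
            outside = [s[pos] for s in others if s[pos] in BASES]
            outside_fraction = (
                sum(b == base for b in outside) / len(outside) if outside else None
            )
            columns.append((base, in_group, outside_fraction))
        results[label] = columns
    return results


def fully_distinct_positions(results, label):
    """Columns where the whole group shares a consensus base that no other group has."""
    return [
        i for i, col in enumerate(results[label])
        if col is not None and col[1] == 1.0 and col[2] == 0.0
    ]
```

Because the consensus and both fractions are computed only from the sequences passed in, adding or removing a few sequences can change them, which is consistent with the sensitivity to the selected data noted above.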
BIN Discordance Report
The Barcode Index Number (BIN) module analyzes new COI sequences and assigns them to an existing or a new BIN. Please visit the BIN documentation for more details. Besides generating BIN pages, this system acts as a rapid check of the validity of the taxonomic designations on specimen records. The BIN Discordance Report facilitates this check by comparing the taxonomy of the selected records against all other records in the BINs they are associated with.
Results
The results are sorted by the degree of conflict, starting with records in BINs that have a phylum-level conflict (likely the result of cross-contamination) and proceeding down to species-level conflicts. Users can select and retrieve records from this page to examine ancillary data, comment, tag, or edit the taxonomy where there is a confirmed error. The report also lists records in BINs that contain no taxonomic discordance (see the Concordant BINs tab on the results page), as well as records in BINs that contain no other sequences (see the Singletons tab).
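A minimal sketch of how BINs might be ordered by degree of conflict, working at the BIN level rather than per record. It assumes each record is a dictionary of rank names and that the input holds all records in each BIN. A BIN's conflict level is taken to be the highest rank at which two records carry different names (missing names are ignored), and discordant BINs are listed from phylum down to species, alongside separate concordant and singleton lists. The rank list (which omits subfamily and other intermediate ranks), the record layout, and the placeholder BIN identifiers are assumptions for illustration only.

```python
# Ranks checked from highest to lowest; intermediate ranks such as subfamily
# are omitted for brevity (an assumption, not BOLD's exact rank list).
RANKS = ["phylum", "class", "order", "family", "genus", "species"]


def highest_conflict_rank(records):
    """Return the highest rank at which records carry different names, or None.

    Each record is a dict mapping rank -> name; missing or empty names are ignored.
    """
    for rank in RANKS:
        names = {rec.get(rank) for rec in records if rec.get(rank)}
        if len(names) > 1:
            return rank
    return None


def discordance_report(bins):
    """Split BINs into discordant (sorted phylum first), concordant and singleton lists.

    `bins` maps a hypothetical BIN identifier to the list of all records in that BIN.
    """
    discordant, concordant, singletons = [], [], []
    for bin_id, records in bins.items():
        if len(records) < 2:
            # No other sequences in this BIN to compare against.
            singletons.append(bin_id)
            continue
        rank = highest_conflict_rank(records)
        if rank is None:
            concordant.append(bin_id)
        else:
            discordant.append((bin_id, rank))
    # Most severe conflicts (highest ranks) first; the sort is stable within a rank.
    discordant.sort(key=lambda item: RANKS.index(item[1]))
    return discordant, concordant, singletons
```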