BOLD Identification Engine
The library of sequences collected in BOLD can be used to identify unknown sequences. The BOLD Identification Engine uses all sequences uploaded to BOLD, from both public and private projects, to locate the closest match. To ensure data security, sequences from private records are never exposed.
BOLD now provides the ability to submit a batch of query sequences for identification. This service is available for up to 100 sequences at a time for users signed into the system.
Users can choose to have identification results emailed to them, which allows identification requests to be run in parallel. This option is next to the Submit button. Upon submission of an ID Engine request, the system provides an estimated run time.
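Because batch identification is limited to 100 sequences per request, larger query sets need to be split before submission. The following is a minimal sketch of one way to do that locally, assuming the queries are held as FASTA text. The names `parse_fasta`, `batches`, and `MAX_BATCH_SIZE` are hypothetical helpers written for this illustration; they are not part of BOLD, and the sketch does not submit anything to the ID Engine.

```python
from typing import Iterator, List, Tuple

# Batch limit taken from the text above (100 sequences per request).
MAX_BATCH_SIZE = 100


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs.

    Hypothetical helper: sequence lines are concatenated, blank lines
    are skipped, and lines before the first '>' header are ignored.
    """
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def batches(
    records: List[Tuple[str, str]], size: int = MAX_BATCH_SIZE
) -> Iterator[List[Tuple[str, str]]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]
```

Each batch produced this way can then be submitted as a separate request, with the results emailed so that the requests run in parallel.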
The BOLD ID Engine accepts sequences from the 5’ region of the mitochondrial gene COI and returns a species-level identification (when possible). BOLD uses the BLAST algorithm to identify single-base indels before aligning the protein translation to a profile Hidden Markov Model of the COI protein. Four types of databases can be used to identify COI sequences. The BOLD ID Engine also provides historical copies of the COI databases dating back to 2009 so that results from previous years can be replicated. The Full-Length COI database is designed for use with short query sequences, as it provides maximum overlap in the barcode region of COI.
Fungal (ITS) and Plant (rbcL & matK) Identification
In the BOLD ID Engine, ITS is the default identification tool for fungal barcodes, and rbcL and matK are the defaults for plant barcodes. Both searches return a species-level identification (when possible). The BLAST algorithm is employed in place of BOLD’s internal identification engine for these sequences. BOLD holds relatively few fungal and plant sequences compared to animal sequences, so a successful species match may not be possible. As new sequences are added to the database, the number of successful matches should improve. These databases include many species represented by only one or two specimens, as well as all species with interim taxonomy. Both searches return a list of the nearest matches but do not provide a probability of placement to a taxon.
The results page for a typical animal sequence identification is illustrated below. For each sequence queried, an overview is provided describing the best match, with links to both the taxonomic page and the BIN cluster for the match, as well as a Taxon ID Tree placing the query sequence among 100 of the closest matches. The top matches listed in the table provide links to the public record where available. A map displays the collection locations of all the public records in the top 100 matches. For a batch of sequences queried, each result page is accessible via the accordion tabs in the page.
Taxonomy Browser
The Taxonomy Browser is a synthetic database that allows users to examine the progress of DNA barcoding by browsing through the different levels of the taxonomic hierarchy available on BOLD.
Within the Taxonomy Browser, users can select phyla in the Animal, Plant, Fungus, or Protist kingdoms to navigate from phylum to species level. Statistics on the progress of DNA barcoding at each taxon are generated from both public and private data while protecting private user-owned data. To look up a specific taxon directly, enter a taxonomic name into the search bar at the top of the Taxonomy Browser or on the BOLD Home page. The features on each taxon page are illustrated and described below.
Publication Database
The Publication Database contains details on publications that are relevant to the barcoding community and are submitted by users of the system. It is accessible without logging into BOLD. This database indexes title, abstract, year, and authors, allowing for broad searches. Expanding a publication from the results list will provide details on the publication, including a link to the article on the journal’s site, as illustrated below. A citation or set of citations can be downloaded from BOLD using the drop-down menu to the right of the search bar.
Users can submit bibliographies to this database by following the Bibliography Submission protocol. Once records are associated with a bibliography on BOLD, the article citation will appear everywhere the records appear in BOLD.
Primer Database
The Primer Database contains all the public primers available in BOLD. It can be accessed without a BOLD account. Using the search bar, users can enter terms that appear in the primer code, submitter, or reference fields. Selecting a primer from the database will provide details on the primer, including primer performance statistics derived from data submitted to BOLD, as illustrated below. A primer or set of selected primers can be downloaded in FASTA format using the Download Selected Primers button to the right of the search bar.
Primers that a user has previously registered in BOLD are available in the Primer Database when that user is signed in to BOLD, allowing private primers to be edited (i.e., made publicly available or given citation information). New primers must be registered from the User Console before trace files generated with them are submitted to records on BOLD following the Trace Submission protocol.
Public Data Portal
Searching the Public Data Portal
The BOLD Public Data Portal is a database of all of the public records on BOLD, including those in the early data release phase of the iBOL project, where information is still masked. This database can be used to access and download specimen data and sequences.
Public users can search the Public Data Portal using taxonomy, geography (country and state/province), and institution keywords, or by using Sample ID or BOLD Process ID to find individual records.
Any combination of keywords can be entered into the search bar. For example, searching "Lepidoptera Canada" will return all of the Lepidoptera records collected in Canada. Searching "Lepidoptera Canada -Ontario" will return the same results, but with the specimens collected in Ontario omitted.
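The include and exclude behaviour of the search examples above can be illustrated with a small local sketch. This is not BOLD's search implementation: `split_query` and `matches` are hypothetical functions, the record fields (`order`, `country`, `province`) are invented for the example, and the sketch assumes single-word, case-insensitive terms that must equal a field value exactly.

```python
from typing import Dict, List, Tuple


def split_query(query: str) -> Tuple[List[str], List[str]]:
    """Split a query into terms to include and terms to exclude.

    Assumption for this sketch: a term prefixed with '-' is an
    exclusion, as in the "-Ontario" example; all other terms are required.
    """
    include: List[str] = []
    exclude: List[str] = []
    for term in query.split():
        if term.startswith("-") and len(term) > 1:
            exclude.append(term[1:].lower())
        else:
            include.append(term.lower())
    return include, exclude


def matches(record: Dict[str, str], query: str) -> bool:
    """Return True if the record has every included term and no excluded term."""
    include, exclude = split_query(query)
    values = {value.lower() for value in record.values()}
    has_all = all(term in values for term in include)
    has_excluded = any(term in values for term in exclude)
    return has_all and not has_excluded


# Hypothetical records used only to demonstrate the filter.
ontario = {"order": "Lepidoptera", "country": "Canada", "province": "Ontario"}
quebec = {"order": "Lepidoptera", "country": "Canada", "province": "Quebec"}

kept = [r for r in (ontario, quebec) if matches(r, "Lepidoptera Canada -Ontario")]
```

In this sketch, `kept` contains only the Quebec record, mirroring how the exclusion term drops the specimens collected in Ontario.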
The search results will display a list of the public records that match the searched terms, as illustrated below. Toggling to "BINs" next to the search button will convert the list to all BINs available.
The record page gives information on the specimen identifier, taxonomy, specimen details, collection data (including collection site), sequence information, specimen image details, and attribution details. The figure below shows the details page for a particular record. A record page will reference a BIN when one is available and provides links to GenBank records.
Barcode Index Numbers (BINs)
The Barcode Index Number System is an online framework that clusters barcode sequences algorithmically, generating a web page for each cluster. Since clusters show high concordance with species, this system can be used to verify species identifications as well as document diversity when taxonomic information is lacking. This system consists of three parts:
The BIN framework can greatly expedite the evaluation and annotation of described species and putative new ones while reducing the need to generate interim names, a non-trivial issue in barcoding datasets. The BIN algorithm has been effectively tested on a broad set of taxonomic groups and shows potential for applications in species abundance studies and environmental barcoding. The registry employs modern URI and web service functionality enabling integration with other databases.
COI sequences over 500bp will be evaluated for inclusion into BINs if they meet the quality standards. Sequences over 300bp will be considered for membership into an existing BIN, but will not create or split BINs.
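The two length thresholds above can be summarised as a simple decision rule. The sketch below is illustrative only: `bin_eligibility` and its return labels are hypothetical, "over" is read as strictly greater than, and it assumes the quality standards apply to both length classes, which the text states explicitly only for sequences over 500bp.

```python
def bin_eligibility(length_bp: int, meets_quality_standards: bool) -> str:
    """Classify a COI sequence by how it may participate in BINs.

    Hypothetical labels:
      "full"        - evaluated for inclusion; may create or split BINs
      "member_only" - may join an existing BIN, cannot create or split one
      "none"        - not considered
    """
    # Assumption: failing the quality standards excludes any length class.
    if not meets_quality_standards:
        return "none"
    if length_bp > 500:
        return "full"
    if length_bp > 300:
        return "member_only"
    return "none"
```

For example, under these assumptions a 658bp sequence that meets the quality standards is classified as "full", while a 420bp sequence is "member_only".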
Ratnasingham S, Hebert PDN (2013) A DNA-Based Registry for All Animal Species: The Barcode Index Number (BIN) System. PLoS ONE 8(8): e66213. DOI:10.1371/journal.pone.0066213
BIN pages display aggregated data in several sections described and illustrated below.
Public Annotation on Databases
As the volume of barcode data being generated increases rapidly, the need for routine curation has become apparent. BOLD’s annotation and notification system supports rapid community-based validation of barcode data. Annotation can occur at the project level, at the record level, and on specific data elements, including taxonomy, images, and sequences on BIN pages. The Annotation System leverages BOLD’s large user base and expert knowledge for curation of both private data within collaborative projects and public data through the Public Data Portal. Tagging allows for categorization using custom and controlled tags, both of which can be used for filters, searches, and workflow management.
Comments and tags applied to data by BOLD users will appear in the Activity Report on the User Console and the Activity Report on the appropriate Project Console. Comments will persist on the data element with the user's full name and a date stamp. Tags can be removed at any time by any user.
Annotation is available wherever the Add Tags and Comments button appears within BOLD. Users must be signed in to BOLD to be able to add tags and comments.
The figure below illustrates the annotation window which allows for comments as well as the option to choose an existing tag or create a new tag.