Classifier. Each program takes one, two or three types as input and generates one type. For example
Sequence type and format
ID and optionally a description. The sequences should not be aligned. Gap characters (dashes) should be removed.
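The gap-removal requirement can be handled before submission. The following is a minimal sketch, not part of the platform: the function name strip_gaps is hypothetical, and it assumes that header lines start with ">" and that gaps are written as "-".

```python
def strip_gaps(fasta_text):
    """Return FASTA text with '-' gap characters removed from sequence lines."""
    cleaned = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # Header line (ID and optional description): kept unchanged.
            cleaned.append(line)
        else:
            # Sequence line: drop gap characters; skip lines left empty.
            sequence = line.replace("-", "")
            if sequence:
                cleaned.append(sequence)
    return "\n".join(cleaned) + "\n"


example = ">seq1 sample description\nAC-GT--A\n>seq2\n--TTGA\n"
print(strip_gaps(example), end="")
```

The sketch only removes dashes. It does not validate the alphabet or the uniqueness of sequence IDs.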
Procedure to import sequences
FASTA-formatted sequences can be entered directly into the text area. For large data sets, it is advisable to upload a file containing the sequences in FASTA format with the upload button. Uploads are limited to 200 MB of data.
Label type and format
Labels are the target features that the classifiers or models attempt to predict. In the training step, each reference sequence must be annotated with one label. Labels can be a taxonomic category (e.g., genus level) or extrinsic traits (e.g., geographic locations). Submitted label data must be formatted as follows: each line contains one pair
sequence ID - label separated by tabulations
;. Neither sequence IDs nor labels may contain these separators. Lines beginning with a
# will be omitted.
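A label file in this format can be checked locally before upload. The sketch below is an illustration only: the function name parse_labels is hypothetical, it assumes a tab as the separator (another separator can be passed through the sep argument), and it additionally assumes that blank lines can be skipped.

```python
def parse_labels(text, sep="\t"):
    """Parse 'sequence ID <sep> label' lines into a dict {sequence ID: label}."""
    labels = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Lines beginning with '#' are omitted; blank lines are skipped (assumption).
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split(sep)
        if len(parts) != 2:
            # Either the separator is missing, or an ID/label contains it.
            raise ValueError(f"line {line_no}: expected exactly one separator")
        seq_id, label = (part.strip() for part in parts)
        labels[seq_id] = label
    return labels


example = "# reference set\nseq1\tgenusA\nseq2\tgenusB\n"
print(parse_labels(example))
```

Raising an error on a line with more than one separator mirrors the rule that neither sequence IDs nor labels may contain the separator.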
Procedure to import labels
Formatted label data can be entered directly into the text area. For large data sets, it is advisable to upload a file in the same format with the upload button. Uploads are limited to 200 MB of data.
A classifier is a model trained and built with one of the training tools. Each classifier has a unique identifier (not to be confused with the job ID). The classifier ID can be found in the Classifier viewer.
Personal classifier IDs have a prefix (
are seven characters and begin with a
BM) and end with the name of the machine learning algorithm used, e.g.,
MD00EXAMPLE1_SVM. Shared classifier IDs in
PM prefix, e.g.,
Classifier files are a way to persist classification models. Once a classifier has been built by one of the training tools, the user can download the classifier file with the download button in the Classifier viewer. It is a compressed file (.tar.gz) containing several files, among them the training model file and a metadata JSON file. Users can upload a previously constructed classifier file (.tar.gz file) from their local machine in the Classifier viewer.
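A downloaded classifier file can be inspected locally with standard tools. The sketch below uses Python's tarfile and json modules. The function name read_classifier_metadata is hypothetical, and because the member names inside the archive are not documented here, it assumes the metadata is the first regular member whose name ends in ".json".

```python
import json
import tarfile


def read_classifier_metadata(archive_path):
    """Return (member names, parsed JSON metadata) for a .tar.gz classifier file."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        # Assumption: the metadata file is the first regular member ending in '.json'.
        json_members = [
            member for member in archive.getmembers()
            if member.isfile() and member.name.endswith(".json")
        ]
        if not json_members:
            raise ValueError("no JSON metadata file found in archive")
        with archive.extractfile(json_members[0]) as handle:
            metadata = json.load(handle)
    return names, metadata
```

Calling read_classifier_metadata with the path of a downloaded archive returns the list of contained files and the metadata as a Python object, without unpacking anything to disk.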
Procedure to load classifiers with Classifier viewer
The Classifier viewer can load a classifier:
- From a personal job folder: enter the classifier ID in the input area and press the corresponding button or the Enter key
- From the database: press the corresponding button to select a classifier from the database
- From a local file: upload a classifier file (created with the platform) and press the corresponding button
This is the principal application. It allows users to annotate viral sequences according to a chosen classifier. It also serves as an evaluation module for classifiers, using labeled test sets. The results are presented with enriched graphics and performance measures.
Procedure to classify sequences
Select and upload a suitable classifier for the classification task (see the procedure to load classifiers), then import sequences in FASTA format (see the procedure to import sequences). DNA sequences should not be labeled. After that, press the button. Select Evaluation mode to test a classifier with a set of labeled sequences. In that mode, labels should be embedded in the description of the sequences (>IDSeq label).
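For Evaluation mode, it can help to verify that every header of a test set carries a label. The sketch below is an illustration only: the function name read_labeled_fasta is hypothetical, and it assumes that, following the >IDSeq label pattern, the first whitespace in the header separates the sequence ID from the label.

```python
def read_labeled_fasta(fasta_text):
    """Return a list of (sequence ID, label, sequence) tuples from labeled FASTA text."""
    records = []
    seq_id = label = None
    chunks = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, label, "".join(chunks)))
            # Assumption: header is '>ID label', split at the first whitespace.
            header = line[1:].split(None, 1)
            if len(header) != 2:
                raise ValueError(f"header without a label: {line!r}")
            seq_id, label = header
            chunks = []
        elif seq_id is None:
            raise ValueError("sequence data found before the first header")
        else:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, label, "".join(chunks)))
    return records


example = ">s1 genusA\nACGT\nTT\n>s2 genusB\nGGCC\n"
for record in read_labeled_fasta(example):
    print(record)
```

A header without a label raises an error, so an unlabeled sequence is caught before the test set is submitted.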
The program allows a user to create and train new classifiers from a set of labeled DNA sequences. It provides default parameters as well as advanced options that let a user customize the classifier parameters. It can also be used to update the parameters or input sequences of an already built classifier. The constructed classifiers can be saved locally in an exportable file or published to the community via
Procedure to build classifier from new data
Procedure to build classifier from other one
It constructs improved classifiers. Unlike CASTOR-build, which lets the user define metrics, algorithms and feature selection models, it assesses all combinations of the classification parameters and provides the best-fitting classifier for the input data.
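The idea of assessing all combinations can be illustrated with a generic exhaustive search. This is a sketch of the general technique, not the platform's implementation: the function name best_combination, the parameter names, the placeholder values and the toy scoring function are all hypothetical, and a real score would come from training and evaluating a classifier for each combination.

```python
from itertools import product


def best_combination(grid, score):
    """Evaluate every combination in `grid` and return (best parameters, best score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Hypothetical parameter grid with placeholder values.
grid = {
    "metric": ["metric_a", "metric_b"],
    "algorithm": ["algo_a", "algo_b"],
    "feature_selection": ["fs_a", "fs_b"],
}


def toy_score(params):
    """Toy stand-in for a cross-validated performance measure."""
    return sum(value.endswith("_b") for value in params.values())


print(best_combination(grid, toy_score))
```

The number of evaluations is the product of the number of options per parameter, which is why an exhaustive search takes longer than building a single classifier with user-chosen settings.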
Procedure to build improved classifier from new data
Procedure to improve a built classifier
This is a public database of classifiers that allows the community to share its expertise and models. It facilitates the reproducibility of experiments and the refinement of models. A search engine and a classifier properties viewer are also provided. From the CASTOR-database interface, users can download, reuse, update and comment on the published classifiers.