What is CatANalyst

CatANalyst is a web service for predicting catalytic residues. Starting from a protein sequence or a protein tertiary structure, CatANalyst combines several features into a feature vector for each residue in the protein. The extracted features include conservation profiles, residue structural neighbourhood features, solvent accessibility, secondary structure and protein clefts, among others. A detailed description of the underlying predictor and its state-of-the-art performance on a set of benchmark datasets can be found here.

Based on this residue representation, a Support Vector Machine trained on thousands of catalytic and non-catalytic residue examples is queried to predict whether each residue in the input sequence is catalytic.

CatANalyst displays the protein sequence with each residue rendered in a colour temperature and size that reflect how likely it is to be catalytic.

Choose the predictor that best suits your input data. If you know only the primary structure, choose the sequence-based predictor; if you also know the tertiary structure, choose the structure-based one. The richer the input information, the more accurate the prediction.

Prediction from the sequence

The image below shows the input form of the sequence-based predictor. You can enter one or more protein sequences in FASTA format directly in the text area. Alternatively, you can upload a text file containing the FASTA sequences to the server. If you ask to be notified of the results by e-mail, you must specify a valid e-mail address; the prediction results will be sent to that address when ready. If you prefer a more interactive notification, select "interactively" and the results will be shown in the same web-browser window. In this second case, you can still specify your e-mail address to receive a notification when your prediction task is complete.
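Before pasting or uploading sequences, it can help to check locally that the text is well-formed FASTA. The following Python sketch is only an illustration and is not part of CatANalyst: the function names `parse_fasta` and `check_records` are hypothetical, and the set of accepted letters (the 20 standard amino acids plus X) is an assumption, since the characters the server actually accepts are not documented here.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Assumption: the 20 standard amino acids plus X for an unknown residue.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYX")


def check_records(records):
    """Return a list of problems found; an empty list means the input looks fine."""
    problems = []
    for header, seq in records:
        if not seq:
            problems.append(f"{header}: empty sequence")
        bad = sorted(set(seq) - ALLOWED)
        if bad:
            problems.append(f"{header}: unexpected characters {''.join(bad)}")
    return problems
```

Calling `check_records(parse_fasta(text))` on the contents of your file and getting an empty list suggests the input is at least syntactically valid under these assumptions.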

Example prediction from sequence.

When you click the "Submit" button, your request is enqueued on the CatANalyst server. Note that processing can take several minutes, depending on factors such as the CatANalyst server load and the length of the submitted protein or proteins.

If you asked for the results to be displayed interactively, you will see a page like the one displayed below:

Example of interactive results.

You can bookmark the page and check the results later when they become available. Alternatively, you can leave your browser window open: the page is automatically refreshed every 10 seconds and, once the results are available, they are displayed on the same page. If the page is not refreshed automatically, you can safely reload it manually to check the status of your job (queued, running or finished).
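The refresh behaviour described above amounts to polling the job status at a fixed interval until the job is finished. The Python sketch below illustrates that pattern only; it does not describe how the CatANalyst page is implemented. The function `wait_for_job` and its `get_status` argument are hypothetical: `get_status` stands for whatever callable you supply that returns one of the three states named in the text (queued, running or finished), and no CatANalyst URL or API is assumed.

```python
import time


def wait_for_job(get_status, interval=10.0, max_checks=360, sleep=time.sleep):
    """Poll get_status() until it returns 'finished' or max_checks is reached.

    Returns True if the job finished, False if the check limit was hit first.
    """
    for attempt in range(max_checks):
        status = get_status()
        if status == "finished":
            return True
        if status not in ("queued", "running"):
            raise ValueError(f"unexpected job status: {status!r}")
        # Wait before the next check, but not after the final one.
        if attempt < max_checks - 1:
            sleep(interval)
    return False
```

The default interval of 10 seconds mirrors the page refresh rate; the cap of 360 checks (about an hour) is an arbitrary choice for this sketch.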

Prediction from the structure

As with the sequence-based predictor above, you can choose to have the results displayed interactively or sent by e-mail. What differs for the structure-based predictor is the input data:

Example prediction from structure.

You can enter the PDB identifier of the protein and the chain you want to analyse. Click on the "Example" button for an example protein and chain.

Alternatively, you can upload a file in standard PDB format to the server. Note that this feature should be used only for structures that are not present in the Protein Data Bank (e.g. predicted structures). If the structure is already present in the Protein Data Bank, entering the PDB ID and chain directly may give a more accurate prediction.

Entering the protein chain is mandatory in the current version of the server, and it is particularly important if the PDB ID you specify refers to a multimeric molecule. If the chain is not specified in the PDB file, just enter A.
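If you are unsure which chains a PDB file contains, you can list them before filling in the form. In the standard PDB format, the chain identifier occupies column 22 of each ATOM and HETATM record. The Python sketch below relies on that fact; the function names `chain_ids` and `chain_for_form` are hypothetical and not part of CatANalyst.

```python
def chain_ids(pdb_text):
    """Return the chain identifiers found in ATOM/HETATM records, in order of appearance."""
    seen = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")) and len(line) > 21:
            chain = line[21]  # column 22 (index 21) holds the chain identifier
            if chain not in seen:
                seen.append(chain)
    return seen


def chain_for_form(pdb_text):
    """Return the chain letters you could enter in the form.

    If the file contains no named chain (column 22 is blank), fall back to "A",
    following the advice given above.
    """
    named = [c for c in chain_ids(pdb_text) if c.strip()]
    return named if named else ["A"]
```

For a multimeric structure, `chain_for_form` returns every named chain, and you pick the one you want to analyse.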

The results

The picture below shows an example result page:

Example of result visualisation.

The size of each residue reflects how likely it is to be catalytic, as does its colour temperature, which ranges from blue to red. Detailed predictions are reported for residues that CatANalyst classified as catalytic with a probability of at least 0.5. You can also change the probability threshold used to display the putative catalytic residues by selecting a different value in the small pop-up menu.
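The effect of the threshold and of the colour scale can be illustrated with a short Python sketch. It is not the CatANalyst rendering code: the function names are hypothetical, the linear blue-to-red ramp is an assumption about how a probability might map to a colour (with blue taken as the low end), and the tuple layout (position, residue, probability) is chosen only for this example.

```python
def select_catalytic(predictions, threshold=0.5):
    """Keep the (position, residue, probability) triples at or above the threshold."""
    return [p for p in predictions if p[2] >= threshold]


def blue_to_red(probability):
    """Map a probability in [0, 1] to a #rrggbb colour on a linear blue-to-red ramp.

    Assumption: 0 maps to pure blue and 1 to pure red.
    """
    p = min(max(probability, 0.0), 1.0)  # clamp out-of-range values
    red = round(255 * p)
    blue = 255 - red
    return f"#{red:02x}00{blue:02x}"
```

For example, with the made-up predictions `[(57, "H", 0.91), (102, "D", 0.5), (10, "G", 0.12)]`, the default threshold of 0.5 keeps the first two entries, while raising the threshold to 0.9 keeps only the first.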

Whenever the link "Catalytic Site Atlas (CSA) annotations" appears on the page, you can use it to view the annotations available for your protein in the Catalytic Site Atlas (CSA).

The entire set of predictions can be visualized by clicking on the link "below".