TUTORIAL FOR THE PKS/NRPS ANALYSIS WEB SERVER

This tutorial highlights the main features of this web server and gives you an overview of its output.

NRPS/PKS peptide sequence files are provided for use in this tutorial. Click on the link below, which opens in a separate window, then copy the sequence and paste it into the main page window to follow the example.

BacA.fasta

This server is simple to use. For example, paste the sequence of Bacitracin Peptide Synthetase I (BacA) into the main window and submit it.

This is a snapshot of the output for BacA.

Clickable Objects:

1. Clicking on the name of the protein you just analyzed returns the sequence of the entire protein in FASTA format.

2. Clicking on a domain returns its sequence and coordinates in FASTA format. For example, for the first A-domain:

       >A_DOMAIN_1 14..538
       KMTENEKELILHFNNTKTDYPKNKTLHELFEEQAMKTPDHTALVFGAQRMTYRELNEKAN
       QTARLLREKGIGRGSIAAIIADRSFEMIIGIIGILKAGGAYLPIDPETPKDRIAFMLSDT
       KAAVLLTQGKAADGIDCEADIVQLDREASDGFSKEPLSSVNDSGDTAYIIYTSGSTGTPK
       GVITPHYSVIRVVQNTNYIDITEDNVILQLSNYSFDGSVFDIFGALLNGASLVMIEKEAL
       LNINRLGSAINEEKVSVMFITTALFNMIADIHVDCLSNLRKILFGGERASIPHVRKVLNH
       VGRDKLIHVYGPTESTVYATYYFINEIDDEAETIPIGSPLANTSVLIMDEAGKLVPIGVP
       GELCIAGDGLSKGYLNREELTAEKFIPHPFIPGERLYKTGDLAKWLPDGNIEFIGRIDHQ
       VKIRGFRIELGEIESRLEMHEDINETIVTVREDEESRPYICAYITANREISLDELKGFLG
       EKLPEYMIPAYFVKLDKLPLTKNGKVDRKALPEPDRTAGAENEYE
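If you save such a domain record, a few lines of Python are enough to read it back. The sketch below is only an illustration: the function names are hypothetical, and it assumes that the header always has the form ">NAME START..END" and that the coordinates are 1-based and inclusive on the full protein.

```python
import re

def parse_domain_record(text):
    """Parse one FASTA record whose header looks like '>NAME START..END'."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    match = re.match(r">(\S+)\s+(\d+)\.\.(\d+)$", lines[0])
    if match is None:
        raise ValueError("unexpected header: " + lines[0])
    name = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    sequence = "".join(lines[1:])  # join the wrapped sequence lines
    return name, start, end, sequence

def slice_domain(protein, start, end):
    """Return residues start..end of protein (assumed 1-based, inclusive)."""
    return protein[start - 1:end]
```

Under these assumptions, the length of the returned sequence should equal END - START + 1, which is a quick way to check that a record was copied completely.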

3. Clicking on the icon shows the alignment of the A3-A6 region of the selected A-domain with the A3-A6 region of the phenylalanine-activating A-domain of GrsA. This alignment was used to extract the identity of the 8 amino acids (8AA) lining the binding pocket, and you can use it to check the computer's prediction, which is shown next to the icon. If you think the identity of these 8AA is wrong, you can rerun the predictive analysis: the old server is still available through the "PREDICTIVE BLAST SERVER" link at the bottom of the page, which opens the page you were already familiar with.

COMPUTER PREDICTION: D G F F L G V V as seen below.

BLAST ALIGNMENT FOR A-domain 1
Query: 2   AYIIYTSGSTGTPKGVITPHYSVIRV-VQNTNYIDITEDNVILQLSNYSFDGSVFDIFGA 60
           AY+IYTSG+TG PKG +  H  +  + V   N +++TE + I Q ++ SFD SV+++F A
Sbjct: 2   AYVIYTSGTTGNPKGTMLEHKGISNLKVFFENSLNVTEKDRIGQFASISFDASVWEMFMA 61

Query: 61  LLNGASLVMIEKEALLNINRLGSAINEEKVSVMFITTALFNMIADIHVDCLSNLRKILFG 120
           LL GASL +I K+ + +  +    IN+++++V+ +       +  +  + + +++ ++  
Sbjct: 62  LLTGASLYIILKDTINDFVKFEQYINQKEITVITLPPTY---VVHLDPERILSIQTLITA 118

Query: 121 GERASIPHVRKVLNHVGRDKLIHVYGPTESTVYATYYFINEIDDEAETIPIGSPLANTSV 180
           G   S   V K    V     I+ YGPTE+T+ AT +   + +    ++PIG+P+ NT +
Sbjct: 119 GSATSPSLVNKWKEKV---TYINAYGPTETTICATTWVATK-ETIGHSVPIGAPIQNTQI 174

Query: 181 LIMDEAGKLVPIGVPGELCIAGDGLSKGY 209
            I+DE  +L  +G  GELCI G+GL++GY
Sbjct: 175 YIVDENLQLKSVGEAGELCIGGEGLARGY 203
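To check a prediction by hand, you need to read off the query residue that sits opposite a given position of the reference (Sbjct) sequence. The sketch below shows one way to do this from alignment blocks like the ones above. The function name is hypothetical, and the positions used in the example are arbitrary: they are NOT the actual binding-pocket positions, which are not listed in this tutorial.

```python
def residues_at_reference_positions(blocks, reference_positions):
    """Read the query residues aligned to the given Sbjct positions.

    blocks: list of (query_row, sbjct_start, sbjct_row), one per alignment block.
    Returns one character per requested position: the query residue,
    '-' if the query has a gap there, '?' if the position is not covered.
    """
    wanted = set(reference_positions)
    found = {}
    for query_row, sbjct_start, sbjct_row in blocks:
        s_pos = sbjct_start
        for q_char, s_char in zip(query_row, sbjct_row):
            if s_char == "-":
                continue  # gap in the reference: no Sbjct position consumed
            if s_pos in wanted:
                found[s_pos] = q_char
            s_pos += 1
    return "".join(found.get(p, "?") for p in reference_positions)

# First block of the BacA alignment shown above (Sbjct starts at position 2).
first_block = (
    "AYIIYTSGSTGTPKGVITPHYSVIRV-VQNTNYIDITEDNVILQLSNYSFDGSVFDIFGA",
    2,
    "AYVIYTSGTTGNPKGTMLEHKGISNLKVFFENSLNVTEKDRIGQFASISFDASVWEMFMA",
)
# Arbitrary example positions, chosen only to show the lookup.
example = residues_at_reference_positions([first_block], [2, 3, 4])
```

Walking the alignment column by column, rather than indexing the rows directly, keeps the numbering correct when either sequence contains gaps.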

4. Clicking on the icon shows the complete results of the BLAST predictive server. This analysis is run automatically and only the top hit is reported, but you still have access to the entire list of hits:

Query= AD1
Database: database/eightball.txt 
                                                                   Score     E
Sequences producing significant alignments:                        (bits)  Value
gi|3046722|emb|CAA06325.1|LchAC-M1-Leu|lichenysin synthetase           21  0.020
gi|3080744|gb|AAD04759.1|LicC-M1-Ile/Leu/Val|lichenysin synthetase     21  0.020
gi|2982194|gb|AAC06346.1|BacA-M3-Ile|Bacitracin synthetase 1           21  0.020
gi|2982194|gb|AAC06346.1|BacA-M1-Ile|Bacitracin synthetase 1           21  0.020
gi|2982196|gb|AAC06348.1|BacC-M1-Ile|bacitracin synthetase 3           19  0.059
>gi|3046722|emb|CAA06325.1|LchAC-M1-Leu|lichenysin synthetase
          Length = 8
 Score = 20.8 bits (42), Expect = 0.020
 Identities = 8/8 (100%), Positives = 8/8 (100%)
Query: 1 DGFFLGVV 8
         DGFFLGVV
Sbjct: 1 DGFFLGVV 8
>gi|3080744|gb|AAD04759.1|LicC-M1-Ile/Leu/Val|lichenysin
         synthetase
          Length = 8
 Score = 20.8 bits (42), Expect = 0.020
 Identities = 8/8 (100%), Positives = 8/8 (100%)
Query: 1 DGFFLGVV 8
         DGFFLGVV
Sbjct: 1 DGFFLGVV 8
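The idea behind this lookup can be illustrated with a much simpler comparison than BLAST: count the identical positions between the predicted 8AA and each known signature, then sort. The sketch below is a stand-in for the server's BLAST search, not a description of it. The two DGFFLGVV entries are taken from the output above; the third entry is a made-up placeholder.

```python
def rank_signatures(query, database):
    """Rank known 8AA signatures by number of identical positions to the query.

    database maps a label to its signature. Returns (identities, label)
    pairs, best first; ties are ordered by label.
    """
    hits = []
    for label, signature in database.items():
        if len(signature) != len(query):
            continue  # only compare signatures of the same length
        identities = sum(q == s for q, s in zip(query, signature))
        hits.append((identities, label))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits

signatures = {
    "LchAC-M1-Leu": "DGFFLGVV",          # from the output above
    "LicC-M1-Ile/Leu/Val": "DGFFLGVV",   # from the output above
    "PLACEHOLDER-ENTRY": "AAAAAAAA",     # hypothetical, for illustration only
}
ranked = rank_signatures("DGFFLGVV", signatures)
```

Here the first two entries tie with 8 of 8 identities although their annotated substrates differ, which is why it is worth looking at the entire list of hits and not only at the top one.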

5. HMM all Hits/HMM parsed Hits

This server uses Hidden Markov Models (HMMs) to predict the identity of each domain in these multi-modular enzymes.

Hidden Markov Models are statistical representations of groups of proteins which share sequence similarity and, consequently, functional similarity.
HMMs can be built to represent very specific enzymatic functions, membership in a superfamily of related functions, or any level in between. The most useful HMMs are those built to represent specific functions. This is the type of HMM that was built for this analysis.

An HMM was built for each known domain from NRPS and PKS. Each HMM was tested against a larger set of proteins to determine the lowest score that still identifies the specific domain the HMM was built for. This score is called the "trusted" score, or cut-off. A specific cut-off has been determined for each HMM.
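One possible reading of this procedure is sketched below: the cut-off is the lowest score obtained by a genuine domain, provided that it still lies above every score obtained by an unrelated protein. This is an assumption about how such a cut-off could be chosen, not the server's documented method, and the function names and example cut-off are hypothetical.

```python
def trusted_cutoff(true_scores, false_scores):
    """Lowest score of a genuine domain, if it separates cleanly from the rest."""
    lowest_true = min(true_scores)
    highest_false = max(false_scores, default=float("-inf"))
    if lowest_true <= highest_false:
        raise ValueError("scores overlap; no clean cut-off exists")
    return lowest_true

def apply_cutoffs(hits, cutoffs):
    """Keep only the (model, score) hits that reach their model's cut-off."""
    return [(model, score) for model, score in hits if score >= cutoffs[model]]
```

For example, with genuine A-domain scores of 357.6, 327.4 and 347.4 and unrelated scores of 12.0 and 40.5, trusted_cutoff returns 327.4, and apply_cutoffs with that value would discard an A_DOMAIN hit scoring 20.0.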

Clicking on the "HMM All Hits" link shows all the HMM hits before these cut-off scores are applied. For example, this is what you get for BacA.fasta:

LIST OF PARSE HMMs HITs for 
gi|2982194|gb|AAC06346.1| bacitracin synthetase 1; BacA [Bacillus licheniformis]
A_DOMAIN    1/5     179   388 ..     1   228 []   357.6 2.9e-107
T_DOMAIN    1/5     546   610 ..     1    68 []    79.1  2.8e-24
Cy_DOMAIN   1/1     629  1060 ..     1   450 []   902.9 2.1e-271
A_DOMAIN    2/5    1217  1427 ..     1   228 []   327.4  3.7e-98
T_DOMAIN    2/5    1587  1651 ..     1    68 []    56.4  6.6e-18
C_DOMAIN    1/3    1669  2099 ..     1   455 []   394.1 3.1e-118
A_DOMAIN    3/5    2268  2465 ..     1   228 []   347.4 3.3e-104
T_DOMAIN    3/5    2623  2687 ..     1    68 []    77.5  7.6e-24
C_DOMAIN    2/3    2703  3123 ..     1   455 []   450.3 3.6e-135
A_DOMAIN    4/5    3290  3506 ..     1   228 []   350.8 3.2e-105
T_DOMAIN    4/5    3666  3729 ..     1    68 []    52.1  1.1e-16
E_DOMAIN    1/1    3742  4197 ..     1   481 []   827.8 8.2e-249
C_DOMAIN    3/3    4207  4640 ..     1   455 []   238.3  2.4e-71
A_DOMAIN    5/5    4806  5015 ..     1   228 []   363.4 5.2e-109
T_DOMAIN    5/5    5173  5237 ..     1    68 []    56.1  8.3e-18

Take, for example, the following line:

A_DOMAIN       3/5 2268 2465 .. 1 228 [] 347.4 3.3e-104

A_DOMAIN is the name of the HMM for A-domains.

3/5 means that it is the 3rd of five hits for A-domains.

2268 2465 are the coordinates of the hit on BacA. The A-domain HMM was built using only the A3-A6 region, so the hit covers only that region.

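.. and [] describe the ends of the alignment, following the usual HMMER convention. The pair after the protein coordinates (..) means that the hit lies inside the protein and reaches neither its first nor its last residue. The pair after the HMM coordinates (1 228 []) means that the hit covers the HMM from its first to its last position, so the match is good at both the N-terminus and the C-terminus. If a part of the HMM is missing, the corresponding bracket is replaced by a dot, for example:

.] which indicates that the N-terminal part of the HMM is missing or had a very low score, but the C-terminal part is present with a high score.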

347.4 is the actual HMM score. It is above the trusted cut-off, which is why the hit appears in the parsed HMM hits list.

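3.3e-104 is the E-value, that is, the number of hits with a score this good that would be expected by chance. The smaller the value, the more significant the hit. In this case it is really good.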
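If you copy the hit list into a script, each line can be split into the fields just described. The sketch below assumes that every line has exactly ten whitespace-separated columns, as in the list above; the field names are hypothetical labels chosen for this example.

```python
from typing import NamedTuple

class DomainHit(NamedTuple):
    model: str      # name of the HMM, e.g. A_DOMAIN
    index: int      # rank of this hit for the model (the 3 in 3/5)
    total: int      # number of hits for the model (the 5 in 3/5)
    seq_from: int   # start coordinate on the protein
    seq_to: int     # end coordinate on the protein
    seq_ends: str   # two-character flag after the protein coordinates
    hmm_from: int   # start coordinate on the HMM
    hmm_to: int     # end coordinate on the HMM
    hmm_ends: str   # two-character flag after the HMM coordinates
    score: float
    evalue: float

def parse_hit_line(line):
    """Split one line of the hit list into named fields."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError("expected 10 columns, got %d" % len(fields))
    index, total = fields[1].split("/")
    return DomainHit(
        fields[0], int(index), int(total),
        int(fields[2]), int(fields[3]), fields[4],
        int(fields[5]), int(fields[6]), fields[7],
        float(fields[8]), float(fields[9]),
    )

hit = parse_hit_line("A_DOMAIN       3/5 2268 2465 .. 1 228 [] 347.4 3.3e-104")
```

With the example line, hit.seq_from is 2268, hit.seq_to is 2465 and hit.score is 347.4.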

In this list, the coordinates of each domain let you see very quickly whether there are regions where the analysis did not predict a domain, or where there may be a domain with a novel function.

This check is also done automatically: a gap of more than 450 aa between two domains is flagged as a possible unknown domain. The threshold is arbitrary, but it serves well in the testing phase. When an unknown domain is found, an icon appears in the top part of the output screen.

In addition, a message will appear at the bottom of the screen indicating the possibility of an unrecognized domain and its coordinates on the protein.
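The same check is easy to reproduce on a list of domain coordinates. The sketch below is an assumption about how the test could work, not the server's code: it measures the gap as the number of residues strictly between the end of one domain and the start of the next, and it ignores the regions before the first domain and after the last one.

```python
def find_gaps(domains, min_gap=450):
    """Return (start, end) of every gap between domains longer than min_gap aa.

    domains: list of (name, start, end) tuples with 1-based inclusive coordinates.
    """
    ordered = sorted(domains, key=lambda domain: domain[1])
    gaps = []
    for (_, _, prev_end), (_, next_start, _) in zip(ordered, ordered[1:]):
        gap_start, gap_end = prev_end + 1, next_start - 1
        if gap_end - gap_start + 1 > min_gap:
            gaps.append((gap_start, gap_end))
    return gaps

# First three parsed hits for BacA, taken from the list above.
baca_start = [
    ("A_DOMAIN", 179, 388),
    ("T_DOMAIN", 546, 610),
    ("Cy_DOMAIN", 629, 1060),
]
baca_gaps = find_gaps(baca_start)
```

For these three BacA domains the gaps are far shorter than 450 aa, so nothing is reported. A hypothetical protein with domains at 179-388 and 1000-1064 would be flagged with the gap 389-999.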