Documentation

The following documentation is intended to help users of the K-PAM web server. The K-PAM web server has three major modules: (i) serotype prediction (K-type and O-type), (ii) a 3-dimensional structural depository of surface antigens (75 K-antigens and 11 O-antigen associated LPS) and (iii) a K-antigen modeler, along with an additional module, (iv) hypervirulent strain identification. All the modules are described in detail under individual sections.

Construction of local database

A local database comprising Klebsiella spp. wza, wzb, wzc, wzi, wzx, wzy, wbap and wcaj gene and protein sequences (whose K-types are already known) has been constructed by fetching the FASTA format sequences from NCBI or the Kaptive web server and grouping them according to 141 distinct K-serotypes. It is noteworthy that the newly defined K-types, namely KN1-KN3, KL103-KL128, KL130-KL153, KL155 and KL157-KL165, whose phenotypes are not yet defined, have also been incorporated in the database. Thus, the database consists of 1345 non-redundant gene sequences and 1095 non-redundant protein sequences corresponding to 8 genes in the CPS locus.

The genes that lack a reference sequence for particular K-types (given in brackets) are: wzi (K33 & K40), wzb (K50), wzc (K50), wzx (K34, K50, KL107 & KL127) and wzy (K29, K50, K71, KL107, KL108, KL109, KL110, KL113, KL116, KL118, KL126, KL147, KL148, KL149, KL150, KL152, KL153, KL157, KL158 & KL161). Note that wbap and wcaj are mutually exclusive genes.
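The gaps listed above can be kept as a small lookup table, which is handy when deciding which genes are worth submitting for a suspected K-type. The following is only a sketch: the dictionary values are copied from the list above, while the names `MISSING_K_TYPES` and `genes_without_reference` are hypothetical and not part of K-PAM.

```python
# K-types for which the reference database has no sequence, per gene
# (values copied from the list in the documentation above).
MISSING_K_TYPES = {
    "wzi": {"K33", "K40"},
    "wzb": {"K50"},
    "wzc": {"K50"},
    "wzx": {"K34", "K50", "KL107", "KL127"},
    "wzy": {
        "K29", "K50", "K71", "KL107", "KL108", "KL109", "KL110",
        "KL113", "KL116", "KL118", "KL126", "KL147", "KL148", "KL149",
        "KL150", "KL152", "KL153", "KL157", "KL158", "KL161",
    },
}


def genes_without_reference(k_type):
    """Return the sorted list of genes that lack a reference sequence for k_type."""
    return sorted(
        gene for gene, missing in MISSING_K_TYPES.items() if k_type in missing
    )


# K50 has no wzb, wzc, wzx or wzy reference sequence in the database.
print(genes_without_reference("K50"))
```

A K-type that does not appear in the table returns an empty list, meaning none of the gaps listed above applies to it.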

Similarly, the database also has a total of 13 distinct O-antigen serotypes including 4 sequences that belong to Klebsiella O-antigen types OL101-OL104. Further, to perform O-antigen prediction, 43 wzm and 44 wzt gene sequences and 22 Wzm and 23 Wzt protein sequences that are classified according to their O-antigen type have been incorporated in the database.

The complete list of GenBank accession numbers corresponding to the CPS locus genes and to wzm & wzt is provided here:

K-antigen typing

K-PAM provides the freedom of using a single gene or multiple genes for K-typing. The user can either specify the gene type or let the server identify it. Thus, a single gene, an entire CPS locus or a whole genome sequence can be given as input for prediction. Gene or protein sequences (FASTA format) corresponding to one or more CPS genes (wza, wzb, wzc, wzi, wzx, wzy, wbap and wcaj) can be used as input for K-type prediction. The user can also upload the whole genome sequence in either FASTA or FASTQ format as a single file or as multiple files. Additionally, the user has the option of directly specifying the GenBank ID. K-PAM offers three options for performing K-type prediction under the single or multiple gene options: a) a nucleotide sequence searched against nucleotide sequences in the reference database (NN), b) a protein sequence searched against protein sequences in the reference database (PP) and c) a nucleotide sequence searched against protein sequences in the reference database (NP).
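The three search options differ only in the type of the query and of the reference sequences. As a rough illustration, the sketch below lists which options are applicable to a given query. The alphabet check is a simple heuristic chosen for this example (a short protein made only of A, C, G, T or N residues would be misread as nucleotide), and the function names are hypothetical; this is not how the K-PAM server itself identifies the input.

```python
# (query type, reference type) for each search option described above.
SEARCH_MODES = {
    "NN": ("nucleotide", "nucleotide"),
    "PP": ("protein", "protein"),
    "NP": ("nucleotide", "protein"),
}

NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_sequence_type(sequence):
    """Heuristic: a sequence using only A/C/G/T/U/N letters is treated as nucleotide."""
    letters = {ch for ch in sequence.upper() if ch.isalpha()}
    if not letters:
        raise ValueError("sequence contains no letters")
    return "nucleotide" if letters <= NUCLEOTIDE_LETTERS else "protein"


def compatible_modes(sequence):
    """Return the search options whose query type matches the sequence."""
    kind = guess_sequence_type(sequence)
    return [mode for mode, (query, _) in SEARCH_MODES.items() if query == kind]


print(compatible_modes("ATGCGTAA"))   # nucleotide query: NN and NP apply
print(compatible_modes("MKKLLVAG"))   # protein query: only PP applies
```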

Input for K-antigen typing

Query sequence information can be given to the server in any of the following ways:

NCBI GenBank ID (nucleotide ID or protein ID) associated with the sequence
Manual entry of the nucleotide or protein sequence in FASTA format
Upload of a file containing the query sequence
Multiple files as query (for the stand-alone version)
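Before pasting or uploading a sequence, it can help to check that the text is well-formed FASTA. The parser below is a minimal sketch written for this documentation; `parse_fasta` is a hypothetical helper, not a K-PAM function, and it handles only plain FASTA (not FASTQ).

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = ">query_1 example header\nATGAAA\nCTGTAA\n>query_2\nATGGGC\n"
for name, seq in parse_fasta(example):
    print(name, len(seq))
```

Each record is returned with its wrapped sequence lines joined, so a query with multiple coding regions yields one tuple per header.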

Sample input sequences for manual entry

Nucleotide input query sequence with multiple coding regions or as a complete FASTA sequence (input for NN/NP options): sample_nn.fasta (OR) sample_nn_unannotated.fasta
Protein input query sequence with multiple coding regions (input for PP option): sample_pp.fasta

Note: If the genomic DNA and plasmid DNA sequences of a strain are stored in a single file, use the stand-alone version of K-PAM for serotype prediction, or else store the sequences in individual files before submitting them to the web GUI of K-PAM.
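One way to follow the note above is to split a multi-record FASTA file into one file per record before using the web interface. The sketch below assumes plain FASTA input; the function name `split_fasta` and the `record_N.fasta` output naming are choices made for this example, not K-PAM conventions.

```python
from pathlib import Path


def split_fasta(source_path, out_dir):
    """Write each record of a multi-FASTA file to its own file.

    Returns the list of paths written, in the order the records appear.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    current = None
    with open(source_path) as src:
        for line in src:
            if line.startswith(">"):
                # A header line starts a new record and a new output file.
                if current is not None:
                    current.close()
                path = out_dir / f"record_{len(written) + 1}.fasta"
                written.append(path)
                current = open(path, "w")
            if current is not None:
                current.write(line)
    if current is not None:
        current.close()
    return written
```

For a file holding a chromosome record followed by a plasmid record, `split_fasta("strain.fasta", "split")` would write `split/record_1.fasta` and `split/record_2.fasta`, each of which can then be submitted separately.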

An example of filling in the form on the Serotype predictor subpage, for serotyping with a single gene sequence or with a query sequence containing multiple coding regions, is given below.

Output for K-antigen typing with single gene query

Some test cases corresponding to clinically important Klebsiella species whose serotypes are undefined are used to demonstrate the serotype prediction method using a single protein coding region. The outputs are listed in the file: Table_single_prot.pdf

Use of more than one protein coding region increased the prediction accuracy

The figure above is a schematic illustration of K-typing using both Wzi & Wza. Protein query sequences corresponding to both Wzi and Wza (NCBI accession ID: SCA96233.1 and SCA96234.1 respectively) are aligned against the respective reference sequences stored in the local database. The common outcome from both the alignments is considered as the predicted K-type for the input query sequences.
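The idea of taking the common outcome of several alignments can be sketched as a set intersection. This is only an illustration of the principle stated above, not the K-PAM implementation: the function name is hypothetical, and the candidate K-types in the example are made-up values rather than real server output.

```python
def common_k_types(candidates_per_gene):
    """Return the K-types proposed by every gene's alignment, sorted by name.

    candidates_per_gene maps a gene or protein name to the K-types suggested
    by its alignment against the reference sequences.
    """
    candidate_sets = [set(c) for c in candidates_per_gene.values()]
    if not candidate_sets:
        return []
    return sorted(set.intersection(*candidate_sets))


# Hypothetical candidates, for illustration only.
hits = {
    "Wzi": ["K15", "K17"],
    "Wza": ["K15", "K52"],
}
print(common_k_types(hits))  # only K15 is supported by both alignments
```

An empty result means the alignments do not agree on any K-type, in which case no common prediction can be drawn from them in this simplified scheme.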

A few similar examples illustrating the importance of K-type prediction by the combined use of Wzi and Wzc sequences (highlighted in green) are shown in the table below.

**Use of only eight cps protein coding regions in K-typing**

Output for K-antigen typing with a query sequence with multiple coding regions

The following serotype prediction example considers multiple protein coding regions in the query sequence (NCBI accession number LT174536.1). An input box is provided for submitting either an NCBI accession ID or a query sequence, along with options to choose the gene/protein type (or “all”, if the gene information is unknown) and the type of alignment (NN, PP or NP). The predicted serotype (here, K15) is shown on the result page. The proteins that are present in the query sequence and used in serotyping are given in the color-coded box. The user can get the structural/chemical details of the antigen by clicking the serotype button. The result page also shows a graph representing the reliability of the serotype prediction with respect to the individual protein coding sequences, and stacked tables summarizing the prediction with respect to the individual gene sequences.

Sample examples for serotyping with single gene sequences and different modes of input, as provided on the Serotype predictor subpage, are given below.

**K-PAM: Klebsiella species serotype predictor and surface antigens modeler**

Documentation

Serotype prediction

Construction of local database

K-antigen typing

Input for K-antigen typing

Output for K-antigen typing with single gene query

Use of more than one protein coding region increased the prediction accuracy

**Use of only eight cps protein coding regions in K-typing**

Hypervirulent strain identification

Stand alone version of K-PAM

3-dimensional structural database

Nomenclature used in chemical representation of monosaccharides in K-PAM

Sugar name: α-L-Fucp
Position: 1-2-3456

Schematic representation

K-antigen modeler

Antigen modeling Methodology

Four letter code

K-PAM Format converter:
