Search for genes/targets

Use this form to search for proteins/targets using any combination of the following criteria. All selected criteria are evaluated as the intersection (boolean AND) of their respective result sets, so a target is returned only if it satisfies every selected criterion. You can read more about this search page in the User's Manual.
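
As an illustration only (not the TDR Targets implementation), the AND logic can be sketched with Python sets. The gene IDs and per-criterion hit sets below are hypothetical, and the sketch assumes that selecting no criteria leaves every target in the result.

```python
def combine_and(all_targets: set[str], criteria_results: list[set[str]]) -> set[str]:
    """Intersect the hit sets of every selected criterion (boolean AND).

    Assumption: with no criteria selected, every target passes.
    """
    result = set(all_targets)
    for hits in criteria_results:
        result &= hits  # keep only targets that also satisfy this criterion
    return result


# Hypothetical gene IDs and per-criterion hit sets, for illustration only
all_targets = {"gene_1", "gene_2", "gene_3", "gene_4", "gene_5"}
by_ec_number = {"gene_1", "gene_2", "gene_3"}
by_tm_spans = {"gene_2", "gene_3", "gene_4"}
by_pdb_structure = {"gene_3", "gene_5"}

print(combine_and(all_targets, [by_ec_number, by_tm_spans, by_pdb_structure]))  # {'gene_3'}
```

Each extra criterion can only shrink the result, which is why combining many filters often returns few targets.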

Filter targets based on:

EC number
Gene Ontology
Pfam / Interpro domains
GO Slim Category
KEGG high-level pathway  
KEGG detailed pathway    
Protein length (AA)  
Molecular weight  
Isoelectric point
# of transmembrane (TM) spans
Signal peptide
GPI Anchor  
Retrieve targets with three-dimensional data from:
Crystal structures (PDB)
Structural models (from ModBase)
Number of models (at least)
Expression of M. tuberculosis
Expression of P. falciparum  

Notes on available datasets:

murphy: Identification of gene targets against dormant phase Mycobacterium tuberculosis infections. PubMed
hasan: Prioritizing genomic drug targets in pathogens: application to Mycobacterium tuberculosis. PubMed

More information about this search:

Expression evidence was collected from several microarray experiments and combined in a simplified scheme. Genes were grouped into five categories according to how strongly they were upregulated in the selected life cycle stage: 0-20%, 20-40%, 40-60%, 60-80%, and 80-100%. The 0-20% category contains the bottom 20% of genes (those showing the least upregulation in that life cycle stage), and the 80-100% category contains the top 20% of genes (those showing the most upregulation).
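
The grouping above amounts to ranking genes and splitting them into quintiles. The sketch below is a minimal illustration under stated assumptions: it presumes one combined upregulation score per gene (the page does not say how the microarray experiments were merged), uses hypothetical gene names and scores, and breaks ties arbitrarily by sort order.

```python
def percentile_bins(upregulation: dict[str, float]) -> dict[str, str]:
    """Assign each gene to one of five equal-sized bins by the rank of its upregulation score."""
    labels = ["0-20%", "20-40%", "40-60%", "60-80%", "80-100%"]
    ranked = sorted(upregulation, key=upregulation.get)  # least upregulated first
    n = len(ranked)
    # rank * 5 // n maps ranks 0..n-1 onto bin indices 0..4
    return {gene: labels[rank * 5 // n] for rank, gene in enumerate(ranked)}


# Hypothetical upregulation scores for ten genes in one life cycle stage
scores = {f"gene_{i}": float(i) for i in range(1, 11)}
bins = percentile_bins(scores)
print(bins["gene_1"], bins["gene_5"], bins["gene_10"])  # 0-20% 40-60% 80-100%
```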

More detailed (fine-grained) queries on gene expression can be performed with the original datasets at their respective sources (for example, PlasmoDB).

Restrict to targets with orthologs (present/absent) in:

A. thaliana
B. bovis
B. malayi
C. albicans
C. elegans
C. hominis
C. parvum
C. trachomatis
D. discoideum
D. melanogaster
E. coli
E. granulosus
E. histolytica
E. multilocularis
G. lamblia
H. sapiens
L. braziliensis
L. donovani
L. infantum
L. major
L. mexicana
L. loa (eye worm)
M. leprae
M. musculus
M. tuberculosis
M. ulcerans
N. caninum
O. sativa
P. berghei
P. falciparum
P. knowlesi
P. vivax
P. yoelii
S. cerevisiae
S. japonicum
S. mansoni
S. mediterranea
T. brucei gambiense
T. brucei
T. congolense
T. cruzi
T. gondii
T. pallidum
T. parva
T. vaginalis
W. endosymbiont of Brugia malayi
Number of paralogs

Retrieve targets for which genome-wide information about their essentiality is available.
If genome-wide information for an organism is not available, you can evaluate the essentiality of the corresponding orthologs by selecting a different species from the options below. Also note that essential genes for your organism of interest may have been identified during manual curation (see the Validation data search option further down).

Any evidence of essentiality in any species


Select the species and the type of 'essential' phenotype from the options below. Note that if you select more than one option, the resulting set of genes will be the union (boolean OR) of the selections.

C. elegans

E. coli

M. tuberculosis

P. berghei

S. cerevisiae

T. brucei
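
The OR behaviour of the essentiality options can be sketched as a set union. This is a hypothetical illustration: the (species, phenotype) keys, phenotype labels, and gene IDs are invented, and the last step assumes that the essentiality result as a whole is then intersected with the other selected criteria, consistent with the AND rule stated at the top of the form.

```python
# Hypothetical essentiality evidence: (species, phenotype) -> ortholog-mapped gene IDs
essentiality = {
    ("C. elegans", "lethal"): {"gene_1", "gene_2"},
    ("S. cerevisiae", "inviable"): {"gene_2", "gene_3"},
    ("T. brucei", "slow growth"): {"gene_4"},
}


def essential_union(selected_options: list[tuple[str, str]]) -> set[str]:
    """Union (boolean OR) of the gene sets for every selected option."""
    result: set[str] = set()
    for option in selected_options:
        result |= essentiality.get(option, set())
    return result


selected = [("C. elegans", "lethal"), ("S. cerevisiae", "inviable")]
essential = essential_union(selected)
print(sorted(essential))  # ['gene_1', 'gene_2', 'gene_3']

# Assumption: the essentiality block is still ANDed with the other criteria
other_filters = {"gene_2", "gene_3", "gene_4"}
print(sorted(essential & other_filters))  # ['gene_2', 'gene_3']
```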

Note that the ongoing curation effort will produce curated data for all WHO target organisms. At present, curated information is limited to T. brucei, L. major, T. cruzi, P. falciparum, and S. mansoni. Curated data from other organisms will be made available soon.

Any form of validation
Genetic validation
Pharmacological validation
Observed Phenotype

Advanced search form (recommended for expert users)

Phenotype Advanced Form

You can use the advanced form to further refine your phenotype filtering.

Note that you cannot use the simplified form above and the advanced form below at the same time: doing so will produce unpredictable results. You have been warned!

To annotate phenotypes and/or validation experiments, we use a phenotype syntax composed of terms drawn from several controlled vocabularies (ontologies such as GO, PATO, and ECO). The syntax is similar in spirit to the Pheno-syntax format described by Chris Mungall, although we have not attempted to comply formally with the Pheno-syntax grammar.

The syntax consists of a collection of tag-value pairs. The tags are: i) the phenotypic quality or attribute (Q); ii) the entity bearing the phenotype (B); iii) an anatomy term describing where the phenotype occurs (A), which is not relevant for unicellular organisms; and iv) a term describing when the phenotype occurs (T), usually a developmental stage.

As an example, the way to describe "slow growth in bloodstream forms" using this syntax is:

  • Phenotypic Quality (Q) = slow
  • Affected entity (B) = growth
  • During (T) = bloodstream form
  • Where (A) = whole organism

In this case, the affected entity is growth, the way in which growth is affected is described by the Q term (slow), the phenotype is observed in the bloodstream form (T), and it occurs in the whole organism (A).
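
To make the tag-value structure concrete, here is a minimal sketch of one annotation as a Python dataclass, plus a query over a few annotated genes. It is hypothetical throughout: the class and field names are invented, terms are plain strings rather than ontology identifiers, and the second annotation is made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Phenotype:
    quality: str                 # Q: phenotypic quality, e.g. a PATO term
    entity: str                  # B: affected entity, e.g. a GO process
    when: Optional[str] = None   # T: timing, usually a developmental stage
    where: Optional[str] = None  # A: anatomy; not relevant for unicellular organisms

    def matches(self, **query: str) -> bool:
        """True if every tag given in the query equals this annotation's value."""
        return all(getattr(self, tag) == value for tag, value in query.items())


slow_growth = Phenotype(quality="slow", entity="growth",
                        when="bloodstream form", where="whole organism")

# Hypothetical annotations keyed by gene
annotations = {
    "gene_1": [slow_growth],
    "gene_2": [Phenotype(quality="decreased", entity="motility")],
}
hits = [g for g, phs in annotations.items() if any(p.matches(entity="growth") for p in phs)]
print(hits)  # ['gene_1']
```

Leaving `when` and `where` optional mirrors the fact that not every annotation uses all four tags.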

The form below lets you search for targets using any of the terms we have used to annotate them. Note that only a few combinations of Q, B, A, and T terms are actually represented in the database, even though you can search for any of them.


Affected entity
Phenotypic quality
Where (anatomic term)
When (time term)




Observed in
Druggability index
Associated compounds

More information about this search:

The druggability index query lets you restrict your search based on a druggability measure.

The druggability index (Dindex) is a composite score computed as a weighted, normalised sum, in which each druggability prediction method is weighted according to its relative contribution to the prediction. Dindex values range from 0 to 1; a higher score indicates a target that is more likely to be druggable. This analysis is described in: Al-Lazikani B, Gaulton A, Paolini G, Lanfear J, Overington J and Hopkins A (2007). The molecular basis of predicting druggability. In: Bioinformatics: From Genomes to Therapies, Vol. 3, edited by Thomas Lengauer. Wiley-VCH.

Where druggability information is not available for the pathogen protein itself, it is derived from the closest druggable homolog, and the final Dindex is adjusted according to the degree of similarity and the conservation of essential motifs and features. Known druggable targets were derived from the Inpharmatica literature SAR database (Starlite).
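
A weighted, normalised sum of the kind described above can be sketched as follows. The method names, scores, and weights are hypothetical placeholders, not the actual Dindex components or weights, and the homology-based adjustment is not modelled.

```python
def dindex(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted, normalised sum of per-method druggability scores (each in [0, 1]).

    Dividing by the total weight keeps the result in [0, 1].
    """
    if not all(0.0 <= s <= 1.0 for s in scores.values()):
        raise ValueError("per-method scores must lie in [0, 1]")
    total_weight = sum(weights[m] for m in scores)
    if total_weight == 0:
        raise ValueError("no weighted methods available")
    return sum(weights[m] * s for m, s in scores.items()) / total_weight


# Hypothetical methods and weights, for illustration only
weights = {"pocket_prediction": 0.5, "homology_to_druggable": 0.3, "family_membership": 0.2}
scores = {"pocket_prediction": 0.8, "homology_to_druggable": 0.6, "family_membership": 1.0}
print(round(dindex(scores, weights), 2))  # 0.78
```

Normalising by the summed weights of the methods that actually produced a score means a target missing one prediction is not automatically pushed towards 0.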

The similarity vs druggable targets query lets you restrict your target searches to those that have a positive hit (using BLAST) against the Starlite database of known targets. Note that this filter is redundant with the Dindex filter, because the Dindex calculation already includes a similarity component. This filter is provided mainly for genomes newly added to TDR Targets (S. mansoni) that have not yet been submitted for Dindex calculation.

The compound desirability query lets you restrict your search to targets based on the chemical quality of the compounds associated with the target (or with its closest druggable homolog).

The compound desirability index (Pfizer) was calculated for the compounds associated with each of the closest druggable targets (as measured by BLAST similarity; see above). The compound desirability index is a fitness value that summarises the average 'chemical quality' of the compounds associated with each target.
The compound desirability values are linked directly to actual compounds (which could form the basis for composing target-specific screening subsets), but these compounds have not been disclosed and are not available for searching and/or display.

The desirability function is based on Harrington's desirability index, with a target function derived from the distribution of molecular properties of oral small-molecule drugs. The function also includes penalty terms for activity, promiscuity, and structural alerts (risk and reactive groups in compounds).
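
The sketch below illustrates the general shape of a Harrington-style desirability score; it is not Pfizer's function. Assumptions: each property uses Harrington's two-sided form d = exp(-|y|^n) over a hypothetical preferred range; per-compound desirability is the geometric mean of the property scores (the usual way Harrington desirabilities are combined); penalties are modelled as a single multiplicative factor; and the target value is the plain average over its compounds, following the 'average chemical quality' description above. The property ranges and compounds are invented.

```python
import math


def harrington_two_sided(x: float, low: float, high: float, shape: float = 2.0) -> float:
    """Two-sided Harrington desirability d = exp(-|y|**shape), where y rescales x so that
    low -> -1 and high -> +1 (d = exp(-1) at the limits, 1 at the centre)."""
    y = (2 * x - (high + low)) / (high - low)
    return math.exp(-abs(y) ** shape)


def compound_desirability(props: dict[str, float], ranges: dict[str, tuple[float, float]],
                          penalty: float = 1.0) -> float:
    """Geometric mean of per-property desirabilities, scaled by a penalty factor in [0, 1]."""
    ds = [harrington_two_sided(props[p], lo, hi) for p, (lo, hi) in ranges.items()]
    return penalty * math.prod(ds) ** (1 / len(ds))


def target_desirability(compounds: list[dict], ranges: dict[str, tuple[float, float]]) -> float:
    """Average compound desirability over all compounds linked to a target."""
    values = [compound_desirability(c["props"], ranges, c.get("penalty", 1.0)) for c in compounds]
    return sum(values) / len(values)


# Hypothetical preferred ranges and compounds, for illustration only
ranges = {"mol_weight": (200.0, 500.0), "logp": (-1.0, 5.0)}
compounds = [
    {"props": {"mol_weight": 350.0, "logp": 2.0}},                  # centre of both ranges
    {"props": {"mol_weight": 350.0, "logp": 2.0}, "penalty": 0.5},  # e.g. a structural alert
]
print(target_desirability(compounds, ranges))  # 0.75
```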

Associated compounds: associations between genes and chemical compounds are derived from several sources. Each source establishes the association by a different method, so the reliability and/or relevance of the associations differs from source to source.

The sources are: i) manual curation (more reliable, more relevant); ii) transitive (semi-automatic) association of compounds with a gene A based on its similarity to a gene B for which compounds are known (DrugBank); and iii) associations between compounds and enzymes established from the co-occurrence of EC numbers and compound identifiers (CAS registry IDs) in PubMed abstracts (indexed for MEDLINE).

In this last case, because the link between a gene and a compound rests only on co-occurrence, there is no guarantee of any functional association between the targets and the associated compounds. In other words, genes annotated with an EC number are not guaranteed to interact with, bind to, or be inhibited by the compounds mentioned in the corresponding papers. Also, the compounds obtained in this way might not be the same as those for which the desirability index has been calculated. You have been warned.

Assay available
Reagent available

More information about this search:

'Assay available' is currently evaluated as positive if the target is an enzyme and either (a) the enzyme has an assay described in the list of enzyme assays available from Sigma-Aldrich, or (b) the pathogen protein has been assayed according to the BRENDA database.

'Reagent available' is evaluated as positive if the pathogen protein has been cloned or purified according to the BRENDA database, or if soluble recombinant protein has been produced by the Consortium for Structural Genomics of Pathogenic Protozoa. The latter is based on the consortium's ability to express the protein in soluble form in a heterologous system (usually E. coli). You can check the status of their progress on the consortium's website.
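
Written as boolean expressions, the two flags look like this. This is a sketch only: the dataclass and its field names are hypothetical stand-ins for whatever the database actually stores, and the grouping in `assay_available` follows the reading that the enzyme requirement applies to both assay sources.

```python
from dataclasses import dataclass


@dataclass
class TargetEvidence:
    # Hypothetical flags; the real database fields are not documented here
    is_enzyme: bool
    sigma_assay_listed: bool = False          # assay in Sigma-Aldrich's enzyme assay list
    brenda_assayed: bool = False              # pathogen protein assayed, per BRENDA
    brenda_cloned_or_purified: bool = False   # pathogen protein cloned/purified, per BRENDA
    consortium_soluble: bool = False          # soluble recombinant protein from the consortium


def assay_available(t: TargetEvidence) -> bool:
    """Enzyme AND (Sigma-Aldrich assay OR assayed according to BRENDA)."""
    return t.is_enzyme and (t.sigma_assay_listed or t.brenda_assayed)


def reagent_available(t: TargetEvidence) -> bool:
    """Cloned/purified according to BRENDA OR soluble protein produced by the consortium."""
    return t.brenda_cloned_or_purified or t.consortium_soluble


enzyme_target = TargetEvidence(is_enzyme=True, brenda_assayed=True)
non_enzyme_target = TargetEvidence(is_enzyme=False, sigma_assay_listed=True, consortium_soluble=True)
print(assay_available(enzyme_target), assay_available(non_enzyme_target))      # True False
print(reagent_available(enzyme_target), reagent_available(non_enzyme_target))  # False True
```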

Only search for genes for which we have mapped publications in PubMed

In PubMed


Search for genes associated with a particular publication

PubMed ID