Other Databases


DrugBank: The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug data with comprehensive drug target information. As of version 5.0.10 (released 2017-11-14), DrugBank contains 10,523 drug entries, including 1,739 approved small molecule drugs, 874 approved biotech (protein/peptide) drugs, 106 nutraceuticals and 5,029 experimental drugs. In addition, 4,775 non-redundant protein sequences (i.e. drug targets, enzymes, transporters and carriers) are linked to these drug entries. Each DrugCard entry contains more than 200 data fields, with half of the information devoted to drug/chemical data and the other half to drug target or protein data.
TTD: The Therapeutic Target Database (TTD) provides information about known and explored therapeutic protein and nucleic acid targets, the targeted diseases, pathway information and the drugs directed at each of these targets. It also links to relevant databases containing information about target function, sequence, 3D structure, ligand binding properties, enzyme nomenclature, drug structure, therapeutic class and clinical development status. All information provided is fully referenced.
PharmGKB: PharmGKB annotates drug labels containing pharmacogenetic information approved by the US Food and Drug Administration (FDA), the European Medicines Agency (EMA), the Pharmaceuticals and Medical Devices Agency, Japan (PMDA) and Health Canada (Santé Canada, HCSC). Each annotation provides a brief summary of the pharmacogenomic (PGx) content of the label, an excerpt from the label and a downloadable highlighted PDF of the label. A list of the genes and phenotypes found in the label is mapped to the label's section headers and listed at the end of each annotation. PharmGKB also attempts to interpret the level of action implied by each label, recorded in its "PGx Level" tag.
DGIdb: The drug–gene interaction database (DGIdb, www.dgidb.org) consolidates, organizes and presents drug–gene interactions and gene druggability information from papers, databases and web resources. DGIdb normalizes content from 30 disparate sources and supports advanced browsing, searching and filtering through a web user interface, an application programming interface (API) and a public cloud-based server image. DGIdb v3.0 is a major update of the database.
PubChem: PubChem contains a wealth of data on chemical structures, bioactivity, health and safety, spectra and more. In addition to its web interface, PubChem provides direct data access via programmatic services and FTP downloads. If you use data or services from PubChem, proper citation helps readers locate the original source of the work. Some records in PubChem are assembled directly from the contributing organization's data, with attribution provided on the record. Other records, such as Compound records, contain derived data in addition to collected annotations with attribution.
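One of PubChem's programmatic services is the PUG REST interface, whose URLs follow an input specification / operation / output format pattern. The sketch below only builds a property-lookup URL for a compound name and does not contact the server; the helper name `compound_property_url` is hypothetical and the chosen properties are just examples. The PUG REST documentation lists the full set of operations and the request-rate policy.

```python
from typing import Sequence
from urllib.parse import quote

# Base URL of PubChem's PUG REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def compound_property_url(name: str, properties: Sequence[str], fmt: str = "JSON") -> str:
    """Build a PUG REST URL that looks up compound properties by compound name.

    Layout: <base>/compound/name/<name>/property/<prop1,prop2,...>/<format>
    (input specification / operation / output format).
    """
    if not properties:
        raise ValueError("at least one property name is required")
    # Percent-encode the name so spaces and punctuation are safe in a URL path.
    encoded_name = quote(name, safe="")
    return f"{PUG_REST_BASE}/compound/name/{encoded_name}/property/{','.join(properties)}/{fmt}"


if __name__ == "__main__":
    url = compound_property_url("aspirin", ["MolecularFormula", "MolecularWeight"])
    print(url)
    # The request itself is omitted so the sketch runs offline. With network
    # access, the standard library could fetch it, for example:
    #   import json, urllib.request
    #   with urllib.request.urlopen(url) as resp:
    #       data = json.load(resp)
```

Running the script prints `https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/aspirin/property/MolecularFormula,MolecularWeight/JSON`. Keeping URL construction separate from fetching makes the lookup easy to test and to reuse with a different HTTP client.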
PolySearch2: A critical task in biomedical text mining is discovering potential associations between different types of biomedical entities. PolySearch 2.0 (polysearch.ca) is an online text-mining system for identifying relationships between human diseases, genes, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch 2.0 supports a generalized 'Given X, find all associated Ys' query, where X and Y can be any of the entity types listed above; for example, 'Find all diseases associated with Bisphenol A'. It searches for associations in comprehensive collections of free-text corpora, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US patent application abstracts. To improve accuracy and coverage, it also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and HMDB. PolySearch 2.0 maintains an extensive thesaurus of biological terms and uses search engine technology to rapidly retrieve relevant articles and database records. It then generates, ranks and annotates candidate associations and presents the results with relevancy statistics and highlighted key sentences to aid interpretation.
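The 'Given X, find all associated Ys' query can be illustrated with a toy example. This is not PolySearch 2.0's actual scoring method; it is a minimal sketch of sentence-level co-occurrence counting, a common text-mining baseline, assuming a small synonym thesaurus for each candidate Y. The function names, entity names and sentences are all hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def _mentions(sentence: str, synonyms: List[str]) -> bool:
    """True if any synonym appears in the sentence as a whole word (case-insensitive)."""
    return any(
        re.search(r"\b" + re.escape(s) + r"\b", sentence, re.IGNORECASE)
        for s in synonyms
    )


def find_associated(
    x_synonyms: List[str],
    y_thesaurus: Dict[str, List[str]],
    sentences: List[str],
) -> List[Tuple[str, int]]:
    """Rank candidate Ys by the number of sentences in which they co-occur with X."""
    counts: Counter = Counter()
    for sentence in sentences:
        if not _mentions(sentence, x_synonyms):
            continue  # only sentences that mention X can support an association
        for y, synonyms in y_thesaurus.items():
            if _mentions(sentence, synonyms):
                counts[y] += 1
    # Sort by co-occurrence count (descending), then alphabetically for stable ties.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up sentences and entities, for illustration only.
    corpus = [
        "Compound A exposure was linked to disease P.",
        "CPA levels were elevated in DQ and disease P cohorts.",
        "Disease Q was studied without reference to any compound.",
    ]
    thesaurus = {"disease P": ["disease P"], "disease Q": ["disease Q", "DQ"]}
    print(find_associated(["compound A", "CPA"], thesaurus, corpus))
```

On the toy corpus this prints `[('disease P', 2), ('disease Q', 1)]`: the third sentence mentions disease Q but not compound A, so it contributes nothing. The synonym lists play the role of the thesaurus described above, letting "CPA" and "DQ" count as mentions of their canonical entities.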
COSMIC: COSMIC, the Catalogue Of Somatic Mutations In Cancer, is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer.
HGNC: The HUGO Gene Nomenclature Committee (HGNC) is responsible for approving unique symbols and names for human loci, including protein-coding genes, ncRNA genes and pseudogenes, to allow unambiguous scientific communication.
CHAT: The Cancer Hallmarks Analytics Tool (CHAT) was developed, and is maintained, as a collaboration between the Language Technology Lab at the University of Cambridge (UK) and the Institute of Environmental Medicine at Karolinska Institutet (Sweden). Its developers share the tool's software code under an open licence at https://github.com/cambridgeltl/chat, and the natural language processing pipeline, classifier code and annotated, labelled corpus at https://github.com/cambridgeltl/chat-classifier.
ChemSpider: ChemSpider is a free chemical structure database providing fast text and structure search access to over 60 million structures from hundreds of data sources.
KEGG: KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies. The KEGG release notes of December 1, 2017 describe new and updated features.
ChEMBLdb: ChEMBLdb is a database of bioactive drug-like small molecules. It contains 2-D structures, calculated properties (e.g. logP, molecular weight, Lipinski parameters) and abstracted bioactivities (e.g. binding constants, pharmacology and ADMET data). The ChEMBL team attempts to normalise the bioactivities into a uniform set of end-points and units where possible, and to tag the links between a molecular target and a published assay with a set of varying confidence levels. The data are abstracted and curated from the primary scientific literature and cover a significant fraction of the structure-activity relationship (SAR) and discovery data for modern drugs.
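Two ideas in this entry can be made concrete: normalising potency values onto one scale, and the Lipinski parameters among the calculated properties. The sketch below converts a potency value such as an IC50 or Ki to molar and takes its negative base-10 logarithm, which is the idea behind ChEMBL's pChEMBL value, and counts Lipinski rule-of-five violations. It is not ChEMBL's own code; the helper names and the minimal unit table are hypothetical, while the four thresholds are the standard rule-of-five cut-offs.

```python
import math
from typing import Dict

# Conversion factors from common concentration units to molar (mol/L).
# Hypothetical, minimal table; real curation handles many more units.
TO_MOLAR: Dict[str, float] = {
    "M": 1.0,
    "mM": 1e-3,
    "uM": 1e-6,
    "nM": 1e-9,
    "pM": 1e-12,
}


def p_activity(value: float, unit: str) -> float:
    """Return -log10 of a potency value (e.g. IC50, Ki) after converting it to molar."""
    if value <= 0:
        raise ValueError("activity value must be positive")
    if unit not in TO_MOLAR:
        raise ValueError(f"unsupported unit: {unit}")
    return -math.log10(value * TO_MOLAR[unit])


def lipinski_violations(mol_weight: float, logp: float, h_donors: int, h_acceptors: int) -> int:
    """Count how many of Lipinski's rule-of-five criteria a molecule breaks."""
    rules = [
        mol_weight > 500,  # molecular weight above 500 Da
        logp > 5,          # octanol-water partition coefficient (logP) above 5
        h_donors > 5,      # more than 5 hydrogen-bond donors
        h_acceptors > 10,  # more than 10 hydrogen-bond acceptors
    ]
    return sum(rules)


if __name__ == "__main__":
    # Illustrative inputs only, not values taken from ChEMBL.
    print(p_activity(100, "nM"))                     # about 7.0
    print(lipinski_violations(620.0, 5.8, 2, 9))     # 2
```

On the log scale, an IC50 of 100 nM and one of 0.1 uM both map to about 7, so measurements reported in different units become directly comparable. A compound is conventionally treated as satisfying the rule of five when it has at most one violation.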