PubChem Databases

PubChem (1) is designed to provide information on the biological activities of small molecules, generally those with a molecular weight of less than 500 daltons (2). PubChem's integration with NCBI's Entrez (3) information retrieval system provides structure and substructure searching, structure similarity searching, and bioactivity data, as well as links to biological property information in PubMed and NCBI's Protein 3D Structure Resource.
PubChem Databases
PubChem comprises three linked databases: PubChem Compound, PubChem Substance and PubChem BioAssay.
PubChem Compound (unique structures with computed properties)
PubChem Compound (4) is a searchable database of chemical structures with validated chemical depiction information provided to describe substances in PubChem Substance. Structures stored in PubChem Compound are pre-clustered and cross-referenced by identity and similarity groups. PubChem Compound includes over 5 million compounds.

  • Molecular Name Searches (e.g., Tylenol, Benzene) allow searching with a variety of chemical synonyms.

  • Chemical Property Range Searches (e.g., Molecular Weight between 100 and 200, Hydrogen Bond Acceptor Count between 3 and 5) allow searching for compounds by a variety of physical and chemical properties and descriptors.

  • Simple Elemental Searches (e.g., all compounds containing Gallium) allow searching with specific element restrictions.
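
The property range searches listed above can also be expressed as Entrez query terms. The following is a minimal sketch that only builds a query URL for NCBI's E-utilities ESearch service and sends no request. The field names `MolecularWeight` and `HydrogenBondAcceptorCount`, and the helper names `range_term` and `esearch_url`, are assumptions chosen for illustration; the list of field aliases in reference 10 is the place to confirm the names actually accepted.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def range_term(low, high, field):
    """Format an Entrez range term such as 100:200[MolecularWeight]."""
    return f"{low}:{high}[{field}]"


def esearch_url(terms, db="pccompound", retmax=20):
    """Join the terms with AND and return an ESearch URL (nothing is fetched)."""
    query = " AND ".join(terms)
    return ESEARCH + "?" + urlencode({"db": db, "term": query, "retmax": retmax})


# Field names below are assumed, not taken from the text above.
terms = [
    range_term(100, 200, "MolecularWeight"),
    range_term(3, 5, "HydrogenBondAcceptorCount"),
]
url = esearch_url(terms)
print(url)
```

Passing `db="pcsubstance"` or `db="pcassay"` would point the same kind of query at PubChem Substance or PubChem BioAssay instead.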

PubChem Substance (deposited structures)

PubChem Substance (5) is a searchable database containing descriptions of chemical samples, from a variety of sources, and links to PubMed citations, protein 3D structures, and biological screening results available in PubChem BioAssay. PubChem Substance includes over 8M records. Substances with known content are linked to PubChem Compound.

  • Molecule Synonym Searches (e.g. all substances with 'deoxythymidine' as a name fragment, or substances that contain 3'-Azido-3'-deoxythymidine).

  • Biology Links Search (e.g. substances with tested, active or inactive bioassays).

  • Combined Searches (e.g. substances that are 'Active in any BioAssay' and contain the element Ruthenium).

PubChem BioAssay

PubChem BioAssay (6) is a searchable database containing bioactivity screens of chemical substances described in PubChem Substance. PubChem BioAssay includes over 180 bioassays. Searchable descriptions of each bioassay are provided that include descriptions of screening procedural conditions and readouts.

  • To Search for BioAssay Data Sets (e.g. HIV growth inhibition).

  • To Browse or Download PubChem BioAssay Results (NCI AIDS Antiviral Assay)

Searching PubChem

PubChem Text Search
PubChem Text Search searches by compound name, synonym or ID, and defaults to PubChem Compound. The search results page offers a pull-down 'databases' menu that allows searching in PubChem Substance, PubChem BioAssay and a variety of other Entrez

PubChem Chemical Structure Search
PubChem Chemical Structure Search (7) has the following options:
Search SMILES (including SMARTS or InChI) or Formula includes a 'Sketch' link to a drawing program that converts structural diagrams to SMILES (exact), SMARTS (substructure) or InChI (exact) strings for searching.
Clicking 'Done' on the 'structure editor' converts the structural diagram to the appropriate string and transfers it to the search box.
Select Structure File allows importation of standard and common chemical file formats (8).
Specify Search Type allows restriction to: same compound, similar compounds (9), formula or


PubChem Indexes and Index Search
PubChem Indexes and Index Search allows fielded and range searching from either the PubChem homepage or the Entrez search page. An extensive list of field aliases and examples of range searching is provided (10).

PubChem Search Results (11)

PubChem Compound
PubChem Compound results are derived from PubChem Substance records that provide structures. Since compounds are structurally unique, one compound may link to multiple substances.

The default display is a compound summary with thumbnails and cross links (12) to each PubChem database, other NCBI databases, and depositors' databases.

Clicking either the structure or SID link gives the full display which includes the compound's property data, description, related substance information, neighboring structures, and cross links.

PubChem Substance

PubChem Substance holds a unique record even if the structure is not known or not supplied. Examples include sulfated polymannuroguluronate, a novel anti-acquired immune deficiency syndrome (AIDS) drug candidate, and other natural products.
The PubChem Substance Summary Record,

SID: 3724242


Sulfated polymannuroguluronate, AIDS218087 ...

Source: NIAID(218087)

is linked to the full record by clicking on the SID number (PubChem's substance identifier). This displays the full substance record, which includes links to PubMed and the source; the Medical Subject Annotation (MeSH Substance Name) and a MeSH PubMed search link; and depositor-supplied synonyms and comments.

PubChem BioAssay

The PubChem BioAssay Summary Record,

AID: 179


NCI AIDS Antiviral Assay
Source: DTP/NCI
15 Readouts, 37678 substances tested


is linked to the full record by clicking on the AID number (PubChem's assay (protocol) identifier). This displays the full bioassay record, which includes links to the substances tested (all, active, inactive, inconclusive) and to related PubMed, Protein, Taxonomy, OMIM and BioAssay records, and a description of the assay, possibly with protocols and comments.


1a. PubChem
1b. PubChem - Overview
1c. PubChem FAQ
1d. PubChem Glossary
1e. PubChem - Help

"Provides tips and examples for searches of the three PubChem databases by text term/keyword, as well as tips for searching PubChem Compound by chemical properties. The help documents for structure search provide tips on using chemical information for basic and advanced structure search options in the PubChem Structure Search."

2. NIH Roadmap for Medical Research. Molecular Libraries and Imaging.
3. Entrez Databases
4a. PubChem Compound

"Compound -- Chemical representatives in a substance. Chemical structure presented in a compound is standardized through PubChem's data pipeline. A mixture substance may have several standardized compounds." Since compounds are structurally unique, one compound may link to many substances. CID is PubChem's compound identifier.

4b. PubChem Compound Database - search examples
5a. PubChem Substance

"Substance -- Individual record object collected from depositors, representing a sample used at bioassay."

5b. PubChem Substance Database - search examples
6a. PubChem BioAssay
6b. PubChem BioAssay Database - search and display examples
7a. PubChem Structure Search
7b. PubChem Structure Search Help
7c. PubChem Advanced Structure Search Help
8. PubChem Structure Search Help. Upload Query File

"Most (if not all) standard and common chemical file formats may be used, including "MOL", "SDF" (both v2000 and v3000), "CDX", "SKC", "MOL", "MOL2", "JME", and "SK2". You may also use a text file with your choice of a "SMILES", "SMARTS", or "SLN" string."

9. PubChem Help. Similar Compounds/Substances Link.

"The different percent similarities are determined using a Tanimoto score relative to the "binary fingerprint" calculated for two different chemical structures."

10. PubChem Indexes and Index Search
11. PubChem Summary Display
12. PubChem Cross Links
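
The Tanimoto score quoted in reference 9 can be illustrated with a short worked example. This is a sketch of the general formula only (bits set in both fingerprints divided by bits set in either), not PubChem's own implementation; the fingerprints below are made-up bit positions, and `tanimoto` is a hypothetical helper name.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto score of two binary fingerprints given as sets of 'on' bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Two empty fingerprints: returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return len(a & b) / len(union)


# Made-up fingerprints: 3 bits in common, 6 bits set in at least one of them.
fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8, 9, 12}
score = tanimoto(fp1, fp2)
print(f"{score:.0%}")  # prints 50%
```

A score of 1.0 means the two fingerprints have identical bits set, which is why percent-similarity thresholds are expressed relative to 100%.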

>1) Go to the PubChem "Compound" or "Substance" pages, depending on whether you want unique structure records only or all deposited structures. The URLs
>
> (1 hit for the InChI below)
>
>(5 hits for the InChI below)
>
>2) Paste in your InChI, for example:
>
>Note that the QUOTES ARE REQUIRED, and there must be no carriage return or line feed in the string, despite appearances on this email. Note also that the current text query system does not actually recognize the numbers and punctuation characters, just the count. It seems to identify the correct structures most all the time, nonetheless. I am told a proper recognition system for InChI's (a structure decoder) is in the works. Anyone interested in this should contact NLM/NCBI directly.
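
The two requirements in the note above (surrounding quotes, and no carriage return or line feed inside the string) can be applied mechanically before pasting. The sketch below is one possible way to do that; `inchi_query` is a hypothetical helper, and the ethanol InChI is an illustrative value, not the one from the original email.

```python
def inchi_query(raw):
    """Return an InChI as a single quoted line, ready to paste into a text search box."""
    # An InChI contains no whitespace, so removing all of it also removes
    # any carriage returns or line feeds introduced by email wrapping.
    joined = "".join(raw.split())
    # Drop any quotes already present so the result is quoted exactly once.
    return '"' + joined.strip('"') + '"'


# Illustrative InChI (ethanol), wrapped across two lines as an email client might do.
wrapped = "InChI=1S/C2H6O/c1-2-3/\r\n  h3H,2H2,1H3"
print(inchi_query(wrapped))  # prints "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
```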

Entrez cross-database search page
Titled "Entrez: The life sciences search engine", this page is a useful gateway for searching all NCBI databases, including PubMed, PubChem, Genome projects and more. Users can enter terms and click 'GO' to run the search against all the databases, click a database name or icon to go directly to the search page for that database, or click the question mark for a short explanation of that database.

eMolecules discovers sources of chemical data by searching the internet, and receives submissions from data providers such as chemical suppliers and academic researchers.

This is the most comprehensive overhaul of eMolecules since our launch in November 2005. We rebuilt the eMolecules database from the ground up, starting from updated databases and catalogs. Then we added four million new entries from dozens of new sources. In addition, the results pages were redesigned for a cleaner, more compact presentation.

  • Over 5.5 million unique molecules from over 16 million sources

  • Over 500,000 CAS numbers

  • Over 100 chemical suppliers

  • New government and academic databases, such as NIST and NCI, with direct links to their data

  • Tens of thousands of trade names and common names

  • Over 2 million IUPAC and other names

We provide links to real molecules, those available for purchase, and to chemical properties databases with real information, whenever possible, and without ambiguity in the stereochemistry.
