RCSB PDB Help

Identifiers in PDB

Overview

Identifiers or IDs are commonly used in data resources to point to specific data contents. They may also be used to connect different data resources and indicate their relationships.

In the PDB, identifiers are used at all levels of the structural hierarchy in the entry. This includes:

  • PDB ID for the entry - currently 4-character alphanumeric, planned to be extended in the future to eight characters prefixed by 'pdb'
  • Numeric ID for the assemblies in the entry
  • Chain ID for instances of entities - commonly 1- or 2-character alphanumeric
  • Residue and small molecule IDs - either 3 or 5 characters long
  • "ATOM" or "HETATM" ID and 4-character atom names for individual atoms

These identifiers make it possible to select, locate, and visualize a specific instance of a ligand or an amino acid in a protein chain within a particular PDB entry. Learning about them therefore helps in locating, visualizing, and analyzing all or specific parts of PDB structures.

Relevance of Identifiers in PDB Exploration

The PDB archives the location (three-dimensional coordinates) of each atom in a structure. In order to explore the structure and analyze molecular interactions in atomic detail, the locations of each atom in the PDB must be uniquely assigned. Various identifiers are used to specifically indicate one atom or groups of atoms. These identifiers enable users to visually or programmatically select one or more atoms of interest in order to visualize the selected atoms, specifically represent them as ribbons, ball and stick, or spacefill, and/or analyze them (such as measuring distances, angles, and torsions).

In addition, some IDs are also drawn from other data resources and are included either in the data files or in the RCSB archive for ease in connecting PDB data to related information (such as protein or nucleic acid sequences and EM maps). As a result, these IDs can be used for rapidly searching an archive for specific structures related to a topic of interest.

Identifiers, Conventions, and Examples

Identifiers at different levels of organization in the PDB support querying and browsing. Some of the key identifiers used for searching, cross-referencing with other data resources, and pinpointing data fields at various organizational levels are described here with examples. Learn more about Organization of 3D Structures in the Protein Data Bank.

Entry level Identifiers

Experimental Structures
Every experimental structure in the PDB is assigned a 4-character alphanumeric identifier called the PDB identifier or PDB ID (e.g., 2hbs). In some cases, large groups of structures (e.g., a protein bound to a series of different inhibitors/drugs) are submitted to the PDB. In addition to PDB IDs these structures have an additional identifier, called the Group ID (e.g., G_1002018). The structure(s) may be described in the scientific literature, so associated PubMed IDs (e.g., 28436492) may be used to search the archive for these structures. Structures determined by Electron Microscopy must have associated EMDB IDs (e.g., EMD-21578) that connect the structure to EM maps that were used to solve the structure or to maps of related structures.

Computed Structure Models (CSMs)
Currently, there is no community-wide standard to enforce naming conventions for CSMs. Custom identifiers were introduced to normalize and sanitize entry identifiers during loading and make them an integral aspect of RCSB.org infrastructure. These identifiers are namespaced and indicate the source repository (e.g., AF-A0A452S449-F1 from AlphaFold DB and ma-bak-cepc-0001 from ModelArchive). This aligns with extended PDB ID codes (e.g., PDB_00001ABC), which will become necessary when the pool of 4-character PDB identifiers has been exhausted. The original identifiers are retained to ensure interoperability with external resources and to allow searching by the original identifiers. Learn more about plans for extended CCD or PDB IDs.
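The entry-level identifiers above differ visibly in shape, so a rough classifier can be sketched with regular expressions. The patterns below are assumptions inferred only from the examples in this section (2hbs, PDB_00001ABC, G_1002018, EMD-21578, AF-A0A452S449-F1, ma-bak-cepc-0001); they are not an official specification, and the function name classify_identifier is hypothetical. PubMed IDs are plain numbers and cannot be recognized reliably by shape alone, so they are left out.

```python
import re

# Illustrative patterns inferred from the examples in the text.
# They are assumptions, not an official grammar for any of these IDs.
ID_PATTERNS = [
    ("extended PDB ID", re.compile(r"^pdb_[0-9a-z]{8}$", re.IGNORECASE)),
    ("PDB ID", re.compile(r"^[0-9a-z]{4}$", re.IGNORECASE)),
    ("Group ID", re.compile(r"^G_\d+$")),
    ("EMDB ID", re.compile(r"^EMD-\d+$")),
    ("AlphaFold DB CSM ID", re.compile(r"^AF-[0-9A-Z]+-F\d+$")),
    ("ModelArchive CSM ID", re.compile(r"^ma-[0-9a-z-]+$")),
]


def classify_identifier(identifier: str) -> str:
    """Return a best-guess identifier type, or "unknown"."""
    for kind, pattern in ID_PATTERNS:
        if pattern.match(identifier):
            return kind
    return "unknown"


for example in ["2hbs", "PDB_00001ABC", "G_1002018", "EMD-21578",
                "AF-A0A452S449-F1", "ma-bak-cepc-0001"]:
    print(example, "->", classify_identifier(example))
```

A shape check like this only helps route a query; whether an identifier actually exists still has to be confirmed against the archive.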

Entity level Identifiers

Entities in the structure may be

  • macromolecules or polymers (e.g., proteins or nucleic acids)
  • oligosaccharides or branched polymers (e.g., hyaluronic acid)
  • small molecules or non-polymers (e.g., ligands, inhibitors, and individual residues)
  • complex small molecules with macromolecule-like composition (e.g., peptide-like inhibitors and antibiotics also called Biologically Interesting or BIRD molecules)

A protein or peptide (short fragment of a protein) whose sequence has been mapped to UniProt includes a UniProt Accession Code (e.g., P01019) for that entity. Similarly, gene sequences mapped to GenBank have associated GenBank Accession Codes (e.g., 55771382).

A small molecule, ligand, or individual residue has a Chemical ID assigned in the Chemical Component reference Dictionary (e.g., ATP or A1LU6), while a complex small molecule such as a peptide-like inhibitor, antibiotic, or well-known di- or trisaccharide has a Biologically Interesting molecule Reference Dictionary identifier or BIRD ID (e.g., PRD_000006).

Within a PDB entry, all entities are assigned unique IDs (e.g., entity 1). The Entity ID is specific to the particular structure (e.g., 4HHB_1 refers to entity 1 in PDB entry 4HHB) and is used to track the entity's properties throughout the file - such as name, sequence, source, and links to IDs from other databases or dictionaries (e.g., UniProt, GenBank, Chemical, BIRD).
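A combined reference such as 4HHB_1 can be split back into its entry and entity parts. The sketch below assumes the entity number always follows the last underscore, which also keeps extended IDs containing an underscore intact; the names EntityRef and parse_entity_ref are hypothetical.

```python
from typing import NamedTuple


class EntityRef(NamedTuple):
    entry_id: str   # e.g. "4HHB"
    entity_id: int  # e.g. 1


def parse_entity_ref(text: str) -> EntityRef:
    """Split "<entry>_<entity number>" at the last underscore."""
    entry_id, sep, entity = text.rpartition("_")
    if not sep or not entry_id or not entity.isdigit():
        raise ValueError(f"not an entry_entity identifier: {text!r}")
    return EntityRef(entry_id.upper(), int(entity))


ref = parse_entity_ref("4HHB_1")
print(ref.entry_id, ref.entity_id)
```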

Instance level Identifiers

An instance is a distinct copy of an entity or molecule. Instance level IDs are assigned according to the type of entity.

Macromolecular Instance ID

Macromolecules are polymeric chains made of covalently linked building blocks, such as amino acids and nucleotides. Each instance of a protein or nucleic acid in the entry is assigned a Chain ID (e.g., A, A1, AA). Two sets of chain IDs are found in each PDB entry - one assigned by the PDB (label_asym_id), usually beginning with the letter A, and the other selected by the author (auth_asym_id) at the time of deposition. Most commonly these two chain IDs are the same, but in some instances they differ - e.g., in PDB ID 2or1, the author assigned chain IDs for the protein chains in the entry are L and R, while the PDB assigned ones are C and D respectively.

The polymer sequences are included in the PDB file, both in FASTA format (one-character codes) and as a list of Chemical IDs (or three-character codes) of amino acids from the N- to C-terminal end. Any residue in a polymer chain is specified by its chemical ID (e.g., SER for the amino acid Serine) and its residue number, or position in the polymer chain. Two residue numbering schemes are included for each residue (amino acid or nucleotide) in the file - a PDB assigned sequential numbering (label_seq_id) that starts from 1, and an author-specified numbering (auth_seq_id) that may match the numbering of related structures reported in the literature and/or the numbering of associated sequence database (e.g., UniProt) entries.

For example, in PDB ID 6kr6, the amino acids in protein Piwi have a PDB assigned sequential numbering from 1-810, while the author defined residue numbers are from 34-843 to match the UniProt numbering. While some visualization tools may display both residue numbers, as included in the cif format file, other tools may use only the one listed in the PDB format file, i.e., the residue numbers from the author. Learn more about PDB and PDBx/mmCIF format files. Learn more about the use of two chain IDs.
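The 6kr6 numbers quoted above (1-810 versus 34-843) are consistent with a constant shift of 33 between the two schemes. The sketch below converts between them under that assumption. A constant offset is a simplification made for this example: other entries can have gaps or irregular author numbering, in which case both values should be read per residue from the PDBx/mmCIF file rather than computed. The function name make_offset_mapper is hypothetical.

```python
def make_offset_mapper(first_label_seq_id: int, first_auth_seq_id: int):
    """Build converters between label_seq_id and auth_seq_id.

    Assumes a constant offset between the two numbering schemes,
    which is a simplification that holds only for gap-free numbering.
    """
    offset = first_auth_seq_id - first_label_seq_id

    def label_to_auth(label_seq_id: int) -> int:
        return label_seq_id + offset

    def auth_to_label(auth_seq_id: int) -> int:
        return auth_seq_id - offset

    return label_to_auth, auth_to_label


# Values taken from the 6kr6 example in the text: label 1 <-> auth 34.
label_to_auth, auth_to_label = make_offset_mapper(1, 34)
print(label_to_auth(810))
print(auth_to_label(34))
```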

In some cases, selected residues or parts of residues may have alternate locations as determined by the experiment. Each alternate location of a particular atom is differentiated with a unique Alt ID. For example, residue Ser 9 in Chain D of PDB entry 1trz has two atoms that each have alternate locations with Alt IDs A and B. When all the atoms of a structure have multiple locations, they are presented as multiple models and assigned unique Model IDs, as often seen in NMR structures (e.g., PDB ID 2kpq).

Each atom in each residue is assigned a specific atom name in accordance with the Chemical Component Dictionary maintained by the PDB (e.g., N, CA, C, O, CB, OG are the names of all the non-hydrogen atoms in Serine). All instances of serine in a structure use the same atom names, but each is assigned a unique combination of entity ID, instance (or chain) ID, and residue number. Where appropriate, Alt IDs are also specified. For example, in PDB ID 1trz, chain ID D, Ser 9, two of its atoms (CB and OG) have Alt IDs A and B.
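The way these identifiers combine to single out one atom can be sketched as a small record type. The example below lists only the identifiers of Ser 9 in chain D of 1trz as described in the text (six atom names, with CB and OG each present as Alt IDs A and B); no coordinates are included. AtomKey and select are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AtomKey:
    chain_id: str
    residue_name: str
    residue_number: int
    atom_name: str
    alt_id: Optional[str] = None  # None when there is a single location


# Identifiers for Ser 9, chain D of 1trz, following the description above.
SER9_ATOMS = [
    AtomKey("D", "SER", 9, "N"),
    AtomKey("D", "SER", 9, "CA"),
    AtomKey("D", "SER", 9, "C"),
    AtomKey("D", "SER", 9, "O"),
    AtomKey("D", "SER", 9, "CB", "A"),
    AtomKey("D", "SER", 9, "CB", "B"),
    AtomKey("D", "SER", 9, "OG", "A"),
    AtomKey("D", "SER", 9, "OG", "B"),
]


def select(atoms, **criteria):
    """Return the atoms whose fields equal every given criterion."""
    return [a for a in atoms
            if all(getattr(a, field) == value
                   for field, value in criteria.items())]


# Both alternate locations of the OG atom.
for atom in select(SER9_ATOMS, atom_name="OG"):
    print(atom)
```

Without the Alt ID field, the two CB records and the two OG records would be indistinguishable, which is why the Alt ID is part of the key.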

Small Molecule Instance ID

Small molecules such as ligands, ions, drugs, inhibitors, and individual residues (amino acids, nucleotides etc.) are found in PDB structures interacting with macromolecules such as proteins and nucleic acids. They are assigned the chain ID of the (spatially) closest macromolecule. For example, all ligands, water molecules, etc. nearest protein chain A are also assigned chain ID A. Each of these small molecules can still be located specifically because it has a unique residue number.
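The idea of giving a small molecule the chain ID of the closest macromolecule can be sketched as follows. The text does not say how "closest" is measured, so the minimum inter-atomic distance used here is an assumption, the coordinates are made-up toy values rather than data from any PDB entry, and nearest_chain is a hypothetical helper.

```python
import math


def nearest_chain(ligand_atoms, chains):
    """Return the chain ID whose atoms come closest to the ligand.

    ligand_atoms: list of (x, y, z) tuples
    chains: dict mapping chain ID -> list of (x, y, z) tuples
    "Closest" is taken as the minimum atom-to-atom distance (an assumption).
    """
    def min_distance(chain_atoms):
        return min(math.dist(a, b)
                   for a in ligand_atoms
                   for b in chain_atoms)

    return min(chains, key=lambda chain_id: min_distance(chains[chain_id]))


# Toy coordinates for illustration only.
toy_chains = {
    "A": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "B": [(20.0, 0.0, 0.0), (21.5, 0.0, 0.0)],
}
toy_ligand = [(3.0, 1.0, 0.0)]

print(nearest_chain(toy_ligand, toy_chains))
```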

Atom names for all atoms in a small molecule are assigned according to the Chemical Component Dictionary.

Oligosaccharide Instance ID

Oligosaccharides are polymers of sugars that are covalently linked to form linear or branched chains. Like proteins and nucleic acids, all instances of oligosaccharides are assigned unique chain IDs.

Oligosaccharides are often found covalently linked to proteins (e.g., glycoproteins). If a single sugar molecule is covalently linked to a protein, it is treated like a small molecule and assigned the chain ID of the protein it is linked to. If 2 or more sugars are covalently linked to each other via glycosidic bonds, they are assigned a unique chain ID.

Atoms in each sugar molecule are assigned atom names according to the Chemical Component Dictionary.

Assembly level Identifiers

Experimentally determined structures submitted to the PDB contain coordinates of macromolecules and small molecules that may represent a complete biologically relevant assembly, a portion of an assembly, or multiple copies of an assembly. Numerical Assembly IDs are assigned to each biologically relevant assembly. These IDs are entry-specific; they can be used to visualize or download specific biological assemblies and to provide the instructions that define them. For example, when multiple assemblies are present in the entry (or structure), the assembly ID is used to group specific instances of entities that form each assembly (e.g., see PDB ID 2hbs). The Assembly ID may also be used for providing instructions to apply symmetry operations to instances (denoted by Asymmetric Unit IDs or Asym IDs) to generate the biologically relevant assembly (e.g., see PDB ID 1out).

Exceptions

Structures available from RCSB.org may sometimes have two different chain identifiers (IDs) listed for a biopolymer or ligand, in the form X [auth Y]. This is because the PDB uses two distinct systems for labeling chain IDs.

The "auth" system, represented by Y above, comprises author-provided chain IDs for most polymer chains (e.g., protein, DNA, RNA), PDB-assigned chain IDs for polysaccharide chains, and chain IDs for non-polymer ligands and solvent that have been assigned by the PDB to match the chain ID of the nearest polymer chain. These are the chain IDs displayed by most visualization programs and the chain IDs that will be present in a PDB-formatted atomic coordinate file (if one is available for a structure). For example, in the "auth" system, a protein dimer with a small-molecule inhibitor bound to each subunit might have chain ID "P" for both the protein subunit and its bound inhibitor and chain ID "Q" for both the other protein subunit and its bound inhibitor. Each polymer and non-polymer has a separate chain ID, with the exception of chain IDs for solvents. In the "auth" system, solvent and ligands are assigned a chain ID that matches the chain ID of the polymer closest to them.

The "label" system, represented by X above, labels each polymer and non-polymer present in the structure consecutively starting with "A" and proceeding through the alphabet, adding additional letters as necessary (i.e., A, B...Z, AA, BA...ZA, AB, BB, etc.). The solvent (and ligand) chain IDs are distinct in the "label" system, which is frequently used for bioinformatics applications.
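The ordering described for the "label" system (A, B...Z, AA, BA...ZA, AB, BB) has the first letter changing fastest once IDs grow beyond one character. A minimal generator that reproduces that sequence is sketched below; it illustrates the ordering as stated in the text and is not taken from any PDB software. The name label_chain_ids is hypothetical.

```python
from itertools import count, islice, product
from string import ascii_uppercase


def label_chain_ids():
    """Yield A..Z, then AA, BA, ..., ZA, AB, BB, ... indefinitely."""
    for length in count(1):
        # product() varies the LAST position fastest; reversing each
        # tuple makes the FIRST letter vary fastest, as in the text.
        for letters in product(ascii_uppercase, repeat=length):
            yield "".join(reversed(letters))


ids = list(islice(label_chain_ids(), 54))
print(ids[:3], ids[25:29], ids[51:54])
```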

For the inhibitor-bound protein dimer example above, if the structure components are present in the order protein subunit #1, protein subunit #2, inhibitor #1 (bound to subunit #1), inhibitor #2 (bound to subunit #2), they would be labeled A, B, C, and D, so that their combined chain ID representation on the RCSB.org site would be

protein subunit #1 A [auth P]
protein subunit #2 B [auth Q]
inhibitor #1 C [auth P]
inhibitor #2 D [auth Q]

It is important to note that sometimes the "auth" and "label" IDs for a chain are identical, in which case the chain ID is displayed as simply X without the [auth Y] addendum. An example of this is PDB entry 4HHB, where the author-provided protein chain IDs are A, B, C, and D, and the components of the structure are labeled

hemoglobin subunit alpha A,C
hemoglobin subunit beta B,D
heme E [auth A], G [auth B], H [auth C], J [auth D]
phosphate F [auth B], I [auth D]

Residue numbers are handled in the same way as chain IDs. For example, in PDB entry 1cbw, you may find an amino acid residue referred to as F [auth G] Leu 18 [auth 33].
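The X [auth Y] display convention can be sketched as a pair of small formatting helpers: the bracketed part appears only when the two values differ. The function names format_chain and format_residue are hypothetical, and the output format is modeled on the examples in this section rather than on RCSB.org source code.

```python
def format_chain(label_id: str, auth_id: str) -> str:
    """Show "X" when both IDs agree, otherwise "X [auth Y]"."""
    if label_id == auth_id:
        return label_id
    return f"{label_id} [auth {auth_id}]"


def format_residue(label_chain: str, auth_chain: str, residue_name: str,
                   label_seq_id: int, auth_seq_id: int) -> str:
    """Combine chain and residue number, each with its optional auth part."""
    chain = format_chain(label_chain, auth_chain)
    number = format_chain(str(label_seq_id), str(auth_seq_id))
    return f"{chain} {residue_name} {number}"


print(format_chain("C", "P"))                     # inhibitor #1 in the dimer example
print(format_chain("A", "A"))                     # a 4HHB protein chain
print(format_residue("F", "G", "Leu", 18, 33))    # the 1cbw example
```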



Please report any encountered broken links to info@rcsb.org
Last updated: 10/5/2024