RCSB PDB Help

Sequence Similarity Search

¡ð Results

Introduction

The order of amino acids in a protein or the protein sequence is thought to hold the key for protein folding, i.e., each protein sequence folds into a unique shape to perform its function(s). While this relationship between sequence and structure holds true in most of the cases, some folded proteins may adopt multiple conformations corresponding to different functional states and a few protein sequences can fold into entirely different shapes (e.g., prion proteins). Sequence comparison is the most commonly used strategy for identifying homology between proteins and/or protein domains.

What is Sequence Similarity Search?

The Sequence Similarity Search option allows you to query the PDB archive using the amino acid sequence of a protein. Although the most common method for sequence similarity searches is Basic Local Alignment Search Tool (BLAST, Altschul et al., 1990), the Sequence Similarity Search option available from RCSB.org uses the MMseqs2 software (Steinegger and S?ding, 2017) to find similar protein and nucleic acid sequences. The MMseqs2 tool is similar to BLAST, but achieves better performance at comparable levels of sensitivity. The search is dependent on a user-defined "sequence identity cutoff". During the search, each structure in the archive is aligned with the target structure, and the number of identical amino acids is tallied. If this number is greater than the cutoff, the structure is returned by the search.

Why run a Sequence Similarity Search?

Sequence similarity searches are widely used to find similar proteins in the archive and identify conserved domains in them.

Aligning a query sequence with other sequences in a database or aligning a group of similar sequences and statistically assessing how well they match can help identify homology between proteins. This can be used to transfer information and develop hypotheses about the protein¡¯s interactions and functions.

For example, a search of sequences similar to that of whale myoglobin using a high cutoff value for sequence identity (~90%) will return a list of additional structure determinations of the particular protein (i.e., the whale myoglobin). Using a lower cutoff value (~30%) for the same search will return a longer list that includes myoglobin structures from other organisms and other proteins with similar structure, e.g., cytoglobin.

Documentation

There are a few different options that can be combined to run a Sequence Similarity Search. These options are being listed here under 3 different sections:

  • Query - this will describe the options you have to inputting your query
  • Search - this will describe the search parameters and filters that can be applied.
  • Results - this will describe options available for what you wish to see in the results page.

Query Options

A Sequence Similarity Search query must be at least 25 residues long and be submitted in FASTA format (amino acid one letter code). For shorter sequences try the Sequence Motif Search. The sequence similarity search query can be submitted in two ways - either using the Advanced Search interface or from the Structure Summary Page.

Query using the Top Bar Search box

Pasting the one-letter code of a protein, DNA, or RNA sequence in the top bar search box is recognized as a sequence similarity search request (Figure 1A). This sequence is automatically transferred to the Advanced Search Query Builder¡¯s Sequence Similarity search and a search is performed for experimental structures and Computed Structure Models (CSMs) with the same or similar sequence (Figure 1B).

Figure 1: Options for sequence similarity search from the Top bar search box - A. paste FASTA sequence; B. the query sequence is transfered to the Sequence Similarity search options in the Advanced Search Query Builder to run the search. Note: Turn on the toggle switch to Include CSMs.
Figure 1: Options for sequence similarity search from the Top bar search box - A. paste FASTA sequence; B. the query sequence is transfered to the Sequence Similarity search options in the Advanced Search Query Builder to run the search. Note: Turn on the toggle switch to Include CSMs.

Query using the 'Advanced Search' panel

The sequence similarity search options available from the ¡°Advanced Search¡± panel are as follows:

The query can be submitted using the PDB ID. This is useful to find all sequences that are similar to the sequence from a specified chain.

  • Type in a structure ID (e.g., PDB ID or RCSB.org assigned CSM ID) into the appropriate box. Once you enter a valid structure ID the tool presents a list of macromolecular sequences with chain IDs in a pop-up box. Note that the sequences of all unique polymer entities are presented here (Figure 2A).
  • Select a Chain ID and sequence of interest from the pull-down menu. This selected sequence will be pasted into the Sequence box above the PDB ID box (Figure 2B)
Figure 2: Options for sequence similarity search from the Advanced Search Query Builder - A. input structure ID and select polymer chain of interest; B. the selected polymer chain sequence is populated in the Sequence similarity search box.
Figure 2: Options for sequence similarity search from the Advanced Search Query Builder - A. input structure ID and select polymer chain of interest; B. the selected polymer chain sequence is populated in the Sequence similarity search box.

The query can also be submitted as a sequence. This is useful when you have a FASTA format sequence on hand (either from a publication or a bioinformatics resource) to pasted inthe box provided (Figure 2B).

  • Paste the sequence in one-letter code format in the Sequence text box (See Panel B in figure above). Be sure to remove any other information that might be at the top of the pasted sequence (e.g., FASTA headers).

Query from the Structure Summary page

Each structure in the PDB has a dedicated Structure Summary page that displays information about the entities in that entry. To search for polymer entities with sequences similar to that of a specific polymer entity click on the ¡°Sequence¡± link above the details listed for the macromolecular entity and choose the desired sequence identity level from the drop-down (Figure 3). The search is run with the default E-value cutoff (0.1). This and other search options are explained further in the following section.

Figure 3: Options for launching a sequence similarity search from the structure summary page of a specific 3D structure.
Figure 3: Options for launching a sequence similarity search from the structure summary page of a specific 3D structure.

Search Options

Once the Query sequence has been provided, the following parameters can be specified to launch the search (Figure 4). If they are not specified, reasonable defaults will be used.

Figure 4: Sequence Similarity Search parameters.
Figure 4: Sequence Similarity Search parameters.
  • Search Target: There are three possible target databases for the sequence similarity search present in the PDB and in CSMs available from RCSB.org - (1) all current protein sequences, (2) all current DNA sequences and (3) all current RNA sequences.
  • E-value Cutoff: The E-value or Expect value is a parameter that describes the probability that the sequences matched by chance. The lower the E-value, or the closer it is to zero, the more "significant" the match is. Sequences that have an E-value higher than the set cutoff are filtered out of the results. The default E-value for Sequence Similarity Search is set at 0.1 and is usually indicated in scientific notation, (e.g., 1e-10). You can change this to a smaller or larger number to meet the needs of your search. Since the scoring takes chain length into consideration shorter sequences can have identical matches with high E-Value.
  • Identity Cutoff defines the extent of match between two sequences. The cutoff value is measured as a percentage, with a value between 0 and 100%. The default value is set to zero so all matched sequences are listed in the results. Select a level of identity for the sequence similarity search as needed. Matched sequences that have sequence identity lower than this cutoff are filtered out of the results.

Since the query for this search is a sequence, the results are expected to be sequences of polymer entities. Select the ¡°Polymer Entities¡± option for the Display Results as pull-down list, use the default option to exclude CSMs or turn on the toggle switch to include CSMs, and then launch the search by clicking on the Search button.

Results

At the top of the search results page, several options are available for displaying, reporting, sorting, and downloading all or selected entities, entries, and assemblies that match the query.

Figure 5: Options to display, report, sort, and download search results.
Figure 5: Options to display, report, sort, and download search results.

The default option for displaying the results is the Summary view. This includes a snapshot of the matched polymer entity, entry, or assembly, and lists the structure title, authors, brief information about the contents (polymer name, ligands), source organism and a few other details. Options are available to display the results in a Compact view, or as an image Gallery of the matched polymers, entries, or assemblies.

The Tabular Report options allow you to create detailed reports about the matched entries, entities, or assemblies. Using the sorting options, you can order the results by score, date of release, etc., while the download options allow users to download all or selected entries, entities, and assemblies.

The ¡°Alignment Reference¡± options are unique to the sequence similarity search results. These options allow you to change the interactive graphical summaries of the sequence alignment. Learn more about these options and the color scheme used for alignment here.

Polymer entities that match the query and search parameters are listed on the Results page using the default ¡°Alignment Reference¡± option (Figure 6).

Figure 6: Part of a sequence similarity search results page showing an experimental structure and a CSM that matched the query. The inset shows a zoomed in view of the sequence alignment including an insertion and several mismatches. Sequence similarity measures are highlighted with red arrows.
Figure 6: Part of a sequence similarity search results page showing an experimental structure and a CSM that matched the query. The inset shows a zoomed in view of the sequence alignment including an insertion and several mismatches. Sequence similarity measures are highlighted with red arrows.

The following values are presented to qualify the extent of the match:

  • Sequence Identity is the ratio of the number of identical amino acids between the 2 aligned sequences over the aligned length, expressed as a percentage.
  • E-Value provides a measure for whether the observed sequence identity was a chance match or if it has any evolutionary significance. This is reported as a number close to zero. The lower the E-Value, or the closer it is to zero, the more "significant" the match is. An E-Value of a significant match is often expressed in scientific notation, where the higher the exponent¡¯s power the more significant the match (i.e., 1e-80 is a more significant match than 1e-30). Note that short query sequences may be seen in many other protein sequences by chance so these matches may have a high E-Value.
  • Region specifies the amino acid residue numbers of the part of the protein that was considered in the sequence comparison.

You can explore the sequence match using the interactive alignment displayed. Zoom into the sequence and move the sequence left or right to examine sequence identities, mismatches, insertions, deletions or missing residues etc.

You can also download and view the matched files and/or the alignments.

Limitations of Sequence Similarity Search

Sequence similarity searches with short query sequences can match many sequences and provide false evolutionary relationships. This should be carefully examined and considered when making any conclusions about the matched sequences. For short sequences, the Sequence motif search may be more suitable.

Examples

1. Search for sequences that are 90% similar to the beta-lactamase in the PDB entry 1blc using the Advanced Search interface.

  • Launch this search from the Advanced Search interface for PDB ID 1blc, Chain ID A.
  • Provide the Identity Cutoff as 90 and select the Display results options from the pulldown menu to be Polymer Entities, and launch the search
Figure 7: Setting up the sequence similarity search for the sequence in Chain A of PDB entry 1blc, at 90% sequence identity. Turn on the toggle switch to include CSMs.
Figure 7: Setting up the sequence similarity search for the sequence in Chain A of PDB entry 1blc, at 90% sequence identity. Turn on the toggle switch to include CSMs.
  • The search results include several other beta lactamase entities but all from the same organism, Staphylococcus aureus.

2. Search for sequences that are 30% similar to the beta-lactamase in the PDB entry 1blc from the Structure Summary page (of PDB entry 1blc)

  • Launch this search from the Structure Summary page for the PDB entry 1blc, Macromolecule Entity 1 (BETA-LACTAMASE).
  • Click on the Sequence button and select the 30% option in the pulldown menu.
Figure 8: Launching a sequence similarity search from the Structure Summary Page for PDB entry 1blc.
Figure 8: Launching a sequence similarity search from the Structure Summary Page for PDB entry 1blc.
  • The search results include many other beta-lactamase structures from different organisms and specificities.

References

Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of molecular biology, 215(3), 403¨C410. https://doi.org/10.1016/S0022-2836(05)80360-2

Steinegger, M., S?ding, J. (2017). MMseqs2 enables sensitive protein sequence similarity searching for the analysis of massive data sets. Nat Biotechnol 35, 1026¨C1028. https://doi.org/10.1038/nbt.3988



Please report any encountered broken links to info@rcsb.org
Last updated: 11/27/2023
universo-virtual.com
buytrendz.net
thisforall.net
benchpressgains.com
qthzb.com
mindhunter9.com
dwjqp1.com
secure-signup.net
ahaayy.com
soxtry.com
tressesindia.com
puresybian.com
krpano-chs.com
cre8workshop.com
hdkino.org
peixun021.com
qz786.com
utahperformingartscenter.org
maw-pr.com
zaaksen.com
ypxsptbfd7.com
worldqrmconference.com
shangyuwh.com
eejssdfsdfdfjsd.com
playminecraftfreeonline.com
trekvietnamtour.com
your-business-articles.com
essaywritingservice10.com
hindusamaaj.com
joggingvideo.com
wandercoups.com
onlinenewsofindia.com
worldgraphic-team.com
bnsrz.com
wormblaster.net
tongchengchuyange0004.com
internetknowing.com
breachurch.com
peachesnginburlesque.com
dataarchitectoo.com
clientfunnelformula.com
30pps.com
cherylroll.com
ks2252.com
webmanicura.com
osostore.com
softsmob.com
sofietsshotel.com
facetorch.com
nylawyerreview.com
apapromotions.com
shareparelli.com
goeaglepointe.com
thegreenmanpubphuket.com
karotorossian.com
publicsensor.com
taiwandefence.com
epcsur.com
odskc.com
inzziln.info
leaiiln.info
cq-oa.com
dqtianshun.com
southstills.com
tvtv98.com
thewellington-hotel.com
bccaipiao.com
colectoresindustrialesgs.com
shenanddcg.com
capriartfilmfestival.com
replicabreitlingsale.com
thaiamarinnewtoncorner.com
gkmcww.com
mbnkbj.com
andrewbrennandesign.com
cod54.com
luobinzhang.com
bartoysdirect.com
taquerialoscompadresdc.com
aaoodln.info
amcckln.info
drvrnln.info
dwabmln.info
fcsjoln.info
hlonxln.info
kcmeiln.info
kplrrln.info
fatcatoons.com
91guoys.com
signupforfreehosting.com
faithfirst.net
zjyc28.com
tongchengjinyeyouyue0004.com
nhuan6.com
oldgardensflowers.com
lightupthefloor.com
bahamamamas-stjohns.com
ly2818.com
905onthebay.com
fonemenu.com
notanothermovie.com
ukrainehighclassescort.com
meincmagazine.com
av-5858.com
yallerdawg.com
donkeythemovie.com
corporatehospitalitygroup.com
boboyy88.com
miteinander-lernen.com
dannayconsulting.com
officialtomsshoesoutletstore.com
forsale-amoxil-amoxicillin.net
generictadalafil-canada.net
guitarlessonseastlondon.com
lesliesrestaurants.com
mattyno9.com
nri-homeloans.com
rtgvisas-qatar.com
salbutamolventolinonline.net
sportsinjuries.info
topsedu.xyz
xmxm7.com
x332.xyz
sportstrainingblog.com
autopartspares.com
readguy.net
soniasegreto.com
bobbygdavis.com
wedsna.com
rgkntk.com
bkkmarketplace.com
zxqcwx.com
breakupprogram.com
boxcardc.com
unblockyoutubeindonesia.com
fabulousbookmark.com
beat-the.com
guatemala-sailfishing-vacations-charters.com
magie-marketing.com
kingstonliteracy.com
guitaraffinity.com
eurelookinggoodapparel.com
howtolosecheekfat.net
marioncma.org
oliviadavismusic.com
shantelcampbellrealestate.com
shopleborn13.com
topindiafree.com
v-visitors.net
qazwsxedcokmijn.com
parabis.net
terriesandelin.com
luxuryhomme.com
studyexpanse.com
ronoom.com
djjky.com
053hh.com
originbluei.com
baucishotel.com
33kkn.com
intrinsiqresearch.com
mariaescort-kiev.com
mymaguk.com
sponsored4u.com
crimsonclass.com
bataillenavale.com
searchtile.com
ze-stribrnych-struh.com
zenithalhype.com
modalpkv.com
bouisset-lafforgue.com
useupload.com
37r.net
autoankauf-muenster.com
bantinbongda.net
bilgius.com
brabustermagazine.com
indigrow.org
miicrosofts.net
mysmiletravel.com
selinasims.com
spellcubesapp.com
usa-faction.com
snn01.com
hope-kelley.com
bancodeprofissionais.com
zjccp99.com
liturgycreator.com
weedsmj.com
majorelenco.com
colcollect.com
androidnews-jp.com
hypoallergenicdogsnames.com
dailyupdatez.com
foodphotographyreviews.com
cricutcom-setup.com
chprowebdesign.com
katyrealty-kanepa.com
tasramar.com
bilgipinari.org
four-am.com
indiarepublicday.com
inquick-enbooks.com
iracmpi.com
kakaschoenen.com
lsm99flash.com
nana1255.com
ngen-niagara.com
technwzs.com
virtualonlinecasino1345.com
wallpapertop.net
nova-click.com
abeautifulcrazylife.com
diggmobile.com
denochemexicana.com
eventhalfkg.com
medcon-taiwan.com
life-himawari.com
myriamshomes.com
nightmarevue.com
allstarsru.com
bestofthebuckeyestate.com
bestofthefirststate.com
bestwireless7.com
declarationintermittent.com
findhereall.com
jingyou888.com
lsm99deal.com
lsm99galaxy.com
moozatech.com
nuagh.com
patliyo.com
philomenamagikz.net
rckouba.net
saturnunipessoallda.com
tallahasseefrolics.com
thematurehardcore.net
totalenvironment-inthatquietearth.com
velislavakaymakanova.com
vermontenergetic.com
sizam-design.com
kakakpintar.com
begorgeouslady.com
1800birks4u.com
2wheelstogo.com
6strip4you.com
bigdata-world.net
emailandco.net
gacapal.com
jharpost.com
krishnaastro.com
lsm99credit.com
mascalzonicampani.com
sitemapxml.org
thecityslums.net
topagh.com
flairnetwebdesign.com
bangkaeair.com
beneventocoupon.com
noternet.org
oqtive.com
smilebrightrx.com
decollage-etiquette.com
1millionbestdownloads.com
7658.info
bidbass.com
devlopworldtech.com
digitalmarketingrajkot.com
fluginfo.net
naqlafshk.com
passion-decouverte.com
playsirius.com
spacceleratorintl.com
stikyballs.com
top10way.com
yokidsyogurt.com
zszyhl.com
16firthcrescent.com
abogadolaboralistamd.com
apk2wap.com
aromacremeria.com
banparacard.com
bosmanraws.com
businessproviderblog.com
caltonosa.com
calvaryrevivalchurch.org
chastenedsoulwithabrokenheart.com
cheminotsgardcevennes.com
cooksspot.com
cqxzpt.com
deesywig.com
deltacartoonmaps.com
despixelsetdeshommes.com
duocoracaobrasileiro.com
fareshopbd.com
goodpainspills.com
kobisitecdn.com
makaigoods.com
mgs1454.com
piccadillyresidences.com
radiolaondafresca.com
rubendorf.com
searchengineimprov.com
sellmyhrvahome.com
shugahouseessentials.com
sonihullquad.com
subtractkilos.com
valeriekelmansky.com
vipasdigitalmarketing.com
voolivrerj.com
zeelonggroup.com
1015southrockhill.com
10x10b.com
111-online-casinos.com
191cb.com
3665arpentunitd.com
aitesonics.com
bag-shokunin.com
brightotech.com
communication-digitale-services.com
covoakland.org
dariaprimapack.com
freefortniteaccountss.com
gatebizglobal.com
global1entertainmentnews.com
greatytene.com
hiroshiwakita.com
iktodaypk.com
jahatsakong.com
meadowbrookgolfgroup.com
newsbharati.net
platinumstudiosdesign.com
slotxogamesplay.com
strikestaruk.com
trucosdefortnite.com
ufabetrune.com
weddedtowhitmore.com
12940brycecanyonunitb.com
1311dietrichoaks.com
2monarchtraceunit303.com
601legendhill.com
850elaine.com
adieusolasomade.com
andora-ke.com
bestslotxogames.com
cannagomcallen.com
endlesslyhot.com
iestpjva.com
ouqprint.com
pwmaplefest.com
qtylmr.com
rb88betting.com
buscadogues.com
1007macfm.com
born-wild.com
growthinvests.com
promocode-casino.com
proyectogalgoargentina.com
wbthompson-art.com
whitemountainwheels.com
7thavehvl.com
developmethis.com
funkydogbowties.com
travelodgegrandjunction.com
gao-town.com
globalmarketsuite.com
blogshippo.com
hdbka.com
proboards67.com
outletonline-michaelkors.com
kalkis-research.com
thuthuatit.net
buckcash.com
hollistercanada.com
docterror.com
asadart.com
vmayke.org
erwincomputers.com
dirimart.org
okkii.com
loteriasdecehegin.com
mountanalog.com
healingtaobritain.com
ttxmonitor.com
bamthemes.com
nwordpress.com
11bolabonanza.com
avgo.top