RCSB PDB Help

Sequence Motif Search

Introduction

What is a sequence motif?

Sequence motifs are short, conserved segments of protein or nucleic acid sequence that are present in many proteins or genes (respectively) and are believed to have specific functional significance.
In some cases, the entire set of amino acids or nucleotides in the sequence is conserved and required to perform the specific function.
In other cases, only the amino acids or nucleotides at specific locations in the sequence motif are conserved and significant for the function.

What is a Sequence Motif Search?

The sequence motif search option allows you to query for amino acid or nucleotide sequence fragments in a FASTA sequence that appear frequently in polymers present in 3D structures.

Why run a Sequence Motif Search?

Finding a specific sequence motif in a protein or nucleic acid suggests that it may have the function associated with the motif; that is, motif searches can be used to predict function(s).
A sequence motif search also differs from a regular similarity-based sequence search (e.g., BLAST) in two ways:

  • The sequence defining the sequence motif is short (so the similarity-based searches will not work effectively)
  • Parts of the sequence motif may have alternate sequences or may not be conserved at all (so specific conditions have to be included in the query for defining the non-contiguous conserved amino acids/nucleotides in the sequence motif)

Running a search

The sequence motif search options are available from the Advanced Search Query builder (Figure 1).

Figure 1: Interface to specify a sequence motif search for Protein, DNA, or RNA sequences in different formats and find polymer entities that match the query. If appropriate, turn on toggle switch to include CSMs in the search.

Query options

  • Set the Sequence Type to search for protein (amino acid), DNA, or RNA sequences.

  • Set the Mode to specify the motif syntax as simple, PROSITE, or regex.

  • Set the Data Source to search among experimental records only, computational records, or both.
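The three options above can also be written down as a request payload for programmatic use. The sketch below is a minimal illustration, not the documented API: the field names (`service`, `pattern_type`, `sequence_type`, `return_type`, `results_content_type`) and their values are assumptions based on one reading of the RCSB Search API and should be checked against its reference before use. The helper `build_motif_query` is hypothetical, and nothing is sent over the network.

```python
import json

# Allowed values mirror the three query options described above.
MODES = {"simple", "prosite", "regex"}
SEQUENCE_TYPES = {"protein", "dna", "rna"}
DATA_SOURCES = {"experimental", "computational"}


def build_motif_query(pattern, mode="simple", sequence_type="protein",
                      data_sources=("experimental",)):
    """Build a request body for a sequence motif search.

    All JSON field names below are ASSUMPTIONS about the Search API schema.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if sequence_type not in SEQUENCE_TYPES:
        raise ValueError(f"unknown sequence type: {sequence_type!r}")
    if not data_sources or not set(data_sources) <= DATA_SOURCES:
        raise ValueError(f"unknown data source in: {data_sources!r}")
    return {
        "query": {
            "type": "terminal",
            "service": "seqmotif",              # assumed service name
            "parameters": {
                "value": pattern,
                "pattern_type": mode,           # assumed field name
                "sequence_type": sequence_type,  # assumed field name
            },
        },
        # Motif matches are reported per polymer entity.
        "return_type": "polymer_entity",
        "request_options": {"results_content_type": list(data_sources)},
    }


payload = build_motif_query("[AG].{4}GK[ST]", mode="regex")
print(json.dumps(payload, indent=2))
```

Passing `data_sources=("experimental", "computational")` corresponds to searching both experimental records and CSMs.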

Sequence Type

In all three Modes, amino acid residue (or nucleotide) types are specified using one-letter codes, which are defined by IUPAC. For example: for amino acid sequences, R is arginine; for RNA sequences, U is uracil. Nucleotide sequences also support so-called ambiguous codes; for example, S is either cytosine or guanine. Only Simple and PROSITE Modes support ambiguous codes. Below is a full reference of one-letter codes.

Queries are case-insensitive for all three Modes: ATGC and atgc are identical. (This also applies to X and x in Simple and PROSITE Modes.)

Tables of one-letter codes

Nucleotide Codes

code  meaning   note
A     adenine
C     cytosine
G     guanine
T     thymine   1
U     uracil    1
B     C/G/T/U   2
D     A/G/T/U   2
H     A/C/T/U   2
K     G/T       2
M     A/C       2
R     A/G       2
S     C/G       2
V     A/C/G     2
W     A/T/U     2
Y     C/T/U     2
N     any base  2

1 T is restricted to DNA; U is restricted to RNA.

2 Termed ambiguous; only supported in Simple and PROSITE Modes.

Amino Acid Codes

code  meaning
A     alanine
C     cysteine
D     aspartic acid
E     glutamic acid
F     phenylalanine
G     glycine
H     histidine
I     isoleucine
K     lysine
L     leucine
M     methionine
N     asparagine
P     proline
Q     glutamine
R     arginine
S     serine
T     threonine
V     valine
W     tryptophan
Y     tyrosine

Simple Mode

Input a sequence of one or more one-letter codes. Ambiguous nucleotide codes are supported, and the wildcard symbol (X) can be used to represent any amino acid or nucleotide. Use < and > to match the N- and C-termini, respectively.

Examples

  • XPPXP (protein): SH3 domains (any → proline → proline → any → proline)
  • YYY (DNA): 3× cytosine/thymine
  • <SSS: any sequence that starts with 3× serine
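To see what a Simple Mode motif means, it can help to translate it into an ordinary regular expression and test it on a local sequence. The sketch below is an illustration under stated assumptions, not the service's implementation: `simple_to_regex` is a hypothetical helper, the ambiguity mapping is copied from the nucleotide table on this page, and Python's `re` module stands in for whatever engine the service uses.

```python
import re

# Unambiguous one-letter codes per sequence type.
LITERALS = {"protein": "ACDEFGHIKLMNPQRSTVWY", "dna": "ACGT", "rna": "ACGU"}

# Ambiguous nucleotide codes (written with T; T becomes U for RNA).
AMBIGUOUS = {
    "B": "CGT", "D": "AGT", "H": "ACT", "K": "GT", "M": "AC", "R": "AG",
    "S": "CG", "V": "ACG", "W": "AT", "Y": "CT", "N": "ACGT",
}


def simple_to_regex(motif, sequence_type="protein"):
    """Translate a Simple Mode motif into a Python regex (local sketch only)."""
    text = motif.strip().upper()          # queries are case-insensitive
    starts, ends = text.startswith("<"), text.endswith(">")
    core = text[1:] if starts else text
    core = core[:-1] if ends else core
    if not core:
        raise ValueError("the motif needs at least one code")
    parts = []
    for code in core:
        if code == "X":                   # wildcard: any residue
            parts.append(".")
        elif code in LITERALS[sequence_type]:
            parts.append(code)
        elif sequence_type != "protein" and code in AMBIGUOUS:
            bases = AMBIGUOUS[code]
            if sequence_type == "rna":
                bases = bases.replace("T", "U")
            parts.append("[" + bases + "]")
        else:
            raise ValueError(f"unsupported code {code!r} for {sequence_type}")
    return ("^" if starts else "") + "".join(parts) + ("$" if ends else "")


print(simple_to_regex("XPPXP"))        # .PP.P
print(simple_to_regex("YYY", "dna"))   # [CT][CT][CT]
print(simple_to_regex("<SSS"))         # ^SSS
```

For example, `re.search(simple_to_regex("XPPXP"), "MAPPLPK")` finds the fragment APPLP in a made-up sequence.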

PROSITE Mode

Complex queries can be expressed using PROSITE patterns. A PROSITE pattern is composed of one or more atoms, optionally separated by hyphens (-). The sequence is optionally terminated by a period (.).

X can be used to stand in for any amino acid or nucleotide type, and ambiguous nucleotide codes (e.g., B) are supported.

Note that this syntax is a superset of classic PROSITE: The search supports some patterns that may not be accepted by other tools, such as EXPASY ScanProsite. For complete information, refer to the PROSITE extended information.

Atom types

Each atom is one of seven types:

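Literal
A one-letter code (e.g., A). This matches exactly 1 residue.
Wildcard (X)
X, which stands in for any amino acid or nucleotide type. This matches exactly 1 residue.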
Any-of ([])
One or more codes enclosed in [], such as [ATC]. This matches exactly 1 residue whose code is listed.
None-of ({})
One or more codes enclosed in {}, such as {ATC}. This matches exactly 1 residue whose code is not listed.
N-terminus (<)
An N-terminal marker, <, indicating the start of the sequence. If included, this must be the first element.
C-terminus (>)
A C-terminal marker, >, indicating the end of the sequence. If included, this must be the last element.
Any-of / C-terminus (e.g., [A>])
A variable C-terminal element, such as [>AC], [A>C], or [AC>] (equivalently). This matches either the end of the sequence or exactly 1 residue among those listed (but not both).

Quantifiers

Each literal, wildcard, any-of, and none-of element may be followed by a quantifier to match the preceding element some number of times. The quantifier is enclosed in () and can be Exact, Minimum, or Range:

Exact
A(2) matches exactly AA.
Minimum
A(2,) matches AA, AAA, … .
Range
A(2,4) matches AA, AAA, and AAAA.
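The atom types and quantifiers above map fairly directly onto regular expression syntax. The following sketch converts a PROSITE-style pattern into a Python regex so it can be tried on local sequences. It is a simplified illustration, not the parser used by the service: `prosite_to_regex` is a hypothetical helper, it does not expand ambiguous nucleotide codes, and it does not enforce every validity rule.

```python
import re

# One alternative per atom type described above.
TOKEN = re.compile(r"""
    (?P<nterm><)                  # N-terminus marker
  | (?P<cterm>>)                  # C-terminus marker
  | \[(?P<anyof>[A-Z>]+)\]        # any-of, optionally containing '>'
  | \{(?P<noneof>[A-Z]+)\}        # none-of
  | (?P<code>[A-Z])               # literal code, or X as the wildcard
""", re.VERBOSE)

# Quantifiers: (n), (m,) or (m,n).
QUANT = re.compile(r"\((\d+)(?:(,)(\d*))?\)")


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern to a Python regex (local sketch only)."""
    # Case, spaces and hyphens are ignored; a trailing period is optional.
    text = re.sub(r"[\s-]", "", pattern.upper()).rstrip(".")
    out, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        pos = m.end()
        if m.group("nterm"):
            out.append("^")
            continue
        if m.group("cterm"):
            out.append("$")
            continue
        if m.group("anyof"):
            codes = m.group("anyof")
            if ">" in codes:
                rest = codes.replace(">", "")
                if not rest:
                    raise ValueError("[>] needs at least one code")
                atom = f"(?:[{rest}]|$)"   # a listed residue OR the sequence end
            else:
                atom = f"[{codes}]"
        elif m.group("noneof"):
            atom = f"[^{m.group('noneof')}]"
        else:
            atom = "." if m.group("code") == "X" else m.group("code")
        q = QUANT.match(text, pos)
        if q:
            pos = q.end()
            low, comma, high = q.groups()
            atom += "{" + low + (("," + high) if comma else "") + "}"
        out.append(atom)
    return "".join(out)


print(prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."))
# C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
print(prosite_to_regex("<A(2,)-{P}-[ST>]"))
# ^A{2,}[^P](?:[ST]|$)
```

The resulting strings are ordinary Python regexes; the Regex Mode of the search has its own restrictions, so they are meant for local experimentation rather than for pasting into the query builder unchanged.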

Regex Mode

Regular expressions (regex) are also supported. This option is more powerful than PROSITE and may be familiar to programmers. Note that the service may refuse to process some queries.

A regex pattern contains one or more atoms, each with an optional quantifier. | denotes a logical or, and () groups atoms into groups.

Ambiguous nucleotide codes are not supported, nor is X. Use . instead of X, and use [CGT] (for DNA) or [CGU] (for RNA) instead of B.

Examples

  • W.{7}G.{20}L matches tryptophan → 7×any → glycine → 20×any → leucine.
  • C.{2,4}C.{12}H.{3,5}H matches the zinc finger motif that binds Zn in a DNA-binding domain.
  • ^H+$ matches N-terminus → 1+ histidine → C-terminus.
  • [AG].{4}GK[ST] matches the Walker (P loop) motif that binds ATP or GTP.
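Patterns like the ones above can be tried on a local sequence before submitting a search. This is a minimal sketch using Python's `re` module, which is not the engine behind the service and may accept patterns the service rejects. The helper `find_motif` and the toy sequence are made up for illustration, and the reported positions are 1-based indexes into that toy string, not PDBx/mmCIF numbering.

```python
import re


def find_motif(pattern, sequence):
    """Return (start, end, text) for each match, 1-based and inclusive."""
    return [(m.start() + 1, m.end(), m.group())
            for m in re.finditer(pattern, sequence.upper())]


# A made-up sequence with a Walker-like stretch and a His tag at the end.
toy = "MSTAGKSGSGKSTLLNALKHHHHHH"

print(find_motif(r"[AG].{4}GK[ST]", toy))   # [(5, 12, 'GKSGSGKS')]
print(find_motif(r"H+$", toy))              # [(20, 25, 'HHHHHH')]
print(find_motif(r"^H+$", toy))             # [] (the sequence is not all histidine)
```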

Viewing results

Pre-search checklist

Before running the search, remember to do the following:
  • change the result return option to Polymer entities
  • decide whether to include CSMs (default) or exclude them (by turning off the toggle switch next to the Search button).

Result options

The search results display the numbering for the sequence match region (corresponding to PDBx/mmCIF file numbering) (Figure 2).
Click on the 3D View button included for each matched result to view the structure interactively in 3D.
The matched region specified in the results can be examined closely.

Figure 2: Part of the query results page for a sequence motif search showing the regions of the polymer entity that match the query sequence motif in a red box. Clicking on the 3D View button marked with red arrows opens the structure in Mol*.

Example searches


Extended (advanced) information

PROSITE Mode details

Terminology

This documentation uses the following definitions.

Atom
A PROSITE syntax item to match against 1 residue or nucleotide: a literal (e.g., A), gap (x), any-of (e.g., [CG]), none-of (e.g., {AT}), N-terminus (<), C-terminus (>), or any-of / C-terminus (e.g., [>AT]).
Term
An atom with its quantifier, if any

Nonstandard but allowed in RCSB

RCSB PROSITE is more forgiving than standard PROSITE; it differs in the following ways:
  • Case is ignored. A and a are the same, as are X and x.
  • Range quantifiers ((x,y)) may be used for all atoms, not just gaps (x). For example, A(1,4) matches between 1 and 4 alanines. In contrast, standard PROSITE only permits, for example, x(1,4).
  • Hyphens (-) may be omitted, even with one-letter nucleotide codes, such as B. Hyphens are ignored as long as they are used in valid positions.

RCSB-specific rules

Some parts of the PROSITE specification can be interpreted in multiple ways. RCSB PROSITE has decided on these rules:
  • Spaces (characters with Unicode category Zs) are ignored when in reasonable positions. For example, A T(1, 3) is allowed.
  • The query must contain at least 1 atom. (<, >, <>, and the empty string are forbidden.)
  • Any-of matches ([]) require at least 1 character.
  • None-of matches ({}) cannot include every one-letter code. {ATGC} is invalid for DNA sequences (and could never match a sequence).
  • An exact quantifier (n) is allowed if and only if n ≥ 1.
  • A range quantifier (m, n) is allowed if and only if n ≥ m and m > 0.
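Two of the rules above are easy to check before a query is submitted. The sketch below restates the quantifier rules and the none-of rule as small Python predicates; both function names are hypothetical, and the code only mirrors the rules as written here rather than the service's actual validation.

```python
def quantifier_is_valid(minimum, maximum=None):
    """Exact quantifier (n): pass one value. Range quantifier (m, n): pass both."""
    if maximum is None:
        return minimum >= 1                      # exact: n >= 1
    return minimum > 0 and maximum >= minimum    # range: m > 0 and n >= m


def none_of_is_valid(codes, alphabet):
    """A none-of atom must list at least 1 code and must not exclude every code."""
    listed = set(codes.upper())
    return bool(listed) and not set(alphabet.upper()) <= listed


print(quantifier_is_valid(0))             # False
print(quantifier_is_valid(1, 4))          # True
print(quantifier_is_valid(3, 2))          # False
print(none_of_is_valid("ATGC", "ACGT"))   # False: could never match DNA
print(none_of_is_valid("AT", "ACGT"))     # True
```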

Formal grammar

This grammar uses RFC 5234 ABNF.

query            = start *(['-'] term) ['-' end] ['.']
                 ;  ^          ^           ^
                 ; required    0 or more   optional

start            = term / (nterm non-gap-term) / (nterm gap)
                 ; EITHER: A term (1+ elements) without an N-term
                 ; OR: N-term with non-gap term (1+ elements)
                 ; OR: single gap (1 element; non-repeated)
end              = term / (non-gap-term cterm) / (gap cterm)

term             = element [count / range]
element          = code / any-of / none-of / gap
non-gap-term     = non-gap-element [count / range]
non-gap-element  = code / any-of / none-of

aa               = "a one-letter code"
gap              = 'x'
                 ; Matches any single residue
any-of           = '[' 1*aa ']'
                 ; Matches any single residue included in []
                 ; For example, [ACE] matches A, C, or E
none-of          = '{' 1*aa '}'
                 ; Matches any single residue NOT included in {}

count            = '(' natural ')'
                 ; An exact number of times to repeat the preceding element
                 ; For example, [AW](3) is equivalent to [AW][AW][AW]
range            = '(' number ',' natural ')'
                 ; A min and max number of times to repeat the preceding element
                 ; For example, A(1,3) matches A, AA, and AAA
                 ; Note: min must be less than max

nterm            = '<'
                 ; Matches the sequence start (N-terminus)
cterm            = cterm-literal / cterm-or-any-of
cterm-literal    = '>'
                 ; Matches the sequence end (C-terminus)
cterm-or-any-of  = '[' ((1*aa '>' *aa) / (*aa '>' 1*aa)) ']'
                 ; Matches either the sequence end (C-terminus),
                 ; OR an aa included in [] / an aa not included in {}
                 ; For example, [A>] matches either the sequence end or A.
                 ; Valid examples: [A>], [>A], [A>C], [ACDE>]
                 ; Invalid examples: [A>>], [A>C>], [>], []

number           = 1*DIGIT
natural          = NONZERO *DIGIT
NONZERO          = %x31-39

Regex Mode details

Supported and non-supported constructs

The query syntax is IEEE POSIX Extended Regular Expressions. Nearly all of the standard is supported, including advanced constructs like lookarounds and backreferences.

However, a few things are not supported. Most notably, characters that are not in one-letter codes are not allowed in literals or character classes. For example, Z and [A-Z] would result in an error. Named character classes, such as \s and \p{Alpha}, are also not supported.

Queries that will be rejected

In addition, the service will not allow expressions that could seriously degrade performance. Specifically, these are expressions with non-polynomial worst-case runtime or space complexity. The service will reject:
  • patterns that use non-possessive, inexact quantifiers on groups that match a variable number of characters, n, n > 1;
  • patterns that use quantifiers on groups in certain other ways;
  • patterns that use lazy, inexact quantifiers excessively or in certain ways;
  • patterns that use lookarounds excessively or in certain ways;
  • patterns with non-polynomial worst-case runtime or memory requirements; and
  • patterns with excessive total complexity.
API users should also note rare failure types: service-wide limits (HTTP 503), excessive query duration (504), and excessive querying (429).
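For API clients, the three failure types above can be turned into readable messages. This is a minimal sketch: the status codes and their meanings come from the sentence above, while the suggested actions and the helper name `explain_status` are assumptions of this example, not documented service behaviour.

```python
# Meanings follow the text above; the suggested actions are assumptions.
FAILURE_HINTS = {
    429: ("excessive querying", "wait, then send requests less often"),
    503: ("service-wide limit reached", "retry later"),
    504: ("excessive query duration", "simplify the pattern before retrying"),
}


def explain_status(status_code):
    """Describe an HTTP status code returned for a motif query."""
    if status_code in FAILURE_HINTS:
        reason, action = FAILURE_HINTS[status_code]
        return f"HTTP {status_code}: {reason}; {action}"
    if 200 <= status_code < 300:
        return f"HTTP {status_code}: request accepted"
    return f"HTTP {status_code}: not one of the rare failure types listed here"


for code in (200, 429, 503, 504):
    print(explain_status(code))
```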

Tips to simplify queries

Follow these guidelines to avoid a query being rejected.

  • Do not use lazy quantifiers.
  • Avoid lookarounds.
  • When applying quantifiers to groups, make sure the group is simple and only use either greedy ? or (preferably) a possessive quantifier.
  • Use possessive quantifiers where possible.
  • Do not begin or end a sequence with .*, ^.*, .*$, or similar.
Note that you can replace most uses of a lazy quantifier with one or two greedy quantifiers.
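As an illustration of that last note, a lazy quantifier that stops at the first occurrence of a residue can usually be rewritten as a greedy quantifier over a negated character class. The sketch below compares the two forms locally with Python's `re` module on a made-up sequence; whether the service accepts a particular rewritten pattern is not something this example can confirm.

```python
import re

seq = "MADEFCGHC"   # made-up sequence

lazy = re.search(r"A.*?C", seq)       # lazy: stop at the first C after A
greedy = re.search(r"A[^C]*C", seq)   # greedy over "anything but C"
naive = re.search(r"A.*C", seq)       # plain greedy: runs to the LAST C

print(lazy.group())     # ADEFC
print(greedy.group())   # ADEFC (same match, no lazy quantifier needed)
print(naive.group())    # ADEFCGHC
```

The negated class makes the stopping point explicit, so the engine does not need to try progressively longer matches.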


Please report any encountered broken links to info@rcsb.org
Last updated: 2/22/2024