Molecular markers for genebank management
ISBN-13: 978-92-9043-684-3
ISBN-10: 92-9043-684-0
IPGRI TECHNICAL BULLETIN NO. 10
D. Spooner, R. van Treuren and M. C. de Vicente
IPGRI is a Future Harvest Centre supported by the Consultative Group on International Agricultural Research (CGIAR)
IPGRI Technical Bulletins are published by the International 
Plant Genetic Resources Institute with the intention of putting 
forward definitive recommendations for techniques in genetic 
resources. They are specifically aimed at National Programme and 
genebank personnel.
Previous titles in this series:
A protocol to determine seed storage behaviour
T.D. Hong and R.H. Ellis
IPGRI Technical Bulletin No. 1, 1996.
Molecular tools in plant genetic resources conservation: a guide 
to the technologies 
A. Karp, S. Kresovich, K.V. Bhat, W.G. Ayad and T. Hodgkin
IPGRI Technical Bulletin No. 2, 1997.
Core collections of plant genetic resources
Th.J.L. van Hintum, A.H.D. Brown, C. Spillane and T. Hodgkin
IPGRI Technical Bulletin No. 3, 2000.
Design and analysis of evaluation trials of genetic resources 
collections
Statistical Services Centre and University of Reading
IPGRI Technical Bulletin No. 4, 2001.
Accession management: combining or splitting accessions as a tool 
to improve germplasm management efficiency
N.R. Sackville Hamilton, J.M.M. Engels, Th.J.L. van Hintum, B. Koo and M. Smale
IPGRI Technical Bulletin No. 5, 2002.
Forest tree seed health
J.R. Sutherland, M. Diekmann and P. Berjak
IPGRI Technical Bulletin No. 6, 2002.
In vitro collecting techniques for germplasm conservation
V.C. Pence, J.A. Sandoval, V.M. Villalobos A. and F. Engelmann
IPGRI Technical Bulletin No. 7, 2002.
Análisis estadístico de datos de caracterización morfológica
T.L. Franco y R. Hidalgo
IPGRI Technical Bulletin No. 8, 2002.
A methodological model for ecogeographic surveys of crops
L. Guarino, N. Maxted and E.A. Chiwona, editors
IPGRI Technical Bulletin No. 9, 2005.
Copies can be obtained in PDF format from IPGRI’s Website (www.ipgri.cgiar.org) or in printed format by sending a request to ipgri-publications@cgiar.org.
Molecular markers for genebank management
D. Spooner¹, R. van Treuren² and M. C. de Vicente³
¹ USDA, Agricultural Research Service, Department of Horticulture, University of Wisconsin, 1575 Linden Drive, Madison, Wisconsin 53706-1590, USA
² Centre for Genetic Resources, The Netherlands (CGN), Wageningen University and Research Centre, P.O. Box 16, 6700 AA Wageningen, The Netherlands
³ International Plant Genetic Resources Institute, IPGRI Office for the Americas, A.A. 6713, Cali, Colombia
Introduction to the Series
The Technical Bulletin series is targeted at scientists and technicians 
managing genetic resources collections. Each title will aim to provide 
guidance on choices while implementing conservation techniques 
and procedures and in the experimentation required to adapt these 
to local operating conditions and target species. Techniques are 
discussed and, where relevant, options presented and suggestions 
made for experiments. The Technical Bulletins are authored by 
scientists working in the genetic resources area. IPGRI welcomes 
suggestions of topics for future volumes. In addition, IPGRI would 
encourage, and is prepared to support, the exchange of research 
findings obtained at the various genebanks and laboratories.
The International Plant Genetic Resources Institute (IPGRI) is an independent 
international scientific organization that seeks to improve the well-being of 
present and future generations of people by enhancing conservation and the 
deployment of agricultural biodiversity on farms and in forests. It is one of 15 
Future Harvest Centres supported by the Consultative Group on International 
Agricultural Research (CGIAR), an association of public and private members 
who support efforts to mobilize cutting-edge science to reduce hunger and 
poverty, improve human nutrition and health, and protect the environment. 
IPGRI has its headquarters in Maccarese, near Rome, Italy, with offices in 
more than 20 other countries worldwide. The Institute operates through 
four programmes: Diversity for Livelihoods, Understanding and Managing 
Biodiversity, Global Partnerships, and Improving Livelihoods in Commodity-
based Systems.
 The Agricultural Research Service (ARS) is the US Department of Agriculture’s 
chief scientific research agency. Its mission is to find solutions to agricultural 
problems that affect consumers every day, from field to table. The agency 
conducts research to develop and transfer solutions to agricultural problems of 
high national priority and provide information access and dissemination.
 The Centre for Genetic Resources, the Netherlands (CGN) is part of 
Wageningen University and Research Centre. Under a mandate of the Netherlands 
government, CGN is responsible for research tasks that relate to biodiversity 
and identity of species of importance to agriculture and forestry. CGN carries 
out the co-ordination of the governmental programme aimed at conservation 
and utilization of genetic resources. CGN’s mission is to contribute to global 
conservation efforts.
 The geographical designations employed and the presentation of material in 
this publication do not imply the expression of any opinion whatsoever on the 
part of IPGRI or the CGIAR concerning the legal status of any country, territory, 
city or area or its authorities, or concerning the delimitation of its frontiers or 
boundaries. Similarly, the views expressed are those of the authors and do not 
necessarily reflect the views of these organizations.
 Mention of a proprietary name does not constitute endorsement of the 
product and is given only for information.
Citation: Spooner D., R. van Treuren and M.C. de Vicente. 2005. Molecular markers 
for genebank management. IPGRI Technical Bulletin No. 10. International Plant 
Genetic Resources Institute, Rome, Italy.
Cover Image: Flowers of the wild tomato species Solanum arcanum Peralta (photo: D. Spooner) and part of a polyacrylamide sequencing gel using ³⁵S radiolabelling (photo: R. van Treuren). Design: P. Tazza.
ISBN-10: 92-9043-684-0
ISBN-13: 978-92-9043-684-3
IPGRI
Via dei Tre Denari 472/a
00057 Maccarese - Rome, Italy
© International Plant Genetic Resources Institute, 2005
Table of Contents
List of figures
List of tables
Introduction
Overview of molecular technologies
  Main marker technologies
  Comparative qualities of marker techniques
Genebank management
  Acquisition of collection material
  Taxonomic issues
  Characterization of germplasm
  Maintenance of the genetic integrity of accessions
  Utilization of genetic resources
  Streamlining procedures and goals among cooperating genebanks
Crop Breeding
  Parental contributions of artificial hybrids
  Geneflow between crops and weeds
  Autotetraploid vs. allotetraploid inheritance
  Molecular diversity and heterosis
Current developments
  Developments in marker techniques
  Functional diversity markers
  Developments in detection techniques
  Developments in functional genomics
Future challenges
Concluding remarks
Acknowledgments
References
Glossary
List of figures
Figure 1. Section of a polyacrylamide sequencing gel using ³⁵S radiolabelling.
Figure 2. Peak patterns of six perennial ryegrass samples screened for a microsatellite locus using fluorescent labelling on an ABI Prism 3700 DNA analyzer.
Figure 3. Variation among flax samples in part of an AFLP autoradiogram using ³³P radiolabelling.
Figure 4. A phenogram with a vertical phenon line drawn at a distance coefficient of about 2.5.
Figure 5. Terms relative to cladograms.
Figure 6. Cladistic relationships relative to cladograms.
Figure 7. Principles of SNP analysis using the SNaPshot method.
Figure 8. ABI Prism 3700 DNA analyzer.
Figure 9. Example of the results from the differential display technique using DNA chip technology.
Figure 10. Significant associations between AFLP markers and resistance to different pathotypes of downy mildew in lettuce.
List of tables
Table 1. Classification of marker techniques for relatively closely related germplasm.
Table 2. Overview of the relevant characteristics of 11 main marker technologies.
Introduction
In the last decade, the use of DNA markers for the study of crop 
genetic diversity has become routine, and has revolutionized 
biology. Increasingly, techniques are being developed to more 
precisely, quickly and cheaply assess genetic variation. These 
techniques have changed the standard equipment of many labs, 
and most germplasm scientists are expected to be trained in DNA 
data generation and interpretation. The rapid growth of new 
techniques has stimulated this update of IPGRI’s Technical Bulletin 
No. 2, “Molecular tools in plant genetic resources conservation: a 
guide to the technologies” (Karp et al. 1997b). Our goal is to update 
the coverage of DNA techniques in that publication, to show examples of 
their applications, and to guide genebank researchers towards ways to 
maximize their use. This bulletin reviews the basic qualities and 
characteristics of molecular markers, the advantages and disadvantages 
of their applications, and analytical techniques, and provides some 
examples of their use. 
 There is no single molecular approach for many of the problems 
facing genebank managers, and many techniques complement each 
other. However, some techniques are clearly more appropriate than 
others for some specific applications. In an ideal situation, the most 
appropriate marker(s) can be chosen irrespective of time or funding 
constraints, but in other cases the choice of marker(s) will depend on 
constraints of equipment or funds. The purpose of this publication 
is to explain the characteristics of different markers and to guide 
readers in their use through a number of real examples that represent 
well-informed choices. What is most important is to choose a marker that 
can appropriately address well-defined questions through good 
experimental design, ideally leading to peer-reviewed scientific 
publications.
 Experimental design has many definitions depending on the type 
of question being asked and on the field of science addressed. We use 
the term here in a very general way to cover all aspects of planning 
an experiment, including a clear definition of the question being 
addressed; knowledge of prior studies addressing the question; 
proper choice of molecular markers and of data used to address the 
question; knowledge of the characteristics, strengths and weaknesses 
of the data; sources of unexpected variation in the data; how much 
data are needed; proper methods to analyze the data; and limits to 
conclusions you can make from the results.
 One of the most important considerations before beginning any 
experiment is to address proper experimental design. Improper 
experimental design can make the work inconclusive, misleading, 
insignificant, and most likely unpublishable. Similarly, improvements 
in experimental design can change an uninspired study to a highly 
significant one with little to no increase in time and funds. Poor 
experimental design can waste significant resources and damage 
the reputation and impact of your genebank. 
 It is beyond the scope of any publication to outline all possible 
pitfalls that can lead to poorly designed experiments, analyses or 
conclusions, and different considerations of proper experimental 
design need to be made in particular fields. This technical bulletin 
outlines some basic considerations regarding molecular marker 
types and analyses to lead the reader. There is no substitute, 
however, for basic knowledge of the biological questions being 
addressed, knowledge of the taxonomic group under consideration 
and a thorough literature review to ensure that similar work has not 
been done before. If limitations of any type hinder genebank and 
germplasm managers with regard to these factors, collaboration 
or consultation with experts is well worth the effort.
 Excellent reviews of methodology and data interpretation are 
presented in Weising et al. (1995), Hillis et al. (1996), Staub et al. 
(1996), Hillis (1997), Karp et al. (1997a,b) and Avise (2004). Hamrick 
and Godt (1997) present a review of isozyme data; Doebley (1992), 
Clegg (1993b) and Spooner and Lara-Cabrera (2001) present a review 
of molecular data for plant genetic resources and crop evolution; 
Bruford and Wayne (1993), Wang et al. (1994), Gupta et al. (1996), 
Powell et al. (1996a) and Weising et al. (1998) of microsatellite data; 
Wolfe and Liston (1998) on Polymerase Chain Reaction (PCR) related 
data. Schlötterer (2004) reviews the history and relative utility of 
different molecular marker types. Sytsma and Hahn (1997) present 
reviews of molecular studies in crop and non-crop plants. Some 
information from Spooner and Lara-Cabrera (2001) for crop diversity 
studies was used and updated; Spooner et al. (2003) was used for 
taxonomy studies.
 An overview of the main marker techniques and their comparative 
qualities is presented in the section titled “Overview of molecular 
technologies”. Applications of molecular techniques in genebank 
management and crop breeding are the subjects of the following 
sections. The section titled “Future challenges” focuses on the 
current developments in molecular marker applications and future 
challenges that could result from these developments. Elements of 
experimental design are discussed throughout and some basic aspects 
of data analysis are discussed in “Genebank management”.
Overview of molecular technologies
Due to the rapid developments in the field of molecular genetics, 
a variety of different techniques have emerged to analyze genetic 
variation during the last few decades (Whitkus et al. 1994; Karp et 
al. 1996, 1997a,b; Parker et al. 1998; Schlötterer 2004). These genetic 
markers may differ with respect to important features, such as 
genomic abundance, level of polymorphism detected, locus specificity, 
reproducibility, technical requirements and financial investment. No 
marker is superior to all others for a wide range of applications. 
The most appropriate genetic marker will depend on the specific 
application, the presumed level of polymorphism, the presence of 
sufficient technical facilities and know-how, time constraints and 
financial limitations. The main marker technologies that have been 
widely applied during the last decades are summarized in Table 1, and 
briefly outlined below, together with their strengths and weaknesses. 
Information about the technologies and their applications may also 
be accessed via the website of the Centre for Genetic Resources, The 
Netherlands (CGN) at http://www.cgn.wur.nl/pgr/.
Main marker technologies
Allozymes
Description: Allozymes are allelic variants of enzymes encoded 
by structural genes. Enzymes are proteins consisting of amino 
acids, some of which are electrically charged. As a result, enzymes 
have a net electric charge, depending on the stretch of amino acids 
comprising the protein. When a mutation in the DNA results 
in an amino acid being replaced, the net electric charge of the 
protein may be modified, and the overall shape (conformation) of 
the molecule can change. Because changes in electric charge and 
conformation can affect the migration rate of proteins in an electric 
field, allelic variation can be detected by gel electrophoresis and 
subsequent enzyme-specific stains that contain substrate for the 
enzyme, cofactors and an oxidized salt (e.g. nitro-blue tetrazolium). 
Usually two, or sometimes even more, loci can be distinguished 
for an enzyme, and these are termed isoloci. Therefore, allozyme 
variation is often also referred to as isozyme variation (Kephart 
1990; May 1992). 
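The link between a single amino acid substitution and a change in migration rate can be illustrated with a rough net-charge estimate. The sketch below applies the Henderson–Hasselbalch relationship to each ionizable group of a peptide. The pKa values are approximate textbook averages, the pH of 8 and the two peptide sequences are hypothetical, and the sketch ignores conformation and the local protein environment, both of which also influence migration.

```python
# Approximate pKa values (textbook averages; an assumption, since real values
# shift with the local environment inside a folded protein).
PKA_BASIC = {"N_term": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_ACIDIC = {"C_term": 2.2, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Estimate the net charge of a polypeptide at a given pH."""
    basic_groups = ["N_term"] + [aa for aa in sequence if aa in "KRH"]
    acidic_groups = ["C_term"] + [aa for aa in sequence if aa in "DECY"]
    # Fraction of each basic group that is protonated (carries +1).
    positive = sum(1.0 / (1.0 + 10 ** (ph - PKA_BASIC[g])) for g in basic_groups)
    # Fraction of each acidic group that is deprotonated (carries -1).
    negative = sum(1.0 / (1.0 + 10 ** (PKA_ACIDIC[g] - ph)) for g in acidic_groups)
    return positive - negative


# Hypothetical allelic variants differing by one Glu (E) -> Lys (K) substitution.
allele_1 = "MAGDEVLKSEHG"
allele_2 = "MAGDKVLKSEHG"
for name, seq in [("allele_1", allele_1), ("allele_2", allele_2)]:
    print(name, round(net_charge(seq, ph=8.0), 2))
```

Under these assumptions the substitution raises the estimated net charge by roughly two units, the kind of difference that can separate allelic variants on a gel.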
 Strengths: The strength of allozymes is simplicity. Because 
allozyme analysis does not require DNA extraction or the availability 
of sequence information, primers or probes, they are quick and 
easy to use. Some species, however, can require considerable 
optimization of techniques for certain enzymes. Simple analytical 
procedures allow some allozymes to be applied at relatively low 
costs, depending on the enzyme staining reagents used. Allozymes 
are codominant markers that have high reproducibility. Zymograms 
(the banding pattern of isozymes) can be readily interpreted in 
terms of loci and alleles, or they may require segregation analysis 
of progeny of known parental crosses for interpretation. Sometimes, 
however, zymograms present complex banding profiles arising from 
polyploidy or duplicated genes and the formation of intergenic 
heterodimers, which may complicate interpretation. 
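Part of this complexity can be anticipated even for a single locus: when an enzyme is built from several subunits, a heterozygote produces heteromeric bands in addition to the two homomeric ones. The sketch below computes the expected bands and their relative intensities, assuming equal expression of both alleles, random association of subunits and heteromers migrating between the homomers. The labels F and S (fast and slow alleles) are illustrative.

```python
from math import comb


def heterozygote_isoforms(subunits: int) -> list[tuple[str, float]]:
    """Expected isoforms and relative intensities for an F/S heterozygote.

    Assumes both alleles are expressed equally, subunits associate at random,
    and heteromers migrate between the two homomers.
    """
    isoforms = []
    for n_fast in range(subunits, -1, -1):
        label = "F" * n_fast + "S" * (subunits - n_fast)
        # Binomial probability of assembling this subunit combination.
        share = comb(subunits, n_fast) / 2 ** subunits
        isoforms.append((label, share))
    return isoforms


for structure, k in [("monomer", 1), ("dimer", 2), ("tetramer", 4)]:
    bands = heterozygote_isoforms(k)
    print(structure, len(bands), "bands:", bands)
```

Under these assumptions a heterozygote for a monomeric enzyme shows two bands, a dimeric enzyme three bands in a 1:2:1 ratio, and a tetrameric enzyme five bands.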
 Weaknesses: The main weakness of allozymes is their relatively 
low abundance and low level of polymorphism. Moreover, proteins 
with identical electrophoretic mobility (co-migration) may not be 
homologous for distantly related germplasm. In addition, their 
selective neutrality may be in question (Berry and Kreitman 1993; 
Hudson et al. 1994; Krieger and Ross 2002). Lastly, allozymes are often 
considered molecular markers because they are enzyme variants, and 
enzymes are molecules. However, allozymes are in fact phenotypic 
markers, and as such they may be affected by environmental conditions. 
For example, the banding profile obtained for a particular allozyme 
marker may change depending on the type of tissue used for the 
analysis (e.g. root vs. leaf), because a gene that is expressed in one 
tissue might not be expressed in other tissues. By contrast, DNA-based 
molecular markers reflect differences in the DNA sequence itself and 
are not influenced by the environment, so the same banding profile 
can be expected at all times for the same genotype.
 Applications: Allozymes have been applied in many population 
genetics studies, including measurements of outcrossing rates 
(Erskine and Muehlenbauer 1991), (sub)population structure 
and population divergence (Freville et al. 2001). Allozymes are 
particularly useful at the level of conspecific populations and closely 
related species, and are therefore useful to study diversity in crops 
and their relatives (Hamrick and Godt 1997). They have been used, 
often in concert with other markers, for fingerprinting (Tao and 
Sugiura 1987; Maass and Ocampo 1995) and diversity studies (Lamboy 
et al. 1994; Ronning and Schnell 1994), to study interspecific 
relationships (Garvin and Weeden 1994), the mode of genetic 
inheritance (Warnke et al. 1998) and allelic frequencies in germplasm 
collections over serial increase cycles (Reedy et al. 1995), and to 
identify parents in hybrids (Parani et al. 1997).
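Because allozymes are codominant, each individual's genotype can be read directly from the zymogram, so allele frequencies can be counted rather than inferred. A minimal sketch of the kind of summary that might be compared between increase cycles follows. The genotype scores and the mobility-based allele names are hypothetical, and expected heterozygosity is computed with the standard 1 − Σp² formula without a small-sample correction.

```python
from collections import Counter


def allele_frequencies(genotypes: list[tuple[str, str]]) -> dict[str, float]:
    """Count alleles in codominant diploid genotypes and return frequencies."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in sorted(counts.items())}


def expected_heterozygosity(freqs: dict[str, float]) -> float:
    """Gene diversity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def observed_heterozygosity(genotypes: list[tuple[str, str]]) -> float:
    """Proportion of individuals scored as heterozygous."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


# Hypothetical zymogram scores at one locus; alleles named by relative mobility.
cycle_0 = [("100", "100"), ("100", "92"), ("92", "92"), ("100", "92")]
freqs = allele_frequencies(cycle_0)
print(freqs, expected_heterozygosity(freqs), observed_heterozygosity(cycle_0))
```

Running the same summary on samples from successive increase cycles would show whether allele frequencies have drifted.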
Table 1. Classification of marker techniques for relatively closely related germplasm
A. Biochemical markers
- Allozymes (Tanksley and Orton 1983; Kephart 1990; May 1992)
B. Molecular markers¹
i) Non-PCR² based techniques
- Restriction Fragment Length Polymorphisms (RFLP, Botstein et al. 1980; Neale and Williams 1991)
- Minisatellites or Variable Number of Tandem Repeats (VNTR, Jeffreys et al. 1985a,b)
ii) PCR-based techniques
- DNA sequencing
• Multi-copy DNA, Internal Transcribed Spacer regions of nuclear ribosomal genes (ITS, Takaiwa et al. 1985; Dillon et al. 2001)
• Single-copy DNA, including both introns and exons (Sanger et al. 1977; Clegg 1993a)
- Sequence-Tagged Sites (STS)
• Microsatellites, Simple Sequence Repeat (SSR), Short Tandem Repeat (STR), Sequence Tagged Microsatellite (STMS) or Simple Sequence Length Polymorphism (SSLP) (Hearne et al. 1992; Morgante and Olivieri 1993; Queller et al. 1993; Jarne and Lagoda 1996)
• Amplified Sequence Length Polymorphism (ASLP, Maughan et al. 1995)
• Sequence Characterized Amplified Region (SCAR, Paran and Michelmore 1993)
• Cleaved Amplified Polymorphic Sequence (CAPS, Akopyanz et al. 1992; Konieczny and Ausubel 1993)
• Single-Strand Conformation Polymorphism (SSCP, Hayashi 1992)
• Denaturing Gradient Gel Electrophoresis (DGGE, Riedel et al. 1990)
• Thermal Gradient Gel Electrophoresis (TGGE, Riesner et al. 1989)
• Heteroduplex Analysis (HDA, Perez et al. 1999; Schneider et al. 1999)
• Denaturing High Performance Liquid Chromatography (DHPLC, Hauser et al. 1998; Steinmetz et al. 2000; Kota et al. 2001)
- Multiple Arbitrary Amplicon Profiling (MAAP, Caetano-Anolles 1996; Caetano-Anolles et al. 1992)
• Random Amplified Polymorphic DNA (RAPD, Williams et al. 1990; Hadrys et al. 1992)
• DNA Amplification Fingerprinting (DAF, Caetano-Anolles et al. 1991)
• Arbitrarily Primed Polymerase Chain Reaction (AP-PCR, Welsh and McClelland 1990; Williams et al. 1990)
• Inter-Simple Sequence Repeat (ISSR, Zietkiewicz et al. 1994; Godwin et al. 1997)
• Single Primer Amplification Reaction (SPAR, Staub et al. 1996)
• Directed Amplification of Minisatellite DNA (DAMD, Heath et al. 1993; Somers and Demmon 2002)
• Amplified Fragment Length Polymorphism (AFLP, Vos et al. 1995)
• Selectively Amplified Microsatellite Polymorphic Loci (SAMPL, Witsenboer et al. 1997)
¹ Molecular markers can be based on cytoplasmic DNA (chloroplast DNA, cpDNA, and/or mitochondrial DNA, mtDNA) or nuclear DNA.
² PCR is an abbreviation of Polymerase Chain Reaction, a technology that amplifies DNA fragments.
Restriction Fragment Length Polymorphism (RFLP) 
Description: RFLPs are bands that correspond to DNA fragments, 
usually within the range of 2–10 kb, that have resulted from the 
digestion of genomic DNA with restriction enzymes. DNA fragments 
are separated by agarose gel electrophoresis and are detected by 
subsequent Southern blot hybridization to a labelled DNA probe. 
Labelling of the probe may be performed with a radioactive isotope 
or with non-radioactive labels, such as digoxigenin or fluorescein. 
Locus-specific RFLP probes consist of a sequence homologous to a 
specific chromosomal region. Probes are generated 
through the construction of genomic or complementary DNA 
(cDNA) libraries and therefore may be composed of a specific 
sequence of unknown identity (genomic DNA) or part of the 
sequence of a functional gene (exons only, cDNA). RFLP probes are 
maintained as clones in suitable bacterial vectors that conveniently 
allow the isolation of the DNA fragments they hold. Probes from 
related species may be used (heterologous probes). Sequence variation 
that creates or abolishes restriction-enzyme recognition sites, and 
insertions and deletions between two adjacent restriction sites, form 
the basis of the length polymorphisms.
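How the gain or loss of a recognition site changes fragment lengths can be shown with an in silico digest. In this sketch the enzyme is assumed to be EcoRI (recognition site GAATTC, cut after the G) and the two alleles are short hypothetical sequences. In a real RFLP assay the fragments are several kilobases long and only those hybridizing to the probe are visible on the blot.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return fragment lengths after cutting `sequence` at every `site`.

    Defaults model EcoRI (G^AATTC). Only the given strand is scanned, which is
    sufficient for palindromic recognition sites such as EcoRI's.
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Two hypothetical alleles: a point mutation (GAATTC -> GATTTC) removes one site.
allele_a = "AAAA" + "GAATTC" + "C" * 10 + "GAATTC" + "TTTT"
allele_b = "AAAA" + "GAATTC" + "C" * 10 + "GATTTC" + "TTTT"
print(digest(allele_a))  # [5, 16, 9]
print(digest(allele_b))  # [5, 25]
```

A probe covering the second half of the locus would detect a 16 bp plus 9 bp pattern in one allele and a single 25 bp fragment in the other; both can be seen together in a heterozygote, which is why RFLPs are codominant.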
 Strengths: RFLPs are generally found to be moderately 
polymorphic. In addition to their high genomic abundance and 
their random distribution, RFLPs have the advantages of showing 
codominant alleles and having high reproducibility. 
 Weaknesses: The main drawbacks of RFLPs are their laborious and 
technically demanding procedures and their high cost. In general, if 
research is conducted with poorly studied groups of wild species or 
poorly studied crops (orphan crops), suitable probes may not yet be 
available, so considerable investment is needed for their development. 
Moreover, large quantities (1–10 µg) of purified, high molecular weight 
DNA are required for each DNA digestion. Larger quantities are needed 
for species with larger genomes, and when each blot must be probed 
many times. RFLPs are not amenable to automation, and collaboration 
among research teams requires distribution of probes.
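Why larger genomes need more DNA follows from simple arithmetic: a fixed mass of DNA contains fewer copies of any single-copy target sequence when the genome is larger, so less target is available for the probe to bind. A rough sketch, assuming an average mass of about 650 g/mol per base pair of double-stranded DNA and two hypothetical haploid genome sizes:

```python
AVOGADRO = 6.02214076e23
BP_MOLAR_MASS = 650.0  # g/mol per base pair of double-stranded DNA (approximation)


def genome_copies(dna_mass_ng: float, genome_size_bp: float) -> float:
    """Number of haploid genome copies contained in a given mass of DNA."""
    genome_mass_g = genome_size_bp * BP_MOLAR_MASS / AVOGADRO
    return dna_mass_ng * 1e-9 / genome_mass_g


# Hypothetical genome sizes (haploid, in base pairs).
for size_bp in (500e6, 5000e6):
    copies = genome_copies(1000, size_bp)  # 1 µg = 1000 ng
    print(f"{size_bp / 1e6:.0f} Mb genome: {copies:.2e} copies per µg")
```

A tenfold larger genome yields tenfold fewer copies of each target in the same 1 µg, so more DNA must be digested to obtain a comparable hybridization signal.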
 Applications: RFLPs can be applied in diversity and phylogenetic 
studies ranging from individuals within populations or species, 
to closely related species. RFLPs have been widely used in gene 
mapping studies because of their high genomic abundance due to 
the ample availability of different restriction enzymes and random 
distribution throughout the genome (Neale and Williams 1991). They 
also have been used to investigate relationships of closely related 
taxa (Miller and Tanksley 1990; Lanner et al. 1997), as fingerprinting 
tools (Fang et al. 1997), for diversity studies (Debreuil et al. 1996), 
and for studies of hybridization and introgression, including studies 
of gene flow between crops and weeds (Brubaker and Wendel 1994; 
Clausen and Spooner 1998; Desplanque et al. 1999).
Minisatellites
Description: Minisatellite analysis, like RFLPs, also involves 
digestion of genomic DNA with restriction endonucleases, but 
minisatellites are a conceptually very different class of marker. They 
consist of chromosomal regions containing tandem repeat units of 
a 10–50 base motif, flanked by conserved DNA restriction sites. A 
minisatellite profile consisting of many bands, usually within a 4–20 
kb size range, is generated by using common multilocus probes 
that are able to hybridize to minisatellite sequences in different 
species. Locus specific probes can be developed by molecular 
cloning of DNA restriction fragments, subsequent screening with 
a multilocus minisatellite probe and isolation of specific fragments. 
Variation in the number of repeat units, due to unequal crossing 
over or gene conversion, is considered to be the main cause of length 
polymorphisms. Due to the high mutation rate of minisatellites, the 
level of polymorphism is substantial, generally resulting in unique 
multilocus profiles for different individuals within a population. 
Minisatellite loci are also often referred to as Variable Number of 
Tandem Repeats (VNTR) loci. 
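Because the flanking restriction sites are conserved, the length of a minisatellite fragment is essentially the flanking DNA plus the tandem array. A gain or loss of repeat units therefore shifts the fragment by whole multiples of the unit length. A minimal sketch with a hypothetical 30 bp unit (within the 10–50 base range given above) and a hypothetical 1,000 bp of flanking DNA:

```python
def vntr_length(repeat_count: int, unit_length: int, flank_length: int) -> int:
    """Fragment length = flanking DNA plus the tandem array of repeat units."""
    return flank_length + repeat_count * unit_length


def repeat_difference(length_a: int, length_b: int, unit_length: int) -> float:
    """Difference in repeat number implied by two fragment lengths."""
    return (length_a - length_b) / unit_length


# Hypothetical locus: 30 bp repeat unit, 1,000 bp of flanking DNA in the fragment.
short_allele = vntr_length(100, 30, 1000)  # 4,000 bp
long_allele = vntr_length(103, 30, 1000)   # 4,090 bp
print(short_allele, long_allele, repeat_difference(long_allele, short_allele, 30))
```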
 Strengths: The main advantages of minisatellites are their high 
level of polymorphism and high reproducibility. 
 Weaknesses: Disadvantages of minisatellites are similar to those of 
RFLPs because the methodological procedures are very similar. If 
multilocus probes are used, highly informative profiles are generally 
obtained because many informative bands are generated per reaction. 
In that case, however, band profiles cannot be interpreted in terms 
of loci and alleles, and similar-sized fragments may be non-homologous. 
In addition, the random distribution of minisatellites across the 
genome has been questioned (Schlötterer 2004).
 Applications: The term DNA fingerprinting was introduced 
for minisatellites, though DNA fingerprinting is now used in 
a more general way to refer to a DNA-based assay to uniquely 
identify individuals. Minisatellites are particularly useful in studies 
involving genetic identity, parentage, clonal growth and structure, 
and identification of varieties and cultivars (Jeffreys et al. 1985a,b; 
Zhou et al. 1997), and for population-level studies (Wolff et al. 1994). 
Minisatellites are of reduced value for taxonomic studies because 
of their hypervariability.
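Since multilocus profiles cannot be read as loci and alleles, individuals are usually compared band by band. One common summary is a band-sharing (Dice) similarity. In the sketch below the profiles are hypothetical, and bands are assumed to be already binned into matching size classes; that binning step is exactly where non-homologous co-migrating fragments can inflate similarity.

```python
def band_sharing(profile_a: set[int], profile_b: set[int]) -> float:
    """Band-sharing (Dice) similarity: 2 x shared bands / total bands."""
    total = len(profile_a) + len(profile_b)
    if total == 0:
        raise ValueError("at least one profile must contain bands")
    return 2 * len(profile_a & profile_b) / total


# Hypothetical multilocus profiles, bands recorded as binned sizes in bp.
plant_1 = {4200, 5600, 7100, 9800, 15000}
plant_2 = {4200, 6300, 7100, 9800, 12500}
print(band_sharing(plant_1, plant_2))  # 0.6
print(band_sharing(plant_1, plant_1))  # 1.0
```

Identical profiles score 1.0, as expected for ramets of the same clone or duplicate samples of one cultivar.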
Polymerase Chain Reaction (PCR)-sequencing 
Description: PCR was a major breakthrough for molecular markers 
in that for the first time, any genomic region could be amplified 
and analyzed in many individuals without the requirement for 
cloning and isolating large amounts of ultra-pure genomic DNA 
(Schlötterer 2004). PCR sequencing involves determination of the 
nucleotide sequence within a DNA fragment amplified by the PCR, 
using primers specific for a particular genomic site. The method that 
has been most commonly used to determine nucleotide sequences 
is based on the termination of in vitro DNA replication. The 
procedure is initiated by annealing a primer to the amplified DNA 
fragment, followed by dividing the mixture into four subsamples. 
Subsequently, DNA is replicated in vitro by adding the four 
deoxynucleotides (carrying the bases adenine, cytosine, guanine and 
thymine; dA, dC, dG and dT), a single dideoxynucleotide (ddA, ddC, 
ddG or ddT) and the enzyme DNA polymerase to each reaction. Sequence extension 
occurs as long as deoxynucleotides are incorporated in the newly 
synthesized DNA strand. 
 However, when a dideoxynucleotide is incorporated, DNA 
replication is terminated. Because each reaction contains many 
DNA molecules and incorporation of dideoxynucleotides occurs at 
random, each of the four subsamples contains fragments of varying 
length terminated at any occurrence of the particular dideoxy base 
used in the subsample. Finally, the fragments in each of the four 
subsamples are separated by gel electrophoresis (Figure 1).
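The logic of the four termination reactions can be simulated. This simplified sketch lists every possible termination product (in a real reaction they arise at random across many molecules), counts lengths from the end of the primer, and uses a hypothetical template. Reading the bands from shortest to longest across the four lanes recovers the sequence of the newly synthesized strand, which is complementary to the template.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def dideoxy_lanes(template: str) -> dict[str, list[int]]:
    """Lengths (bases added after the primer) of chains terminated in each lane.

    `template` is the stretch of template strand copied by the polymerase,
    written 5'->3'; the new strand is its reverse complement.
    """
    new_strand = reverse_complement(template)
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        # A chain stops wherever the lane's dideoxy base is incorporated.
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from shortest to longest across the four lanes."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


template = "AGTCCGAT"  # hypothetical
lanes = dideoxy_lanes(template)
print(lanes)
print(read_gel(lanes))  # ATCGGACT, the reverse complement of the template
```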
 Strengths: Because all possible sequence differences within the 
amplified fragment can be resolved between individuals, PCR 
sequencing provides the ultimate measurement of genetic variation. 
Universal primer pairs to target specific sequences in a wide range 
of species are available for the chloroplast, mitochondrial and 
ribosomal genomes. Advantages of PCR sequencing include its 
high reproducibility and the fact that sequences of known identity 
are studied, increasing the chance of detecting truly homologous 
differences. Because fragments are amplified by PCR, only low 
quantities of template DNA (the “target” DNA used for the initial 
reaction) are required, e.g. 10–100 ng per reaction. Moreover, most 
of the technical procedures are amenable to automation, and much of 
the process is now automated (see “Developments in detection 
techniques”).
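Because sequencing resolves every nucleotide difference, sequences from different individuals can be compared site by site once they are aligned. The sketch below computes the simplest such measure, the uncorrected p-distance. It assumes the sequences are already aligned, skips sites with a gap in either sequence, and applies no correction for multiple substitutions at the same site; the reads are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap in either sequence are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical aligned PCR-sequencing reads from two accessions.
print(p_distance("ACGTACGTAC", "ACGTTCGTAC"))  # 0.1
print(p_distance("ACGT-CGTAC", "ACGTACGTAA"))
```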
 Weaknesses: Disadvantages include low genome coverage and 
low levels of variation below the species level. In the event that 
primers for a genomic region of interest are unavailable, high 
development costs are involved. If sequences are visualized by 
polyacrylamide gel electrophoresis and autoradiography, analytical 
procedures are laborious and technically demanding. 
Fluorescent detection systems and reliable analytical 
software to score base pairs using automated sequencers 
are now widely applied. This requires considerable 
investment in equipment, or substantial costs if 
sequencing is outsourced. Because sequencing is costly 
and time-consuming, most studies have focused on 
only one or a few loci (but see Rokas et al. 2003 in 
“Genebank management” below). This restricts genome 
coverage and together with the fact that different genes 
may evolve at different rates, the extent to which 
the estimated gene diversity reflects overall genetic 
diversity is yet to be determined.
 Applications: In general, insufficient nucleotide 
variation is detected below the species level, and PCR 
sequencing is most useful to address questions of 
interspecific and intergeneric relationships (Sanger 
et al. 1977; Clegg 1993a). Until recently, chloroplast 
DNA and nuclear ribosomal DNA have provided the 
major datasets for phylogenetic inference because of 
the ease of obtaining data due to high copy number. 
Recently, single- to low-copy nuclear DNA markers 
(here referred to simply as low-copy) have been 
developed as powerful new tools for phylogenetic 
analyses (Mort and Crawford 2004; Small et al. 2004). 
Low-copy nuclear markers generally circumvent the 
uniparental inheritance frequently found in plastid 
markers (Corriveau and Coleman 1988) and the concerted 
evolution found in nuclear ribosomal DNA (Arnheim 1983), 
problems that limit the utility and reliability of 
these markers in phylogenetic studies (Bailey et al. 2003). In addition 
to biparental inheritance, low-copy nuclear markers 
exhibit higher rates of evolution (particularly in intron 
regions) than cpDNA and nrDNA markers (Wolfe et 
al. 1987; Small et al. 2004) making them useful for 
closely related species. Yet another advantage is that 
low-copy sequences generally evolve independently of 
paralogous sequences and tend to be stable in position 
and copy number.
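The sketch below shows one rough way to quantify nucleotide variation among sequenced loci. It computes the uncorrected pairwise distance (p-distance), which is the proportion of aligned sites at which two sequences differ. It assumes the sequences are already aligned and skips gapped ('-') or ambiguous ('N') positions. The sequences and the function name p_distance are hypothetical. Phylogenetic analyses would normally apply a substitution model rather than raw p-distances.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of comparable aligned sites at which two sequences differ.

    Positions with a gap ('-') or an ambiguous base ('N') in either
    sequence are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip gapped or ambiguous positions
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical aligned fragments of the same locus from two accessions
print(p_distance("ACGTTGCA-TGA", "ACGATGCAGTGA"))
```

Here one substitution among 11 comparable sites gives a distance of 1/11. A value this small for a short fragment illustrates why a single locus often resolves little variation below the species level.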
Random Amplified Polymorphic DNA (RAPD)
Description: RAPDs are DNA fragments amplified by 
the PCR using short synthetic primers (generally 10 bp) 
of random sequence. These oligonucleotides serve as 
both forward and reverse primers, and are usually able to amplify 
fragments from 1–10 genomic sites simultaneously. Amplified 
fragments, usually within the 0.5–5 kb size range, are separated 
by agarose gel electrophoresis, and polymorphisms are detected, 
after ethidium bromide staining, as the presence or absence of 
bands of particular sizes. These polymorphisms are considered 
to be primarily due to variation in the primer annealing sites, but 
they can also be generated by length differences in the amplified 
sequence between primer annealing sites. 
Figure 1. Section of a polyacrylamide sequencing gel using ³⁵S 
radiolabelling. Part of the nucleotide sequence of two cloned DNA 
fragments containing microsatellite loci (indicated by boxes) is 
shown. This sequence analysis was part of a project to develop 
microsatellite markers for the oystercatcher (van Treuren et al. 1999).
 Strengths: The main advantage of RAPDs is that they are quick 
and easy to assay. Because PCR is involved, only low quantities 
of template DNA are required, usually 5–50 ng per reaction. Since 
random primers are commercially available, no sequence data for 
primer construction are needed. Moreover, RAPDs have a very high 
genomic abundance and are randomly distributed throughout the 
genome. 
 Weaknesses: The main drawback of RAPDs is their low 
reproducibility (Schierwater and Ender 1993), and hence highly 
standardized experimental procedures are needed because of their 
sensitivity to the reaction conditions. RAPD analyses generally 
require purified, high molecular weight DNA, and precautions 
are needed to avoid contamination of DNA samples because short 
random primers are used that are able to amplify DNA fragments 
in a variety of organisms. Altogether, the inherent problems of 
reproducibility make RAPDs unsuitable for transferring or comparing 
results among research teams working on the same species and 
subject. As for most other multilocus techniques, RAPD 
markers are not locus-specific, band profiles cannot be interpreted in 
terms of loci and alleles (dominance of markers), and similar-sized 
fragments may not be homologous.
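RAPD bands are scored only as present or absent. Each sample is therefore usually reduced to a binary profile over the band sizes observed, and pairwise similarity is computed from shared bands. The sketch below uses the Dice (Nei and Li) coefficient, 2a/(2a + b + c). Here a is the number of bands shared by two samples, and b and c are the numbers of bands unique to each. Band sizes and sample names are hypothetical. The calculation treats bands of equal size as homologous, which, as noted above, may not hold.

```python
from itertools import combinations

# Hypothetical RAPD scores: band size (bp) -> present (1) or absent (0)
profiles = {
    "acc_A": {500: 1, 750: 1, 1100: 0, 1600: 1, 2300: 1},
    "acc_B": {500: 1, 750: 0, 1100: 1, 1600: 1, 2300: 1},
    "acc_C": {500: 0, 750: 1, 1100: 1, 1600: 0, 2300: 1},
}


def dice_similarity(x: dict, y: dict) -> float:
    """Dice / Nei-Li similarity 2a / (2a + b + c) for presence/absence data.

    Assumes that bands of the same size in two profiles are homologous.
    """
    sizes = set(x) | set(y)
    a = sum(1 for s in sizes if x.get(s, 0) and y.get(s, 0))      # shared bands
    b = sum(1 for s in sizes if x.get(s, 0) and not y.get(s, 0))  # only in x
    c = sum(1 for s in sizes if y.get(s, 0) and not x.get(s, 0))  # only in y
    if a + b + c == 0:
        raise ValueError("neither profile contains any band")
    return 2 * a / (2 * a + b + c)


for (name_1, prof_1), (name_2, prof_2) in combinations(profiles.items(), 2):
    print(f"{name_1} vs {name_2}: {dice_similarity(prof_1, prof_2):.3f}")
```

Shared absences are ignored by design. Two samples lacking the same band are not counted as similar, because absence of a dominant marker band can have several causes.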
 Applications: RAPDs have been used for many purposes, ranging 
from studies at the individual level (e.g. genetic identity) to studies 
involving closely related species. RAPDs have also been applied 
in gene mapping studies to fill gaps not covered by other markers 
(Williams et al. 1990; Hadrys et al. 1992). Variants of the RAPD 
technique include Arbitrarily Primed Polymerase Chain Reaction 
(AP-PCR), which uses longer arbitrary primers than RAPDs, and 
DNA Amplification Fingerprinting (DAF), which uses shorter (5–8 bp) 
primers to generate a larger number of fragments. Multiple Arbitrary
Amplicon Profiling (MAAP) is the collective term for techniques 
using single arbitrary primers.
Microsatellites
Description: Microsatellites, like minisatellites, represent tandem 
repeats, but their repeat motifs are shorter (1–6 base pairs). If nucleotide 
sequences in the flanking regions of the microsatellite are known, 
specific primers (generally 20–25 bp) can be designed to amplify the 
microsatellite by PCR. Microsatellites and their flanking sequences can 
be identified by constructing a small-insert genomic library, screening 
the library with a synthetically labelled oligonucleotide repeat and 
sequencing the positive clones (Figure 1). Alternatively, microsatellites 
may be identified by screening sequence databases for microsatellite 
sequence motifs from which adjacent primers may then be designed. 
In addition, primers may be used that have already been designed for 
closely related species. Polymerase slippage during DNA replication, 
or slipped strand mispairing, is considered to be the main cause of 
variation in the number of repeat units of a microsatellite, resulting 
in length polymorphisms that can be detected by gel electrophoresis. 
Other causes have also been reported (Matsuoka et al. 2002).
 Strengths: The strengths of microsatellites include the codominance 
of alleles, their high genomic abundance in eukaryotes and their 
random distribution throughout the genome, with preferential 
association in low-copy regions (Morgante et al. 2002). Because 
the technique is PCR-based, only low quantities of template DNA 
(10–100 ng per reaction) are required. Due to the use of long PCR 
primers, the reproducibility of microsatellites is high and analyses 
do not require high quality DNA. Although microsatellite analysis 
is, in principle, a single-locus technique, multiple microsatellites 
may be multiplexed during PCR or gel electrophoresis if the size 
ranges of the alleles of different loci do not overlap (Ghislain et al. 
2004). This significantly decreases analytical costs. Furthermore,
the screening of microsatellite variation can be automated, if the use 
of automatic sequencers is an option (Figure 2).
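The multiplexing condition mentioned above can be checked before loci are combined: loci sharing one PCR or gel lane must have non-overlapping allele size ranges. The sketch below is a minimal version. It assumes all loci are detected with the same label; on automated sequencers, loci labelled with different fluorescent dyes can usually share a size range. It also allows an optional safety margin in bp. Locus names, size ranges and margins are hypothetical.

```python
# Hypothetical allele size ranges (bp) observed for candidate loci,
# all assumed to be detected with the same label
size_ranges = {
    "locus1": (118, 142),
    "locus2": (150, 181),
    "locus3": (175, 210),
    "locus4": (230, 262),
}


def overlapping_pairs(ranges: dict, margin: int = 0) -> list:
    """Return pairs of loci whose size ranges overlap or are separated by `margin` bp or less."""
    items = sorted(ranges.items(), key=lambda kv: kv[1][0])  # sort by lower bound
    conflicts = []
    for i, (name_i, (low_i, high_i)) in enumerate(items):
        for name_j, (low_j, _high_j) in items[i + 1:]:
            if low_j > high_i + margin:
                break  # later loci start even higher, so none can conflict with name_i
            conflicts.append((name_i, name_j))
    return conflicts


print(overlapping_pairs(size_ranges))            # locus2 and locus3 overlap
print(overlapping_pairs(size_ranges, margin=10))
```

Sorting by the lower bound lets the inner loop stop at the first locus that starts beyond the current range plus the margin. In this example, a 10 bp margin also flags locus1 and locus2, which are only 8 bp apart.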
 Weaknesses: One of the main drawbacks of microsatellites is that 
high development costs are involved if adequate primer sequences 
for the species of interest are unavailable, making them difficult 
to apply to unstudied groups. Although microsatellites are in 
principle codominant markers, mutations in the primer annealing 
sites may result in the occurrence of null alleles (no amplification of 
the intended PCR product), which may lead to errors in genotype 
scoring. The likelihood of null alleles increases when microsatellite 
primers are applied to germplasm distantly related to the species from 
which they were developed (poor “cross-species amplification”). Null 
alleles may result in biased estimates of allelic and genotypic 
frequencies and an underestimation of heterozygosity. Furthermore, 
the underlying mutation model of
microsatellites (infinite allele model or stepwise mutation model) is 
still under debate. Homoplasy may occur at microsatellite loci because 
forward and backward mutations can produce alleles that are identical 
in size but not identical by descent, which may cause underestimation 
of genetic divergence. Stutter bands are very commonly observed in 
microsatellite analysis; these are artifacts of the technique caused 
by DNA slippage during PCR amplification. They can complicate the 
interpretation of the band profiles because size determination of the 
fragments is more difficult and heterozygotes may be confused with 
homozygotes. However, the interpretation may be clarified by including 
appropriate reference genotypes of known band sizes in the experiment.
Figure 2. Peak patterns of six perennial ryegrass samples screened for a microsatellite 
locus using fluorescent labelling on an ABI Prism 3700 DNA analyzer. At the top a size 
reference in number of base pairs is shown, and at the right the strength of the fluorescent 
signal. Alleles can be distinguished at 120, 133, 137 and 139 bp. This 
microsatellite analysis was carried out within the framework of a project to study mating 
patterns in a regeneration population of perennial ryegrass (van Treuren, unpublished).
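The underestimation of heterozygosity caused by null alleles can be illustrated numerically. The sketch below compares observed heterozygosity (the proportion of heterozygous individuals) with the value expected under Hardy-Weinberg equilibrium, He = 1 - Σpᵢ². The allele sizes follow Figure 2, but the genotypes are hypothetical, and so is the assumption that the allele labelled 139 fails to amplify. Heterozygotes carrying this allele are scored as homozygotes, so observed heterozygosity falls well below expectation. Such a deficit is a common warning sign of null alleles, although inbreeding or population substructure can produce it too.

```python
from collections import Counter

NULL_ALLELE = 139  # hypothetical allele with a mutated primer site that never amplifies


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for a list of diploid genotypes.

    Each genotype is a pair of allele sizes in bp. Expected heterozygosity is the
    Hardy-Weinberg value He = 1 - sum(p_i ** 2) computed from the scored alleles.
    """
    n = len(genotypes)
    observed = sum(1 for a, b in genotypes if a != b) / n
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    expected = 1 - sum((c / (2 * n)) ** 2 for c in counts.values())
    return observed, expected


def score_with_null(genotype):
    """Score a genotype as it appears when NULL_ALLELE fails to amplify.

    A heterozygote carrying the null allele shows only its other allele and is
    read as a homozygote. (Homozygotes for the null allele would give no product
    at all; none occur in the example data.)
    """
    a, b = genotype
    if a == NULL_ALLELE:
        a = b
    if b == NULL_ALLELE:
        b = a
    return (a, b)


# Hypothetical true genotypes, using the allele sizes shown in Figure 2
true_genotypes = [(120, 133), (133, 137), (137, 139), (120, 139),
                  (133, 133), (120, 137), (133, 139), (137, 137)]
scored = [score_with_null(g) for g in true_genotypes]

print("true:  ", heterozygosity(true_genotypes))
print("scored:", heterozygosity(scored))
```

In this example, observed heterozygosity halves from 0.75 to 0.375. Expected heterozygosity drops much less, from about 0.73 to about 0.66, so the gap between the two grows.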
 Applications: In general, microsatellites show a high level of 
polymorphism. As a consequence, they are very informative markers 
that can be used for many population genetics studies, ranging from 
the individual level (e.g. clone and strain identification) to that of 
closely related species. Conversely, their high mutation rate makes 
them unsuitable for studies involving higher taxonomic levels. 
Microsatellites are also considered ideal markers in gene mapping 
studies (Hearne et al. 1992; Morgante and Olivieri 1993; Queller et 
al. 1993; Jarne and Lagoda 1996). 
Inter Simple Sequence Repeats (ISSR)
Description: ISSRs are DNA fragments of about 100–3000 bp located 
between adjacent, oppositely oriented microsatellite regions. ISSRs 
are amplified by PCR using primers (16–18 bp) that consist of a 
microsatellite core sequence plus a few selective nucleotides anchoring 
them in the adjacent non-repeat region. About 10–60 fragments from multiple
loci are generated simultaneously, separated by gel electrophoresis 
and scored as the presence or absence of fragments of particular size. 
Techniques related to ISSR analysis are Single Primer Amplification 
Reaction (SPAR) that uses a single primer containing only the core 
motif of a microsatellite, and Directed Amplification of Minisatellite-
region DNA (DAMD) that uses a single primer containing only the 
core motif of a minisatellite. 
 Strengths: The main advantage of ISSRs is that no sequence 
data for primer construction are needed. Because the analytical 
procedures include PCR, only low quantities of template DNA are 
required (5–50 ng per reaction). Furthermore, ISSRs are randomly 
distributed throughout the genome. 
 Weaknesses: Because ISSR is a multilocus technique, disadvantages 
include the possible non-homology of similar sized fragments. 
Moreover, ISSRs, like RAPDs, can have reproducibility problems.
 Applications: Because of the multilocus fingerprinting profiles 
obtained, ISSR analysis can be applied in studies involving genetic 
identity, parentage, clone and strain identification, and taxonomic 
studies of closely related species. In addition, ISSRs are considered 
useful in gene mapping studies (Godwin et al. 1997; Zietkiewicz et 
al. 1994; Gupta et al. 1994).
Single-Strand Conformation Polymorphism (SSCP) 
Description: SSCPs are DNA fragments of about 200–800 bp amplified 
by PCR using specific primers of 20–25 bp. Gel electrophoresis of 
single-strand DNA is used to detect nucleotide sequence variation 
among the amplified fragments. The method is based on the fact 
that the electrophoretic mobility of single-strand DNA depends 
on the secondary structure (conformation) of the molecule, which 
can change markedly as a result of mutation. Thus, SSCP provides a 
method to detect nucleotide variation among DNA samples without 
having to perform sequence reactions. In SSCP the amplified 
DNA is first denatured and then subjected to non-denaturing 
gel electrophoresis. Techniques related to SSCP are Denaturing 
Gradient Gel Electrophoresis (DGGE), in which double-stranded 
DNA is converted to single-stranded DNA in an increasingly 
denaturing physical environment during gel electrophoresis, 
and Thermal Gradient Gel Electrophoresis (TGGE), which uses 
temperature gradients to denature double-stranded DNA during 
electrophoresis.
 Strengths: Advantages of SSCP are the codominance of alleles 
and the low quantities of template DNA required (10–100 ng per 
reaction) due to the fact that the technique is PCR-based.
 Weaknesses: Drawbacks include the need for sequence data 
to design PCR primers and the necessity of highly standardized 
electrophoretic conditions in order to obtain reproducible results. 
Furthermore, some mutations may remain undetected, and hence 
absence of mutation cannot be proven.
 Applications: SSCPs have been used to detect mutations in 
genes using gene sequence information for primer construction 
(Hayashi 1992).
Cleaved Amplified Polymorphic Sequence (CAPS) 
Description: CAPS are DNA fragments amplified by PCR using 
specific 20–25 bp primers, followed by digestion of the PCR products 
with a restriction enzyme. Subsequently, length polymorphisms 
resulting from variation in the occurrence of restriction sites are 
identified by gel electrophoresis of the digested products. CAPS 
have also been referred to as PCR-Restriction Fragment Length 
Polymorphism (PCR-RFLP).
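The CAPS principle can be simulated on sequence data. A single base substitution that destroys a restriction site leaves one allele's amplicon uncut, while the other allele is cut into two fragments. A heterozygote then shows all three fragments, which is why CAPS are codominant. The sketch assumes EcoRI (recognition site GAATTC, cut after the G on the top strand), complete digestion and two hypothetical 40 bp amplicons. The function name digest is illustrative.

```python
import re


def digest(amplicon: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Return fragment lengths after complete digestion at every occurrence of `site`.

    `site` must be a plain A/C/G/T string. `cut_offset` is the cut position
    within the site on the top strand (EcoRI cuts G^AATTC, so the offset is 1).
    Sticky-end overhangs are ignored.
    """
    seq = amplicon.upper()
    # A lookahead finds every occurrence, including overlapping ones
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical alleles differing by one substitution (GAATTC -> GAGTTC)
allele_1 = "ATGCCGTTAC" + "GAATTC" + "CTAGCTTACGGATCCAGTTGCAAT"
allele_2 = "ATGCCGTTAC" + "GAGTTC" + "CTAGCTTACGGATCCAGTTGCAAT"

print(digest(allele_1))  # [11, 29]
print(digest(allele_2))  # [40]
```

On a gel, a heterozygote for these alleles would show bands of 40, 29 and 11 bp.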
 Strengths: Advantages of CAPS include the involvement of PCR 
requiring only low quantities of template DNA (50–100 ng per 
reaction), the codominance of alleles and the high reproducibility. 
Compared to RFLPs, CAPS analysis does not include the laborious 
and technically demanding steps of Southern blot hybridization 
and radioactive detection procedures. 
 Weaknesses: In comparison with RFLP analysis, CAPS polymorphisms 
are more difficult to find because of the limited size of the amplified 
fragments (300–1800 bp). Furthermore, sequence data are needed to 
design the PCR primers.
 Applications: CAPS markers have been applied predominantly 
in gene mapping studies (Akopyanz et al. 1992; Konieczny and 
Ausubel 1993).
Sequence Characterized Amplified Region (SCAR) 
Description: SCARs are DNA fragments amplified by the PCR using 
specific 15–30 bp primers, designed from nucleotide sequences established 
from cloned RAPD fragments linked to a trait of interest. By using longer 
PCR primers, SCARs do not face the problem of low reproducibility 
generally encountered with RAPDs. Obtaining a codominant marker 
may be an additional advantage of converting RAPDs into SCARs, 
although SCARs may exhibit dominance when one or both primers 
partially overlap the site of sequence variation. Length polymorphisms 
are detected by gel electrophoresis. 
 Strengths: The main advantage of SCARs is that they are quick and 
easy to use. In addition, SCARs have a high reproducibility and are 
locus-specific. Due to the use of PCR, only low quantities of template 
DNA are required (10–100 ng per reaction). 
 Weaknesses: Disadvantages include the need for sequence data to 
design the PCR primers.
 Applications: SCARs are locus-specific and have been applied 
in gene mapping studies and marker-assisted selection (Paran and 
Michelmore 1993).
Amplified Fragment Length Polymorphism (AFLP)
Description: AFLP is a trademark of KeyGene (Wageningen, The 
Netherlands). AFLPs are DNA fragments (80–500 bp) obtained from 
digestion with restriction enzymes, followed by ligation of oligonucleotide 
adapters to the digestion products and selective amplification by the 
PCR. AFLP analysis therefore combines elements of RFLP and PCR. The PCR primers 
consist of a core sequence (part of the adapter), a restriction-enzyme-specific 
sequence and 1–5 selective nucleotides (the higher the number
of selective nucleotides, the lower the number of bands obtained per 
profile). The AFLP banding profiles are the result of variations in the 
restriction sites or in the intervening region. The AFLP technique 
simultaneously generates fragments from many genomic sites (usually 
50–100 fragments per reaction) that are separated by polyacrylamide 
gel electrophoresis and that are generally scored as dominant markers 
(Figure 3). Selective Fragment Length Amplification (SFLA) and Selective 
Restriction Fragment Amplification (SRFA) are synonyms sometimes 
used to refer to AFLPs. A variation of the AFLP technique is 
known as Selectively Amplified Microsatellite Polymorphic 
Locus (SAMPL). This technology amplifies microsatellite loci 
by using a single AFLP primer in combination with a primer 
complementary to compound microsatellite sequences, so the loci 
do not require prior cloning and characterization.
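The statement that more selective nucleotides give fewer bands can be made roughly quantitative. Suppose bases occur with equal frequency and independently. Each selective nucleotide then matches only one in four fragment ends, so with n selective nucleotides in total over both primers, about (1/4)^n of the eligible adapter-ligated fragments are amplified. The sketch below applies this approximation to a hypothetical starting number of fragments. Real genomes deviate from equal base composition, so the figures are only indicative.

```python
def expected_aflp_bands(n_fragments: int, selective_per_primer: tuple) -> float:
    """Rough expected number of amplified fragments for a primer combination.

    Assumes equal, independent base frequencies, so each selective
    nucleotide retains 1/4 of the fragments (a simplification).
    """
    total_selective = sum(selective_per_primer)
    return n_fragments / 4 ** total_selective


N = 250_000  # hypothetical number of adapter-ligated fragments eligible for amplification
for combo in [(1, 1), (2, 2), (3, 3)]:
    print(combo, round(expected_aflp_bands(N, combo)))
```

Each additional selective nucleotide divides the expected band count by four. This is why the number of selective nucleotides is a convenient way to tune profile complexity.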
 Strengths: The strengths of AFLPs lie in their high genomic 
abundance, considerable reproducibility, the generation of 
many informative bands per reaction, their wide range of 
applications, and the fact that no sequence data for primer 
construction are required. However, AFLPs may not be distributed 
entirely at random across the genome, as clustering in certain genomic 
regions, such as centromeres, has been reported for some plant species
(Alonso-Blanco et al. 1998; Young et al. 1999; Saal and Wricke 
2002). AFLPs can be analyzed on automatic sequencers, 
but software problems concerning the scoring of AFLPs are 
encountered on some systems. 
 Weaknesses: Disadvantages include the need for purified, 
high molecular weight DNA, the dominance of alleles, and the 
possible non-homology of comigrating fragments belonging to 
different loci. In addition, due to the high number and different 
intensity of bands per primer combination, there is the need 
to adopt certain strict but subjectively determined criteria for 
acceptance of bands in the analysis. Special attention should be 
paid to the fact that AFLP bands are not always independent. For 
example, if an insertion occurs between two restriction sites, the 
amplified DNA fragment increases in size. This will be interpreted 
as the loss of a small band and at the same time as the gain of a 
larger band. This is important for the analysis of genetic relatedness, 
because a single mutation is then counted twice, giving such 
non-independent bands more weight than the other bands.
 Applications: Because of the highly informative fingerprinting 
profiles generally obtained, AFLPs can be applied in studies 
involving genetic identity, parentage and identification of 
clones and cultivars, and phylogenetic studies of closely related 
species. Their high genomic abundance and generally random 
distribution throughout the genome make AFLPs a widely 
valued technology for gene mapping studies (Vos et al. 1995). 
SAMPL is considered more applicable to intraspecific than to 
interspecific studies due to frequent null alleles.
Figure 3. Variation among flax samples in part of an AFLP autoradiogram using ³³P radiolabelling. 
This AFLP analysis formed part of a marker-assisted rationalization study in a flax collection 
(van Treuren et al. 2001). 
Comparative qualities of marker techniques
DNA provides many advantages that make it especially attractive 
in studies of diversity and relationships. These advantages have 
been reviewed elsewhere (e.g. Crawford 1990). They include: 1) 
Freedom from environmental and pleiotropic effects. Molecular 
markers do not exhibit phenotypic plasticity, while morphological 
and biochemical markers can vary in different environments. DNA 
characters have a much better chance of providing homologous 
traits. Most morphological or biochemical markers, in contrast, 
are under polygenic control, and subject to epistatic control 
and environmental modification (plasticity); 2) A potentially 
unlimited number of independent markers are available, unlike 
morphological or biochemical data; 3) DNA characters can be more 
easily scored as discrete states of alleles or DNA base pairs, while 
some morphological, biochemical and field evaluation data must 
be scored as continuously variable characters that are less amenable 
to robust analytical methods; 4) Many molecular markers are 
selectively neutral. These advantages do not imply that other more 
traditional data used to characterize biodiversity are not valuable. 
On the contrary, morphological, ecological and other “traditional” 
data will continue to provide practical and often critical information 
needed to characterize genetic resources.
 Molecular markers differ in many qualities and must therefore 
be chosen carefully, and their data analyzed, with these differences 
in mind. To assist in choosing the appropriate marker technique, 
the main properties of the 11 main marker technologies described in 
“Overview of molecular technologies” are reviewed below. 
These properties are summarized in Table 2. 
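The ratings in Table 2 are categorical, so they can be treated as a small lookup table and filtered when several criteria matter at once. The sketch below transcribes three of its columns (codominance of alleles, reproducibility and quantity of DNA required) for the 11 techniques. The dictionary name TABLE_2, the function select and the example query are illustrative only. A real choice would weigh all columns together with the level of polymorphism needed for the material at hand.

```python
# Three columns of Table 2 per technique:
# (codominance of alleles, reproducibility, quantity of DNA required); "-" = not applicable
TABLE_2 = {
    "Allozymes":       ("Yes",    "High",        "-"),
    "RFLP":            ("Yes",    "High",        "High"),
    "Minisatellites":  ("No/Yes", "High",        "High"),
    "PCR sequencing":  ("Yes",    "High",        "Low"),
    "RAPD":            ("No",     "Low",         "Low"),
    "Microsatellites": ("Yes",    "High",        "Low"),
    "ISSR":            ("No",     "Medium-High", "Low"),
    "SSCP":            ("Yes",    "Medium",      "Low"),
    "CAPS":            ("Yes",    "High",        "Low"),
    "SCAR":            ("No/Yes", "High",        "Low"),
    "AFLP":            ("No/Yes", "Medium-High", "Medium"),
}


def select(codominance="Yes", reproducibility="High", dna="Low"):
    """Return techniques whose three ratings all match exactly (mixed ratings such as "No/Yes" only match themselves)."""
    return [name for name, (cod, rep, qty) in TABLE_2.items()
            if cod == codominance and rep == reproducibility and qty == dna]


print(select())  # codominant, highly reproducible, little DNA needed
```

Exact string matching keeps the sketch simple. Mixed ratings such as "No/Yes" must therefore be queried explicitly rather than being treated as partially codominant.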
Genomic abundance 
The number of markers that can be generated is determined mainly 
by the frequency at which the sites of interest occur within the 
genome. RFLPs and AFLPs generate abundant markers due to the 
large number of restriction enzymes available and the frequent 
occurrence of their recognition sites within genomes. Within 
eukaryotic genomes, microsatellites have also been found to 
occur frequently. RAPD markers are even more abundant because 
numerous random sequences can be used for primer construction. 
In contrast, the number of allozyme markers is restricted due to the 
limited number (about 30) of enzyme detection systems available for 
analysis. To investigate specific genomic regions by PCR sequencing, 
SSCP, CAPS or SCAR, sequence data of the sites of interest (structural 
genes mainly) are required for primer construction. Although, in 
principle, many sites of interest may occur within genomes, the 
proportion of the genome covered by PCR sequencing, SSCP, CAPS 
and SCAR in studies reported to date is limited. However, this is 
expected to change due to the wealth of sequence information that 
is becoming increasingly available for different crops. Genomic 
abundance is essential to studies where a large fraction of the 
genome needs to be covered, e.g. for the development of high-density 
linkage maps in gene mapping studies. 

Table 2. Overview of the relevant characteristics of 11 main marker technologies (- = not applicable)

Technique        Genomic      Level of      Locus-       Codominance  Repro-       Labour-     Technical   Operational  Development  Quantity of   Amenability to
                 abundance    polymorphism  specificity  of alleles   ducibility   intensity   demands     costs        costs        DNA required  automation
Allozymes        Low          Low           Yes          Yes          High         Low         Low         Low          Low          -             No
RFLP             High         Medium        Yes          Yes          High         High        High        High         Medium-High  High          No
Minisatellites   Medium       High          No/Yes       No/Yes       High         High        High        High         Medium-High  High          No
PCR sequencing   Low          Low           Yes          Yes          High         Low/High    High        High         High         Low           Yes
RAPD             High         Medium        No           No           Low          Low         Low         Low          Low-Medium   Low           Yes
Microsatellites  High         High          Yes          Yes          High         Low         Low-Medium  Low          High         Low           Yes
ISSR             Medium-High  Medium        No           No           Medium-High  Low         Low-Medium  Low-Medium   Low          Low           Yes
SSCP             Low          Low           Yes          Yes          Medium       Low-Medium  Medium      Low-Medium   High         Low           No
CAPS             Low          Low-Medium    Yes          Yes          High         Low-Medium  Low         Low          Medium       Low           Yes
SCAR             Low          Medium        Yes          No/Yes       High         Low         Low         Low          Medium       Low           Yes
AFLP             High         Medium        No           No/Yes       Medium-High  Medium      Medium      Medium       Low          Medium        Yes

 If, in addition to genomic abundance, genome coverage is also 
sought, caution should be exercised in marker selection. While some
markers are known to be scattered quite evenly across the genomes, 
others, such as some AFLP markers, sometimes cluster in certain 
genomic regions. For example, clustering of AFLP markers has been 
reported in centromeric regions of Arabidopsis thaliana (Alonso-Blanco et 
al. 1998), soybean (Young et al. 1999) and rye (Saal and Wricke 2002).
Level of polymorphism 
The resolving power of genetic markers is determined by the level 
of polymorphism detected, which is determined by the mutation 
rate at the genomic sites involved. Variation at allozyme loci is 
caused by point mutations, which occur at low frequency (<10⁻⁶ 
per meiosis). Moreover, only mutations modifying the net electric 
charge or conformation of proteins can be detected, reducing the 
resolving power of allozymes. In contrast, mutations at minisatellite 
and microsatellite loci, mainly due to changes in the number of 
repeat units of the core sequence, have been estimated to occur at 
the relatively high frequency of 10⁻³–10⁻² and 10⁻⁵–10⁻² per meiosis, 
respectively (Jarne and Lagoda 1996). The other markers presented 
in Table 2 generally show intermediate levels of polymorphism, 
resulting from base substitutions, insertions or deletions which 
may alter primer annealing sites and recognition sites of restriction 
enzymes, or change the size of restriction fragments and amplified 
products. In choosing the appropriate technique, the level of 
polymorphism detected by the marker needs to be considered 
in relation to the presumed degree of genetic relatedness within 
the material to be studied. Higher resolving power is required 
when samples are more closely related. For example, analyses 
within species or among closely related species may call for fast-evolving 
markers such as microsatellites. However, if the objective 
is to study genetic relatedness at higher taxonomic levels (such as
congeneric species), AFLPs or RFLPs may be a better choice because 
co-migrating fast-evolving markers will have less chance of being 
homologous. A primary guiding principle in marker selection is that 
more conservative markers (those having slower evolutionary rates) 
are needed with increasing evolutionary distance and vice versa.
Locus-specificity 
Genetic markers using multilocus probes or primers benefit from 
the fact that multiple polymorphisms, representing various genomic 
regions, are generated simultaneously. However, a major drawback 
is that in general the band profiles cannot be interpreted in terms of 
loci and alleles, but are scored as the presence or absence of bands 
of a particular size. As a consequence, similar sized fragments 
may represent alleles from different loci and not be homologous. 
Therefore, locus-specific markers should be considered for questions 
of phylogeny or genetic relatedness. By contrast, markers for 
fingerprinting studies rely on differences only, so homology is not a 
concern. In general, locus-specific markers generate polymorphisms 
of known identity; however, in most cases sequence data are 
needed for their development.
Codominance of alleles 
Codominant markers are markers for which both alleles are 
expressed when co-occurring in an individual. Therefore, with 
codominant markers, heterozygotes can be distinguished from 
homozygotes, allowing the determination of genotypes and 
allele frequencies at loci. In contrast, band profiles of dominant 
markers are scored as the presence or absence of fragments of a 
particular size, and heterozygosity cannot be determined directly. 
As a consequence, only an approximation of allele frequency 
can be obtained by assuming Hardy-Weinberg equilibrium in a 
population and estimating allele frequency from the proportion of 
individuals with the absent phenotype (homozygous recessive). 
For predominantly self-fertilizing species, heterozygosity could be 
disregarded and allele frequencies be considered equal to observed 
band frequencies. Codominant markers are preferred for most 
applications. The majority of codominant markers are single-locus 
markers, and hence the amount of information per assay is usually 
lower than with multilocus techniques.
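The Hardy-Weinberg approximation for dominant markers described above estimates q, the frequency of the band-absent (null) allele. It is the square root of the proportion of band-absent individuals, with p = 1 - q. For predominantly selfing species, the band-absent proportion itself is used. A minimal sketch with hypothetical counts follows. The function name is illustrative, and it returns a simple point estimate with no small-sample correction.

```python
import math


def dominant_allele_freqs(n_band_absent: int, n_total: int, selfing: bool = False):
    """Estimate (p, q) for a dominant marker from band presence/absence counts.

    q is the frequency of the null (band-absent) allele.
    - Outcrossing: assumes Hardy-Weinberg equilibrium, so q = sqrt(absent / total).
    - Predominantly selfing: heterozygosity is ignored and q = absent / total.
    """
    absent = n_band_absent / n_total
    q = absent if selfing else math.sqrt(absent)
    return 1 - q, q


# Hypothetical sample: 100 plants, 36 of which lack the band
print(dominant_allele_freqs(36, 100))                # Hardy-Weinberg: q = 0.6
print(dominant_allele_freqs(36, 100, selfing=True))  # selfing: q = 0.36
```

The same band data give quite different allele frequencies under the two assumptions. This illustrates why codominant markers, which observe genotypes directly, are preferred.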
Reproducibility 
Reproducibility is always an important property of markers, 
but even more important with collaborative projects, involving 
the generation of data by different labs whose results need to be 
assembled. To obtain reproducible results, the extraction of purified, 
high quality DNA is a prerequisite for the majority of the marker 
techniques. For example, degraded and/or unpurified DNA may 
affect the amplification or restriction of DNA, resulting in unspecific 
polymorphisms. Even when purified and high molecular weight 
DNA is used, RAPDs often fail to show reproducible results. This 
is because RAPD primers are very short (10 bp), so small deviations 
in experimental conditions can alter their annealing to the template 
DNA and hence the resulting band profiles. Therefore, highly standardized 
experimental procedures are required when RAPD markers are used. 
This implies including replicate samples as well as reference 
genotypes with bands of known size. Problems with reproducibility 
in RAPD analysis could be overcome by focusing on mapped markers 
whose inheritance has already been verified.
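Replicate samples allow reproducibility to be quantified. One simple measure is the proportion of scored bands whose presence/absence call differs between two independent runs of the same DNA sample. The sketch below computes this mismatch rate for hypothetical binary profiles. The 5% acceptance threshold is an illustrative assumption, not a published standard.

```python
def mismatch_rate(run_1: list, run_2: list) -> float:
    """Proportion of band positions scored differently in two replicate runs (1 = present, 0 = absent)."""
    if len(run_1) != len(run_2):
        raise ValueError("replicates must be scored for the same set of bands")
    return sum(a != b for a, b in zip(run_1, run_2)) / len(run_1)


# Hypothetical replicate RAPD profiles for three accessions (same 10 bands each)
replicates = {
    "acc_1": ([1, 0, 1, 1, 0, 1, 0, 0, 1, 1], [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]),
    "acc_2": ([1, 1, 0, 1, 0, 0, 1, 0, 1, 0], [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]),
    "acc_3": ([0, 1, 1, 1, 1, 0, 0, 1, 0, 1], [0, 1, 1, 0, 1, 0, 1, 1, 0, 1]),
}

THRESHOLD = 0.05  # hypothetical maximum acceptable mismatch rate
for name, (run_1, run_2) in replicates.items():
    rate = mismatch_rate(run_1, run_2)
    status = "ok" if rate <= THRESHOLD else "re-run or drop unreliable bands"
    print(f"{name}: {rate:.2f} {status}")
```

Tallying mismatches per band position across all accessions, rather than per accession, would instead identify individual bands that are consistently unreliable and could be excluded from scoring.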
Labour-intensity 
RFLPs and minisatellites are labour-intensive markers because 
their analysis includes the time-consuming steps of Southern 
blotting, labelling of probes and hybridization. Therefore, PCR-
based techniques are currently preferred, some of which can even 
be automated to decrease the labour-intensity. PCR sequencing 
may still be quite labour-intensive if performed by the old time-
consuming method of performing four separate sequence reactions 
per sample. However, automated procedures have greatly reduced 
labour-intensity of PCR-sequencing. The labour-intensity of the 
other PCR-based techniques presented varies from low to medium, 
depending on the methodological procedures required in addition 
to PCR (Table 2). 
Technical demands 
RFLPs, minisatellites and manual PCR sequencing require higher 
technical skills and facilities for analysis. RFLP and minisatellite 
analyses require Southern blot hybridizations and may include 
radioactive labelling. This calls for expertise and dedicated facilities 
that comply with special legal and safety requirements. These
technologies are therefore among the most technically demanding 
markers. Another type of technical demand arises from the use of 
polyacrylamide gels and automated equipment. Allozymes and 
PCR-based markers analyzed on agarose gels (e.g. RAPD, SCAR 
and microsatellites) are the least technically demanding. 
Operational costs 
Wages, laboratory facilities, technical equipment and consumables 
all contribute to the operational costs of the technologies. Relatively 
expensive consumables include Taq polymerase (needed for all PCR-based 
marker types), restriction enzymes (for RFLPs, minisatellites
and CAPS, and particularly the restriction enzyme MseI often used in 
AFLPs) and isotopes where polymorphisms are visualized by means 
of radioactive labelling. Polyacrylamide gels are more expensive to 
run than agarose gels and require visualization of polymorphisms by 
autoradiography or silver staining procedures, which are more costly 
compared to ethidium-bromide staining. Laborious and technically 
demanding markers, such as RFLPs, minisatellites, PCR sequencing, 
and those techniques being performed by automated equipment, 
are quite expensive. Costs of performing RAPD analyses are usually 
considered low. However, once the measures needed to ensure 
reproducibility and the low number of markers obtained per primer are 
taken into account, costs may rise to the level of the more complex technologies.
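The point about RAPD costs can be made concrete with simple arithmetic. The sketch below assumes a hypothetical cost model that is not taken from this bulletin: the cost of one usable marker per sample is the cost of a reaction, multiplied by the number of replicate reactions run to confirm reproducibility, divided by the number of reproducible markers each primer yields. The function name and all figures are illustrative assumptions.

```python
def cost_per_marker(reaction_cost: float, replicates: int, markers_per_primer: float) -> float:
    """Cost of one usable marker scored in one sample (hypothetical cost model).

    reaction_cost      -- consumables plus labour for a single reaction
    replicates         -- reactions run per sample and primer to confirm reproducibility
    markers_per_primer -- reproducible polymorphic bands scored per primer
    """
    if replicates < 1 or markers_per_primer <= 0:
        raise ValueError("need at least one replicate and a positive marker count")
    return reaction_cost * replicates / markers_per_primer


# Hypothetical figures: a cheap reaction run in duplicate that yields few scorable
# bands, versus a dearer single reaction that yields many markers.
cheap_assay = cost_per_marker(reaction_cost=1.0, replicates=2, markers_per_primer=2)
rich_assay = cost_per_marker(reaction_cost=4.0, replicates=1, markers_per_primer=40)
print(f"cheap assay: {cheap_assay:.2f} per marker; multi-marker assay: {rich_assay:.2f} per marker")
```

With these made-up numbers the duplicated, low-yield assay costs 1.00 per marker against 0.10 for the multi-marker assay, even though each of its reactions is four times cheaper.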
 In general, operational costs of markers will vary depending on the 
methodology. Regarding automated procedures and technologies, 
while purchasing the equipment is usually very expensive and 
the technical expertise required is high, a significant increase in 
throughput may be obtained through multiplexing. An additional 
consideration is the emergence of cost-effective “outsourcing” 
companies that generate marker-based and DNA sequencing data, as 
service laboratories keep pace with developments in efficient equipment. 
Outsourcing allows researchers to concentrate on defining questions, 
experimental design, data analysis and interpretation. The relative 
costs/benefits of outsourcing will vary in different labs according to 
local labour and supply costs, availability of equipment, the benefit 
of generating your own data for quality control or educational 
purposes, and the legal requirements to ship crop germplasm DNA 
out of a country.
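Part of the outsourcing decision can be framed as a break-even calculation. In this sketch, which assumes a simple linear cost model and hypothetical figures, in-house analysis carries a fixed cost (for example, equipment) plus a per-sample cost, whereas a service laboratory charges only a per-sample fee; the function names are illustrative. It deliberately leaves out the non-monetary factors listed above, such as quality control, training and the legal requirements for shipping DNA.

```python
import math


def break_even_samples(fixed_cost: float, in_house_per_sample: float,
                       outsourced_per_sample: float) -> float:
    """Number of samples above which in-house analysis becomes cheaper than outsourcing.

    Returns math.inf when in-house analysis is never cheaper per sample.
    """
    saving_per_sample = outsourced_per_sample - in_house_per_sample
    if saving_per_sample <= 0:
        return math.inf
    return fixed_cost / saving_per_sample


def cheaper_option(n_samples: int, fixed_cost: float, in_house_per_sample: float,
                   outsourced_per_sample: float) -> str:
    """Compare total costs for a given workload under the linear model."""
    in_house_total = fixed_cost + n_samples * in_house_per_sample
    outsourced_total = n_samples * outsourced_per_sample
    return "in-house" if in_house_total < outsourced_total else "outsource"


# Hypothetical figures: 50 000 fixed cost, 3 per sample in-house, 8 per sample outsourced.
print(break_even_samples(50_000, 3, 8))      # 10000.0
print(cheaper_option(2_000, 50_000, 3, 8))   # outsource
print(cheaper_option(20_000, 50_000, 3, 8))  # in-house
```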
Development costs 
Marker development may be very time-consuming and costly 
when suitable probes or sequence data for primer construction 
are unavailable. Development of suitable probes for Southern blot 
hybridizations (e.g. for RFLP analysis) requires the construction of 
either genomic or cDNA libraries and the examination of various 
probe/restriction enzyme combinations for their ability to detect 
polymorphisms. The development of site-specific PCR primers 
(e.g. for microsatellite analysis) also requires the construction of 
libraries, which then need to be screened to identify the fragments of 
interest. Subsequently, the identified fragments need to be sequenced 
to verify their suitability and to design primers. Therefore, the 
investment required for marker development should be evaluated 
in relation to the intended range of application of the technique. 
Alternatively, new genomic tools are allowing probes, primers 
and sequence data to be obtained from genome databases of other 
species, with the understanding, as in all DNA tools, that their 
usefulness may decrease with increasing evolutionary distance 
between the species. 
Quantity of DNA required 
Because only small quantities of template DNA (5–100 ng per 
reaction) are required, techniques which are based on the PCR are 
currently preferred. Although RFLPs and minisatellites require 
the largest amount of DNA (5–10 µg per reaction), Southern blot 
membranes may be probed several times. Intermediate quantities of 
DNA are needed for AFLP-analysis (0.3–1 µg per reaction) because 
restriction of the DNA precedes the PCR reaction. In general, 
consideration should be given to the use of PCR-based markers if 
only small amounts of DNA can be obtained.
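Using the per-reaction amounts quoted above, a rough count of how many reactions one DNA extract can support is simple division. The sketch assumes the upper end of each quoted range (a conservative choice) and a hypothetical 20 µg extract; actual requirements depend on the protocol, and the dictionary and function names are illustrative.

```python
# Upper ends of the per-reaction DNA ranges quoted above, in nanograms.
DNA_PER_REACTION_NG = {
    "PCR-based (e.g. RAPD, microsatellite)": 100,  # 5-100 ng
    "AFLP": 1_000,                                 # 0.3-1 µg
    "RFLP / minisatellite": 10_000,                # 5-10 µg
}


def reactions_supported(extract_ng: float, per_reaction_ng: float) -> int:
    """Whole reactions an extract can supply, assuming upper-range amounts."""
    return int(extract_ng // per_reaction_ng)


extract_ng = 20_000  # hypothetical yield of 20 µg
for technique, need in DNA_PER_REACTION_NG.items():
    print(f"{technique}: {reactions_supported(extract_ng, need)} reactions")
```

Because a Southern blot membrane may be probed several times, the two RFLP digests in this example could still yield data for several probes.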
Amenability to automation 
Currently, if adequate equipment and resources are available, 
techniques that can be automated are highly preferred because of 
the potential for high sample throughput. Although considerable 
financial investment is still required, automation may be cost-
effective when techniques are applied on a routine basis. As 
pointed out above, outsourcing of data generation may also be an 
alternative strategy. Nearly all techniques that are based on the PCR 
are amenable to a certain degree of automation.
Genebank management
The realization that the world was rapidly losing much of its 
agrobiodiversity led to a global effort to collect and conserve 
germplasm. An increasing awareness of the narrow genetic base 
of crops in advanced agriculture and potential susceptibility to 
crop failures (National Research Council 1972) further stimulated 
the efforts to collect, and a system of national and international 
genebanks eventually amassed holdings of 6.1 million accessions 
in 1300 genebanks worldwide (FAO 1996).
 The great success of this collecting phase has presented new 
challenges to genebank managers to determine needs for new 
collections, maintain existing collections, determine optimum 
regeneration methods, characterize collections for useful agronomic 
traits, classify the collections, and reduce the working 
collection to a manageable size (the core collection concept; Frankel 
1984; Hamon et al. 1995). A major question facing genebank 
managers concerns which material needs to be included in a 
collection to conserve a representative sample of the total genetic 
diversity of a crop. Schoen and Brown (1993) show how 
molecular marker data can help identify the optimal strategy for sampling 
materials from populations of wild relatives to maximize the number 
of alleles conserved. Hamilton (1994) argues that simple marker diversity 
is an insufficient measure of the maintenance of diversity, and suggests 
that more detailed studies of the genetic correlations of quantitative 
genetic variation and of gene-by-environment interaction are needed.
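The idea of using marker data to maximize the number of alleles retained can be illustrated with a simple greedy heuristic: repeatedly add the accession that contributes the most alleles not yet represented. This is a sketch of the general principle only, not the specific procedure of Schoen and Brown (1993); the function name, accession names and allele sets are hypothetical. A greedy choice is not guaranteed to find the best possible set, but it is easy to compute.

```python
def greedy_allele_selection(accessions: dict[str, set[str]], k: int) -> list[str]:
    """Pick up to k accessions, each time adding the one that contributes most new alleles.

    accessions -- mapping of accession name to the alleles it carries, e.g. 'locus:allele'
    """
    chosen: list[str] = []
    captured: set[str] = set()
    remaining = dict(accessions)
    while remaining and len(chosen) < k:
        # Accession with the most not-yet-captured alleles; iterating over sorted
        # names makes ties resolve alphabetically, so results are reproducible.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - captured))
        if not remaining[best] - captured:
            break  # no remaining accession adds anything new
        chosen.append(best)
        captured |= remaining.pop(best)
    return chosen


# Hypothetical allele profiles at three loci (A, B, C)
collection = {
    "acc1": {"A:1", "B:1"},
    "acc2": {"A:1", "B:2", "C:1"},
    "acc3": {"A:2", "B:1"},
    "acc4": {"A:1", "B:1", "C:1"},
}
print(greedy_allele_selection(collection, k=2))  # ['acc2', 'acc3']
```

Here two accessions already capture all five alleles, so asking for more returns the same two.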
 Another important question is how the genetic diversity present 
in the original collections is being affected and retained after serial 
germplasm increase cycles. Germplasm is stored as seeds for the 
majority of species; under proper conditions of low temperature 
and humidity, seeds remain viable for 20 years or more. Other 
species, however, have seeds that rapidly lose viability and consequently 
must be regenerated vegetatively. Each time a collection is regenerated 
from seeds, it has the potential to lose genetic diversity, especially 
if it is increased under greenhouse or experimental field conditions 
where it is removed from natural evolutionary forces (Bretting and 
Duvick 1997). Others question the adequacy of population sizes 
needed for increase cycles (Brown et al. 1997), proper pollination 
and seed increase strategies (Crossa and Vencovsky 1994), and 
avoidance of the accumulation of deleterious mutations within 
accessions under long-term storage (Schoen et al. 1998). 
Acquisition of collection material
Biogeography is the science that attempts to document and understand 
spatial patterns of biodiversity, and includes the study of historical and 
climatic effects on plant distributions (Brown and Lomolino 1998). The 
literature sometimes shows an association between DNA-based relationships 
and geography and sometimes fails to show one. One study 
showing association is that of Whitkus et al. (1998), who investigated 
the origins of different cultivars of cacao. Wild populations of cacao are 
widely distributed from the upper Amazonian basin (South American 
populations) to southern Mexico (Mesoamerican populations). Cultivars 
from Mesoamerica are termed “criollo”, while South American 
cultivars are termed “forastero”. One hypothesis suggested a single 
origin of cultivars in South America, followed by human dispersal to 
Mesoamerica and later differentiation there, while the alternative 
hypothesis proposed independent origins in both areas from indigenous wild populations. 
Whitkus et al. (1998) examined these alternative hypotheses using 
RAPD markers from a wide variety of geographically diverse wild 
and cultivated populations. Their results supported a single origin of 
modern cultivars in South America. However, they also discovered 
relationships of distinct populations in ancient Mayan groves in 
southern Mexico to wild Mesoamerican populations. If these ancient 
Mayan populations are truly remnants of ancient cultivated sites, and 
if the RAPD data provide a good recapitulation of relationships, a 
separate origin of cacao in both regions is supported.
 Cronn et al. (1997) evaluated isozyme variability of 146 accessions 
of wild and weedy sunflower and two outgroup species, representing 
the geographic range of the species throughout much of North 
America. The great genetic variation in this species has led to a 
multitude of names. The goals of the study were to quantify the range 
of genetic diversity, divergence and redundancy in the collection, and 
to elucidate possible patterns of ecogeographic variation that would 
clarify interrelationships and place of origin of the domesticated 
forms. Prior hypotheses suggested domestication in the central United 
States with later dissemination from this region, or domestication in 
the southwestern United States. The results showed greater diversity 
in wild compared to cultivated accessions. There was a geographic 
association with isozyme diversity, with the greatest diversity from 
the Great Plains. The domesticated accessions are most similar to those 
from the Great Plains, suggesting this as the site of domestication. The 
alternative hypothesis of origin in the southwestern United States 
cannot be discounted, however, if early cultivars were introgressed with 
germplasm from the southwest.
 Other studies showing a relationship of molecular variation to 
geography or ecology are Espejo-Ibañez et al. (1994), Garvin and 
Weeden (1994), Yang et al. (1996), Lanner et al. (1997), Nevo et al. (1997, 
1998), Fahima et al. (1998) and Zerega (2004). However, genetic distance 
is often not congruent with geographic distance, as shown by 
Maass and Ocampo (1995), Lanner et al. (1996), Ursla et al. (1997), 
Varghese et al. (1997), Comes and Abbott (1998), Freville et al. (2001) and 
del Rio and Bamberg (2002).
 Marker studies addressing the distribution of genetic variation within 
and between populations have been used to guide the acquisition of new 
material for germplasm conservation. For example, the substantially 
higher level of RFLP variation observed in self-incompatible, as 
compared to self-compatible species of tomatoes was used to recommend 
predominantly sampling the self-incompatible species for germplasm 
acquisition (Miller and Tanksley 1990). Priority regions for further 
sampling of sorghum were identified by high diversity levels in some 
populations estimated from allozymes (Aldrich et al. 1992). Sampling of 
marginal populations was recommended in order to capture most of the 
rare and local alleles responsible for differentiation of pawpaw [Asimina 
triloba (L.) Dunal] in the US, based on the distribution of variation for 
allozyme and RAPD markers (Huang et al. 1998, 2000). Steiner et al. (1998) 
used RAPDs to measure genetic diversity of germplasm collections 
of Trifolium incarnatum L. In combination with data on pedigree, they 
documented low diversity in the existing cultivars and identified 
populations that need additional collecting. Lamboy et al. (1996) 
examined, with six enzyme systems, 291 seedlings of 31 sib families of 
Malus sieversii (Ledeb.) M. Roem., the primary progenitor of the cultivated 
apple. The families were produced from 14 populations collected in 4 
regions in eastern Kazakhstan. They found that there was only a very 
weak correspondence of allele frequencies to geographic region, that 
most of the sib families were more closely related to sib families from 
other regions, and that there were no alleles that were both fixed within 
and unique to a region. They concluded that populations of this species 
formed a large panmictic population and that thorough sampling of 
a few large populations would efficiently capture most of its genetic 
diversity. Other studies addressing the application of marker data for 
germplasm acquisition include Brown and Munday (1982), Murphy and 
Philips (1993), Lamboy et al. (1994), Tsegaye et al. (1996), Nebauer et al. 
(1999), Maquet et al. (1996, 1997) and Zoro Bi et al. (1998).
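One common way to summarize how marker variation is partitioned within and between populations is Nei's gene diversity statistics: H_S, the mean expected heterozygosity within populations; H_T, the expected heterozygosity computed from the pooled allele frequencies; and G_ST = (H_T - H_S) / H_T, the proportion of diversity found between populations. The bulletin does not prescribe this statistic; the sketch below covers a single locus, weights populations equally, and uses hypothetical allele frequencies. Low values suggest that most diversity lies within populations, so a few well-sampled populations may capture much of it; high values argue for sampling more populations.

```python
def expected_heterozygosity(freqs: list[float]) -> float:
    """Nei's gene diversity at one locus: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def gst(populations: list[list[float]]) -> tuple[float, float, float]:
    """Return (H_S, H_T, G_ST) for one locus, weighting populations equally.

    populations -- one allele-frequency list per population, alleles in the same order
    """
    n_pop = len(populations)
    n_alleles = len(populations[0])
    h_s = sum(expected_heterozygosity(p) for p in populations) / n_pop
    pooled = [sum(p[i] for p in populations) / n_pop for i in range(n_alleles)]
    h_t = expected_heterozygosity(pooled)
    g_st = (h_t - h_s) / h_t if h_t > 0 else 0.0
    return h_s, h_t, g_st


# Hypothetical two-allele frequencies in two populations
print(gst([[0.5, 0.5], [0.5, 0.5]]))  # identical populations: G_ST = 0
print(gst([[1.0, 0.0], [0.0, 1.0]]))  # fixed for different alleles: G_ST = 1
h_s, h_t, g = gst([[0.9, 0.1], [0.3, 0.7]])
print(f"H_S={h_s:.3f}, H_T={h_t:.3f}, G_ST={g:.3f}")
```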
 Marker data have also been used to identify unique germplasm that 
may be underrepresented in a genebank. Doebley (1989) examined, 
by RFLP analysis, the chloroplast DNA of 39 accessions representing 
the range of maize and its wild relatives in the genus Zea L. and the 
related Tripsacum dactyloides (L.) L. and T. pilosum Scribn. & Merr. 
The finding of an atypical cytoplasmic genome, incorporated into the 
nuclear background of some Z. perennis (Hitchc.) Reeves & Mangelsd. 
accessions led to the conclusion that introgression had occurred from 
an unknown Zea taxon. Kesseli et al. (1991) compared variation at 143 
RFLP loci between 67 accessions of cultivated lettuce and five related 
species. Lactuca serriola L. is closely related to cultivated lettuce, but the 
RFLP data indicated that unknown populations of L. serriola or other 
unknown entities have been involved in the evolution of cultivated 
lettuce. In summary, molecular data are useful to guide genebank 
curators in decisions concerning acquisition in cases where molecular 
variation is associated with geographic variation.
Taxonomic issues
Genebank researchers often address questions of species delimitation 
and species interrelationships (taxonomy or systematics). Molecular data 
bearing on these questions can be analyzed in different ways, depending 
on the taxonomic level of the question and the marker type used. The 
discussion that follows is intended only as a very basic introduction to 
the fundamental differences between classes of methods (phenetic and cladistic) used to analyze 
molecular data; the agreement (congruence) or disagreement of results 
from different molecular markers of the same germplasm sources; the 
predictive value of taxonomic results; and different ways in which 
these and other data are used to define species. We guide the reader 
to Judd et al. (2002) for detailed treatments of molecular systematics 
and analytical techniques.
 Taxonomy, like all branches of science, continues to advance with new 
sources of data, new ways to analyze these data, new ways to interpret 
the results of data analyses, and different philosophies on how to 
interpret the results of a data analysis. This manual will touch on some 
of these sources of ambiguity or controversy. These include different 
methods to use in phenetic and cladistic analyses, different ways to 
interpret the results, how to interpret the different results from the same 
group of organisms but with different sources of data (congruence of 
results), and different ways to define species. A discussion of these 
unresolved ambiguities is not meant to confuse the reader but rather 
to convey an understanding of the different interpretations of different 
investigators and to help you design your own studies and interpret 
your own results more intelligently.
Phenetic vs. cladistic analyses
Cladistics and phenetics are systematic approaches that share some 
points in common but have many fundamental differences. Both focus 
on the explicit examination of many characters, in contrast to earlier 
intuitive systematic techniques that sometimes used limited characters, 
or placed greater importance (weight) on only some characters. 
Nevertheless, many early and intuitive taxonomic interpretations 
concur with more recent molecular studies and have stood the test of 
time. They continue to form the basis for newer approaches and have 
provided invaluable points of reference for comparison of datasets. 
 Cladistic and phenetic procedures begin in the same way, by 
gathering an organism-by-character rectangular data matrix. The 
organism can be an individual, from which actual data are measured, 
or a “virtual taxon” (a species or a higher rank such as genus or family) from 
which taxon-specific (or near taxon-specific) characters are inferred. 
The term Operational Taxonomic Unit (OTU) is frequently used to 
refer to the individual or taxon being scored for characters. It is after 
this point of explicit data gathering that the techniques differ. Some 
basic references for phenetic procedures are Sneath and Sokal (1962), 
Sokal and Crovello (1970) and Rohlf (1992); for cladistic procedures they 
are Wiley et al. (1991) and Hall (2001).
Phenetics 
The advent of computers allowed the application of multivariate phenetic 
techniques to taxonomic data (grouping based on overall similarity; 
Sneath and Sokal 1962). A phenetic philosophy typically treats all 
data types (such as morphological, anatomical, chemical, molecular, or any other 
character type) as having equal value, even reproductive characters 
that in the taxonomic morphological species concept (see below) were 
typically given more value. An added claim was that these explicit, 
computer-based methods opened up the new classifications to scrutiny, 
as all data and analytical techniques were clearly stated and open to re-
evaluation by others. 
 Phenetic philosophy holds that changes in character states are 
so common that the evolutionary history of organisms can never be 
determined. As such, organisms are best classified by overall similarity. 
A phenetic organism-by-character matrix can contain: 1) actual quantitative 
measurements, such as lengths, widths, or other quantitative measures of 
many plant parts; 2) simple qualitative characters coded as 0/1; 3) 
multistate qualitative characters coded as 0, 1, 2, …, n; or 4) a mixture 
of these types of coded character. Generally, it is best to minimize the combination 
of qualitative and quantitative data because there are different algorithms 
to analyze each type of data. These characters can be either morphological 
or molecular characters, but we concentrate in this manual on molecular 
characters.
 As described in “Overview of molecular technologies”, molecular 
marker data are useful in that they can be scored as discrete characters 
(present=1, absent=0), or for DNA sequence data can be scored as one 
of the four nucleotides. As detailed in Table 2, some molecular markers 
are codominant (showing both alleles) and others dominant (showing 
only one allele). For dominant markers such as AFLPs or RAPDs, 
shared presence (1/1) and presence/absence (1/0) comparisons therefore 
carry more information than shared absence (0/0) comparisons; that is, 
they have a greater chance of representing homologous states. Because of 
this, dominant markers are best analyzed with proximity algorithms that 
place no weight on 0/0 matches, such as Jaccard’s coefficient (Jaccard 
1908) or the Dice coefficient (Dice 1945), also called Nei-Li (Nei and Li 
1979), which likewise ignores 0/0 matches but gives double weight to 1/1 
matches. Codominant markers, on 
the other hand, can be analyzed with similarity algorithms that provide 
equal weight to all pair-wise combinations, including 0/0 matches, such 
as the simple matching coefficient.
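The three coefficients can be written in terms of counts for a pair of OTUs: a = shared presences (1/1), b and c = the two kinds of mismatch (1/0 and 0/1), and d = shared absences (0/0). Then Jaccard = a / (a + b + c), Dice (Nei-Li) = 2a / (2a + b + c) and simple matching = (a + d) / (a + b + c + d). The sketch below uses hypothetical band profiles to show how the treatment of 0/0 matches changes the result; the function names are illustrative.

```python
def match_counts(x: list[int], y: list[int]) -> tuple[int, int, int, int]:
    """Count (a, b, c, d) = (1/1, 1/0, 0/1, 0/0) comparisons between two 0/1 profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must score the same markers")
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    return a, b, c, d


def jaccard(x: list[int], y: list[int]) -> float:
    a, b, c, _ = match_counts(x, y)
    return a / (a + b + c)  # ignores 0/0; undefined if neither OTU has any band


def dice(x: list[int], y: list[int]) -> float:
    a, b, c, _ = match_counts(x, y)
    return 2 * a / (2 * a + b + c)  # ignores 0/0 and counts 1/1 twice


def simple_matching(x: list[int], y: list[int]) -> float:
    a, b, c, d = match_counts(x, y)
    return (a + d) / (a + b + c + d)  # every kind of match counts equally


# Hypothetical band profiles for two OTUs (1 = band present, 0 = absent)
otu1 = [1, 1, 0, 0, 0, 0, 1, 0]
otu2 = [1, 0, 1, 0, 0, 0, 1, 0]
print(f"Jaccard {jaccard(otu1, otu2):.3f}, Dice {dice(otu1, otu2):.3f}, "
      f"simple matching {simple_matching(otu1, otu2):.3f}")
```

Here a = 2, b = c = 1 and d = 4, giving Jaccard = 2/4 = 0.500, Dice = 4/6 ≈ 0.667 and simple matching = 6/8 = 0.750; only the last is raised by the four shared absences.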
 Phenograms are then generated from these proximity matrices. As 
with proximity algorithms, choices must be made from a wide range of 
phenogram methods to use, such as single linkage, complete linkage, 
unweighted pair group method with arithmetic mean (UPGMA), and 
others. More than one tree can be generated from the same proximity 
matrix. If more than one tree is produced, “consensus” programs 
summarize the common branching points of these alternative trees. 
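UPGMA, one of the tree building methods listed above, repeatedly joins the two closest clusters and then sets the distance from the new cluster to every other cluster to the size-weighted average of its members' distances. A minimal sketch in plain Python (no clustering library assumed), using a hypothetical distance matrix; it records the distance at which each pair of clusters is joined, the level plotted on a phenogram axis. Ties are broken by the first pair found, which is one reason more than one tree can arise from the same matrix.

```python
def upgma(labels: list[str], dist: list[list[float]]):
    """Cluster OTUs with UPGMA.

    Returns (tree, merges): tree is nested tuples of labels; merges lists
    (cluster_a, cluster_b, joining_distance) in the order clusters were joined.
    """
    clusters = {i: (labels[i], 1) for i in range(len(labels))}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(len(labels)) for j in range(i + 1, len(labels))}
    merges = []
    next_id = len(labels)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda item: item[1])  # closest pair
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        # Distance from the new cluster to each other cluster: size-weighted mean
        new_d = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            new_d[k] = (ni * dik + nj * djk) / (ni + nj)
        # Drop all distances involving the two merged clusters
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        for k, v in new_d.items():
            d[(k, next_id)] = v  # keys stay (smaller id, larger id)
        clusters[next_id] = ((ti, tj), ni + nj)
        merges.append((ti, tj, dij))
        next_id += 1
    tree = next(iter(clusters.values()))[0]
    return tree, merges


# Hypothetical distance matrix for four OTUs
labels = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 8],
    [10, 10, 8, 0],
]
tree, merges = upgma(labels, dist)
print(tree)  # ('D', ('C', ('A', 'B')))
for merge in merges:
    print(merge)
```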
 Another type of tree building method is the neighbour-joining 
method. It was developed by Saitou and Nei (1987) as a method for 
estimating phylogenetic trees. Although the method, like parsimony (a 
cladistic technique described later), favours trees with a small total 
length, the neighbour-joining method does not attempt to obtain the 
shortest possible tree for a set of data. Rather, it attempts to find a tree 
that is close to the shortest phylogenetic tree (Rohlf 1992). The method 
does not require outgroups, although they can be used to root the 
resulting tree. It can be viewed as a type of analysis intermediate between 
phenetics and cladistics. 
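The neighbour-joining steps can be sketched as follows, using the pair-selection criterion Q(a, b) = (n - 2) d(a, b) - r(a) - r(b), where n is the number of nodes still to be joined and r is the sum of a node's distances to all other such nodes, in the form commonly given in textbooks. This is an illustrative implementation, not the code of Saitou and Nei (1987) or of any particular program; the distance matrix is a small hypothetical additive example, and internal nodes receive hypothetical names node1, node2 and so on. The result is unrooted; an outgroup, if used, would be chosen afterwards to root it.

```python
def neighbour_joining(labels: list[str], dist: list[list[float]]):
    """Neighbour-joining on a symmetric distance matrix.

    Returns (joins, final_edge): each join is (taxon_a, taxon_b, branch_a, branch_b,
    new_node), and final_edge links the last two remaining nodes.
    """
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(labels)} for i, a in enumerate(labels)}
    nodes = list(labels)
    joins = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b)
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from each joined node to the new internal node
        branch_a = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        branch_b = d[a][b] - branch_a
        counter += 1
        u = f"node{counter}"
        d[u] = {}
        for k in nodes:
            if k not in (a, b):
                # Distance from the new internal node to each remaining node
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]
        joins.append((a, b, branch_a, branch_b, u))
    final_edge = (nodes[0], nodes[1], d[nodes[0]][nodes[1]])
    return joins, final_edge


# Hypothetical additive distances among five OTUs
labels = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
joins, final_edge = neighbour_joining(labels, dist)
for join in joins:
    print(join)
print(final_edge)
```

Because these distances fit a tree exactly, the recovered branch lengths add back up to the input distances (for example, a to d: 2 + 3 + 2 + 2 = 9).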
 Different combinations of proximity coefficients and tree building 
methods can therefore produce different results. Despite the goal of 
objectivity of phenetic approaches, there are many choices (far more 
than presented here) of proximity algorithms and tree building methods, 
and they can give different results. In addition to the branching trees 
(dendrograms) discussed above, there are various ordination analyses, 
such as principal components analysis (PCA) and canonical discriminant 
analysis (CDA), that plot OTUs in two- or three-dimensional space (points 
placed in a square or a cube, with closely spaced points indicating 
greater similarity among OTUs than widely spaced points).
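Principal components analysis finds the directions of greatest variance in the OTU-by-character matrix and plots each OTU by its scores along them. The sketch below computes the sample covariance matrix and extracts the leading components by power iteration with deflation, so that it needs only the Python standard library; in practice a numerical or statistical package would normally be used. The function name and data are hypothetical, and the sign of each component is arbitrary, so only the relative positions of OTUs are meaningful.

```python
import math
import random


def pca_scores(data: list[list[float]], n_components: int = 2,
               iterations: int = 500) -> list[list[float]]:
    """Scores of each OTU (row) on the leading principal components of the characters (columns)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)] for a in range(p)]

    rng = random.Random(0)  # fixed seed so results are reproducible
    components = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(p)]  # strictly positive starting vector
        for _ in range(iterations):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                v = [0.0] * p  # no variance left: this component's scores will be 0
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
        components.append(v)
        # Deflation: remove this component so the next pass finds the next one
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(p)] for a in range(p)]

    return [[sum(r[j] * comp[j] for j in range(p)) for comp in components] for r in centred]


# Hypothetical measurements of two characters on four OTUs
data = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
scores = pca_scores(data)
for otu, (pc1, pc2) in zip(["OTU1", "OTU2", "OTU3", "OTU4"], scores):
    print(f"{otu}: PC1 {pc1:.3f}, PC2 {pc2:.3f}")
```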
 Because the above phenetic approaches use very different algorithms 
and operate under different assumptions about data sets, multiple 
phenetic analyses using different proximity algorithms and tree building 
methods are often used to explore similarities. Since different procedures 
give different results, conclusions must be drawn either by choosing 
a single result or by discussing what the different results have in common, 
and both choices involve a certain degree of subjectivity. One objective 
method sometimes applied to choose the “best” tree is to measure the 
distortion of each tree relative to the proximity matrix with the 
cophenetic correlation coefficient (a procedure available in NTSYS-pc). 
These values vary from 1 (no distortion) to 0 (total distortion), and the 
tree with the least distortion is presented.
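The cophenetic correlation coefficient is the Pearson correlation, taken over all pairs of OTUs, between the original proximity values and the cophenetic values, that is, the level at which each pair of OTUs is joined on the tree. A minimal sketch, assuming both matrices are already available as symmetric lists of lists; the function name and the matrices are hypothetical (the cophenetic values are those a UPGMA tree would give for this distance matrix).

```python
import math
from itertools import combinations


def cophenetic_correlation(original: list[list[float]], cophenetic: list[list[float]]) -> float:
    """Pearson correlation between the upper triangles of two symmetric matrices."""
    pairs = list(combinations(range(len(original)), 2))
    x = [original[i][j] for i, j in pairs]
    y = [cophenetic[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical distances among four OTUs and the cophenetic values of a tree built from them
original = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 8],
    [10, 10, 8, 0],
]
d_join = 28 / 3  # level at which the fourth OTU joins the others
cophenetic = [
    [0, 2, 6, d_join],
    [2, 0, 6, d_join],
    [6, 6, 0, d_join],
    [d_join, d_join, d_join, 0],
]
print(f"cophenetic correlation: {cophenetic_correlation(original, cophenetic):.3f}")
```

For these values the coefficient is about 0.97, indicating little distortion.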
 After these various analyses are presented, they must be interpreted 
in light of the question being addressed. Here more subjectivity is 
encountered, because there is no universal or statistical method to 
interpret these results. Some investigators interpret phenograms by 
drawing a vertical line (phenon line) across branches to make 
taxonomic decisions by group membership relative to this line (e.g. 
Figure 4), but it is the investigator’s decision where to draw this line, 
ideally based on inferences from other data. For example, in Figure 4, if 
the phenon line is drawn as shown, the conclusion may be that there are 
four taxa (as species or subspecies): 1) OTU 1 + OTU 5, 2) OTU 3, 3) OTU 
2 and 4) OTU 4. A different line would reach a different conclusion. 
Figure 4. A phenogram of five OTUs (branching order OTU 1, OTU 5, OTU 3, OTU 2 and OTU 4; distance coefficient axis from 5.00 to 2.00) with a vertical phenon line drawn at a distance coefficient of about 2.5.
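Drawing a phenon line amounts to cutting the tree at a chosen level: OTUs joined below that level fall into the same group. On an ultrametric phenogram (such as one produced by UPGMA) this is equivalent to grouping OTUs whose cophenetic distance lies below the cut. The sketch below assumes such a cophenetic matrix is available; the matrix and function name are hypothetical, and the values are not read from any figure. Applied to a matrix that is not ultrametric, the same code gives single-linkage groups.

```python
def cut_phenogram(labels: list[str], cophenetic: list[list[float]],
                  threshold: float) -> list[list[str]]:
    """Group OTUs joined below `threshold`, using union-find over all such pairs."""
    parent = list(range(len(labels)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if cophenetic[i][j] < threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i, name in enumerate(labels):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


# Hypothetical cophenetic distances among five OTUs
labels = ["OTU1", "OTU2", "OTU3", "OTU4", "OTU5"]
cophenetic = [
    [0.0, 3.8, 3.0, 4.6, 2.2],
    [3.8, 0.0, 3.8, 4.6, 3.8],
    [3.0, 3.8, 0.0, 4.6, 3.0],
    [4.6, 4.6, 4.6, 0.0, 4.6],
    [2.2, 3.8, 3.0, 4.6, 0.0],
]
print(cut_phenogram(labels, cophenetic, 2.5))  # four groups: OTU1 + OTU5, OTU2, OTU3, OTU4
print(cut_phenogram(labels, cophenetic, 3.5))  # three groups
```

Moving the line changes the conclusion, exactly as described in the text.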
Cladistics
Cladistics, like phenetics, begins with an explicit analysis of data, and 
scores these data as an organism-by-character data matrix. However, 
the assumptions and methods diverge at this point. While pheneticists 
state that phylogeny is so difficult to reconstruct that overall similarity 
is the only objective criterion for grouping organisms, cladists state 
that phylogenetic relationship is the only valid criterion for classifying 
organisms, no matter how hard it may be to infer.
 Cladistics has its own terminology, which conveys the basic assumptions 
and methods that distinguish it from phenetics. Some basic cladistic 
terms are briefly explained here; the reader is directed to Wiley 
et al. (1991) for more detailed explanations. A monophyletic group 
encompasses an ancestor and all of its descendants. Put another way, it 
refers to all of the taxa that trace back to a common branching point in 
a phylogenetic tree or cladogram. Monophyletic groups are therefore 
identified by constructing cladograms and examining their branching patterns.
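Because a monophyletic group is everything that traces back to one branching point, checking monophyly on a rooted cladogram reduces to asking whether the group's members are exactly the taxa below some node. A minimal sketch, representing a rooted tree as nested Python tuples (a convention chosen for illustration); the function names and the tree are hypothetical.

```python
def leaves(tree) -> frozenset:
    """All terminal taxa below a node; a tree is a taxon name or a tuple of subtrees."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree) -> set:
    """Every group of taxa that traces back to one branching point of a rooted tree."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    result = {leaves(tree)}
    for child in tree:
        result |= clades(child)
    return result


def is_monophyletic(tree, group) -> bool:
    """True if `group` is exactly the set of taxa below some node."""
    return frozenset(group) in clades(tree)


# Hypothetical rooted tree: E is the outgroup and D the sister group of (A, (B, C))
tree = ("E", ("D", ("A", ("B", "C"))))
print(is_monophyletic(tree, {"B", "C"}))       # True
print(is_monophyletic(tree, {"A", "B", "C"}))  # True
print(is_monophyletic(tree, {"A", "B"}))       # False: their common ancestor also gave rise to C
```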
 To construct a cladogram, you begin with what you think may be a 
monophyletic group, referred to as the ingroup, such as “species A”, 
“tuber-bearing solanums” or “the sunflower family”. Evolutionary 
relationships within the ingroup are determined by the use of one or more 
outgroups, which are thought to be closely related to the ingroup. As in 
phenetics, you make an organism-by-character data matrix of the putative 
ingroup and outgroup(s). A sister group is the most closely related 
monophyletic outgroup to the ingroup. The proper choice of outgroups 
is critical to the results, and sometimes the sister group relationships are 
unclear. To alleviate this problem, further outgroups can be included in 
a multiple outgroup analysis (Maddison et al. 1984; Figure 5). 
Figure 5. Terms relative to cladograms: a cladogram of taxa E, D, A, B and C with the ingroup, sister group and second outgroup labelled, together with a branch, a node and an internode.
 Any character type can potentially be used for a cladistic 
analysis, including morphological or molecular characters (as DNA 
nucleotides or restriction endonuclease sites). Most analyses score 
these characters qualitatively, as presence or absence (1-0), as DNA 
nucleotides, or as a range of discrete character states (0-1-2-n). Care 
is taken to score only homologous characters arising from common 
ancestry and, as far as homology can be judged, to avoid characters that 
look similar but actually arose in parallel from different ancestors. This 
has important implications for the choice of a molecular marker. As 
mentioned above, homology is a function of taxonomic distance, 
and the more rapidly evolving markers would not be expected to be 
homologous (and are therefore inappropriate) for increasingly distant 
taxa. Orthologous characters are homologues that diverged through a 
speciation event, meaning that they trace their ancestry to a common 
progenitor, and they are taken as the only useful type of homologous character. 
Molecular taxonomists are searching for single-copy nuclear genes 
for phylogeny construction while doing everything possible to 
avoid paralogous characters that have arisen from gene duplication. 
Such duplicated genes can evolve separately in the same lineage, 
may falsely appear to be homologous, and can provide misleading 
phylogenetic information if each copy has independently diverged 
from the other.
 In addition to parsimony, other techniques are used to produce 
cladograms, depending on the data type. For example, while 
parsimony is frequently used for DNA sequence data, other 
techniques, such as maximum likelihood (Felsenstein 1981) and 
Bayesian analysis (Mau et al. 1999), search for trees that may be longer 
but that explain character changes under explicit models of evolution. 
All of these methods “root” trees using the character states of the 
outgroup(s), and monophyletic groups are identified from the 
branching pattern of the ingroup. Cladograms may superficially 
look like phenetic trees (also called dendrograms), but branches of 
a parsimony-based cladogram are supported by specific characters 
and phenograms by an average of all characters. As a result, 
parsimony cladograms have characters supporting each branch 
shown on the tree, while phenograms only have average similarity 
values placed under the entire tree.
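The statement that branches of a parsimony cladogram are supported by specific characters rests on counting character-state changes on a tree. For a fixed rooted binary tree, Fitch's algorithm (a standard method, not described in this bulletin) gives the minimum number of changes a character requires, and the tree with the smallest total over all characters is the most parsimonious. The sketch uses nested tuples for trees; the function names, trees and nucleotide characters are hypothetical.

```python
def fitch(tree, states: dict[str, str]) -> tuple[set[str], int]:
    """Return (candidate states at this node, minimum changes below it) for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    set_l, cost_l = fitch(left, states)
    set_r, cost_r = fitch(right, states)
    common = set_l & set_r
    if common:
        return common, cost_l + cost_r           # no change needed at this node
    return set_l | set_r, cost_l + cost_r + 1    # one change needed


def tree_length(tree, matrix: dict[str, str]) -> int:
    """Total parsimony length of aligned sequences (taxon -> sequence) on a binary tree."""
    n_sites = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {taxon: seq[site] for taxon, seq in matrix.items()})[1]
        for site in range(n_sites)
    )


# Hypothetical aligned nucleotide characters for four taxa
matrix = {"A": "ACGT", "B": "ACGA", "C": "ATGA", "D": "TTCA"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(tree_length(tree_1, matrix))  # 4
print(tree_length(tree_2, matrix))  # 5
```

Under parsimony, the first tree would be preferred because it needs fewer character changes.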
 Pheneticists infer only overall similarity of organisms from 
their phenograms, not phylogeny, while most cladists interpret 
cladograms phylogenetically. That is, cladists try to recognize 
only monophyletic groups and think that phenetic groups (groups 
resulting from a phenetic analysis) have no reality and should not 
be used in classification. Cladists recognize different classes of 
non-monophyletic groups, as diagrammed in Figure 6. These 
non-monophyletic terms are used constantly in cladistic discussions, so 
it is important to understand them. They include 1) paraphyletic 
groups (groups containing some, but not all, descendants of the most 
recent common ancestor), and 2) polyphyletic groups (the common 
ancestor is placed in another taxon). 
 There is a wide diversity of opinion on the application and 
interpretation of these concepts. For example, as described in 
“Congruence of molecular marker data” below, phylogenetic results 
of the same organisms obtained from different data sources are 
frequently in conflict (Wendel and Doyle 1998). Some researchers 
advocate analyzing different data sources separately to discover 
datasets providing misleading results, while others advocate 
combining all data into a single matrix for a total evidence analysis 
(e.g. Eernisse and Kluge 1993). Cladistic results can be affected 
by poor choice of outgroups, by analysis of unrecognized non-
orthologous characters, by different choice of cladistic algorithms to 
construct trees, by insufficient ingroup or outgroup sampling, and 
by different methods of handling missing data. There is also debate 
among cladists about whether cladograms truly reflect recently evolved 
groups sharing common ancestry (process cladists), or whether they 
need to be theory-neutral and show only patterns decoupled from 
assumptions of ancestry (pattern cladists or transformed cladists) 
(Ereshefsky 2001). Perhaps the greatest source of debate is the use 
of cladistics at the species level. This is because cladistic procedures 
assume divergent taxa, yet individuals within species generally 
interbreed, leading some to consider cladistics an inappropriate 
method to define species (Templeton 1989).
Figure 6. Cladistic relationships relative to cladograms: monophyletic, paraphyletic and polyphyletic groups.
 So is it best to use a phenetic or cladistic analysis of your 
molecular data? A case can be made that molecular marker datasets at low 
taxonomic levels (within species or among closely related species), or 
datasets of dominant characters, should be analyzed with phenetic as well 
as with cladistic methods (Koopman et al. 2001). Questions above the 
species level that use characters with a greater chance of being 
homologous (such as DNA sequence data) should be addressed with cladistic 
approaches, but such cladistic analyses should always include outgroups 
to properly root the tree. If outgroups are unknown or problematic, 
multiple possible outgroups should be chosen, as mentioned above.
Congruence of molecular marker data
Congruence of cladistic data
Systematics has progressed through a variety of conceptual and 
procedural developments including experimental crossing data, 
secondary chemical and isozyme data, and molecular data. Each 
development was greeted with excitement about the primacy of the 
new technique, and the advantages of molecular data led to similar 
excitement. However, separate molecular markers applied to the same 
germplasm sources frequently show at least somewhat different results, 
calling into question the utility of any individual marker. An 
excellent review of phylogenetic discordance by Wendel and Doyle 
(1998) summarized its causes and biological interpretation. Although 
discordant results can cause problems in the strict translation of 
molecular results into hypotheses of evolution and into taxonomic 
treatments, they can also provide insights into the problems of 
individual markers and into unexpected evolutionary events (Wendel 
and Doyle 1998). 
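Discordance between trees obtained from different markers can be quantified. One widely used family of measures, not prescribed in this bulletin, is based on the Robinson-Foulds distance; the sketch below uses a simplified rooted variant that counts the clades (groups of two or more taxa below a node, excluding the clade of all taxa) found in one tree but not the other. Trees are nested tuples over the same taxa; the function names and both trees are hypothetical.

```python
def clusters(tree) -> tuple[frozenset, set]:
    """Return (taxa below this node, set of clades of two or more taxa within it)."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    taxa: frozenset = frozenset()
    found: set = set()
    for child in tree:
        child_taxa, child_clades = clusters(child)
        taxa |= child_taxa
        found |= child_clades
    if len(taxa) > 1:
        found.add(taxa)
    return taxa, found


def clade_distance(tree_a, tree_b) -> int:
    """Number of clades present in exactly one of two rooted trees on the same taxa."""
    taxa_a, clades_a = clusters(tree_a)
    taxa_b, clades_b = clusters(tree_b)
    if taxa_a != taxa_b:
        raise ValueError("trees must contain the same taxa")
    # The clade of all taxa is shared trivially, so it is excluded
    return len((clades_a ^ clades_b) - {taxa_a})


# Hypothetical trees for the same five accessions from two different markers
chloroplast_tree = ((("A", "B"), "C"), ("D", "E"))
nuclear_tree = ((("A", "C"), "B"), ("D", "E"))
print(clade_distance(chloroplast_tree, nuclear_tree))  # 2: (A, B) versus (A, C)
```

A distance of 0 means the two markers support exactly the same clades.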
 Wendel and Doyle (1998) classified the causes of discordance into three 
classes: 1) technical causes such as poor gene choice, sequencing error, 
or insufficient taxonomic sampling; 2) organism-level processes such 
as convergent or rapid morphological evolution, rapid diversification 
of organisms, hybridization and introgression, lineage sorting, and 
horizontal gene transfer; and 3) gene- and genome-level processes 
such as intragenic recombination, use of paralogous genes (genes 
arising from a single-copy gene that has duplicated, whose copies can 
diverge separately in the same organism), inter-locus interactions and 
concerted evolution, and non-independence of sites examined in the analyses. 
Examples of each possible cause of incongruence are given in their 
paper. We recommend it as standard reading for anyone wishing 
to choose markers for phylogenetic analysis, to appreciate the need 
to choose multiple genes wisely, and to understand possible 
explanations of why different genes give different phylogenetic 
results. We discuss some of these reasons below.
 Every study is limited by time and funds, and decisions need to 
be made regarding the number of taxa and characters needed for 
well-supported phylogenetic results. The importance of outgroups in 
cladistic studies is discussed in “Genebank management”. Sufficient 
types and numbers of taxa, and sufficient numbers of characters, are 
also important. Often, inclusion of taxa is determined by simple availability. 
For example, studies of small taxonomic groups or of rare taxa must 
choose accessions from the limited material available. Lack of readily 
available germplasm, however, is not an excuse for poor experimental 
design; we have reviewed many unpublishable papers that analyzed 
only the available accessions in their genebank without realizing that 
these were insufficient to address significant questions. 
 Hillis (1998) outlined five possible strategies for adding taxa to a 
cladistic analysis of different species: 1) add taxa randomly from all 
living organisms; 2) choose taxa randomly within the monophyletic 
group being investigated; 3) select taxa within the monophyletic 
group that represent the overall diversity of the group; 4) select 
taxa within the monophyletic group that are expected to alleviate 
problems of “long branch attraction” (a phenomenon in which 
strongly unequal rates of evolutionary change in different members 
of a group under study cause cladistic analyses to produce incorrect 
trees; see Felsenstein 1978 for details); 5) add or delete taxa until the 
a priori biases of the investigator are supported. Most taxonomists 
would avoid choice 1 because it adds distant outgroups. For example, 
if you are investigating maize, using pines as an outgroup adds little 
to the analysis: the gene used for ingroup analysis may not resolve 
distant outgroups well, and the outgroups form long branches. The 
second choice fails to best analyze diversity within clades, and the 
fifth choice is unscientific. Most taxonomists choose taxa based on a 
combination of options 3 and 4.
 The largest study to date to analyze discordance of different 
genes in a phylogenetic analysis was Rokas et al. (2003). They 
investigated the incongruence of phylogenetic trees produced from 
an analysis of 106 orthologous genes, from eight species of yeast and 
an outgroup. These genes, distributed on all 16 yeast chromosomes, 
comprised a total length of 127 026 nucleotides, encoding 42 324 
amino acids, and represented roughly 1% of the genomic sequence 
and 2% of the predicted genes in yeast. This dataset is an order of 
magnitude larger than any other used for a study of phylogeny and 
incongruence. It was made possible by the complete genome 
sequences of these eight yeast species and outgroup. The analysis 
consisted of maximum likelihood (ML) of the nucleotide data, 
maximum parsimony (MP) of the nucleotide data, and MP of the 
inferred amino acid sequences, with branch support estimated by 
bootstrap analysis. Comparisons were made of single genes and of all 
genes together (a concatenated dataset). Individual genes produced 
more than 20 alternative ML or MP trees, but all three analytical 
methods produced a single tree with 100% bootstrap support on each 
branch when applied to the concatenated dataset. This level of 
bootstrap support is unprecedented and suggests that large datasets 
can overcome the incongruence of single-gene analyses. The authors 
concluded that: 1) eight genes were required to obtain a mean 
bootstrap value of greater than 70% with a 95% confidence interval; 
2) 20 genes were required for a mean bootstrap value of at least 90%; 
3) only 3000 randomly selected nucleotides from the entire 
concatenated dataset of 127 026 nucleotides were needed to obtain 
the tree with 70% bootstrap support at the 95% confidence level, 
corresponding to an average length of only 2.5 genes. The implication 
that 8–20 separate genes are needed for well-supported phylogenies 
needs to be tested with other organisms at different levels of 
phylogenetic divergence. The study highlights, however, the need to 
use many genes to obtain well-supported phylogenies.
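The value of concatenation can be illustrated with a deliberately tiny example. The following Python sketch is not the ML and MP analysis used by Rokas et al. (2003). It assumes only four taxa (A, B, C, D), scores each alignment column under parsimony, takes the topology supported by the most informative columns, and estimates bootstrap support by resampling columns with replacement. The function names and sequence data are hypothetical.

```python
import random
from collections import Counter

SPLITS = ("AB|CD", "AC|BD", "AD|BC")  # the three unrooted topologies for four taxa


def informative_split(site):
    """Split favoured under parsimony by one column (states for taxa A, B, C, D).

    Only columns with two states, each in exactly two taxa (pattern xxyy), favour
    one topology; all other columns cost the same on every tree and return None.
    """
    if sorted(Counter(site).values()) != [2, 2]:
        return None
    a, b, c, _ = site
    if a == b:
        return "AB|CD"
    if a == c:
        return "AC|BD"
    return "AD|BC"


def mp_topology(columns):
    """Most parsimonious topology, or None if nothing is informative or there is a tie."""
    votes = Counter(s for s in map(informative_split, columns) if s is not None)
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between topologies: unresolved
    return ranked[0][0]


def bootstrap_support(columns, target, replicates=1000, seed=1):
    """Fraction of bootstrap replicates (columns resampled with replacement)
    whose most parsimonious topology is `target`."""
    rng = random.Random(seed)
    hits = sum(
        mp_topology(rng.choices(columns, k=len(columns))) == target
        for _ in range(replicates)
    )
    return hits / replicates


if __name__ == "__main__":
    # Hypothetical columns, one character per taxon in the order A, B, C, D.
    gene1 = ["AACC", "AAGG", "ACAC"]  # two columns favour AB|CD, one favours AC|BD
    gene2 = ["AGAG", "TTCC"]          # one column each: a tie on its own
    concatenated = gene1 + gene2
    print(mp_topology(gene1), mp_topology(gene2), mp_topology(concatenated))
    # prints: AB|CD None AB|CD
    print(bootstrap_support(concatenated, "AB|CD"))
```

In this toy case gene2 alone ends in a tie, whereas the concatenated columns favour AB|CD. Real analyses replace the column-counting step with full tree searches, but the bootstrap resampling logic is the same.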
Congruence of phenetic data
We summarize here some studies examining congruence of phenetic 
results statistically. Powell et al. (1996b) examined ten accessions of 
cultivated soyabean, Glycine max (L.) Merr. and its progenitor species 
G. soja Siebold & Zucc., chosen to represent the maximum diversity 
present in the United States collection. They compared AFLPs, RAPDs, 
RFLPs and SSRs for expected heterozygosity, multiplex ratio, marker 
index and effectiveness in assessing relationships between accessions. 
The heterozygosity measures were calculated from actual allele 
frequencies. The multiplex ratio was calculated from the number 
of bands analyzed simultaneously per experiment, for example the 
number of bands resolved on a particular gel. The marker index 
was the product of heterozygosity and multiplex ratio, and is a 
measure of the overall information content of a marker 
system. They found that the average heterozygosity increased from 
AFLP and RAPD (similar in magnitude) to RFLP to SSR (greatest 
heterozygosity). The effective multiplex ratio increased from RFLP 
to SSR to RAPD to AFLP. The marker index increased from RFLP to 
RAPD to SSR to AFLP (highest). Despite the low heterozygosity of 
AFLPs, their high marker index results from their very high multiplex 
ratio (many more bands are detected per experiment). The 
hypervariable SSRs were the most effective at detecting genetic 
differences among individual accessions (average genetic similarity between 
genotypes 0.341), while AFLPs, RFLPs and RAPDs were less able 
to distinguish genotypes (0.64–0.66). All marker types were very 
effective at the interspecific level and distinguished G. max from 
G. soja, but differed markedly in congruence within species. At the 
intraspecific level (when only G. max was considered), only AFLP 
and RAPD similarities were significantly correlated.
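To make the relationship between heterozygosity, multiplex ratio and marker index concrete, here is a short Python sketch. It assumes Nei's expected heterozygosity, H = 1 - sum of p_i squared, averaged over polymorphic loci, and takes the effective multiplex ratio as the number of loci per assay times the fraction that are polymorphic. Both the function names and the numbers are hypothetical and were chosen only to reproduce the qualitative pattern described above: low heterozygosity per AFLP band, yet a high marker index.

```python
def expected_heterozygosity(freqs):
    """Nei's gene diversity at one locus: H = 1 - sum(p_i ** 2).

    `freqs` are allele frequencies; for a dominant marker (band present or
    absent) pass [p, 1 - p], which gives H = 2p(1 - p).
    """
    total = sum(freqs)  # normalise in case frequencies do not sum exactly to 1
    return 1.0 - sum((p / total) ** 2 for p in freqs)


def marker_index(polymorphic_loci_freqs, loci_per_assay, fraction_polymorphic):
    """Marker index = average heterozygosity x effective multiplex ratio.

    Assumption: effective multiplex ratio = loci_per_assay * fraction_polymorphic
    (polymorphic bands scored per gel or primer combination), and heterozygosity
    is averaged over polymorphic loci only.
    """
    h = [expected_heterozygosity(f) for f in polymorphic_loci_freqs]
    h_avg = sum(h) / len(h)
    emr = loci_per_assay * fraction_polymorphic
    return h_avg * emr


if __name__ == "__main__":
    # Hypothetical SSR: one locus per assay, four equally frequent alleles.
    ssr = marker_index([[0.25, 0.25, 0.25, 0.25]], loci_per_assay=1, fraction_polymorphic=1.0)
    # Hypothetical AFLP: 50 bands per primer pair, 40% polymorphic, band frequency 0.9.
    aflp = marker_index([[0.9, 0.1]] * 20, loci_per_assay=50, fraction_polymorphic=0.4)
    print(f"SSR MI = {ssr:.2f}, AFLP MI = {aflp:.2f}")  # SSR MI = 0.75, AFLP MI = 3.60
```

With these invented values the SSR has the higher heterozygosity (0.75 against 0.18 per locus), yet the AFLP has the higher marker index. This mirrors the pattern reported by Powell et al. (1996b).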
 Milbourne et al. (1997) examined 16 cultivars of potato. The 
marker index in this study increased from SSR to RAPD to AFLP. 
Proximity matrices showed low correlations between the marker 
types, the highest being between AFLP and RAPD. Russell et 
al. (1997a) examined RFLPs, AFLPs, RAPDs and SSRs using 18 
cultivated barley accessions, chosen to represent the majority of 
the ancestors of European cultivated barley. SSRs appeared to be 
the most polymorphic, while AFLPs displayed the highest marker 
index (0.937). The lowest marker index was observed for RFLPs 
(0.322). Spearman rank correlations of genetic similarity values 
showed that AFLPs and RFLPs ranked over 70% of the pairwise 
comparisons in the same order. SSRs showed the lowest correlation with the 
other marker systems. Spooner et al. (1996) examined congruence as 
a function of the number of markers per marker type, and showed 
that at the interspecific level there was a gradation of resolution from 
isozymes to RAPD to RFLP (highest), while at the intraspecific level 
it was RFLP to RAPD (there were insufficient isozyme markers for 
intraspecific questions). Lu et al. (1996) compared RFLPs, RAPDs, 
AFLPs, SSRs and ISSRs for informativeness (level of polymorphism 
detected and ability to discriminate between germplasm lines) 
and genetic diversity assessment among 10 pea genotypes. The 
PCR-based techniques were found to be more informative than 
RFLPs. A Mantel test revealed significant correlations among trees 
derived from the different marker systems, except for ISSRs. Pejic 
et al. (1998) evaluated informativeness and applicability for genetic 
diversity studies of RFLPs, RAPDs, SSRs and AFLPs using 33 maize 
inbred lines. SSRs displayed the highest expected heterozygosity 
and average number of alleles, while AFLPs showed the lowest 
polymorphism. However, the marker index of AFLPs was found to 
be more than 10-fold higher than that of the other marker systems. 
Genetic similarity trees for the different marker types appeared 
highly correlated, with the exception of the RAPD-based tree.
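The comparisons above depend on pairwise genetic similarity matrices and on tests of their correlation. As a minimal Python sketch, assuming dominant (band presence/absence) data, the code below computes Dice similarities and runs a simple Mantel permutation test on the upper triangles of two matrices. Dice is only one of several similarity coefficients in common use. Spearman rank correlation, as used by Russell et al. (1997a), could be obtained by converting the values to ranks before computing r. All names and data are hypothetical.

```python
import random


def dice_similarity(x, y):
    """Dice (Nei and Li) similarity of two 0/1 band profiles: 2a / (2a + b + c)."""
    shared = sum(1 for p, q in zip(x, y) if p and q)
    total = sum(x) + sum(y)
    return 2 * shared / total if total else 1.0


def similarity_matrix(profiles):
    """Square matrix of pairwise Dice similarities between accessions."""
    return [[dice_similarity(a, b) for b in profiles] for a in profiles]


def _upper(matrix, order):
    """Upper-triangle values after reordering rows and columns by `order`."""
    n = len(matrix)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("a matrix has constant off-diagonal values")
    return cov / (va * vb) ** 0.5


def mantel_test(m1, m2, permutations=999, seed=1):
    """Pearson r between two matrices and a one-sided permutation p-value.

    Rows and columns of m2 are permuted together, so each accession keeps its
    own set of pairwise values while its link to m1 is broken.
    """
    n = len(m1)
    identity = list(range(n))
    a = _upper(m1, identity)
    r_obs = _pearson(a, _upper(m2, identity))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if _pearson(a, _upper(m2, order)) >= r_obs - 1e-12:
            extreme += 1
    return r_obs, (extreme + 1) / (permutations + 1)


if __name__ == "__main__":
    # Hypothetical band profiles (1 = band present) for five accessions.
    aflp = [[1, 1, 0, 0, 1, 1], [1, 1, 0, 1, 1, 1], [0, 0, 1, 1, 0, 0],
            [0, 1, 1, 1, 0, 0], [1, 0, 0, 0, 1, 1]]
    rapd = [[1, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 0], [0, 1, 0, 1], [1, 0, 1, 1]]
    r, p = mantel_test(similarity_matrix(aflp), similarity_matrix(rapd))
    print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

With as few accessions as in this toy example, the number of distinct permutations is small and p-values are coarse. Real comparisons such as those cited above used far larger sets of accessions and markers.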
 Other comparative studies of closely related taxa have been 
conducted by Thorman et al. (1994), Dos Santos et al. (1994) and 
Hallden et al. (1994) in mustards (RFLP and RAPD); Sharma et 
al. (1996) in lentil (AFLP and RAPD); Lin et al. (1996) in soyabean 
(RFLP, RAPD and AFLP); Yang et al. (1996) in Chinese sorghum 
(RFLP, RAPD and ISSR); Olufowote et al. (1997) in rice (RFLP and 
SSR); Nagaoka and Ogihara (1997) in wheat (RFLP, RAPD and 
ISSR); Parsons et al. (1997) in rice (isozymes, RAPD and ISSR); Fang 
et al. (1997) in trifoliate orange (isozymes, RFLP and ISSR); Virk 
et al. (2000b) in rice (isozymes, RAPD, ISSR and AFLP); Anthony 
et al. (2002) in coffee (SSR and AFLP); Palombi and Damiano 
(2002) in kiwifruit (RAPD and SSR); and Lopez-Sese et al. (2002) 
in Spanish melon (RAPD and SSR). In general, the highest level 
of polymorphism per marker was found for SSRs, while AFLPs 
displayed the highest marker index. Congruence between marker 
systems was better among more distantly related accessions than 
among closely related ones. 
 An additional comparison of congruence is based on pedigree 
data. There are potential problems in these comparisons because 
of incomplete or incorrect pedigree records, assumptions of equal 
parental contribution to the genetic makeup of the cultivar, and 
different methods to generate pedigree estimators and compare 
them to genetic identity statistics. Schut et al. (1997) used 31 barley 
lines to investigate the congruence between similarity values for 
681 AFLP markers, pedigree-based coefficients of co-ancestry, 
and morphological distance based on 25 characters. Within a core 
collection of 25 European two-row spring barleys, AFLP and 
pedigree data showed poor to moderate correlation, while the 
morphological data were not significantly correlated with either 
the AFLP or pedigree data. However, inclusion of more distantly 
related barleys (winter types, North American origin, six-row 
barleys) improved the correlation between the AFLP and pedigree 
data considerably. In general, congruence between molecular and 
pedigree data varies greatly among studies. For example, significant 
but low correlations between pedigree and genetic similarity were 
shown by Graner et al. (1994) in barley (RFLP), O’Donoughue et 
al. (1994) in oat (RFLP), Hallden et al. (1994) in mustards (RFLP 
and RAPD) and Ahnert et al. (1996) in sorghum (RFLP). “Reasonable 
to good” correlations between pedigree and genetic similarity were 
shown by Mumm and Dudley (1994) in maize (RFLP); Hill et 
al. (1996) in lettuce (AFLP); Doldi et al. (1997) in soyabean (RAPD 
and SSR); Huff (1997) in ryegrass (RAPD); Paz and Veilleux (1997) 
in potato (RAPD); and Prabhu et al. (1997) in soyabean (DAF and 
RFLP).
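Pedigree-based coefficients can be computed recursively. The Python sketch below computes a coefficient of parentage under the simplifying assumptions mentioned above: complete records, equal contribution of each parent, unrelated founders and fully inbred lines (self-coancestry of 1). It is an illustration only, not the estimator used in any of the cited studies, and the pedigree is hypothetical.

```python
from functools import lru_cache

# Hypothetical pedigree: line -> (parent1, parent2); founders (A, B, X) have no entry.
PEDIGREE = {
    "C": ("A", "B"),
    "D": ("A", "B"),
    "E": ("C", "X"),
}


def make_cop(pedigree):
    """Return a coefficient-of-parentage function for a pedigree.

    Assumptions: all lines are fully inbred (self-coancestry = 1), each parent
    contributes equally (0.5), and founders are unrelated to each other.
    """
    @lru_cache(maxsize=None)
    def depth(x):
        # Founders have depth 0; any line is deeper than all of its ancestors.
        if x not in pedigree:
            return 0
        p1, p2 = pedigree[x]
        return 1 + max(depth(p1), depth(p2))

    @lru_cache(maxsize=None)
    def cop(x, y):
        if x == y:
            return 1.0
        # Recurse through the deeper line, which cannot be an ancestor of the other.
        if depth(x) < depth(y):
            x, y = y, x
        if x not in pedigree:  # both lines are (unrelated) founders
            return 0.0
        p1, p2 = pedigree[x]
        return 0.5 * (cop(p1, y) + cop(p2, y))

    return cop


if __name__ == "__main__":
    cop = make_cop(PEDIGREE)
    print(cop("C", "D"), cop("E", "A"))  # 0.5 0.25
```

Full sibs C and D, and each parent with its offspring, score 0.5; the grandchild E scores 0.25 with its grandparent A. Errors or gaps in the pedigree records propagate directly into these values, one of the problems noted above.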
 The choice of markers for phylogenetic or diversity studies will 
depend on many factors, including the anticipated degree 
of relatedness between accessions. Because of their extensive 
polymorphism and the PCR problems frequently observed at higher 
taxonomic levels, SSRs are most useful for investigating very closely 
related taxa. Their high marker index and generally concordant results 
make AFLPs very useful for studies of closely to moderately divergent 
taxa (within genera). The high conservation of RFLP markers makes them 
more useful for comparative genome studies. At low taxonomic levels 
(closely related accessions), perfect congruence among markers is 
likely to remain an elusive goal.
 One long-held assumption in the choice of molecular markers 
was that the most reliable ones for diversity assessment would be 
those that, through prior mapping studies, were shown to be evenly 
spread throughout the genome (e.g. Bonierbale et al. 1995; Karp et 
al. 1997a). However, Virk et al. (2000a) found no advantage of using 
mapped AFLP markers compared to unmapped markers in assessing 
genetic relationships among rice accessions. They even concluded 
that the use of mapped markers may lead to misleading patterns 
of diversity because results are biased towards differences between 
the parents used to develop the mapping population.
Predictive value of taxonomy
A major justification for investment in taxonomic research is that it 
produces something useful. One claimed use is its “predictivity”. 
Taxonomy has many components, but two of the most important are 
to determine what constitutes a species, and to determine the 
interrelationships among species. Interrelated species are grouped 
into higher-level taxonomic ranks such as genera. The predictive goal is to use 
these species, genera, and other taxonomic ranks to make inferences 
about the entire group when you have data for only some of its 
members. For example, we may know from actual cases of poisoning 
that “species A contains a toxin deadly to humans”, or that “species B 
is a useful wild species for use in a potato breeding program for 
late blight resistance”. We may then infer that the species most 
closely related to a species with a known trait are more likely to share that 
trait than are unrelated species. These predictive statements help us 
choose or avoid species to fit our needs. 
 The idea that taxonomy serves to make such predictive statements 
has long been accepted (Michener 1963; Rollins 1965; Warburton 
1967; Sokal 1985; Stuessy 1990; Miller and Rossman 1995; Daly et al. 
2001). Clearly, better taxonomic classifications would be expected 
to make better predictions. Modern taxonomy has undergone a 
renaissance during the last 20 years due to new molecular data 
and improved taxonomic theory and methodology that allow the 
recognition of monophyletic (“natural”) groups. These developments 
have upgraded the position of taxonomy in the biological sciences 
and clearly have improved the link to prediction, addressing a range 
of questions regarding biosynthetic and developmental pathways, 
sources of natural products, origins and migrations of evolutionary 
lineages, and conservation. 
 There are dramatic examples of this increased predictive role 
through cladistic studies, as outlined in Daly et al. (2001). For example, 
15 families of angiosperms were known to produce glucosinolates, 
and this trait was thought to have evolved separately many 
times. Cladistic analyses of these families showed them to be part of 
a single clade (monophyletic branch of a phylogenetic tree), called 
the Brassicales, with the exception of the genus Drypetes Vahl, which 
belongs to another clade, the Malpighiales. This cladistic result suggested 
that there were two evolutionary origins of mustard glucosinolates 
(Rodman et al. 1993), and indeed glucosinolates are derived by two 
different biosynthetic pathways in these two orders, showing they 
had different historical origins. Daly et al. (2001) point out two other 
dramatic examples of prediction. The important bioactive compound 
taxol was known only from members of the plant family Taxaceae, 
but searches for other sources in the most closely related family 
Podocarpaceae identified new sources there (Stahlhut et al. 1999). In 
another example, new data on the interrelationships of the angiosperms 
have shown that those families exhibiting nitrogen fixation are members 
of one clade, suggesting a single origin of this syndrome of traits (Soltis 
et al. 1995). Such striking concordances have increased our confidence 
in the power of the cladistic method to address prediction. We know 
of no similar studies demonstrating the predictive role of taxonomy 
at lower taxonomic levels, although it is widely assumed to occur.
 In an applied sense, prediction means that germplasm can be 
chosen or avoided by breeders based on past positive or negative 
evaluations of related species. Germplasm evaluations of resistance or 
agronomic traits are common in the literature. For example, species-
specific statements about the breeding value of wild potato germplasm 
are found in Ross (1986), Hawkes (1990) and Ruiz de Galerreta et al. 
(1998). Clearly, not all accessions of a species will share all traits, but 
in the absence of evaluation data, taxonomy should provide a useful guide 
for making inferences about unevaluated germplasm based on present 
knowledge.
 However, claims of the predictive component of taxonomy have 
always been based on post hoc discoveries of associations that match 
expected prediction models. The dramatic examples of the association 
of glucosinolates, taxol and nitrogen fixation with new knowledge of 
cladistic relationships are intriguing, but they ignore non-associations 
in other groups that need to be considered and tested statistically. 
Without such tests, there is no way to convincingly demonstrate that 
associations of traits with taxonomy are not simply due to chance. A priori 
experiments are needed to test the prediction assumption 
empirically before convincing statements can be made about the overall 
strength of taxonomy as a predictive tool. Most of the above predictions 
are for traits under polygenic control, but prediction also can address 
simply inherited traits.
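One simple a priori test of the kind called for above is a permutation test. As a hedged Python sketch, assume each accession has been screened for a present/absent trait (for example, disease resistance) and assigned to a species. The statistic is our choice for illustration: a weighted between-species variance of trait frequency. Shuffling trait scores among accessions shows how often an association this strong arises by chance. The test ignores relationships among species and is not the method of any study cited here. Names and data are hypothetical.

```python
import random
from collections import defaultdict


def between_group_statistic(groups, traits):
    """Weighted between-group variance of a 0/1 trait:
    sum over groups of n_g * (p_g - p) ** 2, with p the overall trait frequency."""
    p = sum(traits) / len(traits)
    counts = defaultdict(lambda: [0, 0])  # group -> [accessions, accessions with trait]
    for g, t in zip(groups, traits):
        counts[g][0] += 1
        counts[g][1] += t
    return sum(n * (k / n - p) ** 2 for n, k in counts.values())


def permutation_test(groups, traits, permutations=999, seed=1):
    """Observed statistic and a one-sided p-value from shuffling trait scores
    among accessions while keeping the group labels fixed."""
    observed = between_group_statistic(groups, traits)
    rng = random.Random(seed)
    shuffled = list(traits)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        # Small tolerance so ties are not lost to floating-point rounding.
        if between_group_statistic(groups, shuffled) >= observed - 1e-12:
            extreme += 1
    return observed, (extreme + 1) / (permutations + 1)


if __name__ == "__main__":
    # Hypothetical screening data: resistant (1) or susceptible (0) per accession.
    species = ["sp1"] * 6 + ["sp2"] * 6 + ["sp3"] * 6
    region = ["north", "south"] * 9
    resistant = [1, 1, 1, 1, 0, 1] + [0, 0, 1, 0, 0, 0] + [1, 0, 0, 1, 0, 0]
    print("taxonomy:", permutation_test(species, resistant))
    print("geography:", permutation_test(region, resistant))
```

Because the function accepts any grouping, passing collecting regions instead of species names gives a comparable statistic for a biogeographical predictor.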
 Taxonomy is not the only possible predictor. Biogeographical 
variables have also been used to predict the presence or absence of 
traits in wild plants. Biogeography-based hypotheses of association 
are fundamental in guidelines for collecting plant genetic resources. 
For example, it is often suggested that germplasm collectors should 
sample from as many ecologically different environments as possible 
(Brown and Marshall 1995). It is also thought to be important to include 
geographical extremes of the range of a species (Allard 1970). Although 
such populations may not present great genotypic variation, they may 
harbor unique traits or taxa (von Bothmer and Seberg 1995).
 The presence of biogeographical associations might reflect adaptation 
of plants to prevailing ecological conditions where they grow. Rick 
(1973) and Nevo et al. (1982) found similar convergences in resistance to 
drought in populations growing in dry areas. One would also expect to 
find similar traits in areas with a comparable bioclimatic environment, 
even when these areas are far apart and there is no genetic exchange 
among the populations. In the case of disease resistance, adaptation 
could arise in an area as a result of coevolution with a pathogen of 
limited range. Thus, resistance to a certain disease may be present in 
areas where the pathogen is endemic, whereas it may be absent in 
areas that are similar from an ecological and/or taxonomic perspective 
but where the pathogen is absent. For example, nearly all R genes for 
resistance to potato late blight have been found in accessions from 
central Mexico, the centre of diversity of the late blight pathogen 
(Phytophthora infestans) and its likely centre of origin (Fry and 
Goodwin 1997). Patterson and Givnish (2002) showed convergence of 
several non-disease-related traits in different clades in the Liliales, 
and termed this phenomenon concerted convergence.
 In convergence situations one expects traits to be restricted to 
certain areas, independent of species. In the 
case of widespread species, one would expect the trait to appear (or 
increase in frequency) as the selection pressure (whether ecological 
or coevolutionary) increases. This contrasts strongly with the 
taxonomic prediction paradigm. In reality, both taxonomic and 
biogeographical factors may be associated with the distribution of traits 
(Hijmans et al. 2003). The question then becomes how best to use these 
predictive capacities. Which of the two is a stronger predictor? How 
may they complement each other?
 Wild crop relatives provide an outstanding resource to test prediction. 
Many large germplasm collections are well studied with respect to taxonomy 
and distribution, and have large databases of screening data for useful traits 
such as disease resistance. The greatest impacts of disease resistance 
from wild germplasm in food crops have been in wheat, potato and 
tomato (Lenné and Wood 1991). Because of the economic importance 
of disease resistances and other crop improvement traits, and for many 
other purposes, large germplasm collections have been assembled, and 
they represent a ready resource for prediction studies.
Species concepts
The idea of what constitutes a species has changed from original 
concepts based on special creation by God, to modern ideas 
incorporating biological and DNA data. Originally, species were 
defined by simple impressions of differences as determined by 
morphological observations. Our concepts of species changed with 
new discoveries of genetics, reproductive biology, data analyses, 
and theory of what forms and maintains species. Many different 
species concepts arose from these developments, and there is no 
consensus on what constitutes a species. The purpose of this section 
is to introduce basic species concepts so that readers can 
interpret their own data, and those of others, within the framework 
of these historical developments and conceptual differences.
 Taxonomy is the theory and practice of describing, naming and 
classifying organisms (Lincoln et al. 1998). Systematics is a related 
term, sometimes used synonymously, but it refers to a broader 
discipline of discovering phylogenetic relationships through modern 
experimental methods using comparative anatomy, cytogenetics, 
ecology, morphology, molecular data, or other data (Stuessy 
1990). For simplicity, we use the term taxonomy here to describe all 
these aspects. Taxonomy has many components but primarily 
involves: 1) determining what is a species (or its subdivisions, 
such as subspecies); 2) distinguishing species from one another through 
taxonomic keys and descriptions, and examining their geographic 
ranges; 3) investigating their interrelationships; and 4) 
determining proper names of species and higher-order ranks (such as 
genera or families) using international rules of nomenclature. In 
addition, some taxonomists investigate processes of evolution that 
lead to the existing pattern of species and their interrelationships 
(Spooner et al. 2003).
 Standard ranks in the taxonomic hierarchy from lowest to highest 
are form, variety, subspecies, species, series, section, genus, tribe, 
family, order, class, division and kingdom. More ranks can be added 
if desired by adding qualifier terms such as “sub” or “super”, 
for example, to create subgenus or supergenus. Ranks only have 
meaning in a relative (not absolute) sense in that a genus is less 
inclusive than a family, and a family less inclusive than an order 
(Stevens 1998). There are no objective criteria or set of characters 
to indicate what taxonomic level is a genus, family, or order. As 
such, taxa of the same rank (two families, for example) are not 
necessarily comparable in the diversity they contain or in how 
phylogenetically divergent they are from other taxa. Put in a 
phylogenetic context, traditional ranks are not necessarily equivalent, 
because taxa of equal rank do not necessarily correspond to sister clades. 
Many taxonomists today are attempting to have taxonomy reflect the 
branching patterns of phylogenetic trees. In some cases traditional 
ranks are not monophyletic, and in others, one clade could represent 
a family and its sister clade could represent a genus.
 There are many reasons why taxonomy is important. Taxonomists 
use standardized rules of nomenclature to make names as stable 
as possible. For wild and cultivated plants, taxonomists use the 
International Code of Botanical Nomenclature (ICBN) (Greuter 
et al. 2000), and for cultivated plants use the International Code 
of Nomenclature for Cultivated Plants (ICNCP) (Brickell et al. 
2004); either code may be used for cultivated plants. Such 
stable names help maintain continuity of the scientific literature 
and allow all disciplines to communicate with one another in 
a common language. Taxonomy also produces descriptions, 
distribution maps and identification aids (such as taxonomic keys) that 
help identify unknown specimens. The predictive role of 
taxonomy is described above.
 Taxonomic treatments are especially useful for genebank 
managers. For example, a stable nomenclature allows efficient 
literature searches for traits of breeding interest, suggests localities 
where species are known to occur in the wild and may be in need 
of collecting, defines species diversity (as determined by numbers 
of species per area) for biodiversity conservation, guides genebank 
managers to rationally organize and document collections, and 
guides breeders to possible sexual crossability. Despite the above 
admirable taxonomic goals, a reliable taxonomy can be difficult to 
obtain because of different classification philosophies or practices 
based on morphological, crossability, phylogenetic, ecological, 
molecular, or other data. For example, Mayden (1997) lists a total 
of 22 different species concepts. We here list only the major variants 
of these concepts and summarize them into six major classes 
(modified from Spooner et al. 2003). There are various perspectives 
on the proper criteria for recognizing species that lead to different 
classifications as mentioned below.
Morphological species concepts
Morphological species concepts define species entirely on 
morphological or anatomical characters. Because of their utility, 
they have been frequently applied, especially historically when 
taxonomy was primarily based on morphological data from 
herbarium specimens. Taxonomists can apply these concepts 
very effectively by gaining initial impressions of species limits 
from examination of herbarium specimens, sometimes followed 
by microscopic examination to gain additional data to modify 
species delimitation. Cronquist (1978) defined this very practical 
application of the morphological species concept as the taxonomic 
morphological species concept: “Species are the smallest groups 
that are consistently and persistently distinct and distinguishable 
by ordinary means”.
 The characters leading to this subjective judgement are often 
unclear, sometimes even to the taxonomist applying them. The 
advent of computers and phenetic philosophy led to a more objective 
evaluation of characters, and the phenetic species concept arose to 
allow taxonomic decisions based on clustering of individuals. Sokal 
and Crovello (1970) defined the phenetic morphological species 
concept as: “dense regions of hyperdimensional space” (referring to 
clustering of individuals in ordination analyses), but the definition 
applies to dendrograms as well.
Interbreeding species concepts
The interbreeding species concepts focus almost entirely on the 
ability of species to exchange genes, either naturally or artificially, as 
assessed by artificial crossing programmes, studies of mechanisms 
to facilitate gene flow, and biological isolating mechanisms. Mayr 
(1942) advanced the biological species concept (an interbreeding 
concept) as: “Species are groups of interbreeding natural populations 
that are reproductively isolated from other such groups”. This 
concept matches that held in the minds of the general public and is 
intuitively appealing, but there are many practical and theoretical 
problems in applying this concept. Procedurally, it is almost 
impossible to apply to a group of any size because replicated pairwise 
crosses are needed in most interspecific combinations for results to be 
confidently interpreted (Sokal and Crovello 1970). In addition, data 
from greenhouse situations do not always match crossability data 
in natural situations, and organisms frequently display varying 
degrees of crossing success that make interpretation of the data 
difficult. Also, the concept is inapplicable to species reproducing 
asexually. The lifelong crossing studies of Rick (1963, 1979) in 
tomato are a notable application of this concept, but such depth of 
study is exceptional and has rarely been matched in other 
groups.
 As pointed out by Mallet (2004), Mayr’s biological species 
concept had roots in a paper by Poulton (1904), perhaps the first 
paper devoted entirely to the discussion of species concepts. This 
early paper outlined key ideas important to species theory today, 
including reproductive and geographic isolation, the classification 
of isolating mechanisms, and the term sympatry, a key concept 
referring to species growing in the same geographical area.
Ecological species concept
Van Valen (1976) noted the perplexing array of variation in oaks, 
which often have broadly sympatric (growing in the same area) sets 
of very similar species that frequently hybridize with one another. He 
noted that despite many hybrids, oak species are often maintained 
in their distinct habitats. For example, swamp white oak is broadly 
sympatric with burr oak in the Great Lakes and Ohio River basins, 
and they frequently hybridize. The former, however, grows in wet 
bottomlands, stream sides and swamps, and the latter in moist 
habitats of rich woods and fertile slopes. Van Valen was so influenced 
by the ecological partitioning of distinct species in specific habitats 
that he contended that “The control of evolution is largely controlled 
by ecology and the constraints of individual development.” He 
defined the ecological species concept as: “A species is a lineage (or 
a closely related set of lineages) which occupies an adaptive 
zone minimally different from that of any other lineage in its range 
and which evolves separately from other sets of lineages outside its 
range.” He contended that ecological factors are more closely related 
to genetic differences than is reproductive isolation. 
Cladistic species concepts
The most recent and conceptually difficult species concepts are those 
based on cladistic criteria. Cladistics, as discussed in “Genebank 
management”, also has its own unique set of terms that can initially 
make cladistics difficult to understand. Cladistic species concepts 
arose out of the ideas of Hennig (1950, 1966) who grouped taxa 
entirely on historical relationships as determined by cladistic 
analyses. Hennig used cladistics only to group species, never to 
define them, but some taxonomists have since applied cladistics 
at the species level in attempts to define species.
 Cladists investigate progenitor-derivative relationships, based 
on shared derived character states, as determined from states in 
related taxa (outgroups) to form monophyletic groups. Cladists, however, 
view the ability to interbreed as a potentially misleading character 
for assessing interrelationships, because it is not a shared 
derived character. Rather, two species often diverge through the origin 
of a breeding barrier that separates them, so the inability to 
interbreed is often derived. For instance, two species sharing a common 
branching point on a phylogenetic tree (sister species) may have diverged 
after the origin of an isolating mechanism that prevented their interbreeding, 
while more distantly related species of the same group may have 
retained the ability to interbreed (a shared primitive character). In 
such cases, biological and cladistic species concepts provide very 
different delimitations of species (Cracraft 1989).
 The attempt to apply cladistic criteria to define species poses 
problems, the main one being that cladistic approaches search for 
patterns of divergence, whereas species are composed of potentially 
interbreeding populations. Theory shows that markers useful for 
defining species and subspecific taxa can give different cladistic 
results depending on potential levels of crossing over and linkage 
relationships (Maddison 1995). As a result, studies using multiple 
genes informative at the species level have been advocated, to search 
for points of agreement among gene trees that may indicate a species 
divergence (Baum and Donoghue 1995). 
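As a rough illustration of searching for points of agreement among gene trees, the Python sketch below lists the clades (groups of taxa descended from a single node) of each tree and keeps only those present in every tree, which are the groups a strict consensus tree would display. The nested-tuple tree format, the taxon names and both gene trees are hypothetical; real analyses would use dedicated phylogenetic software and would also weigh clade support.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# Hypothetical format: a rooted tree as nested tuples; leaves are taxon names.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of taxa at the tips below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    tips: Set[str] = set()
    for child in tree:
        tips |= leaves(child)
    return frozenset(tips)


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the tip set of every internal node (the clades of the tree)."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clades(child)
    return found


def shared_clades(gene_trees: List[Tree]) -> Set[FrozenSet[str]]:
    """Clades recovered by every gene tree (strict-consensus groups)."""
    common = clades(gene_trees[0])
    for tree in gene_trees[1:]:
        common &= clades(tree)
    return common


# Two hypothetical gene trees for populations of species a, b and c.
gene_1 = ((("a1", "a2"), ("b1", "b2")), "c1")
gene_2 = ((("a1", "a2"), "c1"), ("b1", "b2"))
for clade in sorted(shared_clades([gene_1, gene_2]), key=lambda c: (len(c), sorted(c))):
    print(sorted(clade))
```

Here both trees agree on the groups for species a and b but conflict over the placement of c, so only those two groups (plus the trivial all-taxa group) are shared.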
 Rieseberg and Brouillet (1994) and Olmstead (1995) have argued 
that geographically localized models of speciation typically produce 
many monophyletic daughter species and an extant paraphyletic 
progenitor species, so that a strict requirement of monophyly fails 
for many species. Olmstead (1995) termed the former apospecies and 
the latter plesiospecies. Castillo and Spooner (1997) applied this 
concept to wild potatoes, recognizing locally endemic species as 
apospecies and the more widespread species from which they arose as 
plesiospecies.
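The plesiospecies/apospecies distinction can be made concrete with a monophyly test: a group is monophyletic only if some clade contains exactly its members. The Python sketch below uses a hypothetical tree in which a local daughter species (D) is nested within a widespread progenitor (P), and reports which outside taxa intrude into the smallest clade containing a group. The tree format and taxon labels are illustrative assumptions.

```python
from typing import FrozenSet, Iterable, Set, Tuple, Union

# Hypothetical format: a rooted tree as nested tuples; leaves are taxon names.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of taxa at the tips below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the tip set of every internal node."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clades(child)
    return found


def monophyly_report(tree: Tree, group: Iterable[str]) -> Tuple[bool, FrozenSet[str]]:
    """Return (is_monophyletic, intruders).

    Intruders are taxa outside the group that fall inside the smallest
    clade containing all group members.
    """
    members = frozenset(group)
    if not members <= leaves(tree):
        raise ValueError("group contains taxa that are not in the tree")
    if len(members) == 1:
        return True, frozenset()
    # Clades containing the group are nested, so the smallest one is unique.
    smallest = min((c for c in clades(tree) if members <= c), key=len)
    intruders = smallest - members
    return not intruders, intruders


# Populations P1, P2 of a widespread progenitor; D1, D2 of a local daughter species.
tree = (("P1", ("D1", "D2")), "P2")
print(monophyly_report(tree, {"D1", "D2"}))  # daughter species
print(monophyly_report(tree, {"P1", "P2"}))  # progenitor species
```

The daughter species forms a clade, whereas the progenitor can only be made monophyletic by including the daughter, which is the paraphyletic pattern that Olmstead called a plesiospecies.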
Eclectic species concepts
The preceding species concepts each emphasize a single process in 
defining species. Eclectic species concepts, in contrast, assume that 
species are formed and maintained by a variety of forces. Doyen and 
Slobodchikoff (1974) constructed a flow chart detailing a variety of 
morphological, geographical, biological and ecological criteria to 
define species. Ereshefsky (2001) outlined several classes of species 
concepts, and advanced a pluralist view that no single correct 
definition of species exists and that a number of alternative concepts 
may be legitimate. Mallet (2004) highlights a key paper by Poulton 
(1904) as perhaps the first devoted entirely to the discussion of 
species concepts. Poulton argued that species formed reproductive 
communities, the individual members of which were united by 
common descent, combining aspects of the biological and cladistic 
species concepts.
Nominalistic concepts of species
Some question the very existence of species, and believe that 
individuals or interbreeding populations are the only entities that 
have any objective reality. Such ideas arose out of the philosophy of 
nominalism, which holds that only individuals are real and that classes 
of any kind (such as species, genera, or families) are artificial constructs. 
For example, Burma (1954) stated, “…species are highly abstract 
fictions”. Levin (2000) likewise argued that the local population is the 
only unit of evolution, and that species are artificial. Ehrlich and Raven 
(1969) documented many cases in both plants and animals in which gene 
flow is too restricted to act as a cohesive force maintaining 
species. They contended, “Selection alone is both the primary 
cohesive and disruptive force in evolution…for sexual organisms 
it is the local interbreeding population and not the species that is 
clearly the evolutionary unit of importance”.
So just what is a species? 
Our discussion presents only a small sample of a wide diversity of 
opinions on what constitutes a species, and the criteria important 
for their recognition. Our division of criteria to define species is 
really a “taxonomy” of species concepts, and there are other ways 
to view these criteria. For instance, Mallet (2001) argues that a clear 
distinction needs to be made between the data used to define species 
(e.g. morphology, isolating mechanisms, molecular data), and the 
methods used to analyze these data (e.g. phenetics, cladistics, 
population biological methods). From this perspective, the term 
“morphological species concept” is a misnomer in that it describes 
only one data type, which should be used together with others, such 
as DNA characters, to define species. He also points out that the 
“reality” of many species decreases over wide geographic ranges. 
For instance, in small areas species may appear to be discrete, but 
over wider ranges they show more variability that makes them harder 
to distinguish from similar species. We fully agree with these ideas, 
and present our species “taxonomy” only to highlight the different 
criteria that have been developed to define species. 
 Despite some valid arguments that species are not real, there is 
utility in recognizing species as best we can, because they have 
practical value. Uses include communication among plant breeders 
and other biologists, biodiversity conservation, ecological studies, 
and biodiversity legislation. Genebank managers are particularly 
dependent on species names to collect, manage, and legally distribute 
germplasm across borders. Our perspective is that species span a wide 
continuum from well defined to almost impossible to define, but that 
we must do our best to define them for these practical needs. It is 
crucial for us to understand that these different concepts exist, and 
to interpret the literature with these differences in mind. Ideally, 
genebank managers will cooperate to gather a common set of evaluation 
data to classify germplasm, so that common concepts can be applied 
and compared across collections. 
We believe that a variety of criteria are useful to help define species, 
including morphology, molecular markers useful at the species level 
(as discussed in “Overview of molecular technologies”), crossing 
studies, and other types of data. There has never been a single agreed 
definition of species and likely there never will be. What is most 
important for genebank managers is to gather their data in well 
designed experiments with respect to choice of taxa and markers, 
to analyze their data appropriately, to make their data publicly 
available for possible reanalysis and evaluation by others, and to 
interpret and discuss their results with full knowledge of the above 
concepts.
Characterization of germplasm
Systematics
Unlike wild species, which evolve through natural selection, cultivated 
plants evolve through human selection, which can have a profound effect 
on the morphology of cultivars and thus on their classification. They are 
often distinguished from their wild relatives by artificially selected, 
novel and extreme morphological and physiological differences 
relating to seed dispersal, inflorescence architecture, seed size, and 
gigantism that may make the relationship to their progenitor(s) 
unclear. Traits typically selected in cultivated forms can reduce 
fitness in wild habitats (e.g. lack of seed dormancy or of seed 
shattering) but can confer a selective advantage in the very 
different, artificially selective environment of cultivation. 
 Different taxonomists of the same crop often construct different 
taxonomic treatments, and it is often hard for users to choose the one 
that is “better” (Harlan and de Wet 1971). Also, crops are often 
more “over-described” (too many species recognized) than is 
typical in taxonomic treatments of wild plants. Many crops 
have only recently diverged from their wild relatives, and often 
form hybrid swarms with them. The same crop may have arisen 
independently many times. For example, multiple crop origins have 
been demonstrated for common bean (Sonnante et al. 1994), cotton 
(Wendel 1995), millet (Yabuno 1962), rice (Second 1982), and squash 
(Decker 1988). Crops often form intergrading, reticulating gene pools 
with their wild relatives, and species distinctions are often obscure. 
Because of this, genebank curators should be skeptical of taxonomic 
treatments of their cultivated collections, should critically review 
the criteria that were used for the taxonomy, and should compare 
possible alternative taxonomic treatments. Genebanks with global 
collections, in which taxa are well represented, provide an excellent 
resource for addressing these taxonomic questions.
 Taxonomic questions are addressed by almost every molecular 
marker class from microsatellites useful at the species level to DNA 
sequences useful at the generic and family levels. For example, 
Roa et al. (1997) used phenetic analyses of AFLPs to investigate the 
origin of cassava (Manihot esculenta Crantz) relative to four other 
wild species and two non-cultivated subspecies of M. esculenta. 
These subspecies had been thought to be either progenitors of the 
cultivars or escapes from cultivation. As is typical in crops, the 
cultivars contained less variation than their putative wild progenitors. 
The two non-cultivated subspecies showed distinct AFLP differentiation 
from the cultivars, suggesting that they were progenitors, not escapes 
from cultivation. Species-specific markers characterized the species 
but not the two subspecies. A morphological analysis of the same 
accessions likewise failed to separate the subspecies, suggesting that 
they do not merit separate taxonomic status. The data supported 
Manihot tristis Müll. Arg. as the species most similar to M. esculenta.
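Phenetic analyses of dominant markers such as AFLPs usually start from a band presence/absence matrix and a pairwise similarity coefficient. The source does not state which coefficient Roa et al. (1997) used, so the Jaccard coefficient below is an assumption, as are the band profiles. The Python sketch shows one simple way to quantify the observation that cultivars carry less variation than their wild relatives: the mean pairwise distance within each group.

```python
from itertools import combinations
from typing import List, Sequence


def jaccard_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """1 - Jaccard similarity between two 0/1 band profiles.

    Bands absent from both profiles are ignored, a common choice for
    dominant markers such as AFLPs.
    """
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - shared / either


def mean_pairwise_distance(profiles: List[Sequence[int]]) -> float:
    """Average distance over all pairs in a group: a simple variation index."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical AFLP profiles (1 = band present, 0 = absent).
cultivars = [[1, 1, 0, 1, 0, 0], [1, 1, 0, 1, 0, 0], [1, 1, 0, 1, 1, 0]]
wild = [[1, 0, 1, 1, 0, 1], [0, 1, 1, 0, 1, 1], [1, 1, 0, 0, 1, 0]]
print(round(mean_pairwise_distance(cultivars), 3))  # 0.167
print(round(mean_pairwise_distance(wild), 3))  # 0.7
```

The same distance matrix could feed a clustering method; other coefficients (e.g. Dice) would change the values but, for data like these, not the contrast between groups.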
 Kardolus et al. (1998) used cladistic and phenetic analyses of AFLPs 
to investigate interspecific relationships of wild potatoes, with tomatoes 
as outgroups. The results were broadly congruent with other 
molecular studies, showing the applicability of AFLPs at this taxonomic 
level. Many other interspecific studies, including appropriate outgroups, 
have examined relationships with RAPDs and isozymes (Maass and 
Klaas 1995, onion; Chan and Sun 1997, amaranths), and RFLPs and 
RAPDs (Miller and Spooner 1999, potato).
 In general, genebank collections consist of multiple accessions 
of the same or similar species, and their identification can be 
difficult. Taxonomic misidentifications of genebank accessions are 
not uncommon, leading to confusion among users of this germplasm and 
perpetuation of errors in resulting publications. Molecular markers 
provide tools to test the taxonomic validity of species and can 
provide species-specific diagnostic markers. For example, Martin et 
al. (1997) used RAPDs to re-identify a collection of oat accessions. 
Lee et al. (1996) used RAPDs to discriminate Brassica L. varieties. 
Sharma et al. (1995) effectively used RAPDs to identify species-
specific markers for lentil. Species-specific markers also have been 
found with minisatellites (Baurens et al. 1997, banana), and Zhou et 
al. (1997) used minisatellite sequences to discover genome-specific 
fragments in oat, but not cultivar-specific fragments.
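Searching for species-specific diagnostic markers can be sketched as a scan of a band matrix for bands present in every accession of one species and absent from all others. This strict all-or-none criterion, the accession names and the scores below are assumptions for illustration; published studies typically add replicate scoring and independent validation.

```python
from typing import Dict, List


def diagnostic_bands(
    profiles: Dict[str, List[int]], species_of: Dict[str, str], target: str
) -> List[int]:
    """Indices of bands present in every accession of `target` and absent in all others.

    `profiles` maps accession -> 0/1 band scores (same band order for all);
    `species_of` maps accession -> current species identification.
    """
    members = [acc for acc, sp in species_of.items() if sp == target]
    others = [acc for acc, sp in species_of.items() if sp != target]
    if not members:
        raise ValueError(f"no accessions identified as {target}")
    n_bands = len(profiles[members[0]])
    return [
        band
        for band in range(n_bands)
        if all(profiles[acc][band] == 1 for acc in members)
        and all(profiles[acc][band] == 0 for acc in others)
    ]


# Hypothetical RAPD/AFLP scores for four accessions of two species.
profiles = {
    "acc1": [1, 0, 1, 0],
    "acc2": [1, 0, 1, 1],
    "acc3": [0, 1, 1, 0],
    "acc4": [0, 1, 0, 0],
}
species_of = {"acc1": "sp_A", "acc2": "sp_A", "acc3": "sp_B", "acc4": "sp_B"}
print(diagnostic_bands(profiles, species_of, "sp_A"))  # [0]
print(diagnostic_bands(profiles, species_of, "sp_B"))  # [1]
```

A band found this way is only as reliable as the reference set: misidentified reference accessions, the very problem described above, can hide genuine diagnostic bands, so candidates would need checking on independently verified material.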
 McGregor et al. (2002) used AFLPs to characterize 314 accessions 
of the wild potato Solanum L. series Acaulia Juz. Series Acaulia 
consists of the species S. albicans (Ochoa) Ochoa and S. acaule 
Bitter, with the latter subdivided into four subspecies. A UPGMA 
cluster analysis of the AFLP data grouped the majority of accessions 
into clusters corresponding to species and subspecies. The AFLP data 
uncovered 16 taxonomic misidentifications at the species or subspecies 
level, including four accessions that were later identified as species 
outside the series. The AFLP data also allowed 97 accessions previously 
identified only as S. acaule to be assigned to subspecies. Two 
accessions appeared to consist of a mixture of species. AFLP analysis 
therefore proved to be a very powerful tool for verifying the taxonomic 
status of accessions within series Acaulia, and contributed 
considerably to a more reliable identification of this collection.
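UPGMA (unweighted pair group method with arithmetic mean) repeatedly joins the two closest clusters and sets the distance from the new cluster to every other cluster to the size-weighted mean of the old distances. The Python sketch below is a minimal version that returns only the tree topology as nested tuples (no branch lengths); the distance matrix is hypothetical and could, for example, hold 1 minus a band-sharing similarity computed from AFLP profiles.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def upgma(names: List[str], dist: List[List[float]]):
    """UPGMA (average-linkage) clustering of a symmetric distance matrix.

    Returns the tree topology as nested tuples; branch lengths are omitted.
    """
    if not names:
        raise ValueError("need at least one accession")
    # Active clusters: id -> (subtree, number of accessions in it).
    clusters: Dict[int, Tuple[object, int]] = {i: (n, 1) for i, n in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j] for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.__getitem__)  # closest pair of clusters
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        # Distance to the merged cluster = size-weighted mean of the old distances.
        merged = {
            k: (n_i * d[(min(i, k), max(i, k))] + n_j * d[(min(j, k), max(j, k))])
            / (n_i + n_j)
            for k in clusters
        }
        # Drop every distance involving the two clusters just merged.
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        for k, v in merged.items():
            d[(k, next_id)] = v  # k < next_id, so keys stay (smaller, larger)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


names = ["A", "B", "C", "D"]
dist = [[0, 2, 6, 6], [2, 0, 6, 6], [6, 6, 0, 4], [6, 6, 4, 0]]
print(upgma(names, dist))  # (('A', 'B'), ('C', 'D'))
```

UPGMA produces phenetic groups and implicitly assumes similar rates of change across lineages; its clusters are therefore not necessarily clades.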
 Santacruz-Varela et al. (2004) used morphology, isozymes, and 
microsatellites to investigate the origin of North American maize 
relative to populations from Mexico and South America. They 
identified three distinctive groups from North America, and 
proposed their recognition as distinct races: 1) North American 
Yellow Pearl Popcorns, which represent the common popcorn for US 
production and originated from Chilean races in the 19th century; 
2) North American Pointed Rice Popcorns, which were commercially 
important in the first half of the 20th century and originated in central 
Mexico; and 3) North American Early Popcorns, sharing a diversity of 
traits with Northern Flint Maize, other Mexican races and European 
popcorn varieties introduced in the late 19th century.
Fingerprinting studies
Advanced cultivars of many crops frequently are morphologically 
very similar, and may have arisen from their progenitors only by 
somatic mutations. “Fingerprinting” is an attempt to discover 
cultivar-specific molecular markers that aid their identification. 
Certain cultivars are economically very important and form the 
backbone of regionally important industries. The grape cultivar 
‘Sangiovese’ is a case in point. It is one of the most economically 
important grape cultivars in Italy and the main cultivar in Tuscan 
wines. Many phenotypic variants of ‘Sangiovese’ exist, and the 
identification and maintenance of these lineages is important to the 
industry. Some members of the ‘Sangiovese’ group are local variants 
and are given different names. Identifying these variants is 
important for selecting clones to be used in breeding and for 
germplasm conservation. Grapes show great SSR allelic diversity, 
and SSR primer sites have been characterized (Vignani et al. 1996; 
Lamboy and Alpha 1998). SSRs are therefore a logical choice for 
distinguishing such closely related clones. 
Vignani et al. (1996) used 7 SSRs to investigate 12 ‘Sangiovese’ clones. 
Eleven of these 12 clones were identical at all 7 loci, but 1 clone 
differed by 1 allele at each of 4 loci. The results support the “clonal” 
nature of these 11 accessions. SSRs do not provide a clear answer 
regarding the relationship of the twelfth clone to ‘Sangiovese’, but 
a close relationship is inferred from extensive allele sharing with the 
other clones.
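The comparison behind such a clone study can be sketched in a few lines: group samples that share an identical multilocus SSR genotype, and list the loci at which any two samples differ. The Python sketch below assumes diploid genotypes scored as pairs of allele sizes, no missing data and no tolerance for scoring error; locus names, allele sizes and clone labels are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical format: locus name -> the two allele sizes (bp) scored at that locus.
Genotype = Dict[str, Tuple[int, int]]


def differing_loci(g1: Genotype, g2: Genotype) -> List[str]:
    """Loci at which two SSR genotypes differ (allele order ignored)."""
    return [locus for locus in g1 if sorted(g1[locus]) != sorted(g2[locus])]


def identical_groups(genotypes: Dict[str, Genotype]) -> List[List[str]]:
    """Group samples that share the same multilocus genotype."""
    groups: Dict[tuple, List[str]] = {}
    for name, g in genotypes.items():
        # Canonical key: loci in sorted order, alleles sorted within each locus.
        key = tuple(sorted((locus, tuple(sorted(alleles))) for locus, alleles in g.items()))
        groups.setdefault(key, []).append(name)
    return list(groups.values())


# Hypothetical clones scored at three loci.
clones = {
    "clone1": {"L1": (180, 184), "L2": (210, 210), "L3": (150, 156)},
    "clone2": {"L1": (184, 180), "L2": (210, 210), "L3": (150, 156)},
    "clone3": {"L1": (180, 186), "L2": (210, 214), "L3": (150, 156)},
}
print(identical_groups(clones))  # [['clone1', 'clone2'], ['clone3']]
print(differing_loci(clones["clone1"], clones["clone3"]))  # ['L1', 'L2']
```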
 Potato is another clonal crop with many morphologically similar 
cultivars. Schneider and Douches (1997) were able to distinguish 
24 of 40 potato cultivars with 7 SSR primer pairs. When the SSR 
data were combined with the tuber morphology, only five pairs of 
cultivars could not be distinguished. Similarly, Mandolino et al. 
(1996) distinguished eight potato cultivars with two probe-enzyme 
RFLP combinations and three RAPD primers. Crops propagated 
by seed have been distinguished as well. For example, Charters 
et al. (1996) demonstrated the ability to distinguish all 20 Brassica 
cultivars examined with just 2 SSR probes. Other fingerprinting 
studies involve DNA Amplification Fingerprinting (Weaver et al. 
1995, Eremochloa Buse; Scott et al. 1996, Chrysanthemum L.), RAPDs 
(Lee et al. 1996, mustards; Golembiewski et al. 1997, Agrostis L.; 
Degani et al. 1998, strawberry), and SSRs (Guilford et al. 1997, apple; 
Russell et al. 1997b, barley; Lamboy and Alpha 1998, grape). These 
studies clearly document the utility of molecular markers, and of SSRs 
in particular, for fingerprinting genebank accessions.
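Fingerprinting results such as those of Schneider and Douches (1997) are reported as the number of cultivars that can be told apart, and as the pairs that remain indistinguishable once data sources are combined. The Python sketch below reproduces that bookkeeping on hypothetical SSR and tuber data; exact matching of profiles is an assumption (real data would need a rule for scoring errors and missing values).

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def combine(*sources: Dict[str, Sequence]) -> Dict[str, tuple]:
    """Concatenate several data sets (e.g. SSR alleles, tuber traits) per cultivar."""
    return {name: tuple(v for s in sources for v in s[name]) for name in sources[0]}


def indistinguishable_pairs(profiles: Dict[str, Sequence]) -> List[Tuple[str, str]]:
    """Pairs of cultivars whose profiles are identical."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if tuple(profiles[a]) == tuple(profiles[b])
    ]


def uniquely_identified(profiles: Dict[str, Sequence]) -> List[str]:
    """Cultivars that differ from every other cultivar in the set."""
    tied = {name for pair in indistinguishable_pairs(profiles) for name in pair}
    return [name for name in sorted(profiles) if name not in tied]


# Hypothetical SSR genotypes and tuber descriptors for four cultivars.
ssr = {"cvA": (120, 130), "cvB": (120, 130), "cvC": (118, 130), "cvD": (118, 130)}
tuber = {"cvA": ("red", "round"), "cvB": ("white", "round"),
         "cvC": ("red", "long"), "cvD": ("red", "long")}
print(uniquely_identified(ssr))  # []
both = combine(ssr, tuber)
print(indistinguishable_pairs(both))  # [('cvC', 'cvD')]
print(uniquely_identified(both))  # ['cvA', 'cvB']
```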
Putative natural hybrids
Hybridization is thought to be a major evolutionary force at both 
the diploid and polyploid levels (Rieseberg 1995). The main data 
used to infer whether accessions are hybrids have been additive 
morphological traits. However, Rieseberg and Ellstrand (1993) 
showed that hybrids are no more likely to display intermediate 
character states than parental ones, and can additionally express an 
array of transgressive or novel traits. Other long-held beliefs that 
hybrids are less fit than their parents, and that they exhibit character 
coherence (parental characteristics remain associated) have also been 
questioned (Rieseberg 1995).
 The utility of molecular markers for investigating hybrids decreases 
with the time elapsed since hybridization, as species-specific markers 
from both parents can be disrupted through recombination, and both 
can mutate to new markers. This can happen very rapidly, as was shown 
by Song et al. (1995). They showed extensive loss of markers and 
the appearance of novel markers by just the F5 generation of 
an artificial interspecific Brassica hybrid. This hybrid was constructed 
as a homozygous allopolyploid, demonstrating that, at least at the 
polyploid level, species can generate extensive genetic diversity in 
a very short period of time.
 Molecular markers present powerful new tools to reinvestigate 
hypotheses of hybridization. One method is to search the putative 
hybrid for markers specific to each of the putative parents. The utility 
of this method depends on discovering species-specific markers, 
yet this may be difficult because many hybrids are between closely 
related taxa that have not diverged enough to have evolved specific 
markers. Additive molecular markers were used to investigate 
three hypotheses of hybrid origins in wild potatoes. In all three 
cases discussed below, hybrid origins had been strongly inferred 
from intermediate morphology and from the distribution of the hybrids 
at contact zones of the parents, yet molecular data supported only 
one of these hybrid origins. Clausen and Spooner (1998) supported a 
prior hypothesis of hybridization in the wild potato species Solanum 
× rechei Hawkes & Hjert. with RFLP data. Miller and Spooner (1996), 
on the other hand, failed to support a hypothesis of introgression of 
the wild potato species S. microdontum Bitter into S. chacoense Bitter 
with RAPD and RFLP data, and Giannattasio and Spooner (1994) 
and Spooner et al. (1991) failed to support a hypothesis of hybrid 
origin of S. raphanifolium Cárdenas & Hawkes with nuclear RFLP 
and chloroplast DNA data. These results show that morphology 
can provide misleading clues to the hybrid nature of accessions, and 
demonstrate the value of molecular markers for reinvestigating hybrids.
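The additive-marker test described above can be sketched as set arithmetic: markers found in one putative parent but not the other are treated as species-specific, and the putative hybrid is checked for the share of each parent's specific markers it carries. The marker names, the data and the idea of summarizing additivity as two proportions are illustrative assumptions, not the scoring used in the potato studies cited.

```python
from typing import Dict, Set, Tuple


def species_specific(parent_a: Set[str], parent_b: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Markers found in one putative parent but not the other."""
    return parent_a - parent_b, parent_b - parent_a


def additivity(hybrid: Set[str], parent_a: Set[str], parent_b: Set[str]) -> Dict[str, float]:
    """Share of each parent's specific markers present in the putative hybrid."""
    spec_a, spec_b = species_specific(parent_a, parent_b)
    if not spec_a or not spec_b:
        raise ValueError("no species-specific markers: additivity cannot be tested")
    return {
        "parent_a": len(hybrid & spec_a) / len(spec_a),
        "parent_b": len(hybrid & spec_b) / len(spec_b),
    }


# Hypothetical marker sets (band or allele names) for two putative parents.
parent_a = {"m1", "m2", "m3", "m5"}
parent_b = {"m3", "m4", "m6"}
print(additivity({"m1", "m2", "m3", "m4", "m6"}, parent_a, parent_b))
print(additivity({"m3", "m4", "m6", "m7"}, parent_a, parent_b))
```

A putative hybrid carrying substantial shares of both parents' specific markers (the first call) is consistent with hybrid origin; one carrying markers of only one parent (the second call) is not. Where to draw the line is a judgment call, and recombination and mutation erode additivity over time.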
 A classical experimental method to reinvestigate hybrid origins 
is to re-synthesize an artificial hybrid and to compare it with the 
natural hybrid with respect to morphology, reproductive success, or 
a complement of parental molecular, biochemical, or ecological 
traits. Rieseberg et al. (1996) investigated the diploid hybrid origin 
of the wild sunflower species Helianthus anomalus S. F. Blake, 
which is derived from two chromosomally divergent parental 
species, H. annuus L. and H. petiolaris Nutt. Earlier studies had 
already confirmed the hybrid origin of H. anomalus (Rieseberg 
1991). Rieseberg et al. (1996) additionally investigated its genome 
composition via the construction of molecular linkage maps with 
RAPDs. They produced three separate hybrid lineages to the F3 
generation using different backcrossing designs, and compared the 
genomes of these three lineages with that of the natural hybrid. The 
genome compositions of all three artificial hybrids and the natural 
hybrid were remarkably similar. In addition to providing yet more 
evidence for hybridity, this study showed that blocks of chromosomes 
were rearranged as conserved units in both natural and artificial 
hybrids. It also showed the resistance of certain genes to 
recombination, and suggested that positive gene interactions drive 
the selection for these rearrangements in the hybrid. Because many 
crops are thought to be of hybrid origin, or to be undergoing 
introgression with related weeds, such studies could be applied to 
many other crops.
Genomic differences in wild relatives of crops
The genome refers to the array of genes carried in a haploid set of 
chromosomes. Major genomic differences are usually represented 
by separate capital letters (e.g. genomes AA, BB, or AB for diploid 
species, and AAAA, AABB, or BBBB for tetraploids), with one letter 
per haploid genome set. Minor genome differences are typically 
expressed as subscripted or superscripted variants of individual 
genomes (e.g. AªAª or A¹A¹).
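The notation rule (one symbol per haploid genome set) can be illustrated with a toy parser that counts genome symbols and derives the ploidy level. Treating a symbol as one capital letter followed by any variant marks is an assumption made for this sketch; it does not distinguish subscripts from superscripts and says nothing about crossability.

```python
import re
from collections import Counter

# Assumption: each genome symbol is one capital letter followed by any
# variant marks (e.g. "ª" or "¹"); one symbol per haploid genome set.
GENOME_SYMBOL = re.compile(r"[A-Z][^A-Z]*")


def parse_genome(formula: str) -> Counter:
    """Count haploid genome sets of each type, e.g. 'AABB' -> A: 2, B: 2."""
    symbols = GENOME_SYMBOL.findall(formula)
    if "".join(symbols) != formula:
        raise ValueError(f"cannot parse genome formula {formula!r}")
    return Counter(symbols)


def ploidy(formula: str) -> int:
    """Number of haploid genome sets, i.e. the ploidy level."""
    return sum(parse_genome(formula).values())


for formula in ["AB", "AABB", "BBBB", "AªAª", "A¹A¹"]:
    print(formula, ploidy(formula), dict(parse_genome(formula)))
```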
 Because crossing crops with related species is of practical 
importance for crop improvement, the discovery and manipulation of 
species-specific genomic differences is of use to agronomists (Smartt 
and Simmonds 1995). Although genomes are primarily defined by 
crossability data, they have been inferred in the absence of these 
data by phylogenetic investigations. For example, Kollipara et al. 
(1997), Singh et al. (1998) and Hymowitz et al. (1998) collectively 
investigated cladistic relationships, via DNA sequencing from 
the internal transcribed spacer region of nuclear ribosomal DNA 
(ITS), of all 18 wild and cultivated soyabean species, and assigned 
genome symbols to unknown species by phylogenetic results. 
Biochemical, cytogenetic and molecular data are congruent with 
genome designations in these species.
 Singh and Smartt (1998), however, point out incongruence 
between cytogenetic and molecular evidence for genome hypotheses 
in the cultivated peanut (Arachis hypogaea L., a tetraploid of AABB 
genome constitution). Cytogenetic evidence supports A. batizocoi 
Krapov. & W.C. Greg. as the closest B-genome diploid species 
relative of Arachis hypogaea, but RFLP data suggest that A. batizocoi 
is more distantly related to A. hypogaea than other species of the 
section Arachis. In this case, therefore, RFLP data may not be a 
good indicator of genomes in Arachis. They suggest that additional 
crossing evidence is needed to infer genome progenitors in this crop. 
Because genomic differences are so important for predicting the 
ability to hybridize or to introgress genes among different species, 
further studies of genome differences in crops are needed.
Rationalization of germplasm collections
Redundancies (identical or near identical accessions) may occur in 
collections for various reasons, e.g. documentation errors, exchange 
of identical accessions between genebanks, and the sampling of 
multiple populations from genetically homogeneous collection 
sites. Clearly, redundant accessions are a nuisance to genebank 
curators because they do not contribute to the genetic diversity of 
a collection, but require resources to maintain them. Identification 
and elimination of redundancies are therefore important aspects 
in plant genetic resources management, from both a genetic and an 
economic point of view. Comparison of passport data may only suggest 
potential redundancies within collections, which subsequently may 
be validated with morphological data. 
 Currently, molecular markers are being widely applied to identify 
or validate redundancies. However, these analyses are by no means 
straightforward. In a strict sense, absolute certainty that two samples are 
identical can only be inferred from DNA sequence comparison of their 
entire genomes. For obvious practical reasons, analyses are restricted 
to a limited number of markers, which can prove that samples differ, 
but not that they are identical. Sexually reproducing species consist 
of heterogeneous populations and, even for self-fertilizing species, 
intra-accession variation is often observed (e.g. van Treuren and van 
Hintum 2001). Therefore, accessions will rarely be found identical, 
and “redundancy” analyses must decide on the degree of difference 
needed to justify discarding very similar accessions. For this reason 
the term “redundancy” is more appropriate than “duplicate”. 
Think for instance of an outcrossing population that is independently 
sampled twice and analyzed with molecular markers. Unless the 
entire population is sampled twice and no scoring errors are made, the 
probability that the two samples are completely identical will be close 
to zero. However, the two samples may be very similar and one sample 
may be considered redundant. The question then is how similar two 
samples should be to consider them redundant, or how different two 
samples should be to consider them not redundant. The use of statistical 
theory may help to decide whether accessions are sufficiently different 
and to obtain probability estimates of making incorrect decisions in 
rationalization (e.g. Excoffier et al. 1992).
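The statistical route mentioned above can be illustrated with an AMOVA-style statistic, Phi_ST (the proportion of molecular variance lying between samples), combined with a permutation test, following the general framework of Excoffier et al. (1992). The Python sketch below is a minimal version under stated assumptions: two accessions, dominant 0/1 marker scores for individual plants, squared Euclidean distances and no missing data. The data are hypothetical, and the significance level used to declare two accessions "different enough" remains a curatorial decision.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple

Profile = Sequence[int]  # 0/1 band scores of one plant


def sq_dist(a: Profile, b: Profile) -> int:
    """Squared Euclidean distance; for 0/1 data, the number of differing bands."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def phi_st(groups: List[List[Profile]]) -> float:
    """AMOVA-style Phi_ST: share of molecular variance found among groups."""
    n = sum(len(g) for g in groups)
    p = len(groups)
    everyone = [plant for g in groups for plant in g]
    ssd_total = sum(sq_dist(a, b) for a, b in combinations(everyone, 2)) / n
    ssd_within = sum(
        sum(sq_dist(a, b) for a, b in combinations(g, 2)) / len(g) for g in groups
    )
    msd_among = (ssd_total - ssd_within) / (p - 1)
    msd_within = ssd_within / (n - p)
    n0 = (n - sum(len(g) ** 2 for g in groups) / n) / (p - 1)
    var_among = (msd_among - msd_within) / n0
    total = var_among + msd_within
    return var_among / total if total > 0 else 0.0


def permutation_test(groups, n_perm=999, seed=1) -> Tuple[float, float]:
    """Observed Phi_ST and the share of random regroupings at least as large."""
    observed = phi_st(groups)
    rng = random.Random(seed)
    pooled = [plant for g in groups for plant in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if phi_st(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical marker scores for five plants from each of two accessions.
acc1 = [[1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 1], [1, 1, 1, 0, 0, 0]]
acc2 = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1], [1, 0, 0, 1, 1, 1]]
phi, p = permutation_test([acc1, acc2])
print(f"Phi_ST = {phi:.3f}, p = {p:.3f}")
```

A small Phi_ST with a large p-value would be consistent with redundancy; for the reasons given above, it cannot prove identity.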
 Another consideration is that neutral marker data do not necessarily 
reflect variation in functional characteristics. Despite identical marker 
profiles, accessions may still differ in important phenotypes, such as 
disease resistance. Therefore, in redundancy studies, marker data 
should preferably be used in conjunction with passport, morphological 
and evaluation data. If redundancy studies are carried out mainly for 
economic reasons, the costs of the molecular analyses should 
be evaluated against the expected benefits of a smaller collection. 
For example, for crops that can be regenerated relatively easily and 
that can be stored for relatively long times at low cost (e.g. maize), 
the benefits may not outweigh the costs of the molecular analyses 
(Engels and Visser 2003). 
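The cost-benefit argument reduces to simple arithmetic: a one-off analysis cost is recovered only after enough regeneration cycles of the smaller collection. The Python sketch below uses entirely hypothetical figures and ignores storage costs and discounting; it shows why crops with long regeneration intervals may not repay the analysis within a practical planning horizon.

```python
import math


def saving_per_cycle(n_before: int, n_after: int, cost_per_accession: float) -> float:
    """Regeneration cost avoided per cycle after removing redundant accessions."""
    return (n_before - n_after) * cost_per_accession


def break_even(analysis_cost: float, saving: float, years_per_cycle: float):
    """Cycles (and years) before savings repay a one-off molecular analysis."""
    if saving <= 0:
        return math.inf, math.inf
    cycles = analysis_cost / saving
    return cycles, cycles * years_per_cycle


# Hypothetical figures in arbitrary currency units.
saving = saving_per_cycle(n_before=1000, n_after=950, cost_per_accession=100.0)
print(break_even(analysis_cost=12500.0, saving=saving, years_per_cycle=20.0))  # (2.5, 50.0)
```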
 With these caveats in mind, redundancy studies can benefit 
from marker data. To identify redundancies among rice accessions, 
potential duplicates selected by examination of passport data were 
characterized by morphological and/or RAPD analysis (Virk et al. 
1995). To reduce the size of a Brassica oleracea L. collection, van Hintum 
et al. (1996) used allozymes to validate bulking of accessions into 
groups that were formed by crop experts based on historical and 
morphological data. Phippen et al. (1997) used RAPDs to analyze 14 
phenotypically uniform Brassica oleracea accessions. Statistical analysis 
that partitions variation among accessions showed that the accessions 
could be reduced to four groups with only minimal loss of variation, 
thereby saving about 70% of the costs per cycle of regeneration. In a 
similar approach using microsatellites, Dean et al. (1999) achieved a 
50% reduction among accessions within a sorghum collection without 
jeopardizing the overall genetic diversity. Using microsatellites, 
isozymes and AFLPs, potential duplicates were identified in a 
cassava core collection (Chavarriaga-Aguirre et al. 1999). Prior to 
the development of a core collection of the cultivated potato Group 
Andigenum, morphological characters and electrophoresis of total 
proteins and esterases were used to reduce a potato collection from 
10 722 to 2 379 (Huamán et al. 2000). AFLPs were used to characterize 
29 flax accessions, assigned to 3 groups on the basis of similar accession 
names. A procedure called “stepwise bulking” was subsequently 
used to form significantly different groups containing genetically 
similar material. To form these groups, an analysis of molecular 
variance (AMOVA) was used to evaluate within- and between-accession 
variation. Using this bulking approach, the 29 accessions 
could be reduced to 14 with only 2.6% loss of the among-population 
component of variance (van Treuren et al. 2001). Accessions of the 
wild potato Solanum L. series Acaulia Juz. were analyzed with AFLPs, 
which revealed a redundancy of about 5%. It was estimated that the 
costs to identify these redundancies were about 2.5 times as high as 
the savings that could be expected per generation by rationalization 
of the collection (McGregor et al. 2002; van Treuren et al. 2004). Other 
studies addressing the use of molecular markers in the identification 
or validation of redundancies include Tao and Sugiura (1987), Waycott 
and Fort (1994), Lamboy et al. (1994), van Hintum and Visser (1995), 
Cao et al. (1998), Cervera et al. (1998), Zeven et al. (1998), Fregene et 
al. (2000), Negash et al. (2002) and Engels and Visser (2003). 
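The idea behind stepwise bulking can be sketched as a greedy loop. This is a deliberately simplified variant, not the published procedure of van Treuren et al. (2001), which used AMOVA with significance testing: at each step the sketch bulks the pair of groups whose merger loses the least among-group sum of squared distances, and stops when the cumulative loss would exceed a chosen threshold. The 5% threshold, the use of sums of squares rather than variance components, and the data are all assumptions.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Profile = Sequence[int]  # 0/1 marker scores of one plant


def sq_dist(a: Profile, b: Profile) -> int:
    """Squared Euclidean distance between two plants' marker profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ssd_among(groups: List[List[Profile]]) -> float:
    """Among-group sum of squared distances, as partitioned in AMOVA."""
    everyone = [plant for g in groups for plant in g]
    total = sum(sq_dist(a, b) for a, b in combinations(everyone, 2)) / len(everyone)
    within = sum(sum(sq_dist(a, b) for a, b in combinations(g, 2)) / len(g) for g in groups)
    return total - within


def stepwise_bulk(accessions: Dict[str, List[Profile]], max_loss: float = 0.05) -> List[Tuple[str, ...]]:
    """Greedily merge accessions while the cumulative loss of among-group
    variation stays at or below `max_loss` (a fraction of the starting value)."""
    groups = {(name,): list(plants) for name, plants in accessions.items()}
    base = ssd_among(list(groups.values()))
    if base <= 0:
        return [tuple(sorted(accessions))]  # no among-accession variation at all
    while len(groups) > 1:
        best_loss, best_groups = None, None
        for a, b in combinations(groups, 2):
            trial = {k: v for k, v in groups.items() if k not in (a, b)}
            trial[a + b] = groups[a] + groups[b]
            loss = 1 - ssd_among(list(trial.values())) / base
            if best_loss is None or loss < best_loss:
                best_loss, best_groups = loss, trial
        if best_loss > max_loss:
            break
        groups = best_groups
    return sorted(groups)


# Hypothetical profiles of two plants per accession.
accessions = {
    "acc1": [[1, 1, 0, 0], [1, 1, 0, 0]],
    "acc2": [[1, 1, 0, 0], [1, 1, 0, 1]],
    "acc3": [[0, 0, 1, 1], [0, 0, 1, 1]],
    "acc4": [[1, 0, 1, 0], [1, 0, 1, 0]],
}
print(stepwise_bulk(accessions, max_loss=0.05))  # [('acc1', 'acc2'), ('acc3',), ('acc4',)]
```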
Assembly of core collections
Genebanks are reservoirs of genetic diversity. However, 
maintaining maximum genetic diversity comes at a significant 
cost. One of the most important purposes of genetic resources 
conservation is their utilization by plant breeders for germplasm 
enhancement through variety development. However, the actual 
clientele of genebank collections is much broader and may include 
taxonomists, entomologists, molecular geneticists and scientists 
from many other disciplines (Engels and Visser 2003). 
 Whatever the purpose for conserving a collection, its size can 
create difficulties because of costs and inability to evaluate all of the 
accessions. Without evaluation data, breeders may have difficulty in 
selecting germplasm for their needs, which may in turn reduce the 
utilization of collections. To facilitate utilization, core collections have 
been developed by genebanks, following the concept developed by 
Frankel (1984). A core collection is intended to represent, with a 
minimum of redundancy, the genetic diversity of a crop and its 
relatives. The remaining accessions are maintained as a reserve 
collection. The basic idea is to subject the smaller core collection 
(e.g. 10% of the entire collection) to intensive characterization and 
to maintain the reserve collection (the remaining 90%) as a back-up 
of potentially useful but largely unevaluated variability for future 
use if new or extraordinary needs arise, such as emerging diseases. 
 The core collection could be constructed in different ways to 
fulfill varying objectives, from subsets of large genetic resources 
collections, including diverse arrays of taxa, to smaller collections 
reflecting particular priorities within a given genepool, mostly in 
response to user requests. Core collections should in any case 
integrate a representative sample of the variation within the crop of 
interest, including its wild relatives, with a minimum of redundancy 
(Hodgkin et al. 1995; van Hintum et al. 2000).
 The concept of core collections was further developed by Frankel 
and Brown (1984) and Brown (1989a,b). Traditionally, the core 
has been constructed on the basis of a variety of taxonomic, 
morphological, agronomic and ecogeographical criteria. Often, after a 
core collection has been completed using any of the traditional 
criteria, biochemical and/or molecular data have been applied 
afterwards, either to validate the strategy adopted or to verify the 
amount of genetic diversity captured relative to the original collection. For example, 
variation at 3 morphological trait loci, 17 isozyme loci, and an 
amylase gene in a collection of lentil was determined to compare 
the usefulness of 2 different sampling methods (Erskine and 
Muehlbauer 1991). A set of 19 isozyme profiles, C-banding variants 
in all chromosomes, 3 morphological descriptors, and phenotypic 
patterns for hordein and DDT susceptibility were used to validate 
the selection of a core collection based on pedigree data against a 
random selection strategy (van Hintum 1994). Some core collections 
have been established exclusively on the basis of molecular marker 
data. For instance, Ghislain et al. (1999) set up a cultivated potato 
core collection based on RAPD data. Marita et al. (2000) also used 
RAPDs to develop core collections of cacao and pepper.
 The success of alternative methods for composing a core 
collection has been evaluated by quantifying isozyme variation, 
with the assumption that the larger the number of alleles in the 
core collection, the more successful the method (van Hintum et al. 
1995). Simulations have also addressed the usefulness of marker 
data as a selection criterion. For example, methods based 
on variation at isozyme and RFLP loci were compared with marker-
independent methods and showed that marker-assisted methods 
were able to gather higher allelic richness (Schoen and Brown 1995). 
AFLPs were used to evaluate the genetic structure between and 
within gene pools of a wild bean core collection and the enhanced 
power of molecular markers was contrasted with the limitations of 
previous methods of analysis (Tohme et al. 1996).
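The comparison of marker-assisted and marker-independent sampling can be illustrated with a greedy selection that repeatedly adds the accession contributing the most alleles not yet captured, set against a random sample of the same size. This is a sketch in the spirit of the allele-maximizing strategies discussed above, not necessarily the exact procedure of Schoen and Brown (1995); alleles are written as hypothetical "locus:allele" strings, and a greedy choice is not guaranteed to find the best possible subset in general.

```python
import random
from typing import Dict, List, Set


def allelic_richness(selection: List[str], alleles: Dict[str, Set[str]]) -> int:
    """Number of distinct alleles captured by a set of accessions."""
    return len(set().union(*(alleles[a] for a in selection)))


def greedy_core(alleles: Dict[str, Set[str]], size: int) -> List[str]:
    """Repeatedly add the accession contributing the most not-yet-captured alleles."""
    core: List[str] = []
    captured: Set[str] = set()
    for _ in range(min(size, len(alleles))):
        # Ties are broken by accession name, because candidates are sorted.
        best = max(
            (a for a in sorted(alleles) if a not in core),
            key=lambda a: len(alleles[a] - captured),
        )
        core.append(best)
        captured |= alleles[best]
    return core


def random_core(alleles: Dict[str, Set[str]], size: int, seed: int = 0) -> List[str]:
    """A marker-independent comparison: a simple random sample of accessions."""
    return random.Random(seed).sample(sorted(alleles), size)


# Hypothetical alleles at two loci for five accessions.
alleles = {
    "acc1": {"L1:a", "L2:a"},
    "acc2": {"L1:a", "L2:a"},
    "acc3": {"L1:b", "L2:a"},
    "acc4": {"L1:a", "L2:b"},
    "acc5": {"L1:b", "L1:c", "L2:c"},
}
core = greedy_core(alleles, 3)
print(core, allelic_richness(core, alleles))  # ['acc5', 'acc1', 'acc4'] 6
print(allelic_richness(random_core(alleles, 3), alleles))
```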
 Molecular markers have also been used to compare the genetic 
diversity of the core and reserve collections of common bean 
(Skroch et al. 1998, using RAPDs); potato (Huamán et al. 2000, using 
isozymes); sorghum (Grenier et al. 2000, using SSRs); and sandalwood 
(Shashidhara et al. 2003, using RAPDs). In all cases, no significant 
genetic differences were found between core and reserve samples, 
indicating that the core collection formed a representative sample 
of the entire collection. In the bean and potato studies, the data were 
interpreted to suggest that the traditional core collection selection 
criteria captured the most representative sample of genetic variation. 
In the sandalwood study, the authors concluded that molecular 
markers were efficient for estimating diversity, and in the 
case of sorghum, the results showed that genetic diversity was 
captured under various sampling procedures. Another class of markers 
of possible use for core collections is functional diversity markers, 
which may be able to screen for traits; disease resistance markers are one example 
(see “Future challenges”).
 In some instances, the assembly of core collections has been based on 
a combination of data, including biochemical and molecular markers. 
Eight polymorphic loci in combination with geographical origin and 
morphological characters were considered an optimal method to build 
a core collection of cacao (Ronning and Schnell 1994). The combination 
of quantitative and qualitative data, including molecular markers, 
was shown to be adequate to assemble core collections of a number of 
tropical crop plants (coffee, rubber tree, rice and sorghum; Hamon et 
al. 1998) and Cichorium L. (Kiers et al. 2000).
 The most important issue facing genebanks is that of improving 
the use of their collections. For plant breeders, a logical question to 
ask is: are molecular marker data useful for building a core collection 
whose purpose is to maximize the diversity of agronomically important 
characters? Stated another way, is neutral marker diversity correlated 
with functional diversity? The power of molecular markers 
to form a core collection that maximizes useful phenotypic diversity 
has not been thoroughly tested. Molecular markers have a number 
of properties that make them more or less useful depending on the 
application. As genotypic markers, they are likely better suited 
to evaluating genetic distance than to constructing core collections that 
maximize a wide range of functional diversity (such as disease resistance 
and agronomic characters). Most studies suggest that neutral 
markers are poorly linked to functional diversity, 
and are therefore of limited use for the construction of most types of 
core collections. For example, a meta-analysis of 71 publications showed 
only a weak mean correlation between neutral molecular markers 
and quantitative measures of variation. Furthermore, there was no 
significant correlation between genetic variation and life history 
traits or heritability (Reed and Frankham 2001). In contrast, there 
was a moderate correlation between genetic diversity as measured 
by neutral molecular markers and fitness, despite theoretical 
considerations and empirical observations that had suggested 
otherwise (Reed and Frankham 2003).
Maintenance of the genetic integrity of accessions
Genetic resources conserved as seed populations need to be 
regenerated periodically, both to replenish stocks for distribution 
and to counter the loss of seed viability. Each time a variable accession is 
regenerated there is a risk that the genetic integrity of the accession 
is compromised by genetic drift, selection, or gene flow (Sackville 
Hamilton and Chorlton 1997).
 Genetic drift is a stochastic process by which allele frequencies 
in the offspring deviate by chance from those of the parental 
population. It causes random fluctuations in allele frequencies 
from generation to generation and may eventually result in the 
loss of alleles from accessions. For diploid organisms, the variance 
of the change in allele frequency over one generation is q(1-q)/2Ne, 
where q is the frequency of the allele considered and Ne the effective 
population size (Falconer 1981). Because the expected change is zero, 
drift is measured by this variance, which increases as the effective 
population size decreases. The effective population 
size is the size of an idealized population in which all 
individuals have an equal probability of contributing gametes to the 
next generation.
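To make these relations concrete, the following Python sketch computes the drift variance q(1-q)/2Ne and the expected fraction of heterozygosity retained after t generations, using the standard result Ht = H0(1 - 1/2Ne)^t. It also runs a small Wright-Fisher simulation. The function names are illustrative assumptions. Each regeneration cycle is treated as one generation, and selection, gene flow and mutation are ignored.

```python
import random


def drift_variance(q, ne):
    """Variance of the one-generation change in allele frequency in a
    diploid population: q(1 - q) / (2 Ne) (Falconer 1981)."""
    return q * (1.0 - q) / (2.0 * ne)


def heterozygosity_retained(ne, generations):
    """Expected fraction of the initial heterozygosity remaining after
    t generations of drift alone: (1 - 1/(2 Ne)) ** t."""
    return (1.0 - 1.0 / (2.0 * ne)) ** generations


def simulate_drift(q0, ne, generations, seed=1):
    """Wright-Fisher sketch: each generation samples 2 * Ne gene copies
    at random from the current allele frequency."""
    rng = random.Random(seed)
    n_copies = round(2 * ne)
    q, trajectory = q0, [q0]
    for _ in range(generations):
        count = sum(1 for _ in range(n_copies) if rng.random() < q)
        q = count / n_copies
        trajectory.append(q)
    return trajectory


for ne in (10, 50):
    print(f"Ne={ne}: var(dq) at q=0.5 = {drift_variance(0.5, ne):.4f}, "
          f"H retained after 10 cycles = {heterozygosity_retained(ne, 10):.2f}")
print(simulate_drift(0.5, 10, 10))
```

With Ne = 10, about 40% of the initial heterozygosity is expected to be lost after 10 regeneration cycles. With Ne = 50, the expected loss is under 10%.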
 Disparity between actual and effective population size occurs 
when only a fraction of the population participates in reproduction; 
for example, when not all plants are flowering. Effective population 
sizes are generally smaller than actual population sizes. Factors 
that lower Ne include unequal numbers of females and males, 
overlapping generations, non-random mating, differential fertility 
and fluctuations in population size (Falconer 1981; Barrett and 
Kohn 1991). In the regeneration of accessions the question is not 
so much whether genetic drift occurs, but rather to what extent. 
Potential measures to control genetic drift include adjustment 
of the size of the regenerating population or the development of 
improved regeneration methods (Engels and Visser 2003). Genetic 
drift affects all loci simultaneously and, in expectation, to the same extent. The 
neutrality of markers is an advantage in studies on genetic drift 
because of the random nature of the process. Co-dominant markers 
are to be preferred in assessments of genetic drift because they allow 
accurate estimation of allele frequencies.
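The effect of several of the factors listed above on Ne can be computed with standard population-genetics formulas (Falconer 1981). The Python sketch below applies them to hypothetical regeneration scenarios. The function names are assumptions. The sex-ratio formula applies only to dioecious species. In hermaphroditic crops, non-flowering plants should simply be left out of the census count.

```python
def ne_unequal_sexes(n_males, n_females):
    """Effective size with unequal numbers of breeding males and females
    (dioecious species): Ne = 4 Nm Nf / (Nm + Nf)."""
    return 4.0 * n_males * n_females / (n_males + n_females)


def ne_fluctuating(sizes):
    """Effective size over generations of varying census size: the
    harmonic mean, which is dominated by the smallest generations."""
    return len(sizes) / sum(1.0 / n for n in sizes)


def ne_family_size_variance(n, vk):
    """Effective size of a population of constant size N in which the
    number of gametes contributed per parent has variance Vk:
    Ne = (4N - 2) / (Vk + 2)."""
    return (4.0 * n - 2.0) / (vk + 2.0)


# Hypothetical regeneration scenarios.
print(ne_unequal_sexes(10, 40))         # 50 plants with a skewed sex ratio
print(ne_fluctuating([100, 10, 100]))   # one bottleneck cycle of 10 plants
print(ne_family_size_variance(50, 0))   # perfectly equal seed contributions
print(ne_family_size_variance(50, 10))  # highly unequal seed set
```

A single bottleneck cycle of 10 plants between cycles of 100 reduces Ne to about 25. Equalizing the contribution of each parent plant (Vk = 0) roughly doubles Ne relative to the census size.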
 During regeneration, the genetic composition of populations may 
also change due to selection. Selection towards particular genotypes 
may occur when variation in flowering time occurs and seed harvesting 
is performed only once. Accessions may become adapted to the 
circumstances under which they are regenerated and deviate from allele 
frequencies adapted to their natural habitats. The difference between 
selection and genetic drift is that selection is not random and does not 
affect all loci simultaneously. Therefore, selection during regeneration 
can be inferred from strong shifts in marker frequencies for certain loci 
between parental and offspring populations. Obviously, the absence of 
such observations cannot be taken as evidence that selection has not 
occurred because selection may act on a single gene or on a few genes 
that are not targeted by the markers.
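The reasoning above can be sketched as a simple outlier screen: drift should shift all loci by a similar scaled amount, so loci whose shift far exceeds that of the other loci are candidates for selection. The Python sketch below is an assumed heuristic, not a formal test. It ignores sampling error, and the factor of 5 is arbitrary. The function names and allele frequencies are hypothetical. A real analysis would use a statistical test that accounts for sample sizes.

```python
from statistics import median


def scaled_shift(q_parent, q_offspring):
    """Squared change in allele frequency between parental and offspring
    samples, scaled by q(1 - q) at the mean frequency so that loci at
    different frequencies are comparable."""
    q_bar = (q_parent + q_offspring) / 2.0
    if q_bar <= 0.0 or q_bar >= 1.0:
        return 0.0  # allele absent or fixed in both samples: no information
    return (q_offspring - q_parent) ** 2 / (q_bar * (1.0 - q_bar))


def flag_candidate_selection(freqs, factor=5.0):
    """freqs maps locus -> (parental q, offspring q). Loci whose scaled
    shift exceeds `factor` times the median across loci are flagged."""
    shifts = {locus: scaled_shift(*pair) for locus, pair in freqs.items()}
    typical = median(shifts.values())
    return sorted(locus for locus, s in shifts.items() if s > factor * typical)


# Hypothetical allele frequencies before and after one regeneration cycle.
freqs = {
    "L1": (0.50, 0.55),
    "L2": (0.30, 0.27),
    "L3": (0.60, 0.64),
    "L4": (0.40, 0.41),
    "L5": (0.50, 0.90),
}
print(flag_candidate_selection(freqs))
```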
 Multiple accessions of a crop are usually regenerated 
simultaneously. Unless accessions are regenerated in complete 
isolation, gene flow between accessions may occur. Gene flow 
may occur by means of pollen or through seed contamination after 
harvesting. The probability of pollen flow between populations 
strongly depends on the mating system of the species. To prevent 
gene flow between outcrossing accessions, regeneration may be 
carried out in isolation, e.g. in isolation chambers or by using different 
crops to separate populations in the field. Molecular markers may be 
used for monitoring contamination of accessions or for evaluating 
the effectiveness of measures to prevent contamination. Gene flow 
between populations can readily be studied with neutral, preferably 
co-dominant, markers that are diagnostic for different populations. 
The probability of detecting gene flow between populations very 
much depends on the extent of genetic differentiation between the 
populations.
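Detecting contamination with diagnostic markers reduces to looking for alleles in the offspring that could only have come from a neighbouring accession. The Python sketch below illustrates this. The function names and SSR data are hypothetical. The estimate is a lower bound, because contaminating gametes that carry only shared alleles go undetected. Alleles present at low frequency in the accession but missed when it was scored would be misread as contamination.

```python
def diagnostic_alleles(accession_alleles, neighbour_alleles):
    """Alleles present in the neighbouring accession but absent from the
    accession being regenerated; only these can reveal contamination."""
    return neighbour_alleles - accession_alleles


def contamination_rate(offspring, accession_alleles, neighbour_alleles):
    """Fraction of offspring plants carrying at least one diagnostic allele.
    Each plant is a set of (locus, allele) pairs."""
    if not offspring:
        return 0.0
    diagnostic = diagnostic_alleles(accession_alleles, neighbour_alleles)
    hits = sum(1 for plant in offspring if plant & diagnostic)
    return hits / len(offspring)


# Hypothetical SSR data for an accession and the accession grown next to it.
accession = {("SSR1", "150"), ("SSR1", "154"), ("SSR2", "200")}
neighbour = {("SSR1", "150"), ("SSR1", "162"), ("SSR2", "204")}
offspring = [
    {("SSR1", "150"), ("SSR2", "200")},
    {("SSR1", "150"), ("SSR1", "154"), ("SSR2", "200")},
    {("SSR1", "154"), ("SSR2", "200"), ("SSR2", "204")},
    {("SSR1", "150"), ("SSR2", "200")},
]
print(sorted(diagnostic_alleles(accession, neighbour)))
print(contamination_rate(offspring, accession, neighbour))
```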
 The objective of genebanks is to maintain the genetic integrity of 
their accessions as much as possible. Some loss of genetic integrity in 
variable accessions is unavoidable, and the best genebanks can do 
is to minimize it. Molecular markers are useful tools 
to evaluate and optimize the efficiency of the regeneration protocols. 
Reedy et al. (1995) used isozymes to investigate allele frequencies in 
maize accessions following several cycles of regeneration. Significant 
changes in allele frequencies over generations were observed that were 
attributed to genetic drift because of the absence of any linear trend. 
Del Rio et al. (1997a) used RAPDs to measure the loss of diversity in 
genebank samples of wild potatoes after one to four cycles of seed 
increase. The majority of the populations showed no significant 
loss or only very little loss, and the authors suggested that the seed increase 
method they used (using 20 individuals and pollinating with 
bulked pollen) was an appropriate strategy. Del Rio et al. 
(1997b) used RAPDs to measure genetic differences between genebank 
samples subjected to one cycle of seed increase and new collections made at 
the original collecting sites in the wild. These collections showed 
significant differences that could be due to genetic drift, gene flow 
from adjacent populations, or differences in sampling in the wild. 
Parzies et al. (2000) used morphological and isozyme markers to 
analyze diversity within barley landraces stored in genebanks for 10, 
40 and 72 years. Data were compared with recent collections of the 
same landrace. Severe declines in genetic diversity of the landraces 
were observed with length of time in storage, as well as a strong 
increase of the level of genetic differentiation among accessions over 
time. Thus, not only was genetic variation lost from accessions, but 
populations also differentiated in genetic composition over time. Assuming 
that the accessions were regenerated about once every five years, the authors 
estimated from the genetic data that the effective population size over 
the period of seed storage was only 4.7. By means of allozyme analysis, 
Wagner and Allard (1991) presented evidence of long-distance pollen 
migration in predominantly selfing barley, affecting the genetic 
integrity of pedigree stocks and experimental populations. Börner et 
al. (2000) investigated wheat accessions regenerated 24 times under 
ex situ conditions using microsatellite markers. No pollen or seed 
contamination was detected for any of the accessions, whereas genetic 
drift was observed in one case. Other studies using genetic markers 
to address questions about the genetic integrity of accessions include 
Spagnoletti-Zeuli et al. (1995), Penteado et al. (1996), Schittenhelm et 
al. (1997) and Steiner et al. (1997). In summary, changes in the genetic 
constitution of accessions as a result of regeneration can readily be 
studied with molecular, preferably co-dominant, markers. Resulting 
data are useful to genebank curators in evaluating and improving their 
regeneration protocols in order to minimise loss of genetic integrity 
as much as possible. 
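Estimates such as the effective population size of 4.7 reported above relate a loss of diversity to a number of regeneration cycles. One standard approach inverts the expected-heterozygosity relation Ht = H0(1 - 1/2Ne)^t, as in the hedged Python sketch below. It is not known whether Parzies et al. (2000) used exactly this calculation. The function name and the gene diversity values in the example are hypothetical. The sketch assumes that the decline is due to drift alone, without selection or gene flow.

```python
def ne_from_heterozygosity(h0, ht, generations):
    """Invert Ht = H0 * (1 - 1/(2 Ne)) ** t to estimate Ne from the
    decline in gene diversity over t generations (drift alone)."""
    if not 0.0 < ht < h0:
        raise ValueError("expected 0 < Ht < H0")
    per_generation = (ht / h0) ** (1.0 / generations)
    return 1.0 / (2.0 * (1.0 - per_generation))


# Hypothetical example: 72 years in storage, regenerated every 5 years.
cycles = 72 // 5  # 14 regeneration cycles
print(cycles, ne_from_heterozygosity(h0=0.40, ht=0.20, generations=cycles))
```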
Utilization of genetic resources
Germplasm base broadening
The replacement of traditional agricultural systems by modern 
industrial methods and the introduction of modern high-yielding 
varieties are considered to have decreased crop diversity relative 
to landraces or wild species progenitors. Although some studies 
suggest that this reduction in genetic diversity may be less severe 
than previously assumed (Petersen et al. 1994; Struss and Plieske 
1998; Backes et al. 2003; Khlestkina et al. 2004), broadening of the 
genetic base of crops may be essential in the development of new 
varieties. The striking success of the “Green Revolution” was based 
largely on introgression of genes for reduced plant height and 
increased resistance to diseases in wheat and rice. This success led to 
a widespread endeavour to assemble genetic resources in genebank 
collections for the present and future use of mankind. Much of the progress 
in plant breeding was made possible by developments such as 
advances in cell and tissue culture, embryo rescue, identification of 
somaclonal variation, protoplast fusion, doubled haploids and genetic 
transformation (Koornneef and Stam 2001). A new era started in the 
1970s with the development of genetic markers, initially biochemical 
and then molecular, and the possibilities they provide to assess genetic 
variation. DNA markers simplified the construction of genetic linkage 
maps, the identification of single-gene traits and quantitative trait loci, 
and the application of molecular breeding strategies through 
“marker-assisted” introgression and selection. 
 Markers have been used to infer the origins of crop species and 
identify ancestors that could be sources of interesting traits. Maize 
and its wild relatives, the teosintes, were analyzed at 21 allozyme 
loci; the results identified teosinte as the progenitor of maize and offered new 
perspectives for the improvement of maize with introgressed teosinte 
germplasm (Doebley et al. 1984). RFLPs have been used to confirm gene 
introgression in somatic hybrids between potato and one of its 
non-tuber-bearing wild relatives (Watanabe et al. 1995). RAPDs were successfully 
used for a genetic diversity study of faba bean germplasm aiming at the 
identification of heterotic groups for broadening the genetic base of the 
crop (Link et al. 1995). Allozymes and RFLPs were used to investigate 
the usefulness of hybrids that recreate rapeseed from its parental species as a source 
for broadening the genetic variation in Brassica napus L. (Becker et al. 
1995). A PCR-based DNA marker able to detect a wide-compatibility 
allele in rice was found to be useful in facilitating the exploitation of 
crosses between different subspecies of Oryza sativa L. 
(Williams et al. 1997).
 Introgression of desirable traits from exotic germplasm in traditional 
breeding approaches requires several backcross generations. 
Beckmann and Soller (1986) calculated the frequency of a favourable 
allele after several backcross generations with and without the help of 
a linked marker. They found that the use of a marker could improve 
efficiency of introgression ten-fold. RFLPs were used to determine 
whether unadapted landrace material persisted during selection for 
early flowering following introgression into maize (Koester et al. 
1993). A codominant RAPD marker linked to a resistance gene for 
the bean golden mosaic virus was identified and proved useful to 
expedite the introgression of partial resistance into susceptible 
germplasm of common bean (Urrea et al. 1996).
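Two standard quantities help explain why tightly linked markers speed up introgression. The Python sketch below computes (i) the probability that a marker at recombination fraction r still co-segregates with the target allele after n generations of selection on the marker, and (ii) the expected recovery of the recurrent-parent genome through backcrossing alone. The sketch assumes a single marker, no crossover interference and no background selection. It does not reproduce Beckmann and Soller's (1986) calculation, and the function names are illustrative.

```python
def linkage_retained(r, generations):
    """Probability that no crossover separates marker and target allele in
    any of n successive generations of selection on the marker: (1 - r) ** n."""
    return (1.0 - r) ** generations


def recurrent_parent_genome(backcross):
    """Expected proportion of recurrent-parent genome after BC_n without
    background selection (F1 = 0.5, BC1 = 0.75, ...): 1 - 0.5 ** (n + 1)."""
    return 1.0 - 0.5 ** (backcross + 1)


for r in (0.01, 0.05, 0.20):
    print(f"r={r:.2f}: P(marker still tracks gene after 6 BCs) = "
          f"{linkage_retained(r, 6):.2f}")
for n in range(1, 7):
    print(f"BC{n}: expected recurrent-parent genome = "
          f"{recurrent_parent_genome(n):.3f}")
```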
 Both a CAPS and a microsatellite marker were developed for 
marker-assisted introgression of a locus encoding resistance to 
different strains of the Barley Yellow Mosaic virus (Graner et al. 
1999). The availability of an RFLP genetic linkage map made possible 
the evaluation of several quantitative trait loci in an interspecific cross 
between tomato and its wild relative Solanum pennellii Correll, and 
led to the understanding of the basis of transgressive segregation 
in a wide cross. The same study revealed the power of molecular 
markers to identify useful QTLs in wild species for traits in which 
they were considered phenotypically inferior in comparison with 
the cultivated species (de Vicente and Tanksley 1993). This research 
emphasized the importance of the genetic potential of germplasm, rather 
than its phenotype, in crop improvement. This approach was then successfully 
pursued using a range of wild tomato species, with improved 
methodologies, and similar approaches were applied in other crops (Eshed and 
Zamir 1995; Eshed et al. 1996; Tanksley and Nelson 1996; Xiao et al. 
1996; Bernacchi et al. 1998). These studies demonstrated that DNA markers 
can identify accessions that possess useful 
alleles (Tanksley and McCouch 1997). Genotyping the accessions 
in a genebank collection may therefore improve the choice of sources of 
specific alleles and greatly increase the likelihood of uncovering novel 
and useful alleles.
 A strategy was proposed to develop exotic libraries that are 
collections of elite crop lines containing defined genomic regions 
from wild species, to provide pre-breeding material for modern 
varieties (Stuber et al. 1999; Zamir 2001). The procedure involves 
the construction of a set of near-isogenic lines (NILs) that together cover the 
donor genome, in a process monitored with molecular markers in which 
defined segments of the recipient genome are replaced by the corresponding 
segments of the donor genome. These collections could be very 
useful for gene discovery and characterization, in particular for 
quantitative trait loci, and may represent an excellent strategy to 
exploit germplasm of genetic resources.
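Whether a library of such lines actually covers the donor genome can be checked from the marker-defined donor segments of each line. The Python sketch below merges overlapping segments on each chromosome, then reports the fraction of the genome covered and the remaining gaps. Line names, chromosome lengths and segment positions (in cM) are hypothetical, and the function names are assumptions. In practice, segment ends are only known to lie somewhere between flanking markers.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into non-overlapping ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def uncovered_gaps(merged, length):
    """Stretches of a chromosome of the given length not covered by any segment."""
    gaps, cursor = [], 0.0
    for start, end in merged:
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < length:
        gaps.append((cursor, length))
    return gaps


def donor_coverage(library, chromosome_lengths):
    """library: {line_id: [(chromosome, start_cM, end_cM), ...]}.
    Returns the fraction of the total map length covered by donor segments."""
    by_chrom = {}
    for segments in library.values():
        for chrom, start, end in segments:
            by_chrom.setdefault(chrom, []).append((start, end))
    covered = sum(end - start
                  for intervals in by_chrom.values()
                  for start, end in merge_intervals(intervals))
    return covered / sum(chromosome_lengths.values())


# Hypothetical library of three introgression lines on two chromosomes.
chromosome_lengths = {"1": 100.0, "2": 80.0}
library = {
    "IL-1": [("1", 0.0, 40.0)],
    "IL-2": [("1", 30.0, 70.0)],
    "IL-3": [("2", 0.0, 50.0)],
}
print(f"coverage: {donor_coverage(library, chromosome_lengths):.2f}")
for chrom, length in chromosome_lengths.items():
    segments = [(s, e) for line in library.values() for c, s, e in line if c == chrom]
    print(chrom, uncovered_gaps(merge_intervals(segments), length))
```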
 Molecular breeding has many challenges (Young 1999). Many 
markers may not be sufficiently close to the genes of interest to aid 
effective introgression; depending on the question being addressed, 
marker coverage of the genome might be insufficient; poor parental 
sources may be chosen; and when pursuing the location of QTLs, 
poor phenotyping and insufficiently replicated trials may 
hamper the results. Tanksley and Nelson (1996) identified two 
reasons for the reduced success in the development of new varieties 
with important QTLs: 1) QTL mapping and variety development 
are usually two separate activities; and 2) most QTL studies have 
focused on elite materials. Stuber et al. (1999) indicated that the usefulness 
of QTLs will depend very much on the genetic background in which 
they were found, and on their proper screening.
 The development and applications of modern techniques will 
play a significant role in uncovering and using genetic variation. 
Markers have been identified and successfully used to introgress 
simple and complex traits. More research is needed to understand 
the genetic basis underlying complex variation. Ultimately, plant 
scientists will only be able to use genetic variation if it is properly 
maintained and accessible. With sufficient knowledge of the 
experimentation needed, genebanks may play a key role in variety 
development through pre-breeding.
Identification of relevant diversity for specific traits
The role of markers for identifying and transferring useful genes 
from germplasm to crops has already been mentioned. However, 
genebanks are often not linked to breeding programmes, or they 
may not have the capacity to conduct pre-breeding. Here we describe 
the use of molecular markers linked to agronomically important 
traits to screen germplasm for these traits. The value of these 
types of markers will depend on how well they uncover sufficient 
polymorphism in different genetic backgrounds. Rick and Fobes 
(1974) found a specific acid phosphatase allele (Aps1) in nematode-resistant 
stocks after testing a wide array of tomato cultivars for 
isozyme patterns. The particular allozyme was missing from the other 
tested tomato germplasm, and tests of segregating populations 
showed that only +/+ individuals, which lack Aps1, were susceptible to nematodes. This 
meant that an exceptional codominant marker had been detected, 
which could be efficiently used to screen for nematode resistance 
in germplasm without the need to expose materials to the parasite. 
However, given that the source of the resistance was the wild 
species Solanum peruvianum L., the usefulness of this marker would 
be mostly limited to breeding material that used S. peruvianum in 
its pedigree. A similar case is that of a shikimate dehydrogenase 
allozyme, which was found to be an effective marker in screening 
rice germplasm for high seed protein content (Shenoy et al. 1990). 
Tang and Zhang (1992) found a peroxidase enzyme marker linked 
to dwarfness that could select for this trait when trees were only 
two months old. More recently, a preliminary RAPD screening 
was used to identify markers linked to Fusarium wilt resistance in 
chickpea. Allele-specific associated primers were then developed 
to detect susceptibility to race 1 of Fusarium wilt (Mayer et al. 1997).
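Before a linked marker is used to screen or structure a collection, its agreement with the trait can be checked on a panel of accessions scored for both. The Python sketch below tabulates marker predictions against observed phenotypes. The function name and the 20-accession panel are hypothetical. The linkage between marker and gene may not hold outside the pedigrees in which it was found, as with the S. peruvianum-derived resistance above. Such checks are therefore most informative when the panel spans the genetic backgrounds to be screened.

```python
def screening_summary(records):
    """records: (marker_predicts_resistant, observed_resistant) pairs for a
    panel of accessions scored for both the marker and the phenotype."""
    tp = sum(1 for m, o in records if m and o)
    fp = sum(1 for m, o in records if m and not o)
    fn = sum(1 for m, o in records if not m and o)
    tn = sum(1 for m, o in records if not m and not o)
    nan = float("nan")
    return {
        "concordance": (tp + tn) / len(records),
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "false_positive_rate": fp / (fp + tn) if fp + tn else nan,
    }


# Hypothetical panel of 20 accessions screened with a linked marker.
panel = ([(True, True)] * 8 + [(True, False)] * 1
         + [(False, True)] * 1 + [(False, False)] * 10)
print(screening_summary(panel))
```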
 These linked markers may be useful to structure a collection 
according to a certain genotypic composition. Another alternative 
would be to characterize germplasm for diversity at specific gene 
loci. In both cases, genebanks would offer potential clients, whether 
plant breeders or the wider plant science community, value-added 
germplasm with well-characterized traits.
 Manipulation of the phenotype is improved by identification 
of the gene controlling the character and also knowledge of its 
gene action and interactions. The study of variants in genes will 
eventually lead to the deciphering of gene function and might 
facilitate the management of genes in approaches tailored to solve 
specific agricultural challenges. Modern technologies and research 
currently focus on diversity at the DNA sequence level. 
Genome sequencing data will assist in the search for putative 
genes of agronomic importance. Because cost will prevent the genomes 
of all species from being sequenced, comparative mapping and 
comparative genomics provide other options. Comparative mapping 
makes it possible to map the presence of a putative gene of interest 
in a related species. Markers derived from the known sequence of 
a gene in a related species may be used to identify the gene in the 
lesser-known species through the use of comparative genomics (see 
also “Future challenges”).
 Such genomic approaches have already been applied. Primers were 
constructed from conserved regions of the leghemoglobin gene, a 
potentially important gene for enhanced nitrogen fixation. These 
primers had complete homology with the leghemoglobin-encoding 
genes of common bean, two leghemoglobin genes of soyabean and 
90% homology with a third gene of soyabean. Grouping of eleven 
species of Phaseolus L. was possible based on the product of the PCR 
amplification, while intraspecies variation could only be observed for 
P. acutifolius A. Gray and P. angustissimus A. Gray (Skroch et al. 1993). 
A number of plant disease resistance genes have been isolated, and many 
show common structural features in nucleotide binding sites and 
leucine-rich repeats. A study of molecular diversity was conducted 
with lettuce germplasm, both wild and cultivated, using markers 
derived from NBS-LRR-type resistance genes (Sicard et al. 1999). One 
microsatellite marker and two SCAR markers from the LRR-encoding 
regions were developed and were shown to be highly correlated with 
resistance phenotypes in L. sativa L. In addition, several haplotypes 
indicated the presence of numerous resistance genes in wild species. 
There are many copies of this type of resistance gene homologue 
in plant genomes, and they can be used as candidate genes for QTLs for 
resistance to plant diseases (Pflieger et al. 1999; Geffroy et al. 2000). 
Based on the sequence similarities of several plant resistance genes 
available, Chen et al. (1998) used denaturing polyacrylamide-gel 
electrophoresis to detect PCR products of genomic DNA amplified 
with primers designed from conserved regions of those genes. Because 
segregation analysis of the PCR products in breeding populations 
showed linkages with known resistance genes, the authors concluded 
that the markers developed could be used to assess genetic diversity 
based on candidate genes for resistance.
 Several approaches aim at discovering genetic 
variation within coding sequences in order to identify the allelic differences 
responsible for a phenotype. They include allele mining, comparison of EST 
sequence data, identification of single-nucleotide polymorphisms (SNPs), 
correlation of SNP haplotypes with phenotypes and the use of conserved 
ortholog gene primers in different species (see also “Future 
challenges”). However, recent investigations show that changes in the 
phenotype of a character are not necessarily due to changes in the 
coding sequence of a gene or in the structure of the protein it encodes, but may 
instead result from changes in the regulatory elements that affect the expression 
of the gene (Frary et al. 2000). Pursuing research in many of these areas will be beyond 
the role of most genebanks, but this may change as methodologies 
become more routine and cost effective.
Streamlining procedures and goals among cooperating genebanks
Experimental procedures to determine marker variation may vary 
between different laboratories and comparison of data can be 
problematic. However, international cooperation between genebanks 
is indispensable in achieving maximum efficiency in the management 
of genetic resources. Such cooperation includes the exchange of 
methodologies and technologies to research, document, manage and 
utilize genetic resources. Since the diversity of many crops is typically 
distributed among many genebanks, the ability to compare different 
collections is important for investigating the extent of diversity of a crop 
and of the overlap existing between genebanks. Due to environmental 
effects, comparing accessions by morphological analysis at different 
locations has obvious limitations. In contrast, molecular markers are 
not influenced by the environment and are particularly useful for 
these investigations. However, even though the majority of molecular 
markers are quite robust, several methodological issues need to be 
carefully considered in order to obtain results that can be compared 
between different laboratories.
 In a small-scale study involving nine European laboratories, Jones et al. 
(1997) compared the reproducibility of RAPDs, AFLPs and SSRs using small 
numbers of poplar, sugar beet and tomato samples. Molecular 
analyses were carried out on DNA samples extracted by a single 
laboratory, and optimized protocols were used by all participating 
laboratories. None of the laboratories was able to reproduce the RAPD 
profile exactly, but AFLP and SSR profiles were highly reproducible. 
 Bredemeijer et al. (2002) examined 521 tomato varieties using 20 
microsatellites in a study carried out by a consortium of 5 laboratories using 
different detection methods. Each laboratory used the same set of 
reference alleles, predefined in a previously standardized experiment 
with 22 tomato varieties. No discrepancies between laboratories were 
observed for 361 varieties (69%), whereas 136 varieties (26%) showed 
discrepancies at 1 or 2 loci, and 24 varieties (5%) showed discrepancies 
at more than 2 loci. The majority of the discrepancies could be attributed 
to heterogeneity of the seed sample and only few discrepancies were 
ascribed to methodological differences, e.g. the use of different selection 
thresholds for allelic peaks. It was concluded that microsatellites could 
be used successfully to construct databases containing molecular data 
generated by different laboratories, provided that attention is paid 
to methodological considerations, including the careful selection of 
markers, the duplication of analyses in at least two laboratories, and 
knowledge of possible heterogeneity of seed samples.
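The three-way classification used by Bredemeijer et al. (2002) is straightforward to reproduce once both laboratories' allele calls are in a common format. The Python sketch below assumes that genotypes are stored as sorted tuples of allele sizes already mapped to the shared reference alleles. The data structure, variety names and allele sizes are hypothetical. As the study itself found, a discrepancy flags a variety for rechecking. It does not necessarily mean a laboratory error, since heterogeneous seed samples produce discrepancies too.

```python
def classify_discrepancies(lab_a, lab_b):
    """Count varieties with 0, 1-2 or more than 2 discrepant loci between two
    laboratories. Inputs: {variety: {locus: genotype}}, where a genotype is a
    sorted tuple of allele sizes (a convention assumed here)."""
    classes = {"0": 0, "1-2": 0, ">2": 0}
    for variety in sorted(lab_a.keys() & lab_b.keys()):
        calls_a, calls_b = lab_a[variety], lab_b[variety]
        shared_loci = calls_a.keys() & calls_b.keys()
        n_diff = sum(1 for locus in shared_loci if calls_a[locus] != calls_b[locus])
        if n_diff == 0:
            classes["0"] += 1
        elif n_diff <= 2:
            classes["1-2"] += 1
        else:
            classes[">2"] += 1
    return classes


# Hypothetical allele calls for three varieties at three SSR loci.
lab_a = {
    "V1": {"S1": (150, 152), "S2": (200, 200), "S3": (98, 98)},
    "V2": {"S1": (150, 150), "S2": (204, 204), "S3": (98, 102)},
    "V3": {"S1": (152, 152), "S2": (200, 204), "S3": (102, 102)},
}
lab_b = {
    "V1": {"S1": (150, 152), "S2": (200, 200), "S3": (98, 98)},
    "V2": {"S1": (150, 154), "S2": (204, 204), "S3": (98, 102)},
    "V3": {"S1": (150, 150), "S2": (200, 200), "S3": (98, 98)},
}
counts = classify_discrepancies(lab_a, lab_b)
total = sum(counts.values())
for label, n in counts.items():
    print(f"{label} discrepant loci: {n} varieties ({100 * n / total:.0f}%)")
```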
 In a similar study, Röder et al. (2002) characterized 502 European 
wheat varieties with 19 microsatellites. Data were collected in duplicate 
in at least two laboratories using different experimental procedures. Out 
of the 11 080 data points generated, 34 discrepancies remained unresolved, 
corresponding to an accuracy of more than 99.5%. The use of reference alleles 
was indispensable in achieving correct allele identification. Although 
studies on the reproducibility of marker data in network situations 
are still scarce, microsatellites appear to be one of the most repeatable 
marker technologies in collaborative projects.
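Shared reference alleles are important because they allow a simple calibration step. In the hedged Python sketch below, a laboratory estimates a constant sizing offset from the reference alleles it has run, and then bins raw fragment sizes to the nearest reference allele. The single constant offset, the 1 bp tolerance, the function names and all sizes are assumptions for illustration. Real size shifts between detection systems need not be constant across the size range.

```python
def lab_offset(observed_ref, nominal_ref):
    """Mean difference between the agreed (nominal) sizes of the reference
    alleles and the sizes one laboratory observes for them."""
    shared = observed_ref.keys() & nominal_ref.keys()
    if not shared:
        raise ValueError("no reference alleles in common")
    return sum(nominal_ref[a] - observed_ref[a] for a in shared) / len(shared)


def call_allele(raw_size, offset, nominal_ref, tolerance=1.0):
    """Correct a raw fragment size and assign it to the nearest reference
    allele. Returns None if no reference allele lies within the tolerance
    (bp), flagging a possible new allele or a sizing problem."""
    corrected = raw_size + offset
    name = min(nominal_ref, key=lambda a: abs(nominal_ref[a] - corrected))
    return name if abs(nominal_ref[name] - corrected) <= tolerance else None


# Hypothetical reference alleles for one microsatellite locus.
nominal = {"A": 150.0, "B": 154.0, "C": 162.0}
observed_in_lab = {"A": 148.6, "B": 152.7, "C": 160.5}

offset = lab_offset(observed_in_lab, nominal)
for raw in (148.8, 152.5, 157.0):
    print(raw, call_allele(raw, offset, nominal))
```

As a check on the figures quoted above, 34 unresolved discrepancies among 11 080 data points correspond to an accuracy of about 99.7%.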
Crop Breeding
Parental contributions of artificial hybrids
DNA markers are very useful for confirming hybridity of artificial 
sexual hybrids or somatic fusion hybrids. Molecular markers are 
especially useful when hybridity is questioned on morphological 
grounds or for early screening of large putative hybrid populations. 
Thieme et al. (1997) used isozyme and RAPD data to screen hundreds 
of first generation somatic fusion hybrids and later sexual backcross 
generations between cultivated and wild species of potato, to distinguish 
hybrid from non-hybrid progeny. Parani et al. (1997) also used isozymes 
and RAPDs to confirm hybridity of F2 generations of Sesamum L. where 
the identity of hybrids was questioned on morphological grounds. Lee 
et al. (1998) used RAPDs to disprove hybridity in putative Saccharum 
L. and Erianthus Michx. hybrids. Durham and Korban (1994) used 
RAPDs to identify apple clones with introgressed genes from a wild 
apple species that was used to transfer apple scab resistance through 
many cycles of introgressive hybridization. Oberwalder et al. (1997) 
used RFLP and SSR probes to investigate the genome contribution 
to asymmetric somatic fusion hybrids. These differ from symmetric 
somatic fusion hybrids by the elimination of part of the genome of 
one of the parents, through irradiation or chemical treatments, before 
fusion, in order to try to eliminate undesirable traits. They found that 
RFLPs were better able to distinguish symmetric from asymmetric 
fusion hybrids. Yamada et al. (1997) used cpDNA and mtDNA to 
investigate symmetric somatic fusion hybrids of cultivated potato 
and one of its wild relatives. The results suggested that chloroplast 
genomes of the respective parents segregated randomly while the 
mitochondrial genomes favoured the cultivated species parent. Provan 
et al. (1996) used SSRs to show how somatic fusion hybrids could be 
rapidly screened at the callus level to quickly and unambiguously 
distinguish fused from non-fused products.
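The logic of confirming hybridity with codominant markers can be written down explicitly. The Python sketch below assumes diploid sexual progeny and symmetric somatic fusion products. The function names, genotype format and SSR data are hypothetical. Polyploids would need different rules. Asymmetric fusion hybrids, which lose part of one parental genome, are expected to fail the additive check at some loci.

```python
def check_sexual_hybrid(progeny, mother, father):
    """Classify a diploid progeny plant scored with codominant markers.
    Genotypes are {locus: (allele1, allele2)}. Returns 'not a hybrid' if any
    locus cannot be explained by one allele from each parent (selfing,
    contamination or a scoring error), 'hybrid' if at least one allele is
    specific to the father, and 'not confirmed' if the loci are uninformative."""
    paternal_evidence = False
    for locus, (a, b) in progeny.items():
        m, f = set(mother[locus]), set(father[locus])
        if not ((a in m and b in f) or (b in m and a in f)):
            return "not a hybrid"
        if {a, b} & (f - m):
            paternal_evidence = True
    return "hybrid" if paternal_evidence else "not confirmed"


def check_somatic_hybrid(hybrid_alleles, parent1, parent2):
    """A symmetric somatic fusion product is expected to show the additive
    profile: every allele of both parents at every locus."""
    return all(
        set(parent1[locus]) | set(parent2[locus])
        <= set(hybrid_alleles.get(locus, ()))
        for locus in parent1
    )


# Hypothetical SSR genotypes of two parents.
mother = {"SSR1": (150, 152), "SSR2": (200, 204)}
father = {"SSR1": (160, 160), "SSR2": (200, 210)}
print(check_sexual_hybrid({"SSR1": (150, 160), "SSR2": (204, 200)}, mother, father))
print(check_sexual_hybrid({"SSR1": (150, 152), "SSR2": (200, 204)}, mother, father))
print(check_somatic_hybrid({"SSR1": {150, 152, 160}, "SSR2": {200, 204, 210}},
                           mother, father))
```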
 Chromosome substitution lines are genetic stocks in which a particular 
chromosome pair has been replaced by the homologous pair from another 
variety or species, and are used in genetic studies of the inheritance 
of quantitative traits. Korzun et al. (1997) used mapped SSRs to 
authenticate different substitution lines. The high polymorphism 
present in SSRs, in contrast to isozymes and RFLPs, made them 
ideal markers for this purpose. Crouch et al. (1998) used SSRs to 
investigate the genetic constitution of tetraploid hybrids in banana. 
Earlier breeding schemes had not taken into account the occurrence of diploid 
gametes, which would allow a broader range of breeding strategies, 
and SSRs were used to infer the ploidy level of the gametes that gave rise 
to these hybrids.
Gene flow between crops and weeds
Gene flow among crops and weeds has long been considered a 
common occurrence in nature, and is a continuing source of useful 
new variation in landraces (van Raamsdonk and van der Maesen 
1996). Artificial gene exchange also forms the basis of modern 
plant breeding which is replete with wide crosses, sometimes 
even between genera (Harlan and DeWet 1971). In 1996, the first 
commercial transgenic crops were introduced after review and 
approval from governmental regulatory agencies in the US and 
around the world. Since then, gene flow has become a much more 
topical issue. The US National Research Council (2002) concluded 
that transgenic plants are not “different in kind” from crops bred 
by other means, but noted that certain transgenes may be novel 
and therefore a special source of concern. Molecular markers are key 
tools to investigate gene flow, with implications for a wide array of 
economic, environmental, food safety and social concerns. 
 Environmental concerns relate to possible increased weediness 
and the survival of rare populations. For example, Ellstrand et al. 
(1999) conclude that gene flow from traditionally bred crops to 
weedy relatives has been implicated in the increased weediness 
in wild relatives of 7 of the 13 most important crops worldwide. 
They cite one extreme example of a weedy rye derived from natural 
hybridization between cultivated rye and a wild rye species. In 
California the weedy hybrid is so dominant that farmers have 
abandoned efforts to grow rye there for human consumption 
(National Academy of Sciences 1989). Another concern is the 
possible extinction, or reduction in fitness, of wild local populations 
resulting from transgenic gene flow (Ellstrand and Elam 1993; Levin 
et al. 1996; Rhymer and Simberloff 1996). Crops often contain genes 
that would, in theory, reduce the fitness of individuals in wild 
populations, and many crops do not survive long in the wild. 
Extinction of related wild populations may result from outbreeding 
depression, that is, a reduction in fitness following hybridization 
between individuals from genetically divergent populations 
(Templeton 1986; Waser 1993). Another 
concern is “swamping” of locally rare species with transgenes 
through repeated bouts of introgression (Ellstrand and Elam 1993). 
For example, hybridization was documented between cultivated 
rice and Taiwanese wild rice, resulting in a progressive loss of 
wild rice traits in native populations in Taiwan (Kiang et al. 1979). 
In less than 80 years, gene flow from cultivated rice, coupled with 
environmental factors, drove the populations of Taiwanese wild 
rice to near extinction.
 Studies measuring pollen movement out of a source planting into 
larger surrounding fields indicate that most pollen travels only short 
distances, typically 5–20 m (e.g. Conner and Dale 1996; Llewellyn and 
Fitt 1996). Pollen sink studies, however, show that such measurements 
greatly underestimate gene transfer, with pollen detected at distances 
of 1000 m or more (e.g. Arias and Rieseberg 1994; St. Amand et al. 2000; 
Rieger et al. 2002). Moreover, these pollen flow studies are short-term, 
whereas gene flow can also occur through transgenic seed persisting in 
the soil seed bank of a field and through seed transported from field 
to market. Individual studies and reviews increasingly accept that 
genes are frequently dispersed from crop to crop and from crop to 
weed, provided that compatible wild or cultivated recipients are 
present within some reasonable distance, sometimes more than 1 km 
(Dale 1992; Hancock et al. 1996; Snow and Palma 1997; Linder et al. 
1998; Ellstrand et al. 1999; Ellstrand 2001; St. Amand 2004). Rieseberg 
and Burke (2001) argue that selection coefficients are more important 
than gene flow in determining whether a gene is maintained in nature, 
and suggest that advantageous transgenes will likely be maintained.
 The following are examples of gene flow studies facilitated by 
molecular markers. Rabinowitz et al. (1990) tested hypotheses of 
gene flow between wild and cultivated potato. Using population-specific 
isozyme markers, they documented high levels of natural gene 
flow in experimental field plots in the Andes, and used these data 
to suggest that gene flow is also extensive among other cultivated 
and wild species. Skogsmyr (1994) reported high 
frequencies of transgenic pollen dispersal up to 1000 m from a field 
trial of transgenic potatoes. Conner and Dale (1996) pointed out 
procedural problems in the design of the experiment and questioned 
whether the gene flow was an artefact; the question remains unresolved. 
Nevertheless, potatoes are pollinated by buzz-pollinating insects, and 
insect-mediated gene flow of this kind is also possible in other 
insect-pollinated species.
 Beet is a self-incompatible, wind-pollinated species and is therefore 
obligately outcrossing. Weed beets pose a serious problem for 
sugar beet cultivation and have traditionally been controlled only 
by manual removal. Transgenic herbicide-resistant beets have 
been considered as a way to control this problem. Desplanque 
et al. (1999) showed, through the use of RFLP and microsatellite 
markers, that weed beets in northern France were intermediates 
between cultivated sugar beets and wild beets in southwestern 
France. They attributed the origin of weed beet infesting cultivated 
sugar beet fields to accidental and recurrent hybridization between 
cultivated beet and wild beets during the production of commercial 
seed in southwestern France. Desplanque et al. (2002) showed 
that herbicide-resistant sugar beets could transfer their herbicide 
resistance to co-occurring weed beet populations, rapidly reducing 
the effectiveness of the transgene. Using cytoplasmically male-sterile 
receptor plants and PCR screening with probes specific to a transgene, 
Saeglitz et al. (2000) also showed wide pollen dispersal in sugar beet 
(more than 200 m), even when “containment” border plants (in this 
case hemp) were used. Bartsch and Ellstrand (1999) investigated the 
origin of, and gene flow among, cultivated and weed beets from 
California. Allozyme analysis revealed germplasm conforming to 
cultivated beet, sea beet (different subspecies of Beta vulgaris L.) 
and naturalized B. macrocarpa Guss., as well as evidence of gene 
flow among all types.
 In contrast to beet, cultivated barley is a self-pollinating species 
and would be expected to have less transgene escape. Ritala et 
al. (2002) used PCR screening with probes specific to a transgene to 
measure pollen-mediated transgene dispersal by cross-fertilization 
in barley at distances of 1, 2, 3, 6, 12, 25, 50 and 
100 m from the donor plots. The number of seeds obtained from 
male-sterile heads diminished rapidly with distance and only a few 
seeds were found at distances of 50 and 100 m. Molecular genetic 
analysis revealed that all seeds obtained from male-sterile heads 
at a distance of 1 m were transgenic, as anticipated. However, 
only 3% of the distant seeds (50 m) actually carried the transgene, 
while the remainder resulted from fertilization with non-transgenic 
background pollen. This background pollen was mainly due to 
pollen leakage in some male-sterile heads. In normal male-fertile 
barley, the cross-fertilization frequency with transgenic pollen varied 
from 0 to 7% at a distance of 1 m, depending on weather conditions 
on the heading day. They concluded that because of competing self-
produced and non-transgenic background pollen, the possibility of 
cross-pollination is very low between a transgenic barley field and 
an adjacent field cultivated with normal barley. Adequate isolation 
distances and best management practices are needed for cultivation 
of transgenic barley.
 The above studies document introgression in early generations. 
Brubaker and Wendel (1994) document longer-term persistence of 
cultivar genes in wild populations of cotton. Linder et al. (1998), 
using RAPDs, documented the long persistence (perhaps up to 40 
years) of markers specific to cultivated sunflower in adjacent wild 
populations of this species.
 Oilseed rape (both Brassica rapa L. and B. napus L.) is pollinated 
by wind and insects. Jørgensen and Andersen (1994) documented 
hybridization, using isozymes, RAPDs, morphological markers 
and chromosome counts, between cultivated oilseed rape 
(B. napus) and adjacent populations of B. campestris L. (syn. 
B. rapa). The former is an amphidiploid (2n = 38) and the latter 
is one of its diploid (2n = 20) progenitors. Hybrids were commonly 
produced in both directions between crop and weed. Backcrossing 
of the hybrids to weedy B. campestris was documented with an 
isozyme marker, demonstrating a possible route for transgene escape. 
 Timmons et al. (1995) detected Brassica napus pollen distributed 
2.5 km from its source. Natural hybrids between B. rapa and B. napus 
are documented in the British Isles (Harberd 1975). Field studies 
show these species to readily hybridize (Jørgensen et al. 1996). 
Extensive hybridization of these two species, in a field where they 
co-occurred for 11 years, was supported by AFLP data (Hansen et 
al. 2001). Rieger et al. (2002) screened 48 million individual canola 
(B. napus) plants, a scale made possible by a transgenic canola line 
resistant to acetolactate synthase (ALS)-inhibiting herbicides. Seeds 
were collected from 63 conventional canola fields growing near 
herbicide-resistant fields in New South Wales, Victoria and South 
Australia. At crop maturity, 10 stratified samples totalling at least 
100 000 seeds were taken from each of three locations in each field 
of conventional canola. The three sampling locations ran parallel to 
the source field: at the edge nearest to the source field, in the middle, 
and at the edge furthest from the source field. Collected seed samples 
were screened with a lethal discriminating dose of an ALS-inhibiting 
herbicide. The results show that, in most cases, gene flow via pollen 
movement occurs between canola fields. However, even adjacent 
commercial canola fields in Australia will have much less than 1% 
gene flow. Resistance was detected up to, but not beyond, 3 km from 
the source.
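Gene flow frequencies from seed-screening studies such as this one are proportions of surviving (resistant) seedlings among the seeds screened. The sketch below is a minimal illustration, not the statistical procedure of Rieger et al. (2002): it computes that proportion and, for a sample in which no resistant seedling is found, the exact one-sided binomial upper confidence limit 1 - (1 - c)^(1/n), which for 95% confidence is close to the familiar "rule of three" (3/n). The function names and the counts in the usage lines are hypothetical.

```python
def outcrossing_frequency(resistant: int, screened: int) -> float:
    """Proportion of screened seeds that survived the discriminating herbicide dose."""
    if screened <= 0 or not 0 <= resistant <= screened:
        raise ValueError("need screened > 0 and 0 <= resistant <= screened")
    return resistant / screened


def upper_limit_if_none_found(screened: int, confidence: float = 0.95) -> float:
    """Exact one-sided binomial upper limit on the frequency when 0 of `screened`
    seeds are resistant: solves (1 - p) ** screened = 1 - confidence for p."""
    return 1.0 - (1.0 - confidence) ** (1.0 / screened)


# Hypothetical counts for one 100 000-seed sample at each position in a field.
samples = {"near edge": 52, "middle": 9, "far edge": 0}
for position, resistant in samples.items():
    freq = outcrossing_frequency(resistant, 100_000)
    print(f"{position}: {100 * freq:.3f}% resistant")
print(f"95% upper limit if none found: {100 * upper_limit_if_none_found(100_000):.4f}%")
```

With 100 000 seeds per sample, even a sample with no resistant seedling only shows that the true frequency is probably below about 0.003%, which is why very large screens are needed to quantify low levels of gene flow.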
 Despite theoretical concerns about transgene release, in the ten 
years during which transgenics have been used in thousands of 
products, there has not yet been a confirmed case of a person suffering 
any ill effects from consuming food derived from genetically 
modified plants. Also, the only long-term experiment comparing 
the performance of four transgenic crops (oilseed rape, potato, 
maize and sugar beet) found no evidence that the transgenics are 
more invasive or persistent than their non-transgenic conventional 
counterparts (Crawley et al. 2001). This experiment involved field 
comparisons over 10 years in 12 different habitats. The transgenic 
traits were tolerance to the herbicide glufosinate in oilseed rape and 
maize, tolerance to the herbicide glyphosate (‘Roundup’) in sugar 
beet and, in potato, expression of either the insecticidal Bt toxin 
or a pea lectin. It is important to note that these experiments 
involved transgenic traits (resistance to herbicides or insects) that 
were not expected to increase fitness in natural habitats, or were 
expected to increase it only slightly, depending on the level of 
biological control. This is in contrast to other traits that would 
have a greater fitness impact, such as cold, drought or metal 
tolerance, improved nutrient uptake, or altered development. These 
results cannot be extrapolated to all transgenic plants in every 
environment, but they are noteworthy for the length of the experiment 
and the diversity of crops and habitats. A similar conclusion was 
reached by Salisbury (2000) for oilseed rape.
Autotetraploid vs. allotetraploid inheritance
Autopolyploidy refers to the multiplication of the chromosome set 
of a single species, and allopolyploidy to the multiplication of 
chromosome sets from different species. Some authors also use the 
terms for different genetic backgrounds within a species. 
Knowledge of genome constitution and inheritance patterns is 
essential in designing crosses for genetic improvement. Genetic 
data from codominant markers can clearly distinguish between 
allo- and autopolyploidy on the basis of the progeny classes observed 
in crosses between genetically characterized parents. A diploid 
organism with two different alleles at a locus (A and a) produces 
one heterozygote class (Aa). Selfing of this individual will produce 
a progeny array of 1AA:2Aa:1aa. However, with tetrasomic inheritance 
in an autotetraploid, three different classes of heterozygotes can be 
produced: AAaa, Aaaa and AAAa. Selfing of an individual having the 
AAaa genotype will result in a progeny array of 
1AAAA:8AAAa:18AAaa:8Aaaa:1aaaa. In contrast, preferential pairing 
of genomes in an allotetraploid having AA on one chromosome 
pair and aa on the other chromosome pair would produce only 
AAaa progeny, resulting in “fixed” heterozygosity that would not 
be expected in an autotetraploid (Warnke et al. 1998).
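The 1:8:18:8:1 and "fixed heterozygote" expectations can be reproduced by enumerating gametes. The sketch below assumes random chromosome segregation without double reduction for the autotetraploid (each of the six possible pairs of the four homologues is equally likely to enter a gamete) and strict preferential pairing for the allotetraploid. The function names are illustrative, not taken from any published package.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product
from math import lcm

# Genotype labels indexed by dosage (number of A alleles).
LABELS = {4: "AAAA", 3: "AAAa", 2: "AAaa", 1: "Aaaa", 0: "aaaa"}


def tetrasomic_gametes(genotype: str) -> dict[int, Fraction]:
    """Autotetraploid: any 2 of the 4 homologues enter a gamete with equal
    probability (random chromosome segregation, no double reduction)."""
    pairs = list(combinations(genotype, 2))  # 6 equally likely pairs
    counts = Counter(pair.count("A") for pair in pairs)
    return {dose: Fraction(n, len(pairs)) for dose, n in counts.items()}


def disomic_gametes(pair_1: str, pair_2: str) -> dict[int, Fraction]:
    """Allotetraploid with preferential pairing: each gamete receives one
    chromosome from each of the two homologous pairs."""
    gametes = [a + b for a, b in product(pair_1, pair_2)]
    counts = Counter(g.count("A") for g in gametes)
    return {dose: Fraction(n, len(gametes)) for dose, n in counts.items()}


def selfed_progeny(gametes: dict[int, Fraction]) -> dict[int, Fraction]:
    """Combine two gametes drawn independently from the same parent."""
    progeny: Counter = Counter()
    for (d1, f1), (d2, f2) in product(gametes.items(), repeat=2):
        progeny[d1 + d2] += f1 * f2
    return dict(progeny)


def as_ratio(progeny: dict[int, Fraction]) -> dict[str, int]:
    """Express class frequencies as the smallest whole-number ratio."""
    scale = lcm(*(f.denominator for f in progeny.values()))
    return {LABELS[d]: int(f * scale) for d, f in sorted(progeny.items(), reverse=True)}


print(as_ratio(selfed_progeny(tetrasomic_gametes("AAaa"))))
# {'AAAA': 1, 'AAAa': 8, 'AAaa': 18, 'Aaaa': 8, 'aaaa': 1}
print(as_ratio(selfed_progeny(disomic_gametes("AA", "aa"))))
# {'AAaa': 1}
```

The key step is the gamete distribution: an AAaa autotetraploid produces AA, Aa and aa gametes in a 1:4:1 ratio, whereas the AA/aa allotetraploid produces only Aa gametes, so every selfed offspring is again AAaa.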
 Creeping bentgrass (Agrostis palustris Huds.) is a major cultivated 
turfgrass species. It is a tetraploid (2n = 4x = 28) and is poorly 
characterized genetically. Warnke et al. (1998) screened 650 clones 
of A. palustris with 12 enzyme systems to assess polymorphism and 
selected clones with putative duplex allele states (i.e. AAaa) that 
provided the clearest differentiation between disomic and tetrasomic 
inheritance in self and testcross progenies. Segregation at the four 
loci whose patterns were clear enough to interpret provided strong 
evidence that A. palustris is an allotetraploid.
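One conventional way to judge whether observed segregation fits a tetrasomic expectation is a chi-square goodness-of-fit test; the text does not say whether Warnke et al. (1998) used exactly this procedure, so the sketch is illustrative only. For a duplex (AAaa) parent testcrossed to aaaa, tetrasomic inheritance (random chromosome segregation, no double reduction) predicts AAaa:Aaaa:aaaa progeny in a 1:4:1 ratio, whereas an allotetraploid with AA on one pair and aa on the other gives only Aaaa progeny. The progeny counts are hypothetical; 5.991 is the 5% critical value of chi-square with two degrees of freedom.

```python
# Expected testcross ratio (AAaa : Aaaa : aaaa) under tetrasomic inheritance
# for a duplex AAaa parent crossed to aaaa (no double reduction assumed).
TETRASOMIC_TESTCROSS = (1, 4, 1)
CHI2_CRITICAL_DF2_5PCT = 5.991  # chi-square, 2 degrees of freedom, alpha = 0.05


def chi_square(observed, ratio):
    """Pearson goodness-of-fit statistic of observed counts against a ratio."""
    n = sum(observed)
    total = sum(ratio)
    expected = [n * r / total for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_tetrasomic(observed):
    """True if the counts do not deviate significantly from 1:4:1."""
    return chi_square(observed, TETRASOMIC_TESTCROSS) < CHI2_CRITICAL_DF2_5PCT


# Hypothetical progeny counts (AAaa, Aaaa, aaaa) from two testcrosses.
print(fits_tetrasomic([11, 38, 11]))  # True: close to 1:4:1
print(fits_tetrasomic([0, 60, 0]))    # False: all Aaaa, as for disomic AA/aa
```

With small progeny classes (expected counts below about five) the chi-square approximation becomes unreliable, and an exact test may be preferable.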
Molecular diversity and heterosis
Heterosis, or hybrid vigour, originally referred to the selective 
superiority of heterozygotes regarding continuously variable 
characters of size, yield and vigour. The concept was expanded to 
include adaptive, selective, or reproductive advantage (Dobzhansky 
1950). The biological basis of heterosis remains unknown, but 
Tsaftaris and Shull (1995) summarized studies of RFLP genetic 
mapping in maize to suggest that a few major quantitative trait loci 
scattered throughout the genome explain some of the attributes of 
heterosis. Heterosis is important to breeders because it is commonly 
assumed that crossing diverse rather than similar parents 
enhances breeding success. Much of the genetic characterization 
of crop germplasm has the practical goal of discovering genetically 
diverse lines for breeding (Mumm and Dudley 1994; Abo-elwafa et 
al. 1995; Maughan et al. 1995; Dubreuil et al. 1996; Yang et al. 1996; 
Menkir et al. 1997). Smith et al. (1990) showed a strong correlation 
between genetic distance and heterosis in maize. In contrast, 
Melchinger et al. (1994) reported that genetic distances of inbred 
lines of maize, as assayed via RFLP data, were not correlated with 
heterosis for yield. Paz and Veilleux (1997) reported that in crosses 
between a diploid cultivated potato species and diploid clones of 
complex interspecific potato hybrids, the greatest total tuber yield 
was associated with diverse parents as measured by RAPDs. 
Manjarrez-Sandoval et al. (1997) compared RFLP similarity estimates 
of the parents with yield in soyabean, and also found a positive 
correlation. Diers et al. (1996) examined the correlation between 
RFLP-based genetic distance and heterosis in mustards, and related 
this to the general combining ability of the parents (the average 
performance of a parent in crosses with a range of other parents). 
They found that genetic distance was related to heterosis in only 
some of the crosses, and concluded that genetic distance alone does 
not identify heterotic combinations consistently. The data therefore 
suggest that genetic diversity as measured by neutral markers is 
important for heterosis, but the relationship apparently does not 
hold in all cases, and more research is needed to understand when 
this relationship will occur. The germplasm used in these studies may 
have a major effect on the correlation of genetic distance and yield. 
While Smith et al. (1990) used fairly elite (agronomically advanced) 
germplasm, other studies did not do so, and the effect of germplasm 
advancement needs to be investigated.
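The studies above relate a marker-based genetic distance between parents to the heterosis of their hybrid. As a minimal sketch, assuming presence/absence band data and the Dice (Nei and Li) similarity coefficient, which the cited studies may or may not have used, the code below computes parental distances, mid-parent heterosis and the Pearson correlation between them. All band profiles and yields are hypothetical.

```python
from math import sqrt


def dice_distance(p1, p2):
    """1 - Dice (Nei and Li) similarity for 0/1 band profiles: 1 - 2a / (2a + b + c)."""
    a = sum(1 for x, y in zip(p1, p2) if x and y)   # bands shared by both parents
    b_c = sum(1 for x, y in zip(p1, p2) if x != y)  # bands present in one parent only
    return 1 - 2 * a / (2 * a + b_c)


def mid_parent_heterosis(f1, parent_1, parent_2):
    """Percentage by which the hybrid exceeds the mean of its two parents."""
    mid_parent = (parent_1 + parent_2) / 2
    return 100 * (f1 - mid_parent) / mid_parent


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical crosses: (bands P1, bands P2, yield P1, yield P2, yield F1)
crosses = [
    ((1, 1, 0, 1, 0, 1), (1, 1, 0, 1, 0, 0), 8.0, 8.4, 9.0),
    ((1, 0, 1, 1, 0, 1), (0, 1, 1, 0, 1, 1), 7.5, 8.1, 10.2),
    ((1, 1, 1, 0, 0, 1), (0, 0, 1, 1, 1, 0), 7.9, 7.7, 10.9),
]
distances = [dice_distance(p1, p2) for p1, p2, *_ in crosses]
heterosis = [mid_parent_heterosis(f1, y1, y2) for _, _, y1, y2, f1 in crosses]
print(f"r = {pearson(distances, heterosis):.2f}")
```

A high r for a handful of invented crosses says nothing about real germplasm; as the studies above show, the strength of the relationship varies between crops and germplasm pools.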
Current developments
Most of the marker technologies outlined in “Overview of 
molecular technologies” sample random genomic regions, except 
those targeted to specific genes. Because the largest part of the 
genome consists of non-coding DNA, the diversity sampled by 
conventional markers will in general be selectively neutral. For 
some questions facing genebank managers the selective 
neutrality of a marker is not problematic, and for some purposes 
it is even preferable (e.g. in studying the effect of random 
processes, such as genetic drift, on genetic diversity; see “Crop 
breeding”). However, how well these markers represent overall 
diversity remains an open question. Plant breeders 
are especially interested in diversity in agronomically important 
characters that may be expected to be influenced by selection. The 
extent to which neutral markers are representative of functional 
diversity will strongly depend on the extent to which the underlying 
genes have been influenced by selection. This may vary from crop 
to crop and among wild species, landraces and cultivars. Molecular 
markers that are able to quantify broad-sense functional diversity 
may be very useful for the characterization and optimization of 
genebank collections.
 In addition, knowledge about variation in specific (groups of) 
genes will contribute to the utilization of genetic resources. During 
the last decade large amounts of sequence data have become 
available for various crops (such as rice and tomato), and in the case 
of Arabidopsis thaliana (L.) Heynh. the entire genome has been 
sequenced. Partial sequences of expressed genes, known as Expressed 
Sequence Tags (ESTs), can be obtained by isolating messenger RNA 
(mRNA), constructing complementary DNA (cDNA) libraries and 
sequencing the clones. Gene functions may then be assigned to 
ESTs, for example by means of “differential display techniques” or 
through matching with known sequences that have already been 
assigned a function. Massive amounts of sequence data, in some cases 
annotated with genome location and gene function, have been stored 
in publicly available databases. Access to these data enables 
the exploitation of this knowledge, e.g. for the development of 
functional diversity markers and the identification of putative genes 
and the variation therein for traits of interest.
 Because genetic resources collections may consist of large numbers 
of accessions, sample throughput is an important aspect in the 
application of molecular markers. Even the PCR-based techniques 
may still be time-consuming, particularly when polyacrylamide 
gel electrophoresis, radiolabelling and manual scoring of 
autoradiograms are involved. Therefore, at the detection level, 
developments in the field of molecular marker technologies focus 
on gel-free, non-radioactive and highly automated experimental 
procedures. 
 The aforementioned developments in molecular marker 
applications have great potential for genebanks and are therefore 
discussed in more detail in the following sections.
Developments in marker techniques
Single Nucleotide Polymorphism (SNP)
The fact that in many organisms most polymorphisms result from 
changes at a single nucleotide position (point mutations) has 
led to the development of techniques to study single nucleotide 
polymorphisms (SNPs). Analytical procedures require sequence 
information for the design of allele-specific PCR primers or 
oligonucleotide probes. SNPs and flanking sequences can be found 
by library construction and sequencing or through the screening of 
readily available sequence databases. Once SNPs have been located 
and appropriate primers designed, a major advantage of SNP markers 
is that they lend themselves to high-throughput automation. To 
achieve high sample throughput, multiplex PCR and hybridization 
to oligonucleotide microarrays or analysis on automated sequencers 
are often used to interrogate the presence of SNPs. Figure 7 outlines 
the analysis of SNPs using the SNaPshot method. SNP analysis may 
be useful for cultivar discrimination in crops where it is difficult to 
find polymorphisms, such as the cultivated tomato. SNPs may 
also be used to saturate linkage maps in order to locate relevant 
traits in the genome. For instance, in Arabidopsis thaliana a 
high-density linkage map of easy-to-score DNA markers was lacking 
until SNPs became available (Cho et al. 1999). To date, SNP markers 
are not routinely applied in genebanks, mainly because 
of the high costs involved.
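The logic of SNaPshot genotyping as described in Figure 7 can be mimicked in a few lines. The sketch below is an idealized simulation, not software for any real instrument: a primer ending immediately 5' of the SNP is extended by a single dye-labelled dideoxynucleotide identical to the SNP base on the primer's strand (complementary to the template), using the dye colours given in Figure 7. One colour indicates a homozygote and two colours a heterozygote, which is why SNPs are scored codominantly; loci multiplexed in one reaction are told apart by primer length. All sequences, locus names and helper names are hypothetical, and dye-dependent mobility shifts are ignored.

```python
# Dye colours for the four labelled dideoxynucleotides, as given in Figure 7.
DYE = {"A": "black", "C": "green", "G": "blue", "T": "red"}
BASE = {colour: base for base, colour in DYE.items()}


def design_primer(sequence: str, snp_index: int, length: int) -> str:
    """Primer matching `sequence` and ending immediately 5' of the SNP."""
    if snp_index < length:
        raise ValueError("not enough flanking sequence for a primer of this length")
    return sequence[snp_index - length:snp_index]


def extension_peaks(primer: str, genotype: tuple[str, str]) -> set[tuple[int, str]]:
    """Single-base extension: each allele adds one labelled ddNTP, giving a
    product of len(primer) + 1 nucleotides in that allele's dye colour."""
    size = len(primer) + 1
    return {(size, DYE[allele]) for allele in genotype}


def call_genotypes(peaks, primer_lengths: dict[str, int]) -> dict[str, str]:
    """Assign peaks to loci by product size and read alleles from the colours.
    An empty string means no signal was found for that locus."""
    calls = {}
    for locus, primer_len in primer_lengths.items():
        bases = sorted(BASE[c] for size, c in peaks if size == primer_len + 1)
        if len(bases) == 1:
            bases = bases * 2  # a single colour: homozygote
        calls[locus] = "".join(bases)
    return calls


# Two hypothetical loci multiplexed with primers of different lengths.
seq_1 = "GATTACAGGCTTAGCCATGACCTAGG"
seq_2 = "TTGACCGATCGGATCCATAGCTAGGCATTACGGA"
p1 = design_primer(seq_1, 20, 20)
p2 = design_primer(seq_2, 30, 30)
peaks = extension_peaks(p1, ("C", "T")) | extension_peaks(p2, ("G", "G"))
print(call_genotypes(peaks, {"locus_1": len(p1), "locus_2": len(p2)}))
# {'locus_1': 'CT', 'locus_2': 'GG'}
```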
Retrotransposon-based markers
LTR retrotransposons are bounded by long terminal repeats (LTRs) 
whose highly conserved termini are exploited for primer design in the 
development of retrotransposon-based markers. Retrotransposons 
have been found to comprise the most common class of transposable 
elements in eukaryotes, and to occur in high copy number in 
plant genomes. Several of these elements have been sequenced 
and were found to display a high degree of heterogeneity and 
insertional polymorphism, both within and between species. 
Because retrotransposon insertions are irreversible (Minghetti 
and Dugaiczyk 1993; Shimamura et al. 1997), they are considered 
particularly useful in phylogenetic studies. In addition, their 
widespread occurrence throughout the genome can be exploited in 
gene mapping studies, and they are frequently observed in regions 
adjacent to known plant genes. 
 Several variations of retrotransposon-based markers exist. 
Sequence-Specific Amplified Polymorphism (S-SAP) is a dominant, 
multiplex marker system for the detection of variation in DNA 
flanking retrotransposon insertion sites. Retrotransposon-containing 
fragments are amplified by PCR, using one primer designed from 
the conserved terminus of the LTR and one based on the presence 
of a nearby restriction endonuclease site. Experimental procedures 
resemble those used for AFLP analysis. Compared to AFLP, S-SAP 
generally yields fewer fragments but higher levels of polymorphism 
(Waugh et al. 1997).
Figure 7. Principles of SNP analysis using the SNaPshot method. (A) 
Sequence data flanking the SNP are used to design a primer that ends 
immediately adjacent to the SNP, so that extension starts at the SNP 
position. The use of dideoxynucleotides (ddATP, ddCTP, ddGTP and 
ddTTP) ensures that primer extension only occurs at the SNP position. 
Using differently labelled dideoxynucleotides (A = black, C = green, 
G = blue, T = red) during PCR, extension products are tested for 
fluorescent signals by electrophoresis to determine which 
dideoxynucleotides are incorporated. (B) The use of unique primer sizes 
for different SNP loci allows multiplexing up to 15-fold during PCR. 
Nucleotide variation at the SNP position is detected by variation in 
incorporation of dideoxynucleotides. Note that heterozygotes display 
two different peaks, so that SNPs are scored in a codominant manner. 
By courtesy of Gerard van der Linden (Plant Research International BV).
 Inter-retrotransposon Amplified Polymorphism (IRAP) and 
Retrotransposon-Microsatellite Amplified Polymorphism (REMAP) 
are dominant, multiplex marker systems that examine variation in 
retrotransposon insertion sites. With IRAP, fragments between two 
retrotransposons are isolated by PCR, using outward-facing primers 
annealing to LTR target sequences. In the case of REMAP, fragments 
between retrotransposons and microsatellites are amplified by PCR, 
using one primer based on an LTR target sequence and one based on 
a simple sequence repeat motif. IRAP as well as REMAP fragments 
can be separated by high-resolution agarose gel electrophoresis 
(Kalendar et al. 1999).
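How IRAP products arise from outward-facing LTR primers can be illustrated with a simple in-silico PCR. The sketch assumes exact primer matches, a single LTR-derived primer used on its own, and a hypothetical maximum amplifiable length; primer binding sites stand in for LTR termini, and a product is predicted wherever a site on one strand faces a site on the opposite strand (two elements in inverted orientation) within that distance. Bands are then scored as present or absent, which is why IRAP is dominant. The primer and genome sequences are invented.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def predicted_irap_products(genome: str, primer: str, max_len: int = 3000) -> list[int]:
    """Sizes of products from a single outward-facing LTR primer (exact matches only).

    Forward sites: the primer matches the top strand and extends rightwards.
    Reverse sites: its reverse complement matches the top strand, so the primer
    binds the bottom strand and extends leftwards. A product spans a forward
    site and a downstream reverse site.
    """
    fwd_starts = [m.start() for m in re.finditer(f"(?={re.escape(primer)})", genome)]
    rc = reverse_complement(primer)
    rev_ends = [m.start() + len(rc) for m in re.finditer(f"(?={re.escape(rc)})", genome)]
    return sorted(end - start for start in fwd_starts for end in rev_ends
                  if 2 * len(primer) <= end - start <= max_len)


def band_profiles(genomes: dict[str, str], primer: str, max_len: int = 3000):
    """Dominant scoring: 1 if a product of a given size is present, otherwise 0."""
    products = {name: set(predicted_irap_products(g, primer, max_len))
                for name, g in genomes.items()}
    sizes = sorted(set().union(*products.values()))
    return sizes, {name: [int(s in p) for s in sizes] for name, p in products.items()}


primer = "ACGTTG"  # hypothetical LTR-derived primer
spacer = "G" * 10  # hypothetical DNA between two inverted insertions
genomes = {
    "accession_1": "TTT" + primer + spacer + reverse_complement(primer) + "CCC",
    "accession_2": "TTT" + primer + spacer + "CCC",  # second insertion absent
}
print(band_profiles(genomes, primer))
# ([22], {'accession_1': [1], 'accession_2': [0]})
```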
 Retrotransposon-Based Insertional Polymorphism (RBIP) is a 
codominant marker system that uses PCR primers designed from 
the retrotransposon and its flanking DNA to examine insertional 
polymorphisms for individual retrotransposons. Presence or absence 
of insertion is investigated by two PCRs, the first using one 
primer from the retrotransposon and one from the flanking DNA, 
the second using primers designed from both flanking regions. 
Polymorphisms are detected by simple agarose gel electrophoresis 
or by dot hybridization assays. A drawback of the method is that 
sequence data for the flanking regions are required for primer design. 
A major advantage of RBIP is that it can easily be automated, using 
gel-free procedures such as TaqMan™ or DNA chip technology 
(both explained in “Future challenges”) in order to increase 
sample throughput (Flavell et al. 1998). RBIP markers have been 
used to characterize genetic diversity of international germplasm 
collections for pea, barley, tomato and pepper within the context 
of the European Union project “TEGERM” 
(http://www.biocenter.helsinki.fi/bi/tegerm/).
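The codominant scoring of RBIP follows directly from the two PCRs described above. The sketch below assumes that each reaction is read simply as product present or absent, and that the flank-to-flank reaction fails when the retrotransposon is present because the element makes the product too long to amplify under the conditions used. The function and genotype labels are hypothetical.

```python
def rbip_genotype(insert_flank_product: bool, flank_flank_product: bool) -> str:
    """Combine the two RBIP reactions into a codominant genotype call.

    insert_flank_product: PCR with one retrotransposon primer and one flanking primer
                          (amplifies only when the insertion is present).
    flank_flank_product:  PCR with primers in both flanking regions
                          (amplifies only the empty, pre-insertion site).
    """
    if insert_flank_product and flank_flank_product:
        return "heterozygous (filled/empty)"
    if insert_flank_product:
        return "homozygous filled (insertion present)"
    if flank_flank_product:
        return "homozygous empty (no insertion)"
    return "no call (both reactions failed)"


for reactions in [(True, False), (False, True), (True, True), (False, False)]:
    print(reactions, rbip_genotype(*reactions))
```

Because each allele (filled or empty site) gives its own product, heterozygotes are distinguished from both homozygotes, in contrast to the dominant S-SAP, IRAP and REMAP systems.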
Functional diversity markers
The availability of sequence data of expressed DNA has enabled the 
development of markers that are physically associated with coding 
regions of the genome. ESTs are the result of sequencing cDNA clones 
and the information generated is generally stored in databases. These 
sequences can then be used to design primers, either to generate 
polymorphic markers directly or to amplify fragments for CAPS 
markers. The raw sequence information can also be screened for 
microsatellites (EST-SSRs) or single nucleotide 
polymorphisms (EST-SNPs), after which markers can be developed that 
are targeted to transcribed regions of the genome. EST-SSRs have been 
applied successfully in the characterization of accessions of wheat (Eujayl 
et al. 2002) and barley (Thiel et al. 2003). EST-SNPs have been used to 
study functional diversity in maize (Rafalski 2002).
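Screening EST sequences for microsatellites is essentially a pattern search. A minimal sketch using Python regular expressions is given below. The minimum repeat numbers per motif length are assumptions chosen for illustration (SSR-mining tools use a variety of thresholds), only perfect repeats are detected, and motifs that are themselves repeats of a shorter unit (e.g. ACAC) are skipped so that each SSR is reported under its shortest motif.

```python
import re

# Assumed minimum numbers of repeat units per motif length (illustrative only).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_compound(motif: str) -> bool:
    """True if the motif is a repeat of a shorter unit, e.g. 'ACAC' or 'AA'."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(sequence: str) -> list[tuple[int, str, int]]:
    """Return (start, motif, repeat count) for perfect SSRs in one EST sequence."""
    seq = sequence.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # One unit of `unit` bases followed by at least min_rep - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_compound(motif):
                hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)


est = "GGT" + "AC" * 8 + "GTT" + "AAG" * 6 + "TC"  # hypothetical EST fragment
print(find_ssrs(est))
# [(3, 'AC', 8), (22, 'AAG', 6)]
```

Each hit would then be a candidate for EST-SSR primer design in the flanking sequence; whether the repeat is actually polymorphic must still be tested on germplasm.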
 Complementary DNA can also be used as template for subsequent 
direct marker generation, for example through AFLP technology (cDNA-
AFLP). cDNA-AFLP is commonly used to identify genetic 
polymorphisms between contrasting phenotypes under controlled 
conditions, in order to facilitate the construction of linkage maps (e.g. 
Brugmans et al. 2002) or to identify candidate genes. In diversity studies, 
the application of cDNA-AFLP should be limited to the identification 
and comparison of specific gene-related patterns, because differences 
in cDNA profiles may be caused by differences in the developmental 
stage of plants and in environmental conditions rather than by 
underlying DNA polymorphisms.
 A simple PCR-based marker technique that targets coding sequences 
in the genome is Sequence-Related Amplified Polymorphism (SRAP). 
SRAP uses forward primers consisting of a non-specific filler sequence of 
ten bases, the sequence CCGG and three selective nucleotides. Reverse 
primers also contain a filler sequence, but are followed by the sequence 
AATT and three selective nucleotides. The CCGG sequence is used to 
target GC-rich regions, such as exons in open reading frames, while 
the AATT sequence on the reverse primers is aimed at AT-rich regions, 
such as promoters and introns. The generally conserved nature of exon 
sequences, combined with the generally variable nature of introns, 
promoters and spacers, enables SRAP analysis to generate polymorphic 
bands. The use of selective nucleotides on the PCR primers results in the 
amplification of subsets of open reading frames that display multilocus 
band profiles following appropriate labelling of a primer, polyacrylamide 
gel electrophoresis and autoradiography. In Brassica oleracea L., sequence 
analysis revealed that 45% of the SRAP fragments could be matched 
with known genes. Furthermore, SRAP analysis was readily applied 
to other crops, such as potato, rice, lettuce and apple (Li and 
Quiros 2001). 
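The SRAP primer anatomy described above (filler, core and three selective nucleotides) can be captured in a small helper that assembles primers and enumerates the 64 possible selective ends, each of which restricts amplification to a different subset of CCGG- or AATT-containing sites. The filler sequence below is a hypothetical placeholder, not a published SRAP primer, and the reverse filler length, which the text does not specify, is left to the caller.

```python
from itertools import product

FORWARD_CORE = "CCGG"  # aimed at GC-rich regions such as exons
REVERSE_CORE = "AATT"  # aimed at AT-rich regions such as promoters and introns


def srap_primer(filler: str, core: str, selective: str) -> str:
    """Assemble filler + core + three selective nucleotides."""
    if len(selective) != 3 or set(selective) - set("ACGT"):
        raise ValueError("selective end must be three of A, C, G, T")
    return filler + core + selective


def selective_ends() -> list[str]:
    """All 4**3 = 64 possible three-base selective ends."""
    return ["".join(p) for p in product("ACGT", repeat=3)]


def count_forward_targets(sequence: str, selective: str) -> int:
    """Count occurrences of CCGG + selective end on one strand of a sequence."""
    target = FORWARD_CORE + selective
    return sum(1 for i in range(len(sequence) - len(target) + 1)
               if sequence.startswith(target, i))


forward_filler = "TGACGTCAGA"  # hypothetical 10-base filler
primer = srap_primer(forward_filler, FORWARD_CORE, "ATA")
print(primer, len(primer))       # a 17-base forward primer
print(len(selective_ends()))     # 64 possible selective ends
print(count_forward_targets("TTCCGGATAGGCCGGATAC", "ATA"))
```

Each additional selective base reduces the expected number of matching sites about fourfold, which is how the selective ends keep SRAP band profiles to a scorable number of fragments.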
 A related technique that uses EST sequence information is Target 
Region Amplification Polymorphism (TRAP). For TRAP analysis, 
a fixed primer designed from a targeted EST sequence is combined 
with an arbitrary primer having an AT- or GC-rich core sequence. In 
several plant species TRAP revealed multiple scorable fragments, 
and the technique may be well suited for genotyping germplasm 
and tagging genes for traits of interest (Hu and Vick 2003).
 EST-based, cDNA-based and SRAP markers have in common that they 
target diversity in coding regions. The gene function of the targeted DNA 
will usually be unknown. If gene sequence data are available, markers 
may be developed for particular (groups of) genes. This is the case, 
for example, in a recently developed strategy for analyzing groups of 
resistance genes: Resistance Gene Homologue Polymorphism (RGHP). 
With this methodology, groups of resistance genes are targeted by PCR 
using primers aimed at conserved domains of resistance genes, such as 
the Leucine Rich Repeat (LRR) or the Nucleotide Binding Site (NBS), 
both involved in resistance mechanisms (Chen et al. 1998; Sicard et al. 
1999). In NBS-directed profiling (NBS-DP), one primer is targeted to a 
conserved sequence of the NBS, while the other primer is based on the 
presence of a nearby endonuclease restriction site. The highly conserved 
nature of these targets allows the NBS-DP primers to be used beyond 
the species level. Because resistance genes have often originated from 
gene duplications, variation may also be traced in analogs of the gene. 
Polyacrylamide gel electrophoresis and autoradiography are part of 
the NBS-DP technique, resulting in AFLP-like banding profiles that are 
scored as dominant markers. Variable fragments isolated from the gels 
and sequenced can be subjected to Basic Local Alignment Search Tool 
(BLAST) analysis in sequence databases in order to search for matches 
with sequences from known genes. An appealing feature of NBS-DP 
is that it may also detect variation in previously unknown resistance 
genes (van der Linden et al. 2004).
 A few studies show marker data to be correlated with such functional 
traits. In lettuce, the diversity of wild and cultivated germplasm was 
evaluated by means of markers derived from resistance genes of the 
NBS-LRR type (Sicard et al. 1999). The three markers used were highly 
correlated with the resistance phenotypes in lettuce and proved useful 
in differentiating accessions. This suggests that markers based on 
functional genes may be useful for building core collections, but this 
needs much further investigation.
Developments in detection techniques
TaqMan™
TaqMan™ is a probe used to detect specific sequences in PCR 
products by employing the 5’→3’ exonuclease activity of the Taq 
DNA polymerase. The TaqMan™ probe (20–30 nucleotides), blocked from 
extension at the 3’ end, consists of a site-specific sequence labelled with 
a fluorescent reporter dye and a fluorescent quencher dye. During 
PCR the TaqMan™ probe hybridizes to its complementary single-stranded 
DNA sequence within the PCR target. When amplification 
occurs, the TaqMan™ probe is degraded by the 5’→3’ exonuclease 
activity of Taq DNA polymerase, thereby separating the quencher 
from the reporter during extension. Because the quencher no longer 
suppresses the reporter, the fluorescence intensity of the 
reporter dye increases. During the entire amplification process this 
light emission increases exponentially, the final level being measured 
by spectrophotometry after termination of the PCR. Because the 
fluorescence of the reporter dye increases only when the probe has 
hybridized and the target sequence has been amplified, the TaqMan™ 
assay offers a sensitive method to 
determine the presence or absence of specific sequences. Therefore, 
this technique is particularly useful in diagnostic applications, such 
as the screening of samples for the presence or incorporation of 
favourable traits, the detection of pathogens and diseases in plants 
and the screening of plant material for the presence of transgenic 
elements. The TaqMan™ assay allows high sample throughput 
because no gel electrophoresis is required for detection. When 
different probes are used which are able to discriminate between 
allelic variants, TaqMan™ behaves as a codominant marker. 
TaqMan™ is also referred to as fluorogenic 5’ nuclease assay 
(Holland et al. 1991; Lee et al. 1993).
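When two allele-specific TaqMan™ probes carrying different reporter dyes are used, each sample yields two end-point fluorescence values, and the genotype follows from their relative strength. The sketch below assumes background-corrected signals and uses hypothetical thresholds (a minimum total signal and a cut-off on the fraction of signal from the allele 1 probe); in practice, calling software usually clusters samples on a two-dye scatter plot rather than applying fixed cut-offs.

```python
def call_taqman_genotype(signal_1: float, signal_2: float,
                         min_total: float = 1.0, cutoff: float = 0.25) -> str:
    """Call a genotype from background-corrected end-point signals of two
    allele-specific probes (probe 1 detects allele 1, probe 2 allele 2).

    min_total and cutoff are hypothetical thresholds for illustration.
    """
    total = signal_1 + signal_2
    if total < min_total:
        return "no call"           # too little amplification to score
    fraction_1 = signal_1 / total  # share of signal from the allele 1 probe
    if fraction_1 >= 1 - cutoff:
        return "allele 1 homozygote"
    if fraction_1 <= cutoff:
        return "allele 2 homozygote"
    return "heterozygote"          # both probes cleaved: codominant call


# Hypothetical normalized signals for four samples.
for s1, s2 in [(3.0, 0.2), (1.5, 1.4), (0.1, 3.0), (0.2, 0.3)]:
    print(s1, s2, call_taqman_genotype(s1, s2))
```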
Automated sequencers 
DNA sequencing was initially carried out through radiolabeling, 
polyacrylamide gel electrophoresis and visual reading of gels (Figure 
1). Throughput of DNA samples has increased substantially through the 
use of automated sequencers and automated data processing. Prior 
to running the samples on automated sequencers, tissue preparation, 
fluorescent labelling and PCR need to be performed. Depending 
on the type of sequencer used, preparation of polyacrylamide gels 
and manual loading of samples onto the gels may still be required. 
More advanced models do not require the preparation of gels. 
For example, in the ABI Prism 3700 DNA analyzer, fluorescently
labelled samples are transferred by a robot from a microtitre plate 
to an electrophoresis chamber and run through a capillary system 
(Figure 8). A laser detection system measures the light emission of 
samples, whether they are electrophoresed individually or simultaneously.
The 4300 System by LI-COR is a third-generation instrument based
on highly sensitive infrared fluorescence detection technology. Sample
throughput is greatly enhanced compared with manual methods
and can range from 450 000 to 2.8 million bases per 24-hour day
(for example, in a MegaBACE 4000 instrument).
Apart from sequence analyses, automated sequencers are also being 
used for the analysis of microsatellites, AFLPs and SNPs. The high 
throughput capacity of these machines is achieved not only through
improved mechanical operation and detection systems, but also by
allowing multiplexing of PCR reactions. The electronic data collected
may then be processed with dedicated software packages (Figure 2).
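To illustrate why multiplexing raises throughput, the sketch below groups markers into panels that can be run together in one capillary. Two markers can share a run if they carry different fluorescent dyes, or if their expected fragment sizes do not overlap. The marker names, dye labels, size ranges and 10-bp safety gap are hypothetical. The greedy grouping is only one simple way to build panels. It is not a feature of any particular instrument's software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    name: str
    dye: str      # fluorescent label of the PCR product (hypothetical)
    min_bp: int   # smallest expected allele size in base pairs
    max_bp: int   # largest expected allele size in base pairs


def compatible(a, b, gap_bp=10):
    """True if two markers can be pooled without ambiguous peaks."""
    if a.dye != b.dye:
        return True  # different colours are separated by the detector
    # Same colour: size ranges must be separated by more than gap_bp.
    return a.max_bp + gap_bp < b.min_bp or b.max_bp + gap_bp < a.min_bp


def build_panels(markers, gap_bp=10):
    """Greedily assign each marker to the first panel it fits into."""
    panels = []
    for marker in sorted(markers, key=lambda m: m.min_bp):
        for panel in panels:
            if all(compatible(marker, other, gap_bp) for other in panel):
                panel.append(marker)
                break
        else:
            panels.append([marker])  # no existing panel fits: start a new one
    return panels


markers = [
    Marker("SSR1", "blue", 100, 150),
    Marker("SSR2", "blue", 140, 190),
    Marker("SSR3", "green", 120, 160),
    Marker("SSR4", "blue", 200, 250),
]
for i, panel in enumerate(build_panels(markers), start=1):
    print(f"panel {i}:", [m.name for m in panel])
```

In this invented example four markers need two runs instead of four.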
Microarray or DNA chip technology
Microarray or DNA chip technology is a high-throughput screening
technique based on hybridization between DNA probes (oligonucleotides,
genomic DNA or cDNA) and either DNA or mRNA. Chips
may consist of arrays of amplified DNA immobilized on miniature
glass or nylon substrates that are then tested by hybridization to a
series of fluorescently labelled probes. More commonly, however,
arrays of oligonucleotide probes are synthesized (e.g. Affymetrix
GeneChips) and then exposed to fluorescently labelled PCR
samples. Hybridization signals are detected by laser scanning,
from which data on sequence variation are obtained (Figure 9).
Microarrays may now comprise up to 250 000 features per square
centimetre, and technical advances are likely to increase the extent
of miniaturization further. Applications of DNA chip technology
include diagnostics, mutation and polymorphism detection, gene
discovery, gene expression and gene mapping (Lemieux et al. 1998;
Ramsay 1998; Gibson 2002).
Figure 8. (A) ABI Prism 3700 DNA analyzer with opened doors and hood. The bottom 
part consists of a storage facility for water, buffers, etc. The opened hood gives 
access to the work surface. (B) Close-up picture of the work surface with opened 
electrophoresis chamber. Microtitre plates are loaded at the left, and the capillary 
system is located at the right. The robotic arm positioned above the work surface 
transfers the samples from the plates to the capillary system. At the end of the 
capillaries an optic device measures the light emission of the samples. By courtesy 
of Applied Biosystems.
DNA chips, for example for the detection of SNPs between genotypes,
have not yet found wide application in genebanks, mainly because
of the laborious sequencing methods and high costs involved. However, a
recent development in the application 
of DNA chips to the analysis of genetic 
polymorphisms is the Diversity Array 
Technology (DArT) introduced by 
Jaccoud et al. (2001). First, a so-called
‘diversity panel’ is created, consisting of
arrays of DNA fragments originating 
from a group of diverse genotypes. 
To construct a diversity panel, total 
genomic DNA is isolated from a group 
of genotypes, after which the DNA 
is pooled and digested by restriction 
enzymes. Enzyme-specific adapters 
are then ligated to the fragments, 
followed by reduction of the genomic 
complexity through PCR using 
primers with selective nucleotides. 
Subsequently, the DNA fragments 
are molecularly cloned, the inserts 
amplified by PCR and the amplified 
fragments (amplicons) arrayed onto DNA chips. Thus, a diversity panel 
consists of a large subset of DNA fragments derived from total DNA 
of a group of genotypes, immobilized on a solid substrate. Second, 
organisms or groups of organisms belonging to the gene pool from 
which the diversity panel was derived can be genetically fingerprinted 
by testing for hybridization to the arrayed fragments. Two approaches 
are used. In the first approach, two genomic samples are converted
to so-called “representations” by isolation of DNA, followed by 
restriction of the DNA and reduction of the genomic complexity using 
the same procedures as in the development of the diversity panels. The 
amplicons of each representation are fluorescently labelled with a red or 
green dye, after which the representations are mixed and hybridized to 
the diversity panel. For each element of the array, the red/green signal 
ratio is determined. A significant deviation from a ratio of 1 indicates 
a difference in the presence of the fragment between the samples, and 
hence a genetic polymorphism. In the second approach, a single DNA 
sample is converted to a representation following the same procedures as 
in the first approach and labelled with a green fluorescent dye. In
addition, fragments of the cloning vector that are common to all
elements of the array are labelled with a red fluorescent dye, after
which the green and red fragments are hybridized to the diversity
panel. Signal intensity ratios are then determined at each element of
the array for each of the genotypes used to construct the diversity
panel. Comparison of signal ratios at array elements between genotypes
allows the identification of variation in the presence or absence of
DNA fragments, and hence of genetic polymorphisms. DArT does not
require any sequence data and is considered an economical,
high-throughput, robust marker technology with high genome coverage
that may be relevant to germplasm characterization in genebanks
(Jaccoud et al. 2001). Like AFLP data, DArT data consist of scores
for the presence or absence of fragments.
Figure 9. Example of the results from the differential
display technique using DNA chip technology. A
series of DNA probes are tested for hybridization
to DNA samples with different fluorescent labels.
Differences in gene expression between the samples
are derived from determination of the red:green
ratio. By courtesy of Asaph Aharoni (Plant Research
International BV).
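The two DArT scoring approaches can be sketched in Python as follows. The first function flags array elements whose red/green ratio in a two-sample hybridization deviates from 1. The second function handles a single genotype: it divides the green (sample) signal by the red vector-reference signal and converts the result into presence/absence scores. The fixed two-fold threshold and the 0.5 cutoff are hypothetical stand-ins for the statistical criteria a real DArT analysis would use. All signal values are invented.

```python
import math


def polymorphic_elements(red, green, min_abs_log2=1.0):
    """First approach: two samples, one labelled red and one green, are
    hybridized together. Return indices of array elements whose red/green
    ratio deviates at least two-fold from 1 (|log2 ratio| >= 1)."""
    return [
        i
        for i, (r, g) in enumerate(zip(red, green))
        if abs(math.log2(r / g)) >= min_abs_log2
    ]


def score_presence(sample_green, vector_red, cutoff=0.5):
    """Second approach: one sample (green) plus the common vector fragment (red).
    The vector signal serves as a per-element reference; a high normalized
    sample signal is scored as fragment present (1), otherwise absent (0)."""
    return [1 if g / r >= cutoff else 0 for g, r in zip(sample_green, vector_red)]


# Hypothetical intensities for a three-element array.
print(polymorphic_elements([1000, 1000, 200], [950, 250, 900]))

vector = [1000, 1000, 1000]
genotype_1 = score_presence([800, 50, 600], vector)
genotype_2 = score_presence([780, 700, 40], vector)
differences = [i for i, (x, y) in enumerate(zip(genotype_1, genotype_2)) if x != y]
print(genotype_1, genotype_2, differences)
```

The resulting 0/1 matrix (genotypes by array elements) can then be analyzed in the same way as AFLP band data.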
Pyrosequencing™
Pyrosequencing™ refers to sequencing by synthesis, a simple-to-use
technique for the accurate analysis of DNA sequences. It is a novel
method for sequencing relatively short DNA templates, based on
real-time (quantitative) detection of pyrophosphate release, as outlined
below. A sequencing primer hybridized to a single-stranded DNA
template is incubated with the enzymes DNA polymerase, ATP
sulfurylase, firefly luciferase and a nucleotide-degrading enzyme.
The deoxynucleotides dATP, dCTP, dGTP and dTTP are added
sequentially in an iterative manner. Whenever the added nucleotide is
complementary to the next base of the DNA template, DNA polymerase
incorporates it into the new DNA strand and pyrophosphate is released
in an amount equimolar to that of the incorporated deoxynucleotide.
ATP sulfurylase then converts the pyrophosphate into ATP, and the
ATP is detected by luciferase as a visible and measurable real-time
light signal. Between additions of deoxynucleotides, unincorporated
nucleotides and ATP are degraded by the nucleotide-degrading
enzyme. This enzyme degrades nucleotides more slowly than
DNA polymerase incorporates them, and degrades ATP more slowly
than ATP sulfurylase synthesizes it. Sequences of up to 20 bases, such
as those in which SNPs are found, can be accurately determined by
Pyrosequencing™. A high sample throughput can be achieved through
a high degree of automation, e.g. by the use of high-density microtitre
plates and micro-injector technology (Ronaghi et al. 1998).
Pyrosequencing™ can be successfully used for the detection of SNPs and
insertions/deletions, for genotype identification and characterization,
and in general for applications focusing on variation in short DNA
fragments. It can also be used for accurate quantification of the ratios
of variant bases in a DNA sample.
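The logic of the light signals can be illustrated with a small simulation. In the Python sketch below, the expected peak height for each nucleotide dispensation equals the number of bases incorporated at that step. A homopolymer such as GG therefore gives a double peak, and a non-complementary dispensation gives no peak. Several simplifications are assumed. The input is the sequence of the newly synthesized strand rather than the template. The dispensation order is chosen by hand. Signals are idealized, with no background or signal decay. The second function estimates the proportion of one allele in a mixed sample by least squares, which shows how ratios of variant bases could be quantified. All sequences are hypothetical.

```python
def pyrogram(synthesized, dispensation):
    """Idealized peak heights: nucleotides incorporated per dispensation.

    synthesized: sequence of the new strand downstream of the primer (5'->3')
    dispensation: order in which dATP/dCTP/dGTP/dTTP are added, e.g. "ACGT"
    """
    pos = 0
    peaks = []
    for nucleotide in dispensation:
        incorporated = 0
        # Polymerase extends through a whole homopolymer in one dispensation.
        while pos < len(synthesized) and synthesized[pos] == nucleotide:
            incorporated += 1
            pos += 1
        peaks.append(incorporated)  # light is proportional to pyrophosphate released
    return peaks


def allele_fraction(observed, peaks_1, peaks_2):
    """Least-squares estimate of p in observed = p*peaks_1 + (1-p)*peaks_2.
    Requires at least one dispensation where the two alleles differ."""
    diff = [a - b for a, b in zip(peaks_1, peaks_2)]
    resid = [o - b for o, b in zip(observed, peaks_2)]
    return sum(d * r for d, r in zip(diff, resid)) / sum(d * d for d in diff)


dispensation = "CTAGT"
allele_1 = pyrogram("CTAGT", dispensation)  # A allele at the SNP
allele_2 = pyrogram("CTGGT", dispensation)  # G allele at the SNP
print(allele_1, allele_2)

# Hypothetical pooled sample containing 30% of allele 1.
mixed = [0.3 * a + 0.7 * b for a, b in zip(allele_1, allele_2)]
print(round(allele_fraction(mixed, allele_1, allele_2), 3))
```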
Developments in functional genomics
Allele mining
Identification of, and access to, allelic variation that affects the plant
phenotype are of the utmost importance for the utilization of genetic
resources, for example in plant variety development. Considering the
huge numbers of accessions held collectively by genebanks, genetic
resources collections are believed to harbour a wealth of undisclosed
allelic variants. The challenge is how to unlock this variation. Allele
mining is a research field aimed at identifying allelic variation for
relevant traits within genetic resources collections. For identified genes
of known function and basic DNA sequence, genetic resources collections
may be screened for allelic variation by, for example, the “tiling strategy”
using DNA chip technology (e.g. Lemieux et al. 1998). In this approach
the basic DNA sequence of a gene is spotted on a chip as a large
series of overlapping probes of 15–20 bases. Each base position in
a fluorescently labelled sample is then interrogated for the presence
of point mutations by monitoring hybridization signals with the
spotted probes. Because the sequence of the samples is determined by
comparison with the basic sequence of a gene, this method is also known
as “re-sequencing”. With this method, new point mutations can be
detected in relatively large DNA fragments. Once allelic variants of
interest have been identified, the approach can be optimized by focusing
on target sets of polymorphisms, for example by using SNP detection
methods (see “Future challenges”). As an example, the tiling strategy
has been used by the International Rice Research Institute (IRRI) to
identify favourable alleles related to tolerance to biotic and abiotic
stress factors in rice.
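The tiling idea can be sketched as follows. For every interrogated position of the basic sequence, four probes are designed that are identical except for the central base, which is A, C, G or T. The probe that hybridizes best to the labelled sample indicates the base present. This Python sketch rests on several assumptions. Hybridization is crudely modelled: a perfect match gives a strong signal and any mismatch a weak one, whereas real arrays give graded signals. The probe length of 15 bases is taken from the lower end of the 15–20 bases mentioned above. The sequences are invented, and positions are counted from 0.

```python
BASES = "ACGT"


def tiling_probes(reference, k=15):
    """For each central position i, four k-mer probes differing only at the centre."""
    half = k // 2
    probes = {}
    for i in range(half, len(reference) - half):
        window = reference[i - half : i + half + 1]
        probes[i] = {b: window[:half] + b + window[half + 1 :] for b in BASES}
    return probes


def hybridization_signal(probe, target_window):
    """Hypothetical signal model: perfect match strong, anything else weak."""
    return 1.0 if probe == target_window else 0.1


def resequence(reference, sample, k=15):
    """Return {position: base} where the sample clearly differs from the reference."""
    half = k // 2
    mutations = {}
    for i, probe_set in tiling_probes(reference, k).items():
        target = sample[i - half : i + half + 1]
        signals = {b: hybridization_signal(p, target) for b, p in probe_set.items()}
        best = max(signals.values())
        winners = [b for b, s in signals.items() if s == best]
        # Probes whose flanks overlap a mutation give no clear winner; skip them.
        if len(winners) == 1 and winners[0] != reference[i]:
            mutations[i] = winners[0]
    return mutations


reference = "ATGCGTACGTTAGCCATGGCTAA"
sample = reference[:11] + "G" + reference[12:]  # hypothetical A->G point mutation
print(resequence(reference, sample))
```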
Association genetics
In allele mining studies as described in the previous section, allelic
variation is analyzed for identified genes whose function and basic
DNA sequence are known and whose map position in the genome
will generally have been determined. By contrast, in association
genetic studies no prior information about the genes of interest
is available; associations between genetic markers and the traits
considered are simply derived from observational research
(Figure 10). Association genetics focuses on the identification of
correlations between phenotypic traits and genetic markers, with
the aim of identifying and locating the underlying genes in the genome
(association mapping). Association genetics originated in human
genetic studies, which used significant associations between marker
data and diseases for mapping and diagnostic purposes (e.g. Peroutka
1997). Recently, association genetics has gained increasing interest
from plant geneticists (e.g. Buckler and Thornsberry 2002; Rafalski
2002). The rationale behind association genetics is that, in general,
alleles at different loci are expected to be randomly associated into
genotypes or, in other words, to occur in linkage equilibrium. The
closer two loci are, the lower the probability of recombination
between them during meiosis. However, given sufficient generations,
linkage disequilibria, once established, will eventually be disrupted
by recombination, even between tightly linked loci. Linkage
disequilibria are therefore expected to be readily lost from populations,
the rate of decay being determined by the recombination fraction
and the number of generations elapsed. However, selection tends to
accumulate favourably interacting alleles and hence opposes the decay
of linkage disequilibria. Observed linkage disequilibria may therefore
point towards adaptive significance and possible linkage of the
underlying alleles. The interpretation of observed associations is by
no means straightforward. For example, population substructure and
founder events may lead to spurious associations between loci, even
if the markers and genes of interest are unlinked. Linkage disequilibria
are also affected by the mating system and population history, and
hence may vary between different kinds of populations (e.g. selfing
vs. outcrossing populations, or wild vs. cultivated material). In
addition, in the case of complex traits, the presence or absence of a
genetic marker may not be indicative of the expression of the
phenotype. Theory and analysis regarding genetic linkage and
association genetics are still under development (e.g. Ewens and
Spielman 2001).
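The statement that the decay of linkage disequilibrium depends on the recombination fraction and the number of generations can be made concrete with the standard textbook result for two loci. This result assumes a large, randomly mating population without selection, an idealization that the complications listed above all violate. Here D_t denotes the disequilibrium coefficient after t generations and r the recombination fraction between the loci. The worked values show why associations with tightly linked markers can persist long enough to be detected, whereas associations with unlinked markers vanish within a few generations unless they are maintained by selection or population structure.

```latex
\begin{aligned}
D_t &= (1 - r)\,D_{t-1} = (1 - r)^{t}\,D_0 \\
r = 0.5,\ t = 10 &: \quad D_{10} = 0.5^{10}\,D_0 \approx 0.001\,D_0 \quad \text{(unlinked loci)} \\
r = 0.01,\ t = 10 &: \quad D_{10} = 0.99^{10}\,D_0 \approx 0.904\,D_0 \quad \text{(tightly linked loci)}
\end{aligned}
```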
Figure 10. Significant associations between AFLP 
markers and resistance to different pathotypes of 
downy mildew in lettuce. Results are based on 
data from the EU biotech demonstration project 
“Molecular markers for genebanks”, aimed at
the characterization of an entire lettuce collection
of about 2300 accessions with molecular markers.
Data analyses included the relationship with
existing evaluation data for about 1500 accessions
of cultivated lettuce. Associations between
markers and phenotypic traits were tested for
significance with chi-square tests, different colours
representing different levels of significance. Note 
that some markers are associated with resistance 
to different pathotypes and that some pathotypes 
are associated with multiple markers (Theo van 
Hintum, unpublished data).
[Figure 10 image. Axes: Bremia resistance (pathotypes); AFLP marker. Legend: χ² > 200; 150 < χ² < 200; 100 < χ² < 150.]
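The chi-square tests mentioned in the Figure 10 caption can be sketched for a single marker and a single pathotype as a 2 × 2 contingency table, with marker presence or absence cross-classified against resistance or susceptibility. The Python function below computes the Pearson chi-square statistic without continuity correction. It also returns the P-value for one degree of freedom, using the identity P(χ² > x) = erfc(√(x/2)). The counts are hypothetical and are not taken from the lettuce data.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square for [[a, b], [c, d]]:
    rows = marker present / absent, columns = resistant / susceptible."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # under independence
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts of accessions.
chi2, p = chi_square_2x2([[30, 10], [10, 30]])
print(f"chi2 = {chi2:.1f}, P = {p:.2g}")
```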
Comparative genomics
Comparative genomics focuses on the integration of genome
information from different species, with the aim of obtaining
more insight into the genetic organization of traits through the
identification of conserved mechanisms (e.g. Laurie and Devos
2002). This research field has emerged from earlier work in
comparative mapping, in which the conservation of tracts of collinear
markers has been investigated not only among members of the
same botanical family but also among different species, often
leading to a better understanding of genome evolution. Comparative
mapping has shown that a large proportion of the markers and genes
are indeed located at comparable positions in the genome (synteny).
With the availability of an increasing number of genome
sequences, including those of known genes, the conservation
of gene sequences and their functions among species has also
been investigated and is used to further develop the knowledge
obtained from previous genetic linkage maps for different species.
Conserved Orthologous Set (COS) markers are conserved markers
that may serve as anchors for map development (Fulton et al. 2002).
Degenerate Oligonucleotide Primed-PCR (DOP-PCR) uses partially
degenerate primers for polymorphism detection (Telenius et al.
1992). Because no prior sequence information is required, DOP-PCR
is considered useful for crops for which little or no sequence
information is available. Comparison of the genetic maps of different
species will reveal information on chromosome evolution and the
common genetic control of traits in different organisms. Comparative
maps are being developed for various important crops and are
expected to facilitate understanding of the genetic organization of
traits in less-studied organisms. This may also reveal novel alleles
for relevant genes that may subsequently be exploited in crop
improvement.
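Conserved marker order (collinearity) between two genetic maps can be quantified with a longest increasing subsequence. The markers shared by both maps are listed in their map A order, each replaced by its position on map B. The longest run that keeps its relative order is then the largest collinear set. Checking the reversed order as well allows for inverted segments. The marker names and orders in this Python sketch are hypothetical. Real comparative mapping also considers map distances, chromosome assignments and duplicated loci, all of which the sketch ignores.

```python
from bisect import bisect_left


def lis_length(values):
    """Length of the longest strictly increasing subsequence (O(n log n))."""
    tails = []  # tails[k] = smallest tail of an increasing run of length k + 1
    for v in values:
        k = bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(tails)


def collinear_markers(order_a, order_b):
    """(largest number of shared markers in the same or fully reversed order,
    number of shared markers)."""
    rank_b = {marker: i for i, marker in enumerate(order_b)}
    shared = [rank_b[m] for m in order_a if m in rank_b]
    forward = lis_length(shared)
    reverse = lis_length([-x for x in shared])  # allows for an inverted segment
    return max(forward, reverse), len(shared)


map_species_a = ["m1", "m2", "m3", "m4", "m5", "m6"]
map_species_b = ["m1", "m3", "m2", "m4", "m5", "m9"]
print(collinear_markers(map_species_a, map_species_b))
```

In this invented example, four of the five shared markers keep the same order. The local swap of m2 and m3 breaks strict collinearity for one marker.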
Future challenges 
As outlined in the previous section, many interesting developments
in molecular biology and functional genomics that are relevant to
the genebank and user community are currently under way. However,
these developments are accompanied by new and exciting challenges,
demanding the involvement of expertise from other disciplines,
including statistics, bioinformatics and economics.
 Advances in biotechnology have resulted in a large variety
of molecular marker systems and enhanced opportunities for
automating most of the techniques. Sample throughput is therefore
expected to increase substantially in the near future, resulting in a
wealth of information, and is expected to increase even further as
the costs of applying the techniques decrease. These developments
are important to genebanks, considering the vast amounts of
germplasm they maintain. Projects are already under way in which
entire collections are screened with molecular markers. In the
EU-funded project “Molecular markers for Genebanks”
(http://www.cgn.wur.nl/pgr/research/lettuce/), the lettuce collection
of the Centre for Genetic Resources, The Netherlands, was
characterized with 3 AFLP primer combinations and 10 SSRs. In
total, more than 8 000 DNA samples from about 2300 accessions were
analyzed, resulting in nearly 2 million data points. In the EU-funded
project TEGERM (http://www.biocenter.helsinki.fi/bi/tegerm/),
high-throughput techniques for retrotransposon-based markers (see
“Current developments”) are used to screen the genetic diversity of
several complete germplasm collections of pea, barley, tomato and
pepper. The exploding amount of information will inevitably pose
serious problems for data access, data analysis and the presentation
of results. Information systems that can store the data efficiently,
analyze them properly together with passport and evaluation data,
and visualize the results of analyses in a meaningful way are
currently lacking. This will hamper, or at least slow down, further
progress towards better conservation and exploitation of genebank
collections. In recent years several bioinformatics projects addressing
these issues have been initiated, such as the EU-funded project
“GENE-MINE” (http://www.gene-mine.org/) and the BBSRC-funded
project “GERMINATE” (http://bioinf.scri.sari.ac.uk/germinate/).
 Economics is receiving more attention in the genebank community,
with the costs of maintaining and characterizing large numbers of
accessions being weighed against their benefits. Insight into the costs
and benefits of different genebank operations will allow genebanks to
evaluate the efficiency of their strategies and will aid in the choices to
be made (Engels and Visser 2003). For example, in the EU-funded
project “ICONFORS” (http://www.igergru.bbsrc.ac.uk/iconfors/index.htm),
genetic and economic knowledge is being acquired that is essential
to improve seed multiplication methodologies for genebanks
maintaining ex situ seed collections of perennial European forage
species. In an extensive experimental set-up, the effects of different
multiplication strategies and sites within Europe on the genetic
integrity of accessions of forage species are being investigated. At
the same time, detailed data are being collected on all the costs
involved in the regeneration procedures. The effects of the different
strategies and regeneration sites on genetic integrity will be evaluated
against the necessary costs in order to optimize regeneration protocols.
 Molecular markers are now common tools applied in many
aspects of PGR management. However, as with all genebank
operations, the use of molecular markers requires time and funds
that should not be taken for granted. For example, molecular markers
may be used to rationalize collections in response to economic
constraints. The success of such an approach will depend very much
on the expected benefits relative to the investment needed to collect
the data. A case study is provided by McGregor et al. (2002) and van
Treuren et al. (2004), who used AFLPs to reveal 5% redundancy in a
wild potato collection (314 accessions) of the Centre for Genetic
Resources, The Netherlands. The cost of the AFLP data was estimated
to be 2.5 times as high as the savings in maintaining the collection
for one regeneration cycle; thus, a return on investment can be
expected only after three regeneration cycles. This may not seem
worth the effort. However, the study also revealed other relevant
information that is difficult to express in economic terms. For example,
the AFLP data identified several taxonomic misclassifications and
incorrect origin data, which were used to improve the documentation
of this collection. From investigation of the AFLP data in relation to
geographic data, general guidelines were derived for sampling
strategies to lower the probability of sampling similar material in
future collecting missions. More detailed data on intra-accession
variation proved useful for optimizing regeneration protocols. The
spin-offs of molecular studies can therefore be considerable, but they
are difficult to value without proper economic theory. Theory on
genebank economics is developing but is still in its infancy (Swanson
1996; Pardey et al. 2001; Sackville Hamilton et al. 2002; Engels and
Visser 2003).
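The break-even reasoning in the wild potato case can be reproduced with simple arithmetic. In this Python sketch, the cost of the AFLP data is expressed in units of the saving per regeneration cycle (2.5, as reported above). The approximate number of redundant accessions is derived from the 5% redundancy in 314 accessions. The sketch ignores discounting and any change in costs over time, which is a simplifying assumption.

```python
import math


def cycles_to_break_even(marker_cost, saving_per_cycle):
    """Smallest whole number of regeneration cycles whose savings cover the cost."""
    return math.ceil(marker_cost / saving_per_cycle)


accessions = 314
redundancy = 0.05
print("redundant accessions (approx.):", round(accessions * redundancy))

# Cost of the AFLP data = 2.5 x the saving from one regeneration cycle.
print("cycles needed:", cycles_to_break_even(marker_cost=2.5, saving_per_cycle=1.0))
```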
 The huge numbers of accessions currently stored in genebanks
contain a wealth of genetic diversity that is largely unexplored.
Identifying this variation within genebank collections is a major
challenge if the genetic diversity for specific characters is to be made
available for crop improvement. Allele mining techniques, association
genetics and comparative genomics are promising developments
that can help genebanks achieve this goal. This will require close
cooperation between the genebank community, molecular biologists,
bioinformaticians and plant breeders, as in the CGIAR Challenge
Program “Unlocking genetic resources in crops for the resource-poor”
(http://www.cgiar.org/research/res_cppilot.html). As more sequence
data become available for particular germplasm, curators may store
sequences as an additional service to complement traditional services.
This would require additional staff with bioinformatics expertise. The
supply of sequence and other molecular marker data might promote
the use of germplasm by a wider clientele, including plant molecular
geneticists and biologists, and as such contribute to the sustainable
conservation of genetic resources. Closely linked to the provision of
genomic data is the need for good phenotyping. Germplasm users
(breeders, plant physiologists, etc.) may be the best-positioned
scientists to conduct appropriate phenotyping of germplasm.
Advances in genomic research depend heavily on the availability of
reliable phenotypic data, so increased awareness of this important
task should accompany any involvement of genebank staff in the
use of modern technologies.
 An important development in agriculture is the increasing
interest in ecological farming. To meet the demands from this sector,
genebanks may play a role in providing the genetic resources of
interest, such as material with a broader genetic base or material
less adapted to cultivated conditions. An additional challenge for
genebanks in this era of molecular advances is to consider developing
pre-breeding as part of their routine services for customers. Because
molecular markers aid the identification of useful traits in germplasm
and increase the efficiency of introgression into modern cultivars,
they could stimulate more pre-breeding work and thereby increase
the utilization of genebank collections.
Concluding remarks
A wide variety of new molecular marker technologies is available
to assess genetic variation, and many of them are increasingly being
applied to complement traditional approaches in germplasm and
genebank management. Each technology has its strengths and
weaknesses, which need to be carefully considered in the light of
the intended application. In diversity studies, microsatellites and
AFLPs are often the preferred marker technologies. Many marker
applications in taxonomy, genebank management and crop breeding
rely on the assumption that the diversity assessed is to a large extent
predictive of variation in qualitative and quantitative characters.
However, this assumption should not be taken for granted. In
diversity studies, anonymous markers may be located in non-coding
genomic regions and therefore need not reflect variation in the genes
that control such characters. Few studies have tested the assumed
link between marker diversity and taxonomic prediction, and marker
data analyses should be interpreted with this caution in mind.
 The wealth of sequence data that is increasingly becoming
available for more and more crops through whole-genome sequencing
projects and EST databases enables the development of markers
targeting coding regions of the genome or even specific genes.
Technological developments continue to increase sample throughput,
which will facilitate large-scale genotyping of genetic resources. Allele
mining, association genetics and comparative genomics are promising
new approaches for gaining insight into the organization and variation
of genes that affect relevant phenotypic traits. Exploiting these
developments by combining expertise from several disciplines,
including molecular genetics, statistics and bioinformatics, is one of
the main challenges facing genebanks. Streamlining collaborations
among genebanks will greatly aid the efficient exploitation of these
techniques. Ultimately, the biggest challenge for the genebank
community is to share materials and technologies in order to
collectively optimize collection, characterization and use, and to
unlock the useful variation in the world’s genebank collections.
Acknowledgments
Special thanks go to Jan Engels of the International Plant Genetic 
Resources Institute (IPGRI) for his role in encouraging the preparation 
of this publication and for numerous hours of revision to make a 
much better final version. We thank reviewers Glenn Brian and 
Richard Whitkus for review of an earlier version of this manuscript; 
Sarah Stephenson for editorial assistance; Gerard van der Linden, 
Applied Biosystems, and Asaph Aharoni and Theo van Hintum for 
permission to reproduce Figures 7, 8, 9, and 10, respectively. Thanks 
also to Toby Hodgkin, Ramanatha Rao, Luigi Guarino, Ehsan Dulloo 
and Prem Mathur for advice in the development of the outline; and 
IPGRI for financial support to publish this technical bulletin.
 Names are necessary to report data. However, the USDA neither 
guarantees nor warrants the standard of the product, and the use 
of the name by the USDA implies no approval of the product to the 
exclusion of others that may also be suitable.
References
Abo-elwafa A, Murai K and Shimada T. 1995. Intra- and inter-specific variations in Lens 
revealed by RAPD markers. Theoretical and Applied Genetics 90:335–340.
Ahnert D, Lee M, Austin DF, Livini C, Woodman WL, Openshaw SJ, Smith JSC, 
Porter K and Dalton G. 1996. Genetic diversity among elite sorghum inbred 
lines assessed with DNA markers and pedigree information. Crop Science 
36:1385–1392.
Akopyanz N, Bukanov N, Westblom TU and Berg DE. 1992. PCR-based RFLP analysis 
of DNA sequence diversity in the gastric pathogen Helicobacter pylori. Nucleic 
Acids Research 20:6221–6225.
Aldrich PR, Doebley J, Schertz KF and Stec A. 1992. Patterns of allozyme variation 
in cultivated and wild Sorghum bicolor. Theoretical and Applied Genetics 
85:451–460.
Allard RW. 1970. Population structure and sampling methods. In Genetic resources 
in plants: their exploration and conservation (OH Frankel and E Bennett, eds.).
F.A. Davis Company, Philadelphia, USA. pp. 97–107.
Alonso-Blanco C, Peeters AJ, Koornneef M, Lister C, Dean C, van den Bosch N, 
Pot J and Kuiper MT. 1998. Development of an AFLP based linkage map of 
Ler, Col and Cvi Arabidopsis thaliana ecotypes and construction of a Ler/Cvi 
recombinant inbred line population. Plant Journal 14:259–271.
Anthony F, Combes MC, Astorga C, Bertrand B, Graziosi G and Lashermes P. 2002. 
The origin of cultivated Coffea arabica L. varieties revealed by AFLP and SSR 
markers. Theoretical and Applied Genetics 104:894–900.
Arias DM and Rieseberg LH. 1994. Gene flow between cultivated and wild sunflowers. 
Theoretical and Applied Genetics 89:655–660.
Arnheim N. 1983. Concerted evolution of multigene families. In Evolution of genes 
and proteins (M Nei and RK Koehn, eds.). Sinauer, Sunderland, Massachusetts, 
USA. pp. 38–61.
Avise JC. 2004. Molecular Markers, Natural History and Evolution (2nd ed.). Sinauer
Associates, Sunderland, Massachusetts, USA.
Backes G, Hatz B, Jahoor A and Fischbeck G. 2003. RFLP diversity within and between
major groups of barley in Europe. Plant Breeding 122:291–299.
Bailey CD, Carr TG, Harris SA and Hughes CE. 2003. Characterization of angiosperm 
nrDNA polymorphism, paralogy, and pseudogenes. Molecular Phylogenetics 
and Evolution 29:435–455.
Barrett SCH and Kohn JR. 1991. Genetic and evolutionary consequences of small 
population size in plants: implications for conservation. In Genetics and 
Conservation of Rare Plants (DA Falk and KE Holsinger, eds.). Oxford
University Press, Oxford, UK. pp. 3–30.
Bartsch D and Ellstrand NC. 1999. Genetic evidence for the origin of California 
beets (genus Beta). Theoretical and Applied Genetics 99:1120–1130.
Baum DA and Donoghue MJ. 1995. Choosing among alternative “phylogenetic” 
species concepts. Systematic Botany 20:560–573.
Baurens F, Noyer JL, Lanaud C and Lagoda PJL. 1997. Assessment of a species-
specific element (Brep 1) in banana. Theoretical and Applied Genetics 
95:922–931.
Becker HC, Engqvist GM and Karlsson B. 1995. Comparison of rapeseed cultivars 
and resynthesized lines based on allozyme and RFLP markers. Theoretical and 
Applied Genetics 91:62–67.
Beckmann JS and Soller M. 1986. Restriction fragment length polymorphisms in 
plant genetic improvement. Plant Molecular Cell Biology 3:197–250.
Bernacchi D, Beck-Bunn T, Emmatty D, Eshed Y, Inai S, Lopez J, Petiard V, 
Sayama H, Uhlig J, Zamir D and Tanksley S. 1998. Advanced backcross 
QTL analysis of tomato. II. Evaluation of near-isogenic lines carrying 
single-donor introgressions for desirable wild QTL-alleles derived from 
Lycopersicon hirsutum and L. pimpinellifolium. Theoretical and Applied 
Genetics 97:170–180.
Berry A and Kreitman M. 1993. Molecular analysis of an allozyme cline: alcohol 
dehydrogenase in Drosophila melanogaster on the east coast of North America.
Genetics 134:869–893.
Bonierbale M, Beebe S, Tohme J and Jones P. 1995. Molecular genetic techniques in 
relation to sampling strategies and the development of core collections. In 
Report of the IPGRI workshop, 9–11 October 1995 (WG Ayad, T Hodgkin, 
A Jaradat and A Rao, eds.). International Plant Genetic Resources Institute, 
Rome, Italy. pp. 98–102.
Börner A, Chebotar S and Korzun V. 2000. Molecular characterization of the 
genetic integrity of wheat (Triticum aestivum L.) germplasm after long-term 
maintenance. Theoretical and Applied Genetics 100:494–497.
Botstein D, White RL, Skolnick M and Davis RW. 1980. Construction of a genetic 
map in man using restriction fragment length polymorphisms. American 
Journal of Human Genetics 32:314–331.
Bredemeijer GMM, Cooke RJ, Ganal MW, Peeters R, Isaac P, Noordijk Y, Rendell 
S, Jackson J, Röder MS, Wendehake K, Dijcks M, Amelaine M, Wickaert V, 
Bertrand L and Vosman B. 2002. Construction and testing of a microsatellite 
database containing more than 500 tomato varieties. Theoretical and Applied 
Genetics 105:1019–1026.
Bretting PK and Duvick DN. 1997. Dynamic conservation of plant genetic resources. 
Advances in Agronomy 61:2–51.
Brickell CD, Baum BR, Hetterscheid WLA, Leslie AC, McNeill J, Trehane P, 
Vrugtman F and Wiersma J. 2004. International code of nomenclature for 
cultivated plants. Regnum Vegetabile 144:1–123.
Brown AHD. 1989a. The case for core collections. In The use of plant genetic 
resources (AHD Brown, OH Frankel, DR Marshall and JT Williams, eds.). 
Cambridge University Press, Cambridge, UK. pp. 136–156.
Brown AHD. 1989b. Core collections: a practical approach to genetic resources 
management. Genome 31:818–824.
Brown AHD, Brubaker CL and Grace JP. 1997. Regeneration of germplasm samples: 
wild versus cultivated plant species. Crop Science 37:7–13.
Brown JH and Lomolino MV. 1998. Biogeography (2nd ed.). Sinauer Associates, 
Inc. Sunderland, Massachusetts, USA.
Brown AHD and Marshall DR. 1995. A basic sampling strategy: theory and practice. 
In Collecting plant genetic diversity, technical guidelines (L Guarino, V 
Ramanatha Rao and R Reid, eds.). CAB International, Wallingford, UK. 
pp. 75–91.
Brown AHD and Munday J. 1982. Population-genetic structure and optimal 
sampling of land races of barley from Iran. Genetica 58:85–96.
Brubaker CL and Wendel JF. 1994. Reevaluating the origin of domesticated cotton 
(Gossypium hirsutum; Malvaceae) using nuclear restriction fragment length 
polymorphisms (RFLPs). American Journal of Botany 81:1309–1326.
Bruford MW and Wayne RK. 1993. Microsatellites and their application to 
population genetic studies. Current Biology 3:939–943. 
Brugmans B, Fernandez del Carmen A, Bachem CWB, van Os H, van Eck HJ and 
Visser RGF. 2002. A novel method for the construction of genome wide 
transcriptome maps. Plant Journal 31:211–222.
Buckler ES and Thornsberry JM. 2002. Plant molecular diversity and applications 
to genomics. Current Opinion in Plant Biology 5:107–111.
Burma BH. 1954. Reality, existence, and classification: a discussion of the species 
problem. Madroño 12:193–209.
Caetano-Anolles G. 1996. Fingerprinting nucleic acids with arbitrary oligonucleotide 
primers. Agro Food Industry Hi Tech 7:26–31.
Caetano-Anolles G, Bassam BJ and Gresshoff PM. 1991. DNA amplification 
fingerprinting using very short arbitrary oligonucleotide primers. 
Biotechnology 9:553–557.
Caetano-Anolles G, Bassam BJ and Gresshoff PM. 1992. DNA fingerprinting: 
MAAPing out a RAPD redefinition? Biotechnology 10:937.
Cao WG, Hucl P, Scoles G and Chibbar RN. 1998. Genetic diversity within spelta and 
macha wheats based on RAPD analysis. Euphytica 104:181–189.
Castillo R and Spooner DM. 1997. Phylogenetic relationships of wild potatoes, 
Solanum series Conicibaccata (sect. Petota). Systematic Botany 22:45–83. 
Cervera MT, Cabezas JA, Sancha JC, Martinez de Toda F and Martinez-Zapater JM. 
1998. Application of AFLPs to the characterization of grapevine Vitis vinifera L. 
genetic resources. A case study with accessions from Rioja (Spain). Theoretical 
and Applied Genetics 97:51–59.
Chan KF and Sun M. 1997. Genetic diversity and relationships detected by isozyme 
and RAPD analysis of crop and wild species of Amaranthus. Theoretical and 
Applied Genetics 95:865–873.
Charters YM, Robertson A, Wilkinson MJ and Ramsay G. 1996. PCR analysis of oilseed 
rape cultivars (Brassica napus L. ssp. oleifera) using 5’-anchored simple sequence 
repeat (SSR) primers. Theoretical and Applied Genetics 92:442–447.
Chavarriaga-Aguirre P, Maya MM, Tohme J, Duque MC, Iglesias C, Bonierbale MW, 
Kresovich S and Kochert G. 1999. Using microsatellites, isozymes and AFLPs 
to evaluate genetic diversity and redundancy in the cassava core collection 
and to assess the usefulness of DNA-based markers to maintain germplasm 
collections. Molecular Breeding 5:263–273.
Chen XM, Line RF and Leung H. 1998. Genome scanning for resistance-gene analogs 
in rice, barley, and wheat by high-resolution electrophoresis. Theoretical and 
Applied Genetics 97:345–355.
Cho RJ, Mindrinos M, Richards DR, Sapolsky RJ, Anderson M, Drenkard E, Dewdney 
J, Reuber TL, Stammers M, Federspiel N, Theologis A, Yang WH, Hubbell E, 
Au M, Chung EY, Lashkari D, Lemieux B, Dean C, Lipshutz RJ, Ausubel FM, 
Davis RW and Oefner PJ. 1999. Genome-wide mapping with biallelic markers 
in Arabidopsis thaliana. Nature Genetics 23:203–207.
Clausen AM and Spooner DM. 1998. Molecular support for the hybrid origin of the 
wild potato species Solanum × rechei. Crop Science 38:858–865.
Clegg MT. 1993a. Chloroplast gene sequences and the study of plant evolution. 
Proceedings of the National Academy of Sciences USA 90:363–367.
Clegg MT. 1993b. Molecular evaluation of plant genetic resources. In Gene conservation 
and exploitation: Proceedings of the 20th Stadler genetics symposium held at 
the University of Missouri, Columbia, Missouri, USA. pp. 67–86.
Comes HP and Abbott RJ. 1998. The relative importance of historical events and gene 
flow on the population structure of a Mediterranean ragwort, Senecio gallicus 
(Asteraceae). Evolution 52:355–367.
Conner AJ and Dale PJ. 1996. Reconsideration of pollen dispersal data from field trials 
of transgenic potatoes. Theoretical and Applied Genetics 92:505–508.
Corriveau JL and Coleman AW. 1988. Rapid screening method to detect potential 
biparental inheritance of plastid DNA and results for over 200 angiosperm 
species. American Journal of Botany 75:1443–1458.
Cracraft J. 1989. Speciation and its ontology: the empirical consequences of 
alternative species concepts for understanding patterns and processes of 
differentiation. In Speciation and its consequences (D Otte and JA Endler, 
eds.). Sinauer Associates, Inc., Sunderland, Massachusetts, USA. pp. 
28–59.
Crawford DJ. 1990. Plant Molecular Systematics: Macromolecular Approaches. 
John Wiley and Sons, New York, USA. 
Crawley MJ, Brown SL, Hails RS, Kohn DD and Rees M. 2001. Transgenic crops in 
natural habitats. Nature 409:682–683.
Cronn RC, Brothers M, Klier K, Bretting PK and Wendel JF. 1997. Allozyme variation 
in domesticated annual sunflower and its wild relatives. Theoretical and 
Applied Genetics 95:532–545.
Cronquist A. 1978. Once again, what is a species? In Biosystematics in agriculture 
(JA Romberger, ed.). Allanheld, Osmun and Company, Montclair, New 
Jersey, USA. pp. 3–20.
Crossa J and Vencovsky R. 1994. Variance effective population size for two-stage 
sampling of monoecious species. Crop Science 37:14–26.
Crouch HK, Crouch JH, Jarret RL, Cregan PB and Ortiz R. 1998. Segregation at 
microsatellite loci in haploid and diploid gametes of Musa. Crop Science 
38:211–217.
Dale PJ. 1992. Spread of engineered genes to wild relatives. Plant Physiology 
100:13–15.
Daly DC, Cameron KM and Stevenson DM. 2001. Plant systematics in the age of 
genomics. Plant Physiology 127:1328–1333.
Dean RE, Dahlberg JA, Hopkins MS, Mitchell SE and Kresovich S. 1999. Genetic 
redundancy and diversity among ‘orange’ accessions in the US National 
Sorghum Collection as assessed with simple sequence repeat (SSR) markers. 
Crop Science 39:1215–1221.
Decker DS. 1988. Origin(s), evolution, and systematics of Cucurbita pepo 
(Cucurbitaceae). Economic Botany 42:4–15.
Degani C, Rowland LJ, Levi A, Hortynski JA and Galletta GJ. 1998. DNA 
fingerprinting of strawberry (Fragaria × ananassa) cultivars using randomly 
amplified polymorphic DNA (RAPD) markers. Euphytica 102:247–253.
Del Rio AH and Bamberg JB. 2002. Lack of association between genetic and 
geographical origin characteristics for the wild potato Solanum sucrense. 
American Journal of Potato Research 79:335–338.
Del Rio AH, Bamberg JB and Huamán Z. 1997a. Assessing changes in the genetic 
diversity of potato gene banks. 1. Effects of seed increase. Theoretical and 
Applied Genetics 95:191–198.
Del Rio AH, Bamberg JB, Huamán Z, Salas A and Vega SE. 1997b. Assessing changes 
in the genetic diversity of potato gene banks. 2. In situ vs. ex situ. Theoretical 
and Applied Genetics 95:199–204.
Desplanque B, Boudry P, Broomberg K, Saumitou-Laprade P, Cuguen J and van 
Dijk H. 1999. Genetic diversity and gene flow between wild, cultivated and 
weedy forms of Beta vulgaris L. (Chenopodiaceae), assessed by RFLP and 
microsatellite markers. Theoretical and Applied Genetics 98:1194–1201.
Desplanque B, Hautekèete N and van Dijk H. 2002. Transgenic weed beets: possible, 
probable, avoidable? Journal of Applied Ecology 39:561–571.
DeVicente MC and Tanksley SD. 1993. QTL analysis of transgressive segregation 
in an interspecific tomato cross. Genetics 134:585–596.
Dice LR. 1945. Measures of the amount of ecologic association between species. 
Ecology 26:279–302.
Diers BW, Osborn TC and McVetty PBE. 1996. Relationship between heterosis 
and genetic distance based on Restriction Fragment Length Polymorphism 
markers in oilseed rape (Brassica napus L.). Crop Science 36:79–83.
Dillon SL, Lawrence PK and Henry RJ. 2001. The use of ribosomal ITS to determine 
phylogenetic relationships within Sorghum. Plant Systematics and Evolution 
230:97–110.
Dobzhansky T. 1950. Genetics of natural populations. XIX. Origin of heterosis 
through natural selection in populations of Drosophila pseudoobscura. Genetics 
35:288–302.
Doebley JF. 1989. Molecular evidence for a missing wild relative of maize and 
the introgression of its chloroplast genome into Zea perennis. Evolution 
43:1555–1559.
Doebley JF. 1992. Molecular systematics and crop evolution. In Molecular 
Systematics of Plants (PS Soltis, DE Soltis and JJ Doyle, eds.). Chapman and 
Hall, New York, USA. pp. 202–222.
Doebley JF, Goodman MM and Stuber CW. 1984. Isoenzymatic variation in Zea 
(Gramineae). Systematic Botany 9:203–218.
Doldi ML, Vollmann J and Lelley T. 1997. Genetic diversity in soybean as determined 
by RAPD and microsatellite analysis. Plant Breeding 116:331–335.
Dos Santos JB, Nienhuis J, Skroch P, Tivang J and Slocum MK. 1994. Comparison 
of RAPD and RFLP genetic markers in determining genetic similarity 
among Brassica oleracea L. genotypes. Theoretical and Applied Genetics 
87:909–915.
Doyen JT and Slobodchikoff CN. 1974. An operational approach to species 
classification. Systematic Zoology 23:239–247.
Durham RE and Korban SS. 1994. Evidence of gene introgression in apple using 
RAPD markers. Euphytica 79:109–114.
Dubreuil P, Dufour P, Krejci E, Causse M, De Vienne D, Gallais A and Charcosset 
A. 1996. Organization of RFLP diversity among inbred lines of maize 
representing the most significant heterotic groups. Crop Science 36:790–
799.
Eernisse DJ and Kluge AG. 1993. Taxonomic congruence versus total evidence, 
and amniote phylogeny inferred from fossils, molecules, and morphology. 
Molecular Biology and Evolution 10:1170–1195.
Ehrlich PR and Raven PH. 1969. Differentiation of populations. Science 165:1228–
1231.
Ellstrand NC. 2001. When transgenes wander, should we worry? Plant Physiology 
125:1543–1545.
Ellstrand NC and Elam DR. 1993. Population genetic consequences of small 
population size: implications for plant conservation. Annual Review of 
Ecology and Systematics 24:217–242.
Ellstrand NC, Prentice HC and Hancock JF. 1999. Gene flow and introgression from 
domesticated plants into their wild relatives. Annual Review of Ecology and 
Systematics 30:539–563.
Engels JMM and Visser L. 2003. A guide to effective management of germplasm 
collections. IPGRI Handbooks for Genebanks No. 6. International Plant Genetic 
Resources Institute, Rome, Italy.
Ereshefsky M. 2001. The poverty of the Linnaean hierarchy. Cambridge University 
Press, Cambridge, UK.
Erskine W and Muehlbauer FJ. 1991. Allozyme and morphological variability, 
outcrossing rate and core collection formation in lentil germplasm. Theoretical 
and Applied Genetics 83:119–125.
Eshed Y, Gera G and Zamir D. 1996. A genome-wide search for wild-species alleles 
that increase horticultural yield for processing tomatoes. Theoretical and 
Applied Genetics 93:877–886.
Eshed Y and Zamir D. 1995. An introgression line population of Lycopersicon pennellii 
in the cultivated tomato enables the identification and fine mapping of yield-
associated QTL. Genetics 141:1147–1162.
Espejo-Ibañez MC, Sanchez MP, Sanchez MD and Yelamo MD. 1994. Isoenzymatic 
variability in seeds of some Spanish common beans (Phaseolus vulgaris L. 
Leguminosae): relation to their domestication centers. Biochemical Systematics 
and Ecology 22:827–833.
Eujayl I, Sorrells ME, Baum M, Wolters P and Powell W. 2002. Isolation of EST-derived 
microsatellite markers for genotyping the A and B genomes of wheat. Theoretical 
and Applied Genetics 104:399–407.
Ewens WJ and Spielman RS. 2001. Overview: locating genes by linkage and association. 
Theoretical Population Biology 60:135–139.
Excoffier L, Smouse PE and Quattro JM. 1992. Analysis of molecular variance 
inferred from metric distances among DNA haplotypes: application to human 
mitochondrial DNA restriction data. Genetics 131:479–491.
Fahima T, Roeder MS, Grama A and Nevo E. 1998. Microsatellite DNA polymorphism 
divergence in Triticum dicoccoides accessions highly resistant to yellow rust. 
Theoretical and Applied Genetics 96:187–195.
Falconer DS. 1981. Introduction to Quantitative Genetics (2nd ed.). Longman, London, 
UK.
Fang DQ, Roose ML, Krueger RR and Federici CT. 1997. Fingerprinting trifoliate orange 
germplasm accessions with isozymes, RFLPs, and inter-simple sequence repeat 
markers. Theoretical and Applied Genetics 95:211–219.
FAO. 1996. The State of the World’s Plant Genetic Resources for Food and Agriculture. 
Food and Agriculture Organization, Rome, Italy.
Felsenstein J. 1978. Cases in which parsimony and compatibility methods will be 
positively misleading. Systematic Zoology 27:401–410.
Felsenstein J. 1981. Evolutionary trees from DNA sequences: A maximum likelihood 
approach. Journal of Molecular Evolution 17:368–376.
Flavell AJ, Knox MR, Pearce SR and Ellis THN. 1998. Retrotransposon-based insertion 
polymorphisms (RBIP) for high throughput marker analysis. Plant Journal 
16:643–650.
Frankel OH. 1984. Genetic perspectives on germplasm conservation. In Genetic 
Manipulation: Impact on Man and Society (W Arber, K Illmensee, WJ Peacock and 
P Starlinger, eds.). Cambridge University Press, Cambridge, UK. pp. 161–170.
Frankel OH and Brown AHD. 1984. Current plant genetic resources – a critical appraisal. 
In Genetics: new frontiers. Vol. 4. Oxford and IBH Publishing Co., New Delhi, 
India. pp. 1–11.
Frary A, Nesbitt TC, Frary A, Grandillo S, van der Knaap E, Cong B, Liu J, Meller J, Elber R, Alpert 
KB and Tanksley SD. 2000. fw2.2: a quantitative trait locus key to the evolution 
of tomato fruit size. Science 289:85–88.
Fregene M, Bernal A, Duque M, Dixon A and Tohme J. 2000. AFLP analysis of African 
cassava (Manihot esculenta Crantz) germplasm resistant to the cassava mosaic 
disease (CMD). Theoretical and Applied Genetics 100:678–685.
Freville H, Justy F and Olivieri I. 2001. Comparative allozyme and microsatellite 
population structure in a narrow endemic plant species, Centaurea corymbosa 
Pourret (Asteraceae). Molecular Ecology 10:879–889.
Fry WE and Goodwin SB. 1997. Re-emergence of potato and tomato late blight in the 
United States. Plant Disease 81:1349–1357.
Fulton TM, van der Hoeven R, Eannetta NT and Tanksley SD. 2002. Identification, analysis 
and utilization of conserved ortholog set markers for comparative genomics in 
higher plants. Plant Cell 14:1457–1467.
Garvin DF and Weeden NF. 1994. Isozyme evidence supporting a single geographic 
origin for domesticated tepary bean. Crop Science 34:1390–1395.
Geffroy V, Sevignac M, de Oliveira JCF, Fouilloux G, Skroch P, Thoquet P, Gepts 
P, Langinand T and Dron M. 2000. Inheritance of partial resistance against 
Colletotrichum lindemuthianum in Phaseolus vulgaris and co-localization of 
quantitative trait loci with genes involved in specific resistance. Molecular 
Plant-Microbe Interactions 13:287–296.
Ghislain M, Spooner DM, Rodríguez F, Villamon F, Núñez C, Vásquez C and Bonierbale 
M. 2004. Selection of highly informative and user-friendly microsatellites 
(SSRs) for genotyping of cultivated potato. Theoretical and Applied Genetics 
108:881–890.
Ghislain M, Zhang D, Fajardo D, Huamán Z and Hijmans RJ. 1999. Marker-assisted 
sampling of the cultivated Andean potato Solanum phureja collection using RAPD 
markers. Genetic Resources and Crop Evolution 46:547–555.
Giannattasio RB and Spooner DM. 1994. A reexamination of species boundaries 
and hypotheses of hybridization concerning Solanum megistacrolobum and S. 
toralapanum (Solanum sect. Petota, series Megistacroloba): molecular data. Systematic 
Botany 19:106–115.
Gibson G. 2002. Microarrays in ecology and evolution: a preview. Molecular Ecology 
11:17–24.
Godwin ID, Aitken EAB and Smith LW. 1997. Application of inter simple sequence repeat 
(ISSR) markers to plant genetics. Electrophoresis 18:1524–1528.
Golembiewski RC, Danneberger TK and Sweeney PM. 1997. Potential of RAPD 
markers for use in the identification of creeping bentgrass cultivars. Crop Science 
37:212–214.
Graner A, Ludwig WF and Melchinger AE. 1994. Relationships among European 
barley germplasm: II. Comparison of RFLP and pedigree data. Crop Science 
34:1199–1205.
Graner A, Streng S, Kellermann A, Schiemann A, Bauer E, Waugh R, Pellio B and 
Ordon F. 1999. Molecular mapping and genetic fine-structure of the rym5 locus 
encoding resistance to different strains of the Barley Yellow Mosaic Virus Complex. 
Theoretical and Applied Genetics 98:285–290.
Grenier C, Deu M, Kresovich S, Bramel-Cox PJ and Hamon P. 2000. Assessment of genetic 
diversity in three subsets constituted from the ICRISAT Sorghum collection using 
random vs. non-random sampling procedures. B. Using molecular markers. 
Theoretical and Applied Genetics 101:197–202.
Greuter W, McNeill J, Barrie FR, Burdett HM, Demoulin V, Filgueiras TS, Nicolson 
DH, Silva PC, Skog JE, Trehane P, Turland NJ and Hawksworth DL (eds. and 
compilers). 2000. International Code of Botanical Nomenclature (St. Louis Code). 
Regnum Vegetabile 138:1–474.
Guilford P, Prakash S, Zhu JM, Rikkerink E, Gardiner S, Bassett H and Foster R. 1997. 
Microsatellites in Malus × domestica (apple): Abundance, polymorphism and 
cultivar identification. Theoretical and Applied Genetics 94:249–254.
Gupta M, Chyi Y-S, Romero-Severson J and Owen JL. 1994. Amplification of DNA 
markers from evolutionarily diverse genomes using single primers of simple-
sequence repeats. Theoretical and Applied Genetics 89:998–1006.
Gupta PK, Balyan HS, Sharma PC and Ramesh B. 1996. Microsatellites in plants: a new 
class of molecular markers. Current Science 70:45–54.
Hadrys H, Balick M and Schierwater B. 1992. Applications of random amplified 
polymorphic DNA (RAPD) in molecular ecology. Molecular Ecology 1:55–63.
Hall BG. 2001. Phylogenetic trees made easy: a how-to manual for molecular 
biologists. Sinauer Associates, Inc., Sunderland, Massachusetts, USA.
Hallden C, Nilsson NO, Rading IM and Saell T. 1994. Evaluation of RFLP and RAPD 
markers in comparison of Brassica napus breeding lines. Theoretical and Applied 
Genetics 88:123–128.
Hamilton MB. 1994. Ex situ conservation of wild plant species: time to assess the genetic 
assumptions and implications of seed banks. Conservation Biology 8:39–49.
Hamon S, Dussert S, Deu M, Hamon P, Seguin M, Glaszmann JC, Grivet L, Chantereau 
J, Chevallier MH, Flori A, Lashermes P, Legnate H and Noirot M. 1998. 
Méthodologies de gestion et de conservation des ressources génétiques. Genetics 
Selection Evolution 30:S237–S258 (suppl.).
Hamon S, Dussert S, Noirot M, Anthony F and Hodgkin T. 1995. Core collections—
accomplishments and challenges. Plant Breeding Abstracts 65:1125–1133.
Hamrick JL and Godt MJW. 1997. Allozyme diversity in cultivated crops. Crop Science 
37:26–30.
Hancock JF, Grumet R and Hokanson SC. 1996. The opportunity for escape of 
engineered genes from transgenic crops. HortScience 31:1080–1085.
Hansen LB, Siegismund HR and Jørgensen RB. 2001. Introgression between 
oilseed rape (Brassica napus L.) and its weedy relative B. rapa L. in a natural 
population. Genetic Resources and Crop Evolution 48:621–627.
Harberd DJ. 1975. Brassica. In Hybridization and the flora of the British Isles (CA Stace, 
ed.). Academic Press, London, UK. pp. 137–139.
Harlan JR and de Wet JMJ. 1971. Toward a rational classification of cultivated 
plants. Taxon 20:509–517.
Hauser MT, Adhami F, Dorner M, Fuchs E and Glossl J. 1998. Generation of co-
dominant PCR-based markers by duplex analysis on high resolution gels. 
Plant Journal 16:117–125.
Hawkes JG. 1990. The Potato: Evolution, Biodiversity, and Genetic Resources. 
Belhaven Press, Oxford, UK.
Hayashi K. 1992. PCR-SSCP: a method for detection of mutations. Genetic Analysis 
Techniques and Applications 9:73–79.
Hearne CM, Ghosh S and Todd JA. 1992. Microsatellites for linkage analysis of genetic 
traits. Trends in Genetics 8:288–294.
Heath DD, Iwama GK and Devlin RH. 1993. PCR primed with VNTR core sequences 
yields species specific patterns and hypervariable probes. Nucleic Acids 
Research 21:5782–5785.
Hennig W. 1950. Grundzüge einer Theorie der Phylogenetischen Systematik. 
Deutscher Zentralverlag, Berlin, Germany.
Hennig W. 1966. Phylogenetic Systematics (3rd ed.) (trans. DD Davis and R 
Zangerl). University of Illinois Press, Urbana, Illinois, USA.
Hijmans RJ, Jacobs M, Bamberg JB and Spooner DM. 2003. Frost tolerance in wild 
potato species: unraveling the predictivity of taxonomic, geographic and 
ecological factors. Euphytica 130:47–59.
Hill M, Witsenboer H, Zabeau M, Vos P, Kesseli R and Michelmore R. 1996. PCR-
based fingerprinting using AFLPs as a tool for studying genetic relationships 
in Lactuca spp. Theoretical and Applied Genetics 93:1202–1210.
Hillis DM. 1997. Phylogenetic analysis. Current Biology 7:R129–R131.
Hillis DM. 1998. Taxonomic sampling, phylogenetic accuracy and investigator bias. 
Systematic Biology 47:3–8.
Hillis DM, Moritz C and Mable BK. 1996. Molecular Systematics (2nd ed.). 
Sinauer Associates, Inc., Sunderland, Massachusetts, USA.
Hodgkin T, Brown AHD, van Hintum ThJL and Morales EAV. 1995. Core Collections 
of Plant Genetic Resources. John Wiley & Sons, Chichester, UK.
Holland PM, Abramson RD, Watson R and Gelfand DH. 1991. Detection of specific 
polymerase chain reaction product by utilizing the 5’→3’ exonuclease activity 
of Thermus aquaticus DNA polymerase. Proceedings of the National Academy 
of Sciences USA 88:7276–7280.
Hu J and Vick BA. 2003. Target region amplification polymorphism: a novel 
marker technique for plant genotyping. Plant Molecular Biology Reporter 
21:289–294.
Huamán Z, Ortiz R and Gomez R. 2000. Selecting a Solanum tuberosum subsp. 
andigena core collection using morphological, geographical, disease and pest 
descriptors. American Journal of Potato Research 77:183–190.
Huang HW, Layne DR and Kubisiak TL. 2000. RAPD inheritance and diversity in 
pawpaw (Asimina triloba). Journal of the American Society of Horticultural 
Science 125:454–459.
Huang HW, Layne DR and Riemenschneider DE. 1998. Genetic diversity and 
geographic differentiation in pawpaw [Asimina triloba (L.) Dunal] populations 
from nine states as revealed by allozyme analysis. Journal of the American 
Society of Horticultural Science 123:635–641.
Hudson RR, Bailey K, Skarecky D, Kwiatowski J and Ayala FJ. 1994. Evidence for 
positive selection in the superoxide dismutase (Sod) region of Drosophila 
melanogaster. Genetics 136:1329–1340.
Huff DR. 1997. RAPD characterization of heterogeneous perennial ryegrass cultivars. 
Crop Science 37:557–564.
Hymowitz T, Singh RJ and Kollipara KP. 1998. The genomes of Glycine. Plant Breeding 
Reviews 16:289–319.
Jaccard P. 1908. Nouvelles recherches sur la distribution florale. Bulletin Société 
Vaudoise Sciences Naturelles 44:223–270.
Jaccoud D, Peng K, Feinstein D and Kilian A. 2001. Diversity arrays: a solid state 
technology for sequence information independent genotyping. Nucleic Acids 
Research 29:e25.
Jarne P and Lagoda PJL. 1996. Microsatellites, from molecules to populations and 
back. Trends in Ecology and Evolution 11:424–429.
Jeffreys AJ, Wilson V and Thein SL. 1985a. Hypervariable “minisatellite” regions in 
human DNA. Nature 314:67–73.
Jeffreys AJ, Wilson V and Thein SL. 1985b. Individual-specific “fingerprints” of 
human DNA. Nature 316:76–79.
Jones CJ, Edwards KJ, Castiglione S, Winfield MO, Sala F, van de Wiel C, Bredemeijer 
G, Vosman B, Matthes M, Daly A, Brettschneider R, Bettini P, Buiatti M, 
Maestri E, Malcevschi A, Marmiroli N, Aert R, Volckaert G, Rueda J, Linacero 
R, Vazquez A and Karp A. 1997. Reproducibility testing of RAPD, AFLP and 
SSR markers in plants by a network of European laboratories. Molecular 
Breeding 3:381–390.
Jørgensen RB and Andersen B. 1994. Spontaneous hybridization between oilseed 
rape (Brassica napus) and weedy B. campestris (Brassicaceae): a risk of 
growing genetically modified oilseed rape. American Journal of Botany 
81:1620–1626.
Jørgensen RB, Andersen B, Landbo L and Mikkelsen T. 1996. Spontaneous 
hybridization between oilseed rape (Brassica napus) and weedy relatives. 
In Proceedings of an International Symposium on Brassicas/Ninth Crucifer 
Genetics Workshop (JS Dias, I Crute and AA Monteiro, eds.). ISHS, Lisbon, 
Portugal. pp. 193–197.
Judd WS, Campbell CS, Kellogg EA and Stevens PF. 2002. Plant systematics: a 
phylogenetic approach (2nd ed.). Sinauer Associates, Inc., Sunderland, 
Massachusetts, USA.
Kalendar R, Grob T, Regina M, Suoniemi A and Schulman A. 1999. IRAP and 
REMAP: Two new retrotransposon-based DNA fingerprinting techniques. 
Theoretical and Applied Genetics 98:704–711.
Kardolus JP, van Eck HJ and van den Berg RG. 1998. The potential of AFLPs in 
biosystematics: a first application in Solanum taxonomy (Solanaceae). Plant 
Systematics and Evolution 210:87–103.
Karp A, Edwards KJ, Bruford M, Funk S, Vosman B, Morgante M, Seberg O, 
Kremer A, Boursot P, Arctander P, Tautz D and Hewitt GM. 1997a. Molecular 
technologies for biodiversity evaluation: opportunities and challenges. 
Nature Biotechnology 15:625–628.
Karp A, Kresovich S, Bhat KV, Ayad WG and Hodgkin T. 1997b. Molecular tools 
in plant genetic resources conservation: a guide to the technologies. IPGRI 
Technical Bulletin No. 2. International Plant Genetic Resources Institute, 
Rome, Italy.
Karp A, Seberg O and Buiatti M. 1996. Molecular techniques in the assessment of 
botanical diversity. Annals of Botany 78:143–149.
Kephart SR. 1990. Starch gel electrophoresis of plant isozymes: a comparative 
analysis of techniques. American Journal of Botany 77:693–712.
Kesseli R, Ochoa O and Michelmore R. 1991. Variation at RFLP loci in Lactuca spp. 
and origin of cultivated lettuce (L. sativa). Genome 34:430–436.
Kiang YT, Antonovics J and Wu L. 1979. The extinction of wild rice (Oryza perennis-
formosana) in Taiwan. Journal of Asian Ecology 1:1–9.
Kiers AM, Mes THM, van der Meijden R and Bachmann K. 2000. A search for 
diagnostic AFLP markers in Cichorium species with emphasis on endive 
and chicory cultivar groups. Genome 43:470–476.
Khlestkina EK, Huang XQ, Quenum FJ-B, Chebotar S, Röder MS and Börner A. 
2004. Genetic diversity in cultivated plants—loss or stability? Theoretical 
and Applied Genetics 108:1466–1472.
Koester RP, Sisco PH and Stuber CW. 1993. Identification of quantitative trait loci 
controlling days to flowering and plant height in two near isogenic lines of 
maize. Crop Science 33:1209–1216.
Kollipara KP, Singh RJ and Hymowitz T. 1997. Phylogenetic and genomic 
relationships in the genus Glycine Willd. based on sequences from the ITS 
region of nuclear rDNA. Genome 40:57–68.
Konieczny A and Ausubel FM. 1993. A procedure for mapping Arabidopsis mutations 
using co-dominant ecotype-specific PCR-based markers. Plant Journal 
4:403–410.
Koopman WJM, Zevenbergen MJ and van den Berg RG. 2001. Species relationships 
in Lactuca s.l. (Lactuceae, Asteraceae) inferred from AFLP fingerprints. 
American Journal of Botany 88:1881–1887.
Koornneef M and Stam P. 2001. Changing paradigms in plant breeding. Plant 
Physiology 125:156–159.
Korzun V, Boerner A, Worland AJ, Law CN and Roeder MS. 1997. Applications of 
microsatellite markers to distinguish inter-varietal chromosome substitution 
lines of wheat (Triticum aestivum L.). Euphytica 95:149–155.
Kota R, Wolf M, Michalek W and Graner A. 2001. Application of denaturing high-
performance liquid chromatography for mapping of single nucleotide 
polymorphisms in barley (Hordeum vulgare L.). Genome 44:523–528.
Krieger M and Ross KG. 2002. Identification of a major gene regulating complex 
social behavior. Science 295:328–332.
Lamboy WF and Alpha CG. 1998. Using simple sequence repeats (SSRs) for DNA 
fingerprinting germplasm accessions of grape (Vitis L.) species. Journal of 
the American Society of Horticultural Science 123:182–188.
Lamboy WF, McFerson JR, Westman AL and Kresovich S. 1994. Application of 
isozyme data to the management of the United States national Brassica 
oleracea L. genetic resources collection. Genetic Resources and Crop Evolution 
41:99–108.
Lamboy WF, Yu J, Forsline PL and Weeden NF. 1996. Partitioning of allozyme 
diversity in wild populations of Malus sieversii L. and implications for 
germplasm collection. Journal of the American Society of Horticultural 
Science 121:982–987.
Lanner HC, Bryngelsson T and Gustafsson M. 1997. Relationships of wild Brassica 
species with chromosome number 2n = 18, based on RFLP studies. Genome 
40:302–308.
Lanner HC, Gustafsson M, Falt AS and Bryngelsson T. 1996. Diversity in natural 
populations of wild Brassica oleracea as estimated by isozyme and RAPD 
analysis. Genetic Resources and Crop Evolution 43:13–23.
Laurie DA and Devos KM. 2002. Trends in comparative genetics and their 
potential impacts on wheat and barley research. Plant Molecular Biology 
48:729–740.
Lee DJ, Berding N, Jackes BR and Bielig LM. 1998. Isozyme markers in Saccharum 
spp. hybrids and Erianthus arundinaceus (Retz.) Jeswiet. Australian Journal of 
Agricultural Research 49:915–921.
Lee LG, Connell CR and Bloch W. 1993. Allelic discrimination by nick-translation 
PCR with fluorogenic probes. Nucleic Acids Research 21:3761–3766.
Lee DJ, Reeves JC and Cooke RJ. 1996. DNA profiling and plant variety registration: 
1. The use of random amplified DNA polymorphisms to discriminate 
between varieties of oilseed rape. Electrophoresis 17:261–265.
Lemieux B, Aharoni A and Schena M. 1998. Overview of DNA chip technology. 
Molecular Breeding 4:277–289.
Lenné JM and Wood D. 1991. Plant diseases and the use of wild germplasm. Annual 
Review of Phytopathology 29:35–63.
Levin DA. 2000. The origin, expansion, and demise of plant species. Oxford 
University Press, New York, New York, USA.
Levin DA, Francisco-Ortega J and Jansen RK. 1996. Hybridization and extinction 
of rare plant species. Conservation Biology 10:10–16.
Li G and Quiros CF. 2001. Sequence-related amplified polymorphism (SRAP), 
a new marker system based on a simple PCR reaction: its application to 
mapping and gene tagging in Brassica. Theoretical and Applied Genetics 
103:455–461.
Li Z, Pinson SRM, Stansel JW and Park WD. 1995. Identification of quantitative 
trait loci (QTLs) for heading date and plant height in cultivated rice (Oryza 
sativa L.). Theoretical and Applied Genetics 91:374–381.
Lin JJ, Kuo J, Jin M, Saunders DA, Beard HS, MacDonald MH, Kenworthy W, 
Ude GN and Matthews BF. 1996. Identification of molecular markers in 
soybean comparing RFLP, RAPD and AFLP DNA mapping techniques. 
Plant Molecular Biology Reporter 14:156–169.
Lincoln R, Boxshall G and Clark P. 1998. A dictionary of ecology, evolution 
and systematics (2nd ed.). Cambridge University Press, Cambridge, UK.
Linder CR, Taha I, Seiler GJ, Snow AA and Rieseberg LH. 1998. Long-term 
introgression of crop genes into wild sunflower populations. Theoretical 
and Applied Genetics 96:339–347.
Link W, Dixkens C, Singh M, Schwall M and Melchinger AE. 1995. Genetic diversity 
in European and Mediterranean faba bean germ plasm revealed by RAPD 
markers. Theoretical and Applied Genetics 90:27–32.
Llewellyn D and Fitt G. 1996. Pollen dispersal from two field trials of transgenic 
cotton in the Namoi Valley, Australia. Molecular Breeding 2:157–166.
Lopez-Sese AI, Staub J, Katzir N and Gomez-Guillamon ML. 2002. Estimation 
of between and within accession variation in selected Spanish melon 
germplasm using RAPD and SSR markers to assess strategies for large 
collection evaluation. Euphytica 127:41–51.
Lu J, Knox MR, Ambrose MJ, Brown JKM and Ellis THN. 1996. Comparative analysis 
of genetic diversity in pea assessed by RFLP- and PCR-based methods. 
Theoretical and Applied Genetics 93:1103–1111.
Maass HI and Klaas M. 1995. Infraspecific differentiation of garlic (Allium sativum 
L.) by isozyme and RAPD markers. Theoretical and Applied Genetics 
91:89–97.
Maass BL and Ocampo CH. 1995. Isozyme polymorphism provides fingerprints 
for germplasm of Arachis glabrata Bentham. Genetic Resources and Crop 
Evolution 42:77–82.
Maddison WP. 1995. Phylogenetic histories within and among species. In 
Experimental and molecular approaches to plant biosystematics (PC Hoch 
and AG Stephenson, eds.). Missouri Botanical Garden, St. Louis, Missouri, 
USA. pp. 273–287.
Maddison WP, Donoghue MJ and Maddison DR. 1984. Outgroup analysis and 
parsimony. Systematic Zoology 33:83–103.
Mallet J. 2001. Species, concepts of. In Encyclopedia of Biodiversity (SA Levin, ed.). 
Academic Press, San Diego, California, USA. pp. 427–440.
Mallet J. 2004. Poulton, Wallace and Jordan: how discoveries in Papilio butterflies 
led to a new species concept 100 years ago. Systematics and Biodiversity 
1:441–452.
Mandolino G, De MS, Faeti V, Bagatta M, Carboni A and Ranalli P. 1996. Stability of 
fingerprints of Solanum tuberosum plants derived from conventional tubers 
and vitrotubers. Plant Breeding 115:439–444.
Manjarrez-Sandoval P, Carter Jr. TE, Webb DM and Burton JW. 1997. Heterosis in 
soybean and its prediction by genetic similarity measures. Crop Science 
37:1443–1452.
Maquet A, Zoro Bi IZ, Delvaux M, Wathelet B and Baudoin JP. 1997. Genetic 
structure of a Lima bean base collection using allozyme markers. Theoretical 
and Applied Genetics 95:980–991.
Maquet A, Zoro Bi IZ, Rocha OJ and Baudoin JP. 1996. Case studies on breeding 
systems and its consequences for germplasm conservation. Genetic Resources 
and Crop Evolution 43:309–318.
Marita JM, Rodríguez JM and Nienhuis J. 2000. Development of an algorithm 
identifying maximally diverse core collections. Genetic Resources and Crop 
Evolution 47:515–526.
Martin C, Juliano A, Newbury HJ, Lu BR, Jackson MT and Ford-Lloyd BV. 1997. The 
use of RAPD markers to facilitate the identification of Oryza species within a 
germplasm collection. Genetic Resources and Crop Evolution 44:175–183.
Matsuoka Y, Mitchell SE, Kresovich S, Goodman M and Doebley J. 2002. 
Microsatellites in Zea—variability, patterns of mutations, and use for 
evolutionary studies. Theoretical and Applied Genetics 104:436–450.
Mau B, Newton MA and Larget B. 1999. Bayesian phylogenetic inference via Markov 
chain Monte Carlo analysis. Biometrics 55:1–12.
Maughan PJ, Saghai Maroof MA and Buss GR. 1995. Microsatellite and amplified 
sequence length polymorphisms in cultivated and wild soybean. Genome 
38:715–723.
May B. 1992. Starch gel electrophoresis of allozymes. In Molecular genetic analysis 
of populations: a practical approach (AR Hoelzel, ed.). Oxford University 
Press, Oxford, UK. pp. 1–27.
Mayden RL. 1997. A hierarchy of species concepts: the denouement of the saga of 
the species problem. In Species: the units of biodiversity (MF Claridge, HA 
Dawah and MR Wilson, eds.). Chapman and Hall, New York, New York, 
USA. pp. 381–424.
Mayer MS, Tullu A, Simon CJ, Kumar J, Kaiser WJ, Kraft JM and Muehlbauer FJ. 1997. 
Development of a DNA marker for Fusarium wilt resistance in chickpea. Crop 
Science 37:1625–1629.
Mayr E. 1942. Systematics and the origin of species. Columbia University Press, 
New York, USA.
McGregor CE, van Treuren R, Hoekstra R and van Hintum TJL. 2002. Analysis of 
the wild potato germplasm of the series Acaulia with AFLPs: implications 
for ex situ conservation. Theoretical and Applied Genetics 104:146–156.
Melchinger AE, Graner A, Singh M and Messmer MM. 1994. Relationships among 
European barley germplasm. I. Genetic diversity among winter and spring 
cultivars revealed by RFLPs. Crop Science 34:1191–1199.
Menkir A, Goldsbrough P and Ejeta G. 1997. RAPD based assessment of genetic 
diversity in cultivated races of sorghum. Crop Science 37:564–569.
Michener CD. 1963. Some future developments in taxonomy. Systematic Zoology 
12:151–172.
Milbourne D, Meyer R, Bradshaw JE, Baird E, Bonar N, Provan J, Powell W and 
Waugh R. 1997. Comparison of PCR-based marker systems for the analysis of 
genetic relationships in cultivated potato. Molecular Breeding 3:127–136.
Miller DR and Rossman AY. 1995. Systematics, biodiversity, and agriculture. 
Bioscience 45:680–686.
Miller JT and Spooner DM. 1996. Introgression of Solanum chacoense (Solanum sect. 
Petota) upland populations reexamined. Systematic Botany 21:461–475. 
Miller JT and Spooner DM. 1999. Collapse of species boundaries in the wild potato 
Solanum brevicaule complex (Solanaceae, S. sect. Petota): molecular data. Plant 
Systematics and Evolution 214:103–130.
Miller JC and Tanksley SD. 1990. RFLP analysis of phylogenetic relationships 
and genetic variation in the genus Lycopersicon. Theoretical and Applied 
Genetics 80:437–448.
Minghetti PP and Dugaiczyk A. 1993. The emergence of new DNA repeats and the 
divergence of primates. Proceedings of the National Academy of Sciences 
USA 90:1872–1876.
Morgante M and Olivieri AM. 1993. PCR-amplified microsatellites as markers in 
plant genetics. Plant Journal 3:175–182.
Morgante M, Hanafey M and Powell W. 2002. Microsatellites are preferentially 
associated with nonrepetitive DNA in plant genomes. Nature Genetics 
30:194–200.
Mort ME and Crawford DJ. 2004. The continuing search: low-copy nuclear 
sequences for lower level plant molecular phylogenetic studies. Taxon 
53:257–261.
Mumm RH and Dudley JW. 1994. A classification of 148 U.S. maize inbreds: I. 
Cluster analysis based on RFLPs. Crop Science 34:842–851.
Murphy JP and Phillips TD. 1993. Isozyme variation in cultivated oat and its 
progenitor species, Avena sterilis L. Crop Science 33:1366–1372.
Nagaoka T and Ogihara Y. 1997. Applicability of inter-simple sequence repeat 
polymorphisms in wheat for use as DNA markers in comparison to RFLP 
and RAPD markers. Theoretical and Applied Genetics 94:597–602.
National Academy of Sciences. 1989. Field testing genetically modified organisms: 
framework for decisions. National Academy Press, Washington, D.C., USA.
National Research Council. 1972. Genetic vulnerability of major crops. National 
Academy of Sciences, Washington, D.C., USA.
National Research Council. 2002. Environmental effects of transgenic plants: the 
scope and adequacy of regulation. National Academy Press, Washington, 
D.C., USA.
Neale DB and Williams CG. 1991. Restriction fragment length polymorphism 
mapping in conifers and applications to forest genetics and tree improvement. 
Canadian Journal of Forest Research 21:545–554.
Nebauer SG, del Castillo-Agudo L and Segura J. 1999. RAPD variation within and 
among natural populations of outcrossing willow-leaved foxglove (Digitalis 
obscura L.). Theoretical and Applied Genetics 98:985–994.
Negash A, Tsegaye A, van Treuren R and Visser L. 2002. AFLP analysis of enset 
clonal diversity in south and southwestern Ethiopia for conservation. Crop 
Science 42:1105–1111.
Nei M and Li WH. 1979. Mathematical model for studying genetic variation in 
terms of restriction endonucleases. Proceedings of the National Academy of 
Sciences USA 76:5269–5273.
Nevo E, Apelbaum Elkaher I, Garty J and Beiles A. 1997. Natural selection causes 
microscale allozyme diversity in wild barley and a lichen at ‘Evolution 
Canyon’, Mt. Carmel, Israel. Heredity 78:373–382.
Nevo E, Baum B, Beiles A and Johnson DA. 1998. Ecological correlates of RAPD DNA 
diversity of wild barley, Hordeum spontaneum, in the Fertile Crescent. Genetic 
Resources and Crop Evolution 45:151–159.
Nevo E, Golenberg E, Beiles A, Brown AHD and Zohary D. 1982. Genetic diversity 
and environmental associations of wild wheat, Triticum dicoccoides, in Israel. 
Theoretical and Applied Genetics 62:241–254.
Oberwalder B, Ruoss B, Schilde Rentschler L, Hemleben V and Ninnemann H. 1997. 
Asymmetric fusion between wild and cultivated species of potato (Solanum 
spp.)—detection of asymmetric hybrids and genome elimination. Theoretical 
and Applied Genetics 94:1104–1112.
O’Donoughue LS, Souza E, Tanksley SD and Sorrells ME. 1994. Relationships 
among North American oat cultivars based on restriction fragment length 
polymorphisms. Crop Science 34:1251–1258.
Olmstead RG. 1995. Species concepts and plesiomorphic species. Systematic Botany 
20:623–630.
Olufowote JO, Xu Y, Chen X, Park WD, Beachell HM, Dilday RH, Goto M and 
McCouch SR. 1997. Comparative evaluation of within-cultivar variation 
of rice (Oryza sativa L.) using microsatellite and RFLP markers. Genome 
40:370–378.
Palombi MA and Damiano C. 2002. Comparison between RAPD and SSR molecular 
markers in detecting genetic variation in kiwifruit (Actinidia deliciosa A. Chev). 
Plant Cell Reports 20:1061–1066.
Paran I and Michelmore RW. 1993. Development of reliable PCR based markers 
linked to downy mildew resistance genes in lettuce. Theoretical and Applied 
Genetics 85:985–993.
Parani M, Singh KN, Rangasamy S and Ramalingam RS. 1997. Identification of 
Sesamum alatum × Sesamum indicum hybrid using protein, isozyme and RAPD 
markers. Indian Journal of Genetics and Plant Breeding 57:381–388.
Pardey PG, Koo B, Wright BD, van Dusen ME, Skovmand B and Taba S. 2001. Costing 
the conservation of genetic resources: CIMMYT’s ex situ maize and wheat 
collection. Crop Science 41:1286–1299.
Parker PG, Snow AA, Schug MD, Booton GC and Fuerst PA. 1998. What molecules 
can tell us about populations: choosing and using a molecular marker. 
Ecology 79:361–382.
Parsons BJ, Newbury HJ, Jackson MT and Ford-Lloyd BV. 1997. Contrasting genetic 
diversity relationships are revealed in rice (Oryza sativa L.) using different 
marker types. Molecular Breeding 3:115–125.
Parzies HK, Spoor W and Ennos RA. 2000. Genetic diversity of barley landrace 
accessions (Hordeum vulgare ssp. vulgare) conserved for different lengths of 
time in ex situ gene banks. Heredity 84:476–486.
Patterson TB and Givnish TJ. 2002. Phylogeny, concerted convergence, and 
phylogenetic niche conservatism in the core Liliales: insights from rbcL and 
ndhF sequence data. Evolution 56:233–252.
Paz MM and Veilleux RE. 1997. Genetic diversity based on randomly amplified 
polymorphic DNA (RAPD) and its relationship with the performance of 
diploid potato hybrids. Journal of the American Society of Horticultural 
Science 122:740–747.
Pejic I, Ajmone-Marsan P, Morgante M, Kozumplik V, Castiglioni P, Taramino G 
and Motto M. 1998. Comparative analysis of genetic similarity among maize 
inbred lines detected by RFLPs, RAPDs, SSRs and AFLPs. Theoretical and 
Applied Genetics 97:1248–1255.
Penteado MID, de Miera LES and de la Vega MP. 1996. Genetic resources of 
Centrosema spp: Genetic changes associated to the handling of an active 
collection. Genetic Resources and Crop Evolution 43:85–90.
Perez JA, Maca N and Larruga JM. 1999. Expanding informativeness of microsatellite 
motifs through the analysis of heteroduplexes: a case applied to Solanum 
tuberosum. Theoretical and Applied Genetics 99:481–486.
Peroutka SJ. 1997. The medical utility of genomics data in neuropsychiatry: 
mutational genetics versus association genetics. Current Opinion in 
Biotechnology 8:688–691.
Petersen L, Ostergard H and Giese H. 1994. Genetic diversity among wild and 
cultivated barley as revealed by RFLP. Theoretical and Applied Genetics 
89:676–681.
Pflieger S, Lefebvre V, Caranta C, Blattes A, Goffinet B and Palloix A. 1999. Disease 
resistance gene analogs as candidates for QTLs involved in pepper–pathogen 
interactions. Genome 42:1100–1110.
Phippen WB, Kresovich S, Candelas FG and McFerson JR. 1997. Molecular 
characterization can quantify and partition variation among genebank 
holdings: a case study with phenotypically similar accessions of Brassica 
oleracea var. capitata L. (cabbage) ‘Golden Acre’. Theoretical and Applied 
Genetics 94:227–234.
Poulton EB. 1904. What is a species? Proceedings of the Entomological Society of 
London, UK. 1903: lxxvii–cxvi.
Powell W, Machray GC and Provan J. 1996a. Polymorphism revealed by simple 
sequence repeats. Trends in Plant Science 1:215–222.
Powell W, Morgante M, Andre C, Hanafey M, Vogel J, Tingey S and Rafalski A. 
1996b. The comparison of RFLP, RAPD, AFLP and SSR (microsatellite) 
markers for germplasm analysis. Molecular Breeding 2:225–238.
Prabhu RR, Webb D, Jessen H, Luk S, Smith S and Gresshoff PM. 1997. 
Genetic relatedness among soybean genotypes using DNA amplification 
fingerprinting (DAF), RFLP and pedigree. Crop Science 37:1590–1595.
Provan J, Kumar A, Shepherd L, Powell W and Waugh R. 1996. Analysis of 
intra-specific somatic hybrids of potato (Solanum tuberosum) using simple 
sequence repeats. Plant Cell Reports 16:196–199.
Queller DC, Strassmann JE and Hughes CR. 1993. Microsatellites and kinship. 
Trends in Ecology and Evolution 8:285–288.
Rabinowitz D, Linder CR, Ortega R, Begazo D, Murguia H, Douches DS and 
Quiros CF. 1990. High levels of interspecific hybridization between Solanum 
sparsipilum and S. stenotomum in experimental plots in the Andes. American 
Potato Journal 67:73–81.
Rafalski JA. 2002. Novel genetic mapping tools in plants: SNPs and LD-based 
approaches. Plant Science 162:329–333.
Ramsay G. 1998. DNA chips: State-of-the art. Nature Biotechnology 16:40–44.
Reed DH and Frankham R. 2001. How closely correlated are molecular and 
quantitative measures of genetic variation? A meta-analysis. Evolution 
55:1095–1103.
Reed DH and Frankham R. 2003. Correlation between fitness and genetic diversity. 
Conservation Biology 17:230–237.
Reedy ME, Knapp AD and Lamkey KR. 1995. Isozyme allelic frequency changes 
following maize (Zea mays L.) germplasm regeneration. Maydica 40:269–
273.
Rhymer JM and Simberloff D. 1996. Extinction by hybridization and introgression. 
Annual Review of Ecology and Systematics 27:83–109.
Rick CM. 1963. Barriers to interbreeding in Lycopersicon peruvianum. Evolution 
17:216–232.
Rick CM. 1973. Potential genetic resources in tomato species: clues from 
observations in native habitats. In Genes, enzymes, and populations (AM 
Srb, ed.). Plenum Press, New York, USA. pp. 255–269.
Rick CM. 1979. Biosystematic studies in Lycopersicon and closely related species in 
Solanum. In The biology and taxonomy of the Solanaceae. Linnean Society 
of London Symposium. Series 7 (JG Hawkes, RN Lester and AD Skelding, 
eds.). Academic Press, New York, USA. pp. 667–678 + 1 pl.
Rick CM and Fobes JF. 1974. Association of an allozyme with nematode resistance. 
Tomato Genetics Cooperative Report 24:25.
Riedel GE, Swanberg SL, Kuranda KD, Marquette K, LaPan P, Bledsoe P, Kennedy 
A and Lin BY. 1990. Denaturing gradient gel electrophoresis identifies 
genomic DNA polymorphism with high frequency in maize. Theoretical 
and Applied Genetics 80:1–10.
Rieger MA, Lamond M, Preston C, Powles SB and Roush RT. 2002. Pollen-mediated 
movement of herbicide resistance between commercial canola fields. Science 
296:2386–2388.
Rieseberg LH. 1991. Homoploid reticulate evolution in Helianthus (Asteraceae): 
evidence from ribosomal genes. American Journal of Botany 78:1218–
1237.
Rieseberg LH. 1995. The role of hybridization in evolution: old wine in new skins. 
American Journal of Botany 82:944–953.
Rieseberg LH and Brouillet L. 1994. Are many plant species paraphyletic? Taxon 
43:21–32.
Rieseberg LH and Burke JM. 2001. The biological reality of species: gene flow, 
selection and collective evolution. Taxon 50:47–67.
Rieseberg LH and Ellstrand NC. 1993. What can molecular and morphological 
markers tell us about plant hybridization? Critical Reviews in Plant Science 
12:213–241.
Rieseberg LH, Sinervo B, Linder CR, Ungerer M and Arias DM. 1996. Role of gene 
interactions in hybrid speciation: evidence from ancient and experimental 
hybrids. Science 272:741–745. 
Riesner D, Steger G, Zimmat R, Owens RA, Wagenhofer M, Hillen W, Vollbach 
S and Henco K. 1989. Temperature-gradient gel electrophoresis of nucleic 
acids: analysis of conformational transitions, sequence variations, and 
protein-nucleic acid interactions. Electrophoresis 10:377–389.
Ritala A, Nuutila AM, Aikasalo R, Kauppinen V and Tammisola J. 2002. Measuring 
gene flow in the cultivation of transgenic barley. Crop Science 42:278–285.
Roa AC, Maya MM, Duque MC, Tohme J, Allem AC and Bonierbale MW. 1997. 
AFLP analysis of relationships among cassava and other Manihot species. 
Theoretical and Applied Genetics 95:741–750.
Röder MS, Wendehake K, Korzun V, Bredemeijer G, Laborie D, Bertrand L, Isaac P, 
Rendell S, Jackson J, Cooke RJ, Vosman B and Ganal MW. 2002. Construction 
and analysis of a microsatellite-based database of European wheat varieties. 
Theoretical and Applied Genetics 106:67–73.
Rodman J, Price RA and Karol K. 1993. Nucleotide sequences of the rbcL gene 
indicate monophyly of mustard oil plants. Annals of the Missouri Botanical 
Garden 80:686–699.
Rohlf FJ. 1992. NTSYS-pc, numerical taxonomy and multivariate system. Exeter 
Publishing, New York, USA.
Rokas A, Williams BL, King N and Carroll SB. 2003. Genome-scale approaches to 
resolving incongruence in molecular phylogenies. Nature 425:798–804.
Rollins RC. 1965. On the basis of biological classification. Taxon 14:1–6.
Ronaghi M, Uhlén M and Nyrén P. 1998. A sequencing method based on real-time 
pyrophosphate. Science 281:363.
Ronning CM and Schnell RJ. 1994. Allozyme diversity in a germplasm collection 
of Theobroma cacao L. Journal of Heredity 85:291–295.
Ross H. 1986. Potato Breeding—Problems and Perspectives. Advances in Plant 
Breeding Supplement 13. Journal of Plant Breeding. Verlag Paul Parey, 
Berlin, Germany.
Ruiz de Galarreta JI, Carrasco A, Salazar A, Barrena I, Iturritxa E, Marquinez R, 
Legorburu FJ and Ritter E. 1998. Wild Solanum species as resistance sources 
against different pathogens of potato. Potato Research 41:57–68.
Russell JR, Fuller JD, Macaulay M, Hatz BG, Jahoor A, Powell W and Waugh 
R. 1997a. Direct comparison of levels of genetic variation among barley 
accessions detected by RFLPs, AFLPs, SSRs and RAPDs. Theoretical and 
Applied Genetics 95:714–722.
Russell JR, Fuller JD, Young G, Thomas B, Taramino G, Macaulay M, Waugh R 
and Powell W. 1997b. Discriminating between barley genotypes using 
microsatellite markers. Genome 40:442–450.
Saal B and Wricke G. 2002. Clustering of amplified fragment length polymorphism 
markers in a linkage map of rye. Plant Breeding 121:117–123.
Saeglitz C, Pohl M and Bartsch D. 2000. Monitoring gene flow from transgenic 
sugar beet using cytoplasmic male-sterile bait plants. Molecular Ecology 
9:2035–2040.
Sackville Hamilton NR, Engels JMM, van Hintum TJL, Koo B and Smale M. 2002. 
Accession management. Combining or splitting accessions as a tool to 
improve germplasm management efficiency. IPGRI Technical Bulletin No. 
5. International Plant Genetic Resources Institute, Rome, Italy.
Sackville Hamilton NR and Chorlton KH. 1997. Regeneration of accessions in seed 
collections: a decision guide. Handbook for Genebanks No. 5. International 
Plant Genetic Resources Institute, Rome, Italy.
Saitou N and Nei M. 1987. The neighbor-joining method: a new method for 
reconstructing phylogenetic trees. Molecular Biology and Evolution 
4:406–425.
Salisbury PA. 2000. The myths of gene transfer: A canola case study. Plant 
Protection Quarterly 15:71–76.
Sanger F, Nicklen S and Coulson AR. 1977. DNA sequencing with chain-
terminating inhibitors. Proceedings of the National Academy of Sciences 
USA 74:5463–5467.
Santacruz-Varela A, Widrlechner MP, Ziegler KE, Salvador RJ, Mallard MJ and 
Bretting PK. 2004. Phylogenetic relationships among North American 
popcorns and their evolutionary links to Mexican and South American 
popcorns. Crop Science 44:1456–1467.
Schierwater B and Ender A. 1993. Different thermostable DNA polymerases may 
apply to different RAPD products. Nucleic Acids Research 21:4647–4648.
Schittenhelm S, Gladis T and Rao VR. 1997. Efficiency of various insects in 
germplasm regeneration of carrot, onion and turnip rape accessions. Plant 
Breeding 116:369–375.
Schneider K, Borchardt DC, Schafer-Pregl R, Nagl N, Glass C, Jeppsson A, 
Gebhardt C and Salamini F. 1999. PCR-based cloning and segregation 
analysis of functional gene homologues in Beta vulgaris. Molecular and General 
Genetics 262:515–524.
Schneider K and Douches DS. 1997. Assessment of PCR-based simple sequence 
repeats to fingerprint North American potato cultivars. American Potato 
Journal 74:149–160.
Schoen DJ and Brown AHD. 1993. Conservation of allelic richness in wild crop 
relatives is aided by assessment of genetic markers. Proceedings of the 
National Academy of Sciences USA 90:10623–10627.
Schoen DJ and Brown AHD. 1995. Maximising genetic diversity in core collections 
of wild relatives of crop species. In Core collections of plant genetic 
resources (T Hodgkin, AHD Brown, TJL van Hintum and EAV Morales, eds.). 
John Wiley & Sons, Chichester, UK. pp. 55–76.
Schoen DJ, David JL and Bataillon TM. 1998. Deleterious mutation accumulation 
and regeneration of genetic resources. Proceedings of the National Academy 
of Sciences USA 95:394–399.
Schut JW, Qi X and Stam P. 1997. Association between relationship measures 
based on AFLP markers, pedigree data and morphological traits in barley. 
Theoretical and Applied Genetics 95:1161–1168.
Schlötterer C. 2004. The evolution of molecular markers—just a matter of fashion? 
Nature Reviews Genetics 5:63–69.
Scott MC, Caetano AG and Trigiano RN. 1996. DNA amplification fingerprinting 
identifies closely related Chrysanthemum cultivars. Journal of the American 
Society of Horticultural Science 121:1043–1048.
Second G. 1982. Origin of the genic diversity of cultivated rice (Oryza spp.): study 
of the polymorphism scored at 40 isozyme loci. Japanese Journal of Genetics 
57:25–57.
Sharma SK, Dawson IK and Waugh R. 1995. Relationships among cultivated and 
wild lentils revealed by RAPD analysis. Theoretical and Applied Genetics 
91:647–654.
Sharma SK, Knox MR and Ellis THN. 1996. AFLP analysis of the diversity and 
phylogeny of Lens and its comparison with RAPD analysis. Theoretical and 
Applied Genetics 93:751–758.
Shashidhara G, Hema MV, Koshy B and Farooqi AA. 2003. Assessment of genetic 
diversity and identification of core collection in sandalwood germplasm 
using RAPDs. Journal of Horticultural Science and Biotechnology 78:528–536.
Shenoy VV, Seshu DV and Sachan JKS. 1990. Shikimate dehydrogenase-1(2) 
allozyme as a marker for high seed protein content in rice. Crop Science 
30:937–940.
Shimamura M, Yasue H, Oshima K, Abe H, Kato H, Kishiro T, Goto M, Munechika 
I and Okada N. 1997. Molecular evidence from retrotransposons that whales 
form a clade within even-toed ungulates. Nature 388:666–670.
Sicard D, Woo SS, Arroyo-Garcia R, Ochoa O, Nguyen D, Korol A, Nevo E and 
Michelmore R. 1999. Molecular diversity at the major cluster of disease 
resistance genes in cultivated and wild Lactuca spp. Theoretical and Applied 
Genetics 99:405–418.
Singh RJ, Kollipara KP and Hymowitz T. 1998. The genomes of Glycine canescens F.J. 
Herm., and G. tomentella Hayata of Western Australia and their phylogenetic 
relationships in the genus Glycine Willd. Genome 41:669–679. 
Singh AK and Smartt J. 1998. The genome donors of the groundnut/peanut (Arachis 
hypogaea L.) revisited. Genetic Resources and Crop Evolution 45:113–118.
Skogsmyr I. 1994. Gene dispersal from transgenic potatoes to conspecifics: a field 
trial. Theoretical and Applied Genetics 88:770–774.
Skroch PW, Dobert RC, Triplett EW and Nienhuis J. 1993. Polymorphism of the 
leghemoglobin gene in Phaseolus demonstrated by polymerase chain reaction 
amplification. Euphytica 69:177–183.
Skroch PW, Nienhuis J, Beebe S, Tohme J and Pedraza F. 1998. Comparison of 
Mexican common bean (Phaseolus vulgaris L.) core and reserve germplasm 
collections. Crop Science 38:488–496.
Small RL, Cronn RC and Wendel JF. 2004. Use of nuclear genes for phylogeny 
reconstruction in plants. Australian Systematic Botany 17:145–170.
Smartt J and Simmonds NW. 1995. Evolution of crop plants. Longman Scientific 
and Technical, Essex, UK.
Smith OS, Smith JSC, Bowen SL, Tenborg RA and Wall SJ. 1990. Similarities among 
a group of elite maize inbreds as measured by pedigree, F1 grain yield, grain 
yield, heterosis and RFLPs. Theoretical and Applied Genetics 80:833–840.
Sneath PHA and Sokal RR. 1962. Numerical taxonomy: the principles and practice 
of numerical classification. W.H. Freeman and Company, New York.
Snow AA and Palma PM. 1997. Commercialization of transgenic plants: potential 
ecological risks. BioScience 47:86–96.
Sokal RR. 1985. The continuing search for order. American Naturalist 126:729–
749.
Sokal RR and Crovello TJ. 1970. The biological species concept: a critical evaluation. 
American Naturalist 104:127–153.
Soltis DE, Soltis PE, Morgan DR, Swensen SM, Mullin BC, Down JM and Martin 
PG. 1995. Chloroplast gene sequence data suggest a single origin of the 
predisposition for symbiotic nitrogen fixation in angiosperms. Proceedings 
of the National Academy of Sciences USA 92:2647–2651.
Somers DJ and Demmon G. 2002. Identification of repetitive, genome-specific probes 
in crucifer oilseed species. Genome 45:485–492.
Song K, Liu P and Osborn TC. 1995. Rapid genome change in synthetic polyploids 
of Brassica and its implications for polyploid evolution. Proceedings of the 
National Academy of Sciences USA 92:7719–7723.
Sonnante G, Stockton T, Nodari RO, Becerra Velásquez VL and Gepts P. 1994. 
Evolution of genetic diversity during the domestication of common bean 
(Phaseolus vulgaris L.). Theoretical and Applied Genetics 89:629–635.
Spagnoletti-Zeuli PL, Sergio L and Perrino P. 1995. Changes in the genetic structure 
of wheat germplasm accessions during seed rejuvenation. Plant Breeding 
114:193–198.
Spooner DM, Hetterscheid WLA, van den Berg RG and Brandenburg W. 2003. Plant 
nomenclature and taxonomy: An horticultural and agronomic perspective. 
Horticultural Reviews 28:1–60.
Spooner D and Lara-Cabrera S. 2001. Sistemática molecular y evolución de plantas 
cultivadas. In Enfoques Contemporáneos para el estudio de la biodiversidad 
(HM Hernández, A García-Aldrete, F Alvarez and M Ulloa, eds.). Instituto 
de Biología, UNAM/ Fondo de Cultura Económica, Mexico. pp. 57–114.
Spooner DM, Tivang J, Nienhuis J, Miller JT, Douches DS and Contreras-M A. 1996. 
Comparison of four molecular markers in measuring relationships among 
the wild potato relatives Solanum section Etuberosum (Subgenus Potatoe). 
Theoretical and Applied Genetics 92:532–540.
Spooner DM, Sytsma KJ and Smith JF. 1991. A molecular reexamination of diploid 
hybrid speciation of Solanum raphanifolium. Evolution 45:757–764.
Stahlhut R, Park G, Petersen R, Ma W and Hylands P. 1999. The occurrence of the 
anti-cancer diterpene taxol in Podocarpus gracilior Pilger (Podocarpaceae). 
Biochemical Systematics and Ecology 27:613–622.
St. Amand PC. 2004. Risks associated with genetically engineered crops. In 
Genetically modified crops: their development, uses, and risks (GH Liang 
and D Skinner, eds.). Haworth Press, Inc., Binghamton, New York, USA.
St. Amand PC, Skinner DZ and Peaden RN. 2000. Risk of alfalfa transgene 
dissemination and scale-dependent effects. Theoretical and Applied Genetics 
101:107–114.
Staub JE, Serquen FC and Gupta M. 1996. Genetic markers, map construction, and 
their application in plant breeding. HortScience 31:729–741.
Steiner JJ, Piccioni E, Falcinelli M and Liston A. 1998. Germplasm diversity 
among cultivars and the NPGS crimson clover collection. Crop Science 
38:263–271.
Steiner AM, Ruckenbauer P and Goecke E. 1997. Maintenance in genebanks, a case study: 
contaminations observed in the Nurnberg oats of 1831. Genetic Resources and 
Crop Evolution 44:533–538.
Steinmetz LM, Mindrinos M and Oefner PJ. 2000. Combining genome sequences 
and new technologies for dissecting the genetics of complex phenotypes. 
Trends in Plant Science 5:397–401.
Stevens PF. 1998. What kind of classification should the practicing taxonomist 
use to be saved? In Plant diversity in Malesia III: Proceedings of the 3rd 
International Flora Malesiana Symposium 1995 (J Dransfield, MJE Coode 
and DA Simpson, eds.). Royal Botanic Gardens, Kew, UK. pp. 295–319.
Struss D and Plieske J. 1998. The use of microsatellite markers for detection of 
genetic diversity in barley populations. Theoretical and Applied Genetics 
97:308–315.
Stuber CW, Polacco M and Senior ML. 1999. Synergy of empirical breeding, 
marker-assisted selection, and genomics to increase crop yield potential. 
Crop Science 39:1571–1583.
Stuessy TF. 1990. Plant taxonomy: the systematic evaluation of comparative data. 
Columbia University Press, New York, USA.
Swanson T. 1996. Global values of biological diversity: the public interest in 
the conservation of plant genetic resources for agriculture. Plant Genetic 
Resources Newsletter 105:1–7.
Sytsma KJ and Hahn W. 1997. Molecular systematics: 1994–1995. Progress in Botany 
58:470–499.
Takaiwa F, Oono K and Sugiura M. 1985. Nucleotide sequence of the 17S–25S spacer 
region from rice rDNA. Plant Molecular Biology 4:355–364.
Tang X and Zhang W. 1992. Studies on pre-selection of dwarf apple seedlings by 
starch gel electrophoresis. Acta Horticulturae 317:29–34.
Tanksley SD and McCouch SR. 1997. Seed banks and molecular maps: unlocking 
genetic potential from the wild. Science 277:1063–1066.
Tanksley SD and Nelson JC. 1996. Advanced backcross QTL analysis: a method for 
the simultaneous discovery and transfer of valuable QTLs from unadapted 
germplasm into elite breeding lines. Theoretical and Applied Genetics 
92:191–203. 
Tanksley SD and Orton TJ. 1983. Isozymes in plant genetics and breeding. Elsevier 
Science Publishers, Amsterdam, The Netherlands.
Tao R and Sugiura A. 1987. Cultivar identification of Japanese persimmon by leaf 
isozymes. HortScience 22:932–935.
Telenius H, Carter NP, Bebb CE, Nordenskjold M, Ponder BAJ and Tunnacliffe A. 
1992. Degenerate oligonucleotide-primed PCR: general amplification of target 
DNA by a single degenerate primer. Genomics 13:718–725.
Templeton AR. 1986. Coadaptation and outbreeding depression. In Conservation 
biology: the science of scarcity and diversity (M Soulé, ed.). Sinauer Associates, 
Inc., Sunderland, Massachusetts, USA. pp. 105–116.
Templeton AR. 1989. The meaning of species and speciation: a genetic perspective. 
In Speciation and its consequences (D Otte and JA Endler, eds.). Sinauer 
Associates, Inc., Sunderland, Massachusetts, USA. pp. 3–27.
Thiel T, Michalek W, Varshney RK and Graner A. 2003. Exploiting EST databases for 
the development and characterization of gene-derived SSR-markers in barley 
(Hordeum vulgare L.). Theoretical and Applied Genetics 106:411–422.
Thieme R, Darsow U, Gavrilenko T, Dorokhov D and Tiemann H. 1997. Production 
of somatic hybrids between S. tuberosum L. and late blight resistant Mexican 
wild potato species. Euphytica 97:189–200.
Thorman CE, Ferreira ME, Camargo LEA, Tivang JG and Osborn TC. 1994. 
Comparison of RFLP and RAPD markers to estimating genetic relationships 
within and among cruciferous species. Theoretical and Applied Genetics 
88:973–980.
Timmons AM, O’Brien ET, Charters YM, Dubbels SJ and Wilkinson MJ. 1995. 
Assessing the risks of wind pollination from fields of genetically modified 
Brassica napus ssp. oleifera. Euphytica 85:417–423.
Tohme J, Gonzalez DO, Beebe S and Duque MC. 1996. AFLP analysis of gene pools 
of a wild bean core collection. Crop Science 36:1375–1384.
Tsaftaris SA and Shull GH. 1995. Molecular aspects of heterosis in plants. Physiologia 
Plantarum 94:362–370.
Tsegaye S, Tesemma T and Belay G. 1996. Relationships among tetraploid wheat 
(Triticum turgidum L.) landrace populations revealed by isozyme markers and 
agronomic traits. Theoretical and Applied Genetics 93:600–605.
Urrea CA, Miklas PN, Beaver JS and Riley RH. 1996. A codominant randomly 
amplified polymorphic DNA (RAPD) marker useful for indirect selection of 
bean golden mosaic virus resistance in common bean. Journal of the American 
Society for Horticultural Science 121:1035–1039.
Ursla FWM, Hayward MD and Kearsey MJ. 1997. Isozyme and quantitative traits 
polymorphisms in European provenances of perennial ryegrass (Lolium 
perenne L.). Euphytica 93:263–269.
van der Linden G, Smulders MJM and Vosman B. 2004. Motif-directed profiling: 
a glance at molecular evolution. In Plant species-level systematics: new 
perspectives on pattern and process (FT Bakker, LW Chatrou, B Gravendeel 
and PB Pelser, eds.). Regnum Vegetabile 142. Koeltz, Königstein (in press).
van Hintum TJL. 1994. Comparison of marker systems and construction of a core 
collection in a pedigree of European spring barley. Theoretical and Applied 
Genetics 89:991–997.
van Hintum TJL, Boukema IW and Visser DL. 1996. Reduction of duplication in a 
Brassica oleracea germplasm collection. Genetic Resources and Crop Evolution 
43:343–349.
van Hintum TJL, Brown AHD, Spillane C and Hodgkin T. 2000. Core collections of plant 
genetic resources. IPGRI Technical Bulletin No. 3. International Plant Genetic 
Resources Institute, Rome, Italy.
van Hintum TJL and Visser DL. 1995. Duplication within and between germplasm 
collections. II. Duplication in four European barley collections. Genetic Resources 
and Crop Evolution 42:135–145.
van Hintum TJL, von Bothmer R and Visser DL. 1995. Sampling strategies for composing 
a core collection of cultivated barley (Hordeum vulgare s. lat.) collected in China. 
Hereditas 122:7–17.
van Raamsdonk LWD and van der Maesen LJG. 1996. Crop-weed complexes: The 
complex relationship between crop plants and their wild relatives. Acta Botanica 
Neerlandica 45:135–155.
van Treuren R, Bijlsma R, Tinbergen JM, Heg D and van de Zande L. 1999. Genetic 
analysis of the population structure of socially organized oystercatchers 
(Haematopus ostralegus) using microsatellites. Molecular Ecology 8:181–187.
van Treuren R, Magda A, Hoekstra R and van Hintum TJL. 2004. Genetic and economic 
aspects of marker-assisted reduction of redundancy from a wild potato 
germplasm collection. Genetic Resources and Crop Evolution 51:277–290. 
van Treuren R and van Hintum TJL. 2001. Identification of intra-accession genetic 
diversity in selfing crops using AFLP markers: implications for collection 
management. Genetic Resources and Crop Evolution 48:287–295.
van Treuren R, van Soest LJM and van Hintum TJL. 2001. Marker-assisted rationalisation 
of genetic resources collections: a case study in flax using AFLPs. Theoretical 
and Applied Genetics 103:144–152.
van Valen L. 1976. Ecological species, multispecies and oaks. Taxon 25:233–239.
Varghese YA, Knaak C, Sethuraj MR and Ecke W. 1997. Evaluation of random amplified 
polymorphic DNA (RAPD) markers in Hevea brasiliensis. Plant Breeding 
116:47–52.
Vignani R, Bowers JE and Meredith CP. 1996. Microsatellite DNA polymorphism 
analysis of clones of Vitis vinifera ‘Sangiovese’. Scientia Horticulturae 
65:163–169.
Virk PS, Newbury HJ, Jackson MT and Ford-Lloyd BV. 1995. The identification of 
duplicate accessions within a rice germplasm collection using RAPD analysis. 
Theoretical and Applied Genetics 90:1049–1055.
Virk PS, Newbury HJ, Jackson MT and Ford-Lloyd BV. 2000a. Are mapped markers 
more useful for assessing genetic diversity? Theoretical and Applied Genetics 
100:607–613.
Virk PS, Zhu J, Newbury HJ, Bryan GJ, Jackson MT and Ford-Lloyd BV. 2000b. 
Effectiveness of different classes of molecular marker for classifying and revealing 
variation in rice (Oryza sativa) germplasm. Euphytica 112:275–284.
von Bothmer R and Seberg O. 1995. Strategies for the collecting of wild species. In 
Collecting plant genetic diversity: technical guidelines (L Guarino, V Ramanatha 
Rao and R Reid, eds.). CAB International, Wallingford, UK. pp. 93–111.
Vos P, Hogers R, Bleeker M, Reijans M, van de Lee T, Hornes M, Frijters A, Pot J, Peleman 
J, Kuiper M and Zabeau M. 1995. AFLP: a new technique for DNA fingerprinting. 
Nucleic Acids Research 23:4407–4414.
Wagner DB and Allard RW. 1991. Pollen migration in predominantly self-fertilizing 
plants: barley. Journal of Heredity 82:302–304.
Wang Z, Weber JL, Zhong G and Tanksley SD. 1994. Survey of plant short tandem 
DNA repeats. Theoretical and Applied Genetics 88:1–6.
Warburton FE. 1967. The purposes of classifications. Systematic Zoology 16:241–
245.
Warnke SE, Douches DS and Branham BE. 1998. Isozyme analysis supports 
allotetraploid inheritance in tetraploid creeping bentgrass (Agrostis palustris 
Huds.). Crop Science 38:801–805.
Waser NM. 1993. Population structure, optimal outbreeding, and assortative mating 
in angiosperms. In The natural history of inbreeding and outbreeding, 
theoretical and empirical perspectives (NW Thornhill, ed.). University of 
Chicago Press, Chicago, USA. pp. 173–199.
Watanabe KN, Orrillo M, Vega S, Valkonen JPT, Pehu E, Hurtado A and Tanksley SD. 
1995. Overcoming crossing barriers between nontuber-bearing and tuber-
bearing Solanum species: towards potato germplasm enhancement with a 
broad spectrum of solanaceous genetic resources. Genome 38:27–35.
Waugh R, McLean K, Flavell AJ, Pearce SR, Kumar A, Thomas WTB and Powell W. 
1997. Genetic distribution of Bare-1-like retrotransposable elements in the 
barley genome revealed by sequence-specific amplification polymorphisms 
(S-SAP). Molecular and General Genetics 253:687–694.
Waycott W and Fort SB. 1994. Differentiation of nearly identical germplasm 
accessions by a combination of molecular and morphologic analyses. 
Genome 37:577–583.
Weaver KR, Callahan LM, Caetano-Anollés G and Gresshoff PM. 1995. DNA 
amplification fingerprinting and hybridization analysis of centipedegrass. 
Crop Science 35:881–885.
Weising K, Nybom H, Wolff K and Kahl G. 2005. DNA fingerprinting in plants: 
principles, methods, and applications. CRC Press, Boca Raton, Florida, USA.
Weising K, Winter P, Huttel B and Kahl G. 1998. Microsatellite markers for molecular 
breeding. Journal of Crop Production 1:113–143.
Welsh J and McClelland M. 1990. Fingerprinting genomes using PCR with arbitrary 
primers. Nucleic Acids Research 18:7213–7218. 
Wendel JF. 1995. Cotton. In Evolution of crop plants (J Smartt and NW Simmonds, 
eds.). Longman, Essex, UK. pp. 358–366.
Wendel JF and Doyle JJ. 1998. Phylogenetic incongruence: window into genome history 
and molecular evolution. In Molecular systematics of plants II: DNA sequencing 
(DE Soltis, PS Soltis and JJ Doyle, eds.). Kluwer Academic Publishers, Boston, 
USA. pp. 265–296.
Westerbergh A and Doebley J. 2002. Morphological traits defining species differences 
in wild relatives of maize are controlled by multiple quantitative trait loci. 
Evolution 56:273–283.
Whitkus R, de la Cruz M, Mota-Bravo L and Gómez-Pompa A. 1998. 
Genetic diversity and relationships of cacao (Theobroma cacao L.) in southern 
Mexico. Theoretical and Applied Genetics 96:621–627.
Whitkus R, Doebley J and Wendel JF. 1994. Nuclear DNA markers in systematics and 
evolution. In DNA-based markers in plants (Advances in cellular and molecular 
biology of plants, vol. 1) (RL Phillips and IK Vasil, eds.). Kluwer Academic 
Publishers, Dordrecht, The Netherlands. pp. 116–141.
Wiley EO, Brooks DR, Siegel-Causey D and Funk VA. 1991. The compleat cladist: a 
primer of phylogenetic procedures. University of Kansas Museum of Natural 
History, Lawrence, Kansas, USA.
Williams CE, Yanagihara S, McCouch SR, Mackill DJ and Ronald PC. 1997. Predicting 
success of indica/japonica crosses in rice, based on a PCR marker for the S-5(n) 
allele at a hybrid-sterility locus. Crop Science 37:1910–1912.
Williams JGK, Kubelik AR, Livak KJ, Rafalski JA and Tingey SV. 1990. DNA 
polymorphisms amplified by arbitrary primers are useful as genetic markers. 
Nucleic Acids Research 18:6531–6535.
Witsenboer H, Vogel J and Michelmore RW. 1997. Identification, genetic localization, 
and allelic diversity of selectively amplified microsatellite polymorphic loci in 
lettuce and wild relatives (Lactuca spp.). Genome 40:923–936.
Wolfe AD and Liston A. 1998. Contributions of PCR-based methods to plant systematics 
and evolutionary biology. In Molecular systematics of plants II: DNA sequencing 
(DE Soltis, PS Soltis and JJ Doyle, eds.). Kluwer Academic Publishers, Dordrecht, 
The Netherlands. pp. 43–86.
Wolfe KH, Li W-H and Sharp PM. 1987. Rates of nucleotide substitution vary greatly 
among plant mitochondrial, chloroplast, and nuclear DNAs. Proceedings of 
the National Academy of Sciences USA 84:9054–9058.
Wolff K, Rogstad SH and Schaal BA. 1994. Population and species variation of 
minisatellite DNA in Plantago. Theoretical and Applied Genetics 87:733–740.
Xiao J, Grandillo S, Ahn SN, McCouch SR, Tanksley SD, Li J and Yuan L. 1996. Genes 
from wild rice improve yield. Nature 384:223–224. 
Yabuno T. 1962. Cytotaxonomic studies on the two cultivated species and the wild 
relatives in the genus Echinochloa. Cytologia 27:296–305.
Yamada T, Misoo S, Ishii T, Ito Y, Takaoka K and Kamijima O. 1997. Characterization 
of somatic hybrids between tetraploid Solanum tuberosum L. and dihaploid S. 
acaule. Breeding Science 47:229–236.
Yang WP, De Oliveira AC, Godwin I, Schertz K and Bennetzen JL. 1996. Comparison of 
DNA marker technologies in characterizing plant genome diversity: variability 
in Chinese sorghums. Crop Science 36:1669–1676.
Young ND. 1999. A cautiously optimistic vision for marker-assisted breeding. Molecular 
Breeding 5:505–510.
Young WP, Schupp JM and Keim P. 1999. DNA methylation and AFLP marker 
distribution in the soybean genome. Theoretical and Applied Genetics 99:785–
792.
Zamir D. 2001. Improving plant breeding with exotic genetic libraries. Nature Reviews 
Genetics 2:983–989.
Zerega NJC, Ragone D and Motley TJ. 2004. Complex origins of breadfruit (Artocarpus 
altilis, Moraceae): implications for human migrations in Oceania. American 
Journal of Botany 91:760–766.
Zeven AC, Dehmer KJ, Gladis T, Hammer K and Lux H. 1998. Are the duplicates of 
perennial kale (Brassica oleracea L. var. ramosa DC.) true duplicates as determined 
by RAPD analysis? Genetic Resources and Crop Evolution 45:105–111.
Zhou Z, Bebeli PJ, Somers DJ and Gustafson JP. 1997. Direct amplification of 
minisatellite-region DNA with VNTR core sequences in the genus Oryza. 
Theoretical and Applied Genetics 95:942–949.
Zietkiewicz E, Rafalski A and Labuda D. 1994. Genome fingerprinting by simple 
sequence repeat (SSR)-anchored polymerase chain reaction amplification. 
Genomics 20:176–183.
Zoro Bi I, Maquet A, Degreef J, Wathelet B and Baudoin JP. 1998. Sample size for 
collecting seeds in germplasm conservation: the case of the Lima bean (Phaseolus 
lunatus L.). Theoretical and Applied Genetics 97:187–194.
Glossary
Agarose: A polysaccharide (a chain of sugar molecules) that forms the basis 
of agarose gels used for electrophoresis.
Allele mining: A research field directed to the identification of useful 
alleles within genetic resources collections.
Allozymes: Allelic forms of an enzyme that can be distinguished by 
gel electrophoresis (see isozyme).
Amplicons: DNA fragments amplified by PCR.
Amplified Fragment Length Polymorphism (AFLP): A molecular 
marker technique that targets variation in DNA restriction sites and 
in DNA restriction fragments by the selective PCR amplification of a 
subset of the restriction fragments.
Apospecies: A species concept based on cladistics that does not insist on 
monophyly. It recognizes species pairs, one a monophyletic daughter 
species (apospecies) and the other a paraphyletic progenitor species 
(plesiospecies).
Arbitrarily Primed Polymerase Chain Reaction (AP-PCR): A variant of 
the RAPD technique that uses longer arbitrary primers than RAPDs.
Association genetics: A research field directed to the identification of 
correlations between phenotypic traits and genetic markers, with the 
aim of identifying and locating the underlying genes in the genome.
Basic Local Alignment Search Tool (BLAST): A bioinformatics tool to 
find matches between a DNA sequence and known sequences stored 
in databases.
Bayesian analysis: A statistical approach for constructing phylogenetic 
trees, related to maximum likelihood, that incorporates a priori (prior) 
probabilities and weighting factors (see maximum likelihood, parsimony).
Bootstrap analysis: A method in cladistic analysis to infer the “strength” 
or “confidence” of a branch on a phylogenetic tree, obtained by 
repeatedly resampling the characters with replacement and reconstructing 
a tree from each resampled dataset; the bootstrap value of a branch is 
the percentage of these trees that contain it. Bootstrap values 
theoretically can vary from 0% (poor support) to 100% (excellent support).
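The resampling behind a bootstrap value can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not a phylogenetics package: the helper `pair_is_grouped` is a hypothetical stand-in for a real tree-building method (parsimony, maximum likelihood, etc.) and only asks whether two taxa are each other's unique nearest neighbour. All function names and the example matrix are illustrative.

```python
import random

def bootstrap_replicate(matrix, rng):
    """Build one pseudo-dataset by sampling character columns with replacement."""
    n_chars = len(next(iter(matrix.values())))
    cols = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: [states[c] for c in cols] for taxon, states in matrix.items()}

def hamming(a, b):
    """Number of characters at which two taxa differ."""
    return sum(x != y for x, y in zip(a, b))

def pair_is_grouped(matrix, t1, t2):
    """Hypothetical stand-in for tree building: t1 and t2 are 'grouped'
    if each is the other's unique nearest neighbour."""
    def nearest(t):
        dists = {o: hamming(matrix[t], matrix[o]) for o in matrix if o != t}
        best = min(dists.values())
        closest = [o for o, d in dists.items() if d == best]
        return closest[0] if len(closest) == 1 else None
    return nearest(t1) == t2 and nearest(t2) == t1

def bootstrap_support(matrix, t1, t2, n_reps=1000, seed=1):
    """Percentage of bootstrap replicates in which t1 and t2 are grouped."""
    rng = random.Random(seed)
    hits = sum(
        pair_is_grouped(bootstrap_replicate(matrix, rng), t1, t2)
        for _ in range(n_reps)
    )
    return 100.0 * hits / n_reps

# Hypothetical 0/1 character matrix for four taxa.
matrix = {"A": [0, 0, 0, 1, 0, 0], "B": [0, 0, 0, 1, 1, 0],
          "C": [1, 1, 0, 0, 1, 1], "D": [1, 1, 1, 0, 1, 1]}
print(bootstrap_support(matrix, "A", "B"))
```

Characters that conflict with a grouping are sampled more or less often from one replicate to the next, so weakly supported branches are recovered in fewer replicates.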
Capillary electrophoresis: A technique to separate DNA fragments in 
an electric field, carried out within narrow tubes (capillaries).
cDNA-AFLP: A molecular marker technique performing AFLP analysis 
on cDNA.
Cladistic Species Concept: A philosophy and set of methods that use 
cladistic criteria to determine the limits of species.
Cladistics: A branch of biology that determines evolutionary 
branching orders or trees of descent based on derived similarities (see 
phenetics).
Cladogram: A branching phylogenetic tree of individuals or taxa, 
rooted on an outgroup(s) produced by a method that minimizes 
evolutionary changes (by parsimony, maximum likelihood, or other 
methods) of characters believed to be homologous among a group of 
organisms.
Cleaved Amplified Polymorphic Sequence (CAPS): A molecular 
marker technique that is based on the amplification of DNA fragments 
by PCR followed by digestion of the amplified fragments with restriction 
enzymes.
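The CAPS principle can be illustrated with a minimal in-silico digest in Python: a point mutation that destroys a restriction site changes the fragment lengths seen on the gel. The sketch assumes the EcoRI site (G^AATTC, cut after the first base) purely as an example, scans only the top strand (adequate for a palindromic site), and uses made-up allele sequences.

```python
def digest(amplicon, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting at every occurrence of `site`.

    cut_offset is the cut position within the site (EcoRI, G^AATTC -> 1).
    Only the top strand is scanned, so this suits palindromic sites."""
    amplicon = amplicon.upper()
    cuts = []
    start = amplicon.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = amplicon.find(site, start + 1)
    bounds = [0] + cuts + [len(amplicon)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]

# Two hypothetical alleles of the same amplicon; allele_2 carries an A->T change.
allele_1 = "AAAAGAATTCAAAA"
allele_2 = "AAAAGATTTCAAAA"
print(digest(allele_1), digest(allele_2))
```

The allele that keeps the site is cut into two shorter fragments while the mutant allele stays uncut, which is what makes the marker readable on a gel.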
Codominant marker: A genetic marker for which all alleles are 
expressed when co-occurring in an individual.
Comparative genomics: A research field directed to the comparison of 
genomes from different species in order to obtain a better understanding 
of the evolution of species and the location and functioning of genes.
Complementary DNA (cDNA): DNA synthesized in vitro from an mRNA 
template by reverse transcription. 
Concatenated dataset: A combined dataset that connects together many 
individual datasets into one, like links in a chain, so that analyses can 
be done as a single unit.
Concerted evolution: A process whereby repetitive DNA families 
maintain one type of sequence within the repeat (become homogenized), 
through genetic mechanisms of unequal crossing over and gene 
conversion.
Congruence: Agreement among the results of taxonomic analyses; refers to 
both phenetic and cladistic results.
Conserved Orthologous Set Markers (COS): Conserved 
molecular markers that are used as anchors for map development 
in comparative genomics studies.
Core collection: A subset of the entire germplasm collection that 
incorporates a representative sample of variation with a minimum 
of redundancy. It is an attempt to reduce the collection to a workable 
size in the face of economic, space or other constraints, and to 
facilitate its use, by choosing the accessions that best represent its 
diversity or capture its most useful diversity.
Dendrogram: A branching diagrammatic representation of a set 
of individuals or taxa, constructed from overall similarity of a set 
of characters among organisms.
Degenerate Oligonucleotide Primed-PCR (DOP-PCR): A molecular 
marker technique that uses partially degenerate primers for 
polymorphism detection in comparative genomics studies.
Deoxynucleotides: The building blocks of DNA, each carrying one of four 
bases: adenine (A), cytosine (C), guanine (G) or thymine (T). 
Directed Amplification of Minisatellite-region DNA (DAMD): 
A technique related to ISSR analysis that uses a single primer 
containing only the core motif of a minisatellite. 
DNA Amplification Fingerprinting (DAF): A variant of the RAPD 
technique that uses shorter, 5–8 bp primers to generate a larger 
number of fragments.
DNA chip: Also known as microarray. A high throughput screening 
technique based on the hybridization between oligonucleotide 
probes and either DNA or mRNA, carried out on miniaturized 
reaction surfaces.
DNA sequencing: The determination of the sequence of 
deoxynucleotides in DNA fragments.
Diversity study: The use of morphological or molecular markers 
to assess the diversity of a set of related accessions.
Domestication syndrome: A set of similar traits that confer 
adaptation to the cultivated environment. Specific traits will vary 
among different crops.
Dominant marker: A genetic marker for which only a single allele is 
expressed when multiple alleles are co-occurring in an individual.
Eclectic species concept: A philosophy that species are defined, 
formed and maintained by a variety of morphological, interbreeding, 
ecological, and phylogenetic factors.
Ecological farming: A farming system that aims to develop an 
integrated, humane, environmentally and economically sustainable 
agricultural production system.
Ecological species concept: A philosophy that ecological constraints 
are the primary factor in forming and maintaining a species.
Electrophoresis: A technique to separate proteins or DNA fragments 
in an electric field.
EST-SNP: SNP analysis targeted to ESTs.
EST-SSR: Microsatellite analysis targeted to ESTs.
Exotic libraries: Collections of elite crop lines containing defined 
genomic regions from wild species, to provide pre-breeding material 
for modern varieties.
Expressed Sequence Tag (EST): A DNA sequence derived from 
transcribed regions of a genome.
Extant: Existing or living at the present time, in contrast to extinct 
(no longer living).
Fluorescent labeling: The labeling of probes or primers with 
fluorescent tags in order to enable detection of variation in DNA 
fragments.
Functional diversity: Genetic diversity as assessed by variation 
in transcribed regions of the genome that are known to be associated 
with a biological function.
Gene flow: The natural flow of genes within a population or from 
one population to another by interbreeding or migration.
Genetic distance: A measure to quantify the genetic relatedness 
between individuals or populations.
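For dominant band data (band present = 1, absent = 0, as with RAPD or AFLP), two commonly used measures are the Jaccard and Dice coefficients, each turned into a distance as 1 minus the similarity. The Python sketch below is a minimal illustration with hypothetical band profiles; codominant data scored as allele frequencies call for other measures.

```python
def band_counts(x, y):
    """a = bands shared, b = bands only in x, c = bands only in y.

    Shared absences are ignored, as is usual for dominant markers."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    if a + b + c == 0:
        raise ValueError("neither profile has any bands")
    return a, b, c

def jaccard_distance(x, y):
    a, b, c = band_counts(x, y)
    return 1 - a / (a + b + c)

def dice_distance(x, y):
    a, b, c = band_counts(x, y)
    return 1 - 2 * a / (2 * a + b + c)

acc_1 = [1, 1, 0, 1, 0]  # hypothetical band profiles
acc_2 = [1, 0, 0, 1, 1]
print(jaccard_distance(acc_1, acc_2), dice_distance(acc_1, acc_2))
```

Dice gives shared bands double weight, so for the same pair of profiles it returns a smaller distance than Jaccard.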
Genetic drift: Fluctuations in genetic variation between generations 
as a result of random processes.
Genomic DNA: The full complement of DNA contained in the 
genome of a cell or organism.
Geographic Information Systems (GIS): A set of computer software 
designed to capture, organize, store, and analyze geographically 
referenced (spatial) information.
Heterotic groups: Groups of germplasm that, when crossed with one 
another, maximize heterosis. Heterosis is the phenomenon whereby 
heterozygotes in a population often have higher fitness than homozygotes.
Heterozygosity: The condition of having one or more pairs of 
different alleles on homologous chromosomes.
Homologous: Characters that arise by common descent.
Homoplasy: A term in cladistic analysis that refers to the proportion 
of parallelisms and reversals on a phylogenetic tree. Also used 
for different DNA fragments of identical size that cannot be 
distinguished by gel electrophoresis. 
Ingroup: A putatively monophyletic group that is the prime subject 
of a cladistic analysis.
Interbreeding species concept: A philosophy and set of methods 
that define species almost entirely on the ability of species to 
exchange genes naturally or artificially, as assessed by artificial 
crossing programs, studies of mechanisms to facilitate gene flow, 
and biological isolating mechanisms.
Internal Transcribed Spacer (ITS): A sequence of nuclear ribosomal 
DNA commonly examined for phylogenetic analysis consisting 
of two spacer regions intercalated between the 18S, 5.8S, and 26S 
genes. 
Inter-retrotransposon Amplified Polymorphism (IRAP): A 
molecular marker technique that targets variation in retrotransposon 
insertion sites.
Inter Simple Sequence Repeat (ISSR): A molecular marker technique 
that targets variation in the DNA between two microsatellite loci.
Interspecific: Refers to studies among species.
Intraspecific: Refers to studies of taxa or populations within a 
species.
Isozymes: Enzymes that have the same chemical function as 
another enzyme but differ in structure as a result of different amino 
acid composition.
Leucine Rich Repeat (LRR): A conserved domain of disease 
resistance genes.
Linkage disequilibrium: The non-random association of the alleles 
from different loci in the gametes.
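Linkage disequilibrium between two biallelic loci is commonly quantified as D = p(AB) − p(A)p(B), often rescaled to r² = D² / [p(A)(1 − p(A)) p(B)(1 − p(B))]. The Python sketch below assumes that phased gametes (haplotypes) are already known. With unphased genotype data, haplotype frequencies would first have to be estimated. The allele labels and samples are hypothetical.

```python
from collections import Counter

def linkage_disequilibrium(haplotypes, a="A", b="B"):
    """Return (D, r^2) for alleles `a` (locus 1) and `b` (locus 2).

    `haplotypes` is a list of (locus1_allele, locus2_allele) gametes."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_ab = counts[(a, b)] / n
    p_a = sum(1 for h in haplotypes if h[0] == a) / n
    p_b = sum(1 for h in haplotypes if h[1] == b) / n
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2

# Hypothetical gametes in which allele A almost always travels with allele B.
gametes = [("A", "B")] * 9 + [("A", "b")] + [("a", "b")] * 10
print(linkage_disequilibrium(gametes))
```

D is zero when the alleles at the two loci associate at random. Its sign and size depend on allele frequencies, which is why r² is often reported alongside it.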
Linked markers: Genes (or molecular probes) located close to each 
other on a chromosome, so that they do not assort independently; 
the degree of linkage is greater the closer the genes or markers are 
on the same chromosome.
Long branch attraction: A phenomenon in cladistic analyses where 
strongly unequal rates of evolutionary change in different members 
of a group cause cladistics to produce incorrect trees.
Luciferase: An enzyme used to determine the concentration of ATP 
as a measurable real-time light signal in pyrosequencing.
Mantel test: A test that computes the linear correlation between two 
proximity matrices (dissimilarity or similarity), used in phenetics 
to test whether results from different analyses of the same taxa are 
similar or different. 
Mapped markers: Molecular markers with known chromosomal 
locations.
Marker index: A way to assess the comparative degree of 
information provided by different molecular marker systems, as 
assessed by the product of heterozygosity and multiplex ratio.
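Following this definition, the marker index of an assay can be sketched as the average expected heterozygosity (gene diversity, H = 1 − Σpᵢ²) of the loci it scores, multiplied by its multiplex ratio. The Python sketch below uses this simple product. Published variants also adjust the multiplex ratio for the fraction of polymorphic bands. The allele frequencies are hypothetical. For dominant markers, allele frequencies are usually approximated from band frequencies.

```python
def expected_heterozygosity(allele_freqs):
    """Gene diversity at one locus: H = 1 - sum(p_i^2).

    Accepts frequencies or raw counts (they are normalized first)."""
    total = sum(allele_freqs)
    return 1 - sum((p / total) ** 2 for p in allele_freqs)

def marker_index(loci_freqs):
    """Average H over the loci scored in one assay times the multiplex ratio
    (here taken as the number of loci scored per assay)."""
    multiplex_ratio = len(loci_freqs)
    h_av = sum(expected_heterozygosity(f) for f in loci_freqs) / multiplex_ratio
    return h_av * multiplex_ratio

# Hypothetical assays: one SSR locus with four equally frequent alleles,
# versus a dominant system scoring 20 biallelic loci per gel.
ssr_assay = [[0.25, 0.25, 0.25, 0.25]]
dominant_assay = [[0.9, 0.1]] * 20
print(marker_index(ssr_assay), marker_index(dominant_assay))
```

In this example the single highly polymorphic SSR locus (MI = 0.75) is outscored by the dominant system (MI = 3.6), purely because the latter scores many loci per assay. This is the trade-off the index is meant to capture.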
Maximum likelihood: A set of methods used to construct 
cladograms based on certain evolutionary models of character state 
changes (see Bayesian analysis, parsimony).
Microarray: Also known as DNA chip. A high throughput screening 
technique based on the hybridization between oligonucleotide 
probes and either DNA or mRNA, carried out on miniaturized 
reaction surfaces.
Microsatellite: A molecular marker technique that targets tandem 
repeats of a small (1–6 base pairs) nucleotide repeat motif. 
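Perfect microsatellites of the kind defined above can be located in a sequence (for example an EST, when developing EST-SSRs) with a regular expression that looks for a 1–6 bp motif followed by further copies of itself. The Python sketch below is a minimal illustration. The minimum repeat number is an assumed parameter, and the sequence is made up. Dedicated SSR-search tools also use motif-specific thresholds and handle imperfect and compound repeats, which this sketch ignores.

```python
import re

def find_ssrs(seq, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect tandem repeats of
    1-6 bp motifs occurring at least `min_repeats` times in a row."""
    # The lazy {1,6}? tries the shortest motif first, so (AC)n is reported
    # as motif 'AC' rather than 'ACAC'.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits

seq = "TTTT" + "AC" * 6 + "GGT" + "ATG" * 5 + "C"  # hypothetical sequence
print(find_ssrs(seq))
```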
Minisatellites: A molecular marker technique that targets tandem 
repeats of a large (10–50 base pairs) nucleotide repeat motif.
Monophyletic: A group that includes an ancestral species and all 
of its descendants.
Morphological Species Concept: A philosophy and set of 
methods that define species entirely on morphological or anatomical 
characters.
mRNA: Messenger RNA.
Multiple Arbitrary Amplicon Profiling (MAAP): A collective term 
for techniques using single arbitrary primers, such as AP-PCR, DAF, 
and RAPD.
Multiplex ratio: A measure of the number of bands simultaneously 
analyzed per DNA marker assay (experiment), that is, the number 
of bands resolved on a particular gel. 
Neutral marker diversity: Genetic diversity as assessed by variation 
in non-transcribed regions of the genome that are not known to be 
associated with a biological function.
Nominalistic Species Concept: A philosophy that questions 
the very existence of species, and believes that individuals or 
interbreeding populations are the only population system with any 
objective reality.
Nucleotide Binding Site (NBS): A conserved domain of disease 
resistance genes.
Nucleotide Binding Site-Directed Profiling (NBS-DP): A molecular 
marker technique that targets variation at disease resistance genes 
and analogues.
Oligonucleotide probe: A short, single-stranded DNA fragment that is 
used to detect the presence of its complementary sequence in a DNA 
sample by hybridization testing.
Orthologous: Characters that are homologous as a result of a speciation 
event rather than a gene duplication (see paralogous), that is, identical 
by descent.
Outgroup: Any group, used to root a phylogenetic tree in a cladistic 
analysis, which is not a member of the taxon group being studied.
Paralogous: Characters that have arisen as a result of gene 
duplication.
Paraphyletic: A non-monophyletic group containing some, but 
not all representatives of a taxon; said another way, an incomplete 
group of descendants from one common ancestor with one or more 
descendants missing.
Parsimony: A set of methods that assumes that the simplest 
solution is the most likely one. It is used to construct cladograms, 
and assumes that minimizing the number of character state changes 
on a tree is the best approximation of phylogenetic history. (see 
Bayesian analysis, maximum likelihood).
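The quantity that parsimony minimizes, the number of character state changes on a given tree, can be counted with Fitch's algorithm for unordered characters on a rooted binary tree. The Python sketch below only scores trees that are supplied to it. A real analysis also searches among many possible trees. The tree notation (nested tuples) and the data matrix are illustrative assumptions.

```python
def fitch(tree, states):
    """Return (possible_states, changes) for one character.

    `tree` is a leaf name (str) or a (left, right) tuple;
    `states` maps each leaf name to its character state."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    # No shared state: take the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1

def tree_length(tree, matrix):
    """Total number of changes over all characters of a taxon -> states matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {taxon: s[i] for taxon, s in matrix.items()})[1]
        for i in range(n_chars)
    )

matrix = {"A": [0, 0, 1], "B": [0, 0, 1], "C": [1, 1, 0], "D": [1, 1, 0]}
print(tree_length((("A", "B"), ("C", "D")), matrix))  # candidate tree 1
print(tree_length((("A", "C"), ("B", "D")), matrix))  # candidate tree 2
```

Under the parsimony criterion the first tree is preferred, because it explains the same data with fewer character state changes (3 versus 6).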
Pedigree: A list of ancestors (often displayed on a branching tree) 
based on known relationships from formal records of crosses.
Phenetics: A branch of biology that determines overall similarity 
of organisms, not evolutionary relationships (see cladistics).
Phenogram: A branching tree of individuals or taxa based on 
phenetics.
Plesiospecies: A species defined by a cladistic concept that does not 
insist on monophyly. It recognizes species pairs, one a monophyletic 
daughter species (apospecies) and the other a paraphyletic progenitor 
species (plesiospecies).
Polyacrylamide: A chemical that forms the basis of polyacrylamide 
gels used for electrophoresis. 
Polymerase Chain Reaction (PCR): A common molecular technique 
used to generate numerous copies of specific DNA fragments in 
vitro.
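The "numerous copies" arise because amplification is exponential. As a worked example, assume that each cycle copies a fraction $E$ (the efficiency, between 0 and 1) of the template molecules present. Starting from $N_0$ molecules, the idealized copy number after $n$ cycles is then given below. Real reactions fall short of $E = 1$ and eventually reach a plateau.

```latex
N_n = N_0\,(1 + E)^{n},
\qquad \text{e.g. } N_0 = 1,\; E = 1,\; n = 30:\quad
N_{30} = 2^{30} \approx 1.07 \times 10^{9}
```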
Polyphyletic: A non-monophyletic group where the common 
ancestor is placed in another taxon; said another way, a group in 
which the members do not ultimately derive from one common 
ancestor, where the descendants of one or more other groups are 
included.
Pre-breeding/Pre-competitive breeding: The development of 
germplasm with a genetically broader base for utilization by 
breeders, such as the introduction of exotic germplasm into a 
cultivar.
Predictive component of taxonomy: The idea that inferences or 
predictions can be made from taxonomy. 
Proximity matrices: Matrices of numerical measures of similarity (or 
dissimilarity) among objects, used in multivariate analysis, as 
assessed by various formulas.
Pyrophosphate: A chemical molecule that is released each time 
a deoxynucleotide is incorporated into the new DNA strand during 
pyrosequencing.
Pyrosequencing: A new sequencing method for relatively short DNA 
templates based on the real-time (quantitative) detection of pyrophosphate 
release.
Radioactive labeling: The labeling of probes or primers with 
radioactive tags in order to enable detection of variation in DNA 
fragments.
Random Amplified Polymorphic DNA (RAPD): A molecular 
marker technique that uses primers of random sequence to amplify 
DNA fragments by PCR.
Redundancy: Refers to identical or near identical items. In germplasm 
analyses it refers to (near) duplicate germplasm accessions.
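Redundancy screening can be sketched in Python. Accessions whose marker profiles differ at no more than a chosen number of scored loci are grouped as (near) duplicates. The threshold `max_diff`, the accession names and the profiles are hypothetical. The grouping is single linkage, so chains of small differences can merge accessions that differ more from each other. In practice, suspected duplicates would still be checked against passport data before any accessions are merged or discarded.

```python
def redundant_groups(profiles, max_diff=0):
    """Group accessions whose marker profiles differ at <= max_diff loci.

    Uses single linkage via a small union-find structure."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            diff = sum(x != y for x, y in zip(profiles[a], profiles[b]))
            if diff <= max_diff:
                parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return [sorted(g) for g in groups.values() if len(g) > 1]

profiles = {"acc1": [1, 0, 1, 1], "acc2": [1, 0, 1, 1],
            "acc3": [0, 1, 1, 0], "acc4": [1, 0, 1, 0]}
print(redundant_groups(profiles))              # identical profiles only
print(redundant_groups(profiles, max_diff=1))  # near duplicates as well
```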
Resistance Gene Homologue Polymorphism (RGHP): A group of 
molecular marker techniques that target groups of resistance genes 
by PCR using primers aimed at conserved domains of resistance 
genes.
Restriction Fragment Length Polymorphism (RFLP): A molecular 
marker technique that targets variation in DNA restriction sites and 
in DNA restriction fragments.
Re-synthesized/Recreated: In studies of hybrid origins of species, 
investigators may attempt to “re-synthesize” a hybrid by crossing 
its putative parents and then compare the natural hybrid with the 
re-synthesized one by morphological or molecular markers.
Retrotransposon-Based Insertional Polymorphism (RBIP): A 
molecular marker technique that targets variation in retrotransposon 
insertion sites.
Retrotransposon-Microsatellite Amplified Polymorphism 
(REMAP): A molecular marker technique that targets variation in 
retrotransposon insertion sites.
Selective Fragment Length Amplification (SFLA): A synonym 
used for AFLP.
Selective neutrality: The state in which genetic variation is 
influenced only by random processes.
Selective Restriction Fragment Amplification (SRFA): A synonym 
used for AFLP.
Selectively Amplified Microsatellite Polymorphic Locus (SAMPL): 
A variation of the AFLP technique that amplifies microsatellite 
loci by using a single AFLP primer in combination with a primer 
complementary to compound microsatellite sequences; the technique 
does not require prior cloning and characterization of the microsatellite loci.
Sequence Characterized Amplified Region (SCAR): A molecular 
marker technique that amplifies DNA fragments by PCR using 
specific primers, designed from nucleotide sequences established 
from cloned RAPD fragments linked to a trait of interest.
Sequence-Related Amplified Polymorphism (SRAP): A molecular 
marker technique that targets subsets of open reading frames from 
coding sequences in the genome.
Sequence-Specific Amplified Polymorphism (S-SAP): A dominant, 
multiplex marker system for the detection of variation in DNA 
flanking a retrotransposon insertion site.
Simple Sequence Repeat (SSR): A synonym used for 
microsatellites.
Single Nucleotide Polymorphism (SNP): A molecular marker 
technique to detect changes in a single nucleotide position.
Single Primer Amplification Reaction (SPAR): A technique related 
to ISSR analysis that uses a single primer containing only the core 
motif of a microsatellite.
Single-Strand Conformation Polymorphism (SSCP): A molecular 
marker technique that uses PCR and gel electrophoresis of single-
strand DNA to detect nucleotide sequence variation among amplified 
DNA fragments.
Sympatric: Refers to two or more populations that occupy 
overlapping geographic areas.
TaqMan™: A trademark term for a high throughput, closed tube 
assay to detect specific sequences in PCR products.
Target Region Amplification Polymorphism (TRAP): A molecular 
marker technique related to SRAP, but using a fixed primer designed 
from a targeted EST sequence. 
Template DNA: The DNA that serves as the starting material (template) 
for the initial reaction of a molecular marker analysis.
Tiling strategy: The detection of allelic variation at genes of known 
basic sequence by hybridization tests on microarrays using large 
series of sequence-overlapping probes.
Variable Number of Tandem Repeats (VNTR): A synonym of 
minisatellites.