12. Translational Breeding


by Cecilia McGregor, Donna K. Harris, Ben Stewart-Brown, Nicole Bachleda, Zachary King, Clint Steketee, Justin Vaughn, and Zenglu Li, Institute of Plant Breeding, Genetics and Genomics, University of Georgia


Marker Discovery

Marker type and evolution

Restriction Fragment Length Polymorphism (RFLP)

RFLP is a genotyping method that was popular for genetic mapping of plants in the 1990s. The method is based on restriction enzyme (RE) digestion and DNA hybridization. Due to the RE digestion step, this method requires the extraction of a large amount of high quality DNA. After digesting the genomic DNA with a RE, the DNA fragments are separated by agarose gel electrophoresis, and transferred to a nylon membrane via Southern Blotting (Southern, 1975). Labeled probes are then hybridized to the digested DNA on the membrane. The position of the visualized probes on the membrane can detect DNA polymorphisms that change the RE cut site or the size of the fragments between cut sites.

Generally, single copy probes were selected for RFLP analysis, resulting in co-dominant markers in diploid species. Co-dominant markers are markers where both homozygote genotypes can be differentiated from the heterozygote genotype. RFLPs were very useful markers, but the need for large amounts of high quality DNA, low throughput, the need for prior sequence information and the inability to automate the procedure (Table 1) mean that they are now rarely used.

Random Amplified Polymorphic DNA (RAPD)

The RAPD marker technique (Welsh and McClelland, 1990; Williams et al., 1990) is a polymerase chain reaction (PCR) based method that is quick, easy and requires no prior sequence information. PCR is an incredibly powerful tool in molecular biology, enabling large quantities of DNA to be amplified from very small starting quantities. PCR is based on three main components: complementary DNA primers, DNA polymerase, and temperature cycling of the reaction. Using these three components, new DNA strands are synthesized with each reaction cycle, theoretically doubling the quantity with each cycle. A typical PCR will continue for about thirty-five cycles.
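As a rough illustration of the doubling described above, the sketch below computes the theoretical copy number after a given number of cycles. The function name and the `efficiency` parameter are assumptions added for illustration; real reactions plateau well before the theoretical yield is reached.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Theoretical copy number after PCR.

    efficiency=1.0 means perfect doubling in every cycle (an idealization);
    lower values model less efficient amplification.
    """
    return initial_copies * (1.0 + efficiency) ** cycles


# One template molecule after 35 ideal cycles gives 2**35 copies.
print(f"{pcr_copies(1, 35):.3e}")
```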

Fig. 1. RAPD fragments separated on an agarose gel and visualized using ethidium bromide and UV light. The first lane is the size standard.

In the RAPD technique a single random 10 basepair primer is used to specify the sequence to be amplified. Due to the short primer sequence, many fragments of the genome will be amplified. These amplified fragments can then be separated through agarose gel electrophoresis (Fig 1). The homozygous presence of a fragment is not distinguishable from its heterozygote, and thus RAPDs are dominant markers. Fragments are usually simply scored as present (1) or absent (0).
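The difference between dominant and co-dominant scoring can be sketched as follows. This is a minimal illustration, assuming a hypothetical genotype notation in which `+` is the allele that produces a band and `-` is the null allele; it is not part of any RAPD software.

```python
def score_dominant(genotype):
    """Score a dominant marker: 1 if a band is present, 0 if absent.

    genotype is a two-character string such as '++', '+-' or '--'
    (hypothetical notation: '+' amplifies a fragment, '-' does not).
    """
    return 1 if "+" in genotype else 0


def score_codominant(genotype):
    """Score a co-dominant marker: all three genotype classes stay distinct."""
    return "".join(sorted(genotype))


# The homozygote '++' and the heterozygote '+-' receive the same dominant score.
for g in ("++", "+-", "--"):
    print(g, score_dominant(g))
```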

The main advantage of the RAPD technique is that no prior sequence information is needed, making it popular for minor crops where genomic tools (and sequence information) are limited. Since the method is based on PCR, only small amounts of DNA are needed (Table 1). However, the resulting markers are dominant and results are often not transferable between different laboratories.

Amplified Fragment Length Polymorphism (AFLP)

The AFLP technique (Vos et al., 1995) is based on RE digestion, followed by selective amplification of fragments through the ligation of terminal adaptor sequences, and PCR primers modified by adding two or three selective nucleotides. The resulting fragments were traditionally separated on polyacrylamide gels (Fig. 2), but the use of capillary systems is now common. Each AFLP run yields many fragments per individual and, like RAPDs, these markers are scored as present or absent, with each fragment scored as a single locus. AFLPs are considered more reliable and reproducible than RAPDs, but the need for RE digestion means higher quality and quantities of DNA have to be extracted (Table 1).

Fig. 2. AFLP fragments separated on a polyacrylamide gel.

Simple Sequence Repeats (SSRs)

SSRs (Cregan, 1992; Morgante & Olivieri, 1993) are short nucleotide repeats consisting of tandemly repeated units, each between one and 10 base-pairs in length, that are common in plant and animal genomes. Simple (or short) sequence repeats are also known as STR (short tandem repeats) or “microsatellites”. The repeat region is amplified using primers homologous to the sequences flanking these repeats (Fig. 3). Polymorphisms in the number of repeats are observed as differences in the size of amplified fragments. A particular SSR locus can have a large number of alleles (Fig 4), making them highly informative.

Fig. 3. Short repeat regions of the genome are amplified using flanking primers during SSR analysis.

SSRs have been extensively used for genetic mapping and diversity studies in plants. The main limitation of this technique used to be the difficulty of cloning and sequencing the regions flanking the SSR. This must be done for each species and even then the PCR primers designed may not always reveal a high level of polymorphism. The flanking regions are relatively species specific, so markers developed for a particular species are not always useful for even closely related species. However, with the advances in whole genome sequencing, sequence information that can be used to identify SSRs and develop flanking primers is now readily available for many species.
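Once sequence is available, candidate SSRs can be located by searching for perfect tandem repeats. The sketch below is one simple way to do this with a regular expression; the function name, the default motif lengths (2 to 6 bp, which skips homopolymer runs) and the minimum repeat count are assumptions chosen for illustration, not thresholds taken from the cited work.

```python
import re


def find_ssrs(sequence, min_unit=2, max_unit=6, min_repeats=4):
    """Return (motif, repeat_count, start) for each perfect tandem repeat.

    The lazy quantifier makes the regex prefer the shortest motif,
    so 'AGAGAGAG' is reported as (AG)4 rather than (AGAG)2.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        hits.append((motif, repeats, match.start()))
    return hits


# A made-up sequence with six AG repeats between two short flanks.
print(find_ssrs("GGTT" + "AG" * 6 + "CCTA"))
```

The flanking sequence on either side of each reported start position is where primers would then be designed.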

The reliability and reproducibility of the markers, especially between laboratories, make SSRs a popular genotyping technique (Table 1). A particular SSR locus can have a large number of alleles, and multiple loci can be visualized together by labeling them with different fluorescent dyes (e.g. FAM, HEX, TAMRA). SSR alleles are usually described based on the fragment size. Recently, methodologies have been developed that use universal dye-labeled primers for SSR detection (de Arruda et al., 2010), which decreases the overall cost of the method.

Fig. 4. (a) SSRs in tetraploid potato visualized using silver staining. (b) Two SSR loci (blue and red) in a diploid species visualized using two fluorescent labels.

Single Nucleotide Polymorphism (SNP)

SNP markers are biallelic and codominant, and their abundance in plant genomes makes them a powerful tool in plant breeding (Fig. 5). Prior sequence information is needed for SNP genotyping, but technologies like genotyping-by-sequencing (Elshire et al., 2011) allow simultaneous SNP discovery and detection. The advent of next generation sequencing has made SNP discovery relatively cheap and easy for many plant species. Many different platforms exist for SNP detection.

Fig. 5. (a) Two SNP loci in a gene sequence. (b) SNP genotypes of parental lines (row 1 and row 2) and progeny in an F2 mapping population of watermelon.

Table 1. Summary of marker characteristics.

Genome sequencing and marker discovery

RFLP, SSR and SNP marker systems require sequence information for the species of interest before genotyping. This requirement made the development of markers expensive, especially for smaller crops. Traditionally marker discovery was carried out by chain-termination sequencing (Sanger) of cloned DNA of the species of interest. This process is often used for sequencing expressed sequence tags (ESTs) to detect polymorphism in genes and is effective for detecting polymorphisms in candidate genes.

The development of next-generation massively parallel sequencing (MPS) technologies made large-scale, genome-wide polymorphism discovery practical. For species with large and/or complex genomes some type of genome enrichment or reduction strategy is required. This is especially important if an assembled genome sequence is not available. Two popular genome reduction strategies are RAD (restriction site associated DNA) sequencing (Baird et al., 2008) and transcriptome sequencing. Currently the Genotyping-by-Sequencing (GBS) method described by Elshire et al. (2011) is also popular for plant species. This method uses methylation sensitive restriction enzymes to avoid the problems associated with aligning repetitive regions of the genome. MPS methods use the abundance of specific reads to estimate allele copy number, and obtaining sufficient read depths can be problematic for heterozygous individuals (F2 populations) and polyploids (Uitdewilligen et al., 2013). MPS methods for genotyping often also have large amounts of missing data (Fig. 6). The latter problem can be overcome to some extent by imputation (Spindel et al., 2013), but the usefulness of some of the imputation methods is debatable.
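The read-depth problem for heterozygotes can be made concrete with a small calculation. The sketch below assumes an idealized diploid heterozygous site where each read samples either allele with equal probability and there are no sequencing errors; the function name is hypothetical. Under those assumptions, the chance of seeing both alleles in n reads is 1 - 2(1/2)^n.

```python
def prob_both_alleles_seen(depth):
    """Probability that reads from BOTH alleles are observed at a
    heterozygous diploid site, given the read depth.

    Assumes equal sampling of the two alleles and no sequencing error.
    """
    if depth < 1:
        return 0.0
    return 1.0 - 2.0 * 0.5 ** depth


# At low depth, many true heterozygotes look like homozygotes.
for depth in (1, 2, 5, 10):
    print(depth, prob_both_alleles_seen(depth))
```

At low depth a true heterozygote is frequently called as a homozygote, which is consistent with the under-representation of heterozygotes shown in Fig. 6.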

Fig. 6. Visualization of (a) the under-representation of heterozygotes and (b) percentage of missing data in an F2 population genotyped using genotyping-by-sequencing. Total number of markers = 2,814.

Fingerprinting and Genotyping Technology

Fingerprinting Technology

Fingerprinting technology is used to genotype tens of thousands to millions of SNP loci at once.


Illumina Beadarray Microarray®

Information courtesy of illumina.com

Illumina Beadarray Microarrays use a chip containing 3 micron silica beads in precisely etched microwells, each containing many copies of a locus-specific probe.

Fig. 7a.

Beads are coated with hundreds of thousands of copies of an oligonucleotide probe containing a bead identifier sequence and gene-specific probe sequence. Probe sequence stops one base-pair short of target SNP.

Fig. 7b.

Fragmented DNA from the sample binds to the complementary probe sequence.

Fig. 7c.

Single base extension incorporates one of four labeled nucleotides. Natural competition between bases minimizes bias.

Fig. 7d.

Wells become laser excited and each nucleotide label emits a signal detected by an Illumina scanner (HiScan or iScan). The intensity of the signal will indicate the SNP present for a given locus.

Fig. 7e.

SNPs are bi-allelic, so homozygous and heterozygous loci can be distinguished based on the nucleotide signal emitted.


Affymetrix SNP Array®

Information courtesy of Affymetrix

Fig. 8a.

Affymetrix SNP Arrays use a chip containing up to 6.5 million different oligonucleotides on a 1.7 cm² surface. Each feature contains over one million oligonucleotide probes.

Fig. 8b.

DNA is digested using the NspI and StyI restriction enzymes.

Fig. 8c.

NspI- and StyI-specific adapters are ligated to the fragments.

Fig. 8d.

Fragments are amplified by PCR using primers based on the ligated adapter sequences. Larger fragments do not amplify during the PCR, so the sample complexity is reduced.

Fig. 8e.

Fragments are digested into smaller pieces and fluorescently labeled.

Fig. 8f.

Labeled DNA is hybridized to the microarray at complementary spots, causing fluorescence, which can be photographed by a fluorescence microscope in order to infer genotypes. The SNP lies within these labeled sequences. The microarray has spots corresponding to each possible SNP allele at each locus of interest. In this example, we are examining three different loci. The top row contains oligonucleotides complementary to the target sequence in parent 1, while the bottom row contains oligonucleotides complementary to the target sequence in parent 2.


DArT array®

Information courtesy of DArT

Fig. 9a.

DArT arrays are able to scan hundreds of thousands of polymorphic markers.

Fig. 9b.

DNA is digested using PstI (rare cutter) and BstNI (frequent cutter) restriction enzymes.

Fig. 9c.

PstI-specific adapters are ligated to the fragments.

Fig. 9d.

Fragments are amplified by PCR using primers based on the ligated adapter sequences. Larger fragments cut by frequent cutter restriction enzymes do not amplify during the PCR, so the sample complexity is reduced.

Fig. 9e.

Blue fluorescent labeled reference DNA quantifies the amount of DNA spotted on the array for each marker. The target DNA is green fluorescent labeled and hybridized to the array chip. A laser scans the array slide, and the ratio of green to blue fluorescence intensity indicates the relative abundance of DNA for each marker in the sample.

Fig. 9f.

Hybridization intensity for each array spot is scored as “0” (absent) or “1” (present). Allele calling is performed in this manner for all probes of all samples.

Genotyping Technology

These genotyping technologies are useful to make allele calls for SNPs (single nucleotide polymorphisms) or InDels (Insertion/Deletions) at specific genomic loci. They are highly robust, require little DNA, and are cost effective for high-throughput genotyping.


KASP® genotyping technology

Kompetitive Allele Specific PCR (KASP)

Information courtesy of LGC Group

Fig. 10a.

Two allele specific forward primers are present, one containing a FAM sequence and the other containing a HEX sequence. There is a common reverse primer. There is also the target DNA sequence. This SNP assay is designed to detect a C/A SNP at this locus. The target sample is homozygous for the C allele. The assay mix contains a FRET cassette, Taq polymerase, and a buffer solution.

Fig. 10b.

PCR-round 1:

Allele specific forward primer-1 hybridizes to the target sequence and elongates. Common reverse primer hybridizes and elongates anti-sense strand of target region. Since sample is homozygous for C allele, allele specific forward primer-2 does not hybridize and elongate.

Fig. 10c.

PCR Round 2:

Reverse primer hybridizes, elongates and creates a complementary copy of the corresponding tail sequence.

Fig. 10d.

PCR Round 3:

FAM-labeled oligo separates from quencher and hybridizes to newly synthesized complementary tail sequences, releasing a fluorescent signal.

Fig. 10e.

A LightCycler, together with genotype clustering software, can be used for SNP calling of individual DNA samples.
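The idea behind clustering-based SNP calling can be sketched with a toy classifier that uses the angle of each sample in a FAM versus HEX fluorescence plot. This is only an illustration: the function name, the fixed angle thresholds and the minimum signal cutoff are hypothetical, and commercial clustering software works from the observed clusters rather than fixed cutoffs.

```python
import math


def call_kasp_genotype(fam, hex_signal, min_signal=0.2,
                       low_deg=30.0, high_deg=60.0):
    """Toy genotype call from normalized FAM and HEX fluorescence.

    All thresholds are hypothetical illustration values.
    """
    # Samples near the origin failed to amplify (or are no-template controls).
    if math.hypot(fam, hex_signal) < min_signal:
        return "no call"
    angle = math.degrees(math.atan2(hex_signal, fam))
    if angle < low_deg:
        return "homozygous FAM allele"
    if angle > high_deg:
        return "homozygous HEX allele"
    return "heterozygous"


print(call_kasp_genotype(1.0, 0.05))
print(call_kasp_genotype(0.8, 0.8))
```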


Taqman® genotyping technology

Information courtesy of Life Technologies

Fig. 11a.

One forward and one reverse primer are present. There are two allele specific probes. Each probe has a reporter (FAM/VIC) at one end and a quencher at the other. Taq polymerase and a buffer solution are also present. This SNP assay is designed to detect a C/A SNP at this locus. The target sample is homozygous for the C allele.

Fig. 11b.

Allele specific probe 1 hybridizes to the target sequence. Since the sample is homozygous for the C allele, allele specific probe 2 will not hybridize to the target sequence. Forward and reverse primers will also hybridize to complementary regions of the target region.

Fig. 11c.

Taq polymerase extends from the forward primer and, upon reaching the hybridized allele specific probe, cleaves it through its 5' nuclease activity. The fluorescent dye is released from the quencher and a corresponding signal is generated, indicating which allele is present at the SNP locus.

Fig. 11d.

A LightCycler, together with genotype clustering software, can be used for SNP calling of individual DNA samples.


SimpleProbe® genotyping technology

Information courtesy of Roche Life Sciences

Fig. 12a.

Two allele specific forward primers and a common reverse primer are present. Each probe has a reporter (FAM) and a quencher present at the 3’ end. Taq polymerase and a buffer solution are also present. This SNP assay is designed to detect a C/A SNP at this locus. The target sample is homozygous for the C allele.

Fig. 12b.

Each allele specific forward primer hybridizes to the target sequence. The reporter fluoresces when hybridized to the target sequence and is quenched when denatured from the target sequence.

Fig. 12c.

During the PCR reaction, as the temperature increases, primers with a perfect match to the target sequence will denature at a higher melting temperature. The primers with imperfect matches will denature at a slightly lower temperature because they are not as tightly bound to the target sequence. This indicates which allele is present at the SNP locus.

Fig. 12d.
Fig. 12e.
Fig. 12f.

A LightCycler, together with melting curve analysis software, can be used for SNP calling of individual DNA samples.

Tissue Collection and DNA extraction

Leaf tissue collection

The amount of leaf tissue collected depends on the purpose of collection and whether DNA extraction is going to be carried out from the total amount collected, or just a sample of the collected materials. A relatively larger amount of leaf tissue may be collected for mapping population parents compared to plants to be screened with high throughput genotyping technologies (e.g. KASP) for marker assisted selection (MAS). For all leaf collection it is important to collect healthy, young leaves.

Larger amounts of leaf material are usually collected in 15 ml or 50 ml tubes (Fig 13). These tubes can be put in liquid nitrogen until transferred to an ultralow freezer (-80°C) for long term storage. Alternatively, leaf material can be freeze-dried. For some crops, leaf material can be dried at 60°C for 24 hours and then stored at room temperature.

Fig. 13. Leaf material being collected in 15 ml tubes.

When collecting small amounts of leaf material (just enough for one DNA extraction), it is important to collect the correct amount of leaf material, since the 1st step of the DNA extraction will be carried out in the same tube as collection. One of the most common ways to make sure that the correct amount is consistently collected is to use a paper-punch or biopsy punch (Fig. 14). The researcher will first determine how many punches are optimal for the specific DNA extraction method that is going to be used and then collect that number of punches for each plant. It is important to find a quick and efficient way to collect samples, since a large number of samples usually need to be collected for MAS.

Leaf-punches are usually collected in 96-well plates that can also be used to grind the samples for the 1st step of the DNA extraction. The choice of plates to use depends on the individual researcher, but the plates must fit in the equipment used for grinding.
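When large numbers of plants are sampled into 96-well plates, it helps to track which plant went into which well. The sketch below converts a running sample number into a well name, assuming a standard 8 x 12 plate filled row by row (A1 to A12, then B1, and so on); the function name and the fill order are assumptions, and a lab that fills plates column by column would need to adapt it.

```python
ROWS = "ABCDEFGH"   # 8 rows on a standard 96-well plate
N_COLS = 12         # 12 columns


def well_name(index):
    """Return the well name for a zero-based sample index, filling row by row."""
    if not 0 <= index < len(ROWS) * N_COLS:
        raise ValueError("index must be between 0 and 95")
    row, col = divmod(index, N_COLS)
    return f"{ROWS[row]}{col + 1}"


# First, thirteenth and last sample on a plate.
print(well_name(0), well_name(12), well_name(95))
```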

Fig. 14. Leaf material for DNA can be collected using (a) a paper punch, (b) forceps, or (c) a biopsy punch. After collection, samples are placed in (d) a 96-well plate for grinding and DNA extraction.

Seed and seed chips

The ability to extract DNA from partial seed is very desirable for MAS. A small part of the distal end of the seed is cut off for genotyping, while the rest of the seed is sown (Fig. 15). It is very important that the seed chip is large enough for DNA extraction, but not so large that it slows germination and growth of the seedling. Seed chipping can be done by hand using a sharp blade, but it is very tedious. It is best to have an automated way to carry out this task. For some crops with very small seeds (e.g. tomato) it might not be possible to use seed chip technology.

Fig. 15. The correct (✓) and incorrect (✗) sizes of seed chips for different sizes of watermelon seeds. The distal ends are used for genotyping, while the proximal ends are sown.

DNA extraction methodology

The first step of any DNA extraction method is sample disruption (grinding the leaf or seed material into a fine powder). Traditionally this was accomplished using a mortar and pestle, but now high sample throughput equipment like the Qiagen TissueLyser is used (Fig. 16). A metal ball is added to the sample tube and plant material is ground into a fine powder by shaking fast.

Fig. 16. Leaf material can be disrupted by grinding (a) in a mortar and pestle, (b) with a pestle in a microcentrifuge tube, or (c) by adding a metal ball to the tube and grinding in (d) a Qiagen TissueLyser or similar equipment.

The DNA extraction methodology used depends on the downstream application. Higher quality DNA is required for genotyping-by-sequencing than for MAS using KASP technology. Numerous commercial DNA extraction kits (e.g. Qiagen, Sigma, Omega bio-tek) are available for genomic DNA extraction from plants. Some of these kits work better for some crops than others, so the researcher must determine which kit gives the desired quality and quantity of DNA for the downstream application. The cost of the kits can be prohibitive for large scale MAS. Generally a quick and cheap DNA extraction method is used for downstream PCR based applications. Many different quick DNA extraction methods have been developed and tested for different crops (Gao et al., 2008; Meru et al., 2013). Ideally the chosen method will be quick, cheap, reliable, high throughput and not use any hazardous chemicals.

Quantitative trait loci (QTL) Mapping


  • Quantitative trait = multiple genes involved, phenotype typically follows normal distribution (Fig 17)
  • QTL = a gene or chromosomal region that affects a quantitative trait
  • QTL mapping = process of locating genes with effects on quantitative traits using molecular markers
Fig. 17. Distribution of root length in Arabidopsis thaliana (Li et al., 2014) and soybean (Manavalan et al., 2015) populations.

Linkage mapping and mapping methodologies

  • Quantitative trait loci (QTL) mapping
    • Analysis is based on trait means recorded for a given DNA marker locus
    • Four steps are necessary (Bernardo, 2008)
      • Establishment of a segregating population(s)
      • Accurately phenotyping the population for the trait(s) of interest
      • Genotyping the population with molecular markers (e.g. simple sequence repeat (SSR) and/or single nucleotide polymorphism (SNP) markers)
      • Leveraging statistical analyses to infer QTL location(s) based on marker-trait association

Comparative approaches for mapping as dictated by time, cost, and allele number (Fig 18)

Fig. 18. Comparing methods in terms of time (years, Y-axis) and resolution to QTL (x-axis) using different mapping methods and single nucleotide polymorphisms (SNPs) allele number. Target allele number is by color (Reproduced from Yu and Buckler, 2006).

Breeding populations for bi-parental mapping

  • Elite material typically has a smaller genetic base
  • Crosses of elite and exotic material may be useful as well
  • Select parents that differ phenotypically for a trait of interest
  • Larger difference for trait could improve resolution of QTL identified

Common bi-parental populations

  • F2 and F2:3 families, recombinant inbred lines, double haploids
    • F2:3 families allow you to infer the F2 genotype for single genes of interest and to follow the F2 segregation ratio
    • Recombinant inbred lines hold several advantages
      • Inbreds are fixed and therefore stable
      • More generations of meiosis have created more recombinations to aid in fine mapping
      • The disadvantage is they take more time and resources to create
      • Bulked segregant analysis is a good mapping method for qualitative traits: marker alleles found in one bulk are absent from the other bulk, so it works best when a trait such as disease resistance is scored as present or absent
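Whether a co-dominant marker or single gene follows the expected 1:2:1 F2 segregation ratio is commonly checked with a chi-square goodness-of-fit test. The sketch below is a minimal version of that test; the function name and the example counts are made up for illustration.

```python
def chi_square_f2(n_aa, n_ab, n_bb):
    """Chi-square statistic for fit to a 1:2:1 F2 ratio (2 degrees of freedom)."""
    total = n_aa + n_ab + n_bb
    expected = (0.25 * total, 0.50 * total, 0.25 * total)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Critical value for 2 degrees of freedom at the 5% level.
CRITICAL_5PCT_DF2 = 5.991

stat = chi_square_f2(30, 45, 25)   # hypothetical counts from 100 F2 plants
print(stat, "distorted" if stat > CRITICAL_5PCT_DF2 else "fits 1:2:1")
```

Markers that deviate strongly from the expected ratio are often inspected for genotyping errors or segregation distortion before mapping.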

Recombinant inbred population example

  • Each column represents a RIL. Note the residual heterozygosity in line 3. Mapping is typically done at the F5 stage or later where residual heterozygosity is minimal for most RILs (Fig 19, Image by Dr. Justin Vaughn). However, methods are available that make mapping in the F4 population as useful as using F6 or F7 generations (Fig 20; Takuno et al., 2012).
Fig. 19. Development of a recombinant inbred line population. Also see Fig 20.
Fig. 20. Distributions of LOD scores at markers on chromosome with and without QTL based on simulations (Takuno et al., 2012).
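The reason residual heterozygosity is small by the F5 can be shown with the standard expectation under repeated selfing: heterozygosity at a locus that differs between the parents halves every generation. The sketch below assumes strict selfing with no selection; the function name is hypothetical.

```python
def expected_heterozygosity(generation):
    """Expected proportion of heterozygous loci in generation F_n under selfing.

    Assumes the F1 is heterozygous at every polymorphic locus (proportion 1.0)
    and that each round of selfing halves heterozygosity.
    """
    if generation < 1:
        raise ValueError("generation must be 1 (F1) or higher")
    return 0.5 ** (generation - 1)


for n in (2, 4, 5, 7):
    print(f"F{n}: {expected_heterozygosity(n):.4f}")
```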

Common mapping methods

  • Single-factor analysis: tests the significance of the association between the phenotype and the marker genotype, using one marker at a time
    • The location of the QTL cannot be identified using this method (Bernardo, 2010)
  • Interval mapping: Measures a QTL effect between two adjacent DNA markers (flanking markers) on the same chromosome
    • Requires the calculation of a logarithm of odds (LOD) score, determined as LOD = (likelihood-ratio statistic) / (2 ln 10)
    • A LOD score of 3.0 indicates that the data are 1000 times more likely under a model with a QTL at the tested position than under a model with no QTL
  • Co-location of delayed senescence (green) and grain yield (red and magenta) on QTL Dro-3 (Fig 21)
Fig. 21. Co-location of delayed senescence (green) and grain yield (red and magenta) in cowpea (Muchero, et al., 2013).
  • Multiple interval mapping: Builds a genetic model that includes multiple marker intervals at the same time
    • Stepwise selection identifies the strongest QTL
    • Therefore this method would be used if more than one QTL were creating a significant effect on the phenotype (e.g. two mutations that increase disease resistance)
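The relationship between the likelihood-ratio statistic and the LOD score can be sketched for the simplest case, a single-factor test at one marker. This is a minimal illustration that assumes normally distributed residuals, so that the likelihood-ratio statistic equals n ln(RSS_null / RSS_marker); the function names and the toy data are hypothetical, and dedicated QTL software should be used for real analyses.

```python
import math
from collections import defaultdict


def single_marker_lod(genotypes, phenotypes):
    """LOD score for a marker-trait association at one marker.

    Compares a model with one mean per marker genotype class against a
    model with a single overall mean. Residual variation within the
    marker classes must be non-zero.
    """
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)

    by_class = defaultdict(list)
    for g, y in zip(genotypes, phenotypes):
        by_class[g].append(y)
    rss_marker = 0.0
    for values in by_class.values():
        class_mean = sum(values) / len(values)
        rss_marker += sum((y - class_mean) ** 2 for y in values)

    lrt = n * math.log(rss_null / rss_marker)   # likelihood-ratio statistic
    return lrt / (2.0 * math.log(10.0))         # LOD = LRT / (2 ln 10)


def lod_to_likelihood_ratio(lod):
    """Convert a LOD score back to a likelihood ratio (LOD 3 gives 1000)."""
    return 10.0 ** lod


print(single_marker_lod(["AA", "AA", "BB", "BB"], [1.0, 2.0, 3.0, 4.0]))
print(lod_to_likelihood_ratio(3.0))
```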

Pedigree mapping

  • Pedigree information includes the parents used to generate the progeny and the relationships of individuals
  • By knowing the relationships of individuals you can distinguish alleles based on their origin
    • Alike in state: Alleles are physically the same (i.e. at the DNA level)
    • Identical by descent: The same allele was contributed by a common ancestor
      • Can be calculated as the coefficient of coancestry
  • Pedigree mapping is a good choice if many existing populations are available and there are few individuals in each population, but good pedigree information exists
    • Bi-parental mapping can be limited by low polymorphism rates, giving pedigree-based mapping an advantage
    • Family based QTL mapping can be done without labor intensive crosses and waiting for generation advances if the crosses have already been made
    • The QTL can be observed in different genetic backgrounds
    • You can leverage existing families in a breeding program
  • Mapping methods include the variance component method
    • Identity by descent probabilities are calculated for each DNA marker and equally spaced genomic regions between marker pairs (Rosyara et al., 2009)
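The coefficient of coancestry mentioned above can be computed recursively from a pedigree. The sketch below uses the standard recursion (the coancestry of two individuals is the average coancestry of one with the parents of the other) and assumes that founders are unrelated and non-inbred; the pedigree format and function names are hypothetical.

```python
from functools import lru_cache


def make_coancestry(pedigree):
    """Build a coancestry function from a pedigree.

    pedigree: dict mapping individual -> (parent1, parent2), listed oldest
    first; founders have (None, None) and are assumed unrelated, non-inbred.
    """
    order = {ind: i for i, ind in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def coancestry(x, y):
        if x is None or y is None:
            return 0.0
        if x == y:
            p1, p2 = pedigree[x]
            # Self-coancestry is 0.5 * (1 + inbreeding coefficient).
            return 0.5 * (1.0 + coancestry(p1, p2))
        # Always expand the younger individual through its parents.
        if order[x] < order[y]:
            x, y = y, x
        p1, p2 = pedigree[x]
        return 0.5 * (coancestry(p1, y) + coancestry(p2, y))

    return coancestry


# Two founders (A, B) and two full sibs (C, D) from the cross A x B.
ped = {"A": (None, None), "B": (None, None), "C": ("A", "B"), "D": ("A", "B")}
f = make_coancestry(ped)
print(f("C", "D"), f("A", "C"))
```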

Association mapping

  • Detection of QTL in general population instead of in biparental or backcross populations
  • Useful when mapping populations difficult to create, and to exploit wide range of genetic variation
  • Significant association between genotypic marker and trait considered proof of linkage between phenotype and causal site (Fig. 22)
  • Genome-wide association studies
Fig. 22. Manhattan plot showing marker association across maize genome (Larsson, et al., 2013).
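Because a genome-wide scan tests many markers, the significance threshold drawn on a Manhattan plot has to account for multiple testing. One common and conservative choice, used here only as an illustration and not taken from the cited study, is a Bonferroni correction. The function name and the marker count are hypothetical.

```python
import math


def bonferroni_threshold(n_markers, alpha=0.05):
    """Genome-wide threshold on the -log10(p) scale under a Bonferroni correction."""
    return -math.log10(alpha / n_markers)


# With 1,000 markers, a marker needs -log10(p) above about 4.3 to be declared significant.
print(round(bonferroni_threshold(1000), 3))
```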

Natural populations for association mapping

  • A large panel of genotypes for the crop of interest are selected
  • Ideally very genetically diverse (Fig. 23)
  • Need to account for population structure
Fig. 23. Diverse seeds. Image: [www.cookipedia.co.uk].

Fine mapping and recombinant creation

  • Recombinants can be screened for in populations in different stages such as recombinant inbred lines or F2s, F2:3 Families etc.
    • When an interval is defined to harbor a gene of interest, identifying and facilitating more recombinations is essential to narrow the region
    • Plants heterozygous in the locus of interest can be self-pollinated to create more recombinations, thus a single unique F2:3 plant can in a sense create a mapping population
    • Alternatively unique recombinants in a population can be identified that allow for a region to be narrowed in size
  • Fine mapping allows researchers to identify candidate genes in the sequence if it is available and annotated

Comparative mapping approaches can leverage similarity of loci, location of loci, or both comparing among closely related species (Gale and Devos, 1998)

Plant Phenotyping for QTL Mapping

High precision phenotyping is essential for QTL mapping. This is especially important for traits controlled by many genes with small effect. Objective quantitative measurements are desirable, but in practice many traits, especially disease resistance, are often scored subjectively (Fig. 24).

Fig. 24. Disease severity scoring method for Fusarium wilt resistance phenotyping of watermelon seedlings.

Population structure and environment

The most popular population types used for bi-parental mapping are F2, recombinant inbred lines (RILs) and backcross (BC) populations (Fig. 25). When QTL mapping is carried out using F2 or BC populations, the phenotyping can either be carried out on the F2 and BC plants themselves, or the plants can be selfed and phenotyping is carried out on the F2:3 and BC1F2 families. The disadvantage of carrying out the phenotyping on F2 or BC1 plants is that only a single plant is available for phenotyping a specific genotype. This might be effective for highly heritable major gene traits that can be reliably phenotyped on a single plant, but it is not very useful for quantitative traits. The ability to phenotype many plants per genotype increases the accuracy of the mean phenotypic measurement and it allows phenotyping in multiple environments (years and/or locations).
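The gain in accuracy from phenotyping several plants per genotype follows from the standard error of a mean, which shrinks with the square root of the number of plants measured. The sketch below illustrates this, assuming independent plants with the same plant-to-plant standard deviation; the function name and numbers are hypothetical.

```python
import math


def standard_error_of_mean(plant_sd, n_plants):
    """Standard error of a family mean based on n independent plants."""
    return plant_sd / math.sqrt(n_plants)


# Quadrupling the number of plants per family halves the standard error.
for n in (1, 4, 16):
    print(n, standard_error_of_mean(2.0, n))
```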

Phenotyping should generally be carried out in multiple environments and/or years (multi-environment trials; METs). Often major QTL are stable across years and environments, while smaller QTL might not be (Fig. 26). These stable, major QTL are ideal targets for MAS.

Fig. 25. Common population types that are used for QTL mapping are (a) F2, (b) backcross and (c) recombinant inbred lines.
Fig. 26. QTL associated with flowering time in watermelon in 2012 (green) and 2013 (blue). Numbers in parentheses are %R2 for a particular QTL. The major QTL on chromosome 3 were stable across years, while the minor QTL on chromosome 3 and chromosome 11 were not.

It is desirable to have phenotyping methods that are high throughput and reliable. High throughput seedling phenotyping is often carried out in the greenhouse where environmental conditions can be carefully controlled (Fig. 27). With the advances in genotyping, phenotyping has become the major bottleneck in plant breeding. High throughput remote sensing is emerging as the new frontier in plant breeding research (Cobb et al., 2013; Fiorani and Schurr, 2013).

Fig. 27. High throughput tray dip method for phenotyping Fusarium wilt resistance in watermelon seedlings.

Experimental design for phenotyping

The principles of experimental design used to phenotype mapping populations are essentially the same as those used to phenotype breeding populations. The experimental design used for phenotyping will depend on the generation used for phenotyping. If F2 or BC plants are used for phenotyping, only a single individual plant is available for each genotype, so no replication is possible. This type of phenotyping is only useful for highly heritable traits, where the environment has little or no effect. An example of a trait that was mapped by phenotyping a single F2 plant is the egusi seed phenotype in watermelon (Fig. 28). The egusi seed phenotype is not influenced by the environment and can be accurately phenotyped from a single plant, which makes it an appropriate trait for mapping in an F2 population. Single plant phenotyping can be severely hampered for traits with low heritability where environmental variation has a large effect on the expressed phenotype. Spaced checks are often used as a way to “correct” for variability across a field when replication is not possible. Gridding and augmented designs (Federer and Crossa, 2012) are especially useful when phenotyping without replications.

When the mapping population allows for replicated trials (F2:3, RILs, etc.), completely randomized design (CRD), randomized complete block design (RCBD) and lattice designs are popular. Mapping populations are usually large (~200 lines) and complete designs with replication can take up a lot of resources (land, labor, etc.). Also, the larger the block gets, the more heterogeneous the conditions within the block become. Practical issues, like the number of samples that can be processed for phenotyping, can also influence the choice of design. Incomplete designs (e.g. alpha lattice) can be used to overcome some of these issues. Researchers often have to find a balance between the importance of local replication and the need to replicate the experiment at various locations (METs).
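A randomized complete block layout can be generated with a few lines of code: every line appears once in each block, and the order within each block is randomized independently. The sketch below is a minimal illustration; the function name, entry names and seed are hypothetical, and dedicated design software would normally be used for lattice or augmented designs.

```python
import random


def rcbd_layout(entries, n_blocks, seed=None):
    """Return a list of blocks, each a random ordering of all entries."""
    rng = random.Random(seed)   # a seed makes the layout reproducible
    layout = []
    for _ in range(n_blocks):
        block = list(entries)
        rng.shuffle(block)      # independent randomization within each block
        layout.append(block)
    return layout


lines = [f"RIL_{i:03d}" for i in range(1, 7)]   # six hypothetical lines
for block_number, block in enumerate(rcbd_layout(lines, 3, seed=42), start=1):
    print(block_number, block)
```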

Fig. 28. The egusi seed phenotype in watermelon is not influenced by the environment and was mapped on chromosome 6 in an F2 population that was grown in an unreplicated field trial.

QTL Validation

Insufficient validation of mapped QTL is one of the main reasons why a relatively small percentage of published QTL are used in breeding programs. The associations of QTL with phenotypes need to be validated in different genetic backgrounds, as well as different environments.

Genetic background

The parental genotypes used in mapping populations are usually selected because they have different phenotypes for a specific trait. However, in a breeding program many other lines will be used, and it is important to confirm that the marker allele is associated with the desired phenotype when crossing with these other genetic backgrounds.

There are several ways to validate QTL in different genetic backgrounds. The researcher can map the same trait in several populations in search of QTL that are detected in multiple populations with different genetic backgrounds (Fig. 29). Marker association can also be tested by crossing the parental line with the desired phenotype to several other lines and confirming that the linked marker predicts the phenotype in the progeny. Alternatively, existing breeding lines or cultivars with different genetic backgrounds can be phenotyped and genotyped using a linked marker to confirm that marker alleles are predictive of phenotype.

Fig. 29. QTL associated with fruit length in watermelon in three different populations (SxE, KxN and ZxD). The trait was also mapped in two different locations (GA and CA) in the KxN population. A stable QTL across all three genetic backgrounds and two locations was located on linkage group (LG) 2. A QTL on LG 11 was stable across two of the populations, but was not detected in the SxE population.
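One simple way to check whether a linked marker is predictive in a validation panel is to ask how much of the phenotypic variation is explained by the marker genotype classes. The sketch below computes this as the between-class share of the total sum of squares (the R² of a one-way analysis of variance). The function name `marker_r2`, the genotype codes and the fruit length values are hypothetical and only serve to show the calculation.

```python
from statistics import mean

def marker_r2(genotypes, phenotypes):
    """Fraction of phenotypic variance explained by marker genotype class."""
    grand_mean = mean(phenotypes)
    # Group the phenotypes by marker genotype (e.g. "AA", "AB", "BB")
    groups = {}
    for genotype, value in zip(genotypes, phenotypes):
        groups.setdefault(genotype, []).append(value)
    ss_total = sum((value - grand_mean) ** 2 for value in phenotypes)
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_between / ss_total

# Hypothetical panel of six unrelated lines genotyped with one linked marker
genotypes = ["AA", "AA", "AA", "BB", "BB", "BB"]
fruit_length = [30.0, 32.0, 31.0, 20.0, 22.0, 21.0]
r2 = marker_r2(genotypes, fruit_length)
```

A value close to 1 suggests that the marker allele tracks the phenotype in the new background, while a value close to 0 suggests that the association found in the mapping population does not carry over.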


Environment

Most quantitative traits are affected by the environment to some degree. How stable QTL are across different environments depends on the particular trait and QTL. Therefore, the effect of a particular QTL on phenotype should be determined in multiple environments (years and locations). Generally, the effect of the QTL should be confirmed in the range of environments where the resulting cultivar will be grown.

During QTL mapping, phenotyping should be carried out in multiple locations and/or years (Fig. 29 and Fig. 30) in order to identify those QTL that are stable across environments. Once markers linked to the trait have been identified, their ability to predict phenotype in different environments can be validated using a subset of the mapping population, because mapping populations are usually large and phenotyping entire populations can be expensive and time-consuming.

Fig. 30. QTL associated with sugar content (brix, BRX) in watermelon in two different populations (SxE and KxN). The trait was mapped in two different locations (GA and CA) in the KxN population. None of the QTL were stable across different genetic backgrounds, but the QTL on LG 7 was stable across two different locations in the same population.

Molecular Breeding

Marker-assisted Selection

  • Phenotyping is labor intensive, and some traits are not measurable until after the opportunity for crossing (e.g. seed composition, yield)
  • DNA markers can speed up the selection process and improve accuracy
  • Markers can be from either parent in population development, or can be used to screen many accessions
  • For example, a marker for disease resistance from a donor parent can be used to screen the progeny, or a marker for drought tolerance can be used to screen hundreds of lines to decide which to take forward in the breeding program

Marker-assisted Backcross

  • Backcrossing is the repeated crossing of a hybrid and its progeny to one of the original parents (the “recurrent parent”)
  • This is done to reduce the proportion of the genome from the other parent (the “donor parent”) in the progeny
  • Markers allow early selection of the offspring in each backcross generation that carry the least donor-parent genome and the most recurrent-parent genome
  • With backcrossing alone, the donor allele for the trait of interest cannot reach a homozygous state, because every backcross progeny receives one allele from the recurrent parent; the offspring must also be selfed during the process
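
The effect of repeated backcrossing on genome composition can be sketched numerically. The example below uses the standard expectation that, without any selection, each backcross halves the remaining donor-parent genome on average, so the expected recurrent-parent fraction after n backcrosses is 1 - (1/2)^(n+1). The function name is hypothetical, and individual plants vary around these averages, which is what marker-based background selection exploits.

```python
def expected_recurrent_fraction(n_backcrosses):
    """Expected recurrent-parent genome fraction after n backcrosses.

    n_backcrosses = 0 is the F1 hybrid (half of the genome from each parent).
    This is an average over progeny, assuming no selection.
    """
    return 1 - 0.5 ** (n_backcrosses + 1)

# Print the expectation for the F1 and the first four backcross generations
for n in range(5):
    generation = "F1" if n == 0 else f"BC{n}"
    print(f"{generation}: {expected_recurrent_fraction(n):.4f}")
```

Selecting, in each generation, the plants whose marker profile is above this average is one way to reach a given recurrent-parent fraction in fewer backcrosses.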

Target Trait Selection

Fig. 31.

Background Selection

Fig. 32.

Recombinant Selection

Fig. 33.

Genomic Selection (GS)

  • Definition
    • The use of a large number of molecular markers to predict the performance of lines of a particular crop species for a particular trait of interest
  • When to Use
    • When many genes or loci with small effects control the trait of interest
    • When phenotypic selection is ineffective
    • When cost of genotyping is less than phenotyping
    • When heritability of trait is high in training population and low in untested population
    • When there are one or a few genes controlling the trait, it is easier to use marker-assisted selection to introgress the trait of interest.
  • How it Works
    • Decreased genotyping costs and new statistical models enable simultaneous estimation of all marker effects
    • A new form of MAS that estimates all marker effects across the whole genome to calculate genomic estimated breeding values (GEBV)
    • Markers are not tested for significance - all markers are used in selection
  • Factors affecting the accuracy of GS
    • Non-additive effects
    • Relatedness of training set to predicted sets
    • Size of training data set
    • Trait heritability
    • Number of markers
    • Genetic model
    • Statistical model
  • Statistical Methods of Prediction
    • Least Absolute Shrinkage and Selection Operator (LASSO)
    • Best Linear Unbiased Prediction (BLUP)
    • Stochastic Search Variable Selection (SSVS or Bayes B)
    • Support Vector Machine - Radial Basis Function (SVM-RBF)
    • Support Vector Machine – Polynomial Kernel Function (SVM-POLY)
    • Partial Least Squares Regression (PLS)
Fig. 34.
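
The idea of estimating all marker effects simultaneously and summing them into a GEBV can be illustrated with ridge regression, which is closely related to the marker-based form of BLUP listed above. The sketch below is a minimal pure-Python version under several assumptions: markers are coded -1/0/1, phenotypes are already centered, and the shrinkage parameter `lam` is simply chosen rather than estimated from variance components. All function names and data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def ridge_marker_effects(X, y, lam):
    """Estimate all marker effects at once: (X'X + lam * I) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def gebv(X, effects):
    """GEBV of each line: sum of its marker codes times the marker effects."""
    return [sum(code * effect for code, effect in zip(row, effects)) for row in X]

# Training set: genotyped and phenotyped lines (4 lines, 2 markers)
train_markers = [[1, 0], [0, 1], [1, 1], [-1, 1]]
train_pheno = [2.0, -1.0, 1.0, -3.0]           # centered phenotypes
effects = ridge_marker_effects(train_markers, train_pheno, lam=3.0)

# Candidate lines: genotyped only, ranked by their predicted GEBV
candidate_markers = [[1, -1], [-1, -1]]
predictions = gebv(candidate_markers, effects)
```

No marker is tested for significance here; every marker receives a shrunken effect and contributes to the prediction, which mirrors the description in the list above. In practice there are usually far more markers than lines, and dedicated software is used to estimate the shrinkage.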

Plant Germplasm Characterization using DNA Markers

Genetic diversity and the effects of crop domestication

In essence, genetic diversity can be thought of in terms of the number of genes in a population that have multiple versions. Because stable genetic variation is the basis of natural and artificial selection, characterizing diversity is often a first step in understanding if and how a population will respond to selection. Variable positions in a genome are called “sequence polymorphisms” and are the result of mutations that have changed the ancestral state by substituting a new base or inserting/deleting whole chunks of DNA. Alleles are versions of a polymorphic site - the mutant and the ancestral state, for example. More mutations will have accumulated between two cousins than between two siblings, and so on. In addition, mutations will potentially be recombined during meiosis leading to new combinations of all the sequence polymorphisms present in a population. The pattern of sequence polymorphisms in a population can tell us about the history of that population and its structure. From a plant breeding perspective, these patterns can also help us to select lines that might contribute complementary alleles to a cultivar.

Genetic diversity can be measured in various ways (Fig. 35). One measure is the number of polymorphic sites (S) divided by the number of sites sampled in a population (L): ps = S / L. This value reflects the number of genes in a population that have multiple versions. In addition, ps can be used to estimate the effective population size (Ne). The degree to which Ne is different from the actual population size (N) is indicative of the degree to which the population (or the position in the genome) deviates from what would be expected if the polymorphisms were neutral and the individuals were mating at random. Another measure of genetic diversity is the average pairwise difference, also called nucleotide diversity (π), which, unlike ps above, is sensitive to the allele frequency of a polymorphism. In other words, two populations can have the same ps but different π values because one population has many rare alleles. Rare alleles decrease π more rapidly than they do ps because calculating π involves many pairwise comparisons, and a rare allele contributes a difference to only a few of them. Like ps, π can be used to assess Ne. Barring selection and/or recent changes in population size, estimates of Ne based on ps or π should be the same.

Fig. 35. Examples for two primary measures of diversity illustrated using two sets of aligned sequences. Gray columns are polymorphic sites.
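
The two measures can be computed directly from a set of aligned sequences. The sketch below is a minimal illustration with made-up four-base sequences; the function names are hypothetical, and π is expressed per site here (average pairwise differences divided by L) so that it is on the same scale as ps, which is an assumption of this example.

```python
from itertools import combinations

def p_s(seqs):
    """Proportion of polymorphic sites: S / L."""
    n_polymorphic = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    return n_polymorphic / len(seqs[0])

def nucleotide_diversity(seqs):
    """Average number of differences per site over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total_differences = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return total_differences / len(pairs) / len(seqs[0])

# Two hypothetical populations with one polymorphic site out of four
common_allele_pop = ["AAAA", "AAAA", "TAAA", "TAAA"]  # T allele at 50%
rare_allele_pop = ["AAAA", "AAAA", "AAAA", "TAAA"]    # T allele at 25%

ps_common = p_s(common_allele_pop)
ps_rare = p_s(rare_allele_pop)
pi_common = nucleotide_diversity(common_allele_pop)
pi_rare = nucleotide_diversity(rare_allele_pop)
```

Both populations have the same ps, but π is lower in the population where the variant allele is rare, because fewer of the pairwise comparisons involve a sequence that carries it.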

In the vast majority of cases, crops undergo a substantial population contraction related to domestication: only a few plants from the wild population are satisfactory to farmers and those plants go on to give rise to the future crop population. The domestication contraction is exacerbated in plant species such as soybean, which produce male and female gametes on the same plant but generally only fertilize themselves. In such cases, the recently domesticated species stays reproductively isolated from its wild progenitor. Moreover, when a crop is transported by humans (or other animals) to regions outside the native habitat of its wild progenitor, it has virtually no chance of regaining any semblance of its original diversity (barring human intervention). The genetic implications of this scenario can clearly be observed in soybean, wherein the original domestication event reduced the number of polymorphic sites in landraces and migration to North America still further reduced levels of diversity (Fig. 36).

Fig. 36. Y-axis represents the number of single-nucleotide polymorphisms (SNPs) present within each population on the X-axis. “G. soja” is the wild species from which soybean was domesticated. “Landraces” are the soybean varieties that pre-date modern breeding and are generally concentrated within a crop's center of origin. “N. Am. Ancestors” are the small sample of landraces that became the genetic base of modern breeding programs in North America. “Elite Cultivars” are contemporary soybean varieties. Light-green boxes indicate the number of SNPs in a particular population that are only polymorphic in that population. The relative difference in this SNP category between G. soja and Landraces and Elite Cultivars is a hallmark of the domestication bottleneck. From Hyten et al. 2006.

Allele frequency and regions under breeding selection

New mutations in a population create new polymorphic sites and new alleles, so the number of polymorphic sites is clearly related to mutation rate. The number of polymorphic sites is also related to the size of the population. In very small populations, an allele can disappear from one generation to the next merely because small samples (in this case, the next generation) are more sensitive than large samples to stochastic variation, particularly when an allele becomes rare (<10%): if a bag contains 10 white and 90 black marbles, we would often draw all black marbles if limited to only 5 draws; alternatively, we would rarely draw all black marbles if given 200 draws (Fig. 37). In either case, chance loss of an allele is known as genetic drift. In contrast, natural or artificial selection reflects the fact that an allele's chance of being included in the next generation is greater than or less than its current frequency. Thus, detecting alleles that are under selection is akin to a Las Vegas gaming inspector detecting whether a casino is rolling a fair die or spinning a fair roulette wheel. In turn, successful detection depends heavily on how much, if at all, the casino is cheating.

Fig. 37. Effect of sample size (or size of the next generation) on the permanent loss of an allele.
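
The marble example can be worked through with a one-line probability. The sketch below assumes that each draw is independent and made with replacement (the 200-draw case exceeds the 100 marbles in the bag, so replacement is implied); the function name is hypothetical.

```python
def prob_allele_lost(allele_freq, n_draws):
    """Probability that an allele is absent from n independent random draws."""
    return (1 - allele_freq) ** n_draws

# White marbles (the rare allele) make up 10 of 100 marbles, i.e. 10%
lost_in_small_sample = prob_allele_lost(0.10, 5)    # about 0.59
lost_in_large_sample = prob_allele_lost(0.10, 200)  # about 7e-10
```

With 5 draws the rare allele is missed more often than not, whereas with 200 draws its loss by chance alone is extremely unlikely, which is why drift matters most in small populations.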

Identifying regions of a genome that are under selection is very important in order to understand which genes transform a wild plant into a crop and which genes result in improved varieties of that crop. As is clearly evident from Fig. 38, the frequencies of many alleles have changed dramatically since North American breeders began to improve the soybean ancestors that immigrated here. Can we determine which of these alleles are under selection and which are changing because of drift? If such populations were behaving just like our bag of white and black marbles, then such detection would be fairly routine. Unfortunately, the population of North American varieties has hidden structure: extending our bag analogy, there are multiple bags and different numbers of marbles are drawn from each. Finding this hidden structure and identifying selected alleles is an ongoing challenge in crop genetics.

Fig. 38. Allele frequency of hundreds of polymorphic sites across chromosome 3 of the soybean genome. Y-axis indicates the decade or era of cultivar's release. Color coding is based on the allele frequency and is relative to the most common allele in the known ancestors of all North American germplasm established in the early 1900s (“Ancestors”). In other words, these alleles will not appear at <50% in the “Ancestors” panel.

Variety Protection

  • Requirements for Plant Variety Protection
    • Distinct – The plant can be distinguished by one or more characteristics
    • Uniform – There is little to no variation within the characteristics that are distinct
    • Stable – When plants are reproduced, the distinctiveness and uniformity remain unchanged
Fig. 39.

Types of Variety Protection

Table 2.

Common Licenses/Agreements for Varieties

Table 3.


References

Baird, N.A., P.D. Etter, T.S. Atwood, M.C. Currey, A.L. Shiver, et al. 2008. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One 3: e3376.

Bernardo, R. 2008. Molecular markers and selection for complex traits in plants: Learning from the last 20 years. Crop Sci. 48: 1649-1664.

Bernardo, R. 2010. Breeding for Quantitative Traits in Plants. 2nd ed. Stemma Press, Woodbury, MN. ISBN 978-0-9720724-1-0.

Cobb, J., G. DeClerck, A. Greenberg, R. Clark, and S. McCouch. 2013. Next-generation phenotyping: requirements and strategies for enhancing our understanding of genotype–phenotype relationships and its relevance to crop improvement. Theor. Appl. Genet. 126:867-887.

Collard, B.C.Y. and D.J. Mackill, 2008. Marker-assisted selection: An approach for precision plant breeding in the twenty-first century. Phil. Trans. R. Soc. B 363:557-572

Collard, B.C.Y., M.Z.Z. Jahufer, J.B. Brouwer, and E.C.K. Pang, 2005. An introduction to markers, quantitative trait loci (QTL) mapping and marker-assisted selection for crop improvement: The basic concepts. Euphytica 142:169-196.

Cregan, P.B., 1992. Simple sequence repeat DNA length polymorphisms. Probe 2: 18–22.

de Arruda, M., E. Gonçalves, M. Schneider, A. da Costa da Silva, and E. Morielle-Versute. 2010. An alternative genotyping method using dye-labeled universal primer to reduce unspecific amplifications. Mol. Biol. Rep. 37:2031-2036.

Eathington, S.R., T.M. Crosbie, M.D. Edwards, R.S. Reiter, and J.K. Bull, 2007. Molecular Markers in a Commercial Breeding Program. Crop Sci.47(S3):S154-S163

Elshire, R.J., J.C. Glaubitz, Q. Sun, J.A. Poland, K. Kawamoto, E.S. Buckler, and S.E. Mitchell. 2011. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE 6:e19379.

Federer, W.T. and J. Crossa. 2012. I.4 Screening experimental designs for quantitative trait loci, association mapping, genotype-by environment interaction, and other investigations. Frontiers in Physiology 3:156.

Fiorani, F. and U. Schurr. 2013. Future scenarios for plant phenotyping. Annu. Rev. Plant Biol. 64:267-291.

Gale, M.D., and K.M. Devos. 1998. Comparative genetics in the grasses. Proc. Natl. Acad. Sci. (USA) 95: 1971-1974.

Gao, S., C. Martinez, D. Skinner, A. Krivanek, J. Crouch, and Y. Xu. 2008. Development of a seed DNA-based genotyping system for marker-assisted selection in maize. Mol. Breed. 22:477-494.

Hamilton, M. 2011. Population genetics. John Wiley & Sons.

Hyten, David L., et al. 2006. Impacts of genetic bottlenecks on soybean genome diversity. Proceedings of the National Academy of Sciences 103.45: 16666-16671.

Larsson, S. J., Lipka, A. E. and E. S. Buckler. 2013. Lessons from Dwarf8 on the strengths and weaknesses of structured association mapping. PLoS genetics, 9(2).

Li, W., H. Duan, F. Chen, Z. Wang, X. Huang, X. Deng and Y. Liu, 2014. Identification of Quantitative Trait Loci Controlling High Calcium Response in Arabidopsis thaliana. PloS one, 9(11).

Manavalan, L.P., S.J. Prince, T.A. Musket, J. Chaky, D. Reshmukh, T. D. Vuong, L. Song, P. B. Cregan, J. C. Nelson, J.G. Shannon, J.E. Specht and H.T. Nguyen. 2015. Identification of Novel QTL Governing Root Architectural Traits in an Interspecific Soybean Population. PloS one, 10(3).

Meru, G., D. McDowell, V. Waters, A. Seibel, J. Davis and C. McGregor. 2013. A non-destructive genotyping system from a single seed for marker-assisted selection in watermelon. Gen. Mol. Res. 12:702-709.

Meyer, R.S. and M.D. Purugganan. 2013. Evolution of crop species: genetics of domestication and diversification. Nature Reviews Genetics 14: 840-852.

Morgante, M. and A.M. Olivieri, 1993. PCR-amplified microsatellites in plant genetics. The Plant Journal 3: 175–182.

Mumm, R.H. 2007. Backcross versus forward breeding in the development of transgenic maize hybrids: Theory and practice. Crop Sci. 47(S3):S164-S171

Rosyara, U.R., J.L. Gonzalez-Hernandez, K.D. Glover, K.R. Gedye and J.M. Stein. 2009. Family-based mapping of quantitative trait loci in plant breeding populations with resistance to Fusarium head blight in wheat as an illustration. Theor. Appl. Genet. 118:1617-1631.

Sleper, D.A. and J.M. Poehlman, 2006. Breeding Field Crops 5th Ed. Blackwell Publishing.

Southern, E.M. 1975. Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98:503-517.

Spindel, J., M. Wright, C. Chen, J. Cobb, J. Gage, S. Harrington, M. Lorieux, N. Ahmadi and S. McCouch. 2013. Bridging the genotyping gap: using genotyping by sequencing (GBS) to add high-density SNP markers and new value to traditional bi-parental mapping and breeding populations. Theor. Appl. Genet. 126:2699-2716.

Takuno, S., R. Terauchi and H. Innan. 2012. The power of QTL mapping with RILs. PloS one, 7(10).

Uitdewilligen, J.G.A.M.L., A.-M.A. Wolters, B.B. D’hoop, T.J.A. Borm, R.G.F. Visser and H.J. van Eck. 2013. A Next-Generation Sequencing method for Genotyping-by-Sequencing of highly heterozygous autotetraploid potato. PLoS ONE 8:e62355.

Vos, P., R. Hogers, M. Bleeker, M. Reijans, T. van der Lee, M. Hornes, A. Frijters, J. Pot, J. Peleman, M. Kuiper and M. Zabeau, 1995. AFLP: a new technique for DNA fingerprinting. Nucl Acids Res 23(21): 4407–4414.

Welsh, J. and M. McClelland, 1990. Fingerprinting genomes using PCR with arbitrary primers. Nucl Acids Res 18: 7213–7218.

Williams, J.G.K., A.R. Kubelik, K.J. Livak, J.A. Rafalski and S.V. Tingey. 1990. DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucl Acids Res 18: 6531–6535.

Yu, J. and E.S. Buckler. 2006. Genetic association mapping and genome organization of maize. Current Opinion in Biotechnology. 17:155-160.