DNA Marker Analysis Reveals Genomic Diversity and Putative QTL Associated with Drupe Traits in Phyllanthus emblica

Phyllanthus emblica L. (V. Nelli) is an important constituent of indigenous medicine and a commercially important fresh fruit species. The genomic diversity of the P. emblica germplasm in Sri Lanka has not been studied in detail. Therefore, the present study was conducted to molecularly characterize the P. emblica germplasm in Sri Lanka. Young tender leaves from 66 trees of P. emblica were collected from various parts of the country for DNA assessment. Genomic DNA was extracted using the DNeasy Plant mini kit. Six microsatellite (SSR) markers and one SCAR marker developed for P. emblica were used in the PCR amplification. The PCR products were size separated using 6% denaturing polyacrylamide gel electrophoresis. A binary scoring matrix was constructed from the presence and absence of alleles at each locus, and a dendrogram based on genetic similarities was constructed by cluster analysis of the marker data. For each allele, allele frequency, heterozygosity and polymorphic information content were calculated. Association analysis was carried out to detect putative QTL associated with drupe traits. The dendrogram based on the alleles of the seven DNA markers showed two major clusters at a molecular similarity coefficient of 14.67%. A total of 51 alleles were detected in the sampled P. emblica germplasm with the six SSR markers and the SCAR marker used. Out of the six SSR markers, four were polymorphic, one was monomorphic and the other one did not produce clear bands. Heterozygosity of the tested markers ranged from 0.44 to 0.89. The 176 bp allele of the marker Phyll112 showed a significant association (P<0.001 and P<0.0001) with commercially important traits such as drupe height, width, weight and mesocarp thickness. The P. emblica germplasm in Sri Lanka shows high variation, and this diversity can be used to develop genetically superior varieties in the future and to dissect the underlying molecular genetic mechanisms of drupe development.


INTRODUCTION
Phyllanthus emblica is a medicinally important fruit crop species. It is a member of the family Phyllanthaceae (formerly, Euphorbiaceae) and occurs naturally in South Eastern Asia (Dassanayake and Fosberg, 1988). All parts of the tree are utilized in indigenous medicine, yet the drupe plays the central role as a remedy for a number of ailments (Krishnaveni and Mirunalini, 2011; Daisy and Rajathi, 2009). Inscriptions indicate that P. emblica was used in Sri Lankan indigenous medicine as far back as 350 AD. In ancient literature, it is referred to as "Ambulu" (Compendium of Medicinal Plants, 2002). P. emblica has a high economic value because of its medicinal and nutritive richness. Therefore, it is used to produce cosmetics, commercial beverages, confectionery and medicinally important value added products. The fruit itself is in high demand as fresh fruit in the market. Thus, the mature drupe size and the bitterness of P. emblica are the most important traits for commercial exploitation. In the modern era, with the boost in the 'global herbal market', P. emblica has a high potential as a cash crop in Sri Lanka. In India alone, 10,000 tons of P. emblica drupes are commercially utilized per annum (Sharma et al., 2008a).
P. emblica germplasm in Sri Lanka exhibits vast diversity in drupe morphology (Pushpakumara et al., 2007; Mawalagedera et al., 2014). Some trees bear small bitter drupes while others produce large and less bitter drupes. The fresh fruit market prefers the larger and less bitter drupes. Therefore, there is a grower driven selection towards trees bearing large and less bitter drupes. However, small and bitter drupe types are immensely important for maintaining the diversity of P. emblica and, above all, for preserving its medicinal value. The grower driven selection leads to uprooting of the small drupe bearing trees, causing genetic erosion and threatening the use of the species in indigenous medicine.
The diversity of drupe traits has been examined on the basis of drupe morphological traits of 66 trees observed in various parts of Sri Lanka (Mawalagedera et al., 2014). Identifying genetic loci controlling drupe size variation would aid the selection of trees for marker assisted breeding of superior varieties (preferably those bearing large and more bitter drupes), which could ease the tedious processing of drupes in indigenous medicine. Assuming drupe size is a quantitative trait, as in most fruit crops, dissecting the QTL associated with the trait could also help in identifying, at the seedling stage, trees capable of producing large drupes through DNA marker screening. This would help growers to obtain a yield with the desired drupe size that could capture both local and global fresh fruit markets. Furthermore, it would assist in effective conservation of germplasm by identifying a core collection for P. emblica germplasm in Sri Lanka, avoiding duplicates and ensuring that the medicinally important portion of the germplasm is conserved. However, to do so, the diversity of P. emblica germplasm in Sri Lanka has to be molecularly characterized, which, to our knowledge, has not yet been done.
The current study was planned with the objectives of assessing the DNA based genetic diversity, comparing molecular vs. morphological diversity, identifying candidate QTL associated with drupe traits and making inferences on the ploidy level of P. emblica.

MATERIALS AND METHODS
Sample collection
Young tender leaves were collected from 66 trees of Phyllanthus emblica from various parts of the Kandy, Kurunegala and Anuradhapura districts of Sri Lanka. Leaf samples were stored at -80 °C until the extraction of DNA.

DNA extraction
The DNeasy Plant mini kit (Qiagen, Valencia, California, USA) was used to extract DNA from 0.1 g of leaf tissue finely powdered in liquid nitrogen, following the manufacturer's instructions. Genomic DNA was visualized by 1% agarose gel electrophoresis and quantified by UV spectroscopy (Cintra10e, GBC, Australia). Absorbance was measured at 260 nm to determine the DNA concentration (Sambrook and Russell, 2001).

Polymerase Chain Reaction (PCR)

PCR for diversity analysis
Six SSR markers (Pandey and Changtragoon, 2012) and one SCAR marker (Dnyaneshwar et al., 2006) developed for P. emblica were used for PCR amplification (Table 1). Initial screening for successful PCR amplification was carried out with 12 randomly selected DNA samples from P. emblica. DNA amplification was performed in 15 μl reactions containing 1× GoTaq® Green Master Mix (Promega Corporation, Madison, Wisconsin, USA), 0.5 μM each of forward and reverse primer and 1 μl of DNA template (60 ng/μl). Amplifications were carried out in a thermal cycler (Takara, Otsu, Shiga, Japan) using the following PCR cycle: initial denaturation for 5 minutes at 94 °C; 35 cycles of 45 seconds at 94 °C, 1.30 minutes at the annealing temperature (see Table 1 for the annealing temperature of each primer pair) and 2 minutes at 72 °C; and a final extension step of 10 minutes at 72 °C. PCR products were visualized in an ethidium bromide stained 1% agarose gel after electrophoresis.
After PCR amplification was confirmed, all the DNA samples that amplified successfully were screened with all seven primer pairs.

PCR for DNA barcoding
The matK forward and reverse primers were used for PCR amplification of four DNA samples from individuals belonging to four different clusters of the morphological similarity dendrogram explained in Mawalagedera et al. (2014). Amplifications were carried out in a thermal cycler (Takara, Otsu, Shiga, Japan) using the following PCR cycle: initial denaturation for 10 minutes at 94 °C; 35 cycles of 20 seconds at 94 °C, 20 seconds at the annealing temperature of 48 °C and 2 minutes at 72 °C; and a final extension step of 10 minutes at 72 °C. PCR products were visualized in an ethidium bromide stained 1% agarose gel after electrophoresis.

DNA sequencing
The matK PCR amplicons of the four DNA samples, from individuals belonging to the four different clusters explained in Mawalagedera et al. (2014), were sequenced by Sanger sequencing on an ABI 3500 series Genetic Analyzer (Applied Biosystems®).

Denaturing Polyacrylamide Gel Electrophoresis
The PCR products of the SSR markers of the DNA samples of 66 P. emblica trees were then size separated by using 6% denaturing polyacrylamide gel electrophoresis.

Data analysis

Allele Diversity
Different alleles for each marker were visually identified based on the length of the fragment (bp), and a binary scoring matrix (1 = allele present, 0 = allele absent) was built for all the markers amplified in the 66 P. emblica trees. For each allele, allele frequency (Pi), heterozygosity (H) and polymorphic information content (PIC) were calculated. H (Botstein et al., 1980) and PIC (Shete et al., 2000) were calculated from the allele frequencies, where n is the number of alleles and Pi is the frequency of the i-th allele. Using the allele frequencies, unique alleles (UA), rare alleles (RA) and frequent alleles (FA) were identified.
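The two diversity statistics can be sketched in a few lines of Python. The sketch assumes the widely used forms H = 1 - ΣPi² and PIC = 1 - ΣPi² - ΣΣ 2Pi²Pj² (summed over all allele pairs i < j); the exact expressions used in this study are not reproduced in the text, so these forms, the function names and the example counts are assumptions made for illustration only.

```python
from itertools import combinations


def allele_frequencies(counts):
    """Convert raw allele counts at one locus into frequencies that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def heterozygosity(freqs):
    """Expected heterozygosity, assumed form: H = 1 - sum(Pi^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC, assumed form: 1 - sum(Pi^2) - sum over i<j of 2 * Pi^2 * Pj^2."""
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return heterozygosity(freqs) - cross


# Hypothetical locus with four equally common alleles (not data from this study).
freqs = allele_frequencies([10, 10, 10, 10])
h_example = heterozygosity(freqs)
pic_example = pic(freqs)
print(f"H = {h_example:.4f}, PIC = {pic_example:.4f}")
```

Both statistics rise as the number of alleles increases and as their frequencies become more even, which is why a marker with many alleles of similar frequency is the most informative.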

Cluster analysis
Cluster analysis was performed for the 66 trees using all the marker allele data in the binary data matrix, with McQuitty linkage and Euclidean distance, in the statistical package Minitab 14.
The 66 trees were given the same tree IDs as in the dendrogram based on drupe morphometric traits explained in Mawalagedera et al. (2014).
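The clustering step can be illustrated with a small pure-Python sketch. McQuitty linkage is also known as WPGMA: when two clusters merge, the distance from the new cluster to any other cluster is the simple average of the two previous distances. The function names and the four-tree binary matrix below are hypothetical, and the sketch does not reproduce Minitab's output or its similarity scaling.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two binary allele score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mcquitty_linkage(rows):
    """Agglomerative clustering with McQuitty (WPGMA) linkage.

    Returns the merges in order as (members_a, members_b, distance),
    where members are tuples of original row indices.
    """
    clusters = {i: (i,) for i in range(len(rows))}
    dist = {}
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            dist[(i, j)] = euclidean(rows[i], rows[j])
    merges = []
    next_id = len(rows)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        (a, b), d = min(dist.items(), key=lambda kv: kv[1])
        merges.append((clusters[a], clusters[b], d))
        # McQuitty update: average the distances to the two merged clusters.
        for k in clusters:
            if k in (a, b):
                continue
            d_ak = dist[(min(a, k), max(a, k))]
            d_bk = dist[(min(b, k), max(b, k))]
            dist[(k, next_id)] = (d_ak + d_bk) / 2.0
        # Drop every distance that involves the two merged clusters.
        dist = {p: v for p, v in dist.items() if a not in p and b not in p}
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


# Hypothetical binary scores: 4 trees (rows) by 4 alleles (columns).
trees = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 0, 1, 0]]
merges = mcquitty_linkage(trees)
for members_a, members_b, d in merges:
    print(members_a, members_b, round(d, 3))
```

The merge distances, read from the first merge to the last, give the heights at which branches join in the dendrogram.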

Association mapping for QTL analysis
To identify associations between marker genotype and trait value, a single marker analysis was conducted. Using the software MapQTL® (Van Ooijen, 2009), marker alleles that were significantly associated with drupe morphometric traits were identified based on the Kruskal-Wallis statistic (Kruskal and Wallis, 1952), which suits the unbalanced natural population.
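As an illustration of the single marker test, the sketch below computes the Kruskal-Wallis H statistic for one trait split by the presence or absence of one allele. It is not MapQTL's implementation; the function names and the data are hypothetical. With two groups, H is compared against a chi-square distribution with one degree of freedom, whose upper tail probability equals erfc(sqrt(H/2)).

```python
import math


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_allele(trait, allele_present):
    """Return (H, p) for trait values grouped by allele presence (1) or absence (0).

    Both groups must contain at least one tree and the trait must not be constant.
    """
    n = len(trait)
    ranks = average_ranks(trait)
    total = 0.0
    for state in (0, 1):
        group = [r for r, s in zip(ranks, allele_present) if s == state]
        total += sum(group) ** 2 / len(group)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    # Correct H for tied trait values.
    tie_sizes = {}
    for v in trait:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    h = max(h / correction, 0.0)
    # Chi-square upper tail with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2.0))
    return h, p


# Hypothetical drupe weights (g) and presence of one allele in eight trees.
weights = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
present = [0, 0, 0, 0, 1, 1, 1, 1]
h_stat, p_value = kruskal_wallis_allele(weights, present)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

Because the test works on ranks, it does not require the trait to be normally distributed or the two groups to be of equal size.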

DNA sequence analysis
Automatic multiple sequence alignment was carried out using the free software packages MEGA 5.2.2 (Tamura et al., 2011) and Clustal Omega.
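Once sequences are aligned, checking them for polymorphism reduces to comparing the alignment column by column. The sketch below is a minimal illustration that assumes the alignment has already been exported as equal-length strings; the function names and the short sequences are hypothetical and are not matK data from this study.

```python
def polymorphic_sites(aligned):
    """Return the 0-based alignment columns at which the sequences differ.

    Gap characters are treated like any other symbol.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i in range(length)
            if len({seq[i].upper() for seq in aligned}) > 1]


def percent_identity(a, b):
    """Percentage of aligned positions at which two sequences are identical."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


# Hypothetical aligned fragments.
seqs = ["ATGGCA", "ATGGCA", "ATGACA"]
sites = polymorphic_sites(seqs)
identity_01 = percent_identity(seqs[0], seqs[1])
identity_02 = percent_identity(seqs[0], seqs[2])
print(sites, identity_01, round(identity_02, 2))
```

An empty list of polymorphic sites corresponds to sequences that are identical across the aligned region.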

RESULTS
Allele composition for the seven DNA markers in P. emblica
The PCR performed with DNA extracted from 66 P. emblica trees using seven species-specific primer pairs amplified a total of 51 alleles (Table 2). Out of the six microsatellite markers, four exhibited polymorphism, one was monomorphic and the other did not amplify successfully. The SCAR marker was also monomorphic across the entire germplasm tested. Heterozygosity of the tested markers ranged from 0.44 to 0.89. The microsatellite marker Phyll53 showed the lowest heterozygosity while Phyll112 showed the highest (Figure 1). Polymorphic information content (PIC) was also high (0.88) in Phyll112, making it the most informative marker for the assessment of genetic diversity in the P. emblica germplasm in Sri Lanka. Of the 51 alleles detected, 26 were rare, found in less than 10% of the germplasm, 18 were frequent, and seven were unique, found in only a single individual (Table 2).
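The three allele classes can be derived directly from the binary scoring matrix. In the sketch below, an allele carried by exactly one tree is called unique, an allele carried by more than one tree but by less than 10% of trees is called rare, and all others are called frequent. Checking the unique class first so that the classes do not overlap is an assumption, as are the function name and the example data.

```python
def classify_alleles(matrix, rare_threshold=0.10):
    """Classify each allele as 'unique', 'rare' or 'frequent'.

    matrix maps an allele label to its list of 0/1 scores, one per tree.
    Every allele is assumed to have been observed in at least one tree.
    """
    classes = {}
    for allele, scores in matrix.items():
        carriers = sum(scores)
        share = carriers / len(scores)
        if carriers == 1:
            classes[allele] = "unique"
        elif share < rare_threshold:
            classes[allele] = "rare"
        else:
            classes[allele] = "frequent"
    return classes


# Hypothetical scores for three alleles across 30 trees.
example = {
    "allele_A": [1] + [0] * 29,        # carried by a single tree
    "allele_B": [1, 1] + [0] * 28,     # carried by 2 of 30 trees (6.7%)
    "allele_C": [1] * 15 + [0] * 15,   # carried by half of the trees
}
allele_classes = classify_alleles(example)
print(allele_classes)
```

With 66 trees, the 10% threshold means that an allele carried by two to six trees falls in the rare class under this rule.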

Cluster analysis for DNA marker alleles
The dendrogram based on the alleles of the seven DNA markers showed two major clusters at a molecular similarity coefficient of 14.67% (Figure 2). Cluster two included 70% of the germplasm assayed, while cluster one included only 30%. Cluster two showed a higher number of subdivisions than cluster one. The two clusters did not show any bias or specificity in relation to sampling location.
This dendrogram based on DNA marker alleles did not align with the dendrogram constructed by Mawalagedera et al. (2014) using drupe morphometric traits, in which four clusters were detected based on drupe height, width, weight and mesocarp thickness. The trees showed a random distribution regardless of their drupe morphometric traits. Yet a high diversity was visible in the dendrogram, with no two trees being one hundred percent similar.
Comparing the presence and absence of alleles of the polymorphic microsatellite markers against the dendrogram based on drupe morphometric data (Mawalagedera et al., 2014) revealed that four alleles of the marker Phyll68 (366 bp, 365 bp, 364 bp and 363 bp) were unique to cluster three, which bears the smallest drupes.

Association mapping for QTLs associated with drupe traits
Single marker QTL analysis was conducted to detect marker alleles that are significantly associated with the commercially important drupe traits. The analyses revealed that, out of 51 alleles, 15 were significantly associated with drupe weight, width, height and mesocarp thickness (Appendix 1).
Of these alleles, the 176 bp allele of Phyll112 (Table 3) showed the most significant association (P<0.0001) with the commercially important traits drupe height, width, weight and mesocarp thickness. Furthermore, the 363 bp allele of Phyll68 also showed a significant association (P<0.001) with drupe height. In contrast, the stone traits were significantly associated with a different set of marker alleles from the drupe traits, including the 150 bp allele of Phyll31, and the 368 bp, 174 bp and 172 bp alleles of Phyll68.

Inferences on ploidy level of P. emblica
The most informative DNA markers, Phyll31 and Phyll112, yielded more than two alleles per locus. For all four informative markers, the number of alleles produced in a single individual ranged from one to nine. Thus, these markers resulted in multiple alleles per individual in the P. emblica germplasm in Sri Lanka, implying the presence of polyploidy.
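The count behind this inference can be read off the binary scoring matrix by summing, for each tree, the alleles scored as present at a locus. The sketch below is a minimal illustration with hypothetical data and function names; a count above two only flags a locus for closer inspection and is not by itself proof of polyploidy.

```python
def alleles_per_tree(scores_by_locus):
    """Count the alleles present in each tree at each locus.

    scores_by_locus maps locus -> {allele size (bp) -> list of 0/1 per tree}.
    """
    counts = {}
    for locus, alleles in scores_by_locus.items():
        columns = list(alleles.values())
        n_trees = len(columns[0])
        counts[locus] = [sum(col[t] for col in columns) for t in range(n_trees)]
    return counts


def loci_exceeding_diploid(counts):
    """Return the loci at which some tree carries more than two alleles."""
    return {locus: max(c) for locus, c in counts.items() if max(c) > 2}


# Hypothetical scores for three trees at two loci (allele sizes are illustrative).
scores = {
    "Phyll112": {176: [1, 1, 0], 178: [1, 0, 1], 180: [1, 0, 0]},
    "Phyll53": {150: [1, 1, 1], 152: [0, 1, 0]},
}
per_tree = alleles_per_tree(scores)
flagged = loci_exceeding_diploid(per_tree)
print(per_tree, flagged)
```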

DNA barcoding
The matK DNA sequences of the four DNA samples, from individuals belonging to four different clusters based on fruit size explained in Mawalagedera et al. (2014), showed no polymorphism, indicating that fruit size differences are not caused by species level differences (Appendix 2).

DISCUSSION
The molecular characterization of the P. emblica germplasm revealed that there is high genomic diversity within the species. Though phenotypic traits are commonly used to characterize this diversity, changes in the phenotype due to environmental factors alone can mislead diversity analyses (Ganopoulos et al., 2011). Hence molecular genotyping provides a more accurate insight for identifying and establishing the diversity structure of the P. emblica germplasm. Microsatellite markers, Random Amplified Polymorphic DNA (RAPD), Restriction Fragment Length Polymorphism (RFLP), Amplified Fragment Length Polymorphism (AFLP) and Single Nucleotide Polymorphism (SNP) are among the most widely used DNA markers for characterizing germplasm at the DNA level. Microsatellite markers for P. emblica were developed with the aim of using them in diversity and population genetic studies (Pandey and Changtragoon, 2012). Genetic diversity studies on sweet cherry (Ganopoulos et al., 2011), peach (Fernandez et al., 2012), wild rice varieties (Feng et al., 2006) and bamboo (Sharma et al., 2008b) are some examples of the use of SSR markers for genetic diversity assessment in plants.
The molecular diversity analysis of P. emblica required intact, high quality genomic DNA. P. emblica leaves are rich in phenolic compounds and other secondary metabolites, which reduce the pH during the CTAB protocol of DNA extraction, degrading the DNA and yielding a low quality product (Nagarajan et al., 2011). The early attempts to extract DNA from P. emblica leaves using the CTAB method in the present study also had little success. Therefore, DNA extraction was carried out using the DNeasy Plant mini kit (Qiagen, Valencia, California, USA). For resolving the PCR products of the amplified P. emblica DNA, 6% denaturing polyacrylamide gel electrophoresis was used because it provides high resolution for DNA of 40 bp to 400 bp, while a 1% agarose gel could only resolve DNA of 500 bp to 5000 bp (Barril and Nates, 2012). The actual DNA product size of the microsatellite markers fell between 100 bp and 200 bp (Pandey and Changtragoon, 2012). Furthermore, denaturing polyacrylamide gel electrophoresis provides a stable, chemically cross-linked gel that gives sharper bands even for low molecular weight DNA (Sambrook and Russell, 2001).
The microsatellite markers (Pandey and Changtragoon, 2012) designed for the Indian P. emblica germplasm resulted in successful amplification of the Sri Lankan P. emblica genomic DNA. This shows that the P. emblica germplasm in these two locations is evolutionarily related. Yet it does not provide evidence about the origin of this species. The microsatellite marker Phyll53, which was monomorphic in the Indian germplasm (Pandey and Changtragoon, 2012), showed a heterozygosity of 0.44 in the Sri Lankan germplasm. The marker Phyll7, which was monomorphic in the Sri Lankan P. emblica germplasm, showed a heterozygosity of 0.75 in the Indian germplasm. The highest heterozygosity was observed with the Phyll112 marker in both germplasm sources. However, the observed heterozygosity was much higher in the Sri Lankan germplasm (H = 0.89) than in the Indian germplasm (H = 0.76), suggesting that the Sri Lankan P. emblica germplasm is more genetically diverse.
The cluster analysis based on molecular data did not support the clustering based on drupe morphometric data (Mawalagedera et al., 2014). A similar clustering might have been obtained if the DNA markers used were associated with the genomic regions of P. emblica that determine drupe size. In one such study in melon, RAPD markers linked with the QTL determining fruit weight and width were identified (Park et al., 2005). Similarly in tomato, a locus on chromosome 2, fw2.2, was found to be responsible for controlling fruit size (Frary et al., 2000). In the case of P. emblica, out of all the alleles, four were unique to the smallest drupe-bearing cluster of the drupe morphological dendrogram. This highlights the unique genomic composition of the small drupe bearing P. emblica trees and the need for conserving this portion of the germplasm in Sri Lanka. If the culling of the small drupe bearing trees continues, the population would shift towards trees bearing larger drupes, as in the case of tomato, where growers' selection of larger fruits over decades caused the fruit to become progressively larger (Tanksley, 2004), and an important portion of the germplasm would ultimately be lost forever.
Through association mapping, putative QTL associated with important drupe traits were detected. Association mapping was used to avoid the limitations of pedigree based QTL mapping (Khan and Korban, 2012) because the sampled P. emblica germplasm was a random natural population. Single marker analysis revealed that 15 alleles were significantly associated with drupe weight, width, height and mesocarp thickness (Appendix 1). The 176 bp allele of the Phyll112 locus was significantly associated with all four commercially important drupe traits (drupe height, width, weight and mesocarp thickness). Based on this, it is likely that the same genomic region controls all four phenotypically related drupe traits. Hence, this marker could be important in marker-assisted selection for drupe size and flesh thickness.
In a diploid species, a maximum of two alleles is found at a locus in an individual. However, some individuals in the sampled germplasm exhibited as many as nine alleles at a single locus, providing a hint of polyploidy. Previous studies indicated that the genus Phyllanthus is predominantly polyploid (X = 13) and that P. emblica might be octoploid (Webster and Ellis, 1962). Yet it has been found that microsatellite markers are evolutionarily diverse. These microsatellite regions undergo mutations such as DNA slippage and mutations that decrease the length of pure repetitive DNA sequences. Furthermore, it has been found that within-locus polymorphism is positively correlated with allele length due to contraction mutations in the locus (Ovidiu and Hörandl, 2006). Therefore, it is premature to make inferences on the ploidy level of P. emblica based on the number of alleles at a locus without cytogenetic analysis.
The matK DNA barcode could not distinguish between the four clusters based on drupe morphological similarity explained by Mawalagedera et al. (2014). Since the matK barcode sequences were 100% similar, it can be concluded that the variation in drupe size is not due to species level differences within the P. emblica germplasm in Sri Lanka. Yet a more accurate insight could be obtained through detailed barcoding based on several barcoding genes.
In summary, these results showed that the P. emblica germplasm in Sri Lanka exhibits high genomic diversity, which can be exploited to develop superior varieties using the putative QTL. Further studies are required to assess the cytogenetic features of the P. emblica germplasm in Sri Lanka.

Mohapatra, T. and Ahuja, P. S. (2008b). Evaluation of rice and sugarcane SSR markers for phylogenetic and genetic diversity analyses in bamboo. Genome 51: 91-103.
Shete, S., Tiwari, H. and Elston, R. C. (2000). On estimating the heterozygosity and polymorphism information content value. Theoretical Population Biology 57: 265-271.
Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M. and Kumar, S. (2011)