Volume 17 // Number 3 // Article 1
Comparison of Sampling Strategies to Evaluate Rate of Transgenic Adventitious Presence in Agricultural Fields
Rémi Bancal
INRA, Unité Mixte de Recherche (UMR) 211 INRA AgroParisTech Grignon
Arnaud Bensadoun
INRA, Unité de Recherche (UR) 341 Mathématiques et Informatique Appliquées—Jouy
Antoine Messéan
INRA, UR 1240 EcoInnov Grignon
Hervé Monod
INRA, Unité de Recherche (UR) 341 Mathématiques et Informatique Appliquées—Jouy
David Makowski
INRA, UMR 211 INRA AgroParisTech Grignon
Methods have been developed to detect transgenic presence in non-GM maize fields. These detection methods may be used to determine whether the regulatory transgenic rate threshold (0.9%) is exceeded, but the results are likely to depend on the grain sample size and on the sampling strategy used to collect grains within agricultural fields. Until now, no clear sampling strategy or sample size has been defined for implementing detection methods.
This study aims to compare four types of sampling strategies for maize grains in agricultural fields: i) random sampling, ii) systematic sampling, iii) stratified sampling, and iv) regression sampling. The first approach simply samples maize ears at random in the field considered. The second approach consists of selecting ears according to a regular grid. The last two approaches use an auxiliary variable correlated with the real transgene distribution, either to define strata with contrasting presence rates or to reweight a sample of ears selected at random. The auxiliary variable considered in this study is the output of a gene-flow model simulating cross-pollination as a function of wind speed, wind direction, and distance to the closest GM maize field. Data collected in the Montargis (France) experiments in 1998 and 1999 were used to compare the four sampling strategies and to determine the sample size (i.e., number of ears) required to detect transgene presence with a good level of accuracy. Results showed that a sample of 2,000 ears is needed to reach a sensitivity or a specificity of 0.95 with random sampling when the true presence rate differs by 0.2% from the regulatory threshold of 0.9%. We showed that this sample size could be strongly reduced (down to 25 to 100 ears, depending on the site-year) by using stratified sampling. Regression led to intermediate sample sizes, and systematic sampling was found to be very sensitive to the position of the first sampled plant.
Key words: detection, gene-flow model, maize, stratified sampling.
Introduction
Since the introduction of genetically modified (GM) crops decades ago, the coexistence of GM and conventional crops has been an issue because of gene flow between fields during pollination. In the case of maize, the European Union defined a threshold of 0.9% of GM in non-GM crops in order to ensure the coexistence of the two types of crops; since then, a maize crop is classified as non-GM if its GM content is lower than 0.9%. It would therefore be useful to evaluate the rate of transgene in a conventional maize field by sampling some ears in the field before harvest. Polymerase chain reaction (PCR) methods can be used to identify adventitious presence of GM. However, since these methods are costly, the sampling must be cost-efficient.
The sampling methods that are currently used are variants of systematic sampling, where samples are selected regularly on a grid dividing the field. For instance, transect sampling (where samples are placed regularly on convergent axes) was used in the study by Colbach, Clermont-Dauphin, and Meynard (2001). Messeguer et al. (2006) proposed a method where the field is divided into quadrangles and the corners of these quadrangles define the sample points. These methods aim to cover the field as well as possible without being too regular (a source of bias). However, they do not take into account prior information about gene-flow characteristics and the location of GM presence or absence in the fields. It has been shown that gene flow is influenced by wind direction, wind speed, distance between fields, and other factors. This information could be used to design efficient sampling strategies by placing the sampling sites in locations with contrasting levels of risk of GM presence.
Among the research done on coexistence, semi-mechanistic gene-flow models (Angevin et al., 2008) were developed to infer the rate of transgene in conventional fields at the field and landscape scales. Those models can predict the quantity of transgenic grains in each ear of the conventional field, so their output can be used as an auxiliary variable at the ear scale. In this article, we compare several sampling methods that come from survey theory and rely on auxiliary variables. In this study, the auxiliary variables are outputs of gene-flow models. Sampling methods based on auxiliary variables are compared to standard sampling methods using real data.
Data of Transgene Dispersion
In this study, sampling methods were tested using data from three site-years (Figure 1): the Montargis experiments (1998, 1999) described in Klein, Lavigne, Foueillassar, Gouyon, and Laredo (2003) and the Mas Cebria experiment (2004) described in Palaudelmàs et al. (2012). These experiments provide different coexistence situations, since the average transgene ratios differed markedly among the three site-years (much lower than, close to, or much higher than the 0.9% threshold). In each case, the receptor field and the emitter field were sown with cultivars of different grain colors. In the Montargis experiment, a blue-grain maize (cultivar Adonis) was used as emitter and a yellow-grain maize (cultivar Adonis) as receptor. Adonis was used as a proxy for GM maize; non-GM maize grains pollinated by Adonis pollen show a blue coloration, which is dominant in heterozygous grains. In the Mas Cebria experiment, a transgenic yellow-grain (dominant color) maize was used as the emitter and a white-grain maize as the receptor.
Figure 1. Characteristics of the three site-years used to compare sampling methods.
Montargis 1998 Trial
The experimental design consisted of a plot of 120 m × 120 m sown with blue grains in the central part of a yellow-grain field. Ears were sampled on a regular grid of 1.6 m × 2 m (10% sampling ratio) up to 20 m from the blue plot and of 2.4 m × 4 m farther away (3.3% sampling ratio). The final sample size was 2,937 ears, with an average transgene ratio of 1.12% of blue grains in the yellow field. The total number of grains on each ear, which is needed to calculate the rate of transgene rather than the number of transgenic grains, was not measured; an estimate of 394 grains per ear was used instead (Klein et al., 2003).
Montargis 1999 Trial
The experimental design consisted of a plot of 180 m × 145 m sown with blue grains, surrounded by bare soil and then by a yellow-grain field. Neither the blue-grain plot nor the bare-soil ring was in the central part of the field; both were offset against the prevailing winds. The sampling grid was 0.8 m × 4 m (10% sampling ratio) up to 20 m from the blue plot and 1.6 m × 4 m farther away (5% sampling ratio). The final sample size was 4,430 ears, with an average transgene ratio of 0.36% of blue grains in the yellow field. We again used 394 grains as the estimate of the average total number of grains on each ear.
Mas Cebria 2004 Trial
The experimental design consisted of a central plot sown with four different hybrids of the GM cultivar MON810, all of them exhibiting yellow grains. The central plot was surrounded by a field sown with a conventional cultivar exhibiting white grains. The sampling procedure described in Palaudelmàs et al. (2012) defined 708 sampling points, with three ears collected at each point. The average transgene ratio was 1.90% of yellow grains in the white field. In this experiment, yellow kernels and the total number of grains were counted for each ear.
Gene-dispersion Model
Some of the sampling methods tested in this article require a covariate independent of the real data.
A semi-mechanistic model, developed by Arnaud Bensadoun and presented at the GMCC 2013 conference, was used to predict the transgene presence rate at the ear level (Bensadoun et al., 2013). This output variable was used as an auxiliary variable to design some of our sampling strategies. The model includes three inputs: the wind direction and wind speed during the flowering period, and the spatial configuration of the fields. The model computes an effective distance r^{*} as
where ω is the angle between the receptor point and the emitter point, θ is the wind force, ω_{0} is the wind direction, and r is the distance between the emitter and the receptor. A dispersion function ϒ(r^{*}) was computed as
where a, b, c, and D are the parameters of the model that need to be fitted. Given the dispersion function, the probability μ that a GM pollen grain pollinates the receptor was
where K is the total number of grains on the receptor ear. This model considers only one GM emitter (i.e., the closest emitter to the receptor) and the number of transgenic grains on the receptor ear was calculated as
where ZIP is a zero-inflated Poisson distribution, so that y takes the value 0 with probability p (another parameter of the model to be inferred) and takes a value drawn from a Poisson distribution of parameter μ with probability 1 − p. The parameters (a, b, c, D, p) of the model were fitted using independent data as follows. Parameters to be used with the Montargis 1998 data were fitted to the Montargis 1999 data, while both Montargis 1999 and Mas Cebria 2004 used a parameter set obtained from Montargis 1998. For each dataset, we ran the model 2,000 times and used the mean of the output as an auxiliary variable for sampling design. The correlation coefficient between the real data and the dispersion model output reached 0.751 for Montargis 1998, 0.671 for Montargis 1999, and 0.701 for Mas Cebria 2004.
Sampling Methods
The objective is to estimate the average transgene rate Ῡ of a field U = {k ∈ ⟦1,N⟧} containing N ears, using a sample t = {t_{1},…,t_{n}} of n ears selected in U.
Geometric Methods
These methods take into account the spatial configuration of the receptor field, but they do not need any quantifiable prior information (see below). They generally focus on minimizing the maximum distance between two samples in order to provide the best possible exploration of the field.
Systematic Sampling
This sampling method consists of dividing the field into equal squares and selecting one sample per square. The first sample point is chosen randomly in the first square, and the other samples are taken at the same relative position in the other squares.
Messeguer et al. Sampling
A variant of grid sampling was proposed by Messeguer et al. (2006) and is now widely used for sampling in coexistence scenarios. It considers that most of the GM contamination occurs near the boundaries of the receptor field, so the samples are preferentially selected in these zones. Transects are traced between the points located at ⅓ and ⅔ of the length of each side, measured from the corner.
Samples are selected at 0 m, 3 m, and 10 m from each boundary and at the intersections of these transects. The transgene rate of an area delimited by four sample points is estimated by the mean of the transgene rates measured at these four points. The global transgene rate of the field is then the sum of the rates obtained for all areas, each weighted by its share of the field surface. The emitter GM field in each of our datasets is located in the center of the receptor field, so this sampling method is not well suited to this coexistence situation.
Numeric Methods
Sampling at Random
We suppose here that no prior information is available. Samples are selected at random without replacement in the whole field, and the transgene rate is estimated by the sample mean,
Ŷ_{SRS} = ȳ_{t} = (1/n) Σ_{k∈t} y_{k}.
Its variance is
var(Ŷ_{SRS}) = (1 − n/N) S_{y}^{2}/n,
where S_{y}^{2} = (1/(N − 1)) Σ_{k∈U} (y_{k} − Ῡ)^{2} is the dispersion of y over the entire field. This method is used as a reference.
Reweighting of a Random Sampling
This method is based on the use of an auxiliary variable x whose value is known over the entire field (here, the output of the gene-flow model). In particular, the mean of x over the whole field U is X̄ = (1/N) Σ_{k∈U} x_{k}, its dispersion is S_{x}^{2} = (1/(N − 1)) Σ_{k∈U} (x_{k} − X̄)^{2}, and its mean on sample t is x̄_{t} = (1/n) Σ_{k∈t} x_{k}. A random sample can be reweighted to increase the accuracy of the estimator of Ῡ. Several reweighting methods can be used.
Ratio Reweighting
If the gene-flow model predicts the areas without contamination (where y is null) reasonably well, we can suppose that x is proportional to y. The coefficient of proportionality R = Ῡ/X̄ can be estimated by R̂ = ȳ_{t}/x̄_{t}. Then the estimator of Ῡ, for a sample selected at random, is
Ŷ_{Ratio} = X̄ ȳ_{t}/x̄_{t}.
Its variance is approximately
var(Ŷ_{Ratio}) ≈ ((1 − n/N)/n) (S_{y}^{2} − 2RS_{xy} + R^{2}S_{x}^{2}),
where S_{xy} = (1/(N − 1)) Σ_{k∈U} (x_{k} − X̄)(y_{k} − Ῡ). When R^{2}S_{x}^{2} − 2RS_{xy} < 0, this variance is bounded above by var(Ŷ_{SRS}). The condition is satisfied when
ρ > CV(x)/(2 CV(y)),
where ρ = S_{xy}/(S_{x}S_{y}), CV(x) = S_{x}/X̄, and CV(y) = S_{y}/Ῡ, i.e., when the correlation coefficient ρ is high enough. So if the gene-flow model output is sufficiently correlated with the real data, ratio reweighting increases the accuracy of sampling at random.
Regression Reweighting
We consider a hypothesis similar to the one that leads to ratio reweighting; here we suppose that x and y are linked by a linear relationship. This model can be seen as a generalization of the previous one, since it only adds an intercept. We denote by α and β the coefficients such that ∀k ∈ U, y_{k} ≅ α + βx_{k}. Under this hypothesis, Ŷ_{Reg}, the estimator of Ῡ, is defined as
Ŷ_{Reg} = ȳ_{t} + β̂ (X̄ − x̄_{t}),
where β̂ is the least-squares slope of y on x computed from the sample.
Its variance is approximately
var(Ŷ_{Reg}) ≈ ((1 − n/N)/n) S_{y}^{2} (1 − ρ^{2}).
The variance of Ŷ_{Reg} is never larger than the variance of Ŷ_{SRS}, since ρ^{2} ≥ 0, and it converges toward var(Ŷ_{SRS}) when the correlation coefficient ρ converges toward 0.
Ratio or regression estimator: which is better? When R^{2}S_{x}^{2} − 2RS_{xy} + S_{xy}^{2}/S_{x}^{2} > 0, we have var(Ŷ_{Ratio}) − var(Ŷ_{Reg}) > 0. The left-hand side equals (RS_{x} − S_{xy}/S_{x})^{2}, which is never negative, so the regression estimator Ŷ_{Reg} is always at least as precise as Ŷ_{Ratio}.
Stratified Sampling
The principle of this method is to create strata that split the population into subpopulations that are as homogeneous as possible for x, on the assumption that those subpopulations will be homogeneous for y as well. Let H denote the number of strata, N_{h} the size of the subpopulation of stratum h, n_{h} the number of samples selected in stratum h (Σ_{h} N_{h} = N and Σ_{h} n_{h} = n), and Ῡ_{h} the mean of y on this subpopulation. Then the Horvitz-Thompson estimator of Ῡ for stratified sampling is
Ŷ_{Strat} = Σ_{h=1}^{H} (N_{h}/N) ȳ_{h},
where ȳ_{h} is the mean of y over the n_{h} ears sampled in stratum h.
It is unbiased. Its variance is
var(Ŷ_{Strat}) = Σ_{h=1}^{H} (N_{h}/N)^{2} (1 − n_{h}/N_{h}) S_{yh}^{2}/n_{h},
where S_{yh}^{2} is the dispersion of y within stratum h.
There are two ways to reduce this variance. The first is to minimize the within-stratum variances S_{yh}^{2}, which can be done by building strata that follow the distribution of y: the h^{th} stratum is the subpopulation of U that contains the elements i for which q_{k_{h−1}}(y) < y_{i} ≤ q_{k_{h}}(y), where q_{k}(y) is the k% quantile of the empirical distribution of y and k_{0} = 0 < k_{1} < … < k_{H} = 100 are the stratum bounds. Since we do not know the distribution of y, we use the quantiles of x to create our strata. The second is to distribute the total number of samples n among the H strata (Neyman optimal allocation) according to the following rule:
n_{h} = n N_{h}S_{yh} / Σ_{j=1}^{H} N_{j}S_{yj}.
With this allocation, the strata with the highest dispersion are the most sampled. As S_{yh} is unknown, we use the within-stratum dispersion S_{xh} of x instead:
n_{h} = n N_{h}S_{xh} / Σ_{j=1}^{H} N_{j}S_{xj}.
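The estimators described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the synthetic field (a gamma-distributed auxiliary variable with multiplicative noise), the use of equally spaced quantiles of x as stratum bounds, and the floor of two ears per stratum are all assumptions made for illustration.

```python
import numpy as np


def srs_estimate(y_s):
    """Simple random sampling: the sample mean of y."""
    return float(np.mean(y_s))


def ratio_estimate(y_s, x_s, x_bar_field):
    """Ratio reweighting: field mean of x times the sample ratio of means."""
    return float(x_bar_field * np.mean(y_s) / np.mean(x_s))


def regression_estimate(y_s, x_s, x_bar_field):
    """Regression reweighting: sample mean of y corrected by the fitted slope."""
    slope = np.cov(x_s, y_s, ddof=1)[0, 1] / np.var(x_s, ddof=1)
    return float(np.mean(y_s) + slope * (x_bar_field - np.mean(x_s)))


def make_strata(x, n_strata):
    """Label each ear 0..n_strata-1 using equally spaced quantiles of x (assumption)."""
    cuts = np.quantile(x, np.linspace(0.0, 1.0, n_strata + 1)[1:-1])
    return np.searchsorted(cuts, x, side="left")


def neyman_allocation(x, strata, n):
    """Allocate about n ears across strata in proportion to N_h * S_xh.

    Rounding and the floor of two ears per stratum (an assumption) mean the
    total can differ slightly from n.
    """
    labels = np.unique(strata)
    sizes = np.array([np.sum(strata == h) for h in labels])
    spreads = np.array([np.std(x[strata == h], ddof=1) for h in labels])
    raw = n * sizes * spreads / np.sum(sizes * spreads)
    n_h = np.clip(np.round(raw).astype(int), 2, sizes)
    return dict(zip(labels.tolist(), n_h.tolist()))


def stratified_estimate(y, strata, allocation, rng):
    """Draw n_h ears without replacement in each stratum and combine the
    stratum means with weights N_h / N."""
    estimate = 0.0
    for h, n_h in allocation.items():
        members = np.flatnonzero(strata == h)
        sampled = rng.choice(members, size=n_h, replace=False)
        estimate += len(members) / len(y) * np.mean(y[sampled])
    return float(estimate)


# Synthetic field (illustration only, not the trial data).
rng = np.random.default_rng(0)
N, n = 3000, 100
x = rng.gamma(shape=0.5, scale=0.02, size=N)                 # model-predicted rate per ear
y = np.clip(x * rng.lognormal(0.0, 0.5, size=N), 0.0, 1.0)   # "observed" rate per ear

sample = rng.choice(N, size=n, replace=False)
strata = make_strata(x, n_strata=5)
allocation = neyman_allocation(x, strata, n)

print("true mean     ", y.mean())
print("random        ", srs_estimate(y[sample]))
print("ratio         ", ratio_estimate(y[sample], x[sample], x.mean()))
print("regression    ", regression_estimate(y[sample], x[sample], x.mean()))
print("stratified (5)", stratified_estimate(y, strata, allocation, rng))
```

On such a skewed synthetic field, the Neyman rule concentrates most of the sample in the stratum with the highest predicted rates, which is where the dispersion of x is largest.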
Comparison of Sampling Methods Using the Data
Figure 2 shows the rate of wrong decisions for the three site-years (either false positive or false negative, depending on the site-year). Results are given as a function of the sample size. In most cases, the sampling methods based on the auxiliary variable performed better than the random method and the systematic method; they led to a lower rate of wrong decisions. In Montargis 1999, the stratified method based on five strata led to a false-positive rate below 5% when the sample size was larger than 25, and the regression-based method also gave very good results for small sample sizes. For this site-year, the best method was stratified sampling with 50 strata for sample sizes larger than 50. Stratified sampling with five strata led to low false-negative rates (below 5%) in Mas Cebria, even when only a few ears were collected in the field. The accuracy of all sampling methods was lower in Montargis 1998 because the rate of contamination was very close to the threshold of 0.9%. For this site-year, the stratified method based on 50 strata led to a false-negative rate below 5% when the sample size was larger than 100 ears. The other methods led to a false-negative rate above 5% even when the sample size was equal to 200 ears.
Figure 2. Rate of wrong decisions (false positive or false negative) obtained with different sampling methods in three site-years. The sample size is expressed in number of maize ears.
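A rate of wrong decisions of this kind can be approximated by repeated resampling, as in the Python sketch below. The decision rule is an assumption made for illustration: a decision is counted as wrong when the estimated rate falls on the other side of the 0.9% threshold than the true field rate. The function name, the synthetic field, and the number of repetitions are hypothetical, and only random sampling is shown.

```python
import numpy as np

THRESHOLD = 0.009  # regulatory threshold of 0.9%


def wrong_decision_rate(y, n, n_rep, rng, threshold=THRESHOLD):
    """Share of repeated random samples of n ears whose estimated rate falls
    on the other side of the threshold than the true field rate."""
    truth = np.mean(y) > threshold
    wrong = 0
    for _ in range(n_rep):
        sample = rng.choice(y, size=n, replace=False)
        wrong += int((np.mean(sample) > threshold) != truth)
    return wrong / n_rep


# Synthetic field (illustration only): 3,000 ears rescaled so that the true
# rate is 1.1%, close to the 0.9% threshold.
rng = np.random.default_rng(0)
y = rng.gamma(shape=0.5, scale=0.02, size=3000)
y *= 0.011 / y.mean()

for n in (25, 100, 500, 2000):
    print(n, wrong_decision_rate(y, n, n_rep=500, rng=rng))
```

The same loop could be run with the ratio, regression, or stratified estimators in place of the sample mean to compare methods at a given sample size.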
Overall, the sampling methods based on the auxiliary variable performed better than random sampling and systematic sampling. This is due to the relatively high correlations obtained between the observed rates of contamination and the simulated values (from 0.67 to 0.75). These results show that sampling methods based on an auxiliary variable are useful to reduce sample size and to improve the accuracy of GM detection.
Conclusion
Our results show the benefit of using the output of a gene-flow model as an auxiliary variable for estimating the transgene presence rate in an agricultural field. Sampling methods using the gene-flow model output performed better than simple random sampling in most of the situations considered. Regression reweighting, ratio reweighting, and stratified sampling systematically led to lower rates of misclassification for the three site-years considered. In practice, methods using gene-flow model output as an auxiliary variable can be used in different ways. Ratio and regression reweighting methods can be used to reweight ear samples. Stratified sampling based on an auxiliary variable allows one either to reduce the sample size needed to reach a given misclassification rate or to increase the accuracy of the transgene estimates for a given sample size.
References
Angevin, F., Klein, E.K., Choimet, C., Gauffreteau, A., Lavigne, C., Messéan, A., & Meynard, J.-M. (2008). Modelling impacts of cropping systems and climate on maize cross-pollination in agricultural landscapes: The MAPOD model. European Journal of Agronomy, 28, 471-484.
Bensadoun, A., Monod, H., Angevin, F., Makowski, D., & Messéan, A. (2013). Modeling of gene flow by a Bayesian approach: A new perspective for decision support. AgBioForum, 17(3), 213-220. Available on the World Wide Web: http://www.agbioforum.org.
Colbach, N., Clermont-Dauphin, C., & Meynard, J.-M. (2001). GENESYS: A model of the influence of cropping system on gene escape from herbicide tolerant rapeseed crops to rape volunteers. I. Temporal evolution of a population of rapeseed volunteers in a field. Agriculture, Ecosystems & Environment, 83, 235-253.
Klein, E.K., Lavigne, C., Foueillassar, X., Gouyon, P.-H., & Laredo, C. (2003). Corn pollen dispersal: Quasi-mechanistic models and field experiments. Ecological Monographs, 73, 131-150.
Messeguer, J., Penas, G., Ballester, J., Bas, M., Serra, J., Salvia, J., et al. (2006). Pollen-mediated gene flow in maize in real situations of coexistence. Plant Biotechnology Journal, 4, 633-645.
Palaudelmàs, M., Mele, E., Monfort, A., Serra, J., Salvia, J., & Messeguer, J. (2012). Assessment of the influence of field size on maize gene flow using SSR analysis. Transgenic Research, 21, 471-483.
This study was partially funded by the European Union project PRICE (PRactical Implementation of Coexistence in Europe), contract number 289157.
Suggested citation:
Bancal, R., Bensadoun, A., Messéan, A., Monod, H., & Makowski, D. (2014). Comparison of sampling strategies to evaluate rate of transgenic adventitious presence in agricultural fields. AgBioForum, 17(3), 166-171. Available on the World Wide Web: http://www.agbioforum.org.
