Chapter 12: Multi Environment Trials: Linear Mixed Models

William Beavis and Anthony Assibi Mahama

The general linear model (Equation 1) can be applied to replicated trial data for basic prediction purposes. It is, however, not adequate in all experimental situations, especially where trials are conducted in more than one environment. This chapter explores the role of mixed linear models, BLUEs (Best Linear Unbiased Estimates), and BLUPs (Best Linear Unbiased Predictors) in the analysis of multiple environment trial data to characterize and select among entries.

[latex]Y_{ij}=\mu+g_{i}+r_{j}+\varepsilon_{ij}[/latex]

[latex]\textrm{Equation 1}[/latex] General linear model for basic prediction of replicated trials.

where:
[latex]Y_{ij}[/latex] = the phenotypic response,
[latex]\mu[/latex] = population (or overall) mean,
[latex]g_{i}[/latex] = genotypic units,
[latex]r_{j}[/latex] = effect of replicate [latex]j[/latex],
[latex]\varepsilon_{ij}[/latex] = residual source of variability.

Learning Objectives
  • Conceptual basis of mixed linear models
  • Review matrix algebra
  • The meaning of BLUE and BLUP

Henderson’s Concept

C.R. Henderson recognized the challenge of prediction using models such as Equation 1 and addressed it using the concept of shrinkage estimators for the genotypic units in the model. The idea is analogous to a fitted regression line: the predictions are ‘shrunken’ to the line rather than scattered around it. Henderson’s idea, first published in 1963, was framed in matrix notation (the mixed model of Equation 4), but it can be explained using scalar algebra.

First, let us obtain the phenotypic average for each genotypic unit. Next, find the weight [latex]w_{i}[/latex] that minimizes [latex]E\left(w_{i}\left[\bar{Y}_{i}-\mu\right]-g_{i}\right)^2[/latex], where [latex]E[/latex] represents the expectation, [latex]\bar{Y}_{i}[/latex] is the average for genotypic unit [latex]i[/latex], [latex]\mu[/latex] is the population mean, and [latex]g_{i}[/latex] is the genotypic value from the scalar model (Equation 1). In other words, we need to find a value of [latex]w_{i}[/latex] that makes the expected squared difference as small as possible. A little knowledge of how to obtain partial derivatives provides the answer (Equation 2):

[latex]w_{i}=\dfrac{\sigma^2_g}{\sigma^2_g + \dfrac{\sigma^2_e}{r}}[/latex]

[latex]\textrm{Equation 2}[/latex] Formula for calculating intra-class correlation.

where:
[latex]w_i[/latex] = intra-class correlation coefficient or broad sense heritability,
[latex]\sigma^2_g[/latex] = genotypic variance,
[latex]\sigma^2_e[/latex] = residual (or error) variance,
[latex]r[/latex] = number of replications.
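
Filling in the intermediate step (under the usual assumption that the genotypic effects and residuals are independent, so that [latex]Var(\bar{Y}_{i})=\sigma^2_g+\sigma^2_e/r[/latex] and [latex]Cov(\bar{Y}_{i},g_{i})=\sigma^2_g[/latex]), the quantity to be minimized expands to a quadratic in [latex]w_{i}[/latex], and setting its derivative to zero recovers Equation 2:

[latex]E\left(w_{i}\left[\bar{Y}_{i}-\mu\right]-g_{i}\right)^2 = w_{i}^{2}\left(\sigma^2_g+\frac{\sigma^2_e}{r}\right)-2w_{i}\,\sigma^2_g+\sigma^2_g[/latex]

[latex]\frac{\partial}{\partial w_{i}}: \ 2w_{i}\left(\sigma^2_g+\frac{\sigma^2_e}{r}\right)-2\sigma^2_g=0 \quad\Rightarrow\quad w_{i}=\dfrac{\sigma^2_g}{\sigma^2_g+\dfrac{\sigma^2_e}{r}}[/latex]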

This is known as the intra-class correlation coefficient. It is also known as the broad-sense heritability on an entry-mean basis, but for now we will refer to it as a shrinkage factor. When [latex]w_i[/latex] is multiplied by [latex]\left(\bar{Y}_{i}-\mu\right)[/latex], it provides the Best Linear Unbiased Predictor of [latex]g_i[/latex]. Notice that the predicted genotypic values are shrunken toward the mean breeding value, which by definition is zero.
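
As a small illustration, here is a minimal Python sketch of the shrinkage calculation; the variance values and line means are hypothetical, chosen only to show the arithmetic.

import numpy as np

# Hypothetical values, chosen only to illustrate the arithmetic
sigma2_g = 25.0      # genotypic variance
sigma2_e = 100.0     # residual (error) variance
r = 4                # number of replications

# Shrinkage factor (Equation 2): intra-class correlation / broad-sense heritability
w = sigma2_g / (sigma2_g + sigma2_e / r)          # 25 / (25 + 25) = 0.5

line_means = np.array([6.0, 8.0, 10.0, 12.0])     # hypothetical phenotypic means
overall_mean = line_means.mean()                  # 9.0

# BLUP of each genotypic value: deviation from the overall mean, shrunken by w
u_hat = w * (line_means - overall_mean)
print(w, u_hat)                                   # 0.5 [-1.5 -0.5  0.5  1.5]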

Example Prediction 1

If the overall mean is the only fixed effect (one environment), all lines are unrelated to each other, and the data are balanced, then the predicted genotypic value is obtained using Equation 3:

[latex]\hat{u}_{j}=w\left( Y_{j}-\hat{Y} \right)[/latex]

[latex]\textrm{Equation 3}[/latex] Formula for calculating predicted genotypic value.

where:
[latex]\hat{u}_{j}[/latex] = predicted genotypic value of genotype j,
[latex]w[/latex] = the Shrinkage factor,
[latex]Y_{j}[/latex] = the phenotype of genotype j,
[latex]\hat{Y}[/latex] = the estimated overall mean phenotype.

If [latex]w[/latex] is equal to zero, [latex]\hat{u}_j[/latex] would be zero.

If [latex]w[/latex] is equal to one, [latex]\hat{u}_j[/latex] equals the phenotypic deviation [latex]\left( Y_{j}-\hat{Y} \right)[/latex].

Let us demonstrate this with a simple data set in which four unrelated lines (A, B, C, D) were evaluated for yield (t/ha) in hybrid combination with a single tester (Z) in single-rep tests at N environments (Table 1). For this simple example we are interested only in the impact of the number of environments (replicates) on [latex]w_i[/latex] and its subsequent impact on the predicted value for each [latex]g_i[/latex]. Also, assume that the residual variance [latex]\sigma^2_e = 40[/latex].

Summary Data

Table 1 Summary data of four inbreds evaluated in hybrid combination with one tester (Z) in single-rep tests at 10 or 2 environments.
Hybrid [latex]\bar Y_{i.}[/latex] [latex]\bar Y_{..}[/latex] [latex]\bar Y_{i.} - \bar Y_{..}[/latex] [latex]N_{1}[/latex] [latex]w_{i}(\bar Y_{i.}-\bar Y_{..})[/latex] [latex]N_{2}[/latex] [latex]w_{i}(\bar Y_{i.}-\bar Y_{..})[/latex]
AxZ 7 10 -3 10 -2.5 2 -1.5
BxZ 9 10 -1 10 -0.83 2 -0.5
CxZ 11 10 1 10 0.83 2 0.5
DxZ 13 10 3 10 2.5 2 1.5

Prove for yourself that the estimated [latex]\sigma^2_g = 20[/latex].
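
A quick numeric check of this exercise, sketched in Python; it simply inverts Equation 2 for [latex]\sigma^2_g[/latex] using the shrunken deviations reported in Table 1.

# Back-solve Equation 2 for the genotypic variance implied by Table 1
sigma2_e = 40.0                      # residual variance given in the example
w10 = 2.5 / 3.0                      # shrinkage factor implied by N = 10 (e.g., -2.5 / -3)
sigma2_g = w10 * (sigma2_e / 10) / (1 - w10)
print(sigma2_g)                      # 20.0

# Consistency check with N = 2: w drops to 0.5, giving -1.5 for the AxZ deviation of -3
w2 = sigma2_g / (sigma2_g + sigma2_e / 2)
print(w2, w2 * -3)                   # 0.5 -1.5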

Some things to notice from the table:

  • The data are from balanced trials, i.e., all genotypic units are evaluated in the same number of environments (either 2 or 10).
  • With a large N, the predicted values approach the observed differences from the overall mean.
  • For balanced trials, shrinkage does not change the relative ranking.

In essence the shrinkage predictor provides us with a value that not only includes the difference relative to the mean, but also weights it by our confidence in the magnitudes of the differences from the overall mean.

We need to consider how to obtain predictions for genotypic units in the more likely situations where not all genotypic units (lines, cultivars, hybrids) will be evaluated equally in all environments. Indeed, we now find it possible with marker technologies to predict the values of the genotypes before they have been grown.

Best Linear Unbiased Prediction

Henderson’s shrinkage predictor can now be considered in the context of the matrix form of the mixed model equation (Equation 4):

[latex]Y = X\beta + Zu + e[/latex]

[latex]\textrm{Equation 4}[/latex] Mixed model equation for predicting phenotype.

where:
[latex]Y[/latex] = a vector of observations (phenotypes),
[latex]X[/latex] = the design matrix for fixed effects,
[latex]\beta[/latex] = a vector of unknown fixed effects (to be estimated),
[latex]Z[/latex] = the design matrix for random effects,
[latex]u[/latex] = a vector of random effects (genotypic values to be predicted),
[latex]e[/latex] = a vector of residual errors (random effects).

The random effects are assumed to be distributed as [latex]u \sim MVN(0, AV_{A})[/latex] and [latex]e \sim MVN(0, RV_{R})[/latex].

Just as estimates for [latex]\beta[/latex] in a fixed-effects linear model can be found using the normal equations, the normal equations for the mixed model (Equation 5, often called the mixed model equations) can be used to find solutions for the fixed and random effects in Equation 4.

[latex]\small \begin{bmatrix} &\hat{\beta} \\ &\hat{\upsilon} \end{bmatrix} = \begin{bmatrix} &{X'R^{-1}X} &{X'R^{-1}Z} \\ &{Z'R^{-1}X} &{Z'R^{-1}Z + A^{-1}(V_R/V_A)} \end{bmatrix}^- \begin{bmatrix} &X'R^{-1}y \\ &Z'R^{-1}y \end{bmatrix}[/latex]

[latex]\textrm{Equation 5}[/latex] Normal equations in matrix notation.

where:
[latex]\hat{\beta}[/latex] = the estimated fixed effects,
[latex]\hat{\upsilon}[/latex] = the predicted random effects,
[latex]X[/latex] = the incidence matrix for fixed effects,
[latex]y[/latex] = a vector of observed phenotypes (e.g., yield),
[latex]A[/latex] = the additive relationship matrix,
[latex]R[/latex] = a diagonal matrix of residual weights (the residual covariance structure),
[latex]Z[/latex] = the incidence matrix for random effects,
[latex]V_{A}[/latex] = the additive genetic variance,
[latex]V_{R}[/latex] = the residual variance.
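
To make the block structure of Equation 5 concrete, here is a minimal numpy sketch of a generic solver; the function name and interface are illustrative (not part of the original text), and the variance ratio [latex]V_{R}/V_{A}[/latex] is assumed to be supplied.

import numpy as np

def solve_mme(X, Z, y, A, R, ratio):
    """Assemble and solve the mixed model equations (Equation 5).

    ratio is V_R / V_A, assumed known (in practice it is estimated).
    Returns (beta_hat, u_hat): BLUEs of fixed effects and BLUPs of random effects.
    """
    Rinv = np.linalg.inv(R)
    Ainv = np.linalg.inv(A)
    # Coefficient matrix, built block by block as in Equation 5
    C = np.block([
        [X.T @ Rinv @ X, X.T @ Rinv @ Z],
        [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ainv * ratio],
    ])
    rhs = np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y])
    sol = np.linalg.solve(C, rhs)   # use a generalized inverse if C is singular
    p = X.shape[1]                  # number of fixed effects
    return sol[:p], sol[p:]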

BLUEs and BLUPs

The values for [latex]\hat{\beta}[/latex] represent the Best Linear Unbiased Estimators (BLUEs) of the fixed effects, while the values for [latex]\hat{\upsilon}[/latex] represent the Best Linear Unbiased Predictors (BLUPs) of the random effects. It is important to remember that BLUEs and BLUPs are not methods; they are statistical properties of methods (there are many) that are capable of producing such values. These statistical properties include:

  • Best: the sampling variance of what is being estimated or predicted is minimized.
  • Linear: estimates or predictions are linear functions of the observations.
  • Unbiased: for BLUEs, the expected values of the estimates are equal to their true values; for BLUPs, the predictions sum to zero in expectation.
  • Estimators and Predictors: refer to algorithms that generate the estimated or predicted values.

For BLUEs, the effects are considered fixed. Examples include the overall mean, effects of different soil types, fertilizer treatments, etc. From a practical perspective, fixed effects do not have a covariance structure, and for this reason we often consider environments as fixed effects.

Effects of BLUPs

The effects predicted by BLUPs are considered random, and it is possible to define covariance structures associated with these effects. Examples include breeding values, dominance effects, tester effects, etc. The challenge in applying methods that produce BLUPs is that Equations 3 and 5 assume the variances and covariances are known. In truth, the variances of genetic and non-genetic random effects are not known; in practice we estimate them. Thus, all implementations of methods that provide BLUPs from mixed linear model equations provide only approximations of the unknown vector values.

Nonetheless, BLUP values are useful in practical plant breeding trials where designs are unbalanced. Indeed, a method that produces a BLUP value enables the estimation of genetic variances without having to resort to mating designs to obtain estimates of heritability. A typical trial will have different numbers of genotypic units from different families evaluated in different sets of environments, some replicated some not. BLUPs utilize covariance structures (covariances among genotypic units grown in the same sets of environments and covariances among relatives) to maximize information on the traits of interest. Thus, the true purpose of a plant breeding trial (to compare genotypes for purposes of selection), is enabled with the best possible values for comparison because BLUPs maximize the correlation between the true genotypic values and predicted values.
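
In practice, the unknown variance components are usually estimated by REML and the BLUPs are computed from those estimates. As one possible illustration (a sketch assuming Python with pandas and statsmodels; the file and column names are hypothetical), a model with environments fixed and lines random could be fit as follows. Note that this simple random-intercept model treats the lines as unrelated (A = I); incorporating a relationship matrix requires more specialized mixed-model software.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format trial data: one row per plot,
# with columns Yield, Env (environment), and Line (genotype)
df = pd.read_csv("trial_data.csv")

# Environments as fixed effects, lines as random (intercept) effects
model = smf.mixedlm("Yield ~ Env", data=df, groups=df["Line"])
fit = model.fit(reml=True)     # REML estimates of the variance components

print(fit.fe_params)           # BLUEs of the fixed effects
print(fit.cov_re)              # estimated genotypic (random-intercept) variance
print(fit.random_effects)      # BLUPs of the line effects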

Example

While Equation 5 initially appears to be daunting, with the little bit of matrix algebra introduced above you have the skills to do these analyses using EXCEL. For example, consider the simple data set in Table 2 (adapted from Chapter 11 of Bernardo, 2010).

Note: This is an example with a self-pollinated crop (barley); the data are line means, so the number of environments does not enter the model as a separate term but only as weights through the R matrix (see below).

Table 2 Sample data from Chapter 11 of Bernardo, 2010.
Environments No. of Env Line Yield
Low yield 18 1 4.45
Low yield 18 2 4.61
Low yield 18 4 5.27
High yield 9 2 5.00
High yield 9 3 5.82
High yield 9 4 5.79

We desire to translate this into the following model in Equation 6.

[latex]Y_{ijk} = \mu + G_i + E_j + GE_{ij} + \epsilon_{(ij)k}, \ i = 1, \cdots, g; \quad j =1, \cdots , e; \quad k =1, \cdots , n[/latex]

[latex]\textrm{Equation 6}[/latex] Linear model for genotype, environment, and genotype-by-environment effects in a multi-environment trial.

where:
[latex]Y_{ijk}[/latex] = the phenotypic response,
[latex]\mu[/latex] = population (or overall) mean,
[latex]G_{i}[/latex] = genotypic effect,
[latex]E_{j}[/latex] = environmental effect,
[latex]GE_{ij}[/latex] = genotypic by environment interaction effect,
[latex]\varepsilon_{(ij)k}[/latex] = residual source of variability.

In matrix notation, the data are represented in the model (Equation 4, [latex]y = X\beta + Zu + e[/latex]) as shown in Equation 7:

[latex]\small \begin{bmatrix} 4.45 \\ 4.61 \\ 5.27 \\ 5.00 \\ 5.82 \\ 5.79 \end{bmatrix} = \small \begin{bmatrix} 1 &0 \\ 1 &0 \\ 1 &0 \\ 0 &1 \\ 0 &1 \\ 0 &1\end{bmatrix}\small \begin{bmatrix} b_1\\b_2 \end{bmatrix} + \small \begin{bmatrix} 1 &0 &0 &0 \\ 0 &1 &0 &0 \\ 0 &0 &0&1 \\0 &1 &0 &0 \\ 0 &0 &1 &0 \\ 0 &0 &0 &1\end{bmatrix} \small \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4\end{bmatrix} + \small \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \\ e_6\end{bmatrix}[/latex]

[latex]\textrm{Equation 7}[/latex] Matrix notation of Equation 4 and Table 2 data.

Linear Mixed Model Solution

The LMM solution is represented in Equation 8:

[latex]\small \begin{bmatrix} &\hat{\beta} \\ &\hat{\upsilon} \end{bmatrix} = \begin{bmatrix} &{X'R^{-1}X} &{X'R^{-1}Z} \\ &{Z'R^{-1}X} &{Z'R^{-1}Z + A^{-1}(V_R/V_A)} \end{bmatrix}^- \begin{bmatrix} &X'R^{-1}y \\ &Z'R^{-1}y \end{bmatrix}[/latex]

[latex]\textrm{Equation 8}[/latex] Linear mixed model solution for Equation 7.

where:

[latex]\small \begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix}=\small \begin{bmatrix} \hat{b}_1 \\ \hat{b}_2 \\ \hat{u}_1 \\ \hat{u}_2 \\ \hat{u}_3 \\ \hat{u}_4 \end{bmatrix},\ X = \small \begin{bmatrix} 1 &0 \\ 1 &0 \\ 1 &0 \\ 0 &1 \\ 0 &1 \\ 0 &1\end{bmatrix},\ y = \small \begin{bmatrix} 4.45 \\ 4.61 \\ 5.27 \\ 5.00 \\ 5.82 \\ 5.79 \end{bmatrix}, Z = \small \begin{bmatrix} 1 &0 &0 &0 \\ 0 &1 &0 &0 \\ 0 &0 &0 &1 \\0 &1 &0 &0 \\ 0 &0 &1 &0 \\ 0 &0 &0 &1\end{bmatrix}[/latex],

[latex]\small R = \small \begin{bmatrix} 1\over18 &0 &0 &0 &0 &0\\ 0 &1\over18 &0 &0 &0 &0 \\ 0 &0 &1\over18 &0 &0 &0 \\0 &0 &0 &1\over9 &0 &0 \\ 0 &0 &0 &0 &1\over9 &0 \\ 0 &0 &0 &0 &0 &1\over9\end{bmatrix}, A=\small \begin{bmatrix} 2 &0 &0 &0 \\ 0 &2 &0 &0 \\ 0 &0 &2 &0 \\ 0 &0 &0 &2 \end{bmatrix}[/latex].

Thus, R represents a matrix that weights the calculations by the number of observations that contribute to the estimated mean values of each cultivar in each type of environment.

Estimated Residual Variance

Assuming that the lines are unrelated to each other,

Assuming that the lines are unrelated to each other (and fully inbred), [latex]\small A=\begin{bmatrix} 2 &0 &0 &0 \\ 0 &2 &0 &0 \\ 0 &0 &2 &0 \\ 0 &0 &0 &2 \end{bmatrix}[/latex], and [latex]\large \frac{V_{R}}{V_{A}}[/latex] is the ratio of the estimated residual variance (sometimes incorrectly referred to as the estimate of the experimental error) to the estimated additive genetic variance. For purposes of illustration, let us consider this estimated ratio to be 5, i.e., the estimated additive genetic variance is 20% as large as the residual variance.

Calculations for the example have been implemented in an EXCEL spreadsheet: BLUEs and BLUPs of 4 barley varieties [XLSX].
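
For those who prefer scripting to a spreadsheet, the same calculation can be sketched in Python with numpy; the y, X, Z, R, and A matrices and the ratio of 5 are exactly those given above.

import numpy as np

y = np.array([4.45, 4.61, 5.27, 5.00, 5.82, 5.79])

# Fixed effects: low-yield vs. high-yield environment means
X = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)

# Random effects: incidence of lines 1-4 (rows in the order of Table 2)
Z = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)

# R weights each mean by the number of environments it is based on
R = np.diag([1/18, 1/18, 1/18, 1/9, 1/9, 1/9])
A = 2 * np.eye(4)        # unrelated, fully inbred lines
ratio = 5.0              # assumed V_R / V_A

Rinv = np.linalg.inv(R)
C = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
              [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + np.linalg.inv(A) * ratio]])
rhs = np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y])
sol = np.linalg.solve(C, rhs)

print("BLUEs (environment means):", sol[:2])
print("BLUPs (line effects):", sol[2:])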

As an exercise to conduct on your own, consider implementing the linear mixed model in Equation 7 for the example on estimation of means using lsmeans discussed in the review module:
Review of EDA and Estimation [DOC]

Reference

Henderson, C. R. 1963. Estimation of variance and covariance components. Biometrics. 9: 226.

 

How to cite this chapter: Beavis, W. and A. A. Mahama. 2023. Multi Environment Trials: Linear Mixed Models. In W. P. Suza, & K. R. Lamkey (Eds.), Quantitative Genetics for Plant Breeding. Iowa State University Digital Press.

License


Quantitative Genetics for Plant Breeding Copyright © 2023 by Walter Suza (Editor); Kendall Lamkey (Editor); William Beavis; Katherine Espinosa; Mark Newell; and Anthony Assibi Mahama is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.