Information

Additive genetic variance components from LMER in R

Additive genetic variance components from LMER in R


We are searching data for your request:

Forums and discussions:
Manuals and reference books:
Data from registers:
Wait the end of the search in all databases.
Upon completion, a link will appear to access the found materials.

I've set up some dummy data in R which makes 40 genetically related lines, they are all siblings within a line so are genetically related by a factor of ½ thus additive genetic variance should be twice the variance explained by line. For the lines there are 200 individuals being measured, for three characters/traits. The first trait has low phenotypic variance, the second has high environmental variance, and the third has high genetic variance.

rm(list=ls()) re = 200 # replicate individuals per line li = 40 # lines # setup set.seed(5) data = data.frame(rep(1:li, each = re)) colnames(data)="Line" library(nlme) library(lme4) par(mfrow=c(1,3)) # trait 1: little variance (Va or Ve) data$Trait = rnorm(li*re,10,1) boxplot(data$Trait~data$Line, ylim=c(0,20), main = expression("Low V"[A]*"& Low V"[E])) var1 = var(data$Trait); mean1 = mean(data$Trait); m1 = lmer(data$Trait ~ (1|data$Line)) # trait 2: high enivronmental variance, little Va data$Trait = rnorm(li*re,10,1); data$Trait = data$Trait + rnorm(re*li,0,3) boxplot(data$Trait~data$Line, ylim=c(0,20),main = expression("Low V"[A]*"& High V"[E])) var2 = var(data$Trait); mean2 = mean(data$Trait); m2 = lmer(data$Trait ~ (1|data$Line)) # trait 3: high additive genetic variance, little Ve data$Trait = rnorm(li*re,10,1); data$Trait = data$Trait+rep(rnorm(li,0,3),each=re) boxplot(data$Trait~data$Line, ylim=c(0,20), main = expression("High V"[A]*"& Low V"[E])) var3 = var(data$Trait); mean3 = mean(data$Trait); m3 = lmer(data$Trait ~ (1|data$Line))

Plots of the three traits,

Then to estimate additive variance ($V_A$) I extract the line variance ($V_L$) from the lmer model and double it,

# line variances (variance in additive effect of each haploid genome) m1_line = unlist(VarCorr(m1))[[1]]; m2_line = unlist(VarCorr(m2))[[1]]; m3_line = unlist(VarCorr(m3))[[1]] # additive variance (double the line variance because it is a hemiclone) m1_add = 2*m1_line; m2_add = 2*m2_line; m3_add = 2*m3_line

Residual variances should be (assuming perfect experimental design, no measurement error etc.) the estimate of environmental variance ($V_E$) and phenotypic variance ($V_P$) should be the sum of $V_L$ and $V_E$,

# residual variance m1_res = attr(VarCorr(m1), "sc")^2 m2_res = attr(VarCorr(m2), "sc")^2 m3_res = attr(VarCorr(m3), "sc")^2 # phenotypic variance m1_phe = m1_line + m1_res m2_phe = m2_line + m2_res m3_phe = m3_line + m3_res

Heritability is additive variance divided by the phenotypic variance

$h^2 = V_A / V_P$

But I think it is correct in this case to use line variance rather than additive variance (if someone could explain in an answer that would be useful), so I've done,

# heritability (line/ (line+ residual)) m1_h2 = m1_line/ m1_phe m2_h2 = m2_line/ m2_phe m3_h2 = m3_line/ m3_phe m1_h2; m2_h2; m3_h2

My question(s):

Is it appropriate to use thelmerfunction in R to extract variance components in this manner?

Have I calculated $V_A$, $V_E$, $V_P$, $h^2$ correctly? I think $V_A$ and $V_E$ are correct, $V_P$ could perhaps be the sum of $V_A$ and $V_E$ rather than $V_L$, and subsequently $h^2$ may be $V_A/V_P$.


From what I understood of your code and what you are asking I am guessing that you do the following:

Generating a virtual set of 40 individuals (lines) of which you have 200 measurements (repetitions). You say that they are full siblings, so they share both parents. Then you use the lmer function (which I am not familiar with) to give you the total variance, within-group variance and between-group variance, ($V_{T}, V_W, V_B$ respectively). What you call $V_L$ would be the $V_W$.

We know that $V_P=V_A+V_D+V_E$

In full siblings we also know that $V(A,A')=frac{1}2V_A+frac{1}4V_D$ where $A$ is the additive value and $D$ is the dominance value because they share a quarter of their gene combinations.

If you are treating with full siblings the $V_W$ should be then the $V_E$ and the $V_B$ would be the $frac{1}2V_A+frac{1}4V_D$ as you are seeing the variance between siblings.

So no, you are not calculating correctly the $V_A$ because you are missing your $V_D$ on your counting. The code itself looks fine for what I understood, but maybe you should take that to corssvalidated with the R tag


Nadiv : an R package to create relatedness matrices for estimating non-additive genetic variances in animal models

1. The Non-Additive InVerses ( nadiv ) R software package contains functions to create and use non-additive genetic relationship matrices in the animal model of quantitative genetics.

2. This study discusses the concepts relevant to non-additive genetic effects and introduces the package.

3. nadiv includes functions to create the inverse of the dominance and epistatic relatedness matrices from a pedigree, which are required for estimating these genetic variances in an animal model. The study focuses on three widely used software programs in ecology and in evolutionary biology (ASReml, MCMCglmm and WOMBAT) and how nadiv can be used in conjunction with each. Simple tutorials are provided in the Supporting Information.


Abstract

Full factorial breeding designs are useful for quantifying the amount of additive genetic, nonadditive genetic, and maternal variance that explain phenotypic traits. Such variance estimates are important for examining evolutionary potential. Traditionally, full factorial mating designs have been analyzed using a two-way analysis of variance, which may produce negative variance values and is not suited for unbalanced designs. Mixed-effects models do not produce negative variance values and are suited for unbalanced designs. However, extracting the variance components, calculating significance values, and estimating confidence intervals and/or power values for the components are not straightforward using traditional analytic methods. We introduce fullfact – an R package that addresses these issues and facilitates the analysis of full factorial mating designs with mixed-effects models. Here, we summarize the functions of the fullfact package. The observed data functions extract the variance explained by random and fixed effects and provide their significance. We then calculate the additive genetic, nonadditive genetic, and maternal variance components explaining the phenotype. In particular, we integrate nonnormal error structures for estimating these components for nonnormal data types. The resampled data functions are used to produce bootstrap-t confidence intervals, which can then be plotted using a simple function. We explore the fullfact package through a worked example. This package will facilitate the analyses of full factorial mating designs in R, especially for the analysis of binary, proportion, and/or count data types and for the ability to incorporate additional random and fixed effects and power analyses.


Results

The genetic parameters and goodness-of-fit statistics, estimated for each model, are summarized in Table 2. Both P_A and M_A models had narrow-sense heritability (h 2 ) >0.30. After including the dominance effect in the pedigree-based model (P_AD), h 2 decreased by ∼26% and the dominance ratio (d 2 ) estimate was small (0.06) and nonsignificant (2 × SE(d 2 ) > 0.06). When the dominance effect was included with the molecular marker-based model (M_AD), the h 2 decreased 47%, to 0.20, and d 2 increased to 0.12. With the M_AD model, the dominance variance represents 60% of the additive value and 39% of the total genetic variation. We further extended these models to include the additive-by-additive, dominance-by-dominance, and additive-by-dominance first-order epistatic interaction factors in three separate models. In pedigree-based models, P_A#A, P_D#D, and P_A#D, the estimations of variance components for additive and dominance varied only slightly from those of the P_AD model (Supporting Information, Table S1). Moreover, epistasis estimates were zero in all three models. When the additive-by-additive, dominance-by-dominance, and additive-by-dominance interactions were added (models M_A#A, M_D#D, and M_A#D), the narrow-sense heritability dropped by >30% and the dominance ratio by 80%, compared to the M_AD model. The epistatic ratio (i 2 ) was estimated at 0.15, 0.12, and 0.14 for the M_A#A, M_D#D, and M_A#D models, respectively ( Table 2). The alternative parameterization for the dominance genomic relationship matrix proposed by Vitezica et al. (2013) showed similar results regarding the partition of additive and nonadditive effects (Table S2).

Estimates of genetic parameters (standard errors in parentheses) and goodness-of-fit measures

. P_A . M_A . P_AD . M_AD . P_A#A . M_A#A . P_D#D . M_D#D . P_A#D . M_A#D .
h 2 SE(h 2 ) 0.32 (0.017) 0.347 (0.018) 0.235 (0.047) 0.199 (0.035) 0.233 (0.047) 0.088 (0.044) 0.228 (0.046) 0.139 (0.036) 0.231 (0.047) 0.1251 (0.038)
d 2 SE(d 2 ) NA NA 0.056 (0.033) 0.117 (0.029) 0.055 (0.033) 0.023 (0.034) 0.058 (0.032) 0.009 (0.033) 0.056 (0.043) 0.006 (0.035)
i 2 SE(i 2 ) NA NA NA NA 0.000 (0.000) 0.154 (0.038) 0.000 (0.000) 0.121 (0.028) 0.000 (0.000) 0.135 (0.031)
H 2 SE(H 2 ) 0.32 (0.017) 0.347 (0.018) 0.29 (0.021) 0.316 (0.018) 0.288 (0.021) 0.264 (0.019) 0.286 (0.021) 0.269 (0.018) 0.288 (0.021) 0.266 (0.018)
LogL −1299.40 −1323.73 −1295.37 −1307.63 −1294.83 −1293.53 −1293.90 −1292.19 −1294.38 −1292.54
AIC 2606.80 2655.46 2602.74 2627.26 2605.66 2603.06 2603.80 2600.38 2604.76 2601.08
. P_A . M_A . P_AD . M_AD . P_A#A . M_A#A . P_D#D . M_D#D . P_A#D . M_A#D .
h 2 SE(h 2 ) 0.32 (0.017) 0.347 (0.018) 0.235 (0.047) 0.199 (0.035) 0.233 (0.047) 0.088 (0.044) 0.228 (0.046) 0.139 (0.036) 0.231 (0.047) 0.1251 (0.038)
d 2 SE(d 2 ) NA NA 0.056 (0.033) 0.117 (0.029) 0.055 (0.033) 0.023 (0.034) 0.058 (0.032) 0.009 (0.033) 0.056 (0.043) 0.006 (0.035)
i 2 SE(i 2 ) NA NA NA NA 0.000 (0.000) 0.154 (0.038) 0.000 (0.000) 0.121 (0.028) 0.000 (0.000) 0.135 (0.031)
H 2 SE(H 2 ) 0.32 (0.017) 0.347 (0.018) 0.29 (0.021) 0.316 (0.018) 0.288 (0.021) 0.264 (0.019) 0.286 (0.021) 0.269 (0.018) 0.288 (0.021) 0.266 (0.018)
LogL −1299.40 −1323.73 −1295.37 −1307.63 −1294.83 −1293.53 −1293.90 −1292.19 −1294.38 −1292.54
AIC 2606.80 2655.46 2602.74 2627.26 2605.66 2603.06 2603.80 2600.38 2604.76 2601.08

Each column represents a different model. See Table 1 for matrices included in the model.

. P_A . M_A . P_AD . M_AD . P_A#A . M_A#A . P_D#D . M_D#D . P_A#D . M_A#D .
h 2 SE(h 2 ) 0.32 (0.017) 0.347 (0.018) 0.235 (0.047) 0.199 (0.035) 0.233 (0.047) 0.088 (0.044) 0.228 (0.046) 0.139 (0.036) 0.231 (0.047) 0.1251 (0.038)
d 2 SE(d 2 ) NA NA 0.056 (0.033) 0.117 (0.029) 0.055 (0.033) 0.023 (0.034) 0.058 (0.032) 0.009 (0.033) 0.056 (0.043) 0.006 (0.035)
i 2 SE(i 2 ) NA NA NA NA 0.000 (0.000) 0.154 (0.038) 0.000 (0.000) 0.121 (0.028) 0.000 (0.000) 0.135 (0.031)
H 2 SE(H 2 ) 0.32 (0.017) 0.347 (0.018) 0.29 (0.021) 0.316 (0.018) 0.288 (0.021) 0.264 (0.019) 0.286 (0.021) 0.269 (0.018) 0.288 (0.021) 0.266 (0.018)
LogL −1299.40 −1323.73 −1295.37 −1307.63 −1294.83 −1293.53 −1293.90 −1292.19 −1294.38 −1292.54
AIC 2606.80 2655.46 2602.74 2627.26 2605.66 2603.06 2603.80 2600.38 2604.76 2601.08
. P_A . M_A . P_AD . M_AD . P_A#A . M_A#A . P_D#D . M_D#D . P_A#D . M_A#D .
h 2 SE(h 2 ) 0.32 (0.017) 0.347 (0.018) 0.235 (0.047) 0.199 (0.035) 0.233 (0.047) 0.088 (0.044) 0.228 (0.046) 0.139 (0.036) 0.231 (0.047) 0.1251 (0.038)
d 2 SE(d 2 ) NA NA 0.056 (0.033) 0.117 (0.029) 0.055 (0.033) 0.023 (0.034) 0.058 (0.032) 0.009 (0.033) 0.056 (0.043) 0.006 (0.035)
i 2 SE(i 2 ) NA NA NA NA 0.000 (0.000) 0.154 (0.038) 0.000 (0.000) 0.121 (0.028) 0.000 (0.000) 0.135 (0.031)
H 2 SE(H 2 ) 0.32 (0.017) 0.347 (0.018) 0.29 (0.021) 0.316 (0.018) 0.288 (0.021) 0.264 (0.019) 0.286 (0.021) 0.269 (0.018) 0.288 (0.021) 0.266 (0.018)
LogL −1299.40 −1323.73 −1295.37 −1307.63 −1294.83 −1293.53 −1293.90 −1292.19 −1294.38 −1292.54
AIC 2606.80 2655.46 2602.74 2627.26 2605.66 2603.06 2603.80 2600.38 2604.76 2601.08

Each column represents a different model. See Table 1 for matrices included in the model.

Goodness-of-fit statistics show that inclusion of nonadditive effects improved slightly the model fit for pedigree-based models and substantially for marker-based models ( Table 2). The marker-based models M_A#D and M_D#D yielded the best fit of the data however, fitting differences among the more complex models were small. Thus, the dependency of the random component estimates was evaluated to further differentiate the best model.

We studied the sampling correlation among the variance component estimates, to assess which of the nine models shows less dependency and thus partitioned the genetic variance better (Table S3). Figure 1 shows the cumulative proportion of variance explained by high order eigenvalues of the sampling variance–covariance matrix of estimates derived from models including additive-plus-dominance, additive-by-additive, dominance-by-dominance, and additive-by-dominance epistatic interactions, for pedigree- and marker-based models. As reference, the distribution of eigenvalues for a perfect orthogonal correlation matrix, representing the ideal model (all of the eigenvalues equal to 1) is included. In all cases, the marker-based cumulative distributions are closer to the orthogonal distribution, suggesting less dependency between estimates of variance components. Indeed, the sampling correlation between estimates of variance components due to additive and dominance effects decreases in absolute value from 0.90 with the P_AD to 0.70 with the M_AD model (Table S2). In general, all the marker-based models that include epistasis outperform their pedigree-based counterpart ( Figure 1, B–D). Models M_D#D and M_A#D showed the smallest sampling correlations between additive and dominance/epistasis, with absolute correlation values <0.45 (Table S3).

Cumulative proportion of variance explained by eigenvalues for models considering A plus D from pedigree (P_AD) vs. markers (M_AD) (A), A#A interaction from pedigree (P_A#A) vs. markers (M_A#A) (B), D#D interaction from pedigree (P_D#D) vs. markers (M_D#D) (C), and considering A#D interaction from pedigree (P_A#D) vs. markers (M_A#D) (D). The diagonal represents an orthogonal correlation matrix.

Cumulative proportion of variance explained by eigenvalues for models considering A plus D from pedigree (P_AD) vs. markers (M_AD) (A), A#A interaction from pedigree (P_A#A) vs. markers (M_A#A) (B), D#D interaction from pedigree (P_D#D) vs. markers (M_D#D) (C), and considering A#D interaction from pedigree (P_A#D) vs. markers (M_A#D) (D). The diagonal represents an orthogonal correlation matrix.

The standard error of the predictions (SEP) of BV and dominance value (DV) were compared for the pedigree and markers models including additive by additive, dominance by dominance, and additive by dominance ( Figure 2). Values <45° reference line indicate that marker-based models have smaller SEPs. The SEPs for BVs from the marker-based models were smaller than the pedigree-based models in 99.2% of the cases ( Figure 2, A–C). In the case of the SEPs of DVs, a clear advantage was observed for marker-based models (y-axis) over pedigree-based models (x-axis), with SEP on average 52% lower for the marker-based models ( Figure 2, D–F).

Standard error of the prediction (SEP) for pedigree-based (x-axis) against their counterpart marker-based models (y-axis). SEP for BV prediction model including A#A interaction (A), D#D interaction (B), and including A#D interaction (C). SEP for dominance value (DV) prediction model including A#A interaction (D), D#D interaction (E), and A#D interaction (F).

Standard error of the prediction (SEP) for pedigree-based (x-axis) against their counterpart marker-based models (y-axis). SEP for BV prediction model including A#A interaction (A), D#D interaction (B), and including A#D interaction (C). SEP for dominance value (DV) prediction model including A#A interaction (D), D#D interaction (E), and A#D interaction (F).

The predictive ability of breeding value and genetic value for the pedigree-based and marker-based models are shown in Table 3. The highest predictive ability for BV was obtained with the pedigree additive model (P_A). A slight decrease in the BV prediction ability was observed when nonadditive effects were included in the pedigree-based model (0.86), and a much larger decrease was observed for the marker-based models (0.76). All models evaluated reached similar GV predictive ability.

Model of predictive ability and stability

Model . Predictive ability . Predictive stability .
Breeding value * . Genetic value ** . Breeding value . MSE (BV) . 10% rank cor (BV) .
P_A 0.89 0.64 1335.67 0.17
M_A 0.87 0.66 1294.23 0.17
P_AD 0.86 0.89 0.72 681.53 0.12
M_AD 0.82 0.88 0.74 418.83 0.31
P_A#A 0.86 0.89 0.73 669.99 0.15
M_A#A 0.76 0.89 0.85 82.80 0.46
P_D#D 0.86 0.89 0.73 638.58 0.18
M_D#D 0.77 0.89 0.86 161.78 0.43
P_A#D 0.86 0.89 0.73 657.22 0.16
M_A#D 0.76 0.89 0.86 208.15 0.42
Model . Predictive ability . Predictive stability .
Breeding value * . Genetic value ** . Breeding value . MSE (BV) . 10% rank cor (BV) .
P_A 0.89 0.64 1335.67 0.17
M_A 0.87 0.66 1294.23 0.17
P_AD 0.86 0.89 0.72 681.53 0.12
M_AD 0.82 0.88 0.74 418.83 0.31
P_A#A 0.86 0.89 0.73 669.99 0.15
M_A#A 0.76 0.89 0.85 82.80 0.46
P_D#D 0.86 0.89 0.73 638.58 0.18
M_D#D 0.77 0.89 0.86 161.78 0.43
P_A#D 0.86 0.89 0.73 657.22 0.16
M_A#D 0.76 0.89 0.86 208.15 0.42

Correlation between the phenotypic average of all the ramets (phe) and BV-all, and

correlation between phe and total genetic value (GV-all = BV-all + DV + epistatic value). Stability in a cross-validation correlation between BV-all and BV-cv, mean square error (MSE) and correlation of ranking positions for the top 10% individuals [10% rank cor(BV)].

Model . Predictive ability . Predictive stability .
Breeding value * . Genetic value ** . Breeding value . MSE (BV) . 10% rank cor (BV) .
P_A 0.89 0.64 1335.67 0.17
M_A 0.87 0.66 1294.23 0.17
P_AD 0.86 0.89 0.72 681.53 0.12
M_AD 0.82 0.88 0.74 418.83 0.31
P_A#A 0.86 0.89 0.73 669.99 0.15
M_A#A 0.76 0.89 0.85 82.80 0.46
P_D#D 0.86 0.89 0.73 638.58 0.18
M_D#D 0.77 0.89 0.86 161.78 0.43
P_A#D 0.86 0.89 0.73 657.22 0.16
M_A#D 0.76 0.89 0.86 208.15 0.42
Model . Predictive ability . Predictive stability .
Breeding value * . Genetic value ** . Breeding value . MSE (BV) . 10% rank cor (BV) .
P_A 0.89 0.64 1335.67 0.17
M_A 0.87 0.66 1294.23 0.17
P_AD 0.86 0.89 0.72 681.53 0.12
M_AD 0.82 0.88 0.74 418.83 0.31
P_A#A 0.86 0.89 0.73 669.99 0.15
M_A#A 0.76 0.89 0.85 82.80 0.46
P_D#D 0.86 0.89 0.73 638.58 0.18
M_D#D 0.77 0.89 0.86 161.78 0.43
P_A#D 0.86 0.89 0.73 657.22 0.16
M_A#D 0.76 0.89 0.86 208.15 0.42

Correlation between the phenotypic average of all the ramets (phe) and BV-all, and

correlation between phe and total genetic value (GV-all = BV-all + DV + epistatic value). Stability in a cross-validation correlation between BV-all and BV-cv, mean square error (MSE) and correlation of ranking positions for the top 10% individuals [10% rank cor(BV)].

Predictive stability can be viewed as a measure of how much the prediction of the breeding value and genetic value using all the data (BV-all and GV-all) depend on the individual phenotype ( Table 3). Predictions based on models with markers are more stable than those derived from pedigree models (3% increase when comparing M_A to P_A). In the pedigree-based models, inclusion of nonadditive effects increased the stability to predict BV by 13, 14, and 14% for P_A#A, P_D#D, and P_A#D, respectively, while inclusion of nonadditive effects in the more complex marker-based models, increased the BV prediction stability by >29% when compared with the M_A and by >33% when compared to P_A. The mean square error (MSE) decreased by ∼50% from the additive models (P_A) to the more complex pedigree-based models. The addition of nonadditive effects to the marker-based models decreased the MSE even further by >68% and up to 94% decrease in the case of model P_A#A ( Table 3).

In a breeding program, it is important to predict the trend and magnitude of the complete set of individuals in the population however, it is often more important to predict the best performing individuals (potential selections). Here we ranked all individuals based on BV-all and BV-cv and evaluated the ranking correlation of the top 10%, emulating the selection of the top 10% of genotypes ( Table 3). When the pedigree-based matrix was replaced by the marker-based matrix in the additive models (P_A and M_A), the capacity to predict the top 10% remained the same. However, this capacity increased substantially for the more complex marker-based models where the predictive stability of the top 10% of genotypes increased 82–170% ( Table 3).


References

Johnson, P. C. D. (2014). Extension of Nakagawa & Schielzeth’s R2 GLMM to random slopes models. Methods in Ecology and Evolution, 5(9), 944–946. doi: 10.1111/2041-210X.12225

Nakagawa, S., Johnson, P. C. D., & Schielzeth, H. (2017). The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. Journal of The Royal Society Interface, 14(134), 20170213. doi: 10.1098/rsif.2017.0213

Zuur, A. F., Savel'ev, A. A., & Ieno, E. N. (2012). Zero inflated models and generalized linear mixed models with R. Newburgh, United Kingdom: Highland Statistics.


Methods

Ethical compliance

The North West Centre for Research Ethics Committee granted ethics approval the UK Biobank study for (11/NW/0382). Participants in the study provided signed electronic consent upon recruitment. This research is approved under the University of Queensland human ethics committee (approval number 201100173).

Sample selection

The UK Biobank is a large prospective cohort study of

500 K individuals from the United Kingdom 40 . Individuals are aged 40–69 years and are assessed for a range of traits including physical, socio-economic and cognitive factors, as well as medical history. Most individuals have genotype information for 807,411 or 825,927 markers from the UK BiLEVE or UK Biobank Axiom Arrays 41 . We identified 417,060 individuals from this cohort that had (1) British or Western European ancestry (see details below), (2) consistent self-reported and genetic sex, (3) one or more recorded phenotypes for 32 quantitative or ordered categorical traits (see details below), (4) self-reported as born in Great Britain with co-ordinates for place of birth, (5) born between 1937 and 1970, (6) aged between 40 and 70 at the time of assessment, and (7) had imputed genotypes 41 .

Determining ancestry

Genotype markers for the UK Biobank sample were quality checked and imputed to the Haplotype Reference Consortium (HRC) and UK10K reference panels by Bycroft et al. 41 . From these data, we hard-called 1,326,701 bi-allelic HapMap3 SNPs imputed from the HRC reference panel with imputation quality ≥0.3, minor allele count (MAC) > 5 and missingness <0.05 using PLINK (v200aLM) 42 . We then used a multi-step, iterative process to quality check and identify individuals of British or Western European ancestry using the 2,504 participants in the 1,000 Genomes Project 43 with known ancestries as a reference. First, we identified 1,029,456 variants in common with the 1,000 Genomes Project and with minor allele frequency (MAF) > 0.01 in both datasets. Then the UK Biobank participants were projected onto the first two principal components (PC) from the 1000 Genomes Project reference panel using GCTA (v1.9) 5 . We classified Europeans as those with >0.9 probability as belong to the European supercluster based on the projection. Variants were then filtered for Hardy–Weinberg equilibrium in this tentative European subset (pHWE > 10 −5 ), and the projection and assignments to the European cluster repeated. The resulting classification assigned 456,426 individuals to European ancestry. Next we repeated the above procedure within the European subset of the 1000 Genomes panel to obtain individuals with >0.9 probability of clustering with the GBR (British in England and Scotland) and CEU (Northern and Western European ancestry) ancestry individuals. Using this more stringent criterion, we obtained 449,298 individuals of likely GBR and CEU ancestry.

Estimation of genomic relationships

The genomic relationship matrix (GRM) was constructed using GCTA (v1.9) 5 for European ancestry individuals from a set of 1,123,347 HapMap3 SNPs (MAF > 0.01, HWE > 10 −6 and missingness <0.05) originating from the Haplotype Reference Consortium (HRC) imputation panel. From this matrix, we used the --rel-cut-off option in GCTA to identify a subset of 348,502 individuals with a maximum genomic relationship (π) of 0.05 and a further set of 133,387 individuals with a maximum π of 0.02.

Genetic stratification

Principal components were calculated with 137,102 genotyped SNPs using flashPCA (v2.0) 44 in unrelated individuals with European ancestry (π < 0.05). Genotyped SNPs were those previously used by the UK Biobank to calculate principal components 41 , with some additional quality control filters (missingness < 0.05 pHWE 10 −5 MAF 0.01). The resulting SNP loading were used to project all individuals onto the PC space. PC projections were conducted specifically in the European ancestry subset to capture genetic stratification related to the UK Biobank data.

UK Biobank phenotypes

Thirty-two quantitative and ordered categorical traits from the UK Biobank were selected from those available that had a high proportion of individuals with records. Phenotypes include anthropometric (standing height, waist:hip ratio, body mass index, body fat percentage, sitting:standing height ratio) and blood traits (white blood cell count, red blood cell count, platelet count, eosinophil count) educational attainment, household income, spirometry (forced vital capacity, forced expiration volume) and some sex-limited traits (relative age at voice break, age at menarche, age at first birth). A full listing of traits, UK Biobank field identifiers and the number of records analysed are given in Supplementary Table 1.

Geographic stratification (birth contemporary groups)

The Geographic Information Systems (GIS) shapefile containing the boundaries of local authority areas in UK were obtained from the InFuse website 45 , which is part of the UK Data Servide Census Support. The R-packages sp (v1.4-4) and rgdal (v1.5-18) were used to merge the spatial data from local authority GIS shapefile 18,46,47 with the birth place coordinates of the UK Biobank participants (see Supplementary Table 2) in order to create 378 contemporary groups. Birth CG was fitted to phenotypes (see below) to assess and account for common environmental effects acting at the level of local authorities.

Models fitted to the phenotypes

Four models were fitted to each phenotype. Fixed effects included in all models as factors were sex (2 levels), genotyping batch (batch, 106 levels), year of birth (yob, 34 levels) and age at assessment (age, 31 levels). Models differed in the degree of geographic or genetic stratification in the model. A list of fixed effects and UK Biobank identifiers can be found in Supplementary Table 2. The models fitted were:


Michael Turelli

Theoretical population and quantitative genetics, speciation, and the population and evolutionary biology of Wolbachia and its hosts, especially Drosophila and disease-vector mosquitoes.

Grad Group Affiliations

Specialties / Focus

  • Population and Quantitative Genetics
  • Population Interactions
  • Systematics and Comparative Biology

Courses

  • PBG 270 Research Conference in Evolutionary Biology
  • EVE 103 Phylogeny and Macroevolution
  • BIS 101 Genes and Gene Expression
  • PBG 200C Principles of Population Biology

Field Sites

Honors and Awards

  • Guggenheim Fellowship, University College London, 9/86-8/87
  • American Academy of Arts and Sciences, Elected Fellow 2005
  • Miller Research Professorship, UC Berkeley, Spring Semester 2006
  • UC Davis Academic Senate Faculty Research Lecturer, 2012

Professional Societies

  • Genetics Society of America
  • Society for the Study of Evolution
  • American Society of Naturalists
  • American Academy of Arts and Sciences

Degrees

  • 1972 BS Mathematics University of California, Riverside
  • 1977 PhD Biomathematics University of Washington

Publications

Hoffmann, A. A., B. L. Montgomery, J. Popovici, I. Iturbe-Ormaetxe, P. H. Johnson, F. Muzzi, M. Greenfield, M. Durkan, Y. S. Leong, Y. Dong, H. Cook, J. Axford, A. G. Callahan, N. Kenny, C. Omodei, E. A. McGraw, P. A. Ryan, S. A. Ritchie, M. Turelli and S. L. O’Neill. 2011. Successful establishment of Wolbachia in Aedes populations to suppress dengue transmission. Nature 476:454-457.

Barton, N. H. and M. Turelli. 2011. Spatial waves of advance with bistable dynamics: cytoplasmic and genetic analogues of Allee effects. American Naturalist 178:E48-E75.

Carrington, L. B., J. R. Lipkowitz, A. A. Hoffmann and M. Turelli. 2011. A re-examination of Wolbachia-induced cytoplasmic incompatibility in California Drosophila simulans. PLoS ONE 6:e22565.

Warren, D. L., R. E. Glor and M. Turelli. 2010. ENMTools: A toolbox for comparative studies of environmental niche models. Ecography 33:607-611.

Turelli, M. 2010. Cytoplasmic incompatibility in populations with overlapping generations. Evolution 64:232-241

Haygood, R. and M. Turelli. 2009. Evolution of incompatibility-inducing microbes in subdivided host populations. Evolution 63:432-447.

Jansen, V. A. A., M. Turelli and H. C. J. Godfray. 2008. Stochastic spread of Wolbachia. Proceedings of the Royal Society of London, Series B 275:2769-2776.

Warren, D. L., R. E. Glor and M. Turelli. 2008. Environmental niche equivalency versus conservatism: quantitative approaches to niche evolution. Evolution 62:2868-2883.

Bolnick, D. I., M. Turelli, H. López-Fernández, P. C. Wainwright and T. J. Near. 2008. Accelerated mitochondrial evolution and ‘Darwin’s corollary’: Asymmetric viability of reciprocal F1 hybrids in centrarchid fishes. Genetics 178:1037-1048.

Turelli, M. and L. C. Moyle. 2007. Asymmetric postmating isolation: Darwin's corollary to Haldane's rule. Genetics 176:1059-1088.

Weeks, A. R., M. Turelli, W. R. Harcombe, K. T. Reynolds and A. A. Hoffmann. 2007. From parasite to mutualist: rapid evolution of Wolbachia in natural populations of Drosophila. PLoS Biology 5:997-1005.

Mittelbach, G. M., D. Schemske, H. V. Cornell, al. et M. Turelli. 2007. Evolution and the latitudinal diversity gradient: speciation, extinction, and biogeography. Ecology Letters 10:315-331.

Turelli, M. and N. H. Barton. 2006. Will multilocus epistasis and population bottlenecks increase additive genetic variance? Evolution 60:1763-1776

Fitzpatrick, B. M. and M. Turelli. 2006. The geography of mammalian speciation: Mixed signals from phylogenies and range maps. Evolution 60:601-615.

Hill, W. G., N. H. Barton and M. Turelli. 2006. Prediction of effects of genetic drift on variance components under a general model of epistasis. Theoretical Population Biology 70:56-62.

Barton, N. H. and M. Turelli. 2004. Effects of genetic drift on variance components under a general model of epistasis. Evolution 58:2111-2132.

Turelli, M. and N. H. Barton. 2004. Polygenic variation maintained by balancing selection: pleiotropy, sex-dependent allelic effects and GxE interactions. Genetics 166:1053-1079.

Hudson, R. R. and M. Turelli. 2003. Stochasticity overrules the "three-times rule": genetic drift, genetic draft, and coalescence times for nuclear loci versus mitochondrial DNA. Evolution 57:182-190

Turelli M, NH Barton and JA Coyne. 2001. Theory and speciation. Trends in Ecology & Evolution 16:330-343

Orr, HA and M Turelli. 2001. The evolution of postzygotic isolation: accumulating Dobzhansky-Muller incompatibilities. Evolution 55:1085-1094

Turelli, M, DW Schemske and P Bierzychudek. 2001. Conditions for stable two-allele polymorphisms with seed banks and fluctuating selection: maintaining the blues in Linanthus parryae. Evolution 55:1283-1298


Materials and Methods

Field data from a single experimental trial from the CCLONES population (see Baltunis et al. 2007 and Resende et al. 2012a for details) was used in this study. The response variable total tree height (HT, m) was used. The population was generated by crossing 32 parents in a circular mating design with additional off-diagonal crosses, resulting in 70 full-sib families with an average of 13.5 individuals per family. Each individual was clonally replicated (ramet) and a clonal field trial was established using single-tree plots with eight replicates (one ramet per replicate), in a resolvable alpha-incomplete block design ( Williams et al. 2002). Four of the replicates were grown under high intensity management while the rest were under an operational-like regime.

A subset of the CCLONES population, composed of 951 individuals from 61 families, was genotyped using the Illumina Infinium platform (Illumina, San Diego ( Eckert et al. 2010) with 7216 SNPs, each representing a unique pine EST contig. A total of 4853 SNPs were polymorphic and were used for further analyses.

Relationship matrices

In addition, a molecular marker-based dominance relationship matrix (DG) was constructed. To build a dominance relationship matrix, we created an incidence matrix (S) for effects due to dominance S = <sij>, where sij was parameterized to be coded 1 if the genotype was heterozygous and 0 if the marker genotype was homozygous for either class. The matrix S was further standardized to have mean zero by using: sij = 1 − 2pjqj if the individual is heterozygous, sij = 0 if the individual has missing data, and sij = 0 − 2pjqj otherwise.

Pedigree-based relationship matrices for additive (A) and dominance (D) effects were computed using standard methods ( Lynch and Walsh 1998 Mrode 2005). Following existing theory ( Cockerham 1954 Kempthorne 1954 Henderson 1985 Gianola and de los Campos 2008), the covariance matrices due to first degree epistatic terms were computed using Hadamard products (i.e., cell-by-cell product denoted as #) of the following form: (i) additive-by-additive interactions (A#A or AG#AG) (ii) dominance-by-dominance interactions (D#D or DG#DG) and (iii) additive-by-dominance interactions (A#D or AG#DG) for pedigree and marker-based methods, respectively.

Genetic analyses

All analyses were carried out in the software ASReml v3.0 ( Gilmour et al. 2009), which fits mixed models with complex datasets using sparse matrix methods. ASReml is equipped with the residual maximum likelihood (REML) for variance component estimation using the average information algorithm ( Gilmour et al. 1995).

Under the above model, the narrow-sense heritability can be estimated as h 2 = σ a 2 / σ p 2 , the dominance to total variance ratio as d 2 = σ d 2 / σ p 2 , the epistatic to total variance ratio as i 2 = σ i 2 / σ p 2 , and the broad-sense heritability as H 2 = σ g 2 / σ p 2 . The σ a 2 is the estimated additive variance, σ d 2 is the estimated dominance variance, and σ p 2 ⁠ , σ i 2 ⁠ , and σ g 2 are the total phenotypic, epistatic, and total genetic variance, respectively, that changed accordingly to the model being fit ( Table 1).

Summary of models, fitted effects, and relationship matrices used in the study

Model . Relationship matrix used (information used, A, D = pedigree, AG, DG = markers) .
Number . Code . Additive . Dominance . Epistasis .
1 P_A A
6 M_A AG
2 P_AD AD
7 M_AD AGDG
3 P_A#A ADA#A
8 M_A#A AGDGAG#AG
4 P_D#D ADD#D
9 M_D#D AGDGDG#DG
5 P_A#D ADA#D
10 M_A#D AGDGAG#DG
Model . Relationship matrix used (information used, A, D = pedigree, AG, DG = markers) .
Number . Code . Additive . Dominance . Epistasis .
1 P_A A
6 M_A AG
2 P_AD AD
7 M_AD AGDG
3 P_A#A ADA#A
8 M_A#A AGDGAG#AG
4 P_D#D ADD#D
9 M_D#D AGDGDG#DG
5 P_A#D ADA#D
10 M_A#D AGDGAG#DG
Model . Relationship matrix used (information used, A, D = pedigree, AG, DG = markers) .
Number . Code . Additive . Dominance . Epistasis .
1 P_A A
6 M_A AG
2 P_AD AD
7 M_AD AGDG
3 P_A#A ADA#A
8 M_A#A AGDGAG#AG
4 P_D#D ADD#D
9 M_D#D AGDGDG#DG
5 P_A#D ADA#D
10 M_A#D AGDGAG#DG
Model . Relationship matrix used (information used, A, D = pedigree, AG, DG = markers) .
Number . Code . Additive . Dominance . Epistasis .
1 P_A A
6 M_A AG
2 P_AD AD
7 M_AD AGDG
3 P_A#A ADA#A
8 M_A#A AGDGAG#AG
4 P_D#D ADD#D
9 M_D#D AGDGDG#DG
5 P_A#D ADA#D
10 M_A#D AGDGAG#DG

Model comparisons

Models were compared using the Akaike information criterion (AIC) ( Akaike 1974). Precision of variance components estimates, and their dependency, was assessed using the asymptotic variance–covariance matrix of estimates of variance parameters (V). The asymptotic sampling correlation matrix of estimates (F) was computed as F = L − 1 / 2 V L − 1 / 2 , where L is a diagonal matrix containing the diagonal elements of V. Inspection of the off-diagonal elements of the F matrix allows assessing sampling correlation of variance estimates. To have an overall assessment of dependency between the estimates, eigenvalues of F were examined. The standard error of the prediction (SEP) was estimated for each model as the square root of the prediction error variance (PEV), which is obtained by extracting the elements of the diagonal of the generalized inverse of the coefficient matrix from the linear mixed model equations (left hand side), and scaled by the error variance. In short form, the PEVs correspond to the Var ( a − a ^ ) ( Mrode, 2005, p. 51).

Predictive ability and stability of the models in estimating breeding and genetic values were evaluated. The predictive ability of a model’s breeding value was defined as the correlation between the estimated breeding value and the phenotypic average of all the ramets (clones). These values were calculated when all the data were used without cross-validation (BV-all). The predictive ability of a model’s total genetic value (sum of BV-all, dominance effect, and epistatic effect) was defined as the correlation between the predicted total genetic value and the phenotype average of all the clones using all the data without cross-validation (GV-all). Prediction models were assessed under cross-validation ( Kohavi 1995) to obtain predicted breeding value (BV-cv) and predicted total genetic value (GV-cv) with a random sub-sampling partitioning, fixed for all models. The stability of the predictive models were evaluated as the correlation between the BV-all and BV-cv, and between GV-all and GV-cv, and was defined as a measure of the dependency of the predictive breeding value on the phenotype. The mean square error (MSE) was calculated between BV-all and BV-cv within each model using standard methods. Finally, the capacity of the model to predict ranking position of the top 10% of the individuals, simulating a selection scenario, was evaluated as the correlation between the ranking position using the BV-all and the ranking position using the BV-cv.


Abstract

Background— Inflammation variables (C-reactive protein [CRP], fibrinogen, and soluble intercellular adhesion molecule-1 [sICAM-1]) have been identified as risk factors for cardiovascular disease. It is still not known how much the regulation of inflammatory risk factors is determined by genetic factors, and the aim of this study was to determine the heritability of these inflammation variables and of the acute phase regulating cytokines interleukin-6 (IL-6) and tumor necrosis factor-α (TNF-α) at older ages.

Methods and Results— The heritability of CRP, fibrinogen, sICAM-1, IL-6, and TNF-α was determined in a twin study consisting of 129 monozygotic twin pairs and 153 dizygotic same-sex twins aged 73 to 94 years who participated in the Longitudinal Study of Aging of Danish Twins. Furthermore, we determined the influence of selected genetic polymorphisms on the plasma level variations. Genetic factors accounted for 20% to 55% of the variation in plasma levels of the inflammation variables. The highest heritability was found for sICAM-1. The genetic polymorphisms we studied explained only a small, insignificant part of the heritability.

Conclusions— This study in elderly twins provides evidence for a substantial genetic component of inflammatory cardiovascular risk factors among the elderly.

Inflammation variables have been identified as risk factors for cardiovascular disease. The heritability of inflammation variables at older ages was considered in a twin study on 129 monozygotic and 153 dizygotic same-sex twin pairs aged 73 to 94 years. Genetic factors accounted for 20% to 55% of the variation in levels of the inflammation variables studied.

A positive family history of cardiovascular disease is a strong determinant of the risk of cardiovascular events in epidemiological studies, 1,2 and this observation was confirmed by a Swedish twin study in which death from coronary heart disease showed a substantial genetic component. 3 Within families, the genetic background and environmental factors are shared, and twin studies are among the best studies for disentangling the effects of genetic and environmental components.

Inflammation is an important mechanism in cardiovascular disease. The acute phase proteins C-reactive protein (CRP), 4,5 fibrinogen, 6 and soluble intercellular adhesion molecule-1 (sICAM-1) 7 are consistently associated with the risk of cardiovascular events in prospective population studies. The levels of these inflammatory markers reflect the severity of the underlying atherosclerosis in the vessel wall, but they may also causally contribute to the development of cardiovascular disease. 8 As an example, CRP activates the classical pathway of the complement system, functions as a scavenger factor, and determines the uptake of low-density lipoprotein by macrophages. 9 Fibrinogen levels determine the rigidity and fibrinolysis rate of clots fibrinogen functions as a bridging molecule for many types of cell–cell adhesion events and provides a critical provisional matrix. 10 ICAM-1 participates in the adhesion of neutrophils and monocytes to the endothelium, an essential step in extravasation of neutrophils and monocytes at the site of inflammation.

The plasma levels of these variables show a large interindividual and intraindividual variation. 11 Inflammatory triggers are the main determinants of the levels of these factors, but also, genetic variations have been reported to explain part of the variation in inflammatory variables. Studies of CRP 12–15 and fibrinogen 13,16–18 in healthy young and middle-aged twins and in family studies suggest that a considerable part of the variation in these factors can be explained by genetic factors.

Because cardiovascular disease is primarily a disease of the elderly, it is important to assess how important genetic variation in risk factors is in the elderly because they are subject to many more triggers or risks than young individuals. It may be expected that genetic factors make a different contribution to the variation in the elderly compared with the younger age groups. On one hand, it is possible that functional genetic variations in the promoter region of the genes, which determine response to triggers, contribute more in the elderly because there are more triggers. On the other hand, the surplus and variety of triggers present in later age may also decrease the importance of genetic variation. Thus, it is not clear whether a larger or smaller contribution of genetic variation is expected, and evolution predicts the first, and traditional gerontology predicts the latter.

In addition to functional variations in genes of inflammation variables, functional genetic variations in factors that regulate these variables may contribute to the variation in plasma levels of inflammatory variables. Interleukin-6 (IL-6) 19 and tumor necrosis factor-α (TNF-α) 20 are 2 important regulatory cytokines for acute phase proteins 21,22 that both have a genetic contribution to their levels 23 and that both have functional genetic polymorphisms in their promoter region that have been associated with their plasma levels. 24,25 These functional polymorphisms, for which an association with levels has been shown repeatedly and is biologically plausible, are the most likely genetic variants to explain part of the hereditary component that determines the plasma levels of the inflammatory variables.

The aim of this study was to assess the overall genetic role in variations of inflammation variables among the elderly. Therefore, we determined the genetic contribution to the variations in CRP, fibrinogen, sICAM-1 and the acute phase-regulating cytokines IL-6 and TNF-α in elderly twins (73 to 94 years old).

Subjects and Methods

Participants

A total of 584 individuals (190 males and 394 females) aged 73 to 94 years participated in this study. These individuals made up 129 intact monozygotic (MZ) twin pairs and 153 intact dizygotic (DZ) same-sex twin pairs. The twin pairs were identified using the Danish Twin Registry and participated in the 1997 Longitudinal Study of Aging of Danish Twins. 26 The aim was to include a random sample of twins so all twins who were able to finalize the interview and give informed consent were eligible for participation in the study.

The Danish Twin Registry, a nationwide and population-based registry established in 1954, comprises twins born between 1870 and 1930, surviving to the age of 6 years. 26 Only same-sex twin pairs were included in the present study. Zygosity was established through a questionnaire on the degree of similarity between twins in a pair. Zygosity classification was evaluated by comparison with blood group determinants, and the misclassification rate was <5%. 27

Methods

Blood samples (20 mL of EDTA blood) were collected within a 6-month period. The samples were centrifuged within 12 hours after sampling at 1000g for 10 minutes, and plasma was frozen in aliquots at −80°C within 2 days (usually within the same day). Plasma samples were rapidly thawed in a water bath at 37°C. The protein concentration of CRP was measured using a high-sensitivity in-house enzyme immunoassay using rabbit anti-human CRP IgG as capture and tagging antibody (DAKO). Human CRP Standard (Dade Behring) was used as a calibrator. Protein concentrations of IL-6, TNF-α, and sICAM-1 were determined using high-sensitivity human ELISAs (R&D Systems). The protein concentration of fibrinogen was determined using a nephelometric method (Dade Behring). Interassay coefficients of variation for the assays we used were: for fibrinogen, <5.0% IL-6, <10.1% ICAM, <5.6% TNF, <14.5% CRP, <6.5% based on 17 determinations from a single normal plasma sample on different days.

Genomic DNA samples were analyzed using polymerase chain reaction (PCR)-based methods. The fibrinogen β-455G/A promoter polymorphism was analyzed as described by Thomas et al. 28 The CRP-1059G/C polymorphism was analyzed as described by Zee et al. 29 The ICAM-1 K/E469 polymorphism was analyzed as described by Nishimura et al. 30 The IL-6–174C/T promoter polymorphism was analyzed as described by Fishman et al. 24 The TNF-α–238G/A promoter polymorphism was analyzed as described by Wennberg et al. 31 The genotype was determined in all individuals and monozygosity was confirmed. In each PCR run, samples with known genotypes were included. We also reanalyzed 10% of the samples (at random), and the results were confirmed.

Analyses of Twin Similarity

The similarity in MZ and DZ twins was assessed using intraclass correlations for inflammatory variables. The classic twin study methodology is based on the fact that MZ twins have identical genotypes, whereas DZ twins share, on average, half of their genes and are no more genetically related than ordinary siblings. A greater phenotypic similarity in MZ twins is to be expected if there is a significant genetic component in the etiology of the component. Analyses were adjusted (by linear regression) for age, sex, and body mass index (BMI). To approximate a bivariate normal distribution, data were transformed to the logarithmic scale, and probit plots showed that the marginal distributions could be assumed to be normal. Plasma levels of CRP >10 mg/L and levels of IL-6 >10 pg/mL (6% of the individuals) are considered to reflect “clinical inflammation” these data and the data from the cotwin were removed from the data set.

Estimation of Heritability

Heritability of the 5 traits, effects of covariates on traits and the degree of dependence between the traits, were determined using multivariate variance components models. 32 Phenotypic (co)variances are decomposed into genetic and environmental components by the Cholesky decomposition method. According to standard biometric practice, assuming no epistasis (genetic inter locus interaction), no gene–environment interaction or correlation, and no assortative mating, the phenotypic variance can be separated into 4 variance components: (A) variance attributable to additive genetic effects (D) genetic dominance (C) shared (family) environment and (E) nonshared (individual-specific) environment. 32 Only nonshared environments contribute to dissimilarity within MZ twin pairs because of their genetic identity, whereas the effects of additive genetic factors and genetic dominance may also contribute to dissimilarity within DZ pairs, who share, on average, half of the additive and one-quarter of the dominant genetic factors. For each trait, the heritability is the ratio of genetic variance to the total variance. Effects of the covariates of each trait are given by the regression coefficients coming from the covariate that is linearly added to the model (which corresponds to linear regression of the covariates on traits and subsequent heritability analysis of residuals thereof.

The multivariate model incorporates the 5 traits and associated covariates and allows estimating the degree of dependence between the traits (ie, correlations between the traits). Furthermore, the extent to which the same genetic factors contribute to the observed phenotypic correlation (ie, the degree of pleiotropy of possible genes influencing the traits) is obtained from the model by estimating the genetic correlations. The method for selecting the best model followed standard procedures (structural-equation analyses) using the Mx statistical program. 32,33

Association Between Genotype and Phenotype

Association between the 5 polymorphisms and 5 trait levels in the whole population was initially assessed by 1-way ANOVA (with robust estimation of variance of phenotypes for individual twins clustered into twin pairs because of possible within-pair dependence). Sib-pair–based association analysis was performed using the variance components models of association implemented in the Quantitative Transmission Disequilibrium Test (QTDT) program, 34 and in all analyses, gender, age, and BMI were included as covariates. We tested for association in QTDT using the model by Fulker et al, 35 allowing for the effects of polygenes, shared family environment, nonshared environment, and effect of linkage at the locus under study. This model allows for the partitioning of observed association into between-pair and within-pair components, thus providing a check on whether results could have arisen because of unsuspected population stratification among the pairs studied. Subsequent permutation tests 36 of association, which do not rely on the bivariate normal distribution assumption, were performed using the QTDT program. Secondly, the amount of variation in the plasma levels of the inflammatory variables attributable to the inflammatory polymorphisms was studied using parametric linkage analysis implemented in the Merlin program. 36 There is no adjustment of the mean levels for zygosity. It is an assumption that MZ and DZ pairs come from the same population. There is no sign of violation of this assumption (Table 1).

TABLE 1. General Characteristics and Concentrations of Inflammation Variables in MZ and DZ Elderly Twins

Allele frequencies were determined by gene counting, and 95% CIs of the allele frequencies were calculated from sample allele frequencies. Deviations of the genotype distributions from that expected for a population in Hardy–Weinberg equilibrium were analyzed using the χ 2 test.

Results

Table 1 shows the general characteristics of our elderly twin population and the concentrations of inflammation variables in the MZ and DZ twins. All characteristics are comparable in the MZ and DZ twins. Concentrations of the inflammatory variables are comparable in MZ and DZ twins.

The model chosen to explain the factors contributing to the variance in the inflammatory cardiovascular risk factors is the model including additive genetic effects, common environmental effects, and nonshared environmental effects (ACE). In all the traits, the common environmental effects were small and nonsignificant (data not shown). The ACE model was then used to estimate the heritability coefficients, effects of covariates, and dependencies among the traits (ie, mutual correlation of the traits and genetic correlation of the traits).

As covariates in our multivariate variance components model, we included gender, age, and BMI. Plasma concentrations of IL-6 and TNF-α increase with age, and the plasma concentration of IL-6 decreases with increasing BMI, whereas levels of CRP and fibrinogen increase with increasing BMI (Table 2). We observed no significant gender effect for any of the traits (Table 2).

TABLE 2. Heritability Estimates for Inflammatory Cardiovascular Risk Factors

The intraclass correlations in MZ twins were consistently higher than those in DZ twins for all variables (Table 2). The difference was statistically significant for ICAM-1 (P<0.001). Heritability estimates, when adjusting for effects of gender, age, and BMI, and when allowing for possible dependencies between the traits, varied between 0.17 for IL-6 and 0.55 for sICAM-1 (Table 2). Mutual correlations between the traits are given in Table 3. Between the plasma levels we observed, as expected, significantly positive correlations ranged from 0.17 to 0.55, indicating substantial relationship among the traits. Significant genetic correlations, ranging between 0.27 and 0.41, were seen between the cytokines and CRP, fibrinogen, and sICAM-1, indicating a substantial degree of pleiotropy (ie, a gene affecting 2 or more traits [Table 3]). Small (and not significant) genetic correlations were observed among CRP, fibrinogen, and sICAM-1, which may indicate that the common genetic contribution to their regulation is insignificant.

TABLE 3. Correlations Between Cytokines and Acute Phase Proteins

Polymorphisms in the inflammatory genes (-455G/A in fibrinogen-β, -1059G/C in CRP, K/E469 in ICAM-1, -174C/T in IL-6, and -238G/A in TNF-α) were all in Hardy–Weinberg equilibrium, and the frequencies of the alleles (Table 4) were comparable to those reported in other white populations. 25,37–40

TABLE 4. Frequencies of Polymorphisms in Inflammation Genes

When we studied the effect of polymorphisms in factors upstream in the regulatory pathway of the cytokines and acute phase proteins, evidence for an association between genotype at the TNF–238G/A polymorphism and IL-6 was found by ANOVA (P=0.02 Table 5). This association was confirmed using the sib-pair–based analysis (QTDT, χ 2 =5.7 [1 df] P=0.02) performed using the genetic model by Fulker et al, 35 which takes possible population stratification into account. Associations were also seen for TNF–238G/A polymorphism with CRP levels (P=0.03) and for the fibrinogen–455G/A polymorphism with fibrinogen levels (P=0.03) by ANOVA, but these associations were not confirmed in the subsequent sib-pair–based analysis.

TABLE 5. Relationship Between Polymorphisms and Plasma Protein Concentrations

No evidence for linkage between the polymorphisms and the plasma levels of the inflammation variables was observed, not even between the TNF-α polymorphism and IL-6 levels. However, power is limited for linkage studies in our cohort.

Discussion

This study describes that there is a clear genetic component in levels of inflammatory cardiovascular risk factors in the elderly. Selected functional polymorphisms in the genes of fibrinogen-β, CRP, ICAM-1, IL-6, and TNF-α only explained a very small part of this variation.

The contribution of genetic variation to plasma levels of fibrinogen in the elderly twins in our study was comparable or somewhat smaller than the genetic contributions published for younger, mostly middle-aged twins. For fibrinogen, the additive genetic variance in our elderly twins (0.21) was lower than the estimates determined in other twin studies, which varied from 0.28 to 0.44. 13,16,18 Family studies reported heritabilities for fibrinogen between 0.20 and 0.51, 16,41 and in the Kibbutzim Family Study, it was observed that the heritability decreased from 0.48 in 20-year-old individuals to 0.20 in 80-year-old individuals. 42 It was expected that the effect of genetic factors could be smaller in the elderly twins because of the presence of many inflammatory triggers, such as cardiovascular disease and other age-associated diseases. The results of our study support this hypothesis, but it remains unclear how much of the difference in heritability can be attributed to age versus other population characteristics.

Only a small, insignificant part of this variance could be explained by the -455G/A polymorphism in the promoter region of the fibrinogen-β gene, in accordance with other reports. 43 Polymorphisms in ≈2000 bp of the promoter region of the fibrinogen-β gene were identified, and in whites, these are in very strong linkage disequilibrium. 44 Furthermore, we recently found using luciferase expression studies that the -148C/T polymorphism (which is in perfect linkage disequilibrium with the -455G/A polymorphism in white populations) is the functional polymorphism in the promoter region, explaining all promoter activity. 45 Therefore, it is not expected that any new information will become available when other polymorphisms are studied in this region. Also polymorphisms in the promoter region of IL-6 and TNF-α, 2 major determinants of fibrinogen synthesis, explained only an insignificant part of the variance in fibrinogen levels.

The genetic contribution to variation in the CRP concentration appears somewhat lower in our study than in middle-aged twins, for whom a clear genetic contribution has been reported in a UK study (intraclass correlation coefficient of 0.57 in MZ twins and 0.31 in DZ twins) 12 and in healthy family studies. 14,15 Polymorphisms in the promoter region of CRP, IL-6, or TNF-α did not explain a major part of this genetic variance in our elderly twins, in contrast to the observation of Vickers et al that the IL-6–174C/T polymorphism explains 14% of the CRP concentration. 14 For sICAM-1, no information was published on the heritability of the levels, but the high estimate in our study makes this an interesting candidate gene for further studies. For IL-6, the heritability estimate in our study (0.17) is similar to an estimate of heritability that is available from a family study (0.24). 23 For TNF-α, the heritability estimate in our study (0.26) was much lower than the estimates derived from family studies (0.68 and >0.80). 23,46 The heritability of the in vitro production of TNF-α in a whole-blood stimulation experiment was 0.60, 47 which is also higher than the heritability we saw in vivo in elderly twins. This suggests that the responsiveness may have a stronger genetic component than the baseline levels.

The hypothesis that responsiveness may be genetically determined suggests that functional polymorphisms in the regulatory elements of inflammatory genes may be determinants of the plasma levels of inflammatory variables. However, the regulatory polymorphisms we studied in the genes for fibrinogen, IL-6, CRP, and TNF-α did not explain much of the variance in their gene products, nor did polymorphisms in the IL-6 and TNF-α genes explain the variance in the levels of the acute phase proteins. We studied polymorphisms for which an association with levels has been shown repeatedly and is biologically plausible. In addition to the fibrinogen promoter, also for the other factors, the gene and a reasonable part of the promoter region (on average up to 2000 bp) were studied for functional polymorphisms, and we selected polymorphisms for which it was most likely that they would explain a considerable part of the heritability. 24,45,48 However, we cannot exclude that other genetic variants have a major impact on the plasma levels of these factors, and studies using full haplotype analysis will be able to give a better estimation of the contribution of genetic variation in a gene to its plasma levels.

We only saw 1 possible association, but no linkage, for any combination of polymorphism and plasma level, indicating that twin pairs that share their alleles are not necessarily close in trait values. This suggests that we did not study the functional polymorphism or that a combination of polymorphisms affects the plasma level. The intriguing question is what the genetic components are if they are not genetic variants in the genes, variations in the promoter regions, or in the regulating cytokine genes. The explanation for the high heritability may therefore be found in enhancer elements further away from the gene or in other genes that determine the regulation of inflammatory factors, such as nuclear proteins. The high genetic correlation between inflammatory variables in our study may indicate a common genetic regulatory mechanism.

BMI has a large genetic component and is also a major determinant of inflammation variables, for example, by increasing the cytokine production by adipose tissue. 49 Furthermore, it is possible that the sensitivity of inflammatory variables to BMI is genetically determined. Adjustment for BMI in the statistical analysis may remove such a relationship. Therefore, we analyzed the data unadjusted and adjusted for BMI and noticed that the adjustment had no significant effect on the heritability estimated in our population. This suggests that different gene pools determine the inflammatory variables and the BMI.

When studying the elderly, it cannot be excluded that there has been a selection of our study population because only individuals that survived to be 73 years old were included in the study. However, that is a problem inherent in all aging research, and the present study benefited from a high response rate (79%).

The calculated heritability may be an underestimate of the underlying heritability because the procedure for processing blood was not optimal, which may have given a spuriously greater variation in the levels of the inflammatory markers. Furthermore, the phenotypes show a great day-to-day variation because the inflammatory variables are very sensitive to environmental factors.

In conclusion, levels of inflammatory cardiovascular risk predictors are, to a large extent, genetically determined, even in elderly individuals, and therefore, it is important to elucidate the genetic factors involved in the regulation of these proteins, especially in cardiovascular disease.

Funding from the US National Institute on Aging Research, grant NIA-P01-AG08761, and the European Commission, the fifth framework program (QLG2-CT-2002-01254) were received. We would like to thank Anette Larsen, Gunhild Andreasen, Kathrine Overgaard, Jakob Mortensen, and Berit Bents for their technical assistance.


Results

Simulation study

We performed two simulation studies with different sample sizes to provide a rough picture of how many twin pairs are needed for reliable estimates. In these simulations, we also compared the performance among the ACE(t) models with 5 and 8 interior knots and the ACE(t)-p models with 8 and 12 interior knots. We generated phenotypic data for 5000 MZ and 5000 DZ twin pairs in the first simulation and twofold sample sizes in the second one, using R. We first sampled a vector of age T for the twin pairs that was uniformly distributed from 1 to 50. We fixed the variance of the E component at 1 and chose the following quadratic and power variance functions for the A and C components, of which the shapes are depicted in Figure 1. We then simulated the phenotypic data by sampling from bivariate normal distributions based on Equation 1 with and plugged into the covariance matrices. The average mean square error (AMSE), which is defined as and takes into account both bias and variance, was used to evaluate the performance of estimation where we chose the number of evaluated time points In each scenario, we simulated 200 data sets and fitted them with the proposed ACE(t) and ACE(t)-p models to calculate the AMSEs. For the ACE(t) model, we ran an additional analysis to compare the confidence bands obtained from the Delta method and the bootstrap method with 200 replicates.

Variance functions of the A, C, and E components investigated in the simulation study.

The results presented in Table 2 show that the AMSEs of the ACE(t) model with 8 knots are larger than those with 5 knots for both components. This suggested that 8 knots seemed excessive and could worsen the performance in terms of AMSE for the relatively smooth functions that we chose to investigate here. In contrast, the ACE(t)-p model had smaller AMSEs than the ACE(t) model in all scenarios, confirming its superiority of tuning the penalty term for optimal estimation as long as a sufficient number of knots is given (Ruppert et al. 2003). The ASMEs from the ACE(t)-p model with a different number of knots were more similar to each other, implying that the estimates from P-splines are much less affected by the choice of knots than those from B-splines. The smaller number of knots in the penalized spline estimator performs better in terms of AMSE, which is in accordance with a previous theoretical finding (Claeskens et al. 2009). The AMSEs decreased almost linearly with increasing sample size in all situations however, accurate estimates still entailed thousands of twin pairs.

The confidence bands from the Delta method were comparable to those from the bootstrap method for both components as plotted in Figure 2. The confidence bands estimated from the bootstrap method were less smooth due to the limited number of resampling (i.e., 200 replicates), which should be adequate for the bootstrap method (Efron and Tibshirani 1994). The confidence bands from these two methods almost coincided, suggesting that the L-BFGS algorithm worked properly as a faster alternative to Newton’s method in this case.

Confidence bands constructed from the Delta and bootstrap methods for the A (shown in A) and C (shown in B) components.

A Finnish twin study of BMI

We applied both models to a twin data set retrieved from the Finnish Twin Cohort study to investigate age-dependent genetic and environmental components of BMI. The details on collection of the data were described in previous publications for older (Kaprio and Koskenvuo 2002) and younger cohorts (Kaprio et al. 2002). In this analysis, we included 19,510 MZ and 27,312 DZ same-sex twin individuals together with the information on age at the measurement contributed to the CODATwins project (Silventoinen et al. 2015). The distributions of age at the measurement among the MZ and DZ twins were widespread from 11 years to 60 years as shown in Figure 3. We fitted the data set with a linear regression on sex and age before using the residuals from the regression as the phenotypes for the analysis of variance curves. The variance curves of the genetic and common environmental components estimated from the ACE(t) (with 5 interior knots for either component) and the ACE(t)-p models (with 8 interior knots for either component) are shown in Figure 4 and Figure 5. For the ACE(t)-p model, we included more knots to ensure they were sufficient and excessive knots were affected little in terms of AMSE. The 95% pointwise confidence bands were obtained by using the bootstrap method for the ACE(t) model and the MCMC method described in the Appendix for the ACE(t)-p model. The comparable results from both models suggested that the variance of the additive genetic component started from ∼2 at age 11 years and increased rapidly until age 35 years. After that, it leveled off for ∼10 years before growing again to nearly 15, except that the curve from the ACE(t)-p model had more fluctuations. In contrast, the variance of the common environmental component starting at ∼8 at age 11 years dropped markedly before age 20 years and became almost undetectable after that. The estimated variance of the unique environmental component was 2.25. In contrast, the ACE model using the OpenMX package (Boker et al. 2011) yielded estimates of 7.384, 0.000, and 2.265 for the A, C, and E components, respectively, while the AE model gave nearly the same estimates for the A and E components.


Watch the video: Genetic variance, Additive, Dominance, Epistatic variance (June 2022).


Comments:

  1. Mainchin

    Thanks for the help on this question. All just great.

  2. Mekasa

    I took it to the quotation book, thanks!

  3. Hipolit

    I agree with all of the above. Let's discuss this issue. Here or at PM.

  4. Rod

    Did you quickly come up with such a matchless answer?

  5. Norwel

    And you have understood?



Write a message