美文网首页
Calculating Accuracy and Reliabi

Calculating Accuracy and Reliabi

作者: jilingxf | 来源:发表于2020-05-17 15:12 被阅读0次

Calculating Accuracy and Reliability of Random EffectEstimates with ASReml-R

用ASReml-R计算随机效应估计的精度和可靠性

工作当中遇到问题,想要计算育种值的准确性,以查找准确性较低的猪只原因和使用的时候注意,翻了半天的文章,最后还是决定看一篇https://www.vsni.co.uk/case-studies/reliability上的英文文献,现在大致上能够知道意思,,由于是第一次写,公式特别多,不在一一改了,原文上传到百度网盘上,链接:https://pan.baidu.com/s/12cGsfkKn9CKCRXxC6cm44A 提取码:gmss

An important aimwhen fitting linear mixed models (LMM) is the use of random effect estimates.In some analyses, such as genetic evaluations, the main objective of theanalysis is to obtain these estimates. Random effects, also known as BLUPs(best linear unbiased estimations), are obtained after fitting a LMM, wheretheir variance components are estimated by residual maximum likelihood (REML)and these values are used to calculate BLUPs.

当拟合线性混合模型(LMM)时,一个重要的目标是使用随机效应估计。在一些分析中,如遗传评价,分析的主要目的是获得这些估计。随机效应,也被称为BLUPs(最佳线性无偏估计),是在拟合LMM后得到的,其方差分量由残差极大似然(residual

maximum likelihood, REML)估计,这些值用于计算BLUPs。

Consider the following LMM for a genetic experiment, where we have arandomized complete block design withbbblocks and a total ofttgenotypesevaluated:

考虑下面的基因实验的LMM,我们有一个随机的完全块设计与bb块和总的tt基因型评估:

yij=μ+βj+ai+ϵij

whereyijisthe phenotypic observation of the ith genotype on the jth block;μis the overall mean;βjisthe fixed effect of the ith block[if !supportAnnotations][Alex1][endif] ;aiisthe random effect of the ith genotype, withai∼N(0,[if !msEquation][if !vml]

[endif][endif]) that isassociated with some information about pedigree, from which we can obtain apedigree-based or genomic-based relationship matrixA; andϵijis therandom residual withϵi∼N(0,σ2).

yij是第i个基因型j区块的表型值;μ总体均值;βj是第i个区块[if !supportAnnotations][Alex2][endif] 的固定效应;ai第i个基因型的随机效应,ai∼N(0,[if !msEquation][if !vml]

[endif][endif])与一些相关背景信息,从中我们可以获得一个基于系谱或基于基因型的关系矩阵A;ϵij随机残差 服从ϵi∼N(0,σ2)。

In the above model, also known as Animal Model, theaiairandom effects correspond to the breeding value(BV) from which selections of future progenitors or individuals will be done.As these effects are based on estimated variance components, then they areoften known as EBVs (estimated breeding values). Statistically, thematrixAAis critical as it is the place were we specifythecorrelations or dependenciesthat exist betweenrandom effects.

在上述模型中,又称动物模型,ai随机效应对应于育种值(BV),未来的祖先或个体将从中进行选择。由于这些影响是基于估计的方差成分,因此它们通常被称为EBVs(估计的育种值)。统计上,矩阵A是至关重要的,因为它是我们指定存在于随机效应之间的相关性或相关性的地方。

As many breeding and commercial decisions depend on the EBVs it isimportant to have an assessment of ourconfidencein thesevalues, and this is done by calculating a statistical measure of precision.Probably the most common statistic used to assess these effects is thecalculation of thecorrelation between true and predicted random

effects,r(a,[if !msEquation][if !vml]

[endif][endif]), also known asaccuracy. Thiscan be calculated using the following expression for a given genotype as:

由于许多育种和商业决策依赖于ebv,因此重要的是对我们对这些值的信心进行评估,这是通过计算精确的统计度量来完成的。评估这些效应最常用的统计数据可能是计算真实随机效应和预测随机效应之间的相关性,即r(a,[if !msEquation][if !vml]

[endif][endif]),也被称为准确性。对于一个给定的基因型,可以用下面的表达式来计算

r(ai,[if !msEquation][if !vml]

[endif][endif])=[if !msEquation][if !vml]

[endif][endif]

This formula requires the PEV (predictor error variance) of [if !msEquation][if !vml]

[endif][endif], which correspondsto the square of the standard error (SE) of the random effect estimate,i.e. PEV([if !msEquation][if !vml]

[endif][endif])=SE[if !msEquation][if !vml]

[endif][endif]. Most

statistical software will report these SE values, including SAS and ASReml. In

addition, we have[if !msEquation][if !vml]

[endif][endif], which is thepopulation estimate of the variance associated with the EBVs, and this is ourgenetic additive variance, which is used to calculate narrow-sense heritability(h2)h2).

这个公式需要 [if !msEquation][if !vml]

[endif][endif]的PEV(预测误差方差),它对应于随机效应估计的标准误差(SE)的平方,即 PEV([if !msEquation][if !vml]

[endif][endif])=SE[if !msEquation][if !vml]

[endif][endif]。大多数统计软件将报告这些SE值,包括SAS和ASReml。此外,我们有[if !msEquation][if !vml]

[endif][endif],即群体估计的方差与ebv相关,这是我们的基因加性方差,这是用来计算狭义遗传力(h2)。

According to statistical theory, the values of PEV will range from zero to

the [if !msEquation][if !vml]

[endif][endif]. In cases where

we have full information about a given genotype, then the PEV will be close to

zero as we have little uncertainty in the true additive value of a genotype,

hence, [if !msEquation][if !vml]

[endif][endif]=ai. However, when

there is very little information, then the PEV will approximate to [if !msEquation][if !vml]

[endif][endif]. Note thatthese extreme values of PEV translate inr(a,a^)r(a,a^)valuesof 1.0 and 0.0, respectively. Therefore, values closer to 1.0 are an indicationof very good quality of a given additive effect estimate.

根据统计理论,PEV的值将范围从零到[if !msEquation][if !vml]

[endif][endif]。在我们对一个基因型有完整信息的情况下,PEV将接近于零,因为我们对一个基因型的真实加值几乎没有不确定性,因此[if !msEquation][if !vml]

[endif][endif]=ai。然而,当很少有信息,然后[if !msEquation][if !vml]

[endif][endif]和PEV将近似。注意,这些PEV的极值分别转换为r(a,[if !msEquation][if !vml]

[endif][endif])的1.0和0.0值。因此,接近1.0的值表示给定的附加效果估计值的质量非常好。

Sometimes, youwill seereliabilityinstead ofaccuracyreported,where the former is the square of the later. Both statistics are commonly usedwhen reporting the quality of a random effect estimate in genetic studies.

有时,您会看到可靠性而不是准确性的报告,前者是后者的平方。在遗传研究中,这两种统计数据通常用于报告随机效应估计值的质量

r(ai,[if !msEquation][if !vml]

[endif][endif])2=[if !msEquation][if !vml]

[endif][endif]

To illustratethe above calculations, we used genetic data originating from a trial on apopulation of cultivated two-row spring barley. The original trial measuredheight (cm) of plants grown in pots during 2011 for a total of 856 lines;however, the available data for this analysis consists of the adjusted meanvalues of only 478 lines, together with their molecular information. Furtherdetails can be found in Oakleyet al. (2016).

为了说明上述计算,我们使用了在栽培的两行春大麦群体中进行的一项试验的遗传数据。2011年,最初的试验测量了盆栽植物的高度(cm),共计856株;然而,该分析的可用数据仅包括调整后的478行及其分子信息的平均值。进一步的细节可以在Oakley等人(2016)中找到。

In our example,we obtained the realized relationship matrix from the single nucleotidepolymorphism (SNP) marker information available for these 478 lines, and anAnimal Model was fitted considering this genomic matrix. This was done in thepackage R using the library ASReml-R, and the code used to fit this model was:

在我们的示例中,我们从这478行可用的SNP标记信息中获得了实现的关系矩阵,并根据该基因组矩阵拟合了一个动物模型。这是在包R中使用库ASReml-R完成的,用来适应这个模型的代码是:

modelGBLUP  <- asreml(fixed=Hadj11~1,

                     random=~vm(lines, ainv),

                     na.action=na.method(y='include'),

                     data=datagFULL)

Here, the response variable wasHadj11Hadj11andour single random effect is the factorlinesthatrepresents our additive effects, and this is associated with the inverse of therelationship matrixainv. In addition, in the above code itwas requested that observations with missing data (i.e.,NANA) in their response variable will be included in theanalyses. This allowed us to obtain EBVs for all individuals, even those thatwere not phenotyped but that were included in the relationship matrix.

The aboveanalysis, using the following code

这里,响应变量是Hadj11,我们的单一随机效应是代表附加效应的因素lines,这与关系矩阵ainv的逆相关。此外,在上述守则中还要求将其响应变量中缺少数据的观测值(即NANA)纳入分析。这使我们能够获得所有个体的EBVs,甚至包括那些没有表型但包含在关系矩阵中的个体。

以上分析,使用以下代码

vc <-  summary(modelGBLUP)$varcomp

will output thefollowing estimation of variance components.

将输出以下方差分量的估计。

componentstd.errorz.ratiobound%ch

vm(lines,  ainv)34.941597.8801384.434134P0.4

units!R43.100007.9493055.421857P0.0

These values correspond to a narrow-sense heritabilityh2of 0.478(±0.097) for the adjusted mean of height.Hence, the trait Hadj11has a strong genetic control. Now, we canproceed to request the BLUP (or EBV) values together with their standarderrors, this is done using the code:

这些值对应于调整后的平均身高的狭义遗传力h2为0.478(±0.097)。因此,性状Hadj11有很强的遗传潜力。现在,我们可以继续请求BLUP(或EBV)值及其标准错误,这是通过以下代码完成的:

BLUP <-  summary(modelGBLUP, coef=TRUE)$coef.random

Below we can seea subset of this list with the top 10 and bottom 10 genotypes sorted by theirSE values.

下面我们可以看到这个列表的一个子集,上面的10个基因型和下面的10个基因型根据它们的SE值排序。

solutionstd.errorz.ratioPEVaccuracyreliability

vm(lines,  ainv)_LENTA5.83473.94821.477815.58850.74420.5539

vm(lines,  ainv)_PALLAS5.11383.98071.284615.84620.73930.5465

vm(lines,  ainv)_CHARIOT1.14943.98590.288415.88710.73850.5453

vm(lines,  ainv)_CLASS-7.64293.9874-1.916815.89900.73820.5450

vm(lines,  ainv)_BONUS7.92053.98971.985315.91760.73790.5445

vm(lines, ainv)_TOKEN-7.85753.9930-1.967815.94400.73740.5437

vm(lines,  ainv)_MAJA9.92814.00222.480616.01780.73590.5416

vm(lines,  ainv)_PRESTIGE-8.42054.0184-2.095516.14770.73340.5379

vm(lines,  ainv)_ASPEN-4.14114.0456-1.023616.36700.72910.5316

vm(lines,  ainv)_CORNICHE-1.61224.0546-0.397616.43940.72770.5295

solutionstd.errorz.ratioPEVaccuracyreliability

vm(lines,  ainv)_FREJA28.01774.82605.805623.29020.57750.3335

vm(lines,  ainv)_DASIO-8.59694.8376-1.777123.40280.57470.3302

vm(lines,  ainv)_KESTREL-0.43034.8644-0.088523.66230.56820.3228

vm(lines,  ainv)_GAIRDNER5.31704.91881.081024.19440.55460.3076

vm(lines,  ainv)_STELLA15.14224.92733.073124.27810.55240.3052

vm(lines, ainv)_LOGAN5.76014.97161.158624.71700.54090.2926

vm(lines,  ainv)_CLARA9.48745.38701.761229.01970.41170.1695

vm(lines,  ainv)_MARESI1.54495.74480.268933.00320.23550.0555

vm(lines,  ainv)_FLICK SEJET-8.54435.8523-1.460034.24900.14080.0198

vm(lines,  ainv)_BEKA16.59086.09922.720237.2001NaN-0.0646

The EBV estimates for this data are positive and negative, as theyrepresent deviations from the overall height mean. For example, the genotypeFREJA has an EBV of+28.02indicating that, if this parentis crossed with a parent of equal genetic worth, its progeny will have anexpected deviation from the mean of+28cm. The full range ofEBVs is between −14.54cm to43.30cm, a wide intervalconsidering that the mean of the population is 88.03cm, and this is aresult of the relatively large heritability value found for this trait.

EBV对这些数据的估计有正有负,因为它们代表了与总体高度平均值的偏差。例如,FREJA基因型的EBV为+28.02,表明如果该亲本与具有同等遗传价值的亲本杂交,其后代将会有+28cm的期望偏差。EBVs的全量程在−14.54cm到43.30cm之间,考虑到群体的平均值为88.03cm,这是该性状发现的相对较大的遗传价值的结果。

Also, there is an interesting range in SE values from 3.949to 6.105. The width ofthis range is related to the level of genetic relationships betweenindividuals, that is specified through the matrixA. Genotypes with manyrelatives willshareorcollectinformation from many phenotypicresponses, and therefore they will have lower SE values; in contrast,individuals with few, or no relatives, will tend to have much higher SE values.However, it is also possible that individuals with no phenotypic data will havelarge SE values, unless they have many relatives to help with the estimation oftheir EBVs.

此外,SE值在3.949到6.105之间有一个有趣的范围。这个范围的宽度与个体之间的遗传关系的水平有关,这是通过矩阵A指定的。具有许多亲缘关系的基因型会从许多表型反应中共享或收集信息,因此它们的SE值较低;相比之下,亲属很少或没有亲属的人往往具有更高的SE值。然而,也有可能没有表型数据的个体将有较大的SE值,除非他们有许多亲属帮助估计他们的EBVs。

Now, we can

proceed to calculate the accuracy and reliability using the expressions shown

before, with the code。

现在,我们可以使用前面显示的表达式和代码来计算准确性和可靠性。

Va <-  vc[1,1]

BLUP$PEV <-  BLUP$std.error^2

BLUP$accuracy  <- sqrt(1 - BLUP$PEV/Va)

BLUP$reliability  <- 1 - BLUP$PEV/Va

注:这段代码好像有点问题,没跑通,我是用跑通的

Va <- vc[1,1]

PEV <- BLUP[,2]^2

accuracy <- sqrt(1  - PEV/Va)

reliability <-  1 - PEV/Va

And now we canobserve these calculated values for a subset of the top 10 and bottom 10individuals.

现在我们可以观察前10名和后10名个体的一个子集的这些计算值。

PEVaccuracyreliability

vm(lines,  ainv)_LENTA15.58850.74420.5539

vm(lines,  ainv)_PALLAS15.84620.73930.5465

vm(lines,  ainv)_CHARIOT15.88710.73850.5453

vm(lines,  ainv)_CLASS15.89900.73820.5450

vm(lines,  ainv)_BONUS15.91760.73790.5445

vm(lines,  ainv)_TOKEN15.94400.73740.5437

vm(lines,  ainv)_MAJA16.01780.73590.5416

vm(lines,  ainv)_PRESTIGE16.14770.73340.5379

vm(lines,  ainv)_ASPEN16.36700.72910.5316

vm(lines, ainv)_CORNICHE16.43940.72770.5295

PEVaccuracyreliability

vm(lines,  ainv)_FREJA23.29020.57750.3335

vm(lines,  ainv)_DASIO23.40280.57470.3302

vm(lines,  ainv)_KESTREL23.66230.56820.3228

vm(lines,  ainv)_GAIRDNER24.19440.55460.3076

vm(lines,  ainv)_STELLA24.27810.55240.3052

vm(lines,  ainv)_LOGAN24.71700.54090.2926

vm(lines,  ainv)_CLARA29.01970.41170.1695

vm(lines,  ainv)_MARESI33.00320.23550.0555

vm(lines,  ainv)_FLICK SEJET34.24900.14080.0198

vm(lines,  ainv)_BEKA37.2001NaN-0.0646

For this trait, the range of correlation between true and predicted

breeding values, r(a,[if !msEquation][if !vml]

[endif][endif]) , is from 0.134 to 0.744, and the lowest accuracy and reliability values

correspond to individuals with the largest SE values. Note that the

genotype BEKA has a NaN for accuracy, which

originates from a negative square root, as this BLUP has an SE that is larger

than [if !msEquation][if !vml]

[endif][endif] 

对于该性状,真实值与预测育种值r(a,[if !msEquation][if !vml]

[endif][endif])的相关范围为0.134~ 0.744,正确率和信度最低的个体对应的SE值最大。注意,基因型BEKA是NaN的准确性,源于一个负的平方根,因为这BLUP的SE大于[if !msEquation][if !vml]

[endif][endif]。[if !vml]

[endif]

In the above figure we present a boxplot of the accuracy values (left

plot) and a scatterplot of the PEV against accuracy (right plot). It is clear

from these plots that there is a group of three observations with very poor

accuracy that have r(a,[if !msEquation][if !vml]

[endif][endif]) values lowerthan0.41. However, the majority of the observations are foundwith accuracy values of0.54 or higher, where some of themreach values as high as0.74.

在上面的图中,我们给出了精度值的箱线图(左边的图)和PEV相对于精度的散点图(右边的图)。从这些图中可以明显看出,有一组精度很差的三个观测值,其r(a,[if !msEquation][if !vml]

[endif][endif])值小于0.41。然而,大多数观测值的精度值都在0.54或更高,其中一些观测值甚至高达0.74。

It is interesting to speculate what the reasons could be for these threelower observations (genotypes CLARA, MARESIand FLICK SEJET) togetherwith theNaN observation (genotype BEKA), to be sodifferent from the rest of the population. The unusual behavior of these fourobservations is likely to reflect errors in their information, includingphenotypic response (maybe a potential outlier), or incorrect geneticrelationships (due to pedigree or genotyping errors).

推测这三个较低的观察值(基因型为CLARA、MARESI和FLICK SEJET)和NaN观察值(基因型为BEKA)如此不同于其他个体的原因是很有趣的。这四个观察的不寻常行为可能反映了他们的信息错误,包括表型值(可能是一个潜在的异常值),或不正确的遗传关系(由于系谱或基因分型错误)。

In summary, we have calculated values of accuracy and reliability for the

estimation of random effects from an Animal Model. For the data analyzed, these

represented a wide range of values, reflecting different levels of precision.

For example, a large value of accuracy r(a,[if !msEquation][if !vml]

[endif][endif]),such as 0.90, will indicatethat there is little expected change on the estimation of the EBV if additionalinformation is collected from a given genotype or from its relatives (e.g.,its offspring). Therefore, r(a,[if !msEquation][if !vml]

[endif][endif])values provideus with a good measure of our confidence in the genetic evaluation performed.

综上所述,我们已经计算出从动物模型中估计随机效应的准确性和可靠性值。对于所分析的数据,这些代表了广泛的值,反映了不同的精度水平。例如,如果r(a,[if !msEquation][if !vml]

[endif][endif])的精度值很大,例如0.90,就表明如果从一个给定的基因型或从它的亲属(例如,它的后代)收集额外的信息,对EBV的估计就不会有什么预期的变化。因此r(a,[if !msEquation][if !vml]

[endif][endif])值为我们提供了一个很好的度量我们对所执行的遗传评估的信心的方法。

In our analysis, and in many other genetic analyses, it is critical toconsider the incorporation of the matrixAof geneticrelationships, as it will allow us to improve the estimation of the randomeffects because we have changed our assumption from independent random effectsto a model that considers correlations or dependencies between these randomeffects.

在我们的分析,和许多其他基因分析,关键是要考虑矩阵的公司A的遗传关系,它将使我们能提高随机效应的估计,因为我们已经改变了我们的假设从独立的随机效应模型,认为相关性或这些随机效应之间的依赖关系。

A routine checkof the accuracy and reliability (and indirectly of PEV values), can help tomake future improvements on the quality of the information obtained fromstatistical analyses. For example, we can assess the consequences ofincorporating additional relatives or replicates in future evaluations.Alternatively, these checks can also help to assess the benefits of using morecomplex linear models, such as those that combine multiple traits, measurementsor sites.

常规检查PEV值的准确性和可靠性(以及间接检查PEV值)有助于在今后改进统计分析获得的信息的质量。例如,我们可以在未来的评估中评估合并其他亲属或复制的后果。或者,这些检查也可以帮助评估使用更复杂的线性模型的好处,例如那些结合了多个特征、测量或位置的模型。

Finally, theabove calculations can be extended to any other random effect on all linearmixed models fitted, and their use is recommended to assess the quality of theestimates beyond what a simple standard error will indicate.

最后,上述计算可以扩展到对所有拟合的线性混合模型的任何其他随机影响,并且建议使用它们来评估估计值的质量,而不是简单的标准误差。

Author

作者

Salvador A.Gezan

Reference

参考文献

Oakey, H;Cullis, B; Thompson, R; Comadran, J; Halpin, C; Waugh, R. 2016.Genomic selection in multi-environment crop trials.Genes| Genomes | Genetics 6:1313-1326

Notes: SAG Jan-2020

相关文章

网友评论

      本文标题:Calculating Accuracy and Reliabi

      本文链接:https://www.haomeiwen.com/subject/efjpuqtx.html