limma: topTable

Author: 浩瀚之宇 | Published: 2018-11-29 16:16

    adj.P.Val: P-value after adjustment for multiple testing. This column is generally recommended as the primary statistic by which to interpret results. Genes with the smallest adjusted P-values are the most reliable.
    P.Value: Raw P-value.
    t: Moderated t-statistic (only available when two groups of samples are defined).
    B: B-statistic, or log-odds that the gene is differentially expressed (only available when two groups of samples are defined).
    logFC: Log2-fold change between two experimental conditions (only available when two groups of samples are defined).
    F: Moderated F-statistic, which combines the t-statistics for all the pairwise comparisons into an overall test of significance for that gene (only available when more than two groups of samples are defined).

    x <- topTable(fit2, coef="SCC-HCS", number=10000, adjust.method="BH", sort.by="B", resort.by="M")  # "M" is a permitted synonym for "logFC"

    q-value (i.e., the adj.P.Val value)

    toptable
    From limma v3.28.14 by Gordon Smyth

    Table of Top Genes from Linear Model Fit

    Extract a table of the top-ranked genes from a linear model fit.

    Keywords
    htest

    Usage

    topTable(fit, coef=NULL, number=10, genelist=fit$genes, adjust.method="BH", sort.by="B", resort.by=NULL, p.value=1, lfc=0, confint=FALSE)
    toptable(fit, coef=1, number=10, genelist=NULL, A=NULL, eb=NULL, adjust.method="BH", sort.by="B", resort.by=NULL, p.value=1, lfc=0, confint=FALSE, ...)
    topTableF(fit, number=10, genelist=fit$genes, adjust.method="BH", sort.by="F", p.value=1, lfc=0)
    topTreat(fit, coef=1, sort.by="p", resort.by=NULL, ...)

    Arguments

    fit
    list containing a linear model fit produced by lmFit, lm.series, gls.series or mrlm. For topTable, fit should be an object of class MArrayLM as produced by lmFit and eBayes.
    coef
    column number or column name specifying which coefficient or contrast of the linear model is of interest. For topTable, can also be a vector of column subscripts, in which case the gene ranking is by F-statistic for that set of contrasts.
    number
    maximum number of genes to list
    genelist
    data frame or character vector containing gene information. For topTable only, this defaults to fit$genes.
    A
    matrix of A-values or vector of average A-values. For topTable only, this defaults to fit$Amean.
    eb
    output list from ebayes(fit). If NULL, this will be automatically generated.
    adjust.method
    method used to adjust the p-values for multiple testing. Options, in increasing conservatism, include "none", "BH", "BY" and "holm". See p.adjust for the complete list of options. A NULL value will result in the default adjustment method, which is "BH".
    sort.by
    character string specifying statistic to rank genes by. Possible values for topTable and toptable are "logFC", "AveExpr", "t", "P", "p", "B" or "none". (Permitted synonyms are "M" for "logFC", "A" or "Amean" for "AveExpr", "T" for "t" and "p" for "P".) Possibilities for topTableF are "F" or "none". Possibilities for topTreat are as for topTable except for "B".
    resort.by
    character string specifying statistic to sort the selected genes by in the output data.frame. Possibilities are the same as for sort.by.
    p.value
    cutoff value for adjusted p-values. Only genes with lower p-values are listed.
    lfc
    minimum absolute log2-fold-change required. topTable and topTableF include only genes with (at least one) absolute log-fold-changes greater than lfc. topTreat does not remove genes but ranks genes by evidence that their log-fold-change exceeds lfc.
    confint
    logical, should 95% confidence intervals be output for logFC? Alternatively, can take a numeric value between zero and one specifying the confidence level required.
    ...
    For toptable, other arguments are passed to ebayes (if eb=NULL). For topTreat, other arguments are passed to topTable.

    Details

    toptable is an earlier interface and is retained only for backward compatibility.

    These functions summarize the linear model fit object produced by lmFit, lm.series, gls.series or mrlm by selecting the top-ranked genes for any given contrast. topTable and topTableF assume that the linear model fit has already been processed by eBayes. topTreat assumes that the fit has been processed by treat.

    The p-values for the coefficient/contrast of interest are adjusted for multiple testing by a call to p.adjust. The "BH" method, which controls the expected false discovery rate (FDR) below the specified value, is the default adjustment method because it is the most likely to be appropriate for microarray studies. Note that the adjusted p-values from this method are bounds on the FDR rather than p-values in the usual sense. Because they relate to FDRs rather than rejection probabilities, they are sometimes called q-values. See help("p.adjust") for more information.
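
    The "BH" adjustment described here can be reproduced in base R without limma. The following is a minimal sketch on a made-up vector of raw p-values (the numbers are hypothetical, not from any experiment); it compares p.adjust with a hand-written version of the same step-up rule.

```r
# Toy raw p-values, already sorted for readability (hypothetical values)
p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205)
n <- length(p)

# Benjamini-Hochberg adjustment via base R
adj_bh <- p.adjust(p, method = "BH")

# The same computation by hand: scale the i-th smallest p-value by n/i,
# then enforce monotonicity working down from the largest p-value
o <- order(p, decreasing = TRUE)
ro <- order(o)
manual <- pmin(1, cummin(n / (n:1) * p[o]))[ro]

# Side-by-side view, using the column names that topTable reports
print(data.frame(P.Value = p, adj.P.Val = adj_bh, manual = manual))

# Number of toy "genes" that would pass a 5% FDR cutoff
n_selected <- sum(adj_bh < 0.05)
print(n_selected)
```

    In this toy vector five raw p-values are below 0.05, but only two adjusted values are, which is the kind of gap between P.Value and adj.P.Val the text describes.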

    Note that if there is no good evidence for differential expression in the experiment, it is quite possible for all the adjusted p-values to be large, or even for all of them to be equal to one. This can happen if the smallest p-value is no smaller than 1/ngenes, where ngenes is the number of genes with non-missing p-values.

    The sort.by argument specifies the criterion used to select the top genes. The choices are: "logFC" to sort by the (absolute) coefficient representing the log-fold-change; "A" to sort by average expression level (over all arrays) in descending order; "T" or "t" for absolute t-statistic; "P" or "p" for p-values; or "B" for the lods or B-statistic.

    Normally the genes appear in order of selection in the output table. If a different order is wanted, then the resort.by argument may be useful. For example, topTable(fit, sort.by="B", resort.by="logFC") selects the top genes according to log-odds of differential expression and then orders the selected genes by log-ratio in decreasing order. Or topTable(fit, sort.by="logFC", resort.by="logFC") would select the genes by absolute log-fold-change and then sort them from most positive to most negative.
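
    The select-then-reorder behaviour of sort.by and resort.by can be mimicked on an ordinary data frame. This is only an illustration of the described semantics, assuming a hypothetical six-gene table; it is not limma's internal code and does not need a fitted model.

```r
# Hypothetical results table (values invented for illustration)
tab <- data.frame(
  gene  = paste0("g", 1:6),
  logFC = c(2.1, -3.0, 0.4, 1.2, -0.8, 0.1),
  B     = c(4.0, 3.5, -1.0, 2.2, 5.1, -2.0),
  stringsAsFactors = FALSE
)

number <- 4

# Step 1 (analogous to sort.by="B"): pick the top genes by B-statistic
top <- tab[order(tab$B, decreasing = TRUE), ][seq_len(number), ]

# Step 2 (analogous to resort.by="logFC"): reorder only the selected genes
# from most positive to most negative log-fold-change
top <- top[order(top$logFC, decreasing = TRUE), ]

print(top)
```

    The gene set is decided by B alone; the second step changes only the display order, so g3 and g6 never appear even though g3 has a larger logFC than g5.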

    topTableF ranks genes on the basis of moderated F-statistics for one or more coefficients. If topTable is called and coef has two or more elements, then the specified columns will be extracted from fit and topTableF called on the result. topTable with coef=NULL is the same as topTableF, unless the fitted model fit has only one column.

    Toptable output for all probes in original (unsorted) order can be obtained by topTable(fit,sort="none",n=Inf). However write.fit or write may be preferable if the intention is to write the results to a file. A related method is as.data.frame(fit) which coerces an MArrayLM object to a data.frame.

    By default number probes are listed. Alternatively, by specifying p.value and number=Inf, all genes with adjusted p-values below a specified value can be listed.

    The argument lfc gives the ability to filter genes by log-fold change. This argument is not available for topTreat because treat already handles fold-change thresholding in a more sophisticated way.
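
    The combined effect of the p.value and lfc cutoffs can be sketched as a plain data-frame filter. The table and thresholds below are hypothetical, and the sketch deliberately avoids values that sit exactly on a cutoff, since the text does not spell out how ties at the boundary are handled.

```r
# Hypothetical results table (values invented for illustration)
tab <- data.frame(
  gene      = paste0("g", 1:6),
  logFC     = c(2.1, -3.0, 0.4, 1.2, -0.8, 0.1),
  adj.P.Val = c(0.001, 0.020, 0.030, 0.200, 0.004, 0.900),
  stringsAsFactors = FALSE
)

p_cut   <- 0.05  # plays the role of the p.value argument
lfc_cut <- 1     # plays the role of the lfc argument (log2 scale)

# Keep genes that pass both the adjusted p-value and the fold-change filter
keep <- tab$adj.P.Val < p_cut & abs(tab$logFC) > lfc_cut
selected <- tab[keep, ]

print(selected)
```

    With a real fit, the corresponding call using the arguments documented above would be along the lines of topTable(fit, coef=1, number=Inf, p.value=0.05, lfc=1).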

    Value

    A data frame with one row for each of the number top genes and the following columns:
    genelist
    one or more columns of probe annotation, if genelist was included as input
    logFC
    estimate of the log2-fold-change corresponding to the effect or contrast (for topTableF there may be several columns of log-fold-changes)
    CI.L
    left limit of confidence interval for logFC (if confint=TRUE or confint is numeric)
    CI.R
    right limit of confidence interval for logFC (if confint=TRUE or confint is numeric)
    AveExpr
    average log2-expression for the probe over all arrays and channels, same as Amean in the MArrayLM object
    t
    moderated t-statistic (omitted for topTableF)
    F
    moderated F-statistic (omitted for topTable unless more than one coef is specified)
    P.Value
    raw p-value
    adj.P.Val
    adjusted p-value or q-value
    B
    log-odds that the gene is differentially expressed (omitted for topTreat)

    10.1 Summary Top-Tables
    Limma provides functions topTable() and decideTests() which summarize the results of the linear model, perform hypothesis tests and adjust the p-values for multiple testing. Results include (log) fold changes, standard errors, t-statistics and p-values. The basic statistic used
    for significance analysis is the moderated t-statistic, which is computed for each probe and
    for each contrast. This has the same interpretation as an ordinary t-statistic except that the
    standard errors have been moderated across genes, i.e., shrunk towards a common value, using
    a simple Bayesian model. This has the effect of borrowing information from the ensemble of
    genes to aid with inference about each individual gene [30]. Moderated t-statistics lead to
    p-values in the same way that ordinary t-statistics do except that the degrees of freedom are
    increased, reflecting the greater reliability associated with the smoothed standard errors. The
    effectiveness of the moderated t approach has been demonstrated on test data sets for which
    the differential expression status of each probe is known [11].
    A number of summary statistics are presented by topTable() for the top genes and the selected contrast. The M-value (M) is the value of the contrast. Usually this represents a log2-fold change between two or more experimental conditions, although sometimes it represents a log2-expression level. The A-value (A) is the average log2-expression level for that gene across all the arrays and channels in the experiment. Column t is the moderated t-statistic. Column P.Value is the associated p-value and adj.P.Val is the p-value adjusted for multiple testing. The most popular form of adjustment is "BH", which is Benjamini and Hochberg's method to control the false discovery rate [1]. The adjusted values are often called q-values if the intention is to control or estimate the false discovery rate. The meaning of "BH" q-values is as follows. If all genes with q-value below a threshold, say 0.05, are selected as differentially expressed, then the expected proportion of false discoveries in the selected group is controlled to be less than the threshold value, in this case 5%. This procedure is equivalent to the procedure of Benjamini and Hochberg, although the original paper did not formulate the method in terms of adjusted p-values.
    The B-statistic (lods or B) is the log-odds that the gene is differentially expressed [30, Section 5]. Suppose for example that B = 1.5. The odds of differential expression is exp(1.5)=4.48, i.e., about four and a half to one. The probability that the gene is differentially expressed is 4.48/(1+4.48)=0.82, i.e., the probability is about 82% that this gene is differentially expressed. A B-statistic of zero corresponds to a 50-50 chance that the gene is differentially expressed. The B-statistic is automatically adjusted for multiple testing by assuming that 1% of the genes, or some other percentage specified by the user in the call to eBayes(), are expected to be differentially expressed. The p-values and B-statistics will normally rank genes in the same order. In fact, if the data contains no missing values or quality weights, then the order will be precisely the same.
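
    The conversion from a B-statistic to odds and to a probability is plain arithmetic and can be checked in base R. The helper name b_to_prob below is hypothetical (it is not a limma function); it is simply the inverse-logit applied to B.

```r
# Hypothetical helper: convert a log-odds B-statistic to a probability
b_to_prob <- function(B) {
  odds <- exp(B)       # odds of differential expression
  odds / (1 + odds)    # probability of differential expression
}

B <- 1.5
odds <- exp(B)
prob <- b_to_prob(B)

print(round(odds, 2))  # odds for B = 1.5
print(round(prob, 2))  # probability for B = 1.5

# B = 0 corresponds to even odds; plogis() is base R's inverse-logit
print(b_to_prob(0))
print(plogis(B))
```

    Applied to the B column of a topTable result, the same function would give a per-gene probability under the prior proportion assumed in eBayes().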
    As with all model-based methods, the p-values depend on normality and other mathematical assumptions which are never exactly true for microarray data. It has been argued that the p-values are useful for ranking genes even in the presence of large deviations from the assumptions [29, 27]. Benjamini and Hochberg's control of the false discovery rate assumes independence between genes, although Reiner et al. [20] have argued that it works for many forms of dependence as well. The B-statistic probabilities depend on the same assumptions but require in addition a prior guess for the proportion of differentially expressed genes. The p-values may be preferred to the B-statistics because they do not require this prior knowledge.
    The eBayes() function computes one more useful statistic. The moderated F-statistic (F) combines the t-statistics for all the contrasts into an overall test of significance for that gene. The F-statistic tests whether any of the contrasts are non-zero for that gene, i.e., whether that gene is differentially expressed on any contrast. The denominator degrees of freedom is the same as that of the moderated-t. Its p-value is stored as fit$F.p.value. It is similar to the ordinary F-statistic from analysis of variance except that the denominator mean squares are moderated across genes.
    A frequently asked question relates to the occasional occurrence that all of the adjusted p-values are equal to 1. This is not an error situation but rather an indication that there is no evidence of differential expression in the data after adjusting for multiple testing. This can occur even though many of the raw p-values may seem highly significant when taken as individual values. This situation typically occurs when none of the raw p-values are less than 1/G, where G is the number of probes included in the fit. In that case the adjusted p-values are typically equal to 1 using any of the adjustment methods except for adjust="none".
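
    The 1/G situation can be reproduced with base R's p.adjust on a tiny made-up vector. This is a sketch with hypothetical p-values for G = 4 probes; it shows that Bonferroni and Holm return exactly 1 everywhere, while "BH" can still return values below 1 in such a toy case, which fits the word "typically" in the text.

```r
# Hypothetical raw p-values for G = 4 probes; the smallest equals 1/G
p <- c(0.30, 0.50, 0.90, 0.25)
G <- length(p)
smallest_not_below <- min(p) >= 1 / G

adj_bonf <- p.adjust(p, method = "bonferroni")
adj_holm <- p.adjust(p, method = "holm")
adj_bh   <- p.adjust(p, method = "BH")
adj_none <- p.adjust(p, method = "none")  # leaves the p-values unchanged

print(data.frame(P.Value = p, bonferroni = adj_bonf, holm = adj_holm,
                 BH = adj_bh, none = adj_none))
```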

          Article link: https://www.haomeiwen.com/subject/rsyfcqtx.html