Normalization

Author: 浩瀚之宇 | Source: published 2018-11-25 10:53

    1 Quantile Normalization

    1.1 When to use Quantile Normalization?

    https://www.biorxiv.org/content/biorxiv/early/2014/12/04/012203.full.pdf

    1.2 The Impact of Normalization Methods on RNA-Seq Data Analysis

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4484837/

    1.3 How quantile normalization works:

    https://www.cnblogs.com/lmj-sky/p/6036392.html

    A quick illustration of such normalizing on a very small dataset:

    Arrays 1 to 3, genes A to D

    A    5    4    3

    B    2    1    4

    C    3    4    6

    D    4    2    8

    For each column, rank the values from lowest to highest and label the ranks i to iv. Tied values share a rank, as the two 4s in array 2 do here:

    A    iv     iii    i

    B    i      i      ii

    C    ii     iii    iii

    D    iii    ii     iv
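The ranking step can be sketched in base R. The matrix name `x` and the choice of `ties.method = "min"` are assumptions made here so that the two tied 4s in array 2 share rank iii (3), as in the table above.

```r
# Example data: rows are genes A-D, columns are arrays 1-3.
# matrix() fills column by column, so each line below is one array.
x <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8),
            nrow = 4,
            dimnames = list(c("A", "B", "C", "D"), c("1", "2", "3")))

# Rank the values within each column, lowest to highest.
# "min" gives tied values the same (lowest) rank.
ranks <- apply(x, 2, rank, ties.method = "min")
print(ranks)
```

The ranks are printed as integers 1 to 4 rather than the roman numerals i to iv used in the text.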

    These rank values are set aside to use later. Go back to the original data and sort each column from lowest to highest value. The first column, 5, 2, 3, 4, becomes 2, 3, 4, 5. The second column, 4, 1, 4, 2, becomes 1, 2, 4, 4. The third column, 3, 4, 6, 8, stays the same because it is already in ascending order. After sorting, a row no longer belongs to a single gene; it holds the values found at one rank position. The result is:

    A    5    4    3    becomes    rank i      2    1    3

    B    2    1    4    becomes    rank ii     3    2    4

    C    3    4    6    becomes    rank iii    4    4    6

    D    4    2    8    becomes    rank iv     5    4    8

    Now take the mean of each sorted row. This mean is the value assigned to the corresponding rank:

    rank i:      (2 + 1 + 3)/3 = 2.00

    rank ii:     (3 + 2 + 4)/3 = 3.00

    rank iii:    (4 + 4 + 6)/3 = 4.67

    rank iv:     (5 + 4 + 8)/3 = 5.67
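A minimal base R sketch of the sort-and-average step, using the same example values. The names `x`, `sorted`, and `rank_means` are illustrative and do not come from any package.

```r
# Example data: rows are genes A-D, columns are arrays 1-3.
x <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8),
            nrow = 4,
            dimnames = list(c("A", "B", "C", "D"), c("1", "2", "3")))

# Sort each column independently. Gene names are dropped on purpose,
# because a sorted row holds one rank position, not one gene.
sorted <- apply(x, 2, function(col) sort(unname(col)))

# The mean of each sorted row is the value assigned to that rank:
# 2.00, 3.00, 4.67, 5.67 after rounding, as in the list above.
rank_means <- rowMeans(sorted)
print(round(rank_means, 2))
```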

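    Now take the rank table from the first step and replace each rank with its mean value. Genes A and C are tied at rank iii in array 2, so both receive 4.67: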

    A    iv    iii  i

    B    i    i    ii

    C    ii    iii  iii

    D    iii  ii    iv

    becomes:

    A    5.67    4.67    2.00

    B    2.00    2.00    3.00

    C    3.00    4.67    4.67

    D    4.67    3.00    5.67
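Putting the steps together, the whole procedure can be sketched as a small base R function. `quantile_normalize` is a hypothetical helper written for this illustration, not a function from any package. It assigns tied values the lowest shared rank so that it reproduces the table above; other implementations may resolve ties differently (for example by averaging the values of the tied ranks), so results on tied data can differ.

```r
# Hypothetical helper: quantile-normalize the columns of a numeric matrix.
quantile_normalize <- function(x) {
  # Step 1: rank each column (ties share the lowest rank).
  ranks <- apply(x, 2, rank, ties.method = "min")
  # Step 2: sort each column, then average across each sorted row.
  rank_means <- rowMeans(apply(x, 2, function(col) sort(unname(col))))
  # Step 3: replace every rank with the mean value for that rank.
  normalized <- apply(ranks, 2, function(r) rank_means[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Example data: rows are genes A-D, columns are arrays 1-3.
x <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8),
            nrow = 4,
            dimnames = list(c("A", "B", "C", "D"), c("1", "2", "3")))

result <- quantile_normalize(x)
print(round(result, 2))
```

This sketch assumes a complete numeric matrix with no missing values; handling of `NA` is left out.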

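    Implementation in R: the method was designed for microarray data, and it expects each column to be one array and each row to be one probe.

    Several R packages provide quantile normalization:

    1: affy

    2: preprocessCore

    Of these, normalize.quantiles in preprocessCore is very convenient to use: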

    > a <- matrix(1:6, 3, 2)
    > a
         [,1] [,2]
    [1,]    1    4
    [2,]    2    5
    [3,]    3    6
    > library(preprocessCore)
    > b <- normalize.quantiles(a)
    > b
         [,1] [,2]
    [1,]  2.5  2.5
    [2,]  3.5  3.5
    [3,]  4.5  4.5

    2 Log transformation

    The purpose of applying a log2 or log10 transformation to the data:

    A log transformation is one way to make a skewed distribution less skewed. For parametric statistical methods, this helps the data better satisfy the distributional assumptions behind inferential statistics, such as approximate normality. For rank-based non-parametric methods, skew does not matter, because a monotone transformation leaves the ranks unchanged. The log is not the only option: the Box-Cox method can help you find a suitable power transformation for your data. In practice, however, the log transformation is the one most commonly used.
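The effect on skew can be illustrated with a short base R sketch. The simulated log-normal data, the hand-written `skewness` helper, and the pseudocount of 1 (added so that zeros would not produce `-Inf`) are all assumptions made for this example, not part of the original text.

```r
set.seed(1)

# Simulated right-skewed, expression-like values (an assumption for illustration).
counts <- rlnorm(1000, meanlog = 5, sdlog = 1)

# Simple moment-based sample skewness; values near 0 indicate a roughly symmetric distribution.
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

# log2 transform with a pseudocount of 1.
log_counts <- log2(counts + 1)

cat("skewness before log2:", skewness(counts), "\n")
cat("skewness after log2: ", skewness(log_counts), "\n")
```

For a data-driven choice among power transformations, `boxcox` in the MASS package is one option for estimating the Box-Cox parameter.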
