pandas Interview Question Challenge 8

Author: 人工智能人话翻译官 | Published: 2019-05-29 23:15

    Correlation between two Series

    Given the following two Series:

    import pandas as pd
    
    s1 = pd.Series([.2, .0, .6, .2])
    s2 = pd.Series([.3, .6, .0, .1])
    

    Compute the Pearson correlation coefficient of the two Series.


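    A note on the Pearson coefficient: it measures the strength of the linear relationship between two variables and ranges from -1 to 1.

    The solution is to treat each Series as a vector and call corr on one with the other as the argument: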

    s1.corr(s2)
    

    Output:

    -0.85106449634699
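
To make explicit what `corr` computes by default, here is a minimal sketch that rebuilds the Pearson coefficient by hand: the sum of the products of the deviations from each mean, divided by the square root of the product of the summed squared deviations. The names `d1`, `d2`, `r_manual`, and `r_pandas` are illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([.2, .0, .6, .2])
s2 = pd.Series([.3, .6, .0, .1])

# Deviations of each Series from its own mean
d1 = s1 - s1.mean()
d2 = s2 - s2.mean()

# Pearson r = sum(d1 * d2) / sqrt(sum(d1^2) * sum(d2^2))
r_manual = (d1 * d2).sum() / np.sqrt((d1 ** 2).sum() * (d2 ** 2).sum())

# The built-in method (default method='pearson')
r_pandas = s1.corr(s2)

print(r_manual, r_pandas)
```

The two printed values should agree up to floating-point rounding.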
    

    Shifting Series data up and down

    Given the following data:

    import pandas as pd
    sr = pd.Series(['New York', 'Chicago', 'Toronto', 'Lisbon', 'Rio', 'Moscow'])
    # Six weekly (Sunday-anchored) timestamps in the Europe/Berlin time zone
    didx = pd.date_range(start='2014-08-01 10:00', freq='W',
                         periods=6, tz='Europe/Berlin')
    sr.index = didx
    print(sr)
    

    Output:

    2014-08-03 10:00:00+02:00    New York
    2014-08-10 10:00:00+02:00     Chicago
    2014-08-17 10:00:00+02:00     Toronto
    2014-08-24 10:00:00+02:00      Lisbon
    2014-08-31 10:00:00+02:00         Rio
    2014-09-07 10:00:00+02:00      Moscow
    Freq: W-SUN, dtype: object
    

    To shift the data down by two rows (the index stays in place and the first two values become NaN):

    sr.shift(periods=2)
    

    Output:

    2014-08-03 10:00:00+02:00         NaN
    2014-08-10 10:00:00+02:00         NaN
    2014-08-17 10:00:00+02:00    New York
    2014-08-24 10:00:00+02:00     Chicago
    2014-08-31 10:00:00+02:00     Toronto
    2014-09-07 10:00:00+02:00      Lisbon
    Freq: W-SUN, dtype: object
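
`shift` also accepts `fill_value` and `freq`. The sketch below assumes a simplified daily index rather than the time-zone-aware weekly one used in the post, and the names `filled` and `moved_index` are illustrative. It shows two variations: replacing the NaN gaps with a placeholder, and moving the index instead of the data.

```python
import pandas as pd

# Simplified stand-in for the Series above (daily, time-zone-naive index)
sr = pd.Series(['New York', 'Chicago', 'Toronto', 'Lisbon', 'Rio', 'Moscow'],
               index=pd.date_range('2014-08-03', periods=6, freq='D'))

# Shift the values down by two rows and fill the gaps with a placeholder
filled = sr.shift(periods=2, fill_value='unknown')

# With freq given, the index moves forward by two days and the values stay put,
# so no data is lost and no gaps appear
moved_index = sr.shift(periods=2, freq='D')

print(filled)
print(moved_index)
```

Whether to shift the values or the index depends on the question being asked: shifting values aligns each timestamp with an earlier observation, while shifting the index relabels the observations themselves.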
    

    Given the following data:

    import pandas as pd

    sr = pd.Series(['New York', 'Chicago', 'Toronto', 'Lisbon', 'Rio', 'Moscow'])
    # Six weekly (Sunday-anchored) timestamps in the Europe/Berlin time zone
    didx = pd.date_range(start='2014-08-01 10:00', freq='W',
                         periods=6, tz='Europe/Berlin')
    sr.index = didx
    print(sr)
    

    Output:

    2014-08-03 10:00:00+02:00    New York
    2014-08-10 10:00:00+02:00     Chicago
    2014-08-17 10:00:00+02:00     Toronto
    2014-08-24 10:00:00+02:00      Lisbon
    2014-08-31 10:00:00+02:00         Rio
    2014-09-07 10:00:00+02:00      Moscow
    Freq: W-SUN, dtype: object
    

    To shift the data up by two rows (the index stays in place and the last two values become NaN):

    sr.shift(periods=-2)
    

    Output:

    2014-08-03 10:00:00+02:00    Toronto
    2014-08-10 10:00:00+02:00     Lisbon
    2014-08-17 10:00:00+02:00        Rio
    2014-08-24 10:00:00+02:00     Moscow
    2014-08-31 10:00:00+02:00        NaN
    2014-09-07 10:00:00+02:00        NaN
    Freq: W-SUN, dtype: object
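
A common use of a negative shift is to line each row up with the value that follows it. Here is a small sketch with made-up numbers; the `prices` Series and the names `next_price` and `change_to_next` are hypothetical and not from the original post.

```python
import pandas as pd

# Hypothetical data, for illustration only
prices = pd.Series([10.0, 12.0, 9.0, 15.0])

# Shift up by one row: each position now holds the following value
next_price = prices.shift(periods=-1)

# Difference between the next value and the current one;
# the last row has no successor, so it is NaN
change_to_next = next_price - prices

print(change_to_next)
```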
    

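    Autocorrelation of a Series

    Autocorrelation coefficient
    The autocorrelation coefficients of a stationary series decay quickly, and the lag at which a correlation plot drops off sharply (suddenly falling from a large value to near 0) hints at the model order. For example, if the autocorrelation function (ACF) plot tails off while the partial autocorrelation function (PACF) plot cuts off, staying inside the confidence band beyond lag 2 or 3, the series can be tentatively identified as an AR(2) or AR(3) model. If you are not familiar with time series, feel free to skip this paragraph; following it takes a systematic study of the subject.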
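
As a rough illustration of the decay, the sketch below simulates a stationary AR(1) process and prints its sample autocorrelation at the first few lags using `Series.autocorr(lag=...)`. The coefficient `phi = 0.8`, the sample size, and the random seed are arbitrary assumptions chosen for the example. It covers only the autocorrelation side; the partial autocorrelation used to pick the order is not computed here.

```python
import numpy as np
import pandas as pd

# Simulate x[t] = phi * x[t-1] + noise[t]; all parameters are assumptions
rng = np.random.default_rng(0)
phi = 0.8
n = 2000
noise = rng.standard_normal(n)
values = np.zeros(n)
for t in range(1, n):
    values[t] = phi * values[t - 1] + noise[t]

series = pd.Series(values)

# Sample autocorrelation at lags 1 through 5
acf = {lag: series.autocorr(lag=lag) for lag in range(1, 6)}
for lag, r in acf.items():
    print(f"lag {lag}: {r:.3f}")
```

For an AR(1) process the theoretical autocorrelation at lag k is phi to the power k, so the printed values should shrink steadily as the lag grows.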

    Given the following data:

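    import pandas as pd
    # The trailing None becomes NaN, so the Series has dtype float64
    sr = pd.Series([11, 21, 8, 18, 65, 18, 32, 10, 5, 32, None])
    # 'H' is the hourly alias; pandas 2.2 and later deprecate it in favor of lowercase 'h'
    index_ = pd.date_range('2010-10-09 08:45', periods=11, freq='H')
    sr.index = index_
    print(sr)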
    

    Output:

    2010-10-09 08:45:00    11.0
    2010-10-09 09:45:00    21.0
    2010-10-09 10:45:00     8.0
    2010-10-09 11:45:00    18.0
    2010-10-09 12:45:00    65.0
    2010-10-09 13:45:00    18.0
    2010-10-09 14:45:00    32.0
    2010-10-09 15:45:00    10.0
    2010-10-09 16:45:00     5.0
    2010-10-09 17:45:00    32.0
    2010-10-09 18:45:00     NaN
    Freq: H, dtype: float64
    

    Compute the autocorrelation coefficient of this Series (autocorr uses a lag of 1 by default):

    result = sr.autocorr() 
    result
    

    Output:

    -0.13907359397344918
    

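    If you haven't studied time series but still insist on knowing what autocorr is, fine: I looked at the source code, and with the default lag of 1, autocorr is simply the correlation between the Series and a copy of itself shifted by one row: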

    result = sr.corr(sr.shift())
    result
    

    Output:

    -0.13907359397344918
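
The same equivalence can be checked for other lags, since `autocorr` takes a `lag` argument. The following is a minimal sketch; the `results` dictionary and the choice of lags 1 to 3 are illustrative.

```python
import numpy as np
import pandas as pd

sr = pd.Series([11, 21, 8, 18, 65, 18, 32, 10, 5, 32, None])

# For each lag, compare autocorr with the explicit corr-of-shifted-copy form
results = {}
for lag in (1, 2, 3):
    via_autocorr = sr.autocorr(lag=lag)
    via_shift = sr.corr(sr.shift(lag))
    results[lag] = (via_autocorr, via_shift)
    print(lag, via_autocorr, via_shift)
```

Rows where either side is NaN are dropped before the correlation is computed, which is why the trailing NaN in the data does not turn the result into NaN.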
    


        Article link: https://www.haomeiwen.com/subject/oghytctx.html