Python Data Science Notes 1


Author: AsianDuckKing | Published: 2018-01-16 21:38

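    Study notes on the Python Data Science Handbook, part 1

    Part 1: NumPy

    Compared with Python's built-in data types, NumPy provides far more efficient operations on data.
    First we need to understand how Python's built-in data types work. Taking the integer as an example, its C implementation in CPython looks like this (simplified):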

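    /* This code illustrates why Python allows dynamic typing */
    struct _longobject {
        long ob_refcnt;          /* reference count, used for memory management */
        PyTypeObject *ob_type;   /* pointer to the object's type */
        size_t ob_size;          /* size of the data members that follow */
        long ob_digit[1];        /* the actual integer value */
    };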
    

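    In the implementation, a Python int is a pointer to a struct like the one above, so every integer carries a reference count and type information in addition to its value.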
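
    The overhead of this object layout can be observed from Python itself. The following is a minimal sketch, assuming CPython (where sys.getsizeof reports the size of the whole object); the variable names are only illustrative.

```python
import sys

# A Python int is a full object, so it takes more room than a raw
# 8-byte machine integer. The exact size depends on the interpreter build.
int_size = sys.getsizeof(1)
print(int_size)

# Because every value carries its own type information,
# a single list can hold elements of different types.
mixed = [True, "2", 3.0, 4]
element_types = [type(item).__name__ for item in mixed]
print(element_types)
```

    This flexibility is what dynamic typing buys, and the extra per-object information is its price.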

    The core of NumPy: the array

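    The difference between a NumPy array and a Python list can be seen in the following figure:

    [Figure: diff (NumPy array vs. Python list)]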
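
    The same contrast can be checked in code. This is a small sketch, not taken from the original notes: it assumes a 64-bit integer dtype is requested explicitly, and the names py_list, arr and upcast are illustrative.

```python
import numpy as np

py_list = [1, 2, 3, 4]
arr = np.array(py_list, dtype=np.int64)

# Every element of the array shares one fixed-size dtype,
# stored back to back in a single data buffer.
print(arr.dtype)      # int64
print(arr.itemsize)   # 8 bytes per element
print(arr.nbytes)     # 32 bytes for the whole buffer

# Mixing types forces an upcast to a common dtype,
# whereas a list would keep each object as it is.
upcast = np.array([3.14, 3, 1, 2])
print(upcast.dtype)   # float64
```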

    Creating arrays

    Next, let's look at some examples of how to create arrays simply and elegantly in NumPy.

    In [1]: import numpy as np
    
    In [2]: np.__version__
    Out[2]: '1.13.3'
    
    In [3]: np?
    
    In [4]: np.array([3.14, 3, 1, 2])
    Out[4]: array([ 3.14,  3.  ,  1.  ,  2.  ])
    
    In [5]: np.zeros((3, 5), dtype=int)
    Out[5]: 
    array([[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]])
    
    In [6]: np.arange(0,20,2)
    Out[6]: array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
    
    In [7]: np.linspace(0,1,5)
    Out[7]: array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])
    
    In [8]: np.random.random((3,3))
    Out[8]: 
    array([[ 0.43170959,  0.10099413,  0.45859315],
           [ 0.62548971,  0.57233299,  0.6632921 ],
           [ 0.74947709,  0.31867245,  0.05988924]])
    
    In [9]: np.random.normal(0, 1, (3,3))
    Out[9]: 
    array([[-1.45242445, -1.27771487,  1.39220407],
           [-0.66294773, -1.56926783, -0.02177722],
           [ 1.0318081 , -0.87103441,  0.78930381]])
    
    In [10]: np.random.randint(0, 10, (3, 3))
    Out[10]: 
    array([[0, 5, 8],
           [2, 7, 7],
           [5, 0, 5]])
    
    In [11]: np.zeros(10, dtype=np.complex128)
    Out[11]: 
    array([ 0.+0.j,  0.+0.j,  0.+0.j,  0.+0.j,  0.+0.j,  0.+0.j,  0.+0.j,
            0.+0.j,  0.+0.j,  0.+0.j])
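
    A few other constructors follow the same pattern as the ones in the session above. The sketch below is an addition to these notes; the shapes and fill values are arbitrary choices for illustration.

```python
import numpy as np

ones = np.ones((2, 3), dtype=float)   # 2x3 array filled with 1.0
filled = np.full((2, 2), 3.14)        # 2x2 array filled with a chosen value
identity = np.eye(3)                  # 3x3 identity matrix

print(ones)
print(filled)
print(identity)
```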
    
    
    

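    Basic operations

    For one-dimensional arrays, indexing and slicing work much like they do on native Python lists, so I won't repeat them here. Below are some of the fancier parts.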

    In [5]: x2 = np.random.randint(15, size = (3,5), dtype='int')
    
    In [6]: x2
    Out[6]: 
    array([[ 8,  8,  5, 11, 13],
           [ 2, 14,  2,  9,  6],
           [ 8, 14,  6,  4,  9]])
    
    In [7]: x2[::-1, ::-1]
    Out[7]: 
    array([[ 9,  4,  6, 14,  8],
           [ 6,  9,  2, 14,  2],
           [13, 11,  5,  8,  8]])
    
    
    

    As in MATLAB, NumPy lets you access an entire row or column with a bare colon (:)

    x2[:, 0] # first column of x2
    x2[0, :] # first row of x2
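
    Row and column access is a special case of multi-dimensional slicing, where each axis takes its own start:stop:step. A short sketch follows; the array reuses the x2 values printed earlier, written out literally so the result does not depend on the random generator, and the result names are illustrative.

```python
import numpy as np

# Same values as the x2 shown above, written out so the result is reproducible.
x2 = np.array([[ 8,  8,  5, 11, 13],
               [ 2, 14,  2,  9,  6],
               [ 8, 14,  6,  4,  9]])

top_left = x2[:2, :3]      # first two rows, first three columns
every_other = x2[:, ::2]   # all rows, every second column
last_row = x2[-1]          # shorthand for x2[-1, :]

print(top_left)
print(every_other)
print(last_row)
```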
    

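    Next comes a very important point: slicing a NumPy array behaves very differently from slicing a native list, because it produces a "view" rather than a "copy". Put simply, it does not allocate new memory and build a new array; the slice refers directly to the original data, so writing to the slice changes the original.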

    In [8]: x = [1,2,3,4,5]
    
    In [9]: y = np.array([1,2,3,4,5])
    
    In [10]: copy = x[1:3]
    In [12]: copy[1] = 1
    
    In [13]: copy
    Out[13]: [2, 1]
    
    In [14]: not_copy = y[1:3]
    In [16]: not_copy[1] = 1
    
    In [17]: not_copy
    Out[17]: array([2, 1])
    
    In [18]: x
    Out[18]: [1, 2, 3, 4, 5]
    
    In [19]: y
    Out[19]: array([1, 2, 1, 4, 5])
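
    Whether a result is a view can also be checked directly instead of inferred from side effects. A minimal sketch, using np.shares_memory and the base attribute; the variable names are illustrative.

```python
import numpy as np

y = np.array([1, 2, 3, 4, 5])
view = y[1:3]

# A basic slice shares its data buffer with the original array.
print(np.shares_memory(y, view))  # True
print(view.base is y)             # True

# Writing through the view is visible in the original.
view[0] = 100
print(y)
```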
    
    
    

    Of course, calling copy() explicitly creates a copy rather than a view.

    x2_sub_copy = x2[:2, :2].copy()
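
    To confirm that the copy is independent, one can modify it and look at the original. A small sketch; the x2 here is illustrative data built with np.arange, not the random x2 used earlier.

```python
import numpy as np

x2 = np.arange(12).reshape((3, 4))   # illustrative data
x2_sub_copy = x2[:2, :2].copy()

x2_sub_copy[0, 0] = 42               # modify only the copy
print(x2[0, 0])                      # still 0
print(np.shares_memory(x2, x2_sub_copy))  # False
```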
    

    reshape

    In [39]: x = np.array([1, 2, 3])
        ...: 
        ...: # row vector via reshape
        ...: x.reshape((1, 3))
    Out[39]: array([[1, 2, 3]])
    
    In [40]: # row vector via newaxis
        ...: x[np.newaxis, :]
    Out[40]: array([[1, 2, 3]])
    
    In [41]: # column vector via reshape
        ...: x.reshape((3, 1))
    Out[41]: 
    array([[1],
           [2],
           [3]])
    
    In [42]: # column vector via newaxis
        ...: x[:, np.newaxis]
    Out[42]: 
    array([[1],
           [2],
           [3]])
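
    Two details of reshape are worth a quick sketch: a dimension given as -1 is inferred from the array's size, and the total number of elements has to match. The arrays below are illustrative and not part of the original notes.

```python
import numpy as np

grid = np.arange(1, 10).reshape((3, 3))   # 9 elements -> 3x3
print(grid)

# -1 asks NumPy to infer that dimension from the array's size.
column = np.arange(6).reshape((-1, 1))
print(column.shape)  # (6, 1)

# The total number of elements must match, otherwise reshape raises ValueError.
try:
    np.arange(6).reshape((4, 2))
except ValueError as err:
    print("reshape failed:", err)
```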
    

    Concatenation

    grid = np.array([[1, 2, 3],
                     [4, 5, 6]])
    
    In [46]: # concatenate along the first axis
        ...: np.concatenate([grid, grid])
    Out[46]: 
    array([[1, 2, 3],
           [4, 5, 6],
           [1, 2, 3],
           [4, 5, 6]])
    
    In [47]: # concatenate along the second axis (zero-indexed)
        ...: np.concatenate([grid, grid], axis=1)
    Out[47]: 
    array([[1, 2, 3, 1, 2, 3],
           [4, 5, 6, 4, 5, 6]])
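
    When the inputs have different numbers of dimensions, np.vstack and np.hstack are often clearer than np.concatenate. The following sketch uses small illustrative arrays that are not part of the session above.

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# Stack vertically: x is treated as one extra row.
stacked_rows = np.vstack([x, grid])
print(stacked_rows)

# Stack horizontally: y adds one extra column.
y = np.array([[99],
              [99]])
stacked_cols = np.hstack([grid, y])
print(stacked_cols)
```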
          
    

    Splitting

    x = [1, 2, 3, 99, 99, 3, 2, 1]
    x1, x2, x3 = np.split(x, [3, 5])
    print(x1, x2, x3)
    [1 2 3] [99 99] [3 2 1]
    
    In [23]: grid = np.arange(16).reshape((4,4))
    
    In [24]: grid
    Out[24]: 
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11],
           [12, 13, 14, 15]])
    
    In [25]: a, b = np.vsplit(grid, [3])
    
    In [26]: a
    Out[26]: 
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])
    
    In [27]: b
    Out[27]: array([[12, 13, 14, 15]])
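
    np.hsplit is the column-wise counterpart of np.vsplit, and in every split function N split points produce N + 1 sub-arrays. A brief sketch with illustrative variable names:

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))

# Split between column 1 and column 2.
left, right = np.hsplit(grid, [2])
print(left)
print(right)

# Two split points give three pieces.
parts = np.split(np.arange(8), [3, 5])
print(len(parts))  # 3
```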
    
    


        Article link: https://www.haomeiwen.com/subject/qjveoxtx.html