Scraping Historical Fund NAV Data from NetEase Finance with Python


Author: Vined | Published: 2018-03-04 20:03 | Views: 234

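    The historical net asset value (NAV) page for a fund on NetEase Finance (网易财经) is at
    http://quotes.money.163.com//fund/jzzs_110022.html?start=2018-02-22&end=2018-03-02
    where the fund code follows jzzs_ (110022 in this example).
    The query parameters are:

    1. start: the start date, in yyyy-mm-dd format
    2. end: the end date, in yyyy-mm-dd format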
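
    The URL pattern above can be sketched with the standard library alone. This is a minimal illustration, not part of the original script: build_history_url and check_date are hypothetical helper names, and the strict yyyy-mm-dd check is an assumption about what the page expects.

```python
from datetime import datetime
from urllib.parse import urlencode

# Base address taken from the article; {code} is the fund code after "jzzs_".
BASE = 'http://quotes.money.163.com//fund/jzzs_{code}.html'


def check_date(value):
    """Return value unchanged if it is a zero-padded yyyy-mm-dd date, else raise ValueError."""
    parsed = datetime.strptime(value, '%Y-%m-%d')
    # strptime also accepts '2018-2-2', so round-trip to enforce zero padding.
    if parsed.strftime('%Y-%m-%d') != value:
        raise ValueError('expected yyyy-mm-dd, got %r' % value)
    return value


def build_history_url(code, start, end):
    """Build the history-page URL for a fund code and a date range."""
    query = urlencode({'start': check_date(start), 'end': check_date(end)})
    return BASE.format(code=code) + '?' + query


url = build_history_url('110022', '2018-02-22', '2018-03-02')
print(url)
```

    For the example fund this reproduces the address shown above. When requests is used, passing the same dictionary as params achieves the same result without building the query string by hand.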

    The HTML of the table body on that page looks like this:

    <tbody>
        <tr>
            <td>2018-03-01</td>
            <td>2.3540</td>
            <td>2.3540</td>
            <td><span class="cRed">0.56%</span></td>
        </tr>
        <tr>
            <td>2018-02-28</td>
            <td>2.3410</td>
            <td>2.3410</td>
            <td><span class="cGreen">-1.35%</span></td>
        </tr>
        <tr>
            <td>2018-02-27</td>
            <td>2.3730</td>
            <td>2.3730</td>
            <td><span class="cGreen">-2.06%</span></td>
        </tr>
        <tr>
            <td>2018-02-26</td>
            <td>2.4230</td>
            <td>2.4230</td>
            <td><span class="cRed">0.29%</span></td>
        </tr>
        <tr>
            <td>2018-02-23</td>
            <td>2.4160</td>
            <td>2.4160</td>
            <td><span class="cGreen">-0.49%</span></td>
        </tr>
        <tr>
            <td>2018-02-22</td>
            <td>2.4280</td>
            <td>2.4280</td>
            <td><span class="cRed">2.58%</span></td>
        </tr>
    </tbody>
    

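    To get the historical NAV data, use BeautifulSoup's findAll (named find_all in current bs4 releases) to locate the tbody (table body) tag, then iterate over the tr (table row) tags inside it. The cells in each row are:

    1. td:nth-of-type(1) (first cell): the NAV date
    2. td:nth-of-type(2) (second cell): the unit NAV
    3. td:nth-of-type(3) (third cell): the cumulative NAV
    4. td:nth-of-type(4) (fourth cell): the daily change, in percent

    Sample code: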
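
    The cell mapping above can be checked offline on a fragment of the table. The sketch below deliberately swaps BeautifulSoup for the standard library's html.parser.HTMLParser so that it runs without third-party packages; RowCollector and the field name CumulativeValue are hypothetical names chosen for illustration, and SAMPLE is copied from the table body shown earlier.

```python
from html.parser import HTMLParser

# Two rows copied from the table body shown in the article.
SAMPLE = """
<tbody>
    <tr>
        <td>2018-03-01</td>
        <td>2.3540</td>
        <td>2.3540</td>
        <td><span class="cRed">0.56%</span></td>
    </tr>
    <tr>
        <td>2018-02-28</td>
        <td>2.3410</td>
        <td>2.3410</td>
        <td><span class="cGreen">-1.35%</span></td>
    </tr>
</tbody>
"""

# Cell order: date, unit NAV, cumulative NAV, daily change.
# 'CumulativeValue' is an illustrative name, not one used by the original script.
FIELDS = ('Date', 'NetAssetValue', 'CumulativeValue', 'ChangePercent')


class RowCollector(HTMLParser):
    """Collect the text of each <td>, grouped by its enclosing <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cells = None  # cells of the current <tr>, or None outside a row
        self._text = None   # text pieces of the current <td>, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._cells = []
        elif tag == 'td' and self._cells is not None:
            self._text = []

    def handle_data(self, data):
        # Text inside nested tags such as <span> is still delivered here.
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._text is not None:
            self._cells.append(''.join(self._text).strip())
            self._text = None
        elif tag == 'tr' and self._cells is not None:
            # Keep only rows with exactly four cells, as the article's code does.
            if len(self._cells) == len(FIELDS):
                self.rows.append(dict(zip(FIELDS, self._cells)))
            self._cells = None


parser = RowCollector()
parser.feed(SAMPLE)
parser.close()
for row in parser.rows:
    print(row)
```

    Each printed dictionary holds the four cells of one row as strings, with the percentage taken from the text inside the span.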

    # -*- coding: utf-8 -*-


    import requests
    from bs4 import BeautifulSoup
    from prettytable import PrettyTable


    def get_url(url, params=None, proxies=None):
        """Fetch a page and return its text; raise on an HTTP error status."""
        rsp = requests.get(url, params=params, proxies=proxies)
        rsp.raise_for_status()
        return rsp.text


    def get_fund_data(code, start='', end=''):
        """Return one dict per data row of the fund's historical NAV table."""
        record = {'Code': code}
        url = 'http://quotes.money.163.com//fund/jzzs_' + code + '.html'
        params = {'start': start, 'end': end}
        html = get_url(url, params)
        soup = BeautifulSoup(html, 'html.parser')
        records = []
        # find_all / get_text are the current bs4 names for findAll / getText.
        tab = soup.find_all('tbody')[0]
        for tr in tab.find_all('tr'):
            # Only rows with exactly four cells are data rows.
            if len(tr.find_all('td')) == 4:
                record['Date'] = tr.select('td:nth-of-type(1)')[0].get_text().strip()
                record['NetAssetValue'] = tr.select('td:nth-of-type(2)')[0].get_text().strip()
                record['ChangePercent'] = tr.select('td:nth-of-type(4)')[0].get_text().strip()
                records.append(record.copy())
        return records


    def demo(code, start, end):
        """Build a PrettyTable of the fund's NAV history for the date range."""
        table = PrettyTable()
        table.field_names = ['Code', 'Date', 'NAV', 'Change']
        table.align['Change'] = 'r'
        records = get_fund_data(code, start, end)
        for record in records:
            table.add_row([record['Code'], record['Date'], record['NetAssetValue'], record['ChangePercent']])
        return table


    if __name__ == "__main__":
        # Call print as a function so the script also runs on Python 3.
        print(demo('110022', '2018-02-22', '2018-03-02'))
    

    The output is:

    +--------+------------+--------+--------+
    |  Code  |    Date    |  NAV   | Change |
    +--------+------------+--------+--------+
    | 110022 | 2018-03-02 | 2.3580 |  0.17% |
    | 110022 | 2018-03-01 | 2.3540 |  0.56% |
    | 110022 | 2018-02-28 | 2.3410 | -1.35% |
    | 110022 | 2018-02-27 | 2.3730 | -2.06% |
    | 110022 | 2018-02-26 | 2.4230 |  0.29% |
    | 110022 | 2018-02-23 | 2.4160 | -0.49% |
    | 110022 | 2018-02-22 | 2.4280 |  2.58% |
    +--------+------------+--------+--------+
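
    Every value in this table is still a string. As a minimal sketch of a possible next step, the records can be converted to typed values with the standard library; to_typed is a hypothetical helper that is not part of the original script, and the two input records are copied from the first two rows of the output above. The final check assumes the daily change is the NAV difference divided by the previous day's NAV, which matches these two rows but may not hold on days with a distribution.

```python
from datetime import datetime
from decimal import Decimal


def to_typed(record):
    """Convert the string fields of one scraped record to typed values."""
    return {
        'Code': record['Code'],
        'Date': datetime.strptime(record['Date'], '%Y-%m-%d').date(),
        'NetAssetValue': Decimal(record['NetAssetValue']),
        # '0.17%' -> Decimal('0.17'); the value stays in percent units.
        'ChangePercent': Decimal(record['ChangePercent'].rstrip('%')),
    }


# First two rows of the output table, in the shape the scraper returns.
records = [
    {'Code': '110022', 'Date': '2018-03-02', 'NetAssetValue': '2.3580', 'ChangePercent': '0.17%'},
    {'Code': '110022', 'Date': '2018-03-01', 'NetAssetValue': '2.3540', 'ChangePercent': '0.56%'},
]
latest, previous = [to_typed(r) for r in records]

# Recompute the latest daily change from the two NAVs (assumed formula, see above).
delta = latest['NetAssetValue'] - previous['NetAssetValue']
recomputed = (delta / previous['NetAssetValue'] * 100).quantize(Decimal('0.01'))
print(latest['Date'], delta, recomputed)  # 2018-03-02 0.0040 0.17
```

    Decimal is used instead of float so that the four-decimal NAV values are kept exactly as published.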
    
