美文网首页
数据分析师之路(四):用Ruby批量处理Excel

数据分析师之路(四):用Ruby批量处理Excel

作者: 诗与星空 | 来源:发表于2017-07-20 18:51 被阅读255次

    最近沉浸于Python的学习中,收获颇多,但是在进行Excel表的处理的时候,我还是喜欢Ruby。原因很简单,Ruby对Excel的处理更容易理解,或许等我熟练地掌握Pandas后,就能脱离对ruby的依赖了。


    本文介绍一个批量处理Excel表的例子,把3000多家上市公司的全部资产负债表、利润表、现金流量表、主要经营指标表重新组合,把感兴趣的指标整合在一张Excel表中。

    一、先调用相应的库:

    # -*- coding: UTF-8 -*-
    require 'rubygems'
    require 'hpricot'
    require 'open-uri'
    require 'win32ole'
    

    其中hpricot和open-uri是对URL进行解析处理的库,win32ole是处理Excel的库。
    二、然后定义几个时间参数:

    s_report_date = "2017/3/31"
    s_date = "20170720"
    time = Time.new  #取当前时间
    s_today = "#{time.year}" + "#{time.month}" + "#{time.day}" #把当前时间拼接成2017616格式,用于命名excel文件名
    #puts s_today
    

    三、创建Excel表

    #创建一个excel表并按照当前日期命名
    file_path = "d:\\rb\\stock\\"
    file_name = s_today + "股票池.xls"
    file_name_path = file_path + file_name
    
    excel = WIN32OLE.new("excel.application")
    excel.visible = true     # in case you want to see what happens 
    workbook = excel.workbooks.add
    workbook.saveas(file_name_path)
    workbook.close
    excel.Quit
    

    win32ole的使用比较简单,直接按照操作说明比着葫芦画瓢即可。

    四、将相关数据写入Excel表,这是我目前感兴趣的报表项目:

    #写入表头
    excel = WIN32OLE::new('Excel.Application')
    workbook = excel.Workbooks.Open(file_name_path)
    worksheet = workbook.Worksheets(1) #定位到第一个sheet
    worksheet.Select
    #写入
    worksheet.Range("a2").Value = s_today + "By 路过银河" 
    worksheet.Range("a3").Value = "股票代码" 
    worksheet.Range("b3").Value = "股票名称" 
    worksheet.Range("c3").Value = "参考评级" 
    worksheet.Range("d3").Value = "行业板块"
    worksheet.Range("e3").Value = "地区"
    worksheet.Range("f3").Value = "概念"
    worksheet.Range("g3").Value = "报表日期"
    worksheet.Range("h3").Value = "总市值"
    worksheet.Range("i3").Value = "市盈率"
    worksheet.Range("j3").Value = "市净率"
    worksheet.Range("k3").Value = "季报利润总额"
    worksheet.Range("l3").Value = "年报利润总额"
    worksheet.Range("m3").Value = "上年同期利润总额"
    worksheet.Range("n3").Value = "利润同比增减"
    worksheet.Range("o3").Value = "利润较年报1/4增减"
    worksheet.Range("p3").Value = "季报净利润"
    worksheet.Range("q3").Value = "季报投资收益"
    worksheet.Range("r3").Value = "季报其他业务收支净额"
    worksheet.Range("s3").Value = "非主营收入占利润比"
    worksheet.Range("t3").Value = "2014净利润"
    worksheet.Range("u3").Value = "2015净利润"
    worksheet.Range("v3").Value = "2016净利润"
    worksheet.Range("w3").Value = "2015净利润增长率"
    worksheet.Range("x3").Value = "2016净利润增长率"
    worksheet.Range("y3").Value = "三年净利为正且16增长率大于15"
    worksheet.Range("z3").Value = "流动资产"
    worksheet.Range("aa3").Value = "存货"
    worksheet.Range("ab3").Value = "货币资金"
    worksheet.Range("ac3").Value = "流动负债"
    worksheet.Range("ad3").Value = "资产总额"
    worksheet.Range("ae3").Value = "负债总额"
    worksheet.Range("af3").Value = "流动比率"
    worksheet.Range("ag3").Value = "速动比率"
    worksheet.Range("ah3").Value = "现金率"
    worksheet.Range("ai3").Value = "资产负债率"
    worksheet.Range("aj3").Value = "销售费用"
    worksheet.Range("ak3").Value = "管理费用"
    worksheet.Range("al3").Value = "财务费用"
    worksheet.Range("am3").Value = "主营业务净利率"
    worksheet.Range("an3").Value = "主营业务毛利率"
    worksheet.Range("ao3").Value = "净资产收益率"
    worksheet.Range("ap3").Value = "经营现金流净额"
    worksheet.Range("aq3").Value = "存货/利润"
    worksheet.Range("ar3").Value = "盈利能力"
    worksheet.Range("as3").Value = "偿债能力"
    worksheet.Range("at3").Value = "营运能力"
    worksheet.Range("au3").Value = "业绩预估"
    worksheet.Range("av3").Value = "业绩成长"
    worksheet.Range("aw3").Value = "估值"
    worksheet.Range("ax3").Value = "交易性金融资产"
    worksheet.Range("ay3").Value = "主营业务收入"
    worksheet.Range("az3").Value = "投资收益"
    worksheet.Range("ba3").Value = "主营业务成本"
    worksheet.Range("bb3").Value = "季报主营业务收入"
    worksheet.Range("bc3").Value = "2015主营业务收入"
    worksheet.Range("bd3").Value = "2014主营业务收入"
    worksheet.Range("be3").Value = "季报总资产"
    worksheet.Range("bf3").Value = "2016总资产"
    worksheet.Range("bg3").Value = "2015总资产"
    worksheet.Range("bh3").Value = "2014总资产"
    worksheet.Range("bi3").Value = "季报在建工程"
    worksheet.Range("bj3").Value = "2016在建工程"
    worksheet.Range("bk3").Value = "2015在建工程"
    worksheet.Range("bl3").Value = "2014在建工程"
    

    五、然后写一个循环,我对循环学得不好,只好用比较笨的办法写,这里其实应该能写成一两行,记得在前面给x赋值哟,并通过一个if语句判断是深市还是呼市的股票:

    #循环开始
    while x <= $nword
    #puts "轮询中:"
    stock_lines = IO.readlines("d:/rb/stock/stock.txt");  
    s_code = stock_lines[x]
    s_code_s = s_code.chomp  # chomp用来删除文本里带过来的换行符
    if s_code_s >  "600000"
      scode = "sh" + s_code_s
    elsif  
      s_code_s ==  "600000"
      scode = "sh" + s_code_s
    else
      scode = "sz" + s_code_s
      
    end
    

    六、通过腾讯证券接口读实时股票信息,在此我主要是用来计算当前市值、市盈率和市净率:

    doc1 = Hpricot(open('http://qt.gtimg.cn/q=' + "#{scode}"))  #调用腾讯股票实时接口
    #puts doc
    
    out_file1 = open('d:/rb/stock/temp1.txt', 'w') 
    out_file1.write(doc1) 
    out_file1.close
    
    str1 = IO.read("d:/rb/stock/temp1.txt");  
      #puts str.length;  
      #puts str[0,30] 
    str1.split(/~/)
    arr1 = str1.split(/~/)
    #定义股票接口对应的字段名
    s_name = arr1[1] #股票名
    s_number = arr1[2]
    s_current_price = arr1[3] #当前价格
    s_closing_price = arr1[4] #昨收 
    s_opening_price = arr1[5] #今开
    s_ltsz = arr1[44]
    

    七、读取全市场全部报表数据,这个过程比较漫长,通常几个小时:

    #读取主要财务指标表
    ##定义财务指标表存储路径
    file_zycwzb_path = "d:/stock/zycwzb/"
    file_zycwzb_name = s_code_s + "zycwzb.csv"
    filename_zycwzb = file_zycwzb_path + file_zycwzb_name
    #读取财务指标表
    zycwzb_lines = IO.readlines(filename_zycwzb);  
    
    #读取资产负债表
    ##定义资产负债表存储路径
    file_zcfzb_path = "d:/stock/zcfzb/"
    file_zcfzb_name = s_code_s + "zcfzb.csv"
    filename_zcfzb = file_zcfzb_path + file_zcfzb_name
    #读取资产负债表
    zcfzb_lines = IO.readlines(filename_zcfzb);  
    
    #读取利润表
    ##定义利润表存储路径
    file_lrb_path = "d:/stock/lrb/"
    file_lrb_name = s_code_s + "lrb.csv"
    filename_lrb = file_lrb_path + file_lrb_name
    #读取利润债表
    lrb_lines = IO.readlines(filename_lrb);  
    
    #读取现金流量表
    ##定义现金流量表存储路径
    file_xjllb_path = "d:/stock/xjllb/"
    file_xjllb_name = s_code_s + "xjllb.csv"
    filename_xjllb = file_xjllb_path + file_xjllb_name
    #读取利润债表
    xjllb_lines = IO.readlines(filename_xjllb);  
    
    #判断是否存在最新季报,如果存在,则读取相关列并定义变量
    if zycwzb_lines[0] = s_report_date 
      puts "股票代码:" + s_code
      puts "报表日期:" + zycwzb_lines[0]
      #puts stock_lines[1]
    #主要财务指标表部分
    #读取基本每股收益
      arr_jbmgsy = zycwzb_lines[1]
      #取数格式如下:
      #arr_jbmgsy.split(/,/)
      #puts "#{s_report_date}基本每股收益:" + arr_jbmgsy.split(/,/)[1]
      #puts "年报基本每股收益:" + arr_jbmgsy.split(/,/)[2]
     # puts "上年同期基本每股收益:" + arr_jbmgsy.split(/,/)[5]
    #读取每股净资产
      arr_mgjzc = zycwzb_lines[2]
    #读取每股经营活动产生的现金流量净额
      arr_mgjyxjllje = zycwzb_lines[3]
    #读取主营业务收入
      arr_zyywsr = zycwzb_lines[4]
    #读取主营业务利润
      arr_mzyywlr = zycwzb_lines[5]
    #读取营业利润
      arr_yylr = zycwzb_lines[6]
    #读取投资收益
      arr_tzsy = zycwzb_lines[7]
    #读取营业外收支净额
      arr_yywszje = zycwzb_lines[8]
    #读取利润总额
      arr_lrze = zycwzb_lines[9]
    #puts "利润总额:" + arr_lrze.split(/,/)[1]
    #读取净利润
      arr_jlr = zycwzb_lines[10]
    #读取净利润(扣除非经常性损益后)
      arr_jlrkc = zycwzb_lines[11]
    #读取经营活动产生的现金流量净额
      arr_jyhdxjll = zycwzb_lines[12]
    #读取现金及现金等价物净增加额
      arr_xjje = zycwzb_lines[13]
    #读取总资产
      arr_zzc = zycwzb_lines[14]
    #读取流动资产
      arr_ldzc = zycwzb_lines[15]
    #读取总负债
      arr_zfz = zycwzb_lines[16]
    #读取流动负债
      arr_ldfz = zycwzb_lines[17]  
    #读取股东权益
      arr_gdqy = zycwzb_lines[18]  
    #读取净资产收益率加权
      arr_jzcsyl = zycwzb_lines[19]  
    
    #资产负债表部分
    #读取货币资金
    arr_hbzj = zcfzb_lines[1]
    #读取应收票据
    arr_yspj = zcfzb_lines[6]
    #读取应收账款
    arr_yszk = zcfzb_lines[7]
    #读取存货
    arr_ch = zcfzb_lines[20]
    #读取在建工程
    arr_zjgc = zcfzb_lines[38]
    #读取交易性金融资产
    arr_jyxjrzc  = zcfzb_lines[4]
    
    #利润表部分
    #读取销售费用
    arr_xsfy = lrb_lines[21]
    #读取管理费用
    arr_glfy = lrb_lines[22]
    #读取财务费用
    arr_cwfy = lrb_lines[23]
    #读取主营业务成本
    arr_zyywcb = lrb_lines[9]
    
    

    八、将读取到的Excel数据逐行写入创建的Excel表

    puts "#{s_number}正在写入excel表... ..."
    excel = WIN32OLE::new('Excel.Application')
    workbook = excel.Workbooks.Open(file_name_path)
    worksheet = workbook.Worksheets(1) #定位到第一个sheet
    worksheet.Select
    #写入
    worksheet.Range("a#{x + 3}").Value = "'" + s_number
    worksheet.Range("b#{x + 3}").Value = s_name 
    #worksheet.Range("d#{x + 3}").Value = s_current_price
    #worksheet.Range("f#{x + 3}").Value = s_closing_price
    worksheet.Range("g#{x + 3}").Value = s_current_price
    worksheet.Range("h#{x + 3}").Value = arr1[44]
    worksheet.Range("i#{x + 3}").Value = arr1[39]
    worksheet.Range("j#{x + 3}").Value = arr1[46]
    worksheet.Range("k#{x + 3}").Value = arr_lrze.split(/,/)[1]
    worksheet.Range("l#{x + 3}").Value = arr_lrze.split(/,/)[2]
    worksheet.Range("m#{x + 3}").Value = arr_lrze.split(/,/)[5]
    worksheet.Range("n#{x + 3}").Value = "=(k#{x + 3}-m#{x + 3})/abs(m#{x + 3})"
    worksheet.Range("o#{x + 3}").Value = "=(k#{x + 3}-(l#{x + 3}/4))/abs(m#{x + 3})"
    worksheet.Range("p#{x + 3}").Value = arr_jlr.split(/,/)[1]
    worksheet.Range("q#{x + 3}").Value = arr_tzsy.split(/,/)[1]
    worksheet.Range("r#{x + 3}").Value = arr_yywszje.split(/,/)[1]
    worksheet.Range("s#{x + 3}").Value = "=(q#{x + 3}+r#{x + 3})/p#{x + 3}"
    worksheet.Range("t#{x + 3}").Value = arr_jlr.split(/,/)[10]  #2014年净利润
    worksheet.Range("u#{x + 3}").Value = arr_jlr.split(/,/)[6]  #2015年净利润
    worksheet.Range("v#{x + 3}").Value = arr_jlr.split(/,/)[2]  #2016年净利润
    worksheet.Range("w#{x + 3}").Value = "=(u#{x + 3}-t#{x + 3})/t#{x + 3}"
    worksheet.Range("x#{x + 3}").Value = "=(v#{x + 3}-u#{x + 3})/u#{x + 3}"
    worksheet.Range("y#{x + 3}").Value = "=and(t#{x + 3}>0,u#{x + 3}>0,v#{x + 3}>0,x#{x + 3}>w#{x + 3})"
    worksheet.Range("z#{x + 3}").Value = arr_ldzc.split(/,/)[2] 
    worksheet.Range("aa#{x + 3}").Value = arr_ch.split(/,/)[2] 
    worksheet.Range("ab#{x + 3}").Value = arr_hbzj.split(/,/)[2] 
    worksheet.Range("ac#{x + 3}").Value = arr_ldfz.split(/,/)[2] 
    worksheet.Range("ad#{x + 3}").Value = arr_zzc.split(/,/)[2] 
    worksheet.Range("ae#{x + 3}").Value = arr_zfz.split(/,/)[2] 
    worksheet.Range("af#{x + 3}").Value = "=z#{x + 3}/ac#{x + 3}"
    worksheet.Range("ag#{x + 3}").Value = "=(z#{x + 3}-aa#{x + 3})/ac#{x + 3}"
    worksheet.Range("ah#{x + 3}").Value = "=(ab#{x + 3}+ax#{x + 3})/ac#{x + 3}"
    worksheet.Range("ai#{x + 3}").Value = "=ad#{x + 3}/ae#{x + 3}"
    worksheet.Range("aj#{x + 3}").Value = arr_xsfy.split(/,/)[2] 
    worksheet.Range("ak#{x + 3}").Value = arr_glfy.split(/,/)[2] 
    worksheet.Range("al#{x + 3}").Value = arr_cwfy.split(/,/)[2] 
    worksheet.Range("am#{x + 3}").Value = "=v#{x + 3}/ay#{x + 3}"
    worksheet.Range("an#{x + 3}").Value = "=(ay#{x + 3}-ba#{x + 3})/ay#{x + 3}"
    
    worksheet.Range("ap#{x + 3}").Value = arr_jyhdxjll.split(/,/)[2] 
    worksheet.Range("aq#{x + 3}").Value = "=aa#{x + 3}/v#{x + 3}"
    
    
    worksheet.Range("ax#{x + 3}").Value = arr_jyxjrzc.split(/,/)[2] 
    worksheet.Range("ay#{x + 3}").Value = arr_zyywsr.split(/,/)[2] 
    worksheet.Range("az#{x + 3}").Value = arr_tzsy.split(/,/)[2]
    worksheet.Range("ba#{x + 3}").Value = arr_zyywcb.split(/,/)[2]
    worksheet.Range("bb#{x + 3}").Value = arr_zyywsr.split(/,/)[1]
    worksheet.Range("bc#{x + 3}").Value = arr_zyywsr.split(/,/)[6]
    worksheet.Range("bd#{x + 3}").Value = arr_zyywsr.split(/,/)[10]
    worksheet.Range("be#{x + 3}").Value = arr_zzc.split(/,/)[1]
    worksheet.Range("bf#{x + 3}").Value = arr_zzc.split(/,/)[2]
    worksheet.Range("bg#{x + 3}").Value = arr_zzc.split(/,/)[6]
    worksheet.Range("bh#{x + 3}").Value = arr_zzc.split(/,/)[10]
    worksheet.Range("bi#{x + 3}").Value = arr_zjgc.split(/,/)[1]
    worksheet.Range("bj#{x + 3}").Value = arr_zjgc.split(/,/)[2]
    worksheet.Range("bk#{x + 3}").Value = arr_zjgc.split(/,/)[6]
    worksheet.Range("bl#{x + 3}").Value = arr_zjgc.split(/,/)[10]
    
    workbook.Close(1)
    excel.Quit
    

    循环结束,记得用x = x+1和end。
    实际上,此类指标很多收费软件都可以实现,比如wind,但是wind动辄六万块的年费,不是一般爱好者能玩的。
    通过学习编程来实现,既熟悉了数据的来龙去脉,又能把感兴趣的指标分析出来,何乐不为?

    程序执行过程:


    程序最终执行结果:

    相关文章

      网友评论

          本文标题:数据分析师之路(四):用Ruby批量处理Excel

          本文链接:https://www.haomeiwen.com/subject/uiejkxtx.html