Pythonic Data Cleaning With NumPy and Pandas

Author: needrunning | Published: 2019-05-16 01:37

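This is the third part of a translation of an article on data cleaning with Python. The contents of the full translated article are outlined below.

The table of contents in the figure below lists some common data cleaning tasks. This part covers: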

Renaming Columns and Skipping Rows

Python Data Cleaning: Recap and Resources

[Figure: data cleaning table of contents]

Original article

Pythonic Data Cleaning With NumPy and Pandas[1]

Renaming Columns and Skipping Rows

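First, let's examine the raw dataset. In the raw file, the first line does not contain the real column names; they sit on the second line.

[Figure: raw dataset]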

Therefore, we need to do two things:

Skip one row and set the header as the first (0-indexed) row
Rename the columns

Skip the first row by passing the header parameter to read_csv():

olympics_df = pd.read_csv('datasets/python-data-cleaning-master/olympics.csv', header=1)
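
To see what header=1 does without the olympics.csv file at hand, here is a minimal sketch that reads a small in-memory CSV. The sample text, country names, and numbers are invented for illustration; only pd.read_csv() and its header parameter come from the text above, and pandas is assumed to be installed.

```python
import io

import pandas as pd

# Hypothetical sample: the first line holds meaningless positional labels,
# the second line holds the real column names.
raw_csv = """0,1,2
Country,Gold,Silver
Atlantis,3,5
Lemuria,1,0
"""

# header=1 uses the second line (index 1) as the header;
# the line before it is discarded.
sample_df = pd.read_csv(io.StringIO(raw_csv), header=1)

print(list(sample_df.columns))
print(len(sample_df))
```

With the default header=0, the line of positional labels would become the column names and the real names would end up as a data row.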

Rename the columns by passing a dictionary that maps each old column name to its new name:

new_names = {'Unnamed: 0': 'Country',
             # remaining 'old name': 'new name' pairs are omitted here
             }
olympics_df.rename(columns=new_names, inplace=True)
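
The renaming step can be tried on a small stand-in DataFrame. This is only a sketch: the data and every mapping except 'Unnamed: 0' to 'Country' are assumptions made up for the example, not the contents of the real dataset.

```python
import pandas as pd

# Hypothetical frame with awkward column names, like those a messy CSV header produces.
medals = pd.DataFrame({
    "Unnamed: 0": ["Atlantis", "Lemuria"],
    "01 !": [3, 1],
    "02 !": [5, 0],
})

# Map each old column name to its replacement; names not listed stay unchanged.
# Only 'Unnamed: 0' -> 'Country' comes from the text; the other pairs are assumed.
new_names = {
    "Unnamed: 0": "Country",
    "01 !": "Gold",
    "02 !": "Silver",
}

# Without inplace=True, rename() returns a new DataFrame and leaves the original untouched.
renamed = medals.rename(columns=new_names)

print(list(renamed.columns))
```

Returning a new object instead of using inplace=True keeps the original frame available for comparison, which can be handy while a mapping is still being worked out.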

Python Data Cleaning: Recap and Resources

In this tutorial, you learned how you can drop unnecessary information from a dataset using the drop() function, as well as how to set an index for your dataset so that items in it can be referenced easily.
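
As a quick refresher on those two steps, the sketch below drops a column and sets an index on a tiny made-up table. The column names and values are hypothetical and are not taken from the tutorial's dataset.

```python
import pandas as pd

# Hypothetical records; column names are invented for this sketch.
books = pd.DataFrame({
    "Identifier": [206, 216, 218],
    "Title": ["Book A", "Book B", "Book C"],
    "Shelfmarks": ["S1", "S2", "S3"],
})

# Drop a column that is not needed for the analysis.
books = books.drop(columns=["Shelfmarks"])

# Use the unique identifier as the index so rows can be looked up by label.
books = books.set_index("Identifier")

print(books.loc[216, "Title"])
```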

Moreover, you learned how to clean object fields with the .str accessor and how to clean the entire dataset element by element using the applymap() method (available as DataFrame.map() since pandas 2.1). Lastly, you saw how to skip rows in a CSV file and rename columns using the rename() method.
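
Both ideas can be sketched on a small invented table. The strings below are hypothetical sample values, strip_parenthetical is a helper written for this example, and the version check is one assumed way to cope with applymap() being deprecated in newer pandas releases.

```python
import pandas as pd

# Hypothetical messy strings, invented for this sketch.
towns = pd.DataFrame({
    "State": ["Alabama[edit]", "Alaska[edit]"],
    "RegionName": ["Auburn (Auburn University)", "Fairbanks (University of Alaska)"],
})

# .str is an attribute, not a method call; its methods act on each string in the column.
towns["State"] = towns["State"].str.replace("[edit]", "", regex=False)


def strip_parenthetical(value):
    """Return the text before ' (' when present, otherwise the value unchanged."""
    if isinstance(value, str) and " (" in value:
        return value[: value.find(" (")]
    return value


# Apply the function to every cell; use DataFrame.map where it exists (pandas 2.1+).
if hasattr(pd.DataFrame, "map"):
    cleaned = towns.map(strip_parenthetical)
else:
    cleaned = towns.applymap(strip_parenthetical)

print(cleaned)
```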

Knowing about data cleaning is very important, because it is a big part of data science. You now have a basic understanding of how Pandas and NumPy can be leveraged to clean datasets!

Check out the links below to find additional resources that will help you on your Python data science journey:

  • The Pandas documentation[2]
  • The NumPy documentation[3]
  • Python for Data Analysis[4] by Wes McKinney, the creator of Pandas
  • Pandas Cookbook[5] by Ted Petrou, a data science trainer and consultant

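Translation Summary

The translation of the full article was split into three parts and took three weeks in total. The article is an introduction to data processing in Python and covers the basics used in day-to-day work. The translation can serve as a knowledge index to come back to when needed.

I also found that https://realpython.com[6] is an excellent English-language site for learning Python. I plan to keep translating Python articles from this site to build up knowledge and get familiar with Python step by step.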

References

[1] Pythonic Data Cleaning With NumPy and Pandas: https://realpython.com/python-data-cleaning-numpy-pandas/

[2] The Pandas documentation: https://pandas.pydata.org/pandas-docs/stable/index.html

[3] The NumPy documentation: https://docs.scipy.org/doc/numpy/reference/

[4] Python for Data Analysis: https://realpython.com/asins/1491957662/

[5] Pandas Cookbook: https://realpython.com/asins/B06W2LXLQK/

[6] https://realpython.com: https://realpython.com/
