美文网首页pythonDATA ANALYSIS PROCESS
整洁数据:汇总多列(melt)&拆分列(pivot)

整洁数据:汇总多列(melt)&拆分列(pivot)

作者: IntoTheVoid | 来源:发表于2018-10-09 14:19 被阅读6次

识别整洁的数据
要使数据整洁,它必须具有:

  • 每个变量作为单独的列。
  • 每行作为单独的观察。
    作为数据科学家,您将遇到以各种不同方式表示的数据,因此在您看到数据时能够识别整洁(或不整洁)数据非常重要。
image.png

例如上图中的df2就不是整洁的数据,因为多个变量都挤在了同一列中.

Melt data

DataFrame using pd.melt(). There are two parameters you should be aware of: id_vars and value_vars. The id_vars represent the columns of the data you do not want to melt (i.e., keep it in its current shape), while the value_vars represent the columns you do wish to melt into rows. By default, if no value_vars are provided, all columns not set in the id_vars will be melted.

假如想让df1变为df2的样子, 代码如下:

# Print the head of airquality
print(airquality.head())

# Melt airquality: airquality_melt
airquality_melt = pd.melt(frame=airquality, id_vars=['Month', 'Day'])

# Print the head of airquality_melt
print(airquality_melt.head())

可以通过如下方式给融合的列自定义列名

# Print the head of airquality
print(airquality.head())

# Melt airquality: airquality_melt
airquality_melt = pd.melt(frame=airquality, id_vars=['Month', 'Day'], var_name='measurement', value_name='reading')

# Print the head of airquality_melt
print(airquality_melt.head())
image.png

Pivot data

pivot data 是melt data 的逆过程


image.png

示例:

# Print the head of airquality_melt
print(airquality_melt.head())

# Pivot airquality_melt: airquality_pivot
airquality_pivot = airquality_melt.pivot_table(index=['Month', 'Day'], columns='measurement', values='reading')

# Print the head of airquality_pivot
print(airquality_pivot.head())
image.png

恢复索引

# Print the index of airquality_pivot
print(airquality_pivot.index)

# Reset the index of airquality_pivot: airquality_pivot_reset
airquality_pivot_reset = airquality_pivot.reset_index()

# Print the new index of airquality_pivot_reset
print(airquality_pivot_reset.index)

# Print the head of airquality_pivot_reset
print(airquality_pivot_reset.head())
image.png
Pivoting duplicate values

pivot_table中传入aggfunc=

Splitting a column with .split() and .get()

对于如下的结构


image.png

如何将Cases_xxxxDeath_xxxx拆成两列一列为case, 一列为xxxx

  • Melt ebola using 'Date' and 'Day' as theid_vars,'type_country' as the var_name, and'counts'as the value_name.
  • Create a column called'str_split' by splitting the'type_country' column of ebola_melt on'_'. Note that you will first have to access the str attribute of type_country before you can use .split().
  • Create a column called'type' by using the .get() method to retrieve index 0of the'str_split' column of ebola_melt.
  • Create a column called'country' by using the.get()method to retrieve index 1 of the'str_split' column of ebola_melt.
  • Print the head of ebola.
# Melt ebola: ebola_melt
ebola_melt = pd.melt(ebola, id_vars=['Date', 'Day'], var_name='type_country', value_name='counts')

# Create the 'str_split' column
ebola_melt['str_split'] = ebola_melt.type_country.str.split('_')

# Create the 'type' column
ebola_melt['type'] = ebola_melt.str_split.str[0]

# Create the 'country' column
ebola_melt['country'] = ebola_melt.str_split.str[1]

# Print the head of ebola_melt
print(ebola_melt.head())

相关文章

网友评论

    本文标题:整洁数据:汇总多列(melt)&拆分列(pivot)

    本文链接:https://www.haomeiwen.com/subject/kzvjaftx.html