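In this article, we will learn how to create dummy variables in Python using the Pandas get_dummies() method. Dummy variables (also called binary or indicator variables) are commonly used in statistical analysis as well as in simpler descriptive statistics. Dummy coding can be done automatically with tools such as Python, R, or SPSS.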
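To make the idea concrete before loading the real data, here is a minimal sketch on a small made-up DataFrame. The `toy` frame and its values are assumptions for illustration only and are not taken from Salaries.csv. Passing `dtype=int` is also a choice made here so that the indicator columns show as 0/1 regardless of the pandas version installed.

```python
import pandas as pd

# Hypothetical toy data, not the Salaries.csv dataset used in this article.
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Female", "Male"],
    "salary": [100, 120, 90, 110],
})

# One indicator column per category; dtype=int gives 0/1 values
# (the default dtype differs between pandas versions).
dummies = pd.get_dummies(toy["sex"], dtype=int)
print(dummies)
```

Each row has a 1 in the column matching its category and 0 elsewhere.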
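import pandas as pd

# Path or URL of the salaries dataset (here a local CSV file)
data_url = 'Salaries.csv'
# The first column of the file is used as the row index
df = pd.read_csv(data_url, index_col=0)
print(df.head())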


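# Dummy-code the 'sex' column on its own: one indicator column per category
print(pd.get_dummies(df['sex']).head())

# Dummy-code 'sex' inside the full DataFrame; the original column is
# replaced by indicator columns named sex_<category>
df_dummies = pd.get_dummies(df, columns=['sex'])
print(df_dummies.head())

# Use a custom prefix and separator: columns are named Gender.<category>
df_dummies = pd.get_dummies(df, prefix='Gender', prefix_sep='.',
                            columns=['sex'])
print(df_dummies.head())

# Empty prefix and separator: columns are named after the category alone
df_dummies = pd.get_dummies(df, prefix='', prefix_sep='',
                            columns=['sex'])
print(df_dummies.head())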
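The effect of `prefix` and `prefix_sep` on the resulting column names can be checked on a small example. This is a sketch on made-up data (the `toy` frame is hypothetical, not the Salaries dataset), and `dtype=int` is used only to get 0/1 values.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Female"],
    "salary": [100, 120, 90],
})

# The encoded column is dropped and its indicator columns are appended
# after the remaining columns, named <prefix><prefix_sep><category>.
encoded = pd.get_dummies(toy, columns=["sex"], prefix="Gender",
                         prefix_sep=".", dtype=int)
print(list(encoded.columns))
```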

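# The same steps for a variable with more than two categories: 'rank'
print(pd.get_dummies(df['rank']).head())

# Default naming: rank_<category>
df_dummies = pd.get_dummies(df, columns=['rank'])
print(df_dummies.head())

# Custom prefix and separator: Rank.<category>
df_dummies = pd.get_dummies(df, prefix='Rank', prefix_sep='.',
                            columns=['rank'])
print(df_dummies.head())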


# Dummy-code two columns at once, naming columns after the category alone
df_dummies = pd.get_dummies(df, prefix='', prefix_sep='',
                            columns=['rank', 'sex'])
print(df_dummies.head())

# Dummy-code three columns at once
df_dummies = pd.get_dummies(df, prefix='', prefix_sep='',
                            columns=['rank', 'sex', 'discipline'])
print(df_dummies.head())
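When several columns are encoded with an empty prefix, indicator columns from different variables could end up with the same name if two variables share a category label. One possible way to keep them apart is to pass `prefix` as a dict that maps each column to its own prefix. The sketch below uses made-up data (the `toy` frame is hypothetical), and the prefixes `Rank` and `Gender` are chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
toy = pd.DataFrame({
    "rank": ["Prof", "AsstProf", "Prof"],
    "sex": ["Male", "Female", "Female"],
    "salary": [150, 80, 140],
})

# A dict assigns a separate prefix to each encoded column;
# the default separator "_" is kept.
encoded = pd.get_dummies(toy, columns=["rank", "sex"],
                         prefix={"rank": "Rank", "sex": "Gender"},
                         dtype=int)
print(encoded.head())
```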