[pandas] 胖大师炸鸡 ("Fat Master Fried Chicken")

Author: Silver_42ac | Published: 2021-03-11 11:35


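# Check the shape of a DataFrame (number of rows, number of columns)
# shape is an attribute, not a method, so it takes no parentheses
data_df.shape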

# Convert a DataFrame to a multi-dimensional NumPy array
data_df.values
# Convert a Series to a 1-D NumPy array
data_series.values
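
As a small illustration of the two snippets above, the sketch below builds a throwaway DataFrame (the data and column names are made up for the example) and reads its shape and underlying arrays. It uses `to_numpy()`, which the pandas documentation recommends over `.values`; both return the same data here.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
data_df = pd.DataFrame({"math": [90, 75, 60], "PE": [80, 85, 70]})

# shape is a (rows, columns) tuple
n_rows, n_cols = data_df.shape

# 2-D NumPy array from the whole DataFrame
matrix = data_df.to_numpy()

# 1-D NumPy array from a single column (a Series)
vector = data_df["math"].to_numpy()
```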

#rename
#new_df = new_df.rename(columns={'class': 'class_n'})

# Insert a column (here: a running row number as the first column)
row, col = sample_df.shape
new_lis = list(range(row))
sample_df.insert(0, 'STRID', new_lis)
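
A minimal, self-contained sketch of the same idea, assuming a small made-up `sample_df`. Note that `DataFrame.insert` modifies the DataFrame in place and returns `None`, so its result should not be assigned back.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
sample_df = pd.DataFrame({"Start": [100, 200, 300], "End": [150, 250, 350]})

row, col = sample_df.shape

# Insert a new column named 'STRID' at position 0 (the leftmost column).
# insert() works in place and returns None.
result = sample_df.insert(0, "STRID", list(range(row)))
```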

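# Modify every value in a column
import re  # used by convert_str further down

def str_change(input_str):
    # Add 1 to the numeric ID, zero-pad it to 6 digits, and prepend "xxx"
    new = "xxx" + str(int(input_str) + 1).zfill(6)
    return new

# Assigning the result back overwrites the column, so sample_df itself changes
sample_df['xxxID'] = sample_df['xxxID'].apply(str_change)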
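
To make the effect of `str_change` concrete, here is a short worked example on invented IDs (the input values are assumptions, not data from the original).

```python
import pandas as pd

def str_change(input_str):
    # Add 1 to the numeric ID, zero-pad it to 6 digits, and prepend "xxx"
    return "xxx" + str(int(input_str) + 1).zfill(6)

# Hypothetical IDs stored as strings
sample_df = pd.DataFrame({"xxxID": ["1", "41", "999"]})

# Series.apply calls str_change once per value
sample_df["xxxID"] = sample_df["xxxID"].apply(str_change)

print(sample_df["xxxID"].tolist())
# ['xxx000002', 'xxx000042', 'xxx001000']
```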



# Show the DataFrame's index or its column names
sample_df.index
sample_df.columns

# Drop columns or rows
# axis=0 is the default and drops rows; pass the index labels (integers or strings)
new_df = sample_df.drop(['index1', 'index2', 'index3'], axis=0)

# axis=1 drops columns
new_df = sample_df.drop(['#Chr', 'Start', 'End'], axis=1)
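
The sketch below runs both kinds of drop on a small invented table (the rows and the `Func` column are assumptions). It also shows the `columns=` keyword, which is equivalent to passing `axis=1`. `drop` returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical sample data with string index labels
sample_df = pd.DataFrame(
    {
        "#Chr": ["chr1", "chr2", "chr3"],
        "Start": [1, 2, 3],
        "End": [5, 6, 7],
        "Func": ["exonic", "intronic", "UTR3"],
    },
    index=["index1", "index2", "index3"],
)

# Drop rows by index label (axis=0 is the default)
rows_dropped = sample_df.drop(["index1", "index2"], axis=0)

# Drop columns by name; columns=[...] is equivalent to axis=1
cols_dropped = sample_df.drop(columns=["#Chr", "Start", "End"])
```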


# Combine two columns into a new column
def my_test(a, b):
    return a + b

# With axis=1 the lambda receives one row at a time
df['value'] = df.apply(lambda row: my_test(row['c1'], row['c2']), axis=1)
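
A runnable sketch of the row-wise `apply`, using invented values for `c1` and `c2`. For a simple sum, plain column arithmetic gives the same result and is usually faster; it is shown here only as an alternative, not as the author's method.

```python
import pandas as pd

def my_test(a, b):
    return a + b

# Hypothetical sample data
df = pd.DataFrame({"c1": [1, 2, 3], "c2": [10, 20, 30]})

# Row-wise: the lambda is called once per row
df["value"] = df.apply(lambda row: my_test(row["c1"], row["c2"]), axis=1)

# Vectorized alternative for the same computation
df["value_vec"] = df["c1"] + df["c2"]
```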

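# Normalize the values of a column with a custom function
def convert_str(x):
    if re.search("UTR", x):
        return 'UTR'
    if x == 'intronic':
        return 'intron'
    if x == 'exonic':
        return 'exonic'
    # Keep any other value unchanged; without this line it would become None
    return x

new2_df['Func'] = new2_df['Func'].apply(convert_str)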
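
A short worked example of this kind of value mapping on made-up annotation labels. The fallback `return x` is an assumption of this sketch: it passes unmatched values through unchanged, because a function that falls off the end returns `None`.

```python
import re
import pandas as pd

def convert_str(x):
    if re.search("UTR", x):
        return "UTR"
    if x == "intronic":
        return "intron"
    # Assumed fallback: leave every other label as it is
    return x

# Hypothetical labels for illustration
func = pd.Series(["UTR3", "intronic", "exonic", "UTR5", "splicing"])
converted = func.apply(convert_str)

print(converted.tolist())
# ['UTR', 'intron', 'exonic', 'UTR', 'splicing']
```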

# Reorder the columns of a DataFrame (or select them in a given order)
cols = ['ID', 'Gender', 'ExamYear', 'Class', 'Participated', 'Passed', 'Employed', 'Grade']
new_df = old_df[cols]

# Sort by one or more columns
new2_sort_df = new2_df.sort_values(by='Score', ascending=False)
# ascending=False sorts in descending order
# Variant: Score descending first, then Value ascending as the tie-breaker
new2_sort_df = new2_df.sort_values(by=['Score', 'Value'], ascending=[False, True])
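
The following sketch shows the two-key sort on invented data (the `Name` column and all values are assumptions), so the tie-breaking on `Value` is visible.

```python
import pandas as pd

# Hypothetical sample data with ties in Score
new2_df = pd.DataFrame(
    {
        "Name": ["a", "b", "c", "d"],
        "Score": [90, 85, 90, 85],
        "Value": [2, 1, 1, 3],
    }
)

# Score descending; rows with equal Score are ordered by Value ascending
ranked = new2_df.sort_values(by=["Score", "Value"], ascending=[False, True])

print(ranked["Name"].tolist())
# ['c', 'a', 'b', 'd']
```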


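# Output
out_xls = 'result.xls'
out_xlsx = 'result.xlsx'
# Note: to_csv writes tab-separated text, even though the file is named .xls
new2_df.to_csv(out_xls, sep="\t", index=False)
# to_excel writes a real Excel workbook (requires an Excel engine such as openpyxl)
new2_sort_df.to_excel(out_xlsx, index=False)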
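
To check what the tab-separated output looks like without touching the disk, `to_csv` can also write to an in-memory buffer. This is only a sketch with invented data; the buffer stands in for the output file.

```python
import io
import pandas as pd

# Hypothetical sample data
new2_df = pd.DataFrame({"ID": [1, 2], "Score": [90, 85]})

# Write tab-separated text into a string buffer instead of a file
buffer = io.StringIO()
new2_df.to_csv(buffer, sep="\t", index=False)

# One entry per line: the header first, then one line per row
lines = buffer.getvalue().splitlines()
```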

# Add a new column holding a row-wise statistic (max, mean, sum, ...) over several columns
feature_df["max"] = feature_df[["math", "Chinese", "PE"]].max(axis=1)
feature_df["mean"] = feature_df[["math", "Chinese", "PE"]].mean(axis=1)
feature_df["sum"] = feature_df[["math", "Chinese", "PE"]].sum(axis=1)

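# Variant: select by position. [:, 8:20] means all rows and the 9th through 20th columns
# (0-based positions 8 to 19; the end of the slice is exclusive)
feature_df["sum"] = feature_df.iloc[:, 8:20].sum(axis=1)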
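
A compact worked example of the row-wise statistics, by column name and by position. The table is invented; with only four columns the positional slice is `1:4` rather than `8:20`.

```python
import pandas as pd

# Hypothetical sample data
feature_df = pd.DataFrame(
    {
        "name": ["Ann", "Bob"],
        "math": [90, 60],
        "Chinese": [80, 70],
        "PE": [70, 95],
    }
)

subjects = ["math", "Chinese", "PE"]

# axis=1 aggregates across the columns of each row
feature_df["max"] = feature_df[subjects].max(axis=1)
feature_df["mean"] = feature_df[subjects].mean(axis=1)

# Positions 1, 2 and 3 are math, Chinese and PE; the slice end (4) is exclusive
feature_df["sum"] = feature_df.iloc[:, 1:4].sum(axis=1)
```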


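# Remove duplicate rows (selecting several columns needs double brackets)
# drop_duplicates returns a new DataFrame; assign it to keep the result
New_df[['X', 'A', 'B']].drop_duplicates()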
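
The sketch below contrasts two ways of de-duplicating on invented data (the extra column `C` is an assumption). Selecting the columns first returns only those columns. The `subset=` parameter keeps every column and compares only the listed ones.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 1 agree on X, A and B
New_df = pd.DataFrame(
    {
        "X": [1, 1, 2, 2],
        "A": ["a", "a", "b", "b"],
        "B": [10, 10, 20, 30],
        "C": [0.1, 0.2, 0.3, 0.4],
    }
)

# Unique combinations of X, A, B (column C is not in the result)
unique_xab = New_df[["X", "A", "B"]].drop_duplicates()

# Keep all columns, but treat rows as duplicates when X, A, B match
first_per_key = New_df.drop_duplicates(subset=["X", "A", "B"], keep="first")
```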
