美文网首页
pandas实战

pandas实战

作者: 程序员小白成长记 | 来源:发表于2022-09-01 15:43 被阅读0次

一、pandas读取csv文件

import pandas as pd
f = open("C:\\Users\\matrix\\lz_business_advisor_dashboard_key_metrics_source.csv", encoding = 'utf-8')
pd.read_csv(f)

参考
【1】详解pandas的read_csv方法:https://blog.csdn.net/weixin_37706204/article/details/120827141

二、pandas读取excel文件

import pandas as pd
data = pd.read_excel("C:\\Users\\matrix\\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, index_col = 0, nrows = 3)
data

其中,
sheet_name:第几个sheet页,是从0开始的
header:第几行是header
index_col :其实列
nrows:取多少行

三、pandas在指定列添加一列

import pandas as pd
df = pd.read_excel("C:\\Users\\matrix\\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5,  nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
df

四、pandas写入excel文件

import pandas as pd
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5,  nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
df.to_excel(r"C:\Users\matrix\test.xlsx")

五、pandas写入数据库

import pandas as pd
import pymysql
from sqlalchemy import create_engine
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5,  nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
# create conn
conn = create_engine('mysql+pymysql://root:123456@localhost:3306/test', encoding = 'utf8')
# write db
insert_rows = pd.io.sql.to_sql(df, "test_table", conn, if_exists = 'replace')
print(insert_rows)
# df.to_excel(r"C:\Users\matrix\test.xlsx")

参考:pandas 写入mysql数据库.to_sql方法详解

六、pandas处理表头

import pandas as pd
import pymysql
from sqlalchemy import create_engine
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5,  nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
df = pd.DataFrame(df, columns = ['date_of_data','Date'])
df.columns = (['date_of_data', 'date'])
df

七、pandas正则

replace方法不支持正则,需要用sub方法
demo: 将所有表头除了数字、字母和下划线以外的字符全部替换为下划线,并且全部字符小写

import pandas as pd
import re
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
# df = pd.DataFrame(df, columns = ['date_of_data','Date'])
field_list = []
# regex1 = re.compile(r'[^A-Za-z0-9]')
# regex2 = re.compile(r'_+')
for field in df.columns:
    # field = field.replace(' ', '')
    # field = regex1.sub('_', field)
    # transform except 'A-Za-z0-9' char to '_'
    field = re.sub(r'[^A-Za-z0-9]', r'_', field)
    # transform multiple '_' char to '_'
    field = re.sub(r'_+', r'_', field)
    # lower all char
    field = field.lower()
    # field = regex2.sub('#', field)
    field_list.append(field)
df.columns = tuple(field_list)
df

八、pandas将所有类改为str类型

import pandas as pd
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df[:] = df[:].astype(str)
df.dtypes

参考:在Pandas中更改列的数据类型

九、pandas写入数据库列类型

insert_rows = pd.io.sql.to_sql(df, "test_table", conn, if_exists = 'replace', index=False,
                               dtype={col_name: VARCHAR(500) for col_name in df}
                              )
import pandas as pd
import pymysql
import sqlalchemy
from sqlalchemy import create_engine
import re
from sqlalchemy.types import VARCHAR 
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
# df = pd.DataFrame(df, columns = ['date_of_data','Date'])
field_list = []
# regex1 = re.compile(r'[^A-Za-z0-9]')
# regex2 = re.compile(r'_+')
for field in df.columns:
    # field = field.replace(' ', '')
    # field = regex1.sub('_', field)
    # transform except 'A-Za-z0-9' char to '_'
    field = re.sub(r'[^A-Za-z0-9]', r'_', field)
    # transform multiple '_' char to '_'
    field = re.sub(r'_+', r'_', field)
    # lower all char
    field = field.lower()
    # field = regex2.sub('#', field)
    field_list.append(field)
df.columns = tuple(field_list)
# transform all columns to str type, str map to db text
# df[:] = df[:].astype(str)
# df.dtypes
# df[field_list]
# create conn
conn_string = 'mysql+pymysql://root:123456@localhost:3306/test'
conn = create_engine(conn_string, encoding = 'utf8')
# write db
insert_rows = pd.io.sql.to_sql(df, "test_table", conn, if_exists = 'replace', index=False,
                               dtype={col_name: VARCHAR(500) for col_name in df}
                              )
print(insert_rows)

参考:
[1] Pandas to_sql将列类型从varchar更改为text
[2] pandas to_sql all columns as nvarchar (可行方案)

相关文章

网友评论

      本文标题:pandas实战

      本文链接:https://www.haomeiwen.com/subject/ykmzirtx.html