美文网首页
pandas实战

pandas实战

作者: 程序员小白成长记 | 来源:发表于2022-09-01 15:43 被阅读0次

    一、pandas读取csv文件

    import pandas as pd
    f = open("C:\\Users\\matrix\\lz_business_advisor_dashboard_key_metrics_source.csv", encoding = 'utf-8')
    pd.read_csv(f)
    

    参考
    【1】详解pandas的read_csv方法:https://blog.csdn.net/weixin_37706204/article/details/120827141

    二、pandas读取excel文件

    import pandas as pd
    data = pd.read_excel("C:\\Users\\matrix\\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, index_col = 0, nrows = 3)
    data
    

    其中,
    sheet_name:第几个sheet页,是从0开始的
    header:第几行是header
    index_col :其实列
    nrows:取多少行

    三、pandas在指定列添加一列

    import pandas as pd
    df = pd.read_excel("C:\\Users\\matrix\\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5,  nrows = 3)
    df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
    df
    

    四、pandas写入excel文件

    import pandas as pd
    df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5,  nrows = 3)
    df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
    df.to_excel(r"C:\Users\matrix\test.xlsx")
    

    五、pandas写入数据库

    import pandas as pd
    import pymysql
    from sqlalchemy import create_engine
    df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5,  nrows = 3)
    df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
    # create conn
    conn = create_engine('mysql+pymysql://root:123456@localhost:3306/test', encoding = 'utf8')
    # write db
    insert_rows = pd.io.sql.to_sql(df, "test_table", conn, if_exists = 'replace')
    print(insert_rows)
    # df.to_excel(r"C:\Users\matrix\test.xlsx")
    

    参考:pandas 写入mysql数据库.to_sql方法详解

    六、pandas处理表头

    import pandas as pd
    import pymysql
    from sqlalchemy import create_engine
    df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5,  nrows = 3)
    df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
    df = pd.DataFrame(df, columns = ['date_of_data','Date'])
    df.columns = (['date_of_data', 'date'])
    df
    

    七、pandas正则

    replace方法不支持正则,需要用sub方法
    demo: 将所有表头除了数字、字母和下划线以外的字符全部替换为下划线,并且全部字符小写

    import pandas as pd
    import re
    df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
    df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
    # df = pd.DataFrame(df, columns = ['date_of_data','Date'])
    field_list = []
    # regex1 = re.compile(r'[^A-Za-z0-9]')
    # regex2 = re.compile(r'_+')
    for field in df.columns:
        # field = field.replace(' ', '')
        # field = regex1.sub('_', field)
        # transform except 'A-Za-z0-9' char to '_'
        field = re.sub(r'[^A-Za-z0-9]', r'_', field)
        # transform multiple '_' char to '_'
        field = re.sub(r'_+', r'_', field)
        # lower all char
        field = field.lower()
        # field = regex2.sub('#', field)
        field_list.append(field)
    df.columns = tuple(field_list)
    df
    

    八、pandas将所有类改为str类型

    import pandas as pd
    df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
    df[:] = df[:].astype(str)
    df.dtypes
    

    参考:在Pandas中更改列的数据类型

    九、pandas写入数据库列类型

    insert_rows = pd.io.sql.to_sql(df, "test_table", conn, if_exists = 'replace', index=False,
                                   dtype={col_name: VARCHAR(500) for col_name in df}
                                  )
    
    import pandas as pd
    import pymysql
    import sqlalchemy
    from sqlalchemy import create_engine
    import re
    from sqlalchemy.types import VARCHAR 
    df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
    df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
    # df = pd.DataFrame(df, columns = ['date_of_data','Date'])
    field_list = []
    # regex1 = re.compile(r'[^A-Za-z0-9]')
    # regex2 = re.compile(r'_+')
    for field in df.columns:
        # field = field.replace(' ', '')
        # field = regex1.sub('_', field)
        # transform except 'A-Za-z0-9' char to '_'
        field = re.sub(r'[^A-Za-z0-9]', r'_', field)
        # transform multiple '_' char to '_'
        field = re.sub(r'_+', r'_', field)
        # lower all char
        field = field.lower()
        # field = regex2.sub('#', field)
        field_list.append(field)
    df.columns = tuple(field_list)
    # transform all columns to str type, str map to db text
    # df[:] = df[:].astype(str)
    # df.dtypes
    # df[field_list]
    # create conn
    conn_string = 'mysql+pymysql://root:123456@localhost:3306/test'
    conn = create_engine(conn_string, encoding = 'utf8')
    # write db
    insert_rows = pd.io.sql.to_sql(df, "test_table", conn, if_exists = 'replace', index=False,
                                   dtype={col_name: VARCHAR(500) for col_name in df}
                                  )
    print(insert_rows)
    
    

    参考:
    [1] Pandas to_sql将列类型从varchar更改为text
    [2] pandas to_sql all columns as nvarchar (可行方案)

    相关文章

      网友评论

          本文标题:pandas实战

          本文链接:https://www.haomeiwen.com/subject/ykmzirtx.html