Interacting with CSV Files in Python

Author: 且行歌 | Source: published 2018-02-06 23:12, read 176 times

    Reading and writing data

    Reader functions

    Function        Description
    read_csv        Load delimited data from a file, URL, or file-like object; uses comma as the default delimiter
    read_table      Load delimited data from a file, URL, or file-like object; uses tab ('\t') as the default delimiter
    read_fwf        Read data in fixed-width column format (i.e., no delimiters)
    read_clipboard  Version of read_table that reads data from the clipboard; useful for converting tables from web pages
    read_excel      Read tabular data from an Excel XLS or XLSX file
    read_hdf        Read HDF5 files written by pandas
    read_html       Read all tables found in the given HTML document
    read_json       Read data from a JSON (JavaScript Object Notation) string representation
    read_msgpack    Read pandas data encoded using the MessagePack binary format (removed in pandas 1.0)
    read_pickle     Read an arbitrary object stored in Python pickle format
    read_sas        Read a SAS dataset stored in one of the SAS system's custom storage formats
    read_sql        Read the results of a SQL query (using SQLAlchemy) as a pandas DataFrame
    read_stata      Read a dataset from Stata file format
    read_feather    Read the Feather binary file format

    import pandas as pd
    df = pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex1.csv')
    
    df
    

       a   b   c   d message
    0  1   2   3   4   hello
    1  5   6   7   8   world
    2  9  10  11  12     foo

    pd.read_table('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex1.csv', sep=',')  # specify the delimiter explicitly
    

       a   b   c   d message
    0  1   2   3   4   hello
    1  5   6   7   8   world
    2  9  10  11  12     foo
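
As a rough check of the two calls above, the sketch below reads the same comma-separated text with `read_csv` and with `read_table(sep=',')` and compares the results. The inline string is a hypothetical stand-in for `ex1.csv`, wrapped in `io.StringIO` so the snippet runs without the book's example files.

```python
import io

import pandas as pd

# Hypothetical inline data standing in for ex1.csv
csv_text = (
    "a,b,c,d,message\n"
    "1,2,3,4,hello\n"
    "5,6,7,8,world\n"
    "9,10,11,12,foo\n"
)

# read_csv defaults to a comma delimiter
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_table defaults to a tab delimiter, so the comma must be passed explicitly
df_table = pd.read_table(io.StringIO(csv_text), sep=",")

# Both calls should produce the same DataFrame
print(df_csv.equals(df_table))
```
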

    Files without a header row

    pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex2.csv',header=None)
    

       0   1   2   3      4
    0  1   2   3   4  hello
    1  5   6   7   8  world
    2  9  10  11  12    foo

    Specifying column names

    pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex2.csv',names=['a','b','c','d','message'])
    

       a   b   c   d message
    0  1   2   3   4   hello
    1  5   6   7   8   world
    2  9  10  11  12     foo
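
The difference between `header=None` and `names=` can be seen side by side in a small sketch. The inline text is a hypothetical stand-in for `ex2.csv` (a file with no header row); the variable names `no_header` and `named` are illustrative only.

```python
import io

import pandas as pd

# Hypothetical inline data standing in for ex2.csv (no header row)
raw = "1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"

# header=None: pandas assigns integer column labels 0..4
no_header = pd.read_csv(io.StringIO(raw), header=None)

# names=...: supply the column labels yourself
named = pd.read_csv(io.StringIO(raw), names=["a", "b", "c", "d", "message"])

print(list(no_header.columns))
print(list(named.columns))
```

In both cases the first line of the file is treated as data, not as a header.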

    Using a column as the row index

    names = ['a','b','c','d','message']
    
    pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex2.csv',
                names = names,index_col = 'message')
    

             a   b   c   d
    message               
    hello    1   2   3   4
    world    5   6   7   8
    foo      9  10  11  12

    Hierarchical index

    parsed = pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/csv_mindex.csv',
                         index_col = ['key1','key2'])
    
    parsed
    

               value1  value2
    key1 key2                
    one  a          1       2
         b          3       4
         c          5       6
         d          7       8
    two  a          9      10
         b         11      12
         c         13      14
         d         15      16
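
Once several columns are used as the index, rows can be selected by the outer key alone or by the full key pair. A minimal sketch, assuming a shortened, hypothetical version of `csv_mindex.csv` held in a string:

```python
import io

import pandas as pd

# Hypothetical, shortened stand-in for csv_mindex.csv
raw = (
    "key1,key2,value1,value2\n"
    "one,a,1,2\n"
    "one,b,3,4\n"
    "two,a,9,10\n"
    "two,b,11,12\n"
)

# A list passed to index_col builds a hierarchical (multi-level) index
parsed = pd.read_csv(io.StringIO(raw), index_col=["key1", "key2"])

# Select every row under the outer key 'one'
outer = parsed.loc["one"]
print(outer)

# Select a single cell by the full (key1, key2) pair
cell = parsed.loc[("two", "b"), "value2"]
print(cell)
```
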

    Special cases: whitespace-delimited files

    list(open('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex3.txt'))
    
    ['            A         B         C\n',
     'aaa -0.264438 -1.026059 -0.619500\n',
     'bbb  0.927272  0.302904 -0.032399\n',
     'ccc -0.264273 -0.386314 -0.217601\n',
     'ddd -0.871858 -0.348382  1.100491\n']
    
    # Use a raw string for the regular expression so '\s' is not treated as a string escape
    result = pd.read_table('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex3.txt',
                           sep=r'\s+')
    
    
    result
    
    

                A         B         C
    aaa -0.264438 -1.026059 -0.619500
    bbb  0.927272  0.302904 -0.032399
    ccc -0.264273 -0.386314 -0.217601
    ddd -0.871858 -0.348382  1.100491
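
In the output above the first column became the row index, because the header line has one fewer field than the data lines. The sketch below reproduces this with a hypothetical two-row excerpt of `ex3.txt` held in a string; it uses `read_csv` with a regular-expression separator, which behaves like the `read_table` call here.

```python
import io

import pandas as pd

# Hypothetical excerpt of ex3.txt: fields separated by a variable amount of whitespace
raw = (
    "            A         B         C\n"
    "aaa -0.264438 -1.026059 -0.619500\n"
    "bbb  0.927272  0.302904 -0.032399\n"
)

# r'\s+' matches one or more whitespace characters
result = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# The header has 3 names but each data row has 4 fields,
# so the first field of each row is used as the index
print(list(result.columns))
print(list(result.index))
```
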

    Skipping rows
    pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex4.csv',skiprows=[0,2,3])
    

       a   b   c   d message
    0  1   2   3   4   hello
    1  5   6   7   8   world
    2  9  10  11  12     foo
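
`skiprows` takes zero-based line numbers, so `[0, 2, 3]` drops the first, third, and fourth lines of the file. The following sketch assumes a hypothetical file in which those lines are `#` comments, and compares `skiprows` with the `comment` option as an alternative when the unwanted lines share a marker character.

```python
import io

import pandas as pd

# Hypothetical stand-in for ex4.csv: comment lines at positions 0, 2 and 3
raw = (
    "# hey!\n"
    "a,b,c,d,message\n"
    "# a comment line\n"
    "# another comment line\n"
    "1,2,3,4,hello\n"
    "5,6,7,8,world\n"
    "9,10,11,12,foo\n"
)

# Skip specific lines by their zero-based position in the file
skipped = pd.read_csv(io.StringIO(raw), skiprows=[0, 2, 3])

# Alternatively, ignore every line that starts with the comment character
commented = pd.read_csv(io.StringIO(raw), comment="#")

print(skipped.equals(commented))
```
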

    !cat /Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex5.csv
    
    something,a,b,c,d,message
    one,1,2,3,4,NA
    two,5,6,,8,world
    three,9,10,11,12,foo
    
    r_l = pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex5.csv')
    r_l
    

      something  a   b     c   d message
    0       one  1   2   3.0   4     NaN
    1       two  5   6   NaN   8   world
    2     three  9  10  11.0  12     foo

    Missing values

    pd.isnull(r_l)
    

       something      a      b      c      d  message
    0      False  False  False  False  False     True
    1      False  False  False   True  False    False
    2      False  False  False  False  False    False

    result = pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex5.csv',na_values=['NULL'])
    
    result
    

      something  a   b     c   d message
    0       one  1   2   3.0   4     NaN
    1       two  5   6   NaN   8   world
    2     three  9  10  11.0  12     foo

    Marking specific values as NaN per column
    # Keys must match the column names exactly
    sentinels = {'message':['foo','NA'],
                 'something':['two']}
    
    pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex5.csv',na_values=sentinels)
    

      something  a   b     c   d message
    0       one  1   2   3.0   4     NaN
    1       NaN  5   6   NaN   8   world
    2     three  9  10  11.0  12     NaN
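
To see what a per-column `na_values` dict does, it helps to count the missing values in each column afterwards. A small sketch, with the contents of `ex5.csv` (as printed by `!cat` earlier) held in a string:

```python
import io

import pandas as pd

# Contents of ex5.csv as shown earlier, held in a string
raw = (
    "something,a,b,c,d,message\n"
    "one,1,2,3,4,NA\n"
    "two,5,6,,8,world\n"
    "three,9,10,11,12,foo\n"
)

# Each key is a column name; each value lists the strings to treat as missing there
sentinels = {"message": ["foo", "NA"], "something": ["two"]}

df = pd.read_csv(io.StringIO(raw), na_values=sentinels)

# Count missing values per column
na_counts = df.isnull().sum()
print(na_counts)
```

The empty field in column `c` is still read as NaN, since empty fields count as missing by default.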

    Argument          Description
    path              String indicating filesystem location, URL, or file-like object
    sep or delimiter  Character sequence or regular expression to use to split fields in each row
    header            Row number to use as column names; defaults to 0 (first row), but should be None if there is no header row
    index_col         Column numbers or names to use as the row index in the result; can be a single name/number or a list of them for a hierarchical index
    names             List of column names for result, combine with header=None
    skiprows          Number of rows at beginning of file to ignore or list of row numbers (starting from 0) to skip.
    na_values         Sequence of values to replace with NA.
    comment           Character(s) to split comments off the end of lines.
    parse_dates       Attempt to parse data to datetime; False by default. If True, will attempt to parse all columns. Otherwise can specify a list of column numbers or names to parse. If an element of the list is a tuple or list, will combine multiple columns together and parse to date (e.g., if date/time is split across two columns).
    keep_date_col     If joining columns to parse date, keep the joined columns; False by default.
    converters        Dict mapping column number or name to functions (e.g., {'foo': f} would apply the function f to all values in the 'foo' column).
    dayfirst          When parsing potentially ambiguous dates, treat as international format (e.g., 7/6/2012 -> June 7, 2012); False by default.
    date_parser       Function to use to parse dates.
    nrows             Number of rows to read from beginning of file.
    iterator          Return a TextParser object for reading file piecemeal.
    chunksize         For iteration, size of file chunks.
    skip_footer       Number of lines to ignore at end of file (spelled skipfooter in current pandas).
    verbose           Print various parser output information, like the number of missing values placed in non-numeric columns.
    encoding          Text encoding for Unicode (e.g., 'utf-8' for UTF-8 encoded text).
    squeeze           If the parsed data only contains one column, return a Series.
    thousands         Separator for thousands (e.g., ',' or '.').
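
Several of these arguments can be combined in one call. The sketch below is only an illustration on made-up data (the `date`, `amount`, and `code` columns are hypothetical); it uses `thousands`, `parse_dates`, `converters`, and `nrows` together.

```python
import io

import pandas as pd

# Hypothetical data: quoted amounts with a thousands separator
raw = (
    "date,amount,code\n"
    '2012-06-07,"1,234",ab\n'
    '2012-06-08,"2,500",cd\n'
    '2012-06-09,"10",ef\n'
)

df = pd.read_csv(
    io.StringIO(raw),
    thousands=",",                   # strip ',' when parsing numbers
    parse_dates=["date"],            # parse this column as datetimes
    converters={"code": str.upper},  # apply a function to every value in 'code'
    nrows=2,                         # read only the first two data rows
)

print(df)
print(df.dtypes)
```
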

    Reading files in pieces

    pd.options.display.max_rows = 10
    
    result = pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex6.csv')
    
    result
    

               one       two     three      four  key
    0     0.467976 -0.038649 -0.295344 -1.824726    L
    1    -0.358893  1.404453  0.704965 -0.200638    B
    2    -0.501840  0.659254 -0.421691 -0.057688    G
    3     0.204886  1.074134  1.388361 -0.982404    R
    4     0.354628 -0.133116  0.283763 -0.837063    Q
    ...        ...       ...       ...       ...  ...
    9995  2.311896 -0.417070 -1.409599 -0.515821    L
    9996 -0.479893 -0.650419  0.745152 -0.646038    E
    9997  0.523331  0.787112  0.486066  1.093156    K
    9998 -0.362559  0.598894 -1.843201  0.887292    G
    9999 -0.096376 -1.012999 -0.657431 -0.573315    0

    10000 rows × 5 columns

    Reading only the first few rows

    pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex6.csv',nrows=5)
    

            one       two     three      four key
    0  0.467976 -0.038649 -0.295344 -1.824726   L
    1 -0.358893  1.404453  0.704965 -0.200638   B
    2 -0.501840  0.659254 -0.421691 -0.057688   G
    3  0.204886  1.074134  1.388361 -0.982404   R
    4  0.354628 -0.133116  0.283763 -0.837063   Q

    Iterating with chunksize
    chunker = pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex6.csv',chunksize=1000)
    
    chunker
    
    <pandas.io.parsers.TextFileReader at 0x110ad76a0>
    
    tot = pd.Series([], dtype='float64')  # explicit dtype instead of relying on the default for an empty Series
    for piece in chunker:
        tot = tot.add(piece['key'].value_counts(),fill_value=0)
    
    tot = tot.sort_values(ascending = False)
    
    tot[:10]
    
    E    368.0
    X    364.0
    L    346.0
    O    343.0
    Q    340.0
    M    338.0
    J    337.0
    F    335.0
    K    334.0
    H    330.0
    dtype: float64
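
The same accumulation pattern can be tried on a small in-memory file. This is a minimal sketch on hypothetical data (ten rows with a repeating `key` column), not the book's `ex6.csv`:

```python
import io

import pandas as pd

# Hypothetical data: 10 rows whose 'key' cycles through A, B, C
rows = ["value,key"] + [f"{i},{'ABC'[i % 3]}" for i in range(10)]
raw = "\n".join(rows) + "\n"

# With chunksize set, read_csv returns an iterator of DataFrames
chunker = pd.read_csv(io.StringIO(raw), chunksize=4)

tot = pd.Series([], dtype="float64")
for piece in chunker:
    # Add this chunk's counts to the running total; keys missing on one side count as 0
    tot = tot.add(piece["key"].value_counts(), fill_value=0)

tot = tot.sort_values(ascending=False)
print(tot)
```

Each chunk holds at most four rows here, so only a small part of the file is in memory at once.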
    

    Writing files

    data = pd.read_csv('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex5.csv')
    
    data
    

      something  a   b     c   d message
    0       one  1   2   3.0   4     NaN
    1       two  5   6   NaN   8   world
    2     three  9  10  11.0  12     foo

    data.to_csv('/Users/meininghang/Desktop/out.csv')
    
    !cat /Users/meininghang/Desktop/out.csv
    
    ,something,a,b,c,d,message
    0,one,1,2,3.0,4,
    1,two,5,6,,8,world
    2,three,9,10,11.0,12,foo
    
    import sys
    data.to_csv(sys.stdout,sep='|')  # write with a custom delimiter
    
    |something|a|b|c|d|message
    0|one|1|2|3.0|4|
    1|two|5|6||8|world
    2|three|9|10|11.0|12|foo
    
    Representing missing values
    data.to_csv(sys.stdout,na_rep = 'NULL')
    
    ,something,a,b,c,d,message
    0,one,1,2,3.0,4,NULL
    1,two,5,6,NULL,8,world
    2,three,9,10,11.0,12,foo
    
    Dropping the row index and column header
    data.to_csv(sys.stdout,index = False,header=False)
    
    one,1,2,3.0,4,
    two,5,6,,8,world
    three,9,10,11.0,12,foo
    
    data.to_csv(sys.stdout,index=False,columns=['a','b','c'])
    
    a,b,c
    1,2,3.0
    5,6,
    9,10,11.0
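
When no path is given, `to_csv` returns the CSV text as a string, which makes it easy to check a write/read round trip. A minimal sketch on a small, hypothetical DataFrame:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each of two columns
data = pd.DataFrame(
    {
        "something": ["one", "two"],
        "c": [3.0, np.nan],
        "message": [np.nan, "world"],
    }
)

# No path argument: the CSV text is returned as a string
text = data.to_csv(index=False, na_rep="NULL")
print(text)

# 'NULL' is one of the strings read_csv treats as missing by default
restored = pd.read_csv(io.StringIO(text))
print(restored)
```
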
    

    Exporting a Series

    dates = pd.date_range('1/1/2000',periods=7)
    
    import numpy as np
    ts = pd.Series(np.arange(7),index = dates)
    
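    # header=False reproduces the output below on current pandas, where Series.to_csv writes a header line by default
    ts.to_csv('/Users/meininghang/Desktop/tse.csv', header=False)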
    
    !cat /Users/meininghang/Desktop/tse.csv
    
    2000-01-01,0
    2000-01-02,1
    2000-01-03,2
    2000-01-04,3
    2000-01-05,4
    2000-01-06,5
    2000-01-07,6
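
A headerless file like this can be read back into a Series with a date index. The sketch below is one possible way to do it, writing to a string instead of the Desktop path used above; the name `restored` is illustrative.

```python
import io

import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=7)
ts = pd.Series(np.arange(7), index=dates)

# Write without a header line, returning the CSV text as a string
text = ts.to_csv(header=False)

# Read it back: no header row, first column as a parsed date index
frame = pd.read_csv(io.StringIO(text), header=None, index_col=0, parse_dates=True)

# The single remaining column is labeled 1; selecting it gives a Series
restored = frame[1]
print(restored)
```
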
    

    Working with delimited formats manually

    !cat /Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex7.csv
    
    "a","b","c"
    "1","2","3"
    "1","2","3"
    
    import csv
    
    # Open the file in a with block so it is closed once the rows have been read
    with open('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex7.csv') as f:
        reader = csv.reader(f)
        for li in reader:
            print(li)
    
    ['a', 'b', 'c']
    ['1', '2', '3']
    ['1', '2', '3']
    
    # Step 1: read the file into a list of rows
    with open('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex7.csv') as f:
        li = list(csv.reader(f))
    
    # Step 2: split the header from the data rows
    header,values = li[0],li[1:]
    
    # Step 3: build a dict of columns
    data_dict = {h:v for h,v in zip(header,zip(*values))}  # zip(*values) transposes rows into columns
    data_dict
    
    {'a': ('1', '1'), 'b': ('2', '2'), 'c': ('3', '3')}
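
The `zip(*values)` step is the key trick here, so it is worth looking at on its own. The sketch below repeats the steps on the contents of `ex7.csv` held in a string and, as an extra step not in the original, converts the string fields to integers.

```python
import csv
import io

# Contents of ex7.csv as shown above, held in a string
raw = '"a","b","c"\n"1","2","3"\n"1","2","3"\n'

lines = list(csv.reader(io.StringIO(raw)))
header, values = lines[0], lines[1:]

# zip(*values) unpacks the rows as separate arguments,
# so zip pairs up the first fields, the second fields, and so on
columns = list(zip(*values))
print(columns)

# Extra step (not in the original): convert each field from str to int
data_dict = {h: [int(x) for x in col] for h, col in zip(header, columns)}
print(data_dict)
```
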
    
    # Step 4: define a custom dialect
    class my_dialect(csv.Dialect):
        lineterminator = '\n'
        delimiter = ';'
        quotechar = '"'  # must be exactly one character
        quoting = csv.QUOTE_MINIMAL
    
    # f was closed when the with block above ended; passing the closed handle to csv.reader
    # raises "TypeError: argument 1 must be an iterator", so reopen the file first
    with open('/Users/meininghang/Downloads/pydata-book-2nd-edition/examples/ex7.csv') as f:
        reader = csv.reader(f, dialect=my_dialect)
        lines = list(reader)
    

    Argument          Description
    delimiter         One-character string to separate fields; defaults to ','.
    lineterminator    Line terminator for writing; defaults to '\r\n'. Reader ignores this and recognizes cross-platform line terminators.
    quotechar         Quote character for fields with special characters (like a delimiter); default is '"'.
    quoting           Quoting convention. Options include csv.QUOTE_ALL (quote all fields), csv.QUOTE_MINIMAL (only fields with special characters like the delimiter), csv.QUOTE_NONNUMERIC, and csv.QUOTE_NONE (no quoting). See Python's documentation for full details. Defaults to QUOTE_MINIMAL.
    skipinitialspace  Ignore whitespace after each delimiter; default is False.
    doublequote       How to handle quoting character inside a field; if True, it is doubled (see online documentation for full detail and behavior).
    escapechar        String to escape the delimiter if quoting is set to csv.QUOTE_NONE; disabled by default.

    The counterpart for writing delimited files is csv.writer.
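
A minimal sketch of `csv.writer` with the same kind of custom dialect, writing to an in-memory buffer instead of a file. The rows are made up for illustration; one field deliberately contains the delimiter to show `QUOTE_MINIMAL` at work.

```python
import csv
import io


class my_dialect(csv.Dialect):
    lineterminator = "\n"
    delimiter = ";"
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL


buf = io.StringIO()
writer = csv.writer(buf, dialect=my_dialect)
writer.writerow(("one", "two", "three"))
# This field contains the delimiter, so QUOTE_MINIMAL wraps it in quotes
writer.writerow(("1", "2;x", "3"))

text = buf.getvalue()
print(text)

# Reading the text back with the same dialect recovers the original fields
rows = list(csv.reader(io.StringIO(text), dialect=my_dialect))
print(rows)
```
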

        Article link: https://www.haomeiwen.com/subject/orinzxtx.html