Learning Python: Odds and Ends (Part 2)

Author: GPZ_Lab | Published: 2020-02-21 00:09

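...I had been adding entries one after another to "Learning Python: Odds and Ends", but around entry 62 I wrote something (I don't know what) that got the post blocked once. I don't dare touch it anymore, so I'm starting a new post here.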

  63. pip install XXX keeps failing with ReadTimeoutError: HTTPSConnectionPool(host='....', port=443): Read timed out.
    See https://github.com/pypa/warehouse/issues/3826
    Try raising the timeout: pip install --default-timeout=1000 package_name

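  64. Missing dates: NaT
    df.apply(lambda x: pd.to_datetime(x, errors='coerce'))
    This converts dates stored as strings in df (for example 2020/1/26) to datetime; missing values, and values that cannot be parsed, become NaT.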
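
A minimal sketch of the conversion above, assuming pandas is installed. The column name `date` and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: one valid date, one missing value, one unparseable string
df = pd.DataFrame({"date": ["2020/1/26", None, "not a date"]})

# errors='coerce' turns anything that cannot be parsed into NaT instead of raising
converted = df.apply(lambda col: pd.to_datetime(col, errors="coerce"))

print(converted["date"].isna().tolist())  # which rows ended up as NaT
```

Checking `isna()` afterwards is a quick way to see how many values were lost to coercion.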

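  65. Converting a 13-digit Unix timestamp to a human-readable date
    Unix time is also known as Epoch time or POSIX time.
    It is the number of seconds elapsed since January 1, 1970 (UTC). A 13-digit value is the number of milliseconds since 1970-01-01.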

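from datetime import datetime
dt_object = datetime.fromtimestamp(1581162409463 / 1000)  # milliseconds -> seconds
print(dt_object.strftime("%Y-%m-%d %H:%M:%S"))
print(dt_object)

# Output (fromtimestamp() uses the local time zone; this machine was on UTC+8):
# 2020-02-08 19:46:49
# 2020-02-08 19:46:49.463000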
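
Because `datetime.fromtimestamp()` without a `tz` argument uses the local time zone, the printed result differs between machines. Below is a minimal sketch of a conversion with an explicit time zone, using only the standard library; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

ms = 1581162409463  # 13-digit Unix timestamp in milliseconds

# Add the milliseconds to the epoch as an integer timedelta,
# which avoids floating-point rounding from dividing by 1000
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
dt_utc = epoch + timedelta(milliseconds=ms)
print(dt_utc.isoformat())  # 2020-02-08T11:46:49.463000+00:00

# Convert to a fixed UTC+8 offset, which matches the local-time output of entry 65
dt_utc8 = dt_utc.astimezone(timezone(timedelta(hours=8)))
print(dt_utc8.strftime("%Y-%m-%d %H:%M:%S"))  # 2020-02-08 19:46:49
```
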
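  66. plotly: plotting with make_subplots(), adjusting the spacing and width ratio of two subplots, and sharing the y-axis:
    See the plotly subplots documentation.
from plotly.subplots import make_subplots
fig = make_subplots(rows=1, cols=2,  # two plots side by side
                    column_widths=[0.2, 0.8],  # one takes 20% of the width, the other 80%
                    shared_yaxes=True,  # share the y-axis
                    horizontal_spacing=0.01)  # shrink the gap between the two plots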
 

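67. plotly: a collection of color references
Discrete colors:
https://plot.ly/python/discrete-color/
Built-in color sequences:
https://plot.ly/python/builtin-colorscales/#discrete-color-sequences
A great color-palette website (probably aimed at designers) [screenshot not preserved]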

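  68. os
    Rename a file: os.rename('old', 'new')
    Delete a file: os.system('rm XXX') works on Unix-like systems; os.remove('XXX') does the same without calling the shell and also works on Windows.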
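
A small sketch of renaming and deleting with the `os` module only, run inside a temporary directory so no real files are touched. The file names are placeholders.

```python
import os
import tempfile

# Work in a throwaway directory so the sketch does not touch real files
with tempfile.TemporaryDirectory() as tmpdir:
    old_path = os.path.join(tmpdir, "old.txt")
    new_path = os.path.join(tmpdir, "new.txt")

    with open(old_path, "w") as f:
        f.write("demo")

    os.rename(old_path, new_path)  # rename old.txt -> new.txt
    renamed = os.path.exists(new_path) and not os.path.exists(old_path)

    os.remove(new_path)  # delete the file without calling the shell
    removed = not os.path.exists(new_path)

print(renamed, removed)  # True True
```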

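  69. Checking MD5 checksums in Python

import hashlib

def file_as_bytes(file):
    # Read the whole file into memory and close it afterwards
    with file:
        return file.read()

test = ['XXXXXXXXXXXXXXX.fastq',
        'XXXXXXXXXXXXXX.fastq']
# hexdigest() returns the hex string that tools such as md5sum print
[(fname, hashlib.md5(file_as_bytes(open(fname, 'rb'))).hexdigest()) for fname in test]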
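
Reading a whole file into memory can be costly for large .fastq files. The sketch below is one possible variant that hashes the file in chunks; the helper name `md5_of_file`, the chunk size, and the demo data are assumptions for illustration, not part of the original snippet.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

# Demo on a small temporary file standing in for a .fastq file
data = b"@read1\nACGT\n+\nFFFF\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
demo_path = tmp.name

checksum = md5_of_file(demo_path)
os.remove(demo_path)
print(checksum)
```
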
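  70. Bytes vs. Unicode characters in Python 2 and Python 3
    decode: bytes ---> Unicode characters
    encode: Unicode characters ---> bytes
    Python 2:
  • str: 8-bit values (bytes); unicode: Unicode characters
  • ASCII is the default encoding
  • with open(XXX.bin, 'r') as ... reads raw bytes by default
    Python 3:
  • bytes: 8-bit values (bytes); str: Unicode characters
  • bytes and str are completely different types; even their empty values are not equal (b'' != '').
  • with open(XXX.bin, 'r') as ... opens the file in text mode and decodes it (typically as UTF-8, depending on the platform default), so a binary file must be opened in Python 3 with mode 'rb'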
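
A short Python 3 sketch of the encode/decode direction and of the strict separation between `bytes` and `str`. The sample character is arbitrary.

```python
text = "中"                     # str: a sequence of Unicode characters
raw = text.encode("utf-8")      # encode: str -> bytes
print(raw)                      # b'\xe4\xb8\xad'
print(raw.decode("utf-8"))      # decode: bytes -> str

# bytes and str never compare equal, not even the empty values
empty_equal = (b"" == "")
print(empty_equal)              # False

# Mixing the two types raises TypeError instead of converting silently
try:
    b"abc" + "def"
except TypeError as exc:
    mixed_error = type(exc).__name__
print(mixed_error)              # TypeError
```
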

71. dict.get()
See https://stackoverflow.com/questions/2068349/understanding-get-method-in-python

t = {'a': 1, 'b': 2, 'c': 3}
# t['e'] raises KeyError: 'e'
t.get('e', None)  # 'e' is not a key, so the default (None) is returned
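
A brief sketch of how the default argument behaves, plus one common use (counting). The counting example is an illustration added here, not something from the linked answer.

```python
t = {'a': 1, 'b': 2, 'c': 3}

missing = t.get('e')       # no default given, so None is returned
fallback = t.get('e', 0)   # explicit default
present = t.get('a', 0)    # key exists, so the default is ignored
print(missing, fallback, present)  # None 0 1

# A common use: counting without checking whether the key exists yet
counts = {}
for ch in "banana":
    counts[ch] = counts.get(ch, 0) + 1
print(counts)  # {'b': 1, 'a': 3, 'n': 2}
```
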
  72. The second argument of enumerate() (the start index)
a = ['a', 'b', 'c']
for ind, i in enumerate(a, 2):
    print(ind, i)

# 2 a
# 3 b
# 4 c
  73. Looping with zip()
    for ai, bi in zip(a, b):
    In Python 3, zip() returns a lazy iterator. In Python 2 it returns a list of all the tuples it creates, so iterating over a pair of very large lists can use a lot of memory. In Python 2, prefer izip from itertools.
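
A small Python 3 sketch showing that the object returned by `zip()` is a one-shot iterator: tuples are produced only when it is consumed, and a second pass yields nothing. The lists are placeholders.

```python
a = [1, 2, 3]
b = ['x', 'y', 'z']

pairs = zip(a, b)            # Python 3: nothing is paired up yet
is_iterator = iter(pairs) is pairs
first_pass = list(pairs)     # consuming the iterator builds the tuples
second_pass = list(pairs)    # the iterator is now exhausted

print(is_iterator)   # True
print(first_pass)    # [(1, 'x'), (2, 'y'), (3, 'z')]
print(second_pass)   # []
```
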
  74. try
    See https://www.thegeekstuff.com/2019/05/python-try-except-examples/
a = 12
b = 'test'
try:
    print(a + b)  # raises TypeError
except TypeError:  # if the code in try raises TypeError, run this instead:
    print(str(a) + b)

# 12test

  75. list.sort(key=)

  76. pd.read_excel()
    Read every sheet of an Excel file:
    pd.read_excel('XXX.xlsx', sheet_name=None)
    This returns a dictionary whose keys are the sheet names and whose values are the DataFrames read from each sheet.

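  77. Taking log10 of every value in a DataFrame
    df.applymap(math.log10) (import math first). From pandas 2.1 on, applymap is deprecated in favor of df.map; np.log10(df) is a vectorized alternative.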
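
A minimal sketch comparing the vectorized and the element-wise approach, assuming pandas and NumPy are installed. The column names and values are made up.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical data; every value must be positive for log10 to be defined
df = pd.DataFrame({"a": [1, 10, 100], "b": [1000.0, 0.1, 1.0]})

log_df = np.log10(df)  # applies element-wise and returns a DataFrame

# Same result computed element by element with math.log10
elementwise = df.apply(lambda col: col.map(math.log10))

print(np.allclose(log_df, elementwise))
```

The vectorized call is usually the faster of the two on large frames, since it avoids a Python-level function call per cell.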

  78. Functions are better off not returning None
    If the returned value is later tested in an if/else, None, 0, and an empty list all evaluate as false, which easily causes bugs.
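
A small sketch of the pitfall. The function `careful_divide` is a hypothetical example of the pattern to avoid, not code from the original post.

```python
def careful_divide(a, b):
    """Return a / b, or None when b is zero (the pattern to avoid)."""
    try:
        return a / b
    except ZeroDivisionError:
        return None

result = careful_divide(0, 5)     # a perfectly valid result: 0.0
treated_as_failure = not result   # but 0.0 is falsy, just like None
print(treated_as_failure)         # True: a valid 0.0 is mistaken for an error

# If None must be returned, compare against None explicitly
is_real_failure = result is None
print(is_real_failure)            # False
```

Raising an exception in the failure case, instead of returning None, removes the ambiguity altogether.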

  79. raise

def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as e:
        raise ValueError('what?') from e

divide(5, 0)

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-9-b8e948d46537> in divide(a, b)
      2     try:
----> 3         return a/b
      4     except ZeroDivisionError as e:

ZeroDivisionError: division by zero

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-10-0a52e2eb64c8> in <module>
----> 1 divide(5,0)

<ipython-input-9-b8e948d46537> in divide(a, b)
      3         return a/b
      4     except ZeroDivisionError as e:
----> 5         raise ValueError('what?') from e
      6       
      7

ValueError: what?
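
The "direct cause" line in the traceback comes from `raise ... from e`, which stores the original exception on the new one. A short sketch of catching the chained exception and inspecting its `__cause__` attribute:

```python
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as e:
        # `from e` stores the original exception on the new one as __cause__
        raise ValueError('what?') from e

try:
    divide(5, 0)
except ValueError as err:
    cause = err.__cause__
    print(type(err).__name__, err)      # ValueError what?
    print(type(cause).__name__, cause)  # ZeroDivisionError division by zero
```
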
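  80. list.sort(key=)
    key takes a function that is called on each element of the list before sorting; the list is then ordered by the values that function returns.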
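
A few illustrative uses of `key`, with a made-up word list:

```python
words = ['banana', 'Apple', 'cherry', 'fig']

words.sort(key=len)          # sort by length; ties keep their original order
print(words)                 # ['fig', 'Apple', 'banana', 'cherry']

words.sort(key=str.lower)    # case-insensitive alphabetical order
print(words)                 # ['Apple', 'banana', 'cherry', 'fig']

by_length_desc = sorted(words, key=len, reverse=True)  # sorted() accepts key too
print(by_length_desc)        # ['banana', 'cherry', 'Apple', 'fig']
```
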
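  81. Sorting a list of tuples
    Tuples are sorted by their first element, then by the second, and so on.
test = [(1,2),(1,19),(1,3),(1,4),(0,3),(0,9),(0,10)]
test.sort()
test
[(0, 3), (0, 9), (0, 10), (1, 2), (1, 3), (1, 4), (1, 19)]

So if you have a list to sort but a few special items must go first, turn the special items into (0, x) and the rest into (1, x), set up with a key function as in entry 80 (list.sort(key=)). Sorting then puts the special items at the front.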
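
One way this trick might look in practice. The sample names and the `special` set are hypothetical.

```python
samples = ['s3', 'ctrl_b', 's1', 'ctrl_a', 's2']  # hypothetical names
special = {'ctrl_a', 'ctrl_b'}                    # items that must come first

# The key returns (0, x) for special items and (1, x) for the rest;
# tuples compare position by position, so all (0, ...) sort ahead of (1, ...)
samples.sort(key=lambda x: (0 if x in special else 1, x))
print(samples)  # ['ctrl_a', 'ctrl_b', 's1', 's2', 's3']
```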

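  82. Passing input to a subprocess
    For subprocess itself, see entry 43 of Odds and Ends (Part 1).
import subprocess
p = subprocess.Popen('XXXX', shell=True, stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     text=True)  # text=True: communicate() takes and returns str
stdout, stderr = p.communicate(input='XXX\nXXXX\nXXXXX')
# separate multiple inputs with \n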
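
A runnable sketch of the same idea with `subprocess.run` (Python 3.7+). The child process here is a hypothetical one-line Python script that upper-cases whatever it reads from stdin; it stands in for the real command.

```python
import subprocess
import sys

# Hypothetical child process: a one-line Python script that upper-cases stdin
child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    input="first\nsecond\nthird\n",  # several inputs separated by \n
    capture_output=True,             # collect stdout and stderr
    text=True,                       # pass and receive str instead of bytes
)
print(result.stdout)
```
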
  83. index name
  84. tqdm
    In Jupyter notebook/lab, this import works best:
from tqdm.notebook import tqdm  # older tqdm versions: from tqdm import tqdm_notebook as tqdm
for i in tqdm([1, 2, 3, 4]):
    ...
  85. Extracting one index label as a plain value (for example a string) rather than an Index object
    df.loc[df['col']==i,:].index.tolist()[0]  # only one row matches here

  86. Selecting the DataFrame columns of a given dtype
    First check which dtypes are present:
    df.dtypes.value_counts()
    Then select:
    df.select_dtypes(include=['XX','XXX'])
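
A minimal sketch of both steps, assuming pandas is installed. The frame and its column names are invented for the example.

```python
import pandas as pd

# Hypothetical frame with integer, float, and string columns
df = pd.DataFrame({
    "count": [1, 2, 3],
    "score": [0.5, 1.5, 2.5],
    "label": ["a", "b", "c"],
})

print(df.dtypes.value_counts())                 # how many columns of each dtype

numeric = df.select_dtypes(include=["number"])  # int and float columns
others = df.select_dtypes(exclude=["number"])   # everything else

print(numeric.columns.tolist())  # ['count', 'score']
print(others.columns.tolist())   # ['label']
```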

  87. Missing value imputation

from sklearn.impute import SimpleImputer, KNNImputer
# Fill numeric values with KNN (df must contain only numeric columns here)
imputer_n = KNNImputer(n_neighbors=2, weights='uniform')
imputer_n.fit_transform(df)

# Fill categorical values with the most frequent value
imputer_c = SimpleImputer(strategy='most_frequent')
imputer_c.fit_transform(df)
  88. List comprehension with else
    ["Even" if i%2==0 else "Odd" for i in range(10)]
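
The `if ... else` here is a conditional expression placed before `for`, which is different from a filtering `if` placed after the iterable. A short sketch of the difference (the labels are arbitrary):

```python
# Conditional expression: goes before `for`, every item produces a value
labels = ["Even" if i % 2 == 0 else "Odd" for i in range(4)]
print(labels)    # ['Even', 'Odd', 'Even', 'Odd']

# Filter: goes after the iterable, items that fail the test are dropped
evens = [i for i in range(10) if i % 2 == 0]
print(evens)     # [0, 2, 4, 6, 8]

# Both can be combined
combined = ["small" if i < 4 else "large" for i in range(10) if i % 2 == 0]
print(combined)  # ['small', 'small', 'large', 'large', 'large']
```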

  89. melt on a multi-index DataFrame (wide --> long)

df
   ID gp     value gp2
0   1  a  0.708910  a1
1   2  a  0.273727  a1
2   3  a  0.161171  a2
3   4  a  0.920273  a2
4   5  b  0.147851  b1
5   6  b  0.957274  b1
6   7  b  0.421100  b2
7   8  b  0.807547  b2

df_mean = df.loc[:,['gp','value','gp2']].groupby(['gp','gp2']).mean()
df_mean
           value
gp gp2
a  a1   0.491318
   a2   0.540722
b  b1   0.552562
   b2   0.614323

pd.melt(df_mean.reset_index(),id_vars = ['gp','gp2'])
  gp gp2 variable     value
0  a  a1    value  0.491318
1  a  a2    value  0.540722
2  b  b1    value  0.552562
3  b  b2    value  0.614323

90. Opening Excel files with pandas
In a Python 3 environment, the file may fail to open even with openpyxl installed.
Pass the engine explicitly: pd.read_excel(XXXX, sheet_name='XXX', engine='openpyxl')
