
1. Get the standard deviation and mean of all the benchmark data

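So, suppose you are still a programming beginner!

You ask a programming guru: "Guru, guru, I want to learn to code. Should I learn Java, C, Python, or some other language? Which one would you recommend learning first?"

Questions like these come thick and fast, and it is hard to know where to start. Still, most gurus will simply tell you to learn Python, and I have written an earlier post on why I recommend learning Python. For one thing, Python syntax is fairly easy to read, much like R. Add to that how widely Python is used, and it is a sensible recommendation for a beginner like you.

I am embarrassed to admit, though, that I have had this idea for a long time but never stuck with it, so to this day I still cannot write Python code. Embarrassing, really; I have no standing to lecture anyone. I am a programming beginner myself.

Anyone working in bioinformatics needs to master at least one programming language (PS: by my own definition R does not count, because I consider R mandatory). Personally, I recommend learning a bit of Perl or Python. That should at least cover your own needs; if you cannot do it yourself, you can only ask others for help (which assumes you have such a guru nearby).

OK! That was all "small talk". I will gradually share more Python tutorials in later posts.

Let's learn and improve together!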

Today's tutorial: computing the standard deviation and mean of all the data.

The code comes from a Nature Methods paper titled "OME-NGFF: a next-generation file format for expanding bioimaging data-access strategies". URL: https://www.nature.com/articles/s41592-021-01326-w#data-availabilit

1.1 Raw data

Code

## python 
# Get the standard deviation and mean for all the benchmark data
# grouped by type (e.g. HDF5, TIFF, Zarr, Overhead) and by
# source (e.g. http, local, s3)

## Load the required package
import pandas

# Load each CSV file in turn
for csv_file in ["2d_benchmark_data.csv", "3d_benchmark_data.csv"]:

    print(csv_file)

    df = pandas.read_csv(csv_file)

    print("Mean")
    mean_values = df.groupby(["type", "source"]).mean()
    # or if you only want the "seconds" column
    # mean_values = mean_values["seconds"]
    print(mean_values)

    print("Std")
    std_values = df.groupby(["type", "source"]).std()
    print(std_values)
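
If you only care about the "seconds" column, here is a small sketch of how the mean and the standard deviation could be collected into one table. I do not have the original CSV files at hand, so the DataFrame below is made up purely for illustration; only the column names "type", "source", "round" and "seconds" are taken from the output of the script. This is my own variation, not the code from the paper.

```python
import pandas as pd

# Synthetic stand-in for the benchmark CSVs (values are invented for illustration).
df = pd.DataFrame(
    {
        "type": ["Zarr", "Zarr", "Zarr", "HDF5", "HDF5", "HDF5"],
        "source": ["local", "local", "local", "s3", "s3", "s3"],
        "round": [1, 2, 3, 1, 2, 3],
        "seconds": [0.1, 0.2, 0.3, 1.0, 2.0, 3.0],
    }
)

# Select the "seconds" column first, then compute both statistics at once.
# The result has one row per (type, source) pair and two columns: mean and std.
summary = df.groupby(["type", "source"])["seconds"].agg(["mean", "std"])
print(summary)
```

Selecting the column before aggregating also means that any non-numeric columns in the file never reach mean() or std().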

Code, step by step

for csv_file in ["2d_benchmark_data.csv", "3d_benchmark_data.csv"]:

    print(csv_file)

Compute the mean

    df = pandas.read_csv(csv_file)

    print("Mean")
    mean_values = df.groupby(["type", "source"]).mean()

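The output is as follows. Note that the Mean blocks show only the "seconds" column (see the line "Name: seconds, dtype: float64"), so this run was made with the commented-out line mean_values = mean_values["seconds"] enabled: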

2d_benchmark_data.csv
Mean
type      source
HDF5      http      0.221113
          local     0.002818
          s3        1.121805
Overhead  http      0.001269
          local     0.000014
          s3        0.011279
TIFF      http      0.151114
          local     0.086267
          s3        0.388272
Zarr      http      0.006652
          local     0.007099
          s3        0.131575
Name: seconds, dtype: float64
Std
                 duration  chunk_distance      round   seconds
type     source                                               
HDF5     http    0.051123   149981.733567  29.011492  0.051115
         local   0.004189   149981.733567  29.011492  0.004156
         s3      0.322672   149981.733567  29.011492  0.322666
Overhead http    0.001197   149981.733567  29.011492  0.001187
         local   0.000016   149981.733567  29.011492  0.000002
         s3      0.002839   149981.733567  29.011492  0.002838
TIFF     http    0.037332   149981.733567  29.011492  0.037327
         local   0.036226   149981.733567  29.011492  0.036227
         s3      0.088530   149981.733567  29.011492  0.088532
Zarr     http    0.001773   149981.733567  29.011492  0.001760
         local   0.002866   149981.733567  29.011492  0.002868
         s3      0.019592   149981.733567  29.011492  0.019609
3d_benchmark_data.csv
Mean
type      source
HDF5      http      0.220592
          local     0.002479
          s3        1.046130
Overhead  http      0.001163
          local     0.000023
          s3        0.012607
TIFF      s3        0.928801
Zarr      http      0.013290
          local     0.007667
          s3        0.100552
Name: seconds, dtype: float64
Std
                 duration  chunk_distance      round   seconds
type     source                                               
HDF5     http    0.051433    906299.64163  29.011492  0.051430
         local   0.002911    906299.64163  29.011492  0.002880
         s3      0.259094    906299.64163  29.011492  0.259042
Overhead http    0.000518    906299.64163  29.011492  0.000509
         local   0.000086    906299.64163  29.011492  0.000051
         s3      0.005196    906299.64163  29.011492  0.005179
TIFF     s3           NaN             NaN        NaN       NaN
Zarr     http    0.009346    906299.64163  29.011492  0.009336
         local   0.006391    906299.64163  29.011492  0.006381
         s3      0.015169    906299.64163  29.011492  0.015172

Process finished with exit code 0
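
One thing stands out in the 3D output: the TIFF / s3 row has NaN for every standard deviation. I cannot check the CSV itself, so this is only my guess, but NaN is what pandas reports when a group contains a single row, because std() uses the sample standard deviation (ddof=1) by default and that is undefined for one value. The sketch below reproduces the effect on invented numbers and adds a "count" column so such groups are easy to spot.

```python
import math
import pandas as pd

# Invented data: the TIFF/s3 group has one row, the Zarr/s3 group has two.
df = pd.DataFrame(
    {
        "type": ["TIFF", "Zarr", "Zarr"],
        "source": ["s3", "s3", "s3"],
        "seconds": [0.9, 0.10, 0.12],
    }
)

# "count" shows how many rows went into each group;
# "std" is the sample standard deviation (ddof=1), so a group of one gives NaN.
stats = df.groupby(["type", "source"])["seconds"].agg(["count", "mean", "std"])
print(stats)

# Check the single-row group explicitly.
print(math.isnan(stats.loc[("TIFF", "s3"), "std"]))
```

If the count for TIFF / s3 in the real 3D file turns out to be 1, the NaN is expected and not a bug in the script.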