Part 2: File I/O, Blocking and Non-blocking

Author: 木鱼_cc | Published 2018-07-13 22:02

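0. Contents

1. System calls
2. The open/close functions
3. File descriptors
4. The read/write functions
5. Error-handling functions
6. Blocking and non-blocking
7. The lseek function
8. The fcntl function
9. The ioctl function
10. In and out parameters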

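1. System calls

What is a system call?
A programming interface (Application Programming Interface, API) that the operating system implements and exposes to application programs. It is the bridge across which applications exchange data with the system.

How C standard library functions relate to system calls: consider how a "hello world" ends up on the screen. printf formats the text into a user-space buffer, the library then calls write, and write traps into the kernel, which drives the terminal device.

[Figure: system calls]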
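To make the layering concrete, here is a minimal sketch that prints a string by calling write directly instead of going through printf. The helper name print_via_syscall is an assumption made for illustration; it is not part of any library.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: print msg with the write(2) system call directly,
 * bypassing the stdio buffer. Returns the number of bytes written,
 * or -1 on error. */
ssize_t print_via_syscall(const char *msg)
{
    return write(STDOUT_FILENO, msg, strlen(msg));
}
```

Calling print_via_syscall("hello world\n") hands the bytes to the kernel straight away, which is roughly what printf does once its buffer is flushed.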

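1.1 C standard library file I/O functions

fopen, fclose, fseek, fgets, fputs, fread, fwrite, ...

fopen mode strings:
r   read only;  r+  read and write
w   write only, truncating the file to length 0;  w+  read and write, truncating to length 0
a   append, write only;  a+  append, read and write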

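2. The open/close functions

2.1 Function prototypes

int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
int close(int fd);
// mode is the numeric permission written in octal, e.g. 0777 or 0644; the permissions actually applied are mode & ~umask
// the three-argument form is needed only when flags contains O_CREAT
// compared with C standard I/O, Linux system programming replaces the FILE pointer with a plain integer, the file descriptor

2.2 Common flags

O_RDONLY, O_WRONLY, O_RDWR    // read only, write only, read and write (exactly one of these is required)
O_APPEND, O_CREAT, O_EXCL, O_TRUNC, O_NONBLOCK    // append; create the file if it does not exist; with O_CREAT, fail if the file already exists; truncate the file to length 0; non-blocking
Headers: <fcntl.h> for open and the O_* flags, <unistd.h> for close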
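A short sketch of the three-argument form may help here. It combines O_CREAT with O_EXCL so that the call fails when the file already exists. The function name create_exclusive is hypothetical, chosen only for this illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create path exclusively with mode 0644.
 * Returns 0 if the file was created, or the errno value
 * (for example EEXIST) if open() failed. */
int create_exclusive(const char *path)
{
    /* O_CREAT is present, so the third argument (mode) is required */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return errno;
    close(fd);
    return 0;
}
```

Calling it twice on the same path should succeed the first time and report EEXIST the second time, which is the usual way to make sure a file is created by exactly one caller.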

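2.3 Common open errors

The file does not exist (ENOENT)
The process lacks the required permission, e.g. opening a read-only file for writing (EACCES)
Opening a directory for writing (EISDIR)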
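These failures can be observed directly by looking at errno after a failed open. The sketch below is an illustration only; open_errno is a made-up helper name, and the errno values are those documented for Linux.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to open path with flags.
 * Returns 0 on success, otherwise the errno value set by open(). */
int open_errno(const char *path, int flags)
{
    int fd = open(path, flags);
    if (fd == -1)
        return errno;
    close(fd);
    return 0;
}
```

For example, open_errno("/", O_WRONLY) is expected to yield EISDIR, and a path that does not exist yields ENOENT.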

2.4 Example

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>

int main(void)
{
    int fd;
    char buf[64];
    int ret = 0;

    fd = open("./file.txt", O_RDONLY);
    if (fd == -1) {
        printf("open file error\n");
        exit(1);
    }
    printf("---open ok---\n");

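    /* read the file in chunks and echo each chunk to standard output;
       fd was opened O_RDONLY, so writing back to fd would fail */
    while ((ret = read(fd, buf, sizeof(buf))) > 0) {
        write(STDOUT_FILENO, buf, ret);
    }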

    close(fd);
    return 0;
}

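3. File descriptors

3.1 The PCB (process control block)

Use the command locate sched.h to find where it is defined, e.g. /usr/src/linux-headers-3.16.0-30/include/linux/sched.h. In Linux the PCB is struct task_struct.

[Figure: PCB]

3.2 The file descriptor table

The PCB member struct files_struct *files points to the file descriptor table.
From an application's point of view, the table can be thought of as an array of pointers: index 0/1/2/3/4... locates the corresponding file structure (struct file).
In essence it is a key-value mapping in which 0, 1, 2, ... each map to an address. The kernel handles the mapping: we only ever use the key (the descriptor) and never touch the value directly.
A newly opened file gets the smallest unused descriptor in the table.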

STDIN_FILENO    0
STDOUT_FILENO   1
STDERR_FILENO   2
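The "smallest unused descriptor" rule can be checked with a small experiment. This is a sketch under the assumption of a single-threaded program; the function name reuses_lowest_fd is invented for the illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open path twice, close the first descriptor,
 * open again, and report whether the new descriptor reuses the number
 * that was just freed. Returns 1 if reused, 0 if not, -1 on error. */
int reuses_lowest_fd(const char *path)
{
    int a = open(path, O_RDONLY);
    int b = open(path, O_RDONLY);
    if (a == -1 || b == -1) {
        if (a != -1) close(a);
        if (b != -1) close(b);
        return -1;
    }
    close(a);                        /* a becomes the lowest free slot */
    int c = open(path, O_RDONLY);    /* expected to get the same number as a */
    int reused = (c == a);
    close(b);
    if (c != -1) close(c);
    return reused;
}
```

The same rule is what makes the classic redirection trick work: close descriptor 1, then open a file, and the file becomes standard output.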

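3.3 The FILE structure and struct file

The C library's FILE structure mainly holds three things: a file descriptor, the current read/write position, and an I/O buffer.
Inside the kernel, every open file is described by a struct file:

struct file {
        ...
        /* file offset */
        /* access mode */
        /* open flags */
        /* address of the kernel buffer */
        const struct file_operations *f_op;
        ...
    };
How to look it up:
    (1) /usr/src/linux-headers-3.16.0-30/include/linux/fs.h
    (2) lxr: search the web for lxr → lxr.oss.org.cn → choose a kernel version (e.g. 3.10) → click File Search
        → keyword: "include/linux/fs.h" → Ctrl+F for "struct file {"
        → this gives the structure definition in the kernel
        → "struct file_operations": function pointers for operating on file contents
        → "struct inode_operations": function pointers for operating on file attributes

3.4 Maximum number of open files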

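By default a process can have 1024 files open at once.
Run ulimit -a and look at the "open files" entry; the default is 1024.
ulimit -n 4096 changes the limit for the current shell.
The value can also be changed permanently through the system configuration files, but that is not recommended.

cat /proc/sys/fs/file-max shows the maximum number of files the whole system can have open. It depends on the amount of memory.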
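The per-process limit can also be read from inside a program with getrlimit. The following is a minimal sketch; max_open_files is a hypothetical helper name.

```c
#include <sys/resource.h>

/* Hypothetical helper: return the soft limit on the number of open
 * descriptors for this process (the value that ulimit -n reports),
 * or -1 if getrlimit() fails. */
long max_open_files(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    return (long)rl.rlim_cur;    /* rl.rlim_max holds the hard limit */
}
```

On a system with default settings this typically returns 1024, matching the ulimit output.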

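4. The read/write functions

ssize_t read(int fd, void *buf, size_t count);
ssize_t write(int fd, const void *buf, size_t count);
read and write have similar prototypes. Pay attention to the third argument when using them.
count means different things in the two calls: for read it is the capacity of the buffer (the most bytes to read); for write it is the number of bytes that actually hold data, normally the value the preceding read returned. See the example in 2.4.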
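One more detail about count in write: the call may transfer fewer bytes than requested (for instance on pipes or sockets), so robust code loops until everything is written. The sketch below shows one common way to do that; write_all is an illustrative name, not a standard function.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all count bytes
 * are written. Returns count on success, -1 on error. */
ssize_t write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t left = count;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n == -1) {
            if (errno == EINTR)      /* interrupted by a signal: retry */
                continue;
            return -1;
        }
        p += n;                      /* skip the bytes already written */
        left -= (size_t)n;
    }
    return (ssize_t)count;
}
```

In a copy loop it would be called as write_all(fd_out, buf, n), where n is the value returned by read.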

4.1 A simple cp program

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdlib.h>
#include <errno.h>
#include <time.h>
#define N 1 //1024

int main(int argc, char *argv[])
{
    int fd, fd_out;
    int n;
    char buf[N];
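    clock_t start, end;
    start = clock();

    if(argc < 3){
        fprintf(stderr, "usage: %s src dst\n", argv[0]);
        exit(1);
    }

    fd = open(argv[1], O_RDONLY);   // argv[0] is the program name, e.g. ./app
    if(fd < 0){
        perror("open source file error");
        exit(1);
    }

    fd_out = open(argv[2], O_WRONLY|O_CREAT|O_TRUNC, 0644);
    if(fd_out < 0){                 // check fd_out here, not fd
        perror("open destination file error");
        exit(1);
    }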

    while((n = read(fd, buf, N))){
        if(n < 0){
            perror("read error");
            exit(1);
        }
        write(fd_out, buf, n);
    }

    end = clock();

    printf("time: %ld\n", (long)(end - start));   // CPU time in clock ticks; divide by CLOCKS_PER_SEC for seconds

    close(fd);
    close(fd_out);
    return 0;
}

4.2 Comparing the two programs

If a file is copied one byte at a time, which is faster: read/write, or the corresponding standard library functions?

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(void)
{
    FILE *fp, *fp_out;
    int n;
    clock_t start, end;
    start = clock();

    fp = fopen("dict.txt", "r");
    if(fp == NULL){
        perror("fopen error");
        exit(1);
    }

    fp_out = fopen("dict.cp", "w");
    if(fp_out == NULL){
        perror("fopen error");
        exit(1);
    }

    while((n = fgetc(fp)) != EOF){
        fputc(n, fp_out);
    }
    end = clock();

    printf("time: %ld\n", (long)(end - start));   // CPU time in clock ticks; divide by CLOCKS_PER_SEC for seconds

    fclose(fp);
    fclose(fp_out);

    return 0;
}

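The comparison shows that, one byte at a time, fgetc/fputc is far faster than read/write. Why?

[Figure: image.png]

When a program calls read/write directly, every call is a system call that switches the CPU from user mode to kernel mode and back, and each switch costs time. Copying a file one byte at a time therefore pays for two system calls per byte.
fgetc/fputc instead move the data through a user-space buffer (typically 4096 bytes) and call read or write only when that buffer is empty or full, so far fewer system calls are made. That accounts for the large difference.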
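The effect of the buffer size on the number of system calls can be measured with a small sketch like the one below. It copies from one descriptor to another and counts how many read calls delivered data. The name copy_count_reads is hypothetical, and short writes are simply treated as errors to keep the example brief.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in to out using a buffer
 * of bufsize bytes. Returns the number of read() calls that delivered
 * data, or -1 on error. */
long copy_count_reads(int in, int out, size_t bufsize)
{
    char *buf = malloc(bufsize);
    long calls = 0;
    ssize_t n;

    if (buf == NULL)
        return -1;
    while ((n = read(in, buf, bufsize)) > 0) {
        calls++;
        if (write(out, buf, (size_t)n) != n) {   /* short write: give up */
            free(buf);
            return -1;
        }
    }
    free(buf);
    return n < 0 ? -1 : calls;
}
```

With bufsize set to 1, ten bytes of input cost ten read calls; with a 4096-byte buffer the same input needs a single one. The standard library gets the same saving by buffering on the program's behalf.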

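4.3 The strace command

In a shell, strace traces a running program and shows the system calls it makes.

strace ./app

4.4 Buffering

read and write are often called unbuffered I/O. This means there is no user-level buffer; it does not mean the kernel's buffers are bypassed.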

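5. Error-handling functions

Error number: errno

perror:   void perror(const char *s);
strerror: char *strerror(int errnum);

perror("open error");    // appends the message for the current errno automatically
printf("open error:%s\n", strerror(errno));    // converts an error number into a message string

Error numbers are defined in: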
/usr/include/asm-generic/errno-base.h
/usr/include/asm-generic/errno.h
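As a small illustration of both functions, the sketch below reports a failed open in the two styles. The helper name open_or_report is an assumption for this example. Note that errno is saved first, because later library calls may overwrite it.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open path read-only. On failure, report the
 * reason on stderr in two ways and return -1 with errno preserved. */
int open_or_report(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd == -1) {
        int saved = errno;           /* later calls may change errno */
        perror("open");              /* prints "open: <message>" */
        fprintf(stderr, "open %s: %s (errno=%d)\n",
                path, strerror(saved), saved);
        errno = saved;               /* hand the original value back */
    }
    return fd;
}
```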

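6. Blocking and non-blocking

Reading a regular file never blocks: however many bytes are requested, read returns within a bounded time. Reading from a terminal or from the network is different. If the data typed at a terminal does not yet include a newline, read on the terminal blocks; if no packet has arrived from the network, read on the network blocks, and for how long is unknown: if no data ever arrives, it blocks forever. Likewise, writing to a regular file does not block, whereas writing to a terminal or the network may.

Let us pin down what blocking means. When a process makes a blocking system call, it is put into the sleep state and the kernel schedules other processes to run. Only when the event it is waiting for occurs (a packet arrives from the network, or the interval passed to sleep expires) can it run again. The opposite of the sleep state is the running state. In the Linux kernel a process in the running state is in one of two situations:

Currently executing. The CPU is in the context of this process: the program counter (eip) holds the address of one of its instructions, the general-purpose registers hold intermediate results of its computation, its instructions are being executed, and its address space is being read and written.

Ready. The process is not waiting for any event and could run at any moment, but the CPU is busy with another process, so it waits in a ready queue for the kernel to schedule it. Several processes may be ready at once, so which one runs next? The kernel's scheduling algorithm is based on priorities and time slices, and it adjusts each process's priority and time slice dynamically according to how the process behaves, so that every process gets a fair chance to run while interactive processes still respond quickly.

Blocking read from a terminal:                     [block_readtty.c]      // waits until there is input

Non-blocking read from a terminal:                 [nonblock_readtty.c]   // returns even when there is no input

Non-blocking read from a terminal with a timeout:  [nonblock_timeout.c]   // waits for input, but only until the time limit

Note that blocking versus non-blocking is a property of the open file, not of read or write themselves. Reading a terminal blocks by default.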

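Summary of read return values:

Greater than 0: the number of bytes actually read
-1:
1) errno != EAGAIN (and != EWOULDBLOCK): the read failed
2) errno == EAGAIN (or == EWOULDBLOCK): the descriptor is non-blocking and no data has arrived yet
0: end of file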
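The return-value cases above can be folded into one small function. This is only a sketch: the enum and the name classify_read are invented for the illustration, and it assumes count is greater than 0 (a zero-length read also returns 0).

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical result codes for one read() call */
enum read_status { READ_DATA, READ_EOF, READ_WOULD_BLOCK, READ_ERROR };

/* Hypothetical helper: perform one read() and classify the outcome.
 * The raw return value is stored in *nread. Assumes count > 0. */
enum read_status classify_read(int fd, void *buf, size_t count, ssize_t *nread)
{
    ssize_t n = read(fd, buf, count);

    *nread = n;
    if (n > 0)
        return READ_DATA;            /* n bytes were read */
    if (n == 0)
        return READ_EOF;             /* end of file */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return READ_WOULD_BLOCK;     /* non-blocking fd, no data yet */
    return READ_ERROR;               /* a real error; errno says which */
}
```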

6.1 Blocking examples

block_readtty.c

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

//hello worl        d \n

int main(void)
{
    char buf[10];
    int n;

    n = read(STDIN_FILENO, buf, 10);   // #define STDIN_FILENO 0   STDOUT_FILENO 1  STDERR_FILENO 2
    if(n < 0){
        perror("read STDIN_FILENO");
        //printf("%d", errno);
        exit(1);
    }
    write(STDOUT_FILENO, buf, n);
    
    return 0;
}

nonblock_readtty.c

#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MSG_TRY "try again\n"

int main(void)
{
    char buf[10];
    int fd, n;

    fd = open("/dev/tty", O_RDONLY|O_NONBLOCK); // O_NONBLOCK makes reads from the terminal non-blocking
    if(fd < 0){
        perror("open /dev/tty");
        exit(1);
    }
tryagain:

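    n = read(fd, buf, 10);   // returns -1 with errno == EAGAIN (or EWOULDBLOCK) when no data is available

    if(n < 0){
        // the device was opened with O_NONBLOCK, so with no data available read returns -1 and sets errno to EAGAIN or EWOULDBLOCK
        if(errno != EAGAIN){        // could also test errno != EWOULDBLOCK; on Linux the two macros have the same value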
            perror("read /dev/tty");
            exit(1);
        }
        sleep(3);
        write(STDOUT_FILENO, MSG_TRY, strlen(MSG_TRY));
        goto tryagain;
    }
    write(STDOUT_FILENO, buf, n);
    close(fd);

    return 0;
}

nonblock_timeout.c

#include <unistd.h>
#include <fcntl.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>

#define MSG_TRY "try again\n"
#define MSG_TIMEOUT "time out\n"

int main(void)
{
    char buf[10];
    int fd, n, i;

    fd = open("/dev/tty", O_RDONLY|O_NONBLOCK);
    if(fd < 0){
        perror("open /dev/tty");
        exit(1);
    }
    printf("open /dev/tty ok... %d\n", fd);

    for (i = 0; i < 5; i++){
        n = read(fd, buf, 10);
        if(n > 0){    // data was read
            break;
        }
        if(n < 0 && errno != EAGAIN){   // a real error; errno is only meaningful when read returned -1
            perror("read /dev/tty");
            exit(1);
        }
        sleep(1);
        write(STDOUT_FILENO, MSG_TRY, strlen(MSG_TRY));
    }

    if(i == 5){
        write(STDOUT_FILENO, MSG_TIMEOUT, strlen(MSG_TIMEOUT));
    }else{
        write(STDOUT_FILENO, buf, n);
    }

    close(fd);

    return 0;
}

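7. The lseek function

On Linux the lseek system call changes the file offset (the read/write position).

Every open file records its current read/write position. When a file is opened the position is 0, the beginning of the file, and normally each read or write advances it by the number of bytes transferred. There is one exception: if the file was opened with O_APPEND, every write appends to the end of the file and then moves the position to the new end. lseek is similar to fseek in the standard I/O library: it moves the current read/write position (also called the offset).

Recall what fseek does and its usual whence values: SEEK_SET, SEEK_CUR, SEEK_END
int fseek(FILE *stream, long offset, int whence);  returns 0 on success, -1 on failure
In particular: seeking past the end of the file succeeds and returns 0; seeking back to before the start of the file fails and returns -1.

off_t lseek(int fd, off_t offset, int whence);  returns -1 on failure; on success returns the resulting offset measured from the start of the file.
In particular: lseek may set the offset beyond the end of the file, but the file grows only when data is then written at that position; a read there simply returns 0.

Note that reading and writing a file share the same offset.                    [lseek.c]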

lseek.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>

int main(void)
{
    int fd, n;
    char msg[] = "It's a test for lseek\n";
    char ch;

    fd = open("lseek.txt", O_RDWR|O_CREAT, 0644);
    if(fd < 0){
        perror("open lseek.txt error");
        exit(1);
    }

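    write(fd, msg, strlen(msg));    // write through fd; the offset is now at the end of the file

    lseek(fd, 0, SEEK_SET);         // move the offset back to the start of the file
                                    // without this call the loop below reads nothing, because the offset is already at the end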

    while((n = read(fd, &ch, 1))){
        if(n < 0){
            perror("read error");
            exit(1);
        }
        write(STDOUT_FILENO, &ch, n);   // read the file byte by byte and write each byte to the screen
    }

    close(fd);

    return 0;
}

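7.1 Common uses of lseek

Extending a file with lseek: only a write actually extends the file; lseek alone does not.
Typically: write(fd, "a", 1);
od -tcx filename shows the file contents as characters and hexadecimal
od -tcd filename shows the file contents as characters and decimal
Getting the size of a file with lseek: lseek(fd, 0, SEEK_END);  [lseek_test.c]

[Final note]: the offset returned by lseek is always measured from the start of the file.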
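The two uses just listed, extending a file and measuring its size, can be combined in one short sketch. The function name extend_and_measure and the temporary-file template are assumptions made for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a temporary file, extend it to new_size
 * bytes by seeking to new_size - 1 and writing one byte, then return
 * the size reported by lseek(fd, 0, SEEK_END). Returns -1 on error. */
off_t extend_and_measure(off_t new_size)
{
    char path[] = "/tmp/lseek_demo_XXXXXX";   /* hypothetical template */
    int fd = mkstemp(path);
    off_t size = -1;

    if (fd == -1)
        return -1;
    unlink(path);                    /* the file vanishes once fd is closed */

    if (new_size > 0 &&
        lseek(fd, new_size - 1, SEEK_SET) != -1 &&   /* seek past the end */
        write(fd, "", 1) == 1)                       /* one '\0' byte extends the file */
        size = lseek(fd, 0, SEEK_END);               /* offset of the end == file size */

    close(fd);
    return size;
}
```

The bytes between the old end of the file and the written byte form a hole that reads back as zeros.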

lseek_test.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>

int main(void)
{
    int fd;

    fd = open("lseek.txt", O_RDWR | O_CREAT, 0664);   // O_RDWR: ftruncate below needs write access
    if(fd < 0){
        perror("open lseek.txt error");
        exit(1);
    }

    off_t len = lseek(fd, 0, SEEK_END);   // get the file size
    if(len == -1){
        perror("lseek error");
        exit(1);
    }
    printf("len of msg = %ld\n", (long)len);

    //int ret = truncate("lseek.txt", 1500);
    int ret = ftruncate(fd, 1800);
    if(ret == -1){
        perror("ftrun error");
        exit(1);
    }

#if 0
    len = lseek(fd, 999, SEEK_SET);
    if(len == -1){
        perror("lseek seek_set error");
        exit(1);
    }
    ret = write(fd, "a", 1);   // ret is already declared above
    if(ret == -1){
        perror("write error");
        exit(1);
    }
#endif


#if 0
    off_t cur = lseek(fd, -10, SEEK_SET);
    printf("--------| %ld\n", cur);
    if(cur == -1){
        perror("lseek error");
        exit(1);
    }
#endif

    close(fd);

    return 0;
}

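8. The fcntl function

Changes the access-control properties (file status flags) of a file that is [already open].
Focus on two commands: F_GETFL and F_SETFL.
F_GETFL gets the file status flags
F_SETFL sets the file status flags

[fcntl.c]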

fcntl.c

#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MSG_TRY "try again\n"

int main(void)
{
    char buf[10];
    int flags, n;

    flags = fcntl(STDIN_FILENO, F_GETFL); // get the flags of stdin
    if(flags == -1){
        perror("fcntl error");
        exit(1);
    }
    flags |= O_NONBLOCK;    // bitwise OR adds the non-blocking flag and leaves the other bits unchanged
    int ret = fcntl(STDIN_FILENO, F_SETFL, flags); // set the flags
    if(ret == -1){
        perror("fcntl error");
        exit(1);
    }
    /*
    int fd = open("/dev/tty", O_RDONLY|O_NONBLOCK); previously we got non-blocking terminal reads this way
    now we fetch the flags, OR in the new bit and set them, changing the properties of the already-open descriptor
    */
    */
    

tryagain:
    n = read(STDIN_FILENO, buf, 10);
    if(n < 0){
        if(errno != EAGAIN){        
            perror("read /dev/tty");
            exit(1);
        }
        sleep(3);
        write(STDOUT_FILENO, MSG_TRY, strlen(MSG_TRY));
        goto tryagain;
    }
    write(STDOUT_FILENO, buf, n);

    return 0;
}

[Figure: flag bitmap]
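The flags value is a set of bits, so one bit can be added, tested, or cleared without disturbing the others. The helpers below sketch this for O_NONBLOCK; the names set_nonblocking, set_blocking and is_nonblocking are hypothetical.

```c
#include <fcntl.h>

/* Hypothetical helper: add O_NONBLOCK, keeping all other flag bits. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);    /* OR sets the bit */
}

/* Hypothetical helper: clear O_NONBLOCK, keeping all other flag bits. */
int set_blocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);   /* AND-NOT clears it */
}

/* Hypothetical helper: 1 if O_NONBLOCK is set, 0 if not, -1 on error. */
int is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) != 0;                 /* AND tests the bit */
}
```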

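9. The ioctl function

Manages the I/O channel of a device and controls device characteristics (mainly used with device drivers).

Commonly used to obtain a file's [physical characteristics] (which differ from one file type to another).

[ioctl.c]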

ioctl.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ioctl.h>

int main(void)
{
    struct winsize size;

    if (isatty(STDOUT_FILENO) == 0)   
        exit(1);

    if(ioctl(STDOUT_FILENO, TIOCGWINSZ, &size)<0) {  // get the current window size and store it in size
        perror("ioctl TIOCGWINSZ error");
        exit(1);
    }
    printf("%d rows, %d columns\n", size.ws_row, size.ws_col);

    return 0;
}

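10. In and out parameters

10.1 In parameters

A pointer parameter qualified with const; the function only reads through it.
char *strcpy(char *dest, const char *src);    // src is the in parameter

10.2 Out parameters

A pointer passed as a function argument
Before the call, the memory it points to need not hold meaningful data; after the call it does, and it serves as a return value of the function
The function writes through it.

10.3 In-out parameters

The memory pointed to holds meaningful data before the call
During the call, the function both reads it and writes it (changing the original value)
It also serves as a return value of the function.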
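The three kinds of parameter can be seen side by side in a short sketch. All three function names are made up for the illustration.

```c
#include <stddef.h>
#include <string.h>

/* In parameter: src is const, so the function only reads through it. */
size_t count_char(const char *src, char c)
{
    size_t n = 0;
    for (; *src != '\0'; src++)
        if (*src == c)
            n++;
    return n;
}

/* Out parameter: *len need not be initialized by the caller;
 * the function writes the result into it. Returns 0 on success. */
int get_length(const char *s, size_t *len)
{
    if (s == NULL || len == NULL)
        return -1;
    *len = strlen(s);
    return 0;
}

/* In-out parameter: *total holds a meaningful running sum on entry
 * and the updated sum on exit. */
void add_to(int *total, int delta)
{
    *total += delta;
}
```

Many system calls follow the same pattern: the buf argument of read is an out parameter, while the buf argument of write is an in parameter.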