Linux中的位图

作者: echoworlding | 来源:发表于2016-12-15 21:28 被阅读562次

Linux中的位图
CorelDRAW位图转换矢量图
一篇搞定Drawable资源文件
Android中图片的压缩方法
Bitmap位图渲染与操作（图片移动，放大，缩小，旋转，镜像）转
OPenGL纹理的载入
Redis（02）-空间节约工具
两个位图覆盖合成为一个位图
iOS--博文收藏
45|位图：如何实现网页爬虫中的URL去重功能？

什么是位图

位图（bitmap）的定义

维基百科中关于位图的介绍：

一种数据结构，代表了有限域中的稠集（dense set），每一个元素至少出现一次，没有其他的数据和元素相关联。在索引、数据压缩等方面有广泛应用。

自己的理解
位图就是按位(bit)来记录、索引某个对象状态的图表(map)，即用每一位（bit）来存放每一个对象的某种状态（例如分别用0和1表示的话，可以表示同一对象的两种状态）。

这在数据规模大、而数据状态又不是很多的情况下，用来索引判断某个数据的状态（如存不存在、有没有被使用等）很方便。

Linux内核中的位图

位图接口的声明

位图在Linux内核中被大量使用。下面的源代码文件包含这些位图结构的通用接口：

lib/bitmap.c
include/linux/bitmap.h

除了这两个文件，还有一个特定的架构头文件，对特定架构的位运算进行优化。对于x86_64架构，使用下面头文件：

arch/x86/include/asm/bitops.h

在Linux内核中，位图是一个unsigned long类型的数组。因此，一个位图可以简单地声明为：

unsigned long my_bitmap[16];  /* 声明一个16比特的位图 */

它就像这个样子：

位图示例

将数据【5，1，7，15，0，4，6，10】存入后，结构变为：

置位操作

实际上，在Linux内核中有一个专门的DECLARE_BITMAP宏来声明位图（该宏位于头文件include/linux/types.h中）：

#define DECLARE_BITMAP(name,bits)  
      unsigned long name[BITS_TO_LONGS(bits)]

其中，参数name是位图的名字，bits是位图中比特的总数目。

Linux内核源码中对位图的介绍：

  19 /*
  20  * bitmaps provide an array of bits, implemented using an an
  21  * array of unsigned longs.  The number of valid bits in a
  22  * given bitmap does _not_ need to be an exact multiple of
  23  * BITS_PER_LONG.
  24  *
  25  * The possible unused bits in the last, partially used word
  26  * of a bitmap are 'don't care'.  The implementation makes
  27  * no particular effort to keep them zero.  It ensures that
  28  * their value will not affect the results of any operation.
  29  * The bitmap operations that return Boolean (bitmap_empty,
  30  * for example) or scalar (bitmap_weight, for example) results
  31  * carefully filter out these unused bits from impacting their
  32  * results.
  33  *
  34  * These operations actually hold to a slightly stronger rule:
  35  * if you don't input any bitmaps to these ops that have some
  36  * unused bits set, then they won't output any set unused bits
  37  * in output bitmaps.
  38  *
  39  * The byte ordering of bitmaps is more natural on little
  40  * endian architectures.  See the big-endian headers
  41  * include/asm-ppc64/bitops.h and include/asm-s390/bitops.h
  42  * for the best explanations of this ordering.
  43  */

位图的两个重要操作

置位（set bit）和清零（clear bit）是位图的两种重要的操作，在头文件arch/x86/include/asm/bitops.h实现了这些操作。在这两类操作中，每个函数分两种类型：原子类型和非原子类型。非原子类型的实现比较简单，介绍如下：

置位操作

/**
 * __set_bit - Set a bit in memory
 * @nr: the bit to set （位图中的比特数目）
 * @addr: the address to start counting from （位图中某个比特需要设值的地址）
 *
 * Unlike set_bit(), this function is non-atomic and may be reordered.
 * If it's called on the same region of memory simultaneously, the effect
 * may be that only one operation succeeds.
 */
static inline void __set_bit(long nr, volatile unsigned long *addr)
{
    asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory");
}

在该函数中，指令bts用来选择位图中的某个比特值来作为首个操作数，然后将已选择的比特值存入寄存器CF标签中，并设置此比特位。

清零操作

static inline void __clear_bit(long nr, volatile unsigned long *addr)
{
    asm volatile("btr %1,%0" : ADDR : "Ir" (nr));
}

该函数用来清除某个地址的某个比特值。
指令btr与指令bts类似，选择某个比特值作为首个操作数，然后将其值存入寄存器CF标签中，并清除位图中的这个比特值，并且将位图作为指令的第二个操作数。

位图在Ext2文件系统中的应用

Ext2文件系统结构图示意如下：

ext2文件系统结构示意图

在Ext2文件系统中，采用了位图来存储信息的结构有：

组描述符表（GDT，Group Descriptor Table，可用df命令查看）
块位图（Block Bitmap）
inode位图（inode Bitmap）

各项具体含义参考文章

查找文件流程

以查找/home/slot目录下的文件myfile为例，该文件的完整路径就是/home/slot/myfile，系统首先从根目录/开始查找，根目录/的inode编号固定为2：

根目录的inode编号

在根目录/下的目录项中查找名字为home的目录:

根目录下的文件信息

实际上目录文件的结构非常简单，就是一系列目录项（dirent）的列表。每个目录项由以下几部分组成：

所包含文件的文件名;
文件名对应的inode编号；
文件类型；
记录长度。

接下来根据文件名找到其对应的inode编号：

home对应的inode编号

（同理依次找到目录slot下文件myfile的inode编号。）
再接下来，根据inode编号，查看inode table，找到具体的inode，inode中记录了两类信息：

文件属性；
文件指针。

其中，文件指针指向磁盘上的数据块（data block），文件的具体信息就存储在这些block上。所以，最终根据这些文件指针找到文件内容。

注意：
最后三个文件指针都是间接指针，指向间接寻址块（Indirect Block）（一级、二级、三级间接寻址），而非数据块。

整个流程如下图所示：

文件寻址流程

删除文件流程

当使用命令 rm filename 删除一个文件时，系统实际上是删除了其inode编号，具体是每调用一次 rm filename 命令，就将filename对应的inode编号的硬连接数减一（并删除相应目录下的目录项），当硬连接数减为0时系统才释放该inode编号，以待分配给其它新的文件使用。

系统每次分配inode编号，是从所有可用的inode编号中选择最小的那个。

当然，在删除inode编号之前，系统要先释放数据块（即将bitmap中对应的位清零）。

所以，使用rm命令实际上只是删除了文件的inode编号（这也是为什么一般删除操作都很快），而文件数据仍然存放在磁盘上，只是无法再查看到。因此，只要找到文件对应的inode信息，就可以将文件内容恢复出来（前提是这些数据block没有被后来的其它数据所覆盖。所以在误删文件后，应立即停止所有写操作，以防止原有数据被覆盖）。