Redis-数据结构

作者: 麦大大吃不胖 | 来源:发表于2020-12-11 20:17 被阅读0次

Redis-数据结构
Redis-数据结构
redis-数据结构-链表
Redis-数据结构-对象
使用过Redis，我竟然还不知道Rdb
Redis学习之路（12）- 杂记
Redis-数据结构-跳跃表
数据结构之Redis-跳表
Redis-数据结构-SDS、链表
Redis-数据结构-RedisObject、字典

by shihang.mai

基于redis版本6.0。redis采用字节数组存储，做到对多语言安全。

0. 前言

之前对redis的操作只停留对各个数据结构curd，而并不知道各个数据结构的底层实现，作为程序员的我们还是得深入底层，主要为了提升技术(面试，手动狗头)

1. 数据结构

我们常用的5大数据结构:String、List、Set、ZSet、Hash。这些数据结构都只是针对val值而言。但是在redis的底层实现中，会将它们抽象为redisOject

//server.h
typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time). */
    int refcount;
    void *ptr;
} robj;

其中有3个最重要的属性type、 encoding 和 ptr

type: 记录了对象所保存的值的类型,可选值

//server.h
#define OBJ_STRING 0    /* String object. */
#define OBJ_LIST 1      /* List object. */
#define OBJ_SET 2       /* Set object. */
#define OBJ_ZSET 3      /* Sorted set object. */
#define OBJ_HASH 4      /* Hash object. */
#define OBJ_MODULE 5    /* Module object. */
#define OBJ_STREAM 6    /* Stream object. */

encoding: 记录了对象所保存的值的编码,可选值

//server.h
#define OBJ_ENCODING_RAW 0     /* Raw representation */
#define OBJ_ENCODING_INT 1     /* Encoded as integer */
#define OBJ_ENCODING_HT 2      /* Encoded as hash table */
#define OBJ_ENCODING_ZIPMAP 3  /* Encoded as zipmap */
#define OBJ_ENCODING_LINKEDLIST 4 /* No longer used: old list encoding. */
#define OBJ_ENCODING_ZIPLIST 5 /* Encoded as ziplist */
#define OBJ_ENCODING_INTSET 6  /* Encoded as intset */
#define OBJ_ENCODING_SKIPLIST 7  /* Encoded as skiplist */
#define OBJ_ENCODING_EMBSTR 8  /* Embedded sds string encoding */
#define OBJ_ENCODING_QUICKLIST 9 /* Encoded as linked list of ziplists */
#define OBJ_ENCODING_STREAM 10 /* Encoded as a radix tree of listpacks */

*ptr: 数据结构的指针。它是根据type和encoding确定的

他们的关系如下

redisObject.png

1.1 字符串类型

//object.c
#define OBJ_ENCODING_EMBSTR_SIZE_LIMIT 44
robj *createStringObject(const char *ptr, size_t len) {
    if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT)
        return createEmbeddedStringObject(ptr,len);
    else
        return createRawStringObject(ptr,len);
}

当值为数字类型时，那么它就会用INT存储
当值是一个字符串值&&长度<=44，那么就会用embstr编码的简单动态字存储
当值是一个字符串值&&长度>44就会用ram编码的简单动态字符存储

redis字符串底层逻辑.png

这里的简单动态字符在源码中就是SDS(simple dynamic string），并且会根据字符串长度选择不一样的数据结构，其实是存储类型不一样，体现了redis追求极致内存

//sds.c
static inline char sdsReqType(size_t string_size) {
    if (string_size < 1<<5)
        return SDS_TYPE_5;
    if (string_size < 1<<8)
        return SDS_TYPE_8;
    if (string_size < 1<<16)
        return SDS_TYPE_16;
#if (LONG_MAX == LLONG_MAX)
    if (string_size < 1ll<<32)
        return SDS_TYPE_32;
    return SDS_TYPE_64;
#else
    return SDS_TYPE_32;
#endif
}

//sds.h
/* Note: sdshdr5 is never used, we just access the flags byte directly.
 * However is here to document the layout of type 5 SDS strings. */
struct __attribute__ ((__packed__)) sdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

SDS.png

len: 记录当前已使用的字节数，即字符串长度
alloc: 记录当前字节数组总共分配的字节数量
flags: 标记当前字节数组的属性

//sds.h
#define SDS_TYPE_5  0
#define SDS_TYPE_8  1
#define SDS_TYPE_16 2
#define SDS_TYPE_32 3
#define SDS_TYPE_64 4

buf[]: 保存字符串的每一个字符元素

为啥要存在两种编码呢，看看底层

//object.c
#define OBJ_ENCODING_EMBSTR_SIZE_LIMIT 44
robj *createStringObject(const char *ptr, size_t len) {
    if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT)
        return createEmbeddedStringObject(ptr,len);
    else
        return createRawStringObject(ptr,len);
}

robj *createEmbeddedStringObject(const char *ptr, size_t len) {
    robj *o = zmalloc(sizeof(robj)+sizeof(struct sdshdr8)+len+1);
    struct sdshdr8 *sh = (void*)(o+1);

    o->type = OBJ_STRING;
    o->encoding = OBJ_ENCODING_EMBSTR;
    o->ptr = sh+1;
    o->refcount = 1;
    if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
        o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL;
    } else {
        o->lru = LRU_CLOCK();
    }

    sh->len = len;
    sh->alloc = len;
    sh->flags = SDS_TYPE_8;
    if (ptr == SDS_NOINIT)
        sh->buf[len] = '\0';
    else if (ptr) {
        memcpy(sh->buf,ptr,len);
        sh->buf[len] = '\0';
    } else {
        memset(sh->buf,0,len+1);
    }
    return o;
}

robj *createRawStringObject(const char *ptr, size_t len) {
    return createObject(OBJ_STRING, sdsnewlen(ptr,len));
}

robj *createObject(int type, void *ptr) {
    robj *o = zmalloc(sizeof(*o));
    o->type = type;
    o->encoding = OBJ_ENCODING_RAW;
    o->ptr = ptr;
    o->refcount = 1;

    /* Set the LRU to the current lruclock (minutes resolution), or
     * alternatively the LFU counter. */
    if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
        o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL;
    } else {
        o->lru = LRU_CLOCK();
    }
    return o;
}

embstr使用了连续内存，而raw不连续。
释放embstr只需要一次，而raw需要两次

1.2 快速列表quickList

双向链表+每个节点是一个ziplist

//quicklist.h
typedef struct quicklist {
    quicklistNode *head;
    quicklistNode *tail;
    unsigned long count;        /* total count of all entries in all ziplists */
    unsigned long len;          /* number of quicklistNodes */
    int fill : QL_FILL_BITS;              /* fill factor for individual nodes */
    unsigned int compress : QL_COMP_BITS; /* depth of end nodes not to compress;0=off */
    unsigned int bookmark_count: QL_BM_BITS;
    quicklistBookmark bookmarks[];
} quicklist;

typedef struct quicklistNode {
    struct quicklistNode *prev;
    struct quicklistNode *next;
    unsigned char *zl;
    unsigned int sz;             /* ziplist size in bytes */
    unsigned int count : 16;     /* count of items in ziplist */
    unsigned int encoding : 2;   /* RAW==1 or LZF==2 */
    unsigned int container : 2;  /* NONE==1 or ZIPLIST==2 */
    unsigned int recompress : 1; /* was this node previous compressed? */
    unsigned int attempted_compress : 1; /* node can't compress; too small */
    unsigned int extra : 10; /* more bits to steal for future usage */
} quicklistNode;

typedef struct quicklistBookmark {
    quicklistNode *node;
    char *name;
} quicklistBookmark;

出现quickList原因:
1.双端链表虽然便于在表的两端进行 push 和 pop 操作，但是它各个节点是单独的内存块，地址不连续，节点多了容易产生内存碎片；

而ziplist是一整块连续内存，所以存储效率很高，但是它不利于修改操作,每次数据变动都会引发一次内存的重新分配
结合它们做一个空间效率和时间效率的折中。同时拥有两个的优点

1.3 字典

字典就是key-val形式，hash表只是它的一种实现

//dict.h
typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    unsigned long iterators; /* number of iterators currently running */
} dict;

typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;

和jdk的hashmap一样，都是用链表解决hash冲突

dict.png

dict字典有一个长度为2的dictht类型数组，其中dictht[0]用来存放数据
当数据达到一定量时，就要扩容，即rehash了。首先将扩容标志位更新，然后利用利用dictht[1],长度变为dictht[0]的2倍，并且把数据全部迁移过去。
然后将dictht[1]改为dictht[0]，并且新建一个dictht[1]做下次扩容准备，同理缩减变为原来的一半。
但当数据很多的时候，这个扩容就做了一定优化了，不会是stw，而是当数据新增时，向dictht[1]增加，dictht[0]不增加；当为更新时，同时更新dictht[1]和dictht[0]

1.3 跳表

详情参见另外一篇我的博客

1.4 整数集合intset

//inset.h
typedef struct intset {
    uint32_t encoding;
    uint32_t length;
    int8_t contents[];
} intset;

intset.png

其中整数数组的元素从小到大排序，并且不重复

1.5 压缩列表ziplist

压缩列表是一组连续内存块组成的顺序的数据结构.在注释上写得很清楚它的数据结构了

//ziplist.c
* <zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>

zlbytes: 记录整个压缩列表占用的内存字节数
zltail: 距离起始的偏移量
zllen: 记录压缩列表包含的节点数量
entryN: 压缩列表的节点
zlend: 特殊值0xFF（十进制255），用于标记压缩列表的末端

而每一个entry如下

//ziplist.c
typedef struct zlentry {
    unsigned int prevrawlensize; /* Bytes used to encode the previous entry len*/
    unsigned int prevrawlen;     /* Previous entry len. */
    unsigned int lensize;        /* Bytes used to encode this entry type/len.
                                    For example strings have a 1, 2 or 5 bytes
                                    header. Integers always use a single byte.*/
    unsigned int len;            /* Bytes used to represent the actual entry.
                                    For strings this is just the string length
                                    while for integers it is 1, 2, 3, 4, 8 or
                                    0 (for 4 bit immediate) depending on the
                                    number range. */
    unsigned int headersize;     /* prevrawlensize + lensize. */
    unsigned char encoding;      /* Set to ZIP_STR_* or ZIP_INT_* depending on
                                    the entry encoding. However for 4 bits
                                    immediate integers this can assume a range
                                    of values and must be range-checked. */
    unsigned char *p;            /* Pointer to the very start of the entry, that
                                    is, this points to prev-entry-len field. */
} zlentry;

ziplist.png

2. 数据结构妙用

2.1 bitmap使用例子

统计某个用户登陆天数，位图上每一个位置表示一个日期

统计每一日用户登陆了多少,位图上每一个位置表示一个人(需要每一个用户映射一个位置)

3. redis的其他功能

3.1 管道

一次发送多个命令

(printf "PING\r\nPING\r\nPING\r\n"; sleep 1) | nc localhost 6379

3.2 消息订阅

#发布
publish key xxx
#订阅
SUBSCRIBE key

3.3 事务

#监控(CAS)
WATCH key
#打开事务
MULTI 
#命令
xxxxx
#执行命令
EXEC

多个事务,哪个执行先是看哪个事务的EXEC先到达

3.4 modules(模块)

布隆过滤器
某样东西一定不存在或可能存在

Counting Bloom
位图的位，升级为count

布谷鸟过滤器
两个hash函数，分别向数组的两个位置，其中有一个为空，放入，如果两个都有，那么挤走一个，被挤走的通过函数继续找位置，如果位置上有，那么将其挤走，这样循环

4. redis淘汰key策略

4.1 惰性删除

在get的时候，发现key失效，那么删除key返回null

4.2 定时主动删除

定时主动删除在 Redis 主线程中执行的，所以会影响redis对外提供的服务

因为惰性删除会导致很多key不及时删除占用很多内存，那么redis会默认每100ms检查设置了过期时间的集合，找到有过期的key则删除。会经历下面过程:

1. 随机测试20设置了过期时间的key
2. 删除所有发现的已过期的key
3. 若删除的key超过5个则重复步骤1
或者这次任务的执行耗时超过了 25 毫秒，才会退出循环

4.3 主动清理

主动清理在 Redis 主线程中执行的，所以会影响redis对外提供的服务

当Redis实例的内存超过设置的maxmemory时，会根据配置的策略maxmemory-policy来对key进行淘汰，可选的淘汰策略有如下几种，分3类记忆：

不淘汰

no-eviction：禁止驱逐数据，也就是说当内存不足以容纳新写入数据时，新写入操作会报错

在设置过期时间集合中淘汰

volatile-lru：从已设置过期时间的数据集（server.db[i].expires）中挑选最近最少使用的数据淘汰
volatile-ttl：从已设置过期时间的数据集（server.db[i].expires）中挑选将要过期的数据淘汰
volatile-random：从已设置过期时间的数据集（server.db[i].expires）中任意选择数据淘汰
volatile-lfu：从已设置过期时间的数据集(server.db[i].expires)中挑选最不经常使用的数据淘汰

在所有键中淘汰

allkeys-lru：当内存不足以容纳新写入数据时，在键空间中，移除最近最少使用的key（这个是最常用的）
allkeys-random：从数据集（server.db[i].dict）中任意选择数据淘汰
allkeys-lfu：当内存不足以容纳新写入数据时，在键空间中，移除最不经常使用的key

5. 缓存预热

系统上线时，提前将相关的缓存数据直接加载到缓存系统。避免在用户请求的时候，先查询数据库，然后再将数据缓存

需要考虑2个问题

哪些数据需要预热？
1. 根据业务不同，方法不同。
2. 例如云调就将所有纳管的设备就进行预热
如何预热？

找出了热点key之后，再根据自己的业务逻辑，到DB中查询数据填充到Redis中去.

利用springboot启动原理,观测者模式，直接实现InitializingBean即可

 public class Test implements InitializingBean {
     @Override
     public void afterPropertiesSet() throws Exception {
 
         //业务逻辑
     }
 }

Redis-数据结构
Redis有5种基本数据类型 string(字符串) 键值对计数扩容：【字符串】内部结构 list(列表) ...
Redis-数据结构
基本知识 by shihang.mai 二进制安全 redis采用字节数组存储，做到对多语言安全。在客户端需要约定...
redis-数据结构-链表
tips:本文参照《redis设计与实现》、《数据结构与算法》、redis源码链表提供了高效的节点重排能力，以及...
Redis-数据结构-对象
对象 redis没有直接使用SDS、链表、字典、压缩列表、整数集合等数据结构来实现键值对数据库，而是基于这些数...
使用过Redis，我竟然还不知道Rdb
使用过Redis，那就先说说使用过那些场景吧字符串缓存 //举例$redis->set();$redis->ge...
Redis学习之路（12）- 杂记
Redis-过期删除策略 Redis-删除策略： 1、定时删除：对内存友好，但是占用cpu 2、惰性删除：对cp...
Redis-数据结构-跳跃表
跳跃表（skiplist）跳跃表是一种有序数据结构，通过在每个节点中维护多个指向其他节点的指针，达到快速访问节点...
数据结构之Redis-跳表
在前面说Redis的文章里，提到了Redis的有序集合zset底层是依赖跳表实现的，当时没有展开讨论，内心认为...
Redis-数据结构-SDS、链表
一、简单动态字符串 SDS（simple dynamic string） 1、redis中使用SDS作为默认字符串...
Redis-数据结构-RedisObject、字典
一、对象 redisObject 1、定义与结构 redis使用对象来表示数据库中的键和值，对象包含字符串（str...

Redis-数据结构

0. 前言

1. 数据结构

1.1 字符串类型

1.2 快速列表quickList

1.3 字典

1.3 跳表

1.4 整数集合intset

1.5 压缩列表ziplist

2. 数据结构妙用

2.1 bitmap使用例子

3. redis的其他功能

3.1 管道

3.2 消息订阅

3.3 事务

3.4 modules(模块)

4. redis淘汰key策略

4.1 惰性删除

4.2 定时主动删除

4.3 主动清理

5. 缓存预热

相关文章

Redis-数据结构

Redis-数据结构

redis-数据结构-链表

Redis-数据结构-对象

使用过Redis，我竟然还不知道Rdb

Redis学习之路（12）- 杂记

Redis-数据结构-跳跃表

数据结构之Redis-跳表

Redis-数据结构-SDS、链表

Redis-数据结构-RedisObject、字典

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读