写入稍微有些复杂,涉及到 cow
快照, tx
事务,freelist
页缓存等等,先看下 freelist
实现。
freelist 实现
type freelist struct {
ids []pgid // all free and available free page ids.
pending map[txid][]pgid // mapping of soon-to-be free page ids by tx.
cache map[pgid]bool // fast lookup of all free and pending page ids.
}
ids 是所有缓存页 id 的排序数组,cache 是对应所有缓存页构建的 map 用于快速查询。其中稍复杂些的就是 pending,存储每个事务所缓存的页列表,这些页列表在事务结束后很可能被释放,这也是构建 cow
的一步
持久化 freelist
DB 在调用 Open
时会生成 meta page 信息,freelist
页的 Pid 就存在 meta 中
// Read in the freelist.
db.freelist = newFreelist()
db.freelist.read(db.page(db.meta().freelist))
可以看到,调用 db.freelist.read
去构建列表。
// read initializes the freelist from a freelist page.
func (f *freelist) read(p *page) {
// If the page.count is at the max uint16 value (64k) then it's considered
// an overflow and the size of the freelist is stored as the first element.
idx, count := 0, int(p.count)
if count == 0xFFFF {
idx = 1
count = int(((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[0])
}
// Copy the list of page ids from the freelist.
if count == 0 {
f.ids = nil
} else {
ids := ((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[idx:count]
f.ids = make([]pgid, len(ids))
copy(f.ids, ids)
// Make sure they're sorted.
sort.Sort(pgids(f.ids))
}
// Rebuild the page cache.
f.reindex()
}
- 由于
p.count
是 2 bytes,如果值为 0xffff,那么第一个字段存储的是真正的 free page 个数,否则p.count
就是。 -
p.ptr
后面存储的就是 pid 数组,读取后添加到 f.ids 数组中,然后排序。最后调用reindex
,将 ids 中的 pid 添加到 f.cache map
至于保存 freelist, 就是调用 write
函数,操作方向相反而己,没什么好说的
分配 freelist
分配页时,返回 n 个连续可用页 的首个 id. 这么做是有好处的,对这片磁盘的写都是顺序写,减少随机写。但是这里有个问题,如果一直找不到,那就会大量的从物理空间申请,会有碎片化问题
// allocate returns the starting page id of a contiguous list of pages of a given size.
// If a contiguous block cannot be found then 0 is returned.
func (f *freelist) allocate(n int) pgid {
if len(f.ids) == 0 {
return 0
}
var initial, previd pgid
for i, id := range f.ids {
if id <= 1 {
panic(fmt.Sprintf("invalid page allocation: %d", id))
}
// Reset initial page if this is not contiguous. 如果不连续,那么重新计算
if previd == 0 || id-previd != 1 {
initial = id
}
// If we found a contiguous block then remove it and return it. 找到了连续的 n 个页
if (id-initial)+1 == pgid(n) {
// If we're allocating off the beginning then take the fast path
// and just adjust the existing slice. This will use extra memory
// temporarily but the append() in free() will realloc the slice
// as is necessary. 从 ids 数组中移除
if (i + 1) == n {
f.ids = f.ids[i+1:]
} else {
copy(f.ids[i-n+1:], f.ids[i+1:])
f.ids = f.ids[:len(f.ids)-n]
}
// Remove from the free cache. 从 cache map 中移除
for i := pgid(0); i < pgid(n); i++ {
delete(f.cache, initial+i)
}
return initial // 返回起始 pid
}
previd = id
}
return 0
}
释放 freelist
free
给定 txid, 释放 page 及 overflow 所有的页,将所有 pid 追加到 f.pending[txid] 数组中,并添加到 cache map
// free releases a page and its overflow for a given transaction id.
// If the page is already free then a panic will occur.
func (f *freelist) free(txid txid, p *page) {
if p.id <= 1 {
panic(fmt.Sprintf("cannot free page 0 or 1: %d", p.id))
}
// Free page and all its overflow pages.
var ids = f.pending[txid]
for id := p.id; id <= p.id+pgid(p.overflow); id++ {
// Verify that page is not already free.
if f.cache[id] {
panic(fmt.Sprintf("page %d already freed", id))
}
// Add to the freelist and cache.
ids = append(ids, id)
f.cache[id] = true
}
f.pending[txid] = ids
}
release
指定最大 txid 释放全部无效事务的引用页,遍历所有 pending 事务 map,所有事务 id 小于给定 txid 的全部释放,并追加到 f.ids 数组中,然后排序
// release moves all page ids for a transaction id (or older) to the freelist.
func (f *freelist) release(txid txid) {
m := make(pgids, 0)
for tid, ids := range f.pending {
if tid <= txid {
// Move transaction's pending pages to the available freelist.
// Don't remove from the cache since the page is still free.
m = append(m, ids...)
delete(f.pending, tid)
}
}
sort.Sort(m)
f.ids = pgids(f.ids).merge(m)
}
小结
freelist 比较容易理解,就先写到这里。
网友评论