The Official Introduction in the LMDB Source Code

Author: CPinging | Published: 2020-02-09 15:40

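I recently compared the efficiency of LMDB and RocksDB and found that LMDB is indeed faster than RocksDB for queries. So I plan to work through LMDB systematically, from theory down to the source code.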

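After getting the code, I found that there are only four key files. In this post I translate the official English description to see how the project itself introduces LMDB.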


The file analyzed this time is lmdb.h:

/** @file lmdb.h
 *  @brief Lightning memory-mapped database library
 *
 *  @mainpage   Lightning Memory-Mapped Database Manager (LMDB)
 *
 *  @section intro_sec Introduction
 *  LMDB is a Btree-based database management library modeled loosely on the
 *  BerkeleyDB API, but much simplified. The entire database is exposed
 *  in a memory map, and all data fetches return data directly
 *  from the mapped memory, so no malloc's or memcpy's occur during
 *  data fetches. As such, the library is extremely simple because it
 *  requires no page caching layer of its own, and it is extremely high
 *  performance and memory-efficient. It is also fully transactional with
 *  full ACID semantics, and when the memory map is read-only, the
 *  database integrity cannot be corrupted by stray pointer writes from
 *  application code.
 *
 *  The library is fully thread-aware and supports concurrent read/write
 *  access from multiple processes and threads. Data pages use a copy-on-
 *  write strategy so no active data pages are ever overwritten, which
 *  also provides resistance to corruption and eliminates the need of any
 *  special recovery procedures after a system crash. Writes are fully
 *  serialized; only one write transaction may be active at a time, which
 *  guarantees that writers can never deadlock. The database structure is
 *  multi-versioned so readers run with no locks; writers cannot block
 *  readers, and readers don't block writers.
 *
 *  Unlike other well-known database mechanisms which use either write-ahead
 *  transaction logs or append-only data writes, LMDB requires no maintenance
 *  during operation. Both write-ahead loggers and append-only databases
 *  require periodic checkpointing and/or compaction of their log or database
 *  files otherwise they grow without bound. LMDB tracks free pages within
 *  the database and re-uses them for new write operations, so the database
 *  size does not grow without bound in normal use.
 *
 *  The memory map can be used as a read-only or read-write map. It is
 *  read-only by default as this provides total immunity to corruption.
 *  Using read-write mode offers much higher write performance, but adds
 *  the possibility for stray application writes thru pointers to silently
 *  corrupt the database. Of course if your application code is known to
 *  be bug-free (...) then this is not an issue.
 *
 *  If this is your first time using a transactional embedded key/value
 *  store, you may find the \ref starting page to be helpful.
 *
 *  @section caveats_sec Caveats
 *  Troubleshooting the lock file, plus semaphores on BSD systems:
 *
 *  - A broken lockfile can cause sync issues.
 *    Stale reader transactions left behind by an aborted program
 *    cause further writes to grow the database quickly, and
 *    stale locks can block further operation.
 *
 *    Fix: Check for stale readers periodically, using the
 *    #mdb_reader_check function or the \ref mdb_stat_1 "mdb_stat" tool.
 *    Stale writers will be cleared automatically on some systems:
 *    - Windows - automatic
 *    - Linux, systems using POSIX mutexes with Robust option - automatic
 *    - not on BSD, systems using POSIX semaphores.
 *    Otherwise just make all programs using the database close it;
 *    the lockfile is always reset on first open of the environment.
 *
 *  - On BSD systems or others configured with MDB_USE_POSIX_SEM,
 *    startup can fail due to semaphores owned by another userid.
 *
 *    Fix: Open and close the database as the user which owns the
 *    semaphores (likely last user) or as root, while no other
 *    process is using the database.
 *
 *  Restrictions/caveats (in addition to those listed for some functions):
 *
 *  - Only the database owner should normally use the database on
 *    BSD systems or when otherwise configured with MDB_USE_POSIX_SEM.
 *    Multiple users can cause startup to fail later, as noted above.
 *
 *  - There is normally no pure read-only mode, since readers need write
 *    access to locks and lock file. Exceptions: On read-only filesystems
 *    or with the #MDB_NOLOCK flag described under #mdb_env_open().
 *
 *  - An LMDB configuration will often reserve considerable \b unused
 *    memory address space and maybe file size for future growth.
 *    This does not use actual memory or disk space, but users may need
 *    to understand the difference so they won't be scared off.
 *
 *  - By default, in versions before 0.9.10, unused portions of the data
 *    file might receive garbage data from memory freed by other code.
 *    (This does not happen when using the #MDB_WRITEMAP flag.) As of
 *    0.9.10 the default behavior is to initialize such memory before
 *    writing to the data file. Since there may be a slight performance
 *    cost due to this initialization, applications may disable it using
 *    the #MDB_NOMEMINIT flag. Applications handling sensitive data
 *    which must not be written should not use this flag. This flag is
 *    irrelevant when using #MDB_WRITEMAP.
 *
 *  - A thread can only use one transaction at a time, plus any child
 *    transactions.  Each transaction belongs to one thread.  See below.
 *    The #MDB_NOTLS flag changes this for read-only transactions.
 *
 *  - Use an MDB_env* in the process which opened it, not after fork().
 *
 *  - Do not have open an LMDB database twice in the same process at
 *    the same time.  Not even from a plain open() call - close()ing it
 *    breaks fcntl() advisory locking.  (It is OK to reopen it after
 *    fork() - exec*(), since the lockfile has FD_CLOEXEC set.)
 *
 *  - Avoid long-lived transactions.  Read transactions prevent
 *    reuse of pages freed by newer write transactions, thus the
 *    database can grow quickly.  Write transactions prevent
 *    other write transactions, since writes are serialized.
 *
 *  - Avoid suspending a process with active transactions.  These
 *    would then be "long-lived" as above.  Also read transactions
 *    suspended when writers commit could sometimes see wrong data.
 *
 *  ...when several processes can use a database concurrently:
 *
 *  - Avoid aborting a process with an active transaction.
 *    The transaction becomes "long-lived" as above until a check
 *    for stale readers is performed or the lockfile is reset,
 *    since the process may not remove it from the lockfile.
 *
 *    This does not apply to write transactions if the system clears
 *    stale writers, see above.
 *
 *  - If you do that anyway, do a periodic check for stale readers. Or
 *    close the environment once in a while, so the lockfile can get reset.
 *
 *  - Do not use LMDB databases on remote filesystems, even between
 *    processes on the same host.  This breaks flock() on some OSes,
 *    possibly memory map sync, and certainly sync between programs
 *    on different hosts.
 *
 *  - Opening a database can fail if another process is opening or
 *    closing it at exactly the same time.
 *
 *  @author Howard Chu, Symas Corporation.
 *
 *  @copyright Copyright 2011-2018 Howard Chu, Symas Corp. All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted only as authorized by the OpenLDAP
 * Public License.
 *
 * A copy of this license is available in the file LICENSE in the
 * top-level directory of the distribution or, alternatively, at
 * <http://www.OpenLDAP.org/license.html>.
 *
 *  @par Derived From:
 * This code is derived from btree.c written by Martin Hedenfalk.
 *
 * Copyright (c) 2009, 2010 Martin Hedenfalk <martin@bzero.se>
 *
 * Permission to use, copy, modify, and distribute this software for any
 * purpose with or without fee is hereby granted, provided that the above
 * copyright notice and this permission notice appear in all copies.
 *
 * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 */

The explanation is as follows:

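LMDB is a B-tree-based database management library whose API is loosely modeled on BerkeleyDB's, but much simplified. The whole database is exposed through a memory map, and every fetch returns data directly from the mapped memory, so no malloc or memcpy takes place during reads. Because it needs no page cache layer of its own, the library is very simple and very efficient. It is also fully transactional with full ACID semantics, and when the memory map is read-only, stray pointer writes from application code cannot corrupt the database.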
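To make the "no malloc, no memcpy" point concrete, here is a minimal sketch using plain POSIX mmap. It is not LMDB code: `map_file_readonly`, `fetch` and `unmap_file` are hypothetical helpers invented for illustration, and the file is treated as a flat byte array rather than a B-tree.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not an LMDB function: map a whole file read-only.
 * Returns the base of the mapping, or NULL on failure. */
const unsigned char *map_file_readonly(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    /* PROT_READ: a stray write through this pointer faults instead of
     * silently changing the file contents. */
    void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return (const unsigned char *)base;
}

/* A "fetch" is only pointer arithmetic into the mapping:
 * nothing is allocated and nothing is copied. */
const unsigned char *fetch(const unsigned char *base, size_t offset)
{
    return base + offset;
}

/* Release the mapping created by map_file_readonly(). */
void unmap_file(const unsigned char *base, size_t len)
{
    munmap((void *)base, len);
}
```

A caller maps the file once, calls `fetch(base, offset)` as often as needed, and calls `unmap_file` at the end. The returned pointers are only valid while the mapping exists, which mirrors why data returned from a memory-mapped store must not be used after the mapping is gone.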

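The library is fully thread-aware and supports concurrent read/write access from multiple threads and processes. Data pages use a copy-on-write strategy, so active data pages are never overwritten. This also gives resistance to corruption and removes the need for any special recovery procedure after a system crash. Writes are fully serialized: only one write transaction may be active at a time, which guarantees that writers can never deadlock. The database structure is multi-versioned, so readers run without locks; writers do not block readers, and readers do not block writers.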
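The copy-on-write and multi-version idea can be sketched with a toy model. This is an illustration under simplifying assumptions, not LMDB's B-tree code: the "database" is a fixed array of four one-integer pages, and `Page`, `Snapshot`, `snapshot_create`, `snapshot_write` and `snapshot_read` are hypothetical names. Superseded pages are deliberately never freed here.

```c
#include <assert.h>
#include <stdlib.h>

#define NPAGES 4

typedef struct {
    int value;
} Page;

/* One committed version of the toy database. */
typedef struct {
    unsigned long txn_id;
    Page *pages[NPAGES];
} Snapshot;

/* Build the initial version: every page holds 0. */
Snapshot *snapshot_create(void)
{
    Snapshot *s = calloc(1, sizeof *s);
    if (s == NULL)
        return NULL;
    for (int i = 0; i < NPAGES; i++) {
        s->pages[i] = calloc(1, sizeof(Page));
        if (s->pages[i] == NULL) {
            for (int j = 0; j < i; j++)
                free(s->pages[j]);
            free(s);
            return NULL;
        }
    }
    return s;
}

/* Commit a write without touching the current version: copy only the page
 * being changed, and share every other page with `cur`. */
Snapshot *snapshot_write(const Snapshot *cur, int page_no, int new_value)
{
    assert(cur != NULL && page_no >= 0 && page_no < NPAGES);

    Snapshot *next = malloc(sizeof *next);
    Page *copy = malloc(sizeof *copy);
    if (next == NULL || copy == NULL) {
        free(next);
        free(copy);
        return NULL;
    }
    *next = *cur;                  /* share all page pointers */
    *copy = *cur->pages[page_no];  /* copy the page ... */
    copy->value = new_value;       /* ... and modify only the copy */
    next->pages[page_no] = copy;
    next->txn_id = cur->txn_id + 1;
    return next;
}

/* Readers only dereference pointers in the snapshot they started with. */
int snapshot_read(const Snapshot *s, int page_no)
{
    assert(s != NULL && page_no >= 0 && page_no < NPAGES);
    return s->pages[page_no]->value;
}
```

A reader that keeps the pointer to the old snapshot continues to see the old value after `snapshot_write` returns, without taking any lock, while new readers start from the new snapshot. Reclaiming the superseded pages is a separate problem.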

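Unlike other well-known database mechanisms that use either write-ahead transaction logs or append-only writes, LMDB requires no maintenance during operation. Both of those approaches need periodic checkpointing and/or compaction of their log or database files, which otherwise grow without bound. LMDB tracks free pages within the database and reuses them for later writes, so the database size does not grow without bound in normal use.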

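The memory map can be used as a read-only or a read-write map. It is read-only by default, which protects the database from corruption. Read-write mode offers much higher write performance, but stray writes through pointers in buggy application code can then silently corrupt the database.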

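If this is your first time using a transactional embedded key/value store, the header suggests reading its getting-started page (\ref starting). It then lists the following caveats: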

Troubleshooting the lock file, plus semaphores on BSD systems:

  • A broken lockfile can cause sync issues.

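Stale reader transactions left behind by an aborted program cause further writes to grow the database quickly, and stale locks can block further operation. The header's suggested fix is to check for stale readers periodically, using the mdb_reader_check function or the mdb_stat tool.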

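A thread can use only one transaction at a time, plus any child transactions, and each transaction belongs to one thread.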

A process must not have the same LMDB database open twice at the same time.

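Avoid long-lived transactions. Read transactions prevent the reuse of pages freed by newer write transactions, so the database can grow quickly. Because writes are serialized, a write transaction also blocks all other write transactions.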
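Why one old reader makes the file grow can be shown with a small sketch. The rule below is a simplified assumption made for illustration, and LMDB's real bookkeeping differs in detail: a page freed by write transaction T may be reused only once no active reader is on a snapshot older than T. `FreedPage`, `oldest_reader` and `count_reusable` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long txnid_t;

typedef struct {
    unsigned long page_no;
    txnid_t freed_by; /* id of the write transaction that freed the page */
} FreedPage;

/* Return the oldest snapshot id still being read. A slot holding 0 is
 * unused. With no active readers, the newest committed id is returned. */
txnid_t oldest_reader(const txnid_t *readers, size_t n, txnid_t latest)
{
    assert(readers != NULL || n == 0);

    txnid_t oldest = latest;
    for (size_t i = 0; i < n; i++) {
        if (readers[i] != 0 && readers[i] < oldest)
            oldest = readers[i];
    }
    return oldest;
}

/* Count the freed pages a writer may reuse. A reader on snapshot R still
 * needs every page freed by a later transaction T > R, so a page is
 * reusable only when freed_by <= oldest (simplified rule of this sketch). */
size_t count_reusable(const FreedPage *freed, size_t n, txnid_t oldest)
{
    assert(freed != NULL || n == 0);

    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (freed[i].freed_by <= oldest)
            count++;
    }
    return count;
}
```

With readers on snapshots 7 and 9 and pages freed by transactions 5, 7 and 8, only the first two pages are reusable in this model. Once the reader on snapshot 7 finishes, all three are. Until then a writer has to take new pages from the end of the file, which is the growth the caveat warns about.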

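Avoid suspending a process that has active transactions, since these effectively become long-lived as described above. In addition, read transactions that are suspended while writers commit can sometimes see wrong data.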

When several processes can use a database concurrently, note the following:

  • Avoid aborting a process that has an active transaction.

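If you do that anyway, check for stale readers periodically, or close the environment once in a while so that the lockfile can be reset.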

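Do not use LMDB databases on remote filesystems, even between processes on the same host, because this breaks flock() on some operating systems.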

Opening a database can fail if another process is opening or closing it at exactly the same time.
