閱讀(40.3k) 書簽贊(0) 我要糾錯

Redis 字典

2018-08-02 11:45 更新

字典

字典的應(yīng)用
實現(xiàn)數(shù)據(jù)庫鍵空間
用作 Hash 類型鍵的底層實現(xiàn)之一
字典的實現(xiàn)
哈希表實現(xiàn)
哈希算法
創(chuàng)建新字典
添加鍵值對到字典
添加新元素到空白字典
添加新鍵值對時發(fā)生碰撞處理
添加新鍵值對時觸發(fā)了 rehash 操作
Rehash 執(zhí)行過程
1. 開始 rehash
2. Rehash 進行中
3. 節(jié)點遷移完畢
4. Rehash 完畢
漸進式 rehash
_dictRehashStep
dictRehashMilliseconds
其他措施
字典的收縮
字典其他操作
字典的迭代
小結(jié)

字典（dictionary），又名映射（map）或關(guān)聯(lián)數(shù)組（associative array）， [http://en.wikipedia.org/wiki/Associative_array]是一種抽象數(shù)據(jù)結(jié)構(gòu)，由一集鍵值對（key-value pairs）組成，各個鍵值對的鍵各不相同，程序可以添加新的鍵值對到字典中，或者基于鍵進行查找、更新或刪除等操作。

本章先對字典在 Redis 中的應(yīng)用進行介紹，接著講解字典的具體實現(xiàn)方式，以及這個字典實現(xiàn)要解決的問題，最后，以對字典迭代器的介紹作為本章的結(jié)束。

字典的應(yīng)用

字典在 Redis 中的應(yīng)用廣泛，使用頻率可以說和 SDS 以及雙端鏈表不相上下，基本上各個功能模塊都有用到字典的地方。

其中，字典的主要用途有以下兩個：

實現(xiàn)數(shù)據(jù)庫鍵空間（key space）；
用作 Hash 類型鍵的底層實現(xiàn)之一；

以下兩個小節(jié)分別介紹這兩種用途。

實現(xiàn)數(shù)據(jù)庫鍵空間

Redis 是一個鍵值對數(shù)據(jù)庫，數(shù)據(jù)庫中的鍵值對由字典保存：每個數(shù)據(jù)庫都有一個對應(yīng)的字典，這個字典被稱之為鍵空間（key space）。

當(dāng)用戶添加一個鍵值對到數(shù)據(jù)庫時（不論鍵值對是什么類型），程序就將該鍵值對添加到鍵空間；當(dāng)用戶從數(shù)據(jù)庫中刪除鍵值對時，程序就會將這個鍵值對從鍵空間中刪除；等等。

舉個例子，執(zhí)行 FLUSHDB [http://redis.readthedocs.org/en/latest/server/flushdb.html#flushdb] 可以清空鍵空間里的所有鍵值對數(shù)據(jù)：

redis> FLUSHDB
OK

執(zhí)行 DBSIZE [http://redis.readthedocs.org/en/latest/server/dbsize.html#dbsize] 則返回鍵空間里現(xiàn)有的鍵值對：

redis> DBSIZE
(integer) 0

還可以用 SET [http://redis.readthedocs.org/en/latest/string/set.html#set] 設(shè)置一個字符串鍵到鍵空間，并用 GET [http://redis.readthedocs.org/en/latest/string/get.html#get] 從鍵空間中取出該字符串鍵的值：

redis> SET number 10086
OK

redis> GET number
"10086"

redis> DBSIZE
(integer) 1

后面的《數(shù)據(jù)庫》一章會對鍵空間以及數(shù)據(jù)庫的實現(xiàn)作詳細的介紹，屆時將看到，大部分針對數(shù)據(jù)庫的命令，比如 DBSIZE [http://redis.readthedocs.org/en/latest/server/dbsize.html#dbsize] 、 FLUSHDB [http://redis.readthedocs.org/en/latest/server/flushdb.html#flushdb] 、 RANDOMKEY [http://redis.readthedocs.org/en/latest/key/randomkey.html#randomkey] ，等等，都是構(gòu)建于對字典的操作之上的；而那些創(chuàng)建、更新、刪除和查找鍵值對的命令，也無一例外地需要在鍵空間上進行操作。

用作 Hash 類型鍵的底層實現(xiàn)之一

Redis 的 Hash 類型鍵使用以下兩種數(shù)據(jù)結(jié)構(gòu)作為底層實現(xiàn):

字典；
壓縮列表；

因為壓縮列表比字典更節(jié)省內(nèi)存，所以程序在創(chuàng)建新 Hash 鍵時，默認使用壓縮列表作為底層實現(xiàn)，當(dāng)有需要時，程序才會將底層實現(xiàn)從壓縮列表轉(zhuǎn)換到字典。

當(dāng)用戶操作一個 Hash 鍵時，鍵值在底層就可能是一個哈希表：

redis> HSET book name "The design and implementation of Redis"
(integer) 1

redis> HSET book type "source code analysis"
(integer) 1

redis> HSET book release-date "2013.3.8"
(integer) 1

redis> HGETALL book
1) "name"
2) "The design and implementation of Redis"
3) "type"
4) "source code analysis"
5) "release-date"
6) "2013.3.8"

《哈希表》章節(jié)給出了關(guān)于哈希類型鍵的更多信息，并介紹了壓縮列表和字典之間的轉(zhuǎn)換條件。

介紹完了字典的用途，現(xiàn)在讓我們來看看字典數(shù)據(jù)結(jié)構(gòu)的定義。

字典的實現(xiàn)

實現(xiàn)字典的方法有很多種：

最簡單的就是使用鏈表或數(shù)組，但是這種方式只適用于元素個數(shù)不多的情況下；
要兼顧高效和簡單性，可以使用哈希表；
如果追求更為穩(wěn)定的性能特征，并希望高效地實現(xiàn)排序操作的話，則可使用更為復(fù)雜的平衡樹；

在眾多可能的實現(xiàn)中，Redis 選擇了高效、實現(xiàn)簡單的哈希表，作為字典的底層實現(xiàn)。

dict.h/dict 給出了這個字典的定義：

/*
 * 字典
 *
 * 每個字典使用兩個哈希表，用于實現(xiàn)漸進式 rehash
 */
typedef struct dict {

    // 特定于類型的處理函數(shù)
    dictType *type;

    // 類型處理函數(shù)的私有數(shù)據(jù)
    void *privdata;

    // 哈希表（2 個）
    dictht ht[2];

    // 記錄 rehash 進度的標(biāo)志，值為 -1 表示 rehash 未進行
    int rehashidx;

    // 當(dāng)前正在運作的安全迭代器數(shù)量
    int iterators;

} dict;

以下是用于處理 dict 類型的 API ，它們的作用及相應(yīng)的算法復(fù)雜度：

操作	函數(shù)	算法復(fù)雜度
創(chuàng)建一個新字典	`dictCreate`	(O(1))
添加新鍵值對到字典	`dictAdd`	(O(1))
添加或更新給定鍵的值	`dictReplace`	(O(1))
在字典中查找給定鍵所在的節(jié)點	`dictFind`	(O(1))
在字典中查找給定鍵的值	`dictFetchValue`	(O(1))
從字典中隨機返回一個節(jié)點	`dictGetRandomKey`	(O(1))
根據(jù)給定鍵，刪除字典中的鍵值對	`dictDelete`	(O(1))
清空并釋放字典	`dictRelease`	(O(N))
清空并重置（但不釋放）字典	`dictEmpty`	(O(N))
縮小字典	`dictResize`	(O(N))
擴大字典	`dictExpand`	(O(N))
對字典進行給定步數(shù)的 rehash	`dictRehash`	(O(N))
在給定毫秒內(nèi)，對字典進行rehash	`dictRehashMilliseconds`	(O(N))

注意 dict 類型使用了兩個指針，分別指向兩個哈希表。

其中，0 號哈希表（ht[0]）是字典主要使用的哈希表，而 1 號哈希表（ht[1]）則只有在程序?qū)?0 號哈希表進行 rehash 時才使用。

接下來兩個小節(jié)將對哈希表的實現(xiàn)，以及哈希表所使用的哈希算法進行介紹。

哈希表實現(xiàn)

字典所使用的哈希表實現(xiàn)由 dict.h/dictht 類型定義：

/*
 * 哈希表
 */
typedef struct dictht {

    // 哈希表節(jié)點指針數(shù)組（俗稱桶，bucket）
    dictEntry **table;

    // 指針數(shù)組的大小
    unsigned long size;

    // 指針數(shù)組的長度掩碼，用于計算索引值
    unsigned long sizemask;

    // 哈希表現(xiàn)有的節(jié)點數(shù)量
    unsigned long used;

} dictht;

table 屬性是個數(shù)組，數(shù)組的每個元素都是個指向 dictEntry 結(jié)構(gòu)的指針。

每個 dictEntry 都保存著一個鍵值對，以及一個指向另一個 dictEntry 結(jié)構(gòu)的指針：

/*
 * 哈希表節(jié)點
 */
typedef struct dictEntry {

    // 鍵
    void *key;

    // 值
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;

    // 鏈往后繼節(jié)點
    struct dictEntry *next;

} dictEntry;

next 屬性指向另一個 dictEntry 結(jié)構(gòu)，多個 dictEntry 可以通過 next 指針串連成鏈表，從這里可以看出，dictht使用鏈地址法來處理鍵碰撞 [http://en.wikipedia.org/wiki/Hash_table#Separate_chaining]：當(dāng)多個不同的鍵擁有相同的哈希值時，哈希表用一個鏈表將這些鍵連接起來。

下圖展示了一個由 dictht 和數(shù)個 dictEntry 組成的哈希表例子：

$digraph hash_table_example { // setting rankdir = LR; node[shape = record, style = filled]; edge [style = bold]; // nodes ht1 [label=$ dictht |

如果再加上之前列出的 dict 類型，那么整個字典結(jié)構(gòu)可以表示如下：

$digraph hash_table_example { // setting rankdir = LR; node[shape=record, style = filled]; edge [style = bold]; // nodes dict [label=$ ht[2] | rehashidx: -1 | iterators: 0", fillcolor = "#A8E270"]; ht0 [label="dictht |

在上圖的字典示例中，字典雖然創(chuàng)建了兩個哈希表，但正在使用的只有 0 號哈希表，這說明字典未進行 rehash 狀態(tài)。

哈希算法

Redis 目前使用兩種不同的哈希算法：

MurmurHash2 32 bit 算法：這種算法的分布率和速度都非常好，具體信息請參考 MurmurHash 的主頁： http://code.google.com/p/smhasher/ 。
基于 djb 算法實現(xiàn)的一個大小寫無關(guān)散列算法：具體信息請參考 http://www.cse.yorku.ca/~oz/hash.html 。

使用哪種算法取決于具體應(yīng)用所處理的數(shù)據(jù)：

命令表以及 Lua 腳本緩存都用到了算法 2 。
算法 1 的應(yīng)用則更加廣泛：數(shù)據(jù)庫、集群、哈希鍵、阻塞操作等功能都用到了這個算法。

創(chuàng)建新字典

dictCreate 函數(shù)創(chuàng)建并返回一個新字典：

dict *d = dictCreate(&hash_type, NULL);

d 的值可以用圖片表示如下：

$digraph empty_dict { // setting rankdir = LR; node[shape = record, style = filled]; edge [style = bold]; // nodes dict [label=$ ht[2] | rehashidx | iterators", fillcolor = "#A8E270"]; ht0 [label="dictht |

新創(chuàng)建的兩個哈希表都沒有為 table 屬性分配任何空間：

ht[0]->table 的空間分配將在第一次往字典添加鍵值對時進行；
ht[1]->table 的空間分配將在 rehash 開始時進行；

添加鍵值對到字典

根據(jù)字典所處的狀態(tài)，將給定的鍵值對添加到字典可能會引起一系列復(fù)雜的操作：

如果字典為未初始化（即字典的 0 號哈希表的 table 屬性為空），則程序需要對 0 號哈希表進行初始化；
如果在插入時發(fā)生了鍵碰撞，則程序需要處理碰撞；
如果插入新元素，使得字典滿足了 rehash 條件，則需要啟動相應(yīng)的 rehash 程序；

當(dāng)程序處理完以上三種情況之后，新的鍵值對才會被真正地添加到字典上。

整個添加流程可以用下圖表示：

digraph dictAdd { node[shape=plaintext, style = filled]; edge [style = bold]; // start [label= key_exists_or_not; return_null_if_key_exists [label="返回 NULL ，\n表示添加失敗"]; key_exists_or_not -> return_null_if_key_exists [label="是"]; dict_empty_or_not [label="ht[0]\n 未分配任何空間？", shape=diamond, fillcolor = "#95BBE3"]; key_exists_or_not -> dict_empty_or_not [label="否"]; init_hash_table_one [label="初始化 ht[0]"]; dict_empty_or_not -> init_hash_table_one [label="是"]; init_hash_table_one -> need_rehash_or_not; need_rehash_or_not [label="需要 rehash ？", shape=diamond, fillcolor = "#95BBE3"]; dict_empty_or_not -> need_rehash_or_not [label="否"]; begin_incremental_rehash [label="開始漸進式 rehash "]; need_rehash_or_not -> begin_incremental_rehash [label="需要，\n并且 rehash 未進行"]; begin_incremental_rehash -> rehashing_or_not; rehashing_or_not [label="rehash\n 正在進行中？", shape=diamond, fillcolor = "#95BBE3"]; need_rehash_or_not -> rehashing_or_not [label="不需要，\n或者 rehash 正在進行"]; is_rehashing [label="選擇 ht[1] 作為新鍵值對的添加目標(biāo)"]; not_rehashing [label="選擇 ht[0] 作為新鍵值對的添加目標(biāo)"]; rehashing_or_not -> is_rehashing [label="是"]; rehashing_or_not -> not_rehashing [label="否"]; calc_hash_code_and_index_by_key [label="根據(jù)給定鍵，計算出哈希值，以及索引值"]; is_rehashing -> calc_hash_code_and_index_by_key; not_rehashing -> calc_hash_code_and_index_by_key; create_entry_and_assoc_key_and_value [label="創(chuàng)建新 dictEntry ，并保存給定鍵值對"]; calc_hash_code_and_index_by_key -> create_entry_and_assoc_key_and_value; add_entry_to_hashtable [label="根據(jù)索引值，將新節(jié)點添加到目標(biāo)哈希表"]; create_entry_and_assoc_key_and_value -> add_entry_to_hashtable;}" />

在接下來的三節(jié)中，我們將分別看到，添加操作如何在以下三種情況中執(zhí)行：

字典為空；
添加新鍵值對時發(fā)生碰撞處理；
添加新鍵值對時觸發(fā)了 rehash 操作；

添加新元素到空白字典

當(dāng)?shù)谝淮瓮兆值淅锾砑渔I值對時，程序會根據(jù) dict.h/DICT_HT_INITIAL_SIZE 里指定的大小為d->ht[0]->table 分配空間（在目前的版本中， DICT_HT_INITIAL_SIZE 的值為 4 ）。

以下是字典空白時的樣子：

以下是往空白字典添加了第一個鍵值對之后的樣子：

$digraph add_first_entry_to_empty_dict { // setting rankdir = LR; node[shape=record, style = filled]; edge [style = bold]; // nodes dict [label=$ ht[2] | rehashidx | iterators", fillcolor = "#A8E270"]; ht0 [label="dictht |

添加新鍵值對時發(fā)生碰撞處理

在哈希表實現(xiàn)中，當(dāng)兩個不同的鍵擁有相同的哈希值時，稱這兩個鍵發(fā)生碰撞（collision），而哈希表實現(xiàn)必須想辦法對碰撞進行處理。

字典哈希表所使用的碰撞解決方法被稱之為鏈地址法 [http://en.wikipedia.org/wiki/Hash_table#Separate_chaining]：這種方法使用鏈表將多個哈希值相同的節(jié)點串連在一起，從而解決沖突問題。

假設(shè)現(xiàn)在有一個帶有三個節(jié)點的哈希表，如下圖：

$digraph before_key_collision { // setting rankdir = LR; node[shape=record, style = filled]; edge [style = bold]; // nodes bucket [label=$ 0 | 1 | 2 | 3 ", fillcolor = "#F2F2F2"]; pair_1 [label="dictEntry |{key1 | value1 |next}", fillcolor = "#FADCAD"]; pair_2 [label="dictEntry |{key2 | value2 |next}", fillcolor = "#FADCAD"]; pair_3 [label="dictEntry |{key3 | value3 |next}", fillcolor = "#FADCAD"]; null0 [label="NULL", shape=plaintext]; null1 [label="NULL", shape=plaintext]; null2 [label="NULL", shape=plaintext]; null3 [label="NULL", shape=plaintext]; // lines bucket:table0 -> pair_1:head; pair_1:next -> null0; bucket:table1 -> null1; bucket:table2 -> pair_2:head; pair_2:next -> null2; bucket:table3 -> pair_3:head; pair_3:next -> null3; // label label = "添加碰撞節(jié)點之前";}" />

對于一個新的鍵值對 key4 和 value4 ，如果 key4 的哈希值和 key1 的哈希值相同，那么它們將在哈希表的 0 號索引上發(fā)生碰撞。

通過將 key4-value4 和 key1-value1 兩個鍵值對用鏈表連接起來，就可以解決碰撞的問題：

$digraph after_key_collision { // setting rankdir = LR; node[shape=record, style = filled]; edge [style = bold]; // nodes bucket [label=$ 0 | 1 | 2 | 3 ", fillcolor = "#F2F2F2"]; pair_1 [label="dictEntry |{key1 | value1 |next}", fillcolor = "#FADCAD"]; pair_2 [label="dictEntry |{key2 | value2 |next}", fillcolor = "#FADCAD"]; pair_3 [label="dictEntry |{key3 | value3 |next}", fillcolor = "#FADCAD"]; pair_4 [label="dictEntry |{key4 | value4 |next}", fillcolor = "#FFC1C1"]; null0 [label="NULL", shape=plaintext]; null1 [label="NULL", shape=plaintext]; null2 [label="NULL", shape=plaintext]; null3 [label="NULL", shape=plaintext]; // lines bucket:table0 -> pair_4:head; pair_4:next -> pair_1:head; pair_1:next -> null0; bucket:table1 -> null1; bucket:table2 -> pair_2:head; pair_2:next -> null2; bucket:table3 -> pair_3:head; pair_3:next -> null3; // label label = "添加碰撞節(jié)點之后";}" />

添加新鍵值對時觸發(fā)了 rehash 操作

對于使用鏈地址法來解決碰撞問題的哈希表 dictht 來說，哈希表的性能取決于大?。?code>size屬性）與保存節(jié)點數(shù)量（used屬性）之間的比率：

哈希表的大小與節(jié)點數(shù)量，比率在 1:1 時，哈希表的性能最好；
如果節(jié)點數(shù)量比哈希表的大小要大很多的話，那么哈希表就會退化成多個鏈表，哈希表本身的性能優(yōu)勢便不復(fù)存在；

舉個例子，下面這個哈希表，平均每次失敗查找只需要訪問 1 個節(jié)點（非空節(jié)點訪問 2 次，空節(jié)點訪問 1 次）：

以上內(nèi)容是否對您有幫助：

← Redis雙端鏈表

Redis 跳躍表 →

寫筆記

我要補充