[toc]
在前面对LinkedList进行分析的时候说到,LinkedList实际上性能比ArrayList不会高多少,只有在前向插入的时候才能比ArrayList性能高。因为LinkedList虽然在remove和insert的操作不需要数据拷贝,但是寻址需要时间,也就是说此从链表中找到需要操作的节点需要时间,只能根据链表挨个遍历。那么当时就在想,查询链表中的某一个元素能不能将O(n)的时间复杂度变为O(1)呢,那样就能充分利用链表的特点。实际上我们本章讨论的LinkedHashMap就是这样一个数据结构。其综合了HashMap和链表的优点,虽然数据结构比LinkedList更加复杂,每一个节点Entry都增加了很多指针,但是在某些场景下,是可以同时发挥Hashmap和链表的优点的数据结构。
1.类结构及其成员变量
1.1 类的基本结构
LinkedHashMap本质上只是HashMap的一个子类,之后在HashMap的基础之上,扩充了链表功能。类结构如下:
image.png
/**
* <p>Hash table and linked list implementation of the <tt>Map</tt> interface,
* with predictable iteration order. This implementation differs from
* <tt>HashMap</tt> in that it maintains a doubly-linked list running through
* all of its entries. This linked list defines the iteration ordering,
* which is normally the order in which keys were inserted into the map
* (<i>insertion-order</i>). Note that insertion order is not affected
* if a key is <i>re-inserted</i> into the map. (A key <tt>k</tt> is
* reinserted into a map <tt>m</tt> if <tt>m.put(k, v)</tt> is invoked when
* <tt>m.containsKey(k)</tt> would return <tt>true</tt> immediately prior to
* the invocation.)
*
* <p>This implementation spares its clients from the unspecified, generally
* chaotic ordering provided by {@link HashMap} (and {@link Hashtable}),
* without incurring the increased cost associated with {@link TreeMap}. It
* can be used to produce a copy of a map that has the same order as the
* original, regardless of the original map's implementation:
* <pre>
* void foo(Map m) {
* Map copy = new LinkedHashMap(m);
* ...
* }
* </pre>
* This technique is particularly useful if a module takes a map on input,
* copies it, and later returns results whose order is determined by that of
* the copy. (Clients generally appreciate having things returned in the same
* order they were presented.)
*
* <p>A special {@link #LinkedHashMap(int,float,boolean) constructor} is
* provided to create a linked hash map whose order of iteration is the order
* in which its entries were last accessed, from least-recently accessed to
* most-recently (<i>access-order</i>). This kind of map is well-suited to
* building LRU caches. Invoking the {@code put}, {@code putIfAbsent},
* {@code get}, {@code getOrDefault}, {@code compute}, {@code computeIfAbsent},
* {@code computeIfPresent}, or {@code merge} methods results
* in an access to the corresponding entry (assuming it exists after the
* invocation completes). The {@code replace} methods only result in an access
* of the entry if the value is replaced. The {@code putAll} method generates one
* entry access for each mapping in the specified map, in the order that
* key-value mappings are provided by the specified map's entry set iterator.
* <i>No other methods generate entry accesses.</i> In particular, operations
* on collection-views do <i>not</i> affect the order of iteration of the
* backing map.
*
* <p>The {@link #removeEldestEntry(Map.Entry)} method may be overridden to
* impose a policy for removing stale mappings automatically when new mappings
* are added to the map.
*
* <p>This class provides all of the optional <tt>Map</tt> operations, and
* permits null elements. Like <tt>HashMap</tt>, it provides constant-time
* performance for the basic operations (<tt>add</tt>, <tt>contains</tt> and
* <tt>remove</tt>), assuming the hash function disperses elements
* properly among the buckets. Performance is likely to be just slightly
* below that of <tt>HashMap</tt>, due to the added expense of maintaining the
* linked list, with one exception: Iteration over the collection-views
* of a <tt>LinkedHashMap</tt> requires time proportional to the <i>size</i>
* of the map, regardless of its capacity. Iteration over a <tt>HashMap</tt>
* is likely to be more expensive, requiring time proportional to its
* <i>capacity</i>.
*
* <p>A linked hash map has two parameters that affect its performance:
* <i>initial capacity</i> and <i>load factor</i>. They are defined precisely
* as for <tt>HashMap</tt>. Note, however, that the penalty for choosing an
* excessively high value for initial capacity is less severe for this class
* than for <tt>HashMap</tt>, as iteration times for this class are unaffected
* by capacity.
*
* <p><strong>Note that this implementation is not synchronized.</strong>
* If multiple threads access a linked hash map concurrently, and at least
* one of the threads modifies the map structurally, it <em>must</em> be
* synchronized externally. This is typically accomplished by
* synchronizing on some object that naturally encapsulates the map.
*
* If no such object exists, the map should be "wrapped" using the
* {@link Collections#synchronizedMap Collections.synchronizedMap}
* method. This is best done at creation time, to prevent accidental
* unsynchronized access to the map:<pre>
* Map m = Collections.synchronizedMap(new LinkedHashMap(...));</pre>
*
* A structural modification is any operation that adds or deletes one or more
* mappings or, in the case of access-ordered linked hash maps, affects
* iteration order. In insertion-ordered linked hash maps, merely changing
* the value associated with a key that is already contained in the map is not
* a structural modification. <strong>In access-ordered linked hash maps,
* merely querying the map with <tt>get</tt> is a structural modification.
* </strong>)
*
* <p>The iterators returned by the <tt>iterator</tt> method of the collections
* returned by all of this class's collection view methods are
* <em>fail-fast</em>: if the map is structurally modified at any time after
* the iterator is created, in any way except through the iterator's own
* <tt>remove</tt> method, the iterator will throw a {@link
* ConcurrentModificationException}. Thus, in the face of concurrent
* modification, the iterator fails quickly and cleanly, rather than risking
* arbitrary, non-deterministic behavior at an undetermined time in the future.
*
* <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
* as it is, generally speaking, impossible to make any hard guarantees in the
* presence of unsynchronized concurrent modification. Fail-fast iterators
* throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
* Therefore, it would be wrong to write a program that depended on this
* exception for its correctness: <i>the fail-fast behavior of iterators
* should be used only to detect bugs.</i>
*
* <p>The spliterators returned by the spliterator method of the collections
* returned by all of this class's collection view methods are
* <em><a href="Spliterator.html#binding">late-binding</a></em>,
* <em>fail-fast</em>, and additionally report {@link Spliterator#ORDERED}.
*
* <p>This class is a member of the
* <a href="{@docRoot}/../technotes/guides/collections/index.html">
* Java Collections Framework</a>.
*
* @implNote
* The spliterators returned by the spliterator method of the collections
* returned by all of this class's collection view methods are created from
* the iterators of the corresponding collections.
*
* @param <K> the type of keys maintained by this map
* @param <V> the type of mapped values
*
* @author Josh Bloch
* @see Object#hashCode()
* @see Collection
* @see Map
* @see HashMap
* @see TreeMap
* @see Hashtable
* @since 1.4
*/
public class LinkedHashMap<K,V>
extends HashMap<K,V>
implements Map<K,V>
{
}
以上是LinkedHashmap源码中的相关描述。可以看到LinkedHashMap继承了Hashmap,同时实现了Map的接口。其注释大意为:
LinkedHashMap同时实现了Hash表和链表这两种数据结构,是一种有序的数据结构,与Hashmap相比,这个实现维护了一个双向链表的数据结构,这个数据结构定义了一个以插入顺序为序的迭代顺序结构。需要注意的是,如果一个Key重复插入,其迭代顺序不会改变,其会按之前第一次的插入顺序返回。
这个数据结构的实现其目的是为了避免HashMap和hashTable的无序结构,同时又不会想TreeMap那样为了达到有序而带来一些额外的开销。它可以生成一些具有原始顺序副本,在实现过程中可以完全不用考虑原始的实现。
void foo(Map m) {
Map copy = new LinkedHashMap(m);
...
}
如果一个模块在输入的时候获取一个map,之后复制,然后返回结果,其顺序由复制的时候的顺序而决定,那么这种技术特别有用,(客户通常喜欢按提交的顺序归还物品。)
特殊的构造函数LinkedHashMap(int,float,boolean)用于创建迭代顺序为上次访问顺序的链表hash结构。从最近最少访问到最近访问,这种map很时候构建LRU缓存。调用put、 get、getOrDefault、compute、computeIfPresent、merge等方法将导致对应条目的访问(假设它在调用完成后存在)。replace仅仅在值被替换的时候才导致对条目的访问,putAll方法为指定映射中的每个映射生成一个条目访问,顺序是指映射的条目集迭代器提供的key-value映射。
没有其他的访问生成入口访问。尤其是需要说明的是,对集合视图的操作不会影响备份映射的迭代顺序。
removeEldestEntry(Map.Entry)方法可能会被重写,以便向Map中添加新的映射时强制执行一个策略。以自动删除过时的映射。
HashMap提供了所有的Map的操作,允许空值,与HashMap一样,其提供了恒定时间性能的add、contains、remove操作。假定hash足够离散,元素都分散在各个bucket中。其性能只比HashMap略低,比链表的时间略高。有一个例外的情况,对LinkedHashMap的集合视图进行迭代所需的时间与映射的大小成正比,而不管其容量如何, 在HashMap上迭代的性能可能比较低,需要的时间与其bucket的容量有关。而LinkedHashMap的迭代性能只与元素的数量有关。
LinkedHashMap有两个影响其性能的参数,初始化容量和负载因子。他们在Hashmap中有精确的定义。但是需要注意的是,对于LinkedHashMap来说,一旦初始容量选择过高带来的弊端比HashMap要轻,因为LinkedHashMap的迭代次数不受容量的影响。
需要注意的是,LinkedHashMap是不同步的,如果多个线程需要同时访问,并且至少有一个线程在结构上对其进行了修改,则它必须在外部同步。这通常是在一些自然封装的映射的对象上来同步实现的。
如果没有这样的对象,那么应该用Collections.synchronizedMap将这个对象包裹。最好在对象创建的时候就完成,以免造成非同步的访问。
Map m = Collections.synchronizedMap(new LinkedHashMap(...));
结构修改时添加或者删除一个或者多个操作的任何映射操作。或者在访问顺序LinkedHashMap的情况下,影响其迭代顺序的任何操作。在插入顺序的LinkedHashMap中,仅仅是更改map中已包含的key相关的值不是结构修改。在有访问顺序的linkedHashmap中,仅仅使用get进行查询也是一种结构修改。
类的所有集合视图方法返回的collections的iterator方法都是fail-fast的。如果迭代器在创建之后任何时候被修改,以任何方式(除了迭代器自己的remove)对map进行修改。迭代器将抛出ConcurrentModificationException。因此在并发修改的情况下,迭代器会快速的失败,而不是将来在某个不确定的时间,冒着任意的的不确定的风险。
需要注意的是迭代器的fail-fast行为不能得到保证,因为一般来说,存在不同步的并发修改时不可能做出任何保证。fail-fast机制尽力的抛出了ConcurrentModificationException,因此在编写一个依赖于这个异常来保证其正确性的程序是错误的。迭代器的fail-fast机制只用于检测bug。
集合的spliterators方法返回的所有Spliterator集合的late-binding和fail-fast参见Spliterator#ORDERED。
这个类是java集合框架的成员之一。
由此可以看出,LinkedHashMap实际上兼顾了HashMap和链表的优点,默认情况下可以按照插入序构建链表,另外,还可以做为LRU的缓存使用。
1.2 核心内部类Entry
我们在前面说过,HashMap树化之后,TreeNode节点是LinkedHashMap.Entry的子类。而LinkedHashMap.Entry又是HashMap.Node的子类。HashMap.Node又继承了Map.Entry。这个继承关系如下图:
image.png
/*
* Implementation note. A previous version of this class was
* internally structured a little differently. Because superclass
* HashMap now uses trees for some of its nodes, class
* LinkedHashMap.Entry is now treated as intermediary node class
* that can also be converted to tree form. The name of this
* class, LinkedHashMap.Entry, is confusing in several ways in its
* current context, but cannot be changed. Otherwise, even though
* it is not exported outside this package, some existing source
* code is known to have relied on a symbol resolution corner case
* rule in calls to removeEldestEntry that suppressed compilation
* errors due to ambiguous usages. So, we keep the name to
* preserve unmodified compilability.
*
* The changes in node classes also require using two fields
* (head, tail) rather than a pointer to a header node to maintain
* the doubly-linked before/after list. This class also
* previously used a different style of callback methods upon
* access, insertion, and removal.
*/
/**
* HashMap.Node subclass for normal LinkedHashMap entries.
*/
static class Entry<K,V> extends HashMap.Node<K,V> {
Entry<K,V> before, after;
Entry(int hash, K key, V value, Node<K,V> next) {
super(hash, key, value, next);
}
}
Entry主要的作用是实现对Node节点的链表化。构建一个新的双向链表来做为HashMap的扩展。之后LinkedHashMap可以做为链表使用。
Entry主要增加了before和after两个指针,以将HashMap的全部Entry变成双向链表。
我们可以看到的是,在jdk1.4的时候就完成了LinkedHashMap。在1.7之前没有引入红黑树的时候,Hashmap中的Entry继承关系就存在。而在引入红黑树之后,直接又定义了一个新的内部类TreeNode来继承LinkedHashMap.Entry节点。这也为我们后续代码重构提供了一个新的思路。
1.3 重要的成员变量
我们可以看到,在LinkedHashMap中扩张的成员变量位head、tail。分别指向链表的首和尾。
/**
* The head (eldest) of the doubly linked list.
*/
transient LinkedHashMap.Entry<K,V> head;
/**
* The tail (youngest) of the doubly linked list.
*/
transient LinkedHashMap.Entry<K,V> tail;
/**
* The iteration ordering method for this linked hash map: <tt>true</tt>
* for access-order, <tt>false</tt> for insertion-order.
*
* @serial
*/
final boolean accessOrder;
这三个变量均用transient修饰,也就是说,序列化的时候这三个属性不会被序列化,那么LinkedHashMap如何确保对象能从序列化中还原呢?那么与HashMap相同,自定义了序列化的方法,这在后面将单独介绍。
head表示双向链表的头部,也就是最早插入的元素。而tail将表示链表的尾部,为最近插入的元素。accessOrder表示链表的迭代顺序,当为true的时候,按访问顺序也就是最近访问的元素将被移动到head,如果为false,则按照插入顺序。
2.LinkedHashMap的基本原理
看过第一部分的内容之后,我们对LinkedHashMap的基本原理有了一个最基本的了解。其本身就是一个在Hashmap的基础上的扩展,让HashMap的每一个元素都添加了链表的指针结构。
那么我们假定存在如下LinkedHashMap,在accessOrder为false的情况下。我们插入了5个Entry。其结构如下图所示:
image.png
上图绿色箭头表示after指针。红色箭头表示before指针。上图仅仅表示举例对LinkedHashMap的结构进行说明,在真实的情况下可能很难出现上述这种HashMap。
LinkedHashMap中存在两个指征,head和tail。分别指向链表的第一个插入的元素和最后插入的元素。这样就能分别从这两个指征获取Entry并对LinkedHashMap的各个元素遍历。
3.构造函数
LinkedhashMap主要提供了如下4个构造函数,由于LinkedhashMap能够通过accessorder来控制链表的迭代的顺序,实际上就是控制按插入序还是按访问序。
3.1 LinkedHashMap()
空的构造函数。
public LinkedHashMap() {
super();
accessOrder = false;
}
调用了super,即HashMap中的构造函数。初始化长度为16,默认负载因子为0.75。accessorder为false,表示按插入序。
3.2 LinkedHashMap(int initialCapacity)
指定初始化的容量。
public LinkedHashMap(int initialCapacity) {
super(initialCapacity);
accessOrder = false;
}
同样使用了HashMap的构造函数。默认的accessorder为false按插入序。负载因子为0.75。
3.3 LinkedHashMap(int initialCapacity, float loadFactor)
通过这个构造函数可以同时指定初始容量和负载因子。
public LinkedHashMap(int initialCapacity, float loadFactor) {
super(initialCapacity, loadFactor);
accessOrder = false;
}
3.4 LinkedHashMap(Map<? extends K, ? extends V> m)
将一个map通过构造函数的方式按插入序变成LinkedHashMap。但是需要注意的是,HashMap本身无序,因此此处的插入序是不确定的。
public LinkedHashMap(Map<? extends K, ? extends V> m) {
super();
accessOrder = false;
putMapEntries(m, false);
}
但是在插入之后,会实现按putMapEntrues中的方法进行插入。
putMapEntries也是HashMap中的方法,按table进行遍历。
3.5 LinkedHashMap(int initialCapacity,float loadFactor,boolean accessOrder)
这个构造函数是LinkedHashMap最重要的一个构造函数,因为如果我们要通过LinkedHashMap来实现一个LRU的缓存,那么必须采用这个构造函数,将accessOrder指定为true。
public LinkedHashMap(int initialCapacity,
float loadFactor,
boolean accessOrder) {
super(initialCapacity, loadFactor);
this.accessOrder = accessOrder;
}
初始容量和负载因子都能自行控制。accessOrder只是控制了一个变量,但是后续在各get和remove的方法中将会根据这个变量变成不同的行为。
4.重要方法
4.1 put
我们先来看看LinkedHashMap中如何进行Put的。
实际上LinkedHashMap没有重载put方法,而是利用了HashMap中的put。
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}
实际上还是调用的putVal
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
else {
Node<K,V> e; K k;
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}
++modCount;
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}
以上是HashMap中的put方法,我们发现,putVal是final修饰的,也就是说LinkedhashMap无法重写这个方法。那么HashMap中是否会考虑到put之后链表的关系如何维护呢?因为putVal中全部都是对Hashmap是否需要树化的操作。实际上,在Hashmap中的putVal中,已经预留了后处理的方法。
// Callbacks to allow LinkedHashMap post-actions
void afterNodeAccess(Node<K,V> p) { }
void afterNodeInsertion(boolean evict) { }
void afterNodeRemoval(Node<K,V> p) { }
上述三个方法在HashMap中的时候都是空的。就是为了预留给LinkedHashMap进行使用的。
链表的维护操作都在这三个方法中。分别对应访问和插入、删除操作。我们后续来看看这三个方法。
4.2 afterNodeAccess
此方法是access之后的后处理操作,如果accessOrder为true,那么被access的节点就要移动到tail。tail就是最新的节点。head是最老的节点。
void afterNodeAccess(Node<K,V> e) { // move node to last
LinkedHashMap.Entry<K,V> last;
//判断accessOrder是否为true,当前节点不为tail
if (accessOrder && (last = tail) != e) {
LinkedHashMap.Entry<K,V> p =
(LinkedHashMap.Entry<K,V>)e, b = p.before, a = p.after;
//定义指针
p.after = null;
//将e的after变为null
if (b == null)
head = a;
else
b.after = a;
//
if (a != null)
a.before = b;
else
last = b;
if (last == null)
head = p;
else {
p.before = last;
last.after = p;
}
//tail指向p
tail = p;
++modCount;
}
}
4.3 afterNodeInsertion
我们可以看到,在Insert之后调用afterNodeInsertion,在LinkedHashMap中的真正意图就是可能会将最老的元素删除。但是实际上,由于removeEldestEntry始终返回false,那么也就意味着这个方法中的逻辑永远不会被执行。
void afterNodeInsertion(boolean evict) { // possibly remove eldest
LinkedHashMap.Entry<K,V> first;
if (evict && (first = head) != null && removeEldestEntry(first)) {
K key = first.key;
removeNode(hash(key), key, null, false, true);
}
}
protected boolean removeEldestEntry(Map.Entry<K,V> eldest) {
return false;
}
实际上afterNodeInsertion不会做任何操作。也就是说,LinkedHashMap本身并不提供删除最老元素的实现,如果要实现LRU的缓存,对老元素进行删除的话,可能这个方法需要我们自己重写来实现。
4.4 afterNodeRemoval
这个方法的意思是在删除元素之后,将这个元素的前后的元素重写组成链表。
void afterNodeRemoval(Node<K,V> e) { // unlink
LinkedHashMap.Entry<K,V> p =
(LinkedHashMap.Entry<K,V>)e, b = p.before, a = p.after;
p.before = p.after = null;
if (b == null)
head = a;
else
b.after = a;
if (a == null)
tail = b;
else
a.before = b;
}
4.5 get
get方法则进行重写:
public V get(Object key) {
Node<K,V> e;
if ((e = getNode(hash(key), key)) == null)
return null;
if (accessOrder)
afterNodeAccess(e);
return e.value;
}
实际上还是调用的HashMap方法中的getNode方法。之后再调用afterNodeAccess进行整理。实际上accessOrder方法最关键的就是通过这个afterNodeAccess方法来调整链表的顺序。
4.6 remove
remove方法实际上是HashMap中的remove方法:
public V remove(Object key) {
Node<K,V> e;
return (e = removeNode(hash(key), key, null, false, true)) == null ?
null : e.value;
}
之后在removeNode方法中调用了afterNodeRemoval方法。对链表进行整理。
5.视图类
同样,LinkedHashMap也存在基于key、values和entrys的视图类。其原理与HashMap类似,只不过,在某些地方重写以实现了链表的遍历及特性。
public Set<K> keySet() {
Set<K> ks = keySet;
if (ks == null) {
ks = new LinkedKeySet();
keySet = ks;
}
return ks;
}
final class LinkedKeySet extends AbstractSet<K> {
public final int size() { return size; }
public final void clear() { LinkedHashMap.this.clear(); }
public final Iterator<K> iterator() {
return new LinkedKeyIterator();
}
public final boolean contains(Object o) { return containsKey(o); }
public final boolean remove(Object key) {
return removeNode(hash(key), key, null, false, true) != null;
}
public final Spliterator<K> spliterator() {
return Spliterators.spliterator(this, Spliterator.SIZED |
Spliterator.ORDERED |
Spliterator.DISTINCT);
}
public final void forEach(Consumer<? super K> action) {
if (action == null)
throw new NullPointerException();
int mc = modCount;
for (LinkedHashMap.Entry<K,V> e = head; e != null; e = e.after)
action.accept(e.key);
if (modCount != mc)
throw new ConcurrentModificationException();
}
}
如上即是keySet,其迭代器和Spliterator都存在。
5.1 Iterator
final class LinkedKeyIterator extends LinkedHashIterator
implements Iterator<K> {
public final K next() { return nextNode().getKey(); }
}
我们可以看到基于Key的Iterator,实际上比HashMap中的实现方式简单,因为可以直接使用链表。直接从链表返回。同理LinkedValueIterator和LinkedEntryIterator也相同。
5.2 Spliterators
我们可以看到在LinkedKeySet中:
public final Spliterator<K> spliterator() {
return Spliterators.spliterator(this, Spliterator.SIZED |
Spliterator.ORDERED |
Spliterator.DISTINCT);
}
这个方法比HashMap的Spliterator多了一个属性即是Spliterator.ORDERED 。
同理LinkedEntrySet 及LinkedValues都有类似的结构,这就不一一进行分析了。
6.总结
以上本文是我对LinkedHashMap的理解,LinkedHashMap也是一个常用的集合类,兼具了Hashmap和链表的优点。在get元素的时候可以巧妙的利用HashMap的特性将查找的性能降低到O(1)。这也是我们在某些情况下可以利用的。
比如在某些情况下,实现一个基于LRU的缓存,虽然LinkedHashMap本身没有提供删除元素的方法,但是我们可以根据list的长度自行控制,利用accessorder的特性,将超过一定长度的head元素删除。
网友评论