JDK Source Code Analysis and Best Practices: ArrayList

Author: 丘八老爷 | Published 2017-04-02 22:01 | Read 69 times

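Knowing that something is so, one must also know why it is so. (old saying)

Knowing why it is so, one should extend it and generalize from it to similar cases. (虫草)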

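I have recently started reading the JDK source code and plan to analyze some commonly used classes in a series of articles. ArrayList is one of the most frequently used collection classes in day-to-day development, so it is the first one to go under the knife.

The JDK version used here is 1.8.0_102. The source is long and parts of it are straightforward, so only the key points are covered.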

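<h3>1. Overview</h3>
As the name suggests, ArrayList is an array-backed list that grows dynamically. It is a member of the List family, supports random access, and since JDK 8 works with the Stream API, which makes it convenient to use. The class is not thread-safe, however, so it needs care when shared between threads.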

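<h3>2. Source Code Analysis and Practice</h3>
<h4>2.1 Member Variables</h4>
<code>transient Object[] elementData;
private int size;</code>
elementData is the array that backs the ArrayList, and size is the number of elements the list actually contains. Why is elementData marked transient? See section 2.5.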

<h4>2.2 Constructor</h4>
<code>public ArrayList(Collection<? extends E> c) {
elementData = c.toArray();
if ((size = elementData.length) != 0) {
// c.toArray might (incorrectly) not return Object[] (see 6260652)
if (elementData.getClass() != Object[].class)
elementData = Arrays.copyOf(elementData, size, Object[].class);
} else {
// replace with empty array.
this.elementData = EMPTY_ELEMENTDATA;
}
}</code>
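This constructor builds an ArrayList directly from any Collection. It first converts the argument to an array with toArray. If that array is not of type Object[] (the case the source comment about bug 6260652 refers to), it copies the elements into a new Object[] with Arrays.copyOf, which in turn relies on the native method System.arraycopy. Both methods are used heavily throughout ArrayList, mainly to copy elements between arrays.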

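<b>Best practice: to turn another Collection into an ArrayList, there is no need to loop and add elements one by one. Use this constructor directly; the code is shorter and clearer, and it usually performs better as well.</b>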
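
As a small illustration of this practice, the sketch below copies a collection with the constructor instead of a loop. The class name CopyConstructorDemo and the helper toArrayList are hypothetical names chosen for this example, not part of the JDK.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

class CopyConstructorDemo {
    // Hypothetical helper: copies any Collection into a new ArrayList
    // through the copy constructor rather than a manual add() loop.
    static <T> List<T> toArrayList(Collection<? extends T> source) {
        return new ArrayList<>(source);
    }

    public static void main(String[] args) {
        Collection<String> names = new LinkedHashSet<>();
        names.add("a");
        names.add("b");
        names.add("c");

        List<String> copy = toArrayList(names);
        copy.add("d"); // the copy is independent of the source collection

        System.out.println(copy);  // [a, b, c, d]
        System.out.println(names); // [a, b, c]
    }
}
```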

<h4>2.3 Growing the Capacity</h4>
<code>
private void ensureCapacityInternal(int minCapacity) {
if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
}
ensureExplicitCapacity(minCapacity);
}
private void ensureExplicitCapacity(int minCapacity) {
modCount++;
// overflow-conscious code
if (minCapacity - elementData.length > 0)
grow(minCapacity);
}
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
private void grow(int minCapacity) {
// overflow-conscious code
int oldCapacity = elementData.length;
int newCapacity = oldCapacity + (oldCapacity >> 1);
if (newCapacity - minCapacity < 0)
newCapacity = minCapacity;
if (newCapacity - MAX_ARRAY_SIZE > 0)
newCapacity = hugeCapacity(minCapacity);
// minCapacity is usually close to size, so this is a win:
elementData = Arrays.copyOf(elementData, newCapacity);
}
private static int hugeCapacity(int minCapacity) {
if (minCapacity < 0) // overflow
throw new OutOfMemoryError();
return (minCapacity > MAX_ARRAY_SIZE) ?
Integer.MAX_VALUE :
MAX_ARRAY_SIZE;
}
</code>

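The place to start is the private method ensureCapacityInternal, which is called whenever elements are added to make sure the array is large enough; the other private methods shown here are reached through it. The key one is grow: the parameter minCapacity is the number of elements the list is about to hold, newCapacity is 1.5 times the old capacity (computed with integer arithmetic as oldCapacity + (oldCapacity >> 1)), and the larger of the two wins. Note that the capacity has an upper bound: once newCapacity exceeds MAX_ARRAY_SIZE, hugeCapacity caps it at MAX_ARRAY_SIZE, and only returns Integer.MAX_VALUE when minCapacity itself is larger than MAX_ARRAY_SIZE.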

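Why stop at MAX_ARRAY_SIZE first? According to the comment in the source, MAX_ARRAY_SIZE is the maximum array size to allocate: some VMs reserve header words in an array, so attempting to allocate a larger array may result in an OutOfMemoryError. MAX_ARRAY_SIZE is therefore the safer upper limit. In practice, lists rarely get anywhere near this length.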

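Why grow by a factor of 1.5? If the factor is too large, say 2.5, each expansion claims a lot of memory and more of it is wasted. If it is too small, say 1.1, the array has to be reallocated again soon after as elements keep arriving, which costs performance. A factor of 1.5 is a tested compromise between the two. (See also Suggestion 63 in 《编写高质量代码》.)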
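
To make the growth pattern concrete, here is a rough simulation of the arithmetic in grow. It is a sketch only: GrowthSimulation, nextCapacity and capacitiesUpTo are hypothetical names, the MAX_ARRAY_SIZE handling is left out, and Math.max stands in for the overflow-conscious comparison in the real source.

```java
import java.util.ArrayList;
import java.util.List;

class GrowthSimulation {
    // Mirrors the core of grow(): newCapacity = old + (old >> 1),
    // raised to minCapacity when that is still too small.
    // Simplification: no MAX_ARRAY_SIZE / overflow handling.
    static int nextCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        return Math.max(newCapacity, minCapacity);
    }

    // Capacities visited while appending elements one at a time,
    // starting from the default capacity of 10.
    static List<Integer> capacitiesUpTo(int elementCount) {
        List<Integer> capacities = new ArrayList<>();
        int capacity = 10;
        capacities.add(capacity);
        for (int size = 0; size < elementCount; size++) {
            if (size + 1 > capacity) { // the next add would not fit
                capacity = nextCapacity(capacity, size + 1);
                capacities.add(capacity);
            }
        }
        return capacities;
    }

    public static void main(String[] args) {
        // Appending 100 elements triggers six expansions.
        System.out.println(capacitiesUpTo(100)); // [10, 15, 22, 33, 49, 73, 109]
    }
}
```

Because of the integer shift, the factor is only approximately 1.5 (15 grows to 22, not 22.5), and each of those steps copies the whole array.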

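<b>Best practice: because ArrayList grows dynamically, usually by a factor of 1.5, specify an initial capacity whenever the number of elements is known in advance. This avoids wasted memory as well as the cost of copying the array on each expansion.</b>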
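
A minimal sketch of this practice follows. PresizedListDemo and squares are hypothetical names for illustration; the ArrayList(int) constructor and ensureCapacity are the public JDK entry points for setting the capacity up front.

```java
import java.util.ArrayList;
import java.util.List;

class PresizedListDemo {
    // Hypothetical helper: builds the first n squares, allocating the
    // backing array once instead of growing it step by step.
    static List<Integer> squares(int n) {
        List<Integer> result = new ArrayList<>(n); // capacity hint; size is still 0
        for (int i = 0; i < n; i++) {
            result.add(i * i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(squares(5)); // [0, 1, 4, 9, 16]

        // For a list that already exists, ensureCapacity does the same job
        // before a large batch of additions.
        ArrayList<String> lines = new ArrayList<>();
        lines.ensureCapacity(10_000);
        System.out.println(lines.size()); // 0
    }
}
```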

<h4>2.4 Difference and Intersection</h4>
<code>public boolean removeAll(Collection<?> c) {
Objects.requireNonNull(c);
return batchRemove(c, false);
}
public boolean retainAll(Collection<?> c) {
Objects.requireNonNull(c);
return batchRemove(c, true);
}
private boolean batchRemove(Collection<?> c, boolean complement) {
final Object[] elementData = this.elementData;
int r = 0, w = 0;
boolean modified = false;
try {
for (; r < size; r++)
if (c.contains(elementData[r]) == complement)
elementData[w++] = elementData[r];
} finally {
// Preserve behavioral compatibility with AbstractCollection,
// even if c.contains() throws.
if (r != size) {
System.arraycopy(elementData, r,
elementData, w,
size - r);
w += size - r;
}
if (w != size) {
// clear to let GC do its work
for (int i = w; i < size; i++)
elementData[i] = null;
modCount += size - w;
size = w;
modified = true;
}
}
return modified;
}</code>
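removeAll removes from the list every element that is contained in the argument collection, which gives the difference between the ArrayList and c. retainAll removes every element that is not contained in the argument collection, which gives the intersection of the ArrayList and c.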

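Both methods delegate to batchRemove and differ only in the value passed for complement, which decides whether the elements contained in c are kept or dropped. Much of the logic sits in the finally block so that the list is left in a consistent state even if contains throws. The elementData[i] = null loop clears the slots that are no longer used so that the garbage collector can reclaim the objects.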

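<b>Best practice: (1) use addAll for a union (note that it appends every element and does not remove duplicates), removeAll for a difference, and retainAll for an intersection; (2) set array elements that are no longer needed to null, and when a List object is no longer used, especially a large one or one created inside a loop, it is best to call clear on it.</b>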
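
The three operations can be sketched as follows. ListSetOps and its helpers are hypothetical names; each helper works on a copy so that the inputs are left unchanged, and union removes the overlap first because addAll alone would keep duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListSetOps {
    // Elements of a that are not in b.
    static <T> List<T> difference(List<T> a, List<T> b) {
        List<T> result = new ArrayList<>(a);
        result.removeAll(b);
        return result;
    }

    // Elements of a that are also in b.
    static <T> List<T> intersection(List<T> a, List<T> b) {
        List<T> result = new ArrayList<>(a);
        result.retainAll(b);
        return result;
    }

    // addAll only appends, so drop the overlap before appending b.
    static <T> List<T> union(List<T> a, List<T> b) {
        List<T> result = new ArrayList<>(a);
        result.removeAll(b);
        result.addAll(b);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 3, 4);
        List<Integer> b = Arrays.asList(3, 4, 5);
        System.out.println(difference(a, b));   // [1, 2]
        System.out.println(intersection(a, b)); // [3, 4]
        System.out.println(union(a, b));        // [1, 2, 3, 4, 5]
    }
}
```

Since batchRemove calls c.contains once per element, passing a HashSet as the argument is usually cheaper than passing another list when both sides are large.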

<h4>2.5 Serialization and Deserialization</h4>
<code>private void writeObject(java.io.ObjectOutputStream s)
throws java.io.IOException{
// Write out element count, and any hidden stuff
int expectedModCount = modCount;
s.defaultWriteObject();
// Write out size as capacity for behavioural compatibility with clone()
s.writeInt(size);
// Write out all elements in the proper order.
for (int i=0; i<size; i++) {
s.writeObject(elementData[i]);
}
if (modCount != expectedModCount) {
throw new ConcurrentModificationException();
}
}
private void readObject(java.io.ObjectInputStream s)
throws java.io.IOException, ClassNotFoundException {
elementData = EMPTY_ELEMENTDATA;
// Read in size, and any hidden stuff
s.defaultReadObject();
// Read in capacity
s.readInt(); // ignored
if (size > 0) {
// be like clone(), allocate array based upon size not capacity
ensureCapacityInternal(size);
Object[] a = elementData;
// Read in all elements in the proper order.
for (int i=0; i<size; i++) {
a[i] = s.readObject();
}
}

}
</code>

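As noted in the section on member variables, elementData is marked transient, so the default mechanism does not serialize it. Are the contents of the ArrayList serialized at all, then? The code above shows that ArrayList defines its own writeObject and readObject methods, which take over the serialization of its data. Both write and read only the elements that actually exist in the list, in order. The transient modifier is there to stop the default mechanism from serializing the whole backing array, unused slots included; the custom methods then serialize just the real data.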
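
One way to observe this is to serialize two lists that hold the same elements but have very different capacities and compare the bytes. The sketch below does that; SerializedFormDemo, toBytes and fromBytes are hypothetical names, and checked exceptions are wrapped only to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;

class SerializedFormDemo {
    // Serializes the list to a byte array (this goes through writeObject).
    static byte[] toBytes(ArrayList<String> list) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(list);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the list back (this goes through readObject).
    @SuppressWarnings("unchecked")
    static ArrayList<String> fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ArrayList<String>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> roomy = new ArrayList<>(1000); // 998 unused slots
        roomy.add("a");
        roomy.add("b");
        ArrayList<String> tight = new ArrayList<>(2);
        tight.add("a");
        tight.add("b");

        // Only size and the live elements are written, so capacity leaves no trace.
        System.out.println(Arrays.equals(toBytes(roomy), toBytes(tight))); // true
        System.out.println(fromBytes(toBytes(roomy)));                     // [a, b]
    }
}
```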

<h4>2.6 Sublists</h4>
<code>public List<E> subList(int fromIndex, int toIndex) {
subListRangeCheck(fromIndex, toIndex, size);
return new SubList(this, 0, fromIndex, toIndex);
}
private class SubList extends AbstractList<E> implements RandomAccess {
private final AbstractList<E> parent;
private final int parentOffset;
private final int offset;
int size;
SubList(AbstractList<E> parent,
int offset, int fromIndex, int toIndex) {
this.parent = parent;
this.parentOffset = fromIndex;
this.offset = offset + fromIndex;
this.size = toIndex - fromIndex;
this.modCount = ArrayList.this.modCount;
}
private void checkForComodification() {
if (ArrayList.this.modCount != this.modCount)
throw new ConcurrentModificationException();
}
}
</code>
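ArrayList provides subList to obtain a view of the elements within a given index range. The range check is not worth dwelling on; every index-based List operation needs a bounds check. The interesting part is the inner class SubList. The code is not shown in full for reasons of space, and the member variables are what matter. Pay particular attention to parent, the parent list: every operation on the sublist actually operates on the parent list.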

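Also look at checkForComodification in SubList. Nearly every sublist operation calls it first. It checks whether the modCount values are equal, which detects modifications made directly to the parent list: a structural change to the parent increments its modCount, the values no longer match, and an exception is thrown. So once a sublist has been obtained, the parent list must not be structurally modified (elements added or removed) while the sublist is still in use.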

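<b>Best practice: (1) every operation on a sublist changes the original list, so to modify part of a list you can take a sublist and modify that, and the original list is updated accordingly; (2) after obtaining a sublist, do not add or remove elements directly on the original list, otherwise the next access to the sublist throws ConcurrentModificationException.</b>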
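
Both points can be seen in a short sketch. SubListDemo, removeRange and staleViewFails are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class SubListDemo {
    // Removes the elements in [from, to) from the list by clearing a view of them.
    static <T> void removeRange(List<T> list, int from, int to) {
        list.subList(from, to).clear();
    }

    // Returns true if a view becomes unusable after the parent
    // is structurally modified behind its back.
    static boolean staleViewFails() {
        List<Integer> parent = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        List<Integer> view = parent.subList(1, 3);
        parent.add(6); // structural change made directly on the parent
        try {
            view.size(); // the view compares modCount values on access
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        numbers.subList(1, 3).set(0, 20); // writes through to the parent
        System.out.println(numbers);      // [1, 20, 3, 4, 5]

        removeRange(numbers, 1, 3);
        System.out.println(numbers);      // [1, 4, 5]

        System.out.println(staleViewFails()); // true
    }
}
```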

<h4>2.7 Stream API</h4>
<code>public void forEach(Consumer<? super E> action) {
Objects.requireNonNull(action);
final int expectedModCount = modCount;
@SuppressWarnings("unchecked")
final E[] elementData = (E[]) this.elementData;
final int size = this.size;
for (int i=0; modCount == expectedModCount && i < size; i++) {
action.accept(elementData[i]);
}
if (modCount != expectedModCount) {
throw new ConcurrentModificationException();
}
}
public Spliterator<E> spliterator() {
return new ArrayListSpliterator<>(this, 0, -1, 0);
}
public boolean removeIf(Predicate<? super E> filter) {
Objects.requireNonNull(filter);
// figure out which elements are to be removed
// any exception thrown from the filter predicate at this stage
// will leave the collection unmodified
int removeCount = 0;
final BitSet removeSet = new BitSet(size);
final int expectedModCount = modCount;
final int size = this.size;
for (int i=0; modCount == expectedModCount && i < size; i++) {
@SuppressWarnings("unchecked")
final E element = (E) elementData[i];
if (filter.test(element)) {
removeSet.set(i);
removeCount++;
}
}
if (modCount != expectedModCount) {
throw new ConcurrentModificationException();
}
// shift surviving elements left over the spaces left by removed elements
final boolean anyToRemove = removeCount > 0;
if (anyToRemove) {
final int newSize = size - removeCount;
for (int i=0, j=0; (i < size) && (j < newSize); i++, j++) {
i = removeSet.nextClearBit(i);
elementData[j] = elementData[i];
}
for (int k=newSize; k < size; k++) {
elementData[k] = null; // Let gc do its work
}
this.size = newSize;
if (modCount != expectedModCount) {
throw new ConcurrentModificationException();
}
modCount++;
}
return anyToRemove;
}
@Override
@SuppressWarnings("unchecked")
public void replaceAll(UnaryOperator<E> operator) {
Objects.requireNonNull(operator);
final int expectedModCount = modCount;
final int size = this.size;
for (int i=0; modCount == expectedModCount && i < size; i++) {
elementData[i] = operator.apply((E) elementData[i]);
}
if (modCount != expectedModCount) {
throw new ConcurrentModificationException();
}
modCount++;
}</code>

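forEach takes a functional interface (Consumer) and applies the action to each element in turn. spliterator returns a Spliterator, an iterator that can be split for parallel traversal; the stream method that ArrayList inherits from Collection uses it to build the stream. removeIf also takes a functional interface: Predicate expresses a condition, and elements for which it holds are removed. replaceAll takes a UnaryOperator, which is applied to each element, and the result replaces that element.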
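
A brief usage sketch of these methods, with the stream-based equivalent alongside for comparison. FunctionalOpsDemo and its two helpers are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FunctionalOpsDemo {
    // Drops odd numbers, then doubles what remains, modifying the list in place.
    static void keepEvensAndDouble(List<Integer> numbers) {
        numbers.removeIf(n -> n % 2 != 0); // Predicate: true means "remove this element"
        numbers.replaceAll(n -> n * 2);    // UnaryOperator: old value -> new value
    }

    // Same result through the Stream API, leaving the input untouched.
    static List<Integer> evensDoubled(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)
                .map(n -> n * 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6));
        System.out.println(evensDoubled(numbers)); // [4, 8, 12]

        keepEvensAndDouble(numbers);
        numbers.forEach(n -> System.out.println(n)); // prints 4, 8 and 12 on separate lines
    }
}
```

The action passed to forEach should not add or remove elements of the same list; as the source above shows, a changed modCount ends the loop with ConcurrentModificationException.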

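The Stream API and lambda expressions are generally used together. I do not yet understand the underlying mechanics well, so I will not go into them here and will add more once I do. A few good references that help with the basics:
Official Lambda Expressions tutorial
Stream syntax explained
Why Stream is needed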
