06_ArrayList

作者: 0x70e8 | 来源:发表于2018-03-22 20:47 被阅读0次

[TOC]

0.概述

Resizable-array implementation of the List interface.
Implements all optional list operations, and permits all elements, including null. In addition to implementing the List interface, this class provides methods to manipulate the size of the array that is used internally to store the list.
(This class is roughly equivalent to Vector, except that it is unsynchronized.)

ArrayList是基于对象数组的线性数据结构
支持动态扩容
基础默认容量10
最大容量为Integer.MAX_VALUE
支持null值
动态扩容基于数组拷贝

1. 创建ArrayList

三种构造器：

无参构造器

    public ArrayList() {
        super();
        this.elementData = EMPTY_ELEMENTDATA;
    }

无参构造器，初始化内部对象数组为空数组 {}，此方式创建的list会在第一次add元素的时候就触发扩容。

指定初始容量

    public ArrayList(int initialCapacity) {
        super();
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal Capacity: "+
                                               initialCapacity);
        this.elementData = new Object[initialCapacity];
    }

所做的工作是，初始化对象数组。

从其他集合包装

    public ArrayList(Collection<? extends E> c) {
        elementData = c.toArray();
        size = elementData.length;
        // c.toArray might (incorrectly) not return Object[] (see 6260652)
        if (elementData.getClass() != Object[].class)
            elementData = Arrays.copyOf(elementData, size, Object[].class);
    }

首先这个方法使用了泛型边界通配符<? extends E>，限定了传入方法的集合的元素类型，而后做的事情是数组拷贝（也就是元素拷贝），注意此处给size变量赋了值，size变量的值的作用是指代容器中元素的个数，因为有元素拷贝进来，自然要设置size。

2. 动态扩容的触发

ArrayList在add元素的时候，会比较当前容量和元素数量，当发现元素数量达到容器的容量时，就会触发扩容操作。

    public boolean add(E e) {
        ensureCapacityInternal(size + 1);  // Increments modCount!!
        elementData[size++] = e;
        return true;
    }

    private void ensureCapacityInternal(int minCapacity) {
        // 针对无参构造器创建ArrayList的情况，add时设置默认容量
        if (elementData == EMPTY_ELEMENTDATA) {
            minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
        }

        ensureExplicitCapacity(minCapacity);
    }

    private void ensureExplicitCapacity(int minCapacity) {
        modCount++;

        // overflow-conscious code
        //当size+1 > elementData.length时，也就是在add之前，size（元素数量）==容量（数组长度），表示已经装满了
        if (minCapacity - elementData.length > 0)
            grow(minCapacity);
    }

和HashMap的扩容机制不同，在ArrayList中没有loadFactor的存在，扩容的阈值就是当前容量。

在使用无参构造器初始化阶段，内部数组为空的Object数组，即length为0。在第一次add元素时，会以默认容量（10）来创建新的数组，并将引用传递给原始的数组。随着元素数量的增长，数组的容量会不够，于是会动态增长数组容量，动态增长是通过创建新数组再将旧数据转移到新数组中的动作来完成的。Arrays.copyof();

3. 扩容方法

    /**
     * Increases the capacity to ensure that it can hold at least the
     * number of elements specified by the minimum capacity argument.
     *
     * @param minCapacity the desired minimum capacity
     */
    private void grow(int minCapacity) {
        // overflow-conscious code
        int oldCapacity = elementData.length;
        //  new = 1.5*old
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        // MAX_ARRAY_SIZE  = Integer.MAX_VALUE-8
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        // minCapacity is usually close to size, so this is a win:
        elementData = Arrays.copyOf(elementData, newCapacity);
    }

    private static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) // overflow
            throw new OutOfMemoryError();
        return (minCapacity > MAX_ARRAY_SIZE) ?
            Integer.MAX_VALUE :
            MAX_ARRAY_SIZE;
    }

4. 容器行为

4.1 indexof() -> ArrayList是支持null值的

    public int indexOf(Object o) {
        // 如果元素是null，遍历数组寻找null返回index
        if (o == null) {
            for (int i = 0; i < size; i++)
                if (elementData[i]==null)
                    return i;
        } else {
            for (int i = 0; i < size; i++)
                if (o.equals(elementData[i]))
                    return i;
        }
        return -1;
    }

另外lastIndexOf()则是倒序遍历数组，contains（）则是调用indexOf判断返回值是否为-1。

4.2 get()

    public E get(int index) {
        rangeCheck(index);

        return elementData(index);
    }

就是数组元素的访问。

4.3 remove（）

在ArrayList内部，对元素的处理实际上都是对数组的处理，很多情况下都要使用数组copy操作。比如remove：

    public E remove(int index) {
        rangeCheck(index);

        modCount++;
        E oldValue = elementData(index);

        int numMoved = size - index - 1;
        if (numMoved > 0)
            // 把被删除元素后续的元素拷贝到被删元素开始的位置上
            System.arraycopy(elementData, index+1, elementData, index,
                             numMoved);
        // 把最后一个元素置null
        elementData[--size] = null; // clear to let GC do its work

        return oldValue;
    }

这个方法把index后的数据覆盖index-1开始的数据，最后把最后的无效值置null（方便垃圾回收）；

    public boolean remove(Object o) {
        if (o == null) {
            for (int index = 0; index < size; index++)
                if (elementData[index] == null) {
                    fastRemove(index);
                    return true;
                }
        } else {
            for (int index = 0; index < size; index++)
                if (o.equals(elementData[index])) {
                    fastRemove(index);
                    return true;
                }
        }
        return false;
    }
    // fastremove 没有rangeCheck(index)的remove

    private void fastRemove(int index) {
        modCount++;
        int numMoved = size - index - 1;
        if (numMoved > 0)
            System.arraycopy(elementData, index+1, elementData, index,
                             numMoved);
        elementData[--size] = null; // clear to let GC do its work
    }

对象的remove没有什么高级的做法，也只是遍历找相等的对象，找到index去remove（index）；

4.4 clear（）

遍历数组，置null；

4.5 遍历

定义一个list：

    List<Integer> list = new ArrayList<Integer>() {
        private static final long serialVersionUID = 1L;
        {
            add(0);
            add(1);
            add(2);
            add(3);
            add(4);
            add(5);
        }
    };

4.5.1 for循环

        for (Integer i : list) {
            System.out.println(i);
        }
        for (int i = 0; i < list.size();) {
            System.out.println(list.get(i++));
        }

4.5.2 forEach循环lambda

    list.forEach(i -> {
        System.out.println(i);
    });

4.5.3 迭代器

        // listIterator可以指定迭代开始的游标位置，且支持正向和反向地迭代集合
        ListIterator<Integer> listitera = list.listIterator(list.size());
        
        while (listitera.hasPrevious()) {
        System.out.print(listitera.previous());
        }
        System.out.println();
        while (listitera.hasNext()) {
        System.out.print(listitera.next());
        }
        System.out.println();
        // 普通迭代器
        Iterator<Integer> iter = list.iterator();
        while (iter.hasNext()) {
        System.out.print(iter.next());
        }
        System.out.println();

5 ConcurrentModificationException

ArrayList实际上并不是线程安全的集合容器，ConcurrentModificationException这个并发修改异常只能检测到并发的修改并快速失败（fail-fast)来提醒开发者。

        for (Integer i : list) {    
            System.out.print(i);
            list.remove(i);
        }

这种情况下就会抛出ConcurrentModificationException。
上述的增强版for循环语法糖实际上内部使用的其实是迭代器来遍历元素的。在内部迭代时相当于：

    Iterator<Integer> iter = list.iterator();
        while (iter.hasNext()) {
            Integer e = iter.next();
            System.out.print(e);
            list.remove(e);
        }

原因是什么呢？看一下迭代器的next()（list的remove(e)的源码前面已有）：

    // ArrayList$Itr.next() 
    public E next() {
        // 有一个检测并发修改的方法
        checkForComodification();
        int i = cursor;
        if (i >= size)
            throw new NoSuchElementException();
        Object[] elementData = ArrayList.this.elementData;
        if (i >= elementData.length)
            throw new ConcurrentModificationException();
        cursor = i + 1;
        return (E) elementData[lastRet = i];
    }

    final void checkForComodification() {
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
        }

这里的异常就是从checkForComodification()抛出的，查看方法发现抛出原因很简单：modCount != expectedModCount,modCount和expectedModCount究竟在哪边使用了呢？
modCount是list用来计数在list上修改操作的，add，remove都会modCount++，而expectedModCount则是在取得迭代器的时候拷贝list的Modcount值，在迭代过程中，expectedModCount不会改变，用来检测在迭代过程中list的修改，一旦检测到修改，就抛出异常，即fail-fast机制。

5.1 避开ConcurrentModificationException

如何解决此异常，在并发环境下，对迭代器加锁同步，非并发情况下，使用迭代器内部方法安全remove对象：

    @Test
    public void testArrayListConcurrentModify1() {
        Iterator<Integer> iter = list.iterator();
        while (iter.hasNext()) {
            Integer e = iter.next();
            System.out.print(e);
            if (e == 0)
                iter.remove();
        }
        System.out.println(list);
    }

可以正常完成。
Iter内部的remove方法：

public void remove() {
            if (lastRet < 0)
                throw new IllegalStateException();
            checkForComodification();

            try {
                ArrayList.this.remove(lastRet);
                cursor = lastRet;
                lastRet = -1;
                expectedModCount = modCount;
            } catch (IndexOutOfBoundsException ex) {
                throw new ConcurrentModificationException();
            }
        }

没有操作modCount，所以不会抛出异常。

6. subList

ArrayList内部定义了一个内部类SubList（非静态私有类），用于返回ArrayList的子视图，因为只是提供一个视图，所以对视图的修改会直接反映到原list之上。
list获取视图的方法是public List<E> subList(int fromIndex, int toIndex)。
需要注意的是，在对视图迭代操作的时候，原list的修改也会导致迭代器抛出ConcurrentModificationException;有趣的是SubList类中获取的迭代器对象，是一个匿名内部类对象，在SubList#listIterator()方法中返回。

7. 性能与注意点

初始化ArrayList时，如果能够判断数据量，就在创建list对象同时设置容量值，避免数组频繁扩容的数组拷贝带来大量的性能开销。
list不适合在列表中间插入或删除操作，会造成数组大块的移动，如果数据量很大，会造成严重的性能损失。需要频繁在中间插入删除，考虑使用链表数据结构LinkedList。
使用for循环遍历元素时，不要增删元素，需要增删通过迭代器操作

8. 并发场景下的list

ArrayList不是并发安全的类，如果需要在多线程环境下使用此类，需要对数据进行明确的加锁来同步数据访问。
对于并发场景，JDK有一个原始的线性列表类Vector，它和ArrayList相似，区别是它的内部方法都是同步方法；
另外，JDK提供了Collections工具类的同步包装器，如synchronizedList（）来包装一个list，使其成为同步list，原理十分的简单，是在内部创建一个Object的mutex对象，作为锁对象对list的操作进行synchronized(mutex){}加锁操作。

处理同步包装器，JDK还提供了适用于并发的list类：CopyOnwriteArrayList。CopyOnWriteArrayList是java.util.concurrent包中的一个List的实现类。CopyOnWrite的意思是在写时拷贝，也就是如果需要对CopyOnWriteArrayList的内容进行改变，首先会拷贝一份新的List并且在新的List上进行修改，最后将原List的引用指向新的List。具体内容在并发部分再看。

9. 排序

9.1 List的排序

JDK提供的选择
在Java8中，List接口提供了default方法sort():

    default void sort(Comparator<? super E> c) {
        Collections.sort(this, c);
    }

但是ArrayList重写了sort方法：

    @Override
    @SuppressWarnings("unchecked")
    public void sort(Comparator<? super E> c) {
        final int expectedModCount = modCount;
        Arrays.sort((E[]) elementData, 0, size, c);
        if (modCount != expectedModCount) {
            throw new ConcurrentModificationException();
        }
        modCount++;
    }

注意sort方法修改了modeCount，要注意ConcurrentModificationException

使用的是集合工具类Collations的sort静态方法：

    public static <T> void sort(List<T> list, Comparator<? super T> c) {
        Object[] a = list.toArray();
        // 内部调用Arrays的排序方法
        Arrays.sort(a, (Comparator)c);
        ListIterator<T> i = list.listIterator();
        // 排序后数据复制
        for (int j=0; j<a.length; j++) {
            i.next();
            i.set((T)a[j]);
        }
    }

从源码角度看，本质上调用的都是Arrays.sort（）方法。

使用

@Test
    public void sortList() {
        List<Integer> l = new ArrayList<Integer>() {
            private static final long serialVersionUID = 1L;
            {
                add(10);
                add(1);
                add(3);
                add(100);
                add(19);
            }
        };
        /*
         * java 8 list.sort()
        */
        // 默认升序
        l.sort(null);

        // 升序 lambda
        l.sort((a, b) -> {
            return a - b;
        });
        // 降序 lambda
        l.sort((a, b) -> {
            return b - a;
        });

        /*
        * Collections.sort()
        */

        // 默认升序排列
        Collections.sort(l);

        // 自定义Comparator，降序排序
        Collections.sort(l, new Comparator<Integer>() {
            @Override
            public int compare(Integer o1, Integer o2) {
                return o2 - o1;
            }
        });

        // lambda 风格
        Collections.sort(l, (o1, o2) -> {
            return o2 - o1;
        });

    }

9.2 Comparator和Comparable

Comparable是排序接口。若一个类实现了Comparable接口，就意味着该类支持排序。实现了Comparable接口的类的对象的列表或数组可以通过Collections.sort或Arrays.sort进行自动排序。

Comparator是一个比较器，通过指定比较器可以为集合创建比较的规则，它通常作为一个外部的比较器来干涉集合的排序规则。通常它用来修改Comparable类的内部排序规则，或是对一个未实现Comparable接口的类定义排序规则。

简单来说，Comparable类似于一个标记接口，实现此接口说明此类支持默认的排序功能，排序规则根据实现的接口方法compareTo来确定，作为一个内部的比较器；而Comparator更加的通用，作为一个外部的比较器去干涉其他可排序和不可排序的类的顺序。

实现Comparable接口

import java.util.Arrays;

public class Person implements Comparable<Person> {

    private int age;

    public Person(int age) {
        this.age = age;
    }

    @Override
    public int compareTo(Person o) {
        return this.age - o.getAge();
    }

    public int getAge() {
        return age;
    }

    public static void main(String[] args) {
        Person[] ps = { new Person(30), new Person(10), new Person(20) };
        Arrays.sort(ps);
        for (Person i : ps) {
            System.out.print(i.getAge() + " ");
        }
        // 这边使用了Comparator来修改排序规则
        Arrays.sort(ps, (a, b) -> {
            return b.getAge() - a.getAge();
        });
        for (Person i : ps) {
            System.out.print(i.getAge() + " ");
        }
    }
}

不实现Comparable接口的排序支持

    class People {
        private int age;

        public People(int age) {
            this.age = age;
        }

        public int getAge() {
            return age;
        }

        public static void main(String[] args) {
            People[] pa = { new People(3), new People(1), new People(5) };
            //      Arrays.sort(pa); // 抛异常
            Arrays.sort(pa, (a, b) -> {
                return a.getAge() - b.getAge();
            });
            for (People i : pa) {
                System.out.print(i.getAge() + " "); // 1,3,5
            }
        }
    }

可以发现没有实现Comparable接口的People类，也可以通过指定Comparator来实现排序，如果没有Comparator则会抛出异常。

两种方法各有优劣，用Comparable 简单，只要实现Comparable 接口的对象直接就成为一个可以比较的对象（拥有默认的比较规则）。
用Comparator 的好处是另外实现一个比较器，来运行时创建比较规则，无论目标类有没有实现Comparable接口。

9.3 排序算法

9.3.1 基础排序算法

算法	时间复杂度
冒泡排序	O(n2)
选择排序	O(n2)
插入排序	O(n2)
希尔排序	O(n1.5)
快速排序	O(N*logN)
归并排序	O(N*logN)
堆排序	O(N*logN)
基数排序	O(d(n+r))

9.3.2 Arrays.sort()

从前文的sort的源码可以发现，list的排序最后调用的方法是Arrays.sort（）方法，查看JavaDoc文档，ArrayList采用的排序算法是TimSort：

/*
*
     * <p>The implementation was adapted from Tim Peters's list sort for Python
     * (<a href="http://svn.python.org/projects/python/trunk/Objects/listsort.txt">
     * TimSort</a>).  It uses techniques from Peter McIlroy's "Optimistic
     * Sorting and Information Theoretic Complexity", in Proceedings of the
     * Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp 467-474,
     * January 1993.
     *
*/

TimSort算法是一种起源于归并排序和插入排序的混合排序算法。

排序算法另外整理。

参考资料

[1] Java中Comparable和Comparator区别小结

[2] 阿里巴巴Java开发手册

[3] 排序算法总结