A Deep Dive into the Nature of thread-local

Author: zh_harry | Published 2019-01-02 13:08


What is thread-local?

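Martin Fowler has a classic line in Refactoring: "Any fool can write code that a computer can understand. Good programmers write code that humans can understand." It shows how much naming matters in a high-level language: the language itself is the comment.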

Taken literally, thread-local means a variable that is local to a thread.

package com.sparrow.jdk.threadlocal;

/**
 * @author by harry
 */
public class ThreadWithLocal extends Thread{
    public ThreadWithLocal(Long t) {
        this.t = t;
    }

    @Override public void run() {
        System.out.println(this.t);
    }

    // thread local: this field belongs to this thread object only
    private Long t;

    public static void main(String[] args) {
        ThreadWithLocal t=new ThreadWithLocal(System.nanoTime());
        t.start();
        ThreadWithLocal t2=new ThreadWithLocal(System.nanoTime());
        t2.start();
    }
}

Here each thread is its own instance. Starting the same instance more than once throws Exception in thread "main" java.lang.IllegalThreadStateException.
Output:

3026460560368
3026460819350
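The claim that a Thread instance can be started only once is easy to check. The class name StartTwiceDemo is hypothetical; the sketch starts one thread, calls start() on it again, and reports whether that second call throws.

```java
public class StartTwiceDemo {
    // Returns true if calling start() a second time on the same Thread throws.
    static boolean secondStartThrows() {
        Thread t = new Thread(() -> { });
        t.start();
        try {
            t.start(); // same instance again: not allowed
            return false;
        } catch (IllegalThreadStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondStartThrows()); // true
    }
}
```

This is why the example creates a new ThreadWithLocal for each thread instead of reusing one.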

public
class Thread implements Runnable {
  ... (omitted)
    /* ThreadLocal values pertaining to this thread. This map is maintained
     * by the ThreadLocal class. */
    ThreadLocal.ThreadLocalMap threadLocals = null;
  ... (omitted)
}

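From the Thread source above and the example code, the two fields are essentially the same thing: both can be understood as thread-local variables, and each is simply a member field of a Thread class.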

Why does the JDK provide a separate ThreadLocal class?

From a business-implementation point of view, the approach in the example above does achieve thread isolation.
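But there is one situation that is hard to handle this way (in the author's view). Martin Fowler has talked about the notion of the client programmer. The author considers this notion very important, because it makes programming to interfaces much easier to understand. Providers of shared frameworks such as the JDK or Tomcat generally do not let their users (client programmers) modify them, but they do offer ways to extend them, following the open-closed principle; the JDK's SPI extension points are one example.
The first example extends Thread in a subclass. That works in theory, but modifying a thread that a framework has already encapsulated is usually complicated and may break existing logic. Our business code normally runs in a multi-threaded context that is transparent to us as developers; Tomcat, for example, is multi-threaded. Suppose the following code runs in a multi-threaded environment. Spring would typically declare it as a singleton, which makes it a shared object.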

package com.sparrow.jdk.threadlocal;

/**
 * @author by harry
 */
public class MultiThreadShareBusiness {
    private Long threadId;

    public Long getThreadId() {
        return threadId;
    }

    public void setThreadId(Long threadId) {
        this.threadId = threadId;
    }

    public void business(){
        // If the field's value differs from the current thread's id, another thread overwrote it: not thread-safe
        if(threadId!=Thread.currentThread().getId()) {
            System.out.println(Thread.currentThread().getId() + "-" + threadId);
        }
    }
}

package com.sparrow.jdk.threadlocal;

/**
 * @author by harry
 */
public class ThreadShareObjectTest extends Thread{
    private static MultiThreadShareBusiness o=new MultiThreadShareBusiness();

    public void run(){
        while (true) {
            o.setThreadId(Thread.currentThread().getId());
            o.business();
        }
    }

    public static void main(String[] args) {
        Thread thread=new ThreadShareObjectTest();
        thread.start();

        Thread thread2=new ThreadShareObjectTest();
        thread2.start();
    }
}

Any output means the code is not thread-safe: the shared field is being modified by two threads at the same time.

There are two ways to make this thread-safe: one is locking, the other is isolating the variable with a ThreadLocal.
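For comparison, the locking option can be sketched as follows, assuming the set and the check are merged into one synchronized method. LockedShareBusiness, setAndCheck and runTwoThreads are hypothetical names, not part of the original code; the loop is bounded so the program terminates.

```java
public class LockedShareBusiness {
    private Long threadId;
    private int mismatches;

    // Writing the field and checking it happen under one lock, so no other
    // thread can overwrite threadId between the two steps.
    public synchronized void setAndCheck(long id) {
        this.threadId = id;
        if (threadId != Thread.currentThread().getId()) {
            mismatches++;
        }
    }

    public synchronized int mismatches() {
        return mismatches;
    }

    // Two threads hammer one shared instance, like ThreadShareObjectTest but with a bounded loop.
    static int runTwoThreads(int iterations) {
        LockedShareBusiness shared = new LockedShareBusiness();
        Runnable task = () -> {
            for (int i = 0; i < iterations; i++) {
                shared.setAndCheck(Thread.currentThread().getId());
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.mismatches();
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + runTwoThreads(100_000));
    }
}
```

Locking setThreadId and business separately would not be enough: another thread could still run its set between the two calls. The cost is that the two threads now serialize on one monitor, which the ThreadLocal version avoids.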

package com.sparrow.jdk.threadlocal;

/**
 * @author by harry
 */
public class MultiThreadLocalBusiness {

    public static void main(String[] args) {
        MultiThreadLocalBusiness m=new MultiThreadLocalBusiness();
        m.setThreadId(1L);
        m.business();
    }
    private ThreadLocal<Long> threadId = new ThreadLocal<>();

    public void setThreadId(Long threadId) {
        this.threadId.set(threadId);
    }

    public void business() {
        ThreadLocal<Long> t=this.threadId;
        if (t.get() != Thread.currentThread().getId()) {
            System.out.println(Thread.currentThread().getId() + "-" + t.get());
        }
    }
}

package com.sparrow.jdk.threadlocal;

/**
 * @author by harry
 */
public class ThreadLocalTest extends Thread{
    private static MultiThreadLocalBusiness o=new MultiThreadLocalBusiness();

    public void run(){
        while (true) {
            o.setThreadId(Thread.currentThread().getId());
            o.business();
        }
    }

    public static void main(String[] args) {
        Thread thread=new ThreadLocalTest();
        thread.start();

        Thread thread2=new ThreadLocalTest();
        thread2.start();
    }
}

After switching to a ThreadLocal field there is no output, which shows that the values are isolated per thread.

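In essence, a thread-local value is a value in the map held by the Thread object, and ThreadLocal is the extension point that exposes this map to outside code:

  ThreadLocal.ThreadLocalMap threadLocals = null;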
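To make the "map inside the thread" design concrete, here is a minimal sketch with hypothetical MiniThread and MiniThreadLocal classes. It assumes every caller runs on a MiniThread (calling get() from any other thread throws ClassCastException) and uses a plain HashMap with strong keys, whereas the JDK's ThreadLocalMap uses open addressing and weak keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MiniThreadLocalDemo {
    // Hypothetical stand-in for Thread: each instance owns its own map,
    // playing the role of Thread.threadLocals.
    static class MiniThread extends Thread {
        final Map<MiniThreadLocal<?>, Object> locals = new HashMap<>();

        MiniThread(Runnable task) {
            super(task);
        }
    }

    // Hypothetical stand-in for ThreadLocal: it stores nothing itself and
    // only serves as a key into the current thread's map.
    static class MiniThreadLocal<T> {
        private static Map<MiniThreadLocal<?>, Object> currentMap() {
            // Only works when called from a MiniThread.
            return ((MiniThread) Thread.currentThread()).locals;
        }

        @SuppressWarnings("unchecked")
        T get() {
            return (T) currentMap().get(this);
        }

        void set(T value) {
            currentMap().put(this, value);
        }

        void remove() {
            currentMap().remove(this);
        }
    }

    // Two threads share one MiniThreadLocal instance; each reads back its own value.
    static Map<String, String> run() {
        MiniThreadLocal<String> shared = new MiniThreadLocal<>();
        Map<String, String> seen = new ConcurrentHashMap<>();
        MiniThread a = new MiniThread(() -> { shared.set("A"); seen.put("a", shared.get()); });
        MiniThread b = new MiniThread(() -> { shared.set("B"); seen.put("b", shared.get()); });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The shared object is one MiniThreadLocal, yet the values never collide, because each thread looks the key up in its own map.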

Preconditions for thread isolation

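The object to be isolated must be a shared variable, because variables on the stack (local variables) are already isolated by nature.
The thread is long-lived and reused, typically sharing the lifetime of the process.
ThreadLocal isolation is only meaningful when both conditions hold.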

Class diagram and source analysis

(Figure: ThreadLocal class diagram)

static class ThreadLocalMap {

        /**
         * The entries in this hash map extend WeakReference, using
         * its main ref field as the key (which is always a
         * ThreadLocal object).  Note that null keys (i.e. entry.get()
         * == null) mean that the key is no longer referenced, so the
         * entry can be expunged from table.  Such entries are referred to
         * as "stale entries" in the code that follows.
         */
        static class Entry extends WeakReference<ThreadLocal<?>> {
            /** The value associated with this ThreadLocal. */
            Object value;

            Entry(ThreadLocal<?> k, Object v) {
                super(k);
                value = v;
            }
        }

        // ... remaining fields and methods omitted
}

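Law of Demeter
ThreadLocalMap is a nested class of ThreadLocal declared without an access modifier, so it is visible only within its package.
The Law of Demeter, also called the Least Knowledge Principle (LKP), says that an object should know as little as possible about other objects: don't talk to strangers.
More on design principles: https://www.jianshu.com/p/3f7628e2e796
The key of a ThreadLocalMap entry is held through a weak reference.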
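The effect of a weak key can be sketched without relying on System.gc(). The Entry class below only mirrors the shape of ThreadLocalMap.Entry (it is hypothetical, not the JDK class), and WeakReference.clear() stands in for the collector clearing the referent. Afterwards the key is gone but the value is still strongly reachable, which is what the JDK source calls a stale entry.

```java
import java.lang.ref.WeakReference;

public class StaleEntryDemo {
    // Mirrors the shape of ThreadLocalMap.Entry: the key is held weakly,
    // the value strongly.
    static class Entry extends WeakReference<Object> {
        Object value;

        Entry(Object key, Object value) {
            super(key);
            this.value = value;
        }

        boolean isStale() {
            return get() == null;
        }
    }

    static Entry simulateKeyCollected() {
        Object key = new Object(); // stands in for the ThreadLocal instance
        Entry e = new Entry(key, new byte[1024]);
        // clear() has the same visible effect as the GC clearing the weak
        // referent, but deterministically, so the demo does not depend on System.gc().
        e.clear();
        return e;
    }

    public static void main(String[] args) {
        Entry e = simulateKeyCollected();
        System.out.println("stale=" + e.isStale() + ", value still held=" + (e.value != null));
        // prints: stale=true, value still held=true
    }
}
```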

(Figure: ThreadLocal object reference diagram; image from https://www.jianshu.com/p/a1cd61fa22da)

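From the ThreadLocalMap source and the class diagram we can summarize the object reference graph.
First, the current thread ref lives on the stack: a running thread is always referenced from its stack, so this reference is certain.
But is the ThreadLocal ref necessarily on the stack as well?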

javap -v com.sparrow.jdk.threadlocal.MultiThreadLocalBusiness

For the load and store instructions, see the Java Virtual Machine Specification:
https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-2.html#jvms-2.11.2

 public void business() {
        // reads the field directly, with no local variable
        if (threadId.get() != Thread.currentThread().getId()) {
            System.out.println(Thread.currentThread().getId() + "-" + threadId.get());
        }
    }

 public void business();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=4, locals=1, args_size=1
         0: aload_0                           // push this (the current object, not the ThreadLocal reference)
         1: getfield      #4                  // Field threadId:Ljava/lang/ThreadLocal; (load the threadId field)
         4: invokevirtual #11                 // Method java/lang/ThreadLocal.get:()Ljava/lang/Object; (call get())

public void business() {
        // after copying the field into a local variable
        ThreadLocal<Long> t = this.threadId;
        if (t.get() != Thread.currentThread().getId()) {
            System.out.println(Thread.currentThread().getId() + "-" + t.get());
        }
    }
  public void business();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=4, locals=2, args_size=1
         0: aload_0
         1: getfield      #4                  // Field threadId:Ljava/lang/ThreadLocal;
         4: astore_1
         5: aload_1                           // push the ThreadLocal stored in local variable slot 1

The disassembly confirms that the ThreadLocal reference can indeed live on the stack, in a local variable slot.

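References (the following definitions are taken from Understanding the Java Virtual Machine by Zhou Zhiming)

Strong reference
Strong references are the ordinary references found everywhere in program code, such as "Object obj = new Object()". As long as a strong reference exists, the garbage collector will never reclaim the referenced object (it is reachable from a GC root).
Soft reference
Soft references describe objects that are still useful but not essential. Objects associated only with soft references are added to the collection scope for a second round of collection before the system would throw an out-of-memory error; only if that collection still does not free enough memory is the out-of-memory error thrown. A typical use case is caching.
Weak reference
Weak references also describe non-essential objects, but they are weaker than soft references: an object associated only with weak references survives only until the next garbage collection. When the collector runs, it reclaims objects reachable only through weak references regardless of whether memory is sufficient.
Phantom reference
A phantom reference is the weakest kind: it has no effect on an object's lifetime and cannot be used to obtain the object. Its only purpose is to receive a system notification when the object is reclaimed.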
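These four strengths correspond to classes in java.lang.ref. The sketch below (ReferenceKindsDemo is a hypothetical name) only checks behavior that does not depend on when the GC runs: while the object is strongly reachable, SoftReference.get() and WeakReference.get() return it, and PhantomReference.get() always returns null.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.Arrays;

public class ReferenceKindsDemo {
    // Returns {soft returns referent, weak returns referent, phantom returns null}.
    static boolean[] check() {
        Object strong = new Object(); // strong reference: keeps the object reachable
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        // A phantom reference is only useful when registered with a queue.
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);
        return new boolean[] {
            soft.get() == strong,  // not cleared while strongly reachable
            weak.get() == strong,  // not cleared while strongly reachable
            phantom.get() == null  // get() on a phantom reference always returns null
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(check())); // [true, true, true]
    }
}
```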

The source shows that the ThreadLocalMap key is a weak reference to the ThreadLocal object. Let's verify the concepts above with code.

package com.sparrow.jdk.threadlocal;

import java.lang.ref.WeakReference;

/**
 * Created by harry on 2018/4/12.
 */
public class TestWeakReference {
    // Hypothetical stand-in for com.sparrow.jdk.volatilekey.User (not shown in the original),
    // so the example compiles on its own
    static class User {
        private final int id;
        private final byte[] payload;

        User(int id, byte[] payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    // The User instance is reachable only through this weak reference
    static WeakReference<User> user = new WeakReference<User>(new User(100, new byte[10000]));

    public static void main(String[] args) {
        int i = 0;
        while (true) {
            //User u = user.get();
            if (user.get() != null) {
                i++;
                System.out.println("Object is alive for " + i + " loops - ");
            } else {
                System.out.println("Object has been collected.");
                break;
            }
            // By definition, a GC reclaims weakly reachable objects whether or not memory is short
            System.gc();
        }
    }
}

Output:

Object is alive for 1 loops - 
Object has been collected.

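Judging from the output the object was collected, which seems fine. But think again: if the keys in a ThreadLocal were released at every GC, our programs would fail with NullPointerExceptions. So why don't they?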

package com.sparrow.jdk.threadlocal;

/**
 * @author by harry
 */
public class ThreadLocalGc {
    private static ThreadLocal<String> s=new ThreadLocal<>();

    public static void main(String[] args) {
        s.set("hello");
        System.out.println(s.get());
        System.gc();
        System.out.println(s.get());
    }
}

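It runs normally and prints hello twice, with no NullPointerException (the static field s still holds a strong reference to the ThreadLocal).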

Let's modify the earlier code:

package com.sparrow.jdk.threadlocal;

import java.lang.ref.WeakReference;

/**
 * Created by harry on 2018/4/12.
 */
public class TestWeakReference {
    // Hypothetical stand-in for com.sparrow.jdk.volatilekey.User (not shown in the original),
    // so the example compiles on its own
    static class User {
        private final int id;
        private final byte[] payload;

        User(int id, byte[] payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    static WeakReference<User> user = new WeakReference<User>(new User(100, new byte[10000]));

    public static void main(String[] args) {
        int i = 0;
        while (true) {
            // Uncommented: a strong reference now holds the weakly referenced object
            // (playing the part of the strong reference to a ThreadLocal object)
            User u = user.get();
            if (user.get() != null) {
                i++;
                System.out.println("Object is alive for " + i + " loops - ");
            } else {
                System.out.println("Object has been collected.");
                break;
            }
            System.gc();
        }
    }
}

Output:

Object is alive for 1 loops - 
Object is alive for 2 loops - 
...

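The loop keeps going, which shows the weakly referenced object is not collected. So the definition above is not quite rigorous: a weakly referenced object is collected by the GC only when no strong reference points to it.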

ThreadLocal memory leaks

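Based on the figure above, let's first analyze the conditions under which a leak occurs:

The thread has not died.
This is required because once the thread dies, the reference to its ThreadLocalMap is cut and the ThreadLocal object becomes unreachable as well, so everything can be collected and nothing can leak.
The ThreadLocal object has been collected, so the entry's key has become null.
No get or set is called after the key becomes null, because get and set can purge stale entries whose key is null.
In addition, if the values are neither large nor numerous, the leak is barely noticeable.
If a value is a large object, the leak can lead to an OutOfMemoryError, so it is advisable to call remove() manually once the value is no longer needed.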

package com.sparrow.jdk.threadlocal;

/**
 * Created by harry on 2018/4/12.
 */
public class ThreadLocalGCLeak extends Thread {
    public static class MyThreadLocal extends ThreadLocal<Object> {
        private byte[] a = new byte[1024 * 1024 * 1];
        @Override
        public void finalize() {
            System.out.println("ThreadLocal object collected by GC.");
        }
    }

    public static class MyBigObject { // large object that occupies memory
        private byte[] a = new byte[1024 * 1024 * 50];
        @Override
        public void finalize() {
            System.out.println("50 MB object collected by GC.");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread thread = new Thread(new Runnable() {
            @Override
            public void run() {
                ThreadLocal<Object> tl = new MyThreadLocal();
                tl.set(new MyBigObject());
                // tl = null; // cut the ThreadLocal reference; the author found no other way to get the ThreadLocal object collected first
                System.out.println("Full GC 1");
                System.gc();
                // uncomment to simulate the thread continuing to run during the test
                // while (true) {}
            }
        });
        thread.setDaemon(false);
        thread.start();
        System.out.println("Full GC 2");
        System.gc();
        Thread.sleep(1000);
        System.out.println("Full GC 3");
        System.gc();
        Thread.sleep(1000);
        System.out.println("Full GC 4");
        System.gc();
    }
}

As the analysis shows, actually producing a leak is surprisingly hard.
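Following the advice to call remove() manually, the usual pattern pairs set() with remove() in a finally block. The sketch below assumes a single-thread pool so that two tasks are guaranteed to run on the same long-lived thread; RemoveAfterUseDemo, BUFFER and handleTask are hypothetical names.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RemoveAfterUseDemo {
    // Hypothetical per-task buffer; stands in for any large value kept in a ThreadLocal.
    private static final ThreadLocal<byte[]> BUFFER = new ThreadLocal<>();

    static void handleTask() {
        BUFFER.set(new byte[1024 * 1024]);
        try {
            // ... work that uses BUFFER.get() ...
        } finally {
            // Remove the entry so the pooled thread does not keep the value reachable.
            BUFFER.remove();
        }
    }

    // Runs two tasks on the same pooled thread and reports whether the
    // second task can still see a value left behind by the first.
    static boolean leftoverVisible() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(RemoveAfterUseDemo::handleTask).get();
            return pool.submit(() -> BUFFER.get() != null).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(leftoverVisible()); // false
    }
}
```

Without the finally block, the second task would see the 1 MB array left behind by the first one, and the pooled thread would keep it alive for as long as the thread lives.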

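Why use a weak reference?

From the example above we can see that, whether the key is a strong or a weak reference, the value leaks if remove() is not called manually (provided the thread is still alive).
A weak reference at least guarantees that the key itself can be collected.
If the key were a strong reference, the ThreadLocal object would have two strong references pointing to it: the one on the stack and the ThreadLocalMap key. Even after cutting the stack reference by setting it to null, the map key would still point to it strongly, so the ThreadLocal could never be collected. With a weak key, once the strong reference on the stack is cut, nothing strongly references the ThreadLocal object any more, and it is collected at the next GC.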
http://www.cnblogs.com/onlywujun/p/3524675.html

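Optimizing ThreadLocal

ThreadLocalMap is essentially a hash map, and a hash map lookup costs O(1)+O(m) (m < n, where n is the size of the map).
ThreadLocalMap lookups can therefore be brought down to O(1) by replacing the hash map with an array. Both Dubbo and Netty contain such an optimization of ThreadLocal in their source code.
Netty source code: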
io.netty.util.concurrent.FastThreadLocal

 private final int index;

    private final int cleanerFlagIndex;

    public FastThreadLocal() {
        index = InternalThreadLocalMap.nextVariableIndex(); // allocate this instance's index
        cleanerFlagIndex = InternalThreadLocalMap.nextVariableIndex();
    }

public final V get() {
        InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get();
        Object v = threadLocalMap.indexedVariable(index);
        if (v != InternalThreadLocalMap.UNSET) {
            return (V) v;
        }

        V value = initialize(threadLocalMap);
        registerCleaner(threadLocalMap);
        return value;
    }

Dubbo source code:
org.apache.dubbo.common.threadlocal.InternalThreadLocal

public class InternalThreadLocal<V> {

    private static final int variablesToRemoveIndex = InternalThreadLocalMap.nextVariableIndex();

    private final int index;

    public InternalThreadLocal() {
        index = InternalThreadLocalMap.nextVariableIndex();
    }

  /**
     * Returns the current value for the current thread
     */
    @SuppressWarnings("unchecked")
    public final V get() {
        InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get();
        Object v = threadLocalMap.indexedVariable(index);
        if (v != InternalThreadLocalMap.UNSET) {
            return (V) v;
        }

        return initialize(threadLocalMap);
    }

    // ... remaining methods omitted
}
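The core idea behind both classes, an array slot assigned once per instance instead of a hash lookup, can be sketched as follows. IndexedThread and IndexedThreadLocal are hypothetical simplifications: as with Netty's FastThreadLocalThread, the thread itself holds the array, but this sketch only works on an IndexedThread, uses null instead of an UNSET marker, and does no cleanup when the thread exits.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class IndexedThreadLocalDemo {
    // The thread itself owns a slot array (the role FastThreadLocalThread plays in Netty).
    static class IndexedThread extends Thread {
        Object[] slots = new Object[0];

        IndexedThread(Runnable task) {
            super(task);
        }
    }

    static class IndexedThreadLocal<V> {
        private static final AtomicInteger NEXT_INDEX = new AtomicInteger();
        // Fixed at construction: this instance owns the same slot number in every thread's array.
        private final int index = NEXT_INDEX.getAndIncrement();

        private static IndexedThread current() {
            return (IndexedThread) Thread.currentThread();
        }

        @SuppressWarnings("unchecked")
        V get() {
            Object[] slots = current().slots;
            // Direct array access: no hashing and no probing.
            return index < slots.length ? (V) slots[index] : null;
        }

        void set(V value) {
            IndexedThread t = current();
            if (index >= t.slots.length) {
                // Grow the array so this instance's index fits.
                t.slots = Arrays.copyOf(t.slots, Math.max(index + 1, t.slots.length * 2));
            }
            t.slots[index] = value;
        }

        void remove() {
            Object[] slots = current().slots;
            if (index < slots.length) {
                slots[index] = null;
            }
        }
    }

    static String run() {
        IndexedThreadLocal<String> name = new IndexedThreadLocal<>();
        IndexedThreadLocal<Integer> count = new IndexedThreadLocal<>();
        String[] result = new String[1];
        IndexedThread t = new IndexedThread(() -> {
            name.set("worker");
            count.set(42);
            result[0] = name.get() + ":" + count.get();
        });
        t.start();
        try {
            t.join(); // join() makes result[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // worker:42
    }
}
```

The trade-off is memory: every thread's array has to be large enough for the highest index ever handed out, even if that thread only uses a few of the slots.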
