Java 7 HashMap: Concurrent Multi-Threaded Access Causing 100% CPU

Author: 一字马胡 | Published: 2020-10-18 17:45

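    Symptoms

    While deploying a Java 7 service to production, one machine could not serve traffic after the deployment finished. A large number of its threads were BLOCKED, which triggered an alert:

    (Figure: JVM thread monitoring)

    The monitoring shows more than 2,000 live threads in the JVM, which is abnormal in itself. Nearly 1,800 of them were BLOCKED, which means the service never started properly: something went wrong during startup.

    Analysis

    The abnormal thread count and the blocked threads are cause and effect: because threads were blocked, more threads were needed to do the work, so new threads kept being created.
    The next step was to find where the threads were blocked. A quick investigation showed that most of them were blocked at the same place: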

    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:213)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:308)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
    

    Here is the code at the blocking point: org.springframework.beans.factory.support.DefaultSingletonBeanRegistry#getSingleton(java.lang.String, boolean)

        /**
         * Return the (raw) singleton object registered under the given name.
         * <p>Checks already instantiated singletons and also allows for an early
         * reference to a currently created singleton (resolving a circular reference).
         * @param beanName the name of the bean to look for
         * @param allowEarlyReference whether early references should be created or not
         * @return the registered singleton object, or {@code null} if none found
         */
        protected Object getSingleton(String beanName, boolean allowEarlyReference) {
            Object singletonObject = this.singletonObjects.get(beanName);
            if (singletonObject == null && isSingletonCurrentlyInCreation(beanName)) {
                synchronized (this.singletonObjects) {
                    singletonObject = this.earlySingletonObjects.get(beanName);
                    if (singletonObject == null && allowEarlyReference) {
                        ObjectFactory<?> singletonFactory = this.singletonFactories.get(beanName);
                        if (singletonFactory != null) {
                            singletonObject = singletonFactory.getObject();
                            this.earlySingletonObjects.put(beanName, singletonObject);
                            this.singletonFactories.remove(beanName);
                        }
                    }
                }
            }
            return (singletonObject != NULL_OBJECT ? singletonObject : null);
        }
    

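    The method org.springframework.beans.factory.support.DefaultSingletonBeanRegistry#getSingleton(java.lang.String, boolean) does contain a synchronized block. A thread that needs to run that block must first acquire the lock; otherwise it is BLOCKED.

    What we can be sure of so far is that calling this method can leave threads BLOCKED while they compete for the monitor. According to the alert, however, almost all threads were blocked, so there might be a deadlock that prevents the lock from ever being released, which would block every thread that reaches this method. To find the actual cause, the first step was to take a thread dump.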

    "xxx-13-thread-1" daemon prio=10 tid=0x00007fa3790a1000 nid=0x48b4c waiting for monitor entry [0x00007fa38e17b000]
       java.lang.Thread.State: BLOCKED (on object monitor)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:213)
        - waiting to lock <0x0000000727af5b68> (a java.util.concurrent.ConcurrentHashMap)
        at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:308)
        at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
        at org.springframework.aop.aspectj.annotation.BeanFactoryAspectInstanceFactory.getAspectInstance(BeanFactoryAspectInstanceFactory.java:83)
        at org.springframework.aop.aspectj.annotation.LazySingletonAspectInstanceFactoryDecorator.getAspectInstance(LazySingletonAspectInstanceFactoryDecorator.java:53)
        at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:627)
        at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:616)
        at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:70)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:168)
        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:671)
        ...
        at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:736)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:671)
        ...
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    

    All the blocked threads have the same stack as the one above. The key line is:

    waiting to lock <0x0000000727af5b68> (a java.util.concurrent.ConcurrentHashMap)
    

    0x0000000727af5b68 is the address of the object. The hint that follows, (a java.util.concurrent.ConcurrentHashMap), also confirms that it is the synchronization object analyzed above. Here is that object again:

        /** Cache of singleton objects: bean name --> bean instance */
        private final Map<String, Object> singletonObjects = new ConcurrentHashMap<String, Object>(256);
    

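    Next, I needed to know which thread was holding the lock on object 0x0000000727af5b68 without releasing it, thereby blocking the others. To find the holder, search the thread dump for the keyword "locked <0x0000000727af5b68>". Because a monitor can be held by only one thread at a time, only one thread can match. The search turned up the following stack: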

    "main" prio=10 tid=0x00007fa4a0018000 nid=0x4889e runnable [0x00007fa4a8fea000]
       java.lang.Thread.State: RUNNABLE
        at java.util.HashMap.put(HashMap.java:494)
        at org.apache.thrift.meta_data.FieldMetaData.addStructMetaDataMap(FieldMetaData.java:49)
      ...
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:191)
        at com.sun.proxy.$Proxy346.<clinit>(Unknown Source)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.lang.reflect.Proxy.newInstance(Proxy.java:764)
        at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:755)
        at org.springframework.aop.framework.JdkDynamicAopProxy.getProxy(JdkDynamicAopProxy.java:122)
        at org.springframework.aop.framework.JdkDynamicAopProxy.getProxy(JdkDynamicAopProxy.java:112)
        at org.springframework.aop.framework.ProxyFactory.getProxy(ProxyFactory.java:96)
      ...
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1759)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1696)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1626)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:553)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:481)
        at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:312)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
        - locked <0x0000000727af5b68> (a java.util.concurrent.ConcurrentHashMap)
        at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:308)
        at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:756)
        at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:867)
        at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:542)
        - locked <0x00000007292f2fb0> (a java.lang.Object)
        at org.springframework.boot.context.embedded.EmbeddedWebApplicationContext.refresh(EmbeddedWebApplicationContext.java:123)
        at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:666)
        at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:353)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:300)
    

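    It is the main thread. It holds the lock on object 0x0000000727af5b68, and its state is RUNNABLE, so it is not blocked. In other words, this is not a deadlock. More likely, the main thread is stuck in an infinite loop and therefore never releases the lock, so every other thread that needs the lock on 0x0000000727af5b68 is BLOCKED.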
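
    The same reasoning can be checked from inside a JVM with the standard java.lang.management API. The following is a minimal sketch, not part of the original investigation: the class name SpinningLockOwnerDemo, the thread names, and the run() helper are all hypothetical. One thread takes a monitor and spins while holding it, a second thread waits for the same monitor, and the sketch then reads the waiter's lock owner and asks the JVM whether it sees a monitor deadlock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class; none of these names come from the incident's code base.
class SpinningLockOwnerDemo {
    static volatile boolean stop;

    /** Returns {waiter state, lock owner name, owner state, whether a monitor deadlock was reported}. */
    static String[] run() {
        final Object lock = new Object();
        final CountDownLatch held = new CountDownLatch(1);
        stop = false;

        // Plays the role of "main": takes the monitor, then loops without ever blocking.
        Thread owner = new Thread(new Runnable() {
            public void run() {
                synchronized (lock) {
                    held.countDown();
                    while (!stop) { /* busy loop while holding the lock */ }
                }
            }
        }, "demo-owner");

        // Plays the role of a worker thread that needs the same monitor.
        Thread waiter = new Thread(new Runnable() {
            public void run() {
                synchronized (lock) { /* entered only after the owner leaves its loop */ }
            }
        }, "demo-waiter");

        try {
            owner.start();
            held.await();                       // the owner now holds the monitor
            waiter.start();
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(10);               // wait until the waiter is parked on the monitor
            }
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            ThreadInfo waiterInfo = mx.getThreadInfo(waiter.getId());
            ThreadInfo ownerInfo = mx.getThreadInfo(waiterInfo.getLockOwnerId());
            long[] deadlocked = mx.findMonitorDeadlockedThreads(); // null when no deadlock cycle exists
            return new String[] {
                waiterInfo.getThreadState().name(),
                waiterInfo.getLockOwnerName(),
                ownerInfo.getThreadState().name(),
                String.valueOf(deadlocked != null)
            };
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        } finally {
            stop = true;                        // let the owner exit so both threads can finish
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));
    }
}
```

    The waiter is BLOCKED and its lock owner is RUNNABLE, yet no monitor deadlock is reported. That is the same picture as in the thread dump above: a lock that is never released because its owner is busy, not because of a cycle of threads waiting on each other.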

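    The top of the stack is java.util.HashMap.put(HashMap.java:494). That was a surprise: it looked like a bug in Thrift, and it deserves a closer look.

    HashMap is not thread-safe in any Java version to date. If a Map will be accessed concurrently in your scenario, you cannot use a HashMap without external synchronization, or you will run into problems of varying severity. We were on Java 7, where concurrent access to a HashMap by multiple threads can send a thread into an infinite loop.

    To illustrate, here is the put method of HashMap: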

        public V put(K key, V value) {
            if (table == EMPTY_TABLE) {
                inflateTable(threshold);
            }
            if (key == null)
                return putForNullKey(value);
            int hash = hash(key);
            int i = indexFor(hash, table.length);
            for (Entry<K,V> e = table[i]; e != null; e = e.next) { // ----- line 494
                Object k;
                if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                    V oldValue = e.value;
                    e.value = value;
                    e.recordAccess(this);
                    return oldValue;
                }
            }
    
            modCount++;
            addEntry(hash, key, value, i);
            return null;
        }
    

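    The loop never terminates when e.next == e, that is, when some node's next pointer points to itself. To verify this, I took a heap dump and loaded it into Eclipse Memory Analyzer Tool (MAT below), then clicked the button shown in the figure below to get the thread list:


    (Figure: thread details)

    MAT shows each thread's name, its current stack, and the objects it holds, which makes it very convenient for troubleshooting memory problems. Locate the main thread:

    (Figure: top of the main thread's stack)

    Matching this against the looping code in HashMap.put, e at that moment was the java.util.HashMap.Entry at 0x73ae67608. Its next field points back to itself, which is why the main thread executing this code loops forever.

    For how the HashMap infinite loop comes about, see the article "Why HashMap Is Not Thread-Safe" (为什么HashMap不线程安全).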
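
    As a rough illustration of how a resize race can corrupt a bucket, the sketch below replays one commonly described interleaving in a single thread. It is a simplification under stated assumptions, not the JDK source: Node, replay, and hasCycle are illustrative names, both nodes are assumed to land in the same bucket of the new table, and relinking uses head insertion as the Java 7 resize does. It produces a two-node cycle; it does not try to reproduce the self-referencing entry seen in MAT above.

```java
// Simplified single-threaded replay; Node and the method names are illustrative, not JDK code.
class TransferCycleDemo {
    static final class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    /** Replays the interleaving and returns the head of the bucket as thread 1 leaves it. */
    static Node replay() {
        Node b = new Node("B", null);
        Node a = new Node("A", b);            // old bucket: A -> B -> null

        // Thread 1 enters the relinking loop, reads e and next, then is paused.
        Node e1 = a;
        Node next1 = e1.next;                 // B

        // Thread 2 relinks the whole bucket. Head insertion reverses it: B -> A -> null.
        Node head2 = null;
        for (Node e = a; e != null; ) {
            Node next = e.next;
            e.next = head2;
            head2 = e;
            e = next;
        }

        // Thread 1 resumes with its stale e1/next1 and runs the same loop.
        Node head1 = null;
        while (e1 != null) {
            e1.next = head1;                  // head insertion into thread 1's new bucket
            head1 = e1;
            e1 = next1;
            next1 = (e1 == null) ? null : e1.next;
        }
        return head1;                         // A, with A.next == B and B.next == A
    }

    /** Floyd's tortoise-and-hare check, so the demo itself never loops forever. */
    static boolean hasCycle(Node head) {
        Node slow = head;
        Node fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("cycle formed: " + hasCycle(replay()));
    }
}
```

    Once a bucket contains a cycle, any traversal of it that does not find a matching key, such as the for loop in put shown earlier, never reaches a null next pointer.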

    Resolution

    The HashMap in question belongs to Thrift. Here is the original (pre-fix) code:

    //
    // Source code recreated from a .class file by IntelliJ IDEA
    // (powered by Fernflower decompiler)
    //
    
    package org.apache.thrift.meta_data;
    
    import java.io.Serializable;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.thrift.TBase;
    import org.apache.thrift.TFieldIdEnum;
    
    public class FieldMetaData implements Serializable {
        public final String fieldName;
        public final byte requirementType;
        public final FieldValueMetaData valueMetaData;
        private static Map<Class<? extends TBase>, Map<? extends TFieldIdEnum, FieldMetaData>> structMap = new HashMap();
    
        public FieldMetaData(String name, byte req, FieldValueMetaData vMetaData) {
            this.fieldName = name;
            this.requirementType = req;
            this.valueMetaData = vMetaData;
        }
    
        public static void addStructMetaDataMap(Class<? extends TBase> sClass, Map<? extends TFieldIdEnum, FieldMetaData> map) {
            structMap.put(sClass, map);
        }
    
        public static Map<? extends TFieldIdEnum, FieldMetaData> getStructMetaDataMap(Class<? extends TBase> sClass) {
            if(!structMap.containsKey(sClass)) {
                try {
                    sClass.newInstance();
                } catch (InstantiationException var2) {
                    throw new RuntimeException("InstantiationException for TBase class: " + sClass.getName() + ", message: " + var2.getMessage());
                } catch (IllegalAccessException var3) {
                    throw new RuntimeException("IllegalAccessException for TBase class: " + sClass.getName() + ", message: " + var3.getMessage());
                }
            }
    
            return (Map)structMap.get(sClass);
        }
    }
    
    

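    Given the cause, there are two ways to fix the problem: declare structMap as a thread-safe ConcurrentHashMap, or synchronize the code that accesses structMap, that is, add the synchronized keyword to the methods (or code blocks) that operate on it.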
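
    The two options can be sketched side by side. This is a minimal illustration only: SafeRegistryDemo is a hypothetical stand-in with String keys and values, not Thrift's FieldMetaData, and the fill helper exists only to exercise both maps from several threads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a shared metadata registry; not Thrift's actual class.
class SafeRegistryDemo {
    // Option 1: a concurrent map, safe for put/get from many threads without extra locking.
    private final Map<String, String> concurrent = new ConcurrentHashMap<String, String>();

    // Option 2: a plain HashMap where every access goes through a synchronized method.
    private final Map<String, String> guarded = new HashMap<String, String>();

    void register(String key, String value) {
        concurrent.put(key, value);
    }

    String lookup(String key) {
        return concurrent.get(key);
    }

    synchronized void registerGuarded(String key, String value) {
        guarded.put(key, value);
    }

    synchronized String lookupGuarded(String key) {
        return guarded.get(key);
    }

    synchronized int guardedSize() {
        return guarded.size();
    }

    /** Registers threads * perThread distinct keys concurrently; returns {concurrent size, guarded size}. */
    int[] fill(final int threads, final int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        String key = id + "-" + i;
                        register(key, "meta");
                        registerGuarded(key, "meta");
                    }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return new int[] { concurrent.size(), guardedSize() };
    }

    public static void main(String[] args) {
        int[] sizes = new SafeRegistryDemo().fill(8, 1000);
        System.out.println(sizes[0] + " " + sizes[1]);
    }
}
```

    With either option, every write is serialized with respect to the map's internal structure, so no thread can observe or create a half-relinked bucket. The synchronized variant must guard reads as well as writes; a reader that skips the lock can still walk a bucket while it is being restructured.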
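    Excited, I was about to open a PR against Thrift, but then found the following code:

    (Figure: the fix in Thrift's GitHub repository)

    Thrift has already fixed the problem, using the synchronized approach. Upgrading to version 0.9.3 or later avoids the problem.

    That PR resolves the issue THRIFT-1618. To check whether it matches our problem, look up the issue:

    (Figure: THRIFT-1618)

    The issue's status is CLOSED: it has been resolved, and its description matches our situation.

    Conclusion

    To summarize the analysis above: concurrent access to a HashMap from multiple threads caused the Java 7 HashMap to form a cycle in a bucket's linked list during resizing, which sent a thread into an infinite loop. The object lock held by the looping thread was never released, so every other thread requesting that lock was BLOCKED. Upgrading Thrift to 0.9.3 or later resolves the problem.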

    Article link: https://www.haomeiwen.com/subject/lbscmktx.html