美文网首页架构师之路
记一次线上JVM长时间STW之分析 (续)

记一次线上JVM长时间STW之分析 (续)

作者: 华安火车迷 | 来源:发表于2017-11-09 18:30 被阅读249次

    在上一篇文章里,可以把重点总结成两句话:并不是GC才会STW;和STW 前要挂起所有的线程是分结果阶段的。而关于我们遇到的问题(线上的JVM发现偶尔STW时线程进入SafePoint后把自己挂起的block阶段会很长,最长有1.5s)目前通过我们把年轻代的空间调大后,症状减轻了,变成了好几天才会有一次这个问题出现,背后的逻辑应该是把YGC的时间大大延长后,减少了挂起所有线程的频率(以前是1秒都好几次,现在是6~8s 一次)所致。

    这篇文章主要想走读一下SafePoint这段进入STW 阶段的C++代码,来看看VMThread 如何挂起这些Java Thread的,最后再次作出一些可能的猜测。

    SafePoint.cpp

    void SafepointSynchronize::begin() {
    
      Thread* myThread = Thread::current();
      assert(myThread->is_VM_thread(), "Only VM thread may execute a safepoint");
    
      if (PrintSafepointStatistics || PrintSafepointStatisticsTimeout > 0) {
        _safepoint_begin_time = os::javaTimeNanos();
        _ts_of_current_safepoint = tty->time_stamp().seconds();
      }
    
    #if INCLUDE_ALL_GCS
      if (UseConcMarkSweepGC) {
        // In the future we should investigate whether CMS can use the
        // more-general mechanism below.  DLD (01/05).
        ConcurrentMarkSweepThread::synchronize(false);
      } else if (UseG1GC) {
        SuspendibleThreadSet::synchronize();
      }
    #endif // INCLUDE_ALL_GCS
    
      // By getting the Threads_lock, we assure that no threads are about to start or
      // exit. It is released again in SafepointSynchronize::end().
      Threads_lock->lock();
    
      assert( _state == _not_synchronized, "trying to safepoint synchronize with wrong state");
    
      int nof_threads = Threads::number_of_threads();
    
      if (TraceSafepoint) {
        tty->print_cr("Safepoint synchronization initiated. (%d)", nof_threads);
      }
    
      RuntimeService::record_safepoint_begin();
    
      MutexLocker mu(Safepoint_lock);
    
      // Reset the count of active JNI critical threads
      _current_jni_active_count = 0;
    
      // Set number of threads to wait for, before we initiate the callbacks
      _waiting_to_block = nof_threads;
      TryingToBlock     = 0 ;
      int still_running = nof_threads;
    
      // Save the starting time, so that it can be compared to see if this has taken
      // too long to complete.
      jlong safepoint_limit_time = 0;
      timeout_error_printed = false;
    

    这里介绍一下SafepointSynchronize::begin() 这个方法是VM Thread 调用的,在比如GC,JIT Compile, Loading Jar 这种需要STW的时候,这个VM Thread就需要调用这个方法来STW,然后等所有的Java Thread (除了GC Thread)都block住之后,就会做它本来想做的事情(vm operation) 最后唤醒所有线程。
    在这个方法的开始部分,除了一些加锁操作外,留意一下这三个变量的初始化:

      _waiting_to_block = nof_threads;
      TryingToBlock     = 0 ;
      int still_running = nof_threads;
    

    我们再回顾一下遇到问题时的截图


    对照这三个变量就是, nof_threads 是485 总数。

    接下来继续往下看,接着是一段注释

      // Begin the process of bringing the system to a safepoint.
      // Java threads can be in several different states and are
      // stopped by different mechanisms:
      //
      //  1. Running interpreted
      //     The interpeter dispatch table is changed to force it to
      //     check for a safepoint condition between bytecodes.
      //  2. Running in native code
      //     When returning from the native code, a Java thread must check
      //     the safepoint _state to see if we must block.  If the
      //     VM thread sees a Java thread in native, it does
      //     not wait for this thread to block.  The order of the memory
      //     writes and reads of both the safepoint state and the Java
      //     threads state is critical.  In order to guarantee that the
      //     memory writes are serialized with respect to each other,
      //     the VM thread issues a memory barrier instruction
      //     (on MP systems).  In order to avoid the overhead of issuing
      //     a memory barrier for each Java thread making native calls, each Java
      //     thread performs a write to a single memory page after changing
      //     the thread state.  The VM thread performs a sequence of
      //     mprotect OS calls which forces all previous writes from all
      //     Java threads to be serialized.  This is done in the
      //     os::serialize_thread_states() call.  This has proven to be
      //     much more efficient than executing a membar instruction
      //     on every call to native code.
      //  3. Running compiled Code
      //     Compiled code reads a global (Safepoint Polling) page that
      //     is set to fault if we are trying to get to a safepoint.
      //  4. Blocked
      //     A thread which is blocked will not be allowed to return from the
      //     block condition until the safepoint operation is complete.
      //  5. In VM or Transitioning between states
      //     If a Java thread is currently running in the VM or transitioning
      //     between states, the safepointing code will wait for the thread to
      //     block itself when it attempts transitions to a new state.
    

    在前一篇文章已经介绍过了,其实VM Thread去挂起正在做不同事情的Java Thread 是需要做不同的事情的,到目前为止我记录到的现象是,这个block很长的,Initial Running 都是0,waiting to block 都是1或者几。
    那么我们继续读下面的代码,看是如何去到 485 0 1 的

      // Iterate through all threads until it have been determined how to stop them all at a safepoint
      unsigned int iterations = 0;
      int steps = 0 ;
      while(still_running > 0) {
        for (JavaThread *cur = Threads::first(); cur != NULL; cur = cur->next()) {
          assert(!cur->is_ConcurrentGC_thread(), "A concurrent GC thread is unexpectly being suspended");
          ThreadSafepointState *cur_state = cur->safepoint_state();
          if (cur_state->is_running()) {
            cur_state->examine_state_of_thread();
            if (!cur_state->is_running()) {
               still_running--;
               // consider adjusting steps downward:
               //   steps = 0
               //   steps -= NNN
               //   steps >>= 1
               //   steps = MIN(steps, 2000-100)
               //   if (iterations != 0) steps -= NNN
            }
            if (TraceSafepoint && Verbose) cur_state->print();
          }
        }
    
        if (PrintSafepointStatistics && iterations == 0) {
          begin_statistics(nof_threads, still_running);
        }
    
        if (still_running > 0) {
          // Check for if it takes to long
          if (SafepointTimeout && safepoint_limit_time < os::javaTimeNanos()) {
            print_safepoint_timeout(_spinning_timeout);
          }
    

    这里面抓哟的逻辑就是:遍历一把所有的线程,然后做一次cur_state->examine_state_of_thread();如果判断不是running了则可以减一,再看看这个方法cur_state->examine_state_of_thread()里面主逻辑是:

    void ThreadSafepointState::examine_state_of_thread() {
     
      JavaThreadState state = _thread->thread_state();
    
      // Save the state at the start of safepoint processing.
      _orig_thread_state = state;
    
      bool is_suspended = _thread->is_ext_suspended();
      if (is_suspended) {
        roll_forward(_at_safepoint);
        return;
      }
    
      // Some JavaThread states have an initial safepoint state of
      // running, but are actually at a safepoint. We will happily
      // agree and update the safepoint state here.
      if (SafepointSynchronize::safepoint_safe(_thread, state)) {
        SafepointSynchronize::check_for_lazy_critical_native(_thread, state);
        roll_forward(_at_safepoint);
        return;
      }
    
      if (state == _thread_in_vm) {
        roll_forward(_call_back);
        return;
      }
    

    这里VM Thread会check一下每个Java Thread,如果它本身就是挂起的(我的机器只有24个核,就算它用满其实也有400多个被挂起了),那么直接标记它进入了SafePoint,我们等一下再去看那个roll_forward()方法, 现在先看第二种判断

    bool SafepointSynchronize::safepoint_safe(JavaThread *thread, JavaThreadState state) {
      switch(state) {
      case _thread_in_native:
        // native threads are safe if they have no java stack or have walkable stack
        return !thread->has_last_Java_frame() || thread->frame_anchor()->walkable();
    
       // blocked threads should have already have walkable stack
      case _thread_blocked:
        assert(!thread->has_last_Java_frame() || thread->frame_anchor()->walkable(), "blocked and not walkable");
        return true;
    
      default:
        return false;
      }
    }
    

    这里我并不了解这个has_last_Java_frame()frame_anchor()->walkable()我先暂且觉得只要进入了 in_native状态的Java Thread 全都进入了SafePoint。那再看看roll_forward的定义:

    void ThreadSafepointState::roll_forward(suspend_type type) {
      _type = type;
    
      switch(_type) {
        case _at_safepoint:
          SafepointSynchronize::signal_thread_at_safepoint();
          DEBUG_ONLY(_thread->set_visited_for_critical_count(true));
          if (_thread->in_critical()) {
            // Notice that this thread is in a critical section
            SafepointSynchronize::increment_jni_active_count();
          }
          break;
    
        case _call_back:
          set_has_called_back(false);
          break;
    
        case _running:
        default:
          ShouldNotReachHere();
      }
    }
    

    roll_forward()里,第一行就很重要:把该线程的_type直接从running 置成 type (比如_at_safepoint),这个会直接影响到最上面examine_state_of_thread()后的判断;
    接着,比较醒目的是这句SafepointSynchronize::signal_thread_at_safepoint(); 这句代码只有一行

    static void   signal_thread_at_safepoint()              { _waiting_to_block--; }
    

    那从examine_state_of_thread()一直往下的所有代码,它大致意思就是,把那些挂起的,in_native 的,这些线程都标记为已经进入SafePoint了,并且把这个统计变量做_waiting_to_block-- 操作,而如果Java Thread扔还在跑Java Code时,则只做一个_call_back 状态标记。

    好了,所有的这些操作都将会影响到examine_state_of_thread()接着的判断,我们再回到最初examine_state_of_thread()

    while(still_running > 0) {
        for (JavaThread *cur = Threads::first(); cur != NULL; cur = cur->next()) {
          assert(!cur->is_ConcurrentGC_thread(), "A concurrent GC thread is unexpectly being suspended");
          ThreadSafepointState *cur_state = cur->safepoint_state();
          if (cur_state->is_running()) {
            cur_state->examine_state_of_thread();
            if (!cur_state->is_running()) {
               still_running--;
               // consider adjusting steps downward:
               //   steps = 0
               //   steps -= NNN
               //   steps >>= 1
               //   steps = MIN(steps, 2000-100)
               //   if (iterations != 0) steps -= NNN
            }
            if (TraceSafepoint && Verbose) cur_state->print();
          }
        }
    

    大家还记得我们的场景是 485 0 1 吗, 也就是说,过完这个while循环, 我们的still_running要减到0, 也就是说,所有的Java Thread 当初都是 is_running,然后都不是is_running了,也就是说,有484 个线程它要么被挂起了,要么是执行native 要么是已经在SafePoint 了。只有1个线程,它要么在in_java_code, 要么 thread_in_vm, 而,因为它
    必须不能是running,所以它必须只能是 thread_in_vm.

    接着,一大段的 if(is_still_running > ) 都直接pass掉,直到:

      if (PrintSafepointStatistics) {
        update_statistics_on_spin_end();
      }
    
      // wait until all threads are stopped
      while (_waiting_to_block > 0) {
        if (TraceSafepoint) tty->print_cr("Waiting for %d thread(s) to block", _waiting_to_block);
        if (!SafepointTimeout || timeout_error_printed) {
          Safepoint_lock->wait(true);  // true, means with no safepoint checks
        } else {
          // Compute remaining time
          jlong remaining_time = safepoint_limit_time - os::javaTimeNanos();
    
          // If there is no remaining time, then there is an error
          if (remaining_time < 0 || Safepoint_lock->wait(true, remaining_time / MICROUNITS)) {
            print_safepoint_timeout(_blocking_timeout);
          }
        }
      }
    

    这里就是上一篇文章那个地方了,也就是说,VM Thread 把所有的Java Thread标记完了,如果还有线程还处于_waiting_to_block的话,它只能把自己先挂起,等待这些线程都把自己block住,再通知自己。
    我们确实还有一个这样的线程,它之前的状态是 thread_in_vm, 为什么说之前的状态呢,还记得吗?在roll_foward()里它已近置成call_back了。

    好,现在我们来关心这个活动的Java Thread的去向,和它究竟何时去唤醒我们的VM Thread。大家都知道,VM Thread需要STW时,它往所有的Java Thread设置阑珊,所有的线程在SafePoint里把自己block的手段就放在这个make_polling_page_unreadable里(在SafePoint.cpp里找到)

    // Mark the polling page as unreadable
    void os::make_polling_page_unreadable(void) {
      if( !guard_memory((char*)_polling_page, Linux::page_size()) )
        fatal("Could not disable polling page");
    };
    ...
    bool os::guard_memory(char* addr, size_t size) {
      return linux_mprotect(addr, size, PROT_NONE);
    }
    

    而实现在 os_linux_86.cpp

    extern "C" JNIEXPORT int
    JVM_handle_linux_signal(int sig,
                            siginfo_t* info,
                            void* ucVoid,
                            int abort_if_unrecognized) {
      ucontext_t* uc = (ucontext_t*) ucVoid;
      ...
      pc = (address) os::Linux::ucontext_get_pc(uc);
      
      ...
      
      if (thread->thread_state() == _thread_in_Java) {
        // Java thread running in Java code => find exception handler if any
        // a fault inside compiled code, the interpreter, or a stub
    
        if (sig == SIGSEGV && os::is_poll_address((address)info->si_addr)) {
          stub = SharedRuntime::get_poll_stub(pc);
        }
      ..
    

    这里我们不关心具体的OS调用了,我们唯一好奇的如果往后的代码都不会长时间阻塞一个线程的话,那么阻塞很可能就在这里了。
    那我们继续往后看
    这个线程当然最终会读这个page不成功,触发 pageFail,然后会回到VM里回调方法SafepointSynchronize::handle_polling_page_exception这里有个细节让我注意到了

    // ------------------------------------------------------------------------------------------------------
    // Exception handlers
    
    
    void SafepointSynchronize::handle_polling_page_exception(JavaThread *thread) {
      assert(thread->is_Java_thread(), "polling reference encountered by VM thread");
      assert(thread->thread_state() == _thread_in_Java, "should come from Java code");
      assert(SafepointSynchronize::is_synchronizing(), "polling encountered outside safepoint synchronization");
    
      if (ShowSafepointMsgs) {
        tty->print("handle_polling_page_exception: ");
      }
    
      if (PrintSafepointStatistics) {
        inc_page_trap_count();
      }
    
      ThreadSafepointState* state = thread->safepoint_state();
    
      state->handle_polling_page_exception();
    }
    

    大家看到吗,在这里它会累加一下那个 nof_page_trap, 但是从我们的问题截图上看,这个数字是0,所以这点我还不知道怎么去解释,所以只能解释的是,我们看到的问题截图那个message,是VMThread 仍处于阻塞的时候打印的。


    最后会调用ThreadSafepointState::handle_polling_page_exception()在这个方法里,主要是处理一些线程返回指针和地址的一些问题,到这里我几乎都看不懂了。唯一认识的就是它会去调SafepointSynchronize::block(thread());

    // Implementation of Safepoint callback point
    
    void SafepointSynchronize::block(JavaThread *thread) {
       
       ... //此处省略部分代码
       
    
      // Check that we have a valid thread_state at this point
      switch(state) {
        case _thread_in_vm_trans:
        case _thread_in_Java:        // From compiled g
    
          // We are highly likely to block on the Safepoint_lock. In order to avoid blocking in this case,
          // we pretend we are still in the VM.
          thread->set_thread_state(_thread_in_vm);
    
          if (is_synchronizing()) {
             Atomic::inc (&TryingToBlock) ;
          }
    
          // We will always be holding the Safepoint_lock when we are examine the state
          // of a thread. Hence, the instructions between the Safepoint_lock->lock() and
          // Safepoint_lock->unlock() are happening atomic with regards to the safepoint code
          Safepoint_lock->lock_without_safepoint_check();
          if (is_synchronizing()) {
            // Decrement the number of threads to wait for and signal vm thread
            assert(_waiting_to_block > 0, "sanity check");
            _waiting_to_block--;
            thread->safepoint_state()->set_has_called_back(true);
    
            DEBUG_ONLY(thread->set_visited_for_critical_count(true));
            if (thread->in_critical()) {
              // Notice that this thread is in a critical section
              increment_jni_active_count();
            }
    
            // Consider (_waiting_to_block < 2) to pipeline the wakeup of the VM thread
            if (_waiting_to_block == 0) {
              Safepoint_lock->notify_all();
            }
          }
    
          // We transition the thread to state _thread_blocked here, but
          // we can't do our usual check for external suspension and then
          // self-suspend after the lock_without_safepoint_check() call
          // below because we are often called during transitions while
          // we hold different locks. That would leave us suspended while
          // holding a resource which results in deadlocks.
          thread->set_thread_state(_thread_blocked);
          Safepoint_lock->unlock();
    
          // We now try to acquire the threads lock. Since this lock is hold by the VM thread during
          // the entire safepoint, the threads will all line up here during the safepoint.
          Threads_lock->lock_without_safepoint_check();
          // restore original state. This is important if the thread comes from compiled code, so it
          // will continue to execute with the _thread_in_Java state.
          thread->set_thread_state(state);
          Threads_lock->unlock();
          break;
    

    我这里假设这个 thread_in_vm 的线程在进入block之前吧自己状态切换成thread_in_vm_trans,在block里可以看到,它和 thread_in_java的逻辑是一样的。在这里,会重新把自己的状态置为 thread_in_vm(还记得吗,之前置成了 call_back),接着需要在Safepoint_lock里把自己的锁取到(我找了很多代码,这个地方几乎是用久成立的),接着set_has_called_back(true)表示已经调用过了,这时就可以把那个_waiting_to_block_waiting_to_block--

    到最最最最后,如果(waiting_to_block ==0) 则去唤醒VM Thread,后面的事情我们就不关心了。

    后话

    好了,分析到这里, VM Thread怎么走和其他Java Thread 怎么跑我们大致都看了一遍了,我们仍然找不到问题的答案是因为,我们还是猜测不到在 VM Thread 把自己挂起,到Java Thread 走进 block() 被什么阻塞了。

    我猜想出可能是这样一种场景:我假设当时在Running的部分Java Thread其实都在做native 调用,并且他们都在做大量的IO操作,这个可能性是比较大的。由于他们都是在native调用,所以VM Thread是会认为它们都是安全的。而有一个线程在 Thread in vm状态,那么碰巧就是它在做unreadable page load时就由于OS在做IO所以这个page被swap out 了,等待swap in 等待了这么久。
    不过这种猜测很多疑点,比如这个Page很少可能性被swap out 了

    我厂的ivy 同学还有另外一种假设,就是线程是在做thread 的tran,应该是OS的操作把它阻塞了

    而我厂的姐夫同学就给出一个很有价值的信息,我们线上的服务器的SSD是做了raid 0 的,而实验环境没有, raid 0 在做高负荷IO时,是有可能会block住。

    所有的疑问也只能等到有机会时来去验证了。

    待续。。。

    参考资料

    https://github.com/JetBrains/jdk8u_hotspot/blob/master/src/share/vm/runtime/
    http://blog.csdn.net/iter_zc/article/details/41843595
    http://jcdav.is/2015/11/09/More-JVM-Signal-Tricks/

    相关文章

      网友评论

      • Sam同学:好长时间没写新文章了。赶紧的呀。

      本文标题:记一次线上JVM长时间STW之分析 (续)

      本文链接:https://www.haomeiwen.com/subject/qbfpmxtx.html