UDP Server Performance Optimization: Comparing Perf and GCP

Author: winlinvip | Published: 2021-03-02 21:44

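An RTC server runs over UDP, which brings several difficulties:

  1. There are many UDP packets, and they are generally small. For example, a single video keyframe may be split into dozens of UDP packets, and each Opus packet ranges from a few dozen to a little over a hundred bytes.
  2. Different protocols have to share one port (this is required to support cloud-native platforms such as K8S), so every packet must be matched to the Session that handles it, and the client address may change.
  3. Real-time requirements are strict: each Session must send and receive data immediately and cannot deliberately aggregate packets first. Within a short interval each Session handles only one or two packets, so there is little to batch.
  4. The kernel's performance optimizations for UDP are weaker than those for TCP, and there are fewer tuning options.
  5. Encryption and decryption are required, which costs CPU and also causes memory copies.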
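
Point 2 can be made concrete with a small sketch. Assuming the shared port carries STUN, DTLS and SRTP/SRTCP, the first byte of each datagram is enough to tell them apart (the ranges below follow RFC 7983), and the session index has to tolerate a peer whose address changes. The names `classify` and `SessionTable`, and the idea of falling back to a stable key such as a STUN username, are hypothetical and only for illustration; this is not the SRS implementation.

```python
from typing import Dict, Optional, Tuple

Addr = Tuple[str, int]


def classify(datagram: bytes) -> str:
    """Classify a datagram on a multiplexed UDP port by its first byte (RFC 7983 ranges)."""
    if not datagram:
        return "unknown"
    b0 = datagram[0]
    if b0 <= 3:
        return "stun"
    if 20 <= b0 <= 63:
        return "dtls"
    if 128 <= b0 <= 191:
        # RTP and RTCP share this range. A common heuristic: RTCP packet types
        # put the second byte in 192..223 when rtcp-mux is used (RFC 5761).
        if len(datagram) > 1 and 192 <= datagram[1] <= 223:
            return "rtcp"
        return "rtp"
    return "unknown"


class SessionTable:
    """Hypothetical session index: fast path by peer address, slow path by a stable key."""

    def __init__(self) -> None:
        self._by_addr: Dict[Addr, str] = {}
        self._by_key: Dict[str, str] = {}

    def add(self, key: str, session_id: str) -> None:
        self._by_key[key] = session_id

    def find(self, addr: Addr, key: Optional[str] = None) -> Optional[str]:
        session_id = self._by_addr.get(addr)
        if session_id is None and key is not None:
            # Unknown or changed peer address: fall back to the stable key
            # and remember the new address for the fast path.
            session_id = self._by_key.get(key)
            if session_id is not None:
                self._by_addr[addr] = session_id
        return session_id
```

The per-packet cost here is one dictionary lookup on the fast path, which matters because this lookup runs for every received packet.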

Even so, there is still quite a lot that can be done. See the link below for details:

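During optimization, the most important tools are the load-testing tool srs-bench, together with Perf and GCP (the gperftools CPU profiler).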

The numbers reported by Perf and GCP differ somewhat. For example, at around 67% CPU usage:

top - 14:58:57 up 25 days,  1:58,  4 users,  load average: 0.66, 0.76, 0.73
Tasks:  92 total,   2 running,  90 sleeping,   0 stopped,   0 zombie
%Cpu(s): 30.1 us,  5.1 sy,  0.0 ni, 61.8 id,  0.0 wa,  0.0 hi,  3.1 si,  0.0 st
KiB Mem :  8008964 total,   460028 free,  1390824 used,  6158112 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  6311680 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 8375 root       0 -20 1120556 992436   4192 R  68.1 12.4  24:14.17 srs
 8462 root      20   0  312104  36364   3800 S   1.0  0.5   0:25.25 perf
 6745 root      20   0  150332   6664   2380 S   0.7  0.1   0:15.11 dstat
    6 root      20   0       0      0      0 S   0.3  0.0  49:03.07 ksoftirqd/0

SRS statistics:

Hybrid cpu=70.00%,969MB, cid=47984,8, timer=24421,4394,19973, clock=0,45,4,0,0,0,0,0,0, 
objs=(pkt:0,raw:0,fua:0,msg:0,oth:401,buf:0,drop:0), 
cache=(pkt:20-31w,raw:109113-69w,fua:32227-41w,msg:1-41w,buf:19-34w)

RTC: Server conns=401, rpkts=(47734,rtp:47726,stun:1,rtcp:7), 
spkts=(1710,rtp:117,stun:1,rtcp:1592), rtcp=(pli:0,twcc:3982,rr:398), 
snk=(39826,a:19913,v:19913,h:0), rnk=(2,2,h:2,m:0), 
fid=(id:0,fid:5272,ffid:42461,addr:1,faddr:47734)

For comparison, the top 37 functions reported by Perf, 60.34% in total:

Overhead  Shared Object       Symbol
  10.13%  srs.4.0.77          [.] sha1_block_data_order_avx2
   4.37%  srs.4.0.77          [.] bitvector_left_shift
   2.96%  libpthread-2.17.so  [.] __recvfrom_nocancel
   2.51%  libc-2.17.so        [.] __memcpy_ssse3
   2.51%  srs.4.0.77          [.] heap_delete
   2.49%  srs.4.0.77          [.] SrsHourGlass::cycle
   2.39%  srs.4.0.77          [.] SrsRtpPacket2::decode
   2.19%  srs.4.0.77          [.] SrsRtpObjectCacheManager<SrsRtpPacket2>::recycle
   2.16%  srs.4.0.77          [.] SrsRtpPacket2::recycle_shared_buffer
   1.79%  [kernel]            [k] finish_task_switch
   1.71%  srs.4.0.77          [.] SrsRtcPublishStream::on_rtp
   1.56%  [kernel]            [k] system_call_after_swapgs
   1.56%  [kernel]            [k] free_hot_cold_page
   1.52%  srs.4.0.77          [.] srtp_get_stream
   1.47%  [kernel]            [k] copy_user_enhanced_fast_string
   1.39%  srs.4.0.77          [.] aesni_ctr32_encrypt_blocks
   1.33%  srs.4.0.77          [.] operator delete[]
   1.32%  [kernel]            [k] _raw_spin_unlock_irqrestore
   1.19%  srs.4.0.77          [.] SrsRtcRecvTrack::do_check_send_nacks
   0.99%  srs.4.0.77          [.] OPENSSL_cleanse
   0.94%  srs.4.0.77          [.] SrsRtpRingBuffer::set
   0.93%  srs.4.0.77          [.] std::less<unsigned int>::operator()
   0.89%  srs.4.0.77          [.] srtp_unprotect
   0.88%  srs.4.0.77          [.] heap_insert
   0.85%  srs.4.0.77          [.] SrsRtcPublishStream::check_send_nacks
   0.85%  srs.4.0.77          [.] SrsRtpNackForReceiver::get_nack_seqs
   0.83%  srs.4.0.77          [.] SrsRtcPublishStream::get_audio_track
   0.81%  srs.4.0.77          [.] SrsRtcTrackDescription::has_ssrc
   0.72%  srs.4.0.77          [.] SrsResourceManager::find_by_fast_id
   0.69%  srs.4.0.77          [.] SrsSharedPtrMessage::count
   0.68%  srs.4.0.77          [.] EVP_MD_CTX_cleanup
   0.67%  srs.4.0.77          [.] SrsRtcPublishStream::do_on_rtp_plaintext
   0.64%  srs.4.0.77          [.] SrsBuffer::require
   0.63%  libc-2.17.so        [.] epoll_ctl
   0.61%  [kernel]            [k] udp_recvmsg
   0.60%  srs.4.0.77          [.] operator new[]
   0.58%  srs.4.0.77          [.] SrsUdpMuxListener::cycle
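
To see where the sampled time goes by shared object, the listing can be parsed and summed. The following is a minimal sketch that assumes the plain-text layout shown above (overhead, shared object, `[.]` or `[k]`, symbol); `parse_perf_report` and `overhead_by_object` are hypothetical helper names, and `SAMPLE` holds only a few of the rows.

```python
import re
from collections import defaultdict

# overhead%, shared object, [.] (user) or [k] (kernel), symbol
PERF_LINE = re.compile(r"^\s*([\d.]+)%\s+(\S+)\s+\[([.k])\]\s+(.+?)\s*$")


def parse_perf_report(text):
    """Return a list of (overhead_percent, shared_object, is_kernel, symbol)."""
    rows = []
    for line in text.splitlines():
        m = PERF_LINE.match(line)
        if m:  # the header line does not match and is skipped
            rows.append((float(m.group(1)), m.group(2), m.group(3) == "k", m.group(4)))
    return rows


def overhead_by_object(rows):
    """Sum the overhead percentage per shared object."""
    totals = defaultdict(float)
    for overhead, shared_object, _is_kernel, _symbol in rows:
        totals[shared_object] += overhead
    return dict(totals)


SAMPLE = """\
Overhead  Shared Object       Symbol
  10.13%  srs.4.0.77          [.] sha1_block_data_order_avx2
   2.96%  libpthread-2.17.so  [.] __recvfrom_nocancel
   1.79%  [kernel]            [k] finish_task_switch
   1.33%  srs.4.0.77          [.] operator delete[]
"""

rows = parse_perf_report(SAMPLE)
totals = overhead_by_object(rows)
```

In the full listing above, the six `[k]` rows add up to 8.31%. Perf reports this kernel time under kernel symbols of its own, which is worth keeping in mind when comparing against a profiler that only shows user-space functions.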

And the top 37 functions reported by GCP, 69.9% in total (the cumulative column of the last row):

[root@iZbp12af7ajnkuducj2u8rZ ~]# ./objs/pprof objs/srs gperf.srs.gcp 
(pprof) top37
Total: 17795 samples
    2397  13.5%  13.5%     2397  13.5% __recvfrom_nocancel
    1894  10.6%  24.1%     1894  10.6% sha1_block_data_order_avx2
     746   4.2%  28.3%      746   4.2% bitvector_left_shift
     501   2.8%  31.1%      511   2.9% heap_delete
     485   2.7%  33.8%     2315  13.0% SrsHourGlass::cycle
     440   2.5%  36.3%      440   2.5% __GI_epoll_wait
     429   2.4%  38.7%     1136   6.4% SrsRtpObjectCacheManager::recycle
     424   2.4%  41.1%      424   2.4% __memcpy_ssse3
     417   2.3%  43.5%      516   2.9% SrsRtpPacket2::recycle_shared_buffer
     373   2.1%  45.6%     1146   6.4% SrsRtpPacket2::decode
     321   1.8%  47.4%      321   1.8% __GI_epoll_ctl
     287   1.6%  49.0%     4914  27.6% SrsRtcPublishStream::on_rtp
     270   1.5%  50.5%      270   1.5% aesni_ctr32_encrypt_blocks
     245   1.4%  51.9%      698   3.9% SrsRtcRecvTrack::do_check_send_nacks
     218   1.2%  53.1%      218   1.2% srtp_get_stream
     200   1.1%  54.2%     1338   7.5% SrsRtpRingBuffer::set
     199   1.1%  55.3%      199   1.1% std::less::operator
     185   1.0%  56.4%      923   5.2% SrsRtcPublishStream::check_send_nacks
     180   1.0%  57.4%      180   1.0% heap_insert
     179   1.0%  58.4%      206   1.2% SrsRtpNackForReceiver::get_nack_seqs
     175   1.0%  59.4%      175   1.0% __sendto_nocancel
     150   0.8%  60.2%      237   1.3% SrsResourceManager::find_by_fast_id
     149   0.8%  61.1%      149   0.8% OPENSSL_cleanse
     143   0.8%  61.9%      143   0.8% srtp_unprotect
     141   0.8%  62.6%      141   0.8% std::vector::size
     130   0.7%  63.4%      130   0.7% EVP_MD_CTX_cleanup
     127   0.7%  64.1%      264   1.5% SrsRtcPublishStream::get_audio_track
     118   0.7%  64.8%      118   0.7% SrsFastCoroutine::pull
     118   0.7%  65.4%      118   0.7% SrsRtcTrackDescription::has_ssrc
     114   0.6%  66.1%      114   0.6% SrsBuffer::require
     113   0.6%  66.7%     3272  18.4% SrsRtcPublishStream::do_on_rtp_plaintext
     110   0.6%  67.3%      377   2.1% SrsRtpObjectCacheManager::allocate
     106   0.6%  67.9%     8985  50.5% SrsUdpMuxListener::cycle
      96   0.5%  68.4%      634   3.6% _st_vp_check_clock
      94   0.5%  69.0%     1151   6.5% SrsRtcConnection::notify
      84   0.5%  69.4%       84   0.5% PackedCache::KeyMatch (inline)
      84   0.5%  69.9%       84   0.5% std::_Rb_tree::_M_begin
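
The pprof text columns are, from left to right: flat samples, flat percentage, running sum of flat percentages, cumulative samples (including callees), cumulative percentage, and the function name. A minimal sketch of a parser for this layout follows; `parse_pprof_top` is a hypothetical helper, and `SAMPLE` contains only a few rows from the output above.

```python
import re

TOTAL_LINE = re.compile(r"^Total:\s+(\d+)\s+samples")
# flat, flat%, sum%, cum, cum%, name
PPROF_LINE = re.compile(
    r"^\s*(\d+)\s+([\d.]+)%\s+([\d.]+)%\s+(\d+)\s+([\d.]+)%\s+(.+?)\s*$"
)


def parse_pprof_top(text):
    """Return (total_samples, rows) parsed from pprof's text `top` output."""
    total = None
    rows = []
    for line in text.splitlines():
        m_total = TOTAL_LINE.match(line)
        if m_total:
            total = int(m_total.group(1))
            continue
        m = PPROF_LINE.match(line)
        if m:
            rows.append({
                "flat": int(m.group(1)),   # samples in the function itself
                "cum": int(m.group(4)),    # samples including callees
                "name": m.group(6),
            })
    return total, rows


SAMPLE = """\
Total: 17795 samples
    2397  13.5%  13.5%     2397  13.5% __recvfrom_nocancel
    1894  10.6%  24.1%     1894  10.6% sha1_block_data_order_avx2
     106   0.6%  67.9%     8985  50.5% SrsUdpMuxListener::cycle
      84   0.5%  69.4%       84   0.5% PackedCache::KeyMatch (inline)
"""

total, rows = parse_pprof_top(SAMPLE)
flat_share = [round(100.0 * r["flat"] / total, 1) for r in rows]
```

The flat column is what compares most directly with Perf's per-symbol overhead. The cumulative column answers a different question: `SrsUdpMuxListener::cycle` is only 0.6% flat but 50.5% cumulative, because almost all of its time is spent in callees.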

The differences are shown in the table below:

Top    Perf  Perf symbol                                             GCP  GCP symbol
  1  10.13%  [.] sha1_block_data_order_avx2                        13.5%  __recvfrom_nocancel
  2   4.37%  [.] bitvector_left_shift                              10.6%  sha1_block_data_order_avx2
  3   2.96%  [.] __recvfrom_nocancel                                4.2%  bitvector_left_shift
  4   2.51%  [.] __memcpy_ssse3                                     2.8%  heap_delete
  5   2.51%  [.] heap_delete                                        2.7%  SrsHourGlass::cycle
  6   2.49%  [.] SrsHourGlass::cycle                                2.5%  __GI_epoll_wait
  7   2.39%  [.] SrsRtpPacket2::decode                              2.4%  SrsRtpObjectCacheManager::recycle
  8   2.19%  [.] SrsRtpObjectCacheManager<SrsRtpPacket2>::recycle   2.4%  __memcpy_ssse3
  9   2.16%  [.] SrsRtpPacket2::recycle_shared_buffer               2.3%  SrsRtpPacket2::recycle_shared_buffer
 10   1.79%  [k] finish_task_switch                                 2.1%  SrsRtpPacket2::decode
 11   1.71%  [.] SrsRtcPublishStream::on_rtp                        1.8%  __GI_epoll_ctl
 12   1.56%  [k] system_call_after_swapgs                           1.6%  SrsRtcPublishStream::on_rtp
 13   1.56%  [k] free_hot_cold_page                                 1.5%  aesni_ctr32_encrypt_blocks
 14   1.52%  [.] srtp_get_stream                                    1.4%  SrsRtcRecvTrack::do_check_send_nacks
 15   1.47%  [k] copy_user_enhanced_fast_string                     1.2%  srtp_get_stream
 16   1.39%  [.] aesni_ctr32_encrypt_blocks                         1.1%  SrsRtpRingBuffer::set
 17   1.33%  [.] operator delete[]                                  1.1%  std::less::operator
 18   1.32%  [k] _raw_spin_unlock_irqrestore                        1.0%  SrsRtcPublishStream::check_send_nacks
 19   1.19%  [.] SrsRtcRecvTrack::do_check_send_nacks               1.0%  heap_insert
 20   0.99%  [.] OPENSSL_cleanse                                    1.0%  SrsRtpNackForReceiver::get_nack_seqs
 21   0.94%  [.] SrsRtpRingBuffer::set                              1.0%  __sendto_nocancel
 22   0.93%  [.] std::less<unsigned int>::operator()                0.8%  SrsResourceManager::find_by_fast_id
 23   0.89%  [.] srtp_unprotect                                     0.8%  OPENSSL_cleanse
 24   0.88%  [.] heap_insert                                        0.8%  srtp_unprotect
 25   0.85%  [.] SrsRtcPublishStream::check_send_nacks              0.8%  std::vector::size
 26   0.85%  [.] SrsRtpNackForReceiver::get_nack_seqs               0.7%  EVP_MD_CTX_cleanup
 27   0.83%  [.] SrsRtcPublishStream::get_audio_track               0.7%  SrsRtcPublishStream::get_audio_track
 28   0.81%  [.] SrsRtcTrackDescription::has_ssrc                   0.7%  SrsFastCoroutine::pull
 29   0.72%  [.] SrsResourceManager::find_by_fast_id                0.7%  SrsRtcTrackDescription::has_ssrc
 30   0.69%  [.] SrsSharedPtrMessage::count                         0.6%  SrsBuffer::require
 31   0.68%  [.] EVP_MD_CTX_cleanup                                 0.6%  SrsRtcPublishStream::do_on_rtp_plaintext
 32   0.67%  [.] SrsRtcPublishStream::do_on_rtp_plaintext           0.6%  SrsRtpObjectCacheManager::allocate
 33   0.64%  [.] SrsBuffer::require                                 0.6%  SrsUdpMuxListener::cycle
 34   0.63%  [.] epoll_ctl                                          0.5%  _st_vp_check_clock
 35   0.61%  [k] udp_recvmsg                                        0.5%  SrsRtcConnection::notify
 36   0.60%  [.] operator new[]                                     0.5%  PackedCache::KeyMatch (inline)
 37   0.58%  [.] SrsUdpMuxListener::cycle                           0.5%  std::_Rb_tree::_M_begin
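
Because the two tools rank and name symbols differently, a side-by-side list by rank is hard to read. A minimal sketch of joining the two profiles by symbol name follows. The normalization rules in `normalize` (dropping the `__GI_` prefix, template arguments and a trailing `()`) are heuristics inferred from the names in this table, not an official mapping, and `PERF` and `GCP` contain only a handful of the rows.

```python
import re


def normalize(symbol):
    """Make perf and pprof symbol names comparable (heuristic)."""
    name = symbol.strip()
    name = re.sub(r"^__GI_", "", name)    # glibc alias prefix seen in the pprof names
    name = re.sub(r"<[^<>]*>", "", name)  # drop non-nested template arguments
    name = re.sub(r"\(\)$", "", name)     # "operator()" vs "operator"
    return name


def compare(perf, gcp):
    """perf, gcp: dicts of symbol -> percent. Returns (name, perf, gcp, delta) rows."""
    p = {normalize(k): v for k, v in perf.items()}
    g = {normalize(k): v for k, v in gcp.items()}
    rows = []
    for name in sorted(set(p) | set(g)):
        pv, gv = p.get(name), g.get(name)
        delta = None if pv is None or gv is None else round(gv - pv, 2)
        rows.append((name, pv, gv, delta))
    # Symbols present in both profiles first, largest absolute gap on top.
    rows.sort(key=lambda r: (r[3] is None, -abs(r[3] or 0.0)))
    return rows


PERF = {
    "sha1_block_data_order_avx2": 10.13,
    "__recvfrom_nocancel": 2.96,
    "SrsRtpObjectCacheManager<SrsRtpPacket2>::recycle": 2.19,
    "epoll_ctl": 0.63,
    "finish_task_switch": 1.79,
}
GCP = {
    "__recvfrom_nocancel": 13.5,
    "sha1_block_data_order_avx2": 10.6,
    "SrsRtpObjectCacheManager::recycle": 2.4,
    "__GI_epoll_ctl": 1.8,
    "__GI_epoll_wait": 2.5,
}

for name, pv, gv, delta in compare(PERF, GCP):
    print(f"{name:40s} perf={pv} gcp={gv} delta={delta}")
```

On the data in the table, the largest gap is `__recvfrom_nocancel` (2.96% in Perf, 13.5% in GCP). Kernel symbols such as `finish_task_switch` appear only in the Perf list, and `__GI_epoll_wait` appears only in the GCP top 37. One plausible reading, offered here as an assumption rather than a measured result, is that GCP charges time spent inside a system call to the user-space function that issued it, while Perf splits that time out under `[k]` symbols.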
