kspan: A Cluster Measurement Solution

Author: Goun | Published: 2021-06-29 16:44

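    Not original work; this write-up is adapted from other articles. Compared with them, the steps and explanations here stay closer to day-to-day operations.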

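    Background

    As cluster administrators responsible for many clusters, we often want to know which stages a Pod goes through between creation and startup and how long each stage takes, so that we can work out where a cluster is slow.

    Without a visualization tool, we can inspect events to estimate how long each step takes, as follows: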

    $ kubectl create deploy nginx --image=nginx
    deployment.apps/nginx created
    $ kubectl get event
    LAST SEEN   TYPE     REASON              OBJECT                       MESSAGE
    7s          Normal   Scheduled           pod/nginx-f89759699-whcxz    Successfully assigned default/nginx-f89759699-whcxz to hd-k8s-master003
    7s          Normal   Pulling             pod/nginx-f89759699-whcxz    Pulling image "nginx"
    7s          Normal   SuccessfulCreate    replicaset/nginx-f89759699   Created pod: nginx-f89759699-whcxz
    7s          Normal   ScalingReplicaSet   deployment/nginx             Scaled up replica set nginx-f89759699 to 1
    

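    From the events we can follow the Pod through scheduling, image pull, container creation, and start, and get a rough idea of how long each step takes.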
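    The LAST SEEN column only shows a coarse relative age, so the per-step cost is easier to read from the event timestamps themselves. The following is a minimal sketch of that arithmetic, not part of the original setup: the function names and the sample timestamps are hypothetical, and the events are assumed to have already been collected as (reason, timestamp) pairs.

```python
from datetime import datetime


def parse(ts):
    # Kubernetes serializes timestamps as RFC 3339, e.g. 2021-06-29T08:44:00Z
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def step_durations(events):
    """Return (reason, seconds elapsed since the previous event), in time order."""
    ordered = sorted(events, key=lambda e: e[1])
    result = []
    previous = None
    for reason, ts in ordered:
        delta = 0.0 if previous is None else (ts - previous).total_seconds()
        result.append((reason, delta))
        previous = ts
    return result


# Hypothetical sample data, not taken from a real cluster
SAMPLE_EVENTS = [
    ("Scheduled", parse("2021-06-29T08:44:00Z")),
    ("Pulling", parse("2021-06-29T08:44:01Z")),
    ("Pulled", parse("2021-06-29T08:44:09Z")),
    ("Created", parse("2021-06-29T08:44:10Z")),
    ("Started", parse("2021-06-29T08:44:10Z")),
]

if __name__ == "__main__":
    for reason, seconds in step_durations(SAMPLE_EVENTS):
        print(f"{reason:<10} +{seconds:.0f}s")
```

    On a real cluster, the input could come from `kubectl get events -o json`; sorting on the command line with `--sort-by=.metadata.creationTimestamp` also makes the plain listing easier to read.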

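    A more elegant approach

    Each of these Kubernetes events corresponds to an operation we performed. Above, for example, creating a deployment produced several events, including Scheduled, Pulled, and Created. If we abstract this a little, doesn't it look a lot like distributed tracing?

    Here we use Jaeger[1], a CNCF graduated project that I have covered several times in K8S生态周报 (K8S Ecosystem Weekly). Jaeger is an open-source, end-to-end distributed tracing system. Another CNCF project, at sandbox level at the time of writing, is OpenTelemetry[2], an observability framework for cloud-native software that can be combined with Jaeger. Neither project is the focus of this article, so we simply follow the Jaeger documentation to deploy an instance quickly and move on.

    The main project used in this article is kspan, open-sourced by Weaveworks. Its core idea is to organize Kubernetes events as spans in a tracing system.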
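    To make the event-to-span idea concrete, here is a small illustrative sketch. It is not kspan's actual code: the `Span` type, the function names, and the timestamps are hypothetical, and the owner chain (Pod, ReplicaSet, Deployment) is passed in as a plain dictionary, assuming it has already been looked up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    name: str               # "<object> <reason>"
    parent: Optional[str]   # owning object, None for the top-level object
    timestamp: float        # seconds since the first event in the trace


def events_to_spans(events: List[Tuple[float, str, str]],
                    owners: Dict[str, str]) -> List[Span]:
    """Turn (timestamp, object, reason) tuples into spans linked by ownership."""
    ordered = sorted(events)
    if not ordered:
        return []
    origin = ordered[0][0]
    return [
        Span(name=f"{obj} {reason}", parent=owners.get(obj), timestamp=ts - origin)
        for ts, obj, reason in ordered
    ]


def root_of(obj: str, owners: Dict[str, str]) -> str:
    """Follow owner links up to the top-level object, which identifies the trace."""
    while obj in owners:
        obj = owners[obj]
    return obj


# Object names come from the event listing above; timestamps are hypothetical.
OWNERS = {
    "pod/nginx-f89759699-whcxz": "replicaset/nginx-f89759699",
    "replicaset/nginx-f89759699": "deployment/nginx",
}
EVENTS = [
    (0.0, "deployment/nginx", "ScalingReplicaSet"),
    (0.1, "replicaset/nginx-f89759699", "SuccessfulCreate"),
    (0.2, "pod/nginx-f89759699-whcxz", "Scheduled"),
    (1.0, "pod/nginx-f89759699-whcxz", "Pulling"),
]

if __name__ == "__main__":
    for span in events_to_spans(EVENTS, OWNERS):
        print(span)
```

    Grouping every event under the object returned by `root_of` yields one trace per top-level operation, which matches how the deployment, its replica set, and its Pod appear together in a tracing UI.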

    Deploying kspan
    Create the RBAC authorization, because kspan needs to watch events and the objects they refer to:

    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: kspan
      
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: kspan-admin
    rules:
    - apiGroups:
      - ""
      resources:
      - configmaps
      - endpoints
      - persistentvolumeclaims
      - persistentvolumeclaims/status
      - pods
      - replicationcontrollers
      - replicationcontrollers/scale
      - serviceaccounts
      - services
      - services/status
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - ""
      resources:
      - bindings
      - events
      - limitranges
      - namespaces/status
      - pods/log
      - pods/status
      - replicationcontrollers/status
      - resourcequotas
      - resourcequotas/status
      verbs:
      - get
      - list
      - watch
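    # kspan only reads objects and events, so this exec permission is
    # probably unnecessary and can likely be removed.
    - apiGroups:
      - ""
      resources:
      - pods/exec
      verbs:
      - create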
    - apiGroups:
      - ""
      resources:
      - namespaces
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - apps
      resources:
      - controllerrevisions
      - daemonsets
      - daemonsets/status
      - deployments
      - deployments/scale
      - deployments/status
      - replicasets
      - replicasets/scale
      - replicasets/status
      - statefulsets
      - statefulsets/scale
      - statefulsets/status
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - autoscaling
      resources:
      - horizontalpodautoscalers
      - horizontalpodautoscalers/status
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - batch
      resources:
      - cronjobs
      - cronjobs/status
      - jobs
      - jobs/status
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - extensions
      resources:
      - daemonsets
      - daemonsets/status
      - deployments
      - deployments/scale
      - deployments/status
      - ingresses
      - ingresses/status
      - networkpolicies
      - replicasets
      - replicasets/scale
      - replicasets/status
      - replicationcontrollers/scale
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - policy
      resources:
      - poddisruptionbudgets
      - poddisruptionbudgets/status
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - networking.k8s.io
      resources:
      - ingresses
      - ingresses/status
      - networkpolicies
      verbs:
      - get
      - list
      - watch
    # Read access to Pod and node metrics
    - apiGroups:
      - metrics.k8s.io
      resources:
      - pods
      - nodes
      verbs:
      - get
      - list
      - watch
    
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      creationTimestamp: null
      name: kspan-admin
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: kspan-admin
    subjects:
    - kind: ServiceAccount
      name: kspan
      namespace: default
    

    Create the Pod:

    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        run: kspan
      name: kspan
    spec:
      containers:
      - image: docker.io/weaveworks/kspan:v0.0
        name: kspan
        resources: {}
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      serviceAccountName: kspan
    

    Deploy Jaeger:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: jaeger
      name: jaeger
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: jaeger
      strategy: {}
      template:
        metadata:
          labels:
            app: jaeger
        spec:
          containers:
          - image: jaegertracing/opentelemetry-all-in-one
            name: opentelemetry-all-in-one
            resources: {}
            ports:
            # OTLP receiver; the otlp-collector Service targets this port
            - containerPort: 55680
            - containerPort: 16685
            - containerPort: 16686
            - containerPort: 5775
              protocol: UDP
            - containerPort: 6831
              protocol: UDP
            - containerPort: 6832
              protocol: UDP
            - containerPort: 5778
              protocol: TCP
    

    Create the Jaeger Service. By default, kspan sends spans to otlp-collector.default:55680:

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: jaeger
      name: otlp-collector
    spec:
      ports:
      - port: 55680
        protocol: TCP
        targetPort: 55680
      selector:
        app: jaeger
    

    Once all the Pods are running, we can test access.
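    Before generating test traffic, it can help to confirm that the collector endpoint is reachable. The helper below is a minimal sketch with a hypothetical name (`can_connect`); it only checks that a TCP connection can be opened, which says nothing about whether the OTLP receiver behind the port is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

    Run from a Pod inside the cluster, `can_connect("otlp-collector.default", 55680)` should return True once the Service and the Jaeger Pod are up; the name only resolves through cluster DNS, so the same call from a workstation is expected to return False.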

    Result

    Create a namespace and a deployment:

    $ kubectl create ns moelove
    namespace/moelove created
    $ kubectl -n moelove create deploy nginx --image=nginx
    deployment.apps/nginx created
    

    Open the Jaeger UI to inspect the resulting trace.

    Timing details of Pod creation

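    Conclusion

    The kspan repository currently offers no customizable deployment option, or at least I could not find detailed documentation for one. I therefore do not recommend running kspan as a standing Kubernetes component. Deploy it when the need arises, check how long each step of a rollout takes, and locate the bottleneck.

    In a multi-tenant scenario where you need alerts for problems such as slow scheduling, OpenTelemetry is worth investigating.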
