Tech Share | Using Chaos M in the GreatDB Distributed Deployment Mode


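Author: GreatSQL | Published: 2021-12-31 10:29
    • Original content from the GreatSQL community may not be used without authorization. To reprint it, please contact the editor and credit the source.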

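    1. Background and an introduction to the distributed deployment mode of GreatDB, Wanli's secure database software

    1.1 Background

    Chaos testing is a very effective way to expose uncertainty in a distributed system and to build confidence in its resilience, so we use the open-source tool Chaos Mesh to run chaos tests against GreatDB distributed clusters.

    1.2 Introduction to the GreatDB distributed deployment mode

    GreatDB, Wanli's secure database software, is a relational database that supports both centralized and distributed deployment. This article covers the distributed deployment mode.

    The distributed deployment mode uses a shared-nothing architecture. Data redundancy and replica management ensure that the database has no single point of failure; data sharding and distributed parallel computing give the system high performance; and data nodes can be added dynamically, without a fixed limit, to meet business needs.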

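    The overall architecture is shown in the figure below:

    [Figure: overall architecture of the GreatDB distributed deployment mode]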

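    2. Environment preparation

    2.1 Installing Chaos Mesh

    Before installing Chaos Mesh, make sure that helm and docker are already installed and that a Kubernetes environment is available.

    1) Add the Chaos Mesh repository to Helm: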

    helm repo add chaos-mesh https://charts.chaos-mesh.org
    

    2) List the Chaos Mesh versions available for installation:

    helm search repo chaos-mesh
    

    3) Create the namespace in which Chaos Mesh will be installed:

    kubectl create ns chaos-testing
    

    4) Install Chaos Mesh in a Docker environment:

    helm install chaos-mesh chaos-mesh/chaos-mesh -n=chaos-testing
    

    Verifying the installation
    Run the following command to check the status of Chaos Mesh:

    kubectl get pod -n chaos-testing
    

    The expected output looks like this:

    NAME                                       READY   STATUS    RESTARTS   AGE
    chaos-controller-manager-d7bc9ccb5-dbccq   1/1     Running   0          26d
    chaos-daemon-pzxc7                         1/1     Running   0          26d
    chaos-dashboard-5887f7559b-kgz46           1/1     Running   1          26d
    

    If all three pods are in the Running state, Chaos Mesh has been installed successfully.
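
    The check above can also be scripted. The following is a minimal sketch, not part of the original procedure: the helper name all_pods_running is hypothetical, and it only assumes that STATUS is the third column of the kubectl get pod output, as in the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pod` output on stdin and succeed
# only if at least one pod is listed and every pod has STATUS "Running".
all_pods_running() {
  awk 'NR > 1 && NF > 0 { seen = 1; if ($3 != "Running") bad = 1 }
       END { if (seen && !bad) exit 0; exit 1 }'
}

# Usage against a live cluster (assumes kubectl is already configured):
#   kubectl get pod -n chaos-testing | all_pods_running && echo "Chaos Mesh is up"
```

    Reading from stdin keeps the helper independent of the cluster, so it can be tried on saved output before it is used in a pipeline.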

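    2.2 Preparing the images needed for testing

    2.2.1 Preparing the MySQL images

    In general, MySQL uses the official 5.7 image and the MySQL metrics collector is mysqld-exporter. Both can be pulled directly from Docker Hub: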

    docker pull mysql:5.7
    docker pull prom/mysqld-exporter
    

    2.2.2 Preparing the ZooKeeper images

    ZooKeeper uses the official 3.5.5 image. The monitoring components for ZooKeeper are jmx-prometheus-exporter and zookeeper-exporter. All of them are pulled from Docker Hub:

    docker pull zookeeper:3.5.5
    docker pull sscaling/jmx-prometheus-exporter
    docker pull josdotso/zookeeper-exporter
    

    2.2.3 Preparing the GreatDB image
    Pick a GreatDB tar package and extract it to get a ./greatdb directory, then copy the greatdb-service-docker.sh file into that directory:

    cp greatdb-service-docker.sh ./greatdb/
    

    Place the GreatDB Dockerfile in the directory that contains the ./greatdb folder (alongside it), then run the following command to build the GreatDB image:

    docker build -t greatdb/greatdb:tag202110 .
    

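    2.2.4 Preparing the image for deploying/cleaning up a GreatDB distributed cluster

    Download the cluster deployment script cluster-setup, the cluster initialization script init-zk, and the cluster helm charts package (available on request from the 4.0 development/testing team).

    Put these materials in the same directory and write the following Dockerfile: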

    FROM debian:buster-slim as init-zk
    
    COPY ./init-zk /root/init-zk
    RUN chmod +x /root/init-zk
    
    FROM debian:buster-slim as cluster-setup
    # Set aliyun repo for speed
    RUN sed -i 's/deb.debian.org/mirrors.aliyun.com/g' /etc/apt/sources.list && \
      sed -i 's/security.debian.org/mirrors.aliyun.com/g' /etc/apt/sources.list
    
    RUN apt-get -y update && \
      apt-get -y install \
      curl \
      wget
    
    RUN curl -L https://storage.googleapis.com/kubernetes-release/release/v1.20.1/bin/linux/amd64/kubectl -o /usr/local/bin/kubectl && \
      chmod +x /usr/local/bin/kubectl && \
      mkdir /root/.kube && \
      wget https://get.helm.sh/helm-v3.5.3-linux-amd64.tar.gz && \
      tar -zxvf helm-v3.5.3-linux-amd64.tar.gz && \
      mv linux-amd64/helm /usr/local/bin/helm
    
    COPY ./config /root/.kube/
    COPY ./helm /helm
    COPY ./cluster-setup /
    

    Run the following commands to build the required images:

    docker build --target init-zk -t greatdb/initzk:latest .
    
    docker build --target cluster-setup -t greatdb/cluster-setup:v1 .
    

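    2.2.5 Preparing the test case images
    The test cases currently supported include bank, bank2, pbank, tpcc, and flashback. Each test case is an executable file.

    Taking the flashback test case as an example, download it locally and, in the same directory, write a Dockerfile with the following content: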

    FROM debian:buster-slim
    COPY ./flashback /
    RUN cd / && chmod +x ./flashback
    

    Run the following command to build the test case image:

    docker build -t greatdb/testsuite-flashback:v1 .
    

    2.3 Pushing the prepared images to a private registry

    For how to create a private registry and push images to it, see: https://zhuanlan.zhihu.com/p/78543733
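
    As a rough illustration of the push step, here is a minimal sketch under stated assumptions: the registry address registry.example.com:5000 and the helper name push_cmds are hypothetical, and the image names are the ones built in section 2.2. The helper only prints the docker tag and docker push commands so that they can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical registry address; replace it with your own private registry.
REGISTRY="${REGISTRY:-registry.example.com:5000}"

# Print (do not run) the docker tag/push commands for each image given.
push_cmds() {
  local image
  for image in "$@"; do
    echo "docker tag ${image} ${REGISTRY}/${image}"
    echo "docker push ${REGISTRY}/${image}"
  done
}

push_cmds greatdb/initzk:latest greatdb/cluster-setup:v1 greatdb/testsuite-flashback:v1
```

    Once the printed commands look right, piping the output to sh would execute them.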

    3. Using Chaos Mesh

    3.1 Setting up a GreatDB distributed cluster

    In the cluster-setup directory from section 2.2.4, run the following command to set up the test cluster:

    ./cluster-setup  \
    -clustername=c0 \
    -namespace=test \
    -enable-monitor=true \
    -mysql-image=mysql:5.7 \
    -mysql-replica=3 \
    -mysql-auth=1 \
    -mysql-normal=1 \
    -mysql-global=1 \
    -mysql-partition=1 \
    -zookeeper-repository=zookeeper \
    -zookeeper-tag=3.5.5 \
    -zookeeper-replica=3 \
    -greatdb-repository=greatdb/greatdb \
    -greatdb-tag=tag202110 \
    -greatdb-replica=3 \
    -greatdb-serviceHost=172.16.70.249
    

    Output:

    liuxinle@liuxinle-OptiPlex-5060:~/k8s/cluster-setup$ ./cluster-setup \
    > -clustername=c0 \
    > -namespace=test \
    > -enable-monitor=true \
    > -mysql-image=mysql:5.7 \
    > -mysql-replica=3 \
    > -mysql-auth=1 \
    > -mysql-normal=1 \
    > -mysql-global=1 \
    > -mysql-partition=1 \
    > -zookeeper-repository=zookeeper \
    > -zookeeper-tag=3.5.5 \
    > -zookeeper-replica=3 \
    > -greatdb-repository=greatdb/greatdb \
    > -greatdb-tag=tag202110 \
    > -greatdb-replica=3 \
    > -greatdb-serviceHost=172.16.70.249
    INFO[2021-10-14T10:41:52+08:00] SetUp the cluster ...                         NameSpace=test
    INFO[2021-10-14T10:41:52+08:00] create namespace ...                         
    INFO[2021-10-14T10:41:57+08:00] copy helm chart templates ...                
    INFO[2021-10-14T10:41:57+08:00] setup ...                                     Component=MySQL
    INFO[2021-10-14T10:41:57+08:00] exec helm install and update greatdb-cfg.yaml ... 
    INFO[2021-10-14T10:42:00+08:00] waiting mysql pods running ...               
    INFO[2021-10-14T10:44:27+08:00] setup ...                                     Component=Zookeeper
    INFO[2021-10-14T10:44:28+08:00] waiting zookeeper pods running ...           
    INFO[2021-10-14T10:46:59+08:00] update greatdb-cfg.yaml                      
    INFO[2021-10-14T10:46:59+08:00] setup ...                                     Component=greatdb
    INFO[2021-10-14T10:47:00+08:00] waiting greatdb pods running ...             
    INFO[2021-10-14T10:47:21+08:00] waiting cluster running ...                  
    INFO[2021-10-14T10:47:27+08:00] waiting prometheus server running...         
    INFO[2021-10-14T10:47:27+08:00] Dump Cluster Info                            
    INFO[2021-10-14T10:47:27+08:00] SetUp success.                                ClusterName=c0 NameSpace=test
    

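    Once c0-zookeeper-initzk-7hbfs shows the Completed state and all other pods show Running (for example, in the output of kubectl get pod -n test), the cluster has been set up successfully.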

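    3.2 Running chaos tests with Chaos Mesh on the GreatDB distributed cluster

    The fault types that Chaos Mesh can inject in a Kubernetes environment include simulated Pod faults, simulated network faults, and simulated stress scenarios. Here we use pod-kill, one of the simulated Pod faults, as an example.

    Write the experiment configuration to a file named pod-kill.yaml, for example: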

    apiVersion: chaos-mesh.org/v1alpha1
    kind: PodChaos   # kind of fault to inject
    metadata:
      name: pod-failure-example
      namespace: test   # namespace of the test cluster's pods
    spec:
      action: pod-kill   # specific fault action to inject
      mode: all    # how the experiment selects targets; all means every Pod that matches the selector
      duration: '30s'    # duration of the experiment
      selector:
        labelSelectors:
          "app.kubernetes.io/component": "greatdb"    # label of the target pods; taken from the Labels field in the output of kubectl describe pod c0-greatdb-1 -n test
    

    Create the fault experiment with the following command:

    kubectl create -n test -f pod-kill.yaml
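
    To target a different namespace or selection mode without editing the file by hand, the same manifest can be generated from a small template. This is only a sketch: the helper name make_pod_kill and the generated experiment name are hypothetical, the fields mirror the pod-kill.yaml example, and mode one (pick a single matching Pod at random) is one of the modes Chaos Mesh offers besides all.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a PodChaos pod-kill manifest.
#   $1 = namespace
#   $2 = value of the app.kubernetes.io/component label
#   $3 = mode (defaults to "all")
make_pod_kill() {
  local ns=$1 component=$2 mode=${3:-all}
  cat <<EOF
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-${component}
  namespace: ${ns}
spec:
  action: pod-kill
  mode: ${mode}
  duration: '30s'
  selector:
    labelSelectors:
      "app.kubernetes.io/component": "${component}"
EOF
}

# Print a manifest that kills one randomly chosen greatdb pod in namespace test.
make_pod_kill test greatdb one
```

    The output could then be piped straight into kubectl create -f - in place of a saved file.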
    

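    After the experiment is created, running kubectl get pod -n test -o wide gives the following result (the greatdb pods have just been killed and are being recreated):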

    NAME                                    READY   STATUS              RESTARTS   AGE     IP             NODE                     NOMINATED NODE   READINESS GATES
    c0-auth0-mysql-0                        2/2     Running             0          14m     10.244.87.18   liuxinle-optiplex-5060   <none>           <none>
    c0-auth0-mysql-1                        2/2     Running             0          14m     10.244.87.54   liuxinle-optiplex-5060   <none>           <none>
    c0-auth0-mysql-2                        2/2     Running             0          13m     10.244.87.57   liuxinle-optiplex-5060   <none>           <none>
    c0-greatdb-0                            0/2     ContainerCreating   0          2s      <none>         liuxinle-optiplex-5060   <none>           <none>
    c0-greatdb-1                            0/2     ContainerCreating   0          2s      <none>         liuxinle-optiplex-5060   <none>           <none>
    c0-glob0-mysql-0                        2/2     Running             0          14m     10.244.87.51   liuxinle-optiplex-5060   <none>           <none>
    c0-glob0-mysql-1                        2/2     Running             0          14m     10.244.87.41   liuxinle-optiplex-5060   <none>           <none>
    c0-glob0-mysql-2                        2/2     Running             0          13m     10.244.87.60   liuxinle-optiplex-5060   <none>           <none>
    c0-nor0-mysql-0                         2/2     Running             0          14m     10.244.87.29   liuxinle-optiplex-5060   <none>           <none>
    c0-nor0-mysql-1                         2/2     Running             0          14m     10.244.87.4    liuxinle-optiplex-5060   <none>           <none>
    c0-nor0-mysql-2                         2/2     Running             0          13m     10.244.87.25   liuxinle-optiplex-5060   <none>           <none>
    c0-par0-mysql-0                         2/2     Running             0          14m     10.244.87.55   liuxinle-optiplex-5060   <none>           <none>
    c0-par0-mysql-1                         2/2     Running             0          14m     10.244.87.13   liuxinle-optiplex-5060   <none>           <none>
    c0-par0-mysql-2                         2/2     Running             0          13m     10.244.87.21   liuxinle-optiplex-5060   <none>           <none>
    c0-prometheus-server-6697649b76-fkvh9   2/2     Running             0          9m24s   10.244.87.37   liuxinle-optiplex-5060   <none>           <none>
    c0-zookeeper-0                          1/1     Running             1          12m     10.244.87.44   liuxinle-optiplex-5060   <none>           <none>
    c0-zookeeper-1                          1/1     Running             0          11m     10.244.87.30   liuxinle-optiplex-5060   <none>           <none>
    c0-zookeeper-2                          1/1     Running             0          10m     10.244.87.49   liuxinle-optiplex-5060   <none>           <none>
    c0-zookeeper-initzk-7hbfs               0/1     Completed           0          12m     10.244.87.17   liuxinle-optiplex-5060   <none>           <none>
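
    To see at a glance which pods an experiment hit, the listing can be filtered. This is a minimal sketch with a hypothetical helper name, disrupted_pods, and it assumes STATUS is the third column. Applied to the listing above, it would print c0-greatdb-0 and c0-greatdb-1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pod` output on stdin and print the
# names of pods whose STATUS is neither Running nor Completed.
disrupted_pods() {
  awk 'NR > 1 && NF > 0 && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Usage (assumes kubectl points at the test cluster):
#   kubectl get pod -n test -o wide | disrupted_pods
```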
    

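    4. Orchestrating the test flow in Argo

    Argo is an open-source, container-native workflow engine for getting work done on Kubernetes. It can model a multi-step workflow as a series of tasks, which is what we use to orchestrate the test flow.

    We use Argo to define a test task. The basic test flow is fixed, as shown below:

    [Figure: test flow]

    Step 1 deploys the test cluster. Two tasks then run in parallel: step 2 runs the test case to simulate a business workload, while step 3 uses Chaos Mesh to inject faults. When the test case in step 2 finishes, step 4 stops the fault injection, and finally step 5 cleans up the cluster environment.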
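
    The control flow of these five steps can be illustrated with plain shell job control. This is only a sketch of the ordering, not how Argo runs it: every function below is a hypothetical stub that stands in for the real step.

```shell
#!/usr/bin/env bash
# Hypothetical stubs standing in for the real steps.
deploy_cluster()  { echo "step 1: deploy test cluster"; }
run_testcase()    { echo "step 2: run test case"; sleep 1; }
inject_faults()   { while :; do sleep 1; done; }   # keeps injecting until killed
cleanup_cluster() { echo "step 5: clean up cluster"; }

run_flow() {
  local chaos_pid rc
  deploy_cluster
  echo "step 3: start fault injection"
  inject_faults >/dev/null &       # step 3 runs in parallel with step 2
  chaos_pid=$!
  run_testcase                     # step 2 blocks until the test case ends
  rc=$?
  echo "step 4: stop fault injection"
  kill "$chaos_pid" 2>/dev/null
  wait "$chaos_pid" 2>/dev/null
  cleanup_cluster                  # step 5 runs whether or not the test passed
  return "$rc"
}

run_flow
```

    The test case decides how long the run lasts, and fault injection is stopped as soon as it returns.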

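    4.1 Orchestrating a chaos test workflow with Argo (using the flashback test case as an example)

    1) In cluster-setup.yaml, change the image to the name and tag of the cluster deployment/cleanup image you pushed in section 2.2.

    2) In testsuite-flashback.yaml, change the image to the name and tag of the test case image you pushed in section 2.2.

    3) Create the resources for the cluster deployment, test case, and tools template YAML files with kubectl apply -n argo -f xxx.yaml. These files define Argo templates that can be reused when writing a workflow: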

    kubectl apply -n argo -f cluster-setup.yaml
    kubectl apply -n argo -f testsuite-flashback.yaml
    kubectl apply -n argo -f tools-template.yaml
    

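    4) Make a copy of the workflow template file workflow-template.yaml, change the parts marked by comments to your own settings, and then run the following command to create the chaos test workflow: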

    kubectl apply -n argo -f workflow-template.yaml
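
    Once the workflow has been created, its progress can be polled until it reaches a final phase. This is a minimal sketch, assuming the workflow is named chaostest-c0-0 (the metadata.name in the template file) and lives in the argo namespace. The helper name wait_for_workflow is hypothetical, and it reads the status.phase field that Argo sets on Workflow objects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll an Argo Workflow until it reaches a final phase.
#   $1 = workflow name, $2 = namespace (default argo),
#   $3 = seconds between polls (default 10)
wait_for_workflow() {
  local name=$1 ns=${2:-argo} interval=${3:-10} phase
  while :; do
    # Give up if kubectl itself fails (for example, the workflow does not exist).
    phase=$(kubectl get workflow "$name" -n "$ns" -o jsonpath='{.status.phase}') || return 2
    case "$phase" in
      Succeeded)    echo "workflow $name succeeded"; return 0 ;;
      Failed|Error) echo "workflow $name ended with phase $phase" >&2; return 1 ;;
      *)            sleep "$interval" ;;   # still pending or running
    esac
  done
}

# Usage (assumes kubectl can reach the cluster):
#   wait_for_workflow chaostest-c0-0 argo 30
```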
    

    The following is a workflow template file:

    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      generateName: chaostest-c0-0-
      name: chaostest-c0-0
      namespace: argo
    spec:
      entrypoint: test-entry # test entry point; the test parameters are passed in below: clustername, namespace, host, greatdb image name and tag, and other basic information
      serviceAccountName: argo
      arguments:
        parameters:
          - name: clustername
            value: c0
          - name: namespace
            value: test
          - name: host
            value: 172.16.70.249
          - name: port
            value: 30901
          - name: password
            value: Bgview@2020
          - name: user
            value: root
          - name: run-time
            value: 10m
          - name: greatdb-repository
            value: greatdb/greatdb
          - name: greatdb-tag
            value: tag202110
          - name: nemesis
            value: kill_mysql_normal_master,kill_mysql_normal_slave,kill_mysql_partition_master,kill_mysql_partition_slave,kill_mysql_auth_master,kill_mysql_auth_slave,kill_mysql_global_master,kill_mysql_global_slave,kill_mysql_master,kill_mysql_slave,net_partition_mysql_normal,net_partition_mysql_partition,net_partition_mysql_auth,net_partition_mysql_global
          - name: mysql-partition
            value: 1
          - name: mysql-global
            value: 1
          - name: mysql-auth
            value: 1
          - name: mysql-normal
            value: 2
      templates:
        - name: test-entry
          steps:
            - - name: setup-greatdb-cluster  # step 1: deploy the cluster. Specify the correct parameters, mainly the mysql and zookeeper image names and tags
                templateRef:
                  name: cluster-setup-template
                  template: cluster-setup
                arguments:
                  parameters:
                    - name: namespace
                      value: "{{workflow.parameters.namespace}}"
                    - name: clustername
                      value: "{{workflow.parameters.clustername}}"
                    - name: mysql-image
                      value: mysql:5.7.34
                    - name: mysql-replica
                      value: 3
                    - name: mysql-auth
                      value: "{{workflow.parameters.mysql-auth}}"
                    - name: mysql-normal
                      value: "{{workflow.parameters.mysql-normal}}"
                    - name: mysql-partition
                      value: "{{workflow.parameters.mysql-partition}}"
                    - name: mysql-global
                      value: "{{workflow.parameters.mysql-global}}"
                    - name: enable-monitor
                      value: false
                    - name: zookeeper-repository
                      value: zookeeper
                    - name: zookeeper-tag
                      value: 3.5.5
                    - name: zookeeper-replica
                      value: 3
                    - name: greatdb-repository
                      value: "{{workflow.parameters.greatdb-repository}}"
                    - name: greatdb-tag
                      value: "{{workflow.parameters.greatdb-tag}}"
                    - name: greatdb-replica
                      value: 3
                    - name: greatdb-serviceHost
                      value: "{{workflow.parameters.host}}"
                    - name: greatdb-servicePort
                      value: "{{workflow.parameters.port}}"
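            - - name: run-flashbacktest    # step 2: run the test case. Replace this with the template of the test case you want to run and specify the correct parameters, mainly the number and size of the tables used in the test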
                templateRef:
                  name: flashback-test-template
                  template: flashback
                arguments:
                  parameters:
                    - name: user
                      value: "{{workflow.parameters.user}}"
                    - name: password
                      value: "{{workflow.parameters.password}}"
                    - name: host
                      value: "{{workflow.parameters.host}}"
                    - name: port
                      value: "{{workflow.parameters.port}}"
                    - name: concurrency
                      value: 16
                    - name: size
                      value: 10000
                    - name: tables
                      value: 10
                    - name: run-time
                      value: "{{workflow.parameters.run-time}}"
                    - name: single-statement
                      value: true
                    - name: manage-statement
                      value: true
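              - name: invoke-chaos-for-flashback-test    # step 3: inject faults. Specify the correct parameters; run-time and interval define the duration and frequency of fault injection, so the separate step that stops fault injection is omitted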
                templateRef:
                  name: chaos-rto-template
                  template: chaos-rto
                arguments:
                  parameters:
                    - name: user
                      value: "{{workflow.parameters.user}}"
                    - name: host
                      value: "{{workflow.parameters.host}}"
                    - name: password
                      value: "{{workflow.parameters.password}}"
                    - name: port
                      value: "{{workflow.parameters.port}}"
                    - name: k8s-config
                      value: /root/.kube/config
                    - name: namespace
                      value: "{{workflow.parameters.namespace}}"
                    - name: clustername
                      value: "{{workflow.parameters.clustername}}"
                    - name: prometheus
                      value: ''
                    - name: greatdb-job
                      value: greatdb-monitor-greatdb
                    - name: nemesis
                      value: "{{workflow.parameters.nemesis}}"
                    - name: nemesis-duration
                      value: 1m
                    - name: nemesis-mode
                      value: default
                    - name: wait-time
                      value: 5m
                    - name: check-time
                      value: 5m
                    - name: nemesis-scope
                      value: 1
                    - name: nemesis-log
                      value: true
                    - name: enable-monitor
                      value: false
                    - name: run-time
                      value: "{{workflow.parameters.run-time}}"
                    - name: interval
                      value: 1m
                    - name: monitor-log
                      value: false
                    - name: enable-rto
                      value: false
                    - name: rto-qps
                      value: 0.1
                    - name: rto-warm
                      value: 5m
                    - name: rto-time
                      value: 1m
                    - name: log-level
                      value: debug
            - - name: flashbacktest-output         # output whether the test case passed
                templateRef:
                  name: tools-template
                  template: output-result
                arguments:
                  parameters:
                    - name: info
                      value: "flashback test pass, with nemesis: {{workflow.parameters.nemesis}}"
            - - name: clean-greatdb-cluster           # step 4: clean up the test cluster; the parameters here are the same as in step 1
                templateRef:
                  name: cluster-setup-template
                  template: cluster-setup
                arguments:
                  parameters:
                    - name: namespace
                      value: "{{workflow.parameters.namespace}}"
                    - name: clustername
                      value: "{{workflow.parameters.clustername}}"
                    - name: mysql-image
                      value: mysql:5.7
                    - name: mysql-replica
                      value: 3
                    - name: mysql-auth
                      value: "{{workflow.parameters.mysql-auth}}"
                    - name: mysql-normal
                      value: "{{workflow.parameters.mysql-normal}}"
                    - name: mysql-partition
                      value: "{{workflow.parameters.mysql-partition}}"
                    - name: mysql-global
                      value: "{{workflow.parameters.mysql-global}}"
                    - name: enable-monitor
                      value: false
                    - name: zookeeper-repository
                      value: zookeeper
                    - name: zookeeper-tag
                      value: 3.5.5
                    - name: zookeeper-replica
                      value: 3
                    - name: greatdb-repository
                      value: "{{workflow.parameters.greatdb-repository}}"
                    - name: greatdb-tag
                      value: "{{workflow.parameters.greatdb-tag}}"
                    - name: greatdb-replica
                      value: 3
                    - name: greatdb-serviceHost
                      value: "{{workflow.parameters.host}}"
                    - name: greatdb-servicePort
                      value: "{{workflow.parameters.port}}"
                    - name: clean
                      value: true
            - - name: echo-result
                templateRef:
                  name: tools-template
                  template: echo
                arguments:
                  parameters:
                    - name: info
                      value: "{{item}}"
                withItems:
                  - "{{steps.flashbacktest-output.outputs.parameters.result}}"
    

    Enjoy GreatSQL :)
