Greenplum on Kubernetes

Author: 程序员王旺 | Published 2021-03-10 11:00

    Greenplum is a distributed relational database built on PostgreSQL. Today we will try deploying it on Kubernetes.

    Deploying with docker-compose

    Before deploying to Kubernetes, let's first walk through a Docker-based deployment. This example is based on this article; thanks to its author for providing the base image.
    1. Download Greenplum
    https://github.com/greenplum-db/gpdb/releases

    2. Build the Greenplum image
    Define the Dockerfile:

    #  The lyasper/gphost base image provides an sshd service and some initialization scripts
    FROM lyasper/gphost
    # open-source-greenplum-xx...rpm is the Greenplum installer downloaded above
    COPY open-source-greenplum-db-6.14.1-rhel7-x86_64.rpm /home/gpadmin/greenplum-db.rpm
    RUN yum install -y /home/gpadmin/greenplum-db.rpm
    RUN chown -R gpadmin /usr/local/greenplum-db*
    RUN rm -f /home/gpadmin/greenplum-db.rpm
    
    
    docker build -t greenplum:6 .
    

    3. Write a docker-compose.yaml file

    version: '3'
    services:
      mdw:
        hostname: mdw
        image: "greenplum:6"
        ports:
         - "6222:22"
         - "6432:5432"
      sdw1:
        hostname: sdw1
        image: "greenplum:6"
      sdw2:
        hostname: sdw2
        image: "greenplum:6"
      etl:
        hostname: etl
        image: "greenplum:6"
    
    docker-compose up
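    
    Once the stack is up, you can check that all four containers (mdw, sdw1, sdw2, etl) are running; for example:
    
    # run from the directory containing docker-compose.yaml
    docker-compose ps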
    

    4. Initialize Greenplum

    # Log in to the master node
    ssh -p 6222 gpadmin@127.0.0.1
    # or: ssh -p 6222 gpadmin@0.0.0.0
    # Password: changeme
    
    # Load the Greenplum environment variables
    source /usr/local/greenplum-db/greenplum_path.sh
    
    # Generate the Greenplum configuration files
    artifact/prepare.sh -s 2 -n 2
    # -s: number of segment hosts (containers)
    # -n: number of primary segments per host (container)
    
    # Initialize the cluster; this generates env.sh (the environment variables Greenplum needs)
    gpinitsystem -a -c gpinitsystem_config
    source env.sh
    
    # Enable passwordless remote access
    artifact/postinstall.sh
    
    # Verify the installation
    ps -ef | grep postgres
    
    # Check the cluster status
    gpstate -s
    
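    With the environment loaded, a quick sanity check can be run from inside the master container; a minimal sketch that lists the registered segments from the gp_segment_configuration catalog:
    
    # still inside the ssh session on mdw, after "source env.sh"
    psql -d postgres -c "SELECT dbid, content, role, hostname, port FROM gp_segment_configuration;"
    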

    Deployment on k8s

    Building a k8s-specific image (optional)
    Before deploying to k8s, we need to make a few changes to the existing image and produce a Greenplum image tailored for k8s. I will push the finished image to Alibaba Cloud later so that everyone can pull it directly.

    1. Start a container and exec into it

    Note: do not append /bin/bash or any other arguments to docker run, otherwise the base image's CMD will be overwritten when the container is committed later; that is why we start the container first and then exec into it.

    $  docker run -it --rm greenplum:6
    $  docker exec -it a2b9a823d845 /bin/bash
    
    2. Modify the artifact/prepare.sh script; three changes are needed:
    • Line 7: append .gp to the master address: MASTERHOST=`hostname`.gp
    • Line 8: set the prefix used for segment hosts: SEG_PREFIX=gp-
    • Lines 87 and 90: append .gp to the addresses there as well
    #!/bin/bash
    
    set -x
    if [ -z ${GPHOME+x} ]; then echo "GPHOME is unset";exit 1 ; fi
    
    MASTERHOST=`hostname`.gp
    SEG_PREFIX=gp-
    SEG_HOSTNUM=0 # 0 means master only
    SEG_NUMPERHOST=1
    VERBOSE=0
    .....
    if [ $SEG_HOSTNUM -eq 0 ];then
         echo $MASTERHOST.gp >  $HOSTFILE
    else
       for i in $(seq 1 $SEG_HOSTNUM); do
          echo $SEG_PREFIX$i.gp >> $HOSTFILE
       done
     fi    
    
    3. Modify line 1320 of the /usr/local/greenplum-db-6.14.1/bin/gpinitsystem script so that .gp is appended when host addresses are resolved; segment servers are then reached as gp-1.gp instead of gp-1.
       1314 HOST_LOOKUP() {
       1315         res=`echo $1 | $GPHOSTCACHELOOKUP`
       1316         err_index=`echo $res | awk '{print index($0,"__lookup_of_hostname_failed__")}'`
       1317         if [ $err_index -ne 0 ]; then
       1318                 echo "__lookup_of_hostname_failed__"
       1319         else
       1320                 echo $res.gp  # append .gp to the resolved host name
       1321         fi
       1322 }
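    
    If you prefer to script this change (for example in a Dockerfile) instead of editing by hand, a one-line sed is enough; this is only a sketch, and the line number 1320 should be re-checked against your gpinitsystem version:
    
    # append .gp to the resolved host name on line 1320 of gpinitsystem
    sed -i '1320s/echo \$res/echo \$res.gp/' /usr/local/greenplum-db-6.14.1/bin/gpinitsystem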
    
    4. Create an initialization script
      Create a new script /home/gpadmin/init.sh; it is only run on the master.
    #!/bin/bash
    
    # whoami: gpadmin
    
    # only run on master
    host=`hostname`
    if [ $host = "gp-0" ];then
       source /usr/local/greenplum-db/greenplum_path.sh
       artifact/prepare.sh -s 2 -n 2
       gpinitsystem -a -c gpinitsystem_config
       source env.sh
       artifact/postinstall.sh
       gpstate -s
    fi
    
    5. Commit a new image
      I have already pushed this image to Alibaba Cloud, so later you can simply pull it from there.
    $ docker commit -a "wangjc" -m "Greenplum on Kubernetes" 60875d7be058  greenplum-k8s:6.14.18
    $ docker tag greenplum-k8s:6.14.18 registry.cn-hangzhou.aliyuncs.com/9c/greenplum-k8s:6.14.18
    $ docker push registry.cn-hangzhou.aliyuncs.com/9c/greenplum-k8s:6.14.18
    

    Deploying to k8s

    My k8s version is 1.18.0; note that older versions may not support all of the local PersistentVolume features used below.

    1. Create a Local Volume
      Here we use a local storage volume to hold Greenplum's data. Before deploying, make sure the /var/greenplum directory exists on the nodes and has sufficient permissions.
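      For example, on each node something like the following prepares the directory (a sketch; uid/gid 1000 matches the fsGroup used later, adjust it if your gpadmin user maps differently):
    
    # run on every node listed under nodeAffinity below
    $ sudo mkdir -p /var/greenplum
    $ sudo chown -R 1000:1000 /var/greenplum
    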
    # gp-lv.yaml 
    ### greenplum storage ###
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: greenplum-pv
    spec:
      capacity:
        storage: 2Gi
      volumeMode: Filesystem
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Delete
      storageClassName: greenplum-storage
      local:
        path: /var/greenplum
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - master
              - node1
              - node2
    
    ---
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: greenplum-claim
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 2Gi
      storageClassName: greenplum-storage
    
    $ kubectl create -f gp-lv.yaml 
    $ kubectl get pv
    $ kubectl get pvc 
    
    2. Deploy Greenplum
    • Deploy as a StatefulSet with three replicas: gp-0 acts as the master, gp-1 and gp-2 as segment hosts
    • The master is exposed on two ports: 8432 for the Greenplum database and 8222 for SSH. Note that these values lie outside the default NodePort range (30000-32767), so the apiserver's --service-node-port-range must be configured to allow them.
    # gp.yaml
    ###  greenplum db ###
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: gp
    
    spec:
      selector:
        matchLabels:
          app: gp
      serviceName: gp
      replicas: 3
      template:
        metadata:
          labels:
            app: gp
        spec:
          securityContext:
            # Allow the non-root gpadmin user to write to the PersistentVolumeClaim; otherwise the local PVC is not writable
            #runAsUser: 1000
            fsGroup: 1000
          volumes:
          - name: gp-pv-storage
            persistentVolumeClaim:
              claimName: greenplum-claim
          containers:
          - name: gp-container
            image: registry.cn-hangzhou.aliyuncs.com/9c/greenplum-k8s:6.14.18
            imagePullPolicy: IfNotPresent
            volumeMounts:
            - mountPath: "/home/gpadmin/master"
              name: gp-pv-storage
            - mountPath: "/home/gpadmin/data"
              name: gp-pv-storage
            ports:
            - name: gp-port
              containerPort: 5432
            - name: ssh-port
              containerPort: 22
    ---
    # headless Service for the StatefulSet
    apiVersion: v1
    kind: Service
    metadata:
      name: gp
    spec:
      selector:
        app: gp
      type: ClusterIP
      clusterIP: None
    ---
    # NodePort Service exposing the master (gp-0)
    apiVersion: v1
    kind: Service
    metadata:
      name: gp-out
    spec:
      ports:
      - name: gp-port
        port: 8432
        nodePort: 8432
        protocol: TCP
        targetPort: gp-port
      - name: ssh-port
        port: 8222
        nodePort: 8222
        protocol: TCP
        targetPort: ssh-port
      selector:
        app: gp
        statefulset.kubernetes.io/pod-name: gp-0
      type: NodePort
    
    
    $ kubectl create -f gp.yaml
    $ kubectl get pods
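    
    Before running the initialization, it is worth confirming that the headless Service really gives each Pod the gp-N.gp name that the modified scripts rely on; a minimal sketch (getent is assumed to be present in the image, which is typical for RHEL-based images):
    
    $ kubectl exec gp-0 -- getent hosts gp-1.gp gp-2.gp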
    
    3. Initialize the Greenplum database
      Connect to the master over SSH (password: changeme) and run the initialization inside it; 192.168.x.x is any node of the k8s cluster. After a successful initialization, two segments are created on each segment node.
    $ ssh -p 8222 gpadmin@192.168.x.x
    $ ./init.sh
    20210310:02:50:04:002506 gpstate:gp-0:gpadmin-[INFO]:-Obtaining Segment details from master...
    20210310:02:50:04:002506 gpstate:gp-0:gpadmin-[INFO]:-Gathering data from segments...
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-----------------------------------------------------
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:--Master Configuration & Status
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-----------------------------------------------------
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-   Master host                    = gp-0
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-   Master postgres process ID     = 2345
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-   Master data directory          = /home/gpadmin/master/gpseg-1
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-   Master port                    = 5432
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-   Master current role            = dispatch
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-   Greenplum initsystem version   = 6.14.1 build commit:5ef30dd4c9878abadc0124e0761e4b988455a4bd Open Source
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-   Greenplum current version      = PostgreSQL 9.4.24 (Greenplum Database 6.14.1 build commit:5ef30dd4c9878abadc0124e0761e4b988455a4bd Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Feb 22 2021 22:11:57
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-   Postgres version               = 9.4.24
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-   Master standby                 = No master standby configured
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-----------------------------------------------------
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-Segment Instance Status Report
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-----------------------------------------------------
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-   Segment Info
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Hostname                          = gp-1.gp
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Address                           = gp-1.gp
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Datadir                           = /home/gpadmin/data/gpseg0
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Port                              = 10000
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-   Status
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      PID                               = 842
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Configuration reports status as   = Up
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Database status                   = Up
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-----------------------------------------------------
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-   Segment Info
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Hostname                          = gp-1.gp
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Address                           = gp-1.gp
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Datadir                           = /home/gpadmin/data/gpseg1
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Port                              = 10001
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-   Status
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      PID                               = 843
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Configuration reports status as   = Up
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Database status                   = Up
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-----------------------------------------------------
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-   Segment Info
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Hostname                          = gp-2.gp
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Address                           = gp-2.gp
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Datadir                           = /home/gpadmin/data/gpseg2
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Port                              = 10000
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-   Status
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      PID                               = 842
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Configuration reports status as   = Up
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Database status                   = Up
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-----------------------------------------------------
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-   Segment Info
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Hostname                          = gp-2.gp
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Address                           = gp-2.gp
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Datadir                           = /home/gpadmin/data/gpseg3
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Port                              = 10001
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-   Status
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      PID                               = 843
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Configuration reports status as   = Up
    20210310:02:50:05:002506 gpstate:gp-0:gpadmin-[INFO]:-      Database status                   = Up
    

    Recovering data
    The biggest problem with running on k8s is that when a Pod is recreated, the container is reset to its original state. Fortunately Greenplum's data lives on the host, but in that case the init.sh script above must not be used again, because it would overwrite the existing data. The image therefore needs another round of changes.

    Following the same approach as before, start a container, exec into it, and add a new script, repair.sh, dedicated to repairing the data.

    # Start from the previous version of the image
    $ docker run -it registry.cn-hangzhou.aliyuncs.com/9c/greenplum-k8s:6.14.18
    

    Add repair.sh inside the image:

    #!/bin/bash
    
    # whoami: gpadmin
    
    # only run on master
    host=`hostname`
    if [ $host = "gp-0" ];then
       source /usr/local/greenplum-db/greenplum_path.sh
    
       # Note: prepare.sh contains a step that deletes the master, data and other directories; comment that out first
       artifact/prepare.sh -s 2 -n 2
       #gpinitsystem -a -c gpinitsystem_config
       source env.sh
       #artifact/postinstall.sh
    
       #  Fix permissions on the mounted directories after Pod recreation; otherwise startup fails
       sudo chmod 0700 -R $HOME/master/*
       for HOST in `cat $HOME/hostfile`; do
          ssh gpadmin@${HOST} "sudo chmod 0700 -R $HOME/data/*"
          ssh gpadmin@${HOST} "sudo chmod 0700 -R $HOME/master/*"
          ssh gpadmin@${HOST} "sudo chmod 0700 -R $HOME/mirror/*"
       done
    
       gpstart
    fi
    

    Commit the container to produce a new image

    $ docker commit -a "wangjc" -m "Greenplum on Kubernetes" 135b86314ce1  registry.cn-hangzhou.aliyuncs.com/9c/greenplum-k8s:6.14.18
    $ docker push registry.cn-hangzhou.aliyuncs.com/9c/greenplum-k8s:6.14.18
    

    Deploy Greenplum with the new image, create a database and insert some data (for example, as sketched below), then delete gp-0. Note that deleting the Pod causes the Service to stop working, so it has to be recreated.
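    
    For example, some throwaway test data can be created from the master before deleting the Pod (a sketch; the database and table names are arbitrary):
    
    # inside an ssh session on gp-0, with env.sh sourced
    $ createdb demo
    $ psql -d demo -c "CREATE TABLE t_recover(id int, note text) DISTRIBUTED BY (id);"
    $ psql -d demo -c "INSERT INTO t_recover SELECT g, 'row ' || g FROM generate_series(1, 1000) g;"
    $ psql -d demo -c "SELECT count(*) FROM t_recover;"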

    $ kubectl delete pod gp-0
    $ kubectl delete -f svc.yaml &&  kubectl create -f svc.yaml      
    
    # Connect to the Greenplum master and perform the repair inside it
    $ ssh -p 8222 gpadmin@192.168.x.x
    # Run the recovery script; at the very end you need to manually enter: y
    $ ./repair.sh
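    
    Once repair.sh finishes and gpstart has brought the cluster back up, the data written before the Pod was deleted should still be readable; continuing the earlier sketch:
    
    $ psql -d demo -c "SELECT count(*) FROM t_recover;"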
    
    

    Enabling mirror segments

    Mirror segments are disabled by default. As before, enter the container, make the changes below, and commit a new image.

    1. Edit the artifact/prepare.sh file and add the following code
    MIRDATASTR=""
    for i in $(seq 1 $SEG_NUMPERHOST);  do
        MIRDATASTR="$MIRDATASTR  $PREFIX/mirror"
    done
    
    # Add a substitution for MIRDATASTR
    sed "s/%%PORT_BASE%%/$PORT_BASE/g; s|%%MIRDATASTR%%|$MIRDATASTR|g; s|%%PREFIX%%|$PREFIX|g;
    
    2. Edit the artifact/gpinitsystem_config_template file, uncomment all properties that start with MIRROR, and set the mirror directory as follows
    declare -a MIRROR_DATA_DIRECTORY=(%%MIRDATASTR%%)
    
    3. Edit init.sh: use 3 segment hosts with 4 segments per host
    artifact/prepare.sh -s 3 -n 4
    
    4. Commit the container to produce a new image
    $ docker commit -a "wangjc" -m "Greenplum on Kubernetes" 135b86314ce1  registry.cn-hangzhou.aliyuncs.com/9c/greenplum-k8s:6.14.18
    $ docker push registry.cn-hangzhou.aliyuncs.com/9c/greenplum-k8s:6.14.18
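    
    After redeploying with the mirror-enabled image and re-running the initialization, the mirrors can be inspected from the master; gpstate -m lists the mirror segment instances and their status:
    
    # inside an ssh session on gp-0
    $ source /usr/local/greenplum-db/greenplum_path.sh
    $ source env.sh
    $ gpstate -m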
    
