Deploying Pachyderm
Deploy and test Pachyderm by following the official documentation:
http://docs.pachyderm.io/en/latest/index.html
Environment
Kubernetes cluster: 192.168.13.17
Namespace: ht
Create the Kubernetes namespace
1. Write the YAML file namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ht
2. Create the namespace with kubectl create
# kubectl create -f namespace.yaml
3. Other operations
# Create the namespace directly from the command line
# kubectl create namespace ht
# List namespaces
# kubectl get namespace
# Delete a specific namespace
# kubectl delete namespace ht
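The manifest from step 1 can also be generated programmatically. The following is a minimal sketch using only the Python standard library; `namespace_manifest` is a hypothetical helper name, and the output is JSON, which kubectl accepts in place of YAML.

```python
import json


def namespace_manifest(name):
    """Return a Kubernetes Namespace manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name},
    }


manifest = namespace_manifest("ht")
# Save the printed JSON as namespace.json, then run:
#   kubectl create -f namespace.json
print(json.dumps(manifest, indent=2))
```

Generating the manifest this way mainly helps when the namespace name comes from a variable rather than being fixed.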
Deployment steps
Install pachctl
1. Download the release archive
# curl -o /tmp/pachctl.tar.gz -L https://github.com/pachyderm/pachyderm/releases/download/v1.8.1/pachctl_1.8.1_linux_amd64.tar.gz
2. Extract the archive
# tar -xvf /tmp/pachctl.tar.gz -C /tmp
3. Copy the binary onto the PATH
# cp /tmp/pachctl_1.8.1_linux_amd64/pachctl /usr/local/bin
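The download URL embeds the version twice, which is easy to get wrong when switching releases. A small sketch that builds it from one version string follows; the pattern is inferred only from the single URL used above, so other versions or platforms may not follow it, and `pachctl_release_url` is a hypothetical helper.

```python
def pachctl_release_url(version, platform="linux_amd64"):
    """Build the release archive URL, assuming the naming seen for v1.8.1."""
    base = "https://github.com/pachyderm/pachyderm/releases/download"
    return f"{base}/v{version}/pachctl_{version}_{platform}.tar.gz"


url = pachctl_release_url("1.8.1")
print(url)
```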
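Deploy Pachyderm
1. Deploy into the chosen namespace
# pachctl deploy local --namespace ht
2. Remove the Pachyderm cluster (only when you need to tear the deployment down)
# pachctl undeploy --namespace ht
3. Check that the deployment succeeded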
# kubectl get all -n ht
![](https://img.haomeiwen.com/i15687234/0094e37548e60ee6.png)
4. Set up port forwarding
# pachctl port-forward
![](https://img.haomeiwen.com/i15687234/ca5f4bd294584d4a.png)
5. Export the ADDRESS environment variable
# export ADDRESS=192.168.13.17:30650
6. Verify that the cluster is usable
# pachctl version
COMPONENT VERSION
pachctl   1.8.1
pachd     1.8.1
# pachctl list-repo
NAME CREATED SIZE
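A useful check at this point is that the client and the server report the same version. The sketch below parses the text shown above; it assumes the two-column layout from this run, and `parse_versions` is a hypothetical helper, not part of pachctl.

```python
SAMPLE = """COMPONENT VERSION
pachctl 1.8.1
pachd 1.8.1"""


def parse_versions(text):
    """Map component name to version, skipping the header row."""
    versions = {}
    for line in text.strip().splitlines()[1:]:
        parts = line.split()
        if len(parts) == 2:
            versions[parts[0]] = parts[1]
    return versions


versions = parse_versions(SAMPLE)
client_matches_server = versions.get("pachctl") == versions.get("pachd")
print(versions, client_matches_server)
```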
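A quick trial
This section deploys a Pachyderm pipeline that runs edge detection on images.
Whenever new data arrives in the input repository, the pipeline processes it automatically and writes the results to an output repository.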
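The idea of incremental processing can be illustrated with a toy model. This is not how Pachyderm is implemented; `ToyPipeline` is a hypothetical class that only remembers which file names it has seen and ignores modified or deleted files.

```python
class ToyPipeline:
    """Toy model: run a transform only on files not seen in earlier commits."""

    def __init__(self, transform):
        self.transform = transform
        self.seen = set()
        self.output = {}

    def on_commit(self, input_files):
        """Process newly added files and return their names."""
        new = sorted(set(input_files) - self.seen)
        for name in new:
            self.output[name] = self.transform(input_files[name])
            self.seen.add(name)
        return new


pipeline = ToyPipeline(transform=str.upper)
first = pipeline.on_commit({"a.txt": "hello"})
second = pipeline.on_commit({"a.txt": "hello", "b.txt": "world"})
print(first, second, pipeline.output)
```

The second commit triggers work only for `b.txt`, which is the behaviour the trial below relies on.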
Data repositories
Repository operations
Create a repository
# pachctl create-repo images
List repositories
# pachctl list-repo
NAME   CREATED        SIZE
images 13 seconds ago 0B
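Data operations
I. Add data to the repository
1. Add data manually with start-commit and finish-commit
Use cases: (1) large batch transfers that take some time; (2) attaching a description to the commit
2. Add data with put-file -f, which commits automatically
File sources: (1) a local file; (2) a URL; (3) an object storage bucket, which Pachyderm scrapes automatically
This example uses the automatic commit: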
# pachctl put-file images master liberty.png -f http://imgur.com/46Q8nDz.png
II. Inspecting the repository
1. Check the repository's storage size
# pachctl list-repo
NAME   CREATED       SIZE
images 6 minutes ago 57.27KiB
2. View the commit history
# pachctl list-commit images
REPO   COMMIT                           PARENT STARTED       DURATION           SIZE
images 8314e21c18144f7bb8bd464809a3fa93 <none> 3 minutes ago Less than a second 57.27KiB
3. List the files in a commit
# pachctl list-file images master
COMMIT                           NAME         TYPE COMMITTED     SIZE
8314e21c18144f7bb8bd464809a3fa93 /liberty.png file 5 minutes ago 57.27KiB
4. View a specific file
# pachctl get-file images master liberty.png | display
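When scripting against these listings, the human-readable SIZE column usually needs converting to bytes. A minimal sketch follows; only the B and KiB suffixes appear in the output above, so the MiB and GiB entries are an assumption, and `size_to_bytes` is a hypothetical helper.

```python
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}


def size_to_bytes(text):
    """Convert a string such as '57.27KiB' to a whole number of bytes."""
    # Try longer suffixes first so 'KiB' is not mistaken for plain 'B'.
    for unit in sorted(UNITS, key=len, reverse=True):
        if text.endswith(unit):
            return round(float(text[: -len(unit)]) * UNITS[unit])
    raise ValueError(f"unrecognised size: {text!r}")


print(size_to_bytes("57.27KiB"), size_to_bytes("0B"))
```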
Pachyderm pipelines
![](https://img.haomeiwen.com/i15687234/5dbf7aa2742bad82.png)
Pipeline configuration (JSON)
# edges.json
{
  "pipeline": {
    "name": "edges"
  },
  "transform": {
    "cmd": [ "python3", "/edges.py" ],
    "image": "pachyderm/opencv"
  },
  "input": {
    "pfs": {
      "repo": "images",
      "glob": "/*"
    }
  }
}
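The "glob" field controls how the input repository is split into units of work (datums): "/*" treats each top-level file or directory as its own datum, while "/" treats the whole repository as one. The toy model below covers only those two patterns; `datums_for` and the file name kitten.png are hypothetical and used purely for illustration.

```python
def datums_for(paths, glob):
    """Toy model of how '/' and '/*' partition a repo into datums."""
    if glob == "/":
        # The whole repository is a single datum.
        return [sorted(paths)]
    if glob == "/*":
        # Each top-level entry becomes its own datum.
        groups = {}
        for path in sorted(paths):
            top = "/" + path.lstrip("/").split("/")[0]
            groups.setdefault(top, []).append(path)
        return list(groups.values())
    raise ValueError("only '/' and '/*' are modelled here")


files = ["/liberty.png", "/kitten.png"]
per_file = datums_for(files, "/*")
whole_repo = datums_for(files, "/")
print(per_file, whole_repo)
```

With "/*", adding one more image should only require processing that image, which suits the per-file edge detection used here.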
The analysis code is shown below. It runs as a containerized application and is already packaged in the official image pachyderm/opencv.
# edges.py
import cv2
import numpy as np
from matplotlib import pyplot as plt
import os

# make_edges reads an image from /pfs/images and outputs the result of running
# edge detection on that image to /pfs/out. Note that /pfs/images and
# /pfs/out are special directories that Pachyderm injects into the container.
def make_edges(image):
    img = cv2.imread(image)
    tail = os.path.split(image)[1]
    edges = cv2.Canny(img, 100, 200)
    plt.imsave(os.path.join("/pfs/out", os.path.splitext(tail)[0] + '.png'), edges, cmap='gray')

# walk /pfs/images and call make_edges on every file found
for dirpath, dirs, files in os.walk("/pfs/images"):
    for file in files:
        make_edges(os.path.join(dirpath, file))
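The output naming in make_edges is worth a closer look: it keeps only the base name of the input and forces a .png extension. The sketch below isolates that path logic so it can be checked without OpenCV; `output_path_for` and the nested example path are hypothetical, and posixpath is used because the container paths are POSIX paths.

```python
import posixpath


def output_path_for(image, out_dir="/pfs/out"):
    """Mirror the path handling in make_edges: base name only, .png suffix."""
    tail = posixpath.split(image)[1]
    return posixpath.join(out_dir, posixpath.splitext(tail)[0] + ".png")


flat = output_path_for("/pfs/images/liberty.png")
nested = output_path_for("/pfs/images/2019/photo.jpg")
print(flat, nested)
```

One consequence is that subdirectories are flattened, so two inputs with the same base name in different directories would map to the same output file.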
Create the pipeline
# pachctl create-pipeline -f edges.json
Add a second pipeline for multi-step analysis
![](https://img.haomeiwen.com/i15687234/2e457150944f50fe.png)
Pipeline configuration
# montage.json
{
  "pipeline": {
    "name": "montage"
  },
  "input": {
    "cross": [ {
      "pfs": {
        "glob": "/",
        "repo": "images"
      }
    },
    {
      "pfs": {
        "glob": "/",
        "repo": "edges"
      }
    } ]
  },
  "transform": {
    "cmd": [ "sh" ],
    "image": "v4tech/imagemagick",
    "stdin": [ "montage -shadow -background SkyBlue -geometry 300x300+2+2 $(find /pfs -type f | sort) /pfs/out/montage.png" ]
  }
}
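The "cross" input pairs every datum of one input with every datum of the other. Because both inputs here use the glob "/", each repository contributes a single datum, so the montage command sees all images and all edge images in one run. The toy model below shows the difference; the `cross` helper and the a.png/b.png names are hypothetical.

```python
from itertools import product


def cross(*inputs):
    """Toy model of a cross input: the Cartesian product of the datum lists."""
    return list(product(*inputs))


# glob "/" on both repos: one datum each, so a single combined job.
whole = cross(["images:/"], ["edges:/"])

# Hypothetical contrast with glob "/*" and two files per repo: four pairs.
per_file = cross(
    ["images:/a.png", "images:/b.png"],
    ["edges:/a.png", "edges:/b.png"],
)
print(len(whole), len(per_file))
```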
Mount pfs
# mkdir pfs
# pachctl mount ./pfs
nodefs.MountRoot: exec: "/bin/fusermount": stat /bin/fusermount: no such file or directory
/bin/fusermount is missing. To fix this:
# yum install fuse
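Before retrying the mount, it can help to confirm that the helper binary is now present. A minimal check using only the Python standard library follows; `fusermount_status` is a hypothetical helper, and the /bin/fusermount path is taken from the error message above.

```python
import os
import shutil


def fusermount_status():
    """Report whether fusermount is on PATH and at the path pachctl looked for."""
    return {
        "on_path": shutil.which("fusermount"),
        "at_bin": os.path.exists("/bin/fusermount"),
    }


status = fusermount_status()
print(status)
```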
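The command then appears to hang
# pachctl mount pfs
What exactly is happening, and how should it be resolved? One likely explanation is that pachctl mount runs in the foreground and blocks while it serves the mount, in which case the mounted directory would be used from a second terminal; this has not been verified here. It does not affect the main test flow, so it is left for later.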
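Summary
This completes the deployment of Pachyderm and a simple test based on the official example. As I use Pachyderm further, I plan to publish more in-depth write-ups on it.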