Apache Zeppelin 0.9 Upgrade: A Record of the Problems Encountered

Author: 六层楼那么高 | Published: 2021-09-26 16:00

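0. Problems with the old version

Our in-house Zeppelin 0.7.2 deployment has been in use for more than three years and has accumulated a number of unresolved problems that urgently need fixing:

No support for Spark versions above 2.2, while Spark 2.2 has a serious bug where YARN resources are not released after execution finishes
No support for yarn cluster mode, so every driver runs on the server node, which can become a resource bottleneck [ZEPPELIN-2898]
Once there are many notebooks (5000+), the server takes several minutes to start, and loading a notebook occasionally stalls
The server is a single point of failure; there is no high availability
Code auto-completion in the editor is not friendly enough and TAB completion is not supported [ZEPPELIN-277]
The front end freezes easily
Every interpreter update requires restarting the whole interpreter, which is unfriendly in a multi-tenant setup. What users notice is that queries become slow after a while, because the time goes into the first start of the interpreter. [ZEPPELIN-1770]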

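After some investigation, it turned out that most of these problems are solved in 0.9. In addition, 0.9 brings many new features, for example:

The shell interpreter supports an interactive terminal mode, i.e. a web terminal, which greatly improves the user experience
Several new interpreters, including the Groovy interpreter [ZEPPELIN-2176]
The notebook layer was refactored for better performance, and note storage supports backends such as MongoDB

After downloading the source, building and packaging it (see my earlier article), and deploying it, I ran into quite a few problems, recorded below: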

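1. TGT not found with Kerberos enabled

In an environment with Kerberos enabled, kinit had already been run before the interpreter was started, yet submitting a job still failed with an error saying that no TGT could be found. The same setup worked in 0.7.2, and it is still unclear why the ticket is no longer found after the upgrade. Having run into a similar problem before, I worked around it through configuration: add the following Java option to zeppelin-env.sh so that the Kerberos ticket can be obtained from the local cache:
-Djavax.security.auth.useSubjectCredsOnly=false
See the official documentation for an explanation of this parameter.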

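2. Changes to the Hive JDBC proxy user configuration in the new version

In environments without Kerberos, the Hive JDBC interpreter needs a proxy user to be configured. The old way of configuring it no longer works. For the new version, follow the official documentation and set default.proxy.user.property to hive.server2.proxy.user: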

Name	Value
default.driver	org.apache.hive.jdbc.HiveDriver
default.url	jdbc:hive2://localhost:10000
default.user	hive_user
default.password	hive_password
default.proxy.user.property	hive.server2.proxy.user
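
To make the effect of default.proxy.user.property more concrete, the sketch below shows roughly how the connection URL would differ per logged-in user. It is an illustration in Python, not Zeppelin's own code: the helper name build_proxy_url and the user alice are made up, and it assumes the interpreter adds the property as a ;key=value session variable ahead of any ? part of the URL.

```python
def build_proxy_url(base_url, proxy_property, user):
    """Return base_url with ';<proxy_property>=<user>' placed before any '?' part.

    Hypothetical helper for illustration only; the real interpreter may
    assemble the URL differently.
    """
    head, sep, tail = base_url.partition("?")
    return head + ";" + proxy_property + "=" + user + sep + tail


if __name__ == "__main__":
    # default.url and default.proxy.user.property from the table above,
    # with a made-up login user "alice".
    print(build_proxy_url("jdbc:hive2://localhost:10000",
                          "hive.server2.proxy.user", "alice"))
```

With the values from the table, the URL handed to the Hive driver would carry hive.server2.proxy.user=alice, so HiveServer2 runs the query on behalf of alice while still authenticating as default.user.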

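3. Notebook migration

3.1 Migrating the old data

The new Zeppelin ships a notebook migration script (upgrade-note.sh in the bin directory), and the migration relies mainly on it. At the same time, we switched from local storage to MongoDB:

Copy everything under conf and all notes under notebook into the corresponding directories of the new version
Use a script to clean up notes from before 2019 and notes that take up an excessive amount of storage
Run the note upgrade command bin/upgrade-note.sh -d (needed because 0.9 changed the note file structure and naming)
Delete the malformed note files, otherwise the notebook server throws exceptions at runtime: rm notebook/_*
Update notebook-authorization.json with a Python script (0.8 changed the permission model and added the runner role)
Stop zeppelin-server and edit conf/zeppelin-env.sh to set the storage to MongoDB
Create a new database in MongoDB: use xxdb
After the restart, the Zeppelin server automatically migrates the local notebooks to MongoDB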
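
The cleanup step above was done with a script that is not shown in this post. A minimal sketch of what such a script might look like follows; it only lists candidates and deletes nothing. The function name find_stale_notes, the 50 MB size limit, the use of file modification time as the note's age, and the old layout notebook/<note id>/note.json are all assumptions made for illustration.

```python
import datetime
import os


def find_stale_notes(notebook_dir, cutoff_ts, max_bytes):
    """Return (note_dir, reason) pairs for notes that are too old or too large."""
    stale = []
    for entry in sorted(os.listdir(notebook_dir)):
        note_file = os.path.join(notebook_dir, entry, "note.json")
        if not os.path.isfile(note_file):
            continue  # not a note directory in the assumed layout
        info = os.stat(note_file)
        if info.st_mtime < cutoff_ts:
            stale.append((os.path.join(notebook_dir, entry), "last modified before cutoff"))
        elif info.st_size > max_bytes:
            stale.append((os.path.join(notebook_dir, entry), "larger than size limit"))
    return stale


if __name__ == "__main__":
    cutoff = datetime.datetime(2019, 1, 1).timestamp()  # local time
    size_limit = 50 * 1024 * 1024  # assumed threshold, adjust as needed
    if os.path.isdir("./notebook"):
        for note_dir, reason in find_stale_notes("./notebook", cutoff, size_limit):
            print(note_dir, reason)
```

Reviewing the printed list before removing anything keeps the cleanup reversible; the actual deletion can then be done by hand or with shutil.rmtree on the listed directories.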

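3.2 Notebook permission problem after the upgrade

Zeppelin 0.8.x refactored the notebook permission module and added a runner role, so permissions migrated from the older version are not compatible. See the solution in the official issue.

This means runners has to be added to each note's entry in conf/notebook-authorization.json, otherwise the permissions are not recognized.

A short Python script takes care of the update: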

import json

# Adjust to your installation.
note_auth_json_file = 'PATH_TO_ZEPPELIN/conf/notebook-authorization.json'

with open(note_auth_json_file, 'r') as f:
    note_auth_json = json.load(f)

# Give every note that has writers but no runners a runners list copied from its writers.
for note_id, auth in note_auth_json['authInfo'].items():
    print(note_id)
    if 'writers' in auth and 'runners' not in auth:
        auth['runners'] = list(auth['writers'])
        print(auth['runners'])

with open(note_auth_json_file, 'w') as f:
    json.dump(note_auth_json, f)

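3.3 Notes lose their default interpreter binding

Every notebook has a default interpreter. For example, a notebook created with the spark type uses spark by default, so the code does not have to start with %spark on its first line. After migrating to the new version, the users' default interpreter bindings were lost and every note fell back to spark, so non-spark notes no longer ran as expected.

I found the official issue, https://issues.apache.org/jira/browse/ZEPPELIN-5309, but nobody had fixed it, so I wrote a Python script to repair the problem and also submitted it upstream.

The script first loads the interpreter mapping from the old version, then goes through all notes and writes the default interpreter binding into each new note JSON file.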

# -*- coding: utf-8 -*-
import fnmatch
import json
import os

# interpreter.json taken from the old version's conf directory
old_interpreter_file = './interpreter.json'

with open(old_interpreter_file, 'r') as f:
    int_json = json.load(f)

# Map interpreter setting id -> interpreter name (e.g. "spark").
interpreter_names = {}
for interpreter_id, setting in int_json['interpreterSettings'].items():
    interpreter_names[interpreter_id] = setting['name']

# Index the migrated notes once instead of searching the tree for every note id.
note_files = []
for root, _dirs, files in os.walk('./notebook'):
    for name in fnmatch.filter(files, '*.zpln'):
        note_files.append(os.path.join(root, name))

for note_id, bindings in int_json['interpreterBindings'].items():
    if not bindings:
        continue  # nothing was bound to this note in the old version
    # The first bound interpreter is treated as the note's old default.
    default_group = interpreter_names[bindings[0]]
    for path in note_files:
        if not path.endswith(note_id + '.zpln'):
            continue
        print("find file: " + path)
        print("note_id:" + note_id + " default binding interpreter is: " + default_group)
        with open(path, 'r') as nf:
            note_json = json.load(nf)
        note_json["defaultInterpreterGroup"] = default_group
        with open(path, 'w') as nf:
            nf.write(json.dumps(note_json, indent=4))
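
After a fix like this it helps to check the result. The sketch below is not part of the script submitted upstream; it simply counts the migrated notes by their defaultInterpreterGroup value so that an unexpected pile of spark notes stands out. The function name count_default_groups is made up, and the sketch assumes the notes are UTF-8 JSON files with a .zpln extension under ./notebook.

```python
import collections
import io
import json
import os


def count_default_groups(notebook_dir):
    """Count .zpln notes under notebook_dir by their defaultInterpreterGroup."""
    counts = collections.Counter()
    for root, _dirs, files in os.walk(notebook_dir):
        for name in files:
            if not name.endswith(".zpln"):
                continue
            with io.open(os.path.join(root, name), "r", encoding="utf-8") as fh:
                note = json.load(fh)
            # Notes without the field are grouped under a placeholder key.
            counts[note.get("defaultInterpreterGroup", "<missing>")] += 1
    return counts


if __name__ == "__main__":
    for group, total in count_default_groups("./notebook").most_common():
        print(group, total)
```

Running it before and after the fix gives a quick before/after comparison without opening individual notes.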

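4. Automatic login fails until a new shiro entry is added

Our current shiro configuration requires authentication for every request except api/version:

/api/version = anon
/** = authc

/api/cluster/address is an endpoint added in the new version, and it is requested before the automatic login takes place. Because it goes through authentication, the request is redirected to /api/login, which makes the automatic login fail.

Adding the following entry to shiro.ini solves it: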

/api/cluster/address = anon
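
Why the extra line is needed becomes clearer once you recall that shiro evaluates its URL entries in the order they are defined and applies the first pattern that matches. The sketch below imitates that lookup in Python. It is only an approximation for illustration: fnmatch stands in for shiro's Ant-style path matcher, and resolve_filter is a made-up name, not a shiro API.

```python
from fnmatch import fnmatchcase


def resolve_filter(path, url_rules):
    """Return the filter of the first rule whose pattern matches path."""
    for pattern, filter_name in url_rules:
        if fnmatchcase(path, pattern):
            return filter_name
    return None


# Rules before the fix: everything except /api/version needs authentication.
OLD_RULES = [("/api/version", "anon"), ("/**", "authc")]

# Rules after the fix: the new endpoint is listed ahead of the catch-all.
NEW_RULES = [("/api/version", "anon"),
             ("/api/cluster/address", "anon"),
             ("/**", "authc")]

if __name__ == "__main__":
    print(resolve_filter("/api/cluster/address", OLD_RULES))  # authc
    print(resolve_filter("/api/cluster/address", NEW_RULES))  # anon
```

Because the first match wins, the new entry has to appear above the /** = authc line; placed below it, the catch-all would still capture the request.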
