Kylin configuration file
- The kylin.properties file contains a large number of settings; this post walks through them one by one.
- The configuration is organized into the following 15 sections:
- METADATA | ENV (metadata and Kylin deployment mode)
- SERVER | WEB | RESTCLIENT (Kylin web service configuration)
- PUBLIC CONFIG (basic Kylin web settings; rarely needs changes)
- SOURCE (data source configuration)
- STORAGE (data storage)
- JOB (Kylin cube build job settings)
- ENGINE (build engine configuration)
- CUBE | DICTIONARY (cube and dictionary settings)
- QUERY (query settings)
- SECURITY (access control for Kylin web UI login)
- SPARK ENGINE CONFIGS (Spark engine configuration)
- QUERY PUSH DOWN (query pushdown engine configuration)
- JDBC Data Source (JDBC data source configuration)
- Livy with Kylin (submitting Spark jobs through Livy)
- Realtime OLAP (Kylin streaming OLAP configuration)
Sections of kylin.properties
- When deploying Kylin, the vast majority of parameters can be left at the defaults shipped in the file.
- When a change is needed, modify the setting in place within its section; this makes it easy to find later and to compare against the default.
METADATA | ENV
#
#### METADATA | ENV ###
#
## The metadata store in hbase
Name of the HBase table that stores Kylin metadata
#kylin.metadata.url=kylin_metadata@hbase
#
## metadata cache sync retry times
#kylin.metadata.sync-retries=3
#
## Working folder in HDFS, better be qualified absolute path, make sure user has the right permission to this directory
HDFS working directory for Kylin; by default, metadata entries larger than 10 MB are stored on HDFS. This path is linked to the spark-history path in the SPARK ENGINE CONFIGS section below, so keeping the default is recommended.
#kylin.env.hdfs-working-dir=/kylin
#
## DEV|QA|PROD. DEV will turn on some dev features, QA and PROD has no difference in terms of functions.
Kylin runtime environment. A test environment can use DEV; production can keep the default QA.
#kylin.env=QA
#
## kylin zk base path
Base path for Kylin in ZooKeeper. Kylin keeps information there such as distributed job scheduling state and [dict, job_engine, create_htable] entries; the Kylin source reads metadata from multiple threads and uses ZooKeeper for locking.
#kylin.env.zookeeper-base-path=/kylin
#
SERVER | WEB | RESTCLIENT
- Kylin server mode, web service settings, etc.; adjust this section to match your deployment
#### SERVER | WEB | RESTCLIENT ###
#
## Kylin server mode, valid value [all, query, job]
Kylin server mode. Use all for a single-node deployment; in cluster mode, set each node's mode as needed.
#kylin.server.mode=all
#
## List of web servers in use, this enables one web server instance to sync up with other servers.
Mainly for cluster mode; separate the servers with commas.
#kylin.server.cluster-servers=localhost:7070
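As a sketch, a hypothetical three-node cluster (hostnames are placeholders) might dedicate one node to building and two to queries:

```properties
# Hypothetical 3-node deployment; hostnames/ports are placeholders.
# On the build node:
kylin.server.mode=job
# On each query node:
kylin.server.mode=query
# On every node, list all instances so caches stay in sync:
kylin.server.cluster-servers=kylin1:7070,kylin2:7070,kylin3:7070
```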
#
## Display timezone on UI,format like[GMT+N or GMT-N]
Timezone displayed in the Kylin web UI; for China, set GMT+8.
#kylin.web.timezone=
#
## Timeout value for the queries submitted through the Web UI, in milliseconds
#kylin.web.query-timeout=300000
#
#kylin.web.cross-domain-enabled=true
#
##allow user to export query result
#kylin.web.export-allow-admin=true
#kylin.web.export-allow-other=true
#
## Hide measures in measure list of cube designer, separate by comma
#kylin.web.hide-measures=RAW
#
##max connections of one route
#kylin.restclient.connection.default-max-per-route=20
#
##max connections of one rest-client
#kylin.restclient.connection.max-total=200
#
PUBLIC CONFIG
- Common Kylin settings; modification is not recommended, keeping the defaults is best
#### PUBLIC CONFIG ###
The default build engine is MapReduce
#kylin.engine.default=2
The default storage is HBase
#kylin.storage.default=2
#kylin.web.hive-limit=20
#kylin.web.help.length=4
#kylin.web.help.0=start|Getting Started|http://kylin.apache.org/docs/tutorial/kylin_sample.html
#kylin.web.help.1=odbc|ODBC Driver|http://kylin.apache.org/docs/tutorial/odbc.html
#kylin.web.help.2=tableau|Tableau Guide|http://kylin.apache.org/docs/tutorial/tableau_91.html
#kylin.web.help.3=onboard|Cube Design Tutorial|http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
#kylin.web.link-streaming-guide=http://kylin.apache.org/
#kylin.htrace.show-gui-trace-toggle=false
#kylin.web.link-hadoop=
#kylin.web.link-diagnostic=
#kylin.web.contact-mail=
#kylin.server.external-acl-provider=
#
## Default time filter for job list, 0->current day, 1->last one day, 2->last one week, 3->last one year, 4->all
Default time range shown on Kylin pages, e.g. the time filter on the Monitor page
#kylin.web.default-time-filter=1
#
SOURCE
#### SOURCE ###
#
## Hive client, valid value [cli, beeline]
The Hive CLI is used by default
#kylin.source.hive.client=cli
#
## Absolute path to beeline shell, can be set to spark beeline instead of the default hive beeline on PATH
#kylin.source.hive.beeline-shell=beeline
#
## Parameters for beeline client, only necessary if hive client is beeline
##kylin.source.hive.beeline-params=-n root --hiveconf hive.security.authorization.sqlstd.confwhitelist.append='mapreduce.job.*|dfs.*' -u jdbc:hive2://localhost:10000
#
## While hive client uses above settings to read hive table metadata,
## table operations can go through a separate SparkSQL command line, given SparkSQL connects to the same Hive metastore.
#kylin.source.hive.enable-sparksql-for-table-ops=false
##kylin.source.hive.sparksql-beeline-shell=/path/to/spark-client/bin/beeline
##kylin.source.hive.sparksql-beeline-params=-n root --hiveconf hive.security.authorization.sqlstd.confwhitelist.append='mapreduce.job.*|dfs.*' -u jdbc:hive2://localhost:10000
#
The intermediate flat table is cleaned up after the Kylin job finishes
#kylin.source.hive.keep-flat-table=false
#
## Hive database name for putting the intermediate flat tables
Intermediate tables created during a Kylin job are stored in Hive's default database; keeping the default is recommended.
#kylin.source.hive.database-for-flat-table=default
#
## Whether redistribute the intermediate flat table before building
#kylin.source.hive.redistribute-flat-table=true
#
#
STORAGE
#### STORAGE ###
#
## The storage for final cube file in hbase
#kylin.storage.url=hbase
#
## The prefix of hbase table
Prefix for the HBase table names of Kylin segments
#kylin.storage.hbase.table-name-prefix=KYLIN_
#
## The namespace for hbase storage
HBase namespace in which Kylin stores its tables
#kylin.storage.hbase.namespace=default
#
## Compression codec for htable, valid value [none, snappy, lzo, gzip, lz4]
Whether HTables are compressed; if you enable a codec, first verify that the cluster environment supports that compression format.
#kylin.storage.hbase.compression-codec=none
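For example, enabling Snappy is a common choice — a sketch, assuming the cluster's native libraries support it (`hadoop checknative -a` can confirm on each region server):

```properties
# Only after verifying native Snappy support across the HBase cluster:
kylin.storage.hbase.compression-codec=snappy
```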
#
## HBase Cluster FileSystem, which serving hbase, format as hdfs://hbase-cluster:8020
## Leave empty if hbase running on same cluster with hive and mapreduce
##kylin.storage.hbase.cluster-fs=
#
## The cut size for hbase region, in GB.
#kylin.storage.hbase.region-cut-gb=5
#
## The HFile size in GB; a smaller HFile size gives the HFile-converting MR job more reducers, making it faster.
## Set 0 to disable this optimization.
#kylin.storage.hbase.hfile-size-gb=2
#
#kylin.storage.hbase.min-region-count=1
#kylin.storage.hbase.max-region-count=500
#
## Optional information for the owner of kylin platform, it can be your team's email
## Currently it will be attached to each kylin's htable attribute
The owner tag appears in the corresponding HBase table schema
#kylin.storage.hbase.owner-tag=whoami@kylin.apache.org
#
#kylin.storage.hbase.coprocessor-mem-gb=3
#
## By default kylin can spill query's intermediate results to disks when it's consuming too much memory.
## Set it to false if you want query to abort immediately in such condition.
#kylin.storage.partition.aggr-spill-enabled=true
#
## The maximum number of bytes each coprocessor is allowed to scan.
## To allow arbitrary large scan, you can set it to 0.
#kylin.storage.partition.max-scan-bytes=3221225472
#
## The default coprocessor timeout is (hbase.rpc.timeout * 0.9) / 1000 seconds,
## You can set it to a smaller value. 0 means use default.
## kylin.storage.hbase.coprocessor-timeout-seconds=0
#
## clean real storage after delete operation
## if you want to delete the real storage like htable of deleting segment, you can set it to true
Whether to drop the corresponding HTable after a cube segment is deleted. Setting this to true is recommended; otherwise stale HTables accumulate in HBase.
#kylin.storage.clean-after-delete-operation=false
#
JOB
- Kylin job settings, including retries, concurrency, email notification, distributed scheduling, etc.; configure to match your deployment
#### JOB ###
#
## Max job retry on error, default 0: no retry
Number of retries for a Kylin job that fails with an error
#kylin.job.retry=0
#
## Max count of concurrent jobs running
Maximum number of Kylin jobs running concurrently
#kylin.job.max-concurrent-jobs=10
#
## The percentage of the sampling, default 100%
#kylin.job.sampling-percentage=100
#
## If true, will send email notification on job complete
Email notification settings; enabling them is recommended
##kylin.job.notification-enabled=true
##kylin.job.notification-mail-enable-starttls=true
##kylin.job.notification-mail-host=smtp.office365.com
##kylin.job.notification-mail-port=587
##kylin.job.notification-mail-username=kylin@example.com
##kylin.job.notification-mail-password=mypassword
##kylin.job.notification-mail-sender=kylin@example.com
- Uncomment this when setting up distributed job scheduling
#kylin.job.scheduler.provider.100=org.apache.kylin.job.impl.curator.CuratorScheduler
Change this when enabling distributed scheduling
#kylin.job.scheduler.default=0
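As a sketch, switching to the Curator-based distributed scheduler means uncommenting the provider registered under id 100 and pointing the default scheduler at it (verify against your Kylin version's documentation):

```properties
# Register the Curator scheduler under provider id 100, then select it;
# leave kylin.job.scheduler.default=0 for the default single-node scheduler.
kylin.job.scheduler.provider.100=org.apache.kylin.job.impl.curator.CuratorScheduler
kylin.job.scheduler.default=100
```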
#
ENGINE
- Settings for the default MapReduce engine; the defaults are usually fine
#### ENGINE ###
#
## Time interval to check hadoop job status
Interval at which Kylin polls MapReduce job status
#kylin.engine.mr.yarn-check-interval-seconds=10
#
#kylin.engine.mr.reduce-input-mb=500
#
#kylin.engine.mr.max-reducer-number=500
#
#kylin.engine.mr.mapper-input-rows=1000000
#
## Enable dictionary building in MR reducer
#kylin.engine.mr.build-dict-in-reducer=true
#
## Number of reducers for fetching UHC column distinct values
#kylin.engine.mr.uhc-reducer-count=3
#
## Whether using an additional step to build UHC dictionary
#kylin.engine.mr.build-uhc-dict-in-additional-step=false
#
#
CUBE | DICTIONARY
- Cube and dictionary settings; the defaults are usually fine
#### CUBE | DICTIONARY ###
#
#kylin.cube.cuboid-scheduler=org.apache.kylin.cube.cuboid.DefaultCuboidScheduler
#kylin.cube.segment-advisor=org.apache.kylin.cube.CubeSegmentAdvisor
#
## 'auto', 'inmem', 'layer' or 'random' for testing
Cube build algorithm
#kylin.cube.algorithm=layer
#
## A smaller threshold prefers layer, a larger threshold prefers in-mem
#kylin.cube.algorithm.layer-or-inmem-threshold=7
#
## auto use inmem algorithm:
## 1, cube planner optimize job
## 2, no source record
#kylin.cube.algorithm.inmem-auto-optimize=true
#
#kylin.cube.aggrgroup.max-combination=32768
#
#kylin.snapshot.max-mb=300
#
#kylin.cube.cubeplanner.enabled=true
#kylin.cube.cubeplanner.enabled-for-existing-cube=true
#kylin.cube.cubeplanner.expansion-threshold=15.0
#kylin.cube.cubeplanner.recommend-cache-max-size=200
#kylin.cube.cubeplanner.mandatory-rollup-threshold=1000
#kylin.cube.cubeplanner.algorithm-threshold-greedy=8
#kylin.cube.cubeplanner.algorithm-threshold-genetic=23
#
#
QUERY
- Query-time settings; keeping the defaults is recommended, though advanced users can tune them
#### QUERY ###
#
## Controls the maximum number of bytes a query is allowed to scan storage.
## The default value 0 means no limit.
## The counterpart kylin.storage.partition.max-scan-bytes sets the maximum per coprocessor.
#kylin.query.max-scan-bytes=0
#
#kylin.query.cache-enabled=true
#
## Controls extras properties for Calcite jdbc driver
## all extras properties should be under the prefix "kylin.query.calcite.extras-props."
## case sensitive, default: true, to enable case insensitive set it to false
## @see org.apache.calcite.config.CalciteConnectionProperty.CASE_SENSITIVE
#kylin.query.calcite.extras-props.caseSensitive=true
## how to handle unquoted identifiers, default: TO_UPPER, available options: UNCHANGED, TO_UPPER, TO_LOWER
## @see org.apache.calcite.config.CalciteConnectionProperty.UNQUOTED_CASING
#kylin.query.calcite.extras-props.unquotedCasing=TO_UPPER
## quoting method, default: DOUBLE_QUOTE, available options: DOUBLE_QUOTE, BACK_TICK, BRACKET
## @see org.apache.calcite.config.CalciteConnectionProperty.QUOTING
#kylin.query.calcite.extras-props.quoting=DOUBLE_QUOTE
## change SqlConformance from DEFAULT to LENIENT to enable group by ordinal
## @see org.apache.calcite.sql.validate.SqlConformance.SqlConformanceEnum
#kylin.query.calcite.extras-props.conformance=LENIENT
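For example, to make unquoted identifiers case-insensitive (a common tweak when BI tools emit lowercase column names), the Calcite extras can be combined as below — a sketch, to be verified against your Kylin version:

```properties
# Match identifiers case-insensitively and fold unquoted names to lower case
# (some versions require unquotedCasing to be set when caseSensitive=false)
kylin.query.calcite.extras-props.caseSensitive=false
kylin.query.calcite.extras-props.unquotedCasing=TO_LOWER
```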
#
## TABLE ACL
#kylin.query.security.table-acl-enabled=true
#
## Usually should not modify this
#kylin.query.interceptors=org.apache.kylin.rest.security.TableInterceptor
#
#kylin.query.escape-default-keyword=false
#
## Usually should not modify this
#kylin.query.transformers=org.apache.kylin.query.util.DefaultQueryTransformer,org.apache.kylin.query.util.KeywordDefaultDirtyHack
#
SECURITY
- Access management for Kylin web UI users; a very powerful feature
#### SECURITY ###
#
## Spring security profile, options: testing, ldap, saml
## with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN to login
#kylin.security.profile=testing
#
## Admin roles in LDAP, for ldap and saml
#kylin.security.acl.admin-role=admin
#
## LDAP authentication configuration
#kylin.security.ldap.connection-server=ldap://ldap_server:389
#kylin.security.ldap.connection-username=
#kylin.security.ldap.connection-password=
#
## LDAP user account directory;
#kylin.security.ldap.user-search-base=
#kylin.security.ldap.user-search-pattern=
#kylin.security.ldap.user-group-search-base=
#kylin.security.ldap.user-group-search-filter=(|(member={0})(memberUid={1}))
#
## LDAP service account directory
#kylin.security.ldap.service-search-base=
#kylin.security.ldap.service-search-pattern=
#kylin.security.ldap.service-group-search-base=
#
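A minimal LDAP sketch with placeholder values (server address, bind account, and search bases are assumptions to adapt; some Kylin versions expect the bind password to be encrypted with Kylin's password tool — check the docs):

```properties
# All values below are placeholders for illustration
kylin.security.profile=ldap
kylin.security.ldap.connection-server=ldap://ldap.example.com:389
kylin.security.ldap.connection-username=cn=admin,dc=example,dc=com
kylin.security.ldap.connection-password=change-me
kylin.security.ldap.user-search-base=ou=people,dc=example,dc=com
kylin.security.ldap.user-search-pattern=(&(cn={0}))
kylin.security.ldap.user-group-search-base=ou=group,dc=example,dc=com
```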
### SAML configurations for SSO
## SAML IDP metadata file location
#kylin.security.saml.metadata-file=classpath:sso_metadata.xml
#kylin.security.saml.metadata-entity-base-url=https://hostname/kylin
#kylin.security.saml.keystore-file=classpath:samlKeystore.jks
#kylin.security.saml.context-scheme=https
#kylin.security.saml.context-server-name=hostname
#kylin.security.saml.context-server-port=443
#kylin.security.saml.context-path=/kylin
#
SPARK ENGINE CONFIGS
- When Spark is the cube build engine, adjust these settings to your environment
#### SPARK ENGINE CONFIGS ###
#
## Hadoop conf folder, will export this as "HADOOP_CONF_DIR" to run spark-submit
## This must contain site xmls of core, yarn, hive, and hbase in one folder
##kylin.env.hadoop-conf-dir=/etc/hadoop/conf
#
## Estimate the RDD partition numbers
#kylin.engine.spark.rdd-partition-cut-mb=10
#
## Minimal partition numbers of rdd
#kylin.engine.spark.min-partition=1
#
## Max partition numbers of rdd
#kylin.engine.spark.max-partition=5000
#
## Spark conf (default is in spark/conf/spark-defaults.conf)
#kylin.engine.spark-conf.spark.master=yarn
##kylin.engine.spark-conf.spark.submit.deployMode=cluster
#kylin.engine.spark-conf.spark.yarn.queue=default
#kylin.engine.spark-conf.spark.driver.memory=2G
#kylin.engine.spark-conf.spark.executor.memory=4G
#kylin.engine.spark-conf.spark.executor.instances=40
#kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=1024
#kylin.engine.spark-conf.spark.shuffle.service.enabled=true
#kylin.engine.spark-conf.spark.eventLog.enabled=true
#kylin.engine.spark-conf.spark.eventLog.dir=hdfs\:///kylin/spark-history
#kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs\:///kylin/spark-history
#kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
#
#### Spark conf for specific job
#kylin.engine.spark-conf-mergedict.spark.executor.memory=6G
#kylin.engine.spark-conf-mergedict.spark.memory.fraction=0.2
#
## manually upload spark-assembly jar to HDFS and then set this property will avoid repeatedly uploading jar at runtime
##kylin.engine.spark-conf.spark.yarn.archive=hdfs://namenode:8020/kylin/spark/spark-libs.jar
##kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
#
## uncomment for HDP
##kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
##kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
##kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
#
#
QUERY PUSH DOWN
- Kylin query pushdown engine; can be pointed at Impala, Presto, etc.
#### QUERY PUSH DOWN ###
#
##kylin.query.pushdown.runner-class-name=org.apache.kylin.query.adhoc.PushDownRunnerJdbcImpl
#
##kylin.query.pushdown.update-enabled=false
##kylin.query.pushdown.jdbc.url=jdbc:hive2://sandbox:10000/default
##kylin.query.pushdown.jdbc.driver=org.apache.hive.jdbc.HiveDriver
##kylin.query.pushdown.jdbc.username=hive
##kylin.query.pushdown.jdbc.password=
#
##kylin.query.pushdown.jdbc.pool-max-total=8
##kylin.query.pushdown.jdbc.pool-max-idle=8
##kylin.query.pushdown.jdbc.pool-min-idle=0
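As a hypothetical variation, the generic JDBC runner above could point at Presto instead of Hive (host, catalog, and schema are placeholders; the Presto JDBC driver jar must be on Kylin's classpath):

```properties
# Hypothetical Presto pushdown via the generic JDBC runner
kylin.query.pushdown.runner-class-name=org.apache.kylin.query.adhoc.PushDownRunnerJdbcImpl
kylin.query.pushdown.jdbc.url=jdbc:presto://presto-host:8080/hive/default
kylin.query.pushdown.jdbc.driver=com.facebook.presto.jdbc.PrestoDriver
kylin.query.pushdown.jdbc.username=kylin
kylin.query.pushdown.jdbc.password=
```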
#
JDBC Data Source
#### JDBC Data Source
##kylin.source.jdbc.connection-url=
##kylin.source.jdbc.driver=
##kylin.source.jdbc.dialect=
##kylin.source.jdbc.user=
##kylin.source.jdbc.pass=
##kylin.source.jdbc.sqoop-home=
##kylin.source.jdbc.filed-delimiter=|
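As a sketch, a MySQL data source might be wired up like this (hostname, database, and credentials are placeholders; the MySQL JDBC driver and Sqoop must be installed):

```properties
# Placeholder values; depending on version you may also need to select
# the JDBC source type (e.g. kylin.source.default=8) -- check the docs.
kylin.source.jdbc.connection-url=jdbc:mysql://mysql-host:3306/sales
kylin.source.jdbc.driver=com.mysql.jdbc.Driver
kylin.source.jdbc.dialect=mysql
kylin.source.jdbc.user=kylin
kylin.source.jdbc.pass=change-me
kylin.source.jdbc.sqoop-home=/usr/local/sqoop
```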
Livy with Kylin
#### Livy with Kylin
##kylin.engine.livy-conf.livy-enabled=false
##kylin.engine.livy-conf.livy-url=http://LivyHost:8998
##kylin.engine.livy-conf.livy-key.file=hdfs:///path-to-kylin-job-jar
##kylin.engine.livy-conf.livy-arr.jars=hdfs:///path-to-hadoop-dependency-jar
#
#
Realtime OLAP
#### Realtime OLAP ###
#
## Where the local segment cache is located; for a relative path, the real path will be ${KYLIN_HOME}/${kylin.stream.index.path}
#kylin.stream.index.path=stream_index
#
## The timezone for Derived Time Column like hour_start, try set to GMT+N, please check detail at KYLIN-4010
#kylin.stream.event.timezone=
#
## Debug switch for print realtime global dict encode information, please check detail at KYLIN-4141
#kylin.stream.print-realtime-dict-enabled=false
#
## Should enable latest coordinator, please check detail at KYLIN-4167
#kylin.stream.new.coordinator-enabled=true
#
## In which way should we collect receiver's metrics info
##kylin.stream.metrics.option=console/csv/jmx
#
## When enabling a streaming cube, whether to consume from the earliest or the latest offset
#kylin.stream.consume.offsets.latest=true
#
## The parallelism of scan in receiver side
#kylin.stream.receiver.use-threads-per-query=8
#
## How the coordinator/receiver registers itself into StreamMetadata; there are three options:
## 1. hostname:port, then kylin will set the config ip and port as the currentNode;
## 2. port, then kylin will get the node's hostname and append port as the currentNode;
## 3. not set, then kylin will get the node hostname address and set the hostname and defaultPort(7070 for coordinator or 9090 for receiver) as the currentNode.
##kylin.stream.node=
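For example, a receiver could register itself explicitly using option 1 above (hostname and port are placeholders):

```properties
# Explicit hostname:port registration for a streaming receiver
kylin.stream.node=receiver-host.example.com:9090
```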
#
## Auto resubmit after job be discarded
#kylin.stream.auto-resubmit-after-discard-enabled=true