背景
MetricFlow 目前国内的资料比较少,在学习官方文档过程中给了经典的煎饼店的 GitHub 示例代码。在Windows环境按其 Readme 说明跑通比较困难。因此主要记录自己将项目运行起来的过程。
环境信息
数据库:PostgreSQL 15
Python版本:3.12.4
官方文档:https://docs.getdbt.com/docs/build/about-metricflow
官方示例项目:https://github.com/dbt-labs/jaffle-sl-template
运行流程
特别说明:参考官方文档的Readme
项目下载
方式一:通过Git下载
在文件系统shift + 右键
打开菜单,打开 PowerShell
或使用Win + X
打开CMD
:
克隆项目:git clone https://github.com/dbt-labs/jaffle-sl-template.git
方式二:下载Zip包并解压(推荐)
安装 Metricflow
采用虚拟环境安装,打开CMD命令行并切换到的项目目录下:jaffle-sl-template-main
1)创建虚拟环境:python -m venv .env
2)切换到虚拟环境下:.env\Scripts\active.sh
前面显示
(.env)
即成功。3)安装dbt和postgres适配器:
pip install dbt-core dbt-postgres
4)安装metricflow根据官方readme的说明:
pip install "dbt-metricflow[<YOUR_DBT_ADAPTER_NAME>]"
我是用PostgreSQL适配器,因此执行执行:pip install "dbt-metricflow[postgres]"
5)验证dbt版本:
dbt --version
(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>dbt --version
Core:
- installed: 1.8.3
- latest: 1.8.3 - Up to date!
Plugins:
- postgres: 1.8.2 - Up to date!
6)验证metricflow版本:mf --version
(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>mf --version
mf, version 0.7.0
连接测试
根据dbt_project文件知道profile是:snowflake
因此在用户目录下,找到:.dbt\profiles.yml
文件,添加项目的连接信息。
snowflake:
target: dev
outputs:
dev:
type: postgres
host: 192.168.19.128 # PG数据库地址
user: postgres # PG数据库用户名
password: postgres # PG数据库密码
port: 5432 # PG数据库端口号
dbname: jaffle_shop # PG数据库名称
schema: public # PG数据库模式
threads: 4
connect_timeout: 10
需提前在PG数据库创建jaffle_shop
库:create database jaffle_shop
运行dbt debug
测试连通性:
(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>dbt debug
13:14:59 Running with dbt=1.8.3
13:14:59 dbt version: 1.8.3
13:14:59 python version: 3.12.4
13:14:59 python path: C:\Users\admin\Downloads\jaffle-sl-template-main\.env\Scripts\python.exe
13:14:59 os info: Windows-10-10.0.19045-SP0
13:14:59 Using profiles dir at C:\Users\admin\.dbt
13:14:59 Using profiles.yml file at C:\Users\admin\.dbt\profiles.yml
13:14:59 Using dbt_project.yml file at C:\Users\admin\Downloads\jaffle-sl-template-main\dbt_project.yml
13:14:59 adapter type: postgres
13:14:59 adapter version: 1.8.2
13:14:59 Configuration:
13:14:59 profiles.yml file [OK found and valid]
13:14:59 dbt_project.yml file [OK found and valid]
13:14:59 Required dependencies:
13:14:59 - git [OK found]
13:14:59 Connection:
13:14:59 host: 192.168.19.128
13:14:59 port: 5432
13:14:59 user: postgres
13:14:59 database: jaffle_shop
13:14:59 schema: public
13:14:59 connect_timeout: 10
13:14:59 role: None
13:14:59 search_path: None
13:14:59 keepalives_idle: 0
13:14:59 sslmode: None
13:14:59 sslcert: None
13:14:59 sslkey: None
13:14:59 sslrootcert: None
13:14:59 application_name: dbt
13:14:59 retries: 1
13:14:59 Registered adapter: postgres=1.8.2
13:14:59 Connection test: [OK connection ok]
13:14:59 All checks passed!
安装dbt依赖
1)下载项目依赖的2个dbt包:dbt_utils 和 dbt_date:dbt deps
发现运行到这里就卡住了,日志说让更新到最新版本:
(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>dbt deps
13:17:03 Running with dbt=1.8.3
13:17:04 Installing dbt-labs/dbt_utils
13:17:05 Installed from version 1.0.0
13:17:05 Updated version available: 1.2.0
13:17:05 Installing calogica/dbt_date
13:17:06 Installed from version 0.8.1
13:17:06 Updated version available: 0.10.1
13:17:06
13:17:06 Updates available for packages: ['dbt-labs/dbt_utils', 'calogica/dbt_date']
Update your versions in packages.yml, then run dbt deps
2(因此根据提示的版本修改了package.yml
文件:
packages:
- package: dbt-labs/dbt_utils
version: 1.2.0
- package: calogica/dbt_date
version: 0.10.1
再次运行:dbt deps
:
(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>dbt deps
13:20:35 Running with dbt=1.8.3
13:20:36 Updating lock file in file path: C:\Users\admin\Downloads\jaffle-sl-template-main/package-lock.yml
13:20:36 Installing dbt-labs/dbt_utils
13:20:37 Installed from version 1.2.0
13:20:37 Up to date!
13:20:37 Installing calogica/dbt_date
13:20:38 Installed from version 0.10.1
13:20:38 Up to date!
加载数据
执行:dbt seed
(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>dbt seed
13:25:15 Running with dbt=1.8.3
13:25:15 Registered adapter: postgres=1.8.2
13:25:15 Unable to do partial parsing because saved manifest not found. Starting full parse.
13:25:16 [WARNING]: Deprecated functionality
The `tests` config has been renamed to `data_tests`. Please see
https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax for more
information.
13:25:17 Found 10 models, 6 seeds, 18 data tests, 15 sources, 17 metrics, 664 macros, 1 group, 5 semantic models, 3 saved queries
13:25:17
13:25:17 Concurrency: 4 threads (target='dev')
13:25:17
13:25:17 1 of 6 START seed file public.raw_customers .................................... [RUN]
13:25:17 2 of 6 START seed file public.raw_items ........................................ [RUN]
13:25:17 3 of 6 START seed file public.raw_orders ....................................... [RUN]
13:25:17 4 of 6 START seed file public.raw_products ..................................... [RUN]
13:25:22 1 of 6 OK loaded seed file public.raw_customers ................................ [INSERT 939 in 4.09s]
13:25:22 4 of 6 OK loaded seed file public.raw_products ................................. [INSERT 10 in 4.55s]
13:25:22 5 of 6 START seed file public.raw_stores ....................................... [RUN]
13:25:22 6 of 6 START seed file public.raw_supplies ..................................... [RUN]
13:25:28 5 of 6 OK loaded seed file public.raw_stores ................................... [INSERT 5 in 5.26s]
13:25:28 6 of 6 OK loaded seed file public.raw_supplies ................................. [INSERT 65 in 5.03s]
13:25:55 2 of 6 OK loaded seed file public.raw_items .................................... [INSERT 95368 in 37.87s]
13:26:03 3 of 6 OK loaded seed file public.raw_orders ................................... [INSERT 59652 in 45.51s]
13:26:03
13:26:03 Finished running 6 seeds in 0 hours 0 minutes and 45.77 seconds (45.77s).
13:26:03
13:26:03 Completed successfully
13:26:03
13:26:03 Done. PASS=6 WARN=0 ERROR=0 SKIP=0 TOTAL=6
将jaffle-data目录下的csv文件加载到PG数据库中:
PostgreSQL
运行你的 dbt 项目并查询指标
1)运行:dbt build --exclude path:jaffle-data
执行爆了很多错误,但是同时类似的。节选其中一个:
13:29:37 Database Error in model stg_supplies (models\staging\stg_supplies.sql)
relation "dbt_sl_test.raw_supplies" does not exist
LINE 11: select * from "jaffle_shop"."dbt_sl_test"."raw_supplies"
从报错的select语句看出来,我们没有dbt_sl_test
这个 schema。前面测试连接的时候我们PG数据库的 schema 的是:public
。
根据报错定位到模型里面用了 source 。发现 source 的配置写死了 schema,因此需要调整 source 的 schema。
文件路径:models\staging\_source.yml
sources:
- name: ecom
schema: public # dbt_sl_test
description: E-commerce data
再次执行:dbt build --exclude path:jaffle-data
13:40:51 Finished running 6 view models, 4 table models, 18 data tests, 3 saved queries in 0 hours 0 minutes and 2.44 seconds (2.44s).
13:40:51
13:40:51 Completed successfully
13:40:51
13:40:51 Done. PASS=31 WARN=0 ERROR=0 SKIP=0 TOTAL=31
2)运行:mf validate-configs
不出意外,报错:
• ERROR: with metric `revenue` - Unable to query metric `revenue`.
Received following error from data warehouse:
Database Error
type "datetime" does not exist
LINE 15: DATE_TRUNC('day', cast(ordered_at as DATETIME)) AS met...
看样子是PG数据库不支持使用cast将 timestamp 强转转为 datetime 类型。测试了一下,可以不需要强转。
查找代码,发现是 models\marts\customer360\order_items.yml
的语义模型的维度字段 ordered_at 定义的。修改expr
:
semantic_models:
- name: order_item
... ...
dimensions:
- name: ordered_at
expr: ordered_at # cast(ordered_at as DATETIME)
type: time
保存后重新构建:dbt build --exclude path:jaffle-data
再次执行:mf validate-configs
(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>mf validate-configs
(To see warnings and future-errors, run again with flag `--show-all`)
v 🎉 Successfully parsed manifest from dbt project
v 🎉 Successfully validated the semantics of built manifest (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
v 🎉 Successfully validated semantic models against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
v 🎉 Successfully validated dimensions against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
v 🎉 Successfully validated entities against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
v 🎉 Successfully validated measures against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
v 🎉 Successfully validated metrics against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
成功!根据官方文档查询指标large_orders:mf query --metrics large_orders
不出意外,又报错了:
(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>mf query --metrics large_orders
| Initiating query…
ERROR: Got errors while resolving the query.
Error #1:
Message:
The given input does not exactly match any known metrics.
Suggestions:
['large_order', 'orders', 'food_orders', 'order_cost', 'customers_with_orders', 'order_total']
Query Input:
large_orders
Issue Location:
[Resolve Query(['large_orders'])]
Log file: C:\Users\admin\Downloads\jaffle-sl-template-main\logs\metricflow.log
看了一下,命令的字面意思是查询指标:large_orders
,但是从报错的来看,压根就没有large_orders
,只有large_order
。
文档错误???改命令执行:mf query --metrics large_order
(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>mf query --metrics large_order
v Success 🦄 - query completed after 0.30 seconds
large_order
-------------
10504
成功!!!
初学dbt,如有不对请指正
网友评论