美文网首页
跑通MetricFlow官方示例项目:jaffle_shop

跑通MetricFlow官方示例项目:jaffle_shop

作者: 猫留下你走吧 | 来源:发表于2024-07-13 23:24 被阅读0次
背景

MetricFlow 目前国内的资料比较少,在学习官方文档过程中给了经典的煎饼店的 GitHub 示例代码。在Windows环境按其 Readme 说明跑通比较困难。因此主要记录自己将项目运行起来的过程。

环境信息

数据库:PostgreSQL 15
Python版本:3.12.4
官方文档:https://docs.getdbt.com/docs/build/about-metricflow
官方示例项目:https://github.com/dbt-labs/jaffle-sl-template

运行流程

特别说明:参考官方文档的Readme

项目下载

方式一:通过Git下载
在文件系统shift + 右键 打开菜单,打开 PowerShell 或使用Win + X打开CMD
克隆项目:git clone https://github.com/dbt-labs/jaffle-sl-template.git
方式二:下载Zip包并解压(推荐)

GitHub
安装 Metricflow

采用虚拟环境安装,打开CMD命令行并切换到的项目目录下:jaffle-sl-template-main
1)创建虚拟环境:python -m venv .env
2)切换到虚拟环境下:.env\Scripts\active.sh

cmd
前面显示(.env) 即成功。
3)安装dbt和postgres适配器:pip install dbt-core dbt-postgres
4)安装metricflow根据官方readme的说明:pip install "dbt-metricflow[<YOUR_DBT_ADAPTER_NAME>]" 我是用PostgreSQL适配器,因此执行执行:pip install "dbt-metricflow[postgres]"
5)验证dbt版本:dbt --version
(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>dbt --version
Core:
  - installed: 1.8.3
  - latest:    1.8.3 - Up to date!

Plugins:
  - postgres: 1.8.2 - Up to date!

6)验证metricflow版本:mf --version

(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>mf --version
mf, version 0.7.0
连接测试

根据dbt_project文件知道profile是:snowflake
因此在用户目录下,找到:.dbt\profiles.yml文件,添加项目的连接信息。

snowflake:
  target: dev
  outputs:
    dev:
      type: postgres
      host: 192.168.19.128 # PG数据库地址
      user: postgres # PG数据库用户名
      password: postgres # PG数据库密码
      port: 5432 # PG数据库端口号
      dbname: jaffle_shop # PG数据库名称
      schema: public # PG数据库模式
      threads: 4
      connect_timeout: 10

需提前在PG数据库创建jaffle_shop库:create database jaffle_shop
运行dbt debug 测试连通性:

(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>dbt debug
13:14:59  Running with dbt=1.8.3
13:14:59  dbt version: 1.8.3
13:14:59  python version: 3.12.4
13:14:59  python path: C:\Users\admin\Downloads\jaffle-sl-template-main\.env\Scripts\python.exe
13:14:59  os info: Windows-10-10.0.19045-SP0
13:14:59  Using profiles dir at C:\Users\admin\.dbt
13:14:59  Using profiles.yml file at C:\Users\admin\.dbt\profiles.yml
13:14:59  Using dbt_project.yml file at C:\Users\admin\Downloads\jaffle-sl-template-main\dbt_project.yml
13:14:59  adapter type: postgres
13:14:59  adapter version: 1.8.2
13:14:59  Configuration:
13:14:59    profiles.yml file [OK found and valid]
13:14:59    dbt_project.yml file [OK found and valid]
13:14:59  Required dependencies:
13:14:59   - git [OK found]

13:14:59  Connection:
13:14:59    host: 192.168.19.128
13:14:59    port: 5432
13:14:59    user: postgres
13:14:59    database: jaffle_shop
13:14:59    schema: public
13:14:59    connect_timeout: 10
13:14:59    role: None
13:14:59    search_path: None
13:14:59    keepalives_idle: 0
13:14:59    sslmode: None
13:14:59    sslcert: None
13:14:59    sslkey: None
13:14:59    sslrootcert: None
13:14:59    application_name: dbt
13:14:59    retries: 1
13:14:59  Registered adapter: postgres=1.8.2
13:14:59    Connection test: [OK connection ok]

13:14:59  All checks passed!
安装dbt依赖

1)下载项目依赖的2个dbt包:dbt_utils 和 dbt_date:dbt deps
发现运行到这里就卡住了,日志说让更新到最新版本:

(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>dbt deps
13:17:03  Running with dbt=1.8.3
13:17:04  Installing dbt-labs/dbt_utils
13:17:05  Installed from version 1.0.0
13:17:05  Updated version available: 1.2.0
13:17:05  Installing calogica/dbt_date
13:17:06  Installed from version 0.8.1
13:17:06  Updated version available: 0.10.1
13:17:06
13:17:06  Updates available for packages: ['dbt-labs/dbt_utils', 'calogica/dbt_date']
Update your versions in packages.yml, then run dbt deps

2(因此根据提示的版本修改了package.yml 文件:

packages:
  - package: dbt-labs/dbt_utils
    version: 1.2.0
  - package: calogica/dbt_date
    version: 0.10.1

再次运行:dbt deps

(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>dbt deps
13:20:35  Running with dbt=1.8.3
13:20:36  Updating lock file in file path: C:\Users\admin\Downloads\jaffle-sl-template-main/package-lock.yml
13:20:36  Installing dbt-labs/dbt_utils
13:20:37  Installed from version 1.2.0
13:20:37  Up to date!
13:20:37  Installing calogica/dbt_date
13:20:38  Installed from version 0.10.1
13:20:38  Up to date!
加载数据

执行:dbt seed

(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>dbt seed
13:25:15  Running with dbt=1.8.3
13:25:15  Registered adapter: postgres=1.8.2
13:25:15  Unable to do partial parsing because saved manifest not found. Starting full parse.
13:25:16  [WARNING]: Deprecated functionality
The `tests` config has been renamed to `data_tests`. Please see
https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax for more
information.
13:25:17  Found 10 models, 6 seeds, 18 data tests, 15 sources, 17 metrics, 664 macros, 1 group, 5 semantic models, 3 saved queries
13:25:17
13:25:17  Concurrency: 4 threads (target='dev')
13:25:17
13:25:17  1 of 6 START seed file public.raw_customers .................................... [RUN]
13:25:17  2 of 6 START seed file public.raw_items ........................................ [RUN]
13:25:17  3 of 6 START seed file public.raw_orders ....................................... [RUN]
13:25:17  4 of 6 START seed file public.raw_products ..................................... [RUN]
13:25:22  1 of 6 OK loaded seed file public.raw_customers ................................ [INSERT 939 in 4.09s]
13:25:22  4 of 6 OK loaded seed file public.raw_products ................................. [INSERT 10 in 4.55s]
13:25:22  5 of 6 START seed file public.raw_stores ....................................... [RUN]
13:25:22  6 of 6 START seed file public.raw_supplies ..................................... [RUN]
13:25:28  5 of 6 OK loaded seed file public.raw_stores ................................... [INSERT 5 in 5.26s]
13:25:28  6 of 6 OK loaded seed file public.raw_supplies ................................. [INSERT 65 in 5.03s]
13:25:55  2 of 6 OK loaded seed file public.raw_items .................................... [INSERT 95368 in 37.87s]
13:26:03  3 of 6 OK loaded seed file public.raw_orders ................................... [INSERT 59652 in 45.51s]
13:26:03
13:26:03  Finished running 6 seeds in 0 hours 0 minutes and 45.77 seconds (45.77s).
13:26:03
13:26:03  Completed successfully
13:26:03
13:26:03  Done. PASS=6 WARN=0 ERROR=0 SKIP=0 TOTAL=6

将jaffle-data目录下的csv文件加载到PG数据库中:


PostgreSQL
运行你的 dbt 项目并查询指标

1)运行:dbt build --exclude path:jaffle-data
执行爆了很多错误,但是同时类似的。节选其中一个:

13:29:37    Database Error in model stg_supplies (models\staging\stg_supplies.sql)
  relation "dbt_sl_test.raw_supplies" does not exist
  LINE 11:     select * from "jaffle_shop"."dbt_sl_test"."raw_supplies"

从报错的select语句看出来,我们没有dbt_sl_test这个 schema。前面测试连接的时候我们PG数据库的 schema 的是:public
根据报错定位到模型里面用了 source 。发现 source 的配置写死了 schema,因此需要调整 source 的 schema。
文件路径:models\staging\_source.yml

sources:
  - name: ecom
    schema: public # dbt_sl_test
    description: E-commerce data

再次执行:dbt build --exclude path:jaffle-data

13:40:51  Finished running 6 view models, 4 table models, 18 data tests, 3 saved queries in 0 hours 0 minutes and 2.44 seconds (2.44s).
13:40:51
13:40:51  Completed successfully
13:40:51
13:40:51  Done. PASS=31 WARN=0 ERROR=0 SKIP=0 TOTAL=31

2)运行:mf validate-configs
不出意外,报错:

• ERROR: with metric `revenue`  - Unable to query metric `revenue`.
Received following error from data warehouse:
Database Error
  type "datetime" does not exist
  LINE 15:       DATE_TRUNC('day', cast(ordered_at as DATETIME)) AS met...

看样子是PG数据库不支持使用cast将 timestamp 强转转为 datetime 类型。测试了一下,可以不需要强转。
查找代码,发现是 models\marts\customer360\order_items.yml 的语义模型的维度字段 ordered_at 定义的。修改expr

semantic_models:
  - name: order_item
      ... ...
    dimensions:
      - name: ordered_at
        expr: ordered_at  # cast(ordered_at as DATETIME)
        type: time

保存后重新构建:dbt build --exclude path:jaffle-data
再次执行:mf validate-configs

(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>mf validate-configs
(To see warnings and future-errors, run again with flag `--show-all`)
v 🎉 Successfully parsed manifest from dbt project
v 🎉 Successfully validated the semantics of built manifest (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
v 🎉 Successfully validated semantic models against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
v 🎉 Successfully validated dimensions against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
v 🎉 Successfully validated entities against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
v 🎉 Successfully validated measures against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)
v 🎉 Successfully validated metrics against data warehouse (ERRORS: 0, FUTURE_ERRORS: 0, WARNINGS: 0)

成功!根据官方文档查询指标large_orders:mf query --metrics large_orders
不出意外,又报错了:

(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>mf query --metrics large_orders
| Initiating query…
ERROR: Got errors while resolving the query.

Error #1:
  Message:

    The given input does not exactly match any known metrics.

    Suggestions:
      ['large_order', 'orders', 'food_orders', 'order_cost', 'customers_with_orders', 'order_total']

  Query Input:

    large_orders

  Issue Location:

    [Resolve Query(['large_orders'])]
Log file: C:\Users\admin\Downloads\jaffle-sl-template-main\logs\metricflow.log

看了一下,命令的字面意思是查询指标:large_orders,但是从报错的来看,压根就没有large_orders,只有large_order
文档错误???改命令执行:mf query --metrics large_order

(.env) C:\Users\admin\Downloads\jaffle-sl-template-main>mf query --metrics large_order
v Success 🦄 - query completed after 0.30 seconds
  large_order
-------------
        10504

成功!!!
初学dbt,如有不对请指正

相关文章

网友评论

      本文标题:跑通MetricFlow官方示例项目:jaffle_shop

      本文链接:https://www.haomeiwen.com/subject/ucuqhjtx.html