Group Recipe(2)

Author: LONG_7 | Published: 2021-05-03 10:03

    Hands-On: Grouping Data

    At this point, we have a prepared t-shirts dataset and have done some preliminary statistical exploration.

    If our ultimate goal is to understand our customers, we’ll need to group all past orders by unique customers, aggregating their past interactions.

    Hint
    A screencast at the end of the page recaps all of the actions described here.

    Group Orders by Customer

    To do this, we’ll use another visual recipe, Group.

    • With the orders_prepared dataset open, look in the upper-right corner for the Actions menu. From this menu, choose Group in the list of Visual recipes.
    • An alternative path is to select (but not open) the orders_prepared dataset from the Flow and find the plus icon at the top of the right sidebar.

    The Group Recipe allows you to aggregate the values of some columns by the values of one or more keys.

    • In the recipe dialog, choose to group by customer_id.
    • Change the name of the output dataset to orders_by_customer.
    • Click Create Recipe.

    The Group recipe has several steps (listed on the left). The core step is the Group step, where you choose which columns serve as keys and which aggregations to perform on the remaining columns.

    Some columns, like order_id and tshirt_category, we won’t need in the new dataset. For the others, make the following selections:

    • order_date: Min
    • pages_visited: Avg
    • total: Sum

    For each customer, this will give us the date of the first order, the average number of pages visited across that customer's orders, and the total value of all orders. We’ll also compute the count of rows in each group, which is a default setting.
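
    For readers who think in code, the same aggregation can be sketched in pandas. This is only an illustration of the logic, not what the Group recipe runs internally. The sample rows are invented, and the output names pages_visited_avg, total_sum, and count are assumptions modeled on the order_date_min name that appears in the Output step.

```python
import pandas as pd

# Hypothetical sample rows: column names follow the tutorial, values are made up.
orders = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2"],
    "order_date": pd.to_datetime(["2020-03-01", "2020-01-15", "2020-02-10"]),
    "pages_visited": [4, 6, 3],
    "total": [20.0, 35.0, 15.0],
})

# One output row per customer_id, with one aggregation per selected column.
orders_by_customer = (
    orders.groupby("customer_id")
    .agg(
        order_date_min=("order_date", "min"),         # date of first order
        pages_visited_avg=("pages_visited", "mean"),  # average pages visited
        total_sum=("total", "sum"),                   # total value of all orders
        count=("order_date", "size"),                 # number of rows in the group
    )
    .reset_index()
)

print(orders_by_customer)
```

    Columns left out of the aggregation, such as order_id and tshirt_category, simply do not appear in the result.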

    Note
    The recipe reminds us of the storage type of each column in the “Per field aggregations”. We are able to retrieve the minimum of order_date because its storage type is a date. If it were a string, the “minimum” would be the first result in alphabetical order.
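
    A small pandas sketch shows why the storage type matters. The date strings below are hypothetical and use a month-first format chosen to make the difference visible. It illustrates the general behavior of string versus date comparison, not the recipe's own implementation.

```python
import pandas as pd

# Hypothetical order dates written as MM/DD/YYYY text.
raw = ["10/02/2020", "02/11/2020", "12/25/2019"]

as_strings = pd.Series(raw)
as_dates = pd.to_datetime(as_strings, format="%m/%d/%Y")

# On strings, min() compares character by character (alphabetical order).
print(as_strings.min())  # 02/11/2020

# On parsed dates, min() returns the earliest point in time.
print(as_dates.min())    # 2019-12-25 00:00:00
```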

    Before running the recipe, check the Output step. Here we can rename the columns of the output dataset.

    • Rename order_date_min to first_order_date.
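
    In a pandas version of this step, the rename is a single call. The one-row frame below is a hypothetical stand-in for the grouped output.

```python
import pandas as pd

# Hypothetical stand-in for the grouped output before renaming.
grouped = pd.DataFrame({
    "customer_id": ["c1"],
    "order_date_min": pd.to_datetime(["2020-01-15"]),
})

# rename returns a new DataFrame; the original keeps its column names.
renamed = grouped.rename(columns={"order_date_min": "first_order_date"})
print(list(renamed.columns))
```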

    Click Run to create the new grouped dataset, updating the schema.

    When exploring the output dataset, use the Analyze tool on the customer_id column. Note that all values are unique. We have exactly one record for every customer.
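
    The same uniqueness check can be sketched in pandas, assuming the grouped output has been loaded into a DataFrame. The three rows here are invented for illustration.

```python
import pandas as pd

# Hypothetical grouped output with one row per customer.
orders_by_customer = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3"],
    "count": [2, 1, 4],
})

ids = orders_by_customer["customer_id"]

# True when no customer_id value is repeated.
all_unique = ids.is_unique

# Equivalent check: as many distinct ids as there are rows.
one_row_per_customer = ids.nunique() == len(orders_by_customer)

print(all_unique, one_row_per_customer)
```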

    What’s next?

    Now that you have a few datasets and recipes in the Flow, it’s time to take stock of what you’ve accomplished in the next hands-on lesson.
