2021-04-27

Author: LONG_7 | Published 2021-04-27 21:52

    Hands-On: Explore Your Data

    In the last hands-on lesson, you imported your first dataset into a Dataiku DSS project. Returning to that project, let’s explore that dataset.

    The Explore tab of a dataset provides a tabular view of your data where you can start to examine it.

    Sampling

    In the Sampling concept video, we learned that Dataiku DSS shows only a sample of the dataset when you work with it interactively.

    To see the sample settings of a dataset, near the top left of the page, click Configure sample, which opens a panel on the left.

    By default, the sample in the Explore tab includes the first 10,000 records of the dataset.
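
    Outside of DSS, the idea of a "first records" sample can be approximated with pandas. This is only a sketch: the CSV content and the SAMPLE_SIZE name are hypothetical, and the sample size is kept small for illustration rather than set to the 10,000-record default mentioned above.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the imported dataset.
csv_text = "customer_id,tshirt_category\n" + "\n".join(
    f"{i},White" for i in range(25)
)

# DSS defaults to the first 10,000 records; a small value keeps the demo short.
SAMPLE_SIZE = 10

# nrows stops reading after SAMPLE_SIZE data rows, so only the head of the
# file is loaded, much like a "first records" sample.
sample = pd.read_csv(io.StringIO(csv_text), nrows=SAMPLE_SIZE)

print(len(sample))
```

    Because only the first records are read, anything computed on such a sample describes the head of the dataset and may not reflect the rest of it.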

    Storage Type and Meaning

    Beneath each column name are its storage type and meaning.

    Dataiku DSS detects a meaning of “Integer” for customer_id because most of the values in that column are integers. The gauge shows red for the few values that do not match this meaning. Inspecting those values lets us decide whether they are truly invalid customer IDs or, as is the case here, whether Integer is too restrictive a meaning for customer_id.

    Click on the meaning and update it to Text. Now the gauge for customer_id is entirely green.
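
    The reasoning behind the gauge can be illustrated with a rough pandas sketch that measures what share of a column would pass an integer check. This is not how DSS detects meanings internally; the sample IDs and the share_matching_integer helper are hypothetical.

```python
import pandas as pd

# Hypothetical customer IDs: mostly digits, plus a few alphanumeric values.
customer_id = pd.Series(["1001", "1002", "1003", "a7f3", "1005", "b21c"])


def share_matching_integer(values: pd.Series) -> float:
    """Return the fraction of values that look like whole numbers."""
    is_integer = values.astype(str).str.fullmatch(r"[+-]?\d+")
    return float(is_integer.mean())


valid_share = share_matching_integer(customer_id)
invalid_values = customer_id[~customer_id.str.fullmatch(r"[+-]?\d+")].tolist()

print(valid_share, invalid_values)
```

    Looking at the non-matching values directly is what tells us whether they are bad data or simply IDs that were never meant to be numbers.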

    Note

    This dataset has no missing values. If it did, they would appear in gray in the data quality bar.
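
    As a rough equivalent outside of DSS, the share of missing values in a column can be checked with pandas. The column below is hypothetical and contains missing entries on purpose, unlike the dataset in this lesson.

```python
import pandas as pd

# Hypothetical column with two missing entries.
values = pd.Series([19.5, None, 24.0, 19.5, None, 30.0])

# isna() flags missing entries; sum() counts them and mean() gives their share.
missing_count = int(values.isna().sum())
missing_share = float(values.isna().mean())

print(missing_count, missing_share)
```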

    Charts

    You can use charts to explore a dataset. For example, we might want to know how often each type of t-shirt is ordered.

    • Click on the Charts tab.
    • From the panel on the left, drag and drop Count of records as the Y variable.
    • Drag and drop tshirt_category as the X variable.

    Dataiku DSS shows a column chart of Count of records by tshirt_category for the current sample.

    The chart reveals that the values of tshirt_category are not recorded consistently. The color black is sometimes recorded as “Black” and sometimes as “Bl”. Similarly, white is sometimes recorded as “White” and sometimes as “Wh”.
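
    The same count of records per category can be sketched in pandas, which makes the inconsistent spellings visible as separate groups. The values below are hypothetical stand-ins; the actual contents of tshirt_category may differ.

```python
import pandas as pd

# Hypothetical sample values mixing full and abbreviated color names.
sample = pd.DataFrame(
    {"tshirt_category": ["Black", "Bl", "White", "Wh", "Black", "White", "White", "Bl"]}
)

# One row per distinct category with its record count, as in the column chart.
counts = sample.groupby("tshirt_category").size().sort_index()

print(counts)
```

    Each spelling gets its own bar, so “Black” and “Bl” are counted as two different categories until the values are cleaned.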

    What’s next?

    Congratulations! You’ve created your first project, imported your first dataset, and created your first chart. In the next hands-on lesson, we’ll clean up the inconsistent tshirt_category values with a Prepare recipe.

