Interactive Visual Statistics(2)

Author: LONG_7 | Published: 2021-05-10 14:54

    Hands-On: Statistics Worksheets and Cards

    In the last hands-on lesson, we saw the beginnings of how to prepare data with a Prepare recipe. We also saw how tools like the Analyze window can reveal the distribution of a column.

    You can also use the Statistics tab of a dataset to perform more in-depth exploratory data analysis (EDA). The Statistics tab allows you to generate statistical reports on your data by creating Worksheets, and Cards within those worksheets.

    Hint
    A screencast at the end of the lesson recaps the actions described here.

    Interactive Statistics

    Let’s create a worksheet with cards that perform common EDA tasks. For example, suppose we want a side-by-side summary of the variables pages_visited, tshirt_category, and total in the orders_prepared dataset.

    With the orders_prepared dataset open:

    • Navigate to the Statistics tab and click +Create Your First Worksheet.
    • Select the Univariate analysis box to open the “Univariate analysis” window.
    • Select pages_visited, tshirt_category, and total from the list of “available variables” in the first panel of the window.
    • With these variables selected, click the “plus” button in the “Variables to describe” panel.

    After making a selection, Dataiku DSS automatically selects the statistical “Options” (in the third panel of the window) that are appropriate for the numerical variables (pages_visited and total) and the categorical variable (tshirt_category). You can deselect any of these options if you so choose.

    • Click Create Card.

    Dataiku DSS creates a card with one section for each variable. The type of statistical chart and descriptive statistic in each section depends on whether the variable is categorical or numerical.

    For example, tshirt_category, a categorical variable, has a bar chart (or categorical histogram), while pages_visited and total each have a numerical histogram and box plot insert. Also, the quantile table is applicable to the numerical variables, while the frequency table is applicable to the categorical variable.
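
    To make the distinction between the two kinds of summaries concrete, here is a minimal sketch in plain Python of a quantile table for numerical columns and a frequency table for a categorical column. It is not how Dataiku DSS computes its cards; the rows, their values, and the helper names (describe_numerical, describe_categorical, describe) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from statistics import mean, quantiles, stdev

# Hypothetical rows standing in for orders_prepared; the values are made up.
rows = [
    {"pages_visited": 1, "tshirt_category": "White T-Shirt M", "total": 20.0},
    {"pages_visited": 2, "tshirt_category": "White T-Shirt M", "total": 35.0},
    {"pages_visited": 3, "tshirt_category": "Black T-Shirt F", "total": 15.0},
    {"pages_visited": 4, "tshirt_category": "White T-Shirt M", "total": 60.0},
    {"pages_visited": 5, "tshirt_category": "Black T-Shirt F", "total": 25.0},
]


def describe_numerical(values):
    """Return a small quantile table plus the mean and standard deviation."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": mean(values),
        "std": stdev(values),
    }


def describe_categorical(values):
    """Return a frequency table of (category, count, share), most common first."""
    counts = Counter(values)
    n = len(values)
    return [(category, count, count / n) for category, count in counts.most_common()]


def describe(rows, column):
    """Pick the summary that matches the column's type."""
    values = [row[column] for row in rows]
    if all(isinstance(v, (int, float)) for v in values):
        return describe_numerical(values)
    return describe_categorical(values)


summary = {
    column: describe(rows, column)
    for column in ("pages_visited", "tshirt_category", "total")
}

for column, result in summary.items():
    print(column, result)
```

    The type check in describe mirrors the idea that the appropriate statistics depend on whether a variable is numerical or categorical.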

    Note
    By default, Dataiku DSS computes worksheet statistics on a sample of the first records in your dataset. You can configure this setting by clicking the drop-down arrow next to Sampling and filtering.
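
    The sampling setting matters because statistics computed on the first records can differ from those of the full dataset when the rows are ordered. The following sketch illustrates this with made-up numbers; the helper first_records and the sample size of 8 are assumptions for the example, not Dataiku DSS defaults.

```python
from itertools import islice


def first_records(records, n):
    """Return the first n records of an iterable without reading the rest."""
    return list(islice(records, n))


# Hypothetical order totals, arranged so that the large orders come last.
totals = [10.0] * 8 + [200.0] * 2

sample = first_records(totals, 8)

full_mean = sum(totals) / len(totals)
sample_mean = sum(sample) / len(sample)

print("mean of the first records:", sample_mean)
print("mean of the full data:", full_mean)
```

    If the two values diverge like this on real data, a different sampling method or a larger sample is worth considering.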

    We may also be interested in checking whether the total variable follows an exponential distribution. The interactive statistics feature allows you to estimate the parameters of univariate probability distributions using the Fit Distribution card.

    • Click the +New Card button from the “Worksheet” window.
    • Then select the Fit curves & distributions option and the Fit Distribution card.
    • Select total as the “Variable” and Exponential as the “Distribution”.
    • Click Create Card.

    Dataiku DSS creates a card that shows the exponential distribution fit to the data. There is also a Q-Q plot that compares the quantiles of the data against the quantiles of the fitted distribution. Points that fall far from the identity line suggest that the data were probably not drawn from an exponential distribution.
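
    As a rough illustration of what such a fit and Q-Q comparison involve, the sketch below estimates the rate of an exponential distribution by maximum likelihood (one over the sample mean) and pairs each sorted observation with the corresponding quantile of the fitted distribution. This is a sketch under stated assumptions, not necessarily the method Dataiku DSS uses: the data are synthetic, the function names are hypothetical, and the plotting position (i - 0.5) / n is just one common convention.

```python
import math
import random


def fit_exponential(values):
    """Maximum-likelihood estimate of the exponential rate: 1 / sample mean."""
    return len(values) / sum(values)


def qq_points(values, rate):
    """Pair each sorted observation with the matching fitted-exponential quantile."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for i, observed in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # plotting position (an assumed convention)
        theoretical = -math.log(1.0 - p) / rate  # inverse CDF of the exponential
        points.append((theoretical, observed))
    return points


# Synthetic stand-in for the total column: exponential with a true mean of 20.
random.seed(0)
data = [random.expovariate(0.05) for _ in range(2000)]

rate = fit_exponential(data)
points = qq_points(data, rate)

print("fitted rate:", rate)
print("fitted mean:", 1.0 / rate)
print("Q-Q point near the median:", points[len(points) // 2])
```

    When the data really are exponential, as in this synthetic case, the (theoretical, observed) pairs lie close to the identity line; systematic departures from that line are what the Q-Q plot is meant to reveal.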

    To learn about the full capabilities in the Statistics tab, see the Interactive statistics section of the reference documentation.

    The following video goes through what we just covered.

    What’s next?

    This was just a brief introduction to the kinds of statistical analyses we can easily perform in Dataiku DSS. Now let’s continue building our Flow.

    Article link: https://www.haomeiwen.com/subject/uyfbrltx.html