Ordinal Data
As the name suggests, ordinal data have an order. This lets you rank the values. Take education level: elementary is less than high_school, which in turn is less than university. The ordering allows us to compute the median: if our dataset has 100 examples of each education level, it is correct to say that the median education is high_school. By the definition of the median, this means that at least 50% of the examples have high_school education or lower, and at least 50% have high_school education or higher, which is correct and, maybe, an important insight. Calculating the mean, however, makes no sense for ordinal data. What is the average of a university and an elementary school anyway?
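To make the median computation concrete, here is a minimal sketch using only the Python standard library. The names `LEVELS`, `RANK`, and `ordinal_median` are made up for this illustration, and the integer codes exist only to carry the order. The sketch uses `statistics.median_low`, which always returns a value that actually occurs in the data, so it never has to average two levels.

```python
from statistics import median_low

# Hypothetical level names, listed from lowest to highest.
LEVELS = ["elementary", "high_school", "university"]
# Map each level to its position in the ordering (0, 1, 2).
RANK = {level: i for i, level in enumerate(LEVELS)}


def ordinal_median(values):
    """Return the (low) median level of a list of ordinal values."""
    codes = [RANK[v] for v in values]
    # median_low picks an actual data point, so no averaging of levels occurs.
    return LEVELS[median_low(codes)]


# 100 examples of each education level, as in the text.
data = ["elementary"] * 100 + ["high_school"] * 100 + ["university"] * 100
median_level = ordinal_median(data)
print(median_level)
```

Only the relative order of the codes is used here; any other increasing numbering of the levels would give the same median level.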
A note for the mathematically inclined reader: if you encode the levels of an ordinal variable with consecutive numbers, you can safely apply strictly increasing (monotone) transformations to them, such as taking the logarithm, provided the codes are positive. This is because such transformations preserve the order, and the order is all that matters here.
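The following sketch illustrates this point under some assumptions: the levels are coded 1, 2, 3 (starting at 1 so that the logarithm is defined), and the helper `order_of` is a hypothetical name introduced only here. It checks that taking the logarithm leaves the ordering and the median unchanged, while the mean does not carry over in the same way.

```python
import math
from statistics import fmean, median_low

# Assumed encoding: elementary=1, high_school=2, university=3.
codes = [1] * 100 + [2] * 100 + [3] * 100
logged = [math.log(c) for c in codes]


def order_of(xs):
    """Indices of xs sorted by value (stable, so ties keep their positions)."""
    return sorted(range(len(xs)), key=lambda i: xs[i])


# A strictly increasing transformation leaves the ordering unchanged.
same_order = order_of(codes) == order_of(logged)

# The median commutes with the transformation: median(log(x)) == log(median(x)).
median_commutes = median_low(logged) == math.log(median_low(codes))

# The mean does not: transforming back gives a different number.
mean_of_codes = fmean(codes)
back_transformed_mean = math.exp(fmean(logged))

print(same_order, median_commutes)
print(mean_of_codes, back_transformed_mean)
```

That the two means disagree is another way to see why the mean is not meaningful for ordinal data: its value depends on the arbitrary numbers chosen for the levels, while the median does not.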