Discretization: A process that transforms quantitative data into qualitative data.
- Some data mining algorithms only accept categorical attributes (LVF, FINCO, Naïve Bayes).
- The learning process is often less efficient and less effective when the data has only quantitative features.
Chi Merge Algorithm
- This discretization method uses a merging approach.
- Relative class frequencies should be fairly consistent within an interval(otherwise should split)
- χ2 is a statistical measure used to test the hypothesis that two discrete attributes are statistically independent.
- For two adjacent intervals, if χ2 test concludes that the class is independent intervals should be merged. If χ2 test concludes that they are not independent, i.e., the difference in relative class frequency is statistically significant, the two intervals should remain separate.
网友评论