1. What is the accuracy of your decision tree?
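A minimal sketch of how this accuracy could be measured. The synthetic data from `make_classification` stands in for the email features, and `min_samples_split=40` is an assumed setting, not something stated in these notes; the variable names follow `features_train` as used below.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed email data (assumption).
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)
features_train, features_test, labels_train, labels_test = train_test_split(
    X, y, test_size=0.1, random_state=42)

# min_samples_split=40 is an assumed hyperparameter for illustration.
clf = DecisionTreeClassifier(min_samples_split=40, random_state=0)
clf.fit(features_train, labels_train)

# Accuracy = fraction of test points whose predicted label is correct.
pred = clf.predict(features_test)
accuracy = accuracy_score(labels_test, pred)
print("accuracy:", accuracy)
```

`clf.score(features_test, labels_test)` should return the same number in one call.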
2. How many features are in your data?
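Once the data is arranged into a numpy array, the number of rows is the number of data points and the number of columns is the number of features. To extract the number of features, just run len(features_train[0]).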
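As a small illustration of the rows-versus-columns convention, here is a sketch with a made-up array (the 5 x 3 shape is purely hypothetical):

```python
import numpy as np

# Hypothetical training matrix: 5 data points, 3 features each.
features_train = np.zeros((5, 3))

# Length of the first row = number of columns = number of features.
n_features = len(features_train[0])

# The same information is available from the array's shape.
n_points, n_columns = features_train.shape

print(n_points, n_features)
```

For a 2-D array, `features_train.shape[1]` and `len(features_train[0])` give the same value.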
3. Change the number of features.
Open ../tools/email_preprocess.py and find a line of code that looks like this one:
selector = SelectPercentile(f_classif, percentile=10)
Change the percentile from 10 to 1, then run dt_author_id.py.
What is the number of features now?
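To see how the percentile setting controls the feature count, here is a rough sketch on synthetic data. The 1000-feature dataset and the helper `count_selected` are assumptions made for illustration; only the `SelectPercentile(f_classif, percentile=...)` call comes from the line above.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, f_classif

# Synthetic stand-in with 1000 features (assumption, not the email data).
X, y = make_classification(n_samples=200, n_features=1000,
                           n_informative=10, random_state=0)

def count_selected(percentile):
    """Return how many columns survive selection at the given percentile."""
    selector = SelectPercentile(f_classif, percentile=percentile)
    return selector.fit_transform(X, y).shape[1]

n_10 = count_selected(10)  # keeps roughly the top 10% of features
n_1 = count_selected(1)    # keeps roughly the top 1% of features
print(n_10, n_1)
```

Dropping the percentile from 10 to 1 should cut the number of retained features by about a factor of ten.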
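4. All other things being equal, a larger number of features makes the decision tree more complex.
5. What is the accuracy of your decision tree when you use only 1% of the available features (i.e. percentile = 1)?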
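One way to compare the two settings side by side is sketched below. It uses synthetic data and a hypothetical helper `evaluate`, so the numbers it prints will not match the email dataset; it only shows the mechanics of selecting features on the training set and then scoring the tree.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the email features (assumption).
X, y = make_classification(n_samples=400, n_features=500,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

def evaluate(percentile):
    """Select features on the training set, fit a tree, report statistics."""
    selector = SelectPercentile(f_classif, percentile=percentile)
    train_sel = selector.fit_transform(X_train, y_train)
    test_sel = selector.transform(X_test)  # reuse the fitted selection
    clf = DecisionTreeClassifier(random_state=0).fit(train_sel, y_train)
    return {
        "n_features": train_sel.shape[1],
        "n_nodes": clf.tree_.node_count,  # rough proxy for tree complexity
        "accuracy": accuracy_score(y_test, clf.predict(test_sel)),
    }

results = {p: evaluate(p) for p in (10, 1)}
for p, r in results.items():
    print("percentile =", p, r)
```

The selector is fitted on the training split only, so the test split does not influence which features are kept.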