@[toc]
Rule-Based Classifier
Classify records by using a collection of “if…then…” rules
Rule: (Condition) → y
where
- Condition is a conjunction of attributes
- y is the class label
- LHS: rule antecedent or condition
- RHS: rule consequent
Examples of classification rules:
(Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
(Taxable Income < 50K) ∧ (Refund=Yes) → Evade=No
A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule.
Rule Coverage and Accuracy
Coverage of a rule: fraction of all records that satisfy the antecedent of the rule
Accuracy of a rule: among the records that satisfy the antecedent, the fraction that also satisfy the consequent
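Both measures can be made concrete with a short sketch. The dictionary-based rule representation, the `covers()` helper, and the toy records below are illustrative assumptions, not something prescribed by the notes:

```python
# Sketch: coverage and accuracy of one rule on a toy dataset (illustrative data).

def covers(condition, record):
    """A rule covers a record if every attribute test in its condition is satisfied."""
    return all(record.get(attr) == val for attr, val in condition.items())

def coverage_and_accuracy(condition, rule_class, records):
    matched = [r for r in records if covers(condition, r)]
    coverage = len(matched) / len(records)                      # fraction satisfying the antecedent
    accuracy = (sum(r["class"] == rule_class for r in matched)  # fraction of those also matching the consequent
                / len(matched)) if matched else 0.0
    return coverage, accuracy

records = [
    {"Blood Type": "Warm", "Lay Eggs": "Yes", "class": "Birds"},
    {"Blood Type": "Warm", "Lay Eggs": "No",  "class": "Mammals"},
    {"Blood Type": "Cold", "Lay Eggs": "Yes", "class": "Reptiles"},
    {"Blood Type": "Warm", "Lay Eggs": "Yes", "class": "Birds"},
]
condition, label = {"Blood Type": "Warm", "Lay Eggs": "Yes"}, "Birds"
print(coverage_and_accuracy(condition, label, records))  # -> (0.5, 1.0)
```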
Characteristics of Rule-Based Classifier
Mutually exclusive rules
- Classifier contains mutually exclusive rules if the rules are independent of each other
- Every record is covered by at most one rule

Exhaustive rules
- Classifier has exhaustive coverage if it accounts for every possible combination of attribute values
- Each record is covered by at least one rule
When rules are extracted directly from a decision tree (one rule per root-to-leaf path):
- Rules are mutually exclusive and exhaustive
- Rule set contains as much information as the tree
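Under the same assumed rule representation, both properties can be checked mechanically; this sketch simply counts how many rules fire for each record:

```python
# Sketch: test mutual exclusivity / exhaustiveness of a rule set over a dataset.

def covers(condition, record):
    return all(record.get(attr) == val for attr, val in condition.items())

def check_rule_set(rules, records):
    """rules: list of (condition dict, class label) pairs -- an assumed format."""
    fired = [sum(covers(cond, rec) for cond, _ in rules) for rec in records]
    mutually_exclusive = all(n <= 1 for n in fired)  # every record covered by at most one rule
    exhaustive = all(n >= 1 for n in fired)          # every record covered by at least one rule
    return mutually_exclusive, exhaustive
```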
Effect of Rule Simplification
Rules are no longer mutually exclusive
- A record may trigger more than one rule
- Ordered rule set – the highest-priority (most important) triggered rule decides the class
- Unordered rule set – use a voting scheme over the triggered rules (see the sketch after this list)
Rules are no longer exhaustive
- A record may not trigger any rule
- Use a default class
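For the unordered case, one plausible voting scheme (an assumption, not the only option) is a majority vote over the triggered rules, falling back to a default class when nothing fires:

```python
from collections import Counter

def covers(condition, record):
    return all(record.get(attr) == val for attr, val in condition.items())

def predict_unordered(rules, record, default_class):
    """Each triggered rule casts one vote for its class; ties broken by Counter order."""
    votes = Counter(cls for cond, cls in rules if covers(cond, record))
    if not votes:
        return default_class              # no rule fired -> default class
    return votes.most_common(1)[0][0]     # majority class among the triggered rules
```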
Ordered Rule Set
Rules are rank ordered according to their priority
- An ordered rule set is known as a decision list
When a test record is presented to the classifier
- It is assigned the class label of the highest-ranked rule it triggers
- If none of the rules fire, it is assigned to the default class (see the sketch below)
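A minimal decision-list sketch: rules are stored in priority order and the first rule that fires decides the class. The rule format and the example rule are illustrative assumptions:

```python
def covers(condition, record):
    return all(record.get(attr) == val for attr, val in condition.items())

def predict_ordered(decision_list, record, default_class):
    """Return the class of the highest-ranked (first) rule that covers the record."""
    for condition, cls in decision_list:   # assumed to be sorted by priority already
        if covers(condition, record):
            return cls
    return default_class                    # none of the rules fired

decision_list = [
    ({"Refund": "Yes"}, "Evade=No"),        # illustrative, echoing the earlier rule example
]
print(predict_ordered(decision_list, {"Refund": "No"}, "Evade=Yes"))  # -> 'Evade=Yes' (default)
```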
Rule Ordering Schemes
Rule-based ordering
- Individual rules are ranked based on their quality
Class-based ordering
- Rules that belong to the same class appear together
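A tiny sketch contrasting the two schemes, assuming each rule is stored with a precomputed quality score (e.g., its accuracy) and a class label; the data layout and the class-ranking criterion are assumptions:

```python
# Rule-based ordering: rank individual rules by their quality score.
def rule_based_order(rules):
    return sorted(rules, key=lambda r: r["quality"], reverse=True)

# Class-based ordering: rules of the same class appear together;
# the class ranking criterion (here: class name) is a placeholder assumption.
def class_based_order(rules):
    return sorted(rules, key=lambda r: (r["class"], -r["quality"]))
```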
Building Classification Rules
Direct Method
- Extract rules directly from data
- e.g., RIPPER, CN2, Holte's 1R
Indirect Method
- Extract rules from other classification models (e.g., decision trees, neural networks)
- e.g., C4.5rules
Direct Method: Sequential Covering
1. Start from an empty rule
2. Grow a rule using the Learn-One-Rule function
3. Remove training records covered by the rule
4. Repeat steps (2) and (3) until the stopping criterion is met (a skeleton of this loop is sketched below)
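A skeleton of this loop under the same assumed rule format; `learn_one_rule` is left as a parameter because the notes describe rule growing separately below, and the minimum-coverage stopping test is an assumption:

```python
def covers(condition, record):
    return all(record.get(attr) == val for attr, val in condition.items())

def sequential_covering(records, target_class, learn_one_rule, min_coverage=1):
    """Grow one rule at a time, remove the records it covers, and repeat."""
    rules, remaining = [], list(records)
    while any(r["class"] == target_class for r in remaining):
        condition, label = learn_one_rule(remaining, target_class)   # step 2: grow a rule
        covered = [r for r in remaining if covers(condition, r)]
        if len(covered) < min_coverage:                              # stopping criterion (assumed)
            break
        rules.append((condition, label))
        remaining = [r for r in remaining if r not in covered]       # step 3: remove covered records
    return rules
```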
Aspects of Sequential Covering
- Rule Growing
- Instance Elimination
- Rule Evaluation
- Stopping Criterion
- Rule Pruning
Rule Growing (Examples)
CN2 Algorithm:
- Start from an empty conjunct: {}
- Add conjuncts that minimize the entropy measure: {A}, {A,B}, … (the entropy computation is sketched after this list)
- Determine the rule consequent by taking the majority class of the instances covered by the rule
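A sketch of the entropy measure a CN2-style learner would minimize when comparing candidate conjuncts; it is computed over the class distribution of the records covered by the candidate rule:

```python
import math
from collections import Counter

def entropy_of_covered(covered_records):
    """Entropy of the class labels among records covered by a candidate rule."""
    if not covered_records:
        return 0.0
    counts = Counter(r["class"] for r in covered_records)
    total = len(covered_records)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```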
RIPPER Algorithm:
- Start from an empty rule: {} => class
- Add conjuncts that maximize FOIL's information gain measure:
  - R0: {} => class (initial rule)
  - R1: {A} => class (rule after adding conjunct)
  - $\text{Gain}(R_0, R_1) = t \times \left[ \log_2 \frac{p_1}{p_1 + n_1} - \log_2 \frac{p_0}{p_0 + n_0} \right]$
  - where
    - $t$: number of positive instances covered by both $R_0$ and $R_1$
    - $p_0$, $n_0$: number of positive / negative instances covered by $R_0$
    - $p_1$, $n_1$: number of positive / negative instances covered by $R_1$
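A direct translation of the gain formula above; because R1 is obtained from R0 by adding a conjunct, every positive instance covered by R1 is also covered by R0, so t equals p1:

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain for specializing rule R0 into R1.

    p0, n0: positive / negative instances covered by R0
    p1, n1: positive / negative instances covered by R1
    """
    t = p1   # positives covered by both R0 and R1 (R1 specializes R0)
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Example: R0 covers 100 positives / 400 negatives, R1 covers 30 positives / 10 negatives.
print(foil_gain(100, 400, 30, 10))   # positive gain -> the added conjunct helps
```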
Instance Elimination
Why do we need to eliminate instances?
- Otherwise, the next rule is identical to the previous rule
Why do we remove positive instances?
- Ensure that the next rule is different
Why do we remove negative instances?
- Prevent underestimating accuracy of rule
Stopping Criterion and Rule Pruning
Stopping criterion
- Compute the gain
- If gain is not significant, discard the new rule
Rule Pruning
- Similar to post-pruning of decision trees
- Reduced Error Pruning:
- Remove one of the conjuncts in the rule
- Compare error rate on validation set before and after pruning
- If error improves, prune the conjunct
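The reduced error pruning loop above can be sketched for a single rule using the assumed condition-dictionary format; the treatment of a rule that covers nothing on the validation set is an arbitrary assumption:

```python
def covers(condition, record):
    return all(record.get(attr) == val for attr, val in condition.items())

def validation_error(condition, rule_class, validation):
    covered = [r for r in validation if covers(condition, r)]
    if not covered:
        return 1.0   # no coverage: treated as worst case (assumption)
    return sum(r["class"] != rule_class for r in covered) / len(covered)

def prune_rule(condition, rule_class, validation):
    """Drop a conjunct whenever its removal lowers validation error; repeat until no removal helps."""
    condition = dict(condition)
    improved = True
    while improved and len(condition) > 1:
        improved = False
        base = validation_error(condition, rule_class, validation)
        for attr in list(condition):
            trial = {a: v for a, v in condition.items() if a != attr}
            if validation_error(trial, rule_class, validation) < base:
                condition, improved = trial, True   # keep the pruned version
                break
    return condition
```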
Summary of Direct Method
- Grow a single rule
- Remove instances covered by the rule
- Prune the rule (if necessary)
- Add the rule to the current rule set
- Repeat
Indirect Method: C4.5rules
For each rule r: A → y extracted from the decision tree:
- Consider an alternative rule r′: A′ → y, where A′ is obtained by removing one of the conjuncts in A
- Compare the pessimistic error rate for r against all r′ (see the sketch below)
- Prune if one of the r′ has a lower pessimistic error rate
- Repeat until we can no longer improve the generalization error
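The notes do not spell out how the pessimistic error rate is computed; a common choice, usually attributed to C4.5, is the upper bound of a binomial confidence interval on the observed error rate. The sketch below uses that bound, with the confidence parameter z as an assumption (z ≈ 0.67 roughly corresponds to C4.5's default 25% confidence factor):

```python
import math

def pessimistic_error(errors, n, z=0.674):
    """Upper confidence bound on the error rate of a rule covering n records with `errors` mistakes."""
    if n == 0:
        return 1.0                                  # uncovered rule: worst case (assumption)
    f = errors / n                                  # observed (optimistic) error rate
    numerator = f + z * z / (2 * n) + z * math.sqrt(f * (1 - f) / n + z * z / (4 * n * n))
    return numerator / (1 + z * z / n)

# Pruning keeps the simplified rule r' only if its pessimistic error is lower than that of r.
print(pessimistic_error(2, 20) < pessimistic_error(3, 20))   # -> True
```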
Instead of ordering the rules, order subsets of rules (class ordering)
- Each subset is a collection of rules with the same rule consequent (class)
- Compute description length of each subset
- Description length = L(error) + g × L(model) (similar to the description length used for decision trees)
- g is a parameter that takes into account the presence of redundant attributes in a rule set (default value = 0.5)
Advantages of Rule-Based Classifiers
- As highly expressive as decision trees
- Easy to interpret
- Easy to generate
- Can classify new instances rapidly
- Performance comparable to decision trees