Classifying the Iris Dataset with a CART Decision Tree in Python

花匠小妙招
2024-11-06 23:35

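1. Obtaining the dataset.

Load the Iris dataset that ships with scikit-learn.

2. Splitting the dataset.

Using the hold-out method, build a training set and a test set, making sure that each class has the same proportion in both.

Requirement: 80% training set, 20% test set.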
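As a quick check on the requirement above, the following sketch splits the data with `stratify=y` and counts the classes on each side. The seed value `0` and the names `train_counts` and `test_counts` are illustrative assumptions, not part of the assignment.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions equal in both subsets.
# random_state=0 is an arbitrary seed, chosen only to make the check repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

train_counts = np.bincount(y_train)  # samples per class in the training set
test_counts = np.bincount(y_test)    # samples per class in the test set

print("train:", len(y_train), train_counts)
print("test: ", len(y_test), test_counts)
```

Iris has 150 samples with 50 per class, so a stratified 80/20 split leaves 40 training and 10 test samples in every class.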

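3. Training the models.

Using the training set, fit two CART classification trees of different complexity (controlled by depth), visualize the fitted trees, and report the feature importance scores of each tree.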
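To make "controlled by depth" concrete, here is a minimal sketch that fits one tree per `max_depth` value and reports the depth actually reached, the number of leaves, and the importance scores keyed by feature name. For brevity it fits on the full dataset rather than a training split; the seed `0` and the `summary` dictionary are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

summary = {}
for max_depth in (2, 4):
    # random_state=0 is an arbitrary seed so that ties between
    # equally good splits are broken the same way on every run.
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(iris.data, iris.target)
    summary[max_depth] = {
        "depth": tree.get_depth(),        # depth the fitted tree actually reached
        "leaves": tree.get_n_leaves(),    # number of leaf nodes
        "importances": dict(zip(iris.feature_names, tree.feature_importances_)),
    }

for max_depth, info in summary.items():
    print(f"max_depth={max_depth}: depth={info['depth']}, leaves={info['leaves']}")
    for name, score in info["importances"].items():
        print(f"  {name}: {score:.5f}")
```

`max_depth` is only an upper bound: a tree stops earlier if its leaves become pure. The importance scores of a fitted tree are normalized, so they sum to 1.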

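4. Evaluating the classification trees on the test set.

(1) From the predicted and true classes of the test samples, build the confusion matrix and visualize it.

(2) From the confusion matrix, estimate the precision, recall, and F1 score of each class, as well as the macro precision, macro recall, and macro F1; also estimate the overall accuracy.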
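The metrics in (2) can all be read off the confusion matrix directly. The sketch below shows the arithmetic with NumPy on a small 3-class matrix whose values are made up for illustration; rows are true classes and columns are predicted classes, which is the layout `sklearn.metrics.confusion_matrix` returns.

```python
import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class.
cm = np.array([
    [10, 0, 0],
    [0, 9, 1],
    [0, 1, 9],
])

tp = np.diag(cm)                  # correctly classified samples per class
precision = tp / cm.sum(axis=0)   # TP / all samples predicted as the class
recall = tp / cm.sum(axis=1)      # TP / all samples truly in the class
f1 = 2 * precision * recall / (precision + recall)

# Macro averages are unweighted means of the per-class values.
macro_precision = precision.mean()
macro_recall = recall.mean()
macro_f1 = f1.mean()

# Accuracy is the share of all samples that lie on the diagonal.
accuracy = tp.sum() / cm.sum()

print("precision:", precision)
print("recall:   ", recall)
print("f1:       ", f1)
print("macro P/R/F1:", macro_precision, macro_recall, macro_f1)
print("accuracy: ", accuracy)
```

This sketch assumes every class is predicted at least once. If a column sum is zero, the division is undefined; scikit-learn's `precision_score` exposes a `zero_division` parameter for that case.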

5. Fit the same two trees of different depth on the entire dataset and visualize them.

The source code is as follows:

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score, accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Load the Iris dataset bundled with scikit-learn
iris = load_iris()
X, y = iris.data, iris.target

# Stratified 80/20 hold-out split. No random_state is passed, so the split
# differs on every run; pass an integer to make it reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)

# Fit a shallow tree (depth 2) and a deeper tree (depth 4) on the training set
dtree_shallow = DecisionTreeClassifier(max_depth=2)
dtree_shallow.fit(X_train, y_train)
dtree_deep = DecisionTreeClassifier(max_depth=4)
dtree_deep.fit(X_train, y_train)

# Draw both fitted trees side by side
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(15, 7))
plot_tree(dtree_shallow, filled=True, rounded=True, ax=axes[0],
          feature_names=iris.feature_names, class_names=iris.target_names)
axes[0].set_title('Shallow Decision Tree')
plot_tree(dtree_deep, filled=True, rounded=True, ax=axes[1],
          feature_names=iris.feature_names, class_names=iris.target_names)
axes[1].set_title('Deep Decision Tree')
plt.show()

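def show_score(tree):
    # Print the importance score of every feature, by feature index
    importance = tree.feature_importances_
    for i, v in enumerate(importance):
        print('Feature: %0d, Score: %.5f' % (i, v))

print("Feature importance scores of the depth-2 tree:")
show_score(dtree_shallow)
print("Feature importance scores of the depth-4 tree:")
show_score(dtree_deep)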

def show_confusion_matrix(tree, title):
    # Predict on the test set and draw the confusion matrix as a heatmap
    y_pred = tree.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    df_cm = pd.DataFrame(cm)
    ax = sns.heatmap(df_cm, annot=True, cmap="Purples")
    ax.set_title(title)
    ax.set_xlabel('predict target')
    ax.set_ylabel('true target')
    plt.show()

show_confusion_matrix(dtree_shallow, 'Confusion Matrix of ShallowTree')
show_confusion_matrix(dtree_deep, 'Confusion Matrix of DeepTree')

def show_performance_measurement(tree):
    y_pred = tree.predict(X_test)
    # Per-class metrics (average=None returns one value per class)
    precision = precision_score(y_test, y_pred, average=None)
    recall = recall_score(y_test, y_pred, average=None)
    f1 = f1_score(y_test, y_pred, average=None)
    # Overall accuracy and macro-averaged metrics
    accuracy = accuracy_score(y_test, y_pred)
    macro_precision = precision_score(y_test, y_pred, average='macro')
    macro_recall = recall_score(y_test, y_pred, average='macro')
    macro_f1 = f1_score(y_test, y_pred, average='macro')
    print(f'Precision: {precision}')
    print(f'Recall: {recall}')
    print(f'F1 score: {f1}')
    print(f'Accuracy: {accuracy}')
    print(f'Macro Precision: {macro_precision}')
    print(f'Macro Recall: {macro_recall}')
    print(f'Macro F1 score: {macro_f1}')

print("Performance metrics of the depth-2 tree:")
show_performance_measurement(dtree_shallow)
print("Performance metrics of the depth-4 tree:")
# Evaluate the deep tree here (not dtree_shallow a second time)
show_performance_measurement(dtree_deep)

# Fit the same two trees on the entire dataset
X = iris.data
y = iris.target
tree1 = DecisionTreeClassifier(max_depth=2)
tree1.fit(X, y)
tree2 = DecisionTreeClassifier(max_depth=4)
tree2.fit(X, y)

# Draw both full-data trees side by side
plt.figure(figsize=(15, 7))
plt.subplot(1, 2, 1)
plot_tree(tree1, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.title('Decision Tree with max depth 2')
plt.subplot(1, 2, 2)
plot_tree(tree2, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.title('Decision Tree with max depth 4')
plt.show()

Program output and figures:

Feature importance scores of the depth-2 tree:
Feature: 0, Score: 0.00000
Feature: 1, Score: 0.00000
Feature: 2, Score: 0.00000
Feature: 3, Score: 1.00000
Feature importance scores of the depth-4 tree:
Feature: 0, Score: 0.01875
Feature: 1, Score: 0.01875
Feature: 2, Score: 0.05648
Feature: 3, Score: 0.90602
Performance metrics of the depth-2 tree:
Precision: [1. 0.9 0.9]
Recall: [1. 0.9 0.9]
F1 score: [1. 0.9 0.9]
Accuracy: 0.9333333333333333
Macro Precision: 0.9333333333333332
Macro Recall: 0.9333333333333332
Macro F1 score: 0.9333333333333332

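Performance metrics of the depth-4 tree:
Precision: [1. 0.9 0.9]
Recall: [1. 0.9 0.9]
F1 score: [1. 0.9 0.9]
Accuracy: 0.9333333333333333
Macro Precision: 0.9333333333333332
Macro Recall: 0.9333333333333332
Macro F1 score: 0.9333333333333332

Process finished with exit code 0

The two metric blocks are identical because, in the run shown here, show_performance_measurement was called on dtree_shallow both times, so the second block does not reflect dtree_deep. The split is also random, so the exact numbers vary from run to run.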
