import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
iris = pd.read_csv('E:/练习/Iris.csv')
iris.head()
   id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
0   1            5.1           3.5            1.4           0.2        0
1   2            4.9           3.0            1.4           0.2        0
2   3            4.7           3.2            1.3           0.2        0
3   4            4.6           3.1            1.5           0.2        0
4   5            5.0           3.6            1.4           0.2        0
iris.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   id             150 non-null    int64
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    int64
dtypes: float64(4), int64(2)
memory usage: 7.2 KB
iris.drop('id',axis=1,inplace=True)
iris.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   SepalLengthCm  150 non-null    float64
 1   SepalWidthCm   150 non-null    float64
 2   PetalLengthCm  150 non-null    float64
 3   PetalWidthCm   150 non-null    float64
 4   Species        150 non-null    int64
dtypes: float64(4), int64(1)
memory usage: 6.0 KB
# Sepal length vs. sepal width, one colour per species (0=Setosa, 1=versicolor, 2=virginica)
ax = iris[iris.Species==0].plot(kind='scatter',x='SepalLengthCm',y='SepalWidthCm',color='orange', label='Setosa')
iris[iris.Species==1].plot(kind='scatter',x='SepalLengthCm',y='SepalWidthCm',color='blue', label='versicolor',ax=ax)
iris[iris.Species==2].plot(kind='scatter',x='SepalLengthCm',y='SepalWidthCm',color='green', label='virginica', ax=ax)
ax.set_xlabel("Sepal Length")
ax.set_ylabel("Sepal Width")
ax.set_title("Sepal Length VS Width")
fig=plt.gcf()
fig.set_size_inches(10,6)
plt.show()
# Petal length vs. petal width, one colour per species
ax = iris[iris.Species==0].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='orange', label='Setosa')
iris[iris.Species==1].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='blue', label='versicolor',ax=ax)
iris[iris.Species==2].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='green', label='virginica', ax=ax)
ax.set_xlabel("Petal Length")
ax.set_ylabel("Petal Width")
ax.set_title("Petal Length VS Width")
fig=plt.gcf()
fig.set_size_inches(10,6)
plt.show()
iris.hist(edgecolor='black', linewidth=1.2)
fig=plt.gcf()
fig.set_size_inches(12,6)
plt.show()
plt.figure(figsize=(15,10))
plt.subplot(2,2,1)
sns.violinplot(x='Species',y='PetalLengthCm',data=iris)
plt.subplot(2,2,2)
sns.violinplot(x='Species',y='PetalWidthCm',data=iris)
plt.subplot(2,2,3)
sns.violinplot(x='Species',y='SepalLengthCm',data=iris)
plt.subplot(2,2,4)
sns.violinplot(x='Species',y='SepalWidthCm',data=iris)
<AxesSubplot:xlabel='Species', ylabel='SepalWidthCm'>
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import svm
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier
pandas.DataFrame.shape: returns the dimensions of the DataFrame
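iris.shape
(150, 5)
When training any algorithm, the number of features and the correlations between them play an important role. If many of the features are highly correlated, training on all of them can reduce accuracy, so features should be selected carefully. This dataset has only a few features, but we will still look at their correlations.
pandas.DataFrame.corr: computes the correlation coefficients
seaborn.heatmap: draws a heat map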
plt.figure(figsize=(7,4))
sns.heatmap(iris.corr(),annot=True,cmap='cubehelix_r')
plt.show()
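The heat map shows the correlations visually; the same matrix can also be ranked numerically. The following is a minimal sketch rather than part of the original notebook. It assumes the copy of Iris bundled with scikit-learn stands in for the local CSV, and the 0.8 threshold is a hypothetical cut-off chosen only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: use scikit-learn's bundled Iris data instead of the local CSV,
# with columns renamed to match the names used in this post.
data = load_iris()
features = pd.DataFrame(
    data.data,
    columns=["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"],
)

corr = features.corr()  # Pearson correlation by default

# Keep only the upper triangle so that each feature pair appears once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()
pairs = pairs.sort_values(key=abs, ascending=False)

threshold = 0.8  # hypothetical cut-off, for illustration only
high = pairs[pairs.abs() > threshold]
print(high)
```

Pairs that appear in `high` are candidates for dropping one of the two features, although whether that helps depends on the model.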
train, test = train_test_split(iris, test_size = 0.3)
print(train.shape)
print(test.shape)
(105, 5)
(45, 5)
Extract the training features X (['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']), the training labels y, the test features X, and the test labels y.
train_X = train[['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']]
train_y = train.Species  # output of our training data
test_X = test[['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']]
test_y = test.Species
Inspect the training and test sets
train_X.head(2)
    SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
72            6.3           2.5            4.9           1.5
39            5.1           3.4            1.5           0.2
test_X.head(2)
     SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
143            6.8           3.2            5.9           2.3
40             5.0           3.5            1.3           0.3
Class labels of the training set (the classes annotated in the original data)
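train_y.head()
72     1
39     0
21     0
109    2
106    2
Name: Species, dtype: int64
SVM (Support Vector Machine)
sklearn.svm.SVC: the SVC algorithm
sklearn.svm.SVC.fit: trains the algorithm on the training set
sklearn.svm.SVC.predict: takes the test set and returns the predicted values
sklearn.metrics.accuracy_score: compares the predictions with the true values and returns the accuracy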
model = svm.SVC()
model.fit(train_X,train_y)
prediction=model.predict(test_X)
print('The accuracy of the SVM is:',metrics.accuracy_score(prediction,test_y))
The accuracy of the SVM is: 0.9777777777777777
The SVM achieves very good accuracy on this split. Next we check the accuracy of other models, training each algorithm with the same steps as above.
Logistic Regression
model = LogisticRegression()
model.fit(train_X,train_y)
prediction=model.predict(test_X)
print('The accuracy of the Logistic Regression is',metrics.accuracy_score(prediction,test_y))
The accuracy of the Logistic Regression is 0.9777777777777777
D:\Anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:763: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
Decision Tree
model=DecisionTreeClassifier()
model.fit(train_X,train_y)
prediction=model.predict(test_X)
print('The accuracy of the Decision Tree is',metrics.accuracy_score(prediction,test_y))
The accuracy of the Decision Tree is 0.9777777777777777
K-Means Clustering (KMeans)
from sklearn.cluster import KMeans
model = KMeans(n_clusters=3)
model.fit(train_X)  # KMeans is unsupervised, so the labels are not needed for fitting
prediction=model.predict(test_X)
# Cluster ids are arbitrary, so this score is not a true classification accuracy
print('The accuracy of the KMeans is',metrics.accuracy_score(prediction,test_y))
# Select the test points assigned to each cluster (as a NumPy array)
test_arr = test_X.values
x0 = test_arr[prediction == 0]
x1 = test_arr[prediction == 1]
x2 = test_arr[prediction == 2]
# Columns 0 and 1 of test_X are sepal length and sepal width
plt.scatter(x0[:, 0], x0[:, 1], c = "red", marker='o', label='label0')
plt.scatter(x1[:, 0], x1[:, 1], c = "green", marker='*', label='label1')
plt.scatter(x2[:, 0], x2[:, 1], c = "blue", marker='+', label='label2')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend(loc=2)
plt.show()
The accuracy of the KMeans is 0.3111111111111111
Note: the original version of this cell indexed the tuple (train_X,train_y) with a boolean array, for example x0 = (train_X,train_y)[prediction == 0], and stopped with "TypeError: only integer scalar arrays can be converted to a scalar index". Indexing the test features by the predicted cluster, as above, avoids the error. The low score does not by itself mean the clustering is poor: KMeans numbers its clusters arbitrarily, so cluster 0 need not correspond to Species 0.
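Because KMeans assigns cluster numbers arbitrarily, comparing them directly with the species codes says little. One common workaround, shown here as a sketch and not as the author's method, is to map each cluster to the most frequent true class among its training points before scoring. The sketch assumes scikit-learn's bundled Iris data and a fixed `random_state=0`; `cluster_to_class` is a name introduced only for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: bundled Iris data and a fixed seed, for reproducibility.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)

# Map every cluster id to the most common true class among its training points.
train_clusters = kmeans.labels_
cluster_to_class = {
    c: int(np.bincount(y_train[train_clusters == c]).argmax())
    for c in range(kmeans.n_clusters)
}

# Translate predicted cluster ids into class labels, then score as usual.
mapped = np.array([cluster_to_class[c] for c in kmeans.predict(X_test)])
acc = accuracy_score(y_test, mapped)
print("Accuracy after mapping clusters to classes:", acc)
```

The mapping uses the training labels, so the resulting number measures how well the clusters align with the species rather than how well an unsupervised model "classifies".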
K-Nearest Neighbours (KNN)
n_neighbors=3: the class is decided by looking at the 3 nearest points
model=KNeighborsClassifier(n_neighbors=3)
model.fit(train_X,train_y)
prediction=model.predict(test_X)
print('The accuracy of the KNN is',metrics.accuracy_score(prediction,test_y))
The accuracy of the KNN is 0.9777777777777777
Check how the accuracy of KNN changes for different values of n_neighbors; the predicted class is decided by majority vote among the neighbours.
a_index=list(range(1,11))
# A plain list replaces the empty pd.Series and Series.append of the original,
# which raised a DeprecationWarning (Series.append was removed in pandas 2.0)
accuracies=[]
for i in a_index:
    model=KNeighborsClassifier(n_neighbors=i)
    model.fit(train_X,train_y)
    prediction=model.predict(test_X)
    accuracies.append(metrics.accuracy_score(prediction,test_y))
plt.plot(a_index, accuracies)
plt.xticks(a_index)
plt.show()
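The loop above scores each value of n_neighbors on a single 45-row test split, so the curve can change from one random split to the next. A more stable comparison, sketched here under the assumption that 5-fold cross-validation on scikit-learn's bundled Iris data is acceptable, averages the accuracy over several folds. The names `k_values`, `mean_scores`, and `best_k` are introduced only for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: bundled Iris data instead of the local CSV.
X, y = load_iris(return_X_y=True)

k_values = list(range(1, 11))
mean_scores = []
for k in k_values:
    # cv=5 uses stratified 5-fold splitting for classifiers
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    mean_scores.append(scores.mean())

# Pick the k with the highest mean accuracy (first one in case of ties).
best_k = k_values[mean_scores.index(max(mean_scores))]
for k, score in zip(k_values, mean_scores):
    print(f"n_neighbors={k}: mean accuracy {score:.3f}")
print("Best k:", best_k)
```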
petal=iris[['PetalLengthCm','PetalWidthCm','Species']]
sepal=iris[['SepalLengthCm','SepalWidthCm','Species']]
train_p,test_p=train_test_split(petal,test_size=0.3,random_state=0)  # petals
train_x_p=train_p[['PetalWidthCm','PetalLengthCm']]
train_y_p=train_p.Species
test_x_p=test_p[['PetalWidthCm','PetalLengthCm']]
test_y_p=test_p.Species
train_s,test_s=train_test_split(sepal,test_size=0.3,random_state=0)  # sepals
train_x_s=train_s[['SepalWidthCm','SepalLengthCm']]
train_y_s=train_s.Species
test_x_s=test_s[['SepalWidthCm','SepalLengthCm']]
test_y_s=test_s.Species
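Fixing random_state=0 makes these two splits reproducible, and because both frames have the same number of rows, the petal and sepal subsets end up with the same rows in train and test. A purely random split can still leave the three species unevenly represented. As a possible variation (an assumption, not something the original notebook does), `stratify` keeps the class proportions equal in both parts. The sketch uses scikit-learn's bundled Iris data, where columns 2 and 3 are the petal measurements.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: bundled Iris data; columns 2 and 3 hold petal length and width.
X, y = load_iris(return_X_y=True)
petal_X = X[:, 2:4]

# stratify=y keeps the 50/50/50 class balance in both parts of the split.
X_train, X_test, y_train, y_test = train_test_split(
    petal_X, y, test_size=0.3, random_state=0, stratify=y
)

train_counts = np.bincount(y_train)  # samples per class in the training part
test_counts = np.bincount(y_test)    # samples per class in the test part
print("train:", train_counts)
print("test: ", test_counts)
```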
model=svm.SVC()
model.fit(train_x_p,train_y_p)
prediction=model.predict(test_x_p)
print('The accuracy of the SVM using Petals is:',metrics.accuracy_score(prediction,test_y_p))
model=svm.SVC()
model.fit(train_x_s,train_y_s)
prediction=model.predict(test_x_s)
print('The accuracy of the SVM using Sepal is:',metrics.accuracy_score(prediction,test_y_s))
The accuracy of the SVM using Petals is: 0.9777777777777777
The accuracy of the SVM using Sepal is: 0.8
model = LogisticRegression()
model.fit(train_x_p,train_y_p)
prediction=model.predict(test_x_p)
print('The accuracy of the Logistic Regression using Petals is:',metrics.accuracy_score(prediction,test_y_p))
model.fit(train_x_s,train_y_s)
prediction=model.predict(test_x_s)
print('The accuracy of the Logistic Regression using Sepals is:',metrics.accuracy_score(prediction,test_y_s))
The accuracy of the Logistic Regression using Petals is: 0.9777777777777777
The accuracy of the Logistic Regression using Sepals is: 0.8222222222222222
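The earlier four-feature logistic regression reported a ConvergenceWarning whose message suggests raising max_iter or scaling the data. A minimal sketch of that advice follows. It assumes scikit-learn's bundled Iris data and a fixed split; `max_iter=200` is an illustrative value, not a tuned one.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled Iris data and a fixed seed.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Standardise the features, then fit logistic regression with a higher
# iteration limit (200 is an illustrative value).
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
clf.fit(X_train, y_train)

acc = clf.score(X_test, y_test)
print("Accuracy with scaling:", acc)
```

Putting the scaler inside the pipeline means it is fitted on the training rows only, so no information from the test set leaks into the preprocessing.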
model=DecisionTreeClassifier()
model.fit(train_x_p,train_y_p)
prediction=model.predict(test_x_p)
print('The accuracy of the Decision Tree using Petals is:',metrics.accuracy_score(prediction,test_y_p))
model.fit(train_x_s,train_y_s)
prediction=model.predict(test_x_s)
print('The accuracy of the Decision Tree using Sepals is:',metrics.accuracy_score(prediction,test_y_s))
The accuracy of the Decision Tree using Petals is: 0.9555555555555556
The accuracy of the Decision Tree using Sepals is: 0.6444444444444445
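The gap between the petal and sepal results can also be examined from the model's side: a fitted decision tree exposes `feature_importances_`, which indicates how much each feature contributed to its splits. The following sketch trains on all four features of scikit-learn's bundled Iris data (an assumption; the notebook itself does not do this) and compares the combined petal and sepal shares.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Assumption: bundled Iris data with the column names used in this post.
X, y = load_iris(return_X_y=True)
names = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Importances are normalised to sum to 1 across all features.
importances = pd.Series(tree.feature_importances_, index=names)
print(importances.sort_values(ascending=False))

petal_share = importances[["PetalLengthCm", "PetalWidthCm"]].sum()
sepal_share = importances[["SepalLengthCm", "SepalWidthCm"]].sum()
print("petal share:", petal_share, "sepal share:", sepal_share)
```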
model=KMeans(n_clusters=3)
model.fit(train_x_p,train_y_p)  # the labels are ignored: KMeans is unsupervised
prediction=model.predict(test_x_p)
print('The accuracy of the KMeans using Petals is:',metrics.accuracy_score(prediction,test_y_p))
model.fit(train_x_s,train_y_s)
prediction=model.predict(test_x_s)
print('The accuracy of the KMeans using Sepals is:',metrics.accuracy_score(prediction,test_y_s))
The accuracy of the KMeans using Petals is: 0.022222222222222223
The accuracy of the KMeans using Sepals is: 0.7555555555555555
Because KMeans numbers its clusters arbitrarily, these two figures depend on how the cluster ids happen to line up with the species codes, so they are not comparable with the classifier accuracies above.
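A score as low as 0.02 is what accuracy_score produces when the clusters are fine but their ids are permuted relative to the species codes. The toy example below (hand-made labels, not the notebook's data) shows a perfect grouping that scores 0 on accuracy, and compares it with the adjusted Rand index, which ignores how clusters are numbered.

```python
import numpy as np
from sklearn.metrics import accuracy_score, adjusted_rand_score

# Hand-made toy labels: the grouping is identical, only the ids differ.
y_true = np.array([0, 0, 1, 1, 2, 2])
clusters = np.array([2, 2, 0, 0, 1, 1])

acc = accuracy_score(y_true, clusters)       # compares ids position by position
ari = adjusted_rand_score(y_true, clusters)  # compares groupings, not ids

print("accuracy:", acc)              # 0.0
print("adjusted Rand index:", ari)   # 1.0
```

For clustering results, a permutation-invariant score such as `adjusted_rand_score` is usually a safer choice than accuracy.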
model=KNeighborsClassifier(n_neighbors=3)
model.fit(train_x_p,train_y_p)
prediction=model.predict(test_x_p)
print('The accuracy of the KNN using Petals is:',metrics.accuracy_score(prediction,test_y_p))
model.fit(train_x_s,train_y_s)
prediction=model.predict(test_x_s)
print('The accuracy of the KNN using Sepals is:',metrics.accuracy_score(prediction,test_y_s))
The accuracy of the KNN using Petals is: 0.9777777777777777
The accuracy of the KNN using Sepals is: 0.7333333333333333
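The petal/sepal comparison repeats the same fit, predict, and score steps for every classifier, so it can be condensed into a loop. This is a sketch under assumptions: it uses scikit-learn's bundled Iris data, addresses features by column index, and the `subsets`, `models`, and `results` dictionaries are names introduced only for this example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: bundled Iris data; columns 0-1 are sepal, 2-3 are petal.
X, y = load_iris(return_X_y=True)
subsets = {"sepal": [0, 1], "petal": [2, 3]}

models = {
    "SVM": SVC(),
    "Logistic Regression": LogisticRegression(max_iter=200),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=3),
}

results = {}
for subset_name, cols in subsets.items():
    # Same test_size and seed as the notebook's petal/sepal splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X[:, cols], y, test_size=0.3, random_state=0
    )
    for model_name, model in models.items():
        model.fit(X_train, y_train)  # refitting discards the previous fit
        results[(model_name, subset_name)] = model.score(X_test, y_test)

for (model_name, subset_name), acc in results.items():
    print(f"{model_name} using {subset_name}: {acc:.3f}")
```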
Source: Machine Learning Classification Algorithms: SVM, Logistic Regression, KNN https://m.huajiangbk.com/newsview1114421.html