Decision Tree

 : CART - supports both classification and regression
 : a model that applies a sequence of simple rules to perform classification or prediction
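
As a minimal sketch of how the entropy criterion (used in the code below) scores a split - the entropy helper here is illustrative, not scikit-learn code:

import numpy as np

def entropy(labels):
    # H = -sum(p * log2(p)) over the class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# a split is chosen to maximize information gain:
# gain = H(parent) - weighted sum of H(child) over the child nodes
print(entropy(['man', 'woman', 'woman'])) # ~0.918 bits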

 

Random Forest

 : an ensemble model
 : uses Decision Trees as its base models (a minimal usage sketch follows)
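
A minimal RandomForestClassifier sketch (the tiny x/y data mirrors tree1.py below; this block is illustrative, not part of the original script):

from sklearn.ensemble import RandomForestClassifier

x = [[180, 15], [177, 42], [156, 35], [174, 5], [166, 33]]
y = ['man', 'woman', 'woman', 'man', 'woman']

# n_estimators: number of Decision Trees whose votes are aggregated
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(x, y)
print(rf.predict([[171, 8]])) # expected: ['man'], matching tree1.py below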

 

 * tree1.py

import pydotplus
from sklearn import tree

# classify man / woman from height and hair length
x = [[180, 15],
     [177, 42],
     [156, 35],
     [174, 5],
     [166, 33]]

y = ['man', 'woman', 'woman', 'man', 'woman']
label_names = ['height', 'hair length']

model = tree.DecisionTreeClassifier(criterion='entropy', random_state=0)
print(model)
fit = model.fit(x, y)
print('acc :{:.3f}'.format(fit.score(x, y))) # acc :1.000

mydata = [[171, 8]]
pred = fit.predict(mydata)
print('pred :', pred) # pred : ['man']

tree.DecisionTreeClassifier() : see the scikit-learn reference:
scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
# visualization - render the tree with the graphviz tool
import collections

dot_data = tree.export_graphviz(model, feature_names=label_names, out_file=None,
                                filled=True, rounded=True)
graph = pydotplus.graph_from_dot_data(dot_data)
colors = ('red', 'orange')
edges = collections.defaultdict(list) # maps each parent node id to its child node ids

for e in graph.get_edge_list():
    edges[e.get_source()].append(int(e.get_destination()))

for e in edges:
    edges[e].sort()
    for i in range(2): # binary splits: color the left and right child differently
        dest = graph.get_node(str(edges[e][i]))[0]
        dest.set_fillcolor(colors[i])

graph.write_png('tree.png') # save the rendered tree as an image

import matplotlib.pyplot as plt

img = plt.imread('tree.png')
plt.imshow(img)
plt.show()
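
If graphviz/pydotplus is not available, scikit-learn (0.21+) can draw the tree directly; a minimal sketch reusing model and label_names from tree1.py above:

import matplotlib.pyplot as plt
from sklearn import tree

plt.figure(figsize=(8, 5))
tree.plot_tree(model, feature_names=label_names, filled=True, rounded=True)
plt.show()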

 


 * tree2_iris.py

...

# Decision Tree model
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier(criterion='entropy', max_depth=5)

...
...
# feature importances: how much each feature contributes to the tree's decisions overall
print('feature importances : \n{}'.format(model.feature_importances_))

def plot_feature_importances(model):
    n_features = x.shape[1] # 4
    plt.barh(range(n_features), model.feature_importances_, align='center')
    #plt.yticks(np.arange(n_features), iris.feature_names[2:4])
    plt.xlabel('feature importance')
    plt.ylabel('feature')
    plt.ylim(-1, n_features)

plot_feature_importances(model)
plt.show()
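
To print each importance next to its feature name - a small sketch assuming the iris and model objects from the elided code above:

for name, score in zip(iris.feature_names, model.feature_importances_):
    print('{} : {:.3f}'.format(name, score))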

# graphviz
from sklearn import tree
from io import StringIO
import pydotplus

dot_data = StringIO() # in-memory buffer that behaves like a file
tree.export_graphviz(model, out_file=dot_data,
                     feature_names=iris.feature_names[2:4])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_png('tree2.png')

import matplotlib.pyplot as plt

img = plt.imread('tree2.png')
plt.imshow(img)
plt.show()

 


# Decision Tree among the classification models (R)

set.seed(123)
ind <- sample(1:nrow(iris), nrow(iris) * 0.7, replace = FALSE) # 70/30 train/test split
train <- iris[ind, ]
test <- iris[-ind, ]

 

# ctree

install.packages("party")
library(party) # conditional inference trees (ctree)
iris_ctree <- ctree(formula = Species ~ ., data = train)
iris_ctree
# Conditional inference tree with 4 terminal nodes
# 
# Response:  Species 
# Inputs:  Sepal.Length, Sepal.Width, Petal.Length, Petal.Width 
# Number of observations:  105 
# 
# 1) Petal.Length <= 1.9; criterion = 1, statistic = 97.466
#   2)*  weights = 36 
# 1) Petal.Length > 1.9
#   3) Petal.Width <= 1.7; criterion = 1, statistic = 45.022
#     4) Petal.Length <= 4.6; criterion = 0.987, statistic = 8.721
#       5)*  weights = 25 
#     4) Petal.Length > 4.6
#       6)*  weights = 10 
#   3) Petal.Width > 1.7
#     7)*  weights = 34
plot(iris_ctree, type = "simple")
plot(iris_ctree)

 

# predict

pred <- predict(iris_ctree, test)
pred
t <- table(pred, test$Species) # arguments: predicted values, actual values
t
# pred         setosa versicolor virginica
# setosa         14          0         0
# versicolor      0         18         1
# virginica       0          0        12
sum(diag(t)) / nrow(test) # 0.9777778
library(caret)
confusionMatrix(pred, test$Species)
# Confusion Matrix and Statistics
# 
# Reference
# Prediction   setosa versicolor virginica
# setosa         14          0         0
# versicolor      0         18         1
# virginica       0          0        12
# 
# Overall Statistics
# 
# Accuracy : 0.9778          
# 95% CI : (0.8823, 0.9994)
# No Information Rate : 0.4             
# P-Value [Acc > NIR] : < 2.2e-16       
# 
# Kappa : 0.9662          
# 
# Mcnemar's Test P-Value : NA              
# 
# Statistics by Class:
# 
#                      Class: setosa Class: versicolor Class: virginica
# Sensitivity                 1.0000            1.0000           0.9231
# Specificity                 1.0000            0.9630           1.0000
# Pos Pred Value              1.0000            0.9474           1.0000
# Neg Pred Value              1.0000            1.0000           0.9697
# Prevalence                  0.3111            0.4000           0.2889
# Detection Rate              0.3111            0.4000           0.2667
# Detection Prevalence        0.3111            0.4222           0.2667
# Balanced Accuracy           1.0000            0.9815           0.9615

 

# Method 2 : rpart - uses pruning

library(rpart)

iris_rpart <- rpart(Species ~ ., data = train, method = 'class')
x11()
plot(iris_rpart)
text(iris_rpart)

plotcp(iris_rpart)
printcp(iris_rpart)

# pick the cp value with the lowest cross-validated error, then prune
cp <- iris_rpart$cptable[which.min(iris_rpart$cptable[, 'xerror']), 'CP']
iris_rpart_prune <- prune(iris_rpart, cp = cp)
x11()
plot(iris_rpart_prune)
text(iris_rpart_prune)
