SVM Classification Analysis of Heart Disease Data with R Part 2

Aug 12, 2021

Assalamua’laikum (peace be upon you),

Hello statisticians, this article continues the previous one, which classified the Heart Disease data with an ANN (artificial neural network). This time, the author uses SVM (support vector machine) classification and compares its accuracy with the ANN results obtained previously.

Support Vector Machine (SVM) is a relatively new technique (1995) for prediction, in both classification and regression, and it has become very popular in recent years. SVM belongs to the same class of methods as neural networks in terms of the functions and problems it can handle: both are supervised learning methods. Researchers and practitioners have applied SVM to many real problems, from gene expression analysis, finance, and weather to the medical field, and in many of these applications SVM has been reported to give better results than neural networks, especially in terms of the solutions achieved. Theoretically, SVM was developed for two-class classification problems as a search for the best hyperplane, that is, the function that separates the two classes in the input space (Nurhayati, 2015).
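For readers who want the formula behind "the best hyperplane", the textbook hard-margin formulation for two classes labelled y_i in {-1, +1} is sketched below. This is standard notation, not taken from Nurhayati (2015): the hyperplane is defined by a weight vector w and a bias b, a point is classified by the sign of w'x + b, and the best hyperplane is the one that maximises the margin 2/||w|| between the classes.

```latex
\begin{aligned}
&\text{Hyperplane:}    && \mathbf{w}^\top \mathbf{x} + b = 0 \\
&\text{Decision rule:} && f(\mathbf{x}) = \operatorname{sign}\!\left(\mathbf{w}^\top \mathbf{x} + b\right) \\
&\text{Margin:}        && \frac{2}{\lVert \mathbf{w} \rVert} \\
&\text{Optimization:}  && \min_{\mathbf{w},\,b} \tfrac{1}{2}\lVert \mathbf{w} \rVert^2
  \quad \text{s.t.} \quad y_i\left(\mathbf{w}^\top \mathbf{x}_i + b\right) \ge 1,\; i = 1,\dots,n
\end{aligned}
```

In practice, software such as e1071 solves the soft-margin version, in which a cost parameter (`cost` in `svm()`) penalises points that violate the margin, and a kernel replaces the inner product so that the separator can be non-linear in the original inputs.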

The Heart Disease data set dates from 1988 and consists of four databases: Cleveland, Hungary, Switzerland, and Long Beach V. It contains 76 attributes, including the predicted attribute, but all published experiments use a subset of 14 of them. The "target" field indicates whether the patient has heart disease: it is an integer where 0 = negative (no heart disease) and 1 = positive (heart disease). The data set contains 303 observations and is available at https://www.kaggle.com/ronitf/heart-disease-uci.

The first step is to install the packages used in the SVM analysis and load them with the library() command:

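# install.packages(c("e1071", "caret"))  # run once if the packages are not installed yet
library(e1071)     # svm() and predict() for SVM models
library(caret)     # confusionMatrix()
# The packages below are not needed by the code in this article
library(devtools)
library(datasets)
library(rgl)
library(misc3d)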

Then, read the heart disease data from a CSV file:

heart_deseases = read.csv(file.choose(), header = TRUE, sep = ",")  # file.choose() opens a dialog to select the CSV file

Next, look at the data frame: its structure, the variable names, a summary of each variable, and the levels of the target variable:

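str(heart_deseases)             # structure: type of each variable
View(heart_deseases)            # opens the data in a spreadsheet-style viewer
names(heart_deseases)           # variable names
summary(heart_deseases)         # summary statistics for each variable
factor(heart_deseases$target)   # target as a factor, with its levels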

View() displays the heart disease data set, summary() gives summary statistics for each variable, and factor() shows the levels of the target variable, 0 and 1.
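Before splitting, it can help to check how balanced the two target classes are, since accuracy is easier to interpret when neither class dominates. A minimal base R sketch on a hypothetical toy target vector, not the real data; on the real data the same calls would be `table(heart_deseases$target)` and `prop.table(table(heart_deseases$target))`:

```r
# Hypothetical toy target vector standing in for heart_deseases$target
target <- c(0, 1, 1, 0, 1, 1, 0, 1)

counts <- table(target)            # number of patients in each class
proportions <- prop.table(counts)  # share of each class (sums to 1)

print(counts)
print(proportions)
```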

Then, split the heart disease data into two parts: 65% of the data for training and 35% for testing:

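n = round(nrow(heart_deseases) * 0.65)   # size of the training set
n
set.seed(123)                            # makes the random split reproducible
train_idx = sample(seq_len(nrow(heart_deseases)), size = n)   # random row indices for training
train = heart_deseases[train_idx, ]      # training data
test = heart_deseases[-train_idx, ]      # test data (all remaining rows)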
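The split can be sanity-checked: the training and test rows should not overlap and together should cover every row. A minimal base R sketch assuming 303 rows, as in this data set; `n_total`, `train_idx`, and `test_idx` are illustrative names:

```r
n_total <- 303                          # rows in the heart disease data
n <- round(n_total * 0.65)              # 65% of rows for training
set.seed(123)                           # reproducible random split
train_idx <- sample(seq_len(n_total), size = n)
test_idx <- setdiff(seq_len(n_total), train_idx)  # remaining rows for testing

cat("training rows:", length(train_idx), "\n")
cat("test rows:", length(test_idx), "\n")
cat("shared rows:", length(intersect(train_idx, test_idx)), "\n")
```

With 303 rows, `round(303 * 0.65)` gives 197 training rows, leaving 106 rows for testing.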

Next, build the Support Vector Machine model on the training data using a polynomial kernel:

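# factor(target) makes svm() fit a classification model instead of a regression model
data.svm <- svm(factor(target) ~ ., data = train, kernel = "polynomial")
data.svm            # prints the call, kernel settings and number of support vectors
summary(data.svm)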
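With `kernel = "polynomial"` and no further arguments, e1071's `svm()` uses its documented defaults: `degree = 3`, `coef0 = 0`, `cost = 1`, and `gamma` equal to 1 divided by the number of input columns, with the variables scaled before fitting (`scale = TRUE`). The polynomial kernel between two observations u and v is (gamma * u'v + coef0)^degree. The base R sketch below only illustrates that formula; the helper `poly_kernel` is hypothetical and is not how e1071 computes the kernel internally:

```r
# Polynomial kernel as documented for e1071::svm: (gamma * <u, v> + coef0)^degree
poly_kernel <- function(u, v, gamma, coef0 = 0, degree = 3) {
  (gamma * sum(u * v) + coef0)^degree
}

u <- c(1, 2)
v <- c(3, 4)
# Two input columns, so the default gamma would be 1/2 here
k <- poly_kernel(u, v, gamma = 1 / length(u))
print(k)  # (0.5 * 11)^3 = 166.375
```

If all 13 predictors of the heart data are read as numeric, the default gamma would be 1/13. Other settings can be passed explicitly, for example `svm(factor(target) ~ ., data = train, kernel = "polynomial", degree = 2, cost = 10)`.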

The SVM model is now ready for classification. It can be used to classify, or predict, the Y (target) variable, namely the heart disease diagnosis. Predictions are made on the test data as follows:

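databaru = test[, -14]    # "databaru" = new data: test predictors only (target is column 14)
databaru.pred = predict(data.svm, newdata = databaru, decision.values = TRUE)
databaru.pred
# decision values: their sign determines the predicted class
databaru.dv = attr(databaru.pred, "decision.values")
databaru.dv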

This gives the predicted heart disease diagnosis for each test patient, based on the 13 predictor variables, together with its decision value.

From the output, the sign of the decision value determines the predicted class. For example, the second observation has a negative decision value (-0.12007287) and is predicted as 1 (positive heart disease), whereas the 92nd observation has a positive decision value (0.08987262) and is predicted as 0 (negative heart disease). Which sign corresponds to which class depends on the order of the class labels in the fitted model, which is shown in the column name of the decision values.
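The sign rule can be made explicit. In e1071, which wraps libsvm, the decision-value column of a two-class model is named after the label pair in the model's internal order (for example `0/1`), and a positive value is assigned to the first label in that name, a negative value to the second. The sketch below applies that rule in base R to the two decision values quoted above. The column name `0/1` is an assumption for illustration (check `colnames(databaru.dv)` on the real model), and the helper `class_from_dv` is hypothetical:

```r
# Map binary SVM decision values to labels, given a column name such as "0/1".
# Assumed convention: positive value -> first label, negative value -> second label.
class_from_dv <- function(dv, pair_name) {
  labels <- strsplit(pair_name, "/", fixed = TRUE)[[1]]
  ifelse(dv > 0, labels[1], labels[2])
}

dv <- c(-0.12007287, 0.08987262)  # decision values quoted for two test patients
pred <- class_from_dv(dv, "0/1")  # "0/1" is an assumed label order
print(pred)  # "1" "0"
```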

To check more clearly whether the predictions of the SVM model match the original data, the model is applied to the test data as follows:

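prediksi <- predict(data.svm, test)   # "prediksi" = predicted class for each test patient
prediksi
table(prediksi)                       # number of patients predicted in each class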

This prints the predicted class of each test patient and the number of patients predicted in each class.

From the output, a predicted value of 1, as for the 2nd observation, means the diagnosis is predicted to be positive heart disease, whereas a predicted value of 0, as for the 92nd observation, means the diagnosis is predicted to be negative heart disease.

The last step is to evaluate the accuracy of the SVM model with a confusion matrix:

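# positive = "1" treats heart disease as the positive class; caret otherwise uses the first level, 0
confusionMatrix(prediksi, factor(test$target), positive = "1")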

From the output, the SVM classification made 54 correct predictions for class 1 (positive heart disease) and 31 correct predictions for class 0 (negative heart disease), with 18 wrong predictions for class 0 and 3 wrong predictions for class 1. The accuracy is therefore (54 + 31) / (54 + 18 + 31 + 3) = 85 / 106 = 0.8019, meaning 80.19% of the test patients were classified correctly, which is quite a good result.
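The accuracy reported by `confusionMatrix()` is the share of test cases on the diagonal of the prediction-versus-reference table, i.e. where the predicted class equals the true class. A minimal base R sketch of that calculation, using hypothetical toy vectors rather than the article's data (on the real data, `prediksi` and `factor(test$target)` would take their place):

```r
# Hypothetical predicted and true classes for six patients
predicted <- factor(c(1, 0, 1, 1, 0, 0), levels = c(0, 1))
actual    <- factor(c(1, 0, 0, 1, 0, 1), levels = c(0, 1))

# Rows: predicted class, columns: true class (the layout caret prints)
cm <- table(Prediction = predicted, Reference = actual)
accuracy <- sum(diag(cm)) / sum(cm)   # correct predictions / all predictions

print(cm)
print(accuracy)  # 4 of the 6 toy cases are correct
```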

For comparison, the ANN classification of the same heart disease data in the previous article reached an accuracy of 0.7619 (76.19%). Therefore, of the two methods, SVM gives the better classification of heart disease diagnoses for this data, with an accuracy of 0.8019 (80.19%).

Reference:



Written by Falah Novayanda Adlin
