Confusion Matrix and Cyber Attack

Confusion Matrix, Cybercrimes, Security, Machine learning

Aaryan
5 min read · Jun 6, 2021

What is a cyber attack?

A cyber attack is an attempt to gain unauthorized access to a computer, computing system, or computer network with the intent to cause damage. Cyber attacks aim to disable, disrupt, destroy or control computer systems or to alter, block, delete, manipulate or steal the data held within these systems.

What is Cyber Security?

Cyber security refers to the body of technologies, processes, and practices designed to protect networks, devices, programs, and data from attack, damage, or unauthorized access. Cyber security may also be referred to as information technology security.

What is machine learning for cybersecurity?

Machine learning algorithms help security teams save time by automatically identifying security incidents and threats, analyzing them, and even automatically responding to them in some cases. Machine learning is built into many modern security tools.

Some companies use machine learning predictions to analyze cybercrime. To trust such a model, we need to know how effective it is, which is why we use the confusion matrix to evaluate it. There are a few problems with this, however, which we will discuss in this article.

What is a Confusion Matrix?

A confusion matrix is a table that is used to describe the performance of a Machine Learning model on a set of test data for which the true values are known.

It is an N x N matrix used for evaluating the performance of a model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. From this, we get to know how well our model is performing and what kinds of errors it is making.

Confusion matrix (2x2)

From the above figure:

We have,

  • The target variable has two values: 1 and 0
  • 1 and 0 are also known as Positive values and Negative values respectively.
  • The columns of the confusion matrix represent the Actual Values and the rows represent the Predicted Values.

Let’s understand TP, FP, FN, TN in a Confusion Matrix

  • True Positive (TP): The actual value was positive and the model predicted a positive value.
  • True Negative (TN): The actual value was negative and the model also predicted a negative value.
  • False Positive (FP): The actual value was negative but the model predicted a positive value. This is also known as a Type I error.
  • False Negative (FN): The actual value was positive but the model predicted a negative value. This is also known as a Type II error.
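The four outcomes above can be counted directly from paired lists of actual and predicted labels. The sketch below is a minimal illustration in plain Python; the function name `confusion_counts` and the sample labels are made up for this example and do not come from any library or from the article's data.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1  # actual positive, predicted positive
        elif a == 0 and p == 0:
            tn += 1  # actual negative, predicted negative
        elif a == 0 and p == 1:
            fp += 1  # actual negative, predicted positive (Type I error)
        elif a == 1 and p == 0:
            fn += 1  # actual positive, predicted negative (Type II error)
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical labels, invented only to exercise the function.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```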

The accuracy formula for a model is: Accuracy = (TN + TP) / (TN + FP + FN + TP)

Accuracy formula

Let’s take an example:

We have a total of 165 people, and our model predicts ‘yes’ for male and ‘no’ for female. Out of those 165 people, the model predicted “yes” 110 times and “no” 55 times. So according to the predictions there are 110 males and 55 females, but in reality 105 of them are male and 60 are female.

  • True Negative: Of the 55 times the model predicted ‘no’ (female), 50 predictions were correct, which means 50 of those people are actually female.
  • True Positive: Of the 110 times the model predicted ‘yes’ (male), 100 predictions were correct, which means 100 of those people are actually male.
  • False Positive: 10 people are female but the model predicted that they are male. A false positive is also called a Type I error.
  • False Negative: 5 of the 55 people the model predicted to be female are actually male. A false negative is also called a Type II error.

Now let’s calculate the accuracy:

(TN + TP) / (TN + FP + FN + TP) = (50 + 100) / (50 + 10 + 5 + 100)

Accuracy = (100 + 50) / 165

≈ 0.91
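The same arithmetic can be checked in a few lines of Python. This is only a sketch: the helper name `accuracy` is chosen for illustration, and the counts are the ones from the example above.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Counts from the 165-person example: 100 TP, 50 TN, 10 FP, 5 FN.
acc = accuracy(tp=100, tn=50, fp=10, fn=5)
print(round(acc, 2))  # 0.91
```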

91% accuracy looks very good here, and in many cases high accuracy is all we need. But for some critical data, the Type I and Type II errors hidden behind that number cause real problems, for example in a model used for data security.

An Overview of False Positives and False Negatives in Cybersecurity

Let’s say we have a model that predicts whether or not a file contains trojan malware, and its accuracy is 91 percent.

Because of false positives, the model may sometimes flag a good file as harmful and remove it, which is a problem if that file is important. False negatives matter even more in this case. For example, suppose I download a file that contains trojan malware and the model predicts that it is clean: I could lose all my files. These are the problems with false negatives and false positives, so we should be very careful about both.
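To see how much a single accuracy figure can hide, it helps to look at each error type separately. The sketch below assumes, purely for illustration, that the malware detector produced the same counts as the earlier example (100 TP, 50 TN, 10 FP, 5 FN); the article does not give real counts for it. The function name `error_rates` is also made up. The false positive rate is FP / (FP + TN), the share of clean files wrongly flagged, and the false negative rate is FN / (FN + TP), the share of infected files that slip through.

```python
def error_rates(tp, tn, fp, fn):
    """Return (false positive rate, false negative rate)."""
    fpr = fp / (fp + tn)  # clean files wrongly flagged as malware
    fnr = fn / (fn + tp)  # infected files wrongly passed as clean
    return fpr, fnr


# Assumed counts, borrowed from the earlier example for illustration only.
fpr, fnr = error_rates(tp=100, tn=50, fp=10, fn=5)
print(f"False positive rate: {fpr:.3f}")  # 0.167
print(f"False negative rate: {fnr:.3f}")  # 0.048
```

Under these assumed counts, a model with 91% accuracy would still flag roughly one clean file in six and miss about one infected file in twenty.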

False Positive or Type I error

This type of error can prove to be very dangerous. Our system may predict a normal file to be malware and remove it, which can cause big trouble if that file is important for our project. Losing it could even cost us our job.

False Negative or Type II error

This type of error is also very dangerous: our model predicts a dangerous file to be a normal file and ignores it, and that file may then damage our OS or steal our data.

Hence, we can conclude that Type I and Type II errors are both very critical in cyber security.
