Malware Detection By Machine Learning Techniques

Research Aim, Objectives and Questions

Malware is software designed to infiltrate and damage a computer system or to steal data from it, usually while the owner of the system remains unaware of the attack. A simple classification divides malware into file infectors and standalone malware (Varma, Raj & Raju, 2017). Another classification distinguishes worms, backdoors, Trojans, spyware and adware. Detection methods therefore need to be advanced enough to detect these different types of malware. Standard detection methods struggle to detect malware in a computer system (Kolosnjaji et al., 2016). Malware applications use polymorphic layers to avoid detection and mechanisms to update themselves automatically to newer versions within a short period. This research discusses various detection techniques, including a few machine-learning methods that might help in detecting malware on a system.

As the diversity of malware increases all over the world, anti-virus software is unable to provide full protection to companies and individual computer users. According to Kaspersky Lab, 6,563,145 different systems have been attacked and 400,000 unique malware programs have been detected (Anderson & McGrew, 2017). There is therefore a pressing need for techniques that reduce these numbers. At the same time, the expertise available to companies and individuals for handling such attacks has decreased, which makes machine learning increasingly important. Attack tools are increasing on a daily basis. Although many anti-malware techniques are available on the market for detecting malware over the internet, there remains an opportunity for better anti-malware solutions (Narudin et al., 2016).

This research will focus on identifying new techniques for detecting malware in a system. The importance of machine learning for detecting malware on the internet will be analyzed. The primary goal of the research is the detection of malware and the identification of issues in implementing these techniques.

The accuracy of anti-malware software has been decreasing, while the number of malware samples has been increasing, as discussed earlier. According to Avasarala, Day & Steiner (2016), there has been an 8.7% increase in cyber-attacks over the internet. This has created a high level of risk for companies and for home computer users.

Literature Review

Kaspersky Lab has recently reported that many companies are suffering data loss due to cyber-attacks. According to Gandotra, Bansal & Sofat (2014), cyber-attacks over the internet have increased, and the use of anti-malware software has increased in response. However, effective anti-malware software remains scarce, and existing products are often unable to detect new malware. This makes the problem difficult to solve (Kumar, Gao, Welch & Mansoori, 2016).

This research examines the use of machine learning for classifying malware and protecting against it. Various machine learning techniques will be discussed in the study. The research will help in maintaining the security of data and information in computer systems.

The aim of the research is to classify malware with the help of machine learning method.

The research objectives are as follows:

  • To identify different types of malware over the internet
  • To understand the need for machine learning in detecting malware
  • To implement proper strategies in order to mitigate issues related to malware

The research questions are as follows:

  • What are the different types of malware over the internet?
  • What is the need for machine learning in detecting malware?
  • How can these issues be mitigated using different strategies?

Malware found on the internet can be classified into several types.

Virus: A virus is the simplest form of malware, yet it can be among the most dangerous. It enters a user's system without permission and damages it (Allix et al., 2016).

Worm: A worm is similar to a virus, but it can spread over a network and damage all the systems connected to that network (Kolosnjaji et al., 2016).

Trojan: A Trojan presents itself as legitimate software. It acts as a general spreading vector that relies on social engineering, so users are misled into believing they are installing legitimate software.

Adware: This malware displays fake advertisements on the computer screen. Adware is present on many websites on the internet (Friedrichs, Huger & O’donnell, 2015).

Spyware: This malware acts as an agent that conveys details from one computer to another party. It infiltrates the computer to steal data and information, for example by monitoring the search history and sending personal details over the internet. The use of different algorithms might help in minimizing the threat and risk of malware attacks on a computer system (Milosevic, Dehghantanha & Choo, 2017).

Backdoor: A backdoor provides a secret entrance into the computer for other malware, which gets into the system with its help. A backdoor therefore does not attack independently; it operates together with the other malware that enters the computer through it.

Ransomware: This type of malware encrypts the data and information stored on the computer, locking the user out of it. A ransom is then demanded for decrypting the data (Dash et al., 2016).

Remote Administration Tools (RAT): This malware allows an attacker to gain access to the computer and change its settings. It can even change the passwords of accounts stored on the computer (Chen, Ye & Bourlai, 2017).

Detection Methods

According to Sethi et al. (2018), malware detection methods are either signature-based or behavior-based. Various detection methods are discussed below:

File Format Inspection: File metadata provides useful information about a file. For example, Windows Portable Executable (PE) files provide information such as the compile time and the exported functions (Yerima, Sezer & Muttik, 2015).

String Extraction: This method refers to the examination of the strings a program can output, for example status and error messages, in order to infer information about how the malware operates.

Fingerprinting: This method computes cryptographic hashes of a file so that the file can be uniquely identified. Known artifacts can therefore be detected by comparing their hashes (Meidan et al., 2017).
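The three inspection methods above can be sketched with the Python standard library. This is only an illustrative sketch: the function names, the synthetic PE header and the sample byte string are assumptions made for the example and are not taken from the cited studies. Only the compile timestamp is read from the PE header; parsing exported functions is left out.

```python
import hashlib
import re
import struct
from datetime import datetime, timezone


def pe_compile_time(data: bytes) -> datetime:
    """File format inspection: read the timestamp from a PE file's COFF header."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # Offset 0x3C of the DOS header stores the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # After the 4-byte signature: Machine (2 bytes), NumberOfSections (2), TimeDateStamp (4).
    (timestamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def extract_strings(data: bytes, min_length: int = 4) -> list[str]:
    """String extraction: return runs of printable ASCII of at least min_length."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return [match.decode("ascii") for match in re.findall(pattern, data)]


def fingerprint(data: bytes) -> str:
    """Fingerprinting: SHA-256 digest that identifies the exact file content."""
    return hashlib.sha256(data).hexdigest()


# Synthetic header built only for demonstration; a real file would be read with open(path, "rb").
header = bytearray(0x80)
header[:2] = b"MZ"
struct.pack_into("<I", header, 0x3C, 0x40)           # PE signature lives at offset 0x40
header[0x40:0x44] = b"PE\0\0"
struct.pack_into("<I", header, 0x48, 1_500_000_000)  # hypothetical compile timestamp

blob = b"\x00\x01Error: cannot open file\x00\xff\xfeab\x00"  # hypothetical file content
known_bad_hashes = {fingerprint(blob)}                        # hypothetical signature database

print(pe_compile_time(bytes(header)))
print(extract_strings(blob))
print(fingerprint(blob) in known_bad_hashes)
```

A hash comparison of this kind only recognises files that are byte-for-byte identical to a known sample, which is one reason the text goes on to consider learning-based methods.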

K-nearest neighbors (KNN) is one of the simplest machine learning algorithms. It is non-parametric, meaning that it makes no assumption about the underlying data when detecting malware. The algorithm can be used for both regression and classification problems. A prediction is based on the K training instances closest to the query. In KNN classification, the output class is the majority class among the K nearest neighbors (Bekerman et al., 2015).

According to Chumachenko (2017), Euclidean distance works well when the features are of the same type. The value of k plays an important role in the accuracy of the algorithm: a small value of k lowers accuracy, whereas a large value of k lowers the performance of the algorithm.
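The KNN classification described above can be sketched as follows. The two-dimensional feature vectors, the labels and the choice of k = 3 are made-up values for illustration only; real feature vectors would come from the inspection methods discussed earlier.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Classify query by majority vote among its k nearest training records.

    training is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    neighbours = sorted(training, key=lambda record: math.dist(record[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training records: (feature vector, label).
training = [
    ((0.9, 0.8), "malware"), ((0.8, 0.9), "malware"), ((0.7, 0.95), "malware"),
    ((0.1, 0.2), "clean"), ((0.2, 0.1), "clean"), ((0.15, 0.3), "clean"),
]

print(knn_predict(training, (0.85, 0.85)))
print(knn_predict(training, (0.1, 0.1)))
```

Because every prediction sorts the whole training set by distance, the cost of a query grows with the number of training records, which is worth keeping in mind when choosing k and the size of the training set.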

This research will help in detecting malware by means of machine learning algorithms. Different algorithms will be used for this purpose. In particular, the perceptron algorithm will be used in this study for detecting malware (Chumachenko, 2017).

− F = (f_a1, f_a2, ..., f_an) is an array representing the feature values associated with a file, where f_ai are file features.
− R_i = (F_i, label_i) is a record, where F_i is an array of file feature values as above and label_i is a boolean tag. The value of label_i identifies the file characterised by the array of feature values F_i as being either a malware file or a clean file.
− R = (R_1, R_2, ..., R_m) is the set of records associated with the training files.

Perceptron algorithms

NumberOfIterations ← 0
MaxIterations ← 100
repeat
    Train(R, 1, -1)
    while FP(R) > 0 do
        Train(R, 0, -1)
    end while
    NumberOfIterations ← NumberOfIterations + 1
until (TP(R) = NumberOfMalwareFiles) or (NumberOfIterations = MaxIterations)
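The loop above can be sketched in Python under stated assumptions. The Train subroutine is not defined in this text, so the sketch assumes that its two numeric arguments are signed update steps: the first is applied to malware records that were missed, the second to clean records that were wrongly flagged. The linear score with a bias term, the function names and the toy records are likewise assumptions made for illustration.

```python
def score(w, b, x):
    """Linear score of feature vector x; a positive score means 'flag as malware'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train(w, b, records, step_malware, step_clean):
    """One pass over the records, updating only on misclassified ones (assumed Train)."""
    for x, is_malware in records:
        flagged = score(w, b, x) > 0
        if is_malware and not flagged:
            step = step_malware          # missed malware
        elif flagged and not is_malware:
            step = step_clean            # false positive
        else:
            continue
        w = [wi + step * xi for wi, xi in zip(w, x)]
        b += step
    return w, b


def false_positives(w, b, records):
    return sum(1 for x, is_malware in records if not is_malware and score(w, b, x) > 0)


def true_positives(w, b, records):
    return sum(1 for x, is_malware in records if is_malware and score(w, b, x) > 0)


def osp_train(records, max_iterations=100):
    """Repeat a normal pass, then clean-only passes until no false positive is left."""
    n_malware = sum(1 for _, is_malware in records if is_malware)
    w, b = [0.0] * len(records[0][0]), 0.0
    for _ in range(max_iterations):
        w, b = train(w, b, records, 1, -1)
        while false_positives(w, b, records) > 0:
            w, b = train(w, b, records, 0, -1)
        if true_positives(w, b, records) == n_malware:
            break
    return w, b


# Hypothetical records R_i = (F_i, label_i) with binary features; True marks malware.
records = [((1, 0, 1), True), ((1, 1, 0), True), ((0, 1, 0), False), ((0, 0, 1), False)]
w, b = osp_train(records)
print(w, b, false_positives(w, b, records), true_positives(w, b, records))
```

Whatever the outer loop achieves on the malware side, the function only returns after the inner loop has finished, so the returned model has no false positives on the training records.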

Algorithms 1 and 2 will be used in what follows as building blocks in cascade classification stages.

NumberOfIterations ← 0
MaxIterations ← 100
repeat
    Train(R, 1, -1)
    R′ = R − {all malware samples}
    while FP(R′) > 0 do
        Train(R′, 0, -1)
        R′ = R′ − {all samples correctly classified}
    end while
    NumberOfIterations ← NumberOfIterations + 1
until (TP(R) = NumberOfMalwareFiles) or (NumberOfIterations = MaxIterations)

Algorithm 3 is the first main optimization of the OSP algorithm. It reduces the size of the training set and thereby increases the speed of training. By using this optimized version of the OSP algorithm, the speed of detecting malware will be increased.
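The effect of shrinking the training set can be illustrated with a sketch of the inner loop alone. It assumes a linear score with a bias term, an update that subtracts a wrongly flagged clean sample from the weights, and binary (non-negative) features. Under that last assumption an update never raises the score of any clean sample, so a sample that has been removed stays correctly classified. The function names and starting weights are hypothetical.

```python
def score(w, b, x):
    """Linear score of feature vector x; a positive score means 'flag as malware'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def suppress_false_positives(w, b, clean_samples):
    """Clean-only training that drops samples once they are correctly classified."""
    remaining = list(clean_samples)              # R' = R - {all malware samples}
    set_sizes = []                               # size of R' at the start of each pass
    while any(score(w, b, x) > 0 for x in remaining):
        set_sizes.append(len(remaining))
        for x in remaining:                      # assumed behaviour of Train(R', 0, -1)
            if score(w, b, x) > 0:
                w = [wi - xi for wi, xi in zip(w, x)]
                b -= 1
        # R' = R' - {all samples correctly classified}
        remaining = [x for x in remaining if score(w, b, x) > 0]
    return w, b, set_sizes


# Hypothetical starting weights that wrongly flag the first clean sample.
clean = [(0, 1, 0), (0, 0, 1)]
w, b, set_sizes = suppress_false_positives([0, 5, 1], 0, clean)
print(w, b, set_sizes)
```

Each pass touches only the samples that are still misclassified, which is where the reduction in training time comes from.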

The data analysis was carried out using the random forest method. Random forest is a bootstrapping algorithm that is based on the decision tree (CART) model. The method draws different variables and samples during the experimental analysis, and the process is repeated many times to collect samples as data for the analysis. Fine-tuning was performed on the collected data samples. A 5-fold cross-validation was performed in 3 experiments that applied machine learning to cyber security.
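The two data-handling steps mentioned above, bootstrap sampling and 5-fold cross-validation, can be sketched with the standard library. The sketch does not build decision trees; it only shows how the samples would be drawn and split. The function names, the seed and the sample count of 20 are assumptions made for illustration.

```python
import random


def bootstrap_sample(records, rng):
    """Draw len(records) records with replacement, as is done for each tree."""
    return [rng.choice(records) for _ in records]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]    # k disjoint folds of near-equal size
    for i, test_indices in enumerate(folds):
        train_indices = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_indices, test_indices


rng = random.Random(42)
records = list(range(20))                        # stand-ins for 20 labelled files
sample = bootstrap_sample(records, rng)
splits = list(k_fold_indices(len(records), k=5))

for train_indices, test_indices in splits:
    print(len(train_indices), len(test_indices))
```

In each of the five rounds a model would be trained on the 16 training indices and evaluated on the 4 held-out ones, so every record is used for testing exactly once.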

On completion of the research, it can be presumed that detecting malware on the internet is possible. The machine learning techniques discussed in the study might help in detecting and removing malware from computer systems. Malware applications use polymorphic layers to avoid detection and mechanisms to update themselves automatically to newer versions within a short period. Nevertheless, the machine-learning methods discussed in this research might help in detecting such malware, and the role of antivirus software can be fulfilled by machine learning techniques.


References

Allix, K., Bissyandé, T. F., Jérome, Q., Klein, J., & Le Traon, Y. (2016). Empirical assessment of machine learning-based malware detectors for Android. Empirical Software Engineering, 21(1), 183-211.

Anderson, B., & McGrew, D. (2017, August). Machine learning for encrypted malware traffic classification: accounting for noisy labels and non-stationarity. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1723-1732). ACM.

Avasarala, B. R., Day, J. C., & Steiner, D. (2016). U.S. Patent No. 9,292,688. Washington, DC: U.S. Patent and Trademark Office.

Bekerman, D., Shapira, B., Rokach, L., & Bar, A. (2015, September). Unknown malware detection using network traffic classification. In Communications and Network Security (CNS), 2015 IEEE Conference on (pp. 134-142). IEEE.

Chen, L., Ye, Y., & Bourlai, T. (2017, September). Adversarial Machine Learning in Malware Detection: Arms Race between Evasion Attack and Defense. In Intelligence and Security Informatics Conference (EISIC), 2017 European (pp. 99-106). IEEE.

Chumachenko, K. (2017). Machine Learning Methods for Malware Detection and Classification.

Dash, S. K., Suarez-Tangil, G., Khan, S., Tam, K., Ahmadi, M., Kinder, J., & Cavallaro, L. (2016, May). Droidscribe: Classifying android malware based on runtime behavior. In Security and Privacy Workshops (SPW), 2016 IEEE (pp. 252-261). IEEE.

Friedrichs, O., Huger, A., & O’donnell, A. J. (2015). U.S. Patent No. 9,088,601. Washington, DC: U.S. Patent and Trademark Office.

Gandotra, E., Bansal, D., & Sofat, S. (2014). Malware analysis and classification: A survey. Journal of Information Security, 5(02), 56.

Kolosnjaji, B., Zarras, A., Lengyel, T., Webster, G., & Eckert, C. (2016, July). Adaptive semantics-aware malware classification. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (pp. 419-439). Springer, Cham.

Kolosnjaji, B., Zarras, A., Webster, G., & Eckert, C. (2016, December). Deep learning for classification of malware system call sequences. In Australasian Joint Conference on Artificial Intelligence (pp. 137-149). Springer, Cham.

Kumar, S., Gao, X., Welch, I., & Mansoori, M. (2016, March). A machine learning based web spam filtering approach. In Advanced Information Networking and Applications (AINA), 2016 IEEE 30th International Conference on (pp. 973-980). IEEE.

Meidan, Y., Bohadana, M., Shabtai, A., Guarnizo, J. D., Ochoa, M., Tippenhauer, N. O., & Elovici, Y. (2017, April). ProfilIoT: a machine learning approach for IoT device identification based on network traffic analysis. In Proceedings of the Symposium on Applied Computing (pp. 506-509). ACM.

Milosevic, N., Dehghantanha, A., & Choo, K. K. R. (2017). Machine learning aided android malware classification. Computers & Electrical Engineering, 61, 266-274.

Narudin, F. A., Feizollah, A., Anuar, N. B., & Gani, A. (2016). Evaluation of machine learning classifiers for mobile malware detection. Soft Computing, 20(1), 343-357.

Sethi, K., Chaudhary, S. K., Tripathy, B. K., & Bera, P. (2018, January). A Novel Malware Analysis Framework for Malware Detection and Classification using Machine Learning Approach. In Proceedings of the 19th International Conference on Distributed Computing and Networking (p. 49). ACM.

Varma, P. R. K., Raj, K. P., & Raju, K. S. (2017, February). Android mobile security by detecting and classification of malware based on permissions using machine learning algorithms. In I-SMAC (IoT in Social, Mobile, Analytics and Cloud)(I-SMAC), 2017 International Conference on (pp. 294-299). IEEE.

Vatamanu, C., Cosovan, D., Gavrilut, D., & Luchian, H. (2015). A comparative study of malware detection techniques using machine learning methods. World Academy of Science, Engineering and Technology, International Journal of Computer and Information Engineering, 2(5).

Yerima, S. Y., Sezer, S., & Muttik, I. (2015). High accuracy android malware detection using ensemble learning. IET Information Security, 9(6), 313-320.
