Source Code Similarity Indicators Based on Machine Learning
Abstract
This paper presents a hybrid, system-driven machine learning approach to classifying plagiarized programming assignments in academia. Previous research has produced several similarity and clone detection engines that identify plagiarized source-code assignments and help evaluators find the students responsible. This work continues an earlier module in which our similarity detection engine, 'SimDec', detects plagiarism in C and C++ source code. Here we show how machine learning algorithms can solve prediction problems on the data generated by the 'SimDec' engine. Three supervised learning methods, a multi-class SVM, logistic regression, and a simple neural network, were applied to data generated by attribute counting methods (ATM) to predict the plagiarism severity of the source code. As part of this incremental process, unsupervised learning was also applied to the same data with the target variable discarded, to provide a comparative study of ML algorithms. Finally, directions for improving the system are pointed out. The proposed hybrid approach would reduce our similarity detector's time complexity and processing time.
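The abstract does not list the attributes that 'SimDec' counts, so the following is only a minimal sketch of the general attribute counting idea, not the authors' implementation. The attribute set, the keyword list, and the function names `attribute_counts` and `similarity` are all assumptions made for illustration. Each source file is reduced to a small vector of counts, and two vectors are compared with a simple normalized difference.

```python
import re

# Hypothetical keyword subset; the real attribute set used by SimDec is not given.
C_KEYWORDS = {"if", "else", "for", "while", "return", "int", "float", "char", "void"}


def attribute_counts(source: str) -> dict:
    """Reduce a C/C++ source string to a few simple counts (assumed attributes)."""
    # Split into identifiers/keywords, integer literals, and single punctuation characters.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)
    words = [t for t in tokens if re.match(r"[A-Za-z_]", t)]
    identifiers = [w for w in words if w not in C_KEYWORDS]
    return {
        "lines": sum(1 for line in source.splitlines() if line.strip()),
        "tokens": len(tokens),
        "keywords": sum(1 for w in words if w in C_KEYWORDS),
        "unique_identifiers": len(set(identifiers)),
        "operators": sum(1 for t in tokens if t in "+-*/%=<>!&|"),
    }


def similarity(a: dict, b: dict) -> float:
    """Score in [0, 1]: one minus the normalized absolute difference of the counts."""
    diff = sum(abs(a[k] - b[k]) for k in a)
    total = sum(a[k] + b[k] for k in a)
    return 1.0 if total == 0 else 1.0 - diff / total


original = "int sum(int a, int b) {\n    return a + b;\n}\n"
renamed = "int add(int x, int y) {\n    return x + y;\n}\n"
unrelated = "int main() {\n    return 0;\n}\n"

# Renaming identifiers leaves every count unchanged, so the pair scores as identical.
print(similarity(attribute_counts(original), attribute_counts(renamed)))
# A structurally different program produces a lower score.
print(similarity(attribute_counts(original), attribute_counts(unrelated)))
```

Count vectors of this kind, paired with a severity label, are the sort of tabular input that the supervised classifiers named in the abstract could be trained on. With the label dropped, the same vectors could be passed to a clustering algorithm.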
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors hold copyright to the articles or abstracts submitted for publication.
All articles or abstracts published by JMRT are under the terms of the Creative Commons Attribution License. This permits anyone to copy, distribute, transmit and adapt the work provided the original work and source is appropriately cited.