Source Code Similarity Indicators Based on Machine Learning

Main Article Content

Ajinkya Rajendrakumar Kunjir, Jinan Fiaidhi


This paper presents hybrid, system-driven machine learning research for classifying plagiarized academic assignments. Previous research has produced several similarity (clone) detection engines that identify plagiarized source-code assignments and make it easier for evaluators to identify the offenders. The research presented here succeeds an earlier module in which our similarity detection engine, 'SimDec', detects plagiarism in C and C++ source code. This paper illustrates how machine learning algorithms solve prediction problems on the data generated by the 'SimDec' software engine. Multi-class SVM, logistic regression, and a simple neural network, all supervised learning methods, were applied to data generated by attribute counting methods (ATM) to predict the plagiarism severity of the source code. As part of this incremental process, unsupervised learning was also applied to the same data, with the target variable discarded, to provide a comparative study of ML algorithms. Finally, directions for improving the system are pointed out. The proposed hybrid approach would reduce our similarity detector's time complexity and processing time.
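To make the attribute-counting idea concrete, the following is a minimal sketch of how lexical attributes of a C/C++ snippet might be counted and compared. The attribute set, the function names `count_attributes` and `similarity`, and the overlap score are illustrative assumptions; the abstract does not describe SimDec's actual attributes or scoring, and this is not its implementation.

```python
import re
from collections import Counter

# Hypothetical attribute set: the abstract does not list SimDec's attributes.
C_KEYWORDS = {"if", "else", "for", "while", "return", "int", "float", "char", "void"}

# Identifiers/keywords, integer literals, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]")


def count_attributes(source: str) -> Counter:
    """Count simple lexical attributes of a C/C++ snippet."""
    counts = Counter()
    for token in TOKEN_RE.findall(source):
        if token in C_KEYWORDS:
            counts["keyword:" + token] += 1
        elif re.match(r"[A-Za-z_]", token):
            # All identifiers share one bucket, so renaming does not change counts.
            counts["identifier"] += 1
        elif token.isdigit():
            counts["number"] += 1
        else:
            counts["symbol:" + token] += 1
    return counts


def similarity(a: Counter, b: Counter) -> float:
    """Weighted overlap of two attribute profiles, in [0, 1] (assumed metric)."""
    shared = sum((a & b).values())  # per-attribute minimum
    total = sum((a | b).values())   # per-attribute maximum
    return shared / total if total else 1.0


original = "int sum(int a, int b) { return a + b; }"
renamed = "int add(int x, int y) { return x + y; }"
different = "void loop() { for (int i = 0; i < 10; i++) { } }"

score_renamed = similarity(count_attributes(original), count_attributes(renamed))
score_different = similarity(count_attributes(original), count_attributes(different))
print(score_renamed, score_different)
```

Because identifiers are pooled into a single attribute, the renamed copy has the same profile as the original, while the unrelated snippet scores lower. Attribute counts of this kind could serve as the feature vectors passed to a classifier, with a severity label as the target.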

Article Details

Author Biography

Jinan Fiaidhi, Full Professor and Biotechnology PhD Program Coordinator

Dr. Jinan Fiaidhi has been a full Professor of Computer Science and the Graduate Coordinator of the PhD program in Biotechnology at Lakehead University, Ontario, Canada, since late 2001. She was the graduate coordinator for the Lakehead University Computer Science MSc program from 2009 to 2018. She is also an Adjunct Research Professor with the University of Western Ontario. She received her graduate degrees in Computer Science from Essex University (PgD, 1983) and Brunel University (PhD, 1986). From 1986 to 2001, Dr. Fiaidhi served in many academic positions, e.g. University of Technology (Assoc. Prof. and Chairperson), Philadelphia University (Assoc. Prof.), Applied Science University (Professor), and Sultan Qaboos University (Assoc. Prof.). Dr. Fiaidhi's research is focused on mobile and collaborative learning utilizing emerging technologies (e.g. Deep Learning, Cloud Computing, Calm Computing, Mobile Learning, Learning Analytics, Social Networking, Crowdsourcing, Enterprise Mashups, OpenData, Extreme Automation and Semantic Web). Dr. Fiaidhi's research is supported by the major research granting associations in Canada (e.g. NSERC, CFI). Dr. Fiaidhi is a Professional Software Engineer of Ontario (PEng), a Senior Member of IEEE, a member of the British Computer Society (MBCS), and a member of the Canadian Information Processing Society (CIPS) holding the designation of ISP. Dr. Fiaidhi is the chair of the IEEE Special Interest Research Group on Big Data for eHealth with the IEEE ComSoc eHealth TC. Moreover, Dr. Fiaidhi is the Editor in Chief of the IGI Global International Journal of Extreme Automation and Connectivity in Healthcare (IJEACH). More information on her publications and news can be found at her institution's web page: