Document Type: Research Paper

Authors

1 Department of Computational Linguistics, Languages and Linguistics Center, Sharif University of Technology, Tehran, Iran

2 Department of Computer, Faculty of Statistics, Mathematics and Computer, Allameh Tabataba'i University, Tehran, Iran

Abstract

The aim of this research is to evaluate how well several Machine Learning (ML) methods classify Persian poems into two categories: with allusion and without allusion. To this end, seven supervised learning methods are used: Naive Bayes, Support Vector Machines (SVM), Decision Tree, Random Forest, K-Nearest Neighbors (KNN), Logistic Regression and Multilayer Perceptron. The labeled data are collected in two text files, and each verse is converted into a numerical vector. The data are then merged and divided into training and test sets. Each algorithm is trained on the training set and tested on the test set, and its output is the label it predicts for each verse. The algorithms are evaluated with leave-one-out cross-validation (LOOCV). The results show that Naive Bayes (76.09%), Logistic Regression (76.09%), Multilayer Perceptron (75.22%) and SVM (74.35%) outperform the other algorithms. When further criteria such as F1-score and execution time are also taken into account, Naive Bayes shows the best overall performance.
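The abstract does not say how the verses were vectorized or how the evaluation was coded. The following is a minimal sketch of one possible pipeline of that kind, assuming whitespace tokenization and bag-of-words count vectors; both are assumptions, not the authors' stated method. It converts verses to numerical vectors, trains a multinomial Naive Bayes classifier, and scores it with leave-one-out cross-validation (LOOCV), reporting accuracy and the F1-score of the "with allusion" class. The helper names (`build_vocab`, `vectorize`, `train_nb`, `predict_nb`, `loocv`, `f1_score`) are hypothetical. The six toy "verses" are made-up transliterated tokens, not data from the paper's corpus.

```python
import math

def build_vocab(verses):
    """Map each distinct token to a column index (assumed whitespace tokenization)."""
    tokens = sorted({tok for verse in verses for tok in verse.split()})
    return {tok: i for i, tok in enumerate(tokens)}

def vectorize(verse, vocab):
    """Turn one verse into a bag-of-words count vector over the vocabulary."""
    vec = [0] * len(vocab)
    for tok in verse.split():
        if tok in vocab:  # out-of-vocabulary tokens are dropped
            vec[vocab[tok]] += 1
    return vec

def train_nb(X, y, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    n_features = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        counts = [sum(col) for col in zip(*rows)]  # per-feature token counts in class c
        total = sum(counts)
        log_prior = math.log(len(rows) / len(X))
        log_lik = [math.log((n + alpha) / (total + alpha * n_features)) for n in counts]
        model[c] = (log_prior, log_lik)
    return model

def predict_nb(model, x):
    """Return the class with the highest log-posterior score for count vector x."""
    def score(c):
        log_prior, log_lik = model[c]
        return log_prior + sum(n * lp for n, lp in zip(x, log_lik))
    return max(model, key=score)

def loocv(X, y):
    """Leave-one-out CV: each sample is held out once and predicted by a model trained on the rest."""
    preds = []
    for i in range(len(X)):
        model = train_nb(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict_nb(model, X[i]))
    accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
    return accuracy, preds

def f1_score(y_true, y_pred, positive=1):
    """F1-score for the positive class (here 1 = 'with allusion')."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy data: made-up token strings, not verses from the paper's corpus.
verses = [
    "rostam zal simorgh", "simorgh zal kaveh", "rostam kaveh simorgh",  # 1 = with allusion
    "rain garden night", "night garden wind", "wind rain garden",       # 0 = without allusion
]
labels = [1, 1, 1, 0, 0, 0]

vocab = build_vocab(verses)
X = [vectorize(v, vocab) for v in verses]
acc, preds = loocv(X, labels)
f1 = f1_score(labels, preds)
print(f"LOOCV accuracy: {acc:.2%}, F1 (with allusion): {f1:.2f}")
```

Following the order described in the abstract, the vocabulary is built from all verses before evaluation. Because LOOCV fits one model per verse, evaluating n verses requires n training runs. For Naive Bayes each run is a single counting pass. In practice, scikit-learn's `MultinomialNB`, `LeaveOneOut` and `cross_val_predict` would typically replace these hand-written helpers. The standard-library version is used only to keep the sketch self-contained.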

Keywords