Semantic Based Information Retrieval System Using Modified Inverse Document Frequency

  • Zun May Myint, Department of Computer Engineering and Information Technology, Mandalay Technological University, Myanmar
  • Phyo Thuzar Tun, Department of Computer Engineering and Information Technology, Mandalay Technological University, Myanmar
Keywords: Cosine Similarity, Modified Inverse Document Frequency (MIDF), IR, KNN Classifier, VSM, WSD, WordNet.


Information Retrieval (IR) provides users with documents that satisfy their information needs. Word sense ambiguity is one cause of poor performance in IR systems, and performance improves when ambiguous words are correctly disambiguated. Word Sense Disambiguation (WSD) is the task of assigning the correct meaning to an ambiguous word based on its surrounding context. The senses produced by the WSD process are used as semantics for indexing documents to aid the IR system. K-Nearest Neighbour (KNN) is used for text classification in the WSD process, and the Vector Space Model (VSM) is used for the IR process. Both KNN and VSM use cosine similarity, with the weight of each word calculated by the term frequency and inverse document frequency (TF-IDF) scheme. A challenge is that the original TF-IDF scheme can eliminate a sense even though it is related. This paper therefore proposes a modified TF-IDF method, called TF-MIDF, which modifies the IDF equation to solve this no-relevant problem and improve the accuracy of IR. In a comparison of the original IDF scheme and the MIDF scheme, the average precision of the original TF-IDF method is 71% and that of TF-MIDF is 80%. The proposed method is therefore more precise than the original method in retrieving documents relevant to the required information.
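
The weighting and similarity steps named in the abstract can be illustrated with a minimal sketch. It uses only the textbook TF-IDF weight, tf × log(N/df), and cosine similarity; the abstract does not state the MIDF equation, so no modified formula is shown. The function names and the three toy documents are hypothetical, and reading the zero-weight case below as the "no-relevant problem" is an assumption rather than something the abstract confirms.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses the textbook weight tf * log(N / df); this is NOT the paper's MIDF.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: "bank" occurs in every document.
docs = [
    "bank river water".split(),
    "bank money loan".split(),
    "bank river fishing".split(),
]
vectors = tf_idf_vectors(docs)

# log(3 / 3) = 0, so the shared term "bank" carries no weight at all.
print(vectors[0]["bank"])
# Documents 0 and 1 share only "bank", so their similarity collapses to 0.
print(cosine_similarity(vectors[0], vectors[1]))
# Documents 0 and 2 also share "river", so their similarity stays positive.
print(cosine_similarity(vectors[0], vectors[2]))
```

In this sketch a term that appears in every document receives an IDF of zero, so two documents that share only that term look entirely unrelated. A modified IDF would presumably have to avoid this kind of collapse, but the exact form used in the paper is not given here.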

