Improving the Effectiveness of Information Retrieval System

  • Su Mon Phyo, Department of Computer Engineering and Information Technology, MTU, Mandalay, Myanmar
  • Moe Moe Aye, Department of Computer Engineering and Information Technology, MTU, Mandalay, Myanmar
Keywords: Information Retrieval, Probabilistic Model, Keyword Extraction, Keyword Similarity Distance, Related Index.

Abstract

With the rapid growth of information and the ease of accessing it, in particular the boom of the World Wide Web, finding useful information and knowledge has become one of the most important problems in information and computer science. Information Retrieval (IR) systems, also called text retrieval systems, help users retrieve information that is relevant or close to their information needs. This research provides an effective IR system that retrieves not only relevant but also related documents. To retrieve relevant documents, a probabilistic model is applied. To retrieve related documents, a related index table is built that contains the extracted keywords and the lists of related documents. In constructing the related index table in the database, Shannon's entropy difference between the intrinsic and extrinsic modes is used to extract the highly significant keywords. Based on the analytical results, the entropy threshold was set to 0.5 of the square of the normalized entropy difference. The proposed keyword similarity distance (KSD) function is used to calculate the similarity and relations between document pairs. The proposed system is implemented in the PHP programming language with a MySQL database. The performance of this approach is evaluated with standard IR metrics, namely Precision (P), Recall (R), F-measure (F) and Average Precision (AP), on three test datasets (OHSUMED, CISI and CRAN). According to the experimental results, the proposed system using the related index table is more effective than the traditional probabilistic model.
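
The four evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' PHP implementation. It assumes that a result list is an ordered sequence of unique document identifiers, that F-measure means the balanced F1 score, and that Average Precision is normalized by the total number of relevant documents (one common convention; the abstract does not state which one was used). All function names and the sample data are hypothetical.

```python
from typing import Sequence, Set


def precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of retrieved documents that are relevant."""
    if not ranked:
        return 0.0
    hits = sum(1 for doc in ranked if doc in relevant)
    return hits / len(ranked)


def recall(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked if doc in relevant)
    return hits / len(relevant)


def f_measure(p: float, r: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values taken at the rank of each relevant hit.

    Assumption: the sum is divided by the total number of relevant
    documents, so relevant documents that are never retrieved count as 0.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Hypothetical query: four documents retrieved, three known to be relevant.
ranked_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = {"d1", "d3", "d5"}

p = precision(ranked_docs, relevant_docs)           # 2 of 4 retrieved are relevant
r = recall(ranked_docs, relevant_docs)              # 2 of 3 relevant were retrieved
f = f_measure(p, r)
ap = average_precision(ranked_docs, relevant_docs)  # (1/1 + 2/3) / 3
print(p, r, f, ap)
```

Averaging `average_precision` over all queries of a test collection gives a single score per system, which is one way the related index table approach and the baseline probabilistic model could be compared.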

Published
2016-09-23
Section
Articles