Closest Match Based Information Retrieval and Recommendation Engine using Signature-Trees and Fuzzy Relevance Sorting

Authors

  • Ali Sohani Data Science Department, Cubix.co
  • Rafi Ullah Data Science Department, Cubix.co
  • Athaul Rai Data Science Department, Cubix.co
  • Owais Karni Data Science Department, Cubix.co

Keywords:

Signature tree matching, Recommendation systems, Fuzzy recommendation system, Fuzzy Relevance Sorting.

Abstract

This paper proposes a recommendation technique that avoids running an exhaustive search over a database with thousands of records before concluding that a recommended item matches a significant percentage of what was originally desired. Such searches often involve not just simple full-match lookups based on indexes, but also partial or near-match searches, in which the percentage of match between entities determines whether a result is relevant enough to recommend. These problems are usually tackled with methods such as fuzzy operations, regular-expression searches, clustering, and similarity analysis, each with its own level of effectiveness and efficiency. Our goal was to create a search and recommendation system that performs fuzzy search and fuzzy similarity analysis with near-match percentages in an effective, efficient, and user-friendly manner on thousands of records (files or rows) with hundreds of attributes (features or columns). Inspired by Google's image search algorithm, which searches on the basis of signatures built by feature extraction from each image, we have created a Match engine that reads the schema of the data or files, compiles encoded signatures, and stores them as an index. That index is then converted into a tree (S-Tree) on the basis of the relevance of each field or column and the observed data frequency. Once compilation is done, the system can search for and recommend the best matches very efficiently. For further optimization we use heuristics such as dividing the feature set into hard filters and soft filters: the former demand a full match and the latter a fuzzy match. Once even one best match has been found, other matches can be retrieved without further searching.
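
As an illustration only, the hard-filter/soft-filter idea can be sketched in Python as follows. This is not the authors' Match engine: the class name MatchIndex, the field names, and the recipe data are hypothetical, the signatures are plain dictionary-encoded bitmasks, and a single level of grouping by hard-filter signature stands in for the S-Tree.

```python
from collections import defaultdict


class MatchIndex:
    """Hypothetical sketch: bit signatures with hard (exact) and soft (fuzzy) filters."""

    def __init__(self, hard_fields, soft_fields):
        self.hard_fields = hard_fields
        self.soft_fields = soft_fields
        self.vocab = {}  # (field, value) -> bit position
        # One grouping level standing in for the S-Tree:
        # hard signature -> [(record_id, soft signature)]
        self.buckets = defaultdict(list)

    def _signature(self, record, fields):
        """Encode the values of the given fields as one integer bitmask."""
        sig = 0
        for field in fields:
            for value in record.get(field, ()):
                bit = self.vocab.setdefault((field, value), len(self.vocab))
                sig |= 1 << bit
        return sig

    def add(self, record_id, record):
        hard = self._signature(record, self.hard_fields)
        soft = self._signature(record, self.soft_fields)
        self.buckets[hard].append((record_id, soft))

    def search(self, query, min_score=0.5):
        """Return (record_id, match fraction) pairs, best match first."""
        q_hard = self._signature(query, self.hard_fields)
        q_soft = self._signature(query, self.soft_fields)
        wanted = bin(q_soft).count("1")
        results = []
        for hard, members in self.buckets.items():
            # Hard filter: every required bit must be set, else skip the whole bucket.
            if (hard & q_hard) != q_hard:
                continue
            for record_id, soft in members:
                # Soft filter: fraction of the desired bits that the record has.
                matched = bin(soft & q_soft).count("1")
                score = matched / wanted if wanted else 1.0
                if score >= min_score:
                    results.append((record_id, score))
        results.sort(key=lambda item: (-item[1], item[0]))
        return results


index = MatchIndex(hard_fields=["diet"], soft_fields=["ingredients"])
index.add("pancakes", {"diet": ["vegetarian"], "ingredients": ["flour", "egg", "milk"]})
index.add("omelette", {"diet": ["vegetarian"], "ingredients": ["egg", "cheese", "milk"]})
index.add("chicken soup", {"diet": ["meat"], "ingredients": ["chicken", "carrot", "onion"]})

results = index.search(
    {"diet": ["vegetarian"], "ingredients": ["egg", "milk", "cheese", "butter"]}
)
print(results)  # -> [('omelette', 0.75), ('pancakes', 0.5)]
```

In this sketch the hard filter discards an entire bucket with one bitwise comparison, and the soft filter ranks the remaining records by the fraction of desired values they share with the query. A real S-Tree would nest such groups over several levels so that most signatures are never compared at all.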

Our technique is not entirely new and is inspired by existing work, but it is based on an ensemble of methods used to provide fast and efficient results. It proved quicker than full-scan searches. In future we plan to extend the signature comparison engine to a variety of advanced feature data types, such as geo-coordinates and synonyms. We also plan to store the compiled signature trees in a distributed database or grid so that queries run concurrently to match the results, or to pass the signatures through machine learning techniques. The system is currently used for recipe recommendation, and in future it will be used in applications such as dating systems and film and music recommendation.

Published

2018-08-01

How to Cite

Sohani, A., Rafi Ullah, ., Rai, A., & Karni, O. (2018). Closest Match Based Information Retrieval and Recommendation Engine using Signature-Trees and Fuzzy Relevance Sorting. American Scientific Research Journal for Engineering, Technology, and Sciences, 45(1), 120–134. Retrieved from https://asrjetsjournal.org/index.php/American_Scientific_Journal/article/view/4208
