Closest Match Based Information Retrieval and Recommendation Engine using Signature-Trees and Fuzzy Relevance Sorting
This paper proposes a recommendation technique that avoids running an exhaustive search over a database of thousands of records before concluding that a recommended item matches a significant percentage of what was originally desired. Such searches often involve not only simple full-match lookups based on indexes, but also partial or near-match searches, where the percentage of match between entities determines whether a result is relevant enough to recommend. These problems are usually tackled with methods such as fuzzy operations, regular-expression searches, clustering, and similarity analysis, each with its own trade-offs in effectiveness and efficiency. Our goal was to create a search and recommendation system that can perform fuzzy search and fuzzy similarity analysis with near-match percentages in an effective, efficient, and user-friendly manner on thousands of records (files or rows) with hundreds of attributes (features or columns). Inspired by Google's image search algorithm, which searches on the basis of signatures derived from features extracted from each image, we have created a Match engine that reads the schema of the data or files, compiles encoded signatures, and stores them as an index. That index is then converted into a tree (S-Tree) on the basis of the relevance of each field or column and the observed data frequency. Once compilation is done, the system can search for and recommend the best matches very efficiently. For further optimization we use heuristics such as dividing the feature set into hard filters and soft filters: the former demand a full match and the latter a fuzzy match. Once even one best match is reached, the other matches can be retrieved without further searching.
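The split between hard filters and soft filters can be illustrated with a minimal Python sketch. It is not the authors' Match engine or S-Tree: the field names, the recipe data, the bitmask encoding, and the `min_match` threshold are all assumptions made for illustration, and a plain dictionary keyed on the hard-filter values stands in for the tree. The sketch only shows the general idea of compiling each record into a signature once and then ranking candidates by match percentage without scanning every record.

```python
from collections import defaultdict

# Hypothetical schema: fields that demand a full match vs. a fuzzy match.
HARD_FIELDS = ("cuisine",)
SOFT_FIELDS = ("ingredients",)


def build_vocab(records):
    """Assign one bit position to every soft-field value seen in the data."""
    vocab = {}
    for rec in records:
        for field in SOFT_FIELDS:
            for value in rec[field]:
                vocab.setdefault((field, value), len(vocab))
    return vocab


def signature(rec, vocab):
    """Encode a record as (hard key, bitmask of known soft values)."""
    hard_key = tuple(rec[f] for f in HARD_FIELDS)
    bits = 0
    for field in SOFT_FIELDS:
        for value in rec[field]:
            pos = vocab.get((field, value))
            if pos is not None:  # values unseen at compile time get no bit
                bits |= 1 << pos
    return hard_key, bits


def build_index(records, vocab):
    """Compile all signatures once, grouped by their hard-filter key."""
    index = defaultdict(list)
    for rec in records:
        hard_key, bits = signature(rec, vocab)
        index[hard_key].append((bits, rec["name"]))
    return index


def recommend(query, index, vocab, min_match=50.0):
    """Return (name, match %) pairs, best first, within the hard-filter group."""
    hard_key, qbits = signature(query, vocab)
    # Count every requested soft value, so unknown values lower the score.
    wanted = sum(len(set(query[f])) for f in SOFT_FIELDS)
    results = []
    for bits, name in index.get(hard_key, []):
        shared = bin(bits & qbits).count("1")
        pct = 100.0 * shared / wanted if wanted else 0.0
        if pct >= min_match:
            results.append((name, round(pct, 1)))
    return sorted(results, key=lambda r: (-r[1], r[0]))


# Made-up example data.
recipes = [
    {"name": "margherita", "cuisine": "italian",
     "ingredients": ["tomato", "basil", "mozzarella", "flour"]},
    {"name": "caprese", "cuisine": "italian",
     "ingredients": ["tomato", "basil", "mozzarella"]},
    {"name": "carbonara", "cuisine": "italian",
     "ingredients": ["egg", "bacon", "pasta"]},
    {"name": "salsa", "cuisine": "mexican",
     "ingredients": ["tomato", "onion", "chili"]},
]
vocab = build_vocab(recipes)
index = build_index(recipes, vocab)
query = {"cuisine": "italian",
         "ingredients": ["tomato", "basil", "mozzarella", "flour"]}
print(recommend(query, index, vocab))
# [('margherita', 100.0), ('caprese', 75.0)]
```

In this sketch the hard filter removes the Mexican recipe before any fuzzy comparison takes place, and the soft filter reduces each remaining comparison to a single bitwise AND on precompiled signatures.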
Our technique, though not especially modern and in fact inspired by existing work, is based on ensemble methods and is used to provide fast and efficient results. It has proved quicker than full-scan searches. In future we plan to extend the signature comparison engine to a variety of advanced feature data types, such as geo-coordinates and synonyms. We also plan to store the compiled signature trees in a distributed database or grid, so that queries run concurrently to match the results, or to pass the signatures through machine learning techniques. The system is currently used for recipe recommendation; in future it will be used in applications such as dating systems and film and music recommendation.