Enhancing Personalized Search And Improving Accuracy And Performance For Keyword-based XML Queries

Date

2010-07-19

Publisher

Computer Science & Engineering

Abstract

This dissertation research focuses on three aspects of querying XML data: (1) improving the accuracy of XML keyword queries by modeling the contexts of XML elements; (2) enhancing XML-based personalized search by using group profiling to determine individual preferences; and (3) improving the performance of distributed XML querying by caching frequently used query results. For each of these three focus areas, we developed formal concepts and algorithms that lead to improved accuracy and performance. Our contributions are as follows:

1. Improving the accuracy of XML keyword queries: We improve search accuracy by utilizing nodes' contexts in an XML tree. Overlooking nodes' contexts when building relationships between the nodes may lead to erroneous query results. The context of a data node is determined by its parent node. By treating each set of nodes consisting of a parent and its child data nodes as one unified entity, and then determining the relationships between the different unified entities, an XML system can build much more accurate relationships between data nodes in less processing time, resulting in more accurate query results.

2. Enhancing XML-based personalized search: By pre-defining and categorizing social groups based on demographic, ethnic, cultural, religious, or other characteristics, a user profile can be inferred from the profiles of the social groups to which the user belongs. This simplifies personalized search and makes the process more efficient. We implemented this approach in an XML-based recommender system. The system outputs ranked lists of content items that take into account not only the initial preferences of the user, but also the preferences of the user's various social groups.

3. Improving performance of distributed XML querying: Distributed XML documents are too large and complex to be queried rapidly every time a user submits a query, because of the overhead involved in decomposing the query, sending the decomposed queries to the remote site(s), and executing structural join operations to compose the results. We investigated strategies and mechanisms to tackle these problems. We then implemented these mechanisms in a query processor and compared their performance with that of standard XML query processors.
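
The idea in the first contribution, treating a parent and its child data nodes as one unified entity, can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: it covers only the grouping step and a naive keyword match within a single entity, and it does not model relationships between different entities. The function names (unified_entities, search) and the sample document are hypothetical and were chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def unified_entities(root):
    """Group each parent element with its child data nodes.

    Assumption for this sketch: a "data node" is a leaf element
    that carries non-empty text.
    """
    entities = []
    for parent in root.iter():
        data_children = [
            (child.tag, child.text.strip())
            for child in parent
            if len(child) == 0 and child.text and child.text.strip()
        ]
        if data_children:
            entities.append((parent.tag, data_children))
    return entities


def search(entities, keywords):
    """Return the entities whose own data nodes contain every keyword."""
    wanted = [k.lower() for k in keywords]
    return [
        entity
        for entity in entities
        if all(any(k in value.lower() for _, value in entity[1]) for k in wanted)
    ]


# Hypothetical sample document with two "book" contexts.
doc = ET.fromstring(
    "<library>"
    "<book><title>XML Basics</title><author>Lee</author></book>"
    "<book><title>Databases</title><author>Kim</author></book>"
    "</library>"
)

entities = unified_entities(doc)
same_context = search(entities, ["xml", "lee"])   # both keywords in one book
cross_context = search(entities, ["xml", "kim"])  # keywords in different books
```

In this sketch, a query whose keywords fall under the same parent matches that entity, while keywords that appear only under different parents are not related to each other, which is the kind of erroneous pairing the abstract attributes to ignoring node context.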