Learning implicit user interest hierarchy for web personalization
Most web search engines are designed to serve all users in the same general way, without considering the interests of individual users. In contrast, personalized web search engines take an individual user's interests into account when choosing which web pages to return. To provide a more robust context for personalization, this dissertation presents a user interest hierarchy (UIH). The UIH captures a continuum of user interests, from general to specific, extracted from web pages, and it is used to produce an ordering of search results personalized to each user. The dissertation consists of five main parts. First, a divisive hierarchical clustering (DHC) algorithm is proposed that groups words (topics) into a hierarchy in which more general interests are represented by larger sets of words. Second, a variable-length phrase-finding (VPF) algorithm that extracts meaningful phrases from a web page is introduced. Third, two new desirable properties that a correlation function should satisfy are proposed. These properties help in understanding the general characteristics of correlation functions and in choosing or devising an appropriate correlation function for a given application domain. Fourth, methods are examined for ranking the results returned by a search engine according to user interests, based on the content of each web page and the UIH. Fifth, previously studied implicit indicators of interesting web pages are evaluated, and the time spent on a web page, along with several new indicators, is examined in more detail. Experimental results indicate that the personalized ranking methods presented in this study, when used with a popular search engine, can yield more relevant web pages for individual users. A precision/recall analysis showed that, on average, our weighted term scoring function could provide more accurate rankings than Google.
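The abstract names the DHC algorithm without describing its steps. The following is a minimal, hypothetical sketch of the general idea of divisively clustering words into a general-to-specific hierarchy, and it is not the dissertation's exact method. It assumes that each web page is reduced to a set of words. It also uses Jaccard co-occurrence similarity as a stand-in correlation function (the dissertation's own correlation function may differ), raises a similarity threshold in fixed steps, and treats connected components of the thresholded word graph as child clusters. The names `dhc`, `components`, `jaccard`, and `page_index` are illustrative only.

```python
from itertools import combinations


def page_index(words, docs):
    """Map each word to the set of page indices that contain it."""
    return {w: {i for i, d in enumerate(docs) if w in d} for w in words}


def jaccard(p, q):
    """Stand-in correlation (assumption): Jaccard overlap of two page sets."""
    union = p | q
    return len(p & q) / len(union) if union else 0.0


def components(words, docs, t):
    """Connected components of the graph linking words with correlation >= t."""
    pages = page_index(words, docs)
    adj = {w: set() for w in words}
    for a, b in combinations(words, 2):
        if jaccard(pages[a], pages[b]) >= t:
            adj[a].add(b)
            adj[b].add(a)
    seen, comps = set(), []
    for w in sorted(words):  # sorted for deterministic output
        if w in seen:
            continue
        seen.add(w)
        stack, comp = [w], []
        while stack:  # iterative depth-first search
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(sorted(comp))
    return comps


def dhc(words, docs, threshold=0.1, step=0.1, min_size=2):
    """Hypothetical divisive clustering: a node's words are split into
    smaller, more tightly correlated clusters as the threshold rises."""
    node = {"words": sorted(words), "children": []}
    t = threshold
    while t <= 1.0:
        clusters = [c for c in components(words, docs, t) if len(c) >= min_size]
        if not clusters:  # everything fell apart into singletons: leaf
            break
        if all(len(c) < len(words) for c in clusters):  # a real split
            nxt = round(t + step, 10)
            node["children"] = [dhc(c, docs, nxt, step, min_size) for c in clusters]
            break
        t = round(t + step, 10)  # no split yet: raise the threshold
    return node


def show(node, depth=0):
    """Print the hierarchy, indenting more specific interests further."""
    print("  " * depth + ", ".join(node["words"]))
    for child in node["children"]:
        show(child, depth + 1)


if __name__ == "__main__":
    pages = [{"ai", "ml", "neural"}, {"ai", "ml", "neural"}, {"ai", "ml"},
             {"sports", "soccer"}, {"sports", "soccer"}]
    show(dhc(["ai", "ml", "neural", "sports", "soccer"], pages))
```

On this toy input, the root holds all five words (the most general interest), its children separate the AI-related words from the sports words, and the tighter pair "ai, ml" appears one level deeper. This mirrors the property stated above: larger word sets sit higher in the hierarchy. Connected components were chosen here only because they give a simple, deterministic split criterion.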