Top-K Results and Threshold Algorithms November 3, 2006Posted by shahan in information retrieval.
Top-K retrieval results are simply a method of requesting the top k elements from a collection, akin to saying the top 10 sellers on eBay. A Top-K algorithm is also known as a Threshold Algorithm (TA), one which terminates when a certain threshold is achieved. The threshold in IR is based on the inverted index maintained on a particular keyword (associating a relevance value with they keyword’s location, as mentioned in a previous blog entry). The simplest example: if looking for keyword k in an inverted index R containing n elements, where k < n, then by simply sorting R in descending order, we can retrieve the first k elements, resulting in top K. TAs are an extension to performing this task over several lists and provide various optimizations based on available features of the inverted indexes. One important feature is the ability to perform only sequential scans, random access, or a combination. An example of a situation where random access is not available is when basing the inverted index on a list that may change and is not directly accessible, parsing of Google search results for instance. More information on the features and optimization of TAs will be provided later.