jump to navigation

Top-K Results and Threshold Algorithms November 3, 2006

Posted by shahan in information retrieval.

Top-K retrieval results are simply a method of requesting the top k elements from a collection, akin to saying the top 10 sellers on eBay. A Top-K algorithm is also known as a Threshold Algorithm (TA), one which terminates when a certain threshold is achieved. The threshold in IR is based on the inverted index maintained on a particular keyword (associating a relevance value with they keyword’s location, as mentioned in a previous blog entry). The simplest example: if looking for keyword k in an inverted index R containing n elements, where k < n, then by simply sorting R in descending order, we can retrieve the first k elements, resulting in top K. TAs are an extension to performing this task over several lists and provide various optimizations based on available features of the inverted indexes. One important feature is the ability to perform only sequential scans, random access, or a combination. An example of a situation where random access is not available is when basing the inverted index on a list that may change and is not directly accessible, parsing of Google search results for instance. More information on the features and optimization of TAs will be provided later.



No comments yet — be the first.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: