Tags: cascon, conference, describex, ibm, vistopk
I’m happy to say that two projects I work on, DescribeX (a team effort with Sadek Ali and Flavio Rizzolo) and VisTop-k, both supervised by Dr. Mariano Consens, will be demonstrated at IBM’s CASCON Technology Showcase on October 22–25, 2007. There were quite a few interesting projects last year and I’m looking forward to seeing what new ideas have arisen, especially since my Eclipse plug-in skills have improved tremendously since then. As a student, I’m also looking forward to the food.
Eclipse Plug-in GEF Tutorial
March 8, 2007 Posted by shahan in eclipse, eclipse plugin, GEF.
I’ve attached a (very, very) rough draft of a tutorial on building a GEF-based graphical editor for Eclipse. It doesn’t have any pictures, but it is step by step and complete enough to let me recreate a plug-in from scratch without having to memorize the process. I wrote it because the initial learning curve for getting started with GEF was steeper than I anticipated. Feedback is always appreciated.
VisTopK Updated
January 14, 2007 Posted by shahan in eclipse, eclipse plugin, GEF.
I did some more reading into GEF and have advanced considerably. I’ve added bookmarking of timeframes and these are saved with the search terms to the currently edited VisTopK file. I also added mousewheel zooming for when the document graph scrolls off the screen. The bookmarks are marked off with a purple border similar to the screenshot in my previous post but without the t# text. Maybe it should be there?
Next steps are: index selection (local/network), specifying XPath expressions, and specifying the field to search.
VisTopK Screenshot Available
January 11, 2007 Posted by shahan in eclipse, eclipse plugin, GEF, information retrieval, visualization.
Although there was a large break in between VisTopK-related posts, the project is now complete. A labelled screenshot is provided for your benefit. The report will be uploaded soon as well. I’m very happy with the result and am excited by the possibilities offered by the plug-in as it allows integration of many other projects, existing and new.
I’ve created a screencast of VisTopK in action using the great application Wink, originally referenced from Greg’s blog entry, but WordPress doesn’t allow Flash (SWF) uploads. Anyone have any ideas on how/where I can post it? I tried converting it to an AVI to maybe post it to YouTube but the size of 1GB stopped that attempt cold.
VisTopK – Initial Setup
November 6, 2006 Posted by shahan in eclipse, eclipse plugin, GEF.
Today I officially started coding for the project and was very productive! I needed to set up a set of inverted indexes over a collection of files. I used Apache Lucene for this, and was it ever easy. I had some difficulties initially because I was trying to use some contributed modules (the in-memory indexer, to be exact, which I figured might come in handy at some point). I also made the mistake of deciding to use an embedded relational database to allow a more “natural” way of accessing the indexes. Based on the information found here, I gave HSQLDB a shot, and it was extremely easy to set up and use. In the end, though, I removed the relational database and used Lucene’s built-in query engine instead to access the inverted indexes for all the terms in the document collection.
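To make the idea of an inverted index concrete, here is a minimal sketch in plain Python. It is not Lucene's API and not the code used in the project: the function name `build_inverted_index`, the simple tokenizer, and the sample documents are all assumptions made for illustration. Each term maps to a posting list of (document, term frequency) pairs sorted by descending frequency, which is the kind of score-ordered list a threshold algorithm reads from.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a posting list of (doc_id, term_frequency).

    Hypothetical helper for illustration only. Posting lists are sorted by
    descending frequency (ties broken by doc_id).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for doc_id, text in docs.items():
        # Very simple tokenizer (an assumption): lowercase alphanumeric runs.
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            counts[term][doc_id] += 1
    return {
        term: sorted(postings.items(), key=lambda p: (-p[1], p[0]))
        for term, postings in counts.items()
    }


# Made-up sample collection.
docs = {
    "d1": "top-k query processing with inverted indexes",
    "d2": "inverted indexes map terms to documents",
    "d3": "query query query",
}
index = build_inverted_index(docs)
print(index["query"])
```

A real index built with Lucene also handles analysis, scoring, compression, and on-disk storage; the sketch only shows the term-to-postings mapping.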
Now it’s a matter of deciding whether to reuse TReX’s existing No-Random-Access threshold algorithm code or just roll my own. Reasons to stay away from the existing code are its tight integration with TReX’s data structures, the lack of parameters for its use, and a bad code smell. If I roll my own, the next decision is whether to integrate it into Lucene or keep it as a simple external algorithm engine. An ambitious Lucene contrib vision, possibly? I’ve never contributed to open source but am truly inspired by the dedication required, as described in Karl Fogel’s (FREE) ebook Producing Open Source Software. Starting from scratch will also allow for a nicer class hierarchy that can take advantage of some interesting concepts mentioned in the IO-Top-k paper, such as probabilistic inference or skew detection, to terminate the algorithm even sooner.
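For readers unfamiliar with the No-Random-Access idea, here is a rough sketch of how such an algorithm can be written. It is neither TReX's code nor the version from the IO-Top-k paper; the function name `nra_top_k`, the sum aggregation, non-negative scores, and the sample lists are all assumptions. The lists are read in parallel using sorted access only. For every document seen so far the sketch keeps a worst-case score (what has been read) and a best-case score (worst case plus the last score read from each list where the document has not appeared yet), and it stops once no other document can overtake the current top-k.

```python
def nra_top_k(lists, k):
    """No-Random-Access top-k sketch over score-sorted posting lists.

    lists: one list per query term of (doc_id, score) pairs, sorted by
    descending score; a document missing from a list contributes 0.
    Returns (top, depth): top holds (doc_id, worst, best) score bounds and
    depth is the number of sorted-access rounds performed.
    """
    m = len(lists)
    seen = {}        # doc_id -> {list index: score read so far}
    high = [0] * m   # last score read per list, an upper bound on the rest
    max_depth = max((len(lst) for lst in lists), default=0)

    def bounds():
        worst = {d: sum(s.values()) for d, s in seen.items()}
        best = {
            d: worst[d] + sum(high[i] for i in range(m) if i not in s)
            for d, s in seen.items()
        }
        ranked = sorted(worst, key=lambda d: (-worst[d], d))
        return worst, best, ranked

    depth = 0
    while depth < max_depth:
        # One round of sorted access: read the next entry of every list.
        for i, lst in enumerate(lists):
            if depth < len(lst):
                doc, score = lst[depth]
                seen.setdefault(doc, {})[i] = score
                high[i] = score
            else:
                high[i] = 0  # list exhausted, nothing more can be added
        depth += 1
        worst, best, ranked = bounds()
        top, rest = ranked[:k], ranked[k:]
        if len(top) == k:
            kth = worst[top[-1]]
            # Stop when neither an unseen document (bounded by sum(high))
            # nor a seen non-top-k document can beat the k-th worst score.
            if kth >= sum(high) and all(best[d] <= kth for d in rest):
                break
    worst, best, ranked = bounds()
    return [(d, worst[d], best[d]) for d in ranked[:k]], depth


# Made-up example: two score-sorted lists, ask for the single best document.
lists = [
    [("a", 10), ("b", 6), ("c", 1)],
    [("b", 5), ("a", 4), ("c", 3)],
]
result, rounds = nra_top_k(lists, k=1)
print(result, rounds)
```

In this example the scan stops after two rounds without ever reading the entries for document c. In general NRA guarantees the top-k set, while the reported scores are only bounds.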
Once that’s done I’ll have a prototype for XML document collection indexing and retrieval using a threshold algorithm for top-k query processing.
Some related tools that seem interesting are Luke, a Lucene index viewing and modification tool that is very nice looking and full of features, and Lius, which I haven’t tried but which seems to do the same thing as Luke.
VisTopK – Visualization of Top-K Information Retrieval Algorithms
November 3, 2006 Posted by shahan in eclipse, eclipse plugin, GEF.
This project is for course credit and is supervised by Dr. Consens. I will provide a visualization of Top-K results using static and dynamic views of Threshold Algorithm (TA) traces. This involves the use of inverted indexes built from an XML document collection and will explore the effect of various factors such as compression, encoding, and retrieval/indexing algorithms. The deliverable will be an Eclipse plug-in, chosen for Eclipse’s extensibility and ease of integration with various components. Current thoughts for the plug-in are a main view containing a stock chart, plus a property view. The visualization will be a front end to the XML-format trace data collected during a separate TA run. Static views will provide snapshots and dynamic views will show how a run evolves over time. The visualization will be based on the depiction in:
IO-Top-k: Index-access Optimized Top-k Query Processing. Holger Bast, Debapriyo Majumdar, Ralf Schenkel, Martin Theobald, Gerhard Weikum.
It is an interesting paper that provides a good basis for understanding TAs. The algorithms themselves are relatively easy to understand, but doing so requires picking up the terminology of the field first. The paper discusses how several TA methods relate to one another, along with their advantages and differences; these include No Random Access (NRA), Random Access (RA), and a combination of both.
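As a rough illustration of the random-access flavour, the following is a minimal sketch of a classic threshold algorithm. It is written from the general idea rather than taken from the paper, and it assumes sum aggregation, non-negative scores, and made-up names (`threshold_algorithm`) and data. Each round reads one entry from every score-sorted list, looks up the full score of any newly seen document by random access, and stops once the k-th best full score is at least the threshold, which is the sum of the scores just read.

```python
import heapq


def threshold_algorithm(lists, k):
    """Threshold algorithm sketch using sorted plus random access.

    lists: one list per query term of (doc_id, score) pairs, sorted by
    descending score; a document missing from a list contributes 0.
    Returns (top, depth): the top-k (doc_id, total) pairs and the number of
    sorted-access rounds performed.
    """
    lookup = [dict(lst) for lst in lists]  # stands in for random access
    totals = {}                            # doc_id -> full aggregated score
    max_depth = max((len(lst) for lst in lists), default=0)
    depth = 0
    while depth < max_depth:
        last = []
        for lst in lists:
            if depth < len(lst):
                doc, score = lst[depth]
                last.append(score)
                if doc not in totals:
                    # Random access: fetch this document's score in every list.
                    totals[doc] = sum(t.get(doc, 0) for t in lookup)
            else:
                last.append(0)
        depth += 1
        threshold = sum(last)  # best score any unseen document could reach
        top = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top, depth
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1]), depth


# Made-up example: two score-sorted lists, ask for the two best documents.
lists = [
    [("a", 9), ("b", 8), ("c", 3), ("d", 1)],
    [("b", 8), ("a", 6), ("d", 5), ("c", 2)],
]
result, rounds = threshold_algorithm(lists, k=2)
print(result, rounds)
```

Here the scan stops after two of the four rounds, because the threshold (8 + 6 = 14) has dropped below the second-best total of 15. Unlike the NRA variant, every reported score is exact, at the cost of the random-access lookups.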