Academic CV Template
March 25, 2010. Posted by shahan in Uncategorized.
Tags: academic, cv, latex, resume
After searching for longer than I would have liked, I found a good TeX/LaTeX template for academic CVs at the following link:
Globalive and what net-neutrality is and isn’t
December 13, 2009. Posted by shahan in Uncategorized.
Tags: net neutrality, wireless provider
The story here is that the smaller internet providers won’t have access to the newer, faster connections that Bell and Telus set up, and that internet access is not yet deemed an essential service.
Basically, there’s a lot of brouhaha about what net-neutrality really is. Financial issues are really the side dish of the whole meal.
Net-neutrality is really about giving equal access to the material on the internet, i.e., not limiting P2P transfers or Skype calls.
The issue is not about the _physical_ connection, nor about the cost of that access. The biggest earner for the internet provider is not the fastest (and therefore most expensive) connection; it’s the general user who pays for a reasonably fast connection and then simply sends email and visits a few sites. Many reports already find Canada well behind the times in its internet pricing and technology policies. The end result, in my opinion, is that as more wireless providers enter the market (“more” meaning an increase from 3 national providers to 4 with Globalive’s entrance, which itself relies on Rogers’ network), internet providers will, or at least should, move to wireless technologies. I think this is an important step, especially for a large land mass like Canada, where physical connections as offered by Bell and Telus should take a back seat.
ICDE 2008 DescribeX Demonstration
April 10, 2008. Posted by shahan in Uncategorized.
Tags: automaton, automaton intersection, Brics, describex, eclipse, GEF, refinement, structural summary, XML, Zest
This post is an outline of DescribeX and its demonstration at ICDE 2008. The 4-page demonstration submission will be available soon.
UPDATE: The submission is available online here.
DescribeX is a graphical Eclipse plugin for interacting with structural summaries of XML collections. It is developed in Java using GEF, Zest (now incorporated into GEF), Brics (a Java automaton library), and Apache Lucene (a Java information retrieval library). The structural summaries are defined using an axis path regular expression (AxPRE).
Several versions have been developed, each new version allowing a different type of summary as well as different interactions with the summary.
The oldest version, originally developed for CASCON 2006, created a P* summary (or F&B-Index) and thus produced the structural summary as a tree. A tree graph layout algorithm from GEF was used. Only a P*C refinement was available, using XPath expressions evaluated against all the files in the collection. The control panel for this version is at the bottom left.
The second version allowed the creation of an A(k)-index, letting the user specify the height in the path to consider when creating the summary partitions. This used Zest (now incorporated into GEF) for the layout algorithm, since a structural summary based on the A(k)-index can form a graph instead of a tree.
The third version implements true AxPRE expressions, using the Brics automaton Java library to convert the regular expression to an NFA. A label summary of the collection was created, and refinements were processed by intersecting the NFA of the regular expression with the automaton of the label summary. Zest was also used for the layout algorithm. The control panel for this version is in the middle on the left side.
The versions also differ in their extra features, such as additional filters (coverage, for example) and the highlighting of elements that match a keyword query.
The key points of the demonstration are that our tool allows a user to quickly and easily determine the paths that exist in the collection, determine the importance of summary nodes, and interact with the structural summary by performing refinements. An additional aspect is the ability to highlight the elements that contain the terms of a keyword search; this relates to our participation in INEX.
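To make the A(k) idea concrete, here is a minimal Python sketch (not the DescribeX code, which is Java). It assumes the usual reading of the A(k)-index on a tree: two elements fall in the same partition when they share their own label and the labels of their k nearest ancestors. The function name `ak_partition` and the tiny sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def ak_partition(root, k):
    """Partition elements by their tag plus the tags of up to k nearest ancestors.

    Returns (extents, edges): extents maps a partition key (tuple of tags) to
    the elements in it; edges holds (parent key, child key) pairs, i.e. the
    edges of the resulting summary graph.
    """
    extents = defaultdict(list)
    edges = set()

    def visit(elem, path, parent_key):
        path = path + (elem.tag,)
        key = path[-(k + 1):]          # own tag + at most k ancestor tags
        extents[key].append(elem)
        if parent_key is not None:
            edges.add((parent_key, key))
        for child in elem:
            visit(child, path, key)

    visit(root, (), None)
    return dict(extents), edges

# Hypothetical sample: <e> appears under a/b/d and under a/c/d.
doc = ET.fromstring("<a><b><d><e/></d></b><c><d><e/></d></c></a>")
extents, edges = ak_partition(doc, 1)
parents_of_de = {p for p, c in edges if c == ("d", "e")}
print(sorted(parents_of_de))
```

With k = 1 both `e` elements land in the single partition `("d", "e")`, which then has two parents in the summary, `("b", "d")` and `("c", "d")`. That is the situation where the summary stops being a tree and a general graph layout is needed.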
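The intersection step can be sketched with a plain product construction. This is a hand-rolled Python illustration, not the Brics API and not the DescribeX implementation. The dictionary encoding of the automata, the helper names `intersect` and `accepts`, and the example expression ("a, then any number of b, then d") are all assumptions made for the sketch; it is not AxPRE syntax.

```python
from collections import deque

def intersect(a, b):
    """Product construction over two NFAs without epsilon moves.

    Each automaton is a dict with 'start' (a state), 'accept' (set of states)
    and 'delta' (state -> {label -> set of next states}). The result accepts
    exactly the label paths accepted by both inputs.
    """
    start = (a["start"], b["start"])
    delta, accept = {}, set()
    queue, seen = deque([start]), {start}
    while queue:
        pair = queue.popleft()
        p, q = pair
        if p in a["accept"] and q in b["accept"]:
            accept.add(pair)
        moves_a = a["delta"].get(p, {})
        moves_b = b["delta"].get(q, {})
        for label in moves_a.keys() & moves_b.keys():
            for p2 in moves_a[label]:
                for q2 in moves_b[label]:
                    nxt = (p2, q2)
                    delta.setdefault(pair, {}).setdefault(label, set()).add(nxt)
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return {"start": start, "accept": accept, "delta": delta}

def accepts(nfa, labels):
    """Run the NFA over a sequence of labels."""
    current = {nfa["start"]}
    for label in labels:
        current = {t for s in current
                   for t in nfa["delta"].get(s, {}).get(label, ())}
    return bool(current & nfa["accept"])

# Hypothetical expression automaton: a, then any number of b, then d.
expr = {"start": 0, "accept": {2},
        "delta": {0: {"a": {1}}, 1: {"b": {1}, "d": {2}}}}

# Hypothetical label summary: one state per label, every element reachable.
summary = {"start": "root", "accept": {"a", "b", "c", "d"},
           "delta": {"root": {"a": {"a"}},
                     "a": {"b": {"b"}, "c": {"c"}},
                     "b": {"d": {"d"}},
                     "c": {"d": {"d"}}}}

product = intersect(expr, summary)
print(accepts(product, ["a", "b", "d"]), accepts(product, ["a", "c", "d"]))
```

Only paths that both match the expression and actually occur in the label summary survive, so `a/b/d` is accepted while `a/c/d` (not matched by the expression) and `a/d` (not present in the summary) are rejected.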
The attached screenshot shows three graphs. The topmost and middle graphs are P* structural summaries (or F&B-indexes) of two protein-protein interaction (PPI) datasets conforming to the PSI-MI schema standard. These two graphs are based on the first version and show the important nodes coloured green using a coverage value of 50%, i.e., the nodes that together contain 50% of the entire collection’s total number of extents. Other coverage measures (such as a random walk coverage) are available and easy to add. The first (topmost) dataset, HPRD, is a single 60MB XML file, while the second (middle) dataset, Intact, is a collection of 6 XML files totalling 20MB. It should be noted that these are only a small subset of the gigabyte-size collections available. We can see that the larger HPRD collection uses a smaller structure than the Intact collection.
I obtained some very good feedback after demonstrating DescribeX to several of the attendees. Suggestions included displaying cardinalities and showing the information retrieval component while using summaries. It would have been nice to show how the scoring of a document would be affected if some of the summary nodes were refined using an AxPRE to combine elements containing the search term. Next time I hope to allow the user to use the plugin to prod the product: “It’s like walking the high wire without a safety net,” as Guy Lohman put it.
Future work involves preparing a downloadable plugin for interested users. As it stands, the three versions can be made available and can work alongside each other (in fact, the third version requires the first); however, the instructions for use have not been updated in a while (though the application is easy to use). The newer version also lacks extensibility, and I would like to update the way in which the extension points for filters and coverage are implemented.
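One simple way to read the 50% coverage filter is sketched below in Python. It is an assumption, not the DescribeX code: it treats each summary node's extent as a count of elements and greedily picks the largest nodes until they account for the requested share of the total. The function name `coverage_nodes`, the node names and the counts are all invented for the example.

```python
def coverage_nodes(extent_sizes, threshold=0.5):
    """Pick the largest summary nodes until they cover `threshold` of the total.

    extent_sizes maps a summary node name to its extent size (element count).
    Greedy largest-first selection is an assumption made for this sketch.
    """
    total = sum(extent_sizes.values())
    chosen, covered = [], 0
    by_size = sorted(extent_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for node, size in by_size:
        if covered >= threshold * total:
            break
        chosen.append(node)
        covered += size
    return chosen

# Invented extent sizes for a handful of summary nodes (200 elements total).
sizes = {"entry": 1, "interaction": 40, "participant": 80, "names": 60, "xref": 19}
important = coverage_nodes(sizes, 0.5)
print(important)
```

Here the two largest nodes already hold 140 of the 200 elements, so only those two would be coloured green at a 50% setting.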
Alternative Search Engines
October 13, 2007. Posted by shahan in Uncategorized.
Tags: information retrieval, search engines, semi-structured information
In response to WebWorkerDaily’s article: none of the search engines listed includes retrieval using structured information. Although I’m involved with information retrieval as part of my research, I don’t spend a lot of time exploring the search engines “out there”. The only reason I can give is that they haven’t done for me what Google already does with a little bit of query creativity. While searching news or blogs may have the benefit of limited scope, there’s no demonstration of added benefit.
A consequence of limiting search to a niche is that the popular terms within that niche become “boosted” automatically without being subsumed, e.g., by a larger news service or a certain wiki. Another is that the rate of re-crawling already indexed pages can be better managed. I’ll make it a point to explore whether these search engines examine markup on the page when crawling, though this is unlikely.
Currently my research efforts in information retrieval are over semi-structured document collections. Within our group we have been experimenting with boosts to certain structural elements, and although our efforts have yielded slight improvements in the result rankings, there are a number of other tests to be run which I anticipate will reveal better boosting factors. The boosts that we have experimented with thus far have excluded subelement content lengths and are calculated as: sum, log(sum), 1/log(sum), avg, and no boosting. The boosting is based on a Markov Chain Model developed for Structural Relevance by Sadek Ali and shows great promise in using summaries.
Improving Blog Traffic
October 11, 2007. Posted by shahan in Uncategorized.
As a relatively new blogger, I’ve often wondered how I want to portray my writings and have begun to make it a higher priority over the last few weeks. One of the best things about blogging is that it is a way to hold myself accountable publicly. I’m listing a few questions and their answers for what I see VannevarVision to be.
What am I blogging about?
internet, information retrieval, online social networks, some eclipse programming
Who is my audience?
researchers or those interested in the more technical details of the topics listed
Do I want readers to keep coming back?
of course, I think I have interesting things to say
What is my target post rate?
currently at least once a week; I will get this up to once a day.
Most Importantly… What is my motivation?
I have a voice, I have a pretty good idea of what I’m talking about, I will make a change somewhere that will affect readers like you. I have valuable experiences to draw from and I’d like to be remembered amongst the archives 100 years down the road when someone is digging through trying to piece my biography together to determine what kind of foods I ate, not to mention how many beers I drank. It’d be nice in the future for my kids when they’re looking through the old-school internet and see that I was serious about my work.
nothing like the present, I don’t need my forebrain smacked in the form of a wakeup call