
Academic CV Template March 25, 2010

Posted by shahan in Uncategorized.

After searching for longer than I would have liked, I found a good TeX/LaTeX template for academic CVs at the following link:


Globalive and what net-neutrality is and isn’t December 13, 2009

Posted by shahan in Uncategorized.


The story here is that the smaller internet providers won’t have access to the newer/faster connections that Bell and Telus set up, and that internet access is not yet deemed an essential service.

Basically, there’s a lot of brouhaha about what net-neutrality really is. Financial issues are really the side-dish of the whole meal.
Net-neutrality is really about giving equal access to the material on the internet, i.e., not limiting P2P transfers or Skype calls.
The issue is not about the _physical_ connection, nor about the cost of that access. The biggest earner for an internet provider is not the fastest (and therefore most expensive) connection; it’s the general user who pays for a reasonably fast connection and then simply sends email and visits a few sites. Many reports already find Canada well behind the times in its internet pricing and technology policies. In my opinion, as more wireless providers enter the market (“more” meaning an increase from 3 national providers to 4 with Globalive’s entrance, which also relies on Rogers’ network), internet providers will, and should, move to wireless technologies. I think this is an important step, especially for a large land mass like Canada, where physical connections as offered by Bell and Telus should take a back seat.

Facebook Continues Unconsented Invasion of Privacy December 13, 2009

Posted by shahan in Uncategorized.


Even though I’m not on Facebook for privacy reasons, there is still an inherent flaw: Facebook collects info on non-users. It does this through the Facebook application on the BlackBerry, which gives the owner the choice of storing the BlackBerry contact list on Facebook.

Also, Facebook recently upgraded its privacy controls to let people posting content control who sees each and every piece of content. But if you noticed, the wizard’s default is to open all the info to everyone again! I was so close to signing up and then, poof, they try to pull another one over people’s eyes. I can’t wait for a truly open-source social network solution to exist.

I also can’t wait until Google Voice launches its service in Canada, where disposable numbers and email addresses will become the norm! I really have to be a ghost in the meantime.

ICDE 2008 DescribeX Demonstration April 10, 2008

Posted by shahan in Uncategorized.

This post is an outline of DescribeX and its demonstration at ICDE 2008. The 4-page demonstration submission will be available soon.

UPDATE: The submission is available online here.

DescribeX is a graphical Eclipse plugin for interacting with structural summaries of XML collections. It is developed in Java using GEF, Zest (now incorporated into GEF), Brics (a Java automaton library), and Apache Lucene (a Java information retrieval library). The structural summaries are defined using an axis path regular expression (AxPRE).

Several versions have been developed, each new version allowing a different type of summary as well as different interactions with the summary.

The oldest version, originally developed for CASCON 2006, created a P* summary (or F&B-index) and thus represented the structural summary as a tree; a tree graph layout algorithm from GEF was used. Only a P*C refinement was available, using XPath expressions evaluated against all the files in the collection. The control panel for this version is at the bottom left.

The second version allowed the creation of an A(k)-index, letting the user specify the length k of the incoming path to consider when creating the summary partitions. This version used Zest (now incorporated into GEF) for the layout algorithm, since a structural summary based on the A(k)-index can be a graph instead of a tree.
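To make the A(k)-index idea concrete, here is a minimal standalone sketch (not DescribeX code; all names are mine) that partitions nodes by the trailing k labels of their root-to-node paths, which is the equivalence the A(k)-index is built on:

```java
import java.util.*;

// Illustrative sketch: group nodes of a labeled tree into A(k)-index
// equivalence classes. Two nodes fall in the same class when the last k
// labels of their root-to-node paths coincide.
public class AkIndexSketch {
    // Each node is identified by its full label path from the root, e.g. "a/b/c".
    public static Map<String, List<String>> akPartition(List<String> paths, int k) {
        Map<String, List<String>> classes = new LinkedHashMap<>();
        for (String path : paths) {
            String[] labels = path.split("/");
            int from = Math.max(0, labels.length - k);
            // The class key is the suffix of length at most k of the label path.
            String key = String.join("/", Arrays.copyOfRange(labels, from, labels.length));
            classes.computeIfAbsent(key, x -> new ArrayList<>()).add(path);
        }
        return classes;
    }
}
```

With k = 1 this collapses all nodes with the same label into one summary node (a label summary); larger k refines the partition toward the full path summary.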

The third version implements true AxPRE expressions, using the Brics automaton Java library to convert the regular expression to an NFA. A label summary of the collection was created, and refinements were processed by intersecting the NFA of the regular expression with the automaton of the label summary. Zest was also used for the layout algorithm. The control panel for this version is in the middle on the left side.
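The intersection step can be illustrated without the Brics library: a word is in the intersection of two automata exactly when both accept it, which the product construction captures by running the automata in lockstep. The standalone sketch below (my own illustration, not the plugin's code, which delegates this to dk.brics.automaton) simulates two deterministic automata over the same input:

```java
import java.util.*;

// Illustrative product construction: the intersection of two DFAs accepts a
// word iff both component automata accept it. Each DFA is given as a
// transition table (state -> (symbol -> next state)), a start state of 0,
// and a set of accepting states.
public class ProductSketch {
    public static boolean bothAccept(Map<Integer, Map<Character, Integer>> d1, Set<Integer> acc1,
                                     Map<Integer, Map<Character, Integer>> d2, Set<Integer> acc2,
                                     String word) {
        int s1 = 0, s2 = 0; // both automata start in state 0
        for (char c : word.toCharArray()) {
            Integer n1 = d1.getOrDefault(s1, Map.of()).get(c);
            Integer n2 = d2.getOrDefault(s2, Map.of()).get(c);
            if (n1 == null || n2 == null) return false; // product state is a dead end
            s1 = n1;
            s2 = n2;
        }
        return acc1.contains(s1) && acc2.contains(s2);
    }
}
```

In the actual tool the two automata are the NFA of the AxPRE regular expression and the automaton of the label summary, and the intersection yields the refined summary.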

The versions also differ in their extra features, such as additional filters (e.g., coverage) and the highlighting of elements matching a keyword query.

The key points of the demonstration are that our tool allows a user to quickly and easily determine the paths that exist in the collection, determine the importance of summary nodes, and interact with the structural summary by performing refinements. An additional aspect is the ability to highlight the elements that contain the terms of a keyword search; this relates to our participation in INEX.

The attached screenshot shows three graphs; the topmost and middle graphs are P* structural summaries (or F&B-indexes) of two protein-protein interaction (PPI) datasets conforming to the PSI-MI schema standard. These two graphs are based on the first version and show the important nodes coloured green using a coverage value of 50%, i.e., showing the nodes that together contain 50% of the collection’s total number of extents. Other coverage measures (such as a random-walk coverage) are easily available and easily implementable. The first (topmost) dataset, HPRD, is a single 60MB XML file, while the second (middle) dataset, IntAct, is a collection of 6 XML files totalling 20MB. It should be noted that these are only a small subset of the gigabyte-size collections available. We can see that the larger HPRD collection uses a smaller structure than the IntAct collection.
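One plausible reading of the 50% coverage filter is a greedy selection: take the summary nodes with the largest extents until they jointly account for the requested fraction of the collection's elements. The sketch below shows that reading; the method and element names are mine, not DescribeX's actual coverage implementation:

```java
import java.util.*;

// Hedged sketch of a greedy coverage filter: given each summary node's
// extent size, pick the largest nodes until the chosen nodes together
// contain the requested fraction of all elements in the collection.
public class CoverageSketch {
    public static List<String> cover(Map<String, Integer> extentSizes, double fraction) {
        long total = extentSizes.values().stream().mapToLong(Integer::longValue).sum();
        List<Map.Entry<String, Integer>> ordered = new ArrayList<>(extentSizes.entrySet());
        ordered.sort((a, b) -> b.getValue() - a.getValue()); // largest extents first
        List<String> chosen = new ArrayList<>();
        long covered = 0;
        for (Map.Entry<String, Integer> e : ordered) {
            if (covered >= Math.ceil(total * fraction)) break; // target fraction reached
            chosen.add(e.getKey());
            covered += e.getValue();
        }
        return chosen;
    }
}
```

In a skewed collection like the PPI datasets above, a handful of nodes can cover half the elements, which is why the green "important" nodes are so few.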

I obtained some very good feedback after demonstrating DescribeX to several of the attendees, including suggestions to display cardinalities and to show the information retrieval component while using summaries. It would have been nice to show how the scoring of a document would be affected if some of the summary nodes were refined using an AxPRE to combine elements containing the search term. Next time I hope to let attendees use the plugin themselves to prod the product; “It’s like walking the high wire without a safety net,” as Guy Lohman put it.

Future work involves preparing a downloadable plugin for interested users. As it stands, the three versions can be made available and can work alongside each other (in fact, the third version requires the first); however, the instructions for use have not been updated in a while (though the application is easy to use). The newer version also lacks extensibility; I would like to update the way the extension points for filters and coverage are implemented.

Overview Screenshot of DescribeX Demonstration at ICDE 2008 Cancun

The value of the semantic web. RDF$? November 6, 2007

Posted by shahan in information retrieval, internet architecture, online social networks, openid, semantic web, standards, Uncategorized.

The question that this entry seeks to answer is, “Using the semantic web, what resources are available that have meaningful marketable value?”.

While the value of the semantic web has been touted, its marketable value is not as widely discussed. However, in order to encourage Google to develop an OpenRDF API, Google needs to see what it can do for them. In my previous post about Search Standards, I mentioned that measuring a person’s search preferences, such as the type of content to search and metric ranges, is key to improving results. Combining Greg Wilson’s post about Measurement with the value-of-data issues mentioned in Bob Warfield’s User-Contributed Data Auditing, we now want to understand how to retrieve semantically marked-up content which has the ability to generate revenue.

User-generated semantic metrics are easily achieved with the semantic web. Further, semantic metrics can be tied together using various means, one of which is mentioned in Dan Connolly’s blog entry Units of measure and property chaining. It should be noted that, due to the extensibility of semantic data, the value metrics are independent of any specifics, allowing them to be used for trust metrics as well.

Here is a general use case that describes what I mean:

  1. Content is made available. The quality is not called into question, yet.
  2. The content is semantically marked up so that it has properties that mean something.
  3. Other users mark up the content even further, but with personally-relevant properties that they can create themselves or take from an existing schema (e.g. one available from their employer); these can be associated through their OpenID online identity and extended through their social network with Google’s OpenSocial API.
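The steps above can be sketched as a tiny in-memory graph of (subject, property, value) statements; every identifier here (doc42, dc:title, ex:usefulness) is a made-up illustration, not a real vocabulary I am endorsing:

```java
import java.util.*;

// Toy triple store illustrating the use case: content is published with a
// few baseline properties, then individual users layer their own
// (subject, property, value) statements on top of the same subject.
public class TripleSketch {
    // A triple is {subject, property, value}.
    public static List<String> valuesOf(List<String[]> graph, String subject, String property) {
        List<String> out = new ArrayList<>();
        for (String[] t : graph)
            if (t[0].equals(subject) && t[1].equals(property)) out.add(t[2]);
        return out;
    }

    public static List<String[]> exampleGraph() {
        List<String[]> g = new ArrayList<>();
        // Steps 1-2: publisher-supplied markup
        g.add(new String[]{"doc42", "dc:title", "Summary paper"});
        g.add(new String[]{"doc42", "dc:creator", "alice"});
        // Step 3: a user attaches a personally-relevant metric to the same subject
        g.add(new String[]{"doc42", "ex:usefulness", "4/5"});
        return g;
    }
}
```

The point of the sketch is that the user-added statement lives beside the publisher's markup without any change to the original content, which is what makes the metrics searchable later.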

The data has now been extended from being searchable for relevant content using existing methods to becoming searchable using user-generated value metrics. These can then be leveraged, similar to Google Coop, and with further benefit if search standards were available.

If a group were selected based on its ability to identify and rank relevant content, judging not the content itself but the value associated with the properties of that content, then relevance would no longer be a question of whether the content is relevant to the person evaluating it, but of whether its properties would be relevant to someone searching for those properties. This could potentially remove bias from relevance evaluation. Content is no longer evaluated for what it is but for what it is perceived as, and the metrics from paid users, as well as from users who view the content using their own or standard metrics, are easily expandable and searchable by others: an architecture permitting growth beyond limited views.

Alternative Search Engines October 13, 2007

Posted by shahan in Uncategorized.

In response to WebWorkerDaily’s article, none of the search engines listed includes retrieval using structured information. Although I’m involved with information retrieval as part of my research, I don’t spend a lot of time exploring the search engines “out there”. The only reason I can give is that they haven’t done for me what Google already does with a little bit of query creativity. While searching news or blogs may have the benefit of limited scope, there’s no demonstration of added benefit.

A consequence of limiting search to a niche is that the popular terms within that niche become “boosted” automatically without being subsumed, e.g., by a larger news service or a certain wiki. Another is that the rate of re-crawling already-indexed pages can be better managed. I’ll make it a point to explore whether these search engines examine markup on the page when crawling, though this is unlikely.

Currently my research efforts in information retrieval are over semi-structured document collections. Within our group we have been experimenting with boosts to certain structural elements, and although our efforts have yielded only slight improvements in the result rankings, there are a number of other tests still to be run which I anticipate will reveal better boosting factors. The boosts we have experimented with thus far exclude subelement content lengths and are calculated as: sum, log(sum), 1/log(sum), avg, and no boosting. The boosting is based on a Markov chain model developed for Structural Relevance by Sadek Ali and shows great promise in using summaries.
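The boosting variants listed above can be sketched as follows; the per-element statistics being aggregated are hypothetical stand-ins (the post does not say exactly what quantity is summed), so this only shows the shape of the formulas, not our group's actual scoring code:

```java
import java.util.*;

// Hedged sketch of the boosting variants named in the text: sum, log(sum),
// 1/log(sum), avg, and no boosting, applied to some per-occurrence
// statistic of a structural element (the choice of statistic is assumed).
public class BoostSketch {
    public static double boost(String scheme, double[] stats) {
        double sum = Arrays.stream(stats).sum();
        switch (scheme) {
            case "sum":    return sum;
            case "log":    return Math.log(sum);
            case "invlog": return 1.0 / Math.log(sum);
            case "avg":    return sum / stats.length;
            default:       return 1.0; // "no boosting" baseline
        }
    }
}
```

The log and 1/log variants dampen (or invert) the influence of elements with many occurrences, which is presumably why they are worth comparing against the raw sum.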

Improving Blog Traffic October 11, 2007

Posted by shahan in Uncategorized.

As a relatively new blogger, I’ve often wondered how I want to portray my writings and have begun to make it a higher priority over the last few weeks. One of the best things about blogging is that it is a way to hold myself accountable publicly. I’m listing a few questions and their answers for what I see VannevarVision to be.

What am I blogging about?

internet, information retrieval, online social networks, some eclipse programming

Who is my audience?

researchers or those interested in the more technical details of the topics listed

Do I want readers to keep coming back?

of course, I think I have interesting things to say

What is my target post rate?

currently at least once a week; I will work up to once a day.

Most Importantly… What is my motivation?

I have a voice, I have a pretty good idea of what I’m talking about, I will make a change somewhere that will affect readers like you. I have valuable experiences to draw from and I’d like to be remembered amongst the archives 100 years down the road when someone is digging through trying to piece my biography together to determine what kind of foods I ate, not to mention how many beers I drank. It’d be nice in the future for my kids when they’re looking through the old-school internet and see that I was serious about my work.

Why Now?

nothing like the present, I don’t need my forebrain smacked in the form of a wakeup call

The Structure of Information Networks October 11, 2007

Posted by shahan in Uncategorized.

Jon Kleinberg is teaching a course, The Structure of Information Networks, with an interesting reading list, some of which overlaps with the required readings of Online Social Networks taught by Stefan Saroiu. Jon Kleinberg will also be giving a talk as part of U of T’s Distinguished Lecture Series on Oct 30 11AM at the Bahen Centre, Rm 1180. Other lectures are available here.

Reading List: Online Social Networks October 11, 2007

Posted by shahan in Uncategorized.

I’m duplicating the list of papers required for the Online Social Networks course. I’m no longer in the course but will continue to follow the material. The presentation I prepared on Measurement and Analysis of Online Social Networks by Mislove et al. is attached.

* The Structure and Function of Complex Networks, M. E. J. Newman. SIAM Review 45, 167-256 (2003).
* Analysis of Topological Characteristics of Huge Online Social Networking Services, Y-Y Ahn, S. Han, H. Kwak, S. Moon, and H. Jeong. World Wide Web 2007 (WWW ’07).
* Measurement and Analysis of Online Social Networks, A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, S. Bhattacharjee. Internet Measurement Conference (IMC) 2007.
* Exploiting Social Networks for Internet Search, A. Mislove, K. P. Gummadi, and P. Druschel. HotNets 2006.
* Identity and Search in Social Networks, D. J. Watts, P. S. Dodds, M. E. J. Newman. Science 296(5571), 2002.
* On Six Degrees of Separation in DBLP-DB and More, E. Elmacioglu and D. Lee. Sigmod Record 2005.
* A Survey and Comparison of Peer-to-Peer Overlay Network Schemes, E. K. Lua, J. Crowcroft, M. Pias, R. Sharma, S. Linn. IEEE Communications Surveys and Tutorials 7(2005).
* SkipNet: A Scalable Overlay Network with Practical Locality Properties, N. J. A. Harvey, M. B. Jones, S. Saroiu, M. Theimer, A. Wolman. Usenix Symposium on Internet Technologies and Systems (USITS) 2003.
* The Impact of DHT Routing Geometry on Resilience and Proximity, K. P. Gummadi, R. Gummadi, S. D. Gribble, S. Ratnasamy, S. Shenker, I. Stoica. Sigcomm 2003.
* The Sybil Attack, J. R. Douceur, IPTPS 2002.
* Defending against Eclipse Attacks on Overlay Networks, A. Singh, M. Castro, P. Druschel, A. Rowstron. Sigops 2004.
* SybilGuard: Defending Against Sybil Attacks via Social Networks, H. Yu, M. Kaminsky, P. B. Gibbons, A. Flaxman. Sigcomm 2006.
* The Strength of Weak Ties, M. S. Granovetter. The American Journal of Sociology 1973.
* BubbleRap: Forwarding in small world DTNs in ever decreasing circles, P. Hui and J. Crowcroft. University of Cambridge Tech Report #UCAM-CL-TR-684 2007.
* Exploiting Social Interactions in Mobile Systems, A. G. Miklas, K. K. Gollu, K. K. W. Chan, S. Saroiu, K. P. Gummadi, E. de Lara. Ubicomp 2007.
* RE: Reliable Email, S. Garriss, M. Kaminsky, M. J. Freedman, B. Karp, D. Mazieres. Symposium on Networked Systems Design and Implementation (NSDI) 2006.
* Efficient Private Techniques for Verifying Social Proximity, M. J. Freedman and A. Nicolosi. IPTPS 2007.
* Separating key management from file system security, D. Mazieres, M. Kaminsky, M. F. Kaashoek, E. Witchel. Symposium on Operating Systems Principles (SOSP) 1999.
* Decentralized User Authentication in a Global File System., M. Kaminsky, G. Savvides, D. Mazieres, M. F. Kaashoek. Symposium on Operating Systems Principles (SOSP) 2003.
* HomeViews: Peer-to-Peer Middleware for Personal Data Sharing Applications, R. Geambasu, M. Balazinska, S. D. Gribble, and H. M. Levy. Sigmod 2007.