jump to navigation

XML Structural Summaries and Microformats October 31, 2007

Posted by shahan in eclipse plugin, information retrieval, search engines, software architecture, software development, visualization, XML.
add a comment

From my experiences attempting to integrate microformats into XML structural summaries, the results have all been workarounds.

Microformats are integrated into an XHTML page through the ‘class’ attribute of an element. I won’t go into the issues with doing this and while the additional information embedded into the page is welcome, it doesn’t conform to the standardized integration model offered by XML. A good reference on integrating and pulling microformat information from a page is here.

Microformats are not easily retrieved from a page because there is no way to know ahead of time what formats are integrated into the page. A workaround in creating an XML structural summary based on microformats can be obtained by applying an extension of the XML element model to indexing attributes and furthermore their values (in order to identify differing attributes). Since the structural summaries being developed using AxPREs are based on XPath expressions, they will be able to handle microformats but with advanced planning on the user.

The screenshot below is of DescribeX with a P* summary of a collection of hCalendar files. Using Apache Lucene, the files are indexed to include regular text token, XML elements, XML attributes and their associatd values. On the right-hand side you can see a query has been entered searching using Lucene’s default regex ‘*event*’ to search for ‘class’ attributes that contain that term. The vertices in red represent the elements which contain it and while it would be nice to assume that the descendants of the highlighted vertices are related to hCalendar events, it is not the case.

Microformat highlighting using DescribeX

Demonstrating DescribeX and VisTopK at IBM CASCON Technology Showcase 2007 October 4, 2007

Posted by shahan in conference, eclipse plugin, GEF, information retrieval, visualization, XML.
Tags: , , , ,
add a comment

I’m happy to say that two projects that I work on, DescribeX (a team effort with Sadek Ali and Flavio Rizzolo) and VisTop-k, both of which are supervised by Dr. Mariano Consens, will be demonstrated at IBM’s CASCON Technology Showcase on October 22 – 25, 2007. There were quite a few interesting projects last year and I’m looking forward to seeing what new ideas have arisen, especially since my Eclipse plugin skills have increased a tremendous amount. As a student I’m also looking forward to the food😉

Link to CASCON

Published: Exploring PSI-MI XML Collections Using DescribeX October 2, 2007

Posted by shahan in publication, software development, standards, visualization, XML.
Tags: , ,
1 comment so far

My first official publication🙂 Thanks to Reza for putting so much hard work into it as well as his patience for some of the DescribeX bug fixes. Many thanks also go to my professors Mariano and Thodoros who guide and encourage at every opportunity.


PSI-MI has been endorsed by the protein informatics community as a standard XML data exchange format for protein-protein interaction datasets. While many public databases support the standard, there is a degree of heterogeneity in the way the proposed XML schema is interpreted and instantiated by different data providers. Analysis of schema instantiation in large collections of XML data is a challenging task that is unsupported by existing tools. In this study we use DescribeX, a novel visualization technique of (semi-)structured XML formats, to quantitatively and qualitatively analyze PSI-MI XML collections at the instance level with the goal of gaining insights about schema usage and to study specific questions such as: adequacy of controlled vocabularies, detection of common instance patterns, and evolution of different data collections. Our analysis shows DescribeX enhances understanding the instance-level structure of PSI-MI data sources and is a useful tool for standards designers, software developers, and PSI-MI data providers.


Reza Samavi, Mariano Consens, Shahan Khatchadourian, Thodoros Topaloglou. Exploring PSI-MI XML Collections Using DescribeX. Journal of Integrative Bioinformatics, 4(3):70, 2007. Online Journal: link

Many Eyes January 28, 2007

Posted by shahan in visualization.
add a comment

Posting a quick link to Many Eyes presenting a very interesting and well done online resource of user contributed data visualizations.

VisTopK Screenshot Available January 11, 2007

Posted by shahan in eclipse, eclipse plugin, GEF, information retrieval, visualization.
add a comment

Although there was a large break in between VisTopK-related posts, the project is now complete. A labelled screenshot is provided for your benefit. The report will be uploaded soon as well. I’m very happy with the result and am excited by the possibilities offered by the plug-in as it allows integration of many other projects, existing and new.

Screenshot of VisTopK


I’ve created a screencast of VisTopK in action using the great application Wink, originally referenced from Greg’s blog entry, but WordPress doesn’t allow Flash (SWF) uploads. Anyone have any ideas on how/where I can post it? I tried converting it to an AVI to maybe post it to YouTube but the size of 1GB stopped that attempt cold.