
The value of the semantic web: RDF? November 6, 2007

Posted by shahan in information retrieval, internet architecture, online social networks, openid, semantic web, standards, Uncategorized.

The question this entry seeks to answer is, “Using the semantic web, what resources are available that have meaningful marketable value?”

While the value of the semantic web has been touted, its marketable value is not as widely discussed. However, in order to encourage Google to develop an OpenRDF API, they need to see what it can do for them. In my previous post about Search Standards, I mentioned that measuring a person’s search preferences, such as the type of content to search and metric ranges, is key to improving results. Combining Greg Wilson’s post about Measurement with the value-of-data issues mentioned in Bob Warfield’s User-Contributed Data Auditing, we now want to understand how to retrieve semantically marked-up content that has the ability to generate revenue.

User-generated semantic metrics are easily achieved with the semantic web. Further, semantic metrics can be tied together by various means, one of which is mentioned in Dan Connolly’s blog entry Units of measure and property chaining. It should be noted that, due to the extensibility of semantic data, the value metrics are independent of any specific domain, which allows them to be used for trust metrics as well.

A general use case describes what I mean:

  1. Content is made available. The quality is not called into question, yet.
  2. The content is semantically marked up so that it has properties that mean something.
  3. Other users mark up the content even further, but with personally relevant properties. These properties can be created by the users themselves or drawn from an existing schema (e.g., one available from their employer), associated with them through their OpenID online identity, and extended across their social network through Google’s OpenSocial API.

The data has now been extended from being searchable for relevant content using existing methods to being searchable using user-generated value metrics. These metrics can then be leveraged, much like Google Co-op, with further benefit if search standards were available.
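Searching by value metrics could then be a straightforward SPARQL query over the marked-up data. A sketch, reusing the hypothetical `ex:relevanceScore` property and example URLs from above:

```python
from rdflib import Graph

# Sample data in the shape sketched earlier (hypothetical namespace and scores)
data = """
@prefix ex: <http://example.org/metrics#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/articles/42> ex:relevanceScore "0.85"^^xsd:double .
<http://example.org/articles/43> ex:relevanceScore "0.40"^^xsd:double .
"""

g = Graph()
g.parse(data=data, format="turtle")

# Retrieve only the content whose user-assigned metric clears a threshold
rows = list(g.query("""
    PREFIX ex: <http://example.org/metrics#>
    SELECT ?content ?score
    WHERE {
        ?content ex:relevanceScore ?score .
        FILTER (?score > 0.8)
    }
"""))

for content, score in rows:
    print(content, score)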

Suppose a group is selected based on its ability to identify and rank relevant content, judging not by the content itself but by the value associated with the content’s properties. The idea of relevance then shifts: the question is no longer whether the content is relevant to the person evaluating it, but whether its properties would be relevant to someone searching for those properties. This has the potential to remove bias from relevance evaluation. Content is no longer evaluated for what it is but for what it is perceived as, and the metrics from paid evaluators, as well as from users who view the content against their own or standard metrics, are easily expanded and searched by others: an architecture permitting growth beyond limited views.

Search Standards and OpenID: not only for single sign-on, will search standards emerge? October 31, 2007

Posted by shahan in online social networks, search engines, software architecture, standards.

OpenID can be the answer to a whole slew of online profile questions. Not only can it answer, “How can I sign on to all these sites using my existing profile?”, it offers the possibility of answering, “How can I search this website using my existing preferences?”
OpenID is a single sign-on architecture, originally developed by Brad Fitzpatrick of LiveJournal and promoted by JanRain, which enables users with an existing OpenID-enabled account to access other websites that also support OpenID, removing the need to create a separate account on each site. It is a secure method for passing account details from one site to another, and it differs from a password manager (whether software or online) that hosts your different usernames and passwords for each site. Because your profile is stored and represented online, you can use your existing information quickly and easily.

Despite Stefan Brands’ in-depth analysis of the problems that may arise with OpenID, OpenID is a good solution: not only because of the ease of authentication, but also because it is a secure way of storing a profile online. WordPress supports OpenID by default (more info here). With the number of search engines emerging that do different things with different methods, I predict the rise of search standards and profiles.

A simple definition of a Search Standard: the method and the properties that enable a user to search content.

These can cover search-engine-relevant properties (which can be translated into accepted user preferences) such as:

  • sources, e.g., blogs, news, static webpages
  • metric ranges, e.g., > 80% precision or recall
  • content creation date
  • last indexed or updated
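A search-preference profile covering the properties above could be as simple as a small structured record that travels with a user’s OpenID identity. A sketch; the class and field names are illustrative, not part of any existing standard:

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical search-preference profile; field names are illustrative
@dataclass
class SearchPreferences:
    sources: list = field(default_factory=lambda: ["blogs", "news"])
    min_precision: float = 0.80        # metric range: > 80% precision
    created_after: str = "2007-01-01"  # content creation date
    max_index_age_days: int = 30       # last indexed or updated

prefs = SearchPreferences()
print(json.dumps(asdict(prefs), indent=2))
```

Any engine that understood such a profile could apply the same preferences without the user restating them.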

This only opens the door to the many areas in search engines and associated user preferences. Such standards shift the role of the search engine from handling the interface and presentation for the user to acting as a web service (an actual engine) that can be exploited by combining it with other search engines. Such preferences also address one of the biggest concerns when dealing with users: understanding and identifying what they prefer. As the number of search engines increases, the search engine market will no longer be as horizontal as it has been, but will become more hierarchical as each engine specializes in its niche. Combinations of search parameters may prove beneficial as the number and type of content increases, further encouraging the divergent expression of users on the web.
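The engine-as-web-service idea can be sketched in a few lines: each specialized engine answers a query under shared preferences, and a thin front end merges and re-ranks the results. Everything here is hypothetical, the stand-in engines in particular, which in practice would be HTTP services:

```python
# Hypothetical sketch: each specialized engine returns (url, score) pairs;
# a thin front end merges results across engines and re-ranks them.
def merge_results(engines, query, prefs):
    combined = []
    for engine in engines:
        combined.extend(engine(query, prefs))
    # Rank across engines by score, highest first
    return sorted(combined, key=lambda pair: pair[1], reverse=True)

# Stand-in engines; real ones would honour the shared preference profile
def blog_engine(query, prefs):
    return [("http://example.org/blog-post", 0.9)]

def news_engine(query, prefs):
    return [("http://example.org/news-item", 0.7)]

ranked = merge_results([blog_engine, news_engine], "semantic web",
                       {"sources": ["blogs", "news"]})
print(ranked)
```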

Published: Exploring PSI-MI XML Collections Using DescribeX October 2, 2007

Posted by shahan in publication, software development, standards, visualization, XML.

My first official publication :) Thanks to Reza for putting so much hard work into it, as well as for his patience with some of the DescribeX bug fixes. Many thanks also go to my professors Mariano and Thodoros, who guide and encourage me at every opportunity.

Abstract

PSI-MI has been endorsed by the protein informatics community as a standard XML data exchange format for protein-protein interaction datasets. While many public databases support the standard, there is a degree of heterogeneity in the way the proposed XML schema is interpreted and instantiated by different data providers. Analysis of schema instantiation in large collections of XML data is a challenging task that is unsupported by existing tools. In this study we use DescribeX, a novel visualization technique of (semi-)structured XML formats, to quantitatively and qualitatively analyze PSI-MI XML collections at the instance level with the goal of gaining insights about schema usage and to study specific questions such as: adequacy of controlled vocabularies, detection of common instance patterns, and evolution of different data collections. Our analysis shows DescribeX enhances understanding the instance-level structure of PSI-MI data sources and is a useful tool for standards designers, software developers, and PSI-MI data providers.

Reference

Reza Samavi, Mariano Consens, Shahan Khatchadourian, Thodoros Topaloglou. Exploring PSI-MI XML Collections Using DescribeX. Journal of Integrative Bioinformatics, 4(3):70, 2007.
