Review: Measurement and Analysis of Online Social Networks October 4, 2007
Posted by shahan in online social networks. Tags: online social networks
Measurement and Analysis of Online Social Networks, A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, S. Bhattacharjee. Internet Measurement Conference (IMC) 2007.
The paper presents an in-depth study of the online social network graphs of Flickr, LiveJournal, Orkut, and YouTube. Typical graph properties are reported and compared across the four networks, including power-law coefficients, degree correlation, link symmetry, path lengths, and fringe clusters, along with a brief analysis of groups. The authors indicate that this information will be useful for enhancing existing applications and algorithms, and for developing new ones, by comparing them to the web. However, the results lack practical application, and little or no future direction is indicated.
Amongst the multitude of values presented, I found link symmetry to be one of the most interesting portions of the paper. While the authors speculate that high-degree core nodes can be useful in the transmission of information, such nodes can also be detrimental in the case of spam or viruses. On the web, PageRank considers pages with many incoming links and few outgoing links to be authoritative and a source of information. Conversely, pages with many outgoing links and few incoming links are considered active and are not sources of information. This type of model allows PageRank to effectively identify pages that contain useful information; however, because links in an online social network are largely symmetric, it does not apply there.
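To make the link symmetry measure concrete, here is a minimal sketch of how it could be computed on a directed edge list. The function name `link_symmetry` and the toy edge list are my own illustrations, not taken from the paper, and the definition assumed here (the fraction of directed links whose reverse link also exists) may differ in detail from the one the authors use.

```python
def link_symmetry(edges):
    """Fraction of directed links (u, v) whose reverse link (v, u) also exists.

    `edges` is an iterable of (source, target) pairs; self-loops and
    duplicate links are ignored. This definition is an assumption made
    for illustration.
    """
    links = {(u, v) for u, v in edges if u != v}
    if not links:
        return 0.0
    reciprocated = sum(1 for (u, v) in links if (v, u) in links)
    return reciprocated / len(links)


# Hypothetical toy graph: a<->b and c<->d are mutual, a->c is one-way.
toy_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d"), ("d", "c")]
symmetry = link_symmetry(toy_edges)
```

On the toy graph, four of the five links are reciprocated. When this value is close to 1, indegree and outdegree are nearly interchangeable, which is why a link-direction signal such as PageRank's loses much of its discriminating power.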
One suggestion would be to examine the direction of each link request: if user A requests a connection to user B first, then user B can be deemed more important. This would be useful in the case of YouTube, where the average indegree of friends connected to nodes of high outdegree is low, reflecting the “celebrity-driven” nature of the content (Figure 6). Treating the link destination as more valuable is one way to offset the symmetric nature of links. The model is akin to voting: requesting a link to someone casts a vote for them, and the vote is acknowledged through link reciprocation. Where temporal information is lacking, more important nodes can still be inferred from the resulting graph of outdegree versus average indegree of friends. Temporal information would also help quash a potential infiltrator who wishes to spread a virus: instead of becoming part of a hub simply by having many links, a node would be trusted only if many other nodes sought to connect to it first.
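The voting idea can be sketched in a few lines. This is only an illustration of my suggestion under stated assumptions: the input is a hypothetical log of reciprocated link requests as (timestamp, requester, target) tuples, which none of the crawled datasets is known to provide, and the name `request_first_scores` is made up for this example.

```python
from collections import Counter


def request_first_scores(requests):
    """Score users by how many distinct users requested a link to them first.

    `requests` is an iterable of (timestamp, requester, target) tuples for
    links that were eventually reciprocated (an assumed, hypothetical log
    format). For each pair of users, only the earliest request counts as
    a vote, and the vote goes to the target of that request.
    """
    first_target = {}
    for _, requester, target in sorted(requests):
        pair = frozenset((requester, target))
        if requester != target and pair not in first_target:
            first_target[pair] = target
    return Counter(first_target.values())


# Hypothetical log: two fans approach "star" first; "spammer" initiates
# every one of its own links and so collects no votes.
log = [
    (1, "fan1", "star"),
    (2, "fan2", "star"),
    (3, "star", "fan1"),   # reciprocation of an existing pair, ignored
    (4, "spammer", "fan1"),
    (5, "spammer", "fan2"),
]
scores = request_first_scores(log)
```

In the toy log, "spammer" ends up with two symmetric links but a score of zero, while "star" scores two. That asymmetry is exactly what the symmetric link graph hides.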
Despite several well-explained graphs showing differences due to snowball sampling or link caps, the paper lacks a conclusion. With the data itself available for download, it nonetheless sets the stage very well for exploring better trust and information retrieval techniques based on online social networks.
Review: Analysis of Topological Characteristics of Huge Online Social Networking Services October 4, 2007
Posted by shahan in online social networks. Tags: online social networks
Analysis of Topological Characteristics of Huge Online Social Networking Services, Y-Y Ahn, S. Han, H. Kwak, S. Moon, and H. Jeong. World Wide Web 2007 (WWW '07).
The authors present the main characteristics and evolution of online social networks while providing supporting research on how representative a sample is. Complete data from Cyworld is compared and contrasted with samples from MySpace and Orkut to show that, while certain properties are similar to those of offline social networks, other interesting properties are unique to online social networks.
One of the most important concerns in selecting a sample is validating how closely it represents the complete network. In the case of Orkut, the authors found a power-law exponent of 3.7; however, this contradicts the exponent of 1.5 found by Mislove et al. Furthermore, the two papers cite conflicting references on the effect of snowball sampling on the power-law exponent: Mislove et al. state that “collecting samples via the snowball method has been shown to underestimate the power-law coefficient [1]”, while Ahn et al. state that “It is known that the power-law nature in the degree distribution is well conserved under snowball sampling [2]”. Small sample sizes of 100,000 nodes further exacerbate this issue.
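For readers who want to check such exponents on their own samples, one common estimator is the continuous maximum-likelihood formula alpha = 1 + n / sum(ln(x_i / x_min)). The sketch below assumes that formula and a hand-picked cutoff `x_min`; it is not necessarily the fitting procedure used in either paper, and the function name `power_law_exponent_mle` is my own.

```python
import math


def power_law_exponent_mle(degrees, x_min=1.0):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Uses alpha = 1 + n / sum(ln(x / x_min)) over all degrees x >= x_min.
    Degrees below the cutoff are discarded. The cutoff is chosen by hand
    here, which is a simplifying assumption.
    """
    tail = [x for x in degrees if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one degree strictly above x_min")
    return 1.0 + len(tail) / log_sum


# Hypothetical degree sample, for illustration only.
sample_degrees = [1, 1, 1, 2, 2, 3, 5, 8, 13, 40]
alpha = power_law_exponent_mle(sample_degrees, x_min=1.0)
```

Running the same estimator on the full graph and on a snowball sample of it would be a direct way to see which of the two cited claims holds for a given network.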
Interestingly, the historical analysis reveals the multi-scaled degree distribution emerging in mid-2004. While the authors were unable to pinpoint the reason for this, an Aug 2005 report on the website of SK Telecommunications (Cyworld’s owner) [3] reveals that Mobile Cyworld was released around March 2004. Additionally, alexa.com’s pageview statistics for cyworld.com offer inconclusive evidence of this, showing a sudden drop in late 2004 that is possibly attributable to Mobile Cyworld. Furthermore, an earlier community portal site, Freechal, began to charge for membership around the same time, encouraging users to switch to the more interactive Cyworld, ironically at a probably higher cost due to the addictive nature of acorns [4].
While the data and the research methodology presented in the paper may not be completely accurate, its being one of the first attempts to provide metrics for online social networks makes this work highly valuable.
[1] L. Becchetti et al., A Comparison of Sampling Techniques for Web Graph Characterization. In Proceedings of the Workshop on Link Analysis (LinkKDD’06), Philadelphia, PA, Aug 2006.
[2] S. H. Lee et al., Statistical properties of sampled networks. Phys. Rev. E, 73:016102, 2006.
[3] http://www.sktelecom.com/eng/jsp/tlounge/presscenter/PressCenterDetail.jsp?f_reportdata_seq=3668, SKT introduces WAP version of Mobile Cyworld (accessible only through search feature on homepage).
[4] http://english.ohmynews.com/articleview/article_view.asp?no=179108&rel_no=1
Published: Exploring PSI-MI XML Collections Using DescribeX October 2, 2007
Posted by shahan in XML, publication, software development, standards, visualization. Tags: publication, standards, XML
My first official publication
Thanks to Reza for putting so much hard work into it, as well as for his patience with some of the DescribeX bug fixes. Many thanks also go to my professors Mariano and Thodoros, who guide and encourage me at every opportunity.
Abstract
PSI-MI has been endorsed by the protein informatics community as a standard XML data exchange format for protein-protein interaction datasets. While many public databases support the standard, there is a degree of heterogeneity in the way the proposed XML schema is interpreted and instantiated by different data providers. Analysis of schema instantiation in large collections of XML data is a challenging task that is unsupported by existing tools. In this study we use DescribeX, a novel visualization technique of (semi-)structured XML formats, to quantitatively and qualitatively analyze PSI-MI XML collections at the instance level with the goal of gaining insights about schema usage and to study specific questions such as: adequacy of controlled vocabularies, detection of common instance patterns, and evolution of different data collections. Our analysis shows DescribeX enhances understanding the instance-level structure of PSI-MI data sources and is a useful tool for standards designers, software developers, and PSI-MI data providers.
Reference
Reza Samavi, Mariano Consens, Shahan Khatchadourian, Thodoros Topaloglou. Exploring PSI-MI XML Collections Using DescribeX. Journal of Integrative Bioinformatics, 4(3):70, 2007.