In project REMINDS we will develop systems to analyse public information transmitted through social networks and to automatically filter and surface the information that is potentially relevant to a general audience. Although social networks are a source of a tremendous amount of information, much of it is either private (though often publicly accessible), personal, unimportant, or simply irrelevant to a general audience. Despite this, in recent years we have seen many important news stories and mass opinions on relevant issues conveyed through social networks, often spreading faster than the coverage of the same events by traditional media.
The main issue that REMINDS tackles is therefore to create a system capable of detecting "relevant information" in the sea of social-network data, while filtering out private comments and personal information. As Saracevic (2007) puts it, "relevance is a, if not even the, key notion in information science in general and information retrieval in particular". For the author, relevance can assume different manifestations in information science, such as "system or algorithmic", "topical or subject", "cognitive relevance or pertinence", "situational relevance or utility" and "affective relevance". There is prior work on event detection, as well as on influence and on the detection of controversial topics. For instance, Diakopoulos (2010) studied the polarity of opinions given by Twitter users. Based on the polarity and the identified events, the authors were able to characterise the general feeling of the opinions. They also used the Pearson correlation between positive and negative responses to measure the degree of controversy of the discussed topics, identifying strong oppositions in opinion. Thelwall (2010) studied sentiment polarity in the MySpace social network and found that nearly two thirds of users express emotions. Gomez-Rodriguez (2012) defined an information cascade model and developed an algorithm capable of inferring networks of influence and diffusion for propagated topics. Earlier, Leskovec (2006) had studied information cascades, i.e. the propagation of actions or ideas due to the influence of others. Bakshy (2011) recently quantified the relative influence of users on Twitter, finding a correlation between the largest cascades and the most influential users, as well as between the number of followers and past local influence.
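The controversy measure mentioned above can be illustrated with a minimal sketch: correlate, per topic, the volume of positive and negative responses across time windows, where a strong positive Pearson correlation suggests both camps are reacting together, i.e. the topic is controversial. All names and counts below are invented for illustration and are not taken from the cited work.

```python
# Sketch of a controversy measure based on Pearson correlation between
# positive and negative response volumes over time windows. The data is
# a hypothetical toy example, not real Twitter counts.
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy counts of positive/negative posts per hour for one discussed topic.
positive = [12, 30, 45, 20, 8]
negative = [10, 28, 50, 18, 9]

# A value close to 1.0 indicates both sides are equally active: controversy.
controversy = pearson(positive, negative)
```

In production one would of course use a vetted implementation (e.g. `scipy.stats.pearsonr`); the point here is only the shape of the measure.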
It is interesting to note that, although the study of relevance is some 80 years old and many attempts have been made to define its contributing factors, no conclusive results have emerged and the debate continues. Xu and Chen wrote in 2006: "...there is no agreement on factors beyond topicality, neither in terms of what they should be nor of how important they are… Naturalistic inquiry with qualitative research methods has been advocated and adopted by many researchers… [yet] almost no study of relevance judgment had adopted a confirmatory approach." The REMINDS team has experience in text mining, information retrieval and community detection.
Our system from a previous project ("Breadcrumbs") already allows us to automatically detect, in news fragments, the answers to three standard journalistic questions: Who?, Where? and When?. Our team also has experience in sentiment analysis (important for understanding whether a topic is polemic or controversial) and in ranking comments on the social web. Our partner company in this research, INTERRELATE, is a startup whose business is "mining, interrelating, sensing and analyzing online information". We therefore propose the creation and analysis of an automatic relevance-detection system.
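The Who?/Where?/When? extraction can be sketched as a mapping from named-entity labels to journalistic questions. The Breadcrumbs system itself is not described here, so the entity pairs below are hard-coded stand-ins for a real named-entity-recognition step, and all names are hypothetical.

```python
# Toy illustration: group (text, label) named-entity pairs under the three
# journalistic questions. A real system would obtain the entities from an
# NER component; here they are hard-coded for the sake of the example.
QUESTION_FOR_LABEL = {"PERSON": "Who?", "LOCATION": "Where?", "DATE": "When?"}

def answer_questions(entities):
    """Map recognised entities to Who?/Where?/When? answer lists."""
    answers = {"Who?": [], "Where?": [], "When?": []}
    for text, label in entities:
        question = QUESTION_FOR_LABEL.get(label)
        if question:
            answers[question].append(text)
    return answers

entities = [("Angela Merkel", "PERSON"), ("Berlin", "LOCATION"), ("Tuesday", "DATE")]
answers = answer_questions(entities)
```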
Our methodology will be based on two approaches to relevance detection that are standard and realistic, yet technologically novel, plus a third, "speculative" approach. The first approach will test for irrelevance (and, when the test fails, conclude some degree of relevance); the second will use journalistic factors to assess relevance. Finally, the third will try to find correlation and causality between interaction patterns and relevance in social networks. These two "and a half" approaches will then be confronted with a gold-standard model of relevance, both to validate the system and to test the relative importance of the factors used in these models.
The results will inform the weights and aggregation functions that combine these factors into a single ranking of relevant information fragments. As a result, we will create a model of relevance that embodies a better understanding of how people make relevance decisions and enables automatic relevance predictions at large scale.
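As a minimal sketch of the aggregation step, each fragment's factor scores can be combined by a weighted sum and the fragments sorted by the result. The factor names, weights and scores below are invented for illustration; the project's actual weights and aggregation functions are exactly what the gold-standard comparison is meant to determine.

```python
# Illustrative weighted aggregation of per-fragment factor scores into a
# single relevance ranking. Weights and scores are hypothetical examples.
def relevance_score(factors, weights):
    """Weighted sum of factor scores, each assumed to lie in [0, 1]."""
    return sum(weights[name] * value for name, value in factors.items())

# Hypothetical weights for the three approaches described above.
weights = {"irrelevance_test": 0.4, "journalistic": 0.4, "interaction": 0.2}

fragments = {
    "frag-1": {"irrelevance_test": 0.9, "journalistic": 0.7, "interaction": 0.5},
    "frag-2": {"irrelevance_test": 0.2, "journalistic": 0.3, "interaction": 0.9},
}

# Rank fragments from most to least relevant under the chosen weights.
ranking = sorted(fragments,
                 key=lambda f: relevance_score(fragments[f], weights),
                 reverse=True)
```

A linear combination is only one possible aggregation function; the gold-standard evaluation could equally motivate a learned, non-linear combiner.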