Computing Massive Trust Analytics for Twitter using Apache Spark with Account Self-assessment

Georgios Drakopoulos, Andreas Kanavos, Konstantinos Paximadis, Aristidis Ilias, Christos Makris, Phivos Mylonas


Although trust is predominantly a human trait, it has been carried over to the Web almost since its inception. As the Web has rapidly evolved into a true melting pot of human activity, trust plays a central role: a massive number of parties are interested in interacting in a multitude of ways but have little or no reason to trust each other a priori. This has led to schemes for evaluating Web trust in contexts such as e-commerce, social media, recommender systems, and e-banking. Of particular interest in social networks are classification methods relying on network-dependent attributes pertaining to the past online behavior of an account. Since such methods are deployed at Internet scale, it makes sense to rely on distributed processing platforms like Apache Spark. An added benefit of distributed platforms is that they pave the way, algorithmically and computationally, for higher order Web trust metrics. Here a Web trust classifier implemented in MLlib, the machine learning library for Apache Spark, is presented. It relies on both the activity of an account and that of similar accounts. Three datasets obtained by topic sampling of trending Twitter topics serve as benchmarks. Based on the experimental results, best practice recommendations are given.

