Open Census for Addressing False Identity Attacks in Decentralized Social Networks
We address the problem of estimating the population of a given area (e.g., a city). Unlike a centralized census, where identities are reported to a single principal agency (e.g., the U.S. Census Bureau) and stored on a centralized data server, the proposed decentralized census distributes reported identities in a peer-to-peer (P2P) network. Peers in this network can also distribute opinions on whether some of the reported identities are correct or false. The count of correct identities is estimated from the available information. A decentralized census enables individuals to independently verify governmental census data. Its results may complement those of an official census, as it may offer new opportunities for reaching additional residents. Thus, governments can improve their official data using results from independent census processes, potentially after additional verification. To quantify the correctness of identities and the correctness of the opinions on such identities in a decentralized census, Bernoulli random variables are used. We use a belief network to represent the probabilistic relationships between them. There are two ways to use the inferred values to estimate the count of correct identities. The first is to compute the expected cardinality of the set of correct identities. The second is to decide the correctness of each identity separately based on a threshold and then simply count the identities classified as correct. A dataset consisting of information about reviewers, products, and product reviews from Amazon.com is used to train the proposed belief network. The statistics of good products and reliable reviewers are used to represent the correctness of identities and the correctness of opinions. The statistics extracted from the product reviews are used to represent the relations between them. Note that this is the best available data we can access to train our belief network.
Once more relevant datasets are available, the belief network can be updated according to the newly observed samples. We evaluate the impact that various types of attackers have on the error of the decentralized census through a set of studies. Simulators were implemented to generate identities and opinions according to various models. The quality of the decentralized census is reported using receiver operating characteristic (ROC) curves and quantified by the area under the ROC curve. We studied the convergence of Markov chain Monte Carlo (MCMC) on simulated instances of the decentralized census. We estimated the maximum number of attackers that the system can resist and the minimum number of honest agents needed to counter attacks of certain intensities. We showed that robustness to attackers is possible when there exists a reasonable kernel of honest individuals who share opinions on the correctness of the identities. Graceful degradation was observed as the number of attackers increased. Among the various types of attacks, false favorable attackers require the highest number of honest users to be satisfactorily countered. It is sometimes assumed that cycles are necessary to represent authentication with belief network theory. This study shows how to represent authentication in an acyclic belief network.
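
The two count estimators and the area-under-ROC quality measure mentioned above can be illustrated with a minimal sketch. It assumes that inference has already produced, for each reported identity, a posterior probability that the identity is correct. The function names, the default threshold of 0.5, and the sample values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
from typing import Sequence


def expected_count(posteriors: Sequence[float]) -> float:
    """Expected number of correct identities.

    Each identity is a Bernoulli variable, so by linearity of expectation
    the expected count is the sum of the per-identity probabilities.
    """
    return sum(posteriors)


def thresholded_count(posteriors: Sequence[float], threshold: float = 0.5) -> int:
    """Count identities whose posterior meets the threshold.

    The 0.5 default and the >= comparison are assumptions of this sketch.
    """
    return sum(1 for p in posteriors if p >= threshold)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (correct, false) identity pairs that the scores rank in the right
    order, with ties counted as half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one correct and one false identity")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical posteriors and ground-truth labels (1 = correct identity).
    posteriors = [0.9, 0.8, 0.3, 0.6, 0.1]
    truth = [1, 1, 0, 1, 0]
    print("expected count:", expected_count(posteriors))
    print("thresholded count:", thresholded_count(posteriors))
    print("AUC:", roc_auc(posteriors, truth))
```

The expected count uses the full posterior and so reflects uncertainty about borderline identities, whereas the thresholded count commits to a decision per identity; the two can differ noticeably when many posteriors lie near the threshold.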