Scientific journal
European Journal of Natural History
ISSN 2073-4972
RSCI impact factor = 0.301

MODEL OF INFORMATION SPREAD IN SOCIAL NETWORKS

Lande D.V. 1, 2, Hraivoronska A.M. 1, Berezin B.O. 1
1 Institute for Information Recording NASU
2 NTUU “Kyiv Polytechnic Institute”
We propose evolution rules for a multi-agent network and identify statistical patterns in the life cycle of its agents, which are information messages. The main pattern discussed concerns the number of likes and reposts that a message receives. According to the modeling results, this number follows a Weibull distribution. We examine the proposed model using data from Twitter, an online social networking service.
Keywords: social network, modeling, Weibull distribution, agent-based system, information spread

Information flows strongly influence opinion formation and other processes in society. Today social networks play a fundamental role as a medium for spreading information. These facts motivate the study of how information flows arise and how they can be influenced. This in turn requires modeling the spread of information and finding laws or patterns in it [1].

In this article we present an agent-based model of information spread. The agent in this model is an information message [2]. A message published in a social network may cause different types of public reaction. The model covers the following types: positive or negative comments and expressions of respect or protest (which we call like/dislike); sharing or copying of the message (repost); and a link to the message from another message (link). The evolution of an agent is controlled by these types of reaction. The main attribute of an agent is its “energy” (E), which represents the current relevance of the message, that is, the degree of public interest in its topic. Naturally, a positive reaction or the appearance of a link to the message increases the energy. Conversely, the energy decreases when the message gets negative feedback. In any case, the energy tends to decrease over time because information eventually becomes outdated.

The agent specification

More precisely, the rules of agent evolution are as follows. Each agent appears with initial energy E0 and dies when its energy reaches 0. The energy varies during the agent’s life cycle depending on the reactions it receives. The reaction types and their effect on the energy are:

  • like: energy is incremented;
  • dislike: energy is decremented;
  • repost: energy is increased by 2;
  • link (reference to the message): energy is incremented.

In addition, the energy is decremented at every time step (we consider the evolution in discrete time).
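As a minimal sketch, the update rules above can be written in Python. The names `REACTION_DELTA`, `DECAY_PER_STEP`, `step_energy` and `is_alive` are hypothetical, and clamping the energy at 0 is an assumption made here so that a dislike arriving at energy 1 does not produce a negative value.

```python
# Energy change per reaction type, following the list of rules above.
REACTION_DELTA = {
    "like": 1,
    "dislike": -1,
    "repost": 2,
    "link": 1,  # a link (reference) to the message from another message
}
DECAY_PER_STEP = 1  # energy lost at every discrete time step


def step_energy(energy, reactions):
    """Return the agent's energy after one time step.

    `reactions` is the list of reaction names received during this step.
    The result is clamped at 0 (an assumption), the state in which the
    agent dies.
    """
    delta = sum(REACTION_DELTA[r] for r in reactions) - DECAY_PER_STEP
    return max(0, energy + delta)


def is_alive(energy):
    """An agent lives while its energy is positive."""
    return energy > 0
```

Under these rules a single like exactly cancels the decay of one step, while a step without reactions lowers the energy by 1.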

On the other hand, the more relevant a message is, the more likely people are to respond and express their opinion about it. We therefore assume that the probability of getting a response depends on the current energy of the agent. We define the probability that an agent with energy E gets a certain reaction as follows:

Lande01.wmf

Lande02.wmf

Lande03.wmf

We denote by Lande04.wmf the initial parameters of the model, and by φ some monotone nondecreasing function from R to [0, 1].

The simulation of information spread

Above we introduced the evolution rules for a single agent. An information flow is a set of such agents, and we simulate the dynamics of the whole flow as follows. At the initial time only one agent exists. New agents may appear in two ways. First, a new agent may be generated spontaneously with probability ps at every time step; this corresponds to somebody publishing new information. Second, a copy of an existing agent may be created (repost).
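The flow dynamics described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' program: the reaction probabilities of the model are not reproduced in this text, so the function `phi`, the coefficients 0.5 and 0.1, the values of `E0` and `P_SPONT`, and the rule that a reposted copy starts with energy E0 are all placeholders. Only likes and reposts are simulated.

```python
import random

E0 = 5         # initial energy (hypothetical value)
P_SPONT = 0.3  # p_s, probability of spontaneous generation per step (hypothetical)


def phi(energy, half=5.0):
    """Placeholder monotone nondecreasing map from energy to [0, 1)."""
    return energy / (energy + half) if energy > 0 else 0.0


def simulate_flow(steps, seed=0):
    """Run the flow for `steps` time steps; return like counts of dead agents."""
    rng = random.Random(seed)
    alive = [{"energy": E0, "likes": 0}]  # only one agent at the initial time
    finished = []
    for _ in range(steps):
        born = []
        if rng.random() < P_SPONT:  # somebody publishes new information
            born.append({"energy": E0, "likes": 0})
        survivors = []
        for agent in alive:
            p = phi(agent["energy"])
            delta = -1  # decay at every time step
            if rng.random() < 0.5 * p:  # like (assumed probability)
                agent["likes"] += 1
                delta += 1
            if rng.random() < 0.1 * p:  # repost (assumed probability)
                delta += 2
                # the copy starts with energy E0 (assumption)
                born.append({"energy": E0, "likes": 0})
            agent["energy"] = max(0, agent["energy"] + delta)
            if agent["energy"] > 0:
                survivors.append(agent)
            else:
                finished.append(agent["likes"])
        alive = survivors + born
    return finished
```

Calling `simulate_flow(1000)` returns one like count per agent that has died, which can then be tabulated into a frequency distribution. The placeholder coefficients are chosen so that the energy drifts downward on average and the population stays bounded.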

Here we describe the life cycle of one agent in terms of the variation of its energy. Let εt denote the energy at time t, and let δt be the random variable such that

Lande05.wmf

Lande06.wmf

Lande07.wmf

Lande08.wmf

Let us denote Lande09.wmf. Then we have

$$\varepsilon_{t+1} = \varepsilon_t + \delta_t.$$

It follows that the energy performs a random walk on {0, 1, 2, …, E0, …} with transition probabilities

Lande10.wmf

In other words, the stochastic sequence (ε0, ε1, …, εt, …) is a Markov chain with transition probabilities pij. Fig. 1 shows a state diagram for this Markov chain, drawn as a directed graph of the state transitions.

Viewing the energy as a random walk is a useful approach to analyzing the properties of the model.
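To make the Markov chain view concrete, the sketch below builds one row of a transition matrix. It assumes, for illustration only, that within a step a like and a repost occur independently with probabilities `p_like` and `p_repost`, that dislikes and links are ignored, and that state 0 is absorbing. The function name and both parameters are hypothetical, and in the model these probabilities would depend on the current energy.

```python
def transition_row(i, p_like, p_repost):
    """Return {j: p_ij} for energy state i.

    Assumes a like (+1) and a repost (+2) occur independently within a
    step and that the energy always decays by 1. State 0 is absorbing
    because the agent is dead.
    """
    if i == 0:
        return {0: 1.0}
    return {
        i - 1: (1 - p_like) * (1 - p_repost),  # no reaction, decay only
        i: p_like * (1 - p_repost),            # like cancels the decay
        i + 1: (1 - p_like) * p_repost,        # repost: +2 - 1
        i + 2: p_like * p_repost,              # like and repost: +1 + 2 - 1
    }
```

Each row sums to 1, and from any state i > 0 the walk can move only to i - 1, i, i + 1 or i + 2.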

Model results

Now let us consider the statistical distribution of likes and reposts for messages in the information flow. Note that the random walk description above lets us find the probability that an agent gets n likes.

Suppose an agent gets a like at time t; then δt ∈ {0, 2}; otherwise δt ∈ {-1, 1}. Denote by Lande11.wmf any vector such that Lande12.wmf if t = t1, …, tn and Lande13.wmf otherwise, for 0 < t1 < ... < tn < Tmax. It is easily proved that

Lande14.wmf

Data generated by the model are illustrated in Fig. 2.

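The frequency distribution of likes (blue line with dots) first increases and then decreases. Its shape resembles the density function of the Weibull distribution [4]

$$f(x; k, \lambda) = \frac{k}{\lambda} \left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}}, \quad x \ge 0.$$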

Fig. 2 also shows the density function of the Weibull distribution with shape parameter k = 2.1 and scale parameter λ = 7.4 (red line). We obtained this density as an approximation of the frequency distribution of likes using the method of least squares.
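A least-squares fit of a Weibull density to a frequency distribution can be sketched as follows. The text does not say which optimizer was used, so the simple grid search here is an assumption chosen to keep the example free of dependencies; the function names and the grids are hypothetical.

```python
import math


def weibull_pdf(x, k, lam):
    """Weibull density with shape k and scale lam.

    Returns 0 for x < 0. For k < 1 the density is unbounded at x = 0,
    so pass x > 0 in that case.
    """
    if x < 0:
        return 0.0
    z = x / lam
    return (k / lam) * z ** (k - 1) * math.exp(-(z ** k))


def fit_weibull_least_squares(xs, freqs, k_grid, lam_grid):
    """Grid search for the (k, lam) minimising the sum of squared residuals
    between the Weibull density at xs and the observed frequencies."""
    best = None
    for k in k_grid:
        for lam in lam_grid:
            sse = sum((weibull_pdf(x, k, lam) - f) ** 2
                      for x, f in zip(xs, freqs))
            if best is None or sse < best[0]:
                best = (sse, k, lam)
    return best[1], best[2]
```

Here `xs` would be the numbers of likes and `freqs` their relative frequencies. A grid step of 0.1 matches the precision with which the parameters are reported in the text; a continuous optimizer could replace the grid search when finer estimates are needed.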

Fig. 1. A state diagram for the Markov chain. States represent the energy of an agent

Fig. 2. Distribution of likes generated by the model

Fig. 3. Distribution of reposts generated by the model

The frequency distribution of reposts and its approximation are shown in Fig. 3. Here the density function of the Weibull distribution has shape parameter k = 1.7 and scale parameter λ = 4.6.

Information flows in social networks

We study the life cycles of news publications on Twitter and compare the results with the output of the model. We collected data on the growth in likes and retweets for particular information messages [3]. We found that the distributions of likes and retweets from the real social network fit a Weibull distribution, as in the model (Fig. 4 and 5). The shape parameters in the two cases agree closely.

We developed a program in the R programming language to analyze the statistical data. Processing was carried out in the following three steps.

Fig. 4. Distribution of likes from Twitter

Fig. 5. Distribution of retweets from Twitter

In the first step, the program detected increases in the number of retweets for one user in online mode. For example, messages of The New York Times were scanned every 15 minutes.

In the second step, the program processed the data accumulated in the first step. We approximated the data with a Weibull distribution and calculated its scale and shape parameters. In addition, we estimated the rate of increase in the number of retweets.

In the third step, all gathered data were stored in an external database for future analysis.
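The rate estimate from the second step can be illustrated with a short Python sketch. The original program was written in R and its code is not given, so the function names here are hypothetical; the 15-minute interval comes from the scanning example above, and a simple difference between consecutive scans is an assumed estimator.

```python
def retweet_increments(cumulative_counts):
    """Differences between consecutive cumulative retweet counts."""
    return [b - a for a, b in zip(cumulative_counts, cumulative_counts[1:])]


def growth_rate_per_minute(cumulative_counts, interval_minutes=15):
    """Average number of new retweets per minute within each scan interval."""
    return [d / interval_minutes
            for d in retweet_increments(cumulative_counts)]
```

For cumulative counts of 0, 30, 75 and 90 retweets (made-up values for illustration), the increments are 30, 45 and 15, which gives rates of 2, 3 and 1 retweets per minute.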

To summarize, we collected the text of each message, message timestamps, scale and shape parameters, and numerous graphs. The graphs show the number of likes and retweets, their rate of growth, and their approximation with a Weibull distribution.

Conclusion

We constructed an agent-based model of the message life cycle in social networks.

We found a statistical pattern in the number of likes and reposts of information messages: according to the modeling results, the distributions of likes and reposts follow a Weibull distribution. The model output is quite similar to the results from the real social network. This suggests that the pattern exists in real social networks and that the model captures it.

The findings described in this article can be useful for further study of information spread in social networks. The results can also be applied to detecting anomalies in the life cycle of information messages.