Abstract

Online communities are becoming increasingly important as platforms for large-scale human cooperation. In these communities, users seek and share professional skills, spreading knowledge along a chain of skill levels. To investigate how users communicate and cooperate to complete a large number of tasks, we analyze StackExchange, one of the largest question-and-answer systems in the world. We construct expertise networks that include all pairs of help-seeking interactions and measure each user's skill level from their position in the network, using a novel indicator, the "average flow distance". We then explain the hierarchy discovered in these networks, in particular its maximum length and the distribution of users across hierarchical levels.

1. Introduction

2. Method

2.1 Constructing expertise networks

![Constructing expertise networks](method.png)

3. Findings

3.1 The hierarchy of expertise networks

We construct the expertise network from the data of physics.stackexchange.com and investigate its topology (Figure 1). We observe a divide between askers and answerers: there are 1.5 times as many askers as answerers, yet only 28% of askers also answer questions. A similar "bow-tie" structure was reported by Andrei Broder et al. in 2000 and by Jun Zhang et al. in 2007.
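The asker/answerer split described above can be measured directly from a list of help-seeking interactions. The following is a minimal sketch that assumes each interaction is available as a hypothetical `(asker, answerer)` pair of user IDs (for example, one pair per answer). The function names are illustrative, and the toy pairs are made up, so they do not reproduce the physics.stackexchange.com figures.

```python
from collections import Counter


def build_expertise_network(pairs):
    """Weighted arcs asker -> answerer; weight = number of interactions (hypothetical helper)."""
    return Counter(pairs)


def asker_answerer_overlap(pairs):
    """Summarize how askers and answerers overlap in (asker, answerer) pairs."""
    askers = {asker for asker, _ in pairs}
    answerers = {answerer for _, answerer in pairs}
    both = askers & answerers  # users who both ask and answer
    return {
        "n_askers": len(askers),
        "n_answerers": len(answerers),
        "askers_per_answerer": len(askers) / len(answerers),
        "share_of_askers_who_answer": len(both) / len(askers),
    }


# Hypothetical toy data: (asker, answerer) user IDs.
pairs = [("a", "b"), ("c", "b"), ("b", "d"), ("a", "d"), ("e", "d")]
arcs = build_expertise_network(pairs)
stats = asker_answerer_overlap(pairs)
print(stats)
# {'n_askers': 4, 'n_answerers': 2, 'askers_per_answerer': 2.0, 'share_of_askers_who_answer': 0.25}
```

On the real data, the text reports about 1.5 askers per answerer and a 28% share of askers who also answer.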

Figure 3: The hierarchy of the Math expertise network. Nodes represent users, and arcs represent help-seeking relationships between users. The X coordinates are randomly generated between 0 and 1, and the Y coordinates show the flow level *L_i* of each user. The color of each arc shows the difficulty of the corresponding question, calculated by the TrueSkill algorithm.

We calculate the flow level L_i of all users and find that askers and answerers are separated (Figure 2). Users who only ask questions have a flow level of one, and users who only answer have a flow level of two or greater. Users who both ask and answer questions have a variety of flow levels, depending on the levels of the users they help.
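This section does not define the "average flow distance" itself. The sketch below therefore uses a deliberately simplified, hypothetical proxy that reproduces the properties just described. A user who never answers has level 1. Any other user has level 1 plus the average level of the askers they helped, counted once per answer. The proxy is solved by fixed-point iteration and assumes arcs run from asker to answerer. It is not the paper's indicator. It also fails to converge when some group of users only ever answers questions from within that group; in that case the iteration stops with an error.

```python
def flow_levels(pairs, tol=1e-9, max_iter=10_000):
    """Hypothetical flow-level proxy for (asker, answerer) pairs.

    level(u) = 1                                   if u never answers
    level(u) = 1 + mean(level of askers u helped)  otherwise
    """
    helped = {}  # answerer -> askers they helped, repeated once per answer
    users = set()
    for asker, answerer in pairs:
        helped.setdefault(answerer, []).append(asker)
        users.update((asker, answerer))
    if not users:
        return {}

    level = {u: 1.0 for u in users}
    for _ in range(max_iter):
        new = {
            u: (1.0 + sum(level[a] for a in helped[u]) / len(helped[u])) if u in helped else 1.0
            for u in users
        }
        if max(abs(new[u] - level[u]) for u in users) < tol:
            return new
        level = new
    raise RuntimeError("no convergence: some users only help each other")


pairs = [("a", "b"), ("c", "b"), ("b", "d"), ("a", "d")]
print(sorted(flow_levels(pairs).items()))
# [('a', 1.0), ('b', 2.0), ('c', 1.0), ('d', 2.5)]
```

In the toy example, user `b` both asks and answers. Their level of 2 comes from the pure askers they helped, as the paragraph describes.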

We also observe that question difficulty is related to the flow levels of askers and answerers.
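Question difficulty in Figure 3 is computed with TrueSkill, but the text does not say how questions are matched against users. The hand-written sketch below implements the standard two-player, no-draw TrueSkill update, using Gaussian skill beliefs and the usual defaults μ₀ = 25, σ₀ = 25/3 and β = σ₀/2, with the dynamics term τ omitted. For illustration only, it assumes that a question "beats" its asker and "loses" to the user whose answer is accepted. The names `SkillRating` and `update_1v1` are hypothetical and are not the API of the `trueskill` package.

```python
import math
from dataclasses import dataclass

MU0 = 25.0         # TrueSkill default prior mean
SIGMA0 = MU0 / 3   # TrueSkill default prior standard deviation
BETA = SIGMA0 / 2  # TrueSkill default performance noise


@dataclass(frozen=True)
class SkillRating:
    mu: float = MU0
    sigma: float = SIGMA0


def _pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def _cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def update_1v1(winner, loser, beta=BETA):
    """Two-player TrueSkill update without draws (tau omitted).

    Numerically naive: very large rating gaps can make _cdf(t) underflow to 0.
    """
    c = math.sqrt(2 * beta ** 2 + winner.sigma ** 2 + loser.sigma ** 2)
    t = (winner.mu - loser.mu) / c
    v = _pdf(t) / _cdf(t)  # mean correction for a win
    w = v * (v + t)        # variance correction, 0 < w < 1

    def shift(r, sign):
        mu = r.mu + sign * (r.sigma ** 2 / c) * v
        sigma = r.sigma * math.sqrt(1 - (r.sigma ** 2 / c ** 2) * w)
        return SkillRating(mu, sigma)

    return shift(winner, +1), shift(loser, -1)


# Assumed matching (illustration only): question beats asker, accepted answerer beats question.
question, asker, answerer = SkillRating(), SkillRating(), SkillRating()
question, asker = update_1v1(question, asker)
answerer, question = update_1v1(answerer, question)
print(question, asker, answerer)
```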

Figure 4. The distribution of TrueSkill scores of questions and users (left), and the distribution of flow level gaps L_j - L_i over all pairs of connected nodes i and j.

The TrueSkill score of users separates the askers (whose scores are around 10) from the answerers (whose scores are around 30). The distribution of flow level gaps shows that, in the majority of cases, the answerer's flow level is about 1.2 higher than the asker's, suggesting that an answerer needs a somewhat higher skill level to give a satisfactory (accepted) answer.
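Once every user has a flow level, the gap distribution in Figure 4 can be computed arc by arc. The following is a minimal sketch that assumes arcs run from asker i to answerer j, so each gap is L_j - L_i. The function names are hypothetical, and the levels and pairs are toy values rather than data from the paper.

```python
from statistics import mean, median


def flow_level_gaps(pairs, levels):
    """Gap L_j - L_i for every (asker i, answerer j) pair."""
    return [levels[j] - levels[i] for i, j in pairs]


def summarize_gaps(gaps):
    """A few summary statistics of the gap distribution."""
    return {
        "mean": mean(gaps),
        "median": median(gaps),
        "share_positive": sum(g > 0 for g in gaps) / len(gaps),
    }


# Hypothetical toy data.
pairs = [("a", "b"), ("c", "b"), ("b", "d"), ("a", "d")]
levels = {"a": 1.0, "b": 2.0, "c": 1.0, "d": 2.5}
gaps = flow_level_gaps(pairs, levels)
print(gaps)                  # [1.0, 1.0, 0.5, 1.5]
print(summarize_gaps(gaps))  # {'mean': 1.0, 'median': 1.0, 'share_positive': 1.0}
```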

Figure 5. The comparison between four different measures of skill level, including degree, PageRank score, TrueSkill score, and flow level.

We compare four measures of skill level: degree, PageRank score, TrueSkill score, and flow level. PageRank score turns out to be trivially correlated with node degree. The TrueSkill score, while it separates askers from answerers as efficiently as flow level,
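The observation that PageRank tracks degree can be checked with a rank correlation. The self-contained sketch below assumes arcs point from asker to answerer. It compares each user's in-degree (answers given) with a power-iteration PageRank, using damping 0.85 and spreading dangling mass uniformly, via Spearman's coefficient. The toy arcs and function names are hypothetical. If NetworkX and SciPy are available, `networkx.pagerank` and `scipy.stats.spearmanr` cover the same steps.

```python
def pagerank(arcs, damping=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank on directed arcs (repeated arcs add weight)."""
    nodes = sorted({u for arc in arcs for u in arc})
    out = {u: [] for u in nodes}
    for u, v in arcs:
        out[u].append(v)
    n = len(nodes)
    pr = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        dangling = sum(pr[u] for u in nodes if not out[u])  # users with no outgoing arcs
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * pr[u] / len(out[u])
        converged = sum(abs(new[u] - pr[u]) for u in nodes) < tol
        pr = new
        if converged:
            break
    return pr


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy arcs, asker -> answerer.
arcs = [("a", "b"), ("c", "b"), ("b", "d"), ("a", "d"), ("e", "d")]
pr = pagerank(arcs)
users = sorted(pr)
in_degree = [sum(1 for _, v in arcs if v == u) for u in users]
rho = spearman(in_degree, [pr[u] for u in users])
print(round(rho, 3))
```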

3.2 Cascade Model for Attention Competition

The limitation of hierarchical levels

Figure 4.

We find that the cascade model explains the limited number of levels in the flow hierarchy of expertise networks. In particular, the flow distance L_i of the i-th node in the model satisfies:

![][1]
[1]:http://latex.codecogs.com/svg.latex?L_i=1+\frac{1}{n-i}(L_1+L_2+...+L_{i-1})

![][2]
[2]:http://latex.codecogs.com/svg.latex?L_i=\begin{cases}1+\frac{1}{n-i}(L_1+L_2+...+L_{i-1})&i\leq\frac{n}{2}\\1+\frac{1}{i-1}(L_1+L_2+...+L_{i-1})&\frac{n}{2}<i\leq{n}\\1+\frac{4}{n^2}\sum_{k=1}^{n/2}(n-2k+1)L_{n+1-k}&i=n+1,\;n\;\mathrm{even}\\1+\frac{4}{n^2-1}\sum_{k=1}^{(n-1)/2}(n-2k+1)L_{n+1-k}&i=n+1,\;n\;\mathrm{odd}\end{cases}
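The piecewise recursion for L_i can be evaluated exactly for small n. The following minimal sketch uses rational arithmetic. It assumes n ≥ 2, because for n = 1 the second case would divide by zero. It also reads the normalization in the odd i = n + 1 case as 4/(n² − 1), which is the value that makes the weights sum to one, just as 4/n² does in the even case. The function name `cascade_flow_distances` is hypothetical.

```python
from fractions import Fraction


def cascade_flow_distances(n):
    """Exact [L_1, ..., L_{n+1}] from the piecewise cascade-model recursion (n >= 2)."""
    if n < 2:
        raise ValueError("the recursion is only well defined here for n >= 2")
    L = []                # L[i - 1] holds L_i
    prefix = Fraction(0)  # running sum L_1 + ... + L_{i-1}
    for i in range(1, n + 1):
        if 2 * i <= n:    # case i <= n/2
            L_i = 1 + prefix / (n - i)
        else:             # case n/2 < i <= n
            L_i = 1 + prefix / (i - 1)
        L.append(L_i)
        prefix += L_i
    # Node i = n + 1: one plus a weighted average of L_n, L_{n-1}, ...
    # with weights n - 1, n - 3, ..., normalized to sum to one.
    if n % 2 == 0:
        norm, upper = Fraction(4, n * n), n // 2
    else:
        norm, upper = Fraction(4, n * n - 1), (n - 1) // 2
    weighted = sum((n - 2 * k + 1) * L[n - k] for k in range(1, upper + 1))
    L.append(1 + norm * weighted)
    return L


# How the level of the final node changes as the model grows.
for n in (2, 3, 4, 10, 100):
    print(n, float(cascade_flow_distances(n)[-1]))
```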