The results of the analysis described above, together with results from complementary analyses, led us to choose the Weibull distribution for our data. The log-normal and Weibull distributions both fit the data well, but we chose the Weibull because it did a much better job of differentiating between repositories.
The burstiness parameter B, as defined by Goh and Barabási, ranges from −1 to 1, with negative values indicating anti-bursty behavior, 0 indicating random behavior, and positive values indicating bursty behavior. For example, in the figure above, (a) is moderately bursty, (b) is extremely bursty, and (c) is moderately anti-bursty. This parameter can be computed empirically from the data (displayed in the insets of the above figure), but we are more interested in computing it from the distribution we fit to the data. It is defined in terms of the coefficient of variation of that distribution.
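As a rough illustration of the distribution-based version, Goh and Barabási write burstiness as B = (σ/μ − 1)/(σ/μ + 1), where σ/μ is the coefficient of variation. For a Weibull distribution the coefficient of variation depends only on the shape parameter, so B can be sketched as a function of the shape alone. The function names below are ours for illustration and are not taken from the study's code.

```python
import math


def weibull_cv(shape: float) -> float:
    """Coefficient of variation (sigma / mu) of a Weibull distribution.

    The scale parameter cancels out of the ratio, so only the shape matters.
    """
    if shape <= 0:
        raise ValueError("shape must be positive")
    g1 = math.gamma(1.0 + 1.0 / shape)  # mean / scale
    g2 = math.gamma(1.0 + 2.0 / shape)  # second raw moment / scale**2
    return math.sqrt(g2 - g1 * g1) / g1


def burstiness_from_cv(cv: float) -> float:
    """Burstiness B = (cv - 1) / (cv + 1), which lies between -1 and 1."""
    return (cv - 1.0) / (cv + 1.0)


def weibull_burstiness(shape: float) -> float:
    """Burstiness implied by a Weibull distribution with the given shape."""
    return burstiness_from_cv(weibull_cv(shape))
```

A shape of 1 is the exponential distribution (a Poisson process), for which the coefficient of variation is 1 and B is 0. Shapes below 1 give positive (bursty) values, and shapes above 1 give negative (anti-bursty) values.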
The memory coefficient M is essentially the Pearson correlation coefficient between the sequence of interevent times and the same sequence offset by one position. It measures how closely related a given interevent time is to the one that follows it. M, like B, ranges from −1 to 1, with positive values indicating that short interevent times tend to be followed by short ones (and long by long), and negative values indicating that short interevent times tend to be followed by long ones. Examples of different values of M can be seen in Figure 1. Subfigures (a) and (c) have M close to 0, (b) and (d) have moderately positive values of M, (e) has a very positive value of M, and (f) has a fairly negative value of M.
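The empirical version of this definition can be sketched in a few lines: pair each interevent time with its successor and take the Pearson correlation of the pairs. This is a minimal illustration with a function name of our own choosing; it computes M directly from the data rather than from a fitted distribution.

```python
import math


def memory_coefficient(taus):
    """Pearson correlation between consecutive interevent times.

    `taus` is an ordered sequence of interevent times. Each time is paired
    with the one that follows it, so n times produce n - 1 pairs.
    """
    x = list(taus[:-1])  # tau_1 ... tau_{n-1}
    y = list(taus[1:])   # tau_2 ... tau_n
    n = len(x)
    if n < 2:
        raise ValueError("need at least three interevent times")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant sequences")
    return cov / (spread_x * spread_y)
```

A strictly alternating sequence of short and long gaps gives M close to −1, while a steadily increasing sequence gives M close to 1.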
With a distribution selected and B and M defined in terms of parameters of the distribution fit to each set of interevent times, we can fit a distribution to each repository’s interevent times using maximum likelihood estimation and then compute B and M from that fitted distribution. Doing so yields a joint distribution of B and M across all repositories, shown in subplot (a) of the figure below.
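To make the fitting step concrete, here is a minimal standard-library sketch of a two-parameter Weibull maximum likelihood fit. It assumes strictly positive interevent times and a location fixed at zero, and it solves the usual shape equation by bisection. This is our own illustration, not necessarily the routine used in the study, and the function name is hypothetical.

```python
import math


def fit_weibull_mle(taus):
    """Return (shape, scale) maximizing the Weibull likelihood of `taus`.

    Assumes all values are strictly positive and the location is zero.
    The shape k solves
        sum(x**k * ln x) / sum(x**k) - 1/k - mean(ln x) = 0,
    and the scale then follows in closed form.
    """
    if any(t <= 0 for t in taus):
        raise ValueError("interevent times must be strictly positive")
    logs = [math.log(t) for t in taus]
    n = len(logs)
    mean_log = sum(logs) / n
    max_log = max(logs)
    if max_log == min(logs):
        raise ValueError("need at least two distinct values")

    def weights(k):
        # x**k rescaled by max(x)**k so the exponentials never overflow.
        return [math.exp(k * (value - max_log)) for value in logs]

    def score(k):
        w = weights(k)
        weighted_log = sum(wi * li for wi, li in zip(w, logs)) / sum(w)
        return weighted_log - 1.0 / k - mean_log

    # score is increasing in k: negative near zero, positive for large k.
    lo, hi = 1e-6, 1.0
    while score(hi) < 0:
        hi *= 2.0
        if hi > 1e9:
            raise ValueError("data too concentrated to bracket the shape")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < 1e-10:
            break
    shape = 0.5 * (lo + hi)
    scale = math.exp(max_log) * (sum(weights(shape)) / n) ** (1.0 / shape)
    return shape, scale
```

Calling `fit_weibull_mle` once per repository yields one shape and scale per repository. Because the coefficient of variation of a Weibull distribution depends only on the shape, the fitted shape is the quantity a distribution-based burstiness would be computed from.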
Subplot (b) is taken from the study by Goh and Barabási referenced earlier and shows the modes of the joint distributions of B and M for other types of data, such as emails, printing, and natural phenomena. Mapping the mode of our measures onto the same subplot (displayed as a yellow circle) shows that people’s work on GitHub exhibits burstiness and memory characteristics similar to those of people sending emails, which is a reasonable result.
One of the additional pieces of information we have about each repository in the study is the number of stars (S) it received from users on GitHub. This can be viewed as a metric of success, as the act of starring a repository indicates a user’s interest in that repository. The figure below shows bivariate associations between success and burstiness (a), and between success and the memory coefficient (b).
The main plot of each subfigure shows a scatterplot of the two variables along with a smoothed spline. Outliers (above the 99th percentile in S) were filtered out of the main plot. Margins show the one-way distribution of each variable. Insets show the same scatterplot and smoothed spline on a logarithmic scale, this time including outliers. There is a slight positive association between S and B, and it is more evident on the logarithmic scale. Based on the splines in (a), higher values of B are associated with higher values of S, up to a point. Slightly positive values of M are associated with greater values of S, but the association is minimal.
The association seen visually here, while minimal, is statistically significant, as supported by an additional analysis regressing a repository’s number of stars on B, M, and many other key characteristics of the repositories.
Teams on GitHub certainly exhibit bursty behavior. The degree of teams’ burstiness falls in line with other human activities, specifically printing and emailing. This suggests that groups of people act in patterns similar to those of individuals. The pattern of burstiness could be explained by social loafing, as teams could fall in and out of complacency when many people are working on the same team.
When beginning this project, we were hoping to see a much stronger association, particularly between burstiness and success, than we did. Instead, we found the opposite: teams are able, for the most part, to find success regardless of how bursty or memory-dependent their work patterns are. A more in-depth analysis in the future could investigate interevent times grouped by both team and team member. This has its own set of challenges, as more bias would be introduced by the notable increase in filtering that smaller subdivisions require.
If you are interested in more details and specifics of this study, I encourage you to check out the repository containing the full paper, a slide deck, and the code, located here.