

enter the fray: our reader discussion forum
Problem with the Data
by Rhododendrites
While there are several odd things about this study (why use administrators for the "elite" group instead of the users with the most edits, why factor in "those that might become administrators later," etc.), the most important thing to note is in the methodology section of the paper, the relevant parts of which I quote here:
In the following analyses, we used a history dump of the English Wikipedia that was generated on 7/2/2006. The dump included over 58 million revisions, from more than 4.7 million wiki pages, of which 2.4 million are article-related entries in the encyclopedia...

To calculate the work done while editing an article, we calculated both the number of edits made and the change in content between edits. We model change as the number of words added and removed, as calculated by a traditional “diff” operation [9]. However, we used words as units instead of lines, allowing greater precision than previous studies...For both measures we aggregated edits over all 58+ million revisions, grouping by time and user participation level. User participation level was calculated based on the total number of edits made by a user.
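For anyone curious what a word-level (rather than line-level) diff looks like, here is a minimal sketch using Python's standard `difflib`. This is my own stand-in, not the authors' tool: the paper only says it used a traditional diff with words as the units, and splitting on whitespace is my assumption about what counts as a "word." The function name `word_diff_counts` is made up for illustration.

```python
from difflib import SequenceMatcher


def word_diff_counts(old_text: str, new_text: str) -> tuple[int, int]:
    """Count (words_added, words_removed) between two revisions of a page.

    Words, not lines, are the diff unit. Splitting on whitespace is an
    assumption about what counts as a "word"; the paper does not say.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    added = removed = 0
    # autojunk=False so frequent words are never silently treated as junk
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1  # words present only in the old revision
        if tag in ("replace", "insert"):
            added += j2 - j1    # words present only in the new revision
    return added, removed


# "brown" is replaced by "red" and "jumps" is appended
print(word_diff_counts("the quick brown fox", "the quick red fox jumps"))  # prints (2, 1)
```

Note that a one-word change inside a long paragraph counts as one word added and one removed here, whereas a line-based diff would count the whole line as changed, which is the "greater precision" the quote refers to.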


They used all 58 million revisions in the dump! Not just the articles! They compared the contributions (words added and removed, number of edits, etc.) of experienced and novice Wikipedia users on talk pages, on policy pages... even on pages like the Request for Comment discussions! You may have seen how long talk pages and discussions about Wikipedia decisions can get: how many edits and words go into them, with few words ever being removed. As Kittur et al. pointed out, it takes time (and a certain amount of passion for the topic at hand, or for Wikipedia itself) to come to understand the value of "indirect" work.

By including parts of Wikipedia that most newer users have never even heard of, the data become heavily skewed toward higher edit counts and word contributions from experienced Wikipedians.

I would bet that the same study, repeated using only article pages, would show exactly what we thought before: that the top 1% covers so much ground at least in part because of all the custodial work it does.
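Here is a rough sketch of the comparison I'm proposing, under some loud assumptions: the revisions are hypothetical records already parsed out of the dump (the `Revision` type and `top_share` function are made up), "top 1%" means ranking users by their total edit count (the paper's participation-level measure), and "article pages" means MediaWiki namespace 0. The toy numbers are invented purely to show the mechanics; they are not results from any real data.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Revision:
    """Hypothetical, already-parsed revision record (not the dump's format)."""
    user: str
    namespace: int    # MediaWiki namespace: 0 = articles, 1 = Talk, 4 = Wikipedia: pages
    words_added: int  # e.g. from a word-level diff against the previous revision


def top_share(revisions, top_fraction=0.01, articles_only=False):
    """Fraction of all words added that came from the top `top_fraction` of users.

    Users are ranked by total edit count across ALL namespaces, following the
    paper's participation-level definition; `articles_only` only restricts
    which revisions' words are counted.
    """
    edit_counts = Counter(r.user for r in revisions)
    n_top = max(1, math.ceil(len(edit_counts) * top_fraction))
    top_users = {user for user, _ in edit_counts.most_common(n_top)}
    counted = [r for r in revisions if r.namespace == 0] if articles_only else revisions
    total = sum(r.words_added for r in counted)
    if total == 0:
        return 0.0
    return sum(r.words_added for r in counted if r.user in top_users) / total


# Invented toy data, only to show the mechanics; NOT real Wikipedia numbers.
revs = [
    Revision("veteran", 4, 500),  # long post on a project discussion page
    Revision("veteran", 1, 300),  # talk-page argument
    Revision("veteran", 0, 50),   # article edit
    Revision("newbie1", 0, 40),
    Revision("newbie2", 0, 60),
]
print(f"{top_share(revs):.2f}")                      # all namespaces: prints 0.89
print(f"{top_share(revs, articles_only=True):.2f}")  # articles only: prints 0.33
```

Ranking by all-namespace edit counts keeps the same "elite" group in both runs, so any change in their share comes only from which pages are counted; ranking users within article edits alone would be an equally defensible variant.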

Please correct me if I'm misinterpreting :)

<a href="http://rhododendrites.blogspot.com/2008/03/wilsons-slate-article-complicates-not.html">This</a>
