Tutorial on FLOSS data at AoM 2006, Atlanta
James Howison presented a short tutorial on FLOSS data sources at the 2006 Academy of Management conference in Atlanta. This folder has the presentation as well as links to the resources discussed in the session.
-
The homepage for the Open Source PDWs at Academy
-
James Howison's PowerPoint Presentation
-
Notre Dame's Monthly SourceForge Dumps
-
FLOSSmole
- Data collected on SourceForge, RubyForge, ObjectWeb and Freshmeat
-
CVSanalY
- Data for download on the CVS/SVN history of all projects on SourceForge. Scripts to assist in analyzing any CVS repository.
-
Ohloh
- Commercial site providing summary information on FLOSS projects.
-
Open Business Readiness Rating Project
- Methodology for readiness assessments on open source projects. Groups undertaking assessments share their assessments online.
-
Howison, J. & Crowston, K. (2004). The perils and pitfalls of mining SourceForge. In Proceedings of Mining Software Repositories Workshop, International Conference on Software Engineering (ICSE 2004), Edinburgh, Scotland, May 25.
- SourceForge provides abundant accessible data from Open Source Software development projects, making it an attractive data source for software engineering research. However, it is not without theoretical peril and practical pitfalls. In this paper, we outline practical lessons gained from our spidering, parsing and analysis of SourceForge data. SourceForge can be practically difficult: projects are defunct, data from earlier systems has been dumped in and crucial data is hosted outside SourceForge, dirtying the retrieved data. These practical issues play directly into analysis: decisions made in screening projects can reduce the range of variables, skewing data and biasing correlations. SourceForge is theoretically perilous because it provides easily accessible data items for each project, tempting researchers to fit their theories to these limited data. Worse, few are plausible dependent variables. Studies are thus likely to test the same hypotheses even if they start from different theoretical bases. To avoid these problems, analyses of SourceForge projects should go beyond project level variables and carefully consider which variables are used for screening projects and which for testing hypotheses.
-
Howison, J., Conklin, M. & Crowston, K. (In press). FLOSSmole: A collaborative repository for FLOSS research data and analyses. International Journal of Information Technology and Web Engineering.
- This paper introduces and expands on previous work on a collaborative project, called FLOSSmole (formerly OSSmole), designed to gather, share and store comparable data and analyses of free and open source software development for academic research. The project draws on the ongoing collection and analysis efforts of many research groups, reducing duplication and promoting compatibility both across sources of FLOSS data and across research groups and analyses. The paper outlines difficulties with the current typical quantitative FLOSS research process, uses these to develop requirements, and presents the design of the system.
-
Presentation in PPT format