Floss.syr.edu

Tutorial on FLOSS data at AoM 2006, Atlanta

James Howison presented a short tutorial on FLOSS data sources at the 2006 Academy of Management conference in Atlanta. This folder has the presentation as well as links to the resources discussed in the session.
The homepage for the Open Source PDWs at Academy
 
James Howison's PowerPoint Presentation
 
Notre Dame's Monthly SourceForge Dumps
 
FLOSSmole
Data collected on SourceForge, Rubyforge, ObjectWeb and Freshmeat.
CVSanalY
Data for download on the CVS/SVN history of all projects on SourceForge. Scripts to assist in analyzing any CVS repository.
Ohloh
Commercial site providing summary information on FLOSS projects.
Open Business Readiness Rating Project
Methodology for readiness assessments on open source projects. Groups undertaking assessments share their assessments online.
Howison, J. & Crowston, K. (2004). The perils and pitfalls of mining SourceForge. In Proceedings of Mining Software Repositories Workshop, International Conference on Software Engineering (ICSE 2004), Edinburgh, Scotland, May 25.
SourceForge provides abundant accessible data from Open Source Software development projects, making it an attractive data source for software engineering research. However it is not without theoretical peril and practical pitfalls. In this paper, we outline practical lessons gained from our spidering, parsing and analysis of SourceForge data. SourceForge can be practically difficult: projects are defunct, data from earlier systems has been dumped in and crucial data is hosted outside SourceForge, dirtying the retrieved data. These practical issues play directly into analysis: decisions made in screening projects can reduce the range of variables, skewing data and biasing correlations. SourceForge is theoretically perilous: because it provides easily accessible data items for each project, tempting researchers to fit their theories to these limited data. Worse, few are plausible dependent variables. Studies are thus likely to test the same hypotheses even if they start from different theoretical bases. To avoid these problems, analyses of SourceForge projects should go beyond project level variables and carefully consider which variables are used for screening projects and which for testing hypotheses.
Howison, J., Conklin, M. & Crowston, K. (In press). FLOSSmole: A collaborative repository for FLOSS research data and analyses. International Journal of Information Technology and Web Engineering.
This paper introduces and expands on previous work on a collaborative project, called FLOSSmole (formerly OSSmole), designed to gather, share and store comparable data and analyses of free and open source software development for academic research. The project draws on the ongoing collection and analysis efforts of many research groups, reducing duplication, and promoting compatibility both across sources of FLOSS data and across research groups and analyses. The paper outlines current difficulties with the current typical quantitative FLOSS research process and uses these to develop requirements and presents the design of the system.
Presentation in PPT format
 
 
