III-CXT - Small: Semi-automated coding of qualitative data to study group maintenance in self-organizing distributed teams


This study explores the application of Natural Language Processing (NLP) and Machine Learning (ML) tools to the context domain of organizational behaviour, specifically to a study of group maintenance in a novel setting. The proposal involves information scientists working collaboratively with domain scientists with the goal of developing an innovative NLP- and ML-based research tool to support qualitative social science research, specifically content analysis. Content analysis is a qualitative research technique for finding evidence of concepts of interest using text, rather than numbers, as raw data [75]. The process of identifying and labelling significant features in text is referred to as “coding”, and the result of such an analysis is a text annotated with codes for the concepts exhibited [72]. The project conceptualizes the problem of coding qualitative data as an Information Extraction (IE) problem. However, rather than seeking to automate the process, the system will employ these technologies in a supporting role, keeping the human coder in the loop. Specifically, it will apply an active learning process, using a few hand-coded examples to create an initial model that then evolves through interaction with the user. The project thus advances the domain by allowing qualitative researchers to use cyber-infrastructure to extend their research capabilities. To validate the utility of the tool and further advance the domain, the system will be applied to the study of group maintenance behaviour in cyber-infrastructure-supported distributed groups, specifically free/libre open source software development teams.
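The active learning process described above can be illustrated with a minimal sketch. It is not the project's system: it assumes a small naive Bayes classifier and uncertainty sampling as stand-ins for the NLP and ML components, and the names `NaiveBayesCoder`, `most_uncertain`, `active_learning` and `ask_human` are hypothetical, introduced only for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; a real tool would do more)."""
    return text.lower().split()


class NaiveBayesCoder:
    """Tiny multinomial naive Bayes model with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # code -> word frequencies
        self.doc_counts = Counter()              # code -> number of coded examples
        self.vocab = set()

    def add_example(self, text, code):
        """Update the model with one human-coded text segment."""
        words = tokenize(text)
        self.word_counts[code].update(words)
        self.doc_counts[code] += 1
        self.vocab.update(words)

    def probabilities(self, text):
        """Return a dict mapping each known code to its posterior probability."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for code in self.doc_counts:
            total_words = sum(self.word_counts[code].values())
            score = math.log(self.doc_counts[code] / total_docs)
            for word in words:
                count = self.word_counts[code][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            log_scores[code] = score
        # Normalize in a numerically stable way.
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(weights.values())
        return {c: w / norm for c, w in weights.items()}


def most_uncertain(model, pool):
    """Pick the uncoded text whose most likely code has the lowest probability."""
    return min(pool, key=lambda text: max(model.probabilities(text).values()))


def active_learning(seed, pool, ask_human, rounds):
    """Build an initial model from hand-coded seed examples, then refine it
    by asking the human coder about the most uncertain texts."""
    model = NaiveBayesCoder()
    for text, code in seed:
        model.add_example(text, code)
    pool = list(pool)
    for _ in range(min(rounds, len(pool))):
        query = most_uncertain(model, pool)
        pool.remove(query)
        model.add_example(query, ask_human(query))  # human stays in the loop
    return model
```

In use, `seed` would hold the few hand-coded examples as `(text, code)` pairs, `pool` the uncoded text, and `ask_human` a callback that shows a segment to the coder and returns the chosen code. Querying the least certain segment first is one common selection strategy; others could be substituted without changing the loop.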

III-CXT-Small.pdf (927.86 KB)