English Pages XXI, 236 [251] Year 2020
Innovation, Entrepreneurship und Digitalisierung
Dirk Homscheid
Firm-Sponsored Developers in Open Source Software Projects A Social Capital Perspective
Innovation, Entrepreneurship und Digitalisierung
Series edited by Mario Schaarschmidt, Institut für Management, Universität Koblenz-Landau, Koblenz, Germany, and Harald von Korflesch, Institut für Management, Universität Koblenz-Landau, Koblenz, Germany
The core of this series is the empirical and practice-oriented examination of the interplay of innovation, entrepreneurship and digitalization in its various forms. This includes topics such as business model innovation, social media and technology management, as well as newer topic areas such as the sharing economy. A particular focus lies on the changes brought about by digitalization. The aim of the series is to present, in collected form, innovative research results that lead to new scientific insights. Both national and international scholarly works are published. The series Innovation, Entrepreneurship und Digitalisierung is edited by Mario Schaarschmidt and Harald von Korflesch.
More volumes in this series: http://www.springer.com/series/16138
Dirk Homscheid, Baden-Württemberg, Germany. Dissertation approved for the award of the academic degree of Doctor of Social and Economic Sciences (Dr. rer. pol.), Department 4: Computer Science, Universität Koblenz-Landau, Koblenz, Germany, 2020.
ISSN 2524-5783 ISSN 2524-5791 (electronic) Innovation, Entrepreneurship und Digitalisierung ISBN 978-3-658-31477-4 ISBN 978-3-658-31478-1 (eBook) https://doi.org/10.1007/978-3-658-31478-1 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2020 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Responsible Editor: Carina Reibold This Springer Gabler imprint is published by the registered company Springer Fachmedien Wiesbaden GmbH part of Springer Nature. The registered company address is: Abraham-Lincoln-Str. 46, 65189 Wiesbaden, Germany
To my Family
Foreword
Since early efforts in understanding open collaborative communities, researchers have placed a premium on understanding the functioning of open source software (OSS) communities. This interest mainly arises from its economic peculiarities: Parties submit their resources (mostly time and human capital) to a project that is freely available to anyone. While this form of private-collective innovation is interesting per se, it becomes even more intriguing when firms with economic interests start deploying resources to OSS projects. To this end, Dirk Homscheid investigates the behavior of firm representatives in OSS projects by drawing on the example of Linux kernel development. Across three studies, Dirk Homscheid shows that developers who act on behalf of a company differ significantly from average developers in aspects such as network centrality, social capital, or opinion leadership. To arrive at meaningful conclusions, Dirk Homscheid uses a plethora of different methods and approaches such as social network analysis, survey, and regression analysis. The results therefore provide meaningful hints for firms on how to engage with open communities that complement their own research and development activities.
The focal thesis was conducted at University of Koblenz-Landau, Institute for Management, in close collaboration with the Institute for Web Science and Technologies (WeST). The thesis therefore nicely mirrors the multiple research endeavors at these institutes at the intersection of digital technologies, business strategies, and socio-economic change.
Personally, I have to thank Dirk Homscheid for a very fruitful and productive period of close collaboration. He was always open to new suggestions and worked hard to obtain the necessary data. It is my pleasure to honor his achievements in this foreword. For your career as well as for your personal life, dear Dirk, I wish you only the best.
Koblenz, Germany
Mario Schaarschmidt
Acknowledgment
I would like to thank JProf. Dr. Mario Schaarschmidt from the Institute for Management for the opportunity to do a doctorate under his supervision and for the numerous thought-provoking impulses he has given me. Additionally, I am very grateful for the financial support through the PhD scholarship offered by Prof. Dr. Steffen Staab from the Institute for Web Science and Technologies.
I discussed my research with many people while working on my PhD thesis, and I would like to thank all of them for their valuable feedback and support. I would particularly like to mention the researchers from the Institute for Web Science and Technologies, Koblenz, especially Dr. Jerome Kunegis, and the researchers from GESIS, Cologne, for sharing their thoughts with me. Moreover, I would like to thank the faculty of the 2015 European Conference on Information Systems Doctoral Consortium, particularly Prof. Dr. Jan Recker and Prof. Dr. Marijn Janssen, for their targeted feedback on my PhD research. The same applies to the participants of the 2015 Developmental Workshop on Open Research and Practice in Information Systems (AIS SIGOPEN), in particular Prof. Dr. Joseph Feller and Prof. Dr. Matt Germonprez.
During my dissertation I was greatly supported by my family. I would like to take this opportunity to warmly thank my parents, my parents-in-law, my sister and especially my wife for their support and patience.
Villingen-Schwenningen, Germany
Dirk Homscheid
Contents
1 Introduction
  1.1 Motivation
  1.2 Research Question and Dissertation Goal
  1.3 Coherence of the Dissertation Studies
  1.4 Structure and Outline
2 The Social Capital View
  2.1 The Evolution of the Social Capital Concept
  2.2 Considering Social Capital Theorists
    2.2.1 Pierre Bourdieu
    2.2.2 James Samuel Coleman
    2.2.3 Robert David Putnam
    2.2.4 Janine Nahapiet and Sumantra Ghoshal
    2.2.5 Nan Lin
  2.3 Defining Social Capital
  2.4 Social Capital Research Today
  2.5 Characteristics of Social Capital compared to other Forms of Capital
  2.6 Drawbacks of Social Capital
  2.7 Social Capital in the Context of Organizations
  2.8 Summary
3 Open Source Software and Firm Involvement
  3.1 The Open Source Software Phenomenon
    3.1.1 A Comparison of Open Source Software and Proprietary Software
    3.1.2 An Outline of the Open Source Movement
    3.1.3 The Understanding of Open Source Software at a Glance
  3.2 Exposing Open Source Communities and Open Source Software Developers
    3.2.1 Introducing Open Source Software Communities
    3.2.2 Motivation of Open Source Software Developers
    3.2.3 Motivation of Firms to get Involved in Open Source Software Communities
  3.3 Business Models related to Open Source Software
  3.4 Summary
4 The Linux Kernel Project
  4.1 History of and Facts about the Linux Kernel
  4.2 Governance of the Linux Kernel and the Linux Kernel Community
  4.3 Explaining the Linux Kernel Development Process
  4.4 Linux Kernel Project as Research Context
  4.5 Summary
5 Collection and Cleanup of Network and Source Code Data
  5.1 Linux Kernel Mailing List Data
    5.1.1 The Process of Data Crawling
    5.1.2 The Process of Data Cleaning
    5.1.3 The Process of Contributor Categorization
  5.2 Linux Kernel Source Code Data
6 Study I: Private-Collective Innovation and Open Source Software: Longitudinal Insights from Linux Kernel Development
  6.1 Introduction
  6.2 Theoretical Background
    6.2.1 Open Source Software Contributors
    6.2.2 Private-Collective Model of Innovation
  6.3 Method
    6.3.1 Data Collection and Coding of Contributor Categories
    6.3.2 Social Network Analysis
  6.4 Results
    6.4.1 Comparison of In-Degree and Out-Degree
    6.4.2 Comparison of Degree per Group
    6.4.3 Longitudinal Analysis
  6.5 Discussion and Conclusion
    6.5.1 Discussion
    6.5.2 Conclusion
    6.5.3 Implications for Research
    6.5.4 Limitations and Suggestions for Future Research
7 Study II: The Social Capital Effect on Value Contribution – Revealing Differences between Voluntary and Firm-Sponsored Open Source Software Developers
  7.1 Introduction
  7.2 Theoretical Background
    7.2.1 The Concept of Social Capital
    7.2.2 Open Source Software Communities
  7.3 Hypotheses Development
    7.3.1 Relations of Social Capital Dimensions to Each Other
    7.3.2 Relations of Social Capital Dimensions to Contribution
    7.3.3 Firm-Sponsorship as Moderator
  7.4 Research Design
    7.4.1 Setup of the Data to be Examined
    7.4.2 Operationalization of Variables
    7.4.3 Outlier Detection
    7.4.4 Validity Consideration
  7.5 Results
    7.5.1 Descriptive Information about Linux Kernel Mailing List Actors in 2014
    7.5.2 Descriptive Information about the Linux Kernel Source Code for 2014 and 2015
    7.5.3 Correlations and Regression Results
  7.6 Discussion, Conclusion and Implications
    7.6.1 Discussion and Conclusion
    7.6.2 Implications for Research
    7.6.3 Implications for Management
    7.6.4 Limitations of the Study
8 Study III: Social Capital and the Formation of Individual Characteristics – An Examination of Open Source Software Developers
  8.1 Introduction
  8.2 Hypotheses Development
  8.3 Research Design
    8.3.1 Utilized Constructs and Indicators
    8.3.2 Conception and Method of Research
    8.3.3 Conduct of the Survey
  8.4 Data Analysis
    8.4.1 Methods of Data Analysis
    8.4.2 Survey Data Preparation
  8.5 Results
    8.5.1 Descriptive Information about the Linux Kernel Survey Participants
    8.5.2 Confirmatory Factor Analysis
    8.5.3 Correlations and Regression Results
  8.6 Discussion, Conclusion and Implications
    8.6.1 Discussion and Conclusion
    8.6.2 Implications for Research
    8.6.3 Implications for Management
    8.6.4 Limitations of the Study
9 Summary, Conclusion and Outlook
  9.1 Theoretical and Empirical Contribution
  9.2 Implications for Research
  9.3 Implications for Management
Bibliography
Abbreviations
AVE: Average Variance Extracted
BSD: Berkeley Software Distribution
CFI: Comparative Fit Index
CMIN/DF: χ²/Degrees of Freedom
CR: Composite Reliability
DV: Dependent Variable
FLOSS: Free/Libre Open Source Software
FOSS: Free Open Source Software
FS: Free Software
FSF: Free Software Foundation
GFI: Goodness-of-Fit Index
GNU: GNU's Not Unix
GNU GPL: GNU General Public License
ICS: Inclusion of Community in the Self
IOS: Inclusion of Other in the Self
IV: Independent Variable
LK: Linux Kernel
LKML: Linux Kernel Mailing List
OS: Open Source
OSD: Open Source Definition
OSI: Open Source Initiative
OSS: Open Source Software
PCI: Private-Collective Innovation
RMSEA: Root Mean Square Error of Approximation
SCC AVG: Average Source Code Contribution (2014–2015)
SSCI: Social Sciences Citation Index
TLD: Top-Level Domain
VIF: Variance Inflation Factor
List of Figures
Figure 1.1 Coherence of the Dissertation Studies
Figure 2.1 Measures of Social Capital
Figure 2.2 The Social-Capital Proposition: Relative Effect of Social Capital
Figure 2.3 The Location-by-Position Proposition: Differential Advantages of Structural Bridges and Weaker Ties in a Hierarchical Structure
Figure 2.4 The Structural Contingency Proposition: Structural Constraints on Networking Effects
Figure 2.5 Lin's Model of the Social Capital Theory
Figure 2.6 Overview of Published Social Capital related Articles per Year
Figure 2.7 Overview of Citations related to Social Capital Articles per Year
Figure 2.8 Overview of Published Social Capital related Books per Year
Figure 2.9 Treemap of Research Fields utilizing Social Capital
Figure 4.1 Linux Kernel Governance Structure
Figure 4.2 Exemplary Linux Kernel Release Cycle
Figure 4.3 Linux Kernel Value Creation–Value Capture Matrix
Figure 5.1 LKML Thread List of a Month
Figure 5.2 Metadata and Content of a LKML Message
Figure 6.1 LK Contributors per Group and Year
Figure 6.2 Amount of LKML Messages sent per Group and Year
Figure 6.3 Comparison of In-Degree and Out-Degree
Figure 6.4 Out-Degree Distribution
Figure 6.5 In-Degree Distribution
Figure 6.6 Average Out-Degree
Figure 6.7 Average In-Degree
Figure 6.8 Gini Coefficient
Figure 7.1 Research Model
Figure 7.2 Simple Slopes: Interaction of Degree Centrality and Firm-Sponsorship
Figure 7.3 Simple Slopes: Interaction of Tie Strength and Firm-Sponsorship
Figure 7.4 Simple Slopes: Interaction of #Cross Lists and Firm-Sponsorship
Figure 8.1 Research Model
Figure 8.2 Confirmatory Factor Analysis – Measurement Model
Figure 8.3 Structural Equation Model – Total Model
List of Tables
Table 2.1 Overview of Social Capital Forerunners
Table 2.2 Theories of Capital
Table 2.3 Overview of Characteristics of Social Capital Dimensions
Table 2.4 Definitions of Social Capital
Table 3.1 Characteristics of Proprietary and Open Source Software
Table 3.2 Definitions of Online Community
Table 3.3 Overview of Motivational Drivers of OSS Contributors
Table 3.4 Business Models in the Field of OSS
Table 5.1 Coded LKML Contributor Categories
Table 5.2 Top 10 of the Most Common Domain Names (Time Frame 1996–2014)
Table 5.3 Top 10 of the Most Active Actors on the LKML (Time Frame 1996–2014)
Table 5.4 Descriptive Information about Categorized Domain Names and LKML Actors (Time Frame 1996–2014)
Table 5.5 Identified LKML Actors per Contributor Category and Year
Table 5.6 Amount of LKML Messages sent per Contributor Category and Year
Table 6.1 Comparison of different Aspects for the Private, Collective and Private-Collective Innovation Model
Table 6.2 LKML Contributor Categories
Table 7.1 LK Source Code Contribution Data: Means and Standard Deviations for the Years 2014 and 2015
Table 7.2 Top 10 LKML Actors of the Year 2014
Table 7.3 The 10 Most Active Organizations on the LKML of the Year 2014
Table 7.4 Distribution of Lines of Source Code per LK Part
Table 7.5 Distribution of the LK Source Code Commits 2014 and 2015 via the LK Versions
Table 7.6 Overview of the 10 Most Active LK Source Code Contributors of the Years 2014 and 2015
Table 7.7 Overview of the 10 Most Active LK Source Code Contributing Companies of the Years 2014 and 2015
Table 7.8 Operationalized Variables: Descriptive Statistics and Correlations
Table 7.9 Linear Regression Results: Hypotheses 1 to 3
Table 7.10 Linear Regression Results: Hypotheses 4 to 6
Table 7.11 Moderation Test Results: Moderation Hypotheses 7 to 9
Table 7.12 Linear Regression Result: Total Social Capital Model, DV Relational Capital (Tie Strength)
Table 7.13 Linear Regression Result: Total Model, DV Source Code Contribution (SCC AVG)
Table 7.14 Summary of Hypotheses Results
Table 8.1 Job Autonomy Items
Table 8.2 Opinion Leadership Items
Table 8.3 Perceived Own Reputation Items
Table 8.4 Thresholds of Local and Global Quality Measures
Table 8.5 Descriptive Information about the overall LK Contributor Sample
Table 8.6 Descriptive Information about the LK Contributor Sample separated by Firm-Sponsored LK Contributors and LK Hobbyists
Table 8.7 Additional Descriptive Information about the Subsample of Firm-Sponsored LK Contributors
Table 8.8 Overview of Cronbach's Alpha and Corrected Item-Total Correlations of the used Survey Constructs
Table 8.9 Global Quality Measures of the Measurement Model
Table 8.10 Local Quality Measures of the Measurement Model and Correlations from Multi-Item Measures
Table 8.11 Operationalized Variables: Descriptive Statistics and Correlations
Table 8.12 Linear Regression Results: Hypotheses 1 to 3
Table 8.13 Moderation Test Results: Moderation Hypotheses 4 to 6
Table 8.14 Structural Equation Model – Total Model
Table 8.15 Summary of Hypotheses Results
1 Introduction
1.1 Motivation
The open source software (OSS) market has developed rapidly in recent years. One of the factors contributing to this growth was the emergence of new and improved OSS for a wide range of applications. Current examples include the open source (OS) operating system for mobile devices called Android (Google 2012) and special OS server and cloud operating systems, such as OpenStack or Cloud Foundry. OSS projects that emerged out of personal needs some time ago and still have high relevance can also be included here, such as the Linux kernel (LK) project, which was initiated by Linus Torvalds in 1991 and is one of the most successful and largest collaborative OSS development projects ever started (Corbet & Kroah-Hartman 2017), and the Apache HTTP server project, started by Brian Behlendorf in 1995 (Lerner & Tirole 2002). The Apache HTTP server has remained the most popular web server for over 20 years and currently serves around 42% of all active websites (Netcraft 2018).
In comparison to the early beginnings of the OS movement in the 1980s, today's OSS communities consist not only of voluntary contributors¹ but of diverse contributor groups, including hobbyists, research institutions, universities and also companies (Schaarschmidt & Von Kortzfleisch 2015, Teigland et al. 2014). In contrast to proprietary software development, which usually happens within the boundaries of a company, the development of OSS is characterized by a decentralized organization, as the contributors participate from all over the world (Crowston et al. 2007, Perr et al. 2010). The participants manage tasks such as developing and improving the source code, debugging and documenting (MacCormack et al. 2006), and organize themselves by means of online communication technologies (e.g., forums, chats or mailing lists) (Bonaccorsi & Rossi 2003, Crowston & Howison 2006, Crowston et al. 2012, Kogut & Metiu 2001, O'Mahony & Ferraro 2007, Scacchi et al. 2006).
¹ In this dissertation the terms contributor, developer, actor and others are used synonymously to denote people who are active in OSS projects.
Although OSS has to be published under an OS license, which grants everyone the right to use, modify and distribute the human-readable source code, and thus shares the characteristics of a public good, namely non-excludability and non-rivalry (Wasko et al. 2009), the field of OSS is also attractive for the involvement of companies. Companies utilizing OSS and generating revenue with their activities have established business models that comply with the license terms and conditions existing around OSS (Perr et al. 2010). A main reason for firms to open outwards and to get involved in OSS communities is to appropriate value from OSS by providing complementary products and services (e.g., Andersen-Gott et al. 2012). Furthermore, firms can gain technological as well as innovation advantages and complement their own resource base (Dahlander & Magnusson 2008, Grand et al. 2004). To optimize the benefits that arise from making use of OSS communities, firms can deploy their own resources to an OSS community and thus attempt to influence project work. Through this approach firms can, for example, try to influence the trajectory of the OSS project they are involved in (Schaarschmidt et al. 2013, West & O'Mahony 2008).
In correspondence with the involvement of companies in the area of OSS, a profitable market has developed. How valuable and successful OSS can be is evident in the many fields of application of the LK, where companies have understood how to combine the OS kernel with complementary products and services and to successfully place these on the market.
For instance, the LK provides basic services for the OS mobile operating system Android, which is intensely supported by Google. In 2016 Android captured a smartphone operating system market share of 84.80% (International Data Corporation (IDC) 2018). The LK also powers a range of embedded devices (62% market share (Corbet & Kroah-Hartman 2017)) such as routers, smart TVs or GPS navigation devices. In addition, most of the world's supercomputers (99% market share (Corbet & Kroah-Hartman 2017)) run a specific Linux distribution as an operating system, and a majority of public cloud providers utilize Linux as their operating system: 90% of public cloud workload is processed by Linux systems (Corbet & Kroah-Hartman 2017). Thus, it is not surprising that, according to a report by Research and Markets, the global OS services market is expected to grow from USD 11.40 billion in 2017 to USD 32.95 billion by 2022 (Research and Markets 2018).
The importance of OSS companies, their employees and their enormous knowledge about the respective OSS is also underlined by recent acquisitions of OSS companies. The Chinese Internet group Alibaba acquired the German start-up Data Artisans for EUR 90 million. Data Artisans has developed a software package for processing large amounts of data based on the OS stream processing framework Apache Flink (Kerkmann 2019). Furthermore, the software technology company IBM recently acquired Red Hat, the world market leader in the area of OS, for about USD 34 billion (Kerkmann 2019).
1.2 Research Question and Dissertation Goal
In recent decades, the field of OSS has repeatedly attracted the attention of researchers and managers (Crowston et al. 2012, Lerner & Tirole 2002). Research topics related to OSS have covered, for example, the following aspects:
• the specific features of OSS that set it apart from proprietary software, for example, OS licenses (e.g., Crowston et al. 2012, Feller & Fitzgerald 2002, German & González-Barahona 2009, Stallman & Gay 2002) or the characteristic of a public good with its elements of non-excludability and non-rivalry (e.g., Alexy & Reitzig 2013, Erickson 2018, Stürmer et al. 2009, von Hippel & von Krogh 2003, 2006, Wasko et al. 2009),
• the organizational structures of OSS projects including their governance models (e.g., Bonaccorsi & Rossi 2003, Crowston & Howison 2006, Crowston et al. 2012, de Laat 2005, Kogut & Metiu 2001, Lattemann & Stieglitz 2005, Lynn et al. 2001, Markus 2007, Mockus et al. 2002, Moon & Sproull 2010, O’Mahony & Ferraro 2007, Scacchi et al. 2006, Shah 2006),
• the motivation of voluntary OSS contributors to get involved in OSS projects (e.g., Alexy & Leitner 2011, Baytiyeh & Pfaffman 2010, Bitzer et al. 2007, David & Shapiro 2008, Ghosh 2005, Hars & Ou 2002, Hertel et al. 2003, Lakhani & von Hippel 2003, Lakhani & Wolf 2005, Spaeth et al. 2008, Stewart & Gosain 2006, Wu et al. 2007, Xu et al. 2009),
• the reasons for firms to get active in OSS development (e.g., Andersen-Gott et al. 2012, Bonaccorsi & Rossi 2006, Dahlander 2005, Dahlander & Magnusson 2005, 2006, Lerner & Tirole 2002, Riehle 2007, Ziegler et al. 2014),
• successful OSS business models (e.g., Bonaccorsi et al. 2006, Chesbrough & Appleyard 2007, Hall 2017, Krishnamurthy 2005, Lakka et al. 2011, Okoli & Nguyen 2015, Perr et al. 2010, Popp 2015, Riehle 2012, Watson et al. 2008),
• the relation between companies and OSS communities (e.g., Capra et al. 2011, Dahlander & Magnusson 2008, 2005, Krishnamurthy & Tripathi 2009, Teigland et al. 2014) and
• the business value of OSS (e.g., Chengalur-Smith et al. 2010, Morgan & Finnegan 2014, Ven & Mannaert 2008),
just to name a few of the many researched areas.
The plethora of OSS research and the corresponding studies have contributed to understanding the phenomenon of OSS and its implications for business, but they do not differentiate between contributor groups with respect to the motives for the contributors’ involvement in OSS projects. For instance, OSS contributors are often understood as a uniform or homogeneous group (e.g., Méndez-Durón & García 2009) with the same orientation and purpose for their involvement in the OSS community, which neglects the various motives and related interests that contributors pursue through their activity. In general, the motivational aspects of hobbyists (the voluntary contributors who generate added value for the OSS community in their spare time) and the drivers of firms to actively participate in OSS development have been researched very thoroughly over the past 15 years (e.g., Baytiyeh & Pfaffman 2010, Cai & Zhu 2016, Hars & Ou 2002, Lakhani & Wolf 2005, Lerner & Tirole 2002, Shah 2006, Wu et al. 2007, Xu et al. 2009). In addition, with the private-collective model of innovation, researchers provide an explanatory approach for the motivation behind the combined development of OSS by hobbyists and firms (Alexy & Reitzig 2013, Stürmer et al. 2009, von Hippel & von Krogh 2003, 2006, von Krogh 2008, Zaggl & Raasch 2015). In particular, the private-collective innovation model seeks to explain why firms privately invest resources to create artefacts that share the characteristics of non-rivalry and non-excludability (Alexy & Reitzig 2013, Erickson 2018, Zaggl & Raasch 2015).
The private-collective model also implicitly assumes that private and public investments in innovations are approximately equal. However, successful OSS projects receive more than 85% of their code from contributors who are paid by companies (Corbet & Kroah-Hartman 2017) and the majority of code is written between 9 a.m. and 5 p.m.—indicating that contributions are predominantly provided by firms (Riehle et al. 2014). Accordingly, among the contributors existing in OSS projects, firms constitute a major contributor group with a possible corresponding influence in the community and on the community work. The pertinent literature on user communities and governance in OSS maintains that a large proportion of influence individuals have in a community depends on their position in this community (e.g., Crowston & Howison 2006, Dahlander & O’Mahony 2011). This view is reflected by social capital theory, which posits
that strong relationships and network positions that are advantageous for accessing information or gaining credibility are valuable resources that affect different outcome variables, such as value creation (Nahapiet & Ghoshal 1998, Tsai & Ghoshal 1998) and various personal characteristics (Lin 2001, Nahapiet & Ghoshal 1998). The relation between network position in a community and different positive outcomes has been emphasized in various areas. For example, Chou & He (2011) were able to show that a developer’s social capital positively affects expertise integration, that is, individuals with high social capital synthesize project-related information for other community members. Relatedly, Wasko & Faraj (2005) found for electronic networks of practice (communities that share characteristics with OSS communities) that social capital is associated with knowledge contributions. Despite burgeoning research that has used social capital theory to investigate online communities, important aspects that pertain to the role of existing contributor groups (e.g., hobbyists and firm-sponsored developers) in the communities have not been addressed yet. Research has yet to show whether the associations between network position and positive outcomes predicted by social capital theory are independent of developers’ profession. As firms have been identified as having a major impact in OSS communities, the group of firm-sponsored contributors is treated as the lead contributor group in the further considerations of this dissertation. Against this background, this dissertation aims to extend research that has used social capital theory to investigate online communities by addressing the following central research questions:
1. How is the relation between an OSS contributor’s social capital and his² created value affected by firm-sponsorship?
2. How is the relation between an OSS contributor’s social capital and associated individual outcomes affected by firm-sponsorship?
Although it is known from previous studies that ‘progressing to the center’ of a project increases individuals’ (and therefore firms’) influence in a project (Dahlander & O’Mahony 2011), it is not known how firms can best tap the social capital inherent in an OSS community. Thus, this dissertation approaches the research questions in three steps. First, this research investigates the different contributor groups associated with public and increasing private interests interacting in an OSS development project (Study I) to create a sound starting position for the further studies of this dissertation with regard to the existing contributor groups in an OSS project. The study will contribute new knowledge about the structure of OSS contributors to the literature on OSS communities and provide empirical insights for research on the private-collective model of innovation. Second, this dissertation aims at synthesizing literature on social capital theory and more recent literature on OSS communities to arrive at a conceptual model of social capital and individuals’ value creation in OSS communities. Accordingly, it aims to replicate prior research that used social capital to predict diverse forms of outcome (e.g., Chou & He 2011) by using alternative operationalizations of the different social capital dimensions as well as of the forms of outcome (i.e., source code contribution (Study II) and three forms of personal characteristics: opinion leadership, perceived own reputation and job autonomy (Study III)). Third, this research aims at challenging social capital theory by including firm-sponsorship in the models to see whether it influences the relationship between social capital and forms of outcome. It thereby extends prior research by including the important role of the sponsorship a developer receives. With rare exceptions (e.g., Dahlander & Wallin 2006), little research has taken into account developers’ sponsorship, which influences their network position. Avenues for further theoretical and empirical research as well as suggestions for managers of firms actively participating in OSS communities on how to improve the position of their employees in the OSS community, in relation to their OSS engagement objectives, complement the contribution of this dissertation.

² For reasons of legibility, gender-specific differentiation such as ‘his/her’ or ‘he/she’ is omitted. All terms apply equally to both genders.
1.3 Coherence of the Dissertation Studies
As part of this dissertation, the doctoral candidate conducted three studies. These studies are presented in detail after the fundamentals part (i.e., introduction, theoretical foundations, research project, foundations of research data) and before the overall summary and conclusion of this dissertation (see Figure 1.1). All three studies share the same research context, the LK project, one of the most successful and largest collaborative OSS development projects ever started, and build thematically on each other. However, each study is self-contained, which implies that theoretical foundations are partly repeated in the individual studies.

Figure 1.1 Coherence of the Dissertation Studies
[Diagram: Introduction; Theoretical Foundations (Social Capital Theory, OSS Communities); Research Project and Research Data Foundations (Linux Kernel Project, Network and Source Code Data); within the research context of the Linux Kernel project, three studies thematically building on each other: Study I (descriptive research; foundation: social network analysis; data source: LKML 1996-2014), Study II (causal research; foundation: social capital theory; data sources: LKML 2014, LKSC 2014/2015) and Study III (causal research; foundation: social capital theory; data sources: LKML 2014, LKCS 2015/2016); Summary and Conclusion. LKML: Linux Kernel Mailing List; LKSC: Linux Kernel Source Code; LKCS: Linux Kernel Contributor Survey.]

Study I of the dissertation, titled ‘Private-Collective Innovation and Open Source Software: Longitudinal Insights from Linux Kernel Development’, is descriptive research in which social network analysis forms the basis of the evaluations. The aim of this study is to investigate how different contributor groups associated with public and increasing private interests interact in an OSS development project. In order to study the interplay of both interest groups, not only the demographic characteristics of the community need to be considered but also the structural patterns of interactions in it. To achieve this goal, developers active in the LK development community are analyzed from a social network point of view, as the interaction between the members of a software development community reflects the structure of their collaboration. The data source for this study is the Linux kernel mailing list (LKML) with data from the years 1996 to 2014. Study II and Study III are causal research, in which social capital theory forms the theoretical basis for the established research models and hypotheses. The aim of Study II, called ‘The Social Capital Effect on Value Contribution – Revealing Differences between Voluntary and Firm-Sponsored Open Source Software Developers’,
is, on the one hand, to investigate the relation between an OSS contributor’s social capital and his created value in the form of source code contribution. On the other hand, it addresses the question of whether firm-sponsorship of OSS contributors has an influence on the aforementioned relationship. The research model of this study is developed on the basis of the conceptual social capital model of Nahapiet & Ghoshal (1998) and Tsai & Ghoshal (1998). The data sources from which the required metrics are derived are the LKML with data from the year 2014 and LK source code contribution data from the years 2014 and 2015. The aim of Study III, named ‘Social Capital and the Formation of Individual Characteristics – An Examination of Open Source Software Developers’, is, on the one hand, to investigate the relations between an OSS contributor’s social capital and associated individual outcomes, namely opinion leadership, perceived own reputation and job autonomy. On the other hand, it addresses the question of whether firm-sponsorship of OSS contributors has an influence on the aforementioned relationships. The research model of this study is developed on the basis of the adapted conceptual social capital model of Nahapiet & Ghoshal (1998) and the network-related social capital theory of Lin (2001). The data sources used to derive the required variables are the LKML with data from the year 2014 and data from a survey conducted among LK source code contributors in December 2015 and January 2016.
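To make the network perspective shared by the three studies more tangible, the following minimal sketch shows how mailing-list replies could be turned into a contributor network and a simple position measure. It is an illustration under stated assumptions and not the procedure used in this dissertation: the toy data in MESSAGES, the function names build_reply_network and degree_centrality, and the choice of normalised degree centrality as a proxy for network position are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (message id, sender, id of the message replied to or None).
MESSAGES = [
    ("m1", "alice", None),
    ("m2", "bob", "m1"),
    ("m3", "carol", "m1"),
    ("m4", "alice", "m2"),
    ("m5", "dave", "m3"),
]


def build_reply_network(messages):
    """Link every sender to the people whose messages they replied to (undirected)."""
    sender_of = {msg_id: sender for msg_id, sender, _ in messages}
    neighbours = defaultdict(set)
    for _, sender, parent_id in messages:
        neighbours[sender]  # ensure that senders without any reply still appear as nodes
        parent_sender = sender_of.get(parent_id)
        if parent_sender is None or parent_sender == sender:
            continue  # thread starter, unknown parent message, or self-reply
        neighbours[sender].add(parent_sender)
        neighbours[parent_sender].add(sender)
    return dict(neighbours)


def degree_centrality(neighbours):
    """Number of distinct contacts divided by the maximum possible number (n - 1)."""
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(contacts) / (n - 1) for node, contacts in neighbours.items()}


network = build_reply_network(MESSAGES)
centrality = degree_centrality(network)
```

In this toy network, alice and carol each exchange replies with two of the three other participants, so they receive a higher centrality value than bob and dave. A real analysis would additionally have to resolve multiple e-mail addresses per person and could compare such values between voluntary and firm-sponsored contributors.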
1.4 Structure and Outline
This chapter discusses the motivation for addressing the topic of voluntary and firm-sponsored contributors active in OSS communities from a social capital point of view. In addition, existing research gaps are identified and the research questions of this dissertation are specified accordingly. The chapter concludes with an explanation of the coherence between the three studies conducted and a description of the structure and outline of this doctoral thesis. To provide a comprehensive understanding of the social capital concept and of social capital theory, Chapter 2, titled ‘The Social Capital View’, gives an overview of the evolution of social capital, clarifies the influences on the concept by explaining the views of the most influential theorists and presents a selection of essential social capital definitions in order to find an appropriate definition for this doctoral thesis. As the research stream on social capital has broadened over recent years (Adler & Kwon 2002), it is vital to assess how social capital research has recently developed. Moreover, the chapter discusses whether social capital is really a form of capital and presents drawbacks of social capital. To show how social capital research in the organizational context has evolved, theoretical and empirical research on this topic is also evaluated.
Chapter 3, titled ‘Open Source Software and Firm Involvement’, gives a detailed insight into the field of OSS. First, the most important differences between proprietary software and OSS are delineated. For the context of this doctoral thesis it is vital to understand the motives of the OS movement as well as the purpose and the details of OSS licenses, which are clarified accordingly. Second, light is shed on the characteristics of OSS communities as well as the predominant drivers of voluntary OSS developers and firms involved in OSS communities. Third, business models related to OSS are discussed to show how firms can make money with OSS-related products and services in compliance with OSS licensing terms. Since the LK project forms the research context of all studies of this dissertation, the project is presented in more detail in Chapter 4. First, the history and some relevant facts about the LK are delineated. Second, the governance of the LK project and the LK community are introduced. Third, the LK development process is explained. Finally, a justification is given as to why the LK project is a suitable research context for all three studies of this dissertation. As the research studies of this doctoral thesis share, to a certain extent, the same data sources, namely the LKML and source code contribution information from the LK version control system Git, the processes of data crawling and data cleaning are described in Chapter 5, ‘Collection and Cleanup of Network and Source Code Data’. Chapters 6 to 8 deal with the three studies of this dissertation. Chapter 6 is about Study I, titled ‘Private-Collective Innovation and Open Source Software: Longitudinal Insights from Linux Kernel Development’. In the context of the study, the theoretical background is discussed and the research method of the study as well as the results are presented in detail. The chapter closes with a discussion and conclusion.
Study II, called ‘The Social Capital Effect on Value Contribution—Revealing Differences between Voluntary and Firm-Sponsored Open Source Software Developers’, is delineated in Chapter 7. In the context of the study, the theoretical foundations are discussed and the hypotheses derived therefrom. Furthermore, the research design of the study and the results are depicted in detail. The chapter concludes with a discussion and conclusion. Chapter 8 deals with Study III, named ‘Social Capital and the Formation of Individual Characteristics—An Examination of Open Source Software Developers’. In the context of the study, the theoretical foundations are discussed and the hypotheses derived therefrom. Further, the research design of the study and the results are depicted in detail. The chapter concludes with a discussion and conclusion. As a last point, Chapter 9 summarizes the findings of the three studies conducted in this dissertation and gives implications for research and management.
2 The Social Capital View
“Whereas economic capital is in people’s bank accounts and human capital is in their hands, social capital inheres in the structure of their relationships” (Portes 1998, p. 7)
OSS projects are formed by groups of heterogeneous people, mostly spread all over the world, who thus constitute a distributed workforce that organizes itself by means of online communication technologies (e.g., forums, chats or mailing lists) (Bonaccorsi & Rossi 2003, Crowston & Howison 2006, Crowston et al. 2012, Kogut & Metiu 2001, O’Mahony & Ferraro 2007, Scacchi et al. 2006). As such OSS projects grow and more and more contributors join the project team, the extent of social interaction also increases. This social interaction takes different forms, for example, communication among the members or the coordination of work packages (Dahlander & O’Mahony 2011, Lee & Cole 2003). In addition, relationships among the contributors form over time through this social interaction (Bergquist & Ljungberg 2001). From a theoretical perspective, social interaction and relationships among community members, regardless of whether the community is physical or online, can be seen through the lens of the social capital concept, as people in a community get involved with each other to obtain benefits from the interactions and relationships with their peers. This circumstance is captured by the general definition of social capital, which reads “investment in social relations with expected returns” (Lin 1999a, p. 30). This definition developed by Lin (1999a) is consistent with the considerations of various researchers who have contributed to the topic of social capital (e.g., Bourdieu 1980, 1983, 1986, Burt 1992, Coleman 1988, 1990, Portes 1998, Putnam et al. 1993, Putnam 1995a). Moreover, Nahapiet & Ghoshal (1998) make the link between communities or networks and social capital clearer, as they define social capital as “the sum of the actual and potential resources embedded within, available through, and derived from the network of relationships possessed by an individual or social unit. Social capital thus comprises both the network and the assets that may be mobilized through that network” (Nahapiet & Ghoshal 1998, p. 243).
To provide a comprehensive understanding of the social capital concept and of social capital theory, this chapter gives an overview of the evolution of social capital, clarifies the influences on the concept by explaining the views of the most influential theorists and presents a selection of essential social capital definitions in order to find an appropriate definition for this doctoral thesis. As the research stream on social capital has broadened over recent years (Adler & Kwon 2002), it is vital to assess how social capital research has recently developed. Moreover, the chapter discusses whether social capital is really a form of capital and presents drawbacks of social capital. To show how social capital research in the organizational context has evolved, theoretical and empirical research on this topic is also evaluated.
2.1 The Evolution of the Social Capital Concept
To provide a first general understanding of the social capital topic, the following section gives a brief chronological overview of the historical evolution of the main social capital considerations. The term social capital was used for the first time by John Dewey (1900), an American philosopher and educator, in his monograph “The School and Society”. In his remarks about the relationship between social change and elementary education, Dewey mentioned social capital but offered no explanation for it. The first known use of the phrase social capital in a scientific publication was made in the year 1916 by Lyda J. Hanifan, who was a state supervisor of rural schools in Charleston, West Virginia (Putnam 2000). In his article “The Rural School Community Center” Hanifan (1916) described how a rural school community of West Virginia formed social capital within a year and used this to improve its social communal life. In relation to social capital Hanifan states: “In the use of the phrase social capital I make no reference to the usual acceptation of the term capital, except in a figurative sense. I do not refer to real estate, or to personal property or to cold cash, but rather to that in life which tends to make these tangible substances count for most in the daily lives of a people, namely, goodwill, fellowship, mutual sympathy and social intercourse among a group of individuals and families who make up a social unit, the rural community, whose logical center is the school. [...] The individual is helpless socially, if left entirely to himself. [...] If he may come into contact with his neighbor, and they with other neighbors, there will be an accumulation of social capital, which may immediately satisfy his social needs and which may bear a social potentiality sufficient to the substantial improvement of living conditions in the whole community. The community as a whole will benefit by the cooperation of all its parts, while the individual will find in his associations the advantages of the help, the sympathy, and the fellowship of his neighbors” (Hanifan 1916, p. 130 f.).
In the given quote Hanifan describes his understanding of social capital as opposed to material goods and considers it as a facilitator of individuals’ engagement in community life as well as for social cohesion. Hanifan (1920) published a revised version of his article as a chapter on social capital in his book “The Community Center” in the year 1920. Almost half a century after Hanifan’s publication, the subject of social capital was revived in the 1960s. Jane Jacobs (1961), an American-Canadian journalist and author, described in her book “The Death and Life of Great American Cities” an idea about the features and value of community and neighborhood solidarity in cities, which corresponds to the meaning of social capital, although she did not explicitly define the term. The American political scientist Robert H. Salisbury (1969) goes one step further and extends the notion of social capital to a vital element of interest group creation in his article “An Exchange Theory of Interest Groups” in the year 1969. A first substantial theoretical analysis and consideration of the social capital concept was elaborated by the French sociologist Pierre Bourdieu in the 1980s. A brief discussion of social capital entitled “Le Capital Social—Notes Provisoires” (Bourdieu 1980) was published in the French Social Science Research Proceedings in the year 1980. His thoughts did not get adequate attention in the academic world most likely due to the publication in French (Portes 1998). In 1986, a more detailed elaboration appeared on the subject of three different types of capital, namely: economic, cultural and social capital. This time Bourdieu’s article was written in English, but unfortunately published in a handbook on the sociology of education, limiting its visibility (Bourdieu 1986, Portes 1998). 
At about the same time as Bourdieu, the American economist Glenn Loury (1977, 1981) researched racial income inequalities and their policy implications, taking social capital into consideration. In Loury’s view, the exclusive focus of orthodox economic theories on individual human capital is not sufficient, as these theories are too individualistic (Loury 1977, Portes 1998). In 1988 the American sociologist James S. Coleman introduced a concept of social capital which was based on both sociological and economic disciplines and
relies on the work of Loury. By bringing together the two research fields Coleman offered a holistic consideration of “the role of social capital in the creation of human capital” (Coleman 1988, Portes 1998). He developed a concept of social capital, showed its significance for the appropriation of human capital and revealed means by which social capital can be generated. Through his publications (1988, 1990) Coleman brought the social capital topic forward and made it visible in American sociology (Portes 1998). In the 1990s, the theoretical and empirical consideration of social capital was further examined by numerous scholars. For example, the Canadian-American sociologist Barry Wellman and Scot Wortley (1990) researched empirically on the role of community ties for social support. Wayne E. Baker (1990), an American sociologist, investigated empirically the relation between corporations and investment banks. Also in the mid-1990s the World Bank started the “Social Capital Initiative” with funding from the government of Denmark with the purpose of investigating the impact of social capital on economic and societal development (World Bank 2011). The American political scientist Robert D. Putnam achieved with his empirical studies attention far beyond the academic field in the 1990s. He conceptualized and verified empirically his social capital approach in two major studies about political institutions and democracy in Italy (Putnam et al. 1993) and the investigation of declining engagement of American citizens in civic organizations, which has led to a weakening of the civil society in the period from 1950 to 1995 (Putnam 1995a, 2000). At the end of the 1990s the American-Chinese sociologist Nan Lin places networks into the center of his contemplation of the social capital concept (Lin 1999a). Moreover, he synthesized the relevant considerations of the social capital concept made so far and presented a social capital theory in the year 2001 (Lin 2001). 
In summary, the term social capital evolved within a century towards a theory influenced by various researchers from different scientific disciplines. The foundation for this progress was laid by Hanifan in the year 1916 with his detailed description of the importance of community involvement, and thus in turn social capital, for successful rural school development. Putnam notes in relation to the social capital approach of Hanifan: “Hanifan’s account of social capital anticipated virtually all the crucial elements in later interpretations, but his conceptual invention apparently attracted no notice from other social commentators and disappeared without a trace” (Putnam 2000, p. 19). A long time after Hanifan, Pierre Bourdieu takes on the role as a forerunner in detailing out a first theoretical social capital concept. However, the French thoughts have not had a significant influence and visibility at the time. Thus, mainly American economists, sociologists and political
scientists expedited social capital research and made the outcomes visible through social capital concepts and formulations of social capital theories. Table 2.1 summarizes the previously introduced social capital forerunners and gives an overview of their discipline, the research field connected with their consideration of social capital and whether the author has given a definition of social capital.

Table 2.1 Overview of Social Capital Forerunners

| Authors | Disciplines | Years | Research Fields | Social Capital Definitions |
|---|---|---|---|---|
| Dewey, John | American philosopher and educator | 1900 | Educational research | no |
| Hanifan, Lyda J. | State supervisor | 1916, 1920 | Urban sociology | yes |
| Jacobs, Jane | American-Canadian journalist and author | 1961 | Urban sociology | no |
| Salisbury, Robert H. | American political scientist | 1969 | Interest group research | no |
| Loury, Glenn | American economist | 1977 | Economics of education | yes |
| Bourdieu, Pierre | French sociologist | 1980, 1986 | Philosophy of the social sciences | yes |
| Coleman, James | American sociologist | 1988, 1990 | Sociology of education, Rational-choice sociology | yes |
| Wellman, Barry & Wortley, Scot | Canadian-American sociologist | 1990 | Community sociology | no |
| Baker, Wayne E. | American sociologist | 1990 | Sociology of organizations | yes |
| Putnam, Robert | American political scientist | 1993 | Study of democracy | yes |
| World Bank | Economic development | 1999 | Economic and societal development research | yes |
| Lin, Nan | American-Chinese sociologist | 1999a | Social networks | yes |
2.2 Considering Social Capital Theorists
The previous section has given a summarizing chronological overview of the most influential pioneers of the social capital concept. In the following, a classification of classical capital in contrast to the human, cultural and social capital terms and theorists is given first. Thereafter, as the thoughts of some of the pioneers mentioned in the previous section have had a significant influence on the further development of the social capital notion, the most important theorists and their understanding of the concept are explained in more detail. The first comments on the notion of capital can be attributed to Marx (Brewer 1984, Marx 1933/1849, Marx & McLellan 1995/1867). He conceptualized capital as a part of the surplus value captured by capitalists or the bourgeoisie. These two groups possess and control the means of production in the circulation of commodities and monies between the processes of production and consumption (Lin 1999a). Capital in this sense can, on the one hand, be understood as a product of a process, since it is part of the surplus value. On the other hand, capital can also be seen as an investment process, since the surplus value is produced and captured through this process. As investments are made by the dominant class and this class also captures the surplus value, the capital theory described by Marx is based on exploitative relations between the capitalists or bourgeoisie and the proletariat (Lin 1999a).

Table 2.2 Theories of Capital (Source: Lin (1999a, p. 30), reprinted by permission from INSNA)
Table 2.2 helps to build an understanding of the various capital terms by relating them to each other through a comparison of their main characteristics. Marx’s theory of capital is classified as the classical theory of capital, whereas considerations about human, cultural and social capital are named neo-capital theories (Lin 1999a). Human capital theory refers to capital as an investment in personal skills and knowledge with expected returns, such as higher earnings or economic value (Becker 1993/1964, Johnson 1960, Lin 1999a, Schultz 1961). Moreover, in the view of Bourdieu (Bourdieu 1990, Bourdieu & Passeron 1977, Lin 1999a), cultural capital is obtained through investments (e.g., in pedagogic actions of the reproduction process) by the ruling class in reproducing its set of symbols and meanings, which are misrecognized and internalized by the masses (i.e., the dominated people) as their own.
Social capital can be divided roughly into two facets depending on the level of analysis. On the individual level, social capital is seen as the outcome of investments in social networks through which individuals gain access to and utilize resources embedded in these networks; see, for example, the research of Burt (e.g., Burt 1992, 1997b, 1998), Coleman (e.g., Coleman 1988, 1990), Flap (e.g., Boxman et al. 1991, de Graaf & Flap 1988, Flap 1991, Sprengers et al. 1988, Völker & Flap 1999), Lin (e.g., Lin & Bian 1991, Lin & Dumin 1986, Lin et al. 1981) and Marsden (e.g., Campbell et al. 1986, Marsden & Hurlbert 1988). On the group level, which is partly related to the individual level, social capital refers to investments in mutual recognition and acknowledgment that foster the solidarity and reproduction of groups (e.g., Bourdieu 1980, 1986, Coleman 1988, 1990, Putnam et al. 1993, Putnam 1995a).
Research on social capital is of current interest in the social sciences. With this topicality, the risk that the term expands and the concept becomes diluted rises. In addition, the concept of social capital has its origin in various disciplines of the social sciences (e.g., sociology, political science) and is associated partly with the same or similar ideas and in some cases with different ones. It is therefore essential to first ascertain the most important approaches to the concept of social capital. These can be found in the writings of Pierre Bourdieu, James S. Coleman and Robert D. Putnam (Adler & Kwon 2002, Lin 1999a, Portes 1998). In the following, the central arguments of their works are discussed.
2.2.1 Pierre Bourdieu
The French sociologist Pierre Bourdieu (1930-2002) conducted research primarily in the field of social dynamics, in particular about the dynamics of power in society (e.g., Bourdieu 1977). His view of society, as the combination of social relations,
was influenced by Marx. Accordingly, Bourdieu states “what exists in the social world are relations – not interactions between agents or intersubjective ties between individuals, but objective relations which exist independently of individual consciousness and will” (Bourdieu & Wacquant 1992, p. 97). Bourdieu was the first scholar who distinguished himself by a profound theoretical consideration of the social capital concept (Portes 1998), and he introduced his understanding of cultural capital at the same time (Bourdieu 1980). He thus offers a differentiated view of capital that reaches far beyond material assets and connects it with the social world.
Bourdieu (1983) describes the social world as accumulated history. For him, a purely economic view is not sufficient to understand the composition and functioning of the social world, as economic theory has reduced all possible forms of exchange to business exchange, which mainly follows the rule of profit maximization. Business exchange is (economically) self-interested, and this reduction has implicitly defined all other forms of exchange as non-economic and thus disinterested (Bourdieu 1983, 1986). To gain a deeper understanding of the coherence of the social world, it is vital to see capital in all its possible forms. Moreover, the relationships between the different types of capital have to be understood, as the transformation of one type into another is associated with some kind of cost (Bourdieu 1986). For this reason Bourdieu differentiates the following three forms of capital:
“Economic capital is immediately and directly convertible into money and may be institutionalized in the forms of property rights” (Bourdieu 1986, p. 243).
“Cultural capital is convertible, on certain conditions, into economic capital and may be institutionalized in the forms of educational qualifications” (Bourdieu 1986, p. 243).
“Social capital is made up of social obligations (“connections”), which is convertible, in certain conditions, into economic capital and may be institutionalized in the forms of a title of nobility” (Bourdieu 1986, p. 243).
In the view of Bourdieu, economic capital (e.g., money, assets, property) forms the root of all other types of capital, as these can be transformed into economic capital. However, a transformation from one type of capital to another is often not possible at an equal rate of exchange. This is due to the different characteristics and anchorings of the various capital forms. Economic capital has a high rate of exchange, since it is the most liquid type of capital. In contrast, the exchange rate
of social capital is lower, as it is stickier: it is intangible and inherent in the relations between individuals¹ (Bourdieu 1986).
Bourdieu links social capital with the micro level, as he sees it as an individual resource that its owner can utilize, like the money in his bank account (i.e., economic capital) or the education acquired in school (i.e., cultural capital), to promote his personal goals. Moreover, the quality and volume of the social capital of individuals or groups of people are closely related to their amount of economic and cultural capital and are thus a component of the class and inequality structure of a society. An individual who has one type of capital is expected to also have the other forms. But if one form of capital is missing, the individual has little chance of compensating for this lack through a particular combination of the other types of capital (Bourdieu 1983, Roßteutscher et al. 2008). In this sense, Bourdieu sees social capital as a resource of individuals that unfolds its full effect through belonging to the corresponding (i.e., privileged) group or class of society (Bourdieu 1983). Thus, he defines social capital as
“the aggregate of the actual or potential resources which are linked to possession of a durable network of more or less institutionalized relationships of mutual acquaintance and recognition” (Bourdieu 1986, p. 248).
In addition, Bourdieu points out that the benefits individuals can derive from their social capital are a consequence of their participation with others. As an individual’s social network has to be built over time to become a source of benefits, the individual has to invest in it strategically beforehand. Furthermore, maintaining the acquired social capital requires a continuous effort of care to sustain useful and lasting relationships that can secure profits. The extent of an individual’s social capital is determined by two properties: first, the size of the network of relationships that can actually be mobilized, and second, the extent of the economic and cultural capital held by the members of this network (Bourdieu 1983). Exclusive clubs whose members have to fulfill certain requirements to be accepted (e.g., business or golf clubs) are examples of specific groups that utilize the accumulated collective and individual social capital to obtain, among other things, benefits from their membership. In this context Bourdieu states “the profits which accrue from membership in a group are the basis of the solidarity which makes them possible” (Bourdieu 1986, p. 249).
John Field, an American scholar, adds a critical note on Bourdieu’s social capital perspective, stating: “if there is a normative dimension to Bourdieu’s theory, then, it is presumably that social capital generally functions to mask the naked profit-seeking of its holders, and is therefore inimical with the open democratic society he espoused in his journalism and political activism” (Field 2008, p. 22).
¹ For a detailed explanation of social capital characteristics in contrast to other forms of capital, please see Section 2.5.
2.2.2 James Samuel Coleman
The American sociologist James S. Coleman (1926-1995) introduced his notion of social capital in his 1988 work “Social Capital in the Creation of Human Capital”, in which he also applies his approach to an analysis of high school dropouts. Human capital is described here as a person’s skills and capabilities. Social capital is understood by Coleman as a resource for action. In this sense, the concept offers an opportunity to connect the socio-structural context with rational choice theory (Coleman 1988, Roßteutscher et al. 2008).
Coleman's (1988) social capital considerations are based on a distinction between two paradigms that describe and explain social action. First, the general sociological paradigm tries to explain social action from the social context. Here, the values and norms internalized during socialization are taken into account as causes of action. Second, the rational choice paradigm, which is related to economics, is based on an individual who is independent of the environment. This individual is characterized as pursuing only its own interests, and the principle of action is the maximization of its own benefits. According to Coleman (1988), both paradigms can be connected with the aid of the concept of social capital. He understands social capital as a specific resource for action of individual or collective actors.
In his social theory Coleman (1990) pursues a macro-micro-macro model: he sees social structures (i.e., the macro level) as factors influencing the actions of individual members of society (i.e., the micro level), which in turn constitute the social structures (i.e., the macro level). At the micro level, the social relations between actors are to be understood as individual characteristics or resources, while at the macro level they are components of social structures. Coleman captures the dual nature of these relationships with the concept of social capital (Coleman 1990, Roßteutscher et al. 2008).
Coleman defines social capital itself in an open and rather general way and sees it as a function, anchoring it in the relations between individuals. He states: “social capital is defined by its function. It is not a single entity, but a variety of different entities having two characteristics in common: They all consist of some aspect of social structure, and they facilitate certain actions of individuals who are within the structure” (Coleman 1990, p. 302).
Extending the functional definition of social capital, Coleman indicates important manifestations of social capital. Social capital can occur in social relations in the form of obligations and expectations, information potential, norms and effective sanctions, as well as authority relations (Coleman 1990).
In contrast to Bourdieu, Coleman gives social capital a wider notion. He considers social capital not just as a resource that can be obtained by privileged groups but recognizes its value for all types of groups and communities, including the powerless and outsiders. In his understanding, a person does not own social capital; rather, it develops by undertaking action together with others, for example in communities or groups, and is available to a person like a resource that is based on trust and shared values (Coleman 1988, Gauntlett 2011). Moreover, social capital is a source of information as well as of norms and sanctions, which can lead to beneficial actions but can also be limiting (Coleman 1988).
Social capital is thereby formed by individuals who not only pursue their own interests but also look after others and devote themselves to welfare-relevant activities. This involvement is not characterized by an interest in rewards or by reciprocal expectations but by the belief in doing a good deed. Coleman sees the emergence of social capital as a side effect that results from the activities of the involved parties (Coleman 1988, Gauntlett 2011). Coleman expresses this as follows:
“[Social capital] is an important resource for individuals and may affect greatly their ability to act and their perceived quality of life. They have the capability of bringing it into being. Yet, because the benefits of actions that bring social capital into being are largely experienced by persons other than the actor, it is often not in his interest to bring it into being” (Coleman 1988, p. 118).
2.2.3 Robert David Putnam
Another central theorist on whom the social capital literature draws is the American political scientist Robert D. Putnam (born in 1941). He is one of the most important representatives of social capital theory, not only in the academic field but also beyond, reaching into the wider public sphere (Field 2008). His political advisory function, his relations to non-scientific journals and his choice of provocative headlines for his publications are certainly in large part responsible for Putnam’s prominence (Schechler 2002). Putnam made the social capital approach popular in political science through his works on the effect of social capital on political and economic performance in Italy and the USA. In two comprehensive studies (Putnam et al.
1993, Putnam 1995a) and additional related articles (e.g., Putnam 1993, 1995b, 2001) Putnam has conceptually developed the approach and applied it empirically.
In the first study, on northern and southern Italian political institutions, titled “Making Democracy Work: Civic Traditions in Modern Italy” (1993), Putnam identifies a close relationship between the efficiency of government action and the functioning of modern democracy on the one hand, and the existing amount of social capital in a society on the other. He underlines the importance of social capital and the quality of civic life for the structure of democratic societies (Putnam et al. 1993).
The second study resulted in the articles “Bowling Alone: America’s Declining Social Capital” (1995a) and “Tuning In, Tuning Out: The Strange Disappearance of Social Capital in America” (1995b) and the book “Bowling Alone: The Collapse and Revival of American Community” (2000). It reveals the decline of Americans’ social capital in the late twentieth century, expressed through declining engagement in traditional associations, organizations and networks.
Putnam relates social capital to the level of social connectedness and civic engagement in communities such as towns, cities and countries. Thus, he gives social capital a collective character and emphasizes this with the statement “working together is easier in a community blessed with a substantial stock of social capital” (Putnam et al. 1993, p. 35 f.). In this vein Putnam defines social capital as
“features of social organization such as networks, norms, and social trust that facilitate coordination and cooperation for mutual benefit” (Putnam 1995a, p. 67).
In the year 2000 Putnam followed with an extended description of social capital that highlights its collective character in more detail:
“Whereas physical capital refers to physical objects and human capital refers to the properties of individuals, social capital refers to connections among individuals – social networks and the norms of reciprocity and trustworthiness that arise from them. In that sense social capital is closely related to what some have called “civic virtue”. The difference is that “social capital” calls attention to the fact that civic virtue is most powerful when embedded in a dense network of reciprocal social relations. A society of many virtuous but isolated individuals is not necessarily rich in social capital” (Putnam 2000, p. 19).
Putnam (1993) sees in social capital a solution for typical problems of collective goods that occur on the societal level. Collective goods are commodities no one can be excluded from using, regardless of whether someone has contributed to their production or not. Rational self-interested individuals will try to benefit from such an
asset without contributing. If everyone thinks and acts in a self-interested way, the achievement of collective goals in society is endangered (e.g., Olsen 1967). These cooperation problems can be overcome with social capital, with positive consequences for the development of society (Putnam et al. 1993, Roßteutscher et al. 2008).
Moreover, Putnam (2000) details social capital as a construct of three connected elements: first, the density and reach of community life; second, the social trust that accrues in the community through volunteerism and active participation; and third, a resulting focus on community values and norms of reciprocity. These three crucial properties of social capital in turn form the foundation of an efficient and functioning democracy. In his publication in the year 2000, Putnam sees all three elements (community commitment, trust, and community-building values and norms) fading in American society at the end of the twentieth century (Putnam 2000, Roßteutscher et al. 2008).
To summarize the considerations of the three most important pioneers of social capital conceptualizations described above, the evolution of social capital is shaped by different research disciplines (predominantly sociology and political science) and by various levels of analysis, which consequently lead to different anchorings of social capital. Bourdieu considers social capital on the group level as a property of privileged classes. Coleman expanded the notion of social capital in comparison to Bourdieu and sees it as a resource for all types of groups, which is created by community activities without expectation of return. Putnam gives social capital a collective character and links it to the level of organizations and groups (Portes 2000, Wollebaek & Selle 2002) and to the level of society.
From the 1990s onward, more and more researchers from various disciplines used the social capital concept to investigate individual-, community- or society-related issues (e.g., Adler & Kwon 2002, Baker 1990, Boxman et al. 1991, Burt 1992, 1997a, b, Inglehart 1997, Portes 1998, Woolcock 1998, Woolcock & Narayan 2000). Since this doctoral research is anchored in the organizational and network-related field, the most significant remarks on social capital from an organizational and network-related point of view are examined in the following two sub-sections.
2.2.4 Janine Nahapiet and Sumantra Ghoshal
Janine Nahapiet and the Indian organizational theorist Sumantra Ghoshal (1948-2004) introduced a conceptual model in which social capital, divided into a structural, a relational and a cognitive dimension, is the prerequisite for building up and harnessing intellectual capital in organizations (Nahapiet & Ghoshal 1998). With the related Academy of Management Review article “Social Capital, Intellectual Capital, and the Organizational Advantage” (1998) and the proposed conceptualization of social capital, Nahapiet & Ghoshal have crucially influenced social capital research in the organizational domain. Their paper has been identified as the second most cited article in the fields of economics and management in the decade from 1998 to 2008 and as the fifth most influential strategic management article published in the 26 years up to 2009 (Oxford Handbooks Online 2009).
Nahapiet & Ghoshal (1998) argue that building up and harnessing intellectual capital requires that both the exchange and the combination of knowledge are guaranteed. Intellectual capital is defined as “the knowledge and knowing capability of a social collectivity, such as an organization, intellectual community, or professional practice. [...] Intellectual capital thus represents a valuable resource and a capability for action based in knowledge and knowing” (Nahapiet & Ghoshal 1998, p. 245). Social capital constitutes a fundamental requirement for ensuring the exchange and the combination of knowledge. In this context Nahapiet & Ghoshal (1998) draw on the understanding of social capital of Bourdieu (1986), Burt (1992) and Putnam (1995a), as these authors consider in their definitions of social capital not only the structure of relationship networks (as does, e.g., Baker 1990) but also the actual and potential resources that could be utilized through such networks. Thus, Nahapiet & Ghoshal define social capital as:
“the sum of the actual and potential resources embedded within, available through, and derived from the network of relationships possessed by an individual or social unit. Social capital thus comprises both the network and the assets that may be mobilized through that network” (Nahapiet & Ghoshal 1998, p. 243).
In more detail, Nahapiet & Ghoshal (1998) distinguish the three dimensions of social capital described hereafter (structural, relational and cognitive) when giving the conceptual explanation of the effect of social capital on the formation of intellectual capital. Table 2.3 gives an overview of the characteristics of the three social capital dimensions.
1. Structural Dimension of Social Capital: For the distinction between the structural and the relational dimension of social capital, Nahapiet & Ghoshal (1998) refer to the thoughts of Granovetter (1992b) on structural and relational embeddedness. In general, structural embeddedness deals with the characteristics of the social system as well as of the mesh of relations and can be understood as the impersonal configuration of relations between individuals or groups (Granovetter 1992b). Nahapiet & Ghoshal draw on these characteristics
Table 2.3 Overview of Characteristics of Social Capital Dimensions
1. Structural Dimension: Network ties; Network configuration; Appropriable organization
2. Relational Dimension: Trust; Norms; Obligations; Identification
3. Cognitive Dimension: Shared language and codes; Shared narratives
of structural embeddedness and understand the structural dimension of social capital as the overall pattern of relations between actors. Characteristics of the structural dimension are network ties (Scott 1991, Wasserman & Faust 1994), network configuration (Krackhardt 1994) or network morphology (Tichy et al. 1979) as well as appropriable organization (Nahapiet & Ghoshal 1998). These three characteristics of the structural dimension are explained in more detail in the following.
– Network ties are seen in social capital theory as an essential means of gaining access to resources and can thus lead, for example, to information benefits; in other words, “who you know affects what you know” (Nahapiet & Ghoshal 1998, p. 252). Moreover, social relations and network ties form information channels that can help to save time and investment when gathering information (Coleman 1988, Nahapiet & Ghoshal 1998).
– The overall configuration of network ties provides information about the whole network structure and is associated with characteristics like density, connectivity and hierarchy of the network (Nahapiet & Ghoshal 1998).
– Appropriable organization is related to networks that were created for a specific purpose but may be utilized for another purpose (Coleman 1988). This means that social capital in the form of ties, norms and trust built up in one situation can under specific circumstances be transferred to another situation and thereby have an effect on patterns of social exchange (Nahapiet & Ghoshal 1998). For example, trust can be transferred from the family context into work-related contexts (Fukuyama 1995b), or personal relationships can develop into business relationships (Coleman 1990).
2. Relational Dimension of Social Capital: For the explanation of the relational dimension of social capital, Nahapiet & Ghoshal (1998) draw on Granovetter’s description of relational embeddedness. Relational embeddedness refers to the personal relations individuals have formed with each other over time through repeated interactions (Granovetter 1992b). Nahapiet & Ghoshal (1998) further
identify elementary assets that are located in these personal relationships, which are trust and trustworthiness (Fukuyama 1995b, Putnam 1993), norms and sanctions (Coleman 1990, Putnam 1995a), obligations and expectations (Burt 1992, Coleman 1990, Granovetter 1985) as well as identity and identification (Håkansson & Snehota 1995, Merton 1968). These four assets of the relational dimension are explained in more detail below.
– Trust can be defined as the belief that the “results of somebody’s intended action will be appropriate from our point of view” (Misztal 1996, p. 9–10) and is an essential element in relationships to facilitate social exchange in general and cooperative interaction in particular (e.g., Fukuyama 1995b, Gambetta 1988, Nahapiet & Ghoshal 1998, Putnam 1993, 1995a, Ring & Van de Ven 1992, 1994, Tyler & Kramer 1996).
– Coleman (1990) sees norms as a kind of control mechanism in groups or societies, which exist “when the socially defined right to control an action is held not by the actor but by others” (Nahapiet & Ghoshal 1998, p. 255). In the process of interaction and exchange among individuals, norms set the degree of consensus in a social system and can facilitate openness as well as the motivation and willingness to participate in the exchange of knowledge (Nahapiet & Ghoshal 1998).
– Obligations are referred to as a commitment or duty that is connected with some kind of future activity (Nahapiet & Ghoshal 1998). Coleman (1990) sees obligations as a ‘credit slip’: the owner of the credit slip can expect some kind of activity by the debtor to redeem the slip. Nahapiet & Ghoshal (1998) state that obligations and expectations influence the access to knowledge and the motivation of individuals to exchange and combine it.
– Nahapiet & Ghoshal (1998) refer to identification as a process in which individuals see themselves as one with another individual or a group. Moreover, identification may increase the perceived opportunities for exchange and may further increase the frequency of cooperation (Lewicki & Bunker 1996, Nahapiet & Ghoshal 1998).
3. Cognitive Dimension of Social Capital: The cognitive dimension of social capital is linked to resources that provide shared representations, interpretations and systems of meaning among individuals or groups (Cicourel 1973, Nahapiet & Ghoshal 1998). In particular, these resources include shared language and codes (Arrow 1974, Cicourel 1973, Monteverde 1995) as well as shared narratives among the actors (Nahapiet & Ghoshal 1998, Orr 1990).
– Shared language and codes allow a common understanding and support the access to and exchange of information and knowledge.
– Shared narratives in the form of myths, stories and metaphors facilitate the creation, exchange and preservation of rich sets of meaning in communities.
All of these cognitive elements are essential, as a common language creates shared meaning, which in turn is essential for the diffusion of knowledge within a community (Nahapiet & Ghoshal 1998).
All three social capital dimensions are seen to have an influence on an organization’s ability to produce and share intellectual capital. Although the three dimensions and their individual characteristics were considered separately, some of them are known to be highly interrelated (Nahapiet & Ghoshal 1998). The conceptual model proposed by Nahapiet & Ghoshal (1998), and specifically its social capital conceptualization, has been widely applied or adapted in management and information systems research (e.g., Chang & Chuang 2011, Chou & He 2011, de Clercq et al. 2015, Hsu & Hung 2013, Inkpen & Tsang 2005, Tsai & Ghoshal 1998, Wasko & Faraj 2005).
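Two of the network-configuration characteristics named for the structural dimension, density and connectivity, can be made concrete with a small sketch. The following Python example is only an illustration under stated assumptions: the actor names and ties are hypothetical, ties are treated as undirected, and density is taken as the share of possible ties that are actually present, which is a common network-analysis definition and not one given by Nahapiet & Ghoshal (1998).

```python
from collections import defaultdict


def density(nodes, ties):
    """Share of possible undirected ties that are present (assumed definition)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # Ignore self-ties and count each undirected tie only once.
    unique_ties = {frozenset(t) for t in ties if t[0] != t[1]}
    return len(unique_ties) / (n * (n - 1) / 2)


def is_connected(nodes, ties):
    """True if every actor can be reached from every other actor via ties."""
    if not nodes:
        return True
    neighbours = defaultdict(set)
    for a, b in ties:
        neighbours[a].add(b)
        neighbours[b].add(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:  # depth-first traversal from the start actor
        current = stack.pop()
        for other in neighbours[current]:
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return seen == set(nodes)


# Hypothetical four-actor network; "dev" has no ties at all.
actors = ["ana", "ben", "chi", "dev"]
ties = [("ana", "ben"), ("ben", "chi"), ("chi", "ana")]

network_density = density(actors, ties)
network_connected = is_connected(actors, ties)
print(network_density, network_connected)
```

In this hypothetical network, three of the six possible ties exist, and the isolated actor makes the network disconnected. Hierarchy, the third characteristic mentioned, is not covered by this sketch.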
2.2.5 Nan Lin
The American-Chinese sociologist Nan Lin (born in 1938) draws in his considerations of social capital on the work of Granovetter (1973, 1974) and Burt (1992) and thus focuses on the causal relationship between the position of an individual in a network and the outcomes that arise from access to the resources of other network members (Lin 1999a). Building on the network context, Lin defines social capital concisely as
“the resources embedded in social networks accessed and used by actors for actions” (Lin 2001, p. 25).
Consequently, Lin bases his reflections on social capital on network theory and defines social structure as a “set of social units (positions) that possess differential amounts of one or more types of valued resources” (Lin 2001, p. 33). This set, which can be understood as the number of contacts of a limited social unit, consists of individuals who stand in different hierarchical positions relative to each other and contribute their personal tangible and intangible resources to this set of relationships. Because individuals occupy different positions in the network, and contacts are therefore hierarchically ordered, access to the resources available in the social network varies (Fuchs 2006, Lin 2001).
The hierarchy of contacts can be depicted as a pyramid in which the number of contacts an individual can reach decreases in the upward direction. Although an individual can reach fewer contacts upwardly, the reachable contacts occupy more valuable positions, which are equipped with more valuable resources (e.g., power, wealth or reputation). The access to and the exchange of these resources always take place via two interacting positions in the hierarchy (Fuchs 2006, Lin 2001). In this exchange context, Lin (2001) distinguishes between homophilous and heterophilous interactions. Homophilous interaction takes place between two actors who have similar resources, whereas interaction is called heterophilous when the actors are equipped with dissimilar resources.
Lin (2001) proposes three measures to quantify social capital in terms of social ties and the value of access to the resources of others. These three measures are listed below and depicted in Figure 2.1. They are relevant for understanding Lin’s social capital propositions, which are considered subsequently.
1. Upper Reachability: Upper reachability stands for the resources of the uppermost position an actor in a hierarchical social structure can reach through social ties.
2. Heterogeneity: Resource heterogeneity expresses the range of resources across vertical positions in a hierarchical structure that are accessible through social ties.
3. Extensity: Extensity indicates the number of positions, and the associated resources, reachable for an actor in a hierarchical structure through social ties.
Lin (2001) developed and explained his comprehensive understanding of a theory of social capital based on seven propositions, which are summarized in the following.
1. The Social-Capital Proposition: Lin’s primary and most important proposition for the theory of social capital reads “The success of action is positively associated with social capital” (Lin 2001, p. 61). This means that the success of an expressive (i.e., maintaining resources) or instrumental (i.e., gaining resources) action is determined by the access to and the use of superior social capital. An individual can perform a purposeful action by accessing superior social capital through another individual who holds or can obtain more highly valued resources (Lin 2001). Figure 2.2 illustrates this case. Ego 1 has a competitive advantage over ego 2, although both are located at nearly the same structural position and horizontal level. The advantage arises because ego 1 activates a social tie to alter 1, who is located at a relatively higher position than alter 2, whose tie is accessed by ego 2. Although alter 1 and alter 2 are both on the same vertical axis, their degree of valued resources differs.
Figure 2.1 Measures of Social Capital. The pyramid represents the hierarchical character of the social structure; structural positions run from low to high, and the figure marks upper reachability, heterogeneity and extensity. (Source: Lin (2001, p. 62), reprinted by permission from Cambridge University Press through PLSclear)
2. The Strength-of-Position Proposition: The strength-of-position proposition is related to structural advantages, which arise from the hierarchical positioning of individuals in the network. Lin (2001) proposes a positive relation between the position of an individual in a social structure and his level of available resources. An individual with a relatively high position in the social structure can obtain more highly valued resources and social capital, respectively, for example by harnessing social ties with better resources, and will accordingly have an advantage over others. The proposition is phrased as “The better the position of origin, the more likely the actor will access and use better social capital” (Lin 2001, p. 64).
3. The Strength-of-Strong-Tie Proposition: The strength-of-strong-tie proposition is based on the structural principle. Obtainable resources are positively linked to social ties to those peers in a social structure with whom an individual shares stronger sentiment and trust. Lin (2001) links this fact to social capital in the third proposition as “The stronger the tie, the more likely that the social capital accessed will positively affect the success of expressive action” (Lin 2001, p. 65).
4. The Strength-of-Weak-Tie Proposition: For the strength-of-weak-tie proposition, Lin (2001) draws on the work of Granovetter (1973, 1974) and links characteristics of weak ties to the context of social capital. In comparison to strong ties, weak ties are accompanied by less interaction as well as less sentiment and are linked with more varied resources. This is due to less frequent contact as well as less intimacy and intensity, and it follows the principles of heterogeneity and upper reachability. The proposition states: “The weaker the tie, the more likely ego will have access to better social capital for instrumental action” (Lin 2001, p. 67).
5. The Strength-of-Location Proposition: The strength-of-location proposition reads “The closer individuals are to a bridge in a network, the better social capital they will access for instrumental action” (Lin 2001, p. 69). A social bridge is here understood as “the sole link between two groups of actors” (Lin
Figure 2.2 The Social-Capital Proposition: Relative Effect of Social Capital. The pyramid represents the hierarchical character of the social structure, with structural positions running from low to high; it shows ego 1 (e1), ego 2 (e2), alter 1 (a1) and alter 2 (a2). (Source: Lin (2001, p. 61), reprinted by permission from Cambridge University Press through PLSclear)
2001, p. 70). Through social bridges, resources in both groups can be accessed. This means that the closer an individual is to a broker who connects two otherwise unrelated groups, the more valuable are the resources, being different from those in his own group, that he can access with the aid of the social bridge.
6. The Location-by-Position Proposition: The location-by-position proposition can be seen as a combination of the second and the fifth proposition. Figure 2.3 illustrates this relationship. The vertical axis displays the hierarchy of a structure, where higher positions are related to better resources. Ego occupies the position of a broker, and his relations to A and B form two social bridges. The relation of ego to A offers more benefit to the members of ego’s group than his social tie to B. This is due to the principle of hierarchy and related resources. A’s group is located relatively higher than ego’s and B’s groups
Figure 2.3 The Location-by-Position Proposition: Differential Advantages of Structural Bridges and Weaker Ties in a Hierarchical Structure. The pyramid represents the hierarchical character of the social structure, with a hierarchical axis running from low to high; it shows A, ego and B. (Source: Lin (2001, p. 72), reprinted by permission from Cambridge University Press through PLSclear)
and thus provides access to better resources for ego’s group than B’s group does. Following this situation, the location-by-position proposition is phrased as “The strength of a location (in proximity to a bridge), for instrumental action, is contingent on the resource differential across the bridge” (Lin 2001, p. 71). The social ties of ego to both groups extend resource heterogeneity for the peers in ego’s group, and the connection to A’s group also extends the upper reachability for the members of ego’s group.
7. The Structural Contingency Proposition: The structural contingency proposition outlines the limitation of specific structural positions in relation to the accessibility of better resources and reads “Networking (tie and location) effects are constrained by the hierarchical structure for actors located near or at the top and bottom of the hierarchy” (Lin 2001, p. 74). Figure 2.4 illustrates the structural constraints for specific hierarchical positions. Ego 1 is located near the ceiling of the hierarchical structure and has limited options to access social ties and to reach for better resources at a higher vertical position. Ego 3 with a
Figure 2.4 The Structural Contingency Proposition: Structural Constraints on Networking Effects. The pyramid represents the hierarchical character of the social structure, with structural positions running from low to high; it shows ego 1 (e1), ego 2 (e2) and ego 3 (e3). (Source: Lin (2001, p. 74), reprinted by permission from Cambridge University Press through PLSclear)
position near the bottom of the hierarchy is structurally limited in its chances to access social ties vertically in either direction. In comparison to ego 1 and ego 3, ego 2 has the advantage of being able to expand upward, and thus the possibility to access better resources, owing to its position in the middle range of the hierarchy.
The aforementioned considerations of Lin result in a model of social capital. Lin assumes that three factors, namely structural position, network location and purpose of action, influence the composition and level of an individual’s social capital. The social capital model of Lin, which is displayed in Figure 2.5, shows that the structural position has a direct effect on the production of social capital as well as an indirect effect through the network location of an actor in a specific network structure. The purpose of action (instrumental or expressive) also influences the development of social capital indirectly (Fuchs 2006, Lin 2001). Depending on the characteristics of upper reachability, heterogeneity and extensity in a specific network, the structural position, the network location and the purpose of action influence the amount of social capital, which in turn determines the return on social capital in the form of wealth, power or reputation (Fuchs 2006, Lin 2001).
Figure 2.5 Lin’s Model of the Social Capital Theory. Elements of the model: Structural Position (pyramidal hierarchy); Network Location (tie strength and bridging); Purpose of Action (instrumental or expressive); Social Capital (upper reachability, heterogeneity and extensity of embedded resources); Return (wealth, power, reputation). (Source: Lin (2001, p. 76), reprinted by permission from Cambridge University Press through PLSclear)
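The three measures that enter Lin’s model (upper reachability, heterogeneity and extensity) can be made concrete with a small worked example. The following sketch is an illustration under stated assumptions only: the actors, their ties and the numeric position ranks are invented, and heterogeneity is read here as the distance between the highest and the lowest reachable position, which is one possible operationalization of “range” and not a formula taken from Lin (2001).

```python
# Illustrative sketch only: the toy network, the numeric position ranks and the
# range-based reading of heterogeneity are assumptions, not data from Lin (2001).

def social_capital_measures(ego, ties, position_rank):
    """Return upper reachability, heterogeneity and extensity for one actor.

    ties: dict mapping an actor to the set of actors reachable through social ties
    position_rank: dict mapping an actor to a numeric rank in the hierarchy
                   (a higher rank stands for more valued resources)
    """
    # Distinct hierarchical positions that ego can reach through its ties.
    reachable_ranks = {position_rank[alter] for alter in ties.get(ego, set())}
    if not reachable_ranks:
        return {"upper_reachability": None, "heterogeneity": 0, "extensity": 0}
    return {
        # Highest position reachable through a social tie.
        "upper_reachability": max(reachable_ranks),
        # Range between the highest and lowest reachable position.
        "heterogeneity": max(reachable_ranks) - min(reachable_ranks),
        # Number of different positions reachable.
        "extensity": len(reachable_ranks),
    }


# Hypothetical actors: ego sits at rank 2 and has ties to four alters.
position_rank = {"ego": 2, "a": 5, "b": 3, "c": 3, "d": 1}
ties = {"ego": {"a", "b", "c", "d"}}

measures = social_capital_measures("ego", ties, position_rank)
print(measures)
```

In this toy network, ego reaches three distinct positions (ranks 1, 3 and 5), so the sketch reports an upper reachability of 5, a heterogeneity of 4 and an extensity of 3. Counting distinct positions rather than individual contacts is a design choice that follows the wording “number of positions” in the definition of extensity.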
As can be seen from the two reflections on social capital described above, a very mature and detailed discussion about the conceptualization of social capital in the organizational field (Nahapiet & Ghoshal 1998) as well as about the derivation and justification of a theory of social capital (Lin 1999a, 2001) took place in the late 1990s and early 2000s.
2.3 Defining Social Capital
The two previous sections have given an overview of the most influential pioneers of the social capital concept and introduced in more detail the considerations of the social capital theorists most relevant to this study. In addition, a broad range of researchers has investigated the concept of social capital over the last decades. A variety of contributions arose from these investigations, involving disciplines such as the following (for each research discipline, a selection of literature with particular significance is given):
• political sciences (e.g., Brehm & Rahn 1997, Knack 2002, Newton 2001, Putnam et al. 1993, Putnam 1995a, b),
• sociology (e.g., Burt 1992, 1997b, 2004, Coleman 1988, 1990, Granovetter 1973, Lin 1999b, Portes 1998, Portes & Sensenbrenner 1993, Woolcock 1998),
• economics (e.g., Glaeser et al. 2002, Guiso et al. 2004, Knack & Keefer 1997, Pretty & Ward 2001, Woolcock & Narayan 2000) and
• management (e.g., Adler & Kwon 2002, Burt 1997b, Inkpen & Tsang 2005, Nahapiet & Ghoshal 1998, Tsai & Ghoshal 1998, Wasko & Faraj 2005).
However, the debate on the issue of social capital is not entirely new. The insight that social capital has a positive influence on the functioning of communities dates back to Lyda J. Hanifan (1916), who was the first author to refer to social capital in the academic literature. As a consequence of the mentioned contributions from a variety of research fields, many definitions of social capital arose over the last years, as the social capital concept was applied to and modified for different research contexts (Adler & Kwon 2002, Dasgupta & Serageldin 1999, Lin 1999a, Sobel 2002). As an example of the plethora of social capital studies, Adler & Kwon (2002, p. 17) mention, among others, the following areas and scholars in organizational research affected by social capital (list of researchers extended by the doctoral candidate):
Social capital ...
• ... influences career success (e.g., Burt 1992, Choi 2018, Erickson 2001, Fernandez & Castilla 2001, Flap & Boxman 2001, Gabbay & Zuckerman 1998, Gubbins & Garavan 2016, Mouw 2003, Podolny & Baron 1997, Richardson et al. 2017, Seibert et al. 2001),
• ... facilitates resource exchange and product innovation (e.g., Cuevas-Rodríguez et al. 2014, Gabbay & Zuckerman 1998, Hansen 1998, Laursen et al. 2012, Molina-Morales & Martínez-Fernández 2010, Tsai & Ghoshal 1998, Zhang et al. 2015, Zhang et al. 2018),
• ... facilitates the creation of intellectual capital (e.g., Hargadon & Sutton 1997, Liu 2017, Madhavaram & Hunt 2017, Nahapiet & Ghoshal 1998, Ramadan et al. 2017, Subramaniam & Youndt 2005),
• ... facilitates entrepreneurship (e.g., Bizri 2017, Burkemper 2017, Cao et al. 2015, Chung & Gibbons 1997, Davidsson & Honig 2003, Henley et al. 2017, McKeever et al. 2014, Neumeyer et al. 2018, Williams et al. 2018, Zhou et al. 2017) and the formation of start-up companies (e.g., Alexy et al. 2012, Pedrini et al. 2016, Pirolo & Presutti 2010, Semrau & Hopp 2016, Shane & Stuart 2002, Walker et al. 1997) and
• ... strengthens supplier relations (e.g., Asanuma 1985, Baker 1990, Carey & Lawson 2011, Doner & Smitka 1992, Dore 1983, Gerlach 1992, Helper 1990, Hughes & Perrons 2011, Krause et al. 2007, Preston et al. 2017, Roden & Lawson 2014, Uzzi 1997, Villena et al. 2011, Whipple et al. 2015).
The multiple fields and research disciplines for which the social capital concept was and is adapted can be a pitfall, as the varying definitions also entail different ways to conceptualize and measure social capital, which in turn leads to diversity in evaluating the causal mechanisms in the macro (i.e., collective) and micro (i.e., individual) processes (Lin 1999a). Moreover, these circumstances make it difficult to form a generally accepted definition of social capital (Durlauf 2002, Lin 2001, Morrow 1999, Steinfield et al. 2008). Nevertheless, even if social capital is a very elastic term (Lappé & Du Bois 1997), there is agreement that it refers in a broad sense to the benefits an individual could receive from his social relationships (Lin 1999a, Steinfield et al. 2008). In order to review the broad understanding of the notion of social capital, a tabular overview of the most common definitions is given in the following (see Table 2.4). Although the social capital definitions are broadly similar at an abstract level, they reveal a subtle characteristic by which they can be categorized: their type of primary focus (Adler & Kwon 2002). The type of primary focus specifies the kind of ties an individual could have with his peers or the collectivity and is divided into the following three categories:
1. Focus on the Relations of an Individual with other Individuals. This view sees social capital as a resource residing in the external connections of an individual. Thus, it is also referred to as the bridging (Gittell & Vidal 1998, Putnam 2000, Svendsen 2006) or communal (Oh et al. 1999) form of social capital, which inheres in the mostly weak social network ties of an individual in relation to others (Adler & Kwon 2002, Granovetter 1973).
2. Focus on the Structure of Relations among Individuals within a Collectivity. This view of social capital focuses on the internal ties within a collectivity (e.g., organization, community or nation) and is also known as the bonding (Putnam 2000, Svendsen 2006) or linking (Oh et al. 1999) form of social capital. It can be found in mostly strong relations among individuals or groups within a collectivity and facilitates group cohesiveness and the pursuit of common goals (Adler & Kwon 2002).
3. Focus on Internal as well as External Types of Linkages. Definitions of social capital that take into account both internal and external types of linkages can be seen as neutral with respect to the internal and external dimension. The neutrality of these social capital definitions is an advantage, as the treatment of social capital as internal or external is mainly a matter of perspective and unit of analysis. Moreover, the internal and external perspectives on social capital are not mutually exclusive (Adler & Kwon 2002).
With regard to the different definitions of social capital given in Table 2.4, some of these explain social capital via its function (e.g., Coleman 1990). For a clear distinction between the sources and the effects of social capital, it is vital to distinguish the resources themselves from the possibility to acquire them by purposive actions or various memberships in social structures; this differentiation is made clearly by Bourdieu but remains blurry in Coleman (Portes 1998).
As a result, tautological statements can emerge if social capital is equated with the resources accessed through it (Foley & Edwards 1999, Portes 1998). This circumstance was taken up by Adler & Kwon (2002) and evaluated in more detail. Consequently, Adler & Kwon (2002) developed a definition of social capital, which precisely incorporates the substance, the sources and the effects of social capital. As this definition constitutes a good example for illustrating the interaction of the three components of social capital, the definition is explained in the following and reads: “Social capital is the goodwill available to individuals or groups. Its source lies in the structure and content of the actor’s social relations. Its effects flow from the information, influence, and solidarity it makes available to the actor” (Adler & Kwon 2002, p. 23).
Table 2.4 Definitions of Social Capital (Source: adapted from Adler & Kwon (2002, p. 20) and extended by the doctoral candidate, reprinted by permission from Academy of Management (NY), permission conveyed through Copyright Clearance Center, Inc.)
Columns: Type of Focus; Authors; Definitions of Social Capital

Type of Focus: External, Communal, Bridging
Baker: “a resource that actors derive from specific social structures and then use to pursue their interests; it is created by changes in the relations among actors” (1990, p. 619)
Belliveau, O’Reilly & Wade: “an individual’s personal network and elite institutional affiliations” (1996, p. 1572)
Bourdieu: “the aggregate of the actual or potential resources which are linked to possession of a durable network of more or less institutionalized relationships of mutual acquaintance and recognition” (1986, p. 248); “made up of social obligations (“connections”), which is convertible, in certain conditions, into economic capital and may be institutionalized in the forms of a title of nobility” (1986, p. 243)
Bourdieu & Wacquant: “the sum of the resources, actual or virtual, that accrue to an individual or a group by virtue of possessing a durable network of more or less institutionalized relationships of mutual acquaintance and recognition” (1992, p. 119)
Boxman, De Graaf & Flap: “the number of people who can be expected to provide support and the resources those people have at their disposal” (1991, p. 52)
Burt: “friends, colleagues, and more general contacts through whom you receive opportunities to use your financial and human capital” (1992, p. 9); “the brokerage opportunities in a network” (1997a, p. 355)
Knoke: “the process by which social actors create and mobilize their network connections within and between organizations to gain access to other social actors’ resources” (1999, p. 18)
Portes: “the ability of actors to secure benefits by virtue of membership in social networks or other social structures” (1998, p. 6)

Type of Focus: Internal, Linking, Bonding
Brehm & Rahn: “the web of cooperative relationships between citizens that facilitate resolution of collective action problems” (1997, p. 999)
Coleman: “Social capital is defined by its function. It is not a single entity, but a variety of different entities having two characteristics in common: They all consist of some aspect of social structure, and they facilitate certain actions of individuals who are within the structure” (1990, p. 302)
Fukuyama: “the ability of people to work together for common purposes in groups and organizations” (1995b, p. 10); “Social capital can be defined simply as the existence of a certain set of informal values or norms shared among members of a group that permit cooperation among them” (1997)
Inglehart: “a culture of trust and tolerance, in which extensive networks of voluntary associations emerge” (1997, p. 188)
Lin: “the resources embedded in social networks accessed and used by actors for actions” (2001, p. 25)
Portes & Sensenbrenner: “those expectations for action within a collectivity that affect the economic goals and goal-seeking behavior of its members, even if these expectations are not oriented toward the economic sphere” (1993, p. 1323)
Putnam: “features of social organization such as networks, norms, and social trust that facilitate coordination and cooperation for mutual benefit” (1995a, p. 67)
Thomas: “those voluntary means and processes developed within civil society which promote development for the collective whole” (1996, p. 11)
World Bank: “social capital refers to the institutions, relationships, and norms that shape the quality and quantity of a society’s social interactions. [...] Social capital is not just the sum of the institutions which underpin a society—it is the glue that holds them together” (1999)

Type of Focus: Both
Adler & Kwon: “Social capital is the goodwill available to individuals or groups. Its source lies in the structure and content of the actor’s social relations. Its effects flow from the information, influence, and solidarity it makes available to the actor” (2002, p. 23)
Loury: “naturally occurring social relationships among persons which promote or assist the acquisition of skills and traits valued in the marketplace [...] an asset which may be as significant as financial bequests in accounting for the maintenance of inequality in our society” (1992, p. 100)
Nahapiet & Ghoshal: “the sum of the actual and potential resources embedded within, available through, and derived from the network of relationships possessed by an individual or social unit. Social capital thus comprises both the network and the assets that may be mobilized through that network” (1998, p. 243)
Pennar: “the web of social relationships that influence individual behavior and thereby affects economic growth” (1997, p. 154)
Schiff: “the set of elements of the social structure that affects relations among people and are inputs or arguments of the production and/or utility function” (1992, p. 160)
Woolcock: “the information, trust, and norms of reciprocity inhering in one’s social networks” (1998, p. 153)
“Goodwill that others have towards us is a valuable resource” (Adler & Kwon 2002, p. 18). Here, goodwill constitutes the substance of social capital and is seen as the sympathy (Robison et al. 2002), trust (Adler 2001) and forgiveness (Williamson 1985) offered to us by family, friends and other contacts (Adler & Kwon 2002). The sources of social capital arise from the social structure in which an actor is located (Adler & Kwon 2002). Precisely speaking, “social capital is the resource available to actors as a function of their location in the structure of their social relations” (Adler & Kwon 2002, p. 18). In addition, the effects of social capital originate from the information, influence and solidarity that the goodwill provides to the actor. For each actor, a resulting effect has an individual value (Adler & Kwon 2002).
In the two previous sections and in this section, the most relevant social capital thinkers and their definitions of social capital, as well as their ideas for a social capital theory, were presented. From all given definitions it can be briefly summarized that social capital is based on the potential that results from the linking of heterogeneous actors. In addition,
• trust (e.g., Coleman 1988, Inglehart 1997, Nahapiet & Ghoshal 1998, Putnam 1995a, Woolcock 1998),
• norms (e.g., Coleman 1988, Fukuyama 1997, Nahapiet & Ghoshal 1998, Putnam 1995a, Woolcock 1998) and
• obligations (e.g., Bourdieu 1986, Coleman 1988, Nahapiet & Ghoshal 1998) as well as
• cognitive proximity (e.g., Nahapiet & Ghoshal 1998) facilitated by shared codes and narratives as well as a shared language
are relevant informal factors for the formation of social capital and in turn vital for the access to resources that social capital provides. Furthermore, it follows from the given definitions that network structure and hierarchy are also of relevance for the actors’ access to resources (e.g., Adler & Kwon 2002, Lin 1999a, 2001).
2.4 Social Capital Research Today
The far-reaching theoretical elaboration of social capital by the end of the 1990s (see Section 2.1) has laid the foundation for a variety of analyses and research building upon it. Figure 2.6 shows the number of social capital related academic articles published in the field of social sciences from 1990 (start of the database) to 2015. In the year 2000, around 175 published social capital related articles were catalogued in the Clarivate Analytics Social Sciences Citation Index (SSCI; http://www.isiknowledge.com, last accessed December 18, 2018), and a decade later the number had increased to 745 identified articles released in the year 2010. Another five years later, about 1,085 social capital related academic publications were recorded for the year 2015. At first glance, Figure 2.6 shows that after years of constant increase the number of published social capital related articles stabilized in the years 2013 and 2014. At second glance, it is noticeable that there was a marked increase in identified publications related to social capital in 2015 compared to the year 2014. This circumstance is contrary to the presumption of Kwon & Adler (2014) that the development of social capital related publications and citations may be at a turning point around the year 2013. This assumption is also refuted by considering the development of citations related to academic social capital articles (see Figure 2.7). The number of citations shows an exponential increase over the years. In the year 2000 there were around 1,100 citations of social capital related academic articles identified in the SSCI, a decade later in 2010 circa 15,000 citations, and for the year 2015 about 28,500 citations were recorded. This overall picture underlines the statement of Baker & Faulkner (2009) that social capital is a growth industry.
When viewing Figures 2.6 and 2.7, two issues have to be considered. Firstly, it looks as if the concept of social capital were an invention of the 1990s. This is not the case, as studies on the subject of social capital had been conducted beforehand (see Section 2.1). Secondly, although the trend of published social capital related articles per year and the trend of citations related to social capital articles per year are increasing in parallel, social capital research is not cumulative research, as the term social capital was and is used in multiple research disciplines (see Sections 2.1 and 2.3). Moreover, different works related to social capital from various contexts have contributed more or less independently of each other to the development of the concept over the years (Franzen & Freitag 2007).
Figure 2.6 Overview of Published Social Capital related Articles per Year. (The search in the Clarivate Analytics Social Sciences Citation Index (http://www.isiknowledge.com, last accessed December 18, 2018) was performed on January 12, 2016 with the following parameters: database: Web of Science™ Core Collection [1990-present]; search expression: “social capital” in the topic search field, which searches in the title, abstract, author keywords and Keywords Plus® records; timespan: start of the database until 2015. The search result contained 9,646 records.) (Source: Clarivate Analytics Social Sciences Citation Index, certain data included herein are derived from Clarivate Web of Science. © Copyright Clarivate 2016. All rights reserved.)
Figure 2.7 Overview of Citations related to Social Capital Articles per Year. (The search in the Clarivate Analytics Social Sciences Citation Index (http://www.isiknowledge.com, last accessed December 18, 2018) was performed on January 12, 2016 with the following parameters: database: Web of Science™ Core Collection [1990-present]; search expression: “social capital” in the topic search field, which searches in the title, abstract, author keywords and Keywords Plus® records; timespan: start of the database until 2015. The search result contained 9,646 records.) (Source: Clarivate Analytics Social Sciences Citation Index, certain data included herein are derived from Clarivate Web of Science. © Copyright Clarivate 2016. All rights reserved.)
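To put the article and citation counts quoted in this section on a common scale, one can convert them into average yearly growth rates. The following sketch is a rough illustration, not part of the original analysis: it uses only the approximate counts stated in the text for 2000, 2010 and 2015, and the compound annual growth rate is an assumed summary measure chosen here for convenience.

```python
# Approximate counts as quoted in the text (SSCI search of January 12, 2016).
articles = {2000: 175, 2010: 745, 2015: 1085}
citations = {2000: 1100, 2010: 15000, 2015: 28500}


def annual_growth(series, start, end):
    """Compound annual growth rate between two years of a {year: count} series."""
    return (series[end] / series[start]) ** (1 / (end - start)) - 1


# Compare the decade 2000-2010 with the five years 2010-2015 for both series.
for label, series in (("articles", articles), ("citations", citations)):
    for start, end in ((2000, 2010), (2010, 2015)):
        rate = annual_growth(series, start, end)
        print(f"{label} {start}-{end}: {rate:.1%} per year")
```

Comparing the two intervals gives a rough sense of how steep the increase was before and after 2010. Because only three data points per series are quoted in the text, the rates are coarse averages and say nothing about individual years such as 2013 or 2014.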
To get an impression of how book publications on the topic of social capital evolved over time, in addition to the above-mentioned social capital related articles published in academic journals, the Google Books Ngram Viewer (https://books.google.com/ngrams, last accessed December 18, 2018) was utilized. Figure 2.8 shows that from the mid-1990s until 2003 there was a sharp exponential increase in book releases related to the social capital field. From 2003 to 2008, book publications remained mostly stable at the level reached before. In comparison to book publications regarding OSS or online communities, the social capital topic clearly stands out. The numbers on the Y-axis of Figure 2.8 depict the percentage of references for each term in relation to the total of referenced books in the underlying corpus.
Figure 2.8 Overview of Published Social Capital related Books per Year. (The search with the Google Books Ngram Viewer (https://books.google.com/ngrams, last accessed December 18, 2018) was executed on January 12, 2016 with the following setting: English corpus; timespan: 1970 until 2008 (corpus end date); search terms: “Social Capital, Open Source Software, Online Communities”; no smoothing level.) (Source: Google Books Ngram Viewer)
From the investigation of how academic social capital publications and citations developed over the last years, it can be concluded that social capital research is still emerging. As research pertaining to the theoretical concept of social capital has reached a certain maturity (Kwon & Adler 2014), the trend of recent academic publications related to social capital can be attributed to empirical research on a broad variety of topics involving social capital. This development is also depicted by the treemap (see Figure 2.9) showing the Top 10 research fields utilizing social capital as of May 2018. The outlined Top 10 research fields give the respective record count of social capital publications in the associated field of research as well as the percentage in relation to the total SSCI search result of 11,454 records. Although the identified Top 10 social capital research fields make up almost 84 % of the search results, there is still a wide variation in the remaining 16 %, which are distributed over a further 181 research fields, underlining the broad utilization of the social capital concept in academia.
Figure 2.9 Treemap of Research Fields utilizing Social Capital. Shares of the Top 10 research fields shown in the treemap: 14.41 %, 11.89 %, 10.75 %, 9.81 %, 8.33 %, 6.86 %, 6.43 %, 5.28 %, 5.18 % and 4.84 %. (The search in the Clarivate Analytics Social Sciences Citation Index (http://www.isiknowledge.com, last accessed December 18, 2018) was performed on May 10, 2018 with the following parameters: database: Web of Science™ Core Collection [all years]; search expression: “social capital” in the topic search field, which searches in the title, abstract, author keywords and Keywords Plus® records; timespan: start of the database until May 2018. The search result contained 11,454 records.) (Source: Clarivate Analytics Social Sciences Citation Index, certain data included herein are derived from Clarivate Web of Science. © Copyright Clarivate 2018. All rights reserved.)
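The statement that the Top 10 research fields account for almost 84 % of the search results can be checked against the ten shares printed in Figure 2.9. The following sketch is a minimal arithmetic check; the variable names are hypothetical, and the approximate record counts are derived here by multiplying each share with the 11,454 records of the search result, so they are an illustration rather than figures taken from the treemap.

```python
# Shares of the Top 10 research fields as printed in Figure 2.9 (in percent).
top10_shares = [14.41, 11.89, 10.75, 9.81, 8.33, 6.86, 6.43, 5.28, 5.18, 4.84]
total_records = 11454  # size of the SSCI search result reported in the text

top10_total = sum(top10_shares)
remaining = 100 - top10_total

# Approximate record counts implied by each share (rounded, illustration only).
approx_counts = [round(total_records * share / 100) for share in top10_shares]

print(f"Top 10 share: {top10_total:.2f} %")
print(f"Remaining share: {remaining:.2f} %")
print(approx_counts)
```

The ten shares add up to 83.78 %, which matches the “almost 84 %” stated in the text and leaves roughly 16 % for the remaining research fields.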
2.5 Characteristics of Social Capital compared to other Forms of Capital
The understanding of the term social capital by theorists and scholars and its diverse application in the literature were explained in the previous sections (see e.g., Sections 2.3 and 2.4) of this work. Besides the effort of researchers to define social capital and to examine it empirically, some scholars (e.g., Adler & Kwon 2002, Fine 2001, Robison et al. 2002) question whether the resource ‘social capital’ is named correctly with reference to the meaning of capital in general. Moreover, there is a critical discussion in the literature about the use of the term capital with respect to social capital (e.g., Falk & Kilpatrick 2000, Inkeles 2000, Smith & Kulynych 2002, Sobel 2002). Social capital has features that are similar to other forms of capital. At the same time, it has characteristics that distinguish it from other types of capital (Araujo & Easton 1999, Robison et al. 2002). In the following, these characteristics are considered in more detail, based on an extended evaluation of Adler & Kwon (2002, p. 21 f.).
1. Long-Lived Asset: Social capital is a long-lived asset, like all other forms of capital. Other resources can be invested in it with the prospect of subsequent benefits, although these may be uncertain (e.g., Bourdieu 1986). Seeing social capital as inherent in relationships to others, internal and external relations can be differentiated. Investments of collective actors in the development of their internal relations result in a reinforced collective identity and enhance their capacity for collective action. In addition, both individuals and collective actors can extend their social capital by investing in their external relations. By building up an external network they could obtain advantages such as access to exclusive information, power and solidarity (Adler & Kwon 2002, Lin 2001). Moreover, social capital, like all forms of capital, can have both positive and negative consequences for the individual actor as well as for others (Adler & Kwon 2002), which are outlined in Section 2.6.
2. Appropriability and Convertibility: All types of capital share the characteristic that they are ‘appropriable’ (Coleman 1988) and ‘convertible’ (Bourdieu 1983, 1986). In the sense of appropriability, social capital is like physical capital. Physical capital (i.e., a factor of production, for example, machinery or buildings) can be utilized for various objectives with varying degrees of effectiveness. Accordingly, social capital is appropriable in the sense that an individual can utilize his network ties, such as friendship relations, for other purposes, for example information gathering or advice (Adler & Kwon 2002). Besides that, social capital has a kind of convertibility in common with other types of capital. Here, the network position an individual possesses is linked with his social capital, which can be converted into economic capital or other benefits (Bourdieu 1986, Lin 2001). The convertibility rate of social capital into economic capital, for instance, is lower than that of economic capital into social, human or cultural capital, as social capital is less liquid and more sticky (Adler & Kwon 2002, Anheier et al. 1995, Bourdieu 1986, Smart 1993).
3. Substitute and Complement: Another characteristic of social capital is that it can be a substitute for other resources or can complement them, as is also the case for other kinds of capital. In the sense of a substitute, an individual’s social
46
4.
5.
6.
7.
2 The Social Capital View
capital—here in the form of superior relations to others—can compensate a lack of financial or human capital. Furthermore, social capital can complement other types of capital, for example by enhancing the efficiency of economic capital via diminishing transaction costs (Adler & Kwon 2002, Lazerson 1995, Robison et al. 2002). Maintenance: In contrast to financial capital, social capital as well as physical and human capital need maintenance. Social ties have to be used and renewed from time to time otherwise they lose their impact (Bourdieu 1986). In addition, social capital and human capital have in contrast to physical capital an unpredictable rate of depreciation. Through disuse or misuse social capital could depreciate, while with use it will not (Robison et al. 2002). Human capital and some kinds of public goods (e.g., knowledge) normally increase with use, this fact also holds true for social capital (Adler & Kwon 2002). Collective Good: Internal, bonding social capital has the character of a ‘collective good’ unlike other types of capital. The individual who benefits from this ‘collective good’ does not exclusively own it (Coleman 1988). The usage of this form of social capital is non-rivalrous, so that it can not be used up by an individual and the availability for others is not restricted. But in comparison to a public good the use of internal, bonding social capital is excludable, so that others can be expelled from a network of relations (Hechter 1988). In comparison to internal, bonding social capital external, bridging social capital is more like a private good and can be seen as a business goodwill (Adler & Kwon 2002). Location: Another significant difference of social capital, in comparison to other types of capital, is its location. It is not held by any person but it is located in their relationships with other people (e.g., Coleman 1988). This circumstance is explained by (Burt 1992, p. 
58) as follows: “No one player has exclusive ownership rights to social capital. If you or your partner in a relationship withdraws, the connection dissolves with whatever social capital it contained.” It follows that building up social capital takes place through a mutual process of commitment of both partners, whereas the defection of only one partner will destruct it (Adler & Kwon 2002). Measurement: A major distinction of social capital in relation to other assets that economists term ‘capital’ is its non-suitability for a quantified measurement of investments in its advancement (Solow 1997). Although, the outcomes that derive from social capital (e.g., benefits) can be quantified, the term capital has to be understood in a metaphorical sense, since the endeavors made for developing social relations and networks, respectively, cannot be measured (Adler & Kwon 2002, Fernandez et al. 2000).
Summarizing the characteristics of social capital in relation to other types of capital presented above, it can be concluded that social capital belongs to the heterogeneous field of resources commonly referred to as capital. Thus, this resource, which exists in relations between individuals, can be named capital, although in some cases the term social capital has to be understood in a metaphorical sense (Adler & Kwon 2002, Robison et al. 2002). However, social capital as a long-lived asset does not only have positive effects. Possible drawbacks and negative consequences connected with it are elucidated in the next section.
2.6 Drawbacks of Social Capital
In research, social capital has been found to facilitate and improve, for example, social community life (e.g., Coleman 1988, Hanifan 1916, Woolcock & Narayan 2000) and democracy (e.g., Putnam 1993, 2000). Nevertheless, besides its positive effects, negative consequences of social capital can also be identified, so that social capital can have an adverse impact on socioeconomic outcomes. At least four negative consequences of social capital are recognized and described in the literature: exclusion of outsiders, excess claims on group members, restrictions on individual freedoms and downward leveling norms (Portes 1998, Portes & Sensenbrenner 1993). These consequences are depicted in more detail below on the basis of the elaboration of Portes (1998, p. 15 ff.).

1. Exclusion of Outsiders: The exclusion of others and the denial of access to certain resources or benefits, as investigated in the context of social capital, is often related to specific communities or interest groups. Two perspectives have to be distinguished here. On the one hand, the members of a group or community usually form bonding social capital with strong ties to each other, which mostly brings benefits for the members. On the other hand, the strong bonding among the members puts them in a position to obstruct the access of others to the benefits generated within the specific community (Daniel et al. 2003). Examples are the monopoly of Jewish merchants over the New York diamond trade and the dominance of Cubans over numerous areas of the Miami economy (Portes 1998, Waldinger 1995). In sum, the social capital emerging from strong bonding ties and trust within a group forms the basis for its economic advantage. Yet “the same social relations that [...] enhance the ease and efficiency of economic
exchanges among community members implicitly restrict outsiders” (Waldinger 1995, p. 557).

2. Excess Claims on Group Members: In contrast to the first consequence, the closure of groups or communities can likewise be a threat to the economic achievements (e.g., business success) of individuals within a group or community. A shared normative structure, including strong norms of mutual assistance, can give rise to a free-riding problem in a community, as less diligent or less successful members place demands (e.g., for jobs or loans) on their more successful peers, such as thriving entrepreneurs (Geertz 1963, Weber 1947). The latter are then slowed down in their continued (entrepreneurial) success by the obligation to support other community members. Here, the claimants’ social capital in the specific group context gives them access to resources that they could not otherwise obtain (Geertz 1963, Portes 1998, Weber 1947).

3. Restrictions on Individual Freedoms: Belonging to a specific group or community may provide access to valuable resources and, in turn, to better social capital (see Sub-Section 2.2.5). On the other hand, the freedom of individuals engaged in these communities may also be constrained, for example through social control (e.g., in neighborhood communities, neighbors know each other and, to a certain degree, look after each other) (e.g., Boissevain 1974). The constraints are related to demands for conformity and compliance with group-specific norms. In addition, they can restrict individual privacy and autonomy (Portes 1998, Portes & Sensenbrenner 1993).

4. Downward Leveling Norms: The fourth negative consequence of social capital arises where the solidarity of group members is based on a common experience of adversity or subordination and on opposition to mainstream society. This condition can lead to downward leveling norms, which ensure that the members of the suppressed group are kept in their place (Portes 1998, Portes & Sensenbrenner 1993). Contexts in which this drawback has been researched include Puerto Rican crack dealers in the Bronx (Bourgois 1991, 1995) and the behavior of different ethnic youth groups in the U.S. (Matute-Bianchi 1986, 2008, Stepick 1992, Suarez-Orozco 1987).

In summary, it can be concluded that social capital also has a ‘dark side’ (Field 2008, Ostrom 1997), although research has predominantly viewed and highlighted its positive effects (e.g., Coleman 1990, Loury 1977). Whether social capital has good or bad outcomes in a given context also depends on the level of analysis, that is, on whether the behavior of the members of a group is considered from inside or from outside the group (van Deth & Zmerli 2010). Some examples that illustrate how embeddedness in social structures may result in socially
undesirable outcomes (e.g., corruption, crime, limitation of public welfare, negative social influences) are crime associations, mafia families, gambling and prostitution rings as well as youth gangs (e.g., Callahan 2005, Collier & Garg 1999, Gambetta 1996, Portes 1998, Svendsen 2006, Whitley 1991).
2.7 Social Capital in the Context of Organizations
As this research is anchored in the organizational field, a more detailed analysis of social capital in relation to organizations and employees is essential for an overall understanding of the role of social capital in organizational research. Organizational scholars have recognized the potential of social capital, and extensive research has thus been undertaken to better understand management and organizational phenomena (Payne et al. 2011). As already described on a more general level (see Section 2.3), social capital has been and is applied to a plethora of topics in organizational research to better understand and explain socioeconomic circumstances. Examples of this research include the exploration of the value of intra-organizational social capital (Maurer et al. 2011), the role of social capital in the context of knowledge acquisition and knowledge exploitation (Anand et al. 2002, Yli-Renko et al. 2001, 2002) and the role of social capital related to organizational performance (Dess & Shaw 2001, Stam & Elfring 2008). A fundamental conceptualization of social capital, adopted in a variety of empirical studies, is provided by Nahapiet & Ghoshal (1998), who link it with value creation in organizations (for a detailed explanation of this conceptualization see Sub-Section 2.2.4). Empirical studies in the organizational context drawing on this conceptualization have investigated, for example, the relation between social capital and innovation enablers (Camps & Marques 2013), knowledge sharing (van den Hooff & Winter 2011, van den Hooff & Huysman 2009), knowledge transfer (Inkpen & Tsang 2005), firm performance (Rass et al. 2013) and value creation through product innovation (Tsai & Ghoshal 1998). In addition, Leana & van Buren III (1999) introduce the notion of organizational social capital, accompanied by a model of its components and consequences. 
Organizational social capital is defined “as a resource reflecting the character of social relations within the organization, realized through members’ levels of collective goal orientation and shared trust” (Leana & van Buren III 1999, p. 540). Here, the unit of analysis is the organization, and social capital is understood as an attribute of the collective. According to Leana & van Buren III (1999), two components are essential to build organizational social capital, namely associability and trust.
Associability stands for the willingness and the ability to be dedicated to collective action. Trust among the employees of an organization is a basic prerequisite for organizational social capital (Leana & van Buren III 1999). The essential role of trust in the formation of social capital has also been widely discussed by Nahapiet & Ghoshal (1998) (see Sub-Section 2.2.4). The consequences of organizational social capital comprise benefits, such as higher work flexibility or the development of intellectual capital, on the one hand and potential costs, such as the costs of maintaining relationships, on the other (Leana & van Buren III 1999). Because organizational social capital is thus related to both potential positive outcomes and potential costs for an organization (Leana & van Buren III 1999), the relation between organizational social capital and the associated organizational outcomes, here referred to as organizational performance, has to be weighed. Moreover, this relation is affected by factors external to the organization (e.g., velocity and predictability of change in the industry), which also determine whether the positive effects of organizational social capital outweigh the potential costs for an organization (Heath et al. 1993, Leana & van Buren III 1999). Studies drawing on the considerations of Leana & van Buren III (1999) examine, for example, the development of organizational social capital in the context of family firms (Arregle et al. 2007), the independent and combined effects of organizational social capital and structure on the performance of organizations (Andrews 2010) or organizational social capital and its relationship with performance (Leana & Pil 2006). A crucial point in organizational social capital research is the level of analysis. 
Here, parallels exist between the possible organizational levels of analysis (i.e., employee, team or group, and the whole organization) and the levels of social capital (i.e., individual and group). Both can be considered from the individual or from the group level. This leads to a multilevel perspective of social capital related to organizations, which comprises “various contexts in which individual and collective behaviors occur and how these behaviors influence the nested levels of social organization (i.e., individuals within teams within organizations within industries)” (Payne et al. 2011, p. 492). As mapping a multilevel perspective to a conceptual model and linking the effects from the various levels is quite complex, the majority of organizational research investigates predominantly one of the possible levels (Adler & Kwon 2002, Payne et al. 2011). This substantially reduces the complexity of conceptualizing and operationalizing social capital. Nevertheless, a variety of conceptualizations and operationalizations exists for each possible level of analysis, for example, for organizations in their interactions with other organizations (Baker 1990) or for individual actors from firms (Belliveau et al. 1996).
To summarize the findings of this section, research pertaining to the social capital of organizations tries to create a deeper understanding of the value of social capital in the organizational context and of its related outcomes, be they beneficial or costly. In addition, research in this context mostly refers to a specific level of analysis, with regard to both social capital and the organizational unit.
2.8 Summary
The aim of this chapter was to create an understanding of the comprehensive topic of social capital and of the concept of social capital per se. An overview of the evolution of social capital, including its main forerunners, created a basic understanding. This was further advanced through the detailed delineation of the thoughts of the three most important pioneers and theorists of the social capital concept: Pierre Bourdieu, James S. Coleman and Robert D. Putnam. These remarks were complemented by the introduction of the organizational view of social capital by Nahapiet & Ghoshal and the network-related view of social capital by Lin. Building on the work of Adler & Kwon (2002), the characteristics of social capital compared to other forms of capital were explained in detail to investigate whether social capital can be described as a form of capital. By extending the research of Portes & Sensenbrenner (1993) and Portes (1998), possible drawbacks of social capital were examined in more depth. As explained in Section 2.1, a closer look at the social capital concept reveals that it has been influenced by and applied to many research disciplines over time. In general, a concept is “an abstraction or idea formed by the perception of phenomena” (Hair 2016, p. 224). As a result of the influences from different research areas, the contents of the social capital concepts are rather diverse. Some researchers have further tried to develop their understanding of a social capital concept into a social capital theory connected to their research discipline (e.g., Bourdieu 1986, Lin 2001). A theory is generally defined as “a set of interrelated constructs (concepts), definitions and propositions that present a systematic view of phenomena specifying relations among variables, with the purpose of explaining and predicting the phenomena” (Kerlinger 1973, p. 9) or, as Wilson (2014, p. 55) puts it more briefly: “Theory is a set of principles devised to explain phenomena”. In summary, there are different approaches to establishing a social capital theory, of which the elaboration of Lin (2001) is the most detailed, with the description of seven propositions and a proposed model of a social capital theory.
The examination of the numerous definitions has shown that no generally valid definition exists; rather, there are several valid definitions of social capital, which depend on the research context and the unit of analysis. However, the tabular overview of the most common social capital definitions (see Section 2.3, Table 2.4) shows the increasing blurring in the use of the term. There is evidently a shift in the usage of the term: it is no longer used as part of a complex action and social theory, as by Bourdieu, but mainly to explain the economic development effects of individual groups or actors (Fuchs 2006). This is underlined by the outlined statistics on the evolution of academic social capital publications (see Section 2.4). In summary, there is a wide spectrum of social capital definitions. For the purpose of this doctoral thesis, the term social capital is defined drawing on the definition proposed by Nahapiet & Ghoshal (1998), who understand social capital as “the sum of the actual and potential resources embedded within, available through, and derived from the network of relationships possessed by an individual or social unit. Social capital thus comprises both the network and the assets that may be mobilized through that network” (Nahapiet & Ghoshal 1998, p. 243).
With respect to the anchoring of this research in the organizational and network-related context, the definition of social capital by Nahapiet & Ghoshal (1998), which originated in organizational research, is appropriate for this study, as it considers not only the structure of relationship networks but also the actual and potential resources that could be utilized through such networks. In addition, the network-related social capital definition of Nan Lin reflects the contents of the definition of Nahapiet & Ghoshal in a somewhat more compact form and reads: “the resources embedded in social networks accessed and used by actors for actions” (Lin 2001, p. 25).
3 Open Source Software and Firm Involvement
“Every good work of software starts by scratching a developer’s personal itch” (Raymond 1999, p. 32)
Software development in academic and corporate research settings began once appropriate hardware became available, which was the case in the 1960s (Lerner & Tirole 2002). Since then, hardware and software development have changed considerably. At the beginning, software, or more specifically the computer operating system, was a free add-on to the corresponding computer hardware. This changed as hardware became more powerful and diverse and as software development produced not only operating systems but also the first application programs. Over time, in addition to proprietary software development, leading movements and communities related to Free/Libre and Open Source Software (FLOSS) emerged (Lerner & Tirole 2002).

This chapter gives a detailed insight into the field of OSS. First, the most important differences between proprietary software and OSS are delineated. For the context of this doctoral thesis it is vital to understand the motives of the OS movement as well as the purpose and the details of OSS licenses, which are clarified accordingly. Second, light is shed on the characteristics of OSS communities as well as the predominant drivers of voluntary OSS developers and of firms involved in OSS communities. Third, business models related to OSS are discussed to show how firms can make money with OSS-related products and services in accordance with the OSS licensing terms.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2020 D. Homscheid, Firm-Sponsored Developers in Open Source Software Projects, Innovation, Entrepreneurship und Digitalisierung, https://doi.org/10.1007/978-3-658-31478-1_3
3.1 The Open Source Software Phenomenon
3.1.1 A Comparison of Open Source Software and Proprietary Software
Before going into detail about OSS, some relevant differences between proprietary software and OSS are briefly explained in the following. Regardless of whether it is OSS or proprietary software, a program basically consists of a sequence of instructions that exists in two forms, which build on each other. The first is the human-readable source code and the second is the computer-understandable and executable binary code of the program (Buchtala 2007, de Blasi 2015). Starting from this explanation, the distinctions between OSS and proprietary software are further highlighted along the legal, technological and organizational dimensions.

Legal Dimension: The main difference between OSS and proprietary software concerns the permission to use the software and its source code, which is regulated via the software license terms. Software licenses in the field of OSS are in general based on a copyleft clause:

“Copyleft is a general method for making a program (or other work) free (in the sense of freedom, not “zero price”), and requiring all modified and extended versions of the program to be free as well” (Free Software Foundation 2018a).
OSS, and specifically software source code released under a copyleft license, can usually be freely used, modified and shared. Normally, all changes to the source code have to be distributed under the same license terms as the original source code (Open Source Initiative 2018b). This ensures that the source code remains freely available and that the rights of the source code authors are maintained accordingly (Lerner & Tirole 2002). The most commonly utilized copyleft license is the GNU General Public License version 2 (GNU GPL v2) (Aslett 2008). Proprietary software, in contrast, does not offer users the option to access the source code. All rights related to proprietary software source code belong to the developing company and are secured by intellectual property rights, for instance copyright or patents (de Blasi 2015, Hall 2017, St. Laurent 2008).

Technological Dimension: The source code of a program consists of human-readable instructions, which are converted by a compiler from the human-readable form into
a computer-executable one, the binary code. Whereas the source code reveals all of a program’s instructions, such as its algorithms, the binary code is understandable only to a computer and cannot readily be brought back into a human-readable form (MacCormack et al. 2006). Accordingly, OSS is always available to the public in a human-readable form, namely the source code itself. This allows users to use, analyze, change and share the software (Crowston et al. 2012, Open Source Initiative 2018b). In contrast, proprietary software programs are only available as binary code. The human-readable source code is not provided by the developing company, since the source code constitutes, in many cases, a competitive advantage (Alexy 2008, de Blasi 2015, Watson et al. 2008).

Organizational Dimension: The development and contribution process differs significantly between OSS and proprietary software. Eric Raymond (1999) describes the two development processes from a higher perspective as ‘Bazaar’ and ‘Cathedral’. According to Raymond (1999), the Cathedral view depicts the development of proprietary software, as the process is typically organized by a centralized project management, which coordinates the development team(s) as well as the predefined schedule and milestones, including requirements analysis, development and testing (de Blasi 2015). The development usually happens within the boundaries of a company or with the involvement of service providers, which corresponds to a closed software development model (Crowston et al. 2007, Perr et al. 2010). The software source code is subject to the copyright of the company; it can constitute a significant competitive advantage and form an important part of the business or revenue model of the firm (Alexy 2008). After the successful development of the software, it is marketed in binary code format to the target group at an adequate price (Perr et al. 2010). 
The occurrence of bugs in the software after its release is usually perceived as a deficiency (de Blasi 2015). Raymond (1999) uses the term Bazaar to illustrate the development process of OSS and makes it concrete with the example of the LK development project: “The Linux community seemed to resemble a great babbling bazaar of differing agendas and approaches (aptly symbolized by the Linux archive sites, who’d take submissions from anyone) out of which a coherent and stable system could seemingly emerge only by a succession of miracles” (Raymond 1999, p. 24).
In contrast to the proprietary software development process, the development of OSS is characterized by a decentralized organization, as the contributors, mainly volunteers, participate in an OSS project from all over the world and manage tasks
such as developing and improving the source code, debugging and documenting (de Blasi 2015, MacCormack et al. 2006). The most important means for successful cooperation are modern communication technologies and the Internet (Lerner & Tirole 2001, Perr et al. 2010). In general, the communication structures in an OSS project are informal and there is often a loose policy related to source code releases. Most importantly, and as opposed to proprietary software development, bugs in the source code are seen as something positive that is part of software development (de Blasi 2015). This is also underlined by the Bazaar view stated by Eric Raymond: “Release early. Release often. And listen to your customers.” (Raymond 1999, p. 24). Today, the two software development processes presented can no longer be seen as completely distinct when the cooperation between OSS projects and firms is considered. In permissive OSS projects, which allow working together with companies, the organizational structure depends on the composition of the contributor group and can resemble the Bazaar, the Cathedral or a mixture of both (de Blasi 2015). The major characteristics of OSS and proprietary software with respect to the legal, technological and organizational dimensions are summarized in Table 3.1.

Table 3.1 Characteristics of Proprietary and Open Source Software. (Source: adapted from de Blasi (2015, p. 32))

Dimensions      Proprietary Software      Open Source Software
Legal           Copyright                 Copyleft
Technological   Computer-readable code    Human-readable source code
Organizational  Cathedral approach        Bazaar approach
3.1.2 An Outline of the Open Source Movement
To provide a coherent understanding of the OS movement and its motivation, its background is briefly described in the following on the basis of the three identifiable stages of development revealed by Lerner & Tirole (2002, p. 200 ff.).

• In the first phase, ranging from the early 1960s to the early 1980s, software development regarding computer operating systems and the Internet was mainly performed in academic (e.g., Berkeley and MIT) and corporate research settings
(e.g., Bell Labs, Xerox’s Palo Alto Research Center) (von Krogh & von Hippel 2003). During this time, sharing the basic operating code of computer programs between the developers of various institutions was a matter of course. In the 1970s the focus was on the development of operating systems serving multiple computer platforms. The most successful of these was Unix, developed at AT&T’s Bell Laboratories. Unix was mostly distributed freely to, and installed across, various institutions without restrictive licensing terms (Salus 1994). The start of the Usenet, the Unix User Network, in 1979 linked the Unix community together and thereby sped up the sharing of source code for Unix improvements and extensions (Lerner & Tirole 2002, West 2003).

• At the beginning of the 1980s AT&T claimed intellectual property rights for parts of the Unix source code (West & Dedrick 2001). This can be seen as the beginning of the second phase of the OS movement, which ranges from the early 1980s to the early 1990s (Lerner & Tirole 2002). Consequently, first attempts were made to formalize policies regarding the cooperative software development process. The driving force was the American programmer Richard Stallman with his Free Software Foundation (FSF) and the objective to develop and distribute a free Unix-compatible operating system called GNU (meaning “GNU’s Not Unix”), consisting of the kernel, libraries, compilers, editors, shells, mailer, development tools and other complementary software (Stallman 2003, Stallman & Gay 2002). In addition, Stallman made a first move towards a formal free software license, namely the GNU General Public License (GNU GPL), which stipulates that software developers have to agree explicitly that the basic source code as well as enhancements are made freely available for use, modification and dissemination (Raymond 1999, Stallman 1999, Zachary 1991). At the same time the Berkeley Software Distribution (BSD) effort created the BSD license, which is less constraining than the GNU GPL, as it allows software published under the BSD license to be modified and disseminated without releasing the source code (Feller & Fitzgerald 2002, Lerner & Tirole 2002).

• The third phase started in the early 1990s, as OS activities increased sharply with the diffusion of Internet access (von Krogh & von Hippel 2003, West & Dedrick 2001). Furthermore, companies started activities to support and get involved in OSS communities (Lerner & Tirole 2002). Hence, the regulations of software licenses for freely available software, among which the GNU GPL had been predominant until then, became less restrictive. The Debian Free Software Guidelines elaborated in 1995, for example, grant the right to combine cooperatively developed software with proprietary software, as long as the copyright notice of the original program remains (Lerner & Tirole 2002). At the same time, the existing software licenses and their regulations served as the basis for individuals to formulate
principles—thereafter known as Open Source Definition (OSD)—which define characteristics of OSS with regard to its use, modification and dissemination on a general level (Lerner & Tirole 2002, Perens 1999).
Having summarized the formation of the OS movement on the basis of the most important and relevant events, the following section goes into more detail about the characteristics of OSS, which were mainly shaped by the principles of the OSD in the late 1990s.
3.1.3
The Understanding of Open Source Software at a Glance
OSS is software distributed under a license that eliminates the three main restrictions of software published under a copyright license. According to St. Laurent (2008), the three main restrictions of copyright licenses are that it is not allowed to 1. copy the work, 2. make descendant works based on the original without special permission and 3. allow others to do the first two things. Hence, official OSS licenses allow the three aforementioned aspects through their terms and conditions. Moreover, the Open Source Initiative (OSI)[1], founded in 1998 by Bruce Perens and Eric S. Raymond, reviews all official OSS licenses for their conformity with the OSD. The OSD was derived from the Debian Free Software Guidelines, primarily by Bruce Perens, at the end of the 1990s (Perens 1999) and has since been maintained by the OSI. The OSD comprises the following ten principles (Open Source Initiative 2018b), which have to be fulfilled by any official OSS license:
1. Free redistribution
2. Source code (availability)[2]
3. (Allowing of) derived works[2]
4. Integrity of the author’s source code
5. No discrimination against persons or groups
6. No discrimination against fields of endeavor
7. Distribution of license
8. License must not be specific to a product
9. License must not restrict other software
10. License must be technology-neutral
[1] About the Open Source Initiative, https://opensource.org/about, last accessed December 18, 2018.
[2] Insertion in parentheses by the doctoral candidate.
The ten principles of the OSD listed above describe on a general level which rights a specific OSS license has to grant and which aspects the license has to prevent, respectively. At a glance, an official OSS license has to ensure that the software, and specifically the source code, is freely available in a human-readable form, usually as plain text (OSD criterion 2). This main feature of OSS is a prerequisite for granting and facilitating source code modifications and derived works of the software source code, which the license must allow to be distributed under the same terms as the license of the original software (OSD criterion 3). Furthermore, the OSS license has to ensure free redistribution of the software itself or as part of a composed software distribution (OSD criterion 1). These three criteria of the OSD can be seen as the key features of an OSS license. The remaining seven criteria of the OSD deal with additional aspects in connection with the source code and the OSS license itself and can be considered organizational accessories.
The previous paragraph explains OSS with regard to the principles of the OSD maintained by the OSI. In the field of OSS other terms such as Free Software (FS)[3] and, as a mixture of FS and OSS, Free and Open Source Software (FOSS) as well as Free/Libre Open Source Software (FLOSS)[4] are often used synonymously to describe software whose source code is publicly available (Crowston et al. 2012, Feller & Fitzgerald 2002). However, the terms OSS and FS are expressions for closely related movements that differ in nuances (Kelty 2008). When considering the definitions of OSS and FS in a broader sense, both terms can be understood as software published under a license that allows the evaluation, usage, modification and redistribution of publicly available source code (Crowston et al. 2012).
When researching the field of companies’ involvement in OSS development projects, it is important to shed light on the nuances between OSS and FS, which only become visible on closer examination of their definitions. To consider in a narrower sense what FS stands for, a deeper understanding of the FS movement and the four types of freedom of FS is required. The American programmer Richard Stallman shaped the term FS in the mid-1980s by publishing the GNU manifesto and through his objective to develop and distribute a variety of free software programs, including a free Unix compatible operating system (also known as GNU software or the GNU project), as a countermovement to increasing efforts of proprietary software development and software copyright licensing (Stallman 1985). In 1985 the activities of Stallman were merged into the FSF (Stallman 2003, Stallman & Gay 2002). In the sense of the FSF, software has to ensure the following four essential freedoms (Free Software Foundation 2018b) to its users, so that the program can be considered truly free software:
1. The freedom to run the program as you wish, for any purpose.
2. The freedom to study how the program works, and change it so it does your computing as you wish. Access to the source code is a precondition for this.
3. The freedom to redistribute copies so you can help your neighbor.
4. The freedom to distribute copies of your modified versions to others. By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.
Richard Stallman complements the explanation of FS by stating “to understand the concept, you should think of ‘free’ as in ‘free speech’, not as in ‘free beer’” (Stallman & Gay 2002, p. 43).
[3] In the literature “free software” is sometimes termed “libre software” to make the meaning of free distinct. Here, “free” is to be understood as “freedom” and not as “without cost” (Crowston et al. 2012).
[4] Although FS and OSS are distinct movements, the terms FOSS and FLOSS are often used in the literature to refer to the similarities and benefits of both concepts (Crowston et al. 2006).
Taking into account the principles of FS and OSS condensed above, it can be stated that by and large both movements pursue the objective of promoting software licenses that allow the evaluation, usage, modification and redistribution of the respective software (Crowston et al. 2012).
In summary, at the beginning of the 1980s computer software was increasingly commercialized and intellectual property rights were asserted. In opposition to this development, Richard Stallman pursued the philosophy of free software (free in the sense of giving the user freedom in the use of the software, as expressed in the four freedoms of FS) and published his point of view in the GNU manifesto in the mid-1980s (Stallman 2003, Stallman & Gay 2002). To bundle all efforts around the advocacy and development of FS, Stallman founded the FSF in 1985. It can be concluded that the FS movement was primarily driven by Richard Stallman and focuses on the philosophy behind the four defined freedoms from a user’s perspective (Stallman 2009).
In the late 1990s a group of people broke away from the morally charged and polarizing FS idea and introduced the OSD, accompanied by the foundation of the OSI (Dempsey et al. 2002). The advocates of the OSD follow a more application-based approach with respect to software whose source code is publicly available: the openness of the source code is viewed as a pragmatic rather than a moral issue, meaning their interest in OSS comes from a developer’s point of view (Dempsey et al. 2002, Fuggetta 2003). Accordingly, the focus is on the favorable OSS development model and the advantageous OSS development process (Crowston et al. 2006, 2012, Lee & Cole 2003). Stallman concisely describes the situation of FS and OSS by stating “The two terms describe almost the same category of software, but they stand for views based on fundamentally different values. Open source is a development methodology; free software is a social movement.” (Stallman 2009, p. 31).
The salient difference between the two views of FS and OSS becomes apparent when companies and their involvement in the software development process, or in the bundling of the software with their proprietary software modules, are considered. These days most software is a composition of a variety of software modules that are linked together rather than a standalone software package developed by a single community or company (German & González-Barahona 2009, Schaarschmidt 2012). To generate value for users, proprietary software and FS/OSS often have to be integrated or combined with each other, especially if such software is used in commercial environments. Here, the possibility of using publicly available software together with proprietary software is determined by the license under which the publicly available software is released, which in turn relates directly to the categorization of the software as FS or OSS (Crowston et al. 2006). For instance, if proprietary software of a third party is bundled or integrated with publicly available software that is published under the very restrictive GNU GPL, which embodies the four freedoms of FS, the license demands that the newly created software bundle also be published under the GNU GPL (Crowston et al. 2006). As a consequence, the third party has to disclose the source code of its proprietary software part and thus releases the created intellectual property, which is in most cases directly connected with its competitive advantage. This is quite an unattractive situation for companies, which will most likely keep them away from any kind of involvement (German & González-Barahona 2009, Schaarschmidt 2012). The very restrictive GNU GPL and most of the other FS licenses are not suitable to facilitate firm involvement in FS development, as the effects of the four freedoms of FS defined by Stallman hinder the participation of companies in such
settings (German & González-Barahona 2009). In contrast to the restrictive regulations of FS licenses, the OSI, with its OSD, facilitates less restrictive software licenses (e.g., the Berkeley Software Distribution License) that allow companies to get involved in OSS communities and development, respectively, without losing the intellectual property rights to their proprietary software parts (Fitzgerald 2006, Perens 1999).
In summary, it should be mentioned that there are over 80 OS licenses that have been approved by the OSI and thus fulfill the ten criteria of the OSD (Open Source Initiative 2018a). Each individual license has specific rights and obligations associated with its application. The multiplicity of licenses results in a continuum which ranges from
• highly restrictive licenses: obligation to release modified or composed source code under the license terms of the initial source code (e.g., GNU Affero GPL, GNU GPL), over
• restrictive licenses: obligation to release modified source code, whereas composed code, for example, the linkage with proprietary code, is allowed to be released using other licenses (e.g., GNU Lesser GPL, Mozilla Public License), to
• permissive licenses: no restriction related to the distribution of modified or composed source code, which is licensed under such a permissive license (e.g., Apache License, Berkeley Software Distribution License) (Lerner & Tirole 2002, Ma 2010).
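The license continuum can be illustrated with a small lookup structure. The following Python sketch is only an illustration under simplifying assumptions: the names `Category`, `LICENSE_CATEGORY` and `obligations` are hypothetical, the license assignments are limited to the examples named in the text, and the two boolean flags condense the obligations described for each category. Actual license terms are more nuanced than this three-way split suggests.

```python
from enum import Enum


class Category(Enum):
    """Three points on the license continuum described in the text."""
    HIGHLY_RESTRICTIVE = "highly restrictive"
    RESTRICTIVE = "restrictive"
    PERMISSIVE = "permissive"


# Example licenses named in the text, mapped to their category (illustrative only).
LICENSE_CATEGORY = {
    "GNU Affero GPL": Category.HIGHLY_RESTRICTIVE,
    "GNU GPL": Category.HIGHLY_RESTRICTIVE,
    "GNU Lesser GPL": Category.RESTRICTIVE,
    "Mozilla Public License": Category.RESTRICTIVE,
    "Apache License": Category.PERMISSIVE,
    "Berkeley Software Distribution License": Category.PERMISSIVE,
}


def obligations(license_name):
    """Return a simplified summary of the obligations of a license's category."""
    category = LICENSE_CATEGORY[license_name]
    return {
        "category": category.value,
        # Modified source code has to be released (assumed simplification).
        "must_release_modified_source": category is not Category.PERMISSIVE,
        # Composed code (e.g., linked proprietary code) falls under the same terms.
        "composed_code_bound_to_license": category is Category.HIGHLY_RESTRICTIVE,
    }


if __name__ == "__main__":
    for name in LICENSE_CATEGORY:
        print(name, obligations(name))
```

Read this way, a company that wants to link proprietary modules with OSS would look for licenses where `composed_code_bound_to_license` is false, which corresponds to the restrictive and permissive categories of the continuum.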
3.2
Exposing Open Source Communities and Open Source Software Developers
3.2.1
Introducing Open Source Software Communities
Basically, OSS communities can be categorized as online communities (Crowston & Scozzi 2002). However, before going into detail about online communities, characteristics of communities in general are considered first. A first notable definition of fundamental characteristics of communities was given in the 1930s by the American sociologist Robert E. Park (1936), who states the following three features that make up a community:
1. “a population territorially organized,
2. more or less completely rooted in the soil it occupies,
3. its individual units living in a relationship of mutual interdependence that is symbiotic rather than societal, in the sense in which that term applies to human beings” (Park 1936, p. 3).
The definition of community by Robert E. Park is primarily characterized by the territorial or geographical anchorage of a community. In the period until 1955 a multiplicity of definitions of community evolved in a variety of fields of application. Hillery (1955) identifies 94 diverse definitions of the term community, in which a community is, for instance, considered a group, a process, a social system, a geographic place, a consciousness of kind, a totality of attitudes, a common lifestyle, the possession of common ends, local self-sufficiency et cetera. The work of Hillery (1955) concludes that no consensus about the meaning of community exists. However, in most analyzed studies three attributes regarding community are prevalent, which are in a broader sense related to the community definition of Robert E. Park from 1936 (Driskell & Lyon 2002, Hillery 1955). These fundamental components of community are accepted by most sociologists (Lyon 1999) and are 1. a specific place and the identification with it, usually a specific geographic area, 2. common ties through commonalities like common obligations and responsibilities as well as shared values and meanings and 3. social interaction between the members including relationship building (Etzioni & Etzioni 1999, Hillery 1955, Muniz & O’Guinn 2001).
Over the years the aspect of a specific place and the connected identification has been decoupled from the basic characteristics of a community, as social interaction is more and more independent of a physical location or geographical area due to technological advancement (e.g., automobiles, airplanes, telephone, the Internet) helping to maintain relationships over larger distances (Smith & Kollock 1999, Wellman & Leighton 1979). Hence, the geographical aspect as part of the definition of community clearly loses importance, especially with regard to expanding the definition into the context of online communities.
Overall, the two characteristics social interaction and common ties are considered to be the most important (Bell & Newby 1974). In this sense, the American sociologist Joseph R. Gusfield (1975) differentiates the term community into two fields. The first field is about the territorial or geographical aspect of a community. In this case the sense of community is related to belonging to a specific geographical area and its social structure like neighborhood, town, city or region. The second field is about the relational facet of communities of interests. Here, the quality and character of relationships among the
members, who share a common interest (e.g., hobby, religion), are of importance without consideration of a specific shared location (Obst et al. 2001).
Now that a general understanding of community has been established, it is necessary to see how researchers adapt the characteristics of traditional communities to online communities. A literature review reveals a plethora of possible definitions for online communities and their properties, as the term has been widely utilized and adapted in various research disciplines (Preece et al. 2003, Wang et al. 2002). Common to most of the definitions is that the identified aspects of social interaction, common ties and shared interests are prevalent; nevertheless, there is a lack of consensus on a mutual definition, as the term is associated with a variety of online activities (Komito 1998, Preece et al. 2003). Table 3.2 gives a selection of definitions for the term online community to show the vast extent of the definitions.
In addition to the efforts of individual scholars to characterize an online community, which resulted in a multitude of definitions, a group of professionals in human-computer interaction elaborated a characterization for physical and network (online) communities in a workshop during the Computer Supported Cooperative Work conference in Boston in November 1996. They determined the following core attributes, which also include the previously identified key elements, such as social interaction, common ties and a shared interest (Lazar & Preece 1998, Whittaker et al. 1997).
The identified community attributes comprise
• a shared goal, interest, need or activity that builds the primary reason for members to be part of the community,
• repeated and active participation by members with intense interactions, strong emotional ties as well as shared activities emerging between the members/participants,
• accessibility of shared resources and policies governing the access,
• reciprocity of information, support and services between the participants and
• shared context of social conventions, language and protocols (Lazar & Preece 1998, Whittaker et al. 1997).
Scholars have widely applied the above named attributes to describe characteristics of online communities (e.g., Bilgram et al. 2008, Flavián & Guinalíu 2005, Lazar & Preece 1998, Preece 2000, Stockdale & Borovicka 2006, Wang et al. 2002) and utilized them also in the context of OSS development communities. Consequently, OSS development communities share these characteristics as there are
• shared goals such as developing and providing superior software to users (e.g., Bagozzi & Dholakia 2006, O’Mahony & Bechky 2008, Ye & Kishida 2003),
• repeated and active participation (e.g., Bagozzi & Dholakia 2006, Wasko et al. 2009),
• accessibility of shared resources (e.g., Wasko et al. 2009),
• generalized reciprocity (e.g., Lakhani & von Hippel 2003, Raymond 2001, Smith & Kollock 1999, Wasko et al. 2009) and
• shared language (e.g., Chou & He 2011, Stewart & Gosain 2006).

Table 3.2 Definitions of Online Community
Authors: Definitions of Online Community
Bagozzi & Dholakia: “mediated social spaces in the digital environment that allow groups to form and be sustained primarily through ongoing communication processes” (2002, p. 3)
Erickson: “computer-mediated social interaction among large groups of people” (1997, p. 1)
Jang et al.: “a group of individuals engaging in predominantly online interaction on virtual spaces created through the integration of communication with contents developed by community members” (2008, p. 59)
Porter: “an aggregation of individuals or business partners who interact around a shared interest, where the interaction is at least partially supported and/or mediated by technology and guided by some protocols or norms” (2004a)
Preece & Maloney-Krichmar: “a group of people who interact in a virtual environment. They have a purpose, are supported by technology, and are guided by norms and policies” (2003, p. 1023)
Rheingold: “social aggregations that emerge from the Net when enough people carry on [. . .] public discussions long enough, with sufficient human feeling, to form webs of personal relationships in cyberspace” (1993, p. 5)
Schubert: “Virtual Communities describe the union between individuals or organizations who share common values and interests using electronic media to communicate within a shared semantic space on a regular basis” (1999, p. 30); English version cited from Schubert & Ginsburg (2000, p. 46)
Wellman: “networks of interpersonal ties that provide sociability, support, information, a sense of belonging and social identity. I do not limit my thinking about community to neighbourhoods and villages” (2001, p. 228)
Along with this, von Hippel & von Krogh (2003) take up the features described above and give the following definition of OSS development communities: “Internet-based communities of software developers who voluntarily collaborate in order to develop software that they or their organizations need” (von Hippel & von Krogh 2003, p. 209).
Summarizing the definitions of physical communities, online communities and specifically OSS development communities, it can be concluded that communities, regardless of whether they are physical or online, are formed by a group of humans who get together, either in the real world or online, because of shared interests and interact with each other to achieve a shared goal. These humans, whether they are called members or contributors, constitute the most important part of a community. The success and existence of a community stand or fall with these contributors. As far as OSS development communities are concerned, the contributors are usually a heterogeneous group of people who, among other things, develop, test and document the software that can be used, changed and shared by any person after its release (Open Source Initiative 2018b). Besides the fact that OSS communities consist of hobbyists, who voluntarily provide their resources to the community, the definition given by von Hippel & von Krogh (2003) also includes another important contributor group, namely organizations. Organizations differ from volunteers in terms of their motivation to be active in OSS development communities and are represented in the community by their employed developers. In turn, employed developers might be considered proxies for firm interests in the community.
3.2.2
Motivation of Open Source Software Developers
There are various approaches in research to determine the motives of people for participation in communities (Hertel 2007, von Krogh et al. 2012, 2008). In this doctoral thesis, the most frequently used approach based on motivational theory (von Krogh et al. 2012) is utilized. Based on motivational theory, drivers of OSS participants to get involved in OSS communities can be distinguished between intrinsic (e.g., joy, altruism) and extrinsic (e.g., need, payment) motivation. In the past a variety of scholars have examined drivers and thus the motivation of OSS participants to be active in OSS projects, for example Baytiyeh & Pfaffman (2010), Cai & Zhu (2016), Hars & Ou (2002), Lakhani & Wolf (2005), Lerner & Tirole (2002), Shah (2006), Wu et al. (2007), Xu et al. (2009).
Motivation of people as defined by Deci & Ryan (1985) is composed of the drive and the behavior of a human being. The drive stands in relation to the needs of a person and the behavior is related to processes and structures of an individual to direct the drive towards the satisfaction of needs. In more detail, the psychologists Deci & Ryan (1985) distinguish three variants of motivation in their established Self-Determination Theory (SDT):
• intrinsic and
• extrinsic motivation as well as
• amotivation.
Intrinsic motivation is the execution of an activity because of the associated satisfaction, whereas extrinsic motivation is performing a task as a means to an end or as a result of an obligation. Amotivation is characterized by the personal feeling of incompetence or by rejecting the execution of an activity. In addition, intrinsic and extrinsic motivation are not strictly distinct. Rather, one can imagine these two forms as end poles of a continuum, between which mixed forms of both motivation types can exist (Ryan & Deci 2000b).
For this research it is essential to have a differentiated understanding of the various drivers of OSS participants to be active in OSS communities. Thus, on the theoretical basis of Deci & Ryan’s (1985) SDT, the motivation of OSS participants is exposed in more detail in the following section with reference to intrinsic and extrinsic stimuli. Amotivation will not be addressed, as the focus hereafter is on the drivers of motivated behavior. Ryan & Deci (2000a) describe intrinsic motivation as the execution of an activity due to the associated enthusiasm and not for the achievement of certain results. An intrinsically motivated person acts in their own interest, for the joy or challenge associated with a task and not due to external events, pressure or reward. The human need for competence and self-determination is at the core of the theory of intrinsic motivation.
This need is directly linked to emotions of interest and delight (Deci & Ryan 1985). In connection with OSS participants, researchers investigated a plurality of intrinsic motivators (e.g., Baytiyeh & Pfaffman 2010, Hars & Ou 2002, Lakhani & Wolf 2005, Shah 2006, Xu et al. 2009). The most relevant of these intrinsic drivers are now described briefly:
1. Joy-Based Intrinsic Motivation. Research on motivational reasons of people to contribute to OSS development shows that joy-based intrinsic motivation is one of the strongest and most prevalent drivers of OSS contributors (e.g., Hars &
Ou 2002, Lakhani & Wolf 2005, Shah 2006). Joy-based motivation is closely linked to the creativity of a person. Frequently, contributors to OSS projects have a strong interest in software development and related challenges (Hars & Ou 2002). During programming they arrive at a state of flow, which is characterized by a focused concentration, the fusion of action and awareness, the confidence in their own abilities and the joy of the activity itself, regardless of the outcome (Nakamura & Csikszentmihalyi 2001).
2. Altruism as Intrinsic Motivator. Another fundamental aspect of intrinsic motivation is altruism, which is the desire to help others and to improve their welfare (Hars & Ou 2002). In OSS communities developers, for example, write source code at their own expense, which includes the invested time and opportunity costs, and freely reveal the source code to the community. Consequently, these contributors can be called altruistic. They participate in the OSS community without taking advantage of its outcome (Baytiyeh & Pfaffman 2010, Hars & Ou 2002, Wu et al. 2007). A variant of altruism related to communities is referred to as ‘community identification’ and corresponds to the need for belonging and love described by Maslow (1943). Some contributors in OSS communities identify themselves as part of the community and align their goals with those of the community (Hars & Ou 2002, Lakhani & Wolf 2005). In addition, they may treat other members of the community like their relatives, a phenomenon named kin selection altruism, and are willing to perform outstanding achievements for them and thus for the community (Hars & Ou 2002).
3. OSS Ideology as Intrinsic Motivator. In addition, the OSS ideology plays a crucial role for many OSS contributors to get involved in OSS communities (e.g., Alexy & Leitner 2011, David & Shapiro 2008, Stewart & Gosain 2006, Xu et al. 2009).
Stewart & Gosain (2006) examine the elements of the OSS ideology in detail, which in summary involves
• joint collaborative values and norms, such as helping, sharing and collaboration,
• individual values, such as learning, technical knowledge and reputation,
• OSS process beliefs, such as code quality and bug fixing and
• beliefs regarding the importance of freedom in OSS, such as an OS code and its free availability and use for everyone (Stewart & Gosain 2006).
Besides these distinguishing aspects of participants’ intrinsic motivation to engage in OSS projects, researchers have found that extrinsic stimuli can also have a significant impact on the level of activity of contributors in OSS communities (e.g., Baytiyeh & Pfaffman 2010, Cai & Zhu 2016, Hars & Ou 2002, Lakhani & Wolf 2005,
Lerner & Tirole 2002, Shah 2006, Wu et al. 2007, Xu et al. 2009). A behavior is extrinsically motivated when an activity is performed for reward, recognition or because of a referral from someone or an obligation (Ryan & Deci 2000a). The most prevalent of these external stimuli are considered hereafter:
1. Recognition, Reputation and Improvement of the Professional Status as Extrinsic Motivators. ‘Signaling incentives’ as described by Lerner & Tirole (2002, 2005) are among the most common reasons for people to participate in OSS communities. These incentives cover, inter alia, the recognition by other members of the community, the enhanced reputation that may come from appropriate contributions and the improvement of the professional status. The participants in OSS projects find their drive in the opportunity to earn the appreciation of the community through their achievements. Furthermore, adequate contributions to the OSS project may enhance the reputation of participants in the eyes of their community peers (Cai & Zhu 2016, Lakhani & von Hippel 2003, Raymond 2001, Roberts et al. 2006, Spaeth et al. 2008, Xu et al. 2009). Additionally, OSS developers have the chance to present their skills to future employers through appropriate performances in the community and thus use participation in OSS projects as a way of career advancement (Ghosh 2005, Lakhani & Wolf 2005, Roberts et al. 2006, Wu et al. 2007). By developing their skills, abilities and knowledge, the developers can pave the way to better areas of employment, higher salaries or more challenging jobs. On the other hand, the participation in OSS projects can also be used to start one’s own business related to OSS (Lerner & Tirole 2002).
2. Reciprocity as Extrinsic Motivator. Among other scholars (e.g., Lakhani & von Hippel 2003, Shah 2006), Raymond (1999, 2001) sees reciprocity in the attitude of giving and taking in OSS communities as a strong driver of OSS contributors. A reciprocal behavior of OSS participants is closely linked with the phenomenon of a gift culture (Wu et al. 2007). Like-minded people understand their participation as a gift to the community, because they use OSS themselves and also benefit from further developments of the software by other members (Wu et al. 2007).
3. Personal Need as Extrinsic Motivator. A vital factor which is closely related to an extrinsic stimulus is a personal need of an actor (e.g., Baytiyeh & Pfaffman 2010, Hars & Ou 2002, Raymond 1999, Wu et al. 2007). Many OSS projects are launched because the initiators need software with specific functions that are not available so far, and they have the willingness and knowledge to develop this software. In addition, an incentive to action for OSS participants can be triggered by an error in OSS, which should be corrected, or a non-existent but required
function in the OS program (Wu et al. 2007). Raymond (1999) describes this motivator by the following precise statement “Every good work of software starts by scratching a developer’s personal itch” (Raymond 1999, p. 32).
OSS projects that emerged out of personal needs and still have relevance include the LK project, which was initiated by Linus Torvalds in 1991 (Corbet & Kroah-Hartman 2017), and the Apache HTTP server project started by Brian Behlendorf in 1995 (Lerner & Tirole 2002). The Apache HTTP server has remained the most popular web server for over 20 years, currently serving around 42 % of all active websites (Netcraft 2018).
4. Improving Own Skills as Extrinsic Motivator. OSS communities offer, for instance, the possibility for developers to improve their programming skills and their knowledge through participation in an OSS project and through the connected process of self-learning (Ye & Kishida 2003). Programmers are free to choose the tasks in which they participate according to their interests and abilities (David & Shapiro 2008, Hars & Ou 2002, Lakhani & von Hippel 2003). In addition, the OSS community forms a knowledge community whose members support each other. As a result, the self-learning participants experience a continuous learning curve and build up a repertoire of experiences, ways and means to solve specific software development tasks (Wu et al. 2007). Furthermore, the developed source code usually undergoes an evaluation process before it becomes part of the official program. Consequently, the constructive feedback from the community, especially through peer evaluation of the developed source code, encourages the development of developers’ skills (Lakhani & Wolf 2005).
5. Payment as Extrinsic Motivator. Monetary incentives are extrinsic stimuli that are not only applied in the context of OSS projects (Ghosh 2005, Lakhani & Wolf 2005, Roberts et al. 2006). OSS contributors who receive monetary incentives to participate in an OSS community are largely paid or employed by companies (Hertel et al. 2003). In short, such contributors are firm-sponsored and receive monetary compensation for their contribution activity (Dahlander & Wallin 2006, Luthiger & Jungwirth 2007).
Extrinsic motivation of OSS developers can be compared to a rational, economic calculation. As long as the benefits outweigh the costs, an interested contributor is willing to support an OSS community. In sum, extrinsically motivated contributors participate in OSS projects in a highly targeted and selective manner and
usually only if they derive either a direct or a future benefit from their involvement (Lerner & Tirole 2002). With regard to the motivational drivers of OSS contributors described above, the following two aspects have to be considered.
• First, a strict categorization into purely intrinsic and extrinsic motivators is insufficient to reflect the motivational facets in OSS communities (Lakhani & Wolf 2005, Roberts et al. 2006). Consequently, with respect to motivational theory, specifically Deci & Ryan’s (1985) SDT, motivational patterns range on a continuum between intrinsic, internalized extrinsic and extrinsic motives (Mair et al. 2015, Ryan & Deci 2000b). With regard to internalized extrinsic facets of motivation, Ryan & Deci (2000b) explain that extrinsic drivers of motivation can be internalized by a person over time and are then perceived as intrinsic motivational drivers. Accordingly, the person perceives the execution of an activity that was actually triggered by an extrinsic stimulus, and their corresponding behavior, as self-determined rather than driven by external forces (Roberts et al. 2006, Ryan & Deci 2000b). In the context of OSS contributors, various scholars utilize the distinction between intrinsic, internalized extrinsic and extrinsic motivational drivers in their studies (e.g., Hars & Ou 2002, von Krogh et al. 2008, Lakhani & Wolf 2005, 2007) and assign stimuli like reputation, reciprocity, need and learning to the area of internalized extrinsic drivers (von Krogh et al. 2008).
• Second, the individual motivational drivers of OSS participants are partly interrelated rather than completely independent (Martinez-Torres & Diaz-Fernandez 2014). For example, OSS ideology beliefs and values involve status attainment (i.e., recognition) and reputation (Stewart & Gosain 2006), which are also considered individual motivators in OSS communities (e.g., Lerner & Tirole 2002).
In general, research investigating the motivation of OSS contributors does not agree on one main driver along the continuum between intrinsic and extrinsic motivators for participation in OSS communities. However, the majority of studies see intrinsic stimuli as key drivers of OSS contributor involvement (Martinez-Torres & Diaz-Fernandez 2014). To give a coherent picture of the pertinent OSS literature that has exposed motivational factors of OSS contributors, Table 3.3 summarizes relevant research in this field, stating the investigated motivational drivers (i.e., intrinsic, internalized extrinsic and extrinsic drivers) as well as the type of study (i.e., theoretical or empirical).
Table 3.3 Overview of Motivational Drivers of OSS Contributors. (Source: adapted from von Krogh et al. (2008, p. 27) and extended by the doctoral candidate)
Columns: Authors | Intrinsic Drivers (Joy, Altruism, Ideology) | Internalized Extrinsic Drivers (Reputation, Reciprocity, Need, Learning) | Extrinsic Drivers (Career, Payment)
Theoretical Studies: Benkler (2002); Bergquist & Ljungberg (2001); Csikszentmihalyi (1975, 1990, 1996); Lakhani & von Hippel (2003); Lattemann & Stieglitz (2005); Lerner & Tirole (2002); Osterloh & Rota (2007); Raymond (1999); von Hippel & von Krogh (2003); Ye & Kishida (2003)
Empirical Studies: Alexy & Leitner (2011); Baytiyeh & Pfaffman (2010); Bitzer et al. (2007); David & Shapiro (2008); Ghosh (2005); Hars & Ou (2002); Hertel et al. (2003); Lakhani & von Hippel (2003); Lakhani & Wolf (2005); Luthiger & Jungwirth (2007); Okoli & Oh (2007); Oreg & Nov (2008); Roberts et al. (2006); Shah (2006); Spaeth et al. (2008); Stewart & Gosain (2006); Wu et al. (2007); Xu et al. (2009)
[The individual X marks assigning drivers to each study are not recoverable from the source layout.]
3.2.3 Motivation of Firms to get Involved in Open Source Software Communities
In addition to hobbyists, the group of contributors in OSS development projects also includes companies or, more specifically, developers who are employed, and thus paid, by firms. While voluntary OSS contributors are driven by intrinsic and extrinsic aspects of motivation (see Sub-Section 3.2.2), literature investigating the drivers of companies that are active in OSS projects reveals that economic theory is not sufficient to explain the relation between firms and their OSS involvement. Corresponding studies find three kinds of aspects that influence companies to participate in OSS communities: economic, technological and social aspects (Bonaccorsi & Rossi 2006, Dahlander & Magnusson 2005, Ziegler et al. 2014). In the following, these three aspects are considered in more detail.
1. Economic Drivers. The dominant strategy for companies to appropriate value from OSS is to provide complementary products and services to customers (for example, installation services, training, technical support, maintenance, consultancy and certifications (Fitzgerald 2006)) in congruence with their business strategy (Dahlander 2005, Dahlander & Magnusson 2005, Lerner & Tirole 2002). An essential prerequisite for offering complementary services is a comprehensive understanding of the OSS on which the products or services are based. Companies following this approach commonly deploy their own employees to the OSS project they provide products or services for. Consequently, the advantages for the firm are twofold: first, the company acquires external knowledge through its own employees active in the relevant OSS development project, and second, it has access to complementary resources through the OSS community, which are difficult to replicate internally (Andersen-Gott et al. 2012, Dahlander & Magnusson 2005, Riehle 2007). Further, firms utilize OSS communities to realize cost savings.
To achieve this, companies often release their proprietary software source code under an OS license with the objective to attract external contributors and to build a community related to the software. As a consequence, internal research and development expenditure can be reduced by utilizing the community in addition to development efforts within the boundaries of the firm (George et al. 2005, Hawkins 2004). Thus, the self-initiated community forms a valuable workforce for companies, which performs tasks such as programming, testing as well as detecting and reporting bugs or bringing in ideas for new features without any monetary incentive or reward (Henkel 2004, Lerner & Tirole 2002). In the long run, the software code can be maintained by the community, such that the company has lower costs than its competitors with proprietary
software (Hawkins 2004). However, it should be noted that establishing an ecosystem and an active community around published source code is not an easy endeavor, as the firm must attract and retain volunteer contributors (Spaeth et al. 2015, Zhou et al. 2016) and should preferably stay ahead of competitors, as these could pursue similar strategies (Ågerfalk & Fitzgerald 2008, Andersen-Gott et al. 2012, Dahlander & Magnusson 2008).
2. Technological Drivers. For firms releasing their proprietary software source code under an OS license and trying to build a community around it, technological aspects can also be a driver. In addition, companies can become active in existing innovation projects that are related to their business model in order to gain technological as well as innovation advantages. In this context, the open innovation approach can be seen as an overarching concept guiding companies when opening outwards to organize their innovation activities more effectively and efficiently (Chesbrough 2003, Chesbrough & Appleyard 2007). Innovation communities are a means for firms to complement their own resource base. In the case of software, hardware or service companies, OSS communities form a resource pool these firms can benefit from, depending on the strategy they pursue (Dahlander & Magnusson 2008, Grand et al. 2004). With regard to technology and innovation, the interaction and exchange with the community can lead to enhanced or new products driven by market demand, which imply a competitive advantage (Bettenburg et al. 2015). The inclusion of such external contributors additionally increases the firm’s innovative capacity (Chesbrough 2003). Moreover, from a technological point of view, releasing developed source code to the public is a means of choice to promote technological standards (Bonaccorsi & Rossi 2006, Fink 2003, Wichmann 2002).
3. Social Drivers. Scholars reveal that companies are also driven by social reasons to contribute to OSS communities.
Here, the desire to conform to the social norms of OSS communities and moral obligations can be an aspect (Andersen-Gott et al. 2012, Bonaccorsi & Rossi 2006). Besides, such firms accept and respect the values of OSS communities, as they want to build a trusting relationship with the community through their commitment and contributions (Lerner & Tirole 2002, Osterloh et al. 2003). Moreover, by sharing source code and knowledge with the OSS community, some firms want to give something back to the community, an aspect also known in the literature as ‘reciprocity’ (Lerner & Tirole 2002, Osterloh et al. 2003, Raymond 2001). Another driver is the enhancement of corporate reputation through involvement in OSS communities, as there is an increasing public interest in OSS and the related projects (Bonaccorsi & Rossi 2006).
Altogether, the drivers outlined above that motivate companies to participate in OSS communities are not to be considered separately. Rather, companies are motivated by different aspects from all three areas of drivers to get involved in OSS communities. Not only do firms benefit from their involvement in OSS projects; OSS communities also gain advantages through the contributions of the companies, such as increased productivity, sustained contributions and output (Capiluppi et al. 2012).
3.3 Business Models related to Open Source Software
In general, a business model defines how an organization creates and appropriates value (Popp 2011, Weill et al. 2005). More specifically, and in the best case, it answers the following questions asked by Peter Drucker⁵ about the business activity of a company (Magretta 2002, p. 4):
• Who is the customer?
• And what does the customer value?
• How do we make money in this business?
• What is the underlying economic logic that explains how we can deliver value to customers at an appropriate cost?
Finding a uniform and common definition of the term business model is quite a challenge in two ways. For one thing, there are plenty of definitions of the notion business model in the business and management research field (e.g., Amit & Zott 2001, Casadesus-Masanell & Ricart 2010, Magretta 2002, Teece 2010, Wei et al. 2013, Weill et al. 2005), and for another, there is no consensus in the literature on the evolution, structure and definition of business models (Morris et al. 2005). However, research in this field seems to have been gaining importance in recent years, as a number of recent publications synthesize the literature on the business model concept and its characteristics and suggest further important research directions (e.g., Goyal et al. 2017, Klang et al. 2014, Lambert 2015, Massa et al. 2017, Saebi & Foss 2015, Wirtz et al. 2016, Zott et al. 2011). Nevertheless, Porter (2001) criticizes the blurry understanding of business models and describes its appearance as ‘a loose conception of how a company does business and generates revenue’ (Porter 2001, p. 73). The unsatisfactory situation of a nonuniform understanding of business models was the driver for Morris et al. (2005) to synthesize the literature pertaining to definitions and issues related to business models, resulting in a proposition for a unified business model definition, which states: “A business model is a concise representation of how an interrelated set of decision variables in the areas of venture strategy, architecture, and economics are addressed to create sustainable competitive advantage in defined markets” (Morris et al. 2005, p. 727).
⁵ Peter F. Drucker (1909–2005) was an American economist of Austrian origin. He published numerous works on topics of the economy, politics, society and management and is considered a pioneer of modern management theory (Drucker Institute 2018).
When comparing the business model definition proposed by Morris et al. (2005) with the above-mentioned questions of Peter Drucker related to the business activities of a company, it can be concluded that they show a high degree of congruence. With the explanation of business models derived from Morris et al. (2005) in mind, the next step is to evaluate the specific understanding of business models in the field of OSS. From an economic point of view, the combination of the two terms OSS and business model is initially contradictory. This is because OSS is generally understood as freely available software source code without any commercial background. On the other hand, there is the notion of business model, which describes how a company implements its value creation and profit-making intentions. Nonetheless, in the field of OSS, there are several companies that have positioned themselves very successfully in the market, for instance, Canonical Ltd., Mozilla Corporation, Red Hat Inc. and SUSE PLC. Accordingly, business models have been established that do not conflict with the licensing terms of OSS. Here, as for the definition of business models in general, the countless definitions (e.g., Bonaccorsi et al. 2006, Dahlander & Magnusson 2006, Riehle 2012) and diverse categorizations (e.g., Bonaccorsi et al. 2006, Chesbrough & Appleyard 2007, Fitzgerald 2006, Hall 2017, Krishnamurthy 2005, Lakka et al. 2011, Okoli & Nguyen 2015, Perr et al. 2010, Popp 2015, Watson et al. 2008) of business models for OSS show that there is a very broad understanding of the topic among scholars. In addition, the approaches used to evaluate and develop categorizations of OSS business models are also broadly based. This can be seen, for example, in the approaches elaborated by Lakka et al. (2011) and Okoli & Nguyen (2015). Lakka et al. (2011) follow a structured-case methodological approach.
The exploratory study proposes a literature-based OSS business model taxonomy and concludes that OSS business models differ from conventional software business models because of specific properties influencing the software value chain, the infrastructure as well as the revenue model of companies following an OSS-oriented business model (Lakka et al. 2011). Okoli & Nguyen (2015) chose an approach other than a classical literature review to determine crucial business models in the field of OSS: they involved 34 FLOSS experts as part of a Delphi study. The experts arrive at eight currently relevant and two potential
business models related to OSS (Okoli & Nguyen 2015). As a result of the literature review regarding business models related to OSS, Table 3.4 gives an overview of models discussed in current studies that generate a revenue stream in conjunction with OSS. These business models are considered in more detail in the following. In addition, there are also business model approaches related to OSS without a revenue stream from OSS, such as business transformation, cost and risk reduction and sharing, respectively, as well as outsourcing (e.g., Ågerfalk & Fitzgerald 2008, Alexy 2009, Fitzgerald 2006, Raymond 2001), which have to be mentioned but are not considered further here. Moreover, it should be noted that the given overview of business models related to OSS does not claim to be exhaustive. Over time, OSS business model approaches have changed, so that today there exists a large number of possible business models related to OSS and diverse hybrid forms of them. Aslett (2008) reveals through an empirical study surveying 114 companies in the field of OSS that there is not just a handful of OSS-related business models but a large variety (here 88 models) due to different combinations of individual factors, such as development model, licensing strategy as well as primary, secondary and tertiary revenue-generation approach (Aslett 2008). Table 3.4 states currently discussed business models with revenue streams related to OSS, a short description of the models and the authors who currently discuss the corresponding model. Not all authors use the exact term for the designation of the model in their publications. Nevertheless, an assignment to the corresponding model can be made from the given content descriptions. The presented business models related to OSS are not distinct. Mixed forms of the individual models exist; for example, the open core model can be seen as an extension of the dual- / multi-licensing model (Lampitt 2008, Riehle 2012).
In addition, in practice, companies often do not operate according to just one of the illustrated business models, but rather according to a hybrid of several of them (Aslett 2008).

Table 3.4 Business Models in the Field of OSS. (Source: adapted from Perr et al. (2010, p. 441 f.) and extended by the doctoral candidate)
Models | Descriptions | Authors
Community Source / Consortia | Consortia of end-user organizations or institutions jointly develop OSS applications to be used by all (Perr et al. 2010, p. 442). | Chesbrough & Appleyard (2007), Perr et al. (2010)
Device | Vendor sells and supports hardware device or appliance incorporating OSS (Perr et al. 2010, p. 442). | Chesbrough & Appleyard (2007), Lakka et al. (2011), Perr et al. (2010)
Dual- / Multi-Licensing | Vendor licenses software under different licenses (free ‘public’ or ‘community’ license versus paid ‘commercial’ license) based on customer intent to use, modify or distribute the software (Perr et al. 2010, p. 441). | Chesbrough & Appleyard (2007), Hall (2017), Lakka et al. (2011), Okoli & Nguyen (2015), Perr et al. (2010), Popp (2015), Riehle (2012)
Hybrid / Proprietary Extensions | Companies broadly proliferate OS applications and monetize through sale of proprietary versions or product line extensions. Variants include mixed OS / proprietary technologies or service with free trial or ‘community’ versions (Perr et al. 2010, p. 442). | Bonaccorsi et al. (2006), Chesbrough & Appleyard (2007), Krishnamurthy (2005), Lakka et al. (2011), Okoli & Nguyen (2015), Perr et al. (2010), Popp (2015), Riehle (2012), Watson et al. (2008)
Open Core / Freemium | Vendor offers a limited, ‘standard’ or ‘lite’ version of the offered software under a FLOSS license, often also provided as Software as a Service (SaaS). Revenue is generated through enhanced or ‘enterprise’ versions, including extended functionality or performance, of the software under a commercial license (Hall 2017, p. 436). | Hall (2017), Krishnamurthy (2005), Okoli & Nguyen (2015), Popp (2015), Riehle (2012)
Open Platform | Vendor provides a computing or service platform under a FLOSS license. Generation of revenues through advertising, complementary commercial services or products, such as plug-ins or extensions (Hall 2017, p. 436 f.). | Hall (2017), Okoli & Nguyen (2015)
Professional Services / Consulting | Generation of revenues through the provision of professional and legal services, training, auditing, consulting in the context of OSS or customization of OSS (Perr et al. 2010, p. 441). | Bonaccorsi et al. (2006), Chesbrough & Appleyard (2007), Hall (2017), Krishnamurthy (2005), Lakka et al. (2011), Okoli & Nguyen (2015), Perr et al. (2010), Popp (2015), Riehle (2012), Watson et al. (2008)
Subscription | Generation of revenues through annual service agreements bundling OSS with customer / technical support and certified software updates and bug fixes (Perr et al. 2010, p. 441). | Bonaccorsi et al. (2006), Chesbrough & Appleyard (2007), Krishnamurthy (2005), Lakka et al. (2011), Okoli & Nguyen (2015), Perr et al. (2010), Riehle (2012), Watson et al. (2008)
Support | Generation of revenues through the sale of customer support contracts (Perr et al. 2010, p. 441). | Bonaccorsi et al. (2006), Chesbrough & Appleyard (2007), Krishnamurthy (2005), Lakka et al. (2011), Okoli & Nguyen (2015), Perr et al. (2010), Popp (2015), Riehle (2012), Watson et al. (2008)

After the detailed explanation of business models related to OSS, it can be concluded that none of the business models draws revenues directly from OSS. The majority of the business models are based on the provision of complementary services and products. Moreover, the boundaries between the use of, the contribution to and the distribution of OSS are becoming blurred. This is because a user can quite easily become a contributor, especially in connection with the open platform approach, and because of the ever-increasing combination of OSS with proprietary software, which is primarily utilized in the dual- / multi-licensing as well as the hybrid / proprietary extensions approach. Finally, it should be noted that applicable business models related to OSS are connected to the control and ownership structure of the OS community. In this context, the following two OS community facets are distinguished:
• In Community OS projects, control is exercised by a community of stakeholders and the source code of the project is released under a single license. From a market perspective, anyone can start a business related to the OS project, provided that the software license terms are complied with (Riehle 2012). Business models compatible with community OS projects are, for example, professional services / consulting, support and subscription.
• In Commercial OS projects, control is in most cases exercised by one stakeholder with the intention of commercially exploiting the project, which is possible because the stakeholder, most likely a company, owns the copyright of the software. On the one hand, the stakeholder publishes the source code under an OS license to fulfill the requirements of an OS company, and on the other hand, the software can also be released under a commercial license (Riehle 2012). Business models compatible with commercial OS are, for instance, dual- / multi-licensing as well as open core / freemium (Lampitt 2008, Olson 2008, Riehle 2012, Shahrivar et al. 2018).
From the given insight into business models related to OSS, it can be seen that the topic is broad in content and very diverse. Nonetheless, numerous business models that are compatible with the OS license terms have successfully developed around OSS.
3.4 Summary
The aim of this chapter was to provide an in-depth introduction to the field of OSS in order to create an understanding of the background related to OSS that is required for this doctoral thesis. Through the explanation of the differences between proprietary software and OSS, a structured introduction to OSS was given, separated according to legal (copyright versus copyleft), technological (binary code versus human-readable source code) and organizational (cathedral versus bazaar) aspects. Further, a summarizing overview of the formation of the OS movement based on the most important and relevant events in this context was given. To shed light on the nuances between OSS and FS, the relevant license principles of OSS (ten principles) and FS (four freedoms) were delineated in detail. As OSS is developed by a community of heterogeneous contributors who are spread all over the world, it is vital to understand the characteristics of communities in general and of OSS communities in particular. The fundamental components of communities in general were explained and, based on that, the attributes of online communities derived by scholars and, in particular, the properties of
OSS development communities were contrasted in detail. With reference to the SDT of Deci & Ryan (1985), the intrinsic and extrinsic motivational drivers of voluntary OSS contributors were considered and, further, the drivers of firms that are active in OSS communities were examined in depth by presenting economic, technological and social drivers. As an extension of the drivers of companies getting involved in OSS communities, a variety of business models related to OSS have been revealed and further analyzed to show how firms can run a business in accordance with the license terms and conditions existing around OSS.
4 The Linux Kernel Project
The LK project is one of the most successful and one of the largest collaborative OSS development projects ever started. The LK itself is one of the largest individual parts as well as the fundamental component of nearly any Linux system, such as mobile, desktop, server or embedded device operating systems (Corbet & Kroah-Hartman 2017). Here, it is important to distinguish
• the Linux kernel, which is a monolithic Unix-like operating system kernel that forms the core of Linux operating systems by providing essential functions, for instance, hardware as well as system security and integrity management or management of user program executions (Corbet & Kroah-Hartman 2017, Love 2005), from
• Linux in general, which usually describes the broad range of Linux operating systems for which the LK forms the basis. The wide spectrum of Linux operating systems includes Linux distributions for deployment on personal computers, servers (e.g., CentOS, Debian, Fedora, openSUSE or Ubuntu) and supercomputers as well as special Linux derivatives for use on embedded devices (e.g., routers, smart TVs and network-attached storage) or mobile devices, like Android running on tablet computers, smartphones and smartwatches (Butler 2011, Shacklette 2007).
Although Linux plays a rather subordinate role as a desktop operating system, with a global market share of 1.64% in 2017 (Statista 2018) compared to Microsoft Windows (83.75% global market share in 2017) or Apple Mac OS X (12.01% global market share in 2017), today the LK is used in a broad variety of devices, systems and software that are of significant importance. For example, the LK provides basic services for the popular mobile operating system Android, whose smartphone operating system market share was 84.80% in 2016 (International Data Corporation (IDC) 2018), and for a multiplicity of embedded devices (62% market share (Corbet & Kroah-Hartman 2017)) like routers, smart TVs or GPS navigation devices. In addition, most of the world’s supercomputers (99% market share (Corbet & Kroah-Hartman 2017)) run a specific Linux distribution as an operating system, and a majority of public cloud providers utilize Linux as their operating system: 90% of public cloud workload is processed by Linux systems (Corbet & Kroah-Hartman 2017).
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2020. D. Homscheid, Firm-Sponsored Developers in Open Source Software Projects, Innovation, Entrepreneurship und Digitalisierung, https://doi.org/10.1007/978-3-658-31478-1_4
4.1 History of and Facts about the Linux Kernel
A Short Outline of the History of the Linux Kernel
In April 1991, Linus Torvalds, a computer science student at the University of Helsinki, Finland, started work on the LK out of a simple idea for a computer operating system (Torvalds 1999). In August 1991, Torvalds asked in a newsgroup on Usenet for feedback on things people liked or disliked about the operating system Minix, as his work was in a way similar to it. After this appeal, the attention paid to the work of Linus Torvalds and the code contributions from other voluntary developers to the LK project increased rapidly (Kuwabara 2000). In September 1991, Torvalds published version 0.01 of the LK, comprising 10,239 lines of code, via the Finnish University and Research Network. About three years later, in March 1994, LK version 1.0.0 was released and comprised 176,250 lines of code (Antoniol et al. 2002, Kuwabara 2000, Merlo et al. 2002). In 2007, the Linux Foundation was formed as the legal body of the LK project by the merger of the Open Source Development Labs (OSDL) and the Free Standards Group (FSG) (Chesbrough & Appleyard 2007). The tasks of the non-profit organization include standardization in the field of Linux, the legal protection of the developers’ source code, the promotion and protection of the Linux brand as well as the payment of important LK fellows such as Linux creator Linus Torvalds or lead maintainer Greg Kroah-Hartman (de Laat 2007, The Linux Foundation 2018). Over the past two decades, the LK project has continued to develop successfully, bringing release 4.13, published in September 2017, to 60,538 files and 24,766,703 lines of code (Corbet & Kroah-Hartman 2017).
Facts about the Linux Kernel
The LK source code is licensed under the GNU GPL version 2.0. A salient point with regard to the license is that modifications or derived works of the LK source code
must also be published under the GNU GPL version 2.0¹ (Love 2005). In general, the LK, which is written in the programming language C, is built in a monolithic way, whereby, for instance, device drivers are implemented as loadable kernel modules that can be loaded or unloaded while the system is running. The tasks that the LK handles include, for example, services around multitasking, threading, shared libraries, memory management and demand loading, to name just a few (Bovet & Cesati 2005, Love 2005). The loadable module principle is one of the success factors of the LK. The principle is implemented by a data abstraction layer, which provides a common interface for selected LK subsystems. In this way, developers can focus on programming their module, such as a device driver, without understanding the full complexity of the kernel (Bovet & Cesati 2005, Mauerer 2010). In the same way, hardware-specific code is organized into separate modules in each subsystem. This ensures that the hardware-specific code of the LK can be implemented at a manageable expense for new hardware platforms (Lee & Cole 2003, Mauerer 2010).
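The loadable module principle described above can be illustrated with a small user-space analogy. The following is only a minimal sketch in Python, not kernel code: the names `Driver`, `Subsystem` and `FakeSensor` are hypothetical and do not correspond to any LK interface. It merely shows how a common interface lets a module be loaded and unloaded at runtime without the module author knowing the rest of the system.

```python
from typing import Dict, Protocol


class Driver(Protocol):
    """Hypothetical common interface that every module has to implement."""

    name: str

    def read(self) -> str: ...


class Subsystem:
    """Toy stand-in for a subsystem that only knows the common interface."""

    def __init__(self) -> None:
        self._modules: Dict[str, Driver] = {}

    def load(self, driver: Driver) -> None:
        # Analogous to loading a module while the system is running.
        self._modules[driver.name] = driver

    def unload(self, name: str) -> None:
        # Analogous to unloading a module; everything else keeps working.
        self._modules.pop(name, None)

    def read_all(self) -> Dict[str, str]:
        # The subsystem calls each module only through the shared interface.
        return {name: drv.read() for name, drv in self._modules.items()}


class FakeSensor:
    """Example module: its author only needs to know the Driver interface."""

    name = "fake_sensor"

    def read(self) -> str:
        return "42"


if __name__ == "__main__":
    subsystem = Subsystem()
    subsystem.load(FakeSensor())
    print(subsystem.read_all())
    subsystem.unload("fake_sensor")
    print(subsystem.read_all())
```

In this sketch the subsystem never inspects the internals of a module, which mirrors the point made in the text: the interface, not the full system, is what a module developer has to understand.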
4.2 Governance of the Linux Kernel and the Linux Kernel Community
Introduction into the Governance of the Linux Kernel
In general, the understanding of OSS governance can be rather broad (Lynn et al. 2001, Markus 2007) and includes, for instance, software license regulations (e.g., de Laat 2005), roles and responsibilities (e.g., Mockus et al. 2002) as well as outcome control (e.g., Lattemann & Stieglitz 2005) and norms and beliefs of reciprocity (e.g., Shah 2006). Regarding the LK, governance is considered in terms of the structure of roles and responsibilities. The governance structure of the LK project can be seen as pyramid-like and is very much based on trust. This refers both to the trust that Linus Torvalds places in the subsystem maintainers he appoints and to the trust that maintainers place in developers. This circumstance is also reflected by the fact that the maintainers retain their function until they voluntarily resign (Software Engineering Daily 2017). Nonetheless, in the past there have been resignations of LK maintainers due to behavior in the community that they did not want to support, first and foremost the recurrent verbal outbursts of Linus Torvalds (e.g., Ackerman 2013, Sharwood 2015).
1 For more information about characteristics of software licenses in the field of OSS and FS, please refer to Sub-Section 3.1.3.
4 The Linux Kernel Project
Specifically, the LK project has established an organizational structure that can withstand growing demands due to ever-increasing workloads. The LK community implements a pyramidal role concept (illustrated in Figure 4.1), in which the subsystem maintainers and the branch, driver and file maintainers occupy the most important positions (Moon & Sproull 2010, Shaikh & Cornford 2003). In the literature, maintainers at the latter level are also partly referred to as credited developers or trusted lieutenants (Dafermos 2001, Iannacci 2005). In addition, the LK governance structure is associated with the benevolent dictator model, as Linus Torvalds steers the project as its sole leader (Raymond 1999). Here, dictator is to be understood as a diplomatic leader who places influential people in the right positions in the community structure and stands for a project vision that the community members follow (Gardler & Hanganu 2013a). Nevertheless, Torvalds does not decide everything alone, mainly due to the ever-increasing workload. Hence, the maintainers bear much of the responsibility for the subsystems they supervise, including reviewing and merging patches as well as making decisions, which are supported by Torvalds (Corbet & Kroah-Hartman 2017). The delegation of tasks to the subsystem maintainers is underlined by the following example: from LK version 4.7 to version 4.13, Linus Torvalds signed off only 207 patches, or 0.3% of the total (Corbet & Kroah-Hartman 2017). In addition, the LK consists of approximately 150 main subsystem trees and around 700 subsystem maintainers (Corbet & Kroah-Hartman 2017, Software Engineering Daily 2017), whereby the subsystem trees are split into around 1,800 branches overall (including the subsystem trees) with a total of about 1,300 specific subsystem and branch maintainers².
Figure 4.1 Linux Kernel Governance Structure (pyramid levels from top to bottom: Linus Torvalds; Subsystem Maintainers; Branch, Driver, File Maintainers; Developers, Submitters)
2 Information was evaluated from the Linux kernel maintainers file, as of July 9, 2018, https://github.com/torvalds/linux/blob/master/MAINTAINERS, last accessed December 18, 2018.
Insights into the Linux Kernel Community
The community of contributors forms the vital foundation of the LK development project. The LK project pursues the Bazaar model as its contribution approach, that is, it encourages a broad base of developers to contribute and has short and frequent release cycles (Gardler & Hanganu 2013b, Raymond 1999). In the early days of the LK project, the community was made up of enthusiastic hobbyists who voluntarily wrote kernel source code in their spare time. As the project evolved, more and more companies participated in the development of LK source code, mainly with the aim of optimizing the kernel for their specific needs, for example, bringing company-specific hardware drivers into the kernel
(Corbet & Kroah-Hartman 2017). LK contributors, whether hobbyists or firm-sponsored, who, for example, fix bugs, program new drivers or optimize source code, are of particular importance. Over the last decade the share of the source code contributed by the most active developers has changed. This is mainly due to the increasing variety, intensity and complexity of tasks, which include, for example, new driver development, bug fixing, reviewing of contributed patches, hardening the kernel as well as testing and maintaining previous LK versions (Love 2005). This situation is illustrated, inter alia, by the following facts: around 2005 the Top 20 most active contributors were responsible for writing about 80% of the LK source code (Computerworld UK 2007). This ratio has changed significantly; today the Top 30 most active developers contribute about 19% of the source code. This is the result of an analysis of the Top 30 LK developers contributing to LK version 4.8 (released in August 2016) to LK version 4.13 (released in September 2017) (Corbet & Kroah-Hartman 2017). On average, about 1,700 developers and about 260 companies contributed to each LK release in the development cycles of LK version 4.8 to 4.13. Over the entire six LK development cycles (LK version 4.8
to 4.13) nearly 83,000 patches have been merged, which equates to an average rate of 8.5 patches per hour. Highly involved companies included Intel (10,833 patches, 13.1% of the total), Red Hat (5,965 patches, 7.2% of the total) and Linaro (4,636 patches, 5.6% of the total) (Corbet & Kroah-Hartman 2017).
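As a plausibility check of the figures quoted above, the following minimal Python sketch recomputes the company shares and the time span implied by the quoted merge rate. It is not part of the original analysis. The total of 83,000 is the rounded value from the text ("nearly 83,000"), so the results are approximate, and all variable and function names are illustrative.

```python
# Figures quoted in the text for LK versions 4.8 to 4.13
# (Corbet & Kroah-Hartman 2017). The total is rounded ("nearly 83,000").
TOTAL_PATCHES = 83_000
PATCHES_PER_HOUR = 8.5

company_patches = {"Intel": 10_833, "Red Hat": 5_965, "Linaro": 4_636}


def share_percent(patches: int, total: int = TOTAL_PATCHES) -> float:
    """Share of all merged patches in percent, rounded to one decimal."""
    return round(100 * patches / total, 1)


# Time span implied by the quoted rate: patches / (patches per hour) / 24 h
implied_days = TOTAL_PATCHES / PATCHES_PER_HOUR / 24

for name, count in company_patches.items():
    print(f"{name}: {share_percent(count)} %")   # 13.1 %, 7.2 %, 5.6 %
print(f"implied time span: {implied_days:.0f} days")  # 407 days
```

The recomputed shares match the percentages given in the text, and the quoted rate of 8.5 patches per hour corresponds to a period of roughly 407 days for the six development cycles.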
4.3 Explaining the Linux Kernel Development Process
The LK development team publishes a LK release every three months on average, which is possible because of the fast-moving development process. LK development follows a well-structured process with well-established procedures, which has contributed to the success of the project over the last two decades (Corbet & Kroah-Hartman 2017). To give an impression of how the LK development process works, it is described below in a short and simplified way. First, a developer programs and tests the change (e.g., a bug fix, a driver or a code optimization) that he wants to bring into the LK. The number of subsystems the change affects determines whether it is submitted as a single patch or as multiple patches. Second, for each affected subsystem the patch is sent by its author via email to the maintainer and the mailing list of the LK subsystem. Third, a review process starts, in which the maintainer of the subsystem and the readers of the subsystem mailing list examine the patch and give feedback via the subsystem mailing list. Fourth, once the review and revision process is complete and the maintainer considers the change desirable, he brings the patch into his Git³ LK tree. The next step depends on the importance of the patch. If the patch is of high importance (e.g., a critical bug fix), a pull request is sent immediately to Linus Torvalds. Otherwise, the request is sent to Linus Torvalds within the next merge window. Directly after a LK release, the two-week merge window of the next LK version starts (see also Figure 4.2, which depicts the LK release cycle). The final step in getting a patch into the LK is then in the hands of Linus Torvalds, who decides which patches to include in the next LK release. After the merge window, the closed phase starts with intensive testing of the new LK release candidate.
Over a timespan of about eight to ten weeks no more patches with new functionality are accepted, because the community focuses on tasks such as testing and bug fixing to get the next LK release candidate stable and error free. During the release candidate cycle, Linus Torvalds publishes a LK release candidate every Sunday. With the release of the new LK by Linus Torvalds, the compilation of the next LK release candidate starts again with the merge window (The Kernel Development Community 2018, Software Engineering Daily 2017).
3 The free and open source distributed version control system Git has been used since 2005 for the management of the LK source code (Kroah-Hartman et al. 2009), https://git-scm.com, last accessed December 18, 2018.
Figure 4.2 Exemplary Linux Kernel Release Cycle (timeline labels: v.4.12 release; merge window (two weeks); v.4.13-rc1; v.4.13-rc2 to v.4.13-rcX; release candidate cycle (eight to ten weeks); v.4.13 release)
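The release cycle described above can be turned into a small date calculation. The following Python sketch is a simplified illustration under stated assumptions, not the actual release tooling: it assumes that the first release candidate falls on the first Sunday on or after the close of the two-week merge window, that one candidate follows per week, and that the number of candidate weeks (`rc_weeks`) is a free parameter. The start date in the usage example is purely illustrative.

```python
from datetime import date, timedelta


def release_cycle(previous_release: date, rc_weeks: int = 8):
    """Return (merge_window_end, rc_dates) for a simplified release cycle.

    Assumptions of this sketch: the merge window lasts exactly two weeks and
    release candidates appear on consecutive Sundays afterwards.
    """
    merge_window_end = previous_release + timedelta(weeks=2)
    # date.weekday(): Monday == 0 ... Sunday == 6
    days_to_sunday = (6 - merge_window_end.weekday()) % 7
    first_rc = merge_window_end + timedelta(days=days_to_sunday)
    rc_dates = [first_rc + timedelta(weeks=i) for i in range(rc_weeks)]
    return merge_window_end, rc_dates


# Usage with an illustrative start date (a Sunday):
window_end, candidates = release_cycle(date(2017, 7, 2), rc_weeks=8)
print(window_end)       # 2017-07-16
print(candidates[0])    # 2017-07-16
print(candidates[-1])   # 2017-09-03
```

With eight candidate weeks, the sketch yields a cycle of about ten weeks from the previous release to the last candidate, which lies within the range given in the text.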
As can be concluded from the LK development process explained above, the LK development trajectory is not specified by a central roadmap. Rather, further development is determined by technical guidelines and the available resources that are brought in independently by individuals and companies (Morton 2005) or, as Linus Torvalds puts it, “Linux is evolution, not intelligent design” (as cited in Kroah-Hartman 2008).
4.4 Linux Kernel Project as Research Context
To find a relevant OSS project for this doctoral thesis, different aspects were taken into account, for example, the size of the OSS project, its activity and continuity, company involvement and the availability of a large set of data. Finally, the LK project, which has served as an example of a suitable OSS project in many previous research studies (e.g., Antikainen et al. 2007, Israeli & Feitelson 2010, Lee & Cole 2003, Moon & Sproull 2010, Riehle et al. 2014), was chosen as the research context for all studies of this doctoral thesis (see Chapters 6 to 8). The LK project was initiated by Linus Torvalds in 1991 and has been one of the most active OSS projects since its beginning. There are software releases every three months on average, which are possible because of the fast-moving development process and the broad base of contributors, ranging from hobbyists to companies. Thus, it involves more people than any other OSS project. The kernel itself makes up the core component of any Linux system and is used in operating systems ranging from those for mobile devices to those for supercomputers. Typically, a new release of the kernel comprises over 10,000 patches contributed by over 1,700 developers representing over 260 companies and is published under the GNU GPL version 2.0 (Corbet & Kroah-Hartman 2017). Besides the fact that the LK is one of the largest cooperative software projects ever started, it also has economic relevance, as many companies
have business models that rely on the LK or on software running on top of the LK. Many of these companies actively participate in the improvement of the kernel and thereby influence the direction of its development. Very active companies in kernel development include, among others, Red Hat, Intel, IBM, Samsung, Google and Oracle (Corbet et al. 2013, Corbet & Kroah-Hartman 2017).
4.5 Summary
The LK project is a showcase for a very successful collaborative OSS development project, whose success is achieved by both hobbyists and firms, more specifically, firm-sponsored contributors. Over the last two decades the project around the monolithic Unix-like operating system kernel has developed from the hobby project of a single interested computer science student into an OSS project represented by the Linux Foundation, with approximately 1,700 active developers and 260 firms contributing to each LK release and over 24.76 million lines of code as of September 2017 (Corbet & Kroah-Hartman 2017). The project, with its pyramid-like governance structure, has repeatedly adapted to new circumstances that resulted from increasing development speed, complexity and the growing size of the project. Regarding the involvement of a variety of firms in the LK project, it can be concluded that they also benefit from their participation in the extension and improvement of the LK. On the one hand, they benefit with respect to their business; on the other hand, they may influence the direction of LK development. Thus, as the LK forms the basis for a large variety of software systems and devices, it has high economic relevance, since a large number of companies operate business models based on software or devices incorporating the LK. Figure 4.3 shows where the LK project can be located in a matrix that spans a value creation dimension and a value capture dimension and contains examples of closed and open innovation. The value creation of the LK project, and specifically of the LK source code, is driven and accomplished by a large community.
Moreover, the value created is captured by the broad ecosystem, as the outcome of the LK project, the LK source code, is comparable to a public good with its characteristics of non-excludability and non-rivalry (Chesbrough & Appleyard 2007, Wasko et al. 2009). When IBM develops source code that is incorporated into the LK, the value is created in-house at IBM, but the whole LK ecosystem benefits from it and can capture the value accordingly. In contrast, the in-house value creation of Microsoft’s Windows operating systems and the associated value capture by Microsoft are an example of a closed innovation approach with proprietary software.
Figure 4.3 Linux Kernel Value Creation–Value Capture Matrix. (Source: Chesbrough & Appleyard (2007, p. 63), reprinted by permission from SAGE Publishing)
(Matrix axes: value capture by Company or Ecosystem; value creation In-House or Community-Driven. Entries shown under Company: Microsoft’s OS, MySpace, Google, YouTube. Entries shown under Ecosystem: IBM Linux code, Linux Kernel, Wikipedia, Pirated Music, Complementors.)
5 Collection and Cleanup of Network and Source Code Data
The data basis of the studies of this thesis, which are presented in Chapters 6, 7 and 8, is formed by two data sources: the LKML and the LK version control system Git. The processes of data crawling and data cleaning for both data sources are described in Sections 5.1 and 5.2.
5.1 Linux Kernel Mailing List Data
5.1.1 The Process of Data Crawling
To collect the data necessary for the research studies of this doctoral thesis, the LKML web archive was crawled using the online platform ‘marc.info’¹. Via the ‘linux-kernel’ mailing list², LK development issues (e.g., patches) are discussed and kernel bugs are reported. For crawling the LKML web archive, storing the retrieved information in a local MySQL database and processing the data, a custom-made Java-based software package³ was used and extended by the doctoral candidate. In the context of this doctoral thesis, four modules of the Java-based software package were used in original and extended form. These modules are described briefly below⁴.
1 http://marc.info/?l=linux-kernel, last accessed December 18, 2018.
2 http://vger.kernel.org/vger-lists.html#linux-kernel, last accessed December 18, 2018.
3 The software used was developed by Christoph Schneider as part of his Master thesis at the University of Koblenz-Landau in 2009 (see Schneider (2009)).
4 For more detailed information about the functioning of the software package and the underlying database model, please refer to Christoph Schneider’s Master thesis (Schneider 2009, pp. 47–70).
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2020 D. Homscheid, Firm-Sponsored Developers in Open Source Software Projects, Innovation, Entrepreneurship und Digitalisierung, https://doi.org/10.1007/978-3-658-31478-1_5
Web Crawler: The first module of the software package is the web crawler, which retrieves the mailing list messages from the online archive platform ‘marc.info’ and stores them in a local MySQL database. A MySQL database server was installed and configured accordingly on a private cloud instance using the cloud computing resources kindly provided by the Institute for Web Science and Technologies⁵ of the University of Koblenz-Landau. As the web crawler was specifically developed for parsing the platform ‘marc.info’, it captures in a first step the structure of the message threads, separated by months and years, and in a second step the metadata and content of each LKML message belonging to a specific message thread. Figure 5.1 shows an example of the first 30 mailing list message threads as of October 2013. Communication on the same topic, identifiable by the same subject line of messages, is grouped together in message threads by the mailing list web archive platform. The number of messages in a conversation is given in square brackets (see Figure 5.1, column three). Moreover, for each message thread, the thread list of a month gives the date, the subject and the sender name of the first message. Figure 5.2 shows the content and additional metadata of a specific mailing list message, which was crawled, processed and stored by the software in a local MySQL database according to the underlying database schema. The metadata comprises information about
• the mailing list the message was sent to (here the ‘linux-kernel’ list),
• the subject line of the message,
• the name and encoded email address of the message sender and
• the posting timestamp.
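To make the first crawling step (thread lists separated by months and years) more concrete, the following Python sketch enumerates one thread-list URL per month and defines a record for the four metadata items listed above. It is an illustration only and not the Java crawler used in the thesis. The URL pattern is copied from the source given for Figure 5.1; that only the `b=YYYYMM` part changes per month is an assumption, and the parameters `r` and `w` are left unchanged because their meaning is not stated in the text. All field and function names are hypothetical.

```python
from dataclasses import dataclass

# Pattern taken from the Figure 5.1 source URL; only b=YYYYMM is varied here
# (assumption). The parameters r and w are copied unchanged.
THREAD_LIST_URL = (
    "http://marc.info/?l=linux-kernel&r=1&b={year:04d}{month:02d}&w=2"
)


@dataclass
class MessageMetadata:
    """The four metadata items named in the text (field names illustrative)."""
    mailing_list: str
    subject: str
    sender: str          # name and encoded email address of the sender
    posted_at: str       # posting timestamp as shown by the archive


def month_urls(first_year: int = 1996, last_year: int = 2014) -> list[str]:
    """One thread-list URL per month of the crawled period."""
    return [
        THREAD_LIST_URL.format(year=year, month=month)
        for year in range(first_year, last_year + 1)
        for month in range(1, 13)
    ]


urls = month_urls()
print(len(urls))  # 228 months for 1996 to 2014
```

A crawler built on such a list would fetch each monthly thread list first and then follow the threads to the individual messages, which corresponds to the two steps described above.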
5 https://west.uni-koblenz.de/en, last accessed December 18, 2018.
Figure 5.1 LKML Thread List of a Month. (Source: http://marc.info/?l=linux-kernel&r=1&b=201310&w=2, last accessed December 18, 2018)
Person Mapping: Given the long time period of 19 years (1996 to 2014) covered by the crawled LKML data, the captured information about individuals active on the mailing list requires detailed consideration. The long time period and the email data itself pose challenges when considering the LKML communication on an individual level. First, individuals may use several email addresses simultaneously, for example, a corporate email address and a private one. Second, the email addresses or the affiliations of individuals may change over time, for example, when a contributor starts as a LK hobbyist with a private email address and later uses a corporate email address after getting a job in LK development, or uses different corporate email addresses because of changes of employer. Third, even the sender name of the
same email address may differ from time to time because of, for instance, typing errors (Schaarschmidt 2012). To overcome these challenges in connection with email address data and to increase the validity of individual-level evaluations of LKML actors, the second module of the software package identifies multiple occurrences of individual LKML actors by analyzing the email addresses as well as the sender names for each individual within the database. The program creates a mapping table, which maps multiple occurrences of an individual with either the same sender name but various email addresses or the same email address but different sender names to one related person object. This software module checks first for email addresses and second for sender names. To filter out general, irrelevant messages sent to the LKML, two blacklists (one for email addresses and one for sender names) were used to exclude specific email addresses (e.g., containing mailer-daemon, postmaster or noreply) or sender names (e.g., mail delivery system) from the mapping process. Mapping multiple occurrences of an individual by name may lead to an incorrect mapping if different actors active on the LKML share the same name and are thus mapped to one person object. This limitation is accepted at this point, but it is analyzed in more depth in the context of the individual studies for the period of investigation of the LKML data chosen for hypotheses testing. In addition, all multiple occurrences of an individual were connected to the first person object the mapping software could determine in the database. The resulting limitations are resolved in connection with the extension of the software module Adjacency Matrix Creator, which is explained below.
Figure 5.2 Metadata and Content of a LKML Message. (Source: http://marc.info/?l=linux-kernel&m=138317815814819&w=2, last accessed December 18, 2018)
Maintainer and Credits Alignment: To see whether central actors in the LKML also occupy central positions in the governance structure of the LK project, for example, as maintainers, or stand out by their source code contributions to the LK, the identified actors of the LKML have to be attributed appropriately. This task was executed by the third module of the software package. As basis for this
attribution the maintainers and credits files of the LK version 4.4⁶, released January 10, 2016, were used. The maintainers file lists in a very detailed way all LK modules and components together with the person(s) who maintain(s) all code changes (i.e., patches) for the specific LK module or component. The email addresses of the maintainers stated in this list were used to draw a connection to the LKML actors. In line with the OS ideology that contributors are named for their source code contribution in the corresponding source code file, the LK project furthermore maintains a so-called credits file, which lists LK contributors who showed above-average commitment in their contributions to the LK. Again, the email addresses of the credited developers listed in the credits file were used to establish the link to the identified LKML actors. The program compares the email addresses included in these two files with the email addresses in the person table of the LKML database. In case of a match, a corresponding flag is set for the identified person in the person table.
Adjacency Matrix Creator: To further analyze the LKML actors’ data, the information stored in the local MySQL database has to be brought into a format suitable for social network analysis, as the communication of the numerous individuals on the mailing list has to be modeled as a directed network. Therefore, the fourth module of the software package extracts the gathered data from the relevant tables of the local MySQL database, transforms it into a format (i.e., adjacency matrices) that can be handled by statistical programs to calculate the required social network measures (Boccaletti et al. 2006) and stores it. To achieve this, the adjacency matrix creator first retrieves mailing list messages from the local database on a monthly basis for a given period of time.
During this process the software checks for duplicate messages, which could occur if a message was sent to the LKML and at the same time to additional mailing lists⁷, and removes them from the internal message list. Second, the software iterates over the created internal message list to check whether a message had a specific recipient in its message thread to whom the sender responds with his message. If a specific recipient can be identified, the software creates a relation between the message sender and the recipient. If no explicit recipient can be identified, the program creates a relation between the message sender and the pre-sender of the message in the message thread.
6 https://mirrors.edge.kernel.org/pub/linux/kernel/v4.x/linux-4.4.tar.gz, last accessed December 18, 2018.
7 Overview of mailing lists for LK developers available via kernel.org, http://vger.kernel.org/vger-lists.html, last accessed December 18, 2018.
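The two steps just described (removing duplicate messages, then creating sender-to-recipient relations) can be sketched in Python as follows. This is a minimal illustration, not the Java module used in the thesis. It assumes that the "pre-sender" is the sender of the preceding message in the thread, and the message tuples, actor names and function names are invented for the example. Reply counts are collected per directed pair and written into a matrix whose rows are senders and whose columns are recipients.

```python
from collections import Counter

# Hypothetical input: (message_id, sender, explicit_recipient, pre_sender).
# None means "not available".
messages = [
    ("m1", "alice", None, None),        # thread starter, nobody to reply to
    ("m2", "bob", "alice", "alice"),
    ("m2", "bob", "alice", "alice"),    # duplicate, e.g. cross-posted
    ("m3", "carol", None, "bob"),       # no explicit recipient: use pre-sender
    ("m4", "alice", "bob", "carol"),
]


def reply_edges(msgs):
    """Count directed replies (sender, recipient), skipping duplicate IDs."""
    seen_ids = set()
    edges = Counter()
    for message_id, sender, recipient, pre_sender in msgs:
        if message_id in seen_ids:
            continue                      # duplicate message
        seen_ids.add(message_id)
        target = recipient if recipient is not None else pre_sender
        if target is not None:
            edges[(sender, target)] += 1
    return edges


def adjacency_matrix(edges, actors):
    """Rows are senders, columns are recipients, cells are reply counts."""
    index = {actor: i for i, actor in enumerate(actors)}
    matrix = [[0] * len(actors) for _ in actors]
    for (sender, recipient), weight in edges.items():
        matrix[index[sender]][index[recipient]] = weight
    return matrix


edges = reply_edges(messages)
matrix = adjacency_matrix(edges, ["alice", "bob", "carol"])
print(matrix)  # [[0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

In the example, the duplicate of message m2 is ignored, and carol's message without an explicit recipient is counted as a reply to bob.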
As the communication between the different actors on the LKML is thus modeled as a directed graph (i.e., a directed network), in which nodes are actors and directed edges (i.e., arcs) represent a reply of one actor to another (e.g., Borgatti & Everett 2006, Krackhardt 1994), the software stores the gathered information in an adjacency matrix, which represents the network structure in the time frame under consideration. Accordingly, a zero in the matrix states that there is no relation between the two considered actors, whereas a number greater than zero (i.e., the edge weight) indicates a relation between the two actors and at the same time gives the number of directed messages sent from, for example, actor 1 to actor 2 in the time frame the matrix was calculated for. In the following studies of this doctoral thesis, adjacency matrices form the basis for the calculation of network measures (e.g., Wasserman & Faust 1994) of the LKML actors. As stated before in the paragraph about Person Mapping, the software maps every person occurring multiple times on the LKML to the first person object the mapping software can identify in the local database. To overcome this limitation, a Java program was written that stores the person-related affiliations of the LKML actors year by year in a new database table, which is used by the adjacency matrix creator. Thus, with the help of this linkage, temporal affiliations of LKML actors can be analyzed without the aforementioned limitation. In summary, the utilized and extended software package crawls rather unstructured LKML data, processes it for storage in a database and transforms it into adjacency matrices, which represent the communication of the LKML actors as a directed graph with corresponding edge weights. This enables the calculation of network measures in the further steps. Beforehand, however, the data were subjected to intensive plausibility checks.
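The Person Mapping step referred to above can be illustrated with a short Python sketch. It is not the original Java module: it uses a union-find structure as a stand-in technique, which also merges occurrences transitively (whether the original module does so is not stated in the text). The blacklist entries are the examples named in the text; the sample names and addresses are invented, and the smallest occurrence index is kept as the representative to mirror the "first person object" rule.

```python
BLACKLISTED_EMAIL_PARTS = ("mailer-daemon", "postmaster", "noreply")
BLACKLISTED_NAMES = ("mail delivery system",)


def map_persons(occurrences):
    """Map each (sender_name, email) occurrence index to a person id.

    Occurrences sharing an email address or a sender name end up with the
    same person id, which is the index of the first occurrence in the group.
    """
    parent = list(range(len(occurrences)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    def union(i, j):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # keep the earlier occurrence as representative
            parent[max(root_i, root_j)] = min(root_i, root_j)

    first_by_email, first_by_name, kept = {}, {}, []
    for i, (name, email) in enumerate(occurrences):
        name_key, email_key = name.strip().lower(), email.strip().lower()
        if name_key in BLACKLISTED_NAMES or any(
            part in email_key for part in BLACKLISTED_EMAIL_PARTS
        ):
            continue                        # excluded from the mapping
        kept.append(i)
        # check email addresses first, sender names second
        if email_key in first_by_email:
            union(i, first_by_email[email_key])
        else:
            first_by_email[email_key] = i
        if name_key in first_by_name:
            union(i, first_by_name[name_key])
        else:
            first_by_name[name_key] = i
    return {i: find(i) for i in kept}


sample = [
    ("Jane Doe", "jane@example.org"),
    ("Jane Doe", "jdoe@corp.example"),      # same name, other address
    ("J. Doe", "jdoe@corp.example"),        # same address, other name
    ("Mail Delivery System", "mailer-daemon@example.org"),
    ("John Roe", "john@example.org"),
]
print(map_persons(sample))  # {0: 0, 1: 0, 2: 0, 4: 4}
```

As in the text, two different people who share a sender name would be merged by this approach, which is the limitation the year-by-year affiliation table is meant to address.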
5.1.2 The Process of Data Cleaning
Usually the process of data crawling is designed for one crawler instance that sequentially crawls the required content of the set time frame. Due to the time required for crawling the long period from 1996 until 2014 with a five-second delay between every web crawler request to ‘marc.info’ (to comply with their robot policy⁸ and to prevent IP address blocking by the web archive platform), several web crawler instances with different user-agent information and different IP addresses, realized through a VPN service, were used. As the data were collected by multiple crawling agents over a period of six months, threads and messages could have been crawled more than once, although the agents worked on separate months. This could occur, for example, if a message thread starts in one month and continues in the next. Even if a crawling agent is not instructed to crawl the next month, because this is done by another crawling agent, the crawler follows the thread structure and crawls the complete message thread irrespective of the months over which the communication stretches and regardless of the set crawling time period.
8 See Section ‘Robot Policy’ on http://marc.info/?q=about, last accessed December 18, 2018.
As a consequence, the crawled data of the LKML web archive, which make up about 15 gigabytes, were subjected to intense plausibility checks. Accordingly, a set of SQL statements was created to examine both the data rows of each table (i.e., checking for duplicates) and the data links between the tables, to ensure the integrity of the data (i.e., data consistency) (Date 2000). The main outcomes of the conducted checks are described below. Checking for multiple occurrences of the same unique thread and message identifier (see Sub-Section 5.1.1, paragraph Web Crawler) in the MySQL database, and thus for messages that were crawled multiple times, revealed that 106,806 messages were crawled more than once, resulting in 110,819 duplicate messages. These were removed from the database together with all dependent records with the help of a Java program developed for this purpose. Furthermore, the analysis of the data showed that the database contained message threads without any messages. Altogether 2,835 empty message threads were deleted from the database. In a further step, erroneous foreign key dependencies between the records of referencing tables were detected and corrected to ensure referential integrity (Date 2000).
Such erroneous dependencies may have arisen due to, for example, an unexpected termination of the connection between the crawling agent and the local MySQL database during the crawling and storing process. When checking the crawled information (i.e., name and email address) of individuals active on the LKML, it was discovered that the email address ‘[email protected]’ was recorded 455 times with different sender names, all in the format ‘tip-bot for’ followed by a name (e.g., Namhyung Kim or Ingo Molnar). Moreover, an additional 766 sender names containing ‘tip-bot for’ connected to personal email addresses were identified. As the tipbot⁹ is a robot that sends automatically generated emails to the LKML to inform about code merges, specifically in the tip tree, person objects containing tip-bot in the sender name or ‘tipbot@’ in the email address were excluded from further analysis.
9 For further information about LK tipbot notifications please refer to http://lwn.net/Articles/357483/ and http://linux-kernel.2935.n7.nabble.com/What-is-tip-bot-td603228.html, both last accessed December 18, 2018.
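The kinds of plausibility checks described in this sub-section (duplicate identifiers, empty threads, broken foreign key links) can be illustrated with a few SQL statements. The following Python sketch runs them against an in-memory SQLite database as a stand-in for the MySQL database used in the thesis. The table and column names are hypothetical; the actual schema is documented in Schneider (2009) and is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE thread  (id INTEGER PRIMARY KEY, archive_thread_id TEXT);
CREATE TABLE message (id INTEGER PRIMARY KEY, archive_message_id TEXT,
                      thread_id INTEGER);
INSERT INTO thread  VALUES (1, 't-a'), (2, 't-b'), (3, 't-empty');
INSERT INTO message VALUES
    (1, 'm-1', 1), (2, 'm-2', 1), (3, 'm-2', 1),  -- 'm-2' was stored twice
    (4, 'm-3', 2), (5, 'm-4', 99);                -- thread 99 does not exist
""")

# 1. Messages stored more than once, with the number of surplus copies
duplicates = conn.execute("""
    SELECT archive_message_id, COUNT(*) - 1 AS surplus_copies
    FROM message
    GROUP BY archive_message_id
    HAVING COUNT(*) > 1
""").fetchall()

# 2. Threads without any message
empty_threads = conn.execute("""
    SELECT t.id
    FROM thread t LEFT JOIN message m ON m.thread_id = t.id
    WHERE m.id IS NULL
""").fetchall()

# 3. Messages whose thread reference points to a missing thread
orphaned_messages = conn.execute("""
    SELECT m.id
    FROM message m LEFT JOIN thread t ON t.id = m.thread_id
    WHERE t.id IS NULL
""").fetchall()

print(duplicates)         # [('m-2', 1)]
print(empty_threads)      # [(3,)]
print(orphaned_messages)  # [(5,)]
```

The first query distinguishes between the number of affected message identifiers (rows returned) and the number of surplus copies (sum of the second column), which is one possible reading of the two different counts reported above.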
5.1.3 The Process of Contributor Categorization
People involved in LK development and accordingly in the communication through the LKML are motivated by different internal and/or external stimuli (see e.g., Sub-Sections 3.2.2 and 3.2.3). With respect to existing research, two major contributor groups can be distinguished in LK development: firm-sponsored developers and hobbyists (e.g., Corbet et al. 2015, Schaarschmidt 2012). To identify possible categories for contributors active on the crawled LKML, the approach of previous research and analysis studies was followed (e.g., Kroah-Hartman et al. 2008, Schaarschmidt 2012), and the domain names of the email addresses as well as the top-level domains (TLD; e.g., .edu) were examined to derive a categorization. Put simply, contributors sending messages from domains indicating that they work for the respective organizations (e.g., ericsson.com, intel.com or redhat.com), and who are thus most likely employed by the corresponding companies, are classified as firm-sponsored contributors. In addition, contributors using email addresses with a domain name related to public email providers (e.g., aol.com, gmail.com or yahoo.com) were categorized as hobbyists. Further analysis of the email addresses showed that hobbyists often use email addresses under their own personal domain name, for example, christophjaeger.info, lucasnussbaum.net or scherping.de; these were accordingly categorized as hobbyists. LK contributors with an affiliation to universities, be it, for example, students or academic staff, were mainly identified and categorized by means of the TLD, for instance, .edu or .ac.uk.
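A domain-based categorization of this kind can be sketched as a simple lookup. The following Python example is an illustration under stated assumptions, not the procedure used in the thesis, where the most frequent domain names were assigned manually. The category codes follow Table 5.1, and the rules for institutions and Linux-near organizations covered later in this sub-section are included for completeness. The domain sets contain only the examples named in the text, and the order in which the rules are checked is an assumption of this sketch.

```python
CATEGORY_NAMES = {
    0: "Undefined", 1: "Firm-sponsored", 2: "Hobbyists",
    3: "University-related", 4: "Institutions-related", 5: "Linux-near",
}

# Small excerpts built from the examples in the text (not complete lists)
FIRM_DOMAINS = {"ericsson.com", "intel.com", "redhat.com"}
HOBBYIST_DOMAINS = {"aol.com", "gmail.com", "yahoo.com",
                    "christophjaeger.info", "lucasnussbaum.net", "scherping.de"}
INSTITUTION_DOMAINS = {"cern.ch", "mpg.de", "ieee.org"}
LINUX_NEAR_DOMAINS = {"freebsd.org", "linux-foundation.org", "opensuse.org",
                      "kernel.org"}
UNIVERSITY_SUFFIXES = (".edu", ".ac.uk")
INSTITUTION_SUFFIXES = (".gov", ".mil")


def _matches(domain: str, names: set) -> bool:
    """True if the domain equals a listed name or is a subdomain of it."""
    return any(domain == n or domain.endswith("." + n) for n in names)


def categorize(email: str) -> int:
    """Return the category code for an email address (rule order assumed)."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    if _matches(domain, LINUX_NEAR_DOMAINS):
        return 5
    if _matches(domain, FIRM_DOMAINS):
        return 1
    if _matches(domain, HOBBYIST_DOMAINS):
        return 2
    if _matches(domain, INSTITUTION_DOMAINS) or domain.endswith(INSTITUTION_SUFFIXES):
        return 4
    if domain.endswith(UNIVERSITY_SUFFIXES):
        return 3
    return 0


for address in ("dev@intel.com", "someone@gmail.com", "phd@cs.columbia.edu",
                "eng@nasa.gov", "maintainer@kernel.org", "x@unknown.example"):
    print(address, "->", CATEGORY_NAMES[categorize(address)])
```

Addresses that match none of the rules receive code 0 (Undefined), which corresponds to actors whose affiliation could not be determined.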
Another contributor category that could be deduced from the LKML data comprises LK contributors identified as having affiliations with research institutions (e.g., CERN, Fraunhofer, Max-Planck-Gesellschaft), standardization organizations (e.g., IEEE, W3C) as well as government (e.g., NASA, NIST) or military (e.g., Army, Navy) institutions. This attribution was made with the help of the domain name (e.g., cern.ch, mpg.de) and the TLD (e.g., .gov, .mil). In addition, contributors affiliated with Linux-near foundations or communities were manually identified by the domain name (e.g., freebsd.org, linux-foundation.org, opensuse.org) and categorized accordingly. The categorization of LK contributors by the domain name of their email addresses is known not to be free from limitations. For example, LK contributors using an email address with the domain name @kernel.org are closely affiliated with the Linux Foundation but may also work for a LK supporting company, using the @kernel.org email address to hide their real company affiliation (Schaarschmidt 2012). Table 5.1 provides detailed information about the identified and applied LKML contributor categories.
Table 5.1 Coded LKML Contributor Categories
Code | Contributor Category | Description | Examples
null | No coding | LKML actors that were not coded |
0 | Undefined | LKML actors whose affiliation could not be determined |
1 | Firm-sponsored | LKML actors identified as firm-sponsored contributors | amd.com, hp.com, samsung.com
2 | Hobbyists | LKML actors identified as hobbyists | aol.com, gmx.net, hotmail.com
3 | University-related | LKML actors affiliated with universities | cam.ac.uk, columbia.edu, mit.edu
4 | Institutions-related | LKML actors affiliated with research / standardization institutions or public authorities | ieee.org, nasa.gov, navy.mil
5 | Linux-near | LKML actors affiliated with Linux-near foundations or communities | debian.org, kernel.org, kde.org
To go into more detail, LKML actors were assigned to a contributor group in a semi-manual, semi-automated process in order to obtain a high accuracy of
the attributions. On the basis of the crawled LKML data, ranking lists were created with SQL statements in preparation for the allocation of contributors to categories. First, a list of the most frequent domain names over the whole time frame of the crawled LKML (i.e., 1996 until 2014) was generated in descending order. Table 5.2 lists the Top 10 of these domain names. For example, 827 unique email addresses were recorded for the domain name intel.com, which means that 827 different actors with an intel.com email address were active on the LKML between 1996 and 2014. The generated list comprised the frequency of occurrence for 65,435 domain names overall. Of these, the 1,000 most frequent domain names were manually assigned to a contributor category by a trained student assistant and the doctoral candidate. The assigned category for each domain name was afterwards imported into the database and set for every identical domain name in the database by a Java program developed for this purpose. Second, a list of the most active actors on the LKML, in terms of messages sent, was generated in descending order. Table 5.3 lists the Top 10 of these most active actors including the information whether they are LK maintainer and/or credited
Table 5.2 Top 10 of the Most Common Domain Names (Time Frame 1996–2014)
Rank | Domain Name | Frequency
1. | gmail.com | 11,667
2. | yahoo.com | 5,407
3. | hotmail.com | 3,360
4. | aol.com | 1,425
5. | intel.com | 827
6. | gmx.de | 764
7. | redhat.com | 643
8. | gmx.net | 628
9. | msn.com | 541
10. | earthlink.net | 530
LK developer. From this, it can be seen that, for example, Andrew Morton sent 45,457 emails to the mailing list between 1996 and 2014. The generated list comprised the number of messages for all 115,041 actors who had sent messages to the LKML. Of these, the 1,000 most active ones were manually assigned to a contributor category by means of their email address, again by a trained student assistant and the doctoral candidate. The assigned category for each actor was then imported into the database by a Java program developed for this purpose.
Table 5.3 Top 10 of the Most Active Actors on the LKML (Time Frame 1996–2014)
Rank | Name | Messages | Status
1. | Greg Kroah-Hartman | 86,478 | 1, 2
2. | Andrew Morton | 45,457 | 1
3. | Ingo Molnar | 39,218 | 1
4. | Alan Cox | 38,050 | 2
5. | Rafael J. Wysocki | 33,886 | 1
6. | David Miller | 30,509 | 1, 2
7. | Linus Torvalds | 29,680 | 1, 2
8. | Peter Zijlstra | 26,305 | 1
9. | Andi Kleen | 25,047 | 2
10. | Stephen Rothwell | 24,034 | 2
Status: 1 = LK maintainer, 2 = credited LK developer (status as of February 2015)
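The two ranking steps described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the table `actor` and its columns are hypothetical, SQLite stands in for the MySQL database, and Python stands in for the Java program mentioned in the text; the email addresses are made up.

```python
import sqlite3

# Hypothetical, simplified schema: one row per unique email address.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE actor (email TEXT PRIMARY KEY, domain TEXT, category INTEGER)"
)
emails = [
    "a@gmail.com", "b@gmail.com", "c@gmail.com",
    "d@intel.com", "e@intel.com", "f@mit.edu",
]
conn.executemany(
    "INSERT INTO actor (email, domain) VALUES (?, ?)",
    [(e, e.split("@", 1)[1]) for e in emails],
)

# Step 1: descending list of the most frequent domain names.
ranking = conn.execute(
    "SELECT domain, COUNT(*) AS frequency FROM actor "
    "GROUP BY domain ORDER BY frequency DESC, domain ASC"
).fetchall()

# Step 2: a manually assigned code (see Table 5.1) is set for every
# actor whose address uses the same domain name.
manual_codes = {"gmail.com": 2, "intel.com": 1}
conn.executemany(
    "UPDATE actor SET category = ? WHERE domain = ?",
    [(code, domain) for domain, code in manual_codes.items()],
)
coded = conn.execute(
    "SELECT COUNT(*) FROM actor WHERE category IS NOT NULL"
).fetchone()[0]
print(ranking, coded)
```

Coding a frequent domain once and propagating the code to all matching rows is what makes the manual step economical: in this toy data, two manual decisions code five of six actors.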
During the manual assignment of actors to contributor categories based on the domain name of their email address, several patterns in the domain names were discovered and used for further automated coding of actors through SQL statements. Domain names ending in .edu or containing .ac. can only be registered by academic institutions; thus these were categorized as university-related. In addition, if domain names contained ‘uni-’ (abbreviation for university) or ‘tu-’ (abbreviation for technical university), the corresponding actors were categorized as university-related. Moreover, LKML actors with domain names like aol.com, gmx.de/.eu/.net, gmail.com, googlemail.com, yahoo.com/.de/.ca and so on were automatically classified as hobbyists. In the same manner, actors were attributed to the firm-sponsored category if a company could be derived from the domain name, as for example in the case of cisco.com, dell.com, hp.com, ibm.com or siemens.com. The automated attribution of actors to the category of research institutions and public authorities was supported by TLDs such as .gov and .mil. In a further step, around 2,000 domain names for which none of the above-mentioned patterns could be identified were checked manually. When checking the addresses, it was found that many sites are no longer available, which is due to the long observation period starting in 1996. These websites were then looked up via the Internet Archive platform¹⁰ in order to carry out a proper attribution. If this step did not lead to a result, the contributor category of the respective actor was coded as undefined (code zero). Table 5.4 shows the general quantitative results of the categorization of domain names and, in turn, of LKML actors.
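The pattern rules described above can be illustrated with a short sketch. It is not the SQL used in the study: the function name is hypothetical, the domain sets are small excerpts from the examples in the text, and matching ‘uni-’ and ‘tu-’ only at the start of a domain label is a stricter assumption than the plain substring match the text describes.

```python
# Category codes as in Table 5.1. The domain sets are illustrative excerpts
# from the examples in the text, not the full lists used in the study.
FIRM, HOBBYIST, UNIVERSITY, INSTITUTION = 1, 2, 3, 4
PUBLIC_PROVIDERS = {
    "aol.com", "gmx.de", "gmx.net", "gmail.com", "googlemail.com", "yahoo.com",
}
KNOWN_FIRMS = {"cisco.com", "dell.com", "hp.com", "ibm.com", "siemens.com"}


def categorize_domain(domain):
    """Return a contributor category code, or None if no pattern applies."""
    d = domain.lower()
    # Academic TLD / second-level patterns.
    if d.endswith(".edu") or ".ac." in d:
        return UNIVERSITY
    # Assumption: 'uni-' / 'tu-' must start a label (e.g. uni-koblenz.de).
    if d.startswith(("uni-", "tu-")) or ".uni-" in d or ".tu-" in d:
        return UNIVERSITY
    # Government and military TLDs.
    if d.endswith((".gov", ".mil")):
        return INSTITUTION
    if d in PUBLIC_PROVIDERS:
        return HOBBYIST
    if d in KNOWN_FIRMS:
        return FIRM
    return None  # no pattern matched: left for manual checking


for name in ["cam.ac.uk", "uni-koblenz.de", "nasa.gov", "gmail.com", "ibm.com", "scherping.de"]:
    print(name, categorize_domain(name))
```

A personal domain such as scherping.de matches no pattern and returns None, which corresponds to the cases that had to be checked manually.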
¹⁰ https://archive.org/web/, last accessed December 18, 2018.
Table 5.4 Descriptive Information about Categorized Domain Names and LKML Actors (Time Frame 1996–2014)
 | Categorized | Not Categorized | Overall
Different Domain Names | 15,039 (23.00%) | 50,396 (77.00%) | 65,435
LK Mailing List Actors | 68,845 (59.84%) | 46,196 (40.16%) | 115,041
As can be seen from the tabular overview, the described attribution results in a categorization of 23% (15,039) of the domain names and 59.84% (68,845) of the 115,041 LKML actors that were active between 1996 and 2014. The share of categorized actors is much higher than the share of categorized domain names because the categorization focused on the most frequent domain names. For example, by categorizing the domain name ‘gmail.com’ as hobbyist, 11,667 active actors (see Table 5.2) could be coded with a single attribution. In addition, the most active actors on the LKML (see Table 5.3) and their affiliations were considered and categorized, as these are of special interest. To assure the quality of the attribution, 300 categorized domain names (i.e., approximately 2% of the 15,039 categorized domain names) were randomly picked and their categorization was checked manually. An error rate of five percent, which corresponds to 15 of the 300 records, had been assumed beforehand. The manual check of the 300 categorized domain names revealed two incorrectly categorized domain names (approx. 0.67%). Both domain names had been categorized as hobbyists, but one should have been coded as firm-sponsored and the other as institution-related. This check shows that an error rate of less than one percent can be assumed for the categorized records. The following two tables give a descriptive overview of the LKML actors’ data for the whole time frame from 1996 to 2014. Table 5.5 lists the number of identified LKML actors per contributor category and year and gives the percentage of categorized actors for each year. Table 5.6 states the number of messages sent per contributor category and year out of the 2,664,780 messages sent overall between 1996 and 2014. Further, it gives the percentage of messages for which the senders could be categorized for each year.
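The coverage and quality-check figures reported above can be recomputed with a few lines. The sketch below only uses numbers stated in the text and in Table 5.4; the variable and function names are illustrative.

```python
# Figures reported in the text and in Table 5.4.
domains_total, domains_coded = 65_435, 15_039
actors_total, actors_coded = 115_041, 68_845
sample_size, sample_errors = 300, 2


def share(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


domain_coverage = share(domains_coded, domains_total)    # coded domain names
actor_coverage = share(actors_coded, actors_total)       # coded actors
sample_fraction = share(sample_size, domains_coded)      # size of the quality sample
assumed_errors = round(0.05 * sample_size)                # five percent of the sample
observed_error_rate = share(sample_errors, sample_size)  # errors found in the sample

print(domain_coverage, actor_coverage, sample_fraction, assumed_errors, observed_error_rate)
```

Computed this way, the domain coverage is about 22.98% (reported as 23.00% in Table 5.4), the actor coverage 59.84%, and the observed error rate in the sample about 0.67%, well below the assumed 15 erroneous records.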
5.2 Linux Kernel Source Code Data
As a proxy for the value created by the LK source code contributors, the quantity of their source code contributions is considered. Information about each change of the LK source code is recorded by the source code management system Git¹¹, which has been in use since 2005 (Kroah-Hartman et al. 2009). To collect the data needed for calculating source code contribution measures for each LK source code contributor, the doctoral candidate developed a bash script that reads the required information from the LK Git repository, processes the data and stores it in a local MySQL database. In the following, the steps taken to obtain the information from the LK repository are briefly outlined. First, a Linux server running CentOS 7 was installed and configured on a private cloud instance using the cloud computing resources kindly provided by the Institute for Web Science and Technologies¹² of the University of Koblenz-Landau. Second, the LK repository
¹¹ https://git-scm.com, last accessed December 18, 2018.
¹² https://west.uni-koblenz.de/en, last accessed December 18, 2018.
Table 5.5 Identified LKML Actors per Contributor Category and Year
Year | NC (null) | UD (0) | FS (1) | HY (2) | UR (3) | IR (4) | LN (5) | SA | %A
1996 | 1,893 | 27 | 354 | 316 | 1,197 | 92 | 4 | 3,883 | 51.25
1997 | 2,393 | 32 | 471 | 502 | 1,248 | 90 | 4 | 4,740 | 49.51
1998 | 4,053 | 33 | 777 | 949 | 1,662 | 135 | 18 | 7,627 | 46.86
1999 | 5,633 | 60 | 1,207 | 1,506 | 2,086 | 159 | 45 | 10,696 | 47.34
2000 | 6,098 | 72 | 1,412 | 1,808 | 1,599 | 167 | 52 | 11,208 | 45.59
2001 | 6,851 | 71 | 1,743 | 2,887 | 1,421 | 153 | 73 | 13,199 | 48.09
2002 | 6,158 | 87 | 1,640 | 3,307 | 1,221 | 143 | 75 | 12,631 | 51.25
2003 | 8,605 | 252 | 2,709 | 7,635 | 1,338 | 169 | 101 | 20,809 | 58.65
2004 | 7,945 | 133 | 2,081 | 4,314 | 1,316 | 168 | 135 | 16,092 | 50.63
2005 | 4,329 | 79 | 1,447 | 3,245 | 710 | 77 | 97 | 9,984 | 56.64
2006 | 4,303 | 66 | 1,367 | 3,142 | 506 | 73 | 136 | 9,593 | 55.14
2007 | 2,276 | 36 | 1,213 | 2,486 | 445 | 64 | 87 | 6,607 | 65.55
2008 | 1,780 | 44 | 1,264 | 2,531 | 312 | 44 | 73 | 6,048 | 70.57
2009 | 1,561 | 53 | 1,389 | 2,414 | 297 | 42 | 81 | 5,837 | 73.26
2010 | 1,614 | 56 | 1,439 | 2,422 | 330 | 42 | 70 | 5,973 | 72.98
2011 | 1,688 | 87 | 1,658 | 2,452 | 331 | 48 | 74 | 6,338 | 73.37
2012 | 1,370 | 45 | 1,679 | 2,505 | 303 | 64 | 77 | 6,043 | 77.33
2013 | 1,234 | 25 | 1,759 | 2,364 | 230 | 57 | 63 | 5,732 | 78.47
2014 | 1,327 | 12 | 1,823 | 2,482 | 215 | 63 | 85 | 6,007 | 77.91
Codes of the contributor categories are given in parentheses; for the abbreviations see the note to Table 5.6.
Table 5.6 Amount of LKML Messages sent per Contributor Category and Year
Year | NC (null) | UD (0) | FS (1) | HY (2) | UR (3) | IR (4) | LN (5) | SM | %M
1996 | 6,361 | 389 | 3,967 | 3,663 | 9,124 | 471 | 11 | 23,986 | 73.48
1997 | 8,376 | 179 | 6,792 | 6,020 | 7,561 | 595 | 51 | 29,574 | 71.68
1998 | 17,365 | 375 | 12,525 | 10,504 | 18,271 | 1,796 | 3,532 | 64,368 | 73.02
1999 | 22,158 | 232 | 15,985 | 15,015 | 16,093 | 1,025 | 3,555 | 74,063 | 70.08
2000 | 24,673 | 244 | 22,984 | 15,303 | 14,411 | 685 | 4,492 | 82,792 | 70.20
2001 | 26,976 | 566 | 24,420 | 20,346 | 12,269 | 1,479 | 5,720 | 91,776 | 70.61
2002 | 23,977 | 395 | 33,245 | 33,158 | 9,416 | 1,625 | 3,386 | 105,202 | 77.21
2003 | 28,293 | 891 | 37,248 | 36,033 | 9,945 | 1,523 | 6,067 | 120,000 | 76.42
2004 | 25,533 | 2,015 | 48,398 | 33,760 | 9,626 | 3,538 | 4,593 | 127,463 | 79.97
2005 | 18,975 | 2,212 | 49,339 | 39,458 | 7,758 | 4,455 | 6,100 | 128,297 | 85.21
2006 | 16,188 | 2,305 | 58,473 | 54,156 | 6,894 | 5,032 | 6,393 | 149,441 | 89.17
2007 | 15,805 | 2,286 | 80,121 | 55,695 | 7,541 | 1,265 | 12,903 | 175,616 | 91.00
2008 | 13,088 | 4,212 | 86,205 | 68,722 | 8,438 | 487 | 10,089 | 191,241 | 93.16
2009 | 9,029 | 5,116 | 90,892 | 67,619 | 7,625 | 363 | 13,775 | 194,419 | 95.36
2010 | 11,615 | 4,820 | 92,747 | 52,120 | 6,200 | 204 | 11,193 | 178,899 | 93.51
2011 | 9,158 | 4,646 | 94,505 | 62,682 | 4,304 | 757 | 7,984 | 184,036 | 95.02
2012 | 7,027 | 5,064 | 111,426 | 64,544 | 2,550 | 973 | 27,129 | 218,713 | 96.79
2013 | 8,425 | 3,284 | 135,061 | 89,760 | 2,066 | 651 | 7,347 | 246,594 | 96.58
2014 | 13,517 | 4,927 | 150,280 | 67,192 | 3,082 | 1,536 | 37,766 | 278,300 | 95.14
* CC: Contributor category; NC: No coding; UD: Undefined; FS: Firm-sponsored; HY: Hobbyists; UR: University-related; IR: Institutions-related; LN: Linux-near; SA: Sum of identified actors for each year; %A: Percentage of categorized actors for each year; SM: Sum of messages sent each year; %M: Percentage of categorized messages sent each year.
was cloned to the local CentOS 7 server. Third, a bash script was developed to extract the required information from the LK source code repository and to store it in a local MySQL database running on a private cloud instance. This MySQL database server also hosts the database for the LKML data (see Sub-Section 5.1.1). To investigate whether central LK source code contributors also occupy central positions in the governance structure of the LK project or stand out for their source code contributions to the LK, the identified LK developers have to be attributed appropriately, as previously described for the LKML actors (see Sub-Section 5.1.1). For this task, module three, Maintainer and Credits Alignment, of the software package used for processing the LKML data (see Sub-Section 5.1.1) was adapted to the structure of the LK source code database. The program compared the email addresses listed in the maintainers and credits files with the entries in the email table of the LK source code database. In case of a match, a flag was set for the corresponding person for the maintainer and/or credited developer attribute. After the data had been extracted from the LK repository, processed and stored in the local MySQL database (approximately 4.5 gigabytes of data for the LK releases 2.6.13 as of August 28, 2005 to 4.4 as of January 10, 2016), it was checked on a random basis for accuracy and completeness against the LK repository, because errors may have gone unnoticed during the gathering process, which took around two weeks. These random checks included, among others, checking the number of commits for each release, examining file, person and email entities for uniqueness, and comparing insert, delete, committer and source code author information with the LK repository. The automated and manual checks and comparisons revealed no inconsistencies.
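How insert and delete counts per author can be read from a Git history is sketched below. This is not the bash script used in the study: it is a Python illustration that assumes a local clone and relies only on standard `git log` options (`--numstat`, `--no-merges`, `--format`); the function names, the log format string and the sample data are hypothetical.

```python
import subprocess
from collections import defaultdict

# Hypothetical format: a leading "@" marks commit header lines so that they
# can be told apart from the tab-separated --numstat lines that follow.
LOG_FORMAT = "@%H%x09%ae%x09%ce"  # commit hash, author email, committer email


def read_git_log(repo_path, rev_range="v2.6.13..v4.4"):
    """Run `git log` in a local clone and return its text output."""
    cmd = ["git", "-C", repo_path, "log", "--no-merges", "--numstat",
           f"--format={LOG_FORMAT}", rev_range]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def lines_changed_per_author(log_text):
    """Sum inserted and deleted lines per author email."""
    totals = defaultdict(lambda: [0, 0])  # author email -> [inserted, deleted]
    author = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            _commit, author, _committer = line[1:].split("\t")
        elif line.strip() and author is not None:
            added, deleted, _path = line.split("\t", 2)
            if added != "-":  # binary files are reported as "-"
                totals[author][0] += int(added)
                totals[author][1] += int(deleted)
    return {email: tuple(counts) for email, counts in totals.items()}


# Made-up sample in the shape produced by the command above.
sample = (
    "@c1\talice@example.com\talice@example.com\n\n"
    "10\t2\tkernel/sched.c\n"
    "-\t-\tDocumentation/logo.gif\n"
    "@c2\tbob@example.org\talice@example.com\n\n"
    "3\t0\tmm/slab.c\n"
    "@c3\talice@example.com\talice@example.com\n\n"
    "1\t1\tMAINTAINERS\n"
)
print(lines_changed_per_author(sample))
```

Separating the parsing step from the `git` call keeps the sketch checkable without a kernel clone; on a real repository, `lines_changed_per_author(read_git_log(path))` would apply the same parsing to the release range named in the text.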
6 Study I: Private-Collective Innovation and Open Source Software: Longitudinal Insights from Linux Kernel Development
6.1 Introduction
The largest part and the results of this study have been previously published as Homscheid, D., Kunegis, J. & Schaarschmidt, M. (2015), Private-Collective Innovation and Open Source Software: Longitudinal Insights from Linux Kernel Development, in M. Janssen, M. Mäntymäki, J. Hidders, B. Klievink, W. Lamersdorf, B. van Loenen & A. Zuiderwijk, eds, ‘Open and Big Data Management and Innovation’, Vol. 9373 of Lecture Notes in Computer Science, Springer International Publishing, pp. 299–313. Reprinted by permission from Springer Nature Customer Service Centre GmbH. Since this study is largely derived from the previously mentioned publication, the personal pronoun ‘we’ is used.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2020 D. Homscheid, Firm-Sponsored Developers in Open Source Software Projects, Innovation, Entrepreneurship und Digitalisierung, https://doi.org/10.1007/978-3-658-31478-1_6
Free Software (FS) and Open Source Software (OSS) have changed how researchers and practitioners look at software development and related business models (Bonaccorsi et al. 2006, Capra et al. 2011, Riehle 2012). OSS contrasts with traditional in-house software development, as its outcome, the software, is freely accessible by anyone. As a result, major companies including market leaders such as Facebook, Google and Twitter use OS technologies. According to a report by Research and Markets, the global OS services market is expected to grow from USD 11.40 billion in 2017 to USD 32.95 billion by 2022 (Research and Markets 2018). Due to this success, firms have also started to actively engage in OSS development (Grand et al. 2004, Schaarschmidt & Von Kortzfleisch 2009, Schaarschmidt et al. 2015). While in early years software technology companies such as IBM and Novell invested time and resources in OSS development, today even user firms (e.g., Samsung) do so (Corbet & Kroah-Hartman 2017). Thus, today’s professional OSS projects receive contributions from hobbyists, universities, research centers, as well as software vendors and user firms (Schaarschmidt & Von Kortzfleisch 2015, Teigland et al. 2014).
Theorists have referred to this kind of combined public and private investment in innovation creation as private-collective innovation (von Hippel & von Krogh 2003). This concept asserts that a private investment model, where firms create and commercialize ideas themselves, and a collective invention model, where multiple economic actors create public goods innovations, may coexist and interact under certain circumstances (Stürmer et al. 2009). In particular, the private-collective innovation model seeks to explain why firms privately invest resources to create artefacts that share the characteristics of non-rivalry and non-excludability (Alexy & Reitzig 2013, Erickson 2018, Zaggl & Raasch 2015). The private-collective model also implicitly assumes that private and public investments in innovations are approximately equal. However, successful OSS projects receive more than 85% of their code from contributors¹ who are paid by a company (Corbet & Kroah-Hartman 2017), and the majority of code is written between 9 a.m. and 5 p.m., again indicating that contributions are predominantly provided by firms (Riehle et al. 2014). These figures contrast with the picture of private-collective innovation as an invention mode in which public and private interests manifest equally. The aim of this research therefore is to investigate how different contributor groups associated with public and increasingly private interests interact in an OSS development project. In order to study the interplay of both interest groups, we need to consider not only demographic characteristics of the community but also the structural patterns of interaction in it. To achieve this goal, we analyze developers active in the LK development community from a social network point of view, as the interaction between the members of a software development community reflects the structure of their collaboration.
In particular, we investigate degree distributions and the Gini coefficient in the contributor network with respect to the private and collective contributor groups. Network centrality measures are important indicators of influence in OSS development and are known to differ depending on whether a developer is firm-sponsored or not (Dahlander & Wallin 2006). We start by detailing what motivates volunteer and firm-sponsored (i.e., employed) developers to participate in OSS development. Then, we discuss the private-collective innovation model in more detail. Based on a data set of mailing list communication of LK developers from 1996 to 2014, we calculate network measures for each type of developer (e.g., firm-sponsored, hobbyist, university-affiliated) and compare them for each year. We discuss implications for research and provide further avenues for research concerning private-collective innovation.
¹ In this study the terms contributor, developer, actor, participant and programmer are used synonymously to denote people who are active in OSS projects and in LK development.
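To make the two measures named above concrete, the following sketch computes node degrees and a Gini coefficient for a toy network. It is an illustration only: this section does not state which Gini formulation the study uses, so the common rank-based formula is assumed here, and the function names and the example network are made up.

```python
from collections import Counter


def degrees(edges):
    """Degree of each node in an undirected network given as (u, v) pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-based formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Toy reply network: one hub exchanging messages with four other actors.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]
deg = degrees(edges)
print(deg, gini(deg.values()))
```

In this toy network the hub has degree 4 and all other actors degree 1, which gives a Gini coefficient of about 0.3; a network in which every actor has the same degree would give 0.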
6.2 Theoretical Background
6.2.1 Open Source Software Contributors
Open Source Software Communities
An OSS project relies on contributors who make up the core element of an OSS community. OSS is commonly understood as a type of software that can be used, changed, and shared by any person. The software itself is in most cases developed by a heterogeneous group of people and distributed under specific licenses, which guarantee the above-mentioned characteristics of OSS (Open Source Initiative 2018b). In general, a community arises when different people come together and share a common interest (Preece 2000). Thus, von Hippel & von Krogh (2003) conceptualize OSS development communities as “Internet-based communities of software developers who voluntarily collaborate in order to develop software that they or their organizations need” (von Hippel & von Krogh 2003, p. 209).
Besides the fact that OSS communities consist of hobbyists, who voluntarily provide their resources to the community, the definition also involves another important contributor group: organizations. Organizations differ from hobbyists in terms of their motivation to engage and are represented in the community by their employed developers. In turn, employed developers might be considered as proxies for firm interests in the community.
Motivation of Voluntary OSS Developers
The pertinent literature specifies intrinsic and extrinsic motivation as major drivers for hobbyists to engage in OSS projects (e.g., Hars & Ou 2002, Lakhani & Wolf 2005, Roberts et al. 2006, Wu et al. 2007). Intrinsic motivation means performing an activity for the enthusiasm that accompanies it and not for the achievement of specific results (Ryan & Deci 2000a). A behavior is extrinsically motivated when an activity is performed for reward or recognition, or because of an instruction from someone or an obligation (Ryan & Deci 2000a). Although researchers agree on different forms of intrinsic and extrinsic motivation, there is often disagreement about their relevance. The most relevant forms of both motivation types in the context of OSS developers are described briefly in the following. In connection with OSS developers, researchers have investigated a plethora of intrinsic motivators. Among these, joy-based intrinsic motivation is the strongest and most prevalent driver of OSS contributors (Lakhani & Wolf 2005). Joy-based motivation is closely linked to the creativity of a person. Frequently, contributors to OSS projects have a strong interest in software development and related challenges (Hars & Ou 2002). Another fundamental aspect of intrinsic motivation is altruism, which is the desire to help others and to improve their welfare. In OSS communities, developers code programs, report bugs, etc., at their own expense, which includes the invested time and opportunity costs. They participate in the OSS community without taking advantage of its outcome (Baytiyeh & Pfaffman 2010, Hars & Ou 2002, Wu et al. 2007). In addition, the OSS ideology plays a crucial role for many contributors and involves
• joint collaborative values and norms, such as helping, sharing and collaboration,
• individual values, such as learning, technical knowledge and reputation,
• OSS process beliefs, such as code quality and bug fixing, and
• beliefs regarding the importance of freedom in OSS, such as an OS code and its free availability and use for everyone (Stewart & Gosain 2006).
Besides these distinguishing aspects of participants’ intrinsic motivation to get involved in OSS projects, researchers have found that extrinsic stimuli can also have an impact on the activities of actors in communities (e.g., Hars & Ou 2002, Lakhani & Wolf 2005, Roberts et al. 2006, Wu et al. 2007). One extrinsic stimulus is a personal need of a developer. Many OSS projects are launched because the initiators need software with specific functions that is not yet available, and they have the willingness and knowledge to develop it (Wu et al. 2007). OSS communities also offer developers the possibility to improve their programming skills and their knowledge through participation in a project. Programmers are free to choose the tasks in which they participate according to their interests and abilities. As a result, the self-learning participants experience a continuous learning curve and build a repertoire of experiences, ways and means to solve specific software development tasks (Wu et al. 2007). In addition, ‘signaling incentives’ as described by Lerner & Tirole (2002) can be a reason for people to participate in OSS communities. These incentives cover, inter alia, recognition by other members of the community and the improvement of professional status.
Motivation of Firms Involved in OSS Development
In addition to hobbyists, companies are also active in OSS communities. While voluntary OSS contributors are driven by intrinsic and extrinsic values, economic and technological aspects motivate firms to participate in OSS projects. In recent times, companies have opened up to the outside in order to organize their innovation activities more effectively and efficiently. Innovation communities are one means to complement their own resource base. In the case of software companies, OSS communities form a resource pool these firms can benefit from, depending on the strategy they pursue (Dahlander & Magnusson 2008, Grand et al. 2004). Literature investigating motivational aspects of companies active in OSS projects reveals that economic theory is not sufficient to explain the relation between firms and their OSS engagement. Andersen-Gott et al. (2012) have reviewed this issue and identified the following three categories of motivational factors that are relevant for companies active in OSS communities.
1. Innovative Capabilities: If the involvement of a company in an OSS project is aligned with the business model it maintains, the interaction with the community can lead to better or new products, which imply a competitive advantage. The inclusion of external contributors increases the firm’s innovative capacity.
2. Complementary Services: The dominant way for firms to appropriate value from OSS is by providing complementary services to customers (e.g., training, technical support, consultancy and certifications (Fitzgerald 2006)) aligned with their business strategy (Dahlander 2005). Firms pursuing this concept deploy their own employees, who also contribute to the OS project and community work. Thus, the company (1) acquires external knowledge through its own employees active in OSS development and (2) has access to complementary resources in the community, which are difficult to replicate internally (Riehle 2007).
3. Cost Reduction: Companies can publish the source code of their proprietary software under an OS license, try to attract external developers and build a community around the software. In this case, the company will get, for example, ideas for new features, bug reports, documentation, and extensions of the software from external contributors without having to pay for them (Henkel 2004, Lerner & Tirole 2002). Further, in the long run, the code is maintained by the community, such that the firm has lower costs than its competitors with proprietary software (Hawkins 2004). However, it should be noted that establishing an ecosystem and an active community around released source code is no easy task, as rivals could pursue similar strategies (Ågerfalk & Fitzgerald 2008, Dahlander & Magnusson 2008).
6.2.2 Private-Collective Model of Innovation
In organization science, two different modes of innovation are dominant, namely the private investment model and the collective action model. Beyond that, von Hippel & von Krogh (2003) coined the term private-collective innovation in their work “Open Source Software and the ‘Private-Collective’ Innovation Model: Issues for Organization Science”; it stands for a synthesis of the two aforementioned models. The private investment model is associated with a rather closed innovation behavior. Innovators tend to protect their internally developed proprietary knowledge, as this is the source of their profits and competitive advantage, usually preserved by intellectual property rights such as licenses, copyrights or patents (von Hippel & von Krogh 2003, von Krogh 2008). Here, innovation is clearly seen as a closed process driven by private investments in order to generate private returns for the innovator (Liebeskind 1996). The collective action model of innovation is connected to the provision of a public good. Innovators collaborate in order to develop a public good under conditions of market failure (von Hippel & von Krogh 2003). Thus, governments often use public funding to promote the creation of relevant public goods through collective action. The produced goods are characterized by non-excludable access and non-rivalry of their benefits (Olsen 1967, von Krogh 2008). In general, this model requires that innovators supply the knowledge they have created and collected about a project to a common knowledge base and thus make it a public good. This innovation method can unfortunately be exploited by free riders, who wait until other contributors have done the work and then use the outcome for free (Olsen 1967, von Hippel & von Krogh 2003). As a combination of the two previously described innovation modes, the private-collective innovation model unites beneficial aspects of both. Roughly speaking, the model depicts the provision of public goods made possible by private funding.
The model assumes that creating a public good is associated with more benefits for private innovators than merely using it, as free riders do. Specifically, the process of creating the good is considered to be beneficial for the innovators, as the process-related advantages outweigh the process-related expenditures (von Hippel 2005, von Krogh 2008). OSS communities are an example of a mixture of the private investment and the collective action model of innovation. OSS contributors freely reveal their privately developed source code as a public good. The developers, be they hobbyists or firm-sponsored, do not make commercial use of their property rights, although the source code is created as a result of private investments. This innovation behavior is termed private-collective innovation (von Hippel & von Krogh 2003). To get a deeper understanding of how OSS communities combine the best of both models, OSS innovation is considered in the following first from the private investment and second from the collective action point of view. From the private investment perspective, OSS deviates from the conventional private investment model in two major aspects. First, software contributors are the actual innovators in OSS rather than commercial software developers, because they create software that is needed either by themselves or by the community. Second, OSS developers freely reveal the source code that they have developed by private means; this behavior stands in contrast to the classical closed innovation behavior. Due to the lack of a commercial market for the sale or licensing of OSS, it is made openly available as a public good (von Hippel & von Krogh 2003). Rewards for the developers are provided in forms other than money or commercialization of property rights. Contributors gain private profits such as reputation, experience or reciprocity (Lerner & Tirole 2002, von Krogh 2002). From the collective action view, the community produces a public good with the attributes of non-excludability and non-rivalry. Non-excludability holds because specific FLOSS licenses prevent any kind of restriction, as the usage, modification and redistribution of the software by everyone is expressly desired; non-rivalry holds because using, copying or disseminating the software does not diminish its value (Olsen 1967). Given the above description of the collective action model, non-excludability would create a dilemma, because free riders benefit from the software but, unlike the developers, do not contribute to the good. This circumstance is not a problem, as in line with the OS ideology people voluntarily participate in OSS development and share the results without costs (von Hippel & von Krogh 2003).
Moreover, contributors obtain benefits from participating in the development of a public good, for example problem-solving expertise, learning and enjoyment, which the free rider cannot get (Lerner & Tirole 2002, Raymond 2001). These benefits, in the form of selective incentives, are connected to the development process of the good and are thus only accessible to the participants. Therefore, OSS contributions cannot be seen as pure public goods, as they have significant private elements that evolved out of the ideology that supports the community (von Hippel & von Krogh 2003). In sum, the private-collective model of innovation combines the advantages of both the private investment and the collective action model. Table 6.1 compares the most important aspects of the three innovation models from an economic perspective and in relation to OSS development.
Table 6.1 Comparison of Different Aspects of the Private, Collective and Private-Collective Innovation Models (Source: adapted from Demil & Lecocq (2006) and Schaarschmidt et al. (2013))
Aspect | Private | Collective | Private-Collective
License | Proprietary | Open | Open
Copyright ownership | Company | Collective | Collective
Number of participating companies | One | None | One or more
Revenue stream | Direct | None | Indirect
Control intensity | High | Low | Low
Knowledge sharing intensity | Low | High | High
6.3 Method
6.3.1 Data Collection and Coding of Contributor Categories
To obtain the data needed for our research, we crawled the LKML web archive (https://marc.info/?l=linux-kernel, last accessed December 18, 2018). We use mailing list data because it is suitable for calculating network positions that represent developers’ influence (Dahlander & Wallin 2006). The ‘linux-kernel’ mailing list (http://vger.kernel.org/vger-lists.html#linux-kernel, last accessed December 18, 2018) serves to discuss LK development topics and to report bugs. The observation period of the LK community ranges from 1996 (the beginning of the web archive) to 2014. We identified actors that occur multiple times on the list, for example, with different email addresses but identical sender names, and mapped each of them to one person object associated with the email address used when first sending a message to the list. For a detailed description of the data crawling, data cleaning and contributor categorization steps, please refer to Chapter 5.1.
The people interacting on the LKML act partly on behalf of companies. To determine whether an actor on the mailing list is affiliated with a firm, we used the domain name of the email address to assign each person to a contributor category. Developers sending messages from a domain indicating that the person is employed by the corresponding company are classified as employed contributors, whereas people using email addresses from public email providers such as yahoo.com are classified as hobbyists. Likewise, we identified developers with email addresses indicating universities and research institutions. The assignment of LK actors to contributor categories was carried out in a partly manual, partly automated process in order to achieve high accuracy. Detailed information about the contributor categories is provided in Table 6.2.
Table 6.2 LKML Contributor Categories
Contributor Category | Description | Examples
Companies | People with email addresses from companies | www.intel.com, www.redhat.com
Hobbyists | People with private email addresses | www.gmail.com, www.yahoo.com
Universities | People from universities | www.columbia.edu, www.duke.edu
Research Institutions | People from research institutions and public authorities (e.g., IEEE, government, military) | www.ieee.org, www.nasa.gov, www.army.mil
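The identity mapping and the domain-based categorization described above can be illustrated with a minimal Python sketch. It is not the procedure actually used in the study, which combined manual and automated steps (see Chapter 5.1). The lookup sets, the suffix rules, the function names and the sample senders are hypothetical and serve only as an illustration.

```python
# Hypothetical lookup sets for illustration only; the study's actual lists
# and its manual review step are not given in the text.
PUBLIC_PROVIDERS = {"gmail.com", "yahoo.com"}
RESEARCH_DOMAINS = {"ieee.org", "nasa.gov", "army.mil"}


def categorize(email):
    """Assign an email address to one of the four contributor categories."""
    domain = email.rsplit("@", 1)[-1].lower()
    if domain in PUBLIC_PROVIDERS:
        return "Hobbyists"
    if domain in RESEARCH_DOMAINS or domain.endswith((".gov", ".mil")):
        return "Research Institutions"
    if domain.endswith(".edu"):
        return "Universities"
    return "Companies"  # assumption: every remaining domain belongs to a firm


def merge_identities(messages):
    """Map each sender name to the address used in that sender's first message.

    `messages` is an iterable of (timestamp, sender_name, email) tuples.
    """
    first_email = {}
    for _, name, email in sorted(messages):
        first_email.setdefault(name, email)
    return first_email


# Made-up senders, not taken from the LKML data.
messages = [
    (2003, "Jane Doe", "jane@gmail.com"),
    (2009, "Jane Doe", "jane@redhat.com"),
    (2005, "John Roe", "roe@columbia.edu"),
]
people = merge_identities(messages)
print({name: categorize(email) for name, email in people.items()})
```

In this sketch a sender who later switches to a company address stays in the category of the first address, which mirrors the limitation that changes of affiliation over time are not tracked.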
The cleaned data set comprises 1,941,119 communication replies over the total time period, involving 86,509 contributors overall. Company developers make up 37.96% of the contributors, hobbyists 51.22%, universities 9.65% and research institutions 1.17%. Descriptive information about the data set is given in Figures 6.1 and 6.2. Figure 6.1 shows the number of identified contributors per contributor group and year from 1996 to 2014. Figure 6.2 shows the number of messages sent per contributor group and year for the investigated period.
Figure 6.1 LK Contributors per Group and Year
Figure 6.2 Amount of LKML Messages sent per Group and Year
6.3.2 Social Network Analysis
A social network represents persons as nodes connected by edges. Social networks can represent friendship relationships, communication, interaction contacts or other types of social relationships. Social network data sets are widely used, not only in the area of social network analysis, but also in data mining, sociology, politics, economics and other fields (Wasserman & Faust 1994).
In order to study the interactions of developers in the LK community, we analyze the LKML’s communication network. Communication within the Linux developer community can be modeled as a directed network in which nodes are developers and each directed edge (i.e., arc) is a reply of one developer to another. In our data set, we ignore all messages that are not replies to other developers. A relationship between the sender of a starter message and others emerges only
when one or more people reply to the starter message. We perform a structural analysis of this network to study the interplay of the interacting developers. The directed network of replies we consider is annotated with two additional pieces of metadata:
• For replies, the posting timestamp is known. This allows us to carry out a longitudinal analysis of the considered network statistics.
• For developers, we know their company, university or other affiliation, if any, allowing us to identify four categories of developers, as described in Sub-Section 6.3.1.
We perform social network analysis with Matlab and the KONECT Toolbox (Kunegis 2013). The contribution of a user in a directed social network can be used to measure both the activity and the importance of that user in the community. We achieve this by considering the following network-based measures, the first two of which are defined for individual nodes:
• The in-degree of a node equals the total number of replies received by a developer. The in-degree can thus be interpreted as a measure of the importance of a developer.
• The out-degree of a node equals the total number of replies written by a developer and can thus be interpreted as a measure of the activity of a developer.
• As a network-wide measure, we additionally define the Gini coefficient of the in-degree distribution (Kunegis & Preusse 2012), which denotes the inequality of the in-degrees. It is zero when all developers have equal in-degrees and one when a single developer received all replies. It can thus be interpreted as a measure of the diversity of the community (Kunegis et al. 2012).
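The three measures listed above can be sketched in a few lines of Python. This is an illustrative stand-in, not the Matlab and KONECT code used in the study. The function names and the sample replies are hypothetical, and the Gini coefficient uses the common rank-based textbook formula, which may differ in detail from the KONECT implementation. With this formula, a single developer receiving all replies yields (n − 1)/n, which approaches one for large n.

```python
from collections import Counter


def degrees(replies):
    """Count replies received (in-degree) and written (out-degree) per developer.

    `replies` is an iterable of (sender, receiver) pairs, one per reply;
    messages that are not replies are assumed to be filtered out beforehand.
    """
    in_degree = Counter()
    out_degree = Counter()
    for sender, receiver in replies:
        out_degree[sender] += 1
        in_degree[receiver] += 1
    return in_degree, out_degree


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * value for rank, value in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Made-up replies, not taken from the LKML data.
replies = [("alice", "bob"), ("carol", "bob"), ("bob", "alice")]
in_deg, out_deg = degrees(replies)
developers = {d for pair in replies for d in pair}
in_values = [in_deg[d] for d in developers]  # keeps developers with in-degree 0
print(in_deg["bob"], out_deg["bob"], round(gini(in_values), 3))
```

Developers who never received a reply are kept with an in-degree of zero, since dropping them would understate the inequality.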
6.4 Results
6.4.1 Comparison of In-Degree and Out-Degree
In a first analysis, we compare the in-degree and the out-degree of all developers, i.e., the number of replies received vs. the number of replies written. Figure 6.3 shows the results of this analysis. We observe that both measures are highly correlated: developers who receive many replies also write many replies. Thus, for the LK community, the activity and the importance of developers correlate highly.
Figure 6.3 Comparison of In-Degree and Out-Degree (x-axis: number of replies written; y-axis: number of replies received; both axes logarithmic, from 10^0 to 10^5)
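The text does not state which correlation coefficient underlies this observation. As one possible way to quantify it, the following Python sketch assumes a Pearson correlation on log-transformed degrees, in line with the logarithmic axes of Figure 6.3. The function name and the sample degrees are hypothetical.

```python
from math import log1p, sqrt


def log_degree_correlation(out_degrees, in_degrees):
    """Pearson correlation of log(1 + degree) for paired degree lists.

    Both lists must have the same length and must not be constant.
    """
    x = [log1p(d) for d in out_degrees]
    y = [log1p(d) for d in in_degrees]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up degrees for five developers, not taken from the LKML data.
written = [0, 3, 12, 150, 2400]
received = [1, 2, 20, 90, 3100]
print(round(log_degree_correlation(written, received), 3))
```

Using log(1 + degree) keeps developers with a degree of zero in the calculation; a rank-based coefficient would be a reasonable alternative for such heavily skewed data.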
6.4.2 Comparison of Degree per Group
In this analysis, we examine whether the developer-based measures of activity and importance vary from one group to another. We compute the distributions of out-degree and in-degree for each group over the whole data set, aggregated over all years. The results are shown in Figures 6.4 and 6.5. The plots show that:
• The highest activity, as measured by the out-degree, is achieved by company developers, followed by hobbyists; the lowest activity is shown by developers from research institutions and universities.
• The measure of importance, the in-degree, shows the same pattern as the activity: company developers have the highest importance, followed by hobbyists and finally by developers from research institutions and universities.
These results are consistent with the observation that the measures of activity and importance correlate.
Figure 6.4 Out-Degree Distribution
Figure 6.5 In-Degree Distribution
To verify the statistical significance of our results, we perform pairwise Mann-Whitney U tests, testing whether the values of each statistic for one type of developer are statistically different from the values for another group. The group differences are statistically significant at p < 0.05 (for company developers vs. hobbyists, at p < 0.10 for the out-degree), with the following exceptions:
• developers from companies vs. developers from research institutions, for the in-degree and the out-degree;
• hobbyists vs. developers from research institutions, for the in-degree and the out-degree;
• developers from universities vs. developers from research institutions, for the out-degree.
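A pairwise Mann-Whitney U comparison of this kind can be sketched as follows. The text does not say which software was used for the tests, so this pure-Python version is only an illustration: it relies on the normal approximation with tie correction and without continuity correction, and an established statistics library would normally be preferred. The function name and the sample degrees are hypothetical.

```python
from itertools import combinations
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic of sample x, approximate p-value). Tied values get
    average ranks and the variance is tie-corrected.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2, n = len(x), len(y), len(x) + len(y)
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        t = j - i + 1               # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        i = j + 1
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all pooled values are identical
    z = (u_x - mean_u) / sqrt(var_u)
    return u_x, erfc(abs(z) / sqrt(2))


# Made-up out-degrees per group, not taken from the LKML data.
samples = {
    "Companies": [40, 55, 61, 70, 120],
    "Hobbyists": [5, 9, 14, 30, 33],
    "Universities": [1, 2, 4, 6, 8],
}
for a, b in combinations(samples, 2):
    u, p = mann_whitney_u(samples[a], samples[b])
    print(a, "vs.", b, "U =", u, "p =", round(p, 3))
```

For small samples such as the made-up ones above, the normal approximation is rough; an exact test would be more appropriate.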
6.4.3 Longitudinal Analysis
In order to study the change of the community over time, we compute three group-wide measures of activity and importance for each individual year in the range 1996 to 2014:
• The average out-degree and the average in-degree of all developers in each group, restricted to the replies given and received, respectively, during a given year.
• The Gini coefficient of the in-degree distribution of all developers of a given group, restricted to the replies received during a given year.
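The yearly averages described above can be sketched in Python as follows. The sketch assumes that the average for a group and year is taken over the developers of that group who wrote or received at least one reply in that year; the text does not specify this denominator. The function name, the data layout and the sample data are hypothetical.

```python
from collections import defaultdict


def yearly_average_degrees(replies, group_of):
    """Average out-degree and in-degree per (year, group).

    `replies` is an iterable of (year, sender, receiver) tuples and
    `group_of` maps each developer to a contributor group.
    Returns {(year, group): (avg_out_degree, avg_in_degree)}.
    """
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    active = defaultdict(set)  # (year, group) -> developers seen in that year
    for year, sender, receiver in replies:
        out_deg[(year, sender)] += 1
        in_deg[(year, receiver)] += 1
        active[(year, group_of[sender])].add(sender)
        active[(year, group_of[receiver])].add(receiver)
    result = {}
    for (year, group), devs in active.items():
        avg_out = sum(out_deg.get((year, d), 0) for d in devs) / len(devs)
        avg_in = sum(in_deg.get((year, d), 0) for d in devs) / len(devs)
        result[(year, group)] = (avg_out, avg_in)
    return result


# Made-up replies, not taken from the LKML data.
group_of = {"a": "Companies", "b": "Hobbyists", "c": "Hobbyists"}
replies = [(2010, "a", "b"), (2010, "a", "c"), (2010, "b", "a"), (2011, "c", "a")]
for key, value in sorted(yearly_average_degrees(replies, group_of).items()):
    print(key, value)
```

The yearly Gini coefficient per group would follow the same pattern: collect the in-degrees of a group's developers for one year and pass them to a Gini function.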
Figure 6.6 Average Out-Degree
Figure 6.7 Average In-Degree
Figure 6.8 Gini Coefficient
The results of the analysis are shown in Figures 6.6, 6.7 and 6.8. The average out-degree and in-degree (Figures 6.6 and 6.7) are consistent with the degree distributions shown in Figures 6.4 and 6.5. The average out-degree, which stands for the activity of the developers, increases over time for developers from companies and for hobbyists and does not change significantly for the developers of the other contributor categories. The measure of importance, the in-degree, behaves similarly: the values for developers from companies and hobbyists increase, while those for the other types of developers do not change significantly. The network-wide Gini coefficient (Figure 6.8) decreases slightly for developers from companies in the last years. The high fluctuation of the Gini coefficient for universities and research institutions is related to the small group sizes. Across all developer groups and years, the Gini coefficient is very close to one (> 98%, up to fluctuations). This value is higher than in the large majority of social networks (Kunegis & Preusse 2012), indicating that importance in the community is concentrated in a very small number of actors compared to other typical social networks.
6.5 Discussion and Conclusion
6.5.1 Discussion
This study provides the results of an activity analysis of different contributor groups in LK development from 1996 to 2014. The aim of this study was to investigate how different contributor groups associated with public and increasing private interests interact in an OSS development project. To achieve this goal, we analyzed developers active in the LK development community from a social network perspective, as the interaction between members of a software development community reflects the structure of their collaboration.
The first result of our analysis shows that the out-degree, as a measure of activity, and the in-degree, as a measure of importance, correlate highly both in general and for each contributor group individually. Thus, developers who write many replies to the mailing list also receive many replies. In this respect the LKML differs from forum communication, where a variety of user roles with different communication patterns can be identified (Chan et al. 2010). Both the highest activity and the highest importance are achieved by company developers, followed by hobbyists. The lowest activity and importance can be attributed to developers from research institutions and universities. Relating these results to the number of contributors per contributor category (Figure 6.1) shows that, although there are fewer developers from companies than hobbyists, the impact of employed developers on the LK community is larger. This impact is expressed by the number of messages sent per contributor group and year (Figure 6.2) as well as by the measures of activity (Figure 6.6) and importance (Figure 6.7), and is especially clear from 2010 on.
The second result of our analysis shows that the Gini coefficient, as a network-wide measure, decreases in recent years for developers from companies and remains constant for the other groups. Although the decrease is small, it can be seen as a tendency for importance within the group of company developers to be distributed across more actors. Considering the early years of LK development (from 1996 until 2000), university members were the most active and important force in the project (Figures 6.6 and 6.7). From 2001 on, the activity and importance of hobbyists and companies increase alike. Although the number of company contributors remains relatively stable from 2010 until 2014 (Figure 6.1), their activity (Figures 6.2 and 6.6) and importance (Figure 6.7) increase sharply in that period.
6.5.2 Conclusion
The aforementioned observations help to answer the research question of how different contributor groups associated with public and increasing private interests interact in an OSS development project, here LK development. In the beginning, the LK project was driven by intrinsically motivated enthusiasts, namely hobbyists and university members (e.g., students), who are associated with purely collective interests. As software and services built on top of the LK gained more and more influence, the participation of firms, motivated for example by offering complementary services (see Sub-Section 6.2.1), increased. As a consequence, the private interests in the project increased as well, especially from 2010 on with the diffusion of mobile devices powered by Android (Google 2012), which uses the LK as the foundation of its operating system. Summarizing the results of this study, the balance between private and collective contributors in LK development seems to be shifting towards an OS project that is mostly developed jointly by private companies. These firms need the kernel for their products and services. Thus, the LK project is no longer just an OS alternative through which hobbyists develop OSS in order not to be locked in by proprietary software. For the participating companies, the advantages outweigh the drawbacks of collective copyright ownership and reduced control over the project (see also Sub-Section 6.2.2, Table 6.1), as the LK community can be used to complement their own resource base for innovations.
6.5.3 Implications for Research
Our findings can be placed in the context of private-collective innovation. The LK development project is an outstanding OSS project. The engagement of companies has increased in recent years, as more and more firms have business models that rely on the kernel. Although companies cannot dictate what the community should do, they can to some degree influence the trajectory of the project by assigning employed developers to it (Schaarschmidt & Von Kortzfleisch 2015, Schaarschmidt et al. 2015). As the results of our study for the LK project show, employed developers can take key positions in a community owing to the intensity of their commitment to the community, expressed by activity and importance. With this in mind, future research should discuss the nature and structure of firm presence in OSS development more thoroughly. The majority of early research on OSS largely neglected firm presence and centered on developer motivation, while later research discussed emerging OSS business models (e.g., Bonaccorsi et al. 2006). Our study, however, calls for more longitudinal studies on firm presence in OSS, as (1) firm engagement varies over time and (2) former hobbyists might become employed developers (the latter was not a focus of this study).
6.5.4 Limitations and Suggestions for Future Research
Our research has some limitations that have to be considered when using the outcomes of our study. The categorization of LKML actors into four contributor groups by the hostname of their email addresses is an approximate but sufficient classification. It is known that some developers do not use their company email address when contributing and that actors may do personal work out of the office (Corbet et al. 2013). We mapped developers who occur more than once on the LKML to one person object associated with the email address used when first sending a message to the list. Multiple occurrences arise if a message sender uses different email addresses over time but the same sender name (e.g., because of a change of company). We have not considered these dynamics in our analysis. Furthermore, the LK project is a unique OSS project, so our conclusions cannot be transferred directly to other OSS projects in which firm-sponsored developers are involved. Further research could investigate the different types and content of the interaction as well as the aforementioned dynamics of developers, or compare the multi-vendor project LK with single-vendor OSS projects.
7 Study II: The Social Capital Effect on Value Contribution – Revealing Differences between Voluntary and Firm-Sponsored Open Source Software Developers
A very early research-in-progress version of this study has been published as Homscheid, D., Schaarschmidt, M., Staab, S. (2016), Firm-Sponsored Developers in Open Source Software Projects: A Social Capital Perspective. Research in Progress. Proceedings of the 24th European Conference on Information Systems (ECIS), June 12-15, 2016, Istanbul, Turkey.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2020 D. Homscheid, Firm-Sponsored Developers in Open Source Software Projects, Innovation, Entrepreneurship und Digitalisierung, https://doi.org/10.1007/978-3-658-31478-1_7
7.1 Introduction
OSS communities consist of users, developers and firms and constitute a resource pool that companies may use to complement their own resource base (Grand et al. 2004). The way in which companies benefit from OSS communities varies and corresponds to the strategy they pursue (Dahlander & Magnusson 2008). For example, software development firms can benefit by capturing technical knowledge available in the OSS community (Colombo et al. 2013), but also by cutting costs through integrating OSS into their own products. These different strategies become possible because OSS shares the characteristics of a public good, namely non-excludability and non-rivalry (Wasko et al. 2009). Although these characteristics enable firms to appropriate value from OSS as such, it is difficult to differentiate a firm’s own offerings that rely on or benefit from OSS from those of competitors (Da Silva & Alwi 2008). Thus, it is vitally important to at least influence the development trajectory of an OSS project in order to optimize the benefits that arise from making use of OSS communities (Schaarschmidt 2012). One way of establishing influence in OSS communities is to deploy a firm’s own resources to an OSS project (Schaarschmidt et al. 2013, West & O’Mahony 2008). Firm-sponsored developers are contractually bound to the firm and are likely to act in the firm’s interest while contributing to the OSS project as “a man on the inside” (Dahlander & Wallin 2006, p. 1243). Thus, assigning paid developers to work on an OSS project is a suitable means of influencing project work.
On the other hand, the pertinent literature on user communities and governance in OSS maintains that a large proportion of the influence individuals have in a community depends on their position in the community (e.g., Crowston & Howison 2006, Dahlander & O’Mahony 2011). This view is reflected in social capital theory, which posits that strong relationships and network positions that are advantageous for accessing information are valuable resources that affect different downstream variables, most importantly value creation (Tsai & Ghoshal 1998). Recently, the relation between network position in a community and different positive outcomes has been emphasized in various areas. For example, Chou & He (2011) showed that a developer’s social capital positively affects expertise integration, that is, individuals with high social capital synthesize project-related information for other community members. Relatedly, Wasko & Faraj (2005) found for electronic networks of practice (communities that share characteristics with OSS communities) that social capital is associated with knowledge contributions.
Despite these research efforts, important aspects that pertain to the role of firm-sponsorship have not yet been addressed. Research has yet to show whether the associations between network position and positive outcomes predicted by social capital theory are independent of developers’ profession or differ across heterogeneous developer groups. Against this background, this study aims to extend research that has used social capital theory to investigate online communities by addressing the following central research question:
How is the relation between an OSS contributor’s social capital and his created value affected by firm-sponsorship?
To approach the research question, a conceptual model of social capital and an individual’s value creation is adopted and tested. The study has theoretical as well as managerial implications. Theoretically, it sheds much-needed light on the effect of firm-sponsorship in online communities. Managerially, it reveals mechanisms that help firms engaged in OSS development to influence value creation.
This study is organized as follows. First, a very brief introduction to the social capital concept is given and OSS communities are defined. Second, the research model is presented along with a detailed derivation of the hypotheses. Third, the operationalization of the variables, the setup of the data, and the outlier and validity considerations are described in detail. Fourth, relevant descriptive information about LKML actors and their source code contributions as well as the results of the hypothesis tests are presented. The study concludes with a discussion of the results, a corresponding conclusion and a description of the implications for research and practice.
7.2 Theoretical Background
The following introduction to the theoretical background gives only a very brief outline of the most important thematic aspects. The detailed theoretical discussion can be found in Chapter 2 and Sub-Section 3.2.1.
7.2.1 The Concept of Social Capital
This study draws on the social capital view of Nahapiet & Ghoshal (1998) as well as of Lin (1999a, 2001), as Lin focuses in his considerations of social capital theory on the causal relationship between the position of an individual in a network and the outcomes associated with access to resources. In this sense, Lin defines social capital as “the resources embedded in social networks accessed and used by actors for actions” (Lin 2001, p. 25). Resources embedded in social networks may lead to positive outcomes for four reasons. First, an individual benefits from the flow of information if he has social ties with people in strategically or hierarchically relevant positions. The opportunities arising from this information flow would not be available without the corresponding relations (Chou & He 2011). Second, an individual who is centrally positioned in the social network is able to influence decisions by exploiting social ties. Third, other people attribute more and higher social credentials to an individual who can point to relations with strategically positioned actors. Lastly, social relationships are seen as reinforcing one’s identity and recognition, providing emotional support as well as public acknowledgment (Lin 1999a).
7.2.2 Open Source Software Communities
Von Hippel & von Krogh (2003) conceptualize OSS development communities as “Internet-based communities of software developers who voluntarily collaborate in order to develop software that they or their organizations need” (von Hippel & von Krogh 2003, p. 209). Besides the fact that OSS communities consist of hobbyists, who voluntarily provide their resources to the community, the definition also involves another important contributor group: organizations. Organizations differ from hobbyists in terms of their motivation to engage and are represented in the community by their employed developers. In turn, employed developers might be considered proxies for firm interests in the community. However, the majority of research efforts to date have pertained to communities characterized by homophilous actors. With firm-affiliated individuals, the community consists of heterogeneous actors, who have not yet been a focus of research.
7.3 Hypotheses Development
Nahapiet & Ghoshal (1998) introduced a conceptual model in which social capital, divided into structural, cognitive and relational dimensions, is the prerequisite for building up and harnessing intellectual capital in organizations. Tsai & Ghoshal (1998) used this conceptual model to derive and empirically examine a model of social capital and a firm’s value creation. This doctoral research draws on these elaborations to conceptualize and evaluate a model of social capital and an individual’s value creation. The focus, and thus the unit of analysis, is the individual level, involving individual contributors and their relationships to other contributors in OSS communities. The research model is depicted in Figure 7.1 and is explained below through the development of the hypotheses.
7.3.1 Relations of Social Capital Dimensions to Each Other
The structural dimension of social capital reflects the structure of relations between individuals in a group or community (Granovetter 1992a). The structural attributes of social capital are defined on the basis of the network structure spanned by the individuals, including network ties, network configuration and appropriable organization (Nahapiet & Ghoshal 1998). These attributes, which belong to the context of an individual’s social interaction, are seen to shape an individual’s trust in relationships with others and to influence the norms, obligations and expectations or the identification of an individual. The latter characteristics (i.e., trust, norms, obligations, identification) belong to the relational dimension of social capital, which comprises the assets that are anchored in relationships (Nahapiet & Ghoshal 1998). Thus, individuals form trusting relationships with their community peers over time through social interactions. This has been confirmed by previous studies, for example Granovetter (1985) and Tsai & Ghoshal (1998). Applying this reasoning to the context of OSS contributors, it is proposed that trusting relations are also formed through online social interaction, which leads to the following hypothesis:
Figure 7.1 Research Model (elements: social capital with its structural, relational and cognitive dimensions; firm-sponsorship; value contribution; hypothesized paths H1 to H9)
H1: Structural capital is positively associated with a contributor’s relational capital.
The cognitive dimension of social capital stands for a shared language, shared codes and shared narratives among individuals in a group or community (Nahapiet & Ghoshal 1998). These cognitive elements are essential because they support a common understanding of joint objectives in a social system and thereby facilitate the formation of trusting relationships among the individuals active in that social system (Barber 1983, Nahapiet & Ghoshal 1998). Owing to collective goals and values, individuals in a community are more likely to trust one another, as they assume that they are all pursuing the same objectives (Tsai & Ghoshal 1998). The same holds true for virtual or online communities, where individuals come together not physically but digitally via the Internet, share a common interest and common goals (O’Mahony & Bechky 2008) and use a common language (Stewart & Gosain 2006). On this basis, the following hypothesis for OSS contributors is established:
H2: Cognitive capital is positively associated with a contributor’s relational capital.
The relation between the structural and the cognitive dimension of social capital is based on the assumption that social interaction is vital for individuals to form and pass on common objectives and values in a group or community (Tsai & Ghoshal 1998). Focusing on contributors who adopt objectives, languages and values through their interaction in an OSS community, the following hypothesis is derived:
H3: Structural capital is positively associated with a contributor’s cognitive capital.
7.3.2 Relations of Social Capital Dimensions to Contribution
As described above, the structural dimension of social capital is based on an individual’s social ties and social interactions. Tsai & Ghoshal (1998) describe social ties as pathways through which information and resources flow. In particular, through social interaction an individual may obtain information and resources from other individuals through unofficial channels. Previous studies have shown that this is all the more true the more central an individual is in a community (e.g., Crowston & Howison 2006, Dahlander & O’Mahony 2011). Fast and easy access to information and resources enables an individual to perform better in accordance with the objectives of the group or community. Studies on this subject have examined the influence of an individual’s group centrality, which is associated with the structural dimension of social capital, on knowledge contribution (e.g., Wasko & Faraj 2005), expertise integration (e.g., Chou & He 2011) or research impact (e.g., Li et al. 2013). Furthermore, these studies show that social capital, and specifically its structural dimension, can be associated with valuable outcomes. In this doctoral study, in the context of OSS contributors, the outcome of an individual’s social capital is considered to be a specific value created for the OSS community. The understanding of created value is derived from the pertinent literature in this field (e.g., Amit & Zott 2001, Bowman & Ambrosini 2000, Lepak et al. 2007, Moran & Ghoshal 1996). In general, value creation has two dimensions: first, the content, that is, what is valuable, and second, the process, that is, how value is generated (Bowman & Ambrosini 2000, Lepak et al. 2007, Wassmer & Dussauge 2011). With regard to the first dimension, what is valuable, source code contribution is defined in this research as a proxy for the value created by individuals in a community.
With regard to the second described dimension concerning the process related to value creation it is anticipated that a high centrality of an individual in a community can lead to a greater volume and speed of resource flows (Li et al. 2013) or to the availability of needed resources (Lin 2001). Thus, it is proposed that central contributors in an OSS community can gain advantages from structural capital and obtain more information, knowledge and resources. The consequence of the locational as well as information advantage in the community is likely to be
reflected in a higher contribution to the source code. These considerations lead to the following hypothesis:
H4: Structural capital is positively associated with a contributor’s source code contribution.
Trust, norms, obligations and identification are key concepts of the relational dimension of social capital. Individuals can achieve more valuable relational capital by maintaining strong relationships with other individuals in a group (Nahapiet & Ghoshal 1998, van den Hooff & Huysman 2009). Accordingly, strong and trustworthy relationships are seen as a major asset for cooperation, resource acquisition and knowledge sharing in virtual communities (Chang & Chuang 2011, Ridings et al. 2002). Moreover, a high level of trust among individuals in a community can foster joint problem solving as well as commitment to and identification with the community (Wasko & Faraj 2005). If an individual maintains strong relationships with his peers and is fully committed to the community work, that person is most likely willing to spend more time on fulfilling tasks for the community (Hsu & Hung 2013). Further, with regard to the value creation process, committed and motivated individuals are seen as acting much more creatively to produce novel outcomes (Lepak et al. 2007). Drawing on these findings, the following hypothesis for the context of committed contributors in OSS communities is established:
H5: Relational capital is positively associated with a contributor’s source code contribution.
The cognitive dimension of social capital consists of a shared language and a shared vision, as previously described. Both characteristics form and strengthen over time and enable individuals to engage in a meaningful exchange of ideas, interests and knowledge within a collective (Nahapiet & Ghoshal 1998). In the same vein, individuals expand their knowledge and expertise during the process of exchange with their community peers (Wasko & Faraj 2005).
Such exchange processes enhance the cognitive capital of individuals and increase the likelihood that the individuals involved will contribute more, for example knowledge (Chiu et al. 2006, Wasko & Faraj 2005), to the community. Transferring these findings to the context of OSS contributors and their contribution to the source code leads to the following hypothesis:
H6: Cognitive capital is positively associated with a contributor’s source code contribution.
7.3.3 Firm-Sponsorship as Moderator
OSS communities receive contributions not only from hobbyists, but also from firms, more precisely from employed developers contributing to the community on behalf of their employer (Grand et al. 2004, Schaarschmidt & Von Kortzfleisch 2009). As a result, today’s successful OSS projects obtain contributions from hobbyists, universities, research centers, as well as from software vendors and user firms (Homscheid et al. 2015, Teigland et al. 2014, Schaarschmidt & Von Kortzfleisch 2015). As firms pursue certain objectives when dedicating employed developers to an OSS community, it is plausible that they may influence the development trajectory of an OSS project to get the desired returns from their engagement (Schaarschmidt 2012). Consequently, firm-sponsored contributors will get involved in the community in a different way than hobbyists (Dahlander & Magnusson 2006). To assess whether the relation between social capital and source code contribution differs between contributor groups, especially hobbyists and firm-sponsored developers, the following three hypotheses are proposed. These hypotheses are backed by the idea that being sponsored provides access to a wider set of resources (Dahlander & Wallin 2006) that complement the resources provided through social capital.
H7: Firm-sponsorship positively moderates the relation between a contributor’s structural capital and his source code contribution.
H8: Firm-sponsorship positively moderates the relation between a contributor’s relational capital and his source code contribution.
H9: Firm-sponsorship positively moderates the relation between a contributor’s cognitive capital and his source code contribution.
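A moderation hypothesis such as H7 is commonly tested by adding an interaction term between the predictor and the moderator to a regression. The following minimal Python sketch illustrates that idea on synthetic, noise-free toy values. It is not the PROCESS macro used in this study, and the variable names (`degree`, `sponsored`), the coefficients and the small `ols` helper are assumptions made purely for illustration.

```python
def ols(design, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(design[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(row[i] * row[j] for row in design) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(design, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Synthetic toy data (NOT Linux kernel data): degree centrality 1..8,
# each observed once for a hobbyist (0) and once for a sponsored developer (1).
samples = [(degree, sponsored) for degree in range(1, 9) for sponsored in (0, 1)]
# Assumed data-generating rule with an interaction effect of 3.
contribution = [10 + 2 * d + 5 * s + 3 * d * s for d, s in samples]

# Design matrix: intercept, predictor, moderator, predictor x moderator.
design = [[1.0, d, s, d * s] for d, s in samples]
b0, b_degree, b_sponsored, b_interaction = ols(design, contribution)

slope_hobbyist = b_degree                    # effect of degree when sponsored = 0
slope_sponsored = b_degree + b_interaction   # effect of degree when sponsored = 1
print(round(slope_hobbyist, 6), round(slope_sponsored, 6))
```

A positive interaction coefficient means that the slope of the predictor is steeper for sponsored developers, which is the pattern that H7 to H9 postulate.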
7.4 Research Design
7.4.1 Setup of the Data to be Examined
Data Sources under Consideration
For this study two data sources are utilized to test the proposed research model and its hypotheses. On the one hand, the LKML data is used to derive social capital properties of the LKML actors. On the other hand, LK source code contribution data is used to derive the value that the LK contributors added to the LK project. For this study the LKML data from the year 2014 is utilized in connection with LK source code data of the years 2014 and 2015. The LK source code data of both years are combined into an average source code contribution value. This approach is justified as follows: The reference period of the LKML is one year. Thus, data bias from LKML actors who are only occasionally active in LKML communication is reduced as opposed to a longer investigation period. Further, the use and the content of the LKML may have changed over the years and thus could cause issues related to construct, measurement and statistical conclusion validity (Howison et al. 2011), which are reduced through the chosen time frame under consideration. Moreover, in the presented research model, a linear relationship between independent and dependent variables is assumed, so that the chronological sequence of the data used is relevant. Accordingly, the data of the dependent variable, which is operationalized as source code contribution, come from the years 2014 and 2015. Source code contribution data of the year 2015 are added because LKML activities performed by an actor in 2014 may affect his source code contribution in the year 2015, as programming specific source code may take a while. The linking of the network data set of the LKML communication of the year 2014 with the LK source code contribution data of the years 2014 and 2015 is based on the email addresses of the LK developers existing in both data sources. The email addresses of the LK source code authors were extracted from the LK source code commit log files together with the corresponding source code contribution data.
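The linking and averaging step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually implemented for the study: the email addresses and values are hypothetical, addresses are assumed to match exactly, and an author without commits in one of the two years is assumed to count as zero for that year (the text does not specify this detail).

```python
# Hypothetical 2014 LKML network measures, keyed by email address.
lkml_actors = {
    "alice@example.org": {"degree": 12},
    "bob@example.org": {"degree": 3},
    "carol@example.org": {"degree": 7},   # no commits in either year
}
# Hypothetical changed lines per commit author and year.
changes_2014 = {"alice@example.org": 400, "bob@example.org": 50}
changes_2015 = {"alice@example.org": 600, "dave@example.org": 90}


def link_and_average(actors, first_year, second_year):
    """Join network measures with contribution data via the email address."""
    linked = {}
    for email, measures in actors.items():
        # Keep only actors that appear in both data sources.
        if email in first_year or email in second_year:
            # Assumption: a missing year counts as zero changed lines.
            average = (first_year.get(email, 0) + second_year.get(email, 0)) / 2
            linked[email] = {**measures, "avg_contribution": average}
    return linked


linked = link_and_average(lkml_actors, changes_2014, changes_2015)
for email, record in sorted(linked.items()):
    print(email, record)
```

Actors who appear in only one of the two sources (here `carol@example.org` and `dave@example.org`) drop out of the linked data set.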
Utilized Statistical Software
The independent variables (i.e., centrality measures) were calculated with the aid of R¹ (version 3.1.3), a free software environment for statistical computing and graphics, using the OS integrated development environment RStudio² for R (version 0.98.1103). The adjacency matrix generated for the communication that took place on the LKML in the year under consideration served as the input file for R. In detail, functions from the R “sna” package³ (i.e., tools for social network analysis; version 2.3-2) were used to calculate the needed individual actor-related measures. The developed R script stored the computed network measures directly in the local MySQL database. Hypotheses testing (i.e., calculating regressions) was done with IBM SPSS Statistics version 25. A corresponding database export file with all needed information about LKML actors’ activities (i.e., their centrality measures) and their source code contribution served as the input file for SPSS. The moderation hypotheses were checked using the PROCESS macro⁴ for SPSS (version 3.1) provided by Andrew F. Hayes.
¹ https://www.r-project.org, last accessed December 18, 2018.
² https://www.rstudio.com, last accessed December 18, 2018.
³ https://cran.r-project.org/web/packages/sna/, last accessed December 18, 2018.
⁴ http://www.processmacro.org, last accessed December 18, 2018.
7.4.2 Operationalization of Variables
The operationalization of the different dimensions and variables of the research model (see Figure 7.1) is driven by the underlying sources of data. As data from the LKML is used in this study to span a communication network of the various LKML actors (see Sub-Section 5.1.1), methods of social network analysis are utilized to calculate different actor-related network measures (e.g., Wasserman & Faust 1994) that are related to the three social capital dimensions developed by Nahapiet & Ghoshal (1998). Moreover, the dependent variable source code contribution is operationalized through the calculation of LK source code changes performed by the individual LK contributors. The two utilized sources of data limit the possibility that the results are affected by common method variance (Podsakoff et al. 2003). In the following the operationalization of the research model is delineated in detail.
Social Capital
In line with the proposed hypotheses, this study evaluates the relationship between an OSS contributor’s social capital and the value he creates, depicted by his source code contribution. Furthermore, the influence of firm-sponsorship of an OSS contributor on the aforementioned relationship has to be assessed. For the measurement of an OSS contributor’s individual social capital, and specifically of the three social capital dimensions, the gathered LKML data is used. Thus, specific network measures derived from the literature of theoretical (e.g., Borgatti et al. 1998, Li 2013) as well as empirical (e.g., Abbasi et al. 2014, Li et al. 2013) social capital research and social network analysis (e.g., Borgatti 2005, Faust 1997, Wasserman & Faust 1994) as well as graph theory (e.g., Borgatti & Everett 2006, Krumke & Noltemeier 2009) were assigned to the three social capital dimensions, as they best depict the characteristics of the different dimensions for measurement via network data.
The structural dimension of social capital is directly related to the structure of relationships, which arise from social interaction between actors in a community. These relationships or network ties can be seen as an impersonal configuration of relations between individuals or groups (Granovetter 1992b, Nahapiet & Ghoshal 1998). Granovetter (1992b) refers to the impersonal configuration of relations as
structural embeddedness. A basic measure from network analysis and graph theory that allows a statement about the structural position and embeddedness of an individual actor in a network, by taking into account the number of peers he is directly connected to, is degree centrality (Borgatti et al. 1998, Wasserman & Faust 1994). This measure is in line with the comments of Nahapiet & Ghoshal (1998) on the structural dimension of social capital, which is seen as displaying the overall pattern of relations between actors in a group or community. In this study degree centrality, and thus the degree of a node, is calculated as the number of nodes adjacent to it (Freeman 1978/79, Nieminen 1974, Wasserman & Faust 1994). Accordingly, it is the number of peers an individual LKML actor has interacted with, regardless of the direction (i.e., sending and/or receiving messages) and regardless of the amount of interaction (i.e., unweighted degree score). In the literature degree centrality was utilized as a measure for the structural dimension of social capital, for instance, by Abbasi et al. (2014), Chou & He (2011), Li et al. (2013) or Wasko & Faraj (2005).
The relational dimension of social capital consists of assets that are located in personal relationships between the actors engaged in social interactions within a community or group. The mentioned assets evolve over time through interaction with the peers and comprise trust, norms, obligations and identification (Nahapiet & Ghoshal 1998). Granovetter (1992b) terms this circumstance relational embeddedness of individuals in a group or community. A network measure reflecting the relational dimension of social capital is tie strength, also called weighted degree centrality (Abbasi et al. 2014, Borgatti et al. 1998). Granovetter (1973) underpins this with his remarks on the strength of relations.
The degree of interaction intensity between two actors, the frequency of their intimacy (i.e., trustworthiness), reciprocity as well as acknowledged or mutual obligations are reflected by the strength of the relationship between the corresponding actors (Lin 2001, Granovetter 1973). In addition, Granovetter (1983) clarifies the role of strong ties in the context of his weak tie argumentation by stating “[...] I had better say that strong ties can also have value. Weak ties provide people with access to information and resources beyond those available in their own social circle; but strong ties have greater motivation to be of assistance and are typically more easy available. I believe that these two facts do much to explain when strong ties play their unique role” (Granovetter 1983, p. 209).
Furthermore, Krackhardt (1992) elaborates that strong ties are essential for the development of trust between individuals. For the measurement of the relational dimension of social capital, tie strength is calculated and understood as the weighted
degree of a node, that is, the number of nodes adjacent to it weighted by their node values (Abbasi et al. 2014, Newman 2004, Wasserman & Faust 1994). Here, this is the number of communication interactions (i.e., number of messages sent and/or received) an individual LKML actor has had with his peers. In research, weighted degree centrality, or tie strength, is applied in the context of social capital studies, for example, by Abbasi et al. (2014), Plotkowiak (2014) or Seibert et al. (2001). Elements of the cognitive dimension of social capital are shared language and codes as well as shared narratives within a group or community (Nahapiet & Ghoshal 1998). Tsai & Ghoshal (1998) add that common goals, interests and aspirations, summarized under the term shared vision, can also be seen as a part of the cognitive dimension. In the context of this study it can be assumed, by taking key properties of communities into account (Whittaker et al. 1997), which were also applied in the context of online communities (Stockdale & Borovicka 2006), that the aforementioned elements of the cognitive dimension of social capital are partly or even entirely inherent among the members of the LK project. That is because the key properties of a community, that is, the reasons why people come together and form a community, are a shared goal, interest or need, reciprocity of information, support and services as well as a shared context of social conventions, language and protocols (Whittaker et al. 1997), which partly coincide with the elements of the cognitive dimension of social capital and are also reflected in the OSS ideology (for more information about the OSS ideology please see Sub-Section 3.2.2).
As the LK project is organized through various subgroups, the so-called LK subsystems, and accordingly by means of corresponding mailing lists, which serve to discuss implementation details, for instance, of specific drivers or software of the LK (e.g., linux-usb⁵ or linux-raid⁶) (Bovet & Cesati 2005, Lee & Cole 2003, Love 2005), the measure for the cognitive dimension of social capital in this study is related to the number of different subsystems, and thus mailing lists, an LK contributor is involved in. This operationalization is based on the approach of Plotkowiak (2014) and transferred to the LK context with respect to the information available in the crawled LKML data. In addition to the general shared language and codes as well as the general shared vision explained above, which LK contributors share based on their engagement in the LK project, each subsystem of the LK, and specifically each LKML, may have its own shared language and codes as well as shared vision. Although derived from the general LK shared language and shared vision, each subsystem may differ with respect to aspects related to its common goals and interests (Hertel et al. 2003). As a result, in the context of the LK project an LK contributor is considered to have more cognitive social capital if he is interested and active in different LK subsystems and thus in numerous LKMLs. The interest and engagement of an LK contributor in diverse LKMLs leads to a broad understanding and awareness of the different shared languages and shared visions of the various LKMLs. From this, the contributor forms advanced knowledge of the goals, interests and aspirations of the diverse LK subsystems, which then leads to an enhanced cognitive social capital of the LK contributor. In summary, in this study the cognitive dimension of social capital is operationalized through the number of different mailing lists (hereafter also referred to as cross lists) in which an LK contributor is active during the period under consideration. Former studies in the field of online or network-related social capital research following the social capital conceptualization of Nahapiet & Ghoshal (1998) assessed the cognitive dimension predominantly via questionnaire-based methods (e.g., Chiu et al. 2006, Lu & Yang 2011, Wasko & Faraj 2005), whereas some recent studies also operationalized all three social capital dimensions by network measures (e.g., Li et al. 2013, Plotkowiak 2014).
⁵ http://vger.kernel.org/vger-lists.html#linux-usb, last accessed December 18, 2018.
⁶ http://vger.kernel.org/vger-lists.html#linux-raid, last accessed December 18, 2018.
Firm-Sponsorship
It is known that actors in OSS communities can be described as heterogeneous not only in relation to their field of contribution or function (e.g., source code contribution, bug reporting, documentation services) (von Krogh et al.
2003, Ye & Kishida 2003) but also in relation to their motivation or reason to contribute to the community, for instance, because they are interested in learning and do the work in their leisure time, or because they get paid for it as part of their professional work (Ghosh 2005, Hertel et al. 2003, Lakhani & Wolf 2005, Roberts et al. 2006). The latter circumstance is of special interest for this study, as the actors on the LKML and the LK source code contributors are a heterogeneous group: some of them act on behalf of companies while others might be considered hobbyists (Homscheid et al. 2015). Thus, in order to operationalize firm-sponsorship, LKML actors were assigned to specific contributor categories in a partly automated and partly manual process. For more information about the categorization of the LKML actors and the contributor categories please refer to Sub-Section 5.1.3. For this study the LK contributors assigned to contributor category 1 (i.e., firm-sponsored LKML actors) and contributor category 2 (i.e., LK hobbyists) were of special interest to test the proposed moderator hypotheses.
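The three network measures described for the social capital dimensions (degree centrality, tie strength and cross lists) can be illustrated with a small Python sketch. It is not the R “sna” script used in the study. The message tuples, actor names and the rule that a contributor counts as active in a list once he has sent a message to it are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical toy messages: (sender, addressed peer, mailing list).
messages = [
    ("ann", "ben", "linux-usb"),
    ("ben", "ann", "linux-usb"),
    ("ann", "cat", "linux-raid"),
    ("cat", "ann", "linux-raid"),
    ("ann", "ben", "linux-usb"),
    ("ben", "ann", "linux-raid"),
]


def network_measures(msgs):
    """Return degree, tie strength and cross lists per actor."""
    peers = defaultdict(set)      # distinct interaction partners
    strength = defaultdict(int)   # messages sent and/or received
    lists = defaultdict(set)      # mailing lists an actor has posted to
    for sender, recipient, mailing_list in msgs:
        peers[sender].add(recipient)
        peers[recipient].add(sender)
        strength[sender] += 1
        strength[recipient] += 1
        lists[sender].add(mailing_list)
    return {
        actor: {
            "degree": len(peers[actor]),         # structural dimension
            "tie_strength": strength[actor],     # relational dimension
            "cross_lists": len(lists[actor]),    # cognitive dimension
        }
        for actor in peers
    }


measures = network_measures(messages)
for actor in sorted(measures):
    print(actor, measures[actor])
```

In this toy network `ann` and `ben` are both active in two lists, but `ann` has the higher degree and tie strength because she also exchanges messages with `cat`.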
Source Code Contribution
As a proxy for the value created by the LK contributors, their source code contribution is considered. The measure is calculated for the time frame under consideration for each LK source code author by summing up his LK source code changes. The source code changes of the source code authors are tracked in Git⁷, the source code management system used by the LK developers, subdivided into line inserts and line deletes (for more information about the gathering process of the LK source code data please see Section 5.2). These two values are tracked for each source code commit, and since each commit contains only the source code developed or changed by exactly one LK developer, these indicators can be clearly connected to the individual LK contributors. As not only new lines of source code stand for value created but deleted lines of source code are also accompanied by added value, for example in the case of maintaining/bug-fixing or improving existing source code as well as retiring obsolete features (Lotufo et al. 2010), line inserts and line deletes were summed up per source code commit, resulting in a number representing the source code line changes for a specific commit of a specific LK source code author. As a measure for source code contribution, the number of changes was added up for each LK source code author separately for the years 2014 and 2015 and then combined into an average source code contribution value.
⁷ https://git-scm.com, last accessed December 18, 2018.
Control Variables
In general, control variables are included in regression models to determine whether an occurring effect is the exclusive consequence of the independent variable(s) or not (Fahrmeir et al. 2013). The challenge in the use of network data for the investigation of causal relations between predictors and outcomes is the presence of such variables in the data (Li 2013). In relation to the postulated hypotheses of this study there may be other factors besides firm-sponsorship that influence the source code contribution activities of LK developers. Several former studies in the field of social capital research that used (partly) network data to empirically assess causal relationships refrained from including control variables in their models (e.g., Chang & Chuang 2011, Chiu et al. 2006, Chou & He 2011, Wasko & Faraj 2005). Other studies in the social capital context did include certain control variables in their evaluations. For example, Hsu & Hung (2013) controlled for project size and project duration when investigating the effects of social capital on process and product performance, and DiVincenzo & Mascia (2011) tested for project duration, project team size and geographical location in their study about the impact of social capital in project-based organizations on project performance. Derived from the utilized control variables of the aforementioned studies and on
the basis of the information existing in the complete LKML data set from the year 1996 until the year 2014, mailing list tenure of LKML actors was included in the evaluations of this study as a control variable. This is because the experience and the knowledge and understanding of LK tasks and issues that LKML actors accumulate over time might influence their source code contribution performance. In this study mailing list tenure is measured via the number of months a contributor is active in the LKML, taking the complete available LKML communication from the year 1996 until 2014 into account. In detail, mailing list tenure is determined through the months between the sending date of the first message of a contributor and the date of the last message sent to the list, which has to be located in the year 2014, as the communication of the year 2014 is under consideration for this study. This results in a potential maximum value of 228 months (= 19 years). It should be noted that this measure may be vulnerable to validity issues (Howison et al. 2011). The construct validity of this measure could theoretically be limited, since the tenure on the mailing list is derived from two points in time, without a closer look at the activity between them.
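The tenure measure described above can be sketched in a few lines of Python. The function name `tenure_months` is hypothetical, and counting both the first and the last month (inclusive counting) is an assumption chosen because it reproduces the stated maximum of 228 months for the period from January 1996 to December 2014.

```python
from datetime import date


def tenure_months(first_message, last_message):
    """Months between first and last message, counting both end months.

    Assumption: inclusive counting, so that January 1996 to December 2014
    yields the maximum of 228 months mentioned in the text.
    """
    return (
        (last_message.year - first_message.year) * 12
        + (last_message.month - first_message.month)
        + 1
    )


# Longest possible tenure in the LKML data set (1996 until 2014).
max_tenure = tenure_months(date(1996, 1, 15), date(2014, 12, 3))
# A contributor whose first and last message fall into the same month.
min_tenure = tenure_months(date(2014, 3, 1), date(2014, 3, 20))
print(max_tenure, min_tenure)
```

As noted in the text, such a measure only uses two points in time and says nothing about how active a contributor was in between.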
7.4.3 Outlier Detection
With regard to both data sources (i.e., LKML data and LK source code repository data) utilized to test the conceptualized research model, the data is considered to display the world as it is, as these network data were not formed through the direct request of participants but as a byproduct of the LK actors’ general activities (Howison et al. 2011). This contrasts with, for example, survey data, where the participants are aware that their opinion about a topic is requested and may thus give answers that do not correspond to their real opinion (Homburg & Krohmer 2008, Howison et al. 2011). Although the data used in this study is considered real world data, both data sets were checked for outliers as proposed by Osborne & Overbay (2004). The investigation of the LKML data revealed no conspicuous outliers. In the case of the LK source code data, outliers were detected and removed from the data set. This is justified by the fact that Git, the LK source code management software, counts all lines of a source file as deleted and then again as inserted when a source file is renamed, moved to another location in the LK directory or consolidated with other source code files. This is often the case in connection with cleaning up and reorganizing the LK source code. Appropriate patterns were revealed through in-depth analyses of the LK Git data by the doctoral candidate. For the detection of outliers the suggestion of Osborne & Overbay (2004) was followed and thus a visual inspection of the data by means of
box-plots was performed, and thresholds were calculated starting from three or more standard deviations from the mean. As a result of the outlier detection, eight records with values for source code contribution of more than five standard deviations from the mean (i.e., values > 92,760) were removed from the data set for the year 2014; the highest excluded value was 634,088 changed lines of LK source code. For the year 2015 three records with values for source code contribution of more than five standard deviations from the mean (i.e., values > 130,642) were excluded; the highest removed value was 586,447 changed lines of code. Table 7.1 gives the number of source code contributors (N), the mean and the standard deviation (S.D.) of the LK source code contribution data for the years under consideration after the outlier removal.
Table 7.1 LK Source Code Contribution Data: Means and Standard Deviations for the Years 2014 and 2015
Years    N        Means       S.D.
2014     1,825    1,259.26    3,916.96
2015     1,100    1,673.93    4,553.23
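The threshold rule used for the outlier removal (values of more than five standard deviations above the mean) can be sketched as follows. The values are synthetic and do not reproduce the thresholds reported above. Using the sample standard deviation and a single pass over the data are assumptions, as the text does not specify these details.

```python
from statistics import mean, stdev


def split_outliers(values, k=5.0):
    """Split values at mean + k * sample standard deviation."""
    threshold = mean(values) + k * stdev(values)
    kept = [v for v in values if v <= threshold]
    removed = [v for v in values if v > threshold]
    return threshold, kept, removed


# Synthetic changed-line counts (NOT LK data): 40 ordinary authors and one
# author whose count is inflated, e.g. by a large file move or rename.
changed_lines = [100 + 10 * i for i in range(40)] + [600_000]

threshold, kept, removed = split_outliers(changed_lines)
print(round(threshold), len(kept), removed)
```

Note that a single extreme value inflates both the mean and the standard deviation, so in small samples no value can exceed a five standard deviation threshold at all. The rule is only meaningful for reasonably large data sets such as the ones used here.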
7.4.4 Validity Consideration
To ensure the validity of this study, the recommendations given by Howison et al. (2011) with regard to the combination of network data and social network analysis methods (i.e., calculating different actor-related network measures, for example, degree centrality or tie strength) have been considered. The utilized network data, which includes the LKML data as well as the source code contribution data, can be seen in the sense of Howison et al. (2011) as digital trace data with specific characteristics that can cause issues related to the study’s validity. Digital trace data are described by Howison et al. (2011, p. 769 f.) through three features and are contrasted with traditionally collected research data, such as surveys or interviews:
1. Digital trace data are found data, in contrast to data produced for research studies.
2. Digital trace data are event-based data, rather than summary data.
3. Digital trace data are longitudinal data, rather than date-related data, because events occur over a period of time.
The points addressed by Howison et al. (2011) that are most relevant for this study, in the context of possible validity issues arising from the combination of network data and social network analysis methods, are briefly delineated in the following. System and practice issues, and therewith construct, measurement and statistical conclusion validity issues, are reduced as the time frame under consideration is set to one year. The use and the content of the LKML may have changed over the years (i.e., 1996–2014), but can most likely be considered stable for the year 2014, the year of the LKML under consideration. Corresponding sample checks of the usage and contents of the LKML for the year 2014 were carried out and revealed nothing noticeable with regard to the described possible issues. Reliability issues from system-generated data cannot be totally excluded but are considered to be of little relevance. Nevertheless, in-depth knowledge of both the LKML archive platform “marc.info” and the source code management system Git is available to the doctoral candidate. An issue occurring with the LKML archive platform “marc.info” could be the time zone management of the received and archived LKML emails. As the exact time stamp of LKML emails is not subject of the study, except for the delineation of LKML emails by year, this causes no issues related to internal and statistical conclusion validity. With regard to choosing multiple or single link types, no construct validity issues occur, as the LKML communication network consists of one kind of link between the LKML actors, which is the message communication. Inappropriate importation of network measure interpretations is a source of possible construct validity issues, as specific network data and thus the derived network measures may not be valid outside the original context.
The network measures applied in this study for the operationalization of social capital are derived from leading literature of theoretical (e.g., Borgatti et al. 1998, Li 2013) as well as empirical (e.g., Abbasi et al. 2014, Li et al. 2013) social capital research and social network analysis (e.g., Borgatti 2005, Faust 1997, Wasserman & Faust 1994) as well as graph theory (e.g., Borgatti & Everett 2006, Krumke & Noltemeier 2009), thus ensuring a high construct validity.
7.5 Results
7.5.1 Descriptive Information about Linux Kernel Mailing List Actors in 2014
In the following, descriptive information about LKML actors and their corresponding activities on the mailing list in 2014 is given. In the year 2014 there were
overall 278,300 mailing list activities (i.e., messages sent) that can be distinguished into 104,808 starter messages sent to the mailing list and 173,492 specifically recognized response messages. In total 6,007 actors or message senders could be identified as active in the year under consideration. Table 7.2 lists the Top 10 actors on the LKML in descending order of the number of messages they sent, giving their affiliation in the year 2014 and stating whether they were LK maintainers and/or credited LK developers. It can be clearly seen that there were no hobbyists present among the Top 10 LKML actors of the year 2014 and that the majority (i.e., eight out of ten) of the actors were LK maintainers.
Table 7.2 Top 10 LKML Actors of the Year 2014
Rank   LKML Actors              Affiliations        Amount of Messages sent in 2014
1.     Greg Kroah-Hartman¹,²    Linux Foundation    12,881
2.     Peter Zijlstra¹          Red Hat             3,386
3.     Kamal Mostafa            Canonical Ltd.      3,131
4.     Luis Henriques           Canonical Ltd.      3,001
5.     Rafael J. Wysocki¹       Intel               2,997
6.     Lee Jones¹               Linaro              2,657
7.     Mark Brown¹              Debian              2,641
8.     Steven Rostedt¹          Red Hat             2,432
9.     Arnd Bergmann¹           Linaro              2,403
10.    Andy Lutomirski¹         AMA Capital         2,276
Status: ¹ LK maintainer, ² Credited LK developer (Status as of February 2015)
Firm-sponsored LK actors are of special interest for this study and as a result also the companies employing such contributors. Thus, Table 7.3 shows the ten most active organizations on the LKML ranked in order of the activity of their employed contributors in the year 2014. In the year under consideration Red Hat employees had sent more than 17,200 messages to the LKML. In comparison to the total number of sent messages to the mailing list in 2014 (i.e., 278,300 emails) the employees of the Top 3 companies account for 16.02% (44,589 emails) and the employees of the Top 10 are responsible for 31.61% (87,958 emails) of the messages that have been sent. The Linux Foundation was excluded from the mentioned Top 10 list, because it is a non-profit organization with immediate connection to the LK project.
Table 7.3 The 10 Most Active Organizations on the LKML of the Year 2014

| Rank | Organization | Messages sent in 2014 |
| --- | --- | --- |
| 1 | Red Hat | 17,225 |
| 2 | Intel | 14,556 |
| 3 | Linaro | 12,808 |
| 4 | SUSE | 10,532 |
| 5 | Canonical Ltd. | 7,238 |
| 6 | Google | 6,166 |
| 7 | IBM | 5,332 |
| 8 | Samsung | 5,261 |
| 9 | ARM Ltd. | 4,445 |
| 10 | Texas Instruments | 4,395 |
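The Top 3 and Top 10 shares quoted in the text can be re-derived from the message counts in Table 7.3. The following is a minimal sketch of that arithmetic; the helper name `share_of_total` is hypothetical and chosen for illustration only, it is not part of the study's tooling.

```python
# Total LKML messages in 2014 as stated in the text.
TOTAL_MESSAGES = 278_300

# Organization -> messages sent in 2014, in ranked order (Table 7.3).
messages_by_org = {
    "Red Hat": 17_225,
    "Intel": 14_556,
    "Linaro": 12_808,
    "SUSE": 10_532,
    "Canonical Ltd.": 7_238,
    "Google": 6_166,
    "IBM": 5_332,
    "Samsung": 5_261,
    "ARM Ltd.": 4_445,
    "Texas Instruments": 4_395,
}


def share_of_total(counts, total):
    """Return (sum of counts, percentage of total rounded to two decimals)."""
    subtotal = sum(counts)
    return subtotal, round(100 * subtotal / total, 2)


ranked = list(messages_by_org.values())
top3 = share_of_total(ranked[:3], TOTAL_MESSAGES)
top10 = share_of_total(ranked[:10], TOTAL_MESSAGES)
print("Top 3:", top3)
print("Top 10:", top10)
```

The two results correspond to the 44,589 emails (16.02%) and 87,958 emails (31.61%) reported in the text.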
As explained in Sub-Section 5.1.1 in the context of the Person Mapping software module, different actors on the LKML may share an identical name and are then mapped by the person mapping program to one person object. For the time frame used for hypothesis testing (i.e., the year 2014), 6,007 actors on the LKML were identified. Of these, 55 actors were found to share an identical name while using different email addresses on the mailing list. Under the assumption that these 55 actors are all distinct persons sharing the same name, the possible erroneous linkage rate of LKML actors with an identical name in 2014 is about 0.92% (55 of 6,007), which is considered acceptable.
7.5.2 Descriptive Information about the Linux Kernel Source Code for 2014 and 2015
The whole LK consists of different subsystems and parts, for example, the hardware, network and sound drivers as well as the documentation (Bovet & Cesati 2005, Lee & Cole 2003, Love 2005). To get an understanding of how the over 20 million lines of source code of the LK software are distributed over the different parts Table 7.4 states the lines of source code and the corresponding percentage per LK part as of August 17, 2015. The results in the given table reveal that the drivers part is by far the largest component of the kernel with over 11.48 million lines of source code making up 56.61% of the kernel. Far behind come the two parts belonging to architecture (3.39 million lines of code making up 16.74%) and file system (1.17 million lines of code making up 5.81%).
Table 7.4 Distribution of Lines of Source Code per LK Part

| LK Part | Lines of Code | Percentage |
| --- | --- | --- |
| ./usr | 845 | 0.0042 |
| ./init | 5,739 | 0.0283 |
| ./samples | 8,758 | 0.0432 |
| ./ipc | 8,926 | 0.0440 |
| ./virt | 10,701 | 0.0527 |
| ./block | 37,845 | 0.1865 |
| ./security | 74,844 | 0.3688 |
| ./crypto | 90,327 | 0.4451 |
| ./scripts | 91,474 | 0.4507 |
| ./lib | 109,466 | 0.5394 |
| ./mm | 110,035 | 0.5422 |
| ./firmware | 129,084 | 0.6360 |
| ./tools | 232,123 | 1.1438 |
| ./kernel | 246,369 | 1.2140 |
| ./Documentation | 569,944 | 2.8085 |
| ./include | 715,349 | 3.5250 |
| ./sound | 886,892 | 4.3703 |
| ./net | 899,167 | 4.4307 |
| ./fs | 1,179,220 | 5.8107 |
| ./arch | 3,398,176 | 16.7449 |
| ./drivers | 11,488,536 | 56.6110 |
| Overall | 20,293,820 | 100.0000 |

Source Code Date: August 17, 2015 from LK GitHub Mirror (https://github.com/torvalds/linux, last accessed December 18, 2018)
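The percentages in Table 7.4 follow directly from the line counts. As a minimal sketch, the shares of the three largest parts named in the text can be recomputed as follows; the function name `percentage` is an illustrative assumption.

```python
# Overall lines of source code and the three largest LK parts (Table 7.4).
OVERALL_LINES = 20_293_820
lines_by_part = {
    "./drivers": 11_488_536,
    "./arch": 3_398_176,
    "./fs": 1_179_220,
}


def percentage(part_lines, overall=OVERALL_LINES):
    """Share of one LK part in the overall line count, in percent (4 decimals)."""
    return round(100 * part_lines / overall, 4)


shares = {part: percentage(count) for part, count in lines_by_part.items()}
for part, share in shares.items():
    print(f"{part}: {share:.4f}%")
```

The recomputed shares agree with the Percentage column of Table 7.4 for these three parts.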
For this study, the two years 2014 and 2015 are considered for evaluating the source code contribution of LK developers. Descriptive information is given separately for both years below. For the year 2014, 3,941 LK source code authors were identified. The source code committed in 2014 by these authors was added to LK releases between January 2014 and January 2016 through 74,884 source code commits, hence comprising source code for the LK releases 3.13 to 4.4. The majority of commits containing source code from 2014 relate to LK releases between June 2014 and February 2015 (see Table 7.5). In the year 2015, 3,863 source code authors contributed source code to the LK. Their source code was included in LK releases from February 2015 until January 2016 through 63,545 source code commits, thus comprising source code for the LK releases 3.19 to 4.4. Here, the majority of commits comprising source code from 2015 relate to LK releases between June 2015 and January 2016.

The distribution of LK commits containing source code developed in 2014 and 2015 is contrasted in Table 7.5. The comparison shows that, although both years had nearly the same number of source code authors, up to LK release 4.4 in January 2016 there are about 11,300 fewer commits with LK source code of 2015 (63,545 versus 74,884). This difference can be explained by the fact that not all source code developed in 2015 had been officially integrated into the LK, and was thus not yet part of an LK release, because the LK source code data collection and analysis period was March and April 2016, when the latest LK release was release number 4.4 (see footnote 8).

Table 7.5 Distribution of the LK Source Code Commits 2014 and 2015 via the LK Versions

| LK Version | Release Date | Commits with LK Source Code of 2014 (3,941 source code authors) | Commits with LK Source Code of 2015 (3,863 source code authors) |
| --- | --- | --- | --- |
| 3.13 | 19 January 2014 | 187 | – |
| 3.14 | 30 March 2014 | 5,356 | – |
| 3.15 | 08 June 2014 | 13,931 | – |
| 3.16 | 03 August 2014 | 13,688 | – |
| 3.17 | 05 October 2014 | 13,250 | – |
| 3.18 | 07 December 2014 | 12,313 | – |
| 3.19 | 08 February 2015 | 12,633 | 939 |
| 4.0 | 12 April 2015 | 2,850 | 8,331 |
| 4.1 | 21 June 2015 | 465 | 12,475 |
| 4.2 | 30 August 2015 | 127 | 14,605 |
| 4.3 | 01 November 2015 | 60 | 13,214 |
| 4.4 | 10 January 2016 | 24 | 13,981 |
| Total | | 74,884 | 63,545 |

Turning to the LK source code contributors of 2014 and 2015, a comparison of the ten most active LK source code developers of both years shows that contributors affiliated with companies form the majority in the listings and that nearly all of the ten most active developers occupy a central position in the governance structure of the LK as maintainers (see Table 7.6). In addition, Table 7.6 states the number of inserts and deletes and the sum of both, that is, the number of lines of code changed by each contributor. Furthermore, the most active LK source code contributing firms of 2014 and 2015 show that seven of the ten listed companies are present among the Top 10 in both years (see Table 7.7). The results for the most active LK source code contributing companies of 2014 are consistent with the list published by the Linux Foundation about companies that are sponsoring the LK work (Corbet et al. 2015), although the time frame of the Linux Foundation report included, in addition to the LK releases of the year 2014, LK release 3.11 of September 2013 and LK release 3.12 of November 2013. Table 7.7 gives further information about the lines of code changed, separated into the number of inserts and deletes and the sum of both for each listed company.
Footnote 8: This circumstance is not relevant for the further evaluations of this study, as the focus is on the source code commits of the corresponding years and these data sets are completely available.

Table 7.6 Overview of the 10 Most Active LK Source Code Contributors of the Years 2014 and 2015

2014

| Rank | Contributor | Status | Affiliation | Inserts | Deletes | Sum of Changes |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Tomi Valkeinen | 1 | Texas Instruments | 319,074 | 315,014 | 634,088 |
| 2 | Larry Finger | 1 | Hobbyist | 240,231 | 73,319 | 313,550 |
| 3 | Greg Kroah-Hartman | 1, 2 | Linux Foundation | 81,423 | 149,960 | 231,383 |
| 4 | Andrzej Pietrasiewicz | 1 | Samsung | 114,138 | 107,506 | 221,644 |
| 5 | Kristina Martsenko | – | FOSS Outreach Program | 111 | 165,115 | 165,226 |
| 6 | Hans Verkuil | 1 | Cisco | 64,036 | 74,311 | 138,347 |
| 7 | Dave Chinner | 1 | Red Hat | 49,516 | 49,558 | 99,074 |
| 8 | Mauro Carvalho Chehab | 1, 2 | Samsung | 37,836 | 54,957 | 92,793 |
| 9 | Malcolm Priestley | 1 | Hobbyist | 12,645 | 68,506 | 81,151 |
| 10 | H. Hartley Sweeten | 1 | Vision Engraving Systems | 26,204 | 38,921 | 65,125 |

2015

| Rank | Contributor | Status | Affiliation | Inserts | Deletes | Sum of Changes |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Ben Skeggs | – | Red Hat | 295,434 | 291,013 | 586,447 |
| 2 | Alex Deucher | 1 | AMD | 468,581 | 11,769 | 480,350 |
| 3 | Larry Finger | 1 | Hobbyist | 179,858 | 180,301 | 360,159 |
| 4 | Dennis Dalessandro | – | Intel | 31,557 | 30,633 | 62,190 |
| 5 | Mike Marciniszyn | 1 | Intel | 57,759 | 163 | 57,922 |
| 6 | Christoph Hellwig | 1, 2 | Hobbyist | 14,772 | 30,010 | 44,782 |
| 7 | Jie Yang | 1 | Intel | 20,800 | 20,460 | 41,260 |
| 8 | Ingo Molnar | 1 | Red Hat | 21,219 | 19,757 | 40,976 |
| 9 | Takashi Iwai | 1 | SUSE | 19,361 | 21,109 | 40,470 |
| 10 | Johnny Kim | 1 | Atmel Corporation | 34,099 | 276 | 34,375 |

Status: 1 = LK maintainer, 2 = credited LK developer (status as of March 2016)

Table 7.7 Overview of the 10 Most Active LK Source Code Contributing Companies of the Years 2014 and 2015

2014

| Rank | Company | Inserts | Deletes | Sum of Changes |
| --- | --- | --- | --- | --- |
| 1 | Texas Instruments | 374,151 | 338,861 | 713,012 |
| 2 | Samsung | 212,265 | 230,777 | 443,042 |
| 3 | Intel | 152,676 | 288,319 | 440,995 |
| 4 | Red Hat | 168,418 | 218,274 | 386,692 |
| 5 | Linaro | 57,332 | 109,416 | 166,748 |
| 6 | Cisco | 65,637 | 65,611 | 131,248 |
| 7 | SUSE | 49,795 | 61,329 | 111,124 |
| 8 | Free Electrons | 25,653 | 59,012 | 84,665 |
| 9 | Vision Engraving Systems | 27,419 | 40,094 | 67,513 |
| 10 | Freescale Semiconductor | 18,343 | 48,390 | 66,733 |

2015

| Rank | Company | Inserts | Deletes | Sum of Changes |
| --- | --- | --- | --- | --- |
| 1 | Red Hat | 403,271 | 358,969 | 762,240 |
| 2 | AMD | 505,428 | 25,031 | 530,459 |
| 3 | Intel | 421,295 | 186,175 | 607,470 |
| 4 | SUSE | 48,169 | 72,468 | 120,637 |
| 5 | Linaro | 78,606 | 38,788 | 117,394 |
| 6 | Samsung | 75,975 | 39,841 | 115,816 |
| 7 | Freescale Semiconductor | 59,539 | 31,475 | 91,014 |
| 8 | Atmel Corporation | 52,590 | 18,041 | 70,631 |
| 9 | Mellanox Technologies | 51,298 | 13,611 | 64,909 |
| 10 | Texas Instruments | 40,408 | 19,468 | 59,876 |

7.5.3 Correlations and Regression Results

To give first insights into the results of the study, Table 7.8 displays descriptive information (i.e., means, standard deviations, and minimum and maximum values) and correlation coefficients for the investigated data set. The sample size can vary between variables, because only those records were used for which the variable values were present in the data set. LKML tenure is measured in active months of an actor on the LKML. For the determination of LKML tenure, the whole period from 1996 until 2014 was utilized. The 19 years under consideration allow for a potential maximum value of 228 months (19 × 12). Table 7.8 states a maximum value of 220 months for LKML tenure. This means that at least one actor in the data set sent their first message to the LKML in 1996 and was still active on the mailing list in 2014. The maximum value for degree centrality represents a node that has 1,336 other nodes adjacent to it. In addition, the maximum value
of tie strength shows 5,932 messages sent and/or received by an individual LKML actor with their peers in the year 2014. The maximum number of LKML cross lists an LKML actor is active in is 65, and the maximum number of source code line changes an individual made in the LK source code is 42,667 (average value of the years 2014 and 2015). The two contributor categories "hobbyists" and "firm-sponsored" are each dummy-coded. All displayed correlations are significant at p < 0.01. A closer look at the correlations in Table 7.8 shows that the correlation between degree centrality and tie strength is very high (.930). Moreover, the correlations between degree centrality and number of cross lists (.733) and between tie strength and number of cross lists (.610) are also remarkably high. Previous studies utilizing network measures also found high correlations between different network measures, for example, Bloch et al. (2016) and Valente et al. (2008).

Table 7.8 Operationalized Variables: Descriptive Statistics and Correlations

| Variables | N | Mean | S.D. | Min | Max | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 LKML Tenure | 5,941 | 30.53 | 46.00 | 0 | 220 | | | | | | | |
| 2 Degree Centrality | 5,503 | 15.12 | 48.61 | 0 | 1,336 | .262** | | | | | | |
| 3 Tie Strength | 5,503 | 56.06 | 238.94 | 0 | 5,932 | .217** | .930** | | | | | |
| 4 #Cross Lists | 6,001 | 3.05 | 4.59 | 1 | 65 | .248** | .733** | .610** | | | | |
| 5 SCC AVG | 2,005 | 1,032.02 | 2,979.04 | 0 | 42,667 | .177** | .345** | .307** | .339** | | | |
| 6 Hobbyists | 4,326 | .586 | .493 | 0 | 1 | −.163** | −.072** | −.058** | −.161** | −.094** | | |
| 7 Firm-Sponsored | 4,326 | .414 | .493 | 0 | 1 | .163** | .072** | .058** | .161** | .094** | −1.0** | |

***p < 0.001; **p < 0.01; *p < 0.05; + p < 0.1

Turning to the results of the calculated regressions, Table 7.9 states the findings related to hypotheses 1 to 3. H1 predicts a positive relationship between structural capital (degree centrality) and relational capital (DV tie strength). The relation is significant (p < 0.001) with a β of .938 for degree centrality and a β of −.029 for the control variable LKML tenure. The high β value for degree centrality is an indication of multicollinearity. An additional examination of the variance inflation factor (VIF) gives a value of 1.074 for LKML tenure and degree centrality, which is well below the recommended threshold of 3 (Hair et al. 2010, O’Brien 2007). However, the correlation between degree centrality and tie strength (see Table 7.8) is very high (.930). Degree centrality explains 86.6% of the variance in tie strength; adding the control variable LKML tenure has no influence on the coefficient of determination (R²).

The second hypothesis posits that cognitive capital (#cross lists) is positively associated with relational capital (DV tie strength). The regression is significant (p < 0.001) with β values of .592 for #cross lists and .076 for the control variable LKML tenure. With a value of 1.06 for LKML tenure and #cross lists, the VIF is below the threshold of 3. The number of cross lists an LKML actor is active in explains 37.8% of the variance in tie strength; here, too, the control variable LKML tenure has no influence on R².

H3 describes a positive relation between structural capital (degree centrality) and cognitive capital (DV #cross lists). This hypothesis is confirmed at a significance level of p < 0.001. The β values are .720 for degree centrality and .049 for the control variable LKML tenure. Due to the high β value for degree centrality the VIF was also assessed, but with a value of 1.074 it does not pose an issue and is well below the recommended threshold of 3 (Hair et al. 2010, O’Brien
2007). However, the correlation between degree centrality and #cross lists (see Table 7.8) shows a high value of .733. Degree centrality explains 53.9% of the variance in the number of cross lists an LKML actor is active in; the control variable LKML tenure does not have an impact on R².

Hypotheses 1 to 3 relate to the relationships of the three social capital dimensions to each other. Although the calculated regressions are significant, the partly high β values and high correlation values mean that the results must be interpreted with more nuance. The network measures used for the network-based determination of the three social capital dimensions operate in a similar way and sometimes differ only in nuances, such as degree centrality and tie strength. The utilized measures are based on characteristics of nodes, which all show a similar behavior (Bloch et al. 2016); that is, nodes with a higher degree centrality will most likely also have a higher tie strength. Consequently, the mentioned measures will most likely show high values in correlation and regression results. Regardless, it should be noted that, despite its significant effect, the control variable LKML tenure has only a marginal impact (significant β values less than .08) on the relationships tested and no impact on the explained variances (R²) at the individual model level.

Table 7.9 Linear Regression Results: Hypotheses 1 to 3

| | H1 DV Tie Strength | H2 DV Tie Strength | H3 DV #Cross Lists |
| --- | --- | --- | --- |
| LKML Tenure (Control) | −.146 (.026)*** | .389 (.056)*** | .005 (.001)*** |
| Degree Centrality | 4.610 (.025)*** | – | .070 (.001)*** |
| #Cross Lists | – | 29.812 (.552)*** | – |
| R² | .866 | .378 | .539 |
| Number of Actors | 5,475 | 5,475 | 5,475 |

***p < 0.001; **p < 0.01; *p < 0.05; + p < 0.1; standard errors in parentheses
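The standardized H1 results quoted in the text (β values, R² and VIF) can be approximately reproduced from the correlations in Table 7.8, because in a regression with two predictors the standardized solution depends only on the three pairwise correlations. The sketch below uses the textbook closed-form formulas; it is not the procedure used in the study. It also assumes that the full-sample correlations of Table 7.8 are close to those of the slightly smaller regression sample (5,475 actors), and the function name is hypothetical.

```python
def two_predictor_regression(r_y1, r_y2, r_12):
    """Standardized OLS solution for y ~ x1 + x2 from pairwise correlations.

    r_y1, r_y2: correlations of the dependent variable with x1 and x2.
    r_12: correlation between the two predictors.
    Returns (beta_1, beta_2, r_squared, vif); the VIF is identical for
    both predictors in the two-predictor case.
    """
    tolerance = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / tolerance
    beta_2 = (r_y2 - r_y1 * r_12) / tolerance
    r_squared = beta_1 * r_y1 + beta_2 * r_y2
    vif = 1.0 / tolerance
    return beta_1, beta_2, r_squared, vif


# H1: y = tie strength, x1 = degree centrality, x2 = LKML tenure.
# Correlations from Table 7.8: r(tie, degree) = .930,
# r(tie, tenure) = .217, r(degree, tenure) = .262.
beta_degree, beta_tenure, r2, vif = two_predictor_regression(0.930, 0.217, 0.262)
print(f"beta degree centrality: {beta_degree:.3f}")
print(f"beta LKML tenure:       {beta_tenure:.3f}")
print(f"R squared:              {r2:.3f}")
print(f"VIF:                    {vif:.3f}")
```

Under these assumptions the values land within rounding distance of the reported figures (β of .938 and −.029, R² of .866, VIF of 1.074). Applying the same formulas to H2 and H3 gives values that are close to, but not identical with, the reported ones, which is to be expected when the correlation sample and the regression sample differ.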
In the next step, the results of the calculated regressions relating the social capital dimensions to source code contribution (hypotheses 4 to 6) are considered (see Table 7.10). For H4, the research model states a positive relation between structural capital (degree centrality) and the dependent variable LK source code contribution. The result is significant, with β values of .322 (p < 0.001) for degree centrality and .066 (p < 0.01) for the control variable LKML tenure. The VIF for degree centrality and LKML tenure is 1.123 and thus below the threshold of 3 (Hair et al. 2010, O’Brien 2007). Degree centrality and the control variable LKML tenure together explain 12.3% of the variance in source code contribution, whereby the increase in R² attributable to LKML tenure is just 0.4%.

H5 predicts a positive relation between relational capital (tie strength) and the dependent variable LK source code contribution. The calculated regression is significant (p < 0.001) with β values of .280 for tie strength and .102 for the control variable LKML tenure. With a value of 1.085, the VIF for tie strength and LKML tenure is no issue. Tie strength and LKML tenure together explain 10.4% of the variance in source code contribution; here, the increase in R² attributable to LKML tenure is 1.0%.

H6 posits a positive relationship between cognitive capital (#cross lists) and the dependent variable source code contribution. This relationship is significant, with β values of .314 (p < 0.001) for #cross lists and .075 (p < 0.01) for the control variable LKML tenure. The VIF for #cross lists and LKML tenure is no issue; with a value of 1.125 it is far below the threshold of 3 (Hair et al. 2010, O’Brien 2007). The number of cross lists an LKML actor is active in and LKML tenure together explain 12.0% of the variance in source code contribution, whereby LKML tenure is responsible for an R² increase of just 0.5%.

Overall, hypotheses 4 to 6 all relate to the dependent variable LK source code contribution and are confirmed by the regression results. The control variable LKML tenure is also significant in all three relations but has only a marginal impact, with β values of .102 or less at the relationship level and a minimal impact (1.0% or less) on the explained variances (R²) at the individual model level.

Table 7.10 Linear Regression Results: Hypotheses 4 to 6

| | H4 DV SCC AVG | H5 DV SCC AVG | H6 DV SCC AVG |
| --- | --- | --- | --- |
| LKML Tenure (Control) | 4.700 (1.584)** | 7.194 (1.562)*** | 5.323 (1.570)** |
| Degree Centrality | 16.395 (1.141)*** | – | – |
| Tie Strength | – | 2.856 (.226)*** | – |
| #Cross Lists | – | – | 146.473 (10.374)*** |
| R² | .123 | .104 | .120 |
| Number of Actors | 1,984 | 1,984 | 1,993 |

***p < 0.001; **p < 0.01; *p < 0.05; + p < 0.1; standard errors in parentheses
SCC AVG: Average Source Code Contribution (2014–2015)
In the following, the results of the moderator analysis related to hypotheses 7 to 9 are outlined (see Table 7.11). In general, moderator effects can be examined through two different models in the context of regression analysis, first, in the form of interaction effects or second, in the form of multi-group analysis (Urban & Mayerl 2018). In this study the moderator effects were examined through calculating interaction effects with the aid of the PROCESS macro for SPSS (version 3.1), as thus the strength, direction and significance of the moderator effects can be easily determined (Urban & Mayerl 2018). H7 posits that firm-sponsorship moderates the relation between structural capital (degree centrality) and source code contribution. This hypothesis is confirmed, as the interaction term of degree centrality and the moderator firm-sponsorship is significant (p < 0.001) in the calculated model. Figure 7.2 depicts the simple slopes of the corresponding interaction. The significant interaction effect of degree centrality and firm-sponsorship makes a contribution of 2.3% and LKML tenure (control variable) contributes 0.7% to the coefficient of determination resulting in an overall R2 of 16.6%. H8 states that the relationship between relational capital (tie strength) and source code contribution is moderated by firm-sponsorship. The results of the moderator analysis reveal an existing moderation effect, as the interaction term of tie strength and firm-sponsorship is significant (p < 0.001). Figure 7.3 shows the simple slopes of the aforementioned interaction. The significant interaction effect of tie strength and firm-sponsorship contributes 3.4% and the control variable LKML tenure is responsible for an increase of 1.3% to the coefficient of determination resulting in an overall R2 of 15.1%. The postulated H9 states a relation between cognitive capital (#cross lists) and source code contribution, which is moderated by firm-sponsorship. 
The calculated moderator effect confirms a significant moderation (p < 0.001). Corresponding simple slopes are depicted in Figure 7.4. The significant interaction effect of the amount of cross lists a LKML actor is active in and the moderator firm-sponsorship makes a contribution of 1.5% and LKML tenure (control variable) contributes 0.9% to the coefficient of determination resulting in an overall R2 of 15.1%. In summary, hypotheses 7 to 9 are related to the moderation of relations by firm-sponsorship. The results reveal a significant interaction effect related to all corresponding relationships, thus confirming all postulated moderation hypotheses. Besides the interaction effect the significant effect of LKML tenure (control variable) has a minor impact (1.3% and less) on the coefficient of determination in all three tested cases. The given simple slopes show the impact of the moderator (firm-sponsored; not firm-sponsored (hobbyists)) on the relationship between
degree centrality (Figure 7.2), tie strength (Figure 7.3), #cross lists (Figure 7.4) and source code contribution.

In addition to the individual regression relationships described above, the regression model of the social capital dimensions (DV tie strength) and the overall regression model (DV source code contribution) were also examined. The results of the regression model of the social capital dimensions (see Table 7.12) are significant but reveal a β value of 1.047 (p < 0.001) for degree centrality, a β value of −.152 (p < 0.001) for #cross lists and a β value of −.021 (p < 0.001) for the control variable LKML tenure. Although the calculated VIFs are all below the threshold of 3 (Hair et al. 2010, O’Brien 2007), the very high β value for degree centrality is an indicator of multicollinearity. The total regression model shows no significance (see Table 7.13), as the relations from degree centrality and tie strength to source code contribution are not significant. The β value for LKML tenure is .061 (p < 0.01) and the β value for #cross lists is .174 (p < 0.001). In addition, in three cases (degree centrality, tie strength and #cross lists) the VIF is above the limit of 3, which is a clear indicator of multicollinearity among these independent variables (Hair et al. 2010, O’Brien 2007).

Table 7.11 Moderation Test Results: Moderation Hypotheses 7 to 9

| | H7 DV SCC AVG | H8 DV SCC AVG | H9 DV SCC AVG |
| --- | --- | --- | --- |
| LKML Tenure (Control) | 6.778 (1.872)*** | 9.340 (1.841)*** | 7.643 (1.864)*** |
| Degree Centrality | 9.430 (1.745)*** | – | – |
| Tie Strength | – | 1.334 (.312)*** | – |
| #Cross Lists | – | – | 88.524 (16.695)*** |
| Firm-Sponsorship (Moderator) | 149.694 (162.773) | 164.118 (158.796) | −41.291 (187.552) |
| Interaction | 16.324 (2.404)*** | 3.974 (.489)*** | 118.029 (22.050)*** |
| R² | .166 | .151 | .151 |
| R² Change Interaction | .023 | .034 | .015 |
| Number of Actors | 1,643 | 1,643 | 1,652 |

***p < 0.001; **p < 0.01; *p < 0.05; + p < 0.1; standard errors in parentheses
SCC AVG: Average Source Code Contribution (2014–2015); Interaction: Independent Variable x Firm-Sponsorship
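To illustrate how an interaction term translates into simple slopes, the coefficients of Table 7.11 can be combined as follows. This is only a sketch under two assumptions that the text does not state explicitly: that the moderator is coded 1 for firm-sponsored and 0 for not firm-sponsored contributors, and that the predictors were not mean-centered before the interaction term was built. The function name `simple_slope` is hypothetical.

```python
# Unstandardized coefficients from Table 7.11:
# (main effect of the predictor, interaction with firm-sponsorship).
coefficients = {
    "H7 degree centrality": (9.430, 16.324),
    "H8 tie strength": (1.334, 3.974),
    "H9 #cross lists": (88.524, 118.029),
}


def simple_slope(main_effect, interaction, moderator_value):
    """Slope of the predictor at a given moderator value: b1 + b3 * moderator."""
    return main_effect + interaction * moderator_value


slopes = {}
for label, (b_main, b_inter) in coefficients.items():
    # Assumed coding: 0 = not firm-sponsored (hobbyist), 1 = firm-sponsored.
    slopes[label] = (
        simple_slope(b_main, b_inter, 0),
        simple_slope(b_main, b_inter, 1),
    )
    not_sponsored, sponsored = slopes[label]
    print(f"{label}: not firm-sponsored {not_sponsored:.3f}, "
          f"firm-sponsored {sponsored:.3f}")
```

Under these assumptions, each additional unit of a social capital measure is associated with a larger increase in changed lines of source code for firm-sponsored contributors than for hobbyists, which matches the direction of the moderation reported in the text.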
Figure 7.2 Simple Slopes: Interaction of Degree Centrality and Firm-Sponsorship (x-axis: Degree Centrality; y-axis: Average Source Code Contribution, amount of changed lines of source code; moderator Firm-Sponsorship: Firm-Sponsored, Not Firm-Sponsored)

Table 7.12 Linear Regression Result: Total Social Capital Model, DV Relational Capital (Tie Strength)

| | Total Social Capital Model DV Tie Strength | VIF |
| --- | --- | --- |
| LKML Tenure (Control) | −.109 (.025)*** | 1.079 |
| Degree Centrality | 5.148 (.035)*** | 2.199 |
| #Cross Lists | −7.649 (.353)*** | 2.170 |
| R² | .877 | |
| Number of Actors | 5,443 | |

***p < 0.001; **p < 0.01; *p < 0.05; + p < 0.1; standard errors in parentheses
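The VIF column of Table 7.12 can be approximated from the pairwise correlations of the three predictors in Table 7.8, since each VIF equals the corresponding diagonal element of the inverse correlation matrix. The sketch below writes this out for three predictors using the closed-form determinant. It assumes that the full-sample correlations are close to those of the regression sample (5,443 actors), and the function name is hypothetical.

```python
def vifs_three_predictors(r_ab, r_ac, r_bc):
    """VIFs for predictors a, b, c from their pairwise correlations.

    Each VIF is the matching diagonal element of the inverse of the
    3x3 correlation matrix, i.e. a 2x2 cofactor divided by the determinant.
    """
    det = 1 + 2 * r_ab * r_ac * r_bc - r_ab ** 2 - r_ac ** 2 - r_bc ** 2
    vif_a = (1 - r_bc ** 2) / det
    vif_b = (1 - r_ac ** 2) / det
    vif_c = (1 - r_ab ** 2) / det
    return vif_a, vif_b, vif_c


# a = LKML tenure, b = degree centrality, c = #cross lists (Table 7.8):
# r(tenure, degree) = .262, r(tenure, cross) = .248, r(degree, cross) = .733.
vif_tenure, vif_degree, vif_cross = vifs_three_predictors(0.262, 0.248, 0.733)
print(f"VIF LKML tenure:       {vif_tenure:.3f}")
print(f"VIF degree centrality: {vif_degree:.3f}")
print(f"VIF #cross lists:      {vif_cross:.3f}")
```

The results are close to the reported VIFs of 1.079, 2.199 and 2.170; small deviations are expected because the inputs are rounded correlations from a slightly different sample.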
Figure 7.3 Simple Slopes: Interaction of Tie Strength and Firm-Sponsorship (x-axis: Tie Strength; y-axis: Average Source Code Contribution, amount of changed lines of source code; moderator Firm-Sponsorship: Firm-Sponsored, Not Firm-Sponsored)

Table 7.13 Linear Regression Result: Total Model, DV Source Code Contribution (SCC AVG)

| | Total Model DV SCC AVG | VIF |
| --- | --- | --- |
| LKML Tenure (Control) | 4.284 (1.599)** | 1.153 |
| Degree Centrality | 6.896 (3.821)+ | 12.742 |
| Tie Strength | .512 (.583) | 7.353 |
| #Cross Lists | 81.036 (18.637)*** | 3.602 |
| R² | .132 | |
| Number of Actors | 1,965 | |

***p < 0.001; **p < 0.01; *p < 0.05; + p < 0.1; standard errors in parentheses
SCC AVG: Average Source Code Contribution (2014–2015)
Figure 7.4 Simple Slopes: Interaction of #Cross Lists and Firm-Sponsorship (x-axis: #Cross Lists; y-axis: Average Source Code Contribution, amount of changed lines of source code; moderator Firm-Sponsorship: Firm-Sponsored, Not Firm-Sponsored)
7.6 Discussion, Conclusion and Implications

7.6.1 Discussion and Conclusion
The aim of this study was on the one hand, to investigate the relation between an OSS contributor’s social capital and his created value for the OSS community. On the other hand, it should be verified, whether firm-sponsorship of OSS contributors has an influence on the relationship between social capital and created value. For this study, the social capital definition of Nahapiet & Ghoshal (1998) was used, which considers not only the structure of relationship networks but also the actual and potential resources that could be utilized through such networks. As contribution to research, this study adapted the conceptual social capital model of Nahapiet & Ghoshal (1998) to conceptualize and evaluate a model of social capital and individual’s value creation in the context of OSS communities. Social capital is therein depicted by its three dimensions—these are the structural, relational and
cognitive dimension. Further, the understanding of value creation was derived from the pertinent literature in this field (e.g., Amit & Zott 2001, Bowman & Ambrosini 2000, Lepak et al. 2007, Moran & Ghoshal 1996). In this study, value creation, specifically the concept with its context (i.e., what is valuable; Bowman & Ambrosini 2000, Lepak et al. 2007, Wassmer & Dussauge 2011), was related to source code contribution, as this can be seen as a proxy for value created by individuals in an OSS community. Further contributions to research arise from the operationalization of the research model. On the one hand, all three social capital dimensions were implemented as network measures; on the other hand, two data sources (i.e., LKML communication data and LK source code contribution data) were linked to verify the research model. As part of the examination of the relationship between social capital and source code contribution, this study provides a solid result with regard to the individual relationships of the three social capital dimensions among each other (H1-H3). The hypotheses are confirmed, as the relationships between structural capital and relational capital (H1), between cognitive capital and relational capital (H2), and between structural capital and cognitive capital (H3) are significant at the individual relationship level (see Table 7.9). It is noticeable that some of the individual relationships have very high β values and that the variables involved are also highly correlated. For example, the relation between structural capital (IV degree centrality) and relational capital (DV tie strength) is significant (p < 0.001) with a β value of .938, and the correlation between both variables is .930 (p < 0.01). Thus, the results must be interpreted with more nuance.
The network dimensions considered in connection with the network-based determination of the three social capital dimensions operate in a similar way and sometimes differ only in nuances, such as degree centrality and tie strength. The utilized measures are based on characteristics of nodes, which all show a similar behavior (Bloch et al. 2016), this means, nodes with a higher degree centrality will most likely also have a higher tie strength. In consequence, this could be an explanation why the mentioned measures show high values with regard to correlation and regression results. In relation to the described consideration of the individual relationships of the three social capital dimensions among each other, these relationships were also tested in a total model. This test yields an interesting result, because the total model of social capital dimensions is indeed significant (see Table 7.12), but with respect to the relationship of structural capital (IV degree centrality) to relational capital (DV tie strength), the β value is greater than one (β value 1.047, p < 0.001). Even if the calculated VIFs with results smaller than the threshold of 3 (Hair et al. 2010, O’Brien 2007) do not give any indication for multicollinearity, the high β value can be seen as evidence for multicollinearity between the independent variables.
As argued in the context of the results related to the individual relationships of the social capital dimensions, here the two independent social capital dimensions (i.e., structural (degree centrality) and cognitive (number of cross lists)) operate in a similar way, so that there is probably too much dependence between the measures used. Another important finding of the study relates to the first main objective, which concerns the relationship between social capital and value contribution. In this context, the postulated hypotheses (H4–H6), which represent the three individual relationships between the social capital dimensions and source code contribution, are confirmed. Table 7.10 shows the corresponding results. In summary, as conceptualized in the research model, it is confirmed that the structural, cognitive and relational social capital of a person are each, at the individual relationship level, positively related to that person's source code contribution and thus to the value they create for the OSS community. The control variable LKML tenure has only a minor impact (1.0% or less) on the explained variances (R²). Accordingly, the experience and the knowledge that LKML actors accumulate over time with regard to LK tasks and issues have only a marginal impact on the source code contribution of LK developers, as opposed to their social capital. In contrast to the results for the individual relationships of the social capital dimensions to value creation, the tested total model containing the structural, cognitive and relational relationships to source code contribution reveals no significance (see Table 7.13). It should be noted that the result shows high VIF values above the threshold of 3 (Hair et al. 2010, O’Brien 2007). Given these high VIF values, multicollinearity of the independent variables is assumed.
In line with the arguments already given for the results of the total social capital model, a very likely cause for this multicollinearity is that the three social capital dimensions operate in a similar way, so that there is probably too much dependence between the measures used. In the remainder of this thesis, the relationship between social capital and source code contribution is considered as confirmed. This is because the individual measures of the three social capital dimensions are highly correlated. Thus, it is assumed that they express the same thing in terms of social capital. Hence, for the further discussion, the relation between social capital and source code contribution is considered at the level of the confirmed individual paths. The second main objective of the study was to examine whether firm-sponsorship of OSS contributors has an influence on the relationship between social capital and created value. The established moderation hypotheses (H7–H9) were all confirmed (see Table 7.11). Including the simple slopes (Figures 7.2 to 7.4), it can be
stated that the relationship between structural capital (degree centrality) and created value (average source code contribution) is significantly more pronounced for firm-sponsored contributors than for contributors who are not firm-sponsored (i.e., the hobbyists). The same results emerge for the relationships between relational capital (tie strength) and created value (average source code contribution) as well as between cognitive capital (number of cross lists) and created value (average source code contribution). It can be concluded that firm-sponsored contributors, through their access to a wider set of resources (Dahlander & Wallin 2006) and their higher community involvement, have a higher level of social capital, which in turn is positively related to their level of value creation in the OSS community. Overall, the established and tested relationship paths were confirmed when considered at the individual level. To get an overall picture of how the relationships interact, the total models were also tested. The results show significance for the total social capital model (DV relational capital) and no significance for the total model (DV source code contribution). Both models have in common that they show high dependencies between their respective individual measures, which requires further research attention. Table 7.14 summarizes the hypotheses results at the individual path level and the total model level.
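The multicollinearity check discussed in this subsection can be illustrated with a small sketch. It computes variance inflation factors (VIFs) by regressing each predictor on the remaining ones and flags values above the threshold of 3 cited in the text. The data are synthetic and the variable names are illustrative assumptions; this is not the thesis's data or its statistical software.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X (rows are observations, columns are predictors)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on all other predictors, including an intercept.
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        r_squared = 1.0 - residuals.var() / target.var()
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs

# Synthetic example (assumption): two strongly related measures, one unrelated measure.
rng = np.random.default_rng(0)
degree_centrality = rng.normal(size=500)
tie_strength = 0.95 * degree_centrality + 0.3 * rng.normal(size=500)
cross_lists = rng.normal(size=500)

predictors = np.column_stack([degree_centrality, tie_strength, cross_lists])
vifs = variance_inflation_factors(predictors)
flagged = vifs > 3.0  # threshold of 3 as cited in the text
```

With only two predictors the VIF reduces to 1 / (1 − r²), so a threshold of 3 corresponds to an absolute correlation of roughly 0.82 between the two predictors.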
7.6.2 Implications for Research
In line with social capital theory, this study has shown that the social capital of OSS contributors is positively associated with their created value, here in the form of source code contribution. The starting point for this research was the adapted social capital model of Nahapiet & Ghoshal (1998), which was aligned to the context of OSS contributors and OSS communities, resulting in a model of social capital and an individual’s value creation. The evaluated model still has potential for improvement in the operationalization of the network metrics used to measure the three social capital dimensions. Future studies could start from the high dependence among the utilized network measures of social capital and evaluate further network measures that depict the characteristics of the social capital dimensions but are less interdependent. In addition, in line with the established hypotheses, this study confirmed the moderation effect of firm-sponsorship on the relationship between social capital and source code contribution. The study has shown that the associations between network position and positive value contribution predicted by social capital theory are not independent of developers’ profession; in other words, they are different for heterogeneous developer groups. Research on differences between heterogeneous contributor groups in OSS communities is still at the beginning. Here, it would be interesting to explore further aspects in which heterogeneous contributor groups in OSS projects differ from each other.

Table 7.14 Summary of Hypotheses Results

| Hypothesis | Relations | Results Individual Path Level | Results Total Model Level |
| --- | --- | --- | --- |
| H1 | Structural Capital (Degree Centrality) → Relational Capital (Tie Strength): Structural capital is positively associated with a contributor’s relational capital. | Supported | Supported |
| H2 | Cognitive Capital (#Cross Lists) → Relational Capital (Tie Strength): Cognitive capital is positively associated with a contributor’s relational capital. | Supported | Supported |
| H3 | Structural Capital (Degree Centrality) → Cognitive Capital (#Cross Lists): Structural capital is positively associated with a contributor’s cognitive capital. | Supported | Supported |
| H4 | Structural Capital (Degree Centrality) → Source Code Contribution: Structural capital is positively associated with a contributor’s source code contribution. | Supported | Not supported |
| H5 | Relational Capital (Tie Strength) → Source Code Contribution: Relational capital is positively associated with a contributor’s source code contribution. | Supported | Not supported |
| H6 | Cognitive Capital (#Cross Lists) → Source Code Contribution: Cognitive capital is positively associated with a contributor’s source code contribution. | Supported | Not supported |
| H7 | Structural Capital (Degree Centrality) → Source Code Contribution moderated by Firm-Sponsorship: Firm-sponsorship positively moderates the relation between a contributor’s structural capital and his source code contribution. | Supported | – |
| H8 | Relational Capital (Tie Strength) → Source Code Contribution moderated by Firm-Sponsorship: Firm-sponsorship positively moderates the relation between a contributor’s relational capital and his source code contribution. | Supported | – |
| H9 | Cognitive Capital (#Cross Lists) → Source Code Contribution moderated by Firm-Sponsorship: Firm-sponsorship positively moderates the relation between a contributor’s cognitive capital and his source code contribution. | Supported | – |
7.6.3 Implications for Management
The results of this study help managers of companies that employ developers for active participation in one or more OSS projects to answer the question of how to govern firm-sponsored developers in an OSS community in order to exert influence in the community. In general, from the point of view of social capital theory and of the results of this study, providing staff to collaborate in OSS projects is an appropriate means of influencing value creation in the community. The moderator analysis of the relationship between social capital and source code contribution has shown that this relationship is significantly more pronounced for firm-sponsored OSS contributors than for hobbyists. With regard to the LK project, it can be concluded that active firm-sponsored contributors who stand out through a central network position in the LK community have the opportunity, through their cooperation and their source code contributions, to influence, for example, the development trajectory of the project and can thus exploit benefits for their firm that arise from making use of the OSS community. These insights fit the explanation by Andrew Morton, who states that the further development of the LK is determined by technical guidelines and the available resources that are brought in independently by individuals and companies (Morton 2005). In conclusion, companies do have the opportunity to influence OSS communities by directing their employees regarding the position to be taken in the community. Therefore, it has to be clear to the employees which position in the community they should develop towards, for example, by means of their contributions. Getting employees into key positions in OSS communities (Dahlander & O’Mahony 2011) that are strategically relevant for the company may take time, but it is connected to the above-mentioned benefits for the company. From a management perspective, it would be interesting to understand what this process of developing employees into key positions in OSS communities might look like and how long it may take. These aspects could be elaborated by further research.
7.6.4 Limitations of the Study
This study has some limitations that must be considered when utilizing the results of this research. In the following, the limitations of this study are described and complemented by suggestions for improving research in this area.

Generalizability of Results
The OSS project chosen for this study (i.e., the LK project) is unique. This applies both to the successful duration of the LK project over more than two decades and to the support of the OSS project by its contributors, who can be categorized into different contributor groups (Homscheid et al. 2015). Accordingly, the results of this study cannot be directly transferred to other OSS projects involving different contributor groups. Further research could target similar OSS projects, such as the Apache HTTP server project, to identify different contributor groups and evaluate the three manifestations of the relationship between social capital and value creation for these contributor groups.

Contributor Categorization
The categorization of LKML actors by the hostname of their email addresses into four contributor groups is an approximate classification, which is not free from limitations. It is known that individuals may use several email addresses simultaneously while acting in the LK project. Further, there are actors who may do personal work out of the office using their company email address, and there are developers who do not use their company email address while contributing on behalf of a firm (Corbet et al. 2013). For example, LK contributors using an email address with the domain name @kernel.org are closely affiliated with the Linux Foundation but may also work for a company supporting the LK, using the @kernel.org email address to hide their real company affiliation (Schaarschmidt 2012).
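The hostname-based categorization and its main weakness can be illustrated with a minimal sketch. The group labels, the domain lists and the sample addresses below are hypothetical placeholders chosen for illustration; they are not the categorization scheme or the data used in the study.

```python
# Hypothetical lookup tables (assumption); a real study would curate these by hand.
FIRM_DOMAINS = {"redhat.com", "ibm.com", "samsung.com", "google.com"}
FOUNDATION_DOMAINS = {"kernel.org", "linuxfoundation.org"}
ACADEMIC_SUFFIXES = (".edu", ".ac.uk")
PRIVATE_MAIL_DOMAINS = {"gmail.com", "gmx.de"}

def hostname(address):
    """Return the lower-cased host part of an email address."""
    return address.rsplit("@", 1)[-1].strip().lower()

def matches(host, domains):
    """True if host equals one of the domains or is a subdomain of it."""
    return any(host == d or host.endswith("." + d) for d in domains)

def categorize(address):
    """Assign an illustrative contributor group based only on the hostname."""
    host = hostname(address)
    if matches(host, FOUNDATION_DOMAINS):
        return "foundation"
    if matches(host, FIRM_DOMAINS):
        return "firm-sponsored"
    if host.endswith(ACADEMIC_SUFFIXES):
        return "university/research"
    if matches(host, PRIVATE_MAIL_DOMAINS):
        return "hobbyist"
    return "unclassified"

# One (fictional) person using three addresses ends up in three different groups.
addresses_of_one_person = ["dev@us.ibm.com", "dev@gmail.com", "dev@kernel.org"]
labels = [categorize(a) for a in addresses_of_one_person]
```

The example shows why the classification is only approximate: the same person is labeled differently depending on which address was used for a given message.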
To increase the quality of the affiliation data derived from the network data set of OSS contributors, further research could query former and current affiliations, in addition to further questions, via a personal OSS contributor survey.

Operationalization of LKML Tenure
In this study, LKML tenure is measured via the number of months a contributor is active in the LKML, taking the complete available LKML communication from 1996 until 2014 into account. The measure is determined as the number of months between the date of a contributor’s first message and the date of the last message sent to the mailing list. The construct validity of this measure could theoretically be limited, since the tenure on the mailing list is derived from two points in time, without a closer look at the activity between them. A starting point for further research could be to look at the activities of LKML actors between their first and last message. Thus, it is possible to determine whether an actor is involved in regular contributions or is only sporadically active. This aspect could enrich research around the topic of “episodic volunteering” (see, for example, Barcomb et al. 2018).

Operationalization of Source Code Contribution as Proxy for Value Creation
Source code contribution as a measure of value creation is considered in this study as a quantitative measure, that is, the number of inserted and changed source code lines. However, it is understood that a small contribution (e.g., 30 lines of source code) could have more content-related value if it relates to an important basic function of the kernel, whereas a major contribution in code (e.g., 500 lines of code) for a side component has comparatively less content-related value. Further studies could examine how the value of source code contribution could be assessed from its content-related side.
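The LKML tenure limitation described above can be made concrete with a short sketch. It computes tenure from the first and last message date and, for comparison, the number of distinct calendar months with at least one message. The month-counting convention (difference of calendar months, not inclusive) and the sample dates are assumptions for illustration, since the text does not specify them.

```python
from datetime import date

def months_between(first, last):
    """Difference in calendar months between two dates (assumed convention)."""
    return (last.year - first.year) * 12 + (last.month - first.month)

def tenure_and_activity(message_dates):
    """Return (tenure in months, number of distinct months with activity)."""
    first, last = min(message_dates), max(message_dates)
    tenure = months_between(first, last)
    active_months = len({(d.year, d.month) for d in message_dates})
    return tenure, active_months

# Fictional actor: two early messages and one message many years later.
sporadic_actor = [date(1996, 3, 10), date(1996, 4, 2), date(2014, 11, 20)]
tenure, active_months = tenure_and_activity(sporadic_actor)
```

For this fictional actor the tenure spans 224 months although only 3 months show any activity, which is the kind of gap between the two-point measure and actual involvement that the limitation refers to.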
Study III: Social Capital and the Formation of Individual Characteristics—An Examination of Open Source Software Developers
8.1 Introduction
In contrast to the early beginnings of OSS communities around the 1990s, when these communities were largely formed by voluntary contributors, today OSS communities consist of contributors with a variety of motivations for their involvement (Andersen-Gott et al. 2012, von Krogh et al. 2012). In general, the contributors can be assigned to different groups of participants. These groups include the hobbyists, the voluntary contributors who generate added value for the community in their spare time. In addition, companies often participate in OSS communities by providing employees, who are thus paid by the firm to contribute a variety of services to the OSS project (West & O’Mahony 2008). Such activities can be identified not only for software technology companies, such as IBM and Red Hat (Corbet & Kroah-Hartman 2017, Grand et al. 2004, Schaarschmidt & Von Kortzfleisch 2009, Schaarschmidt et al. 2015), but also for user firms, such as Samsung and Google (Corbet & Kroah-Hartman 2017). Furthermore, the group of participants in OSS projects is completed by contributors from universities and research institutions (Homscheid et al. 2015). For each of these groups, various drivers can be identified that induce the participants to get involved in OSS communities. With a focus on the two major groups of participants in OSS projects, hobbyists and firm-sponsored contributors, scholars such as Andersen-Gott et al. (2012), Baytiyeh & Pfaman (2010), Bonaccorsi & Rossi (2006), Cai & Zhu (2016), Hars & Ou (2002), Lakhani & Wolf (2005), Lerner & Tirole (2002, 2005), Spaeth et al. (2008), Xu et al. (2009) and Ziegler et al. (2014) revealed a variety of motivators.
As predominant drivers of hobbyists, research identified joy-based intrinsic motivation and altruism as well as signaling incentives, such as recognition, reputation and the improvement of the professional status, as well as the improvement of own skills. Firm-sponsored contributors act on behalf of the company they are employed by, so that their activities for the OSS project are most likely related to the intentions of the firm. Companies get involved in OSS communities to appropriate value from OSS by providing complementary products and services (e.g., Andersen-Gott et al. 2012). Furthermore, they can gain technological as well as innovation advantages and complement their own resource base (Grand et al. 2004, Dahlander & Magnusson 2008). Employed or firm-sponsored contributors can thereby be seen as “a man on the inside” (Dahlander & Wallin 2006, p. 1243), as they act in the firm’s interest while contributing to the community. Nonetheless, contributions in OSS communities can be directly connected to the person who provided them, for example, through their name as author of a source code contribution (Avelino et al. 2017). These individuals build a certain position in the community through their community involvement and their related activities, no matter whether they are volunteering or motivated by firm-sponsorship. This individual position may be associated with resultant characteristics for the person, e.g., influence or opinion leadership. The described relationship is supported by social capital theory, which states that a central position in a network and strong relationships with others in a community can be beneficial and influence various downstream variables, including different individual outcomes for the person (Lin 2001, Nahapiet & Ghoshal 1998). Accordingly, research has to show whether the relations between network position and positive individual outcome as derivable from social capital theory are independent of OSS developers’ profession.
Against this background, this study aims to extend research that has used social capital theory to investigate individual outcomes of OSS contributors by addressing the following research question: How is the relation between an OSS contributor’s social capital and associated individual outcomes affected by firm-sponsorship?
This study is organized as follows. First, the research model along with a detailed derivation of the postulated hypotheses is provided. In the second step the operationalization of the utilized variables as well as the implementation of the OSS developer survey are depicted in detail. Subsequently, in the third step relevant descriptive information about the LK survey participants and the in-depth results of the hypotheses tests are presented. The study is completed by a discussion of the results with a corresponding conclusion and a description of implications for research and practice.
8.2 Hypotheses Development
The development of the following hypotheses is based on the concept of social capital applied in the context of OSS communities. Specifically, Lin’s (2001) social capital theory and the social capital understanding of Nahapiet & Ghoshal (1998) provide the framework for the postulation of relationships between social capital and various outcome variables. This framework serves as the basis for deriving relationships between social capital and individual outcomes that reflect individual characteristics of active OSS developers. Related to social capital, the focus in this study is on the relational dimension of social capital, as the relational dimension unites properties that build on the structural dimension and cognitive dimension of social capital and can hardly exist without them (Nahapiet & Ghoshal 1998). The existence of the properties of the structural dimension and cognitive dimension are thus a prerequisite for the formation of the characteristics of the relational dimension of social capital as verified by the study of Tsai & Ghoshal (1998). Hence, the relational dimension of social capital is most meaningful and thus is focused on in the following consideration of social capital. The research model of this study is shown in Figure 8.1 and will be further explained through the derivation of the hypotheses below.
Figure 8.1 Research Model [figure not reproduced: relational social capital is linked to the outcomes opinion leadership (H1), perceived own reputation (H2) and job autonomy (H3); firm-sponsorship moderates these three relations (H4, H5 and H6)]
The relational dimension of social capital incorporates assets that are anchored in relationships between individuals. These intangible assets are shaped by social interaction between people over time, which can result—inter alia—in trust and trustworthiness (Fukuyama 1995a, Putnam 1995a, Tsai & Ghoshal 1998). In detail, trust is a property of a relationship in contrast to trustworthiness, which is a property of a person involved in the relation (Barney & Hansen 1994, Tsai & Ghoshal 1998).
Trust is thereby defined as the belief that the “results of somebody’s intended action will be appropriate from our point of view” (Misztal 1996, p. 9–10). Corresponding studies (e.g., Fukuyama 1995b, Gambetta 1988, Nahapiet & Ghoshal 1998, Putnam 1993, 1995a, Ring & Van de Ven 1992, 1994, Tyler & Kramer 1996) show that in relationships characterized by a high level of trust, people are more involved in social interactions in general and are more active in cooperative interactions in particular. The trust that a person places in another individual is also based on the belief in the competence and capability (Sako 1992, Szulanski 1996) and the belief in the reliability (Giddens 2013, Ouchi 1981) of the other person (Mishira 1996). If the characteristics described above apply to an individual, the person will most likely occupy a central position in a structural network (Lin 1999a, 2001, Tsai & Ghoshal 1998). Moreover, an individual’s central position in a network and the personal characteristics explained before can foster the development of this person into an opinion leader, since a large part of the described personal characteristics are also shared by opinion leaders. These attributes include, for example, that opinion leaders occupy central positions in specific networks, that they are seen by friends, coworkers, colleagues or community peers as experts in their field (which is connected with competence, reliability and trust), and that they are heavily involved in interactions (Weimann et al. 2007). Further, this relationship, namely that a central position in a network is strongly connected to whom individuals perceive as opinion leader, is supported by the framework for identifying opinion leaders elaborated by van der Merwe & van Heerden (2009). In general, opinion leaders are “individuals who exert an unequal amount of influence on the decisions of others” (Rogers et al. 1962, p. 435). Scholars agree that personal influence on the opinions, motivations, attitudes, behaviors and beliefs of others is the main characteristic of opinion leaders (Flynn et al. 1996, Hellevik & Bjørklund 1991, Mowen 1990, Park 2013, Rogers 1983). In addition, opinion leadership is seen to be largely domain specific, so that an individual is an opinion leader about a certain product or service and can be considered a type of specialist related to the particular domain (Blackwell et al. 2006, Flynn et al. 1994, Grewal et al. 2000). Following the aforementioned argumentation and applying it to the context of OSS contributors, the following hypothesis is postulated:

H1: Relational capital is positively associated with a contributor’s characteristic as opinion leader.
Pertinent literature related to social capital and in particular to its relational dimension reveals that individuals maintaining strong relationships in a specific group or community can achieve more valuable relational capital (Nahapiet & Ghoshal 1998, van den Hooff & Huysman 2009). Correspondingly, strong and trustworthy relationships are identified as a major driver of active participation (Hsu & Hung 2013), cooperation, resource acquisition and knowledge sharing in virtual communities (Chang & Chuang 2011, Ridings et al. 2002). In addition, the closure of a community and the trustworthiness of the social structure facilitate the build-up of reputation of individuals who are highly active in a community (Coleman 1988, Portes 1998). Underlining this, Wasko & Faraj (2005) confirm that individuals gain reputation through active participation in a community. In line with this argumentation, Lin (1999a) states that a highly active person who interacts and gives favors in a community is rewarded with increasing reputation in that community. This reputation is also classified as one of three returns of social capital from a network-based point of view on social capital (Lin 1999a, 2001). Reputation of an individual is thereby defined “as favorable/unfavorable opinions about an individual in a social network” (Lin 1999a, p. 40). In congruence with the described findings in the literature, the following hypothesis is derived related to OSS contributors and the perception of their own reputation in OSS communities:

H2: Relational capital is positively associated with a contributor’s perceived own reputation.
Nahapiet & Ghoshal (1998) attribute the assets trust, norms, obligations and identification to the relational dimension of social capital. Besides the asset of trust, which is a vital property of relationships to facilitate meaningful social interaction and exchange, the asset identification in connection with individuals and communities is indispensable (Nahapiet & Ghoshal 1998). Identification may foster opportunities for exchange and may increase cooperative interaction as well as cooperation (Lewicki & Bunker 1996, Nahapiet & Ghoshal 1998). Moreover, when individuals have a high level of identification with a group or community, they are most likely also highly committed to the community (Wasko & Faraj 2005). In line with the literature about OSS contributors and their motivation to be active in OSS communities, OSS contributors are committed to their involvement, as the participation in the community and the completion of tasks are mostly voluntary and aligned with their own interests (David & Shapiro 2008, Hars & Ou 2002, Lakhani & von Hippel 2003). From this context it is deduced that the relation of OSS contributors to community work is characterized by elements of freedom, independence and self-determination in scheduling and completing work. These characteristics relate to the field of job autonomy (Hackman & Oldham 1975, 1980). According to the given derivation, a relation between relational social capital in the forms of identification (Nahapiet & Ghoshal 1998) as well as commitment (Sambasivan et al. 2011, Wasko et al. 2005) and job autonomy is postulated as follows:

H3: Relational capital is positively associated with a contributor’s job autonomy.
Firm-Sponsorship as Moderator
OSS communities consist of different kinds of contributors, whereby they are mainly made up of firm-sponsored contributors acting on behalf of their employer and hobbyists voluntarily participating in an OSS community (Grand et al. 2004, Schaarschmidt & Von Kortzfleisch 2009). In consequence, current OSS development projects receive support and contributions from hobbyists, universities, research centers as well as software vendors and user firms (Homscheid et al. 2015, Teigland et al. 2014, Schaarschmidt & Von Kortzfleisch 2015). Firms most likely pursue specific objectives when providing employed developers to OSS communities; it is thus plausible that, in comparison to hobbyists, they have special interests in returns from their engagement. Consequently, firm-sponsored contributors will get involved in OSS communities in a different way than hobbyists (Dahlander & Magnusson 2006). To assess whether there is any difference in the relation between social capital, here in the form of relational capital, and the expected returns (opinion leadership, reputation and job autonomy) for different contributor groups, especially hobbyists and firm-sponsored contributors, the following three hypotheses are proposed. These hypotheses are backed by the idea that being sponsored usually means daily work in the OSS community (Riehle et al. 2014) and, consequently, also a higher level of interaction as well as exchange and thus a more positive influence on the described individual returns.

H4: Firm-sponsorship positively moderates the relation between a contributor’s relational capital and his characteristic as opinion leader, such that high levels of relational capital more strongly relate to opinion leadership than low levels.

H5: Firm-sponsorship positively moderates the relation between a contributor’s relational capital and his perceived own reputation, so that high levels of relational capital more strongly relate to perceived own reputation than low levels.

H6: Firm-sponsorship positively moderates the relation between a contributor’s relational capital and his job autonomy, so that high levels of relational capital more strongly relate to job autonomy than low levels.
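A moderation hypothesis of this kind is commonly examined with a regression that includes an interaction term between the predictor and the moderator. The following minimal sketch, using synthetic data and ordinary least squares, is one common way to do this and is not necessarily the estimation procedure of this study; all variable names, effect sizes and the 0/1 coding of firm-sponsorship are assumptions for illustration.

```python
import numpy as np

def fit_moderated_regression(x, moderator, y):
    """OLS fit of y = b0 + b1*x + b2*moderator + b3*(x*moderator)."""
    x, moderator, y = (np.asarray(a, dtype=float) for a in (x, moderator, y))
    design = np.column_stack([np.ones_like(x), x, moderator, x * moderator])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # (b0, b1, b2, b3)

def simple_slopes(coef):
    """Slope of x when the moderator is 0 (hobbyist) and 1 (firm-sponsored)."""
    _, b_x, _, b_interaction = coef
    return b_x, b_x + b_interaction

# Synthetic data (assumption): the outcome depends more strongly on
# relational capital for firm-sponsored contributors.
rng = np.random.default_rng(1)
n = 1000
relational_capital = rng.normal(size=n)
firm_sponsored = rng.integers(0, 2, size=n)  # 0 = hobbyist, 1 = firm-sponsored
outcome = (0.2 * relational_capital
           + 0.1 * firm_sponsored
           + 0.5 * relational_capital * firm_sponsored
           + 0.5 * rng.normal(size=n))

coef = fit_moderated_regression(relational_capital, firm_sponsored, outcome)
slope_hobbyist, slope_sponsored = simple_slopes(coef)
```

A positive interaction coefficient (`coef[3]`) means that the slope for firm-sponsored contributors is steeper than the slope for hobbyists, which is the pattern a positive moderation hypothesis predicts. Significance testing is omitted from this sketch.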
8.3 Research Design
To test the proposed hypotheses of this study, the measures for the different dimensions and variables of the research model were operationalized from two different data sources. The measure for the relational dimension of social capital is derived from the LKML data of the year 2014 and operationalized as tie strength. Tie strength is understood as the weighted degree of a node, that is, the number of nodes adjacent to it weighted by their node values (Abbasi et al. 2014, Newman 2004, Wasserman & Faust 1994). Here, this is the number of communication interactions (i.e., number of messages sent and/or received) an individual LKML actor has had with his peers. In research, weighted degree centrality or tie strength is applied in the context of social capital studies, for example, by Abbasi et al. (2014), Plotkowiak (2014) or Seibert et al. (2001). The variables reflecting individual returns of active OSS developers and the moderator have been operationalized as questionnaire constructs and are thus survey data. Using two sources of data limits the possibility that the results are affected by common method variance (Podsakoff et al. 2003). In the following, the operationalization of the questionnaire constructs is explained in detail.
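The difference between plain degree and tie strength (weighted degree) can be sketched as follows. The sketch assumes that the mailing list data have already been reduced to sender and recipient pairs, one pair per message; how such pairs are derived from LKML threads is not specified here, and the actor names are fictional.

```python
from collections import Counter, defaultdict

def tie_strength(messages):
    """Weighted degree: number of messages an actor sent or received."""
    strength = Counter()
    for sender, recipient in messages:
        if sender == recipient:
            continue  # ignore self-addressed messages
        strength[sender] += 1
        strength[recipient] += 1
    return dict(strength)

def degree(messages):
    """Unweighted degree: number of distinct communication partners."""
    neighbours = defaultdict(set)
    for sender, recipient in messages:
        if sender == recipient:
            continue
        neighbours[sender].add(recipient)
        neighbours[recipient].add(sender)
    return {actor: len(peers) for actor, peers in neighbours.items()}

# Fictional message list: (sender, recipient) per message.
messages = [("alice", "bob"), ("alice", "bob"), ("bob", "alice"), ("alice", "carol")]
strengths = tie_strength(messages)
degrees = degree(messages)
```

In this example alice has two communication partners but a tie strength of four, because repeated exchanges with the same peer increase the weighted degree without changing the unweighted one.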
8.3.1 Utilized Constructs and Indicators
All survey constructs and variables used in this study have proven their good psychometric properties, including reliability and validity, in previous studies. The latent constructs were operationalized as multi-item measures, which were reflectively captured, and their reliability was evaluated based on Cronbach’s alpha and corrected item-total correlation. Unless otherwise stated, all indicators of the constructs described below were mandatory to answer and were captured using a seven-point Likert scale, ranging from 1 (strongly disagree) through 4 (undecided) to 7 (strongly agree). Because the conducted LK contributor survey was addressed to both hobbyists and firm-sponsored developers, some constructs and control variables related to corporate activities were shown only to firm-sponsored contributors. Hereafter, the utilized constructs for the survey and their sources of origin are further explained in alphabetical order. In addition, the source items (first row) and the adapted items (second row) are presented in comparison tables.

Job Autonomy: Hackman & Oldham (1980) provide three items to capture job autonomy, which could be taken for the survey in their original formulation (see Table 8.1). The scale has proven its good reliability in earlier management-related research, for example, Morgeson et al. (2005), Spreitzer (1995) or Wang & Netemeyer (2002).

Table 8.1 Job Autonomy Items. (Source: Hackman & Oldham (1980))

| Item | Wording |
| --- | --- |
| JA_1 | I have significant autonomy in determining how I do my job. |
| JA_2 | I can decide on my own how to go about doing my work. |
| JA_3 | I have considerable opportunity for independence and freedom in how I do my job. |
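The two reliability statistics named above, Cronbach's alpha and the corrected item-total correlation, can be computed with a few lines of code. The sketch below uses the standard formulas; the response matrix is invented for illustration (six fictional respondents, three items on a seven-point scale) and is not survey data from this study.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a matrix with respondents in rows and items in columns."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances.sum() / total_variance)

def corrected_item_total(items):
    """Correlation of each item with the sum of the remaining items."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])

# Invented responses (assumption): columns could stand for JA_1, JA_2, JA_3.
responses = [
    [6, 6, 7],
    [5, 5, 5],
    [7, 6, 7],
    [3, 4, 3],
    [4, 4, 5],
    [2, 3, 2],
]
alpha = cronbach_alpha(responses)
item_total = corrected_item_total(responses)
```

The corrected variant removes the item itself from the total before correlating, so that an item does not inflate its own correlation with the scale.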
Opinion Leadership: Opinion leadership is recorded with four of the original six items conceptualized by Flynn et al. (1996) (see Table 8.2). The good reliability of the scale has been shown through studies such as Kratzer & Lettl (2009), Schreier et al. (2007) or Schweisfurth & Herstatt (2015). Two items of the original scale are negatively worded and thus reverse-coded. These items were excluded from the scale beforehand, as reverse-coded items could cause method bias (DiStefano & Motl 2006, Podsakoff et al. 2003) and downgrade scale unidimensionality (Herche & Engelland 1996).

Table 8.2 Opinion Leadership Items. (Source: adapted from Flynn et al. (1996))

| Item | Source item (first row) and adapted item (second row) |
| --- | --- |
| OL_1 | I often influence people’s opinions about popular rock [clothing; environmentally correct products]. |
| | I often influence people’s opinion about software algorithms or paradigms. |
| OL_2 | People I know pick rock music [clothing; “green” products] based on what I have told them. |
| | People I know use software algorithms or paradigms based on what I have told them. |
| OL_3 | I often persuade other people to buy the rock music [fashions; “green” products] that I like. |
| | I often persuade others to make use of software algorithms or paradigms I like. |
| OL_4 | Other people come to me for advice about choosing cd’s and tapes [fashionable clothing; products that are good for the environment]. |
| | Other people come to me for advice about choosing software algorithms or paradigms. |
Perceived Own Reputation: To measure perceived own reputation, three items conceptualized by Wasko & Faraj (2005) were used (see Table 8.3), which are based on research by Constant et al. (1996). The good reliability of the scale was shown by previous studies (e.g., Alrushiedat et al. 2010, Chang & Chuang 2011, Hsu & Lin 2008, Lin et al. 2013, Wasko et al. 2009).

Table 8.3 Perceived Own Reputation Items. (Source: adapted from Wasko & Faraj (2005))

| Item | Source item (first row) and adapted item (second row) |
|---|---|
| PownR_1 | I earn respect from others by participating on the network of practice. |
| | I earn respect from others by participating in the Linux kernel community. |
| PownR_2 | I feel that participation improves my status in the profession. |
| | I feel that participation in the Linux kernel community improves my status in the profession. |
| PownR_3 | Participating on the network of practice improves my reputation in the profession. |
| | Participating in the Linux kernel community improves my reputation in the profession. |
Moderator Firm-Sponsorship: Firm-sponsorship as a moderator was operationalized through a single question about the developer status of the survey participants in relation to their involvement with the LK. The answer options were predefined: participants could choose between ‘employed/firm-sponsored developer’ and ‘hobbyist’.

Demographic and Control Variables: In order to obtain background information about the participants of the online survey, some demographic and additional information was requested that is independent of the investigated research model. Gender, age and level of education were assessed as demographic variables. In addition, all participants were asked about their task range (technical, half half, organizational) in connection with their LK involvement. Furthermore, the employed and thus firm-sponsored participants were asked about their tenure in years at their current company, their department size in number of coworkers, and their income related to their LK activities (voluntary indication).
8.3.2 Conception and Method of Research
Primary Data Collection and Cross-Sectional Study
The information required for the dependent variables and the moderator was obtained by collecting and analyzing primary data. To achieve the largest possible international reach, a standardized online questionnaire was chosen as the survey method. This questionnaire included only closed questions, and the responses were captured largely on a seven-point Likert scale¹. This ensures that the responses can be analyzed directly and easily with statistical programs, in contrast to open-ended questions with free-text answers. In addition, an online survey is a cost-effective method with which a wide reach and a large number of cases can be achieved. These aspects are highly significant for this study, since the LK contributors are distributed all over the world. These advantages are offset by often insufficient knowledge about the population of survey participants and by the risk of non-serious responses resulting from the anonymity of the survey (Homburg & Krohmer 2008). Furthermore, the data were collected in a cross-sectional study. Cross-sectional data are characterized by their reference to a single point in time, that is, they are recorded at one specific point in time (Kuß 2007).

Target Group and Sample
The population in the context of this study comprises all LK developers who have contributed or still contribute source code to the kernel. These include employed and freelance LK developers as well as hobbyists. The LK version control system Git², in use since 2005, provides extensive information about the LK source code contributors. The sample required for testing the proposed hypotheses was therefore formed of LK developers who could be identified in the LK source code repository³.

Data Collection Tool
In order to approach many potential participants and collect as much data as possible for exploring the research questions of this study, a standardized online questionnaire, composed in English, was chosen as the instrument of data collection.
The development of a standardized questionnaire for an online survey is a particular challenge, since the effectiveness of the instrument depends on how the respondents interpret the questions (Homburg & Krohmer 2008). To exclude potential sources of error related to the content of the questions, appropriate question constructs have to be used (Homburg & Krohmer 2008). To keep such errors as low as possible, only constructs with appropriate indicators were used that have repeatedly proven themselves in the research literature. According to Homburg & Krohmer (2008), the willingness of participants to respond, which depends on the length of the questionnaire, has to be considered when conducting surveys. As a guideline, 25 questions are appropriate for online surveys (Homburg & Krohmer 2008). The participants of the LK contributor survey have a very strong connection to the survey topic, and thus the questionnaire can be longer (Homburg & Krohmer 2008). In the context of this study, the questionnaire consists of constructs that are required for hypothesis testing as well as other questions that are relevant for further research in the area of OSS contributors. The LK contributor survey addressed the participants with a total of 40 specific survey questions.

¹ The Likert scale, developed by the American social scientist Rensis Likert (1903–1981), is the most frequently employed one-dimensional scaling method in science for the measurement of attitudes. The scale is based on a rating scale whose points range from a strongly negative (or equivalent) to a strongly positive (or equivalent) attitude (Greving 2009).
² https://git-scm.com, last accessed December 18, 2018.
³ https://git.kernel.org, last accessed December 18, 2018.

Questionnaire Structure
A questionnaire structure that participants can easily follow is essential to reduce the drop-out rate. The survey should begin with interesting questions, while critical questions as well as questions about the person should be asked at the end of the survey (Homburg et al. 2008). The questionnaire developed for the LK contributor survey and its structure are described in the following.

1. Introduction: The structure of the main page of the online survey followed the guideline of Porst (2011). Firstly, respondents were introduced to the subject of the survey, which comprised information about both the target group of the survey and the objective of the study. Moreover, an incentive to participate was set by announcing a private donation of up to USD 150 to the Linux Foundation⁴, that is, a donation of USD 1 per completed questionnaire until 150 responses were reached. Offering incentives is a common and successful way to increase the response rate (Porter 2004b). Secondly, participants were informed about the procedure for answering the upcoming questions and the approximate time needed to complete the questionnaire. The introduction concluded with thanks in advance for the participants’ support and an email address to which they could send a message if they had any questions.
⁴ Some participants of the LK contributor survey used the possibility to give feedback to the specified email address. Several of them pointed out that the Linux Foundation is a financially well-equipped organization, which is largely sponsored by companies. I was asked to consider donating instead to an organization that really depends on donations and is not largely supported by firms. As a consequence, the private donation of USD 150 was given to the Software Freedom Conservancy, a not-for-profit organization that helps to promote, improve, develop, and defend Free, Libre, and Open Source Software (FLOSS) projects, https://sfconservancy.org/about/, last accessed December 18, 2018.
2. Section of Questions: To give the respondents a feeling for the length of the questionnaire, a progress bar was displayed above the questions, which indicated the proportion of the survey they had completed. A progress bar and the information it gives on the progress of the survey can increase the response rate and reduce drop-outs (Heerwegh 2004). The survey questions were grouped thematically and, for clarity, distributed over nine pages with three to nine questions per page. Every page and every question construct on a page was introduced to the participants by a page header and intermediate headers, together with additional information on the context to which the questions relate (i.e., the organizational context, the community context, or the participant’s own LK contributor activity in general). The questions on each page were arranged randomly by the survey software to avoid position effects, which are also known as primacy and recency effects (Paier 2010). The constructs used in the online survey have been described in Sub-Section 8.3.1.

3. Demographic Data and Control Variables: At the end of the survey, demographic information about the participants (e.g., gender, age and highest educational degree) and some control variables (e.g., task range) were gathered.

4. Survey Completion: The survey closed by thanking the respondents for their participation and by again referring to a contact email address in case the respondents wanted to express their opinion about the survey.
8.3.3 Conduct of the Survey
With the aid of the online survey software Unipark EFS Version 10.9⁵, the constructs and their corresponding items (see Sub-Section 8.3.1) were implemented in an online questionnaire. The structure of the questionnaire has been described in the previous section. Prior to the main survey, all processes around the survey, most of which could be managed with the online survey software, were extensively pretested. The questionnaire itself was pretested by ten participants with knowledge about the LK community and the LK development process in general.

⁵ http://www.unipark.de, last accessed December 18, 2018.

Sample of Recipients
As described earlier in the paragraph about the target group of this study, the names and email addresses of source code authors included in the LK source code repository formed the sample for the survey. All relevant data were crawled in connection with the LK source code data collection as described in Section 5.2 and further processed in a local MySQL database. LK contributors could also have been addressed by posting a corresponding message to specific mailing lists, for example the ‘linux-kernel’⁶ or ‘linux-embedded’⁷ mailing list. However, the targeted and personal approach of LK contributors via email had a distinct advantage, which constitutes the added value of this study: the possibility of connecting the survey answers with the LKML data of a person.

To get a sample of LK contributors for the survey invitation, in a first step all source code authors active in the time frame between January 2013 and the latest LK release in November 2015 (i.e., LK release 4.3 at the time this study was conducted) were extracted from the local MySQL database. The LK source code repository Git contains information about all source code commits, and thus about source code authors, starting from June 2005, when the LK project began to use the software. For this LK contributor survey, however, primarily recently active contributors were relevant. After a first look at the distribution of active source code developers over the years, and with the aim of obtaining the highest possible response rate, the time period for the extraction of recipient addresses was set to three years, resulting in the time frame from January 2013 to November 2015. In a second step, the extracted data were checked for duplicates in order to keep the probability of addressing a recipient repeatedly as low as possible. When duplicates were detected, the record connected with the most recent source code contribution was retained and all other records of that person were removed from the data set.
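The two extraction steps described above (restricting the authors to the time frame and keeping only the most recent record per person) can be illustrated with a small sketch. It is not the procedure actually run against the MySQL database: the record layout, the example data, the end date of November 30, 2015 and, in particular, the use of the normalized author name as duplicate key are assumptions made for illustration only.

```python
from datetime import date

# Hypothetical record layout: one row per (author name, email) pair together
# with the date of the most recent commit found for that pair.
authors = [
    {"name": "Ada Example", "email": "ada@old.example", "last_commit": date(2013, 5, 2)},
    {"name": "Ada Example", "email": "ada@new.example", "last_commit": date(2015, 9, 14)},
    {"name": "Bob Sample", "email": "bob@example.org", "last_commit": date(2014, 1, 20)},
    {"name": "Eve Early", "email": "eve@example.org", "last_commit": date(2009, 3, 1)},
]


def select_recipients(records, start=date(2013, 1, 1), end=date(2015, 11, 30)):
    """Filter to the time frame, then keep the most recent record per person."""
    latest = {}
    for rec in records:
        if not (start <= rec["last_commit"] <= end):
            continue  # outside the January 2013 to November 2015 window
        key = rec["name"].strip().lower()  # assumed duplicate key (not from the source)
        if key not in latest or rec["last_commit"] > latest[key]["last_commit"]:
            latest[key] = rec
    return sorted(latest.values(), key=lambda r: r["name"])


recipients = select_recipients(authors)
for rec in recipients:
    print(rec["name"], rec["email"])
```

In this sketch the older address of the duplicated author and the author who was last active in 2009 are dropped, so that each remaining person would be contacted once.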
⁶ http://vger.kernel.org/vger-lists.html#linux-kernel, last accessed December 18, 2018.
⁷ http://vger.kernel.org/vger-lists.html#linux-embedded, last accessed December 18, 2018.

E-Mail Invitation
Several measures were implemented in the invitation email to draw the attention of the recipients to the LK contributor survey and to distinguish it from mass emails. First, the name of the doctoral candidate was used as the email sender name instead of a generic sender name. Second, a dedicated email address (i.e., [email protected]) of the University of Koblenz-Landau was employed as the sender email address. Third, since the names of the LK contributors were available in the LK source code repository in addition to their email addresses, the first name of each contributor was used to personalize the email content. The first name of the recipient was included in the subject of the email and in the salutation of the email message. Fourth, the hyperlink to the online questionnaire provided in the email message was accompanied by a note that the questionnaire is hosted by Unipark and that the hyperlink therefore refers to their domain. Fifth, at the end of the email message a contact email address, which was the same as the sender email address, was explicitly indicated to the email recipients. This was followed by an email signature containing the University’s name, the institutes the doctoral candidate is affiliated with, and the hyperlinks to the websites of the institutes.

Bulk Mailing of Email Invitations
The recipient data extracted from the LK source code repository were imported via a comma-separated values file into the participant management module of the online survey software. During the import process the software checked for duplicate entries of email addresses. When emails are sent in bulk, the email servers have to meet specific technical requirements and configurations in order not to be blacklisted as a spammer after a short period of time. The required parameters for the successful bulk mailing of email invitations were configured automatically by the online survey software Unipark. In a first step, 500 LK contributors were sent an invitation to the online survey in mid-December 2015 in order to see how the LK contributors would accept the request to participate and how they might respond to it. The high response to this first invitation wave, about 50 completed questionnaires within six hours, and the positive and constructive feedback via email confirmed the decision to invite, in a second step, the other 7,657 LK contributors to the survey as well. The sending of emails was handled by the mail module of the survey software, which sent about 250 emails at a time at intervals of 10 to 15 minutes. Error messages and undeliverable messages of email servers related to the invitation email, mainly because the email address no longer existed, were received by the email account related to [email protected].
48 LK contributors have shared their thoughts about the survey or about the announced donation to the Linux Foundation with the doctoral candidate.
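The batching described for the second invitation wave can be made concrete with a short calculation. The sketch below is only an approximation under stated assumptions: it treats every batch as exactly 250 emails (the text says "about 250") and counts waiting time only between batches. The function names are hypothetical and do not correspond to any feature of the Unipark mail module.

```python
import math


def batch_plan(n_recipients, batch_size=250, gap_minutes=(10, 15)):
    """Return (number of batches, shortest and longest total waiting time in minutes)."""
    n_batches = math.ceil(n_recipients / batch_size)
    gaps = max(n_batches - 1, 0)  # waiting happens only between consecutive batches
    return n_batches, gaps * gap_minutes[0], gaps * gap_minutes[1]


def batches(recipients, batch_size=250):
    """Yield consecutive slices of at most batch_size recipients."""
    for start in range(0, len(recipients), batch_size):
        yield recipients[start:start + batch_size]


# Second invitation wave: the remaining 7,657 LK contributors.
n_batches, fastest, slowest = batch_plan(7657)
print(n_batches, fastest / 60, slowest / 60)
```

Under these assumptions the second wave corresponds to 31 batches and a total sending time of roughly five to seven and a half hours.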
8.4 Data Analysis
8.4.1 Methods of Data Analysis
In order to assess the described hypothetical constructs on an empirical level, they are analyzed using first- and second-generation reliability and validity criteria. Stokburger-Sauer & Eisend (2009) define reliability as the extent to which repeated measurements with the same measurement tool, with the measured attribute held constant, return the same values. Consequently, reliability is the formal precision of measurement and is also a necessary condition for validity. Validity is described as the degree to which a measuring instrument measures what it is supposed to measure. It indicates the extent to which an instrument is free of both systematic and random measurement errors (Homburg et al. 2008). As first-generation analysis methods, Cronbach’s alpha and the corrected item-total correlation are used to evaluate the constructs and items (Stokburger-Sauer & Eisend 2009). The second-generation quality criteria comprise the evaluation of the measurement model. Here, the testing instruments are confirmatory factor analysis (Backhaus et al. 2008) and structural equation modeling, which stands for causal analysis (Backhaus et al. 2008, Stokburger-Sauer & Eisend 2009).

Cronbach’s Alpha
To determine the quality of the operationalization at the construct level, the internal consistency reliability of the indicators of a construct is assessed by calculating Cronbach’s alpha. This is the average of the correlation coefficients of all combinations of scale halves (Cronbach 1951, Kuß 2007). Cronbach’s alpha ranges between zero and one, with high values indicating a strong relationship between the individual items of a construct and thus pointing to a high reliability. The threshold for a satisfactory reliability level is 0.7 (Nunnally 1978).

Corrected Item-Total Correlation
The corrected item-total correlation constitutes another quality criterion for evaluating the reliability of a measurement instrument. It indicates the correlation of an indicator variable with all other indicators of the same factor (Stokburger-Sauer & Eisend 2009). Bearden et al. (1989) state that the correlation of each indicator should be at least 0.5. A high corrected item-total correlation is accompanied by a high reliability and a high degree of convergent validity (Nunnally & Bernstein 1994).
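Both first-generation criteria can be computed by hand, which may help to see what the thresholds of 0.7 and 0.5 refer to. The sketch below uses the common variance-based computational form of Cronbach’s alpha and correlates each item with the sum of the remaining items of the same construct. It is an illustration, not the SPSS procedure used in this study; the six answer rows are made up.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent, items in the same order."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows):
    """Correlation of each item with the sum of all OTHER items of the construct."""
    k = len(rows[0])
    result = []
    for i in range(k):
        item = [row[i] for row in rows]
        rest = [sum(row) - row[i] for row in rows]  # total without the item itself
        result.append(pearson(item, rest))
    return result


# Made-up answers of six respondents to a three-item construct (scale 1 to 7).
answers = [[6, 7, 6], [5, 5, 6], [2, 3, 2], [7, 6, 7], [4, 4, 3], [3, 2, 3]]
alpha = cronbach_alpha(answers)
citc = corrected_item_total(answers)
print(round(alpha, 3), [round(r, 3) for r in citc])
print(alpha >= 0.7, all(r >= 0.5 for r in citc))  # thresholds named in the text
```

Removing the item from the total before correlating is what makes the correlation "corrected"; otherwise each item would be correlated partly with itself.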
Confirmatory Factor Analysis
The quality of the operationalization at the model level is evaluated using a confirmatory factor analysis. Here, in contrast to exploratory factor analysis, individual indicators are assigned a priori to the factors when the measurement model is developed (Stokburger-Sauer & Eisend 2009). The measurement model reflects relationships between the hypothetical constructs (i.e., factors) and the corresponding indicators (Backhaus et al. 2008). The empirical test of the measurement model is achieved through the structure-testing confirmatory factor analysis and, in the context of the evaluation of reliability and validity, through the interpretation of global as well as local quality measures (Stokburger-Sauer & Eisend 2009). By considering the global quality measures, the consistency of the overall model, the so-called model fit, can be examined (Stokburger-Sauer & Eisend 2009). In the following, the global quality measures considered in this study are introduced.

χ²/Degrees of Freedom (CMIN/DF): As a goodness-of-fit measure in the context of global model quality, χ² divided by the number of degrees of freedom (CMIN/DF) is used (Homburg et al. 2008). This value should be less than three for a good model quality (Bollen 1989).

Goodness-of-Fit Index (GFI): The goodness-of-fit index is a descriptive goodness-of-fit measure, which does not include the degrees of freedom (Homburg et al. 2008). The index indicates the proportion of the variances and covariances in the covariance matrix that is explained by the model. Values for the index range between zero and one, and values above 0.9 express a satisfactory goodness (Stokburger-Sauer & Eisend 2009).

Comparative Fit Index (CFI): The comparative fit index belongs to the group of incremental fit measures and includes the degrees of freedom and the sample size. Here, the relevant model is evaluated in relation to a basic model, also called null model, in which the indicator variables are uncorrelated (Stokburger-Sauer & Eisend 2009). An index close to one indicates a high goodness. Values from 0.90 to 0.95 are acceptable, and values above 0.95 are good (Homburg et al. 2008).

Root Mean Square Error of Approximation (RMSEA): The root mean square error of approximation is an inferential statistical goodness-of-fit measure. Through this measure the goodness of the approximation of the model to the empirical data can be calculated (Stokburger-Sauer & Eisend 2009). Values less than 0.05 stand for a good model fit and values up to 0.08 for an acceptable model fit (Browne & Cudeck 1993).
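Three of the global measures above can be derived from χ² values alone, which the following sketch illustrates. It uses the common textbook formulas for RMSEA and CFI; whether the software used in this study applies exactly these variants (for example, N − 1 in the RMSEA denominator) is an assumption. The input values are hypothetical and are not results of this study. The GFI is omitted because it requires the observed and model-implied covariance matrices.

```python
from math import sqrt


def cmin_df(chi2, df):
    """Chi-square divided by the degrees of freedom."""
    return chi2 / df


def rmsea(chi2, df, n):
    """Root mean square error of approximation (textbook form with N - 1)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative fit index relative to the null (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model)
    return 1.0 if d_null == 0 else 1.0 - d_model / d_null


def meets_thresholds(cmin_df_value, cfi_value, rmsea_value):
    """Cut-offs as named in the text: CMIN/DF <= 3, CFI >= 0.9, RMSEA <= 0.08."""
    return cmin_df_value <= 3.0 and cfi_value >= 0.9 and rmsea_value <= 0.08


# Hypothetical values for illustration only.
chi2_m, df_m, chi2_0, df_0, n = 48.0, 24, 524.0, 36, 210
fit = (cmin_df(chi2_m, df_m), cfi(chi2_m, df_m, chi2_0, df_0), rmsea(chi2_m, df_m, n))
print([round(v, 3) for v in fit], meets_thresholds(*fit))
```

The sketch makes visible that RMSEA shrinks with growing sample size for the same χ² surplus, while the CFI depends on how much better the model is than the null model.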
In addition to the above-mentioned global goodness-of-fit measures, the measurement quality of individual indicators and factors can also be assessed through local measures of goodness, which are described hereafter.

Factor Loading: A factor loading expresses how strongly the respective factor correlates with the specific indicator variable (Kuß 2007). Acceptable values for factor loadings are above the threshold of 0.5 (Backhaus et al. 2008).

Composite Reliability (CR) and Average Variance Extracted (AVE): Composite reliability and average variance extracted indicate how well a factor is determined by the assigned indicator variables. Both measures range between zero and one; CR values should be above the limit of 0.6 (Bagozzi & Yi 1988) and AVE values above 0.5 (Stokburger-Sauer & Eisend 2009).

Discriminant Validity / Fornell-Larcker Criterion: In addition to analyzing the reliability by the above-described goodness measures, the validity also has to be checked, specifically the discriminant validity. This is analyzed by the strict Fornell-Larcker criterion. To fulfill the criterion, the AVE of a factor has to be higher than all squared correlations between this factor and the other factors or, equivalently, the square root of the AVE of a factor must be greater than any correlation between this factor and the other factors (Fornell & Larcker 1981).

Table 8.4 summarizes the local and global quality measures used in this study as well as the associated thresholds.

Structural Equation Modeling
Structural equation models delineate the causal relationships between hypothetical constructs, which have to be examined (Backhaus et al. 2008). Compared to multiple regression analyses, this multivariate method offers the advantage that even complex dependency structures, such as mutual relations and chains of effects, can be handled (Homburg et al. 2008). With the help of a confirmatory factor analysis and a structural equation model, the postulated relationships were modeled and the corresponding overall model was verified using IBM SPSS Amos 25 (Analysis of Moment Structures). The individual hypotheses were tested (i.e., regressions for individual paths were calculated) with IBM SPSS Statistics version 25. The moderation hypotheses were assessed using the PROCESS macro⁸ for SPSS (version 3.1) provided by Andrew F. Hayes.
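The local measures described in this sub-section (CR, AVE and the Fornell-Larcker criterion) can be illustrated with the usual formulas for standardized factor loadings, in which the error variance of an indicator is taken as one minus its squared loading. The sketch is a simplified illustration under that assumption; the factor names, loadings and the correlation are hypothetical and are not results of this study.

```python
def ave(loadings):
    """Average variance extracted from standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """Composite reliability, assuming error variance 1 - loading**2 per indicator."""
    s = sum(loadings)
    error = sum(1 - l * l for l in loadings)
    return s * s / (s * s + error)


def fornell_larcker_ok(ave_by_factor, correlations):
    """True if every squared inter-factor correlation is below both factors' AVE."""
    for (a, b), r in correlations.items():
        if r * r >= min(ave_by_factor[a], ave_by_factor[b]):
            return False
    return True


# Hypothetical standardized loadings of two three-item factors.
loadings = {"F1": [0.8, 0.8, 0.8], "F2": [0.7, 0.9, 0.8]}
aves = {name: ave(ls) for name, ls in loadings.items()}
crs = {name: composite_reliability(ls) for name, ls in loadings.items()}
print(aves, crs)
print(fornell_larcker_ok(aves, {("F1", "F2"): 0.45}))  # hypothetical correlation
```

Comparing the squared correlation with the smaller of the two AVE values is equivalent to checking the criterion for each factor of the pair separately.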
8.4.2 Survey Data Preparation
At the end of the LK contributor survey phase, the captured data were exported from the online survey software Unipark as an SPSS-compatible file. In a first step, as a basic setting in SPSS and for the correct inclusion of records in the analysis, the indicator of missing values (e.g., −77 or 0) was set in SPSS for each variable. In a second step, the data were subjected to several plausibility checks. Firstly, the time each respondent needed to fill in the online questionnaire, as recorded by the online survey software Unipark, was reviewed. This analysis revealed nothing conspicuous. Furthermore, the answer patterns of the respondents were checked for unengaged answering behavior (e.g., all items were answered with 1, 4 or 7). Here, two records were identified and deleted from the data set. Secondly, all entered data for age, tenure and department size were checked for outliers. One respondent entered 99 as age; this record was removed from the data set. Thirdly, simple relations between the demographic data and the control variables were examined for plausibility. These included, for example, the record-by-record comparison of values for age and highest education degree. These plausibility checks revealed no abnormalities. Thus, the cleaned data set of the LK contributor survey consists of 797 records.

Table 8.4 Thresholds of Local and Global Quality Measures

| Quality Measure | Threshold | Source |
|---|---|---|
| Local Quality Measures for Reflective Constructs | | |
| Reliability | | |
| Cronbach’s Alpha | ≥ 0.7 | Nunnally (1978) |
| Corrected Item-Total Correlation | ≥ 0.5 | Bearden et al. (1989), Zaichkowsky (1985) |
| Confirmatory Factor Analysis | | |
| Factor Loading | ≥ 0.5 | Backhaus et al. (2008), Bagozzi & Yi (1988) |
| Composite Reliability | ≥ 0.6 | Bagozzi & Yi (1988) |
| Average Variance Extracted | ≥ 0.5 | Stokburger-Sauer & Eisend (2009) |
| Discriminant Validity | | |
| Fornell-Larcker Criterion | AVE(ξi) > r²(ξi, ξj), for i ≠ j, or √AVE(ξi) > r(ξi, ξj), for i ≠ j | Fornell & Larcker (1981) |
| Global Quality Measures | | |
| CMIN/DF | ≤ 3.0 | Bagozzi & Yi (1988), Bollen (1989) |
| GFI | ≥ 0.9 | Bagozzi & Yi (1988), Stokburger-Sauer & Eisend (2009) |
| CFI | ≥ 0.9 | Bagozzi & Yi (1988), Homburg et al. (2008) |
| RMSEA | ≤ 0.08 | Bagozzi & Yi (1988), Browne & Cudeck (1993) |

⁸ http://www.processmacro.org, last accessed December 18, 2018.
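Two of the plausibility checks described above, the detection of unengaged answering behavior and the outlier check for age, can be sketched as follows. The actual checks were carried out in SPSS; the record layout, the item names, the plausible age range and the treatment of the missing-value codes are assumptions for illustration.

```python
MISSING_CODES = {-77, 0}  # example codes mentioned in the text


def is_unengaged(item_answers):
    """True if all non-missing items carry the same answer (e.g., all 1, 4 or 7)."""
    valid = [a for a in item_answers if a not in MISSING_CODES]
    return len(valid) > 1 and len(set(valid)) == 1


def clean(records, item_keys, plausible_age=(10, 90)):
    """Drop unengaged records and records with an implausible age.

    The age bounds are a hypothetical choice, not taken from the study.
    """
    low, high = plausible_age
    kept = []
    for rec in records:
        if is_unengaged([rec[key] for key in item_keys]):
            continue
        if not (low <= rec["age"] <= high):
            continue
        kept.append(rec)
    return kept


# Made-up records with three Likert items each.
items = ["JA_1", "JA_2", "JA_3"]
raw = [
    {"id": 1, "age": 34, "JA_1": 6, "JA_2": 5, "JA_3": 6},
    {"id": 2, "age": 28, "JA_1": 4, "JA_2": 4, "JA_3": 4},  # straight-lined
    {"id": 3, "age": 99, "JA_1": 7, "JA_2": 6, "JA_3": 5},  # implausible age
]
cleaned = clean(raw, items)
print([rec["id"] for rec in cleaned])
```

With only three items, identical answers are not unusual; in the survey the check spanned all items of a respondent, which makes a uniform pattern far more conspicuous.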
8.5 Results
Overall, 8,157 LK contributors were invited via email, managed by the Unipark⁹ online survey application, in mid-December 2015 to participate in the LK contributor survey. Of these, 1,515 email messages were returned as undeliverable. This results in a sample of 6,642 potentially reached LK contributors. By the end of the survey phase, at the end of January 2016, a total of 1,762 notified LK contributors had opened the questionnaire and 800 participants had completed it. This yields a response rate of 26.53 % and a completion rate of 12.04 % for the LK contributor online survey. After cleaning the data set, 797 records were taken into consideration for linking the survey data set to the records of LKML actors.

Linking the network data set of LKML actors’ communication with the LK contributor survey data constitutes a unique characteristic of this study with respect to organizational empirical studies utilizing social capital theory. Currently, most studies in organizational research that draw on social capital theory use either network measures (e.g., Li et al. 2013) or survey measures (e.g., Chang & Chuang 2011, Chou & He 2011, Hsu & Hung 2013) to evaluate their research models or hypotheses. In order to ensure a comprehensible and valid combination of both data sources via linking of the actors, the following procedure was implemented. As described in detail in Sub-Sections 8.3.2 and 8.3.3, the contact addresses of potential addressees for the LK contributor survey were extracted from the LK source code repository Git. Specifically, these are the source code author email addresses of the LK source code commit log files of the period January 2013 to November 2015. By using the source code author email addresses from the LK source code repository Git, it can be ensured that (1) the contact addresses of the contributors are up-to-date and (2) the persons themselves have developed LK source code and contributed it to the LK project, which is a prerequisite for answering the questions of the LK contributor survey. After the end of the LK contributor survey, the cleaned survey data were linked to the LKML network records of 2014 via the email addresses of the persons present in both data sources. This results in a final intersection of 210 persons, who were active in the LKML in 2014 and answered the LK contributor survey in December 2015 and January 2016.

⁹ http://www.unipark.com/de/, last accessed December 18, 2018.
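The reported rates and the linking step can be retraced with a short sketch. The first part only repeats the arithmetic with the figures given above, with both rates based on the 6,642 potentially reached contributors. The second part illustrates linking two data sources via email address; the record layouts, the example data and the case-insensitive matching are assumptions and not a description of the procedure actually implemented.

```python
# Figures as reported in the text.
invited, undeliverable, opened, completed = 8157, 1515, 1762, 800

reached = invited - undeliverable
response_rate = round(100 * opened / reached, 2)      # opened questionnaires
completion_rate = round(100 * completed / reached, 2)  # completed questionnaires
print(reached, response_rate, completion_rate)


def link_by_email(survey_rows, lkml_actors):
    """Return merged records for persons present in both data sources."""
    actors = {a["email"].strip().lower(): a for a in lkml_actors}
    linked = []
    for row in survey_rows:
        actor = actors.get(row["email"].strip().lower())  # assumed normalization
        if actor is not None:
            linked.append({**actor, **row})
    return linked


# Made-up example data (hypothetical field names).
survey = [
    {"email": "Ada@Example.org", "status": "hobbyist"},
    {"email": "bob@example.org", "status": "firm-sponsored"},
]
lkml_2014 = [
    {"email": "ada@example.org", "messages": 42},
    {"email": "carol@example.org", "messages": 7},
]
linked = link_by_email(survey, lkml_2014)
print(len(linked), linked[0]["messages"], linked[0]["status"])
```

Only persons found in both sources survive the join, which is why the final sample of 210 is much smaller than the 797 cleaned survey records.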
8.5.1 Descriptive Information about the Linux Kernel Survey Participants
Of the 210 LK survey participants in the final data sample, approximately 94 % are male. LK hobbyists make up the largest subgroup of the sample with 54 %, whereas 46 % of the LK respondents are firm-sponsored. The average age of people in the sample is about 33 years (mean = 32.65, SD = 7.17), and the largest share of respondents (48.6 %) falls into the age category of 30 to 39 years. With regard to the level of education, the largest group holds a Master’s degree (47.6 %), followed by the respondents with a Bachelor’s degree (33.3 %). The working tasks of the LK contributors in the sample are of a technical nature for 87 % of the respondents. This was to be expected, as the survey specifically addressed LK contributors who had contributed source code to the kernel. Table 8.5 provides a general summary of the descriptive information about the overall LK survey sample.

Table 8.5 Descriptive Information about the overall LK Contributor Sample

| Variable | N | Category | Frequency | Percentage |
|---|---|---|---|---|
| Gender | 210 | Male | 197 | 93.8 % |
| | | Female | 13 | 6.2 % |
| Developer Status | 210 | Firm-Sponsored | 96 | 45.7 % |
| | | Hobbyists | 114 | 54.3 % |
| Age | 210 | < 20 | 2 | 1.0 % |
| | | 20 – 29 | 71 | 33.8 % |
| | | 30 – 39 | 102 | 48.6 % |
| | | 40 – 49 | 29 | 13.8 % |
| | | 50 – 59 | 5 | 2.4 % |
| | | ≥ 60 | 1 | 0.5 % |
| Level of Education | 210 | High School | 17 | 8.1 % |
| | | Apprenticeship | 6 | 2.9 % |
| | | Bachelor | 70 | 33.3 % |
| | | Master | 100 | 47.6 % |
| | | Ph.D. | 17 | 8.1 % |
| Task Range | 210 | Technical | 182 | 86.7 % |
| | | Half Half | 25 | 11.9 % |
| | | Organizational | 3 | 1.4 % |
Although the descriptive information about the overall LK contributor sample outlined above gives an insight into the characteristics of the participating LK contributors, the differences between firm-sponsored LK developers and LK hobbyists are of special interest for this study. Thus, in the following, the descriptive information about the survey sample is considered separately for the two contributor groups. In both groups, the large majority of the respondents is male. The average age of firm-sponsored contributors in the sample is circa 34 years (mean = 33.67, SD = 6.25); on average, these developers are thus approximately two years older than the contributors in the subsample of LK hobbyists (mean = 31.80, SD = 7.80). Among firm-sponsored developers, the majority of the respondents falls into the age category between 30 and 39 years (62.5 %), whereas among hobbyists the largest share falls into the age category between 20 and 29 years (45.6 %). With regard to the level of education, the largest share of both groups holds a Master’s degree (firm-sponsored developers: 47.9 %; hobbyists: 47.4 %), followed by the respondents with a Bachelor’s degree (firm-sponsored developers: 33.3 %; hobbyists: 33.3 %). However, in the subsample of hobbyists 11.4 % of the contributors hold a high school degree as their highest education degree, compared to only 4.2 % in the firm-sponsored developer sample. In both groups the majority works on technical tasks related to the LK (firm-sponsored developers: 80.2 %; hobbyists: 92.1 %), followed by a small group that works half on technical and half on organizational tasks (firm-sponsored developers: 18.8 %; hobbyists: 6.1 %). Table 8.6 depicts the descriptive information about the LK contributor sample separated by firm-sponsored LK developers and LK hobbyists.
Table 8.6 Descriptive Information about the LK Contributor Sample separated by Firm-Sponsored LK Contributors and LK Hobbyists

| Variable | Category | Firm-Sponsored (N = 96): Frequency | Firm-Sponsored: Percentage | Hobbyists (N = 114): Frequency | Hobbyists: Percentage |
|---|---|---|---|---|---|
| Gender | Male | 92 | 95.8 % | 105 | 92.1 % |
| | Female | 4 | 4.2 % | 9 | 7.9 % |
| Age | < 20 | 1 | 1.0 % | 1 | 0.9 % |
| | 20 – 29 | 19 | 19.8 % | 52 | 45.6 % |
| | 30 – 39 | 60 | 62.5 % | 42 | 36.8 % |
| | 40 – 49 | 13 | 13.5 % | 16 | 14.0 % |
| | 50 – 59 | 3 | 3.1 % | 2 | 1.8 % |
| | ≥ 60 | 0 | 0.0 % | 1 | 0.9 % |
| Level of Education | High School | 4 | 4.2 % | 13 | 11.4 % |
| | Apprenticeship | 2 | 2.1 % | 4 | 3.5 % |
| | Bachelor | 32 | 33.3 % | 38 | 33.3 % |
| | Master | 46 | 47.9 % | 54 | 47.4 % |
| | Ph.D. | 12 | 12.5 % | 5 | 4.4 % |
| Task Range | Technical | 77 | 80.2 % | 105 | 92.1 % |
| | Half Half | 18 | 18.8 % | 7 | 6.1 % |
| | Organizational | 1 | 1.0 % | 2 | 1.8 % |

Additional descriptive information was requested from firm-sponsored LK contributors, including tenure, department size and income. The overall subsample of firm-sponsored LK contributors consists of 96 persons. The majority of firm-sponsored LK developers (57.3 %) has worked for their current company for between two and five years, followed by contributors who have worked for their firm for between six and nine years, who make up 22.9 %. The average tenure is about 5.4 years (SD = 3.63). Firm-sponsored survey participants were asked to indicate their department size in number of coworkers. They were to consider their department if it is independent from the rest of the company and reports its own turnover; otherwise they were to consider their organization. The largest group of LK contributors in the survey subsample has between ten and 49 coworkers (33.3 %). As the indication of the monthly income was voluntary for firm-sponsored survey participants, 50 respondents did not give any information related to this question. The largest group of firm-sponsored developers (16.7 %) earns more than $ 3,000 per month with their LK activities. Interestingly, 13.5 % of firm-sponsored developers earn less than $ 1,000 per month for their involvement in LK activities. Table 8.7 lists the additional descriptive information about the subsample of firm-sponsored LK contributors.
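The frequencies reported for the two contributor groups can be cross-checked against the overall sample with a few lines of code. The sketch below only re-enters the frequencies from Tables 8.5 and 8.6 and verifies that the subgroup counts add up to the overall counts and to the group sizes; the helper names are hypothetical.

```python
# (firm-sponsored, hobbyists, overall) frequencies from Tables 8.6 and 8.5.
table = {
    "Gender": {"Male": (92, 105, 197), "Female": (4, 9, 13)},
    "Age": {
        "< 20": (1, 1, 2), "20-29": (19, 52, 71), "30-39": (60, 42, 102),
        "40-49": (13, 16, 29), "50-59": (3, 2, 5), ">= 60": (0, 1, 1),
    },
    "Level of Education": {
        "High School": (4, 13, 17), "Apprenticeship": (2, 4, 6),
        "Bachelor": (32, 38, 70), "Master": (46, 54, 100), "Ph.D.": (12, 5, 17),
    },
    "Task Range": {
        "Technical": (77, 105, 182), "Half Half": (18, 7, 25),
        "Organizational": (1, 2, 3),
    },
}


def inconsistencies(table, n_firm=96, n_hobby=114):
    """List variables whose subgroup counts do not match the totals."""
    problems = []
    for variable, categories in table.items():
        rows = list(categories.values())
        if any(firm + hobby != overall for firm, hobby, overall in rows):
            problems.append(variable)
        elif sum(r[0] for r in rows) != n_firm or sum(r[1] for r in rows) != n_hobby:
            problems.append(variable)
    return problems


def percentage(count, n):
    """Share in percent, rounded to one decimal as in the tables."""
    return round(100 * count / n, 1)


print(inconsistencies(table))
print(percentage(197, 210), percentage(60, 96), percentage(52, 114))
```

An empty list of inconsistencies indicates that the two tables describe the same 210 respondents.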
8.5.2 Confirmatory Factor Analysis
The research model is checked successively using the methods of data analysis described in Sub-Section 8.4.1. Cronbach’s Alpha delivers values of above 0.87—the
Table 8.7 Additional Descriptive Information about the Subsample of Firm-Sponsored LK Contributors Variable
N
Tenure (in Years)
96
Department Size (in Number of Coworkers)
96
Income (per Month)
96