Computer Supported Education: 11th International Conference, CSEDU 2019, Heraklion, Crete, Greece, May 2-4, 2019, Revised Selected Papers [1st ed.] 9783030584580, 9783030584597

This book constitutes the thoroughly refereed proceedings of the 11th International Conference on Computer Supported Education, CSEDU 2019, held in Heraklion, Crete, Greece, in May 2019.



H. Chad Lane Susan Zvacek James Uhomoibhi (Eds.)

Communications in Computer and Information Science

1220

Computer Supported Education 11th International Conference, CSEDU 2019 Heraklion, Crete, Greece, May 2–4, 2019 Revised Selected Papers

Communications in Computer and Information Science

Editorial Board Members
Joaquim Filipe, Polytechnic Institute of Setúbal, Setúbal, Portugal
Ashish Ghosh, Indian Statistical Institute, Kolkata, India
Raquel Oliveira Prates, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
Lizhu Zhou, Tsinghua University, Beijing, China

1220

More information about this series at http://www.springer.com/series/7899

H. Chad Lane, Susan Zvacek, James Uhomoibhi (Eds.)



Computer Supported Education 11th International Conference, CSEDU 2019 Heraklion, Crete, Greece, May 2–4, 2019 Revised Selected Papers


Editors
H. Chad Lane, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Susan Zvacek, Anderson Academic Commons, University of Denver, Denver, CO, USA
James Uhomoibhi, School of Engineering, University of Ulster, Newtownabbey, UK

ISSN 1865-0929 ISSN 1865-0937 (electronic) Communications in Computer and Information Science ISBN 978-3-030-58458-0 ISBN 978-3-030-58459-7 (eBook) https://doi.org/10.1007/978-3-030-58459-7 © Springer Nature Switzerland AG 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

The present book includes extended and revised versions of selected papers from the 11th International Conference on Computer Supported Education (CSEDU 2019), held in Heraklion, Crete, Greece, during May 2–4, 2019. CSEDU 2019 received 202 paper submissions from 48 countries, of which 15% were included in this book. The papers were selected by the event chairs based on a number of criteria that include the classifications and comments provided by the Program Committee members, the session chairs' assessment, and also the program chairs' global view of all papers included in the technical program. The authors of selected papers were then invited to submit a revised and extended version of their papers, having at least 30% additional material.

CSEDU 2019 is an annual meeting place for presenting and discussing new educational tools and environments, best practices and case studies on innovative technology-based learning strategies, and institutional policies on computer supported education including open and distance education. CSEDU welcomes research and practice articles on current technologies, as well as emerging trends, and promotes discussion about the pedagogical potential of new educational technologies in the academic and corporate world.

In this collection of extended papers, the breadth of research covered by the CSEDU community is immediately evident. Research presented here occurs in a number of important learning contexts, including collaborative learning, higher education, online learning, informal learning, and with learners of all ages. The articles also explore a rich and exciting set of technologies and approaches, such as adaptive educational technologies, learning analytics, game-based approaches, virtual and augmented reality, social technologies, and physiological sensing. Finally, readers will see that papers also touch on a wide range of target domains, including computer programming, computational thinking, electronics, population modeling, self-regulation, and more.

We are very happy to share this exciting set of papers with the broader educational technology research community. We would like to thank all the authors for their contributions, and also the reviewers who helped to ensure the quality of this publication.

May 2019

H. Chad Lane Susan Zvacek James Uhomoibhi

Organization

Conference Chair
James Uhomoibhi, Ulster University, UK

Program Co-chairs
H. Chad Lane, University of Illinois at Urbana-Champaign, USA
Susan Zvacek, SMZTeaching.com, USA

Program Committee Dor Abrahamson Mehdi Adda Nelma Albuquerque Efthimios Alepis Eleftheria Alexandri Richard Alterman Fahriye Altinay António Andrade Francisco Arcega Juan Ignacio Asensio Breno Azevedo Nilufar Baghaei Zoltan Balogh Adriano Baratè João Barros Marcelo Barros Patrícia Bassani Emad Bataineh Kay Berkling Gilda Bernardino de Campos Pavol Bistak Andreas Bollin Elmar-Laurent Borgmann Ivana Bosnic Federico Botella Rosa Bottino

University of California, Berkeley, USA Université du Québec à Rimouski, Canada Concepts and Insights, Brazil University of Piraeus, Greece Hellenic Open University, Greece Brandeis University, USA Near East University, Cyprus Universidade Católica Portuguesa, Portugal Universidad de Zaragoza, Spain University of Valladolid, Spain Instituto Federal de Educação, Ciência e Tecnologia Fluminense, Brazil Unitec, New Zealand Constantine the Philosopher University in Nitra, Slovakia Università degli Studi di Milano, Italy Polytechnic Institute of Beja, Portugal Federal University of Campina Grande, Brazil Universidade Feevale, Brazil Zayed University, UAE Baden-Württemberg Cooperative State University, Germany Pontifícia Universidade Católica do Rio de Janeiro, Brazil Slovak University of Technology, Slovakia Klagenfurt University, Austria Koblenz University of Applied Sciences, Germany University of Zagreb, Croatia Miguel Hernandez University of Elche, Spain CNR, Italy


François Bouchet Patrice Bouvier Tharrenos Bratitsis Krysia Broda Martin Bush Egle Butkeviciene Santi Caballé Renza Campagni Pasquina Campanella Chris Campbell Thibault Carron Ana Carvalho Vítor Carvalho Cristian Cechinel Isabel Chagas Chia-Hu Chang Mohamed Chatti Lukas Chrpa Maria Cinque António Coelho Marc Conrad Ruth Cook Fernando Costa Gennaro Costagliola Manuel Perez Cota John Cuthell Rogério da Silva Ines Dabbebi Sergiu Dascalu Angélica de Antonio Luis de-la-Fuente-Valentín Giuliana Dettori Tania Di Mascio Yannis Dimitriadis Amir Dirin Danail Dochev

Georg Dr. Schneider Toby Dragon Benedict du Boulay Amalia Duch Ishbel Duncan Nour El Mawas Larbi Esmahi João Esteves

Laboratoire d’Informatique de Paris 6, France SYKO Studio, France University of Western Macedonia, Greece Imperial College London, UK London South Bank University, UK Kaunas University of Technology, Lithuania Open University of Catalonia, Spain Università di Firenze, Italy University of Bari, Italy Griffith University, Australia Pierre and Marie Curie University, France University of Coimbra, Portugal IPCA-EST, University of Minho, Portugal Universidade Federal de Pelotas, Brazil Universidade de Lisboa, Portugal National Taiwan University, Taiwan, China University of Duisburg-Essen, Germany Czech Technical University in Prague, Czech Republic LUMSA Università, Italy University of Porto, Portugal University of Bedfordshire, UK DePaul University, USA Universidade de Lisboa, Portugal Università degli Studi di Salerno, Italy University of Vigo, Spain Virtual Learning, UK University of Houston-Victoria, USA LIUM, France University of Nevada, Reno, USA Universidad Politécnica de Madrid, Spain Universidad Internacional de la Rioja, Spain ITD-CNR, Italy University of L’Aquila, Italy University of Valladolid, Spain Haaga-Helia UAS, Finland Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Bulgaria Trier University of Applied Sciences, Germany Ithaca College, USA University of Sussex, UK Politechnical University of Catalonia, Spain University of St Andrews, UK Université de Lille, France Athabasca University, Canada University of Minho, Portugal


Ramon Fabregat Gesa Sanaz Fallahkhair Si Fan Michalis Feidakis Richard Ferdig Rosa Fernandez-Alcala Rita Francese Jesús G. Boticario Mikuláš Gangur Francisco García Peñalvo Isabela Gasparini Piotr Gawrysiak Sébastien George Biswadip Ghosh Henrique Gil Mercè Gisbert Apostolos Gkamas Anabela Gomes Cristina Gomes Maria Gomes Nuno Gonçalves Carina Gonzalez Ana González Marcos Anandha Gopalan Angela Guercio Christian Guetl Nathalie Guin David Guralnick Roger Hadgraft Yasunari Harada Cecilia Haskins Oriel Herrera Antonio Hervás Jorge Janet Hughes Nnenna Ibezim Tomayess Issa Ivan Ivanov Marc Jansen Jose Janssen Hannu-Matti Järvinen Stéphanie Jean-Daubias Michail Kalogiannakis Atis Kapenieks Vaggelis Kapoulas


Universitat de Girona, Spain University of Brighton, UK University of Tasmania, Australia University of West Attica (UniWA), Greece Kent State University, USA University of Jaen, Spain Università degli Studi di Salerno, Italy aDeNu Research Group, UNED, Spain University of West Bohemia, Czech Republic Salamanca University, Spain UDESC, Brazil Warsaw University of Technology, Poland Le Mans University, France Metropolitan State University of Denver, USA Escola Superior de Educação do Instituto Politécnico de Castelo Branco, Portugal Universitat Rovira i Virgili, Spain University Ecclesiastical Academy of Vella, Greece Coimbra Polytechnic (ISEC), Portugal Instituto Politécnico de Viseu, Portugal University of Minho, Portugal Polytechnic Institute of Setúbal, Portugal Universidad de la Laguna, Spain Universidad de la Rioja, Spain Imperial College London, UK Kent State University, USA Graz University of Technology, Austria Université Claude Bernard Lyon 1, France Kaleidoscope Learning, USA University of Technology Sydney, Australia Waseda University, Japan Norwegian University of Science and Technology, Norway Universidad Catolica de Temuco, Chile Universidad Politécnica de Valencia, Spain The Open University, UK University of Nigeria Nsukka, Nigeria Curtin University, Australia SUNY Empire State College, USA University of Applied Sciences Ruhr West, Germany Open Universiteit, The Netherlands Tampere University, Finland Université Claude Bernard Lyon 1, LIRIS, France University of Crete, Greece Riga Technical University, Latvia Computer Technology Institute and Press, Greece


Ilias Karasavvidis Jerzy Karczmarczuk David Kaufman Jalal Kawash Mizue Kayama Samer Khasawneh Rob Koper Maria Kordaki Adamantios Koumpis Miroslav Kulich Lam-for Kwok Jean-Marc Labat José Lagarto Elicia Lanham Rynson Lau Geoffrey Lautenbach Borislav Lazarov José Leal Dominique Leclet Chien-Sing Lee Mark Lee Newton Lee Marie Lefevre José Alberto Lencastre Andrew Lian Cheng-Min Lin Andreas Lingnau Martin Llamas-Nistal Chee-Kit Looi Luca Andrea Ludovico Krystina Madej Maria Marcelino Massimo Marchiori Jacek Marciniak Ivana Marenzi José Marques Lindsay Marshall Alke Martens Scheila Martins Demétrio Matos Bruce Maxim Madeth May Godfrey Mayende Elvis Mazzoni

University of Thessaly, Greece University of Caen, France Simon Fraser University, Canada University of Calgary, Canada Shinshu University, Japan Walsh University, USA Open Universiteit, The Netherlands University of the Aegean, Greece Berner Fachhochschule, Switzerland Czech Technical University in Prague, Czech Republic City University of Hong Kong, Hong Kong, China Pierre and Marie Curie University, France Universidade Católica Portuguesa, Portugal Deakin University, Australia City University of Hong Kong, Hong Kong, China University of Johannesburg, South Africa Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Bulgaria University of Porto, Portugal UPJV, France Sunway University, Malaysia Charles Sturt University, Australia Newton Lee Laboratories LLC and Institute for Education Research and Scholarships, USA Université Claude Bernard Lyon 1, France University of Minho, Portugal Suranaree University of Technology, Thailand Nan Kai University of Technology, Taiwan, China Ruhr West University of Applied Sciences, Germany University of Vigo, Spain Nanyang Technological University, Singapore Università degli Studi di Milano, Italy Georgia Tech, USA University of Coimbra, Portugal University of Padua, Italy Adam Mickiewicz University, Poland Leibniz University Hannover, Germany FEUP, Portugal Newcastle University, UK Universität of Rostock, Germany University of Houston-Victoria, USA IPCA-ID+, Portugal University of Michigan-Dearborn, USA Le Mans University, France University of Agder, Norway University of Bologna, Italy


José Carlos Metrôlho Louise Mifsud Bakhtiar Mikhak Peter Mikulecky Andreea Molnar Gyöngyvér Molnár António Moreira Jerzy Moscinski Maria Moundridou Antao Moura Michal Munk Cristina Muntean Antoanela Naaji Hiroyuki Nagataki Ryohei Nakatsu Minoru Nakayama Sotiris Nikolopoulos Fátima Nunes Dade Nurjanah Emma O’Brien Ebba Ossiannilsson Kuo-liang Ou Alessandro Pagano José Palma Manuel Palomo Duarte Stamatios Papadakis Iraklis Paraskakis Pramod Pathak Emanuel Peres Paula Peres Arnulfo Perez Isidoros Perikos Donatella Persico Alfredo Pina Niels Pinkwart Elvira Popescu Tamara Powell Francesca Pozzi Augustin Prodan


Instituto Politécnico de Castelo Branco, Portugal Oslo Metropolitan University, Norway Harvard University, USA University of Hradec Kralove, Czech Republic Swinburne University of Technology, Australia University of Szeged, Hungary Universidade de Aveiro, Portugal Silesian University of Technology, Poland School of Pedagogical and Technological Education (ASPETE), Greece Federal University of Campina Grande (UFCG), Brazil Constantine the Philosopher University in Nitra, Slovakia National College of Ireland, Ireland Vasile Goldis Western University of Arad, Romania Osaka Electro-Communication University, Japan National University of Singapore, Singapore Tokyo Institute of Technology, Japan Technological Educational Institute of Larissa, Greece Universidade de São Paulo, Brazil Telkom University, Indonesia Mary Immaculate College, Ireland The Swedish Association for Distance Education (SADE), Sweden National Tsinghua University, Taiwan, China University of Bari, Italy Polytechnic Institute of Setúbal, Portugal Universidad de Cádiz, Spain Department of Preschool Education, Faculty of Education, University of Crete, Greece South East European Research Centre, Greece National College of Ireland, Ireland University of Trás-os-Montes and Alto Douro, INESC TEC, Portugal ISCAP, Portugal The Ohio State University, USA University of Patras, Greece CNR, Italy Public University of Navarra, Spain Humboldt University, Germany University of Craiova, Romania Kennesaw State University, USA CNR, Italy Iuliu Hatieganu University, Romania


Franz Puehretmair

Clark Quinn Muthu Ramachandran Altina Ramos Fernando Ramos Eliseo Reategui Manuel Reis Fernando Ribeiro Sandro Rigo Razvan Rughinis Rebecca Rutherfoord Barbara Sabitzer Libuse Samkova Demetrios Sampson Juan M. Santos Manoj Saxena Sabine Schlag Wolfgang Schreiner Ulrik Schroeder Sabine Seufert Haya Shamir Ali Fawaz Shareef Pei Siew Juarez Bento da Silva Jane Sinclair Natalya Snytnikova Filomena Soares Ellis Solaiman Michael Sonntag Marcus Specht J. Spector Claudia Steinberger Jun-Ming Su Masanori Sugimoto Katsuaki Suzuki Nestori Syynimaa Qing Tan Steven Tanimoto Luca Tateo Dirk Tempelaar Marco Temperini Uwe Terton Neena Thota Tomas Trescak

Competence Network Information Technology to Support the Integration of People with Disabilities (KI-I), Austria Quinnovation, USA Leeds Beckett University, UK University of Minho, Portugal University of Aveiro, Portugal Universidade Federal do Rio Grande do Sul, Brazil University of Trás-os-montes and Alto Douro, Portugal Instituto Politécnico de Castelo Branco, Portugal Unisinos, Brazil University POLITECHNICA of Bucharest, Romania Kennesaw State University, USA Johannes Kepler University Linz, Austria University of South Bohemia, Czech Republic University of Piraeus, Greece University of Vigo, Spain Central University of Himachal Pradesh, India University of Wuppertal, Germany Johannes Kepler University Linz, Austria RWTH Aachen University, Germany University of St. Gallen, Switzerland Waterford Research Institute, USA Cyryx College, Maldives Universiti Tunku Abdul Rahman, Malaysia Universidade Federal de Santa Catarina, Brazil University of Warwick, UK Novosibirsk State University, Russia University of Minho, Portugal Newcastle University, UK Johannes Kepler University Linz, Austria TU Delft, The Netherlands University of North Texas, USA University of Klagenfurt, Austria National University of Tainan, Taiwan, China Hokkaido University, Japan Kumamoto University, Japan University of Jyväskylä, Finland Athabasca University, Canada University of Washington, USA Aalborg University, Denmark Maastricht University, The Netherlands Sapienza University of Rome, Italy University of the Sunshine Coast, Australia University of Massachusetts Amherst, USA Western Sydney University, Australia


Abdallah Tubaishat Richard Van Eck Leo van Moergestel Carlos Vaz de Carvalho Andreas Veglis Ioanna Vekiri J. Velázquez-Iturbide Giuliano Vivanet Bahtijar Vogel Harald Vranken Alf Wang Fangju Wang Edgar Weippl David Whittinghill Leandro Wives Jie Yang Amel Yessad Katarina Zakova Diego Zapata-Rivera Thomas Zarouchas Iveta Zolotova Javier García Zubía

Zayed University, UAE University of North Dakota, USA University of Applied Sciences Utrecht, The Netherlands ISEP, Portugal Aristotle University of Thessaloniki, Greece Independent Researcher, Greece Universidad Rey Juan Carlos, Spain University of Cagliari, Italy Malmö University, Sweden Open Universiteit, The Netherlands Norwegian University of Science and Technology, Norway University of Guelph, Canada University of Vienna, SBA Research, Austria Purdue University, USA Universidade Federal do Rio Grande do Sul, Brazil National Central University, Taiwan, China Laboratoire d’Informatique de Paris 6, Sorbonne Université, France Slovak University of Technology in Bratislava, Slovakia Educational Testing Service, USA Computer Technology Institute and Press, Greece Technical University of Kosice, Slovakia Universidad de Deusto, Spain

Additional Reviewers
Sandra Elsom, University of the Sunshine Coast, Australia
Laura Freina, CNR, Institute for Educational Technology, Italy
Richard Ladner, University of Washington, USA
Andrew Munoz, University of Nevada, Reno, USA
Marcello Passarelli, CNR, Italy
Luis Pedro, University of Aveiro, Portugal
Connor Scully-Allison, University of Nevada, Reno, USA
Eiji Tomida, Ehime University, Japan
Yasushi Tsubota, Kyoto University, Japan

Invited Speakers
Maria Roussou, University of Athens, Greece
Michael Baker, CNRS, France


Contents

How Programming Students Trick and What JEdUnit Can Do Against It (Nane Kratzke) . . . 1
Exploring the Affordances of SimReal for Learning Mathematics in Teacher Education: A Socio-Cultural Perspective (Said Hadjerrouit) . . . 26
Feedback Preferences of Students Learning in a Blended Environment: Worked Examples, Tutored and Untutored Problem-Solving (Dirk T. Tempelaar, Bart Rienties, and Quan Nguyen) . . . 51
Teaching Defence Against the Dark Arts Using Game-Based Learning: A Review of Learning Games for Cybersecurity Education (Rene Roepke and Ulrik Schroeder) . . . 71
A Collective Dynamic Indicator for Discussion Forums in Learning Management Systems (Malik Koné, Madeth May, Sébastien Iksal, and Souleymane Oumtanaga) . . . 88
Immersion and Control in Learning Art Knowledge: An Example in Museum Visit (Morgane Burgues, Nathalie Huet, and Jean-Christophe Sakdavong) . . . 111
Studying Relationships Between Network Structure in Educational Forums and Students' Performance (O. Ferreira-Pires, M. E. Sousa-Vieira, J. C. López-Ardao, and M. Fernández-Veiga) . . . 128
Emotion Recognition from Physiological Sensor Data to Support Self-regulated Learning (Haeseon Yun, Albrecht Fortenbacher, René Helbig, Sven Geißler, and Niels Pinkwart) . . . 155
Separating the Disciplinary, Application and Reasoning Dimensions of Learning: The Power of Technology-Based Assessment (Gyöngyvér Molnár and Benő Csapó) . . . 174
Behind the Shoulders of Bebras Teams: Analyzing How They Interact with the Platform to Solve Tasks (Carlo Bellettini, Violetta Lonati, Mattia Monga, and Anna Morpurgo) . . . 191
Computational Pedagogy: Block Programming as a General Learning Tool (Stefano Federici, Elisabetta Sergi, Claudia Medas, Riccardo Lussu, Elisabetta Gola, and Andrea Zuncheddu) . . . 211
RefacTutor: An Interactive Tutoring System for Software Refactoring (Thorsten Haendler, Gustaf Neumann, and Fiodor Smirnov) . . . 236
User-Centered Design: An Effective Approach for Creating Online Educational Games for Seniors (Louise Sauvé and David Kaufman) . . . 262
Population Growth Modelling Simulations: Do They Affect the Scientific Reasoning Abilities of Students? (Kathy Lea Malone and Anita Schuchardt) . . . 285
Novice Learner Experiences in Software Development: A Study of Freshman Undergraduates (Catherine Higgins, Ciaran O'Leary, Claire McAvinia, and Barry Ryan) . . . 308
Promoting Active Participation in Large Programming Classes (Sebastian Mader and François Bry) . . . 331
Agile Methods Make It to Non-vocational High Schools (Ilenia Fronza, Claus Pahl, and Boris Sušanj) . . . 355
Testing the Testing Effect in Electrical Science with Learning Approach as a Factor (James Eustace and Pramod Pathak) . . . 373
A Web Platform to Foster and Assess Tonal Harmony Awareness (Federico Avanzini, Adriano Baratè, Luca A. Ludovico, and Marcella Mandanici) . . . 398
MATE-BOOSTER: Design of Tasks for Automatic Formative Assessment to Boost Mathematical Competence (Alice Barana, Marina Marchisio, and Raffaella Miori) . . . 418
Educational Practices in Computational Thinking: Assessment, Pedagogical Aspects, Limits, and Possibilities: A Systematic Mapping Study (Lúcia Helena Martins-Pacheco, Nathalia da Cruz Alves, and Christiane Gresse von Wangenheim) . . . 442
Virtual Reality and Virtual Lab-Based Technology-Enhanced Learning in Primary School Physics (Diana Bogusevschi and Gabriel-Miro Muntean) . . . 467
Increasing Parental Involvement in Computer Science Education Through the Design and Development of Family Creative Computing Workshops (Nina Bresnihan, Glenn Strong, Lorraine Fisher, Richard Millwood, and Áine Lynch) . . . 479
Retention of University Teachers and Doctoral Students in UNIPS Pedagogical Online Courses (Samuli Laato, Heidi Salmento, Emilia Lipponen, Henna Vilppu, Mari Murtonen, and Erno Lehtinen) . . . 503
Designing Culturally Inclusive MOOCs (Mana Taheri, Katharina Hölzle, and Christoph Meinel) . . . 524
Evaluation of an Interactive Personalised Virtual Lab in Secondary Schools (Ioana Ghergulescu, Arghir-Nicolae Moldovan, Cristina Hava Muntean, and Gabriel-Miro Muntean) . . . 538
Potential Benefits of Playing Location-Based Games: An Analysis of Game Mechanics (Samuli Laato, Tarja Pietarinen, Sampsa Rauti, and Erkki Sutinen) . . . 557
Resolving Efficiency Bottleneck of the Bellman Equation in Adaptive Teaching (Fangju Wang) . . . 582
Ontology-Based Analysis and Design of Educational Games for Software Refactoring (Thorsten Haendler and Gustaf Neumann) . . . 602
in-Game Raw Data Collection and Visualization in the Context of the "ThimelEdu" Educational Game (Nikolas Vidakis, Anastasios Kristofer Barianos, Apostolos Marios Trampas, Stamatios Papadakis, Michail Kalogiannakis, and Kostas Vassilakis) . . . 629
Author Index . . . 647

How Programming Students Trick and What JEdUnit Can Do Against It

Nane Kratzke
Lübeck University of Applied Sciences, Mönkhofer Weg 239, 23562 Lübeck, Germany
[email protected]

Abstract. According to our data, about 15% of programming students trick if they are aware that only a "dumb" robot evaluates their programming assignments unattended by programming experts. Especially in large-scale formats like MOOCs this becomes a concern, because tricking current automated programming assignment assessment systems (APAAS) is astonishingly easy, and the question arises whether unattended grading components grade the capability to program or the capability to trick. This study analyzed what kinds of tricks students apply beyond the well-known "copy-paste" code plagiarism in order to derive possible mitigation options. It analyzed student cheat patterns that occurred in two programming courses and developed a unit testing framework, JEdUnit, as a solution proposal that intentionally targets such tricky educational aspects of programming. A validation phase evaluated JEdUnit in another programming course. The study identified and analyzed four recurring cheat patterns (overfitting, evasion, redirection, and injection) that hardly occur in "normal" software development and are therefore not covered by the common unit testing frameworks frequently used to test the correctness of student submissions. The concept of well-known unit testing frameworks was therefore extended by three "countermeasures": randomization, code inspection, and separation. The validation showed that JEdUnit detected these patterns and, in consequence, reduced cheating entirely to zero. From a student's perspective, JEdUnit makes the grading component more intelligent, and cheating does not pay off anymore. This chapter explains the cheat patterns and the JEdUnit features that mitigate them along a continuous example.

Keywords: Automatic · Assessment · Programming · Course · Education · APAAS · MOOC · Moodle · VPL · Trick · Cheat · Pattern · Mitigation

1 Introduction

In a digitized world, more and more experts are needed with at least some basic programming skills. Programming might even evolve into a foundational skill similar to reading, writing, and calculating. Therefore, the course sizes of university and college programming courses are steadily increasing. Even massive open online courses [14] are used more and more systematically to convey necessary programming capabilities to students of different disciplines [19]. The coursework consists of programming assignments that need to be assessed. Since the submitted assignments are executable programs with a formal structure, they are highly suited to be assessed automatically. In consequence, plenty of automated programming assignment assessment systems (APAAS) evolved. We refer to [1–3,5,9,17] for an overview of such tools.

Previous research [11] showed how astonishingly simple it is for students to trick automated programming assignment assessment systems. It is often overlooked that APAAS solutions are systems that execute injected code (student submissions), and code injection is known as a severe threat from a security point of view [20]. We refer to [7,15], and [6] for an overview of such kinds of attacks. Of course, such code injection vulnerabilities are considered by current solutions. However, in previous research [11], it was astonishing to see that current APAAS solutions sometimes overlook the cheating cleverness of students. On the one hand, APAAS solutions protect the host system via sandbox mechanisms, and they put much effort into sophisticated plagiarism detection and authorship control of student submissions [13,16]. On the other hand, the grading component can be cheated in various – sometimes ridiculously simple – ways, making these solutions highly suspect for (semi-)automated and unattended programming examinations that contribute to certifying a certain level of programming expertise. Previous research [11] identified at least four simple cheat patterns:

– Overfitting
– Evasion
– Redirection
– Injection

Moreover, it strove to raise general problem awareness but did not focus on solutions to mitigate these patterns. To propose solutions for mitigating these identified cheat patterns is the primary intent and contribution of this chapter. We propose to use the following three techniques:

– Randomization of test cases
– Pragmatic code inspections
– Separation of student and evaluation logic

These techniques mitigate the presented patterns, and we demonstrate their suitability for the APAAS solution Moodle/VPL and the programming language Java. Nevertheless, the principles are transferable to other APAAS solutions and programming languages and are therefore of broader interest and not limited to Moodle/VPL and Java.

Consequently, the remainder of this paper is outlined as follows. Section 2 presents the methodology that has been used to identify and categorize student cheat patterns and to validate appropriate "countermeasures". Section 3 explains the identified cheat patterns. Section 4 presents a mitigation analysis and shows that the identified cheat patterns can be addressed by three "countermeasures" that should be considered by every APAAS solution. Section 5 explains how these insights have been considered in the development of a unit testing framework (JEdUnit) that intentionally focuses on educational aspects and consistently considers the aspects mentioned above. We discuss our results on JEdUnit in Sect. 6 and provide some guidelines on the generalizability and limitations of this research. Finally, we conclude our findings in Sect. 7.

Fig. 1. Research methodology.

2 Methodology

Figure 1 presents the overall research approach that comprised two phases: problem systematization and solution proposal validation.

2.1 Problem Systematization

For the initial problem systematization, two first-semester Java programming courses in the winter semester 2018/19 (see Table 1) have been systematically evaluated. Both courses were used to search for student submissions that intentionally trick the grading component of APAAS solutions. Table 1 provides a summarized overview of the course design. All assignments were automatically evaluated by the VPL Moodle plugin (version 3.3.3) following the general recommendations described by [21]. For more details, the reader is referred to [11]. Figure 2 shows an exemplifying VPL screenshot from a student's perspective.

Fig. 2. VPL screenshot (on the right: the evaluation report presented to students) [11].

Table 1. Courses used for problem systematization and solution validation.

                               Systematization               Validation
                               Prog I (CS)   Prog I (ITD)    Prog II (CS)
Students                       113           79              73
Assignments (total)            29            20              8
Number of bunches              11            6               7
Assignments per bunch (avg)    3             3               1
Time for a bunch (weeks)       1             2               2
Groups                         6             4               7
Students per group (avg)       12            18              18
Student/advisor ratio (avg)    6             6               6

To minimize Hawthorne and experimenter effects [4], neither the students nor the advisers in the practical programming courses were aware that student submissions were analyzed to deduce cheating patterns. Even if cheating was detected, this had no consequences for the students. It was not even communicated to the student or the advisers. Furthermore, students were not aware that the version history of their submissions, and therefore even intermediate cheating experiments (that did not make it to the final submission), was logged. However, not every submission was inspected, for understandable effort reasons. Therefore, only significant submission samples (see Table 2) were investigated to search systematically for cheat patterns. Table 3 summarizes the results quantitatively.

Within these six samples, cheat and trick patterns were identified mainly by manual but script-supported observation. VPL submissions were downloaded from Moodle and analyzed weekly. We developed a Jupyter-based [10] quantitative analysis and submission data model for this dataset. Each student submission was represented as an object containing its version and grading history that references its student submitter and its corresponding study programme. The analytical script and data model made use of the well-known Python libraries statistics, NumPy [12], and matplotlib [8], as well as the Javaparser library [18]. It was used to identify the number of submissions and evaluations, points per submission version, timestamps per submission version, occurrences of unusual terms, and more. Based on this quantitative data, the mentioned samples (S1–S5) were selected automatically (or randomly in the case of S6). Additionally, the source codes of the sample submissions were exported weekly as an archived PDF document. However, the scanning for cheat patterns was done manually within these documents.

Table 2. Weekly analyzed samples (systematization phase).

S1. Description: TOP 10 of submissions with many triggered evaluations.
    Rationale: Parameter optimization could cause plenty of evaluations.
S2. Description: TOP 10 of submissions with many versions.
    Rationale: Cheating experiments could cause plenty of versions.
S3. Description: TOP 10 of submissions with astonishingly low average points across all triggered evaluations but full points in the final submission.
    Rationale: Cheating could cause such point boosts.
S4. Description: Submissions with unusually many (above the 95% percentile) condition-related terms like if, return, switch, case, and so on.
    Rationale: Parameter optimization could cause unusually many condition-related terms.
S5. Description: Submissions with unusual terms like System.exit, System.getProperties, or :=>> that would stop program execution or have a special meaning in VPL or Java but are unlikely to contribute to a problem solution.
    Rationale: APAAS attacks could cause such unusual terms.
S6. Description: Ten random submissions.
    Rationale: To cover unintended observation aspects.

2.2 Solution Proposal Validation

Based on these elicited cheat patterns, corresponding mitigation options have been derived. Three of them (randomization, code inspection, and submission/evaluation logic separation) have been implemented in a unit testing framework called JEdUnit as a solution proposal to mitigate the identified problems. JEdUnit has been validated using a Programming II course for computer science students in the summer semester 2019. The course has been given analogously to the systematization phase, except that JEdUnit has been applied. The search for cheats has been conducted similarly, except that we inspected every submission because of the smaller course size and the fewer (but more extensive and more complex) assignments. The mitigation options and JEdUnit, as well as the validation results, are presented in the following sections.

3 Cheat-Patterns to Consider

Some basic Java programming knowledge must be assumed throughout this paper. The continuous example assignment for this chapter shall be the following task. A method countChar() has to be programmed that counts the occurrences of a specific character c in a given String s (not case-sensitive). The following example calls are provided for a better understanding of the intended functionality.

– countChar('a', "Abc") → 1
– countChar('A', "abc") → 1
– countChar('x', "ABC") → 0
– countChar('!', "!!!") → 3

A reference solution for our "count chars in a string" problem might be the following implementation of countChar().

Listing 1.1. Reference solution (continuous example).

    int countChar(char c, String s) {
        s = s.toLowerCase();
        c = Character.toLowerCase(c);
        int i = 0;
        for (char x : s.toCharArray()) {
            if (x == c) i++;
        }
        return i;
    }

According to our data, most students strive to find a solution that fits the scope and intent of the assignment (see Table 3 and Fig. 3). However, in the systematization phase, a minority of students (approximately 15%) made use of the fact that a "dumb automat" grades. Accordingly, we observed the following cheating patterns that differ significantly from the intended reference solution above (see Fig. 3):

– Overfitting solutions (63%)
– Problem evasion (30%)
– Redirection to reference solutions (6%)
– Injection (1%)

Table 3. Detected cheats.

Fig. 3. Observed cheat-pattern frequency (without application of JEdUnit) [11].

Especially overfitting and evasion tricks are "poor man's weapons" often used by novice programmers as a last resort to solve a problem. The much more alarming redirection and injection cheats occurred only in rare cases (less than 10%). However, what do these tricks and cheats look like? How severe are they? Moreover, what can be done against them? We will investigate these questions in the following paragraphs.

3.1 Overfitting Tricks

Overfitted solutions strive to get a maximum of points for grading but do not strive to solve the given problem in a general way. A notable example of an overfitted solution would be Listing 1.2.

Listing 1.2. Overfitting solution.

    int countChar(char c, String s) {
        if (c == 'a' && s.equals("Abc")) return 1;
        if (c == 'A' && s.equals("abc")) return 1;
        if (c == 'x' && s.equals("ABC")) return 0;
        if (c == '!' && s.equals("!!!")) return 3;
        // [...]
        if (c == 'x' && s.equals("X")) return 1;
        return 42;
    }

This solution merely maps the example input parameters to the expected output values. The solution is completely useless outside the scope of the test cases.

3.2 Problem Evasion Tricks

Another trick pattern is to evade a given problem statement. According to our experience, this pattern occurs mainly in the context of more sophisticated and formal programming techniques like recursive programming or functional programming styles with lambda functions. So, let us now assume that the assignment is still to implement a countChar() method, but this method should be implemented recursively. A reference solution might look like Listing 1.3 (we do not consider performance aspects such as tail recursion):

Listing 1.3. Recursive reference solution.

    int countChar(char c, String s) {
        s = s.toLowerCase();
        c = Character.toLowerCase(c);
        if (s.isEmpty()) return 0;
        char head = s.charAt(0);
        String rest = s.substring(1);
        int n = head == c ? 1 : 0;
        return n + countChar(c, rest);
    }


However, student submissions sometimes only pretend to be recursive without actually being recursive. Listing 1.4 is a notable example.

Listing 1.4. Problem evading solution.

    int countChar(char c, String s) {
        if (s.isEmpty()) return 0;
        return countChar(c, s, 0);
    }

    int countChar(char c, String s, int i) {
        for (char x : s.toCharArray()) {
            if (x == c) i++;
        }
        return i;
    }

Although countChar() calls (an overloaded version of) countChar(), which looks recursive, the overloaded version of countChar() makes use of a for-loop and is therefore implemented in a fully imperative style. The same pattern can be observed if an assignment requests functional programming with lambda functions. A lambda-based solution could look like Listing 1.5.

Listing 1.5. Lambda reference solution.

    (c, s) -> Stream.of(s.toCharArray())
                    .filter(x -> x == c)
                    .count();

However, students take refuge in familiar programming concepts like loops. Very often, submissions like the one in Listing 1.6 are observable:

Listing 1.6. Lambda problem evasion.

    (c, s) -> {
        int i = 0;
        for (char x : s.toCharArray()) {
            if (x == c) i++;
        }
        return i;
    };

The (c, s) -> { [...] }; construct seems functional at first glance. But if we look at the implementation, it is only an imperative for loop embedded in a functional-looking context. The problem here is that evaluation components that only check input-output parameter correctness will not detect these kinds of programming style evasions. The merely recursive- or functional-looking solutions will generate correct results. Nevertheless, the intent of such assignments is not just to foster correct solutions but also to train specific styles of programming.

3.3 Redirection Cheats

Another shortcoming of APAAS solutions can be compiler error messages that reveal details of the evaluation logic. In the case of VPL, an evaluation is processed according to the following steps:

1. The submission is compiled and linked to the evaluation logic.
2. The compiled result is executed to run the checks.
3. The check results are printed in an APAAS-specific notation on the console (standard-out).
4. This output is interpreted by the APAAS solution to run the automatic grading and present a kind of feedback to the submitter.

This process is straightforward and provides the benefit that evaluation components can handle almost all programming languages. If one of the steps fails, an error message is generated and returned to the submitter as feedback. This typically involves returning the compiler error message. That can be problematic because these compiler error messages may provide unexpected cheating opportunities. Let us remember: the assignment was to program a method countChar(). Let us further assume that a student makes a small spelling error and names the method countCharS() instead of countChar() – so just a trailing s is added to the method name. That is a general programming error that happens fast (see Listing 1.7).

Listing 1.7. A slightly misspelled submission.

    int countCharS(char c, String s) {
        int i = 0;
        for (char x : s.toCharArray()) {
            if (x == c) i++;
        }
        return i;
    }

If this submission is submitted and evaluated by an APAAS solution, it will likely not pass the first compile step due to the simple spelling error. What is returned is a list of compiler error messages like this one:

    Checks.java:40: error: cannot find symbol
        Submission.countChar(...)

Such feedback reveals internals of the evaluation logic to the submitter: the checks reside in a compilation unit Checks.java, the submitted code is addressed via a class Submission, and the reported source line can expose further details such as the class that provides the reference solution used for testing. A student who notices this does not have to solve the problem at all anymore – the submission can simply redirect every call to the revealed reference implementation and let it do the work.
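A minimal sketch of such a redirection submission is shown below (an illustration under the assumption, consistent with Sect. 4.3, that the reference logic is reachable via a class named Solution; the concrete class name depends on the evaluation setup):

    int countChar(char c, String s) {
        // Solves nothing itself: it merely delegates to the revealed reference logic.
        return Solution.countChar(c, s);
    }

Because such a submission is, by construction, behaviorally identical to the reference solution, black-box tests that only compare input-output behavior grade it with full points.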

3.4 Injection Cheats

A further attack vector results from the way the evaluation logic communicates with the grading component. In the case of VPL, grading information is written to the console output using special markers:

– Grade :=>> to give points
– Comment :=>> for hints and remarks that should be presented to the submitter as feedback

VPL assumes that students are not aware of this knowledge. It is furthermore (somehow inherently) assumed that student submissions do not write to the console (just the evaluation logic should do that) – but it is possible for submissions to place arbitrary output on the console, and this is not prohibited by the Jails server. So, these assumptions are a fragile defence. A quick internet search with the search terms "grade VPL" will turn up the documentation of VPL explaining how the grading component works under the hood. So, submissions like Listing 1.9 are possible and executable.

Listing 1.9. Injection submission.

    int countChar(char c, String s) {
        System.out.print("Grade :=>> 100");
        System.exit(0);
        return 0; // for compiler silence
    }

The intent of such a submission is merely to inject a line like "Grade :=>> 100" into the output stream to let the grading component evaluate the submission with full points.
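To illustrate why this attack works only as long as the submission and the evaluation logic share the same output stream, the following self-contained sketch combines the two countermeasures discussed later in Sect. 4.4 and Sect. 5.3: vetoing System.exit() via a SecurityManager and redirecting System.out before any submission code runs. Class and variable names here are illustrative assumptions, not JEdUnit's actual implementation, and installing a SecurityManager requires a JVM that still supports doing so.

    import java.io.ByteArrayOutputStream;
    import java.io.PrintStream;

    public class SeparationSketch {

        // Vetoes System.exit() so a submission cannot abort the evaluation run.
        static class NoExitManager extends SecurityManager {
            @Override public void checkExit(int status) {
                throw new SecurityException("Submissions must not call System.exit()");
            }
            @Override public void checkPermission(java.security.Permission perm) {
                // allow everything else in this simplified sketch
            }
        }

        public static void main(String[] args) {
            PrintStream gradingStream = System.out;                 // the stream VPL interprets
            ByteArrayOutputStream sandboxed = new ByteArrayOutputStream();
            System.setOut(new PrintStream(sandboxed));              // submissions now write here
            System.setSecurityManager(new NoExitManager());

            try {
                // Simulated malicious submission output (cf. Listing 1.9):
                System.out.print("Grade :=>> 100");
                System.exit(0);
            } catch (SecurityException e) {
                // the injected exit is vetoed; evaluation continues
            }

            System.setOut(gradingStream);                           // only the evaluator talks to VPL
            gradingStream.println("Comment :=>> Captured submission output: " + sandboxed);
            gradingStream.println("Grade :=>> 0");
        }
    }

The injected "Grade :=>> 100" line ends up in the captured buffer and never reaches the stream that the grading component interprets.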

4 Mitigation Analysis of Cheat Patterns

So, in our problem systematization phase, we identified four patterns of tricking or cheating that can be observed in regular programming classes. These tricks work because students know that a dumb automat checks their submissions. In the following Sects. 4.1, 4.2, 4.3, and 4.4, we ask what can be done to make APAAS more "intelligent" and prevent this kind of cheating.

4.1 What Can Be Done to Prevent Overfitting?

Randomized test data make overfitted submissions ineffective. Therefore, our general recommendation is to give a substantial fraction of points for randomized test cases. However, to provide some control over randomized tests, these tests must be pattern-based in order to trigger expectable problems (e.g., off-by-one errors, boundary cases) in student submissions. We refer to [17] for further details. For string-based data, for example, we gained promising results by generating random strings merely from inversely applied regular expressions [22]. Section 5.1 explains how randomization is used by JEdUnit to tackle the overfitting problem.
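As a minimal illustration of this idea (an assumption-based sketch, not JEdUnit's actual generator or test API), the following self-contained program draws random inputs for the continuous countChar() example and shows that an overfitted submission in the spirit of Listing 1.2 stops earning points as soon as the inputs are no longer the published example calls:

    import java.util.Random;
    import java.util.function.BiFunction;

    public class RandomizedCheckSketch {

        static final Random RND = new Random();

        // Draws a random string over a small alphabet; length 0 covers the empty-string boundary case.
        static String randomString(int maxLen) {
            String alphabet = "abcABC!xX ";
            StringBuilder sb = new StringBuilder();
            int len = RND.nextInt(maxLen + 1);
            for (int i = 0; i < len; i++) {
                sb.append(alphabet.charAt(RND.nextInt(alphabet.length())));
            }
            return sb.toString();
        }

        // Reference behavior of countChar() as defined in Listing 1.1.
        static int reference(char c, String s) {
            int n = 0;
            for (char x : s.toLowerCase().toCharArray()) {
                if (x == Character.toLowerCase(c)) n++;
            }
            return n;
        }

        public static void main(String[] args) {
            // Overfitted "solution" that only knows one of the published example calls.
            BiFunction<Character, String, Integer> overfitted =
                    (c, s) -> (c == 'a' && s.equals("Abc")) ? 1 : 42;

            int passed = 0, total = 100;
            for (int i = 0; i < total; i++) {
                char c = "aAxX!".charAt(RND.nextInt(5));
                String s = randomString(10);
                if (overfitted.apply(c, s) == reference(c, s)) passed++;
            }
            System.out.println("Overfitted submission passed " + passed + " of " + total + " randomized checks");
        }
    }

JEdUnit itself provides pattern-based generators for this purpose (see Table 5), so randomized inputs can still be steered towards boundary cases such as empty strings or repeated characters.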

4.2 What Can Be Done to Avoid Problem Evasion?

Problem evasion cannot be detected by comparing the equality of input-output results (black-box testing). To mitigate problem evasion, we need automated code inspection approaches (white-box inspections). Submissions must be scanned for unintended usage of language concepts like for and while loops. However, this makes it necessary to apply parsers and makes the assignment-specific evaluation logic much more complicated and time-intensive to program. To simplify this work, we propose a selector-based model that selects nodes from the abstract syntax tree (AST) of a compilation unit to detect and annotate such violations in a practical way. The approach works similarly to CSS selectors selecting parts of a DOM tree in a web context (see Sect. 5.2).
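The following sketch illustrates the underlying idea with the JavaParser library [18]; it is not JEdUnit's selector API but a plain JavaParser illustration of the same white-box inspection, and the class and message names are assumptions. It parses a submission into an AST, flags loop constructs that would indicate an evaded recursion or lambda assignment, and – using the same mechanism – flags calls into a reference class, which is exactly the redirection scan discussed in Sect. 4.3:

    import com.github.javaparser.StaticJavaParser;
    import com.github.javaparser.ast.CompilationUnit;
    import com.github.javaparser.ast.expr.MethodCallExpr;
    import com.github.javaparser.ast.stmt.ForEachStmt;
    import com.github.javaparser.ast.stmt.ForStmt;
    import com.github.javaparser.ast.stmt.WhileStmt;

    public class InspectionSketch {
        public static void main(String[] args) {
            String submission =
                "class Main {\n" +
                "  static int countChar(char c, String s) {\n" +
                "    int i = 0;\n" +
                "    for (char x : s.toCharArray()) { if (x == c) i++; }\n" +
                "    return i;\n" +
                "  }\n" +
                "  static int countCharCheat(char c, String s) {\n" +
                "    return Solution.countChar(c, s);\n" +
                "  }\n" +
                "}\n";

            CompilationUnit ast = StaticJavaParser.parse(submission);

            // Evasion scan: loop constructs are not allowed in a recursion/lambda assignment.
            int loops = ast.findAll(ForStmt.class).size()
                      + ast.findAll(ForEachStmt.class).size()
                      + ast.findAll(WhileStmt.class).size();
            if (loops > 0) {
                System.out.println("Violation: " + loops + " loop construct(s) found although "
                    + "an iterative solution is not allowed for this assignment.");
            }

            // Redirection scan (Sect. 4.3): calls into the reference class.
            ast.findAll(MethodCallExpr.class).stream()
               .filter(call -> call.getScope()
                                   .map(scope -> scope.toString().equals("Solution"))
                                   .orElse(false))
               .forEach(call -> System.out.println("Violation: call into reference logic: " + call));
        }
    }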

4.3 What Can Be Done to Prevent Redirection?

Interestingly, problem evasion and redirection can be mitigated by the same approach. Similar to evasion cheats, submissions can be scanned for unintended usage of language concepts, e.g., calls to classes containing the reference logic that is used for testing. This white-box inspection approach makes it possible to scan the submission for questionable calls like Solution.x() calls. Additionally, we deny the use of getClass() calls and the import of the reflection package. Both would enable the formulation of arbitrary indirections. The techniques necessary to deny specific method calls and to deny the import of packages will be explained in Sect. 5.2.

4.4 What Can Be Done to Avoid Injection Attacks?

In a perfect world, the submission would be executed in a context that by design cannot access the grading logic. The student logic would be code that deserializes input parameters from stdin, passes them to the submitted function, and serializes the output to stdout. The grading logic would serialize parameters, pipe them into the wrapped student code, deserialize the stdout, and compare it with the reference function's output. However, this approach would prevent the use of common unit testing frameworks for evaluation, although it would effectively separate the submission logic and the evaluation logic into two different processes (which would make most of the attack vectors in this setting ineffective). However, to the best of the author's knowledge, no unit testing framework exists that separates the test logic from the to-be-tested logic in different processes. In the case of VPL, the shared use of the stdout (System.out) stream is a given. APAAS systems that separate the submission logic's stdout stream from the evaluation logic's stdout stream might be less or not at all prone to injection attacks.

However, even for VPL, there are several options to handle this problem. E.g., we can prohibit the use of the System.exit() call to assure that submissions can never stop the evaluation execution on their own. This prohibition can be realized using a Java SecurityManager – it is likely to be more complicated for other languages that do not provide a virtual machine with a built-in security concept. For these systems, parser-based solutions (see Sect. 3.2) would be a viable option (see Sect. 5.2). A very effective way to separate the stdout/stderr streams is to redirect these console streams to new streams that the submission logic is unaware of. This redirection is an astonishingly simple solution for the most severe identified problem. It will be explained in Sect. 5.3.

Table 4. Mapping of presented JEdUnit features to cheat patterns.

Randomization

Code Inspection

Separation

Overfitting (63%) ← Evasion

(30%)

Redirection (6%) Injection

(1%)

← ← ←

14

5

N. Kratzke

JEdUnit

The Sects. 4.1, 4.2, 4.3, and 4.4 showed that it is possible to mitigate identified cheat patterns using the strategies listed in Table 4. These insights flowed into a unit testing framework called JEdUnit. JEdUnit has a specific focus on educational aspects and strives to simplify automatic evaluation of (small) Java programming assignments using Moodle and VPL. The framework has been mainly developed for our purposes in programming classes at the L¨ ubeck University of Applied Sciences. However, this framework might be helpful for other programming instructors. Therefore, it is provided as open source. Every JEdUnit evaluation is expressed in a Checks.java compilation unit and usually relies on a reference implementation (which is by convention provided in a file called Solution.java) and a submission (which is by convention provided in a file called Main.java). However, the conventions can be adapted to assignment specific testing requirements. Listing 1.10. Reference Solution expressed in JEdUnit. public class Solution { public int countChars ( char c , String s ) { int n = 0; for ( char x : s . toLowerCase (). toCharArray ()) { if ( Character . toLowerCase ( c ) == x ) n ++; } return n ; } }

Similar to JUnit, each test case is annotated with a @Test annotation (see Listing 1.11). However, a JEdUnit @Test annotation takes two additional parameters:

– weight is a value between 0 and 1.0 that indicates how much this test case contributes to the final grade of the assignment.
– description is a string briefly explaining the intent of the test case.

Listing 1.11. Test template expressed in JEdUnit.

    public class Checks extends Constraints {

        @Test(weight = 0.25, description = "Example calls")
        public void test_01_exampleCalls() { [...] }

        @Test(weight = 0.25, description = "Boundary tests")
        public void test_02_boundary_cases() { [...] }

        @Test(weight = 0.5, description = "Randomized tests")
        public void test_03_randomized_cases() { [...] }
    }


A test case usually runs several test data tuples against the submitted solution and compares the result with the reference solution. A test data tuple can be created using the t() method. Listing 1.12 shows this for the example calls of our continuous example.

Listing 1.12. Example test case expressed in JEdUnit.

    @Test(weight = 0.25, description = "Example calls")
    public void test_01_exampleCalls() {
        test(
            t('a', "ABC"), t('A', "abc"), t('X', "abc"), t('x', "XxY")
        ).each(
            // check
            d -> assertEquals(Solution.countChars(d._1, d._2), Main.countChars(d._1, d._2)),
            // explain
            d -> f("countChars(%s, %s) should return %s", d._1, d._2, Solution.countChars(d._1, d._2)),
            // on error
            d -> f("but returned %s", Main.countChars(d._1, d._2))
        );
    }

The each() method takes three parameters to run the evaluation on the test data provided as tuples in the test() method:

– A Predicate that checks whether the submitted solution returns the same result as the reference solution (the correctness check, marked as // check in Listing 1.12).
– A Function that explains the method call and reports the expected result (the expected behavior, marked as // explain in Listing 1.12).
– A Function that reports the actual result if the check predicate evaluates to false (marked as // on error in Listing 1.12).

These functions are used to provide meaningful feedback for students. To make this straightforward, JEdUnit provides the format method f(). f() is merely a convenience wrapper for the String.format() method that adds extra formatting for feedback outputs that often confuse students. For example, f() indicates non-printable characters like spaces, tabulators, and carriage returns by visible placeholder characters. Additionally, f() provides readable representations of maps, lists, and other data structures (and more).
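As an illustration of this formatting idea (not JEdUnit's actual implementation; the class name Feedback and the chosen placeholder characters are assumptions), such a helper could wrap String.format() and make whitespace in string arguments visible:

    import java.util.Arrays;

    // Hypothetical sketch in the spirit of f(); not JEdUnit's real code.
    final class Feedback {

        static String f(String fmt, Object... args) {
            // Make whitespace in string arguments visible, then delegate to String.format().
            Object[] visible = Arrays.stream(args).map(Feedback::show).toArray();
            return String.format(fmt, visible);
        }

        private static Object show(Object arg) {
            if (!(arg instanceof String)) return arg;           // only strings are rewritten
            return ((String) arg).replace(" ", "\u2423")        // ␣ marks a space
                                 .replace("\t", "\u21E5")       // ⇥ marks a tabulator
                                 .replace("\n", "\u21B5");      // ↵ marks a line break
        }
    }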

Table 5. Random generators provided by JEdUnit.

  Method                               Description
  c()                                  Random character ([a-zA-Z])
  c(String regexp)                     Random character from a regular expression (first char)
  s(String... regexps)                 Random string from a sequence of regular expressions
  s(int min, int max)                  Random string between a minimum and maximum length
  b()                                  Random boolean value
  i()                                  Random integer value
  i(int m)                             Random integer value [0, m]
  i(int l, int u)                      Random integer value [l, u]
  d()                                  Random double value
  d(double m)                          Random double value [0, m[
  d(double l, double u)                Random double value [l, u[
  List l(int l, Supplier g)            Random list with length l generated by g
  List l(int l, int u, Supplier g)     Random list with length in the range of [l, u] generated by g

The reader is referred to the Wiki (https://github.com/nkratzke/JEdUnit/wiki) for a more detailed introduction to JEdUnit and further features like:

– Initializing assignments.
– Configuration of checkstyle (coding conventions).
– Making use of predefined code inspections.
– Checking complex object-oriented class structures automatically.

However, in the following Sects. 5.1, 5.2, and 5.3 we focus mainly on how JEdUnit makes use of randomization, code inspection, and stream separation to mitigate the observed overfitting, problem evasion, redirection, and injection cheating problems (see Table 4).

5.1 How Does JEdUnit Support Randomization?

JEdUnit provides a set of random generators to mitigate overfitting problems. These random generators make it possible to generate randomized test data in specified ranges and according to patterns, in order to explicitly test common problem cases like boundary cases, off-by-one errors, empty data structures, and more. Because these generators are frequently used, they intentionally have short names (see Table 5). Using this set of random generators, we can quickly create test data – for instance a random list of predefined terms.

Listing 1.13. Demonstration of a randomized list generator creating a list of strings from a regular expression.

    List<String> words = l(2, 5, () -> s("This|is|just|a|silly|example"));

The randomly generated lists have a length between two and five entries. Possible resulting lists would be:

– ["This", "a"]
– ["silly", "is", "This"]
– ["example", "example", "a", "a"]
– ["This", "is", "just", "a", "example"]

The generators shown in Table 5 are designed to work seamlessly with the test().each() pattern introduced in Listing 1.12. Listing 1.14 exemplifies a randomized test case for our continuous example. It works with random strings but generates test data that intentionally covers cases where the character to be counted is placed at the front, in the middle, or at the end of a string, to catch frequent programming errors like off-by-one errors.

Listing 1.14. Example for a randomized test case expressed in JEdUnit.

    @Test(weight = 0.5, description = "Randomized tests")
    public void test_03_randomized_cases() {
        // Regexp to generate random strings
        String r = "[a-zA-Z]{5,17}";
        // Pick a random character to search
        char c = c();
        test(
            t(c, s(c + "{1,7}", r, r)), // First position
            t(c, s(r, r, c + "{1,7}")), // Last position
            t(c, s(r, c + "{1,7}", r))  // Middle position
        ).each(check, explain, onError); // check, explain, onError defined as in Listing 1.12
    }

Such test cases are not prone to overfitting because the test data is randomly generated for every evaluation. Possible generated feedback for the test case shown in Listing 1.14 would look like this:


– [OK] countChars(‘j’, "jUcEzCzODWWN") should return 1
– [FAILED] countChars(‘j’, "zOdAqavJJkxjvrjj") should return 5 but returned 3
– [OK] countChars(‘j’, "SPAqlORwxjjjjRHIKCCWS") should return 4

Each evaluation will be run with random data but according to comparable patterns. So, JEdUnit provides both: comparable evaluations and random test cases that are not prone to overfitting.

5.2 How Does JEdUnit Support Code Inspection?

JEdUnit integrates the JavaParser [18] library to parse Java source code into an abstract syntax tree. JEdUnit tests have full access to the JavaParser library and can perform arbitrary checks with this parser. However, this can quickly become very complex. Therefore, JEdUnit tests can make use of a selector-based model that selects nodes from the abstract syntax tree (AST) of a compilation unit in order to detect and annotate such violations in a practical way. The approach works similarly to CSS selectors selecting parts of a DOM tree in a web context. The following examples demonstrate how to use this selector model on submission ASTs pragmatically for code inspections. The reader should recall the two evasion examples shown in Listings 1.4 and 1.6 that show typical evasion patterns to avoid recursive or lambda/stream-based programming in Java. To detect lambda evasion (see Listing 1.6), we can add the following inspection to a test case. It scans for lambda functions that make use of block statements in their definition. Using blocks in lambda functions might indicate a kind of "problem evasion" in a submission – a student may try to evade from lambda programming into simpler statement-based programming (which is likely not the intent of the assignment).

Listing 1.15. Example for a lambda evasion check in JEdUnit.

    @Inspection(description = "Lambda evasion inspection")
    public void assignmentInspections() {
        penalize(25, "No blocks in lambdas.", () ->
            inspect("Main.java", ast -> ast
                .select(LAMBDA)
                .select(BLOCK)
                .annotate("No blocks in lambdas.")
                .exists()
            )
        );
    }

So, this inspection would effectively detect submissions like the one already presented in Listing 1.6. To detect recursion evasion (see Listing 1.4) we simply have to search for methods outside the main() method that make use of loop statements.

Table 6. Predefined code inspections provided by JEdUnit.

– CHECK_IMPORTS (default: True): Checks if only cleared libraries are imported. java.util.* is allowed per default.
– CHECK_COLLECTION_INTERFACES (default: True): Checks if collections are accessed via their interfaces only (e.g. List instead of LinkedList is used in method parameters, method return types, variable and datafield declarators). Using concrete collection classes like ArrayList instead of their interface List is penalized by default; the same applies for Map. This check can be deactivated.
– ALLOW_LOOPS (default: True): for, while, do-while, and forEach() loops are allowed per default. This can be deactivated and penalized, e.g., in case methods must be implemented recursively.
– ALLOW_METHODS (default: True): Methods are allowed per default. Can be deactivated and penalized in special cases, e.g., to enforce the usage of lambda functions.
– ALLOW_LAMBDAS (default: True): Lambdas are allowed per default. Can be deactivated and penalized in special cases, e.g., to enforce the usage of methods.
– ALLOW_INNER_CLASSES (default: False): Inner classes are penalized by default. This check can be deactivated if inner classes shall be allowed.
– ALLOW_DATAFIELDS (default: False): Checks if datafields (global variables) are used. This is penalized by default. However, this check must be deactivated for object-oriented contexts.
– ALLOW_CONSOLE_OUTPUT (default: False): By default, System.out.print() statements are not allowed and are penalized outside the scope of the main() method. This check can be deactivated.

Listing 1.16. Example to detect recursion evasion.

    inspect(cu, ast -> ast
        .select(METHOD, "[name!=main]")
        .select(FOR, FOREACH, WHILE, DOWHILE)
        .annotate("no loops allowed")
        .exists()
    );

The same technique can be used to detect submissions that make use of System.out.println() or System.exit() calls that might indicate injection attacks.

Listing 1.17. Example to detect suspect method calls (used internally by JEdUnit).

    inspect(cu, ast -> ast
        .select(METHODCALL)
        .filter(c -> c.toString().startsWith("System.exit"))
        .annotate(c -> "[CHEAT] Forbidden call: " + c)
        .exists()
    );
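For orientation, the following self-contained sketch shows roughly what such a check amounts to when written directly against the JavaParser API rather than JEdUnit's selector model. The class name ForbiddenCallScanner is made up for this example, and StaticJavaParser is available in recent JavaParser releases:

    import com.github.javaparser.StaticJavaParser;
    import com.github.javaparser.ast.CompilationUnit;
    import com.github.javaparser.ast.expr.MethodCallExpr;

    import java.io.IOException;
    import java.nio.file.Paths;

    // Hypothetical class; illustrates the underlying JavaParser calls only.
    public class ForbiddenCallScanner {

        public static void main(String[] args) throws IOException {
            // Parse the submission into an abstract syntax tree.
            CompilationUnit cu = StaticJavaParser.parse(Paths.get("Main.java"));

            // Report every method call whose textual form starts with "System.exit".
            cu.findAll(MethodCallExpr.class).stream()
              .filter(call -> call.toString().startsWith("System.exit"))
              .forEach(call -> System.out.println("[CHEAT] Forbidden call: " + call));
        }
    }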

The reader may notice that these selector-based code inspections are quite powerful and flexible for formalizing and detecting arbitrary violation patterns in student code submissions. JEdUnit makes intensive use of this code inspection technique and provides several predefined code inspections that can be activated by config options. Table 6 lists some of these predefined inspections.

5.3 How Does JEdUnit Support Separation of Evaluation and Submission Logic?

The main problem with injection attacks is that the submission logic and the evaluation logic make use of the same console streams (stdout/stderr). The grading component interprets this console output, and this output could be compromised by the student submission's output (intentionally or unintentionally). JEdUnit solves this problem simply by redirecting the stdout/stderr console streams to other streams. Methods like the following perform this redirection.

Listing 1.18. Redirection of stdout console stream.

    /**
     * Creates a file called console.log that stores all
     * console output generated by the submitted logic.
     * Used to isolate the evaluation output from the
     * submitted logic output to prevent injection attacks.
     * @since 0.2.1
     */
    public void redirectStdOut() throws Exception {
        this.redirected = new PrintStream(Config.STD_OUT_REDIRECTION);
        System.setOut(this.redirected);
    }


The stdout/stderr streams are switched whenever a submission logic is called and switched back when the submission logic returns. Because the submission logic has no access to the evaluation library logic of JEdUnit, it cannot figure out the current state of the redirection and is therefore not capable of reversing it. In consequence, JEdUnit generates two pairs of stdout/stderr streams: one for the evaluation logic that is passed to the VPL grading component, and one for the submission logic that could be presented as feedback to the user for debugging purposes. Currently, however, JEdUnit simply ignores the submission output streams. Figure 4 shows how JEdUnit changes the control flow (right-hand side) in comparison to the standard VPL approach (left-hand side). This stream redirection effectively separates the submission logic streams from the evaluation logic streams, and no stream injections can occur.
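A minimal sketch of this switching discipline (an illustration under assumptions, not JEdUnit's actual code; the class name StreamGuard is made up) could look as follows:

    import java.io.ByteArrayOutputStream;
    import java.io.PrintStream;
    import java.util.function.Supplier;

    // Hypothetical helper; shows the switch/restore pattern around a submission call.
    final class StreamGuard {

        static <T> T callIsolated(Supplier<T> submissionCall) {
            PrintStream evaluatorOut = System.out;                  // remember the evaluator's stream
            ByteArrayOutputStream submissionLog = new ByteArrayOutputStream();
            System.setOut(new PrintStream(submissionLog));          // switch before calling the submission
            try {
                return submissionCall.get();                        // run the submitted logic
            } finally {
                System.setOut(evaluatorOut);                        // always switch back afterwards
                // submissionLog.toString() could be shown to students for debugging purposes
            }
        }
    }

A call such as StreamGuard.callIsolated(() -> Main.countChars('a', "ABC")) would then run the submitted method (following the paper's Main.java convention) while its console output is captured separately from the evaluation output.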

Fig. 4. Isolation of console streams in JEdUnit to prevent injection.

Additionally, JEdUnit prevents student submissions from stopping their execution via System.exit() or similar calls to bypass the control flow of the evaluation logic. This is done by prohibiting calls to System.exit() and use of the reflection API (which would enable arbitrary calling indirections that code inspections could not identify), using the code inspection means already explained in Sect. 5.2.

6 Discussion of Results and Threats to Validity

If we compare the number of detected cheats in the systematization phase with the validation phase (see Table 3 in Sect. 3), we see the impact of applying solutions like JEdUnit. In the systematization phase, cheat detection was



done manually and without notice to the students. In the validation phase, cheat detection was done automatically by JEdUnit and reported immediately as JEdUnit evaluation feedback to the students. The consequence was that cheating occurred only in the first weeks. What is more, the cheating detected in the validation phase was only observed in intermediate outcomes and not in the final submissions (while the cheating reported in the systematization phase is cheating that made it into the final submissions). So, we can conclude that JEdUnit effectively detects the four identified patterns of cheating (overfitting, evasion, redirection, and injection). Cheating was not even attempted in intermediate results and came entirely to an end in the second half of the semester. It was no longer an effective means from the students' perspective. However, we should consider and discuss the following threats to internal and external validity [4] that apply to our study and that might limit its conclusions.

Selection Bias and Maturation. We should consider that the target audiences of the systematization and validation phases differed a bit. In the systematization phase, we worked with first-semester novice programmers of a computer science and information technology and design study programme. In the validation phase, we worked with second-semester computer science students who had made some programming progress during their first semester. This increase in expertise might be one reason why a general decrease in cheating was observable in the second semester. However, that does not explain the fact that cheating came entirely to an end in the second half of the semester. This decrease might only be explained by the fact that students learned that the grading automaton "was not so dumb" anymore.

Contextual Factors. This threat occurs due to specific conditions under which research is conducted that might limit its generalisability. In this case, we were bound to a Moodle-based APAAS solution. The study would not have been possible outside this technical scope. We decided to work with VPL because it is the only mature-enough open source solution for Moodle. Therefore, the study should not be taken to conclude on all existing APAAS systems. Especially JEdUnit is a Moodle/VPL-specific outcome. However, it seems worthwhile to check whether existing APAAS solutions are aware of the four identified cheat patterns (or attack vectors from a system security perspective) and what can be transferred from JEdUnit into further APAAS solutions.

Hawthorne Effects. This threat occurs due to participants' reactions to being studied or tested, which alters their behaviour and therefore the study results. We can observe the Hawthorne effect quite clearly in Table 3. The reader should compare the systematization phase (unaware of observation) and the validation phase (noticing a more clever grading automaton capable of detecting cheats). In the validation phase, cheating was drastically reduced, only occurred in intermediate outcomes, and even came entirely to an end in the second half of the semester.


JEdUnit made the grading component more "clever" and changed the behaviour of the students, who stopped cheating. Because of this internal feedback loop, the study should not be taken to draw any conclusions on the quantitative aspects of cheating. Furthermore, the reader should additionally take the complete threats-to-validity discussion of the systematization phase [11] into account in order to avoid drawing wrong conclusions.

7 Conclusion

Students trick – at least 15% of them. Maybe because the grading components of automated programming assessment systems can be tricked very easily. Even first-year students are clever enough to do this. We identified recurring patterns like overfitting, evasion, redirection, and even injection tricks. Most APAAS solutions provide sandbox mechanisms and code plagiarism detection to protect the host against hostile code or to detect "copy-paste" cheating. However, these measures do not prevent submissions like

    System.out.println("Grade :=>> 100"); System.exit(0);

which would give full points (in a Moodle/VPL setting) regardless of the assignment complexity. These two lines (or the injection idea behind them) are like an "atomic bomb" for a grading component, and one that many programming instructors are unaware of. Current APAAS solutions can do little against it. This study aimed to systematize such recurring programming student cheat patterns and to search for mitigation options. To handle these kinds of cheats we need not only sandboxing and code plagiarism detection (which almost all APAAS solutions provide) but additionally means

1. to randomise test cases (addresses 63% of all cheats, see Table 4),
2. to provide pragmatic code inspection techniques (addresses 36% of all cheats, see Table 4),
3. and to isolate the submission and the evaluation logic consequently in separate processes (addresses 1% of all cheats, see Table 4).

Therefore, this paper presented and evaluated the solution proposal JEdUnit that handles these problems. According to our validation in three courses with over 260 students and more than 3,600 submissions of programming assignments, JEdUnit can make grading components much more "intelligent". We showed how to make overfitting inefficient, how to detect evasion and redirection, and how to deny injection cheat patterns. When students learn that the grading component is no longer a dumb automaton, cheating decreases immediately. In our case, cheating even came entirely to an end. However, as far as the author can survey the APAAS landscape, exactly these features are only incompletely provided by current APAAS solutions. Most APAAS solutions focus on sandboxing and code plagiarism detection but


overlook the cheating cleverness of students. JEdUnit is not perfect either, but it focuses on this cheating cleverness and is meant as a fortifying add-on to existing APAAS solutions like Moodle/VPL. However, its biggest shortcoming might be its current technological dependence on Moodle/VPL and Java. Further research should focus on how the JEdUnit lessons learned can be transferred into further APAAS solutions. To do this, we have to ask how and which of the JEdUnit principles can be made programming language and APAAS solution agnostic. The current working state of JEdUnit can be inspected on GitHub (https://github.com/nkratzke/JEdUnit) to foster the adaption of the ideas and concepts behind it.

References

1. Ala-Mutka, K.M.: A survey of automated assessment approaches for programming assignments. Comput. Sci. Educ. 15(2), 83–102 (2005). https://doi.org/10.1080/08993400500150747
2. Alraimi, K.M., Zo, H., Ciganek, A.P.: Understanding the MOOCs continuance: the role of openness and reputation. Comput. Educ. 80, 28–38 (2015). https://doi.org/10.1016/j.compedu.2014.08.006
3. Caiza, J.C., Alamo Ramiro, J.M.d.: Automatic grading: review of tools and implementations. In: Proceedings of 7th International Technology, Education and Development Conference (INTED2013) (2013)
4. Campbell, D.T., Stanley, J.C.: Experimental and Quasi-experimental Designs for Research. Houghton Mifflin Company, Boston (2003). Reprint
5. Douce, C., Livingstone, D., Orwell, J.: Automatic test-based assessment of programming: a review. J. Educ. Resour. Comput. 5(3) (2005). https://doi.org/10.1145/1163405.1163409
6. Gupta, S., Gupta, B.B.: Cross-site scripting (XSS) attacks and defense mechanisms: classification and state-of-the-art. Int. J. Syst. Assurance Eng. Manage. 8(1), 512–530 (2017). https://doi.org/10.1007/s13198-015-0376-0
7. Halfond, W.G.J., Orso, A.: AMNESIA: analysis and monitoring for neutralizing SQL-injection attacks. In: Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering, ASE 2005, pp. 174–183. ACM, New York (2005). https://doi.org/10.1145/1101908.1101935
8. Hunter, J.D.: Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9(3), 90–95 (2007). https://doi.org/10.1109/MCSE.2007.55
9. Ihantola, P., Ahoniemi, T., Karavirta, V., Seppälä, O.: Review of recent systems for automatic assessment of programming assignments. In: Proceedings of the 10th Koli Calling International Conference on Computing Education Research, Koli Calling 2010, pp. 86–93. ACM, New York (2010). https://doi.org/10.1145/1930464.1930480
10. Kluyver, T., et al.: Jupyter notebooks – a publishing format for reproducible computational workflows. In: Loizides, F., Schmidt, B. (eds.) Positioning and Power in Academic Publishing: Players, Agents and Agendas, pp. 87–90. IOS Press (2016)
11. McLaren, B.M., Reilly, R., Zvacek, S., Uhomoibhi, J. (eds.): CSEDU 2018. CCIS, vol. 1022. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-21151-6



12. Oliphant, T.: A Guide to NumPy. Trelgol Publishing (2006)
13. del Pino, J.C.R., Rubio-Royo, E., Hernández-Figueroa, Z.J.: A virtual programming lab for Moodle with automatic assessment and anti-plagiarism features. In: Proceedings of the 2012 International Conference on e-Learning, e-Business, Enterprise Information Systems, and e-Government (2012)
14. Pomerol, J.C., Epelboin, Y., Thoury, C.: What is a MOOC?, Chapter 1, pp. 1–17. Wiley-Blackwell (2015). https://doi.org/10.1002/9781119081364.ch1
15. Ray, D., Ligatti, J.: Defining code-injection attacks. SIGPLAN Not. 47(1), 179–190 (2012). https://doi.org/10.1145/2103621.2103678
16. Rodríguez, J., Rubio-Royo, E., Hernández, Z.: Fighting plagiarism: metrics and methods to measure and find similarities among source code of computer programs in VPL. In: EDULEARN11 Proceedings, 3rd International Conference on Education and New Learning Technologies, IATED, pp. 4339–4346, 4–6 July 2011
17. Romli, R., Mahzan, N., Mahmod, M., Omar, M.: Test data generation approaches for structural testing and automatic programming assessment: a systematic literature review. Adv. Sci. Lett. 23(5), 3984–3989 (2017). https://doi.org/10.1166/asl.2017.8294
18. Smith, N., van Bruggen, D., Tomassetti, F.: JavaParser: Visited. Leanpub (2018)
19. Staubitz, T., Klement, H., Renz, J., Teusner, R., Meinel, C.: Towards practical programming exercises and automated assessment in massive open online courses. In: 2015 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE), pp. 23–30, December 2015. https://doi.org/10.1109/TALE.2015.7386010
20. Su, Z., Wassermann, G.: The essence of command injection attacks in web applications. SIGPLAN Not. 41(1), 372–382 (2006). https://doi.org/10.1145/1111320.1111070
21. Thiébaut, D.: Automatic evaluation of computer programs using Moodle's virtual programming lab (VPL) plug-in. J. Comput. Sci. Coll. 30(6), 145–151 (2015). http://dl.acm.org/citation.cfm?id=2753024.2753053
22. Thompson, K.: Programming techniques: regular expression search algorithm. Commun. ACM 11(6), 419–422 (1968)

Exploring the Affordances of SimReal for Learning Mathematics in Teacher Education: A Socio-Cultural Perspective

Said Hadjerrouit
University of Agder, Kristiansand, Norway
[email protected]

Abstract. SimReal is an innovative technology that has emerged over the last few years in mathematics education. It provides new potentialities for mathematical learning by means of dynamic and interactive visualizations of mathematical concepts. This paper uses SimReal in the context of teacher education to explore the affordances and constraints of the tool for the learning of mathematics. It presents a framework that combines Vygotsky's socio-cultural theory, the role of mediation, and the notion of affordance at a technological, pedagogical, and socio-cultural level. The aim of the article is to explore the extent to which SimReal as a mediating artifact affords students' mathematical learning in teacher education.

Keywords: Affordance · Agency · Mathematical learning · SimReal · Teacher education · Visualization

1 Introduction

SimReal is a visualization tool that is used to teach a wide range of topics, from school to undergraduate mathematics. While the suitability of the tool has been evaluated in university mathematics courses, it has not yet been evaluated in teacher education with an appropriate theoretical framework. Several theoretical approaches can be used to address the use of digital tools in mathematics education, such as the instrumental or documentational approach, the theory of didactical situations, or the anthropological theory of didactics [1]. However, these approaches are not ready-made to assess the affordances of SimReal to support the learning of mathematics and to explore the extent to which the tool is useful in teacher education [2]. The notion of affordance combined with a socio-cultural perspective provides a powerful framework to address the impacts of SimReal by considering the tool as a mediating artefact with affordances and constraints at the technological, pedagogical, and socio-cultural level. This suits well the way mathematics is taught in the socio-cultural context of teacher education.

This paper is an extended version of the article that was published in the proceedings of CSEDU 2019 [3]. This paper has a different title and structure. The sections that have been extended are Introduction, Literature Review, Theoretical Framework, Discussion, and Conclusion and Future Work. The abstract has changed, and research questions


are included. The result section contains more findings and information on affordances and programming issues. Blocks of text are also partly rewritten. This new version is structured as follows. Firstly, the visualization tool SimReal is presented. Secondly, the theoretical framework is described. This is followed by the context of the study, the research questions and methods. The results are then presented and discussed. Finally, a summary of the results, and recommendations for future work conclude the article.

2 Literature Review

2.1 The Notion of Visualization

Mathematics educators have used visualizations for many years [4, 5]. Textbooks and Web sites are filled with pictures, illustrations, animations, diagrams, and graphs. Likewise, graphing calculators and videos have become an integral part of the mathematics classroom. According to Arcavi [6], visualization is defined as the ability to use and reflect upon pictures, graphs, animations, images, and diagrams on paper or with digital tools with the goal of communicating information, thinking about, and advancing understandings. Visualizations include both the process and the product of creation, and reflection upon pictures and images in the mind internally, and on paper or with digital tools externally [6]. Another use of the notion is visual representation, but sometimes no difference is made between mathematical visualizations (pictures, images, and diagrams) and mathematical representations (verbal, graphical, and symbolical). Finally, Presmeg [5] uses similar terms to define the notion of visualization. This includes processes of construction and transformation of both visual mental images and inscriptions (or external representations) of a spatial nature that may be implicated in doing mathematics.

2.2 The Visualization Tool SimReal

SimReal is a visualization tool for a range of mathematical subjects from school to higher education. It uses a combination of a graphic calculator, video lessons, video live streaming, and interactive simulations [7]. An example of SimReal utilization in mathematics education is given in Fig. 1. SimReal has more than 5000 applications, exercises, and tasks [8]. The tool can be divided into small subsets, while keeping the same basic structure and user interface. Hogstad, Ghislain, and Vos [9] investigated a subset of SimReal called Sim2Bil, which provides four windows for visualizations: simulation, graph, formula, and menu window. Some studies on SimReal focus on mathematics at the undergraduate university level [8, 10, 11] and aimed to report on students' views of the tool as a supplement to ordinary teaching and its usefulness in difficult and abstract mathematical areas. Students' views were mostly positive, apart from some technical difficulties, e.g., navigating through the tool. In engineering education, Hogstad, Ghislain, and Vos [9] studied a subset of SimReal called Sim2bil, aiming to explore how students use visualizations in their mathematical activities. Furthermore, Hadjerrouit and Gautestad [13] used the theory of instrumental orchestration and usability criteria to analyze teachers' use of SimReal in an engineering class. Another study addressed the use of SimReal in an upper


Fig. 1. An example of SimReal utilization in mathematics education [3].

secondary school [12]. It reports on positive students' attitudes towards the use of the tool in the classroom. Some students did not find visualizations useful, and the integration of the tool into the curriculum was not easy. Other studies were carried out in teacher education in a technology-based course. Firstly, Hadjerrouit [14] evaluated the suitability of the tool using usability criteria. Secondly, Hadjerrouit [15] addressed students' perceptions of SimReal's impact on learning using a survey and open-ended questions based on three categories of affordances. Finally, Hadjerrouit [3] is a continuation of these two studies and aimed at using SimReal to investigate the affordances of the tool in a more in-depth way by means of specific mathematical tasks – Pythagoras' and the Square theorem – and programming languages to create digital visualizations.

3 Theoretical Framework

3.1 Socio-Cultural Theory and Mediating Artifacts

An important element of Vygotsky's socio-cultural theory [16] is the role of tools in educational settings and the way they mediate teaching and learning activities. A crucial issue is thus the role of mediating artifacts and how teachers and students take advantage of artifacts such as mathematical symbols, textbooks, the blackboard, and in particular digital artifacts such as GeoGebra, Excel, and SimReal. These artifacts have both affordances and constraints. Based on the writings of Vygotsky, the role of mediating artifacts in mathematics education can be represented as follows (Fig. 2). In this paper, the subject is the student, the object is the mathematical concept to be learned, e.g., Pythagoras' theorem, and the mediating artifact is SimReal. Among the several theoretical approaches that can be applied to explore the impacts of SimReal, such as the instrumental approach [17], the notion of affordance combined with Vygotsky's socio-cultural stance provides the most appropriate framework to address the affordances and constraints of SimReal in a teacher education context.



Fig. 2. SimReal as mediating artifact with its affordances and constraints.

3.2 The Notion of Affordance

The notion of affordance was originally proposed by Gibson [18]. It refers to the relationship between an object's physical properties and the characteristics of a user that enables particular interactions between user and object. More specifically, Gibson used the term "affordance" to describe the action possibilities offered to an animal by the environment with reference to the animal's action capabilities [18, 19]. A typical example is a tall tree that has the affordance of food for a giraffe because the giraffe has a long neck. Gibson's notion of affordance implies a complementarity between the animal and the environment. The notion of affordance was introduced to the computer world by Norman [20]. It refers to the perceived and actual properties of the thing, primarily those fundamental properties that determine just how the thing could possibly be used. According to Norman [20], an affordance is the design aspect of an object which suggests how the object should be used and gives a visual clue to its function and use. Examples of affordances are user interface elements which directly suggest suitable actions: clickable geometrical figures, draggable sliders, pressable buttons, selectable menus for figures or algebraic calculations, etc. Several research studies used Norman's ideas to implement the notion of affordance in various educational settings. For example, Turner and Turner [21] specified a three-layer articulation of affordances: perceived affordances, ergonomic affordances, and cultural affordances. Likewise, Kirchner, Strijbos, Kreijns, and Beers [22] described a three-layer definition of affordance: technological affordances that cover usability issues, educational affordances that facilitate teaching and learning, and social affordances that foster social interactions. In mathematics education, Chiappini [23] applied the notions of perceived, ergonomic, and cultural affordances to Alnuset, a digital tool for high school algebra. Finally, Hadjerrouit [3, 15] presented a model with three levels of affordances of digital tools in teacher education. The same model is used in this paper.

De Landa [24] emphasized that affordances are not intrinsic properties of the object. Rather, affordances become actualized in a specific context, e.g., the socio-cultural context of the classroom. In other words, affordances emerge from the relationship between the object, e.g., a digital tool, and the particular environment with which it is interacting. From this perspective, the specific context of the mathematics classroom in teacher education may include several artifacts with their affordances and constraints, which interact with the teacher and students. The artifacts used in the classroom may include paper-pencil techniques, the blackboard, the Interactive White Board (IWB), PowerPoint slides, and


diverse digital tools such as smartphones, iPads, GeoGebra, and SimReal, as well as mathematical symbols, notations, and representations, etc. Hence, artifacts with their affordances and constraints mediate between the subject (student) and the object (mathematics). This view of digital tools is in line with Vygotsky's socio-cultural stance.

3.3 SimReal Affordances

Based on the research literature described above, the specificities of mathematics education, and the socio-cultural stance as an overarching framework, this study proposes three categories of affordances at six different levels (Fig. 3):

Fig. 3. Three categories of SimReal affordances at six different levels [3].

a) Technological affordances that describe the tool's functionalities
b) Pedagogical affordances at four levels:
   • Pedagogical affordances at the student level or mathematical task level
   • Pedagogical affordances at the classroom or student-teacher interaction level
   • Pedagogical affordances at the mathematics subject level
   • Pedagogical affordances at the assessment level
c) Socio-cultural affordances that cover curricular, cultural, and ethical issues

The affordances that are highlighted in this section are only a subset of potential affordances that could emerge in specific classroom contexts. These affordances are

Exploring the Affordances of SimReal for Learning Mathematics

31

some of the most important ones from the researcher’s perspective, but as De Landa [24] pointed out affordances are not intrinsic properties of the object, of the visualization tool SimReal in this study. Rather affordances of SimReal become actualized in specific contexts, e.g., the socio-cultural context of the classroom in teacher education. Regarding the affordance model presented in Fig. 3, two types of technological affordances can be distinguished: Ergonomic and functional affordances. From the ergonomic point of view, these are ease-of-use, ease-of-navigation, and accessibility of SimReal at any time and place, accuracy and quick completion of mathematical activities. From the functional point of view, SimReal provides support for performing calculations, drawing graphs and functions, solving equations, constructing diagrams, measuring figures and shapes, and presenting mathematical concepts visually and dynamically. Technological affordances are a pre-requisite for any digital tool and provide the ground for pedagogical affordances in educational settings. There are several pedagogical affordances at the student level, e.g., using SimReal to freely build and transform mathematical expressions that support conceptual understanding, such as collecting real data, creating a mathematical model, using a slider to vary a parameter or drag a corner of a triangle, moving between symbolic, numerical, and graphical representations, simulating mathematical concepts, or exploring regularity and change [25]. At this level, the motivational factor is important to engage students in mathematical problem solving. Furthermore, feedback of the tool in various forms to students’ actions may foster mathematical thinking, e.g., whether an answer given by the student is correct or not. Programming mathematical tasks also provide affordances that may foster mathematical learning and understanding. Likewise, several pedagogical affordances can be provided at the classroom level [25]. Firstly, affordances that result in changes of interpersonal dimensions, such as change of teachers’ and students’ role, less teacher-directed and more student-oriented instruction. Secondly, affordances that foster more learner autonomy, resulting in students taking greater control over their own learning, and using SimReal as a “new” authority in assessing their own learning. Other affordances at this level are change of social dynamics and more focus on collaborative learning and group work. These affordances have an impact on the didactical contract and the role of the teacher as an authority in classroom [26]. Variation in teaching and differentiation of tasks are other affordances of digital tools at this level [27]. Other affordances at this level may result in flipping the classroom, which is an alternative way of using SimReal out-of-class. Furthermore, three types of pedagogical affordances can be provided at the mathematics subject level [25]. The first one is fostering mathematical fidelity according to “ideal” mathematics, the congruence between machine mathematics and “ideal” and paper-pencil mathematics and promoting faithfulness of machine mathematics [28]. The second affordance is amplifying and reorganizing the mathematical subject. The former is accepting teachers’ and curricular goals to achieve those goals better. Reorganizing the mathematical subject means changing the goals by replacing some issues, adding and reordering others. 
For example, in advanced topics such as calculus there might be less focus on skills and more on mathematical concepts [25]. In geometry, there might be an


emphasis on more abstract geometry, and away from facts, more argumentation and conjecturing [25]. Likewise, it may be useful to support tasks that encourage metacognition, e.g., starting with real-world tasks, and using SimReal to generate results. Affordances at the assessment level consist of summative and formative assessment. Summative assessment is for testing, scoring and grading, and it can be provided in the form of statistics that the tool can generate. Formative assessment is crucial for the learning process. Feedback is an essential component of formative assessment, and it can take many forms, e.g., immediate feedback to students’ actions, a combination of conceptual, procedural, and corrective information to the students, or asking question types, etc. Finally, several socio-cultural affordances can emerge at this level. Firstly, SimReal should provide opportunities to concretize the mathematics subject curriculum in teacher education. Secondly, SimReal should be tied to teaching mathematics in schools, and support the learning of mathematics at the primary, secondary, and upper secondary school level. In other words, teachers using SimReal should consider the requirement of adapted education. Finally, other socio-cultural affordances are associated with ethical, gender, and multi-cultural issues, which are too complex to address in this paper.

4 The Study

4.1 Context of the Study and Participants

The study was conducted at the University of Agder in the context of a technology-based course for teacher educators. The course covered various digital tools and their didactical use in the classroom, such as Excel, GeoGebra, Aplusix, Numbas, Smart Board, Campus Inkrement, diverse pedagogical software, and SimReal. Fifteen teacher students (N = 15) were enrolled in the course in 2016. The students had very different knowledge backgrounds in mathematics, ranging from primary to upper secondary and university level. The students were categorized into four groups according to their knowledge level in mathematics associated with their study programmes: primary teacher education level 1–7, primary teacher education level 5–10, advanced teacher education level 8–13, and the mathematics master's programme. The participants had varied experience in using digital tools and resources inside or outside classroom settings. In terms of mathematics, the basic requirement was the completion of a bachelor degree in teacher or mathematics education, which presupposes at least the acquisition of mathematics at the primary and middle school level. In terms of digital tools, the recommended prerequisites were basic knowledge in information and communication technologies and some experience with standard digital tools like text processing, spreadsheets, calculators, and the Internet. None of the students had any prior experience with SimReal.

4.2 Research Questions

Based on the results achieved in the paper presented in Hadjerrouit [3], the purpose of this work is to extend the investigation of affordances and constraints of SimReal and their impact on students' mathematical learning in teacher education. It addresses two main research questions:


a) What are the students' perceptions of SimReal's affordances and constraints?
b) What are the impacts of SimReal on mathematical learning in terms of affordances and constraints in teacher education?

The first research question is addressed in the result section, and the second in the discussion.

4.3 Teaching Activities

A digital learning environment centered around SimReal was created over three weeks, from 25 August to 8 September 2016. The SimReal environment included video lectures, visualizations, and simulations of basic, elementary, and advanced mathematics, and diverse online teaching material. Basic mathematics focused on games, dice, the Tower of Hanoi, and prison. Elementary mathematics consisted of multiplication, algebra, Pythagoras' and the Square theorem, and reflection. The topics of advanced mathematics were measurement, trigonometry, conic sections, parameters, differentiation, Fourier, and programming issues. To assess students' experiences with specific mathematical topics that are of considerable interest for the participants, two specific tasks were chosen. The first one was Pythagoras' theorem and the many ways of representing it. The theorem has also been given numerous proofs, including both geometric and algebraic proofs, e.g., proofs by dissection and rearrangement, Euclid's proof, and algebraic proofs. Thus, Pythagoras' theorem is more than just a way of calculating the lengths of a triangle. An example of representing the theorem is given in the following figure (Fig. 4).

Fig. 4. An example of representation of Pythagoras’ theorem [3].
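For reference, one classical rearrangement argument mentioned above can be written compactly (a generic derivation in LaTeX notation, not one of the 16 numbered approaches used in the course): placing four copies of a right triangle with legs a, b and hypotenuse c inside a square of side a + b leaves a central square of area c squared, so

\[
  (a+b)^2 = c^2 + 4\cdot\tfrac{ab}{2}
  \;\Longrightarrow\;
  a^2 + 2ab + b^2 = c^2 + 2ab
  \;\Longrightarrow\;
  a^2 + b^2 = c^2 .
\]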

The second task was the Square theorem. Like Pythagoras' theorem, there are many ways of using and representing the theorem (Fig. 5).

4.4 Methods

This work is a single case study in teacher education. It aims at exploring the affordances of SimReal for mathematical learning in teacher education. Both quantitative and qualitative methods were used to collect and analyze the affordances and constraints of SimReal and the students' experiences with the tool. The following methods were used:


Fig. 5. An example of representation of the Square theorem [3].

a) A survey questionnaire with a five-point Likert scale from 1 to 5, and quantitative analysis of the results
b) Students' comments in their own words to each of the statements of the survey
c) Students' written answers to open-ended questions
d) Qualitative analysis of students' comments on point b and answers to open-ended questions on point c
e) Task-based questions on Pythagoras' and the Square theorem, and programming issues

The design of the survey was guided by the theoretical framework and the research questions. To measure students' perceptions of SimReal, a survey questionnaire with a five-point Likert scale from 1 to 5 was used, where 1 was coded as the highest and 5 as the lowest (1 = "Strongly Agree"; 2 = "Agree"; 3 = "Neither Agree nor Disagree"; 4 = "Disagree"; 5 = "Strongly Disagree"). The average score (MEAN) was calculated, and the responses to open-ended questions were analyzed qualitatively. The survey included 72 statements that were distributed as follows: technological affordances (12), pedagogical affordances at the student level (11), classroom level (19), mathematical subject level (9), assessment level (10), and finally socio-cultural level (11). The students were asked to respond to the survey using the five-point Likert scale and to comment on each of the statements in their own words. The students were also asked to provide written answers in their own words to open-ended questions. The responses to students' comments to the survey and open-ended questions were analyzed qualitatively. Of particular importance are the task-based questions on Pythagoras' and the Square theorem to collect data on the affordances of these two mathematical tasks. Additional questions on programming issues were given to the students to assess the affordances of programming languages for the learning of mathematics. Asking task-based questions provides information on the affordances of SimReal. This method also provides more nuanced and in-depth information about the students' experiences with SimReal. The analysis of the data was guided by the specified affordances of the theoretical framework presented in Fig. 3, and by open coding to bring to the fore information about affordances that was not covered by the theoretical framework.


5 Results

5.1 Technological Affordances

The results achieved by means of the survey questionnaire and open-ended questions show that the affordances of SimReal emerged at all study levels. Students from the study programme 1–7 perceived affordances, but in a less evident way. They became more visible for students from the study programme 5–10. Affordances came to the fore at the advanced level and for mathematics education students. Globally, most students pointed out that SimReal still lacks a user-friendly interface and that it is not easy to use, to start, and to exit. For many students, the tool was accessible anywhere and at any time, but navigation through the tool is still not straightforward. On the positive side, SimReal has ready-made mathematical content, and the video lessons, simulations, animations, and live streaming are of good quality. This is reflected in many students' responses. Looking carefully at the students' answers to open-ended questions, some representative comments are as follows:

• SimReal has good visualizations for different categories of users
• It takes time to get familiar with SimReal and its user interface
• SimReal offers a lot of functionalities to understand and teach complex subjects
• The video lessons are very accurate and straight to the point. Graphics could be improved with sound effects
• Functionality clearly exists somewhere within SimReal, but it is difficult to find and use
• Too many unnecessary buttons and menus steal attention from the topic and task. Unattractive design

5.2 Pedagogical Affordances

In terms of pedagogical affordances at the student level, many students think that SimReal provides real-world tasks, which engage them in mathematical problem solving, particularly when using visualizations to simulate mathematical concepts. Most students think that visualizations are useful to gain knowledge that is otherwise difficult to acquire. They also liked the combination of live streaming of lessons, video lectures, simulations, and animations. Likewise, SimReal provided affordances to explore variation and regularities in the way mathematics is taught, e.g., varying a parameter to see the effect on a graph or function. The students also think that SimReal is congruent with paper and pencil techniques. On the negative side, most students think that SimReal is not helpful for refreshing students' mathematical knowledge. Some representative comments that reflect this analysis are:

36

S. Hadjerrouit

In terms of pedagogical affordances at the classroom level, the majority agreed that they can use SimReal on their own, without being completely controlled by the teacher, and, as a result, they do not need much help from the teacher or textbooks to solve exercises. Likewise, most students think that the tool can be used as an alternative to textbooks and lectures. The tool also facilitates various activities consisting of problem solving, and listening to video and live streaming of lectures, and several ways of representing mathematical knowledge by means of texts, graphs, symbols, animations, and visualizations. In terms of differentiation and individualization, many students believe that the level of difficulty of the mathematical tasks is acceptable, but it is relatively difficult to adjust the tool to the students’ knowledge level. Even though the degree of autonomy is not very high, it is good enough to allow students to work at their own pace. On the negative side, most students think that SimReal does not support much cooperation or group work, and it does not have collaborative tools integrated into it. Some students’ representative comments that reflect this analysis are: • Can find different simulations for different students • SimReal offers a good variation of mathematical activities, but they should perhaps be related to the curriculum • SimReal provides a high degree of autonomy, even though it is not always straightforward • A better use interface could increase student autonomy, because it is sometimes not easy to navigate through it • SimReal can be useful as a supplement to ordinary teaching • I think SimReal is one of the best ways of teaching mathematics if combined with lectures, textbooks, etc. • The work with SimReal is controlled by the teacher when it comes to teaching, but not in terms of learning, where the main interaction is between the tool and the student In terms of pedagogical affordances at the mathematics subject level, most students agreed that SimReal provides a high quality of mathematical content, and that it provides real-world applications that foster reflection, metacognition, and high-level thinking. Likewise, the vast majority found that SimReal is mathematically sound, and that the tool can display correctly mathematical formulas, functions, graphs, numbers, and geometrical figures. The students also highlighted some constraints. The overwhelming majority think that the digital tool GeoGegra has a better interface, and it is better to express mathematical concepts than SimReal. Finally, the combination of mathematics and applications in physics and engineering is considered useful to gain mathematical understanding. Some students’ representative comments that reflect this analysis are: • SimReal presents a lot of mathematical concepts that are useful to think about mathematics in different ways • SimReal is mathematically sound and has a high quality of representations, e.g., formulas, functions, graphs, etc. • SimReal has a good and high quality of visualizations • It is good with some animations from the real-world to visualize. Several forms of representations help a lot, e.g., animations from the real-world

Exploring the Affordances of SimReal for Learning Mathematics

37

• SimReal (…) leaves space for further mathematical thinking and gives answers on questions like “what if”, “what happens when”, or “is it possible that” In terms of affordances at the assessment level, most students think that SimReal gives directly feedback in the form of dynamic animations and visualizations. Likewise, SimReal provides satisfying solutions step-by-step, but not for all tasks. Still, SimReal does not provide several types of feedback, differentiated knowledge on student profiles, several question types, and statistics. Finally, the degree of interaction is evaluated as satisfying. Some students’ representative comments that reflect this analysis are: • I haven’t seen full feedback, for example on multiple-choice or quiz, but just rightwrong response • Not real feedback. Makes it difficult for students to use alone. 5.3 Socio-Cultural Affordances In terms of affordances at the socio-cultural level, most students think that SimReal is an appropriate tool to use in teacher education, but it does not take sufficiently into account the requirement for adapted education. Furthermore, most students believed that SimReal is appropriate to use in secondary schools, and in a lesser degree in middle and primary schools. On the negative side, the vast majority will not continue using video lessons and live streaming to learn mathematics, but some will still be using video simulations in the future. Nevertheless, most students think that the tool enables the teacher to concretize the mathematics subject curriculum. Some students’ representative comments that reflect this analysis are: • I think SimReal can be used in middle schools, and upper secondary level, but teachers don’t have enough time to design activities at this level • SimReal is not easy for pupils in primary schools • SimReal is good enough for middle schools, and it is easier to adapt for secondary schools • There are better alternatives, e.g., GeoGebra. Finally, it worth noting that affordances do not emerge in the same degree for all students. Rather they become actualized in relationship to the participants’ knowledge level from the 4 categories of study programmes: Primary teacher education level 1–7 and level 5–10, master programme, and advanced teacher education level 8–13. 5.4 Affordances of Pythagoras’ Theorem Students were engaged in 16 different approaches to exploring Pythagoras’ theorem [29]. These were divided into paper-based (1–9) and SimReal-based approaches (10– 16). The students were asked to report on SimReal affordances and critically reflect on their impact on learning Pythagoras’ theorem by responding to 5 specific questions.

a) If you should choose only one of the 16 different approaches of explaining Pythagoras’ theorem, which of them would you prefer?

The students provided a variety of preferences in order of priority according to the perceived affordances of the approaches. Some suggestions were 2/5/7/12/15/16, 2/3/5, 2/3, and 7/14. One student provided an interesting and detailed solution. Firstly, the student decided to use approach 15 as a brief introduction, and then Pythagoras’ theorem 1, both as a simple presentation of the equation and a first visual proof of the theorem. Then, the student suggested using approach 2 as a general formula and 3 as a more specific and realistic one. The student also suggested a combination of approach 10 (paper-based) and 14 (SimReal-based), but without the written explanation or mathematical formula. Instead, one can start with a given problem such as “find the area of the pool or the area of the baseball field”. After having discussed some suggestions, the student can then check the explanation provided by SimReal in terms of written text or mathematical formula. Finally, the student would demonstrate approach 4 using a rigorous proof through the use of algebraic and geometrical properties. Summarizing, several affordances emerged in this context. These relate to various mediating artifacts: realistic tasks, pen-paper formulas, digital visualizations with written explanations of the theorem, rigorous mathematical proofs of the theorem, and a combination of paper-based and SimReal visualizations.

b) If you should combine one of the pen/paper approaches and one of the digital simulations, which of them would you prefer?

The students provided a variety of preferences, such as 3/12 and 9/12. As described above, one student combined the pen-paper-based approach 10 with the SimReal-based simulation 14. After the use of the pen-and-paper solution and an attempt to calculate the blue area of Fig. 4, the student then tried to calculate several areas of the figure. As a result, approaches 10 and 14 provide different solutions to the problem, but they complement each other in terms of their affordances.

c) If you should choose the combination of the two approaches 9 and 12, how would you in detail explain Pythagoras’ theorem?

A variety of explanations were provided for Pythagoras’ theorem using elements such as the layout and colors of the figure, the SimReal-based simulation, and mathematical explanations. A good combination of 9 and 12 is as follows. The student starts with 9 (angle A = 90°), and notes that the pink and the blue area of the bottom square are equal to the two corresponding square areas (blue and pink). Moreover, the sum of the blue and pink square areas is equal to the bottom square area, consisting of these two rectangles. Moving on to 12, it is worth mentioning that their area remains unchanged. Before using the SimReal-based solution, some figures of the rectangles on the blackboard would be useful when revising or presenting the area formula of the parallelogram. The student combined the use of the scroll bar of the digital simulation with cases of parallelograms on the blackboard, depending on the position of the parallelogram and its height. This could be a good reasoning step to explain why the area remains the same. The digital simulation could be used to clarify the question.

Summarizing, this task shows that the combination of approaches 9 and 12, supported by the affordances of the blackboard, was useful to explain Pythagoras’ theorem.

d) Do you think that teaching Pythagoras’ theorem in different ways by combining pen/paper and SimReal-based simulations would help in the understanding of this topic, or do you think it would be confusing for the students?

The students pointed out that the combination of pen-paper techniques and SimReal simulations is helpful for understanding Pythagoras’ theorem, depending on time and pedagogical constraints, as well as on students’ knowledge level. They think that it would be meaningful to use several approaches to teach Pythagoras’ theorem considering students’ knowledge levels and learning styles. It is therefore important to present mathematical tasks in different ways. By showing a figure describing Pythagoras’ theorem, the teacher has a good opportunity to explain the mathematical formula in his/her own words, before showing a digital simulation of the theorem. This may motivate the students and stimulate their curiosity. Approach 9 or 3 combined with simulation 12 would give a good effect. In most cases, a good combination of pen/paper and SimReal simulations is preferable, but there may be some confusing cases that make the understanding of the topic more difficult. In those cases, the task should be solved either with pen/paper or with a SimReal simulation, but not both, even though the teaching may be less efficient. As a result, a good way of teaching Pythagoras’ theorem is a combination of SimReal affordances with those of paper-pencil techniques.

e) Give your own comments about how to teach Pythagoras’ theorem

The participants suggested several teaching methods:

1. Let the students study Pythagoras’ theorem in the grid, using examples 003, 004, and 005. These provide visual clues (e.g. dynamic change of symbols) that facilitate the understanding of the theorem.
2. Start by introducing the principle of Pythagoras’ theorem, thus keeping algebra aside as much as possible and visually displaying the theorem in different ways during the initial phase. This method applies to students at the middle school level.
3. Draw on paper and ask the students to study the figure to see that a² + b² = c². Then, let the students work with tasks using Pythagoras’ formula in different ways to find unknown parameters.
4. First introduce a problem that stimulates students to use the formula to solve it. Problem-based teaching, when it is possible, is always the best method.
5. It is too early to introduce Pythagoras’ theorem at the primary school level. However, if pupils should learn it, it must be simplified using geometric representations and explanations, such as 002. Perhaps it may be a good exercise for stronger students.
6. Start using a visualization tool, either a physical or a digital one. An important feature of the tool is that it has to be dynamic. This can be achieved more easily with a computer simulation than with physical tools.
7. Teach using various methods, because students learn in different ways. By combining pen-paper techniques and digital aids, one can reach a larger student group.

8. It is important to focus on specific applications and on the differences between proof examples, e.g., numerical examples versus algebraic expressions. There are also examples from ancient times that may be interesting to draw on, e.g. Egyptian and Chinese mathematics.
9. An interesting point about Pythagoras’ theorem is that a teacher could actually choose any shape to lie on the sides of the triangle; it does not have to be a square. An important point about Pythagoras’ theorem is the fact that the theorem itself is so simple and elegant, which is probably the reason why many adults still remember it from their own school education.

5.5 Affordances of the Square Theorem

Students were engaged in 6 different approaches to exploring the Square theorem [30]. These are divided into paper-based (1–3) and SimReal-based approaches (4–6). The students were asked to study them and report on their affordances and constraints by responding to 4 specific questions.

a) Pen/paper proofs (1, 2, 3) versus SimReal-based proofs (4, 5, 6) of the theorem

Most students preferred a combination of pen-paper techniques with SimReal proofs, but those participating in the problem solving should not just passively read the proofs. They should rather take advantage of the dynamic visualizations provided by SimReal. Regarding the Square theorem, the pen-paper approaches 1–3 do not necessarily promote students’ understanding, because these are based on a more mechanical calculation method. SimReal simulation 4 is a good approach to visualizing the theorem. However, the second and third approach are somewhat tricky to understand geometrically, but still better than just formulas. Therefore, approaches 4–6 should be used to create dynamic images of the Square theorem. Another student preferred the pen/paper proofs (1–3) and felt that these methods are mostly used to describe algebraic operations and expressions. These approaches are important, but only if the teacher takes a more practical approach to the theorem; the geometrical SimReal-based approaches could then be used to enhance the understanding of the theorem.

b) In what way do you think the use of SimReal can provide a better understanding of the Square theorem?

The students think that SimReal provides affordances to improve the understanding of the Square theorem by visualizing mathematical concepts. More specifically, one student suggested a quiz, and another a “fill in the blanks” exercise, where a student could get immediate feedback on whether the answer is correct or not. Globally, the students think that the digital simulations are beneficial for visually strong students, considering the fact that upper secondary mathematics becomes more theoretical the higher the grade, and, as a result, there is less focus on conceptual understanding, and on why and how to carry out calculations. Digital simulations can therefore have a positive effect on students’ learning and help them to see how mathematical formulas work.
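The “formulas” in question are, for the Square theorem, commonly taken to be the binomial square identity; the worked expansion below is added for reference only and is not one of the six approaches studied:

\[ (a + b)^2 = (a + b)(a + b) = a^2 + ab + ba + b^2 = a^2 + 2ab + b^2, \qquad \text{e.g. } (3 + 4)^2 = 49 = 9 + 24 + 16. \]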

c) Give some comments about how you think the understanding of the Square theorem could be improved, either by pen/paper or by SimReal

Students provided many ways of improving the understanding of the Square theorem. One solution consists of exercises with both symbols and numbers, while also allowing the use of the expansion (a + b)² = (a + b)(a + b), until the student becomes familiar with the theorem. The paper-and-pen exercises 1–3 show specific procedures for how a student can manipulate and calculate Square theorem tasks, but the procedures would have been clearer if there was a headline for each example to show how the theorem works. This is clearly a constraint that should be considered. SimReal solutions 4–6 have digital simulations with explanations, color coding, and references to formulas. These cover the Square theorem quite well, and there is no need for improvement. Likewise, SimReal simulations can make it easier for students to see how the formulas work, and this is especially true for the 1st and 2nd approach to the Square theorem.

d) Do you prefer learning the Square theorem in one way, or do you feel you gain a better understanding by learning it in different ways?

As already stated above, most students think that a combination of different approaches is the most appropriate way to provide a better understanding of the theorem, while being careful not to use several approaches at the same time, as this might be counterproductive. They also argued that it is important to see mathematics from different angles. Using new methods to explain the solution to a single problem will give new perspectives on the problem and the corresponding solution, and how these are interrelated. A good example is the figure-based and algebraic proofs of the Square theorem. Showing different points of view of the theorem (like a geometrical one) and applications of the theorem could indeed be very efficient.

Summarizing, students indicated that visualizing and simulating mathematical concepts could be useful to make mathematics easier to understand, because visualization tools like SimReal provide a concrete way of making mathematical concepts more dynamic. SimReal provides a huge variation of visualization examples for the teacher to use in the classroom. As shown, SimReal can support the understanding of the basic Square or Pythagoras’ theorem by visualizing what happens dynamically with the figures. Nevertheless, a combination of pen/paper, SimReal visualizations, and chalk-blackboard could be a more efficient way of teaching mathematics than digital tools alone.

A comparison of the results in terms of affordances achieved by means of task-based questions reveals that these are globally in line with those achieved by the survey questionnaire and open-ended questions in terms of pedagogical affordances at the student level. The issues that correspond very well are the usefulness of visualizations and simulations for understanding the Square and Pythagoras’ theorem, the congruence of SimReal-based visualizations with paper-pencil techniques, and a combination of different representations and approaches to the theorems.

5.6 Programming Affordances

Programming has rapidly grown as an innovative approach to learning mathematics at different levels. The topic will become compulsory in schools from the study year 2020. As a result, SimReal is expected to be improved by including programming tasks using Python and other programming languages such as Scratch. Given this background, it was worthwhile to ask the students about the affordances of programming.

a) Would it be of interest to program your own simulations in teaching mathematics?

The study reveals that SimReal can provide more affordances in terms of programming mathematical concepts. Basically, most students think that teachers with experience in programming mathematical simulations and visualizations will open up a new way of teaching mathematics. For example, a teacher could focus on subjects and tasks that are difficult for the students to comprehend. Another possibility is to program tasks that are not already covered by SimReal, but that are already available online. Most important for teachers is the use of different methods to promote understanding and make new connections. Hence, it may be worthwhile to take advantage of simulations and explanations combined with some programming examples so that the knowledge to be learned is presented with various methods. On the other hand, there are many good explanations of mathematical tasks online today, so it might not be necessary for teachers to program on their own. One student pointed out that he would not spend time programming his own simulations, even though he sees an advantage in it. Programming is demanding in terms of effort and time for a student starting with such a task. An expert in programming could also contribute to this purpose together with a mathematics teacher. However, the focus should be on mathematical knowledge rather than programming issues. Since this is about advanced mathematics, it requires a higher level of programming knowledge, and it may be necessary to evaluate whether the students have sufficient understanding of mathematics to be able to program themselves.

b) Do you think it would be of interest and help that students/pupils could program their own simulations?

The participants think that students/pupils would be interested in programming visualizations if they have acquired sufficient skills in this matter. This would contribute to enhanced motivation and increased understanding of mathematics, because they will be forced to fully comprehend the mathematics before they can program visualizations. Likewise, it could be of help for the students if they could program their simulations by themselves. However, it is crucial that they focus on the mathematical part of the task rather than on programming issues alone. Programming their own simulations could be motivating for those students who are both interested and knowledgeable in programming. This presupposes, however, that the students have understood the mathematics before getting started with programming. Students having difficulties in mathematics should rather spend their time on it. Hence, programming would be helpful if it contributes to the learning of mathematics. Likewise, advanced mathematics requires a

higher level of programming knowledge, and it may therefore be necessary to evaluate whether students have sufficient understanding of mathematics in order to be able to program themselves. Finally, only one student pointed out that he would not spend time on programming, even though he sees an advantage in it. Summarizing, programming mathematical tasks can contribute to the understanding of mathematics, but it is demanding in terms of effort and time, especially for novice students.

Based on students’ responses to questions a and b, three groups of potential programmers emerged: firstly, those who like programming; secondly, those who somewhat like programming, but under certain conditions; and finally, those who don’t like programming and prefer using digital tools.

The students in the first group think it would be useful if teachers could create their own simulations. Drawing mathematical figures on the board is often challenging as it is difficult to draw precisely, but “doing it on a digital tool, the data will be perfect”. There are often many functionalities teachers could use, but these cannot be reproduced on the board or with pen-paper techniques. Moreover, it would be beneficial if teachers could program their own simulations in the sense that this activity would give them more freedom to adapt their lessons to the individual students. However, programming activities are time-consuming and could detract from the topic being taught. Therefore, it is uncertain whether the demanding task of programming their own simulations would be beneficial for the students. Furthermore, since everyone has his/her own way of learning, it will be helpful for teachers to program their own simulations that are best suited to their class. In slightly different terms, some students believe that programming is a useful skill if teachers are able to program their own simulations and motivate their students to do so. Likewise, it is an advantage if students are able to program their own simulations, because learning is enhanced when they practice what they are being taught.

The second category of students has a slightly different view of programming. Some think in terms of customization, by either modifying existing simulations or creating new ones. In addition, a conceptual understanding of mathematics is required in order for the students to be able to program their own simulations. On the other hand, programming can provide a deeper and better understanding of mathematics and may help students to design appropriate tasks. Similarly, teachers may need some programming skills, but there already exist several simulation tools that can be used instead. Students may also benefit from programming simulations, but it is uncertain that they will learn more about Pythagoras’ theorem by simulating the theorem. Clearly, digital simulations would be useful only if they stimulate students’ understanding of mathematics.

Finally, only a few students don’t like learning programming. One group of students pointed out that teachers are too busy to spend many hours making simulations. In addition, there are many powerful digital tools at their disposal. On the other hand, it could be exciting for students to combine programming with mathematics, but it is uncertain whether this is the most effective way of learning mathematics. Another reason is that it is unnecessary to provide programming opportunities at the primary school level, but it may be useful for interested students.
This requires that teachers are able to program by themselves, but they don’t need to design simulations for the whole class.
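To make concrete what programming one’s own simulation might involve, the following is a minimal, hypothetical sketch in Python of a static Pythagoras visualization in the spirit of the approaches discussed above. It uses matplotlib, which is an assumption of this sketch and is not part of SimReal, and all function names and parameters are illustrative only.

# Minimal illustrative sketch (not part of SimReal): draw a right triangle with
# legs a and b together with the squares on its three sides, so that the areas
# a^2 + b^2 and c^2 can be compared visually. Requires numpy and matplotlib.
import numpy as np
import matplotlib.pyplot as plt

def square_on_segment(p, q):
    """Corners of the square erected on the segment p->q, on its left-hand side."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    d = q - p
    n = np.array([-d[1], d[0]])      # the segment rotated by 90 degrees
    return np.array([p, q, q + n, p + n])

def plot_pythagoras(a=3.0, b=4.0):
    c = np.hypot(a, b)               # length of the hypotenuse
    A, B, C = (0.0, 0.0), (a, 0.0), (0.0, b)   # right angle at A
    fig, ax = plt.subplots()
    ax.add_patch(plt.Polygon([A, B, C], fill=False, lw=2))
    # Squares on the two legs and on the hypotenuse, each drawn outwards.
    for p, q, color in [(B, A, "tab:blue"), (A, C, "tab:orange"), (C, B, "tab:green")]:
        ax.add_patch(plt.Polygon(square_on_segment(p, q), alpha=0.3, color=color))
    ax.set_aspect("equal")
    ax.autoscale_view()
    ax.set_title(f"a² + b² = {a**2 + b**2:.1f},   c² = {c**2:.1f}")
    plt.show()

plot_pythagoras()                    # the 3-4-5 triangle

A classroom version could animate the leg lengths a and b with a slider; the static sketch above only indicates the kind of figure that the SimReal approaches render dynamically.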

6 Discussion

According to Vygotsky’s socio-cultural theory, mathematical learning is mediated by artifacts such as language, mathematical symbols, and other material and digital artifacts. Today, innovative technologies such as SimReal provide powerful ways of representing mathematical concepts dynamically so that students can see and hopefully understand the meaning of the represented and visualized concepts. In this section, the impacts of SimReal on students’ mathematical learning in terms of affordances and constraints are discussed.

6.1 Affordances

The first issue raised in this study is the identification of affordances at three different levels: the technological, pedagogical, and socio-cultural levels. Technological affordances are self-evident requirements for any digital tool in educational settings, and a prerequisite for pedagogical affordances at the student, classroom, mathematics subject, and assessment levels. Besides providing affordances for individual and collaborative problem-solving, digital tools should support mathematical thinking and conceptual understanding, and thus provide mathematical affordances. Another important affordance is feedback in terms of formative and summative assessment. These four categories of affordances provide insight into the potentialities and constraints of SimReal in teacher education. Finally, digital tools should provide affordances at the socio-cultural level.

In terms of technological affordances, most students were satisfied with SimReal in terms of the availability of mathematical content and the open accessibility of the tool.

In terms of pedagogical affordances, the students used SimReal to simulate various mathematical tasks at the student level, including Pythagoras’ and the Square theorem, to achieve a didactical goal. SimReal allowed them to explore a wide range of dynamic visualizations of the theorems and gave them the opportunity to create links between symbolic and digital representations of the theorems. Thus, the tool is congruent with paper-pencil techniques, as the Pythagoras and Square theorem tasks clearly show. Moreover, many issues have been addressed at the student and classroom level: motivation, autonomy, individualization, differentiation, variation, and activities. The students evaluated these affordances positively, and think that SimReal is useful because it combines various activities (problem solving, video lectures, live streaming, and simulation) and several ways of representing mathematical knowledge, such as texts, graphs, symbols, animations, and visualizations. There are various ways of using SimReal visualizations that may motivate students. They could be used as a supplement to mathematics on the blackboard, paper-pencil techniques, and online teaching material, and as an alternative way of sharing knowledge and explaining mathematics. Visualizations, including online videos, are especially important because these are one of the main sources of information for young students. Enhanced motivation is also achieved through realistic mathematical tasks, dynamic simulations and visualizations, and a combination of these. These are considered useful to gain mathematical knowledge that is otherwise difficult to acquire. The combination of live streaming of lessons, video lectures, and simulations is highly valued, as these can boost the interest in and motivation for doing mathematics. Beyond motivational

issues, many students think that SimReal enables a high degree of student autonomy, allowing them to work at their own pace without much interference from the teacher.

While doing mathematics with paper and pencil is still important to stimulate learning, the Pythagoras and Square theorem tasks show that SimReal and the digital simulations of the theorems can be used as an alternative to achieve variation in teaching. The tasks show that SimReal facilitates various activities with the theorems, and it can be used in combination with pen-paper proofs. This is in line with the research literature, which indicates that variation in teaching is important because students learn in different ways [15]. This is an important pedagogical affordance when doing mathematics, depending on the students’ prerequisites and the mathematical task being solved.

At the mathematics subject level, most students think that the tool has a high quality of mathematical content. Moreover, the mathematical notations are correct and sound. The study also shows that the technological and pedagogical affordances of SimReal make mathematics easier to understand, because these provide a concrete way of making mathematical concepts more dynamic. In addition, SimReal provides a huge variation of visualization tasks for the teacher to use in the classroom; e.g., SimReal can support the understanding of the Square and Pythagoras’ theorem by visualizing the dynamic behavior of the theorems. Nevertheless, a combination of pen/paper, digital visualizations, and the blackboard could be more efficient for teaching mathematics than SimReal alone.

Most students also think that programming mathematical tasks can provide more affordances for the learning of mathematics in terms of programming mathematical concepts. Basically, three groups of students emerged when it comes to programming: those who like programming, those who somewhat like programming but under certain conditions, and those who don’t like programming and prefer using digital tools. Besides these differences, it appears that students think that programming mathematical simulations and visualizations will open up a new way of doing mathematics. Since programming will become compulsory, it is worthwhile to take advantage of simulations and explanations combined with some programming examples so that the knowledge to be learned is presented with various methods. Of course, some prerequisites are necessary to achieve the didactical goal of learning mathematics using programming. Firstly, the students should acquire sufficient programming skills. Secondly, the students should have a good understanding of mathematics before getting started with programming. Finally, the students should focus on the mathematical part of the task rather than on programming issues alone. Only those who are knowledgeable both in programming constructs and in mathematics are likely to achieve the goal of programming mathematical concepts for the sake of learning.

Finally, at the socio-cultural level, many students think that the tool is appropriate to use in teacher education and at the upper secondary school level, and that it makes it possible to concretize the curriculum.

6.2 Constraints

Despite students’ overall satisfaction with SimReal, there are still constraints that need further consideration. Four categories of constraints are discussed in this section. Besides the lack of a user-friendly and intuitive interface, SimReal does not provide powerful

affordances for differentiation and adapted education. Moreover, it does not provide sufficient support for group work, and, finally, it does not provide several types of feedback, review modes, question types, or statistics.

Firstly, SimReal does not have an intuitive user interface, an attractive design, or management facilities. Clearly, there is a need for a better user interface and navigation for different types of users.

Secondly, in terms of differentiation, most students think that the tool is not fully adapted to their age. Thus, in terms of affordances, SimReal needs to be better adjusted to students’ knowledge levels. Hence, more differentiation and individualization are expected in future work, as SimReal does not sufficiently take into account the requirement for adapted education. Clearly, visualization tools like SimReal will play an important role in school education and should be integrated into the teaching of mathematics, but digital tools should not take over completely. A combination of various tools and resources is the most appropriate way to enhance the learning of mathematics.

Thirdly, SimReal should provide better support for group work, which is a motivational factor in keeping students engaged in mathematics collaboratively. Perhaps the reason why SimReal does not support group work or stimulate students to cooperate is that it does not have communication tools, and it does not contain group tasks. Despite this limitation, cooperation possibilities are given by the community of researchers using SimReal online. Even though discussions happened in the classroom, it appears that SimReal does not contribute much to classroom interactions. Hence, it would be important in future research to examine how and whether students work collaboratively in small groups when doing mathematics.

Finally, in terms of affordances at the assessment level, SimReal does not provide meaningful feedback on the mathematical task being solved, except in terms of the numbers, functions, and graphs that appear on the screen when entering a mathematical function or calculation. On the other hand, visualizations and dynamic simulations, as the tasks on Pythagoras’ and the Square theorem show, can be considered a form of feedback that fosters understanding of the theorems. Clearly, dynamic visualizations provide feedback by showing mathematical concepts dynamically, and these helped to create a sense of problem solving and promote conceptual understanding. This is the strength of SimReal. Still, advanced feedback is a challenge that the designers of SimReal need to address, because, as Bokhove and Drijvers [31] argued, digital tools should provide formative feedback on the work students are doing, e.g., in the form of review modes, because this type of feedback supports the learning process. This can be achieved by considering different learning styles and increased differentiation. Depending on usability issues, enhanced interactivity of SimReal could support feedback. Interactivity provides an opportunity to assess students’ knowledge in interaction with SimReal. It is also a way of acquiring and understanding knowledge through action, feedback, and reflection. Hence, at the assessment level, research work needs to be done to improve the feedback function.

6.3 Reconceptualizing the Notion of Affordance

The theoretical framework of the study has proven to be useful for addressing the affordances of SimReal and their impacts on mathematical learning in teacher education.
Nevertheless, the research literature reveals that the notion of affordance can be reconceptualized

and extended by considering ontological issues [32]. As already stated, affordances are not properties that exist objectively. Rather, affordances emerge in the socio-cultural context of the classroom, where several artifacts and their affordances interact with the student, e.g., paper-pencil, blackboard, textbooks, smartphones, PowerPoint slides, mathematical tasks and their representations, and digital artifacts like SimReal. A reconceptualization of the notion of affordance needs to take into consideration new and more powerful theories such as Actor-Network Theory (ANT), which does not consider technology simply as a tool, but rather as an actor with agency that serves to reorganize human thinking [33]. In this regard, Wright and Parchoma [34] criticized the value of affordances and proposed Actor-Network Theory as an alternative framework that may contribute to a more critical consideration of the use of the notion of affordance. The theory of assemblage may also contribute to the understanding of affordances and their relationship to mathematical learning, which is understood as “an indeterminate act of assembling various kinds of agencies rather than a trajectory that ends in the acquiring of fixed objects of knowledge” [35, p. 52]. Moreover, Withagen, Araújo, and de Poel [36] argued that affordances are not mere possibilities for action, but can also have the potential to solicit actions. Hence, the concept of agency can contribute to a better understanding of the notion of affordance.

7 Conclusions and Future Work

Summarizing, Vygotsky’s socio-cultural perspective, the role of mediation, and the notion of affordance provide a sound theoretical framework to assess the impact of SimReal affordances on students’ mathematical learning at different levels in teacher education. Although this study does not aim to capture all potential affordances, it is possible to make reasonable interpretations of those that emerged in the context of this specific study and to draw some recommendations for using SimReal in future work.

From a methodological point of view, the purpose of this article is to collect data on affordances in the context of a technology-based course in teacher education by asking students to respond to a survey questionnaire and open-ended questions. In addition, the students had the opportunity to comment on the items of the survey in their own words. Task-based questions were also used to provide more nuanced information about the affordances of the Pythagoras and Square theorem tasks, as well as students’ views of programming affordances. The data collected by means of these methods provided a substantial amount of information that supported a sound interpretation of the results achieved in this study. Even though the results are promising, it is still difficult to generalize them because of the small sample size (N = 15). In fact, within this sample there is already variance with regard to the four categories of study programmes. However, it would have been better for this research study to have less variance with such a small sample size and to ensure that one or two of those groups have a larger representation [3].

In future studies, students’ recommendations will be considered to improve the teaching and learning of mathematics with SimReal. In terms of technological affordances, there is a need for a better and more intuitive user interface and navigation for different types of users. In terms of pedagogical affordances, there is a need for better feedback and review modes, more differentiation and individualization, and the possibility of programming videos and visualizations. From a socio-cultural point of view, research is

needed to understand the co-emergence of different mediating artifacts, in particular SimReal, paper-and-pencil techniques, and other tools like GeoGebra, to foster mathematical understanding in a digital learning environment. It is also necessary to understand the role of the teacher in orchestrating students’ work using visualization tools. From the curriculum point of view, it is important that SimReal enables the concretization of the mathematics subject curriculum so that the tool is tied to the teaching of mathematics. Hence, despite great interest in SimReal and the promising results achieved so far, research work remains to be done to fully exploit the potentialities of SimReal and its educational value according to the notion of affordance, the socio-cultural perspective, and the role of mediation. Moreover, the notion of affordance will be refined by considering other theories, such as Actor-Network Theory, agency, and assemblage theory. It is also planned to consider students’ learning styles, e.g., visual and verbal students. Finally, data collection and analysis methods will be improved to ensure more validity and reliability.

Acknowledgment. This research was supported by MatRIC – The Centre for Research, Innovation and Coordination of Mathematics Teaching, project number 150401. I would like to express my special appreciation to Per Henrik Hogstad for his great support in designing and teaching mathematics lessons using SimReal.

References

1. Drijvers, P., Kieran, C., Mariotti, M.-A.: Integrating technology into mathematics education: theoretical perspectives. In: Hoyles, C., Lagrange, J.-B. (eds.) Mathematics and Technology-Rethinking the Terrain. Springer, Berlin (2010). https://doi.org/10.1007/978-1-4419-0146-0_7
2. Geiger, V., Forgasz, H., Tan, H., Calder, N., Hill, J.: Technology in mathematics education. In: Perry, B., et al. (eds.) Research in Mathematics Education in Australasia 2002–2011, pp. 111–141. Sense Publishers, Boston (2012)
3. Hadjerrouit, S.: Investigating the affordances and constraints of SimReal for mathematical learning: a case study in teacher education. Proc. CSEDU 2019, 27–37 (2019)
4. Clements, M.A.: Fifty years of thinking about visualization and visualizing in mathematics education: a historical overview. In: Fried, M.N., Dreyfus, T. (eds.) Mathematics & Mathematics Education: Searching for Common Ground. AME, pp. 177–192. Springer, Dordrecht (2014). https://doi.org/10.1007/978-94-007-7473-5_11
5. Presmeg, N.: Visualization and learning in mathematics education. In: Lerman, S. (ed.) Encyclopedia of Mathematics Education, pp. 636–640. Springer, Berlin (2014). https://doi.org/10.1007/978-3-030-15789-0
6. Arcavi, A.: The role of visual representations in the learning of mathematics. Educ. Stud. Math. 52(3), 215–241 (2003)
7. SimReal. http://grimstad.uia.no/perhh/phh/matric/simreal/no/sim.htm. Accessed 07 June 2019
8. Brekke, M., Hogstad, P.H.: New teaching methods – using computer technology in physics, mathematics, and computer science. Int. J. Digit. Soc. (IJDS) 1(1), 17–24 (2010)
9. Hogstad, N.M., Ghislain, M., Vos, P.: Engineering students’ use of visualizations to communicate about representations and applications in a technological environment. In: Proceedings of INDRUM2016, 31 March–2 April, pp. 211–220. Montpellier (2016)

10. Gautestad, H.V.: Use of SimReal+ in mathematics at the university level. A case study of teacher’s orchestrations in relation to the usefulness of the tool for students. Master thesis. University of Agder, Kristiansand (2015)
11. Hogstad, N.M.: Use of SimReal+ in mathematics at the university level. A case study of students’ attitudes and challenges. Master thesis. University of Agder, Kristiansand (2012)
12. Curri, E.: Using computer technology in teaching and learning mathematics in an Albanian upper secondary school. The implementation of SimReal in trigonometry lessons. Master thesis. University of Agder, Kristiansand (2012)
13. Hadjerrouit, S., Gautestad, H.H.: Using the visualization tool SimReal to orchestrate mathematical teaching for engineering students. In: González, C.S., Castro, M., Nistal, M.L. (eds.) Proceedings of EDUCON 2018 - Emerging Trends and Challenges of Engineering Education. IEEE Global Engineering Education Conference, pp. 44–48 (2018)
14. Hadjerrouit, S.: Evaluating the interactive learning tool SimReal+ for visualizing and simulating mathematical concepts. In: Proceedings of the 12th International Conference on Cognition and Exploratory Learning in Digital Age (CELDA 2015), pp. 101–108 (2015)
15. Hadjerrouit, S.: Assessing the affordances of SimReal+ and their applicability to support the learning of mathematics in teacher education. Issues Inf. Sci. Inf. Technol. 14, 121–138 (2017)
16. Vygotsky, L.: Mind in Society. The Development of Higher Psychological Processes. Harvard University Press, Cambridge (1978)
17. Trouche, L.: Analysing the complexity of human/machine interactions in computerized learning environments: guiding students’ command process through instrumental orchestrations. Int. J. Comput. Math. Learn. 9, 281–307 (2004)
18. Gibson, J.J.: The Ecological Approach to Visual Perception. Houghton Mifflin, Boston (1977)
19. Osiurak, F., Rossetti, Y., Badets, A.: What is an affordance? 40 years later. Neurosci. Behav. Rev. 77, 403–417 (2017)
20. Norman, D.A.: The Psychology of Everyday Things. Basic Books, New York (1988)
21. Turner, P., Turner, S.: An affordance-based framework for CVE evaluation. In: People and Computers XVII – The Proceedings of the Joint HCI-UPA Conference, pp. 89–104. Springer, London (2002). https://doi.org/10.1007/978-1-4471-0105-5_6
22. Kirschner, P., Strijbos, J.-W., Kreijns, K., Beers, B.J.: Designing electronic collaborative learning environments. Educ. Technol. Res. Dev. 52(3), 47–66 (2004)
23. Chiappini, G.: Cultural affordances of digital artifacts in the teaching and learning of mathematics. In: Proceedings of ICTMT11 (2013)
24. DeLanda, E.: Intensive Science and Virtual Philosophy. Bloomsbury Academic, New York (2013)
25. Pierce, R., Stacey, K.: Mapping pedagogical opportunities provided by mathematical analysis software. Int. J. Math. Learn. 15, 1–20 (2010)
26. Brousseau, G.: Theory of Didactical Situations in Mathematics. Kluwer Academic Publishers, Boston (1997)
27. Hadjerrouit, S., Bronner, A.: An instrument for assessing the educational value of Aplusix (a + x) for learning school algebra. In: Searson, M., Ochoa, M. (eds.) Proceedings of Society for Information Technology and Teacher Education International Conference 2014, pp. 2241–2248. AACE, Chesapeake (2014)
28. Zbiek, R.M., Heid, M.K., Blume, G.W., Dick, T.P.: Research on technology in mathematics education, a perspective of constructs. In: Lester, F.K. (ed.) Second Handbook of Research on Mathematics Teaching and Learning, pp. 1169–1207. Information Age, Charlotte (2007)
29. Pythagoras theorem. http://grimstad.uia.no/perhh/phh/MatRIC/SimReal/no/SimRealP/AA_sim/Mathematics/Geometry/KeyWordIcon_Pythagoras_Exercise.htm. Accessed 12 June 2019

30. Square theorem. http://grimstad.uia.no/perhh/phh/MatRIC/SimReal/no/SimRealP/AA_sim/Mathematics/Basic/KeyWordIcon_Square_Exercise.htm. Accessed 11 June 2019
31. Bokhove, K., Drijvers, P.: Digital tools for algebra education: criteria and evaluation. Int. J. Math. Learn. 15, 45–62 (2010)
32. Burlamaqui, L., Dong, A.: The use and misuse of the concept of affordance. In: Gero, J.S., Hanna, S. (eds.) Design Computing and Cognition ’14, pp. 295–311. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-14956-1_17
33. Latour, B.: Reassembling the Social: An Introduction to Actor-Network-Theory. Oxford University Press, Oxford (2005)
34. Wright, S., Parchoma, G.: Technologies for learning? An actor-network theory critique of ‘affordances’ in research on mobile learning. Res. Learn. Technol. 19(3), 247–258 (2011)
35. De Freitas, E., Sinclair, N.: Mathematics and the Body: Material Entanglements in the Classroom. Cambridge University Press, New York (2014)
36. Withagen, R., Araújo, D., de Poel, H.J.: Inviting affordances and agency. New Ideas Psychol. 45, 11–18 (2017)

Feedback Preferences of Students Learning in a Blended Environment: Worked Examples, Tutored and Untutored Problem-Solving

Dirk T. Tempelaar1(B), Bart Rienties2, and Quan Nguyen3

1 School of Business and Economics, Maastricht University, Maastricht, The Netherlands
[email protected]
2 Institute of Educational Technology, Open University UK, Walton Hall, Milton Keynes, UK
[email protected]
3 School of Information, University of Michigan, Ann Arbor, MI, USA
[email protected]

Abstract. In contemporary technology-enhanced learning platforms that combine the learning of new concepts with the practicing of newly learned skills, students are offered multiple feedback options. Typically, a problem-solving exercise allows the option to check the correctness of the answer, to call hints that provide partial help in the sequence of problem-solving steps, or to call a fully worked-out example. This opens new opportunities for research into student learning tactics and strategies, leaving behind the traditional context of lab-based research following experimental design principles and moving towards the study of the revealed learning choices of students learning in authentic settings. In this empirical study, we apply multi-modal data consisting of logged trace data, self-report surveys, and learning performance data to investigate antecedents and consequences of the learning tactics and strategies applied by students learning introductory mathematics and statistics. We do so by distinguishing different learning profiles, determined by the intensity of using the platform and the relative amounts of examples and hints called. These learning profiles are related to prior knowledge and learning dispositions, as antecedents, and to course performance, as a consequence. One of our findings is that of ‘help abuse’: students who bypass the option to call for hints as concrete feedback in their problem-solving journey and instead opt for calling generic solutions of the problem: the worked examples. This help abuse is associated with prior knowledge and learning dispositions, but much less with course performance.

Keywords: Blended learning · Dispositional learning analytics · Learning strategies · Multi-modal data · Tutored problem-solving · Untutored problem-solving · Worked examples

1 Introduction

Research into student learning tactics and strategies has for a long time primarily relied on data collected through self-reports, or was based on think-aloud protocols. Both these

sources of educational data are open to biases: biased responses often present in self-reported perceptions, or biases due to the exclusion of naturalistic contexts from the analysis [6, 8, 9]. The increased use of blended learning and other forms of technology-enhanced education opened the way to measuring revealed learning strategies by collecting traces of students’ learning behaviors in digital learning platforms. This new opportunity of combining trace data with self-report data has boosted empirical research into learning tactics and strategies. Examples of such are [6] and research by Gašević and co-authors [8, 9]. This new research application aims to investigate relationships between learning strategies measured by trace data, learning approaches measured by self-reports as the antecedents of these learning strategies, and academic performance as the consequence of learning strategies. For instance, [8] finds that learning strategies are related to deep learning approaches, but not to surface learning approaches. In the experimental study [9], the role of instructional conditions and prior experience with technology-enhanced education is investigated. However, most of these studies do not take individual differences into account, as expressed in [9]: ‘Future studies should also account for the effects of individual differences – e.g., motivation to use technology, self-efficacy about the subject matter and/or technology, achievement goal orientation, approaches to learning, and metacognitive awareness’. Our paper aims to address this lack of empirical work incorporating individual differences by examining students’ learning strategies within a dispositional learning analytics context.

The Dispositional Learning Analytics (DLA) infrastructure, introduced by [7], combines learning data (generated in learning activities through technology-enhanced systems) with learner data (student dispositions, values, and attitudes measured through self-report surveys). Learning dispositions represent individual difference characteristics that affect all learning processes and include affective, behavioral, and cognitive facets [18]. Students’ preferred learning approaches are examples of such dispositions of both the cognitive and the behavioral type.

The current study builds on our previous DLA-based research [13, 19, 21, 22, 24–28]. One of our empirical findings in these studies was that traces of student learning in digital platforms show marked differences in the use of worked examples [13, 19, 21, 22, 24–28]. The merits of the worked examples principle [17] in the initial acquisition of cognitive skills are well documented. The use of worked solutions in multi-media based learning environments stimulates gaining deep understanding [17]. When compared to the use of erroneous examples, tutored problem solving, and problem solving in computer-based environments, the use of worked examples may be more efficient as it reaches similar learning outcomes in less time and with less learning effort. The mechanism responsible for this outcome is disclosed in [17, p. 400]: ‘examples relieve learners of problem-solving that – in initial cognitive skill acquisition when learners still lack understanding – is typically slow, error-prone, and driven by superficial strategies. When beginning learners solve problems, the corresponding demands may burden working memory capacities or even overload them, which strengthens learners’ surface orientation.
… When learning from examples, learners have enough working memory capacity for self-explaining and comparing examples by which abstract principles can be considered, and those principles are then related to concrete exemplars. In this way,

learners gain an understanding of how to apply principles in problem solving and how to relate problem cases to underlying principles’.

The current study is a follow-up study of [28]. That study investigated the antecedents and consequences of the learning strategies of worked examples, tutored and untutored problem solving by analyzing absolute amounts of calls for worked examples and hints, after excluding a large group of students with low activity levels in the digital learning environment. However, this group of relatively inactive students is of interest in itself, so in this follow-up study we sought an approach to include these low-activity learning behaviors. The solution was found by switching from absolute to relative measures of the intensity of strategy use. Empirical research based on measured learning behavior suggests that students may abuse help facilities available in digital learning environments by bypassing hints, which are more abstract, and going straight to concrete solutions [1–3, 20]. Analyzing the log behavior of students, distinguishing proper use and abuse of help facilities, makes it possible to create profiles of adaptive and maladaptive learning behaviors [20]; see also [14].

Following research by McLaren and co-authors [11, 12], we extend the range of preferred learning strategies taken into account to include, beyond worked examples, the tutored and untutored problem-solving strategies. In the tutored problem-solving strategy, students receive feedback in the form of hints and evaluation of provided answers, both during and at the end of the problem-solving steps. In untutored problem solving, feedback is restricted to the evaluation of provided answers at the end of the problem-solving steps [11, 12]. Evidence for the worked examples principle is typically based on laboratory-based experimental studies, in which the effectiveness of different instructional designs is compared [17]. McLaren and co-authors take the research into the effectiveness of several learning strategies a step in the direction of ecological validity by choosing an experimental design in a classroom context, assigning the alternative learning approaches of worked examples, tutored and untutored problem solving, and erroneous examples as the conditions of the experiment [11, 12]. In our research, we increase ecological validity one more step by offering a digital learning environment that encompasses all the learning strategies of worked examples, tutored and untutored problem solving, and observing the revealed preferences of the students in terms of the learning strategy they apply. In this naturalistic context, the potential contribution of LA-based investigations is that we can observe students’ revealed preferences for a specific learning strategy, how these preferences depend on the learning task at hand, and how these preferences link to other observations, such as individual difference characteristics. By doing so, we aim to derive a characterization of students who actively apply worked examples or tutored problem solving, and those not doing so.

In line with contemporary research into learning strategies applying trace data [8, 9], we adopt two research questions: 1) what are the antecedents of the learning strategies of using worked examples and tutored problem solving in terms of prior knowledge and learning dispositions?
And 2) what are the consequences of the learning strategies of using worked examples and tutored problem solving in terms of course performance as a learning outcome?

2 Methods

2.1 Context of the Empirical Study

This study takes place in a large-scale introductory course in mathematics and statistics for first-year students of a business administration and economics program in the Netherlands. The educational system can best be described as ‘blended’ or ‘hybrid’. The most important component is face-to-face: Problem-Based Learning (PBL), in small groups (14 students), coached by expert tutors (in 74 parallel tutor groups). Participation in the tutoring group meetings is required. The online component of the blend is optional: the use of the two e-tutorial platforms SOWISO (https://sowiso.nl/) and MyStatLab (MSL) [13, 19, 21, 22, 24–28]. This design is based on the philosophy of student-centered education, in which the responsibility for making educational choices lies primarily with the student. Since most of the learning takes place outside the classroom, during self-study through the e-tutorials or other learning materials, class time is used to discuss how to solve advanced problems. The educational format therefore shares most of the characteristics of the flipped-classroom design. The intensive use of the e-tutorials and the achievement of good scores in the e-tutorial practice modes are encouraged by giving performance bonus points in quizzes that are taken every two weeks and consist of items drawn from the same item pools that are used in the practice mode. This approach was chosen to encourage students with limited prior knowledge to make intensive use of the e-tutorials.

The subject of this study is the full 2018/2019 cohort of students (1035 students). The diversity of the student population was large: only 21% of the student population was educated in the Dutch secondary school system, compared to 79% educated in foreign systems, with 50 nationalities. A large part of the students had a European nationality, with only 4.0% of the students from outside Europe. Secondary education systems in Europe differ widely, particularly in the fields of mathematics and statistics. It is therefore crucial that this introductory module is flexible and allows for individual learning paths. On average, students spend 27 h of connect time in SOWISO and 32 h in MSL, which is 30% to 40% of the 80 h available to learn both subjects. Although students work in two e-tutorial platforms, this analysis will focus on student activity in one of them, SOWISO, because of the availability of fine-grained feedback data.

2.2 Instruments and Procedure

Both e-tutorial systems, SOWISO and MSL, follow a test-driven learning and practice approach. Each step in the learning process is initiated by a problem, and students are encouraged to (try to) answer each problem. If a student has not (fully) mastered a problem, he or she can ask for hints to solve the problem step by step, or ask for a fully worked-out example. Upon receipt of feedback, a new version of the problem is loaded (parameter based) to enable the student to demonstrate his or her newly acquired mastery. The revealed preferences of students for learning strategies are related to their learning dispositions, as we have shown in previous research [13, 19, 21, 24–28] for the use of elaborated examples in SOWISO, and for the use of elaborated examples in MSL [22]. This study expands [13, 26, 27] by examining three learning strategies in the SOWISO

tool: worked examples and supported and unsupported problem solving. This study is an immediate continuation of the [28] study, which was based on absolute numbers of worked examples and hints called for by the student. An important outcome of our previous study was that the intensity of practicing is the most important determinant of the demand for feedback, whatever form of feedback it may be. Differences in learning strategies only have a second-order effect. By analyzing the relative numbers of feedback requests rather than the absolute numbers in this study, it is expected that the role of strategies will become better visible. Figure 1 shows, in a sample problem, the implementation of the alternative feedback strategies from which students can choose:

Fig. 1. Sample of Sowiso problem with feedback options Check, Theory, Solution and Hint.

– Check: the unstructured problem-solving approach, which only provides correctness feedback after solving a problem;
– Hint: the tutored problem-solving approach, with feedback and tips to help the student with the different problem-solving steps;
– Solution: the worked examples approach;
– Theory: ask for a short explanation of the mathematical principle.

Our study combines trace data from the SOWISO e-tutorial with self-report data that measure learning dispositions, and course performance data. Clicks in the e-tutorial system represent an important part of that trace data, and in that respect our research design

56

D. T. Tempelaar et al.

is aligned with the research of Amo-Filvà and co-authors [4, 5] who use a tool called Clickstream to describe click behavior in the digital learning environment. However, trace data can be more than just click data. Azevedo [6] distinguishes between trace data of product type and process type, where click data is part of the process data category. In this study, we will combine both process data, such as the clicks to initiate the learning support mentioned above of Check, Hint, Solution and Theory, as well as product data, such as the mastery in the tool, as discussed below. SOWISO reporting options for trace data are very broad, which requires making selections from the data. First, all dynamic trace data were aggregated over time, to arrive at static, full course period accounts of trace data. Secondly, a selection was made from the wide range of trace variables by focusing on the process variables that are most closely related to the alternative learning strategies. A total of four trace variables were selected: – Mastery in the tool, the proportion of the exercises that have been successfully solved as a product indicator; – Attempts: total number of attempts at individual exercises; – Hints: the relative number of hints called, as the number of hints per attempt; – Examples: the relative number of examples called, as number of examples per attempt. The next step in the analysis is to create profiles of learning behavior by distinguishing different patterns of student learning in the e-tutorials, as in [4, 5]. Instead of applying advanced statistical techniques to create different profiles of the use of worked examples, such as in Gaševi´c and co-authors [8, 9] or in [26, 27], the student population has been split into three subgroups based on the scores of three trace variables: Attempts, the relative number of Examples, and the relative number of Hints. For each of the trace variables Attempts, Examples and Hints, we thus obtain the subgroups Low, Middle and High. Table 1 shows the statistics of the 27 different profiles resulting from the subdivision of each of the three trace variables. The operationalization of the revealed preferences for learning strategies follows these three subgroups divisions. The revealed preference for the strategy of the worked examples is operationalized as the intensity of calling worked examples, relative to the number of attempts to solve a problem. Similarly, the revealed preference for the tutored problem-solving strategy is operationalized as the intensity of calling hints, relative to the number of attempts to solve a problem. As is clear from Table 1, the revealed preferences for learning strategies are related. The combination of a high number attempts and a high number of examples but a low number of hints is populated by 101 students, in contrast to no more than 12 students populating the combination of high-attempts but low-hints and examples. The strategy of untutored problem solving is a necessary part of any of the revealed preferences since students can only build mastery through untutored problem solving: a problem solved with hints or a worked example does not count towards mastery. Since high mastery scores are important to students because of achieving bonus scores, the system contains a stimulus to call hints and worked examples only when there is a great need for it. This explains the relatively low number of hints called for, as visible from Table 1.
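To make the construction of these measures concrete, the following minimal sketch shows how the four trace variables might be derived from aggregated SOWISO logs; the DataFrame and its column names are hypothetical stand-ins for illustration only, not a description of the platform's actual export format.

```python
import pandas as pd

# Hypothetical per-student aggregates of SOWISO trace data over the full course period;
# all column names and values are illustrative.
logs = pd.DataFrame({
    "student_id": [1, 2, 3],
    "attempts": [227, 784, 1431],          # total attempts at individual exercises
    "hints_called": [0, 15, 3],            # absolute number of hints requested
    "examples_called": [38, 291, 784],     # absolute number of worked examples requested
    "exercises_mastered": [54, 178, 238],  # exercises solved in untutored mode
    "exercises_total": [250, 250, 250],    # exercises available in the tool
})

# Product indicator: mastery = proportion of exercises successfully solved.
logs["mastery"] = logs["exercises_mastered"] / logs["exercises_total"]

# Process indicators: relative feedback use, i.e. hints and examples per attempt.
logs["hints"] = logs["hints_called"] / logs["attempts"]
logs["examples"] = logs["examples_called"] / logs["attempts"]

print(logs[["student_id", "mastery", "attempts", "examples", "hints"]])
```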


Table 1. Descriptive statistics per learning profile of Attempts (Att), Examples (Ex) and Hints, with subgroups indicated by Lo (low), Mi (middle) and Hi (high).

Subgroup              N     Mastery   Attempts   Examples   Hints
AttLo/ExLo/HintsLo    47    21.6%     227        0.168      0.0017
AttLo/ExLo/HintsMi    53    28.9%     325        0.198      0.0191
AttLo/ExLo/HintsHi    86    27.5%     263        0.206      0.1150
AttLo/ExMi/HintsLo    33    35.0%     401        0.358      0.0023
AttLo/ExMi/HintsMi    39    32.8%     396        0.359      0.0157
AttLo/ExMi/HintsHi    36    28.7%     332        0.366      0.1150
AttLo/ExHi/HintsLo    20    40.1%     402        0.514      0.0018
AttLo/ExHi/HintsMi    15    24.5%     314        0.499      0.0184
AttLo/ExHi/HintsHi    16    36.6%     334        0.476      0.0940
AttMi/ExLo/HintsLo    21    90.6%     776        0.230      0.0023
AttMi/ExLo/HintsMi    40    80.1%     722        0.236      0.0187
AttMi/ExLo/HintsHi    56    87.2%     746        0.233      0.1244
AttMi/ExMi/HintsLo    44    80.1%     787        0.368      0.0032
AttMi/ExMi/HintsMi    60    71.5%     784        0.371      0.0191
AttMi/ExMi/HintsHi    39    71.9%     772        0.358      0.1223
AttMi/ExHi/HintsLo    40    61.0%     774        0.499      0.0029
AttMi/ExHi/HintsMi    25    67.6%     775        0.525      0.0182
AttMi/ExHi/HintsHi    20    71.5%     779        0.479      0.1028
AttHi/ExLo/HintsLo    12    89.8%     1171       0.245      0.0017
AttHi/ExLo/HintsMi    14    95.2%     1149       0.215      0.0169
AttHi/ExLo/HintsHi    16    98.4%     1106       0.215      0.0774
AttHi/ExMi/HintsLo    28    97.7%     1143       0.381      0.0024
AttHi/ExMi/HintsMi    34    94.5%     1193       0.388      0.1078
AttHi/ExMi/HintsHi    32    97.4%     1164       0.385      0.1157
AttHi/ExHi/HintsLo    101   95.0%     1431       0.548      0.0024
AttHi/ExHi/HintsMi    65    96.2%     1375       0.529      0.0165
AttHi/ExHi/HintsHi    43    93.6%     1352       0.527      0.0964

In this study, we focus on a selection of self-report surveys for measuring students' learning dispositions. More than a dozen instruments have been administered, ranging from affective learning emotions to cognitive learning processing strategies:

– Epistemological self-theories of intelligence;
– Epistemological views on the role effort plays in learning;
– Epistemic learning emotions;
– Cognitive learning processing strategies;
– Metacognitive learning regulation strategies;
– Subject-specific (mathematics & statistics) learning attitudes;
– Academic motivations;
– Achievement goals;
– Achievement orientations;
– Learning activity emotions;
– Motivation & engagement constructs;
– National cultural values; and
– Help-seeking behavior.

The most important self-report instruments for measuring learning approaches used in this study are briefly described in the following paragraphs. For more detailed coverage, we refer to earlier studies by the authors [13, 19, 21, 22, 24–28]. The description of the research results will focus on specific aspects of learning dispositions: learning processing and regulation, aspects of students' attitudes, motivation and engagement, and activity emotions.

Course performance data are based on the final written exam and the three intermediate quizzes. The quiz scores are averaged, and for both the exam and the quiz score we focus on the topic score for mathematics, in line with the focus on the math e-tutorial SOWISO. That results in MathExam and MathQuiz as the relevant performance indicators. On the first day of the course, students take a diagnostic entry test, of which the score is indicated by MathEntry.

Learning Approaches. Students' learning approaches are measured with Vermunt's Inventory of Learning Styles (ILS) instrument [29]. Our study focused on two of the four domains of the ILS: cognitive processing strategies and metacognitive control strategies. The instrument distinguishes three different processing strategies: deep approaches to learning, step-wise or surface approaches to learning, and concrete or strategic approaches to learning, as well as three regulatory strategies: self-regulation, external regulation and lack of regulation.

Attitudes to Learning. The attitude towards learning mathematics and statistics was assessed using the SATS instrument [23]. The instrument contains six quantitative-methods-related learning attitudes:

– Affect: students' feelings about mathematics and statistics;
– CognComp: students' self-perceptions of their intellectual knowledge and skills when applied to mathematics and statistics;
– Value: students' attitude towards the usefulness, relevance and value of mathematics and statistics in their personal and professional lives;
– Difficulty: students' perception that mathematics and statistics as subjects are not difficult to learn;
– Interest: students' individual interest in learning mathematics and statistics;
– Effort: the amount of work that students are willing to do to learn the subjects.


Motivation and Engagement Wheel. The Motivation and Engagement Wheel instrument [10] breaks down learning cognitions and learning behaviors into four categories, crossing adaptive versus maladaptive types with cognitive versus behavioral types. Self-belief, the value of school (ValueSchool), and learning focus (LearnFocus) shape the adaptive, cognitive factors, or cognitive boosters. Planning, task management (TaskManagm), and Persistence shape the behavioral boosters. The mufflers, the maladaptive, cognitive factors, are Anxiety, failure avoidance (FailAvoid), and uncertain control (UncertainCtrl), while self-sabotage (SelfSabotage) and Disengagement are the maladaptive, behavioral factors, or guzzlers.

Learning Activity Emotions. The Control-Value Theory of Achievement Emotions (CVTAE, [15]) postulates that emotions that arise in learning activities differ in valence, focus, and activation. Emotional valence can be positive (enjoyment) or negative (anxiety, hopelessness, boredom). CVTAE describes emotions experienced about an achievement activity (e.g. boredom experienced while preparing homework) or an outcome (e.g. anxiety about performing at an exam). The activation component describes emotions as activating (i.e. anxiety leading to action) versus deactivating (i.e. hopelessness leading to disengagement). From the Achievement Emotions Questionnaire (AEQ, [16]) measuring learning emotions, we selected four scales: the positive activating emotion Enjoyment, the negative activating emotion Anxiety, the neutral deactivating emotion Boredom, and the negative deactivating emotion Hopelessness. In addition, Academic Control is included as the antecedent of all learning emotions. Different from the other factors described above, learning activity emotions are not only a learning disposition but also an outcome of the learning process. For that reason, activity emotions were measured halfway through the course, whereas all other disposition variables were measured at the start of the course.

2.3 Statistical Analyses

The full sample is split into three equal-sized groups for each of the following trace variables: Attempts, Examples and Hints. This results in 27 subsamples in total, with sample sizes ranging from 12 to 101: see Table 1 for the descriptives of what we will label the different profiles of tool use. The choice for splitting into three groups (low, middle, high) has a practical reason: a quartile split would result in too many profiles with small sample sizes, whereas a median split misses the fine-grained outcomes currently available.

In our previous study [28], working with absolute measures of the intensity of feedback use, we restricted the sample to students achieving at least 70% of tool mastery. This was done to make fair comparisons: it is difficult to compare 20 calls for a worked example by a student who did all exercises with 20 such calls by a student who practiced only a small part of the content. Now that we are working with relative measures, there is no reason to exclude infrequent practicing. From Table 1 it is clear that most if not all students in the low-Attempts profiles would have been excluded when applying this 70% mastery requirement: the highest average mastery level amongst the nine low-Attempts profiles equals 40.1%. Including these profiles in this study implies a more diverse sample than in the previous study.

All of our analyses consist of 3-way ANOVAs with demographics, learning dispositions and course performance as response variables, and the 3×3×3 grouping of Attempts, Examples and Hints as explanatory factors, including their interaction terms. Although different statistical methods could have been used to analyze these data, such as regression analysis, we opted for ANOVAs because they allow for a straightforward graphical presentation of the outcomes. Lastly, we opted for a uniform presentation of these outcomes, showing the effects of Attempts and Examples and omitting the effects of Hints: in most analyses the role of Hints is very modest, and the graph of the 3×3 structure of Attempts and Examples is the most informative.
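A minimal sketch of this analysis pipeline is given below, assuming synthetic stand-in data; in the actual study, the response variables are the demographics, dispositions and performance scores described above. The tertile split and the 3-way ANOVA are implemented here with pandas and statsmodels as one possible realization, not necessarily the software used by the authors.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Synthetic stand-in data; in the study these columns come from the SOWISO trace
# aggregates and the course administration (math_entry = diagnostic entry-test score).
rng = np.random.default_rng(0)
n = 1035
df = pd.DataFrame({
    "attempts": rng.integers(100, 1500, n),
    "examples": rng.uniform(0.1, 0.6, n),
    "hints": rng.uniform(0.0, 0.13, n),
    "math_entry": rng.normal(10, 3, n),
})

# Split every trace variable into three equal-sized groups (Lo/Mi/Hi), as in Table 1.
for var in ["attempts", "examples", "hints"]:
    df[f"{var}_grp"] = pd.qcut(df[var], q=3, labels=["Lo", "Mi", "Hi"])

# 3-way ANOVA with all interaction terms (here with the entry-test score as response).
model = ols("math_entry ~ C(attempts_grp) * C(examples_grp) * C(hints_grp)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))

# Estimated marginal means of the 3x3 Attempts-by-Examples structure used in the figures.
print(df.groupby(["attempts_grp", "examples_grp"], observed=True)["math_entry"].mean().unstack())
```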

3 Results

3.1 Previous Research

In previous research [13, 21, 22, 24–28], we examined the role of worked examples in learning analytics applications and found that several dispositional constructs predict the use of worked examples as a learning strategy. Demographic variables, student learning approaches, learning attitudes and learning emotions influenced the use of worked examples, with effect sizes up to 7% for individual dispositions. In our profiling study [26] we found the use of worked examples and the total number of attempts to be the two variables that determine the most characteristic differences between the different profiles of e-tutorial use. The use of hints did not greatly contribute to the creation of the students' usage profiles. It is, therefore, expected that dispositions will play a less important role in explaining the use of hints as a learning strategy than in explaining the use of worked examples. In what follows, we will not systematically report the analyses of all dispositional variables (which would go beyond the size limits of this contribution), but focus on the variables for which the antecedent-consequence relationship is most visible.

3.2 Absolute and Relative Measures of Feedback Use

A first step in the analysis is to confirm that relative measures of feedback use differ enough in nature from absolute measures to expect outcomes that differ from those in [28]. Table 2 provides the correlations of Attempts with the relative measures, Examples and Hints, and with the absolute measures, ExamplesAbs and HintsAbs. Although Examples is still strongly related to Attempts, the change from absolute to relative measures diminishes the explained variation from 83% to 34%, and the weak positive correlation between Attempts and Hints changes into a weak negative correlation, suggesting a truly different context.

3.3 Demographics

Relationships between demographic variables and the profiling of students are weak at best. Profile differences in Attempts, Examples and Hints explain 0.5% of the variation in sex, 3.9% of the variation in international status, and 6.7% of the variation in prior math education. In the ANOVAs of sex and international status, Attempts is the only statistically significant main factor. In the ANOVA of prior math education, all main factors are statistically significant, with the largest role for the grouping of Examples.


Table 2. Correlations between Attempts, Examples and Hints (relative measures of feedback use), ExamplesAbs, and HintsAbs (absolute measures of feedback use).

                  1           2           3           4          5
1. Attempts       1.000
2. Examples       .583***     1.000
3. Hints          −.152***    −.171***    1.000
4. ExamplesAbs    .912***     .741***     −.167***    1.000
5. HintsAbs       .170***     .018        .826***     −.087**    1.000

Note: **: p < .01; ***: p < .001
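The percentages quoted in Sect. 3.2 can presumably be traced back to Table 2 by squaring the corresponding correlations, since the shared variance of two variables equals their squared correlation:

\[
r_{\mathrm{Attempts,\,ExamplesAbs}}^{2} = 0.912^{2} \approx 0.83, \qquad
r_{\mathrm{Attempts,\,Examples}}^{2} = 0.583^{2} \approx 0.34 .
\]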

3.4 Diagnostic Entry-Test

The way profile differences in Attempts, Examples and Hints relate to the MathEntry test score mirrors how they relate to prior math education. The ANOVA of the MathEntry score indicates 8.6% of explained variation by the three main factors and the interaction term of Attempts and Examples. Figure 2 describes that relationship.

Fig. 2. ANOVA outcomes of MathEntry score explained by Attempts, Examples and Hints groupings. Estimated marginal means of Attempts and Examples groups.

Post-hoc tests indicate a lack of statistically significant differences between the Attempts groups and between the Hints groups, but significant differences between all Examples profiles. The relevance of the interaction effect is visible from Fig. 2: students in the low-Examples profiles have the highest prior knowledge, and students in the high-Examples profiles have the lowest prior knowledge, except for the profile with low Attempts.


3.5 Learning Approaches

Although the use of learning strategies is the explicit focus of the learning-approaches framework, tool use profiles explain little variation in the ANOVAs of the separate cognitive processing strategies and metacognitive regulation strategies. Amongst the processing strategies, the relationship with step-wise learning is strongest, but it is limited to explaining no more than 0.6% of the variation in that strategy. Attempts is the only statistically significant factor, with higher numbers of attempts corresponding to a stronger inclination to learn step-wise. Of the variation in External regulation, 5.2% is explained by tool use profiles, making it the regulation strategy most strongly related to the profiles. Again, Attempts is the single factor with a statistically significant impact in the ANOVA, with higher numbers of attempts corresponding to a stronger inclination to regulate learning with the help of external cues.

3.6 Learning Attitudes

The pattern for learning attitudes is very different, both because the levels of explained variation are higher and because profile factors other than Attempts are the main predictors. As an example, Fig. 3 shows the relationship between user profiles and the attitudinal variable Interest. Profiles explain 6.6% of the variation in Interest. All main effects of Attempts, Examples and Hints are statistically significant, as are interaction effects, but the largest effect is that of Examples: using fewer examples goes with higher levels of interest in learning mathematics and statistics.

Fig. 3. ANOVA outcomes for Interest score explained by Attempts, Examples and Hints groupings. Depicted: estimated marginal means of Attempts and Examples groups.


3.7 Motivation and Engagement

The learning motivation and engagement constructs demonstrate nuanced patterns similar to those of the attitudes: both the Attempts and the Examples categories explain levels of the dispositional constructs, with different effect mechanisms. Two behavioral constructs are detailed below: the adaptive construct Persistence, of which 9.6% is explained by the use profiles, and the maladaptive construct Self-sabotage, with 6.5% explained variation. Figure 4 contains the details of these two patterns.

Fig. 4. ANOVA outcomes for Persistence score (left panel) and Self-sabotage score (right panel), explained by Attempts, Examples and Hints groupings. Depicted: estimated marginal means of Attempts and Examples groups.

The role of Attempts is unambiguous: more attempts predict higher levels of Persistence and lower levels of Self-sabotage. The role of Examples is, however, more complex: it is only the profile with the lowest number of Examples that stands out in predicting Persistence. Self-sabotage is described by a rather unique pattern: lower levels of Attempts combined with higher levels of Examples predict higher levels of Self-sabotage.

3.8 Learning Activity Emotions

The valence of learning activity emotions, positive or negative, determines the pattern in the relationship between the Examples grouping and the level of the emotion. The negative emotions Anxiety, Boredom and Hopelessness demonstrate higher levels of the emotion for lower levels of Examples. The reverse pattern describes the positive emotion Enjoyment. The Attempts grouping is described by a more complex pattern. Boredom and Enjoyment show a similar, consistent pattern: higher levels of Attempts go with lower levels of Boredom and higher levels of Enjoyment; see Fig. 5 for the pattern of Boredom. Anxiety and Hopelessness demonstrate a different pattern: the lowest levels are reached for the middle category of Attempts. For all four emotions, only the main effects of Attempts and Examples are statistically significant, and the explained variation ranges from 5.7% (Anxiety) to 9.0% (Boredom).

Fig. 5. ANOVA outcomes for learning activity emotion Boredom, explained by Attempts, Examples and Hints groupings. Depicted: estimated marginal means of Attempts and Examples groups.

3.9 Tool Mastery

As explained in the methods section, the inclusion of all students active in the e-tutorial implies a sample that is very heterogeneous with regard to Tool mastery. Since tool mastery is strongly connected to the number of Attempts, we find that heterogeneity back in Fig. 6, which depicts the relationships of the Attempts and Examples profiles with Tool mastery. Of the three main effects, only that of Attempts is statistically significant: higher levels of Attempts go with higher levels of Tool mastery. The main effect of Examples is not significant, because the positive effect of calling more Examples on Tool mastery in the low-Attempts profile is balanced by the negative effect of calling more Examples in the middle-Attempts profile. There is, however, a strongly significant interaction effect. Altogether, tool use profiles explain 71.0% of the variation in Tool mastery.

3.10 Course Performance

The last two analyses relate tool use profiles to the two course performance measures: the Quiz and Exam scores for the mathematical topic in the course.


Fig. 6. ANOVA outcomes for Tool mastery, explained by Attempts, Examples and Hints groupings. Depicted: estimated marginal means of Attempts and Examples groups.

Here, we find the three main effects to be statistically significant, as well as the interaction of Attempts and Examples, with explained variation of 17.2% and 31.2%, respectively. Figure 7 describes the main effects as well as the interaction: more practicing is beneficial for quiz and exam scores, calling more Examples has the opposite effect, but its effect is not equal for all levels of Attempts.

Fig. 7. ANOVA outcomes for Math Quiz score (left panel) and Math Exam score (right panel), explained by Attempts, Examples and Hints groupings. Depicted: estimated marginal means of Attempts and Examples groups.


4 Discussion

In this study, we investigated the antecedents and consequences of students' use of the learning strategies of worked examples and hints. We did so in the context of relative measures of the intensity of use of both types of feedback: the number of worked examples called per attempt and the number of hints called per attempt, where we included attempts in the analyses as a measure of overall student activity in the e-tutorial. Doing so makes this study a follow-up of [28], which investigated the same research questions, but in the context of absolute measures of intensity of use of both types of feedback and excluding students with low activity levels.

Our first finding refers to the limited role of Hints, or the tutored problem-solving strategy. First, in terms of size: students use on average slightly less than four examples per ten attempts but slightly more than four hints per hundred attempts. That is nearly ten times as many examples as hints. In itself, this is a first indication of the presence of the 'abuse of help' phenomenon: students are inclined to skip hints, which provide partial help, and go straight to the examples, which provide full help. However, the limited role of Hints extends beyond the counts. Hints play no, or at most very weak, roles in all antecedent and consequence relationships; these relationships are dominated by Attempts and Examples. Since there is quite some variability in the use of hints, with students in the high-Hints groups using more than 50 times as many hints as students in the low-Hints groups, the absence of a role for Hints in the relationships also signals that this 'abuse of help' phenomenon does not hamper effective learning. It is for these reasons that the reporting of these antecedent and consequence relationships has focused on the roles of attempts and examples.

The role of Attempts and Examples is more complex. In the consequence relationships, Attempts is positively related to Tool mastery, MathQuiz and MathExam: higher levels of activity are associated with higher performance levels. MathQuiz and MathExam tell the story that students who need more examples relative to the number of attempts tend to perform less well. However, that is not true for the low-Attempts group when explaining MathExam, and not true for both the low-Attempts and high-Attempts groups when explaining Tool mastery.

When we focus on the antecedent relationships with the learning disposition constructs, we have two crucial findings. First, it is not the intensity of using the tool, Attempts, that is the main predictor, but Examples that has the largest contribution in distinguishing the groups. Second, Examples is positively associated with maladaptive dispositions and negatively associated with adaptive dispositions: low levels of Self-sabotage and Boredom go with few Examples, and high levels of Interest and Persistence come with few Examples. This pattern is shared with our indicator of prior knowledge, the score on the MathEntry test. The highest levels of the two maladaptive learning dispositions, Self-sabotage and learning Boredom, and the lowest level of the adaptive learning disposition Interest were found in the profile of low Attempts and high Examples. In our previous research, which excluded the least active students from the analysis, this profile was missing for the most part. However, it is exactly this profile that contributes most to the 'help abuse' phenomenon.

'Bad student use' is not synonymous with bad performance; at least, not always. When investigating the relationship between prior knowledge, expressed by the MathEntry test score, and the use of Examples, we concluded that in all Attempts categories, students in the low-Examples profiles have the highest average test score. Apparently, part of the 'help abuse' is by students with high levels of initial mastery, who just check a sequence of worked examples to find out where the borders of their mastery are positioned and where they should start with actual practicing. This type of 'help abuse' appears to be a rather efficient learning strategy. The downside of the current analysis is that we cannot easily distinguish the 'good' from the 'bad' student use in this 'help abuse'.

5 Conclusion

Existing studies on the efficiency of alternative learning strategies, both in labs [17] and in classrooms [11, 12], point in the direction of worked examples being superior to tutored and untutored problem solving as instructional technology. These are generic conclusions that do not distinguish between types of academic tasks and types of students. The most important contribution of this research is its emphasis on individual student preferences: when taking the digital learning environment out of the lab and bringing it to an authentic context where students themselves decide what learning scaffolds to use, we observe large differences in the intensity with which students use the different learning modes: worked examples, hints as part of tutored problem solving, or untutored problem solving. These large differences are associated with individual differences in prior knowledge and learning dispositions; therefore, it requires an observational rather than an experimental type of study to discover these different learning profiles based on individual differences in knowledge and learning dispositions.

Transferring the findings of [11, 12, 14] to our context suggests that the superiority of the worked-examples strategy could be the result of the tasks offered to the participants of those studies being of such a type that the students had little or no prior knowledge. Our context is different: given the wide variety of tasks and the great diversity of prior knowledge of students, there is a wide range of relevant prior knowledge levels for each task. In such a context, where students are expected to demonstrate mastery, a mastery that can only be acquired in the untutored problem-solving mode, the use of examples and hints is inevitably a detour from the most direct route to mastery, and thus inefficient. That route of using tutored problem solving and worked examples is taken by students who consider the direct route of untutored problem solving to be, as yet, impassable, which explains the relationship with prior knowledge.

Our study is based on creating a taxonomy of learning behaviors by measuring trace data generated by students' activities in e-tutorials. This taxonomy confirms the concept of 'help abuse' developed by [2, 20]. "The ideal student behaves as follows: If, after spending some time thinking about a problem-solving step, a step does not look familiar, the student should ask the tutor for a hint." [2, p. 229]. However, not all students are ideal, and some of these 'non-ideal' students will, instead of trying to solve problems by asking for hints, bypass these hints and immediately ask for complete solutions. In our previous research [28], we found some evidence of this 'help abuse'. That evidence was less powerful than what we found in this study, due to the choice of analyzing absolute measures of the intensity of use of examples and hints and of excluding the least active students from the analysis. In this study, analyzing relative measures of feedback use without excluding students with low activity levels, we demonstrate that the 'help abuse' phenomenon is of crucial importance in explaining differences in student profiles, and that it is clearly connected to differences in learning dispositions. In other words, the concepts of 'good and bad student use' as introduced in [20] are associated with individual differences in learning approaches.

We also confirm the findings of, e.g., [4, 5] that traces of learning processes are useful data sources for profiling learning behavior. At the same time, these data capture only part of the learning process. In other words, the most important limitation of this research approach is that learning that takes place outside the traced e-tutorials is not observed. The current research focuses on individual differences amongst students in their preferences for learning strategies and the relationship of these preferences with learning dispositions. In future research, we also want to include the task dimension, by investigating students' preferences for learning strategies as a function of both individual differences and task characteristics.

References

1. Aleven, V., McLaren, B.M., Koedinger, K.R.: Towards computer-based tutoring of help-seeking skills. In: Karabenick, S., Newman, R. (eds.) Help Seeking in Academic Settings: Goals, Groups, and Contexts, pp. 259–296. Erlbaum, Mahwah (2006)
2. Aleven, V., McLaren, B., Roll, I., Koedinger, K.: Toward tutoring help seeking. In: Lester, J.C., Vicari, R.M., Paraguaçu, F. (eds.) ITS 2004. LNCS, vol. 3220, pp. 227–239. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30139-4_22
3. Aleven, V., Roll, I., McLaren, B.M., Koedinger, K.R.: Help helps, but only so much: research on help seeking with intelligent tutoring systems. Int. J. Artif. Intell. Educ. 26(1), 205–223 (2016). https://doi.org/10.1007/s40593-015-0089-1
4. Amo, D., Alier, M., García-Peñalvo, F.J., Fonseca, D., Casañ, M.J.: Learning analytics to assess students' behavior with scratch through clickstream. In: Conde, M.Á., Fernández-Llamas, C., Guerrero-Higueras, Á.M., Rodríguez-Sedano, F.J., Hernández-García, Á., García-Peñalvo, F.J. (eds.) Proceedings of the Learning Analytics Summer Institute Spain 2018 – LASI-SPAIN 2018, pp. 74–82. CEUR-WS.org, Aachen (2018)
5. Amo-Filvà, D.A., Alier Forment, M., García-Peñalvo, F.J., Fonseca-Escudero, D., Casañ, M.J.: Clickstream for learning analytics to assess students' behaviour with Scratch. Fut. Gener. Comput. Syst. 93, 673–686 (2019). https://doi.org/10.1016/j.future.2018.10.057
6. Azevedo, R., Harley, J., Trevors, G., Duffy, M., Feyzi-Behnagh, R., Bouchet, F., Landis, R.: Using trace data to examine the complex roles of cognitive, metacognitive, and emotional self-regulatory processes during learning with multi-agent systems. In: Azevedo, R., Aleven, V. (eds.) International Handbook of Metacognition and Learning Technologies. SIHE, vol. 28, pp. 427–449. Springer, New York (2013). https://doi.org/10.1007/978-1-4419-5546-3_28
7. Shum, S.B., Crick, R.D.: Learning dispositions and transferable competencies: pedagogy, modelling and learning analytics. In: Shum, S.B., Gasevic, D., Ferguson, R. (eds.) Proceedings of the 2nd International Conference on Learning Analytics and Knowledge, pp. 92–101. ACM, New York (2012). https://doi.org/10.1145/2330601.2330629
8. Gašević, D., Jovanović, J., Pardo, A., Dawson, S.: Detecting learning strategies with analytics: links with self-reported measures and academic performance. J. Learn. Anal. 4(1), 113–128 (2017). https://doi.org/10.18608/jla.2017.42.10
9. Gašević, D., Mirriahi, N., Dawson, S., Joksimović, S.: Effects of instructional conditions and experience on the adoption of a learning tool. Comput. Hum. Behav. 67, 207–220 (2017). https://doi.org/10.1016/j.chb.2016.10.026
10. Martin, A.J.: Examining a multidimensional model of student motivation and engagement using a construct validation approach. Br. J. Educ. Psychol. 77(2), 413–440 (2007). https://doi.org/10.1348/000709906X118036
11. McLaren, B.M., van Gog, T., Ganoe, C., Karabinos, M., Yaron, D.: The efficiency of worked examples compared to erroneous examples, tutored problem solving, and problem solving in classroom experiments. Comput. Hum. Behav. 55, 87–99 (2016). https://doi.org/10.1016/j.chb.2015.08.038
12. McLaren, B.M., van Gog, T., Ganoe, C., Yaron, D., Karabinos, M.: Exploring the assistance dilemma: comparing instructional support in examples and problems. In: Trausan-Matu, S., Boyer, K.E., Crosby, M., Panourgia, K. (eds.) ITS 2014. LNCS, vol. 8474, pp. 354–361. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07221-0_44
13. Nguyen, Q., Tempelaar, D.T., Rienties, B., Giesbers, B.: What learning analytics based prediction models tell us about feedback preferences of students. Q. Rev. Distance Educ. 17(3), 13–33 (2016). In: Amirault, R., Visser, Y. (eds.) e-Learners and their Data, Part 1: Conceptual, Research, and Exploratory Perspectives
14. Papamitsiou, Z., Economides, A.: Learning analytics and educational data mining in practice: a systematic literature review of empirical evidence. Educ. Technol. Soc. 17(4), 49–64 (2014)
15. Pekrun, R.: The control-value theory of achievement emotions: assumptions, corollaries, and implications for educational research and practice. Educ. Psychol. Rev. 18(4), 315–341 (2006). https://doi.org/10.1007/s10648-006-9029-9
16. Pekrun, R., Götz, T., Frenzel, A.C., Barchfeld, P., Perry, R.P.: Measuring emotions in students' learning and performance: the achievement emotions questionnaire (AEQ). Contemp. Educ. Psychol. 36, 36–48 (2011). https://doi.org/10.1016/j.cedpsych.2010.10.002
17. Renkl, A.: The worked examples principle in multimedia learning. In: Mayer, R.E. (ed.) The Cambridge Handbook of Multimedia Learning, pp. 391–412. Cambridge University Press, Cambridge (2014)
18. Rienties, B., Cross, S., Zdrahal, Z.: Implementing a learning analytics intervention and evaluation framework: what works? In: Kei Daniel, B. (ed.) Big Data and Learning Analytics in Higher Education. LNCS, pp. 147–166. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-06520-5_10
19. Rienties, B., Tempelaar, D., Nguyen, Q., Littlejohn, A.: Unpacking the intertemporal impact of self-regulation in a blended mathematics environment. Comput. Hum. Behav. (2019). https://doi.org/10.1016/j.chb.2019.07.007
20. Shih, B., Koedinger, K.R., Scheines, R.: A response time model for bottom-out hints as worked examples. In: Baker, R.S.J.D., Barnes, T., Beck, J. (eds.) Proceedings of the 1st International Conference on Educational Data Mining, EDM 2008, Montreal, Canada, pp. 117–126 (2008)
21. Tempelaar, D.T., Cuypers, H., Van de Vrie, E., Heck, A., Van der Kooij, H.: Formative assessment and learning analytics. In: Suthers, D., Verbert, K. (eds.) Proceedings of the 3rd International Conference on Learning Analytics and Knowledge, pp. 205–209. ACM, New York (2013). https://doi.org/10.1145/2460296.2460337
22. Tempelaar, D.: How dispositional learning analytics helps understanding the worked-example principle. In: Sampson, D.G., Spector, J.M., Ifenthaler, D., Isaias, P. (eds.) Proceedings of the 14th International Conference on Cognition and Exploratory Learning in Digital Age (CELDA 2017), pp. 117–124. International Association for Development of the Information Society, IADIS Press (2017)
23. Tempelaar, D.T., Gijselaers, W.H., Schim van der Loeff, S., Nijhuis, J.F.H.: A structural equation model analyzing the relationship of student achievement motivations and personality factors in a range of academic subject-matter areas. Contemp. Educ. Psychol. 32(1), 105–131 (2007). https://doi.org/10.1016/j.cedpsych.2006.10.004
24. Tempelaar, D.T., Rienties, B., Mittelmeier, J., Nguyen, Q.: Student profiling in a dispositional learning analytics application using formative assessment. Comput. Hum. Behav. 78, 408–420 (2018). https://doi.org/10.1016/j.chb.2017.08.010
25. Tempelaar, D.T., Rienties, B., Giesbers, B.: In search for the most informative data for feedback generation: learning analytics in a data-rich context. Comput. Hum. Behav. 47, 157–167 (2015). https://doi.org/10.1016/j.chb.2014.05.038
26. Tempelaar, D.T., Rienties, B., Nguyen, Q.: Towards actionable learning analytics using dispositions. IEEE Trans. Learn. Technol. 10(1), 6–16 (2017). https://doi.org/10.1109/TLT.2017.2662679
27. Tempelaar, D.T., Rienties, B., Nguyen, Q.: Adding dispositions to create pedagogy-based learning analytics. Zeitschrift für Hochschulentwicklung, ZFHE 12(1), 15–35 (2017)
28. Tempelaar, D., Rienties, B., Nguyen, Q.: Analysing the use of worked examples and tutored and untutored problem-solving in a dispositional learning analytics context. In: Lane, H., Zvacek, S., Uhomoibhi, J. (eds.) Proceedings of the 11th International Conference on Computer Supported Education (CSEDU 2019), vol. 2, pp. 38–47. SCITEPRESS – Science and Technology Publications, Lda. (2019)
29. Vermunt, J.D.: Metacognitive, cognitive and affective aspects of learning styles and strategies: a phenomenographic analysis. High. Educ. 31, 25–50 (1996). https://doi.org/10.1007/BF00129106

Teaching Defence Against the Dark Arts Using Game-Based Learning: A Review of Learning Games for Cybersecurity Education

Rene Roepke and Ulrik Schroeder

Learning Technologies Research Group, RWTH Aachen University, Aachen, Germany
{roepke,schroeder}@cs.rwth-aachen.de

Abstract. When comparing game-based learning approaches for cybersecurity education with the Defence against the Dark Arts class at Hogwarts, we notice some similarities, although they are not directly linked. Various game-based learning applications and learning games have been developed in the past, but similar to the teacher's curse in Harry Potter, they do not survive the prototype phase and hence are not available to the public. This work presents an extensive systematic literature and product review responding to two hypotheses, as well as in-depth game analyses using a classification model for serious games. While we do not expect many games to be available for the target group of non-professional Internet and IT end-users without prior knowledge in Computer Science (CS), we also hypothesize that the available games lack proper learning goals and do not teach sustainable knowledge and skills in CS. For our game analysis, we used the G/P/S classification model to gain insights into the gameplay, purpose, and scope of the applications. The results falsify the first hypothesis and provide indications in support of the second. In addition, the results of the game analysis are presented and discussed.

Keywords: Cyber security education · Learning games · Game-based learning · End-users

1 Motivation

Every year, a new teacher attempts to teach Defence against the Dark Arts at Hogwarts School of Witchcraft and Wizardry, but due to the position's curse, a teacher lasts only one year before leaving and never returning. It looks like there is a lack of suitable teachers for this important subject, although many try to conquer it. Similarities can be found when looking at game-based learning approaches in the cybersecurity domain. In the past 15 years, various games for cybersecurity education have been developed and evaluated by researchers, educators, and the game industry. Game-based learning applications or learning games aim to teach players about phishing, malware, encryption and other important topics in the cybersecurity domain.

A learning game is a type of 'serious game' with an educational context. The definition of 'serious game' was originally given by Clark Abt [1] but was updated by Mike Zyda in 2005 [2]. We define it as a game with a purpose other than pure entertainment [3]. We sometimes use the term 'competence developing games' as defined by König et al. in 2016 [4, 5]. We define game-based learning as learning by playing a game. When using technology, we add the prefix 'digital', but in its origin game-based learning is simply about fun and engagement joined with the serious activity of learning [6].

As part of the ERBSE project, which is an abbreviation for "Enable Risk-aware Behavior to Secure End-users", we take a look at game-based learning approaches to cybersecurity and risk awareness. We focus on the area of conflict between end-users without previous CS education and the high complexity of cybersecurity. The goal of the project is to design, implement, and evaluate game-based learning approaches for cybersecurity education that enable non-professional end-users to assess risks and behave appropriately when using IT systems and the Internet [7].

While a target group of Computer Science (CS) students or employees in a similar field of work may seek learning opportunities in the area of cybersecurity, non-professional Internet and IT end-users have different motives. Often, respective games are used in university courses or as on-the-job training to gamify learning and to serve as an exploratory learning opportunity for interested target groups. End-users without prior knowledge in CS often lack the motivation and interest to learn about what appears to be a difficult topic. The learning curve is perceived as steep and the learning content as rather complex. By using game-based learning, we want to overcome this lack of motivation and engagement in order to convey knowledge and skills in the domain of cybersecurity.

A problem occurs when end-users are not educated in the field of cybersecurity. Without appropriate knowledge and skills, end-users can find it very difficult to assess the risks associated with using the Internet and IT systems. They simply do not know how to behave safely. This results in end-users who continue to use today's technology and the Internet but who are not aware of potential risks and of measures to secure themselves or others.

This work presents a systematic review of game-based learning for cybersecurity education and looks at different learning games and game-based learning applications. In comparison to the work presented in [8], this contribution includes a game analysis using the G/P/S model by Djaouti et al. [9] and gives insights into the gameplay, purpose, and scope of the games and applications found. We build on the results of the two-fold systematic literature and product review and combine it with further analyses to better grasp how games work in cybersecurity education. As in the previous work, the underlying hypothesis is that there are not many game-based learning applications and learning games in the field of cybersecurity that are targeted at non-professional Internet and IT end-users. We also assume that the available, existing games for these end-users do not teach sustainable skills and knowledge of CS that would properly educate the target group to behave securely and assess risks appropriately. Lastly, a game analysis of the games found is presented to enrich the discussion.

Our motivation for this work lies in the observation that even when a game has proven to be effective, it seems to disappear after its evaluation and does not reach the actual target group. So, similar to the teaching problem for the Defence against the Dark Arts class, effective game-based approaches in cybersecurity do not last. One could argue they are cursed as well.

The structure of this paper is as follows: First, related work regarding reviews and the classification of serious games is discussed. Next, the methodology and results of both the systematic literature and product review and the classification using the G/P/S model are presented. A discussion of our two hypotheses follows, and lastly, a summary and an outlook on future work conclude the paper.

2 Related Work

2.1 Game-Based Learning Under Review

Different authors have undertaken reviews of various approaches to cybersecurity education. While some authors focused on game-based learning approaches and compared them to traditional training and instruction, other authors compared games and different gamified concepts in order to work up a state of the art. In this section, we give a brief overview of existing related work and highlight different approaches to comparing game-based learning approaches.

In a review of game technology for cybersecurity education by Alotaibi et al. [10], various studies have been compared. The authors state that game-based learning approaches are relatively new in the field of security awareness and hence need more extensive research. Alotaibi et al. [10] determine the need for a general game technology framework for raising security awareness. Next, they analyzed ten popular cybersecurity games that were available online, focusing on aspects like the game type, target audience and intended learning objectives. The results show that most games are suitable for students or teenagers and, depending on the content, may also be usable for professionals or employees in respective fields [10].

Hendrix et al. [3] determined a need for training of the public and within businesses. In their work, they reviewed studies using various training approaches for cybersecurity, and while the studies indicated positive effects, the sample sizes were small and no effect sizes were properly discussed. Also, the samples were often drawn from different target groups, which weakens the observed results as well. Hendrix et al. also mentioned problems when searching for publicly available games: many games were not available at all and thus do not reach the intended target groups [3].

Besides the review of Hendrix et al. [3], a review of traditional and hands-on training was done by Tioh et al. [11]. The authors emphasize that game-based learning has the potential to combine the characteristics of both training methods, traditional and hands-on, and thus can be more effective in educating users. Tioh et al. [11] identified different game types and topics. For academically developed games, they also reviewed existing studies on effectiveness; overall, effectiveness has not yet been empirically shown [11].

A serious game usually tries to serve as an immersive experience and often includes simulations and animations to present the game content. Compte et al. [12] analyzed serious games for information assurance and derived observations and suggestions for the design of serious games in the domain of cybersecurity. In educational settings like academia or schools, time constraints limit the use of games, although we actually want learners to have full access and sufficient time to play games. Pastor et al. [13] argue that learners should be able to play "in their own environment" [13] and without constraints, e.g. time limitations. A very constrained but even more immersive experience is offered by virtual laboratories for gaining hands-on knowledge and skills [14, 15].

With the majority of games being digital games using simulations, animations, and 2D or 3D environments, tabletop and card games move to the background. However, Dewey and Shaffer [14] also took a look at the games [d0x3D!] and Control-Alt-Hack®. Both games deal with security concepts related to network and computer security and are targeted at end-users, e.g. younger adults [15].

In so-called "Capture The Flag" (CTF) events or hack-a-thons, participants compete against others to win or solve challenges. These types of games are often described as open challenges or competitions where no solution is provided at the end. The website CTFtime [16] reports more than 150 competitions for 2018; more than 70% were available online and accessible to anyone [16]. While CTFs have a rather competitive nature, they are more suitable for players with prior knowledge in cybersecurity, e.g. professionals in IT. They can usually be attempted by everyone and hence are open to the public as well. However, the challenges can be too complicated for players without support or proper background knowledge.

Overall, the discussed reviews present game-based learning approaches from different perspectives. Often, games are research prototypes used in academia, and many games involve simulations and animations to serve as an immersive learning experience. While most games have been evaluated and indicate positive effects, the sample sizes were rather small and consisted of varying target groups [3, 11]. In addition, many games are no longer available or very hard to find. Thus, they are not accessible to the intended target groups [3].

2.2 Classification of Serious Games

In addition to different perceptions of serious games and digital learning games, there also exists a multitude of classification methods and taxonomies, none of which has been established widely [9]. Consequently, we can find many contributions that attempt to classify serious games and/or learning games. The initial input of a classification model or taxonomy is a result set based on systematic literature and/or product research. Alotaibi et al. [10] present a classification of serious games in the domain of cybersecurity. While the authors do not use an explicit taxonomy, they classify scientific contributions and available products according to different dimensions [10].

Early classification methods were based on a one-dimensional view of serious games to answer very specific questions, e.g. market-based classifications or purpose-based classifications [17]. In 2008, Sawyer and Smith [18] developed the "Serious Game Taxonomy", which classifies by the intended market and the purpose of a serious game. Pastor et al. [13] present another multidimensional classification of serious games and game-based learning applications for cybersecurity. Their approach focuses on the distinction between simulations and laboratory environments. In addition, aspects of scalability and framework conditions, as well as the target group, learning objectives, and the learning curve, are considered. A drawback of this contribution is the small sample size (n = 13); the authors did not apply their approach to a large number of games [12].

While multidimensional classification methods allow different dimensions to be considered, they are nevertheless limited: several classes cannot be assigned within one dimension, and consequently the classification is static and coarse. A solution is offered by multi-label classification, which allows assigning a set of different labels or classes within one dimension. Breuer and Bente [19] present an approach to multidimensional multi-label classification and suggest a set of dimensions with exemplary labels [19] (a minimal sketch of such a record follows at the end of this section). Unlike previous approaches, this approach is very open and flexible because labels can be added at will. However, there is also the danger of over-specification, because too many labels exist and no actual classification happens.

All in all, the classification of serious games can be done in various ways. Depending on the motivation and question in mind, different approaches for classifying serious games can be applied. The basis of all of them is an extensive literature and product search. This contribution continues with a systematic literature and product review as well as a game analysis based on the G/P/S classification model by Djaouti et al. [9].
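To illustrate the multi-label idea referred to above, the following minimal sketch shows what such a record could look like; the dimensions, labels and the game itself are invented for illustration only and do not reproduce the classification scheme applied later in this paper.

```python
from dataclasses import dataclass, field

# One record per game; every dimension holds a *set* of labels (multi-label),
# instead of exactly one class per dimension (single-label).
@dataclass
class GameRecord:
    name: str
    purpose: set = field(default_factory=set)       # e.g. {"awareness", "training"}
    target_group: set = field(default_factory=set)  # e.g. {"end-users", "students"}
    topics: set = field(default_factory=set)        # e.g. {"phishing", "passwords"}

game = GameRecord(
    name="ExamplePhishingGame",  # invented game, for illustration only
    purpose={"awareness"},
    target_group={"end-users", "students"},  # several labels within one dimension
    topics={"phishing", "social engineering"},
)
print(game)
```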

3 Systematic Literature and Product Review

In the following, we present the methodology and results of our systematic literature and product review. Within the interdisciplinary domain of game-based learning, where development and research are driven by academic and commercial stakeholders, a systematic literature review alone does not cover all available game-based learning applications and learning games in the field of cybersecurity. Hence, a two-fold review process is performed, consisting of a systematic literature review of academic publications as well as a product search using a search engine.

3.1 Methodology

Generally, both retrieval processes are based on two keyword sets, one with cybersecurity-related terms and another one with terms regarding game-based learning and learning games. These keyword sets contain the most suitable keywords in their category but are not expected to be complete in their coverage of all publications. The keyword sets are defined as follows:

ITsec = {IT security, cybersecurity, risk awareness, security awareness, security education, cyber education, security}   (1)

LearnTech = {game based learning, gamification, serious game, learning game, edugame, teaching game, competence developing game}   (2)
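Assuming each query pairs one term from ITsec with one term from LearnTech (the exact query syntax is not spelled out in the text, so the AND-combination below is an assumption), the full set of search strings could be generated as follows:

```python
from itertools import product

ITSEC = ["IT security", "cybersecurity", "risk awareness", "security awareness",
         "security education", "cyber education", "security"]
LEARNTECH = ["game based learning", "gamification", "serious game", "learning game",
             "edugame", "teaching game", "competence developing game"]

# Cartesian product of both keyword sets; the '"a" AND "b"' form is an assumption,
# not necessarily the exact syntax submitted to the three search engines.
queries = [f'"{sec}" AND "{learn}"' for sec, learn in product(ITSEC, LEARNTECH)]

print(len(queries))  # 7 x 7 = 49 query combinations
print(queries[0])    # "IT security" AND "game based learning"
```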


All queries are used in three different digital libraries/search engines: IEEE Xplore (https://ieeexplore.ieee.org/), Google Scholar (https://scholar.google.de/), and ACM Digital Library (https://dl.acm.org/), all accessed 02 September 2019. For each request, the first 100 results are extracted for further analysis. We limit ourselves to the first 100 results, since results with an even lower rank may be less fitting to our search queries. Afterward, a multiple-step filtering and classification process is performed to systematically review all extracted publications. First, all duplicates are removed and online availability and accessibility (via university access or open access) are determined; all duplicates and results that are not available to read are excluded. The third step is filtering all results based on the leading question of whether a result is about cybersecurity education or not; all off-topic results are excluded to reduce the result set. Next, a categorization is attempted on the result set. Results are sorted into the following categories: competition, game, gamification, review, and others, where the last category includes all publications on frameworks, tools and further cybersecurity education content that does not fit any other category. In the next step, we take a look at all reviews, and all serious games or game-based learning applications cited within a review are added to the result set for further processing. This measure prevents missing games that have already been reviewed by other authors. Afterward, for each publication, we determine the cybersecurity-related topics, the game name, the target group, and the intended educational context. Lastly, we check the online availability of all identified games. Since tabletop games, i.e. board games and card games, are non-digital and therefore offline, online availability here refers to online available information or similar.

With the completion of the literature review and classification, a product search for game-based learning applications and learning games on cybersecurity is performed using the Google search engine. All games found are added to the result set of the literature review, and the topic, target group, and educational context are determined to complete the analysis.

3.2 Results

Using the predefined keyword sets ITsec and LearnTech, the initial result set contained 2636 publications. It was reduced to 1277 results by eliminating duplicates and inaccessible results. Then, the set was filtered based on content, i.e. cybersecurity education by any means; the remaining set included 183 publications. Adding the games mentioned in other reviews as well as the findings of the product search, the result set was extended to 216 results. In this final set, 181 results are of type 'game', 'gamification' or 'competition' (see Table 1).

Next, we processed the partial result sets labeled 'competition' or 'gamification'. Competitions are often CTFs or other cybersecurity challenges. They may include game mechanics, but they are different from game-based learning applications and serious games.


Table 1. Cumulative overview of results.

  Type          # Results
  Game          133
  Gamification  24
  Competition   24
                Σ = 181
  Review        14
  Others        21

The keyword set LearnTech was focussed on game-based learning and learning games rather than on competitions and challenges. In addition, CTFs and other cybersecurity challenges are often more competitive than learning games. Players are highly motivated to win and therefore may pursue significant training and research in the domain. In comparison, learning games are played for learning and, therefore, they are usually less competitive. Also, since in competitions knowledge and skills can be of advantage, players are encouraged to get a deeper understanding of cybersecurity, which exceeds the expected level of education for end-users in this domain. After reviewing all results labeled as competitions, we discovered that 2/3 are targeted at CS students or professionals and hence, they are not suitable for end-users. The other third of results may be suitable for end-users, but they are most likely for participants who are highly interested in the topic. We also expect them to have some background knowledge when participating in such competitions. Overall, we decided to omit all results categorized as 'competitions' due to the fact that they are not primarily for end-users without prior knowledge in CS. However, we strongly suggest looking further into competitions and the game technologies used in CTFs for end-user education.

Next, we omitted all results labeled as 'gamification' due to their incorporation of gamification approaches to educating users. We distinguish gamification and game-based learning. To our understanding, gamification is the use of game elements in a rather traditional learning context, such as scoring, avatars, or rankings. We use the definition of Deterding et al. [17] that "gamification is the use of game design elements in non-game contexts" and hence, gamification is not equal to game-based learning.

In the end, our further analysis and discussion will be based on 133 results on game-based learning approaches for cybersecurity education. For all results categorized as games, we continued the analysis and determined the respective topics, game names, target groups, and the intended educational context. Among these results, we identified 99 different learning games or game-based learning applications. Possible target groups are CS students, employees, end-users, parents/teachers, professionals, and students. While 99 games may seem to be many games, online availability is a crucial criterion to interpret the results further. Games which are not available online (either as downloads or web applications) are inaccessible to the respective target group. Finally, only 48 of


99 games were available (see Table 2). Note that tabletop games, board games, and card games are marked as 'available online' if they are still sold or available for download and print.

Table 2. Distribution of target groups.

  Target group      # Games   Available online
  CS students       19        5
  Employees         12        7
  End-users         26        12
  Parents/Teachers  1         1
  Professionals     9         5
  Students          32        18
                    Σ = 99    Σ = 48

In addition to the target group of a game, we determined the educational context. We distinguished between Primary School, Middle School, High School, College/University, Corporate and Non-formal contexts. We applied multi-label classification since some games were suitable for different educational contexts. The results are shown in Table 3.

Table 3. Results of the educational context analysis using multi-label classification.

  Educational context  # Games
  Primary School       4
  Middle School        9
  High School          10
  College/University   26
  Corporate            20
  Non-formal           38

The majority of games is designed for use in colleges and universities as well as corporate and non-formal contexts. Since there is no complete coverage of CS education among primary and secondary schools, fewer games are explicitly designed for school contexts. We assume that without CS education in schools, cybersecurity education is taught even less. Often games are designed for colleges and universities because they are part of research projects and students are a suitable test and evaluation group. These games


may serve a specific educational purpose, e.g. gamifying a class on cybersecurity, but can also be suitable for end-users in non-formal learning contexts. In corporate learning contexts, games are often developed professionally and used for training employees or professionals. Their embedding into corporate contexts can make them less suitable for the public, e.g. they simulate a corporate environment to create an authentic learning experience. Topics may include corporate espionage or the data privacy regulations of a company. Depending on the topic, a transfer to the private life of end-users is unlikely.

The analysis of game topics did not yield clear results but rather a variety of topics reoccurring in different games, e.g. phishing and password security. Ranging from hacking and network security to online safety, available games often cover more than one topic in varying depth. The overall result of our two-fold approach to reviewing game-based learning applications and learning games on cybersecurity is a set of 48 available games, identified through scientific publications and a product search.
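The multi-step filtering described in Sect. 3.1 can be read as a chain of simple predicates over the retrieved records. The sketch below illustrates that funnel; the record fields (title, accessible, on_topic, category) are hypothetical placeholders, not the tooling actually used in the review.

```python
# Minimal sketch of the filtering funnel from Sect. 3.1; record fields are hypothetical.
def deduplicate(records):
    seen, unique = set(), []
    for rec in records:
        key = rec["title"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def classify(records):
    records = deduplicate(records)                       # step 1: drop duplicates
    records = [r for r in records if r["accessible"]]    # step 2: keep accessible results
    records = [r for r in records if r["on_topic"]]      # step 3: keep cybersecurity education
    by_category = {}                                      # step 4: game, gamification, competition, review, others
    for r in records:
        by_category.setdefault(r["category"], []).append(r)
    return by_category
```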

4 Game Analysis Using G/P/S Model

In the following, we use the result set of our systematic literature and product review to further analyze the games on the aspects of gameplay, purpose, and scope. By applying the G/P/S model, i.e. a model for the classification of serious games developed by Djaouti et al. [9] in 2011, we want to get further insights into how game-based learning applications and learning games in the domain of cybersecurity are implemented.

4.1 Methodology

For the classification of serious games, Djaouti et al. [9] propose the G/P/S model. It was designed on the basis of previously developed taxonomies and aims to be a better solution to prior problems. The model focusses on the analysis of both the "Game" dimension and the "Serious" dimension. Therefore, the G/P/S model is a multidimensional model, i.e. the model classifies games based on three key aspects: "Gameplay", "Purpose" and "Scope".

The "Gameplay" aspect captures how the game is played. Here, details about the game structure are to be determined. There is currently no uniform definition for "gameplay" in academia and the gaming industry. It explains how the game is played as well as the general objective of the game. Djaouti et al. distinguish games according to their goals: there are games that are called "play-based" because they have no rules checking for the objectives of the game. The other type of games includes rules that define the objectives of the game (e.g. winning). They are classified as "game-based".

"Purpose" means the purpose intended by the developer, e.g. in addition to entertainment. This is an attempt to answer the question of what the game can be used for. Djaouti et al. distinguish between three main purposes: message-broadcasting, training and data exchange. Learning games, for example, can focus specifically on conveying content or just train something, e.g. motor skills in serious games for health care. Games with a


focus on data exchange are, for example, collaborative games in which the exchange of knowledge determines success in the game, e.g. Lure of the Labyrinth.

The aspect "Scope" refers to the area of application, i.e. the intended market (e.g. military, education) and the target group (e.g. public, pupils, students, professionals) for which the game is suitable. This can be used to determine who should play the game.

By using the G/P/S model with its three dimensions, we can analyze each game of our result set and determine common gameplays and purposes as well as strengthen our analysis of target groups.

4.2 Results

Our game analysis using the G/P/S model [9] was performed on the resulting 39 available games (nine games of the original result set were not available anymore at the time of this analysis). Since insight into gameplay and purpose required playing and experimenting with the available games, we were not able to properly classify unavailable games. After classifying the serious games using the G/P/S model, the results look as follows (see Table 4).

Table 4. Results of G/P/S classification.

  Gameplay               # Results
  Game-based             17
  Play-based             22

  Purpose                # Results
  Message broadcasting   27
  Training               15
  Data exchange          4

  Scope                  # Results
  Market
    Education            30
    Corporate            8
    Entertainment        1
  Target Group
    General Public       20
    Professionals        8
    Students             18

The differentiation of "Gameplay" was determined by the objectives and rules of a game. Games that have objectives and rules for winning the game are classified as "game-based". If there are no explicit rules for winning but, e.g., a score or rankings, these games are marked as "play-based". For game-based learning applications and learning games about cybersecurity, there exist games with the goal of winning as well


as games that focus more on playing than just winning. 17 entries were classified as “game-based” and 22 entries as “play-based”. Regarding the “Purpose” dimension of the G/P/S model, it is possible for games to serve multiple purposes. On the one hand, a game may only be aimed at conveying content, but on the other hand, it may also have active assessment elements allowing the behavior to be learned and trained. 27 of the 39 games were classified as “Message broadcasting”. Only 15 contributions offer explicit training, e.g. in the evaluation of emails in relation to phishing and/or spam. Only four of the identified games are based on the exchange of data and knowledge in a collaborative fashion. However, those games are non-digital games, i.e. card or board games. Discussion elements in the course of the game require players to exchange ideas, make joint decisions or assess risks as a group. Three of the games with the purpose of “data exchange” are for IT professionals and thus, previous knowledge can be explicitly integrated into the game, e.g. in discussions and decision making. With regard to the scope, a distinction is made between markets and target groups. As expected, most of the contributions were developed for the education sector or serve for training, formally or non-formally. 8 of the 39 contributions are for employees and the workplace. The target group distribution shows that more than half of all contributions are suitable for the public, but also 18 contributions explicitly endorse their use with students. For some games, explicit instructions are given to instructors on how the game can be integrated into their courses. Finally, the classification results show the diversity of existing games in the area of cybersecurity. While only a few games currently promote collaboration and knowledge exchange, there are more games to convey learning content. Due to the apparent need for cybersecurity education, games focus on the public as well as on students in schools and universities.
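To make the classification scheme of Sect. 4.1 concrete, a G/P/S entry can be encoded with three simple enumerations, as in the following minimal sketch. The example record at the end is hypothetical and not taken from our result set.

```python
# Sketch of a G/P/S record following the dimensions of Sect. 4.1; the example entry is hypothetical.
from dataclasses import dataclass
from enum import Enum

class Gameplay(Enum):
    GAME_BASED = "game-based"          # explicit rules define winning
    PLAY_BASED = "play-based"          # no rules checking for objectives

class Purpose(Enum):
    MESSAGE_BROADCASTING = "message broadcasting"
    TRAINING = "training"
    DATA_EXCHANGE = "data exchange"

class Market(Enum):
    EDUCATION = "education"
    CORPORATE = "corporate"
    ENTERTAINMENT = "entertainment"

class TargetGroup(Enum):
    GENERAL_PUBLIC = "general public"
    PROFESSIONALS = "professionals"
    STUDENTS = "students"

@dataclass
class GPSClassification:
    gameplay: Gameplay
    purposes: list          # a game may serve several purposes
    market: Market
    target_groups: list

example = GPSClassification(
    gameplay=Gameplay.PLAY_BASED,
    purposes=[Purpose.MESSAGE_BROADCASTING],
    market=Market.EDUCATION,
    target_groups=[TargetGroup.GENERAL_PUBLIC],
)
```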

5 Discussion

After presenting the results of our two-fold systematic literature and product review as well as our game analysis using the G/P/S classification model by Djaouti et al. [9], we need to discuss our two hypotheses with respect to the results. The literature and product review shows that only 48 out of 99 games are publicly available. The game analysis in the second step included only 39 games, since nine more games went offline over time. This observation agrees with our initial assumption that although games have proven to be effective or suitable, they seem to disappear after some evaluation, meaning they do not reach the target groups in the long run. One could suggest that, as in the Harry Potter stories, a curse lies on those game-based approaches.

Regarding our first hypothesis, which says that there are not many game-based learning applications and learning games for cybersecurity education targeted at non-professional Internet and IT users without prior knowledge in CS, we were surprised to find out that about 60% of the games are designed for the respective target group. A total of 58 of 99 games are suitable for end-users (26 results) and students (32 results), where, in this case, students are explicitly non-CS students. Of the 48 available games, only 30 fall into this group. At first, this result indicates that there are quite a few games for end-users


without prior knowledge in CS and thus, the result disproves our hypothesis. However, due to the fact that only about 50% of the results are available to the public, the effective number is smaller, which weakens this effect. The game analysis using the G/P/S model [9] supports our observation. 20 of 39 (~51%) games are suitable for the general public, i.e. end-users without prior knowledge in CS. 18 games are also suitable for students, but since the G/P/S model does not distinguish between CS students and other students, we cannot clearly derive a portion of games suitable for players without prior knowledge in CS. The most targeted market according to the G/P/S classification is the educational market. This market includes formal educational contexts, e.g. schools and universities, but also non-formal contexts. Since the G/P/S model is usually applied to serious games, and learning games are serious games with educational content, this comes as no surprise. Depending on the field of work, professionals in the corporate context may or may not be end-users without prior knowledge. For our research, we exclude these games, since they usually are meant for employees in the IT sector and prior knowledge is given due to their education, training or studies prior to the job.

For our second hypothesis, a closer look at the available results is necessary. We are looking for games that teach sustainable skills and knowledge in CS to properly qualify end-users to behave securely and assess risks suitably. Therefore, we need to determine what the games and applications teach and possibly how knowledge and skills are conveyed (Fig. 1).

Fig. 1. Screenshot of “The Internet Safety Game” [20].

As presented in [8], we took a look at different games and found indicators for our second hypothesis. We looked at "The Internet Safety Game" [20], which is available on the platform "NetSmartKidz" by the National Center for Missing & Exploited Children (see Fig. 1). Its target group is younger children in non-formal learning contexts, and the game is available via a web browser. The game is similar to a board game. It consists of a board with pathways and a character that can be moved stepwise when rolling a die. In the game, you have to collect various items on the board to win. Each item is a piece of information


regarding the Internet and online safety. There are six items to be found and, based on the difficulty level chosen in the beginning, a multiple-choice quiz follows as a final assessment. The information shared with the collectible items includes recommendations like not sharing personal information (e.g. name, age, or address) or chat vocabulary. While not sharing personal information seems appropriate advice for younger children, this advice is out of context. There is no explanation of potential risks and reasons why someone should not share personal information online.

Next, we took a look at the game prototype PASDJO by Seitz and Hussmann [21]. It is a game about passwords, where players rate a set of passwords and get feedback on the quality of the passwords accordingly. The gameplay is very simple and the overall game time is rather short. After rating a set of passwords, the player gets feedback on the quality of each password, but the game does not take it further. The topics of password strength, risks of weak passwords and authentication via passwords are not addressed. We were not able to identify specific learning goals, which is why this game does not teach sustainable skills and knowledge in CS [21].

A third example is the "Safe Online Surfing" platform [22] by the Federal Bureau of Investigation (FBI). This platform contains various mini-games embedded in a theme world with islands representing games for different age groups (sorted by grade levels). While the mini-games incorporate different game elements and mechanisms to engage the players, the skills and knowledge taught in the games are rather arbitrary and not properly motivated. In addition, the learning content misses risks and threat models and is heavily built on factual knowledge without enough context [22].

The three games presented here share a common lack of context and relevance. The learning content is either arbitrary or not suitably integrated into specific contexts. For learners, it is hard to relate to the learning content because factual knowledge seems irrelevant and disconnected. These games all provide indicators validating our second hypothesis that games targeted at end-users without prior knowledge in CS do not teach sustainable skills and knowledge in cybersecurity (Fig. 2).

Fig. 2. Screenshot of CyberCIEGE [23].


The game CyberCIEGE [23] is a research prototype by the Naval Postgraduate School which made it to the public and is available for download. It focusses on computer and network security and consists of an interactive game environment where the player acts as an employee of a company who is responsible for the configuration of firewalls, VPNs and other security-related systems. The game is complex and provides many different scenarios. Attack scenarios include trojan horses, viruses, malicious e-mail attachments and more. It was used in various studies to evaluate its effectiveness [23–25].

Compared to the three previous games, CyberCIEGE offers a different learning experience with more content, context, and relevance. The gameplay of an employee configuring IT systems and dealing with security issues is a real-life scenario. While the aspects of context and relevance may be less controversial in CyberCIEGE, the availability of multiple scenarios makes it harder to enter the game. The game was used in different educational contexts, but its use was always supported by a teacher or instructor. In addition, the developers of CyberCIEGE provide supportive materials to include the game in introductory courses to cybersecurity, e.g. a syllabus matching game scenarios to possible course topics. This makes CyberCIEGE suitable for formal learning contexts and possibly less interesting to end-users. For end-users in non-formal learning contexts, a game requires easy access. A large set of different scenarios might be interesting due to the amount of learning content that it provides, but to end-users, it may be overwhelming and a reason not to play it.

After presenting different games and discussing their characteristics, we can respond to our second hypothesis. Games like the "Safe Online Surfing" platform [22] or "The Internet Safety Game" are designed for children in non-formal learning contexts. PASDJO [21] and CyberCIEGE [23] may be more suitable for different but older target groups. While all games can be used by end-users without prior knowledge and skills in CS, they present limitations. Games like "The Internet Safety Game" or the mini-games on the "Safe Online Surfing" platform present cybersecurity learning content without essential context. Thus, they fail to teach why the addressed topics are important and they do not address potential risks and threats. This leads to missing relevance and context of the learning content, and the rather factual knowledge in these games stays unconnected.

In terms of presentation, the reviewed games show a variety of purposes. The G/P/S classification shows that 27 games focus on message broadcasting, e.g. the distribution of important information and factual knowledge. Only 15 games incorporate training of skills and procedural knowledge. When it comes to card games and board games, data exchange is often chosen to support knowledge acquisition. By collaborating and exchanging information between players, a social component supports learning. This purpose is not found in digital game-based learning for cybersecurity education so far. Most games are played by single players, and social interaction is not explicitly integrated to support learning. An issue of games that heavily rely on message broadcasting is the limited interactivity, e.g. PASDJO just presents passwords and asks the user to rate them. After rating the first passwords, the gameplay becomes repetitive and boring.
Mini-games on the “Safe Online Surfing” platform implement various game mechanisms to engage players, but the focus lies on factual knowledge only.


In CyberCIEGE, factual, conceptual, and procedural knowledge is conveyed through the interactive environment in which the player acts as an employee configuring the IT systems of a company. Teaching important factual knowledge about threats, systems and countermeasures is combined with procedural knowledge and skills, e.g. configuring a firewall.

Overall, we consider the given games as indicators supporting our second hypothesis. The presented games are missing relevance and important information on risks, adversary models and the quality of security measures. They show limitations due to heavily focusing on factual knowledge. While CyberCIEGE may attempt to convey conceptual and procedural knowledge, the game appears to be more complex for users to access. Its use is recommended in a course setup or formal educational contexts.

We suggest teaching a mixture of factual, conceptual and procedural knowledge in order to teach sustainable knowledge and skills in CS. It is also important to integrate aspects of relevance and context such that players understand why to learn about a topic of cybersecurity. Since cybersecurity faces a lot of changes over time, i.e. adversaries trying new techniques, using hidden backdoors, or relying on missing user awareness, the teaching of cybersecurity needs to be sustainable, and the gained knowledge and skills need to be easy to adapt to new challenges. There is a constant need to continue learning about new risks, and with foundational skills and knowledge, this should be less challenging for end-users.

Finally, the first hypothesis was disproven by the result of 99 games in the field of cybersecurity, of which 58 games are suitable for end-users. By reviewing all available games, we were able to find indicators supporting the second hypothesis. Many games lack relevance and context and do not teach sustainable knowledge or skills. They often rely on factual knowledge, and to some degree the content of a game seems arbitrary. The game mechanisms used vary, but most games share the purpose of message broadcasting. Some games also implement training possibilities, and in non-digital tabletop games, i.e. card games, social interaction is required for data and knowledge exchange.

6 Conclusion and Future Work

Similar to the problem of keeping a teacher for the class in Defence against the Dark Arts at Hogwarts, game-based approaches for cybersecurity education somehow do not last and are thus not available to the public. This work includes a review of game-based learning applications and learning games for cybersecurity education as well as a game analysis based on the G/P/S model by Djaouti et al. [9]. Our approach was based on a systematic literature and product review. The result set included 216 entries, and in a first classification, we identified 181 results on games, competitions, and gamification. Next, we excluded all results on competitions and gamification as they are either not targeting end-users without prior knowledge or skills in CS or are relying on different concepts than game-based learning. In the end, we classified all remaining results on aspects like target group, educational context, and game topics. Since online availability is crucial for end-users to find games on cybersecurity, we narrowed our result set down to 48 games for different target groups (see Table 2) and educational contexts (see Table 3).


For further insights, we used the G/P/S model by Djaouti et al. [9] to analyze all available games and classify them according to gameplay, purpose, and scope. Lastly, we used the results of the systematic literature and product review and the analysis using the G/P/S model to discuss our two hypotheses.

First, we disproved the first hypothesis that there are not many games for end-users without prior knowledge or skills in CS. At least 50% of the games are targeted at end-users and non-CS students. 20 out of 39 games are suitable for the general public according to our results of the G/P/S classification. This result was rather surprising, since we assumed that game prototypes usually disappear after their evaluation. For our second hypothesis, we were not able to definitely prove or disprove it. We merely identified limitations and drawbacks of games that indicate that available games for end-users do not teach sustainable knowledge or skills. We determined a major purpose of message broadcasting, with some games offering training opportunities. Only card and board games incorporate social interaction as well as data and information exchange. In most games, we are missing aspects of relevance and context. Learning content is often based on factual knowledge and, in some games, it seems rather arbitrary what players should learn.

Future work can be based on our result set of the systematic literature and product review. As our intention was establishing the state of the art on game-based learning approaches in the domain of cybersecurity, we need to take further looks into games on specific topics to determine how learning content is integrated into a game. Also, we want to analyse game elements and map learning goals and principles to game mechanics. For our project context, we propose to design new game prototypes for end-users and implement lessons learned from available research on game-based learning in the cybersecurity domain. In ERBSE, we want to implement and evaluate game-based learning approaches to enable end-users in risk assessment and appropriate behavior when using IT systems and the Internet. Here, we want to apply an interdisciplinary approach with expertise in three different fields: IT security, CS didactics and game design. Hopefully, our solutions will survive the research prototype phase and reach the target group in the long run. In other words, our desired approaches should last, unlike the teachers for Defence against the Dark Arts.

References

1. Abt, C.: Serious Games. University Press of America, Lanham (1987)
2. Zyda, M.: From visual simulation to virtual reality to games. Comput. (Long. Beach. Calif.) 38, 25–32 (2005). https://doi.org/10.1109/mc.2005.297
3. Hendrix, M., Al-Sherbaz, A., Bloom, V.: Game based cyber security training: are serious games suitable for cyber security training? Int. J. Serious Games 3, 53–61 (2016). https://doi.org/10.17083/ijsg.v3i1.107
4. Wolf, M.R., König, J.A.: Competence developing games. INFORMATIK 2017. Gesellschaft für Informatik, Bonn (2017)
5. König, J.A., Wolf, M.R.: A new definition of competence developing games. In: Proceedings of the Ninth International Conference on Advances in Computer-Human Interactions, pp. 95–97 (2016)


6. Prensky, M., Thiagarajan, S.: Digital Game-Based Learning. Paragon House Publishers (2007)
7. NERD.NRW, G.: ERBSE - Enable Risk-aware Behavior to Secure End-users (2019). https://nerd.nrw/forschungstandems/erbse/. Accessed 02 Sep 2019
8. Roepke, R., Schroeder, U.: The problem with teaching defence against the dark arts: a review of game-based learning applications and serious games for cyber security education. In: Proceedings of the 11th International Conference on Computer Supported Education - Volume 2: CSEDU, pp. 58–66. SciTePress (2019)
9. Djaouti, D., Alvarez, J., Jessel, J.-P.: Classifying serious games: the G/P/S model. Handbook of Research on Improving Learning and Motivation Through Educational Games: Multidisciplinary Approaches, pp. 118–136. IGI Global, United States (2011)
10. Alotaibi, F., Furnell, S., Stengel, I., Papadaki, M.: A review of using gaming technology for cyber-security awareness. Int. J. Inf. Secur. Res. 6, 660–666 (2016). https://doi.org/10.20533/ijisr.2042.4639.2016.0076
11. Tioh, J.-N., Mina, M., Jacobson, D.W.: Cyber security training a survey of serious games in cyber security. In: 2017 IEEE Frontiers in Education Conference (FIE), pp. 1–5. IEEE (2017)
12. Le Compte, A., Elizondo, D., Watson, T.: A renewed approach to serious games for cyber security. In: 2015 7th International Conference on Cyber Conflict: Architectures in Cyberspace, pp. 203–216. IEEE (2015)
13. Pastor, V., Diaz, G., Castro, M.: State-of-the-art simulation systems for information security education, training and awareness. In: IEEE EDUCON 2010 Conference, pp. 1907–1916. IEEE (2010)
14. Dewey, C.M., Shaffer, C.: Advances in information security education. In: 2016 IEEE International Conference on Electro Information Technology (EIT), pp. 0133–0138. IEEE (2016)
15. Son, J., Irrechukwu, C., Fitzgibbons, P.: Virtual lab for online cyber security education. Commun. IIMA 12, 5 (2012)
16. CTFtime.org: CTF Events (2019). https://ctftime.org/event/list/?year=2018. Accessed 02 Sep 2019
17. Deterding, S., Khaled, R., Nacke, L.E., Dixon, D.: Gamification: toward a definition. In: CHI 2011 Gamification Workshop Proceedings (2011)
18. Sawyer, B., Smith, P.: Serious games taxonomy. Paper presented at the Serious Games Summit, Game Developers Conference, San Francisco, USA, 23–27 (2008)
19. Breuer, J., Bente, G.: Why so serious? On the relation of serious games and learning. J. Comput. Game Cult. 4, 7–24 (2010)
20. NetSmartKidz.org: The Internet Safety Game (2018). https://www.netsmartzkids.org/AdventureGames/TheInternetSafetyGame. Accessed 29 Nov 2018
21. Seitz, T., Hussmann, H.: PASDJO: quantifying password strength perceptions with an online game. In: Proceedings of the 29th Australian Conference on Computer-Human Interaction, pp. 117–125 (2017)
22. FBI: FBI | Safe Online Surfing | SOS (2019). https://sos.fbi.gov/en/. Accessed 02 Sep 2019
23. Irvine, C.E., Thompson, M.F., Allen, K.: CyberCIEGE: gaming for information assurance. IEEE Secur. Priv. 3, 61–64 (2005)
24. Ariffin, M.M., Ahmad, W.F.W., Sulaiman, S.: Investigating the educational effectiveness of game based learning for IT education. In: 2016 3rd International Conference on Computer and Information Sciences (ICCOINS), pp. 570–573 (2016)
25. Raman, R., Lal, A., Achuthan, K.: Serious games based approach to cyber security concept learning: Indian context. In: 2014 International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE), pp. 1–5 (2014)

A Collective Dynamic Indicator for Discussion Forums in Learning Management Systems

Malik Koné1,2(B), Madeth May1, Sébastien Iksal1, and Souleymane Oumtanaga2

1 LIUM, Le Mans Université, Le Mans, France
{malik.kone.etu,madeth.may,sebastien.iksal}@univ-lemans.fr
2 LARIT, INP-HB, Yamoussoukro, Côte d'Ivoire
[email protected]

Abstract. In today’s successful Learning Management System (LMS), gathering thousands of students, emergent collective dynamics drive innovative learning experiences where learners help each other in online forums. The benefits of those behaviors were theorized in Vygotsky’s socio-constructivism theory where he insists that the knowledge development of not so formal peer exchanges is beneficial to all participants. Observing and understanding how those dynamics occur could improve course design and help tutors intervene to sustain collective learning. But, although the scientific community acknowledges the importance of theses dynamics, few works have yet been able to grasp and display them in a format tailored to the Massive Open Online Courses (MOOCs)’ instructors. Indeed, only recently have researches been able to articulate the required continuous Natural Language Processing (NLP) and Social Network Analysis (SNA). In this research, we propose an innovative model to compute a collective activity indicator to answer the problem of detecting and visualizing the collective dynamics from the MOOCs’s forums interactions. We also present datasets collected from several LMSs and used to illustrate the portability, scalability, interactivity of our first visualizations. Our approach should help develop indicators and Learning Dashboard (LDB) of collective actions for MOOCs. Keywords: Visualization · Collective models · Learning analytics

1 Introduction

Although recent years have seen the success and development of many MOOCs, these Learning Management Systems (LMSs) still lack adequate indicators and tools to help their users understand the dynamics taking place when online students are interacting on their forums. The situation is obviously very different from that of a physical classroom, where instructors can get a feeling of the cohort's


dynamics directly from observation. In online courses, even if the number of learners is not massive, they are usually free to connect at any time. This makes it quite difficult for an instructor or even a team of instructors to follow exactly what is happening and how the learners' understanding evolves in the course. Still, for 92% of MOOC instructors, MOOC forums are the most useful resource for understanding class dynamics [32]. Therefore, in our study we focus on MOOC forum exchanges and present a method to visualize the collective dynamics emerging from them. This should help instructors intervene in the course or in the course design.

Modern socio-constructivist theory, based on Vygotsky's works, points to discussion among peers as the foundation of building new knowledge. It is the exchanges with peers of roughly the same academic level that bring learners into their proximal development zone, i.e. at the fringe of their knowledge domain but not too far into the unknown, where knowledge can be replaced or created.

Our approach is based on several datasets that we collected over the past two years. It is innovative in that it presents an indicator that is domain independent, scalable and platform agnostic. The inherent contextualization problem faced by indicators is directed towards the observer, as we are moving in the direction of support tools rather than fully automated learning solutions [3].

In Sect. 2, we will show Natural Language Processing (NLP) and Social Network Analysis (SNA) based indicators that are useful to monitor discussion exchanges but that may lack the collective dimension. There, we will also review previous work about visualizing collective action indicators. In Sect. 3, we will propose our model framework. Then, in Sect. 4, we will explain the differences between our datasets and how we collected them. In Sect. 5, we present our first visualizations. Finally, we will conclude our paper by giving some perspectives and directions to further the research.

2 Previous Works

In Table 1 and Table 2, we review previous works pertaining to collective actions. The first set of studies focuses on indicators of collective actions and their relation with performance. The second set reports studies about supporting learning with visualizations of collective actions. Different sizes of social contexts are covered: small (S.), medium (M.), and huge (H.), with respectively less than a few tens, a few hundreds or a few thousands of learners. For each paper, we summarize what is measured, what is visualized and the expected results. These studies aim to positively impact the performance (i.e. the grades) or the activity (i.e. the duration and number of online actions) of the learners (L.) or the instructors (I.), or the satisfaction and the motivation of the users. Besides the learners and the instructors, the audience for these studies is also often the researchers (R.) themselves.

2.1 Indicators of Collective Actions

Studies that consider collaboration usually try to identify the intentions by studying the students' behavior, their message publications and their motivation.


Social network analysis is used to determine the online behavior. It identifies patterns of social activity (comments, likes of others' posts, replies, number of reads) that characterize the users. To get feedback about the motivation of users, studies use surveys. Content type is usually analysed by hand for a small set of learners to get a deeper understanding of the quality of interactions happening in the group.

Table 1. Indicator-based studies. For this table and Table 2, sizes are coded as small (S., less than ten students), medium (M., a few tens) or huge (H., a few hundreds and more students). The audience is coded as Instructors (I.), Learners (L.) or Researchers (R.).

  What is measured?                                         | Expected output         | Size | Audience | Papers
  Interactions                                              | Collective work         | S.   | I.       | [10, 24]
  Social activity & content type                            | Performance             | S.   | R.       | [28]
  Hierarchical positions                                    | Activity & Performance  | S.   | R.       | [30]
  Social activity & content type                            | Collective work         | S.   | R.       | [6]
  Social activity & content type                            | Activity                | M.   | R., I.   | [5]
  Social network structure, motivation & prior performance  | Performance             | M.   | R.       | [19]
  Interactions                                              | Performance             | H.   | R.       | [34]
  Social network structure, content type                    | Activity                | H.   | R.       | [4]

Detecting Collective Action. In a 250-strong Community of Learning (COL), [30] compare on-task users, those showing engagement and high performance, with off-task users. They use questionnaires to relate the different behaviors to the users' hierarchical position in the COL. They compare the actors' hierarchical positions and their engagement in learning discussions. They find a positive correlation between social position and engagement; therefore, the authors did not invalidate their hypothesis that the social position influenced the learning behavior.

SNA techniques are also used to collect statistical measures from the social network of messages' exchange in order to automatically operationalize collaborative indicators. In [24], for example, SNA is used to compute initiative, activity, regularity of activity, regularity of initiation, and reputation, which permits identifying isolated learners and potential assistants among the learners.

[34] are other researchers interested in the learning differences between off-topic and on-topic users. After a detailed analysis of the forum messages, they demonstrate that on-topic users, that is, high-order thinking users displaying constructive and interactive behaviors in the forums, have more learning gain than


off-topic learners. In conclusion, the authors advocate for an off-topic discussion detector mechanism to guide users back onto more constructive grounds.

In [5], the authors' approach is to study the turn-taking in discussions. They identify different types of conversation and categorize the users as: loner, replier, initiator without reply, initiator who responds, active social learner, active social learner without turn-taking, reluctant active social learner. Besides their valuable categorization, an important result is that they observe more engagement from recurrent posters, that is, posters replying to comments made to their initial posts.

If the importance of collective action for learning is agreed upon, the difficulty is identifying it at scale. It is a complex process needing content, temporal and social network analysis. The previous studies justify the importance of a detailed content analysis, but they relied on human intervention, limiting their potential to scale up.

Scaling Up. [4,6] are the first big-scale attempts that we found taking into account time, message content, social and dialogue structure. Each models the students' dynamics with a mixture of NLP techniques, such as Latent Dirichlet Allocation (LDA) or Latent Semantic Indexing (LSI), and SNA (e.g. block models) applied to big temporal datasets. [6]'s dataset contains 3,685 contributions from 179 participants and spans 2 years. [4] use 2 datasets of respectively 7,699 and 12,283 messages written by 1,175 and 1,902 participants.

[6] operationalize collaboration with a Cohesion Network Analysis score applied to synchronous chat discussions. It correlates significantly with human discussion analysis, but it is not tested to identify collective actions, as the learners were forced by pedagogical design into collaborative groups. [4] analyze the influence of the course structure (timing and # of the staff's publications) on the forum structure, content and on the social network of learners. They show that the course structure correlates to the forum structure (timing and # of students' posts), but not to the forum content or to the social network grown from the students' interactions in the forum. They report that although some learners do not publish often, those still have an important impact in forums because they sometimes trigger long discussions on course topics. Finally, the authors recommend combining a forum activity prediction model with content analysis to support instructors in focusing on important discussions. These two studies exemplify the possibility to get collective action indicators based on content, structure and time, and as in [12], they push for better support tools for instructors.

2.2 Visualizations as Support Tools

Visualizations as supporting tools have been used successfully in teaching contexts. [18] classify the important visualization types, while [11] and [23] advocate their generalization to support all LMS users.

Exploration and Awareness Tools. [27] use a Learning Dashboard (LDB) to provide quick and precise feedback with their Behavioral Awareness Mechanism.


Table 2. Visualization-based studies about collective action.

  What is visualized?                       | Expected output           | Size | Audience | Papers
  Collaboration & behavioral justification  | Learners' activity        | S.   | I. & L.  | [1]
  Agreement level & contributions           | Teacher's activity        | S.   | I.       | [23]
  Activity, grades & usability              | Users' satisfaction       | S.   | L.       | [27]
  Domain concepts                           | Performance               | S.   | L.       | [2]
  Activity                                  | Motivation & Performance  | S.   | I. & L.  | [25]
  Engagement                                | Learner's satisfaction    | H.   | L.       | [36]
  Self-activity compared to others          | Motivation & Performance  | H.   | L.       | [7]
  Social network & content type             | Teacher's activity        | H.   | I.       | [13]

They tackle the problem of portability and provide dynamic feedback across several platforms to 24 students working on collaborative projects. They use communication, coordination, motivation, performance and satisfaction indicators to operationalize collective actions. Other studies showed how group awareness, i.e. rapid feedback about collective action, and visual narratives [36] are beneficial to the students' engagement, even if no content analysis is done [26]. [7] evaluated the impact of a radar-type visualization given to students in two MOOCs, providing them with awareness of what previous sessions' learners had done at the same time of the course. Their visualization had a positive impact on older students, but it did not significantly improve the younger students' activity in the forums.

Generally, as [29] note, there is a "need to develop both advanced data mining methods to reveal patterns from MOOC data and visualization techniques to convey the analytical results to end users and allow them to freely explore the data by themselves". iForum [13] answers [29]'s call. This LDB (Fig. 1) provides a visual analytic system to interactively explore the forum of a LMS. The complex interface helps the tutor group users and compare them to gain insights on the general forum dynamic in terms of structure but also in terms of content categorization.

In [16], an original spiral visualization of Moodle's activities enabled the authors to spot that students' activities peaked on the same days for students with similar grades. The visualization clearly helped the authors identify this interesting behavioral trend. Finally, [17] propose TieVis, an original scalable visualization specifically tailored to track, explore and analyze dynamics in interpersonal links.

The limit of these representations is often their complexity in terms of end-user visualization. iForum required an extensive explanation from the designers to help the instructor grasp what was shown. Similarly, in TieVis the authors


Fig. 1. iForum’s Dashboard [13] showing (a) overall changes of post in the forum, (b) the network of replies, (c) the thread representation of the forum (high resolution image available on https://git-lium.univ-lemans.fr/mkone/ccis2019) [22].

recognized that some of their visualizations were not intuitive at all. Also, these LDBs do not directly show an indicator of collective action but of several individual actions. As reported in [21,33], visualizations can help creativity and holistic thinking; improve the ability to make effective inferences; making visual analogies reinforces conceptual development; and visualization impacts cognition and helps sense-making and understanding. The visualizations' limits are also clarified in [33]: a) different learning styles, i.e. natural differences in learners, have a significant impact on the way diagrams are perceived, visualized and understood; b) visualizations do not equally affect all types of learning activities.

Visualizations' Effectiveness. The study from [1] investigates how to reinforce student collective actions in the LMS dotLRN. Noticeably, they focus on directly helping the students, by designing a comprehensible tree-based visualization explaining to the students why they received a recommendation to act more collectively and how having a higher-order thinking behavior would be beneficial to them. The engineering students working on a collaborative project reported understanding the tool and generally found it useful.

Nevertheless, precaution has to be taken with visualizations for young learners because, at the very least, [25] found that a LDB could have undesired side effects on teenagers. Analyzing students in a summer camp remedial program, they reported that extensive referral to a LDB downgraded some students initially pursuing mastery goals into students only willing to show proof of competence. The opposite


was not witnessed. Therefore, some students abandoned their initial and honorable motivation to understand, and aimed to trick the system so that the LDB showed that they understood. This last experiment justifies why, although we aim to support instructors with explorable visualizations, we will consider topic-based visualizations rather than user-based ones. Topic-based visualizations do not emphasize individual actions and, therefore, should dampen the motivation to trick the system by adopting a superficial behavior.

3 A Model of Collective Dynamics

In this section, we present our model to visualize collective dynamics. But first, we make some clarifications. We make a difference between a visualization and a LDB. A Learning Dashboard is a "single display that aggregates different indicators about the learner learning process and/or learning context into one or multiple visualizations" [31]. We use the term visualization when the LDB can be made of a single visualization and keep the term LDB when it clearly incorporates several visualizations.

We consider discussions taking place in LMSs or in other online applications if they are used in an educational context. We understand that LMSs are applications designed with the intention of being teaching tools, but sometimes we may refer to Google Hangouts as an LMS too. If so, it is because we consider the specific Hangouts chat from the G Suite for Education, which was built with the intent to support teaching and learning and used in a teaching context.

3.1 Collective Interaction

We distinguish collaborative actions from collective actions. [8] define collaboration as a coordinated synchronous activity born from the persistent will to share a common perception of a problem originating from people with similar social roles. Taken as a bottom-up process with the coordination coming up from the actors themselves, collaboration is difficult to automatically analyze. It implies that to coordinate, each actor evaluates the intentions of others and, doing so, each instantiates a theory of mind [15] that would be very challenging to implement artificially. Therefore we use the term “collective action” instead of “collaboration” to emphasize the fact that we do not assume a shared intention or shared goal in the actors’ interactions. We focus on a set of observable actions and leave the deduction of the interaction’s intent to the observer. Nevertheless, we use the expression “collective action” instead of the more generic “social interaction” to bear in mind that from the observer point of view, the studied interactions have a shared goal. This is not true for all social interactions. So, in a collaborative interaction the shared goal can be made explicit by the actors, in a collective interaction, it is subjective to the observer, and in social interactions, it may not exist at all.


Discussions. The elementary actions we are concerned with are those taking place in forums and chats of LMSs. In general, they are publications and messages' comments, but messages' up and down votes and publication times are also significant and integrated into the model that we elaborate in Sect. 4. In this study, we consider that a forum is made of discussions and that each discussion is created from an initial message followed by other messages. This sequence of messages is what we call a discussion thread. A discussion may contain several threads if comments are allowed or if explicit references are made to previous publications. In that case, the new thread will contain the referenced messages up to the initial one and all subsequent comments. In chats, several interrelated threads may also appear in a unique discussion when one explicitly mentions a previous message.

Depending on the LMS, different tools exist to facilitate discussions between peers. The main distinction is historical and separates asynchronous tools (forums) from synchronously aimed tools (chats). Hence, forums' hierarchical structure is often more fine-grained than that of chats because their posts were always meant to be persistent. On the other hand, chats have better online awareness and presence indicators. Despite their historical differences, today both forums and chats can be used for synchronous and asynchronous online discussions. One can subscribe to a forum's discussion and receive alerts as soon as a new message is published. And a history of posts is kept in modern chats, while the creation of several co-existing chats has been facilitated. Each chat is, then, equivalent to a forum discussion. Finally, both forums and chats display messages' timestamps and authors' information. So, for simplicity, we will call indifferently "forum" any virtual discussion space in a LMS.

Implicit Interaction Strength. Figure 2 gives a first example of messages' correlation, or closeness, translated as an interaction strength between their authors. There, the strength is either high, low or null. In practice, the strength I, which we refer to as the implicit interaction strength, could be anything between 0 and 1. We propose to define the implicit interaction strength between two messages as a function of time, topics and actors. This translates the idea that the messages' relationship depends on:

Who wrote them. Have the messages' authors already interacted together before? Is the author a super poster, a lonely lurker? One will probably consider a message differently depending on his relationship to the message's author.

Time. Obviously, the delay (or time delta) between two messages influences the strength of their relationship. Usually, the quicker the response, the stronger the link.

Content should also play an important role in the way messages relate to one another. The difficulty is to reliably automate content analysis, but NLP techniques exist to advance in that direction.

Observer's Threshold. Deducing the causal relationship between messages is difficult because one has to guess what is the real intention of the message's


Fig. 2. Two threads illustrating the relation between topic, time proximity and link strength [22].

author. To avoid errors, we do not directly deduce the messages' explicit correlation based uniquely on the previously computed implicit strength. Instead, we propose to make the relationship strength dependent on a parameter set interactively by the observer. We call it the required message correlation r (r ∈ [0; 1]). If the observer sets r ≈ 0 then, for him, the requirement for a message to be linked to other messages is weak, and the interactions between messages, actors and topics will be common, although that would probably lead to overly complex and unusable dynamics metrics and visualizations. Conversely, if r ≈ 1, the requirement is high, meaning that the observer wants linked messages to be close in time, have a lot of topic overlap and be written by closely connected authors. In that case, messages, actors and topics will hardly have any relations with one another and the dynamics will probably be invisible.

Explicit Interaction Strength. We propose the function E (see Eq. 1, graphed in Fig. 3) as a model for the explicited strength of the interaction between two messages. If the observer's requirement is high (r ≈ 1) then, for most values of I, the interaction E should be low, and conversely, if the observer's requirement is low (r ≈ 0) then the interaction E should be high, for most values of I.

E(r, I) = \sum_{i=0}^{3} B_i(I) P_i    (1)

where:
  B_i(I) = \binom{3}{i} I^i (1 - I)^{3-i}   (Bernstein polynomials)
  P_0 = (0, 0), P_3 = (1, 1)
  P_1 = P_2 = (r, 1 - r)
  with (r, I) ∈ [0, 1]^2


Fig. 3. Graphical proposition for the function E(I, r), the explicited strength of two messages' interactions, then used to identify collective action. I is the "implicit" strength based on topic, time and social network analysis. r is a "requirement", a threshold set by the observer. The P_i are the control points used in the Bernstein polynomials to define the curve of E.
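A minimal sketch of how Eq. (1) can be evaluated: the four control points define a cubic Bézier curve, the implicit strength I is used as the curve parameter, and the second coordinate of the resulting point is read off as E. Reading off that coordinate is our assumption about how the curve graphed in Fig. 3 is meant to be used.

```python
# Sketch: evaluating E(r, I) from Eq. (1) as a cubic Bezier curve.
# P1 = P2 = (r, 1 - r) encode the observer's requirement; reading the second
# coordinate of the curve point at parameter I is an assumption.
from math import comb

def explicit_strength(r: float, implicit: float) -> float:
    points = [(0.0, 0.0), (r, 1.0 - r), (r, 1.0 - r), (1.0, 1.0)]  # P0..P3
    e = 0.0
    for i, (_, py) in enumerate(points):
        b = comb(3, i) * implicit**i * (1.0 - implicit)**(3 - i)   # Bernstein basis B_i(I)
        e += b * py
    return e

# A demanding observer (r close to 1) suppresses a medium implicit link,
# while a lenient observer (r close to 0) makes it explicit:
print(explicit_strength(0.9, 0.5))  # ~0.2
print(explicit_strength(0.1, 0.5))  # ~0.8
```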

It is based on E, the explicited interaction strength, that we display the actor-actor or topic-topic dynamics. To further illustrate our intent, we complete the example in Fig. 2 with a discussion using longer threads (Fig. 4), and we set four strength levels for I: high, medium, low or ≈ 0. Each time step or topic delta decreases the implicit strength by one unit. Therefore, in our case, keeping the topic unchanged, the maximum number of relations a message can have with "previous" messages is 3. We call this connection pattern previous-3. But the I function could have other forms. For example, the star pattern, i.e., I being strong for the initial discussion message but null for the others; or the previous-∞ pattern, also called "total co-presence" in [35], denoting existing correlations with all the previous discussion's messages.

In thread c), not only does actor D connect to actor C, but he also connects to actor B. He potentially could have been connected to actor A, but in addition to their time delta of 3, there is a topic delta dropping the strength of I to nearly 0. Hence, only an exceptionally low requirement r would explicit E, the interaction between the messages of actors A and D. In our case, only if D had published on the same topic as A would their implicit interaction be high enough to gain visibility. Since this is not the case, D is not directly connected to A in the associated sociogram. Using a high requirement, the sociogram of Fig. 4 would only have the following relationships explicited: G ← F ← E and A ← B ← C. Changing the requirement to a low level would explicit all drawn relations but G ← H, A ← F and B ← D. The results of these manipulations are a temporal actor-actor network and a temporal topic-topic network that can be represented with weighted oriented graphs. Snapshots of an actor-actor network from the PP dataset are presented in Fig. 7.
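The previous-3 connection pattern described above can be sketched as follows: each message may relate to earlier messages of the discussion, and every time step or topic change costs one unit of implicit strength, so that at most the three previous same-topic messages remain connected. The message tuples and the unit scale below are illustrative assumptions.

```python
# Sketch of the previous-3 pattern: implicit links whose strength drops by one
# unit per time step or topic delta; messages are (author, topic, time_step).
def implicit_links(messages, max_units=3):
    links = []
    for j, (author_j, topic_j, t_j) in enumerate(messages):
        for i in range(j):
            author_i, topic_i, t_i = messages[i]
            cost = (t_j - t_i) + (0 if topic_j == topic_i else 1)   # time delta + topic delta
            strength = max(0, max_units + 1 - cost) / max_units     # implicit strength I in [0, 1]
            if strength > 0 and author_j != author_i:
                links.append((author_j, author_i, strength))        # arrow from the later action
    return links

# Illustrative thread: D links back to C, B and (weakly) A; a topic change on
# D's message would drop the link to A to zero, as in thread c.
thread = [("A", "T", 1), ("B", "T", 2), ("C", "T", 3), ("D", "T", 4)]
print(implicit_links(thread))
```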


Fig. 4. Interaction cycles built from a bipartite actor-topic graph. Threads c and d are used to build an actor-actor graph. Dotted arrows denote weaker links.

3.2 The Dynamics

Collective dynamics are time-dependent interactions spurring from the messages' co-occurrence in forums' discussions, to which an observer associates a common social goal.

Actors. How do actors' messages spread over time? Let (τ_i)_{i=1,2,3} be three message timestamps and A, B, C three actors. Collectively, the actors' messages could be distributed as A_1, B_2, C_3, meaning that A, B and C respectively posted one message at timestamps τ_1, τ_2, τ_3. But another dynamic could be that actor A, alone, published the three messages, thus A_1, A_2, A_3; or alternatively that A published a message at τ_1 and B at τ_2 and τ_3, thus A_1, B_2, B_3. These denote different publication dynamics. Visualizing them helps identify the users' posting behavior, distinguishing, for example, active posters from lurkers.

Topics. How do messages spread over time and topics? Or how are topics covered by messages over time? LDA-based methods are commonly used Bayesian parametric methods to approximate a message's topic or topic mixture [20]. But other non-parametric methods, such as stochastic block models [14], are also useful to map each message to a point in the topic space Φ. The topic space Φ may be the set of probability distributions over the topics T, U and V, where a message point noted M = (.7, .1, .2) denotes that the message is made of topics T, U and V respectively in proportions .7, .1 and .2. In the rest of the article, for illustrative purposes, we suppose that each message maps to a unique topic. Therefore M from topic T would be the point (1, 0, 0).
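For illustration, a hedged sketch of how each message could be mapped to a point in a three-topic space with LDA; the toy corpus and the three-topic setting are assumptions, not the corpora or models used in the paper:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

messages = [
    "derivative of a polynomial and the chain rule",
    "plotting a histogram with matplotlib and pandas",
    "chain rule exercise from week two of the course",
]
counts = CountVectorizer().fit_transform(messages)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
mixtures = lda.fit_transform(counts)  # one row per message, rows sum to 1
print(mixtures[0])                    # e.g. a point such as (.7, .1, .2) in topic space
```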


Here is a trivial topic-message dynamic: T_1, T_2, T_3. All three messages are on the topic T, but posted at times τ_1, τ_2, τ_3. At the opposite extreme we have T_1, U_2, V_3, where each message maps respectively to topics T, U and V. This type of dynamic shows the evolution of the topics' popularity in the LMS, or the evolution of the topics' interest over time.

Actors and Topics. How do an actor's messages cover topics over time? Let (A_T)_i be the message with timestamp τ_i posted by actor A on topic T. The two threads in Fig. 2 illustrate how the strength of actors' ties (or links) varies as a function of time and topic overlap. Thread a corresponds to actor-topic dynamic (2), where B's late post after A's first publication does not correlate strongly enough to create a link from B to A, but A's second post is timely enough, although not exactly on the same topic as B's message, to create the tie A ⇢ B, drawn as a dashed arrow. In thread b, in addition to the tie B ⇢ A, we have a topic overlap and time proximity between C and A. This makes the strong tie A → C. The small sociograms below the threads are agent-centered; the arrows originate from where the last action took place. Consider the following actor-topic dynamics:

(A_T)_1, ..., (B_U)_8, (A_T)_9    (2)

(A_T)_1, (B_V)_2, ..., (C_U)_8, (A_U)_9    (3)

Here, (2) denotes that actor A published two messages, both on topic T, but the second came after B's message on topic U. The “...” means that several unknown messages were published between A's first message and B's message. In (3), B publishes a message on topic V immediately after A's message on T. Then, after some time and other publications, A posts a message on yet a third topic, U, similar to what C had just published on. From the above observable dynamics, we can define two more dynamics: the actor-actor and the topic-topic dynamics.

Actors and Actors. The actor-actor dynamic is the evolution of the actors' social network, where the actors' links depend on their messages' topic closeness and frequency. We suppose that messages posted on overlapping topics in the same discussion by different actors potentially indicate some interaction between the authors. This may not always be the case. Figure 2 exemplifies how the strength of the messages' correlation is built from topic, temporal and actor closeness. What is not shown in this figure is that the strength of the tie may also depend on previous actors' interactions, that is, the social network built from previous messages' correlations. If a message is published shortly after another, then their correlation should be strong. In example (3), actors A and C would have a strong interaction because they published on the same topic U and their respective messages' timestamps are close. An interaction also exists between actors A and B, but it is probably weaker because it is only based on the messages' timestamps and not on topic overlap.


The relationship between actors B and C would be even weaker, if it existed at all: their messages are far apart and not on the same topic. So, from the messages' topic, time and actor correlations we build a directed graph representing the social network of messages exchanged between the LMS actors. The evolution of that network is what we call the actor-actor dynamic.

Topics and Topics. This type of dynamic concerns the way the topics' correlations evolve over time. For example, if at the beginning of a course topics T and U tend to be closely connected because students often mix them up in discussions, we hypothesize that as the course's concepts disambiguate, the relationship between the two topics will likely decrease, because fewer students will publish messages mixing both topics in the same discussion. As for the actor-actor dynamic, the topic-topic dynamic can be expressed as a temporal network (or temporal graph), but where nodes are topics and links between them represent relationships whose strength depends on topic, time and actor correlations. This can be seen as a message-based distance. In thread a of Fig. 2, some relation between T and U would occur for the same reason as that of A ⇢ B. Incidentally, in the second thread, we would have V ⇢ T based on time proximity, but also U ⇢ T based on the shared author A. Finally, let us recall that, for us, a collective dynamic is the evolution of relationships between topics and actors spurring from the messages' co-occurrences in an LMS's forum, without making any assumption about the actors' intentions.
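A simplified sketch (not the authors' implementation) of how such weighted, directed actor-actor and topic-topic networks can be assembled from a time-ordered message list; the message schema and the previous-k linking are assumptions consistent with the description above:

```python
import networkx as nx

def build_networks(messages, strength, k=3):
    """messages: time-ordered dicts with 'author' and 'topic' keys (assumed schema);
    strength(time_delta, topic_changed) -> weight in [0, 1];
    k: how many previous messages each message may connect to (previous-k)."""
    actors, topics = nx.DiGraph(), nx.DiGraph()
    for i, msg in enumerate(messages):
        for j in range(max(0, i - k), i):
            prev = messages[j]
            w = strength(i - j, msg["topic"] != prev["topic"])
            if w <= 0:
                continue
            for graph, a, b in ((actors, msg["author"], prev["author"]),
                                (topics, msg["topic"], prev["topic"])):
                old = graph.get_edge_data(a, b, default={"weight": 0.0})["weight"]
                graph.add_edge(a, b, weight=old + w)  # accumulate the link strength
    return actors, topics

# Example with the toy decay from the previous sketch:
# actors, topics = build_networks(msgs, lambda dt, changed: max(3 - dt - changed, 0) / 3)
```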

3.3 Identifying Collective Actions

Once we have built the actor-actor and topic-topic networks, we want to identify potential collective actions. Those are derived from the actor-actor network structure. We make the hypothesis that collective actions need the presence of a recurrent actor, that is, an actor replying to one of his repliers [5]. We take it as evidence that at least one of the actors has potentially assimilated someone else's message before acting, therefore initiating a collective action. Structurally, recurrent actors form cycles in the sociogram. For example, in thread c of Fig. 4, actor A, who posted twice in the thread, closed the cycle A ← B ← C ⇠ D ⇠ E ⇠ A. Furthermore, since actor E is in a cycle with A and F, and F with H, we will consider that actor A and all of the above are engaged in a common collective action with F and H. G, on the other hand, is not part of any cycle and, therefore, does not participate in a collective action. In fact, G published a message and disappeared: we have no evidence that his message had an impact on others or theirs on his. That is why we consider recurrent interaction a necessary (but not sufficient) condition for collective action. In that sense, we extend [5]'s findings that recurrent interactions are important for discussions.
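A small sketch of the cycle criterion on the actor-actor digraph: an actor is on a directed cycle if and only if it belongs to a strongly connected component of size greater than one (or has a self-loop). The edge list below loosely mimics the narrative of thread c in Fig. 4 and is illustrative only:

```python
import networkx as nx

g = nx.DiGraph([("B", "A"), ("C", "B"), ("D", "C"), ("E", "D"), ("A", "E"),
                ("F", "E"), ("E", "F"), ("H", "F"), ("F", "H"), ("F", "G")])

recurrent = {n for comp in nx.strongly_connected_components(g) if len(comp) > 1 for n in comp}
print(sorted(recurrent))  # ['A', 'B', 'C', 'D', 'E', 'F', 'H'] -- G, who only posted once, is left out
```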


Table 3. List of datasets and courses that we use to experiment our LDB.

Dataset source | Dataset | Discussion | Message | Author | Span (days) | Precision (iii) | Structure (i) | Extra (ii)
Moodle (2009) | FFL | 348 | 1490 | 19 | 78 | s | T | Active time, citations, Att.
Coursera (2017) | PP | 868 | 2548 | 1112 | 365 | [1 h; 2 m] | W, G & A. | V & C
Coursera (2017) | PML | 1135 | 4157 | 982 | 240 | [5 h; 1 m] | W, G & A. | V & C
Coursera (2017) | AT | 248 | 549 | 311 | 728 | [1 d; 1 y] | W & A. | V & C
Coursera (2018) | HR | 499 | 1004 | 638 | 989 | ms | W, G & S. | V, C & Sub.
Coursera (2018) | UFM | 1318 | 9460 | 4609 | 1022 | ms | W, G & S. | V, C & Sub.
Hangouts (2018) | VUCI | 5 | 7297 | 96 | 327 | s | G & S. |

4 The Datasets

In this section, we detail the datasets used to analyze collective dynamics. Table 3 lists our seven datasets, collected in 2017 and 2018. They are organized in four groups based on the datasets' origins. The datasets contain forum information from the following online courses: 1) Coursera (2018), a database extraction for a Human Rights (HR) and an Understanding Financial Markets (UFM) MOOC; 2) Moodle (2009), a database extraction for a French as a Foreign Language (FFL) course; 3) Hangouts (2018), a JSON export from the Virtual University of Côte d'Ivoire (VUCI)'s G Suite for Education, with the data from 5 chats set up by the university staff for the staff or the students; 4) Coursera (2017), a saving of online courses using a Selenium web scraper. Three courses are available: Python Plotting (PP), Python Machine Learning (PML) and African Towns (AT), an urban planning course. This dataset is then transformed and stored as a CSV file to be processed by a Python engine; it has the messages' content but only approximate timestamps. In the table, the forum structure (i) is given by the existence of different forum types: Weekly (W), General (G), Technical Support (S), Thematic (T) or Assignment (A) related. Extra information (ii) is often available, such as messages' upvotes (V), comments (C), subscriptions (Sub.) or file attachments (Att.). When data was scraped, the dates were in humanized format (e.g., 6 months ago, 23 min ago), therefore the precision (iii) varies with the posts' ages: recent posts can be compared with greater precision than older ones. We give the intervals in which the precision varies in hours (h), days (d), months (m) and years (y).
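As an illustration of this precision issue, a hedged sketch of turning a humanized date into an approximate timestamp plus a precision interval; the unit table and scrape date are illustrative, not the authors' exact preprocessing:

```python
import re
from datetime import datetime, timedelta

UNITS = {"min": ("minutes", 1), "h": ("hours", 1), "hour": ("hours", 1),
         "d": ("days", 1), "day": ("days", 1),
         "month": ("days", 30), "year": ("days", 365)}

def approximate_timestamp(text, scraped_at):
    """Parse strings such as '6 months ago' into an estimated datetime
    and the width of one unit, i.e. the precision of that estimate."""
    qty, unit = re.match(r"(\d+)\s*([a-z]+)", text.lower()).groups()
    kind, factor = UNITS[unit.rstrip("s")]
    delta = timedelta(**{kind: int(qty) * factor})
    return scraped_at - delta, delta / int(qty)

when, precision = approximate_timestamp("6 months ago", datetime(2017, 8, 1))
print(when.date(), precision)  # approx. 2017-02-02, with roughly 30 days of precision
```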


Fig. 5. Visualizations from the FFL dataset [22]. (Color figure online)

5 Visualizations

In this section, we present the model that we are going to use to analyze the collective dynamics and introduce three visualizations from our work. They were made separately, each testing some elements of the global conception model presented in Fig. 9.

5.1 Visualizing Different Action Types

We used the FFL dataset to sketch our first visualizations. Our Moodle dataset has the particularity of containing detailed information about the actors' activity type and duration. It enabled us to distinguish the users' active time from their idle time. With that information, we came up with the visualizations of Fig. 5, built as part of a standalone web application. Once a user is selected, we see the visualizations corresponding to his active time. The top chart presents, in yellow, the total time spent in the forums by a user, day by day, and relates it to his active time, shown in blue. To handle the scaling problem, we implemented monthly, weekly and daily grouping criteria. On the lower chart, we compared the activity time of two users. We see that, although they display similar activity patterns over time, one is generally more active than the other. This was a successful test to sketch our first visualizations, but the FFL dataset lacked message content and its size did not create a huge monitoring problem for the tutors. Still, this dataset is interesting because it emphasizes the importance of the activity type and shows the importance of scaling even for a small dataset. The following step is to scale up with a bigger dataset that includes content.
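A minimal sketch of the daily/weekly/monthly grouping used to keep such activity charts readable; the column names are assumptions about the trace format, not the actual Moodle log schema:

```python
import pandas as pd

def activity_by(df: pd.DataFrame, freq: str = "W") -> pd.DataFrame:
    """Aggregate total and active seconds per user at daily ('D'),
    weekly ('W') or monthly ('M') resolution."""
    return (df.set_index("date")
              .groupby("user")[["total_s", "active_s"]]
              .resample(freq).sum()
              .reset_index())

log = pd.DataFrame({"user": ["u1"] * 3,
                    "date": pd.to_datetime(["2009-03-01", "2009-03-02", "2009-03-10"]),
                    "total_s": [1200, 600, 900], "active_s": [700, 200, 450]})
print(activity_by(log, "W"))  # one row per user and week, ready to plot as stacked bars
```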


Fig. 6. Global (a) and detail (b) representations of a VUCI conversation. (b) is a detailed representation of the red highlighted area in (a), showing a 2 h conversation. The highlighted yellow area shows the text, author and timestamp of a selected message. A log count of the messages per author is displayed on the left of (a) as a horizontal bar chart [22]. (Color figure online)

5.2 Interactivity

The Hangouts dataset is slightly different from the other datasets. It is less structured because it comes from chats and not forums. Figure 6 displays 6 274 messages from one chat, gathering several threads of exchanges between the VUCI's administration and their tutors, from March 2018 to December 2018. We started from a manual, but automatable, export of Google's Hangouts service. It gave us a JSON file that we preprocessed in Python and fed to d3,


a visualization JavaScript library, via a standalone Django web application. We built an LDB and tested, on that larger dataset, interactive features such as zooming, panning and data point selection. The figure's pane (a) contains a bird's-eye view of all messages. Users are represented vertically, in the middle, with their names. On the left is a log-scaled histogram of the message count for each user. On the right, along the time axis, we plotted the messages as discs whose areas are proportional to the messages' length. Activating the mouse wheel while on the time axis zooms in and out. The lower pane shows the 3 h period framed in red, zoomed. It shows a message pop-up, with full detail, activated by the mouse hovering over a data point. In this investigation, we did not include the SNA and NLP analyses because our objective was the visualization of a large dataset with content and the handling of the scaling problem. It proved that our technology choices were sound: we transformed the dataset of several thousand messages, from the JSON file to the HTML rendering, in a few seconds. But it also pinpointed the importance of implementing many interactive exploration functions to alleviate the scaling problem: for example, ways to quickly zoom on a few data points without losing the overall picture, ways to filter and order data points or axis labels, and more intricate communication between the different LDB charts. In addition to all this, a major drawback of our visualization is that it did not yet take the final users into account. Finally, besides defining visual modalities, we still needed to test the algorithm to compute the collective activity indicator.
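A heavily simplified sketch of the JSON-to-records preprocessing step feeding the d3 view; the field names below are assumptions about the export format, not a documented schema:

```python
import json

def flatten_chat(path):
    """Turn an exported chat (assumed layout) into flat records for the visualization."""
    with open(path, encoding="utf-8") as fh:
        export = json.load(fh)
    records = []
    for event in export.get("messages", []):        # assumed top-level key
        text = event.get("text", "")
        records.append({
            "author": event.get("author"),           # assumed field
            "timestamp": event.get("timestamp"),     # assumed field
            "text": text,
            "length": len(text),                     # drives the disc area in the plot
        })
    return records
```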

Fig. 7. At the top (a) is half of a compound yearly actor-actor network. The three bottom images (b), (c) and (d) are close-ups around actor 642 during three quarters of the year [22]. (Color figure online)


5.3 The Social Network and the Visualization

We used the PP dataset from Coursera 2017 to investigate the construction of a collective activity indicator with SNA techniques. We came up with the sociogram of Fig. 7, illustrating an actor-actor dynamic. Nodes are all learners; we removed the tutors and the course's mentors to approach Dillenbourg's collaboration definition that we gave in Sect. 2. The users are linked based on their proximity to the previous-3 actors who published in the same discussion. The arrows' width depends on the messages' timestamp closeness, the number of co-occurrences of the authors and the number of votes that the message collected. We colored them by discussion and ordered them by age, the oldest being the lightest. This is an intermediate visualization that is used for analysis purposes and will not be presented to the end users, for two reasons: a) it gives abstruse information to someone who does not have access to the raw data; b) it represents the actors as nodes and, as we noted in Sect. 2.2, we would rather have visualizations showing topics than persons. We present it to illustrate what a social network from our dataset looks like, and because, zoomed and reduced to the three snapshots (b), (c), (d), representing three successive yearly quarters, we can distinguish an evolving pattern. In particular, a detailed analysis of actor 642, circled in red, shows that he started by designating two other messages, probably because they were meaningful to him (a); then he engaged in several message exchanges, designating others' messages as meaningful while his own messages were also getting attention (c); and in the most recent quarter someone commented on one of his earlier messages (d). It is not clear from Fig. 7 whether actor 642 was part of cycles. Testing our hypothesis that actors in cycles share a collective dynamic is part of our perspectives.

5.4 Scalability, Interaction and Social Network's Visualization

In Fig. 8 we show our latest LDB, answering problems of portability and scalability while still using SNA. It covers the last three blocks of our data analysis cycle (Fig. 9). As in Fig. 7, this visual is not yet intended for the end users, but it illustrates how we can combine several datasets, some huge, in one LDB while adding interactivity and still keeping the data update fast enough for real-time usage. Here we use a JavaScript library implementing the Verlet algorithm (a numerical method to simulate molecular dynamics) to lay out the nodes representing, as previously, users whose number of exchanges is translated into their disc's area. The difference with Fig. 7 is that the observer can now select the data source, which is precomputed with different modes of connection (e.g., previous-3, previous-1, previous-5) as described in Sect. 6.1. He can also drag, fix and zoom on the nodes, and a mouse-over brings additional information from the underlying dataset. We anonymized the data and published this visualization at https://observablehq.com/@maliky/example-of-network-fromdiscussion-forums-in-mooc. On another LDB, not shown here but accessible from the previous URL, we also show the implementation of a mechanism for the observer to filter links based on slider inputs.
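For readers unfamiliar with the scheme, a minimal position-Verlet step written in Python; the actual layout runs in JavaScript, so this is only an illustration of the integration idea, with damping and time step chosen arbitrarily:

```python
def verlet_step(pos, prev_pos, force, dt=0.02, damping=0.99):
    """One position-Verlet update for a single node.
    pos, prev_pos, force: (x, y) tuples; returns (new position, position to remember)."""
    new = tuple(p + damping * (p - q) + f * dt * dt
                for p, q, f in zip(pos, prev_pos, force))
    return new, pos
```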


Fig. 8. An interactive visualization to explore datasets from MOOC forums' interactions.

6 Perspectives and Conclusion

6.1 Perspectives

The three tests and visualizations presented prepare further work to build a collective activity indicator taking into account the social network structure, the content of messages and their evolution in time, as well as visualizations for that indicator. To further our effort to find visual modalities for the indicator, we conducted a survey in July 2018 with 48 tutors from the VUCI to introduce our project and start engaging them in a co-construction process. 27 were not satisfied with, or unsure about, their current tools' effectiveness to monitor their students' work. 39 agreed or strongly agreed that ICTs could help their students better collaborate, and 40 that collaboration was indeed important for learning. In a second survey, we plan to ask the tutors what kind of visual representation they would find useful to monitor the collective activities of their students, but also what dimensions of the relationships between messages they would like to leverage to make those relationships visually explicit. This is part of a co-construction approach


Fig. 9. Data Analysis cycle [22].

that should engage the tutors, facilitating the adoption of our visualization while increasing its usability and impact.

Experiments. We envision our coming experiment as twofold. First, we observe a course for 5 to 7 weeks. At least one of the tutors should connect regularly to our LDB, where our indicator of collective action is regularly updated. We record interface events (for example, mouse moves and clicks, filters used) on our LDB. We collect the instructor's comments made on the fly on the platform and, at the end of the course, we conduct an interview to get feedback formatted as a Likert scale and based on [9]'s heuristics: 1. Spatial organization 2. Information coding 3. Orientation and help 4. Data set reduction 5. Flexibility 6. Consistency 7. Remove the extraneous (ink). If we are not able to get live data from a big enough MOOC, where collective action is more likely to occur, we will use our Coursera 2018 dataset and collect feedback from the instructors. The second fold of the experiment comes after incorporating the first observations made by the instructors. It is a short experiment used to present the modifications made to the prototype and to collect final observations from the instructors. Concerning the portability and validation of our pipeline stages (Fig. 9), we continue to work with several datasets, extending our unified model to incorporate SNA and NLP statistics for all datasets. The portability assumption rests on our capacity to extract data periodically, every few minutes, hours or days, from the main LMS. This will necessitate proper API authorizations to make our LDB communicate effectively with the LMS.

6.2 Conclusion

In this research we proposed a method to compute and visualize an indicator of collective actions based on interactions in MOOCs' forums. We realized several visualizations to prepare an experiment with the final prototype of the indicator. Our first visualization integrated detailed information about the activities. The second and third used bigger datasets and tested, respectively, interactive features to alleviate the scaling problem and features of a temporal social network built from the forums' interactions. To validate our model entirely, we will need to run a real-world experiment with feedback from more instructors. Still, our research shows that it is now possible to build a real-time LDB for collective actions emerging from the interactions in MOOCs' forums. We insist that our indicator is not a collaboration indicator but an indicator of collective actions, which depends on the observer's standpoint. It is the observer who ultimately defines what types of interactions are important in his context; artificial intelligence solutions have often failed by inserting the researchers' and designers' a priori into situations that ended up unforeseen. In our case, and as recommended by [3], we aim for a “stupid” indicator to enhance the intelligent tutors. Therefore, it is the observer who will set the threshold to reject unnecessary interactions from his standpoint. He could, for example, set the time resolution to something low to keep long interactions. Furthermore, our indicator relies on the existence of cycles created by recurring actors; those may not always exist, but in that case we would still display important information about the forum's activity. We hope that this research will help improve the monitoring of group activity and enhance the tutors in their ability to provoke collective learning. To further this research, one should include a field experiment as described in Sect. 6.1, but also plan an LDB for the learners themselves, as this would potentially be beneficial for their autonomy in a self-regulated manner.

References

1. Anaya, A.R., Luque, M., Peinado, M.: A visual recommender tool in a collaborative learning experience. Expert Syst. Appl. 45, 248–259 (2016)
2. Awasthi, P., Hsiao, I.H.: INSIGHT: a semantic visual analytics for programming discussion forums. In: VISLA@LAK, pp. 24–31 (2015)
3. Baker, R.S.: Stupid tutoring systems, intelligent humans. Int. J. Artif. Intell. Educ. 26(2), 600–614 (2016). https://doi.org/10.1007/s40593-016-0105-0
4. Boroujeni, M.S., Hecking, T., Hoppe, H.U., Dillenbourg, P.: Dynamics of MOOC discussion forums. In: LAK, pp. 128–137 (2017)
5. Chua, S.M., Tagg, C., Sharples, M., Rienties, B.: Discussion analytics: identifying conversations and social learners in FutureLearn MOOCs (2017)
6. Dascalu, M., McNamara, D.S., Trausan-Matu, S., Allen, L.K.: Cohesion network analysis of CSCL participation. Behav. Res. Methods 50(2), 604–619 (2017). https://doi.org/10.3758/s13428-017-0888-4
7. Davis, D., Jivet, I., Kizilcec, R.F., Chen, G., Hauff, C., Houben, G.J.: Follow the successful crowd: raising MOOC completion rates through social comparison at scale. In: LAK, pp. 454–463 (2017)


8. Dillenbourg, P.: What do you mean by collaborative learning. In: Collaborative Learning: Cognitive and Computational Approaches, vol. 1, pp. 1–15 (1999)
9. Dowding, D., Merrill, J.A.: The development of heuristics for evaluation of dashboard visualizations. Appl. Clin. Inform. 09(3), 511–518 (2018)
10. Duque, R., Gómez-Pérez, D., Nieto-Reyes, A., Bravo, C.: Analyzing collaboration and interaction in learning environments to form learner groups. Comput. Hum. Behav. 47, 42–49 (2015)
11. Emmons, S.R., Light, R.P., Börner, K.: MOOC visual analytics: empowering students, teachers, researchers, and platform developers of massively open online courses. J. Assoc. Inf. Sci. Technol. 68(10), 2350–2363 (2017)
12. Ezen-Can, A., Boyer, K.E., Kellogg, S., Booth, S.: Unsupervised modeling for understanding MOOC discussion forums: a learning analytics approach. In: Proceedings of the Fifth International Conference on Learning Analytics and Knowledge, pp. 146–150. ACM (2015)
13. Fu, S., Zhao, J., Cui, W., Qu, H.: Visual analysis of MOOC forums with iForum. IEEE Trans. Vis. Comput. Graph. 23(1), 201–210 (2017)
14. Gerlach, M., Peixoto, T.P., Altmann, E.G.: A network approach to topic models. Sci. Adv. 4(7), eaaq1360 (2018)
15. Gerstenberg, T., Tenenbaum, J.B.: Intuitive theories. In: The Oxford Handbook of Causal Reasoning (2017)
16. Gómez-Aguilar, D.A., Hernández-García, A., García-Peñalvo, F.J., Therón, R.: Tap into visual analysis of customization of grouping of activities in eLearning. Comput. Hum. Behav. 47, 60–67 (2015)
17. Guo, F., et al.: TieVis: visual analytics of evolution of interpersonal ties. J. Vis. 20(4), 905–918 (2017). https://doi.org/10.1007/s12650-017-0430-x
18. Heer, J., Shneiderman, B.: Interactive dynamics for visual analysis. Queue 10(2), 30 (2012)
19. Hommes, J., Rienties, B., de Grave, W., Bos, G., Schuwirth, L., Scherpbier, A.: Visualising the invisible: a network approach to reveal the informal social side of student learning. Adv. Health Sci. Educ. 17(5), 743–757 (2012). https://doi.org/10.1007/s10459-012-9349-0
20. Jelodar, H., et al.: Latent Dirichlet allocation (LDA) and topic modeling: models, applications, a survey. arXiv:1711.04305 [cs], November 2017
21. Klerkx, J., Verbert, K., Duval, E.: Enhancing learning with visualization techniques. In: Spector, J.M., Merrill, M.D., Elen, J., Bishop, M.J. (eds.) Handbook of Research on Educational Communications and Technology, pp. 791–807. Springer, New York (2014). https://doi.org/10.1007/978-1-4614-3185-5_64
22. Koné, M., May, M., Iksal, S., Oumtanaga, S.: Towards visual explorations of forums' collective dynamics in learning management systems. In: Lane, H., Zvacek, S., Uhomoibhi, J. (eds.) Proceedings of the 11th International Conference on Computer Supported Education, CSEDU 2019, Heraklion, Crete, Greece, 2–4 May 2019, vol. 2, pp. 67–78. SciTePress (2019)
23. van Leeuwen, A., Janssen, J., Erkens, G., Brekelmans, M.: Supporting teachers in guiding collaborating students: effects of learning analytics in CSCL. Comput. Educ. 79, 28–39 (2014)
24. Lobo, J.L., Santos, O.C., Boticario, J.G., Del Ser, J.: Identifying recommendation opportunities for computer-supported collaborative environments. Expert Syst. 33(5), 463–479 (2016)
25. Lonn, S., Aguilar, S.J., Teasley, S.D.: Investigating student motivation in the context of a learning analytics intervention during a summer bridge program. Comput. Hum. Behav. 47, 90–97 (2015)


26. May, M., George, S., Prévôt, P.: TrAVis to enhance online tutoring and learning activities: real-time visualization of students tracking data. Interact. Technol. Smart Educ. 8(1), 52–69 (2011)
27. Medina, E., Meseguer, R., Ochoa, S.F., Medina, H.: Providing behaviour awareness in collaborative project courses. J. Univ. Comput. Sci. 22(10), 1319–1338 (2016)
28. Paredes, W.C., Chung, K.S.K.: Modelling learning & performance: a social networks perspective. In: Proceedings of the 2nd International Conference on Learning Analytics and Knowledge, pp. 34–42. ACM (2012)
29. Qu, H., Chen, Q.: Visual analytics for MOOC data. IEEE Comput. Graph. Appl. 35(6), 69–75 (2015)
30. Rehm, M., Gijselaers, W., Segers, M.: The impact of hierarchical positions on communities of learning. Int. J. Comput.-Support. Collab. Learn. 10(2), 117–138 (2014). https://doi.org/10.1007/s11412-014-9205-8
31. Schwendimann, B.A., et al.: Perceiving learning at a glance: a systematic literature review of learning dashboard research. IEEE Trans. Learn. Technol. 10(1), 30–41 (2017)
32. Stephens-Martinez, K., Hearst, M.A., Fox, A.: Monitoring MOOCs: which information sources do instructors value? In: Proceedings of the First ACM Conference on Learning @ Scale Conference, pp. 79–88. ACM (2014)
33. Twissell, A.: Visualisation in applied learning contexts: a review. J. Educ. Technol. Soc. 17(3), 180–191 (2014)
34. Wang, X., Wen, M., Rosé, C.P.: Towards triggering higher-order thinking behaviors in MOOCs. In: Proceedings of the Sixth International Conference on Learning Analytics & Knowledge, pp. 398–407. ACM (2016)
35. Wise, A.F., Cui, Y., Jin, W.Q.: Honing in on social learning networks in MOOC forums: examining critical network definition decisions. In: Proceedings of the Seventh International Learning Analytics & Knowledge Conference, pp. 383–392. ACM (2017)
36. Yousuf, B., Conlan, O.: VisEN: motivating learner engagement through explorable visual narratives. In: Conole, G., Klobučar, T., Rensing, C., Konert, J., Lavoué, É. (eds.) EC-TEL 2015. LNCS, vol. 9307, pp. 367–380. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24258-3_27

Immersion and Control in Learning Art Knowledge: An Example in Museum Visit

Morgane Burgues, Nathalie Huet, and Jean-Christophe Sakdavong(B)

CLLE-LTC CNRS UMR 5263, Université Toulouse 2 Jean Jaurès, Toulouse, France
[email protected], {nathalie.huet,jean-christophe.sakdavong}@univ-tlse2.fr

Abstract. New technologies are extensively used in learning and their potential positive impact is still being evaluated, in particular for tablets and Virtual Reality (VR). The interest of these technologies lies in the immersion and the control they can bring to the learning environment. To study the impact of the degree of immersion and control on learning, we evaluated performance, intrinsic motivation and self-regulation in a first experiment. The task consisted of a virtual visit of a 3D museum under four different conditions: (1) high immersion with active VR, (2) passive high immersion, (3) low immersion with active tablet, (4) passive low immersion. Intrinsic motivation and emotion were evaluated with a self-reported questionnaire, and self-regulation with behavioral indicators. Results show that active control of the learning improves performance, but it did not impact intrinsic motivation and had only a partial impact on self-regulation. Immersion had no impact on performance and intrinsic motivation, but it partially impacted self-regulation. Because overall performance was quite low, we supposed that participants were mentally overloaded by the high amount of information. To test the effect of cognitive load, we conducted a second experiment in which a low cognitive load condition was added and compared to the sample of experiment one, which was in a high cognitive load condition. Results showed higher performance in the low cognitive load condition than in the high one. Control still had a positive impact on learning under low cognitive load, and immersion had no impact.

Keywords: Virtual reality · Learning · Control · Immersion · Art knowledge · Emotions

1 Introduction

For several decades, new technologies for learning have been increasing. Some authors even think that they can deeply impact daily educational practice [1]. Educational practice brings together different contents such as mathematics, languages, and art. On the subject of art, a charter on cultural education was published in 2016 in Avignon (France), to make artistic and cultural education accessible to all. Following this charter and the increase in new technologies, we focus in these experiments on learning artistic knowledge during a virtual visit of a museum via new technologies. Learning is recognized to be a complex process, impacted or supported by different factors,


such as emotions [2], intrinsic motivation [3] and self-regulation [4]. Among the different new learning devices we can find smartphones, tablets, augmented reality and virtual reality. Here, the device that interests us is virtual reality as a pedagogical resource [5, 6] for art knowledge. Virtual Reality can be an interesting tool for learning as it is characterized by two elements: immersion and the potential interaction and control with the environment [7]. Regarding immersion, it is attested that Virtual Reality (VR) is the most immersive display available today, compared to computers, conventional displays and tablets [8]. But research has not established that VR brings better learning performance thanks to immersion [9]. As the impact of immersion has not reached consensus for the moment, we can look at the second VR characteristic, control. Control is the fact that people can interact with their environment: they can choose actions, be involved and be active in their learning. Conversely, if people cannot interact with the environment, they cannot be active; they are passive in their learning and just look passively at information. This kind of learning style is less favourable than the active control of learning [10] allowed by VR. To study the impact of VR on learning, we have to take into account the impact of other variables that can affect learning. The two variables with an attested impact on learning that interest us are self-regulation [11] and intrinsic motivation [4]. Self-regulation is defined as an active, conscious process allowing knowledge construction, a deliberate application of strategies in pursuit of goals [12]. During the literature review we assessed a lack of studies on the relation between VR, self-regulation and learning [13], attesting to the interest of studying this field. However, the control brought by virtual reality is recognized to foster better performance and cognitive competences thanks to a deeper processing of information [14]. Also, the immersion brought by VR limits parasitic elements from the environment [8]. Finally, we hypothesize that a high level of control and immersion brings better performance and self-regulation [15]. Concerning intrinsic motivation, active control in classical environments is recognized to positively impact intrinsic motivation [16]. Consequently, we hypothesize the same for virtual reality: people with high control of their learning in VR have a better performance than people with a lower level of control. Concerning immersion and intrinsic motivation, results did not seem to be in accordance with a positive impact of a high level of immersion on performance in virtual reality environments [17] or on motivation [18, 19]. However, we noticed that research did not focus on intrinsic motivation, but on general motivation. We hypothesize, referring to the literature, that an immersive environment increases intrinsic motivation and performance thanks to virtual reality [20]. This article is an extension of a previous article [13], to which we added another experimental condition, presented as experiment 2. The first experiment [13] studies the impact of control and immersion on motivation, self-regulation and learning performance. We have several hypotheses: immersion positively impacts self-regulation, intrinsic motivation and learning performance; control also allows better performance, self-regulation and intrinsic motivation. To study


this, we build a 3D museum visit, as students can do with their classes, in accordance with the 2016 charter, to participate in sharing art culture.

2 Experiment 1

2.1 Objectives and Methodology

The first experiment consists of a museum visit composed of 4 sculptures. Students had to acquire knowledge about these sculptures in a 3D virtual environment, with a VR headset or with a tablet. The purpose of this experiment is to evaluate the impact of immersion and control on learning, intrinsic motivation and self-regulation.

Participants. In this study we had sixty-one students (thirty-six females and twenty-five males, average age = 22.66, SD = 3.80) recruited at the Universities of Toulouse II and III. Prior to the museum visit, we checked their level of knowledge about the sculptures presented later by asking them some questions, as well as the fact that they had not taken art courses. This test allowed us to check that they were completely unfamiliar with the selected artworks. It also allowed us to know their real need for knowledge, and to be sure that a behavior such as asking for new knowledge was due to a lack of knowledge and could be considered a self-regulated strategy.

Materials and Groups. Participants had to do a 3D museum visit to acquire knowledge about the sculptures. The museum contains four sculptures by Michelangelo: the “David” (Fig. 1), the “Moses” (Fig. 2), the “Pieta” (Fig. 3) and the “Dying slave” (Fig. 4). The learning performance was evaluated on the last three; the “David” was simply used as a familiarization phase with the material. The learning task was to visit a museum. Students had ten minutes to familiarize themselves with the system (first sculpture) and thirty minutes with the other three sculptures (participants could use these thirty minutes freely). Students were aware that they would be evaluated on their learning of the sculptures after the visit. The sculptures were presented in the same order to all participants: familiarization with the “David”, then first the “Moses”, secondly the “Pieta”, and at the end the “Dying slave”. The task was to memorize the knowledge on Michelangelo's sculptures after hearing the information. Participants were randomly assigned to different independent groups. We had four groups. Group #1 (N = 15): high level of immersion and active control of their learning. Participants had a VR headset and a pointer remote to move around the sculpture and search by themselves for the information to hear, selecting a part of the sculpture using the remote. Group #2 (N = 15): high immersion level but with passive control. Here, participants had a VR headset and could move around the sculptures, but did not have to search for information: the information to hear was given on a control panel (Fig. 5) and they just had to select it and listen to it. Groups #3 and #4: low immersion, using an Android tablet instead of a VR headset, but with the same 3D environment as group #1; group #3 (N = 16) was in active mode and group #4 (N = 15) in passive mode.


Fig. 1. The “David” by Michelangelo, sculpture for familiarization.

Fig. 2. The “Moses” by Michelangelo [13].


Fig. 3. The “Pieta” by Michelangelo [13].

Fig. 4. The “Dying slave” by Michelangelo [13].

The VR headset was the Google Daydream mobile headset, with three degrees of freedom (DoF), together with a three-DoF pointer remote. The tablets were Android HP Pro Slate 12 tablets, displaying the 3D scenes on the 2D screen and using the gyrometer and magnetometer to look around (three DoF, like the VR headset).

Familiarization. The familiarization phase consisted of the discovery of one sculpture, to get used to the materials, and of a test at the end that was not evaluated. The whole process was designed to be the same as the evaluated phase. The familiarization period took ten minutes during the museum visit, in the attributed condition (group one, two, three or four). At the same time, participants discovered that the visit consisted of two activities: visually observing the sculptures and listening to the information. They could listen to two types of


information: general information on the sculpture globally, and specific information on particular parts of the sculpture (e.g., the eyes of the “David”). After the familiarization phase, participants had a familiarization test phase. The test consisted of the completion of a fill-gap exercise about the sculpture they had just seen. It was specified to participants that this familiarization fill-gap was not evaluated, whereas the following ones would be.

Performance. Performance was assessed by fill-gap exercises related to each sculpture (as in the familiarization phase). All participants had the same fill-gap exercises. They completed, one by one, 4 fill-gap sentences for each sculpture (first 4 sentences about the “Moses”, followed by 4 about the “Pieta” and 4 about the “Dying slave”). The sentences were picked from the general information they could listen to during the visit. Each fill-gap exercise was composed of four sentences, each with three holes, for a total of 12 words to find per sculpture. For example: “The weight of the statue rests on a single [leg] and therefore on a foot in majority. With time the [microcracks] appeared on this foot and go up in the leg, which put [statue] in danger; [fill-gap to complete]” [13]. This brought the total to 36 words. Each correct answer scored 1 point, for a maximum score of 36 points, 12 points per evaluated sculpture.

Measure of Self-regulation. Using the Pintrich model [11], we constructed and used two behavioral indicators: (1) the number of times that general and specific information was heard and replayed, a metacognitive indicator revealing a regulation of one's own learning and knowledge; (2) the number of clicks on the clock that displayed the time elapsed during the visit [4], a well-known time management indicator already used in the literature. Students' behaviour was tracked by tracing every click on a control panel (Fig. 5); these traces were used to provide our indicators. When a behaviour occurred it was coded 1. According to the literature, a good self-regulator is a learner who regularly manages his/her time and the information displayed, according to their level of knowledge on art and their estimated degree of memorization.

Measure of Intrinsic Motivation. Following the literature, we decided not to measure general motivation but intrinsic motivation, which is positively related to performance [3], using the questionnaire from Deci, Eghrari, Patrick and Leone [21]. It contained 17 items, divided into four dimensions: interest, perception of competence, pressure, perception of choice. To complete the questionnaire, participants had to indicate their degree of disagreement-agreement on a 7-point Likert scale, ranging from 1: “Absolutely wrong for me” to 7: “Absolutely true for me”. This questionnaire had to be completed at the end of the learning tasks on the sculptures. A higher score showed a higher degree of agreement with the dimension: the higher the pressure score, the more pressure they felt; the higher the perceived competence score, the more competent they felt.


Fig. 5. Control panel during the visit [13].

Measure of Emotional Perception. Emotional perceptions were evaluated with items assessing the degree of emotion perceived for each sculpture. Participants had to indicate the degree of emotion perceived for the “Moses”, the “Pieta” and the “Dying slave” on a 5-point Likert scale, ranging from 1: no emotion perceived to 5: strong emotion perceived. The higher the score, the stronger the emotion the participants experienced.

Procedure. The first step was the same for all participants whatever the group assignment. First, the experimenter presented the general instructions and participants completed several questionnaires: consent form, level of knowledge before the 3D museum visit, and emotional perception of each sculpture. The second phase was the familiarization phase, to familiarize participants with the materials according to the condition they were randomly assigned to: VR passive/active, tablet passive/active. At this step, the participants had different materials and thus different instructions, but the 3D museum visit was the same; only the presentation modality was different. During this phase, they visited one sculpture and heard the information about it. The familiarization phase also showed and allowed them to do a fill-gap exercise, without stakes, because it was not evaluated. The third phase was the learning phase, in which they continued the visit with the other three works of art. This phase was identical to the familiarization phase (30 min for three sculptures), except that they knew they would be evaluated on their learning at the end of the visit. Between the learning phase and the test phase, participants answered the intrinsic motivation questionnaire and a measure of perceived mental effort, to control for this variable. That gave us an interference task before the test of the learning performance. To record the learning performance, participants had to answer the fill-gaps; they had as much time as they wanted and needed to do so.


2.2 Results

Emotional Perception. A one-way analysis of variance (ANOVA) with the sculptures as repeated measures was computed. Results revealed that the three sculptures were not equally emotionally perceived, F(2,120) = 27.23; p < .001. The emotional perception of the “Pieta” sculpture was significantly higher (M = 3.11; SD = .90) than for the other two sculptures. The other two did not differ in emotional perception; they aroused a quite low emotion, M = 2.34 (SD = .90) for the “Moses” sculpture and M = 2.34 (SD = .96) for the “Dying slave”. Because of this difference in emotional perception, in the following sections we used the sculptures as a repeated measure.

Performance. Results from the three-way ANOVA with Immersion and Control as independent factors and the sculpture as the repeated measure showed that performance was significantly affected by the control condition, F(1,57) = 8.32; p = 0.006, η2p = 0.13. Participants in the active conditions significantly outperformed (M = 5.69, SD = 0.37) those in the passive condition (M = 4.18, SD = 0.37). Besides, results revealed no significant effect of immersion on performance, F(1,57) = 0.22; p = 0.64. Concerning the variations as a function of sculptures, results showed that the sculpture significantly affected performance, F(1,57) = 6.46; p = 0.014, η2p = 0.10. More precisely, the “Pieta” was significantly more successful in terms of performance (M = 6.08, SD = 2.81) than the “Moses” (M = 4.79, SD = 2.60) and the “Dying slave” (M = 3.90, SD = 2.89). Finally, the interaction between sculpture and control was not significant, F(1,57) = 0.93; p = 0.76, η2p = 0.02. Similarly, no interaction between sculpture and immersion was found, F(1,57) = 1.97, p = 0.17, η2p = 0.03 (Fig. 6).

Fig. 6. Performance per sculpture according to the control condition.
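A hedged sketch of the immersion-by-control analysis on overall performance (the full model reported above also includes the sculpture as a repeated measure); the DataFrame and its column names are illustrative, not the experimental data:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "immersion": ["VR", "VR", "tablet", "tablet"] * 4,
    "control": ["active", "passive"] * 8,
    "performance": [18, 12, 17, 11, 20, 13, 16, 12, 19, 14, 18, 10, 21, 12, 15, 13],
})
model = ols("performance ~ C(immersion) * C(control)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # F and p values for each factor and their interaction
```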

Self-regulation. To measure self-regulation we used two behavioral indicators: (1) the number of times information was heard and replayed; (2) the number of clicks on the clock.
(1) The first behavioral indicator reveals no effect of immersion on the number of times that information was heard and replayed, F(1,57) = 0.09; p = 0.77, η2p = .002, no control effect, F(1,57) = 0.44; p = 0.51, η2p = .008, and no interaction effect, F(1,57) = 3.17; p = 0.08, η2p = 0.05. Moreover, this indicator is positively related to performance, r = .434, p < .001. For each sculpture we studied the correlation between this indicator and performance. Performance is related to this indicator for the “Dying slave”, r = .66; p = .004, and the “Pieta”, r = .26; p = .04. In contrast, no correlation was found between performance and this indicator for the “Moses” sculpture.
(2) The results for the second indicator, time management, are assessed on the global performance and not per sculpture. The results showed a significant effect of immersion on the number of times the clock was consulted, F(1,57) = 23.766, p < .001, η2p = .294. The same result was found for the effect of control on this indicator, F(1,57) = 4.678, p = .048, η2p = .067. Studying the interaction between immersion, control and the indicator did not reveal significant results, F(1,57) = .304, p = .584, η2p = .005. Thus, our results showed that participants in the high immersion condition, with virtual reality, consulted the clock more (M = 7.96, SD = .80) than participants in the low immersion condition, with the tablet (M = 2.40, SD = .81). The same results were found for control: people in the active condition consulted the clock much more (M = 6.33, SD = .81) than people in the passive condition (M = 4.03, SD = .80).

Intrinsic Motivation. Results from the two-way ANOVA revealed no effect of immersion on the participants' intrinsic motivation, F(1,57) = .305; p = .583, η2p = .005, and no effect of control on it, F(1,57) = .168; p = .683, η2p = .003. Studying the interaction between them did not reveal any effect, F(1,57) = .118; p = .732, η2p = .002. Because intrinsic motivation is composed of several dimensions, we decided to look at each of them. We did not find any effect of immersion, control or their interaction for any dimension. Precisely, for dimension one, called “Interest”, we have no immersion effect, F(1,57) = .118; p = .732, η2p = .002, no effect of control, F(1,57) = .306, p = .583, η2p = .005, and no interaction effect, F(1,57) = .255, p = .616, η2p = .004. Concerning dimension 2, “perceived competences”, we have no effect of immersion, F(1,57) = .002, p = .967, η2p = .000, no control effect, F(1,57) = .175, p = .677, η2p = .003, and no interaction, F(1,57) = .053, p = .819, η2p = .001. Dimension 3, “perceived choice”, revealed no effect of immersion, F(1,57) = 1.042, p = .312, η2p = .018, of control, F(1,57) = .450, p = .505, η2p = .008, or of their interaction, F(1,57) = .361, p = .550, η2p = .006. The last dimension, dimension 4, also called “perceived skills”, revealed no significant effect of immersion, F(1,57) = .188, p = .667, η2p = .003, no control effect, F(1,57) = 1.420, p = .238, η2p = .024, and no interaction between them, F(1,57) = 3.463, p = .68, η2p = .057. General intrinsic motivation, including all the dimensions, is not related to performance. Results from analyses for each dimension showed that only dimension 2, “perceived competence”, was significantly positively related to performance, r = .35, p < .05, but not the other dimensions.


2.3 Discussion-Conclusion Experiment 1

The objective of this experiment was to study the impact of immersion and control on performance, self-regulation and intrinsic motivation. The first results were in accordance with our hypothesis, showing that in the active learning condition participants performed better than in the passive one. The active condition allowed participants to discover by themselves the information to learn and to self-regulate their strategies [22], thanks to which they could improve their learning. As expected, the behavioral indicator “heard and replayed information” of self-regulation was indeed related to performance, but no relation was found between control and intrinsic motivation. The lack of results concerning intrinsic motivation could be due to our experimental protocol. We knew that participants had no interest or knowledge in art. We suppose that they decided to participate in this study not for the art learning proposal but for the tools and technologies used (virtual reality or tablet for the 3D museum). At this moment of the experiment, the learning task did not have any importance for the participants. In comparison, art students having this museum visit included in their curriculum might have a stronger motivation to learn new things or to get a better grade. Future studies will look at the different types of motivation and at what it is focused on, to get a better understanding of it. In this first study we also looked at the perception of emotion for the three different sculptures. It was found that participants had stronger emotions when they saw the “Pieta” than when they saw the other ones. We believe that future studies could explore this point in more detail, studying the impact of emotions on learning according to different conditions of immersion and control [23]. Finally, immersion unexpectedly did not impact performance. In other words, performance did not differ according to the use of a tablet or a virtual reality headset. Besides, no effect of immersion was found on the behavioral indicator of self-regulation. Also, no relation between immersion, control and intrinsic motivation was revealed. To understand why immersion did not impact learning, self-regulation and intrinsic motivation, we may consider the cognitive load theory of Sweller [14]. In this theory, cognitive capacity is considered to be limited and has to be distributed over different types of cognitive load. Our protocol may have induced a cognitive overload, due to the required simultaneous activities: learning how to use the material, acquiring knowledge on art and completing fill-gap exercises. During the experiment we measured the perceived mental effort as a control variable, to have a first indication of the cognitive load. Participants, all conditions considered, had a high degree of perceived mental effort in order to memorize information: in high immersion, M = 5.61, SD = 1.63, Min: 1, Max: 9; in low immersion, M = 6.37, SD = 1.73, Min: 1, Max: 9. Thus, for each condition, the load of using the material was so important that it could have limited the cognitive capacity available for the essential load of learning. Also, the amount of information to assimilate could have been so important that it may have limited its retention.
In the next study, we replicated the same experiment with an additional condition, characterized by the visit of only one sculpture instead of three, to study the impact of reducing the cognitive load. We expected that by reducing the amount of information to be memorized, the


performance would be higher and the perceived mental effort lower than with three sculptures. This hypothesis is tested in the study called experiment 2.

3 Experiment 2

3.1 Objectives and Methodology

Experiment 2 was designed to evaluate the effect of the amount of art knowledge on learning performance. This experiment was similar to experiment 1, with one change: a group of participants was added and had to learn knowledge from only one sculpture (the “Moses”) instead of three, as it was in experiment 1. Thus, this new group of participants was in a low cognitive load condition and was compared to the participants from experiment 1, who were in a high cognitive load condition. We expected that by reducing the amount of knowledge to memorize (low cognitive load), performance would be higher and the mental effort lower than in experiment one. Moreover, we studied whether performance and perceived mental effort would vary according to the immersion and control conditions.

Participants. Ninety-three students (seventy-eight females and fifteen males, average age = 20.99, SD = 2.46) in human and social sciences were recruited at the University of Toulouse II. It was previously checked that the participants did not have any knowledge of Renaissance art. The samples did not differ in their emotional perception of the “Moses”: t(153) = .114; p = .91. This new sample was compared to the sample of experiment one.

Materials and Groups. Low cognitive load was manipulated by presenting only one sculpture, that is, in this condition participants had to learn knowledge from only one sculpture. Thus, we deleted two sculptures from the museum visit among those selected in experiment 1, keeping the “David” for the familiarization phase and the “Moses” for the visit. The choice of keeping the “Moses” was made because it followed the progress of experiment 1 and also because it was one of the sculptures with the lowest emotional perception. Emotions impact learning [2], but it would take a study in its own right to examine this phenomenon, which was not the purpose here; consequently we decided to control for it, taking a sculpture with low emotional perception following study 1. The procedure was identical to study 1, but participants in the low cognitive load group had ten minutes for the familiarization and ten minutes for the learning phase with the second sculpture (the “Moses”). Participants were randomly assigned to four different groups, the same as in experiment 1: 1: active virtual reality (N = 23); 2: passive virtual reality (N = 23); 3: active tablet (N = 22); 4: passive tablet (N = 25).

Familiarization. Identical to experiment 1. It is composed of ten minutes of a visit of the “David” sculpture, followed by the fill-gap exercise related to it, with 4 sentences and 3 holes. The sentences are the same as in experiment 1.

Performance. The evaluation of learning performance used the same fill-gaps as in experiment 1, limited to those of the “Moses”. This brings the total to 12 words to be found. For each correct answer, one point was scored, which brought a maximum of 12 points per student.



Mental Effort Perceived. The cognitive load was reflected by the mental effort perceived to memorize the art knowledge related to each sculpture during the learning phase (museum visit). It was assessed via a 9-point scale ranging from 1 (very low mental effort) to 9 (very high mental effort). This measure was collected for all groups.

Procedure. The different steps of the experiment were the same as in experiment 1, except that for the low cognitive load group the visit phase contained a single sculpture. Participants did the familiarization phase, with its visit and its non-evaluated fill-gap exercise. After that, they did the visit, following the conditions to which they were assigned. The visit was limited to the "Moses" sculpture and had a duration of 10 min. At the end of this learning phase, they did an interference phase, composed of socio-demographic questions and of the question about perceived mental effort used to measure cognitive load. Then, they concluded the experiment by completing the fill-gap exercises measuring learning performance.

3.2 Results

Performance. A three-way ANOVA was computed with the cognitive load manipulated by the amount of knowledge to learn (one sculpture vs. three sculptures), immersion (tablet vs. virtual reality headset) and control (active vs. passive) as independent factors. Results showed a significant effect of the cognitive load on performance, F(1,146) = 4.29; p = 0.04, η2p = 0.03; no significant effect of immersion, F(1,146) = 0.24; p = 0.88; but a significant effect of control, F(1,146) = 9.85; p = 0.002, η2p = 0.063. Performance was higher in the low cognitive load condition than in the high one, t(152) = 2.044; p = .04, and higher in the active condition than in the passive one, t(152) = 2.70; p = .008. Two interactions were marginally significant. The first one was between the amount of information and the degree of immersion, F(1,146) = 3.48; p = .06, η2p = 0.23, showing that in the tablet condition performance was higher in the low amount condition than in the high amount condition (p = .006), whereas performance did not change according to the amount of sculptures in VR (p = .84). The second one was between the amount of information and the degree of control, F(1,146) = 3.41; p = .06, η2p = 0.23. In the active condition, performance did not change according to the amount of information (p = .88), whereas in the passive condition performance was higher for the low amount of information than for the high amount (p = .006) (Table 1).

Mental Effort Perceived. Results from the three-way independent-factors ANOVA revealed no effect of the manipulated cognitive load on the mental effort perceived, F(1,147) = 1.97; p = .163, η2p = .013; no effect of immersion, F(1,147) = .182; p = .67, η2p = .001; and no effect of control, F(1,147) = .15; p = .70, η2p = .001. Two interaction effects were found on the mental effort perceived. First, a significant interaction between cognitive load and immersion, F(1,147) = 6.59; p = .01, η2p = .043, showing that in the low load condition mental effort was perceived as higher with VR than with the tablet (p = .03), whereas in the high load condition the mental effort was perceived as equal in the VR and tablet conditions (p = .09). Second, a significant interaction between cognitive load and control, F(1,147) = 4.14; p = .04, η2p = .027, was found,



Table 1. Performance means on the "Moses" (and SD) according to the conditions.

                          Low cognitive load   High cognitive load
Tablet           Active   5.95 (2.13)          5.13 (2.26)
                 Passive  5.76 (1.88)          3.73 (2.43)
Virtual reality  Active   5.61 (1.97)          6.27 (2.55)
                 Passive  4.87 (2.05)          4.06 (2.57)
Total                     5.55 (2.01)          4.78 (2.60)

but its simple effects did not reach significance: in the passive condition, the mental effort perceived did not differ significantly between the low and the high cognitive load conditions (p = .08), and in the active condition the manipulated cognitive load did not affect the mental effort perceived either (p = .09) (Figs. 7 and 8).
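For readers who wish to reproduce this kind of analysis, a minimal sketch of the three-way ANOVA described above is given below; it assumes pandas and statsmodels and uses randomly generated toy data (two hypothetical participants per design cell), not the data of the experiment.

import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical toy data: two participants per cell of the 2x2x2 design.
rng = np.random.default_rng(0)
cells = list(itertools.product(["low", "high"], ["tablet", "vr"], ["active", "passive"]))
rows = [{"load": l, "immersion": i, "control": c, "performance": rng.normal(6, 2)}
        for (l, i, c) in cells for _ in range(2)]
df = pd.DataFrame(rows)

# Three-way ANOVA: performance ~ cognitive load x immersion x control
model = ols("performance ~ C(load) * C(immersion) * C(control)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))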

Fig. 7. Mental effort perceived as a function of immersion.

Finally, no relationship was found between the mental effort perceived and performance, neither on the overall sample of the 153 participants, r = .018; p = .83, nor among the participants in the low cognitive load condition, r = .017; p = .87, nor in the high cognitive load condition, r = -.02; p = .88.

3.3 Discussion-Conclusion Experiment 2

As expected, performance was higher in the low cognitive load condition than in the high one, and higher in the active condition than in the passive one. The degree of immersion alone did not affect performance.



Fig. 8. Mental effort perceived as a function of control.

Even though the difference was only marginally significant (p = .06), in the tablet condition performance was higher in the low cognitive load condition than in the high one, whereas performance did not change according to the amount of sculptures in the virtual reality condition. This result suggests that a high cognitive load decreases performance only in the tablet condition and not in the virtual reality one. Besides, in the active condition performance did not differ regardless of the amount of information, whereas in the passive condition performance was lower in the high cognitive load condition than in the low one. Unexpectedly, we did not find any effect of the cognitive load, immersion or control conditions on the mental effort perceived. The significant interaction between cognitive load and immersion revealed that in the low load condition mental effort was perceived as higher with VR than with the tablet, whereas no significant difference was found in the high cognitive load condition. This result concerning the mental effort perceived in VR could be explained by the increase of experience in using the technology. After three sculptures, the participants had more experience of using the technology than after one sculpture. Consequently, it seems that perceived effort decreases as experience increases. In other words, after a very short experience (the familiarization phase), the participants in the low cognitive load condition might have been in cognitive overload concerning the extrinsic load. They could have had more cognitive capacity available for learning due to the low amount of information [13], but at the same time they had less experience in using the technological tool. In VR, the level of mental effort perceived to retain information may result from the trade-off between the cognitive capacity devoted to learning and that devoted to using the technology. The extrinsic load would be higher in the VR condition than in the tablet condition only under low cognitive load, because people usually use tablets much more than VR. Thus, in the tablet condition the extrinsic load could be lower than in VR, and the amount of information could consume less of the essential cognitive load, diminishing the mental effort perceived. Using a familiar technology and reducing the amount of information both seemed to help reduce the mental effort perceived.

Finally, the significant interaction between cognitive load and control on the mental effort perceived was difficult to interpret because the simple effects of the analysis of variance did not reach significance. The mental effort perceived seemed to be more influenced by the use of the new technologies than by the control of the interactions with the learning object. Globally, our results revealed a dissociation between cognitive load and the participants' perceived mental effort. Performance was not related to the mental effort perceived but to the manipulation of the cognitive load: a lower amount of information improves learning performance. We think it is important to question our method of evaluating the mental effort perceived with a single question on a 9-point Likert scale, and to find a different measure for the next study.

4 General Conclusion

The different results are as interesting as they are surprising. First, neither experiment found any impact of immersion on learning. If immersion did not impact learning, how could the virtual environment have impacted it? This result questions the importance of immersion in virtual reality compared to the importance of presenting information in a virtual modality [24]. Future studies could investigate the impact of the modality of presentation of the learning content, virtual or real, on the learning process, according to the continuum of Milgram and Kishino [24]. Secondly, our findings confirmed the hypothesis that, regardless of the level of cognitive load, active control contributes to improving learning performance. As an applied benefit, we can advise that, for novices, leaving the learner the possibility to control, interact with and discover the learning environment by themselves can increase their learning performance. This can also support learning by reinforcing the memory trace, taking into account the theory of grounded cognition [26]. With experiment 2, we also observed the important impact of the amount of information, i.e. the level of cognitive load, on learning performance. Knowing this, we can recommend that novice learners be given educational scenarios with a low cognitive load. In order to reduce it, it is possible to limit the amount of information and/or to take care about the technologies used, by selecting familiar technologies or providing a longer familiarization phase. It can also be interesting to segment the learning phase in order to reduce the amount of information presented in one time frame [27]. To help retention, instead of placing the test after the whole museum visit, the test can be placed after the visit of each sculpture. Having a test sculpture by sculpture can help learners reduce cognitive load [13], support self-regulation [11] and exploit the testing effect [28]. In this perspective, it may be interesting for the next experiment to study the impact of this type of segmented and tested learning. For example, we could compare the learning performance and the mental effort perceived in experiment 1 with a new condition with a segmented learning phase and a test sculpture by sculpture. To be more precise, we could present the first sculpture to the participants so that they could learn, self-regulate and take the test on sculpture one; then the second sculpture could be presented for the learning and self-regulation phase, followed by its test, and so forth. In addition, future studies could also include a motivation measure comparing novices and art students, to study the impact of learning goals on their motivation and learning.



We can conclude that the use of new technologies really does bring new and interesting learning practices. To further explore their impact, additional studies will be needed in the future in order to define applied contributions with a pedagogical added value.

References

1. Aleven, V., Stahl, E., Schworm, S., Fischer, F., Wallace, R.: Help seeking and help design in interactive learning environments. Rev. Educ. Res. 73(3), 277–320 (2003). https://doi.org/10.3102/00346543073003277
2. Gendron, B.: Capital émotionnel, cognition, performance et santé: quels liens? In: Du percept à la décision: Intégration de la cognition, l'émotion et la motivation. De Boeck Supérieur, Louvain-la-Neuve (2010). https://doi.org/10.3917/dbu.masmo.2010.01.0329
3. Black, A.E., Deci, E.L.: The effects of student self-regulation and instructor autonomy support on learning in a college-level natural science course: a self-determination theory perspective. Sci. Educ. 84, 740–756 (2000)
4. Deci, E.L., Ryan, R.M.: The 'what' and 'why' of goal pursuits: human needs and the self-determination of behavior. Psychol. Inq. 11, 227–268 (2000)
5. Burkhardt, J.M., Lourdeaux, D., Mellet-d'Huart, D.: La conception des environnements virtuels pour l'apprentissage (2003)
6. Kontogeorgiou, A.M., Bellou, J., Mikropoulos, T.A.: Being inside the quantum atom. PsychNology J. 6(1), 83–98 (2008)
7. Muhanna, A.: Virtual reality and the CAVE: taxonomy, interaction challenges and research directions. J. King Saud Univ. Comput. Inf. Sci. 27, 344–361 (2015)
8. Mikropoulos, T.A., Natsis, A.: Educational virtual environments: a ten-year review of empirical research (1999–2009). Comput. Educ. 56(3), 769–780 (2011)
9. Negut, A., Matu, S.-A., Sava, F.A., David, D.: Task difficulty of virtual reality-based assessment tools compared to classical paper-and-pencil or computerized measures: a meta-analytic approach. Comput. Hum. Behav. 54, 414–424 (2016)
10. Hake, R.R.: Interactive-engagement versus traditional methods: a six-thousand-student survey of mechanics test data for introductory physics courses. Am. J. Phys. 66(1), 64–74 (1998)
11. Pintrich, P.: The role of goal orientation in self-regulated learning. In: Handbook of Self-Regulation. Academic Press, San Diego (2000)
12. Bouffard-Bouchard, T., Pinard, A.: Sentiment d'auto-efficacité et exercice des processus d'autorégulation chez des étudiants de niveau collégial. Int. J. Psychol. 23(1–6), 409–431 (1988)
13. Sakdavong, J.C., Burgues, M., Huet, N.: Virtual reality in self-regulated learning: example in art domain. In: Proceedings of the 11th International Conference on Computer Supported Education (CSEDU 2019), Heraklion, Greece, vol. 2, pp. 79–87, May 2019. https://doi.org/10.5220/0007718500790087
14. Sweller, J.: Evolution of human cognitive architecture. In: Ross, B.H. (ed.) The Psychology of Learning and Motivation, vol. 43, pp. 215–266. Academic Press, New York (2003)
15. Ausubel, D.P.: In defense of verbal learning. Educ. Theory 11(1), 15–25 (1961). https://doi.org/10.1111/j.1741-5446.1961.tb00038
16. Deci, E.L., Nezlek, J., Sheinman, L.: Characteristics of the rewarder and intrinsic motivation of the rewardee. J. Pers. Soc. Psychol. 40(1), 1 (1981)
17. Jang, S., Vitale, J.M., Jyung, R.W., Black, J.B.: Direct manipulation is better than passive viewing for learning anatomy in a three-dimensional virtual reality environment. Comput. Educ. 106, 150–165 (2017)



18. Limniou, M., Roberts, D., Papadopoulos, N.: Full immersive virtual environment CAVE in chemistry education. Comput. Educ. 51(2), 584–593 (2008)
19. Visch, V.T., Tan, E.S., Molenaar, D.: The emotional and cognitive effects of immersion in film viewing. Cogn. Emot. 24(8), 1439–1445 (2010)
20. Dalgarno, B., Lee, M.J.: What are the learning affordances of 3-D virtual environments? Br. J. Educ. Technol. 41(1), 10–32 (2010)
21. Deci, E.L., Eghrari, H., Patrick, B.C., Leone, D.: Facilitating internalization: the self-determination theory perspective. J. Pers. 62, 119–142 (1994)
22. Bruner, J.S.: Going beyond the information given. Contemp. Approaches Cogn. 1(1), 119–160 (1957)
23. Pan, Z., Cheok, A.D., Yang, H., Zhu, J., Shi, J.: Virtual reality and mixed reality for virtual learning environments. Comput. Graph. 30 (2006). https://doi.org/10.1016/j.cag.2005.10.004
24. Milgram, P., Kishino, F.: A taxonomy of mixed reality visual displays. IEICE Trans. Inf. Syst. 77(12), 1321–1329 (1994)
25. Redish, E.F., Saul, J.M., Steinberg, R.N.: On the effectiveness of active-engagement microcomputer-based laboratories. Am. J. Phys. 65(1), 45–54 (1997)
26. Barsalou, L.W.: Grounded cognition. Ann. Rev. Psychol. 59, 617–645 (2008)
27. Mayer, R.E. (ed.): The Cambridge Handbook of Multimedia Learning. Cambridge University Press, New York (2005)
28. Roediger III, H.L., Karpicke, J.D.: Test-enhanced learning: taking memory tests improves long-term retention. Psychol. Sci. 17(3), 249–255 (2006)
29. Bandura, A., Schunk, D.H.: Cultivating competence, self-efficacy, and intrinsic interest through proximal self-motivation. J. Pers. Soc. Psychol. 41(3), 586 (1981)

Studying Relationships Between Network Structure in Educational Forums and Students' Performance

O. Ferreira-Pires, M. E. Sousa-Vieira, J. C. López-Ardao, and M. Fernández-Veiga

University of Vigo, Vigo, Spain
[email protected]

Abstract. Social networks based on mutual interest, affinity or leadership are spontaneously generated when the training activities are carried out through online learning systems wherein collaboration and interaction among participants is encouraged. The bare structure of those interactions, reflected in a network graph, is known to contain relevant statistical information about the dynamics of the learning process within the group, thus it should be possible to extract such knowledge and exploit it either for improving the quality of the learning outcomes or for driving the educational process toward the desired goals. However, discovering the features and structural properties in the social network graph which enclose the maximum and most valuable information for educational purposes requires a careful analysis and identification of the deep connections between social graphs and students’ academic performance. In this chapter, we address a systematic study on the strength and statistical significance of a number of correlations between the underlying graphs triggered by the online learning activities and the prediction of the student’s achievements. Several structural features of networks are identified as the ones with larger impact on the effectiveness of the estimators. Our data source is a complete record of online student activity collected over two academic years both at the undergraduate and the graduate level.

Keywords: Online social learning environments · Forums · Social networks analysis

1 Introduction

During recent years, the structure of several natural and artificial complex systems has been analyzed, and as a result many of the properties of these objects have been discovered [1]. The examples are pervasive, from biological networks to social networks, or from the Internet topology to Bitcoin transactions. Nevertheless, despite the significant progress made in the structural understanding



of massive networks, the ultimate goal is to translate this physical or logical structure, which has no meaning in itself, to functional predictions or behavior of the system under study. Online social networks (OSNs) have been increasingly integrated into formal and informal learning environments during the last decade due to their ability to foster learning through the immediate formation of social relationships among the participants either in the real or in the virtual realms. Moreover, these relationships can take place not only between peer students or colleagues, but also between students and experts alike [8]. With social learning methodologies and tools, the emphasis in learning shifts more toward effective information exchange and collaborative learning, thus happening within a community. It has long been recognized that the structure of such interactions (of its underlying graph) is key to a deep comprehension of the information flow within the students’ group, and that in the end it can be used to measure the quality of the learning process and to infer students’ performance directly from their pattern of interactions. During the past twelve years, social network analysis (SNA) [17] has been used by researchers to analyze a number of datasets naturally produced within digitized learning environments, like massive online courses (MOOCs), learning management systems (LMSs) and social learning environments (SLEs). SNA offers a powerful toolbox to try to understand to what extent the network structure of the social interaction graph is related to the flow of information/knowledge among the participants in the community. In this paper, we discuss the results of SNA conducted on the cohorts of students of two courses of different levels on computer networks. In our case, we use a software platform based on Moodle, especially built for encouraging online participation of the students to design, carry out and evaluate a set of online learning tasks and games. After logging the activity during a full year, we have performed a thorough network analysis with the aim to understand the information flow within this controlled groups of students. In this work we focus on the participation in forums, modeling the social relationships taking place in each one of the forums of the virtual classrooms as suitable social graphs. Our findings include the detection of significant correlations among the pattern of activity and the structure of the network and the final results, as well as the prompt identification of subcommunities in which information flow is quick and intense. The rest of the paper is organized as follows. Section 2 summarizes recent related work. The methodology employed in the courses under study is reported in Sect. 3. Section 4 contains the main results of the social networks analysis applied to the datasets. Finally, concluding remarks are included in Sect. 5.

2 Literature Review

In the last decade, a significant research effort has been devoted to understanding how the interpersonal interactions in OSNs shape, reinforce and enhance the learning process. Datasets were mined in order to discover the most influential students, to find out how collaboration among groups of students arises, and to assess the impact of relationships on learners' performance. In other words, whether the structure of the community to which a student belongs while he/she is engaged in the learning environment has any substantial correlation with his/her performance. In this section, we provide a chronological review of representative papers. A more extensive compilation can be found in [4].

The focus of the study reported in [14] is to highlight the advances that social network analysis can bring when studying the nature of the interaction patterns within a networked learning community, and the way its members share and construct knowledge. A structural analysis of student networks has been done in [5] too, to see if a student's position in the social learning network could be a good predictor of the sense of community. Empirical evidence was found that such position is indicative of the strength of their subjective sense of community, and also of the type of support required by students for advancing as expected through the course. The study addressed in [9] focuses on communities of students from a social network point of view, and discusses the relations taking place in those communities, the way media can influence the formation of online relationships and the benefits that can emerge after actively maintaining a personal learning network. Understanding the information flow that really occurs in SLEs is obviously of key importance to improve the learning tools and methodologies. For instance, [16] highlights the importance of a good understanding of the communication flows that really occur among users in educational online forums, in order to detect significant posts to be included in social network analysis. In [19] the authors examine the impact of academic motivation on the type of discourse contributed and on the position of the learner in the social learning network, and conclude that highly intrinsically motivated learners become central and prominent contributors to cognitive discourse; in contrast, extrinsically motivated learners only contribute on average and are positioned randomly throughout the social graph. The work [10] investigates the patterns and the quality of online interactions during project-based learning, showing their correlation with project scores. The identification of social indices that actually are related to the experience of the learning process is addressed in [26], showing that some popular measures such as density or degree centrality are meaningful or not depending on the characteristics of the course under study. The structure of two distributed learning networks is given in [2] in order to understand how students' success can be enhanced. In [11] the authors propose degree centrality as the basic predictor for effectiveness of learning in the course under study. In addition to structural properties, the influence of cognitive styles and linguistic patterns of self-organizing groups within an online course has also been the focus of some works, such as [27].

More recently, the work [20] examines relationships between online learner self- and co-regulation. Results reveal that students with high levels of learner presence occupy more advantageous positions in the network, suggesting that they are more active and attract more reputation in networks of interaction. The authors of [24] discuss the patterns of network dynamics within a multicultural online collaborative learning environment. The experiment tests a set of hypotheses concerning tendencies towards homophily/heterophily and preferential attachment, participant roles and group work in the course under study. In [28] social network analysis techniques are used to examine the influence of the moderator's role on online courses. The main conclusion is that when students are assigned to the moderator position their participation quantity, diversity and interaction attractiveness increase significantly, and their lack of participation influences the group interaction. In [3] the authors investigate the association between social network properties, content richness in academic learning discourse and performance, concluding that these factors cannot be discounted in the learning process and must be accounted for in the learning design. In [7], the relationship between social network position, creative performance and flow in blended teams is investigated. The results indicate that social network indices, in particular those measuring centralization and neighbors' interactions, can offer valuable insight into the creative collaboration process. The article [15] compares the impact of social-context and knowledge-context awareness on quantitative and qualitative peer interaction and learning performance, showing that with the first one the community had significantly better learning performance, likely related to the more extensive and frequent interactions among peers. And [21] investigates the discourses involving student collaboration in fixed groups and opportunistic collaboration, finding that actively participating and contributing high-level ideas were positively correlated with students' domain knowledge. The existence of a positive relationship between centralization and cohesion and the social construction of knowledge in discussion forums is the main conclusion in [25]. In [18] the authors present a new model for students' evaluation based on their behavior during a course, and its validation through an analysis of the correlation between social network measures and the grades obtained by the students. Finally, [13] investigates the influence of learning design and tutor interventions on the formation and evolution of communities of learning, employing social network analysis to study three differently designed discussion forums.

Related to our prior work in this area of research, [22] focused on the quantitative characterization of non-formal learning methodologies. To this end, we used a custom software platform, SocialWire, for discovering what factors or variables have a statistically significant correlation with the students' academic achievements in the course. The dataset was first collected along three consecutive editions of an undergraduate course on computer networks. Next, we also measured the extent and strength of social relations in an online social network used among students of a master level course in computer networks [23]. The dataset comprised again a period of three academic years. As these papers discuss, in addition to the quantity of interactions among participants, successful prediction of performance is possible when the quality of interactions can also be observed, or inferred on the basis of the network structure. In [6] we use a similar approach, but applied to the analysis of forums engagement. It is the first time that we encourage and reward quality participation in this activity in the undergraduate course on computer networks under study. In this work we extend this analysis to a master level course on advanced computer networks. In both cases, we include in the analysis the so-called global graph that includes the social relationships taking place in all the forums of each course. Moreover, we enrich the study with new information, such as that given in the new community structure section.

3 Application

We have taken as our educational environments the 2017/2018 edition of a course on Computer Networks (CN course), directed to undergraduates of the second year of the Telecommunications Technologies Engineering degree, and the 2018/2019 edition of a course on Advanced Computer Networks (ACN course), for first-year graduate students of the Telecommunications Engineering master degree. Both courses have a weekly schedule that spans 14 weeks. The classroom activities (adapted to the level of each course) are organized as follows:

– Lectures, which blend the presentation of concepts, techniques and algorithms with the practice of problem-solving skills and the discussion of theoretical questions. In the ACN course, as an introduction to some lessons, teachers recommend some videos to review related contents studied in previous courses, among them the CN course.
– Laboratory sessions, where the students design and analyze different network scenarios with different protocols, using real or simulated networking equipment. In the CN course, in some of these sessions students complete a small programming assignment.

In both courses the activities are supported by a tailored Moodle site to which the students and teachers belong, and wherein general communication about the topics covered takes place. To encourage networked learning and collaborative work, different activities are planned and carried out in the platform. The students may gain different points by completing or participating in these activities, and the resulting rankings are eventually made public to the group. In the editions analysed in this work, the following online activities were proposed:

1. Homework tasks, to be worked out prior to the in-class or laboratory sessions. With this activity teachers encourage the students to prepare some of the material in advance.
2. Quizzes, proposed before the midterm exams for self-training.
3. Collaborative participation in forums. Several forums were created in Moodle to allow the students to post questions, doubts or puzzles related to the organization of the course (organization forum in both courses), the content of the in-class lectures or the laboratory sessions (lessons forum in both courses) and the programming assignments (programming forum, only in the CN course).
4. Optional activities, such as the collaborative edition of a glossary of terms related to the subject, games, peer assessment of tasks, etc.



The maximum score of tasks and quizzes is measured in so-called merit points, and represents the total score gained from engagement in online activities in the continuous assessment. It is possible to obtain extra merit points by doing the optional activities, in order to compensate for low scores or late submissions of some of the tasks or quizzes. Participation in forums, solving doubts or sharing resources, is also valued with points or votes granted by the teachers or the classmates. In the CN course each post can be voted on a discrete scale of up to 3 points (lessons forum), 5 points (programming forum) or 11 points (organization forum). As new points are obtained, the so-called karma level of each student increases, depending on the average of the points obtained in each forum, the difference with respect to the class average in each forum, the total number of points obtained and the total number of posts voted by the student. In the ACN course each post can receive positive or negative votes (one per teacher or student). As new votes are granted or obtained, the karma level of each student increases by 1 unit for each granted vote and increases or decreases by 5 units for each positive or negative received vote. Moreover, the student who opens the thread and the teachers can reward the best response with 15 and 30 units of karma, respectively. Finally, the use of the virtual classroom is also rewarded by the automatic scoring of different actions carried out in the platform related to the normal activity unfolded along the term, like viewing resources, posting new threads, replying to posts, etc. The so-called experience points are awarded in a controlled environment, with maximum values and frequency set by the teachers. The accomplishment of some tasks, the karma levels and the experience points are ultimately converted into certain benefits helpful to pass the subject: bonus points for the grades, extra time or tips in the final exam, etc.

Students may pass the course after a single final examination covering all the material (in the CN course, provided the programming assignment meets the minimum requirements), but they are encouraged to adhere to the continuous assessment modality. In continuous assessment, the final exam weighs 50% (in the CN course) or 55% (in the ACN course), and the rest is split as follows: 20% (in the CN course) or 30% (in the ACN course) from the midterm exams, 20% (in the CN course) from the programming assignment, and 10% (in the CN course) or 15% (in the ACN course) from the merit points obtained by accomplishing the online activities described previously, devised as a tool to increase the level of participation. Students have two (non-exclusive) opportunities to pass the exam, May and July in the CN course and January and July in the ACN course.

To finish our description, in this edition 136 students followed the CN course. Of the 129 students who followed the continuous assessment, 65 finally passed the course, and of the 7 students not engaged in continuous assessment only 2 were finally able to pass (one of them had an active participation in the three forums). In the ACN course, 20 students were enrolled. Of the 17 students who followed the continuous assessment, 15 finally passed the course, and none of the students not engaged in continuous assessment took the final exam (none of them participated in the forums activity).
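As an illustration of the mechanics just described, the following minimal sketch (not the platform's actual code) computes the karma gained by a student in the ACN course; only the constants 1, 5, 15 and 30 come from the text above, while the function and argument names are hypothetical.

def acn_karma(granted_votes, received_positive, received_negative,
              best_answer_by_opener=False, best_answer_by_teacher=False):
    # 1 unit per vote the student grants to others
    karma = granted_votes * 1
    # +5 / -5 units per positive / negative vote received on the student's posts
    karma += 5 * received_positive - 5 * received_negative
    # bonuses when a reply is marked as the best response
    if best_answer_by_opener:
        karma += 15
    if best_answer_by_teacher:
        karma += 30
    return karma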

4 Analysis of the Datasets

We applied standard SNA techniques to mine the data collected in forums. As explained in the introduction, we model the social relationships taking place in each one of the three forums as graphs, termed hereafter lessons graph (LG), programming graph (PG) and organization graph (OG). Our intent is to explain the basic structural properties of such graphs as consequences of the social interactions among its agents. We also consider the global graph (GG) that includes the social relationships taking place in the three forums. For such purpose, we recorded the events that took place in each forum: users who posted new threads, users who replied and the average valuations they received. This information is represented as a graph where two nodes, the users, are connected by an edge if one has given a reply to an entry posted by the other. Moreover, self-edges represent new threads. The weight of each edge is equal to the average points (in the CN course) or the difference between the number of positive and negative votes (in the ACN course) obtained by the reply or the new thread post (in the case of the global graph, the points of each forum were converted to the same scale). An illustration of the graphs of the CN course is given in Fig. 1, where every node is a student identified by his/her position in the ordered list of final grades.

Fig. 1. Forums activity graphs of the CN course [6]. LG (top-left), PG (top-right), OG (bottom-left) and GG (bottom-right). (Color figure online)



Fig. 2. Forums activity graphs of the ACN course. LG (top-left), OG (top-middle) and GG (top-right). (Color figure online)

The node with label 0 corresponds to the instructors. Light green nodes belong to students that passed the subject at the first opportunity (May), while dark green is for students who passed after the second opportunity (July), and grey is for students who dropped off the course or failed the subject in the end. The width of each edge is proportional to its weight. Figure 2 shows the graphs of the ACN course. In this case, light green nodes belong to students that passed the subject at the first opportunity (January), while dark green is for students who passed after the second opportunity (July) and grey is for students that failed the course.
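The following minimal sketch (assuming networkx; the event field names author, replied_to and points are hypothetical, not the platform's schema) illustrates how such a forum graph could be built from the logged events:

import networkx as nx

def build_forum_graph(events):
    """events: iterable of dicts with (hypothetical) keys 'author', 'replied_to'
    (None when the post opens a new thread) and 'points' (its valuation)."""
    G = nx.MultiDiGraph()
    for e in events:
        # a reply points to the author of the post it answers;
        # a new thread is stored as a self-edge on its author
        target = e["author"] if e["replied_to"] is None else e["replied_to"]
        G.add_edge(e["author"], target, weight=e["points"])
    return G

# toy usage: student 4 opens a thread, student 5 replies and gets 2 points
LG = build_forum_graph([
    {"author": 4, "replied_to": None, "points": 3},
    {"author": 5, "replied_to": 4, "points": 2},
])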

4.1 Graph Level Measures

In SNA, the static or dynamic structure of a graph reveals key aspects of the collective and individual behavior of the agents. Next, we report some of the typical descriptive measures of a graph, and their values in our datasets. Notice that for some measures we consider simplified versions of the graphs, where the weight of each edge is the sum of the weights of all the edges between the underlying pair of nodes. Moreover, including self-edges means including the opening of new forum threads in the analysis.

Density. The density of a graph refers to the number of edges that exist, reported as a fraction of the total possible number of edges, with values ranging from 0 (the empty graph) to 1 (the complete graph). Results in Tables 1 and 2 show that density values are small, especially in the CN course. This fact reflects the definition of the links, since only a fraction of the students reply to each post.
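As a computational sketch of this measure (assuming networkx and a directed forum graph built as above, not the authors' code), density can be evaluated with and without self-edges, matching the two corresponding rows of Tables 1 and 2:

import networkx as nx

def density(G, include_self_loops=False):
    """Fraction of possible directed links that are present, optionally
    counting self-edges (new threads), as in the 'with loops' rows."""
    n = G.number_of_nodes()
    pairs = {(u, v) for u, v in G.edges() if include_self_loops or u != v}
    possible = n * (n - 1) + (n if include_self_loops else 0)
    return len(pairs) / possible if possible else 0.0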



Table 1. Summary of basic structural parameters of each graph of the CN course [6].

                                          LG        PG        OG        GG
Density        Without loops              0.0564    0.0525    0.0764    0.0558
               With loops                 0.0718    0.0666    0.0971    0.0685
Reciprocity                               0.4091    0.3171    0.2857    0.3669
Transitivity                              0.1151    0.1687    0.2201    0.2135
# of cliques   Size 2                     70        69        78        178
               Size 3                     20        32        51        185
               Size 4                     0         5         10        94
               Size 5                     0         0         1         21
               Size 6                     0         0         0         2
Degree         In                         0.1525    0.1564    0.2543    0.2219
               Out, without loops         0.5733    0.6298    0.7387    0.7463
               Out, with loops            0.5663    0.5817    0.6941    0.7368
Closeness                                 0.7569    0.6432    0.6487    0.7579
Betweenness    Directed                   0.4949    0.2275    0.4335    0.4249
               Undirected                 0.7202    0.6637    0.6957    0.6398
Eigenvector    Unweighted, without loops  0.8355    0.8384    0.8206    0.8209
               Unweighted, with loops     0.8372    0.8109    0.8078    0.8378
Assortativity  Degree                     −0.0616   −0.2551   −0.1472   −0.2529
               Nominal                    0.0955    −0.0805   0.0154    0.0661

Table 2. Summary of basic structural parameters of each graph of the ACN course.

                                          LG        OG        GG
Density        Without loops              0.2001    0.1727    0.2352
               With loops                 0.2311    0.2041    0.2654
Reciprocity                               0.5238    0.3829    0.5021
Transitivity                              0.4545    0.3546    0.5054
# of cliques   Size 2                     31        38        54
               Size 3                     25        24        62
               Size 4                     10        3         32
               Size 5                     2         0         7
Degree         In                         0.3214    0.2148    0.3114
               Out, without loops         0.6275    0.6796    0.6228
               Out, with loops            0.5381    0.5955    0.5424
Closeness                                 0.6293    0.5963    0.6686
Betweenness    Directed                   0.3063    0.2678    0.1849
               Undirected                 0.2971    0.4426    0.1992
Eigenvector    Unweighted, without loops  0.6153    0.6245    0.6706
               Unweighted, with loops     0.6078    0.6327    0.6145
Assortativity  Degree                     −0.4379   −0.1164   −0.0346
               Nominal                    −0.0566   −0.0975   −0.0869



Reciprocity. Reciprocity accounts for the number of mutual exchanges of information in the network, just counting the balance between incoming and outgoing edges for two subsets of nodes. In the studied graphs, these exchanges happen in the form of posts-replies pairs. In mutual collaboration, either part receives at least one reply from the other part. Tables 1 and 2 also list the average reciprocity in the networks. The results obtained are noticeable, since they are measuring an interactive activity as the participation in forums. The smaller value of the OG in both courses is due to the fact that many of the questions raised in this forum are solved with a single answer, in many cases by the teachers. This also happens in the PG in the CN course, in which some of the doubts, mainly related to the specifications of the tasks, are also solved by the teachers. Related to reciprocity is dyad census, that measures the type of relationship between each pair of nodes. It can be in three states: mutual, asym or null (see Fig. 3): – mutual: CN course (18-LG, 13-PG, 13-OG); ACN course (11-LG, 9-OG) – asym: CN course (52-LG, 56-PG, 65-OG); ACN course (20-LG, 29-OG) – null: CN course (710-LG, 711-PG, 517-OG); ACN course (74-LG, 98-OG)


Fig. 3. Three dyad isomorphism classes.
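A minimal sketch (assuming networkx, not the authors' implementation) of how reciprocity and the dyad census above could be computed on a forum graph G:

import itertools
import networkx as nx

def dyad_census(G):
    """Counts of mutual, asymmetric and null dyads in a directed forum graph;
    self-edges and parallel edges are collapsed first."""
    H = nx.DiGraph([(u, v) for u, v in G.edges() if u != v])
    H.add_nodes_from(G.nodes())
    counts = {"mutual": 0, "asym": 0, "null": 0}
    for u, v in itertools.combinations(H.nodes(), 2):
        uv, vu = H.has_edge(u, v), H.has_edge(v, u)
        counts["mutual" if uv and vu else "asym" if uv or vu else "null"] += 1
    return counts

# nx.reciprocity(H) gives the corresponding fraction of reciprocated links.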

Transitivity. A broader form of collaboration is transitivity, the fraction of closed loops with three nodes in the graph. The global transitivity coefficient has been computed for the datasets. The results obtained for the CN course are shown in Table 1 again, and confirm that transitivity is moderate. However, this is not entirely unexpected, since in forums there is benefit in acquiring or propagating information through third parties. Our data are consistent with this observation and, consequently, transitivity is quite small. Notice the opposite order of the values of reciprocity and transitivity of the three networks. Nevertheless, as we can see in Table 2, the results obtained for the ACN course are noticeable because many of the replies are directed to the owners of the threads. Again, related to transitivity is triad census, that counts the number of each type of triad present in the network. There are 16 possible triads in a directed graph (see Fig. 4). In our datasets the values obtained are consistent with typical forums’ dynamics: – 033: CN course (7721-LG, 7795-PG, 4615-OG); ACN course (192-LG, 289OG) – 012: CN course (1196-LG, 1305-PG, 1209-OG); ACN course (90-LG, 182-OG) – 102: CN course (482-LG, 275-PG, 128-OG); ACN course (58-LG, 54-OG) – 021D: CN course (183-LG, 233-PG, 180-OG); ACN course (34-LG, 35-OG) – 021U: CN course (26-LG, 22-PG, 57-OG); ACN course (9-LG, 19-OG) – 021C: CN course (85-LG, 48-PG, 89-OG); ACN course (5-LG, 21-OG)


111D: CN course (40-LG, 26-PG, 15-OG); ACN course (10-LG, 10-OG) 111U: CN course (110-LG, 125-PG, 168-OG); ACN course (21-LG, 39-OG) 030T: CN course (5-LG, 8-PG, 10-OG); ACN course (2-LG, 11-OG) 030C: CN course (2-LG, 2-PG, 3-OG); ACN course (0-LG, 0-OG) 201: CN course (17-LG, 19-PG, 33-OG); ACN course (11-LG, 7-OG) 120D: CN course (4-LG, 2-PG, 2-OG); ACN course (4-LG, 2-OG) 120U: CN course (1-LG, 9-PG, 12-OG); ACN course (5-LG, 5-OG) 120C: CN course (3-LG, 4-PG, 12-OG); ACN course (5-LG, 2-OG) 210: CN course (5-LG, 6-PG, 10-OG); ACN course (9-LG, 3-OG) 300: CN course (0-LG, 1-PG, 2-OG); ACN course (0-LG, 1-OG)

Cliques. A clique is a maximal completely connected subgraph of a given graph. So, a clique represents a strongly tied subcommunity where each member interacts with any other member. Notice that 2-cliques and 3-cliques are related to the measures discussed in the last paragraphs, i.e., reciprocity and transitivity. Tables 1 and 2 list the number of cliques in the graphs by their size. We can see that in both courses cliques larger than 4 are not very likely.
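The three cohesion measures just discussed (transitivity, the triad census and the count of maximal cliques by size) could be computed with networkx roughly as follows; this is a sketch under the assumption that G is one of the forum graphs built earlier, not the authors' code:

from collections import Counter
import networkx as nx

def cohesion_summary(G):
    """Transitivity, triad census and maximal-clique sizes for a forum graph G;
    self-edges are dropped and parallel edges collapsed."""
    edges = [(u, v) for u, v in G.edges() if u != v]
    D = nx.DiGraph(edges)   # directed view for the triad census
    H = nx.Graph(edges)     # undirected view for transitivity and cliques
    return {
        "transitivity": nx.transitivity(H),
        "triad_census": nx.triadic_census(D),   # keys '003', '012', ..., '300'
        "cliques_by_size": Counter(len(c) for c in nx.find_cliques(H)),
    }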


Fig. 4. Sixteen triad isomorphism classes.

Centrality. There exist a number of centrality measures for nodes in a graph. These were developed to capture different properties of nodes’ position. The following are some of the most commonly used, theoretically and empirically: – Degree centrality: just counts the number of neighbors of each node. Implicitly, this considers that all the adjacent nodes are equally important. – Closeness centrality: measures how easily a node can reach other nodes, computing the inverse of the average length of the shortest paths to all the other nodes in the graph. – Betweenness centrality: tries to capture the importance of a node in terms of its role in connecting other nodes, computing the ratio between the number of shortest paths that a node lies on and the total number of possible shortest paths between two nodes. – Eigenvector centrality: a measure based on the premise that a node’s importance is determined by how important or influential its neighbors are. The scores arise from a reciprocal process in which the centrality of each node is proportional to the sum of the centralities of the nodes it is connected to.
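A minimal sketch (assuming networkx and a simple directed forum graph G, i.e. with parallel edges already collapsed) of the four centralities listed above:

import networkx as nx

def node_centralities(G, weight=None):
    """Per-node centralities; pass weight="weight" to use the post valuations."""
    return {
        "in_degree":   nx.in_degree_centrality(G),
        "out_degree":  nx.out_degree_centrality(G),
        "closeness":   nx.closeness_centrality(G),
        "betweenness": nx.betweenness_centrality(G, weight=weight),
        "eigenvector": nx.eigenvector_centrality_numpy(G.to_undirected(), weight=weight),
    }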



For the case of degree centrality, we considered separately the in-degree centrality, which is the number of replies a student receives, and two measures of the out-degree centrality: (1) the number of replies given by a student in the graphs without self-edges, and (2) the number of new threads opened and replies given by a student in the graphs with self-edges (we consider this last measure due to the fact that these are the interactions that can be voted by the rest of the class). The results in Tables 1 and 2 reveal that the in-degree centrality values are moderate, but the out-degree centrality is noticeable, indicating a non-homogeneous distribution of the replies submitted by the participants, mainly in the OG. A subset of few nodes act as very active participants in forums (among them the teachers). Nevertheless, more nodes act as generators of new threads and recipients of information. As for the closeness centrality, the high values shown in Tables 1 and 2 are again indicative of the existence of few very active contributors. In the case of the betweenness centrality, the high values observed in Tables 1 and 2 suggest that in all the networks few nodes act as bridges between different parts of the graph. Notice the reduced number of articulation points (vertex whose removal disconnects a graph) in each network. In the CN graphs there are 5 in the LG (0, 4, 23, 76 and 105), 7 in the PG (0, 5, 9, 15, 33, 90 and 127) and 4 in the OG (0, 33, 53 and 73). And in the ACN graphs there are 3 in the LG (0, 4 and 7) and 3 in the OG (8, 9 and 16). Finally, for the eigenvector centrality, we considered the undirected version and we tested different configurations of the graphs built up from the datasets (weighted or not, with or without self-loops). Tables 1 and 2 show that all the measured eigenvector centrality values are noticeable. Again, this clearly means that there are substantial differences among the nodes in their role as sources or recipients of information. Assortativity. Assortativity is defined as the preference of nodes to connect to other nodes within their same class, i.e., those nodes having approximately similar attributes. This is termed as homophily in the literature, and it is based on a prior labeling of the nodes. The assortativity coefficient is therefore positive if similar nodes are more likely to be adjacent to each other, and negative otherwise. Tables 1 and 2 list the degree assortativity and the case of nominal assortativity where each student is labeled according to his/her final grade, considering in both cases the directed graphs. For the nominal assortativity, we have obtained low values suggesting randomness in the relationships. For the degree assortativity, the negative values obtained suggest relationships between the less and the most active students, which is mostly a hint of a well designed activity.
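The assortativity coefficients and articulation points mentioned above could be obtained along these lines (a sketch assuming networkx; the grades mapping is a hypothetical node-to-label dictionary, not the authors' data):

import networkx as nx

def mixing_summary(G, grades):
    """Degree assortativity, nominal assortativity on the final grade and the
    articulation points; `grades` maps every node to a label such as pass/fail."""
    nx.set_node_attributes(G, grades, "grade")
    H = nx.Graph([(u, v) for u, v in G.edges() if u != v])
    return {
        "degree_assortativity":  nx.degree_assortativity_coefficient(G),
        "nominal_assortativity": nx.attribute_assortativity_coefficient(G, "grade"),
        "articulation_points":   sorted(nx.articulation_points(H)),
    }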



Fig. 5. Community structure in the students’ networks of the CN course. LG (top-left), PG (top-right), OG (bottom-left) and GG (bottom-right).

Fig. 6. Community structure in the students’ networks of the ACN course. LG (topleft), OG (top-middle) and GG (top-right).

Community Structure. We applied a multilevel community detection algorithm (robust and fast for this application) to discover the division of nodes into communities in this experimental case, in order to complete the information about the patterns of collaboration in the group. The algorithm was quite effective in identifying the true underlying communities active in the three social networks as well as in the global graph. Graphically, a depiction of those communities for the CN course can be seen in Fig. 5, where one can easily recognize 6, 5, 7 and 6 communities of students, with values of modularity 0.4023, 0.3004, 0.1947 and 0.2377, respectively. For the ACN course, Fig. 6 shows 3, 4 and 4 communities of students, with values of modularity 0.19133, 0.2341 and 0.0961, respectively. Note the connections between communities, in other words, the fact that some nodes belong to two neighboring, overlapping communities, which indicates that groups of strong collaboration are typically not closed. Instead, some students act as bridges between those loosely related groups, allowing a faster propagation of information.
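A sketch of the multilevel (Louvain-style) community detection and modularity computation described above, assuming networkx 2.8 or later; the weighted undirected projection follows the simplification used for the graph-level measures:

import networkx as nx
from networkx.algorithms import community

def detect_communities(G):
    """Louvain-style multilevel partition of the undirected projection of G,
    with parallel edge weights summed, plus the modularity of the partition."""
    H = nx.Graph()
    for u, v, data in G.edges(data=True):
        if u == v:
            continue
        w = data.get("weight", 1)
        if H.has_edge(u, v):
            H[u][v]["weight"] += w
        else:
            H.add_edge(u, v, weight=w)
    parts = community.louvain_communities(H, weight="weight", seed=0)
    return parts, community.modularity(H, parts, weight="weight")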

4.2 Per Student Behavior

Due to the fact that global level measures can hide some characteristics of the graphs, it might be interesting to study the distribution of the participation of the students in each forum. Next, we report the results of such analysis.

Individual Centralities. In Figs. 7 and 9 we depict the histograms of the individual degree centralities and of the number of new threads or replies for the CN course, which are good indicators of the students' activity. The tails of the empirical out-degree and number-of-replies distributions accumulate a non-negligible probability. This is consistent with the view that some nodes concentrate a significant part of the activity of the graphs. Notice that the content of the teachers concentrates more than 36, 43 and 49 interactions, with more than 20 students, in each one of the three graphs. Among the students, the most active (4, 5, 90) interact with several others. A similar behavior is observed in Figs. 8 and 10 for the ACN course. In this case, the content of the teachers concentrates 32 and 28 interactions, with more than 10 students, in each one of the graphs. Among the students, the most active (1, 4, 9, 12) interact with several others. In addition to the intensity of interactions, another important factor is their quality. Figure 9 also shows the histograms of the average points obtained for posting new threads or replies in the CN course (remember the different limits of the scales used in each forum, 3, 5 and 11, respectively). In general, new threads and replies are positively voted, especially those of the lessons and the programming forums. It is important to highlight that 70% of the best contributors (those students whose posts always received the maximum score) finally passed the course. For the ACN course, Fig. 10 shows a similar behavior. In this case, 80% of the best contributors (those students whose posts always received the maximum score) finally passed the course. The alternative measures of centrality produce similar, consistent findings. For example, the individual closeness centralities exhibit non-negligible tails in their histograms (see Figs. 11 and 12), revealing the existence of a small number of very active students, (4, 5, 15, 23, 36, 90) in the CN course and (1, 2, 4, 9, 12) in the ACN course, close to many others. And for the betweenness centralities, the histograms in Figs. 13 and 14 show that the highest values correspond to the articulation points of each graph, listed previously. Finally, Figs. 15 and 16 depict the histograms of the individual eigenvector centralities, taking into account the undirected version of the graphs, weighted or not. Again, we can observe the non-negligible probability of the tails of the distributions (teacher and students 4, 5, 10, 25, 33, 90 in the CN course and 1, 2, 4, 9, 12 in the ACN course).

Fig. 7. Degree centralities of the CN course [6]. LG (left), PG (middle) and OG (right).

Fig. 8. Degree centralities of the ACN course. LG (left) and OG (right).

Fig. 9. Quantity and quality of interactions of the CN course [6]. LG (left), PG (middle) and OG (right).

Fig. 10. Quantity and quality of interactions of the ACN course. LG (left) and OG (right).

Fig. 11. Closeness centralities of the CN course [6]. LG (left), PG (middle) and OG (right).

Fig. 12. Closeness centralities of the ACN course. LG (left) and OG (right).

Fig. 13. Betweenness centralities of the CN course [6]. LG (left), PG (middle) and OG (right).

Next, in order to check the relationship among the patterns of participation in the forums and the achievements of the course, we have measured the statistical correlations between the features under study in this section and the final grades of the students that followed the continuous assessment. The sam-

Fig. 14. Betweenness centralities of the ACN course. LG (left) and OG (right).

Fig. 15. Eigenvector centralities of the CN course [6]. LG (left), PG (middle) and OG (right).

Fig. 16. Eigenvector centralities of the ACN course. LG (left) and OG (right).


Table 3. Correlation between individual features in each graph and student's performance in the CN course [6]. Each entry gives ρ̂ followed by (β̂, t, P(>|t|)).

Lessons graph:
In degree: 0.2768 (0.6411, 3.3351, 1.11 · 10−3)
Out degree: 0.2393 (0.6352, 2.8531, 5.01 · 10−3)
Number new threads: 0.2593 (1.1634, 3.1082, 2.31 · 10−3)
Number replies: 0.1834 (0.4176, 2.1613, 3.25 · 10−2)
Points new threads: 0.2514 (0.4611, 3.0081, 3.14 · 10−3)
Points replies: 0.2025 (0.1751, 2.3943, 1.81 · 10−2)
Directed betweenness: 0.1727 (17.0614, 2.0301, 4.44 · 10−2)
Undirected betweenness: 0.2011 (17.0614, 2.3762, 1.89 · 10−2)
Closeness: 0.2956 (4.2402, 3.5823, 4.77 · 10−4)
Weighted eigenvector: 0.1179 (3.0696, 1.3756, 1.71 · 10−1)
Unweighted eigenvector: 0.2366 (6.3358, 2.8191, 5.55 · 10−3)

Programming graph:
In degree: 0.2425 (0.5227, 2.8941, 4.44 · 10−3)
Out degree: 0.2066 (0.4418, 2.4453, 1.58 · 10−2)
Number new threads: 0.1989 (0.6028, 2.3501, 2.02 · 10−2)
Number replies: 0.2708 (0.5383, 3.2574, 1.42 · 10−3)
Points new threads: 0.1928 (0.1271, 2.2753, 2.45 · 10−2)
Points replies: 0.2733 (0.1251, 3.2891, 1.28 · 10−3)
Directed betweenness: 0.1754 (23.2548, 2.0632, 4.12 · 10−2)
Undirected betweenness: 0.1922 (23.2548, 2.2681, 2.49 · 10−2)
Closeness: 0.3604 (5.0941, 4.4732, 1.63 · 10−5)
Weighted eigenvector: 0.1439 (4.3298, 1.6845, 9.46 · 10−2)
Unweighted eigenvector: 0.2639 (5.3481, 3.1671, 1.91 · 10−3)

Organization graph:
In degree: 0.2022 (0.4069, 2.3912, 1.82 · 10−2)
Out degree: 0.1767 (0.3692, 2.0782, 3.96 · 10−2)
Number new threads: 0.2176 (0.8351, 2.5821, 1.09 · 10−2)
Number replies: 0.1505 (0.2824, 1.7624, 8.03 · 10−2)
Points new threads: 0.2171 (0.0907, 2.5742, 1.12 · 10−2)
Points replies: 0.1806 (0.4052, 2.1263, 3.53 · 10−2)
Directed betweenness: 0.1243 (24.1481, 16.6465, 1.49 · 10−1)
Undirected betweenness: 0.1681 (24.1483, 1.9743, 5.05 · 10−2)
Closeness: 0.2181 (2.9128, 2.5863, 1.08 · 10−2)
Weighted eigenvector: 0.2111 (5.3343, 2.4994, 1.37 · 10−2)
Unweighted eigenvector: 0.2526 (5.9487, 3.0234, 3.01 · 10−3)


The sample correlations ρ̂ were computed, and the linear regression statistical test was used to quantify them. This test checks the statistical significance of a linear fit of a response variable on one factor variable; the estimated linear coefficient is denoted by β̂. Under the null hypothesis (i.e., that there is no such linear dependence) the test statistic follows a t-distribution, so high values are very unlikely to be observed empirically [12]. The results in Tables 3 and 5 show a statistically significant positive dependence (ρ̂ > 0.2) between almost all the considered factors and the students' performance. Moreover, in order to understand the relationship among the different roles that each student plays in each network, Table 7 shows some correlations that suggest a balanced behavior, as desirable. Finally, in order to check the analogies among the different networks, Table 8 shows some correlations that suggest that many students follow a similar pattern of participation in the three forums. Because of the small number of students in the ACN course, statistical tests are not accurate there. In this case, in order to infer the relationships between individual features and students' performance, or between individual features across graphs, we can check Tables 4 and 6, where individual values are ordered by students' performance; we observe that, with few exceptions, high-performing students are good contributors to the lessons forum, but they also show an active participation in the organization forum. The other students that participate in the forums show a balanced behavior.

Crossclique Number. The crossclique number accounts for the number of cliques a node belongs to. Figure 17 depicts histograms of this measure for the three networks in the CN course. In the LG, students with values higher than 10 are {4, 25, 53, 90}. In the PG, students with values higher than 20 are {4, 5, 90}. Finally, in the OG, students with values higher than 20 are {4, 10, 21, 36, 90}. The same is shown in Fig. 18 for the two networks in the ACN course. In the LG, students with values higher than 10 are {1, 2, 4, 5, 7}, and in the OG, students with values higher than 10 are {4, 9, 12, 14, 16}. Additionally, the results in Table 9 indicate that in the three graphs there is a statistically significant positive dependence (again ρ̂ > 0.2) between belonging to many subgraphs and the students' performance. Finally, the values in Table 11 also suggest similarities related to this feature among the three forums. Again, relationships in the ACN course must be inferred from the data shown in Table 10, where individual values are ordered by students' performance.
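The statistical test just described (a sample correlation together with a one-factor linear regression whose slope is tested with a t-statistic) can be reproduced with standard statistics libraries. A minimal sketch with scipy follows; the feature and grade arrays are illustrative placeholders, not the study's data.

```python
# Minimal sketch (placeholder data): sample correlation rho-hat and the t-test
# on the slope beta-hat of a one-factor linear regression, as in Tables 3-11.
import numpy as np
from scipy import stats

feature = np.array([2.0, 5.0, 1.0, 7.0, 3.0, 6.0, 4.0, 8.0])  # e.g. in-degree per student
grades = np.array([5.5, 7.0, 4.0, 8.5, 6.0, 7.5, 6.5, 9.0])   # final grade per student

rho = np.corrcoef(feature, grades)[0, 1]   # sample correlation rho-hat
fit = stats.linregress(feature, grades)    # slope = beta-hat, pvalue = P(>|t|)
print(rho, fit.slope, fit.pvalue)          # p-value under H0: no linear dependence
```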


Table 4. Relationship between individual features in each graph and student's performance in the ACN course. For the lessons graph and the organization graph, the table lists per-student values of in degree, out degree, number of new threads, number of replies, points for new threads, points for replies, directed and undirected betweenness, closeness, and weighted and unweighted eigenvector centrality; students are ordered by their performance in the course.


Table 5. Correlation between individual centrality features in the GG and student's performance in the CN course. Each entry gives ρ̂ followed by (β̂, t, P(>|t|)).

Global graph:
In degree: 0.1962 (0.2179, 2.3165, 2.21 · 10−2)
Out degree: 0.1639 (0.1601, 1.9241, 5.65 · 10−2)
Number new threads: 0.3031 (0.5095, 1.3408, 3.36 · 10−4)
Number replies: 0.2271 (0.1735, 2.6994, 7.85 · 10−3)
Points new threads: 0.2933 (0.0518, 3.5511, 5.29 · 10−4)
Points replies: 0.2531 (0.0241, 3.0281, 2.95 · 10−3)
Directed betweenness: 0.1061 (15.9159, 1.2351, 2.19 · 10−1)
Undirected betweenness: 0.1001 (15.9159, 1.1661, 2.46 · 10−1)
Closeness: 0.3319 (3.8679, 4.0731, 7.91 · 10−5)
Weighted eigenvector: 0.2425 (5.7694, 2.8945, 4.44 · 10−3)
Unweighted eigenvector: 0.3661 (8.0296, 4.5531, 1.17 · 10−5)

Table 6. Relationship between individual centrality features in the GG and student's performance in the ACN course. The table lists per-student values of in degree, out degree, number of new threads, number of replies, points for new threads, points for replies, directed and undirected betweenness, closeness, and weighted and unweighted eigenvector centrality in the global graph; students are ordered by their performance in the course.


Table 7. Correlation between number of new threads and replies posted per student in each forum in the CN course. Each entry gives ρ̂ followed by (β̂, t, P(>|t|)).

Lessons graph: 0.4707 (0.2388, 6.176, 7.35 · 10−9)
Programming graph: 0.6794 (0.4456, 10.719, 2.01 · 10−16)
Organization graph: 0.5637 (0.2757, 7.9021, 8.92 · 10−13)
Global graph: 0.6851 (0.3114, 10.887, 2.02 · 10−16)

Fig. 17. Crossclique numbers in the students' CN networks [6]. LG (left), PG (middle) and OG (right).

Fig. 18. Crossclique numbers in the students' ACN networks. LG (left) and OG (right).


Table 8. Correlation between individual features in different forums in the CN course [6]. Each entry gives ρ̂ followed by (β̂, t, P(>|t|)).

Lessons - programming graphs:
In degree: 0.5514 (0.5132, 7.6531, 3.46 · 10−12)
Out degree: 0.5517 (0.4444, 7.6582, 3.37 · 10−12)
Number new threads: 0.2354 (0.1591, 2.8041, 5.81 · 10−3)
Number replies: 0.6396 (0.5584, 9.6332, 2.01 · 10−16)
Points new threads: 0.2517 (0.0905, 3.0111, 3.11 · 10−3)
Points replies: 0.6553 (0.3465, 10.0432, 2.01 · 10−16)
Directed betweenness: 0.1639 (0.2198, 1.9232, 5.65 · 10−2)
Undirected betweenness: 0.0839 (0.0618, 0.9751, 3.31 · 10−1)
Closeness: 0.9671 (0.9414, 43.9851, 2.01 · 10−16)
Weighted eigenvector: 0.2146 (0.2481, 2.5432, 1.21 · 10−2)
Unweighted eigenvector: 0.5813 (0.4399, 8.2711, 1.17 · 10−13)

Lessons - organization graphs:
In degree: 0.5291 (0.4596, 7.2184, 3.55 · 10−11)
Out degree: 0.6401 (0.5038, 9.6451, 2.01 · 10−16)
Number new threads: 0.3316 (0.2836, 4.0711, 8.02 · 10−5)
Number replies: 0.6283 (0.5181, 9.3511, 2.62 · 10−16)
Points new threads: 0.3452 (1.5128, 4.2571, 3.86 · 10−5)
Points replies: 0.5643 (1.9489, 7.9132, 8.36 · 10−13)
Directed betweenness: 0.3085 (0.6065, 3.7551, 2.58 · 10−4)
Undirected betweenness: 0.1365 (0.1875, 1.5951, 1.12 · 10−1)
Closeness: 0.8821 (0.8115, 21.6732, 2.01 · 10−16)
Weighted eigenvector: 0.3458 (0.3508, 4.2663, 3.73 · 10−5)
Unweighted eigenvector: 0.6132 (0.5437, 8.9886, 2.08 · 10−15)

Programming - organization graphs:
In degree: 0.5201 (0.4855, 7.0501, 8.59 · 10−11)
Out degree: 0.8066 (0.7881, 15.7984, 2.01 · 10−16)
Number new threads: 0.3133 (0.3967, 3.8191, 2.04 · 10−4)
Number replies: 0.7548 (0.7791, 5.1073, 1.11 · 10−6)
Points new threads: 0.2962 (0.4668, 3.5911, 4.62 · 10−4)
Points replies: 0.7268 (1.3275, 12.2521, 2.01 · 10−16)
Directed betweenness: 0.3995 (0.5854, 5.0452, 1.45 · 10−6)
Undirected betweenness: 0.5447 (1.1065, 7.5191, 7.11 · 10−12)
Closeness: 0.8826 (0.8342, 21.7434, 2.01 · 10−16)
Weighted eigenvector: 0.4458 (0.3913, 5.7654, 5.36 · 10−8)
Unweighted eigenvector: 0.6331 (0.7416, 9.4684, 2.01 · 10−16)


Table 9. Correlation between crossclique number and student's performance in the CN course. Each entry gives ρ̂ followed by (β̂, t, P(>|t|)).

Lessons graph: 0.2665 (0.2742, 3.2012, 1.71 · 10−3)
Programming graph: 0.2109 (0.1333, 2.4987, 1.37 · 10−2)
Organization graph: 0.2094 (0.0971, 2.4802, 1.44 · 10−2)
Global graph: 0.1137 (0.0141, 1.3264, 1.87 · 10−1)

Table 10. Relationship between crossclique number and student's performance in the ACN course. The table lists per-student crossclique numbers in the lessons graph, the organization graph and the global graph, with students ordered by their performance in the course.

Table 11. Correlation between crossclique number in different forums in the CN course [6]. Each entry gives ρ̂ followed by (β̂, t, P(>|t|)).

Lessons - programming graphs: 0.5664 (0.3479, 7.9562, 6.61 · 10−13)
Lessons - organization graphs: 0.6034 (0.2716, 8.7612, 7.49 · 10−15)
Programming - organization graphs: 0.6445 (0.4724, 9.7651, 2.01 · 10−16)

5 Conclusions

This chapter has studied how social network analysis techniques can explain, and ultimately predict, the success of students in an online or virtualized learning platform. Though we have specifically focused on participation in discussion forums (in a typical undergraduate course and in a master-level course), we believe that our conclusions can be extended to other forms of online collaboration, such as team projects or online challenges. Our contributions, which extend our previous work, can be summarized as follows. Firstly, as we have shown, it is difficult to single out one or a few network properties that hint at good academic performance by individual students, because several of them exhibit similar correlations with the outcomes. In future work it could be interesting to combine information from several graphs (along with other measures of the courses) in order to improve the accuracy of the correlations. Secondly, the detection of sparsely connected islands of information groups is straightforward (e.g., via cliques), and the clustering properties


of the graph also allow for the effective recognition of good students. Thirdly, although in the master-level course statistical tests are not appropriate because of the small number of students, the relationships that can be inferred from the data shown throughout the chapter are similar to those obtained for the undergraduate-level course. Finally, it is important to highlight that the connections between statistical correlations and network structure arise already at early stages of the course, so they can be detected almost right after the interactions start and provide early signs of success or failure, both for groups and for individuals, thus allowing instructors to adapt their methodologies to individual needs. In summary, we believe that this study contributes to a better understanding of the analyzed learning experiences in general, and may help devise more effective designs to increase the usefulness of the forum-based academic activity under study in particular.

References

1. Barabási, A.: Network Science. Cambridge University Press, Cambridge (2016)
2. Cadima, R., Ojeda, J., Monguet, J.M.: Social networks and performance in distributed learning communities. Educ. Technol. Soc. 15(4), 296–304 (2012)
3. Chung, K.S.K., Paredes, W.C.: Towards a social networks model for online learning & performance. Educ. Technol. Soc. 18(3), 240–253 (2015)
4. Dado, M., Bodemer, D.: A review of methodological applications of social network analysis in computer-supported collaborative learning. Educ. Res. Rev. 22, 159–180 (2017)
5. Dawson, S.: A study of the relationship between student social networks and sense of community. Educ. Technol. Soc. 11(3), 224–238 (2008)
6. Ferreira, O., Sousa, M.E., López, J.C., Fernández, M.: Investigating interaction patterns in educational forums: a social networks analysis approach. In: Proceedings of the 11th International Conference on Computer Supported Education (CSEDU 2019), vol. 2, pp. 88–99 (2019)
7. Gaggioli, A., Mazzoni, E., Milani, L., Riva, G.: The creative link: investigating the relationship between social network indices, creative performance and flow in blended teams. Comput. Hum. Behav. 42(1), 157–166 (2015)
8. Hart, J.: Social learning handbook. Centre for Learning & Performance Technologies, Bath (2014)
9. Haythornthwaite, C.: Learning relations and networks in web-based communities. J. Web Based Commun. 4(2), 140–158 (2008)
10. Heo, H., Lim, K.Y., Kim, Y.: Exploratory study on the patterns of online interaction and knowledge co-construction in project-based learning. Comput. Educ. 55(3), 1383–1392 (2010)
11. Hommes, J., Rienties, B., Grave, W., Bos, G., Schuwirth, L., Scherpbier, A.: Visualizing the invisible: a network approach to reveal the informal social side of student learning. Adv. Health Sci. Educ. 17(5), 743–757 (2012). https://doi.org/10.1007/s10459-012-9349-0
12. James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to Statistical Learning with Applications in R. Springer, New York (2013). https://doi.org/10.1007/978-1-4614-7138-7
13. Jan, S.K., Viachopoulos, P.: Influence of learning design of the formation of online communities of learning. Int. Rev. Res. Open Distrib. Learn. 19(4) (2018)
14. Laat, M., Lally, V., Lipponen, L., Simons, R.J.: Investigating patterns of interaction in networked learning and computer-supported collaborative learning: a role for social network analysis. Int. J. Comput.-Support. Collab. Learn. 2(1), 87–103 (2007). https://doi.org/10.1007/s11412-007-9006-4
15. Lin, J.W., Mai, L.J., Lai, Y.C.: Peer interaction and social network analysis of online communities with the support of awareness of different contexts. Int. J. Comput.-Support. Collab. Learn. 10(2), 161–181 (2015). https://doi.org/10.1007/s11412-015-9212-4
16. Manca, S., Delfino, M., Mazzoni, E.: Coding procedures to analyze interaction patterns in educational web forums. J. Comput. Assist. Learn. 25(2), 189–200 (2009)
17. Newman, M.: Networks: An Introduction. Oxford University Press, New York (2010)
18. Putnik, G., Costa, E., Alves, C., Castro, H., Varela, L., Shah, V.: Analysis of the correlation between social network analysis measures and performance of students in social network-based engineering education. Int. J. Technol. Des. Educ. 26(3), 413–437 (2016). https://doi.org/10.1007/s10798-015-9318-z
19. Rienties, B.: The role of academic motivation in computer-supported collaborative learning. Comput. Hum. Behav. 25(6), 1195–1206 (2009)
20. Shea, P., et al.: Online learner self-regulation: learning presence, viewed through quantitative content and social network analysis. Int. Rev. Res. Open Distrib. Learn. 14(3), 427–461 (2013)
21. Siqin, T., van Aalst, J., Chu, S.K.W.: Fixed group and opportunistic collaboration in a CSCL environment. Int. J. Comput.-Support. Collab. Learn. 10(2), 161–181 (2015). https://doi.org/10.1007/s11412-014-9206-7
22. Sousa, M.E., López, J.C., Fernández, M., Rodríguez, M., López, C.: Mining relations in learning-oriented social networks. Comput. Appl. Eng. Educ. 25(5), 769–784 (2017)
23. Sousa-Vieira, M.E., López-Ardao, J.C., Fernández-Veiga, M.: The network structure of interactions in online social learning environments. In: Escudeiro, P., Costagliola, G., Zvacek, S., Uhomoibhi, J., McLaren, B.M. (eds.) CSEDU 2017. CCIS, vol. 865, pp. 407–431. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94640-5_20
24. Stepanyan, K., Mather, R., Dalrymple, R.: Culture, role and group work: a social network analysis perspective on an online collaborative course. Br. J. Educ. Technol. 45(4), 676–693 (2014)
25. Tirado, R., Hernando, A., Aguaded, J.I.: The effect of centralization and cohesion on the social construction of knowledge in discussion forums. Interact. Learn. Environ. 23(3), 293–316 (2015)
26. Toikkanen, T., Lipponen, L.: The applicability of social network analysis to the study of networked learning. Interact. Learn. Environ. 19(4), 365–379 (2011)
27. Vercellone-Smith, P., Jablokow, K., Friedel, C.: Characterizing communication networks in a web-based classroom: cognitive styles and linguistic behavior of self-organizing groups in online discussions. Comput. Educ. 59(2), 222–235 (2012)
28. Xie, K., Yu, C., Bradshaw, A.C.: Impacts of role assignment and participation in asynchronous discussions in college-level online classes. Internet High. Educ. 20, 10–19 (2014)

Emotion Recognition from Physiological Sensor Data to Support Self-regulated Learning

Haeseon Yun1,2(B), Albrecht Fortenbacher1, René Helbig1, Sven Geißler1, and Niels Pinkwart2

1 University of Applied Sciences Berlin, Wilhelminenhof Strasse 75A, 12459 Berlin, Germany

{yun,forte,helbig,s0560544}@htw-berlin.de 2 Humboldt University Berlin, Rudower Chaussee 25, 12489 Berlin, Germany

[email protected]

Abstract. In education, learners' autonomy and agency have been emphasized across various domains. However, the ability to self-regulate one's learning by setting a goal and monitoring, regulating and evaluating one's learning progress is not easy to develop. With wearable sensor technology, various physiological and contextual data can be detected and collected. To provide learners with context-aware personal learning support, we have studied physiological sensor data (EDA and ECG) by presenting emotional stimuli to 70 students from two higher education institutes. We have analyzed the collected data using multiple methods (qualitative, quantitative, machine learning and fuzzy logic approaches) and found a relation between physiological sensor data and emotion that seems promising. Subsequently, we have investigated a learning support system for self-regulated learning and proposed three ideas with prototypes. Our future work will entail implementing these research findings to develop a learning companion system that supports learners' self-regulated learning.

Keywords: Emotion detection · Learning indicators · Sensor data · Machine learning · Fuzzy logic reasoning · Technology enhanced learning · Learning companion

1 Introduction

Successful scientists and inventors such as Thomas Edison or Albert Einstein inspire learners of various backgrounds, as they overcame the obstacles they faced during learning and proved their excellence. In particular, the fact that they did not fit into a conventional school system and were considered slow learners motivates others to value persistence in learning when they are inclined to give up. These great minds also enlighten teachers, encouraging them to observe students' individual differences so as to respond to them appropriately and instruct them effectively. The roles of teachers and educators include not only the transfer of knowledge but also teaching students how to learn successfully and motivate themselves. In order to support learners effectively, teachers have adopted various measures to detect students' cognitive


and emotional status. Due to increasing individual differences in interests, aptitudes and characteristics, along with changing core skills such as 21st century competencies (collaboration, communication, creativity, critical thinking, information literacy, problem-solving and socio-emotional skills [56]), areas such as motivation and emotion draw great attention in the research fields of education and technology enhanced learning. Learners' emotions can affect their learning achievement, and maintaining a positive learning experience can help learners cope with the difficulties they face while learning. Learners' emotions deserve particular attention in informal learning settings, where the roles of teachers differ from those in a traditional face-to-face learning environment. In an informal learning context, scrutiny of how to effectively provide students with new means of support (e.g. online forums, ITS feedback) is needed. To design effective learning support, we need to investigate not only learners' cognitive development but also their emotions while learning. Wearable sensor technology has made it possible to collect physiological data from users. In a learning context, learners' physiological data and data about their surroundings can then be analyzed to provide a personalized environment through context-aware feedback and adaptation. Since physiological data are related to emotion, it is plausible that students' emotional regulation can be supported by technology enhanced learning designs that utilize wearable sensors. Therefore, in this paper, we discuss our research project LISA (Learning Analytics for sensor-based Adaptive Learning)1 by presenting the theoretical background on sensor data with respect to self-regulated learning, focusing on emotion in relation to physiological data such as skin conductance and cardiovascular activity. We then describe our experiment, which elicits specific emotions using pictures while recording physiological data of 70 students from higher education institutes. We analyze the collected empirical data using mixed methods (qualitative, quantitative, machine learning and fuzzy logic) and report our results. As our research aims to relate physiological sensor data to academic emotion for learning support, we discuss our findings and propose our ideas for an affective learning support system, a learning companion, based on sensor data.

2 State of the Art

2.1 Sensors for Self-regulated Learning

Self-regulated learning implies active control over one's learning through the control of thoughts, feelings and behaviors based on a set goal [1]. This includes not only the effective attainment of knowledge but also the management and regulation of one's emotions and even one's environment. Among the subsets of self-regulation, emotion regulation has been seen as a vital aspect to address in education. Specifically, Azevedo and colleagues [2] recently studied self-regulated learning in an online learning environment where a learner's physiological data were used to visualize his or her emotion in order to support self-regulated learning. Mohr [3] utilized physiological data, so-called "behavioral markers", and related them to the areas of self-regulation (cognition, emotion and behavior).

1 16SV7534K.


Previously, the authors [4] discussed sensors in relation to a self-regulated learning framework by matching each of four areas (cognition, motivation, behavior and context) and each of four phases (planning, monitoring, regulation and evaluation) with specific sensors. Gaze detection and facial expression analysis through a camera give insight into cognition, while physiological sensor data such as electrodermal activity and cardiovascular activity can provide an understanding of motivation and emotion, especially for the evaluation phase of self-regulated learning. Furthermore, speech recognition using a microphone and location detection using GPS and Bluetooth are known to detect emotional stress such as depression [5, 6].

2.2 Emotion Recognition Using Physiological Sensor Data

Observations of physiological data such as eye movement, facial expressions, skin conductance level (SCL) and nonspecific skin conductance responses (SCR) reveal a relation to both cognition and emotion [7–11]. Bradley and Lang [12] reported noticeable physiological changes in heart rate, blood pressure, electrodermal activity and respiration rate due to emotional cues. To distinguish between positive and negative emotions, skin conductance and related signals [13, 14] and cardiovascular activity data [15–18] have been actively explored. Findings from various studies report a positive correlation between the intensity of emotion and skin conductance [13, 14, 19, 20]. Cardiovascular signals such as heart rate, heart rate acceleration and heart rate variability also fluctuate in relation to the degree of emotional valence [15–18]. Machine learning approaches have been used to detect the emotions a person experiences from heart rate, skin conductance and respiration rate [21]. A general model to recognize a learner's emotional state in a learning situation is, however, still context dependent, with ambivalent findings. To work towards a general model for emotion detection, in this research we have focused on general emotions which can be mapped onto an analyzable construct. We adapted the emotional picture experiment using IAPS and SAM ratings [22, 23] to students in higher education institutes while recording physiological sensor data (EDA and ECG), and we applied qualitative, quantitative and machine learning approaches to the collected data to find basic relationships between sensor data (specifically EDA and ECG) and emotions.

3 Emotion Recognition Using EDA and ECG Data

3.1 Method

The overall goal of the LISA project is to provide learning support by analyzing learners' sensor data. Physiological sensor data can provide indications of emotional and cognitive states; therefore, we needed an algorithm that could predict emotions in order to give emotional support. Furthermore, the LISA project provides multimodal learning analytics, which uses physiological sensor data that can be mapped to specific learning situations so that more personalized feedback and recommendations can be provided.


From the literature we know that data from EEG2, ECG3 (or PPG4) and EDA5 sensors seem to reflect mental processes related to emotions [57]. Still, many case studies refer to specific emotional situations or are limited by the number of participants, which makes it difficult to generalize the results and to obtain an effective prediction algorithm that works in arbitrary learning situations. Furthermore, LISA required a non-intrusive, non-distracting sensor solution, which excluded EEG sensors, so our focus was on obtaining information about skin conductance (EDA sensors) and about the cardiac system (ECG or PPG sensors). To obtain sensor data referring to different emotional states, such as happiness, anger, boredom and satisfaction, we conducted a study with a total of 70 participants at HTW Berlin and IWM Tübingen6 using the International Affective Picture System (IAPS). First, we selected 96 pictures referring to different values of valence and arousal (IAPS reference ratings), and they were presented to participants as a sequence (preparation, picture stimulus, picture rating) to induce well-defined emotions. While exposing participants to the pictures, we recorded EDA and ECG sensor data and synchronized the data with the experiment software afterwards. Raw EDA and ECG sensor data were processed and normalized to yield specific values such as instantaneous heart rate (IHR), heart rate variability (HRV) and the tonic and phasic components of skin conductivity. We also calculated a set of features, mainly aggregated values of skin conductivity and heart rate, such as mean or skewness. Other features include skin conductance responses (SCR), small pulses corresponding to mental processes, and heart rate variability. The duration of the experiment was approximately 45 min, and recording sensor data at a 1000 Hz sampling rate yielded a vast amount of data annotated by the stimuli induced by the emotional pictures. As a first step, we conducted qualitative and quantitative analyses of the collected data to verify some hypotheses derived from the literature [57]. Based on a first explorative approach, statistical methods were applied to all collected data to establish statistical significance. The statistical approach aimed to find correlations between features derived from sensor data and emotional states, indicated both by IAPS reference ratings and by participants' self-ratings. As a second step, using a machine learning environment (Auto-Weka), we classified sensor data according to their IAPS reference rating and compared our results to literature findings which report machine learning results for identical sensor data and a similar experiment design. Using a series of cross validations on all sensor data, our results consist of precision, recall and model fitness. Based on a newly discovered problem, we took several approaches, including feature reduction, re-classification and taking the history of pictures into account. As the third step towards emotion prediction, we applied a fuzzy logic approach which uses expert knowledge (literature) as rule sets to derive a model and classify emotional states [15]. Modelling correlations between sensor data and emotions as expert rules, we calculated emotion values (valence and arousal) for each picture.

2 Electroencephalogram.
3 Electrocardiogram.
4 Photoplethysmography.
5 Electrodermal Activity.
6 Leibniz Institut für Wissensmedien, Tuebingen.


Comparing these values with the IAPS reference ratings gives an indication of the feasibility and quality of emotion prediction.

3.2 Emotional Picture Experiment

Although there is a vast number of studies indicating correlations between data from physiological sensors and emotions, it is difficult to apply these findings directly to obtain a reliable predictor of learning emotions. Reasons include the choice of physiological sensors, which might not be available with the LISA sensor device (e.g. EEG), but also study designs which make it difficult to transfer the results (closed experimental setting vs. real-life setting). Therefore, we decided to set up an experiment using IAPS emotional pictures, showing them to 70 participants while recording electrodermal activity (EDA) and electrocardiogram (ECG) data. The International Affective Picture System (IAPS) consists of emotion-inducing images with normative ratings for their levels of valence and arousal, obtained from a wide range of people. Using the circumplex model of affect [24], each picture can be mapped onto the affective grid as shown in Fig. 1.

Fig. 1. 2-dimensional affective space with IAPS picture [12].

Valence ranges from negative emotions such as anger or sadness to positive emotions such as joy, and arousal indicates the intensity of the emotion, for example from calm/boring to stimulating/exciting. IAPS provides 1182 pictures with mean values and standard deviations for valence, arousal and dominance, which serve as a good standard for evoking specific emotions, evaluated by a large number of people from various cultural backgrounds. Most importantly, the images can be plotted in a two-dimensional affective space based on their normative valence and arousal values [24].
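For reference, assigning a picture to one of the four circumplex quadrants (the HH, HL, LH and LL classes used below) from its normative valence and arousal amounts to simple thresholding; in the sketch below the threshold of 5 (the midpoint of the 9-point scale) is an assumption for illustration, not the selection criterion used in the study.

```python
# Illustrative sketch: mapping IAPS valence/arousal ratings to a circumplex
# quadrant label. The threshold (midpoint of the 9-point scale) is an assumption.
def quadrant(valence: float, arousal: float, threshold: float = 5.0) -> str:
    v = "H" if valence >= threshold else "L"
    a = "H" if arousal >= threshold else "L"
    return v + a  # "HH", "HL", "LH" or "LL" (valence first, then arousal)

print(quadrant(7.2, 6.5))  # -> "HH"
print(quadrant(3.1, 6.8))  # -> "LH"
```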


Participants' perception of the IAPS pictures can be rated using self-assessment manikins (SAM) as a pictorial rating instrument (Fig. 2).

Fig. 2. 9-point scale SAM Rating for valence (top) and arousal (bottom) [23].

This allows participants' perception to be recorded in addition to collecting physiological data. Specifically, skin conductance has been found to co-vary with arousal values, both for pictures that induce negative valence and for pictures that induce positive valence. Also, heart rate reacts to positive valence and decelerates when unpleasant images are presented. The pictures as a whole have been found to be relatively safe and effective, providing a targeted emotional stimulus without being detrimental to the viewers. Based on previous results [24], we designed a 45-min experiment with 96 pictures. To select the 96 pictures, we applied three main criteria to the complete set of 1182 IAPS pictures: 1) equal distribution of the pictures, i.e. 24 pictures for each of the categories HH (high valence and high arousal), HL, LL and LH, corresponding to the four quadrants of the circumplex model; 2) no significant statistical difference in ratings (valence and arousal) between genders, verified with an independent t-test, to minimize gender effects in the experiment; and 3) within a given category, a standard deviation for both valence and arousal as low as possible, and in any case lower than 2.5. Applying the second criterion, we chose 735 pictures for which the difference in ratings between female and male participants was less than 1 for valence and less than 0.8 for arousal. Additionally, pictures containing explicit violence or sexually explicit content were excluded. Lastly, the standard deviation within the respective category was kept as low as possible. For high valence and high arousal pictures (category HH), 42 pictures with valence or arousal values greater than 6 were selected. Excluding 7 sexually explicit pictures and 11 with high standard deviation, we attained 24 pictures with a mean valence from 6.07


to 7.4 and mean arousal from 6.1 to 7.35. The standard deviation for valence was below 2.0, and for arousal below 2.21. For high valence and low arousal pictures (HL), 42 pictures with valence greater than 6 and arousal lower than 4 were selected. Excluding 18 pictures with high standard deviation, valence ranged from 6.03 to 7.52 and arousal from 2.51 to 3.97. The standard deviation for HL stimulus pictures is less than 2 for valence and up to 2.2 for arousal. For low valence and low arousal pictures (LL), requiring values less than 4 for both valence and arousal yielded only 9 pictures; therefore, we adjusted the threshold to 4.3 for both valence and arousal, which resulted in 29 pictures. The pictures with the highest standard deviations were excluded for both valence (keeping values less than 2) and arousal (up to 2.23). For low valence and high arousal pictures (LH), pictures with valence less than 4 and arousal greater than 6 were selected, resulting in 53 pictures. Out of these, 12 pictures with explicit violence and 2 redundant pictures were excluded. Pictures with high standard deviation for valence (greater than or equal to 2) and arousal (greater than 2.23) were also excluded. Once a participant arrived at the experiment setting, they were guided to sit in front of the computer screen. The general aim and procedure of the experiment were explained, and both verbal and written consent were obtained. Then, the electrodes to measure EDA and heart rate were attached. While the EDA and heart rate signals were verified for accurate recording, the baseline task and the emotional picture rating task were explained with examples, and participants' questions were answered. Once the participants were ready, a baseline task involving heartbeat perception was administered for about 5 min. The heartbeat perception task was performed 3 times, with intervals of 25, 35 and 45 s, respectively. When instructed, the subject was asked to listen to his or her heartbeats and count them, if possible. Participants were told to feel and listen to their heartbeat and advised not to measure the pulse physically, as the intention of the heartbeat perception task is not to evaluate the accuracy of the perceived heartbeat but to attain a baseline by promoting awareness of oneself [58]. After each round of the heartbeat perception task, the participants were requested to report the counted or estimated number of heartbeats. After the baseline task, the 34-min emotional picture experiment began with the screen message "Get ready to rate the next slide" shown for 5 s to initiate the picture-rating task. A total of 96 pictures were shown in random order. Each picture was shown for 6 s, and then the participant was asked to rate valence and arousal using the SAM rating. After 10 s, an empty screen with a + sign was presented for 5 s to prepare for the next picture. After the experiment, participants were directed to a short relaxing video clip to avoid any negative effects of the emotional pictures.

3.3 Data Collection and Processing

The data collected during the emotional picture experiment were the EDA and ECG signal data, a participant's self-rating of each IAPS picture (valence and arousal), the IAPS picture ID and a UNIX timestamp. The IAPS reference arousal and valence rating values, along with the standard deviations for each IAPS picture, were provided by the Center for the Study of Emotion and Attention and later synchronized with the sensor data using the UNIX timestamp.
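The processing applied to these raw signals, described in the following paragraph (z-score normalization of EDA, R-peak detection to obtain inter-beat intervals and the instantaneous heart rate, and time-domain HRV), can be sketched as follows. This is an illustrative reconstruction with numpy/scipy, not the project's code; the peak-detection parameters and the placeholder signals are assumptions.

```python
# Illustrative reconstruction (not the project's code) of the signal processing
# described below: z-score normalization of EDA, inter-beat intervals (IBI) and
# instantaneous heart rate (IHR) from ECG R-peaks, and a time-domain HRV feature.
import numpy as np
from scipy.signal import find_peaks

fs = 1000  # sampling rate in Hz, as in the experiment

def zscore(x):
    return (x - np.mean(x)) / np.std(x)

def ihr_and_rmssd(ecg):
    # R-peak detection; the distance and height parameters are assumptions.
    peaks, _ = find_peaks(ecg, distance=int(0.4 * fs), height=np.mean(ecg))
    ibi = np.diff(peaks) / fs                     # inter-beat intervals in seconds
    ihr = 60.0 / ibi                              # instantaneous heart rate in bpm
    rmssd = np.sqrt(np.mean(np.diff(ibi) ** 2))   # time-domain HRV feature
    return ihr, rmssd

eda = np.random.randn(60 * fs)                    # placeholder signals for illustration
ecg = np.random.randn(60 * fs)
eda_norm = zscore(eda)
ihr, rmssd = ihr_and_rmssd(ecg)
```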


EDA and ECG sensor data were recorded using a wearable sensor, the BITalino (r)evolution Plugged Kit BT 107, which is suitable for explorative research due to its low cost and raw data acquisition functionality. Both EDA and ECG raw sensor data were sampled and stored through a 10-bit channel (ADC) and converted to μSiemens for EDA and mV for ECG using the transfer functions stated in the respective data sheets. EDA values were normalized using the z-score function (subtracting the mean value and dividing by the standard deviation) to facilitate comparison among different participants. In addition, the EDA signals were split into a slowly changing tonic component (SCL, skin conductance level) and a rapidly changing phasic component (SCR, skin conductance response), as these components are correlated with mental processes [25]. From the SCR pulses, which are characterized by offset and peak time as well as by their amplitude, the existence of a pulse within the picture viewing time and its latency serve as indicators for emotional states. In the absence of stimuli, SCR pulses can be aggregated to provide a frequency value (NSSCR, non-specific skin conductance response). Regarding cardiovascular activity, heart rate variability, which indicates changes in RR intervals, is a well-known indicator of the state of a person's autonomic nervous system [26], and it has been identified as a possible indicator for emotions [27]. Therefore, from the ECG signal we determined the time between consecutive peaks, also known as the RR interval or inter-beat interval (IBI), and calculated the reciprocal of the RR intervals (instantaneous heart rate, IHR). Since HRV can be analyzed in either the time or the frequency domain, we used features from both domains in our study. Specifically, as [28] stated that frequency domain analysis of HRV on data sets of 2 to 5 min can reveal the balance between sympathetic and vagal activity, we aimed to explore different features based on HRV.

3.4 Qualitative and Quantitative Analysis

On the processed data set, we based our analysis on three hypotheses derived from the literature to validate the relationship between emotion and sensor data. The first hypothesis states that skin conductance is related to arousal [13, 14, 18, 21, 29]: a link between SCR and the intensity of emotion has been reported [29], and a larger change in EDA was shown during passive emotional states [18]. In addition, [21] verified that the level of skin conductance peaks when a participant's perception of the stimulus is in a high arousal state. The second hypothesis is that heart rate (magnitude and acceleration) is related to valence [17, 23]: [17] reported an association between the change of heart rate and the valence of emotion, and [18] specifically reported that negative emotion resulted in a faster heart rate acceleration compared to the acceleration during happiness. [13] extended this logic by indicating that the heart rate is in an accelerative state during negative emotional states, and [20] recounted that heart rate shows a modest relationship with the valence self-rating. Lastly, the third hypothesis to verify is the relationship of positive emotion with both high EDA and high heart rate, and of negative emotion with high EDA and a depressed heart rate [30]. [30] reported that extreme pleasantness is characterized by

7 https://bitalino.com.


a high value of SCR and a high heart rate, whereas extreme unpleasantness is characterized by high SCR with a depressed heart rate. The results of our qualitative investigation supported the relationship between EDA and arousal: specifically, an accelerated EDA is associated with high intensity of emotion and a decelerated EDA with low intensity. Building on these first findings from the qualitative analysis, we pursued the statistical significance of the result by applying quantitative methods to our data set. To examine the effect of arousal level on the EDA gradient, we set up the null hypotheses of equal variances and identical means of the EDA change for high arousal and low arousal stimuli. To test the difference in variance, an F-test was conducted, and we found a significant difference between the variances of the EDA gradient in the high arousal and low arousal conditions (P < 0.05, F(2.0629) > Fcrit(1.1334)) (Table 1).

Table 1. F-test results between high arousal EDA gradient and low arousal EDA gradient.
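The variance and mean comparisons reported in Tables 1 and 2 can be reproduced with standard tools. A minimal sketch with scipy follows; the two samples of per-stimulus EDA gradients are placeholders, not the experiment's values.

```python
# Minimal sketch (placeholder samples): F-test on the variances and an
# independent-samples t-test on EDA gradients for high- vs. low-arousal stimuli.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
high = rng.normal(0.0, 5.3e-5, size=500)   # placeholder high-arousal gradients
low = rng.normal(0.0, 3.7e-5, size=511)    # placeholder low-arousal gradients

f_stat = np.var(high, ddof=1) / np.var(low, ddof=1)
dfn, dfd = len(high) - 1, len(low) - 1
p_f = 2 * min(stats.f.sf(f_stat, dfn, dfd), stats.f.cdf(f_stat, dfn, dfd))  # two-sided

t_stat, p_t = stats.ttest_ind(high, low)   # independent two-sample t-test
print(f_stat, p_f, t_stat, p_t)
```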

Furthermore, there was a significant difference in the EDA gradient between the high arousal stimulus condition (M = 2.1533 · 10−6, SD = 5.3111 · 10−5) and the low arousal stimulus condition (M = 7.6525 · 10−6, SD = 3.6978 · 10−5), with t(1009) = 2.89, p = 0.0303 (α < 0.05) (Table 2). As our next step towards finding more possible indicators of academic emotions based on sensor data, we explored a machine learning approach to construct a general model that can be applied regardless of individual differences.

3.5 Machine Learning

In the literature, there are many promising approaches which use machine learning to classify biometric signals and predict emotions. To give a few examples, Conati and colleagues [31] applied a machine learning approach to EDA and EMG8 sensor data. Fernando and colleagues [32] used ECG-based features such as HRV and specifically focused on EDA sensors by investigating different methods of feature extraction. They used the circumplex model of affect [20] to classify emotions defined by valence and

8 Electromyography.


Table 2. T-test results between high arousal EDA gradient and low arousal EDA gradient.

arousal. Using EDA sensor data, they achieved accuracy rates of 81.81% and 89.29% for arousal and valence, respectively. In our experiment, the selected 96 IAPS pictures were shown to each participant in a random order. The pictures were equally distributed over the four classes HH, HL, LH and LL (high valence high arousal, high valence low arousal, low valence high arousal, low valence low arousal). To calculate features for each picture, we used five functions over time: EDA values, normalized EDA values (z-score), instantaneous heart rate (IHR) values, normalized heart rate values (z-score), and heart rate variability referring to a 120 s interval. All features referring to EDA and IHR were calculated over an interval of 6 s, corresponding to the time a participant viewed the picture, and over an interval of 22 s, from the start of a picture to the start of the next picture. For EDA, EDA z-score, IHR and IHR z-score, we calculated the following values: mean, minimum, maximum, first and last value, gradient, standard deviation, variance, kurtosis, moments, skewness and zero-crossings [34]. Together with 10 HRV features, 5 in the time domain and 5 in the frequency domain, each picture had more than fifty highly homogeneous features. Our first approach was to verify results from Ayata and colleagues [33], who derived features from the EDA signal for classification. With Weka [47] as an environment for machine learning, we used Support Vector Machine (SVM), J48 decision tree and Random Forest as classifiers, as well as Auto-Weka, which applies dimension reduction and attempts to find salient hyper-parameters for various algorithms. The results obtained by Auto-Weka show that the Random Forest algorithm [35] provides the best accuracy. In particular, an accuracy of 90% was achieved when analyzing EDA data from the HTW experiment (20 participants), despite not using the features from Empirical Mode Decomposition suggested by [33]. Additionally, J48 and SVM provided comparable accuracy, although the values were lower than those obtained by the Random Forest classifier.
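The study itself used Weka and Auto-Weka; an equivalent workflow can be sketched in Python with scikit-learn. The per-segment aggregates below mirror the feature names listed above, but the placeholder signals, label generation and hyper-parameters are assumptions, not the project's actual setup.

```python
# Illustrative sketch with scikit-learn (the study itself used Weka/Auto-Weka):
# aggregate per-picture features from EDA/IHR segments and classify the four
# valence/arousal quadrants (HH, HL, LH, LL) with a Random Forest.
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def segment_features(eda_seg, ihr_seg):
    feats = []
    for x in (eda_seg, ihr_seg):
        feats += [x.mean(), x.min(), x.max(), x[0], x[-1],
                  np.polyfit(np.arange(len(x)), x, 1)[0],  # gradient (linear trend)
                  x.std(), x.var(), kurtosis(x), skew(x)]
    return feats

rng = np.random.default_rng(0)
X = np.array([segment_features(rng.normal(size=6000), rng.normal(size=6000))
              for _ in range(200)])                  # placeholder picture segments
y = rng.choice(["HH", "HL", "LH", "LL"], size=200)   # placeholder class labels

clf = RandomForestClassifier(n_estimators=300, random_state=0)
print(cross_val_score(clf, X, y, cv=10).mean())      # 10-fold cross-validation
```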


We also evaluated each resulting model with a series of cross validations on all sensor data, specifically 10-fold cross validation and k−1 validation, where k is the number of participants. Under these validations, the accuracy dropped to 50% for both valence and arousal, which indicates an overfitting problem: the algorithm may "learn" the data of each participant instead of generating a general model from the data. This may be caused by the size of our HTW dataset (20 participants, 96 pictures per participant), yet a similar outcome was observed when we extended our analysis to the data of all 70 participants. Auto-Weka constructs a model which fits each participant well (precision better than 90%), but it fails to construct a general model even with the larger data set. To address this issue, we investigated feature reduction methods and used another machine learning environment, AutoSKLearn. Feature reduction describes dimensionality reduction, for example using Principal Component Analysis (PCA), which reduces the feature set without losing much information [36]. Dimensionality reduction can be used to mitigate overfitting or to reduce the training time of the machine learning algorithm by combining highly correlated features. Auto-Weka uses its own method to reduce the feature dimension, called the "attribute evaluator", which identifies relevant features for each classifier. In addition to using Auto-Weka as a machine learning environment for data analysis, we used AutoSKLearn [35]. Before using AutoSKLearn, we used a random forest classification algorithm [37] to find the most relevant features. For both EDA and IHR, the most relevant features refer to the longer interval between two pictures, including mean value, standard deviation, gradient, difference between minimum and maximum value, kurtosis and skewness. Considering literature findings which indicate a correlation between HRV values and emotional states [38], we calculated HRV over a 120 s interval and selected time-domain features such as sdsd, rmssd, pnn20 and pnn50. We also generated frequency-domain features of HRV, as a study shows that rmssd in the low and high frequency bands is highly correlated with valence [39]. As a result, 19 features were selected for further investigation. The extended analysis with the reduced feature set on the 4-class problem (HH, HL, LH and LL) was still unsatisfactory. Auto-Weka regarded all features as relevant and produced good results (both precision and recall between 90% and 100%) only when all participants were used for training. This yields a good predictor for known participants, but not a general predictor for emotions. If a participant is excluded from training, precision and recall were as low as 25% to 30%. A possible explanation is the observation that the algorithms used by Auto-Weka, Lazy k-Star [40] and AdaBoost [41], tend to learn feature values if there is high entropy. Even after normalization using the z-score function, EDA and IHR values seem to be closely tied to individual persons, which means that the data set is probably too small to obtain generalizable results. Similar results were obtained with AutoSKLearn [37] and Google AutoML; specifically, Google AutoML resulted in an accuracy of 25–26% for the 4-class problem. However, the results became better when reducing the problem to two classes, high valence and low valence, using AutoSKLearn.
AutoSKLearn achieved a precision of 63% when 80% of the data were used for training and 20% for testing.
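The participant-wise validation that exposed this overfitting (training on all participants but one and testing on the held-out participant) corresponds to leave-one-group-out cross-validation. A minimal sketch with scikit-learn follows; the feature matrix, labels and participant IDs are placeholders.

```python
# Minimal sketch (placeholder data): leave-one-participant-out validation, the
# "k-1" scheme described above, using participant IDs as the grouping variable.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(10 * 96, 19))                  # 19 reduced features per picture
y = rng.choice(["HH", "HL", "LH", "LL"], size=10 * 96)
groups = np.repeat(np.arange(10), 96)               # 10 placeholder participants

scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         groups=groups, cv=LeaveOneGroupOut())
print(scores.mean())   # accuracy averaged over held-out participants
```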


AutoSKLearn used FastICA for preprocessing and Gradient Boosting as a classifier [42]. An interesting observation from the 4-class problem was that the true-positive rate of LH (valence/arousal) is significantly higher than for the three other classes (HH: 15%, HL: 19%, LL: 27%, LH: 45%). According to the circumplex model of emotion [20], the class LH contains strong negative emotions, like anger, fear and disgust. In a learning context, LH could be an indicator of a stressful learning situation. In order to specifically detect LH emotions from our sensor data, we investigated the 2-class problem of LH against the other states (HH, HL, LL) using cross validation with 5 splits. The results were promising, with an accuracy above 75%. AutoSKLearn used a Random Forest classifier and Kernel PCA with a polynomial kernel [35] as preprocessing.

3.6 Fuzzy Logic Approach

As our next step, we adopted the Fuzzy Logic approach [15], as some expert knowledge can be formulated in a human-like manner as a set of simple rules to generate a general model. In our case, rules like "IF eda_gradient IS high THEN valence IS low" associate sensor values with a dimension of emotion. In many cases, such rules are insufficient for a quantitative approach, as they are stated as relational statements without numerical values. For example, the association between stress and the combination of EDA peak height and instantaneous heart rate [43] cannot be applied directly in a quantitative analysis. However, since Fuzzy Logic allows ambiguity, associations without concrete values can still be analyzed. As we investigated emotion using IAPS pictures, various literature findings could serve as expert rules; for instance, emotions with a valence of 6 or above can be described as happiness or contentment [20], which can be transferred into a simple rule in a Fuzzy Logic system. To apply these rules in Fuzzy Logic, the fuzzy membership functions were defined based on the scatterplot or histogram of the collected data. The boundaries of each range (e.g. low, mid, high) were set using the mean and standard deviation of the sample (Fig. 3). The membership functions then transform a specific element into a percentage membership in the set of values. The fuzzy logic system weighs each input signal, defines the overlap between the levels of input, and determines an output response. Domain knowledge is modelled as a set of IF/THEN rules which use the input membership values as weighting factors to determine their influence on the fuzzy solution sets. Once the functions are inferred, scaled and combined, they are de-fuzzified, i.e. translated into a solution variable, which is a scalar output [44]. Based on the fuzzy logic approach presented in [15], we started out with a fuzzy logic model for assessing arousal from the EDA gradient. We first customized the membership function according to our data set. The mean of the EDA gradient was 0.004925482 with a standard deviation of 0.030947348; therefore, three boundaries were set based on the shape of our data set as follows:


Fig. 3. Fuzzy membership function based on histogram.

TERM low := (-0.29527918, 1) (0.004925482, 0);
TERM mid := trian 0.004925482 0.026021866 0.03587283;
TERM high := (0.004925482, 0) (0.03587283, 1);

As our qualitative and quantitative approaches confirmed the relation between the EDA gradient and the arousal level, the following three rule sets were used to de-fuzzify (translate) the level of the EDA gradient into the arousal level.
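To make the mechanics concrete, here is an illustrative, hand-rolled sketch (not the project's implementation) that encodes the three EDA-gradient terms defined above together with three assumed rules of the form "IF eda_gradient IS low/mid/high THEN arousal IS low/mid/high", using centroid de-fuzzification onto an assumed 1–9 arousal scale:

```python
import numpy as np

def ramp_down(x, a, b):   # membership 1 at or below a, falling linearly to 0 at b
    return np.clip((b - x) / (b - a), 0.0, 1.0)

def ramp_up(x, a, b):     # membership 0 at or below a, rising linearly to 1 at b
    return np.clip((x - a) / (b - a), 0.0, 1.0)

def trian(x, a, b, c):    # triangular membership with peak at b
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def eda_terms(g):         # boundaries copied from the TERM definitions above
    return {"low":  ramp_down(g, -0.29527918, 0.004925482),
            "mid":  trian(g, 0.004925482, 0.026021866, 0.03587283),
            "high": ramp_up(g, 0.004925482, 0.03587283)}

def arousal_from_eda_gradient(g, universe=np.linspace(1, 9, 200)):
    inp = eda_terms(g)
    # Output sets for arousal on an assumed 1-9 scale (shapes are illustrative).
    out = {"low":  ramp_down(universe, 1, 5),
           "mid":  trian(universe, 3, 5, 7),
           "high": ramp_up(universe, 5, 9)}
    # Mamdani-style inference: clip each output set by its rule's firing strength,
    # aggregate with max, then take the centroid as the crisp arousal value.
    agg = np.zeros_like(universe)
    for term in ("low", "mid", "high"):
        agg = np.maximum(agg, np.minimum(inp[term], out[term]))
    return float((agg * universe).sum() / agg.sum())

print(arousal_from_eda_gradient(0.03))   # a steep EDA gradient yields a high arousal estimate
```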

4 Support for Self-regulated Learning

Our various attempts at correlating physiological data with emotional states offer insights for enhancing the role of a learning support system, which could, in due time, provide learners with context-aware, personally adaptive recommendations. Similar to human teachers and other tutoring staff, a machine-learning-integrated system would observe and evaluate learners' performance and recommend alternatives for improvement. However, unlike a human teacher, an ML-integrated system may not succeed in inferring students' behaviours, as it depends on vast amounts of student data in which individuality concerning culture, physiology and genetics is lost [45]. In addition, most systems are developed to support cognitive achievement for a fixed group of people and are therefore limited in supporting learners across a wide spectrum of fields and groups. For instance, most intelligent tutoring systems are developed to support STEM disciplines for students from elementary to high school and thus lack generality when applied to learners of other ages and fields. Various tutoring systems have been developed to teach a programming language [51, 52]. MetaTutor, an intelligent tutoring system developed to promote self-regulated learning, focuses on cognitive and metacognitive support in the context of biology [50]. As emotional and social support is highlighted for its supportive role in learning, various studies investigate affective, emotional learning support systems [53, 54]. Unlike the cognitive domain, emotions have unique individual components with less dependency on a specific group, domain or age. Thus, exploring an emotion detection and


support system may bring more generalizable results. Even though emotions may not be easy to distinguish clearly from one another (e.g. tiredness vs. boredom or sadness), for a learning support system even a less accurate ML model may still reveal hidden characteristics of students [46, 48, 49] and support learners sufficiently. In addition, compared to a fixed rule-based system, ML could enhance the ways learners interact with a system by moving computer–human interaction (CHI) toward human–human interaction (HHI). Among the various attempts at learning support systems, in our studies we have investigated three systems: 1) a sensor-based adaptive learning support system called SmartMonitor, 2) a mobile sensor-based learning companion called Charlie and 3) a user-friendly sensor-based adaptive learning support system called LISA SmartMonitor. All three systems aim to provide the learner with a system which acts as a learning companion with human-like characteristics. Compared to the traditional tutoring system, learning companions emphasize the relational interaction with learners, giving learners more agency over the management of their own learning. Furthermore, a learning companion aims to accompany a learner in their learning journey regardless of the learning domain (e.g. math, music, history, sport); its purpose is to provide learners with awareness of their learning status, to allow learners to reflect on their previous learning and to help them persist in learning by setting a learning goal. Based on the design considerations of a mobile sensor-based learning companion [54], our systems are designed to show learners an overview of their learning goal, physiological sensor data and environmental data, with varying emphasis on usability. SmartMonitor (Fig. 4) shows the learner's physiological sensor data and the air quality level along with the goal set by the learner. Additionally, the learner is provided with a volitional control strategy with a matching emoji.

Fig. 4. SmartMonitor – a sensor-based learning support system.

With Charlie (Fig. 5), learners are provided with a recommendation of which goal was fulfilled most successfully, along with the ideal time and environment to learn in (e.g. brightness, noise level and location). Additionally, learners can reflect on their level of distraction while studying with Charlie.


Fig. 5. Charlie – a mobile sensor-based learning companion.

Lastly, with the LISA SmartMonitor, learners are accompanied by metaphor-based visuals, based on [54, 55], which gently alert learners to control their learning (Fig. 6).

Fig. 6. LISA SmartMonitor – a user-friendly sensor-based learning support system.

Incorporating different fields (e.g. computer science, education and design), our endeavour aims to provide user-friendly support for self-regulated learning. Specifically, Charlie was developed for a mobile device and aims to provide awareness of the learning environment while promoting self-regulated learning (e.g. goal setting, advice on request, reflection and planning using recommendations). Learners with various characteristics and from various fields can interact with Charlie, as Charlie assists in the goal-setting process and lets learners evaluate the goals they have set. The system provides a history of learners' past learning experience as a log, recording 1) the set goal, 2) the goal evaluation in percentage, 3) the brightness of the learning place, 4) the noise level of the learning place and 5) the location of the learning place. For instance, a learner who wants to read a history book for a class can set a goal following the SMART rule, i.e. specific, measurable, attainable, realistic and time-bound (e.g. read the history book, pages 243–260, for 40 min). Charlie will present the set goal with detailed information on the learning space (e.g. brightness: shiny, noise: quiet). If a learner is distracted, they can tap Charlie and get some additional support, which includes volitional control strategies, motivational quotes and humour (e.g. count from 10 to 1 backward and try to refocus).
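As a purely illustrative sketch (the field names below are ours, not the app's actual data model), one entry of the learning log described above could be represented as follows:

```python
from dataclasses import dataclass

@dataclass
class LearningLogEntry:
    goal: str                 # SMART goal, e.g. "read history book, pages 243-260, 40 min"
    goal_evaluation_pct: int  # learner's evaluation of goal fulfilment, 0-100
    brightness: str           # e.g. "shiny", "dim"
    noise_level: str          # e.g. "quiet", "noisy"
    location: str             # e.g. "library", "home desk"

entry = LearningLogEntry("read history book, pages 243-260, 40 min", 80, "shiny", "quiet", "library")
```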


Furthermore, to enhance the human–computer interface by tackling these design limitations, LISA SmartMonitor adopts a metaphor-based design to support learners' reflection on their learning.

5 Conclusion and Future Consideration

Within the LISA project, we have investigated physiological sensor data which are relevant in a learning environment. Based on the theoretical background and on feasibility (e.g. a learner wearing a device), we chose skin conductance and cardiovascular activity to relate to learning indicators. As emotion is critical to observe and support in a learning context, and as it is less domain-dependent, we theoretically mapped emotions onto measurable constructs (valence and arousal). Using an emotional picture experiment, we collected physiological data from learners at a higher education institution and then processed the raw data into various features. We used multiple analysis methods (qualitative, quantitative, machine learning and fuzzy logic) to test hypotheses and found that the intensity of emotion is significantly related to the gradient of EDA, that distinguishing positive from negative emotion using AutoSKLearn is promising (63%), and that detecting stressful emotion (low valence with high arousal) among other emotions was most successful with AutoSKLearn using cross-validation with 5 splits (75%).

Research on relating physiological data to emotion is time-consuming and laborious, yet many studies conclude simply by reporting the results. Our research subsequently investigates various support systems for self-regulated learning by exploring the literature on learning companions and the computer–human interface. We have proposed three approaches to designing a companion with subtle differences in aims (e.g. different platforms, focus on design), yet we have maintained our focus on providing a companion-like support system for learners by following the design considerations. Our work is not yet conclusive but it is viable, and our future work will entail the implementation of an emotion detection model using the findings (stressful emotion detection as a model, distinguishing negative from positive emotion, and the relation between the EDA signal and arousal) in a learning support system. In parallel, we aim to pursue an interdisciplinary support system that provides learners with support for self-regulation by investigating user-friendly design through prototyping, a focus group study, interviews and observation. We will then conduct a user study to benefit learners in their self-regulation.

References

1. Zimmerman, B.J.: Becoming a self-regulated learner: an overview. Theory Pract. 41(2), 64–70 (2002)
2. Azevedo, R., Taub, M., Mudrick, N.V., Millar, G.C., Bradbury, A.E., Price, M.J.: Using data visualizations to foster emotion regulation during self-regulated learning with advanced learning technologies. In: Buder, J., Hesse, F.W. (eds.) Informational Environments, pp. 225–247. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64274-1_10
3. Mohr, D.C., Zhang, M., Schueller, S.M.: Personal sensing: understanding mental health using ubiquitous sensors and machine learning. Annu. Rev. Clin. Psychol. 13, 23–47 (2017)


4. Yun, H., Fortenbacher, A., Pinkwart, N.: Improving a mobile learning companion for selfregulated learning using sensors. In: Proceedings of the 9th International Conference on Computer Supported Education, CSEDU 2017, vol. 1 (2017) 5. Calvo, R.A., D’Mello, S., Gratch, J., Kappas, A. (eds.): The Oxford Handbook of Affective Computing. Oxford University Press, Oxford (2015) 6. Canzian, L., Musolesi, M.: Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. In: Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 1293–1304. ACM (2015) 7. Kreibig, S.D., Gendolla, G.H., Scherer, K.R.: Goal relevance and goal conduciveness appraisals lead to differential autonomic reactivity in emotional responding to performance feedback. Biol. Psychol. 91(3), 365–375 (2012) 8. Pecchinenda, A.: The affective significance of skin conductance activity during a difficult problem-solving task. Cogn. Emot. 10(5), 481–504 (1996) 9. Tomaka, J., Blascovich, J., Kelsey, R.M., Leitten, C.L.: Subjective, physiological, and behavioral effects of threat and challenge appraisal. J. Pers. Soc. Psychol. 65(2), 248 (1993) 10. D’Mello, S.K.: A selective meta-analysis on the relative incidence of discrete affective states during learning with technology. J. Educ. Psychol. 105, 1082–1099 (2013) 11. Fairclough, S.H., Venables, L., Tattersall, A.: The influence of task demand and learning on the psychophysiological response. Int. J. Psychophysiol. 56(2), 171–184 (2005) 12. Bradley, M.M., Lang, P.J.: Motivation and emotion. In: Cacioppo, J., Tssinary, L.G., Berntson, G.G. (eds.) Handbook of Psychophysiology, Chap. 25, pp. 581–607. Oxford University Press, New York (2007) 13. Levenson, R.W., Ekman, P., Friesen, W.V.: Voluntary facial action generates emotion-specific autonomic nervous system activity. Psychophysiology 27(4), 363–384 (1990) 14. Cacioppo, J.T., Berntson, G.G., Larsen, J.T., Poehlmann, K.M., Ito, T.A., et al.: The psychophysiology of emotion. In: Handbook of Emotions, vol, 2, pp. 173–191 (2000) 15. Mandryk, R.L., Atkins, M.S.: A fuzzy physiological approach for continuously modeling emotion during interaction with play technologies. Int. J. Hum Comput Stud. 65(4), 329–347 (2007) 16. Vrana, S.R., Cuthbert, B.N., Lang, P.J.: Fear imagery and text processing. Psychophysiology 23(3), 247–253 (1986) 17. Libby Jr., W.L., Lacey, B.C., Lacey, J.I.: Pupillary and cardiac activity during visual attention. Psychophysiology 10(3), 270–294 (1973) 18. Ekman, P., Levenson, R.W., Friesen, W.V.: Autonomic nervous system activity distinguishes among emotions. Science 221(4616), 1208–1210 (1983) 19. Chanel, G., Mühl, C.: Connecting brains and bodies: applying physiological computing to support social interaction. Interact. Comput. 27(5), 534–550 (2015) 20. Lang, P.J.: The emotion probe: studies of motivation and attention. Am. Psychol. 50(5), 372 (1995) 21. Picard, R.W., Vyzas, E., Healey, J.: Toward machine emotional intelligence: analysis of affective physiological state. IEEE Trans. Pattern Anal. Mach. Intell. 23(10), 1175–1191 (2001) 22. Bradley, M.M., Lang, P.J.: The international affective picture system (IAPS) in the study of emotion and attention. In: Coan, J.A., Allen, J.J.B. (eds.) Handbook of Emotion Elicitation and Assessment, Chap. 29, pp. 29–46. Oxford University Press, New York (2007) 23. 
Lang, P.J., Bradley, M.M., Cuthbert, B.N.: International affective picture system (IAPS): Affective ratings of pictures and instruction manual. Technical report A-8. University of Florida, Gainesville, FL (2008) 24. Russell, J.A.: A circumplex model of affect. J. Pers. Soc. Psychol. 39(6), 1161 (1980)


25. Boucsein, W.: Electrodermal Activity. Springer, New York (2012). https://doi.org/10.1007/ 978-1-4614-1126-0 26. Camm, A., et al.: Heart rate variability: standards of measurement, physiological interpretation and clinical use. Task force of the European society of cardiology and the North American society of pacing and electrophysiology. Circulation 93(5), 1043–1065 (1996) 27. Gruber, J., Mennin, D.S., Fields, A., Purcell, A., Murray, G.: Heart rate variability as a potential indicator of positive valence system disturbance: a proof of concept investigation. Int. J. Psychophysiol. 98(2), 240–248 (2015) 28. Heathers, J., Goodwin, M.: Dead science in live psychology: a case study from heart rate variability (HRV) (2017) 29. Lanzetta, J.T., Cartwright-Smith, J., Eleck, R.E.: Effects of nonverbal dissimulation on emotional experience and autonomic arousal. J. Pers. Soc. Psychol. 33(3), 354 (1976) 30. Winton, W.M., Putnam, L.E., Krauss, R.M.: Facial and autonomic manifestations of the dimensional structure of emotion. J. Exp. Soc. Psychol. 20(3), 195–216 (1984) 31. Conati, C., Chabbal, R., Maclaren, H.: A study on using biometric sensors for monitoring user emotions in educational games. Technical report (2018) 32. Ferdinando, H., Seppänen, T., Alasaarela, E.: Emotion recognition using neighborhood components analysis and ECG/HRV-based features. In: De Marsico, M., di Baja, G.S., Fred, A. (eds.) ICPRAM 2017. LNCS, vol. 10857, pp. 99–113. Springer, Cham (2018). https://doi. org/10.1007/978-3-319-93647-5_6 33. Ayata, D.D., Yaslan, Y., Kama¸sak, M.: Emotion recognition via galvanic skin response: comparison of machine learning algorithms and feature extraction methods. Istanbul Univ.-J. Electr. Electr. Eng. 17(1), 3147–3156 (2017) 34. Minhad, K., Hamid Md Ali, S., Reaz, M.: A design framework for human emotion recognition using electrocardiogram and skin conductance response signals. J. Eng. Sci. Technol. 12(11), 3102–3119 (2017) 35. Schölkopf, B., Burges, C.J., Smola, A.J. (eds.): Advances in Kernel Methods: Support Vector Learning, pp. 327–352. MIT Press, Cambridge (1999) 36. Sammut, C., Webb, G.I.: Encyclopedia of Machine Learning and Data Mining, pp. 314–315. Springer, Heidelberg (2017) 37. Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., Hutter, F.: Efficient and robust automated machine learning. In: Advances in Neural Information Processing Systems, pp. 2962–2970 (2015) 38. Kreibig, S.D.: Autonomic nervous system activity in emotion: a review. Biol. Psychol. 84(3), 394–421 (2010) 39. Scheibe, S., Fortenbacher, A.: Heart Rate Variability alsIndikatorfür den emotionalen Zustand eines Lernenden. In: Proceedings der Pre-Conference-Workshops der 17. E-Learning FachtagungInformatik co-located with 17th e-Learning Conference of the German Computer Society (DeLFI 2019) (2019) 40. Cleary, J.G., Trigg, L.E.: K*: an instance-based learner using an entropic distance measure. In: Machine Learning Proceedings 1995, pp. 108–114. Morgan Kaufmann (1995) 41. Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: ICML, vol. 96, pp. 148–156 (1996) 42. Hyvärinen, A., Oja, E.: Independent component analysis: algorithms and applications. Neural Netw. 13(4–5), 411–430 (2000) 43. Stez, C., Anrich, B., Schumm, J., Marca, R., Troster, G., Elhlert, U.: Discriminating stress from cognitive load using a wearable EDA. IEEE Trans. Inf Technol. Biomed. 14(2), 410–417 (2010) 44. Cox, E.: Fuzzy fundamentals. IEEE Spectr. 29(10), 58–61 (1992) 45. 
Woolf, B.P.: Building Intelligent Interactive Tutors: Student-Centered Strategies for Revolutionizing E-Learning. p. 225. Morgan Kaufmann (2010)


46. Gertner, A.S., VanLehn, K.: Andes: a coached problem solving environment for physics. In: Gauthier, G., Frasson, C., VanLehn, K. (eds.) ITS 2000. LNCS, vol. 1839, pp. 133–142. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45108-0_17 47. Witten, I.H., Frank, E., Hall, M.A., Pal, C.J.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, Burlington (2016) 48. Arroyo, I., Beck, J.E., Woolf, B.P., Beal, C.R., Schultz, K.: Macroadapting animalwatch to gender and cognitive differences with respect to hint interactivity and symbolism. In: Gauthier, G., Frasson, C., VanLehn, K. (eds.) ITS 2000. LNCS, vol. 1839, pp. 574–583. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45108-0_61 49. Johns, J., Woolf, B.: A dynamic mixture model to detect student motivation and proficiency. In: Proceedings of the Twenty-First National Conference on Artificial Intelligence, pp. 2–8. AAAI Press, Boston (2006) 50. Azevedo, R., Witherspoon, A., Chauncey, A., Burkett, C., Fike, A.: MetaTutor: a Meta Cognitive tool for enhancing self-regulated learning. In: 2009 AAAI Fall Symposium Series (2009) 51. Koedinger, K.R., Aleven, V.A.W.M.M., Heffernan, N.: Toward a rapid development environment for cognitive tutors. In: Artificial Intelligence in Education: Shaping the Future of Learning through Intelligent Technologies, Proceedings of AI-ED, pp. 455–457 (2003) 52. Aleven, V., McLaren, B.M., Sewall, J., Koedinger, K.R.: The cognitive tutor authoring tools (CTAT): preliminary evaluation of efficiency gains. In: Ikeda, M., Ashley, K.D., Chan, T.-W. (eds.) ITS 2006. LNCS, vol. 4053, pp. 61–70. Springer, Heidelberg (2006). https://doi.org/ 10.1007/11774303_7 53. Lallé, S., Conati, C., Azevedo, R.: Prediction of student achievement goals and emotion valence during interaction with pedagogical agents. In: Proceedings of the 17th International Conference on Autonomous Agents and Multi Agent Systems, pp. 1222–1231. International Foundation for Autonomous Agents and Multiagent Systems (2018) 54. McDuff, D., Karlson, A., Kapoor, A., Roseway, A., Czerwinski, M.: AffectAura: an intelligent system for emotional memory. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 849–858. ACM (2012) 55. Cernea, D., Weber, C., Ebert, A., Kerren, A.: Emotion-prints: interaction-driven emotion visualization on multi-touch interfaces. In: Visualization and Data Analysis 2015, vol. 9397, p. 93970A. International Society for Optics and Photonics (2015) 56. Silber-Varod, V., Eshet-Alkalai, Y., Geri, N.: Tracing research trends of 21st -century learning skills. Br. J. Educ. Technol. 50, 3099–3118 (2019) 57. Yun, H., Fortenbacher, A., Helbig, R., Pinkwart, N.: In search of learning indicators: a study on sensor data and IAPS emotional pictures. In: Proceedings of the 11th International Conference on Computer Supported Education (CSEDU 2019) (2019) 58. Schandry, R.: Heart beat perception and emotional experience. Psychophysiology 18(4), 483– 488 (1981). https://doi.org/10.1111/j.1469-8986.1981.tb02486.x

Separating the Disciplinary, Application and Reasoning Dimensions of Learning: The Power of Technology-Based Assessment

Gyöngyvér Molnár1(B) and Benő Csapó2

1 University of Szeged, Petőfi S. sgt. 30-34, Szeged 6726, Hungary
[email protected]
2 MTA–SZTE Research Group on the Development of Competencies, Petőfi S. sgt. 30-34, Szeged 6726, Hungary

Abstract. The aim of this study was to show how technology-based assessment can support personalized learning. The paper outlines the theoretical foundations and realizations of an online assessment system, eDia, which was designed to provide students and teachers with regular feedback from the beginning of schooling to the end of the six years of primary education. The three-dimensional theoretical model of knowledge separates the reasoning, application and content aspects of learning. The eDia system contains almost 20,000 innovative (multimedia-supported) tasks in reading, mathematics and science, developed in the three-dimensional approach. The sample for the experimental study was drawn from first- to sixth-grade students (aged 7 to 12) in Hungarian primary schools. There were 505 classes from 134 schools (N = 10,737) in the sample. Results empirically confirmed that: (1) technology-based assessment can be used to make students' learning visible; (2) it is meaningful to separate the three dimensions of learning, which proved to be highly correlated, yet different constructs; and, finally, (3) the item banks in the eDia system are well structured and fit the knowledge level of first- to sixth-graders in all three main domains of learning.

Keywords: Technology-based assessment · Assessment for learning · Item banking

1 Introduction

Assessment and feedback play a central role in successful learning [1, 2]. The idea of using them to make learning visible and initiate evidence-based education was emphasized by John Hattie. He synthesized the results of over 800 meta-analyses encompassing 52,637 studies and concluded that the diversity among students, the limited capacity of teachers and the need to provide proper feedback to promote students' cognitive development are among the most challenging tasks in education [3]. Such a difficult assessment problem cannot be solved with traditional assessment instruments, whereas information and communications technology (ICT) may be of use. The present paper


introduces the theoretical foundations and realization of such an online assessment system, eDia, which was designed for this purpose. The eDia system makes the most essential processes of learning and cognitive development visible by providing students and teachers with regular feedback in the three main domains of education – reading, mathematics and science – through technology-based diagnostic assessment from the beginning of schooling to the end of the six years of primary education. The system was planned and developed by the Centre for Research on Learning and Instruction, University of Szeged. The eDia system is an integrated, learning-centred online assessment system that supports all assessment processes from theory-based item bank development through technology-based test administration and IRT-based data analyses to an easy-to-use and well-interpretable feedback module. In this paper, we outline the theoretical foundations and the practical realization of the eDia system. First, we describe the assessment frameworks and show how they were used in writing items and building an item bank. Then we demonstrate how the three dimensions of learning, the reasoning, application and disciplinary aspects of knowledge, can be assessed and made visible by using technology-based assessment in everyday educational practice and how students’ reasoning skills, the applicability of their knowledge and their disciplinary knowledge are related from the perspective of reading, mathematics and science. We summarize how technology-based assessment and the different types of feedback can support personalized learning by providing detailed feedback for teachers on their students’ cognitive development.

2 Theoretical Frameworks: A Three-Dimensional Model of Learning

Based on experiences in framework development in international assessment projects and taking a number of theoretical considerations and empirical results into account, we proposed a three-dimensional model of learning (Fig. 1). The model and approach assume that in modern societies these three dimensions of learning should not be mutually exclusive and that they should not compete for teaching time; instead, they should be present simultaneously. They should reinforce and interact with each other in school education.

Fig. 1. The three-dimensional model of learning [4].

The three-dimensional model of knowledge has been developed with a particular focus on the psychological processes of learning and cognitive development that are


less visible in everyday teaching (see e.g. [4–7]) in the context of learning in three main domains, reading, mathematics and science. The framework for reading [8, 9] was somewhat different from those for mathematics [10, 11] and science [12, 13], which were more similar to each other.

2.1 The Thinking (Psychological) Dimension

The first dimension of knowledge, the thinking dimension, is the most universal across cultures and school systems and thus the most culture-fair dimension of knowledge. There have been several attempts to assess the thinking dimension of knowledge in international large-scale assessment programmes, e.g. when PISA chose problem solving three times as a fourth, innovative domain out of the seven data collection cycles up to 2019.

Fig. 2. A Grade 6-level scientific reasoning task.

According to the eDia framework, “the psychological dimension of knowledge not only contains ‘domain-specific reasoning skills’, but also general reasoning skills embedded in different content and contexts, which has lately been referred to as transversal skills, and is not the same as procedural knowledge. We assume that there are natural cognitive developmental (psychological) processes” [14], the developmental level of which at the beginning of schooling determines later success [15]. This approach opens the door to enhancing domain-general reasoning skills in a domain-specific context. The tasks presented in Figs. 2 and 3 combine scientific knowledge about living systems and mathematical knowledge about geometry with the assessment of students’ inductive reasoning skills. In both of the tasks, students need to discover both similarities and differences in the attributes of individual objects and to group them based on a set of


similarities or differences they need to take into account. According to Klauer's definition [16] of inductive reasoning, students must use the operation of cross-classification in these items.

Fig. 3. A Grade 1-level mathematics reasoning task.

2.2 Application Dimension

Around the turn of the millennium, another prominent large-scale assessment programme was launched by the OECD, the Programme for International Student Assessment (PISA). The PISA frameworks have shifted the focus from the disciplinary to the application dimension of knowledge [17], which is often known as literacy, and defined the competencies students need in a modern society. As an impact of PISA, the usability of knowledge acquired in school outside the school context has become the main issue, in contrast to the previous dominance of disciplinary knowledge. At the elementary level, problems become more realistic when everyday observations and experiences come to play an active role in the problem-solving process [18]. The problems are embedded in relevant situations, illustrated by pictures which can be manipulated. In both of the tasks presented in Figs. 4 and 5, students can interact with the problem environment using online technology. The task in Fig. 4 encompasses an additional important feature of an authentic problem beyond the real-life-like context; namely, several solutions are possible. During the scoring procedure, it does not matter which of the carrots are placed – dragged and dropped – on the other plate. It is only the number of carrots that counts. All of the combinations are accepted. The task measures skill-level addition up to 10 in a realistic application context.


Fig. 4. Grade 5-level scientific application task.

Fig. 5. Grade 1-level mathematics application task.


2.3 The Disciplinary Dimension

In the history of education, the most traditional and the most commonly known is the disciplinary dimension of knowledge, which is often called subject matter or content knowledge, described in school curricula and presented in textbooks. The first prominent international assessment programme in the 1970s, the Trends in International Mathematics and Science Study (TIMSS), focused mostly on this dimension of knowledge as a type of curriculum- and textbook-oriented summative assessment. This kind of knowledge is assessed in the eDia diagnostic system via tasks measuring the acquisition of concepts and procedures which are part of the curriculum. Figure 6 presents an example, in which pupils' disciplinary scientific knowledge is assessed in the area of Earth science, and Fig. 7 illustrates a task measuring pupils' mathematics disciplinary knowledge in the area of numeracy [for more examples, see 14 and 18].

Fig. 6. A Grade 5-level scientific disciplinary task (Elements of the drop-down list: parent river, tributary, mouth, watershed, drainage basin).

2.4 Building an Item Bank to Assess the Three Dimensions of Learning

Almost 20 years after the millennium, it is no longer debated that technology-based assessment has become mainstream compared to traditional testing. The use of technology has strongly improved the efficiency of testing procedures and offered the possibility


Fig. 7. A mathematics disciplinary task for Grade 1.

to re-think the purpose of assessment. Realizing efficient and reliable testing is no longer an issue. Two new directions have appeared and questions arisen: (1) how can we use assessments for personalized learning? Specifically, how can we use assessments to make learning visible to help teachers tailor education to individual students’ needs? And (2) how can contextual information gathered beyond the response data (e.g. time on task and repetition) be used and contribute to providing more sophisticated feedback to learners and teachers instead of using single indicators, such as a test score? [18]. The development and scope of the eDia system fit this issue and the re-thinking of the assessment process. The primary aim of the system is to provide diagnostic feedback, based on the three-dimensional model of knowledge described in the previous section, with objective reference points for teachers on their students’ development in the domains of reading, mathematics and science from the beginning of schooling to the end of the six years of primary education. It allows a more authentic and realistic technology-based testing environment than traditional assessments with all the attendant advantages. The creation of an assessment system designed for personalized learning required not only the development of software for low-stakes technology-based assessment optimized for large-scale assessment (for more detailed information, see [18, 19]) and a hardware infrastructure, but also the development of an item bank with tens of thousands of empirically scaled items.


Based on the three-dimensional model of learning described in the previous section, we constructed an item bank for diagnostic assessments containing over 20,000 innovative (multimedia-supported), empirically scaled tasks in the domains of reading, mathematics and science. To prevent reading difficulties and ensure the validity of the results in the first to third grades, instructions were provided both in written form and online by a pre-recorded voice (see the loudspeaker icons in Figs. 3, 4, 5 and 7). Thus, students in Grades 1 to 3 used headphones during the administration of the tests, and because of the multimedia elements it is suggested that headphones also be used in Grades 4 to 6. At present, the system is used in more than 1000 elementary schools (approx. one-third of the primary schools in Hungary; see [20]). In these schools, eDia makes learning visible by providing students and teachers with regular feedback on their knowledge level – among other areas – in the domains of reading, mathematics and science based on the three-dimensional model.

3 Applying the eDia System in Everyday School Practice to Monitor Dimensions of Learning Separately

3.1 Aims

The objectives of the study were fourfold. First, we examined the applicability of an online diagnostic assessment system in regular school practice. We then empirically validated the three-dimensional model of learning outcomes based on research results collected among first- to sixth-graders using eDia, the Hungarian online diagnostic assessment system. Based on the results of the dimensionality analyses, we tested the appropriateness of the item bank and ran the scaling procedure to be able to place the items in a given domain on one single scale and answer the research question of how the application, psychological and disciplinary dimensions of mathematics, reading and science develop over time from Grades 1 to 6. Finally, we examined the relationship between the three dimensions of learning in each of the domains. To do this, we used 1639 items out of 91,000 to ascertain the applicability of online diagnostic assessments in regular educational practice and empirically validate the three-dimensional theoretical model of learning introduced above.

3.2 Methods

The sample was drawn from first- to sixth-grade students (aged 7–13) in Hungarian primary schools (N = 10,896; see Table 1). School classes formed sampling units. 505 classes from 134 schools in different regions were involved in the study, and thus students with a wide-ranging distribution of background variables took part in the data collection. The online tests were administered as diagnostic assessments during regular school hours using the eDia system. The assessment thus took place in the schools' ICT labs, using the available school infrastructure (mostly desktop computers) within the participating Hungarian schools. It was supervised by the teacher, who had been thoroughly trained in test administration. Teachers had the option to allow their students to take the tests within a six-week period of time. Schools participated in the programme voluntarily. The proportion of boys and girls was about equal.

Table 1. The study sample (based on [18]).

Grade    R      M      S      Generally   Age [mean (SD)]
1        722    720    496    1030        7.8 (.58)
2        1049   1049   678    1351        8.8 (.61)
3        1240   1287   852    1762        9.8 (.62)
4        1580   1598   879    2148        10.8 (.60)
5        1798   1941   1587   2476        11.8 (.60)
6        1617   1535   1488   2129        12.9 (.59)
Total    8006   8130   5980   10896

Note: R: reading; M: mathematics; S: science.

Test completion in a single domain lasted no more than 45 min (one school lesson) and consisted of 50–55 items for lower graders and 60–85 items for higher graders. Each test contained tasks from the three learning dimensions and, for vertical scaling, tasks which were originally developed for students in both lower and higher grades. The instruments formed a part of the whole test battery; they consisted of 1639 items (543 items for reading, 604 items for mathematics and 492 items for science) developed in the three-dimensional approach to learning. To prevent reading difficulties, instructions were provided online using a pre-recorded voice in tasks developed for 1st- to 3rd-graders. Children used a mouse or keyboard to indicate their answer. Time-on-task and achievement data were analysed to test the applicability of the online assessment system.

We conducted confirmatory factor analyses (CFA) within structural equation modelling (SEM; [21]) to test the underlying measurement model, i.e. to ascertain whether the data fit the hypothesized three-dimensional theory of knowledge in the three main domains and in the three dimensions of learning – application (literacy), thinking (reasoning) and disciplinary knowledge – or whether the one-dimensional model is valid, with all three dimensions combined under one general factor. The Rasch model was used to run the vertical and horizontal scaling of the data and to draw the three-dimensional item-person maps for mathematics, reading and science. Bivariate correlations and partial correlations were employed to test the relations between the three dimensions of knowledge in the three main domains of learning.

3.3 Results

According to the time-on-task data (Table 2), the online diagnostic tests proved to be applicable during regular school hours even at the very beginning of schooling, using the school infrastructure without any modern touch screen technology. Generally, students spent less time completing tasks in the reasoning dimension of knowledge in the fields of mathematics and reading than they did in the other two dimensions of learning. This tendency could not be observed for science, where students spent most of the time completing tasks in the reasoning dimension of learning.

Table 2. Time-on-test data in the three dimensions of learning in reading, mathematics and science, respectively. Values are mean (SD) in seconds.

Dimension of learning   Mathematics   Reading      Science
Test level              1972 (824)    1926 (853)   1214 (485)
Application dim.         659 (443)     674 (522)    392 (241)
Disciplinary dim.        714 (485)     670 (455)    347 (219)
Reasoning dim.           598 (598)     581 (368)    474 (293)

The three-dimensional measurement model for each domain and in each grade showed a good model fit (see Table 3). The Comparative Fit Index (CFI) generally varied between .91 and .97, the Tucker–Lewis Index (TLI) values were above .90, and the Root Mean Square Error of Approximation (RMSEA) in most cases was below .06, indicating a good global model fit. According to the χ2-difference tests, the three-dimensional model fit significantly better than the one-dimensional model in each domain and in each grade, indicating that the three dimensions of learning can be distinguished empirically, independent of domain and grade.

Table 3. Goodness of fit indices for testing the three-dimensional model of learning in reading, mathematics and science for Grades 1 to 6.

Grade   Reading                Mathematics            Science
        CFI    TLI    RMSEA    CFI    TLI    RMSEA    CFI    TLI    RMSEA
1       .947   .941   .057     .953   .948   .077     .921   .915   .050
2       .975   .973   .032     .944   .939   .061     .944   .939   .038
3       .833   .818   .054     .923   .912   .046     .924   .918   .111
4       .937   .932   .066     .940   .933   .060     .939   .930   .060
5       .911   .908   .035     .939   .931   .060     .938   .933   .040
6       .970   .967   .037     .912   .906   .054     .934   .928   .048
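For readers less familiar with these indices, the sketch below shows one conventional way (not necessarily the exact estimator used by the authors' software) of deriving CFI, TLI and RMSEA from the chi-square statistics of the fitted model and of the baseline (null) model; the numbers passed in are placeholders, not values from this study.

```python
import math

def fit_indices(chi2_m, df_m, chi2_0, df_0, n):
    """chi2_m/df_m: fitted model; chi2_0/df_0: baseline model; n: sample size."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_m - df_m, chi2_0 - df_0, 1e-12)   # guard against division by zero
    cfi = 1.0 - num / den
    tli = ((chi2_0 / df_0) - (chi2_m / df_m)) / ((chi2_0 / df_0) - 1.0)
    rmsea = math.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))
    return cfi, tli, rmsea

print(fit_indices(chi2_m=250.0, df_m=180, chi2_0=3000.0, df_0=210, n=1030))
```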

The three-dimensional item-person maps (Figs. 8, 9 and 10) show the match between the item difficulty distribution and the distribution of students' three-dimensional Rasch-scaled achievement estimates for reading, mathematics and science. They also show the distribution of the items, which is in line with the knowledge level of the students. The 'x's represent students' Rasch-scaled achievement (blue: in the application dimension; red: in the reasoning dimension; and green: in the disciplinary dimension). The numbers represent the items in the item bank (colouring is the same as it is for students). The item bank proved to be appropriate for measuring students' learning outcomes from Grades 1 to 6. It contains very easy, very difficult and average items as well, and there are no difficulty gaps on the scale. However, there are some noticeable differences


Fig. 8. A three-dimensional item-person map in the domain of reading (each ‘x’ represents 63 students). (Color figure online)

in the comparison of the item-person maps across the three main domains of learning: in the distribution of students' abilities, in the distributions of item difficulties and in the correspondence of the distributions of item difficulties to those of students' abilities. The student-level distributions are more similar for reading and science, and there are much greater differences in the domain of mathematics. While there are difficult items in the reading item bank, the number of difficult items must be increased for precise assessments. Similarly, while there are easy items in the mathematics item bank, the number of easy items must be increased for precise


Fig. 9. A three-dimensional item-person map in the domain of mathematics (each ‘x’ represents 47 students). (Color figure online)

assessments. The distribution of the items in the three dimensions indicates that further items must be developed in the disciplinary dimension of each learning domain if we place the items on a single scale. Most of the items measuring disciplinary knowledge can be categorised as easy or difficult. Generally, the 1639 items extracted from the eDia system item bank are well structured and fit the knowledge level of first- to sixth-graders in all three main domains of learning. The application (blue signs) and reasoning (red signs) items were well-matched to the sample (‘x’ and number are parallel), but some


Fig. 10. A three-dimensional item-person map in the domain of science (each ‘x’ represents 60 students). (Color figure online)

average-level disciplinary items (green signs) were missing from the tests. Further study is needed to test the behaviour of the whole item bank.

The bivariate correlations between the three dimensions of learning were higher and proved to be more similar for mathematics and reading (see Fig. 11), ranging from .57 to .63, while they were significantly weaker for science, ranging from .52 to .54. Partial correlations were significantly lower, as all bivariate relationships were influenced by the third construct (see Fig. 11). Between the application and disciplinary dimensions of learning, the partial correlation varied from .36 to .45; between the application and reasoning dimensions, it varied from .31 to .38; and, finally, between the reasoning and disciplinary dimensions, we saw the most similar relationship, ranging from .33 to .36.

Fig. 11. Relations between the three dimensions of learning in the domains of reading, mathematics and science. (Solid lines indicate bivariate correlations; dotted lines represent partial correlations. All coefficients are significant at the p < .001 level.)

We assumed that disciplinary knowledge and reasoning predict performance in the application dimension of mathematics, reading and science, since we need that dimension of learning most in everyday life. Thus, we regressed MA, RA and SA on (1) MD, RD and SD and (2) MR, RR and SR, respectively, and estimated the proportion of variance explained. The results showed that the disciplinary and reasoning dimensions of learning explained performance in the application dimension of learning at a moderate to high level (71%, 54% and 73%), but with different effects (see Fig. 12). The residuals of measures of the disciplinary and reasoning dimensions were still correlated at a moderate level (r = .36, .36 and .33), indicating common aspects of (1) MD and MR, (2) RD and RR, and (3) SD and SR that are separable from (4) MA, RA and SA, respectively. Each of the models fits well (CFI = 1.000, TLI = 1.000, RMSEA = .000).

Fig. 12. A structural model of reading, mathematical and scientific knowledge: disciplinary and reasoning dimensions of knowledge as predictors for the application dimension (*p < .01).
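For readers who want to reproduce this kind of analysis, the sketch below (not the authors' code; the correlation values are hypothetical, merely in the range reported above) shows how a first-order partial correlation and the proportion of variance explained by two standardized predictors follow directly from the bivariate correlations:

```python
import numpy as np

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with the third construct z partialled out."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical bivariate correlations for one domain: application (A),
# disciplinary (D) and reasoning (R).
r_AD, r_AR, r_DR = 0.63, 0.60, 0.57

print(partial_r(r_AD, r_AR, r_DR))   # application-disciplinary, reasoning partialled out

# R^2 when the application score is regressed on the standardized disciplinary
# and reasoning scores, derived from the same correlation matrix.
R = np.array([[1.0, r_DR], [r_DR, 1.0]])  # predictor inter-correlation matrix
r = np.array([r_AD, r_AR])                # predictor-outcome correlations
beta = np.linalg.solve(R, r)              # standardized regression weights
print(float(r @ beta))                    # proportion of variance explained
```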

To sum up, our results have shown that the reasoning, application and disciplinary dimensions of knowledge are highly correlated constructs, though not identical. Students’ levels of disciplinary knowledge and thinking skills strongly influence and predict achievement in the context of mathematical and scientific application and do so on a moderate level in the context of reading. That is, it is necessary to make learning visible in all of the three dimensions of learning to be able to provide proper feedback for students and teachers on their teaching and learning efficacy.


4 Conclusions

In the present paper, we have explored the possibilities of using online diagnostic assessment in an educational context to make three different aspects of learning visible. We introduced the theoretical foundations and realizations of an online assessment system, the eDia, which was designed to provide students and teachers with regular feedback in three main domains of education – reading, mathematics and science – from the beginning of schooling to the end of the six years of primary education.

The three-dimensional frameworks were created on the basis of theoretical assumptions derived from research results on cognitive development and expectations of schooling in modern societies, on the one hand, and taking into account the experiences of framework development for large-scale educational assessments, on the other. The three-dimensional model of learning outcomes, separating reasoning, application and disciplinary aspects of knowledge, was mapped into item banks, and technology was used to implement regular diagnostic assessments in everyday educational practice. Thus, differentiated, scientifically established, regular assessment can support personalized learning and teaching by providing detailed feedback for teachers on their students' cognitive development.

In the present analyses, we have empirically confirmed the theoretical model on which the frameworks are based and have shown the importance of separating the three dimensions of learning. The dimensions identified have proved to be highly correlated, yet different constructs. Therefore, students may be supported with targeted, personalized interventions in the domains and dimensions where they are most lagging behind. Beyond confirming the three-dimensional model of learning outcomes, we have used item-person maps to show that the item bank for the eDia system is appropriate for measuring students' cognitive development in the first six years, with sufficient items for assessing students at different levels of development. With its item bank, the eDia system both supports evidence-based practice and paves the way for data-based (assessment-based) instruction.

The current analyses were carried out with data from the first phases of operation of the eDia system. As the results presented in this study have confirmed the validity of the three-dimensional approach, the cognitive orientation of the dimensions can be further strengthened. Those items that fit least in a given dimension can be improved or removed, while further items can be created similarly to the best fitting items, as well as new items for difficulty levels less well covered by items. As more schools join the regular assessments, the analyses introduced in this study may later be replicated with larger samples and item pools to further confirm the validity of the three-dimensional model.

Further research is needed to investigate the effectiveness of three-dimensional feedback in educational practice. Teachers need time to become familiar with the content of testing, the meaning of the separated dimensions, and the message the data delivers to them concerning the development of their pupils. They also have to learn how the deficiencies identified by the diagnostic assessments can be treated. A further relevant research question is how teachers can solve the problems of differentiated instruction on their own and what kind of support they need to improve the effectiveness of their


teaching. As easily applicable, inexpensive, valid and reliable assessments are available, well-controlled intervention studies may be launched to explore the potential of differentiated learning.

Funding. This study was funded by OTKA K115497 and EFOP 3.4.3.


Behind the Shoulders of Bebras Teams: Analyzing How They Interact with the Platform to Solve Tasks

Carlo Bellettini, Violetta Lonati, Mattia Monga, and Anna Morpurgo

Università degli Studi di Milano, Milan, Italy
{bellettini,lonati,monga,morpurgo}@di.unimi.it
https://aladdin.di.unimi.it

Abstract. In the Italian Bebras Challenge on Informatics and Computational Thinking, an interactive online platform is used to display attractive tasks to teams of pupils and to evaluate the answers they submit. We instrumented the platform in order to also collect data concerning the interactions of pupils with the system. We analyzed these data according to a multidimensional model used to describe such interaction, and collected many overall statistics on the problem-solving process and the teams' behavior. The fine-grained data we logged were also useful to analyze how the teams engaged with specific tasks which we had designed as easy but which turned out to be difficult. By looking at the data, we were able to explain the unexpected difficulties, nailing down what had distracted or confused many solvers.

Keywords: Computing education · Computational thinking · Informatics contests · Learning analytics · Problem-solving process

1 Introduction

The Bebras International Challenge on Informatics and Computational Thinking (http://bebras.org/) [7,9,11] targets a wide audience of pupils, with the goal of getting them acquainted with the fundamental concepts of informatics. The challenge has been organized on an annual basis in several countries since 2004, with almost 3 million participants from 54 countries in the 2018 edition. In most countries the contest is played online: participants have to solve a set of about 10–15 tasks that are designed to be fun and attractive, but also suggestive of a significant piece of informatics or computational thinking. The tasks are not designed to check specific knowledge or a curricular skill. In fact, in many countries informatics is not part of the curriculum for non-vocational studies, and Bebras tries to offer non-specialists the opportunity to get in touch with a realistic informatics activity, by avoiding the use of jargon or other technicalities (for example, the need for programming languages).



The tasks should be challenging enough to keep the engagement of pupils high, but not so difficult as to discourage them. It is not easy to predict the difficulty of tasks [2,13–15] or the way pupils will approach them. However, since Bebras tasks are more and more used as the starting point for educational activities carried out by teachers during their school practice [6,8,12], it is important to understand the difficulties they pose to contestants, at least a posteriori: thus, most of the platforms used to deliver the challenge can be used to collect overall statistics and analyze the results in terms of success ratio [1,16].

We wanted, however, to be able to follow the whole process of problem solving: how much time do pupils spend in reading the questions? Is the time spent in reading correlated with a successful answer? Many tasks are interactive, indeed sophisticated [5], i.e., they present open-answer questions and may require complex, combined answers. In this case the process pupils use to (try to) solve a task deserves careful observation with both qualitative and quantitative techniques. If the tasks were proposed in the classroom or under the control of an experimenter, smart environments or digital tangibles could be exploited to collect data on the interactions going on in the group, see for example [4]. We instrumented our contest online platform [3] in order to track significant event data during the contest: the selection of a different task, and any occasion when a user inserts input or interacts with an active part of the screen (text areas, buttons, menus, draggable objects, etc.).

In this paper, we recall the multidimensional model we proposed for describing the interaction of pupils in their problem-solving process [3], and we discuss some of the possibilities of observation that these fine-grained data open up. In particular, we report on two specific studies we carried out on two tasks proposed during the last Bebras edition. We had expected these tasks to be easy, but the success ratio turned out to be surprisingly low. We used the collected data to understand how the pupils engaged with the tasks: which strategies they used in trying to solve them, and where they stopped or failed to recognize an erroneous answer.

The paper is organized as follows. In Sect. 2 we recollect the model underlying the data our online platform can record; in Sect. 3 we describe the Bebras context in which we applied it; in Sect. 4 we show which kind of broad analyses the model enables; in Sect. 5 we present two tasks we decided to examine in depth; in Sect. 6 we analyze the data collected in order to understand the strategies used in solving those two specific tasks; finally, in Sect. 7 we draw some conclusions.

2 A Model for Interaction

In this section we sketch the model we designed [3] to capture how pupils interact with our Bebras contest system. Another recent paper uses a similar approach to analyze the “work habits” of students engaged in open-ended programming assignments [10]. We chose some measures and indicators that we found useful for framing the problem solving activity of pupils in our contest. While our multidimensional model has been designed by taking into consideration our experience with the Bebras Challenge, we believe it applies as well to other learning environments which provide interactive activities.


Our basic assumptions are that pupils:

– are engaged in a sequence of tasks that can be addressed in any order;
– always have the possibility to go back to tasks;
– always have the possibility to change answers given earlier;
– may get feedback, if provided by the task;
– have a fixed limited time to complete the tasks.

In particular, the model considers two aspects of the pupils' behavior: the engagement (e.g., the pupil stays for a long or short time on a specific task, once only or revisiting it) and the interaction mode (e.g., the pupil reads/watches/thinks or acts). These aspects are complex and cannot be directly observed and measured by a unique variable. Instead we introduce a pool of simple indicators that are clearly related to engagement and interaction mode: by considering these indicators together we get a model of the pupils' interaction behavior. In order to do that, we distinguish between three levels of activity on a task at a given time.

Level 0 – no activity: another task is displayed.
Level 1 – reading/watching/thinking: the task is displayed but there is no action, hence the pupil is reading the text, or watching the included images and diagrams, or thinking about the task.
Level 2 – acting: the pupil is inserting or changing the answer of the task, or is doing an action that gets a feedback from the system.

To describe the interaction of a pupil on any single task, we use the following indicators:

1. initialReadingTime: time spent on the task before the first interaction;
2. firstSessionTime: length of the first session spent on the task;
3. totalTime: total time spent on the task;
4. displaySessions: number (possibly 0) of read/watch/think sessions, i.e., without action;
5. dataSessions: number (possibly 0) of sessions with some action;
6. actionTime: total time spent acting on the task;
7. feedback: number of actions that got feedback from the system.

For instance, a pupil that uses a trial-and-error approach without any reflection would show a short initialReadingTime but a high actionTime and feedback. On the other hand, a significant initialReadingTime with a high displaySessions and a low dataSessions may describe the behavior of a careful, motivated pupil that thinks a lot about the solutions, goes back and checks them, but does not modify them often.
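To make the indicators concrete, the following sketch shows how they could be derived from a per-team event log. It is only an illustration of the semantics described above: the Event fields, the session-splitting rules and the 10-s slot granularity are assumptions made for the example, not the actual schema or code of the contest platform.

```python
from dataclasses import dataclass

@dataclass
class Event:
    # Hypothetical logged event: a timestamp, the task involved and the kind of interaction.
    t: float      # seconds since the start of the contest
    task: str     # e.g. "T01"
    kind: str     # "enter", "action", "feedback" or "leave"

def indicators_for_task(events, task, slot=10, min_session=5):
    """Compute the seven per-task indicators sketched above (assumed semantics)."""
    ev = [e for e in events if e.task == task]
    # split the trace into sessions delimited by enter/leave events
    sessions, current = [], None
    for e in ev:
        if e.kind == "enter":
            current = [e]
        elif current is not None:
            current.append(e)
            if e.kind == "leave":
                sessions.append(current)
                current = None
    # ignore sessions shorter than min_session seconds (quick clicks on the side-bar)
    sessions = [s for s in sessions if s[-1].t - s[0].t >= min_session]
    if not sessions:
        return None
    acts = [e for s in sessions for e in s if e.kind in ("action", "feedback")]
    first = sessions[0]
    first_action = min((e.t for e in acts), default=first[-1].t)
    return {
        "initialReadingTime": first_action - first[0].t,
        "firstSessionTime": first[-1].t - first[0].t,
        "totalTime": sum(s[-1].t - s[0].t for s in sessions),
        "displaySessions": sum(1 for s in sessions
                               if not any(e.kind in ("action", "feedback") for e in s)),
        "dataSessions": sum(1 for s in sessions
                            if any(e.kind in ("action", "feedback") for e in s)),
        # actionTime: 10-s slots containing at least one action (see Sect. 4)
        "actionTime": len({int(e.t // slot) for e in acts}) * slot,
        "feedback": sum(1 for e in acts if e.kind == "feedback"),
    }

trace = [Event(0, "T01", "enter"), Event(22, "T01", "action"),
         Event(35, "T01", "feedback"), Event(93, "T01", "leave"),
         Event(400, "T01", "enter"), Event(430, "T01", "leave")]
print(indicators_for_task(trace, "T01"))
```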

3 The Bebras Challenge in Italy

In Italy the Bebras challenge is organized by ALaDDIn (https://aladdin.di.unimi.it), a working group of our Computer Science Department born with the aim of changing how non-specialist audiences perceive informatics and reforming how the discipline is presented in


schools. In November 2018, 51,634 pupils (22,550 females, 29,084 males) took part in the online challenge (full statistics are available, in Italian, at https://bebras.it/1819/Statistiche+gare.html). They participated in 15,738 teams of 3–4 pupils, divided into five age groups: KiloBebras (grades 4–5, ages ≈ 9–10), MegaBebras (grades 6–7, ages ≈ 11–12), GigaBebras (grade 8, age ≈ 13), TeraBebras (grades 9–10, ages ≈ 14–15), PetaBebras (grades 11–13, ages ≈ 16–18). The challenge proposed twelve tasks, to be solved in 45 min. The set of tasks was different for each age group, but some tasks were repeated in two or three sets.

The teams access a web platform that presents the tasks to be solved. There are different types of questions: multiple choice, open answer with a number/text box, drag-and-drop, interactive, and so on. Occasionally some automatic feedback is provided, especially by interactive tasks (for instance when clicking on a "simulation" button). The web application that presents the tasks to be solved was designed as a multifunctional system to support all phases of the competition: task editing and participants' registration and training before the contest; task administration, monitoring, and data collection during the contest; scoring, access to solutions, and production of participation certificates after the contest. For a detailed description of the architecture and implementation of the system, see [1].

Each task is designed to occupy exactly the full screen, no matter which device is used (see Fig. 1); in the side-bar, active zones with numbers allow contestants to move among tasks, and a task can be entered as many times as wished; we say that, each time a task is entered, a new session on the task starts. Answers can be changed at any moment, since they are submitted for evaluation only when either the contestant ends the contest or the allowed time is over. Moreover, it is possible to insert a partial answer and complete it in a later session, since tasks that have already been displayed appear exactly as they were when the last session ended.

Fig. 1. A screenshot of the Italian Bebras platform during the contest. This specific task was designed by the Belgian Bebras organizers, as indicated by the flag in the top-right corner. This is an interactive task in which the solvers must give the answer by re-ordering the buttons in the bottom-right corner of the screen.



The contest system was designed to record several pieces of information, mostly needed to support ordinary operations related to the contest. In particular, before the contest the system collects some data about the composition of teams (age, gender, and grade of each member) and their school (geographical data, number of participating teams, number of teachers involved in Bebras). While a team is taking part in the contest, the system stores the current state of each task, determined by the data currently inserted by the team and relevant to compute the score gained in the task. When the allowed time ends, the current state of each task is considered final and recorded as the submitted answer for the task. We instrumented our system in order to log data about many significant events besides the submitted answers: all events that allow a team to select and display a different task, by clicking on the numbers or arrows in the side-bar; those pertaining to the insertion of (part of) an answer, e.g., by typing, selecting a multiple choice option, selecting an option in a scroll-bar menu, dragging and dropping an object, clicking on an active part of the screen, and so on; those to get feedback (e.g., by clicking on a simulation button). For each tracked event the system logs a timestamp, the type of event (enter a task, change the state of the task, get feedback, leave the task), and all changes in the state of the task, if any.
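As a rough illustration of this kind of instrumentation, the snippet below logs one event per interaction as a JSON line. The event names, fields and file format are our own assumptions for the example, not the actual implementation of the Bebras platform.

```python
import json
import time

def log_event(logfile, team_id, task_id, kind, state_change=None):
    """Append one tracked event: timestamp, event type and any change of the task state."""
    record = {
        "ts": time.time(),        # client-side clock, as noted for the real system
        "team": team_id,
        "task": task_id,
        "kind": kind,             # "enter", "change", "feedback" or "leave"
        "change": state_change,   # e.g. which object was dragged where
    }
    with open(logfile, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Example: a team enters task T04, drags the fish onto the second waterfall and leaves.
log_event("events.jsonl", "team-42", "T04", "enter")
log_event("events.jsonl", "team-42", "T04", "change", {"object": "fish", "target": "waterfall-2"})
log_event("events.jsonl", "team-42", "T04", "leave")
```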

4 Visualizing the Overall Behavior of a Team

In this section we present a tool that provides teachers with a visual representation of several data describing the interaction behavior of Bebras contestants with the system, also in comparison with the average behavior of contestants of the same age. The tool processes the event-tracking data collected by the Bebras platform described in Sect. 3. Some data are filtered out before processing due to either the inconsistency of the collected data, suspected cheating, or anomalies derived from technical issues that occurred during the contest. For instance, the cohort of primary school pupils (KiloBebras) that participated in the Italian Bebras contest in 2018 includes 3,770 teams (of 4 pupils each at most), among which 327 teams were filtered out for data inconsistency, mainly related to timestamps (the logging of events relies on the clocks of the computers used by contestants): we analyzed the data for 3,443 KiloBebras teams.

The tool displays diagrams in the dashboard that teachers use to manage teams' registration and to check scores and rankings: such diagrams illustrate the detailed behavior of any specific team both on the overall challenge and on each specific task, and present summary views of indicators for tasks and teams. The visual representation of the behavior of a team during its contest is given by a diagram as in Fig. 2, which depicts, for each task, the level of activity in each time slot of 10 s; when the team goes from one task to another, the plots of the two tasks overlap slightly. The legend also shows the score gained by the


Fig. 2. Time line of a team. The level of activity (0—no activity, 1—reading/thinking, 2—acting) for each task is shown in each time slot of 10 s. The last digits report the points awarded to the answer out of the maximum available. T01 and T04 are discussed in Sects. 6.2 and 6.1, respectively.

team in each task (for example, 4/4 means that all four points associated with that task were gained). In this particular case, the team tackled the tasks in the order they were presented, and inserted answers in the first session of each task (acting); on the first task ("Anthill scramble", a programming task that provided feedback, see Sect. 6.2) and the second one (the task shown in Fig. 1, whose solution had to be given by reordering the buttons) the team started to act almost immediately, and spent some time inserting/editing the right answer; on the following tasks, instead, the initial reading times were longer (as illustrated in the plot by the plateaus preceding the peaks), i.e., answers were inserted after some time devoted to reading and thinking about each task; after about 1,400 s from the start (more or less half the available time of 45 min) the team reached the last task. From that moment on, the team went back and forth through all tasks and displayed them repeatedly. In many cases the team stayed on a task for less than 10 s; this is most probably because they clicked on the task numbers in the side-bar in order to find a specific task they were searching for. On tasks T01, T02, and T06, instead, they spent some time and interacted with the system, possibly modifying their answers. In particular, they came back to question T06 again after around 2,100 s from the start; they probably discussed the task at length, but this time they did not change their answer (which indeed was correct).

The event-tracking data for a team are used to compute the measures described in the previous section, which summarize the behavior of that team on each task, see Table 1.

Table 1. Indicators for a team in each task.

      initialReadingTime  firstSessionTime  totalTime  displaySessions  dataSessions  actionTime  feedback
T01                   22                93        243                4             3          24        10
T02                   13                45        125                2             2           6         0
T03                   27                72         93                2             1          17         0
T04                   54               226        250                2             1           9         0
T05                   41                59        100                2             1           1         0
T06                   39                51        377                2             2           3         0
T07                   49                52        142                3             1           1         0
T08                   50                88        193                4             1           6         0
T09                  122               126        243                6             1           1         0
T10                  115               122        141                0             1           3         0
T11                   75               275        298                2             1          11         0
T12                  102               166        240                6             1           2         0

Table 2. Mean, standard deviation and five-number summary—over all contestants—of all indicators for a task (T01) [3].

      initialReadingTime  firstSessionTime  totalTime  displaySessions  dataSessions  actionTime  feedback
mean               47.38            244.38     348.33             0.42          1.67       54.95     11.64
std                25.68            133.35     198.36             0.73          0.91       41.96     13.10
min                    1                 2         10                0             0           0         0
25%                   33               155        202                0             1          25         4
50%                   42               221        299                0             1          42         8
75%                   56               309        448                1             2       71.25        15
max                  294             1,198      1,447                5             7         327       188

In particular, we compute the actionTime as follows: the allowed time is divided into brief time slots (10 s each); the number of time

slots where an action occurs is counted, and the resulting number is multiplied by the duration of the slots. Moreover, when counting sessions, we do not consider sessions that last less than 5 s, which occur mainly when the contestants are searching for a task and click quickly on the side-bar to find the desired one.

In order to place the behavior of a single team among other teams, we compute, for each task, the mean, variance, min, max, and quartiles—over all contestants—of each indicator, see Table 2. The general behavior of any team during the whole activity with respect to any indicator can be shortly described by the sum, over all tasks, of the rankings of the team compared to other teams. We call this sum the ranking index of the team for that indicator. For instance, if a team has usually devoted a long time to initial reading, resulting in a high ranking on many tasks with respect to this indicator, it will have a high ranking index for the initialReadingTime indicator. The set of ranking indices for a given team is visualized in a radar chart with a dimension for each indicator, see Fig. 3. In this particular case, for instance, the team shows a very short reading habit; moreover, as was evident also from the timeline (Fig. 2), the team had a tendency to display the given answers again (probably to check them). Radar diagrams can also be used to compare the behavior of different teams in the light of the data (possibly weighting the analytical data with their knowledge of the team members); for a commented example see [3].
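A minimal sketch of the ranking index computation follows; the data layout (a dictionary of indicator values per task and per team) is hypothetical, and ranks are counted simply as the number of teams with a strictly smaller value.

```python
def ranking_index(values_by_task, team, indicator):
    """Sum, over all tasks, of the rank of `team` for `indicator` among all teams.

    values_by_task: {task: {team: {indicator: value}}}  (assumed layout)
    """
    total = 0
    for per_team in values_by_task.values():
        if team not in per_team:
            continue
        mine = per_team[team][indicator]
        # rank = how many teams have a strictly smaller value on this task
        total += sum(1 for v in per_team.values() if v[indicator] < mine)
    return total

# Toy usage with two tasks and three teams
data = {
    "T01": {"a": {"initialReadingTime": 22}, "b": {"initialReadingTime": 47}, "c": {"initialReadingTime": 10}},
    "T02": {"a": {"initialReadingTime": 13}, "b": {"initialReadingTime": 30}, "c": {"initialReadingTime": 5}},
}
print(ranking_index(data, "b", "initialReadingTime"))  # b has the largest value on both tasks -> 4
```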


Fig. 3. Radar plot of the ranking indexes of a team, representing the general behavior of the team during the whole activity.

5 Two KiloBebras Tasks that Require Further Inspection

Bebras tasks are more and more used as the starting points for educational activities carried out by teachers during their school practice [6,8,12], and it is thus important to understand the challenges they pose and the way pupils tackle the tasks. In particular, we consider here the category KiloBebras (grades 4–5) in the last Italian edition of the Bebras challenge. The KiloBebras teams had to solve a suite of twelve tasks: four easy, four medium, and four hard ones. Of these, five were multiple choice and one required an open answer (a number). Of the other six, two required arranging objects into some requested order, two required dragging objects into the correct place, and for the other two the solution was built by (repeatedly) clicking on a number of buttons to select the right combination of instructions/colors that composed the solution. Moreover, one task (the programming one, see below) had a button to run a simulation, so it was possible to get feedback on the submitted solution.

We decided to analyze the collected data of two of these tasks, "Waterfalls" (T04 – drag an object into the correct place) and "Anthill scramble" (T01 – selecting the right combination of instructions by repeatedly clicking on a number of buttons, with feedback), for which the percentage of correct answers was unexpectedly low with respect to the foreseen difficulty of the task. We consider a task's success ratio as an objective a posteriori measure of the difficulty of the task; in particular, a success ratio of at least 70% indicates the


Fig. 4. Task T04 “Waterfalls”.

task can be considered easy, of around 50% that it is medium, and of at most 30% that it is a difficult task.

In the task "Waterfalls" (Bebras id: 2018-HU-03; the Italian version is shown in Fig. 4), objects thrown down waterfalls that merge into one river are carried by the water and pass under some bridges. Under each bridge an object is substituted by another object. The task asks to identify which object has to be thrown down which waterfall in order to get a log out at the end of the river. The choice is between two possible objects, a fish and a carrot, and three different waterfalls. In particular, the task asks to drag either the carrot or the fish into the correct position.

In the task "Anthill scramble" (Bebras id: 2018-AU-01; the Italian version is shown in Fig. 5), the request is to guide an echidna to the anthill while eating all red ants. Pupils are required to insert a sequence of commands (direction arrows) to start the echidna on its way along a path and then to take the right direction at every crossroad. The task is interactive in that it allows testing the sequence of commands given, by clicking on a button that highlights the actual path walked by the echidna, one stretch at a time. So it is possible to check the correctness of the answer and also to study the effect of each command or sequence of commands in order to understand their semantics.

We collected data on these tasks from 3,443 KiloBebras teams.


Fig. 5. Task T01 “Anthill scramble”.

The “Waterfalls” task had only 28% of correct answers, which surprised us as the task was supposed to be an easy one: it requires to trace which objects are substituted by which, but this must be done only for two objects (carrot and fish), substitutions happen for one object at a time, and at most three times along each path. The “Anthill scramble” task had 39% of correct answers. Given that the task has a simulation button which in fact allows pupils to check their answer, we expected a success rate around 60–70%, which is often the case for tasks with feedback.

6 A Deeper Analysis

We wanted to further examine the tasks T04 (“Waterfalls”) and T01 (“Anthill scramble”), in which the percentage of correct answers was unexpectedly low with respect to the foreseen difficulty of the task. We wondered if it would be possible, through the analysis of some data collected by the platform during the contest (see Sect. 3), to get a deeper understanding of the way a team had worked or (mis)understood the request that would explain at least part of the wrong answers. All changes in the given answer are recorded by our system, with their timestamp. So it is possible to deeply analyze the data to follow the whole problem


solving process applied by a team and possibly observe a specific problem-solving approach or specific difficulties in understanding or solving the task: when the task requires making a choice among different possibilities (i.e., testing their effectiveness), do they apply some kind of systematic search? For tasks for which an arrangement of components is needed, do they follow an incremental approach? Is the time spent in reading, or their working approach, correlated with a successful answer? Is it correlated with the time to complete the task? Did the team work toward the solution of the actual task or on a misreading?

6.1 Task 04 - Waterfalls

As already mentioned in the previous section, only 955 out of 3,443 teams (i.e., 28%) correctly solved this task. The correct answer is to drag the fish to the second waterfall. By looking at some of the cases, we noticed common patterns in interacting with the platform. We then looked more deeply into the data to cluster the teams into classes of behavior and determine their cardinality (the data are summarized in Table 3).

A. Teams who moved just one object once, giving the chosen solution after some reasoning time (on average two minutes), without other interaction with the platform. Of these, 377 gave the correct answer and 147 gave a wrong one.
B. Teams who initially moved the fish into the correct position, but then made other changes. Of these, 162 finished with the correct configuration and 521 with a wrong one.
C. The remaining ones, of which 416 answered correctly and 1,792 gave a wrong answer.

Table 3. Clustering of behaviors of solvers of "Waterfalls"; 28 teams did not answer to this question.

    Correct answer  % on total solvers  Wrong answer  % on total solvers
A              377               10.9%           147                4.3%
B              162                4.7%           521               15.1%
C              416               12.1%         1,792               52.0%
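The clustering itself can be expressed with a few rules over a team's chronological sequence of drag events; the sketch below is a simplified rendering of the criteria described above, with a hypothetical representation of the moves.

```python
CORRECT_MOVE = ("fish", "waterfall-2")   # the correct answer for "Waterfalls"

def classify(moves):
    """moves: chronological list of (object, target) drag events of one team."""
    if not moves:
        return None          # the team gave no answer
    if len(moves) == 1:
        return "A"           # one object moved once, then left untouched
    if moves[0] == CORRECT_MOVE:
        return "B"           # correct at first, but then changed again
    return "C"               # all remaining behaviors

print(classify([("fish", "waterfall-2")]))                              # A
print(classify([("fish", "waterfall-2"), ("carrot", "waterfall-1")]))   # B
print(classify([("carrot", "waterfall-3"), ("fish", "waterfall-2")]))   # C
```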

Class B attracted our attention because, although these teams had initially entered the right solution, 76% of them then submitted a wrong answer. In the following we graphically show and comment on the behavior of some teams. In the graphs, the orange lines show the movements of the carrot, the blue lines the movements of the fish, a green circle near the time column indicates that at that moment the solution was right, a yellow square indicates that the fish was correctly placed but together with a not useful positioning of the carrot


(also dragged to a waterfall), and the dashed lines show the times at which the team exited the task.

Figure 6 shows the behavior of a team that reasoned for less than a minute before moving the fish to the second waterfall (blue line), exited the session (dashed line) after checking the solution for almost another minute (we have no tracking of non-platform-mediated activities, so we cannot further comment on "non action" phases), and was confident about its solution, as it never revisited the task.

Fig. 6. Task T04: a team with behavior of type A. (Color figure online)

Figure 7 shows instead the behavior of a team that, after having put the fish in the correct position, moved also the carrot to a waterfall, presumably assuming the task required to move both objects.

Fig. 7. Task T04: a team with behavior of type B. (Color figure online)

Actually, the task asked to drag only one of the two objects into the correct position, but positioning the second object as well only has the effect that a carrot is found at the end of the river besides a log (and it is not stated that only a log is to be produced). Thus a hypothesis is that a possible misinterpretation of the task was to place both objects at a waterfall. When assigning scores, we had considered such solutions wrong. Recomputing the correctness of answers, admitting also this new interpretation, significantly changed the cardinalities of the previously presented classes. In particular, the number of teams that answered correctly in class B grows from 162 to 524, and in class C from 416 to 1,267. The overall percentage of correct answers is in this case 63%, close to the difficulty level we had expected.
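The re-scoring can be expressed as two predicates over a team's final configuration, as in the sketch below; the state representation is an assumption made for illustration.

```python
def correct_strict(final_state):
    """Original scoring: only the fish is placed, and it is on the second waterfall."""
    return final_state == {"fish": "waterfall-2"}

def correct_relaxed(final_state):
    """Relaxed scoring: the fish is on the second waterfall; an extra carrot is tolerated."""
    return final_state.get("fish") == "waterfall-2"

answers = [
    {"fish": "waterfall-2"},                            # correct under both rules
    {"fish": "waterfall-2", "carrot": "waterfall-1"},   # correct only under the relaxed rule
    {"carrot": "waterfall-2"},                          # wrong under both rules
]
print(sum(map(correct_strict, answers)), sum(map(correct_relaxed, answers)))  # 1 2
```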


Fig. 8. Task T04 class C: methodical team behaviors.

Class C is the largest one and exhibits many different behaviors. However, it is possible to identify at least two main subclasses: the methodicals and the unpredictables. Two examples of methodical behavior are depicted in Fig. 8a and b. These teams explored the different possible configurations in an ordered way. It is important to notice that a methodical behavior is not sufficient to guarantee success in the task, as is clearly shown by the behavior presented in Fig. 10 in the Appendix. We also note that a methodical approach can also be pursued without interaction with the system, which would result in class A. The unpredictable teams (see Fig. 11a and b in the Appendix) explored the possible configurations without any particular order, returning to already explored ones, and often without recognizing the correct ones. In general, the time devoted to the task by these teams is significantly greater than that of the methodical ones, independently of whether or not they found the correct solution.

6.2 Task 01 - Anthill Scramble

The “Anthill scramble” task had 39% of correct answers (1,343 over 3,443 teams). Given that the task has a simulation button which in fact allowed teams to check their answer, we expected a success rate around 60–70%, which is often the case for tasks with feedback. The feedback button allowed to simulate the movement of the echidna according to the inserted commands (direction arrows). Commands needed to be inserted one at a time; for each command a button needed to be clicked in order to change and select the desired direction arrow. For this reason, tracking all clicks would not give meaningful information and our analysis concentrates only on the use of the available feedback. The first surprising observation is that 133 teams (around 3% of the teams) never used the feedback button (with only 4 teams able to find the correct answer in these conditions).


Table 4. Task T01: different approaches to build the solution.

                          Total  Correct  % correct  Wrong  % wrong
Correct at first attempt     29       29       100%
Incremental                 211      146      69.2%     65    30.8%
Partially incremental       158       63      39.9%     95    60.1%
Only global simulations    2907     1101      37.9%   1806    62.1%
No simulations              133        4       3.0%    129    97.0%
No answer                     5                           5     100%
Total                      3443     1343               2100

An interesting finding regards the possibility of getting feedback on partial solutions, which could be exploited by teams in order to better understand the effect of a single command or to incrementally (step by step) construct the solution. Table 4 shows how teams used this possibility. The probability of getting the right solution is much higher among those teams who used an incremental approach in building the solution than among the other teams; the 211 teams who built the solution step by step answered correctly in 70% of the cases. Among the other teams, the percentage of successful answers was less than 40%. This holds both for the 2,907 teams who used feedback just on complete solutions, and for the 158 teams who asked for feedback after the insertion of the first command, then proceeded to choose all other commands and asked for feedback again only after the insertion of the complete answer. It is interesting to note that the incremental approach was in some cases used from the start (see Fig. 9a), but was often also discovered after failing attempts (see Fig. 9b). As shown in Table 5, the percentage of success remains high in both cases.

Table 5. Task T01: incremental approaches.

                        Total  Correct  % correct  Wrong  % wrong
From the beginning        129       85      65.9%     44    34.1%
After failing attempts     82       61      74.4%     21    25.6%
Total                     211      146      69.2%     65    30.8%
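One possible way to recover these four categories from the logs is to look at how many commands had been inserted each time the simulation button was clicked; the following heuristic is our own simplification, not the classification code actually used.

```python
def feedback_style(feedback_lengths, solution_length):
    """feedback_lengths: number of commands inserted at each click on the simulation button."""
    if not feedback_lengths:
        return "no simulations"
    partial = [n for n in feedback_lengths if 0 < n < solution_length]
    if len(partial) >= 2:
        return "incremental"             # repeatedly tested while building the answer
    if len(partial) == 1:
        return "partially incremental"   # tested once on a partial answer only
    return "only global simulations"     # tested complete answers only

print(feedback_style([1, 2, 3, 5], solution_length=5))  # incremental
print(feedback_style([1, 5, 5], solution_length=5))     # partially incremental
print(feedback_style([5, 5], solution_length=5))        # only global simulations
print(feedback_style([], solution_length=5))            # no simulations
```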

Not only is the percentage of success better with an incremental approach, but the time dedicated to the task is also, on average, lower in this case in comparison with a non-incremental approach, both for the teams who eventually find the correct solution (see for instance Fig. 12a in the Appendix) and for the ones that don't (see for instance Fig. 12b in the Appendix). Examining the first attempts (i.e., the solutions inserted before clicking the simulation button for the first time), we identified a high percentage of solu-


Fig. 9. Task T01: incremental behaviors.

tions where even the first command was wrong. This suggests that most teams misunderstood the text of the task, which indeed asks to specify the starting direction and the directions to take at crossroads. We may speculate that they might have been confused by the fact that there was no distinction, in the interface, between the first command to insert and the other ones.

7 Conclusion

In order to collect data on how pupils solve computational thinking tasks, we instrumented the platform used to deliver the Bebras contest in Italy. We proposed a multidimensional model for describing the interactions of pupils with the platform, but extracting clear statistical patterns correlated with success or failure is difficult, as shown also by a recent paper that analyzed work habits in the context of open-ended programming assignments [10]. Summaries of data, properly visualized, can however be offered to teachers to help them in comparing the performance and behaviors of their teams (possibly weighting the analytical data with their knowledge of the team members) and, in general, we believe behavior metrics are useful to better understand the educational potential of the Bebras contest and adapt tasks for inclusion in more regular school activities.


We used fine-grained data related to the problem solving activity of pupils to analyze how the teams engaged with the assigned tasks. In particular, by looking at the data, we were able to explain unexpected difficulties in specific tasks, nailing down what had distracted or confused many solvers. Unfortunately, these analyses seem difficult to automate: the knowledge of the task logic is crucial to extract meaningful patterns. However, thanks to our analyses we discovered general problems in the way tasks and their interactive parts were designed and we will hopefully avoid the same errors in the next editions of the contest, thus engaging pupils better on our learning objectives. Acknowledgements. We would like to thank the Bebras community for the great effort spent in producing exciting task ideas.

Appendix

We present here some figures related to task T04 (Waterfalls) and T01 (Anthill scramble). Figure 10 shows a methodical but wrong behavior; the team followed a methodical exploration without being able to recognize the correct answer even if they examined it for almost 20 s.

Fig. 10. Task T04 class C: methodical, wrong, team behavior.


Figures 11a and b show two examples of non-methodical behavior. Figures 12a and b show two examples of teams that do not use an incremental approach; one succeeds, the other does not.

Fig. 11. Task T04 class C: non methodical team behaviors.


Fig. 12. Task T01: Non incremental behaviors.


References

1. Bellettini, C., et al.: A platform for the Italian Bebras. In: Proceedings of the 10th International Conference on Computer Supported Education (CSEDU 2018), vol. 1, pp. 350–357. SCITEPRESS (2018). https://doi.org/10.5220/0006775103500357
2. Bellettini, C., Lonati, V., Malchiodi, D., Monga, M., Morpurgo, A., Torelli, M.: How challenging are Bebras tasks? An IRT analysis based on the performance of Italian students. In: Proceedings of the 20th Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE 2015), pp. 27–32. ACM (2015). https://doi.org/10.1145/2729094.2742603
3. Bellettini, C., Lonati, V., Monga, M., Morpurgo, A.: How pupils solve online problems: an analytical view. In: Proceedings of the 11th International Conference on Computer Supported Education (CSEDU 2019), vol. 2, pp. 132–139. SCITEPRESS (2019). https://doi.org/10.5220/0007765801320139
4. Bonani, A., Del Fatto, V., Dodero, G., Gennari, R.: Tangibles for graph algorithmic thinking: experience with children (abstract only). In: Proceedings of the 49th ACM Technical Symposium on Computer Science Education, SIGCSE 2018, pp. 1094–1094. ACM, New York (2018). https://doi.org/10.1145/3159450.3162267
5. Boyle, A., Hutchison, D.: Sophisticated tasks in e-assessment: what are they and what are their benefits? Assess. Eval. High. Educ. 34(3), 305–319 (2009). https://doi.org/10.1080/02602930801956034
6. Calcagni, A., Lonati, V., Malchiodi, D., Monga, M., Morpurgo, A.: Promoting computational thinking skills: would you use this Bebras task? In: Dagienė, V., Hellas, A. (eds.) ISSEP 2017. LNCS, vol. 10696, pp. 102–113. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-71483-7_9
7. Dagienė, V.: Supporting computer science education through competitions. In: Proceedings of the 9th WCCE 2009, Education and Technology for a Better World, Bento Gonçalves (2009). https://www.ifip.org//wcce2009/proceedings/papers/WCCE2009_pap76.pdf
8. Dagienė, V., Sentance, S.: It's computational thinking! Bebras tasks in the curriculum. In: Brodnik, A., Tort, F. (eds.) ISSEP 2016. LNCS, vol. 9973, pp. 28–39. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46747-4_3
9. Dagienė, V., Stupuriene, G.: Informatics education based on solving attractive tasks through a contest. In: KEYCIT 2014 - Key Competencies in Informatics and ICT, pp. 97–115 (2015)
10. Goldstein, S.C., Zhang, H., Sakr, M., An, H., Dashti, C.: Understanding how work habits influence student performance. In: Proceedings of the 2019 ACM Conference on Innovation and Technology in Computer Science Education, ITiCSE 2019, pp. 154–160. ACM (2019). https://doi.org/10.1145/3304221.3319757
11. Haberman, B., Cohen, A., Dagienė, V.: The beaver contest: attracting youngsters to study computing. In: Proceedings of the 16th Annual Joint Conference on Innovation and Technology in Computer Science Education, pp. 378–378. ACM (2011). https://doi.org/10.1145/1999747.1999891
12. Lonati, V., Malchiodi, D., Monga, M., Morpurgo, A.: Bebras as a teaching resource: classifying the tasks corpus using computational thinking skills. In: Proceedings of the 22nd Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE 2017), p. 366 (2017). https://doi.org/10.1145/3059009.3072987
13. Lonati, V., Malchiodi, D., Monga, M., Morpurgo, A.: How presentation affects the difficulty of computational thinking tasks: an IRT analysis. In: Proceedings of the 17th Koli Calling International Conference on Computing Education Research, pp. 60–69. ACM, New York (2017). https://doi.org/10.1145/3141880.3141900
14. van der Vegt, W.: Predicting the difficulty level of a Bebras task. Olymp. Inform. 7, 132–139 (2013). https://ioinformatics.org/journal/INFOL127.pdf
15. van der Vegt, W.: How hard will this task be? Developments in analyzing and predicting question difficulty in the Bebras challenge. Olymp. Inform. 12, 119–132 (2018). https://doi.org/10.15388/ioi.2018.10
16. van der Vegt, W., Schrijvers, E.: Analyzing task difficulty in a Bebras contest using Cuttle. Olymp. Inform. 13, 145–156 (2019). https://doi.org/10.15388/ioi.2019.09

Computational Pedagogy: Block Programming as a General Learning Tool

Stefano Federici, Elisabetta Sergi, Claudia Medas, Riccardo Lussu, Elisabetta Gola, and Andrea Zuncheddu

Università di Cagliari, 09123 Cagliari, CA, Italy
[email protected]

Abstract. Education today can get closer to the real world that surrounds our students. Using technology in the classroom is a winning strategy to improve the engagement of students and, in the end, their performance. And using coding tools based on the metaphor of building blocks is an even better alternative to support education at school in all subjects and to improve the engagement and the performance of the students. But this requires both students and teachers to spend a non-negligible amount of time on each new topic. However, when we take into account all the steps that are necessary to create a multimedia interactive app that is helpful to better understand a given school topic, acquiring all the necessary elements can be done in a short time if we start from the right tools. A computational pedagogy based on the active usage of tools that allow solving problems by means of computer programming is within our reach.

Keywords: Programming-based learning · Scratch · Computational pedagogy

1 Introduction

Associating coding to teaching, that is using computer programming in a playful environment such as Scratch (Maloney et al. 2010; Fig. 1) to teach a school subject, has shown to have positive effects in many experiments that try to analyze the impact of coding on both technical subjects, such as mathematics, geometry or physics (Calao et al. 2015; Lopez and Hernandez 2015; Foerster 2016; Hobenshield Tepylo and Floyd 2016; Miller and Larkin 2017), and non-technical subjects such as English as a foreign language or history (Costa et al. 2016; Gresse von Wangenheim et al. 2017). In these experiments computer programming has been used either as a preliminary tool for mental training (Calao et al. 2015) without any real connection with the mathematical curriculum, or as a substitute for pencil and paper that allows making calculations and drawing geometrical shapes (Lopez and Hernandez 2015; Foerster 2016; Hobenshield Tepylo and Floyd 2016; Miller and Larkin 2017) or, finally, to create animated stories (Costa et al. 2016; Gresse von Wangenheim et al. 2017). But computational thinking is a skill that can be successfully applied to all school subjects in an even deeper way, by allowing students to manipulate the basic elements of their subjects, be they numbers, words and phrases, scenes, etc.


Fig. 1. The Scratch environment based on the metaphor of building blocks.

The first step in this new approach started with the usage of visual block-programming tools (that is, programming tools based on the metaphor of building blocks, specifically designed to easily teach computer programming and to create colorful interactive objects) to "give life" to the components of a given topic, for example the components of an exponentiation operation (Fig. 2) or the components of a work of art (Fig. 3), by programming their interactive behavior.

Fig. 2. Exponentiation operations by assembling visual blocks in a 2D environment (Federici et al. 2019b).

Fig. 3. Programming the behavior of the components of a work of art (Federici et al. 2019b).

The blocks of the visual programming language describe, in an almost-natural-language style, the outcome of the action to be performed that is associated with the block. So,


for example, the “when green flag clicked” block of the Scratch programming language (see Fig. 4) waits for the user to click the green flag at the top right corner of the environment, and then the following “say ‘Hello!’ for 2 secs” block makes an “Hello!” speech bubble show up associated to the corresponding component (in this case the orange Scratch cat). The students, following this approach, learn each new topic through a programming-based learning paradigm (Federici et al. 2018).

Fig. 4. Behavior of the Scratch blocks. (Color figure online)

The next step is then moving to the creation of new, simplified programming languages that are reminiscent of the structure of the topic to be taught. The new languages are designed in order to allow students and teachers to use the mechanism of coding, but without the further burden of having to fully understand all the somewhat complex concepts of computer programming (Fig. 5).

Fig. 5. Block programming environments to learn exponentiation or a subset of English (Federici et al. 2019b).

By means of a special-purpose tool based on block-programming, named BloP (Federici and Gola 2014), the design of the new environment is not extremely complex and can be performed even by users that have learnt how to use block-programming environments for just a few weeks. The final, new programming language is much simpler to use for people that do not know much about computer programming. In this paper, an extended version of (Federici et al. 2019a, b), we outline the basic principles of using block programming in order to design new teaching strategies that can form the base of a new computational pedagogy.


2 Block Programming Tools for Computational Pedagogy

Computational Pedagogy is a new way of implementing school education by means of multimedia and interactive tools. Even if the usage of multimedia and interactive tools at school is not new, computational pedagogy makes stronger assumptions by basing its principles on tools that not only mimic the basic elements of each subject and their behaviors but that are strongly based on coding. The advantages of this new way of using multimedia and interactive tools at school, other than being certainly more fun for the students than the standard approaches (Federici et al. 2018), as usually happens when porting technology into the classroom (Costley 2014), also include an increase in the problem-solving skills of the students and in their creativity, thanks to the presence of a coding layer. Furthermore, as the students are asked to build their explanations by themselves by discovering the relationships among the elements of a given topic, the approach increases retention (Federici et al. 2019a, b).

2.1 Principles of Computational Pedagogy

Tools that conform to the principles of computational pedagogy must adhere to several important features that will allow the student to fruitfully use the new environment without being frustrated by irrelevant problems. The relevant features of a computational pedagogy tool must be at least the following ones:

• no need to memorize
• no need to "represent"
• usage of coding blocks
• no need to use "variables"

No Need to Memorize. The users must not need to memorize which are the elements that they will have to handle (e.g. numbers, words, etc.), nor which specific element will make something specific happen, nor, finally, what is the meaning of a given block or the order and the meaning of its arguments. The environment must clearly show all the relevant elements, and their names must clearly explain the function of the element (that is, the behavior of each block). This is something that is true in block-programming environments such as Scratch, but the items and the blocks necessary to create a project that will explain a given topic are surrounded by a lot of non-relevant items that confuse and discourage the students, slowing down their learning process.

No Need to Design a "Representation". The environment must allow the user to handle visual elements that closely look like the elements of the subject. So, if the subject is mathematics, the environment must show numbers or quantities to the user; if the subject is linguistics it must show words and/or phrases; if the subject is history it must show characters and scenes and sequences of scenes; etc. The users should not be asked to build by themselves representations of the elements of the subject by using numbers, lists of numbers, etc. Even if the need to represent each element of the topic by using numbers, lists of numbers, etc. is something that is at the base of every standard programming language, this is not the main point when coding is used to improve the learning of a given topic and not as a subject by itself.


Coding Blocks. The environment must limit the amount of actions that the user can perform by directly manipulating the elements of the subject. The manipulation of the elements must be mainly done via special-purpose blocks.

No Need to Use "Variables". All procedural programming languages, that is the vast majority of programming languages, require the user to model the elements of the program they are building by using variables, that is, stores where relevant properties of the elements of the program are recorded for future reference. The usage of variables is a difficult element to be acquired by people new to computer programming (Kuittinen and Sajaniemi 2004; Kohn 2017). A computational pedagogy environment must then allow the users to manage the visual elements of the environment by means of specific blocks, without having to memorize further properties other than those already accessible by using the available blocks.

2.2 Computational Pedagogy by Pure Coding

In the first experiment on the subject of mathematics, the students had to learn how to build the explanation of the exponentiation operation, a topic that is particularly difficult to learn and to remember for 5th year children (Federici et al. 2018). In order to build the explanation, the students had to use a standard block-programming environment. The first step was then teaching them the fundamental elements of Scratch-like environments. We chose to use both Scratch and Snap (Harvey and Moenig 2010). Snap is a block-programming language very similar to Scratch, and mostly compatible with it, with the added bonus of being implemented in a modern and widespread programming language such as JavaScript and of being designed to be very easily extendible.

Fig. 6. Creating block sequences in Snap by dragging-and-dropping blocks from the block area (to the left) to the script area (to the right). (Color figure online)

The students were first exposed to simple fun projects created by using the block programming language and then they were taught the basics of block programming, that is how Snap allows to manage several characters (“sprites”) by defining their look (imported or drawn pictures) and their behavior by composing “scripts”, that is sequences


of programming blocks of the Snap language, in order to build an explanation of the exponentiation operation. The structure of a programming environment for a visual block language is very easy and quick to grasp (Fig. 6). All "instructions" are represented by colored blocks that are visible in the block area, at the left-hand side of the window. By dragging the blocks from the block area to the central part of the tool, users can assemble "scripts" for their characters to behave and interact as desired. Blocks are organized in categories. Indeed, like building blocks, they are grouped in different bins with respect to their color/function. Blocks in each category can be accessed by clicking the desired button right above the block area, at the top left. Among the available categories we find Movement (to move the characters of the project), Looks (to change their appearance), Control (to make available basic programming structures such as the repetition of a given behavior), Sensing (to allow characters to "sense" their environment), etc. The organization of blocks in categories is important, so that users can quickly find the desired block by starting from its intended function.

In our first experiment about exponentiation the students had to build the explanation that 2³ corresponds to the multiplication "2 × 2 × 2", that is the number 2 multiplied by itself 3 times (Fig. 7). To build the explanation, the students had to learn the behavior of about 20 of the more than 130 Snap programming blocks, which allowed them to show/hide the sprites (that is the correct and wrong numbers to be multiplied; the correct and wrong operations to be applied; the main character used by the user to interact with the explanation elements), to move the sprites on the "stage" of the Scratch environment, and to interact with the user so as to control the movement of the characters.


Fig. 7. Tangible elements of the Scratch exponentiation project: numbers, operators, animated character (Federici et al. 2019b).

In building the project the students were manipulating the numbers and the operators involved in calculating 2³, and they had to correctly understand which were the operands (that is the number 2, used 3 times, and not, for example, the number 3 used 2 times or the numbers 2 and 3 multiplied together) and the operators (that is the multiplication operator and not, for example, the addition operator). They had to correctly position the


operands and the operators on the stage and describe the behavior of the main character, which gave positive or negative feedback depending on whether the user had driven it to the correct or the wrong operand and/or operator. The students could get a correct feeling of which were the elements of the problem and how they had to be combined together in order to get the correct result. In this project all the elements of the problem were tangible elements (sprites) and their correct arrangement was regulated by the scripts created by the student.

Weaknesses of the Pure Coding Approach. In this approach, in order to describe how to build a correct exponentiation we need to take care of "low-level" features such as the position of the elements on the stage (that is the x and y coordinates of the numbers and operands on the stage), the images they are showing on the stage, making the elements show up on the stage at the correct time, etc. This is something that is distant from the topic we are studying, in which we are speaking instead of "numbers" and "multiplications". Using the low-level operations of the Pure Coding approach therefore requires both students and teachers to spend a non-negligible amount of time on each new topic.
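The kind of low-level logic the pupils had to script can be mimicked in a few lines of ordinary code; the sprite names, coordinates and messages below are invented for illustration and do not come from the actual Scratch/Snap project.

```python
# Each sprite is a number or operator placed at fixed stage coordinates;
# only some of them are correct elements of the expression 2 x 2 x 2.
SPRITES = {
    "2a": {"x": 40,  "y": 0, "correct": True},    # the base, shown three times
    "2b": {"x": 100, "y": 0, "correct": True},
    "2c": {"x": 160, "y": 0, "correct": True},
    "3":  {"x": 220, "y": 0, "correct": False},   # distractor: the exponent is not an operand
    "x":  {"x": 70,  "y": 0, "correct": True},    # multiplication operator
    "+":  {"x": 130, "y": 0, "correct": False},   # wrong operator
}

def feedback_at(x, y, radius=15):
    """Positive feedback only when the main character is on a correct element."""
    for sprite in SPRITES.values():
        if abs(sprite["x"] - x) <= radius and abs(sprite["y"] - y) <= radius:
            return "Well done!" if sprite["correct"] else "Try again!"
    return ""

print(feedback_at(100, 0))   # on the second "2"   -> Well done!
print(feedback_at(130, 0))   # on the "+" operator -> Try again!
```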

2.3 Computational Pedagogy by Special-Purpose Tools

In the previous approach the full power of a programming language (in this case the Scratch blocks used to create an interactive animation) was used in order to describe the complex behavior of the components of the exponentiation topic by using low-level operations such as positioning of images, loops, etc. But using low-level operations requires both students and teachers to spend a non-negligible amount of time on each new topic. To speed up the learning process, we developed a new approach that still hinges on the compositional properties of programming languages, but in which the blocks of the programming language now represent high-level operations that directly describe "real" operations on the basic elements of the problem, leaving the description of their internal behavior to mechanisms that are hidden from the student.

So, for example, while in the previous approach, to describe how to build a correct exponentiation, we needed to take care of features such as the position of the elements on the stage (that is the x and y coordinates of the numbers and operands on the stage), the images they are showing on the stage, making the elements show up on the stage at the correct time, etc. (Fig. 8, left), in this new approach the students could describe how to build a 3D representation (Fig. 9) of the exponentiation operation by defining the value of the "base" (that is the number to be multiplied by itself), the value of the "exponent" (that is the number of times the "base" must be multiplied by itself), by describing how this operation is performed by using several "numbers" whose values are equal to the base, and, finally, by multiplying them (that is using the multiplication "x" operation), as shown in Fig. 8, right.

This time the student had just to assemble the scripts, as all the elements of the 3D educational tool (the base, the exponent, the operators, etc.) were already built in. In building the script the students were now logically manipulating the numbers and the operators involved in calculating 2³, so they had to correctly understand the meaning


Fig. 8. Exponentiation operations by assembling low level (left) and high level (right) operations.

Fig. 9. Meaning of the exponentiation operation by assembling 3D elements referred to in the script as: the "exponentiation" (the black shape containing the white "=" sign), the "base" (the blue element before the "=" sign), the "exponent" (the green number above the "=" sign), the "numbers" (the blue elements after the "=" sign and the "x" operation), and the "operation" (the white "x" sign). (Color figure online)

of what they were doing. In order to build the 3D model of the exponentiation, the students had to remember that the value of the "base" (the blue number to the left of the "=" sign) and the values of the "numbers" (the blue numbers to the right of the "=" sign) must be identical. If this is not the case, the script raises an error message. Moreover, by using a "repeat times" block (a block corresponding to a loop structure in classic programming) and by specifying the value of the "numbers" on the right inside the loop, the students showed that they understood that the blue numbers on the right can only be all identical, as they are defined just once inside the loop. Just as when the user incorrectly tries to use numbers different from the base on the right, using an operation different from the multiplication "x" also raises an error. So, even in this second approach the students acquire a concrete feel for what the elements


of the problem are and how they have to be combined in order to obtain the correct outcome. This time too the elements of the problem are tangible elements (3D shapes) and their correct arrangement is regulated by the scripts created by the student.

Weaknesses of the Special-Purpose Tools Approach. In order to build the 3D exponentiation environment, we had to start from a special-purpose tool that, even if still based on the metaphor of building blocks, was strongly oriented towards creating 3D artifacts (BeetleBlocks; Romagosa et al. 2016). So we had to learn a full set of new concepts related to 3D printing in order to create the specific blocks related to the exponentiation problem (see for example the internal behavior of the "use n as BASE below" block in Fig. 10, left). Moreover, all the blocks, options and elements of the original BeetleBlocks tool that are not related to the specific subject we are interested in are still fully visible (see for example several blocks in the BeetleBlocks Control category in Fig. 10, right), which can confuse the student.
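As a rough, text-based analogue of the checking behavior that the high-level exponentiation blocks enforce (all factors must equal the base, their count must equal the exponent, and only the "x" operation is accepted), consider the following Java sketch. It is our own illustration, not code from the BeetleBlocks-based tool, whose blocks are actually implemented in terms of 3D-printing primitives (Fig. 10, left).

```java
import java.util.List;

// Illustrative sketch (not the actual tool) of the rules enforced by the
// special-purpose exponentiation blocks.
class ExponentiationModel {
    static int build(int base, int exponent, List<Integer> numbers, String operation) {
        if (!operation.equals("x")) {
            throw new IllegalArgumentException("only the multiplication \"x\" operation is allowed");
        }
        if (numbers.size() != exponent) {
            throw new IllegalArgumentException("the base must appear exactly " + exponent + " times");
        }
        int result = 1;
        for (int n : numbers) {                  // the "repeat <exponent> times" loop
            if (n != base) {
                throw new IllegalArgumentException("every factor must be equal to the base " + base);
            }
            result = result * n;                 // assemble the product step by step
        }
        return result;                           // e.g. build(2, 3, List.of(2, 2, 2), "x") yields 8
    }
}
```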

Fig. 10. Behavior of the blocks of the Exponentiation tool described as 3D printing primitives (left). Standard BeetleBlocks blocks of the Control palette (right).

Finally, the student can, even by mistake, change or remove important blocks and thereby impair the environment created for them by their teacher to learn the exponentiation operation. This is extremely undesirable, as most users can be intimidated if, at their first approach to block programming, to computer science and to technology, they have to work with a tool that their own actions can disrupt.

2.4 Computational Pedagogy by Simplified Coding in a Safe Environment

What we propose in this paper as the correct implementation of computational pedagogy principles is the creation of simplified block languages in a "safe" environment for all


subjects, that is, the creation of block-programming environments where the users do not see all the blocks, options, and elements of the original environment, as happens instead in the "Pure Coding" and "Special-Purpose Tools" strategies. This way the users cannot impair the environment created for them, nor be confused by irrelevant elements. That is exactly what happens in a standard environment with a given purpose, such as Scratch. This strategy is exemplified by the BlockLang tool, a special-purpose tool with a reduced set of blocks that allows students to build a sentence by embedding blocks, producing a drawing of the described situation (Fig. 11).

Fig. 11. Learning the meaning of I have got seven lemons in the BlockLang tool.

It is important to note that the English learning tool we have devised here, by applying block programming to a non-technical subject such as English as a second language, is not intended to become a full-fledged tool for learning English. The final purpose of the tool is simply to offer a simple way to introduce students to the basics of the English language in a fun way (Federici et al. 2015). This approach is in some sense similar to the development of simple block programming tools (such as miniC; Federici 2011) designed to introduce students to the basics of computer programming by overcoming the usual initial difficulties due to syntax, memory, etc. By studying the performance of the users, we wanted to understand whether students can achieve good performance in translating English by letting them embed blocks to build interactive phrases and sentences in a way that automatically creates visual representations of the phrase or sentence when it is correct. The tool was built by means of BloP (Block Programming environment; Federici and Gola 2014), an extension of the Snap environment that allows one to easily build special-purpose block-programming environments by just knowing Snap programming. BloP is specifically designed so that in the new tool created with BloP "irrelevant" elements are not visible, and the tool can be safely used even by young users without the worry of impairing the environment by doing something inappropriate. The BlockLang interface is very similar to the Snap environment (Fig. 12). Note that the interface is fully in Italian (except, of course, for the English phrase/sentence blocks) so that young Italian students are not overloaded by too many English elements to learn in advance. Thanks to the customization features offered by


Fig. 12. The BlockLang environment: the categories (at the top left), the blocks (at the bottom left), the scripts (in the center) and the Stage (at the top right) (Federici et al. 2019b).

Snap, the interface can be easily translated to other languages, so that the tool can be used to learn English food phrases/sentences by students of different nationalities. At the top left corner, we see new block categories, specifically created for BlockLang by using the BloP features, that organize the blocks into meaningful groups (Fig. 13), namely Cibi (Foods), Colori (Colors), Pasti (Meals), Verbi (Verbs), Numeri (Numbers), and Aggettivi (Adjectives).

Fig. 13. Interactive objects clearly visible in the block categories of BlockLang (Federici et al. 2019b).

When the blocks from each category are “run” by clicking them they create visual representations on the Stage. For example, the “the bread” block, when dragged to the script area and “run” by clicking it, draws a loaf of bread on the stage together with its Italian translation “il pane” at the top of the Stage (Fig. 14).

Fig. 14. Running a Food block (Federici et al. 2019b).


Each food, when corresponding to a countable item, has a "companion" block with a fillable gap for an argument representing the number of items. For example, we have the the apple block but also the ONE apple block. Note that redundancy, in block languages, is a very well-accepted mechanism, largely used by both Scratch and Snap. The capital letters in the fillable gaps are just a cue reminding the students that the gap must be filled by another block. The word ONE, for example, suggests that the gap is correctly filled by a block from the Numbers category. If the students try to run the block without filling the gap, they see an error message on the Stage. Instead, when the gap is correctly filled by a block from the Numbers category (e.g., the block "two") and the final argument "e" is replaced by typing the correct singular/plural ending (in this case "es") inside the gap, the correct result is shown on the Stage (Fig. 15). The correct translation, in this case "due mele", shows up at the top of the Stage and two apples are drawn on the Stage.
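The gap-filling mechanism just described can be sketched as follows. The sketch is our own conceptual analogue in Java, not BloP or BlockLang code, and the class and field names are introduced only for illustration.

```java
// Conceptual sketch of a "companion" block such as "ONE apple": the ONE gap
// must be filled with a block from the Numbers category and the ending gap
// holds the singular/plural suffix typed by the student.
class CountableFoodBlock {
    private final String stem;        // e.g. "appl" for the "ONE apple" block
    private Integer number;           // the ONE gap, to be filled by a Numbers block
    private String ending = "e";      // the ending gap; the student may type e.g. "es"

    CountableFoodBlock(String stem) { this.stem = stem; }

    void fillNumber(int value) { this.number = value; }
    void fillEnding(String suffix) { this.ending = suffix; }

    // "Running" the block: report an error if the gap is unfilled, otherwise
    // return the text whose drawing and Italian translation appear on the Stage.
    String run() {
        if (number == null) {
            return "Error: the ONE gap must be filled with a block from the Numbers category";
        }
        return number + " " + stem + ending;   // e.g. 2 + "appl" + "es" -> "2 apples"
    }
}
```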

Fig. 15. Running the “two apples” script after filling the gap “ONE” of the “ONE apple” block with a “two” block from the Numbers category and the “e” gap with “es” (Federici et al. 2019b).

Every block produces a corresponding drawing on the Stage. Every script that does not contain unfilled gaps can be run, so that its visual meaning is shown on the Stage. So, even running the "two" block will show the number 2 on the Stage (Fig. 16), and running, e.g., the "dinner" block from the Meals category will draw a classic dinner scene on the Stage (Fig. 17).

Fig. 16. Running the “two” script (Federici et al. 2019b).


Fig. 17. Running the “dinner” script (Federici et al. 2019b).

Due to the limited vocabulary usually learned by 2nd grade students, the phrases and sentences available in the BlockLang tool for the food vocabulary could not be very different from the phrases and sentences the students already knew. The tool was therefore based on a small set of phrases and sentences, namely "I like/don't like Food", "I have got Food", "I am/am not hungry", "Food is/are Color", "I have Meal". Note that, due to the stepwise learning of the English language in the Italian primary school, the students had been taught to always use the article "the" in front of nouns, as happens in Italian, when there is no cardinal number. So, they learned to say I like the lemon instead of I like lemon, as if they were always referring to a specific food item. That is why, in the BlockLang tool, there is no single lemon block: there are only the the lemon and ONE lemon blocks. By dragging, for example, the I have BREAKFAST block from the Verbs category and the dinner block from the Meals category and assembling them in the script area as the sentence I have dinner (Fig. 18), the Stage will show the correct Italian translation, that is "Io ceno".

Fig. 18. Running the I have dinner script (Federici et al. 2019b).

In the same way, by using the THE MILK is WHITE block from the Verbs category, the the lemon block from the Foods category, and the yellow block from the Colors category, the users can build the sentence the lemon is yellow (Fig. 19), which is the correct translation of "il limone è giallo".


Fig. 19. Running the the lemon is yellow script (Federici et al. 2019b). (Color figure online)
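Before turning to the evaluation, the small repertoire of patterns described above can be summarized as sentence templates with typed slots. The following sketch is our own illustration in Java; the enum and the slot names are introduced here only for clarity and are not part of the BlockLang implementation.

```java
// Sketch of the fixed BlockLang sentence patterns as templates with typed gaps.
// Uppercase words in the pattern mark the gaps; the categories state which kind
// of block each gap accepts.  All names are ours, for illustration only.
enum Category { FOOD, COLOR, MEAL, NUMBER }

final class SentenceTemplate {
    final String pattern;
    final Category[] gaps;

    SentenceTemplate(String pattern, Category... gaps) {
        this.pattern = pattern;
        this.gaps = gaps;
    }
}

class BlockLangPatterns {
    static final SentenceTemplate[] TEMPLATES = {
        new SentenceTemplate("I like FOOD", Category.FOOD),
        new SentenceTemplate("I have got FOOD", Category.FOOD),
        new SentenceTemplate("I am hungry"),                      // no gaps to fill
        new SentenceTemplate("FOOD is COLOR", Category.FOOD, Category.COLOR),
        new SentenceTemplate("I have MEAL", Category.MEAL),
    };
}
```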

3 Evaluation of the Computational Pedagogy Approaches

The three approaches to Computational Pedagogy have been evaluated in the three experiments described above about exponentiation and English as a second language.

3.1 Evaluation of the Pure Coding Approach

In the first experiment (Federici et al. 2018), 36 students of two 5th grade classes were split into three roughly similar groups of 12 students each (Groups A, B and C) by their teachers, based on their general skills. Group A worked on exponentiation both by following standard explanations and by creating, in 4 sessions of 2 h each, a multimedia interactive explanation using Scratch. Group B worked on exponentiation by following standard explanations and by playing with the multimedia interactive explanation created by their peers in Group A. Group C worked on exponentiation by just following standard explanations. The results of a first test administered right after the end of the fourth session showed no big difference among the three groups: Group A had a correctness rate of 99.82%, Group B 99.83% and Group C 100%. The difference between the top and the bottom group was less than 0.2%, which is practically meaningless, even if we must note that the students of Group C, who had had 6 more hours to exercise on exponentiation than the students of Group A (who were learning about block languages at the same time), did better than any other group. In a second test administered after two weeks, the difference among the three groups was again really small (Group A 100%, Group B 98.6% and Group C 99.3%), but this time the results of Group A had slightly increased. The final test was administered after six months, without prior notice, to students who had not done further exercises on exponentiation at school since the end of the second test. Group A had a correctness rate of 78%, Group B 73% and Group C 67%. The difference between the top and the bottom group this time was more than 10%. As we had expected, at the end of the third test we had a substantial decrease in the correctness rate of the three groups. But, very importantly for our hypothesis of the positive effects of computational pedagogy principles, Group A was this time the best group, by more than 10% with respect to Group C. So, while the large amount of exercise done by Groups B and C (more than 8 h spent just doing exponentiation) certainly proves


effective for a short- or medium-term evaluation, as time passes students do not remember very well what they have learnt about the exponentiation operation. Instead, by building the explanation by themselves, as Group A did, the topic is internalized much better.

3.2 Evaluation of the Special-Purpose Tool Approach

In the second experiment (Federici et al. 2019a), 24 students of a 7th grade class were first given several exponentiation operations. For 7th grade students exponentiation is a topic that they reinforce at the beginning of the year, after having studied it in the 5th grade. The students had to write the meaning of the operations, that is, for each exponentiation operation they had to write the correct sequence of multiplications corresponding to it. So, for example, after "2⁴ =" they had to write "2 × 2 × 2 × 2", that is, multiplying 2 by itself 4 times. In this first test the results were comparable to the ones obtained on average by the 5th grade students at the end of the year, that is, about 72% of correct answers. Among the 24 students, 3 got a very low score, ranging from 0% to 17%. After this test the students, according to the Special-Purpose Tool approach, learned how to build the 3D model of the 4² exponentiation operation, using the special-purpose blocks BASE, EXPONENT, etc. added to BeetleBlocks to allow them to build 3D models of the exponentiation operations. In the second test, administered 7 days after they had experimented with the 3D blocks, the results showed a notable improvement, with an average of about 95% of correct answers. As for the three students who had got a very low score in the first test (ranging from 0% to 17%), this time all of them had a perfect score of 100%. Even in this case we noticed a positive effect of the application of the computational pedagogy principles, where students can manipulate the elements of the topic. The three students that in the first test had got scores from 0% to 17% had very low scores in all subjects. They were not interested in school lessons but all of them had a very strong interest in driving and repairing agricultural vehicles and, in general, in manual work, something that was not taken into account by the standard school explanations.

3.3 Evaluation of the Simplified Coding Approach

In the third experiment (Federici et al. 2019b) we analyzed the impact of the Simplified Coding approach to computational pedagogy by studying the usage of interactive tools based on block programming for foreign language learning in two 2nd grade classes. The usage of interactive tools in foreign language learning, and how these tools can improve students' performance, has been analyzed many times (Atkinson 1972; Levy 1997; Warschauer and Healey 1998; Beatty 2013), and specific studies concentrated on foreign language acquisition at the level of primary school (Neri et al. 2008; Chang et al. 2010; Han 2012; Pathan and Aldersi 2014; Moreno-León and Robles 2015). Several studies hinged on the active learning paradigm (Prince 2004; Bachelor et al. 2012), where users must actively take part in the learning process. In those studies, the means used to facilitate the acquisition of a foreign language at the primary school level included


robots (Chang et al. 2010; Han 2012), the development of games (Pathan and Aldersi 2014) or the invention of stories (Moreno-León and Robles 2015). In this experiment, one class of students, using a simplified visual programming language, learned how to assemble syntactic structures that create drawings of the situation described in the sentence. With this strategy, students not only have the chance to test the meaning of the individual foreign language "components", such as nouns, verbs, adjectives, etc., but they can also see whether the final English sentence corresponds to the phrase or sentence they are translating. Students must fully understand how all the parts of the English phrase or sentence fit together in order to correctly assemble them by means of a simplified programming language. Our working hypothesis was that, by assembling language elements in an interactive way, students would better remember their relative positions, thanks to putting different learning strategies to work at the same time (Seemüller et al. 2012; Udomon et al. 2013).

Learning Phrases and Sentences from the Food Domain in a Primary School. In Italy, 2nd grade students mostly take oral tests or multiple-choice questions, and their written tests, if any, are usually fill-the-gap exercises with only one or, at most, two words to be filled in for each phrase or sentence. As to short open questions, the errors are mostly due to spelling and forgotten words. Errors in plurals or in using the wrong word tend to be rare. Spelling errors are likely due to the oral nature of their learning, so that they tend to Italianize the spelling by using the Italian transliteration of the pronunciation of the words, e.g. they write carot instead of carrot, fisC instead of fish, milC instead of milk, KEIK instead of cake. As to forgotten words, they tend to forget words that have non-Latin roots, that is, words that are very dissimilar from the corresponding Italian word, such as like (Italian piacere), milk (latte), breakfast (colazione), lunch (pranzo), dinner (cena). When it comes to closed questions, instead, they very rarely choose the wrong translation of single words, even when the English word is very different from the corresponding Italian one: the cue provided by the proposed options is enough for them. However, they tend to choose the wrong verb in sentences where the correct verb is not the literal translation of the verb used in the Italian sentence. So, for example, to them people do a meal (instead of have), have something (instead of have got) and not do something (instead of do not do).

Programming English Sentences. The experiment involved a total of 36 students from two 2nd grade classes (the test class and the control class) of a local elementary school. The experiment, run during school hours already devoted to English, involved only the test class and was limited to 3 sessions of 2 h each (that is, about 1 h of real work for each session), with a preliminary session of 2 h to introduce the students to Scratch. The two classes (as in Federici et al. 2018) were selected by the teachers of the school as classes that were roughly similar, based on the general skills of the students in each class. By studying the performance of the test group, we wanted to understand whether students can achieve good results in translating English by letting them embed blocks to build interactive phrases and sentences in a way that automatically creates visual


representations of the phrase or sentence when correct. At the same time the students were acquiring the basic elements of coding. Even in this experiment the students in the test group spent significantly less time exercising on phrases and sentences of the food domain than their peers in the control group, as they were being introduced to block programming at the same time. Furthermore, each session in the computer lab required a non-negligible amount of time spent on technical setup operations.

Second Session. Whereas in the first session the students were allowed to play with several projects created with Snap in order to learn how they could move programming blocks around and how they could assemble them into scripts, in the second session the students started using the BlockLang tool specifically created to assemble English phrases and sentences. In this very first English session, the students learnt how food blocks (that is, blocks from the Foods category) create visual representations on the Stage and that the Italian translation of the block shows up at the top of the Stage (Fig. 20).

Fig. 20. Running a Food block (Federici et al. 2019b).

Then they learnt that each countable food block, for example the the apple block, has a "companion" block with a fillable gap for an argument representing the number of items, in this case the ONE apple block. They were shown that the word ONE suggests that the gap must be filled with a block from the Numbers category, that they have to change, if necessary, the final argument of the block by typing the correct singular/plural ending, and that they get an error message on the Stage if they try to run the block without filling the gap (Fig. 21). At the end of this session, many students were able to correctly anticipate the category in which to find the correct block corresponding to a given Italian word. They were also able to assemble simple nominal constituents such as "three lemons" or "four ice creams" and to make the outcome show up on the Stage by running the corresponding scripts. Every time a phrase/sentence was shown on the overhead projector, the correct pronunciation of the item was also practiced.

Third Session. In the third session the students learned (or revised) all the words in the Foods, Colors, Meals and Numbers categories. In this session the students could read an Italian sentence and look at the corresponding drawing on the overhead projector screen, and they were guided to assemble the corresponding English sentence. For example,


Fig. 21. Running the “two apples” script after filling the gap “ONE” of the “ONE apple” block with a “two” block from the Number category and the “e” gap with “es” (Federici et al. 2019b).

looking at the noun phrase “pranzo”, they were guided to go to the Meals category and drag and run the lunch block on the script area (Fig. 22).

Fig. 22. Running the lunch block (Federici et al. 2019b).

In this second English session the students practiced how to build sentences like I like the bread, I have got seven lemons, I am hungry, the chocolate is brown, I have breakfast (Fig. 23 and Fig. 24).

Fig. 23. Running the I have dinner script (Federici et al. 2019b).

At the end of this session, many students were able to correctly anticipate the category and the block/script that would create on the Stage the correct objects/scene


Fig. 24. Running the the lemon is yellow script (Federici et al. 2019b). (Color figure online)

shown to them on the overhead projector screen. This time as well, every time a new phrase/sentence was shown on the overhead projector screen, the correct pronunciation of the phrase/sentence was practiced.

Fourth Session. In the fourth and final session, the students reinforced their knowledge of phrases and sentences from the food domain by practicing how to build phrases and sentences using the BlockLang tool just by looking at the objects/scene and at their Italian translation shown on the overhead projector screen, with no suggestion from the teacher. In order to get the correct translation of a full sentence they had to identify the correct category for each word in the sentence, drag the correct blocks from these categories into the script area and assemble them by filling all the gaps in order to build the correct script. For example, in order to build the correct translation of the sentence "il limone è giallo" (the lemon is yellow) they had to drag the "the lemon", the "yellow" and the "THE MILK is WHITE" blocks from the Foods, Colors and Verbs categories respectively. Then they had to assemble them in the script area in the correct order, suggested by the uppercase fillers of the is block (in this case "THE MILK" and "WHITE"). Finally, they had to run the script in order to check, on the Stage, whether the correct sentence and the correct drawing showed up.

The Final Test. After the fourth session had ended, the knowledge acquired by the students of the two groups about words, phrases and sentences of the food domain was tested by giving them a test with 12 closed questions and 11 open questions. The test was done completely on paper, so that the students of the control group, who were used to paper and pencil, would be on a par with the students of the test group. On the other hand, the students of the test group were at a slight disadvantage, as they were used to completing this kind of test by using the tool and had not done much exercise on paper. The test, prepared in collaboration with the teachers, was mainly based on closed questions, as the teachers thought their students would not be able to correctly answer open questions, being used to fill-the-gap assignments. As we wanted to see whether the usage of a block programming approach could have a major impact on learning the "structure" of English phrases, we asked to add a further test based on open questions, even though we were aware that this was something the students were not used to.


The closed questions were mostly two-word noun phrases (7 questions), where the students had to select the correct food domain word, and short sentences (5 questions), where the students had to select the correct verb or the correct word order. The open questions, instead, were mostly two-word noun phrases (9 questions) and a few short sentences (2 questions), where the students had to write down the correct translation. Taking into account the features of the tool, we expected the test class to do well in those tasks that were better supported by the tool, namely using the correct singular/plural form, the correct verb and the correct construction. Instead, we expected them to do less well (hopefully only slightly) in the tasks less supported or not supported at all by the tool, like writing down a word with the correct spelling or remembering the correct translation of each word in the phrase/sentence. Taking into account the common mistakes made by young Italian students of 2nd grade, the errors made by the students of the two classes were classified as follows: spelling, wrong singular/plural, wrong word(s), missing word(s), wrong word order/verb structure. Due to the different nature of the two tests, the distribution of the errors was different in the two tests, as shown in Table 1.

Table 1. Distribution of the different sources of errors.

                     Closed questions   Open questions
  Wrong spelling     Nd                 30%
  Wrong sing/plur    Nd                 11%
  Wrong word         48%                8%
  Missing word       Nd                 50%
  Wrong word order   52%                1%

Test Results. The results of the test followed the foreseen pattern, even if the differences were less pronounced than expected. Indeed, students in the test group, as we expected, found it slightly more difficult to remember the correct translation of food domain words than the students in the control group, as they were used to searching for those words by trial and error and had not had time to practice recalling the correct translation of a given word without using the tool. Instead, still as expected, they did a lot better than the control group when it came to choosing the correct word order or the correct structure of the sentence. So, in the first test, based on closed questions, the students in the test group got a better overall score (Table 2): even if they did slightly worse at remembering the correct words when they had to choose the correct translation of a food noun among the proposed ones (22% of errors vs 21% in the control group), they only made 27% of errors when they had to choose the correct word order in short sentences, whereas in the control group the error rate for these sentences was 40%.


Table 2. Errors on closed questions (lower is better).

                     Test group   Control group
  Wrong word         22%          21%
  Wrong word order   27%          40%
  GLOBAL             24%          29%

Instead, in the second test, based on open questions, the students in the control group got a better overall score (Table 3).

Table 3. Errors on open questions (lower is better).

                     Test group   Control group
  Wrong spelling     44%          27%
  Wrong sing/plur    15%          15%
  Wrong word         8%           12%
  Missing word       81%          55%
  Wrong word order   0%           1%
  GLOBAL             24%          21%

There are two interesting points to note. The first is that, even if the test group, as expected, could not remember the translations of many words at all (81% of phrases/sentences had at least one missing word), when they did use a word they were more accurate than the control group (only 8% of wrong words compared to 12%). Second, when we do not take into account the features that are not specifically supported by BlockLang (that is, the spelling and rote learning of isolated English words), the difference between the two groups decreases considerably, with even a slight superiority of the test group over the control group, namely 8% vs 9% (Table 4).

Table 4. Errors on open questions (revised). Lower is better.

                     Test group   Control group
  Wrong sing/plur    15%          15%
  Wrong word         8%           12%
  Wrong word order   0%           1%
  GLOBAL             8%           9%

Even if the students in the test group didn’t remember well the translation of each word, we are confident that this can improve when they are using the tool for a longer


time. In any case, the rote memorization of a word is something that cannot be "seen", so it is beyond the objectives of the BlockLang tool. Rote memorization can be achieved through repetition. We have to remember that the students in the test group could only exercise, in total, for less than half the time of the control group (namely 3 h instead of 8 h).

Analysis of the Results. What follows from the results of the two tests is that adding to an educational tool operations reminiscent of block programming improves the performance of the students with respect to the features specifically addressed by the interactive tool. This is true even when, in order to learn how to use a block programming environment, students devote less time to exercising on the specific topic. Before analyzing the possible reasons for this positive outcome, we should note that programming-based learning does not require, for every new topic, a higher number of sessions than standard classroom learning. Once this computer-supported educational methodology has been acquired by the class, part of the time spent in teaching and exercising can be fruitfully replaced by self-exploration of the special-purpose items made available in the tool. The results of the experiment show that even less exercise is not a disadvantage if it is replaced by another kind of manipulative activity that gives the students further insight into what lies behind the specific topic they are studying, whether it is a new way of expressing a sequence of multiplications (as in the exponentiation operation) or previously unknown words and phrases. Whereas a lot of standard exercise (more than 8 h spent translating food domain phrases/sentences) certainly proves effective for a short-term evaluation, we suspect that, as time passes, as also shown in the "Pure Coding" experiment, students will tend to forget the basic elements of the topic that they have learnt just by rote memorization and have not deeply internalized. So, we expect that, after some time, they will not remember well how a noun phrase, a number, an agreement or a verb works in an English sentence. Instead, by assembling the basic "programming" blocks to form a correct sentence and by getting immediate visual feedback, students are forced to internalize the elements that compose an English sentence in a more durable way.

Handling Interactive Objects to Understand and Remember a School Topic. To allow the students to just test their knowledge about English phrases and sentences, we could build an interactive tool that asks them to click each word of the translation in the correct order. A better way of using a visual programming environment is instead to create an interactive model of the topic by modelling interactive objects for every component of the topic. Several different learning strategies are at work in this case (Udomon et al. 2013; Seemüller et al. 2012) to build an interactive virtual model that helps the student improve the recall of the topic. The students must assemble these objects in the correct way, thus adding a manipulative level that allows them to learn how every component relates to the other components. To create the correct phrase/sentence the students then have to know which component must be used to fill a gap and move the component to the right gap.


All these elements are clearly visible in the list of blocks of BlockLang so, for the students, they are tangible objects whose corresponding object in the “real world” can be discovered in the drawings shown on the Stage.

4 Applying Computational Pedagogy to Other Disciplines

The Computational Pedagogy principles discussed in this paper, which allow students to acquire a deeper understanding of exponentiation and of English words and phrases, are in our view not limited to the study of exponentiation or foreign languages. We think that every task that has a "linguistic" structure can be mimicked by building blocks reminiscent of visual programming languages. Just to give an example, a deeper level of text comprehension (a topic currently under continuous investigation at all school levels) could be reached by using block programming to make students understand the underlying meaning of constructs such as "every time that" (a loop), or what it means to use a word like, e.g., eat, which stands for a complex list of steps (namely, opening the mouth, inserting food inside, etc.).

5 Conclusions

In this paper we illustrated the positive outcomes of recent experiments on the introduction of block programming as an effective strategy to improve the performance and the interest of students, even for topics apparently distant from computer programming, such as learning a foreign language. We analyzed the results of several experiments based on using general block programming environments or on designing new block programming environments for specific topics. The devised strategy is not limited to language learning but can be fruitfully applied to further linguistic disciplines.

References

Atkinson, R.C.: Optimizing the learning of a second-language vocabulary. J. Exp. Psychol. 96(1), 124–129 (1972)
Bachelor, R.L., Vaughan, P.M., Wall, C.M.: Exploring the effects of active learning on retaining essential concepts in secondary and junior high classrooms. Dissertation thesis, School of Education, Saint Xavier University, Chicago, Illinois (2012)
Beatty, K.: Teaching & Researching: Computer-Assisted Language Learning. Routledge, Taylor and Francis Group, New York (2013)
Calao, L.A., Moreno-León, J., Correa, H.E., Robles, G.: Developing mathematical thinking with scratch. In: Conole, G., Klobučar, T., Rensing, C., Konert, J., Lavoué, É. (eds.) EC-TEL 2015. LNCS, vol. 9307, pp. 17–27. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24258-3_2
Chang, C.W., Lee, J.H., Chao, P.Y., Wang, C.Y., Chen, G.D.: Exploring the possibility of using humanoid robots as instructional tools for teaching a second language in primary school. J. Educ. Technol. Soc. 13(2), 13–24 (2010)
Costa, S., Gomes, A., Pessoa, T.: Using scratch to teach and learn English as a foreign language in elementary school. Int. J. Educ. Learn. Syst. 1, 207–213 (2016)
Costley, K.C.: The Positive Effects of Technology on Teaching and Student Learning (2014). https://files.eric.ed.gov/fulltext/ED554557.pdf. Accessed 14 Oct 2019
Federici, S.: A minimal, extensible, drag-and-drop implementation of the C programming language. In: Proceedings of SIGITE 2011, West Point, New York, USA (2011)
Federici, S., Gola, E.: BloP: easy creation of Online Integrated Environments to learn custom and standard Programming Languages. In: Proceedings of SIREM-SIEL 2014, 1st Joint SIREM-SIel Conference. The Innovative LEDI Publishing Company (2014)
Federici, S., Gola, E., Brau, D., Zuncheddu, A.: Are educators ready for coding? From students back to teacher: introducing the class to coding the other way round. In: Proceedings of the 7th International Conference on Computer Supported Education, Funchal, Madeira, Portugal, vol. 2, pp. 124–133 (2015)
Federici, S., Medas, C., Gola, E.: Who learns better: achieving long-term knowledge retention by programming-based learning. In: Proceedings of the 10th International Conference on Computer Supported Education, Funchal, Madeira, Portugal, vol. 2, pp. 124–133 (2018)
Federici, S., Molinas, J., Sergi, E., Lussu, R., Gola, E.: Rapid and easy prototyping of multimedia tools for education. In: Proceedings of the 5th World Conference on Media and Mass Communication, Kuala Lumpur, Malaysia, vol. 5, issue 1, pp. 12–24 (2019a)
Federici, S., Sergi, E., Gola, E., Zuncheddu, A.: Easy prototyping of multimedia interactive educational tools for language learning based on block programming. In: Proceedings of the 11th International Conference on Computer Supported Education, Heraklion, Crete, Greece, vol. 2, pp. 140–153 (2019b)
Foerster, K.T.: Integrating programming into the mathematics curriculum: combining scratch and geometry in grades 6 and 7. In: Proceedings of SIGITE 2016, Boston, MA, USA, pp. 91–96 (2016)
Gresse von Wangenheim, C., Cruz Alves, N., Rodrigues, P.E., Hauck, J.C.: Teaching computing in a multidisciplinary way in social studies classes in school – a case study. Int. J. Comput. Sci. Educ. Sch. 1(2) (2017). https://doi.org/10.21585/ijcses.v1i2.9
Han, J.: Robot assisted language learning. Lang. Learn. Technol. 16(3), 1–9 (2012)
Harvey, B., Monig, J.: Bringing "no ceiling" to scratch: can one language serve kids and computer scientists. In: Proceedings of Constructionism 2010, Paris (2010)
Hobenshield Tepylo, D., Floyd, L.: Learning Math Through Coding (2016). http://researchideas.ca/mc/learning-math-through-coding/. Accessed 14 Oct 2019
Kuittinen, M., Sajaniemi, J.: Teaching roles of variables in elementary programming courses. In: Proceedings of the 9th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education, ITiCSE 2004, Leeds, UK (2004)
Kohn, T.: Variable evaluation: an exploration of novice programmers' understanding and common misconceptions. In: Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education, Seattle, Washington, USA, pp. 345–350 (2017)
Levy, M.: Computer-Assisted Language Learning: Context and Contextualisation. Oxford University Press, New York, Oxford (1997)
Lopez, V., Hernandez, M.I.: Scratch as a computational modelling tool for teaching physics. In: Physics Education, vol. 50, no. 3. IOP Publishing Ltd (2015)
Maloney, J., Resnick, M., Rusk, N., Silverman, B., Eastmond, E.: The scratch programming language and environment. ACM Trans. Comput. Educ. 10(4) (2010). https://doi.org/10.1145/1868358.1868363
Miller, J., Larkin, K.: Using coding to promote mathematical thinking with year 2 students: alignment with the Australian curriculum. In: Downton, A., Livy, S., Hall, J. (eds.) 40 Years on: We Are Still Learning! Proceedings of the 40th Annual Conference of the Mathematics Education Research Group of Australasia, Melbourne, Australia, pp. 381–388 (2017)
Moreno-León, J., Robles, G.: Computer programming as an educational tool in the English classroom. In: Proceedings of the IEEE Global Engineering Education Conference (EDUCON 2015), p. 962 (2015)
Pathan, M.M., Aldersi, Z.E.M.: Using games in primary schools for effective grammar teaching: a case study from Sebha. Int. J. Engl. Lang. Transl. Stud. 2(2), 211–227 (2014)
Neri, A., Mich, O., Gerosa, M., Giuliani, D.: The effectiveness of computer assisted pronunciation training for foreign language learning by children. J. Comput. Assist. Lang. Learn. 21(5), 393–408 (2008)
Prince, M.: Does active learning work? A review of the research. Res. J. Eng. Educ. 93(3), 223–231 (2004)
Romagosa, B., Rosenbaum, E., Koschitz, D.: From the Turtle to the Beetle (2016). http://openaccess.uoc.edu/webapps/o2/handle/10609/52807. Accessed 14 Oct 2019
Seemüller, A., Müller, E.M., Rösler, F.: EEG-power and -coherence changes in a unimodal and a crossmodal working memory task with visual and kinesthetic stimuli. Int. J. Psychophysiol. 83, 87–95 (2012)
Udomon, I., Xiong, C., Berns, R., Best, K., Vike, N.: Visual, audio, and kinesthetic effects on memory retention and recall. J. Adv. Student Sci. (JASS) 1, 1–29 (2013)
Warschauer, M., Healey, D.: Computers and language learning: an overview. Lang. Teach. 31, 57–71 (1998)

RefacTutor: An Interactive Tutoring System for Software Refactoring

Thorsten Haendler, Gustaf Neumann, and Fiodor Smirnov

Institute for Information Systems and New Media, Vienna University of Economics and Business (WU Vienna), Vienna, Austria
{thorsten.haendler,gustaf.neumann,fiodor.smirnov}@wu.ac.at

Abstract. While software refactoring is considered important to manage software complexity, it is often perceived as difficult and risky by software developers and thus neglected in practice. In this article, we present refacTutor, an interactive tutoring system for promoting software developers' practical competences in software refactoring. The tutoring system provides immediate feedback to the users regarding the quality of the software design and the functional correctness of the (modified) source code. In particular, after each code modification (refactoring step), the user can review the results of run-time regression tests and compare the actual software design (as-is) with the targeted design (to-be) in order to check quality improvement. For this purpose, structural and behavioral diagrams of the Unified Modeling Language (UML2) representing the as-is software design are automatically reverse-engineered from source code. The to-be UML design diagrams can be pre-specified by the instructor. To demonstrate the technical feasibility of the approach, we provide a browser-based software prototype in Java accompanied by a collection of exercise examples. Moreover, we specify a viewpoint model for software refactoring, allocate exercises to competence levels and describe an exemplary path for teaching and training.

Keywords: Intelligent tutoring system · Software refactoring · Software design · Code visualization · Unified Modeling Language (UML2) · Software-engineering education and training · Interactive training environment

1 Introduction

As a result of time pressure in software projects, priority is often given to implementing new features rather than ensuring code quality [40]. In the long run, this leads to software aging [51] and increased technical debt, with the consequence of increased maintenance costs for the software system (i.e. debt interest) [36]. As studies have found, these costs for software maintenance often account for 80% or even more than 90% of software project costs [16,57]. A popular means for repaying this debt is software refactoring, which aims at improving code quality by restructuring the source code while preserving the external system


behavior [21,49]. Several kinds of flaws (such as code smells) can negatively impact code quality and are thus candidates for software refactoring [3]. Besides kinds of smells that are relatively simple to identify and to refactor (e.g. stylistic code smells), others are more complex and difficult to identify, to assess and to refactor, such as smells in software design and architecture [21,46,66]. Although refactoring is considered important to guarantee that a system remains maintainable and extensible, it is often neglected in practice due to several barriers, among which software developers perceive the difficulty of performing refactoring activities, the risk of introducing errors into a previously correctly working software system, and a lack of adequate tool support [67]. These barriers pose challenges with regard to improving refactoring tools as well as promoting the skills of software developers. In recent years, several tools have been proposed for identifying and assessing bad smells in code via certain metrics and benchmarks mostly based on static analysis (e.g. level of coupling between system elements) and for supporting the planning and application of (sequences of) refactoring techniques, such as JDeodorant [69] and DECOR [43], as well as for measuring and quantifying the impact of bad smells on software quality and project costs in terms of technical debt, such as SonarQube [10] or JArchitect [13]. Despite these advances, the refactoring process is still challenging (also see [27]). For instance, more complex kinds of smells (such as smells on the level of software design and architecture) are covered only moderately by detection tools [17,18]. Moreover, these tools tend to produce false positives [19] (constructs intentionally used by the developer with symptoms similar to smells, such as certain design patterns), which need to be assessed and discarded by developers manually. In addition, the decision of what and how to refactor also depends on domain knowledge (e.g. regarding design rationale) or the project schedule, for which human expertise is required, as provided by software architects and project managers [52]. Due to these issues, the refactoring process is still mostly performed without tool support [45] and thus requires developers to be competent. In turn, so far only little attention has been paid in research to education and training in the field of software refactoring. Besides textbooks with best practices and rules on how to identify and to remove smells via refactoring, e.g. [21], there are only a few approaches (see Sect. 5) that aim at mediating or improving software developers' practical and higher-level competences such as application, analysis and evaluation according to Bloom's taxonomy of cognitive learning objectives; see, e.g. [8,35]. In the conference paper presented at CSEDU-2019 [30], we proposed an approach for an interactive learning environment for mediating and improving practical competences in software refactoring. The basic idea of the developed tutoring system is to foster active learning by providing instant feedback to user activities (cf. [9,22]). In particular, the tutoring system provides immediate feedback and decision support to the user regarding aspects relevant to software refactoring, i.e. both the quality of the software design and the functional correctness of the source code modified by the user (see Fig. 1).
After each refactoring step, the user can review the results of run-time regression tests (e.g.


Fig. 1. Exercise interaction supported by refacTutor [30].

pre-specified by the instructor; in order to check that no error has been introduced) and compare the actual software design (as-is) with the intended design (to-be; in order to check quality improvement). For this purpose, structural and behavioral software-design diagrams of the Unified Modeling Language (UML2) [47] representing the as-is software design are automatically reverse-engineered from the source code. The to-be design diagrams (also in UML) can be pre-specified by the instructor. UML is the de-facto standard for modeling and documenting structural and behavioral aspects of software design. There is evidence that UML design diagrams are beneficial for understanding software design (issues) [5,25,56]. This way, the approach primarily addresses the refactoring of issues on the level of software design and architecture. Moreover, the code visualization supports building and questioning users' mental models, see, e.g. [11,24]. In general, the approach aims to support learners in actively applying knowledge (previously consumed on a conceptual level) and in reflecting on their own problem-solving skills [42]. This post-conference revision of the CSEDU-2019 publication [30] incorporates important extensions, also in response to feedback by reviewers and conference attendees. In particular, this article includes the following additional contributions: (1) We elaborate on the conceptual foundations of the tutoring system, e.g. by specifying a viewpoint model for software refactoring (see Sect. 2). (2) We provide a set of examples for refactoring exercises (available for download; e.g. to be used with the developed software prototype; see Sect. 3). (3) We analyze and allocate the competence levels (in terms of prerequisite and target competences) addressed by training exercises according to Bloom's revised taxonomy of educational objectives (see Sect. 4). (4) We extend the discussion of related work, e.g. by including recently published approaches (see Sect. 5). The applicability of the proposed tutoring system is demonstrated in the following two ways. On the one hand, we introduce a proof-of-concept implementation in Java¹ in terms of a browser-based development environment. Its

¹ See Sect. 3. The software prototype is available for download from http://refactoringgames.com/refactutor.


functionality is illustrated via a detailed exercise example. On the other hand, we reflect on the usability for teaching and training refactoring. For this purpose, we describe exemplary types of exercises, allocate addressed prerequisite and target competences and sketch a path for learning and training software refactoring.

Structure. The remainder of this article is structured as follows. In Sect. 2, the applied conceptual framework of the interactive tutoring system (including interactive learning and the applied views for software refactoring) is elaborated in detail. In Sect. 3, we introduce refacTutor. In particular, after an overview of components and activities, we present a software prototype in Java including a detailed exercise example as well as a collection of further examples. Then, Sect. 4 illustrates the usability of the tutoring system in the learning context, i.e. by describing two exercise scenarios (i.e. design refactoring and test-driven development), allocating addressed competence levels and drawing an exemplary learning and training path. In Sect. 5, related approaches (e.g. approaches for teaching software refactoring and tutoring systems) are discussed. Section 6 reflects on limitations and further potential of the approach and Sect. 7 concludes the paper.

2 Interactions in (Training) Software Refactoring

In this section, we describe the applied conceptual framework of the interactive tutoring system in terms of interactive learning (see Sect. 2.1) and views on the software system relevant for software refactoring (see Sect. 2.2).

2.1 Interactive Learning

Active learning focuses on mediating practical competences by actively involving and engaging learners [22]. This way, it strongly differs from teacher-centered approaches, where learners mainly have to consume passively. This active participation, however, requires the learner to reflect on their problem-solving skills and at the same time to apply the knowledge previously imparted [42]. In general, a challenge can be seen in stimulating learners to engage actively in the learning process. One key to this is to provide intelligent interactions in the form of immediate feedback, which can either be based on social interaction (e.g., with other learners) or computer-aided and automated, as provided by intelligent tutoring systems [38,59], which enable the learner to immediately reflect on and, if necessary, improve the consequences and effects of a performed action. Figure 2 depicts a generic cyclic workflow of decision making including double-loop learning (left-hand side) [4,27] applied to software refactoring exercises (right-hand side).² The exercise interaction represents a cycle of analysis and actions performed by the user, which is built on immediate feedback provided by the tutoring system. Driven by the specific task (problem recognition) and

² This workflow is supported by refacTutor; see Sect. 3 and Fig. 4 in particular.


Fig. 2. Generic cycle for decision making including double-loop learning (left-hand side; see [4, 27]) applied to refactoring exercises (right-hand side; oriented to [30]).

based on the user's mental model, the user analyses the task and investigates the provided views on the software system. After the analysis, the user identifies options and plans for refactoring (decision making). In this course, the user selects a certain option for code modification and plans the particular steps for performing it (implementation). After the code has been modified, the user analyzes the feedback given by the tutoring system (evaluation). This workflow supports double-loop learning [4]. Through previous learning and practice experiences, the users have built a certain mental model [11] of performing software refactoring, on which the analysis and the decisions for planned modifications are built (first loop), e.g. how to modify the code structure in order to remove the code or design smell. After modification, the feedback (e.g. in terms of the test result or the delta between the as-is and the to-be design diagrams) impacts the user's mental model and allows for scrutinizing and adapting the decision-making rules (second loop). The following section investigates the views and feedback mechanisms relevant to software refactoring in particular.

2.2 Views on Software Refactoring

Here we investigate the refactoring process as well as the relevant feedback for decision-making and learning in software refactoring.


Refactoring Process. Performing software refactoring is generally driven by the goal of improving the internal quality of the source code, e.g. regarding its maintainability or extensibility [49]. For this purpose, the code is modified while the external behavior of the system should remain unaffected. A typical risk, which often prevents refactoring activities, is introducing errors into a correctly working software system [45]. The process of software refactoring can be roughly divided into the following two activities (see [27]): (1) identifying and assessing opportunities for refactoring (such as bad code smells and other technical-debt items [36]), and (2) planning and applying refactoring techniques to remove the bad smells [21]. In the following, we focus on the actual process of removing the smelly structures by modifying the source code (i.e. activity 2). During software refactoring, the system under analysis (SUA) can take different system states (see top diagram in Fig. 3). The states can be classified into the Initial State (before refactoring), Transitional States (during the refactoring) and Final State(s) (representing the system after refactoring). In particular, the Initial State represents the software system before refactoring, i.e. with correct functionality (tests passing), but including a certain identified refactoring opportunity (bad smell), such as a CyclicDependency [21,66] (see Fig. 3). In order to remove this smell, multiple refactoring options exist, which also depend on the characteristics of the concrete smell instance. In general, as already mentioned above, for more complex bad smells multiple options and sequences for achieving a desired design can exist, such as for some types of software-design smells [66]. This way, the options and steps of possible modifications can be represented via a graph structure (i.e. modifications as edges, states as nodes). The performance of one user applying several refactoring techniques (steps) then traces a concrete path through this graph. However, in [66] the following four options are distinguished for removing a CyclicDependency smell:

(a) ExtractClass, i.e. introducing a new class containing the methods and fields that introduce the dependency.
(b) Multiple MoveMethod (and/or MoveField) refactorings to move the methods or fields that introduce the cyclic dependency to one of the other participating classes.
(c) InlineClass to merge the two dependent classes into one class.
(d) Multiple ExtractMethod and MoveMethod (as a refinement of (a)) to remove only the code fragment that introduces the dependency from the including methods and then move it accordingly.

For further details on the refactoring techniques or bad smells, please see e.g. [21,66]. After each code modification (e.g. a MoveMethod refactoring), the software system changes its state (see Fig. 3). Besides the impact on the smelly code structure, the code modification can also introduce errors resulting in failing tests. However, every actual or possible modification performed on a certain state then leads to a certain following state. In case it is planned to apply
further refactoring techniques, this state is described as a Transitional State (see e.g. the bottom row in Fig. 3). After the planned modifications have been performed, the Final State of the system is reached. In particular, we distinguish between failing (F) and passing (P) final states. Failing states represent states that do not conform to other quality requirements (especially functional requirements; i.e. the tests fail). Passing states (P), in turn, fulfill the defined quality standards. For example, when performing the InlineClass refactoring for removing a CyclicDependency (see the third row in the top diagram in Fig. 3), the methods calling the moved methods or fields of the merged classes might no longer resolve their target. In order to pass the tests, these dependencies need to be updated by applying another code modification (e.g. adapting the target class).
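To make this more concrete, the following sketch shows a minimal CyclicDependency between two classes and its removal via option (b), a MoveMethod refactoring. The example is our own illustration; class and method names are invented and are not taken from [66] or from the tutoring system.

// Before refactoring: Invoice references Order and Order calls back into
// Invoice, so the two classes form a dependency cycle.
class Invoice {
    Order order;                              // Invoice -> Order

    double taxFor(Order o) {                  // the method that closes the cycle
        return o.netAmount() * 0.2;
    }
}

class Order {
    double net;

    double netAmount() { return net; }

    double grossAmount(Invoice invoice) {
        return netAmount() + invoice.taxFor(this);   // Order -> Invoice
    }
}

// After a MoveMethod refactoring: taxFor has been moved into Order, Order no
// longer references Invoice, and the cycle is broken.
class OrderAfterRefactoring {
    double net;

    double netAmount() { return net; }

    double taxAmount() { return netAmount() * 0.2; } // moved from Invoice

    double grossAmount() { return netAmount() + taxAmount(); }
}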

Fig. 3. Options and paths for refactoring an exemplary bad smell, i.e. a CyclicDependency (top; based on the refactoring options described in [66]), and the applied viewpoint model for software refactoring represented as a UML class diagram (bottom; extending the viewpoint model in [30]).

A Viewpoint Model. In addition to the procedural aspects reflected above, certain views on (the states of) a software system are important for software refactoring, which can be derived from the definition of software refactoring;
i.e. refactoring as a restructuring of the source code driven by the objective to improve the system’s internal quality (e.g. in terms of maintainability or extensibility) without changing the system’s external behavior [21,49]. Oriented to this definition, certain views on the software system can be considered relevant for software refactoring, which can be structured in a viewpoint model. Viewpoint models (also regarded as view models or viewpoint frameworks [41]) are a means to structure a set of coherent (often concurrent) views on a software system. Each view represents the system from the perspective of one or multiple concerns held by stakeholders of the system [12]. They are especially popular in software architecture for constructing and maintaining a software system [37,61]. In addition, viewpoint models are also applied in other activities related to software engineering such as requirements engineering [14,63] and process management and modeling [62]. Oriented to and as a complement to these viewpoint models, we propose and apply a viewpoint model for software refactoring that includes the following five views on (states of) a software system: source code, run-time tests, structural software design, behavioral software design, and artifact quality. The bottom diagram in Fig. 3 (see above) depicts the structure of the views and the relations between them in terms of a viewpoint model represented as a class diagram of the Unified Modeling Language [47]. The included views with motivating concerns (and, optionally, analysis tools to obtain the views) are described below (also see Fig. 3).

A. The Source Code represents the basis for the other views and is the target of refactoring activities, i.e. it is analyzed and modified by the software developer.

B. Run-Time Tests are a popular means to specify the functional requirements for the source code. Thus, they are often used in software projects in terms of regression tests to ensure the functional correctness of the source code, i.e. that no error has been introduced by the code modifications. For automating the tests, certain test frameworks such as xUnit or scenario-based tests are available. Feedback is then returned to the user in terms of the test result.

C. Design Structure reflects the structure of the source code. A popular means for documenting the design structure are UML class diagrams [47], which can either be created manually or derived (a.k.a. reverse-engineered) automatically by using tools or techniques based on static-code analysis (cf. [70]). The diagrams can support software engineers (and other stakeholders) in getting a better overview of the logical structure of the software system. As studies have found, UML design diagrams are especially beneficial for understanding software design (issues) [5,25,56]. In general, the diagrams can either represent the as-is design (reflecting the current state) or the to-be design (defining an intended or targeted software design).

D. Design Behavior represents behavioral aspects of the software design (e.g. interactions or processes). Here, UML diagrams also represent the de-facto standard for specification. For describing interactions on the level of software design, UML sequence diagrams [47] are popular. For automatically deriving (a.k.a. reverse-engineering) the behavioral design diagrams representing the as-is software design of the source code, (automatic) static and dynamic analysis techniques can be leveraged; see, e.g. [31,34,53]. In particular, there is evidence that for identifying and assessing software-design smells (such as Abstraction or Modularization smells), a combination of UML class diagrams (providing inter-class couplings evoked by method-call dependencies) and sequence diagrams (representing run-time scenarios) is helpful [25].

E. Orthogonally to the views above, which also represent concrete artifacts, the Quality view reflects further quality aspects such as the Maintainability and the Extensibility of the software artifacts, which can, e.g., concretely manifest via the level of code complexity. For instance, analysis tools such as software-quality analyzers (e.g. SonarQube [10]) apply certain code metrics to measure the software quality. Thus, they can also be applied to identify quality issues (such as smells) in the source code or software design. Moreover, these metrics can be used to measure and quantify the technical-debt score [36] (e.g. as person hours required to repay the debt).

The proposed tutoring system provides these views, conforming to the viewpoint model depicted in Fig. 3 (bottom), as immediate feedback (and decision support) on code modifications performed by the user. Further details on the feedback mechanisms and automatic-analysis techniques are provided in Sect. 3.
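As a minimal illustration of how these five views could be bundled per system state, consider the following Java sketch. It is our own illustration (not part of refacTutor); all type and field names are invented.

import java.util.List;

// One snapshot of the system under analysis, bundling feedback for the five views.
class RefactoringStateView {
    String sourceCode;              // view A: source code
    boolean testsPassing;           // view B: run-time tests
    String asIsClassDiagram;        // view C: design structure (e.g. PlantUML text)
    String asIsSequenceDiagram;     // view D: design behavior (e.g. PlantUML text)
    List<String> qualityIssues;     // view E: quality (e.g. reported smells or metric violations)

    // Passing final state: tests pass and the as-is design matches the to-be design.
    boolean isPassingFinalState(String toBeClassDiagram) {
        return testsPassing && asIsClassDiagram.equals(toBeClassDiagram);
    }
}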

3 RefacTutor

In this section, we introduce our tutoring system refacTutor. First, a conceptual overview of the components and activities is given (see Sect. 3.1). Then we present a software prototype in Java, detail the technologies used, the diagram-derivation techniques and the GUI, and illustrate a detailed exercise example (see Sect. 3.2). Finally, we provide a collection of exercise examples, which are available for download (see Sect. 3.3).

3.1 Conceptual Overview of refacTutor

Figure 4 depicts a conceptual overview of the tutoring system in terms of the technical components and artifacts as well as of the activities performed by instructor and user. At first, the instructor prepares the exercise. For this purpose, she specifies a task, provides (or re-uses existing) source code and test script, and specifies the expected (to-be) software design (see step (1) in Fig. 4). The user’s exercise workflow then starts by analyzing the task description (step (2)). After each code modification (step (3)), refacTutor provides automated feedback in terms of several kinds of views on the software system (see steps (4) to (6)), conforming to the proposed viewpoint model specified in Fig. 3. For this purpose, refacTutor integrates several analysis tools, which automatically analyze the modified source code and produce corresponding views. In particular, these analyzers consist of a test framework for checking the functional correctness (providing the test result), a quality analyzer that examines the code using pre-configured quality metrics (providing hints on quality issues), and a diagram builder that automatically derives (reverse-engineers) diagrams from source code and reflects the current as-is software design. This way, the tutoring system supports the cyclic exercise workflow as defined in Fig. 2 by providing immediate feedback on the user’s code modifications (see steps (3) to (6)).
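As an illustration of the first of these analyzers, the following minimal Java sketch shows how a JUnit 4 test run could be turned into a simple textual test report. It is our own simplification for illustration only; class and method names are invented and this is not refacTutor’s actual implementation.

import org.junit.runner.JUnitCore;
import org.junit.runner.Result;
import org.junit.runner.notification.Failure;

public class TestFeedbackSketch {

    // Runs the given JUnit 4 test class and returns a simple textual report.
    static String runAndReport(Class<?> testClass) {
        Result result = JUnitCore.runClasses(testClass);
        StringBuilder report = new StringBuilder(
                result.getRunCount() + " test cases were executed ("
                + result.getRunTime() / 1000.0 + " s)\n");
        for (Failure failure : result.getFailures()) {
            report.append("Failed: ")
                  .append(failure.getDescription().getMethodName())
                  .append(" - ").append(failure.getMessage()).append("\n");
        }
        report.append(result.wasSuccessful() ? "All tests passed." : "Some tests failed.");
        return report.toString();
    }
}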

Fig. 4. Overview of components and activities provided by refacTutor [30].

3.2 Software Prototype

In order to demonstrate the technical feasibility, we introduce a prototype implementation of refacTutor that realizes the key aspects of the infrastructure presented in Fig. 4. The software prototype is available for download from http://refactoringgames.com/refactutor. In the following, the technologies used, the derivation of UML class diagrams reflecting the as-is design, the graphical user interface (GUI), and an exercise example are explained in detail.

Used Technologies. We implemented the software prototype as a browser-based application using Java Enterprise Edition (EE), where the source code can be modified by a client without the need to install any additional software locally. Java is also the supported language for the software-refactoring exercises. As the editor for code modification, the Ace Editor [2] has been integrated, a flexible browser-based code editor written in JavaScript. After each test run, the entered code is passed to the server for further processing. For compiling the Java code, we applied InMemoryJavaCompiler [68], a GitHub repository that provides a set of utility classes allowing compilation of Java sources in memory. The compiled classes are then analyzed using Java Reflection [20] in order to extract information on the code structure. The correct behavior is verified via JUnit tests [23]. Relevant exercise information is stored in XML files, including the task description, source code, test script, and information on UML diagrams (see below).

Derivation of Design Diagrams. For automatically creating the as-is design diagrams, the extracted information is transferred to an integrated diagram editor. PlantUML [54] is an open-source, Java-based UML diagram editor that accepts plain text in a simple DSL for creating graphical UML diagrams (such as class and sequence diagrams), which can then be exported as PNG or SVG files. PlantUML also manages the efficient and aesthetic composition of diagram elements. Listing 1 presents exemplary source code to be refactored. The corresponding as-is design diagram (reflecting the current state of the code), in terms of a UML class diagram, is depicted in Fig. 5. In general, the automatically derived class diagrams provide associations in terms of references to other classes (for example, see (1) in the diagram in Fig. 5 and line 4 in the listing). Generalizations represent is-a relationships (between sub-classes and super-classes); see (2) in Fig. 5 and line 11 in Listing 1. Moreover, the derived class diagrams also provide dependencies between classes (see (3)). Dependencies can appear as call or usage dependencies representing inter-class calls of attributes or methods from within a method of another class (see lines 19 and 20 in Listing 1). For deriving UML sequence diagrams, we apply dynamic reverse-engineering techniques based on the execution traces triggered by the run-time tests, as already described and demonstrated e.g. in [31,32]. As already explained above (see Sect. 2.2), the identification and assessment of issues in software design or architecture can be improved by consulting a combination of UML class and sequence diagrams [25].
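To give an impression of what such a derivation can look like, the following minimal sketch uses Java Reflection to emit PlantUML’s text DSL for classes, generalizations, and field-based associations. It is our own simplification (not refacTutor’s actual diagram builder) and ignores dependencies, visibility, and many other details.

import java.lang.reflect.Field;

public class ClassDiagramSketch {

    // Builds a PlantUML class-diagram description for the given (already compiled) classes.
    static String toPlantUml(Class<?>... classes) {
        StringBuilder uml = new StringBuilder("@startuml\n");
        for (Class<?> c : classes) {
            uml.append("class ").append(c.getSimpleName()).append("\n");
            // Generalization: sub-class --|> super-class
            Class<?> sup = c.getSuperclass();
            if (sup != null && sup != Object.class) {
                uml.append(c.getSimpleName()).append(" --|> ")
                   .append(sup.getSimpleName()).append("\n");
            }
            // Association: field typed by another class of the analyzed set
            for (Field f : c.getDeclaredFields()) {
                for (Class<?> other : classes) {
                    if (f.getType() == other) {
                        uml.append(c.getSimpleName()).append(" --> ")
                           .append(other.getSimpleName()).append("\n");
                    }
                }
            }
        }
        return uml.append("@enduml").toString();
    }
}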

Listing 1. Exemplary Java code fragment for a refactoring task.

1  public class BankAccount {
2      private Integer number;
3      private Double balance;
4      private Limit limit;                        // (1)
5      public BankAccount(Integer number, Double balance) {
6          this.number = number;
7          this.balance = balance;
8      }
9      [...]
10 }
11 public class CheckingAccount extends BankAccount {   // (2)
12     [...]
13 }
14 public class Limit {
15     [...]
16 }
17 public class Transaction {
18     public Boolean transfer(Integer senderNumber, Integer receiverNumber, Double amount) {
19         BankAccount sender = new BankAccount(123456);   // (3)
20         BankAccount receiver = new BankAccount(234567);
21         if (sender.getBalance() >= amount) {
22             receiver.increase(amount);
23             sender.decrease(amount);
24             return true;
25         } else { return false; }
26     }
27 }

Fig. 5. Exemplary UML class diagram representing the as-is design automatically derived from source code (in Listing 1) and visualized using PlantUML.

Graphical User Interface. As outlined in the overview depicted in Fig. 4, the prototype provides GUI perspectives both for instructors to configure and prepare exercises and for users to actually perform the exercises. Each role has a specific browser-based dashboard with multiple views (a.k.a. widgets) on different artifacts (as described in Fig. 4). Figure 6 depicts the user’s perspective with the provided views and an exercise example (which is detailed in the remainder of this section). In particular, the user’s perspective comprises a view on the task description (as defined by the instructor; see (1) in Fig. 6). In (2), the test scripts of the (tabbed) JUnit test cases are presented. The console output in (3) reports on the test result (and additional hints as prepared by the instructor or provided by additional analysis tools, e.g. technical-debt analyzers [10]). The code editor (see (4) in Fig. 6) provides the source code to be modified by the user (with tabs for multiple source files to deal with larger source-code examples). Via the button in (5), the user can trigger the execution of the run-time tests. For both the to-be design (defined by the instructor beforehand; see (6)) and the actual as-is design (automatically derived during and after test execution and reflecting the state of the source code; see (7)), tabs for class and sequence diagrams are provided. The teacher’s perspective is quite similar. In addition to the views of the user’s perspective, it provides an editor for specifying the to-be diagrams as well as means to save or load exercises.

Exercise Example. In the following, an example of a refactoring exercise applying the refacTutor prototype is illustrated in detail. In particular, the source code, the run-time tests, the task to be performed by the user, and the to-be design diagrams (as prepared or reused by the instructor), as well as the as-is diagrams, are presented.

Source Code. For this exemplary exercise, the following Java code fragment (containing basic classes and functionality for a simple banking application) has been prepared by the instructor (see Listing 2 and also the editor (4) in Fig. 6).

Listing 2. Java source code.

public class BankAccount {
    private Integer number;
    private Double balance;
    private Limit limit;

    public BankAccount(Integer number, Double balance) {
        this.number = number;
        this.balance = balance;
    }

    public Integer getNumber() { return this.number; }

    public Double getBalance() { return this.balance; }

    public void increase(Double amount) { this.balance += amount; }

    public void decrease(Double amount) { this.balance -= amount; }
}

public class Limit {
    private Double amount;

    public Double getAmount() { return this.amount; }
}

public class Transaction {
    public Boolean transfer(Integer senderNumber, Integer receiverNumber, Double amount) {
        BankAccount sender = new BankAccount(123456, 2000.00);
        BankAccount receiver = new BankAccount(234567, 150.00);
        if (sender.getBalance() >= amount) {
            receiver.increase(amount);
            sender.decrease(amount);
            return true;
        } else {
            return false;
        }
    }
}

Fig. 6. Screenshot of user’s perspective with views provided by refacTutor [30].


Tests. In this example, four JUnit test cases have been specified by the instructor (see Listing 3 and also the views (2) and (3) in Fig. 6). Two of these cases fail at the beginning (initial state).

Listing 3. Test result.

4 test cases were executed (0.264 s)
Test case 1 "transfer_default" failed.
Test case 2 "transfer_limitExceeded" failed.
Test case 3 "checkBalance_default" passed.
Test case 4 "accountConstructor_default" passed.

One of these failing cases is transfer_default, which is specified in the script shown in Listing 4.

Listing 4. Test case 1 transfer_default.

@Test
public void transfer_default() {
    BankAccount accountS = new BankAccount(234573201, 2000.00);
    BankAccount accountR = new BankAccount(173948725, 1500.00);
    Transaction trans = new Transaction(accountS, accountR);
    assertTrue(trans.transfer(300.00));
    assertEquals(1700.00, accountS.getBalance(), 0.001);
    assertEquals(1800.00, accountR.getBalance(), 0.001);
}

Task. For this exemplary exercise, the user is confronted with the task presented in Listing 5 (also see (1) in Fig. 6).

Listing 5. Task description.

Modify the given code example so that all tests pass and the given design (as-is) matches the targeted design (to-be). In particular, this includes the following modifications:
(1) Modify class "Transaction" according to the following aspects:
    (a) Add attributes "sender" and "receiver", both typed by class "BankAccount", to class "Transaction".
    (b) Add a parametrized constructor with parameters "sender" and "receiver".
    (c) Modify method "transfer" to only comprise one parameter ("amount" of type Double).
(2) Add a class "CheckingAccount", which should be a subclass of "BankAccount".
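For orientation, one possible final state of the code after performing this task could look like the following sketch. This is our own illustrative solution, not an official sample solution shipped with refacTutor.

// (1a)-(1c): Transaction with sender/receiver attributes, a parametrized
// constructor, and a single-parameter transfer method.
public class Transaction {
    private BankAccount sender;
    private BankAccount receiver;

    public Transaction(BankAccount sender, BankAccount receiver) {
        this.sender = sender;
        this.receiver = receiver;
    }

    public Boolean transfer(Double amount) {
        if (sender.getBalance() >= amount) {
            receiver.increase(amount);
            sender.decrease(amount);
            return true;
        } else {
            return false;
        }
    }
}

// (2): New subclass of BankAccount.
public class CheckingAccount extends BankAccount {
    public CheckingAccount(Integer number, Double balance) {
        super(number, balance);
    }
}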

Design Diagrams. In the course of the exercise (i.e. after each code modification), the user can review the diagrams reflecting the actual design derived from the code and compare them with the targeted design diagrams (see Fig. 7; also see (6) and (7) in Fig. 6).

3.3 Example Collection

From a didactic perspective, instructors planning a training unit face the challenge of putting together appropriate refactoring exercises consisting of code fragments whose smells also manifest in corresponding UML diagrams. For this purpose, we have prepared a set of exercise examples in Java, which can be used with the refacTutor prototype. The examples are available for download from our website (http://refactoringgames.com/refactutor). In particular, the examples are oriented to the catalog of software-design smells investigated in [25], where we analyzed how 14 kinds of software-design smells can be identified (and represented) in UML class and sequence diagrams. The provided set of examples can be extended, adapted, or combined for creating individual refactoring exercises, e.g. for a certain application domain.

Fig. 7. Contrasting the as-is software design (automatically derived from source code; left-hand side) and the to-be software design (targeted design specified by the instructor; right-hand side), both represented as UML class diagrams.

4 Application Scenarios and Addressed Competences

In this section, we illustrate scenarios for applying the tutoring system for teaching and training software refactoring. In particular, two exemplary exercise types are described in detail (see Sect. 4.1). Then, we allocate the competences addressed by the exercises to levels and categories according to Bloom’s revised taxonomy and draw an exemplary path for teaching and training software refactoring (see Sect. 4.2).

4.1 Exemplary Exercise Types

In order to illustrate the applicability of the proposed tutoring system for teaching and training software refactoring, two possible exercise types are described in detail (besides these two types, multiple other scenarios can be designed by varying the values of the exercise aspects and views; see Table 1). In Table 1, exemplary tasks with corresponding selected options of views and other exercise aspects are specified for the two exercise types, i.e. (a) design refactoring and (b) test-driven development (TDD).

Table 1. Exemplary exercise types: (a) design refactoring and (b) test-driven development (extends the exercise specification provided in [30]).

Aspects and Views | Design Refactoring | Test-Driven Development (TDD)
Task | Refactor the given smelly code fragment in order to achieve the defined to-be design | Implement the given to-be design in code so that also the given run-time tests pass
Prerequisite Competence | Understanding, applying and analyzing conceptual knowledge on refactoring options for given smell types (as well as on notations and meaning of UML diagrams) | Understanding, applying and analyzing conceptual knowledge on refactoring options for given smell types (as well as on notations and meaning of UML diagrams)
Target Competence | Analyze the software design and plan and perform refactoring steps for removing a given smell (applying and analyzing procedural knowledge) | Apply the red-green-refactor steps (to realize the defined to-be design and test behavior), which includes the application and analysis of procedural knowledge
Code (initial state) | Code including software-design smells [66] | No code given
As-is design (initial state) | Representing software-design smells | No as-is design available
As-is design (passing final state) | Conforming to to-be design | Conforming to to-be design
Tests | Tests pass in the initial state and in the passing final state | Tests fail in the initial state, but pass in the passing final state
Assessment | As-is and to-be design are identical and all tests pass | As-is and to-be design are identical and all tests pass

Design Refactoring. First, consider an instructor who aims at fostering the practical competences of analyzing the software design and performing refactoring steps (i.e. application and analysis; see the target competences in Table 1). In this case, a possible exercise can be realized by presenting a piece of source code that behaves as intended (i.e. the run-time tests pass), but with a smelly design structure (i.e. as-is design and to-be design differ). The user’s task then is to refactor the given code in order to realize the specified targeted design. As an important prerequisite, the user needs to already know (theoretically) the rules for refactoring and be able to analyze UML diagrams. At the beginning (initial state), the tests pass, but code and design are smelly by containing, e.g., a MultifacetedAbstraction smell [66]. During the (path of) refactorings (e.g. by applying corresponding ExtractClass and/or MoveMethod/MoveField refactorings [21]), the values of the views can differ (box in the middle). Characteristic for a non-passing final state or the transitional states is that at least one view (e.g. test, design structure or behavior) does not meet the requirements. In turn, a passing final state fulfills the requirements for all view values.

Test-Driven Development. Another exercise type is represented by test-driven development. Consider the situation that the user shall improve her competences in performing the red-green-refactor cycle typical for test-driven development [7], which has also been identified as challenging for teaching and training [44]. This cycle includes the activities of specifying and running tests that reflect the intended behavior (which at first do not pass; red), then implementing (extending) the code to pass the tests (green), and finally modifying the code in order to improve the design quality (refactor). For this purpose, corresponding run-time tests and the intended to-be design are prepared by the instructor. The user’s task then is to realize the intended behavior and design by implementing and modifying the source code.
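For readers unfamiliar with the cycle, the following minimal sketch illustrates the red and green steps on an invented example (it is not one of the exercises provided with refacTutor): the test is written first and fails (red), the minimal implementation makes it pass (green), and the code is subsequently cleaned up without touching the test (refactor).

import static org.junit.Assert.assertEquals;
import org.junit.Test;

// green: minimal implementation added after the first (red) test run
class PriceCalculator {
    double gross(double net) {
        return net * 1.2;
    }
}

// red: this test is written first and fails while gross() is not yet implemented
public class PriceCalculatorTest {
    @Test
    public void gross_addsTwentyPercentTax() {
        assertEquals(120.0, new PriceCalculator().gross(100.0), 0.001);
    }
}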

4.2 Competence-Based Learning Path

In the following, we allocate the competence levels addressed by exercises of selected teaching and training environments and describe an exemplary competence-based path for teaching and training software refactoring.

Addressed Competences. In [28], we have applied Bloom’s revised taxonomy of educational objectives [35] to software refactoring. In this course, we have specified the corresponding knowledge categories and cognitive-process levels for the two main activities of software refactoring (i.e. A: identifying refactoring candidates and B: planning and performing refactoring steps; see above), which then allows us to precisely allocate the competence levels addressed by refactoring exercises. In particular, the corresponding prerequisite competences (required to perform an activity) and target competences (a.k.a. learning objectives; see [50]) can be allocated. Figure 8 depicts a two-dimensional matrix correlating the four knowledge categories (vertical) and six cognitive-process levels (horizontal) according to Bloom’s revised taxonomy [35]. For example, the exercise types presented above (see Sect. 4.1) can be classified as follows:
– Prerequisite Competences: For both exercises, conceptual knowledge on applying refactoring techniques is required. In particular: understanding, applying and analyzing the concepts for performing refactoring steps (such as refactoring options and paths on a conceptual level; i.e. knowledge category II and process levels 2/3/4 for activity B; also see PC_Tutor in Fig. 8).
– Target Competences: In turn, the aim of the exercises is to mediate procedural knowledge on how to actually perform refactoring techniques (on the levels of understanding, application and analysis; i.e. knowledge category III and process levels 2/3/4 for activity B; also see TC_Tutor in Fig. 8).


Fig. 8. Prerequisite competences (PC) and target competences (TC) of exercises provided by selected teaching and training environments, i.e. a card game (referred to as Card) [26], refacTutor (referred to as Tutor), and a serious game (referred to as Game) [29], allocated to knowledge categories (I–IV; vertical) and cognitive-process levels (1–6; horizontal) of Bloom’s revised taxonomy of educational objectives [35], applied to the identification and assessment of candidates (activity A; top rows) as well as to planning and applying refactoring techniques (activity B; bottom rows). For further details on the competence framework, see [28].

Learning and Training Path. In addition to the competences addressed by the tutoring system, the matrix in Fig. 8 also presents the competences addressed by exercises provided by other learning and training environments, i.e. a non-digital card game (referred to as Card) [26] as well as a digital serious game (referred to as Game) [29]. The allocation of competences shows that by combining the three training environments, a large part of the competences for activity B is covered. Moreover, these environments could be used to describe a path from lower-level competences addressed by the card game (e.g. understanding of conceptual knowledge), via the tutoring system covering practical aspects of refactoring (e.g. applying procedural knowledge), to more reflexive and higher-level competences (e.g. analysis of meta-cognitive knowledge, such as strategies for refactoring) addressed by the serious game. In turn, it can also be seen that activity A (i.e. the identification of refactoring opportunities) is barely addressed by these three environments. For this purpose, other environments could be applied, for example quizzes (e.g. with multiple-choice questions) addressing the lower levels of factual and conceptual knowledge for activity A, or capstone projects [6] focusing on the development of a smell-detection tool to address the levels of evaluation and creation of meta-cognitive knowledge (see the cells colored grey in Fig. 8).

5 Related Work

Research related to our tutoring system for software refactoring can be roughly divided into (1) interactive tutoring systems for software development, especially those leveraging program visualization (in terms of UML diagrams) and (2) approaches for teaching software refactoring, especially with focus on software design.

5.1 Tutoring Systems for Software Development

Interactive learning environments in terms of editor-based web applications such as Codecademy [58] are popular nowadays for learning programming. These tutoring systems provide learning paths for accomplishing practical competences in selected programming aspects. They motivate learners via rewards and document the achieved learning progress. Only very few tutoring systems can be identified that address software refactoring; see, e.g. [55]. In particular, Sandalski et al. present an analysis assistant that provides intelligent decision support and feedback very similar to a refactoring recommendation system; see, e.g. [69]. It identifies and highlights simple code-smell candidates. After the user’s code modification, it reacts by proposing (better) refactoring options. Related are also tutoring environments that present interactive feedback in terms of code and design visualization; for an overview, see, e.g. [64]. Only a few of these approaches provide the reverse-engineering of UML diagrams, such as JAVAVIS [48] or BlueJ [33]. However, existing approaches do not target refactoring exercises. For instance, they do not allow for comparing the actual design (as-is) and the targeted design (to-be), e.g. as defined by the instructor, especially not in terms of UML class diagrams. Moreover, tutoring systems barely provide integrated software-behavior evaluation in terms of regression tests (pre-specified by the instructor). In addition, Krusche and Seitz propose an approach for a tutoring system providing immediate feedback, applied in the framework of MOOCs for teaching and training software engineering [38]. In particular, the environment visualizes errors by highlighting the affected parts of the code in a reverse-engineered UML class diagram. By providing the results of run-time tests and program visualization as decision support, the approach is very similar to ours. However, while it focuses on reporting errors triggered by run-time tests, our approach focuses on supporting the removal of design flaws. In general, we complement these approaches by presenting an approach that integrates immediate feedback on code modifications in terms of software-design quality and software behavior.

5.2 Teaching Software Refactoring

Besides the tutoring systems discussed above, other kinds of approaches for teaching refactoring are related to ours. In addition to tutoring systems based on instructional learning design, a few other learning approaches in terms of editor-based refactoring games can be identified that also integrate automated feedback. In contrast to tutoring systems, which normally apply small examples, serious games such as [15,29] are based on real-world code bases. These approaches also include means for rewarding successful refactorings by increasing the game score (or reducing the technical-debt score) and partially provide competitive and/or collaborative game variants. However, so far these games do not include the visual feedback that is particularly important for the training of design-related refactoring.


Moreover, other approaches without integrated automated feedback are established. For instance, Smith et al. propose an incremental approach for teaching different refactoring types at the college level in terms of learning lessons [60,65]. The tasks are also designed in an instructional way, with exemplary solutions that have to be transferred to the current synthetic refactoring candidate (i.e. code smell). Within this, they provide an exemplary learning path for refactorings. Furthermore, Abid et al. conducted an experiment contrasting two student groups, of which one performed pre-enhancement (refactoring first, then code extension) and the second post-enhancement (code extension first, then refactoring) [1]. They then compared the quality of the resulting code. López et al. report on a study for teaching refactoring [39]. They propose exemplary refactoring task categories and classify them according to Bloom’s taxonomy and learning types. They also describe learning settings which aim at simulating real-world conditions by providing, e.g., an IDE and revision control systems. In addition to these teaching and training approaches, we present a tutoring system that can be seen as a link in the learning path between lectures and lessons (that mediate basic knowledge on refactoring) on the one hand and environments such as serious games (that already require practical competences) on the other.

6 Discussion

In this article, we have presented a novel approach for teaching and training practical competences in software refactoring. As shown above (Sect. 5), the tutoring system is considered a complement to other environments and techniques, such as lectures for mediating basic understanding (before) and serious games (afterwards) for consolidating and extending the practical competences in the direction of higher-level competences such as evaluating refactoring options and developing refactoring strategies. The provided decision support in terms of UML design diagrams (representing the as-is and to-be software design) especially addresses the refactoring of quality issues on the level of software design and architecture, which are not directly visible by reviewing the source code alone; also see architectural debt [36]. The applicability of the proposed approach has been shown in two ways. First, by demonstrating the technical feasibility via a proof-of-concept implementation in Java, which realizes the most important functionalities. In this course, a more detailed exercise example has also been presented and a set of further examples is provided. Second, in terms of the usability for teaching and training practical competences in software refactoring. This has been illustrated by two exemplary exercise types (i.e. design refactoring and test-driven development), for which the addressed competences have been allocated according to a competence framework [28]. Moreover, an exemplary path for training software refactoring has been drawn, including the exercises provided by the tutoring system. As a next step, it is important to investigate how and to what extent the tutoring system can contribute to competence acquisition and training in the framework of an actual educational setting, such as a university course or an open online course. In addition to collecting feedback from users, the challenge here is to measure the competence levels of users and, in particular, whether and to what extent the levels have risen due to the use of the environment.

7 Conclusion

In this article, we have presented an interactive training and development environment for improving practical competences in software refactoring. refacTutor represents a novel approach for actively learning and training software refactoring. Key to our approach is providing immediate feedback on the user’s code modifications (i.e. refactoring steps) regarding the quality of the software design (in terms of reverse-engineered UML diagrams), the functional correctness of the (modified) source code (via presenting the results of integrated run-time regression tests), as well as an extensible array of other quality aspects (depending on the kinds of integrated analysis tools). In particular, this article extends the previous conference publication [30] by providing an elaboration of the conceptual foundations of the tutoring system (e.g. in terms of the viewpoint model for software refactoring), a set of further examples of refactoring exercises for download, as well as a competence-based analysis and an exemplary teaching path. As already discussed in Sect. 6, an interesting future research direction is to measure how and to what extent refacTutor can contribute to acquiring and training competences for software refactoring. For this purpose, it is planned to integrate a training unit on refactoring for design smells into a university course on software design and modeling.

References 1. Abid, S., Abdul Basit, H., Arshad, N.: Reflections on teaching refactoring: a tale of two projects. In: Proceedings of the 2015 ACM Conference on Innovation and Technology in Computer Science Education, pp. 225–230. ACM (2015) 2. Ajax.org: AceEditor (2019). https://ace.c9.io/. Accessed 7 Aug 2019 3. Alves, N.S., Mendes, T.S., de Mendonça, M.G., Spínola, R.O., Shull, F., Seaman, C.: Identification and management of technical debt: a systematic mapping study. Inf. Softw. Technol. 70, 100–121 (2016). https://doi.org/10.1016/j.infsof.2015.10. 008 4. Argyris, C.: Double loop learning in organizations. Harvard Bus. Rev. 55(5), 115– 125 (1977) 5. Arisholm, E., Briand, L.C., Hove, S.E., Labiche, Y.: The impact of UML documentation on software maintenance: an experimental evaluation. IEEE Trans. Softw. Eng. 32(6), 365–381 (2006). https://doi.org/10.1109/TSE.2006.59 6. Bastarrica, M.C., Perovich, D., Samary, M.M.: What can students get from a software engineering capstone course? In: 2017 IEEE/ACM 39th International Conference on Software Engineering: Software Engineering Education and Training Track (ICSE-SEET), pp. 137–145. IEEE (2017) 7. Beck, K.: Test-Driven Development: By Example. Addison-Wesley Professional (2003)


8. Bloom, B.S., et al.: Taxonomy of Educational Objectives, vol. 1: Cognitive Domain, pp. 20–24. McKay, New York (1956) 9. Bonwell, C.C., Eison, J.A.: Active Learning: Creating Excitement in the Classroom. 1991 ASHE-ERIC Higher Education Reports. ERIC (1991) 10. Campbell, G., Papapetrou, P.P.: SonarQube in Action. Manning Publications Co. (2013). https://www.sonarqube.org/. Accessed 7 Aug 2019 11. Cañas, J.J., Bajo, M.T., Gonzalvo, P.: Mental models and computer programming. Int. J. Hum.-Comput. Stud. 40(5), 795–811 (1994). https://doi.org/10.1006/ijhc. 1994.1038 12. Clements, P., et al.: Documenting Software Architectures: Views and Beyond. Pearson Education (2002) 13. CoderGears: JArchitect (2018). http://www.jarchitect.com/. Accessed 7 Aug 2019 14. Daun, M., Tenbergen, B., Weyer, T.: Requirements viewpoint. In: Pohl, K., Hönninger, H., Achatz, R., Broy, M. (eds.) Model-Based Engineering of Embedded Systems, pp. 51–68. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3642-34614-9_4 15. Elezi, L., Sali, S., Demeyer, S., Murgia, A., Pérez, J.: A game of refactoring: studying the impact of gamification in software refactoring. In: Proceedings of the Scientific Workshops of XP2016, pp. 23:1–23:6. ACM (2016). https://doi.org/10.1145/ 2962695.2962718 16. Erlikh, L.: Leveraging legacy system dollars for e-business. IT Prof. 2, 17–23 (2000) 17. Fernandes, E., Oliveira, J., Vale, G., Paiva, T., Figueiredo, E.: A review-based comparative study of bad smell detection tools. In: Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering, pp. 18:1–18:12. ACM (2016). https://doi.org/10.1145/2915970.2915984 18. Fontana, F.A., Braione, P., Zanoni, M.: Automatic detection of bad smells in code: an experimental assessment. J. Object Technol. 11(2), 5-1 (2012). https://doi.org/ 10.5381/jot.2012.11.2.a5 19. Fontana, F.A., Dietrich, J., Walter, B., Yamashita, A., Zanoni, M.: Antipattern and code smell false positives: preliminary conceptualization and classification. In: 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), vol. 1, pp. 609–613. IEEE (2016). https://doi.org/10. 1109/SANER.2016.84 20. Forman, I.R., Forman, N.: Java Reflection in Action (In Action Series). Manning Publications Co. (2004). https://www.oracle.com/technetwork/articles/java/ javareflection-1536171.html. Accessed 7 Aug 2019 21. Fowler, M., Beck, K., Brant, J., Opdyke, W., Roberts, D.: Refactoring: Improving the Design of Existing Code. Addison-Wesley Professional (1999). http:// martinfowler.com/books/refactoring.html. Accessed 7 Aug 2019 22. Freeman, S., et al.: Active learning increases student performance in science, engineering, and mathematics. Proc. Nat. Acad. Sci. 111(23), 8410–8415 (2014) 23. Gamma, E., Beck, K., et al.: JUnit: a cook’s tour. Java Rep. 4(5), 27–38 (1999). http://junit.sourceforge.net/doc/cookstour/cookstour.htm. Accessed 7 Aug 2019 24. George, C.E.: Experiences with novices: the importance of graphical representations in supporting mental mode. In: PPIG, p. 3 (2000) 25. Haendler, T.: On using UML diagrams to identify and assess software design smells. In: Proceedings of the 13th International Conference on Software Technologies, pp. 413–421. SciTePress (2018). https://doi.org/10.5220/0006938504470455 26. Haendler, T.: A card game for learning software-refactoring principles. In: Proceedings of the 3rd International Symposium on Gamification and Games for Learning (GamiLearn@CHIPLAY) (2019)


27. Haendler, T., Frysak, J.: Deconstructing the refactoring process from a problemsolving and decision-making perspective. In: Proceedings of the 13th International Conference on Software Technologies (ICSOFT), pp. 363–372. SciTePress (2018). https://doi.org/10.5220/0006915903970406 28. Haendler, T., Neumann, G.: A framework for the assessment and training of software refactoring competences. In: Proceedings of 11th International Conference on Knowledge Management and Information Systems (KMIS). SciTePress (2019) 29. Haendler, T., Neumann, G.: Serious refactoring games. In: Proceedings of the 52nd Hawaii International Conference on System Sciences (HICSS), pp. 7691–7700 (2019). https://doi.org/10.24251/HICSS.2019.927 30. Haendler, T., Neumann, G., Smirnov, F.: An interactive tutoring system for training software refactoring. In: Proceedings of the 11th International Conference on Computer Supported Education (CSEDU), vol. 2, pp. 177–188. SciTePress (2019). https://doi.org/10.5220/0007801101770188 31. Haendler, T., Sobernig, S., Strembeck, M.: Deriving tailored UML interaction models from scenario-based runtime tests. In: Lorenz, P., Cardoso, J., Maciaszek, L.A., van Sinderen, M. (eds.) ICSOFT 2015. CCIS, vol. 586, pp. 326–348. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-30142-6_18 32. Haendler, T., Sobernig, S., Strembeck, M.: Towards triaging code-smell candidates via runtime scenarios and method-call dependencies. In: Proceedings of the XP2017 Scientific Workshops, pp. 8:1–8:9. ACM (2017). https://doi.org/10.1145/3120459. 3120468 33. Kölling, M., Quig, B., Patterson, A., Rosenberg, J.: The BlueJ system and its pedagogy. Comput. Sci. Educ. 13(4), 249–268 (2003). https://doi.org/10.1076/ csed.13.4.249.17496 34. Kollmann, R., Selonen, P., Stroulia, E., Systa, T., Zundorf, A.: A study on the current state of the art in tool-supported UML-based static reverse engineering. In: Proceedings of the Ninth Working Conference on Reverse Engineering, pp. 22–32. IEEE (2002). https://doi.org/10.1109/WCRE.2002.1173061 35. Krathwohl, D.R.: A revision of Bloom’s taxonomy: an overview. Theory Pract. 41(4), 212–218 (2002). https://doi.org/10.1207/s15430421tip4104_2 36. Kruchten, P., Nord, R.L., Ozkaya, I.: Technical debt: from metaphor to theory and practice. IEEE Softw. 29(6), 18–21 (2012). https://doi.org/10.1109/MS.2012.167 37. Kruchten, P.B.: The 4+1 view model of architecture. IEEE Softw. 12(6), 42–50 (1995). https://doi.org/10.1109/52.469759 38. Krusche, S., Seitz, A.: Increasing the interactivity in software engineering MOOCs a case study. In: 52nd Hawaii International Conference on System Sciences, HICSS 2019, pp. 1–10 (2019) 39. López, C., Alonso, J.M., Marticorena, R., Maudes, J.M.: Design of e-activities for the learning of code refactoring tasks. In: 2014 International Symposium on Computers in Education (SIIE), pp. 35–40. IEEE (2014). https://doi.org/10.1109/ SIIE.2014.7017701 40. Martini, A., Bosch, J., Chaudron, M.: Architecture technical debt: understanding causes and a qualitative model. In: 2014 40th EUROMICRO Conference on Software Engineering and Advanced Applications, pp. 85–92. IEEE (2014). https:// doi.org/10.1109/SEAA.2014.65 41. May, N.: A survey of software architecture viewpoint models. In: Proceedings of the Sixth Australasian Workshop on Software and System Architectures, pp. 13–24 (2005) 42. Michael, J.: Where’s the evidence that active learning works? Adv. Physiol. Educ. 30(4), 159–167 (2006)


43. Moha, N., Gueheneuc, Y.G., Duchien, L., Le Meur, A.F.: DECOR: a method for the specification and detection of code and design smells. IEEE Trans. Softw. Eng. 36(1), 20–36 (2010). https://doi.org/10.1109/TSE.2009.50 44. Mugridge, R.: Challenges in teaching test driven development. In: Marchesi, M., Succi, G. (eds.) XP 2003. LNCS, vol. 2675, pp. 410–413. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-44870-5_63 45. Murphy-Hill, E., Parnin, C., Black, A.P.: How we refactor, and how we know it. IEEE Trans. Softw. Eng. 38(1), 5–18 (2012). https://doi.org/10.1109/TSE.2011. 41 46. Nord, R.L., Ozkaya, I., Kruchten, P., Gonzalez-Rojas, M.: In search of a metric for managing architectural technical debt. In: 2012 Joint Working IEEE/IFIP Conference on Software Architecture and European Conference on Software Architecture, pp. 91–100. IEEE (2012). https://doi.org/10.1109/WICSA-ECSA.212.17 47. Object Management Group: Unified Modeling Language (UML), Superstructure, Version 2.5.1, June 2017. https://www.omg.org/spec/UML/2.5.1. Accessed 7 Aug 2019 48. Oechsle, R., Schmitt, T.: JAVAVIS: automatic program visualization with object and sequence diagrams using the Java Debug Interface (JDI). In: Diehl, S. (ed.) Software Visualization. LNCS, vol. 2269, pp. 176–190. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45875-1_14 49. Opdyke, W.F.: Refactoring object-oriented frameworks. University of Illinois at Urbana-Champaign Champaign, IL, USA (1992). https://dl.acm.org/citation.cfm? id=169783 50. Paquette, G.: An ontology and a software framework for competency modeling and management. Educ. Technol. Soc. 10(3), 1–21 (2007). https://www.jstor.org/ stable/jeductechsoci.10.3.1?seq=1 51. Parnas, D.L.: Software aging. In: Proceedings of 16th International Conference on Software Engineering, pp. 279–287. IEEE (1994). http://portal.acm.org/citation. cfm?id=257734.257788 52. Ribeiro, L.F., de Freitas Farias, M.A., Mendonça, M.G., Spínola, R.O.: Decision criteria for the payment of technical debt in software projects: a systematic mapping study. In: ICEIS (1), pp. 572–579 (2016) 53. Richner, T., Ducasse, S.: Recovering high-level views of object-oriented applications from static and dynamic information. In: Proceedings of the IEEE International Conference on Software Maintenance, pp. 13–22. IEEE Computer Society (1999). https://doi.org/10.1109/ICSM.1999.792487 54. Roques, A.: PlantUml: UML diagram editor (2017). https://plantuml.com/. Accessed 7 Aug 2019 55. Sandalski, M., Stoyanova-Doycheva, A., Popchev, I., Stoyanov, S.: Development of a refactoring learning environment. Cybern. Inf. Technol. (CIT) 11(2) (2011). http://www.cit.iit.bas.bg/CIT_2011/v11-2/46-64.pdf. Accessed 7 Aug 2019 56. Scanniello, G., et al.: Do software models based on the UML aid in source-code comprehensibility? Aggregating evidence from 12 controlled experiments. Empirical Softw. Eng. 23(5), 2695–2733 (2018). https://doi.org/10.1007/s10664-017-9591-4 57. Schach, S.R.: Object-Oriented and Classical Software Engineering, vol. 6. McGrawHill, New York (2007) 58. Sims, Z., Bubinski, C.: Codecademy (2018). http://www.codecademy.com. Accessed 7 Aug 2019 59. Sleeman, D., Brown, J.S.: Intelligent tutoring systems (1982)


60. Smith, S., Stoecklin, S., Serino, C.: An innovative approach to teaching refactoring. In: ACM SIGCSE Bulletin, vol. 38, pp. 349–353. ACM (2006). https://doi.org/10. 1145/1121341.1121451 61. Software Engineering Standards Committee of the IEEE Computer Society: IEEE recommended practice for architectural description of software-intensive systems. IEEE Std 1471–2000, pp. 1–29, September 2000 62. Sommerville, I., Kotonya, G., Viller, S., Sawyer, P.: Process viewpoints. In: Schäfer, W. (ed.) EWSPT 1995. LNCS, vol. 913, pp. 2–8. Springer, Heidelberg (1995). https://doi.org/10.1007/3-540-59205-9_35 63. Sommerville, I., Sawyer, P.: Viewpoints: principles, problems and a practical approach to requirements engineering. Ann. Softw. Eng. 3(1), 101–130 (1997) 64. Sorva, J., Karavirta, V., Malmi, L.: A review of generic program visualization systems for introductory programming education. ACM Trans. Comput. Educ. (TOCE) 13(4), 15 (2013). https://doi.org/10.1145/2490822 65. Stoecklin, S., Smith, S., Serino, C.: Teaching students to build well formed objectoriented methods through refactoring. ACM SIGCSE Bull. 39(1), 145–149 (2007). https://doi.org/10.1145/1227310.1227364 66. Suryanarayana, G., Samarthyam, G., Sharma, T.: Refactoring for Software Design Smells: Managing Technical Debt. Morgan Kaufmann (2014). https://dl.acm.org/ citation.cfm?id=2755629 67. Tempero, E., Gorschek, T., Angelis, L.: Barriers to refactoring. Commun. ACM 60(10), 54–61 (2017). https://doi.org/10.1145/3131873 68. Trung, N.K.: InMemoryJavaCompiler (2017). https://github.com/trung/ InMemoryJavaCompiler. Accessed 7 Aug 2019 69. Tsantalis, N., Chaikalis, T., Chatzigeorgiou, A.: JDeodorant: identification and removal of type-checking bad smells. In: Proceedings of 12th European Conference on Software Maintenance and Reengineering (CSMR 2008), pp. 329–331. IEEE (2008). https://doi.org/10.1109/CSMR.2008.4493342 70. Wichmann, B., Canning, A., Clutterbuck, D., Winsborrow, L., Ward, N., Marsh, D.: Industrial perspective on static analysis. Softw. Eng. J. 10(2), 69–75 (1995)

User-Centered Design: An Effective Approach for Creating Online Educational Games for Seniors

Louise Sauvé 1 and David Kaufman 2

1 TELUQ University/SAVIE, Quebec G1K 9H6, Canada
2 Simon Fraser University, British Columbia V5A 1S6, Canada

Abstract. To be effective, development of online educational games for older adults should be rooted in a user-centered design (UCD) process. This design approach is derived from computer ergonomics, in which the needs, expectations, and characteristics of users are taken into account at every stage of development. This differs from other approaches in that it seeks to adapt the product (in this case, an online educational game) to the needs and preferences of the end user rather than imposing characteristics imagined by the product’s designers. In this chapter, we present the UCD process which allowed us to identify areas for improvement during the modelling phase (through testing a mock-up of the game on paper), the prototype phase (testing a limited version of the programmed game) and during implementation of the final version of the game (online testing of the full game). In our experience over the past 25 years, our online educational games have normally required two or three iterations to finalize a game’s design. The results of the approach show that UCD considerably reduces the costs inherent in game design and development while ensuring a high degree of player satisfaction. Keywords: Educational game · User-centered design · Seniors · Older adults

1 Introduction

Researchers [1–4] point out that the effectiveness of online educational games depends on the individual needs and characteristics of the players and that systems must be developed that are able to adapt to the needs of the target audience. An inappropriate design can act as a barrier to seniors’ use of online educational games. In order to develop an online educational game adapted for seniors, we first conducted a survey of seniors in Quebec and British Columbia to identify promising games to adapt [5]. The game Solitaire was identified as a favorite game of older adults. In order to ensure that our Solitaire-based game performs well for our target population, we used user-centered design (UCD), which integrates an ergonomic approach into product development. This methodology makes it possible to identify the points to be improved at the different development stages: during modelling (building the game
in paper format), prototyping (programming the game on a computer), and building the nearly-finalized version (an online game offered in a restricted version). Normally, it only takes two to three iterations to finalize the design of a game [6]. In this chapter, we will describe how the creation process of the game “In Anticipation of Death,” based on UCD, made it possible to adapt this game to the needs of seniors [7]. First, we report on the methodology used to adapt the Solitaire card game for older adults. Then, we describe how we took into account ergonomic aspects of game design for seniors in developing the game mock-up. We then present the Alpha version of the game, which included certain parameters for user-friendliness. Subsequently, we explain the Beta version, for which the game’s external environment was developed. Finally, we offer recommendations in the form of a guide for educational game designers. This chapter differs from the proceedings of the CSEDU [8] conference, since the latter dealt only with adaptation of the game design for seniors.

2 Methodology

When creating an online educational game for a particular population, the UCD process consists of testing the product (an educational game) at different stages of its development with its future users (in our case, older adults) and making any modifications needed. Table 1 summarizes our process for the game Solitaire Quiz.

Table 1. Summary of the UCD process as applied to our game.

Product | Paper Game (Mock-up) | Alpha Game (Prototype) | Beta Game (Online)
Participants | 6 | 12 | 42
Purpose of the test | Educational game design | User-friendliness | Educational game design; User-friendliness
Number of times the game is played | 3 | 3 to 6 | 5 to 9
Place of testing | Laboratory | Laboratory | Senior associations, retirement homes
Measuring instruments | Observation grid; Interview; Recording gameplay actions | Observation grid; Interview; System to track players' responses | Questionnaires (2); Interview; System to track players' responses
Duration of the test | 3 days | 14 days | 3 days

The experiment took place over the course of two months after being approved by the TELUQ university’s ethics committee. Each participant was made aware of the study’s research purpose and signed a paper or online consent form.


Various ergonomic aspects noted in the literature were taken into consideration during the development of the educational game: the design of the educational game in the mock-up version and the user-friendliness of the game in the Alpha version. We first discuss the type of game that became the object of our development project.

2.1 Choice of the Game

We relied on a survey of 931 seniors from Quebec and British Columbia, conducted as part of the project “Aging Well: Can Digital Games Help?” (2012–2016), in which the game of Solitaire (paper and digital) was identified as a favourite of older adults [5]. This short game (five to 15 min) is recommended for seniors.

2.2 Description of the Game

Game Board: Solitaire is a single-player game that is played with a deck of 52 cards. The first 28 cards are arranged into seven columns of increasing size that form the Board. Only the last card of each column on the Board is placed face up. The 24 remaining cards (face down) make up the Stock pile, also called the Deck. Cards from the Stock pile are discarded three at a time. Only the visible cards can be used.

Goal of the Game: The game ends when all the cards are placed into four piles, one for each suit, sorted in ascending order (from Ace to King), or when a player declares forfeit because they cannot move any more cards. In the latter case, the player can start a new game.

Movement in the Seven Columns: Cards can be moved from one column to another provided that the card being moved can be placed immediately on a card of the next higher rank and of a different color, for example, a red 6 on a black 7. Aces are set apart to form the beginnings of the four piles to be reconstituted.
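As an aside, the column-movement rule just described is precise enough to be expressed directly in code. The following minimal Java sketch is our own illustration and is not taken from the Solitaire Quiz implementation described in this chapter.

// Minimal model of the rule: a card may be moved onto a column whose exposed
// card has the next-higher rank and the opposite color (e.g. a red 6 on a black 7).
enum CardColor { RED, BLACK }

class Card {
    int rank;            // 1 (Ace) .. 13 (King)
    CardColor color;

    Card(int rank, CardColor color) {
        this.rank = rank;
        this.color = color;
    }
}

class MoveRule {
    static boolean canPlaceOnColumn(Card moved, Card exposed) {
        return exposed.rank == moved.rank + 1 && exposed.color != moved.color;
    }
}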

2.3 The Contribution of Digital Technology to Solitaire In a review of existing digital Solitaire games, we noted the addition of various features to the original paper game. Some rules and options are included in our version, such as the choice of playing with one or three cards at a time, the addition of scoring in connection with the movement of cards, and scoring based on the time taken by the player to build the four piles. Elements to customize and add interest complete the game, for example, changing the card layout for right-handed or left-handed players; displaying gameplay time and movements; and personalizing the game environment by choosing a theme, the size of the numbers on the cards, images on the backs of the cards, and the color of the playing surface. The interface also provides access to certain functions: Play (start a new game), Start again (use the same cards, shuffled) and Personal data (score, statistics, and successes).


Advantages help players to reach the goal of the game: Hint shows possible card movements, Undo reverses the last action, and Help (?) provides access to the game rules. At the end of a game, different elements are displayed, including an animation, the results and statistics (best score, rank, etc.), the successes achieved during the game, and the personal badge earned according to the player’s game successes.

3 The Mock-Up of the Solitaire Quiz Game The mock-up is a paper version of the Solitaire Quiz game that is inspired by the digital version of the game Solitaire. The mock-up version takes into account the ergonomic aspects of educational game design for seniors, as described below. 3.1 Educational Game Design The design of an educational game first refers to its essential attributes: players, competition/challenge, rules, the predetermined goal, learning content, and feedback [9, 10]. We will now examine the ergonomic requirements for these aspects of game design. Players. Solitaire is a one-player game. For our version, the player is a senior who is at least 55 years old [11] and retired [12]. The player is considered to be a beginner in terms of their technological skills, as much for the use of a computer or a mobile (tablet, phone) as for using online games. In the context of developing an educational game, the player has not necessarily played online games. Competition/Challenge. Various mechanisms are described in the literature for making online educational games challenging and competitive [13]. To support competition, the game should include levels of difficulty or challenges appropriate to the knowledge, age, and physical abilities of the targeted players [2]. Concerning knowledge, the learning content must be graduated from simple to complex to maintain motivation [14]. It is suggested that a game should offer at least three levels of difficulty in terms of learning content [15]. It is equally important that the mechanics of the game allow players to select increasingly difficult questions from one game to another in order to maintain a sense of challenge, especially for older adults. Rules. The rules are instructions, simple or complex, which describe the relationship between the players and the game environment [15]. Understanding the rules of the game and mastering them gives players a sense of control over the game interface (buttons, movement in the game, etc.) [16]. A recommended way to engage seniors is to use known games with few and well-understood rules, since confusion about the rules can discourage seniors from playing [17]. Researchers [18, 19] suggest adding new rules to known games to maintain a sense of challenge and manage the integration of learning content. Finally, we must make the rules accessible at any time through a single click from any page of the game’s environment [15].


Predetermined Goal. The predetermined goal of a game refers to how a game ends and to its notions of reward and victory. A game must have a goal and winners [20, 21]. The rules that determine winners and losers can be formulated to engage players’ abilities and knowledge; for example, giving points for correct answers and actions [22]. Learning Content. Studies show that a balance between play time and learning time is needed to maintain players’ motivation. To maintain this balance, the learning content in the game must be properly measured so that there is a place for chance and for actions that are only related to the pleasure of playing [13, 15]. To integrate learning content into the game without creating cognitive overload for seniors, information should be broken up into small units (one or two lines) or simple questions. It seems best to use closed questions (true/false or multiple choice with one or more answers or objects to be matched), thereby facilitating older adults’ participation without highlighting their memory difficulties. Repeating content elements allows seniors to recognize them and consider them useful for their progress in the game [2, 13, 15]. For educational games, it is important to link points gained to positive learning outcomes and their loss to negative results [22, 23]. However, fewer points must be lost than are gained in order to maintain seniors’ interest, particularly for those who have little knowledge of the game’s subject matter [24]. Acquiring points in connection with performance increases older adults’ self-confidence, while displaying players’ scores and highlighting the winner motivates seniors to replay the game. Feedback. For learners who perform actions in the game to achieve learning, on-the-spot feedback is recommended [25]. The result of each learning activity (success or failure) should be highlighted by visual or audible feedback, such as a smiling or sad face, a positive or negative sound tone, and/or points added to the player’s score [26]. For an incorrect response, the game should provide textual, visual, or auditory feedback about the content, together with additional information about a correct response, in order to sustain the player’s interest and promote learning [27]. At the end of a game, it is important to display the learning outcomes with a general view of players’ results for the learning activities, and to provide access to learning materials for reviewing subject matter that was not learned [10–15]. For older adult players, immediate feedback about their actions is also recommended. This feedback often takes the form of a tutorial, guiding each player throughout the game to enable them to see the results of their actions [24, 28]. The tutorial facilitates understanding the game without forcing seniors to learn the rules quickly, thus reducing their cognitive load [27]. The instructions should be simple and contextualized to facilitate comprehension of the game, helping seniors to avoid demotivating mistakes [26]. 3.2 The Description of the Game Mock-Up Game Board. For the paper version of the game, we used a 16 × 20 mock-up to reproduce the game’s interface (Fig. 1). Playing cards were used and placed on the mock-up sheet. Question cards were developed, and feedback was written on the back


of each card. A privilege sheet was also provided to the player. Finally, the rules were provided on paper.

Fig. 1. Paper mock-up of the game Solitaire.

At the beginning of the game, the modified rules of the game were given to the player without precise instructions. The player had to choose the game mode (one or three cards) and the difficulty level of the questions (easy, intermediate and difficult). Regardless of the level of difficulty chosen, the player received the same $500 in credits at the start of the gameplay. The player shuffled the cards and placed them on the mock-up of the game interface. During the game, the gamemaster performed the actions that the computer would do in the digital version, as follows: – For the movement indicator, he moved the cursor after each of the player’s moves. – After a given number of moves, he asked the player a question and had him read the feedback. If the answer was correct, the gamemaster marked down the credits earned and displayed the total credits available. If the answer was incorrect, he subtracted the credits lost. – When a player chose one of the privileges as a reward, the player performed the action requested by the privilege and the gamemaster subtracted the number of credits from the accumulated sum. Rules of the Game. First of all, we made no changes to the goal of the game. Our new rules were related to the educational aspects that we introduced into the Solitaire digital game and to privileges that helped players to finish the game. With regard to the game mode, we introduced a choice for question difficulty level that allowed us to add a question score to the standard Solitaire scoring. This encouraged


learning by reading feedback, helping seniors to build their knowledge of the game’s theme (Table 2). Table 2. Rule for question scoring.

Depending on the difficulty of the game, the computer displays a question that the player is asked to answer: ─ If the player answers correctly, he earns points according to the degree of difficulty of the question: 20 points for an easy question; 35 points for an intermediate question, and 50 points for a difficult question. ─ If the player does not answer correctly, he loses points according to the difficulty of the game being used: 10 points for an easy question; 20 points for an intermediate question, and 35 points for a difficult question.
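Expressed in code, the rule in Table 2 is a small lookup keyed by question difficulty. The sketch below is illustrative only; the names (QUESTION_POINTS, apply_answer) are ours, not part of the game's actual implementation.

```python
# Points gained for a correct answer and lost for an incorrect one, by difficulty (Table 2).
QUESTION_POINTS = {
    "easy":         {"gain": 20, "loss": 10},
    "intermediate": {"gain": 35, "loss": 20},
    "difficult":    {"gain": 50, "loss": 35},
}

def apply_answer(score, difficulty, correct):
    """Return the player's score after answering a question of the given difficulty."""
    points = QUESTION_POINTS[difficulty]
    return score + points["gain"] if correct else score - points["loss"]

score = apply_answer(0, "intermediate", correct=True)     # 35
score = apply_answer(score, "difficult", correct=False)   # 0
```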

We also introduced the option of purchasing privileges to help players break a stalemate in a game that promises to be stuck or simply to earn extra points. To manage these purchases, we established a new rule (Table 3). Table 3. Rule to manage privileges.

At any time, a player who has accumulated enough credits can buy privileges from the Store, which increases the chances of finishing the game and earning points.
─ $15 - Buying a Question: Answer a question correctly to accumulate points.
─ $25 - Going Backwards: Undo the last action taken to move a card from the Tableau or one of the piles.
─ $50 - Joker’s Advice: Buy help from the Joker to view all possible moves.
─ $75 - Risky Freedom: Randomly draw a hidden card from the board.
─ $100 - Selective Freedom: Take a card from the hidden cards on the Tableau.
─ $150 - The Red King: Release the king (heart or diamond) hidden on the board or in the deck to place it in a blank column.
─ $150 - The Black King: Release the king (spade or club) hidden on the board or in the deck to place it in a blank column.
─ $200 - Ace of Aces: Release an ace hidden among the cards on the Tableau and place it on a pile.
─ $300 - The Chameleon Joker: Replace any card on the Tableau.
─ $300 - The Imperial Discard: Return a card of your choice to the deck.
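The Store therefore works as a priced catalogue checked against the player's accumulated credits. The following sketch is a minimal illustration of that rule (the catalogue is copied from Table 3; the function name buy_privilege is ours), not the production code of Solitaire Quiz.

```python
# Privilege catalogue from Table 3: privilege name -> cost in credits ($).
PRIVILEGES = {
    "Buying a Question": 15,
    "Going Backwards": 25,
    "Joker's Advice": 50,
    "Risky Freedom": 75,
    "Selective Freedom": 100,
    "The Red King": 150,
    "The Black King": 150,
    "Ace of Aces": 200,
    "The Chameleon Joker": 300,
    "The Imperial Discard": 300,
}

def buy_privilege(credits, name):
    """Deduct the privilege's cost if the player can afford it; otherwise refuse the purchase."""
    cost = PRIVILEGES[name]
    if credits < cost:
        return credits, f"{name}: not enough credits"
    return credits - cost, f"{name}: purchased"

balance, message = buy_privilege(500, "Ace of Aces")  # balance == 300
```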

Game Questions (Quiz). To choose a content theme for the quiz questions, we interviewed 167 adults aged 55 and over. These participants were interested in the actions to be taken upon the death of their spouse; more than 72% expressed a lack of knowledge


about putting the affairs of their spouse in order, recovering what is due to their spouse, paying debts, and fulfilling their spouse’s wishes concerning the disposition of their body [7]. In Solitaire Quiz, we dealt with the learning content by using closed questions (true/false or multiple choice with one or more answers), to which we added feedback to be displayed when the player answers a question. We also limited the number of questions to 40 so that each is used at least twice during a game. Finally, we split the learning content into small units, divided into three levels of difficulty (15 easy, 15 intermediate and 10 difficult) identified by one, two, or three stars. To ensure a balance between playing and learning, we opted to pose a question after 10 card movements. These movements are represented by an indicator that moves on a progression bar, with a fraction to indicate its progress. If the player answers the question correctly, they earn points, and if they do not answer the question correctly, they lose points. Finally, we integrated feedback in the form of a smiling or sad face as well as text and audible feedback to explain the correct or incorrect answer (Fig. 2).

Fig. 2. Question card. The face of the card shows the difficulty level (in stars) and the question: “What is the legal structure under which property is a separate asset held by one person for the benefit of another? 1. A Trust 2. Inheritance 3. Indemnity 4. Annuity”. The back of the card shows the points earned (+20 points), the correct answer (A Trust), and the feedback: “Exact! An inheritance is a set of assets from an estate. An indemnity is an amount allocated to compensate for any loss suffered. An annuity is an annual income from financial investments or paid under a program or plan, public or private.”

3.3 Testing the Mock-Up Six people participated in the test of the paper version: three seniors aged 55 to 64 and three aged 65 to 72. Each played the paper game three times over three days, with all of their gameplay actions recorded. Observations were taken using an observation grid, and individual interviews with the players were conducted following their tests. First of all, all the respondents liked playing the Solitaire Quiz paper game. They described the game as simple, requiring little time to play. They appreciated the choice of game mode (one or three cards), as well as the integrated questions that offered them a pleasant way to learn. Players found that the types of questions were easy to answer, and they took time to read the feedback. In addition, they considered that the question types (True or False, Multiple Choice, etc.) did not slow the gameplay.


Regarding elements to be improved, recommendations were identified based on the comments from the participants and the development team: • The number of card movements to display a question was too high. Players did not answer all of the questions at least twice. Reduce the number of moves to eight to have players answer more questions and accumulate more money credits. • Rearrange the gaming space to move the discard pile to the left of the game and the four in-play piles to the right. • Clarify some rules of the game. In the test, two respondents read the rules of the game before starting to play, two read the rules during the game, and two did not read the game rules at all. Some requests were made for clarification of the rules. • Some privileges were not used by respondents (Selective Freedom, Discard, Joker’s Advice). Wait for the Alpha version before removing these privileges. • Two players mentioned that the game could offer the additional challenge of playing against time. Other players found this idea interesting but suggested making it optional. Incorporate an option for playing against time: finishing a game in 0–5 min (100 points bonus or 100 points loss), 5–10 min (50 points bonus or 50 points loss), 10 min or more (no bonus or loss). • The audible reading of the questions was appreciated by the participants, given that the characters were too small for two players to read easily. Incorporate a digital voice function for game questions. • The development team suggested locating the rules and the game tutorial in the Options menu to maximize space for the game interface.

4 The Alpha Version of the Game In the Alpha version (prototype of the computerized game), we took into account both criteria from the literature for user-friendliness of digital games for older adults and recommendations arising from the first test (rules and some options). 4.1 User-Friendliness Criteria for Seniors’ Digital Games User-friendliness refers to the qualities of a digital game that make it easy and pleasant to use and understand, even for someone with little computer knowledge. The role of the game’s environment is to help the player focus on what is important. Problems reported by older adults with the use of technologies are predominantly associated with user-friendliness (navigation and display) and can often be resolved by appropriate design. For seniors, the game’s user-friendliness also depends on using appropriate physical equipment to accommodate eyesight and dexterity problems. Navigation and Display in the Game. To make a game environment intuitive for seniors, designers should ensure that players can easily access all components (cards, navigation buttons, instructions/tutorials and score) needed for the game to run smoothly [23, 24, 29, 30]. To facilitate players’ movement in the game, it is very important to make sure that the game and its components are displayed without overflowing the screen and


without blocking some game elements [15–26]. For a comfortable gameplay experience, the design should use a predetermined frame or a responsive web design to maintain a standard display layout across screens. The game board and accessories for playing should cover most of the screen, and scroll bars in page displays should be avoided. To facilitate navigation within the game, the game elements and question content should be limited to one screen page. This avoids long and tedious scrolling on the screen, which particularly demotivates seniors with short attention spans [15, 23, 24, 29, 31]. It is also important to minimize the use of superimposed windows during the course of a game, since some older users are less likely to notice page changes and can become confused. A clear notification of a change of screens should be displayed, for example, when the player goes from the “Game” page to a “Questions/Information” page [32]. Images should be processed to avoid waiting for their on-screen display, which frustrates players. To prevent the user from believing that his equipment has failed, it is best to notify him if the estimated download time will exceed five seconds [17, 26, 33–37]. Also, we must avoid using sounds to support each gameplay action. Similarly, if question content is integrated into the game, all relevant information must be available to the player through single clicks. Gameplay Equipment. Physical equipment should provide options for seniors to adapt the gameplay to their reaction speed, degree of autonomy, and physical ability [13, 38, 39]. Game equipment such as a laptop, tablet, keyboard, or joystick must be used with some constraints to make it comfortable for seniors [10]. Complicated physical actions, such as those requiring a double mouse click, or that force the player to precisely control a pointer on the screen while having to correctly press a button, should be avoided [7–26]. Mouse handling should be reduced to essential actions, since it requires hand-eye coordination and increases cognitive load [18]. It is preferable to use the arrow keys of a standard keyboard or a keyboard adapted to handle the game. For seniors, game equipment should avoid newer technologies that require high skills for effective use [26]. If a game controller is used, it is better to use a one-handed device such as a computer mouse or the Wii Remote. Tablets must have screen sizes that are large enough to clearly display needed information [18–20].


We designed learning questions to include all relevant information (question statements, answers, degree of difficulty, feedback, credits earned or lost) on the same page. The questions, answers, feedback, etc. are displayed in a second window superimposed on the game board. The size of this window is always smaller than the board. We processed and tested image display times with low, medium, and high speed connections. In all cases, the display time did not require the player to wait. Finally, using the inter-rater method, we assessed the relevance of each image that illustrates a question in the game. Sound effects were added to maintain the player’s interest: music at the start of the game, Yay! for a positive answer to a question, and a discordant sound for a negative answer to a question. Gameplay Equipment. We avoided requiring a double click to perform any action, whether to answer questions, move cards in the game, open the tutorial, purchase a privilege, or choose gaming options. We opted to run the game on computers with a mouse, 15 touchscreen laptops, and tablets; this allows seniors to move the elements of the game with a mouse or with their finger. We also integrated buttons with words and symbols to make it easier for seniors who were not born in the digital age.

4.3 Changes to the Game Based on the Paper Testing The requested changes were made to the production specifications in terms of the interface, rules and questions. Game Interface. We changed the order of the elements in the game interface and the number of card movements needed for displaying a question to be answered. We divided the game interface into three areas (Fig. 3) to make it easier to navigate. Zone 1 (Information) contains all the information needed to understand how the game unfolds: the Options menu, the timer, the number of accumulated credits, and the access icon for the Privilege Store. Zone 2 (Game board) includes all the playing elements of the game: the Stock pile, the seven columns and the four stacks of cards. Zone 3 (Apprenticeship) refers to the educational aspect of the game: a tutorial accessible at all times and a progression line that displays a question to be answered after every eight card movements in the game. We also grouped together the informational elements and the tutorial to help navigation in the Options menu. Rules of the Game. We clarified the rules of the game by simplifying and illustrating them. To increase the challenge, we added the option of playing against time and wrote a rule to explain how it works. Finally, we revised the rule governing the number of moves needed in the game to display a question (Table 4). Questions and Feedback. We integrated a digital voice application that allows players to listen to the questions and answers of the game instead of reading them, thus facilitating the accumulation of credits while overcoming seniors’ visual impairments.


Fig. 3. Game interface.

Table 4. The addition of two new rules.

─ It is possible to play with a time limit: 0–5 minutes (bonus of 250 points); 5–10 minutes (bonus of 125 points); 10 minutes or more (loss of 100 points). This is optional.
─ Questions appear at every 8th movement, when the indicator reaches the end of the progression line. A correct answer allows for the accumulation of credits. These credits allow for the purchase of privileges from the Store. These privileges can be used to take shortcuts or to finish a stuck game.
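The two rules in Table 4 amount to a movement counter that triggers a question on every eighth move and an optional time bonus applied when the game ends. The sketch below illustrates that logic under our own illustrative names; it is not the game's actual code.

```python
MOVES_PER_QUESTION = 8  # a question appears at every 8th card movement (Table 4)

def should_ask_question(move_count):
    """True when the progression indicator reaches the end of the line."""
    return move_count > 0 and move_count % MOVES_PER_QUESTION == 0

def time_bonus(minutes, time_limit_option=True):
    """Optional 'playing with a time limit' bonus from Table 4."""
    if not time_limit_option:
        return 0
    if minutes < 5:
        return 250   # game finished in 0-5 minutes
    if minutes < 10:
        return 125   # game finished in 5-10 minutes
    return -100      # 10 minutes or more

print([m for m in range(1, 25) if should_ask_question(m)])  # [8, 16, 24]
```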

4.4 Testing of the Alpha Version Twelve people participated in the testing of the Alpha version: five seniors aged 55 to 64 and seven seniors aged 65 to 75. They tested the game on an Android tablet three to six times over a three-day period. Their gameplay actions were recorded in detail by the game system. We recorded observations on an observation grid and conducted individual interviews with participants. They made various comments and recommended further refinements to the development team. Comments and Recommendations about the Game Interface – All respondents liked playing the digital form of Solitaire Quiz. – Most of them found that the positioning of the majority of the elements in the game interface was readable on a 10-inch tablet but a little less so on a seven-inch tablet.


– When there were too many cards in a column, it was difficult to see the last card because of the movement counter: Review the display of cards and the movement counter to make the last card in the column readable. – The Store placed to the right of the game reduces the visible game area: Position the Store in the first third of the game interface. – Most respondents (10 out of 12) would have liked to answer questions faster in order to earn credits: Reduce the number of movements of the indicator on the counter to 5. Comments and Recommendations about the Rules of the Game – Eight players indicated that there was no information indicating that they lacked enough credits to buy a privilege: Add a statement (“X, Not Available”) to warn players that they do not have enough credits to buy certain privileges. – The privileges The Chameleon Joker and Selective Freedom were not used. Players did not understand their purpose: Remove these privileges. – The purposes of the privileges Going Backwards and Joker’s Advice were not clear: Review the wording of these privileges. – Eight respondents questioned the allocation of $500 regardless of the degree of game difficulty. They considered this amount to be too great and suggested a graduated amount depending on the degree of difficulty: Review the number of credits based on the degree of difficulty: $200 for Easy, $100 for Intermediate and $0 for Difficult. – Six players suggested keeping the game rules in the Options menu to make them available as needed to those who do not know them: Keep game rules in the Options menu. Comments and Recommendations about the Feedback – Four players found that the sound used to indicate a correct answer (Yay!) was irritating after a few games: Reduce the volume of this sound and wait for the final test before changing it. – Six players wanted help in understanding certain aspects of the game. They wondered about the positioning of the tutorial under the Options menu: Insert real-time contextual help in the game interface, Options, and the Store, and remove the tutorial from the Options menu.

5 The Beta Version of the Game In order to finalize the game and make it accessible to the general public, we integrated some aspects that had not been developed in the Alpha version: the external environment of the game and contextual help, the choice between two languages (French and English), and an end-of-game page that is present in all online Solitaire games. We also made the following requested changes: the privileges offered by the Store were revised and some of them removed, and real-time contextual help was developed and integrated into the game interface, Options, and Store. Finally, the game Solitaire Quiz was made available on the Google Store to make it available to the general public.


5.1 Navigation in the Game’s External Environment To make a game intuitive, its external environment (interface) should not require that seniors have to think hard about what they have to do [38]. First, the different pages of the game’s interface must be standardized by using screen layouts, navigation, and terms that are consistent, simple, and easily understood [26, 35]. Navigation information needs to be simplified in order to minimize the amount of information to be memorized [40]. It is necessary to avoid complex visual displays by using known visual clues to reduce searching; seniors often forget command names and waste a lot of time searching for basic information. The number of steps and controls needed to accomplish a task must be minimized [17–40]. Older people prefer a more direct way to access information without deep hierarchies [37]. 5.2 Development of the Game’s External Environment In the Beta version, the game’s homepage includes a form for creating an account, access to the game by access code, a function for a forgotten password, and a game access button (Fig. 4).

Fig. 4. Homepage and registration.


Wishing to offer different learning content for the game Solitaire Quiz, we developed a page that allows players to choose content using a search tool (Fig. 5).

Fig. 5. Game selection page.

Similarly, two pages allow players to choose the mode of play, the degree of difficulty of the game and the time challenge (Fig. 6).

Fig. 6. Game options – game mode and degree of difficulty (Source: [8, p. 215]).

5.3 Modifications Applied to the Game Contextual help (Fig. 7), accessible as needed, was included to guide seniors throughout the game. They can close it and open it at any time with a simple click on the corresponding icon.


Fig. 7. Example of contextual help.

Finally, we integrated feedback on the player’s performance in the form of a score at the end of the game. This score consists of money credits earned during the game plus a bonus if the player has chosen the option of playing with a time limit. In order to motivate seniors to play more often, a ranking of all players registered for the game is available at the end of the game by using the Ranking button (Fig. 8).

Fig. 8. Ending the solitaire game (Source: [8, p. 216]).

5.4 Testing of the Beta Version To test the limited online version of the game, we recruited independent seniors living at home, members of associations and seniors’ clubs, and older adults living in seniors’ residences. Their gameplay actions were recorded by the game system. Pre- and post-test questionnaires were administered, and individual interviews were conducted. Of the 42 participants, 90.5% played the game at least five times during the 14-day test period for an average duration of 7.3 min, and 42.9% played between six and nine times.


Demographic Data. Among the 42 participants in the Solitaire Quiz experiment, there were 19 women and 23 men. The sample included 20 participants aged 55 to 60 (47.6%) and 22 subjects aged 61 and over (52.4%). Participants’ Gaming Habits. Among the sample, nine players said that they did not have the skills to use digital games, while 18 players identified themselves as “beginners” and 15 as “intermediate” digital game players. Most participants (88%) had already played Solitaire. Over three quarters of them (78.6%) had some experience with other digital games: six players had experience of one year or less, more than half (19) had between one and five years of experience, and eight had been playing for more than six years. Of the 33 players who had some experience with these types of games, five people (15.2%) typically used them on only one day per week. Eleven players (33.3%) used digital games two or three days per week, and the same number of participants played between four and five days per week, which shows a strong preference among seniors for the use of technology for entertainment purposes (66.7% of participants played between two and five days per week). Also, of the 33 players who had experience with playing games, 11 played up to 60 min per day and, interestingly, 21 people (63.6%) used games between two and three hours per day. Player Perceptions of the Educational Game Design. With respect to the design of the educational game, 88.1% of respondents found that the game’s duration was short enough that they could finish their game in less than 10 min, and 97.6% of them found that the privileges allowed them to finish the game. As for the challenge posed by the game, three aspects were measured: 85.7% of respondents considered that the degree of difficulty of the questions represented the challenge well. For the two options, “Playing with a time limit” and the game mode (one card or three cards), their opinions were more moderate (57.1% and 69.0% respectively rated them as appealing). With regard to the educational aspects of the game, 90.5% of the participants responded that the game took into account their prior knowledge, since they could answer a large number of questions when they chose the “Easy” difficulty level. All players reported that question repetition was an effective strategy to help them remember and respond correctly. Nearly all respondents agreed that the game’s feedback helped them to progress in the game (92.9%), that the smiling or sad face told them clearly if a question was or was not answered correctly (95.2%), and that the sound emitted after a good answer increased their motivation (88.1%). In addition, 90.5% of the participants agreed that the audible reading of questions and feedback facilitated their comprehension and avoided fatigue related to reading on the screen. Moreover, 85.7% of respondents found that the images used for the questions were representative of the content. Finally, 97.6% of respondents found it to be an original way to learn about certain topics. Player Perceptions of User-Friendliness. The first aspect of the game’s user-friendliness is the ease of navigating the game without contextual help. Most participants (90.5%) considered navigation in the game’s external environment (registration, choice of game learning content, choice of game mode, degree of difficulty, time challenge, and starting the game) to be easy, while 16.7% of the players needed to use the help function in real time.


As for the game interface (game board, questions/feedback, rules of the game, Privilege Store, contextual help), 88.1% of respondents navigated without difficulty, while 40.5% of players needed to use the help in real time. Only 9.5% of the players consulted the rules of the game, but they judged them to be well explained. Finally, more than half of the respondents considered the sounds and music in the game to be stimulating. In terms of the gameplay equipment, moving the cards using a touch screen was judged easy by 85.7% of players. Similarly, moving the cards with a mouse was described as easy by all respondents. Revision Requirements for the Beta Version. During the testing, 10 participants (five men and five women) took part in interviews to check if certain game elements should be improved. They made the following comments and recommendations: – Respondents reiterated their interest in maintaining the option “Playing with a time limit.” Most would like to experiment with this option after achieving 100% on the easy or intermediate level of difficulty. – Respondents expressed their interest in keeping the game mode choice of one card or three cards. Having never played with the one-card mode, the majority of respondents initially chose it to familiarize themselves with the game. They found that this mode allowed them to finish the game more easily. However, two of them, considering themselves intermediate-level in the use of online games, suggested maintaining the three-card mode because it represented a greater challenge for them. – Most respondents did not use the rules of the game. After reading the rules during the interview, however, all recommended keeping the rules accessible at all times in the Options menu, especially for those who have never played Solitaire Quiz. – All respondents emphasized the importance of having contextual help in real time. Some of them pointed out that these aids allowed them to understand the new rules that are not in the classic Solitaire game and that they explained how the Quiz works. – Three respondents suggested offering the players the option of replacing the Wild West theme with a theme of their choice. – The majority of respondents confirmed that moving cards with a finger or a mouse did not require special dexterity on their part and that accessing the different elements of the game was easy. – Five respondents suggested integrating a mute control for the sound, music, and digital voice.

6 Recommendations The vast majority (95.2%) of the participants liked to play Solitaire enhanced with a Quiz, and 90.5% of the players wished that they could try a new quiz. All participants would recommend the game to other older adults. Building on their feedback, the literature, and our experience during this game development process, we propose the following recommendations to help developers build online educational games for seniors.


6.1 Competition/Challenge • Offer games of short duration to maintain seniors’ motivation, while integrating the option of allowing players to vary the duration of the game. • Add new rules (add-ons) to maintain a sense of challenge in known games. Older adults prefer to play games that they know, with add-ons that engage them. • Integrate the option of “Playing with a time limit” for gaining additional points in order to maintain a motivating challenge. The availability of two game modes (one card or three cards) also represents different challenges in the game, according to the players’ responses. • Incorporate multiple difficulty levels or challenges to the user to foster competition, facilitate learning, build self-confidence and concentration, and better engage older adults in the game. 6.2 Learning Content • Balance learning time and playing time by integrating at least three levels of difficulty for the questions. • Classify the learning content from simple to complex in order to offer multiple levels of difficulty and inform the players that the “Easy” level corresponds to their basic knowledge, thus encouraging everyone to participate. • Use closed questions to facilitate the use of prior knowledge for progressing in the game and accumulating points. It is crucial to analyze the learning content and to break it down into small units of information; this makes it possible to formulate simple questions in order to avoid cognitive overload in seniors. • Limit the number of questions in a game to allow older adult players to recognize them and see them as useful for progression in the game. • Ensure the representativeness of images used in the questions. • Use visual or audible feedback to reinforce the answers to the questions. For example, the face that accompanies each feedback comment, along with the sound emitted for a correct response, makes it easy to quickly tell whether or not the question was answered correctly. 6.3 Navigation • Group gameplay actions on one page without a superimposed window. • Reduce the number of windows and clicks needed to access and play the game. This speeds up the pace of the game and promotes player motivation. • To avoid player confusion, organize gameplay information into zones and reduce as much as possible the number of controls necessary to accomplish a task. • Design the game board components to minimize the game’s download time. 6.4 Gameplay Equipment • Facilitate the movement of objects on the game board by using a touch screen (for tablet and touch-screen users) or a mouse (for PC and Apple users). • Avoid actions that require a double click of the mouse or that force the player to precisely control the pointer on the screen.


7 Conclusions Our participants were interested and engaged in playing this educational game. Although their perceptions as observed in this study relate to a specific game (Solitaire Quiz) with specific content (actions to be taken on the death of a spouse), the results can be applied to different types of games. Our study shows that the design of an educational game must take into account its target audience: it is important that a game for older adults provide an appropriate duration of play, display game progression, provide an appropriate level of difficulty, and be adapted in many specific ways for this audience. It is also important to reduce the risk of player frustration by posing an interesting challenge. To make the game easier for seniors to use, it is important that the components of the game are visible within the screen, that the grouping of players’ actions accelerates the game and keeps up players’ motivation, and that the use of the mouse or the touch screen makes actions in the game easy to perform and requires little manual dexterity. Finally, our use of the user-centered design process enabled significant changes to be made to the game interface in the first two versions of the Solitaire Quiz, which helped to improve the design for seniors as well as to save costs, since unnecessary features or critical usability issues were identified early in the development process [33]. Acknowledgements. These research studies were funded in part by grants from the Social Sciences and Humanities Research Council of Canada and from AGE-WELL NCE, Inc., a member of Canada’s Networks of Centres of Excellence program. We would like to thank Gustavo Adolfo Angulo Mendoza for statistical analysis of the study data, Curt Ian Wright for the translation, and Alice Ireland for editing the manuscript. We would also like to thank the development team, Pierre Olivier Dionne, Jean-Francois Pare, and Louis Poulette, for the online educational game.

References 1. Diaz-Orueta, U., Facal, D., Herman Nap, H., Ranga, M.-M.: What is the key for older people to show interest in playing digital learning games? Initial qualitative findings from the LEAGE Project on a Multicultural European Sample. Games Health 1(2), 115–123 (2012) 2. Astell, A.J.: Technology and fun for a happy old age. In: Sixsmith, A., Gutman, G. (eds.) Technologies for Active Aging, pp. 169–187. Springer, New York (2013). https://doi.org/10. 1007/978-1-4419-8348-0_10 3. Marston, H.R.: Design recommendations for digital game design within an ageing society. Educ. Gerontol. 39(2), 103–118 (2013). https://doi.org/10.1080/03601277.2012.689936 4. Marston, H.R.: The future of technology use in the fields of gerontology and gaming. Gener. Rev. 24(2), 8–14 (2014) 5. Kaufman, D., Sauvé, L., Renaud, L., Duplàa, E.: Enquête auprès des aînés canadiens sur les bénéfiques que les jeux numériques ou non leur apportent [Survey of Canadian Seniors to Determine the Benefits Derived from Digital Games]. Research report, TELUQ, UQAM, Quebec (Qc), Simon Fraser University, Vancouver (BC), University of Ottawa, Ottawa (2014) 6. Nogier, J.-F.: Ergonomie du logiciel et design web: Le manuel des interfaces utilisateur [Ergonomics of Software and Web Design: Handbook on User Interfaces]. (4th edn.). Dunod, Paris, France (2008)


7. Sauvé, L., Plante, P., Mendoza, G.A.A., Parent, E., Kaufman, D.: Validation de l’ergonomie du jeu Solitaire Quiz: une approche centrée sur l’utilisateur [Validation of the Ergonomics of the Game Solitaire Quiz: A User-Centered Approach]. Research Report, TÉLUQ, Quebec, Simon Fraser University, Vancouver, British Columbia (2017) 8. Sauvé, L., Kaufman, D.: Learning with educational games: adapting to older adult’s needs. In: Lane, H., Zvaceks., Uhomoibhi, J. (eds.) Proceedings of 11th International Conference on Computer Supported Education - CSEDU 2019, Heraklion, Crete, Greece, 2–4 May, vol. 1, pp. 213–230 (2019) 9. Sauvé, L., Renaud, L., Kaufman, D.: Les jeux, les simulations et les jeux de simulation: pour l’apprentissage: définitions et distinctions [Games, Simulations and Simulation Games for Learning: Definitions and Distinctions]. In: Sauvé, L., Kaufman, D. (eds.) Jeux et simulations éducatifs: Études de cas et leçons apprises [Educational Games and Simulations: Case Studies and Lessons Learned], pp. 13–42. Presses de l’Université du Québec, Sainte-Foy, Québec, Canada (2010) 10. Sauvé, L.: Online educational games: guidelines for intergenerational use. In: Romero, M., Sawchuk, K., Blat, J., Sayago, S., Ouellet, H. (eds.) Game-Based Learning Across the Lifespan. AGL, pp. 29–45. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-417 97-4_3 11. Beaudoin, J., Kooli, N., Thomas, F., Arlabosse, B., Couture, A., Danjou, R.: Génération @: Portrait de l’utilisation d’internet et de l’ordinateur par les aînés internautes du Québec [Generation@: Portrait of Internet and Computer Use by Quebec’s Senior Internet Users]. Report, CEFRIO, Québec, Canada (2011) 12. Brand, J.E., Lorentz, P., Mathew, T.: Digital Australia 14. National research prepared by Bond University for the Interactive Games & Entertainment Association. IGEA, Sydney (2014). http://igea.wpengine.com/wp-content/uploads/2013/11/Digital-Australia-2014-DA14.pdf 13. Sauvé, L., Renaud, L., Mendoza, G.A.A.: Expérimentation du jeu de Bingo «Pour bien vivre, vivons sainement!» [Experimenting with the Bingo Game “Live Well, Live Healthy!”]. Research Report. CRSH, TÉLUQ, UQAM et SAVIE, Québec, Canada (2016) 14. De Schutter, B.: Never too old to play: the appeal of digital games to an older audience. Games Cult. 6(2), 155–170 (2011) 15. Sauvé, L.: Les jeux éducatifs efficaces [Effective Educational Games]. In: Sauvé, L., Kaufman, D. (eds.) Jeux et simulations éducatifs : Études de cas et leçons apprises [Educational Games and Simulations: Case Studies and Lessons Learned], pp. 43–72. Presses de l’Université du Québec, Sainte-Foy, Québec, Canada (2010) 16. Dinet, J., Bastien, C.: L’ergonomie des objets et des environnements physiques et numériques [The Ergonomics of Objects and of Physical and Digital Environments]. Lavoisier, Hermès, Paris (2011) 17. Gamberini, L., Raya, M.A., Barresi, G., Fabregat, M., Ibanez, F., Prontu, L.: Cognition, technology and games for the elderly: an introduction to ELDERGAMES project. Psychnol. J. 4(3), 285–308 (2006) 18. Mahmud, A.A., Shahid, S., Mubin, O.: Designing with and for older adults: experience from game design. In: Zacarias, M., de Oliveira, J.V. (eds.) Human-Computer Interaction: The Agency Perspective. SCI, vol. 396, pp. 111–129. Springer, Heidelberg (2012). https://doi. org/10.1007/978-3-642-25691-2_5 19. Mubin, O., Shahid, S., Al Mahmud, A.: Walk 2 win: towards designing a mobile game for elderly’s social engagement. In: England, D. (ed.) 
Proceedings of the 22nd British HCI Group Annual Conference on People and Computers: Culture, Creativity, Interaction, 1–5 September, vol. 2, pp. 11–14. BCS Learning & Development Ltd., Swindon, Royaume-Uni (2008)


20. Marin, J.G., Navarro, K.F., Lawrence, E.: Serious games to improve the physical health of the elderly: a categorization scheme. In: CENTRIC 2011, The Fourth International Conference on Advances in Human-oriented and Personalized Mechanisms, Technologies, and Services, pp. 64–71. IARIA, Wilmington, DE (2011). http://www.thinkmind.org/index.php?view=art icle&articleid=centric_2011_3_20_30056 21. Whitlock, L.A., McLaughlin, A.C., Allaire, J.C.: Video game design for older adults: usability observations from an intervention study. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, September, vol. 55, no. 1, pp. 187–191 (2011). https:// doi.org/10.1177/1071181311551039 22. Kickmeier-Rust, M., Holzinger, A., Albert, D.: Fighting physical and mental decline of the elderly with adaptive serious games. In: Felicia, P. (ed.) Proceedings of the 6th European Conference on Games Based Learning, 4–5 October, pp. 631–634. Conferences international Limited (2012) 23. Ogomori, K., Nagamachi, M., Ishihara, K., Ishihara, S., Kohchi, M.: Requirements for a cognitive training game for elderly or disabled people. In: International Conference on Biometrics and Kansei Engineering (ICBAKE), pp. 150–154. IEEE, New York (2011). https://doi.org/ 10.1109/icbake.2011.30 24. Wu, Q., Miao, C., Tao, X., Helander, M.G.: A curious companion for elderly gamers. In: 2012 Southeast Asian Network of Ergonomics Societies Conference (SEANES), July, pp. 1–5. IEEE, Langkawi (2012). https://doi.org/10.1109/SEANES.2012.6299597 25. Callari, T.C., Ciairano, S., Re, A.: Elderly-technology interaction: accessibility and acceptability of technological devices promoting motor and cognitive training. Work 41(Suppl. 1), 362–369 (2012) 26. Lopez-Martinez, A., Santiago-Ramajo, S., Caracuel, A., Valls-Serrano, C., Hornos, M.J., Rodriguez-Fortiz, M.J.: Game of gifts purchase: computer-based training of executive functions for the elderly. In: 1st International Conference on the Serious Games and Applications for Health (SeGAH), 16–18 November, pp. 1–8 (2011). https://doi.org/10.1109/segah.2011. 6165448 27. Marston, H.R., Smith, S.T.: Interactive videogame technologies to support independence in the elderly. A narrative review. Games Health J. 1/2, 139–152 (2012). https://doi.org/10.1089/ g4h.2011.0008 28. Senger, J., et al.: Serious gaming: enhancing the quality of life among the elderly through play with the multimedia platform SilverGame. In: Wichert, R., Eberhardt, B. (eds.) Ambient Assisted Living. ATSC, pp. 317–331. Springer, Heidelberg (2012). https://doi.org/10.1007/ 978-3-642-27491-6_23 29. Barnard, Y., Bradley, M.D., Hodgson, F., Lloyd, A.D.: Learning to use new technologies by older adults: perceived difficulties, experimentation behaviour and usability. Comput. Hum. Behav. 29(4), 1715–1724 (2013) 30. Hwang, M.-Y., Hong, J.-C., Hao, Y.-W., Jong, J.-T.: Elders’ usability, dependability, and flow experiences on Embodied interactive video games. Educ. Gerontol. 37(8), 715–731 (2011) 31. Game accessibility Guidelines: Game accessibility guidelines (2012–2015). Full list. http:// gameaccessibilityguidelines.com/full-list 32. Shneiderman, B., Plaisant, C., Cohen, M., Jacobs, S., Elmqvist, N., Diakopoulos, N.: Designing the User Interface: Strategies for Effective Human-Computer Interaction, 6th edn. Pearson, Boston (2016) 33. Nielsen, J.: Designing Web Usability, the Practice of Simplicity. New Riders Publishing, Indianapolis (2000) 34. Adams, E., Rollings, A.: On Game Design. 
New Riders Publishing, Indianapolis (2003)
35. Sauvé, L.: Quelques règles médiatiques à respecter lors de la production d’une coquille générique de jeu éducatif [Usability Guidelines for a Generic Educational Game Shell]. In: Sauvé, L., Kaufman, D. (eds.) Jeux éducatifs et simulations: étude de cas et leçons apprises [Educational Games and Simulations: Case Studies and Lessons Learned], pp. 529–544. Presses de l’Université du Québec, Québec (2010)
36. Rice, M., et al.: Evaluating gesture-based games with older adults on a large screen display. In: Taylor, T.L. (ed.) Proceedings of the 2011 ACM SIGGRAPH Symposium on Video Games, 7–11 August, pp. 17–24. ACM, Vancouver (2011)
37. Muskens, L., van Lent, R., Vijfvinkel, A., van Cann, P., Shahid, S.: Never too old to use a tablet: designing tablet applications for the cognitively and physically impaired elderly. In: Miesenberger, K., Fels, D., Archambault, D., Peňáz, P., Zagler, W. (eds.) ICCHP 2014. LNCS, vol. 8547, pp. 391–398. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08596-8_60
38. Caprani, N., O’Connor, N.E., Gurrin, C.: Touch screens for the older user. In: Cheein, F.A.A. (ed.) Assistive Technologies. InTech, Rijeka, Croatia (2012). http://www.intechopen.com/books/assistive-technologies/touch-screens-for-the-older-user
39. Loureiro, B., Rodrigues, R.: Design guidelines and design recommendations of multi-touch interfaces for elders. In: The 7th International Conference on Advances in Computer-Human Interactions (2014). http://www.thinkmind.org/index.php?view=article&articleid=achi_2014_2_30_20162
40. Lee, J., Park, S.: A study on interface design of serious games for the elderly. Adv. Sci. Technol. Lett. 39, 159–163 (2013). https://doi.org/10.14257/astl.2013.39.30

Population Growth Modelling Simulations: Do They Affect the Scientific Reasoning Abilities of Students? Kathy Lea Malone1(B) and Anita Schuchardt2 1 Graduate School of Education, Nazarbayev University, Kabanbay Batyr 53, Astana 01000,

Kazakhstan [email protected] 2 Department of Biology Teaching and Learning, University of Minnesota, Minneapolis, MN 53455, USA [email protected]

Abstract. Students need to be able to develop their scientific reasoning skills in secondary schools by collecting data and developing science models. Internationally, a growing number of countries are developing nationwide standards that require the use of hands-on approaches, require the development and use of science models by students, and include final assessments of their scientific reasoning skills. However, in biology classes this can be difficult due to the nature of the subject matter. This paper discusses the use of spreadsheet-based simulations within the context of a modelling-based pedagogical unit focused on population growth in introductory secondary level biology classes. The effects of the implementation on students’ scientific reasoning skills were assessed in terms of scientific reasoning sub-skills as well as Piagetian reasoning levels within the context of a quasi-experimental design study. The findings suggest that the implementation was successful, with the treatment cohort usually outperforming the comparison cohort. Keywords: Modelling instruction · Science modelling · Simulations · Population growth · Scientific reasoning

1 Introduction Internationally, the development of scientific or higher order reasoning skills has become a priority [1, 2]. The Programme for International Student Assessment (PISA) results for many nations have revealed that students continue to struggle with the interpretation of scientific data and have little ability to develop scientifically sound conclusions based on that data [2]. The causes for this deficit could be numerous and might include the lack of student-centred science classes using authentic inquiry-based practices or possibly the lack of science lab facilities in schools. However, a more common barrier, especially for the biological concepts of population growth and evolution, might be data collection difficulties owing to the time needed to collect data over the course of multiple generations [3, 4]. Computer simulations are one possible technique that could be used to allow students to collect data that would normally not be able to be collected in a timely fashion in secondary schools [5].


In addition, the student construction of scientific conclusions based on reasoning using authentic practices such as the science modelling used by scientists could assist students in improving not only their knowledge base but also their sense-making or reasoning skills [6]. However, there is a dearth of studies that attempt to use population growth simulations emphasizing scientific modelling within the context of secondary biology classrooms. Heightened scientific reasoning skills on the part of students have been shown in multiple studies to correlate with increased conceptual change in science [e.g., 7, 8]. One type of scientific reasoning skill is the interpretation of data and the development of scientific conclusions. These are the same skills that PISA has shown students worldwide lack [2]. Thus, it behoves policy makers, educational practitioners, and researchers to develop school science interventions that target the improvement of scientific reasoning skills, as these skills can affect students’ lifelong learning. However, few studies have focused on assessing shifts in students’ scientific reasoning when using simulations in the context of a Modelling Instruction class in biology. This chapter attempts to fill this gap by detailing a USA-based quasi-experimental study that compares the effect of population growth simulations on the scientific reasoning skills of students taught within the context of a Modelling Instruction classroom with that of students in a traditionally taught classroom (i.e., no simulations or modelling). This chapter is an expanded version of a proceedings paper presented at the 11th International Conference on Computer Supported Education [9]. It includes a more detailed literature review and methods section as well as an expanded data analysis. This study was guided by the following research goal: • Will the use of population growth simulations in the context of a Modelling Instruction biology class increase students’ scientific reasoning skills beyond those of students in a comparison class?
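To make concrete the kind of data collection such simulations afford, the sketch below reproduces, in a few lines of Python, the discrete logistic-growth calculation that a spreadsheet-based simulation typically iterates row by row. It is an illustrative example only, not the unit's actual materials, and the parameter names are ours.

```python
def simulate_population(p0=10, r=0.4, carrying_capacity=500, generations=30):
    """Discrete logistic growth: each generation adds r * P * (1 - P/K) individuals."""
    population = [p0]
    for _ in range(generations):
        p = population[-1]
        population.append(p + r * p * (1 - p / carrying_capacity))
    return population

# Students can tabulate or graph the output, much as they would in a spreadsheet,
# and contrast it with unlimited exponential growth (set carrying_capacity very large).
for generation, size in enumerate(simulate_population()):
    print(generation, round(size, 1))
```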

2 Literature Review This section begins by highlighting the learning challenges in the teaching of population growth. It also includes a detailed description of Modelling Instruction in general and in biology specifically. Past instructional interventions in population growth are discussed, as well as those specifically utilizing scientific modelling simulations and/or modelling-based interventions. Finally, this section includes a discussion of past studies focused on scientific reasoning and their connection to modelling and simulations. 2.1 Learning Challenges in Population Growth Several studies show that, at all educational levels, students harbour many misconceptions or alternative conceptions about population growth even after instruction in biology. Brody and Koch [10] found that many students consider an ecosystem’s resources to be limitless, which could lead to continuous population growth. However, the misconceptions about what happens to populations over time are quite abundant. Studies have also

Population Growth Modelling Simulations

287

Studies have determined that many students think populations are either in a constant state of growth or decline, while others believe that organisms in a population increase until some limit is reached, after which the entire population crashes to extinction [11]. Thus, there are numerous faulty ideas held by students in this conceptual area. In addition, the understanding of predator-prey relationships is fraught with many student difficulties. For example, students were found to believe that two organisms can only affect each other if they share a predator-prey relationship [12]. This is problematic since organisms can affect each other's growth in many additional ways, such as by competition or symbiosis. Even when middle school students know the term competition, Stammen [13] found that they believed that competition within an ecosystem always involved aggressive interactions similar to those of predation. These alternative conceptions do not bode well for students developing a correct conception of population growth that would support their learning of other biological concepts such as evolution and genetics.

2.2 Models, Modelling and Modelling Instruction

The use of science models and modelling in science classrooms has become quite prevalent in several countries worldwide due to its inclusion in secondary level science standards [e.g., 14, 15]. This can be difficult for teachers since it has been determined that science teachers do not have a strong grasp of science models in general, nor of how to use them in their classrooms [e.g., 16–20]. These studies demonstrate a need not only to train teachers about science models and modelling but also to produce curriculum units for school usage. This chapter details the development of a unit to teach population ecology to secondary school students via the use of models, modelling, and simulations, as well as the quasi-experimental study to determine the effects of that unit on scientific reasoning.

Science Models and Scientific Modelling. Science modelling is the process of teachers guiding students either in the construction of science models from empirical data or in the use of empirical data to test the effectiveness of already existing science models. Science models are idealized representations of the world that can be used to communicate not only science concepts but also predictions about scientific systems [21, 22]. When teachers or lay people are asked about a scientific model, in many cases they consider it to be only a 2D or 3D representation of a specific science phenomenon such as an animal cell [19, 20]. Scientifically speaking, however, science models are much more than a straightforward physical representation and can also include graphical, pictorial, diagrammatic, mathematical, computer and/or verbal representations. The multiple representations shown in Fig. 1 are derived from a specific scientific system or phenomenon, and they in turn help students to produce a mental model of the situation at hand. The use of multiple representations alone has been shown to lead to improved conceptual learning [e.g., 23, 24].

Fig. 1. Scientific Model representations for Population Growth Model [Source: 9].

Harrison and Treagust [25] found that, during problem solving, experts can switch easily from one representation to another as required by the problem. Since experts use the model to make predictions in multiple contexts, the representations evolve and change to make the model and its multiple representations more predictive. This continuous cycle of model development is known as the scientific modelling cycle (see Fig. 2). Thus, scientists use models and the modelling cycle to reason through old and new science concepts [6].

Fig. 2. The modelling cycle [Source: 9].

Studies focused on the use of models and modelling in science classrooms have shown that their use can increase students' conceptual gains, problem solving abilities, model competency and fascination with science [e.g., 26–33]. However, these studies were mostly in the areas of physics and chemistry, especially at the secondary school level. Some of the secondary level biology research studies were focused on student revision of pre-constructed biological models and did not allow students to design experiments to collect data. For example, Passmore and Stewart [31] designed the MUSE pedagogy for secondary school students, which focused on students' comparison of previously determined models against empirical data collected by others. They determined that students improved their understanding of models. At the college level, Dauer et al. [27] had undergraduates develop models in multiple biological areas over the course of a semester. They discovered that this approach allowed for greater gains in model accuracy for students with lower GPAs. While both these studies determined that students understood models better, neither study had a control group, nor did they test for conceptual understanding and shifts in scientific reasoning ability. In addition, the use of multiple representations, the modelling cycle and control groups were not always utilized in the studies. Malone et al. [30] assessed the effectiveness of a model-based natural selection and population ecology unit using a quasi-experimental design. It was determined that students' fascination with science, competency with model representations and conceptual understanding increased over the course of the unit. However, the design of that study also included an engineering design challenge theme, and it did not assess changes in students' scientific reasoning skills. Moreover, these studies span at most a semester and often only a single unit; very few have studied the use of modelling over time. Lehrer and Schauble [34] have shown that elementary students can start developing preliminary models in biology. However, they believe their findings demonstrate the need for students to use modelling techniques consistently and to revisit their models over time.

Modelling Instruction in Science. Modelling Instruction is a scientific modelling pedagogy that makes use of student-generated empirical data to develop models with multiple representations [35]. See Fig. 1 for an example of model representations in population ecology. It has been used in multiple science disciplines at the secondary school level. Modelling Instruction (MI) explicitly uses the modelling cycle employed by practicing scientists (see Fig. 2). Students use the data collected from student-designed experiments to develop a model that consists of multiple representations. The initial science model is used to make predictions about behaviour which can be checked against the initial empirical data. If predictions are not in line with the empirical data, then student revisions to the model and its representations are produced. The revised model is then tested by students in other contexts. The cycle is continuous, so that if at any time predictions do not match the collected data, revisions of the model representations are considered by the students. This allows students to develop a robust understanding of the science concepts. Importantly, MI units are developed to have students confront the fact that at times their models fail. This allows them to see failure as an opportunity to refine an existing model or to determine that an entirely new model is needed.
Modelling Instruction has been shown to be effective in chemistry and physics classes at increasing students' conceptual understanding, problem solving abilities, and metacognition, as well as improving their ability to handle failure [28, 29, 36–38]. However, only two studies have been published about the effect of Modelling Instruction in biology, one in evolution [39] and one in population ecology [9]. Malone et al. [39] demonstrated that the use of Modelling Instruction and physical simulations can be effective in increasing student conceptual understanding in evolution. That quasi-experimental study determined that modelling students showed a decline in alternative conceptions as well as an increase in the use of multiple representations when explaining evolutionary concepts over that of a comparison group. Malone and Schuchardt's [9] conference proceedings paper demonstrated that the use of Modelling Instruction in a population ecology unit produced an increase in MI students' scientific reasoning skills over that of a comparison group. This chapter expands on that proceedings paper by exploring specific scientific reasoning sub-skill differences between cohorts as well as student shifts in Piagetian reasoning levels.

2.3 Simulations and Modelling in Population Growth

A meta-analysis showed that, in secondary schools, units incorporating simulations have a beneficial effect over units without simulations [40]. However, a review of studies determined that the effectiveness of computer simulations in schools might depend upon how they are deployed [41]. The review suggested that, in order to be most effective, simulations should be incorporated into the classroom pedagogy while also encouraging reflection on the part of the students. This might be the reason that Wilensky and Reisman [42] found mixed results in their study of one secondary school student's attempt to produce a simulation modelling predation using NetLogo. The student was able to produce a model of predation using the simulation that was predictive of observed lab outcomes. However, the produced model was not consistent with real-life observations. Another study, which used preconstructed evolution simulations in a seventh-grade environment called WISE [43], did show a gain in conceptual understanding on the part of the students, who were required to write explanations that differentiated between different science models when their teacher reinforced the students' work. Unfortunately, there was not a comparison group in this study. Few of these studies, however, focus on the effect of simulations which are embedded within the context of a pedagogy that is fully modelling based. Malone et al. [30] did use an Excel-based preconstructed simulation embedded in a modelling-based pedagogy to teach natural selection and population ecology within the context of an engineering themed unit. They found that this was indeed effective at the secondary level in terms of student conceptual understanding. This quasi-experimental study showed a significant gain in student understanding of population growth and natural selection over that of a comparison group, as well as an increase in student use of multiple representations. In addition, students demonstrated a greater fascination with science. However, this study did not test for shifts in student scientific reasoning skills and the results were confounded by the engineering theme. Thus, studies showing conceptual gains used simulations that were incorporated into specific pedagogical units, as was suggested by Smetana and Bell [41].
However, none focused on determining if there was also a shift in students’ scientific reasoning skills which would be important in terms of producing possible shifts in PISA performance across countries [2].

2.4 Scientific Reasoning, Modelling Instruction and Simulations

As mentioned in the introduction, the link between scientific reasoning skills and science content gains has been studied. However, fewer studies have focused on the link between simulation use in science classrooms and possible increases in scientific reasoning.

Scientific Reasoning. The learning of scientific knowledge can be a difficult task for students, and a number of studies imply that scientific reasoning skills are necessary for student success [44, 45]. Scientific reasoning skills have been defined in multiple ways. Lawson [45] defined them as the use of hypothetico-deductive reasoning and thus focused on abstract reasoning and hypothesis testing. Kuhn and Dean [46] defined them as the ability to coordinate between evidence and theory, while Russ et al. [47] tended to consider them the ability to determine causal mechanisms from data. All of these scientific reasoning skills are important for students to master and they tend to overlap. For example, in order to determine causal mechanisms from data one needs to complete some hypothesis testing, and in order to develop or refine scientific models one must coordinate between theory and evidence. In fact, these three large overlapping constructs can be broken down into a number of sub-skills that promote conceptual change. These skills include hypothesis testing (i.e., hypothetico-deductive reasoning), control of variables (i.e., experimentation skills), correlational reasoning, probabilistic reasoning (i.e., the use of probability and deductive logic) and proportional reasoning [48–50]. A number of studies have found a correlation between students' incoming scientific reasoning abilities and their increase in scientific knowledge at the college level [9, 51–53] as well as at the secondary level [7]. Thus, it seems that one of the goals of secondary school science should be to raise students' scientific reasoning abilities in order to help ensure their future success in science.

Links Between Scientific Reasoning and Modelling Instruction. Modelling Instruction pedagogy, as discussed above, is a constructivist-based methodology that actively engages students in the development and assessment of science models. Students actively engage in the work of science. They are required to design and conduct experiments in order to collect data. The experiments require the control of variables in order to collect data that can be used during data modelling activities. Students use the data to look for patterns between variables. The patterns lead to the development of multiple representations that link together to produce a model of the science phenomenon in question. The students then use the newly generated model to make and test hypotheses in other contexts. They participate in student discourse and argumentation as they refine and discuss their work, allowing the descriptive power of the model to be reflected upon [34]. All of the tasks that students engage in during a typical Modelling Instruction lesson require the use of, and ultimately enhance the development of, numerous scientific reasoning sub-skills. Thus, MI should have a positive effect on increasing students' scientific reasoning skills. However, the process described above is easy to accomplish in physics and chemistry but less so in biology classes, especially in terms of population growth. As mentioned previously, in biology classes the time needed to carry out experiments over multiple generations is far too long for secondary biology classes. The opportunity to use modelling-based simulations should allow students to design experiments, control variables and complete data modelling activities in a reasonable amount of time. As discussed previously, it has been shown that the use of model-based computer and physical simulations can increase conceptual abilities, but the effect on scientific reasoning skills has not been assessed to date. Therefore, it is not known how modelling-based simulation activities affect students' shifts in scientific reasoning from pre to post implementation. This study attempts to fill these gaps by testing the effect on students' scientific reasoning skills of a science modelling simulation embedded in a Modelling Instruction population growth curricular unit.

3 Methods

This quantitative evaluation study looks at the differences in scientific reasoning sub-skills from pre to post instruction between students in two conditions: a treatment and a comparison group. The treatment group was taught using a spreadsheet-based (Google Sheets) modelling simulation embedded in a Modelling Instruction pedagogical curriculum unit. The comparison group did not actively use computer-based simulations nor specific modelling-based activities.

3.1 Research Questions

The study was guided by the following research questions:
1) Will the students in the Modelling Instruction cohort experiencing population growth simulations develop scientific reasoning sub-skills greater than those of the students in the comparison cohort?
2) Will the Modelling Instruction cohort students experience a greater shift towards Piagetian formal reasoning stages than the students in the comparison cohort?

3.2 Participants and Settings

The participants in this study were first year, regular level, high school biology students. The study took place in the Midwestern region of the United States. All the students attended a suburban school district but were located at different high schools within the district. The implementing cohort consisted of 205 students and was taught by a teacher in their first year of Modelling Instruction implementation. The comparison cohort of 141 students was taught by two different teachers using a traditional biology unit. All three teachers had similar backgrounds and total years of teaching experience. The traditional teachers were not implementing any new curriculum units; thus, their students could have had an advantage over those of the first-year implementing teacher, since these teachers were more practiced in the units they were presenting to students.

3.3 Population Growth Modelling Instruction Unit

The MI unit was conducted at the start of the school year and began with a pre-assessment activity that required students to consider what would happen if all the plants in the world died. This was used to draw out their preconceptions. No "correct" answers were given to the question by the teacher, as students only shared their initial thoughts and ideas. In addition, the implementing teacher requested that students supply their reasoning for any of the claims they were making, to connect their predictions to their past biology knowledge from middle school. Next, the students were introduced to two species of paramecium (P. caudatum and P. aurelia) using microscopes. The students were asked to consider what they thought might happen to the population size of each species after 100 years if they were in a location without any predators. The students were broken into groups to develop a prediction and methods to represent or "show" their predictions to the rest of the class. This was the implementation students' first experience of the Modelling Instruction modelling cycle. Thus, most student groups drew pictures about what would happen over time and only a few represented their predictions in other representational formats. During group sharing, some student groups developed diagrams, storyboards or graphical representations of their predictions, and these were shared with the class during class discussion. Thus, after the student discourse session, the representations could be quite diverse depending on the background of the students in each class. In this case, most of the graphical representations were in the form of bar graphs and pie charts, without any line graphs showing population growth over time. As students were sharing their representations, they were asked to describe what biological ideas might be influencing the changes they predicted. After the prediction phase, groups considered how they might investigate this question using paramecium. The students used the internet to discover more information about the life cycle of paramecium and the number of offspring in each generation. In addition, they were asked to consider any variables that might be affecting their experimental designs and which of those would be independent, dependent, and which would need to be held constant. The students discovered that if they used live paramecium the time to collect data would be much too long for the time allotted to them (i.e., less than a week). Consequently, the teacher introduced the use of Google Sheets simulations to the students. The lab was conducted by dividing the class in half so that each half worked with the spreadsheet simulation for one of the two species. This allowed groups to see what happened to the output due to the difference in growth rate between these two species of paramecium. The Google Sheets simulation consisted of an input page where students could decide upon a number of different initial conditions, an output generated as a graph and/or data sheet, and an equations sheet. The equations sheet was there to show students the mathematical growth formula if the teacher desired to do so. The simulation input pages were designed to contain the variables that were requested by pilot students. The input page asked students to select the initial population size, whether the paramecium had limited or unlimited resources, and their generation time.
In addition, they had to input the container size, the number of offspring produced per generation and the number of paramecium that died each generation. See Fig. 3 for more detail about the input page.

Fig. 3. Simulation input page. [Source: 9].
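The spreadsheet formulas themselves are not published in this chapter, so the following Python sketch is only an illustration of the kind of discrete-generation model such a simulation could implement using the inputs listed above. The function and parameter names, and the simple capping rule used to stand in for limited resources, are assumptions rather than the actual Google Sheets implementation.

```python
# Illustrative sketch only: a discrete-generation population model driven by
# the inputs described for the simulation (initial size, offspring and deaths
# per generation, limited vs. unlimited resources, container size). The
# capacity rule and all names are assumptions, not the published spreadsheet.

def simulate_population(initial_size, generations, offspring_per_parent,
                        deaths_per_generation, limited_resources=False,
                        container_capacity=None):
    """Return a list of population sizes, one entry per generation."""
    sizes = [initial_size]
    population = initial_size
    for _ in range(generations):
        births = population * offspring_per_parent
        population = population + births - deaths_per_generation
        if limited_resources and container_capacity is not None:
            # Crude stand-in for limited resources: the population cannot
            # exceed the capacity implied by the container size.
            population = min(population, container_capacity)
        population = max(population, 0)
        sizes.append(population)
    return sizes

# Unlimited resources give exponential-style growth; limited resources
# level off at the assumed container capacity.
print(simulate_population(10, 12, offspring_per_parent=1, deaths_per_generation=2))
print(simulate_population(10, 12, offspring_per_parent=1, deaths_per_generation=2,
                          limited_resources=True, container_capacity=500))
```

Comparing the two printed runs mirrors the classroom task of contrasting limited and unlimited resource conditions for the same species before students try to match "predicted" to "observed" values.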

At this point, depending upon students' abilities with graphs, the teacher assigned the class to focus either on just the simulation-generated data charts (see Fig. 4) or on the simulation-generated graphs (see Fig. 5). The groups that focused only on data charts were asked to hand-graph their output data. This allowed for a comparison of graphing techniques between groups, since many did not originally draw line graphs, and prompted a class discussion on why, in this case, a line graph was a better representation of the data than a bar graph. The simulation output included the number of "observed" as well as "predicted" paramecium, and these values changed depending on the variables entered on the input page. The students were tasked with changing the input page so that the predicted numbers matched the observed numbers on the generated graph (see Fig. 5). As they did this, they were asked to discuss their changes with their teacher and to explain what part of their prediction they were changing and why, before they looked at the results. This was done to scaffold a more explicit rationale for changes as well as greater control of variables, as it was found in the pilot that several students would change multiple inputs without having a rationale. After the simulation, students were asked to construct large poster displays which detailed how their predictions changed depending on input selections. For example, they detailed both limited and unlimited food supplies, as well as large vs small containers, etc. Each student group produced a number of representations of their findings and then the class, with the teacher's guidance, developed a class consensus.

Fig. 4. Simulation’s data output page. [Source: 9].

The teacher only assisted in focusing the student discourse about the final consensus, so that it was very much a student-driven model. The final consensus usually consisted of graphical representations (see Fig. 6), diagrammatic representations (see Fig. 7) as well as verbal representations. An example of a verbal representation is: "As the days go by, the number of paramecium increases at a greater rate. The relationship is not linear, so that when you double the days the number does not double. Different organisms have different growth rates." At this point the class had not determined a mathematical representation; the mathematical representation was later developed using the data from the two types of paramecium. The unit then had the students deploy or test their new model for population growth in multiple contexts, for example, what happens when two species of paramecium are living together in the same container. Thus, students continued to refine their model further.

Fig. 5. Simulation's Graphical Output Page - students' predictions almost match the observed graph [Source: 9].

Fig. 6. Sample of graphical representations of the model. [Source: 9].

Fig. 7. Sample diagrammatic representation [Source: 9].

3.4 Research Instrument

Many instruments have been developed and used to measure students' scientific reasoning; however, many of these tend to target scientific literacy broadly and not the specific sub-skills of scientific reasoning that this study is targeting [54]. One assessment, Lawson's Classroom Test for Scientific Reasoning (LCTSR), can be used not only to assess the overall scientific reasoning ability of students but also to assess the specific sub-skills that are pertinent to Modelling Instruction pedagogy [55]. These sub-skills consist of control of variables, correlational reasoning, proportional reasoning and hypothetico-deductive reasoning. In addition, the LCTSR
has the advantage of having been used in multiple studies across a number of contexts [54]. Therefore, this study chose to make use of the 24-item, two-tiered Lawson's Classroom Test for Scientific Reasoning (LCTSR) as a pre- and post-test [55]. The single-tiered analysis of the assessment scores allows not only for a total scientific reasoning score but also for the determination of separate sub-skill scores. The LCTSR also has the advantage that, when scored as a two-tiered assessment, the results can be used to determine the number of students at different Piagetian reasoning stages (i.e., Formal Reasoner, Late Transitional Reasoner, Early Transitional Reasoner and Concrete Reasoner). Piagetian reasoning stages are based on students' abilities to apply deductive reasoning skills to abstract hypothetical problems. Lawson's test allows one to identify learners as Level 0 (Piagetian concrete operational reasoners), Levels 1 and 2 (Piagetian transitional reasoners) or Level 3 (Piagetian formal operational reasoners). Thus, based on the scores obtained, students can be categorized into separate reasoning levels. Figure 8 compares the LCTSR to the three levels of Piagetian formal reasoning and specifies reasoning characteristics at each level. The shifts in Piagetian reasoning levels allow another picture to develop in terms of the efficacy of the modelling-embedded simulation. The pre-test LCTSR was given within the first 2 weeks of the school year and the post-test was given during the last month of the school year.

3.5 Data Analysis and Results

Since this study is focused on population ecology, only 20 of the 24 items on the LCTSR were analysed. The 4 items not analysed focused on conservation of mass and volume, which were not considered pertinent to this study since neither of these was directly focused upon in the implementation or comparison population growth units.

Single-tiered Question Analysis. In order to determine overall differences between the two cohorts, a single-tiered analysis was completed whereby all 20 items on the LCTSR were treated as independent from one another. The total scores on the single-tiered analysis were analysed for statistical differences. The data were then subdivided in order to obtain a single score for each scientific reasoning sub-skill. These values were analysed to determine similarities and differences in sub-skills between cohorts.
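As a concrete illustration of this single-tiered scoring, the sketch below marks each analysed item independently and aggregates the marks into an overall percentage and per-sub-skill percentages. The item-to-sub-skill mapping shown is a placeholder; the actual assignment of the 20 analysed LCTSR items to sub-skills is not listed in this chapter.

```python
# Single-tiered scoring sketch: each analysed item is marked 0 or 1
# independently, then aggregated overall and per sub-skill.
# ITEM_SUBSKILL is a placeholder mapping, not the real LCTSR answer key.
ITEM_SUBSKILL = {
    1: "control of variables", 2: "control of variables",
    3: "proportional reasoning", 4: "proportional reasoning",
    5: "correlational reasoning", 6: "correlational reasoning",
    7: "probability", 8: "probability",
    9: "hypothetico-deductive", 10: "hypothetico-deductive",
}

def score_student(item_scores):
    """item_scores maps item number -> 0/1; returns percentage scores."""
    by_skill = {}
    for item, correct in item_scores.items():
        by_skill.setdefault(ITEM_SUBSKILL[item], []).append(correct)
    result = {skill: 100 * sum(marks) / len(marks) for skill, marks in by_skill.items()}
    result["total"] = 100 * sum(item_scores.values()) / len(item_scores)
    return result

# Hypothetical response pattern: odd-numbered items answered correctly.
print(score_student({item: item % 2 for item in ITEM_SUBSKILL}))
```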


Fig. 8. Comparison of LCTSR with Piagetian reasoning levels [Source: 56].

Total Scientific Reasoning Scores. Initially, the pre-tests of the treatment and comparison cohorts were analysed to determine if there were any significant differences in overall average scientific reasoning between the two cohorts prior to the study implementation. The treatment and comparison cohorts' average scientific reasoning scores (M = 37.23 and 34.5, respectively) were not significantly different from each other (t(345) = 1.2, p = 0.23). Thus, the ability levels of the two groups were not different from each other at the start of the school year. This is not surprising since both cohorts came from the same district and their socio-economic backgrounds were similar. Figure 9 shows the overall average scientific reasoning pre-test and post-test scores for the two cohorts. The t-test results for a paired pre to post-test comparison of scientific reasoning scores were significant for the treatment cohort (t(410) = 3.29, p < 0.001) but not for the comparison cohort (t(280) = 1.52, p = 0.13). This demonstrates that there was no shift in reasoning ability for the comparison students, but there was an overall significant shift in reasoning ability for the treatment cohort.

Fig. 9. Single-tiered LCTSR pre- and post-test scores by cohort.
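The comparisons reported above follow a standard pattern: an independent-samples t-test on the two cohorts' pre-test scores and a paired t-test on each cohort's pre- and post-test scores. The sketch below shows how such an analysis could be reproduced with SciPy; the per-student score arrays are synthetic placeholders, since only cohort-level statistics are reported in this chapter.

```python
# Sketch of the reported comparisons using SciPy. The score arrays are
# synthetic placeholders; the chapter reports only summary statistics.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
treatment_pre = rng.normal(37, 15, 200).clip(0, 100)   # placeholder data
treatment_post = rng.normal(43, 15, 200).clip(0, 100)
comparison_pre = rng.normal(34, 15, 150).clip(0, 100)
comparison_post = rng.normal(37, 15, 150).clip(0, 100)

# Were the cohorts equivalent before instruction? (independent samples)
t_pre, p_pre = stats.ttest_ind(treatment_pre, comparison_pre)

# Did each cohort change from pre to post? (paired samples)
t_treat, p_treat = stats.ttest_rel(treatment_pre, treatment_post)
t_comp, p_comp = stats.ttest_rel(comparison_pre, comparison_post)

print(f"pre-test comparison:    t = {t_pre:.2f}, p = {p_pre:.3f}")
print(f"treatment pre vs post:  t = {t_treat:.2f}, p = {p_treat:.3f}")
print(f"comparison pre vs post: t = {t_comp:.2f}, p = {p_comp:.3f}")
```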


In order to determine if there was any difference between the two cohorts' post reasoning levels, the post-test scores of the two cohorts were also compared. The post-test scores of the treatment and comparison cohorts (M = 43.25 and 37.34, respectively) were significantly different (t(345) = 2.92, p < 0.004), further demonstrating that the treatment cohort enhanced their scientific reasoning abilities over the course of the school year to a greater extent than the comparison cohort, 12% vs 5%, respectively.

Analysis of Scientific Reasoning Sub-skills. The distribution of scientific reasoning sub-skill scores on the post-test by cohort can be seen in Fig. 10. Across all dimensions, the post-test sub-skill scores earned by the treatment cohort were larger than those of the comparison cohort. The proportional reasoning sub-skill is a bit concerning given that it has the lowest overall average score of all the sub-skills for both cohorts (see Fig. 10). Given the simulations' focus on the effects of one variable upon other variables, it was expected that the treatment cohort would have scored much higher than observed, since they were implicitly looking at differences in proportions throughout the unit.

Fig. 10. Scientific reasoning sub-skill post-test scores by cohort.

The differences in the pre- and post-test scientific reasoning sub-skill scores were analysed further to see if there were any specific differences between the cohorts that could be determined. Table 1 contains the pre- and post-test scores for each cohort as well as the normalized gain for each sub-skill. The significance of the difference between the pre- and post-test scores for each sub-skill was tested using t-tests. The pre to post scores that were statistically significant, with p-values less than 0.05 and less than 0.1, are highlighted in Table 1. The normalized gain (N-gain) was calculated in order to develop a more in-depth picture of the strengths and weaknesses of each cohort and the overall effectiveness of the treatment in terms of scientific reasoning sub-skills. The normalized gain is the ratio of the average gain (percent post-test score − percent pre-test score) to the maximum possible average gain (100 − percent pre-test score). Thus, it allows one to determine the gain based upon the number of percentage points available to gain for each cohort in each sub-skill. The overall N-gain for the treatment vs the comparison cohort was 12 vs 3, respectively. This implies, as did the statistical analysis, that the treatment was very effective at enhancing the scientific reasoning skills of the treatment group, while the comparison group showed little to no overall gain.
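A minimal sketch of the normalized gain calculation as defined above is given below; the check values are taken from the control-of-variables row of Table 1, although the exact rounding used by the authors is not stated.

```python
# Normalized gain as defined in the text: the ratio of the average gain
# (post % - pre %) to the maximum possible average gain (100 - pre %),
# expressed here in percentage points.
def normalized_gain(pre_percent, post_percent):
    return 100 * (post_percent - pre_percent) / (100 - pre_percent)

# Control-of-variables row of Table 1: treatment 31 -> 38, comparison 30 -> 33.
print(round(normalized_gain(31, 38)))  # ~10, matching the reported N-gain
print(round(normalized_gain(30, 33)))  # ~4
```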


Fig. 11. Scientific reasoning N-Gain scores by Cohort.

Table 1. Scientific reasoning average pre- and post-test scores and N-gains by cohort (* significance at p < 0.05, ** significance at p < 0.1).

Scientific sub-skills             | Treatment Cohort          | Comparison Cohort
                                  | Pre   Post   N-gain       | Pre   Post   N-gain
Control of variables              | 31    38**   10           | 30    33     4
Correlational reasoning           | 54    64**   22           | 45    55**   18
Proportional reasoning            | 17    25*    10           | 14    17     3
Hypothetico-deductive reasoning   | 35    40     8            | 25    32     9
Probability                       | 61    66     13           | 60    62     5

According to post-test scores, the treatment group outperformed the comparison group. However, the N-gain scores reveal a more nuanced picture. The treatment group showed much greater gains than the comparison group in the sub-skills of proportional reasoning, control of variables, and probability. However, the gains in correlational reasoning were more comparable. For hypothetico-deductive reasoning, the comparison group made slightly higher normalized gains than the treatment group. One possible explanation is that the pre-test scores of the comparison group for these two categories were much lower than for the treatment group, so there was greater opportunity for gain. Because the post-test scores for hypothetico-deductive reasoning were not very high for the treatment group, this analysis also suggests an opportunity for improvement as well as future research.


Thus, overall, the use of the modelling-embedded simulation produced larger N-gains in proportional reasoning, control of variables and probability. However, the effect on correlational reasoning over that of the comparison group was slight (i.e., N-gains of 22 vs 18, respectively). In addition, the effect on hypothetico-deductive reasoning was non-existent, with the N-gains being similar for both treatment and comparison cohorts (i.e., 8 and 9, respectively).

Two-tiered Question Analysis. By using the two-tiered analysis (treating paired items as a group), the LCTSR scores can be categorized into Piagetian reasoning stages. In this method the largest possible score is 13. Therefore, students scoring from 11–13 are categorized as Formal Operational Reasoners whereas those scoring 0–4 are categorized as Concrete Reasoners. Figures 12a and b show the shift in the number of students in each reasoning stage per cohort. Figure 12 demonstrates that the treatment cohort showed a larger shift towards formal reasoners from pre to post assessment than the comparison cohort (i.e., 7 vs 1 additional students, respectively). In the treatment group there was also a large decline in the number of Early Transitional Reasoners overall (from 85 to 67) and an increase in Late Transitional Reasoners (from 26 to 40). This is over a 50% increase in Late Transitional Reasoners from pre to post and an overall decline of 25% in the Early Transitional Reasoner category. However, only a few students shifted out of the Concrete Reasoner stage (less than 5%). This is a troubling finding for the treatment effectiveness.

Fig. 12. Pre and post student reasoning levels by Cohort [Source: 9].
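The text above fixes the end-point categories for the two-tiered scoring (0–4 = Concrete, 11–13 = Formal Operational) but does not state the boundary between Early and Late Transitional Reasoners, which appears only in Fig. 8. The sketch below therefore treats that boundary (here, 5–7 versus 8–10) as an assumption.

```python
# Two-tiered categorization sketch: paired LCTSR items are scored as a unit
# (maximum score 13) and the total places a student in a Piagetian stage.
# The 0-4 and 11-13 cut-offs come from the text; the Early/Late Transitional
# split used here is an assumption, since it is only shown in Fig. 8.
def piagetian_stage(two_tier_score):
    if not 0 <= two_tier_score <= 13:
        raise ValueError("two-tiered LCTSR scores range from 0 to 13")
    if two_tier_score <= 4:
        return "Concrete Reasoner"
    if two_tier_score <= 7:    # assumed boundary
        return "Early Transitional Reasoner"
    if two_tier_score <= 10:
        return "Late Transitional Reasoner"
    return "Formal Operational Reasoner"

print([piagetian_stage(score) for score in (3, 6, 9, 12)])
```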

On the other hand, the comparison group did not show any large shifts between reasoning levels. In fact, only one additional student appeared in each of the Formal Reasoner (from 0 to 1), Late Transitional Reasoner and Early Transitional Reasoner categories. The comparison cohort had similar findings to those of the treatment cohort in terms of shifts in Concrete Reasoners, with less than 5% shifting out of that category from pre to post study. Thus, the use of computer simulations in conjunction with Modelling Instruction pedagogy in a population ecology unit seems to allow for a change in Piagetian reasoning levels, allowing a much greater number of students to shift from Early Transitional Reasoners to Formal Reasoners. However, neither group showed much ability to shift concrete reasoners towards higher reasoning levels.

4 Discussion

4.1 Modelling Instruction with Embedded Simulations Increases Overall Scientific Reasoning Abilities

The single-tier item analysis demonstrated that statistically significant gains in scientific reasoning were made between assessment administrations by the treatment group using the population ecology simulation and Modelling Instruction (p < 0.001). However, the comparison cohort did not make any significant gains between administrations of the assessment. Indeed, a comparison of the two cohorts' total post-test scores demonstrated that the treatment cohort's average score was statistically greater than the comparison cohort's (p < 0.004). Therefore, the use of population growth computer-based spreadsheet simulations in conjunction with Modelling Instruction in the context of population ecology produced a significant overall shift in scientific reasoning skills.

4.2 Scientific Reasoning Sub-skills Show Mixed Results

The sub-skill post-test scores demonstrated that the treatment cohort's post-assessment reasoning abilities exceeded those of the comparison cohort. The sub-skills showing the lowest percentage values for both cohorts were proportional reasoning, hypothetico-deductive reasoning and control of variables. The differences in acquiring scientific reasoning sub-skills between the two cohorts were further assessed using normalized gain calculations. While the post-test scores indicated that students in both cohorts seemed to have difficulty mastering these three particular sub-skills, the N-gains demonstrated that the use of computer simulations embedded in a Modelling Instruction pedagogical unit did produce much larger N-gains in terms of proportional reasoning and control of variables. When using the simulations, students in the treatment group were more often required to analyse generated data using proportional reasoning. Moreover, the simulations asked the students to control variables when matching model-predicted and observed data. Both skills, however, have proven difficult to develop in students in other studies [57, 58]. Even though the N-gains were larger than those of the traditional students, the values point to the need for improvements in the use of simulations within this context. For example, specific scaffolds need to be put into place that will allow students to more explicitly compare data via proportional reasoning. In addition, the control of variables needs to be highlighted more for the students. For example, activities that have students compare results when only one variable is changed at a time vs a number of variables may encourage greater development of this sub-skill. The most troubling sub-skill was hypothetico-deductive reasoning, since the N-gain in both cohorts was essentially the same. This shows that the treatment had no overall effect on this reasoning skill, which is concerning since the unit specifically dealt with students not


only producing testable hypotheses but also testing them using the simulation. In order to encourage this skill development using simulations, additional activities and simulations need to be included that explicitly ask students to develop a hypothesis, detail a prediction and then specifically compare those predictions to simulation results. This type of technique should allow for further enhancement of this sub-skill.

4.3 Modelling Instruction with Embedded Simulations Scaffolds the Movement of Students Towards Abstract Reasoning

The two-tiered analysis showed that the use of a simulation in conjunction with Modelling Instruction assisted over 6% of the students in shifting to formal reasoners. In addition, the treatment cohort did reveal several students shifting from Piagetian Early Transitional reasoning abilities towards higher reasoning levels. The comparison group did not demonstrate any major shift between reasoning levels. This means that the treatment students overall should be better placed to succeed in future science classes, given the correlation between scientific reasoning levels and conceptual gains [e.g., 7, 53]. However, even though the treatment cohort showed much more positive results in terms of shifts in Piagetian reasoning levels from pre to post assessment, there was still very little shift of concrete reasoners to higher reasoning levels. This is troubling since the students who could most benefit in the future from a shift in reasoning levels would be the concrete reasoners. This could be improved upon by changes to the unit that make the simulations more understandable to concrete reasoners, possibly by scaffolding their transition from physical to computer simulations (i.e., from concrete to more analytic representations). Nevertheless, the study did demonstrate that the use of this unit is a great improvement, in terms of shifts in reasoning levels, over traditional classes. Thus, the use of computer-based simulations in modelling-based classrooms shows much promise for the future.

5 Conclusions and Future Directions

Overall, the study demonstrated that the use of simulations in conjunction with Modelling Instruction pedagogy produces positive results in terms of scientific reasoning gains versus the comparison cohort (i.e., a 15% difference in overall reasoning scores). In addition, the study demonstrated the ability of the materials to allow a greater number of students to shift their Piagetian reasoning levels towards becoming more formal reasoners. Therefore, overall, students using these materials should be better prepared for advanced science study. However, the results also demonstrated that the materials need to be improved in order to allow for a more authentic ability to practice control of variables and to develop proportional reasoning skills. In addition, the unit did not allow for much change in hypothetico-deductive reasoning ability in the treatment cohort over that of the comparison cohort. Therefore, studies need to be conducted that will allow for a greater amount of hypothesis testing in order to further enhance this skill in students via the use of simulations.


In addition, differences in simulation use between high- and low-ability students should be studied in order to develop better simulation scaffolds. Better scaffolds could allow all students to show similar gains in reasoning levels across classrooms that contain students of varying abilities, and these changes could help more concrete reasoners move towards abstract reasoning. Furthermore, this study did not include cohorts that used just the simulation, or just the Modelling Instruction materials without population ecology simulations, in order to tease apart the effects of the two in terms of scientific reasoning skills. Future studies should also include an analysis of conceptual gains as well as of scientific reasoning.

Acknowledgements. This research was partially funded by grants under the federally funded Math Science Partnership State Grants Program, Grant numbers OH160505 and OH160511. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding organizations.

References

1. Mullis, I.V.S., Martin, M.O., Goh, S., Cotter, K. (eds.): TIMSS 2015 Encyclopaedia: Education Policy and Curriculum in Mathematics and Science. Retrieved from Boston College, TIMSS & PIRLS International Study Center website (2016). http://timssandpirls.bc.edu/timss2015/encyclopedia/
2. Organisation for Economic Co-operation and Development (OECD): Low-Performing Students: Why They Fall Behind and How to Help Them Succeed. PISA, OECD Publishing, Paris (2016). http://dx.doi.org/10.1787/9789264250246-en
3. Heaps, A.J., Dawson, T.D., Briggs, J.C., Hansen, M.A., Jensen, J.L.: Deriving population growth models by growing fruit fly colonies. Am. Biol. Teacher 78(3), 221–225 (2016)
4. Oswald, C., Kwiatkowski, S.: Population growth in Euglena: a student-designed investigation combining ecology, cell biology, and quantitative analysis. Am. Biol. Teacher 73(8), 469–473 (2011)
5. Huppert, J., Lomask, S.M., Lazarowitz, R.: Computer simulations in the high school: students' cognitive stages, science process skills and academic achievement in microbiology. Int. J. Sci. Educ. 24(8), 803–821 (2002)
6. Passmore, C., Gouvea, J.S., Giere, R.: Models in science and in learning science: focusing scientific practice on sense-making. In: Matthews, M.R. (ed.) International Handbook of Research in History, Philosophy and Science Teaching, pp. 1171–1202. Springer, Dordrecht (2014). https://doi.org/10.1007/978-94-007-7654-8_36
7. Coletta, V.P., Phillips, J.A., Steinert, J.J.: Why you should measure your students' reasoning ability. Phys. Teacher 45, 235–238 (2007)
8. Moore, J.C., Rubbo, L.J.: Scientific reasoning abilities of nonscience majors in physics-based courses. Phys. Rev. Spec. Topics – Phys. Educ. Res. 8(1), 10106 (2012)
9. Malone, K.L., Schuchardt, A.M.: Improving students' performance through the use of simulations and modelling: the case of population growth. In: Lane, H., Zvacek, S., Uhomobhi, J. (eds.) Proceedings of the 11th International Conference on Computer Supported Education, vol. 1, pp. 220–230. Crete, Greece, May 2019
10. Brody, M.J., Koch, H.: An assessment of 4th-, 8th-, and 11th-grade students' knowledge related to marine science and natural resource issues. J. Environ. Educ. 21(2), 16–26 (1990)
11. Munson, B.H.: Ecological misconceptions. J. Environ. Educ. 25(4), 30–34 (1994)
12. Griffiths, A.K., Grant, B.A.C.: High school students' understanding of food webs: identification of learning hierarchy and related misconceptions. J. Res. Sci. Teach. 22(5), 421–436 (1985)
13. Stammen, A.: The development and validation of the Middle School Life Science Concept Inventory (MS-LSCI) using Rasch analysis. Doctoral dissertation, Ohio State University (2018)
14. KMK [Sekretariat der Ständigen Konferenz der Kultusminister der Länder in der BRD] (ed.): Bildungsstandards im Fach Biologie für den Mittleren Schulabschluss [Biology education standards for the Mittlere Schulabschluss]. Wolters Kluwer, München & Neuwied (2005)
15. NGSS Lead States: Next Generation Science Standards: For States, By States. The National Academies Press, Washington, DC (2013)
16. Berber, N.C., Guzel, H.: Fen ve matematik öğretmen adaylarının modellerin bilim ve fendeki rolüne ve amacına ilişkin algıları [Pre-service science and mathematics teachers' perceptions of the role and purpose of models in science]. Selçuk Üniversitesi Sosyal Bilimler Enstitüsü Dergisi 21, 87–97 (2009)
17. Henze, I., Van Driel, J., Verloop, N.: The change of science teachers' personal knowledge about teaching models and modeling in the context of science education reform. Int. J. Sci. Educ. 29(15), 1819–1846 (2007)
18. Justi, R., Gilbert, J.: Teachers' views on the nature of models. Int. J. Sci. Educ. 25(11), 1369–1386 (2003)
19. Krell, M., Krüger, D.: Testing models: a key aspect to promote teaching activities related to models and modelling in biology lessons? J. Biol. Educ. 50(2), 160–173 (2016)
20. Ware, T., Malone, K.L., Irving, K., Mollohan, K.: Models and modeling: an evaluation of teacher knowledge. In: Proceedings from HICE 2017: The 15th Annual Hawaii International Conference on Education, pp. 1834–1842. Honolulu, HI, January 2017
21. Giere, R.N.: How models are used to represent reality. Philos. Sci. 71, 742–752 (2004)
22. Svoboda, J., Passmore, C.: The strategies of modeling in biology education. Sci. Educ. 22(1), 119–142 (2013)
23. Dori, Y.J., Belcher, J.: Learning electromagnetism with visualizations and active learning. In: Gilbert, J. (ed.) Visualization in Science Education, pp. 198–216. Springer, Dordrecht (2005). https://doi.org/10.1007/1-4020-3613-2_11
24. Won, M., Yoon, H., Treagust, D.F.: Students' learning strategies with multiple representations: explanations of the human breathing mechanism. Sci. Educ. 98(5), 840–866 (2014)
25. Harrison, A.G., Treagust, D.F.: Learning about atoms, molecules, and chemical bonds: a case study of multiple-model use in grade 11 chemistry. Sci. Educ. 84, 352–381 (2000)
26. Chang, H., Chang, H.: Scaffolding students' online critiquing of expert- and peer-generated molecular models of chemical reactions. Int. J. Sci. Educ. 35(12), 2028–2056 (2013). https://doi.org/10.1080/09500693.2012.733978
27. Dauer, J.T., Momsen, J.L., Speth, E.B., Makohon-Moore, S.C., Long, T.M.: Analyzing change in students' gene-to-evolution models in college-level introductory biology. J. Res. Sci. Teach. 50(6), 639–659 (2013)
28. Jackson, J., Dukerich, L., Hestenes, D.: Modeling instruction: an effective model for science education. Sci. Educator 17(1), 10–17 (2008)
29. Malone, K.L.: Correlations among knowledge structures, force concept inventory, and problem-solving behaviors. Phys. Rev. – Spec. Topics Phys. Educ. Res. 4(2), 20107 (2008)
30. Malone, K.L., Schunn, C.D., Schuchardt, A.M.: Improving conceptual understanding and representation skills through Excel-based modeling. J. Sci. Educ. Technol. 27(1), 30–44 (2018)
31. Passmore, C., Stewart, J.: A modeling approach to teaching evolutionary biology in high schools. J. Res. Sci. Teach. 39(3), 185–204 (2002)
32. Schwarz, C.V., White, B.Y.: Metamodeling knowledge: developing students' understanding of scientific modeling. Cogn. Instruc. 23(2), 165–205 (2005)
33. Wynne, C., Stewart, J., Passmore, C.: High school students' use of meiosis when solving genetics problems. Int. J. Sci. Educ. 23(5), 501–515 (2001)
34. Lehrer, R., Schauble, L.: Seeding evolutionary thinking by engaging children in modeling its foundations. Sci. Educ. 96(4), 701–724 (2012)
35. Wells, M., Hestenes, D., Swackhamer, G.: A modeling method for high school physics instruction. Am. J. Phys. 63(7), 606–619 (1995)
36. Jenkins, J.L., Howard, E.M.: Implementation of Modelling Instruction in a high school chemistry unit on energy and states of matter. Sci. Educ. Int. 30(2), 97–104 (2019)
37. Malone, K., Reiland, R.: Exploring Newton's third law. Phys. Teacher 33(6), 410–411 (1995)
38. Malone, K.L., Schuchardt, A.M.: The efficacy of modelling instruction in chemistry: a case study. In: Proceedings from HICE 2016: The 14th Annual Hawaii International Conference on Education, pp. 1513–1518. Honolulu, HI (2016)
39. Malone, K.L., Schuchardt, A.M., Sabree, Z.: Models and modeling in evolution. In: Harms, U., Reiss, M. (eds.) Evolution Education Re-considered, pp. 207–226. Springer, UK (2019). https://doi.org/10.1007/978-3-030-14698-6_12
40. D'Angelo, C., Rutstein, D., Harris, C., Bernard, R., Borokhovski, E., Haertel, G.: Simulations for STEM Learning: Systematic Review and Meta-analysis. SRI International, Menlo Park (2014)
41. Smetana, L.K., Bell, R.L.: Computer simulations to support science instruction and learning: a critical review of the literature. Int. J. Sci. Educ. 34(9), 1337–1370 (2012)
42. Wilensky, U., Reisman, K.: Thinking like a wolf, a sheep, or a firefly: learning biology through constructing and testing computational theories - an embodied modeling approach. Cogn. Instruct. 24(2), 171–209 (2006)
43. Donnelly, D.F., Namdar, B., Vitale, J.M., Lai, K., Linn, M.C.: Enhancing student explanations of evolution: comparing elaborating and competing theory prompts. J. Res. Sci. Teach. 53(9), 1341–1363 (2016)
44. Kuhn, D.: Children and adults as intuitive scientists. Psychol. Rev. 96, 674–689 (1989)
45. Lawson, A.E.: The nature and development of scientific reasoning. Int. J. Sci. Math. Educ. 2(3), 307–338 (2004)
46. Kuhn, D., Dean Jr., D.: Connecting scientific reasoning and causal inference. J. Cogn. Dev. 5(2), 261–288 (2004)
47. Russ, R.S., Coffey, J.E., Hammer, D., Hutchison, P.: Making classroom assessment more accountable to scientific reasoning: a case for attending to mechanistic thinking. Sci. Educ. 93(5), 875–891 (2009)
48. Lawson, A.E.: Developing Scientific Reasoning Patterns in College Biology. NSTA Press, Virginia (2006)
49. Klahr, D.: Exploring Science: The Cognition and Development of Discovery Processes. MIT Press, Cambridge (2002)
50. Zimmerman, C.: The development of scientific reasoning skills. Dev. Rev. 20(1), 99–149 (2000)
51. Lawson, A.E., Banks, D.L., Logvin, M.: Self-efficacy, reasoning ability, and achievement in college biology. J. Res. Sci. Teach. 44(5), 706–724 (2007)
52. Coletta, V.P., Phillips, J.A.: Interpreting FCI scores: normalized gain, preinstruction scores, and scientific reasoning ability. Am. J. Phys. 73(12), 1172–1182 (2005)
53. Ding, L.: Verification of causal influences of reasoning skills and epistemology on physics conceptual learning. Phys. Rev. Spec. Topics – Phys. Educ. Res. 10(2), 023101 (2014)
54. Ding, L., Wei, Z., Mollohan, K.: Does higher education improve student scientific reasoning skills? Int. J. Sci. Math. Educ. 14, 619–634 (2016)
55. Lawson, A.E.: The development and validation of a classroom test of formal reasoning. J. Res. Sci. Teach. 15, 11–24 (1978)
56. Stammen, A., Malone, K.L., Irving, K.E.: Effects of Modeling Instruction professional development on biology teachers' scientific reasoning skills. Educ. Sci. 8(3) (2018). https://doi.org/10.3390/educsci8030119
57. Ben-Chaim, D., Fey, J.T., Fitzgerald, W.M., Benedetto, C., Miller, J.: Proportional reasoning among 7th grade students with different curricular experiences. Educ. Stud. Math. 36(3), 247–273 (1998)
58. Klahr, D., Li, J.: Cognitive research and elementary science instruction: from the laboratory, to the classroom, and back. J. Sci. Educ. Technol. 14(2), 217–238 (2005)

Novice Learner Experiences in Software Development: A Study of Freshman Undergraduates

Catherine Higgins(B), Ciaran O'Leary, Claire McAvinia, and Barry Ryan

Technological University Dublin, Dublin, Ireland
[email protected]

Abstract. This paper presents a study that is part of a larger research project aimed at addressing the gap in the provision of educational software development processes for freshman, novice undergraduate learners, with the goal of improving proficiency levels. With the aim of understanding how such learners problem solve in software development in the absence of a formal process, this case study examines the experiences and depth of learning acquired by a sample set of novice undergraduates. A novel adaption of the Kirkpatrick framework, known as AKM-SOLO, is used to frame the evaluation. The study finds that, without the scaffolding of an appropriate structured development process tailored to novices, students are in danger of failing to engage with the problem solving skills necessary for software development, particularly the skill of designing solutions prior to coding. It also finds that this lack of engagement directly impacts their affective state on the course and continues to negatively impact their proficiency and affective state in the second year of their studies, leading to just under half of the students surveyed being unsure if they wish to pursue a career in software development when they graduate.

Keywords: Software development undergraduate education · Freshman university learners · Kirkpatrick framework · SOLO taxonomy

1 Introduction

The rapid growth in technologies has increased the demand for skilled software developers, and this demand is increasing on a global scale. A report from the United States Department of Labor [1] states that employment in the computing industry is expected to grow by 12% from 2014 to 2024, a higher rate than the average for other industries. However, learning how to develop software solutions is not trivial due to the high cognitive load it puts on novice learners. Novices must master a variety of skills such as requirements analysis, learning syntax, understanding and applying computational constructs and writing algorithms [2]. This high cognitive load means that many novice developers focus on programming language syntax and programming concepts and, as a result, find the extra cognitive load of problem solving difficult [3]. This suggests
that there is a need for an educational software development process aimed at cognitively supporting students in their acquisition of problem solving skills when developing software solutions. However, even though there are many formal software development processes available for experienced developers, very little research has been carried out on developing appropriate processes for freshman university learners [4]. This lack of appropriate software development processes presents a vacuum for educators, with the consequence that the skills required for solving computational problems, specifically carrying out software analysis and design, are typically taught very informally and implicitly on introductory courses at university [5, 6]. This is problematic for students as, without systematic guidance, many novices may adopt maladaptive cognitive practices in software development. Examples of such practices include rushing to code solutions with no analysis or design, and coding by rote learning [7]. These practices can be very difficult to unlearn and can ultimately prohibit student progression in the acquisition of software development skills [7, 8]. It has also been found that problems in designing software solutions can persist even past graduation [9]. To address these challenges, this paper presents results and findings from a focused case study which is the first part of a larger research project, the ultimate aim of which is to develop an educational software development process with an associated tool for novice university learners. An adaption of the Kirkpatrick evaluation model [10] is used to frame the evaluation in this study. A companion paper [11] first presented findings from the application of the first two levels of the adapted evaluation model (known as AKM-SOLO). In this extended paper, a detailed description of the full structure and application of the AKM-SOLO model is included. Furthermore, a summary set of results from the first two levels and full results from the remaining levels of the adapted model are presented and discussed. The aim of the study is to identify the specific issues and behaviour that can arise in the absence of a software development process when instructing novice undergraduate learners.

2 Related Research There has been a wealth of research over three decades into the teaching and learning of software development to improve retention and exam success rates at university level. Research to date has focused on a variety of areas such as reviewing the choice of programming languages and paradigms suitable for novice learners. A wide variety of languages have been suggested from commercial to textual languages through to visual block-based languages [12]. Other prominent research has included the development of visualisation tools to create a diagrammatic overview of the notional machine as a user traces through programs and algorithms [13, 14]; and the use of game based learning as a basis for learning programming and game construction [15, 16]. Research that specifically looks at software development practices for introductory software development courses at university level have tended towards the acquisition of programming skills, with the focus on analysis and design skills being studied as part of software engineering courses in later years. Examples of such research include Dahiya [17] who presents a study of teaching software engineering using an applied


approach to postgraduate and undergraduates with development experience, Savi and co-workers [18] who describe a model to assess the use of gaming as a mechanism to teach software engineering and Rodriguez [19] who examines how to teach a formal software development methodology to students with development experience. In examining research into software development processes aimed at introductory courses at university, comparatively few were found in the literature. Those that have been developed tend to focus on a particular stage of the development process or on a development paradigm. Examples include the STREAM process [4] which focus on design in an object oriented environment; the P3 F framework [20] with a focus on software design and arming novice designers with expert strategies; a programming process by Hu and co-workers [21] with a focus on generating goals and plans and converting those into a coded solution via a visual block-based programming language; and POPT [22] which has a focus on supporting software testing. In contrast to the processes cited above, this research has a focus on all stages of problem solving when developing software solutions. This study is part of the first cycle of an action research project whose ultimate aim is the generation of an educational software development process aimed at this category of student to support their acquisition and application of problem solving skills.

3 Research Methods The research question for this study is: In the context of problem solving in software development by novice university learners, what are the subjective experiences and depth of learning of a sample cohort of freshman, university students studying software development without the support of a formal software development process? 3.1 Participants The control group was a cohort of first year undergraduate students who were registered on a degree in software development in the academic years 2015/16 and 2016/17. Given that the participants were not randomly assigned by the researcher, it was necessary to first conduct a pre-test to ensure they were probabilistically equivalent in order to reduce any threat to the internal validity of the experiment. This means that the confounding factor of any student having prior software development experience was eliminated. The control group had 82 students of which the gender breakdown was 70% male and 30% female. These students were tested again at the end of their second year where 16 students participated from the academic year 2016/17 and 25 students participated from the academic year 2017/18 giving a combined control group of 41 with a 75% male to 25% female gender breakdown. 3.2 Pedagogical and Assessment Process The module that was the subject of this study was a two semester, 24 week introduction to software development which ran over the first academic year of the programme. It has


been observed in Sect. 1 of this paper that there is a gap in software engineering education in the provision of software development processes for freshman, undergraduate computing students [4]. Therefore, students in this study were taught software development in the absence of a formal software development process. This means that similar to equivalent undergraduate courses, students were primarily taught how to program in a specific language with the problem solving process to apply the language to solve problems being a suite of informal steps [6]. The programming language taught to students was Java and the order of topics taught to students are summarized in Table 1. These topics were taught via lectures and problem solving exercises given in practical sessions. Students were also taught to use pseudocode as a design technique in order to design solutions for the exercises. Table 1. The topics taught to students (Source: Higgins and colleagues [11]). Topics 1. Sequential Flow (e.g. using variables, display, inputs) 2. Non-sequential Flow (e.g. conditional constructs, loops) 3. Modularity (e.g. functions, parameters, scope of variables) 4. Object Oriented Interaction/Behaviour

When students were given a problem to solve, they were encouraged to analyse the problem by attempting to document on paper the requirements of the problem (i.e. a decomposition of the problem into a series of actions). Pseudocode and Java were used to design and code solutions to these requirements in an iterative and incremental cycle. There were nine intended learning outcomes (ILOs) for this module which were used as a mechanism to test students’ levels of proficiency in problem solving in software development. These ILOs are summarized in Table 2. Table 2. Taxonomy of Intended Learning Outcomes for the module (Source: Higgins and colleagues [11]). Taxonomy of Intended Learning Outcomes 1. Apply process of abstraction when solving problems 2. Illustrate evidence of mental modelling of programming concepts 3. Illustrate evidence of mental modelling of notional machine 4. Recognise opportunities for reuse of existing problems or sub-problems 5. Perform problem analysis and decomposition 6. Identify data that is required to solve a problem 7. Design algorithms for decomposed problems 8. Apply algorithm integration 9. Evaluate solution incrementally
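The analyse-design-code cycle described before Table 2 can be made concrete with a small example. The exercise, the function names and the pass threshold below are all invented for illustration, and the sketch is written in Python for brevity even though the module itself taught Java, so it should be read as an outline of the kind of workflow described rather than as actual course material.

```python
# A hypothetical practice exercise: "Read a list of exam marks and report
# the average mark and whether the class average is a pass (>= 40)."
#
# Step 1 - Analysis (decomposition into actions, documented on paper):
#   1. Obtain the marks.
#   2. Compute the average mark.
#   3. Compare the average against the pass threshold.
#   4. Display the result.
#
# Step 2 - Design (pseudocode for the central action):
#   set total to 0
#   for each mark in marks: add mark to total
#   average = total divided by number of marks
#   if average >= 40 then result is "pass" else result is "fail"
#
# Step 3 - Code the design, then test and refine it incrementally.

PASS_THRESHOLD = 40  # assumption made for this illustration only


def average_mark(marks):
    """Compute the average of a non-empty list of marks (sub-problem 2)."""
    total = 0
    for mark in marks:
        total += mark
    return total / len(marks)


def report(marks):
    """Combine the sub-problem solutions (sub-problems 3 and 4)."""
    avg = average_mark(marks)
    outcome = "pass" if avg >= PASS_THRESHOLD else "fail"
    print(f"Average mark: {avg:.1f} ({outcome})")


if __name__ == "__main__":
    report([55, 38, 71, 42])  # sample input standing in for user input
```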


3.3 Evaluation Process – AKM-SOLO An adaption of the Kirkpatrick model was used as the evaluation process for this study. The original Kirkpatrick model is a structured mechanism with five levels which was developed as a tool for businesses to test the effectiveness of either in-house or outsourced training programmes for employees [10, 23]. However, the scope of use for this model extends beyond business as there are also many examples in the literature of the model being used to test learning interventions in an educational context for students [24, 25]. In the original Kirkpatrick model, each of the levels are deployed sequentially starting with level 1 with each subsequent level becoming increasingly complex to measure. These higher levels provide increasingly more valuable information about the overall value and impact of the training [26]. The first level – Reaction -measures participants’ reactions to, and perceptions of, instruction received once it has been completed. The second level – Learning - assesses if the learning objectives for the training programme have been met. The third level - Behaviour – examines the behavioral change (if any) in participants as a result of instruction once they return to their jobs or future studies. The fourth level – Results – examines the targeted outcomes of training to an organisation such as reduced costs, improved quality and increased quantity of work. The fifth level – Return on Investment - measures the medium to long-term return on investment for an organisation. A return on investment is not relevant in the context of this study and in the evaluation of academic education and is therefore not considered further. However, the Kirkpatrick model is not without its critics. Specific criticism is based on the model operating as a summative, goal-based model of evaluation with the confounding factors that can affect learning often being ignored [27]. It has also has been noted that there can be little visibility into the learning that takes place and issues that arise as a training course proceeds [28]. Furthermore, the incompleteness of the framework is troublesome; particularly the high-level nature of the levels, where there is little guidance in how to evaluate those levels [29, 30]. Therefore, for this study, the original model with its first four levels has been adapted into a model titled the Adapted Kirkpatrick Model with SOLO (AKM-SOLO). The Structure of Observed Learning Outcomes (SOLO) taxonomy [31, 32] is incorporated into the adaptation in order to continually monitor the depth of learning that occurs to enhance the formative nature of the model. This change makes the adapted model both a summative and formative model of evaluation. A summary description of the four levels of this model can be seen in Table 3. Similar to the original model, this adapted model has four levels. Levels 1 and 2 are very similar to Kirkpatrick’s levels 1 and 2 albeit the newly adapted level 2 evaluation is more explicit given its incorporation with SOLO. Also as can be seen from Table 3, levels 3 and 4 have been renamed in this adapted model and also given a new focus. A full description of all four levels of the model is given in the remainder of this section. The choice of appropriate data collection instruments for this evaluation model was guided by the decision to employ a mixed methods design. 
Quantitative analysis was used in levels 2 and 3 of the AKM-SOLO model to evaluate a set of prescribed problems given at different stages of the academic year to test the depth of learning. Quantitative and qualitative analysis was carried out on the surveys and focus group sessions in


Table 3. A summary description of the four levels of the AKM-SOLO model framed in the context of the evaluation of a freshman, undergraduate software development course with data collection tools and mode of evaluation outlined for each level.

Level 1 – Reaction. Definition: An evaluation of students' reaction to, and experience of, the software development skills they were taught. Data collection tools: Post-test survey and focus group. Evaluation: Mixed methods evaluation with triangulation.

Level 2 – Learning. Definition: An evaluation of the depth of student learning that is taking place as they are being taught concepts and skills. Data collection tools: Suite of well-defined problems across each of the four topics. Evaluation: SOLO Taxonomy Framework.

Level 3 – Transfer. Definition: An evaluation of student software development competency at the end of first year by examining their ability to transfer their learning to solve a large, ill-defined problem. Data collection tools: A large, ill-defined problem. Evaluation: SOLO Taxonomy Framework.

Level 4 – Impact. Definition: An evaluation of the impact of first year problem solving in software development instruction on students' attitudes and practice in the second year of the programme. Data collection tools: Follow-up survey. Evaluation: Mixed methods evaluation with triangulation.

levels 1 and 4 (see Table 3). Given that this case study has a focus on understanding the learning process of freshman students studying software development for the first time, the confounding factor of prior learning is eliminated from the adapted model by subjecting students to a pre-intervention survey to ensure only novice learners are included in the evaluation. Descriptions of the characteristics and deployment of the four levels of the model is contained in the following subsections. Level 1 – Reaction. The aim of the first level was to document students’ reaction to, and experience of, problem solving in software development. In order to achieve this aim, five research questions were posed: 1. What quantifiable engagement do students have with software development? 2. What planning techniques (i.e. analysis and design techniques) did students find useful when solving computational problems?


3. What planning techniques (i.e. analysis and design techniques) did students NOT find useful when solving computational problems? 4. Is there an association between engagement and type of technique favoured? 5. What emotional responses did students experience on the course that they perceived motivated or demotivated them in their studies? To provide answers to these questions, students completed a survey (n = 82) and attended a focus group session (n = 21) close to the end of their first year of undergraduate study. In an attempt to quantify students’ engagement levels with problem solving, a dependent variable called engagement was generated from the survey. This variable had values ranging from 12, to indicate that a student is fully engaged with software development, to 0, to indicate student is not engaged. The formulation of the engagement variable involved examining 12 of the survey questions. These questions specifically examined student attitudes to the value they perceived analysis and design had when they are solving problems. Additionally, responses to these questions indicated whether the respondents would use these techniques outside of assignment work and if they planned to use them beyond the current academic year. A binary measurement score was given to the answers, which were summated to give the engagement value. The principal quantitative techniques used on the survey data were Cronbach’s alpha [33], to measure internal consistency of the data, and the Kruskal-Wallis test [34], to see if there is an association between students’ level of engagement and the type of software development techniques favoured. The tool used for the quantitative analysis was IBM SPSS Version 24. The data collected from the open questions of the survey and the focus group were subjected to qualitative thematic analysis as suggested by Braun and Clark [35]. The tool used to assist in this analysis was NVivo Version 12. Level 2 - Learning. Levels 2 and 3 of the original Kirkpatrick model required enhancement in order to have a clear and traceable process to examine learning in a formative mode. To do this, the Structure of Observed Learning Outcomes (SOLO) taxonomy [31, 32] was used to augment these levels so the depth of student learning taking place as the course progressed could be measured and issues identified. In order to test student learning in each of the four topics of interest (see Table 1), a suite of sixteen problems (four problems per topic) were given to students during the research period. As a mechanism to test the depth of student learning applied when solving these problems, a SOLO taxonomy framework was developed which mapped the five SOLO stages against the nine ILOs presented in Table 2. This framework was used as a guide by researchers to measure the depth of learning a student demonstrated in each of the nine ILOs for a specific problem (see Table 4 for a subset of this framework for illustrative purposes). For each problem solution completed by each student (i.e. 82 students by 16 problems), the depth of learning was measured as a SOLO score for each of the nine ILOs. The SOLO score achieved was measured as a number from 1–5 to represent the SOLO stages Prestructural (1), Unistructural (2), Multistructural (3), Relational (4) and Extended Abstract (5). Calculating the mean of all nine ILO SOLO scores produced a single average SOLO score which represents the SOLO depth of learning for that problem in a


specific topic for a student. Finally, calculating the mean of all student solutions for all four problems in a topic produced a single average SOLO score for that topic.

Table 4. A subset of the SOLO Taxonomy framework as applied to the first three stages of the SOLO taxonomy in conjunction with the first three ILOs from Table 2 (Source: Higgins and colleagues [11]).

SOLO Stage 1: Prestructural. Applying abstraction: No understanding of abstraction. Programming Concepts: No understanding of concepts. Notional Machine: Cannot articulate state of concept.

SOLO Stage 2: Unistructural. Applying abstraction: Can abstract from problem specification to code only. Programming Concepts: Understand one of the concepts. Notional Machine: Can articulate state of one concept.

SOLO Stage 3: Multistructural. Applying abstraction: Can abstract between several levels (e.g. spec – analysis, analysis – design, analysis – code, design – code) but no traceability across all levels. Programming Concepts: Understand several concepts but can't relate them. Notional Machine: Articulate states of several concepts but can't relate them.
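A small sketch of the scoring arithmetic just described may help: each solution receives nine ILO scores between 1 and 5, the per-problem score is their mean, and the per-topic score is the mean over all student solutions to that topic's problems. The student identifiers and scores below are invented, and the sketch is only an illustration of the calculation described in the text, not the tooling actually used in the study.

```python
import numpy as np

# scores[s][p] holds the nine ILO SOLO scores (1-5) awarded to student s's
# solution of problem p within one topic; all figures below are invented.
scores = {
    "student_01": {"p1": [2, 3, 2, 1, 3, 3, 2, 1, 2], "p2": [3, 3, 2, 2, 3, 3, 2, 2, 3]},
    "student_02": {"p1": [1, 2, 2, 1, 2, 2, 1, 1, 2], "p2": [2, 3, 2, 2, 3, 3, 2, 1, 2]},
}

# Per-problem depth of learning: the mean of the nine ILO scores.
per_problem = {
    student: {p: np.mean(ilo_scores) for p, ilo_scores in problems.items()}
    for student, problems in scores.items()
}

# Per-topic depth of learning: the mean over all student solutions to the
# topic's problems (four problems in the study; two in this toy example).
topic_score = np.mean([v for problems in per_problem.values() for v in problems.values()])

print(per_problem)
print(f"Average SOLO score for the topic: {topic_score:.2f}")
```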

Level 3 - Transfer. Level 3 in the original Kirkpatrick model is known as Behaviour, as it is intended to examine employee behaviour once they return to the workplace to see how the training has impacted their work practices. In the context of this study, students are not returning to work, but an equivalent experience is learning transfer which is the ability of a learner to successfully apply the knowledge and skills acquired in a more realistic problem solving situation. Given that the domain of learning here is problem solving with software development, this level examines students’ ability to transfer their learning from the relative containment of smaller, well-defined problems into a larger, more ill-defined problem that would mirror more closely a real-world problem. In order to better reflect this specific focus, the level is renamed Transfer in this adaptation. The level is evaluated using the same process as level 2 but in this case, instead of solving a suite of problems based on each of the four topics, students are presented with a large ill-defined problem to which they have to provide a solution. Solutions are measured for depth of learning in each of the nine ILOs so comparisons can be made with the results from level 2 and a conclusion reached regarding students level of proficiency in software development going forward.


Level 4 - Impact. In AKM-SOLO, the title of level 4 is renamed from Kirkpatrick’s original title of Results to Impact as at this level, the focus is on examining the impact that learning how to problem solve in software development in first year has on the second year experience. The rationale for this inquiry is that it has been reported in the literature that software development habits and attitudes acquired by novice learners can be very difficult to unlearn [7]. Therefore, this level evaluates students at the end of their second year to see what positive, negative or neutral impact their first year training in problem solving has had on their attitudes to- and affective state with - software development in future years. To carry out this evaluation, a survey is given to students at the end of their second year and the evaluation is framed around the following four research questions. 1. What are students’ attitudes to analysis and design in general at the end of second year? 2. What are students’ attitudes to the specific analysis and design techniques they were taught in first year? 3. What recommendations do students have to improve analysis and design in first year? 4. What impact has students’ first year experience had on their affective state when applying problem solving techniques to computational problems in second year?

4 Results and Findings This section presents the results and findings from carrying out the evaluation. Full results for AKM-SOLO level 1 (research questions 1 to 4 in Sect. 3.3, Level 1 Reaction) and AKM-SOLO level 2 can be found in Higgins and colleagues [11] with a summary of those results presented in Sects. 4.1 and 4.2 of this paper. Full results from the evaluation of AKM-SOLO levels 3 and 4 can be found in Sect. 4.3 and Sect. 4.4 of this paper. 4.1 Level 1 - Reaction This level measured students’ reactions to, and experiences of, problem solving in software development where data was collected using a survey (n = 82) and running a focus group session (n = 21). A quantifiable engagement level in problem solving (see Sect. 3.3) was calculated for the cohort (n = 82) which resulted in an average score of 5.7 out of 12. This score indicates a less than average engagement with problem solving. In examining the planning techniques (i.e. analysis and design techniques) that students find useful when solving computational problems, 42% of survey participants (n = 35) and 48% of focus group participants (n = 10) were positive about the use of analysis as a technique to help them break down the main problem into a series of ordered sub-problems which were easier to individually solve. Examining the planning techniques that students did not find useful when solving computational problems, pseudocode was specifically cited by 46% (n = 38) of survey participants with 67% (n = 14) of focus group students indicating that they found design to be very confusing and unhelpful to them when solving computational problems.
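As a rough illustration of how the engagement measure and the association test described in Sect. 3.3 fit together, the sketch below sums twelve binary survey items into a 0 to 12 engagement value and applies a Kruskal-Wallis test across technique groups. All responses and group labels are invented, and the study's own analysis was carried out in IBM SPSS rather than in code like this.

```python
from scipy import stats

# Each row: twelve binary survey items (1 = engaged response) plus the
# planning technique the student reported favouring. All values are invented.
responses = [
    {"items": [1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1], "technique": "analysis"},
    {"items": [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0], "technique": "pseudocode"},
    {"items": [1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1], "technique": "analysis"},
    {"items": [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], "technique": "none"},
    {"items": [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], "technique": "pseudocode"},
    {"items": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0], "technique": "none"},
]

# Engagement: summated binary scores, giving a value between 0 and 12.
for r in responses:
    r["engagement"] = sum(r["items"])

# Kruskal-Wallis test of whether engagement differs across technique groups.
groups = {}
for r in responses:
    groups.setdefault(r["technique"], []).append(r["engagement"])

h_stat, p_value = stats.kruskal(*groups.values())
print(groups)
print(f"H = {h_stat:.2f}, p = {p_value:.3f}")
```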


Testing the association between the different types of analysis and design techniques favoured by students, it was seen that 78% (n = 30) of students in this study who indicated that they found no technique useful also had a very low engagement level of 0–2, with 21% (n = 8) having an engagement level of 3 and 1% (n = 1) an engagement level of 4. Conversely, 84% (n = 41) of students who indicated they favoured the technique of requirements analysis had an engagement level of 7. 48% (n = 24) of those specifically specifying pseudocode or design techniques had an engagement factor of 3 or less. This result highlights the use of pseudocode as being negatively correlated with student engagement. Conversely, of the 100% (n = 12) of students who indicated that no technique was unhelpful, 62% (n = 7) had an engagement level of 8 or more. In examining the data from the focus group, 58% (n = 12) of students indicated that they did not carry out any design prior to attempting to code a solution and of those students, 78% (n = 9) had an engagement level of 3 or less. The findings from the fifth research question posed in Sect. 3.3 (Level 1 – Reaction), which was outside the scope of Higgins and colleagues [11], are reported here. What emotional responses did students experience on the course that they perceived motivated or demotivated them in their studies? In the context of the survey, when attempting to cite emotional responses that they perceived motivated them, students found this question difficult to answer as 73% (n = 60) either provided no answer or indicated that they were unsure. Of the answers that were received (n = 22), these answers are categorised and aggregated into three codes [36] – Enjoy creating fun or useful solutions (8.5%, n = 7), Enjoy writing programs (10%, n = 8), Motivating to see success on course to date (8.5%, n = 7). When citing demotivating factors, students were much more comfortable with answering this question with 21% (n = 17) citing that there were no demotivating factors. Of the 79% (n = 64) of students who did respond, these answers were codified into Confidence knocked from having to engage with Analysis and Design techniques (95%), Feeling bored by software development (3%), Feeling frustrated as software development too difficult (2%). In the context of the focus group responses, there was a large response of 81% (n = 17) of students who cited design as providing a demotivating emotional response. Students indicated that design annoyed them, that it made them lose interest in software development, that it lowered their confidence in their ability, that they hated the subject as a result, that it was a miserable experience and they would consider leaving the course as a result. For students who indicated that the design process was motivating to them, 38% (n = 8) of students responded but interestingly of that 38%, 75% (n = 6) used diagrams for design with these students indicating that switching to diagrams made them feel calmer about solving problems and gave them an overview of what they wished to achieve in their solution. This was in a context where diagrammatic techniques for design were not taught to students. “Once I backed away from design using pseudocode and then started diagramming, life became much less stressful and I actually started to enjoy it” – (Focus Group Student 07). The remaining 2 students who submitted a positive result about design indicated that they enjoyed design when they worked with friends as it “made me feel less alone”


(Focus Group Student 15) and “it’s okay not to understand design initially” (Focus Group Student 9). 4.2 Level 2 – Learning It was observed from the findings produced at this level that the depth of learning for students was expected to begin at SOLO score 2.33 (just above the unistructural score of 2) but it actually began at 1.99, which is just below this SOLO stage (see Fig. 1). As the students progressed through the four topics, the actual depth of learning remained lower than expected with the final question in the 4th topic producing an actual score of 3 (multistructural stage) while the expected score was 4 (relational stage).

Fig. 1. Line chart to compare Expected SOLO scores with Actual SOLO scores across all four topics by students (n = 82) (Source: Higgins et al. [11]).

Therefore, even though it was expected that on average students would be able to combine multiple concepts when solving problems, in reality while they could understand and utilise several ILOs across the four topics, they had difficulties when it came to integrating ILOs to generate correct solutions. This is a low result to achieve at the end of the course as it suggests that while students can demonstrate aptitude in multiple ILOs separately, they cannot integrate them (which is the SOLO relational stage). This ability to integrate ILOs when planning and developing solutions is required if students are to become proficient problem solvers in software development. It was seen that this issue exists primarily due to students having difficulties utilising design, integration and solution reuse with the learning outcomes evaluation, abstraction and modelling the notional machine also causing significant learning issues for students. However, understanding programming constructs, data representation and analysis and decomposition were at the multistructural stage which suggests students can understand and mentally model programming concepts and variables but they find it difficult to apply that knowledge when generating solutions. A positive observation is that while the actual SOLO means for each of the four topics remained lower than the expected means, both sets of


means followed a similar upward trend, meaning there was an improvement in the depth of learning. 4.3 Level 3 – Transfer


In order to test the impact of learning on students’ continued ability to solve problems, a final assignment was given to students at the end of their first year, which incorporated all topics taught on the course. This was presented as a mechanism to examine students in the process of solving a larger and more ill-defined problem in comparison to the more well-defined problems that were provided for each of the four course topics (see Table 1). The students were given a basic specification that required them to research and create a retail application for a real-world organisation that sold products or services to customers. Given that students advanced to the multistructural stage (SOLO score 3) at the end of the fourth topic in level 2, it was expected that they would at least remain at this stage as they progressed through this level. Furthermore, it was expected that they would continue to improve and would move towards the relational stage (SOLO score 4) of understanding. The solutions to this problem were analysed using the SOLO taxonomy framework as introduced in Level 2 (see Sect. 3.3, Level 2- Learning). A chart presenting the percentage of students (n = 82) with their SOLO scores achieved in each of the nine ILOs is given in Fig. 2.

Fig. 2. AKM-SOLO Level 3 Transfer - A Clustered Column Chart showing the percentage of students achieving each SOLO score (Prestructural 1, Unistructural 2, Multistructural 3, Relational 4) for each of the nine ILOs from Table 2 (n = 82).

From examining the data for the intended learning outcomes (see Fig. 2), it can be seen that the ILOs related to understanding programming constructs, analysis and decomposition, identifying data and evaluation recorded a majority of students achieving a SOLO score of 4 (relational stage). This suggests that the majority of the cohort


could successfully combine these topics when applying them to a large problem. This was an improvement from AKM-SOLO level 2 where the majority of students achieved a SOLO score of 3 (multistructural stage). In examining the concepts of design and integration, which students found particularly difficult in level 2, there was a small improvement (see Fig. 3). For design, a proportion of students moved from the prestructural and unistructural depth of learning stages in level 2, resulting in small gains at the multistructural stage in level 3 (increase of 5%; n = 4) and the relational stage (increase of 8%; n = 6). In contrast, for integration, there was an 8% (n = 6) increase at the prestructural stage when level 3 is compared to level 2, indicating that there were more students who perceived they didn’t understand the concept than was recorded in level 2. This increase in miscomprehension is not surprising given the size and complexity of integrating a solution at this level. The remaining learning outcomes of abstraction, notional machine and reuse all showed a decrease of less than 5% in understanding when moving from levels 2 to 3.

Fig. 3. Comparison of AKM-SOLO Levels 2 and 3 SOLO scores for the ILOs design and integration (n = 82). Each panel plots the percentage of students at each SOLO stage (Prestructural, Unistructural, Multistructural, Relational) for the Level 2 SOLO scores and the Level 3 SOLO scores.

It was seen that, on average, when all nine ILO scores are combined, 7% (n = 3) of students were still at the prestructural stage which was unchanged from level 2 and 24.7% (n = 10) were at the unistructural stage, which is an increase of 1%. However, there was a decrease of 1.7% at the multistructural stage, which is marginally mitigated by an increase of 0.6% at the relational stage, which brings that stage to 38% (n = 15). In order to measure if these percentage changes from level 2 to level 3 were statistically significant, a paired-samples t-test was conducted to compare the results between both levels and it was found that there was no significant difference between the two levels (t = 0, p > 0.05).
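For reference, the comparison described here amounts to a paired-samples t-test over each student's mean SOLO score at level 2 and at level 3, which could be run, for example, with scipy. The score lists below are invented placeholders, as the study's own analysis was performed in SPSS; the sketch only shows the shape of the test.

```python
from scipy import stats

# Mean SOLO scores per student (invented values): one list for the level 2
# problems and one, in the same student order, for the level 3 problem.
level2 = [2.4, 3.1, 2.8, 3.3, 1.9, 2.7, 3.0, 2.5]
level3 = [2.6, 3.0, 2.9, 3.2, 2.0, 2.6, 3.1, 2.4]

# Paired-samples t-test: are the per-student differences centred on zero?
t_stat, p_value = stats.ttest_rel(level2, level3)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```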


4.4 Level 4 – Impact The impact level of students’ first year experience on their ongoing studies is tested at the end of their second year on the programme via a survey in which 41 students participated. It should be noted that during the second year, students continued their software development education but were not explicitly taught any new analysis or design techniques. Therefore, students relied on the problem solving strategies they were taught in first year. The results from the four research questions posed (see Sect. 3.3) for this level are now presented and explored. 1. What are students’ attitudes to analysis and design in general at the end of second year? To measure attitudes to analysis and design, students were asked five closed questions which examined their approach to solving computational problems. The results from these questions are presented in Fig. 4 and Table 5. These results show a very similar pattern to the attitudes of students to analysis and design in first year, which indicate that students’ attitudes have not changed in the intervening year. In the first year evaluation in Level 1, it was observed that 65% (n = 26) of students found analysis to be useful, whereas 46% (n = 19) specifically cited design (in the form of pseudocode) as not being useful, with 35% (n = 14) citing that neither analysis nor design were useful. A year later, where students are not explicitly taught any new analysis and design techniques but where they had more software development education, it can be seen that 56% (n = 23) agree that analysis is useful (see Question 1, Fig. 4). However, when solving problems, 73% (n = 31) of students do not design solutions, but instead look for code from an apparently similar problem to modify (see Question 3, Table 5). Equally, when students are faced with logical problems in their code, 68% (n = 28) would try and solve the problem by continually changing their code whereas 23% (n = 9) would revert to design (see Question 2, Table 5). Overall, students placed little value in analysis and design with over half explicitly labeling the process as being a waste of their time when trying to solve problems (see Questions 4 and 5, Fig. 4). In contrast, 41% (n = 17) explicitly indicate that they see the value in analysis and design in theory which would suggest that if they were taught analysis and design as part of an integrated process with programming, there may be scope for an improvement in their engagement with planning solutions to problems. 2. What are students’ attitudes to the specific analysis and design techniques they were taught in first year? To measure students’ attitudes in second year to the analysis and design techniques taught in first year, four closed questions were posed to examine how students approached solving computational problems. The results from these questions are presented in Fig. 5.

Fig. 4. AKM-SOLO Level 4 Impact - Graphed results from Likert formulated questions (Strongly Agree to Strongly Disagree; percentage of students, n = 41) examining second year students’ general attitudes to analysis and design after two years of software development study. The questions graphed are: Q1 – I always ensure I take time to first understand the goal and required outcomes of a problem before trying to solve it; Q4 – Analysis and design is a time wasting exercise; Q5 – I can see the value in engaging in analysis and design.

Table 5. AKM-SOLO Level 4 Impact - Tabular results from survey given to second years recording their attitudes to analysis and design after two years of software development study (n = 41).

Q2 – From the following options, indicate which option most closely matches your approach to solving logical errors in your programs. Result: 68% (n = 28) – “keep changing the code to try and get it to work”; 17% (n = 7) – “go back to [their] design and see if [they] can find any problems in your logic”; 10% (n = 4) – “get help from a friend”; 4% (n = 2) checked the Other option, with 2% who would “check code first and then go back to design” and 2% who would “mix between checking code and asking friends”.

Q3 – From the following options, choose the problem solving style that most closely matches your approach when solving challenging problems. Result: 73% (n = 30) – I would try and find a similar problem; 22% (n = 9) – I would design part of the solution first, then write code based on that design; 2.5% (n = 1) – I would design a full solution first; 2.5% (n = 1) – All of the above.

Fig. 5. AKM-SOLO Level 4 Impact - Results from survey given to second years recording their attitudes to the specific analysis and design techniques they were taught in first year (n = 41). The Likert questions (Strongly Disagree to Strongly Agree; percentage of students) graphed are: Q6 – I found the first year analysis and design techniques useful when solving problems this year; Q7 – I found the first year analysis and design techniques unhelpful this year and would prefer to have been taught other techniques; Q8 – I found the first year analysis and design techniques unhelpful this year and also see no value in learning other techniques; Q9 – The first year techniques taught have not influenced my view of the value of analysis and design in general.

It can be seen that 19% (n = 8) of students perceived analysis and design to be a valuable aspect of software development; with a majority of students (63%, n = 26) stating that they found the specific analysis and design techniques taught in first year were unhelpful (see Q6 and Q7 from Fig. 5). However, this 63% was in a context where

they did want to learn other techniques as 57% (n = 23; Q8 from Fig. 5) of students disagreed with the statement that they didn’t want to learn other techniques. This suggests that students, at least theoretically, see the value in analysis and design and are open to learning other planning strategies which is also supported by 42% (n = 17; Q9 from Fig. 5) of students indicating that the choice of techniques taught has not influenced their opinion of analysis and design in general. Unsurprisingly, of the students who gave examples of impediments to learning in first year, 62% (n = 12) specifically cited pseudocode as being unhelpful to them in first year with 21% (n = 4) indicating they saw no value to analysis and design in general. 3. What recommendations do students have to improve analysis and design in first year? Almost half (44%; n = 19) of students answered this question; and the responses were categorised, by the researcher, into three themes with many students suggesting more than one theme. 59% (n = 24) suggested diagrammatic techniques should be used for analysis and design; 32% (n = 13) indicated that analysis and design should be explicitly included in their second year of study and finally, 73% (n = 30) of students wanted classes dedicated to providing a definite strategy for how to solve problems.


4. What impact has students’ first year experience had on their affective state when applying problem solving techniques to computational problems in second year? To examine the impact of first year on students’ affective levels, students were asked three closed questions. The results from these questions are presented in Table 6.

Table 6. AKM-SOLO Level 4 Impact - Results from survey given to second years to examine their emotional responses to analysis and design in second year (n = 41).

Q13 – Indicate the impact of your first year experience learning how to solve problems on your motivation to solve problems in second year as positive, negative or neutral. Result: Positive – 33% (n = 10); Negative – 47% (n = 14); Neutral – 20% (n = 6).

Q14 – Rate on a 5 point Likert scale your current confidence level in software development from none to very confident. Result: Very Confident – 7% (n = 3); Reasonable Confidence – 32% (n = 13); Low Confidence – 44% (n = 18); No Confidence – 17% (n = 7).

Q15 – Rate on a Likert scale your level of interest in working as a developer following graduation with points from Definitely to Definitely Not. Result: Definitely future career – 29% (n = 12); Probably future career – 27% (n = 11); Don’t know – 22% (n = 9); Probably not future career – 22% (n = 9); Definitely not future career – 0%.

Almost half of students (47%, n = 14, Q13 from Table 6) indicated that their first year experience of learning how to analyse and design solutions to problems had a negative impact on their motivation to plan solutions to problems in second year. This lack of planning is also borne out in students’ confidence levels, with 61% (n = 25, Q14 from Table 6) indicating they had low or no confidence in their ability to solve software problems. This negative affective impact is also reflected in 44% (n = 18, Q15 from Table 6) of students who either don’t know or feel that this is probably not a future career for them. This means that by the end of second year, students are negatively affected by analysis and design which is impacting both their confidence and motivation in their software development studies.

5 Discussion From the application of the AKM-SOLO model in this study, it has been observed that students’ overall attitudes to, and affective states when, problem solving in software development are not encouraging. From the findings in the last section, it can be observed that students regard engaging in software development to be primarily about programming, with the concept of designing solutions in particular considered not to be useful and avoided where possible. It was also seen that this attitude carries through to the end of their second year on the programme. This is not an unexpected result given that it


has been cited in the literature that getting students to design solutions rather than try to program a solution through trial and error or memorizing other solutions is very difficult [20, 37]. Nonetheless, this is a worrying result especially as the issue is not that students do not have the aptitude to be software developers, but rather that they are not developing the analysis and design skills that allow them engage them with the problem solving nature of the discipline as a whole. As educators, we wish them to become developers who can design and implement solutions but they are inadvertently being taught with a focus on being programmers instead and this is affecting their proficiency and positive attitude to software development. Student engagement is generally considered to be a predictor of learning [38]. However, it has been noted that computer science students’ general level of engagement in their studies has been recorded internationally as being much lower than students from other disciplines [39]. Therefore, the relatively low engagement level of 5.7 out of 12 found in this study is not surprising as it suggests that a majority of students are not adequately engaged with the topic and that is borne out in the consistently underperforming set of actual SOLO scores acquired across the four topics taught in first year. Interestingly, 94% (n = 82) of the first year survey respondents view the process of programming as being more important than the analysis and design stages which suggests that they don’t see the value in carrying out planning prior to writing a program. This is an issue also observed by Garner [40] and it has been found that this lack of focus on planning is a lead issue in the development of maladaptive cognitive practices [7]. The results from this study suggest that student engagement in the process of solving software development problems is directly aligned to how useful they find the process of carrying out analysis and design. If the process of analysis and design wasn’t objectively important in software development, then students would be able to skip this stage and move directly to coding, and their engagement level would not be affected which has not been observed here. This is not a new observation as the importance of structuring problem solving into analysis and design strategies for novices has been recognised in the literature [41, 42]. Therefore, as the engagement level is low and their depth of learning in analysis and design is not at a SOLO relational stage, this suggests that if students can’t successfully participate in analysis and design, this affects their ability to engage fully with their studies to become proficient developers. On examining the findings, most students found the process of analysis (i.e. breaking a problem into a series of sub-tasks that need to be solved) to be a useful activity to help them start solving a problem. This is typical top-down analysis which has long been proven as a mechanism to support students [43]. This is reflected both in the responses from students in the focus group and survey as well as the improvement seen in SOLO scores for the ILO analysis and decomposition across the four topics. However, despite this positive experience, this ILO is still not at the SOLO relational stage that would be expected of students at the end of their first year, which suggests further structure in carrying out analysis would help. 
Students need to be able to visualise and create mental models in order to understand “what” needs to be done to solve a problem. However, it has been observed that most students find such mental modelling difficult [44]. Therefore, adding a visualisation technique to the analysis process could be useful


in helping students both carry out analysis as well as engage in the mental modelling required. The area of design is a seriously divisive issue for students. It has been found in other studies that design is typically a much harder task for novice learners than programming due to; the need for complex mental modelling of computing constructs to take place in order to design a solution and also the issues with understanding pseudocode and its inherent lack of feedback [40, 45, 46]. Likkanen & Perttula [47] also observe that even if students successfully complete design in a top-down fashion where they decompose a problem into sub-problems, they often then experience difficulties in integrating the subproblem solutions back into a final solution. These issues with design are also reflected in this study where it is very clear that pseudocode as a design technique is not fit for purpose; most students find it neither useful nor helpful. From the survey findings in research question 3 in Sect. 4.1, it can be seen that novice learners find it difficult to understand the role of pseudocode as a mechanism to abstract from the technicalities of a programming language and instead see it as yet another language they have to learn. This language issue with pseudocode was observed by Hundhausen et al. [48]. Students also criticized the lack of support and structure in this design technique which they find makes it difficult to use effectively. This difficulty is reflected by many students indicating that they move immediately to the coding phase before they have adequately decomposed a problem or carried out at least some design for a solution. From the focus group findings, this issue also emerges where it can also be seen that this issue with pseudocode is biasing students against their perception of design as being a useful process. This difficulty with design is also reflected in the SOLO scores where the ILOs of design, integration and solution reuse were found to have the lowest SOLO scores across the four topics; signaling students have a specific issue with these topics. Equally the ILOs involving the mental modelling of the notional machine, the use of abstraction and the evaluation of solutions also returned consistently low scores. As an alternative to pseudocode, it was seen from the survey findings in Sect. 4.1 that some students successfully gravitated towards using design techniques such as flowcharts to support them in designing algorithms despite it not being taught. Given that flowcharts have been cited in the literature as a very credible mechanism for visualising a flow of control in an algorithm [49] and that they also are a natural visualisation technique, such charts could be a very useful alternative to help students engage in the process of design. These negative attitudes to design were borne out when examining the results obtained when students were asked to develop a solution to an ill-defined, larger problem at the end of their first year. While students demonstrated marginal improvements in design, they were in general ill-equipped to analyse and design a solution and instead reverted to surface strategies of trying to program a solution without a plan. Students particularly exhibited problems with integrating incremental solutions into a final solution as can be seen in Fig. 2. 
Given these issues students have with design and integration in solving a larger problem, it is not surprising then to observe that students also performed poorly in being able to model the notional machine, carry out reuse and utilise abstraction to enable them to plan a solution at different levels of detail. This is a worrying result, as going forward in their studies, students will naturally be expected to be able


to solve more complex and ill-defined problems which require the active planning and modelling of solutions. In their second year, students received a full year of tuition in advanced programming which would have improved their technical knowledge. However, as was seen in the results for AKM-SOLO level 3, this had no impact on the negative regard they had for the problem planning techniques taught in first year which is carried through their second year. 66% (n = 27) of second year students indicated they did not find their first year experience in problem planning to be useful and an equal number of students specifically cited pseudocode as being an impediment to learning. This result is also borne out in the literature as Hu [50] in synthesising research from [9, 51–53] found that increased educational attainment in software development has little effect on students’ valuation of design as a useful process. However, encouragingly 57% (n = 23) disagreed with the statement that they would not like to learn other planning techniques which would tentatively suggest that they do see the value in planning even theoretically. From an affective perspective, 47% (n = 19) of second year students indicated that their first year experience had a negative impact on their motivation to solve problems in second year with 61% (n = 25) of students indicating they have poor or low confidence levels in software development. Such results are a concern as they suggest a high proportion of students are at risk of either leaving or failing to proceed in the programme. Overall, these results highlight the important role that first year analysis and design has in forming effective software development habits that will enable students to grow their proficiency and affective state as they proceed through their studies. This view is also supported by Hu [50] who argues for the use of an explicit design process when teaching software development as opposed to the global norm of using informal design strategies. In summary, the results produced less than satisfactory findings around the issue of problem solving for software development coupled with a low level of engagement. Therefore, it can be concluded that if students perceive they are not appropriately supported in the development process by the use of appropriate development techniques, this has a negative impact on their engagement levels with software development. This impact can negatively affect their chances of continuing, and succeeding, in their course as well as deciding to pursue a career in software development. These findings suggest that in order for students to engage in problem solving in software development, they need to be properly scaffolded and supported by a software development process to guide them in acquiring good development planning habits as they set out on their learning journey. This suggestion for an explicit process is backed up by 73% (n = 30) of second year students who indicated they wanted classes dedicated to providing a definite strategy for how to solve problems.

6 Conclusions There has been over thirty years of research into proposing new pedagogical approaches to teaching software development to freshman, undergraduate students. However, despite the valuable innovations that this research has produced, there are still ongoing issues recorded globally with proficiency and retention in comparison with other undergraduate programmes. This case study examined the learning experiences of


an undergraduate first year cohort who were studying software development as novices. It was found that the absence of a formal software development process for this cohort resulted in students attempting to program solutions to problems with little interest or engagement in problem planning and this issue continued into their second year of undergraduate study. In general, students could not see the benefit in carrying out analysis and design for problem solving and this not only affected their proficiency in software development but also had a negative impact on their desire to work in the software industry. These results suggest that the provision of an educational software development process, aimed specifically at first year novice learners, could have a positive impact on their learning and attitudes to problem solving in software development.

References 1. United States Department of Labor.: Computer and Information Technology Occupations. https://www.bls.gov/ooh/computer-and-information-technology/home.htm. Assessed 2 Feb 2018 2. Stachel, J., Marghitu, D., Brahim, T.B., Sims, R., Reynolds, L., Czelusniak, V.: Managing cognitive load in introductory programming courses: A cognitive aware scaffolding tool. J. Integr. Des. Process Sci. 17(1), 37–54 (2013) 3. Whalley, J., Kasto, N.: A qualitative think-aloud study of novice programmers’ code writing strategies. In: Proceedings of the 2014 Conference on Innovation & Technology in Computer Science Education, pp. 279–284. ACM, Uppsala (2014) 4. Caspersen, M.E., Kolling, M.: STREAM: a first programming process. ACM Trans. Comput. Educ. 9(1), 1–29 (2009) 5. Suo, X.: Toward more effective strategies in teaching programming for novice students. In: IEEE International Conference on Teaching, Assessment and Learning for Engineering (TALE), pp. T2A-1--T2A-3 (2012) 6. Coffey, J.W.: Relationship between design and programming skills in an advanced computer programming class. J. Comput. Sci. Coll. 30(5), 39–45 (2015) 7. Huang, T.-C., Shu, Y., Chen, C.-C., Chen, M.-Y.: The development of an innovative programming teaching framework for modifying students’ maladaptive learning pattern. Int. J. Inf. Educ. Technol. 3(6), 591–596 (2013) 8. Simon et al.: Predictors of success in a first programming course. In: Proceedings of the 8th Australasian Conference on Computing Education, vol. 52, pp. 189–196. Australian Computer Society, Inc., Hobart (2006) 9. Loftus, C., Thomas, L., Zander, C.: Can graduating students design: revisited. In: Proceedings of the 42nd ACM Technical Symposium on Computer Science Education, pp. 105–110. ACM, Dallas (2011) 10. Kirkpatrick, D.L.: Education Training Programs: The Four Levels, 3rd edn. Berrett-Kohler, San Francisco (1994) 11. Higgins, C., O’Leary, C., McAvinia, C., Ryan, B.: A study of first year undergraduate computing students’ experience of learning software development in the absence of a software development process. In: Lane, H., Zvacek, S., Uhomoibhi, J. (eds.) CSEDU 2019–11th International Conference on Computer Supported Education, 2019. SCITEPRESS, Heraklion, Crete (2019) 12. Pears, A., et al.: A survey of literature on the teaching of introductory programming. ACM SIGCSE Bull. 39(2), 2004–2023 (2007)



Promoting Active Participation in Large Programming Classes
Sebastian Mader and François Bry
Ludwig Maximilian University of Munich, Munich, Germany
[email protected]

Abstract. Introducing flipped classrooms in large lectures, which are predominant across North America and Europe, is not an easy task, as with increasing class sizes it becomes more and more difficult for lecturers to support students while they work on the exercises: To know whom to support, an overview of the class is required. This article introduces the learning and teaching format Phased Classroom Instruction, which aims at making active learning similar to flipped classrooms usable in large class lectures with the help of technology. Technology supports students while working on exercises through immediate feedback and scaffolding, and lecturers by giving them an overview of the class which allows them to identify the struggling students. The contributions of this article are twofold: first, the novel learning and teaching format Phased Classroom Instruction with its specific technological support, and second, two evaluations of the proposed format in a small and a large class respectively, showing that the format is well-liked by students, but exposing problems of the format in the large class as well. This article is an extended version of [22], first presented at CSEDU 2019 in Crete, and extends upon the original article by adding a more detailed description of the technological support and a second evaluation.

Keywords: Active learning · Peer review · Flipped classroom

1 Introduction

Active learning can be "defined as any instructional method that engages students in the learning process" [26, p. 233], which has been shown to positively affect students' learning when compared to traditional lecturing in STEM education [10]. Nonetheless, the traditional lecture is still the predominant form of teaching across North American [28] and European universities. That might be justified by the fact that lectures are an efficient way of transferring knowledge: knowledge transfer through lectures has been shown to be as effective as through other means, but lectures are generally less appropriate for evoking thought about or interest in the subject [4]. For evoking thought and interest, active approaches might be more appropriate, but in their study examining the structure of STEM teaching in North American universities, Stains et al. [28] observe that with increasing class sizes lectures tend to include fewer interactive components, which might be an effect of ever-increasing numbers of students, which often leaves the traditional lecture as the last resort.

An often-used active learning format is the flipped classroom, which flips activities traditionally done during lectures with activities done outside of lectures: students learn the subject matter outside of lectures using learning material provided by lecturers, and lectures are dedicated to practical exercises [3] with lecturers standing ready to act as "guide on the side" [18] to those students unable to solve the exercises on their own. Flipped classrooms face two challenges: First, the creation of material for self-learning is generally more time-consuming than the creation of material for a lecture session, as the former has to be more bullet-proof without a lecturer at hand who can provide clarifications and correct errors on-the-fly. Second, flipped classrooms scale badly with larger class sizes, as with increasing numbers of students it becomes harder and harder for lecturers to identify whom to help.

Technology opens up new ways for active learning in large classes, as it can support students and lecturers alike during exercises: Students can be supported through problem- or subject-specific editors that provide immediate feedback and scaffolding, allowing more students to solve an exercise on their own, and for lecturers, technology can provide an overview of their class which enables them to identify struggling students who are unable to solve the exercise on their own.

The learning and teaching format Phased Classroom Instruction combines mini-lectures of about 20 to 25 min with the aforementioned technology-supported exercises where students are supported in solving exercises and lecturers in keeping an overview of their class. The inherent difficulty of self-learning, regardless of how good the learning material might be, is mitigated through mini-lectures, which allow lecturers to provide further explanations and react to errors in the lecture material. The technological support makes all students' submissions available for further processing, which opens up ways to use those in further phases: Phased Classroom Instruction concludes with a phase in which each student gets assigned another student's submission for review.

The contributions of this article are twofold: first, the novel learning and teaching format Phased Classroom Instruction with its specific technological support, and second, two evaluations of the proposed format in a small and a large class respectively, showing that the format is well-liked by students, but also exposing problems of the format in the large class.

This article is structured as follows: Sect. 1 is this introduction. In Sect. 2 related work is discussed. Section 3 describes the course of a lecture using Phased Classroom Instruction. The technology enabling and supporting Phased Classroom Instruction is introduced in Sect. 4. Section 5 reports on two evaluations of the format done in lectures on programming. Section 6 summarizes the article, points out lessons learnt from the evaluation, and provides further research directions.

This article is an extended version of [22], first presented at CSEDU 2019 in Crete, and extends upon the original article by adding a more detailed description of the technological support and a second evaluation which points to the format working in large classes but exposing problems as well.

2 Related Work

The course format introduced in this article is a contribution to flipped classrooms and relates to feedback, peer review, and scaffolding. The following section is a verbatim reproduction of the section on related work of [22].

The survey of flipped classroom research by Bishop et al. [3] distinguishes two kinds of teaching activities: activities taking place in the classroom and activities taking place outside of the classroom. For their survey, to qualify as flipped classroom, classroom activities have to consist of "interactive group learning activities" [3, p. 4], while outside classroom activities have to consist of "direct computer-based individual instruction" [3, p. 4]. Their definition excludes formats that do not use videos for outside classroom activities as well as formats that include traditional lectures among classroom activities. Thus, according to their definition, the format proposed in this article does not qualify as flipped classroom even though it incorporates components of that format.

Flipped classrooms have already been deployed and evaluated in STEM education: Amresh et al. [1] introduced a flipped classroom in an introductory computer science class of 39 students. Their evaluation shows that while the flipped classroom improved examination results, some students were overwhelmed and intimidated by the format. Gilboy et al. [11] applied a flipped classroom to two classes on nutrition: outside of the classroom, students learned from mini-lectures and written material while all of the in-classroom time was devoted to active learning in the form of a "jigsaw classroom" (see [2]). In an evaluation, the majority of students preferred the classroom learning activities to a traditional lecture of similar duration. While the aforementioned studies represent flipped classrooms adhering to Bishop et al.'s definition, other studies examined flipped classrooms in which some kind of lecture took place during the in-classroom activities: Stelzer et al. [30] introduced flipped classrooms in an introductory course in physics attended by 500 to 1,000 students. Classroom activities were conducted in groups of 24 students. At the beginning, the course format did not include any lectures among the classroom activities, but it was later adapted to contain a small lecture at the classroom sessions' beginning recapitulating the outside classroom activities. The authors observed a positive influence of the educational format on examination results. Furthermore, the course was perceived by the students as less difficult than the same course taught in a traditional manner. McLaughlin et al. [23] used a flipped classroom approach to teach pharmaceutics to 162 students. Their approach included on-demand micro-lectures to "reinforce and, if needed, redirect students' learning" [23, p. 3].


The "Taxonomy of Educational Goals" by Bloom [5] defines a hierarchy of six educational goals aimed at comparing and classifying educational formats and content: Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation. Each of the aforementioned goals has all previously mentioned goals as precondition, the rationale being that nothing can be applied without first being known and comprehended. Every goal except Application is further subdivided into more specific goals. The taxonomy was revised by Krathwohl [19]: the goals are expressed as verbs and reordered, resulting in the goals Remember, Understand, Apply, Analyze, Evaluate, and Create. In the revised taxonomy, Remember is not equal to Knowledge from the original taxonomy; in fact, Knowledge is broken up into four dimensions: Factual Knowledge, Conceptual Knowledge, Procedural Knowledge, and Metacognitive Knowledge. These knowledge dimensions are considered orthogonal to the formerly mentioned goals, resulting in a two-dimensional taxonomy. This taxonomy makes it possible to express distinct objectives such as "analyze factual knowledge" and "analyze conceptual knowledge". Peer review supports students in attaining parts of the learning goal Analyze, as this goal concerns itself with "making judgements based on criteria and standards" [19, p. 215].

Peer review consists of students providing feedback on their peers' work. This feedback can either replace the lecturer's feedback or extend it. Peer review is reflexive in the sense of making reviewers reflect on their own work [20,32,35] and has been shown to have a positive impact on reviewers' own writing [20]. In a study in tertiary STEM education, Heller and Bry [15] used peer review for providing feedback on coding assignments. Their study showed that in the majority of cases, the delivered peer review was correct and that the majority of the students found it helpful to deliver peer reviews, but that they found the peer reviews they received on their own work only sometimes helpful. To support students giving peer review, scoring rubrics can be provided to students [8]. A rubric is "a scoring guide to evaluate the quality of students' constructed responses" [25, p. 72]. Jonsson and Svingby [17] conclude in their survey that the use of scoring rubrics in peer review can further support students' learning.

According to Hattie and Timperley [13], feedback is "information provided by an agent (...) regarding aspects of one's performance and understanding" [13, p. 81] and effective feedback should answer three questions [13, p. 87]:

– "Where Am I Going?" [13, p. 88]: With feed up, a task can be contextualized showing learners to what end a certain concept should be learned.
– "How Am I Going?" [13, p. 89]: The feed back dimension gives information about students' performance on a task and how their performance relates to some performance goal.
– "Where to Next?" [13, p. 90]: With feed forward, learners can be given an outlook where to go next, e.g., by providing them with further sources of information on concepts not understood or on related concepts.


Feedback can be provided on four levels [13]:

– Task-level feedback is given pertaining to students' work on a task, e.g., the correctness of a mathematical computation.
– Process-level feedback aims to give information about the processes that are involved in achieving a goal, e.g., giving learners information about what rules to apply to simplify a mathematical equation.
– Self-regulation feedback aims to provide feedback about students' self-regulation skills and metacognitive knowledge, e.g., the skill to self-evaluate one's own work.
– Self-level feedback is unrelated to the task and only pertains to the student, e.g., "You are very talented."

With the exception of self-level feedback, each of the aforementioned feedback levels is effective depending on the learner and the situation in which the feedback is given [13]. As for the timing of feedback, Hattie and Timperley conclude from surveying various studies that task-level feedback should be provided as soon as possible and process-level feedback should be delayed so as not to inhibit the construction of the learners' autonomy.

Scaffolding is defined by Wood et al. [36] as a "process that enables a child or novice to solve a problem (...) which would be beyond his unassisted efforts" [36, p. 90]. One way to provide scaffolding is by feedback. Van Merriënboer et al. [33] further specify scaffolding as a combination of "performance support and fading" [33, p. 5], the support first being provided to the students while they are working towards a goal and later being gradually withdrawn, or "faded". Scaffolding can be provided in person by instructors, which generally requires them to work with a single student or a small group of students, or provided by software, which can accommodate more students and be used in out-of-classroom learning scenarios. Automatic scaffolding is often provided in the form of feedback and can be found in intelligent tutoring systems and adaptive learning environments. The "Test My Code" environment by Vihavainen et al. [34] provides feedback in the form of both automated tests and incremental exercises, that is, exercises partitioned into guidance-giving subtasks. Sao Pedro et al. [27] developed an automatic scaffolding system which supports students in developing data collection skills by means of simulations of scientific experiments. Their environment detects off-track students and provides them with feedback to help them get back on track. In their study, they showed that the automatic scaffolding provided by their system helped students to develop data collection skills. Automatic scaffolding is explored for non-STEM subjects as well: He et al. [14] propose a system for automatic assessment of text summaries produced by students which provides feedback in the form of key points missing in students' summaries. Yang [37] provides computer-generated feedback about a concept map created by students about a text's content. Yang observed that the feedback provided by their system had a positive impact on the students' reading comprehension as well as on their summary-writing skills.

Didactic reduction is very similar to scaffolding and fading. Didactic reduction is a term coined by Grüner [12] and refers to breaking down a concept into its most basic parts ("scaffolding") while still retaining its functionality. Later on, the more advanced parts can be put back step by step ("fading") until the concept is available in its full complexity. In tertiary STEM education, new concepts are often embedded into other concepts, making it hard for students to grasp the actual concept to learn. With didactic reduction, other concepts can be omitted at first, and later on be reintroduced step by step.

The article by Heller et al. [16] describes how various course formats can be realized by Backstage (https://backstage2.pms.ifi.lmu.de:8080). Phased Classroom Instruction is briefly mentioned, however, without referring to its evaluation and implementation first reported in this article.

3 Phased Classroom Instruction

The following section introduces Phased Classroom Instruction and is a near verbatim reproduction of Sect. 3.1 of [22]. All mentions of teams have been rephrased to mention students as well, as the format is not limited to teams.

A session in the format consists of one or more blocks, each of them consisting of three phases:

1. a lecture (subsequently mini-lecture) of about 15 to 25 min introducing new concepts to the students,
2. an extensive practical exercise where students or teams work on an exercise putting the newly acquired concepts to use; during this phase, the lecturer stands ready to provide struggling students or teams with support,
3. a peer review where each student or team is assigned another student's or team's submission for review.

Mini-lectures minimize the amount of passive listening and therefore counteract the problem of students' attention dropping during lectures after about 25 to 30 min [31]. To restore students' attention, Young et al. suggest that "short breaks or novel activities may temporarily restore attention to normal levels" [38, p. 52], which leads to the next part of the format, the exercise for students to work on alone or in teams. Depending on the subject taught, the exercise can take different forms: from a larger coding exercise or a mathematical proof to the creation of a larger body of text about some topic. Working on exercises in teams leverages benefits of collaborative learning, such as an improvement in academic achievement [26].

The combination of lecture, exercise, and peer review brings the majority of Bloom's taxonomy (in the revised form) into the classroom: For "Knowledge", the mini-lectures provide both the factual and conceptual knowledge required for the exercise. If exercises are formulated in a scaffolded way, they can provide procedural knowledge: breaking a bigger exercise into smaller subtasks shows students the subproblems the bigger problem is composed of, and thereby one possible approach to solving it. Peer review teaches students the ability to evaluate and correct the work of others, and therefore supports students' metacognitive knowledge through its reflective nature. The dimensions "Remember" and "Understand" are covered by the mini-lectures, "Apply" and (parts of) "Analyze" by exercises, and (parts of) "Analyze" and "Evaluate" through peer review. Therefore, the format should cover five of the six steps of Bloom's revised taxonomy. The last step, "Create", would arguably require more extensive problems and a longer duration, which can hardly be implemented in a lecture.

While the process described above does not require technology to work, as students can work on paper, lecturers identify struggling students by walking through the classroom, and peer review is done by switching papers with the desk neighbour, with increasing class sizes all those steps get more and more difficult. Therefore, technology can help to bring the active learning format described above to large classes.

4 Technological Support

In order to make Phased Classroom Instruction work in large classes, technology has to provide support on various levels: support for students who are on the brink of being able to solve the exercise on their own, so that they are empowered to solve exercises by themselves; support for lecturers to identify those students who cannot solve exercises with technological support alone and therefore require their personal help; and support for orchestrating the format's phases, from assigning exercises, to the distribution of submissions for review, to presenting each student with their reviewed submissions. The following section is a vastly extended version of Sect. 3.2 of [22].

The technological support for Phased Classroom Instruction is implemented in the learning and teaching platform Backstage (https://backstage2.pms.ifi.lmu.de:8080), which will be shortly introduced in the following before discussing how Backstage supports students and lecturers using the format. Backstage is a web platform that consists of a backchannel and an audience response system (ARS). A backchannel is a communication medium that provides students with a possibility to communicate anonymously during lectures. In the case of Backstage, that communication is done using annotations on lecture slides, which are immediately synchronized with all other participants and can be commented on. An ARS allows lecturers to run quizzes during lectures which provide immediate feedback to lecturers and students alike, supporting students' self-assessment and lecturers' assessment of a class's grasp of the taught subject [6]. The current version of Backstage allows course material (that is, the contents being presented during a lecture) to consist of various types of media, such as PDF slides, images, videos, or code, most of which can be annotated, which enables the aforementioned backchannel functionality. The ARS allows students to input their solutions to quizzes using various editors that go beyond multiple choice, and quizzes to span more than one phase. A more detailed introduction to Backstage's ARS can be found in [21].

The different editors that students can use directly from their browsers, in conjunction with an ARS that supports multiple phases, allowed Phased Classroom Instruction to be easily implemented using Backstage. In the following sections, the support provided by Backstage to students and lecturers is introduced.

4.1 Support by Technology for Students

Usually, when students are tasked to work on an exercise, they use pen and paper or software that is not built for learning but that would be used by professionals as well, such as integrated development environments for coding tasks. Students working on exercises in those ways are often dependent on human tutors for help, as the help systems of the software are focussed on the usage of the software and not on the subject matter it is used for; when working on paper, the obvious choices for finding help are the internet or books, where students are – in the authors' experience – often not able to find the required information.

Backstage's aforementioned ARS supports a wide variety of problem- or subject-specific editors for students to create their submissions. Problem- or subject-specific editors are software that supports students in learning a certain subject matter through computer-provided scaffolding, didactic reduction, and immediate feedback. Those editors allow more students to successfully solve the exercises, freeing up lecturers' time to support those students who require their personal support.

An example of an exercise-specific editor is an editor that allows students to create proofs using the proof technique Natural Deduction, which can be seen in Fig. 1. Constructing a proof using Natural Deduction encompasses various steps: building a syntactically correct proof tree, applying the correct rules in the correct order, managing assumptions, and recognizing if a tree represents a correct solution. The editor supports students throughout that process: students select formulas (middle part of Fig. 1) and a rule (left part of Fig. 1); rule application (and the construction of a syntactically correct proof tree) as well as assumption management (right side of Fig. 1) are taken over by the editor [29].

Fig. 1. Editor for creating proofs using the proof technique Natural Deduction (from [29, p. 4]).

An example of a subject-specific editor is the JavaScript editor which was used in the evaluation described in Sect. 5 and can be seen in Fig. 2. The editor allows students to immediately start coding without having to set up a development environment and provides immediate feedback in the form of error messages and the results of unit tests. The top part of Fig. 2 shows the view in which students enter their code; the tabs at the top allow students to toggle between the different functionalities of the editor. The bottom part of the same figure shows the output functionality, which shows the output of the entered code, in this case two crossing lines drawn on an HTML5 canvas element.

Beyond that, a wide variety of editors is imaginable, such as editors for constructing chemical (see, e.g., [7]) or mathematical formulas. In STEM, most content covered by those editors can be expressed in a formal language, which makes the submission understandable and in many cases correctable by a machine, which enables error messages and immediate feedback. Use beyond STEM is imaginable as well, as there are approaches for computer-provided scaffolding in non-STEM subjects, too (see Sect. 2). What all those editors have in common is that they run in the Backstage environment, which allows the system to use the currently worked-on submission to gauge the users' or teams' progress, aggregate it, and present it to lecturers in an overview.
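To make the interplay between a student's exercise code and the editor's immediate feedback more concrete, the following sketch shows how a small submission could be checked by unit tests directly in the browser. The function name drawCrossingLines, the test names, and the tiny test runner are illustrative assumptions made for this example; they are not Backstage's actual API.

```javascript
// Hypothetical exercise: students implement drawCrossingLines on an HTML5 canvas.
// The tiny test runner below only illustrates the kind of immediate feedback
// such an editor could give; it is not Backstage's actual test infrastructure.

// Student code (the exercise submission):
function drawCrossingLines(ctx, size) {
  ctx.beginPath();
  ctx.moveTo(0, 0);
  ctx.lineTo(size, size); // first diagonal
  ctx.moveTo(size, 0);
  ctx.lineTo(0, size);    // second diagonal
  ctx.stroke();
}

// Minimal unit tests run in the browser after each change:
function runTests() {
  const canvas = document.createElement("canvas");
  canvas.width = canvas.height = 100;
  const ctx = canvas.getContext("2d");
  const results = [];

  results.push({
    name: "drawCrossingLines is defined",
    passed: typeof drawCrossingLines === "function",
  });

  try {
    drawCrossingLines(ctx, 100);
    // A very rough check: after drawing, some pixels should no longer be blank.
    const pixels = ctx.getImageData(0, 0, 100, 100).data;
    const painted = pixels.some((value) => value !== 0);
    results.push({ name: "something was drawn on the canvas", passed: painted });
  } catch (error) {
    results.push({ name: "drawCrossingLines runs without errors", passed: false });
  }
  return results;
}

// The editor could then display the results as immediate feedback:
runTests().forEach((r) => console.log(`${r.passed ? "PASS" : "FAIL"}: ${r.name}`));
```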

4.2 Support by Technology for Lecturers

The larger the class, the more difficult it is for lecturers to keep an overview of the class and of their students' progress and, therefore, to identify whom to help. In such situations, lecturers are dependent on students asking for help or on walking through the classroom to spot struggling students or teams by chance by glancing at their papers or screens. To make it easier for lecturers to identify struggling students or teams, Backstage uses the currently worked-on submissions and provides lecturers with an aggregated overview of those on their own screen. Those overviews can be composed of two kinds of data:

– editor-independent data, such as idle time or time between actions
– editor-dependent data, such as the number of successful unit tests or the number of unsuccessful compiles

Fig. 2. Examples for the JavaScript editor: the top image shows the view in which students enter code, the bottom image shows the output of the code. In both images, students can switch between the different functionalities using the tab menu at the top.

If an overview of more than editor-independent data should be provided, a custom overview has to be built. An example of a custom-built overview for the JavaScript editor can be seen in Fig. 3: the two graphs show the number of successful unit tests on the y-axis and the time on the x-axis. The number above each of the graphs is the average slope of that graph, with the intuition that non-struggling teams have a higher slope on average. In the middle part, all tests which at least one person has failed are shown.


Fig. 3. Classroom progress for an exercise worked on using the JavaScript editor with two teams.
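As an illustration of how such an editor-dependent progress measure could be derived, the following sketch computes the average slope of a team's successful-unit-test count over time from timestamped snapshots. The snapshot format and the exact slope definition are assumptions made for this example, not a description of Backstage's implementation.

```javascript
// Each snapshot records when a team's submission was run and how many unit tests passed.
// This data layout is assumed for illustration only.
const teamSnapshots = [
  { timestamp: 0, passedTests: 0 },   // seconds since the exercise started
  { timestamp: 300, passedTests: 1 },
  { timestamp: 600, passedTests: 1 },
  { timestamp: 900, passedTests: 3 },
];

// Average slope: mean increase in passed tests per second over consecutive snapshots.
// The intuition from the overview in Fig. 3: non-struggling teams show a higher slope.
function averageSlope(snapshots) {
  if (snapshots.length < 2) {
    return 0;
  }
  let slopeSum = 0;
  for (let i = 1; i < snapshots.length; i++) {
    const deltaTests = snapshots[i].passedTests - snapshots[i - 1].passedTests;
    const deltaTime = snapshots[i].timestamp - snapshots[i - 1].timestamp;
    slopeSum += deltaTime > 0 ? deltaTests / deltaTime : 0;
  }
  return slopeSum / (snapshots.length - 1);
}

// A lecturer overview could sort teams by this value and highlight the lowest ones.
console.log(averageSlope(teamSnapshots).toFixed(4)); // 0.0033 passed tests per second
```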

5 Evaluation

Phased Classroom Instruction was evaluated in two successive terms in the accompanying course to a software development practical. This section first describes the two instances of the course and their differences, and then discusses the evaluations.

5.1 Courses

Both courses were the accompanying courses to a software development practical in which students develop a browser-based game using the programming language JavaScript in teams of four students. The students worked on the in-class exercises in the same teams as the teams they developed their games in. In the majority of lectures, more than one mini-lecture (with accompanying exercise and peer review) was held. Table 1 shows a first difference between the courses: the shorter duration of lectures in C2 led to a decrease in the number of mini-lectures (and consequently exercises) held in C2. Another difference between the two courses was the content of the exercises: in both courses, the mini-lectures aimed to prepare the students to implement a browser-based game using the programming language JavaScript, and therefore, each mini-lecture was structured around a certain concept of game programming.

Table 1. Overview of the two courses.

Course   Duration of a lecture   Number of participants   Number of mini-lectures
C1       135 min                 16                       11
C2       90 min                  44                       9

The subsequent exercises then focussed on applying the taught concept. In C1, teams applied the concepts to their own version of the game Snake (https://en.wikipedia.org/wiki/Snake (video game genre)), each exercise improving upon or adding new features to their implementation. Due to shorter lectures in C2, the implementation of Snake was no longer viable. The teams still worked on an ever-evolving program, but in a pared-down version: a rectangle, which was subsequently made controllable using the arrow keys, replaced with graphics, and finally animated.

A few changes were made in C2 regarding the presentation of the course material and exercises in response to students' criticisms and shortcomings that attracted the lecturer's attention in C1. For one, the inability to copy code from PDF slides was criticized by students, which was addressed by adding a new type of unit to Backstage that allows combining executable code with text formatted using Markdown (https://en.wikipedia.org/wiki/Markdown). An example of such a unit can be seen in Fig. 4: in this case, the code unit consists of two code snippets interspersed with text. The code is shown using the same editor students use to work on the exercises (see Fig. 2). Besides allowing students to easily copy code, those units opened up ways for new classroom interactions: lecturers being able to easily demonstrate effects of changes in the code, and students testing their own changes directly from the lecture material.

Fig. 4. Example of a code unit where two code snippets are interspersed with text (content translated from German).

In C1, exercises were expressed as a large text composed of a general description and a number of enumerated steps, with the intent of the teams working on the steps in order and learning a sound approach on how to solve the exercise at hand. Unfortunately, in the lecturer's experience, some teams often chose to ignore the steps and began to work on the task from the general description alone. To address that problem, exercises in C2 were presented in a scaffolded way where teams had to unlock each step through the click on a button. An example of a scaffolded exercise can be seen in Fig. 5, which shows the general description, two unlocked steps, and a button, a click on which would unlock the next step. Another difference between the courses was the availability of unit tests: while in C1 unit tests were not available at all, in C2, unit tests were available for five of the nine exercises.

Fig. 5. Example of a scaffolded exercise showing two unlocked steps and a button, a click on which would unlock the next step (content translated from German).
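As a rough illustration of the pared-down C2 exercise series described above (a rectangle made controllable with the arrow keys), a possible starting point could look like the following sketch; it is a reconstruction under assumptions and not the actual course material handed to the students.

```javascript
// Illustrative sketch of the C2 exercise series: a rectangle on a canvas,
// made controllable with the arrow keys. Assumes a <canvas id="game"> element.
const canvas = document.getElementById("game");
const ctx = canvas.getContext("2d");
const rect = { x: 50, y: 50, width: 20, height: 20, speed: 5 };

function draw() {
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  ctx.fillRect(rect.x, rect.y, rect.width, rect.height);
}

document.addEventListener("keydown", (event) => {
  switch (event.key) {
    case "ArrowUp":    rect.y -= rect.speed; break;
    case "ArrowDown":  rect.y += rect.speed; break;
    case "ArrowLeft":  rect.x -= rect.speed; break;
    case "ArrowRight": rect.x += rect.speed; break;
  }
  draw();
});

draw();
```

Later exercises in the series could then replace the filled rectangle with an image and add an animation loop, mirroring the progression described above.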

Peer review (the last step of Phased Classroom Instruction) could not be conducted consistently in both courses for various reasons: in C1, internet connectivity problems prevented some teams from submitting their code or connecting to the platform at all during the first lectures. In C2, the lack of time during the lectures prevented peer review from being conducted most of the time, because the lecturer chose to let students work on the exercises for a longer time instead of enforcing the peer review phase. An evaluation of the conducted peer reviews in C1 can be found in the original article [22], but was omitted here due to peer review being rarely done in C2.

5.2 Methods

Both courses were evaluated using the same survey; therefore, the description of the survey is a verbatim reproduction of the description found in Sect. 4.1 of [22]. The remaining paragraphs were revised or added to reflect changes in the evaluation.

Data for the evaluation was collected using a survey and taken directly from the Backstage system as well. The surveys were conducted during the final lectures of each course and consisted of six parts:

1. Four questions referring to the students' course of study, current semester, gender, and the team they were in.
2. Six questions measuring the students' attitude towards the course format and its elements.
3. Six questions measuring the students' attitude towards the content and structure of mini-lectures and exercises.
4. Six questions measuring the students' attitude towards the enabling technology.
5. Five questions measuring the students' programming proficiency using an adapted version of the survey by Feigenspan et al. [9].
6. Three questions in the form of free-text questions, asking what they liked most, what could be done better, and for further comments.

For parts (2), (3), and (4), a six-point Likert scale from strongly agree to strongly disagree with no neutral choice was utilized. Survey results of both groups were tested for significant differences using a two-tailed independent t-test. The significance threshold was set to p < 0.05. For correlation, the Pearson correlation coefficient with a significance threshold of p < 0.05 was used.

All submissions were retrieved directly from the Backstage system. A single lecturer determined for each team and exercise the point in time – if at all – at which the exercise was solved correctly. The correctness of an exercise was determined in a strict way: a submission was seen as correct if and only if the whole task was solved correctly. That means that submissions that were nearly correct (e.g., a rectangle moving into the correct direction for three of the four arrow keys) were classified as wrong. Due to the aforementioned internet connectivity problems in C1 and software problems in C2, data for the first lecture is not complete, as not all teams were able to connect to the platform and submit their solutions, but it is nonetheless included in the evaluation for the sake of completeness.
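For illustration only, the Pearson correlation coefficient used here can be computed as in the following sketch; the answer vectors are made up, and any actual analysis would rely on a statistics package rather than hand-rolled code.

```javascript
// Illustrative only: Pearson correlation coefficient as used for relating
// survey statements (e.g., team discussion vs. fun during the lecture).
function pearson(x, y) {
  const n = x.length;
  const meanX = x.reduce((a, b) => a + b, 0) / n;
  const meanY = y.reduce((a, b) => a + b, 0) / n;
  let covariance = 0;
  let varianceX = 0;
  let varianceY = 0;
  for (let i = 0; i < n; i++) {
    covariance += (x[i] - meanX) * (y[i] - meanY);
    varianceX += (x[i] - meanX) ** 2;
    varianceY += (y[i] - meanY) ** 2;
  }
  return covariance / Math.sqrt(varianceX * varianceY);
}

// Example with made-up Likert answers on the 0-5 scale:
const teamDiscussion = [4, 5, 3, 4, 2, 5];
const funDuringLecture = [4, 5, 2, 4, 3, 5];
console.log(pearson(teamDiscussion, funDuringLecture).toFixed(3)); // 0.854
```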

5.3 Results

Table 2 shows the populations of the surveys conducted in C1 and C2. Not every student of C2 provided an answer to all statements; therefore, the results for C2 reported in the following refer to between 30 and 32 students. To ensure the comparability of the courses, coding proficiency and current semester were tested for differences using a two-tailed independent t-test, which yielded non-significant values (p = 0.76 for self-assessed coding proficiency, p = 0.97 for current study semester) for both variables.

Table 2. Overview of the populations of the surveys.

                                           C1     C2
Number of participants                     16     32
Average study semester                     4.73   4.75
Average self-assessed coding proficiency   3.5    3.71

Attitude Towards the Course Format. Table 3 shows the aggregated students' responses to the statements on the questionnaire referring to the course format: while the results were always positive across both venues for all statements, results of C2 were always more negative than the results of C1. In three cases the difference between C1 and C2 was significant: students in C2 found discussions with team mates significantly less helpful, were significantly more likely to prefer a traditional lecture, and had less fun during the lectures. Nonetheless, those results were still positive. For C2, the correlation between team discussion (statement 2 in Table 3) and fun during the lecture (statement 4 in Table 3) was examined and yielded values near the correlation significance threshold (r = 0.344, p = 0.539). Due to the fact that peer review was rarely done in C2, the results for statements referring to peer review (statements 5 and 6 in Table 3) are only reported for the sake of completeness and will not be discussed in the following.

Attitude Towards Course Material. In Table 4 the students' attitude towards the course material can be seen: again, responses in C2 were mostly less positive than responses in C1, but still positive throughout all statements. Students in C2 were significantly less likely to agree with the statement that they liked the exercises building upon each other. Otherwise, none of the statements showed a significant difference between the two courses. Contrary to the general trend of students in C2 answering more negatively, the statement referring to exercise difficulty (statement 3 in Table 4) shows no difference in average between the courses.

Attitude Towards Technological Support. Results for the attitude towards the technological support can be seen in Table 5 and show a similar pattern to the two other blocks of statements, with students in C2 answering a bit more negatively. The exception is the statement whether students would have preferred to work on the exercises in a real development environment, which shows no difference in average between the courses. None of the statements showed a significant difference between the courses.


Table 3. Results of the questions referring to the course format for both courses. Strongly agree was assigned the value 5, strongly disagree the value 0 (C1: n = 16; C2: n = 30–32).

1. Immediate practical exercises after mini-lectures helped understanding the topic – C1: Avg 4.75, Mdn 5; C2: Avg 4.47, Mdn 5; p = 0.200
2. Discussions with my team mates during the practical exercises helped me understand the topic – C1: Avg 4.75, Mdn 5; C2: Avg 3.97, Mdn 4; p = 0.002*
3. I would have preferred a traditional lecture without practical exercises – C1: Avg 0.31, Mdn 0; C2: Avg 0.77, Mdn 1; p = 0.033*
4. I had fun during the plenum sessions – C1: Avg 4.43, Mdn 4.5; C2: Avg 3.94, Mdn 4; p = 0.012*
5. Reviewing another team's submission gave me new ideas where to improve my team's submission – C1: Avg 3.44, Mdn 3; C2: Avg 3.06, Mdn 3; p = 0.226
6. The received review for our code helped me to identify weaknesses of my team's code – C1: Avg 2.94, Mdn 3; C2: Avg 3.19, Mdn 3; p = 0.459

Table 4. Results of the questions referring to the course material for both courses. Strongly agree was assigned the value 5, strongly disagree the value 0 (C1: n = 16; C2: n = 30–32).

1. The mini-lectures were sufficient to solve the practical exercises – C1: Avg 3.94, Mdn 4; C2: Avg 3.42, Mdn 4; p = 0.083
2. I would have preferred exercises that do not build upon each other – C1: Avg 1, Mdn 1; C2: Avg 1.65, Mdn 1; p = 0.084
3. The exercises were too difficult – C1: Avg 1.75, Mdn 2; C2: Avg 1.68, Mdn 1; p = 0.812
4. Through the mini-lectures and practical exercises I feel well prepared for the implementation of the group project – C1: Avg 3.69, Mdn 4; C2: Avg 3.3, Mdn 3; p = 0.205
5. I liked that the exercises build upon each other – C1: Avg 4.19, Mdn 4; C2: Avg 3.5, Mdn 4; p = 0.028*
6. The exercises were too big – C1: Avg 1.5, Mdn 2; C2: Avg 2, Mdn 1; p = 0.152

Attendance in the Lectures. Reports on the attendance in C2 can be found in Table 6: while the first three lectures were attended by at least 90% of all participants, starting from the fourth lecture that number dropped to around 70%. For C1, such data is not available, but generally around 14 out of 16 students were present during the lectures.


Table 5. Results of the questions referring to the technological support for both courses. Strongly agree was assigned the value 5, strongly disagree the value 0 (C1: n = 16; C2: n = 30–32).

1. The JavaScript editor on Backstage made getting started with JavaScript easy – C1: Avg 3.31, Mdn 3.5; C2: Avg 3.72, Mdn 4; p = 0.259
2. The JavaScript editor was easy to operate – C1: Avg 3.56, Mdn 3.5; C2: Avg 3.63, Mdn 4; p = 0.865
3. The interface of Backstage, where exercises were worked on, was clearly designed – C1: Avg 3.94, Mdn 4; C2: Avg 3.69, Mdn 4; p = 0.446
4. The interface of Backstage, where another team's submission was reviewed, was clearly designed – C1: Avg 4.13, Mdn 4; C2: Avg 3.53, Mdn 4; p = 0.058
5. The course format (i.e., mini-lectures followed by exercises and peer review) was well-supported by Backstage – C1: Avg 4.13, Mdn 4; C2: Avg 3.75, Mdn 4; p = 0.171
6. I would have preferred to solve the practical exercises using a real development environment – C1: Avg 2.94, Mdn 3; C2: Avg 2.96, Mdn 4; p = 0.960

Table 6. Attendance in the lectures in C2.

Lecture              L1   L2   L3   L4   L5   L6
Number of students   44   43   40   31   31   32

Exercise Correctness. Figure 6 and Fig. 7 show for each team which exercises that team solved correctly during lectures, after lectures, or not at all, for C1 and C2, respectively. While similar exercises were done in both courses, the same exercise number in Fig. 6 and Fig. 7 does not necessarily correspond to an exercise about the same concept. Generally, teams in C2 were less successful compared to the teams in C1. Exercise 4-1 (same identifier for both courses), the exercise in which students had been exposed to object-oriented programming in JavaScript for the first time, was in both venues the exercise where students failed most. Exercise 5-2 in C2 (5-1 in C1) was only worked on for a short time by the students in C2; therefore, correctness values in C2 for that exercise can be disregarded. An aggregated overview of teams' performances can be seen in Table 7 and Table 8 for C1 and C2, respectively. Generally, teams in C2 were less successful than teams in C1, with only 2 out of 11 teams solving more than half of the exercises successfully during lectures as opposed to 2 out of 4 teams in C1. When including exercises solved successfully after the lecture, 7 out of 11 teams solved at least half of the exercises in C2 as opposed to 3 out of 4 in C1.


Fig. 6. Overview of all teams and the exercises solved successfully by them in C1. The first number identifies the lecture, the second number the exercise within that lecture (figure adapted from [22, p. 249]).

Fig. 7. Overview of all teams and the exercises solved successfully by them in C2. The first number identifies the lecture, the second number the exercise within that lecture.

5.4 Discussion

While students' attitude towards the course format, the course material, and the technological support was positive in both courses, students of C2 were generally less positive compared to the students of C1.

Table 7. Percentage of exercises solved correctly during lectures or solved correctly after lectures for teams in C1.

          Correct in lecture   Correct after lecture   Sum
Team 1    0.55                 0.18                    0.73
Team 2    0.73                 0.09                    0.82
Team 3    0.45                 0.18                    0.63
Team 4    0.36                 0.09                    0.45
Average   0.52                 0.14                    0.66

Table 8. Percentage of exercises solved correctly during lectures or solved correctly after lectures for teams in C2.

          Correct in lecture   Correct after lecture   Sum
Team 1    0.44                 0.11                    0.55
Team 2    0.00                 0.11                    0.11
Team 3    0.44                 0.22                    0.66
Team 4    0.33                 0.22                    0.55
Team 5    0.22                 0.33                    0.55
Team 6    0.22                 0.11                    0.33
Team 7    0.44                 0.33                    0.78
Team 8    0.22                 0.11                    0.33
Team 9    0.67                 0.11                    0.78
Team 10   0.67                 0.22                    0.89
Team 11   0.22                 0.00                    0.22
Average   0.35                 0.17                    0.52

That suggests that while the format is able to support large classes (as students are still exhibiting a positive attitude towards all parts), it does not scale perfectly, and further improvements for large classes are required.

Students in C2 were significantly more likely to find team discussions less helpful (statement 2 in Table 3), for which there are various possible explanations: For one, the larger the audience, the greater the noise in the lecture hall, which could have inhibited team discussion. Another possible explanation is the seating arrangement: in C1, chairs and tables could be freely moved, while in C2, chairs and tables were in fixed positions and arranged in rows, which made it hard to discuss with the whole team. Finally, of the 12 students not present and the single student not stating team affiliation in the survey, 8 were concentrated in 3 teams, while the other 5 were each a single member missing from a team that was otherwise complete. That means that three teams had at best two members, which might be too little for an effective team discussion.

Another significant difference between C1 and C2 was that students in C2 were significantly less likely to have fun during lectures. As a reason for that, the authors presume a connection to the aforementioned difference in perception of team discussion. Much of the fun in collaborative learning may stem from the interactions within a team: talking and discussing with other team members, making errors and having success together, and growing as a team. That hypothesis is supported by the nearly significant correlation between team discussion and fun during the lectures for C2, which implies that students who perceived the team discussion as positive had more fun during the lectures as well.

While still exhibiting a positive attitude, students in C2 were significantly more critical of exercises building upon each other than students in C1. A possible explanation for that may stem from the fact that the majority of students in C2 were not able to solve the exercises consistently, which made working on exercises building upon previous results hardly possible.

When looking at the exercise correctness in detail, it becomes evident that teams in C2 had more problems solving the exercises than teams in C1, which may have resulted from the exercises being insufficiently adapted to the shorter duration of lectures. That the exercises were insufficiently adapted to the shorter duration could be backed by the fact that the majority of teams could solve at least half of the exercises outside of the classroom, which could mean that they were not too hard, just too extensive. Another explanation is the absent students: exercises were constructed to be solved in teams, and when half of a team is absent, solving the exercises becomes harder. The seating arrangement in the room also made it hard for the lecturer to reach all the teams – teams sitting in the middle of a row were nearly impossible to reach physically. Therefore, teams sitting on the sides were more likely to get personal support.

Finally, students in both courses had a positive attitude towards the technological support, with none of the results for any statement exhibiting a significant difference between the courses, which suggests that the technological support works well for both small and large classes. In the lecturer's experience, the scaffolded exercises were not used in the intended way, as teams in C2 often exhibited the same behaviour as teams in C1: unfolding all steps and starting to work from the general description of the exercise. Another problem was communicating phase changes, that is, the end of the exercise phase and the beginning of the peer review phase – often two attempts were required before every student in the lecture hall paid attention to the lecturer and the phase change could be initiated.

The lowest attendance in C2 was 70% of all participants, which is much higher than in other STEM lectures at the authors' university. The higher attendance could result from the active approach to the course, which allows students to immediately apply the acquired knowledge in practical exercises. For the first three lectures, attendance was even higher at above 90%, before dropping sharply to the aforementioned 70% for the fourth lecture. That drop coincides with the point at which the majority of students could no longer solve the exercises correctly during the lecture – the third lecture. That fact is a possible explanation for the drop in attendance: students no longer seeing the purpose of attending lectures when not being able to solve the exercises.


Another potential reason for the high attendance is that a few students may have thought that attending the lecture was compulsory, as one student stated in the free text part of the survey that the compulsory lectures should have been announced before the beginning of the term to allow for better planning. Overall, the course format, course material, and technological support were well received across both courses and seem to foster attendance during lectures. The drop in correctly solved exercises from C1 to C2 suggests that improvements should be introduced for further venues: improvements to the technological support for both lecturers and students, changes to the exercises, and organizational changes regarding the seating arrangements.

6 Conclusion and Perspectives

This article extended upon the course format Phased Classroom Instruction, first introduced in [16] and first evaluated in a small course in [22]. Phased Classroom Instruction divides lectures into mini-lectures and exercises followed by peer review. The format aims at giving lecturers the means for making even large classes active by providing technological support that allows lecturers to get an overview of the class and identify struggling students. Students are supported by scaffolding and immediate feedback provided by problem- or subject-specific editors while working on exercises.

Two evaluations – one in a small course of 16 students, one in a large class of 44 students – have been performed. Both evaluations point to students liking the format and the technological support, but reveal problems in the large class: students in the large class found team discussions less helpful and had less fun during the lecture, which can potentially be explained by team members no longer visiting the lecture, teams not being able to consistently solve the exercises during the lecture, and an unfortunate seating arrangement.

For future implementations, three kinds of improvements to the technological support are envisaged:

– In the JavaScript editor, the connection between steps and unit tests has to be made more explicit. This will be done by dividing exercises more clearly into steps and only showing those unit tests that pertain to the current step. Going further, teams could also be prevented from proceeding to the next step if the previous step was not finished successfully.
– Concerning the support for lecturers, the number of passing unit tests (see Sect. 4) turned out not to be an appropriate measure for identifying struggling teams. Therefore, for future venues, other measures such as idle time or the number of consecutive tries compiling syntactically incorrect code will be taken into account.
– Phase changes should be communicated using the technological support as well, e.g., by showing some kind of indicator on the students' screens when the lecturer intends to proceed to the next phase.
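As a minimal sketch of the first envisaged improvement, assuming that each scaffolded step lists the unit tests that belong to it, showing only the current step's tests and gating the next step could look as follows; the data structure and function names are hypothetical and not part of Backstage.

```javascript
// Hypothetical data structure: each scaffolded step lists its own unit tests.
const exerciseSteps = [
  { title: "Draw a rectangle", tests: ["rectangle is visible"] },
  { title: "React to arrow keys", tests: ["moves up", "moves down", "moves left", "moves right"] },
  { title: "Animate the rectangle", tests: ["position changes over time"] },
];

// Only the tests of the current step are shown, and the next step stays locked
// until all tests of the current step pass.
function visibleTests(steps, currentStep) {
  return steps[currentStep].tests;
}

function mayUnlockNextStep(steps, currentStep, passedTests) {
  return steps[currentStep].tests.every((test) => passedTests.includes(test));
}

// Example: after the rectangle test passes, the second step can be unlocked.
console.log(visibleTests(exerciseSteps, 0));                                 // ["rectangle is visible"]
console.log(mayUnlockNextStep(exerciseSteps, 0, ["rectangle is visible"]));  // true
```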


In the authors' opinion, at least half of the class should be able to solve the exercises during a lecture: therefore, the size of the exercises will be adapted for future venues. If the previously discussed suggestion that the drop in attendance is connected to not being able to solve the exercises is correct, an adaptation of the exercises should have a positive effect on team discussion as well, as more team members will be present. The problems resulting from the seating arrangement must be addressed by communicating a seating arrangement to the students during the first lecture, which should – as long as the students stick to it – make it easier for lecturers to physically reach all teams.

Furthermore, the approach can be tested with other editors and in other contexts as well: Backstage supports the aforementioned logic proof editors as well as editors for other programming languages. Evaluations in a larger number of lectures on different subjects and in different settings are desirable. Furthermore, the suggestions for the lecturers' side of the technological support are derived from subjective impressions of a single lecturer – for more reliable results, an evaluation with more lecturers should be done.

Acknowledgements. The authors are thankful to Maximilian Meyer for the implementation of the JavaScript editor as part of his master's thesis [24].

References

1. Amresh, A., Carberry, A.R., Femiani, J.: Evaluating the effectiveness of flipped classrooms for teaching CS1. In: 2013 IEEE Frontiers in Education Conference, pp. 733–735. IEEE (2013)
2. Aronson, E.: Building empathy, compassion, and achievement in the Jigsaw classroom. Improving Academic Achievement: Impact of Psychological Factors on Education, pp. 209–225 (2002)
3. Bishop, J.L., Verleger, M.A., et al.: The flipped classroom: a survey of the research. In: ASEE National Conference Proceedings, Atlanta, GA, vol. 30, pp. 1–18 (2013)
4. Bligh, D.A.: What's the Use of Lectures? Intellect Books (1998)
5. Bloom, B.S.: Taxonomy of Educational Objectives, Handbook 1: Cognitive Domain. Longmans (1956)
6. Bry, F., Pohl, A.Y.S.: Large class teaching with Backstage. J. Appl. Res. High. Educ. 9(1), 105–128 (2017)
7. Bryfczynski, S.P., et al.: uRespond: iPad as interactive, personal response system. J. Chem. Educ. 91(3), 357–363 (2014)
8. Cho, K., Schunn, C.D., Wilson, R.W.: Validity and reliability of scaffolded peer assessment of writing from instructor and student perspectives. J. Educ. Psychol. 98(4), 891 (2006)
9. Feigenspan, J., Kästner, C., Liebig, J., Apel, S., Hanenberg, S.: Measuring programming experience. In: 2012 IEEE 20th International Conference on Program Comprehension (ICPC), pp. 73–82. IEEE (2012)
10. Freeman, S., et al.: Active learning increases student performance in science, engineering, and mathematics. Proc. Natl. Acad. Sci. 111(23), 8410–8415 (2014)
11. Gilboy, M.B., Heinerichs, S., Pazzaglia, G.: Enhancing student engagement using the flipped classroom. J. Nutr. Educ. Behav. 47(1), 109–114 (2015)


12. Grüner, G.: Die didaktische Reduktion als Kernstück der Didaktik. Die Deutsche Schule 59(7/8), 414–430 (1967)
13. Hattie, J., Timperley, H.: The power of feedback. Tijdschrift voor Medisch Onderwijs 27(1), 50–51 (2008). https://doi.org/10.1007/BF03078234
14. He, Y., Hui, S.C., Quan, T.T.: Automatic summary assessment for intelligent tutoring systems. Comput. Educ. 53(3), 890–899 (2009)
15. Heller, N., Bry, F.: Peer teaching in tertiary STEM education: a case study. In: Auer, M.E., Tsiatsos, T. (eds.) ICL 2018. AISC, vol. 916, pp. 87–98. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-11932-4_9
16. Heller, N., Mader, S., Bry, F.: Backstage: a versatile platform supporting learning and teaching format composition. In: Proceedings of the 18th Koli Calling International Conference on Computing Education Research, p. 27. ACM (2018)
17. Jonsson, A., Svingby, G.: The use of scoring rubrics: reliability, validity and educational consequences. Educ. Res. Rev. 2(2), 130–144 (2007)
18. King, A.: From sage on the stage to guide on the side. Coll. Teach. 41(1), 30–35 (1993)
19. Krathwohl, D.R.: A revision of Bloom’s taxonomy: an overview. Theory Pract. 41(4), 212–218 (2002)
20. Lundstrom, K., Baker, W.: To give is better than to receive: the benefits of peer review to the reviewer’s own writing. J. Second Lang. Writing 18(1), 30–43 (2009)
21. Mader, S., Bry, F.: Audience response systems reimagined. In: Herzog, M.A., Kubincová, Z., Han, P., Temperini, M. (eds.) ICWL 2019. LNCS, vol. 11841, pp. 203–216. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-35758-0_19
22. Mader, S., Bry, F.: Phased classroom instruction: a case study on teaching programming languages. In: Proceedings of the 11th International Conference on Computer Supported Education - Volume 1: CSEDU, pp. 241–251. SciTePress (2019)
23. McLaughlin, J.E., et al.: The flipped classroom: a course redesign to foster learning and engagement in a health professions school. Acad. Med. 89(2), 236–243 (2014)
24. Meyer, M.: A browser-based development environment for JavaScript learning and teaching. Master thesis, Institute of Informatics, Ludwig Maximilian University of Munich (2019)
25. Popham, W.J.: What’s wrong-and what’s right-with rubrics. Educ. Leadership 55, 72–75 (1997)
26. Prince, M.: Does active learning work? A review of the research. J. Eng. Educ. 93(3), 223–231 (2004)
27. Sao Pedro, M.A., Gobert, J.D., Baker, R.S.: The impacts of automatic scaffolding on students’ acquisition of data collection inquiry skills. Roundtable presentation at American Educational Research Association (2014)
28. Stains, M., et al.: Anatomy of STEM teaching in North American universities. Science 359(6383), 1468–1470 (2018)
29. Staudacher, K., Mader, S., Bry, F.: Automated scaffolding and feedback for proof construction: a case study. In: Proceedings of the 18th European Conference on e-Learning (ECEL 2019). ACPI (to appear)
30. Stelzer, T., Brookes, D.T., Gladding, G., Mestre, J.P.: Impact of multimedia learning modules on an introductory course on electricity and magnetism. Am. J. Phys. 78(7), 755–759 (2010)
31. Stuart, J., Rutherford, R.: Medical student concentration during lectures. Lancet 312(8088), 514–516 (1978)
32. Topping, K.: Peer assessment between students in colleges and universities. Rev. Educ. Res. 68(3), 249–276 (1998)


33. Van Merriënboer, J.J., Kirschner, P.A., Kester, L.: Taking the load off a learner’s mind: instructional design for complex learning. Educ. Psychol. 38(1), 5–13 (2003)
34. Vihavainen, A., Vikberg, T., Luukkainen, M., Pärtel, M.: Scaffolding students’ learning using test my code. In: Proceedings of the 18th ACM Conference on Innovation and Technology in Computer Science Education, pp. 117–122. ACM (2013)
35. Williams, E.: Student attitudes towards approaches to learning and assessment. Assess. Eval. High. Educ. 17(1), 45–58 (1992)
36. Wood, D., Bruner, J.S., Ross, G.: The role of tutoring in problem solving. J. Child Psychol. Psychiatry 17(2), 89–100 (1976)
37. Yang, Y.F.: Automatic scaffolding and measurement of concept mapping for EFL students to write summaries. J. Educ. Technol. Soc. 18(4) (2015)
38. Young, M.S., Robinson, S., Alberts, P.: Students pay attention! Combating the vigilance decrement to improve learning during lectures. Active Learn. High. Educ. 10(1), 41–55 (2009)

Agile Methods Make It to Non-vocational High Schools

Ilenia Fronza, Claus Pahl, and Boris Sušanj

Free University of Bozen-Bolzano, Piazza Domenicani 3, 39100 Bolzano, Italy
{ilenia.fronza,claus.pahl,boris.susanj}@unibz.it

Abstract. An increasing number of activities is proposed to students at all school levels to foster programming and computational thinking skills. However, limited effort has been spent so far to facilitate the acquisition of Software Engineering (SE) principles, even if this would foster a set of crucial skills. In particular, Agile methods are a good candidate for the educational context. The endeavor to bring SE principles to non-vocational high schools sets several challenges. For the existing wide range of schools, customized approaches are needed, which leverage different types of activities (e.g., programming or non-programming activities). Moreover, SE should be taught while pursuing the existing curricular objectives during activities that can tap into students’ interests. This work describes and compares different approaches to foster SE principles in diverse types of non-vocational schools. The presented approaches take into account the final curricular objectives; moreover, Agile methods are practically applied and repeated until they become a habit to students. We evaluated the effectiveness of the proposed approaches through classroom experiences, which show that the participants learned how to organize their activities by applying a SE approach.

Keywords: Software engineering training and education · End-user software engineering · K-12 · Agile methods

1 Introduction

A considerable number of people (including secretaries, accountants, teachers, and scientists) write software to support their work activities [25]. Already in 2005, in the United States, this category of people (called end-users [6], or citizen developers when they write software for use by other people [23]) included four times as many people as professional programmers [36]. In 2016, about 27% of U.S. online job postings valued coding as a technical skill [8]. In 2017, Gartner predicted that by 2022 citizen developers would be building more than a third of all web and mobile employee-facing apps delivered in organizations with mature citizen development initiatives [23]. The main problem is that, due to the lack of specific training in Software Engineering (SE) [37], the quality of end-user-produced software is overall low [6].


To address this issue, End-User Software Engineering (EUSE) aims at providing end-users with the knowledge of basic Software Engineering (SE) principles to improve code quality [3,7].

A wide range of activities is nowadays proposed to students at all school levels to foster programming and computational thinking skills. The broader aim is to prepare students for the labor market, where it is crucial to solve problems also using technology. However, limited effort has been spent so far to teach software quality notions and SE principles [20,31], even if SE is an excellent means to exercise a set of skills that are needed nowadays [5] and can be applied to other fields as well [17]. In particular, Agile methods are the ideal candidate for schools [24] and, because of the above-mentioned pervasiveness of programming activities in the labor market, it is advised to reach all the students, no matter what career they have chosen [22].

The endeavor to bring Software Engineering principles to high schools sets several challenges: customized approaches to teaching are needed for a wide range of schools, from technical to non-vocational ones. Indeed, depending on the type of curriculum, different types of activities (e.g., programming or non-programming activities) need to be leveraged to foster SE principles. Moreover, in non-vocational schools (which might not focus on CS and STEM) Software Engineering should be taught while pursuing the existing curricular objectives [16,18,22] during activities that can tap into students’ interests.

Based on these considerations, in this paper, we build on [22] by describing a new approach to foster SE principles in a different non-vocational school. Furthermore, we detail how the proposed approaches take into account the final curricular objectives concerning coding and, as a common trait, they do not include lectures on Software Engineering [6]; instead, Agile methods are practically applied and repeated until they become a habit to students [26]. Finally, we better demonstrate the effectiveness of the proposed approaches by describing a new classroom experience, which confirms that the participants started using a SE approach to organize their work.

The remaining part of the paper is organized as follows. Section 2 reports on the research works that focused on bringing Software Engineering principles to K-12; Sect. 3 describes the motivation of the proposed didactic modules; Sect. 4 and Sect. 5 detail the structure of the modules that have been designed for different types of non-vocational high schools, while Sect. 6 describes the assessment strategy. Section 7 is dedicated to classroom experiences, and Sect. 8 reports their results. Finally, Sect. 9 discusses the results of this work and proposes directions for future work.

2 Software Engineering in High Schools: State of the Art

The field of End-User Software Engineering (EUSE) aims at bringing the benefits of a Software Engineering (SE) approach to end-users, who develop software without having formal education (for instance, a degree in Computer Science/Software Engineering), or extensive experience in software development [6].


The goal of this endeavor is to improve the quality of end-user-created code, which is generally low [6] because of the lack of specific training in Software Engineering [37].

At all levels of education, several new activities and projects have been proposed in the last years to teach coding and computational thinking skills. Not all students involved in these activities are prospective software engineers; however, it has become clear over the years that the teaching of software development should be approached from an engineering point of view for all students [38]. Moreover, Software Engineering is an excellent means to foster a set of necessary skills, which include, for example, communication skills, logic, and computational thinking [5].

Only a limited number of existing studies focus on teaching Software Engineering in primary and secondary schools. The majority of these studies agree that Agile methods should be chosen for K-12 [33,35], because they favor a flexible, iterative approach, and focus more on the product than on (unnecessary) documentation [17]. This approach fits end-users’ working style, which is usually opportunistic, collaborative, and incremental [7,11]. The existing studies in this field proposed mentoring methodologies on Agile [29] and analyzed how Agile can contribute to emphasizing a learner’s perspective [34] and to achieving more flexibility in projects [24]. Moreover, research effort has been dedicated to understanding how Agile practices can be leveraged or adapted to different levels of education [17], such as middle schools [20].

The endeavor to bring SE to end-users should take into account end-users’ objectives and working style [6,9]. This also applies to the educational context, especially in high schools: due to the large variety of existing curricula, for example, students might have different levels of interest in CS and STEM [19,21]. With this vision at hand, in this work, we present different approaches to foster SE principles in non-vocational high schools in which CS has different degrees of presence and is used to achieve different goals. To respect the EUSE guidelines [7], the common denominator of our approaches is the absence of additional lectures on SE: Agile methods, instead, are practically applied until they become a habit to students [26].

3 Motivation

Bringing Agile methods to high schools implies focusing on the process side (i.e., on the journey undertaken by the students to obtain the final products) more than on the final product itself, without aiming at transforming students into professional software engineers through ad-hoc courses and activities [7]. This would be beneficial especially in non-vocational high schools, that is, in those high schools that do not train prospective software developers. Indeed, students could learn SE principles while working on tasks that do not focus on software development, which would contribute to overcoming the problem that they do not perceive SE principles as something useful to be learned [9]. In this regard, due to the great variety of curricula, students may have different levels of interest


in CS and STEM in general. Moreover, CS can be more or less central in the curriculum of each non-vocational school. The resulting challenge is understanding how to foster SE principles in different types of non-vocational schools, without introducing specific lessons and by respecting the existing curricular objectives. Thus, our Research Questions are:

RQ1: Is it possible to foster SE principles in non-vocational high schools?
RQ2: Is this approach effective?

In order to answer these research questions, in this work, we describe the approaches that we have designed to foster SE principles in non-vocational high schools. In particular, we have chosen two rather different types of non-vocational schools in terms of curriculum and students’ profiles:

– the High School of Humanities (HSH), where CS is a relatively marginal subject, and students learn some programming basics mainly for data analysis purposes. In this school, students generally do not consider themselves candidates for STEM disciplines;
– the High School of Economics (HSE), where students are in general more interested in STEM, and CS takes on a more central role as a support to developing the problem-solving skills that are needed for the specialization in economics.

Based on these premises, the modules that we propose in this paper focus on different activities. The objective of the HSH modules is the creation of an infographic and a video [22]. These topics are indeed transversal to many disciplines, and they can engage students [39], considering that they have grown up in a “YouTube environment” [32]. Moreover, these tasks require problem-solving skills [4]. The objective of the HSE module is to program an ambulance-robot. In this case, educational robotics is used to tap into students’ interests. The activity has been designed to fit the existing CS course, which aims at training students to write simple software projects with the primary goal of learning how to identify appropriate strategies to solve a problem and to code its solution.

In all the proposed activities, the focus is on the process side, which, as mentioned above, is crucial in non-vocational schools. In HSH, we do not include software development tasks at all; in HSE, instead, we introduce development tasks, but the level of difficulty is kept low so that students can focus on the process side. Among Agile methods, our activities focus on eXtreme Programming (XP), which creates an excellent setting for formative assessment thanks to the small releases practice. In particular, we have selected the set of XP practices [30] that are recommended to be adopted together when focusing on the process [17] and that map to a set of Agile principles:

– Incremental development is supported by small releases, frequent testing, and user stories. Small releases organize the activities in short iterations to receive continuous feedback from the customer. User stories are implemented as informal prototypes (e.g., storyboards).


– Customer involvement is supported by the customer’s continuous engagement to define acceptance tests and provide feedback. Having the instructor play the role of a customer [40] can increase self-organization [24].
– Change is embraced through test-first, refactoring, small releases, and continuous integration.

The following section details the structure of the proposed approaches and their assessment strategies.

4 High School of Humanities

At the end of this study program, students shall be able to write small code snippets using basic programming concepts to analyze data collected from several sources. To achieve this goal, the High School of Humanities (HSH) offers the following subjects: ICT (first two years, two hours per week) and Computer Science (third to fifth year, two hours per week). ICT teaches how to choose the appropriate tool to solve a problem; Computer Science, instead, introduces basic programming concepts to analyze data from different sources. Based on the characteristics of these courses, during the first year, our strategy consists of fostering Agile principles through activities that focus on the process. For this reason, the two proposed modules aim at creating an infographic and a video, which do not include programming tasks [22]. As shown in Table 1, the proposed modules cover a total of 28 h of curricular activity (14 h for each module), and each activity fosters specific XP practices.

Table 1. Structure of the two HSH modules: activities, hours, and XP practices.

First module                        | Hours | Second module                      | Hours | Practices
Introduction to infographics       | 2     | Introduction to video-making       | 2     |
Creation of a paper-based prototype | 6    | Creation of a storyboard           | 4     | User stories, on-site customer, small releases
Creation of the infographic        | 4     | Video production                   | 6     | On-site customer, small releases, testing
Presentation in front of the class | 2     | Presentation in front of the class | 2     | User stories, testing
Total                               | 14    |                                    | 14    |

4.1 HSH Module One

Infographics combine words and pictures to communicate a message, which aims at achieving a specific goal. For example, infographics may be used to inform, to convince, or to teach [14]. Creating an infographic requires students to visually communicate a concept by using citations and statistics sourced from the literature and the media. Thereby, this task requires a critical analysis and filtering of the retrieved material [28]. Moreover, an effective visual effect should be obtained by, for example, defining a clear narrative, visualizing information in such a way that the reader will be able to understand it at a glance, and choosing appropriate colors and fonts.

The module begins with an introduction to infographics (2 h). Guidelines are provided for the creation of effective infographics [13,41]. The message to the students is that good infographics combine text, quantitative graphs, and sometimes diagrams and photos to tell the story of a current event [14]. Afterward, students analyze a set of good/bad examples to check their understanding of the provided tips. Infographics are the focus of this part, and no notion is given either about Agile methods or about Software Engineering in general.

Afterward, students look for information on the topic and create a paper-based prototype (6 h). The relevant XP practice in this phase is user stories [1]. This task requires choosing the structure of the infographic, the information flow, and the visualization means (e.g., symbols) [13]. Students work in iterations of twenty minutes, and the instructor checks the timer. In the 10 min between two consecutive iterations, the current version of the prototype is checked together with the instructor, to foster the on-site customer and small releases XP practices.

The third part of the module consists of the actual creation of the infographic (4 h). Among the many available tools to create infographics, we use Canva (www.canva.com), which is one of the preferred design tools for non-designers [27]. In particular, we chose this tool because:

– it provides an extensive collection of free photos and icons, and easy export options;
– most of the features are available to use for free;
– it guides step-by-step through tasks such as choosing appropriate colors and typefaces;
– once they have learned it, students can use it to create far more than just infographics (e.g., presentations, business cards, posters).

The same working pace of the previous activities is kept to foster the small releases practice: students continue working in 20-minute iterations with 10-minute instructor’s feedback sessions in between to exercise the on-site customer and testing practices. Before the last iteration, a peer evaluation model is applied: students work in pairs and review each other’s infographic. This allows them to reflect on the received feedback and to fix the infographic accordingly [28] (i.e., testing practice).



At the end of the module, each student gives a presentation in front of the class (2 h). During the presentation, the instructor checks the correspondence of the final product with the initial requirements and paper-based prototype. This phase represents an opportunity for peer feedback and fosters the user stories and testing practices.

4.2 HSH Module Two

The module starts with an introduction to video making (2 h), with tips for its main related activities, such as producing a video concept, scenes, framing, and light. Moreover, ethical aspects are introduced, such as giving credit to others when using their material [32]. During this part, no notion is given about Software Engineering or Agile.

Afterward, 4 h are dedicated to the creation of a storyboard to foster the user stories, on-site customer, and small releases XP practices. Students start with retrieving information about the topic, which is the same for the entire class and is assigned by the instructor. The video shall be 1.5 min long, outro excluded. Each student works on the pre-production phase of the video, which includes the following activities: thinking about the video objective and idea, deciding the point of view on the assigned topic, and designing the story, scenes, framing, and environments. A paper-based storyboard summarizes all the design choices. As in the previous module, the activity is organized in iterations of twenty minutes, with 10-minute feedback sessions in between.

Once the paper-based storyboard is ready, students start the video production phase (6 h). Students can choose the tool, with the only limitation that a mobile phone shall be used throughout the entire production phase. This rule has been set to “demonstrate the potential of digital tools and encourage active and purposeful device use, to help learners overcome a more passive consumerist use of technology” [12]. Students are allowed to impersonate a character in others’ videos. Moreover, video shootings can also happen outside school hours. To exercise the small releases, on-site customer, and testing practices, we keep the pace of 20-minute iterations interleaved with 10-minute feedback sessions.

At the end of the module, students present their product in front of the class (2 h); the instructor checks the conformance with requirements and storyboard.

5 High School of Economics

At the end of this four-year program, students are expected to be able to work in a team to write simple software projects. Developing this skill during the CS course (2 h per week in all the four years) is mainly used as a means for learning how to identify appropriate strategies to solve a problem and to code its solution. For this reason, block-based programming languages (e.g., Scratch) are adopted during the first two years. In the last two years, text-based programming languages are introduced at a basic level but problem-solving skills remain the main focus.


One of the main problems reported by school teachers is motivating students, who often do not understand the usefulness (for their careers) of learning how to code. On that basis, we use educational robotics to tap into students’ interests; we keep the level of difficulty of the software development task low so that students can focus on the process side, which is organized to foster Agile principles. For this reason, the goal of the module is the creation of an ambulance-robot. Indeed, implementing the basic features of an ambulance (different speeds, siren, and lights) does not necessarily require the use of sensors, which reduces the complexity of a solution. At the same time, managing the simultaneous functioning of sirens and lights, or determining when they should be activated, contributes to developing problem-solving skills, which is the primary goal of the HSE curriculum. As shown in Table 2, the proposed module covers a total of 10 h, and each activity fosters specific XP practices.

Table 2. Structure of the HSE module: activities, hours, and XP practices.

Activity                                  | Hours | Practices
Tell me how you make toast, Brainstorming | 2     | User stories, small releases
Tell me how you make ambulance-robot      | 2     | User stories, small releases
Development iterations                    | 4     | On-site customer, small releases, testing
Presentation in front of the class        | 2     | User stories, testing
Total                                     | 10    |

The module begins with the activity Tell me how you make toast (https://www.ted.com/talks/tom_wujec_got_a_wicked_problem_first_tell_me_how_you_make_toast) (2 h). Each person sketches a diagram of how to make toast, one step per post-it. Then, all the students combine their solutions; this requires identifying the common steps in the solutions, discarding the unnecessary ones, keeping the ones only someone has thought about, etc. The message of this activity is about the importance of teamwork and the identification of small steps towards the solution (i.e., the tasks in an Agile board). Afterward, a brainstorming session is performed to discuss the main features of an ambulance-robot.

Students then work in teams, since the ability to work in teams is one of those that shall be acquired at the end of HSE. Based on the Tell me how you make toast activity, students repeat the same process for the activity Tell me how you make ambulance-robot (2 h). Each post-it now represents a feature that needs to be implemented to obtain an ambulance-robot, such as “the ambulance-robot shall run at maximum speed with lights and sound on”. The following instructions are provided: 1) you can choose between writing or drawing, 2) each post-it represents only one feature, 3) each post-it also specifies how to check whether what you have done works and the behavior you are


expecting. At the end of this part, we introduce the robotics kit, mBot (https://www.makeblock.com/steam-kits/mbot), and the main characteristics of its block-based programming language, mBlock (http://www.mblock.cc/?noredirect=en_US). We chose mBot because its programming environment provides users with the possibility of viewing the Python code behind the blocks with one click and also of writing code in the Python editor. This characteristic might support students in switching to a text-based programming environment, which is among the objectives of the HSE program.

Once the requirements are defined, four hours are dedicated to development iterations. Students select one post-it from their set and work only on that task. Each team works with one laptop and one robotics kit. Once they have completed the task, they can show the current version of the prototype to the instructor and get feedback before proceeding. At the end of the module, students present in front of the class (2 h) to check whether the initial requirements have been satisfied. Moreover, other students can test the ambulance-robot.
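For illustration only, the sketch below shows what a minimal mode-switching program for such an ambulance-robot might look like, mirroring the red/amber/green requirements and the keyboard control that the teams later defined (see Sect. 8). It is plain Python: set_speed, set_lights, and set_siren are hypothetical placeholder functions, not calls from the actual mBot/mBlock API.

# Illustrative sketch only: the three helpers below stand in for the robot's
# motor, LED and buzzer commands; they are NOT the mBot/mBlock API.

def set_speed(percent: int) -> None:
    print(f"[motors] speed set to {percent}%")

def set_lights(on: bool) -> None:
    print(f"[lights] {'on' if on else 'off'}")

def set_siren(on: bool) -> None:
    print(f"[siren] {'on' if on else 'off'}")

# Each mode bundles speed, lights and siren, one post-it feature per entry.
MODES = {
    "r": ("red mode",   100, True,  True),   # life-threatening: max speed, lights and siren on
    "a": ("amber mode",  50, True,  False),  # medium urgency: medium speed, lights on, siren off
    "g": ("green mode",  20, False, False),  # low urgency: low speed, lights and siren off
}

def activate(key: str) -> None:
    name, speed, lights, siren = MODES[key]
    print(f"-- activating {name} --")
    set_speed(speed)
    set_lights(lights)
    set_siren(siren)

if __name__ == "__main__":
    # Simple keyboard loop: press r, a or g to switch modes, q to quit.
    while True:
        key = input("mode (r/a/g, q to quit): ").strip().lower()
        if key == "q":
            break
        if key in MODES:
            activate(key)
        else:
            print("unknown key")

The mode table keeps the coordination of speed, lights, and siren in one place, which is exactly the difficulty the module wants students to reason about; the hardware-specific calls would replace the placeholder functions in a real mBot program.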

6 Assessment Criteria

The assessment strategy that we have designed to evaluate the effectiveness of the proposed activities focuses on two main factors, namely product and process. For product assessment, we identified the following three criteria:

1. Technical aspects. This criterion assesses compliance with the initial requirements that are identified during the introduction.
2. Structure (e.g., code analysis for software products).
3. Data handling. This criterion evaluates how the proposed solution handles data (e.g., data analysis to convey a message).

Table 3 shows how these criteria have been adapted to the specific characteristics of each module’s product.

Regarding process assessment, the goal of the proposed modules is learning a set of XP practices while performing the proposed activities. As shown in Table 4, for each practice, we have defined a criterion that we evaluate by observing students’ behavior during the activities. A sufficient assessment is achieved with a total of six points.

For HSE, the assessment is based on group outcomes, which is a widely used approach [10]. However, this approach may not be the best; therefore, we monitor elements that can have an impact on group outcomes, such as the presence of leaders, the level of cooperation, possible issues, and the contribution of each member [15].


Table 3. Product assessment: criteria.

Criterion         | HSH module 1                                                                                                               | HSH module 2                                                                                                               | HSE module
Technical aspects | Choice of fonts and colors in the infographic                                                                              | Length and image quality in the video                                                                                      | Correct functioning, limited/extended functionality
Structure         | Narrative, i.e., the product follows a clear and logical line that guides the viewer from an introduction to a conclusion | Narrative, i.e., the product follows a clear and logical line that guides the viewer from an introduction to a conclusion | Code quality: the presence of dead code, duplicated code, etc.
Data handling     | Usage of visual tools (i.e., symbols, charts) to convey the intended message                                               | Usage of visual tools (i.e., framings) to convey the intended message                                                      | Correct reaction to input data (e.g., the user pressing a key)

Table 4. Process assessment: criteria for each XP practice, and points awarded.

Aspect           | Criteria                                                                                                                                                    | Max. points awarded
Small releases   | Ability to organize the activities so that a prototype will be ready at the end of each iteration                                                          | 2.5
On-site customer | Ability to improve the product by taking advantage of the instructor’s feedback                                                                            | 2.5
Testing          | Ability to test/check the product frequently                                                                                                                | 2.5
User stories     | Ability to use the paper-based prototype to guide the production process and, during the final presentation, to justify any changes to the initial prototype | 2.5
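As a worked example of the scoring scheme in Table 4 (an illustrative sketch, not a tool used in the study), the four per-practice scores can be capped at 2.5 points each and summed, with six points as the sufficiency threshold:

MAX_PER_PRACTICE = 2.5
SUFFICIENT_TOTAL = 6.0
PRACTICES = ("small_releases", "on_site_customer", "testing", "user_stories")

def process_score(scores: dict) -> tuple:
    """Sum the per-practice scores (capped at 2.5 each) and check the 6-point threshold."""
    total = sum(min(scores.get(p, 0.0), MAX_PER_PRACTICE) for p in PRACTICES)
    return total, total >= SUFFICIENT_TOTAL

# Example: strong on small releases and testing, weaker on the other practices.
total, sufficient = process_score(
    {"small_releases": 2.5, "on_site_customer": 1.5, "testing": 2.0, "user_stories": 1.0}
)
print(total, sufficient)   # 7.0 True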

7 Classroom Experiences

This section describes the first classroom experiences that we have performed in two non-vocational high schools in Italy.

High School of Humanities. This classroom experience involved 16 first-year students (11 F, 5 M). One of the authors of this paper worked in the class as a regular teacher; this situation provided a high degree of interaction of the researcher with the students, who had low awareness of being observed for research purposes [42]. The two modules took place in a sequence (after a short pilot study reported in [21]), in weekly blocks of two school hours during which students worked individually on the proposed activities. Among the curricular topics, we selected the focus of the two modules (i.e., non-verbal communication: gestures and body language) together with the school teacher.


High School of Economics. We have involved 13 first-year students (7 F, 6 M) in ten hours of activity, which took place in weekly blocks of two school hours. In this case, two authors of this paper were alternately present in the classroom, which guaranteed a high degree of interaction with the students. They were introduced as external experts; however, students had low awareness of being observed for research purposes [42]. The students worked in teams decided by the school teacher: two teams of four (2 M and 2 F), and one team of five (2 M and 3 F).

8 Results

As detailed in Sect. 6, our assessment strategy focused both on product and process. In the following, we report the results of the assessment of these two aspects for HSH and HSE.

8.1 Product Assessment

HSH Module One. All the 16 infographics conformed with the initial requirement “create an infographic that describes one aspect of gestures of your choice”. Three main aspects have been selected: a) examples of the five categories of gestures; b) meaning of some gestures; c) difference of gestures in different countries. Based on the application of each criterion (Fig. 1), most of the infographics (12) were marked as sufficient, three as almost sufficient, and one as insufficient.

Fig. 1. HSH module one: product assessment – see [22].

HSH Module Two. Only 15 videos were collected, because a student moved to another school. Most of the students created a tutorial on body language, by


adopting different strategies such as simulating a TV show on the topic. Other students described some stories in which communication issues (such as the lack of a common language) were solved by using body language. Based on the assessment criteria (Fig. 2), twelve videos were sufficient, two almost sufficient, and one insufficient.

Fig. 2. HSH module two: product assessment – see [22].

HSE Module. During the activity Tell me how you make ambulance-robot, being aware of the limited time available to complete the project, all the teams identified a small set of requirements for an ambulance, namely:

– red mode, for immediately life-threatening cases: maximum speed, siren and lights on;
– amber mode, for medium-urgency calls: medium speed, lights on, siren off;
– green mode: low speed, siren and lights off.

Regarding the assessment of the technical aspects (see Table 3), all three delivered ambulance-robots could be controlled using a keyboard (e.g., the “g” key to activate the green mode). As shown in Fig. 3, the correct functioning with respect to the initial set of requirements was good for two ambulance-robots (teams 2 and 3). Team 3 also added the possibility to turn off the lights, and tried to implement the possibility of stopping in front of an obstacle by using a sensor; however, they did not have enough time to complete this feature. Team 1 (2 M and 2 F) was nearly sufficient for this criterion because only the green and amber modes were implemented; moreover, the lights were not working properly. During the activities, for all the teams, the major challenge was coordinating movement, lights, and sound.

The structure of the solutions delivered by teams 1 and 2 is sufficiently good. Indeed, there is no duplicated or dead code left around. Instead, as mentioned above, the code of team 3 contains some parts that are not complete and


thus not used; for this reason, team 3 got a “nearly sufficient” evaluation for this criterion (Table 3). However, we should take into account that team 3 is the only one that completed all requirements and then tried to go beyond its limits, trying to use a sensor.

Fig. 3. HSE: product assessment.

All the solutions demonstrate a good ability to handle data, which is, in this case, represented by the user’s input: each condition (i.e., key pressed by the user) is correctly managed. In all the classroom experiences described above, we could appreciate an improvement in the product’s quality throughout the iterations.

8.2 Process Assessment

HSH. Based on the application of the criteria in Table 4, ten students obtained a sufficient assessment at the end of the first module, and fourteen at the end of the second one. These students organized their activities so that they were able to present a prototype at the end of each iteration (i.e., small releases practice), and took advantage of the received feedback (on-site customer practice). Moreover, having to present a prototype at the end of each iteration worked as a motivating factor for testing more frequently. An increasing number of students (module one: 10/16 vs. module two: 14/15) used the paper-based prototype (user stories practice) to guide the creation of their final product. Figure 4 shows an example of a paper-based prototype (i.e., storyboard) produced during module two.

Fig. 4. HSH module two: an example of storyboard.


In many cases, students changed their minds with respect to the original paper-based prototype. In these cases, they were allowed to change, but they were encouraged to update the design accordingly. Those students who did not follow this suggestion could not explain, during the final presentation, why the final product differed from the planned one; for this reason, they got a negative evaluation (i.e., almost sufficient).

HSE. The organization of the process was very similar in the three teams, and, for this reason, they received a similar assessment (i.e., almost sufficient). All the teams worked by iterations; the length of the iteration was not set by the instructor but depended on the task (i.e., post-it) of the specific iteration. However, the instructor kept monitoring the teams and encouraged them to speed up or to conclude a task before starting another one. Sometimes, there was no “official feedback moment”: thanks to the limited number of teams, the instructor could watch a test and provide quick feedback, which was generally appreciated and applied (i.e., on-site customer practice). A different situation happened when a team encountered some problems and could not proceed; in this case, the team asked for an additional feedback session, which started with a demonstration of the prototype. In general, teams understood that they had to show a working prototype at the end of the task (i.e., small releases practice) and were testing their solution frequently. Regarding the user stories practice, because of the time constraint, the set of tasks was relatively limited, and teams thought they could remember (and select) the tasks without consulting their post-its. For team 1, this resulted in forgetting to implement one crucial requirement.

During the activities, the instructor monitored those elements that could impact a team’s performance [15]. The cooperation between peers was sufficiently good, and each member contributed to the solution. However, sometimes one or two members were not active, which can mainly be explained by the team size. For example, when there was a need to focus on some coding activity, five people could not all work together on a single laptop, and this led to some members being inactive. However, most of the time, these students took advantage of this time to “visit” other teams and ask for suggestions. Only one person (in team 3) showed a clear leader behavior, due to his previous experience with block-based programming languages. In this case, also in response to the other members’ complaints, the instructor suggested letting the other members write code as well, which contributed to solving the problem.

9 Conclusion and Future Work

To provide an answer to RQ1, in this work, we described different approaches that we have designed to foster Software Engineering (SE) principles in different types of non-vocational high schools. This goal sets several challenges. For example, in the great variety of existing curricula, the CS course has different objectives and is a more or less pivotal topic. For the same reason, students may have different levels of interest in CS and STEM in general. This means that


they might not perceive additional lectures on Software Engineering as useful. Therefore, the modules proposed in this paper have the following main characteristics: 1) they focus on the process side; 2) they do not introduce additional Software Engineering lectures, and teach Software Engineering principles while pursuing the existing curricular objectives, which are quite different in the two considered curricula; and 3) they leverage activities that are able to tap into students’ interests.

Our first classroom experiences have shown that students started adopting a software engineering approach (RQ2), and organized their activities to have a prototype ready at the end of each iteration. In this regard, the results of HSH module two are much better than those of the other two modules (which have comparable results). This can be interpreted as a confirmation of the need for practically applying and repeating Agile methods until they become a habit to students [26]. Indeed, running the second module with the same group of students improved the results. Confirming these results would require further classroom experiences, which should go in two directions: 1) involving the same HSE group of students in a newly designed HSE module two; and 2) repeating HSH modules one and two with another group of students of the same program.

Further experiments are needed to confirm these results and to develop a comprehensive assessment protocol for the process aspect. However, these findings are an indicator of the benefits of working in the EUSE perspective in different disciplines, including non-STEM ones. Compared to our previous work [22], we have included teamwork activities (in the HSE module), which benefit the most from an XP approach. However, also considering the time constraint, we chose not to introduce additional practices to support teamwork, such as stand-up meetings. While this was a sensible choice to focus on a limited number of practices and keep a level of comparability with the other modules, we acknowledge that more support for teamwork should be provided [2]. Moreover, smaller teams would allow all the team members to be involved in the activities all the time.

References

1. Abrahamsson, P., Fronza, I., Moser, R., Vlasenko, J., Pedrycz, W.: Predicting development effort from user stories, pp. 400–403 (2011)
2. Barkley, E.F., Cross, K.P., Major, C.H.: Collaborative Learning Techniques: A Handbook for College Faculty. Wiley, Hoboken (2014)
3. Barricelli, B.R., Cassano, F., Fogli, D., Piccinno, A.: End-user development, end-user programming and end-user software engineering: a systematic mapping study. J. Syst. Softw. 149, 101–137 (2019)
4. Bell, A.: Creating digital video in your school. Libr. Media Connect. 24(2), 54 (2005)
5. Bollin, A., Pasterk, S., Antonitsch, P., Sabitzer, B.: Software engineering in primary and secondary schools - informatics education is more than programming. In: 2016 IEEE 29th International Conference on Software Engineering Education and Training (CSEET), pp. 132–136. IEEE (2016)


6. Burnett, M.: What is end-user software engineering and why does it matter? In: Pipek, V., Rosson, M.B., de Ruyter, B., Wulf, V. (eds.) IS-EUD 2009. LNCS, vol. 5435, pp. 15–28. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00427-8_2
7. Burnett, M.M., Myers, B.A.: Future of end-user software engineering: beyond the silos. In: Proceedings of the on Future of Software Engineering, pp. 201–211. ACM (2014)
8. Burning Glass Technologies: Beyond point and click: the expanding demand for coding skill. https://bit.ly/2YYL60A (2016). Accessed 08 Apr 2019
9. Chimalakonda, S., Nori, K.V.: What makes it hard to teach software engineering to end users? Some directions from adaptive and personalized learning. In: 2013 IEEE 26th Conference on Software Engineering Education and Training (CSEE&T), pp. 324–328. IEEE (2013)
10. Conde, M.Á., Hernández-García, Á., García-Peñalvo, F.J., Fidalgo-Blanco, Á., Sein-Echaluce, M.: Evaluation of the CTMTC methodology for assessment of teamwork competence development and acquisition in higher education. In: Zaphiris, P., Ioannou, A. (eds.) LCT 2016. LNCS, vol. 9753, pp. 201–212. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-39483-1_19
11. Costabile, M.F., Mussio, P., Parasiliti Provenza, L., Piccinno, A.: End users as unwitting software developers. In: Proceedings of the 4th International Workshop on End-User Software Engineering, WEUSE 2008, pp. 6–10. ACM, New York (2008)
12. European Commission: Bring your own device (BYOD). Technical report, ET 2020 Working Group on Digital Skills and Competences, European Commission (2016)
13. Few, S.: Show Me the Numbers. Analytics Press (2004)
14. Few, S.: Infographics and the brain: designing graphics to inform. http://www.perceptualedge.com/articles/misc/Infographics_and_the_Brain.pdf (2019). Accessed 19 July 2019
15. Fidalgo-Blanco, Á., Sein-Echaluce, M.L., García-Peñalvo, F.J., Conde, M.Á.: Using learning analytics to improve teamwork assessment. Comput. Hum. Behav. 47, 149–156 (2015)
16. Fronza, I., El Ioini, N., Corral, L.: Blending mobile programming and liberal education in a social-economic high school. In: Proceedings of the International Conference on Mobile Software Engineering and Systems (MOBILESoft 2016), pp. 123–126. Association for Computing Machinery, New York (2016). https://doi.org/10.1145/2897073.2897096
17. Fronza, I., El Ioini, N., Pahl, C., Corral, L.: Bringing the benefits of agile techniques inside the classroom: a practical guide. In: Parsons, D., MacCallum, K. (eds.) Agile and Lean Concepts for Teaching and Learning, pp. 133–152. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-2751-3_7
18. Fronza, I., Zanon, P.: Introduction of computational thinking in a hotel management school [Introduzione del computational thinking in un istituto alberghiero]. Mondo Digitale 14(58), 28–34 (2015)
19. Fronza, I., El Ioini, N., Corral, L.: Students want to create apps: leveraging computational thinking to teach mobile software development. In: Proceedings of the 16th Annual Conference on Information Technology Education, SIGITE 2015, pp. 21–26. ACM, New York (2015)
20. Fronza, I., Ioini, N.E., Corral, L.: Teaching computational thinking using agile software engineering methods: a framework for middle schools. ACM Trans. Comput. Educ. (TOCE) 17(4), 19 (2017)


21. Fronza, I., Pahl, C.: End-user software engineering in K-12 by leveraging existing curricular activities. In: Proceedings of the 13th International Conference on Software Technologies, ICSOFT 2018, Porto, Portugal, 26–28 July 2018, pp. 283–289 (2018)
22. Fronza, I., Pahl, C.: Teaching software engineering principles in non-vocational schools. In: Lane, H., Zvacek, S., Uhomoibhi, J. (eds.) CSEDU, vol. 1, pp. 252–261. SciTePress (2019)
23. Gartner Inc.: Market Guide for Rapid Mobile App Development Tools (2017)
24. Kastl, P., Kiesmüller, U., Romeike, R.: Starting out with projects: experiences with agile software development in high schools. In: Proceedings of the 11th Workshop in Primary and Secondary Computing Education, pp. 60–65. ACM (2016)
25. Ko, A.J., et al.: The state of the art in end-user software engineering. ACM Comput. Surv. (CSUR) 43(3), 21 (2011)
26. Kropp, M., Meier, A.: Teaching agile software development at university level: values, management, and craftsmanship. In: 2013 IEEE 26th Conference on Software Engineering Education and Training (CSEE&T), pp. 179–188. IEEE (2013)
27. Bernard, M.: A beginner's guide to creating sharable infographics. https://www.forbes.com/sites/bernardmarr/2017/09/16/data-visualization-the-best-infographic-tools-for-2017/#ff93a727d24e (2017). Accessed 19 July 2019
28. Matrix, S., Hodson, J.: Teaching with infographics: practicing new digital competencies and visual literacies. J. Pedag. Devel. 3(2), 17–27 (2014)
29. Meerbaum-Salant, O., Hazzan, O.: An agile constructionist mentoring methodology for software projects in the high school. ACM Trans. Comput. Educ. 9(4), n4 (2010)
30. Mikre, F.: The roles of assessment in curriculum practice and enhancement of learning. Ethiop. J. Educ. Sci. 5(2) (2010)
31. Monteiro, I.T., de Castro Salgado, L.C., Mota, M.P., Sampaio, A.L., de Souza, C.S.: Signifying software engineering to computational thinking learners with AgentSheets and PoliFacets. J. Vis. Lang. Comput. 40, 91–112 (2017)
32. Morgan, H.: Technology in the classroom: creating videos can lead students to many academic benefits. Child. Educ. 89(1), 51–53 (2013)
33. Parsons, D., MacCallum, K.: Agile and Lean Concepts for Teaching and Learning: Bringing Methodologies from Industry to the Classroom. Springer, Cham (2019)
34. Romeike, R., Göttel, T.: Agile projects in high school computing education: emphasizing a learners' perspective. In: Proceedings of the 7th Workshop in Primary and Secondary Computing Education, pp. 48–57. ACM (2012)
35. Salza, P., Musmarra, P., Ferrucci, F.: Agile methodologies in education: a review. In: Parsons, D., MacCallum, K. (eds.) Agile and Lean Concepts for Teaching and Learning, pp. 25–45. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-2751-3_2
36. Scaffidi, C., Shaw, M., Myers, B.: Estimating the numbers of end users and end user programmers. In: 2005 IEEE Symposium on Visual Languages and Human-Centric Computing, pp. 207–214. IEEE (2005)
37. Scheubrein, R.: Elements of end-user software engineering. INFORMS Trans. Educ. 4(1), 37–47 (2003)
38. Shaw, M.: Software engineering education: a roadmap. In: Proceedings of the Conference on the Future of Software Engineering, pp. 371–380. ACM (2000)
39. Spires, H.A., Hervey, L.G., Morris, G., Stelpflug, C.: Energizing project-based inquiry: middle-grade students read, write, and create videos. J. Adolesc. Adult Lit. 55(6), 483–493 (2012)


40. Steghöfer, J.P., Knauss, E., Alégroth, E., Hammouda, I., Burden, H., Ericsson, M.: Teaching agile: addressing the conflict between project delivery and application of agile methods. In: Proceedings of the 38th International Conference on Software Engineering Companion, pp. 303–312. ACM (2016)
41. Visme: A beginner's guide to creating sharable infographics. https://www.visme.co/wp-content/uploads/2017/03/How%20to%20Make%20an%20Infographic%20-%20A%20Visual%20Guide%20for%20Beginners%20By%20Visme.pdf (2019). Accessed 19 July 2019
42. Wohlin, C., Runeson, P., Höst, M., Ohlsson, M.C., Regnell, B., Wesslén, A.: Experimentation in Software Engineering: An Introduction. Kluwer Academic Publishers, Norwell (2000)

Testing the Testing Effect in Electrical Science with Learning Approach as a Factor

James Eustace and Pramod Pathak

National College of Ireland, Dublin, Ireland
[email protected], [email protected]

Abstract. The application of retrieval practice to electrical science education has been shown to be effective for student learning. While research is beginning to emerge in classroom contexts, the learning approach of students taking electrical science has not been considered as a factor when participating in retrieval practice. This research paper addresses this gap and presents a study of n = 207 students in a within-group design, examining the impact of retrieval practice within a practice testing learning framework on their subsequent performance in a high-stakes unseen criterion test. The Revised Two-Factor Study Process Questionnaire (R-SPQ-2F) was administered, with n = 88 responses, to determine learning approach before retrieval practice participation, with an average score (standard deviation) on Deep Approach of 29.32 (9.06) and on Surface Approach of 22.53 (7.72). Students reported using a mix of deep and surface approaches, with retrieval practice enhancing performance. The findings from this study support the application of retrieval practice to enhance learning in electrical science and provide guidelines for future educational research on retrieval practice in electrical science and other domains.

Keywords: Testing effect · Retrieval practice · Practice testing learning framework · Electrical science · Learning approach

1 Introduction

The testing effect, also known as retrieval practice, promotes practice testing as an effective learning technique. Learning involves the acquisition and encoding of new information, with learners applying different learning approaches and techniques. Tests are usually associated with measuring achievement at the end of a module, semester or course. Retrieval practice can be an effective learning technique in itself, where the act of retrieving and applying knowledge in a test creates the ‘testing effect’, which enhances learning [1–3].

Sections 1.1 and 1.2 provide a brief overview of retrieval practice research and conditions and their impact on learning. Section 2 explores students’ approach to learning, while Sect. 3 describes the methods used in the study. Section 4 presents the data analysis and results, followed by a discussion in Sect. 5; finally, Sect. 6 presents the conclusions of the study and future work.



1.1 Retrieval Practice

Over the last century, retrieval practice has continued to interest researchers [4] and has been shown to benefit learning where the material to be learned includes practice tests. This is also referred to as the ‘testing effect’ in the literature and is supported by a large body of research [5–8], with increased interest in the last decade [2]. There are several theoretical accounts of how the testing effect occurs, including transfer-appropriate processing [9], amount of exposure [10], elaborative retrieval [11–13], effortful retrieval [14, 15] and episodic context [16, 17], and while they differ in their approach, the main theme that emerges is that retrieval practice can benefit learning compared to a restudy-only condition. Further research is needed to investigate the cognitive processes that yield transfer of test-enhanced learning, as well as the circumstances under which that learning may or may not transfer [18].

Practice testing fits within assessment as learning on the continuum of educational assessment and should use design characteristics or elements of both formative and summative assessment for optimum effect. While the techniques used in assessment for learning can be used to enhance learning, they are not a result of the direct effects of testing on learning but represent the mediated or indirect effects of testing on learning. The indirect effects of frequent testing refer to effects such as learners studying more regularly, practice testing guiding future study by identifying gaps in knowledge, or testing providing feedback to instructors [19].

Studies of the testing effect phenomenon have identified the positive effect of retrieval practice in consolidating learning compared to repeated study [11, 20, 21]. Retrieval practice research in classroom contexts has been limited, with studies emerging using computerised quizzing in electrical science [22, 23], introductory statistics [24], mathematics for computing [25] and engineering [26]. Section 1.2 provides a brief review of selected practice test parameters and their impact on learning, while Sect. 1.3 provides a brief description of the theoretical framework underpinning retrieval practice in this classroom study.

1.2 Retrieval Practice Parameters

A significant amount of research has focused on manipulating retrieval practice conditions and comparing study-only conditions to study-test conditions. The benefits of testing with studying compared to study alone have been well established [8]. One size may not fit all for retrieval practice: e.g., taking a practice test once may not be as beneficial as taking several practice tests in order to realise the benefits of test-enhanced learning requiring transfer [27]. The effectiveness of practice testing may also be reduced for learners with lower working memory capacity and higher trait test anxiety [28], or by their approaches to learning. Self-regulated learning is a broad field that provides a lens by which to examine variables that influence student strategies to achieve their learning goals. While students struggle to regulate their learning, promising directions with retrieval practice suggest that students can be taught and guided to use retrieval practice more effectively [29]. Karpicke (2017), in his conclusion, identified areas for future work, including examining retrieval practice from a contextual perspective on learning strategies and a pressing need to integrate retrieval practice into existing educational activities [2].


Number of Practice Tests. Eustace and Pathak (2018) found in an earlier study involving Ohms Law that the number of retrieval practice attempts has an impact on electrical science learning, with 3 to 4 practice tests on each topic recommended [22]. These findings are consistent with a subsequent study in electrical science which extended the number of topics to also include Resistance Network Measurement (also referred to as Resistance within this paper) and Cables and Cable Terminations. The participants found Resistance a more difficult topic than the other treatment topics, and this was reflected in the no-practice-test group not meeting the minimum standard, in contrast to the other treatment topics [23].

Practice Test Performance. While learners may attempt a practice test, effortful retrieval is required to enhance learning. Rawson and Dunlosky (2011) recommend recalling concepts to an initial criterion of three correct recalls and then relearning them three times at widely spaced intervals [30]. In electrical science, overall performance in the practice tests also emerged as having a significant impact on learning: scoring 70% or greater in the practice tests was a predictor of subsequent performance in the unseen criterion test. Differing from many studies on practice testing, the practice tests and criterion tests involved questions requiring problem solving, where a process or procedure was applied in new contexts [22, 23].

Distributed Practice. Spaced practice (also known as distributed practice) is a learning strategy where learning is broken up into several short sessions over a period of time. Massed practice, on the other hand, involves repeated attempts taken directly after each other. While much of the testing effect research has been conducted in laboratory settings, there are some examples in classroom settings, such as vocabulary learning [31] and US history facts [32]. In electrical science, learners with higher levels of self-regulation tended to perform better by spacing attempts, while learners with lower levels of self-regulation performed better by massing their practice [22].

Retention Interval (RI). The time lag between the last retrieval practice and the criterion test, also known as the retention interval, is another important parameter. Eustace and Pathak (2018) found that leaving a gap between the last practice test and the criterion test promoted better learning outcomes in electrical science [22]. A meta-analysis by Cepeda et al. (2006) suggests that the inter-study interval (ISI) and RI are inter-related, and the effect size depends on the joint effects of ISI and RI [33]; however, the optimal spacing between retrieval sessions is not some absolute quantity but depends very much on the RI [34].

1.3 Practice Testing Learning Framework

A review of the retrieval practice literature found that models of practice test design are not explicitly stated; some apply Bloom's Taxonomy to high- and low-level questions [35] or to fact and application questions [36], and others focus on the theoretical underpinnings of the testing effect [37]. Despite the availability of technology enhanced learning tools and systems, the effective integration of retrieval practice into teaching and learning presents challenges for practitioners. The main problem area identified and addressed by this research is the lack of a unified framework, and the authors propose that for the benefits of practice testing to be fully realised, practice testing should be considered within a framework for learning. The Practice Testing Learning Framework (PTLF) [22, 24, 25] is an operationalisation of the Conversational Framework, in which the e-assessment practice test environment is the "task practice environment" or teacher-constructed environment [38], and is aligned with the First Principles of Instruction [39]. Within the Conversational Framework [38], learning activities occur between two levels, the discursive/theoretical level and the practice/practical level, and these activities reflect the learning process [40]. This framework has been influential in the analysis of formative e-assessment within educational domains including higher and further education and work-based learning [41], in the design of learning environments [42] and more recently in practice testing [22–25]. The PTLF [23], adapted from [38], is used to inform the design and adaptation of the practice test environment, supporting alignment between curriculum learning outcomes and test item development; Hess's Cognitive Rigor (CR) matrix supports retrieval-based learning within the PTLF (Fig. 1).

Fig. 1. Practice testing learning framework [23] adapted from [38].

van Gog and Kester (2012) found that the study-only condition group outperformed the study-test condition group, with no testing effect observed on a delayed retention test after one week. The authors acknowledge that three minutes per task may have been a limitation of the study; however, based on an earlier study, three minutes seemed to be appropriate for the acquisition stage [43]. The materials used by Eustace and Pathak (2018, 2019) are more complex: instead of retrieving a word, fact or list, the learner is retrieving a solution procedure to solve a target problem. The problem task is isomorphic, not reusing the information previously presented but instead requiring a solution procedure to complete. The materials presented during the acquisition phases consisted of drawings of malfunctioning parallel electrical circuits and an information sheet explaining Ohm's Law and providing different forms of the formula. The questions presented required application and inference to solve the problems presented [22, 23]. Problem solving is very relevant to education in STEM subjects such as mathematics and science, and using worked examples can help learners and reduce cognitive load [44]. Materials of this complexity have much greater ecological validity, as learners are required to demonstrate knowledge, skills or competence in new contexts, going beyond the memorial benefit of retrieving a fact or information presented to applying and making connections to new material. Practice testing studies involving problem solving within the PTLF have found benefits to learning where the questions are isomorphic, in topics such as Ohms Law [22] and Resistance Network Measurement [23]. Nelson (1990) described a framework for metamemory as the interaction between metacognitive monitoring and metacognitive control during the acquisition, retention, and retrieval of to-be-learned material. This framework for metamemory [45], with additions from [46] and [15], positions retrieval practice in the retention stage under maintenance of knowledge and allows learners to monitor and make judgements on their learning. However, learners' self-assessments can be biased [15], and they may need scaffolding and a more structured environment, such as being provided with training on self-assessment [47, 48] or with instructions as to how to use retrieval practice [29]. Within the PTLF, retrieval practice is also considered part of the learning or acquisition stage, as part of ongoing learning as well as for the maintenance of knowledge. In this way learners can monitor and control their learning to become more self-regulated, and practitioners can also monitor and control teaching strategies, reflecting on the feedback from the PTLF.

2 Student Approach to Learning

A body of research has emerged around Student Approaches to Learning (SAL), evolving from deep or surface level processing [49] to deep or surface level approaches [50], where students' study activity is a result of their interaction with the environment and those adopting deep approaches achieve higher quality learning outcomes. Assessment strategy influences students' study approach, so assessments that encourage a deep approach to learning are recommended [51]. Deep-level processing focuses on knowledge transformation, while surface-level processing tends to focus on reproducing information [49]. Students may use either deep or surface study approaches; however, the deep/surface dichotomy is not a constant condition. The student's approach depends on several factors: the type of task, its content and form, how it will be assessed and previous experience. The operational outcome will be the intention to adopt either a deep or surface approach [52]. The ability of students to self-regulate or monitor their learning is essential to effectively guide their study and learning effort. Self-regulation is the self-directed process by which learners transform their mental abilities into academic skills. Learning is an activity that students do for themselves in a proactive way rather than a covert event that happens to them in reaction to teaching [53]. Test expectancy influences students' perceptions of what constitutes learning and may impact how students prepare for tests and monitor their learning [35, 54]. While practice testing provides feedback on learning and may improve metacognitive awareness for students [55, 56], they may not be aware of its benefits [19, 20, 57].


2.1 Revised Two-Factor Study Process Questionnaire

The R-SPQ-2F was derived from the original Study Process Questionnaire (SPQ) [58], reducing it from 43 items to 20 items for evaluating the learning approaches of students. The revised instrument assesses surface and deep approaches only, using fewer items [59]. In the original SPQ, the Presage-Process-Product (3P) model illustrates a dynamic system of interactions between student factors, teaching context, on-task approaches to learning and the learning outcomes product. The intended use of the SPQ is not to categorise students as either surface or deep learners on the basis of their responses; rather, their responses are a function of both their own characteristics and the teaching context, so the teacher and the student are jointly responsible for the outcome [59]. The R-SPQ-2F identifies deep and surface approaches, with 10 items for each, and has been evaluated in teacher education [60], optometry [61], and education and psychology [62]. This study extends the research on retrieval practice within electrical science by considering the learning approach of students taking electrical science as a factor when participating in retrieval practice. Two research questions guided this study: (1) What impact does the learning approach of apprentices have on their retrieval practice in Electrical Science? (2) What retrieval practice parameters are optimal for learning transfer?
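For readers who wish to reproduce scale scores from raw questionnaire responses, a minimal scoring sketch is given below. It assumes the published R-SPQ-2F scoring key (deep-approach items 1, 2, 5, 6, 9, 10, 13, 14, 17, 18; surface-approach items 3, 4, 7, 8, 11, 12, 15, 16, 19, 20, each rated 1–5); the variable names are illustrative and not taken from the study.

```python
# Hedged sketch: R-SPQ-2F deep/surface approach scores from a 20-item response vector.
DEEP_ITEMS = [1, 2, 5, 6, 9, 10, 13, 14, 17, 18]       # assumed standard scoring key [59]
SURFACE_ITEMS = [3, 4, 7, 8, 11, 12, 15, 16, 19, 20]

def rspq2f_scores(responses):
    """responses: dict mapping item number (1-20) to a Likert rating (1-5)."""
    deep = sum(responses[i] for i in DEEP_ITEMS)
    surface = sum(responses[i] for i in SURFACE_ITEMS)
    return deep, surface                                 # each score ranges from 10 to 50

# Example: a respondent leaning towards a deep approach.
example = {i: (4 if i in DEEP_ITEMS else 2) for i in range(1, 21)}
print(rspq2f_scores(example))                            # -> (40, 20)
```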

3 Methods

The methods adopted in this study minimised disruption to apprentices and were implemented for two topics. Learner ability, motivation and other learning opportunities are uncontrolled factors; however, comparing participant performance within topics in the 'noisy' classroom should address these concerns. The validity of the research design is discussed further in Sect. 5.1.

3.1 Course and Materials

As in previous studies [22, 23], the course was an apprenticeship for Electrical apprentices enrolled in a 4-year national programme. The study was conducted during Phase 2, which is delivered in Education and Training Board (ETB) training centres over 22 weeks. The study focused on the first two units within the Electrical Science Module, Ohms Law and Resistance. The materials developed for earlier studies were reused and consisted of MCQ test items assembled into a test bank. The practice test development approach employed Hess's CR matrix to map learning outcomes to support item development and classification (Table 1). Test items were designed to reflect the cognitive process dimension with the required depth of knowledge. The Apprenticeship Moodle Learning Management System was used to deploy the practice tests. The Criterion test is the national T1 Theory Test used in the Apprenticeship Programme. It consists of 75 items, four-option MCQs with one correct option. The Criterion test is unseen by participants and delivered by a different assessment management system, not linked to the practice test item bank. Apprentices must answer at least 52 of the items correctly to pass the Criterion test.


The practice test items were either topically related or isomorphic where problem solving was tested. The criterion test is unseen and administered externally using a separate system. In Ohms Law, the learning outcome "Calculate circuit values using Ohms Law" requires application, applying a skill or concept to solve problems, e.g. "A circuit connected to a 200 V supply draws 5 Amps, what is the resistance of the circuit?" Similarly, for Resistance, learners are required to solve a range of problems, e.g. "Three resistors 7 Ω, 14 Ω and 14 Ω are connected in parallel. This bank is connected in series with a 2.5 Ω resistor. The current through the 2.5 Ω resistor is 2 amps. What is the supply voltage?" (Worked solutions to these two sample problems are sketched after Table 1.) As in previous studies, Criterion test topics included Ohms Law/The Basic Circuit; Resistance Network Measurement; Power and Energy; Cables and Cable Termination; Lighting Circuits; Bell Circuits; Fixed Appliance and Socket Circuits; Earthing and Bonding; and Installation Testing [22, 23].

Table 1. Learning outcomes supported by Retrieval Practice adapted from [23].

Unit: Ohms law/the basic circuit
- Identify graphical symbols associated with the basic circuit (CPD 2, DOK 2)
- State the units associated with basic electrical quantities (CPD 2, DOK 1)
- State the three main effects that electric current has upon the basic circuit (CPD 2, DOK 1)
- Calculate circuit values using Ohm's Law (CPD 3, DOK 2)

Unit: Resistance network measurement
- Identify the differences between series, parallel and series/parallel resistive circuits using a multimeter (CPD 2, DOK 2)
- Calculate the total resistance, voltage and current of series, parallel and series/parallel resistive circuits using the relevant formulae and a multimeter (CPD 3, DOK 2)
- Identify the differences between series, parallel combinations of cells in relation to the voltage and current outputs using the relevant formulae and a multimeter (CPD 3, DOK 2)
- Explain resistivity and list the factors which affect it (CPD 2, DOK 1)

CPD: 1 = Remember, 2 = Understand, 3 = Apply. DOK: 1 = Recall and Reproduction, 2 = Skills and Concepts.
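To make the required level of reasoning concrete, worked solutions to the two sample problems quoted above are sketched here; these derivations are added for illustration and are not part of the original test materials.

Ohms Law example: R = V / I = 200 V / 5 A = 40 Ω.

Resistance example: the parallel bank gives 1/R = 1/7 + 1/14 + 1/14 = 4/14, so R = 3.5 Ω; the total circuit resistance is 3.5 Ω + 2.5 Ω = 6 Ω, and the supply voltage is V = I × R = 2 A × 6 Ω = 12 V.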


3.2 Participants

The participants in the study were n = 207 Electrical apprentices on Phase 2 of their national Electrical apprenticeship programme in 2017/2018. The assignment of apprentices to classes is based on when an apprentice is registered; this registration number is allocated at the beginning of an apprenticeship by SOLAS, the Further Education and Training Authority. The typical apprentice class size was n = 14. All participants were enrolled in the Apprenticeship Moodle Learning Management System following registration, which provides access to course material and resources. The practice tests were provided as an optional course resource to all apprentices.

3.3 Procedures

For Ohms Law/The Basic Circuit and Resistance Network Measurement, randomised tests were available, with up to 4 attempts allowed. The practice tests consisted of 20 MCQs, with a minimum forced delay of 1 day between attempts and a 20-minute time limit per test. Feedback was deferred: apprentices were required to select an answer to each question and then submit the test before it was graded or feedback was given. Feedback was shown immediately after the attempt, indicating whether each response was correct, the correct answer, and the marks received. Participants attempted practice tests in their own time, and the tests were available for the duration of the course. Participants were informed by email and Moodle message that the practice tests consisted of 20 multiple choice questions and that, once they started, they had 20 min to complete the test. Participants were also informed they would have to wait 1 day between attempts and that the practice test results were not included in the course result calculation. Participation in the study was optional. Test items were aligned to the unit learning outcomes using the Cognitive Rigor Matrix, reflecting both the cognitive process dimension and the depth of knowledge for each. The test specification for practice test topics divided the topics into subtopics aligned with the learning outcomes and key learning points, with over 200 items available for selection. Items were randomly selected each time the tests were taken, with 2 to 3 items drawn from each of the subtopics to select 20 items overall for each practice test. The remaining seven topics were used as a control, as no practice tests were provided for them. All topics were assessed in the criterion test, which is typically administered around week 12 of the course. The additional 'noisy' activities of apprentices and instructors were not controlled, i.e. participants may have undertaken self-testing, taken additional instructor-led paper-based tests or applied preferred study techniques. Before participating in the study, learners were invited to complete the Revised Two-Factor Study Process Questionnaire (R-SPQ-2F). Participants expressed their approaches to learning on a 5-point Likert-type scale ranging from 'never or only rarely true of me' (score of 1) to 'always or almost always true of me' (score of 5).
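As an illustration of the selection logic described above, the following sketch assembles a 20-item practice test by drawing 2 to 3 items at random from each subtopic. It is a simplified, hypothetical reconstruction (the item bank structure and names are invented); in the study itself this behaviour was configured through Moodle quiz settings rather than custom code.

```python
import random

def assemble_practice_test(item_bank, per_subtopic=(2, 3), total_items=20):
    """item_bank: dict mapping subtopic name -> list of item identifiers."""
    selected = []
    for subtopic, items in item_bank.items():
        k = random.randint(*per_subtopic)                  # 2 or 3 items per subtopic
        selected.extend(random.sample(items, min(k, len(items))))
    random.shuffle(selected)
    return selected[:total_items]                          # cap the test at 20 items

# Hypothetical subtopics and item IDs for the Ohms Law unit.
bank = {f"subtopic_{i}": [f"OHM-{i}-{j}" for j in range(30)] for i in range(8)}
print(assemble_practice_test(bank))
```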

4 Data Analysis and Results

The internal consistency and results from the R-SPQ-2F are presented first, followed by individual analysis of several items from the questionnaire. The learner responses are analysed in relation to their subsequent performance in the topics and their participation in the practice tests.

4.1 Internal Consistency of the R-SPQ-2F

There were n = 88 responses from the possible 207 participants, giving a response rate of 42.5% [63], to determine learning approach prior to retrieval practice participation, with an average score (standard deviation) on Deep Approach of 29.32 (9.06) and on Surface Approach of 22.53 (7.72). An evaluation of the main R-SPQ-2F scales for internal consistency using Cronbach's Alpha coefficients resulted in Deep Approach (.923) and Surface Approach (.871). To facilitate further analysis of individual questions, the Likert scale was reduced to a three-point scale (never or sometimes, about half the time, frequently or always). Internal consistency remained high, with Cronbach's Alpha coefficients for Deep Approach (.867) and Surface Approach (.831). (A computational sketch of this reliability estimate is given after Table 2.)

4.2 Individual Item Analysis

A series of One-Way ANOVAs was conducted on the 20-item R-SPQ-2F questionnaire with performance in the treatment topics as the dependent variable. Two questions had findings of significance for Resistance, questions 3 and 8. Individual item analysis was conducted on these first, beginning with question 3.

Question 3. A One-Way ANOVA with the dependent variable, the performance in the Resistance topic in the criterion test, with response to question 3, "My aim is to pass the course while doing as little as possible", as the factor found a significant difference with p = .042, F = 3.297 between groups. A One-Way ANOVA for Ohms Law did not have a finding of significance, with p = .247, F = 1.422 between groups. A large effect was observed with Cohen's d = 1.02 between n = 71, "Not true of me", and n = 8, "True of me", for Resistance, while a medium effect size was observed with Cohen's d = 0.55 between "Not true of me" and "True of me" for Ohms Law (Table 2 and Fig. 2). Of the n = 71 participants who responded "Not true of me", n = 31 did not participate in practice tests (M = 70.61, SD = 18.932), n = 24 completed 1 practice test (M = 75.93, SD = 19.586), n = 14 completed 2 to 3 practice tests (M = 83.33, SD = 14.289) and n = 2 completed 4 practice tests (M = 94.44, SD = 7.857) for Resistance.

Table 2. Performance in Resistance and Ohms Law topics with response to question 3.

                            N    Resistance Mean (SD)   Ohms law Mean (SD)
Not true of me              71   75.59 (18.745)         85.77 (14.800)
True of me half the time     9   76.54 (16.144)         86.67 (17.321)
True of me                   8   58.33 (15.430)         76.25 (19.226)
Total                       88   74.12 (18.739)         85.00 (15.536)
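The Cronbach's Alpha figures reported in Sect. 4.1 can be reproduced with a routine of the following form; this is a minimal sketch assuming a respondents-by-items matrix of Likert ratings, not the authors' actual analysis code.

```python
import numpy as np

def cronbach_alpha(responses):
    """Cronbach's alpha for an (n_respondents, n_items) array of Likert ratings."""
    responses = np.asarray(responses, dtype=float)
    k = responses.shape[1]                        # number of items in the scale
    item_variances = responses.var(axis=0, ddof=1)
    total_variance = responses.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical data: 6 respondents answering 4 deep-approach items on a 5-point scale.
deep_items = [[4, 5, 4, 3],
              [2, 2, 3, 2],
              [5, 4, 5, 5],
              [3, 3, 2, 3],
              [4, 4, 4, 5],
              [1, 2, 1, 2]]
print(f"Cronbach's alpha = {cronbach_alpha(deep_items):.3f}")
```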


Fig. 2. Performance in Ohms Law and Resistance topics with response to question 3.

Fig. 3. Performance in Resistance with number of practice tests clustered on question 3.


Of the n = 8 participants who responded "True of me", n = 6 did not participate in practice tests (M = 55.56, SD = 9.938) and did not pass the topic, n = 1 completed 1 practice test (M = 44.44) and n = 1 completed 2 to 3 practice tests (M = 88.89) for Resistance. In contrast, participation in practice tests for Ohms Law was greater, with n = 5 completing 1 practice test (M = 72.00, SD = 13.038) and n = 3 completing 2 to 3 practice tests (M = 83.33, SD = 26.868), with a medium effect size of Cohen's d = 0.54. The size of the group that responded "True of me" is small, and their level of engagement in practice tests declined: strong initial engagement with Ohms Law was followed by reduced participation for Resistance. Performance in the Resistance topic with the number of practice tests clustered on question 3 is illustrated in Fig. 3. Learners who adopted a more surface approach to their learning did not perform as well in the no-practice-test or 1-practice-test groups. However, when learners completed 2 to 3 practice tests, their performance increased substantially, in line with learners adopting a deeper approach. A One-Way ANOVA with the dependent variable, the performance in the Resistance topic in the criterion test, with number of practice tests and "True of me" in response to question 3 as the factor, n = 8, found a significant difference with p = .048, F = 5.938 between groups.
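The one-way ANOVAs and effect sizes reported throughout Sect. 4 follow a common pattern; the sketch below shows one plausible way to compute them with SciPy, using invented scores grouped by questionnaire response rather than the study data.

```python
import numpy as np
from scipy import stats

def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical criterion-test scores (%) in one topic, grouped by response to an R-SPQ-2F item.
not_true   = [72, 81, 78, 90, 65, 84, 77, 88]   # "Not true of me"
half_time  = [70, 75, 83, 68, 74]               # "True of me about half the time"
true_of_me = [55, 61, 48, 66, 58]               # "True of me"

f_stat, p_value = stats.f_oneway(not_true, half_time, true_of_me)   # one-way ANOVA
print(f"F = {f_stat:.3f}, p = {p_value:.3f}, d = {cohens_d(not_true, true_of_me):.2f}")
```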

Fig. 4. Performance in Ohms Law, Resistance and Power and Energy topics with response to question 8.

Question 8. A One-Way ANOVA with the dependent variables, the performance in Ohms Law, Resistance and Power and Energy, with response to Question 8, "I learn some things by rote, going over them until I know them by heart even if I do not understand them", as the factor found a significant difference in performance with p = .003, F = 6.306 between groups for the Resistance topic and p = .011, F = 4.769 between groups for the Power and Energy topic. The criterion test performance results are illustrated in Fig. 4, together with the responses to question 8: n = 42 responded "Not true of me", n = 26 responded "True of me about half the time" and n = 20 responded "True of me". Performance in Resistance increased with the number of practice tests, with 2 to 3 practice tests bringing the "True of me" group into line with the "Not true of me" group, as illustrated in Fig. 5, with n = 12 completing no practice tests (M = 62.04, SD = 22.453), n = 5 completing 1 practice test (M = 60, SD = 12.669) and n = 3 completing 2 to 3 practice tests (M = 81.48, SD = 12.830).

Fig. 5. Performance in Resistance with number of practice tests clustered on question 8.

Table 3. Performance in Resistance and Ohms Law topics with response to question 10.

                            N    Resistance Mean (SD)   Ohms law Mean (SD)
Not true of me              24   80.09 (13.097)         85.00 (13.513)
True of me half the time    25   67.56 (23.333)         84.40 (15.292)
True of me                  39   74.64 (17.466)         85.38 (17.144)
Total                       88   74.12 (18.739)         85.00 (15.536)


Question 10. A One-Way ANOVA with the dependent variable, the performance in Resistance, with response to Question 10, "I test myself on important topics until I understand them completely", as the factor did not find a significant difference in performance, with p = .061, F = 2.889 between groups for the Resistance topic. Of the n = 24 who responded "Not true of me" (M = 80.09, SD = 13.097), n = 17 availed of practice tests and n = 7 did not take any practice tests (M = 73.02, SD = 12.599). Of those who availed of practice tests, n = 9 completed 1 practice test (M = 81.48, SD = 13.608), n = 7 completed 2 to 3 practice tests (M = 84.13, SD = 12.599) and n = 1 completed 4 practice tests (M = 88.89). Taking 2 to 3 practice tests had a large effect, with Cohen's d = 0.88, on participants who reported "Not true of me" compared with not taking practice tests. Of the n = 39 participants who responded "True of me", n = 20 did not participate in practice tests (M = 71.67, SD = 16.312), n = 12 completed 1 practice test (M = 75.93, SD = 20.561) and n = 7 completed 2 to 3 practice tests (M = 80.95, SD = 15.335) for Resistance. Taking 2 to 3 practice tests had a small effect, with Cohen's d = 0.39, on participants who reported "True of me" compared with not taking practice tests.

Table 4. Performance in Resistance and Ohms Law topics with response to question 20.

                            N    Resistance Mean (SD)   Ohms law Mean (SD)
Not true of me              36   79.32 (16.190)         88.89 (11.656)
True of me half the time    31   69.53 (21.463)         82.58 (18.613)
True of me                  21   71.96 (17.076)         81.90 (15.690)
Total                       88   74.12 (18.739)         85.00 (15.536)

Question 20. Individual item analysis of question 20, "I find the best way to pass tests is to try to remember answers to likely questions", gave "Not true of me", n = 36, "True of me about half the time", n = 31, and "True of me", n = 21. Of the n = 88 who responded to the questionnaire, n = 40 did not participate in the practice tests for Resistance (M = 67.50, SD = 18.212), n = 28 completed 1 practice test (M = 75.40, SD = 19.211), n = 18 completed 2 to 3 practice tests (M = 84.57, SD = 13.278) and n = 2 completed 4 practice tests (M = 94.44, SD = 7.857). A One-Way ANOVA with the dependent variable, the performance in Resistance, with the number of practice tests as the factor found a significant difference in performance, with p = .003, F = 4.951 between groups and a large effect size, Cohen's d = 1.07, between no practice tests and 2 to 3 practice tests. Participants adopting a deeper approach to their learning performed better in each of the topics, as illustrated in Fig. 6, while completing 2 to 3 practice tests improved performance for all participants in Resistance, as illustrated in Fig. 7.


Fig. 6. Performance in Ohms Law, Resistance and Power and Energy topics with response to question 20.

Fig. 7. Performance in Resistance topic and number of practice tests clustered on question 20.


4.3 Practice Test Performance

Of the n = 207 participants, n = 137 completed practice tests for both Ohms Law and Resistance. A paired-samples t-test conducted to compare criterion test topic performance between practice test and no practice test topics found a significant difference in the scores for practice test topics (M = 83.81, SD = 11.4) and no practice test topics (M = 79.97, SD = 11.8); t(136) = 4.258, p < .001. A small effect size was observed with Cohen's d = 0.33. Of the n = 88 who responded to the questionnaire, n = 48 participated in the practice tests for both topics. A paired-samples t-test conducted to compare criterion test topic performance between practice test and no practice test topics found a significant difference in the scores for practice test topics (M = 85.34, SD = 12.7) and no practice test topics (M = 81.97, SD = 12.3); t(47) = 2.390, p = .021. A small effect size was observed with Cohen's d = 0.28.

Table 5. Performance in topics with and without practice tests in response to question 10.

                            N    With practice tests Mean (SD)   Without practice tests Mean (SD)
Not true of me              17   86.80 (10.120)                  80.29 (11.020)
True of me half the time    12   85.56 (13.016)                  82.94 (12.099)
True of me                  19   83.89 (14.915)                  82.88 (13.992)
Total                       48   85.34 (12.702)                  81.98 (12.336)
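A minimal sketch of the paired comparison reported above is given below, using invented per-participant topic means; the effect size here is computed on the difference scores, which is one common convention and not necessarily the exact formula used in the study.

```python
import numpy as np
from scipy import stats

# Hypothetical mean criterion-test scores (%) per participant: topics with vs without practice tests.
with_pt    = np.array([85, 90, 78, 88, 92, 80, 84, 95, 76, 89])
without_pt = np.array([80, 86, 75, 83, 90, 78, 80, 92, 74, 85])

t_stat, p_value = stats.ttest_rel(with_pt, without_pt)     # paired-samples t-test
diff = with_pt - without_pt
d = diff.mean() / diff.std(ddof=1)                          # effect size on difference scores
print(f"t({len(diff) - 1}) = {t_stat:.3f}, p = {p_value:.4f}, d = {d:.2f}")
```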

Within-group performance on topics with and without practice tests was analysed in response to question 10, "I test myself on important topics until I understand them completely", with n = 17 responding "Not true of me", n = 12 responding "True of me about half the time" and n = 19 responding "True of me". The findings are presented in Table 5 and in Fig. 8. Participants who reported "Not true of me" performed better in the topics that included practice tests than in the topics without practice tests, with a medium effect size of Cohen's d = 0.62. A small effect size was observed with Cohen's d = 0.21 for those who reported "True of me about half the time", and no effect was observed, with Cohen's d = 0.07, for participants who reported "True of me" (Table 5). Overall performance in the practice tests had a significant impact on the criterion test result. Participants who had a maximum score in their practice tests of 69% or less, n = 49, had a mean performance in the criterion test topic of 71.63, while participants who scored 70% or greater, n = 158, had a mean performance of 90.19 in the criterion test topic. A One-Way ANOVA with the dependent variable, the performance in the Ohms Law/The Basic Circuit topic in the criterion test, with performance on the practice tests as the factor found a significant improvement in performance with p < .001, F = 68.824 between groups. A large effect was observed with Cohen's d = 1.18. There was a significant difference in scores with the number of practice test attempts completed for Resistance, with n = 70 taking no practice tests, n = 80 completing 1 practice test, n = 53 completing 2 to 3 practice tests and n = 4 completing 4 practice tests, with mean criterion test scores ranging from 69 to 86%. A One-Way ANOVA with the dependent variable, the performance in the Ohms Law topic in the criterion test, with number of attempts on the practice tests as the factor found no significant difference in performance, with p = .308, F = 1.185, with n = 116 completing 1 practice test (M = 84.31, SD = 16.536), n = 81 completing 2 to 3 practice tests (M = 87.78, SD = 14.491) and n = 10 completing 4 practice tests (M = 87.00, SD = 16.364). A One-Way ANOVA with the dependent variable, the performance in the Resistance topic in the criterion test, with number of attempts on the practice tests as the factor found a significant improvement in performance with p = .002, F = 5.156 between groups, with n = 70 taking no practice tests (M = 69.05, SD = 18.134), n = 80 completing 1 practice test (M = 74.31, SD = 17.103), n = 53 completing 2 to 3 practice tests (M = 80.29, SD = 14.721) and n = 4 completing 4 practice tests (M = 86.11, SD = 13.981), as illustrated in Fig. 9. Overall performance in the practice tests also had a significant impact on the criterion test result. Participants who had a maximum score in their practice tests of 69% or less, n = 50, had a mean performance in the criterion test topic of 70, while participants who scored 70% or greater, n = 87, had a mean performance of 80.97 in the criterion test topic. A One-Way ANOVA with the dependent variable, the performance in the Resistance topic in the criterion test, with performance on the practice tests as the factor found a significant improvement in performance with p < .001, F = 15.810 between groups. A medium effect was observed with Cohen's d = 0.69.

Fig. 8. Performance in topics with and without practice tests in response to question 10.


Fig. 9. Number of practice tests and performance in Ohms Law and Resistance.

4.4 Retention Interval

The retention interval is the time lag between the last retrieval practice and the criterion test. For Ohms Law, with participants n = 207, n = 21 completed their last practice test on the day of the criterion test (M = 81.43, SD = 3.981); n = 107 between 1 and 10 days before (M = 88.50, SD = 13.374); n = 41 between 11 and 20 days before (M = 85.85, SD = 14.996); and n = 38, 21 or more days before (M = 80.53, SD = 19.721). A One-Way ANOVA with the dependent variable, the performance in the Ohms Law topic in the criterion test, with retention interval as the factor found a significant improvement in performance with p = .028, F = 3.095 between groups. A medium effect was observed with Cohen's d = 0.72 between the 0-day and the 1-to-10-day retention intervals. For Resistance, with participants n = 137, n = 9 completed their last practice test on the day of the criterion test (M = 74.07, SD = 16.667); n = 83 between 1 and 10 days before (M = 78.58, SD = 14.499); n = 29 between 11 and 20 days before (M = 76.25, SD = 17.499); and n = 16, 21 or more days before (M = 71.53, SD = 15.164). A small effect was observed with Cohen's d = 0.27 between the 0-day and the 1-to-10-day retention intervals (Fig. 10).


Fig. 10. Retention Interval in Days for Ohms Law and Resistance.

4.5 Distributed Practice

Of the n = 207 participants, n = 116 completed one practice test for Ohms Law (M = 84.31, SD = 16.536). The remaining participants distributed their practice, with n = 46 over 1 to 10 days (M = 87.61, SD = 14.634), n = 32 over 11 to 20 days (M = 90.94, SD = 12.791) and n = 13 over 21 days (M = 80, SD = 16.833). A small effect size was observed with Cohen's d = 0.45 between taking a single practice test and distributing practice over 11 to 20 days for Ohms Law. Of the n = 137 participants, n = 80 completed one practice test for Resistance (M = 74.31, SD = 17.103). The remaining participants distributed their practice, with n = 31 over 1 to 10 days (M = 79.57, SD = 15.739), n = 18 over 11 to 20 days (M = 83.95, SD = 12.767) and n = 8 over 21 days (M = 77.78, SD = 14.548). A medium effect size was observed with Cohen's d = 0.64 between taking a single practice test and distributing practice over 11 to 20 days for Resistance (Fig. 11).
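The retention interval and distribution-of-practice measures in Sects. 4.4 and 4.5 can be derived from attempt timestamps; the sketch below shows one plausible way to bin them, using invented dates rather than the study's Moodle logs.

```python
from datetime import date

def bin_days(days):
    """Bin a gap in days into the categories used in Sects. 4.4 and 4.5."""
    if days == 0:
        return "same day"
    if days <= 10:
        return "1-10 days"
    if days <= 20:
        return "11-20 days"
    return "21+ days"

# Hypothetical practice-test attempt dates for one apprentice, plus the criterion test date.
attempts = [date(2018, 1, 15), date(2018, 1, 22), date(2018, 2, 2)]
criterion_test = date(2018, 2, 14)

retention_interval = (criterion_test - max(attempts)).days     # last attempt -> criterion test
practice_span = (max(attempts) - min(attempts)).days           # first attempt -> last attempt
print(bin_days(retention_interval), bin_days(practice_span))   # -> 11-20 days 11-20 days
```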


Fig. 11. Distributed Practice in Days for Ohms Law and Resistance.

5 Discussion

This paper investigated e-assessment practice testing within the Practice Testing Learning Framework (PTLF) in electrical science and how participants employed the practice tests to support their learning. A discussion follows on the limitations of the experimental design, participant engagement in the practice tests and reflection on previous findings.

5.1 Limitations of the Experimental Design

The study has several limitations, as participants (1) were not randomly assigned to treatment and non-treatment conditions, (2) pretesting was not conducted prior to engagement in practice tests, (3) additional activities were not controlled, (4) the practice tests were provided as optional and (5) the practice test parameters reported were not assigned but observed as they occurred in the classroom. While it may be argued that the participants who availed of practice tests were more motivated or higher performing and that other uncontrolled factors such as self-testing influenced the enhanced performance, the within-group design of comparing performance against similar topics where no practice tests were provided does offset some of these concerns. Eustace and Pathak (2019) adopted a within-group design and demonstrated findings of significance, with the paired-sample t-test showing that learning was enhanced for participants who engaged in the practice tests in the treatment topics but not in the non-treatment topics. These findings suggest that the practice testing treatment condition had a significant impact on learning and that the within-group design is valid for the classroom experiment [23]. Preserving the ecological validity of the classroom setting and opting not to balance the time spent on quizzes with filler activities [64] is more indicative of common practice in the classroom than having a restudy condition [65].

5.2 R-SPQ-2F

The R-SPQ-2F questionnaire was used to identify the deep and surface study approaches of participants. The R-SPQ-2F instrument scales showed high internal consistency for both the five-point scale and the reduced three-point scale, with Cronbach's Alpha coefficients for the latter of Deep Approach (.867) and Surface Approach (.831). The instrument was useful for individual item analysis, as several questions were directly applicable to the retrieval practice treatment, including questions 3, 8, 10 and 20. Three of these questions were on the surface approach scale, with question 10 on the deep approach scale. This study explored several retrieval practice parameters, with responses to the R-SPQ-2F as a factor, and their impact on electrical science learning. Analysis of the responses to question 3, "My aim is to pass the course while doing as little as possible", indicating a surface approach, found a significant difference between groups, and a large effect was observed with Cohen's d = 1.02 for Resistance. There was no finding of significance for Ohms Law, the less difficult of the two topics, although a medium effect size was observed with Cohen's d = 0.55. Taking 2 to 3 practice tests for Resistance had a significant impact on learning, with participants benefitting irrespective of indicating deep or surface approaches. However, respondents who reported a surface approach needed two to three practice tests to pass the topic. A One-Way ANOVA with the dependent variable, the performance in the Resistance topic in the criterion test, with number of practice tests and "True of me" in response to question 3 as the factor found a significant difference with p = .048, F = 5.938 between groups. Analysis of the participant responses to question 8, "I learn some things by rote, going over them until I know them by heart even if I do not understand them", indicating a surface approach, found a significant difference between groups for Resistance. A large effect size was observed for participants who indicated a surface approach and availed of 2 to 3 practice tests compared to those who did not take practice tests. Participants who indicated a deep approach performed well without practice tests, and a medium effect size was observed for those who availed of 2 to 3 practice tests. Analysis of the participant responses to question 10, "I test myself on important topics until I understand them completely", indicating a deep approach, found a significant difference between groups for Resistance. A large effect size was observed for participants who indicated a surface approach and availed of 2 to 3 practice tests compared to those who did not take practice tests. Participants who indicated a deep approach performed well without practice tests, and a small effect size was observed for those who availed of 2 to 3 practice tests. Analysis of the participant responses to question 20, "I find the best way to pass tests is to try to remember answers to likely questions", indicating a surface approach, found a significant difference between groups for Resistance. A large effect size was observed for participants who indicated a surface approach and availed of 2 to 3 practice tests compared to those who did not take practice tests. Participants who indicated a deep approach performed well without practice tests, and a large effect size was also observed for those who availed of 2 to 3 practice tests.

5.3 Complexity of Learning Materials

Many studies involving retrieval practice have been conducted in laboratory settings, and while some progress has been made in applying the research in classroom contexts using educationally relevant materials, there are concerns that the testing effect decreases or disappears as the complexity of authentic learning materials increases, referred to as 'element interactivity' [66]. Research in classroom contexts has tended to concentrate on test items requiring the retrieval of facts [32, 67, 68] and has tended to focus on recall, with identical or similar test items in both practice and criterion tests. In a number of studies the final test material is identical to the practice test [32, 69], or consists of modified or rephrased versions of the same test [8]. The robustness of the testing effect is evident in situations where criterion tests are identical to practice tests [8, 32, 69] or very similar to the practice tests [70]. This study and earlier studies in electrical science in classroom contexts [22, 23] provide evidence of a testing effect where learning transfer is required, in topically related information and problem-solving in Ohms Law and Resistance.

5.4 Practice Test Parameters and Engagement

The within-group analysis using a paired-samples test found a significant difference in performance for those who participated in practice tests compared with their performance in topics that did not have practice tests, and overall performance in practice tests also had a significant impact on learning, which is consistent with earlier findings [22]. The number of practice tests was limited in this study to 4 attempts; all learners participated in the Ohms Law practice tests, and no significant difference was evident between 1, 2 to 3 and 4 attempts. In Eustace and Pathak (2019), learners could avail of up to 9 attempts, n = 44 did not participate in practice tests for Ohms Law, and a significant difference was observed with the number of practice tests for all treatment topics. The relative difficulty of the electrical science topics positioned Power and Energy, Installation Testing and Resistance as the more difficult topics, in that order. Ohms Law is generally not as difficult a topic for learners, as the no-practice-test group passed the topic, while performance was enhanced by practice testing [23]. A small to medium effect size was observed when participants left a 1-to-10-day time lag between their last practice test and the criterion test. Taking a practice test on the day of the criterion test is not recommended. Distributed practice over 11 to 20 days is recommended, as small to medium effect sizes were also observed for participants who distributed their practice over 11 to 20 days compared with a single practice test.

6 Conclusion and Future Work

Retrieval practice within the PTLF enhances performance in electrical science and benefits learning in topics requiring problem solving. The application of the PTLF in mathematics [25], statistics [24] and electrical science [22, 23] demonstrates its use in STEM domains in traditional and flipped classrooms. The R-SPQ-2F provides a useful tool for teachers in evaluating teaching and learning initiatives. Reflecting on the research questions within the 'noisy' classroom, student learning approach does impact learning outcomes, particularly when a surface approach is adopted; however, taking 2 to 3 practice tests, spacing the practice tests over 11 to 20 days and leaving a retention interval of a number of days prior to the criterion test enhances learning and is recommended for electrical science. Future work will expand the PTLF to include more topics within electrical science, explore the effect of feedback types and examine whether informing participants of the benefits of retrieval practice enhances self-regulation.

References 1. Dunlosky, J., Rawson, K.A., Marsh, E.J., et al.: Improving students’ learning with effective learning techniques: promising directions from cognitive and educational psychology. Psychol Sci Public Interes 14, 4–58 (2013). https://doi.org/10.1177/1529100612453266 2. Karpicke, J.D.: Retrieval-Based Learning : A Decade of Progress, Third edn. Elsevier (2017) 3. Roediger, H.L.I., Karpicke, J.D.: The power of testing memory basic research and implications for educational practice. Perspect. Psychol. Sci. 1, 181–210 (2006) 4. Abbott, E.E.: On the analysis of the factor of recall in the learning process. Psychol. Rev. Monogr. Suppl. 11, 159–177 (1909). https://doi.org/10.1037/h0093018 5. Pan, S.C., Agarwal, P.K.: Retrieval practice and transfer of learning. Fostering Students’ application of knowledge, vol. 12 (2018) 6. Agarwal, P.K., Roediger, H.L.I., McDaniel, M.A., McDermott, K.B.: How to use retrieval practice to improve learning, vol. 12 (2018) 7. Roediger, H.L.I., Karpicke, J.D.: Test-enhanced learning: taking memory tests improves longterm retention. Psychol. Sci. 17, 249–255 (2006) 8. Roediger, H.L.I., Agarwal, P.K., McDaniel, M.A., McDermott, K.B.: Test-enhanced learning in the classroom: long-term improvements from quizzing. J. Exp. Psychol. Appl. 17, 382–395 (2011) 9. Morris, C.D., Bransford, J., Franks, J.J.: Levels of processing versus transfer appropriate processing. J. Verbal Learn. Verbal Behav. 16, 519–533 (1977) 10. Slamecka, N.J., Katsaiti, L.T.: Normal forgetting of verbal lists as a function of prior testing. J. Exp. Psychol. Learn. Mem. Cogn. 14, 716–727 (1988) 11. Carpenter, S.K.: Cue strength as a moderator of the testing effect: the benefits of elaborative retrieval. J. Exp. Psychol. Learn. Mem. Cogn. 35, 1563–1569 (2009) 12. Carpenter, S.K.: Semantic information activated during retrieval contributes to later retention: support for the mediator effectiveness hypothesis of the testing effect. J. Exp. Psychol. Learn. Mem. Cogn. 37, 1547–1552 (2011). https://doi.org/10.1037/a0024140 13. Carpenter, S.K., DeLosh, E.L.: Impoverished cue support enhances subsequent retention: support for the elaborative retrieval explanation of the testing effect. Mem. Cogn. 34, 268–276 (2006). https://doi.org/10.3758/BF03193405 14. Bjork, R.A.: Memory and metamemory considerations in the training of human beings. In: Metcalfe, J., Shimamura, A. (eds.) Metacognition: Knowing about Knowing, pp. 185–205. MIT Press, Cambridge (1994) 15. Bjork, R.A., Dunlosky, J., Kornell, N.: Self-regulated learning: beliefs, techniques, and illusions. SSRN (2013). https://doi.org/10.1146/annurev-psych-113011-143823 16. Karpicke, J.D., Lehman, M., Aue, W.R.: Retrieval-based learning. An episodic context account (2014)


17. Lehman, M., Smith, M.A., Karpicke, J.D.: Toward an episodic context account of retrievalbased learning: dissociating retrieval practice and elaboration. J. Exp. Psychol. Learn. Mem. Cogn. 40, 1787–1794 (2014). https://doi.org/10.1037/xlm0000012 18. Pan, S.C., Rickard, T.C.: Transfer of test-enhanced learning: meta-analytic review and synthesis. Psychol. Bull. 144, 710–756 (2018). https://doi.org/10.1037/bul0000151 19. Roediger, H.L.I., Putnam, A.L., Smith, M.A.: Ten benefits of testing and their applications to educational practice. In: Mestre, J.P., Ross, B.H. (eds.) The Psychology of Learning and Motivation: Cognition in Education, vol. 55, pp 1–36. Elsevier Academic Press, San Diego (2011) 20. Karpicke, J.D., Roediger, H.L.I.: The critical importance of retrieval for learning. Science 319(5865), 966–968 (2008) 21. Roediger, H.L.I., Butler, A.C.: The critical role of retrieval practice in long-term retention. Trends Cogn. Sci. 15, 20–27 (2011) 22. Eustace, J., Pathak, P.: Enhancing electrical science learning within a novel practice testing learning framework. In: 2018 IEEE Frontiers in Education Conference (FIE), pp 1–8. IEEE (2018) 23. Eustace, J., Pathak, P.: Retrieval practice, enhancing learning in electrical science. In: Proceedings of the 11th International Conference on Computer Supported Education (CSEDU 2019), pp 262–270, Heraklion, Crete. SCITEPRESS (2019) 24. Eustace, J., Pathak, P.: Enhancing statistics learning within a practice testing learning framework. In: ICERI2017, pp 1128–1136. Seville (2017) 25. Eustace, J., Bradford, M., Pathak, P.: A practice testing learning framework to enhance transfer in mathematics. In: Muntean, C., Hofmann, M. (eds.) The 14th Information Technology and Telecommunications Conference, pp 88–95 (2015) 26. Butler, A.C., Marsh, E.J., Slavinsky, J.P., Baraniuk, R.G.: Integrating cognitive science and technology improves learning in a stem classroom. Educ. Psychol. Rev. 26(2), 331–340 (2014). https://doi.org/10.1007/s10648-014-9256-4 27. McDaniel, M.A., Thomas, R.C., Agarwal, P.K., et al.: Quizzing in middle-school science: successful transfer performance on classroom exams. Appl. Cogn. Psychol. 27, 360–372 (2013). https://doi.org/10.1002/acp.2914 28. Tse, C., Pu, X.: The effectiveness of test-enhanced learning depends on trait test anxiety and working-memory capacity. J. Exp. Psychol. Appl. 18, 253–264 (2012) 29. Ariel, R., Karpicke, J.D.: Improving self-regulated learning with a retrieval practice intervention. J. Exp. Psychol. Appl. 24, 43–56 (2018). https://doi.org/10.1037/xap0000133 30. Rawson, K.A., Dunlosky, J.: Optimizing schedules of retrieval practice for durable and efficient learning: how much is enough? J. Exp. Psychol. Gen. 140, 283–302 (2011) 31. Sobel, H.S., Cepeda, N.J., Kapler, I.V.: Spacing effects in real-world classroom vocabulary learning. Appl. Cogn. Psychol. 25, 763–767 (2011) 32. Carpenter, S.K., Pashler, H., Cepeda, N.J.: Using tests to enhance 8th grade students’ retention of US history facts. Appl Cogn 23, 760–771 (2009) 33. Cepeda, N.J., Pashler, H., Vul, E., et al.: Distributed practice in verbal recall tasks: a review and quantitative synthesis. Psychol. Bull. 132, 354–380 (2006) 34. Cepeda, N.J., Vul, E., Rohrer, D., et al.: Spacing effects in learning: a temporal ridgeline of optimal retention. Psychol. Sci. 19, 1095–1102 (2008) 35. 
Jensen, J.L., McDaniel, M.A., Woodard, S.M., Kummer, T.A.: Teaching to the test or testing to teach: exams requiring higher order thinking skills encourage greater conceptual understanding. Educ. Psychol. Rev. 26(2), 307–329 (2014) 36. Wooldridge, C.L., Bugg, J.M., McDaniel, M.A., Liu, Y.: The testing effect with authentic educational materials: a cautionary note. J. Appl. Res. Mem. Cogn. 3, 214–221 (2014). https:// doi.org/10.1016/j.jarmac.2014.07.001


37. Pyc, M.A., Rawson, K.A.: Testing the retrieval effort hypothesis: does greater difficulty correctly recalling information lead to higher levels of memory? J. Mem. Lang. 60, 437–447 (2009) 38. Laurillard, D.: Rethinking university teaching: a conversational framework for the effective use of learning technologies (2002) 39. Merrill, M.D.: First principles of instruction. Educ. Technol. Res. Dev. 50, 43–59 (2002) 40. Laurillard, D.: The pedagogical challenges to collaborative technologies. Int. J. Comput. Collab. Learn. 4, 5–20 (2009) 41. Pachler, N., Daly, C., Mor, Y., Mellar, H.: Formative e-assessment: practitioner cases. Learn. Comput. Educ. 54, 715–721 (2010). https://doi.org/10.1016/j.compedu.2009.09.032 42. Neo, M., Neo, K.T.-K., Lim, S.T.: Designing a web-based multimedia learning environment with laurillard’s conversational framework: an investigation on instructional relationships. TOJET Turk. Online J. Educ. Technol. 12, 39–50 (2013) 43. van Gog, T., Kester, L.: A test of the testing effect: acquiring problem-solving skills from worked examples. Cogn. Sci. 36, 1532–1541 (2012) 44. Sweller, J., van Merriënboer, J.J.G., Paas, F.: Cognitive architecture and instructional design. Educ. Psychol. Rev. 10, 251–296 (1998) 45. Nelson, T.O.: Metamemory: a theoretical framework and new findings. Psychol. Learn. Motiv.-Adv. Res. Theory 26, 125–173 (1990). https://doi.org/10.1016/S0079-7421(08)600 53-5 46. Dunlosky, J., Serra, M., Baker, J.: Metamemory applied. In: Durso, F.T., Gronlund, S.D., Nickerson, R., et al. (eds.) Handbook of Applied Cognition, 2nd edn, pp. 137–159. Wiley, New York (2007) 47. Raaijmakers, S.F., Baars, M., Paas, F., et al.: Training self-assessment and task-selection skills to foster self-regulated learning: do trained skills transfer across domains? Appl. Cogn. Psychol. 32, 270–277 (2018). https://doi.org/10.1002/acp.3392 48. Kostons, D., van Gog, T., Paas, F.: Training self-assessment and task-selection skills: a cognitive approach to improving self-regulated learning. Learn. Instr. 22, 121–132 (2012). https:// doi.org/10.1016/j.learninstruc.2011.08.004 49. Marton, F., Säljö, R.: On qualitative differences in learning. Br. J. Educ. Psychol. 46(1), 4–11 (1976) 50. Biggs, J.B.: Student approaches to learning and studying. Research Monograph (1987) 51. Zhang, L.F.: University students’ learning approaches in three cultures: an investigation of Biggs’s 3P model. J. Psychol. Interdiscip. Appl. 134, 37–55 (2000). https://doi.org/10.1080/ 00223980009600847 52. Laurillard, D.: Styles and approaches in problem-solving. In: Marton, F., Hounsell, D., Entwistle, N. (eds.) The Experience of Learning: Implications for teaching and studying in higher education, 3rd edn. pp. 126–144 (2005) 53. Zimmerman, B.J.: Becoming a self-regulated learner: an overview. Theory Pract. 41, 64 (2002) 54. Thiede, K., Wiley, J., Griffin, T.: Test expectancy affects metacomprehension accuracy. Br. J. Educ. 81, 264–273 (2011) 55. Agarwal, P.K., D’Antonio, L., Roediger, H.L.I., et al.: Classroom-based programs of retrieval practice reduce middle school and high school students’ test anxiety. J. Appl. Res. Mem. Cogn. (2014). https://doi.org/10.1016/j.jarmac.2014.07.002 56. Smith, M.A., Roediger, H.L.I., Karpicke, J.D.: Covert retrieval practice benefits retention as much as overt retrieval practice. J. Exp. Psychol. Learn. Mem. Cogn. 39, 1712–1725 (2013) 57. Johnson, C.I., Mayer, R.E.: A testing effect with multimedia learning. J. Educ. Psychol. 101, 621–629 (2009) 58. 
Biggs, J.B.: Study Process questionnaire manual. Student Approaches to Learning and Studying (1987)


59. Biggs, J., Kember, D., Leung, D.: The revised two-factor study process questionnaire: R-SPQ2F. Br. J. Educ. Psychol. 71, 133–149 (2001). https://doi.org/10.1348/000709901158433 60. Martinelli, V., Raykov, M.: Evaluation of the revised two-factor study process questionnaire (R-SPQ-2F) for student teacher approaches to learning. J. Educ. Soc. Res. 7, 9–13 (2017). https://doi.org/10.5901/jesr.2017.v7n2p9 61. Moore, L.A.: The relationship between approaches to learning and assessment outcomes in undergraduate optometry students. Technological University Dublin (2015) 62. Justicia, F., Pichardo, M.C., Cano, F., et al.: The revised two-factor study process questionnaire (R-SPQ-2F): exploratory and confirmatory factor analyses at item level. Eur. J. Psychol. Educ. 23, 355–372 (2008). https://doi.org/10.1007/BF03173004 63. Shaw, M., Bednall, D., Hall, J.: A proposal for a comprehensive response-rate measure (CRRM) for survey research. J. Mark. Manag. 18, 533–554 (2002) 64. Batsell, W.R., Perry, J.L., Hanley, E., Hostetter, A.B.: Ecological validity of the testing effect: the use of daily quizzes in introductory psychology. Teach. Psychol. 44, 18–23 (2017). https:// doi.org/10.1177/0098628316677492 65. Butler, A.C., Roediger, H.L.: Testing improves long-term retention in a simulated classroom setting. Eur. J. Cogn. Psychol. 19, 514–527 (2007). https://doi.org/10.1080/095414407013 26097 66. van Gog, T., Sweller, J.: Not new, but nearly forgotten: the testing effect decreases or even disappears as the complexity of learning materials increases. Educ. Psychol. Rev. 27(2), 247–264 (2015). https://doi.org/10.1007/s10648-015-9310-x 67. Roediger, H.L.I., Marsh, E.J.: The positive and negative consequences of multiple-choice testing. J. Exp. Psychol. Learn. Mem. Cogn. 31, 1155–1159 (2005) 68. McDermott, K.B., Agarwal, P.K., D’Antonio, L., et al.: Both multiple-choice and short-answer quizzes enhance later exam performance in middle and high school classes. J. Exp. Psychol. Appl. 20, 3–21 (2014). https://doi.org/10.1037/xap0000004 69. McDaniel, M.A., Agarwal, P.K., Huelser, B.J., et al.: Test-enhanced learning in a middle school science classroom: the effects of quiz frequency and placement. J. Educ. Psychol. 103, 399–414 (2011) 70. Kang, S.H.K., McDermott, K.B., Roediger, H.L.I.: Test format and corrective feedback modify the effect of testing on long-term retention. Eur. J. Cogn. Psychol. 19, 528–558 (2007)

A Web Platform to Foster and Assess Tonal Harmony Awareness

Federico Avanzini1, Adriano Baratè1, Luca A. Ludovico1(B), and Marcella Mandanici2

1 LIM – Laboratorio di Informatica Musicale, Dipartimento di Informatica "Giovanni Degli Antoni", Università degli Studi di Milano, Via Giovanni Celoria 18, Milan, Italy
{federico.avanzini,adriano.barate,luca.ludovico}@unimi.it
2 Conservatorio di Musica "Luca Marenzio", Piazza Arturo Benedetti Michelangeli 1, Brescia, Italy
[email protected]
https://www.lim.di.unimi.it, https://www.consbs.it

Abstract. This paper investigates the use of computer-based technologies applied to early learning of tonal music harmony, a topic often considered too abstract and difficult for young students or amateurs. A web-based platform is described, aimed at fostering and assessing harmonic awareness in children by leveraging on chord perception, gestural interaction and gamification techniques. The application guides young learners through 3 experiences towards the discovery of important features of tonal harmony, where they can listen to melodies or chunks of well-known music pieces and associate chords to them. Users’ actions during the experiences are recorded and analyzed. An early experimentation with 45 school teachers was conducted with the goal of assessing the usability of the application, the level of acceptance by teachers, and prototypical behaviors during the experiences. The results provide guidelines on how to evaluate user performances, as well as useful indications for further development of the platform. Keywords: Music learning · Computer-supported education Tonal harmony · Web · Automatic assessment

1 Introduction

Music technology is playing an increasingly important role in educational activities as well as in the dissemination of culture and musical practices. This is due not only to the impact of both national [27,48] and European [15] government policies aimed at supporting innovation in schools, but also to the growing availability of electronic devices (electronic keyboards, computers, digital audio workstations, etc.) both in school laboratories and at home. Moreover, portable devices such as tablets and smart phones are endowed with a great number


of applications for creating (e.g., GarageBand1 , Songify2 , Animoog3 ), listening (e.g., Spotify4 , Pandora5 , iHeartRadio6 ) and learning music (e.g., Simply Piano7 , Yousician8 , and Uberchord9 ), thus allowing educational training also outside a traditional school environment. This outstanding development implies not only an increase of technical knowledge for the management of electronic devices, but also a shift in the conception of musical teaching and, ultimately, of music itself. The problem of integrating technology effectively in the classroom has been addressed by Mishra & Koehler by introducing the “Technological Pedagogical Content Knowledge” framework (TPACK) [26]. According to TPACK, teaching with technology does not mean simply to embed new devices in classroom practices, but rather to take into account the complex interrelationships that connect at least three knowledge domains: content, pedagogy and technology. Thus, the teacher must be able to mediate between her domain expertise and the technological representation of the concepts; she must employ pedagogical techniques to derive all the advantages offered by the new educational means and to provide adequate monitoring of students advances. The introduction of technologies in the instructional design implies also the involvement of new learning styles and situations. Formal learning – where the activity is sequenced beforehand and has a predefined aim – and informal learning – where the activity is not sequenced beforehand and is steered by the development of the work – are the two cornerstones of a continuum along which various mix ups of both teaching approaches can be placed [16]. In this extremely varied context, constructivism, i.e. the theory of learning–by–doing, finds its full realization. Originating from the thought of Piaget [31] first and then of Papert [29], constructivism underlies many music education experiences which draw greatly from advances in music technology [47]. Consequently, other important aspects of teaching are affected by these changes such as didactic planning and management of the class group. In this regard, the blended learning approach that sees the coexistence of classroom and online activities offers some interesting outcomes, because it allows the possibility of designing student-tailored learning approaches, flexibility and the freedom to learn anytime and anywhere, social engagement and experience sharing with teachers, class mates and internet communities [11]. Digital technology not only allows to reach new musical domains but also changes the way knowledge is communicated [46]. According to Brown [5, pp. 6– 12], computers in music education can be profitably used to amplify musicality, 1 2 3 4 5 6 7 8 9

https://apps.apple.com/us/app/garageband/id408709785. https://songify.en.uptodown.com. https://www.moogmusic.com/products/animoog. https://www.spotify.com. https://www.pandora.com. http://www.iheartradio.ca. https://www.joytunes.com. https://yousician.com. https://www.uberchord.com.


by acting as a simple tool, as a medium, and as a musical instrument. In the first case, traditional tasks such as writing and playing music back can be performed more accurately and in a shorter time, but they remain essentially the same as with paper, pencil and piano. In the second case, the transformative power of computers produces a significant shift in the very nature of the matter treated [3]. For example, the editing tools of a digital engraving software may be used to copy, paste and transpose sections of the musical work. Finally, computers can be profitably used as musical instruments, too. For instance, tunes can be programmed in terms of MIDI messages, musical textures can be altered through timbral changes and DSP effects, and a number of devices (e.g., mouse, touchscreen, webcam, tangibles, attached sensors) can further enrich the expressive possibilities. Such dramatic changes envision contexts where musicality is not only expanded thanks to technology but begins to be inextricably connected with it. Creative activities and artistic practices such as designing performance environments or synthesizing new sounds depend strictly on computers, technological devices and musicians’ programming skills. But amplifying musicality means not only providing new fields for music creation but also expanding music curricula [41] and experimenting how to extend the limits of knowledge of musical structure and theory in a new and unprecedented way. This is the aim of Harmonic Touch, a web platform for the study and practice of tonal harmony which leverages on harmonic perception, embodied cognition and gamification in order to introduce primary and middle school students to a set of experiences focused on harmonic skills and awareness. Finally, a key problem connected to the use of technology in music education is assessment. In general, technology-based assessment is considered more reliable and efficient in collecting and recording data. It allows analysis and provides rapid feedback for participants and stakeholders [13]. In the musical domain, automatic quantitative assessment of music performance can provide a sound and objective feedback to students, particularly when the supervision of a teacher is not available [44]. For teachers, automated performance assessment is useful to collect large amount of quantitative data, also providing metrics to assign student grades [38]. Automatic assessment of users performance is also a necessary element in games for music education where scoring systems may track their progress and provide feedback for self-assessment and engagement [40]. Focusing on the specific goal of fostering tonal harmony awareness in young students, the remainder of this work is structured so as to reflect this multifaceted vision: Sect. 2 will introduce the musical background which can serve as a reference point for the following discussion; Sect. 3 will provide details about Harmonic Touch, a web platform that proposes three exercise models to gain tonal harmony awareness; Sect. 4 will analyze experimental results; finally, Sect. 5 will draw conclusions. This paper is an extension of the work presented at the 11th International Conference on Computer Supported Education (CSEDU 2019) [25].


Fig. 1. The spatial arrangement of primary and parallel chords and three common chord progressions [25].

2 Fundamentals of Tonal Harmony

Tonal harmony has been systematically defined by Rameau in 1722 [32] and, since then, it has been adopted in many musical styles, from Baroque to contemporary popular music [6,21]. As we live in a world where tonal music is pervasive, we have been subjected since our childhood to music stimuli which solicit the brain to build an internal understanding of musical structures according to their tonal function. The process of obtaining complex information from the environment regardless of our awareness is called implicit learning; this is the mechanism that supervises, e.g., the learning of language [33]. Many studies show that preschool children have implicit knowledge of tonal harmony: they can recognize the best target chord in a tonal context [35] or identify a deviant musical chord in a tune accompaniment [9]. In the remainder of this section, we will introduce the most relevant aspects to take into account in order to implement an educational activity aiming at the development of harmony awareness.

2.1 Representation of the Harmonic Space

The first step of the process is to find a suitable representation of the harmonic space, limiting the chords to be represented to a meaningful subset. Following Riemann’s theory of the tonal functions of chords [34], we divide the harmonic space of a given key into primary and parallel chords [25]. Each chord is constituted by three notes, all belonging to the natural grades of a major scale. The second note of the chord is one third above the root note, and the third note of the chord is a fifth above. In Riemann’s theory, there are three primary chords, called tonic (T ), subdominant (SD), and dominant (D): they correspond to the major triads built


on the first, fourth, and fifth grade of a major scale, respectively. In addition, there are three parallel chords, called parallel tonic (Tp), parallel subdominant (SDp), and parallel dominant (Dp). They are located a minor third below the corresponding primary chords; consequently, they are the minor triads built on the sixth, second, and third grade of a major scale, respectively. Chords can be placed along a circle with the primary chords in the lower part and the parallel chords in the upper part, just above their relatives (see Fig. 1). This abstract scheme, even if highly simplified with respect to all the possibilities offered by tonal harmony, can fit a number of popular songs as well as classic music harmonization patterns, which can be a good starting point for understanding harmonic functions. This spatial arrangement enhances the perceptual differences between the primary-chord zone (all major chords) and the parallel-chord zone (all minor chords), making it easier to associate the sound of chords with their position. This spatial representation of chords allows an easy navigation of the harmonic space, and users can discover and easily remember the routes of harmonic progressions, as shown by the colored paths in Fig. 1.
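As a concrete illustration of the construction rule just described, the following Python sketch (illustrative only, not part of the platform; all names are ours) derives the six functional chords of a major key as pitch-class triads: the primary chords on the first, fourth, and fifth grades of the scale and their parallels on the sixth, second, and third grades.

# Minimal sketch (not platform code): the six chords of Riemann's functional
# harmony for a major key, built as triads on the grades of the major scale.

MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]   # semitone offsets of the major scale
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def major_scale(tonic_pc):
    """Pitch classes of the major scale built on the given tonic."""
    return [(tonic_pc + step) % 12 for step in MAJOR_SCALE_STEPS]

def triad_on_grade(tonic_pc, grade):
    """Triad (root, third, fifth) obtained by stacking scale thirds on a 1-based grade."""
    s = major_scale(tonic_pc)
    return tuple(s[(grade - 1 + i) % 7] for i in (0, 2, 4))

# Primary chords on grades 1, 4, 5; parallel chords a minor third below, i.e. on grades 6, 2, 3.
FUNCTIONS = {"T": 1, "SD": 4, "D": 5, "Tp": 6, "SDp": 2, "Dp": 3}

def harmonic_space(tonic_name="C"):
    tonic_pc = NOTE_NAMES.index(tonic_name)
    return {label: triad_on_grade(tonic_pc, grade) for label, grade in FUNCTIONS.items()}

if __name__ == "__main__":
    for label, triad in harmonic_space("C").items():
        print(label, [NOTE_NAMES[pc] for pc in triad])
    # T: C E G, SD: F A C, D: G B D (major); Tp: A C E, SDp: D F A, Dp: E G B (minor)

Arranging the six labels on a circle, with the primary chords in the lower half and the parallels just above their relatives, reproduces the layout of Fig. 1.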

2.2 Melody Harmonization

Melody harmonization represents one of the common tasks for musicians and amateurs. The cognitive aspects of melody harmonization can be linked to the recognition of implied harmony and the detection of the harmonic rhythm. Concerning the former issue, the perception of implied harmony determines the detection of a best fitting chord sequence to harmonize a leading voice. Considering the pitches on the main beats of a melody, implied harmony is driven by their belonging to a given chord. Also if the chord is not played or the melody does not contain all chord components, listeners equally make inferences about the implied harmony [36]. But musical chords share also common pitches among them, especially when harmonies more complex than triads are involved. As a consequence, sometimes more than one implied harmony may be perceived by listeners and, even in the context of tonal harmony, more than one chord sequence can be used for the harmonization of the same melody. Concerning harmonic rhythm, it can be defined as the time pattern formed by harmonic-changes occurrences. In the example of Fig. 2, each beat is represented by a white square with a thicker black line every four squares to indicate the musical meter. The chord symbols are positioned in the points where a harmonic change occurs, determining thus the harmonic rhythm [1, pp. 376–377]. It is important to note that harmonic rhythm does not always coincide with the beats nor with measures. Starting from the auto-accompaniment function embedded in electronic keyboards [12, p. 125], many software tools have been developed for creating automated melody accompaniment, such as Band–in–a–Box10 and Chordify11 (see Fig. 2). 10 11

https://www.pgmusic.com. https://chordify.net.


Fig. 2. A screenshot from the Chordify website showing the diagram view of “Good Riddance” (Time Of Your Life) with the various functions in the upper bar.
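To make the notion of harmonic rhythm concrete, the short sketch below extracts the pattern of harmonic-change instants from a beat-level chord annotation; the excerpt and its labels are invented for illustration and are not taken from the platform's materials.

def harmonic_rhythm(chords_per_beat):
    """Return the 0-based beat indices at which a harmonic change occurs."""
    changes = [0]  # the first chord opens the pattern
    for beat in range(1, len(chords_per_beat)):
        if chords_per_beat[beat] != chords_per_beat[beat - 1]:
            changes.append(beat)
    return changes

# Hypothetical 4/4 excerpt, one functional label per beat: as noted above, the
# change points need not coincide with every beat or with every measure.
excerpt = ["T", "T", "T", "T", "D", "D", "D", "D", "D", "D", "D", "D", "T", "T", "T", "T"]
print(harmonic_rhythm(excerpt))   # [0, 4, 12]: one change after one measure, one after two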

2.3 Key Elements of Tonal Harmony Awareness

In Sect. 2.2 we have shown how the recognition of implied harmony and of harmonic rhythm are two cognitive aspects of melody harmonization. In the limits of this contribution we now define these same features as key elements of tonal harmony awareness and describe their relationship with the more basic elements of pitch and rhythm. Furthermore, we present metrics and analytic methods for automatically measuring users competence and progress in these domains within the context of already known methodologies for music performance analysis. Traditional and contemporary assessment methods are based essentially on two domains: tonal (pitch, melody) and temporal (rhythm, rhythm to melody, accents, tempo), while more complex abilities such as harmony are supposed to build upon these basic elements [45]. Also other music aspects may be tested such as dynamics [22,37], timbre [37] and memory [30]; however, rhythm and pitch are the main pillars of music making. When combined together, they can produce complex and interesting musical patterns such as harmony, counterpoint and musical forms [20]. Following this assumption, implied harmony and harmonic rhythm can be considered organization structures at higher level with respect to the basic domains of pitch and rhythm [8,43]. Corrigall & Trainor [9] report that children develop sensitivity to key membership earlier than to harmony. This is due to the fact that key membership involves note-by-note comparison to key, whereas harmony involves more complex single note–chord comparison within the key. To extract harmonic information from melody, children need to direct attention on multiple aspect of a music excerpt (beat, pitch, melodic contour, etc.) [10].


Moreover, to decode the complex information conveyed by musical chords it is necessary to involve the capacity to simultaneously perceive numerous sounds as one whole, an ability defined as harmonic hearing by Teplov [42]. As far as concerns the perception of harmonic rhythm, a way to explain it is to refer to the hierarchical theory of musical meter by Lerdhal and Jackendoff [24] where rhythmic perception has a tree-like structure similar to that of Fig. 3. The first level reports the rhythm of pitch events, the second the tactus, the third the meter (4/4). The fourth level (harmonic rhythm) has been added to show the high level position of this feature.

Fig. 3. Four levels of rhythm perception in “Good Riddance” (Time Of Your Life).

3 The Web Platform

Building on the previous discussion, this section presents a web platform that implements a series of experiences aimed at fostering and assessing tonal harmony awareness.

3.1 Methods for the Automated Assessment of Musical Abilities

Objectively assessing musical abilities is a much studied – and controversial – problem. Musical aptitude batteries proposed in the second half of 20th century are now considered obsolete in several respects [22]. The concept of musical ability is multifaceted and includes various types of musical capacity (e.g., tempo, pitch, rhythm, timbre, melody perception) that are not easily separated. One of the domains where all these features converge is musical performance [4]. While it is relatively easy to assess users achievements in a computer assisted environment based on multiple choice tests, it is much more complex to assess a musical performance where multiple parameters must be tracked and evaluated, such as tempo and timing, dynamics, pitch and timbre [23]. Music information retrieval (MIR) offers a series of techniques useful to treat these problems, such as time-frequency representation, spectral decomposition, onset detection and note tracking [14]. Many educational systems are based on MIR methods,

Fig. 4. The circular representation of the harmonic space employed in the first and in the third experience of Harmonic Touch [25]: (a) recognition of implied harmony; (b) melody harmonization.

such as Smart Music12 which provides immediate feedback to students’ performances, Tonara13 for interactive score-following, and MySong14 for automatic accompaniment of vocal melodies. Objective descriptors of music performance may be score–dependent or score–independent. Score–dependent features represent performance accuracy with respect to pitch, dynamics and rhythm; score– independent information is extracted by aligning score data to the extracted features through dynamic time warping techniques15 [44]. As far as the study of tonal harmony is concerned, various applications have been proposed in the literature with the aim of aiding the understanding of musical chords and harmonic progressions [7,17–19]. One notable example is Mapping Tonal Harmony, a tool for visualizing the various shifts through harmonic regions in real time.16 However it must be noted that all these systems are rather complex to use and are not finalized for use in primary or middle schools. 3.2
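As an illustration of the alignment technique mentioned above, the following sketch implements the textbook dynamic-time-warping recurrence for two feature sequences; it is a generic formulation with invented data, not the code of any of the systems cited here.

def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Cumulative DTW cost of aligning sequence a with sequence b (classic O(n*m) dynamic programming)."""
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]   # cost[i][j]: best alignment of a[:i] with b[:j]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # skip in a
                                    cost[i][j - 1],      # skip in b
                                    cost[i - 1][j - 1])  # match
    return cost[n][m]

# Toy example with MIDI-like pitch values: the "performance" holds some notes across
# two analysis frames, and DTW absorbs the local tempo deviation.
score_features = [60, 62, 64, 65, 67]
performance_features = [60, 60, 62, 64, 64, 65, 67]
print(dtw_distance(score_features, performance_features))   # 0.0: every frame finds an exact match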

Harmonic Touch

Harmonic Touch 17 is a publicly-available web platform conceived as a stepby-step wizard containing self-explanatory descriptions that lead users through a series of experiences towards the ultimate goal of melody harmonization. 12 13 14 15 16 17

https://www.smartmusic.com. https://tonara.com. https://www.microsoft.com/en-us/research/project/mysong-automatic-accompani ment-vocal-melodies. Dynamic time warping is an algorithm for measuring similarity between two temporal sequences [28]. https://mdecks.com/mapharmony.phtml. http://didacta18.lim.di.unimi.it/eng/.


Fig. 5. The interface of the second experience: recognition of the harmonic rhythm [25].

Although the application is mainly concerned with theoretical music abilities rather than musical instrument performance, it shares some common traits with the works reviewed above, namely a performative dimension and a gamified approach. Skipping the abstractions traditionally used to introduce this topic (keys, scales, intervals, chords, etc.), Harmonic Touch employs the perceived qualities of musical chords to engage learners in experiences designed on the basis of the fundamental features of tonal harmony described in Sect. 2. As discussed in Sect. 4, the data resulting from these experiences are an important basis for an a posteriori analysis of learners’ performances, as they hold a significant amount of information about users perception, musical preference and attitudes. Harmonic Touch presents three web-based experiences using a device attached to the internet. These activities aim at an intuitive and direct communication, do not depend on previously acquired theoretical knowledge, and utilize a gamified approach to introduce primary and middle school students to the subject of tonal harmonic awareness. The three experiences, based on the cognitive aspects mentioned in Sect. 2, are the following: 1. Recognition of Implied Harmony – The learner is asked to match a brief music excerpt with a single chord that, in her view, best suits the whole melody. The chord is selected from the set of primary and parallel chords mentioned


above. Following the spatial schema explained in Sect. 2.1, in this experience the harmonic space is represented through the circle shown in Fig. 4(a), where the layout of chords is fixed, but the position of the tonic is randomly rotated and no chord label is visible. Understanding the relative position of chords is left to the user, who can explore them freely during the training phase. 2. Recognition of Harmonic Changes (harmonic rhythm) – In accordance with a gamification approach, the experience takes place over a sort of treasure map, or any other graphical representation of a step-by-step path. After carefully listening to a complete piece (melody and chords), the learner is asked to reconstruct it by moving one step ahead over the map whenever a new chord is expected. An example of interface is shown in Fig. 5. If the click does not occur at the right timing, music stops; if it is performed in advance, a part of the tune is skipped. 3. Melody Harmonization – This experience requires to select the right chords at the right timing in order to accompany a known music tune. Conceived as the natural evolution of the previous experiences, this one focuses on the simultaneous recognition of the best-fitting chords and the occurrence of harmonic changes. The graphical interface, shown in Fig. 4(b), recalls the circular representation of chords of the first experience, where the spatial relationship among chords is maintained; in this case, the position of the tonic is fixed and chord labels are explicitly indicated. In the current implementation, each group of experiences starts with a training phase to make the user accustomed to the interface. After the training phase, three exercises per group of experiences are proposed. The system tracks timing and sequences of mouse clicks and saves them into a database, so that single user performances and improvements can be assessed and typical cross-user behaviors can emerge.
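The sketch below illustrates how such click tracking might be represented and summarized; the record fields and helper functions are hypothetical and do not reflect the actual Harmonic Touch database schema, but the two metrics correspond to the selection time and number of explored chords analyzed in Sect. 4.

from dataclasses import dataclass
from typing import List

@dataclass
class ClickEvent:
    user_id: int
    test_id: int
    time_s: float    # seconds elapsed from the start of the test
    chord: str       # functional label, e.g. "T", "SD", "Dp"
    confirmed: bool  # True when the click confirms the current selection

def selection_time(events: List[ClickEvent]) -> float:
    """Seconds between the first click on the finally selected chord and its confirmation."""
    confirm = next(e for e in reversed(events) if e.confirmed)
    first_click = next(e for e in events if e.chord == confirm.chord)
    return confirm.time_s - first_click.time_s

def explored_chords(events: List[ClickEvent]) -> int:
    """Number of distinct chords clicked while exploring, before the confirmation."""
    return len({e.chord for e in events if not e.confirmed})

log = [
    ClickEvent(1, 2, 3.1, "D", False),
    ClickEvent(1, 2, 7.8, "T", False),
    ClickEvent(1, 2, 15.2, "SDp", False),
    ClickEvent(1, 2, 21.0, "T", False),
    ClickEvent(1, 2, 24.4, "T", True),
]
print(round(selection_time(log), 1), explored_chords(log))   # 16.6 3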

4 Assessment of Harmonic Awareness

This section presents and discusses experimental data collected through the web platform during a workshop on music education and digital languages, at the 2nd edition of Didacta Italy, Florence, October 18–20, 2018.18 The workshop aimed at involving music teachers in the use of the application, as well as analyzing teachers’ performances and behaviors in the proposed experiences. It was attended by 45 middle (57%) and primary (25%) school teachers, with mean age = 49.8 (median = 51) and mean working age = 22.7 years (median = 20 years). The complete data set of results (answers to pre-activity and post-activity surveys, and the three experiences described in Sect. 3) is publicly available.19 The remainder of the section presents a behavioral analysis of the users involved in the experiences, with the aim of characterizing prototypical behaviors in using (and possibly misusing) the application. The goal is to assess the 18 19

http://fieradidacta.indire.it/, the most important Italian fair focusing on education, vocational training and relation among school and work. http://didacta18.lim.di.unimi.it/results/index.html.


Fig. 6. Distribution of the chords selected at the end of each test.

Table 1. Minimum m, maximum M, mean μ, and standard deviation σ of selection time and number of explored chords.

         Selection time (s)                Explored chords
         m      M        μ       σ         m    M     μ      σ
Test 1   0.55   211.65   65.01   49.38     0    22    5.78   5.49
Test 2   1.25   577.71   75.18   90.82     0    27    7.66   5.82
Test 3   1.03   185.91   38.43   38.32     0    18    4.73   4.53
Test 4   2.06   174.36   43.38   38.00     0    32    7.18   7.56

effectiveness of the proposed approach, and to develop a set of guidelines and tools that can help teachers evaluate pupils based on the experimental data gathered through the platform. An additional, long-term goal is to develop a set of quantitative metrics through which tests performed by users can be given objective scores (an early attempt was made [2], limited to Experience 1).

4.1 Experience 1: Implied Harmony

Four tests were administered in this experience, where the first one served as training. The corresponding music excerpts are: “Bolero” (Maurice Ravel), 8 measures; “Brother John” (traditional), 12 measures; “Guglielmo Tell” (Gioacchino Rossini), 16 measures of the Ouverture theme; “Tourdion” (traditional), 6 measures. These share the common characteristic that the implied harmony is a single tonic chord for the entire duration of the excerpt. Figure 6 shows the distribution of chords selected by users at the end of each test. The majority of the users chose the tonic chord T for all tests, which seems to demonstrate harmonic awareness by most users. Note that the first three tests are all in major key and very popular, while Test 4 is in minor key and is a little known piece from French Renaissance. Accordingly, the distributions for the first three tests are very similar, with T chosen by about 80% of the participants, while those for Test 4 are markedly different, with T chosen by only 57% of users. Table 1 illustrates some additional data aggregated across users, regarding the “selection time” and the number of explored chords for each test. The former


Fig. 7. Performance of two users in Experience 1. Left panels depict user trajectories on the harmonic space representation, while right panels show the same trajectories as functions of time; the order in which chords were clicked is represented by progressive integers. ‘R’ and ‘X’ labels on the time axis represent instants when the user restarts the test or proceeds to the next one, respectively. Shaded areas correspond to time segments where there is actual music playback.

quantity refers to the time elapsed between the instant when the selected chord is first clicked and the instant when the user confirms its selection. These data show a tendency towards an exploratory behavior by users, who continue navigating through chords even after clicking the one that they consider correct. A comprehensive analysis of the results allowed to identify two prototypical behaviors: the “frantic explorer”, who wanders around the circle, usually following a clockwise or counterclockwise path, possibly several times; and the “self-confident user”, who stops almost immediately after choosing the expected chord. Figure 7 shows two examples of these behaviors. User 1 exhibited selfconfidence, jumped quickly to the selected chord in a few seconds and stopped. Conversely, User 23 showed a tendency to replicate an exploratory behavior over


Fig. 8. Performance of User 12 in Experience 2 (the training test is excluded). The x-axis shows the song time, while the vertical dashed lines and the adjacent progressive numbers show instants of harmonic changes. Blue dots and horizontal segments show lag and lead errors by the user with respect to correct timing. Red dots and arrows show instants when the user started a new trial for the same test.

tests and trials. No evident learning effect was noticed: some users seemed to learn to stop at the first occurrence of the tonic chord, or to minimize the number of attempts after that, but not to jump towards the final destination with confidence. This is a confirmation that this experience stimulated a playful behavior by users rather than stimulating them to complete a task.

4.2 Experience 2: Harmonic Rhythm

Four tests were administered in this experience, where the first one served as training. The corresponding music excerpts are: “La donna è mobile” (aria from


Fig. 9. Distribution of errors across all users in Experience 2.

Fig. 10. Harmonic rhythm of Tests 2 and 3, Experience 2.

Giuseppe Verdi’s “Rigoletto”), 16 measures with T-D changes; “Yellow Submarine” (The Beatles), 8 measures (refrain) with T-D changes; “Hey Jude” (The Beatles), 8 measures (first verse) with T-D-SD changes; “Il ragazzo della via Gluck” (Adriano Celentano), 18 measures (second verse) with T-D-Tp changes. Overall, the performance across users was extremely varied. Qualitative analysis of the results shows that some users performed almost perfectly: this is a cue that the task was clearly explained and that the interface was usable. On the other hand, several users performed poorly and seemed to exhibit a limited awareness of harmonic rhythm. Figure 8 shows one of the most precise performances. It is worth analyzing lag (negative) errors and lead (positive) errors separately. The error distribution across all users exhibits a clear asymmetry, with more lags than leads. Two remarks can be made. First, lags are more frequent than leads: this can be expected, because users will typically click after having perceived a harmonic change, unless they are extremely confident (e.g., they know well the harmony of the music excerpt) and are thus able to anticipate a harmonic change. Second, lag errors are quite large, and larger than lead errors. This instead is a counter-intuitive result: recall that the playback stops if no click occurs at the harmonic change. Therefore users would be expected to click as soon as the music stops, regardless of whether or not they have actually recognized a harmonic change. Based on the above remark, it can be hypothesized that small lag errors are simply due to the reaction time of the user to the music stopping. Psychophysical studies provide a lower bound for auditory reaction times, which in experimental


Fig. 11. Performance of User 5 in Experience 2 (Tests 2 and 3 only). Plots are organized as in Fig. 8.

settings can be as low as 300 ms [39]. However, it is reasonable to assume that users’ reaction times were much higher than this lower bound (e.g., between 500 ms and 1 s), as they were focusing on a different task. It can be assumed that only lag errors above a certain threshold related to the reaction time are actually significant, since they are due to the user not expecting a harmonic change and thus not being prepared to click at the right time. On the other hand, errors smaller than this threshold can be attributed to the reaction time of the user to the music stopping. Lead errors are more interesting, as they provide more insight on the actual user awareness of harmonic rhythm. Specifically, large leads are a signal of the user having misplaced a harmonic change. Examples of this type of mistakes in harmonic rhythm recognition are particularly evident in Tests 2 and 3. In both cases the harmonic rhythm has some variations, with some changes occurring after one measure and some after two measures (see Fig. 10). Several users were tricked by these structures, and had the tendency to incorrectly place all harmonic changes after one measure. Figure 11 shows a prototypical example of this behavior.
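A possible formalization of this reasoning is sketched below; the threshold is an assumed value inside the 500 ms–1 s range discussed above, not a parameter of the published analysis, and the clicks are invented.

REACTION_THRESHOLD_S = 0.8   # assumed reaction time within the range discussed above

def classify_click(click_time, expected_time, threshold=REACTION_THRESHOLD_S):
    """Classify a click against the expected harmonic change (lags negative, leads positive)."""
    error = round(expected_time - click_time, 2)
    if error > 0:
        return ("lead", error)              # anticipated, possibly misplaced, change
    if error >= -threshold:
        return ("reaction lag", error)      # plausibly just reacting to the music stopping
    return ("significant lag", error)       # change not expected by the user

# Hypothetical trial with expected changes at 4.0 s and 8.0 s along the song time.
for click, expected in [(4.5, 4.0), (6.9, 8.0), (9.8, 8.0)]:
    print(classify_click(click, expected))
# ('reaction lag', -0.5)  ('lead', 1.1)  ('significant lag', -1.8)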

4.3 Experience 3: Melody Harmonization

Four tests were administered in this experience, where the first one served as training. The corresponding music excerpts were the same as in Experience 2.


Fig. 12. Performance of three users in Experience 3 (Test 2): (a) User 2; (b) User 21; (c) User 19. The x-axis shows the song time, while the vertical dashed lines and the adjacent letters (bottom) show instants of harmonic changes and the corresponding chords. The remaining letters show chords selected by the user at a certain time instant. Red dots and arrows show instants when the user started a new trial for the same test.

This ensured that users were already familiar with the musical and harmonic materials. Users could listen to the melodic line of each excerpt and had to harmonize it. The last two tests were swapped with respect to Experience 2, thus in this experience Test 3 was “Il ragazzo della via Gluck” (Adriano Celentano) while Test 4 was “Hey Jude” (The Beatles). As explained earlier, melody harmonization requires to simultaneously recognize the occurrence of harmonic changes and the correct chord at each harmonic change, where “correct” here means the one of the original song. Other chord sequences may be found that also fit the melody. Whether to consider or not this as a mistake is debatable and may be left at the teacher’s evaluation. As a consequence, Experience 3 was expected to be the most challenging one for the users. An analysis of the number of trials performed by each user for each test confirmed this hypothesis. At several instances, users performed 20 trials or more of a single test. There was one case (User 23, Test 2) in which 48 trials were performed, for a total time of approximately 8 minutes spent on the same


Fig. 13. Performance of User 5 in Experience 3 (Test 4). Plots are organized as in Fig. 12.

test. This behavior is markedly different from that observed in the preceding experiences, and also suggests that Experience 3 was the most engaging one. Given the complexity of the task, the sources of error are more numerous and a wider spectrum of behaviors by the users was observed. Figure 12 shows relevant examples of the behavior of three different users on Test 2 (“Yellow Submarine”). The top panel (User 2) provides an example of wrong harmonization, in which the user anticipated the harmonic changes and separated them by one measure (at about 6.5 s and 11.5 s along the song time) instead of two measures: this can be explained by looking back at Fig. 10 and the related discussion. The second panel shows instead a case (User 21) in which the same chords were selected twice consecutively: this cannot be considered a mistake in the harmonization (although the temporal accuracy is low). Finally, the bottom panel shows yet another different behavior (User 19) in which the harmonic progression T-SD-D-T was used consistently and with good temporal accuracy in place of the correct progression T-D-T: this is an interesting case in which the user’s choice is plausible and compatible with respect to the melodic line. One further example, partially related to the latter one, is provided in Fig. 13, which shows the performance of one user on Test 4 (“Hey Jude”). The user consistently inserted a harmonic change after one measure (at about 7 s along the song time). This has to be considered as a mistake, however it has to be noted that a subtle change actually occurs in the song at that particular point: namely, the triad on the dominant changes into a quadriad (a “seventh” chord with an added note with respect to the triad), while remaining on the dominant. Therefore, although the harmonic function (D) is not changing, the chord sounds different. In fact, other users exhibited a similar behavior at this particular point of Test 4.
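One way these observations could feed into an evaluation aid for teachers is sketched below; the timing tolerance, the excerpt data and the set of "plausible alternative" chords are assumptions for illustration, not the platform's actual scoring rule.

TIMING_TOLERANCE_S = 1.0   # assumed window around each harmonic change

def score_harmonization(reference, user, plausible_alternatives=None, tol=TIMING_TOLERANCE_S):
    """Compare user selections (time_s, chord) against the reference harmonization.

    Correct chords are accepted, missing changes are reported, and plausible
    alternative chords are flagged for the teacher instead of being marked wrong.
    """
    plausible_alternatives = plausible_alternatives or {}
    report = []
    for ref_time, ref_chord in reference:
        candidates = [(abs(t - ref_time), c) for t, c in user if abs(t - ref_time) <= tol]
        if not candidates:
            report.append((ref_time, ref_chord, "missed change"))
            continue
        _, chosen = min(candidates)   # closest user selection in time
        if chosen == ref_chord:
            report.append((ref_time, ref_chord, "correct"))
        elif chosen in plausible_alternatives.get(ref_chord, set()):
            report.append((ref_time, ref_chord, f"alternative ({chosen}): teacher review"))
        else:
            report.append((ref_time, ref_chord, f"wrong ({chosen})"))
    return report

reference = [(0.0, "T"), (4.0, "D"), (8.0, "T")]        # hypothetical excerpt
user = [(0.2, "T"), (4.6, "SD"), (8.3, "T")]
print(score_harmonization(reference, user, {"D": {"SD"}}))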

5 Discussion and Conclusions

The analysis of experimental data presented in the previous section triggers several points of discussion. First, the choice of the music materials is critical. The excerpts should be sufficiently simple harmony-wise, ideally composed by triads only. At the same


time, they should be already known to all the users in order to build on their implicit knowledge of tonal harmony. These two requirements pose severe constraints on the usable repertoire. In order to let the teachers additional freedom in choosing their own repertoire, we plan to extend the platform with authoring tools that allow to upload new music pieces, to segment them according to their harmonic rhythm, to augment them with metadata related to chords and relative timings, and so on. This would also allow teachers to customize contents and to adapt them to their educational goals, e.g. to develop learning paths focused on specific genres or composers. Another major point of discussion is concerned with the evaluation of user performances. The above analysis shows that this is a multidimensional problem which includes objective measures such as timing of harmonic changes and chord correctness, but also contextual elements such as user reaction times, alternative correct harmonizations, and so on. As a consequence, the task at hand cannot always be evaluated through a binary correct/wrong decision, and some subjective evaluation by the teacher may be required. This calls for the development of graphical representations of the users’ performances, which display relevant elements and aid the teachers in their evaluations by taking into account different behaviors, sources of errors, and any additional contextual information. The visualizations reported in this paper are a first example of possible graphical representations aimed at this use. Finally, the data analyzed in this paper represent a preliminary test of the platform, with a group that is not representative of the final users (upper elementary or middle school pupils). We are currently planning an experimental campaign in collaboration with primary and middle school music teachers, in which a class will be exposed to tonal harmony activities involving the use of the web platform for several months. Learning results will be compared to a second class which will serve as a control group.

References 1. Apel, W.: The Harvard Dictionary of Music. Harvard University Press, Cambridge (2003) 2. Avanzini, F., Ludovico, L.A., Barat`e, A., Mandanici, M.: Metrics for the automatic assessment of music harmony awareness in children. In: Proceedings International Conference Sound and Music Computing (SMC2019), pp. 372–379. Malaga, May 2019 3. Beckstead, D.: Will technology transform music education? Music Educ. J. 87(6), 44–49 (2001) 4. Brandao, M., Wiggins, G., Pain, H.: Computers in music education. In: Proceedings of the AISB 1999 Symposium on Musical Creativity, pp. 82–88 (1999) 5. Brown, A., Brown, A.R.: Computers in Music Education: Amplifying Musicality. Routledge (2012) 6. Butler, D., Brown, H.: Describing the Mental Representation of Tonality in Music. Oxford University Press, Oxford (1994) 7. Chew, E., Francois, A.R.: Interactive multi-scale visualizations of tonal evolution in MuSA. RT Opus 2. Comput. Entertain. 3(4), 1–16 (2005)


8. Cohrdes, C., Grolig, L., Schroeder, S.: Relating language and music skills in young children: a first approach to systemize and compare distinct competencies on different levels. Front. Psychol. 7, 1616 (2016) 9. Corrigall, K., Trainor, L.: Effects of musical training on key and harmony perception. Ann. N. Y. Acad. Sci. 1169(1), 164–168 (2009) 10. Costa-Giomi, E.: Young children’s harmonic perception. Ann. N. Y. Acad. Sci. 999(1), 477–484 (2003) 11. Crawford, R.: Rethinking teaching and learning pedagogy for education in the twenty-first century: blended learning in music education. Music Educ. Res. 19(2), 195–213 (2017) 12. Crow, B.: Music-related ICT in education. In: Learning to Teach Music in the Secondary School, pp. 130–153. Routledge (2005) 13. Csap´ o, B., Ainley, J., Bennett, R.E., Latour, T., Law, N.: Technological issues for computer-based assessment. In: Griffin, P., McGaw, B., Care, E. (eds.) Assessment and Teaching of 21st Century Skills, pp. 143–230. Springer, Dordrecht (2012). https://doi.org/10.1007/978-94-007-2324-5 4 14. Dittmar, C., Cano, E., Abeßer, J., Grollmisch, S.: Music information retrieval meets music education. In: Dagstuhl Follow-Ups. vol. 3. Schloss Dagstuhl-LeibnizZentrum fuer Informatik (2012) 15. European Commission: Digital learning & ICT in education (2019). https://ec. europa.eu/digital-single-market/en/policies/digital-learning-ict-education 16. Folkestad, G.: Formal and informal learning situations or practices vs formal and informal ways of learning. Br. J. Music Educ. 23(2), 135–145 (2006) 17. Hedges, T.W., McPherson, A.P.: 3D gestural interaction with harmonic pitch space. In: Proceedings of the International Computer Music Conference and Sound and Music Computing Conference (ICMC-SMC 2013), pp. 103–108 (2013) 18. Holland, S.: Learning about harmony with harmony space: an overview. In: Smith, M., Smaill, A., Wiggins, G.A. (eds.) Music Education: An Artificial Intelligence Approach, pp. 24–40. Springer, London (1994). https://doi.org/10.1007/978-14471-3571-5 2 19. Johnson, D., Manaris, B., Vassilandonakis, Y.: Harmonic navigator: an innovative, gesture-driven user interface for exploring harmonic spaces in musical corpora. In: Kurosu, M. (ed.) HCI 2014. LNCS, vol. 8511, pp. 58–68. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07230-2 6 20. Krumhansl, C.L.: Rhythm and pitch in music cognition. Psychol. Bull. 126(1), 159 (2000) 21. Laitz, S.G.: The Complete Musician: An Integrated Approach to Tonal Theory, Analysis, and Listening. Oxford University Press, Oxford (2015) 22. Law, L.N., Zentner, M.: Assessing musical abilities objectively: construction and validation of the profile of music perception skills. PLoS ONE 7(12), e52508 (2012) 23. Lerch, A., Arthur, C., Pati, A., Gururani, S.: Music performance analysis: a survey. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Delft, 2019 (2019) 24. Lerdahl, F., Jackendoff, R.S.: A Generative Theory of Tonal Music. MIT press, Cambridge (1996) 25. Mandanici, M., Ludovico, L., Avanzini, F., et al.: A computer-based approach to teach tonal harmony to young students. In: International Conference on Computer Supported Education, pp. 271–279. SCITEPRESS (2019) 26. Mishra, P., Koehler, M.J.: Technological pedagogical content knowledge: a framework for teacher knowledge. Teach. Coll. Record 108(6), 1017–1054 (2006)


27. MIUR: Piano nazionale scuola digitale (2015). https://www.miur.gov.it/scuoladigitale 28. M¨ uller, M., Mattes, H., Kurth, F.: An efficient multiscale approach to audio synchronization. In: ISMIR, vol. 546, pp. 192–197. Citeseer (2006) 29. Papert, S.: Mindstorms: Children, Computers, and Powerful Ideas. Basic Books Inc. (1980) 30. Peretz, I., Champod, A.S., Hyde, K.: Varieties of musical disorders: the montreal battery of evaluation of amusia. Ann. N. Y. Acad. Sci. 999(1), 58–75 (2003) 31. Piaget, J.: The Origins of Intelligence in Children. International Universities Press, New York (1952) 32. Rameau, J.P.: Trait´e de l’harmonie reduite ` a ses principes naturels: divis´e en quatre livres. Ballard (1722) 33. Reber, A.S.: Implicit learning and tacit knowledge. J. Exp. Psychol. Gen. 118(3), 219 (1989) 34. Riemann, H.: Harmony Simplified, or the Theory of the Tonal Functions of Chords. Augener Ltd. (1896) 35. Schellenberg, E.G., Bigand, E., Poulin-Charronnat, B., Garnier, C., Stevens, C.: Children’s implicit knowledge of harmony in western music. Dev. Sci. 8(6), 551–566 (2005) 36. Schenker, H.: Free Composition: Volume III of New Musical Theories and Fantasies, vol. 1. Pendragon Press (2001) 37. Seashore, C.E., Lewis, D., Saetveit, J.G.: Seashore Measures of Musical Talents. Psychological Corp. (1956) 38. Sephus, N.H., Olubanjo, T.O., Anderson, D.V.: Enhancing online music lessons with applications in automating self-learning tutorials and performance assessment. In: 2013 12th International Conference on Machine Learning and Applications, vol. 2, pp. 568–571. IEEE (2013) 39. Shelton, J., Kumar, G.P.: Comparison between auditory and visual simple reaction times. Neurosci. Med. 1(1), 30–32 (2010) 40. Shute, V.J., Ke, F.: Games, learning, and assessment. In: Ifenthaler, D., Eseryel, D., Ge, X. (eds.) Assessment in Game-Based Learning, pp. 43–58. Springer, New York (2012). https://doi.org/10.1007/978-1-4614-3546-4 4 41. Stevens, R.: Editorial. Res. Stud. Music Educ. 3(1), 1–2 (1994). https://doi.org/ 10.1177/1321103X9400300101 42. Teplov, B.: Psychology of music and musical abilities. Leningrad: Izdatelstvo Pedagogicheskih Nauk (in Russian), Moscow (1947) 43. Trehub, S.E., Hannon, E.E.: Infant music perception: domain-general or domainspecific mechanisms? Cognition 100(1), 73–99 (2006) 44. Vidwans, A., Gururani, S., Wu, C.W., Subramanian, V., Swaminathan, R.V., Lerch, A.: Objective descriptors for the assessment of student music performances. In: Audio Engineering Society Conference: 2017 AES International Conference on Semantic Audio. Audio Engineering Society (2017) 45. Wallentin, M., Nielsen, A.H., Friis-Olivarius, M., Vuust, C., Vuust, P.: The musical ear test, a new reliable test for measuring musical competence. Learn. Individ. Differ. 20(3), 188–196 (2010) 46. Walls, K.C.: Music performance and learning: the impact of digital technology. Psychomusicol.: J. Res. Music Cogn. 16(1–2), 68 (1997) 47. Webster, P.R.: Construction of music learning. MENC Handb. Res. Music Learn. 1, 35–83 (2011) 48. Zagami, J., et al.: Creating future ready information technology policy for national education systems. Technol. Knowl. Learn. 23(3), 495–506 (2018)

MATE-BOOSTER: Design of Tasks for Automatic Formative Assessment to Boost Mathematical Competence

Alice Barana1, Marina Marchisio1(B), and Raffaella Miori2

1 Department of Molecular Biotechnology and Health Sciences, University of Turin, Via Nizza 52, 10126 Turin, Italy
{alice.barana,marina.marchisio}@unito.it
2 IIS Eugenio Bona Di Biella, Via Gramsci 22, 13900 Biella, Italy
[email protected]

Abstract. In the transition from lower to upper secondary education, Italian students are expected to have achieved a level of competence which allows them to use knowledge and abilities to model and to understand scientific and technical disciplines. National standardized tests show that especially students who attend technical high schools often have gaps or misunderstandings in their basic knowledge, which may hinder them in learning scientific technical disciplines, which are at the core of their curriculum. In this paper we start from items designed for summative assessment which highlight the main difficulties that students face with Mathematics at this stage, and discuss how it is possible to adapt them to automatic formative assessment through a process of expansion and digitalization, with the aim of helping students fill the gaps and develop mathematical competences. The activities are part of an online course, called “MATE-BOOSTER”, conceived to strengthen mathematical skills of students attending the first year of a technical upper secondary school. In this paper the process of design of MATEBOOSTER, rooted on constructivist assumptions, is outlined; the design of tasks for automatic formative assessment is discussed in details, and some examples of online activities are analyzed in light of a theoretical framework. Keywords: Automatic formative assessment · Basic mathematical competence · Constructivist learning environment · Learning design · Problem solving · Task design · Digital Learning Environment

1 Introduction Italian students completing lower secondary education – which in Italy ends at 8th grade – are supposed to have developed a positive attitude towards Mathematics and to understand how mathematical tools can be useful in many situations to operate in the real world [1]. INVALSI is the national institute in charge of verifying that the learning outcomes are achieved: it administers surveys and standardized tests in order to guarantee

© Springer Nature Switzerland AG 2020 H. C. Lane et al. (Eds.): CSEDU 2019, CCIS 1220, pp. 418–441, 2020. https://doi.org/10.1007/978-3-030-58459-7_20

MATE-BOOSTER: Design of Tasks for Automatic Formative Assessment

419

the quality of Italian instruction and to make it possible to be compared at international level. The results of INVALSI surveys highlight how, at all stages, but particularly at the end of 8th grade of instruction, there are still difficulties in the deep understanding of basic mathematical concepts, in the ability of applying knowledge to solve problems in real contexts and, above all, in the process of argumentation, which shows the difficulty in formalizing the intuitive knowledge [2]. These gaps increase in importance when students enroll to upper secondary school and they have to approach scientific and technical subjects, whose understanding relies upon their basic mathematical competence. This problem is particularly evident in technical upper secondary schools, where specialized disciplines are studied at an advanced theoretical level, though students’ average mathematical competence is lower than in Lyceums, as the national surveys show [2]. The ability to use mathematical thinking to solve problems related to the real experience or to other disciplines – in other words, mathematical competence [3, 4] – thus acquires relevance in the delicate period of transition that young people go through when they enroll to upper secondary school, when school successes and failures are deeply interlaced with the shaping of their characters [5, 6]. The Head Teacher of the Technical Upper Secondary School “Eugenio Bona” of Biella, together with her team of Mathematics teachers, designed a project aimed to strengthen the basic mathematical competences of first year students with the support of an e-learning platform and digital materials. The project, called “MATE-BOOSTER”, was implemented in collaboration with the department of Mathematics of the University of Turin, which has a long experience in the development of virtual environments for learning Mathematics, especially to prevent school failure [7, 8] and to support students in the transition from lower to upper secondary school [9]. The project started in September 2018 and lasted the entire 2018/2019 school year. In the paper “MATE-BOOSTER: Design of an e-Learning Course to Boost Mathematical Competence” [10], presented at the CSEDU Conference 2019, we focused on the design of the online course, discussing the methodologies chosen in relation to the students’ needs and showing the process which led to the realization of innovative digital materials. This paper will focus on the design of the online activities, with particular reference to the automatically graded assignments which populate the course. The tasks take inspiration from questions used in standardized assessment, but are elaborated in a formative assessment perspective with the aim of developing mathematical competences. After presenting a theoretical framework with the principles chosen for the design of the activities, the process of design of the online course is outlined and the design of tasks with automatic formative assessment is shown in detail; in particular, three examples of tasks are discussed in light of the theoretical framework.

2 Theoretical Framework 2.1 Web Based Constructivist Learning Environments The choice of the methodologies for developing the learning materials has been made on constructivist assumptions, according to which knowledge is situated, it being a product of the activity, context and culture in which it is developed and used [11]. Constructivist theories consider learning as a lifelong active process of knowledge building mediated

420

A. Barana et al.

by experiences and relations with the environment and the community [12]; thus, in constructivist learning environments students should be engaged through authentic activities and real world problems. In Mathematics education this issue has been investigated by many researchers; according to A. Schoenfeld, Mathematical thinking should be a tool to interpret quantitative phenomena of the outside world and it should be developed at school through meaningful modellization activities [13]. One of the main implications of the constructivist idea of the learner creating his or her own knowledge is the shift from a teacher-centered to a student-centered approach. If students become the protagonists, teachers need to leave the stage and move aside, changing their role from leaders to mentors, and their task from knowledge transmission to the creation of a suitable learning environment [13]. The community in which the learner is integrated is a core element as well. The sharing of opinions opens the mind and favors the process of knowledge building. Thus, in a constructivist learning environment activities that facilitate collaboration and require discussion and interaction among peers should take place [14]. Moreover, activities should be rooted in assessment with a formative value in order to inform both teachers and students about progresses [15]. Assessment and metacognition are deeply interlaced: according to Hattie and Timplerley [16] frequent and wellstructured feedback helps learners understand where they are, where they are going and what they should do in order to reach their goal, giving information not only about how the task was performed (task level), but also about the process to be mastered (process level), and enabling self-regulation and self-monitoring of actions (self-regulation level). Strategies such as formative assessment, collaborative learning and relevant problem solving are also indicated by several researches as useful enablers of learners’ engagement, which is related to high learning achievements [17]. Improving engagement is particularly important in students with challenging backgrounds, learning difficulties or low scholastic performances; in these contexts, interventions that only focus on the reinforcement of basic knowledge are often little effective, if they don’t rely on approaches which promote interest, motivation and self-efficacy [18]. Technology can support the creation of constructivist digital environments, as it can provide computer mediated communication, computer supported collaborative work, case-based learning environments, computer supported cognitive tools [19], as well as instruments for self and peer assessment [20] and for automatic evaluation [21]. Moreover, the use of constructivist Digital Learning Environments (DLE) can enhance the development of students’ engagement, particularly in students that are initially little motivated or come from lower social classes [22, 23]. The analysis of the implementation of web based constructivist learning environments has involved many authors in literature in the last twenty years and several models have been designed to engage students of different school levels, in e-learning or blended modality, in learning several disciplines [24–27]. Their results mainly deal with the relations between strategies, media and tool used and processes activated. 
Constructivist instructional designers generally accept as a valid and well-established framework for building learning environments the seven learning goals devised by Cunningham, Duffy and Knuth in 1993 and illustrated by Honebein [28]; they are: 1. to provide experience with the knowledge construction process; 2. to provide experience in and appreciation of multiple perspectives;

MATE-BOOSTER: Design of Tasks for Automatic Formative Assessment

3. 4. 5. 6. 7.

421

to embed learning in realistic and relevant contexts; to encourage ownership and voice in the learning process; to embed learning in social experience; to encourage the use of multiple modes of representation; and to encourage self-awareness in the knowledge construction process.

2.2 Automatic Formative Assessment In a Digital Learning Environment, formative assessment can be easily automatized in order to provide students immediate and personalized feedback. There are several Automatic Assessment Systems (AAS) that allow the creation of questions for STEM (Science, Technology, Engineering and Mathematics); those which are based on a Computer Algebra System (CAS) support the creation of automatically graded open Mathematical answers, such as formulas and equations, but also sets, vectors and graphs, which are accepted for their meaning, not only for their form. These tools can be usefully adopted to enhance learning, master problem solving strategies, improving metacognition, facilitate adaptive teaching strategies and support teachers’ work [29]. Using Moebius AAS [30], which is based on the mathematical engine of Maple [31], the University of Turin has designed a model for the automatic formative assessment of Mathematics, based on the following principles [32]: 1. availability of the assignments to the students, who can work at their own pace; 2. algorithm-based questions and answers, so that at every attempt the students are expected to repeat solving processes on different values; 3. open-ended answers, going beyond the multiple-choice modality; 4. immediate feedback, returned to the students at a moment that is useful to identify and correct mistakes; 5. contextualization of problems in the real world, to make tasks relevant to students; 6. interactive feedback, which appears when students give the wrong answer to a problem. It has the form of a step-by step guided resolution which interactively shows a possible process for solving the task. The last one consists of a step-by-step approach to problem solving with automatic assessment, but it is conceptualized in terms of feedback, highlighting the formative function that the sub-questions fulfil for a student who failed the main task. The interactive nature of this feedback and its immediacy prevent students from not processing it, a risk well-known in literature which causes formative feedback to lose all of its powerful effects [33]. Moreover, students are rewarded with partial grading, which improves motivation [34]. This model relies on other models of online assessment and feedback developed in literature, such as Nicol and Macfarlane-Dick’s principles for the development of self-regulated learning [35] and Hattie’s model of feedback to enhance learning [16].

2.3 Task Design for Formative Assessment

In Mathematics education, summative assessment design is generally affected by the psychometric tradition, which requires that test items satisfy the following principles [36]:

• unidimensionality: each item should be strictly linked to one trait or ability to be measured;
• local independence: the response to an item should be independent of the answer to any other item;
• item characteristic curve: low-ability students should have a low probability of answering an item correctly;
• non-ambiguity: the question should be written in such a way that students are led towards the only correct answer.

In particular, tradition says that in order to build good assessment items, the following goals should be pursued [36]:

1. there must be a high level of congruence between a particular item and the key objective of the whole test;
2. the key objective must be defined clearly and unambiguously;
3. the contribution to measurement errors must be minimized;
4. the format of the test items must be suitable to the goals of the test;
5. each item must meet specific technical assumptions;
6. the items should be well written and should follow prescribed editorial standards and style guidelines;
7. the items should satisfy legal and ethical requirements.

As a result, items built according to this model are generally limited in the Mathematics that they can assess, as the possible problems are reduced to those with only one solution, deducible from the data provided in the question text. If problems admit multiple solving strategies, the only information detected is the solution given by students, thus removing the focus from the process, which is essential for assessing Mathematics understanding [37]. While different models for summative assessment are currently under research [38], models for formative assessment should definitely depart from these principles. Several research studies are going in this direction; for example, Van den Heuvel-Panhuizen and Becker [37] suggest that, in formative assessment:

• problems should have multiple solutions, meaning both tasks with multiple possible paths to the single correct solution, and multiple correct results;
• tasks might be dependent, in the sense that information gained solving one task can be useful to solve a subsequent task or a second part of the same task;
• strategies should be the intended output: tasks should be designed in such a way as to make the process visible, and the process itself should be considered more important than the answer.

The above-mentioned model of automatic formative assessment is in line with these principles: open answers and interactive feedback can be designed in order to focus on the process; the algorithmic power of the system enables different representations of the same concept to be provided to the students, or elaborated as answers; moreover, multiple attempts with random parameters help students recognize and master solution paths. Using these models, it is possible to design and build tasks that respect the constructivist principles for knowledge building, presenting the contents from multiple perspectives, activating students as owners of the learning process and embedding learning in relevant contexts.

3 Design of the Online Course

The MATE-BOOSTER project was conceived with the aim of strengthening the basic mathematical competence of first-year students of a technical upper secondary school, acting with methodologies and tools able to activate students' motivation and engagement, in order to prevent failures in the scientific, technological and economic subjects which are at the core of their curriculum. The main feature of the project involves the creation of a web-based course in a DLE where students can revise the contents in a self-paced way or under their teachers' guidance, both in the classroom and at home. The materials were created according to didactic methodologies which are in line with the theories of constructivism and formative assessment outlined in the previous section. The project involved 202 students of nine classes with their seven teachers of Mathematics, plus one teacher in charge of coordinating the work from inside the school. MATE-BOOSTER was developed following the "ASSURE" model of learning design [39], which includes the following six steps:

1. Analyze the learners;
2. State objectives;
3. Select methods, media and materials;
4. Utilize media and materials;
5. Require learner participation;
6. Evaluate and revise.

The whole design process was conducted by researchers from the University of Turin in close collaboration with the teachers of Mathematics of the nine classes involved. In fact, it was considered essential that teachers shared the instructional strategies, approved the didactic materials and were consulted at each step of the design; otherwise, they could not have presented the project to their students in a way that would convince them to take part in the online activities.

3.1 Analysis of the Learners

The analysis of the learning needs, preceding the development of the course, was carried out with two different aims:

• to examine students' competence in Mathematics, and the gaps in their knowledge;
• to inquire about students' motivations to study in general and to study Mathematics in particular.

Two different tools were thus chosen for these objectives: an entry test to assess the initial competence and a questionnaire to understand students' motivations. The entry test was composed of 20 multiple-choice questions to be answered in 45 min. For each correct answer students got 5 points, and 0 for incorrect or unanswered questions. It was administered online with an automatic assessment system. All students took the test on the same day (8th October 2018); some settings were added to the test to prevent students from cheating: the questions and the answer options were shuffled, some numeric parameters were randomized, and only one attempt was available, with an automatically set time limit so that the test quit after 45 min. A few days before the test, students were given the login data to access the platform where the test would take place; there, they could find a sample test with the instructions to navigate through the questions.

Questions were distributed among the core topics studied in lower secondary school, in proportion to the time generally dedicated to each one. Each item referred to one of the main content areas of the curriculum (numbers, space and shapes, functions and relations, data and predictions); moreover, there were two questions on simple logical reasoning. Questions were built in order to verify the comprehension of particular concepts or processes, not just to check the memorization of rules or formulas; they were conceived for summative assessment, focusing on a simple, limited request with only one correct solution, according to the typical principles of the psychometric tradition.

The results of the test were statistically treated using the difficulty index, which corresponds to the ratio between the number of correct answers and the sample size, and the discrimination index, which is the difference between the difficulty indexes of the best-performing group and the worst-performing one, where the two groups are equally sized and cover the whole sample [40]. The test reliability was assessed through Cronbach's alpha. A minimal computational sketch of these indices is given below.

The results of the entry test were not particularly good, with an average score of 41/100, meaning that the level of difficulty was quite high, at least for the students of this school. Nobody scored more than 80 out of 100, while the lowest registered score was 5/100. When aggregated by classes, the average score varied significantly, from a minimum of 34/100 to a maximum of 54/100; belonging to a specific class explains 18.5% of the variance of the test results (eta squared = 0.185, p < 0.0001). It can be noticed that the best performers attended a curriculum which is more rooted in Mathematics than that of the worst performers. Results aggregated by content areas show the same trend as the INVALSI tests: space and shapes turned out to be the most difficult area, with an average difficulty index of 0.27; it was followed by logical reasoning (0.36), whilst data and predictions was the easiest one (0.60). Results for numbers and relations hovered around 0.40. The difficulty of the questions ranged from 0.22 to 0.87; only 4 out of 20 questions can be considered "easy", reaching more than 50% of correct answers. The majority of questions can be considered coherent with the general test, since the discrimination index is greater than 0.25 for 75% of the questions.
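The following Python sketch shows how the indices mentioned above could be computed from a binary item-response matrix. It is only an illustrative reconstruction based on the definitions given in the text; the data, function and variable names are made up and are not taken from the actual entry test.

```python
# Illustrative computation of the item statistics described above on a 0/1
# response matrix (rows = students, columns = items): difficulty index,
# discrimination index, and Cronbach's alpha for the whole test.
import numpy as np

def item_statistics(responses: np.ndarray):
    """responses: array of shape (n_students, n_items), 1 = correct, 0 = incorrect."""
    n_students, n_items = responses.shape
    difficulty = responses.mean(axis=0)          # proportion of correct answers per item

    # split the sample into two equally sized groups according to the total score
    order = np.argsort(responses.sum(axis=1))
    half = n_students // 2
    low, high = responses[order[:half]], responses[order[-half:]]
    discrimination = high.mean(axis=0) - low.mean(axis=0)

    # Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total scores)
    item_var = responses.var(axis=0, ddof=1).sum()
    total_var = responses.sum(axis=1).var(ddof=1)
    alpha = n_items / (n_items - 1) * (1 - item_var / total_var)
    return difficulty, discrimination, alpha

# toy data, only to show the expected shapes of the results
rng = np.random.default_rng(0)
toy_responses = (rng.random((100, 20)) > 0.6).astype(int)
d, D, a = item_statistics(toy_responses)
print(d.round(2), D.round(2), round(a, 3))
```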
Questions with a low discrimination index were qualitatively analyzed: they include frequent misunderstandings among the incorrect options or high-level reasoning that caused even the most skilled students to make mistakes [41]. The test Cronbach's alpha was 0.65; it was negatively influenced by these questions, which hindered the students.

Our claim is that this test is quite efficient for grade 9 students, but that, in general, students who enroll in a technical secondary school like this one have a low level of competence in Mathematics, which the test highlighted.

The questionnaire was composed of 33 statements for which students were asked to state their level of agreement on a Likert scale from 1 to 4 (completely disagree – completely agree) or from 1 to 5 (insufficient – excellent). It was administered online on the same platform where the entry test took place. The questionnaire was inspired by the student questionnaire of the 2012 PISA survey, when Mathematics was the main focus [42]. It aimed at measuring attitudes and behaviors towards school and Mathematics, in particular intrinsic motivation (shown by students who study Mathematics because they like it), instrumental motivation (shown by students who study Mathematics because it will be useful for their future), perseverance, openness to problem solving, perceived control over success in Mathematics, ethics and respect of school rules, and mathematical activities outside school. Moreover, students were asked if they had an internet connection and a device (tablet/computer) at home, available for their homework. It emerged that students' intrinsic motivation is not very high (the average was 2.6 on a scale from 1 to 4), although it varied widely (standard deviation: 0.9), while instrumental motivation was higher (the average was 3.1 on a scale from 1 to 4, standard deviation: 0.5). All students had the possibility to use a computer with an internet connection for a large part of their time at home. A deeper analysis of the answers to the questionnaire will be carried out later in the project; the information gained will be used to better interpret the outcomes.

3.2 Statement of the Objectives

In light of the results of the entry test and of the questionnaire, researchers and teachers listed the learning outcomes of the course during a focus group. The choice of the topics that the course should cover was made considering the contents needed to understand the scientific courses of the first years (Mathematics, Computer Science, Economy, Science, Physics). They are the following:

• fractions (operating with rational numbers);
• proportions (calculating the unknown term of a proportion, solving problems involving direct and inverse proportionality in real contexts);
• percentages (calculating percentages in real contexts);
• powers (knowing the meaning of exponentiation and applying the properties of powers);
• mathematical formulas and functions (working with symbols and formulas and with their graphical representations);
• equations (reading and building equations, solving linear equations in one unknown);
• plane geometrical shapes (knowing and calculating measures of angles, triangles and squares);
• statistics and probability (managing data, descriptive statistics indexes and graphical representations, calculating elementary probabilities in real contexts);
• mathematical language (understanding and using different registers of representation: verbal, symbolic, graphical, geometrical, numerical);
• logic (managing simple logical reasoning using Boolean operators).

3.3 Selection of Methods, Media and Materials

The choice to create an online course which students can use at home in a self-paced modality was supported by the availability of technological devices to access the material, as expressed in the questionnaire. Moreover, in all the classrooms of the school there was an Interactive Whiteboard (IWB) that teachers could use to show students the platform and to complete the activities together; the school also had three computer labs and several tablets that allowed students to work with the course activities at school. As a DLE, an integrated Moodle platform was adopted, managed by the ICT services of the Department of Computer Science of the University of Turin, the same platform where the entry test and the questionnaire were delivered. MATE-BOOSTER was inserted on an instance of the Moodle platform that the University of Turin commonly adopts for e-learning and that often hosts school teachers and students for educational projects [8, 43–45]. The platform is integrated with the mathematical engine of Maple for the creation and sharing of interactive materials, and with Moebius Assessment for automatically graded assignments.

The didactic methodologies for the development of the contents were selected on the basis of the constructivist framework and of the evidence gained during previous experiences with e-learning courses [46]. They are the following:

• Problem posing and problem solving: assuming the social-constructivist view of problem solving, problems are considered as learning environments where mathematical knowledge is created in a collaborative discussion starting from a problem. The top-down order traditionally used to study Mathematics is inverted: from the analysis of a real-world situation, paths to the solutions are drawn, in a constructive approach toward the discipline. Afterward, the solving steps are synthesized and generalized, introducing the typical rigor of Mathematics. Learning technologies are used both for online cooperation and as a means of representation of the solving process: freed from the burden of calculations, students can focus on the solving strategy, find relationships and better understand the solutions [47].
• Collaborative learning: in a DLE, collaboration can be fostered through activities for synchronous or asynchronous discussion; it enhances students' comprehension of problems and of mathematical concepts. Moreover, positive collaborations affect the quality of the environment and are reflected in students' motivation. Collaborative digital learning environments force a shift in the teachers' role, who let students create their own learning while carefully monitoring it [48].
• Learning by doing: interactivity enhances students' engagement and contributes to increasing their motivation. When students are actively involved in the resolution of the problems, they are facilitated in understanding the situation and in manipulating the abstract objects of the solving process. The feedback that students get from the activities helps them control their learning and move forward [49].
• Automatic formative assessment: implemented with an AAS specialized for STEM, it allows students to practice at their own pace and to obtain immediate feedback in order to acknowledge their own level of preparation. Questions and assignments can be enhanced by varying them in a random controlled form, inserting parts expressed in a special programming language. This allows a great variety of assessment modalities which strengthen reasoning until it is mastered: students can obtain different data or graphics at every new attempt, the system can adaptively suggest guided resolutions, and feedback and questions can automatically be proposed on the basis of previous answers [32].

The process of creation of the materials took place in a "Management course", where school teachers could access and follow the work, propose ideas and suggestions, and get in touch with the researchers. The structure given to the course is modular, according to the general guidelines for the creation of e-learning courses [50], with each module corresponding to a different topic, so as to guide students through the course topics and to show the whole content at a glance. All the modules have a fixed structure, composed of submodules containing:

1. materials with a theoretical explanation of the fundamental concepts in the form of an e-book, which students can read online or download as a PDF. Explanations begin with problems and are accompanied by examples, graphics and images;
2. interactive materials for the exploration of the concepts illustrated in the e-book, which help students to put theory into practice, and to visualize and analyze different representations of the same mathematical structures when parameters change;
3. automatically graded assignments to check the understanding of the concepts presented and of the related abilities.

At the end of every module there are:

4. one or more real-world problems which require the use of the contents of that unit to be solved;
5. a final test, automatically graded, to verify the achievement of the learning objectives expected for the module.

Figure 1 shows an example of a course module. The interactive materials and the online tests are conceived with formative purposes and their role is prominent in the online course; the design of the tasks is illustrated in the following section. Within the online course there are also a discussion forum for students, a progress bar, through which learners can visualize their learning steps, and a link to the gradebook, where all assignment results are recorded.

3.4 Utilization of the Materials

Once the course was completed, it was duplicated into nine single courses, one for each class, so that teachers can easily monitor the progress of their own students and give them personalized support and advice. The courses were opened to the students at the end of October 2018 and remained active until June 2019, although students will be able to access the contents even after the project ends.

Fig. 1. Structure of a section of the online course [10].

Students received an e-mail at their institutional e-mail address with the instructions to log in to the platform and access the materials. Interactive instructions about how to use the automatic assessment were provided to the students directly through the platform. The teachers were also asked to repeat the instructions to the classes at school and to show through the IWB how to use the materials. The learning materials could thus be used by students who needed to revise basic skills at their own pace, but they were also suitable for class activities of different kinds, when teachers needed to introduce new topics based on previous knowledge or to assign differentiated activities to different groups of students.

3.5 Requirement of Learner Participation

Aware that poorly motivated students would not be too keen on autonomously doing online mathematical activities in their spare time, some measures were taken in order to ensure their attendance in the course. The main one is the certification: students who initially had low grades were required to present, by the end of the school year, the certificate of completion of the course. The certificate could be automatically downloaded from the platform, on the condition that all the activities appeared as completed. While the certification acted as an "external" motivational lever, the learning methodologies chosen to develop the materials should contribute to the development of intrinsic motivation. The real contexts, the immediate feedback, adaptivity and interactivity make all the materials engaging and useful for getting prepared, so that students who try the activities can acknowledge their usefulness and go on with the modules. The interactive feedback provided through automatic assessment helps them understand solving strategies and processes, contributing to the development of self-regulation. Through a progress bar they can be made aware of their position in the learning path and be motivated to complete it.

In addition, all teachers were asked to present the course to their classes, to invite them to do the activities as homework and to recall the problems during lessons. In fact, students need to see the course as linked to their study and not as an external and additional duty; the more they are convinced of the usefulness of the online course for their learning, the more easily they will participate. The collaboration with the teachers could also have the positive effect of renewing their teaching practices, introducing the use of the didactic methodologies and technologies adopted in the online course. As a consequence, not only the online course, but the whole school experience with Mathematics could become more engaging for the students, helping them develop an interest in Mathematics.

3.6 Evaluation and Revision

In June 2019 an evaluation of the course was performed in several modalities. The achievement of the learning outcomes was assessed through a final test, similar to the entry one, for all the students. The appreciation of the course was evaluated via a questionnaire, which investigated the appreciation and perceived usefulness of the online activities for gaining a better understanding of the contents. Teachers were interviewed about their point of view on students' performances. The data collected through the two tests, the two questionnaires, platform usage, students' scores and teachers' interviews are currently being analyzed and cross-checked in order to understand the key strengths and limits of MATE-BOOSTER for future implementations of the project.

4 Design of the Materials for Formative Assessment

The assessment materials which populate the course modules were designed according to the principles of task design for formative assessment and to the model of automatic formative assessment presented in the theoretical framework. The ideas for the problems were mainly drawn either from the questions of the entry test, especially those which caused the most problems for the students, or from items of the INVALSI tests, which are national standardized tests administered to all Italian students at specific grades of education. A substantial number of INVALSI questions is available on the Gestinv database [51] and on the INVALSI website [52]. Questions of both the entry test and the INVALSI tests are designed with summative purposes according to the principles of the psychometric tradition. However, there are very good reasons why these two sources are particularly valuable for this kind of course. On the one hand, questions from the entry test revealed the specific topics in which students of this school were particularly weak, and their frequent mistakes and misunderstandings, so they provided valuable information about the themes on which it was necessary to work. INVALSI questions, on the other hand, although conceived for a standardized evaluation, offer rich prompts for classroom tasks and formative assessment; moreover, statistical properties of the items are available, showing the difficulties that students at each stage usually face with Mathematics.

Items from both sources were expanded in order to explicitly touch upon the key points and help students clarify how the questions should be solved; they were also adapted to the technologies used, in order to take full advantage of the potential of the system for the exploration of mathematical concepts and for the automatic feedback on students' answers. In the following paragraphs we provide further details on the design of tasks for automatic assessment through some examples.

4.1 Example 1: Algebraic Computations

Symbolic computation is a core topic in the Mathematics curriculum of grade 9: it is the basis for working with functions, which will be one of the main objectives of Mathematics education at secondary level. For the development of a set of tasks on symbolic computation, we started from an item of the entry test, shown in Fig. 2, which asks students to compute the area of a geometrical figure, a right trapezoid, whose measures are given as a function of a variable. In order to provide the correct answer, students need as a first step to recall the formula for the area of a trapezoid, or at least to sketch the figure and consider its properties; as a second step they need to use algebraic rules to synthesize the obtained formula in a compact form. The item requires both geometrical and algebraic reasoning to be solved; since we considered the geometrical reasoning primary, in the entry test it was classified among the geometry items. The item can be considered reliable, as its d-value was 0.29, and difficult, since it was answered correctly by 34% of students. The link between geometry and algebra is what makes this question interesting: in fact, using measures of geometrical figures to visualize algebraic operations is very useful to confer concreteness to abstract computations.

A similar task was proposed in the INVALSI tests to students of grade 8; it is shown in Fig. 3. Here the picture of the trapezoid is given and the main process requested of the students consists in adding and multiplying algebraic expressions. The presence of the picture can help students who do not remember the formula for the area of a trapezoid, because they can use the composition of simpler figures such as squares and triangles. Even so, the question turned out to be difficult, with 27% of correct answers; this is probably due to the students' poor familiarity with connections between different parts of Mathematics. Nevertheless, the processes that this item involves are very interesting from a didactic point of view.

Fig. 2. Item of the entry test involving geometrical and algebraic thinking. The text has been translated into English for the comprehension of the paper.

Fig. 3. Item from INVALSI tests involving algebraic reasoning on a geometric figure. The text has been translated into English for the comprehension of the paper. Source: Gestinv 2.0.

We started from these considerations to build a set of tasks with automatic assessment and formative purposes, aimed at making algebra less abstract and helping students visualize algebraic computations. One of them is shown in Fig. 4. The geometrical figure is different from the previous ones and is not standard, but students can calculate its area by decomposing it into simpler parts, such as rectangles or squares. Actually, students can use several decompositions to reach different forms of the same formula. Firstly, students are asked to compute the area and write the formula in the blank space, as in the INVALSI question. Thanks to the Maple engine, the system is able to recognize the correctness of the formula independently of its form, so every formula obtained through different reasoning is considered correct. Students have 3 attempts to provide the formula, so that they can self-correct mistakes and deepen their reasoning if a red cross appears. After 3 attempts, either correct or not, a second section appears, showing a table that students have to fill in with the values of the area of the figure when the variable assumes certain values. In this part, students have to substitute different values of the variable into the formula; the purpose is to increase the awareness that variables are symbols that stand for numbers and that a formula is a representation of a number, which has a particular meaning in a context. The numbers in the left column are in progressive order to help students grasp the linear relationship between the variable and the area. The table is a bridge to the last part of the question, where students are asked to sketch the graph of the function, using an interactive response area of Moebius Assessment that accepts answers within a fixed tolerance, without manual intervention. Students can draw the points found in the previous part in the Cartesian plane to sketch the line.

This question allows students to explore an algebraic formula from several points of view: geometrical, symbolic, numeric, functional and graphical. The second and third parts help them deepen their familiarity with this kind of formula, make the algebraic computation more concrete and raise the awareness that in Mathematics concepts from different areas are often connected. This question is algorithmic, in the sense that the numbers on the upper and left sides of the figure are randomly chosen and change at every attempt; the picture, the formula, the numbers and the graph in the following sections change accordingly. There are other versions of the same task that involve different figures, so that students can be engaged in the comprehension of the meaning of variables and formulas. The question adapted in this way follows the principles of formative assessment task design: in particular, there are multiple strategies to find the correct solution, namely the different ways of decomposing the figure to compute the area; moreover, the requests are dependent on each other, and the system provides students with the correct answer to be used in the following parts if they fail all the available attempts.
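The following Python sketch gives a hypothetical idea of how such an algorithmic question could be parameterized and graded: random dimensions are drawn at each attempt, the open formula is accepted up to algebraic equivalence, and the tabulated values and graph points are checked against the same data. The specific figure, formula and tolerance used here are placeholders and do not reproduce the actual task.

```python
# Hypothetical parameterization of an algorithmic task like the one in Fig. 4.
# The placeholder formula a*x + b*x only stands in for the real decomposition.
import random
import sympy as sp

x = sp.symbols('x')

def new_attempt():
    a = random.randint(2, 9)             # illustrative side length, redrawn at every attempt
    b = random.randint(2, 9)             # illustrative side length
    area = sp.expand(a * x + b * x)      # reference area as a (linear) function of the variable x
    return a, b, area

def check_formula(student_input: str, area: sp.Expr) -> bool:
    # any algebraically equivalent decomposition of the figure is accepted
    return sp.simplify(sp.sympify(student_input) - area) == 0

def check_table(student_values, area, xs=(1, 2, 3, 4)):
    # second section: values of the area for progressive values of the variable
    expected = [area.subs(x, v) for v in xs]
    return [sp.simplify(sp.sympify(s) - e) == 0 for s, e in zip(student_values, expected)]

def check_graph_points(student_points, area, tol=0.25):
    # last section: points sketched on the Cartesian plane, accepted within a tolerance
    return all(abs(float(area.subs(x, px)) - py) <= tol for px, py in student_points)

a, b, area = new_attempt()
print(area, check_formula(f"{a}*x + {b}*x", area), check_table([a + b, 2 * (a + b)], area, xs=(1, 2)))
```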

Fig. 4. Task with automatic formative assessment on algebraic computations.

4.2 Example 2: Graphs of Functions

Several tasks in the online course focused on the introduction to functions and on the study of linear functions, in particular contextualized in real-world situations, where students had to detect variables and variations. As an example, we show a task conceived on the basis of the INVALSI question in Fig. 5. The original item asks, given a real-world situation where a girl reads a book at different speeds during different periods of time, to find the graph that correctly describes the situation.

The format of the question is multiple choice; to give the correct answer, students have to qualitatively analyze the different graphs and associate different speeds with different lines. They need to clearly understand that, if the number of pages read per day is constant, the graph of the reading trend is linear, that horizontal segments represent periods with no reading, and that steeper slopes correspond to higher speeds.

We expanded the item into a wider task (Fig. 6), moving backwards, from the graph to the situation, and proposing a detailed analysis and interpretation of the graph. In the first section, students have to read, from a graph showing a similar reading trend, the number of pages that a student reads in three different periods of time, at different speeds. The following section asks students to determine how many pages the student reads per day in the three different periods: in this way they determine the reading speeds. The last section asks students to choose in which period Marta was faster in reading the book, thus interpreting the numbers obtained above in the original situation. The question is algorithmic: the initial graph and the subsequent values change randomly at every attempt. The task aims at helping students explore the graph of a function and interpret it in light of a real-world situation; they are asked to explicitly find properties of the graph so that they are facilitated in connecting them with the properties of the function and in reading it within the context. The requests are dependent on each other; students can use the previous results, automatically checked, in the following sections, so that their mistakes are immediately corrected and not carried through the resolution of the problem.
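As an illustration of the quantities the three sections ask for, the Python sketch below derives the pages read, the reading speeds and the fastest period from the breakpoints of a piecewise-linear reading graph. The breakpoints are invented for the example; in the actual task the data are generated randomly by the platform.

```python
# Illustrative reconstruction of the three requests of the task in Fig. 6,
# starting from made-up breakpoints (day, total pages read so far).
breakpoints = [(0, 0), (5, 60), (12, 60), (20, 180)]

def pages_per_period(points):
    return [p2 - p1 for (_, p1), (_, p2) in zip(points, points[1:])]

def speeds_per_period(points):
    return [(p2 - p1) / (d2 - d1) for (d1, p1), (d2, p2) in zip(points, points[1:])]

pages = pages_per_period(breakpoints)      # section 1: pages read in each period
speeds = speeds_per_period(breakpoints)    # section 2: pages per day in each period
fastest = max(range(len(speeds)), key=speeds.__getitem__) + 1  # section 3: fastest period
print(pages, [round(s, 1) for s in speeds], f"fastest period: {fastest}")
```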

Fig. 5. Item from INVALSI tests on graphs of functions. The text has been translated into English for the comprehension of the paper. Source: Gestinv 2.0.

Fig. 6. Question with automatic formative assessment on graphs of functions. The question, originally in Italian, has been translated into English for the comprehension of the paper.

4.3 Example 3: Powers and Their Properties

In the entry test, one of the most difficult questions was about the properties of powers (difficulty index: 0.34, d-value: 0.47): students had to choose, among four equalities involving the properties of powers, the only wrong one. The properties of powers, even at a numerical level, are essential to understand algebraic computations, where the same rules are valid with variables, so it is important that students achieve good results in this kind of exercise.

The options in the item involved frequent mistakes in the application of the properties of powers; a set of tasks focused on these mistakes was developed, following the scheme of the question shown in Fig. 7, and inserted into the online course. Initially, an equality involving properties of powers is displayed and students are asked to decide whether it is correct or incorrect. This part is equivalent to the question in the entry test, with the difference that they have only one equality to focus on. They can earn half the score if they answer correctly; after that, they are asked to fill in two subsequent sections which refer to the general rule to apply, through which they can earn up to the remaining half of the score. The last two sections can have a double function: justifying the choice, if the student answered the first part correctly, or showing a reasoning process, if the student gave the wrong answer (or guessed) at the first step. Once they finish the test, students can try it again and find other questions with a similar structure but different examples of applications of the same or other properties of powers.

This is an example of a question with interactive feedback: after the first section, the student receives a first feedback in the form of a green tick or a red cross depending on whether s/he answered correctly or not; the following sections are feedback about how s/he was supposed to develop her/his reasoning in order to reach the solution. The feedback is interactive, because the student has to complete the sub-questions step by step, following the proposed reasoning in an active way. By completing this task, students are guided towards a correct resolution using proper language and suitable justifications; it is a way of "making the process visible", which is a key point in formative assessment task design.
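A hypothetical sketch of how such an algorithmic true/false item could be generated and scored is given below; the chosen property (product of powers with the same base), the typical mistake (multiplying the exponents) and the half-score rule are illustrative assumptions, not the platform's actual code.

```python
# Hypothetical generator and grader for a true/false item on the properties of powers.
import random

def new_power_item():
    a = random.randint(2, 5)
    m, n = random.randint(2, 6), random.randint(2, 6)
    while m * n == m + n:                 # avoid the degenerate case m = n = 2
        m, n = random.randint(2, 6), random.randint(2, 6)
    correct_rhs = f"{a}^{m + n}"          # correct rule: a^m * a^n = a^(m+n)
    wrong_rhs = f"{a}^{m * n}"            # frequent mistake: multiplying the exponents
    is_correct = random.choice([True, False])
    rhs = correct_rhs if is_correct else wrong_rhs
    return f"{a}^{m} * {a}^{n} = {rhs}", is_correct

def grade_judgement(student_says_correct: bool, is_correct: bool) -> float:
    # half of the score for the judgement; the remaining half would come from
    # the two follow-up sections on the general rule (not modelled here)
    return 0.5 if student_says_correct == is_correct else 0.0

statement, truth = new_power_item()
print(statement, grade_judgement(True, truth))
```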

Fig. 7. Example of a question on the properties of powers, developed with automatic formative assessment and interactive feedback [10].

5 Discussion

The tasks that populate the course, similar to the ones described above, follow the model of automatic formative assessment and exploit the potential of the system to engage students in interactive activities. In fact:

1. Questions, collected and made available through assignments, are always accessible to students, who can make multiple attempts in order to repeat reasoning processes and to reinforce concepts.
2. Questions are algorithm-based: numbers change at every attempt, and figures, formulas and answers change accordingly. In this way, over multiple attempts students will find different questions with the same scheme, which will help them master the solving process, since simply remembering the correct answer is not useful, as it varies every time.
3. Questions are, as much as possible, open-ended: students have to write formulas instead of selecting the correct one, and this enables higher cognitive processes. The Maple engine ensures that correct formulas are considered correct independently of their form.
4. Feedback is immediate, so that students can immediately acknowledge their mistakes and even self-correct them. In questions that include more than one section, this is very useful, as students can identify their mistakes during the solving process and use the correct answers in the following steps.
5. Whenever it makes sense, questions are contextualized in reality or show connections with other disciplines or other areas of Mathematics; this helps students to associate abstract concepts with concrete ideas.
6. Feedback is interactive, because it involves students in a step-by-step active resolution, which is more efficient than just reading a correct solving process.

Using this model to shape questions allowed us to follow the principles of formative assessment design. In particular, the step-by-step capabilities allowed us to build well-structured questions containing multiple requests; the dependence of one response area on another is not a problem, as students can immediately visualize the correctness of their answer and use the information in the next steps. Thus, different forms of questions were created: questions with interactive feedback, such as the one shown in the third example, where students have a chance to understand why their answer was correct or not by following an interactive process; step-by-step processes which guide students in the resolution of one problem, such as the one shown in the second example; and explorations of mathematical objects that, at each section, provide further details and new viewpoints that are useful for a complete analysis of a concept, as shown in the first example. These forms of questions are very useful to help students make the process visible: they are guided in making their reasoning explicit and go through good examples of correct justifications in an interactive way, increasing their autonomy. In this way, the process becomes an output: to get the maximum score students have to complete the full reasoning, not only provide the correct answer. These questions also increase argumentation competence, which is one of the greatest difficulties for students at secondary level.

Questions that involve different possible strategies to reach the solution can be carefully studied and adapted to the automatic assessment. It is possible to write algorithms in order to accept answers in different forms, and to prepare interactive feedback that shows students several approaches to the correct answer, or different representations of the same problem, as shown in the first example.

The tasks designed for MATE-BOOSTER are created according to constructivist directions as well, and they actually respect the seven goals for building constructivist learning environments theorized by Cunningham, Duffy and Knuth.

1. Real-world problems offer students a learning environment in which to create mathematical knowledge starting from a specific case; exploration tasks let students build and associate meanings to mathematical concepts; questions with step-by-step guided solutions help them manage a complex resolution following their own ideas. Thus, students get to experience the very knowledge construction process.
2. Questions, as shown in the first example, often show students different approaches to the resolution of a problem, and highlight how it is possible to express the same mathematical object in different registers (through words, formulas, graphs, geometrical figures, and so on). Moreover, within the online course, questions can be discussed asynchronously with peers, or face to face in the classroom, so students have the opportunity to come to terms with different opinions and ways of understanding. These features can provide learners with experience in and appreciation of multiple perspectives.
3. A large part of the automatically graded questions is contextualized in real-world situations, interesting and challenging for students, or in other disciplines, or other areas of Mathematics. In this way, learning is embedded in realistic and relevant contexts.
4. Step-by-step processes and interactive feedback actively involve students in the correction of their mistakes and in the exploration of solving strategies. Students are not simply reading static texts; they are requested to follow the reasoning and demonstrate their understanding. They are at the center of their own learning. Moreover, the questions and their sections do not flow automatically in front of students' eyes: they have to autonomously get into each one and browse questions with a click, thus enhancing their commitment. In this way, ownership and voice in the learning process can be encouraged.
5. Students' work, their problems and successes are not isolated: they can share them with other learners through the forum. Moreover, MATE-BOOSTER is inserted in a blended context, where students actually meet every morning at school and teachers are advised to discuss the activities during the lessons, with the purpose of embedding learning in social experience.
6. Questions often present the same concept in different registers (verbal, symbolic, graphic, tabular, and so on) and try to simplify its understanding via a shift of register. This approach aims at encouraging students to use multiple modes of representation.
7. Immediate feedback facilitates students' acknowledgement of their preparation; moreover, automatically graded open answers and interactive feedback ask students to explain processes, not only to give results. Hence, the assessment activities pursue the goal of encouraging self-awareness in the knowledge construction process.

Thus, automatic formative assessment provides a suitable learning environment where students can reinforce their knowledge with a constructivist approach; together with the other online activities that populate the course and that are based on the same principles, it makes MATE-BOOSTER a constructive learning environment to reinforce mathematical knowledge and competence [10]. In the design of the tasks, special attention has been dedicated to feedback, considered a core element for promoting success. MATE-BOOSTER feedback works at three levels: at the task level, through a red cross or a green tick, when it informs students whether the task has been performed correctly or knowledge has been achieved; at the process level, through interactive feedback and step-by-step processes, when it explains how the task should be performed; and at the self-regulation level, through the immediate and constant feedback during the activities, when it helps learners monitor their own learning. Through the features described in the previous paragraph, this feedback helps students to understand their level of knowledge, to recognize the aim of their study and what a good performance is, and, most importantly, to bridge the gap between their actual and desired state. In this way, the model of good feedback theorized by Hattie and Timperley [16] is satisfied, and the feedback provided can be considered well-structured.

6 Conclusions

In summary, MATE-BOOSTER has been conceived with the aim of supporting students in the transition from lower to upper secondary school by strengthening basic mathematical competences. The project has been managed using the ASSURE design method (Analyze the learners; State objectives; Select methods, media and materials; Utilize media and materials; Require learner participation; Evaluate and revise). The core action of the project involves the implementation of an online course that students can use at their own pace as a support to their study. The design of the DLE has been carried out according to constructivist assumptions, and under the seven goals for building constructivist learning environments theorized by Cunningham, Duffy and Knuth. The learning methodologies used are problem posing and problem solving, collaborative learning, learning by doing and automatic formative assessment. In particular, tasks for automatic formative assessment are inspired by items used in summative assessment, due to the amount of information that they provide, but they are designed according to the principles of formative assessment design suggested by Van den Heuvel-Panhuizen and Becker [37], which highlight the importance of having multiple solutions or solving strategies, that sub-questions can be interdependent, and that strategies should be the intended output. The process of re-designing tasks for automatic formative assessment obviously changed the initial problems tested in the summative assessment [53]; however, the resulting tasks should help students develop autonomy in finding a strategy to solve a problem, and therefore we expect that they will improve their results in the following summative tests.

Since the school year has just ended, the results, in terms of teachers' and students' satisfaction and competence achieved, are not available yet; the analysis is in progress, and the results will be used for perfecting the course and proposing it again. For now, we can say that teachers and students are generally satisfied with the project. They particularly appreciated the automatically assessed assignments: students, because they found them a valid support for studying; teachers, because of their flexibility of use as classroom or homework activities.

Teachers expressed the intention of using the course to prepare students for the INVALSI tests of grade 10 (some of them have already experimented with it, with very good results) and the desire to have a similar instrument for the upper grades. Since in Italy schools and teachers need to offer revision paths to students who get low marks, including individualized courses and further assessment, similar courses could have a double effect on the optimization of scholastic resources: firstly, they could reduce failures at their root, as these are often due to gaps in basic knowledge that cause difficulties in learning new things; secondly, they can be used as part of the content revision paths, because the topics included in the course are the prerequisites required for understanding the first-year courses, and they are usually the object of the revision courses. Thus, schools using online courses such as MATE-BOOSTER could save human resources in delivering revision courses and allocate them elsewhere, such as in projects for the innovation of methodologies and curricula. This procedure could even be promoted by the Ministry of Education, perhaps by proposing a format that schools can customize. The project could be extended to other core disciplines, such as Italian and Foreign Languages, with the collaboration of experts in these disciplines.

References

1. MIUR: Indicazioni Nazionali per il curricolo della scuola dell'infanzia e del primo ciclo d'istruzione (2012)
2. INVALSI: Rilevazioni nazionali degli apprendimenti 2016–17 - Rapporto dei risultati (2017)
3. MIUR: Istituti Tecnici: Linee guida per il passaggio al nuovo ordinamento (2010)
4. Pellerey, M.: Le competenze individuali e il portfolio. La Nuova Italia Scientifica, Roma (2004)
5. Debnam, K.J., Lindstrom Johnson, S., Waasdorp, T.E., Bradshaw, C.P.: Equity, connection, and engagement in the school context to promote positive youth development. J. Res. Adolesc. 24, 447–459 (2014)
6. Mariani, A.M.: La scuola può fare molto ma non può fare tutto. SEI, Torino (2006)
7. Barana, A., Fioravera, M., Marchisio, M., Rabellino, S.: Adaptive teaching supported by ICTs to reduce the school failure in the project "Scuola Dei Compiti". In: Proceedings of 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC), pp. 432–437. IEEE (2017). https://doi.org/10.1109/COMPSAC.2017.44
8. Giraudo, M.T., Marchisio, M., Pardini, C.: Tutoring con le nuove tecnologie per ridurre l'insuccesso scolastico e favorire l'apprendimento della matematica nella scuola secondaria. Mondo Digitale 13, 834–843 (2014)
9. Barana, A., Marchisio, M., Pardini, C.: COSAM: Corso Online per lo Sviluppo di Abilità Matematiche per facilitare il passaggio tra la scuola secondaria di primo e di secondo grado. In: Design the Future! Extended Abstracts della Multiconferenza Ememitalia 2016, pp. 436–447. Genova University Press (2017)
10. Barana, A., Marchisio, M., Miori, R.: MATE-BOOSTER: design of an e-Learning course to boost mathematical competence. In: Proceedings of the 11th International Conference on Computer Supported Education (CSEDU 2019), pp. 280–291 (2019)
11. Brown, J.S., Collins, A., Duguid, P.: Situated cognition and the culture of learning. Educ. Res. 18, 32–42 (1989)
12. von Glasersfeld, E.: Constructivism in education. In: Husen, T., Postlethwaite, T.N. (eds.) The International Encyclopedia of Education, pp. 162–163. Pergamon Press, Oxford/New York (1989)

13. Cornelius-White, J.: Learner-centered teacher-student relationships are effective: a meta-analysis. Rev. Educ. Res. 77, 113–143 (2007)
14. Lave, J.: Situating learning in communities of practice. In: Resnick, L.B., Levine, J.M., Teasley, S.D. (eds.) Perspectives on Socially Shared Cognition, pp. 63–82. American Psychological Association, Washington (1991). https://doi.org/10.1037/10096-003
15. Scriven, M.: The Methodology of Evaluation. Purdue University, Lafayette (1966)
16. Hattie, J., Timperley, H.: The power of feedback. Rev. Educ. Res. 77, 81–112 (2007)
17. Ng, C., Bartlett, B., Elliott, S.N.: Empowering Engagement: Creating Learning Opportunities for Students from Challenging Backgrounds. Springer, New York (2018)
18. Haberman, M.: The pedagogy of poverty versus good teaching. Phi Delta Kappan 92, 81–87 (2010). https://doi.org/10.1177/003172171009200223
19. Jonassen, D., Davidson, M., Collins, M., Campbell, J., Haag, B.B.: Constructivism and computer-mediated communication in distance education. Am. J. Distance Educ. 9, 7–26 (1995)
20. Kearns, L.R.: Student assessment in online learning: challenges and effective practices. MERLOT J. Online Learn. Teach. 8, 198–208 (2012)
21. Barana, A., Marchisio, M., Rabellino, S.: Automated assessment in mathematics. In: Proceedings of 2015 IEEE 39th Annual Computer Software and Applications Conference, pp. 670–671. IEEE, Taichung (2015). https://doi.org/10.1109/COMPSAC.2015.105
22. Barana, A., Marchisio, M., Rabellino, S.: Empowering engagement through automatic formative assessment. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), pp. 216–225. IEEE, Milwaukee (2019). https://doi.org/10.1109/COMPSAC.2019.00040
23. Barana, A., Boffo, S., Gagliardi, F., Garuti, R., Marchisio, M.: Empowering engagement in a technology enhanced learning environment. In: Rehm, M., Saldien, J., Manca, S. (eds.) Project and Design Literacy as Cornerstones of Smart Education. Smart Innovation, Systems and Technologies, vol. 158, pp. 75–77. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-9652-6_7
24. Alonso, F., Lopez, G., Manrique, D., Vines, J.M.: An instructional model for web-based e-learning education with a blended learning process approach. Br. J. Educ. Technol. 36, 217–235 (2005). https://doi.org/10.1111/j.1467-8535.2005.00454.x
25. Czerkawski, B.C., Lyman, E.W.: An instructional design framework for fostering student engagement in online learning environments. TechTrends 60(6), 532–539 (2016). https://doi.org/10.1007/s11528-016-0110-z
26. Lefoe, G.: Creating constructivist learning environments on the web: the challenge in higher education, pp. 453–464 (1998)
27. Sangsawang, T.: Instructional design framework for educational media. Proc. Soc. Behav. Sci. 176, 65–80 (2015). https://doi.org/10.1016/j.sbspro.2015.01.445
28. Honebein, P.C.: Seven goals for the design of constructivist learning environments. In: Constructivist Learning Environments, pp. 11–24. Educational Technology Publications, New York (1996)
29. Barana, A., Marchisio, M., Sacchet, M.: Advantages of using automatic formative assessment for learning mathematics. In: Draaijer, S., Joosten-ten Brinke, D., Ras, E. (eds.) TEA 2018. CCIS, vol. 1014, pp. 180–198. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25264-9_12
30. Moebius Assessment. https://www.digitaled.com/products/assessment/
31. Maple. https://www.maplesoft.com/products/Maple/
32. Barana, A., Conte, A., Fioravera, M., Marchisio, M., Rabellino, S.: A model of formative automatic assessment and interactive feedback for STEM. In: Proceedings of 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), pp. 1016–1025. IEEE, Tokyo (2018). https://doi.org/10.1109/COMPSAC.2018.00178

33. Sadler, D.R.: Formative assessment and the design of instructional systems. Instr. Sci. 18, 119–144 (1989)
34. Beevers, C.E., Wild, D.G., McGuine, G.R., Fiddes, D.J., Youngson, M.A.: Issues of partial credit in mathematical assessment by computer. Res. Learn. Technol. 7, 26–32 (1999)
35. Nicol, D.J., Macfarlane-Dick, D.: Formative assessment and self-regulated learning: a model and seven principles of good feedback practice. Stud. High. Educ. 31, 199–218 (2006)
36. Osterlind, S.J.: Constructing Test Items. Springer, Dordrecht (1998)
37. van den Heuvel-Panhuizen, M., Becker, J.: Towards a didactic model for assessment design in mathematics education. In: Bishop, A.J., Clements, M.A., Keitel, C., Kilpatrick, J., Leung, F.K.S. (eds.) Second International Handbook of Mathematics Education. SIHE, vol. 10, pp. 689–716. Springer, Dordrecht (2003). https://doi.org/10.1007/978-94-010-0273-8_23
38. Suurtamm, C., et al.: Assessment in mathematics education. In: Assessment in Mathematics Education. ITS, pp. 1–38. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-32394-7_1
39. Heinich, R., Molenda, M., Russel, J.D., Smaldino, S.E.: Instructional Media and Technologies for Learning. Prentice Hall, Upper Saddle River (1999)
40. Ebel, R.L.: Procedures for the analysis of classroom tests. Educ. Psychol. Meas. 14, 352–364 (1954)
41. Tristan Lopez, A.: The item discrimination index: does it work? Rasch Meas. Trans. 12, 626 (1998)
42. OECD: PISA 2012 Results. OECD, Paris (2013)
43. Barana, A., et al.: Self-paced approach in synergistic model for supporting and testing students. In: Proceedings of 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC), pp. 407–412. IEEE, Turin (2017)
44. Barana, A., Marchisio, M.: Dall'esperienza di Digital Mate Training all'attività di Alternanza Scuola Lavoro. Mondo Digitale 15, 63–82 (2016)
45. Marchisio, M., Rabellino, S., Spinello, E., Torbidone, G.: Advanced e-learning for IT-army officers through virtual learning environments. J. E-Learn. Knowl. Soc. 13, 59–70 (2017). https://doi.org/10.20368/1971-8829/1382
46. Barana, A., Marchisio, M.: Ten good reasons to adopt an automated formative assessment model for learning and teaching mathematics and scientific disciplines. Proc. Soc. Behav. Sci. 228, 608–613 (2016). https://doi.org/10.1016/j.sbspro.2016.07.093
47. Brancaccio, A., Marchisio, M., Meneghini, C., Pardini, C.: More SMART mathematics and science for teaching and learning. Mondo Digitale 14, 8 (2015)
48. Barana, A., Marchisio, M.: Sviluppare competenze di problem solving e di collaborative working nell'alternanza scuola-lavoro attraverso il Digital Mate Training. Atti di Didamatica 2017, 1–10 (2017)
49. Gossen, F., Kuhn, D., Margaria, T., Lamprecht, A.-L.: Computational thinking: learning by doing with the Cinco adventure game tool. In: Proceedings of 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), pp. 990–999. IEEE (2018). https://doi.org/10.1109/COMPSAC.2018.00175
50. Rogerson-Revell, P.: Directions in e-learning tools and technologies and their relevance to online distance language education. Open Learn. 22, 57–74 (2007)
51. Gestinv 2.0. https://www.gestinv.it/
52. INVALSI. https://www.invalsi.it/invalsi/index.php
53. Bolondi, G., Branchetti, L., Giberti, C.: A quantitative methodology for analyzing the impact of the formulation of a mathematical item on students learning assessment. Stud. Educ. Eval. 58, 37–50 (2018). https://doi.org/10.1016/j.stueduc.2018.05.002

Educational Practices in Computational Thinking: Assessment, Pedagogical Aspects, Limits, and Possibilities: A Systematic Mapping Study

Lúcia Helena Martins-Pacheco, Nathalia da Cruz Alves, and Christiane Gresse von Wangenheim

Department of Informatics and Statistics - INE, Federal University of Santa Catarina, Campus Universitário s/n, Trindade, Florianópolis, Brazil
{lucia.pacheco,c.wangenheim}@ufsc.br, [email protected]

Abstract. The concept of computational thinking (CT) is still ill-defined despite being used in many studies and educational practices in the K-12 context. Many educational aspects associated with teaching CT require an understanding of learning approaches. Aiming to advance the comprehension of CT, we performed a second systematic mapping study that aggregates new data to our previous study [71], adding 35 new articles to the discussion. Our main research question is "Which approaches exist for the assessment of computational thinking (CT) in the context of K-12 education?" Our findings indicate that 77% of the publications date from between 2016 and July 2019. Describing a CT implementation approach is common to 75% of the publications, and most of these implementations are "CT across the curriculum". The most used tool is Scratch. Constructivism and constructionism are the most common pedagogical foundations. The most frequently assessed CT concepts are "algorithm", "abstraction", and "decomposition". Pre- or post-tests, surveys, and questionnaires are the most usual assessment instruments, and tests or questionnaires in which each item is scored are the most usual way to weight assessments. We found just one study with strong psychometric rigor and four others with the potential for it. However, despite the amount of literature on the topic, CT is still difficult to assess because there is no consensus about its definition and, consequently, no reliable construct. In addition, pedagogical and psychological issues related to child development have to be examined in more depth. Research in this area must therefore continue in order to allow advances in the K-12 CT educational context.

Keywords: Computational thinking · Assessment · CT concepts · Constructivism

1 Introduction

Wing [111] proposed that "Computational thinking involves solving problems, designing systems, and understanding human behavior, by drawing on the concepts fundamental to computer science. Computational thinking includes a range of mental tools that reflect


the breadth of the field of computer science". This seminal idea about computational thinking (CT) resonated in academic and educational groups [16, 27, 28, 81, 82] in the years that followed. Researchers sought to define concepts and practices for including CT in the educational context. At the same time, information and communication devices came into widespread use around the world, and easier ways of programming were created. Interactive, friendly, and playful digital environments opened up new ways of using computers, desktops, and even cellphones. Exhausting high-level cognitive effort and the strictly mechanical writing of computer commands gave way to new possibilities that allow more focus on logic rather than on syntax [70]. New visual languages and graphical tools dispensed with the formal syntax traditionally needed for programming; Scratch, Blockly, App Inventor, and Snap! [3, 105, 106] are examples of these environments. Despite the great attention that academic groups have given to this subject, many questions associated with adequate pedagogical and psychological approaches for educational practices remain unanswered [45, 96, 97]. For example, questions such as "how can teaching and learning be matched to the cognitive development of children?", "how can teachers be trained to motivate students to engage in learning CT?", "what kind of content must be taught?", and "how can what was learned be assessed?" still need answers [71]. We performed a first systematic mapping study (SMS) analyzing 46 articles [71] in order to advance in answering these questions. Considering that those questions are still not fully answered, we now aim to update that first study by means of a new SMS. We performed a new search and found 35 new relevant articles, which we analyzed and added to those of our first study. We make some theoretical considerations, put all the data together, and analyze the collected information in comparison with the literature of this area.

2 CT Conceptualization

There is no consensus on the definition of CT elements, aspects, characteristics, or even CT skills. Some authors (e.g., [70, 116]) associate CT with other competencies known as twenty-first-century competencies. In the present work, we adopt the Computer Science Standards [28] as our reference for CT, which provide a framework of K-12 core concepts for teaching CT. Those concepts include algorithms and programming as well as the creation, testing, and refinement of computational artefacts. Algorithms and programming unfold into algorithms, control structures, variables, modularity, and program development. Table 1 shows the K–12 Computer Science Framework [27–29] concepts and practices. The [27–29] framework is thus an overarching structure that accommodates several possible educational practices related to computer science. We highlight the recent extensive review of CT concepts performed by Moreno-León et al. [76]. They carried out a text network analysis of the main definitions of this skill found in the literature, aiming to offer insights into the common characteristics these definitions share and into their relationship with computer programming. They then proposed a new definition of CT:

Table 1. K–12 Computer Science Framework [27–29] concepts and practices.

Core concepts:
1. Computing Systems
2. Networks and Internet
3. Data and Analysis
4. Algorithms and Programming
5. Impacts of Computing

Core practices:
1. Foster an Inclusive Computer Culture
2. Collaborating Around Computing
3. Recognizing and Defining Computational Problems
4. Developing and Using Abstractions
5. Creating Computational Artifacts
6. Testing and Refining Computational Artifacts
7. Communicating About Computing

The ability to formulate and represent problems to solve them by making use of tools, concepts, and practices from the computer science discipline, such as abstraction, decomposition, or the use of simulations. (p. 33, [76]).

3 CT Pedagogical Concerns

The principal pedagogical questions are: "how does someone learn?", "how should content be taught?", "how can the teaching-learning process be aligned with human development, especially in childhood?", and "how can CT practices be inserted into the educational context?" [71]. Rijke et al. [88] highlight that, although there is a push to implement computational thinking (CT) skills in primary schools, little research reports which CT skills should be taught at what age. They found that age is related to some CT concepts, but their findings indicate that young primary school students can engage in learning these CT skills. Zhang and Nouri [117] proposed a table of progression of CT skills based on learners' age. They consider three age ranges (5–9 years old, 9–12 years old, and 12–15 years old) and define which CT skills are appropriate for each range.

Constructivist approaches are traditional in educational psychology as well as in pedagogy. These approaches are based on Piaget's studies [20]: for him, knowledge representation depends on the learner's active role in creating or changing it. Constructionism [85, 86] is a strand of constructivism. Papert considers that knowledge construction is related to concrete and practical action, resulting in a real product; the LOGO language is the main tool of this approach [71]. According to Csizmadia et al. [25], the constructionist learning theory by Papert, based on constructivism and Piaget, has a long tradition in computer science education for describing students' learning process through hands-on activities. In this direction, Vallance and Towndrow [104] also emphasize the active role of learners. They propose pedagogic transformation via a heutagogical approach involving problem-solving activities that invite student-centered design and computational thinking strategies. According to Blaschke [14], heutagogical learners are highly autonomous and self-determined. This approach has been proposed as a theory for applying to


emerging technologies in distance education and for guiding distance education practice and the ways in which distance educators develop and deliver instruction using newer technologies such as social media.

To give students a more active role, some methodologies have become usual in computer science educational practices. Hsu et al. [51] found that most studies adopted Project-Based Learning, Problem-Based Learning, Cooperative Learning, and Game-Based Learning in CT activities, whereas activities such as aesthetic experience, design-based learning, and storytelling have been adopted relatively less frequently. Jenson and Droumeva [54] claim that constructionist theory supports Game-Based Learning (GBL): game construction can increase student confidence and build their capacity for ongoing involvement in computing science and other STEM subjects. Aligned with this idea, Atmatzidou and Demetriadis [6] consider that Educational Robotics (ER) has its roots in Papert's work; for them, robotics activities have tremendous potential to improve classroom teaching. According to the review by Ioannou and Makridu [53], educational robotics, via the programming of robots, gives students the additional benefit of interacting with a concrete object and constructing knowledge efficiently.

Weintrop et al. [119] claim that a new urgency has come to the challenge of defining computational thinking and providing a theoretical grounding for the form it should take in school science and mathematics classrooms. As a response to this challenge, they proposed a taxonomy that defines CT for mathematics and science in four main categories: data practices, modeling and simulation practices, computational problem-solving practices, and systems thinking practices. Bloom's taxonomy is frequently used in computer science education [1, 38, 90]. It consists of a hierarchy of knowledge comprising factual knowledge (relating to a specific discipline), procedural knowledge (techniques and procedures and when to use them), declarative knowledge (relationships between concepts so that constituent parts can function as a whole), and metacognitive knowledge (knowledge of demands, strategies, and one's own limitations) (pp. 220–221, [104]).

Hsu et al. [51] found that there are diverse terminologies for CT, as well as a diverse current status across the globe. They identified four characteristics of education initiatives and policies around the world: collaboration and partnerships across sectors and national boundaries, rationales taking a broad perspective and referring to common themes, a redefinition of digital competence, and an emphasis on broadening access and interest. There seems to be a great international effort toward a common foundation for computer science education, including CT education, and some concern about pedagogical approaches.

4 CT Assessment

Assessment is a critical step in the learning process. How can a student's progress or achievement be measured? How can we know whether a course's learning objectives have been reached? How can information about the quality of educational systems be obtained? Assessment, in a broad sense, is thus a key component of teaching and learning processes. "Assessment provides information for decisions about students; schools, curricula, and programs; and educational policy", aiming to improve the teaching-learning process


[17]. A great variety of assessment methods can be "used to gather information: formal and informal observations of a student; paper-and-pencil tests; a student's performance on homework, lab work, research papers, projects, and during oral questioning; and analyses of a student's records". One of the most common assessment instruments is the test. "A test is defined as an instrument or systematic procedure for observing and describing one or more characteristics of a student using either a numerical scale or a classification scheme" [17], even though "test is a concept narrower than assessment". According to these authors, evaluation "is the process of making a value judgment about the worth of a student's product or performance" and "may or may not be based on measurements or test results". Bias, subjectivity, and inconsistency can influence the evaluation process. Tests tend to be standardized and objective, minimizing the influence of subjectivity, and test results, especially long-term results, can reflect school effectiveness.

Some authors highlight the importance of formative and summative assessment [3]. For Dixson and Worrell [32], formative assessment aims to improve teaching and learning and to diagnose student difficulties (ongoing, before and during instruction), asking "what is working" as well as "what needs to be improved". For them, summative assessment focuses on the evaluation of learning and on placement and promotion decisions; it is usually formal, cumulative, carried out after instruction, and asks "does the student understand the material" and "is the student prepared for the next level of activity". Summative assessment has higher psychometric rigor than formative assessment.

Alves et al. [3] consider the feedback given during the teaching and learning process, especially instructional feedback, very important for reaching educational goals: suggestions and tips allow students to improve their learning step by step. Liu and Carless [68] consider learning in a peer-to-peer (P2P) collaborative environment. According to them, it can improve learning and enhance understanding. Peer feedback means a communication process through which learners enter into dialogues related to performance and standards, while peer assessment is defined as students grading the work or performance of their peers using relevant criteria. P2P collaborative environments are common in computer science education.

For Zhong et al. [118], there is a lack of effective approaches to CT assessment despite CT being considered a fundamental skill. Shute et al. [97] note: "it's not surprising that accurately assessing CT remains a major weakness in this area. (…) Having no standard CT assessment also makes it very difficult to compare results across various CT studies." Beyond this, new technologies provide many innovative ways to cope with teaching and learning in the K-12 context. Thus, several authors consider CT assessment a challenge in terms of standardization. With this update of our previous systematic mapping study, we intend to contribute to the comprehension of CT assessment in the K-12 educational context.

5 Updating the Study

The present work builds on the previous systematic mapping study developed by the authors [71], performed in February 2018. In order to update that mapping, we performed another search in July 2019.


5.1 Definition of the Systematic Mapping Study Update

In order to update our former mapping, which aimed to understand the state of the art of CT assessment in K-12 education [71], we again conducted a systematic mapping study (SMS) following the definition of Petersen et al. [87]. Table 2 shows the research questions; the protocol is described next.

Table 2. Definition of the mapping protocol (based on [71]).

Research Question: Which approaches exist for the assessment of computational thinking (CT) in the context of K-12 education?

Pedagogical methodology:
Q1: Which approaches exist and what are their characteristics?
Q2: Which theoretical, pedagogical foundations are used?

Assessment approaches:
Q3: Which concepts of CT are assessed and how are they assessed?
Q4: Which assessment methodology is used, and which instruments are used?
Q5: Are there instructional assessments and feedback?

Measurement approaches:
Q6: How does the instrument assign weights in the assessment?
Q7: Are there psychometric bases in the assessment?

Data Source: We examined all published English-language articles available in Scopus, Web of Science, Wiley Online Library, ACM Digital Library, IEEE Xplore, APA PsycNet, and Science Direct, with access through the CAPES Portal (a portal with access to scientific databases worldwide, sponsored by the Brazilian Education Ministry and available only to research institutions), as well as Google Scholar [47].

Inclusion/Exclusion Criteria: We considered only English-language articles that presented an approach to the assessment of CT in K-12. We considered articles published after 2005, because the concept of "computational thinking" was only proposed by Wing in March 2006 [111]. In our searches, we required "computational thinking" to appear in the title of the article. We excluded approaches that have no intersection with the K-12 context, such as higher education, teacher training, and educational policies, given that they are outside the scope of our research objective.

Quality Criteria: We considered only articles that present substantial information on the proposed approach, enabling the extraction of relevant information for the analysis questions. Articles that provided, for example, only a summary of a proposal, and for which no further information could be found, were excluded.

Definition of Search String: In this second mapping we defined the search string by requiring that "computational thinking" appear in the title of the article and "assessment" in any other field.
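To make this criterion concrete, the snippet below assembles a database query of the kind implied by the protocol. The actual query strings submitted to each database are not reported in this section, so the field codes and filters shown here are assumptions for illustration only.

```python
# Illustrative only: the exact search strings used in the mapping are not
# reported here; the Scopus-style field codes below are assumptions.
title_term = '"computational thinking"'   # must appear in the article title
other_term = '"assessment"'               # may appear in any other field
year_filter = "PUBYEAR > 2005"            # the CT term was only coined in 2006
language_filter = "LANGUAGE(english)"     # English-language articles only

query = (
    f"TITLE({title_term}) AND ALL({other_term}) "
    f"AND {year_filter} AND {language_filter}"
)
print(query)
# TITLE("computational thinking") AND ALL("assessment") AND PUBYEAR > 2005 AND LANGUAGE(english)
```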


5.2 Execution of the Search

This second search was executed in July 2019 by the first author and revised by the co-authors. The initial search resulted in the selection of 91 articles. Analyzing the titles, we realized that 19 had already been selected and analyzed in our former mapping. In the second analysis stage, we reviewed the titles, abstracts, keywords, and sometimes the conclusions of the articles to identify those that matched the inclusion criteria, which resulted in 35 potentially relevant articles across all databases. In total, therefore, we analyzed data from 81 articles: 46 from the former mapping [71] and 35 from the new one. Table 3 shows the studies analyzed in each stage of the systematic mapping.

Table 3. Studies analyzed according to the stage of the study.

First SMS (February 2018), studies from 2005–2018, n = 46:
[1, 2, 5–7, 9–13, 15, 19, 22, 33, 34, 37, 38, 43–45, 50, 54, 55, 57, 58, 60, 62, 65, 73–75, 79, 84, 89–93, 96, 97, 108, 110, 112, 114, 115, 118]

Second SMS (July 2019), studies from 2005–July 2019, n = 35:
[3, 4, 8, 18, 23, 25, 29, 31, 35, 36, 39, 41, 42, 48, 49, 51, 53, 59, 61, 63, 64, 66, 69, 70, 72, 78, 80, 83, 88, 99, 103, 109, 113, 116, 117]

6 Data Analysis

In this section, we present the data analysis. It is important to note that the 2019 search covered only the first half of the year. The distribution of studies according to their year of publication is shown in Fig. 1. The number of articles increased until 2017 and was apparently steady in 2018 and 2019 (until July). About 77% of the studies were published between 2016 and 2019, showing the growth of publications on this subject in recent years.

Fig. 1. Number of publications included per year, according to the defined criteria (n = 81).


We classified the studies into seven categories according to their focus. The distribution of the articles across these categories is shown in Fig. 2. The most frequent category was "Implementation" (48), followed by "Framework and Implementation" (13). By "Implementation" we mean practical CT approaches such as the application of a test or a course. By "Framework" we mean the proposition of a theoretical, conceptual structure to model computational thinking. "Comparison" studies compare a new instrument to a known, formal instrument with already acknowledged validity and reliability. Finally, "Dataset Comparison" refers to the analysis of standard examination databases in comparison with CT tests.

Fig. 2. Distribution of the articles according to our classification. I-Implementation; FW-Framework; FWC-Framework with Comparison; FWI-Framework with Implementation (usually a pilot implementation); FWR-Framework and Literature Review or Mapping; R-Literature Review or Mapping; DSC-Dataset Comparison.

6.1 Pedagogical Methodology

Concerning pedagogical methodology, we look at how the studies approach CT teaching and learning (Q1) and at their epistemological foundations (Q2), which are not always made explicit. The findings for these questions are presented next.

Q1: Which approaches exist and what are their characteristics?
The approaches were classified according to the teaching and learning practices presented in the "implementation" studies. The categories we found are not mutually exclusive and can occur together, for example, CT across the curriculum combined with drag-and-drop programming tools, or robots combined with agent-based models. Figure 3 shows the most frequent approaches.

Fig. 3. Most frequent CT approaches used in the implementation studies.


Most implementation studies apply CT principles in non-computer-science courses, that is, "CT across the curriculum" (e.g., [2, 23, 39, 110]). The use of some kind of robot is similarly frequent (e.g., [4, 80, 112]). Unplugged activities are also common (e.g., [15, 23, 37, 55, 69, 90]), as are drag-and-drop (block-based) programming tools such as Scratch or ScratchJr (e.g., [35, 42, 83, 99]). It is important to highlight that there is a great variety of approaches that appeared only once in our literature search; most of these correspond to an isolated practice that was performed just once. The pedagogical approach often depends on the chosen tool. Some of the tools found appear only once in our search: Arduino, Game Maker Studio, Lego EV3 Robotic Kit, Bee-Bot, Hummingbird Kit, Snap!, Lego Mindstorms NXT 2.0, LOGO, Mighty Micro Controller, and NetLogo. Tools cited more than once are shown in Fig. 4. Scratch is the most cited tool. Python and VPython are more common in high school contexts (e.g., [2, 69]).

Fig. 4. Most frequent tools used in the studies.

There were significant variations in the duration of the implementations, so it is difficult to compare their effect on the learning process. The range varied from a few hours (e.g., Jenson and Droumeva [54], about 20 h) to years of incremental teaching and learning (e.g., Grgurina et al. [24, 43] and Feldhausen et al. [37], which took three years) [71]. Similar findings are reported by Shute et al. [97].

Q2: Which theoretical, pedagogical foundations are used?
Most articles do not emphasize the pedagogical principles that were chosen. Part of the implementations are based on interaction with an electronic system or with a group of students to develop CT practices (e.g., Gadanidis et al. [39]). Figure 5 shows the most frequent pedagogical foundations used in the studies with implementations. Constructivism and constructionism are the most usual foundations. In the second mapping, we noticed that some articles are more concerned with establishing pedagogical principles for dealing with CT practices. For example, Turchi et al. [103] introduce a game-based system, called TAPASPlay, to foster CT skills based on constructionism. Csizmadia et al. [25] present a new mapping tool that can be used to review classroom activities in terms of both computational thinking and constructionist learning. Lye and Koh [70] proposed a constructionist model based on a problem-solving learning environment, with information processing, scaffolding, and reflection activities, that could


Fig. 5. Pedagogical foundations used in the studies.

be designed to foster computational practices and computational perspectives. Game-Based Learning (GBL) and Problem (or Project) Based Learning (PBL) are practices based on constructionism, because the acquisition of knowledge representation results from the active role of the learner trying to solve a real problem. Likewise, a learner-centered approach allows learners to play an active role in their own knowledge production. Some of these approaches can be combined. According to Lye and Koh [70], the scaffolding approach holds that novice learners need to be supported in order to facilitate the understanding and consolidation of the knowledge representation process. Basu et al. [9] and Basu et al. [11] used a scaffolding methodology in their CTSiM (Computational Thinking using Simulation and Modeling) model. Angeli and Valanides [4] also reported the successful use of a scaffolding methodology in CT practices with children. Witherspoon et al. [113] studied middle school students who participated in a virtual robotics programming curriculum. They found that participation in a scaffolded programming curriculum, within the context of virtual robotics, supports the development of generalizable computational thinking knowledge and skills that are associated with increased problem-solving performance on non-robotics computing tasks. Bloom's Taxonomy was used by Aggarwal et al. [1], Fronza et al. [38], and Rodriguez et al. [90]. The latter, for example, describe an assessment that maps questions from a comprehensive project to computational thinking (CT) skills and Bloom's Taxonomy; the project developed six rubrics corresponding to creating, evaluating, analyzing, applying, understanding, and remembering (a sketch of how such a mapping can be operationalized is given at the end of this subsection). Sometimes studies do not make small aspects of CT explicit but work with a set of CT concepts as a whole. For example, Basogain et al. [7] explicitly took top-down and bottom-up approaches: the course they proposed, ECE130 (Introduction to Computational Thinking and Programming), used top-down design and bottom-up implementation by means of object-oriented programming (OOP) paradigms. Such approaches usually use real-world problems and, aiming to reach a solution, the learner establishes strategies such as decomposition, modeling, reuse, and so on. Knowledge of the whole problem makes the learning experience more engaging and meaningful for many students. PBL and GBL can also be top-down approaches.
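Since the mapping of project questions to Bloom levels is central to assessments like the one by Rodriguez et al. [90], the sketch below shows one way such an aggregation could be operationalized. The question-to-level assignments and the rubric scores are hypothetical and are not taken from that study.

```python
# Hypothetical sketch: aggregating rubric scores for project questions by
# Bloom level (the mapping and the scores are invented, not from [90]).
BLOOM_LEVELS = ["remembering", "understanding", "applying",
                "analyzing", "evaluating", "creating"]

# Assumed assignment of assessment questions to Bloom levels (illustrative).
QUESTION_LEVEL = {"q1": "remembering", "q2": "understanding", "q3": "applying",
                  "q4": "applying", "q5": "analyzing", "q6": "creating"}

def bloom_profile(scores):
    """Average the rubric score (e.g., 0-4) of the questions at each Bloom level."""
    profile = {}
    for level in BLOOM_LEVELS:
        level_scores = [s for q, s in scores.items() if QUESTION_LEVEL[q] == level]
        profile[level] = sum(level_scores) / len(level_scores) if level_scores else None
    return profile

print(bloom_profile({"q1": 4, "q2": 3, "q3": 2, "q4": 4, "q5": 3, "q6": 1}))
# {'remembering': 4.0, 'understanding': 3.0, 'applying': 3.0,
#  'analyzing': 3.0, 'evaluating': None, 'creating': 1.0}
```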


6.2 Assessment Approaches

Q3: Which concepts of CT are assessed and how are they assessed?
CT concepts are imprecise, perhaps because there are many synonyms or because researchers use a top-down approach to problem-solving. Some top-down approaches, such as Problem-Based Learning (PBL) (e.g., [107]), do not address every aspect of CT, because the problem-solving strategy considers a set of aspects together or reuses previous solutions without a step-by-step procedure. Some approaches add other, more generic aspects; for example, the CTLS (Computational Thinking Levels Scale) proposed by Korkmaz et al. [63] goes beyond problem-solving and algorithmic thinking to include critical thinking, cooperativity, and creativity. Based on our findings, we classified the most frequently used CT concepts: algorithm; abstraction; decomposition; loops, sequences, and conditionals; data representation/collection/analysis; problem-solving; debugging; variables; parallelism; modeling; and events. Figure 6 shows the frequency of occurrence of these concepts in the articles. To assess CT efficiently it is necessary to define the concepts so as to allow their observation and analysis. In the next section, we analyze several instruments for CT assessment.

Fig. 6. Frequency of CT concepts.

Q4: Which assessment methodology is used, and which instruments are used?
Figure 7 shows the distribution of the most frequently used instruments. Instruments that show some psychometric properties are analyzed under Q6 and Q7. According to our findings, "pre- or post-test/survey/questionnaire" is the most frequent type of instrument, and interviews are the second most used. Individual interviews are sometimes combined with other instruments; interviews, as well as self-assessment reports, allow a better understanding of the student's point of view and support adjustments to the teaching and learning process. A single "survey, questionnaire or test" used to check the results is the fourth most common instrument. Another traditional statistical method is assessment using matched-pair or paired-group experiments, which was applied in five studies. Assessment based on the analysis of assignments involving a "project/design/artifact" is very usual within the educational environment. This kind of assessment can be associated with formative assessment, with constructionism, or with scaffolding procedures, allowing students to engage in learning. Problem-based and game-based learning approaches are usually chosen for this kind of assessment [71].


Fig. 7. Assessment instruments.

There are several approaches that do not follow a specific model; examples include a written essay [2], online interactive assessment [108], a paper-and-pencil test [115], video analysis [93], a data set [34], P2P assessment [7], and oral feedback combined with a questionnaire and observational analysis [103]. Other approaches apply a standard instrument, or mix their own instrument with a standard one; examples include log-based assessment of computational creativity combined with the Torrance Test for Creative Thinking (TTCT) [49], the Map Test for Children [72], and the Perceived Difficulty Assessment Questionnaire (PDAQ) [88]. Automatic assessments such as Dr. Scratch [73] are used during the teaching and learning process in order to provide quick feedback to the learner. Alves et al. [3] performed a systematic mapping study of the automatic assessment of CT in the K-12 context and identified 14 approaches, which focus on the analysis of the code created by students to infer computational thinking competencies related to algorithms and programming. For them, automatic assessment tools can support teachers with their assessment and grading as well as guide students throughout their learning process [3]. Oluk and Korkmaz [83] use Dr. Scratch to measure students' CT abilities in comparison with the Computational Thinking Levels Scale (CTLS) [63].

Q5: Are there instructional assessments and feedback?
An appropriate assessment improves student learning and engagement and helps the whole educational system evolve. Alves et al. [3] highlight this as an essential part of the learning process. However, few of the selected studies emphasized the importance of formative assessment, summative assessment, and feedback. Some authors are looking for an instrument with psychometric properties; others are more interested in the process of learning and teaching CT, in a way that keeps students motivated by the technological practices [71]. Automatic assessment by code analysis can be part of some kinds of formative assessment, quickly giving clues about the decisions made by students to solve problems.
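As a minimal sketch of what rubric-style scoring from code analysis involves, the snippet below counts which kinds of blocks occur in a student's program and turns them into per-concept scores. The block names, concept mapping, and score thresholds are invented for illustration; they are not the actual Dr. Scratch rubric.

```python
# Minimal sketch of rubric-style scoring from code analysis (hypothetical
# block names and thresholds; not the actual Dr. Scratch rubric).
from collections import Counter

# Block categories taken as evidence for each CT concept (assumed mapping).
EVIDENCE = {
    "loops": {"repeat", "forever", "repeat_until"},
    "conditionals": {"if", "if_else"},
    "data_representation": {"set_variable", "change_variable", "list_add"},
    "parallelism": {"when_green_flag", "broadcast", "when_i_receive"},
}

def score_project(blocks):
    """Return a 0-2 score per CT concept based on how many distinct
    evidence blocks appear in the student's program."""
    counts = Counter(blocks)
    scores = {}
    for concept, evidence in EVIDENCE.items():
        distinct = sum(1 for block in evidence if counts[block] > 0)
        scores[concept] = min(distinct, 2)  # 0 = absent, 1 = basic, 2 = developing
    return scores

example = ["when_green_flag", "repeat", "if", "set_variable", "repeat"]
print(score_project(example))
# {'loops': 1, 'conditionals': 1, 'data_representation': 1, 'parallelism': 1}
```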


According to our first mapping study, REACT [60] uses an embedded assessment to help teachers carry out formative assessment and communicate students' progress. CTSiM [8–10] makes use of a mentor agent to give feedback to students during their interaction with the system. DISSECT [19, 79] applies four tests during the teaching-learning process, making adjustments to the student's performance possible. Fairy Assessment [67, 110] uses a survey, attendance records, and four tasks during the teaching-learning process to follow a student's performance and give them feedback. TDIA [118] uses scaffolding methods and three tests to follow the student's performance. In fact, formative assessment takes place in several practices, even when it is not explicitly declared [71]. Peer-to-peer (P2P) practices can be used for feedback as well as for formative and summative assessment [68, 102]. Basogain et al. [7] performed a P2P assessment, applying three assessment tools weekly in their course: self-assessment, a test, and P2P assessment. By assessing the work of others and comparing these assessments with the rubric assessments, students develop expertise in reading, interpreting, and evaluating the quality of written code; for the authors, this gives students a more active role in learning. Based on the IDC (interest-driven creator) model proposed by Chan et al. [21], Kong [62] proposed a seven-principle framework for guiding the design of a K-12 CT curriculum. According to the IDC theory, staging is the final step in the creation loop, aiming to provide opportunities for learners to receive feedback and encourage refinement; this model is therefore conceived to develop CT skills throughout the curriculum, with frequent feedback. Several articles present models to measure or assess CT with some psychometric rigor, or the potential for it. These models are analyzed in the next sections.

6.3 Measurement Approaches

Q6: How does the instrument assign weights in the assessment?
The most usual way to weight assessments is a test or questionnaire in which each item is either scored or rated on a qualitative variable; qualitative variables are used to weigh CT skills, and some instruments use a Likert scale to assess students' performance. Fronza et al. [38] calculated the cyclomatic complexity of each project and classified it as low, medium, or high. CTSiM [8–10] calculates a "vector distance model accuracy metric" to evaluate the difference between a correctness reference and the result presented. Doleck et al. [34] use a Likert scale. Rodriguez et al. [90] classified results as Proficient, Partially Proficient, and Unsatisfactory. Seiter and Foreman [40, 96] classify the assessment as Basic, Developing, or Proficient [71]. Rijke et al. [88] measure difficulty, cognitive load, and flow by means of standard instruments. Perceived difficulty was measured with the PDAQ (Perceived Difficulty Assessment Questionnaire) [88]: students are asked to rate the task on a four-point Likert scale, with a higher number representing more perceived difficulty, on three aspects (difficulty, length, and clarity). Cognitive load was measured by an adapted and translated version of the NASA Task Load Index [88] (…) by means of a five-point Likert scale. Flow was measured by items from the translated Flow Short Scale [88], an adaptation of an original instrument containing nine items about how students experienced the task; students rated the questions on a five-point Likert scale, with a higher number representing more perceived flow. This study provides information on the minimum age at which lessons about abstraction and decomposition are appropriate.
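The two weighting schemes that recur most often in these studies, summing correct answers on a dichotomously scored test and averaging ratings on a five-point Likert instrument, can be sketched as follows; the item data are invented for illustration.

```python
# Minimal sketch of the two most common weighting schemes found in the
# mapping: per-item 0/1 scoring summed to a total, and Likert-item means.
def dichotomous_score(answers, answer_key):
    """Total score = number of correct answers (0 to len(answer_key))."""
    return sum(1 for given, correct in zip(answers, answer_key) if given == correct)

def likert_mean(ratings, scale_min=1, scale_max=5):
    """Mean of Likert ratings, after checking they fall on the scale."""
    assert all(scale_min <= r <= scale_max for r in ratings), "rating off scale"
    return sum(ratings) / len(ratings)

# Invented example data (a 10-item test and a 6-item Likert questionnaire).
student_answers = ["a", "c", "b", "b", "d", "a", "c", "c", "b", "a"]
key             = ["a", "c", "d", "b", "d", "a", "b", "c", "b", "a"]
print(dichotomous_score(student_answers, key))  # 8
print(likert_mean([4, 5, 3, 4, 2, 5]))          # 3.8333...
```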


Yağcı [116] aimed to develop a scale to measure computational thinking skills (CTS) using a five-point Likert scale with a construct consisting of four factors (Problem Solving, Cooperative Learning & Critical Thinking, Creative Thinking, and Algorithmic Thinking) expressed by 42 items. He applied this instrument to a group of 785 high school students and concluded that the instrument provides evidence from evaluation (…) thus offering a picture of how CT skills develop as students' work progresses. Merino-Armeno et al. [72] used the Instructional Materials Motivation Survey (IMMS) to evaluate different elements of CT. This instrument consists of 36 items on a 5-point Likert scale, from 1 (totally disagree) to 5 (totally agree), and addresses four motivational dimensions: attention, relevance, confidence, and satisfaction. The Computational Thinking test (CTt) was developed by Román-González et al. [92]. The CTt has 28 items and addresses the following CT concepts: conditionals; defined/fixed loops; undefined/unfixed loops; simple functions; and functions with parameters/variables. The score is calculated as the sum of correct answers over the 28 items of the test (minimum 0 and maximum 28) [71]. Werner et al. [110] use the Fairy Assessment, an Alice program, to analyze two of the three parts of CT identified by the Carnegie Mellon Center for Computational Thinking (CMCCT): thinking algorithmically, and making effective use of abstraction and modeling. In their instrument, the maximum score is 30 and each task is graded on a scale from zero to ten. Depending on the objective of the assessment, there are therefore many ways to weight, ponder, or score the items of an instrument. Surveys and interviews give feedback to researchers and can be used for qualitative evaluation. Standard instruments are important for comparing data collected under a consolidated statistical reference. Researchers looking for a psychometric instrument need stronger methodological rigor than those who just aim to gather a data set to explore an educational procedure.

Q7: Are there psychometric bases in the assessment?
The first challenge in developing an instrument with psychometric properties is to define a theoretically reliable construct, which must model or represent the psychological (cognitive, emotional, or motivational) reality. Reliability depends on the quality of the theoretical construct; validity depends on the size of the sample (n) to which the test was applied in accordance with a sound statistical methodology. Only five studies showed psychometric properties or the potential for them. They are presented in Table 4. Román-González et al. [92] claim: "we have provided evidence of reliability and criterion validity of a new instrument for the assessment of CT, and additionally we expanded our understanding of the CT nature through the theory-driven exploration of its associations with other established psychological constructs in the cognitive sphere."

Table 4. Studies that show psychometric properties or potential.

CTSiM [8–10]: methodology: feedback by a mentor agent; instrument: application; score: vector distance model accuracy metric; psychometric rigor: some; n ≈ 100.
CTt [92]: instrument: test; score: items scored 0–1; psychometric bases: yes; psychometric rigor: good; n = 1251.
SDARE [22]: methodology: pre-/post-quizzes; instrument: multiple-choice test; psychometric rigor: some; n = 121.
[34]: methodology: pre- and post-test; instrument: computational thinking scale (questionnaire); score: items scored on a 5-point Likert scale ranging from 1 = Never to 5 = Always; psychometric rigor: some; n = 104.
Computational Thinking Self-Efficacy Scale [64]: instrument: scale with 18 items; psychometric bases: yes; psychometric rigor: some; n = 319.

This study developed the CT test (CTt), which showed the best performance in terms of psychometric rigor. It is important to highlight that CT practices do not consider only cognitive aspects; some studies also bring together reasoning, motivational, and interpersonal skills (e.g., [55, 116]).
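The reliability side of this psychometric rigor is usually reported as an internal-consistency coefficient. The sketch below computes Cronbach's alpha, one common such coefficient, from an invented respondents-by-items score matrix; the specific coefficients reported by each of the five studies are not detailed here.

```python
# Minimal sketch: Cronbach's alpha as one common index of internal-consistency
# reliability (data invented; the coefficient each study reports may differ).
import numpy as np

def cronbach_alpha(item_scores):
    """item_scores: 2-D array, rows = respondents, columns = items."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]                         # number of items
    item_vars = item_scores.var(axis=0, ddof=1)      # variance of each item
    total_var = item_scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Five respondents answering four dichotomously scored items (invented data).
data = [[1, 1, 1, 0],
        [1, 0, 1, 1],
        [0, 0, 1, 0],
        [1, 1, 1, 1],
        [0, 0, 0, 0]]
print(round(cronbach_alpha(data), 2))  # 0.79
```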

7 Discussion

Looking at the general data, as shown in Fig. 1, the earliest publication is from 2011, and about 77% of the publications are from 2016 until July 2019. In the last three and a half years this subject has thus been researched in greater depth, probably indicating that it is a new field. Based on Fig. 2, the most usual kind of research is implementation (59%); adding the studies classified as "Framework and Implementation" accounts for 75% of the publications, so the majority of articles describe some kind of CT educational practice. The studies that propose some theoretical framework, namely "Framework" (FW = 3), "Framework and Review" (FWR = 5), "Framework and Implementation" (FWI = 13), and "Framework and Comparison" (FWC = 2), account for 28% of the total publications. This could indicate that more theoretical support is needed for this subject, probably because it deals with multidisciplinary issues (developmental psychology, pedagogy, the K-12 curriculum, computer science education, new technologies, and the digital culture age). The research mirrors these various possibilities, and researchers are looking for consolidated references.

We found 14 different reviews related to CT teaching and learning practices that encompass K-12 and analyze some assessment approaches; adding our previous SMS [71] gives a total of 15. Nine of these have findings that intersect with our focus. Araújo et al. [5] (27 papers analyzed) and Martins-Pacheco et al. [71] (46 papers analyzed) presented systematic mapping reviews focusing on K-12 CT assessment. Alves et al. [3] (14 papers analyzed) performed a systematic mapping review focused on K-12 CT assessment based on automatic assessment by means of code analysis. These three reviews are most closely related to the focus of the present study. We also found relevant information in the following six reviews: Lye and Koh [70] and Kalelioglu et al. [57], which analyzed 125 papers, both performed a systematic research review concerning


a broad CT educational context. Shute et al. [97] (5 papers analyzed) performed a literature review aiming to define a framework for the K-16 context. Zhang and Nouri [117] (55 papers analyzed) performed a systematic review of learning CT through Scratch in the K-9 context. Buitrago Flórez et al. [18] (92 papers analyzed) performed a review of the broad educational context aiming to obtain an overview of the state of the art of teaching and learning programming in educational institutions, including CT in K-12. Hsu et al. [51] (120 papers analyzed) performed a literature review in a broad context with an emphasis on K-12. Ioannou and Makridu [53] (9 papers analyzed) performed a review of educational robotics in the K-12 context. Next, we briefly compare the findings of these reviews with the findings of the present work, taking into account the main findings related to questions Q1 to Q7.

Regarding Q1, "Which approaches exist and what are their characteristics?", we found that the majority of implementation studies approach "CT across the curriculum", meaning that CT teaching and learning practices happen in non-computer-science or non-programming disciplines, followed by practices with a robot and unplugged activities. Araújo et al. [5] found that "programming courses are the most common pedagogical approaches to promote CT for K-12 students". This differs from our findings, but "CT across the curriculum" and "practices with a robot" do not exclude programming practices. Buitrago Flórez et al. [18] found that "real-life experiences (e.g., LEGO and LEGO Mindstorms robot)" are commonly aligned with robot practices. Kalelioglu et al. [57] found that "the main topics covered in the papers composed of activities (computerised or unplugged) that promote CT in the curriculum", which converges with our findings. The most commonly used tools are Scratch, followed by Alice/Storytelling Alice and Python or VPython. Scratch is a block-based language and is likely the most interesting approach, because these environments are usually free, easy to use, and graphically appealing, and some of them offer automatic code analysis, making quick feedback possible [70]. These findings about tools partially overlap with what was found by Buitrago Flórez et al. [18], Lye and Koh [70], and Hsu et al. [51].

Concerning Q2, "Which theoretical, pedagogical foundations are used?", we can note the choice of constructivism and constructionism, followed by GBL or PBL and the learner-centered approach. These findings are aligned with what was found by Kalelioglu et al. [57], Buitrago Flórez et al. [18], Lye and Koh [70], and Hsu et al. [51]. The construction of games or game playing, as well as robot applications, can give learners the opportunity to play an active role in their own knowledge construction. It can also be an interesting way to associate higher cognitive processes, such as abstraction, with concrete results, besides allowing for some fun. In general, however, the articles analyzed did not examine the pedagogical approaches in depth, and many of them did not even show any concern about this issue [71]. Nevertheless, in this updated review we noticed more concern with pedagogical issues and child development (e.g., [18, 88]).

As for Q3, "Which concepts are assessed and how are they assessed?", the major occurrences we found are "algorithm", "abstraction", and "decomposition". These findings intersect with Araújo et al. [5] and Shute et al. [97]. But the greatest convergence among the studies seems to be the finding that there is a lack of consensus on CT conceptualization. CT aspects or concepts are represented by a great variety of terms,


and some of them are synonyms [71]. Depending on the chosen approach, for example top-down or PBL, the concepts are not divided into the smaller concepts that are common in traditional programming approaches. Sometimes the final results are more relevant, or they are analyzed together with other skills, as in Oluk and Korkmaz [83] and Kong et al. [61]. There are several ways to approach the definition of CT, which bring up several theoretical constructs and, consequently, many ways to assess.

Concerning Q4, "Which assessment methodology is used, and which instruments are used?", "pre- or post-test/survey/questionnaire" is the most usual assessment instrument, followed by "interviews" and the resulting "project/design/artifact". A single survey or questionnaire and matched-pair or paired-group designs are also common. These findings are aligned with our previous study [69] and, in part, with Zhang and Nouri [117] and Araújo et al. [5]. Tests, questionnaires, surveys, and paired groups are traditional means of collecting data for statistical analysis; they give us clues about students' performance and the effectiveness of the processes used [71].

Concerning Q5, "Are there instructional assessments and feedback?", as mentioned before, some implementations make use of instructional assessment by means of scaffolding, automatic code assessment, teacher interaction, and peer-to-peer feedback, among others, but studies frequently do not make explicit use of instructional assessments or feedback. Alves et al. [3], who focused on automatic assessment, claim that few approaches explicitly provide suggestions or tips on how to improve the code and/or use gamification elements such as badges. They also consider that a lack of consensus on assessment criteria and instructional feedback has become evident, as well as the need to extend such support to a wider variety of block-based programming languages.

Regarding Q6, "How does the instrument assign weights in the assessment?", a test or questionnaire in which each item is scored is the most usual method to weight assessments. Instruments sometimes make use of quantitative variables and at other times of qualitative variables, generally by means of a Likert scale. Standard instruments are used in some studies as a reference, for example in Román-González et al. [92] and Merino-Armeno et al. [72].

Finally, regarding Q7, "Are there psychometric bases in the assessment?", we found just one study with good psychometric rigor [92] and four others with some [10, 22, 34, 64]. It is still difficult to determine how to measure CT because there is no consensus about its definition, and consequently it is difficult to establish a reliable construct.

8 Conclusion

Wing [111] wrote in her seminal paper that "computational thinking involves solving problems, designing systems, and understanding human behavior, by drawing on the concepts fundamental to computer science. Computational thinking includes a range of mental tools that reflect the breadth of the field of computer science." We believe that when she expressed this viewpoint she did not intend to create a new approach to computer science education, especially because she did not support her opinion with cognitive science or educational foundations. Curiously, a few years after her paper, her idea became very useful due to the development of new user interfaces and the wide dissemination of digital devices. New technologies became very intuitive and


fun even for children to use in solving everyday problems. Since then, some initiatives have appeared to define the scope of CT (e.g., [26, 27]). On the other hand, Guzdial et al. [46] affirm that "there is still little evidence that knowing about computation improves everyday problem-solving, but there is no doubt that Wing's call to action led to a broad and dramatic response." Some researchers considered that the ill-defined concept of CT overlooked consolidated historical studies of teaching and learning computer science in K-12. The focus on the development of coding skills, such as in the practices built around the LOGO language [85, 86, 98], has shown that the other aspects involved in CT are incorporated through coding practices. In this sense, Tedre and Denning [100] note that "CT has a rich and broad history of many competing and complementing ideas. Many of its central ideas have been discovered, rediscovered, rebranded, and redefined over and over again. (…) Ignoring the history and the work of the field's pioneers diminishes the computational thinking movement rather than strengthening it." They also argue that the concept of CT predates Wing's idea, even though the "label" was not used, and others teach programming or computer science in K-12 without using the CT "label" at all (e.g., [52, 77, 94]).

Despite the skepticism of some authors, many researchers have become increasingly concerned with CT in recent years. Notwithstanding the lack of a consolidated general conceptualization and a precise definition, several important advances have been made, enriching educational experiences (e.g., [30]). In the last year we found that more attention has been given to pedagogical foundations and child development. There are several international efforts to include CT in curriculum practices in search of a common foundation. Toedte and Aydeniz [101], for example, performed a meta-analysis of CT curricular implementations since 2006. Their principal interest is in CT framings and implementations for K-12 audiences in disciplines other than computer science, thus fulfilling the broad appeal and utility originally prescribed by Wing. They found 38 papers that fit their criteria and are optimistic about the use of CT characteristics and principles, expressing the hope that the CT approach will become more commonplace in public education, be used more continuously through the primary and secondary grades, and be inventively applied across a more inclusive range of academic disciplines. They also considered pedagogical issues: according to them, more data from classroom CT implementations are needed to help determine which pedagogical CT mechanisms are effective for students at different levels of cognitive development. In this sense, Hennessey et al. [48] carried out a comprehensive analysis of Ontario elementary school curriculum documents, and the findings suggest that CT is already a relevant consideration for educators in terms of concepts and perspectives; however, CT practices should be more widely incorporated to promote 21st-century skills across disciplines. Weintrop et al. [109] proposed a taxonomy consisting of four main categories as part of a larger effort to promote CT in high school science and mathematics curricular materials. Juškevičienė and Dagienė [56] analyzed the relationship between CT and the Digital Competence (DC) proposed by the European Commission Science Hub. They found connections between them and conducted a study aimed at helping educators and educational policymakers make informed decisions about how CT and DC can be included in their local institutions.


In addition, Scherer et al. [95] recently performed a meta-analysis to test the transfer effects of learning computer programming on the improvement of cognitive skills. According to them, the results suggest that students who learned computer programming outperformed those who did not, both in programming skills and in other cognitive skills such as creative thinking, mathematical skills, metacognition, and reasoning; learning computer programming has certain cognitive benefits for other domains. This study is new and will probably be taken as an educational-psychological foundation to support new research on CT. The present scenario shows that researchers continue to seek clarity on a solid theoretical foundation for the field and on the articulation of best educational practices. We believe that rescuing good historical practices, while also considering new ones, can strengthen and consolidate these present educational demands.

Acknowledgments. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 and by the Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brasil (CNPq) - Grant No.: 302149/2016-3. The authors would like to thank Renata Martins Pacheco for her help with formatting and reviewing the English version of the final text.

References 1. Aggarwal, A., Gardner-McCune, C., Touretzky, D.: Evaluating the effect of using physical manipulatives to foster computational thinking in elementary school. In: Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education, SIGCSE 2017. ACM, Seattle (2017) 2. Aiken, J.M., et al.: Understanding student computational thinking with computational modeling. In: AIP Conference Proceedings, pp. 46–49. AIP, Sidney (2013) 3. Alves, N., Von Wangenheim, C., Hauck, J.: Approaches to assess computational thinking competences based on code analysis in K-12 education: a systematic mapping study. Inform. Educ. 18(1), 17–39 (2019) 4. Angeli, C., Valanides, N.: Developing young children’s computational thinking with educational robotics: an interaction effect between gender and scaffolding strategy. Comput. Hum. Behav. 105, 105954 (2019) 5. Araújo, A., Andrade, W., Guerrero, D.: A systematic mapping study on assessing computational thinking abilities. In: Frontiers in Education Conference (FIE). IEEE, Erie (2016) 6. Atmatzidou, S., Demetriadis, S.: Advancing students’ computational thinking skills through educational robotics: a study on age and gender relevant differences. Robot. Auton. Syst. 75, 661–670 (2016) 7. Basogain, X., Olabe, M., Olabe, J., Rico, M.: Computational Thinking in pre-university Blended Learning classrooms. Comput. Hum. Behav. 80, 412–419 (2018) 8. Basu, S., Biswas, G., Kinnebrew, J.S.: Learner modeling for adaptive scaffolding in a Computational Thinking-based science learning environment. User Model. User-Adap. Inter. 27(1), 5–53 (2017). https://doi.org/10.1007/s11257-017-9187-0 9. Basu, S., Biswas, G., Kinnebrew, J., Rafi, T.: Relations between modeling behavior and learning in a Computational Thinking based science learning environment. In: Proceedings of the 23rd International Conference on Computers in Education, pp. 184–189. ICCE, Hangzhou (2015)


10. Basu, S., Kinnebrew, John S., Biswas, G.: Assessing student performance in a computationalthinking based science learning environment. In: Trausan-Matu, S., Boyer, K.E., Crosby, M., Panourgia, K. (eds.) ITS 2014. LNCS, vol. 8474, pp. 476–481. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07221-0_59 11. Basu, S., Biswas, G., Sengupta, P., Dickes, A., Kinnebrew, J.S., Clark, D.: Identifying middle school students’ challenges in computational thinking-based science learning. Res. Pract. Technol. Enhanced Learn. 11(1), 1–35 (2016). https://doi.org/10.1186/s41039-016-0036-2 12. Bennett, V., Koh, K., Repenning, A.: Computing creativity: divergence in computational thinking. In: Proceeding of the 44th ACM technical symposium on Computer science education. ACM, Denver (2013) 13. Bilbao, J., Bravo, E., García, O., Varela, C., Rebollar, C.: Assessment of Computational Thinking Notions in Secondary School. Baltic J. Mod. Comput. 5(4), 391–397 (2017) 14. Blaschke, L.M.: Heutagogy and lifelong learning: a review of heutagogical practice and self-determined learning. Int. Rev. Res. Open Distrib. Learn. 13(1), 56–71 (2012) 15. Brackmann, C., Román-González, M., Robles, G., Moreno-León, J., Casali, A., Barone, D.: Development of computational thinking skills through unplugged activities in primary school. In: WiPSCE ‘17: Proceedings of the 12th . ACM Workshop on Primary and Secondary Computing Education, Nijmegen (2017) 16. Brennan, K., Resnick, M.: New frameworks for studying and assessing the development of computational thinking. In: Proceedings of the 2012 Annual Meeting of the American Educational Research Association. AERA, Vancouver (2012) 17. Brookhart, S., Nitko, A.: Educational Assessment of Students, 7th edn. Pearson, Des Moines (2015) 18. Buitrago Flórez, F., Casallas, R., Hernández, M., Reyes, A., Restrepo, S., Danies, G.: Changing a generation’s way of thinking: teaching computational thinking through programming. Rev. Educational Res. 87(4), 834–860 (2017) 19. Burgett, T., Folk, R., Fulton, J., Peel, A., Pontelli, E., Szczepanski, V.: DISSECT: Analysis of pedagogical techniques to integrate computational thinking into K-12 curricula. In: Frontiers in Education Conference (FIE). IEEE, El Paso (2015) 20. Carey, S., Zaitchik, D., Bascandziev, I.: Theories of development: in dialog with Jean Piaget. Dev. Rev. 38, 36–54 (2015) 21. Chan, T., Looi, C., Chang, B.: The IDC theory: creation and the creation loop, pp. 814–820 (2015) 22. Chen, G., Shen, J., Barth-Cohen, L., Jiang, S., Huang, X., Eltoukhy, M.: Assessing elementary students’ computational thinking in everyday reasoning and robotics programming. Comput. Educ. 109, 162–175 (2017) 23. Città, G., Gentile, M., Allegra, M., Arrigo, M., Conti, D., Ottaviano, S., Sciortino, M.: The effects of mental rotation on computational thinking. Comput. Educ. 141, 103613 (2019) 24. Comer, D.E., Gries, D., Mulder, M.C., Tucker, A., Turner, A.J., Young, P.R., Denning, P.J.: Computing as a discipline. Commun. ACM 32(1), 9–23 (1989) 25. Csizmadia, A., Standl, B., Waite, J.: Integrating the constructionist learning theory with computational thinking classroom activities. Inf. Educ. 18(1), 41–67 (2019) 26. CSTA & ISTE: Operational definition of computational thinking for K-12 education. Access in November 2017 (2011). http://csta.acm.org/Curriculum/sub/CurrFiles/CompTh inkingFlyer.pdf 27. CSTA: K-12 computer science standards. Access in November 2017 (2011). http://csta.acm. org/Curriculum/sub/CurrFiles/CSTA_K-12_CSS.pdf 28. 
CSTA K–12: Computer Science Framework. Access in November 2017 (2016). http://www. k12cs.org 29. Dagiene, V., Stupuriene, G.: Bebras - a sustainable community building model for the concept based learning of informatics and computational thinking. Inf. Educ. 15(1), 25–44 (2016)

462

L. H. Martins-Pacheco et al.

30. Dagien˙e, V., Stupurien˙e, G., Vinikien˙e, L.: Implementation of dynamic tasks on informatics and computational thinking. Baltic J. Mod. Comput. 5(3), 3016 (2017) 31. B. Daily, S., E. Leonard, A., Jörg, S., Babu, S., Gundersen, K., Parmar, D.: Embodying computational thinking: initial design of an emerging technological learning tool. Technol. Knowl. Learn. 20(1), 79–84 (2014). https://doi.org/10.1007/s10758-014-9237-1 32. Dixson, D., Worrell, F.: Formative and summative assessment in the classroom. Theor. Pract. 55(2), 153–159 (2016) 33. Djambong, T., Freiman, V.: Task-based assessment of students’ computational thinking skills developed through visual programming or tangible coding environments. In: Proceedings of the 13th International Conference on Cognition and Exploratory Learning in the Digital Age, pp. 41–51. CELDA, Mannheim (2016) 34. Doleck, T., Bazelais, P., Lemay, D.J., Saxena, A., Basnet, R.B.: Algorithmic thinking, cooperativity, creativity, critical thinking, and problem solving: exploring the relationship between computational thinking skills and academic performance. J. Comput. Educ. 4(4), 355–369 (2017). https://doi.org/10.1007/s40692-017-0090-9 35. Durak, H.: The effects of using different tools in programming teaching of secondary school students on engagement. Technol. Knowl. Learn. 25, 1–17 (2018) 36. Durak, H., Saritepeci, M.: Analysis of the relation between computational thinking skills and various variables with the structural equation model. Comput. Educ. 116, 191–202 (2018) 37. Feldhausen, R., Weese, J., Bean, N.: Increasing student self-efficacy in computational thinking via stem outreach programs. In: Proceedings of the 49th ACM Technical Symposium on Computer Science Education. ACM, Baltimore (2018) 38. Fronza, I., Ioini, N., Corral, L.: Teaching computational thinking using agile software engineering methods: a framework for middle schools. ACM Trans. Comput. Educ. (TOCE) 17(4), 1–28 (2017) 39. Gadanidis, G., Clements, E., Yiu, C.: Group theory, computational thinking, and young mathematicians. Math. Thinking Learn. 20(1), 32–53 (2018) 40. Gamma, J., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable Object Oriented Software. Addison-Wesley Longman Publishing Co., Boston (1995) 41. García-Peñalvo, F., Mendes, A.: Exploring the computational thinking effects in preuniversity education, pp. 407–411 (2018) 42. Garneli, V., Chorianopoulos, K.: Programming video games and simulations in science education: exploring computational thinking through code analysis. Interact. Learn. Environ. 26(3), 386–401 (2018) 43. Grgurina, N., Van Veen, K., Suhre, C., Barendsen, E., Zwaneveld, B.: Exploring students’ Computational thinking skills in modeling and simulation projects: a pilot study. In: ACM International Conference Proceeding Series, pp. 65–68. ACM, London (2015) 44. Grover, S., Basu, S., Bienkowski, M., Eagle, M., Nicholas, D., Stamper, J.: Framework for using hypothesis-driven approaches to support data-driven learning analytics in measuring computational thinking in block-based programming environments. ACM Trans. Comput. Educ. (TOCE) 17(3), 14 (2017) 45. Grover, S., Cooper, S., Pea, R.: Assessing computational learning in K-12. In: ITICSE’14. ACM, Uppsala (2014) 46. Guzdial, M., Kay, A., Norris, C., Soloway, E.: Computational thinking should just be good thinking. Commun. ACM 62, 28–30 (2019) 47. 
Haddaway, N.R., Collins, A.M., Coughlin, D., Kirk, S.: The role of Google Scholar in evidence reviews and its applicability to grey literature searching. PLoS ONE 10(9), 1–17 (2015) 48. Hennessey, E., Mueller, J., Beckett, D., Fisher, P.: Hiding in plain sight: identifying computational thinking in the ontario elementary school curriculum. J. Curriculum Teach. 6(1), 79–96 (2017)

Educational Practices in Computational Thinking

463

49. Hershkovitz, A., Sitman, R., Israel-Fishelson, R., Garaizar, P., Guenaga, M.: Creativity in the acquisition of computational thinking. Interact. Learn. Environ. 27, 1–17 (2019) 50. Hoover, A., et al.: Assessing computational thinking in students’ game designs. In: Proceedings of the 2016 Annual Symposium on Computer-Human. ACM, Austin (2016) 51. Hsu, T., Chang, S., Hung, Y.: How to learn and how to teach computational thinking: suggestions based on a review of the literature. Comput. Educ. 126, 296–310 (2018) 52. Hubwieser, P., Mühling, A.: Playing PISA with Bebras. In: Proceedings of the 9th Workshop in Primary and Secondary Computing Education, pp. 128–129. ACM, New York (2014) 53. Ioannou, A., Makridou, E.: Exploring the potentials of educational robotics in the development of computational thinking: a summary of current research and practical proposal for future work. Educ. Inf. Technol. 23(6), 2531–2544 (2018). https://doi.org/10.1007/s10639018-9729-z 54. Jenson, J., Droumeva, M.: Exploring media literacy and computational thinking: a game maker curriculum study. Electron. J. E-Learn. 14(2), 111–121 (2016) 55. Jiang, S., Wong, G.: Assessing primary school students’ intrinsic motivation of computational thinking. In: IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE), pp. 469–474. IEEE, Tai Po (2017) 56. Juškeviˇcien˙e, A., Dagien˙e, V.: Computational thinking relationship with digital competence. Inf. Educ. 17(2), 265–284 (2018) 57. Kalelioglu, F.: A new way of teaching programming skills to K-12 students: Code.org. Comput. Hum. Behav. 52, 200–210 (2015) 58. Kalelioglu, F., Yasemin, G., Kukul, V.: A framework for computational thinking based on a systematic research review. Baltic J. Mod. Comput. 4(3), 583–596 (2016) 59. Kite, V., Park, S.: BOOM BUST BUILD: teaching computational thinking and content through urban redevelopment. Sci. Teach. 85(3), 22 (2018) 60. Koh, K.H., Basawapatna, A., Nickerson, H., Repenning, A.: Real time assessment of computational thinking. In: Proceedings of IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC, pp. 49–52. IEEE, Melbourne (2014) 61. Kong, S., Chiu, M., Lai, M.: A study of primary school students’ interest, collaboration attitude, and programming empowerment in computational thinking education. Comput. Educ. 127, 178–189 (2018) 62. Kong, S.-C.: A framework of curriculum design for computational thinking development in K-12 education. J. Comput. Educ. 3(4), 377–394 (2016). https://doi.org/10.1007/s40692016-0076-z 63. Korkmaz, Ö., Çakir, R., Özden, M.Y.: A validity and reliability study of the computational thinking scales (CTS). Comput. Hum. Behav. 72, 558–569 (2017) 64. Kukul, V., Karatas, S.: Computational thinking self-efficacy scale: development. Validity Reliab. Inf. Educ. 18(1), 151–164 (2019) 65. Lee, E., Park, J.: Challenges and perspectives of CS education for enhancing ICT literacy and computational thinking in Korea. Indian J. Sci. Technol. 9(46), 1–13 (2016) 66. Lee, V.R., Recker, M.: Paper circuits: a tangible, low threshold, low cost entry to computational thinking. TechTrends 62(2), 197–203 (2018). https://doi.org/10.1007/s11528-0170248-3 67. Linn, M.C.: The cognitive consequences of programming instruction in classrooms. Educ. Res. 14(5), 14–16+25–29 (1985) 68. Liu, N., Carless, D.: Peer feedback: the learning element of peer assessment. Teach. High. Educ. 11(3), 279–290 (2006) 69. 
Looi, C., How, M., Longkai, W., Seow, P., Liu, L.: Analysis of linkages between an unplugged activity and the development of computational thinking. Comput. Sci. Educ. 28(3), 255–279 (2018)

464

L. H. Martins-Pacheco et al.

70. Lye, S.Y., Koh, J.H.L.: Review on teaching and learning of computational thinking through programming: what is next for K-12? Comput. Hum. Behav. 41, 51–61 (2014) 71. Martins-Pacheco, L., von Wangenheim, C., Alves, N.: Assessment of computational thinking in K-12 context: educational practices, limits and possibilities - a systematic mapping study. In: Proceedings of the 11th International Conference on Computer Supported Education, pp. 292–303. CSEDU2019 (2019) 72. Merino-Armero, J., González-Calero, J., Cózar-Gutiérrez, R., Villena Taranilla, R.: Computational thinking initiation. An experience with robots in Primary Education. J. Res. Sci. Math. Technol. Educ. 1(2), 181–206 (2018) 73. Moreno-León, J., Robles, G., Román, M.: Dr. Scratch: automatic analysis of scratch projects to assess and foster computational thinking. RED. Revista de Educación a Distancia 46, 1–23 (2015) 74. Moreno-León, J., Robles, G., Román-González, M.: Comparing computational thinking development assessment scores with software complexity metrics. In: Global Engineering Education Conference (EDUCON), pp. 1040–1045. IEEE, Abu Dhabi (2016) 75. Moreno-León, J., Román-González, M., Harteveld, C., Robles, G.: On the automatic assessment of computational thinking skills: a comparison with human experts. In: Proceedings of the 2017 CHI Conference Extended, pp. 2788–2795. Denver (2017) 76. Moreno-León, J., Robles, G., Román-González, M., Rodríguez, J.D.: Not the same: a text network analysis on computational thinking definitions to study its relationship with computer programming. RIITE. Revista Interuniversitaria de Investigación en Tecnología Educativa 7, 26–35 (2019). https://doi.org/10.6018/riite.397151 77. Mühling, A., Ruf, A., Hubwieser, P.: Design and first results of a psychometric test for measuring basic programming abilities. In: WiPSCE 2015. ACM, London (2015) 78. Neilson, D., Campbell, T.: ADDING MATH TO SCIENCE: Mathematical and computational thinking help science students make sense of realworld phenomena. Sci. Teach. 86(3), 26 (2018) 79. Nesiba, N., Pontelli, E., Staley, T.: DISSECT: exploring the relationship between computational thinking and English literature in K-12 curricula. In: Proceedings - Frontiers in Education Conference, FIE. IEEE, El Paso (2015) 80. Newley, A., Kaya, E., Deniz, H., Yesilyurt, E.: Celebrity statues: Learning computational thinking by designing biomimetic robots. (Making Middle). Sci. Scope 42(1), 74–81 (2018) 81. NRC: Report of a Workshop on the Scope and Nature of Computational Thinking. NRC, Committee for the Workshops on Computational Thinking. The National Academies Press, USA (2010) 82. NRC: Report of a Workshop of Pedagogical Aspects of Computational Thinking. National Research Council, Committee for the Workshops on Computational Thinking. The National Academies Press, USA (2011) 83. Oluk, A., Korkmaz, Ö.: Comparing students’ scratch skills with their computational thinking skills in terms of different variables. Online Submission 8(11), 1–7 (2016) 84. Ouyang, Y., Hayden, K., Remold, J.: Introducing computational thinking through nonprogramming science activities. In: Proceedings of the 49th ACM Technical Symposium on Computer Science Education. ACM, Baltimore (2018) 85. Papert, S.: Mindstorms: Children, Computers, and Powerful Ideas. Basic Books, New York (1980) 86. Papert, S.: Situating constructionism. In: Harel, I., Papert, S. (eds.) Constructionism. Ablex, Norwood (1991) 87. 
Petersen, K., Feldt, R., Mujtaba, S., Mattsson, M.: Systematic mapping studies in software engineering. In: Proceedings of the 12th International Conference on Evaluation. BCS Learning & Development Ltd., Swindon (2008)

Educational Practices in Computational Thinking

465

88. Rijke, W., Bollen, L., Eysink, T., Tolboom, J.: Computational thinking in primary school: an examination of abstraction and decom-position in different age groups. Inform. Educ. 17(1), 77 (2018) 89. Rodrigues, R., Andrade, W., Campos, L.: Can computational thinking help me? A quantitative study of its effects on education. In: Frontiers in Education Conference (FIE). IEEE, Erie (2016) 90. Rodriguez, B., Kennicutt, S., Rader, C., Camp, T.: Assessing computational thinking in CS unplugged activities. In: Proceedings of the Conference on Integrating Technology into Computer Science Education, ITiCSE, SIGCSE 2017, Seattle (2017) 91. Román-González, M.: Computational thinking test: design guidelines and content validation. In: 7th International Conference on Education and New Learning Technologies (EDULEARN). EDULEARN Proceedings, Barcelona (2015) 92. Román-González, M., Pérez-González, J.-C., Jiménez-Fernández, C.: Which cognitive abilities underlie computational thinking? Criterion validity of the computational thinking test. Comput. Hum. Behav. 72, 678–691 (2017) 93. Rowe, E., Asbell-Clarke, J., Gasca, S., Cunningham, K.: Assessing implicit computational thinking in zoombinis gameplay. In: Proceedings of the 12th International Conference on the Foundations of Digital Games, FDG 2017. ACM, Hyannis (2017) 94. Salomon, G., Perkins, D.: Transfer of cognitive skills from programming: when and how? J. Educ. Comput. Res. 3, 149–170 (1987) 95. Scherer, R., Siddiq, F., Viveros, B.S.: The cognitive benefits of learning computer programming: a meta-analysis of transfer effects. J. Educ. Psychol. 111(5), 764–792 (2019). https:// doi.org/10.1037/edu0000314 96. Seiter, L., Foreman, B.: Modeling the learning progressions of computational thinking of primary grade students. In: Proceedings of the Ninth Annual International ACM Conference on International Computing Education Research. ACM, La Jolla (2013) 97. Shute, V., Sun, C., Asbell-Clarke, J.: Demystifying computational thinking. Educ. Res. Rev. 22, 142–158 (2017) 98. Soloway, E., Spohrer, J.: Studying the Novice Programmer. Psychology Press (1988) 99. Sung, W., Ahn, J., Black, J.B.: Introducing computational thinking to young learners: practicing computational perspectives through embodiment in mathematics education. Technol. Knowl. Learn. 22(3), 443–463 (2017). https://doi.org/10.1007/s10758-017-9328-x 100. Tedre, M., Denning, P.: The long quest for computational thinking. In: Proceedings of the 16th Koli Calling International Conference on Computing Education Research. ACM (2016) 101. Toedte, R.J., Aydeniz, M.: Computational thinking and impacts on K-12 science education. In: 2015 IEEE Frontiers in Education Conference (FIE), pp. 1–7. IEEE (2015). https://iee explore.ieee.org/abstract/document/7344239 102. Topping, K.J.: Peer assessment. Theor. Pract. 48(1), 20–27 (2009). https://doi.org/10.1080/ 00405840802577569 103. Turchi, T., Fogli, D., Malizia, A.: Fostering computational thinking through collaborative game-based learning. Multimed. Tools Appl. 78(10), 13649–13673 (2019). https://doi.org/ 10.1007/s11042-019-7229-9 104. Vallance, M., Towndrow, P.A.: Pedagogic transformation, student-directed design and computational thinking. Pedagogies Int. J. 11(3), 218 (2016) 105. von Wangenheim, C.G., Alves, N.C., Rodrigues, P.E., Hauck, J.C.: Teaching computing in a multidisciplinary way in social studies classes in school – a case study. Int. J. Comput. Sci. Educ. Schools 1(2), 1–14 (2017) 106. 
von Wangenheim, C.G., von Wangenheim, A., Pacheco, F.S., Hauck, J.C.R., Ferreira, M.N.F.: Teaching physical computing in family workshops. ACM Inroads 8(1), 48–51 (2017) 107. Webb, D.C.: Troubleshooting assessment: an authentic problem solving activity for it education. Proc.-Soc. Behav. Sci. 9, 903–907 (2010)

466

L. H. Martins-Pacheco et al.

108. Weintrop, D., et al.: Interactive assessment tools for computational thinking in high school STEM classrooms. In: Reidsma, D., Choi, I., Bargar, R. (eds.) INTETAIN 2014. LNICST, vol. 136, pp. 22–25. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08189-2_3 109. Weintrop, D., et al.: Defining computational thinking for mathematics and science classrooms. J. Sci. Educ. Technol. 25(1), 127–147 (2015). https://doi.org/10.1007/s10956-0159581-5 110. Werner, L., Denner, J., Campe, S., Kawamoto, D.: The fairy performance assessment: measuring computational thinking in middle school. In: Proceedings of the 43rd ACM Technical Symposium on Computer Science Education. ACM, Raleigh (2012) 111. Wing, J.M.: Computational thinking. Commun. ACM 49(3), 33–35 (2006) 112. Witherspoon, E., Higashi, R., Schunn, C., Baehr, E., Shoop, R.: Developing computational thinking through a virtual robotics programming curriculum. ACM Trans. Comput. Educ. (TOCE) 18(1), 1–20 (2017) 113. Witherspoon, E., Schunn, C.: Teachers’ goals predict computational thinking gains in robotics. Inf. Learn. Sci. (2019). https://doi.org/10.1108/ILS-05-2018-0035 114. Wolz, U., Stone, M., Pearson, K., Pulimood, S., Switzer, M.: Computational thinking and expository writing in the middle school. ACM Trans. Comput. Educ. 11(2), 1–22 (2011) 115. Worrell, B., Brand, C., Repenning, A.: Collaboration and computational thinking: a classroom structure. In: Proceedings of IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC, pp. 183–187. IEEE, Atlanta (2015) 116. Ya˘gcı, M.: A valid and reliable tool for examining computational thinking skills. Educ. Inf. Technol. 24(1), 929–951 (2018). https://doi.org/10.1007/s10639-018-9801-8 117. Zhang, L., Nouri, J.: A systematic review of learning by computational thinking through Scratch in K-9. Comput. Educ. 141, 103607 (2019) 118. Zhong, B., Wang, Q., Chen, J., Li, Y.: An exploration of three-dimensional integrated assessment for computational thinking. J. Educ. Comput. Res. 53(4), 562–590 (2016)

Virtual Reality and Virtual Lab-Based Technology-Enhanced Learning in Primary School Physics

Diana Bogusevschi1(B) and Gabriel-Miro Muntean2

1 Dundalk Institute of Technology, Dundalk, Ireland
[email protected]
2 Dublin City University, Dublin, Ireland
[email protected]

Abstract. A European Horizon 2020 NEWTON project technology-enhanced learning (TEL) application, Water Cycle in Nature, and its benefits in terms of learner experience and learner satisfaction, as well as usability and knowledge gain, were validated as part of a small-scale pilot carried out in a primary school in Dublin. The Water Cycle in Nature application focuses on precipitation formation and the relevant physics phenomena, such as vaporization, condensation and evaporation. The application contains 3D immersive computer-based virtual reality and virtual laboratory simulations. Two classes were involved in the pilot: one control class of 29 boys who were taught the educational content in a teacher-based approach lesson, and one experimental class of 29 boys who interacted with the NEWTON project application. Following the knowledge pre- and post-tests, the control class was also exposed to the TEL Water Cycle in Nature application, in order to assess and compare learner experience and satisfaction between the control class and the experimental class. A detailed analysis of learner experience and learner satisfaction, and a comparison between the two participating classes, is presented in this extended paper. The TEL application's usability and benefits in terms of knowledge gain were also evaluated. The obtained results show good outcomes in usability and learner experience for both classes, with the control class reporting slightly better results. The knowledge gain analysis between the two approaches showed that students benefited more in learning outcomes when interacting with a teacher. The excitement of the experimental group students towards the game might have created a barrier in terms of learning improvement, suggesting that the NEWTON application will serve better as a revision tool under the supervision and guidance of the teacher and as a support tool for the teacher.

Keywords: Technology-enhanced learning · Virtual Reality · Virtual laboratory · Computer-based learning · STEM · Primary education

© Springer Nature Switzerland AG 2020 H. C. Lane et al. (Eds.): CSEDU 2019, CCIS 1220, pp. 467–478, 2020. https://doi.org/10.1007/978-3-030-58459-7_22


1 Introduction

1.1 Related Works

Subjects in science, technology and maths (STEM) are currently suffering from an increasing lack of interest from students, from primary level through to third-level institutions. It is very important to capture learners' attention to STEM subjects early on, starting in primary school, and to encourage them to pursue these subjects in future education. Numerous studies have been performed on this, such as investigating content and language integrated learning in the OpenSimulator Project (CLILiOP), focusing on Geography by employing Virtual Reality (VR) [1], showing better cognitive results for the experimental group of learners and a higher employment of geographical terms in knowledge post-tests, compared to the control class. VR is also employed in the field of primary school mathematics, investigating the benefits of an OpenSimulator VR environment combined with game-based learning, showing significant improvement when using the combination of the two techniques [2].

Another TEL technique employed in STEM subjects is Augmented Reality (AR), showing much potential in attracting learners of all ages and levels to science-related education, such as in [3], where three different AR systems, TinkerLamp, Tapcarp and Kaleidoscope, were employed for geometry teaching in a primary school setting, showing great usability and integration into the classroom. In [4], AR was employed in an informal environment at a mathematics exhibition, where participants of various ages took part, including primary level students, showing that all AR-enhanced exhibits performed significantly better in terms of knowledge acquisition and retention compared to non-AR exhibits. The benefits of combining AR with game-based learning are investigated in [5], presenting multiple benefits on both knowledge gain and learning motivation, specifically in STEM education. The concept of game and technology-based primary level maths teaching is also investigated in [6], where its benefits are observed from the point of view of both teachers and learners.

Modern education is trying to stay in line with technology, with, for example, iPads being employed in many schools for all subjects, including maths. In [7], their effect on learners' attitude and motivation toward maths is investigated, showing a positive influence from both angles. The use of tablets in education is also investigated in [1], specifically for teaching plants to primary level students, showing a positive impact on knowledge acquisition and improvement in both collaborative and independent learning for the experimental group. Mobile devices have also been combined with collaborative learning in [8], presenting major benefits for primary school learners. The use of tablets in primary and secondary education, including for STEM subjects, has been investigated in [9], where most of the examined case studies showed positive learning outcomes. Such primary education benefits are also shown in [10], improving access to content and increasing engagement from pupils.

This book chapter represents an extended version of the paper presented at CSEDU in 2019 [11], providing a more detailed description of the results. It describes the use of the Horizon 2020 NEWTON project computer-based application, Water Cycle in Nature, in a primary school in Dublin, Ireland, focusing on physics, specifically on precipitation formation.
A research study with two 5th primary school classes was carried out, one control and one experimental, examining the benefits of the Water Cycle in Nature application in terms of knowledge gain, learner experience and usability. The next two sub-sections give a brief description of the Horizon 2020 NEWTON project and the Water Cycle in Nature application. The case study is described in Sect. 2, followed by the obtained results in Sect. 3. The summary of the paper and its conclusions are presented in Sect. 4.

1.2 Overview of European Horizon 2020 NEWTON Project

The Horizon 2020 NEWTON project aims to design, develop and deploy innovative solutions for TEL, including innovative pedagogies such as adaptive multimedia and multisensorial content delivery mechanisms [12–14], personalisation and gamification solutions [15, 16], Virtual Labs (VL) [17], fabrication labs [18, 19], and game-based, problem-based, game-oriented, and flipped-classroom-based learning [20–22]. All NEWTON project solutions are employed using the NEWTON project technology-enhanced learning platform, NEWTELP, which was developed by NEWTON consortium partner SIVECO, Bucharest, Romania. The NEWTON project platform, NEWTELP, is to be used by teachers for course creation, learning outcome assessment and qualitative assessment of learner motivation and satisfaction, and by students, with the primary focus on learning course material and completing knowledge tests and questionnaires in school and at home [23].

1.3 Water Cycle in Nature Application

The Water Cycle in Nature application was developed by NEWTON project consortium partner SIVECO in Romania and is one of many applications employed in small and large-scale technology-enhanced educational pilots with the objective of assessing learner satisfaction and knowledge gain. The Water Cycle in Nature application is a VL combined with VR technology, where the content is explored by students through immersive multimedia 3D simulations in two separate settings: a Nature Environment for presenting and an Experimental VL environment [24, 25] for reinforcing the previously presented definitions, such as vaporisation, evaporation, boiling and condensation.

2 Case Study Description

2.1 Evaluation Methodology

The Water Cycle in Nature application was employed in a primary school in Dublin, Ireland, St. Patrick's Boys' National School (BNS), where two 5th classes, with 29 pupils in each class, aged from 10 to 11 years old, participated and were randomly assigned as the control and experimental groups. Ethics approval was obtained from the DCU Ethics Committee and this evaluation meets all ethics requirements.

The experimental class took part in the NEWTON approach lesson, where the Water Cycle in Nature application was employed. The control class participated in a classic approach lesson, developed by the NEWTON Project team, ensuring that the educational content in both lessons, classic approach and NEWTON approach, was identical and was provided by their usual teacher.
Pre-tests were carried out in both classes before their respective lessons, assessing the participating pupils' knowledge level of the topic. Following each lesson, post-tests were provided to students, investigating knowledge gain. Following all the compulsory steps of the small-scale study (pre-test, classic approach lesson, post-test), the control group also interacted with the Water Cycle in Nature application, allowing the NEWTON project researchers to compare learner satisfaction between the classic approach and the NEWTON approach. Following the interaction with the application, both classes completed a Learner Satisfaction Questionnaire (LSQ), containing the following seven questions:

– Q1 - The video game and the experiments that I did in the lab from the video (this is called a virtual lab!) helped me to better understand vaporisation and condensation processes;
– Q2 - The video game and the experiments that I did in the virtual lab helped me to learn easier about the vaporisation and condensation processes;
– Q3 - I enjoyed this lesson that included the video game and the experiments in the virtual lab;
– Q4 - The experiments that I did in the virtual lab made the lesson more practical;
– Q5 - The video game distracted me from learning;
– Q6 - I would like to have more lessons that include video games and doing experiments in virtual labs;
– Q7 – Do you have any Comments or Suggestions [25].

2.2 Data Collection

For the experimental group, the Water Cycle in Nature application was uploaded on NEWTON Project laptops, as the school PCs did not have the necessary specifications to support the application. After the post-test knowledge assessment was performed, the control class also interacted with the application. The DCU NEWTON Project researchers supervised the experimental approach, providing support and helping students when necessary, and collecting all the paper-based knowledge tests and LSQs. The classic approach lesson was carried out simultaneously with the experimental approach lesson, by the usual control class teacher. Student IDs were employed for both groups, in order to ensure anonymization. The knowledge pre-test and post-test contained seven and eight questions respectively, each with a maximum of 10 points [24].

3 Results

3.1 Learner Experience and Learner Satisfaction

The learner experience was assessed based on LSQ questions Q1 to Q6. A 5-point Likert scale was employed in the LSQ, with the following answer options: Strongly Agree (SA), Agree (A), Neutral (N), Disagree (D) and Strongly Disagree (SD). One control group student did not provide answers to questions Q4, Q5 and Q6.
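The chapter does not state how the per-question response distributions reported in Figs. 1–6 were tabulated; the short sketch below illustrates one way such percentages could be computed from raw LSQ answers. The function, variable names and sample responses are hypothetical, for illustration only.

```python
from collections import Counter

# Hypothetical raw answers to one LSQ question from one class (illustrative only).
q1_experimental = ["SA", "A", "N", "A", "SA", "D", "N", "A", "SA", "SD",
                   "A", "N", "SA", "A", "A", "N", "SA", "D", "A", "SA",
                   "N", "A", "SA", "A", "N", "SA", "A", "D", "SA"]

def likert_percentages(answers, scale=("SA", "A", "N", "D", "SD")):
    """Return the percentage of respondents who chose each Likert option."""
    counts = Counter(answers)
    total = len(answers)
    return {option: round(100 * counts.get(option, 0) / total, 2) for option in scale}

print(likert_percentages(q1_experimental))
# -> {'SA': 31.03, 'A': 34.48, 'N': 20.69, 'D': 10.34, 'SD': 3.45}
```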


It can be observed in Fig. 1 that a much higher percentage of control group students agreed that the Water Cycle in Nature application VL helped them better understand the vaporisation and condensation processes (Q1), at nearly 83% (37.93% providing an SA answer and 44.83% an A answer), compared to just under 52% for the experimental class, where 24.14% Strongly Agreed and 27.59% Agreed. This might point to the fact that the NEWTON approach worked better as a revision tool for the control class, as the learners from this group were first presented with the relevant physics topics during a classic approach lesson, by their usual teacher.

[Pie charts: St. Patrick's Experimental Group and Control Group LSQ response distributions (SA/A/N/D/SD) for Question Q1.]

Fig. 1. Experimental vs. Control groups LSQ results: Question Q1 – “The video game and the experiments that I did in the lab from the video (this is called a virtual lab!) helped me to better understand vaporisation and condensation processes”.

A higher percentage of control group students also considered that the video game and the experiments helped them to learn more easily about the vaporisation and condensation processes (Q2), as presented in Fig. 2. 27.59% of experimental group students provided an SA answer and 41.38% an A answer, totalling just under 69%, compared to over 86% of control group learners, with 48.28% Strongly Agreeing and 37.93% Agreeing. This also points to a higher efficiency of the application as a revision tool, rather than an introduction tool.

A similar percentage of learners in both classes enjoyed the application, as per their answers to Question Q3 presented in Fig. 3. Both the experimental and control classes provided 62.07% SA answers to the statement "I enjoyed this lesson that included the video game and the experiments in the virtual lab", while 24.14% of experimental group students and 31.03% of control group students agreed with this statement. This leads to a total of around 86% for the experimental group and around 93% for the control group.

The same percentage of students (75.86%) in both classes thought that the Water Cycle in Nature application virtual experiments made the lesson more practical, as per the answers to Question Q4, which are illustrated in Fig. 4.


[Pie charts: St. Patrick's Experimental Group and Control Group LSQ response distributions (SA/A/N/D/SD) for Question Q2.]

Fig. 2. Experimental vs. Control groups LSQ results: Question Q2 – “The video game and the experiments that I did in the virtual lab helped me to learn easier about the vaporisation and condensation processes”.

[Pie charts: St. Patrick's Experimental Group and Control Group LSQ response distributions (SA/A/N/D/SD) for Question Q3.]

Fig. 3. Experimental vs. Control groups LSQ results: Question Q3 – “I enjoyed this lesson that included the video game and the experiments in the virtual lab”.

The difference was that in the experimental group, 20.7% of students Strongly Agreed with this, whereas in the control group a higher percentage of students had the same opinion – 37.3% Strongly Agreed.

Question Q5 seemed to have created some confusion. The question itself has a negative connotation – "The video game distracted me from learning" – and some students answered Agree or Strongly Agree, as these answers were positive and the overall impression of the students was a positive one.
It is therefore believed that some of the Strongly Agree and Agree answers were meant to be Strongly Disagree and Disagree. Nevertheless, looking at the actual obtained results, in the experimental group 17.24% of students Strongly Agreed and 6.9% Agreed, compared to the control group where 13.79% of students Strongly Agreed and 10.34% Agreed, giving a total of just over 24% for each group, experimental and control (Fig. 5).

[Pie charts: St. Patrick's Experimental Group and Control Group LSQ response distributions (SA/A/N/D/SD) for Question Q4.]

Fig. 4. Experimental vs. Control groups LSQ results: Question Q4 – “The experiments that I did in the virtual lab made the lesson more practical”.

[Pie charts: St. Patrick's Experimental Group and Control Group LSQ response distributions (SA/A/N/D/SD) for Question Q5.]

Fig. 5. Experimental vs. Control groups LSQ results: Question Q5 – “The video game distracted me from learning”.

The vast majority of participating students would like to have more lessons that include video games and doing experiments in virtual labs, as seen in the answers to Question Q6, presented in Fig. 6. In the experimental group, students split their answers between Strongly Agree, Agree and Neutral, whereas in the control group only Strongly Agree and Agree answers were present. The experimental group reported 93% of students providing positive answers – 65.52% with SA answers and 27.59% with A answers. The control group had a higher percentage of students with SA answers – 75.86% – and a lower percentage of students with A answers – 20.69%.

[Pie charts: St. Patrick's Experimental Group and Control Group LSQ response distributions (SA/A/N/D/SD) for Question Q6.]

Fig. 6. Experimental vs. Control groups LSQ results: Question Q6 – “I would like to have more lessons that include video games and doing experiments in virtual labs”.

3.2 Water Cycle in Nature Application Usability

The usability of the Water Cycle in Nature application was assessed using the provided LSQ Q7 answers. In each class, 11 students (37.9%) chose not to provide an answer to this question. The majority of the remaining 62.1% provided positive comments, such as "Good lesson", "I enjoyed it and I would do it again", "Very fun and help me learn about vaporization and condensation", "Very fun and easy to learn definitely want to learn more things like this as normal school is boring" in the control class, and "It was Awesome", "It was really fun it helped me to learn about vaporization and condensation", "It was amazing and the best I love learning that way", "I think was so cool thank you so much" in the experimental class.

It appeared that the control class was somewhat more positive in their comments, as they used the application to review what had already been presented to them in the classic approach lesson and were using the NEWTON approach lesson more as a revision tool.
Some of the experimental class students perceived parts of the Water Cycle in Nature application as slightly boring, expressing the hope for a more gamified experience and for more freedom in the VL in terms of experiments to carry out. Some students provided comments on the audio track, suggesting a more engaging voice-over.

3.3 Learning Outcome

The learning impact was assessed using the pre-tests provided to both classes prior to their respective lessons (the classic approach for the control class and the NEWTON approach for the experimental class) and the post-tests employed after each lesson. The average grades for the pre-test and post-test are presented in Fig. 7.

[Bar chart: average pre-test and post-test grades (out of 10 points) – experimental group: 3.46 (pre-test) and 3.98 (post-test); control group: 2.69 (pre-test) and 5.03 (post-test).]

Fig. 7. Average Pre and Post-test Mean grades for the experimental and control groups [11].

It can be seen that the experimental class had a slightly higher average pre-test grade. A t-test between the two pre-tests showed that the experimental group's higher average pre-test mark was not statistically significant, at α = 0.05 (t(56) = 1.7423, p = 0.087). Following the classic and NEWTON approach lessons, the average grades improved for both classes. The experimental class exhibited a 15.21% improvement, with 55% of students achieving improved grades, whereas the control group had a much higher knowledge gain, at over 85%, with 82% of students achieving improved grades. The experimental group's improvement was not statistically significant, at α = 0.05 (t(28) = 1.243, p = 0.2239), while the grade improvement for the control group was statistically significant, at α = 0.05 (t(28) = 5.0517, p = 0.0001).
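The chapter does not specify which statistical software produced these t-test results; the sketch below shows how comparisons of this kind could be reproduced with SciPy, assuming hypothetical per-student grade arrays (all values and variable names are illustrative, not the study data).

```python
import numpy as np
from scipy import stats

# Hypothetical per-student grades out of 10 (illustrative only, not the study data).
exp_pre = np.array([3.0, 4.0, 2.5, 3.5, 4.5, 3.0, 2.0, 4.0, 3.5, 4.5])
exp_post = np.array([3.5, 4.0, 3.0, 4.0, 5.0, 3.5, 2.5, 4.5, 4.0, 5.0])
ctrl_pre = np.array([2.5, 3.0, 2.0, 3.5, 2.5, 3.0, 2.0, 2.5, 3.5, 2.5])
ctrl_post = np.array([5.0, 5.5, 4.0, 6.0, 4.5, 5.5, 4.0, 5.0, 6.0, 5.0])

# Independent-samples t-test comparing the two groups' pre-test grades.
t_pre, p_pre = stats.ttest_ind(exp_pre, ctrl_pre)

# Paired t-tests comparing post-test vs. pre-test within each group.
t_exp, p_exp = stats.ttest_rel(exp_post, exp_pre)
t_ctrl, p_ctrl = stats.ttest_rel(ctrl_post, ctrl_pre)

# Relative improvement of the class average, as a percentage of the pre-test mean.
gain_exp = 100 * (exp_post.mean() - exp_pre.mean()) / exp_pre.mean()
gain_ctrl = 100 * (ctrl_post.mean() - ctrl_pre.mean()) / ctrl_pre.mean()

print(f"Pre-test comparison: t = {t_pre:.3f}, p = {p_pre:.4f}")
print(f"Experimental gain: {gain_exp:.2f}% (t = {t_exp:.3f}, p = {p_exp:.4f})")
print(f"Control gain: {gain_ctrl:.2f}% (t = {t_ctrl:.3f}, p = {p_ctrl:.4f})")
```

Note that the within-group paired t-tests require each student's pre- and post-test scores to be matched, which is consistent with the anonymised student IDs described in Sect. 2.2.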

4 Conclusions

The primary school small-scale TEL pilot carried out as part of the European Horizon 2020 NEWTON Project and described in this book chapter investigates the usability, learner experience and learner satisfaction, as well as the knowledge gain, associated with the project's Water Cycle in Nature application.
The small-scale educational pilot was conducted in St. Patrick's BNS in Dublin, Ireland, where two classes participated: one class exposed to the NEWTON project application as the experimental group and the other class, where the classic teacher-based approach was used, as the control group. Both participating groups were provided knowledge tests before and after their respective lessons, teacher-based or computer-based, in order to assess the learning outcomes for each teaching approach. Following the post-tests after the classic approach lesson, the control class was also exposed to the Water Cycle in Nature application. Each class also completed an LSQ, investigating Learner Experience, Learner Satisfaction and Application Usability.

In terms of knowledge gain, both classes showed improvement; however, it was observed that the control class performed much better compared to the experimental class. The experimental class appeared much more excited about the Water Cycle in Nature application, which might have created a barrier to achieving significant knowledge gain, as they were more interested in the visual aspects of the application rather than its educational content. The classic approach presentation was provided to the control group by their usual teacher, who allowed additional questions from students during the classic approach, which inadvertently created an advantage for the control class. In the NEWTON approach lesson, the experimental class did not ask any questions, as they were very focused on the application's impressive immersive multimedia 3D simulation and not on the actual educational content. It also has to be noted that during the experimental approach the usual teacher was not present, which might have introduced a very high sense of freedom for the experimental group learners, enabling them to concentrate only on the game, which was a very new school experience for them, rather than on learning. This case study reinforced the perception that teachers' leadership is extremely valuable during TEL lessons [26].

Very good results were observed in terms of Learner Experience and Application Usability. The control class reported the application as much more useful when learning about precipitation formation, compared to the experimental group. This might be due to the fact that, having been presented the topic in a classic approach manner first, the control group considered the NEWTON Project Water Cycle in Nature application as a revision tool, as they were already familiar with the content. The control group also reported more positive results in a few aspects evaluated in the LSQ, such as reporting a better understanding of the content presented in the Water Cycle in Nature application Virtual Lab, compared to the experimental group. The control class also reported finding learning through a virtual lab easier than the experimental group did, which is another indication that, having already been familiarised with the content in the previous classic approach lesson, the control group perceived the TEL application as a revision tool. Both groups reported similar results for enjoying the application, with a slightly higher result obtained from the control class.
It should also be noted that the control class had already carried out the knowledge tests, and the students knew that, after interacting with the Water Cycle in Nature application, they did not have to complete any more assessment steps, which probably increased their sense of enjoyment. Both the control and the experimental groups of participating students reported that the Virtual Lab embedded in the NEWTON project application made the lesson more practical.
All control group students and over 90% of the experimental group students reported wanting to have more TEL approach lessons similar to the one employed in this small-scale pilot. The findings of this pilot confirm the importance of the teacher's presence during TEL-based lessons; their guidance and support would increase knowledge gain as well as learner motivation and learner satisfaction.

Following the LSQ comments provided by the two groups, the application was updated and localised, and it was employed as part of a large-scale pilot in several European countries (Ireland, Slovakia and Romania) as part of the Earth Course [27–29], which is provided to students using multiple NEWTON project technologies and which, following the findings of the small-scale pilot presented in this chapter, will also assess the NEWTON approach as both an introductory and a revision tool.

References

1. Fokides, E., Zamplouli, C.: Content and language integrated learning in OpenSimulator project. Results of a pilot implementation in Greece. Educ. Inf. Technol. 22(4), 1479–1496 (2017)
2. Kim, H., Ke, F.: Effects of game-based learning in an OpenSim-supported virtual environment on mathematical performance. Interact. Learn. Environ. 25(4), 543–557 (2017)
3. Cuendet, S., Bonnard, Q., Do-Lenh, S., Dillenbourg, P.: Designing augmented reality for the classroom. Comput. Educ. 68, 557–569 (2013)
4. Sommerauer, P., Muller, O.: Augmented Reality in informal learning environments: a field experiment in a mathematics exhibition. Comput. Educ. 79, 59–68 (2014)
5. Pellas, N., Fotaris, P., Kazanidis, I., Wells, D.: Augmenting the learning experience in primary and secondary education: a systematic review of recent trends in augmented reality game-based learning. Virtual Reality 23, 329–346 (2018)
6. Misfeldt, M., Zacho, L.: Supporting primary-level mathematics teachers' collaboration in designing and using technology-based scenarios. J. Math. Teacher Educ. 19, 227–241 (2016)
7. Hilton, A.: Engaging primary school students in mathematics: can iPads make a difference? Int. J. Sci. Math. Educ. 16(1), 145–165 (2016). https://doi.org/10.1007/s10763-016-9771-5
8. Iglesias Rodríguez, A., García Riaza, B., Sanchez Gomez, M.C.: Collaborative learning and mobile devices: an educational experience in Primary Education. Comput. Hum. Behav. 72, 664–677 (2017)
9. Haßler, B., Major, L., Hennessy, S.: Tablet use in schools: a critical review of the evidence for learning outcomes. J. Comput. Assist. Learn. 32, 139–156 (2016)
10. Domingo, M.G., Gargante, A.B.: Exploring the use of educational technology in primary education: teachers' perception of mobile technology learning impacts and applications' use in the classroom. Comput. Hum. Behav. 56, 21–28 (2016)
11. Bogusevschi, D., Muntean, G.-M.: Water cycle in nature – an innovative virtual reality and virtual lab: improving learning experience of primary school students. In: Proceedings of the 11th International Conference on Computer Supported Education (CSEDU) (2019)
12. Zou, L., Trestian, R., Muntean, G.-M.: E3DOAS: balancing QoE and energy-saving for multi-device adaptation in future mobile wireless video delivery. IEEE Trans. Broadcast. 63(1), 26–40 (2018)
13. Bi, T., Pichon, A., Zou, L., Chen, S., Ghinea, G., Muntean, G.-M.: A DASH-based mulsemedia adaptive delivery solution. In: ACM Multimedia Systems Conference (MMSys), International Workshop on Immersive Mixed and Virtual Environment Systems (MMVE), Amsterdam, The Netherlands (2018)


14. Togou, M., Bi, T., Muntean, G.-M.: Enhancing Students' Learning Experience with a Dash-Based Multimedia Delivery System. In: World Congress on Education (WCE), Dublin, Ireland (2019)
15. Lynch, T., Ghergulescu, I.: Large Scale Evaluation of Learning Flow. Timisoara, Romania (2017)
16. Ghergulescu, I., Zhao, D., Muntean, G.-M., Muntean, C.: Improving learning satisfaction in a programming course by using course-level personalisation with NEWTELP. In: 14th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP), Larnaca, Cyprus (2019)
17. Ghergulescu, I., Moldovan, A.-N., Muntean, C., Muntean, G.-M.: Atomic Structure Interactive Personalised Virtual Lab: Results from an Evaluation Study in Secondary Schools. In: Proceedings of the 11th International Conference on Computer Supported Education (2019)
18. Togou, M.A., Lorenzo, C., Lorenzo, E., Cornetta, G., Muntean, G.-M.: Raising students' interest in STEM education via remote digital fabrication: an Irish primary school case study. In: EDULEARN Conference, Palma de Mallorca, Spain, July 2018
19. Togou, M., Cornetta, G., Muntean, G.-M.: NEWTON fab lab initiative: attracting K-12 European students to STEM education through curriculum-based fab lab. In: 11th International Conference on Education and New Learning Technologies (2019)
20. El Mawas, N., et al.: Final frontier game: a case study on learner experience. In: Proceedings of the 10th International Conference on Computer Supported Education (CSEDU), Madeira, Portugal (2018)
21. Muntean, C.H., El Mawas, N., Bradford, M., Pathak, P.: Investigating the impact of a 3D immersive computer-based math game on the learning process of undergraduate students. In: IEEE Frontiers in Education Conference (FIE), pp. 1–8, San Jose, CA, USA (2018)
22. Zhao, D., Chis, A.E., Choudhary, N., Makri, E.G., Muntean, G.-M., Muntean, C.H.: Improving learning outcome using the NEWTON loop game: a serious game targeting iteration in Java programming course. In: EDULearn, Palma de Mallorca, Spain (2019)
23. Montandon, L., et al.: Multi-dimensional approach for the pedagogical assessment in STEM technology enhanced learning. In: EdMedia World Conference on Educational Media and Technology, Amsterdam, The Netherlands, June 2018
24. Bogusevschi, D., Tal, I., Bratu, M., Gornea, B., Caraman, D., Ghergulescu, I., Muntean, G.-M.: Water cycle in nature: small-scale STEM education pilot. In: EdMedia World Conference on Educational Media and Technology, Amsterdam, The Netherlands (2018)
25. Bogusevschi, D., Bratu, M., Ghergulescu, I., Muntean, G.-M.: Primary school STEM education: using 3D computer-based virtual reality and experimental laboratory simulation in a physics case study. In: Ireland International Conference on Education, Dublin, Ireland, April 2018
26. Bogusevschi, D., Maddi, M., Muntean, G.-M.: Teachers' impact and feedback related to technology enhanced learning in STEM education in primary and secondary schools. In: Ireland International Conference on Education (IICE 2019), Dublin, Ireland (2019)
27. Bogusevschi, D., Muntean, C.-H., Gorgi, N., Muntean, G.-M.: Earth course: primary school large-scale pilot on STEM education. In: EDULearn, Palma de Mallorca, Spain (2018)
28. Bogusevschi, D., Muntean, C.H., Muntean, G.-M.: Earth course: knowledge acquisition in technology enhanced learning STEM education in primary school. In: World Conference on Educational Media Technology (EdMedia), Amsterdam, The Netherlands (2019)
29. Bogusevschi, D., Finlayson, O., Muntean, G.-M.: Learner motivation case study in STEM education technology enhanced learning. In: European Science Education Research Association (ESERA), Bologna, Italy (2019)

Increasing Parental Involvement in Computer Science Education Through the Design and Development of Family Creative Computing Workshops

Nina Bresnihan1, Glenn Strong1(B), Lorraine Fisher1, Richard Millwood1, and Áine Lynch2

1 Trinity College Dublin, Dublin, Ireland
{Nina.Bresnihan,Glenn.Strong,Richard.Millwood}@tcd.ie
2 National Parents Council (Primary), Dublin, Ireland
[email protected]

Abstract. The importance of parental involvement (PI) in education is well established. However, the growth of CS Education at K12 level in recent years has raised questions about parents' confidence and competence in engaging in this area with their children because of their own lack of CS knowledge and skills. This paper outlines the development and evaluation of family workshops designed to increase parental confidence and competence in partaking in CS activities with their primary-school aged children (5–13). A number of design principles were identified through a comprehensive needs analysis, leading to the development of a model for family creative-computing workshops. The evaluation aimed primarily to investigate the effect of the strategy on the participants' confidence in partaking in creative computing activities with their families. Positive results were found for the sample investigated, in whom the mean confidence level rose significantly. Participants reported satisfaction with and enjoyment of the strategy, particularly the inter-family collaboration and the creation of concrete artefacts. Confidence in their ability to organise such activities also rose significantly and a number of the participating families went on to do so. The identification and evaluation of the design principles for family CS activities, as well as the workshop model, should be of interest to other researchers and practitioners interested in improving PI in CS Education.

Keywords: K-12 education · Computer science education · Parental involvement

1 Introduction

OurKidsCode is a joint project between Trinity College, Dublin and the National Parents Council (Primary) designed to promote and encourage Computer Science (CS) education by supporting parents in Ireland who wish to engage their
children's interest and activity in computing. The importance of parental involvement (PI) in education is well established. However, the growth of CS Education at K12 level in recent years has exposed questions over parents' confidence and competence in engaging in this with their children because of their own lack of CS knowledge and skills. Despite this, there is evidence that parents value CS education, and are interested in supporting and encouraging their children's engagement with it.

The research described in this paper forms part of an exploratory study into what level and form of assistance parents require in order to better support their children in their CS education. The design, development and evaluation of family workshops where parents and children experience the creative use of computing together was initially reported [7]. This chapter expands on this by clearly outlining the identification of design principles for family CS activities and providing more details of their implementation, along with reporting our efforts to evaluate the sustained impact of those workshops.

The workshops were evaluated for their ability to increase parents' competence and confidence with digital skills and tools as they endeavour to support their children's learning, and for whether this led to any sustained involvement in CS activities. The project aims to have impact at a national scale through partnerships to deliver the workshops throughout Ireland.

2 Background and Context

2.1 CS Education in Ireland

In recent years, a worldwide skills shortage in the technology sector has led to a groundswell of interest in CS Education at K12 level. This has also been fed by the idea of Computational Thinking (CT) being a valuable 21st-century skill for all, particularly as an approach to problem-solving [50,51]. While England has introduced Computing as a compulsory subject for 5–16 year olds [8] and countries across the EU have established national curricula [27], Ireland has lagged behind somewhat. At present, an optional short course in coding is available for junior cycle students, while Computer Science as a subject for the state terminal exam, the Leaving Certificate, has been piloted since September 2018. It is currently available across 40 schools with plans for a national roll-out in September 2020. At primary level, the National Council for Curriculum and Assessment is exploring the introduction of coding through its Coding in Primary Schools Initiative [33] but no national policy is yet in place.

2.2 Parental Involvement

Parental involvement (PI) has been defined by the OECD as "parents' active commitment to spend time to assist in the academic and general development of their children" [5, p. 13]. Its impact on children's education is well established, with research consistently finding a strong association between PI and both higher cognitive and non-cognitive outcomes [5,12,13,19,35].
PI can be either school-based or home-based [5,21,22], with school-based PI tending towards more structured activities such as parent-teacher meetings, school events and volunteering in school. Home-based PI tends to be less structured and can include non-academic as well as academically-oriented activities [32].

While the positive impact of PI is not in dispute, there is an acknowledgement that it is a complex area involving the interplay of many factors and variables. Quality as well as quantity is important, and there is extensive literature available on how to improve both. Indeed, there is evidence to show that when parents participate in specific programs aimed at increasing their involvement, improvements are seen. This can lead to positive impacts in overall achievement [42]; reading, writing, and mathematics skills [15,25,46]; homework completion [9]; US statewide assessment scores [43]; and behaviour [30,37]. While the research is generally focused on school-based activities and interventions and often takes the form of evaluating specific interventions, a number of theoretical models have also been developed [23]. Particularly influential has been Epstein's 6 types of parental involvement: parenting; communication; volunteering; home tutoring; involvement in decision-making; and collaboration with the community [15]. This model emphasises the shared responsibilities of schools, families, and communities working together to meet the needs of children. This emphasis is also evident in Hornby & Lafeale's widely used model of barriers to parental involvement in education, which includes individual parent and family factors [23].

2.3 Parental Involvement and CS Education

Internationally, there is a body of evidence that parents highly value CS education. Over 90% of US parents want their child to learn more CS in the future, with two-thirds thinking that it should be required learning in schools [17]. In Ireland, 2 in 3 parents believe it to be as important as mainstream subjects despite its current lack of availability in schools [16]. This desire for CS education among parents, and their willingness to support it, is also clearly demonstrated by the huge success of the CoderDojo movement, where parents are required to accompany their children as well as forming the bulk of volunteer mentors. CoderDojo originated in Ireland in 2011 and there are now more than 1,900 verified Dojos across 93 countries [1].

Despite this, many parents have little experience in computing, technology or education, and struggle to facilitate the learning experiences of a child who has an interest in CS. These individual parent and family barriers are of particular concern because of the importance of parents' involvement and support to CS education [10]. More specifically, Kong et al. [29] argue for the importance of parents as feedback providers in programming education. In addition, there is some evidence to suggest that parents can directly influence learning when they choose to engage in coactivity with their children [48]. However, research revealed little in the way of specific programmes to address this, with some notable exceptions such as MIT's Family Creative Learning programme [41] and the exploration in [6] of family participation in a museum-based maker space.


Parents' influence on their children's educational choices is also crucial [24,36], and 90% of Irish parents agree that their awareness of future career opportunities is an important factor in encouraging the choice of STEM subjects. However, 68% reported feeling 'moderately', 'poorly' or 'very poorly' informed on STEM career opportunities and industry needs [3]. This limits the ability of parents to talk to their children about CS. Clarke-Midura et al. [10] argue that these conversations can be vital for recruiting youth into CS and that informal CS programs should explore ways to involve parents in their child's CS learning experience. Correspondingly, one of the key recommendations of the Accenture report is that parents be supported, and it explicitly recommends working with the national parents' association to explore how "we create the learning and the understanding of the benefits of STEM" [3, p. 18]. While this report focuses on second-level education, our decision to target parents of primary-school children through the National Parents Council Primary (NPC), the national representative organisation for parents of children attending primary school, is based upon research which shows that the earlier the PI occurs, the greater the impact [47].

2.4 Summary

The evidence shows that parents are interested in supporting and encouraging their children's engagement with computing education. With the introduction of computing at primary and second level in Ireland, it is clear that PI (e.g., being able to help a primary school student with their homework or a second-level student with career choices) is of growing importance. Clearly there is potential for parents/guardians to play a significant role in sparking and supporting interest in coding and CT. What is lacking is support for those who wish to undertake these kinds of initiatives but who feel they lack confidence, knowledge and skills. There is a strong rationale for supports that help parents guide their children in learning skills essential for success at school, as well as providing a creative outlet for critical thinking and collaborative problem solving.

3 Research Aims

The research forms part of an exploratory study into what level and form of assistance parents require in order to better support their children in their CS education. Accordingly, the following steps were undertaken:

1. The identification, through a Needs Analysis and a review of the literature, of clear design principles for an intervention aimed at increasing parental confidence and competence in CS activities.
2. The development and implementation of the intervention based on those design principles.
3. The evaluation of whether the implementation of the identified design principles led to increased confidence in pursuing similar activities in the future.


These steps are described in the sections that follow.

4 Workshop Design

4.1 Needs Analysis

The project undertook a comprehensive needs analysis prior to workshop design. A set of mixed-methods online questionnaires was circulated to parents and children, with 1,228 responses received from adults and 405 from children aged 5–13. This was further supported by a focus group of domain experts (n = 5). The Needs Analysis concluded that, while parents and children are comfortable with using computers for more passive activities, they do not have the experience to engage with more creative computing activities such as programming, media creation, or physical computing. However, they did report a good level of confidence in their ability to learn new things and expressed a clear interest in learning more creative applications of computing with their families [7]. This evidence supported our hypothesis that any supports developed should be collaborative within families and that families are ready and able to benefit from such interventions. In particular, these findings informed the decision to provide creative computing workshops for families designed to increase confidence and encourage self-sustaining communities of practice. Furthermore, this process provided strong support for the project rationale and led to the development of a series of clear objectives for the project:

1. To develop, publish and promote a workshop model that will help parents/guardians who feel they lack the skills and knowledge to integrate technology into their children's learning activities.
2. To provide direct support for parents/guardians who wish to encourage children's engagement with coding and CT at an early stage of their education.
3. To encourage and enable parents/guardians to learn and develop their own involvement with creative computing by increasing their confidence and knowledge.

4.2 Learning Approach

The aim of the workshops was to increase parents' competence and confidence with digital skills and tools as they endeavour to support their children's learning. It was therefore important for participants to take ownership of their learning, thereby increasing their confidence to pursue similar activities in the future. Accordingly, we chose Constructivism's student-centered approach, in which workshop leaders are not simply information providers but facilitators of students' knowledge construction. In particular, as the workshops are aimed at families rather than individuals, the design is informed by Social Constructivist theories of learning [4], where knowledge construction is viewed as a social and cultural process and social interaction is crucial to the learning outcomes.


Within this paradigm, the workshops follow a Constructionist design, with the participants working in their families towards the creation of a meaningful tangible artefact [39]. This aims both to introduce them to computing concepts and to strengthen connections within the family [26]. However, we also sought to encourage a community of practice beyond each family. We therefore looked to formalising the sharing of activity outcomes between the families and to collaborative reflection [40], which can demonstrate the value of a wider community by providing greater clarity on issues than can be perceived individually, thereby strengthening the connections between families.

4.3 Design Challenges and Solutions

A number of challenges relating to the anticipated modes of delivery were identified as part of the workshop design process. Identifying solutions to these design challenges (DCs) also influenced the approach taken to the workshop design. These challenges are summarised in Table 1.

Table 1. Design challenges.

DC1. Design challenge: The workshops will be delivered throughout Ireland by a variety of facilitators, primarily the NPC training staff and CoderDojo mentors, but also other interested parties.
Proposed solution: We recognise that the diversity of backgrounds and experiences of these facilitators means that the workshops must not require the facilitators to have extensive technical knowledge. A need for support materials for the organisation and running of the workshops was thus identified, both to assist facilitators and to help participating families solve problems as they encounter them in the tasks.

DC2. Design challenge: The NPC trainers are typically contacted by schools to deliver training for parents in an after-school context.
Proposed solution: The OurKidsCode workshops are therefore designed to be deliverable in this same way. The entire workshop experience must be designed to fit into the 60–90 min that are usually available for after-school activities.

DC3. Design challenge: The constructivist nature of the workshops means that the workshops involve groups of people engaging in physical activities for which they need space.
Proposed solution: A maximum of 20 participants, or at most 5–6 families, is suggested so that neither the facilitators nor the available physical space are over-stretched.

DC4. Design challenge: The workshop activities should be suitable for continuation in the home.
Proposed solution: The necessary materials should thus be available to families.


4.4 Design Principles

Based on the needs analysis and learning approach, five design principles (DPs) for the workshops were identified; they are described in Table 2.

Table 2. Design principles.

DP1. The workshops should consist of structured, creative activities which lead to the collaborative making of a meaningful artefact.
Rationale: The internal structure of the activities is intended to give both guidance to facilitators and families, and a sense of progression through the activity. Participants with low levels of confidence can trust the structure to keep the workshop activity moving towards a demonstrable goal. Building a meaningful artefact gives a sense of satisfaction and achievement.

DP2. The workshops should tap into the participants' desire, clearly expressed in the needs analysis, to use computers as creative tools.
Rationale: The workshop contents should contain playful and creative activities to give participants a sense of fun.

DP3. The workshops should be collaborative within families.
Rationale: Children and parents should be encouraged and supported to engage in inter-generational learning in order to maximise PI quality.

DP4. The workshops must bring multiple families together to encourage inter-family support and communication.
Rationale: This supports the desire to help foster the development of self-sustaining communities.

DP5. The workshops should encourage the idea of pursuing further activity, as a family unit or along with other families.
Rationale: Supporting a desire to have families engage in further activity, whether as a community or otherwise.

5 Workshop Implementation

5.1 Workshop Model

Inspiration for the workshop model was found in the Bridge21 model, a team-based, technology-mediated learning model shown to be a pragmatic model for effective twenty-first-century learning [31]. It pointed to the importance of a clear and consistent workshop structure in order to assist in keeping families on track, but was adapted to better implement the identified workshop Design Principles. Table 3 lists the workshop phases and notes how those phases address the relevant Design Principles.

Table 3. Workshop model: phases and rationales.

Setup
Description: The physical environment is set up to enable the rest of the workshop. The facilitator makes refreshments available to the participants, distributes materials, and helps ensure the equipment is working.
Rationale: Debugging issues such as wi-fi connectivity in this phase avoids interfering with the workshop activities. Meanwhile, participants begin to talk casually over refreshments. [DP4, DP5]

Introduction
Description: The facilitator briefly explains the workshop model to ground the families.
Rationale: This sets the scene, and helps focus the participants on the process as well as the content. [DP1]

Ice-breaker
Description: All participants take part in an 'unplugged' ice-breaker activity specific to the creative activity planned for the session. These activities are physical (participants stand up and move around as part of the activity).
Rationale: This phase introduces the creative task, and allows families to be more at ease with each other, thus facilitating peer learning during the next phase. [DP5, DP2]

Create
Description: A creative technical challenge is given to the participants, forming the main part of the workshop. The challenges combine coding and 'making' activities and are designed to encourage family members to take on different roles during the completion of the challenge.
Rationale: Families are encouraged to collaborate both within and between family groups, and to take on varying roles as they work on the challenges. [DP1, DP2, DP3, DP4, DP5]

Share
Description: Families share their creations in a structured way (a tournament or showcase).
Rationale: Bringing the families together at the end gives a sense of achievement and fulfillment. [DP2, DP4]

Reflect
Description: All participants sit in a circle and share what they have enjoyed and learned, encouraging discussion of future plans.
Rationale: Improves the learning by offering an opportunity to say out loud what was learnt and to evaluate strengths and weaknesses. Setting the agenda for further work and making commitments to future engagement is a part of this phase. [DP4, DP5]


Each workshop follows the same model, with workshop-specific content being provided for the "Icebreaker", "Create" and "Share" phases. A facilitator trained by the project team guides the families through the workshops.

Support Materials. Responding to DC2 and DC4, the workshop facilitators are supported by materials prepared by the project:

– A supporting website, which contains descriptions of the workshop model and downloadable copies of the cards, and provides support and encouragement for families to undertake follow-on activities.
– The "OurKidsCode cards", a series of A5 printed activity, support, and technical cards which provide both facilitators and participants with guidance. The cards are designed to give participants clear stepped instructions for activities along with troubleshooting advice. They also give facilitators guidelines for running the workshop and provide rationale and background information as well as ideas for follow-on activities. A sample card can be seen in Fig. 1.

The supporting cards are divided into five groups (coded by both colour and title for ease of identification during the workshop). Each workshop makes use of a specific set of cards, which the facilitator organises ahead of the workshop proper.

Fig. 1. Example OurKidsCode card.


• Reference cards (pink) help explain the project ideas and principles, and are used by the facilitator ahead of time to ensure they are familiar with the project structure and goals.
• Setup cards (purple) help facilitators organise the workshop, and include information about the Icebreaker which begins the workshop activities.
• Workshop cards (yellow) guide the workshop participants; each workshop has step-by-step instructions on A5 cards, one set of which is given to each family.
• Example cards (green) contain explanations of specific topics and can be referenced by families during the workshops to aid understanding when family members wish to investigate a topic further than the workshop strictly requires; the Workshop cards reference these occasionally.
• Technical cards (blue) help guide the use of technology during the workshop, and are referenced by the Workshop cards at suitable times (for example, card T2 gives an illustrated guide to visiting and signing in to the Scratch web site). This allows the Workshop cards to focus on the specific workshop activity.

5.2 Sample Workshop

The following description of a typical OurKidsCode workshop provides an illustration of how the model works in practice. This workshop sets families the challenge of creating an artefact incorporating music and dance. The workshop uses the Makey Makey [11], a tool for constructing physical interfaces, and the Scratch [2] programming language. Families build a series of switches using paper plates and aluminium kitchen foil which connect to the Makey Makey and are used to trigger sound effects from a computer via Scratch. Families are then invited to prepare a set of instructions (or dance moves) that allow a participant to play a short tune, activating the switches by stepping on them. Families then exchange instructions to experience each other's creations. The detailed structure of the workshop follows the standard OurKidsCode phase model. The facilitator's guide suggests approximate timings for each phase, allowing for the fact that flexibility will be needed during delivery. It is recognised that the facilitator must be able to respond to the time available for the overall workshop activity, and to accommodate the fact that the families are involved in creative activities. It is not considered essential that families complete the entire Create phase so long as they can participate in some way in the Share phase and feel a sense of achievement. The content of the phases is described below.


Setup: The families settle in, and the facilitator makes sure everyone has refreshments. The facilitator also ensures that there is physical space to allow for the dance activities to come, and that the participants' computers can access the online resources for the workshop. (c.10 min)

Introduction: The facilitator briefly introduces the overall workshop model, and explains the timeline for the rest of the session. (c.5 min)

Ice-breaker: The facilitator introduces the dance mat idea, and shares the basic rules of the challenge. The families then work on paper to design a set of dance steps following the guidance of a Setup card, and then challenge another family to perform the dance by following the instructions. This is intended to reduce inhibitions between families by sharing dance steps and participating in a physical activity. This phase also introduces the idea of giving orders to be followed in a specific sequence. The entire Icebreaker activity is 'unplugged' (i.e. performed without computers), as a key objective of this phase is to help participants become more at ease with each other so that they are more likely to share knowledge and support during the following Create phase. (c.10 min)

Create: This phase begins with the facilitator giving a short introduction to the Makey Makey and explaining the basic idea of closing a circuit (a technical card gives the families further support). A simple hand-holding activity gives participants a concrete demonstration of how circuits work with the Makey Makey. The families are encouraged to explicitly identify "Maker", "Coder", and "Planner" activities and roles via a Setup card, giving them some guidance in structuring the family effort. Family members are encouraged to exchange roles during the activity so that participants do not become stuck in a single role. Families then set to work, following the guide of the Workshop card, to make a set of paper-plate "switches" and connect them to the Makey Makey (see Fig. 8 for examples of finished products). Families are encouraged to decorate and personalise the 'switches' as they make them. This encourages creativity and play during the workshop, and also provides opportunities for family members of all ages to make enjoyable and meaningful contributions, even in cases where they are not directly interested in coding. During this phase a program is built in the Scratch programming language which plays selected musical notes on the key presses that will be triggered using the Makey Makey. This session is generally characterised by exploratory coding and physical activity as families investigate the capabilities and needs of the Makey Makey platform. Simple musical composition and choreography are included in the design of the challenge, allowing families to integrate a range of creative interests. (c.40 min)

Share: Families are invited to try each other's dances, typically with a 'caller' from the family who designed the challenge giving instructions to a participant from another family activating the switches, in a mirror of the Icebreaker activity. (c.10 min)

Reflect: Families discuss and share experiences, and the facilitator encourages discussion and planning for future activities. (c.5 min)
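The Create phase's coding element is built in Scratch blocks, so there is no textual program to reproduce here. Purely as an illustration of the underlying logic (each Makey Makey input arrives as an ordinary key press and is mapped to a note, with a simple guard against a held switch repeating a note), the following is a minimal Python sketch; the key-to-note mapping and the debounce threshold are hypothetical and not part of the workshop materials.

```python
# Illustrative sketch only (the workshop itself uses Scratch blocks):
# map Makey Makey key presses to notes and suppress rapid repeats.
from dataclasses import dataclass

# The Makey Makey's default inputs are the arrow keys and space;
# the note assignments below are hypothetical examples.
KEY_TO_NOTE = {"up": "C4", "down": "D4", "left": "E4", "right": "F4", "space": "G4"}
DEBOUNCE_SECONDS = 0.3  # hypothetical threshold for ignoring a held switch


@dataclass
class KeyEvent:
    key: str
    time: float  # seconds since the start of the performance


def play(note: str) -> None:
    # Stand-in for Scratch's note-playing block.
    print(f"playing {note}")


def perform(events: list) -> None:
    last_played = {}  # key -> time of the last note it triggered
    for event in events:
        note = KEY_TO_NOTE.get(event.key)
        if note is None:
            continue  # key not wired to a paper-plate switch
        # Ignore repeats of the same key within the debounce window.
        if event.time - last_played.get(event.key, -1.0) < DEBOUNCE_SECONDS:
            continue
        last_played[event.key] = event.time
        play(note)


# A short simulated "dance": stepping on up, up again (held), right, space.
perform([KeyEvent("up", 0.0), KeyEvent("up", 0.1),
         KeyEvent("right", 0.8), KeyEvent("space", 1.5)])
```

In the workshop the equivalent behaviour is expressed with Scratch's key-press and note-playing blocks, which the Workshop and Technical cards walk families through.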

6 Workshop Evaluation

6.1 Methodology

In order to evaluate the successful completion of the research aims, the following research questions were addressed: (a) did the workshops effectively implement the identified design principles, and (b) did this lead to increased confidence in pursuing similar activities in the future? Data collection instruments were therefore designed to investigate changes in participants' confidence by recording self-reported confidence pre- and post-workshop. They also captured the participants' perceptions of the workshop and changes to their future intentions. These factors were evaluated using a three-cycle action research approach. Each cycle led to minor refinements in the implementation process and improvements in the supporting materials. The methods and results are described below.

6.2 Data Collection and Analysis

Multiple mixed-methods research instruments were used in the evaluation. The data collection instruments administered are described in Table 4.

Table 4. Data collection instruments.

Pre-questionnaire (Quant.). Sample: parents. Context: pre-workshop. Description: adapted from pre-validated instruments [38] to capture confidence and previous experience.

Observation (Qual.). Sample: parents, children. Context: throughout workshop. Description: utilising a template to capture observations of parent and child behaviours [45].

Field notes (Qual.). Sample: parents, children. Context: throughout workshop and post-workshop. Description: utilising a template, written prior to and after workshops, to capture planning, administration, design, logistics and theoretical issues as well as practical influences which impact workshop delivery [14].

Workshop artefacts. Sample: parents, children. Context: end of workshop. Description: photographs of artefacts produced during workshops [20].

Focus group (Qual.). Sample: parents. Context: end of workshop. Description: semi-structured [44], exploring learning outcomes; audio recorded.

Post-questionnaire (Quant. + Qual.). Sample: parents. Context: immediately post-workshop. Description: contains questions and scales adapted from pre-validated instruments [38, 49] to capture reactions, learning and future intentions.

Follow-up survey (Quant. + Qual.). Sample: parents. Context: >2 months post-workshop. Description: sent >2 months post-workshop to evaluate the extent of any follow-on activity [28].


The instrument developed to assess changes in confidence consists of ten Likert-type items, each with five responses ("strongly disagree" to "strongly agree"). It was adapted from a set designed to measure Greek high school students' computing self-efficacy [38], which had previously been adapted to investigate teachers' attitudes to programming before and after engaging in a creative computing course [34]. The set formed a scale with a Cronbach's alpha of .89 [38]. Cronbach's alpha measures the internal consistency of a set of items, assigning a value between 0 and 1; higher values indicate greater consistency and hence are deemed to reflect greater reliability for the resulting scale, with a value of .89 typically regarded as "good" verging on "excellent" [18]. Items include:

– Item 1: I enjoy working with computers
– Item 4: Computers are far too complicated for me

As well as the confidence scale, the post-questionnaire also contains 8 items designed to measure the effectiveness of the implementation of the workshop design principles. These were adapted from a scale developed to measure perceptions of IS instruction in Greek middle schools with an 'acceptable' Cronbach's alpha of .71 [49]. Items include:

– Item 14: One of the things that I liked about the workshop is that we created our own things.
– Item 18: One of the things that I liked about the workshop is that I could collaborate and discuss what we were doing with my family.

Two further items were added to the pre- and post-questionnaires to assess the impact of the workshop on future intentions:

– Item 11: I would like to take part in computing activities with my family in the future
– Item 12: I feel able to organise computing activities with my family at home

Data from both questionnaires were entered into SPSS. The responses to items 1–10, dealing with self-reported confidence levels pre- and post-workshop, were then scored from 1 to 5, or from 5 to 1, so that in each case the higher number reflected greater reported "confidence". Frequency distributions and descriptive statistics were calculated. With a view to measuring the change in reported confidence, Cronbach's alpha was computed in order to discover whether the items, or a subset of them, could form a scale. For all ten items, the value was .945, which is deemed "excellent". The item scores were added and the totals divided by ten, to give a value in the range 1 to 5. This was done for both the pre- and the post-questionnaire. When it came to comparing the pre-workshop and post-workshop scores, the analysis was restricted to two sets of thirteen scores, as only thirteen of the eighteen participants fully completed the set of ten items. The Shapiro-Wilk test was used as the samples were small. It indicated that the distributions could be treated as normal, and so paired t-tests were carried out to see if there was a significant difference in the means before and after the workshop.
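To make the scale construction and pre/post comparison described above concrete, the following is a minimal sketch of the same steps in Python. It is not the authors' actual procedure (the study used SPSS), and the item responses shown are randomly generated placeholders, not study data.

```python
# Illustrative sketch of the analysis steps described above; the study itself
# used SPSS, and the data below are randomly generated placeholders.
import numpy as np
from scipy import stats


def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x n_items) array:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


rng = np.random.default_rng(0)
# Hypothetical responses: 13 parents x 10 Likert items (1-5), with negatively
# worded items assumed already reverse-scored so that higher = more confident.
pre = rng.integers(1, 6, size=(13, 10)).astype(float)
post = np.clip(pre + rng.integers(0, 2, size=(13, 10)), 1, 5)

print("alpha (pre):", round(cronbach_alpha(pre), 3))

# Scale score = mean of the ten items, giving a value in the range 1 to 5.
pre_scores, post_scores = pre.mean(axis=1), post.mean(axis=1)

# Check the paired differences for normality, then run the paired t-test.
print("Shapiro-Wilk:", stats.shapiro(post_scores - pre_scores))
print("Paired t-test:", stats.ttest_rel(post_scores, pre_scores))
```

Running the sketch reproduces the shape of the analysis (scale reliability, scale scores, normality check, paired comparison) rather than the values reported in the paper.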


For the final two items, exploring change in future intentions, a similar process was followed; the results are displayed and discussed in Sect. 6.4. With regard to the 8 items exploring perceptions of the workshop, the 16 responses to these questions were scored from 1 to 5, or from 5 to 1, so that in each case the higher number reflected more positive reported perceptions. As the sample was small and the questions diverse, the mean response for each question was calculated and is displayed in Fig. 3. The follow-up questionnaire was designed to explore whether the workshop led to any change in behaviour among the participants. It contains quantitative items related to future intentions and follow-on behaviours. For example:

– Item 6: When you left the workshop, how eager were you to engage in further computing activities with your family?
– Item 7: How well prepared were you to do any further computing activities?
– Item 8: Did you go on to do further computing activities with your family or with other families?

Further items were qualitative and explored the nature of any future activity. Nine participants responded to the questionnaire and the findings are discussed in Sects. 6.4 and 6.5.

6.3 Recruitment and Implementation

Purposive maximum-variation sampling was used to recruit 3 primary schools. This provided a diverse range of relevant cases (1 urban, 2 rural). Convenience sampling was then used to recruit the participant families and took place through the parent-teacher associations (PTAs) of the selected schools. Three pilot workshops took place, two in a large urban school (each with different families) and one across 2 small rural schools, between June and October 2018. In total eighteen families (18 parents and 35 children aged 5–13) participated across the three workshops.

– Workshop 1: Large urban school – 6 families (6 parents, 13 children)
– Workshop 2: 2 small rural schools combined – 5 families (5 parents, 10 children)
– Workshop 3: Large urban school – 7 families (7 parents, 12 children)

6.4 Quantitative Findings

Pre- and Post-Questionnaires. Of the 18 parent participants, 4 were male and 14 female. They ranged in age from 37 to 63, with 14 out of the 18 aged 43–53. The two sets of self-reported confidence scores are presented graphically in Fig. 2, with each line representing a parent participant. It can be seen that the confidence levels were already high pre-workshop, with only three participants returning a score below the midpoint of the scale.


Fig. 2. Changes in self-reported confidence [7].

Fig. 3. Workshop perceptions (means, n = 16) [7].

Two participants actually returned slightly lower scores at the end than at the beginning of the cycle. However, of the other twelve, one stayed the same and eleven returned increased scores. Overall, the mean score rose from 3.44 to 3.74, and the difference is significant at the .05 level (t = 2.835, df = 12, p = .015). The results of the items exploring the participants' perceptions of the workshop are displayed in Fig. 3. As can be seen, the mean responses to all but one of the items scored between 4 ("agree") and 5 ("strongly agree"). We can conclude that they provide strong confirmation of the successful implementation of the design principles identified in Sect. 4.4, particularly DP1 and DP2, with their emphasis on creativity and the collaborative making of a meaningful artefact, and DP3's consideration of the importance of intergenerational learning.


Fig. 4. Desire to partake in future family computing activities.

Fig. 5. Perceived ability to organise family computing activities.

Fourteen participants responded to the two items measuring the impact of the workshop on future intentions. Unsurprisingly, as the participants were self-selecting, 9 already strongly agreed pre-workshop that they would like to take part in computing activities with their families in the future. Therefore any rise was not statistically significant. However, as can be seen from Fig. 4, post-workshop the number strongly agreeing had increased to 12. Of the other 2, 1 had increased from "neither agree nor disagree" to "agree" and only 1 had dropped from "agree" to "neither agree nor disagree". When asked post-workshop whether they felt able to organise CS activities at home with their families, 10 out of the 14 respondents reported feeling more able than pre-workshop, with 4 remaining the same. Here the mean rose from 2.86 to 3.93, a significant rise at the .05 level (Fig. 5).

Follow-Up Questionnaire. The response rate to the follow-up questionnaire (n = 9) was somewhat disappointing. However, the responses did serve to reinforce these findings.


Fig. 6. Attitudes towards further computing activities.

Fig. 7. Reported future activities and intentions.

All nine of the respondents reported feeling either "very" eager (n = 4) or "quite" eager (n = 5) to engage in further computing activities with their families when they had left the workshops. In addition, four participants felt "very" prepared and five "quite" prepared to do further computing activities (Fig. 6). It should be noted, however, that their eagerness to engage further matched their preparedness in only 5 of these cases. Five respondents did go on to do further computing activities with their family or other families. However, again there was little to no correlation between those who did so and those who reported being "very" eager and "very" prepared. Finally, all 5 families who reported that they had completed further computing activities plan to do more in the future (Fig. 7).

6.5 Qualitative Findings

The quantitative results were supplemented by the qualitative findings of the structured observations, focus groups and open questions in the follow-up questionnaire. A key observation was the importance of the "craft" or "making" element involved in the implementation of DP1, the collaborative making of a meaningful artefact (see Fig. 8). Indeed, some child participants were observed to be more engaged by "making" than by computing and, provided this maintained a connection with the task, it enhanced collaboration and dialogue within the families.


Fig. 8. ‘Making’ elements.

A surprising degree of creative exploration was observed among families during the workshops. For example, in the Dancemat workshop described in Sect. 5.2, the OurKidsCode Cards guide families towards five-note songs, matching the most easily used inputs of the Makey Makey. Despite this, a number of families explored some of its more advanced features. These included using additional inputs offered by the Makey Makey to extend the musical range of their compositions, and coding activities that improve the user experience (such as adding animations or limiting the tendency of a simple program to "repeat" a note after a time). In addition, the Workshop Card advises family members to remove their shoes before standing on the switches (a shoe will often act as an insulator, keeping the switch from making a connection), but some families instead opted to create accessories to be worn over their shoes, making use of conductive materials provided as part of the "making" supplies in the workshop (see Fig. 9 for a typical example).

Fig. 9. A workshop participant experimenting with conductive materials.

Similar creativity was shown when designing the "dance" activities, and some families chose to involve the entire family in the dance, with a family member assigned to each switch (all members holding hands to create a circuit), making the activity more of a collaboratively played musical instrument than a dance challenge (Fig. 10).


The observations also recorded considerable excitement and laughter, culminating during the "Share" phase where participants had a chance to show off their creations, reinforcing our hypothesis about its importance in the implementation of DP2 (giving participants a sense of fun) and DP4 (encouraging inter-family support and communication).

Fig. 10. Participants testing ‘dance mat’ switches and working in family groups.

During the focus groups, comments were overwhelmingly positive: "A great idea - I feel very disconnected from my kids' computer use", "Thoroughly enjoyed it, time flew. We have already decided to make our own Scratch game together at midterm." Parents commented on the positive use of computing, comparing it favourably to their usual interactions around technology, giving examples of disagreements about screen time and worries about online usage. Many remarks were made about increased confidence as a result of the activity, supporting the findings from the questionnaires that overall confidence increased. When asked in the follow-up questionnaire what they recalled learning in the workshop, 4 of the respondents simply outlined the activity itself. The others, however, were more reflective, with one commenting "That my child has no fear when it comes to technology and dives straight in, I am much more apprehensive but would like to know more". Another spoke of the importance of "creative teamwork and the delegation of tasks". We also asked the 5 respondents who had gone on to engage in further activity what they had done and what further activities they planned. While the activities varied, they all mentioned using Scratch as a tool (Table 5).


Table 5. Reported further activities.

R1. Activity completed: We set time aside to code every week. We usually set ourselves a task or download scratch worksheets. Activity planned: It won't be a family activity as such but kids have plans to enter Coolest Projects 2019.

R2. Activity completed: We made a scratch game together, him leading me and explaining how to build the program script. Activity planned: To build some more games with scratch, possibly introducing some live action or movie making with a pro4 camera.

R4. Activity completed: Did more activities with Microbit. Activity planned: Entering National Scratch competition and Coolest Projects.

R8. Activity completed: Bought equipment to make our own game. Activity planned: Our daughter was really inspired by Scratch and I am very hands-on with creative projects so we would love to do more of the same kind of activity... My husband and daughter plan to look at Python together as they both wish to learn that.

R9. Activity completed: A little scratch programming but we have not yet tackled the project we brought home to do due to lack of coordination skills! Activity planned: We want to do the project we brought home from the session.

The reason given by all 4 respondents who did not engage in any further family activity was lack of time, with one respondent expanding: "My daughter was very encouraged to try Scratch at home and she has done that but we haven't done anything together. We have the gift that was handed out at the end of the session but we haven't had a chance to set it up yet!".

6.6 Discussion

The evaluation of the pilot workshops did result in a number of changes to the initial design. The project aimed to increase confidence in the parents so that they could undertake similar activities at home. It was therefore particularly important that they did not feel that they needed a facilitator to complete further activities. For the first workshop, a PowerPoint presentation and demonstrations of the steps of the activity were projected, but it was observed that the participants then relied overly on the facilitator for instruction. In response to this design challenge (DC2), for the second workshop we designed and developed the OurKidsCode Cards, the A5 cards described in Sect. 5.1. With their support, participants were then observed to be much more self-sufficient and independent in their learning. Feedback during the focus groups confirmed their value to the parents.


Additionally, the demands on the facilitators and the physical space led us to identify this as a design challenge (DC3), and we now recommend that the average workshop accommodates a maximum of 20 participants across 5–6 families. The generally positive findings indicate that the workshop design effectively implemented the identified design principles. There was also evidence that taking part in the workshop achieved its aim of strengthening the participants' confidence and readiness to partake in family computing activities, and we know that at least 5 of the participating families went on to do so. However, some caveats must be noted. First, the number of participants was small and we cannot assume that the results would generalise to other groups. In particular, the low response rate for the follow-up questionnaire means that these findings, while interesting, are merely indicative, and we are not currently in a position to know whether the reported rise in confidence will be sustained.

7 Conclusions and Future Work

The aim of the OurKidsCode Project is to provide an evidence-based design for the development of interventions to increase PI in children's CS education. Our Needs Analysis pointed to a lack of parental confidence in their own computing abilities as a potential inhibitor to such involvement. Our design process identified a number of Design Principles to address this issue within the context of family computing workshops, and resulted in a workshop model and support materials that implemented those principles. The evaluation aimed primarily to investigate the effect of the strategy on the participants' confidence and found that the mean confidence level of parent participants rose significantly. Confidence in their ability to organise such activities also rose significantly and, while responses to a follow-on survey were low, over half of the respondents reported engaging in further activities. Overall, therefore, we are optimistic that the model has potential for supporting parents as they engage their children in CS Education. Training of OurKidsCode facilitators has begun with the aim of a national rollout of the programme. Thirty-eight facilitators, including NPC trainers, community workers, CoderDojo mentors, teachers and other educators, have been trained so far in the delivery of the workshop. Further evaluation and revision will be ongoing as the workshops are delivered. The evaluation of our workshop model provides evidence that the Design Principles identified can successfully be used as a basis for interventions aimed at improving PI in CS Education.

Acknowledgements. This research is funded by Science Foundation Ireland and administered by the National Parents Council (Primary) in partnership with the School of Computer Science and Statistics, Trinity College Dublin. Workshops have been supported by Microsoft Ireland.


References

1. Coderdojo Movement. https://coderdojo.com/movement/
2. Scratch - Imagine, Program, Share. https://scratch.mit.edu/
3. Accenture: Powering economic growth; Attracting more young women into science and technology. Technical report, Accenture (2015)
4. Ackermann, E.: Piaget's constructivism, Papert's constructionism: what's the difference. Fut. Learn. Group Publ. 5(3), 438 (2001)
5. Borgonovi, F., Montt, G.: Parental involvement in selected PISA countries and economies (2012)
6. Brahms, L.: Making as a Learning Process: Identifying and Supporting Family Learning in Informal Settings. Ph.D. thesis, University of Pittsburgh (2014)
7. Bresnihan, N., Strong, G., Fisher, L., Millwood, R., Lynch, Á.: OurKidsCode: facilitating families to be creative with computing. In: 11th International Conference on Computer Supported Education, pp. 519–530 (July 2019)
8. Brown, N.C., Sentance, S., Crick, T., Humphreys, S.: Restart: the resurgence of computer science in UK schools. ACM Trans. Comput. Educ. (TOCE) 14(2), 9 (2014)
9. Cancio, E.J., West, R.P., Young, K.R.: Improving mathematics homework completion and accuracy of students with EBD through self-management and parent participation. J. Emot. Behav. Dis. 12(1), 9–22 (2004)
10. Clarke-Midura, J., Sun, C., Pantic, K., Poole, F., Allan, V.: Using informed design in informal computer science programs to increase youths' interest, self-efficacy, and perceptions of parental support. ACM Trans. Comput. Educ. 19(4), 37:1–37:24 (2019). https://doi.org/10.1145/3319445
11. Collective, B.M., Shaw, D.: Makey Makey: improvising tangible and nature-based user interfaces. In: Proceedings of the 6th International Conference on Tangible, Embedded and Embodied Interaction, TEI 2012, Kingston, Ontario, Canada, p. 367. ACM Press (2012). https://doi.org/10.1145/2148131.2148219
12. Desforges, C., Abouchaar, A.: The Impact of Parental Involvement, Parental Support and Family Education on Pupil Achievement and Adjustment: A Literature Review, vol. 433. DfES Publications, Nottingham (2003)
13. Emerson, L., Fear, J., Fox, S., Sanders, E.: Parental engagement in learning and schooling: lessons from research. A report by the Australian Research Alliance for Children and Youth (ARACY) for the Family-School and Community Partnerships Bureau, Canberra (2012)
14. Emerson, R.M., Fretz, R.I., Shaw, L.L.: Writing Ethnographic Fieldnotes. Chicago Guides to Writing, Editing, and Publishing, 2nd edn. The University of Chicago Press, Chicago (2011)
15. Epstein, J.L., Simon, B., Salinas, K.C.: Involving parents in homework in the middle grades. Res. Bull. 18(4), 4 (1997)
16. Finn, C.: A third of people think coding is more important than learning Irish. thejournal.ie (October 2014)
17. Gallup, Google: Searching for Computer Science: Access and Barriers in U.S. K-12 Education. Technical report (2015)
18. Gliem, J.A., Gliem, R.R.: Calculating, interpreting, and reporting Cronbach's alpha reliability coefficient for Likert-type scales. Midwest Research-to-Practice Conference in Adult, Continuing, and Community Education (2003)
19. Goodall, J., Vorhaus, J.: Review of best practice in parental engagement (2011)


20. Hammersley, M., Atkinson, P.: Ethnography: Principles in Practice. Routledge, Abingdon (2007)
21. Harris, A., Goodall, J.: Do parents know they matter? Engaging all parents in learning. Educ. Res. 50(3), 277–289 (2008)
22. Hornby, G., Blackwell, I.: Barriers to parental involvement in education: an update. Educ. Rev. 70(1), 109–119 (2018). https://doi.org/10.1080/00131911.2018.1388612
23. Hornby, G., Lafaele, R.: Barriers to parental involvement in education: an explanatory model. Educ. Rev. 63(1), 37–52 (2011)
24. Jodl, K.M., Michael, A., Malanchuk, O., Eccles, J.S., Sameroff, A.: Parents' roles in shaping early adolescents' occupational aspirations. Child Dev. 72(4), 1247–1266 (2001)
25. Jordan, G.E., Snow, C.E., Porche, M.V.: Project EASE: the effect of a family literacy project on kindergarten students' early literacy skills. Read. Res. Q. 35(4), 524–546 (2000). https://doi.org/10.1598/RRQ.35.4.5
26. Kafai, Y.B., Burke, Q.: Constructionist gaming: understanding the benefits of making games for learning. Educ. Psychol. 50(4), 313–334 (2015)
27. Keane, N., McInerney, C.: Report on the Provision of Courses in Computer Science in Upper Second Level Education Internationally. Technical report, National Council for Curriculum and Assessment of Ireland (2016)
28. Kirkpatrick, D.: The Four Levels of Evaluation. No. 701, American Society for Training and Development (2007)
29. Kong, S.C., Li, R.K.Y., Kwok, R.C.W.: Measuring parents' perceptions of programming education in P-12 schools: scale development and validation. J. Educ. Comput. Res. (2018). https://doi.org/10.1177/0735633118783182
30. Kratochwill, T.R., McDonald, L., Levin, J.R., Bear-Tibbetts, H.Y., Demaray, M.K.: Families and schools together: an experimental analysis of a parent-mediated multi-family group program for American Indian children. J. Sch. Psychol. 42(5), 359–383 (2004)
31. Lawlor, J., Conneely, C., Oldham, E., Marshall, K., Tangney, B.: Bridge21: teamwork, technology and learning. A pragmatic model for effective twenty-first-century team-based learning. Technol. Pedagog. Educ. 27(2), 211–232 (2018). https://doi.org/10.1080/1475939X.2017.1405066
32. LeFevre, A.L., Shaw, T.V.: Latino parent involvement and school success: longitudinal effects of formal and informal support. Educ. Urban Soc. 44(6), 707–723 (2012)
33. NCCA and N.C.f.C.: Coding in Primary Schools (2017). https://www.ncca.ie/en/primary/primary-developments/coding-in-primary-schools
34. Oldham, E., et al.: Developing confident computational thinking through teacher twinning online. Int. J. Smart Educ. Urban Soc. (IJSEUS) 9(1), 61–75 (2018)
35. O'Toole, L., Kiely, J., McGillicuddy, D.: Parental Involvement, Engagement and Partnership in their Children's Education during the Primary School Years. Technical report, National Parents Council (2019)
36. Palmer, S., Cochran, L.: Parents as agents of career development. J. Couns. Psychol. 35(1), 71 (1988)
37. Pantin, H., et al.: Familias Unidas: the efficacy of an intervention to promote parental investment in Hispanic immigrant families. Prev. Sci. 4(3), 189–201 (2003)
38. Papastergiou, M.: Are computer science and information technology still masculine fields? High school students' perceptions and career choices. Comput. Educ. 51(2), 594–608 (2008)


39. Papert, S.: Mindstorms: Children, Computers, and Powerful Ideas. Basic Books Inc., New York (1980)
40. Rearick, M.L., Feldman, A.: Orientations, purposes and reflection: a framework for understanding action research. Teach. Teach. Educ. 15(4), 333–349 (1999)
41. Roque, R.: Family creative learning. In: Makeology: Makerspaces as Learning Environments, vol. 1, p. 47 (2016)
42. Shaver, A.V., Walls, R.T.: Effect of title I parent involvement on student reading and mathematics achievement. J. Res. Dev. Educ. 31(2), 90–97 (1998)
43. Sheldon, S.B.: Linking school-family-community partnerships in urban elementary schools to student achievement on state tests. Urban Rev. 35(2), 149–165 (2003)
44. Spradley, J.P.: The Ethnographic Interview. Waveland Press Inc., Long Grove (2016)
45. Spradley, J.P.: Participant Observation. Waveland Press, Long Grove (2016)
46. Starkey, P., Klein, A.: Fostering parental support for children's mathematical development: an intervention with Head Start families. Early Educ. Dev. 11(5), 659–680 (2000). https://doi.org/10.1207/s15566935eed1105_7
47. Sylva, K., Melhuish, E., Sammons, P., Siraj-Blatchford, I., Taggart, B.: Effective Pre-school and Primary Education 3-11 Project (EPPE 3-11) - Final Report from the Primary Phase: Pre-school, School and Family Influences on Children's Development during Key Stage 2 (Age 7-11) (2008)
48. Takeuchi, L., et al.: The new coviewing (2011)
49. Vekiri, I.: Boys' and girls' ICT beliefs: do teachers matter? Comput. Educ. 55(1), 16–23 (2010)
50. Voogt, J., Fisser, P., Good, J., Mishra, P., Yadav, A.: Computational thinking in compulsory education: towards an agenda for research and practice. Educ. Inf. Technol. 20(4), 715–728 (2015). https://doi.org/10.1007/s10639-015-9412-6
51. Wing, J.M.: Computational thinking. Commun. ACM 49(3), 33 (2006). https://doi.org/10.1145/1118178.1118215

Retention of University Teachers and Doctoral Students in UNIPS Pedagogical Online Courses

Samuli Laato1(B), Heidi Salmento1, Emilia Lipponen1, Henna Vilppu1, Mari Murtonen1, and Erno Lehtinen1,2

1 University of Turku, Turku, Finland
[email protected]
2 Vytautas Magnus University, Kaunas, Lithuania

Abstract. Online education provides learning opportunities to a global audience. The most popular MOOC platforms have millions of users, and MOOC designers are already competing with each other on how to spark and retain the interest of students. However, in popular MOOCs roughly 90% of enrolled students currently yield their participation, and previous research has identified that the dropouts occur mostly in the very early stages of the courses. This study explores student retention and engagement in pedagogical online courses aimed at university staff members and doctoral students, with quantitative data (N = 404) collected between the years 2016–2019. In addition, this study looks at differences in dropout rates between students of different age, gender, teaching position and department. Based on the conducted statistical analysis, age, gender, teaching position and department have no significant correlation with dropout rates. The majority of participants who drop out from the courses do so in the beginning, without completing a single task. University teachers and doctoral students behave in online courses similarly to other students, and the results of the current study fit well with predictions from previous studies. However, this study found two anomalies: (1) a relatively low dropout rate (38.1%), and (2) over 22% of students yielding their participation return to the courses (n = 31), after which over 50% of them complete the courses. The results highlight the importance of the beginning of online courses for reducing the overall dropout rates and suggest that students who yield their participation are likely to complete the courses the second time, if they enroll again.

Keywords: Staff development · Online learning · University pedagogy · Student retention · Engagement

1 Introduction

The impact and possibilities online courses have provided for education are substantial [15]. MOOCs (Massive Open Online Courses) and open educational materials allow low-income students an opportunity to learn [46, 65], they provide flexible and continuous learning opportunities for busy people [12, 69] and, once created, their maintenance costs are relatively low compared to traditional contact teaching [54]. Also, the geographical reach of these courses far exceeds that of traditional contact teaching, and students


from all around the world have access to high-level, high-quality educational material [19]. Online learning comes in multiple forms: SPOCs (Small Private Online Courses), MOOCs and open educational materials, to name the most popular [51]. These are being utilized as fully distance education, but also in blended and flipped learning [47]. Even though online learning was first popularized by universities and used in higher education, it is now making its way into K-12 education [45] and employee/staff training [14, 54]. Despite the promising results online courses have had over the years, they have been criticized for the lack of social presence [4, 30] and high dropout rates [35, 47].

Employee training MOOCs and other online courses are proposed to help in offering lifelong and continuous learning opportunities for currently employed citizens. For example, with the rapid development of AI technology, several millions of jobs are expected to become automated in the upcoming years, resulting in a massive disruption of the labour market [8]. Such a development would be catastrophic for many households and for the personal finances of those made unemployed; governments are therefore already taking drastic action to prevent this kind of personal catastrophe for millions of workers by offering them free lifelong learning opportunities. As it can be expected that many workers will find it difficult to move on and learn new skills for new professions, the quality of these lifelong learning and continuous learning online courses needs to be rigorously addressed and developed. Having employees attend MOOCs while working has also been found to have a beneficial impact on innovation in the industry [27, 57].

The current study extends the analysis of Laato et al. [35], who showed quantitatively that (1) the majority of learners in university pedagogical online courses who yield participation do so in the very beginning, and that (2) simple tasks introduced at the beginning of online courses had a positive impact on retention rates. To support and extend these findings, this study looks at the retention rates of university pedagogical online courses over a longer period of time (2016–2019) and with a larger number of participants (N = 404). In addition, more quantitative data is collected and analyzed, including the age, gender, position and faculty of participants. Finally, we observe when students yield their participation and whether dropouts return to study later.

2 Background

Generally, online courses are reported to have significantly higher dropout rates in comparison to contact teaching [25, 37, 48], with MOOCs regularly having a dropout rate of over 80% [53] or even over 90% [18]. SPOCs generally have better retention rates, but there is much variance in SPOC withdrawal depending on how they are organized [26, 35]. The low retention rates in MOOCs did not improve between 2012 and 2018 despite efforts by MOOC designers [60]. However, learners have different motivations to enroll in online courses, and their individual goal might not be to complete the course in the first place [16, 67]. Besides completion, learners might be motivated to enroll in order to access the course material or to inspire themselves to study [44, 67]. Despite the varying motivations for enrolling in online courses, increasing student retention and engagement in online courses is ubiquitously seen as a beneficial improvement which scholars and MOOC designers aim for [24].


In addition to the two forms of online courses, SPOCs and MOOCs [76], online educational material can be available online without any course structure around it, and in those cases it can be self-studied or utilized in, for example, flipped learning [34, 47]. The term SPOC refers to courses organised privately for a small group of students, which can, for example, be part of curricular studies. MOOCs, on the other hand, are, as the name implies, massive: popular open online platforms such as Coursera, Udemy and EdX each have tens of millions of visitors in a year [7], with the most popular courses reaching tens of thousands of students annually [25]. Usually the level of automation in MOOCs is very high, or they are completely automated, limiting the types of exercises that are given, as all assignments need to have automatic grading [41]. However, more personalized learning experiences and better feedback for learners have been associated with higher engagement [59]. This issue has been addressed by, for example, the use of artificial intelligence [23] and peer review in assignments to reduce the load of the course facilitator. However, these solutions are problematic, as online course participants have been found to have mixed opinions on the usefulness of peer feedback [43], and even though automated grading systems for essays have advanced recently, they still have many issues to solve [63, 70].

There are several concerns surrounding MOOCs, one of which is that they are changing the academic world by replacing contact teaching and "traditional courses" with online alternatives, hence costing many academics their jobs as lecturers [73]. However, practical evidence shows that universities are in fact not replacing existing courses with MOOCs, but are instead using MOOCs to supplement existing education, for example by utilizing flipped learning or other flexible ways to organise teaching [13, 62]. Using MOOCs in this fashion, i.e. only taking advantage of the MOOC video materials but then completing exercises and taking exams locally, will inevitably have an impact on MOOC retention rates. Whether the impact is positive or negative will depend on how the learning is organised. In case completing the MOOC is supported locally with additional instructions and teamwork, it will most likely have a positive impact, but in case the materials are only utilized with no intention of completing the MOOC, the impact will be negative [38].

2.1 Employee Training and Staff Development Online Education

Employee training courses have special requirements compared to courses offered to full-time students. Firstly, participants are expected to be busy with their regular jobs, and hence the pacing of the course needs to be adjusted to that; in addition, there should not be excessive requirements to participate in synchronous activities which require presence at a certain hour. Secondly, participants are expected to be less extrinsically motivated, as they do not receive any study credits from the course, and are more likely to study to develop themselves instead of studying to get credit points or a diploma. And thirdly, where students studying for a degree can be expected to roughly belong to the same life situation and age group, participants in professional development courses can be expected to have a significantly wider age distribution, and also larger differences in their initial knowledge and skills, including their experience with online learning environments.

Even though the most common degree among MOOC learners is a Bachelor's, and the second most common is a basic school diploma, the highest degree of those students who actually finish a MOOC and earn a certificate is most often a Master's, followed by a Bachelor's [17]. As higher degrees positively correlate with student retention in MOOCs, employee training courses offered to university lecturers and doctoral students can be expected to have higher retention rates than the observed average of 5–10% [35, 47, 64]. The university workforce is in a key position with regard to the upcoming disruption of the labour market due to automation [1], as it is responsible for providing higher education. University teachers are in charge of cultivating future minds, and many institutions are pushing their teachers to take pedagogical courses to ensure their skills are up to date. University pedagogical courses are delivered both through dedicated platforms such as UNIPS [34] and via popular MOOC platforms such as Coursera and Udemy. Completing such courses is seen as beneficial by employers; however, information about online training opportunities does not always reach the target demographic [58]. In addition, when participants become aware of professional online courses, they might not be able to attend due to financial or scheduling difficulties. To make things easier for the students, professional development MOOC designers should address three requirements in their design: (1) asynchronous learning opportunities and a flexible schedule, (2) relevant study materials aimed at intrinsically motivated students, and (3) students' varying skills and knowledge [35].

2.2 Increasing Engagement and Retention

Retention in online courses, especially MOOCs, has been widely studied [28]. Scholars discuss the phenomenon using at least the following terms: retention [2, 21, 75], participation [74], withdrawal [56] and dropout rates [33, 39, 61]. Studies on the continuance of MOOC use also often deal with student retention [3, 77]. Regardless of the type of course or the target audience, the very beginning of online courses is when the majority of dropouts occur [16, 24, 35, 50, 72]. Once students pass the mid-point of a MOOC, they are already likely to complete the whole course [21]. Therefore, a lot of care and consideration needs to be put into the very early stages of courses when aiming to improve student retention [35]. Students can also have various blends of both extrinsic and intrinsic motivators to enroll in online courses [3], resulting in biases in the initial intention to complete the courses after enrollment [20].

Engagement and retention in MOOCs are often mentioned together in research papers [9, 21, 75] even though the two terms are not synonyms. Usually engagement is seen as the beneficial improvement that scholars aim for, and retention rates are an indicator of it. However, as established with the previous examples of flipped learning [73] and students' motivation to enroll in MOOCs [67], the two are not always directly linked. Still, many of the proposed strategies to increase student retention also positively affect student engagement and vice versa; for example, increased social presence [75], reduced cognitive load [72] and perceived effectiveness [21] have all been shown to have a beneficial impact on both. Therefore, even though it is important to make the distinction between engagement and retention in online courses, the two share a connection and changes in one will most likely also affect the other [9].

Increasing Retention in the Early Stages of MOOCs.
A commonly given reason for the initial dropout spike in MOOCs is cognitive overload [22, 52, 72]. Cognitive load theory is based on humans having limited working memory and limited capability for simultaneous cognitive processes [55]. Presenting too much information to online learners at once may overload their working memory and trigger an instinct to take a step back, which in practice often means abandoning their course participation [5, 31, 74]. In MOOCs, different video types, for example, can impose varying degrees of cognitive load on students, with voice-over type videos on average generating the highest level of cognitive load [5]. Other sources of cognitive load include the visual layout of the MOOC platform and especially the instructional design [71]. Altogether, at least nine ways to reduce cognitive load in online learning have been identified [40]: (1) off-load visual load to the auditory channel, (2) segment learning content into smaller packets, (3) provide pretraining on terminology or other course-related content, (4) remove material which is not necessary for the course, (5) provide cues or tips on how to process the presented educational material, (6) reduce the need to visually scan for information by associating words with related graphics, (7) avoid repetition and redundancy, (8) in educational videos, present narration and animation simultaneously, and (9) take into account individual needs, for example, learners with low spatial learning capacity.

Besides reducing cognitive load, other strategies for improving student retention in the early stages of MOOCs have been suggested. For example, Nazir et al. [50] propose the following strategies to reduce the number of dropouts in the beginning of MOOCs: buddying, feedback and briefing. Having social support helps students become more engaged in learning and has a positive effect on retention [80]. Immediate feedback, the ability to give it and to receive answers, is also very important for the emergence of engagement [66]. Additionally, what always affects student engagement and retention in the beginning is how students perceive the course and what their first impressions are [79]. More general strategies for increasing engagement have also been suggested: gamification, interactive digital content, quizzes, immediate feedback, personalized difficulty, providing deeper learning materials when requested, and real-world challenges and testing [9]. To summarize, the factors influencing student retention in the early stages of MOOCs are highly complex, with cognitive load factors being major contributors to the currently high dropout rates [72]. Reducing cognitive load and focusing on instructional design can already significantly improve the early dropout rates, but additional focus should be put on giving students as good a first impression as possible [71, 74, 78].

Increasing General Engagement in MOOCs. Another issue with MOOC retention is that the majority of students who complete MOOCs do not return to study after their first year [60], which suggests issues with long-term engagement as well, even if students manage to successfully complete the courses. Therefore, strategies for general engagement are also needed, and these have been widely suggested in the literature. For example, overall indicators and predictors that students are more likely to complete a MOOC include:

• Interaction with a facilitator or an organised [21]
• Students having prior experience of education [16]
• Students having a predetermined intention to complete the MOOC [16]
• More personalized learning [68]
• Reduced cognitive load and focus on instructional design [74, 78]
• Interaction with peers and other social elements [6].

Online learning platforms are complex systems, and the degree to which each of the above-mentioned strategies influences engagement will vary between platforms and courses. Learning analytics can be utilized to pinpoint parts of online courses which cause students to struggle, and those key moments can then be targeted by designers [6, 29].

2.3 The UNIPS Open Learning Environment

The UNIPS learning environment is the case learning platform in the current study. In Finland, all educational institutions except universities require that their teachers have a formal teacher qualification, which can only be obtained by completing official pedagogical studies. Nowadays all Finnish universities also offer some pedagogical training for their teachers, and the popularity of university pedagogical training has grown rapidly in the country [49]. However, the pedagogical training is most often voluntary and thus reaches mainly those teachers who are already motivated to develop their teaching and themselves as teachers. Educating this group is of course important, but perhaps even more important would be to provide some pedagogical support for those teachers who do not participate in university pedagogical courses. The UNIPS learning environment, an online learning platform offering university pedagogical courses for employees and doctoral students of Finnish universities [34], was developed to solve the two major problems that have limited the possibilities to participate in university pedagogical training: 1) the traditional studies have been mostly in Finnish and 2) the traditional courses have been available only to university employees and doctoral students with teaching duties [35]. In addition, the courses have been arranged as contact teaching requiring physical attendance and commitment to schedules, which can be challenging for full-time employees. According to Laato et al. [36], the UNIPS learning solution has managed to solve these problems, and their results show that the UNIPS modules have increased the diversity of participants who attend university pedagogical training. During their operation from late 2015 onwards, the UNIPS modules have become popular, especially among doctoral students. UNIPS has enabled the participation of all doctoral students regardless of whether they have teaching duties or not. In addition, the modules offer the option to study university pedagogy asynchronously online in English, while previously the only available university pedagogical courses were organised synchronously in Finnish. UNIPS currently has nine modules, each worth one European Credit Transfer System (ECTS) credit. A view of the nine modules is shown in Fig. 1. The module on the top left, Becoming a Teacher, is the module from which data was primarily collected in the current study.

Background of the UNIPS Open Learning Environment. The UNIPS open learning environment (University Pedagogical Support) is currently being developed in collaboration with eight Finnish universities (University of Turku, Aalto University, Hanken School of Economics, University of Jyväskylä, Lappeenranta University of Technology, University of Oulu, Tampere University and University of Eastern Finland).

Fig. 1. A screenshot from the unips.fi platform, showing the nine available modules.

The idea was based on a previous learning solution called UTUPS (University of Turku Pedagogical Support), which was developed at the University of Turku during the years 2015–2016. UTUPS included three small online courses, called "modules", on university pedagogical topics. All three modules have been continued in UNIPS under the names Becoming a Teacher, Lecturing and Expertise, and How to Plan My Teaching. As a result of the national collaboration, altogether nine modules (see Fig. 1) have been developed, and more modules will be published in UNIPS during 2019.

Studying in UNIPS - How Do the Modules Work in Practice? Most of the modules consist of two main parts: an individual task period, during which participants study the materials of the module independently, and a group work period, during which participants collaborate with each other to deepen their understanding of the module contents. UNIPS currently does not contain many of the elements typical of a learning management system (LMS), and hence the LMS Moodle [10] is used to support the UNIPS courses.

Fig. 2. Steps of the modules.

The workflow of the module analyzed in this study, Becoming a Teacher, follows the pattern displayed in Fig. 2. Enrollment for the modules is usually open three times per year, with currently up to 140 students accepted at once for studies at a single university. Currently, all the modules require assessment from the teacher of the module, and this lack of automation is the reason the courses are not fully online and available all the time. Participants are selected in order of enrollment, and the selected participants are asked to register on Moodle, as seen in Fig. 2. The next step is the individual study phase, during which participants study the materials of the module individually and write a reflective essay on the topic of the module based on the materials and their own experiences and thoughts. The materials consist of short educational videos, scientific articles and small activating tasks about the topics of the module. After completing the individual task, students are asked to submit it on Moodle for evaluation. All participants who have submitted their individual task on Moodle are able to continue to the group work period. Participants are divided into small groups, and they read and comment on each other's reflective essays written during the individual task period. After studying the modules, students receive individual feedback on their tasks in Moodle, and they are also asked to give feedback on the modules. Giving feedback is voluntary and anonymous and thus does not affect the number of dropouts. In addition to the five phases depicted in Fig. 2, two more phases were added to the modules in autumn 2017: a pre-task and a final task. Investigating how the additional phases affect student engagement and retention is one of the main research questions of the current study.

2.4 Research Questions

This study explores student retention and engagement in the UNIPS online module Becoming a Teacher in the years 2016–2019. Based on findings and predictions from previous studies, the following research questions were formulated:

(Q1) When do participants drop out during the UNIPS online modules? The hypothesis based on previous studies is that most participants who withdraw do so in the beginning of the courses, and not, for example, before the most demanding task.
(Q2) Do any of the following correlate with students' likelihood of passing the courses: (i) age and gender, (ii) faculty, (iii) position at the university? The hypothesis, based on previous studies, is that none of these would have a significant effect on the retention rates.
(Q3) Are there students who withdraw, but later return when the courses are organised again, and complete the modules? How common is such behavior? Only a few participants were expected to return to the courses after withdrawing.
(Q4) Does including pre-tasks in the beginning of online courses increase student retention? Based on the findings of [35] and others [11], this was predicted to reduce cognitive load [72], engage students [5] and thus increase retention.

3 Research Design

3.1 Method

Quantitative data and statistical analyses were utilized in answering the research questions laid out above. First, student dropout was examined by observing the five phases of UNIPS module completion to see in which phases students withdraw. The hypothesis based on previous studies was that the majority of students would drop out in the beginning and that the number of dropouts would decrease towards 0% as the end of the course approached. The data on when students drop out was visualized in a diagram in order to see the dropout curve over the duration of the UNIPS modules. Second, with regard to statistical differences in dropout rates between students of different age, position at the university and faculty, a chi-square test [42] was performed on the collected data. Third, the number of students who withdrew but then returned to study again later was calculated. Based on predictions from previous studies [60], the number of returning students was not expected to be high. Finally, dropout rates in the module Becoming a Teacher before and after the implementation of a pre-test/first task were recorded and compared with each other.

3.2 Data Collection and Limitations

Data was collected from participants (N = 404) taking the UNIPS module Becoming a Teacher between 2016 and 2019. The participants were both university staff members, i.e. teachers and other employees (n = 90), and doctoral students (n = 314). The participants were both male (n = 173) and female (n = 231), and their age varied between 21 and 65. Not all participants replied to the questions concerning gender and age, and in addition, gender and age were not asked the first time the module was organized. For that reason, the age of 30.3% and the gender of 27.2% of the participants are not included in the analysis. Most of the participants (335) were from the University of Turku, because the UNIPS learning environment was originally developed and piloted there. Later, seven other Finnish universities joined, and in 2019 there were also participants (n = 69) from the seven partner universities. The data is quantitative and consists of participants' age, position at the university, faculty, and whether they completed the module or dropped out, and if so, at what point. Students were asked for their permission to use the data they generate for research, and all students who declined permission were excluded from the dataset. The total number of individual students in the current study is 404; however, as the data was collected over a span of three and a half years, and because some information was voluntary and some information was not collected in all courses, the number of participants in each individual statistic varies. For example, data on when students withdrew was not available from the first modules organised in autumn 2018.
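The first analysis step described above amounts to counting, for each participant who dropped out, the last phase they reached before withdrawing. The short Python sketch below illustrates that counting step; the phase names and participant records are illustrative assumptions only and are not the study's data.

```python
# A minimal sketch of the phase-based dropout analysis, assuming each
# participant record stores the last phase the participant reached.
# The records below are hypothetical and only illustrate the procedure.
from collections import Counter

PHASES = ["enrollment", "moodle_registration", "individual_task",
          "group_work", "completed"]

# last phase reached by each (hypothetical) participant
records = ["enrollment", "moodle_registration", "completed",
           "individual_task", "completed", "enrollment"]

counts = Counter(records)
dropouts = {p: counts.get(p, 0) for p in PHASES if p != "completed"}
total_dropouts = sum(dropouts.values())

for phase, n in dropouts.items():
    share = n / total_dropouts if total_dropouts else 0.0
    print(f"dropped out after {phase}: {n} ({share:.1%})")
```

Aggregating these phase-level counts over all course runs yields the dropout curve visualized in Fig. 3.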

4 Results

4.1 When Do the Participants Drop Out?

There are three critical phases at which dropouts have been observed to happen: 1) registration on Moodle after module enrollment, 2) submitting the individual task, and 3) participating in teamwork. To find out which of these is the most critical phase, we analysed the dropouts of students who enrolled in the module Becoming a Teacher in the years 2016–2019. In total, 154 of the participants who enrolled to study (N = 404) did not complete the module. Thus, the dropout rate of the module was 38.1%. When looking at the critical phases for dropouts, the analysis revealed that over half of the dropouts (62.1%) happened immediately after enrollment, because 87 participants who enrolled did not register on Moodle. The second critical step seems to be the individual study phase, since 37% of the dropouts happened when 57 of the participants who registered on Moodle did not submit the individual task. The rest of the dropouts (6.5%) happened at the group work phase, since 10 of the participants who submitted their individual task did not participate in group work and thus did not complete the module. The percentage values of dropouts at the different phases of the module are presented in Fig. 3.

Fig. 3. The percentage values of dropouts at different phases of the module Becoming a Teacher in the years 2016–2019.

4.2 Does Age, Gender, Position, Department or Faculty Correlate with Retention?

Age, Gender and Likelihood of Passing the Module. As presented in Fig. 4, the age of participants taking the module Becoming a Teacher varied between 21 and 65. Not all participants replied to the question concerning age, and in addition, age was not asked the first times the module was organized. For that reason, the age of 30.3% of the participants (n = 123) could not be reported here and was left out of the analysis. Thus, the percentage values presented in Fig. 4 are calculated from the 281 participants whose age was known.

Fig. 4. Age distribution of participants taking the module Becoming a Teacher in the years 2016–2019.

As Fig. 4 shows, over half of the participants were 30–40 years old. One fifth of the participants were under 30 and about one fifth between 40 and 50. The rest of the participants were over 50 years old. When looking at participants' likelihood of passing the module, there were no statistically significant differences between the age groups (χ2(3) = 5.42; p = 0.14). The result was the same when looking at gender and likelihood of passing the module: no statistically significant difference was found (χ2(2) = 5.95; p = 0.51).

Faculty and Likelihood of Passing the Modules. As presented in Fig. 5, most of the enrolled participants (N = 404) in the years 2016–2019 were from Medicine (26.2% of the enrolled participants), and many participants also came from the faculties of Science and Engineering (21.5%) and Humanities (17.1%). Under ten percent of the enrolled participants came from the faculties of Economics (8.4%), Social Sciences (7.9%), Education (5.9%) and Law (2.7%). In addition, some participants (10.1%) were from several smaller units which were not considered in the figure. To investigate whether participants' faculty has an impact on dropout rates, we compared the dropout rates by faculty. The dropout rates of participants from different faculties are presented in Fig. 6.

Fig. 5. Enrolled participants (N = 404) by faculties (percentage values of all the enrolled participants) in years 2016–2019.

The percentages here mean, for example, that 45.5% of the enrolled participants who were from the Faculty of Law did not complete the module and 54.5% of them did. Participants from smaller units (10.1% of all the participants) were not considered in the figure.

Fig. 6. Participants who completed/did not complete the module (% of the enrolled participants) by faculties during years 2016–2019.

As Fig. 6 shows, the dropout rate was highest at the Faculty of Law (45.5% of the participants from the faculty). The rate was almost as high in the faculties of Medicine, Science and Engineering, and Humanities, from which most of the participants also came (see Fig. 6). The dropout rate was lowest at the faculties of Social Sciences, Education and Economics; the number of participants who came from these faculties was also relatively small. However, according to the chi-square test, the differences between faculties were not statistically significant (χ2(9) = 14.15; p = 0.12).

Position at the University and Likelihood of Passing the Modules. Most of the UNIPS participants (77.7%) in the years 2016–2019 were doctoral students (n = 314). About one fifth (22.3%) were university employees, i.e. teachers or other staff members but not doctoral students (n = 90). Some of the doctoral students (n = 126) had teaching duties, and about half of all participants were doctoral students without teaching duties (see Fig. 7).

Fig. 7. UNIPS participants’ (N = 404) position at University in years 2016–2019.

The dropout rates were calculated in order to compare whether there are differences between participants with different statuses at the university. As presented in Fig. 8, the dropout rate of doctoral students was higher than that of university employees who are not doctoral students, but according to the chi-square test the differences between the groups were not statistically significant (χ2(2) = 2.27; p = 0.32).

4.3 Do Students Who Drop Out Later Enroll in the Courses Again?

As presented in Table 1, over half of the participants completed the module when participating for the first time.

About one fifth of those who did not pass the module when participating for the first time (n = 140) enrolled again (n = 31), and 54.8% of them (n = 17) then completed the module. Only one participant enrolled for a third time, and that participant then completed the module.

Fig. 8. Dropout rates by participants' position at the university in years 2016–2019.

Table 1. Number of enrollments to the module.

                          Completed the module   Didn't complete the module   Total
First time enrollment     232                    140                          372
Second time enrollment    17                     14                           31
Third time enrollment     1                      0                            1
Total                     250                    154                          404
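The proportions reported above follow directly from the counts in Table 1; a minimal sketch of the arithmetic, using only the table's values, is shown below.

```python
# Deriving the return and re-completion rates from Table 1.
first_time_not_completed = 140   # did not complete on the first attempt
second_enrollments = 31          # of those, enrolled a second time
second_time_completed = 17       # of those, completed on the second attempt

return_rate = second_enrollments / first_time_not_completed       # ~0.221 -> 22.1%
re_completion_rate = second_time_completed / second_enrollments   # ~0.548 -> 54.8%
print(f"returned: {return_rate:.1%}, completed on return: {re_completion_rate:.1%}")
```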

4.4 The Influence of a Pretest on Student Retention

Before autumn 2017, UNIPS modules began straight away with the individual task period lasting two weeks. From autumn 2017 onwards, however, a pretest task was introduced at the beginning of all UNIPS modules. The purpose of the task was primarily to activate participants' thinking and preconceptions of the topic and to increase the possibilities to develop the modules by collecting research data. Surprisingly, the first task also seemed to have a positive impact on retention rates. As presented in Table 2, the dropout rate was higher before the pretest was added to the requirements for passing the module. However, the difference between the groups was not statistically significant (χ2(1) = 2.23; p = 0.14).

Table 2. Dropout rates before and after the pretest task was added to the requirements of passing the module.

                                                              Completed the module   Didn't complete the module   Total
Participated in the module before the pretest was included   117 (58.2%)             84 (41.8%)                   201
Participated in the module after the pretest was included    133 (65.5%)             70 (34.5%)                   203
Total                                                         250                    154                          404
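For completeness, the 2 x 2 comparison in Table 2 can be reproduced with a standard chi-square test of independence; a sketch assuming SciPy is available is shown below. Depending on whether a continuity correction is applied, the statistic comes out close to, though not exactly equal to, the reported χ2(1) = 2.23, so small differences from the published value may stem from rounding or software settings.

```python
# Chi-square test on the 2x2 contingency table from Table 2
# (completed vs. not completed, before vs. after the pretest was introduced).
from scipy.stats import chi2_contingency

observed = [
    [117, 84],  # before the pretest was included
    [133, 70],  # after the pretest was included
]

# correction=False gives the uncorrected Pearson statistic; with the default
# Yates correction the value is slightly lower. Either way p > 0.05, i.e.
# the difference in dropout rates is not statistically significant.
chi2, p, dof, _ = chi2_contingency(observed, correction=False)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.2f}")
```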

5 Discussion

5.1 Key Findings

The overall dropout rate in the UNIPS module Becoming a Teacher (38.1%) was observed to be low compared to the typical dropout rates in MOOCs [18, 25, 37, 48], which can be as high as 95%. It is, however, closer to that of typical SPOCs [26], and as the UNIPS modules were organised at specific dates for a limited number of students, they resemble SPOCs more than MOOCs in that sense, even if the courses were offered to participants from many universities. The courses also involved, from the very beginning onwards, interaction with the course facilitator or teacher, and later on with other participants. This may have contributed to students obtaining a sense of social presence and a sense of belonging, which are important for motivation and engagement [21, 50, 80].

The majority of students withdrawing from the module did so in the beginning, and the dropout curve greatly resembled those of previous studies on MOOC retention [16, 24, 35, 50, 72]. Most of the participants (62.3%) who did not complete the module did not log in to Moodle and thus did not even receive the instructions for the module, and were therefore not able to start studying. It would be important to consider how to motivate this group to take the next step. Also, many of those who logged in to Moodle did not finish the modules: overall, 37% of participants who dropped out did not submit their individual task. Surprisingly, the most demanding individual task, the essay, was not the most common moment to drop out; rather, it was the very beginning. This finding further supports the point that online course designers should put heavy focus on making as good a first impression as possible. The findings also indicate that once participants start studying - or actually doing something - they will stay on the course. However, even though the addition of the pre-task and post-task did reduce the dropout rate on UNIPS courses, a careful statistical analysis showed that the change did not reach statistical significance at the 95% confidence level (p = 0.14) for the module Becoming a Teacher. With all three modules taken into account, however, the change was significant [35].

Based on the chi-square analysis, participants' age, faculty or position at the university were not connected to their likelihood of passing the Becoming a Teacher module during the years 2016–2019. Small differences between the groups could be observed, but these were not statistically significant.

This finding is important, as there is not much research on online course retention where the participants are university lecturers and doctoral students. The participants differ from those of the vast majority of online courses, which are open to everyone, in that they have on average completed far more formal education. On the other hand, the UNIPS participants were on average 10–15 years older than the regular MOOC participant. The findings show that the generally observed phenomenon that, with regard to retention, the first impression matters the most is also true for university staff and doctoral students.

5.2 Implications for Online Course Designers

Based on the results of the study, the following guidelines for online course designers can be formulated:

• Most participants who drop out do not complete even a single task on the online course. Designers should focus on reducing students' cognitive load in the beginning to a bare minimum. One possible solution would be to ask students to complete some kind of task related to the course contents in order to be allowed to enroll in the course.
• Age or position in life does not correlate with retention; therefore, personalized content should be introduced in the form of human interaction instead of, for example, supporting the debunked learning styles [32].
• Over 22% of students who drop out later return to the courses, and on the second attempt over 50% of them complete the course. How to better serve this group of students remains an open question.

5.3 Limitations and Future Work

Data for the current study (N = 404) was collected from a single country, and most students were from a single university, despite coming from seven different faculties. Due to limitations of the collected data, only some of the phenomena and features identified in previous studies could be measured and tested in the current setting with university teachers and doctoral students as participants. Another limitation is the UNIPS platform, as it differs from popular MOOC platforms in several ways, and the findings might therefore not directly translate to the domain of fully automated online courses. Despite these shortcomings, this study provided data about how university lecturers and doctoral students study pedagogy online and suggested that position in life or age does not correlate with the likelihood of completing online courses. Future work will include testing new methods of making a good first impression on students and adding simple but relevant tasks which aim to reduce the cognitive load students experience in the beginning.

6 Conclusions

This study focused on student retention in university pedagogical courses, using the UNIPS module Becoming a Teacher from the years 2016–2019 (N = 404) as a case study.

The majority of students who withdrew did so in the very beginning of the module, and the more tasks a student completed, the more likely they were to complete the module. The result echoes findings from other popular MOOCs, and reducing students' cognitive load and introducing minor tasks in the beginning have been suggested throughout the literature as remedies for the initial dropout spike. With these best practices in mind when designing the UNIPS solution, the overall dropout rate of 38.1% was significantly lower than in many popular MOOCs; however, it is still substantially higher than in contact teaching. Based on the findings, designers should focus on how to make a good first impression on students in the beginning of the module and try to get them engaged with the course by introducing small, simple and easy introductory tasks at first, thus reducing students' cognitive load. The age, position at the university, department or faculty of the students did not significantly correlate with retention. This finding shows that even with a very diverse group of students, the same pedagogical principles apply; it draws parallels to the debunked learning styles myth [32] by suggesting that a well-designed online course should work equally well regardless of participants' age or position in life. When looking at students who dropped out but later returned to study, 22.1% returned to the module, and on that occasion over half completed it. Perhaps online course designers should in the future also focus on how to bring dropouts back to complete their course.

References 1. Abbott, R., Bogenschneider, B.: Should robots pay taxes: tax policy in the age of automation. Harv. L. Policy Rev. 12, 145 (2018) 2. Adamopoulos, P.: What makes a great MOOC? An interdisciplinary analysis of student retention in online courses (2013) 3. Alraimi, K.M., Zo, H., Ciganek, A.P.: Understanding the MOOCs continuance: the role of openness and reputation. Comput. Educ. 80, 28–38 (2015) 4. Aragon, S.R.: Creating social presence in online environments. New Dir. Adult Contin. Educ. 2003(100), 57–68 (2003) 5. Bradford, G.R.: A relationship study of student satisfaction with learning online and cognitive load: initial results. Internet High. Educ. 14(4), 217–226 (2011) 6. Chen, B., Chang, Y.H., Ouyang, F., Zhou, W.: Fostering student engagement in online discussion through social learning analytics. Internet High. Educ. 37, 21–30 (2018) 7. Conache, M., Dima, R., Mutu, A.: A comparative analysis of MOOC (massive open online course) platforms. Informatica Economica 20(2), 4–14 (2016) 8. David, H.J.J.O.E.P.: Why are there still so many jobs? The history and future of workplace automation. J. Econ. Perspect. 29(3), 3–30 (2015) 9. Freitas, D., Isabella, S., Morgan, J., Gibson, D.: Will MOOCs transform learning and teaching in higher education? Engagement and course retention in online learning provision. Br. J. Edu. Technol. 46(3), 455–471 (2015) 10. Dougiamas, M.: Moodle. International Society for Technology in Education (2004) 11. Evans, B.J., Baker, R.B., Dee, T.S.: Persistence patterns in massive open online courses (MOOCs). J. High. Educ. 87(2), 206–242 (2016) 12. Farrow, R.: MOOC and the workplace: key support elements in digital lifelong learning (2018) 13. Fox, A.: From MOOCs to SPOCs. Commun. ACM 56(12), 38–40 (2013)

14. Gonçalves, B., Osório, A.: Massive open online courses (MOOC) to improve teachers’ professional development. RE@ D-Revista de Educação a Distância e Elearning 1(1), 52–63 (2018) 15. Goodman, J., Melkers, J., Pallais, A.: Can online delivery increase access to education? J. Labor Econ. 37(1), 1–34 (2019) 16. Greene, J.A., Oswald, C.A., Pomerantz, J.: Predictors of retention and achievement in a massive open online course. Am. Educ. Res. J. 52(5), 925–955 (2015) 17. Guo, P.J., Reinecke, K.: Demographic differences in how students navigate through MOOCs. In: Proceedings of the 1st ACM Conference on Learning@ Scale Conference, pp. 21–30. ACM (2014) 18. Gütl, C., Rizzardini, R.H., Chang, V., Morales, M.: Attrition in MOOC: lessons learned from drop-out students. In: Uden, L., Sinclair, J., Tao, Y.-H., Liberona, D. (eds.) LTEC 2014. CCIS, vol. 446, pp. 37–48. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10671-7_4 19. Haavind, S., Sistek-Chandler, C.: The emergent role of the MOOC instructor: a qualitative study of trends toward improving future practice. Int. J. E-learn. 14(3), 331–350 (2015) 20. Henderikx, M.A., Kreijns, K., Kalz, M.: Refining success and dropout in massive open online courses based on the intention–behavior gap. Distance Educ. 38(3), 353–368 (2017) 21. Hone, K.S., El Said, G.R.: Exploring the factors affecting MOOC retention: a survey study. Comput. Educ. 98, 157–168 (2016) 22. Huang, N.-F., et al.: On the automatic construction of knowledge-map from handouts for MOOC courses. In: Pan, J.-S., Tsai, P.-W., Watada, J., Jain, Lakhmi C. (eds.) IIH-MSP 2017. SIST, vol. 81, pp. 107–114. Springer, Cham (2018). https://doi.org/10.1007/978-3319-63856-0_13 23. Jamil, H.M.: Automated personalized assessment of computational thinking MOOC assignments. In: 2017 IEEE 17th International Conference on Advanced Learning Technologies (ICALT), pp. 261–263. IEEE (2017) 24. Jiang, S., Williams, A., Schenke, K., Warschauer, M., O’dowd, D.: Predicting MOOC performance with week 1 behavior. In: Educational Data Mining 2014 (2014) 25. Jordan, K.: Initial trends in enrolment and completion of massive open online courses. Int. Rev. Res. Open Distrib. Learn. 15(1), 133–160 (2014) 26. Kaplan, A.M., Haenlein, M.: Higher education and the digital revolution: about MOOCs, SPOCs, social media, and the Cookie Monster. Bus. Horiz. 59(4), 441–450 (2016) 27. Karnouskos, S.: Massive open online courses (MOOCs) as an enabler for competent employees and innovation in industry. Comput. Ind. 91, 1–10 (2017) 28. Khalil, H., Ebner, M.: MOOCs completion rates and possible methods to improve retentiona literature review. In: EdMedia+ Innovate Learning, pp. 1305–1313. Association for the Advancement of Computing in Education (AACE) (2014) 29. Khalil, M., Ebner, M.: What massive open online course (MOOC) stakeholders can learn from learning analytics? In: Spector, M., Lockee, B., Childress, M. (eds.) Learning, Design, and Technology. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-17727-4_3-1 30. Kilgore, W., Lowenthal, P.R.: The human element MOOC. In: Student-Teacher Interaction in Online Learning Environments, pp. 373–391. IGI Global (2015) 31. Kim, K.J., Frick, T.W.: Changes in student motivation during online learning. J. Educ. Comput. Res. 44(1), 1–23 (2011) 32. Kirschner, P.A.: Stop propagating the learning styles myth. Comput. Educ. 106, 166–171 (2017) 33. Kloft, M., Stiehler, F., Zheng, Z., Pinkwart, N.: Predicting MOOC dropout over weeks using machine learning methods. 
In: Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 60–65 (2014) 34. Laato, S., Salmento, H., Murtonen, M.: Development of an online learning platform for university pedagogical studies-case study. In: CSEDU, vol. 2, pp. 481–488 (2018)

35. Laato, S., Lipponen, E., Salmento, H., Vilppu H., Murtonen, M.: Minimizing the number of dropouts in university pedagogy online courses. In: CSEDU 2019 (2019) 36. Laato, S., Salmento, H., Lipponen, E., Virtanen H, Vilppu H., Murtonen, M.: Increasing Diversity in University Pedagogical Training via UNIPS Open Learning Environment. Accepted for publication 26.7.2019 (2019) 37. Lee, Y., Choi, J.: A review of online course dropout research: implications for practice and future research. Educ. Tech. Res. Dev. 59(5), 593–618 (2011) 38. Li, N., Verma, H., Skevi, A., Zufferey, G., Blom, J., Dillenbourg, P.: Watching MOOCs together: investigating co-located MOOC study groups. Distance Educ. 35(2), 217–233 (2014) 39. Liyanagunawardena, T.R., Parslow, P., Williams, S.: Dropout: MOOC participants’ perspective (2014) 40. Mayer, R.E., Moreno, R.: Nine ways to reduce cognitive load in multimedia learning. Educ. Psychol. 38(1), 43–52 (2003) 41. Mazoue, J.G.: The MOOC model: challenging traditional education (2014) 42. McHugh, M.L.: The chi-square test of independence. Biochem. Med. 23(2), 143–149 (2013) 43. Meek, S.E., Blakemore, L., Marks, L.: Is peer review an appropriate form of assessment in a MOOC? Student participation and performance in formative peer review. Assess. Eval. High. Educ. 42(6), 1000–1013 (2017) 44. Milligan, C., Littlejohn, A.: Why study on a MOOC? The motives of students and professionals. Int. Rev. Res. Open Distrib. Learn. 18(2), 92–102 (2017) 45. Moore-Adams, B.L., Jones, W.M., Cohen, J.: Learning to teach online: a systematic review of the literature on K-12 teacher preparation for teaching online. Distance Educ. 37(3), 333–348 (2016) 46. Mulligan, B.: Lowering MOOC production costs and the significance for developing countries. In: Global Learn, pp. 352–358. Association for the Advancement of Computing in Education (AACE) (2016) 47. Murphy, J., Tracey, J.B., Horton-Tognazzini, L.: MOOC camp: a flipped classroom and blended learning model. In: Inversini, A., Schegg, R. (eds.) Information and Communication Technologies in Tourism 2016, pp. 653–665. Springer, Cham (2016). https://doi.org/10. 1007/978-3-319-28231-2_47 48. Murphy, C.A., Stewart, J.C.: On-campus students taking online courses: factors associated with unsuccessful course completion. Internet High. Educ. 34, 1–9 (2017) 49. Murtonen, M., Ponsiluoma, H.: Yliopistojemme tarjoamien yliopistopedagogisten koulutusten historia ja nykyhetki. Ylipistopedagogiikka 21(1), 7–9 (2014) 50. Nazir, U., Davis, H., Harris, L.: First day stands out as most popular among MOOC leavers. Int. J. e-Educ. e-Bus. e-Manage. e-Learn. 5, 173–179 (2015) 51. Nunez, J.L.M., Caro, E.T., Gonzalez, J.R.H.: From higher education to open education: challenges in the transformation of an online traditional course. IEEE Trans. Educ. 60(2), 134–142 (2016) 52. Oakley, B., Poole, D., Nestor, M.: Creating a sticky MOOC. Online Learn. 20(1), 13–24 (2016) 53. Onah, D.F., Sinclair, J., Boyatt, R.: Dropout rates of massive open online courses: behavioural patterns. In: EDULEARN14 Proceedings, pp. 5825–5834 (2014) 54. Ong, D., Jambulingam, M.: Reducing employee learning and development costs: the use of massive open online courses (MOOC). Develop. Learn. Organ. Int. J. 30(5), 18–21 (2016) 55. Paas, F., Tuovinen, J.E., Tabbers, H., Van Gerven, P.W.: Cognitive load measurement as a means to advance cognitive load theory. Educ. Psychol. 38(1), 63–71 (2003) 56. 
Packham, G., Jones, P., Miller, C., Thomas, B.: E-learning and retention: key factors influencing student withdrawal. Educ. + Train. 46(6/7), 335–342 (2004) 57. Park, S., Jeong, S., Ju, B.: MOOCs in the workplace: an intervention for strategic human resource development. Hum. Res. Develop. Int., 1–12 (2018)

58. Proctor, J.: With Attention Being Paid to Digital Learning, How Can Companies Promote and Raise Awareness of Training Opportunities? (2017) 59. Rai, L., Yue, Z., Yang, T., Shadiev, R., Sun, N.: General impact of MOOC assessment methods on learner engagement and performance. In: 2017 10th International Conference on Ubimedia Computing and Workshops (Ubi-Media), pp. 1–4. IEEE (2017) 60. Reich, J., Ruipérez-Valiente, J.A.: The MOOC pivot. Science 363(6423), 130–131 (2019) 61. Rivard, R.: Measuring the MOOC dropout rate. Inside Higher Ed, vol. 8 (2013) 62. Rodríguez, M.F., Hernández Correa, J., Pérez-Sanagustín, M., Pertuze, Julio A., AlarioHoyos, C.: A MOOC-based flipped class: lessons learned from the orchestration perspective. In: Delgado Kloos, C., Jermann, P., Pérez-Sanagustín, M., Seaton, Daniel T., White, S. (eds.) EMOOCs 2017. LNCS, vol. 10254, pp. 102–112. Springer, Cham (2017). https://doi.org/10. 1007/978-3-319-59044-8_12 63. Rokade, A., Patil, B., Rajani, S., Revandkar, S., Shedge, R.: Automated grading system using natural language processing. In: 2018 2nd International Conference on Inventive Communication and Computational Technologies (ICICCT), pp. 1123–1127. IEEE (2018) 64. Sanchez-Gordon, S., Calle-Jimenez, T., Lujan-Mora, S.: Relevance of MOOCs for training of public sector employees. In: 2015 International Conference on Information Technology Based Higher Education and Training (ITHET), pp. 1–5. IEEE (2015) 65. Sanchez-Gordon, S., Luján-Mora, S.: e-Education in countries with low and medium human development levels using MOOCs. In: 2016 3rd International Conference on eDemocracy & eGovernment (ICEDEG), pp. 151–158. IEEE (2016) 66. Singh, R., Gulwani, S., Solar-Lezama, A.: Automated feedback generation for introductory programming assignments. ACM Sigplan Not. 48(6), 15–26 (2013) 67. Stracke, C.M.: Why we need high drop-out rates in MOOCs: new evaluation and personalization strategies for the quality of open education. In: 2017 IEEE 17th International Conference on Advanced Learning Technologies (ICALT), pp. 13–15. IEEE. (July 2017) 68. Sujatha, R., Kavitha, D.: Learner retention in MOOC environment: analyzing the role of motivation, self-efficacy and perceived effectiveness. Int. J. Educ. Develop. ICT 14(2), 62–74 (2018) 69. Sullivan, R., Fulcher-Rood, K., Kruger, J., Sipley, G., van Putten, C.: Emerging technologies for lifelong learning and success: a MOOC for everyone. J. Educ. Technol. Syst. 47(3), 318–336 (2019) 70. Tashu, T.M., Horváth, T.: Pair-wise: automatic essay evaluation using word mover’s distance. In: CSEDU, vol. 1, pp. 59–66 (2018) 71. Terras, M.M., Ramsay, J.: Massive open online courses (MOOCs): insights and challenges from a psychological perspective. Br. J. Edu. Technol. 46(3), 472–487 (2015) 72. Tyler-Smith, K.: Early attrition among first time eLearners: a review of factors that contribute to drop-out, withdrawal and non-completion rates of adult learners undertaking eLearning programmes. J. Online Learn. Teach. 2(2), 73–85 (2006) 73. Vardi, M.Y.: Will MOOCs destroy academia? Commun. ACM 55(11), 5 (2012) 74. Vonderwell, S., Zachariah, S.: Factors that influence participation in online learning. J. Res. Technol. Educ. 38(2), 213–230 (2005) 75. Xiong, Y., Li, H., Kornhaber, M.L., Suen, H.K., Pursel, B., Goins, D.D.: Examining the relations among student motivation, engagement, and retention in a MOOC: a structural equation modeling approach. Glob. Educ. Rev. 2(3), 23–33 (2015) 76. 
Xu, W., Jia, Y., Fox, A., Patterson, D.: From MOOC to SPOC: lessons from MOOC at Tsinghua and UC Berkeley. Mod. Distance Educ. Res. 4(2014), 13–21 (2014) 77. Yang, M., Shao, Z., Liu, Q., Liu, C.: Understanding the quality factors that influence the continuance intention of students toward participation in MOOCs. Educ. Tech. Res. Dev. 65(5), 1195–1214 (2017). https://doi.org/10.1007/s11423-017-9513-6

78. Yousef, A.M.F., Chatti, M.A., Schroeder, U., Wosnitza, M.: What drives a successful MOOC? An empirical examination of criteria to assure design quality of MOOCs. In: 2014 IEEE 14th International Conference on Advanced Learning Technologies, pp. 44–48. IEEE (2014) 79. Zheng, S., Rosson, M.B., Shih, P.C., Carroll, J.M.: Understanding student motivation, behaviors and perceptions in MOOCs. In: Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, pp. 1882–1895. ACM (2015) 80. Zheng, S., Han, K., Rosson, M.B., Carroll, J.M.: The role of social media in MOOCs: how to use social media to enhance student retention. In: 2016 Proceedings of the 3rd ACM Conference on Learning@ Scale, pp. 419–428. ACM (2016)

Designing Culturally Inclusive MOOCs

Mana Taheri, Katharina Hölzle, and Christoph Meinel

Hasso Plattner Institute, University of Potsdam, Prof.-Dr.-Helmert-Strasse 2-3, 14482 Potsdam, Germany
{mana.taheri,katharina.hoelzle,christoph.meinel}@hpi.de

Abstract. Massive Open Online Courses (MOOCs) have become one of the increasingly popular channels of learning. Any given MOOC attracts thousands of learners from all around the globe with diverse linguistic, ethnic and cultural backgrounds. The diversity among MOOC learners is one of the unique characteristics that differentiates them from any given campus classroom. This poses a great challenge for instructional designers to create learning experiences that resonate with their diverse audience. This paper reports on the efforts of an instructional design team towards creating culturally inclusive MOOCs. A design-based approach was applied to experiment with, test and evaluate these efforts over the course of four MOOCs on the topic of Design Thinking. We used in-depth qualitative interviews with international course participants, pre- and post-course surveys, and observations from these courses to gather insights into learners' perspectives. The authors suggest a set of instructional strategies to consider while designing MOOCs to address the needs of learners from diverse cultural backgrounds. With the increasing uptake of MOOCs all around the globe, it is of utmost importance for MOOC designers to create learning experiences that are inclusive of their diverse learners.

Keywords: Massive Open Online Courses (MOOCs) · Cultural inclusivity · Diversity

1 Introduction

Massive Open Online Courses (MOOCs) have become one of the main channels for acquiring knowledge and new skills in today's world. Compared to a typical campus classroom, any given MOOC draws a large number of learners from different corners of the world, with diverse languages and ethnic and cultural backgrounds. Although this offers a great opportunity for bringing different views together in one place, it poses a great challenge for instructional designers to create learning experiences that are culturally inclusive [1]. The advent of distance education as we know it today began in the West, and ever since, a significant amount of educational content has been created by Western universities and institutions and exported to different parts of the world via the internet. As a result, there is a predominance of Western-centric instruction delivered to learners of diverse cultural backgrounds [2]. This trend continues to this day, as the majority of MOOC content is provided by Western educational institutes and distributed through Western MOOC platforms [3].

In this light, instructional designers need to apply strategies and practices to accommodate diverse learners' needs if they aim to effectively reach their global audience. Paying attention to cultural diversity needs to be part of the instructional design process [4]. There have been some attempts by scholars to investigate the implications of cultural diversity for instructional design in traditional classrooms. In addressing cultural diversity in multicultural classrooms, Parrish and Linder-VanBerschot [5] name the following nuanced challenges that instructional designers face:

• Gaining awareness of one's own cultural biases and tendencies and accepting that one's way of thinking is not necessarily the "right" way.
• Understanding the cultural differences among students and appreciating them in order to support their learning with appropriate instruction.
• Accepting the responsibility of instructional designers to acculturate and respect the cultural background of individual students.
• Understanding which student preferences and behaviors are related to cultural values and therefore do not necessarily need to be challenged.
• Considering that instructional strategies and practices which are research-based are also culture-based, and therefore might not be appropriate at all times and may need adaptation and modification.

With this background, MOOC designers face similar if not greater challenges in creating content and learning experiences that resonate with their diverse audience and are culturally inclusive. Despite the popularity of online learning and MOOCs, the body of literature on the intersection between cultural differences and the instructional design of online learning is narrow [2, 4, 6]. Chita-Tegmark et al. [7] suggest that creating learning experiences that reflect the cultural dimensions of learner variability not only diminishes learning barriers for culturally diverse learners, but also contributes to culturally informed learning opportunities for all learners. In addition, Meo [8] suggests that a curriculum that incorporates a variety of learner needs benefits not only the "non-typical" learners; the "mainstream" learners would also benefit from more flexible learning experiences.

This paper reports on the efforts of an instructional design team in creating culturally inclusive MOOCs. It builds on a paper published in the proceedings of the 11th International Conference on Computer Supported Education (CSEDU 2019), held in Crete, Greece [9]. The goal was to create effective and inclusive learning experiences by accounting for cultural diversity in all aspects of instructional design. We applied a design-based approach and experimented with a number of practices, evaluated them and revised them where necessary. In the conference paper, the authors reported on the results from three MOOCs, while in this work, analysis from the fourth MOOC is also included. This report is based on the results of four MOOCs on the topic of Design Thinking. For the sake of this work, we refer to these courses as MOOC#0, MOOC#1, MOOC#2, and MOOC#3.

The courses are available online in archive mode, and their titles and links are provided in the footnote below. The courses ran on openHPI, a well-established European MOOC platform, between November 2016 and September 2019 (see openhpi.de). Learner feedback was gathered through qualitative interviews with participants, as well as survey results and observations from the forum discussions and participants' interactions. At the time of writing this paper, MOOC#3 had just finished. This work has two major contributions: on the one hand, it adds to the relatively new discourse around MOOCs and cultural diversity and suggests practical recommendations for MOOC instructors and designers; on the other hand, it contributes to teaching Design Thinking, a popular creative problem-solving approach, online and at a global scale. For the purpose of this chapter, we focus on the first contribution and refrain from deeper discussions about the latter. However, the instructional strategies that were chosen specifically due to the qualitative and explorative nature of the topic of these MOOCs will be highlighted. In the following, we first provide a brief literature review on culture and learning. Next, the research design is described, followed by the instructional practices applied in each MOOC. We then present the results of learners' feedback on these practices as well as their relation to the literature. Finally, we offer implications for culturally inclusive MOOC design.

2 Learning and Culture

Culture is present in all aspects of an individual's experiences. Culture shapes what we value and determines what behaviors we deem appropriate in certain contexts. Moreover, it influences our interactions with our surroundings and determines which aspects of our environment we pay attention to [10]. In addition, some aspects of culture, including values, communication and learning styles, and traditions, have direct implications for teaching and learning [11]. Chen, Mashhadi, Ang, and Harkrider [12] emphasize the relationship between learning and culture, and suggest that instructors need to apply, and reflect critically on, instructional practices that address the needs of learners from diverse cultural backgrounds. They argue that cultural inclusivity is one of the essential pillars of a student-centered learning environment. In order to understand the implications of cultural aspects for learning and teaching, scholars have often borrowed from research in other fields and have applied existing theoretical frameworks [2]. One of the first and still most widely used frameworks of this kind is Hofstede's Model of Cultural Dimensions [13]. Critics of this popular framework, however, argue that one of the main drawbacks of Hofstede's work is that it oversimplifies cultural differences and implies a static view of culture [14].

1 MOOC#1: Inspirations for Design: A Course on Human-Centered Research. https://open.hpi.de/courses/insights-2017. MOOC#2: Human-Centered Design: From Synthesis to Creative Ideas. https://open.hpi.de/courses/ideas2018. MOOC#3: Human-Centered Design: Building and Testing Prototypes. https://open.hpi.de/courses/prototype2019.

In addition, Hofstede's study assumes a homogeneous domestic population within geographical borders and therefore dismisses the nuances and diversities that exist within different nations [15]. Goodfellow and Lamy [16] argue that the problem with this approach to culture is that it sees individuals mainly in terms of their cultural attributes:

"The understanding of the notion of cultural differences that underpins most current research arises from a view of cultures as the manifestation in individuals of all the values, beliefs and ways of thinking and doing things that come with the membership of particular national, tribal, ethnic, civic or religious communities. Culture, in this view, is a consequence of geographical, historical, climatic, religious, political, linguistic and other behavior and attitude-shaping influences that are assumed to act on everyone who shares the same physical and social environment." [16, p. 7]

It is important to mention that we agree with scholars such as Goodfellow and Lamy, Gunawardena and Jung, and Signorini et al., who view culture as a complex set of practices and values that are not necessarily limited to geographical borders and the national level. Failing to accommodate cultural sensitivities may lead to misunderstandings between instructors and students in both online and traditional courses. Considering the high level of diversity among the participants of any given MOOC, creating effective learning experiences with cultural diversity in mind is a great challenge for instructional designers.

Although MOOCs may seem very novel and innovative, they are only the latest advancement in the field of distance and open education [17]. Therefore, MOOC research should build on and draw from good practices in distance education. Spronk [18] highlights a number of good practices for accounting for learners' cultural diversity in developing and delivering distance education. These practices include contextualizing the learning, creating safe spaces for learning, welcoming alternatives, using media effectively, and celebrating diversity. Teachers and instructional designers play a significant role in creating inclusive learning experiences, both onsite and online. Rogers et al. [2] explore the role of instructional designers in understanding and addressing cultural diversity when creating online educational content. Alongside other scholars, such as McLoughlin and Oliver [19] and Chen and Mashhadi [20], they argue that there needs to be more sensitivity and responsiveness to cultural differences amongst instructional designers. Bonk and his colleagues conducted a survey with instructors from major MOOC platforms to gain insights into the strategies and practices that they apply to address cultural diversity in their courses. Some of these strategies include using visuals, using simple language and avoiding sophisticated phrases, providing text alongside audio or video, and refraining from gestures and body language that may not be familiar to other cultures. Regarding the importance of considering cultural diversity in course design, McLoughlin [21] states: "Unless educators address the issue of teaching to a diverse body of students, and do so systematically, then online delivery may become just another way of dumping course content, with the assumption that all students, regardless of cultural background, can access learning resources and achieve success."


3 Research Design

In this work we applied a design-based research methodology to experiment with different strategies and gather feedback from learners about their effectiveness. Anderson and Shattuck [22] describe design-based research as a methodology that helps bridge the gap between research and practice in educational research. Wang and Hannafin [23] characterize the methodology as systematic yet flexible, resulting from a collaboration between researchers and practitioners. It is an approach that applies iterative analysis, design, development and implementation to improve educational practices. Scholars therefore recommend close collaboration and partnership between researchers and practitioners throughout the whole process of identifying problems, consulting literature, designing interventions, and implementing and assessing them [22]. It was thus beneficial that the instructional designers were also conducting the research.

The following research questions guided this work:
• Regarding addressing cultural diversity among learners, how did the MOOCs perform?
• What are some of the practices that worked and what are the areas for improvement?

In order to address these questions, we used in-depth qualitative interviews with diverse learners, pre- and post-course surveys, as well as observations from assignments and discussion forums. By analyzing learner feedback and critically reflecting on each MOOC, we were able to improve the succeeding MOOC. In the following we briefly describe the structure of each MOOC and the instructional practices applied. We also report on learner feedback and the performance of each MOOC regarding cultural inclusivity.

3.1 MOOC#0

MOOC#0 served as a pilot version and ran in November 2016 on the openHPI platform. The aim was to experiment with various instructional practices and gather learner feedback. For this reason, we used various channels to recruit a limited number of international participants (120 enrolled learners). The main objective of this course was to enable learners to identify inspirations and opportunities for designing human-centered solutions. Therefore, the focus was on introducing the two methods of Qualitative Interviewing and Observation. Table 1 provides an overview of the structure of MOOC#0.

In the pre-course survey of the pilot MOOC, we asked participants to indicate their interest in taking part in follow-up interviews. Our goal was to evaluate how well MOOC#0 performed regarding cultural inclusivity and which aspects needed iteration and improvement for MOOC#1. In total, we conducted 16 interviews with course participants, nine of whom were international. Since the course was offered by a German institute, those who had lived most of their lives outside of Germany were considered international. The interviews were conducted by three researchers and lasted between 30 and 60 min. The interviewees were from the following countries: Chile (1), France (1), Iran (2), Russia (1), The Netherlands (1), Italy (1), India (1) and Kuwait (1). The


interviews were conducted via Skype or in person. Apart from the overall learning experience and the platform, some questions focused on the culturally inclusive aspects of the course. For instance, interviewees were asked whether they found any of the material offensive or culturally insensitive, how they perceived the course in terms of addressing cultural diversity, and whether there was any aspect of the course that they found unclear or confusing. And finally, we asked them to share their recommendations for future improvements.

Table 1. The structure of MOOC#0, from [9].
Week 1: Videos; Introduction game
Week 2: Videos; Exercise; Peer-reviewed assignment
Week 3: Videos; Exercise; Peer-reviewed assignment
Week 4: Wrap up

The interviewees had different levels of experience with MOOCs as well as with Design Thinking. For instance, one interviewee was herself an instructional designer of a MOOC on the same topic, whereas another interviewee was participating in his first MOOC. Therefore, the feedback gathered addressed different levels of the content and the instructional strategies applied in the course.

We dedicated the first week of the course to letting learners get familiar with the learning platform. We designed an introductory game that would encourage interactions between learners and offer opportunities for learners to share something from their own context with their peers. In this game, participants were asked to introduce themselves by sharing a picture of three important artefacts from their daily lives in the discussion forum and explaining their choices. Learners reacted positively to this exercise conducted in the first week (44 learners participated). The activity sparked many conversations on the forum, and interactions continued until the final week. All interviewees had a positive view of the first week and the introductory game.

We used clear and simple language for instructions. Instead of using domain-specific terminology and jargon, we looked for simpler and more familiar synonyms where possible. Surprisingly, the fact that the instructors were non-native but fluent English speakers contributed to the simple language of the course, something that was even viewed as positive by one interviewee.

The lecture videos were short (max. 6 min) and were used either to introduce a concept or to demonstrate an application of the concept in real life. We paid close attention to the visuals and examples used in the videos and made sure to incorporate images and references from around the globe (e.g. examples from South Africa and France). Bringing in narratives and individuals from different cultures helps students to understand commonalities as well as differences among cultures [24]. In addition, we consciously refrained from referring to characters and events that are only known in a specific part of the world.

Following Bonk and his colleagues' recommendation, we used visuals and images in both video and written content, instead of long text. We paid careful attention to


using visuals and text that are culturally appropriate and refrained from using content that could evoke unwanted emotional reactions from students and therefore interfere with their learning. Griffin, Pettersson, Semali and Takakuwa [25] point out how simple symbols can have completely different, and sometimes offensive, interpretations across cultures. Murrell [26] also warns about the use of icons and symbols of alcohol (e.g. popping champagne) or animals as feedback responses in learning material.

Throughout the course, we designed exercises that offered learners a chance to practice what they learned without being graded. There were two qualitative assignments in the course that required peer assessment. In designing the assignments, we followed Nkuyubwatsi's [27] recommendation and enabled learners to make their learning relevant to their own context by giving them the freedom to choose project topics and share examples from their environment. The first peer-reviewed assignment asked learners to take a picture of a creative solution to a design problem (also known as a Workaround) from their own lives. The feedback showed that giving students the freedom to choose project topics and share with their peers is a good strategy for allowing students to contextualize their learnings. However, through our observations and learner feedback, we learned that sharing only a picture of a Workaround from one context might not suffice and could be confusing for peers to review, creating a risk of under-grading by peers. In other words, a creative and innovative solution in one context may be a common practice in another.

In the second assignment, we offered three interview topics and asked learners to pick one and to conduct an interview. The challenge was to select three topics that are relevant to different contexts. Therefore, we asked ourselves the following questions: Is this topic culturally insensitive to any group? (e.g. dating in the digital age). Can everyone relate to the topic regardless of their context? (e.g. living in a shared flat). Does this topic exclude any group? (e.g. gym membership). We discussed various potential topics through the lens of these questions, and finally picked the following three: visiting a new city, packing and preparing for a trip, and the first day at a new job or at school. In this way we made sure that the topics were relevant to different cultural and professional settings [27]. For this assignment, we designed a visual template that helped learners organize the tasks they were required to execute.

Lastly, the diversity among the team of instructors was pointed out by interviewees as a positive aspect. One of the instructors was from Iran while the other two were from Germany, and all three came from different disciplines. We included our own unique stories and experiences in the form of little anecdotes within the course content (e.g. in the introduction game at the beginning of the course).

3.2 MOOC#1

After iterating the pilot MOOC and incorporating feedback from learners, MOOC#1 was launched publicly in September 2017 with about 5000 participants. The structure and the content of this course remained the same, and we continued with the good practices from the pilot version. The following describes the iterations we made based on feedback from MOOC#0.

We introduced a visual time plan and changed the assignment deadlines in the course. A common practice in MOOCs, which are mostly designed in the West, is to set


deadlines on the weekend (e.g. on Sundays), whereas in several countries in the Middle East, Friday is the day of rest. This was also mentioned during the interviews with the participants from the pilot MOOC. Therefore, we set the deadlines and the release date for each week's material to Thursdays. Thursday is close to the end of the week for everyone, and in this way learners have a full weekend prior to each week's deadline.

We provided the option to download the content as text and added subtitles to the video lectures. Learners could also download a summary of the course content, and additional readings and resources were recommended for those who wished to delve deeper into the given topics.

Although allowing learners to contextualize their learning through identifying a local Workaround and sharing a picture with their peers proved to be a good practice, we learned that allocating a space in the assignment for learners to provide context and reasoning alongside the image is important. This resulted in less confusion and better peer reviews on the assignment.

During the wrap-up week, we recorded a video showcasing examples of learner submissions for the assignments. We made sure to select submissions from learners in different parts of the world in order to represent the diversity in the course. Finally, we asked learners to reflect on how they would apply their learnings in their professional or daily lives and share this on the forum (Fig. 1).

Fig. 1. Survey answers regarding cultural inclusivity of MOOC#1, from [9].

The following two questions are examples of the questions asked in the post-course survey to evaluate the MOOCs' performance with regard to cultural inclusivity:
• How would you evaluate this course in terms of cultural inclusivity? (10: very good, 1: very poor)
• Was there any aspect of this course that you found insensitive towards your own or any other culture? If yes, please elaborate.

In this MOOC, 529 learners submitted the survey. The two following figures demonstrate the evaluations of the above-mentioned questions, respectively. It is worth mentioning that those who answered yes to the second question did not provide any written responses that we could report here.


3.3 MOOC#2

This MOOC was launched in September 2018 with around 3500 participants. The topic of the MOOC was Synthesis and Ideation. We adjusted the structure of this MOOC to allow for more learner flexibility. All content was made accessible at once, and we offered only one assignment that would cover both core topics of the course. This approach proved to work better for our learners. Table 2 shows the structure of this MOOC.

Table 2. The structure of MOOC#2, from [9].
Week 1: Videos; Introduction game
Week 2: Videos; Exercise; Peer-reviewed assignment
Week 3: Videos; Exercise
Week 4: Wrap up

In order to create an environment of mutual respect among learners [28], we introduced a short video on the ethics and values that we, as instructors, wished to promote in the course. Some of the topics covered in this video included an emphasis on valuing constructive feedback from peers over grades and being mindful of the diverse community of learners. One of our learnings from MOOC#1 was to encourage constructive feedback in peer reviews. In fact, different cultures have different attitudes towards feedback [29]. While direct feedback may be deemed valuable in one culture, in another culture it may be perceived as impolite and thus discouraging. Therefore, in this video, we emphasized how to provide constructive feedback along with the use of grading rubrics. In addition, in MOOC#1 we had observed that some of the German participants would ask questions or write comments in German. In this video we once again reminded our learners of the language of the course and the importance of trying to include everyone in the forum conversations.

In the post-course survey, we asked participants to give us feedback on the ethics and values video. The responses were quite positive. Many mentioned that it helped them with providing feedback and with focusing on learning rather than grades. Some also found it valuable in raising awareness about cultural sensitivity within the course. The following quotes are some examples of learners' feedback:

"I find the video helpful because it reminded me of the different perceptions of critique in different nations."

"For me this video was very helpful, because first of all cultural sensitivity and constructive feedback are always very important. And second, I dared to write in English, although my English is not so good."

"This is the first time that I have seen this type of video in a MOOC. I think it's great to have considered multicultural components that can represent a MOOC."

"I found it helpful, as I come from a direct type of culture and usually don't praise enough."


Introducing this video led to fewer misunderstandings and complaint reports from peer reviews. Finally, in order to offer learners multiple avenues to interact with the content, we created two podcasts that delved deeper into the theories behind synthesis and ideation in Design Thinking.

The post-course survey data shows a positive perception of the overall performance of this MOOC with regard to cultural inclusivity. We repeated the two questions from MOOC#1, asking participants to evaluate the course regarding its cultural inclusivity (10: very good, 1: very poor) and to share with us if they had found any aspect of the course insensitive. 285 learners submitted the survey. The following two graphs demonstrate the evaluations of the above-mentioned questions, respectively (Fig. 2):

Fig. 2. Survey answers regarding cultural inclusivity of MOOC#2, from [9].

3.4 MOOC#3

The third course of the MOOC series on Design Thinking ran in August 2019, with around 3500 participants. The topic of this course was Prototyping and User Testing. Since we received positive feedback on the structure of MOOC#2 and the flexibility it offered to learners, we decided to keep the structure of this course the same. We continued with the good practices from the previous courses. We introduced a short video that explained the course assignment and reminded learners to watch the video about course ethics and values again prior to reviewing their peers' work. In fact, this was recommended by one of the learners in the post-course survey of MOOC#2.

The following two graphs demonstrate how learners perceived the performance of the course regarding cultural inclusivity, and whether they found any aspect of the course insensitive to their own or any other culture. 268 learners participated in the survey (Fig. 3).

Finally, the following quotes are from some of the learners in this MOOC about the ethics and values video.

"I found it absolutely necessary to raise awareness about this topic. Coming from German engineering work culture, I know that we usually are perceived as being very direct and ruthless with our critique and feedback."


Fig. 3. Survey answers regarding cultural inclusivity of MOOC#3.

"For me personally, it goes without saying that you treat each culture respectfully. Nevertheless, I found this video useful in terms of cultural sensitivity and constructive feedback because as a result, this topic was once again called to mind for everyone."

"I think the video is a really good instruction to set the right tone for everybody taking part. Personally, I am super happy to have the openhpi.de pointing out that they celebrate diversity and welcoming people from all over the world."

4 Recommendations for MOOC Designers

In this work, we applied design-based research, which allowed us to experiment with and try out different instructional practices and to gather insights from learners' feedback. Testing a pilot version of MOOC#1 with a limited number of participants proved to be very valuable, even for designing the two successive MOOCs that dealt with different aspects of Design Thinking. One of our recommendations to MOOC designers is to test the entire learning experience, or various aspects of it (e.g. an assignment or a template), with a diverse group of potential learners, to gain insights into learners' perspectives.

According to Kieran and Anderson [30], one of the pillars of culturally responsive teaching is designing assignments that encourage learners to make meaning and construct knowledge in their own context. Apart from making sure that learners can see the applications in their professional or everyday lives, instructors need to ensure that learners from diverse cultural settings can relate to the course and its content. This requires MOOC instructors to think beyond what they are familiar with. Thus, we recommend reaching out to people from different cultural backgrounds, asking them for feedback on how the course and its different facets resonate with them, and being open to adjusting and changing the course if needed.

A good way to create a learning community and to promote an environment of mutual respect is to create opportunities for learners to share some aspect of their own context with other participants. Using introductory games that allow learners to feel comfortable and encourage them to share proved to be a good way to start a


course. Therefore, we recommend that MOOC designers apply creative ways to start their courses instead of jumping right into the content. To make sure that learners from diverse backgrounds and with different levels of MOOC experience feel welcome, one helpful approach is to allocate the first week to getting familiar with the learning platform and getting to know the community in a playful way.

According to Ginsberg [28], project-based assignments that offer learners the freedom to choose a project topic they find relevant to their context are a good approach towards a culturally responsive learning environment. However, when using peer-reviewed and project-based assignments, MOOC instructors need to be aware of the risk of misinterpretations that may occur due to a lack of context-relevant knowledge between peers. This may eventually lead to poor or unfair grading in peer reviews. Therefore, creating clear grading rubrics is crucial. Moreover, providing examples of the application of rubrics can be very helpful. Finally, allocating space within the assignment for learners to provide context and reasoning behind their choices can help to avoid misunderstandings as well.

Liyanagunawardena and Adams [31] highlight the challenges of creating inclusive and dynamic discussions in MOOCs; for instance, something humorous in one context may be perceived as offensive in another. Given that in a MOOC learners from various cultures engage in the dialogue, the risk of misunderstandings and conflicts is higher than in a traditional classroom. Therefore, instructors need to pay careful attention to facilitating the dialogue and interactions in forums and reviews. Moreover, we recommend that instructors explicitly emphasize, through video or text, the behaviours and values that they wish to promote in their MOOCs.

The instructional strategies that were tested in these three MOOCs were informed by the literature and our own experiences as instructors. It is worth mentioning that we are well experienced in teaching Design Thinking in international formats. All three instructors have either studied or worked internationally and in different countries. In fact, the instructors' backgrounds and cultural sensitivity can be a valuable asset in designing inclusive learning experiences. Therefore, forming instructional design teams with diverse cultural backgrounds can lead to learning experiences that resonate with diverse learners.

Some of the recommendations for MOOC designers may seem minor, such as changing the assignment and course deadlines from Western norms. However, they send a strong message to learners from different cultures that they are welcome and that the course is designed with them and their needs in mind.

5 Discussion

MOOCs have the great potential of bringing new skills and knowledge to learners from all around the world. With the ever-increasing popularity of MOOCs and the growing number of universities and organizations offering MOOCs, good instruction needs to go beyond transforming an existing lecture into a compact online format. If MOOC instructors wish to create effective learning experiences for their diverse audience, they need to step out of their comfort zone, be more sensitive towards different cultures, and go beyond what they are familiar with. Exploring creative ways to reach a global audience from diverse cultural backgrounds is indeed a great but exciting challenge for MOOC designers.


As Nkuyubwatsi [27] highlights, those who aim to democratize education and improve people's lives through education in developing countries need to develop an understanding of local challenges from the perspective of local people in those contexts. In other words, they "need to empathize with local stakeholders." The grand mission of democratizing education has no real meaning unless MOOC instructors embrace cultural diversity and try to create learning experiences that resonate with learners beyond their own context. Cultural responsiveness needs to be present in all aspects of MOOC creation, including planning, design, delivery and assessment. Moreover, besides domain-related knowledge and skills, instructors need to equip themselves with culturally responsive teaching practices.

According to McLoughlin [21], the common view on inclusivity is 'deficit-driven', which implies that international learners of diverse race, language and ethnic backgrounds need to be brought up to the 'normal' standards by compensating for their 'deficit'. On the contrary, inclusivity means allowing for diverse experiences to be expressed in teaching and learning and embracing differences [32]. Our experience of designing this MOOC series with a diverse audience in mind supports the view that embracing diversity within MOOCs contributes to a rich learning experience for all. We encourage MOOC designers to treat diversity as a valuable asset in their instructional design process and practices, rather than a hurdle, and to take advantage of its potential for designing creative and innovative instructional practices. After all, inclusivity is simply part of good pedagogy.

References
1. Taheri, M., Mayer, L., von Schmieden, K., Meinel, C.: The DT MOOC prototype: towards teaching design thinking at scale. In: Plattner, H., Meinel, C., Leifer, L. (eds.) Design Thinking Research. UI, pp. 217–237. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-60967-6_11
2. Rogers, P.C., Graham, C.R., Mayes, C.T.: Cultural competence and instructional design: exploration research into the delivery of online instruction cross-culturally. Educ. Technol. Res. Dev. 55, 197–217 (2007). https://doi.org/10.1007/s11423-007-9033-x
3. Frechette, C., Gunawardena, C.N.: Accounting for culture in instructional design (2014)
4. Bonk, C.J., Zhu, M., Sari, A., Kim, M., Sabir, N., Xu, S.: Instructor efforts to address cultural diversity in MOOC design and application
5. Parrish, P., Linder-VanBerschot, J.A.: Cultural dimensions of learning: addressing the challenges of multicultural instruction. Int. Rev. Res. Open Distrib. Learn. (2010). https://doi.org/10.19173/irrodl.v11i2.809
6. Jung, I.: Cultural influences on online learning (2014)
7. Chita-Tegmark, M., Gravel, J.W., De Lourdes, B.S.M., Domings, Y., Rose, D.H.: Using the universal design for learning framework to support culturally diverse learners. J. Educ. 192, 17–22 (2012). https://doi.org/10.1177/002205741219200104
8. Meo, G.: Curriculum planning for all learners: applying universal design for learning (UDL) to a high school reading comprehension program. Prev. Sch. Fail. Altern. Educ. Child. Youth (2008). https://doi.org/10.3200/psfl.52.2.21-30
9. Taheri, M., Hölzle, K., Meinel, C.: Towards culturally inclusive MOOCs: a design-based approach. In: CSEDU 2019—Proceedings of the 11th International Conference on Computer Supported Education (2019)


10. Nisbett, R.E., Choi, I., Peng, K., Norenzayan, A.: Culture and systems of thought: holistic versus analytic cognition. Psychol. Rev. (2001). https://doi.org/10.1037/0033-295X.108.2.291
11. Gay, G.: Preparing for culturally responsive teaching. J. Teach. Educ. 53, 106–116 (2001). https://doi.org/10.1177/0022487104267587
12. Chen, A.Y., Mashhadi, A., Ang, D., Harkrider, N.: Cultural issues in the design of technology-enhanced learning systems. Br. J. Educ. Technol. 30, 217–230 (1999). https://doi.org/10.1111/1467-8535.00111
13. Hofstede, G.: Cultural differences in teaching and learning. Int. J. Intercultural Relat. 10, 301–320 (1986). https://doi.org/10.1016/0147-1767(86)90015-5
14. Signorini, P., Wiesemes, R., Murphy, R.: Developing alternative frameworks for exploring intercultural learning: a critique of Hofstede's cultural difference model. Teach. High. Educ. 14, 253–264 (2009). https://doi.org/10.1080/13562510902898825
15. Gu, Q., Maley, A.: Changing places: a study of Chinese students in the UK. Lang. Intercultural Commun. 8, 224–245 (2008). https://doi.org/10.1080/14708470802303025
16. Goodfellow, R., Lamy, M.-N.: Learning Cultures in Online Education. A&C Black (2009)
17. Mazoue, J.G.: The MOOC model: challenging traditional education. Educause Review (2013)
18. Spronk, B.J.: Addressing cultural diversity through learner support. In: Learner Support in Open, Distance and Online Environments (2004)
19. McLoughlin, C., Oliver, R.: Designing learning environments for cultural inclusivity: a case study of indigenous online learning at tertiary level. Australas. J. Educ. Technol. 16 (2000). https://doi.org/10.14742/ajet.1822
20. Chen, A.Y., Mashhadi, A.: Challenges and problems in designing and researching distance learning environments and communities (1998)
21. McLoughlin, C.: Inclusivity and alignment: principles of pedagogy, task and assessment design for effective cross-cultural online learning. Distance Educ. (2001). https://doi.org/10.1080/0158791010220102
22. Anderson, T., Shattuck, J.: Design-based research: a decade of progress in education research? Educ. Res. 41, 16–25 (2012). https://doi.org/10.3102/0013189X11428813
23. Wang, F., Hannafin, M.J.: Design-based research and technology-enhanced learning environments (2005)
24. Huang, H.-J.: Designing multicultural lesson plans. Multicultural Perspect. 4, 17–23 (2002). https://doi.org/10.1207/s15327892mcp0404_4
25. Griffin, R.E., Pettersson, R., Semali, L., Takakuwa, Y.: Using symbols in international business presentations: how well are they understood. In: Imagery and Visual Literacy (1994)
26. Murrell, K.A.: Human computer interface design in a multi-cultural multi-lingual environment. In: 13th Annual MSc and PhD Conference in Computer Science (1998)
27. Nkuyubwatsi, B.: Cultural translation in massive open online courses (MOOCs). In: eLearning Papers (2014)
28. Ginsberg, M.B.: Cultural diversity, motivation, and differentiation. Theory Pract. 44, 218–225 (2005). https://doi.org/10.1207/s15430421tip4403_6
29. Bailey, J.R., Chen, C.C., Dou, S.-G.: Conceptions of self and performance-related feedback in the U.S., Japan and China. J. Int. Bus. Stud. 28, 605–625 (1997). https://doi.org/10.1057/palgrave.jibs.8490113
30. Kieran, L., Anderson, C.: Connecting universal design for learning with culturally responsive teaching (2018)
31. Liyanagunawardena, T., Williams, S., Adams, A.: The impact and reach of MOOCs: a developing countries' perspective. In: eLearning Papers (2013)
32. Gallini, J.K., Zhang, Y.-L.: Socio-cognitive constructs and characteristics of classroom communities: an exploration of relationships. J. Educ. Comput. Res. 17, 321–339 (1997). https://doi.org/10.2190/y71j-t3ej-vftj-p6wg

Evaluation of an Interactive Personalised Virtual Lab in Secondary Schools

Ioana Ghergulescu1(B), Arghir-Nicolae Moldovan2, Cristina Hava Muntean2, and Gabriel-Miro Muntean3

1 Adaptemy, Dublin, Ireland
[email protected]
2 School of Computing, National College of Ireland, Dublin, Ireland
{arghir.moldovan,cristina.muntean}@ncirl.ie
3 School of Electronic Engineering, Dublin City University, Dublin, Ireland
[email protected]

Abstract. Virtual labs are increasingly used for STEM education as they enable students to conduct experiments in a controlled environment at their own pace. While there has been much research on personalisation in technology enhanced learning, most existing virtual labs lack personalisation features. This chapter presents results from a small-scale pilot in two secondary schools in Ireland that was conducted to evaluate the Atomic Structure interactive personalised virtual lab as part of the NEWTON project. The research methodology followed a multidimensional pedagogical assessment approach to evaluate the benefits of the lab in terms of learning outcomes, learner motivation and lab usability. The students were divided between an experimental group that learned with the lab and a control group that attended a traditional class. The results analysis shows that the experimental group outperformed the control group in terms of both learning achievement and motivation dimensions such as interest, confidence, engagement and enjoyment.

Keywords: Virtual labs · Personalisation · STEM education · Inquiry-based learning · Evaluation

1 Introduction

Currently, there is concern related to the low and decreasing engagement with science, technology, engineering and mathematics (STEM) education across many countries and education levels, ranging from secondary to third level [31, 39, 47]. This issue is drawing increasing attention from bodies such as the OECD and the EU, given that STEM graduates play a crucial role in today's knowledge economies through technological innovation [20, 46]. Often, students start to become disengaged with STEM at an early age due to factors such as the perceived difficulty of STEM modules [47, 51], negative perceptions of the STEM field and negative beliefs in their abilities for STEM [1, 58]. Various innovative technologies and pedagogical approaches have been proposed in order to


increase students' engagement with STEM subjects, including: adaptive and personalised learning [44], inquiry-based learning [31], and remote fabrication labs and virtual labs [48]. Virtual labs help to overcome the costs associated with maintaining physical labs and offer a solution for making practical science education available to online learners [35, 36]. In a virtual lab, students can practice at their own pace in a safe simulated environment, being able to define their own experiments and to repeat them as many times as they want. While many virtual labs have been developed over the years for different subjects and education levels, most of these are not personalised to learners' needs and profiles. Furthermore, few research studies have evaluated the benefits of virtual labs in terms of their impact on learner motivation aspects such as interest, confidence, engagement and enjoyment.

This chapter presents the results of a small-scale pilot conducted in two secondary schools in Ireland to evaluate the Atomic Structure interactive personalised virtual lab. The lab enables students to learn about atoms, isotopes and molecules in an active learning way by integrating various technologies and pedagogies, including inquiry-based learning, multimedia, interactive builders, quizzes, and gamification. The pilot used a multidimensional methodology that applied knowledge tests and surveys before and after the learning session in order to assess the impact of the Atomic Structure virtual lab on learners' knowledge, motivation and usability.

1.1 The NEWTON Project

This research work is part of the NEWTON Project (http://newtonproject.eu), a large-scale EU H2020 innovation action that involves 14 academic and industry partners. The project focuses on employing innovative technologies to support and improve STEM education. NEWTON innovative technologies include solutions for user modelling, adaptation and personalisation in order to increase learner quality of experience, improve the learning process, and potentially increase learning outcomes [17]. Interactive educational computer-based video games and gamification are used to stimulate and motivate students [18], augmented reality allows learners to access computer-generated models of scientific content, while interactive avatars guide students with special learning needs in a manner which suits them best [10]. Virtual teaching and learning laboratories [9] and remote fabrication labs [55] allow students to experiment in simulated environments and eventually transform their solutions into real-life products. The project also employs adaptive multimedia to overcome network and device limitations and improve the quality of experience [25, 40, 42], as well as multi-sensorial media (or "mulsemedia") that helps engage three or more human senses in the learning process, including smell and touch [7]. Different innovative pedagogical approaches are also deployed as part of the STEM teaching and learning process, such as flipped classroom, game-based and problem-based learning [12, 45, 61].

All these technologies are implemented and deployed within an educational platform called NEWTELP (http://newtelp.eu). The platform enables educational content to be stored and delivered to learners as part of real-life pilots to see whether and how they help


students to engage more with STEM subjects. Over thirty NEWTON small- and large-scale pilots on various technologies were conducted in different primary and secondary schools, universities and vocational institutions from across Europe.

This chapter is an extended version of the paper presented at the CSEDU 2019 conference [24]. The chapter includes an additional literature review on self-directed learning and personalisation, as well as more detailed results from the small-scale pilot that evaluated the Atomic Structure interactive personalised virtual lab.

1.2 Chapter Structure

The rest of this chapter is organized as follows. Section 2 discusses recent related work on self-directed learning, personalisation and virtual labs. Section 3 presents the Atomic Structure interactive personalised virtual lab. Section 4 presents the research methodology for the evaluation study. Section 5 presents the results analysis in terms of learning, motivation and usability. Section 6 discusses the main findings and limitations of the study and concludes the chapter.

2 Related Work

2.1 Self-directed Learning

One of the innovative aspects of the Atomic Structure interactive virtual lab is that it integrates personalisation based on the student's level of self-directed learning. Self-directed learning (SDL) is not a new idea, with the Greek biographer and essayist Plutarch stating in the first century AD that "the mind is not a vessel to be filled, but a fire to be kindled". While there are various definitions, the most cited and well-known is the one proposed by Malcolm Knowles in his book "Self-directed learning: a guide for learners and teachers" [34]:

"In its broadest meaning, self-directed learning describes a process in which individuals take the initiative, with or without the help of others, in diagnosing their learning needs, formulating learning goals, identifying human and material resources for learning, choosing and implementing appropriate learning strategies, and evaluating learning outcomes." [34]

As summarised in [56], self-directed learning involves several different activities:
• Setting own learning goals;
• Identifying appropriate learning resources;
• Selecting appropriate learning strategies;
• Selecting important from unimportant;
• Integrating material from different sources;
• Time management;
• Monitoring achievement of learning outcomes;
• Monitoring effectiveness of own study habits.


While self-directed learning was commonly associated with adult learning and third-level education [19], it is also increasingly applied at other levels of education, such as secondary education [2], vocational education [11], and primary education [15]. Self-directed learning is considered especially important for online learning environments, where there is a separation between teacher and student. Online environments provide learners with an increased level of control and also influence learners' perception of their self-direction [52]. Various frameworks for self-directed learning have been proposed in the literature. According to the framework proposed by Tan and Koh [54], self-directed learning can be applied both in school and out-of-school contexts. Moreover, the learning experience in each context can be either structured or unstructured. Song and Hill [52] have proposed a conceptual model for understanding self-directed learning in online environments that frames SDL as a combination of personal attributes and learning processes.

2.2 Personalisation

Personalisation is a key factor in modern education, as the differences between learners are now widely recognised by researchers and educators alike. Personalisation is also one of the biggest current trends in the e-learning industry [16]. There is an increasing need for personalised and intelligent learning environments, as learners differ in levels of knowledge and motivation and have a variety of learning styles and preferences [26]. Personalised learning can support individual learning and further engage learners in their studies. Gaps between slow and fast learners are consistently emerging, and teaching to cater for these differences between students is noted as one of the most challenging aspects by science educators [32, 35]. There is a call for personalisation to be implemented in modern pedagogies in order to meet the needs and interests of different types of learners [4]. In Technology Enhanced Learning (TEL), personalisation is one of the key features and can assist in bringing the focus of the learning experience to the student instead of the teacher [57].

Personalisation can be implemented at the learning system or course level, or specifically at the content level. At course level, several intelligent learning systems have implemented localisation and internationalisation [28, 33, 50]. Other systems enable educators and course organisers to add, develop and choose their own specific modules and courses [14, 33, 50]. Learning paths-based personalisation enables the provision of content in the manner most suitable for each student. User modelling is also commonly used as a personalisation attribute in a learning system, as gathering data about the learner enables the system to adapt to the learner. Innovative pedagogies are also implemented as part of personalisation, and include e.g. self-directed learning (SDL), game-based learning and inquiry-based learning. Content-level personalisation is more common and is utilised in many systems at different levels. The different types of content-level personalisation include gamification, feedback-based personalisation, variation in levels of difficulty, and innovative pedagogies (learning loops, special education, learning paths-based personalisation, user-model based personalisation).


2.3 Virtual Labs

While many virtual labs were developed over the years, most of them targeted third-level education rather than secondary school education, although universities typically have more resources and better physical laboratories and equipment. Moreover, this is despite the fact that learners' disengagement from the STEM area starts during secondary level education in many countries, when students start choosing which subjects they wish to pursue [1, 8]. Table 1 presents a summary of some existing virtual labs and platforms.

Several European projects have focused on virtual labs. The Go-Lab project [33] has created a platform that enables educators to host and share with other users virtual labs, apps and inquiry learning spaces. The VccSSe project [27] created a virtual community collaborating space for science education that provided virtual labs and training materials on physical laws, including simulation-based exercises. The GridLabUPM [22] platform hosts a number of virtual laboratories that offer students practical experiences in the fields of electronics, chemistry, physics and topography. The BioInteractive [30] platform provides science education resources including activities, videos and interactive media (i.e., virtual labs, click & learn, interactive videos, 3D models, short courses). Other virtual labs/platforms include the Gizmos mathematics and science simulations [21], the Chemistry Lab and Wind Energy Lab [38], ChemCollective [60], Open Source Physics [13], and Labster [53].

Table 1. Summary of existing virtual labs and platforms [24].

Virtual Lab/Platform name | Activities and learning materials | Adaptation and personalisation
The Go-Lab Project [33] | Multimedia material, interactive learning activities | Gamification, internationalisation, inquiry learning spaces
Open Source Physics [13] | Chat, email, virtual reality | N/A
VccSSe [27] | Interactive learning activities | N/A
BioInteractive (HHMI, n.d.) | Activities, videos, interactive media | N/A
Gizmos [21] | Interactive simulations | N/A
Chemistry Lab, Wind Energy Lab [38] | Mini-games | Difficulty adjustment
ChemCollective [60] | Interactive learning activities | N/A
Labster [53] | Simulation-based exercises | N/A

Most of these virtual labs offer simulation-based exercises, interactive activities and online tutorials to assist the student in their learning journey. The online tutorials and the multimedia educational resources are suitable to present the theoretical aspects, while


the interactive activities and simulation-based exercises are important for acquiring practical skills and understanding the phenomena/concepts. While virtual labs offer students a chance to practice their all-important practical skills in a safe environment, most virtual labs lack personalisation and adaptation features, and neglect inclusive education. Many virtual labs have also been criticised for oversimplification of experiments, with the result that students do not learn all the necessary skills associated with specific exercises.

A number of research studies have evaluated virtual labs. Aljuhani et al. [3] evaluated a chemistry virtual lab in terms of usability and knowledge improvement. The virtual lab was found to be an exciting, useful, and enjoyable learning environment during user trials. The main drawbacks of their study were the low number of participants and the lack of control and experimental groups. Migkotzidis et al. [38] evaluated the Chemistry and the Wind Energy Lab in terms of usability, adoption, and engagement with the virtual labs. The participants expressed a positive opinion regarding the virtual lab interface and showed high engagement rates. Bogusevschi et al. [10] evaluated a virtual lab with 52 secondary school students in terms of learning effectiveness. The results showed a statistically significant improvement for the experimental group using the virtual lab as compared to the control group learning through a classic teacher-based approach. Bellou et al. [6] conducted a systematic review of empirical research on digital learning technologies and secondary Chemistry education. The review of 43 studies showed that researchers were mainly interested in chemistry topics and in using digital learning technologies for visualisation and simulation, but not in personalising the learning journey.

Despite much research and development in the area, there is still a lack of personalised virtual labs and a need for more comprehensive evaluation studies that look at the impact of virtual labs from multiple dimensions such as learner knowledge, motivation and usability. This study contributes to this area of research through a comprehensive multidimensional evaluation study of the Atomic Structure interactive personalised virtual lab with secondary school students.

3 Overview of the Atomic Structure Virtual Lab

Atomic Structure is an interactive personalised virtual lab for secondary level students that teaches abstract scientific concepts, such as the structure of atoms, the bonding of molecules, and the gaining and losing of electrons, which can be hard for students to grasp and difficult for teachers to present with traditional teaching materials [23, 37]. The Atomic Structure virtual lab places the student in the centre of the learning experience by implementing personalisation at various layers. The pedagogical foundations of this virtual lab are self-directed learning, learning in flow, and inquiry-based learning. These innovative pedagogies are beneficial for enabling learners to carry out their own experiments, analyse and question, and take responsibility for their own learning [59], while personalisation makes the learning experience an individual one and keeps the learner engaged.

Figure 1 shows the models built into the Atomic Structure virtual lab to enable personalisation and adaptation. The virtual lab covers concepts such as atoms, isotopes


and molecules. The learning path is guided by the Curriculum Model's structure and organisation. For example, a student can only start the isotopes part of the virtual lab when they meet the prerequisite of completing the atoms part.


Fig. 1. Adaptation and Personalisation input models: Pedagogical Model, Curriculum Model, Content Model and Learner Model [24].

The Content Model contains the various learning materials and contents available in the virtual lab: instructional content with videos, e-assessment, and interactivity where students can create and perform their own experiments through inquiry-based learning. The Learner Model is updated during the entire learner journey and includes information about the learner's knowledge, level of self-directedness, motivation (confidence), and special education needs.

Personalisation in the Atomic Structure virtual lab is implemented at different levels throughout the entire learning journey. The levels of personalisation include:
• learning loop-based personalisation;
• feedback-based personalisation;
• innovative pedagogies-based personalisation (inquiry-based learning, learning in flow, and self-directed learning);
• gamification-based personalisation;
• special education needs-based personalisation (e.g., sign language translation for hearing impaired students, as shown in Fig. 2).

Students' levels of motivation and self-directedness are determined at the beginning of the lab by asking them to answer a few questions displayed on the screen. These are used to personalise the difficulty level of the questions they receive in the quizzes, the types of atoms, isotopes and molecules they are given to build, as well as the type of feedback they will receive. For example, low and medium motivated students are restricted to atoms, isotopes and molecules which have been deemed suitable for each of those levels, while highly motivated students have access to more complex atoms, isotopes and molecules.
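To make the gating described above concrete, the following is a minimal illustrative sketch, not the actual NEWTON/Adaptemy implementation: the stage names, motivation levels, element lists and function names are all hypothetical. It combines a Curriculum Model prerequisite check with a Learner Model-driven restriction of builder tasks and quiz difficulty.

```python
# Illustrative sketch only: names, levels and element lists are hypothetical,
# not the virtual lab's actual rules.

# Curriculum Model: a stage can only be started once its prerequisites are completed.
PREREQUISITES = {
    "atoms": [],
    "isotopes": ["atoms"],
    "molecules": ["atoms", "isotopes"],
}

# Content Model: which objects are offered to build at each motivation level.
BUILD_TASKS = {
    "low":    ["H", "He", "C"],
    "medium": ["H", "He", "C", "O", "Na"],
    "high":   ["H", "He", "C", "O", "Na", "Ca", "Fe"],
}

def stage_unlocked(stage, completed_stages):
    """Check the Curriculum Model prerequisite (e.g. isotopes require atoms)."""
    return all(p in completed_stages for p in PREREQUISITES[stage])

def personalised_tasks(learner):
    """Select builder tasks and quiz difficulty from the Learner Model."""
    level = learner.get("motivation", "low")  # 'low' | 'medium' | 'high'
    return {
        "elements": BUILD_TASKS[level],
        "quiz_difficulty": {"low": 1, "medium": 2, "high": 3}[level],
    }

learner = {"motivation": "high", "completed": ["atoms"]}
if stage_unlocked("isotopes", learner["completed"]):
    print(personalised_tasks(learner))
```

Since the Learner Model is updated throughout the learner journey, such a selection would in practice be re-evaluated as the learner progresses.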


Fig. 2. Atomic Structure video with embedded avatar for sign language translation.

Figure 3 illustrates the process of building an atom of boron with the Atomic Structure virtual lab. The inquiry-based learning phase is offered at the end of each of the three stages in the form of interactive atom, isotope and molecule builders.

Fig. 3. Building an atom of Calcium in the interactive atom builder.

Once the students master building the suggested objects, they can freely choose their own objects, and experiment further within the atom, isotope and molecule builders. The Atomic Structure virtual lab also includes gamification elements such as award badges for completing different stages (see Fig. 4).
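The builders rest on simple particle bookkeeping: for a neutral atom, protons equal the atomic number, electrons equal the protons, and neutrons equal the mass number minus the atomic number. The snippet below is a hedged sketch of how a learner's build could be checked against these rules; it is illustrative only and not the lab's actual validation code.

```python
def expected_particles(atomic_number, mass_number):
    """Particle counts for a neutral atom/isotope: Z protons, Z electrons, A - Z neutrons."""
    return {
        "protons": atomic_number,
        "electrons": atomic_number,
        "neutrons": mass_number - atomic_number,
    }

def check_build(atomic_number, mass_number, built):
    """Compare a learner's build (dict of particle counts) with the expected counts."""
    expected = expected_particles(atomic_number, mass_number)
    return {particle: built.get(particle, 0) == count for particle, count in expected.items()}

# Boron-11 (Z = 5, A = 11): 5 protons, 5 electrons, 6 neutrons -> all True.
print(check_build(5, 11, {"protons": 5, "electrons": 5, "neutrons": 6}))
# Calcium-40 (Z = 20, A = 40): the neutron count here is wrong, so that check fails.
print(check_build(20, 40, {"protons": 20, "electrons": 20, "neutrons": 19}))
```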


Fig. 4. Gamification badge awarded for completing the Isotope stage.

4 Pilot Study Methodology

This section details the research methodology for the case study conducted with the aim of evaluating the Atomic Structure virtual lab in secondary schools.

4.1 Student Profile

A total of 78 secondary level students from two schools in Ireland participated in the study. The students were divided into a control group and an experimental group. The wide majority of students (i.e., 69 students) were in the 13–15 age group, 6 students were in the 16–18 age group, and 3 participants did not indicate their age group. The control group had 36 students (23 boys, 11 girls, 2 did not respond) and the experimental group had 42 students (26 boys, 15 girls, 1 did not respond). Students from the control group attended a traditional teacher-led class, while the students from the experimental group studied by using the Atomic Structure virtual lab on computers in the classroom. The control group was also exposed to the Atomic Structure virtual lab after the evaluation study.

4.2 Study Workflow and Procedure

The evaluation of the Atomic Structure virtual lab was done following the multidimensional methodology for pedagogical assessment in STEM technology enhanced learning [43]. The dimensions assessed were: learning outcome, motivation and learner satisfaction (usability-based). The assessment procedure used for the study is illustrated in Table 2. A description of the research study was given to participants, and consent and assent forms were collected before the actual study. Surveys were given before and after the learning experience. The pre-surveys included a demographics questionnaire, a knowledge pre-test and a learner motivation pre-survey for both the control and experimental groups.


Table 2. Assessment procedure [24].

Activity | Type | Control group | Experimental group
Demographics survey | Pre-learning | ✓ | ✓
Knowledge pre-test | Pre-learning | ✓ | ✓
Learner motivation pre-survey | Pre-learning | ✓ | ✓
Atomic Structure virtual lab session | Learning | | ✓
Traditional teacher-led session | Learning | ✓ |
Learner motivation post-survey | Post-learning | ✓ | ✓
Learner usability survey | Post-learning | | ✓
Knowledge post-test | Post-learning | ✓ | ✓
Interviews | Post-learning | |

The learning experience of the experimental group was a personalised learning journey through the Atomic Structure virtual lab, while the learning experience of the control group was a traditional teacher-led class session. Knowledge post-tests and the learner motivation post-survey were given to students from both the experimental and control groups. Furthermore, the experimental group filled in a usability survey.

The knowledge tests contained both multiple choice and input answer questions. Learner motivation was assessed through dimensions such as interest, self-efficacy, engagement, positive attitude and enjoyment. Interest was assessed through a Likert scale interest question [41, 49], self-efficacy (confidence) was assessed following Bandura's guidelines [5], while engagement, positive attitude and enjoyment were assessed using a 5-point Likert scale [29]. The usability survey contained questions related to four dimensions (usefulness, ease of use, ease of learning and satisfaction), questions where students were asked to rate how much they liked different features of the Atomic Structure virtual lab on a 5-point Likert scale, and open answer questions asking them to indicate the top three things they liked, the top three things they did not like, and any comments or suggestions.
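As an illustration of how such Likert-based motivation and usability responses could be aggregated, the sketch below averages the 1–5 ratings belonging to each dimension; the item-to-dimension mapping is hypothetical and does not reproduce the actual survey instrument.

```python
from statistics import mean

# Hypothetical mapping of survey items to dimensions; the real instrument differs.
DIMENSIONS = {
    "interest":   ["q1", "q2"],
    "confidence": ["q3", "q4"],
    "engagement": ["q5"],
    "enjoyment":  ["q6"],
}

def dimension_scores(response):
    """Average the 1-5 Likert ratings belonging to each dimension."""
    return {dim: mean(response[item] for item in items)
            for dim, items in DIMENSIONS.items()}

# One learner's (made-up) post-survey answers on a 1-5 scale.
print(dimension_scores({"q1": 4, "q2": 5, "q3": 3, "q4": 4, "q5": 5, "q6": 4}))
```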

5 Results

5.1 Use of Technology

In the demographic questionnaire, students were asked to report their frequency of use of technology such as smartphones, computers and video games. The results presented in Fig. 5 show that most students are frequent users of technology. The majority of students have access to a PC or laptop at home (81% for the experimental group and 83% for the control group). The percentage of students that use a smartphone every day is even higher, with 98% for the experimental group and 100% for the control group. More variety in students' answers can be observed regarding whether they play computer games on a gaming console.


Fig. 5. Student’s use of technology.

5.2 Learning Assessment

An analysis of the pre-test and post-test knowledge scores was conducted to investigate the impact of the Atomic Structure virtual lab on students' learning outcomes. This analysis excluded the participants that did not answer any question of the pre-test and/or post-test. This approach treated those participants as absent rather than awarding them a score of 0, which would not be a correct representation of their knowledge level. Participants with a true 0 for the pre- and/or post-test (i.e., who answered all questions wrong) were not excluded from the analysis. 11 participants were excluded from the control group and 2 participants were excluded from the experimental group. As such, the pre- and post-test scores of 25 participants from the control group and 40 participants from the experimental group were considered for the learning outcomes analysis.
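The learning comparison that follows (Fig. 6 and Table 3) is based on mean correct response rates and pre/post t-tests. A minimal sketch of this kind of analysis, using made-up scores rather than the pilot data, could look as follows:

```python
from statistics import mean
from scipy import stats

# Made-up correct response rates (%) for illustration only, not the pilot data.
pre  = [40, 55, 60, 45, 50, 65, 35, None]
post = [55, 70, 75, 60, 65, 80, 50, 85]

# Exclude learners with no pre- or post-test submission (None), mirroring the
# exclusion rule described above.
paired = [(a, b) for a, b in zip(pre, post) if a is not None and b is not None]
pre_kept, post_kept = zip(*paired)

print(f"Pre mean: {mean(pre_kept):.1f}%, Post mean: {mean(post_kept):.1f}%")

# Paired (repeated-measures) t-test between post- and pre-test scores.
t_stat, p_value = stats.ttest_rel(post_kept, pre_kept)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```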


Figure 6 presents the average correct response rates for the control and experimental groups on the pre and post knowledge tests.

Fig. 6. Learning results in terms of mean correct response rates for the two groups [24].

Table 3. T-test results between pre and post-tests.

Group | Pre-test Mean | Pre-test SE | Post-test Mean | Post-test SD | Increase | p-value | t-stat | df
Control | 48.0 | 23.1 | 60.0 | 32.1 | 12.0 | 0.033 | 0.268 | 24
Experimental | 53.5 | 22.8 | 75.0 | 22.1 | 21.5 |