
Thinking and Acting Systemically: Improving School Districts Under Pressure

The American Educational Research Association (AERA) publishes books and journals based on the highest standards of professional review to ensure their quality, accuracy, and objectivity. Findings and conclusions in publications are those of the authors and do not reflect the position or policies of the Association, its Council, or its officers.

© 2016 American Educational Research Association

The AERA Books Editorial Board
Chair: Russell W. Rumberger
Members: D. Jean Clandinin, Amanda L. Datnow, Jeffrey R. Henig, Michal Kurlaender, Felice J. Levine, Na'ilah Suad Nasir, Charles M. Payne, Christine E. Sleeter

Published by the American Educational Research Association
1430 K St., NW, Suite 1200
Washington, DC 20005

Printed in the United States of America

All rights reserved. No part of this publication may be reproduced or distributed in any form or by any means, including, but not limited to, the process of scanning and digitization, or stored in a database or retrieval system, without the prior written permission of the publisher.

Library of Congress Cataloging-in-Publication Data
Names: Daly, Alan J., editor. | Finnigan, Kara S., editor.
Title: Thinking and acting systemically: Improving school districts under pressure / edited by Alan J. Daly, Kara S. Finnigan.
Description: Washington, DC: American Educational Research Association, [2015]
Identifiers: LCCN 2015048075 | ISBN 9780935302448 (pbk.) | ISBN 9780935302455 (hardcover)
Subjects: LCSH: School improvement programs—United States. | Educational change—United States. | Public schools—United States.
Classification: LCC LB2822.82 .T536 2015 | DDC 371.2/07--dc23
LC record available at http://lccn.loc.gov/2015048075

Acknowledgments

Edited volumes come about with the support of many groups and individuals. First, we wish to thank the American Educational Research Association. This volume is the result of a two-day meeting of a small group of scholars funded by a grant from the AERA Research Conferences Program (for more information about the meeting, titled "Thinking Systemically: Improving Districts Under Pressure," visit www.districtreform.com). Without AERA's support we could not have pulled together such an incredible group of thinkers. We are grateful to AERA Executive Director Felice Levine and the AERA Books Editorial Board for their ongoing counsel and support. Most important, we are grateful to the participants: Stephen Anderson, William Firestone, Betheny Gross, Laura Hamilton, Julie Reed Kochanek, Kerstin Carlson Le Floch, Karen Seashore Louis, Betty Malen, Julie Marsh, Michelle Palermo-Biggs, William Penuel, Joelle Rodway, Andrea Rorrer, Georgia Sang-Baffoe, Mark Smylie, Louise Stoll, Jonathan Supovitz, Tina Trujillo, Priscilla Wohlstetter, and Kenneth Wong. The collegiality and intellectual conversation of this "professional learning community" enriched the learning experience for all of us.

We are also deeply appreciative of the many scholars who carefully reviewed the chapters and provided useful feedback to the authors and to us as editors. Their graciousness, insight, and perspective made the book even stronger. And we thank our families for their ongoing support when our schedules were challenging and our on-the-ground work and writing took us away from home.

A big thank you to the William T. Grant and Spencer Foundations, especially to Vivian Tseng and Andrea Conklin Bueschel, who became critical friends and pushed our thinking in ways that benefited not only this volume but our work in general. We are grateful for their support and friendship. We would also like to thank the doctoral students who provided critical support at various stages in this work—many of whom have moved on to careers of their own in education. In addition, we are grateful to Tricia Stewart, Jing Che, and Nadine Hylton for their various contributions to our research, which helped form the basis for the conference and book. Our special thanks to Georgia Sang-Baffoe and Michelle Palermo-Biggs for their support at the AERA workshop, and to Pam Kaptein for her coordination of the meeting logistics, enabling all aspects of the event to run smoothly. Finally, we want to thank Yi-Hwa Liou, who has worked tirelessly and continuously on this book and on many of our other projects.


We hope that in reading these chapters, researchers will take away a greater understanding of the important role of central district offices in reform, as well as some unique theoretical and methodological approaches to examining this rich topic. We also hope that practitioners will find it useful, as they carry on the hard work of district reform every day. No longer in educational research can we merely conduct rigorous studies. We must be committed to ensuring the relevance of our work. Reciprocal, trusting partnerships between researchers, policy makers, and practitioners made this book stronger. Such partnerships hold the potential for crafting policies and practices that will reduce inequities in our educational systems and ultimately improve the lives of all youth.

Alan J. Daly and Kara S. Finnigan

Contents

Acknowledgments

Introduction: Why We Need to Think Systemically in Educational Policy and Reform (Kara S. Finnigan and Alan J. Daly)

Section 1. School Districts as Leverage Points for Systems Change
1. Learning From the Past to Chart New Directions in the Study of School District Effectiveness (Tina Trujillo)
2. Expanding School Indicator Systems in a Post-NCLB Era (Laura S. Hamilton and Heather L. Schwartz)

Section 2. Systems Learning at the School and Classroom Levels
3. Formative Experimentation: The Role of Experimental Research in Program Development (Jonathan Supovitz)
4. A Research-Practice Partnership to Improve Formative Classroom Assessment in Science (William R. Penuel and Angela Haydel DeBarger)

Section 3. How Politics, Underlying Theories, and Leadership Capacity Support System-Wide Change
5. Portfolio Reform in Los Angeles: Successes and Challenges in School District Implementation (Susan Bush-Mecenas, Julie A. Marsh, and Katharine O. Strunk)
6. Common Core, Uncommon Theory of Action: CEOs in New York City Schools (Priscilla Wohlstetter, Brandon Buck, David M. Houston, and Courtney O. Smith)
7. How Leadership Churn Undermines Learning and Improvement in Low-Performing School Districts (Kara S. Finnigan, Alan J. Daly, and Yi-Hwa Liou)

Section 4. Systemic Lessons for Policy and Practice: Improving School Districts Under Pressure
8. Commentary: Three Organizational Lessons for School District Improvement (Mark A. Smylie)
9. Commentary: Toward Systemic Reform in Urban School Districts (Kenneth K. Wong)

Conclusions: The Challenge of School and District Improvement: Promising Directions in District Reform (Alan J. Daly and Kara S. Finnigan)

Index
Conference Participants
About the Contributors

Introduction

Why We Need to Think Systemically in Educational Policy and Reform

Kara S. Finnigan, University of Rochester
Alan J. Daly, University of California, San Diego
The two authors contributed equally to this chapter.

Recent research has helped us to understand the challenges faced by schools under significant performance pressure from high-stakes accountability policies (e.g., see Daly, 2009; Finnigan & Gross, 2007; Finnigan & Stewart, 2009), but efforts to understand the district context that may facilitate or impede constructive responses to this pressure are limited. Recognizing that improving low-performing schools is complex work with limited success at scale, scholars have shifted their attention to the broader system in which schools reside, exploring linkages between central offices and sites in engendering change (e.g., see Hightower, Knapp, Marsh, & McLaughlin, 2002; Honig, 2006; Honig & Coburn, 2008; Hubbard, Mehan, & Stein, 2006; Marsh, 2002). In addition, a host of nonsystem actors or intermediaries play roles in the spread of knowledge, information, and other resources throughout a district (Finnigan & Daly, 2014; Honig, 2004; Penuel, Korbak, Sussex, Frank, & Belman, 2007). Central office leaders tend to rely on these intermediaries to bring about schoolwide improvement, with varying degrees of success (Finnigan, Bitter, & O'Day, 2009; Honig, 2004). In this volume, we try to capture the knowledge base that exists today on the importance of focusing on the larger system and the intermediaries rather than on school-by-school change. As our educational policy shifts from the rigid pressure and sanctions of No Child Left Behind (NCLB) to the more pliable and local capacity of the Every Student Succeeds Act (ESSA), this shift toward a more systemic lens will be important, particularly for effecting change in the lowest performing schools. We borrow a term from Smylie, Wenzel, and Fendt (2003), who argue that it is time to "think systemically [emphasis added] about schools and their development and see educational organizations in terms of their interdependent parts" (p. 155).

This volume offers both substantive and methodological value to the field. We have gathered a cadre of outstanding scholars who represent a variety of research strands and approaches to understanding the role of the school district in reform. The volume offers a breadth of perspectives while simultaneously providing depth within each of its sections. This is important because while some scholars have begun to consider the roles of school organizational learning, social networks, and professional learning communities, there is little cross-fertilization across these areas, particularly in district-level studies.

In addition to the research chapters, we have included commentaries by two well-regarded scholars, Mark Smylie and Kenneth Wong. By bringing their commentaries into the work (as opposed to just providing the research studies included in this volume) we offer the reader different perspectives on the research, hoping to further illuminate the complexity of improvement efforts under pressure from accountability policies and the role of the school district in such efforts.

Background and Overview

This volume is particularly timely because we have seen an increasing push by nations around the world for higher levels of performance and accountability. In the United States these efforts have been codified through federal policies and programs such as No Child Left Behind and Race to the Top, whose effects we are only beginning to understand. Given the pressure to achieve at increasingly high levels or else risk facing sanctions, educators have ratcheted up their improvement efforts (Mintrop & Trujillo, 2005); yet these efforts have resulted in inconsistent performance and have not led to significant improvement (Mintrop & Sunderman, 2009). The limited success of the last decade of high-stakes accountability policies at the school level suggests both an urgent need for action and a need to reconsider the system (or school district) as the leverage point for improvement, as opposed to the more common emphasis on school-by-school change. We are at a critical turning point as our nation shifts gears with ESSA, maintaining some aspects of NCLB while moving the design of the accountability and support systems for low-performing schools back to the states and relying on competitive programs to spur change. Solving the puzzle of district turnaround to bring about system-wide improvement, rather than focusing on improvement school by school, has the potential to dramatically improve educational outcomes.

Our objective in this volume is to share and discuss empirical, theoretical, and methodological innovations that are focused on the examination of persistently struggling districts. Indeed, system-wide approaches to improvement under pressure have received inadequate coverage in policy-related discussion about low academic performance, which we argue is a missed opportunity to improve educational opportunities and outcomes for youth in our public education system at scale.

The volume is the result of a two-day intensive workshop of scholars funded by a grant from the American Educational Research Association's Research Conferences Program. Entitled "Thinking Systemically: Improving Districts Under Pressure," the workshop was held in Rochester, New York, and facilitated by the editors. The participating scholars (see participants list) focused on district improvement under high-stakes accountability policies, with a particular emphasis on the linkages between organizational learning, district-wide learning communities, and underlying social networks.

This volume is intended for a wide variety of readers. We think it will appeal most to school and district leaders who are working to leverage change each day; policy makers who are trying to better understand and enact educational policies that improve (rather than work against) the effectiveness of schools and districts; and researchers interested in leadership, policy, school change, and district reform. Additional potential readers are those interested in assessment, design-based research, portfolio reforms, social network analysis, and the politics of education. The chapters represent a variety of settings and contexts across the country and examine them through multiple lenses, such as organizational learning and professional learning, as well as multiple methodological approaches.

Section 1 is a good resource for faculty who teach courses in organizational development, change, and leadership. Sections 2 and 3 focus on current programs and policies, highlighting challenges in implementation of interest to site and district leaders. Section 2 may have particular relevance for individuals engaged in design-based partnerships, while Section 3 focuses on system-wide policy change. A number of chapters may be of special interest to researchers, particularly those in Section 4, as they help us to understand the implications of the role of districts in reform. Finally, the volume as a whole is directed to policy makers and funders who want to encourage research and development efforts on the role of districts in reform. In the concluding chapter we address how to move ideas into action.

Readers with specific interests in district-wide change can identify the chapters most relevant to their work in the detailed discussion that follows.

Contents

This text is divided into four sections. Here, we briefly summarize their contents.

Section 1. School Districts as Leverage Points for Systems Change

Section 1 frames the volume and provides the reader with quick access to large issues that are at play in examining the role of the district in reform. In Chapter 1, Tina Trujillo examines the parallels between school effectiveness and district effectiveness, focusing on how schools and districts frame notions of success and the purposes of schooling; the contextualized nature of school performance; and theoretical explanations for student outcomes. She argues that holistic examinations of district effectiveness, incorporating multiple measures of success and more diverse methods of analysis, are critical to moving the field forward. In Chapter 2, Laura Hamilton and Heather Schwartz draw on prior research on accountability reforms to provide guidance on expanding measurement systems to incorporate 21st-century competencies. They note that while expanding measures can direct educators' attention to previously neglected aspects of schooling, doing so requires time and resources to ensure that the added measures are accurate.

Section 2. Systems Learning at the School and Classroom Levels

This section explores new possibilities in the development of partnerships that by design involve co-learning between different parts of the system (e.g., teachers and leaders). In Chapter 3, Jonathan Supovitz explores the relationship between experimental research and program learning through a case study of an experiment with a local district in which teachers examined data on instruction in conjunction with data on student learning, in an effort to improve the quality of teaching. In Chapter 4, William Penuel and Angela Haydel DeBarger describe a partnership between middle school Earth science teachers and leaders in a large urban district, working together to design and test classroom assessment resources intended to improve the efficacy of district-adopted curriculum materials. Their chapter highlights the ways the partnership provided subject matter, design, and research expertise to contribute to the district's efforts to bring coherence to curriculum, instruction, assessment, and professional development. In both chapters, the authors highlight the opportunities and challenges of embedding research and learning in the intervention process.

Section 3. How Politics, Underlying Theories, and Leadership Capacity Support System-Wide Change

The chapters in this section each focus on the implementation of a policy reform at the local (district) level. These chapters should be especially important for state and district leaders who are implementing ESSA and seeking to better understand the challenges on the ground to school and district improvement. The authors pay close attention to the assumptions behind and unintended consequences of these reforms. In Chapter 5, Susan Bush-Mecenas, Julie Marsh, and Katharine Strunk examine the Los Angeles Unified School District's Public School Choice Initiative, which allowed teams of stakeholders to compete to turn around the district's lowest performing "focus" schools and to operate newly constructed "relief" schools designated to ease overcrowding. The authors found that district leaders scaffolded the plan development process with an array of supports from multiple organizations; selected plans on the basis of quality, for the most part; and ensured transparency at each stage of the process. Nevertheless, the scale and complexity of the initiative made it a formidable undertaking for district administrators and partners. The initiative was fraught with challenges, which weakened several of the key mechanisms of change. In Chapter 6, Priscilla Wohlstetter, Brandon Buck, David Houston, and Courtney Smith examine the rollout and implementation of the Common Core State Standards in New York City, paying close attention to the uptake of the instructional expectations in schools and partnerships between schools within the Children First Network. The reform efforts in high-performing schools aligned closely with the district's theory of action, as school leaders used their increased autonomy and accessed district supports. But the low-performing schools faced challenges resulting from the need to juggle reforms and from geographic dispersion; as a result, those schools did not benefit from the Children First Network support infrastructure.

In Chapter 7, Kara Finnigan, Alan Daly, and Yi-Hwa Liou consider how the social interactions underlying district improvement efforts are disrupted when a high percentage of actors leave or enter the system, creating a type of social network churn. The authors used longitudinal social network analyses to examine the ties between school and central office leaders in a low-performing school district in the northeastern United States, finding that nearly half of the leaders left during a four-year period, with a constant flow into and out of leadership positions. A loss of knowledge or expertise—or even of a person who helps to bind a social system—has detrimental effects on an organization, in terms of training and development as well as relational connectedness and support.
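To make the idea of network churn concrete, here is a minimal sketch of one way turnover between two observation waves might be quantified; the roster data, names, and the churn_rate helper are hypothetical illustrations, not the authors' actual measures or procedure.

    # Illustrative sketch: quantify leadership churn between two waves of a
    # longitudinal study as (a) the share of wave-1 leaders absent by wave 2
    # and (b) the share of wave-2 leaders who are newcomers.
    def churn_rate(wave1_leaders, wave2_leaders):
        w1, w2 = set(wave1_leaders), set(wave2_leaders)
        departed = len(w1 - w2) / len(w1)  # fraction of earlier leaders who left
        arrived = len(w2 - w1) / len(w2)   # fraction of later leaders who are new
        return departed, arrived

    # Hypothetical rosters of district and school leaders at two time points.
    departed, arrived = churn_rate(
        ["leader_a", "leader_b", "leader_c", "leader_d"],
        ["leader_a", "leader_c", "leader_e", "leader_f"],
    )
    print(f"departed: {departed:.0%}, newcomers: {arrived:.0%}")  # 50%, 50%

In a full social network analysis, each departure also removes that leader's ties, so the relational costs described above compound the simple head-count turnover this sketch reports.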

Section 4. Systemic Lessons for Policy and Practice: Improving School Districts Under Pressure

The two commentaries in this section draw together the major ideas covered earlier and challenge readers to consider new directions and promising areas for study and review. In Chapter 8, Mark Smylie argues for the need to define the problem of district improvement as one of organizational change. He cautions that effectiveness is not improvement, acting systemically requires multiple levers for improvement, and accountability pressure must be applied with care. And he points to the critical role of leadership in system-wide change. In Chapter 9, Kenneth Wong focuses on the federal government's role in advancing systemic school reform. He considers the chapters in light of these efforts, highlighting the implementation challenges at the district level, in particular, that must be addressed. He points to the importance of strengthening vertical and lateral communication in districts and developing analytic capacity and professional learning opportunities regarding data and research. He also argues for a broader focus on leadership, politics, and governance to uncover effective systemic strategies for urban districts across the country. Professor Wong's insights about federal policy will continue to be important even in a changing policy context. He points out areas to which state departments of education and the federal government must attend in bringing about equitable and sustainable changes in educational systems across the country.

Final Thoughts

This volume provides important insights into why we need to begin to think systemically about educational policy and reform related to districts. Three key lessons emerge for how to create sustainable change at scale:

1. Large-scale change requires attention to the role of school districts.
2. Large-scale change requires partnerships between researchers and practitioners.
3. Large-scale change requires strong and stable leadership.

The chapters that follow bring these themes to life through a variety of lenses and methodologies. Perhaps even more important than the focus on change strategies is the empirical evidence provided in this volume indicating that pressure should not be the preferred lever in educational policy until capacity is built within education systems. As Smylie points out in his commentary,

    This is a good moment to note that underperforming and under-resourced school districts serving large proportions of low-income and racially isolated students are the districts likely to be subject to the most extreme combinations of environmental stressors, to have the least human and organizational capacity and sources of support, and to be on the receiving end of the greatest pressure from reform policy. (p. 216)

The contributors to this volume point to the need not just for capacity building in these low-performing systems but also for relationship building, to form connections with knowledge communities outside the systems (see Finnigan & Daly, 2014). Many of the chapters identify challenges related to human capital, and leadership in particular. However, an equally strong theme in this volume, one that ordinarily receives less attention in the field, is the importance of the underlying relationships among educators and other stakeholders, which can either facilitate or hinder large-scale improvement.

In this new era of ESSA, it is time for state and federal policy systems to alter underlying assumptions and leverage points if they are to shift educator beliefs and responses from maladaptive patterns to productive, sustainable large-scale efforts that reduce the current inequities in educational opportunity and outcomes. We have a unique opportunity in this new period to intentionally and collaboratively co-create systems that will make use of our collective wisdom and move forward to better outcomes for all.

References

Daly, A. J. (2009). Rigid response in an age of accountability. Educational Administration Quarterly, 45(2), 168–216.
Finnigan, K. S., Bitter, C., & O'Day, J. (2009). Improving low-performing schools through external assistance: Lessons from Chicago and California. Education Policy Analysis Archives, 17(7). Retrieved from http://epaa.asu.edu/epaa/v17n7/
Finnigan, K. S., & Daly, A. J. (Eds.). (2014). Research evidence in education: From the schoolhouse door to Capitol Hill. Rotterdam, The Netherlands: Springer.
Finnigan, K. S., & Gross, B. (2007). Do accountability policy sanctions influence teacher motivation? Lessons from Chicago's low-performing schools. American Educational Research Journal, 44(3), 594–629.
Finnigan, K. S., & Stewart, T. J. (2009). Leading change under pressure: An examination of principal leadership in low-performing schools. Journal of School Leadership, 19(5), 586–618.
Hightower, A. M., Knapp, M. S., Marsh, J. A., & McLaughlin, M. W. (Eds.). (2002). School districts and instructional renewal. New York: Teachers College Press.
Honig, M. (2004). Crafting coherence: How schools strategically manage multiple, external demands. Educational Researcher, 33(8), 16–30.
Honig, M. (2006). Street-level bureaucracy revisited: Frontline district central-office administrators as boundary spanners in education policy implementation. Educational Evaluation and Policy Analysis, 28(4), 357–383.
Honig, M., & Coburn, C. (2008). Evidence-based decision making in district central offices: Toward policy and a research agenda. Educational Policy, 22(4), 578–608.
Hubbard, L., Mehan, H., & Stein, M. (2006). Reform as learning. New York: Routledge.
Marsh, J. A. (2002). How districts relate to states, schools, and communities: A review of emerging literature. In A. M. Hightower (Ed.), School districts and instructional renewal (pp. 25–39). New York: Teachers College Press.
Mintrop, H., & Sunderman, G. L. (2009). Predictable failure of federal sanctions-driven accountability for school improvement—And why we may retain it anyway. Educational Researcher, 38(5), 353–364.
Mintrop, H., & Trujillo, T. (2005). Corrective action in low-performing schools: Lessons for NCLB implementation from first-generation accountability systems. Education Policy Analysis Archives, 13(48), 1–27. Retrieved from http://epaa.asu.edu/epaa/v13n48/
Penuel, W. R., Korbak, C., Sussex, W., Frank, K., & Belman, D. (2007). Catalyzing network expertise: Year 1 report. Menlo Park, CA: SRI International.
Smylie, M., Wenzel, S., & Fendt, C. (2003). The Chicago Annenberg Challenge: Lessons on leadership for school development. In J. Murphy & A. Datnow (Eds.), Leadership lessons from comprehensive school reforms (pp. 135–158). Thousand Oaks, CA: Sage.

Section 1

School Districts as Leverage Points for Systems Change

Chapter 1

Learning From the Past to Chart New Directions in the Study of School District Effectiveness

Tina Trujillo, University of California, Berkeley

This chapter aims to explain whether, and in what ways, our theoretical and practical knowledge is advanced by the most recent generation of studies in district effectiveness. Over the last two decades, the local district has risen as a key unit of analysis in research on urban school reform. During that time, many scholars and practitioners have come to view central offices—once relegated to ancillary roles in school improvement—as valuable instruments for effecting large-scale change. This shift follows, in part, from researchers' findings about the potentially constructive roles of districts in enhancing teaching and learning and implementing state policy (Elmore & Burney, 1997; Hightower, Knapp, Marsh, & McLaughlin, 2002; Honig, Copland, Rainey, Lorton, & Newton, 2010; Leithwood, 1995; Spillane, 1996; Supovitz, 2006; Weinbaum, 2005). In this sense, district-level studies represent the next evolutionary step in the urban reform literature, which once focused solely on the school as the key organizational form for improvement.

At least three conditions set in motion these district-level investigations. First, throughout the restructuring era of the early 1980s, scholars tended to discount the role of districts in school-level inquiries on reforms such as site-based management and whole-school reform (Anderson, 2006). At that time, policy makers and practitioners concentrated primarily on the school as the unit of change because they were influenced, in part, by organizational theories about the advantages of decentralization; disillusionment with a perceived top-heavy administrative school system; and lessons from the effective schools studies, many of which included prescriptions for individual school leaders and for school-level change strategies (Murphy, 1991). Implicit in the policies of that era was an assumption that seemingly impenetrable, cumbersome central offices were to be circumvented rather than embraced as potential contributors to educational reform. However, this research eventually revealed, among other things, that few school-by-school reforms were effective in producing large-scale, sustainable gains in student performance (David, 1989), in challenging the status quo to further equity-oriented goals (Sarason, 1990), or in approaching change systemically (Vinovskis, 1996).

Second, in the early 1990s, influential proposals for coherent systemic reforms were used to advocate for aligning all levels of the educational structure, from school to district to state (e.g., Fuhrman, 1993; Smith & O'Day, 1991). In turn, scholars and policy makers refocused their attention on districts as potential links in large-scale efforts to more tightly couple the state and the school (Elmore, 1993). This literature also advocated for explicit considerations of the broader social contexts that shape reform outcomes (McLaughlin & Talbert, 1993). As practitioners began to reconsider the possibilities for districts' contributions to systemic, contextualized reforms, researchers began to concentrate more explicitly on the characteristics of districts that were associated with more successful reform, particularly amid an increasingly standards-based, high-stakes accountability policy environment.

Third, alongside these trends arose a collection of studies concluding that districts' effects on student performance were minimal (e.g., Floden et al., 1988; Ouchi, 2003). Many scholars favored a radical decentralization of public school districts to alleviate what they saw as organizationally and politically intractable problems inherent in district designs. They advocated dramatically scaling back districts' instructional, curricular, and other functions in order to curb what some perceived to be overly bureaucratic, inefficient, and excessively politicized organizations (e.g., Doyle & Finn, 1984; Hill, 1999). Their work usually proposed scaling back districts' collective bargaining agreements, outsourcing key district functions to external management organizations, and delegating personnel, budget, and other decisions to school site leaders.

In response to these developments, some scholars began to consider the potential advantages of district-wide reforms as opposed to individual, school-level ones (e.g., Cuban, 1984; Purkey & Smith, 1985). One of the primary aims of the new research was to analyze districts that demonstrated unusually high performance in order to extrapolate lessons about organizational or instructional properties that brought about success. In other words, these studies were driven in part by a desire to identify the ways in which districts could effectively shape student outcomes.

The resulting literature demonstrated that school districts can, in fact, have an impact on student outcomes. Their influence may not be direct, as it is usually mediated by school leadership, teacher professional development, or other contextual conditions. However, the ultimate conclusion across these studies remained: The central office could, under certain conditions and in particular settings, play a role in shaping student-level performance.

The motivation behind these studies echoes the drive behind the earlier school effectiveness research that was initiated in large part as a response to the Coleman Report (1966) and other studies (e.g., Jencks et al., 1972). That literature revealed that the effects of school characteristics on student achievement paled in comparison with the effects of socioeconomic status, race, and other family background variables. School-level effectiveness studies, like their district-level successors, sought to identify the ways in which schools could in fact influence student performance. Even the wording in some of the prominent district literature titles—"School districts matter," "What districts can do" (Spillane, 1996; Togneri & Anderson, 2003)—is reminiscent of titles from the school effectiveness era: "Schools can make a difference," "School matters" (Brookover, Beady, Flood, & Schweitzer, 1979; Mortimore, Sammons, Stoll, Lewis, & Ecob, 1988). And a number of the recommendations of district effectiveness studies (e.g., to cultivate strong instructional leadership, to develop a clear mission or vision; Murphy & Hallinger, 1988; Snipes, Doolittle, & Herlihy, 2002) repeated the major recommendations of the school effectiveness literature. These similarities suggest that the degree to which district effectiveness studies are more robust than their school-level counterparts is not entirely clear.

Beyond similarities in language, the two literatures show significant methodological and conceptual overlap. Scholars have begun to pinpoint methodological limitations of district effectiveness research that mimic the shortcomings of the school effectiveness research (e.g., Bowers, 2010). Others have started to question whether the district-level studies merely reproduce the conceptual arguments of the earlier school-level studies (Trujillo & Renée, 2012). Yet critics of the school effectiveness tradition problematized its investigations not just for design shortcomings but for the overly technical-rational perspectives that framed their inquiries. These critics called for more holistic school effectiveness studies to explicitly examine the political, social, and normative factors that shape schools' capacity to be judged effective (Slee, Weiner, & Tomlinson, 1998) and that scholars have repeatedly found to significantly shape educational processes and outcomes, particularly when reforms are designed to challenge the status quo by redistributing resources to traditionally underserved groups (Datnow, 2000; Oakes, 1992; Trujillo, 2013b; Welner, 2001). Critics also interrogated many school effectiveness studies for the lack of theoretical rationales behind their design and analyses (Teddlie & Reynolds, 2000). The result, in their view, was a literature whose validity and reliability were fairly limited, whose conceptualization of schooling and its outcomes was rather narrow, and whose explanations of student performance in poor, urban contexts were undertheorized.

This literature review takes as its starting point the conceptual and methodological characteristics of the district effectiveness research. It compares the conceptual and methodological features of school and district effectiveness research in order to assess the ways in which the district studies have, in fact, deepened our understanding of the forces that weigh on urban educational reform and our ability to draw valid, reliable conclusions about what works in facilitating more effective school districts, and why. Specifically, this review examines the ways in which the traditions of school and district effectiveness frame notions of success, the purposes of schooling, the contextualized nature of school performance, and the theoretical explanations for student outcomes. It concludes by discussing the implications of these similarities for the research and practice of urban district reform, and by proposing new directions for the field of district effectiveness research. Three questions guided this review:

1. What are the major correlates of effective districts, and how do these correlates compare with correlates of effective schools in earlier studies?
2. What are the major methodological designs for studying district effectiveness, and how do they compare with earlier designs for studying school effectiveness?
3. What are the major conceptual dimensions of the district effectiveness research, and how do they compare with dimensions of the earlier school effectiveness research?

Scope and Purpose of the Review

The field of district-related research is broad and varied. For example, researchers have taken up questions about districts' internal capacity for change (Honig, 2003; Spillane & Thompson, 1997), central office relationships with intermediaries (Honig, 2004), and district actors' interpretation and implementation of state policy (Coburn, Toure, & Yamashita, 2009; Spillane, 1998). Others have explored qualities of successful district leadership (Leithwood, 1995), organizational dynamics of central office learning (Labaree, 2007), and tensions between central authority and school site autonomy (Hightower, 2002). A smaller number of studies have considered the political and normative forces that shape district reforms (Honig, 2009; Marsh, 2007; Skrla & Scheurich, 2003; Trujillo, 2013a, 2013b). Occasionally, scholars have put forth conceptual frameworks for analyzing districts' overall efficiency and effectiveness (Childress, Johnson, Grossman, & Elmore, 2007; McDonnell, 2000). However, by and large, inquiries into districts' effects on student outcomes comprise the largest share of this literature (Anderson, 2006).

Questions of what works in designing and leading effective districts dominate the research (Leithwood, 2010) and practice (Waters & Marzano, 2006) of urban district reform. Thus, for the purposes of this review, the analysis is limited to studies that explicitly considered the effects of district policies, processes, conditions, and behaviors on classroom-level outcomes. Inquiries into the organizational, interpretative, cognitive, or other explanatory factors in urban district reform were not reviewed unless they were expressly linked with classroom-level results.

Design and Methods of Review

Data sources for this literature review included 50 primary documents on district effectiveness. Thirty-four percent of the studies included peer-reviewed journal articles; 16% consisted of books or book chapters; 26% were conference papers or reports from regional labs, research centers, or technical assistance centers; and 20% included reports from foundations, think tanks, or advocacy organizations. Given the heavy practical application of the research on district effectiveness, studies published by a variety of sources were included. These studies' broad range of standards of evidence and other aspects of methodological rigor accurately represent the range of information that is disseminated to practitioners about district effectiveness, as well as the diverse base of scholarly knowledge being constructed in the field.

The review was limited to studies that presented the original results from investigations of the relationship between classroom-level outcomes and district-level policies, routines, processes, behaviors, conditions, or other characteristics. Classroom-level outcomes were defined broadly; while they could be conceptualized in terms of standardized test scores, they could also entail a range of pedagogical strategies, curricular patterns, student behaviors, instructional tasks, materials usage, access to or opportunities for teaching and learning, and so forth. District-level characteristics were also broadly defined. They could include formal district policies related to curriculum or assessment, district leadership styles, or governance structures; but they could also include broader, more informal social or economic conditions in a district's community, relationships among various stakeholders, or ideological priorities or struggles within a district. These wide-ranging conceptualizations were purposeful; they were intended to avoid producing a narrow sample of literature that would not adequately represent the range of district effectiveness studies available.

Studies were selected by searching the ERIC database; Google Scholar; prior reviews of the literature on district reform, improvement, and effectiveness; bibliographies of other district-related literature; and general Internet inquiries. Key search terms included "district effectiveness," "successful school districts," "high-performing school districts," and "research on school district success." To ensure that we did not exclude any relevant studies, we cross-checked the results that each search yielded.

In the first round of analysis, all literature was coded for inductively derived concepts that described the major methodological features of each study and predetermined concepts that derived from the major methodological critiques of the school effectiveness research. Examples of codes included sampling and selection; comparison criteria; designs; data collection techniques; research subjects; and measures of effectiveness or success. Some codes were recoded for greater detail. For example, measures of effectiveness was recoded into two codes: test-based measures of effectiveness and non-test-based measures of effectiveness.

In the next round of analysis, each study was coded for all correlates of district effectiveness. These correlates included the seven major correlates of effective schools, as well as all inductively determined correlates of effective districts that were not captured by these original school-level correlates. Seven codes were used for the correlates of effective schools: clear mission/vision; frequent monitoring; instructional leadership; high expectations; opportunity to learn/time on task; safe and orderly environment; and home-school relations. Definitions for each code were taken directly from the definitions employed in the effective schools literature. Inductive codes were created for every correlate of district effectiveness that researchers identified but to which the seven predetermined codes did not apply. This comprehensive coding strategy ensured that the codes captured the complete range of district effectiveness correlates. It also ensured that I did not overlook any correlates, particularly any nontechnical correlates of effectiveness. Examples of inductively determined correlates of effective districts included accountability; organizational coherence; standards alignment; theory of action; coalition building; explicitly equity-oriented goals; and collective norm-building. Definitions for each of these correlates were taken directly from the studies in which they were identified. When codes were defined differently among studies, their definitions were expanded to encompass every study's conceptualization. For example, if one study defined standards alignment as the coordination of state content standards with assessments and another study defined it as the coordination of state content standards with instructional materials, standards alignment was defined as the coordination of state content standards with assessments or instructional materials.

From there, I categorized the codes for all correlates of effective districts according to the major conceptual themes that arose from critiques of the school effectiveness literature. Namely, I coded for evidence of the three conceptual dimensions of schooling to which scholars called attention in their appraisals of the school effectiveness studies: the technical, sociopolitical, and normative dimensions. I defined each dimension to encompass the descriptions employed in the literature (e.g., Oakes, 1992; Slee et al., 1998; Thrupp, 1999). I defined the technical dimension to include districts' rational properties, including their organizational, curricular, pedagogical, or managerial elements. I defined the sociopolitical dimension to include district leaders' relationships, coalition building, explicit acknowledgment of power structures or social justice, and considerations of communities' social or historical contexts. I defined the normative dimension to include district leaders' acknowledgment of educators' conceptions or beliefs about students, efforts to develop shared norms, and explicit attempts to cultivate high expectations for teaching and learning.

I also coded for an explicit theoretical framework/rationale for the study's design, analysis, and interpretation. Studies were coded for the presence of a theoretical framework/rationale if their author(s) explicitly referred to a theoretical framework, a conceptual framework, major theoretical or conceptual constructs that guided the study, or grounded theory that emerged inductively from their analysis.

The purpose of this coding scheme was to compile general methodological and conceptual profiles of the district effectiveness research, as well as to help identify any areas of overlap or distinction between the district-level and school-level effectiveness studies. As a final step, I used these codes to construct multiple case-level displays for partially ordered meta-matrices (Miles & Huberman, 1994) in order to compare and contrast patterns and contradictions within the district effectiveness research and between the two traditions. See the Appendix (p. 47) for an example of one coding matrix. These displays allowed me to collapse a very large amount of qualitative data into more manageable, analyzable chunks. They also enabled me to quantify the strength, or overall proportion, of certain patterns in a limited amount of space. Thus, in the following section most qualitative patterns are described in terms of the percentage of studies that exemplified each pattern, because I aimed to depict the overall profile of this body of research, not to compare the specific features among specific studies. Examples that illustrate the quantitative patterns are included where appropriate and where space permits.
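As a concrete illustration of this quantification step, the following minimal sketch shows how per-study code assignments might be tallied into the percentage-of-studies figures reported below; the study records and code names are hypothetical, and this is not the review's actual instrument or data.

    from collections import Counter

    # Illustrative sketch: each reviewed study carries the set of codes that
    # applied to it. Tallying across studies yields a profile of the literature
    # expressed as percentages. All records below are hypothetical.
    studies = [
        {"id": "study_01", "codes": {"outlier sampling", "test-based measures"}},
        {"id": "study_02", "codes": {"test-based measures", "case study design"}},
        {"id": "study_03", "codes": {"outlier sampling", "case study design",
                                     "non-test-based measures"}},
    ]

    tally = Counter(code for study in studies for code in study["codes"])
    for code, count in tally.most_common():
        print(f"{code}: {count / len(studies):.0%} of studies")

Because several codes can apply to a single study, these percentages need not sum to 100%, which is the same caveat attached to the figures that follow.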

Findings

Effective Schools Research: Contributions and Concerns

In the late 1970s Edmonds contributed to what was rapidly becoming an extensive tradition of effective schools research with his seminal work, which demonstrated that instructionally effective schools existed for poor, urban children (1979a, 1979b). Compelled by a desire to refute the mounting literature that pointed to the low level of variance in student outcomes that school-level factors could explain, Edmonds showed that certain organizational and instructional characteristics of schools could affect student achievement. Over time, Edmonds and his colleagues identified seven correlates of such effective schools: a safe and orderly environment, high expectations for students, opportunity to learn and time on task, strong instructional leadership, frequent monitoring of student progress, positive home-school relations, and a clear and focused school mission.

A cross-section of researchers took up the quest to identify the within-school factors that were strongly correlated with high student achievement on standardized tests. While a minority of these studies employed longitudinal or large-scale designs (Brookover et al., 1979; Klitgaard & Hall, 1974; Teddlie & Stringfield, 1993), most relied either on outlier designs to study schools with unusually high test scores (Brookover & Schneider, 1975; Lezotte, Edmonds, & Ratner, 1974) or on case studies of such schools (Brookover & Lezotte, 1979; New York State Department of Education, 1974; Spartz, 1977). The specific combinations for correlates of effective schools varied somewhat among the studies, but the general recipe remained more or less the same; sometimes it included other correlates, such as an emphasis on basic skill acquisition. Although this research contributed fairly consistent findings to the knowledge base about the properties of schools that effectively attained higher scores for poor children or children of color, and although the research attracted renewed attention from educators and policy makers to specific conditions that may foster higher performance, it nonetheless suffered from several methodological and conceptual concerns.

Methodological Dimensions of the School Effectiveness Research

Multiple researchers have critiqued the school effectiveness research along various methodological lines (Bowers, 2010; Creemers, 1991; Good & Brophy, 1986; Rutter, 1983; Sammons, 1999; Scheerens, 1992; Teddlie & Reynolds, 2000). These analyses yielded at least five methodological concerns that are germane to this review: inadequate sampling, inadequate comparisons, a lack of longitudinal analyses, inadequate designs for capturing school- and classroom-level processes, and inadequate measures of effectiveness.

Inadequate sampling. The school effectiveness studies tended to use small, skewed samples, usually selected on the basis of the dependent variable: student performance. Often, the earliest research was conducted on samples of convenience or samples based on anecdotal evidence (e.g., Edmonds, 1979a). Later samples were derived from large-scale regression analyses of test performance that yielded outlier schools for case study analyses (e.g., Klitgaard & Hall, 1974). These sampling techniques limited the analyses that could be conducted as well as the generalizability of the results (Bowers, 2010). They also reduced the external validity of the research, or the degree to which the conclusions could be applied to other schools as well.

Inadequate comparisons. Much of the school effectiveness literature was also criticized for its lack of comparisons between unusually high-performing schools and average schools (Teddlie & Stringfield, 1993). Instead, much of this literature examined only high-performing schools or compared unusually high performers with unusually low performers. This characteristic limited the studies' external validity because it minimized the differences or similarities that may have existed between the outliers and schools that represented the norm (Bowers, 2010; Purkey & Smith, 1983).

Lack of longitudinal analyses. While exceptions existed (e.g., Mortimore et al., 1988; Rutter, Maughan, Mortimore, & Ouston, 1979; Teddlie & Stringfield, 1993), the majority of school effectiveness research relied on regression methods based on data from a single point in time. This snapshot approach to selecting and studying certain schools curbed the reliability of these studies by overestimating the stability of outlier schools' test scores from one year to the next (Bowers, 2010; Purkey & Smith, 1983).

Inadequate designs for capturing school- and classroom-level processes. While later research included perceptual data from surveys or interviews, few school effectiveness studies included direct observations of school or classroom processes (see Brookover et al., 1979, and Teddlie & Reynolds, 2000, for exceptions). In their stead were analyses of the relationship between school inputs and outputs. These data sources limited the internal validity of the studies, or the ability to conclude that the student outcomes were in fact caused by the inputs being measured. They also produced incomplete explanations of why or how particular inputs may have led to the specific student outcomes.

Inadequate measures of effectiveness. Many of these studies relied on a single measure to operationalize student achievement: standardized test scores. While some went further (e.g., Teddlie & Stringfield, 1993), most designs conceptualized "effectiveness" or "success" narrowly in terms of assessments, usually a single test-based assessment. Relying solely on the results of a single test-based assessment also falsely assumed a high degree of test validity and reliability across multiple school contexts (Bowers, 2010). This feature limited the reliability and validity of these studies and promoted a fairly narrow conception of effectiveness, one that excluded other measures of success, such as pedagogical quality, curricular breadth, academic rigor, student engagement, or other academic, social, or civic aspects of schooling.

Conceptual Dimensions of the School Effectiveness Research

While the school effectiveness literature successfully reinvigorated the discourse on the characteristics of schools that better meet the needs of traditionally underserved students, critics pointed out certain conceptual limitations. These included concerns over its utility in furthering notions about the purposes of education, sociopolitical and normative contexts of schooling, and theoretical interpretations of patterns in school success.

Reductionist view of the purposes of education. One of the most common refrains about the shortcomings of this tradition relates to its conceptualization of the purposes of schooling. Critics maintain that by defining effectiveness or success in terms of test scores, this research reduced notions of learning to discrete, easily measurable, quantifiable tasks (Ball, 1998; Rose, 1995). This definition of success is largely taken up in economic arguments as an indicator of schools' effectiveness in preparing students for workplace competition. Such a narrow proxy for success, critics maintain, implies primarily economic purposes of schools, rather than purposes related to social, moral, or civic goals for education in a democratic society (Trujillo, 2013a). The upshot of this reductionist view is a restricted conception of the types of students that schools can cultivate and the ultimate goals for teaching and learning.

Inadequate treatment of the sociopolitical and normative context of schooling. Another frequent criticism of school effectiveness research is its heavy concentration on organizational or instructional elements of schooling, or the technical dimensions of schools, and the limited weight that it accords to sociopolitical or normative dimensions (Slee et al., 1998; Thrupp, 1999; Thrupp & Willmott, 2003). Critics charge that by limiting analyses of effectiveness to investigations about curriculum, time on task, monitoring, instructional leadership traits, or schools' explicit mission or vision, for example, this research narrowed notions about leadership and improvement to fairly rational, technical terms that did not fully account for the complex social context in which these characteristics arose. In doing so, they maintain, this research minimized the pervasive roles that class, race, or ideology played in these schools, and how those factors interacted with the technical dimensions to produce certain patterns in student outcomes. This research rarely posed explicit questions about the social context in which the schools were situated, the inherently politicized nature of certain ways of organizing and leading schools, or teachers' or other stakeholders' deeply held normative beliefs about students or instruction. Critics concluded that this literature discounted how entrenched school structures were, how contested certain instructional or organizational priorities could be, and how differently students could experience the same "effective" school depending on their class or race.

Inadequate theoretical frameworks. Finally, school effectiveness research has been critically assessed for its lack of explicit theoretical frameworks or grounded theories (Slee et al., 1998; Teddlie & Reynolds, 2000). Critics cite the undertheorized rationale behind the variables originally selected for analysis, the ways in which the researchers operationalized these variables, the lack of explicit theoretical support for the assumptions behind the studies themselves, and the dearth of interpretive approaches to generating theories from the ground up (Sandoval-Hernandez, 2008). The literature failed to produce sufficiently systematic interpretations of the factors associated with effectiveness and to fully recognize the theoretical knowledge bases that explain, at least in part, the contextualized nature of student performance and patterns in urban school reform.

Findings From a Review of the District Effectiveness Research: Contributions and Concerns

Methodological Dimensions of the District Effectiveness Research

The initial round of analysis for this review consisted of coding each study according to various methodological characteristics: sampling, selection, or comparison criteria; methodological designs, data collection techniques, and research subjects; and measures of effectiveness or success. Codes were derived both inductively, based on themes that arose during data analysis, and deductively, based on the major methodological critiques of the school effectiveness research. The latter codes were intended to facilitate methodological comparisons between the two traditions. Overall, this analysis revealed several areas of overlap between the traditions of district and school effectiveness research, though it also pointed to some areas of methodological growth that set the latter studies apart from the former.

Sampling, selection, and comparison criteria. Of the 50 studies reviewed, 40% were selected on the basis of anecdotal evidence, district reputation, or convenience, rather than by systematically sampling a universe of districts (e.g., Hightower, 2002). Moreover, 56% of the cases were sampled on the basis of unusually high effectiveness, or "outlier" status (e.g., Koschoreck, 2001). Fifty-eight percent of cases were selected based on snapshot data from a single point in time (e.g., McFadden, 2009a) rather than longitudinal data over multiple years (e.g., Stringfield & Yakimowski-Srebnick, 2005). Of the studies that included comparative designs, 63% compared districts judged to be high performers with other high performers (e.g., American Federation of Teachers, 2000); 22% compared high performers with low performers (e.g., Iatarola & Fruchter, 2004); and only 15% included districts that spanned a cross-section of performance or compared high-performing districts with typical ones (e.g., O'Day & Bitter, 2003). Finally, despite a shortage of knowledge about the transferability of different correlates of effectiveness among districts of varying size, only 16% of the cases included size as a criterion for selection or analysis (e.g., Maguire, 2003).

These characteristics suggest that a sizable portion of district effectiveness studies still share several of the sampling or selection attributes of their school-level predecessors. In this way, the relatively common practice of sampling based on atypically high performance on the dependent variable or other methodologically dubious criteria continues to threaten the external validity and generalizability of the more recent district effectiveness studies—a pattern consistent with other findings (Bowers, 2010).

Methodological designs and data collection techniques. Next, each study was coded according to the characteristics of its methodological designs. Figure 1 illustrates the distribution of designs. It shows that the majority of research was based on either single or multiple case studies (e.g., WestEd, 2002). Slightly more than 20% included survey research or mixed methods (e.g., Supovitz, 2006).

[Figure 1 appears here: a bar chart plotting the percentage of studies employing each methodological design (case studies, survey, mixed methods, regression/correlation, HLM, ethnography, experiment, other).]

Figure 1. Methodological designs across district effectiveness research. HLM = hierarchical linear modeling. Percentages do not add up to 100% because multiple codes could apply to each study.

Regression analyses or simple correlations were used in less than 20% of the studies (e.g., Waters & Marzano, 2006), and hierarchical linear modeling (HLM) was rarely employed (e.g., Resnick & Harwell, 2000). Ethnographic or experimental designs were nonexistent.

Figure 2 summarizes the data collection techniques employed across these studies. Over three-quarters of the studies included interviews, and almost as many entailed the analysis of secondary test score data (usually from state assessments and less frequently from district benchmark tests).

[Figure 2 appears here: a bar chart plotting the percentage of studies employing each data collection technique (interviews, secondary test score analysis, document analysis, observations, questionnaires, focus groups, logs, other, unclear).]

Figure 2. Data collection techniques across district effectiveness research. Percentages do not add up to 100% because multiple codes could apply to each study.

Document analysis was employed in 40% of the studies (e.g., Marshall, Pritchard, & Gunderson, 2004), followed by questionnaires (e.g., McLaughlin & Talbert, 2003) and focus groups (e.g., Cawelti, 2001). Observational data were collected in about a quarter of the studies (e.g., Massell & Goertz, 2002), although most of these observations were conducted in meetings, not classrooms. Instructional logs maintained by teachers were never used. In 8% of the studies, data collection techniques were not revealed.

Finally, each study was coded according to the subjects who participated in the research (Figure 3). Three-quarters of the studies included district administrators.

[Figure 3 appears here: a bar chart plotting the percentage of studies including each category of research subjects (district administrators, principals, teachers, students, parents/community, other, unclear).]

Figure 3. Research subjects across district effectiveness research. Percentages do not add up to 100% because multiple codes could apply to each study.

Fewer than 40% involved principals or teachers (e.g., Darling-Hammond et al., 2005), and 12% included parents or other community members (e.g., Togneri & Anderson, 2003). No studies collected data directly from students.

The characteristics of the designs and data collection techniques in this field point to both strengths and areas needing growth that mirror several of the themes from the school effectiveness tradition. School districts are generally well suited to case study research, but given the rich, complex nature of district-level phenomena and the contextualized nature of each district's experiences, incorporating multiple mixed methods and data sources into these studies could yield more robust findings about the school- and classroom-level processes that transpire in these settings. For instance, the prevalence of interview data, most often from district administrators, risks slanting the findings from many studies in favor of central office perspectives, perhaps at the expense of potentially different classroom- or school-level accounts. Varying the breadth of the participants and utilizing more direct observations of classrooms could round out researchers' understandings of the student and teacher points of view and the instructional patterns in these districts (aside from those measured by test scores). A plurality of methodological approaches and data sources would also heighten the internal validity of several of these studies by yielding a more robust set of findings about the range of factors that shaped particular outcomes, thereby setting these studies apart from the school-level inquiries.

Measures of effectiveness or success. Last, each study was coded for whether it measured effectiveness, or success, in terms of standardized assessment scores or another indicator. As Figure 4 illustrates, 86% of studies defined success in terms of test scores, whereas only 22% defined it according to a different indicator or in terms of both. Examples of non-test-based measures of effectiveness included writing samples, standards-aligned instruction, instructional interventions for low-performing students, complex instructional tasks or student discourse, technology use, culturally relevant curriculum, and balanced literacy activities (e.g., Florian, 2000; Iatarola & Fruchter, 2004; Marshall et al., 2004; Opfer, Henry, & Mashburn, 2008; Resnick & Harwell, 2000; Spillane & Jennings, 1997). These patterns suggest that the district effectiveness literature is beginning to introduce a broader scope of success measures than its earlier counterpart, but that test performance—usually on a single assessment—still predominates.

[Figure 4 appears here: a bar chart plotting the percentage of studies using test-based versus other measures of effectiveness.]

Figure 4. Measures of effectiveness or success across district effectiveness research. Percentages do not add up to 100% because multiple codes could apply to each study.

This commonality between the district and school effectiveness research underscores not just how pervasively test results are relied on as reflections of educational success, but how rarely studies investigate multiple forms of effectiveness to either triangulate findings or explore potential areas of contradiction between test performance and other indicators of quality.

The predominance of these test-based measures of effectiveness may also reflect, at least in part, the data that are most readily accessible to researchers and the constraints associated with research funders' priorities. Indeed, research demonstrates the strong steering effects that philanthropists, governments, and other funders have on studies' parameters, including the questions and outcomes that researchers are willing to investigate (Lubienski, Scott, & DeBray, 2011; Scott, 2009).

At the same time, these test-centered measures of success may also reflect the policy environments in which district and school leaders exist. Some research shows that although district practitioners cannot easily step out of current accountability policy structures to demonstrate other forms of effectiveness, they nonetheless strive for broader indicators of success that relate to the "whole child," such as personalized preparation for postsecondary destinations, a "well-rounded" curriculum, and social identity development (Anderson & Rodway-Macri, 2009).

Even so, these patterns lend support to critics' concerns about the school effectiveness tradition, which asserted that narrowly test-based definitions of success reduce notions about learning to easily measurable, quantifiable tasks. In doing so, the bulk of district effectiveness studies continue to imply fairly limited purposes of schools—to generate high test scores—in place of more comprehensive social, moral, or civic goals for participating in society.

Conceptual Dimensions of the District Effectiveness Research

The next round of analysis focused on the conceptual characteristics of the district effectiveness research, namely, the most frequently cited correlates of effectiveness. These correlates included those that were common to both school and district effectiveness research, as well as correlates that were unique to the district effectiveness literature. From there, these data were categorized according to the technical, sociopolitical, or normative dimensions that best described each correlate. Chunking the data in this way enabled comparisons of the most common dimensions of effective district correlates as well as examinations of any patterns or contradictions within each of these three categories. In a final step, each study was coded for the presence or absence of a theoretical framework.

Correlates common to effective districts and schools. One of the most prominent patterns that surfaced was that the most frequently cited correlates of effective districts mimicked the correlates of effective schools. As Figure 5 illustrates, having a clear mission or vision, frequently monitoring progress, promoting strong instructional leadership, and having high expectations for students or teachers were found to be associated with district effectiveness in 64–80% of the studies (e.g., Cawelti & Protheroe, 2001; Elmore & Burney, 1997; LaRocque & Coleman, 1990; McFadden, 2009b; Murphy & Hallinger, 1988; Skrla & Scheurich, 2003; Waters & Marzano, 2006). This pattern might be predicted, because researchers of effective districts can be expected to extrapolate certain factors from the tradition upon which they are building. At the same time, however, this similarity also suggests that some of the conceptual boundaries of this literature may approximate those of the earlier school-level studies.

Highly technical correlates of effective districts. Next, the most frequently cited correlates of effective districts were analyzed within the three conceptual dimensions to which scholars called attention in their appraisals of the school effectiveness studies (Slee et al., 1998; Thrupp, 1999; Thrupp & Willmott, 2003), dimensions that have been repeatedly identified as key elements for understanding how urban educational reforms unfold (Oakes, 1992; Oakes, Welner, Yonezawa, & Allen, 1998)—the technical, the sociopolitical, and the normative.

[Figure 5 appears here: a bar chart plotting the percentage of studies citing each correlate common to effective schools and districts (clear mission or vision, frequent monitoring, instructional leadership, high expectations, safe and orderly environment, opportunity to learn/time on task, home-school relations).]

Figure 5. Common correlates of effective districts and schools. OTL = opportunity to learn. Percentages do not add up to 100% because multiple codes could apply to each study.

Correlates that fell within the technical dimension of districts' affairs were by far the most commonly cited. In addition to the previously mentioned clear mission or vision and frequent monitoring, 70–90% of all studies cited a clear focus on student outcomes, accountability structures, organizational coherence, or standards alignment above all other correlates (e.g., Simmons & Codding, 2006; Snipes et al., 2002; Supovitz, 2006; Togneri & Anderson, 2003). Sixty-eight percent cited "focused," "high quality," or "intensive" professional development for teachers as well as planning and goal setting, and 66% cited the aforementioned strong instructional leadership (e.g., Darling-Hammond et al., 2005; Fink & Resnick, 2011). Figure 6 presents the percentage of studies in which each technical correlate was cited.

Considerably fewer sociopolitical correlates surfaced in the review, as Figure 7 shows. Thirty percent of the studies referred to fostering strong home-school relations, as in the school effectiveness research (e.g., Florian, Hange, & Copland, 2000; Maguire, 2003; Murphy & Hallinger, 1988), followed closely by efforts to build coalitions, alliances, or trusting relationships with multiple stakeholders (e.g., Hightower, 2002; Snipes et al., 2002).

[Figure 6 appears here: a bar chart plotting the percentage of studies citing each of the most common technical correlates, including clear mission or vision, frequent monitoring, accountability, organizational coherence, standards alignment, focused and high-quality professional development for teachers, planning and goal setting, instructional leadership, opportunity to learn/time on task, data analysis, safe environment, and theory of action.]

Figure 6. Most frequently cited technical correlates of effective districts. PD = professional development. Percentages do not add up to 100% because multiple codes could apply to each study.

[Figure 7 appears here: a bar chart plotting the percentage of studies citing each of the most common sociopolitical correlates (home-school relations; coalition, alliance, and relationship building; explicit goals of equity, social justice, or power redistribution; acknowledgment of social and historical context).]

Figure 7. Most frequently cited sociopolitical correlates of effective districts. Percentages do not add up to 100% because multiple codes could apply to each study.

Sixteen percent of studies linked district effectiveness with explicit goals related to equity, social justice, or power redistribution (e.g., Koschoreck, 2001; Stringfield & Yakimowski-Srebnick, 2005), and only 10% found that district leaders' recognition of the social or historical context of their communities was related to effectiveness (e.g., Ross et al., 1998; Stringfield & Yakimowski-Srebnick, 2005).

Correlates that reflected the normative aspects of district effectiveness were similarly uncommon. Figure 8 reveals that, besides the frequently cited correlate of high expectations (64%), researchers identified only two normative correlates: collective norm-building efforts and activities for exploring beliefs or ideological attitudes about learners and communities.

[Figure 8 appears here: a bar chart plotting the percentage of studies citing each of the most common normative correlates (high expectations, collective norm-building, and ideologies or beliefs about learners and community).]

Figure 8. Most frequently cited normative correlates of effective districts. Percentages do not add up to 100% because multiple codes could apply to each study.

Both were connected with effectiveness in only 12% of studies (e.g., Haycock, Jerald, & Huang, 2001; Ragland et al., 1999).

Overall, 83% of all correlates cited fell within districts' technical boundaries; normative or sociopolitical dimensions were reflected in the remaining 17%. These patterns may be explained in multiple ways. For example, they may simply reflect the most potent factors that affect district performance on standardized tests—primarily technical ones. Reforms that center on technically aligning state standards, assessments, and district curriculum may in fact be one of the most efficient routes to boosting test scores. Likewise, reforms that build a more coherent district management system in which all technical components of a central office focus on common goals for student test results (such as resource allocation, professional development, and monitoring and accountability mechanisms) are likely to produce greater test gains than less coherently organized districts. Other contributing factors may include funders' preferences for certain research questions and researchers' capacity to carry out more comprehensive forms of data collection.

Yet the heavy weight of these technical properties may also reflect the scope of factors that researchers of district effectiveness are disposed to investigate based on their epistemological or methodological orientation. This latter interpretation may lend credence to the critiques that effectiveness studies do not fully account for the complex social, political, or normative contexts in which districts are situated. The district-level studies rekindle some of the largest conceptual concerns about the previous effective schools research: They risk discounting how deeply ingrained district structures can be, how normatively charged specific instructional or organizational priorities can be, or how racially or socioeconomically stratified students' experiences may be within a district.

Regardless of which explanation one favors, the stark contrast between the proportion of technical correlates and the others reveals how negligible issues of politics, class, race, and ideology are in the discourse around district effectiveness and central office reform. This pattern is concerning because it suggests that most studies of district effects still exclude the full range of factors that research has shown to powerfully predict a district's policy decisions and performance patterns (Trujillo, 2013b).

Persistently undertheorized studies. Finally, each study was coded according to whether it included either an explicit theoretical framework that guided its conceptualization, design, and analysis, or a grounded theory that emerged from the study. I coded any study that contained any reference to a guiding conceptual or theoretical framework, or that referenced any specific concepts that were used to frame the design or analysis, as "including evidence of a theoretical framework." I also coded any study that referred to any inductively derived theory or theoretical concepts as "including evidence of grounded theory." These definitions were purposefully broad in order to capture the full range of studies that demonstrated any degree of theorizing about phenomena related to district effectiveness. Only 26% of studies contained such theoretical bases (e.g., Honig et al., 2010; Skrla & Scheurich, 2003; Spillane & Jennings, 1997; Stein & D'Amico, 2002). While many studies situated their work in the previous literature on district effectiveness, only a quarter presented a theoretical rationale behind the factors selected for analysis and the assumptions upon which their design was based, or a systematic method of generating a theory from the data collected.

One of the largest strengths of the district effectiveness literature is its relevance to issues of practice. Indeed, a primary impetus behind the research has been to isolate the concrete steps that central office leaders can take to cultivate the types of organizations that promote greater student success. Yet in concentrating chiefly on practical application, some of these studies may have repeated the same conceptual oversights as their school-level ancestors by steering too far from certain theoretical bodies of knowledge that can help explain some of their findings, particularly as the findings relate to patterns in student test performance or the reasons that different correlates of effectiveness interacted with one another in certain ways.

The atheoretical nature of these studies is also seen in the widespread "lists" of effectiveness correlates that these studies often set forth. Rather than interpreting the ways in which different correlates interacted with one another, or perhaps the multiple combinations of correlates that were linked with certain outcomes, these studies tend to produce basic inventories of behaviors and characteristics that are associated with higher performance. In doing so, several of these studies miss opportunities to systematically generate more complex, grounded theories about district effectiveness that can illustrate how and under what conditions certain correlates are related to particular outcomes.

Finally, the undertheorized nature of these studies is reflected in the conventional terms in which the "school district" is conceived—the traditional, North American Local Education Agency. Yet sociological, political, and organizational theories increasingly address other forms of organizations that mediate between the state and the school. Literature on analogous intermediary organizations, including education management organizations or large-scale reform organizations, offers certain theoretical frames on which district effectiveness studies have thus far been largely silent (Trujillo & Woulfin, 2014). In doing so, this literature has restricted its theorizing about effective district organizations to the dominant organizational and political forms that are specific to this region.

Discussion

This review demonstrates both (a) the ways in which district effectiveness research enhances our understanding of the mechanisms that are associated with system-wide improvements in student test performance and (b) the areas in which the field is poised to deepen its methodological rigor, theoretical depth, and conceptual breadth.

As for the contributions of this literature, the aggregate of these studies points to one probable conclusion: Districts matter for student outcomes. Repeated findings about the properties of districts that are linked with test gains underscore that central offices likely enjoy a comparative advantage over individual schools in the large-scale organizing and aligning of curricular, instructional, and other resources in the service of testing. Moreover, the literature offers several practical ideas that district leaders can enact as elements of a larger district plan for increasing student success.

Despite these advances, this review brings to light at least three concerning implications of the district effectiveness literature for research and practice. In the sections that follow I discuss these implications with regard to district leadership, school improvement, and educational equity.

Hyper-Rationalized District Leadership Models

That technical correlates of effectiveness so heavily outweigh other types of correlates in this literature suggests that the study of district effectiveness has largely been a technical scholarly pursuit. In one sense, this technical emphasis is fitting. It reflects what many researchers in the field have intended to do: generate practical, technical knowledge to assist central office leaders who are attempting to craft and implement district-wide improvement on the indicators of student learning to which they are held publicly accountable. Yet this heavily technical frame risks promoting a highly rational model for leading districts, one that accounts minimally for the political, normative, and social contexts in which districts are embedded and in which central office and school site actors carry out their work.

For example, aside from a small minority of studies, the district effectiveness research infrequently explored questions about district leaders' negotiations and politicking that occur in higher or lower scoring districts. Knowledge of how district leaders construct a compelling ideological narrative for staff to embrace, or how they build political alliances amid multiple, competing constituencies—both potential correlates of more successful districts—is relatively sparse in this literature. The upshot of these omissions is an overly technical account of what district leaders do to effectuate changes in student performance. Such hyper-rationalized portrayals of district leaders' roles and responsibilities, absent complementary questions about the political or normative actions of district administrators, risk presenting oversimplified scholarly and practical models of district leadership.

Simplistic Notions of District Improvement

Second, this review highlights at least two concerning implications of the district effectiveness field for research and practice related to district-wide improvement. First, the predominant operationalization of student success in terms of performance on standardized assessments reinforces the conventional wisdom that narrowly conceives of effectiveness as measured by test scores. While this focus also reflects the policy realities in which district leaders operate and the political economy of research funding that shapes researchers' foci, this notion of student learning—centered on measurable, discrete tasks, usually in one or two subject areas—reinforces earlier critics' concerns that effectiveness studies promote an exceedingly narrow purpose for education and distract attention from social or civic goals or intellectual aims such as critical thinking. On one hand, such a narrowly conceived purpose of education may in fact be prudent, since these may be the outcomes over which districts and schools have the most influence. Yet by emphasizing such a narrow range of purposes, this literature risks confining the discourse on district improvement, and the reforms taken up by central offices, to the efforts that most potently affect test scores, perhaps to the exclusion of other meaningful ends for teaching and learning.

The common practice of sampling cases for study based on the dependent variable, or unusually high test performance, also weakens the validity of the conclusions about the causes of improvement in successful districts. The preponderance of these outlier studies does not represent the majority of districts that are struggling to improve performance, and it leaves open the question of whether other districts may have employed these same strategies but experienced less success (American Institutes for Research, 2005). Do the observed strategies and characteristics in fact represent authentic correlates of effective districts, or could they be artifacts of other latent factors, such as a district's unique historical or political legacy, one superintendent's exceptional relationship with a school board, a district's size, or a district's chance encounter with particular resources or atypical community trends? This sampling bias oversimplifies the scholarly and practical implications of the research for the routes to district-wide improvement.
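The logic of this sampling critique can be made concrete with a small simulation (hypothetical data, offered only to illustrate the statistical point): when a practice is common among districts generally, it will also be common among top scorers, so observing it in an outlier-only sample cannot establish it as a correlate of effectiveness.

```python
import random

random.seed(1)  # reproducible illustration

# 1,000 hypothetical districts. "uses_practice" is a widespread strategy that,
# by construction, has no effect on the outcome score.
districts = [
    {"uses_practice": random.random() < 0.7,  # ~70% base rate everywhere
     "score": random.gauss(0, 1)}             # outcome independent of the practice
    for _ in range(1000)
]

# Sample on the dependent variable: keep only the top 5% of scorers.
top_scorers = sorted(districts, key=lambda d: d["score"], reverse=True)[:50]

def prevalence(group):
    """Share of districts in `group` that use the (inert) practice."""
    return sum(d["uses_practice"] for d in group) / len(group)

print(f"Practice prevalence, all districts: {prevalence(districts):.0%}")
print(f"Practice prevalence, top scorers:   {prevalence(top_scorers):.0%}")
# Both shares hover near 70%: an outlier-only design would still "find" the
# practice in most high scorers, despite its having no causal role.
```

Only a comparative design that also samples lower or typical performers can distinguish genuinely differentiating strategies from ubiquitous ones.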

De-Contextualized Explanations of the Roots of Educational Inequity

Perhaps the bedrock implication of this review concerns research and practice involving educational equity. By concentrating primarily on the technical features of districts that leaders can manipulate in order to improve student outcomes, many of these studies seem to imply that student outcomes are largely predicted by within-district factors. Without adequately contextualizing the findings about the districts under study, studies run the risk of contributing to a culture of blame in both practitioner and scholarly communities by focusing exclusively on the relationship between changes within districts' sphere of responsibility and student performance. Such examinations tread a thin line between investigating constructive district-level practices and placing the onus for failure squarely on districts and schools. Without also acknowledging the predictive power of contextual factors related to poverty, race, or distinctive historical realities of particular district communities, some of these studies shift attention away from the broader institutional, systemic inequities that shape districts' capacity to enact certain changes and achieve particular outcomes.

New Directions: Expanding Conceptualizations of District Effectiveness

In light of these implications, in what follows I propose new conceptual and methodological directions for the field of district effectiveness research.

More Holistic Theoretical Frameworks

Given the hyper-rationalized, sometimes overly technical conceptual models that characterize much of this literature, future researchers can be mindful of the imbalance by framing district effectiveness studies not just in terms of the technical dimensions of the reforms but in terms of the sociopolitical and normative dimensions as well. As Oakes concluded in her analysis of equity-oriented reforms, research that accounts for all three dimensions of urban educational change stands to generate more accurate findings about the complex nature of these processes and the interconnectedness of each dimension. Interdisciplinary frameworks that draw on constructs from political science, politics of education, critical race studies, or critical policy studies, to name a few, are one tool that researchers can use to design more holistic inquiries.

Such frameworks also stand to round out scholars' and practitioners' understandings about the multiple combinations of conditions that can lead to district success. Using these frameworks can not only deepen our explanations of the range of factors that bear on a district's success but also reinforce the notion that multiple, varied paths—as opposed to one-size-fits-all improvement plans—might yield success in different settings. More complicated, multidimensional district improvement plans—ones that explicitly address technical, political, and normative factors—may increase districts' chances of meaningful success across all fronts, not just test scores.

Complicating Our Notions About District Improvement

Future district effectiveness researchers can incorporate multiple measures of success—apart from test scores—that reflect the multiple purposes of schools. These measures can include assessments of students' social preparedness, by evaluating their work on group-based tasks and problem-based projects and noting their suspension and expulsion rates. Broader academic measures can include "English learner" reclassification rates, graduation and four-year college enrollment rates, and access to highly credentialed teachers and advanced courses. Indicators of schools' civic effectiveness can include students' opportunities to engage in community-based learning projects and the representativeness of the community members who participate in school governance. Disaggregating all of these measures by race, income, and language status can also help uncover persistent gaps in opportunity and access that are often obscured in more aggregated analyses.

Expanding how researchers operationalize success poses additional benefits. The overwhelming reliance on test scores as the primary gauge of an exemplary district necessarily reduces our conceptions of desirable reforms to ones centered on tests. Yet the very act of inquiring about districts' political dimensions, for example, can enhance the discourse on district improvement to include considerations about the complex political histories in which districts are embedded and which bear on community members' willingness to embrace or resist particular reforms. Such historical dynamics can prove powerful in determining districts' success—particularly with respect to reforms that challenge the status quo by redistributing economic, academic, and other resources.

Researchers' decisions about which districts to study can prove equally significant. Sampling districts based on more than just test score patterns or test-related improvement plans can flesh out our understandings of desirable district goals and strategies. Studying districts that may not place standardized testing at the core of their improvement plans, but that prioritize other aims for their schools and families, can reinforce broader notions about districts' civic or social purposes—not just their academic ones. Studying districts that are implementing full-service community school initiatives, socioemotional student learning plans, or systems and structures for supporting students from historically underserved racial groups can all call attention to less examined but potentially powerful goals for district-wide improvement.
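As a concrete illustration of the disaggregation point above, the following sketch uses pandas with fabricated records and column names (assumptions for illustration only, not fields from any actual district data system) to contrast a district-wide graduation rate with subgroup rates.

```python
import pandas as pd

# Fabricated student-level records; the column names are assumptions for
# illustration, not fields from any real district data system.
students = pd.DataFrame({
    "race_ethnicity":  ["Black", "Latino", "White", "White", "Latino", "Black"],
    "english_learner": [False, True, False, False, True, False],
    "graduated_4yr":   [True, False, True, True, True, False],
})

# The district-wide average conceals subgroup variation...
print(f"Overall 4-year graduation rate: {students['graduated_4yr'].mean():.0%}")

# ...whereas disaggregating by race/ethnicity and language status can surface
# gaps in opportunity and access that aggregated analyses obscure.
subgroup_rates = (
    students.groupby(["race_ethnicity", "english_learner"])["graduated_4yr"].mean()
)
print(subgroup_rates)
```

In practice, minimum cell sizes and student privacy rules would constrain such disaggregation; the point here is only the contrast between the aggregate and subgroup views.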

Diversifying Methods and Subsequent Claims

Along these same lines, future researchers of district effectiveness can not only enlarge the scope of what they define as success but also diversify the methods they use to investigate their phenomena. Longitudinal analyses can help identify which academic or other patterns hold beyond the short term. Comparative designs that examine not only the strongest positive cases but also cases that represent a range of success, ideally across multiple indicators, can help explain how similar improvement strategies play out in different districts. Comparing patterns within and across economic, racial, geographic, and other relevant contexts can also help scholars account for the powerful influences of these settings on districts' processes and outcomes, particularly if they are interested in understanding the roots behind districts' opportunities for high-quality teaching and learning. Diversifying the range of participants beyond administrators, to include teachers, students, and community members, can also round out explanations of districts' effectiveness. So, too, can more direct observations of classroom practice. Combined, these data sources can enhance the internal validity of studies by accounting for a fuller collection of factors that may explain districts' successes and challenges.

In sum, the sharp focus on questions of what works in the district effectiveness literature has deepened researchers' and practitioners' knowledge of the specific mechanisms that may produce more desirable results in test performance; yet these questions alone, decoupled from corresponding inquiries about the complex, highly contextualized character of higher or lower scoring districts, leave researchers and practitioners vulnerable to the same scholarly and practical pitfalls as their predecessors.

Note

1. Multiple branches of research exist in this field (e.g., school effectiveness, school effects, effective schools, school improvement), each with its own distinct but related focus and purpose. However, for the purposes of this review, I use the term school effectiveness research to refer broadly to all of these areas.

References

References marked with an asterisk indicate studies included in the meta-analysis.

*Allen, L., Osthoff, E., White, P., & Swanson, J. (2005). A delicate balance: District policies and classroom practice. Chicago, IL: Cross City Campaign for Urban School Reform.

*American Federation of Teachers. (2000). Doing what works: Improving big city school districts. Washington, DC: Author.
American Institutes for Research. (2005). Toward more effective school districts: A review of the knowledge base. Washington, DC: Author.
Anderson, S. (2006). The school district's role in educational change. International Journal of Educational Reform, 15(1), 13–37.
Anderson, S., & Rodway-Macri, J. (2009). District administrator perspectives on student learning in an era of standards and accountability: A collective frame analysis. Canadian Journal of Education, 32(2), 192–221.
Ball, S. (1998). Educational studies, policy entrepreneurship, and social theory. In R. Slee, G. Weiner, & S. Tomlinson (Eds.), School effectiveness for whom? Challenges to the school effectiveness and school improvement movements (pp. 70–83). London: RoutledgeFalmer.
Bowers, A. (2010). Toward addressing the issues of site selection in district effectiveness research: A two-level hierarchical linear growth model. Educational Administration Quarterly, 46(3), 395–425.
Brookover, W., Beady, C., Flood, P., & Schweitzer, J. (1979). School systems and student achievement: Schools can make a difference. New York: Praeger.
Brookover, W., & Lezotte, L. (1979). Changes in school characteristics coincident with changes in student achievement. East Lansing, MI: Institute for Research on Teaching.
Brookover, W., & Schneider, J. (1975). Academic environments and elementary school achievement. Journal of Research and Development in Education, 9(1), 82–91.
*Cawelti, G. (2001). Six districts, one goal of excellence. Journal of Staff Development, 22(4), 31–35.
*Cawelti, G., & Protheroe, N. (2001). High student achievement: How six school districts changed into high-performance systems. Arlington, VA: Educational Research Service.
Childress, S., Johnson, S., Grossman, A., & Elmore, R. (Eds.). (2007). Managing school districts for high performance: Cases in public education leadership. Cambridge, MA: Harvard Education Press.
Coburn, C. E., Toure, J., & Yamashita, M. (2009). Evidence, interpretation, and persuasion: Instructional decision making in the district central office. Teachers College Record, 111(4), 1115–1161.
Coleman, J. (1966). Equality of educational opportunity (Report No. OE-3800). Washington, DC: National Center for Educational Statistics.
*Corcoran, T., Fuhrman, S., & Belcher, C. (2001). The district role in instructional improvement. Phi Delta Kappan, 81(1), 78–84.
Creemers, B. (1991). Review of effective teaching: Current research. School Effectiveness and School Improvement, 2(3), 256–260.
Cuban, L. (1984). Transforming the frog into a prince: Effective schools research, policy, and practice at the district level. Harvard Education Review, 54(2), 129–151.
*D'Amico, L., Harwell, M., Stein, M., & Van de Heuvel, J. (2001). Examining the implementation and effectiveness of a district-wide instructional improvement effort. Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.

*Darling-Hammond, L., Hightower, A., Husbands, J., LaFors, J., Young, V., & Christopher, C. (2005). Instructional leadership for systemic change: The story of San Diego's reform. Lanham, MD: Scarecrow Education.
Datnow, A. (2000). Power and politics in the adoption of school reform models. Educational Evaluation and Policy Analysis, 22(4), 357–374.
David, J. (1989). Synthesis of research on school-based management. Educational Leadership, 47, 45–53.
Doyle, D., & Finn, C. (1984). American schools and the future of local control. Public Interest, 77, 77–95.
Edmonds, R. (1979a). Effective schools for the urban poor. Educational Leadership, 37(1), 15–24.
Edmonds, R. (1979b). Some schools work and more can. Social Policy, 9(2), 28–32.
Elmore, R. (1993). The role of local school districts in instructional improvement. In S. Fuhrman (Ed.), Designing coherent education policy: Improving the system (1st ed., pp. 96–124). San Francisco: Jossey-Bass.
*Elmore, R., & Burney, D. (1997). Investing in teacher learning: Staff development and instructional improvement in Community School District #2, New York City. New York: National Commission on Teaching and America's Future and Consortium for Policy Research in Education.
*Elmore, R., & Burney, D. (1998). Continuous improvement in Community District #2. Pittsburgh, PA: University of Pittsburgh, Learning Research and Development Center.
*Fink, E., & Resnick, L. (2011). Developing principals as instructional leaders. Phi Delta Kappan, 82(8), 598–606.
Floden, R., Porter, A., Alford, L., Freeman, D., Irwin, S., Schmidt, W., & Schwille, J. (1988). Instructional leadership at the district level: A closer look at autonomy and control. Educational Administration Quarterly, 24(2), 96–124.
*Florian, J. (2000). Sustaining education reform: Influential factors. Aurora, CO: Mid-continent Research for Education and Learning.
*Florian, J., Hange, J., & Copland, G. (2000). The phantom mandate: District capacity for reform. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.
Fuhrman, S. (1993). Designing coherent education policy: Improving the system (1st ed.). San Francisco: Jossey-Bass.
Good, T., & Brophy, J. (1986). School effects. In M. Wittrock (Ed.), Third handbook of research on teaching (Vol. 5, pp. 570–602). New York: Macmillan.
*Haycock, K., Jerald, C., & Huang, S. (2001). Closing the gap: Done in a decade. Washington, DC: The Education Trust.
*Hightower, A. (2002). San Diego's big boom: Systemic instructional change in the central office and schools. In A. Hightower, M. Knapp, J. Marsh, & M. McLaughlin (Eds.), School districts and instructional renewal (pp. 76–93). New York: Teachers College Press.

Hightower, A., Knapp, M., Marsh, J., & McLaughlin, M. (2002). The district role in instructional renewal: Making sense and taking action. In A. Hightower, M. Knapp, J. Marsh, & M. McLaughlin (Eds.), School districts and instructional renewal (pp. 193–201). New York: Teachers College Press.
Hill, P. (1999). Supplying effective public schools in big cities (Brookings Papers on Education Policy 1999). Washington, DC: Brookings Institution Press.
Honig, M. (2003). Building policy from practice: District central office administrators' roles and capacity for implementing collaborative education policy. Educational Administration Quarterly, 39(3), 292–338.
Honig, M. (2004). The new middle management: Intermediary organizations in education policy implementation. Educational Evaluation and Policy Analysis, 26(1), 65–87.
Honig, M. (2009). "External" organizations and the politics of urban educational leadership: The case of new small autonomous schools initiatives. Peabody Journal of Education, 84(3), 394–413.
*Honig, M., Copland, M., Rainey, L., Lorton, J., & Newton, M. (2010). School district central office transformation for teaching and learning improvement: A report to the Wallace Foundation. Seattle, WA: Center for the Study of Teaching and Learning.
*Iatarola, P., & Fruchter, N. (2004). District effectiveness: A study of investment strategies in New York City public schools and districts. Educational Policy, 18(3), 491–512.
Jencks, C., Smith, M., Acland, H., Bane, M. J., Cohen, D., Gintis, H., Heyns, B., & Michelson, S. (1972). Inequality: A reassessment of the effects of family and schooling in America. New York: Basic Books.
Klitgaard, R., & Hall, G. (1974). Are there unusually effective schools? Journal of Human Resources, 74, 90–106.
*Koschoreck, J. (2001). Accountability and educational equity in the transformation of an urban district. Education and Urban Society, 33(3), 284–314.
Labaree, D. (2007). Education, markets, and the public good: Selected works of David F. Labaree (Routledge World Library of Educationalists). London: Routledge.
*LaRocque, L., & Coleman, P. (1990). Quality control: School accountability and district ethos. In M. Holmes, K. Leithwood, & D. Musella (Eds.), Educational policy for effective schools (pp. 168–191). Toronto: OISE Press.
Leithwood, K. (1995). Effective school district leadership: Transforming politics into education. Albany: State University of New York Press.
Leithwood, K. (2010). Characteristics of school districts that are exceptionally effective in closing the achievement gap. Leadership and Policy in Schools, 9(3), 245–291.
Lezotte, L., Edmonds, R., & Ratner, G. (1974). A final report: Remedy for school failure to equitably deliver basic school skills. Department of Urban and Metropolitan Studies, Michigan State University.
Lubienski, C., Scott, J., & DeBray, E. (2011, July 22). The rise of intermediary organizations in knowledge production, advocacy, and educational policy. Teachers College Record. ID Number: 16487. Retrieved from http://www.tcrecord.org
*Maguire, P. (2003). District practices and student achievement: Lessons from Alberta. Kelowna, BC, Canada: Society for Advancement of Excellence in Education.

Marsh, J. (2007). Democratic dilemmas: Joint work, education politics, and community. Albany: State University of New York Press.
*Marshall, J., Pritchard, R., & Gunderson, B. (2004). The relation among school district health, total quality principles for school organization and student achievement. School Leadership and Management, 4(2), 175–190.
*Massell, D., & Goertz, M. (2002). District strategies for building instructional capacity. In A. Hightower, M. Knapp, J. Marsh, & M. McLaughlin (Eds.), School districts and instructional renewal (pp. 173–192). New York: Teachers College Press.
McDonnell, L. (2000). Defining democratic purposes. In L. McDonnell, P. Timpane, & R. Benjamin (Eds.), Rediscovering the democratic purposes of education (pp. 1–20). Lawrence, KS: University of Kansas Press.
*McFadden, L. (2009a). District learning tied to student learning. Phi Delta Kappan, 90(8), 544–553.
*McFadden, L. (2009b). Miami's "Zone" teaches lessons about low-performing schools. Phi Delta Kappan, 90(8), 557–562.
McLaughlin, M., & Talbert, J. (1993). Contexts that matter for teaching and learning: Strategic opportunities for meeting the nation's educational goals. Stanford, CA: Center for Research on the Context of Secondary School Teaching.
*McLaughlin, M., & Talbert, J. (2003). Reforming districts: How districts support school reform. Seattle: University of Washington, Center for the Study of Teaching and Policy.
Miles, M., & Huberman, A. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.). Thousand Oaks, CA: Sage.
Mortimore, P., Sammons, P., Stoll, L., Lewis, D., & Ecob, R. (1988). School matters: The junior years. Wells, UK: Open Books.
Murphy, J. (1991). Restructuring schools: Capturing and assessing the phenomena. New York: Teachers College Press.
*Murphy, J., & Hallinger, P. (1988). Characteristics of instructionally effective districts. Journal of Educational Research, 81(3), 175–181.
New York State Department of Education. (1974). School factors influencing reading achievement: A case study of two inner city schools. Albany, NY: Office of Education Performance Review.
Oakes, J. (1992). Can tracking research inform practice? Technical, normative, and political considerations. Educational Researcher, 21(4), 12–21.
Oakes, J., Welner, K., Yonezawa, S., & Allen, R. L. (1998). Norms and politics of equity-minded change: Researching the "zone of mediation." In A. Hargreaves, A. Lieberman, M. Fullan, & D. W. Hopkins (Eds.), The international handbook of educational change (pp. 952–975). London, UK: Kluwer.
*O'Day, J., & Bitter, C. (2003). Evaluation study of the Immediate Intervention/Underperforming Schools Program and the High Achieving/Improving Schools Program of the Public Schools Accountability Act of 1999. Sacramento, CA: American Institutes for Research.
*Opfer, V., Henry, G., & Mashburn, A. (2008). The district effect: Systemic responses to high stakes accountability policies in six southern states. American Journal of Education, 114(2), 299–332.

Ouchi, W. (2003). Making schools work: A revolutionary plan to get your children the education they need. New York: Simon & Schuster.
*Phenix, D., Siegel, D., Zaltsman, A., & Fruchter, N. (2005). A forced march for failing schools: Lessons from the New York City Chancellor's District. Education Policy Analysis Archives, 13(40). Retrieved from http://epaa.asu.edu/ojs/article/view/145
*Pritchard, R., & Marshall, J. (2002). Professional development in "healthy" vs. "unhealthy" districts: Top 10 characteristics based on research. School Leadership and Management, 22(2), 113–141.
Purkey, S., & Smith, M. (1983). Effective schools: A review. Elementary School Journal, 83, 426–452.
Purkey, S., & Smith, M. (1985). School reform: The district policy implications of the effective schools literature. Elementary School Journal, 85(3), 353–389.
*Ragland, M., Asera, R., & Johnson, J. (1999). Urgency, responsibility, efficacy: Preliminary findings of a study of high-performing Texas school districts. Austin, TX: Charles A. Dana Center.
*Resnick, L., & Harwell, M. (2000). Instructional variation and student achievement in a standards-based education district (CSE Technical Report 522). Los Angeles: CRESST.
Rose, M. (1995). Possible lives: The promise of public education in America. New York: Penguin Books.
*Ross, J., Hannay, L., & Brydges, B. (1998). District-level support for site-based renewal: A case study of secondary school reform. Alberta Journal of Educational Research, 44(4), 349–365.
Rutter, M. (1983). School effects on pupil progress: Findings and policy implications. Child Development, 54(1), 1–29.
Rutter, M., Maughan, B., Mortimer, P., & Ouston, J. (1979). Fifteen thousand hours: Secondary schools and their effects on children. London: Open Books.
Sammons, P. (1999). School effectiveness: Coming of age in the twenty-first century. Lisse, NL: Swets & Zeitlinger.
Sandoval-Hernandez, A. (2008, March). School effectiveness research: A review of criticisms and some proposals to address them. Educate, 31–44.
Sarason, S. (1990). The predictable failure of educational reform: Can we change course before it's too late? San Francisco: Jossey-Bass.
Scheerens, J. (1992). Effective schooling: Research, theory and practice. London: Cassell.
Scott, J. (2009). The politics of venture philanthropy in charter school policy and advocacy. Educational Policy, 23(1), 106–136.
*Simmons, J., & Codding, J. (2006). Breaking through: Transforming urban school districts. New York: Teachers College Press.
*Sims, D. (2008). Strategic responses to school accountability measures: It's all in the timing. Economics of Education Review, 27(1), 58–68.
*Skrla, L., & Scheurich, J. (2003). Displacing deficit thinking in school district leadership. In L. Skrla & J. Scheurich (Eds.), Educational equity and accountability: Paradigms, policies, and politics (pp. 109–132). New York: RoutledgeFalmer.

*Skrla, L., Scheurich, J., & Johnson, J. (2000). Equity-driven, achievement-focused school districts. Austin, TX: Charles A. Dana Center.
Slee, R., Weiner, G., & Tomlinson, S. (1998). School effectiveness for whom? Challenges to the school effectiveness and school improvement movements. London: RoutledgeFalmer.
Smith, M., & O'Day, J. (1991). Systemic school reform. In S. Fuhrman & B. Malen (Eds.), The politics of curriculum and testing. Bristol, PA: Falmer Press.
*Snipes, J., Doolittle, F., & Herlihy, C. (2002). Foundations for success: Case studies of how urban school systems improve student achievement. Washington, DC: MDRC for the Council of Great City Schools.
*Snyder, J. (2001). The New Haven Unified School District: A teaching quality system for excellence and equity. Journal of Personnel Evaluation in Education, 15(1), 61–81.
Spartz, J. (1977). Delaware Educational Accountability System case studies: Elementary schools Grades 1–4. Dover, DE: Delaware Department of Public Instruction.
Spillane, J. (1996). School districts matter: Local educational authorities and state instructional policy. Educational Policy, 10(1), 63–87.
Spillane, J. (1998). State policy and the non-monolithic nature of the local school district: Organizational and professional considerations. American Educational Research Journal, 35(1), 33–63.
*Spillane, J., & Jennings, N. (1997). Aligned instructional policy and ambitious pedagogy: Exploring instructional reform from the classroom perspective. Teachers College Record, 98, 449–481.
Spillane, J., & Thompson, C. (1997). Reconstructing conceptions of local capacity: The local education agency's capacity for ambitious instructional reform. Educational Evaluation and Policy Analysis, 19(2), 185–203.
*Springboard Schools. (2006). Minding the gap: New roles for school districts in the Age of Accountability. A study of high-performing, high-poverty school districts in California. San Francisco, CA: Author.
*Stein, M., & D'Amico, L. (2002). Inquiry at the crossroads of policy and learning: A study of a district-wide literacy initiative. Teachers College Record, 104(7), 1313–1344.
*Stein, M., Harwell, M., & D'Amico, L. (1999). Toward closing the gap in literacy achievement: High Performance Learning Communities Project, Community School District #2. Pittsburgh, PA: Learning Research and Development Center.
*Stringfield, S., & Yakimowski-Srebnick, M. (2005). Promise, progress, problems, and paradoxes of three phases of accountability: A longitudinal case study of the Baltimore City Public Schools. American Educational Research Journal, 42(1), 43–75.
*Supovitz, J. (2006). The case for district-based reform: Leading, building, and sustaining school improvement. Cambridge, MA: Harvard Education Press.
*Supovitz, J., & Taylor, B. (2005). Systemic education evaluation: Evaluating the impact of systemwide reform in education. American Journal of Evaluation, 26(2), 204–230.
Teddlie, C., & Reynolds, D. (2000). The international handbook of school effectiveness research. London: Falmer Press.
Teddlie, C., & Stringfield, S. (1993). Schools do make a difference: Lessons learned from a ten-year study of school effects. New York: Teachers College Press.

Thrupp, M. (1999). Schools making a difference: Let's be realistic! Buckingham, UK: Open University Press.
Thrupp, M., & Willmott, R. (2003). Education management in managerialist times: Beyond the textual apologists. Philadelphia: Open University Press.
*Togneri, W., & Anderson, S. (2003). Beyond islands of excellence: What districts can do to improve instruction and achievement in all schools. Washington, DC: Learning First Alliance and Association for Supervision and Curriculum Development.
Trujillo, T. (2013a). The disproportionate erosion of local control: Urban school boards, high-stakes accountability, and democracy. Educational Policy, 27(2), 334–359.
Trujillo, T. (2013b). The politics of district instructional policy formation: Compromising equity and rigor. Educational Policy, 27(3), 531–559.
Trujillo, T., & Renée, M. (2012). Democratic school turnarounds: Pursuing equity and learning from evidence. Boulder, CO: National Education Policy Center. Retrieved from http://nepc.colorado.edu/publication/democratic-school-turnarounds
Trujillo, T., & Woulfin, S. (2014). Equity-oriented reform amid standards-based accountability: A qualitative comparative analysis of an intermediary's instructional practices. American Educational Research Journal, 51(2), 253–293.
*Tupa, M., & McFadden, L. (2009a). District, know thyself. Phi Delta Kappan, 90(8), 563–566.
Vinovskis, M. (1996). An analysis of the concept and uses of systemic educational reform. American Educational Research Journal, 33(1), 53–85.
*Waters, J., & Marzano, R. (2006). School district leadership that works: The effect of superintendent leadership on student achievement. Denver, CO: Mid-continent Research for Education and Learning.
Weinbaum, E. (2005). Stuck in the middle with you: District responses to state accountability. In B. Gross & M. Goertz (Eds.), Holding high hopes: How high schools respond to state accountability policies. Philadelphia: Consortium for Policy Research in Education.
Welner, K. (2001). Legal rights, local wrongs: When community control collides with educational equity. Albany: State University of New York Press.
*WestEd. (2002). Improving districts: Systems that support learning. San Francisco: Author.

Appendix

Sample Coding Matrix for Methodological Features of District Effectiveness Studies

The matrix records the following features (rows) for each study (columns: Study A, Study B, Study C, Study D, ...):

Quantitative design: survey; experiment; regression/correlation; HLM; other
Qualitative design: case study; ethnography; other
Mixed design
Sampling and selection: based on dependent variable; convenience sample
Comparison criteria: all high-performing; high- to low-performing; range of performance
Data-collection techniques: interviews; focus groups; secondary test score analyses; document analyses; observations; questionnaires; logs; other; unclear
Research subjects: district administrators; principals; teachers; students; parents/community; other; unclear
Measures of effectiveness or success: test-based; non-test-based
Relevant notes or comments

Chapter 2

Expanding School Indicator Systems in a Post-NCLB Era

Laura S. Hamilton and Heather L. Schwartz
RAND Corporation

Over the past decade, school districts in the United States have faced growing pressure to enact or participate in extensive school and educator performance evaluations. This pressure stems from a variety of state and federal policies and has led district staff to reallocate resources and time in ways that fundamentally alter their day-to-day work. The No Child Left Behind Act (NCLB), for example, led to growing numbers of schools' being labeled as not making adequate progress, resulting in new demands on district leaders to intervene in school-level operations. Similarly, recently enacted educator evaluation systems in over 40 states have required districts to respond to new information about the performance of teachers and school leaders and to devote extensive resources to the implementation of the systems.

Although a growing number of educators, parents, and policy makers have expressed concern over federal and state assessment and accountability requirements, there is little reason to believe that these policies will cease to affect the work of schools and school districts. In this chapter, we summarize recent assessment-related policy initiatives intended to improve school performance and the corresponding indicator systems intended to aid school and district improvement efforts. We provide guidance to state- and district-level staff regarding the design of school indicator systems, and to district and school staff to help them respond to these systems. The research makes it clear that districts will play a key role in influencing how new assessment-related policies ultimately affect teaching and learning, and that the design of indicator systems should be informed by the needs and experiences of districts.

Measuring School and District Performance: The Current Policy Landscape

As the heightened controversy over states' participation in the Common Core State Standards (CCSS) suggests, the impending implementation of student assessments aligned to the CCSS is potentially one of the most challenging and transformative reforms that states are currently undertaking in public schools. States that have adopted CCSS-aligned assessments have experienced precipitous declines in student proficiency scores, signaling that the national shift to assessments aligned to the (more rigorous) Common Core standards could have enormous impacts on school accountability ratings, teacher ratings, curriculum, and professional development. Two multistate consortia are developing CCSS-aligned systems of assessment, and other test publishers have also launched efforts to create and market tests that will measure student progress toward the standards. These tests are intended to provide better measures of critical thinking and problem solving than existing state accountability tests, and an analysis of sample items from the two consortia suggests that they are on track to meet that goal better than most current state assessments have (Herman & Linn, 2013).

In addition to these changes to student assessments in core subject areas, some states and districts are also trying to measure interpersonal and intrapersonal attributes that are often termed "21st-century competencies." These include constructs such as creativity, adaptability, resilience, and global awareness, many of which are presumed to be necessary for college and career readiness (Koenig, 2011; Pellegrino & Hilton, 2013; Soland, Hamilton, & Stecher, 2013). The growing emphasis on a broader set of outcomes stems in part from perceptions among some business and government leaders that globalization, technology, migration, international competition, and changing markets require students to develop new kinds of competencies. Consequently, public schools have faced growing pressure to ensure that graduates master a broad range of outcomes related to content knowledge, skills, attitudes, and dispositions.

Accompanying the expansion and revision of cognitive, interpersonal, and intrapersonal measures is a shift in the nature of accountability for schools and personnel derived from the student assessment results. As documented by Manna (2006), the first generation of test-based school accountability systems started in the 1990s, spurred by Clinton-era reforms to the Elementary and Secondary Education Act (ESEA). The 1994 ESEA reauthorization introduced many of the accountability features that became better known in the 2002 ESEA reauthorization, known as No Child Left Behind (Heinecke, Curry-Corcoran, & Moon, 2003). However, the U.S. Department of Education did not consistently enforce the 1994 reauthorization; and states' responses to the requirements, while widespread, were highly varied. By 2001, almost all states had adopted standards and assessments, but only 33 had systematic state-defined accountability systems (Goertz & Duffy, 2001; Heinecke, Curry-Corcoran, & Moon, 2003), and these ranged from weak to strong in their consequences for poor performance (Dee & Jacob, 2011).

No Child Left Behind ushered in the second generation of school accountability as of the school year 2002–2003 by enforcing greater uniformity in states' accountability systems, mandating nationally that states issue Adequate Yearly Progress ratings for all schools and apply sanctions for federal Title I-funded schools. That law included a number of provisions that were intended to reduce achievement gaps and promote equity. Perhaps the best known of these were the subgroup reporting rules that required test results to be presented separately for different socioeconomic and racial/ethnic groups as well as for English language learners and special education students. Schools could fail to meet Adequate Yearly Progress if a single subgroup missed its target, even if the population as a whole met it. The law was intended in large part to create pressure for districts and schools (through publicizing scores as well as through formal consequences) to bring low-performing students' scores up to the proficient level.

We are now in a third generation of school accountability, which draws on and expands even further the already large student-level data systems required for the second generation, to develop individual-level rather than school-level accountability ratings. Here we refer to performance-based evaluation practices for superintendents, principals, and teachers that are now becoming commonplace. For example, from 2009 to 2013, the number of states requiring that evaluations of public school teachers include an objective measure of student achievement increased from 15 to 41 (Doherty & Jacobs, 2013). These policies, like their predecessors, are motivated in large part by equity concerns, especially by research that suggests that low-income and minority students have less access to highly effective teachers than other students (Isenberg et al., 2013).

The primary stated purpose of these systems is to provide information about educators' performance to help decision makers ensure that all students are taught by effective teachers and are enrolled in schools led by effective principals; this goal might be accomplished by using the information to customize professional development or through personnel policies such as using evaluation scores to determine placement or tenure. The adoption of CCSS and the growth in technology-based curricula, assessments, and analysis and reporting tools may well usher in a fourth generation of accountability, as personnel evaluation practices become more sophisticated in response to advances in statistical methodology and data systems (Coburn & Turner, 2011).
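The subgroup provision described above is simple to state but has consequential mechanics. The following minimal sketch (in Python) illustrates the basic logic by which a single qualifying subgroup below target fails the whole school; the group names, minimum group size, target, and enrollment figures are hypothetical, and real AYP determinations also involved participation rates, safe-harbor provisions, and confidence intervals.

```python
# Simplified illustration of NCLB-style AYP subgroup logic.
# All thresholds and figures below are invented for illustration.

MIN_N = 30      # hypothetical minimum size for a "numerically significant" subgroup
TARGET = 0.60   # hypothetical annual proficiency target

def meets_ayp(subgroups):
    """subgroups: dict mapping group name -> (n_students, proportion_proficient)."""
    for group, (n, prop_proficient) in subgroups.items():
        if n >= MIN_N and prop_proficient < TARGET:
            return False, group   # one subgroup below target fails the school
    return True, None

school = {
    "all students":              (480, 0.66),
    "economically disadvantaged": (210, 0.61),
    "english learners":           (45, 0.52),  # misses target -> school fails AYP
}
print(meets_ayp(school))   # (False, 'english learners')
```

Note how the school fails even though the population as a whole exceeds the target, which is exactly the behavior the subgroup rules were designed to produce.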

These iterations of test-based accountability attest to a growing emphasis on using standards, assessments, and accountability policies to improve teaching and learning. These policies have influenced the cultures and practices of schools and districts by encouraging educators to prioritize some outcomes over others when making decisions about curriculum and instruction. For instance, large numbers of districts have encouraged the use of pacing guides, benchmark assessment systems, mandated curricula, targeted professional development, and other resources that promote alignment with the state tests (Hamilton, Stecher, Russell, Marsh, & Miles, 2008; Rentner et al., 2006; Stecher et al., 2008), and these policies in turn promote classroom instruction that emphasizes tested outcomes. As a result, groups outside of traditional state and local education agencies have also shifted their approaches to interacting with the education system; these groups include commercial publishers of curricula and assessments, who increasingly emphasize alignment with standards and tests in their development and marketing activities, as well as intermediary organizations such as technical assistance providers, who are often called on by districts and schools to help them respond effectively to the new policies (Finnigan & O'Day, 2003; Honig, 2004).

In addition, although much of the research that examines the effects of testing finds that educators respond differently to high-stakes accountability tests than they do to tests that do not have explicit stakes attached, it is important to keep in mind that the extent to which stakes are considered high is influenced by the perceptions of individual educators. Merely publishing school-level scores, for instance, can be considered a high-stakes use of scores because of the public pressure that can accompany this use (Faxon-Mills, Hamilton, Rudnick, & Stecher, 2013).

The new era of CCSS-aligned student assessments will undoubtedly prompt states and school districts to revise existing school- and individual-level systems of measures. This chapter draws on pertinent research from decades of prior accountability reforms to help inform states and districts as they not only refine their measures based on the traditionally tested "three R's" but also expand their systems to incorporate 21st-century competencies. We do not limit the discussion to tests that lead to formal rewards and sanctions, and we use the term school indicator system to refer to a collection of school-level measures that may or may not have stakes attached.1 In this context, a measure is a tool that is used to gather information systematically and is intended to support inferences about the performance or characteristics of students, educators, or educational institutions. Measures might be tests, surveys, or other systematic data-collection approaches as well as straightforward indicators of characteristics such as mobility or gender. The pervasiveness and serious consequences of test-based accountability systems demand that federal, state, and local departments of education become expert in the design of indicator systems that have complex and sometimes contradictory incentives.

We offer guidance to designers of indicator systems and to those who are affected by those systems by first defining the purposes of assessments broadly (in which we include academic achievement as well as other outcomes). We then summarize research about the effects of measuring achievement, the effects of attaching stakes, and technical considerations about measurement. We set out the kinds of measures most commonly included in expanded state and local school indicator systems, and then briefly describe the nascent body of measures related to interpersonal and intrapersonal skills. Finally, we offer guidance to school districts as they consider how to revise or expand school indicator systems. With limited space, we focus on the use of measures designed for use at the school level rather than at an individual or classroom level.

Beyond Accountability: Several Purposes of Large-Scale Assessments

The purpose of large-scale testing is often framed primarily in terms of high-stakes accountability systems that attach scores to specific rewards or sanctions, but these tests can serve a variety of other purposes as well (see Hamilton, Schwartz, Stecher, & Steele, 2013, for a more extensive discussion). They can be used as tools for monitoring the education system, akin to a temperature-taking mechanism that provides periodic updates but without imposing specific stakes. Tests can also support diagnosis and prescription, providing information to help administrators and teachers identify areas of need and develop solutions to address them. Finally, tests serve a signaling function, sending messages to educators, students, parents, and others about what outcomes are valued. Tests can serve this purpose even in the absence of scores; it is the content of the tests rather than the scores that sends the signals. For example, the NCLB mandate to administer tests in English language arts (ELA), math, writing, and, to a lesser degree, science indicates that the framers of the policies viewed these as the functional core of schooling.

As we discuss in the section on technical considerations, below, tests can also serve different purposes at different levels of the system. For instance, scores on large-scale achievement tests might be used to inform course placement or remediation at the individual student level, while classroom-level information from the same test could help teachers decide what topics to emphasize in their lesson planning. At the school level, the same test might be used to identify schools in need of extra support or to impose rewards and sanctions. A focus of NCLB as well as of more recent accountability initiatives has been reducing performance gaps among schools or districts through increased pressure on, as well as assistance for, those that are weakest in terms of student achievement outcomes.

Each of these purposes requires a specific set of considerations, and tests that have validity evidence for one purpose might not support valid inferences or decisions when used for another purpose (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 2014; Kane, 2006; Linn, 1990). For instance, a test that is used exclusively for diagnosis at the individual student level should provide detailed information about students' strengths and weaknesses, and it generally should not have high stakes for educators because of the potential for score corruption, which could weaken the test's utility as a diagnostic tool. Even though the primary purpose of the tests developed in response to NCLB was to promote accountability, many states and districts have also used the tests for other purposes, such as diagnosing individual students' needs. In the next section we briefly summarize recent research on how educators have responded to large-scale testing.

Lessons Learned About Indicator Systems From the Effects of Test-Based Accountability

To understand the likely effects of expanding and revising indicator systems, it is helpful to consider research on how the measurement of student achievement has shaped instructional practice, particularly when high stakes are attached. Although few studies have used a research design that can support causal inferences, there has been extensive qualitative and quantitative work seeking to understand educators' responses to testing and to identify the conditions that might be associated with desirable or undesirable responses. This research is reviewed in Faxon-Mills et al., 2013; in this section we highlight key findings from that review.

The main finding of this research is that what is tested gets taught. The literature identifies at least three broad categories of responses to testing that have been observed among practitioners. The first is changes in curriculum content and emphasis, including shifts in the order in which material is taught to ensure that tested material is covered before the testing date (Firestone, Mayrowetz, & Fairman, 1998; Salinas, 2006), as well as reallocation of instructional time and resources across different subjects and across topics within a subject to emphasize tested material and deemphasize untested material (Amrein & Berliner, 2012; Hamilton, 2012; Rentner et al., 2006). This category also includes increases in the emphasis on discrete, easy-to-measure knowledge and skills (Gallagher & Smith, 2000; Herman & Golan, 1991; Johnston & McClune, 2000; Jones et al., 1999; Shepard & Dougherty, 1991), though this type of shift is not inevitable. When mathematics tests require students to explain their responses, teachers tend to engage students in activities that promote mathematical communication (Lane, Parke, & Stone, 2002; Taylor, Shepard, Kinner, & Rosenthal, 2003). Similarly, high-stakes writing tests often promote increased emphasis on writing in the classroom (Koretz, Barron, Mitchell, & Stecher, 1996; Stecher, Barron, Kaganoff, & Goodwin, 1998). At the same time, research suggests that performance assessment and other open-ended testing formats do not always lead to the kind of instruction that their developers have envisioned. For instance, Stecher and Mitchell (1995) examined instructional practices of Vermont teachers after the state had implemented a large-scale portfolio assessment program, and found that teachers focused on specific aspects of problem solving that led to higher scores on the state scoring rubric rather than on promoting problem-solving skills more generally (see also Koretz, Stecher, Klein, & McCaffrey, 1994).

The second broad category of responses focuses on changes in pedagogy, or how teachers teach rather than what they teach. This category includes the adoption of explicit test preparation activities, changes in instructional strategies, and changes in classroom assessment practices. Although there is not always a clear boundary between explicit test preparation activities and other instructional activities that are intended to improve student performance on the content measured by the test (Koretz & Hamilton, 2006), the former category typically includes activities such as coaching students using items with the same format as those on the test, having students take a sample test or work through released items, and using commercial test preparation materials (Amrein & Berliner, 2012; Firestone, Mayrowetz, & Fairman, 1998; Rentner et al., 2006). Often these activities are supported by school and district leaders who supply test-preparation materials or set guidelines regarding the extent to which such activities should be a focus of instruction (Hamilton et al., 2008). In terms of instructional strategies, high-stakes testing has typically been found to promote increased emphasis on more traditional forms of pedagogy such as direct instruction and reduced use of more student-centered strategies (Au, 2007; Harlen & Crick, 2002; Johnston & McClune, 2000; Smith, 2006; Watanabe, 2007). However, tests that include more complex, open-ended formats can sometimes lead to the opposite response (Adair-Hauck, Glisan, Koda, Swender, & Sandrock, 2006; Falk, Ort, & Moirs, 2007; Fuchs, Fuchs, Karns, Hamlett, & Katzaroff, 1999; Vogler, 2002). Perhaps most important, there is evidence that, although teachers often adopt practices that mirror the format of tests, most of these changes are somewhat superficial, and testing alone typically does not lead to fundamental changes in pedagogy (Diamond, 2007; Firestone, Mayrowetz, & Fairman, 1998).

The third category of changes in instruction focuses on how teachers interact with individual students. It includes teachers' use of test results to individualize instruction as well as teachers' decisions to focus on some students more than others in order to maximize performance on the test. Although many advocates of high-stakes testing have argued that such tests can be useful for helping teachers understand students' strengths and weaknesses, educators typically indicate that the utility of large-scale, standardized tests for these purposes is limited because the information is not provided in a timely way, the tests are typically administered only once per year, and the content of the tests is not always closely linked with curriculum (Marsh, Pane, & Hamilton, 2006; Wayman & Stringfield, 2006). Research on formative assessment suggests that the use of assessments that are embedded in the curriculum and linked to guidance for next steps can help teachers understand their students' needs and adjust instruction accordingly (Black & Wiliam, 1998; Goertz, Oláh, & Riggan, 2009; Oláh, Lawrence, & Riggan, 2010; Perie, Gong, Marion, & Wurtzel, 2007; Shepard, Davidson, & Bowman, 2011), though the interim assessments that many districts adopt lack the features needed to support truly formative uses of assessment and often result in changes in what is taught and to whom rather than changes in how specific content or students are taught (Goertz, Oláh, & Riggan, 2009).

The third category includes another type of change in teachers' interactions with students: Assessments can lead teachers to make strategic decisions to focus on some students at the expense of others to improve test scores. A common manifestation of this in the NCLB era is a focus on "bubble kids," those who are close to the proficient threshold and therefore most likely to exceed that threshold in response to instructional interventions (Amrein & Berliner, 2012; Booher-Jennings, 2005; Hamilton et al., 2008; Pedulla et al., 2003; Stecher et al., 2008). In fact, Amrein and Berliner (2012) found that teachers' focus on bubble kids came at the expense of students with lower scores who were considered unlikely to meet the proficiency threshold even with instructional intervention. Although this type of response could undermine the equity goals underlying NCLB, other evidence suggests that, at least in some contexts, the lowest-performing students may benefit from the accountability policies (Jacob, 2005) and that schools serving high-need students were more likely than other schools to engage in test-focused instruction and data use (Amrein & Berliner, 2012; Government Accountability Office, 2009). On balance, it is difficult to determine whether equity of educational opportunity improved or declined under recent high-stakes testing policies because of mixed findings and also because the more-intensive mathematics and reading instruction that some high-need students might have received in response to NCLB-style incentives could have come at the expense of instruction in other subjects.

In almost all of the research summarized above, there is diversity in how individual educators respond to testing. Even when researchers find large shifts in instruction, typically the magnitudes and even the directions of these shifts vary across individual educators. Although it is not possible to identify all of the factors that influence this variation, the research suggests a number of conditions that are relevant. These include features of the test and testing program (e.g., item format, quality of assessment, stakes attached to scores); educators' backgrounds, beliefs, and knowledge; student and school characteristics; and school and district policies on time use, professional development, curriculum, and collaboration opportunities for teachers (see Faxon-Mills et al., 2013, for a detailed discussion of these factors and the research addressing them).

It is also important to note that although research points to commonly observed changes in instruction, the mechanisms through which testing induces those changes are typically unknown. Advocates of high-stakes testing have argued that attaching consequences to test scores will have a beneficial motivational effect, but some research suggests that performance-based accountability policies can affect how people think about their work, their own identities, and the organizations in which they work, sometimes with negative effects on motivation (Colyvas, 2012). Specifically, linking test scores to external rewards and sanctions can, under some circumstances, have a detrimental effect on teachers' morale and relationships and decrease the likelihood of constructive responses (Finnigan & Gross, 2007; Valli & Buese, 2007).

Moreover, many schools lack the infrastructure and resources needed to induce dramatic improvement in student achievement, and this is particularly true among the schools that are most at risk of not meeting their accountability targets because they also tend to serve high-need students and to be situated within districts that lack the human and financial capital needed to support improvement (Herman et al., 2008; Hess, 1998). These schools also frequently must address high levels of student mobility and other challenges related to family and neighborhood conditions that are largely outside their control (Boyd, Lankford, Loeb, Ronfeldt, & Wyckoff, 2011; Kirk & Sampson, 2011; Raudenbush, Jean, & Art, 2011; Schwartz, 2012), which limits the likelihood that incentives alone will be sufficient to induce large achievement gains. And schools and districts are also dependent on access to high-quality, aligned curriculum and assessment materials, professional development, and other supports that typically require the involvement of actors outside the formal education system, such as publishers and technical assistance providers. The path from new measures and accountability policies to improved student outcomes, therefore, is not a simple one and is influenced by a variety of factors both within and outside educators' control. Thus, education systems should be viewed as one component that will generally need to operate in concert with other reforms or conditions in order to reach the ambitious goals that motivated the systems' development.

Factors to Consider When Adopting New Measures

The decision to adopt tests and other measures as performance indicators can promote a variety of responses among educators, some of which might be considered beneficial and others potentially harmful. Before selecting measures or designing a system of indicators, the people who are responsible for these decisions should carefully consider the strengths and weaknesses of all potential measures based on assessment of their technical qualities as well as practical constraints such as cost. They should also identify any goals they have for using the measures to induce changes in educators' practices and should explore the extent to which potential measures are likely to meet those goals. Soland et al. (2013) provided a framework for selecting measures based on technical, practical, and instructional considerations that we reprise here.

The primary technical considerations include validity, reliability, and fairness. Validity is the most important consideration; it refers to the extent to which there is evidence to support a specific interpretation of scores for a specific purpose. To illustrate, measures that are intended to identify which schools are in need of interventions should be accompanied by evidence of the validity of scores on those measures as indicators of schools' success at improving student learning in core academic subjects. Evidence can take a number of forms, including correlations with other measures, information about students' response processes, and expert evaluations of the extent to which the content of the test covers the domain or subject area of interest (AERA, APA, & NCME, 2014). One important aspect of validity in the context of high-stakes measures is the extent to which scores might be corrupted as a result of the stakes. Score inflation is a common phenomenon that results from some of the kinds of instructional shifts discussed above (Koretz, 2008), and it diminishes the utility of scores as measures of student learning.

Reliability is a related technical consideration that refers to the precision of scores, often measured by examining the stability or consistency of scores across multiple administrations of a test and by identifying significant sources of error, such as the effects of human raters on open-ended assessments. Fairness refers to the concept that a test should measure the same construct for all examinees, regardless of group membership such as racial/ethnic background, gender, or disability status. It is a crucial consideration, particularly when systems are intended to promote increased equity of opportunity as well as outcomes. Prospective users of tests and other indicators should examine evidence related to all three of these considerations.

Practical considerations are of course also important, particularly in light of shrinking budgets and the often excessive demands placed on educators' time. The implementation of instruments such as tests, surveys, or interviews to collect the data for metrics carries substantial monetary and time costs for development (or purchase of preexisting instruments), for training of those who will collect the data, for scoring of data, for auditing of the administration process for quality control, and for reporting of results.

With dozens of achievement tests in some school districts, teachers understandably resist the implementation of new measures without a clear justification of their intended purpose. To that end, potential users of measures need to identify the purposes that the measures are intended to serve and, in particular, the extent to which the measures are expected to influence the practices of teachers and other educators. Although instructional considerations might be most relevant for formative assessment systems that are explicitly intended to inform instruction, the research reviewed above makes it clear that externally mandated measures have the potential to influence what educators do. Important instructional considerations include the clarity and usefulness of the data for helping teachers adjust their instruction and identify students' strengths and weaknesses, whether scores or reports are linked to explicit guidance for next steps, and whether the act of participating in the assessment itself constitutes a valuable instructional experience. There are no right or wrong answers to these questions; instead, prospective users should examine measures against these various criteria and determine how to address any limitations or trade-offs. As we discuss next, several states and districts have faced these questions as they have expanded their assessment systems to incorporate new measures.
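As one concrete illustration of the technical criteria above, reliability in its simplest form can be estimated as the correlation between scores from two administrations or parallel forms of a test. The sketch below (Python 3.10+; the scores are invented) shows the calculation; it is illustrative only and is no substitute for the fuller procedures described in the testing standards cited above.

```python
# Minimal sketch: reliability estimated as the correlation between scores
# on two parallel forms (or two administrations). Scores are invented.
from statistics import correlation  # available in Python 3.10+

form_a = [12, 18, 25, 31, 22, 27, 15, 29]
form_b = [14, 17, 27, 30, 20, 28, 13, 31]

r = correlation(form_a, form_b)  # Pearson's r across the same examinees
print(f"estimated reliability: {r:.2f}")
```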

State- and District-Level Efforts to Expand School Performance Measures

Most state and district leaders have recognized the limitations inherent in NCLB-style assessment and have been working to expand the set of measures used to assess student progress. Some of these systems are used for accountability purposes—for example, to assign A–F letter grades to schools and mete out rewards and punishments accordingly—whereas others have lower stakes and are intended to provide information to help states and districts target assistance.

A 2011 RAND study attempted to identify the components of these broader systems as a means of exploring potentially promising approaches to accountability. (A complete description of our methods and findings is available in Schwartz, Hamilton, Stecher, & Steele, 2011.) We found that 20 states had adopted expanded measurement systems as of the 2008–2009 or 2009–2010 school years. Since that time, states have modified their systems in response to changes in federal and state policy. Among the 20 states we examined in the 2011 report, four categories of measures dominated these expanded systems:

1. Student performance in additional tested subjects (most often, history or social studies)
2. Measures of growth in student performance over time rather than of status at a single point in time
3. Indices that assign increasing weight to test scores along the entire spectrum of low to high performance instead of the NCLB focus limited to proficiency or above (a sketch of such an index follows this list)
4. College-readiness measures, such as dropping out of high school, ACT scores, or Advanced Placement course taking and test scores
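The following sketch illustrates the kind of weighted index described in item 3 above. The performance levels and weights are hypothetical and are not drawn from any particular state's system; the point is that every student contributes to the index, so movement anywhere along the performance spectrum changes the score.

```python
# Hypothetical performance index: every student contributes, with more
# weight at higher performance levels, rather than counting only the
# proficient-or-above share. Weights are invented for illustration.

WEIGHTS = {"below basic": 0.0, "basic": 0.5, "proficient": 1.0, "advanced": 1.25}

def performance_index(counts):
    """counts: dict mapping performance level -> number of students."""
    total = sum(counts.values())
    points = sum(WEIGHTS[level] * n for level, n in counts.items())
    return 100 * points / total   # ranges from 0 to 125 with these weights

print(performance_index({"below basic": 40, "basic": 80,
                         "proficient": 120, "advanced": 60}))  # ~78.3
```

Under such weights, moving a student from below basic to basic raises the index even if that student never reaches proficiency, which is precisely the incentive these indices are designed to create.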

The focus in state systems on growth in—rather than status of—student achievement is consistent with the most recent NCLB flexibility provided to states under the Obama administration. As of June 2014, of the 46 state education agencies that submitted ESEA flexibility requests, 43 states and Washington, D.C., were approved, and the U.S. Department of Education required that student growth be built into their accountability systems for schools (U.S. Department of Education, 2014). States will make additional modifications once ESEA is reauthorized, but most states will probably continue to include many of the measures listed above.
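To make the status-versus-growth distinction concrete, the sketch below contrasts the two measures using invented data. Operational growth models (student growth percentiles, value-added models) are far more elaborate; a simple gain score is used here only to show why the two measures can tell different stories about the same school.

```python
# Sketch contrasting a status measure (share proficient now) with a simple
# growth measure (change from each student's own prior score). All scores
# and the cut score are invented.

students = [  # (prior_year_score, current_score)
    (310, 334), (402, 410), (355, 351), (280, 322), (370, 381),
]

CUT = 350  # hypothetical proficiency cut score

status = sum(cur >= CUT for _, cur in students) / len(students)
mean_gain = sum(cur - prior for prior, cur in students) / len(students)

print(f"status: {status:.0%} proficient; mean gain: {mean_gain:+.1f} points")
```

Here a school could look weak on status while posting large gains, or vice versa, which is why the flexibility-era systems described above required growth to be built in alongside status.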

Measuring Outcomes Other Than Academic Achievement

Although NCLB waivers are framed around college and career readiness, the federal waiver system, like states' systems, notably lacks metrics related to so-called 21st-century competencies or to indicators of school or district conditions. In our 2011 scan of state accountability systems, we found these measures used only on a smaller scale by individual schools or networks of schools. At this more local level, we identified two general categories of measures used that augmented the state achievement tests: (a) measures of students' "opportunity to learn," "conditions for learning," and "school climate," and (b) predictions about students' likelihood of completing school on time, sometimes referred to as "at-risk" measures. The first category expanded on the test-based measures of school performance to provide a more rounded picture of schools; the second aimed to help schools intervene with students to prevent dropping out.

In more recent years, states and districts have developed measures related to college readiness and so-called 21st-century competencies. An example of the former is the College Readiness Indicator System, used in five cities (John W. Gardner Center at Stanford University, 2014). An example of the latter is the initiative adopted by the William and Flora Hewlett Foundation to promote "deeper learning" among students throughout the United States, which emphasizes a broad range of outcomes, including mastery of core academic content, critical thinking and problem solving, collaboration, effective communication, self-directed learning, and an academic mindset (William and Flora Hewlett Foundation, 2014).

Several other organizations have developed lists and frameworks that illustrate the range of attributes subsumed under the label of 21st-century skills or competencies. A recent RAND report (Soland et al., 2013) draws on a framework developed by the National Research Council and describes three broad categories of 21st-century skills, along with several specific constructs included in each category (Table 1). The definitions provided in Table 1 do not necessarily capture the full range of each construct but are intended to provide a sense of which skills and competencies the constructs cover.

Although there is research demonstrating the relevance of these skills to college and career readiness, the availability of high-quality, cost-effective measures varies tremendously across the skills. Established measures exist for some of the constructs; for instance, Duckworth and colleagues (Duckworth, Peterson, Matthews, & Kelly, 2007; Duckworth & Quinn, 2009) have developed and tested a measure of grit that has been widely used in a variety of contexts. Other skills have eluded easy measurement, in part because of a lack of clear definition. Leadership, for example, has been difficult to measure because it includes a number of narrower constructs, such as communication skills. Accordingly, some of the measures of 21st-century competencies that schools and districts are exploring rely on technology-based performance assessments and simulations and frequently take a multidimensional approach, measuring several constructs simultaneously.

These efforts have involved a wide range of organizations, including universities and research institutions, policy organizations, and private companies that produce curricula and assessments. Despite the growing enthusiasm for, and commitment to, developing such measures, there are significant technical concerns about their properties for inclusion in a school indicator system. For example, measures of intrapersonal skills might be easily corrupted if students or teachers feel pressure to demonstrate high levels of performance on them. Some of the more technologically sophisticated measures might be harder to "game." However, they introduce practical problems related to cost and technological infrastructure. Researchers and educators are actively working to develop measures that address these limitations, but most such measures are not yet ready for large-scale deployment. The intended purposes also matter; several existing measures might be suitable for relatively low-stakes uses, such as monitoring trends to inform curricular decisions, even if the measures are not yet ready for high-stakes uses.
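Many of the intrapersonal measures discussed here, including the grit measure cited above, are short self-report scales. The sketch below shows, in generic form, how such a scale is typically scored; the item count, response scale, and reverse-coded positions are placeholders and do not reproduce the published Short Grit Scale.

```python
# Generic scoring of a short Likert-type self-report scale of the kind used
# for intrapersonal constructs. Item count and reverse-coded positions are
# hypothetical placeholders, not any published instrument.

SCALE_MAX = 5        # responses range from 1 to 5
REVERSED = {1, 3}    # indices of negatively worded items (hypothetical)

def scale_score(responses):
    """Return the item mean after reversing negatively worded items."""
    adjusted = [(SCALE_MAX + 1 - r) if i in REVERSED else r
                for i, r in enumerate(responses)]
    return sum(adjusted) / len(adjusted)

print(scale_score([4, 2, 5, 1, 4, 3]))  # one student's six responses -> ~4.17
```

The scoring itself is trivial; the concern raised in the text is what happens to such scores when stakes are attached, since self-reports are easy to inflate.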

Table 1. Examples of 21st-Century Competencies by Category

Cognitive
Academic mastery: Academic achievement on traditional measures, including in subjects such as foreign languages and social studies that have been identified as important for promoting students' ability to navigate a global economy.
Critical thinking: Inductive and deductive reasoning and ability to produce appropriate analyses, inferences, and evaluations.
Creativity: Ability to demonstrate innovation, to identify unusual and transformative solutions to problems.

Interpersonal
Communication and collaboration: Ability to engage in communication that is clear, that shares appropriate information, and that engages participants effectively; skills related to conflict resolution and negotiation.
Leadership: A broad set of skills that can involve initiative, building consensus, articulating a vision, and working with and managing other people.
Global awareness: Empathy and an understanding of interrelatedness of people, institutions, and systems; understanding of how actions and events in one part of the world affect other parts.

Intrapersonal
Growth mindset: Belief that intelligence is malleable and a function of effort.
Learning how to learn: Also called metacognition; refers to ability to determine how to approach a task or problem, understanding of how to monitor one's own comprehension and progress toward completing the task.
Intrinsic motivation: Interest in pursuing a task because of interest in the task or other internal forces rather than in response to external incentives.
Grit: Perseverance and passion for long-term goals.

Note. Sources and further definitions of terms are available in Soland et al., 2013.

Guidance for Expanding Systems of Measurement

Although expanding measures can direct educators' attention to previously neglected aspects of schooling, adopting new measures carries risks as well as benefits—especially considering the prodigious time and cost of collecting the data uniformly and accurately. The volume of data that schools already collect underscores the need for an intentional and careful approach to school indicator systems. Ultimately, the purpose of a measurement system (i.e., for monitoring, for use in a diagnostic or prescriptive way to guide school improvement decisions, or for an accountability system with explicit stakes attached) should dictate the selection of measures. The purpose relates to the expectations that society holds for schools, such as whether schools are expected to promote outcomes like civic responsibility in addition to raising student achievement. What is the rationale for expanding a system to include interpersonal or intrapersonal measures, and how will the scores from these measures be used to make decisions about individual students, educators, or schools? Understanding the purpose is necessary for evaluating the other criteria that are relevant to the decision.

Once the purpose of the indicator system is identified, the next major decisions include how to balance complexity versus transparency, how to create an affordable system that is still reasonably comprehensive, whether to allow flexibility in choice or use of measures across units, how much to emphasize formative and summative purposes, and whether to adjust for differences in school inputs. Based on existing research, we cannot provide a definitive set of recommendations for designing a system of measures that benefits students. But we can offer some general guidance for enhancing the likelihood of desirable outcomes. Some of our recommendations are intended for designers of indicator systems, regardless of whether they work at the federal, state, or local level. Others pertain more directly to district leaders who are in the position of responding to mandated measurement systems. We offer the following recommendations for designing a system of measures.

1. Subject each selected measure to careful scrutiny.

After designers identify the purpose of the indicator system, the next stage is to identify individual measures to gauge progress against each goal. And when weighing an individual measure, the designer should consider not only the purpose of the measure but also its instructional utility (e.g., whether it provides actionable information to teachers and students and whether it promotes effective instruction); practical considerations such as cost (including purchase, training, administration, interpretation, and reporting); ease of implementation; and the technical properties of the measure—namely validity, reliability, and fairness. For example, tests used for high-stakes decisions face a higher bar for technical quality than do tests that are used for instructional improvement, whereas the instructional utility of the test is more relevant for the latter purpose than for the former. In many instances, designers and users of indicator systems will need to work with external providers such as test publishers to identify and adapt measures to meet their needs. Thus, having access to high-quality external partners will be a crucial condition for ensuring an appropriate system of measures.

2. Provide supports for teaching that are aligned with the system's goals.

It is especially important for teachers to receive ongoing support that helps them adopt instructional practices that are aligned with the accountability system's goals (Desimone, Porter, Garet, Yoon, & Birman, 2002; DuFour, Eaker, & DuFour, 2005; Elmore, 2004). Resources and supports that can help promote high-quality teaching include sample lesson plans and videos that illustrate desirable practices, reports that help teachers synthesize information about student learning from multiple data sources, the provision of planning time, and opportunities for frequent collaboration in the use of high-quality data systems that are embedded in local curricula rather than simply mirroring the high-stakes test. In addition, because most educators do not know as much about teaching and learning interpersonal and intrapersonal competencies as they do about teaching traditional academic content, they are likely to experience uncertainty and anxiety if they are asked to meet specific assessment targets related to these skills.

A growing number of commercial publishers and nonprofit organizations have begun publishing materials that are ostensibly aligned with CCSS or that target interpersonal and intrapersonal outcomes, but in the absence of clear evidence to support these claims, users will need to use caution before concluding that these products will be appropriate for their schools and districts. And users also need to consider the potential uses for these materials. For example, it is important that system designers not focus exclusively on the provision of results to educators; they should think carefully about how to design reporting systems that can help educators interpret and respond to the information. Test-score reports that provide digestible, specific information can better help teachers to isolate specific skills or patterns rather than simply reporting aggregated broad-scale scores; to identify groups of students with similar problems; and to offer instructional responses to address the identified skill gaps. Separate reports for students might also give them direct feedback so they can act on the information provided. Ideally, a next-generation data system might pool information from across a variety of assessments, formative and summative, to offer customized learning recommendations to each student and, by pooling information from different types of metrics, reduce the limitations inherent to each.

3. Consider trade-offs between breadth and focus.

Although an expanded system of indicators might do a better job of measuring the multiple goals that society holds for schools, such as preparing graduates who are college and career ready and who possess skills necessary for high-skilled work, the inclusion of a large number of measures may diffuse rather than focus educators' work and carry more costs for districts and states if they are to collect the data well. Nevertheless, if additional measures are deemed necessary, training and support regarding how to prioritize among multiple signals sent by broadened indicator systems can reduce the risk of confusion and information overload. The CCSS initiative has the potential to mitigate the breadth problem to some degree by emphasizing a relatively small number of topic areas, but educators will need assistance to understand how to reconcile this attempt at focus with their preexisting views regarding the range of outcomes students should be expected to master.

4. Consider likely effects on equity.

Any incentives built into future accountability systems should encourage the provision of high-quality educational opportunities to all students. These incentives sometimes take the form of subgroup rules, such as those enacted under NCLB that require each numerically significant subgroup to achieve a set proficiency target for the school to be identified as meeting its overall target. But these incentives often carry unintended negative consequences; under the NCLB-style subgroup accountability rule, a school with more subgroups is more likely to fail than a school with fewer subgroups but a similar level of overall performance (Kane & Staiger, 2002). This creates incentives for states to game subgroup definitions, or to segregate their schools to reduce the number of accountable subgroups. One solution would be for states to require schools to report on each subgroup and develop plans to improve the performance of subgroups that miss their targets, while refraining from applying whole-school sanctions based on the performance of one or two subgroups. Another equity-related concern stems from the emphasis of many accountability policies on identifying and improving schools that serve high-need student populations. To the extent that these schools lack access to resources and partnerships that could support their improvement efforts, there is a risk that these policies could exacerbate rather than address the equity problem.
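The Kane and Staiger (2002) point above can be illustrated with a toy simulation: if measurement noise gives each accountable subgroup some chance of missing its target in a given year, schools responsible for more subgroups will miss at least one target more often, even when true performance is identical. The miss probability below is invented purely for illustration.

```python
# Toy simulation of why more accountable subgroups -> higher failure rates
# under an "every subgroup must pass" rule, holding true quality constant.
import random

random.seed(1)
P_MISS = 0.15  # hypothetical chance one subgroup misses its target by noise alone

def fail_rate(n_subgroups, trials=100_000):
    fails = sum(any(random.random() < P_MISS for _ in range(n_subgroups))
                for _ in range(trials))
    return fails / trials

for k in (1, 3, 6):
    print(f"{k} subgroups -> ~{fail_rate(k):.0%} of schools miss at least one target")
```

With these numbers, the expected failure rate rises from roughly 15% with one subgroup to over 60% with six, which is the mechanical source of the gaming incentive described above.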

5. Anticipate the ongoing refinement of a school indicator system.

Local needs and contexts vary, and systems of assessment and accountability should be designed to account for these variations. Customization can require a process of experimentation to try emerging measures or to refine measures to avoid unintended negative incentives. Those who design, mandate, or use information from school indicator systems share a responsibility for investigating the validity of the information for its intended uses. This can involve examining relationships among different measures to assess whether each one is functioning the way it is expected to and is actually measuring what is intended. Other validation practices include asking whether the system provides district leaders with information they can use to improve their school support efforts, whether principals and teachers find the data useful for day-to-day decision making and for broader strategic planning, and whether any unintended shifts in curriculum, instruction, or resource allocation occur in response to the implementation of the system. Finally, since the design and validation of new measures can be costly and time-consuming, partnering with other districts could offer economies of scale.

One particular aspect of these systems that is likely to need continual refinement is the technology infrastructure required for administration, reporting, and storing of information. Many new state assessment programs rely on computerized administration, and districts across the United States have recently adopted instructional management systems and other technology-based approaches to managing data on student achievement and other outcomes. Important considerations include whether schools have enough computers or devices (e.g., tablets) to facilitate test administration, whether adequate bandwidth exists for online administration, whether teachers and students have the necessary experience to use technology-based materials, and whether the reliance on technology facilitates or hinders communication with families. The growing emphasis on blended and personalized learning through technology-based delivery of instruction and assessment will create further challenges that will require schools and districts to evaluate on an ongoing basis their technology resources and their approaches to reporting student achievement information (Partnership for Next Generation Learning, 2010).
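One of the validation practices just described, examining relationships among measures, can be as simple as correlating school-level indicators that should be related. The sketch below uses invented values for three hypothetical indicators across six schools; weak or negative correlations where strong positive ones are expected would flag a measure for closer review.

```python
# Sketch of a convergent-validity check: do indicators that should be
# related actually correlate across schools? All values are invented.
from statistics import correlation  # available in Python 3.10+

indicators = {
    "math_growth":    [0.2, 0.5, 0.1, 0.7, 0.4, 0.6],
    "ela_growth":     [0.3, 0.4, 0.2, 0.6, 0.5, 0.5],
    "school_climate": [3.1, 3.8, 2.9, 4.2, 3.5, 3.9],
}

names = list(indicators)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = correlation(indicators[a], indicators[b])
        print(f"r({a}, {b}) = {r:.2f}")
```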

6. Include both high- and low-stakes measures in indicator systems.

Low-stakes items can be an effective way to introduce measures that are not yet well validated, or ones that have utility to educators but are easily corrupted. For example, educators and students might especially benefit from the inclusion of measures that serve as leading indicators of college and career readiness, such as information about course-taking or intrapersonal skills. At the same time, attaching stakes to these measures may not be warranted and in fact could diminish the utility of the information they produce. To design a system that strikes an appropriate balance between high and low stakes and between being comprehensive and promoting a focus on high-priority areas, system developers can take several steps: (a) engage key stakeholder groups in all phases of planning and implementation, including the selection of measures and decisions about how those measures will be used; (b) publish data from the system in a timely manner, using a format and dissemination mechanism that is widely accessible, easy to understand, and, where appropriate, linked to guidance for next steps; and (c) ensure that any incentives that are attached to performance are aligned with the district's goals and reevaluated frequently to address any undesirable consequences that may arise.

Conclusion

The next generation of student assessments promises to alter school accountability ratings under current rules, which will likely prompt further revisions to existing school indicator systems. In addition, the desire to move beyond academic measures of core subjects, paired with the growing availability of measures of constructs other than academic achievement, is likely to alter the kinds of information that school systems publish and consumers demand. The question then arises how to revise or expand school indicator systems in a way that improves them rather than creating unwieldy, costly systems that simply require educators to devote more time and resources to reporting, without any clear benefit.

Unlike states, school districts have the opportunity to modify school indicator systems on a small scale and potentially in a low-stakes context to keep up with changing goals for schools. In theory, school districts can react nimbly by developing and refining, based on feedback, a series of measures: for example, by adopting and then iterating upon "at-risk" measures to hone their predictive ability (a simple example appears below), or combining data from several tests to identify gaps in students' skills and groupings for differentiated instruction, or adopting largely untested measures of interpersonal or intrapersonal skills without stakes attached to them to discourage corruption. Of course, designers of school indicator systems should adopt only measures that answer questions about schools' progress toward preidentified goals. School districts have the opportunity to experiment with a system that not only expands the aspects of teaching and learning that are measured, but also applies different levels of incentives and consequences to avoid encouraging a lock-step focus on "teaching to the test." In short, districts are in a unique position to be laboratories for the development and refinement of measures that could increase educators' awareness of, and thereby their ability to improve, a wider range of the skills that students need for future success.
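As an illustration of the kind of "at-risk" measure a district might adopt and iterate on, the sketch below encodes a simple threshold rule. The indicators and cutoffs are placeholders; in practice, a district would tune them against its own graduation and outcome data and revise them as the measure's predictive accuracy is evaluated.

```python
# Hypothetical "at-risk" flag of the kind a district might iterate on.
# Indicators and thresholds are placeholders, not a validated model.
def at_risk(attendance_rate, course_failures, suspensions):
    return (attendance_rate < 0.90   # chronic absence proxy
            or course_failures >= 1  # any core-course failure
            or suspensions >= 2)     # repeated discipline incidents

print(at_risk(0.93, 1, 0))  # True: one course failure trips the flag
```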

Note

1. Our definition of an indicator system focuses on the measures it includes and not on the broad infrastructure that might support it, such as a data-warehousing platform.

References

Adair-Hauck, B., Glisan, E. W., Koda, K., Swender, E. B., & Sandrock, P. (2006). The Integrated Performance Assessment (IPA): Connecting assessment to instruction and learning. Foreign Language Annals, 39, 359–382.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Amrein, A. L., & Berliner, D. C. (2012). An analysis of some unintended and negative consequences of high-stakes testing. Tempe, AZ: Arizona State University Education Policy Studies Laboratory, Education Policy Research Unit.
Au, W. (2007). High-stakes testing and curricular control: A qualitative metasynthesis. Educational Researcher, 36(5), 258–267.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7–74.
Booher-Jennings, J. (2005). Below the bubble: "Educational triage" and the Texas accountability system. American Educational Research Journal, 42(2), 231–268.
Boyd, D., Lankford, H., Loeb, S., Ronfeldt, M., & Wyckoff, J. (2011). The effect of school neighborhoods on teachers' career decisions. In G. J. Duncan & R. J. Murnane (Eds.), Whither opportunity: Rising inequality, schools, and children (pp. 377–396). New York: Russell Sage Foundation.
Coburn, C. E., & Turner, E. O. (2011). Research on data use: A framework and analysis. Measurement: Interdisciplinary Research & Perspective, 9(4), 173–206.
Colyvas, J. A. (2012). Performance metrics as formal structures through the lens of social mechanisms: When do they work and how do they influence? American Journal of Education, 118(2), 167–197.
Dee, T., & Jacob, B. (2011). The impact of No Child Left Behind on student achievement. Journal of Policy Analysis and Management, 30(3), 418–446.
Desimone, L. M., Porter, A. C., Garet, M. S., Yoon, K. S., & Birman, B. F. (2002). Effects of professional development on teachers' instruction: Results from a three-year longitudinal study. Educational Evaluation and Policy Analysis, 24, 81–112.
Diamond, J. B. (2007). Where the rubber meets the road: Rethinking the connection between high-stakes testing policy and classroom instruction. Sociology of Education, 80, 285–313.
Doherty, K. M., & Jacobs, S. (2013). State of the states 2013: Connect the dots: Using evaluations of teacher effectiveness to inform policy and practice. Washington, DC: National Council on Teacher Quality. Retrieved from http://www.nctq.org/dmsView/State_of_the_States_2013_Using_Teacher_Evaluations_NCTQ_Report

Duckworth, A. L., Peterson, C., Matthews, M. D., & Kelly, D. R. (2007). Grit: Perseverance and passion for long-term goals. Journal of Personality and Social Psychology, 92(6), 1087–1101.
Duckworth, A. L., & Quinn, P. D. (2009). Development and validation of the Short Grit Scale (GRIT–S). Journal of Personality Assessment, 91(2), 166–174.
DuFour, R., Eaker, R., & DuFour, R. (2005). Recurring themes of professional learning communities and the assumptions they challenge. In R. DuFour, R. Eaker, & R. DuFour (Eds.), On common ground: The power of professional learning communities (pp. 7–29). Bloomington, IN: Solution Tree.
Elmore, R. F. (2004). School reform from the inside out: Policy, practice, and performance. Cambridge, MA: Harvard University Press.
Falk, B., Ort, S., & Moirs, K. (2007). Keeping the focus on the child: Supporting and reporting on teaching and learning with a classroom-based performance assessment system. Educational Assessment, 12(1), 47–75.
Faxon-Mills, S., Hamilton, L. S., Rudnick, M., & Stecher, B. M. (2013). New assessments, better instruction? Designing assessment systems to promote instructional improvement. Santa Monica, CA: RAND.
Finnigan, K. S., & Gross, B. (2007). Do accountability policy sanctions influence teacher motivation? Lessons from Chicago's low-performing schools. American Educational Research Journal, 44, 594–629.
Finnigan, K. S., & O'Day, J. (2003). External support to schools on probation: Getting a leg up? Philadelphia: Consortium for Policy Research in Education.
Firestone, W. A., Mayrowetz, D., & Fairman, J. (1998). Performance-based assessments and instructional change: The effects of testing in Maine and Maryland. Educational Evaluation and Policy Analysis, 20(2), 95–113.
Fuchs, L., Fuchs, D., Karns, K., Hamlett, C., & Katzaroff, M. (1999). Mathematics performance assessment in the classroom: Effects on teacher planning and student problem solving. American Educational Research Journal, 36(3), 609–646.
Gallagher, T., & Smith, T. (2000). Main report: The effects of the selective system of secondary education of Northern Ireland. Belfast, UK: Queen's University.
Goertz, M. E., & Duffy, M. C. (2001). Assessment and accountability systems in the 50 states: 1999–2000 (RR-046). Philadelphia: Consortium for Policy Research in Education.
Goertz, M. E., Oláh, L., & Riggan, M. (2009). From testing to teaching: The use of interim assessments in classroom instruction (Research Report #RR-65). Philadelphia: Consortium for Policy Research in Education.
Government Accountability Office. (2009, November). Student achievement: Schools use multiple strategies to help students meet academic standards, especially schools with higher proportions of low-income and minority students (GAO-10-18). Washington, DC: Author.
Hamilton, L. S. (2012). Measuring teaching quality using student achievement tests: Lessons from educators' responses to No Child Left Behind. In S. Kelly (Ed.), Understanding teacher effects (pp. 49–75). New York: Teachers College Press.

Hamilton, L. S., Schwartz, H., Stecher, B. S., & Steele, J. (2013). Improving accountability through expanded measures of performance. Journal of Educational Administration, 51(4), 453–475. Hamilton, L. S., Stecher, B. M., Russell, J. L., Marsh, J. A., & Miles, J. (2008). Account ability and teaching practices: School-level actions and teacher responses. In B. Fuller, M. K. Henne, & E. Hannum (Eds.), Strong state, weak schools: The benefits and dilem mas of centralized accountability. St. Louis, MO: Emerald Group. Harlen, W., & Crick, D. (2002). A systematic review of the impact of summative assessment and tests on students’ motivation for learning (EPPI-Centre Review). London: EPPICentre, Social Science Research Unit, Institute of Education. Retrieved from https://eppi. ioe.ac.uk/cms/LinkClick.aspx?fileticket=Pbyl1CdsDJU%3D&tabid=108&mid=1003 Heinecke, W. F., Curry-Corcoran, D. E., & Moon, T. R. (2003). U.S. schools and the new standards and accountability initiative. In D. L. Duke, M. Grogan, P. D. Tucker, & W. F. Heinecke (Eds.), Educational leadership in the age of accountability: The Virginia experience (pp. 7–35). Albany: State University of New York Press. Herman, J. L., & Golan, S. (1991). Effects of standardized testing on teachers and learn ing—Another look. Los Angeles: University of California, Los Angeles, Center for Re search on Evaluation, Standards, and Student Testing. Herman, J. L. & Linn, R. L. (2013). On the road to assessing deeper learning: The status of Smarter Balanced and PARCC assessment consortia (CRESST Report 823). Los Angeles: University of California, Los Angeles, Center for Research on Evaluation, Standards, and Student Testing. Herman, R., Dawson, P., Dee, T., Greene, J., Maynard, R., Redding, S., & Darwin, M. (2008). Turning around chronically low-performing schools: A practice guide (NCEE #2008-4020). Washington, DC: National Center for Education Evaluation and Region al Assistance, Institute of Education Sciences. Retrieved from http://ies.ed.gov/ncee/ wwc/publications/practiceguides Hess, R. (1998). Spinning wheels: The politics of urban school reform. Washington, DC: Brookings Institution Press. Honig, M. I. (2004). The new middle management: Intermediary organizations in educa tion policy implementation. Educational Evaluation and Policy Analysis, 26(1), 65–87. Isenberg, E., Max, J., Gleason, P., Potamites, L., Santillano, R., Hock, H., & Hansen, M. (2013). Access to effective teaching for disadvantaged students (NCEE 2014-4001). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences. Jacob, B. (2005). Accountability, incentives and behavior: The impact of high-stakes test ing in the Chicago Public Schools. Journal of Public Economics, 89(5–6), 761–796. John W. Gardner Center at Stanford University. (2014). Menu of college readiness indica tors and supports (College Readiness Indicator Systems Resource Series). Seattle, WA: Bill & Melinda Gates Foundation. Johnston, J., & McClune, B. (2000). Selection project SEL 5.1: Pupil motivation and atti tudes: Self-esteem, locus of control, learning disposition and the impact of selection on teaching and learning. Belfast, UK: Queen’s University. Retrieved from http://www. deni.gov.uk/22-ppa_gallagherandsmith_selproj5-1_pupilsmotivationandattitudes.pdf

Jones, M., Jones, G., Brett, D., Hardin, B., Chapman, L., Yarbrough, T., & Davis, M. (1999). The impact of high-stakes testing on teachers and students in North Carolina. Phi Delta Kappan, 81(3), 199–203. Kane, M. T. (2006). Validation. In R. Brennan (Ed.), Educational Measurement (4th ed., pp. 17–64). Westport, CT: American Council on Education/Praeger. Kane, T. J., & Staiger, D. O. (2002). The promise and pitfalls of using imprecise school accountability measures. Journal of Economic Perspectives, 16(4), 91–114. Kirk, D. S., & Sampson, R. J. (2011). Crime and the production of safe schools. In G. J. Duncan & R. J. Murnane (Eds.), Whither opportunity: Rising inequality, schools, and children (pp. 397–418). New York: Russell Sage Foundation. Koenig, J. A. (2011). Assessing 21st century skills: Summary of a workshop. National Acad emies Press. Retrieved from http://www nap.edu/openbook.php?record_id=13215 Koretz, D. (2008). Measuring up: What educational testing really tells us. Cambridge, MA: Harvard University Press. Koretz, D., Barron, S., Mitchell, K., & Stecher, B. M. (1996). Perceived effects of the Ken tucky Instructional Results Information System (KIRIS) (Document MR-792-PCT/FF). Santa Monica, CA: RAND. Koretz, D., & Hamilton, L. S. (2006). Testing for accountability in K–12. In R. L. Brennan (Ed.), Educational measurement (4th ed., 531–578). Westport, CT: American Council on Education/Praeger. Koretz, D., Stecher, B. M., Klein, S. P., & McCaffrey, D. (1994). The Vermont Portfolio Assessment Program: Findings and implications. Educational Measurement: Issues and Practice, 13(3), 5–16. Lane, S., Parke, C. S., & Stone, C. A. (2002). The impact of a state performance-based assess ment and accountability program on mathematics instruction and student learning: Evi dence from survey data and school performance. Educational Assessment, 8, 279–315. Linn, R. L. (1990). Essentials of student assessment: From accountability to instructional aid. Teachers College Record, 91(3), 422–436. Manna, P. (2006). School’s in: Federalism and the national education agenda. Washington, DC: Georgetown University Press. Marsh, J., Pane, J. F., & Hamilton, L. S. (2006). Making sense of data-driven decision mak ing in education: Evidence from recent RAND research. Santa Monica, CA: RAND. Oláh, L., Lawrence, N., & Riggan, M. (2010). Learning to learn from benchmark assessment data: How teachers analyze results. Peabody Journal of Education, 85(2), 226–245. Partnership for Next Generation Learning. (2010). Partnership for Next Generation Learn ing overview. Washington, DC: Council of Chief State School Officers and Stupski Foundation. Retrieved from http://www.ccsso.org/Documents/2010/PNxG_Innovation_ Lab_Net_Overview-Aug%2010_2010.pdf Pedulla, J. J., Abrams, L. M., Madaus, G. F., Russell, M. K., Ramos, M. A., & Miao, J. (2003). Perceived effects of state-mandated testing programs on teaching and learn ing: Findings from a national survey of teachers. Boston: National Board on Educa tional Testing and Public Policy.

Pellegrino, J. W., & Hilton, M. L. (2013). Education for life and work: Developing trans ferable knowledge and skills in the 21st century. Washington, DC: National Academies Press. Retrieved from http://www nap.edu/catalog.php?record_id=13398 Perie, M., Gong, B., Marion, S., &Wurtzel, J. (2007). The role of interim assessments in a comprehensive assessment system: A policy brief. Dover, NH: National Center for the Improvement of Educational Assessment. Raudenbush, S. W., Jean, M., & Art, E. (2011). Year-by-year and cumulative impacts of attending a high-mobility elementary school on children’s mathematics achievement in Chicago, 1995–2005. In G. J. Duncan & R. J. Murnane (Eds.), Whither opportunity: Rising inequality, schools, and children’s life chances (pp. 359–375). New York: Rus sell Sage Foundation. Rentner, D. S., Scott, C., Kober, N., Chudowsky, N., Chudowsky, V., Joftus, S., & Zabala, D. (2006). From the capital to the classroom: Year 4 of the No Child Left Behind Act. Washington, DC: Center on Education Policy. Salinas, C. (2006). Teaching in a high-stakes testing setting: What becomes of teacher knowledge? In S. G. Grant (Ed.), Measuring history: Cases of state-level testing across the United States (pp. 177–193). Greenwich, CT: Information Age. Schwartz, H. (2012). Housing policy is school policy: Economically integrative housing promotes academic success in Montgomery County, Maryland. In R. D. Kahlenberg (Ed.), The future of school integration (pp. 27–66). New York: Century Foundation. Schwartz, H., Hamilton, L. S., Stecher, B. M., & Steele, J. L. (2011). Expanded measures of school performance. Santa Monica, CA: RAND. Shepard, L., Davidson, K., & Bowman, R. (2011). How middle school mathematics teach ers use interim and benchmark assessment data (CRESST Report 807). Los Angeles: University of California, Los Angeles, Center for Research on Evaluation, Standards, and Student Testing. Shepard, L. A., & Dougherty, K. C. (1991). Effects of high-stakes testing on instruction. Paper presented at the annual meeting of the American Educational Research Associa tion and the National Council on Measurement in Education, Chicago, IL. Smith, A. M. (2006). Negotiating control and protecting the private: History teachers and the Virginia Standards of Learning. In S. G. Grant (Ed.), Measuring history: Cases of statelevel testing across the United States (pp. 221–247). Greenwich, CT: Information Age. Soland, J., Hamilton, L. S., & Stecher, B. M. (2013). Measuring 21st-century competen cies: Guidance for educators. New York: Asia Society. Stecher, B., Barron, S., Kaganoff, T., & Goodwin, J. (1998). The effects of standards-based assess ment on classroom practices: Results of the 1996–97 RAND survey of Kentucky Teachers of Mathematics and Writing (CSE Tech. Rep. 482). Los Angeles: University of California, Los Angeles, Center for Research on Evaluation, Standards, and Student Testing. Stecher, B. M., Epstein, S., Hamilton, L. S., Marsh, J. A., Robyn, A., McCombs, J. S., Russell, J. L., & Naftel, S. (2008). Pain and gain: Implementing No Child Left Behind in California, Georgia, and Pennsylvania, 2004 to 2006. Santa Monica, CA: RAND. Stecher, B. M., & Mitchell, K. (1995). Portfolio driven reform: Vermont teachers’ understand ing of mathematical problem solving (CSE Tech. Rep. 55). Los Angeles: University of Cali fornia, Los Angeles, Center for Research on Evaluation, Standards, and Student Testing.

Taylor, G., Shepard, L., Kinner, F., & Rosenthal, J. (2003). A survey of teachers’ per spectives on high-stakes testing in Colorado: What gets taught, what gets lost (CSE Technical Report 588). Los Angeles: University of California, Los Angeles, Center for Research on Evaluation, Standards, and Student Testing. U.S. Department of Education. (2014). ESEA flexibility. Retrieved from http://www2. ed.gov/policy/elsec/guid/esea-flexibility/index html Valli, L., & Buese, D. (2007). The changing roles of teachers in an era of high-stakes ac countability. American Educational Research Journal, 44(3), 519 –558. Vogler, K. E. (2002). The impact of high-stakes, state-mandated student performance as sessment on teacher’s instructional practices. Education, 123(1), 39–56. Watanabe, M. (2007). Displaced teacher and state priorities in a high-stakes accountability context. Education Policy, 21(2), 311–368. Wayman, J., & Stringfield, S. (2006). Data use for school improvement: School practices and research perspectives. American Journal of Education, 112, 463–468. William and Flora Hewlett Foundation. (2014, April). Deeper learning. Retrieved from William and Flora Hewlett Foundation website: http://www hewlett.org/programs/edu cation/deeper-learning

Section 2
Systems Learning at the School and Classroom Levels

Chapter 3
Formative Experimentation: The Role of Experimental Research in Program Development

Jonathan Supovitz
University of Pennsylvania, Consortium for Policy Research in Education

Districts frequently create and implement programs of their own invention: new curriculum materials, alternative professional development approaches and models for working with teachers (e.g., by using coaches or professional learning communities), or different incentive structures. Few of these new ways of doing things are evaluated with anything more than informal impressionistic assessments. Almost none are investigated experimentally.

What should the role of experimentation be in program development and district reform? Many people view formal experimentation—with random assignment of subjects to treatment and control conditions—as a practice reserved for testing the effects of mature, stable interventions that need a final, rigorous assessment before they can be scaled up or evaluated by the What Works Clearinghouse. Randomized trials are often ungainly affairs because they require rigid adherence to strict protocols, withhold treatment from a subgroup of participants, and discourage adjustments of the prespecified treatment once the experiment has begun. Their purpose is to determine the ultimate effect of an intervention, not to provide useful feedback to program developers that can be used to improve the intervention in process. For this reason, experiments are often viewed as the final arbiters for mature interventions but not as learning opportunities for evolving programs. They may be seen as judgments that can damage the reputation of a promising intervention if applied prematurely. Thus, program developers often shy away from experimentation in the development process. But can experimentation be used more intimately as a mechanism for intervention improvement? What value does this approach offer, and in what ways does a more formative use of experimentation compromise the experiment itself? In what ways are the contrasts that emerge during an experiment helpful in providing feedback and adjusting program design? In essence, can experiments be used for program learning?

These are the questions I examine in this chapter, using data from a pilot intervention carried out in an experimental framework. I explore the relationship between experimental research and program learning in four sections. First, I describe the traditional role of experimentation in education research and the recent increase in attention to it. I then examine two movements in education research that may include experimentation but emphasize data-based formative feedback to program developers. Gleaning from these strains of research, I explore the idea of formative experimentation, whose goal is as much to provide feedback to interveners for the purpose of learning and improvement as it is to provide summative evidence of impact. Using a case study of an experiment with a local school district by the Consortium for Policy Research in Education (CPRE, at the University of Pennsylvania), I explore the opportunities and constraints of developing an intervention within an experimental design.

The Role of Experiments in School Improvement

I begin this chapter with a paradox of sorts about the learning limitations of experimental research. While experimental research may produce the best estimates of program impacts, it generally provides very little information that helps program developers learn. In fact, during the development process, a program team may be wary, for fear that experimentation could prematurely stigmatize their program as ineffective while providing little feedback to increase effectiveness.

Classical experimental research in education is traditionally viewed as a way of determining the ultimate efficacy of educational programs, policies, or other interventions. Campbell and Stanley (1963), in their seminal guide to quantitative education research designs, famously described experimental research as “the only means for settling disputes regarding educational practice, as the only way of verifying educational improvements, and as the only way of establishing a cumulative tradition in which improvements can be introduced without the danger of a faddish discard of old wisdom in favor of inferior novelties” (p. 2).

Cook (2002), an ardent experimentalist, points out the scarcity of experimental research in education “despite widespread recognition of the superiority of random assignment” (p. 195). He argues that “scholars in schools and colleges of education who call themselves educational evaluators hardly ever do experiments and usually counsel against doing them” (p. 196). Cook points to research comparing experiments with quasi-experiments, which indicates that they reach the same general conclusions but that experiments are more efficient due to smaller standard deviations around the effect sizes. He also highlights the potential misallocated human and material costs of not getting to the “right” answer on important problems.

Cook (2002) lists a number of reasons that researchers have eschewed experiments, including philosophical arguments about the conditions imposed by experiments; practical arguments about the limits and undesirable tradeoffs of implementing experiments; and preferences for other methods that focus on program processes as well as outcomes. He frames the debate about methods as one of priority. He describes the majority of educational evaluators (read “nonexperimentalists”) as wanting research to “examine ways to provide individual schools or district staff with continuous feedback about strategic planning, program implementation, or student and teacher performance monitoring” rather than “describing what works educationally” (p. 177). As the field currently thinks of it, we view experimental research not as a way of improving interventions but as a way of testing their efficacy. But why must this trade-off exist?

Over the last 15 years, experimentation has emerged as the dominant paradigm of research funded by the U.S. Department of Education’s Institute of Education Sciences, and of federally funded work in general. The No Child Left Behind Act (NCLB) of 2001 called for scientifically based research, a term that, Slavin (2002) noted, appeared no less than 115 times in the legislation. Scientifically based research, as the term implies, prioritizes the application of positivistic scientific methods to educational problems. In Slavin’s view, the emphasis on scientifically based research is an effort to ameliorate the “awful reputation” that education research has among policy makers (Kaestle, 1993; Lagemann, 2002). Reviewing the post-NCLB terrain, Slavin points out that experiments are increasingly common but may have the unfortunate effect of trivializing the questions that are examined, due to the difficulty of setting up experimental conditions for complex educational questions. Thus, the drive for more experiments may actually be reducing our attention to important and complicated educational questions.

A major critique of experiments, one pointed out by Cook (2002) in his summary of reasons that experiments are less used, is the tendency of reforms to be adjusted as they play out in different contexts. A poster child for this issue is the work of Berman and McLaughlin (1978), who led the RAND Change Agent study, which described the process of 293 federally funded education projects in 18 states. Berman and McLaughlin introduced the compelling concept of mutual adaptation to describe how reforms were shaped by both designers and local implementers throughout the implementation process. They found that the projects with the biggest impacts were those that were shaped by both designers and implementers. They observed:

Rarely are projects carried out exactly to the letter of the original design. Instead, they must be adapted into the institutional setting while the people in the schools, and the organizations those people have created, must at the same time adapt to the demands of the project. In other words, we hypothesize that innovation in schools is a process of mutual adaptation. If nothing happens to change the project, then it probably never really “met” the system. If nothing happens to change the setting, then there probably was no real implementation. (Berman & McLaughlin, 1978, p. 8)

From this point of view, experiments are attempts to get a fix on a constantly moving target. This leads to concerns that a causal estimate may not be replicable because the treatment being measured is continuously adjusted.

My 20 years of experience researching ambitious school and district reforms affirms that complex reforms are frequently adjusted as they are adopted in local contexts. For example, from 2004 to 2006, CPRE studied a set of five high school reforms as they were implemented (Supovitz & Weinbaum, 2008). They ranged from comprehensive school reforms to literacy reforms to data-use reforms. We interviewed educators at multiple times in both the schools and the districts where the reforms were being implemented and watched the reforms unfold in the schools over two years. Our analyses compared the enacted reforms with the designed reforms and found not only that there were differences between the two but also that adjustments had been made at various levels of the system (classroom, school, district) and at various times. I coined the term iterative refraction, defined as the “process through which reforms are adjusted repeatedly as they are introduced into, and work their way through, school environments” (Supovitz, 2008, p. 153). The clear message from this study was that education reforms, like other complex social reforms that require implementation through multiple layers of large organizations (in this case, schools and districts), are constantly adjusted as they are incorporated into practice.

The use of experimental methods as a means of program improvement almost certainly implies the involvement of researchers in the program development process. This infusing of research into the educational R&D process has a long and uncertain history in the United States, going back to the curriculum development movement of the 1950s, if not earlier (Tyack & Cuban, 1995). One tradition in education research is the use of design experiments to develop reform programs. Design experiments are not experiments in the classical sense. They are conducted to develop theories. They focus on designing elements of an intervention and anticipating how the elements will function together, then testing the theorized influences against those that occur under actual conditions (Cobb, Confrey, Lehrer, & Schauble, 2003). Brown (1992) was one of the early developers of design experiments, which combined field-based and laboratory-based research. McCandliss, Kalchman, and Bryant (2003) described tensions between experimental methods and design experiment models, including several examples of collaboration across these approaches, which featured iterative feedback between causal testing in laboratory conditions and integration into ecological educational contexts. They found that the interaction advanced both theory and practice. Their examples were from partnerships between educational practitioners and cognitive scientists.

A more recent development is design-based implementation research (DBIR). Penuel, Fishman, Cheng, and Sabelli (2011) introduced the DBIR concept as a strategy for “supporting the productive adaptation of programs as they go to scale” (p. 331). Penuel et al. distinguished four characteristics of DBIR: persistent focus on practice from multiple stakeholders’ perspectives; commitment to collaborative design; commitment to theory development; and concern with developing capacity for sustaining change in systems. DBIR tries to bring researchers and practitioners closer together through shared experience with data on program implementation. Thus, in DBIR-related program development, the line between researcher and practitioner is blurred. DBIR is agnostic about the methods used by researchers to inform implementation research, but Penuel et al. caution about the persistent challenge of coordinating the timelines and activities of research and development.

A similar movement for researcher–program developer collaboration is exemplified by the Improvement Research model being developed by the Carnegie Foundation for the Advancement of Teaching (Bryk, Gomez, & Grunow, 2011). This model focuses on (a) reducing variation in core problems of practice, with attention to the measurement of key outcomes and processes to track whether change is an improvement, and (b) accelerating learning through quick inquiry cycles and networked improvement communities. These authors contrast the typical development model, consisting of theory development, piloting, and field testing and culminating with a rigorous experimental efficacy trial, against a host of multiple small studies conducted across networked sites with common, or at least overlapping, measures. Nevertheless, advocates of the Improvement Research model continue to value experimentation as the strongest mechanism for causal inferences, where applicable.

The Case of the Linking Study

To explore the questions posed earlier, I used information that my colleagues and I had collected in the first year of a project, funded by the Spencer Foundation of Chicago, to develop and experimentally test the influence of an intervention that provided feedback to teachers on the connections between their teaching and their students’ learning. Conducted in 2010–2012, the project resulted from a partnership between CPRE and a suburban school district in New Jersey. District leaders and CPRE researchers co-designed a professional development experience for teachers nested within a research study. The project, called “Linking Instructional Practice to Student Performance” (or the Linking Study), was based on the idea that whereas teachers receive ample data on the learning of their students via test scores and are asked to make inferences back to their teaching, they receive very little information that connects their teaching to the learning of their students. The hypothesis we were testing was that data on instruction examined by teachers in conjunction with data on student learning is a more powerful condition for improving the quality of subsequent teaching than the examination of data on student learning alone.

Based on this premise, in 2009 district leaders and CPRE researchers co-designed an intervention to provide teachers with data on their instruction, in the form of feedback on a lesson observed by videotape. The teachers were to examine the data in conjunction with their students’ end-of-unit test data at regularly scheduled professional learning community (PLC) meetings. At that time, the district used the Investigations mathematics curriculum in elementary schools, so we focused our instructional feedback from the videotapes around two aligned dimensions of mathematics instruction, Academic Rigor and Accountable Talk. We chose these dimensions because there was research evidence for their effective leverage in changing instruction (Cobb, Boufi, McClain, & Whitenack, 1997; O’Connor & Michaels, 1996; Tharp & Gallimore, 1988) and because rubrics had been developed to assess them (Boston & Wolf, 2006; Junker, Weisberg, Matsumura, Crosson, Wolf, & Levison, 2005; Matsumura, Garnier, Pascal, & Valdés, 2002).

We initially envisioned the 2010–2011 academic year as the first of a two-year project, but factors external to our work as well as factors that arose from our work led us to divide the project into two one-year parts with separate studies (for the 2011–2012 results, see Supovitz, 2014). The external changes were the consequence of budget cuts in the district, which made it necessary for 10 of the 11 district mathematics coaches to be reassigned back to the classroom. This led us to rethink the extent to which we could have district coaches facilitate PLC meetings. The adjustments internal to the project are the focus of this chapter: We found a number of ways that the feedback we were receiving from the data could be used to strengthen the intervention. We did not hesitate to use this feedback to strengthen the intervention throughout the year, even as we continued the experiment.

As previously stated, the goal of the research component of the project was to experimentally test the hypothesis that combining the data on teaching and learning was a more powerful learning experience for teachers than providing the data on student learning alone. Both the intervention and the research study design are encapsulated in Figure 1.

[Figure 1. Overview of linking study design. PLC = professional learning community; E-O-U = end-of-unit.]

To implement the intervention, teachers of mathematics in Grades 3–5 were recruited in fall 2010 to participate in the experiment. Teachers who volunteered to participate were randomly assigned by grade level to the treatment or control condition. All participating teachers had mathematics lessons videotaped in two predetermined Investigations mathematics units during the school year, one in October–November 2010 and one in February–March 2011 (with the timing of the videotaping depending on the grade level and timing of the curricular units). First, the treatment group received customized feedback on the Academic Rigor and Accountable Talk aspects of their lessons via e-mail. Second, teachers in both the treatment and control conditions participated in facilitated conversations about their data in a PLC meeting. For the treatment group, this meant conversations about their instruction and their students’ end-of-unit test performance. For the control group, this meant conversations about their students’ end-of-unit test performance only, without feedback on their instruction. At the end of the school year, teachers in both the treatment and control groups received their videotaped lessons, along with written comments about the lessons.

Data for the study were collected in several ways, also depicted in Figure 1. First, online surveys were administered to all participating teachers, in the fall before they began taking part in the project and again in the spring after participation. The surveys collected the demographics of participants, their attitudes and beliefs about data, information about their grade-level PLCs, and their experience using the math curriculum. Second, the videotaped lessons, which were the basis for teachers’ instructional feedback, served as additional data for the research. All lessons were coded by trained raters on multiple dimensions of Academic Rigor and Accountable Talk as measures of teachers’ instructional quality for that lesson. Third, after their PLC group experience, teachers in both the treatment and the control groups completed a short “exit slip” survey. The exit slips focused on participants’ reactions to their PLC experience, their perceptions of insight gained on teaching and student understanding, and their comfort with examining the data in groups. Fourth, we conducted focus groups with the PLC facilitators between the two rounds of feedback. These conversations focused on the protocols that were used by the facilitators and how they felt the experience was received by teachers. Fifth, we conducted interviews with a sample of teachers in the treatment group to better understand their perceptions of the experience and how they viewed the influence of the experience on their subsequent instruction. Finally, we collected two types of test data on all of the students of teachers who participated in the study. First, we collected the end-of-unit test data for each student. Second, we collected the New Jersey state test data for all students whose teachers participated in the study. For further description of the project and complete results for the first year, see Supovitz (2012).
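To make the grade-level random assignment described above concrete, here is a minimal sketch of stratified randomization within grades. It is an illustration only: the study’s actual procedure is not reproduced here, and the helper name, teacher IDs, and seed are invented.

```python
# Hypothetical sketch of randomly assigning volunteer teachers to treatment
# or control within grade-level strata; all names and data are invented.
import random

def assign_within_grades(teachers, seed=2010):
    """teachers: list of (teacher_id, grade) pairs; returns {teacher_id: arm}."""
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible
    arms = {}
    for grade in sorted({g for _, g in teachers}):
        cohort = [tid for tid, g in teachers if g == grade]
        rng.shuffle(cohort)  # random order within the grade-level stratum
        for i, tid in enumerate(cohort):
            arms[tid] = "treatment" if i < len(cohort) // 2 else "control"
    return arms

volunteers = [("T01", 3), ("T02", 3), ("T03", 4), ("T04", 4), ("T05", 5), ("T06", 5)]
print(assign_within_grades(volunteers))
```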

Design Changes and Evidence That Spurred Them

We learned a tremendous amount from the feedback that we gleaned from the data during the pilot year of the project, which resulted in several adjustments to the intervention. Six design changes were made during or after the pilot year, each of which I describe below, with discussion of the data (if any) that contributed to the change.

Design Change 1
• Issue: Instructional data for teachers was not targeted enough.
• Data sources: Interviews with teachers and focus groups with facilitators.
• Adjustment: Moving away from generic questions about instruction toward examination of video clips of specific instructional interactions.

In the original design, we engaged teachers in general conversations about the academic rigor of their lessons and examples of accountable talk by asking questions such as the following: What strategies do you use to engage students to think more deeply about the main ideas of the lesson? And what strategies do you use to get them to explain their thinking? In our interviews with teachers and in the focus groups with facilitators, it became clear that these questions were too generic to engage teachers in specifics about the lessons. It was also clear that the teachers wanted more specific, and even more critical, conversations to make the experience worthwhile. Based on this information, we refined the spring PLC feedback sessions to focus discussion on instructional practice of Accountable Talk as evidenced in the video clips, and we integrated the discussion of Academic Rigor into the examination of student performance data.

Design Change 2
• Issue: Lessons on different topics were videotaped for different teachers, leading to difficulties in discussing common experiences.
• Data sources: Interviews with teachers and focus groups with facilitators.
• Adjustment: Asking each grade-level team within a school to choose a common lesson for videotaping, feedback, and conversation.

The original program design called for videotaping a lesson within a specified curricular unit but did not specify what lesson would be videotaped. Thus, in the first round of feedback the PLC facilitators spent a lot of time orienting teachers regarding which lessons within the unit had been observed. The interview data from teachers and the focus group conversations with facilitators both suggested that the process would be more powerful if the participants all were discussing the same lesson. Consequently, in the second round of the intervention, in spring, we instituted a design change: Teachers at each grade level within a school agreed on a single lesson that they would all teach for videotaping (not necessarily on the same day). This helped to streamline the feedback process and the discussions within PLC meetings.

Design Change 3
• Issue: The time period between lessons and lesson feedback was too long.
• Data source: Interviews with treatment group teachers.
• Adjustment: Changing the timing of treatment group lesson feedback.

The curricular units within which we embedded the intervention lasted for anywhere from 4 to 6 weeks. Our initial design called for teachers to receive their e-mailed feedback before the facilitated PLC meeting, which occurred early in the next unit (because the teachers needed to gather their end-of-unit test data before the PLC meeting). Often several weeks elapsed between the lesson we videotaped and the teachers’ receipt of feedback by e-mail, especially if the teacher chose to have a lesson early in the unit videotaped. We learned about the short shelf life of instructional feedback when we interviewed teachers participating in the treatment group, who said they often had to think back to situate themselves in the lesson on which we provided feedback. Based on these comments, we changed the design of the intervention so that teachers received their feedback within 10 days of the lesson, regardless of how far this was from the PLC meeting. In subsequent interviews, teachers told us they found this to be a useful design improvement.

Design Change 4
• Issue: Some participants were concerned that the treatment group PLC sessions contained too much information.
• Data sources: Exit slip surveys and facilitator focus groups.
• Adjustment: Restructuring treatment group PLC session design and lengthening PLC meetings.

A combination of data points caused concern that the 40-minute treatment sessions were too compressed and thus were potentially suppressing the outcomes. The treatment-control comparison of results from the exit slip data indicated that there were no differences in perceived knowledge gained about instruction (the purpose of the treatment), while there were significant differences in the perceived knowledge gained about understanding student performance, favoring the control group. In discussing these results with the project development team, we thought that the latter finding may have been attributable to the fact that the control group teachers were spending more time than the treatment group in their PLC meetings looking at student performance data. In addition, the fact that perceptions of learning about instruction were no different in the two groups indicated that either the intervention was ineffective or something was amiss in the way it was being delivered. Additional data from the PLC facilitator focus groups indicated that the facilitators were having trouble getting through the designed agenda for the sessions. These data together raised concern that the treatment sessions needed to be adjusted. Consequently, we redesigned the sessions to look at the student data first and then focus on instructional practice. Switching the sequence from teaching→learning to learning→teaching helped to focus teachers on the practices that produced student understanding. The facilitators felt they could manage the discussions about student performance more tightly than discussions about instruction, which tended to go in many directions. In addition, the district was moving to longer (50-minute) PLC meetings in 2011–2012, which would give teachers more time to look at both types of data.
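The kind of treatment-control contrast described here can be illustrated with a short sketch. The scale scores below are invented, and the choice of Welch’s two-sample t-test is an assumption; the chapter reports the pattern of results, not the analysis code.

```python
# A minimal sketch of comparing mean exit-slip scale scores across arms;
# the data are hypothetical and the test choice (Welch's t) is an assumption.
from scipy import stats

treatment = [3.0, 2.8, 3.3, 2.9, 3.1, 2.7]  # invented "knowledge gained" scores
control = [3.6, 3.4, 3.7, 3.3, 3.5, 3.8]

t, p = stats.ttest_ind(treatment, control, equal_var=False)  # Welch's t-test
print(f"t = {t:.2f}, p = {p:.4f}")  # a small p flags a treatment-control difference
```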

Design Change 5
• Issue: Treatment teachers were uncomfortable with examining instructional data in PLCs.
• Data source: Exit slips.
• Adjustment: Steps to reinforce safety of PLC environments.

When we examined the exit slip data after the first feedback cycle, we found that the treatment group felt significantly less comfortable with examining data in their PLCs than did the control group (see Supovitz, 2012, for the results). We attributed this to the fact that the treatment group explicitly discussed instructional data in their PLC conversations, while the control group did not. We inferred that for treatment teachers this new experience might have increased their sense of PLC meeting insecurity. In response, we took a number of steps to make the PLCs safer environments. First, we reinforced the program’s edict that school administrators were not allowed to attend the PLCs unless unanimously invited by the team. Second, we made the discussion of individuals’ video clips voluntary; teachers could opt not to show clips from their lessons if they were uncomfortable doing so.¹ Third, our previous actions raised awareness among the PLC group facilitators, which may have had some effect on their demeanor during the PLCs.

In the spring exit slip data comparing treatment and control groups on this survey scale, we observed two results. First, teachers’ average level of comfort in examining data in the PLCs went up in the treatment group (as it also did in the control group). Second, there was no longer a significant difference between the two groups. This could be interpreted in several ways. For example, it was possible that both groups became more comfortable with the whole experience because this was the second round of feedback cycles. It was also possible that our adjustments for the treatment group had produced the desired effect.

Design Change 6
• Issue: Possible treatment expansion.
• Data source: Classroom ratings, achievement results.
• Adjustment: Increasing cycles of feedback from two to three.

At the end of the first year, we analyzed the results from the second feedback cycle. We compared the feedback data for the treatment and control groups and saw that the changes from fall to spring in the external ratings of classroom lessons indicated that the intervention was improving classroom practice. The fall lesson ratings showed no difference between the treatment and control groups, as we expected because that measure was taken before any treatment had occurred. The spring ratings showed significant differences favoring the treatment group on both the Academic Rigor and Accountable Talk measures. This indicated that the treatment was having the desired effect—the experimental results showed improvements in the external ratings of the treatment group over the control group, which, due to random assignment, gave us confidence that the result was attributable to the treatment rather than being an artifact of a prior difference between the treatment and control groups. We did not, however, see student achievement effects in the end-of-unit tests. Buoyed by this preliminary finding, we decided to strengthen the intervention further by increasing the number of feedback cycles from two to three in the following year.
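The fall/spring rating pattern lends itself to a difference-in-differences-style summary. The sketch below uses invented rubric scores to mirror the reported pattern (no fall gap, a spring gap favoring treatment); it is not the study’s actual analysis.

```python
# Illustrative summary of external lesson ratings by group and wave;
# the rubric scores are invented to mirror the pattern reported above.
import pandas as pd

ratings = pd.DataFrame({
    "group": ["treatment"] * 4 + ["control"] * 4,
    "wave": ["fall", "spring"] * 4,
    "rigor": [2.1, 3.0, 2.0, 2.9,   # treatment: clear fall-to-spring gain
              2.1, 2.2, 2.2, 2.3],  # control: essentially flat
})
means = ratings.groupby(["group", "wave"])["rigor"].mean().unstack()
print(means)
print("fall-to-spring gain:\n", means["spring"] - means["fall"])
```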

Summary

In the previous section I chronicled six design changes that were made during the pilot year of the linking intervention. These changes are summarized in the first three columns of Table 1. All of them arose after the original design was put into action, and all were based on information learned from the implementation experience. This experience reinforced the well-accepted idea that adaptation occurs on a regular basis as programs, particularly new programs, are implemented (Berman & McLaughlin, 1978; Supovitz, 2008).

Table 1. Summary of Design Changes

Design Change 1
• Issue: Instructional component of treatment untargeted
• Data sources: Teacher interviews, facilitator focus groups
• Program adjustment: Increased focus on video clips as examples of instructional data examined by treatment group
• Design method: Case study
• Adjustment source: Problem-based

Design Change 2
• Issue: PLCs seen as inefficient
• Data sources: Teacher interviews, facilitator focus groups
• Program adjustment: Videotaping and discussing same lesson for all teachers within treatment PLCs
• Design method: Case study
• Adjustment source: Problem-based

Design Change 3
• Issue: Teachers’ wish for earlier feedback
• Data sources: Teacher interviews
• Program adjustment: Refining sequence of treatment feedback
• Design method: Case study
• Adjustment source: Problem-based

Design Change 4
• Issue: Concern that treatment sessions were too packed with information
• Data sources: Exit slips and facilitator focus groups
• Program adjustment: Further efforts to streamline treatment PLC
• Design method: Experimental
• Adjustment source: Problem-based

Design Change 5
• Issue: Teachers uncomfortable with examining instructional data in PLCs
• Data sources: Exit slips
• Program adjustment: Steps to reinforce safety of PLC environments
• Design method: Experimental
• Adjustment source: Data-based

Design Change 6
• Issue: Possible treatment expansion
• Data sources: Classroom ratings, achievement results
• Program adjustment: Increasing cycles of feedback from two to three
• Design method: Experimental
• Adjustment source: Data-based

The fourth column in Table 1 focuses on the methods that were used to identify the issues that resulted in the program design changes. Two things are noteworthy in this analysis. First, design changes 1 to 3 came from data collected from both the program participants and the facilitators who were charged with delivering the program. These data would have arisen without the contrast of a control group, much less one that was randomly assigned. Adjustments 4 to 6, however, were based on experimental comparisons between the treatment and control groups. It is impossible to say whether this required an experiment; that is, whether the same results would have occurred with a quasi-experiment or with use of some other nonequivalent comparison group. The experimental nature of the results certainly gave us confidence that they were caused by the treatment rather than some other factor. This view was reinforced particularly by analyses of the baseline data, which showed that the groups were equivalent on the available measures (both background and pre-treatment) prior to the intervention.

Second, as shown in the last column of Table 1, the first four adjustments were based on problems that were identified in the treatment—through case study or contrast data—but the final two adjustments were made only after analyzing the data. This suggests that there are two potential sources for program design changes in formative experimentation. The first, which results in a problem-based adjustment, is a reaction to a perceived problem that arises from data collection. The second, which results in a data-based adjustment, is a consequence of an examination of the data that raises an idea for improvement. In these cases, the availability of the data allows the researcher to observe differences and develop hypotheses about the causes of the differences. These data-based adjustments benefit immensely from the experimental condition because of the increased confidence that the differences are not an artifact of the sampling approach.
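A baseline-equivalence check of the sort mentioned above can be sketched in a few lines. The measure names and values here are hypothetical, not the study’s actual variables, and the test choice is an assumption.

```python
# Hedged sketch of checking baseline balance across randomly assigned groups;
# measures and values are invented for illustration.
from scipy import stats

baseline = {
    "years_teaching": ([8, 12, 5, 9, 7], [7, 11, 6, 10, 9]),
    "fall_rigor_rating": ([2.1, 2.0, 2.3, 2.2, 1.9], [2.2, 2.1, 2.0, 2.3, 2.0]),
}
for measure, (treat, ctrl) in baseline.items():
    t, p = stats.ttest_ind(treat, ctrl, equal_var=False)
    print(f"{measure}: t = {t:.2f}, p = {p:.3f}")  # large p suggests balance
```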

Discussion

One of the questions I set out to explore in this study was whether an experimental design, typically focused on looking at program impacts, can provide a helpful perspective from which to inform program improvement. Based on the analysis presented in this chapter, I believe that designing a project as an experiment can provide a useful vantage point for learning. By embedding experimental contrast into the implementation design of our linking study, the program and research teams were well positioned not only to observe differences between treatment and control but also to examine why those differences existed and others did not. The research point of view helped us to take a data-oriented, inquiry-based perspective: The hypotheses we had set up at the beginning of the project, represented in both the design and the data we chose to collect, represented what we expected to observe. When we did not see anticipated differences, we wondered why. When we did see anticipated differences, we wondered which component of the design was responsible. Along the way, the experimental condition removed the nagging uncertainty that differences were due to selection bias or other background differences of participants, rather than due to the experience itself. The confidence of attribution is a powerful thing.

It is also important to recognize that the experimental design provided the framework for the investigation but did not preordain any particular data collection method. Important aspects of this study were the various kinds of data that we collected and the various methods we used to collect them, including observation (via video), surveys, interviews, focus groups, and collection of existing system data. We also examined the intervention using a variety of analysis techniques, both qualitative and quantitative. These, in addition to the research design, gave us a range of perspectives on the treatment. The surveys helped us to examine teacher perspectives; the observational data on classroom instruction and teacher interactions in PLCs gave us an independent perspective on both instructional quality and the quality of teacher conversations; and the interview and focus group data gave us a nuanced understanding of how teachers and facilitators experienced the project. Formative experiments are richer when they use mixed methods, and much of what we learned came from insights from qualitative data. Thus, although the study was set up as an experiment, we were not constrained by data collection that strictly followed the experimental design. Both experimental and nonexperimental data were useful for learning about the project’s strengths and shortcomings.

The role of researchers in this kind of project is to use both the contrast provided by the design and the analysis of the data to strengthen the intervention. The researchers are neither dispassionate observers nor unbridled program enthusiasts. Rather, they use data to form questions about the program design and use the questions to develop refining hypotheses. The researchers’ allegiance is to evidence-based inquiry rather than to program advocacy. In one example from this study, we found that teachers in the treatment group were less comfortable than teachers in the control group about examining data with their colleagues. We inferred from this that examining data on instruction (which only the treatment group was doing) was more uncomfortable to teachers than looking at data on student learning (which both groups were doing). This inference encouraged us to consider ways to reassure teachers about the safety of the PLC meetings and positioned us to determine whether the discomfort would persist over time or was due to the newness of the experience.

This experience also led me to wonder why many advocates of experimental research are adamant about intervention stability as a condition of such research. Intervention instability can take two forms. First is internal instability, which can be due to unclear program specifications. This sort of instability arises from lack of logical clarity in the model of the program. Second is external instability, which may be due to the situation in which the program is operating. This kind of instability may lead the implementers to adjust the program to circumstances. I wrote elsewhere about a similar distinction (Supovitz, 2013) between context and situation. Context is an integral part of any intervention, for example, the background characteristics of the participants or the socioeconomic conditions of a school. Situation is more fluid; it is tied to external events that surround and flow through an intervention as it proceeds, and may necessitate program adjustments.

If the only goal of an experimental research project is to determine a causal effect, then I am much more sympathetic to arguments for internal stability as a necessary precondition to the experiment than I am to similar arguments for external stability. I agree that interventions should have clear specifications, because learning about interventions requires that the research methods and instruments be sensitive to the specifications. But to expect complete program rigidity may be unreasonable. We have seen in the literature (Berman & McLaughlin, 1978; Supovitz & Weinbaum, 2008) that programs often adjust as they are implemented—often due to circumstances external to the program—and that participants’ sense of ownership of program ideas often depends on these adjustments. What if externally driven adjustments are involved in the majority of complex and ambitious interventions?

A final factor to be considered in deciding whether to use an experiment formatively is the costs associated with such a design. In this chapter, I have focused on the benefits of formative experimentation, but there are real costs. First, experiments often require elaborate setup and execution. The marketing to attract a group willing to forgo the treatment is often more costly than setting up a quasi-experiment with a voluntary matched comparison group. Keeping track of both groups and finding ways to retain participants also has costs. Thus, formative experiments may be most feasible when there is an oversupply of willing participants and the treatment is limited, by cost or capacity, to a selected subgroup.

The research experience that informed this chapter suggests that formative experimentation is a promising method for learning by making adjustments within experimental designs, rather than solely for judging summative effectiveness. Under what conditions might formative experimentation make sense? First, the program should be well enough developed to have a clear theory of action. Second, the opportunity to evaluate experimentally should be present, for example where there is excess demand or otherwise low levels of consternation about creating a control group. Third, the financial tradeoffs should be reasonable. In addition, researchers should take a new stance toward interventions, acting as data-informed critical friends. Perhaps most important, there needs to be a partnership between developers and researchers that involves embedding research and learning into the core of the intervention process. Formative experiments can not only show us “what works” but also create fertile conditions for learning along the way.

Note

1. If some of the participating teachers in a PLC opted not to show video clips, it was not a problem because we simply focused on those who agreed. If all teachers in a PLC decided not to show their clips, we used video from teachers in other PLCs at that grade level who had consented to their use. We felt this weakened the treatment (i.e., made the experience more generic and less customized), but it prioritized the comfort of teachers within their PLCs.

References

Berman, P., & McLaughlin, M. W. (1978). Federal programs supporting educational change: Vol. 8. Implementing and sustaining innovations. Santa Monica, CA: RAND.
Boston, M., & Wolf, M. K. (2006). Assessing academic rigor in mathematics instruction: The development of the Instructional Quality Assessment toolkit (CSE Technical Report No. 672). Los Angeles: University of California, Los Angeles, National Center for Research on Evaluation, Standards, and Student Testing, Center for the Study of Evaluation.
Brown, A. L. (1992). Design experiments: Theoretical and methodological challenges in creating complex interventions in classroom settings. Journal of the Learning Sciences, 2(2), 141–178.
Bryk, A. S., Gomez, L. M., & Grunow, A. (2011). Getting ideas into action: Building networked improvement communities in education. In M. T. Hallinan (Ed.), Frontiers in sociology of education (pp. 127–162). Rotterdam, Netherlands: Springer.
Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research on teaching. Washington, DC: American Educational Research Association.
Cobb, P., Boufi, A., McClain, K., & Whitenack, J. (1997). Reflective discourse and collective reflection. Journal for Research in Mathematics Education, 28(3), 258–277.
Cobb, P., Confrey, J., Lehrer, R., & Schauble, L. (2003). Design experiments in educational research. Educational Researcher, 32(1), 9–13.
Cook, T. D. (2002). Randomized experiments in educational policy research: A critical examination of the reasons the educational evaluation community has offered for not doing them. Educational Evaluation and Policy Analysis, 24(3), 175–199.

Junker, B., Weisberg, Y., Matsumura, L. C., Crosson, A., Wolf, M., & Levison, A. (2005). Overview of the instructional quality assessment (CSE Technical Report No. 671). Los Angeles: University of California, Los Angeles, National Center for Research on Evaluation, Standards, and Student Testing, Center for the Study of Evaluation.
Kaestle, C. F. (1993). The awful reputation of education research. Educational Researcher, 22(1), 23–31.
Lagemann, E. C. (2002). An elusive science: The troubling history of education research. Chicago: University of Chicago Press.
Matsumura, L. C., Garnier, H., Pascal, J., & Valdés, R. (2002). Measuring instructional quality in accountability systems: Classroom assignments and student achievement. Educational Assessment, 8, 207–229.
McCandliss, B. D., Kalchman, M., & Bryant, P. (2003). Design experiments and laboratory approaches to learning: Steps toward collaborative exchange. Educational Researcher, 32(1), 14–16.
O’Connor, M. C., & Michaels, S. (1996). Shifting participant frameworks: Orchestrating thinking practices in group discussions. In D. Hicks (Ed.), Discourse, learning, and schooling (pp. 63–103). New York: Cambridge University Press.
Penuel, W. R., Fishman, B. J., Cheng, B. H., & Sabelli, N. (2011). Organizing research and development at the intersection of learning, implementation, and design. Educational Researcher, 40(7), 331–337.
Slavin, R. E. (2002). Evidence-based education policies: Transforming educational practice and research. Educational Researcher, 31(7), 15–21.
Supovitz, J. A. (2008). Implementation as iterative refraction. In J. A. Supovitz & E. H. Weinbaum (Eds.), The implementation gap: Understanding reform in high schools (pp. 151–172). New York: Teachers College Press.
Supovitz, J. A. (2012, April). The linking study: First-year results. Paper presented at the annual meeting of the American Educational Research Association, Vancouver, Canada. Retrieved from http://cpre.org/linking-study
Supovitz, J. A. (2013). Situated research design and methodological choices in formative program evaluation. In B. Fishman, W. Penuel, A. Allen, & B. Cheng (Eds.), Design-based implementation research: Theories, methods and exemplars. National Society for the Study of Education Yearbook, 112(2), 372–399.
Supovitz, J. A. (2014). Linking teaching and learning: The results of an experiment to strengthen teachers’ engagement with teaching and learning data. Philadelphia: Consortium for Policy Research in Education.
Supovitz, J. A., & Weinbaum, E. H. (Eds.). (2008). The implementation gap: Understanding reform in high schools. New York: Teachers College Press.
Tharp, R. G., & Gallimore, R. (1988). Rousing minds to life: Teaching, learning, and schooling in social context. Cambridge, UK: Cambridge University Press.
Tyack, D., & Cuban, L. (1995). Tinkering toward utopia: A century of school reform. Cambridge, MA: Harvard University Press.

Chapter 4

A Research-Practice Partnership to Improve Formative Classroom Assessment in Science

William R. Penuel

University of Colorado, Boulder

Angela Haydel DeBarger

George Lucas Educational Foundation

Many district reform efforts today focus on improving instruction as a strategy for improving learning outcomes for all students (Corcoran, Fuhrman, & Belcher, 2001; Honig, Copland, Rainey, Lorton, & Newton, 2010; Supovitz, 2006). A key aim of many of these efforts is to bring greater coherence to the key elements of the local educational system,1 especially standards, curriculum, assessment, and teacher professional development (Fuhrman, 1993; Rorrer, Skyrla, & Scheurich, 2008; Supovitz, 2006). In addition, a key district function in instructional improvement is to promote equity with respect to student opportunities to learn, by supporting uniformly high-quality teaching across all classrooms and schools (Rorrer et al., 2008; Supovitz, 2006; Talbert & McLaughlin, 1993; Trujillo, 2011). Developing coherence and ensuring equity, research suggests, requires that districts develop a vision for high-quality instruction and build the commitment and capacity of teachers to enact and evolve that vision (Cobb & Jackson, 2012; David & Shields, 2001; Supovitz, 2006).

At present, most districts rely on external partners to provide services and resources to help realize their visions and initiatives for instructional improvement. When district central offices are organized principally to manage schools, they may be limited in their capacity to provide support for implementation of instructional change in schools (Crowson & Morris, 1985; Honig, 2013). In this case, external partners, more than district leaders, often can bring different and useful experience, resources, and technical expertise to improvement efforts (Supovitz, 2006). This is especially true for services that require both specialized knowledge and extensive resources, such as curriculum design and assessment development.

In this chapter, we describe a kind of design research partnership between a district and external partners that is organized to provide

long-term support for district reform initiatives and to support instructional and assessment coherence through focused professional development. In partnerships like the one we describe, district leaders, teachers, and researchers work collaboratively to improve instruction by designing resources that address jointly negotiated problems of practice. Such partnerships contribute expertise in subject matter, design, and research to districts’ ongoing efforts. As we illustrate with an example of a partnership in which we have been engaged since 2007, these relationships can produce tools that teachers find useful, that are adaptable to diverse students’ needs and strengths, and that support short-term improvements in instructional practice aligned to a district vision.

Design Research Partnerships in District Reform

Research-practice partnerships are long-term collaborations between practitioners and researchers that are intentionally organized to investigate problems of practice and solutions for improving district outcomes (Coburn, Penuel, & Geil, 2013). Policy makers and funders see potential in research-practice partnerships to enable greater use of research evidence in decision making (e.g., Tseng, 2012). Advocates from within the research community argue that such partnerships can address persistent problems of practice and improve educational outcomes (Bryk, 2009; Donovan, 2013).

Design research partnerships are a type of research-practice partnership. A key aim of design research is to develop theories on the process of learning and the means of supporting these processes (Cobb, Confrey, diSessa, Lehrer, & Schauble, 2003; Design-Based Research Collective, 2003; Kelly, Baek, Lesh, & Bannan-Ritland, 2008; Sandoval, 2014; Sandoval & Bell, 2004). Although most design research focuses on either a single classroom or a few classrooms (e.g., Confrey & Lachance, 2000), in design research partnerships, efforts can span an entire school district (e.g., Cobb & Jackson, 2012).

In design research partnerships, collaborative, iterative design is a leading activity (Coburn et al., 2013). That is, teachers, educational leaders, and researchers work together to design, implement, and test strategies for improving teaching and learning outcomes in a school district. The focus of the designs, moreover, is on problems of practice facing district leaders (Donovan, 2013). As in the case of action research, a key aim of design research partnerships is to develop strategic knowledge that is relevant to leaders in the district. Design research partnerships also aim to

generate understandings of principles and mechanisms for change through systematic inquiry that can be used elsewhere and in the future by district leaders themselves to support change (Bryk, 2009; Penuel, Fishman, Cheng, & Sabelli, 2011).

How Design Research Partnerships Can Support Coherence

Design research partnerships have the potential to support district leaders’ efforts to build coherence among key elements of the educational system: standards, curriculum, assessment, and professional development. For example, the collaborative design process employed in design partnerships can increase the likelihood that new tools developed to support any one of these elements will fit together into a harmonious and logical whole, a key condition for coherence (cf. National Research Council, 2012). The collaborative design process can also help develop teachers’ sense of ownership over new tools that can support common goals for instructional improvement (Penuel, Roschelle, & Shechtman, 2007). This is significant because successful district reform depends on districts’ balancing guidance with efforts to build ownership of reform goals (David & Shields, 2001).

But building a coherent system requires more than bringing key elements into alignment with reform goals that are decided ahead of time. Implementation of district-level reforms creates its own problems that must be addressed in real time, and so district leaders must engage in continuous diagnosis of problems and devise new strategies for dealing with them (Spillane & Coldren, 2010). Creating coherence among elements of the system is a “never-ending grind” requiring that individual actors across multiple levels of a district gain a “grasp of the whole” that guides their moment-to-moment decision making (Kirp, 2013, pp. 13, 10). It is necessary to have resources at hand to solve new problems and enable both people and systems to learn as they design and implement innovations (Lissack & Letiche, 2002).

Design research partnerships have the potential to support these aspects of coherence building as well. For example, research activities can provide feedback about how well strategies are working and about how widely particular visions of reform are shared among teachers (Cobb, Jackson, Smith, Sorum, & Henrick, 2013). In addition, organizational routines that engage researchers, district leaders, and teachers in collaborative sense making about emerging challenges and opportunities can help individuals get a better grasp of the whole environment (Glazer & Peurach, 2013).

How Design Research Partnerships Can Promote Equity

Design research partnerships also have the potential to broaden access to opportunities to learn for all students, a key dimension of equity. Opportunities to learn encompass access to the resources, practices, and skilled guidance needed to develop proficiency in a given domain (Guiton & Oakes, 1995). Opportunities to learn are defined both by participation in learning activities in classrooms and by students’ access to classrooms and courses where such activities take place, which is shaped by district policies such as tracking (Hand, Penuel, & Gutiérrez, 2012; Oakes, 1990; Quiroz & Secada, 2003).

Design research partnerships can support a focus on equitable opportunity to learn in multiple ways. Innovations collaboratively designed for classrooms can incorporate specific equity goals, such as drawing on a wide range of students’ “funds of knowledge” to organize subject matter instruction (e.g., Civil, 2007) and promoting more equitable participation in classroom discussions (e.g., Hoadley, 2004). Research can investigate whether innovations produce different kinds of outcomes for different student groups (e.g., Roschelle et al., 2010) and explore how and when participation helps students from diverse backgrounds identify with subject matter (Nolen, Tierney, Goodell, Lee, & Abbott, 2014). There are examples of partnerships helping to broaden students’ access to advanced courses, although in these instances, the partnerships have been between researchers and community groups rather than districts (e.g., Oakes & Rogers, 2006).

Case Study: A Design Research Partnership With the Denver Public Schools

In this chapter we investigate the claims for the potential of design research partnerships to support district efforts to create coherence among standards, curriculum, assessment, and teacher professional development and to promote equitable participation in classrooms in the context of an ongoing research-practice partnership that began in 2007. The partnership was formed initially between the authors of this chapter and district curriculum leaders in the Denver Public Schools when both authors were research scientists at SRI International in Menlo Park, California. Today, the partnership between the first author and district leaders continues, with the support of a research team at the first author’s current institution, the University of Colorado, Boulder.

Initial Focal Problem of Practice: Improving Classroom Assessment

The initial focus of the partnership was improving classroom assessment, a goal of mutual concern to researchers and district leaders. Researchers are particularly interested in classroom assessment because there is strong evidence that formative assessment can improve student outcomes (Black & Wiliam, 1998; Crooks, 1988; Fuchs & Fuchs, 1986; Kingston & Nash, 2011; Penuel & Shepard, in press). But there is also evidence that improving assessment practices is challenging to teachers (e.g., Penuel & Gallagher, 2009; Yin et al., 2008). District leaders, for their part, are often interested in improving alignment among standards and assessment practices. In Denver, as in many other districts (e.g., Black & Harrison, 2001; Brandon et al., 2008; Penuel et al., 2009), the assessment component includes annual standardized achievement tests designed and scored by state officials but administered by district staff; interim or benchmark tests designed, administered, and scored by the district; and quizzes and tests that teachers give in their classrooms (Goertz, 2009). District leaders may be concerned about whether curriculum-embedded and teacher-designed assessment opportunities are informative enough and whether teachers use them in ways to promote student learning that can be demonstrated on district and state tests.

In most schools and districts, system-level coherence and coherence within assessment systems are difficult to achieve. Policies and practices of assessment may not cohere with policies and practices of curriculum, instruction, and professional development (this is known as horizontal incoherence). Assessment systems may include parts that are inconsistent with research about how children’s understanding develops over time (developmental incoherence). And there may be limited agreement about the purposes of assessment among actors (e.g., curriculum and instruction leaders, principals, and teachers) at different levels of systems (vertical incoherence). Researchers presume that assessment systems necessarily have multiple interacting parts, but they also warn that improvement is not possible unless assessment systems are horizontally, developmentally, and vertically coherent (Herman, 2010; National Research Council, 2006, 2012). Thus, incoherence along any of these dimensions threatens the success of interventions to improve formative assessment in individual classrooms.

The Contingent Pedagogies project was a National Science Foundation–funded project that brought researchers from the University of Colorado, Boulder, and SRI International together with district curriculum leaders, a

curriculum publisher, and subject matter experts to investigate how using classroom response systems (“clickers”)2 could support classroom assessment. The project used two units of Investigating Earth Systems, a middle school science curriculum published by It’s About Time, Inc. A diverse team of scientists, curriculum developers, and teachers led by the American Geosciences Institute developed the materials. The National Science Foundation, the American Geosciences Institute Foundation, and the Chevron Corporation supported the materials development. The intervention that we developed focused on the core disciplinary idea of Earth systems. It answered the question, “How and why is Earth constantly changing?” (National Research Council, 2012, p. 179). The aim was to support five steps of formative assessment posited as critical for improving learning outcomes (Black & Wiliam, 2009) among sixth-grade science teachers in a single school district:

1. Teachers elicit student thinking;
2. Students respond to teachers’ elicitation activities, revealing their thinking;
3. Teachers interpret students’ responses to make sense of where students are, relative to learning goals;
4. Teachers take action (e.g., trying a new strategy) on the basis of their interpretation, in order to move students in the desired direction; and
5. Teachers reassess student understanding to measure success of the action.

As with many other large urban districts, the district office in the Denver Public Schools is organized into multiple departments, and our partnership required coordination of departments related to science curriculum and instruction and instructional technology. We had infrequent and peripheral contact with the district office of assessment, evaluation, and accountability. As in many other districts, multiple assessment strategies were being implemented. These included state assessments given every few years in science and district benchmark assessments, introduced toward the end of our study, that teachers were expected to administer.

As researchers, we began with the premise that if Contingent Pedagogies were to be brought successfully to scale and sustained, we would need to anticipate and address the ways that the suite of tools could cohere with and support district goals and assessments. We therefore organized the project as a research-practice partnership with the district, and we set about making sure that Contingent Pedagogies supported horizontal, developmental, and

vertical coherence in the district. Ours was a design research partnership, in which teachers and district leaders contributed to the overall design of the tools and supported their implementation and testing. Through the collaborative design process and in implementation, equity emerged as a concern of teachers, and we incorporated strategies for promoting equitable classroom participation into our designs, particularly for the growing population of English language learners in the school system. Next, we describe the ways we organized the Contingent Pedagogies project to (a) build horizontal, developmental, and vertical coherence and (b) promote equity.

Promoting Horizontal, Developmental, and Vertical Coherence

We were particularly purposeful about promoting horizontal and developmental coherence in structuring the partnership and somewhat less successful in promoting vertical coherence. We organized the assessment purposes to fit within district goals for science, using a particular perspective on the development of student understanding in science (a knowledge-in-pieces or facets-based view of cognitive development; diSessa & Minstrell, 1998; Minstrell, 1992). We structured the design process to include extensive involvement of a cadre of teacher leaders and to include selected district and school leaders in helping to shape the project.

Strategies for promoting horizontal coherence. Horizontal coherence refers to alignment among four key interrelated elements of the “technical core” of education: curriculum, assessment, instruction, and professional development (National Research Council, 2006). A key aim of the instructional improvement efforts for science education in the Denver Public Schools was to build children’s understanding of big ideas through a constructivist, inquiry-oriented approach to teaching. Each grade level in the middle grades focused on core ideas in a different domain of science: sixth grade on Earth science, seventh grade on life science, and eighth grade on physical science. Students were expected to learn through guided inquiry, teacher demonstrations, and direct engagement with phenomena.

The district leaders in the partnership viewed the district-adopted curriculum materials as essential supports for helping teachers meet the student learning goals reflected in standards documents; they expected us to use those materials to anchor our design work with the teachers. The particular curriculum materials used in Contingent Pedagogies had been tested in prior research. Initial field tests and subsequent research both found

that when combined with high-quality professional development, use of the Investigating Earth Systems curriculum materials could have positive impacts on science teaching and learning (Penuel & Gallagher, 2009; Penuel, Gallagher, & Moorthy, 2011; Penuel et al., 2009). Implementation of the curriculum was well supported by the district: All teachers received professional development from the curriculum developers, and the materials and supplies were restocked each year for students to use in conducting investigations.

Before the partnership began, district leaders expected teachers to make formative use of the assessments embedded in the Investigating Earth Systems curriculum to improve their instruction. From the vantage point of the researchers and the teacher leaders, the curriculum-embedded activities and assessments provided few opportunities for students to relate hands-on investigations to disciplinary core ideas or science practices. District leaders who agreed to be part of the original proposal effort were interested in learning whether the tools we would develop in collaboration with their teachers could improve both teachers’ classroom assessment practice and students’ learning from the curriculum materials. Thus, we partnered with the curriculum developers to improve the assessment activities and developed a research design aimed at comparing changes in teacher practice and student learning in two groups of classrooms: those where teachers would receive the additional assessment resources and those where the teachers would not receive them. In both groups, teachers would implement the adopted Investigating Earth Systems curriculum units chosen for the sixth grade by district leaders.

Embedding assessment activities within the district-adopted curriculum was a purposeful and important strategy for promoting horizontal coherence. We hypothesized that the assessment activities we developed in that way would be more likely to support the district’s goals for student learning and thus would be consistent with teachers’ goals for instruction. In addition, we hypothesized that the assessment activities would fit easily within the flow of instruction because they were embedded strategically within the curriculum materials to make student thinking more visible at key moments prior to and following investigations.

Strategies for promoting developmental coherence. Developmental coherence refers to how systems help build and assess student learning over time (National Research Council, 2006). The curriculum materials themselves provided one source of coherence at the unit level. Each unit was organized into seven or eight investigations that provided opportunities for

students to participate in science practices related to one or two disciplinary core ideas in Earth science. The Contingent Pedagogies project supported teachers’ implementation of two units from the curriculum, “Our Dynamic Planet” and “Rocks and Landforms.”

A key goal of the assessment activities that we designed was to promote more opportunities for eliciting and developing students’ conceptual models about disciplinary core ideas that related to the investigations. We employed a facet-based perspective on student cognitive development for purposes of developing questions to elicit and develop student thinking. A facet is a construction of one or more pieces of knowledge by a learner in order to solve a problem or explain an event (diSessa & Minstrell, 1998). Facets that are related to one another can be organized into clusters; the basis for grouping can be an explanation or interpretation of a physical situation, or it can be the facets’ connection with a disciplinary core idea (Minstrell & Kraus, 2005). The facets perspective assumes that, in addition to problematic thinking, students also possess insights and understandings about the core disciplinary idea that can be deepened and revised through additional learning opportunities (Minstrell & van Zee, 2003).

While researchers developed many of the facet clusters using traditional methods of data collection and analysis, teachers in the study played key roles in designing questions that would elicit student facets as the teachers introduced new topics in the curriculum and for review. The aim of the questions was to provide evidence of student learning for short time scales of development (three to four days). To that end, researchers and teachers collaboratively designed diagnostic elicitation questions intended to identify aspects of disciplinary core ideas that students understood at the beginning of the lesson. They also helped to design what we called reflect-and-revise questions that checked student understanding of core ideas at the conclusion of an investigation. Teachers were particularly valuable to the partnership in identifying problem or question contexts that would likely captivate students’ attention and in developing wording that could be understood by a wide range of learners. Their contributions pointed to an important dimension of developmental coherence that is difficult for reviewers to know about ahead of time, namely, the fit between the current capabilities of particular groups of students and the challenges of particular activities and vocabularies.

Strategies for promoting vertical coherence. An educational system has vertical coherence when there is agreement on the key goals for student learning and on the purposes and uses of assessment across actors in

different levels of the system (National Research Council, 2012). Users of assessment data tend to have different needs, depending on their level within a system. District leaders, for example, may use assessments for accountability purposes and for improving programs. Teachers, by contrast, may use assessments to gauge individual progress in learning and to adjust instruction targeted to individuals, groups, and the whole class. In principle, these purposes for assessment can be brought into alignment, but the process takes time and typically requires multiple assessment instruments and coordination of activities across levels of the system (Herman, 2010; National Research Council, 2006).

Initially, the researchers had approached the curriculum developers of Investigating Earth Systems about conducting a study focused on embedding technology-supported assessments into selected units of study. The researchers asked the developers to help identify a district that might be interested; they readily identified the Denver Public Schools. Being part of the project fit into the district’s ongoing improvement efforts, especially as researchers refined the goals for the study to match the district’s goals. After the study was funded, however, the district’s main champion for the project left, and researchers had to develop a relationship with the new science curriculum coordinator. She was open to being part of the project, although the research team was initially uncertain about whether the project could succeed without its champion. To build teacher support, the researchers asked the new coordinator to nominate teachers who could participate in co-designing the assessments with the research team. She selected two sixth-grade teachers who were part of another district reform initiative to digitize and organize curriculum resources in science.

During fall 2008, researchers assembled a design team consisting of the study principal investigator and co–principal investigator, four researchers, three technology specialists, four content experts from partner organizations, one teacher leader, and two co-design teachers from the district. The design process began with a researcher visit to each co-design teacher’s classroom, where the researcher conducted an interview and classroom observation. The interview covered teaching experience, teaching approach and practices, details of a recently taught Earth science unit, access to and use of technology at the school, and school and district context. The observation protocol was designed to capture details of classroom organization and teacher and student interactions so that design work could target real-life situations. To further ground the design work in classroom realities, the technology support leader at each school completed a detailed survey about what technology was available at the school, how it was used, and how it was supported.

We then held a series of design meetings over the course of two years, in which we structured processes for developing assessment activities for the two focal Investigating Earth Systems modules. An additional three teachers from the district joined in the second year as co-designers. As with all forms of design research (Cobb et al., 2003), the design process was iterative: We developed assessment activities together, teachers tested them in classrooms, and on the basis of the classroom testing, we revised the activities. Over the course of the two years, we made significant changes to the forms of activities, their placement within modules, and the ways that technology was used to support activities. The designs we used in the field trial with the co-design teachers focused on assessment opportunities at the beginning and end of each investigation. The teachers used the “clicker” student response system to pose questions that were intended to elicit common facets of student understanding and, in turn, spark rich discussion of the diversity of student ideas and how they related to student investigations.

Equity as an Emergent Priority in the Research-Practice Partnership

Some of the teachers who participated in the design of Contingent Pedagogies tools had large percentages of emerging bilinguals (English language learners) in their classrooms. They expressed concern about the language demands both of the existing curriculum and of the new tools we were developing to promote participation in classroom discussions. For example, they worried about whether emerging bilingual students would be able to follow fast-moving discussions about difficult disciplinary ideas. The teachers suggested that students might benefit if whole-class discussions were broken into segments that included pair and small-group talk. As the teachers and some researchers studying classroom discussions have argued (Michaels & O’Connor, 2011), using these additional formats could provide time for emerging bilinguals to develop and rehearse their thoughts. The reporting out of small-group discussions, moreover, could help students keep up in discussion because they would hear fewer and more consistent ideas and would benefit from the repetition when the ideas were reported to the larger group.

We incorporated recommendations to use these different formats into a revised set of “pedagogical patterns” or teaching routines (DeBarger, Penuel, Harris, & Schank, 2010) that we provided to teachers in a field trial as part of their professional development. The use of the expanded formats for “thinking time” proved popular with teachers, who incorporated a wider range of formats for orchestrating classroom discussions than did their counterparts in comparison classrooms (Penuel et al., 2013).

Key Partnership Outcomes

One of the key ways that we measured the impact of the partnership was through a study of the effects of Contingent Pedagogies in which we tested the tools with a wide range of teachers in different schools. In the last year of the project, district leaders helped us to recruit seven additional science teachers to the project as part of a field trial. All teachers who were recruited attended a two-day summer professional development workshop, where they were introduced to Contingent Pedagogies. During the first day of the workshop, the teachers participated as students, using the clicker technologies and performing activities that they would be implementing in the classroom, while discussing ideas about classroom assessment. They learned about facet-based instruction and how it can be used to support classroom assessment. Discussion and hands-on activities about how to elicit and develop student thinking through a set of “talk moves” (Michaels & O’Connor, 2011) were also part of the workshop.

The teachers had professional development and implementation support throughout the year. During the school year following the workshops, the teachers joined 14 two-hour teleconferences. The teleconferences were used to discuss current classroom activities, technology issues, challenges and problems, reflections on the activities, tips for implementation, reminders, and research activities. Quick tips in the form of a weekly newsletter were provided for 14 weeks during the implementation of the project to support facet-based instruction and provide refreshers on the workshop material. Beyond discussing classroom strategies, the tips “unpacked” some of the clicker questions to focus on how teachers could engage students to think more deeply about the substance of Earth science content.

During the field trial, we collected data on the perceived usability and value of the Contingent Pedagogies suite of tools and their impact on teaching and learning. To analyze impacts on teaching and learning, we recruited a group of comparison teachers to participate in research activities. More detailed descriptions of the research design and findings appear in peer-reviewed publications (Penuel, Beauvineau, DeBarger, Moorthy, & Allison, 2012; Penuel, Moorthy, DeBarger, Beauvineau, & Allison, 2012) and online at http://contingentpedagogies.org. In short, we found good evidence that teachers perceived the tools to be usable and valuable for helping them accomplish their instructional goals. There was a close relationship, moreover, between the tools teachers found valuable and those they actually used with students. Our analysis of student learning outcomes found small to medium effects of the tools (+0.25 SD for one unit and +0.45 SD for the second; see the effect-size note below), as measured by tests we developed that consisted of a mix of multiple-choice and constructed-response items.

Despite the promising results of the study, there has been little uptake of the tools beyond our initial field trial, and the district ultimately decided to implement an alternative approach to formative assessment in middle school science that focused on common tasks related to the district’s standards. Anecdotally, we know that some of the teachers continue to use the tools, but many of these teachers also have become part of other reform initiatives, a pattern that many other research and development efforts have documented.

In addition, although the co-design process created a strong sense of common purpose among the participating teachers, in the end we were less successful in aligning the project to the district’s priorities. The district leaders decided that other, concurrent initiatives in the same district that were complementary to ours (notably, one focused on promoting academic language development in middle school science) were more central to the district’s goals. Some of these competing initiatives involved science instruction, and one involved formative assessment in science. In retrospect, we concluded that we had not involved district leaders enough in the research and development process to foster vertical coherence. We did not have sufficient contact with the supervisor of the science coordinator in the district, nor did we involve district staff who were charged with developing district-level assessments, which might have facilitated better coherence among the assessments that teachers were using. We also failed to engage principals, who held considerable authority for allowing teachers to participate in the research.
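A note on the effect-size units above: a gain reported in SD units is a standardized mean difference. The chapter does not state which estimator was used; a minimal sketch, assuming the common pooled-standard-deviation form with T denoting treatment (Contingent Pedagogies) classrooms and C denoting comparison classrooms, is:

```latex
% Standardized mean difference (pooled-SD form); one common estimator,
% not necessarily the exact variant the study used.
\[
d = \frac{\bar{x}_T - \bar{x}_C}{s_{\mathrm{pooled}}}, \qquad
s_{\mathrm{pooled}} = \sqrt{\frac{(n_T - 1)\,s_T^2 + (n_C - 1)\,s_C^2}{n_T + n_C - 2}}
\]
```

Read this way, +0.25 SD means the average score in treatment classrooms exceeded the comparison average by a quarter of a pooled standard deviation on that unit test.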

Lessons Learned and Implications for Research-Practice Partnership Designs

In our view, the findings from our study underscore the importance of each of the three kinds of coherence in systems: horizontal, developmental, and vertical. We had considerable success in promoting horizontal coherence because we worked with an existing, district-adopted curriculum to improve formative assessment. Had we introduced assessments incompatible with the curriculum or ignored the core ideas and science practices that were focal in the materials, it is unlikely that teachers would have found the tools useful in supporting their learning goals. Co-design also proved an effective strategy for enhancing developmental coherence, by helping to calibrate researchers’ expectations of students with teachers’ perceptions of student capabilities.

At the same time, we conclude that a well-designed suite of formative assessment activities closely aligned to district standards and curriculum and informed by learning theory cannot be sustained unless there is a shared understanding of its goals and contributions at all levels of the system. We needed more involvement not only of a more diverse group of district leaders but also of principals in the district.

With respect to equity, our studies point to the need for additional research and development related to a specific aspect of classroom assessment, namely, participation in classroom discussions. As researchers, we did not anticipate the need to focus specifically on tools and supports for equitable participation of emerging bilinguals in discussions of student ideas, and so this aspect of our work got less attention than it should have. Since concluding our study, we developed a partnership with a second school district that has adopted the Investigating Earth Systems materials, and we collaborated with teachers there to create resources for supporting emerging bilinguals in practices of argumentation in science classrooms. The tools are intended to augment those in the Contingent Pedagogies tool kit.

One of us remains involved in a research-practice partnership with the district as part of a subsequent research and development project, the Inquiry Hub, also funded by the National Science Foundation. The purpose of the Inquiry Hub is to develop and test models for helping teachers to make principled adaptations to curriculum materials using a digital platform for sharing and adapting resources. Formative assessment activities inspired by the Contingent Pedagogies approach will be part of the Inquiry Hub, though not the specific activities we developed as part of the project, because the grade-level focus is different for the new project. Importantly, the design process is structured differently than in Contingent Pedagogies: There are regular meetings between district leaders and researchers, as well as regular meetings with teachers who are helping to design and test models for adapting curriculum materials. Our early qualitative analyses of discussions in the meetings provide evidence of greater shared understanding of purpose across the two different role groups of district leaders and teachers.

Our involvement in this research-practice partnership inspires us with cautious optimism about the promise of such partnerships. The fate of our particular project tools was similar to that in many such projects that are developed with less involvement of district leaders and teachers. In most projects, interventions are not sustained, particularly as district priorities and personnel change. Yet we have been able to develop partnerships with a new district where these tools will be accessible to a large number of teachers

and students in the coming years. Moreover, in Denver the relationships we built remain strong, as does a joint commitment on the part of researchers and district leaders to work together to address current problems of practice, realizing that this commitment requires ongoing negotiation about the focus of joint work. The foundation established through past work provides a solid basis for engaging in these negotiations. So does a shared understanding across the research-practice divide of the importance of achieving system coherence, as well as supporting teachers across the district in their efforts to improve their own practice.

Notes

1. We use the term system to refer to the set of interrelated components that make up a school district. There are many kinds of systems and ways of conceptualizing school districts, including as complex adaptive systems (McDaniel, 2007). In our view, it may or may not be useful to characterize school districts as complex adaptive systems, given that bureaucratic governance structures limit the potential for self-organization. What matters for the present argument is that designing for one component requires partnerships to consider the fit of that component with others in the system, because the parts are connected, both in policies and in how teachers make sense of the policies.

2. Classroom response systems known as “clickers” are technology tools that enable teachers to pose questions and collect student responses rapidly. These systems aggregate student responses so that distributions of responses to each question can be viewed and shared immediately with the class.
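To make the aggregation described in note 2 concrete, here is a minimal sketch of the tallying such systems perform. It is illustrative only, not any vendor's actual software; the student IDs and answer options are invented.

```python
from collections import Counter

def tally_responses(responses):
    """Aggregate clicker responses into a per-option distribution.

    responses: list of (student_id, choice) pairs in the order received.
    """
    latest = {}
    for student_id, choice in responses:
        # Keep only each student's most recent answer.
        latest[student_id] = choice
    return Counter(latest.values())

# Hypothetical elicitation question with facet-based answer options A-C;
# student s02 changes an answer mid-poll.
responses = [("s01", "A"), ("s02", "B"), ("s03", "B"),
             ("s04", "C"), ("s02", "C"), ("s05", "B")]

distribution = tally_responses(responses)
total = sum(distribution.values())
for choice in sorted(distribution):
    count = distribution[choice]
    print(f"{choice}: {'#' * count} ({count / total:.0%})")
```

Keeping only each student's most recent answer reflects a common behavior of such systems (students may revise responses until the teacher closes the poll), and the printed distribution is the kind of display a teacher would project to spark discussion of the range of ideas in the room.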

References

Black, P., & Harrison, C. (2001). Feedback in questioning and marking: The science teacher’s role in formative assessment. School Science Review, 82(301), 55–61.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7–74.
Black, P., & Wiliam, D. (2009). Developing the theory of formative assessment. Educational Assessment, Evaluation, and Accountability, 21(1), 5–31.
Brandon, P. R., Young, D. B., Shavelson, R. J., Jones, R., Ayala, C. C., Ruiz-Primo, M. A., . . . Furtak, E. M. (2008). Lessons learned from the process of curriculum developers’ and assessment developers’ collaboration on the development of embedded formative assessments. Applied Measurement in Education, 21(4), 390–402.
Bryk, A. S. (2009). Support a science of performance improvement. Phi Delta Kappan, 90(8), 597–600.
Civil, M. (2007). Building on community knowledge: An avenue to equity in mathematics education. In N. S. Nasir & P. Cobb (Eds.), Improving access to mathematics: Diversity and equity in the classroom (pp. 105–117). New York: Teachers College Press.

Cobb, P. A., Confrey, J., diSessa, A. A., Lehrer, R., & Schauble, L. (2003). Design experiments in educational research. Educational Researcher, 32(1), 9–13.
Cobb, P. A., & Jackson, K. (2012). Analyzing educational policies: A learning design perspective. Journal of the Learning Sciences, 21, 487–521.
Cobb, P. A., Jackson, K., Smith, T., Sorum, M., & Henrick, E. C. (2013). Design research with educational systems: Investigating and supporting improvements in the quality of mathematics teaching at scale. In B. J. Fishman, W. R. Penuel, A.-R. Allen, & B. H. Cheng (Eds.), Design-based implementation research: Theories, methods, and exemplars. National Society for the Study of Education Yearbook (pp. 320–349). New York: Teachers College Record.
Coburn, C. E., Penuel, W. R., & Geil, K. (2013). Research-practice partnerships at the district level: A new strategy for leveraging research for educational improvement. Berkeley, CA, and Boulder, CO: University of California and University of Colorado.
Confrey, J., & Lachance, A. (2000). Transformative teaching experiments through conjecture-driven research design. In A. E. Kelly & R. A. Lesh (Eds.), Handbook of research design in mathematics and science education (pp. 231–266). Mahwah, NJ: Lawrence Erlbaum.
Corcoran, T. B., Fuhrman, S., & Belcher, C. L. (2001). The district role in instructional improvement. Phi Delta Kappan, 83(1), 78–84.
Crooks, T. J. (1988). The impact of classroom evaluation practices on students. Review of Educational Research, 58(4), 438–481.
Crowson, R., & Morris, V. C. (1985). Administrative control in large-city school systems: An investigation of Chicago. Educational Administration Quarterly, 21, 51–70.
David, J. L., & Shields, P. M. (2001). When theory hits reality: Standards-based reform in urban districts. Menlo Park, CA: SRI International.
DeBarger, A., Penuel, W. R., Harris, C. J., & Schank, P. (2010). Teaching routines to enhance collaboration using classroom network technology. In F. Pozzi & D. Persico (Eds.), Techniques for fostering collaboration in online learning communities: Theoretical and practical perspectives (pp. 222–244). Hershey, PA: IGI Global.
Design-Based Research Collective. (2003). Design-based research: An emerging paradigm for educational inquiry. Educational Researcher, 32(1), 5–8.
diSessa, A. A., & Minstrell, J. (1998). Cultivating conceptual change with benchmark lessons. In J. G. Greeno & S. V. Goldman (Eds.), Thinking practices in learning and teaching science and mathematics (pp. 155–187). Mahwah, NJ: Lawrence Erlbaum.
Donovan, M. S. (2013). Generating improvement through research and development in educational systems. Science, 340, 317–319.
Fuchs, L. S., & Fuchs, D. (1986). Effects of systematic formative evaluation: A meta-analysis. Exceptional Children, 53(3), 199–208.
Fuhrman, S. H. (1993). The politics of coherence. In S. H. Fuhrman (Ed.), Designing coherent educational policy: Improving the system (pp. 1–34). San Francisco: Jossey-Bass.
Glazer, J. L., & Peurach, D. J. (2013). School improvement networks as a strategy for large-scale education reform: The role of educational environments. Educational Policy, 27(4), 676–710.

Goertz, M. E. (2009). Overview of current assessment practices. Paper presented at the workshop “Best Practices for State Assessment Systems” (Committee on Best Practices for State Assessment Systems: Improving Assessment While Revisiting Standards). Washington, DC: National Research Council.
Guiton, G., & Oakes, J. (1995). Opportunity to learn and conceptions of educational equality. Educational Evaluation and Policy Analysis, 17(3), 323–336.
Hand, V., Penuel, W. R., & Gutiérrez, K. D. (2012). (Re)framing educational possibility: Attending to power and equity in shaping access to and within learning opportunities. Human Development, 55, 250–268.
Herman, J. L. (2010). Coherence: Key to Next Generation assessment success (AACC Report). Los Angeles: University of California, Los Angeles.
Hoadley, C. M. (2004). Methodological alignment in design-based research. Educational Psychologist, 39(4), 203–212.
Honig, M. I. (2013). Beyond the policy memo: Designing to strengthen the practice of district central office leadership for instructional improvement at scale. In B. J. Fishman, W. R. Penuel, A.-R. Allen, & B. H. Cheng (Eds.), Design-based implementation research: Theories, methods, and exemplars. National Society for the Study of Education Yearbook (pp. 256–273). New York: Teachers College Record.
Honig, M. I., Copland, M., Rainey, L. R., Lorton, J. A., & Newton, M. (2010). Central office transformation for district-wide teaching and learning improvement. Seattle, WA: Center for the Study of Teaching and Policy.
Kelly, A. E., Baek, J. Y., Lesh, R. A., & Bannan-Ritland, B. (2008). Enabling innovations in education and systematizing their impact. In A. E. Kelly, R. A. Lesh, & J. Y. Baek (Eds.), Handbook of design research methods in education (pp. 3–18). New York: Routledge.
Kingston, N., & Nash, B. (2011). Formative assessment: A meta-analysis and a call for research. Educational Measurement: Issues and Practice, 30(4), 28–37.
Kirp, D. L. (2013). Improbable scholars: The rebirth of a great American school system and a strategy for America’s schools. New York: Oxford University Press.
Lissack, M. R., & Letiche, H. (2002). Complexity, emergence, resilience, and coherence: Gaining perspective on organizations and their study. Emergence: A Journal of Complexity Issues in Organizations and Management, 4(3), 72–94.
McDaniel, R. R. (2007). Management strategies for complex adaptive systems: Sensemaking, learning, and improvisation. Performance Improvement Quarterly, 20(2), 21–41.
Michaels, S., & O’Connor, C. (2011). Talk science primer. Cambridge, MA: TERC.
Minstrell, J. (1992). Facets of students’ knowledge and relevant instruction. In F. Duit, F. Goldberg, & H. Niedderer (Eds.), Research in physics learning: Theoretical issues and empirical studies (pp. 110–128). Kiel, Germany: IPN.
Minstrell, J., & Kraus, P. (2005). Guided inquiry in the science classroom. In National Research Council (Ed.), How students learn: History, mathematics, and science in the classroom (pp. 475–514). Washington, DC: National Academies Press.
Minstrell, J., & van Zee, E. (2003). Using questioning to assess and foster student thinking. In J. M. Atkin & J. Coffee (Eds.), Everyday assessment in the science classroom (pp. 61–74). Arlington, VA: National Science Teachers Association Press.

National Research Council. (2006). Systems for state science assessment. Washington, DC: National Academies Press.
National Research Council. (2012). A framework for K–12 science education: Practices, crosscutting concepts, and core ideas. Washington, DC: National Academies Press.
Nolen, S. B., Tierney, G., Goodell, A., Lee, N., & Abbott, R. D. (2014). Designing for engagement in environmental science: Becoming “environmental citizens.” In J. L. Polman, E. A. Kyza, D. K. O’Neill, I. Tabak, W. R. Penuel, A. S. Jurow, K. O’Connor, T. F. Lee, & L. D’Amico (Eds.), Learning and becoming in practice: Proceedings of the 11th International Conference of the Learning Sciences (ICLS) 2014 (Vol. 2, pp. 962–966). Boulder, CO: International Society of the Learning Sciences.
Oakes, J. (1990). Multiplying inequalities: The effects of race, social class, and tracking on opportunities to learn mathematics and science. Santa Monica, CA: RAND.
Oakes, J., & Rogers, J. (2006). Learning power: Organizing for education and justice. New York: Teachers College Press.
Penuel, W. R., Beauvineau, Y., DeBarger, A. H., Moorthy, S., & Allison, K. (2012). Fostering teachers’ use of talk moves to promote productive participation in scientific practices. In J. van Aalst, K. Thompson, M. J. Jacobson, & P. Reimann (Eds.), The future of learning: Proceedings of the 10th International Conference of the Learning Sciences (ICLS) 2012 (Vol. 2, pp. 505–506). Sydney, Australia: International Society of the Learning Sciences.
Penuel, W. R., DeBarger, A., Kim, C. B., Moorthy, S., Beauvineau, Y., Kennedy, C. A., . . . Allison, K. (2013, April). Improving learning by improving classroom assessment in Earth science: Findings from the Contingent Pedagogies project. Paper presented at the annual meeting of the National Association for Research in Science Teaching, San Juan, Puerto Rico.
Penuel, W. R., Fishman, B. J., Cheng, B., & Sabelli, N. (2011). Organizing research and development at the intersection of learning, implementation, and design. Educational Researcher, 40(7), 331–337.
Penuel, W. R., & Gallagher, L. P. (2009). Preparing teachers to design instruction for deep understanding in middle school Earth science. Journal of the Learning Sciences, 18(4), 461–508.
Penuel, W. R., Gallagher, L. P., & Moorthy, S. (2011). Preparing teachers to design sequences of instruction in Earth science: A comparison of three professional development programs. American Educational Research Journal, 48(4), 996–1025.
Penuel, W. R., McWilliams, H., McAuliffe, C., Benbow, A., Mably, C., & Hayden, M. M. (2009). Teaching for understanding in Earth science: Comparing impacts on planning and instruction in three professional development designs for middle school science teachers. Journal of Science Teacher Education, 20(5), 415–436.
Penuel, W. R., Moorthy, S., DeBarger, A., Beauvineau, Y., & Allison, K. (2012, July). Tools for orchestrating productive talk in science classrooms. Paper presented at the workshop “Classroom Orchestration: Moving Beyond Current Understanding of the Field,” International Conference of the Learning Sciences, Sydney, Australia.
Penuel, W. R., Roschelle, J., & Shechtman, N. (2007). The WHIRL co-design process: Participant experiences. Research and Practice in Technology Enhanced Learning, 2(1), 51–74.

Penuel, W. R., & Shepard, L. A. (in press). Assessment and teaching. In D. H. Gitomer & C. A. Bell (Eds.), Handbook of research on teaching. Washington, DC: American Educational Research Association.
Quiroz, P. A., & Secada, W. G. (2003). Responding to diversity. In A. Gamoran, C. W. Anderson, P. A. Quiroz, W. G. Secada, T. Williams, & S. Ashman (Eds.), Transforming teaching in math and science: How schools and districts can support change (pp. 87–104). New York: Teachers College Press.
Rorrer, A. K., Skyrla, L., & Scheurich, J. J. (2008). Districts as institutional actors in educational reform. Educational Administration Quarterly, 44(3), 307–357.
Roschelle, J., Pierson, J., Empson, S., Shechtman, N., Dunn, M., & Tatar, D. (2010). Equity in scaling up SimCalc: Investigating differences in student learning and classroom implementation. In K. Gomez, L. Lyons, & J. Radinsky (Eds.), Learning in the disciplines: Proceedings of the 9th International Conference of the Learning Sciences (ICLS) 2010 (Vol. 1, pp. 333–340). Chicago: International Society of the Learning Sciences.
Sandoval, W. A. (2014). Conjecture mapping: An approach to systematic educational design research. Journal of the Learning Sciences, 23(1), 18–36.
Sandoval, W. A., & Bell, P. (2004). Design-based research methods for studying learning in context: Introduction. Educational Psychologist, 39(4), 199–201.
Spillane, J. P., & Coldren, A. F. (2010). Diagnosis and design for school improvement: Using a distributed perspective to lead and manage change. New York: Teachers College Press.
Supovitz, J. A. (2006). The case for district-based reform: Leading, building, and sustaining school improvement. Cambridge, MA: Harvard University Press.
Talbert, J. E., & McLaughlin, M. W. (1993). Reforming districts: How districts support school reform. Stanford, CA: Center for the Study of Teaching and Policy.
Trujillo, T. (2011). The reincarnation of effective schools research: Rethinking the literature on district effectiveness. Paper presented at the workshop “Thinking Systemically: Improving Districts Under Pressure,” Rochester, NY.
Tseng, V. (2012). Partnerships: Shifting the dynamics between research and practice. New York: William T. Grant Foundation.
Yin, Y., Shavelson, R. J., Ayala, C. C., Ruiz-Primo, M. A., Brandon, P. R., Furtak, E. M., . . . Young, D. B. (2008). On the impact of formative assessment on student motivation, achievement, and conceptual change. Applied Measurement in Education, 21(4), 335–359.

Section 3

How Politics, Underlying Theories, and Leadership Capacity Support System-Wide Change

Chapter 5

Portfolio Reform in Los Angeles: Successes and Challenges in School District Implementation

Susan Bush-Mecenas, Julie A. Marsh, and Katharine O. Strunk
University of Southern California

More than 25 major cities are currently implementing the portfolio management model, a governance reform that is intended to help low-performing school districts improve (Hill, Campbell, & Gross, 2012). Unlike school-centered turnaround strategies, portfolio reforms treat the district as a key unit of change, encouraging districts to allow a diverse set of service providers to operate schools so the districts can observe the performance of various educational approaches and make decisions about the future selection of school operators. Toward this end, districts take on a new role as “performance optimizers” and, periodically, remove the lowest performing providers and expand the operations of higher performing providers based on student outcomes (Bulkley, 2010; Lake & Hill, 2009). In this way, the portfolio management model sits at the intersection of several existing district improvement movements, including market-based reform, standards-based reform, and context-aligned differentiation of schools (Bulkley, 2010).

Although the portfolio management model has been around for 15 years, it continues to develop and evolve (Bulkley, 2010). Many cities that have adopted the portfolio model did not do so intentionally. Rather, city leaders applied market logic, integrating concepts such as student choice, the use of external providers or partnerships, increased attention to human capital concerns, and differentiation of support to schools (Bulkley, 2010). Drawing on research about district reform and public-private sector contracting, Paul Hill described and bounded the portfolio management model and identified key elements of portfolio districts. These elements included school autonomy, public engagement, student choice, capacity- and human capital–building strategies, per-pupil-based funding, and performance-based accountability (Hill et al., 2012). More recently, the Center on Reinventing Public Education (2014) has engaged in efforts to support and study portfolio districts through its Portfolio Network.

To date, research on the implementation and effects of district portfolio reforms remains limited. Studies of the reforms’ effects on schools and students have been limited both in quantity and in their ability to draw causal conclusions about impacts on student outcomes, and they have shown mixed results (Fruchter & McAlister, 2008; Humphrey & Shields, 2009; Kemple, 2011; Mac Iver & Mac Iver, 2006; Roderick, Nagaoka, & Allensworth, 2006). Studies of individual districts have also demonstrated some of the organizational changes and challenges resulting from a shift to portfolio management, including difficulties in maintaining local capacity to sustain reform, in engaging parents and the community, in coordinating services and diffusing reform within the central office, and in maintaining neutrality (Bulkley, Christman, & Gold, 2010; Christman, Gold, & Herold, 2006; Gyurko & Henig, 2010; Hill, 2011; Lake & Hill, 2009; Levin, 2010; Menefee-Libey, 2010; O’Day, Bitter, & Talbert, 2011).

In this chapter we examine the strategic case of the Los Angeles Unified School District (LAUSD) Public School Choice Initiative (PSCI), which served as a mechanism to increase new school options in the district portfolio. Implemented for the first time in August 2009, PSCI enabled teams of internal and external stakeholders to compete by submitting plans to turn around the district’s lowest performing “focus” schools (selected by LAUSD administrators based on a diverse set of performance indicators) and to operate newly constructed “relief” schools designated to ease overcrowding. The district’s theory of change behind PSCI was that, with intensive supports and appropriate autonomies, a range of school providers would be able to turn around low-performing schools and increase student achievement. The initiative also integrated school turnaround strategies, such as replacing the principal and/or staff and providing increased flexibility, which were expected to produce significant achievement gains in a very short time (usually within two to three years) followed by sustained improvement over the long run. The ultimate goal of this district reform was to build a diverse portfolio of high-performing schools tailored to and supported by the local community.

This chapter examines the implementation of the PSCI reform, focusing on the ways in which the mechanisms of change—including competition and plan selection, autonomy, parental engagement, and capacity building—played out in the first four years. We ask, How were the key mechanisms of change that were outlined in the district’s vision of PSCI enacted over the course of the initiative? To what extent did the district succeed in implementing PSCI as intended, and what were the challenges?

What can be learned from this research to inform future portfolio management efforts? In this chapter we focus on the real-time efforts of the district and its partners to implement the reform and on how the teams applying to operate the schools and other stakeholders experienced the initial stages of the reform process: plan development and selection. In the remainder of the chapter we first describe the LAUSD initiative and the study's conceptual framework and methods. We then present answers to the research questions and conclude with a set of implications for policy, practice, and research.

Background on LAUSD's Public School Choice Initiative

LAUSD operates more than 900 schools and authorizes 187 charter schools that together serve approximately 655,000 students, three quarters of whom qualify for free or reduced-price lunch and more than one quarter of whom are English language learners. Scores on the California Standards Test (CST) reveal widespread failure in the system: 35% of third graders score below proficient in mathematics, and 60% score below proficient in English language arts; by the seventh grade these figures increase to 61% and 63%, respectively. More than 440 schools are in Program Improvement Year 3 status or higher under No Child Left Behind, and only two thirds of students graduate on time from high school (authors' calculations).

PSCI built on decades of past reform efforts in Los Angeles, most notably a series of systemic reforms in the 1990s seeking to empower local actors and advance student achievement (Kerchner, Menefee-Libey, Mulfinger, & Clayton, 2008). These reforms—the Los Angeles Educational Alliance for Restructuring Now (LEARN) and the Los Angeles Annenberg Metropolitan Project (LAAMP)—shared many of the ideas and levers of change embraced by PSCI, including increased autonomy and accountability, capacity building, planning, and parent involvement (Kerchner et al., 2008). However, even though LAUSD had increasingly adopted nontraditional school options for families in the past, including charter schools and magnet programs, PSCI's introduction of competition for the operation of district facilities represented a dramatic shift in district policy. It opened up the possibility for external teams to operate both new and existing campuses, in the context of a large urban district with multiple competing reform efforts, interest groups, and actors (see Marsh, in press, for further details).

Adopted by the LAUSD Board of Education in August 2009, the Public School Choice Resolution established the long-term goal of creating "diverse options for high quality educational environments, with excellent teaching and learning, for students' academic success" (Flores Aguilar, 2009, p. 1). Responding to the "chronic academic underperformance" of many district schools and the strong desire of parents and communities to "play a more active role" in "shaping and expanding the educational options" (p. 1), the resolution invited operational and instructional plans from internal and external stakeholders, such as

school planning teams, local communities, pilot school operators, labor partners, charters, and others who are interested in collaborating with the District to operate the District's new schools and PI 3+ schools . . . in an effort to create more schools of choice and educational options for the District's students and families. (p. 2)¹

The initiative was not intended to be a typical choice program in which parents select the school their child will attend. Rather, it provided the community with an opportunity to participate in developing school plans. The ultimate "choice" in PSCI was made by the LAUSD leaders (the school board in Years 1–2, the superintendent in Years 3–4).

Designed for gradual scale-up, PSCI involved annual rounds (or cohorts) of schools in the process, with the intention that all low-performing public schools would be transformed into high performers. The district's lowest performing schools were selected for participation based on a composite performance metric that consisted of schools' Program Improvement status, their Academic Performance Index level and growth scores, the percentage of students scoring "proficient" or "advanced" on state standardized tests, and dropout rates. In each round, teams applying for a PSCI school responded to a detailed Request for Proposals, submitting lengthy school plans that covered topics including curriculum and instruction, school organization, professional development, and school operations. In addition, applicants were asked to select one of a set of governance models already in existence in the district. These models varied in schools' levels of autonomy from district and/or union policies and in decisions about resource use. In addition, representatives from the district and the teachers' and administrators' unions created a collaborative, the Los Angeles School Development Institute (LASDI), to provide teams with technical assistance in plan writing and implementation.
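To make the school-selection step concrete, the sketch below illustrates how a composite performance metric built from the indicators named above (Program Improvement status, API level and growth, proficiency rates, and dropout rates) might rank schools for participation. The normalization and weights are our own illustrative assumptions; the chapter does not report the district's actual formula or weighting.

```python
# Illustrative sketch of a composite performance metric of the kind LAUSD
# used to identify its lowest performing "focus" schools. The indicators
# mirror those named in the text; the normalization and weights are
# hypothetical, not the district's actual formula.

def composite_score(school, weights):
    """Return a 0-1 composite; lower values flag candidates for PSCI."""
    normalized = {
        "api": school["api"] / 1000,                  # API reported on a 200-1000 scale
        "growth": (school["api_growth"] + 50) / 100,  # assume growth bounded in [-50, 50]
        "proficiency": school["pct_proficient"] / 100,
        "dropout": 1 - school["dropout_rate"] / 100,  # lower dropout -> better
        "pi": 1 - min(school["pi_year"], 5) / 5,      # deeper PI status -> worse
    }
    return sum(weights[k] * v for k, v in normalized.items())

weights = {"api": 0.25, "growth": 0.15, "proficiency": 0.30, "dropout": 0.15, "pi": 0.15}
schools = [
    {"name": "School A", "api": 620, "api_growth": -8,
     "pct_proficient": 22, "dropout_rate": 18, "pi_year": 5},
    {"name": "School B", "api": 780, "api_growth": 12,
     "pct_proficient": 48, "dropout_rate": 6, "pi_year": 1},
]

# Rank ascending: the lowest composite scores would become focus schools.
for s in sorted(schools, key=lambda s: composite_score(s, weights)):
    print(f'{s["name"]}: {composite_score(s, weights):.3f}')
```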

The submitted applications underwent a multi-stage review. In the first two rounds of PSCI, plans were first reviewed by a set of volunteer reviewers, including LAUSD central office staff, representatives from higher education and research organizations, school staff, parents, and union and charter organization representatives. Next, plans were reviewed by a Superintendent's Review Panel (consisting of selected reviewers with representation similar to that in the initial review panel). Then an advisory vote was held, allowing all parents and community members of the affected schools to vote on their preferred school plan. Finally, taking all of the previous ratings into consideration, the superintendent made recommendations to the LAUSD Board of Education, and the Board voted on the final set of winning applicants.

In February 2010 the Board selected from among 100 applications for the first cohort of participating schools (PSCI 1.0), which included 14 low-performing focus schools and 28 newly constructed relief schools (Table 1). Internal teams included groups of teachers seeking increased site-based management or combinations of teachers, parents, and/or administrators from the local school community. External teams included nonprofit organizations, charter management organizations, and partnerships between internal and external groups.

Table 1. Number of Schools in PSCI 1.0–4.0 and Selected Governance Models

Applicant and School Characteristics    PSCI 1.0     PSCI 2.0    PSCI 3.0     PSCI 4.0
Number of sites (relief/focus)          32 (20/12)   14 (10/4)   32 (15/17)   17 (0/17)
Number of schools (relief/focus)        42 (28/14)   28 (23/5)   41 (22/19)   20 (0/20)
Number of applicants                    100          49          64           22
Selected teams
  Traditional                           17           5           8            11
  ESBMM                                 8            2           5            4
  Pilot                                 8            12          2            1
  Network partner                       3            1           2            0
  Affiliated charter                    0            0           0            0
  Independent charter                   6            8           -            -
  LIS                                   -            -           1            3
  Model not provided                    22           1           0            1
  School moved to next cohort           4            0
  Internal                              32           19          39           20
  External                              10           9           2            0

Note. The number of schools may be larger than the number of sites because some campuses, particularly high schools, were broken into smaller schools. PSCI = Public School Choice Initiative; ESBMM = Expanded School-Based Management Model; LIS = Local Initiative Schools.

In this first round of PSCI, the Board awarded 33 schools to internal teams and selected six teams proposing charter schools and three nonprofit organization plans. The second full cohort (PSCI 2.0), identified in May 2010 and scheduled to open or reopen in fall 2011, included 23 relief schools and 5 focus schools. In March 2011, the Board selected from 49 proposals and approved a range of teams to operate these schools, including 19 internal and 9 external teams (charter or network partner teams). We refer to the first two cohorts of PSCI as Phase I.

In later years, the district and its partners adjusted the design of PSCI to address some of the challenges identified during Phase I. First, the Board replaced the advisory vote with a series of community engagement and feedback meetings in which parents, community members, and students developed a set of priorities for their school, evaluated plans against those priorities, and provided their recommendations to the superintendent and his panel for review. Other modifications to the initiative occurred in 2011 following the appointment of Superintendent John Deasy and the election of several new Board members and a new teachers' union president. Responding to Board and union pressure, Deasy made modifications to limit competition in PSCI in exchange for new opportunities for all schools to obtain autonomies related to district and union policy. Under a new Memorandum of Understanding (MOU), ratified by union members and approved by the Board in December 2011, external teams of charter operators and nonprofits were eligible to participate in PSCI only if they agreed to operate the school using district employees under the current collective bargaining agreement. In exchange, all district schools now had the option of adopting a new governance model allowing for greater freedom in the areas of curriculum and instruction, school scheduling, and staffing selection. Under this MOU, the superintendent became the final authority in selecting applicants. We refer to PSCI Cohorts 3.0 and 4.0, which operated under this revised design, as Phase II. In all, 64 applications were submitted for PSCI 3.0 for 22 relief and 19 focus schools. All selected applicants were internal teams, with the exception of two external network partners. In PSCI 4.0, 22 applications were submitted for the cohort's 20 focus schools. All selected applications were from internal teams.

Conceptual Framework

Our study design, data collection, and analysis were guided by a conceptual framework grounded in the research on portfolio districts, as well as by the district's implicit theory of change. We constructed this theory of change based on our early interviews with several district leaders and a review of documents. We then brought the draft theory of change to district leaders, who helped us to refine it based on their intentions during implementation. Modifiers such as intensive and appropriate were derived from the original language used by district officials in interviews and documents and, as such, represent the designers' intent.

The district's initial theory of change, illustrated in Figure 1, highlighted five key levers of change that, if implemented, were expected to lead to dramatic improvements in student performance. First, rigorous screening of school plans was intended to ensure high-quality school designs, and competition among a diverse set of applicant teams was believed to motivate applicants to enhance the quality of the plans and increase the potential for innovation. Second, granting school operators autonomy over key domains such as staffing, budget, governance, and curriculum and instruction was expected to foster the development of schools that met district needs, were responsive to local contexts and student needs, and were staffed with personnel committed to the schools' goals and priorities (e.g., Chubb & Moe, 1990; Edmonds, 1979). Third, district oversight and accountability mechanisms were intended to motivate PSCI school staff to perform their best. By requiring schools to establish and achieve the goals outlined in their proposals, the district encouraged staff to continually assess and improve their performance. Through strong oversight and accountability, the district could stay informed of PSCI schools' progress, intervene quickly with struggling schools, and, if necessary, return schools to the PSCI process. Fourth, capacity-building efforts, including technical assistance and support from the district and its partners, were expected to build applicant teams' ability to develop and implement high-quality plans. Finally, increased pressure and contribution from parents and community in the selection, development, and implementation of school plans was intended to provide additional pressures and supports. We note that the 2011 changes to the initiative's design (described earlier) led to modifications of the initial theory of change. Namely, the autonomy and capacity-building levers were strengthened, while the competition lever was de-emphasized.

These mechanisms were expected to yield a diverse set of high-quality learning environments (see Figure 1 for specific characteristics identified by LAUSD) and ultimately to positively affect student outcomes. Student outcomes could result either directly or indirectly through effects on teachers and other school staff (e.g., retention of effective individuals, high job satisfaction) and effects on parents and community members (e.g., high satisfaction with the school, strong sense of ownership).

[Figure 1. Public School Choice theory of change representing the initiative's intent. PSCI = Public School Choice Initiative; ESBMM = Expanded School-Based Management Model. The figure traces the PSCI school application process (identification of PSCI schools, applicant team formation, planning, application, review, and selection) through five levers of change (rigorous screening of plans and competition for selection, autonomy to respond to local contexts and needs, capacity building, oversight and accountability, and increased pressure and contribution from parents and community) to high-quality PSCI schools, diffusion to non-PSCI schools, and positive outcomes for students, staff, parents, and community, all embedded in the district, community, school, and classroom context.]

To the extent that the PSCI's effects might spill over to non-PSCI schools—whether because schools felt pressure to implement reforms to avoid being selected into the initiative in the future, or because of the intentional or natural spread of successful school models and ideas, or, in the case of relief schools, because overcrowding was alleviated—positive student outcomes were expected to translate into higher quality schools overall.

Finally, it is understood that PSCI was embedded in a broader context that could influence its implementation and mediate its effects. Local human, physical, and social capital; stakeholders' understanding and commitment to reform; school and district leadership and support; and the alignment of PSCI with existing policies all were likely to affect the implementation of the core processes and the quality of the work in the PSCI schools (Berends, Bodilly, & Kirby, 2002; Bryk & Schneider, 2002; Corcoran, Fuhrman, & Belcher, 2001; Gamoran, Nystrand, Berends, & LePore, 1995; Marsh, 2002; Marsh, Hamilton, & Gill, 2008; Massell, 1998; McLaughlin, 1987; Snipes, Doolittle, & Herlihy, 2002; Spillane & Thompson, 1997).

This chapter examines the PSCI portfolio environment and school application process and the levers of change (see Figure 1), focusing primarily on the four levers of change key to the implementation of the initiative's processes of plan development and selection: rigorous screening of plans and competition for selection, autonomy to respond to local context and needs, capacity building, and increased pressure and contribution of parents and the community. Implementation of selected school plans, intermediate outcomes, and student achievement outcomes will not be addressed in this chapter. Although improvement in student outcomes is the ultimate goal, it cannot be reached unless the beginning steps of this process (plan development and selection) are well implemented and the key levers of change operate as intended. It is from this premise that we begin our inquiry.

Data and Methods

The chapter draws on results from a three-year mixed-methods study with emphasis on PSCI 2.0, 3.0, and 4.0. The decision to focus on Cohorts 2–4 was pragmatic: The funding and start date of the study coincided with the second year of the initiative, and thus real-time interviews and observations of the plan development and selection process were possible only for the second cohort and beyond. Multiple data sources informed our analyses, including surveys of participating design teams, school case studies, leader interviews, observations, and document review.

Surveys

We administered Web-based surveys annually to one representative from each participating team, yielding response rates of 80% (n = 36) in PSCI 2.0, 85% (n = 46) in PSCI 3.0, and 95% (n = 21) in PSCI 4.0.² The survey asked about the plan-writing process, plan content, and perceptions of PSCI. The survey respondents were design team leaders (DTLs) identified from letters of intent submitted to LAUSD. They included a mix of teachers, nonprofit or charter school administrators, principals and school staff, and local district administrators. We conducted descriptive analyses of the survey data and also compared responses across cohorts.
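The number of teams surveyed in each cohort can be backed out from the reported n and response rate; the sketch below shows the arithmetic. The resulting totals (roughly 45, 54, and 22 teams) are our inference, not figures reported in the chapter.

```python
# Back out the approximate number of teams surveyed per cohort from the
# reported respondent counts and response rates. Totals are inferred.
reported = [
    ("PSCI 2.0", 36, 0.80),
    ("PSCI 3.0", 46, 0.85),
    ("PSCI 4.0", 21, 0.95),
]
for cohort, respondents, rate in reported:
    print(f"{cohort}: ~{respondents / rate:.0f} teams surveyed")
# Output: roughly 45, 54, and 22 teams, respectively.
```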

Case Studies

We conducted case studies of five PSCI 2.0 schools, four PSCI 3.0 schools, and five PSCI 4.0 schools, focusing on the stakeholders either involved in developing the proposals or belonging to the school communities associated with the target schools. We chose schools to represent variation in level (elementary, middle, high), type (relief or focus), and geographic location. For each case school, we interviewed at least one member from all teams submitting applications (n = 24; 17 internal and 7 external teams); conducted parent focus groups (n = 100 parents total); observed site-specific meetings (n = 50); and reviewed documents showing print and social media coverage, school plans, communications with stakeholders, and the like. All data were collected in near-equal proportions across case study schools.

Leader Interviews, Documents, and Observations

Documents and interviews with LAUSD central office administrators and district partners (n = 45) provided us with information about the PSCI design and theory of change. To understand the nature of communication and technical assistance, we observed 40 district meetings, including orientation sessions and workshops on school turnaround and various aspects of the school plan, and we collected all relevant documents, such as agendas, PowerPoints, and print and online communications.

Our team coded the case study and leader interview notes and documents along the dimensions of the conceptual framework and analyzed notes by individual schools and across schools. We examined the results of all of these analyses to identify cross-school findings and themes. To enhance the internal validity and accuracy of the findings, we triangulated data from multiple sources, comparing interview data with documents and surveys whenever possible.

Findings: Implementation Successes and Challenges

In this chapter we assess successes and challenges that faced LAUSD during implementation by comparing what occurred during PSCI implementation with the policy framers' intentions, as outlined in the theory of change. In particular, we examine the ways in which plan screening and competition, autonomy, capacity building, and parent engagement and support were implemented as intended, and how implementation did or did not lead to expected outcomes. Overall, we find evidence of successes in the implementation of PSCI plan development and selection but also a number of challenges that brought into question some of the main assumptions of PSCI. We organize these findings below by lever of change, focusing on the four levers relevant to this initial plan-writing stage of the reform process: (a) rigorous screening of plans and competition for selection, (b) autonomy to respond to local context and needs, (c) capacity building, and (d) increased pressure and contribution of parents and the community.

Rigorous Screening of Plans and Competition for Selection

Rigorous screening of plans and competition for selection were key mechanisms of change in the PSCI design. In fact, the district and partner organization leaders we interviewed overwhelmingly emphasized the importance of competition over the other levers. In practice, the district saw mixed success in implementing this lever of change. We organize these findings in four key areas.

Transparency. The initial PSCI resolution called for "a transparent process for plans to be submitted, reviewed, and evaluated by internal staff and external stakeholders" (Flores Aguilar, 2009, p. 5). By all accounts these goals were achieved. As intended, LAUSD leaders consistently shared information about the process with the public during both phases of the reform. The district posted weekly updates on a designated PSCI website and, for each round of PSCI, posted copies of all letters of intent and final proposal submissions, along with detailed documentation of the review process, including reviewer comments and ratings for each school. Meeting agendas, PowerPoint slides, and other informational materials were also regularly posted on this website. The leader of one Investing in Innovation Fund (i3) grant partner organization described the level of transparency as "unheard of" and "profoundly powerful."³ This partner noted, however, that allowing the public "to see the intra-district decision-making process" unfold weekly also led to "disorientation" and confusion, as we discuss later.

During early Phase II, in particular, significant modifications to the initiative made it difficult for initiative administrators to provide accurate, up-to-date information because key components of the PSCI process were in flux. As one design team leader noted, "Then you had a district person who was still kind of like, 'Well, this is the current document, but it's going to change in the next few months,' which does not help me, because the deadline [for plan submission] is next week." Others were more sympathetic with the challenges facing the district. Another design team leader stated, "I think they [the district administrators] were able to provide us as much as they could. They were in the middle of organic change, too."

Applicant supply. Despite some early success in drawing large numbers, the supply of teams submitting school applications became an area of concern over time. For example, the average number of proposals per school decreased from 2.4 in PSCI 1.0 to 1.8 in PSCI 2.0. One possible contributing factor was that LASDI consultants (retired LAUSD administrators and teachers engaged to assist internal teams) sometimes encouraged multiple applicant teams to collaborate and jointly submit a single proposal (e.g., when they observed teams proposing similar ideas). It is not clear from our data that these LASDI efforts entirely explain the observed decrease, but they may have contributed to it. In Phase II of the reform, the supply of applicants further decreased, to an average of 1.6 proposals per school in PSCI 3.0 and 1.1 in PSCI 4.0. This decrease likely was due to MOU-related changes in the initiative, which discouraged applications from external organizations seeking to run independent charter schools. Overall, the decreased supply of applicants weakened the competition lever of change and indicated that PSCI may not have provided a true choice for each participating community, particularly in the later years.
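These per-school averages follow directly from the application and school counts in Table 1; the short sketch below reproduces the arithmetic (the counts come from Table 1, and the one-decimal rounding is chosen to match the figures quoted above).

```python
# Proposals per school by cohort, computed from the Table 1 counts.
cohorts = {
    "PSCI 1.0": (100, 42),   # (applications, schools)
    "PSCI 2.0": (49, 28),
    "PSCI 3.0": (64, 41),
    "PSCI 4.0": (22, 20),
}
for name, (applications, schools) in cohorts.items():
    print(f"{name}: {applications / schools:.1f} proposals per school")
# Output: 2.4, 1.8, 1.6, and 1.1, matching the declining trend in the text.
```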

Neutrality and perceived fairness of plan review. Another set of challenges pertained to the neutrality and perceived fairness of review and selection. PSCI and portfolio models generally assume that leaders will be neutral about the types of organizations or teams running schools and will select operators based on quality and effectiveness (Hill & Campbell, 2011). Leaders are expected to support broad participation and competition so that the most innovative and highest quality ideas and plans emerge. Our data indicate, however, that not all district leaders and partners fulfilled this expectation, which contributed to widespread perceptions that the process was unfair. In Phase I, 69% of 2.0 DTL survey respondents disagreed or strongly disagreed that the selection process was fair to all applicants. In interviews, teams of all types commonly reported that the district had not created a level playing field and that certain teams had the upper hand, with greater access to resources and support. Several internal teams reported that external teams had greater capacity to write plans due to full-time staff dedicated to grant writing and experience developing school charters. Conversely, all external teams and some teacher-led teams believed that internal teams supported by the local districts had greater access to data and parents and had more support from LASDI and the local teachers' unions. In most case schools, design teams believed they were at a disadvantage relative to their competitors, particularly in the early rounds. Even though competition decreased in Phase II, 52% of 3.0 DTLs and 56% of 4.0 DTLs still disagreed or strongly disagreed that the PSCI plan review and selection process was fair to all applicants.

There was also a pervasive belief that politics interfered with the selection process, particularly in Phase I. For example, six teams across all five case study schools in PSCI 2.0 mentioned that review panels and Board members were not adequately trained in how to judge the quality of school plans and that decisions were not guided by the rubric or by assessments of quality but, rather, by political motivations or personal biases. Several individuals believed that Board members knew in advance who they were going to select and had strong preferences for preserving jobs, selecting internal teams, or adopting specific governance models. In one case, a Board member endorsed an internal applicant team before proposals had been submitted. Another DTL reported being pressured by a Board member to drop out of the process so that the preferred internal team could succeed. Interestingly, our independent analysis of plan quality showed that higher quality plans were selected overall and, in most cases, in Phase I (Strunk, Marsh, Duque, & Bush, 2013).⁴ These data indicate that while many observers believed political interests rather than assessments of quality drove Board decisions in Phase I, their perceptions may have been based on a relatively small number of (somewhat high-profile) cases.

Unintended consequences of competition. Finally, the implementation of PSCI starkly illustrated the unintended consequences of competition. The high stakes of the decision to award a team control over a school—in some cases leading to removal of entire staffs to allow external staffs to take over—heightened the intensity and emotion surrounding the plan development and selection process and at times interfered with PSCI's intended focus on developing and selecting high-quality school plans. One focus school principal and design team leader commented: "This process really did a lot to divide the staff, and it took a lot of people's hope away. . . . [S]ome of my teachers were saying, 'I've been doing all this work [as a teacher at a focus school], and what? You're going to take it away?'"

According to multiple case study respondents in the early years, the competition itself took valuable time and attention away from designing their school and writing their plan. "We have to come to consensus ourselves [and decide] 'What is our goal?'" said one internal team leader. "That we have to spend time selling ourselves, pitching it . . . just makes it all the more difficult." Many teams invested significant time and material resources on a variety of strategies to garner support for their plans—resources some teams would have preferred to devote to plan development. "It was supposed to be about collaboration," explained one internal DTL, "but you create this extreme competition between these strong competitors, and no one wins." As the district sought to address these challenges, leaders eventually de-emphasized competition as a key lever of change, while maintaining a focus on rigorous screening of plans.

Autonomy

As with policy implementation in general (Bodilly, Chun, Ikemoto, & Stockly, 2004; Marsh et al., 2008; McDermott, 2006; McLaughlin, 1987; Sabatier & Mazmanian, 1979), stakeholder understanding was a necessary precondition for several of PSCI's core mechanisms of change, particularly that of autonomy. If autonomy is going to drive change or innovation, stakeholders must be clear on what autonomies they are gaining when selecting a governance model and how to use them to advance improvement. However, our evidence suggests that these conditions may not have been fully achieved. Although on surveys the majority of DTLs reported that they understood what each governance model entailed (86% in 2.0, 78% in 3.0, and 65% in 4.0 reported mostly or completely understanding), several DTLs expressed or exhibited misunderstandings about multiple aspects of the autonomy-related mechanisms. Various models included different levels of autonomy related to staffing (i.e., independent charter schools had complete staffing autonomy, whereas traditional public schools were bound by the teacher and administrator contracts, and other governance models fell in between). The models also allowed for different levels of autonomy related to scheduling, among other things (i.e., some models allowed for increased school days and work hours, and others required schools to follow the teacher contracts in full). Several of the teams interviewed each year were unclear about these distinctions and the types of autonomy granted by each model. Phase II changes, in particular the creation of a new governance model called Local Initiative Schools, may have contributed to increased confusion about autonomies in later years. In fact, one design team noted that their confusion regarding Local Initiative Schools influenced their governance model decision: "They [the district administrators] don't know what that looks like yet. . . . So how can you pick what a Local Initiative is, when the district isn't saying what it's going to be?"

Several factors appeared to contribute to this mixed level of understanding. First, the short time frame given to enact the Board resolution initially—just two months during the first round of PSCI—and the multiple changes to rules and processes throughout PSCI 1.0 to 4.0 may have played a role. Second, the district and its partners may have conveyed unclear and inconsistent messages about the initiative, particularly in the early years. "We haven't done a good job at communicating about PSCI," said former superintendent Ramon Cortines. "People still see it as takeover rather than improvement, and that bothers me." Third, decision-making processes within LAUSD may have exacerbated the confusion. Although central office administrators were responsible for day-to-day PSCI implementation, other district leaders—including the superintendent and Board members—had considerable latitude to make decisions that greatly altered the initiative and conflicted with prior decisions or rules laid out by administrators. As a result of the multiple decision makers involved in the processes, local teams and stakeholders may not have received consistent or clear answers or messages. Despite these challenges, the creation of a new governance model in Phase II marked an increased emphasis on autonomy as a key lever of change in PSCI.

Capacity Building

Technical assistance and support provided to design teams during the plan-writing process was an essential mechanism of change under the design of PSCI. The initiative relied on the development of high-quality school plans, written in part by nontraditional actors who may have been unaccustomed to plan writing. Our data indicate that LAUSD and its partner organization, LASDI, provided plan development support in accordance with the intended model, although uneven provision of this support reinforced doubts about the neutrality of the district and fairness of the process for some teams.

Overall, the district made significant progress toward establishing a wide range of standardized supports and services to assist teams with plan writing. It served as both direct service provider and broker of individualized support through the new LASDI collaborative, a role commonly adopted by central offices implementing portfolio strategies (Bulkley et al., 2010; Gyurko & Henig, 2010; Honig, 2009). Drawing on grant resources, LASDI provided support via consultants and workshops, aiming to deliver targeted technical assistance and ongoing support to internal applicant teams that included both teacher and administrator representatives. Because external teams were perceived to possess greater resources and experience in writing plans (e.g., school charters), LASDI was created to "ensure existing internal stakeholder teams are positioned and supported to develop and put forth viable designs for new and existing schools identified" (Flores Aguilar, 2009, p. 2).

LAUSD also succeeded in providing direct technical assistance to design teams. Administrators, with support from various other LAUSD departments, delivered a series of workshops and meetings to build understanding of PSCI and the Request for Proposals and to develop deeper knowledge about multiple topics. More than three quarters of the design team leaders in all cohorts reported that they or a member of their team attended LAUSD and LASDI workshops and meetings, and the majority of DTLs reported receiving multiple supports and services.

Despite the provision of diverse supports, access was a challenge at times. Early on, LASDI decided that it would support only teams that included both administrators and teachers; both groups were represented by the unions constituting two of the three partners directing LASDI. As one LASDI leader said,

When a school decides that they want LASDI's help, the directors go out and sit down with the principal and teacher teams. They're very adamant that they meet with both teacher leader and principal, and if they're not even willing to sit in a room, we don't go out.

This decision withheld LASDI support from teams composed solely of teachers (because they did not include both teachers and administrators), as well as from external teams (because LASDI was created to support internal teams, not charter/network partner teams, which were thought to have more experience in writing charter-like school plans). Not surprisingly, those excluded expressed strong dissatisfaction with this decision to limit access to capacity-building resources. Referring to LASDI, one charter school team member explained,

We don't have an equivalent institution for our capacity. . . . We don't have . . . another technical assistance piece . . . that the district's internal applicants are getting.

Survey data substantiate this concern of unequal access to services. Overall, LASDI-ineligible teams reported accessing and receiving fewer services and supports than LASDI-eligible teams. For example, in PSCI 2.0, only 40% of LASDI-ineligible teams reported receiving help in writing their plans, as compared with 76% of LASDI-eligible teams. Similarly, only 53% of LASDI-ineligible teams received assistance in understanding governance models, compared with 95% of LASDI-eligible teams.

In response to concerns about access to and quality of support, LAUSD made efforts to offer more support to design teams in PSCI 4.0 (late Phase II). In contrast to informational meetings provided during the first three years of the initiative, the district began offering in-depth workshops to all teams on analyzing data and determining instructional interventions during PSCI 4.0. Designed in partnership with the National Equity Project, the workshops consisted of structured work time (e.g., time allocated for teams to develop and write specific parts of their plans, such as their mission statement or performance goals) and activities designed to assist teams in defining their school mission, examining their student performance data, and identifying appropriate interventions. On our survey, 81% of PSCI 4.0 DTLs reported receiving support on instructional programs (compared with 54% of 3.0 DTLs and 61% of 2.0 DTLs), and 76% reported receiving support on utilizing performance data (compared with 54% of 3.0 DTLs and 69% of 2.0 DTLs). District and partner organization efforts to enhance the capacity-building supports available to teams during Phase II were indicative of a new emphasis on capacity building as a key lever of change.

Parent and Community Pressure and Contribution

From the initiative's inception, engagement of parents and communities was considered a key lever for change. While parents and students did not directly choose their school of attendance (PSCI schools enrolled students according to LAUSD neighborhood boundaries), parents and community members were engaged, and their feedback was solicited to aid district leaders in plan selection. Consistent with previous research in portfolio districts (e.g., Bulkley et al., 2010; Levin, 2010), this proved to be a major challenge for the district and its partners, particularly in the initial rounds of PSCI.

For parents and communities to serve as a source of both support for and pressure on PSCI schools, they needed to not only participate but also understand the PSCI process, their own roles, and the content of options presented by applicant teams. Our data suggest that this did not occur as hoped. First, the district and its partners struggled to get parents and community members to participate in PSCI. Attendance at district-sponsored community workshops was generally low, with an average of 64 parents attending each meeting in PSCI 2.0 (school enrollments ranged from 700 to 3,800) (Beltran, Cruz, Guevara, Holmquist, & Logan, 2011). In addition, a report from the League of Women Voters of Los Angeles estimated that fewer than 1% of parents of students at the PSCI schools and feeder schools voted in the 2.0 advisory vote (Beltran et al., 2011). As Superintendent Deasy (2011) acknowledged, this vote tally "is not a reliable data point and in no way provides an adequate indication of what parents want for their school" (p. 2). With few parents attending each meeting and often only attending one in a series of meetings offered, Deasy noted, "We found it challenging to go deeper in our conversations, as we often had a new group of parents at each meeting" (p. 1). Similarly, a report from the nonprofit Families In Schools (FIS) reported that in PSCI 2.0 there were "insufficient opportunities for parents to learn and comprehend complex information regarding school performance and school plans in order to make an informed decision" (Patterson & Cruz, 2011, p. 9).

In addition, the district and its partners struggled with politically motivated mobilization of parents in the advisory vote in Phase I. FIS documented some of the negative behaviors occurring during the advisory vote in PSCI 2.0, including "23 reports of voter intimidation, disruption or electioneering at 11 sites" (Patterson & Cruz, 2011, p. 5). Based on 140 hours of observation by community volunteers at all 13 voting sites, FIS concluded that "electioneering continues to be a significant problem despite efforts by the LAUSD and the LWVLA [League of Women Voters of Los Angeles] to thwart such activities" (p. 9). In fact, many district leaders and local media condemned the advisory vote as a failure (see "Incomplete Grade," 2011). As one Board member explained:

A lot of parents were really engaged and involved, but so many were turned off . . . they saw the politics, they saw the polarization, and they didn't like what they saw. . . . We heard over and over again: They didn't understand the information, they were getting lied to, they felt manipulated, they didn't know who to believe.

In response, district leaders significantly modified parent engagement efforts in Phase II. Rather than voting, parents offered feedback by using a scoring rubric with a four-point scale to rate plans in several areas (e.g., student expectations, school culture, turnaround strategies) and by providing written qualitative feedback. Informational sessions were replaced with "academies" intended to train parents to analyze school data and make informed decisions about school options. In many respects, these changes greatly improved the process over time. The parent academies ran fairly smoothly and rarely experienced the level of conflict witnessed in the early years of the advisory vote. Yet, despite the efforts of district leaders, administrators, and partners and despite significant investments in training, materials, and staff, our Phase II case data indicate that low participation rates, concerns about representation, and low levels of understanding persisted.

Our research also indicates that several factors contributed to the patterns we observed. Notably, variable quality of facilitation, limited time and information, and language barriers contributed to the uneven quality of engagement, while skepticism about the purpose of parent engagement (and about the extent to which leaders' plan selection decisions would be influenced by parent feedback), along with structural constraints (e.g., location) and cultural and class differences, inhibited the quantity and perceived representative nature of participation. While parent engagement was a challenge in both phases of PSCI, it remained a priority over time.
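To illustrate how the rubric-based feedback described above might be summarized for the superintendent's review, the sketch below aggregates hypothetical four-point ratings by plan and rubric area. The area names follow the examples in the text; the ratings and the mean-based summary rule are our own illustrative assumptions, not PSCI data.

```python
# Aggregate hypothetical parent rubric ratings (1-4 scale) by plan and area,
# mirroring the Phase II feedback process described in the text.
from collections import defaultdict
from statistics import mean

# (plan, area, rating) tuples as they might be collected at a feedback meeting.
ratings = [
    ("Plan A", "student expectations", 4), ("Plan A", "school culture", 3),
    ("Plan A", "turnaround strategies", 3), ("Plan B", "student expectations", 2),
    ("Plan B", "school culture", 4), ("Plan B", "turnaround strategies", 2),
]

by_plan_area = defaultdict(list)
for plan, area, score in ratings:
    by_plan_area[(plan, area)].append(score)

# Report the mean rating and count per plan and area.
for (plan, area), scores in sorted(by_plan_area.items()):
    print(f"{plan} | {area}: mean {mean(scores):.1f} (n = {len(scores)})")
```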

Conclusions and Implications

Our research indicates that, in the first four years of PSCI, LAUSD experienced many challenges and some successes in implementing the plan-development and selection processes as originally intended. On the one hand, PSCI leaders scaffolded the plan-development process with an array of supports from multiple organizations, generally selecting plans based on quality and ensuring transparency at each stage of the process. On the other hand, the scale and complexity of PSCI proved to be formidable for district administrators and partners, fraught with challenges that weakened several of the key mechanisms of change. Most notably, leaders encountered difficulty communicating with and engaging parents and community members, attracting sufficient numbers of applicants for all schools, maintaining the perception of fairness, and ensuring that competition did not yield unintended consequences.

Our findings have several important implications for policy makers and practitioners in and outside of Los Angeles. First, the findings indicate that when implementing a portfolio model, districts need time to develop a multitude of new policies, processes, and practices. The misunderstandings and reported confusion that our research uncovered, particularly in Phase I, suggest that more planning time might have improved the consistency of central office messages about the reform initiative. And while transparency is a worthy goal, it may come at a cost. The trade-off between access to information and possible inconsistency has to be factored into the development process of any reform, particularly ones as complex as PSCI. Added time for planning prior to public dissemination of information and prior to the implementation process may avoid some of these communication problems early on. Moreover, districts will then need to adapt these policies, processes, and practices to their specific and often-changing local contexts. The adaptation process requires complex negotiations with multiple stakeholders and decision makers. It is important that policies calling for fast results do not penalize districts for trying innovative and perhaps untested strategies and modifying them over time (as witnessed particularly with the PSCI parent engagement processes). The changes made to PSCI in Phase II illustrate the potential instability that results from leadership change and financial distress in the course of reforms. It will be interesting to observe over time whether persistent budgetary crises and leadership turnover impede other districts' efforts to sustain portfolio management, which by its nature threatens key interests of traditional political actors in district reform.

Second, many of the specific challenges encountered in LAUSD speak to potential roadblocks for others seeking to implement portfolio reforms. For example, some have raised questions about system-wide and citywide capacity to take over failing schools. Research indicates that large charter management organizations did not jump at the opportunity to restart schools under the federal School Improvement Grants Program (Zehr, 2011).

Some of these organizations admitted that the difficulty of improving an existing school, compared with creating a new school that families have chosen to attend, lessened the appeal of the turnaround option. Our data on the decreasing supply of PSCI applications affirm that there may be justifiable concerns about the number of organizations interested, willing, and able to take on the turnaround challenge.

Moreover, LAUSD's struggle with PSCI's community and parent engagement mechanism demonstrates that it is difficult to obtain input and achieve consensus about what strategies best fit a particular school community. It also shows that the complexity increases as more stakeholders are involved in decision making. Developing mechanisms that ensure effective parent and community engagement in portfolio reforms is a lingering challenge that deserves further attention in research and practice. Districts considering similar reforms should anticipate potential language issues and consider investing in the development of unbiased, high-quality information and opportunities for stakeholder engagement that provide sufficient time and support to ensure understanding.

Our findings echo key themes in the extant literature, specifically the research on district reform and on public-private contracting. Consistent with research on district reform in general (e.g., Garda, 2011; Gittell, 1994; Hess, 1999; Marshall, Mitchell, & Wirt, 1985; Shipps, Kahne, & Smylie, 1999; Stone, Henig, Jones, & Pierannunzi, 2001), our research on PSCI indicates that the implementation of portfolio reforms is a deeply political undertaking. Prospects of losing one's school or one's job (in the case of a charter school taking over) or of gaining more autonomy (in the case of a school adopting a governance model or waivers from district policies or collective bargaining provisions) create high stakes and strong incentives for stakeholders to mobilize to protect their interests. Leaders planning to adopt such reforms should anticipate their highly charged nature and invest in ways to ensure neutrality (e.g., through independent monitoring of the process), to counteract potentially divisive practices and unintended consequences (e.g., by countering misinformation, penalizing electioneering), and to create a level playing field (e.g., by providing equal access to high-quality technical assistance).

Echoing the literature on public-private sector contracting, our findings highlight the challenges for districts in maintaining an adequate supply of high-quality school operators, developing the capacity to hold operators accountable, and adjusting to new political contexts/structures resulting from the shift from control to contracting. First, the limited supply of qualified applicants in a competitive context like PSCI can be a challenge when a few organizations enjoy a relatively large market share. And as a relatively small number of contractors provide services, ambiguities in power and control may arise. In deregulated contexts like the portfolio model, Henig (2010) has described the shift toward contracting regimes, which emphasize the role of political actors (including superintendents, mayors, foundations, and partners) over the traditionally powerful role of school boards, unions, and district administrators or of parents in market-based reforms. These changes may lead to ambiguities regarding control and accountability of contracted schools. As districts seek to hold contractors accountable, new power dynamics may contribute to the difficulty that districts face in creating the capacity to monitor and evaluate school performance (DiMartino, 2014). As Sclar (2001) has suggested, government reorganization to manage contractors may not result in a leaner government and may not meet the theoretical expectations of effectiveness. There are also substantial political implications of these changes in district management, as alterations in power, authority, and regulation occur under the portfolio model.

Our findings present a paradox in the enactment of PSCI and the conflicts among various levers of change. By design, the initiative embraced the importance of engaging stakeholders, particularly community members, and viewed their support and pressure as a critical mechanism for enhancing the quality of school plans and ultimately student learning. Thus, time spent on external outreach could be viewed as appropriate, if not essential. Yet the competitiveness and high stakes of the final decision may have led participants to behave in ways that attracted support for their plans at the expense of quality. Widespread perceptions about fairness and politics also affected buy-in for the initiative and could have even greater long-term effects on participants' willingness to invest in the process or focus on the quality of educational programs presented in school plans.

As the district and its partners made modifications in the design of the initiative to address implementation challenges over time, the relative emphasis on various levers of change shifted. Most notably, design revisions resulting from the district-union negotiations decreased the emphasis on market-oriented reform mechanisms in PSCI by greatly limiting the scope of competition and increased the importance of autonomy and capacity building as strategies for improvement. Although such adjustments may meet some school needs, it is unclear to what extent they alter the initiative's strength and efficacy and what influence they may have on the other levers of change and outcomes. As districts seek to respond to implementation challenges, they should carefully consider the influence of modifications on each lever and on the overall theory of change.

In the coming years, it will be important to examine the implementation and effects of the reform with the policy changes discussed in this chapter. As these reform concepts grow in popularity, it will also be important to build on the successes and challenges of other districts implementing portfolio reforms across the country to inform policy decisions at the federal, state, and local levels. The research base is slowly growing (e.g., Bulkley et al., 2010; Christman et al., 2006; Gyurko & Henig, 2010; Hill, 2011; Levin, 2010; Menefee-Libey, 2010; O'Day et al., 2011). LAUSD's PSCI provides an excellent opportunity to study how one large urban school district initially implemented a portfolio reform mechanism, learned from the challenges that arose, and continuously adapted its policies to address the needs of its students and other stakeholders. The ultimate test of the efficacy of PSCI lies in the years ahead. Our ongoing research aims to answer questions about the eventual outcomes of PSCI.

Notes

1. Under No Child Left Behind, PI 3+ schools are those identified for program improvement (PI) for three or more years because of failing to achieve Adequate Yearly Progress targets for four or more consecutive years. In California, all Title I–funded schools that do not make Adequate Yearly Progress for two consecutive years or more are identified for PI—often referred to as "In Need of Improvement."

2. In a few cases in each cohort, teams submitted identical plans for several schools. If there were multiple team members, we asked a different team member to respond for each school. When a single individual wrote several identical plans, we asked the individual to respond regarding his or her experience.

3. In August 2010, the district and partners received nearly $5 million in federal funds through the Investing in Innovation Fund grant competition and $1 million in matching private funds, which were used to bolster support for the development and implementation of school plans and to develop new accountability processes for PSCI schools. Grant partners included LAUSD, Unite-LA (an affiliate organization of the Los Angeles Chamber of Commerce), and the United Way of Greater Los Angeles.

4. In Phase I, however, the parent advisory vote and the superintendent's panel did not select significantly higher quality plans, while initial reviewers did select higher quality plans. In contrast, in Phase II, the initial reviewers did not select significantly higher quality plans.

References

Beltran, R., Cruz, S., Guevara, M., Holmquist, E., & Logan, R. (2011). Los Angeles Unified School District Public School Choice 2.0 advisory vote recommendation process. Los Angeles: League of Women Voters of Los Angeles.

Berends, M., Bodilly, S. J., & Kirby, S. N. (2002). Facing the challenges of whole-school reform: New American Schools after a decade. Santa Monica, CA: RAND.

Bodilly, S. J., Chun, J., Ikemoto, G., & Stockly, S. (2004). Challenges and potential of a collaborative approach to education reform. Santa Monica, CA: RAND.

Bryk, A. S., & Schneider, B. (2002). Trust in schools: A core resource for improvement. New York: Russell Sage.

Bulkley, K. E. (2010). Introduction: Portfolio management models in urban school reform. In K. E. Bulkley, J. R. Henig, & H. M. Levin (Eds.), Between public and private: Politics, governance, and the new portfolio models for urban school reform (pp. 3–26). Cambridge, MA: Harvard Education Press.

Bulkley, K. E., Christman, J. B., & Gold, E. (2010). One step back, two steps forward. In K. E. Bulkley, J. R. Henig, & H. M. Levin (Eds.), Between public and private: Politics, governance, and the new portfolio models for urban school reform (pp. 127–164). Cambridge, MA: Harvard Education Press.

Center on Reinventing Public Education. (2014). Portfolio strategy: Network. Retrieved from http://www.crpe.org/research/portfolio-strategy/network

Christman, J. B., Gold, E., & Herold, B. (2006). Privatization "Philly style": What can be learned from Philadelphia's diverse provider model of school management? Philadelphia: Research for Action.

Chubb, J., & Moe, T. (1990). Politics, markets, and America's schools. Washington, DC: Brookings Institution Press.

Corcoran, T., Fuhrman, S. H., & Belcher, C. L. (2001). The district role in instructional improvement. Phi Delta Kappan, 83, 78–84.

Deasy, J. (2011, April 15). Recommended changes to the PSC process [Interoffice correspondence]. Los Angeles: Los Angeles Unified School District, Office of the Superintendent.

DiMartino, C. (2014). Navigating public-private partnerships: Introducing the continuum of control. American Journal of Education, 120(2), 257–282.

Edmonds, R. R. (1979). Effective schools for the urban poor. Educational Leadership, 37(1), 5–24.

Flores Aguilar, Y. (2009, March 25). Public school choice: A new way at LAUSD [Motions/resolutions presented to the Los Angeles City Board of Education for consideration]. Retrieved from http://www.lausd.k12.ca.us/lausd/board/secretary/BoardMembers/floresaguilar/PublicSchoolChoice.doc

Fruchter, N., & McAlister, S. (2008). School governance and accountability: Outcomes of mayoral control of schooling in New York City. Providence, RI: Brown University, Annenberg Institute for School Reform.

Gamoran, A., Nystrand, M., Berends, M., & LePore, P. C. (1995). An organizational analysis of the effects of ability grouping. American Educational Research Journal, 32(4), 687–715.

Garda, R. (2011). The politics of education reform: Lessons from New Orleans. Journal of Law and Education, 40(1), 57–103.

Gittell, M. (1994). School reform in New York and Chicago: Revisiting the ecology of local games. Urban Affairs Quarterly, 30(1), 136–151.

Gyurko, J., & Henig, J. R. (2010). Strong vision, learning by doing, or the politics of muddling through? In K. E. Bulkley, J. R. Henig, & H. M. Levin (Eds.), Between public and private: Politics, governance, and the new portfolio models for urban school reform (pp. 91–126). Cambridge, MA: Harvard Education Press.
Henig, J. R. (2010). Portfolio management models and the political economy of contracting regimes. In K. E. Bulkley, J. R. Henig, & H. M. Levin (Eds.), Between public and private: Politics, governance, and the new portfolio models for urban school reform (pp. 27–53). Cambridge, MA: Harvard Education Press.
Hess, F. M. (1999). Spinning wheels: The politics of urban school reform. Washington, DC: Brookings Institution Press.
Hill, P. T. (2011). Leadership and governance in New York City school reform. In J. A. O’Day, C. S. Bitter, & L. M. Gomez (Eds.), Education reform in New York City: Ambitious change in the nation’s most complex school system (pp. 17–32). Cambridge, MA: Harvard Education Press.
Hill, P. T., & Campbell, C. (2011). Growing number of districts seek bold change with portfolio strategy. Seattle, WA: University of Washington, Bothell, Center on Reinventing Public Education.
Hill, P. T., Campbell, C., & Gross, B. (2012). Strife and progress: Portfolio strategies for managing urban schools. Washington, DC: Brookings Institution Press.
Honig, M. I. (2009). No small thing: School district central office bureaucracies and the implementation of new small autonomous schools initiatives. American Educational Research Journal, 46(2), 387–422.
Humphrey, D. C., & Shields, P. M. (2009). High school reform in Chicago Public Schools: An overview. Menlo Park, CA: SRI International.
Incomplete grade—LAUSD still has work to do to make the school reform process functional. (2011, July 21). Los Angeles Daily News. Retrieved from http://www.dailynews.com/ci_18517193?source=most_viewed
Kemple, J. J. (2011). Children First and student outcomes: 2003–2010. In J. A. O’Day, C. S. Bitter, & L. M. Gomez (Eds.), Education reform in New York City: Ambitious change in the nation’s most complex school system (pp. 255–291). Cambridge, MA: Harvard Education Press.
Kerchner, C. T., Menefee-Libey, D. J., Mulfinger, L. S., & Clayton, S. E. (2008). Learning from LA. Cambridge, MA: Harvard Education Press.
Lake, R. J., & Hill, P. T. (2009). Performance management in portfolio school districts. Seattle, WA: University of Washington, Bothell, Center on Reinventing Public Education. Retrieved from http://www.crpe.org/publications/performance-management-portfolio-school-districts
Levin, H. M. (2010). A framework for designing governance in choice and portfolio districts: Three economic criteria. In K. E. Bulkley, J. R. Henig, & H. M. Levin (Eds.), Between public and private: Politics, governance, and the new portfolio models for urban school reform (pp. 217–250). Cambridge, MA: Harvard Education Press.
Mac Iver, D., & Mac Iver, M. (2006, April). Effects on middle grades’ mathematics achievement of educational management organizations (EMOs) and new K–8 schools. Paper presented at the annual meeting of the American Educational Research Association, San Francisco.

Marsh, J. A. (2002). How districts relate to states, schools, and communities: A review of emerging literature. In A. M. Hightower, M. S. Knapp, J. A. Marsh, & M. W. McLaughlin (Eds.), School districts and instructional renewal (pp. 25–40). New York: Teachers College Press.
Marsh, J. A. (in press). The political dynamics of district reform: The form and fate of the Los Angeles Public School Choice Initiative. Teachers College Record.
Marsh, J. A., Hamilton, L., & Gill, B. (2008). Assistance and accountability in externally managed schools: The case of Edison Schools, Inc. Peabody Journal of Education, 83(3), 423–458.
Marshall, C., Mitchell, D., & Wirt, F. (1985). Influence, power, and policy making. Peabody Journal of Education, 62(4), 61–89.
Massell, D. (1998). State strategies for building local capacity: Addressing the needs of standards-based reform. Philadelphia: University of Pennsylvania, Consortium for Policy Research in Education.
McDermott, K. A. (2006). Incentives, capacity, and implementation: Evidence from Massachusetts education reform. Journal of Public Administration Research and Theory, 16(1), 45–65.
McLaughlin, M. W. (1987). Learning from experience: Lessons from policy implementation. Educational Evaluation and Policy Analysis, 9(2), 171–178.
Menefee-Libey, D. (2010). Neoliberal school reform in Chicago? In K. E. Bulkley, J. R. Henig, & H. M. Levin (Eds.), Between public and private: Politics, governance, and the new portfolio models for urban school reform (pp. 55–90). Cambridge, MA: Harvard Education Press.
O’Day, J. A., Bitter, C. S., & Talbert, J. E. (2011). Introduction to the volume and Children First. In J. A. O’Day, C. S. Bitter, & L. M. Gomez (Eds.), Education reform in New York City: Ambitious change in the nation’s most complex school system (pp. 1–15). Cambridge, MA: Harvard Education Press.
Patterson, K. Y., & Cruz, O. E. (2011). A report on the Public School Choice 2.0 advisory vote process. Los Angeles: Families In Schools.
Roderick, M., Nagaoka, J., & Allensworth, E. (2006). From high school to the future: A first look at Chicago public school graduates’ college enrollment, college preparation, and graduation from four-year colleges. Chicago: Consortium on Chicago School Research.
Sabatier, P., & Mazmanian, D. (1979). The conditions of effective implementation: A guide to accomplishing policy objectives. Policy Analysis, 5(4), 481–504.
Sclar, E. D. (2001). You don’t always get what you pay for: The economics of privatization. Ithaca, NY: Cornell University Press.
Shipps, D., Kahne, J., & Smylie, M. A. (1999). The politics of urban school reform: Legitimacy, city growth, and school improvement in Chicago. Educational Policy, 13(4), 518–545.
Snipes, J., Doolittle, F., & Herlihy, C. (2002). Foundations for success: Case studies of how urban school systems improve student achievement. New York: MDRC.
Spillane, J. P., & Thompson, C. L. (1997). Reconstructing conceptions of local capacity: The local education agency’s capacity for ambitious instructional reform. Educational Evaluation and Policy Analysis, 19(2), 185–203.

Stone, C. N., Henig, J. R., Jones, B. D., & Pierannunzi, C. (2001). Building civic capacity: The politics of reforming urban schools. Lawrence: University Press of Kansas.
Strunk, K. O., Marsh, J. A., Duque, M. R., & Bush, S. (2013, May 1). The best laid plans: An examination of school plan quality, selection, and implementation in Los Angeles Unified School District. Paper presented at the annual meeting of the American Educational Research Association, San Francisco.
Zehr, M. A. (2011). Few big-name charter operators opt for federal “restart” grants. Education Week, 30(22), 14–15.

Chapter 6

Common Core, Uncommon Theory of Action: CEOs in New York City Schools

Priscilla Wohlstetter, Brandon Buck, David M. Houston, and Courtney O. Smith
Teachers College, Columbia University

The Common Core State Standards (CCSS) are a set of national standards that outline the basic architecture of skills for English language arts (ELA) and mathematics from kindergarten through 12th grade. Aiming to prepare all students to succeed in college or a career, regardless of their district or state, the standards demand more rigorous academic skills and greater depth of knowledge than most of the state standards they replaced. The CCSS initiative (http://www.corestandards.org) was led by two organizations—the Council of Chief State School Officers and the National Governors Association. Developed in 2009 by experts and state leaders from across the United States, the standards were finalized and released in 2010 after extensive input from teachers, school administrators, and parents (for a comprehensive account of the Common Core’s development, see Rothman, 2011). To enhance political palatability, the CCSS were presented as voluntary standards that states could adopt or decline, rather than as a federal initiative or a top-down mandate.1

In accord with the standards movement of the 1980s and 1990s, the creators of the CCSS set forth what students should know and be able to do at each grade level in ELA and mathematics but did not stipulate how states, districts, and schools should meet those requirements. Educators, accordingly, have been challenged to find creative and innovative ways to incorporate the CCSS into their pedagogies and practices (Hess & McShane, 2014). Implementation of the CCSS calls for aligned curriculum materials and assessments, shifts in instructional practice, reimagined professional development, and a coherent set of incentives and supports. These demands have instigated a dynamic and somewhat turbulent policy landscape. And so, despite early support, it is not surprising that the CCSS have more recently been subject to a flurry of criticism (Gewertz, 2013; Trotter, 2014; Ujifusa, 2013).

For example, in Educational Researcher, McDonnell and Weatherford (2013) investigated “why there was so little initial opposition [to the CCSS] and why it reached significant levels in the past year” (p. 488). Their analysis employed theories of policy and political learning. In this chapter, we invite reflection on that question from a different angle, using theory of action (Argyris & Schon, 1974; Malen, Croninger, Muncey, & Redmond-Jones, 2002) as a conceptual framework.

Little empirical research has analyzed the rollout and implementation of the CCSS at the district and school levels. Our exploratory study is intended to address that gap through an investigation of the New York City school system—by all accounts one of the forerunners in implementing the Common Core (Garland, 2013; Wingert, 2013). With particular attention to the context in which school principals function, our aim in this chapter is to provide insight into New York City’s theory of action (Argyris & Schon, 1974) surrounding the implementation of the Common Core Learning Standards (CCLS), New York State’s version of the Common Core.2 We pursue this aim in two ways. First, we investigate whether and how the district’s espoused theory of action, or express policy, aligns with its theory in use, or observed patterns of practice (Argyris, Putnam, & Smith, 1985). Second, we describe and analyze revealed discrepancies between key assumptions and patterns of practice. In brief, we find that with respect to our sample of high-performing schools, the district’s theory in use aligns very closely with its espoused theory of action; with respect to the low-performing schools in our sample, the espoused theory of action does not similarly hold.

Given the focus of the present volume on “improving districts under pressure,” theory-of-action research provides a sound frame for our study, for three reasons. First, inquiry into a district’s theory of action can serve to highlight dilemmas and contradictions between philosophical principles and professional practice (Kerr, 2013). In this way, theory-of-action research can prompt greater reflection, calling practitioners to question and revise key assumptions guiding practice (Savaya & Gardner, 2012). Second, working to reveal theory in use, that is, “what really happens” (Patton, 1990, p. 107), as opposed to simply studying the official version of the story (what we have been calling the “espoused” theory), can help identify areas for further research and future investigation. Third, as Malen and colleagues (2002) have suggested, theory-of-action research can “unearth information on the ‘substantive promise’ dimension of policy,” that is, the expected results that justify the costs of a given reform (2002, p. 114). The research findings we present here identify much that is promising but also uncover a host of challenges that might help illuminate why “advocates worry implementation could derail Common Core” (Gewertz, 2013). In any event, we hope that this study can inform subsequent implementation efforts in other states and districts that are tackling the same set of challenges.

In the sections that follow, we first review CCSS implementation activities nationwide to provide context for our study of New York City. We then provide a brief overview of the reforms that were implemented under Mayor Michael Bloomberg in order to sketch three foundational pillars of the district’s espoused theory of action. Specifically, we describe how strong central leadership, school support networks, and accountability mechanisms are designed to mediate and structure school “CEO” autonomy. (In New York City, Mayor Bloomberg employed the term CEO, for chief executive officer, common in the private sector, to refer to the principals’ enhanced decision-making roles at school sites.) Next, we present our research design and data collection methods, followed by our findings. Our findings describe how the three foundational pillars of the district’s espoused theory of action are expressed in the district’s theory in use regarding the Common Core. More specifically, we describe how strong central leadership, school support networks, and accountability mechanisms interact with—and influence—school CEOs as they pursue implementation of the CCSS. In the findings section we also underscore how certain assumptions implied in the district’s espoused theory of action do not necessarily hold in practice (this is especially true in the low-performing schools in our sample). That is, we show how certain core assumptions simply did not come to fruition. We conclude by outlining a more grounded set of assumptions and identifying key questions to guide future research.

National Context: Common Core Activities in the Adopting States

Transitioning to the Common Core

The Common Core State Standards respond to three problems with past attempts at establishing educational standards in the United States. The first problem was the widespread perception that previous state standards lacked sufficient curricular depth. Similar standards reforms launched in the 1980s and 1990s were often criticized for being “a mile wide and an inch deep” (Schmidt, 2004, p. 1). In response, CCSS drafters reduced the number of overall skill areas covered and required educators to explore them in greater detail to promote deeper learning. Table 1 summarizes the key instructional shifts highlighted in the move to the Common Core.

Table 1. Instructional Shifts Demanded by the Common Core Standards

Literacy Shifts
1. Balancing Informational and Literary Texts: Students read a balance of informational and literary texts.
2. Knowledge in the Disciplines: Students build knowledge about the world (domains/content areas) through text rather than the teacher or activities.
3. Staircase of Complexity: Students read the central grade-appropriate text around which instruction is centered. Teachers are patient and create more time, space, and support in the curriculum for close reading.
4. Text-Based Answers: Students engage in rich and rigorous evidence-based conversations about text.
5. Writing From Sources: Writing emphasizes use of evidence from sources to inform or make an argument.
6. Academic Vocabulary: Students constantly build the transferable vocabulary they need to access grade-level complex texts. This can be done effectively by spiraling like content in increasingly complex texts.

Math Shifts
1. Focus: Teachers significantly narrow and deepen the scope of how time and energy are spent in the math classroom, in order to focus deeply on, and only on, the concepts that are prioritized in the standards.
2. Coherence: Principals and teachers carefully connect the learning within and across grades so that students can build new understanding on foundations built in previous years.
3. Fluency: Students are expected to have speed and accuracy with simple calculations; teachers structure class time and/or homework time for students to memorize core functions through repetition.
4. Deep Understanding: Students deeply understand and can operate easily within a math concept before moving on. They learn more than the trick to get the answer right. They learn the math.
5. Application: Students are expected to use math and choose the appropriate concept for application even when they are not prompted to do so.
6. Dual Intensity: Students are practicing and understanding. There is more than a balance between these two things in the classroom. Both are occurring with intensity.

Note. Adapted from New York State Department of Education website, 2012 (https://www.engageny.org/).


The second problem with previous state standards that the CCSS were designed to solve was the incomparability of standards, and of the resulting student outcome data, across states. Under the No Child Left Behind law (NCLB, 2001), states were allowed to establish their own standards, assessments, and cut scores for levels of proficiency. This led to gamesmanship and considerable variation among states (Benson & McGill-Wilkinson, 2013; Rothman, 2011).

The third problem, which the CCSS sought to mitigate, was the longstanding resistance to top-down federal initiatives in education. Although the Obama administration incentivized the adoption of the CCSS by incorporating them into the Race to the Top application and the No Child Left Behind waivers, the National Governors Association and the Council of Chief State School Officers were the original drafters of the new standards. The drafters took pains to emphasize that the Common Core was a national rather than a federal project. Also, while the CCSS prescribe ultimate standards, they do not outline specific instructional or pedagogical demands, leaving school districts and teachers the freedom to design how to teach and sequence the standards in a wide variety of ways. In addition, state adopters of the CCSS may augment the standards with up to 15% more content (Kendall, Ryan, Alpert, Richardson, & Schwols, 2012).

New Standards, New Assessments

The new standards emerged during a time of growth in the use of standardized testing in education, as a result of NCLB. The shifts in assessment content and online delivery resulting from efforts to align with the new CCSS significantly raised the stakes for communities working to implement them. The federal government supported the development of two major new assessments aligned to the CCSS, which states had the option to adopt: the Partnership for Assessment of Readiness for College and Careers (PARCC) and the Smarter Balanced Assessment Consortium (SBAC). To date, the U.S. Department of Education has provided $362 million to PARCC and SBAC to develop Common Core–aligned assessments (Benson & McGill-Wilkinson, 2013). These assessments were field-tested in spring 2014 with the intention that states would administer finalized versions as end-of-year tests in spring 2015.

Two states, Kentucky and New York, were the first to embrace CCSS-aligned assessments at the state level. New York State developed and administered its own end-of-year test based on the Common Core in spring 2013, in anticipation of the federally financed options. This early-bird mentality may have given New York a leg up on the necessary curricular and instructional changes that were fast approaching. However, New York’s shorter and more pressurized timeline for Common Core implementation produced obstacles of its own (a phenomenon this chapter will explore in more detail).

States’ Responses to the Common Core

In response to the CCSS, states have undertaken a flurry of activity, seeking to get ahead of the new consortium-developed assessments and the accountability mandates that are expected to follow. Our recent review of state responses indicates broad similarities across CCSS implementation activities, which include planning and delivering standards-aligned professional development (37 states), conducting analyses comparing previous state standards with the CCSS (38 states), revising or creating curriculum and instructional materials more closely aligned to the CCSS (29 states), and redesigning teacher evaluation systems to match teacher observation frameworks with the CCSS and to incorporate student growth as a measure of teacher quality (42 states) (Council of Chief State School Officers, 2013; Education First & Editorial Projects in Education Research Center, 2013; Rentner, 2013).

Within these broad similarities are nearly innumerable variations in the style, focus, and pace of implementation among the states, as well as a few instances of reversals of direction and wholesale rejection of the standards. Local contexts call for a range of different strategies; and despite the similarities among activities across a majority of states, implementation progress remains uneven. For example, while many states were implementing the Common Core in all grades at once, Florida phased implementation gradually, beginning with kindergarten in 2011–2012 and first and second grades in 2012–2013. Third- through twelfth-graders experienced a blended curriculum, combining Florida’s old standards and the new CCSS. Although Michigan was one of the first states to adopt the Common Core in 2010, implementation came to a standstill in July 2013 when state legislators blocked funding for the standards amid concerns that they undermined local control of schools. The state legislature inserted language in the Michigan Department of Education’s FY 2013–2014 budget explicitly “prohibiting the department from using any funds in the implementation of the standards and Smarter Balanced Assessments” (Einhorn, 2015).3

Some states, such as Kentucky and Utah, moved quickly to generate and communicate curriculum guidance. In both of those states, formal networks were established, connecting district and school-level educators as they worked to create and disseminate CCSS supports surrounding professional development, curriculum and instructional materials, and promising practices. Other states moved less swiftly. Although California was a leader in standards-based reform in the 1980s and 1990s, Common Core proponents have complained that implementation statewide has been “slow and uneven” since the state adopted the CCSS in 2010 (Wingert, 2013).4 California only recently established an Instructional Quality Commission to create curriculum frameworks and other materials and to make recommendations for aligning components such as professional development, student assessments, and accountability. Moreover, this guidance was not scheduled for release until June 2014, far behind most states (Kirst, 2013).

Introduction to the Local Context: The New York City Department of Education’s Espoused Theory of Action

New York City has the largest public school system in the nation, with an enrollment of nearly 1.1 million students. The district’s size dwarfs those of other major U.S. cities. Close to 20% of New York City students participate in special education services, and over 40% come from a home where a language other than English is spoken (New York City Department of Education, 2013b). Seventy-five thousand teachers are dispersed across approximately 1,700 schools in the five city boroughs, making the New York City Department of Education (NYC DOE) a system of schools that is as complicated as it is vast (for a comprehensive account of recent reforms, see O’Day, Bitter, & Gomez, 2011). But apart from its size, what sets the New York system apart is the unique governance arrangement established under Mayor Bloomberg.

To sketch the district’s espoused theory of action, we will describe the logic that underwrote the Bloomberg reforms beginning in 2002. Soon after Bloomberg took office, the New York State Assembly established mayoral control of the city’s schools. The new mayor appointed Joel Klein, a seasoned antitrust attorney, to the NYC DOE chancellorship, and the new administration began taking steps to restructure the system. At the center of the reforms, principals were empowered as CEOs of their schools and gained responsibility for budgeting, staffing, and instructional programming. The basic idea behind the move, modeled after the for-profit sector, was that a large and inflexible bureaucracy had overburdened school principals and consequently stifled initiative, creativity, and achievement at the school level. Accordingly, the chief aim of the reform agenda was to unfetter school CEOs by granting them autonomy to lead their schools and greater freedom from bureaucracy. However, the new structure empowered CEOs but did not grant them autonomy carte blanche. Bloomberg and Klein deliberately established three foundational pillars to mediate and structure CEO autonomy: a strong central office, a robust support structure, and comprehensive accountability mechanisms. These pillars represented the primary elements of the district’s espoused theory of action. Accordingly, they are our key focus in this chapter. Figure 1 is our representation of the NYC DOE’s espoused theory of action.

Central Office Power

As Bloomberg and Klein pursued greater autonomy within each school, they simultaneously worked to empower the central office. The reforms served to “strengthen both the top [central office] and the bottom [CEOs] of the organization, but weaken[ed] the middle” (Hill, 2011, p. 20). Specifically, the reorganization efforts eliminated elected community school boards and reduced the size of district offices, trimming staffs from about 100 people per office to 10 or fewer (Hemphill & Nauer, 2010). By “weakening the middle” in this way, the reform helped to eliminate a significant part of the bureaucracy and cut costs by over 30%. Here, a core principle of the espoused theory of action was that a strengthened central office could set the agenda and establish targets for the entire district. Accordingly, the NYC DOE released an annual set of “Citywide Instructional Expectations,” a policy document outlining the scope and pace of Common Core and teacher development reforms and identifying the key goals that CEOs were expected to pursue.

Figure 1. New York City Department of Education’s espoused theory of action for the implementation of the Common Core Learning Standards. School leaders function as autonomous CEOs, mediated by three pillars. Pillar 1, Strong Central Office: the central office sets the agenda, pace, and scope of reforms and establishes targets for CEOs to attain (assumption: CEOs need a clear set of targets to pursue). Pillar 2, Robust Support Infrastructure: CFNs provide PD, promote collaboration across member schools, and give CEOs the support they need to achieve district goals (assumption: CEOs need assistance to achieve district targets). Pillar 3, Comprehensive Accountability Mechanisms: standardized tests and quality reviews measure the degree to which CEOs have achieved central office benchmarks (assumption: CEOs need to be held accountable for performance). CFN = Children First Network; PD = professional development.

School Support Networks

The Children First Networks (CFNs) constituted the second foundational pillar of the district’s espoused theory of action: All schools were required to partner with a Children First Network. The rationale behind school support networks was that while CEOs need autonomy to pursue achievement goals, autonomy does not mean isolation; strong leadership requires strong support. But to maintain the integrity of principals’ autonomy, CFNs had little formal authority over local schools. Thus, rather than deliver heavy-handed mandates, CFNs were designed largely to partner with schools in identifying needs in the school or to respond to CEOs’ self-identified needs (Wohlstetter, Gallagher, & Smith, 2013). CEOs typically invited CFNs into their schools to help generate a customized menu of both instructional and operational support, thus avoiding the imposition of an outside one-size-fits-all program. In effect, the arrangement turned accountability on its head (Wohlstetter, Gallagher, & Smith, 2013), making networks fundamentally responsible to school CEOs. In another shift from the previous school support structure, principals were asked to self-select their preferred CFNs, usually on the basis of a similar instructional focus, educational philosophy, or student population. Neither district boundaries nor geography therefore predicted CFN-school affiliation.5 All CFNs were similarly staffed, with about 12–14 instructional and operational personnel, serving approximately 30 member schools (see Figure 2 for an example of a CFN staffing structure).

Comprehensive Accountability Mechanisms

The final pillar of the district’s espoused theory of action was accountability. The core idea was that mechanisms should be in place to measure the degree to which CEOs have made progress in relation to district-established benchmarks. Despite significant cuts to the district’s bureaucracy, local district offices were not entirely abolished. Along with acquiring some authority over personnel and transactional decision making, the local district offices were made responsible for the quality review process. Comprising both quantitative and qualitative measures, the quality reviews were the primary data collection procedures for schools across the city. As such, they served as one of the chief accountability mechanisms for all schools throughout the district.

Along with quality reviews, the district placed a heavy emphasis on the New York State exams. The Bloomberg reforms coincided with the rollout of NCLB, and Bloomberg epitomized the decade’s focus on standardized assessment. Student achievement scores also became a highly important factor in the quality review assessment system. Schools that underperformed faced closure proceedings (Medina, 2009), and principals received bonuses if their student scores demonstrated the requisite progress (Gootman, 2008). Almost all of the Bloomberg reforms were designed with one aim in mind: to raise standardized test scores.

Figure 2. Sample staffing structure for Children First Networks. The network leadership (a network leader and a deputy network leader) oversees teams for instruction (achievement coaches), special education (an administrator of special education, with data/IT and special education support), operations (a director of operations; a budget and procurement manager; a director of human resources and payroll; and staff for food, transportation, and health), and student and family services (youth development, ELL, and network family point staff, along with attendance, safety, and suspensions). Not all networks are configured the same way. ELL = English language learners. Adapted from New York City Department of Education website (http://schools.nyc.gov/AboutUs/schools/support/default.htm).

Summary

In summary, the district’s espoused theory of action was designed to optimize school leaders’ autonomy as CEOs by mediating that autonomy with the three foundational pillars: a strong central office, robust support infrastructure, and comprehensive accountability mechanisms. The rationale implied in this model is plain. First, the district should provide a clear set of instructional goals and expectations for the principals, whose autonomous role is similar to that of private-sector CEOs in relation to profits and earnings targets. Second, CEOs should be provided with a robust support infrastructure to help them achieve the district goals. The support infrastructure should operate on an as-needed consulting basis and should not resemble a management apparatus. Third, comprehensive accountability mechanisms should measure progress toward district benchmarks to ensure school performance. Clearly, the espoused theory of action placed a great deal of faith in the capacity of school CEOs to effectively manage their schools, shift instructional practice, and navigate the district’s unique governance terrain. In what follows, we describe how this model played out during early implementation of the Common Core, and we also discuss which assumptions held and which were mistaken.

Study Methods: Research Questions and Data Sources

Using a multi-case, qualitative study design, our research was guided by the following three research questions:

1. In what ways is the district’s espoused theory of action expressed in its theory in use regarding implementation of the Common Core?
2. How do school CEOs interact with the three pillars of the district’s espoused theory of action?
3. What are the key assumptions implied in the district’s theory in use, and how are they reflected in observed patterns of practice?

To investigate our research questions, the research team conducted interviews in 2011–2012, in 2012–2013, and again during the 2013–2014 school year. Using semistructured interview protocols, we conducted a total of nearly 30 interviews across all levels of the education system. We conducted seven interviews with NYC DOE staff (past and present) in leadership positions relevant to CFNs, CCLS, and the Citywide Instructional Expectations. We determined our sample with guidance from the deputy chief academic officer in the NYC DOE, whom we asked to identify key persons throughout the district in charge of overseeing various aspects of implementation of the CCLS. We then conducted five interviews with cluster leaders,6 who supervised the CFNs and served as intermediaries between the central office and the CFNs. The central office and cluster interviews covered CCLS topics, such as the division of implementation responsibilities across the different levels of the NYC DOE, and local strategies for increasing the use of the CCLS by teachers.

We then asked central office staff to nominate networks that provided average levels of service or support to their schools, based on NYC DOE metrics of network performance. From these recommendations, we selected two CFNs. The first CFN contained 29 schools, 25 of which were located in Manhattan and 4 in the Bronx. The other CFN also contained 29 schools, with 20 in Queens and 9 in Brooklyn. We interviewed network leaders, along with achievement coaches for ELA and mathematics, special education, and English language learning. In total, 10 network team members were interviewed. We asked network leaders at each CFN to nominate member schools: one high-performing school (NYC Report Card grade: A or A-) and one low-performing school (NYC Report Card grade: D or F) within each network. This choice allowed us to compare CFN-school interactions in high-performing and low-performing schools within the same network, thus to some extent helping to mitigate selection bias. Moreover, we attempted to represent at least a portion of the city’s vast socioeconomic, racial, and cultural diversity in our choice of schools. We conducted interviews in schools located in Manhattan’s Lower East Side (low-performing), East Harlem (low-performing), an ethnically diverse neighborhood in western Queens (high-performing), and a more homogeneous, upper-middle-class neighborhood in northeastern Queens (high-performing). Table 2 indicates key characteristics of the participating schools.

Across the four schools, we interviewed (a) the principal at each school, (b) other members of the school leadership team, and (c) teachers, some of whom were leading CCLS implementation and some of whom were resisting it. At the network and school levels, the protocol questions focused on how CFNs worked with schools to build capacity to implement the CCLS. We also asked about the schools’ progress in implementing the CCLS in classrooms, inviting respondents to identify both successes and challenges. The interviews varied in length between 45 and 90 minutes; most of the shorter interviews were with teachers on-site at their schools.

All interviews were recorded, transcribed, and coded using Dedoose software. During the coding process, we first identified topics addressed in the interview protocols. Additional codes were then added in an iterative process, allowing themes to emerge from the data through a grounded theory approach. To triangulate our interview analyses, we reviewed archival documents and policies relevant to CFNs as well as to the implementation of the CCLS and Citywide Instructional Expectations.


Table 2. Characteristics of Sample Schools

High-Performing School 1 (Middle). Location: Queens (Western). Number of students: Large. Student race/ethnicity: 5% Black, 10% Hispanic, 19% White, 66% Asian. Other student characteristics: 6% ELL, 7% IEP, 50% eligible for free/reduced-price meal.

High-Performing School 2 (Middle). Location: Queens (Northeastern). Number of students: Medium. Student race/ethnicity: 15% Black, 41% Hispanic, 30% White, 14% Asian. Other student characteristics: 5% ELL, 11% IEP, 25% eligible for free/reduced-price meal.

Low-Performing School 1 (Elementary). Location: Manhattan (East Harlem). Number of students: Small. Student race/ethnicity: 45% Black, 45% Hispanic, 6% White, 3% Asian. Other student characteristics: 40% ELL, 24% IEP, 95% eligible for free/reduced-price meal.

Low-Performing School 2 (Elementary). Location: Manhattan (Lower East Side). Number of students: Small. Student race/ethnicity: 75% Black, 24% Hispanic, 1% Asian. Other student characteristics: 20% ELL, 18% IEP, 100% eligible for free/reduced-price meal.

Note. ELL = English language learner; IEP = individualized education plan. To maintain the confidentiality of our sample schools, we classified the student populations as follows: Small = fewer than 800 students; Medium = 800–1,199; Large = 1,200–1,800. Adapted from the New York City Department of Education, School Search (http://schools.nyc.gov/default.htm).

Findings: Common Core Implementation in New York City

The findings presented next describe the way the district’s espoused theory of action was reflected and expressed in its theory in use with respect to the early implementation of the CCLS (illustrated in Figure 3). In particular, we describe how a strong central office, robust school support, and comprehensive accountability mechanisms were expressed in the theory in use. For each foundational pillar of the espoused theory of action, we describe corresponding assumptions for how CEOs were expected to interact with that pillar. We show how the patterns of practice did not always align with the district’s core assumptions, and we detail notable differences between high- and low-performing schools.

Figure 3. New York City Department of Education’s theory in use for implementation of the Common Core Learning Standards. As in the espoused theory, school leaders function as autonomous CEOs, mediated by three pillars. Pillar 1, Strong Central Office: the CIE set the agenda for multiple reforms, including the Common Core and teacher evaluations (assumption: CEOs have the capacity to manage simultaneous major reforms). Pillar 2, Robust Support Infrastructure: support networks have two functions, communication and professional development (assumption: CEOs can exploit extant infrastructure to shift instructional practice). Pillar 3, Comprehensive Accountability Mechanisms: Common Core assessments become the primary focus, and quality review measures shift toward the new standards (assumption: CEOs have the resources and support they need to achieve district accountability benchmarks). CIE = Citywide Instructional Expectations.

Central Office Power: Citywide Instructional Expectations

In keeping with the district’s espoused theory of action, the Citywide Instructional Expectations (CIE) served a prominent role in the rollout of the CCLS. Beginning in 2011, one year after New York State adopted the CCLS, the CIE both prioritized key reforms throughout the city and outlined an integrated implementation strategy. With this single policy, the NYC DOE generated a significant degree of coherence and consistency across the city with respect to messaging and focal points for the school year to come. The CIE served as the district’s primary tool for setting the agenda and the pace of implementation by highlighting the fundamental elements of the reforms on which schools and networks should focus in the coming school year. One principal said he appreciated that the CIE consolidated messages coming from the central office: “It keeps us all on the same page.” In the course of many rapid, simultaneous reforms, the CIE aimed to provide clarity regarding the necessary first steps toward CCLS-aligned instruction and curricula, and it established how schools should prepare for the new teacher evaluation system. The scaffolded implementation timeline for the CCLS, as structured by the CIE, is displayed in Table 3.

Table 3. Timeline of New York’s Citywide Instructional Expectations

2011–2012: Prepare
- Students will engage in one literacy task and one math task aligned to the strategically selected Common Core standards.
- Teachers will work in teams to review student work to understand the steps needed to reach the level of performance that the Common Core demands.
- School leaders will provide teachers with meaningful feedback tied to an evidence-based rubric of teacher performance.

2012–2013: Prepare
- Students in Grades PK–5 will experience four Common Core–aligned units of study: two in math and two aligned to the literacy standards in ELA, social studies, and/or science.
- Students in Grades 6–12 will experience eight Common Core–aligned units of study: two in math, two in ELA, two in social studies, and two in science.
- Teachers will deepen their understanding of the CCLS and of the use of the Danielson Framework for Teaching to shift instructional practice.
- School leaders will focus teacher development on the CCLS.
- School leaders will refine terminology and strengthen understanding of what high-quality teaching looks like aligned to the Danielson Framework for Teaching.
- School leaders will conduct frequent formative observations and provide feedback and professional development to improve practice in identified competencies.
- Teachers and school leaders will share information on students’ CCLS work and progress toward college and career readiness with families.

2013–2014: Enact
- Students will experience rigorous instruction and learn content by engaging with standards-aligned curricula in all content areas.
- Teachers will, in all grades and content areas, plan and teach lessons and units that integrate the literacy and math CCLS instructional shifts.
- Teachers will adjust their lessons, units, and classroom assessments to address the gaps between what the standards require and what their students know and are able to do.
- School teams will create systems to regularly and collaboratively look for evidence of growth and gaps in student work and teacher practice in order to make adjustments.
- Teachers will actively participate in their own development, supported by the implementation of a new system of teacher evaluation.
- School leaders will review evidence of teacher effectiveness, including student work and teacher practice aligned to the Danielson Framework for Teaching, to inform teacher development as a part of the new teacher evaluation system.

Note. ELA = English language arts; CCLS = Common Core Learning Standards. Each year, the Citywide Instructional Expectations tool, designed to clarify the district-wide expectations for schools across New York City, outlines what school teams are called to do in service of the Common Core’s goal of preparing all students for college and career success. This table gives a brief summary of major portions of the Citywide Instructional Expectations from 2011 to 2014. Data from New York City Department of Education, 2012a, 2013c, 2014.

Many of the administrators and teachers we interviewed described the CIE as widely used and influential. In fact, all interviewees were familiar with both past and current versions of the CIE and regularly cited their influence in daily practice. A middle school ELA teacher underscored their value with respect to planning and preparation: “The CIE are usually released in the spring, and so we know our goals for the next school year. They help me figure out what I need to work on during the summer months.”

As shown in Table 3, during the 2011–2012 school year the CIE required teachers to engage all students in at least one literacy task and one math task aligned to Common Core standards, strategically selected citywide. According to an NYC DOE staff member,

The intent of the first set of CIE was to offer principals, teachers, and students a glimpse of the new grade-level expectations of the CCLS. Secondly, the student results on the performance tasks provided schools with tools they could then use to conduct gap analyses on student achievement against the new standards. Finally, our intention was that the results of the performance tasks would guide professional development at the school, CFN, and central office levels to support the instructional shifts needed to prepare students for the Common Core.

In the following year, 2012–2013, the CIE moved beyond isolated tasks, challenging schools to engage all students in two units aligned to literacy and math standards of the Common Core. This expansion called for more substantial instructional changes, which—by extension—prompted increased network professional development to assist teachers with, among other things, unit and lesson plan development, Common Core–aligned text selection, and shifting instructional practices. As one network leader said, “A lot of the work of the networks now is around providing support to our schools in the instructional initiatives that the central Department of Education has picked as priorities.” In 2013–2014, the annual CIE were released again, this time asking schools to build on their experiences with the Common Core in 2011–2013 and to expand toward comprehensive implementation across the system. This year’s CIE called for all ELA and math instruction from pre-K through ninth grade to align to the Common Core, while other subject areas were called to align to existing New York State standards and to incorporate the Common Core instructional shifts where appropriate.

At the same time that the CIE targeted shifts toward the CCLS, they also identified specific aspects of the Danielson Framework for Teaching on which CEOs should focus their attention with respect to teacher feedback and development. The Danielson Framework is a comprehensive tool used by school leaders to guide observations and feedback cycles on teachers’ instructional practices. It was included as part of the NYC DOE’s newly created teacher evaluation system in 2013, though use of the framework was highlighted in the 2011–2012 and 2012–2013 CIE in an effort to help schools develop familiarity with the rubric before it was used formally in the implementation of the new evaluation system.

As a recipient of both a Race to the Top grant in 2010 and an NCLB waiver in 2013, New York City worked to simultaneously implement the CCLS and redesign its teacher evaluation system. Framing the two reforms as equal and interdependent, the CIE signaled that the district’s main reforms were not separate and discrete new requirements; instead, they constituted a complementary, integrated strategy.

The idea was that, by focusing on particular aspects of instructional practice alongside CCLS alignment, these aspects of the system would work together to move both instructional practice and student work, two key levers in preparing students for college and career. Hence, the CIE highlighted specific instructional moves and the Common Core standards and shifts to gauge progress.

Assumption: CEOs can manage the simultaneous implementation of major reforms. The district’s theory in use regarding the CIE reflects the core idea embedded in the district’s espoused theory of action. However, a key assumption, namely, that all CEOs have the requisite dexterity and bandwidth to juggle multiple reforms, was not necessarily borne out in practice. To be sure, our interviews with principals and teachers in high-performing schools indicated that the CIE messaging was, on balance, largely successful. Significantly, however, we also uncovered patterns in which respondents from low-performing schools described a set of interactions with the CIE different from those of respondents from high-performing schools.

CEOs and teachers in the low-performing schools complained that the CIE generated too many demands in too short a time. In school-site interviews with CEOs at low-performing schools, the two major reforms outlined in the CIE were viewed merely as an additive, burdensome list of compliance requirements rather than a coherent—and achievable—strategy. In addition, as a new set of CIE was released each year, CEOs said they remained in perpetual motion on curriculum matters. One CEO expressed frustration at the large workload required to align the curriculum to the CCLS: “What is being asked of us is not something that can’t be done. But if the materials that we’re using don’t correlate with the new standards, then we’re the ones who have to fill in all the gaps. This whole process is like knitting a sweater while we’re wearing it!” A teacher at the same school confirmed the stress of formal classroom observations grounded in the Danielson Framework for Teaching: “If you don’t really understand the purpose and the expectations of what you’re teaching, then you just can’t feel prepared. It can be uncomfortable when [the principal] is in my classroom. I have too many jobs. How am I supposed to teach and develop curriculum at the same time?”

We asked the CEO at the other low-performing school about her approach to the multiple reforms. Her frantic cadence reflected the general tone of the interview and aligned with the complaints of others at her school:

Oh my God—special ed reform, the Danielson Framework, feedback to students, feedback to teachers, the units of study, the performance tasks, the new type of . . . it’s not “Acuity” anymore. What are those things called? Performance assessments. So, I mean, the list goes on and on and on and on. I sometimes leave my [CFN’s] principals’ meeting and I’ll text my AP [assistant principal] to say, “Okay, we gotta add this onto the list now.” It’s not that these things aren’t important. I just think that with our kids we just need time to focus on them. Less is sometimes more. I think what happens is, the teachers feel the pressure from all these demands competing for our time, and then the teachers get overwhelmed. There’s this fear that I’m not doing what I’m supposed to be doing, even though there are 20 things on our plate that we’re supposed to be doing. Reform becomes very, very difficult.

Respondents at the high-performing schools, on the other hand, generally viewed the CIE and the twin reforms far more favorably. One teacher suggested that she and the school CEO were on the same page with regard to how the CCLS and the new teacher evaluation system were related; she believed they served to enrich one another: “If I can shift how I teach more in line with Danielson and get positive feedback from my classroom observations, then I think I’ll be better at teaching the Common Core, and all my students will have a better chance of success.” The CEO at the other high-performing school made a similar point:

I know that some people really don’t enjoy the new teacher evaluation process. But I think if used in the right way, it will make my teachers much stronger. Regardless of whether the district requires it, I would still be evaluating my teachers to make sure they are incorporating the standards in their classroom practice. If we’re going to introduce a new set of standards, then we have to make sure teachers are doing it.

It is evident that while the district worked to advance clear and coherent targets for all schools across the city, school leaders and teachers interpreted the CIE in different ways. Some embraced the simultaneous reforms and were able to generate a coherent strategy toward implementation. Others believed the reforms were layered-on “too much, too soon.”

Network Support: Communication and Professional Development

To assist schools in advancing the CCLS, the CFNs occupied two key support roles. First, they served as the primary communication liaison between schools and the central office; second, the CFNs assisted schools with professional development support for school leaders and for their teachers (the two often facilitated one another). One network team leader described the communications role this way: “We’re the middleman. The central office defines the reform initiatives; we go out and make sure that our schools are aware of them and that they’re implementing them.” The CFN’s chief responsibilities with respect to professional development support included targeted learning opportunities for school staff, assisting with classroom observations of teaching practices, and demonstrating model lessons. In short, the network’s achievement coaches worked to provide concrete feedback for teachers in preparation for the new teacher evaluation system, as well as for the CCLS.

As communication liaisons, CFNs functioned to clarify Common Core expectations for schools while operating as a central component of the citywide feedback loop. In this respect, the flow of information was bidirectional. CFNs working in the schools had a better sense of CCLS implementation on the ground and could, therefore, share the concerns of CEOs with cluster leaders who, in turn, informed the central office. The CFNs in this way served as boundary spanners (Honig, 2006; Wohlstetter, Houston, & Buck, 2015), bridging diffuse governance apparatuses.

The CFN communication role was activated primarily through professional development trainings. During the trainings, CFNs worked to ensure that everyone was on the same page with respect to the CIE, including both the CCLS and the Danielson Framework. The CFNs largely offered two options for professional development trainings: some held centrally at the CFN offices, others based at individual member schools. The office-based trainings served to promote communication across and among member schools by bringing together educators in similar roles from across the network. At its best, this form of professional development fostered meaningful collaboration among schools because the trainings targeted specific content areas and student populations (e.g., English language learners, students with disabilities). The trainings provided an opportunity for teachers across the CFN’s schools to come together and share ideas, practices, and strategies.

Both the central and on-site trainings aimed to unpack the CCLS thematically so teachers could better incorporate the standards into the curriculum and into their own teaching practices. To that end, workshops often integrated topics related to implementing the Common Core and strengthening overall teacher effectiveness. According to a network achievement coach who specialized in ELA, “Part of what we [CFNs] do is go into classrooms in schools and try things out and talk about what we see; then we share observations with different teachers at different schools.”


Assumption: CEOs have the capacity to exploit extant infrastructure to shift instructional practice. The network-provided professional development activated a key element of the district’s espoused theory of action: While the CIE established the district’s implementation targets, the networks aimed to clarify the targets and to support schools in pursuit of them. But as observed in the schools’ reactions to the CIE, the CEO-CFN interactions played out differently in the high-performing and low-performing schools in our sample. The CEOs of high-performing schools were aware of and actively participated in the broad menu of centralized CFN trainings that were available. They regularly sent teams of administrators and teachers to the trainings and used the sessions to broaden their schools’ CCLS expertise across faculty members. By contrast, the low-performing schools tended not to attend central CFN trainings, and when they did, the same two people—the principal and assistant principal—went every time. They sent neither teachers nor specialized support staff.

With respect to site-based training, the same pattern emerged. Here, the disconnect might have been a consequence of how CFNs were structured. In keeping with the espoused theory of action, autonomous CEOs were expected to reach out directly to CFNs for training. The CFNs, on this account, functioned in a consulting capacity and were generally first solicited to address areas that the schools had specifically targeted internally. Our research findings indicate that CEOs at high-performing schools proactively identified areas where help was needed and then were very purposeful in their professional development requests. For these schools, collaboration between the schools and CFNs was initiated at the school leadership level. A CEO of a high-performing school described a communication with CFN staff: “Here are our school goals; this is what our needs are, so it’s not your agenda, it’s our agenda.” These CEOs typically invited CFNs into their schools to help teachers and administrators develop performance assessments aligned with the CCLS and to create enhanced CCLS-aligned lesson plans, among other things. The CEO at the other high-performing school explained the targeted support this way:

The way I worked it out with our CFN is to zoom in on what our goals are. So when we were focusing on ELA, we had one achievement coach come in. When we shifted to math, it was [another network member]. Right now, we’re focusing on our subgroups—special education students and English language learners, so we have been working with two other achievement coaches for the past year.


By contrast, the low-performing schools rarely initiated assistance from their CFNs. The CEOs at low-performing schools did not generally express a clear plan of action with respect to specific school needs in the pursuit of the focal areas of the CIE. And since there was no actionable game plan, there was no set of specific requests. While CEOs and other school leaders at low-performing schools indicated that they were receptive to CFN assistance when it was offered, little assistance was forthcoming, in part because of the expectation that assistance should first be requested by the principal. In the few instances that we uncovered in which the low-performing schools did use site-based CFN training, the sessions tended to be far more general and informational (e.g., introducing and orienting faculty to CCLS ELA) than specific and targeted. Sometimes, the CFNs approached the low-performing schools with unsolicited assistance, either because performance in a particular area posed a challenge or because a quality review was imminent; this practice, however, remained relatively rare.7

CFN staff indicated that they were strongly inclined to assist the low-performing schools, even when unsolicited, and they did not refrain from doing so merely because the expectation was for CEOs to reach out first. Rather, many CFN staff said it was very difficult to reach out with unsolicited assistance, in part because of the time and attention dedicated to providing the actively solicited assistance but also because of significant structural and organizational challenges. The CFNs we studied employed approximately 15 educators and professionals and were responsible for serving between 25 and 35 member schools. CFN staff members were responsible for only a subset of member schools, but even so, some schools were nearly ignored. Just attending to normal, day-to-day instructional support required an extraordinary amount of time. One ELA specialist was responsible for 30 schools; in a 40-hour work week, that amounted to just over one hour per school, per week. Even under ideal conditions, that is insufficient. As one network member said, "I'm simply not always capable of being in the schools the way I want to be. What I end up doing is a drive-by for the sake of accountability. With 30 schools, it is very difficult to take a deep dive [in]to any one school, or even a few schools."

The problem was compounded by geographic realities. The NYC DOE intentionally did not require networks to cluster by geographic location (Hemphill, Nauer, White, & Jacobs, 2013). Instead, it aimed to promote clustering based on other network features, such as a shared mission or a common instructional philosophy. But as New York City is expansive and geographically dispersed across five boroughs, travel between school sites can be time-consuming and cumbersome. One network leader listed member schools by geographic district, describing a broad swath of New York, jumping from borough to borough, neighborhood to neighborhood—four boroughs in all.
While some coordination was possible, a network stretching from Brooklyn to the Bronx undoubtedly posed significant challenges.

The use of the CIE coupled in this way with CFN professional development was based on the assumption that principals and teachers knew what they needed in order to achieve defined objectives and could exploit the extant infrastructure accordingly. This assumption can also be seen in the fact that the CFNs functioned first of all in a consulting capacity. That is to say, while the CFNs tried to provide general support to all of their affiliated schools, the expectation was that schools would reach out independently to identify specific kinds of assistance on an as-needed basis. And these practices, above all, reflect the espoused theory: Empowered and autonomous school CEOs should not be told what to do or what they need; rather, the CEO knows best what is required to succeed. Our findings, however, at least among CEOs at the low-performing schools, tell a more complicated story.

Accountability Mechanisms: Quality Reviews and Assessment

By 2012, the quality review protocol was revised to reflect prominent CCLS requirements contained in the CIE. The quality review, as mentioned earlier, was a process in which central administrators visited a school over multiple days to collect qualitative data for the school's overall evaluation. The review generated a picture of the practices and processes of each school, noting both school strengths and areas for growth. In this respect, the quality review augmented the progress report's predominantly quantitative metrics with qualitative data based on observations and document reviews.

The quality review rubric, highlighting practices aligned to the CIE, was especially valuable for CFNs because it served to clarify even further what was intended by the CIE. As one CFN team member noted, "When a school is getting ready for its quality review, I definitely spend more time with the principal." The high-performing schools partnered with their CFNs well in advance to hold mock quality reviews through which network members engaged in "part role playing and part coaching." A CFN team member explained:

We help schools with their self-assessment. [We say,] Let's now think of each grade level. What do you want to concentrate on with your teacher teams for this performance task? Let's think about the features on the quality review rubric before, while there is still time to get yourself ready. Let's make a plan for what you're going to do to be more ready.
In the 2012–2013 "instructional core" section of the quality review rubric, the shifts toward the Common Core were identified as fundamental indicators of school quality. In fact, the first two components of instruction that the quality review team looked for were (a) a shift in language from "state standards" to "Common Core Learning Standards," or "CCLS" and/or "content standards," and (b) the inclusion of instructional shifts that reflected a connection between the CCLS and classroom instruction. These shifts needed to incorporate native language scaffolds into CCLS teaching strategies, among other things (NYC DOE, 2012a, 2012b). The NYC DOE also added new questions to the 2012–2013 teacher survey to assess the quality of feedback that teachers were receiving to support their efforts to understand and implement the Common Core (NYC DOE, 2013a).

In addition, a strong component of the quality reviews measured progress on the new CCLS-aligned assessments. The district persistently emphasized the development of new CCLS assessment tools as another key fixture in the accountability framework. By 2012, the district had already contracted with Pearson (Otterman, 2011) to design and implement early CCLS pilot tests across the district in anticipation of the PARCC assessments that were scheduled to roll out in 2013.

Assumption: Principals and teachers have the resources and supports they need to achieve accountability benchmarks. Because the district's theory in use continued to put a premium on school autonomy, the NYC DOE emphasized the development of new student assessment tools but did not focus on the creation of new curricular resources. In this way, the theory in use aligned with the espoused theory: If the district establishes clear targets and benchmarks, unfettered CEOs will know best how to attain them. Schools were largely expected to generate their own curricular materials and programs to engage with the CIE and adhere to quality review requirements, while CCLS-aligned curricular options were made available for school selection in the spring of 2013.

The CEO autonomy implied in the district's espoused theory of action is underwritten by a key assumption that CEOs have the resources and supports they need to execute district aims and achieve accountability benchmarks. But both network members and CEOs invariably suggested that the emphasis on assessment came at the expense of much-needed curricular and instructional supports. That is, even CFN staff (the district's key support apparatus) expressed frustration at not being able to provide the requisite support to develop an entirely new CCLS-aligned curriculum with their member schools.
As one network member said, "The fact that the third- through eighth-grade [New York State] tests changed before curriculum was ready added some extra pressure to schools, to say the least." The CFNs simply were not designed for developing curriculum; their role was curriculum support, not curriculum development. They had neither the existing resources nor the capacity to build curriculum for, in many cases, 30 member schools.

Given the lack of district and CFN support in this area, CEOs and grade-level teams were forced to spend significant additional time developing curriculum from scratch and were consequently often distracted from normal day-to-day duties. In light of the new teacher evaluation system, this was especially trying. For example, a fourth-grade teacher who was also a grade-level leader complained that the additional planning responsibilities prevented teachers from going deeper into instructional issues in their team meetings:

So one period a week, approximately 45 to 50 minutes, we'll meet as a grade level to discuss business. . . . I think we should be discussing teaching practices. . . . But I guess this year's business was all about curriculum—unspiraling the math curriculum, figuring out how to align our ELA curriculum with the standards, curriculum writing, curriculum development. . . . I would prefer that we used the meeting time to discuss instructional practices.

Another teacher at a low-performing school expressed a similar point, that prior to the CCLS reforms, teachers and school leaders would spend more time together talking about individual students—who was struggling, who needed extra help. But since so much time was spent developing new curricular materials, these other conversations were no longer as central.

Despite the complaints, the NYC DOE did not completely abandon curricular support during the transition. In fact, the district created the Common Core Library (http://schools.nyc.gov/Academics/CommonCoreLibrary/default.htm), a website housing curriculum bundles, rubrics, student work samples, professional learning activities, and other classroom resources. In the early phases of implementation, however, the library was sparsely supplied. The CEO at one underperforming school complained that what few resources were available were irrelevant to her students and required too much supplementary instruction. She expressed particular dissatisfaction with the district's fourth-grade Farmer Fred math bundle, which she said required significant vocabulary support for urban students unfamiliar with farming terminology, such as irrigation, hectare, acre, and livestock. But as Farmer Fred was one of the few bundles available, they used it.
The New York State Department of Education had also launched EngageNY (https://www.engageny.org), another website intended to provide curricular resources for school leaders and teachers, in this case throughout the state. But while the CEOs and teachers in our study invariably complained about the district's Common Core Library, there was rarely mention of New York State's EngageNY website, which had been established with the good intention of open-sourcing Common Core–aligned materials. In effect, however, it constricted the number of materials available to schools because many publishers were unwilling to share their products free of charge. Thus, EngageNY had insufficient resources during the time that the CIE required implementation of CCLS-aligned tasks and other assessments.

Again, the CEOs at low-performing and high-performing schools expressed different sentiments. The CEOs at the two high-performing schools, for example, commissioned teacher teams to design their own units and performance assessments, relying partly on CFN staff but also tapping into curricular resources available from other states. And while teachers across all sample schools complained about the lack of NYC DOE curricular support, school leaders at the high-performing schools embraced the additional challenge, viewing it as a learning opportunity. Here is how one CEO put it:

Well, I think an unintended consequence of having to write our own units is that teachers get to learn and practice backwards design. So they get to learn what it is to create the assessments based on the big ideas. They have to be more precise in their curriculum design and lesson planning. Our [network] assessment trainer said he thought the work that our teachers had done was better than some of the Common Core work that's been done elsewhere. We're pretty proud of that.

Other teachers agreed that the opportunity to create new curriculum materials and bundles was worth the extra time, since they could better meet the specific needs of their students. One teacher said that he was actually pleased that they did not have to use the district's "spoon-fed" curriculum.


Conclusions

Our research findings make plain that the district's theory in use played out differently in high-performing and low-performing schools. It appears that the decisive assumption—on which the district's theory in use hinges—surrounds school and principal autonomy. The high-performing schools in our study were able to exploit autonomy to their own advantage. In particular, they took ownership of the district's established focal areas expressed in the CIE and knew how to use the CFNs to advance shared aims. In this way, they demonstrated the capacity to diagnose problems and develop a clear vision to remediate them. In short, the district's theory in use aligned best with the espoused theory of action where school leaders exhibited the requisite qualities to flourish with increased autonomy.

For the low-performing schools in our sample, increased autonomy did not similarly advance shared district aims. Low-performing schools did not demonstrate an ability to juggle multiple reforms simultaneously. As a consequence, they did not exploit extant CFN infrastructure in the same way. The espoused theory of action anticipated that school leaders would perform the bulk of the diagnostic and prescriptive work and that CFN members would assist in the pursuit of internally developed goals. Thus, patterns of practice often found in the low-performing schools departed significantly from the district's espoused theory of action.

CFN efforts were often also hampered by the geographic dispersion of member schools within the network. CFN staff expended considerable time and resources traveling between member schools. The purpose of geographic dispersion was originally, in part, to reduce the influence of local politics on schools (Hemphill et al., 2013; Nadelstern, 2012). It is not clear, however, whether the perceived gains of this model sufficiently outweighed its inefficiencies. Along with increasing travel time, geographic dispersion also diminished the likelihood that school-level leaders from different schools would interact on a regular basis. In this respect, our research signaled the possibility that CFNs were underutilizing a capacity to promote greater sharing across member schools.

Research on charter school networks (charter management organizations, or CMOs) has revealed that the home office often assumes a boundary-spanning role to facilitate sharing across school buildings—a model reinforced by deliberate geographic clustering (Wohlstetter et al., 2015; Wohlstetter, Smith, & Farrell, 2013). It is not uncommon, for example, to find CMOs that have Grades K–12 housed in the same building. In this respect, comparative analyses could prove instructive: How do CFNs compare with CMOs vis-à-vis sharing best practices across member schools? Does geographic clustering promote more shared practices? How can CFNs redeploy resources to disrupt the silo effect that can emerge among member schools?
To be sure, given our small sample size and the exploratory nature of this study, the inferences we draw are necessarily preliminary, tentative, and at best, suggestive. And we do not present the reductive thesis that observed differences in school performance are a consequence solely of differences in school leadership. It is not, for example, insignificant that the low-performing schools in our study were situated in low-income neighborhoods stressed by a plethora of other challenges. These schools served high numbers of low-income at-risk students, English language learners, and special education students. The school leaders, accordingly, had a range of increased responsibilities, beyond academics, incumbent on schools that operate in such contexts. An alternate explanation for these schools' lower performance is that, rather than being deficient in leadership capacity or initiative, they simply lacked the requisite supports and resources to adequately pursue the focal points of the CIE.

At the same time, however, it has been documented that schools that serve larger low-income, at-risk populations tend to attract less-qualified and less-experienced school leaders and teachers (Clark, Martorell, & Rockoff, 2009; Clotfelter, Ladd, Vigdor, & Wheeler, 2006). Thus, some observed differences in school performance may plausibly be attributed to poor leadership. But even that conclusion does not tell the whole story. It is prudent, and probably more productive, to interpret some problems as arising from a mismatch between leadership style and support infrastructure. Further research is necessary to uncover the most efficacious school support practices in cases where different school leadership capacities and resources are present. Some school leaders flourish when given increased autonomy; others perform best with more direct guidance and outside support. Ultimately, it is irresponsible to assume that all school leaders should (or even can) operate in the same way when they are given enhanced autonomy.

The pace at which the NYC DOE rolled out both the CCLS and the teacher evaluation system did indeed appear to be "too much, too soon," even if some school leaders were able to manage the simultaneous demands. The relative success of reform implementation should not be measured solely according to the best performing units in the district. This seems to be the fundamental problem with the theory in use regarding CCLS implementation: The strongest performers were rewarded with the most support because they could effectively navigate the NYC DOE infrastructure.
In summary, our findings complicate the core assumptions implied in the NYC DOE's theory in use. We believe each of the core assumptions should be revisited to more directly reflect patterns of practice on the ground. First, while it is in some respects laudable to accelerate and integrate multiple reforms, such a pace does not necessarily promote sustained change. Regardless of the school, time to reflect on and modify practice is important during any course of systemic change. While some schools can be responsive, other schools stall under the weight of simultaneous reforms and, given additional burdens, require a more gradual, measured pace.

Second, a support structure that serves on a consult-first basis has the effect in practice of providing the most support to the highest performers. Such practice is like a regressive tax and does not promote achievement among all participants throughout the system. Based on this exploratory study, consideration ought to be given to revising the NYC DOE's support apparatus so that the most intensive interventions are targeted first, and above all, to the needs of the lowest performing schools in the NYC DOE. This might involve rearranging the district's support structures so that (a) schools are clustered geographically, (b) the lowest performing schools are in small networks that allow for more direct support, and (c) networks that serve higher-needs schools have more support personnel. Recent research agrees with our conclusions. Hemphill and colleagues (2013), for instance, have argued that the geographic dispersion of networks is ultimately inefficient. And Pallas (2013) has suggested that the universal model of support networks may be disadvantageous to struggling schools; he therefore endorses a model of differentiated support. As he writes,

Although this strategy would be at odds with the DOE's desire to devolve decision-making authority to the principal, the DOE may have a broader perspective on what would constitute a good match between a school's needs and a support provider than does an inexperienced principal. One could envision the school support providers for high-needs schools operating out of the DOE's central offices, in ways that parallel the chancellor's district created by Chancellor Rudy Crew in 1996. (p. 10)

Finally, the fact that the NYC DOE chose to focus first on CCLS assessments rather than curriculum very likely inhibited more robust implementation of the CCLS, especially early on. Unquestionably, all participants in our study needed substantially more district support.
As our findings indicate, teachers and other school-level personnel reported significant energy and time devoted to developing CCLS-aligned curricula. Many complained that other important responsibilities had been subordinated to this one task. Further research should examine the relative implementation efficacy of other states, such as Kentucky and Utah, that focused first on curriculum development. Did school leaders in those states feel they had more support and resources? Through what mechanisms did those states develop and provide curricula for their districts? Ultimately, the findings from this study challenge the central assumption that school leader autonomy should be preferred in all cases. The early rollout of the CCLS in New York City presents at least one instance in which greater district resources and control would have been advantageous.

Epilogue

In January 2014, Bill de Blasio took the oath of office as mayor of New York City and, soon after, appointed the longtime school administrator Carmen Fariña, who came out of retirement to become de Blasio's school chancellor. Since our time in the field, implementation of the Common Core nationally and locally has hit some road bumps. Controversy over the Common Core has led a number of states to distance themselves from the new standards (Ujifusa, 2014). State leaders in Missouri, North Carolina, Utah, and Wisconsin have moved to reassess their involvement in recent months, although it remains unclear what the final outcome may be. Oklahoma and Indiana have formally dropped the Common Core, while other states, such as Florida, Georgia, Indiana, and Pennsylvania, have opted out of the consortia poised to finalize new Common Core–aligned assessments (Vander Hart, 2014).

In New York, throughout the fall of 2013 a series of public forums intended to generate community feedback and clarify Common Core policies were interrupted by hecklers hurling protests, many directed specifically at New York State Education Commissioner John King. Commissioner King's office, in response, cancelled the remaining forums, citing persistent interruptions and distractions (DeWitt, 2013). On January 26, 2014, the New York State United Teachers 80-member board of directors unanimously passed and publicized a resolution of "no confidence" aimed at King, citing among other things a diminished opportunity for local stakeholder input (New York State United Teachers, 2014). Newspaper articles identified complaints similar to those uncovered in our study: (a) teachers lacked the proper curriculum, (b) implementation was too rushed, and (c) assessment was overemphasized.
In light of similar feedback across the state, New York State Governor Andrew Cuomo called for a special committee to investigate the "botched" implementation (Riede, 2014). The committee's report, released in March 2014, offered some significant criticisms of the standards' rollout. It suggested a reprieve from the high stakes of testing associated with the Common Core (particularly among younger students), greater teacher supports, ongoing parent and citizen input regarding implementation, and stricter data standards to protect student privacy (Litow et al., 2014).

In keeping with the skeptical political environment, the NYC DOE's 2014–2015 CIE represented a marked departure from the hurried implementation goals of previous years. Rather than targeting another round of CCLS benchmarks, the new CIE encouraged practitioners to "refine and reflect," indicating a significantly tempered implementation pace (NYC DOE, 2014). In a similarly moderated tone, Chancellor Carmen Fariña, marking her first 100 days in office, delivered a speech calling for more collaboration among schools: "In a system of our size, we can't work in silos" (Darville, 2014). It is clear that the increasingly turbulent political climate in U.S. education is conditioning the ways that school leaders and practitioners approach the Common Core. How the de Blasio–Fariña administration will ultimately respond remains an open question.

Notes

1. A few states declined to adopt the CCSS in their entirety. Minnesota adopted the new standards for ELA but not for mathematics. The city of Anchorage adopted the standards, but the state of Alaska declined, as did Indiana, Nebraska, Texas, and Virginia.

2. In keeping with the spirit of federalism behind the new standards, states were allowed to augment the CCSS with up to 15% additional content. New York State amended both the ELA and math standards with state-specific content and called the resulting package the Common Core Learning Standards to signal those changes.

3. Funding was restored in October 2013 after four months of hearings. In 2014, just nine months before Michigan planned to administer the Smarter Balanced exam to students, the state legislature barred the test, joining a growing number of states that have pulled out of the two Common Core consortia in favor of single-state exams (Einhorn, 2015).

4. The recession and its aftermath have gotten a lot of blame for California's slowness, but in November 2012 voters approved Proposition 30, a seven-year tax hike to raise state education funding. Every district in the state received a proportional slice of the new tax money (totaling $1.25 billion) in fall 2013, with the caveat that it be used only to implement the Common Core.

5. It is not unusual for CFNs to serve schools spread across multiple New York City boroughs (Hemphill et al., 2013).

6. Cluster offices, as conceived of by the Bloomberg-Klein administration, served as intermediaries between the CFNs and the central NYC DOE. The first year that New York instituted clusters (2010), the public schools were served by six clusters. The number was reduced to five late in the fall of 2011. Cluster offices supported their respective CFNs (and schools) by creating professional learning opportunities to improve teacher practice and student achievement. At the same time, cluster offices were in charge of evaluating CFNs and, in this way, held CFNs accountable.

7. CFNs are evaluated, in part, on the grades their member schools receive during the quality review. This serves both as an incentive to CFN proactivity and as an additional accountability mechanism.

References

Argyris, C., Putnam, R., & Smith, D. M. (1985). Action science. San Francisco: Jossey-Bass.

Argyris, C., & Schön, D. A. (1974). Theory in practice: Increasing professional effectiveness. Oxford, UK: Jossey-Bass.

Benson, J., & McGill-Wilkinson, R. (2013). Research objectives related to the Common Core State Standards. Paper prepared for Technical Working Group meeting, "Research in College- and Career-Ready Standards to Improve Student Outcomes," Institute of Education Sciences, Washington, DC.

Clark, D., Martorell, P., & Rockoff, J. E. (2009). School principals and school performance (CALDER Working Paper 38). Washington, DC: Urban Institute.

Clotfelter, C., Ladd, H. F., Vigdor, J., & Wheeler, J. (2006). High-poverty schools and the distribution of teachers and principals. North Carolina Law Review, 85, 1345–1379.

Council of Chief State School Officers. (2013, October). Implementing the Common Core State Standards: State spotlights. Retrieved from http://www.schoolturnaroundsupport.org/resources/implementing-common-core-state-standards

Darville, S. (2014, April 12). Here's the full text of Fariña's speech marking her first 100 days. Chalkbeat New York. Retrieved from http://ny.chalkbeat.org/2014/04/12/heres-the-full-text-of-farinas-speech-marking-her-first-100-days/#.Ve8xmJfW8h4

DeWitt, P. (2013, October 15). N.Y. State Commissioner John King's "message" gets cancelled. Education Week, Finding Common Ground. Retrieved from http://blogs.edweek.org/edweek/finding_common_ground/2013/10/ny_state_commissioner_john_kings_message_gets_cancelled.html?qs=dewitt++message+gets+cancelled

Education First & Editorial Projects in Education Research Center. (2013). Moving forward: A national perspective on states' progress in Common Core State Standards implementation planning. Seattle, WA: Authors.

Einhorn, E. (2015, March 18). Common Core means 3 tests in 3 years for Michigan kids. The Hechinger Report. Retrieved from http://www.npr.org/sections/ed/2015/03/18/389772922/common-core-means-three-tests-in-three-years-for-michigan-kids

Garland, S. (2013, October 15). New York teachers excited and worried as Common Core standards launched in full. The Hechinger Report. Retrieved from http://hechingerreport.org/new-york-teachers-excited-and-worried-as-common-core-standards-launched-in-full/

Gewertz, C. (2013, April). Advocates worry implementation could derail Common Core. Education Week, 31(29), 6–9.

Gootman, E. (2008, June 6). Leaders of four "F" schools are now up for bonuses. New York Times. Retrieved from http://www.nytimes.com/2008/06/06/education/06bonuses.html

Hemphill, C., & Nauer, K. (2010). Managing by the numbers: Empowerment and accountability in New York City's schools. New York: Center for New York City Affairs at The New School.

Hemphill, C., Nauer, K., White, A., & Jacobs, T. (2013, November). Building blocks for better schools: How the next mayor can prepare New York's students for college and careers. New York: Education Funders Research Initiative.

Hess, F. M., & McShane, M. Q. (2014). Introduction. In F. Hess & M. McShane (Eds.), Common Core meets education reform: What it all means for politics, policy, and the future of schooling (pp. 1–13). New York: Teachers College Press.

Hill, P. T. (2011). Leadership and governance in New York City school reform. In J. A. O'Day, C. S. Bitter, & L. M. Gomez (Eds.), Education reform in New York City: Ambitious change in the nation's most complex school system (pp. 17–32). Cambridge, MA: Harvard Education Press.

Honig, M. I. (2006). Street-level bureaucracy revisited: Frontline district central-office administrators as boundary spanners in education policy implementation. Educational Evaluation and Policy Analysis, 28(4), 357–383.

Kendall, J., Ryan, S., Alpert, A., Richardson, A., & Schwols, A. (2012, March). State adoption of the Common Core State Standards: The 15 percent rule. Retrieved from http://files.eric.ed.gov/fulltext/ED544664.pdf

Kerr, P. A. (2013). Theory of action and information literacy: Critical assessment towards effective practice. In S. Kurbanoğlu, E. Grassian, D. Mizrachi, R. Catts, & S. Špiranec (Eds.), Worldwide commonalities and challenges in information literacy research and practice (pp. 429–435). Cham, Switzerland: Springer International.

Kirst, M. W. (2013, March). The Common Core meets state policy: This changes almost everything [Policy memorandum]. Stanford, CA: Stanford University, Graduate School of Education, Policy Analysis for California Education.

Litow, S. S., Flanagan, J., Nolan, C., Darling-Hammond, L., Hathaway, T., Jackson-Jolley, A., . . . Weisberg, D. (2014, March). Putting students first (Report of the Common Core Implementation Panel to Governor Andrew M. Cuomo: Initial recommendations). Retrieved from http://www.governor.ny.gov/sites/governor.ny.gov/files/archive/governor_files/Common_Core_Implementation_Panel_3-10-14.pdf

Malen, B., Croninger, R., Muncey, D., & Redmond-Jones, D. (2002). Reconstituting schools: "Testing" the "theory of action." Educational Evaluation and Policy Analysis, 24(2), 113–132.

McDonnell, L., & Weatherford, S. (2013). Organized interests and the Common Core. Educational Researcher, 42(9), 488–497.

Medina, J. (2009, November 26). Mayor says student scores will factor into teacher tenure. New York Times, p. A1. Retrieved from http://www.nytimes.com/2009/11/26/education/26teachers.html?pagewanted=all&_r=0

Nadelstern, E. (2012). The evolution of school support networks in New York City (CRPE Working Paper 2012-2). Seattle, WA: Center on Reinventing Public Education, University of Washington, Bothell.

New York City Department of Education. (2012a). 2012–13 citywide instructional expectations. New York: Author. Retrieved from http://schools.nyc.gov/NR/rdonlyres/944401BC-84C3-41AC-A8C6-9F326F310DD2/0/201213CitywideInstructionalExpectations091412.pdf

New York City Department of Education. (2012b). 2012–13 quality review rubric. New York: Author. Retrieved from http://schools.nyc.gov/NR/rdonlyres/99E6EBBC-888F-45FF-A526-DAF67D76328E/0/201213QualityReviewRubric.pdf

New York City Department of Education. (2013a). NYC school survey: 2013 surveys. New York: Author. Retrieved from http://schools.nyc.gov/Accountability/tools/survey/2013surveysamples.htm

New York City Department of Education. (2013b, June). Office of English Language Learners: 2013 demographic report. New York: Author. Retrieved from http://schools.nyc.gov/NR/rdonlyres/FD5EB945-5C27-44F8-BE4B-E4C65D7176F8/0/2013DemographicReport_june2013_revised.pdf

New York City Department of Education. (2013c, May). 2013–14 citywide instructional expectations. New York: Author. Retrieved from http://schools.nyc.gov/NR/rdonlyres/C8CFE95F-9488-458B-AEBF-1A21AD914F64/0/201314CitywideInstructionalExpectationsMay62013.pdf

New York City Department of Education. (2014). 2014–15 citywide instructional expectations. New York: Author. Retrieved from http://schools.nyc.gov/Academics/CommonCoreLibrary/About/InstructionalExpectations/default.htm

New York State United Teachers. (2014, January 25). NYSUT Board approves "no confidence" resolution. Albany, NY: Author. Retrieved from http://www.nysut.org/news/2014/january/nysut-board-approves-no-confidence-resolution

O'Day, J., Bitter, C. S., & Gomez, L. M. (Eds.). (2011). Education reform in New York City: Ambitious change in the nation's most complex school system. Cambridge, MA: Harvard Education Press.

Otterman, S. (2011, August 12). In $32 million contract, state lays out some rules for its standardized tests. New York Times. Retrieved from http://www.nytimes.com/2011/08/13/nyregion/new-york-in-contract-with-pearson-lays-out-rules-for-state-tests.html

Pallas, A. M. (2013). Policy directions for K–12 public education in New York City. Toward a 21st century city for all: Progressive policies for New York City in 2013 and beyond. Retrieved from http://21c4all.org/content/policy-directions-k-12-public-education-new-york-city?_ga=1.233427230.1095034774.1441734701

Patton, M. (1990). Qualitative evaluation and research methods. New York: SAGE.

Rentner, D. S. (2013, August). Year 3 of implementing the Common Core State Standards: An overview of states' progress and challenges. Washington, DC: Center on Education Policy.

Riede, P. (2014, January 21). Cuomo calls for panel to take "corrective action" on Common Core. Syracuse.com. Retrieved from http://blog.syracuse.com/news/print.html?entry=/2014/01/cuomo_weighs_in_on_flawed_common_core_implementation.html

Rothman, R. (2011). Something in common: The Common Core Standards and the next chapter in American education. Cambridge, MA: Harvard Education Press.

Savaya, R., & Gardner, F. (2012). Critical reflection to identify gaps between espoused theory and theory-in-use. Social Work, 57(2), 145–154.

Schmidt, W. H. (2004, September 14). Papers and presentations, Mathematics and Science Initiative, archived information. ED.gov. Retrieved from http://www2.ed.gov/print/rschstat/research/progs/mathscience/schmidt.html

Trotter, A. (2014, January 14). Big business speaks out for the Common Core. The Journal: Transforming Education Through Technology. Retrieved from http://thejournal.com/articles/2014/01/14/big-business-speaks-up-for-common-core.aspx

Ujifusa, A. (2013, October). Six states to collaborate on standards implementation. Education Week, 33(7), 4.

Ujifusa, A. (2014, August 5). Standards persist amid controversy. Education Week. Retrieved from http://www.edweek.org/ew/articles/2014/08/06/37standards-2.h33.html

Vander Hart, S. (2014, January 23). What states have pulled out of their Common Core assessment consortium? Truth in American Education. Retrieved from http://truthinamericaneducation.com/common-core-assessments/what-states-have-pulled-out-of-their-common-core-assessment-consortium/

Wingert, P. (2013, October 15). With money available for Common Core, California districts study their options. The Hechinger Report. Retrieved from http://hechingerreport.org/with-money-available-for-common-core-california-districts-study-their-options/

Wohlstetter, P., Gallagher, A., & Smith, J. (2013). New York City's Children First Networks: Turning accountability on its head. Journal of Educational Administration, 51(4), 528–549.

Wohlstetter, P., Houston, D. M., & Buck, B. (2015). Networks in New York City: Implementing the Common Core. In T. Young & W. Lewis (Eds.), Educational policy implementation revisited, 2015 Politics of Education Association Yearbook, Educational Policy, 29(1), 85–110.

Wohlstetter, P., Smith, J., & Farrell, C. (2013). Choices and challenges: Charter school performance in perspective. Cambridge, MA: Harvard Education Press.

Chapter 7

How Leadership Churn Undermines Learning and Improvement in Low-Performing School Districts

Kara S. Finnigan
University of Rochester

Alan J. Daly
University of California, San Diego

Yi-Hwa Liou
National Taipei University of Education

The first two authors contributed equally to this chapter.

It is a common refrain in educational communities, political circles, and the mainstream media that greater accountability is needed to bring about change in our public schools. Yet we are faced now with more than a decade of high-stakes accountability policies at all levels of the educational system with little positive improvement to show for it. As the national push for higher levels of performance and accountability through federal policies and programs, such as No Child Left Behind (NCLB) and Race to the Top, has increased the pressure on schools and districts in the most challenging circumstances, most schools have struggled rather than improved.

Along with the movement toward greater accountability there has been a nationwide effort to increase the use of data in decision making on instructional practice and organizational processes, resulting in increased training on the use of data and increased access to different types of data for school and district leaders. This movement toward evidence-based decision making in education has caused quite a bit of controversy as stakeholder groups define and prioritize different types of evidence. But it is important to note that education is not the only field attempting to bridge the gap between knowledge and action. Similar movements have emerged in other fields (e.g., health care; see Ward, Smith, Carruthers, Harner, & House, 2010) and in other countries (e.g., the United Kingdom; see Gough, Tripney, Kenny, & Buk-Berge, 2011; for Canada, see Cooper, 2012).
Data-based decision making in the current context of accountability policy has pushed school and district leaders to access and interpret data within their formal organizational structures, as well as through informal social networks, as they collectively make sense of different types of evidence and explore what they mean for improvement (see, for example, Coburn, 2005; Parise & Spillane, 2010; Spillane, Reiser, & Reimer, 2002).

In this chapter we focus on districts that historically have faced low test scores and increased sanctions through accountability policies. What many of these policies seemingly overlook in simply raising standards and increasing consequences is that many underperforming schools are turbulent organizations with high staff turnover, multiple and changing reforms, and leadership challenges (Daly, 2009; Daly & Finnigan, 2011, 2012; Finnigan, 2010, 2012; Finnigan & Gross, 2007; Finnigan & Stewart, 2009). Although we know that system-wide improvement is closely linked to the quality and structure of interpersonal relationships (McGrath & Krackhardt, 2003; Tenkasi & Chesmore, 2003), we have little understanding of how these critical relationships are threatened by the turmoil in underperforming systems. Social interactions can be significantly disrupted when a high percentage of actors leave and enter the system, creating a type of social network churn—the focus of this chapter. Exploring and understanding churn is important, as there are significant costs associated with the exit of individuals in an organization, including loss of knowledge, social support, and organizational memory, as well as loss of training and development investments as these individuals will no longer contribute to organizational goals. In addition, there are costs related to the ongoing entrance of new actors into the system, including training and development costs, as they learn new technical and social systems. In essence, constant churn can be viewed as disruptive to fiscal, human, and social capital within organizations.

Building on this prior work, drawing on the theoretical lens of organizational learning, and using the methodological approach of social network analysis, we examine schools and their larger district contexts as they attempt to improve under accountability sanctions. We pay particular attention to the existence of relational ties and changes in ties over time. We use social network analyses (SNA) to answer two questions: To what extent do leaders in low-performing districts have the cross-sector connectedness necessary for large-scale learning and improvement? And how does network churn impact the underlying social networks of leaders in the system?

Our study makes a unique contribution because it involves four years of longitudinal network data on school and central office leaders in a low-performing school district in the northeastern United States. Although the number of network studies in education continues to grow, longitudinal network data are still quite rare. In addition, we focus specifically on the relationships among and between school principals and central office leaders in an effort to understand the district as a larger organizational unit and to understand district-wide leadership, in particular, as both are critical to system-wide (vs. school-by-school) change. A secondary goal of this chapter is to highlight the use of SNA as a tool for understanding organizational learning and improvement in low-performing districts.

Theoretical Framework

Organizational Learning

Researchers tend to think of learning in education as the process through which individual students gain knowledge or skills in school settings. But learning is also an important aspect of the work of educators pursuing school or district improvement. To understand what learning means as part of improvement efforts, we draw upon the work of organizational learning theorists (see, e.g., Argyris & Schön, 1996; Huber, 1991; Levitt & March, 1988; March, 1991). As these scholars suggest, this kind of learning is the process of detecting and correcting problems to improve organizational effectiveness (Argyris & Schön, 1996). Members of organizations diagnose underlying issues facing the organization as a first step in efforts to learn and improve (Argyris & Schön, 1996; Collinson & Cook, 2007). Learning in an organizational sense leads members to change behaviors (Levitt & March, 1988) and to change organizational norms (Collinson & Cook, 2007) through a deliberate and collective process (Fiol & Lyles, 1985).

Much of the theoretical literature suggests that, for learning to occur, members of an organization must carefully examine underlying assumptions, values, and beliefs through a reflective and collaborative process. Unlike single-loop learning, which is an effort to achieve existing goals within existing norms, double-loop learning involves examining "incompatible organizational norms by setting new priorities and weightings of norms, or by restructuring the norms themselves together with associated strategies and assumptions" (Argyris & Schön, 1996, p. 24). An important aspect of double-loop learning is that it requires examination of organizational values or assumptions that at one time were supportive of organizational goals but now inhibit the organization's ability to learn.
While single-loop learning involves incremental or routine changes, double-loop learning involves more radical change and innovation (Easterby-Smith, Crossan, & Nicolini, 2000).

As is suggested by the questioning of organizational norms and goals, learning at the organizational level requires collective or collaborative participation. As Stoll (2009) has pointed out, it involves dialogue, allowing members of the community to connect, discuss, and debate. Organizational learning thus involves social activities and social processing of knowledge (Bransford et al., 2009; Hubbard, Mehan, & Stein, 2006; Marks & Louis, 1999), as individuals in the organization develop and share new knowledge and tools that result in commonly held ideas or practices or collective learning. Organizational actors incorporate the ideas into practice, either formally or informally, and through a retrieval process adopt the new practices over time when faced with new situations.

Social Networks and Social Capital

In addition to drawing upon organizational learning theories, our study is based on social network theory and methods. A core aspect of social network theory is social capital, which consists of "the resources embedded in social relations and social structure which can be mobilized when an actor wishes to increase the likelihood of success in purposive action" (Lin, 2001, p. 24). Social capital is concerned with the resources that exist in the relationships or ties between individuals, as opposed to the resources of a specific individual. The structure and quality of the ties between individuals ultimately determine opportunities for social capital transactions and access to resources (Burt, 2000; Granovetter, 1973, 1982; Lin, 2001; Putnam, 1993, 1995). Next, we focus on two central aspects of social capital: networks and trust (e.g., Bourdieu, 1986; Halpern, 2005; Nahapiet & Ghoshal, 1998).

Networks can be seen as the patterned structure of relationships that exist within a particular organization or group (Nahapiet & Ghoshal, 1998). Importantly, while individuals are embedded within relationships, their relationships are, in turn, embedded in larger subgroups that form a social network. The role of networks has been implicated in both supports and constraints in the process of organizational change, learning, and improvement (Balkundi & Kilduff, 2005; Bartol & Zhang, 2007; Leana & Van Buren, 1999; Mehra, Dixon, Brass, & Robertson, 2006; Penuel, Riel, Krause, & Frank, 2009; Weinbaum, Cole, Weiss, & Supovitz, 2008).
The structure of social networks can support organizational goals by facilitating the flow of information between individuals and overcoming problems of coordination (Lazega & Pattison, 2001; Tsai & Ghoshal, 1998).

In addition, trust has been identified as one of the most important affective norms characterizing a community (Nahapiet & Ghoshal, 1998). Trust is based on interpersonal interdependence (Rousseau, Sitkin, Burt, & Camerer, 1998) and involves an individual's or group's willingness to be vulnerable to another party based on the confidence that the latter party is benevolent, reliable, competent, honest, and open (Cummings & Bromiley, 1996; Hoy & Tschannen-Moran, 2003). High levels of trust have been associated with a variety of efforts that require collaboration, learning, complex information sharing and problem solving, shared decision making, and coordinated action (Bryk & Schneider, 2002; Cosner, 2009; Lin, 2001; Tschannen-Moran, 2004; Tschannen-Moran & Hoy, 2000). In essence, predictability of relations gained through reciprocal interactions decreases the vulnerability between individuals and potentially increases the depth of exchange due to a willingness to engage in risk taking (Larson, 1992; Uzzi, 1997). Reciprocated relations provide opportunities for individuals to interact and learn together, which is important in educational systems oriented toward change (Honig, 2008; Lave & Wenger, 1991; Wenger, 1998). Networks and trust are central to understanding the disruptive nature of organizational churn.

Methods and Data Sources

Our study involves a case study design (Yin, 2003) focusing on a mid-size urban district in the northeastern United States, serving approximately 32,000 students. Labeled as "In Need of Improvement" under NCLB, the district is 90% non-White, with 88% of students receiving free or reduced-price lunches. Within the district, nearly all of the high schools and many elementary schools are identified as "underperforming," based on state and federal accountability guidelines. This district is an important case as it typifies many urban districts across the country that (a) serve primarily students of color from low socioeconomic communities, (b) have a pattern of underperformance, and (c) are engaged in district-wide improvement efforts to move beyond sanctions.

The longitudinal quantitative data collection occurred between 2010 and 2013 and involved a survey instrument administered to school and district staff. The social network questions were developed on the basis of prior network studies (Cross, Borgatti, & Parker, 2002; Cross & Parker, 2004; Hite, Williams, & Baugh, 2005) and targeted both instrumental (work-related) and affective (emotional or expressive) relationships.
Respondents were asked to quantitatively assess a particular relationship with each individual on a 4-point frequency scale ranging from 0 (not at all) to 4 (1–2 times a week). For example, regarding expertise ties, respondents were asked the following: Please select the frequency of interaction for each school/district staff member whom you consider a reliable source of expertise related to your work.

Each year, we administered the survey to the district's leadership team, which included 181 individuals over the four-year period. We surveyed those in formal leadership positions in the district, including the superintendent, chiefs and directors from the central office, and principals at the school sites. We used a bounded/saturated approach (Lin, 2001; Scott, 2000), meaning that we listed all leaders, both in schools and in central offices. The benefit of this strategy is that, coupled with high response rates, it provides a more complete picture and more valid results than an unbounded approach (Lin, 2001; Scott, 2000). Response rates were above 75%, thereby meeting the threshold for social network analysis (Scott, 2000).

For the social network analyses, we used UCINET software (Borgatti, Everett, & Freeman, 2002), including Netdraw. Given that respondents tend to be more accurate at identifying ongoing patterns than at recalling occasional interactions, and because we were interested in stable structural patterns (Krackhardt, 2001), we dichotomized the data for our analysis to include only the most frequent ties between actors, that is, data indicating that individuals interacted at least once every two weeks.

Importantly, the realities of a low-performing urban district make it extremely challenging to apply rigorous social network strategies. Accurately examining changes in network structures and types of relationships over time is difficult when turnover is high from year to year. Thus, for this chapter we use unmatched comparisons, both to demonstrate the challenges at hand and because the number of people who remained in the schools across all four time periods was somewhat low.

We used visual maps to display the data in the figures that follow, which allowed us to use different sizes, shapes, and colors based on our analytic focus. For example, we sized the nodes by centrality to indicate which members had disproportionate influence over the flow of resources (Raider & Krackhardt, 2001). In addition, we examined average degree (the average number of ties), as well as distinct network measures such as density (the number of social ties between actors divided by the number of total possible connections) and reciprocity (the proportion of mutual connections).
We also examined differences related to type of relationship, including instrumental or work-related versus expressive or emotional, and we examined individuals who brought research-based ideas into the network, given our focus on organizational learning and improvement.
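The authors conducted these analyses in UCINET; purely as an illustration of the computations involved (and not of their actual workflow), the sketch below dichotomizes a weighted tie matrix at an "at least once every two weeks" threshold and then computes density and reciprocity for a directed network. The matrix values, the threshold code, and the variable names are all assumptions for the example.

```python
import numpy as np

# Hypothetical weighted adjacency matrix: w[i, j] is the frequency code
# (0 = not at all ... 4 = 1-2 times a week) with which leader i reports
# going to leader j for expertise. Values here are illustrative only.
w = np.array([
    [0, 4, 0, 1],
    [3, 0, 2, 0],
    [0, 4, 0, 0],
    [1, 0, 0, 0],
])

THRESHOLD = 3  # assumed code for "at least once every two weeks"

# Dichotomize: keep only the most frequent ties, as the chapter describes.
a = (w >= THRESHOLD).astype(int)
np.fill_diagonal(a, 0)  # self-ties are not meaningful

n = a.shape[0]
n_ties = a.sum()

# Density for a directed network: observed ties / possible ties.
density = n_ties / (n * (n - 1))

# Reciprocity here: proportion of ties whose reverse tie also exists
# (arc reciprocity); other conventions give different values.
mutual = np.logical_and(a, a.T).sum()
reciprocity = mutual / n_ties if n_ties else 0.0

print(f"density = {density:.2f}, reciprocity = {reciprocity:.2f}")
```

On this toy matrix the script reports a density of .25 and a reciprocity of .67; the study's far lower values in Table 1 reflect how sparse ties were across a roster of well over 100 leaders.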

Results

Our central focus was on whether low-performing districts had the cross-sector connectedness necessary for large-scale learning and improvement. We used social network analysis, based on the alignment between this method and the theoretical lenses discussed earlier, which suggest that collaborative, co-constructed learning is necessary. To get a glimpse of whether this district had the embedded social relationships necessary for organizational learning, we started by examining the instrumental or work-related ties among school and central office leaders.

By looking closely at the network maps from the cross-sectional data, we find that the instrumental ties increased from Time 1 to Time 4 (Figure 1). In each social network map, a dot represents a leader in the district, and the lines represent the connections they have to each other around a particular network resource, in this case around expertise. The ties are directional, meaning the arrow indicates to whom the person goes for expertise, and if the arrow is bidirectional, then this represents a reciprocal relationship. Any dots on the left-hand side of the map are isolates; no school or central office leader turns toward this person for expertise, nor does the person turn to anyone else in the leadership network. The nodes are sized by centrality, meaning more central individuals—because more people turn to them—are represented by larger nodes on the map.

As the maps show, the structure of this instrumental (work-related) network became slightly more dense during this time period, increasing in density by 2 percentage points (meaning the proportion of connections between leaders out of possible connections increased from .03, or 3% of available ties, in Time [T] 1 to .05, or 5% of available ties, in T4, as seen in Table 1). Considering the number of connections leaders have in this network makes this a little more apparent. In this network of more than 120 people, leaders were, on average, connected to 5.74 other leaders around expertise in Time 1. These linkages more than doubled in the time period of our study, with 12.52 average connections in Time 2, 10.03 in Time 3, and 11.20 in Time 4, as seen in Table 1. (Please note: We illustrate T1 and T4 only in the network maps to show the overall pattern of change, but we provide the measures for all four time periods.) In essence, the maps suggest that district leaders became more connected around their work during this time period (though their overall connectedness was somewhat low). This, perhaps, is not surprising given the heavy emphasis on the technical nature of improvement through accountability policies.

Figure 1 (panels: Time 1, Time 4). Increased instrumental ties from Time 1 to Time 4. Squares represent school leaders; circles represent central office leaders.
(Please note: We illustrate T1 and T4 only in the network maps to show the overall pattern of change, but we provide the measures for all four time periods.) In essence, the maps suggest that district leaders became more connected around their work during this time period (though their overall connectedness remained somewhat low). This, perhaps, is not surprising given the heavy emphasis on the technical nature of improvement through accountability policies.

Table 1. Network Measures Over Time

                 Instrumental Ties          Emotional Ties           Research Ties
Measure          T1    T2     T3     T4     T1    T2    T3    T4     T2    T3    T4
Average Degree   5.74  12.52  10.03  11.20  4.94  1.84  2.09  3.03   6.75  5.66  4.84
Density          .03   .06    .05    .05    .02   .01   .01   .01    .03   .02   .02
Reciprocity      .14   .24    .17    .17    .12   .04   .02   .04    .08   .03   .06

Note. T = time.
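For readers less familiar with these measures, the quantities in Table 1 can be written compactly (the notation below is ours, not the chapter's). For a directed network of $n$ leaders with $L$ observed ties, of which $L_m$ belong to mutual pairs:

$$\text{average degree} = \frac{L}{n}, \qquad \text{density} = \frac{L}{n(n-1)}, \qquad \text{reciprocity} = \frac{L_m}{L}.$$

For instance, a hypothetical 10-leader network with 18 directed ties, 6 of them reciprocated, would have an average degree of 1.8, a density of 18/90 = .20, and a reciprocity of 6/18 ≈ .33.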

However, SNA also allows us to dig deeper into these data, and the results are much more complex. One of our main findings from the study relates to the conceptual and methodological challenges of network instability as school and district leaders leave their positions, voluntarily or involuntarily, from year to year. Importantly, from Time 1 to Time 4 only 51% of the same leaders were in these roles (we retained them in the sample even if they had moved positions within the school and central office, so this percentage likely underestimates the actual churn in the district). In essence, while we know that turnover exists in urban school districts, our data unearth just how challenging the problem of improving these districts is when leaders are in a constant state of flux, with approximately one half of the leaders moving in and out over this four-year period. As we show in more detail below, it is critical that we pay attention to the movement of leaders, given the importance of trust and the strong and collaborative relationships that are necessary for organizational learning.

In Figure 2 we show the same map as in Figure 1 (Time 1), but in this case we use colors to show movement in and out of the district over four years. The pink nodes are the leaders who stayed throughout the four years. The other colors (gray, pea green, black, and purple) mark those who moved out of the district between T1 and T4. As this map shows, the most central expert individuals left the district between T1 and T4. The map also shows school and central office leaders by shape, with the school leaders designated by squares and the central office leaders designated by circles. The large proportion of squares on the periphery of the network (shown on the left-hand side) illustrates how isolated the principals were from the expertise that resided in the district.

Figure 2. Instrumental ties in Time 1, illustrating churn. Squares represent school leaders; circles represent central office leaders.

We now look more closely at the network churn over this time period and find that the Figure 1 maps can be misleading: they suggest network stability, while the underlying networks in this district are constantly in flux, making strong, trusting relationships and collaborative learning nearly impossible. In Figure 3, again, only the pink nodes are constant throughout our study period. Here we include all leaders over the four time periods to show the gravity of the movement in and out; in this case the left-hand side contains not only isolates but also "comers" and "goers." We include these to help show the network churn, as all of these individuals were part of the leadership network during the course of the study. Interestingly, while some of the "old-timers" became more central by Time 4 (meaning pink nodes that were more peripheral, and therefore smaller, at T1 became more central, and therefore bigger, in T4), the newcomers, that is, the bluish-green nodes that had just arrived in Time 4, were equally central, suggesting the fast development of work-related ties in this district.
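The retention and churn figures reported here reduce to simple set operations on the leadership rosters at each wave. A minimal sketch, using hypothetical rosters rather than the study's actual data:

```python
# Hypothetical leadership rosters at each survey wave (sets of leader IDs).
rosters = {
    "T1": {"a", "b", "c", "d", "e", "f"},
    "T2": {"a", "b", "c", "g", "h"},
    "T3": {"a", "b", "g", "h", "i"},
    "T4": {"a", "b", "g", "i", "j", "k"},
}

# "Stayers": leaders present at both endpoints of the study window.
stayers = rosters["T1"] & rosters["T4"]
retention = len(stayers) / len(rosters["T1"])  # roughly 51% in the study

# Wave-over-wave churn: "comers" arrived since the prior wave;
# "goers" left after it.
waves = ["T1", "T2", "T3", "T4"]
for prev, curr in zip(waves, waves[1:]):
    comers = rosters[curr] - rosters[prev]
    goers = rosters[prev] - rosters[curr]
    print(f"{prev}->{curr}: {len(comers)} comers, {len(goers)} goers")
```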

We turn next to the affective side of the network, examining leaders' emotional ties with other leaders at Time 4. In Figure 4, we again show the entire network, with the pink nodes as the "stayers" throughout the four years. This network map helps to illustrate the decrease in emotional ties and, importantly, the challenges of churn, since there are very few central nodes and none among the stable group of leaders, the "stayers" (in pink). In essence, each year leaders in this district had to reestablish underlying relationships, including both work-related and affective or emotional ties. The SNA shows just how tenuous these relationships were, with instrumental ties being only slightly more dense than emotional ties, indicating high levels of fragmentation. In addition, reciprocal ties were somewhat low, with only 17% of existing instrumental ties being reciprocated in T4 (after a high of 24% in T2). Reciprocated emotional ties dropped from 12% to 4% over the four-year period, suggesting weak affective or emotional connectedness in the district.

To provide another perspective, we look at another instrumental network, which we call "research ties" (Figure 5). This network focuses on where people turned for research-based ideas that they used in their practice. This question was not asked in the first year, so we have only three time points. In the first time point, T2, we see more average ties than in the expertise network, suggesting the movement of research-based ideas and practices. However, many of the central research sharers were gone by T4, as shown in the Time 4 map in Figure 5. In this map we used the mapping tools to remove leaders who left between T2 and T4 and to retain leaders who stayed during that time period. Given the importance of trust in the sharing of ideas and practices, it is not surprising that the network churn affects some of the more complex resources being shared; research-based ideas and practices are likely more complex than, say, more routine knowledge. As Table 1 shows, the average number of ties around research fell from 6.75 in Time 2 to 4.84 in Time 4. It is important to call attention to the largest light blue node in Figure 5 (T2). This leader in the central office, a critical source for research-based ideas and practices, moved out of the district by T3. The move severely disrupted the sharing of research-based practices districtwide, as can be seen in the Time 4 map, with fewer ties overall and no clear "go-to" people for research-based ideas and practices. Given the challenges facing this district under sanction, the sparse network is important in representing the negative impact of churn on organizational learning, as exploitation of existing knowledge becomes limited.

Figure 4. Network churn for emotional ties from Time 1 to Time 4. Squares represent school leaders; circles represent central office leaders.

Figure 5. Research users/sharers in Time 2 who remained in Time 4. Squares represent school leaders; circles represent central office leaders.

Another important area of our analysis has to do with the movement into and out of principal positions in this district. While superintendent and teacher turnover gets a great deal of attention in the broader literature, less attention has been paid to the movement of principals into and out of schools. In a low-performing district this has always been important, but given the recent policies requiring replacement of principals (e.g., through the School Improvement Grants), principal turnover is even more important to understanding not only schoolwide but districtwide change. In Figure 6 we focus on the expertise network again, but this time we include only principals. Again, pink nodes are the "stayers" throughout the four-year time period. If individuals moved to the central office or to an assistant principal or teaching position, they were not included in these network maps. In other words, these maps represent school leaders who occupied principal leadership positions throughout the time of the study. (Restricting the maps in this way amounts to taking the subgraph induced by the principal nodes; a sketch follows this paragraph.) In Figure 6, we include all of the individuals across the four years to illustrate the movement in and out of these positions; any colors other than pink on the far left side are "comers" and "goers" (the pink nodes on the left-hand side are isolates). As can be seen, as early as T1 the underlying relationships among principals were quite sparse. The colors in T1 help to illustrate the churn, as any tie not linked to a pink node would be removed from year to year. As Table 1 indicates and the map in Figure 6 shows, the district's instrumental ties increased by Time 2, but the more central leaders were gone by Time 3 and Time 4. In Time 3, and again in Time 4, we see a decrease in instrumental ties between school leaders. Importantly, nearly all of the high schools and many of the elementary schools in this district were under sanction because of their low performance. The social network maps help to illustrate the limited lateral flow of ideas and practices across school-based leaders, which in turn limited the larger district-wide improvement efforts.
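As referenced above, the principal-only maps correspond to an induced subgraph on the nodes holding the principal role. A sketch under the same hypothetical assumptions as the earlier snippets (the study's actual maps were produced with UCINET/NetDraw):

```python
import networkx as nx

# Full leadership network (hypothetical ties).
G = nx.DiGraph()
G.add_edges_from([("p1", "p2"), ("p2", "p1"), ("p1", "co1"), ("co1", "p3")])

# Hypothetical role labels; leaders who moved to the central office or to
# assistant principal/teaching posts would not carry the "principal" label.
roles = {"p1": "principal", "p2": "principal",
         "p3": "principal", "co1": "central_office"}
nx.set_node_attributes(G, roles, "role")

# Induced subgraph: principals only, keeping only principal-to-principal ties.
principals = [v for v, r in G.nodes(data="role") if r == "principal"]
P = G.subgraph(principals).copy()

print(P.number_of_nodes(), P.number_of_edges())  # 3 nodes, 2 edges
```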

Conclusions and Implications

In the literature there is overwhelming attention to the low performance levels of youth in urban school settings, but the organizational instability of these systems resulting from the churn of educational leaders is generally overlooked. As our data show, nearly half of the leaders in the district we studied left over a four-year period, with a constant flow into and out of leadership positions. In today's knowledge-intensive systems, such as our public schools, the consequences of churn include not only the monetary costs associated with people leaving and entering a system but the relational costs as well. A loss of knowledge or expertise—or even of someone who helps to "bind" a social system—has detrimental effects on an organization in terms of training and development as well as embeddedness and support.

Figure 6. Principal instrumental ties from Time 1 to Time 4. Squares represent school leaders; circles represent central office leaders.

In this final section we call attention to our three main findings and discuss the implications for policy, practice, and research. First, our social network data indicate sparse emotional ties among district leaders, with decreasing emotional connectedness during the time period of our study and low levels of reciprocity. These findings indicate that greater attention needs to be paid to the relational aspects of reform, to allow the leaders in this district to develop underlying relationships built on the kind of trust and respect that enables true collaboration and collegial practices. We found that network churn among school and district leaders created an instability of relationships that undermined the potential for organizational learning. Also connected to network churn, we found very few reciprocal relationships, which are the cornerstone of communities of practice.

Second, our data show that over time, more central leaders in the expertise network moved out of the district, while more peripheral leaders remained in leadership positions. Our results align with the study by Soltis, Agneessens, Sosovova, and Labianca (2013), which found that leaders who were most sought out for relationships but received less reward and recognition tended to leave the system. In addition, a handful of newcomers, along with these peripheral old-timers, became central by the last year of our study, resulting in a bifurcated system of leaders. While the instrumental or work-related connections were increasing, they were still somewhat sparse, with weak links to research-based ideas and practices. The movement, the bifurcation, and the weak research ties all indicate challenges to district-wide learning and improvement.

Third, our data indicate high levels of principal churn and sparse principal ties, resulting in extremely limited sharing of ideas and practices across schools. The challenging larger context, along with the improvement strategies embedded in NCLB and more recent reform policies, has resulted in high levels of movement of school-level administrators in urban districts. Most connections that existed in the last year of our study were connections among old-timers, with newcomer principals either peripheral to or isolated from the expertise network. Our data suggest that newcomer principals have a difficult time connecting to other principals to share ideas and practices. As a result, new ideas brought into the district by these new principals rarely make their way into the knowledge base of the community of leaders.

Our study has implications for the accountability policies that are driving reforms and improvement but increasing network churn at both the school and district levels. First, these policies increase levels of stress in these systems as the stakes become higher, resulting in high levels of movement in and out of leadership teams, which in our study comprised both principals and central office leaders. Second, accountability policies may have directly caused some of the network churn through the school turnaround strategy requiring replacement of principals based on the number of years that those principals' schools have been under sanction.

Our findings also have important implications for practice and research. At the district level, the data indicate that strengthening trust within a system may need to be given top priority, which can be difficult in light of the heavy emphasis on technical aspects of reform (e.g., curriculum and testing). However, collaborative practices rely on trusting relationships, and both are required for organizational learning and improvement. Finally, our findings suggest that more sophisticated, longitudinal research methods are required to uncover the high levels of organizational instability in urban districts, the dynamic nature of their networks over time, and the underlying relationships involved.

References

Argyris, C., & Schön, D. A. (1996). Organizational learning II: Theory, method, and practice. Reading, MA: Addison-Wesley.
Balkundi, P., & Kilduff, M. (2005). The ties that lead: A social network approach to leadership. Leadership Quarterly, 16, 941–961.
Bartol, K. M., & Zhang, X. M. (2007). Networks and leadership development: Building linkages for capacity acquisition and capital accrual. Human Resource Management Review, 17, 388–401.
Borgatti, S. P., Everett, M. G., & Freeman, L. C. (2002). UCINET for Windows: Software for social network analysis. Harvard, MA: Analytic Technologies.
Bourdieu, P. (1986). The forms of capital. In J. G. Richardson (Ed.), Handbook of theory and research for the sociology of education (pp. 241–258). New York: Greenwood.
Bransford, J., Mosberg, S., Copland, M. A., Honig, M. A., Nelson, H. G., Gawel, D., . . . Vye, N. (2009). Adaptive people and adaptive systems: Issues of learning and design. In A. Hargreaves, A. Lieberman, M. Fullan, & D. Hopkins (Eds.), Second international handbook of educational change (pp. 825–856). Dordrecht, The Netherlands: Springer International.
Bryk, A. S., & Schneider, B. (2002). Trust in schools: A core resource for improvement. New York: Russell Sage Foundation.

Burt, R. S. (2000). The network structure of social capital. In R. I. Sutton & B. M. Staw (Eds.), Research in organizational behavior (pp. 1–83). Greenwich, CT: JAI.
Coburn, C. E. (2005). Shaping teacher sensemaking: School leaders and the enactment of reading policy. Educational Policy, 19(3), 476–509.
Collinson, V., & Cook, T. F. (2007). Organizational learning: Improving learning, teaching, and leading in school systems. Thousand Oaks, CA: Sage.
Cooper, A. (2012). Knowledge mobilization intermediaries in education: A cross-case analysis of 44 Canadian organizations. Doctoral dissertation, University of Toronto. Retrieved from Canada Theses Portal (AMICUS No. 41102012)
Cosner, S. (2009). Building organizational capacity through trust. Educational Administration Quarterly, 45(2), 248–291.
Cross, R., Borgatti, S., & Parker, A. (2002). Making invisible work visible: Using social network analysis to support strategic collaboration. California Management Review, 44(2), 25–46.
Cross, R., & Parker, A. (2004). The hidden power of social networks: Understanding how work really gets done in organizations. Cambridge, MA: Harvard Business School Press.
Cummings, L., & Bromiley, P. (1996). Organizational trust inventory. In R. Kramer & T. R. Tyler (Eds.), Trust in organizations (pp. 302–330). Thousand Oaks, CA: Sage.
Daly, A. J. (2009). Rigid response in an age of accountability. Educational Administration Quarterly, 45(2), 168–216.
Daly, A. J., & Finnigan, K. (2011). The ebb and flow of social network ties between district leaders under high-stakes accountability. American Educational Research Journal, 48, 39–79.
Daly, A. J., & Finnigan, K. (2012). Exploring the space between: Social networks, trust, and urban school district leaders. Journal of School Leadership, 22(3), 493–530.
Easterby-Smith, M., Crossan, M., & Nicolini, D. (2000). Organizational learning: Debates past, present, and future. Journal of Management Studies, 37(6), 783–796.
Finnigan, K. S. (2010). Principal leadership and teacher motivation under high-stakes accountability policies. Leadership and Policy in Schools, 9(2), 161–189.
Finnigan, K. S. (2012). Principal leadership in low-performing schools: A closer look through the eyes of teachers. Education and Urban Society, 44(2), 183–202.
Finnigan, K. S., & Gross, B. (2007). Do accountability policy sanctions influence teacher motivation? Lessons from Chicago's low-performing schools. American Educational Research Journal, 44(3), 594–629.
Finnigan, K. S., & Stewart, T. (2009). Leading change under pressure: An examination of principal leadership in low-performing schools. Journal of School Leadership, 19(5), 586–618.
Fiol, C. M., & Lyles, M. A. (1985). Organizational learning. Academy of Management Review, 10(4), 803–813.
Gough, D., Tripney, J., Kenny, C., & Buk-Berge, E. (2011). Evidence informed policy in education in Europe: EIPEE final project report. London: EPPI-Centre, Social Science Research Unit, Institute of Education, University of London.

Granovetter, M. S. (1973). The strength of weak ties. American Journal of Sociology, 78, 1360–1380.
Granovetter, M. S. (1982). The strength of weak ties: A network theory revisited. In P. V. Marsden & N. Lin (Eds.), Social structure and network analysis (pp. 105–130). Beverly Hills, CA: Sage.
Halpern, D. (2005). Social capital. Cambridge, MA: Polity.
Hite, J., Williams, E., & Baugh, S. (2005). Multiple networks of public school administrators: An analysis of network content and structure. International Journal of Leadership in Education, 8(2), 91–122.
Honig, M. I. (2008). District central offices as learning organizations: How sociocultural and organizational learning theories elaborate district central office administrators' participation in teaching and learning improvement efforts. American Journal of Education, 114(4), 627–664.
Hoy, W. K., & Tschannen-Moran, M. (2003). The conceptualization and measurement of faculty trust in schools: The omnibus T-Scale. In W. K. Hoy & C. G. Miskel (Eds.), Studies in leading and organizing schools (pp. 181–208). Greenwich, CT: Information Age.
Hubbard, L., Mehan, H., & Stein, M. K. (2006). Reform as learning: School reform, organizational culture, and community politics in San Diego. New York: Routledge.
Huber, G. P. (1991). Organizational learning: The contributing processes and the literatures. Organization Science, 2(1), 88–115.
Krackhardt, D. (2001). Network conditions of organizational change. Paper presented at the Academy of Management Annual Meeting, Washington, DC.
Larson, A. (1992). Network dyads in entrepreneurial settings: A study of the governance of exchange relations. Administrative Science Quarterly, 37, 76–104.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge, UK: Cambridge University Press.
Lazega, E., & Pattison, P. (2001). Social capital as social mechanisms and collective assets: The example of status auctions among colleagues. In N. Lin, K. Cook, & R. Burt (Eds.), Social capital: Theory and research (pp. 185–208). New York: Aldine de Gruyter.
Leana, C. R., & Van Buren, H. J., III. (1999). Organizational social capital and employment practices. Academy of Management Review, 24(3), 538–555.
Levitt, B., & March, J. G. (1988). Organizational learning. Annual Review of Sociology, 14, 319–340.
Lin, N. (2001). Social capital: A theory of social structure and action. New York: Cambridge University Press.
March, J. (1991). Exploration and exploitation in organizational learning. Organization Science, 2(1), 71–87.
Marks, H. M., & Louis, K. S. (1999). Teacher empowerment and the capacity for organizational learning. Educational Administration Quarterly, 35, 707–750.
McGrath, C., & Krackhardt, D. (2003). Network conditions for organizational change. Journal of Applied Behavioral Science, 39(3), 324–336.

Mehra, A., Dixon, A., Brass, D. J., & Robertson, B. (2006). The social networks of leaders: Implications for group performance and leader reputation. Organization Science, 17, 64–79.
Nahapiet, J., & Ghoshal, S. (1998). Social capital, intellectual capital, and the organizational advantage. Academy of Management Review, 23(2), 242–266.
Parise, L. M., & Spillane, J. P. (2010). Teacher learning and instructional change: How formal and on-the-job learning opportunities predict changes in elementary school teachers' instructional practice. Elementary School Journal, 110(3), 323–346.
Penuel, W. R., Riel, M. R., Krause, A., & Frank, K. A. (2009). Analyzing teachers' professional interactions in a school as social capital: A social network approach. Teachers College Record, 111(1), 124–163.
Putnam, R. D. (1993). Making democracy work. Princeton, NJ: Princeton University Press.
Putnam, R. D. (1995). Bowling alone: America's declining social capital. Journal of Democracy, 6, 65–78.
Raider, H., & Krackhardt, D. (2001). Intraorganizational networks. In J. A. C. Baum (Ed.), Companion to organizations (pp. 58–74). Oxford, UK: Blackwell.
Rousseau, D., Sitkin, S., Burt, R., & Camerer, C. (1998). Not so different after all: A cross-discipline view of trust. Academy of Management Review, 23(3), 393–404.
Scott, J. (2000). Social network analysis (2nd ed.). London: Sage.
Soltis, S. M., Agneessens, F., Sosovova, Z., & Labianca, G. (2013). A social network perspective on turnover intentions: The role of distributive justice and social support. Human Resource Management, 52(4), 561–584.
Spillane, J. P., Reiser, B. J., & Reimer, T. (2002). Policy implementation and cognition: Reframing and refocusing implementation research. Review of Educational Research, 72(3), 387–431.
Stoll, L. (2009). Connecting learning communities: Capacity building for systemic change. In A. Hargreaves, A. Lieberman, M. Fullan, & D. Hopkins (Eds.), Second international handbook of educational change (pp. 469–484). Dordrecht, The Netherlands: Springer International.
Tenkasi, R., & Chesmore, M. (2003). Social networks and planned organizational change. Journal of Applied Behavioral Science, 39(3), 281–300.
Tsai, W., & Ghoshal, S. (1998). Social capital and value creation: The role of intrafirm networks. Academy of Management Journal, 41(4), 464–476.
Tschannen-Moran, M. (2004). Trust matters: Leadership for successful schools. San Francisco: Jossey-Bass.
Tschannen-Moran, M., & Hoy, W. K. (2000). A multidisciplinary analysis of the nature, meaning, and measurement of trust. Review of Educational Research, 70, 547–593.
Uzzi, B. (1997). Social structure and competition in interfirm networks: The paradox of embeddedness. Administrative Science Quarterly, 42(1), 35–67.
Ward, V., Smith, S., Carruthers, S., Hamer, S., & House, A. (2010). Knowledge brokering: Exploring the process of transferring knowledge into action. Leeds, UK: University of Leeds.

Weinbaum, E. H., Cole, R. P., Weiss, M. J., & Supovitz, J. A. (2008). Going with the flow: Communication and reform in high schools. In J. A. Supovitz & E. H. Weinbaum (Eds.), The implementation gap: Understanding reform in high schools (pp. 68–102). New York: Teachers College Press.
Wenger, E. (1998). Communities of practice: Learning, meaning, and identity. Cambridge, UK: Cambridge University Press.
Yin, R. K. (2003). Case study research: Design and methods (3rd ed.). Thousand Oaks, CA: Sage.

Section 4

Systemic Lessons for Policy and Practice: Improving School Districts Under Pressure

Chapter 8

Commentary: Three Organizational Lessons for School District Improvement

Mark A. Smylie

University of Illinois at Chicago

This chapter focuses on three organizational lessons for improving school districts drawn from the other chapters in the volume. Each lesson is developed with perspectives from the literature on organizations and organization change. I begin with a few observations about the definitions of the term district and about how to frame the problem of district improvement.

Introductory Observations

As the chapters of this volume suggest, it is not clear that everyone is referring to the same things when considering school districts as objects and agents of improvement. A district can be thought of as the central office of a school system within a given geographic area; it can also be a subdistrict office, a collective of individual schools, or a community that the collective and the central office inhabit and serve. And a district can be an interactive combination of these and other things. The focus of discussion makes a great deal of difference in considering district improvement, and it is important to be clear.

Once the focus is clear, another issue emerges. The chapters in this volume tell us that districts are remarkably complex organizations and considerably varied in size, characteristics, conditions, and contexts. For example, less than a mile from my home in Oak Park, Illinois, is Austin Boulevard, the north-south boundary between Oak Park's School Districts 97 and 200, with their eight elementary schools, two middle schools, and one high school. Also bounded by Austin Boulevard is the city of Chicago's District 299, with its 480 elementary and middle schools and 174 high schools. The upper- to middle-income, racially and culturally diverse, and largely professional community of Oak Park stands in stark contrast to the predominantly low-income African American community of Chicago's Austin neighborhood just 70 feet of asphalt away. It is important to consider these kinds of situational differences among districts and their implications for thinking similarly or differently about district improvement.

A number of the chapters in this volume suggest, directly or indirectly, that district improvement should be seen as a problem of organization change. Yes, district improvement involves developing and implementing effective programs and practices "at scale." It also involves developing and enacting processes of inquiry and continuous improvement at the school and district levels. But to focus on programs and processes does not make the problem of organization go away. Most of the chapters that focus on programs and processes also demonstrate that even the most promising of them can be undermined by organizational problems. It is not simply these kinds of mediating influences that make district improvement a matter of organization change. A good argument exists that the present organization of schools and school districts in general is largely inadequate to meet current educational needs and will require wholesale "re-formation" to meet the educational needs of the future (Smylie, 2010). Most schools and school districts are organized around purposes, structures, and processes of the past 50 to 75 years. They are organized for stability rather than change and will be largely incapable of success in a future of accelerating change and uncertainty. Thus, we need to define the problem of improvement as larger than the development of discrete programs and practices, no matter how efficacious they may be. We need to define the problem of district improvement as one of organization change.

Lesson 1. Effectiveness Is Not Improvement

Thinking about improvement as a problem of organization change often begins with notions of organizational effectiveness. And it often stops there. But organizational effectiveness and improvement are not the same thing. Improvement requires a sense of how to get from here to there. Without good theories and models of improvement, a notion of effectiveness can be little more than wishful thinking. To be sure, well-grounded conceptions of organizational effectiveness are crucial. There must be clear and valid ideas of what well-performing and effective districts (and schools) look like and what they do. There must be sound theories regarding how and why certain characteristics and behaviors of districts are associated with performance and effectiveness.

These theories should point to the appropriate goals of district organization. This is the "effectiveness perspective" of the literature reviewed in Chapter 1, by Trujillo—a perspective that is reflected in other chapters, as well, which contend that particular programs and processes, when well designed and executed, are linked to district effectiveness. Sound indicator and assessment systems, experimentation processes, research-practice partnerships, portfolio management, and network structures are often elements of effective district organization.

In addition, it is important to have good theories of organization change that are relevant to the complex and situational nature of school district organization. Good strategic theories of action are needed to explicate efficacious "levers," or mechanisms, of change (Argyris & Schön, 1974; Hannaway & Woodroffe, 2003; Smylie & Perry, 1998). If contextual and organizational variability among school districts is acknowledged, then effectiveness can be considered situationally, perhaps in terms of principles and preferred states of being (Light, 1998). Improvement can then be considered in terms of equifinality in approach and action (Burke, 2011). Uniform prescriptions and scaled-up replications of discrete programs and practices can be recognized as problematic.

The contributors to this volume also view the above-mentioned characteristics of effective district organization—indicator and assessment systems, experimentation, research-practice partnerships, portfolio management, and network structures—as mechanisms or levers of improvement. Thus, these characteristics are considered to be both means and ends. The contributors to this volume sketch the outlines of a new form of district organization, grounded in dynamic processes more than in static characteristics, reflecting a view of districts as continuously improving (see Smylie, 2010).

There are problems with thinking piecemeal about theories of organization improvement (Demers, 2007) or thinking about single improvement levers in isolation (Burke, 2011). Reliance on singular levers may achieve some results (e.g., Rowan, 1990), but strategic, dynamic combinations of levers are more likely to achieve substantial, long-lasting improvement (e.g., Smylie & Perry, 1998; Smylie, Wenzel, & Fendt, 2003). The organizational complexities of districts and their contexts demand holistic, multifaceted, and mutually reinforcing systems of levers and strategies. Such a system is illustrated in Chapter 5, by Bush-Mecenas, Marsh, and Strunk. Chapter 5 also reinforces the importance of the fit of improvement strategies to a district's context, orientation, capabilities, and organization problems.

Lesson 2. Think and Act Systemically

Systemic thinking is not a new development. Indeed, applying systems perspectives to organizations and organization change can be traced to the 1950s (Hatch & Cunliffe, 2012; Scott, 2002). Yet, as Sunstein (2013) has observed in his analysis of thinking about policy problems and solutions, most people focus on parts of problems and on only one way to address them. Moreover, most people overlook the "systemic effects of one-shot interventions"; they fail to look "globally at the consequences of apparently isolated actions" (p. 235). Bolman and Deal (2008) have drawn attention to the widespread "system blindness" in organizations (see also Oshry, 2007). In their view, looking at and acting on only one part of an organization, or looking at it in only one way, will help leaders "get it right" some of the time but will make them wrong most of the time. Thus the need to think and act systemically. And the need to adopt an open-systems perspective that directs attention not only to the organization but also to the relationships between the organization and the elements of its environment (Burke, 2011). According to Penuel and DeBarger (Chapter 4 in this volume), open-systems perspectives have emerged in recent calls for new analyses of education policy and change. Moreover, systems perspectives are evident in recent models and theories of school organizational effectiveness (e.g., Bryk, Sebring, Allensworth, Luppescu, & Easton, 2010; Smylie, 2010) and in recent literature on the sources of organizational failure, including school failure (Dekker, 2011; Meyer & Zucker, 1989; Murphy & Meyers, 2008).

What does it mean to think and act systemically about school districts and their improvement? Several chapters in this volume provide insights, first with regard to thinking systemically about districts and then about improvement. Trujillo's categorization of the correlates of effectiveness portrays district organization as a system of technical, sociopolitical, and normative elements (Chapter 1). While finding that the most attention has been given to technical elements (rational, structural, and managerial), Trujillo argues for a better understanding of correlates in other categories and of the relationships among correlates in promoting district success. Penuel and DeBarger (Chapter 4) place their analysis of research-practice partnerships in systemic perspective, identifying politics, leadership, human resources, systems of accountability and incentives, and structural coherence as crucial interacting components of district organization that mediate the formation and work of such partnerships and the development and implementation of new district-wide programs.

From a systems perspective, Bush-Mecenas and her colleagues (Chapter 5) found that the effects of policy designed for some schools were not confined to those schools. For example, accountability policies for some schools had an indirect spillover effect on other schools in the district. The authors also draw attention to the importance of (a) elements of a systemic organizational context in which the district-level strategy of portfolio management is employed; (b) school-level human, physical, and social capital; (c) commitment to reform across the system; (d) school and district leadership and support; (e) the structural alignment of new reforms with existing policies and procedures; and (f) the district's relationships with, and the capacities of, external partners and school developers.

In an analysis of subdistrict networks, Wohlstetter, Buck, Houston, and Smith (Chapter 6) reveal a system of central office–network–school relationships and point to the problem of coordination and coherence within and across such systems. They speak of policy coherence from a systems perspective, emphasizing the alignment of priorities, capabilities, and politics. Finnigan and Daly (Introduction) chart systems of social networks among district leaders and their prospects for organizational learning and district-wide improvement. They also draw attention to the mediating effects of leadership turnover.

These insights are consistent with general approaches to systemic thinking found in the organization literature (Burke, 2011; Hatch & Cunliffe, 2012; Scott, 2002). Accordingly, school districts should be seen in terms of multiple units (e.g., schools), levels (e.g., central offices, subdistricts, schools, even departments and classrooms), and dimensions (e.g., structures, social relationships, politics, cultures). These elements should be considered as interactive, mutually influential parts of a dynamic whole. From an open-systems perspective, the system that is the organization should be considered as part of larger systems, in dynamic, mutually influential relationships with external environments.

So what does it mean to act systemically? Recall the earlier argument about the importance of the strategic use of multiple levers of improvement. Wohlstetter and her colleagues (Chapter 6) illustrate this point by considering the interplay of capacity development, support, and accountability across the levels of district organization (central offices, networks, and schools). The authors also demonstrate that organization improvement is perhaps best achieved when reforms address multiple levels and mutually influential elements of the broader organizational system. This idea, along with the idea of coherence among the levers themselves, is foundational to systemic and comprehensive approaches to school reform that have emerged since the early 1990s (e.g., Murphy & Datnow, 2003; Rowan, Correnti, Miller, & Camburn, 2009; Smith & O'Day, 1991). The same idea is emphasized in studies of the implementation of complex reforms that address the cultivation of external environments to increase the prospects of organizational improvement (Smylie, 2010; Smylie & Crowson, 1996).

Lesson 3. Apply Pressure With Care

An intriguing element of this volume's title is the phrase "under pressure." Today, a great deal of effort to improve education proceeds from the application of pressure on students, educators, schools, and school districts. Pressure has been the preferred lever in U.S. public schools. Be it high-stakes standards-based testing, educator evaluation, school and district ratings, or choice, the logic is that externally applied pressure creates stress, and stress creates incentives for educators, schools, and school districts to improve their performance and outcomes. Substantial evidence exists that policy pressure can induce change, and sometimes in positive directions (Hannaway & Woodroffe, 2003; Rowan, 1990). However, if not applied wisely and with care, pressure can lead to unintended negative consequences.

Attention is usually given first to pressures that come from reform policy, but there are many other sources of pressure on schools and school districts. Many are associated with the shifting "terrains" of schooling (Lugg, Bulkley, Firestone, & Garner, 2002). These include the transformation of jobs and the globalization of employment, the expansion of the control of schooling to a wider range of constituents, the expansion and contestation of educational goals and strategies, shrinking funding and growing reliance on local sources of revenue, changes in the characteristics and conditions of children and youth, and the challenges of inequity and inequality associated with it all. So the pressure that comes from policy is not the only consideration. Pressures from other sources, along with policy pressures, are likely to cascade concurrently on schools and school districts.

The phrase "under pressure" in the title of this volume invites deep and problematic consideration of pressure as a mechanism of district improvement. Several chapters provide insights. Hamilton and Schwartz (Chapter 2) show that assessments and the stakes attached to them can indeed invoke change. The problem is that some of the changes, particularly in classroom instruction, may not be desirable. Hamilton and Schwartz conclude that pressure from assessments should be part of a system of aligned levers, a system that is clear in its purposes and that couples pressure with support. Penuel and DeBarger (Chapter 4), as well as Bush-Mecenas and her colleagues (Chapter 5), contend that program development and implementation, even as promoted by innovative structures and processes, can be affected positively or negatively by pressure from contextual sources, notably internal and external politics. Bush-Mecenas and her colleagues write of pressures from competition among school developers and of indirect pressures on schools that are not currently the direct targets of district intervention (see also Yu, Sengul, & Lester, 2008). Wohlstetter and her colleagues (Chapter 6) argue that productive responses to central office accountability pressure are mediated by school capacity, the availability and quality of external supports, and the ways in which schools avail themselves of those supports.

These insights are consistent with much of the literature on organizational behavior and change (e.g., Burke, 2011; Hatch & Cunliffe, 2012). And the literature reveals an irony about the use of pressure, particularly extreme pressure, to induce district organizational improvement. Reform pressure, alone or in combination with pressure from other sources, may push districts away from organizational behaviors that promote improvement, such as experimentation, risk-taking, innovation, creative adaptation, and productive organizational learning—behaviors encouraged by some of the most promising improvement strategies described in this volume. Moreover, reform pressure may push districts toward behaviors that are antithetical to improvement. Indeed, it can trigger maladaptive tendencies and snowball in a vicious cycle toward organizational distress and failure (Smylie, 2010).

Noted organizational theorists March (1994), Simon (1986), and Thompson (1967) each observed that organizations under high external pressure tend to become more anxious and intractable, and more persistent in their thinking and behavior. Staw, Sandelands, and Dutton (1981) have referred to this phenomenon as "threat-rigidity." Organizations under extreme pressure tend to rely on "well-learned" assumptions and responses that may not be appropriate or effective under new conditions. These organizations may narrow the information they attend to, screening out dissonant information in particular. They simplify and reduce the number of alternatives they choose to consider, rely on experience and prior knowledge, and restrict alternatives to those that are consistent with that experience and knowledge. They focus on what confirms rather than on what challenges. Organizations under high pressure tend to centralize authority, consolidate control, and reduce autonomy and discretion. They direct organizational members toward activities that reduce pressure rather than toward more productive activity. They emphasize efficiency over effectiveness and intensify internal accountability, stifling experimentation and making failure prohibitive, failure that might serve as a foundation for innovation (Petroski, 2006; Smylie, 2010; see also O'Day, 2002).

There is more. Rudolph and Repenning (2002) found that the effect of pressure on organizational performance is curvilinear. Small and moderate pressures that organizations might manage well if they arose one at a time or in small doses, and that might stimulate performance, can accumulate past a tipping point and lead to organizational ineffectiveness, malfunction, and collapse. This is a good moment to note that underperforming and underresourced school districts serving large proportions of low-income and racially isolated students are the districts likely to be subject to the most extreme combinations of environmental stressors, to have the least human and organizational capacity and sources of support, and to be on the receiving end of the greatest pressure from reform policy. And it is not only external pressure that can cause problems. Bozeman (2011) has observed that stress can arise from poor internal decision making and bad organizational responses to external pressure (e.g., threat-rigidity), both of which can contribute to organizational implosion.

The upshot is that some types, combinations, and degrees of pressure, if they fit the situation and an organization's capacity to deal productively with them, can be productive. Other types, combinations, and degrees of pressure, if mismatched to situation and capacity, can be destructive. If we are to think productively about pressure, we need sound theories of action. As discussed earlier, pressure is perhaps best applied in dynamic and strategic combinations with other levers of change. It may be most productive if applied with situational sensitivity. Careful reform efforts take into account the cumulative effects of multiple sources and types of pressure, as well as the capabilities of organizations to respond productively. They take into account the possibility of unintended consequences and unproductive spillover effects.

A Concluding Comment: Leadership Matters

These lessons point directly to the importance of leadership in school district improvement. No chapter in this volume is devoted primarily to leadership, but several chapters refer to ways that leadership matters. Finnigan and Daly (Introduction) argue that social networks of district leaders constitute an important source of productive organizational learning and district improvement. Penuel and DeBarger (Chapter 4) tell of the importance of engaging principals in improvement efforts, even efforts focused on the district level. Their message is that principals matter; they can contribute greatly to the success of improvement efforts, or they can really mess things up. Penuel and DeBarger point to the importance of several leadership functions in improvement efforts, including managing the politics of partnerships, establishing priorities at the district level, and coordinating efforts within the central office and between the central office and schools. Similarly, Trujillo (Chapter 1) points to the importance of instructional leadership and to various tasks and functions generally associated with effective leadership. It is certainly difficult to imagine developing many of the correlates of district effectiveness identified in her analysis without strong and thoughtful leadership. And as Bush-Mecenas and her colleagues (Chapter 5) suggest, it is difficult to imagine creating conditions, developing capacity, or managing processes conducive to successful portfolio management without such leadership. It is difficult to imagine how the networks examined by Wohlstetter and her colleagues (Chapter 6) could be successful in their relations with the central office and with schools without effective leadership across each of these levels of district organization.

It seems clear that leadership matters. But it is much less clear how it matters and how effective leadership for district improvement might look. Following Wohlstetter and her colleagues, I do not presume that leadership for district improvement is only a function of executive leadership from the central office and the boardroom, although executive leadership from the top is certainly very important (Burke, 2011; Yukl, 2013). If district organization and district improvement are to be considered systemically, then district leadership, too, should be considered systemically, across all levels of district organization. This means defining leadership as much by tasks and functions as by roles. It means considering leadership as performed by teachers and perhaps by students, parents, and outside partners. It means thinking about a system of leadership for district improvement. What would a system of leadership look like, and how would it function to successfully promote improvement in school districts? How can leaders do more than nudge their "massive old battleship[s]" in the right direction ("Chancellor Levy exits," 2002)? How can leaders succeed "under pressure" by becoming better organized for the future? These are some of the questions we will need to address if we are to succeed in district improvement.

References

Argyris, C., & Schön, D. A. (1974). Theory in practice: Increasing professional effectiveness. San Francisco: Jossey-Bass.
Bolman, L. G., & Deal, T. E. (2008). Reframing organizations: Artistry, choice, and leadership. San Francisco: Jossey-Bass.
Bozeman, B. (2011). Toward a theory of organizational implosion. American Review of Public Administration, 41(2), 119–140.
Bryk, A. S., Sebring, P. B., Allensworth, E., Luppescu, S., & Easton, J. Q. (2010). Organizing schools for improvement: Lessons from Chicago. Chicago: University of Chicago Press.
Burke, W. W. (2011). Organization change: Theory and practice (3rd ed.). Los Angeles: Sage.
Chancellor Levy exits. (2002, August 7). Editorial. New York Times. Retrieved from http://www.nytimes.com/2002/08/07/opinion/chancellor-levy-exits.html
Dekker, S. (2011). Drift into failure: From hunting broken components to understanding complex systems. Burlington, VT: Ashgate.
Demers, C. (2007). Organizational change theories: A synthesis. Los Angeles: Sage.
Hannaway, J., & Woodroffe, N. (2003). Policy instruments in education. Review of Research in Education, 27, 1–24.
Hatch, M. J., & Cunliffe, A. L. (2012). Organization theory: Modern, symbolic, and postmodern perspectives (3rd ed.). New York: Oxford University Press.
Light, P. (1998). Sustaining innovation: Creating nonprofit and government organizations that innovate naturally. San Francisco: Jossey-Bass.
Lugg, C. A., Bulkley, K., Firestone, W. A., & Garner, C. W. (2002). The contextual terrain facing educational leaders. In J. Murphy (Ed.), The educational leadership challenge: Redefining leadership for the 21st century. 101st Yearbook of the National Society for the Study of Education, Part I (pp. 20–41). Chicago: National Society for the Study of Education.
March, J. G. (1994). A primer on decision making: How decisions happen. New York: Free Press.
Meyer, M. W., & Zucker, L. G. (1989). Permanently failing organizations. Newbury Park, CA: Sage.
Murphy, J., & Datnow, A. (Eds.). (2003). Leadership lessons from comprehensive school reforms. Thousand Oaks, CA: Corwin.
Murphy, J., & Meyers, C. V. (2008). Turning around failing schools: Leadership lessons from the organizational sciences. Thousand Oaks, CA: Corwin.
O'Day, J. A. (2002). Complexity, accountability, and school improvement. Harvard Educational Review, 72(3), 293–329.
Oshry, B. (2007). Seeing systems: Unlocking the mysteries of organizational life (2nd ed.). San Francisco: Berrett-Koehler.
Petroski, H. (2006). Success through failure: The paradox of design. Princeton, NJ: Princeton University Press.
Rowan, B. (1990). Commitment and control: Alternative strategies for the organizational design of schools. Review of Research in Education, 16, 353–389.
Rowan, B. P., Correnti, R. J., Miller, R. J., & Camburn, E. M. (2009). School improvement by design: Lessons from a study of comprehensive school reform programs. In G. Sykes, B. Schneider, D. N. Plank, & T. G. Ford (Eds.), Handbook of education policy research (pp. 637–651). New York: Routledge.
Rudolph, J. W., & Repenning, N. P. (2002). Disaster dynamics: Understanding the role of quantity in organizational collapse. Administrative Science Quarterly, 47(1), 1–30.
Scott, W. R. (2002). Organizations: Rational, natural, and open systems (5th ed.). Upper Saddle River, NJ: Prentice-Hall.
Simon, H. (1986). Theories of bounded rationality. In C. B. McGuire & R. Radner (Eds.), Decision and organization (Vol. 2, pp. 161–176). Minneapolis, MN: University of Minnesota Press.
Smith, M. S., & O'Day, J. (1991). Systemic school reform. In S. H. Fuhrman & B. Malen (Eds.), The politics of curriculum and testing: 1990 Yearbook of the Politics of Education Association (pp. 233–267). Washington, DC: Falmer.
Smylie, M. A. (2010). Continuous school improvement. Thousand Oaks, CA: Corwin.
Smylie, M. A., & Crowson, R. L. (1996). Working within the scripts: Building institutional infrastructure for children's service coordination. Educational Policy, 10(1), 3–21.
Smylie, M. A., & Perry, G. (1998). Restructuring schools for improving teaching. In A. Hargreaves, A. Lieberman, M. Fullan, & D. Hopkins (Eds.), International handbook of educational change (pp. 976–1005). Dordrecht, The Netherlands: Kluwer.
Smylie, M. A., Wenzel, S. A., & Fendt, C. R. (2003). The Chicago Annenberg Challenge: Lessons on leadership for school development. In J. Murphy & A. Datnow (Eds.), Leadership lessons from comprehensive school reforms (pp. 135–158). Thousand Oaks, CA: Corwin.
Staw, B. M., Sandelands, L. E., & Dutton, J. E. (1981). Threat-rigidity effects in organizational behavior: A multilevel analysis. Administrative Science Quarterly, 26(4), 501–524.
Sunstein, C. (2013). If misfearing is the problem, is cost-benefit analysis the solution? In E. Shafir (Ed.), The behavioral foundations of public policy (pp. 231–242). Princeton, NJ: Princeton University Press.
Thompson, J. D. (1967). Organizations in action. New York: McGraw-Hill.
Yu, T., Sengul, M., & Lester, R. H. (2008). Misery loves company: The spread of negative impacts resulting from an organizational crisis. Academy of Management Review, 33(2), 452–472.
Yukl, G. (2013). Leadership in organizations (8th ed.). Boston: Pearson.

Chapter 9

Commentary: Toward Systemic Reform in Urban School Districts

Kenneth K. Wong
Brown University

Urban districts that persistently struggle with academic performance often face long-term structural challenges in their larger socioeconomic and political environments. Steady outmigration of middle- and working-class families has left behind school communities with rising concentrations of families in poverty (Barton & Coley, 2010; Kneebone & Holmes, 2014; Wilson, 1987). Districts in need of improvement are often home to recent immigrants with diverse linguistic and cultural experience. When a substantial percentage of the working-age population suffers from long-term unemployment or severe underemployment in a stagnant local labor market, adolescents and young adults lack social connections to work-oriented role models. In many low-income school districts with a diminishing middle class, school leaders have difficulty in building broad-based coalitions to seek state and local government support for needy students to help them meet their learning goals, which is particularly important as state and district leaders begin to implement the recently passed Every Student Succeeds Act (ESSA). These districts also experience high turnover of superintendents and principals, and parents and the general public demonstrate low levels of confidence in school reform efforts. Fragmentation in district governance creates an additional obstacle to policy coherence. These districts, in other words, are "besieged" (Howell, 2005).

There is, to be sure, no shortage of ideas on how to restructure persistently low-performing districts. On one side of the reform continuum are advocates who question the relevance of school boards and districts as governing entities (Finn & Petrilli, 2013; Hess & Meeks, 2013). These seemingly inflexible, jurisdictionally defined structures are seen as outdated in meeting students' learning needs for the 21st century. On the other side are reformers who argue for additional government funding as the cure for low school performance. These and other competing perspectives may be applicable to certain situations at certain times, but the critical need is for systemic reform, based on the integration of knowledge and practice from various disciplines and perspectives. We have a wealth of experience in turnaround of individual schools. But it is time to build a knowledge base that can support turnaround of an entire district. In this commentary, I will discuss several approaches to district reform, highlighting the contributions of the chapters in this volume to meeting the systemic challenge.

The Federal Role in Advancing Systemic School Reform: Accountability With Differentiated Intervention and Support Efforts to restructure failing districts and schools have gained national attention in part because of the federal focus on intervention since the No Child Left Behind Act of 2001 (NCLB). Since the enactment of the 1965 Elementary and Secondary Education Act (ESEA), the federal role has broadened from the initial focus on antipoverty to embracing outcomebased accountability (Wong, 2013). Building on the recommendations of the 1983 report A Nation at Risk and on subsequent amendments to ESEA, federal K–12 initiatives steadily shifted toward outcome-based performance until NCLB was enacted in 2001 with bipartisan support. The Obama administration has further broadened the federal role by leveraging institutional conditions at the state and district levels toward stronger academic outcomes. First, U.S. Secretary of Education Arne Duncan pushed for more direct district intervention in persistently lowperforming schools. In his proposal to reauthorize NCLB in elementary and secondary education, Secretary Duncan argued for four strategies to turn around the nation’s lowest performing 5% of schools (approximately 5,000 schools). The Duncan strategies included •

• School turnaround under a new principal, who is permitted to recruit at least half of the new teachers from outside the school;

• School transformation to strengthen professional support, teacher evaluation, and capacity building;

• School restarting, that is, reopening of schools either as charter schools or under management by organizations outside the district; and

• School closure, which results in moving all students in the low-performing schools to other, higher performing schools.

Since 2012, as the federal government has approved state applications for NCLB waivers, the turnaround framework has been modified to encourage state education agencies to support districts' efforts to differentiate low-performing schools into several categories, including Focus Schools, which require the most extensive district intervention.

Second, the Obama administration has targeted fiscal incentives to support school improvement initiatives. In making its first School Improvement Grants (SIGs) to support school turnaround efforts, the administration in December 2010 allocated $3 billion in federal funds to more than 730 schools in 44 states. Of these schools, an overwhelming majority (71%) chose the transformation option; very few decided to restart (5%), and even fewer were subject to closure (3%). The remaining 21% chose the turnaround option, where the principal and a majority of the teaching staff were replaced (Klein, 2011). Equally important, only 16.5% of the students in all the SIG schools were White, as compared with 44% African American and 34% Hispanic. The choices made by the first cohort of SIG grantees seemed to suggest a preference for more incremental approaches to school improvement.

Third, the Obama administration has attempted to align federal administrative capacity with the reform needs at the district and school levels. In anticipation of local inertia toward incremental organizational changes, the Obama administration created the Office of School Turnaround in late 2011 to monitor and support local efforts to raise school performance. Further, Obama's February 2012 proposal to reauthorize ESEA aimed at stronger program coordination. Specifically, the administration proposed to consolidate 38 major programs into 11 broad initiatives. Departing from the categorical arrangements, these new programs were designed to allocate federal funding through competitive applications, focus on what works, and allow for greater local discretion in implementation. With the aim of "expanding educational options," the administration proposed to consolidate five programs: Charter Schools Grants, Credit Enhancement for Charter School Facilities, Parental Information and Resource Centers, Smaller Learning Communities, and Voluntary Public School Choice. A Teacher and Leader Pathways fund would support School Leadership, Teach For America, Teacher Quality Partnerships, Teachers for a Competitive Tomorrow, and Transition to Teaching. The Teacher and Leader Innovation Fund would consolidate the Teacher Incentive Fund and advanced credentialing.

More recently, the Obama administration has granted NCLB waivers to about 40 states. In exchange for a new timetable of academic growth for their students, waiver states are required to meet 18 major reform expectations. In an analysis of the waiver implementation in 16 states, Wong and Reilly (2014) found that federal requirements that call for fundamental departures from current policies and practices tend to face greater local implementation difficulties. For example, because of lengthy legislative processes and union politics, fewer than half of the sample states were able to institute evaluation systems for teachers and principals.

Urban Complexity in Implementing Differentiated Strategies

The federal focus on turnaround by engaging state education agencies and local education authorities in differentiated intervention has strong support in large urban districts, including Washington, D.C., Chicago, New Orleans, and Philadelphia. At the same time (as suggested in case studies of the Los Angeles and New York City school districts in Chapters 5 and 6 of this volume), school autonomy, service contracting, and other differentiated strategies are likely to face implementation challenges. In examining the Public School Choice Initiative in Los Angeles during its first several years, Bush-Mecenas, Marsh, and Strunk (Chapter 5) observed the difficulties in implementing a fairly comprehensive yet differentiated set of reform strategies, such as granting greater school autonomy coupled with district oversight, replacing principals and staff, recruiting a diverse set of service providers to operate and manage schools, and engaging the community. These reform features resembled those of the Obama administration, but with a stronger dose of portfolio management that involved diverse providers (Bulkley, Henig, & Levin, 2010). Bush-Mecenas et al. found that political influence, notably that of members of the school board, interfered with the process of selecting redesign plans and providers for schools with high needs. As the district responded to initial implementation problems by creating new governing layers for the initiative, the organizational complexity and confusion tended to discourage school design teams from submitting Public School Choice Initiative applications.

Similarly, the New York City case study, by Wohlstetter, Buck, Houston, and Smith (Chapter 6), illuminated the challenges of implementing the Common Core standards in a system that empowered principals as CEOs of their schools and, at the same time, instituted a strong accountability framework: the Citywide Instructional Expectations (CIE). Particularly interesting were the contrasts between high- and low-performing schools (based on an in-depth study of four New York City schools). Not surprisingly, principals in low-performing schools complained about the pressure of trying to meet the CIE. These principals did not actively seek technical support from the district. In contrast, their high-performing peers had favorable views of the CIE and welcomed greater executive autonomy. High-performing schools "were able to exploit autonomy to their own advantage" (p. 174). In other words, the New York City study suggests the need for the district administration to redouble its efforts to build the capacity of the lower performing schools in the era of Common Core, a top priority of the Obama administration.

Variations in Capacity Building

Given the importance of capacity building for schools' ability to meet reform expectations, three chapters in this volume offer approaches to this urgent issue. First, an all-too-common challenge in low-performing schools is professional isolation. There is a general lack of communication about student progress within a subject across grade levels and across subjects within a grade level. There is clearly a need to facilitate stronger vertical and lateral communication and exchange among educators. In this regard, the study conducted by Penuel and DeBarger (Chapter 4) on forging partnerships between researchers and subject matter educators shows the promise of sustaining a purposeful professional community. Particularly interesting is the authors' observation of the need to build a broader engagement process to ensure support from district leadership.

The second issue in capacity building is analytic capacity in using data. Urban districts, given their organizational complexity and diverse student populations, can benefit from using data to facilitate strategic deliberation. I had the opportunity to observe a "performance management" session conducted by the CEO of the Chicago Public Schools, where principals and district administrators collaboratively examined data on attendance and course-taking patterns in several high schools with different dropout rates. The discussion was driven by the CEO's active search for actionable strategies. Clearly, this rapidly changing arena of data use has attracted ongoing investment in financial and human capital resources across urban districts. Hamilton and Schwartz (Chapter 2) offer their expertise in mapping out the guiding principles for designing meaningful indicators at the school level. In considering 21st-century learning objectives, the authors advise district leaders to carefully weigh various trade-offs in the indicator system, such as the balance between high- and low-stakes items. They argue cogently for supporting principals and teachers in using the indicator system.

The third issue in capacity building is the systemic challenge of using rigorous research to inform practice in a timely manner. The gold standard for research, according to the Institute of Education Sciences, is the use of randomized controlled trials (RCTs) to determine the causal effects of reform strategies. While the RCT is well respected as a research design, a key constraint is the lengthy process it often requires to yield results. There is clearly a need for rigorous studies with more relevance for educators. In this regard, the notion of "formative experimentation," proposed by Supovitz (Chapter 3), creatively combines the rigor of RCTs with timely practitioner feedback to meet the immediate needs of schools and educators to improve their practice. In his experimental project linking instructional practice to student performance in a suburban district in New Jersey, Supovitz randomly assigned math teachers to the treatment and control conditions and engaged the participating teachers in discussions of the data within each assignment group. The teachers' perspectives contributed to an understanding of why differences did or did not exist between the treatment and control conditions. The discussions also facilitated the development of a professional learning community.

Efforts to Promote Policy Coherence

Given the fragmented nature of our education policy system, reforming the technical core alone may not be adequate as a systemic reform strategy. Two of the chapters in this volume call for a broader focus on leadership, politics, and governance. To be sure, a structural challenge for many urban districts is the fragmentation of district governance. Too often, superintendents face conflicting demands from individual members of elected school boards, well-organized teachers' unions, state and local governmental actors, and community groups, among others. In some large districts, mayors aim to place accountability in the mayoral office to improve policy coherence at the district level (Wong, Shen, Anagnostopoulos, & Rutledge, 2007). But because mayoral involvement is not feasible or desirable in some urban communities, reformers have implemented a wide range of district-wide strategies.

Given the ongoing investment in district reforms, there is a need to generate usable lessons. However, the existing knowledge base on district reform effectiveness may need to be reconsidered. In an extensive review of "what works" in district and school reforms, Trujillo (Chapter 1) found that the empirical literature generally suffers from several limitations, including inadequate sampling, weak comparative design, an absence of longitudinal analyses, limited focus on school- and classroom-level processes, and a narrow range of effectiveness measures. Further, she found that most studies tended to focus on technical aspects (such as reform framed in terms of a subject matter) and gave little attention to issues of politics, class, race, or ideology in seeking to understand the effectiveness of district-level reform. Given the changing demographic characteristics and political landscapes in urban districts, researchers will need to rely more on interdisciplinary methods when examining the multidimensional aspects of district reform.

Effective district reform is critically shaped by the quality of human capital in the system. Recruiting, rewarding, and keeping effective school leaders and district administrators have become necessary conditions for district turnaround. In Chapter 7, Finnigan, Daly, and Liou apply social network analysis to provide visual mapping of the professional relationships among school leaders and administrators over a four-year period in a low-performing district. They found that high rates of turnover among school leaders and administrators had a detrimental effect on trust among key players within the system. For example, they found that high turnover of school principals was linked to "limited lateral flow of ideas and practices across school-based leaders" (p. 197). They observe that this kind of limitation may weaken the human-capital foundation for advancing district reform.

From Status Quo to a Movement of Urban Systemic Reform

Researchers of urban district reform are witnessing moments of historic significance. The institutional status quo is called into question by the rationale behind systemic reform, namely, accountability with consequences for differentiated interventions. At the same time, the articulation and the configuration of systemic approaches are shaped by local political and policy contexts. Taken as a whole, this volume illuminates the diverse reform strategies that are being carried out across urban districts. There is simply no silver bullet. Nor can we afford to maintain the status quo in a fast-changing education sector. Indeed, given the growing public sentiment against the status quo, the current systemic efforts in urban districts can be seen as a new reform movement in formation. This volume urgently calls upon education researchers to assess the knowledge base and generate usable knowledge for the next phase of urban school reform. Clearly, it offers a useful institutional baseline for reformers as they assess the current status of major reform initiatives using models such as portfolio management, professional learning communities, researcher-practitioner partnerships, and indicator systems. If the lessons learned can be applied to build, and in some cases redesign, the capacity of low-performing districts, then the next edition will allow us to focus on the exciting task of scaling up effective systemic strategies in urban districts across the country.

References

Barton, P., & Coley, R. (2010). The Black-White achievement gap: When progress stopped. Princeton, NJ: Educational Testing Service.
Bulkley, K., Henig, J., & Levin, H. (Eds.). (2010). Between public and private. Cambridge, MA: Harvard Education Press.
Finn, C., & Petrilli, M. (2013). The failures of U.S. education governance today. In P. McGuinn & P. Manna (Eds.), Education governance for the twenty-first century (pp. 21–35). Washington, DC: Brookings Institution Press.
Hess, F., & Meeks, O. (2013). Rethinking district governance. In P. McGuinn & P. Manna (Eds.), Education governance for the twenty-first century (pp. 107–129). Washington, DC: Brookings Institution Press.
Howell, W. (Ed.). (2005). Besieged: School boards and the future of education politics. Washington, DC: Brookings Institution Press.
Klein, A. (2011, January 11). Turnaround-program data seen as promising, though preliminary. Education Week, 30(15), 20–21. Retrieved from http://www.edweek.org/ew/articles/2011/01/12/15turnaround-2.h30.html?qs=school+turnaround
Kneebone, E., & Holmes, N. (2014, September 19). New census data show few metro areas made progress against poverty in 2013 (Brookings Metropolitan Opportunity Series). Washington, DC: Brookings Institution Press.
Wilson, W. J. (1987). The truly disadvantaged: The inner city, the underclass, and public policy. Chicago: University of Chicago Press.
Wong, K. (2013). Education governance in performance-based federalism. In P. McGuinn & P. Manna (Eds.), Education governance for the twenty-first century (pp. 156–177). Washington, DC: Brookings Institution Press.
Wong, K., & Reilly, M. (2014, August). Education waivers as reform leverage in the Obama administration: State implementation of ESEA flexibility waiver request. Paper prepared for the annual meeting of the American Political Science Association, Washington, DC.
Wong, K., Shen, F., Anagnostopoulos, D., & Rutledge, S. (2007). The education mayor. Washington, DC: Georgetown University Press.

Conclusions

The Challenge of School and District Improvement: Promising Directions in District Reform

Alan J. Daly
University of California, San Diego

Kara S. Finnigan
University of Rochester

The studies contained in this volume provide powerful insight into various organizational structures that facilitate school reform, ways that school districts respond to multiple contextual and political demands, how capacity for improvement is developed and sustained, and what gets in the way of reform efforts. Together, these studies can guide future work on the challenges facing the lowest performing schools and districts nationwide, which is particularly important as state and district leaders begin to implement the recently passed Every Student Succeeds Act (ESSA).

One of our central aims for this volume was to determine where further exploration is needed by researchers, policy makers, and practitioners. We also wanted to leverage learning from recent efforts at improving schools and districts under accountability policies, and we wanted to consider more systemic approaches to large-scale change efforts across educational systems. Smylie (Chapter 8) and Wong (Chapter 9) highlight the main themes of the volume and together provide a strong understanding of the local and policy challenges to large-scale improvement. In this final chapter, we build on their commentaries and on the volume as a whole to elevate the major cross-cutting ideas and to highlight promising directions for policy, practice, and research. We offer this chapter not just as an academic task—bringing the central findings to the surface to consider them in the abstract—but also with an eye toward their importance and their application in policy and practice.

The two authors contributed equally to this chapter.


We discuss three overarching themes in the following sections: (a) why school districts matter, (b) how politics shapes district reform, and (c) the importance of district capacity. In each area we describe high-leverage points where policy makers and district leaders can best bring about system-wide improvement. We then consider the implications for district reform, policy change, and future research.

Why School Districts Matter

As we review the chapters in this volume, one thing becomes clear: school districts, as coordinating mechanisms, matter for system-wide change. But the organizational and governance structures of districts vary greatly, and their differences make the work of studying district reform more complex and nuanced than is typically recognized. In this section we argue that it is important to closely examine districts' underlying structures and how they facilitate or hinder change.

District Form

As the work in this volume indicates, a tremendous amount of variation exists in school districts' form, structure, governance, size, and success; yet often these complex systems are treated as a single type of unit. If descriptors are used, they typically indicate geographic classifications, for example "urban district" or "rural district." Such classifications leave much to be desired in a world of rapid demographic and sociopolitical change. Districts are changing in form and function, and our understanding of them must evolve as well. Generalizations about school districts often overlook critical aspects of their structure that may be consequential and therefore must be attended to. For example, in this volume we present data on several large urban districts that are radically different from one another in organizational and governance features and demographics.

As the field of education reform moves forward, we need to unpack our understanding of districts and pay closer attention to how their varied characteristics support or constrain conditions for learning and improvement. Our study of districts will also need to include the roles of unions at different levels, of school boards, and of wider local and state governmental control.

The emphasis in district reform work has often privileged large urban districts, which serve only about 12% of the nation's students. The next generation of district reform would benefit from closer attention to midsize districts, which serve almost four times as many students as large urban districts. For example, according to the Common Core of Data (CCD), 856 of the nation's 13,449 school districts enrolled between 10,000 and 99,999 students in 2010–2011, together serving 41.7% of students in the United States (Keaton, 2012).

Beyond unpacking variability among districts in regard to size, form, and structure, it is important to examine new and developing district structures (e.g., portfolios, clusters, and networks), many of which are guided by metaphors and approaches drawn from the business literature and efforts at market reform, or result from efforts to better link educators across districts. These structures need to be studied and problematized to unpack both the forms themselves and the theories of action driving structural shifts. Crafting careful studies that attend to larger organizational features, while at the same time attending to the conditions that support and constrain them, will add considerably to our understanding of the variability among districts.

District Functions

Beyond studying district forms, we need to better examine how districts play the external role of mediator between state and local policies, as well as the internal role of mediator between district offices and schools, teachers, and parents; between school boards and superintendents; and even between individual actors. In our work we have focused a great deal on the role of central office leaders in brokering the ideas and practices that flow throughout the educational system (Daly, Finnigan, Jordan, Moolenaar, & Che, 2014). The chapters in this volume expand the understanding of districts' brokering role and, as we discuss in the next section, point to important ways that district leaders must navigate multiple, often conflicting streams of information and political pressure. The degree to which districts are able to play their mediator roles may have direct influence on the outcomes and the ways in which policies are implemented (Finnigan & Daly, 2014).

Furthermore, it is important to pursue a better understanding of the interdependent, complex, and interacting layers within districts. For too long, the work in district reform has not deeply examined the interdependent nature of the relationships between schools and central offices, instead viewing them as separate entities. As we become more skilled and aware of how the work of district reform is systemic in nature, it is important to better theorize the mediating and intermediary roles that districts play among governmental and educational actors. Critical examination of the brokering roles that central office administrators play, and of the conditions under which they facilitate or constrain district-wide improvement, is an area ripe for examination.

A growing and critical line of work reflected in this volume involves the relationships between districts and external partners. Intermediary organizations provide not only input and practical advice but also hands-on support for struggling districts (Finnigan, Bitter, & O'Day, 2009). Intermediaries have taken on important roles in the packaging of practices and services to "sell" to districts (Scott, Lubienski, DeBray, & Jabbar, 2014). A tension results from the fact that these partners fill major district needs and therefore may end up having a great deal of influence over local policies and practices. Despite their deep involvement with districts, these intermediary organizations are often allowed to pursue their own agendas unchecked. Furthermore, even though they receive public funds, they are not necessarily vetted in a systematic and rigorous manner. In the best cases the relationship between intermediary organizations and districts is symbiotic; in the worst cases, the relationship is parasitic and sometimes difficult to dissolve.

External organizations have come to play a critical role in the flow of ideas and practices both into and across districts, and their involvement and influence show no sign of abating. Therefore, it is important to better understand the degree to which district-level leaders have the knowledge and skills to learn from these external partnerships as well as to rigorously evaluate their impact. An emerging line of work suggests that most districts do not have the resources or capacities to carefully evaluate these partnerships and their effectiveness, and that these capacities must be developed through training or professional development. This could be accomplished through relationships with colleges and universities, assuming that those institutions have faculty with the capacity to do this very specialized work and faculty who want to develop these important relationships.

If we are truly interested in collaborative, two-way partnerships with external groups—whether think tanks, associations, foundations, researchers, or even other districts—we need to more critically examine the capacity of the external groups to learn from the districts in which they are working. Creating reciprocal learning and knowledge-sharing relationships will be critical to lifting the overall knowledge base of district reform. This line of work presents another opportunity for the next generation of district reform scholars, particularly in the realm of identifying, characterizing, and better evaluating the roles and types of organizations involved, as well as better understanding how different types of partnerships are formed, sustained, supported, and terminated.

How Politics Shapes District Reform

One of the most pervasive themes in this volume is the role of beliefs, values, and assumptions in driving action. Extensive research on beliefs, particularly those of teachers, suggests that they matter for how individuals make sense of implementation (e.g., Kelchtermans, 2009; März & Kelchtermans, 2013). Based on education and experience, educators develop a personal belief system that serves as a cognitive lens through which they make sense of their context (Kelchtermans, 2009). The role of beliefs and values is also closely tied to the role of politics and how political actors shape and determine the distribution of resources and access to them.

The Strength of Beliefs

Implicit theories as to how the world "works" play an important part in the judgments and interpretations that are made as the work of change takes place. Actors' intuitive "screens" filter new experiences and knowledge and thus guide and drive action, while also providing a vantage point from which new ideas and activities are interpreted and enacted. The role of beliefs and assumptions merits more attention in the district reform literature. Beliefs and assumptions are important in considering the ways that politics may undermine some of the goals of ESSA, given local competing interests as well as the more limited voice of low-income families and families of color in educational policy.

In addition, competing values, both personal and organizational, influence district reform. From an organizational learning framework, we find more examples of "single-loop learning," in which individuals and systems operate on the basis of past approaches or current beliefs without engaging in the deeper questioning that can lead to different approaches and outcomes (Argyris & Schön, 1996). The technical work of education reform is important and is, in fact, what most researchers have focused on in an effort to help guide practice. But the social, political, and cultural contexts in which the work gets done are equally important. Narrow conceptions of improvement and purpose may also artificially narrow the discourse about reform ideas. The complexity of a reform effort is limited by the conceptions of those who are designing and implementing the reform—conceptions that may ultimately inhibit improvement due to narrow and overly simplified identification of issues. Expanding notions of how problems are identified, assessed, and addressed will require first becoming more explicit about the underlying beliefs and assumptions regarding the target of the reforms.

The Power of Politics

Education and educational institutions are inherently political. Decisions are often fraught with layers of politics that counter existing knowledge and evidence. Political systems in education, and the individuals who populate them, often hold competing values and notions about how work gets done and what it takes for improvement to occur, in essence vying for resources, power, and position, often at the expense of wider organizational goals. Moreover, politics in education often has high stakes, such as the risk of losing one's job or one's control over resources and outcomes.

The current reform movement across the United States focuses heavily on technical aspects of schooling, such as the standards for students at different grade levels to prepare them for college and the ways that assessment systems anchor reform efforts. Yet we know from the work in this volume that the best-laid plans can be derailed, either by individuals on the ground who do not buy into the efforts or by micro- and macro-political pressures that shape behaviors and responses. Attending to the technical core of the work of districts while also recognizing the complex political environments in which they reside will be important in moving the field forward.

Macro-political forces arise from large-scale demands from federal and state policy, communities, parents, and other interested parties, who place pressures on school districts to meet certain demands (e.g., improving student performance through planning processes) (Bacharach & Mundell, 1993; McLaughlin, 1990). These macro-political forces are evident in this volume: they often set the context, as well as define and influence the decision-making processes and actions of district-level actors. In essence, macro-political forces exert pressure on individual actors, who, by engaging in micro-politics based on their own lenses of values, beliefs, and experiences, prioritize action, make judgments, and leverage resources (material, skill-related, and social) to produce outcomes. Macro- and micro-political forces are at work in district systems and all too often are conflated.

This volume also hints at how districts continue to be structured in silos, with individual groups working toward their own needs and goals. Discrete actions by departments and other groups create a lack of coherence across these groups—whether they be teachers, educators in a school, parents in a neighborhood, school board members, or central office leaders—as well as a lack of alignment across the entire district. What requires more attention is the way in which external and internal political pressures reinforce these silos or promote rigid responses in the system. These pressures ultimately isolate and polarize rather than connecting individuals and moving them toward collective, shared action. As observed in this volume, coherent systems are those in which micro-decisions align with macro-demands, perhaps resulting in desired outcomes. Ultimately, decision making at the micro-political level can be conceptualized as being about interests (of systems or individuals) and how to protect those interests (Blase, 1991).

As Flessa (2009) notes, more robust research designs that probe actor relations are necessary for understanding the macro- and micro-political levels. Future studies that draw on theories provided by political science will offer additional analytic purchase on the study of district reform. Equally important is to more carefully examine how political actors and systems at the macro- and micro-levels are related to the uptake and implementation of district reform efforts.

The Importance of District Capacity

While some have argued that low outcomes are the result of will, capacity, or both, there is very little effort in current policy to build the capacity of educational leaders in districts. Although we have argued that districts play an important role in large-scale change—as they weather external and internal political forces—many will fail to achieve necessary complex changes without greater attention to the shifting instructional, curricular, and socioemotional needs in school systems. When people talk about central offices or schools, they often overlook the complex web of authority and relationships that ties individuals and subgroups together. In this final section we call attention to some issues related to capacity, particularly relational coherence and stability. The volume overall points to innovative methods (e.g., design-based approaches, social network analyses, and longitudinal studies) that could be used to tackle important and pressing questions of district reform.

Relational Coherence

As some chapters in this volume show, fragmentation and disconnected efforts across a district negatively impact reform efforts and, ultimately, outcomes. System-wide approaches require attention not just to all of the actors in a system but also to the interconnectedness of those actors. Other chapters note the professional isolation that exists among educators and the need for more attention to how this isolation has impacted improvement efforts. Attention to horizontal, vertical, and developmental coherence is important because it is often through these kinds of coherence that reform and change take place. Current reforms in education, which focus heavily on improving the technical core of the work, are important but will not be effective unless attention is also given to the ways in which trust, innovative climate, and system-wide learning come together. Without these relational alignments, improvement in district systems is unlikely.

In some of the studies presented in this volume, trusting relationships among stakeholders within a district, as well as between districts and external partners, facilitated improvement both inside and outside the district. Both design-based work and top-down policies required collaborative and reciprocal relationships to successfully implement reforms. One of the biggest challenges for both state and local leaders in responding to ESSA is not just how to build these relationships but how to rebuild them after histories of dysfunction and mistrust. For example, researchers may need to express more explicitly the connections between their work and the social and educational challenges in their communities. Educators may need to overcome histories of negative interactions among themselves and with their communities. Rebuilding relationships requires nurturing and sustaining trust and actively engaging in the work of creating connections. Research is needed on how to form high-quality social ties imbued with trust, and on the conditions under which long-strained relationships are repaired and renewed to do the work of improvement.

Achieving Stability Without Stagnation

The district reform literature often treats turnover among educators as movement out of a system. This volume suggests that turnover consists of both the departure and the arrival of actors in a system. Exploring and understanding churn is important, as significant leadership costs are associated with both the exit of actors (loss of knowledge, social support, organizational memory, and funds spent on training and development) and the entrance of new actors (training costs, time spent on learning technical and social systems). In general, the limited district reform literature regarding churn argues that an organization's development and improvement depend on the degree to which members make contributions to organizational learning, knowledge, and innovation. Constant churn can be viewed as disruptive; inevitably, it results in instability in fiscal, human, and social capital.

This volume and the broader district literature both demonstrate the human-capital costs of churn, but this volume points more directly to its social costs. Little attention has been paid to how churn disrupts the relational coherence needed for complex change to happen in urban districts. We have limited empirical work on how churn impacts perceptions of organizational learning in systems and the role of those perceptions in churn, or on how the entry of newcomers into a system may change leadership dynamics. This is a rich area for future scholarship in district reform, and it will be enhanced by new approaches (design-based work) and unique methods (longitudinal social network analysis), as demonstrated in this volume. Coupling new frameworks with robust methods will result in a more nuanced examination of how accountability policies impact churn and how churn impacts change.

District Reform, Policy Change, and Research Implications

The complex work of district reform, particularly district reform under pressure, is fraught with challenges and rich with opportunity. We have presented some of the most recent thinking in this area to benefit both those working on the ground—in school districts or government agencies—and those continuing to move this area of scholarship forward. The importance of three primary and connected elements has emerged: leadership in district improvement, alignment (or lack thereof) between micro- and macro-politics and resulting outcomes, and the challenge of connecting research to policy and practice. In this final section we consider the next steps in moving forward in policy, practice, and research in light of these themes.

An important takeaway from this volume is that state and federal policy must shift toward capacity building in educational systems to bring about systems-level change. Although ESSA shifts the focus to the state level, other federal efforts could also incentivize this work to ensure that it happens. This suggests a better partnering relationship among federal, state, and local actors in support of building capacity throughout the educational system. If we are truly interested in educational improvement, particularly in the lowest performing schools and districts, policies must shift away from compliance and toward the building of knowledge and skills—and stabilization—among leaders in these systems. Penalties and sanctions can move systems only so far, and at a potentially great price (Daly, 2009; Finnigan et al., 2009). Shifting to policies that focus on skill development, that recognize and support performance, that create opportunities for collaboration, that build leader capacity, and that create networks of knowledge sharing holds great potential for improving districts. But it will require a paradigm shift in the way we view our public school system and those who work within it, away from blame and toward complex systems change with support. State policy makers will need to focus less on accountability and more on development, in essence backtracking to certification and preservice and in-service development to determine how to better develop leaders who deeply understand the importance of structures and relationships that support collaborative sense-making. This perspective has, unfortunately, gotten lost in the current reform movements, creating a missed opportunity for fostering innovation and improvement as the threat of school and individual accountability has resulted in narrow responses to change (Daly, 2009; Daly et al., 2014; Finnigan, Daly, Jordan, Moolenaar, & Che, 2014; Finnigan, Daly, & Stewart, 2012; Finnigan & Gross, 2007).

The need to shift policy creates a host of important research and practice opportunities related to district reform. For example, in California there is much work on Local Control and Accountability Plans, in which districts and local communities determine goals, specific actions, and measurement systems. This approach opens new research opportunities in tracking how the local policies are built; what research base is used to determine plans, outcomes, and measures; and how well standards are maintained and communicated. Further, opportunities arise in unpacking the theory of action governing these plans and observing which voices at the table receive the most attention. As we move out of the original NCLB era and into the age of the Common Core State Standards, a different set of questions regarding implementation and co-construction will open up for scholars studying educational policy in district reform. Exploring the interaction and communication patterns surrounding these policies will also be critical as a way to understand the framing of improvement.

Capacity building may occur through intensive partnerships between district educators and researchers or other external groups, or through leadership development that provides greater attention to the quantity and quality of the social interactions and cross-collaborative efforts that are necessary within districts if they are to become learning organizations. We see this volume as important not only in documenting what has occurred in districts under the accountability policy reforms of the last decade, but also in pushing the idea of focusing on systems, which means moving beyond traditional piecemeal approaches targeting individual low-performing schools. Both policy and practice must shift toward district-wide organizational efforts to address the ways in which underlying systems facilitate or hinder school-level improvement. We exist in an increasingly interdependent world, and the districts that serve America's students are no different. Understanding and embracing the systemic nature of the work will be key for research, policy, and practice, as will the leveraging of different perspectives and new voices.

In addition to this shift to a more systemic view, we see a few critical areas of research that need to be investigated in the coming years. First, in this hyperconnected world, social media will become increasingly important in district reform. How media are used (and misused) for professional development, communication, and framing of issues, and as a battleground for beliefs, will provide a rich area for exploration (see http://www.hashtagcommoncore.com for a sample). The new media space will require new methods for looking at local issues of policy and practice and at the role of "big data" and socially influential individuals in the work of enacting and examining system-wide reform. Work that takes a larger systems perspective, draws on seldom-used frameworks (e.g., Complex Adaptive Systems), links across disciplines, and examines the social side of district reform may provide rich insights into the complex work of improving districts.

As this volume indicates, the work of district reform is at its core a social act grounded in sense-making and co-construction among a wide set of actors. The idea of co-creation also extends to the researchers and practitioners who are drawing on "design-based approaches," in which the scholar and partner engage in a co-constructed, mutually respectful, and reciprocal relationship in an effort to understand and improve outcomes. This work in education is still evolving, but the research set forth in this volume indicates that it offers some promising benefits for both research and practice.

As the volume also indicates, the work of improvement is interconnected and includes other actors that have not been specifically examined in these chapters. These include important stakeholders such as school boards, unions at different levels, mayoral and other governmental offices, and other intermediary groups (e.g., universities, think tanks, and foundations). These groups can wield significant power in the work of district reform and therefore must be part of the systems view of reform. Politics are often at play in shaping district-wide improvement; many of the chapters indicate that district reform includes a variety of stakeholders competing for position and voice. Drawing on political theory and attending to issues of power and control, as well as competition and collaboration, may yield great opportunities for scholars of district reform. For example, the centralization and decentralization of organizations have long been debated in the field. Applying these seemingly polar terms to a temporal trajectory of change and growth may open up new avenues in the work of district reform.

Given the passage of ESSA, now is the time for policies that support capacity building and leadership development so that we have, across the nation, a more dynamic and generative model of learning and improvement in which knowledge is co-constructed and co-implemented within districts as well as across districts and their partners. Through these studies, we have the opportunity to learn from previous challenges relating to policy design and implementation and to move toward more functional systems of coherence, collaboration, and capacity in support of preparing students for college and careers. This call is critical in both a practical and a moral sense. For far too long, the most underserved youth have not realized the promise of the U.S. education system, and dedicated individuals within systems have been blamed for systemic shortcomings. State and federal policy implemented from a systems and relational perspective could redirect our efforts toward supporting school systems as learning organizations. This can be accomplished by connecting the knowledge that resides inside and outside of systems and by orienting leaders toward deep and lasting change grounded in interdependent relationships.

References

Argyris, C., & Schön, D. A. (1996). Organizational learning II: Theory, method, and practice. Reading, MA: Addison-Wesley.
Bacharach, S. B., & Mundell, B. L. (1993). Organizational politics in schools: Micro, macro, and logics of action. Educational Administration Quarterly, 29, 423–452.
Blase, J. (Ed.). (1991). The politics of life in schools: Power, conflict and cooperation. Newbury Park, CA: Sage.
Daly, A. J. (2009). Rigid response in an age of accountability: The potential of leadership and trust. Educational Administration Quarterly, 45, 168–216.
Finnigan, K. S., Bitter, C., & O'Day, J. (2009). Improving low-performing schools through external assistance: Lessons from Chicago and California. Education Policy Analysis Archives, 17(7), 1–24.

Finnigan, K. S., & Daly, A. J. (Eds.). (2014). Using research evidence in education: From the schoolhouse door to Capitol Hill. Rotterdam, The Netherlands: Springer.
Finnigan, K. S., Daly, A. J., Jordan, S., Moolenaar, N., & Che, J. (2014). Misalignment and perverse incentives: Examining the role of district leaders as brokers in the use of research evidence. Educational Policy, 28(2), 145–174.
Finnigan, K. S., Daly, A. J., & Stewart, T. (2012). Organizational learning in schools under sanction. Education Research International, Vol. 2012, Article ID 270404. doi:10.1155/2012/270404
Finnigan, K. S., & Gross, B. (2007). Do accountability policy sanctions influence teacher motivation? Lessons from Chicago's low-performing schools. American Educational Research Journal, 44(3), 594–629.
Flessa, J. (2009). Educational micropolitics and distributed leadership. Peabody Journal of Education, 84, 331–349.
Keaton, P. (2012). Public elementary and secondary school student enrollment and staff counts from the Common Core of Data: School year 2010–11 (NCES 2012-327). Washington, DC: U.S. Department of Education, National Center for Education Statistics.
Kelchtermans, G. (2009). Who I am in how I teach is the message: Self-understanding, vulnerability and reflection. Teachers and Teaching: Theory and Practice, 15(2), 257–272.
März, V., & Kelchtermans, G. (2013). Sense-making and structure in teachers' reception of educational reform: A case study on statistics in the mathematics curriculum. Teaching and Teacher Education, 29, 13–24.
McLaughlin, M. W. (1990). The Rand Change Agent Study revisited: Macro perspectives and micro realities. Educational Researcher, 19(9), 11–16.
Scott, J., Lubienski, C., DeBray, E., & Jabbar, H. (2014). The intermediary function in evidence production, promotion, and utilization: The case of educational incentives. In K. S. Finnigan & A. J. Daly (Eds.), Using research evidence in education: From the schoolhouse door to Capitol Hill (pp. 69–89). Rotterdam, The Netherlands: Springer.

Index

The letter f following a page number denotes a figure, the letter n denotes an endnote, and the letter t denotes a table.

Agneessens, F., 200 American Geoscience Institute, 102 assessment accountability and, 50–51 CCSS-aligned, 49–50, 151–152 classroom, design research partnerships case study, 100–107 classroom response systems, 102 consortium-designed, 50, 151–152 current policy landscape, 49–53 large-scale, purposes of, 53–54 state/district expansions and revisions, 49–53 test- or performance-based, 18, 49–53, 57, 60–61 assessment, formative: design research partnerships case study, Denver Public Schools activities, embedding, 104–105 classroom assessment focus, 100–107 cognitive development perspective in, 105 coherence strategies, 104 design process, 106–107 design team, 106 emerging bilinguals concerns, 107 equity priority, 107 findings and implications, 109–111 partnership outcomes, 108–109 autonomy change mechanism, LAUSD PSCI, 132–133 of principals as CEOs, NYC CCLS implementation, 154–156, 155f, 158, 162, 165–174, 224–225

accountability Adequate Yearly Progress reports, 51 data-based decision making and, 183–184 in district reform, 222–224 generational iterations of, 50–51 global demand for, 2 high-stakes, effects of, 52, 54–58, 183 NCLB requirements, 49–51, 183 NYC DOE, CCLS implementation espoused theory in use, 170– 171, 176 espoused theory of action, 155–156 NYC DOE Bloomberg reforms, 158 in teacher evaluation, 49, 51 See also indicator systems achievement factors limiting, 58 global demand for increasing, 2 infrastructure and resources available to improve, 58, 165– 174 outcomes other than academic, measuring, 50, 61–62 school characteristics affecting, 18 test-based, measures of, 18, 49–53, 57 See also Linking Instructional Practice to Student Performance (Linking Study) Adequate Yearly Progress report, NCLB requirement, 51

Berman, P., 79–80 Bitter, C. S., 153 Bloomberg, Michael, 148, 153–154 Bolman, L. G., 212 243

244 | Index

Bozeman, B., 216 Brown, A. L., 81 Bryant, P., 81 Buck, Brandon, 5, 147–179, 213, 224 Bush-Mecenas, Susan, 5, 119–141, 211, 213, 215, 217, 224 Campbell, D. T., 78 Carnegie Foundation for the

Advancement of Teaching,

Improvement Research model, 81–82 Center on Reinventing Public Education Portfolio Network, 119 change pedagogical, 54–56, 150t Change Agent Study, RAND Corporation, 79–80 change mechanisms, portfolio reform LAUSD, PSCI autonomy, 132–133 capacity building / technical assistance and support, 133–135 competition for selection, 129–132 overview, 125–127 parent and community pressure and contribution, 136–137, 139 rigorous screening of plans, 129–132 change theory, 124–125, 126f Cheng, B. H., 81 Chevron Corporation, 102 Children First Networks, 154–157, 157f, 165–174 classroom response systems

(clickers), 111n2

Clinton administration, Bill, 50 Coleman Report, 13 college/career readiness, 50, 61–63 College Readiness Indicator

Systems, 61

Common Core State Standards (CCSS) adoption and implementation, 147, 149–153, 178nn1–4 assessment aligned with, 49–50, 151–152

critics of, 147–148 development, 147 drafting the, 151 instructional shifts demanded by, 150t purpose, 147 requirements, 147 See also New York City (NYC) Department of Education (DOE), CCLS implementation complex adaptive systems, 111n1 Consortium for Policy Research in Education (CPRE), University of Pennsylvania, 78, 82 Contingent Pedagogies project (NSF), 101–109 Cook, T. D., 78–79 Council of Chief State School Officers, 147, 151 Cuomo, Andrew, 178 curriculum changes with test-based accountability, 54–55 Daly, Alan J., 1–7, 183–201, 213, 216, 227, 229–240 data-based decision making, 183– 184, 225 data systems, third-generation, 51 Deal, T. E., 212 Deasy, John, 124 DeBarger, Angela Haydel, 4, 97–111, 212, 216–217, 225 de Blasio, Bill, 177–178 design-based implementation research (DBIR), 81 district effectiveness literature conceptual dimensions proposed directions, 37–39 research review, 28–34 review findings, 19–22 school-district correlates, 28, 29f implications for research and practice, 35–37 methodological dimensions

Index | 245

limitations, 226–227 proposed directions, 37–39 review findings, 19–20 district effectiveness literature review conclusions, 34–35 data collection techniques, 24–28, 24f design and methodology, 15–18, 22–28, 23f, 47 scope and purpose, 14–15 district effectiveness literature review, findings

of inadequate comparisons, 19

of inadequate measures of

effectiveness, 20

of inadequate sampling, 19 of inadequate theoretical frameworks, 21–22 reductionist view purposes of education, 20–21 school effectiveness relation, 33–35 sociopolitical and normative context of schooling, inadequate treatment of, 21 district effectiveness research background, 11–14 conceptual dimensions school-district correlates, 28, 29f technical correlates, 28–31 effectiveness measures, 26–29, 27f, 31–32f proposed directions, 37–39 theoretical framework, 33–34 district reform advantages of, research on, 12–13 capacity building strategy, 225– 226 coherence goal, 97 equity goal, 97, 100 federal role in advancing, 222–224 human-capital foundation of, 227 leadership role in, 216–217 organizational effectiveness vs., 210–211

policy coherence strategy, 226–227 pressure as a mechanism of, 214–216 research partnerships, 98–100 social networks and social capital in, 185–194, 200–201, 213, 236–237 systemic thinking and action, 212–214 urban districts, 221–222, 224–227 See also leadership churn, effects on learning and improvement in low-performing districts district reform, research recommendations actors role, 239–240 beliefs and assumptions role, 233–234

district capacity, 235–237

district forms and functions, effect on change, 230–233 partnerships, 232–233, 238–239 policy, moving forward in, 237–239 politics effect on shaping, 234–235 relational coherence, 236 social costs of churn, 236–237 social media, 239 district reform initiatives, 97–100 districts decentralization, advocates for, 12 defined, 209 low-performing, competing reform perspectives, 221 measurement systems, guidance for designers of, 64–68

NCLB accountability

requirements, 49–51, 183

school performance measures, efforts to expand, 60–61 situational differences, 209–210 student performance effect,

literature on, 12

21st-century competencies,

measuring, 61–63, 63t

246 | Index

double-loop learning, 185–186
Duckworth, A. L., 62
Duncan, Arne, 222
Dutton, J. E., 215
Edmonds, R., 18
educational research, traditional role, 78–82
educational structure, aligning, 12
Education Department (DOE), U.S., 151
effectiveness, organizational, 210–214, 216
effectiveness measures, districts, 20, 26–29, 27f, 31–32f
equity in education, 51, 56–57, 66, 97, 100
Every Student Succeeds Act (ESSA), 50, 221–223, 229, 236, 240
evidence-based decision making, 183–184, 225
experimental research
   intervention stability condition, 93
   program learning relation, 78–82
experimentation, formative
   capacity building with, 225–226
   implementation decision factors, 91–94
   Linking Instructional Practice to Student Performance (Linking Study), 82–93
   program learning relation, 78–82
Fariña, Carmen, 177–178
Faxon-Mills, S., 54
Fendt, C., 1
Finnigan, Kara S., 1–7, 183–201, 213, 216, 227
Fishman, B. J., 81
Gomez, L. M., 153
grit, measuring, 62
Hamilton, Laura S., 4, 49–68, 214, 225
Hill, Paul, 119
Houston, David M., 5, 147–179, 213, 224
Improvement Research model, Carnegie Foundation for the Advancement of Teaching, 81–82
indicator systems
   academic achievement, measuring outcomes other than, 61–62
   accountability measures in teacher evaluations, 49, 51
   conclusions, 68
   defined, 52, 69
   designing, 62–68
   equity and, 56–57, 66
   purpose, 51
   purpose-selection congruence, 63–64
   test-based accountability, responses to implementation, 54–58
   trade-offs, 225
indicator systems, measures in
   high- and low-stakes, including, 67–68
   instructional factors when adopting, 59–60
   meaning of in indicator systems, 52–53
   practical factors when adopting, 59
   school/district performance
      current policy landscape, 49–53
      expanded measurement systems, 60–61
   technical factors when adopting, 59
Inquiry Hub (NSF), 110
Institute of Education Sciences, DOE, 79, 225
isolation, professional, 154, 225, 236
Kalchman, M., 81
King, John, 177
Klein, Joel, 153–154


Labianca, G., 200
leadership
   district improvement, role in, 216–217
   measuring, 62
leadership churn, effects on learning and improvement in low-performing districts
   conclusions and implications, 197, 200–201
   methods and data sources, 187–189
   research focus, 184–185
   results, 189–199, 190f, 192–193f, 195–196f, 198–199f
   social networks, social capital, and, 186–189, 190f, 192–193f, 195–196f, 198–199f, 200–201, 213
   theoretical framework, 185–187
learning, organizational, 185–194, 200–201
learning communities, 82, 84, 225–227
Linking Instructional Practice to Student Performance (Linking Study)
   design, 83–85
   design changes, 85–91, 90t
   discussion, 91–94
   hypothesis, 82–83
   introduction, 82–83
   overview, 84f
   summary, 89–91
Liou, Yi-Hwa, 5, 183–201, 227
Los Angeles Annenberg Metropolitan Project (LAAMP), 121
Los Angeles Educational Alliance for Restructuring Now (LEARN), 121
Los Angeles Unified School District (LAUSD), 121
   See also portfolio reform, Los Angeles Unified School District (LAUSD) Public School Choice Initiative (PSCI)
Malen, B., 148
March, J. G., 215
Marsh, Julie A., 5, 119–141, 210, 224
McCandliss, B. D., 81
McDonnell, L., 147–148
McLaughlin, M. W., 79–80
measurement systems
   guidance for designers of, 64–68
   purpose of, 63
   See also indicator systems, measures in
motivation, effect on test-based accountability, 57
National Governors Association, 147, 151
National Science Foundation (NSF)
   Contingent Pedagogies project, 101–109
   Inquiry Hub, 110
A Nation at Risk, 222–223
New York City (NYC) Department of Education (DOE)
   Bloomberg reforms, 153–158
   statistics, public school system, 153
New York City (NYC) Department of Education (DOE), CCLS implementation
   accountability mechanisms
      Bloomberg reforms, 158
      espoused theory in use, 170–171, 176
      espoused theory of action, 155–156
   central office power
      espoused theory in use, 161, 163–167, 170, 178, 224–225
      espoused theory of action, 154
   CEO autonomy
      Bloomberg reforms, 158
      espoused theory in use, 162, 162f, 165–174, 224–225
      espoused theory of action, 154–156, 155f, 171
   Children First Networks, 154–157, 157f, 165–174


   Citywide Instructional Expectations (CIE), 161, 163–167, 163t, 170, 178, 224–225
   criticisms, 2013–2015, 177–178
   espoused theory in use, conclusions, 174–177
   espoused theory of action alignment, 148, 158–160
   espoused theory of action components, 155f
   high- vs. low-performing schools, 160t, 166–169, 173–176, 224–225
   research questions and data sources, 158–160, 160t
   resources and support
      espoused theory in use, 165–174
      espoused theory of action, 154–157, 155f
New York State United Teachers, 177
No Child Left Behind (NCLB), 49, 51, 54, 79, 151, 183, 222–224
Oakes, J., 37
Obama administration, Barack, 151, 222–223, 225
O’Day, J., 153
Office of School Turnaround, 223
organizational change, district improvement and, 210
organizational effectiveness, 210–214, 216
Partnership for Assessment of Readiness for College and Careers (PARCC), 151
pedagogy, changes in
   effects of test-based accountability, 55–56
   instructional shifts demanded by CCSS, 150t
Penuel, W. R., 4, 81, 97–111, 212, 216–217, 225
portfolio districts, key elements, 119
portfolio management model, 119
Portfolio Network, Center on Reinventing Public Education, 119
portfolio reform, Los Angeles Unified School District (LAUSD) Public School Choice Initiative (PSCI)
   background, 120–124
   change theory, 124–125, 126f
   conceptual framework, 124–127
   conclusions and implications, 137–141
   data and methods, 127–129
   design changes, 124
   goal of, 121–122
   implementation politics, 224
   mechanisms of change
      autonomy, 132–133
      capacity building / technical assistance and support, 133–135
      competition for selection, 129–132
      overview, 125–127
      parent and community pressure and contribution, 136–137, 139
      rigorous screening of plans, 129–132
   portfolio environment, 126
   school application and selection, 122–124, 123t, 126, 129–132
pressure
   as mechanism for district improvement, 214–216
   parent and community, in portfolio reform, 136–137, 139
   threat-rigidity response, 215
program development
   DBIR-related, 81
   design experiments to develop, 79–81
   experimental research role in, 78–82
   researcher-program developer collaboration, 81–82
Race to the Top, 151, 183


RAND Corporation Change Agent Study, 79–80
Reilly, M., 223
Repenning, N. P., 216
research, NCLB requirement for scientifically based, 79
Rudolph, J. W., 216
Sabelli, N., 81
Sandelands, L. E., 215
school effectiveness research
   conceptual dimensions, 20–22
   correlates of achievement, 18–19
   district effectiveness relation, 13–14, 26–29, 29f, 33–35
   methodological dimensions, 19–20
school-by-school reform, 11–14
school improvement
   communication in, 225
   experiment’s role in, 78–82
   federal role in advancing, 222–224
   fiscal incentives, 223
   NCLB accountability requirements, 49–51
School Improvement Grants (SIGs), 223
schools, high- vs. low-performing
   low-performing turnaround framework, 222–223
   NYC CCLS implementation, 166–169, 173–176, 224–225
   See also leadership churn, effects on learning and improvement in low-performing districts
Schwartz, Heather L., 4, 49–68, 214, 225
Simon, H., 215
single-loop learning, 185–186
Slavin, R. E., 79
Smarter Balanced Assessment Consortium (SBAC), 151
Smith, Courtney O., 5, 147–179, 213, 224
Smylie, Mark A., 1–2, 6, 209–217, 229
social media, 239
social networks and social capital
   district reform, research recommendations, 236–237, 239
   leadership churn and, 189–199, 190f, 192–193f, 195–196f, 198–199f, 200–201
   organizational learning and, 185–194, 200–201, 213
   trust aspect of, 186–187, 191–192, 200–201
Soland, J., 58
Soltis, S. M., 200
Sosovova, Z., 200
Spencer Foundation of Chicago, 82
standardized tests, 18
standards-based reform, state-level, 149–153
Stanley, J. C., 78
states
   ESEA reauthorization, response to, 50–51, 61
   measurement systems, guidance for designers of, 64–68
   NCLB accountability requirements, 49–51, 54
   NCLB waivers, 223–224
   school performance measures, efforts to expand, 60–61
   21st-century competencies, measuring, 50, 61–63, 63t
   See also Common Core State Standards (CCSS)
Staw, B. M., 215
Strunk, Katherine O., 5, 119–141, 211, 224
Supovitz, Jonathan, 4, 77–94, 226
teachers
   isolation, professional, 154, 225, 236
   performance-based evaluation, 49, 51
   teacher-student interactions, effects of test-based accountability on, 56–57
theory-of-action research, 148
   See also New York City (NYC) Department of Education (DOE), CCLS implementation
Thompson, J. G., 215
threat-rigidity response, 215
Trujillo, Tina, 4, 11–39, 211–212, 217, 226
trust aspect of social capital, 186–187, 191–192, 200–201
trust relationships, reform and, 236
21st-century competencies, 50, 61–63, 63t
University of Pennsylvania, Consortium for Policy Research in Education (CPRE), 78, 82
Weatherford, S., 147–148
Wenzel, S., 1
William and Flora Hewlett Foundation, 61
Wohlstetter, Priscilla, 5, 147–179, 213, 215, 217, 224
Wong, Kenneth K., 2, 221–227, 229

Conference Participants

Alan J. Daly, Co-Convener, University of California, San Diego
Kara S. Finnigan, Co-Convener, University of Rochester
Stephen E. Anderson, University of Toronto, Ontario Institute for Studies in Education
William A. Firestone, Rutgers Graduate School of Education
Betheny Gross, Center on Reinventing Public Education
Laura S. Hamilton, RAND Corporation
Julie Reed Kochanek, American Institutes for Research
Kerstin Carlson Le Floch, American Institutes for Research
Karen Seashore Louis, University of Minnesota
Betty Malen, University of Maryland
Julie Marsh, University of Southern California
Michelle Palermo-Biggs, Roberts Wesleyan College
William R. Penuel, University of Colorado, Boulder
Joelle Rodway, University of Toronto, Ontario Institute for Studies in Education
Andrea K. Rorrer, University of Utah
Georgia Sang-Baffoe, University of Rochester
Mark A. Smylie, University of Illinois at Chicago
Louise Stoll, University College London, Institute of Education
Jonathan Supovitz, University of Pennsylvania, Consortium for Policy Research in Education
Tina Trujillo, University of California, Berkeley
Priscilla Wohlstetter, Teachers College, Columbia University
Kenneth K. Wong, Brown University

Note. These are the participants’ institutional affiliations at the time of publication.
