Performance Assessment for Teachers and of Teachers: Combining the Development of Teaching with Teacher Evaluation

Editor’s Note: In a previous post, Lee Nordstrom warns that we should not conflate evaluation and improvement processes, but he also points out that systems of evaluation and improvement are not mutually exclusive. This post explores the conditions under which these two systems might be coherently integrated.

The national push to revamp systems of teacher evaluation has spurred a growing call to attend to teacher development, not just evaluation. But can school leaders effectively evaluate teachers and simultaneously support their growth? Are these goals too contradictory to combine, or can a single system support both efforts?

In a recent 90-Day Cycle[i] conducted at the Carnegie Foundation, we explored whether teacher evaluation and teacher development efforts can and should be combined as aspects of a single system. From 20 expert scholars and practitioners in education and several key pieces of literature, we heard an emphatic “yes.” These experts argued not only that it is possible to address both goals in a blended way, but also that it is preferable for these two efforts to be woven together. One important caveat was that the larger school cultural context within which these processes occur matters greatly. If the school culture is focused on professional growth, the potential success of combining evaluation with improvement is much greater than if such a development-oriented culture does not exist.

Combining Formative and Summative: Reconciling Characteristics of Assessment Across Purposes

The scholarly conversation about whether and how formative and summative assessments can be combined has been ongoing for decades in the field of education. There is a widely shared understanding among experts that the differing aims of formative and summative assessment lend themselves to different characteristics of an assessment system. Some of the more prominently discussed characteristics include:

  • The grain-size of the data: specific and actionable (formative); broad and global (summative).
  • The frequency of assessments and feedback: frequent (formative); infrequent (summative).
  • The importance of reliability of data: less critical because context-specificity is valuable (formative); important because valid global and uniform conclusions depend on high reliability (summative).
  • The criteria by which to make judgments about students’ learning: dependent on context and the individual’s own past performance (formative); criterion- or norm-referenced to enable uniform judgments across learners (summative).

While these characteristics may appear contrary across the two purposes of assessment, some scholars assert that they are not mutually exclusive. In “Systems of Coherence and Resonance: Assessment for Education and Assessment of Education,” authors Paul LeMahieu and Elizabeth Reilly point out that some characteristics are necessary for a particular purpose, while others are common but not required. They give the example that frequent feedback is necessary for formative assessment, but summative assessments do not require infrequency; they can also be frequent.

In the same vein, in “Assessment and Learning: Differences and Relationships between Formative and Summative Assessment,” Wynne Harlen and Mary James assert that detailed, context-specific data are requirements for formative assessment, but that these data can be aggregated over time to produce a holistic perspective and more reliable data for summative purposes. Summative assessments do not require strictly general and non-specific evidence, even if they are often informed by such data.

What we must differentiate when formative and summative assessments are combined is the lens through which judgments are made. Formative assessments should depend on learners’ own past performance and the particular context of assessment, while summative assessments should be judged against external standards or norm-referenced criteria, so that uniform judgments are made across all learners. The critical point for this discussion is that, with thoughtful operationalization, the evidence and the mode of data collection that serve formative purposes can also serve summative purposes once the fine-grained and frequent data are aggregated. The differentiation comes when making inferences and determining next steps, which require different lenses but do not necessitate entirely separate systems.

Practitioners Call for Combining Improvement and Evaluation Efforts

In addition to the technical characteristics of assessment systems, there is a set of issues articulated by the individuals who experience and use processes of assessment and feedback. In “Seeking Balance Between Assessment and Support,” a study of 83 teachers in six high-poverty urban schools, Stefanie Reinhorn found that most teachers said they want to be evaluated and that the evaluation should be connected to support in the same process. These teachers explained that the combination of evaluation and support led to a professionalization of their work, holding all teachers to clear and high standards. Other experts have also found that teachers prefer to be evaluated by someone who knows their practice well and who has seen their growth over time, rather than by an evaluator who visits their classroom infrequently.

On the other side of the feedback relationship, feedback providers also described a preference for combining support and evaluation. These experts explained that teachers are more likely to take feedback seriously and to make changes in their practice when the feedback is connected to evaluation. This is especially the case when the feedback includes critiques of the teacher’s current practice. Brian Yusko and Sharon Feiman-Nemser make this point in their study of two induction programs, “Embracing Contraries,” describing how the feedback from Consulting Teachers (CTs) in Cincinnati had “teeth,” since there were consequences if teachers did not act on the CTs’ feedback.

Experts also discussed some unintended negative consequences of a system where evaluation and development are separated. In such a system, teachers are left to their own devices to “connect the dots” between the multiple sources of feedback. Especially for early career teachers, this can be challenging, leaving them feeling overwhelmed or confused. When coaches and evaluators are not able to align their feedback for teachers, they are also prevented from combining and coordinating their strengths. In a system with a firewall between evaluation and development, feedback providers with specific expertise cannot easily enhance the work of their colleagues who lack this expertise through a team-based approach to providing feedback.

Building Trust in the Presence of Evaluation

Trust between teachers and feedback providers is essential for transparency of practice, communication, and the uptake of recommendations that can lead to the improvement of teaching. A reason often given for separating development and evaluation efforts is that teachers will feel more comfortable sharing their practice with someone who is not also responsible for evaluating them. The experts we consulted agreed that trust is important for promoting transparency and growth, but they argued that whether teachers trust their feedback providers does not depend on whether those providers also evaluate them. Instead, they explained that trust depends on whether teachers see the feedback providers as effective aides to their professional growth who are genuinely committed to supporting them. Yusko and Feiman-Nemser found this to be the case for CTs in Cincinnati, who both evaluate teachers and support their development. CTs reported that their relationships with early career teachers usually developed trust over time, even though they evaluate the teachers.

Next Steps

The experts whom we consulted laid out a strong set of arguments that it is possible and even preferable for efforts of teaching development to be combined with teacher evaluation. There is research that supports these assertions. But this says little about how school leaders should combine these efforts in their day-to-day practice. CTs in Peer Assistance and Review (PAR) programs offer one powerful example, and we should leverage what we can learn from their work. However, this is only one model, and there is also a need to document and explore other examples in other contexts that can serve as practical guidance for school leaders. Collecting and condensing the wisdom from the field about how, concretely, to combine efforts of evaluation with efforts of teaching improvement should be a next step in this line of inquiry. Then, taking an improvement science approach, school leaders interested in moving towards an effective model of combining evaluation with development can test these practices in their contexts to ultimately serve the goal of improved teaching and learning in their schools.

[i] 90-Day Cycles are a disciplined and structured form of inquiry adapted from the work of the Institute for Healthcare Improvement (IHI).  90-Day Cycles aim to:

  • prototype an innovation, broadly defined to include knowledge frameworks, tools, processes, etc.;
  • leverage and integrate knowledge from scholars and practitioners;
  • leverage knowledge of those within and outside of the field associated with the topic; and
  • include initial “testing” of a prototype.