As PAEA transitions the End of Rotation exams to scale scores, many programs have questions about how to convert scale scores to grades. As has always been the case, it is important to reiterate that PAEA does not set standards for End of Rotation exams, nor do we advise programs on setting their own performance bars or passing score requirements. Additionally, the national mean score of any PAEA End of Rotation exam does not suggest a pass bar. Performance bars and passing scores need to be established at the program level.
In making this change, our goal is to help programs understand how to use scale scores to make informed decisions about student performance. The following are a few general considerations and some methodological guidance to help programs determine how to convert a scale score into a letter grade.
Similar to when the PAEA End of Rotation exams were first introduced and programs had to determine performance bars or passing scores, the scale score enhancement requires that program faculty determine how to translate scale scores into grades. Depending on the program, this may be pass/fail or letter grades. PAEA End of Rotation raw scores range from 0 to 100, which makes it appear relatively easy to translate raw scores into percentages and thus into letter grades. But even when 0 to 100 raw scores were reported, a raw score of 80 did not translate to "80%" for every program. PA programs have used a number of techniques, including z-scores, cohort-level means and standard deviations, and program trends over time, to determine performance bars and passing score requirements that fit their program's grading scale.
With the transition to scale scores, student scores on End of Rotation exams will now range from 300 to 500, a very different range from the 0 to 100 raw score. However, the process of translating that scale score to a grade should follow the same principles that were used when translating a raw score of 0 to 100 to a grade.
A unified scale eliminates differences between forms. PAEA transitioned to scale scores to remove small differences in difficulty between exam forms. We do not recommend using the conversion tool to revert the scale score back to a raw score for purposes of assigning a grade. Not only is this a cumbersome extra step, but it also reintroduces differences in difficulty between forms that scale scores eliminate.
Bell-shaped curves. It is important to remember that only the score metric has changed. Scores will still follow the same bell-shaped curve, which can still be summarized by a mean and a standard deviation. For example, see the following classic histogram of model-based performance data for the soon-to-be-released Emergency Medicine End of Rotation exam, Version 6:
Consistency is important. No matter which method your program chooses to convert the scale score into a grade, we encourage you to maintain consistency with practices you have followed in the past. As a starting point, look to how your program has historically set performance bars, passing scores, and grades.
A single decision point allows programs to make the strongest determination. When considering performance bars and passing scores, programs can make the strongest, most defensible statement about a student's performance by setting just one pass/fail bar. The more performance bars a program sets, the harder it is to differentiate student performance meaningfully. For example:
- Programs can make the most reliable determinations about student performance if they set one performance bar. Grouping students into two categories, passed or failed, regardless of their exact score, is both fair and defensible. This clearly differentiates students who have appropriate knowledge (they passed) from those who have significant knowledge deficits (they failed).
- Programs can make reasonable determinations about student performance if they set a few performance bands, e.g., scoring quartiles or ranges. Looking at a traditional grading scale, we can say that all students who score between 90% and 100% are high achievers. Grouping students into defined performance bands stratifies them into performance categories and is also fair and defensible.
- It becomes risky, however, to differentiate students using narrow performance bands or point by point. Using that traditional grading scale as an example, we cannot make meaningful determinations about the difference between two students who scored 94% and 95% on an exam.
We encourage program faculty to meet, discuss, and agree upon the following:
- Performance bars and/or passing scores for the new scale score metric that are best for your program, as well as the rationale for those choices so you can defend the grades if necessary (e.g., in the setting of an academic action)
- The method your program will use to convert scale scores to recordable grades
- Which student policies need to be updated, as well as how and when those updates will be communicated
The options below are presented to provide considerations and methodological guidance. This guidance will help your program consider overall supervised clinical practice experience (SCPE) and/or course grades in the context of how to convert scale scores to best meet your grading criteria. This list is not all-inclusive, and order does not indicate an endorsement or preference.
Pass/fail — compensatory. In this model, programs use the PAEA End of Rotation exams as a pass/fail exam, setting a single pass bar. If a student passes the exam, a specific point value is assigned that contributes to the overall course or SCPE grade. If a student fails the exam, a lower point value is assigned that contributes to their overall grade. This is called the compensatory model because it allows students who fail a PAEA End of Rotation exam to "compensate" and still potentially receive a passing course or SCPE grade if they were strong in other graded areas (e.g., preceptor evaluation, OSCE, or other course assignments). Remediation, if required by the program, should be considered for students who do not achieve a passing grade.
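As a rough illustration of the compensatory model, the sketch below collapses the scale score into one of two point values and then folds it into a weighted course grade. The pass bar of 390, the 100/60 point values, and the component weights are hypothetical placeholders chosen for illustration only; they are not PAEA recommendations, and each program would substitute its own.

```python
# Sketch of a compensatory pass/fail model.
# ALL numbers below are hypothetical placeholders, not PAEA guidance.
PASS_BAR = 390        # hypothetical program-set pass bar on the 300-500 scale
PASS_POINTS = 100.0   # hypothetical exam points awarded for a pass
FAIL_POINTS = 60.0    # hypothetical (lower) exam points awarded for a fail

# Hypothetical weights for the components of the course/SCPE grade (sum to 1).
WEIGHTS = {"eor_exam": 0.3, "preceptor_eval": 0.4, "osce": 0.2, "assignments": 0.1}


def exam_points(scale_score: float) -> float:
    """Collapse the exam scale score into one of two point values."""
    return PASS_POINTS if scale_score >= PASS_BAR else FAIL_POINTS


def course_grade(scale_score: float, other: dict[str, float]) -> float:
    """Weighted course grade (0-100); the exam is only one component."""
    components = {"eor_exam": exam_points(scale_score), **other}
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# A student below the pass bar can still compensate with strong other components.
grade = course_grade(380, {"preceptor_eval": 95, "osce": 90, "assignments": 92})
print(f"{grade:.1f}")  # 83.2
```

Whether 83.2 counts as a passing course grade is, again, a program-level decision.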
Pass/fail — non-compensatory. Also known as pass/fail — conjunctive hurdle. In this model, programs also use the PAEA End of Rotation exams as a pass/fail exam, setting a single pass bar. However, a student cannot fail the exam and still pass the course or SCPE. It’s called a conjunctive hurdle because students are required to pass multiple hurdles to pass the course or SCPE. They must pass the exam and get a passing course grade that is made up of other required course components. In this model, the PAEA End of Rotation exams become a standard for progression.
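The conjunctive hurdle can be expressed as a logical AND of two independent checks, so a strong grade on the other components can never offset a failed exam. The sketch below assumes a hypothetical exam pass bar of 390 and a hypothetical passing grade of 70 for the remaining course components; both values are illustrative only.

```python
# Sketch of a non-compensatory (conjunctive hurdle) model.
# Both thresholds are hypothetical placeholders, not PAEA guidance.
PASS_BAR = 390             # hypothetical exam pass bar (scale score)
COURSE_PASS_GRADE = 70.0   # hypothetical passing grade for the other components


def passes_scpe(scale_score: float, other_components_grade: float) -> bool:
    """Both hurdles must be cleared; neither can compensate for the other."""
    passed_exam = scale_score >= PASS_BAR
    passed_course = other_components_grade >= COURSE_PASS_GRADE
    return passed_exam and passed_course


print(passes_scpe(380, 95.0))  # False: failed the exam hurdle
print(passes_scpe(410, 95.0))  # True: cleared both hurdles
print(passes_scpe(410, 65.0))  # False: failed the course-grade hurdle
```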
Percentage-based performance bands. In this model, programs convert a scale score range to percentage-based performance bands, choose a single mid-point for each band, and assign that mid-point as a grade. Any student who performed within that range would be assigned the same percentage-based grade. For example, your program may decide that a scale score range of 400–450 converts to 80–90% and that all students falling in this performance band are assigned an 85% as an exam grade. That 85% then becomes a single component of the course or SCPE grade. With this model, students whose scores fall at the high end of a band may be frustrated that they are still assigned an 85%. However, because it is difficult to make any substantial determination about the difference in knowledge between a student who earned an 85% and one who earned an 89%, this model is defensible. This model is similar to the way final letter grades are assigned at many institutions: an A may correspond to any grade from 93% to 100%, and the A, not the specific percentage, is recorded on the transcript. If more precision is required, smaller bands can be developed. Students can further distinguish themselves through other components of the overall evaluation. It is important to note that programs should not convert the scale score back to a raw score to implement this model. The correct way to implement this model is to look at the scale score means and standard deviations and set scoring ranges that can be converted to percentage-based performance bands.
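A band table like the one described here can be sketched as a sorted list of lower bounds with one assigned grade per band. Only the 400–450 to 85% band comes from the example in the text; every other cut point and grade below is a hypothetical placeholder, and treating each band as running from its lower bound up to (but not including) the next lower bound is an assumption about how a program might handle boundary scores such as 450.

```python
from bisect import bisect_right

# Hypothetical band table: (lower scale-score bound, assigned exam grade %).
# Only the 400 -> 85% band comes from the example in the text; the other
# cut points and grades are illustrative placeholders a program would set.
# Each band covers [its lower bound, next lower bound); the last runs to 500.
BANDS = [
    (300, 55.0),  # hypothetical: below 350
    (350, 65.0),  # hypothetical: 350 up to 375
    (375, 75.0),  # hypothetical: 375 up to 400
    (400, 85.0),  # from the text: 400-450 -> 80-90%, recorded as 85%
    (450, 95.0),  # hypothetical: 450 and above
]
_LOWER_BOUNDS = [low for low, _ in BANDS]


def band_grade(scale_score: float) -> float:
    """Return the single mid-point grade for the band containing scale_score."""
    if not 300 <= scale_score <= 500:
        raise ValueError("scale scores range from 300 to 500")
    index = bisect_right(_LOWER_BOUNDS, scale_score) - 1
    return BANDS[index][1]


for score in (401, 449, 452):
    print(score, band_grade(score))
# 401 85.0
# 449 85.0
# 452 95.0
```

Note that 401 and 449 receive the same 85%, which is exactly the trade-off described above: the band, not the point-by-point score, carries the grade.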
Z-score. Using the z-score model is, in essence, applying a second scale to the PAEA scale score, and it may be a reasonable option for programs, especially those already using z-scores. We can use the example provided above (PAEA Emergency Medicine End of Rotation exam, Version 6), where the mean scale score is 402.99 and the standard deviation is 22.55, to calculate a z-score for a hypothetical test score of 450:
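Written out with the standard z-score formula, where x is the student's scale score, μ the mean scale score, and σ the standard deviation (symbols introduced here for illustration):

```latex
z = \frac{x - \mu}{\sigma} = \frac{450 - 402.99}{22.55} = \frac{47.01}{22.55} \approx 2.08
```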
This student's z-score is 2.08; that is, the score lies 2.08 standard deviations above the mean. Refer to a statistics textbook or an online resource for a z-table.
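The same calculation can be sketched in a few lines of Python, using the Emergency Medicine, Version 6 mean and standard deviation quoted in the text. In place of a printed z-table, the sketch uses the standard normal cumulative distribution function via `math.erf`; this assumes the scores are approximately normally distributed (the bell-shaped curve described earlier), and the resulting proportion describes the model-based reference distribution, not a program's own cohort.

```python
import math

# Figures quoted in the text for the Emergency Medicine EOR exam, Version 6.
MEAN_SCALE = 402.99
SD_SCALE = 22.55


def z_score(scale_score: float, mean: float = MEAN_SCALE, sd: float = SD_SCALE) -> float:
    """Standard deviations a scale score lies above (+) or below (-) the mean."""
    return (scale_score - mean) / sd


def normal_cdf(z: float) -> float:
    """Standard normal CDF: the proportion a z-table lookup would give.

    Assumes scores are approximately normally distributed.
    """
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


z = z_score(450)
print(f"z = {z:.2f}")  # z = 2.08
print(f"proportion scoring below = {normal_cdf(z):.3f}")
```

How a program maps a z-score (or the corresponding proportion) to a grade remains a program-level decision.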
Note: All of these models should only be applied to total scores, not subscale scores. Trying to convert subscales for an individual student in a small content area to an alternative metric limits your ability to make meaningful inferences or determinations.
Several programs provided examples of the decision-making process and grading metrics they used. Those case studies are available here.