Exams help us determine whether students have learned, provide feedback to students about what they have learned, drive students to study, and help discriminate among students. However, writing challenging yet fair exam questions is one of the most difficult things we do as instructors. Luckily, we can use research on learning and data from our own exams to write better questions and make exams learning experiences as well as assessment tools. 

Writing exam questions

One way to write useful exam questions is to complete the following process:

  1. Write a learning objective statement that reflects what you expect students to be able to do after they have learned what you are teaching. For example, do students need to recall a fact, analyze data, interpret a graph or figure, solve a problem, make a prediction, or define a term? 
  2. Then draft a question or problem that requires students to achieve the objective to answer the question or problem correctly. 
  3. Rate the level of thinking students will use to answer the question. Categorizing objectives according to Bloom's taxonomy is one approach to determining whether students will need to think at a higher or lower level. Recalling facts and describing ideas are not very cognitively demanding tasks, while applying knowledge and skills to a new situation, analyzing data, and constructing and evaluating arguments demand higher-level thinking. Lists of Bloom's taxonomy verbs can be helpful for turning a low-level objective into a higher-level one.
  4. If you want to turn the question into a multiple-choice question, generate a list of misconceptions, erroneous ideas, or errors in logic that might prevent students from answering correctly. Write response options that students holding those misconceptions or erroneous ideas, or making those errors in logic, would be likely to choose. Try to avoid "none of the above" and "all of the above" responses, since students may be able to rule these out without understanding the material. 
  5. Take your own exam, or ask a colleague, graduate TA, or undergraduate learning assistant to take it and give you feedback. Are the questions or problems clear? Are they getting at important ideas in the discipline? Do they vary in difficulty? Remember that students will likely take at least twice as long as you to complete the exam. 
  6. Revisit your exam AFTER students have taken it. Did some students finish early? If not, maybe the exam was too long. Do the test statistics reveal issues with the difficulty of the questions or problems, or their ability to reveal useful information about student learning? 

Interpreting exam statistics

Two commonly calculated exam statistics can reveal important information about the quality of exam questions. These statistics are calculated automatically for multiple-choice exams using bubble sheets, but can also be calculated for open-ended questions or hand-graded exams by tracking students' scores on individual questions or problems.

The first is item difficulty, which may be better understood as item easiness. It is the percent or proportion of students who answered the question correctly (the P-value). If all students answer correctly, item difficulty is 100% or 1.00 (i.e., the question is "easy" to answer). If no students answer correctly, item difficulty is 0% or 0.00 (i.e., the question is impossible to answer). 
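As a concrete illustration, item difficulty can be computed directly from a matrix of scored responses. This is a minimal sketch, assuming a hypothetical `responses` matrix where each row is a student and each entry is 1 for a correct answer and 0 for an incorrect one:

```python
def item_difficulty(responses, q):
    """Proportion of students who answered question q correctly (the P-value)."""
    correct = sum(row[q] for row in responses)
    return correct / len(responses)

# Hypothetical scored responses: 4 students x 3 questions, 1 = correct.
responses = [
    [1, 1, 0],  # student 1
    [1, 0, 0],  # student 2
    [1, 1, 1],  # student 3
    [1, 0, 1],  # student 4
]
print(item_difficulty(responses, 0))  # 1.0 -> everyone answered question 0 ("easy")
print(item_difficulty(responses, 1))  # 0.5 -> half the class answered question 1
```

For hand-graded exams, the same calculation works if you record each student's score on each question as you grade.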

The other common exam statistic is item discrimination, also known as R or the point-biserial correlation: the correlation between students' performance on a specific question and their performance on the entire test. The closer the value is to +1.0, the better the question is at discriminating between students who performed well and students who performed poorly on the test overall. Discrimination values above 0.20 are generally considered good.
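The point-biserial correlation is simply the Pearson correlation between a 0/1 item column and students' total scores. A minimal sketch, again assuming a hypothetical 0/1 `responses` matrix (rows = students):

```python
from math import sqrt

def item_discrimination(responses, q):
    """Point-biserial correlation between question q (scored 0/1) and total exam score."""
    n = len(responses)
    item = [row[q] for row in responses]
    total = [sum(row) for row in responses]
    mean_i, mean_t = sum(item) / n, sum(total) / n
    cov = sum((i - mean_i) * (t - mean_t) for i, t in zip(item, total)) / n
    sd_i = sqrt(sum((i - mean_i) ** 2 for i in item) / n)
    sd_t = sqrt(sum((t - mean_t) ** 2 for t in total) / n)
    if sd_i == 0 or sd_t == 0:  # everyone right (or everyone wrong): correlation undefined
        return 0.0
    return cov / (sd_i * sd_t)

responses = [
    [1, 1, 0],  # total: 2
    [1, 0, 0],  # total: 1
    [1, 1, 1],  # total: 3
    [1, 0, 1],  # total: 2
]
print(round(item_discrimination(responses, 1), 2))  # 0.71 -> question 1 tracks overall performance
```

Note that some scoring packages report a "corrected" point-biserial that excludes the question itself from the total score; the uncorrected version above is the simpler of the two conventions.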

Item difficulty and item discrimination values should be considered together when evaluating the quality of particular exam questions and the exam as a whole. For example, if a question assesses an idea that is straightforward to understand and that all students should have learned, its item difficulty value (i.e., item easiness) will be high and its item discrimination will be low. If, however, many students miss the question (item difficulty is low), including the highest-performing students in the class (item discrimination is also low), it may not be a very good exam question: students may not be understanding or interpreting it as you intended.

In designing exams, here are questions and points to consider:

  • How many questions should have low, medium, or high levels of difficulty? An exam with too many difficult questions may be discouraging to students, and an exam with too many easy questions may not be fully assessing what students have learned.
  • In revising your exams from year to year, do you want to keep questions with a difficulty of less than 0.20? Perhaps yes, if the question is assessing a complex or challenging idea that is fundamental or otherwise important to the discipline.
  • In revising your exams from year to year, do you want to keep questions with a difficulty of greater than 0.80? Perhaps yes, if the question is assessing an idea that is fundamental to learning in your course and you want to make sure all students understand it.
  • It is useful to consider difficulty and discrimination together. A question with a low difficulty value (i.e., not many students answered correctly) and a high discrimination value may be helpful for identifying the highest performing students. A question with a low difficulty and a low discrimination value may need to be revised because even the highest performing students are unable to answer it.
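The guidelines above can be folded into a rough triage heuristic. The thresholds (0.20 and 0.80) follow the bullets; the function name and flag messages are illustrative, not a standard:

```python
def triage_question(difficulty, discrimination):
    """Rough revision flag combining item difficulty (easiness) and discrimination."""
    if difficulty < 0.20 and discrimination < 0.20:
        # Even top performers miss it: the wording may be the problem.
        return "revise: hard and non-discriminating; check wording"
    if difficulty < 0.20:
        # Hard but correlated with overall performance: identifies top students.
        return "keep: hard but discriminating"
    if difficulty > 0.80:
        # Nearly everyone answers correctly, so discrimination is naturally low.
        return "keep if fundamental: easy question"
    return "keep: mid-range difficulty"

print(triage_question(0.15, 0.05))  # revise: hard and non-discriminating; check wording
```

A loop over your exam's per-question statistics then gives a quick revision worklist for next year.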

Making exams a learning experience

We often think of exams as the last thing students do - a demonstration of what they have learned. However, exams can also be opportunities for further learning. Here are two strategies for encouraging students to LEARN from their exams:

Exam corrections. As much as we would like our students to review their graded exams and study the content they missed, there are few if any immediate incentives for them to do so. Giving students the opportunity to earn back a portion of the points they missed (not all of them, which may discourage studying in the first place!) can be a good incentive. Ask students to provide correct responses for the questions they missed, along with an explanation of why their first response was problematic. Exam corrections work best when instructors DO NOT distribute exam keys, and when exam questions demand higher-level thinking. If the questions require only factual recall, then students miss them simply because they didn't remember the facts.

Group testing. Peer instruction works not only for in-class activities; it can also be applied to exams through a process called group testing, collaborative testing, or two-stage exams. Students first take the exam solo, and the solo exams are collected. Then students work in previously assigned groups to take the exam again, as a group. Through this process students have the opportunity to think on their own, and then are required to explain and defend their thinking to their peers. Most of the time, group test scores are higher than solo test scores because the quality of students' thinking improves as a result of the debate. In fact, students who all selected incorrect responses on the solo exam can arrive at the correct response through the process of thinking and debating as a group - this is not simply the result of the "smart" student in the group telling the rest of the group what the correct answer is. Group testing can also encourage less confident students to speak up and defend their selections to the more vocal members of the group when their group test grade is on the line. For more on group testing, see: