Marielle's little place on the web

about usability, cognition, neuroscience, psychology, learning, interface design, ergonomics, and other interesting things

Browsing Posts published by marielle

  • The 960 Grid System is an effort to streamline web development workflow by providing commonly used dimensions, based on a width of 960 pixels. There are two variants: 12 and 16 columns, which can be used separately or in tandem.
  • The Public Sphere Project (PSP) is an initiative of Computer Professionals for Social Responsibility (CPSR) to help promote more effective and equitable public spheres all over the world. The Public Sphere Project is intended to provide a broad framework for a variety of interrelated activities and goals. The site includes a pattern library.
  • InterPlay is a platform for designers in the Media Lab to create dynamic social simulations, which transform public spaces into immersive environments where people become the central agents. It utilizes computer vision and projection to facilitate full body interaction with digital content.
  • Mouseless consists of an Infrared (IR) laser beam (with line cap) and an Infrared camera. Both IR laser and IR camera are embedded in the computer. The laser beam module is modified with a line cap and placed such that it creates a plane of IR laser just above the surface the computer sits on. The user cups their hand, as if a physical mouse was present underneath, and the laser beam lights up the hand which is in contact with the surface.

How many people are needed for a usability study? The question comes up time and time again, with different answers. This time, Hwang and Salvendy have tried to answer it with a meta-analysis of the literature published since 1990. Their inclusion criteria were:

  1. the usability evaluation was done with one of three methods: think-aloud, heuristic evaluation or cognitive walkthrough
  2. the study reported the number of participants in the evaluation (users or evaluators) and the overall discovery rate of errors.

Out of the 102 usability evaluation experiments found, only 27 satisfied the inclusion criteria. Hwang and Salvendy then performed a linear regression analysis on the data, and tried to estimate the number of people needed to detect 80% of the usability problems. The results: 9 for think-aloud, 8 for heuristic evaluation and 11 for cognitive walkthrough. This leads them to propose 10±2 as a rule of thumb.

I have a few problems with this approach. The ‘how many users’ question is a logical one to ask, both when planning a study and when interpreting its results. But the statistical analyses only yield numbers; to determine the real value of a study, one should also look at its content.

  • Not all errors are equal. While 80% detection sounds good, issue severity should not be neglected. Hwang and Salvendy mention that the cognitive walkthrough method is good at finding critical issues, but less adequate for detecting minor flaws. Knowing that, I wouldn’t choose to increase the number of evaluators to 11 (!), but would rather combine a smaller, early cognitive walkthrough with another evaluation method later, maybe even on a fixed prototype. If 2 or 3 evaluators using the cognitive walkthrough method can point to some of the severe issues, this is valuable enough in itself.
  • Issues are not always errors. A usability evaluation can help discover potential problems, but the way these issues influence the user experience and user behavior in real life may not be the same as in the user study.
  • The usefulness of your results depends on how they are used. Using the results wisely to improve the next design iteration of a system is useful. Using them to formulate specific questions to be answered with other methods (think of A/B testing for example), or to decide on what data to collect, makes sense too. But using them to calculate a magic number to report to stakeholders (based on ‘few issues found = good system’) makes less sense.

Besides, what does that 80% number mean? It refers to 80% of the total number of issues hidden in a system, but that total is not a real, measurable quantity. Okay, when the number of distinct issues found is plotted against the number of participants used, the curve does flatten and an upper limit can be estimated. But even with very large groups, there is still a chance that one more participant will find something that all of the others overlooked. In addition, the characteristics of the participants and the protocol used also influence which types of issues will be found easily.
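
To make the flattening curve concrete, here is a minimal sketch of a simple cumulative-discovery model that is often used for this kind of reasoning. It is not the regression analysis Hwang and Salvendy performed. It assumes that every participant independently finds any given issue with the same probability p, which is a strong simplification, and the values of p below are hypothetical and not taken from the paper. The function names are made up for this illustration.

```python
def expected_share_found(p: float, n: int) -> float:
    """Expected share of all issues found by n participants.

    Assumes each participant independently finds any given issue
    with the same probability p (a simplifying assumption).
    """
    return 1.0 - (1.0 - p) ** n


def participants_needed(p: float, target: float = 0.8) -> int:
    """Smallest n whose expected share of issues found reaches target."""
    if not 0.0 < p < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p and target must lie strictly between 0 and 1")
    n = 1
    while expected_share_found(p, n) < target:
        n += 1
    return n


# Hypothetical per-participant detection rates, for illustration only.
for p in (0.10, 0.15, 0.30):
    n = participants_needed(p)
    share = expected_share_found(p, n)
    print(f"p={p:.2f}: {n} participants, expected share found {share:.0%}")
```

Under this model the answer depends entirely on p, which is not known in advance and differs per issue. The same formula also illustrates the "one more participant" point: an issue that each participant has only a 2% chance of hitting would still be missed by a group of 50 about 36% of the time, since 0.98 to the power of 50 is roughly 0.36.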

However, I think that in practice the 10±2 guideline should work pretty well, especially for the think-aloud case with non-expert users. With a very small group of users, it is often difficult to say whether the findings will generalize across the real user base. On the other hand, using a very large number of people is costly in time and money, and adds little value since there will be lots of repetition in your findings.

Hwang, W., & Salvendy, G. (2010). Number of people required for usability evaluation. Communications of the ACM, 53(5). DOI: 10.1145/1735223.1735255