Early in my career, I learned that the widely held assumption in user experience (UX) that “it only takes 5 users to test an interface” was based on a faulty application and analysis of several unrelated data sets. I set the goal of conducting a definitive investigation and reliable analysis that practitioners could use with confidence when estimating and justifying user experience research decisions.
There was no known way to measure it directly and accurately, and no established way to calculate it. While on the Research Scientist professional track at Applied Research Laboratories of UT-Austin, I set out to solve it piece by piece. First, it was essential to find a ‘live’ application with a measurable workflow, a diverse set of users, and a single location. Next came discovering and mastering the permissions and paperwork that would allow me to use those circumstances, personnel, and application for the effort. Those early months paled in comparison to step three: creating a new research instrument. Breaking down the problem and requirements involved observing and understanding human behavior with the application, identifying ways to break each observation into discrete, measurable events on two factors, and, finally, designing a data collection instrument that enabled a single researcher to see, capture, and accurately categorize multiple observations in real time. The instrument took time to develop, test, refine, and validate, and it delivered a collateral method and instrument, the Optimal Path Test Method and data sheet, later released to my own developing teams and other practitioners. The method was later validated against a computer model developed by human factors scientist Masaaki Kurosu of Tokyo, Japan, who later became one of my mentors.
Hours and days turned into weeks in the data collection phase, as I completed high numbers of participant tests and filled out representative user populations. In this case, the raw data alone would prove nothing beyond a simple overall percentage for this 60-user test. Even typical statistical analyses would not answer the Five-User Assumption question.
It became imperative to analyze the data set with a less common method: I needed to ‘simulate’ smaller subset combinations of participants and their scores from the robust set of 60 discrete data sets. Back to the books I went, exploring statistics and possibilities for generating a high number of random combinations, if only there was a resource to program it.
**During this period it was essential to remind myself of the driving passion behind the work: why it was needed, and why I wanted to answer this question. Each time, I envisioned fellow practitioners breathing a sigh of relief while planning their own work, and standing with greater confidence before leaders while reporting their findings. This is the vision that got me through.**
I found a fit in Monte Carlo simulation, which filled my need exactly. However, it is typically used on very small data sets, with counts of 10 at most. My 60-count set was a monster of epic proportions, unless I could find someone to program it, or some way to do it myself. I found and signed up for a computational neuroscience course that had the collateral benefit of teaching us to program in Matlab. With no previous exposure to programming, Matlab, or even neuroscience, and only the most basic statistics at that time, I set out on the quest. I negotiated with my professor to have this program replace the course’s required assignments. Bit by bit, I learned to build the program and the loops that eventually calculated the results of all possible combination sets of 5 users that could be created from the 60-user data set. For good measure, I expanded it to include combinations of any size, to answer the impact of testing more or fewer users in any one study. What it delivered closely reflects what actually happens in user testing: the randomness inherent in even the smallest differences between the unpredictable people who come through the door of a research lab puts you at risk of missing critical user experience issues that could cause your program to fail.
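The core of the resampling idea described above can be sketched briefly. The original program was written in Matlab; the version below is a minimal, hypothetical Python sketch, with made-up issue data standing in for the real 60-user results. It draws random subsets of users and records what fraction of all known issues each subset uncovers, so that the worst, mean, and best cases can be compared across subset sizes.

```python
import random

def subset_coverage(found_by_user, subset_size, trials=2000, seed=42):
    """Monte Carlo estimate of issue coverage for random user subsets.

    found_by_user: list of sets; each set holds the issue IDs one
    participant uncovered (hypothetical stand-in for the real data).
    Returns (worst, mean, best) fraction of all known issues found.
    """
    rng = random.Random(seed)
    all_issues = set().union(*found_by_user)  # every issue anyone found
    coverages = []
    for _ in range(trials):
        sample = rng.sample(found_by_user, subset_size)  # random subset of users
        found = set().union(*sample)                     # issues that subset caught
        coverages.append(len(found) / len(all_issues))
    return min(coverages), sum(coverages) / trials, max(coverages)

# Hypothetical data: 60 simulated users, each finding a random mix of 20 issues.
rng = random.Random(1)
users = [{i for i in range(20) if rng.random() < 0.3} for _ in range(60)]

worst, mean, best = subset_coverage(users, subset_size=5)
```

Repeating this across subset sizes is what exposes the point of the study: the worst-case coverage of a small sample sits well below its mean, which is precisely the risk a practitioner runs by stopping at five users.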
At each point in this experience there were frustrations and dead ends, heartaches, and even some permanent ends. Time and again it seemed it could not be done; then, when it was, that it would be neither known nor seen. Time and again, one small step at a time, and one-two-three-four tiny substeps at a time, each obstacle inspired a solution and the power to go on.
“Beyond the Five-User Assumption” was published in a juried, peer-reviewed journal, following my appearance as an invited speaker at the Vrije Universiteit (Free University) in Amsterdam, The Netherlands. The work is now described by peers as “highly influential” in the field. Practitioners rely on it, which was my driving goal all along. As it has grown in stature it finds new fans and new homes: as part of the user experience curriculum of Bentley University and the professional certification programs of Human Factors International, among others. As of February 2016, the work has been incorporated into the published version of the U.S. Food and Drug Administration Guidelines for Testing of Medical Devices with Human Subjects.
I set out to do something that would help people breathe easier. Something simple for folks to experience and read; something that would matter. It is, and it does, today.