4. Tool Evaluation
Conference Management
Sdr
As mentioned in chapter 3, part of sdr’s interface (mainly words appearing in the interface and in the online help system) was redesigned using a language-based approach to conceptual design. In this section the evaluation of the new interface is described. The changes to sdr’s interface were made as part of a student project on conceptual design, and the emphasis in the evaluation was therefore on whether users would form correct users’ models of sdr or not, rather than on traditional HCI usability issues.
The results from the evaluation showed that all users who were introduced to the redesigned version of sdr were able to use the tool competently after a short introduction and one week’s unsupervised practice. Furthermore, more than half the users used language belonging to the sphere of the design model to articulate their knowledge about sdr, and five were deemed to have correct users’ models based on the design model. These results are encouraging, as they are based on a redesigned version of sdr in which only labels were changed to reflect the new design model.
In order to detect whether users of sdr had correct users’ models, based on the "Electronic Radio Times design model", linguistic evidence had to be identified, as correct completion of tasks does not necessarily mean that the user has a correct user’s model. In other words, the user might do the right thing for the wrong reasons. Having a correct user’s model, i.e. doing the right thing for the right reason, is important in situations such as error recovery. Possession of a correct user’s model is a prerequisite for effective use of a system. Sasse (1996) has reviewed empirical work on users’ models and concluded that performance results alone may not be reliable indicators of users’ models, and strongly recommends using verbal protocols in addition. It was therefore necessary to look for ways of making the users verbalise their thought processes in a natural way. One way of doing this is to have the user teach someone else about the application (Miyake, 1986). But first, the users were introduced to sdr to give them time to consolidate a user’s model. In the following section the training and evaluation procedure will be described.
Methods for Eliciting Mental Models
In order to compare users’ models, both the existing and the new interface were studied. The new user interface was evaluated with 12 users who had never used sdr before (new users), and the existing user interface was tested with 12 users who had been using the original version of sdr (existing users). The original interface was evaluated as well as the new one to provide a control group with which the new users’ models could be compared. The study was divided into three parts:
Task completion. Users were asked to complete six tasks while thinking aloud. They were told that we would prefer them to work out how to do the tasks themselves, but that if they got irreversibly stuck, they could ask for help. The tasks were scored based on whether the subjects had completed them successfully without help, and the problems users had in completing the tasks were noted.
Mindmaps. Users were given paper copies of the four main windows of sdr, a large piece of paper, a pen and some glue, and asked to glue the windows onto the piece of paper and draw arrows from one window to another if they thought they could get from one window to the other in sdr. The arrows from the mindmaps were listed in tables and tallied to see if there were any differences between the mindmaps of the new users and those of the existing users.
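The tallying of arrows described above can be sketched in code. The window names and arrow data below are hypothetical illustrations, not the actual study data; the point is only to show how each group’s arrows are counted once per user and then compared:

```python
from collections import Counter

# Hypothetical mindmap data: for each user, a list of arrows drawn as
# (from_window, to_window) pairs. Window names are illustrative only.
new_users_arrows = [
    [("Main", "Daily Listings"), ("Daily Listings", "Session Info")],
    [("Main", "Calendar"), ("Calendar", "Daily Listings")],
]
existing_users_arrows = [
    [("Main", "Session Info")],
    [("Main", "Daily Listings"), ("Main", "Session Info")],
]

def tally(arrow_lists):
    """Count how many users drew each arrow (each arrow counted once per user)."""
    counts = Counter()
    for arrows in arrow_lists:
        counts.update(set(arrows))
    return counts

new_counts = tally(new_users_arrows)
old_counts = tally(existing_users_arrows)

# Compare the two groups arrow by arrow.
for arrow in sorted(set(new_counts) | set(old_counts)):
    print(arrow, "new:", new_counts[arrow], "existing:", old_counts[arrow])
```

Listing the union of arrows from both groups makes it easy to spot connections that one group drew but the other did not.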
Teach-back. Users were asked to teach sdr to a contrived co-learner, who they were told was new to sdr. In fact, the co-learner knew sdr well and prompted users to explain sdr functionality and the behaviour of the user interface. The teach-back sessions were transcribed to supply data in which to look for linguistic evidence of users’ models. As mentioned earlier, words are linked together in a semantic network, i.e. words which are closely related will tend to be present at the same time. When looking for evidence of "Electronic Radio Times" based users’ models, not only the actual words "Radio Times" and "Daily Listings" were relevant, but also words closely related to the entire concept of TV and broadcasting.
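Scanning a transcript for such semantically related vocabulary can be sketched roughly as follows. The word list and the sample utterance are hypothetical, and a real analysis would of course involve human judgement of context rather than simple word matching:

```python
import re

# Hypothetical vocabulary from the sphere of TV and broadcasting,
# taken here as evidence of the "Electronic Radio Times" design model.
BROADCAST_WORDS = {"station", "stations", "channel", "programme",
                   "listings", "on", "broadcast"}

def model_evidence(utterance):
    """Return the design-model words found in a transcript utterance."""
    tokens = re.findall(r"[a-z]+", utterance.lower())
    return [t for t in tokens if t in BROADCAST_WORDS]

print(model_evidence("You look in the listings to see which sessions are on"))
# → ['listings', 'on']
```

Such a pass over the transcripts would only flag candidate utterances; each match would still need to be read in context to decide whether it genuinely reflects the design model.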
Existing users performed all three parts in one session. New users did the first part one week and the second and third parts the following week. The first part, the tasks, served as a practice session for the new users, who were also asked to use sdr in the intervening week to familiarise themselves with it. Parts one and three were recorded on videotape and transcribed. The videotapes contain an overlay of two images. One is a frontal image of the user, recorded with a video camera next to the workstation; since a camera is an integral part of a multimedia workstation and a necessary accessory when using sdr, it should not be extraordinarily intrusive to the user. The other image is the screen the user was looking at. By overlaying these two images and recording them onto videotape, it is possible to see and hear the user and, at the same time, see what the user is doing on the screen.
Results
The use of language differed considerably between the two groups: half of the "new" users explicitly used the "Radio Times" metaphor to explain certain features of sdr, and even "new" users who did not explicitly mention the "Radio Times" used language belonging to the "Electronic Radio Times" design model: they referred to different "stations", or said that sessions were "on". "Existing" users, by contrast, said that sessions would be "active". See Table 1 for linguistic examples from the teach-back sessions.
"New" users about sdr: |
"Existing" users about sdr: |
About the Calendar/Daily Listings Window:
|
About the calendar/daily listings:
About the Main Window:
|
Table 1: Statements by "new" and "existing" users about sdr
"Electronic Radio Times" based user models
We have now seen that there were considerable differences in the language that "existing" and "new" users used to explain certain features of sdr. However, we set out to discover whether "new" users would have an "Electronic Radio Times" based user model of sdr. Results indicate that eight of the "new" users had few or no problems teaching sdr to the co-learner, and of these, five had user models clearly based on the "Electronic Radio Times" design model. I suspect that the number could have been seven, had it not been for a software bug which caused some sessions not to appear in the Daily Listings Window, despite the fact that they were ‘on’; this appears to have disturbed the construction of their user models, particularly because the sessions that did not appear in the Daily Listings Window were ‘exciting’ ones like the Nasa shuttle launches and a Canadian News TV station.
The transcripts from the teach-back sessions were analysed for evidence of "Electronic Radio Times" based user models. All users successfully taught the co-learner all the main tasks in sdr, but all of them encountered problems at some point during the session. These problems were divided into major and minor problems; users who taught all tasks without any major problems were categorised as having successful task performance.
The criteria for determining whether users had an "Electronic Radio Times" based user model were:
Five (and potentially seven) "new" users had correct user models clearly based on the "Electronic Radio Times" design model. As mentioned above, this is an encouraging result, as the "Radio Times" based user models were produced solely by changing the words appearing in sdr’s interface.