508.txt

Cairo University
Faculty of Computers & Information
Information Systems Department

An Evaluation Framework to Support the Usability Design for Mobile Applications
By :
Ayat Mahmoud Ahmed
A Thesis Submitted to the 
Faculty of Computers and Information, Cairo University
In Partial Fulfillment of the Requirements for the degree of
 Master of Science in Information Systems

Under the supervision of:
Prof. Dr. Galal Hassan Galal Edeen 
Information Systems Department, Faculty of Computers and Information
Cairo University
Dr. Ehab Ezzat Hassanen
Information Systems Department, Faculty of Computers and Information
Cairo University

Cairo, Egypt
June 2011

Admission
I certify that this work has not been accepted in substance for any academic degree and is not being concurrently submitted in candidature for any other degree.
Any portion of this thesis for which I am indebted to other sources are mentioned and explicit references are given.

Student Name
:
Ayat Mahmoud Ahmed
Signature
:
_ _ _ _ _ _ _ _ _ __  _ _ _ _ _ _


Cairo University
Faculty of Computers and Information
Information Systems Department


An Evaluation Framework to Support the Usability Design for Mobile Applications

By :
Ayat Mahmoud Ahmed
A Thesis Submitted to the
Faculty of Computers and Information, Cairo University
In Partial Fulfillment of the Requirements for the degree of
Master of Science in Information Systems


Approved by the Examining Committee
__________________________________ Signature
Prof.Dr.                                                     Thesis Main Advisor
____________________________________________
Prof. Dr.
____________________________________________
Dr.
____________________________________________
Cairo University
Faculty of Computers and Information
Information Systems Department
Cairo, Egypt
June 2011
Acknowledgments
First of all, I would like to say “Alhamdulillah” for reconciling me to finish this work in the hard conditions of marriage and having two babies. Without the help of Allah, I would not finish this.
	I wish to express my deepest and sincere gratitude to Prof.Dr.Galal Hassan Galal-Edeen for his valuable advice, aid, guidance, patience, encouragement and instructive discussions. Prof.Galal made his time available for me and did not save any effort for guiding me. I am especially grateful to Prof.Galal for giving me the opportunity to do my thesis under his supervision.
 	A special word of appreciation is due to my husband, Dr.Ahmed Ali Elhanafi for his generous support, kind concern, understanding, and for being there during my times of stress.
A special word of gratitude is due to my fellows at work, Soha Safwat and Farid Ali for their guidance, precious help, being friendly, and driving force to complete this thesis.
I would also like to thank my family especially Magda Abdel Halim, Mohamed Abdel Halim, Mahmoud Abdel Halim, Souhair Abdel Halim, Nour Abdel Halim, and Seyada Abdel Halim for their very kind and precious help.

I am greatly indebted to my parents. I wish to thank my father Mahmoud Khalil for his encouragement, support, precious advice and valuable guidance. 

A special word of appreciation and deepest gratitude is due to my mother, Mona Abdel Halim, for her patience, encouragement, taking care of my babies during my work times and for being there during my times of distress, and her encouragement during my success. And for all effort she made for me all my life.
“This thesis is dedicated to my mother”
Ayat Mahmoud
Abstract
The current trend of users demanding mobile information and communication technologies is to support their everyday life and work. This has led to the development of new generations of mobile devices. The development of these mobile devices with their specific applications must follow an iterative process using appropriate usability evaluation methods at each stage of development to ensure the usability of mobile devices and applications. The great demand and rapid growth of mobile applications have attracted many research interests. Since developing mobile applications with usable friendly user interface is critical for successful adoption and use of applications, one of the significant research questions is regarding how to conduct an appropriate usability evaluation using mobile devices. Usability evaluation of mobile applications is a promising research area. Most of the usability evaluation methods developed during the last twenty five years have focused primarily on desktop applications. The challenges that face usability evaluation of mobile applications stem from the unique features of mobile devices such as the wide range of platforms, the diversity of screen sizes, the limited bandwidth, as well as the changing context. Traditional guidelines and methods used in usability evaluation of desktop applications may not be directly applicable to a mobile environment. Therefore, it is essential to develop and adopt usability evaluation methods that are appropriate for mobile applications. This research is aiming at introducing a framework to support the usability evaluation of mobile applications.

Table of Contents

Acknowledgments	i
Abstract	ii
List of Figures	vii
List of Tables	ix
Chapter 1: Introduction	1
1.1 Background and Motivation	1
1.2 Problem Statement	2
1.3 Scope	3
1.4 Objectives	3
1.5 Methodology	3
Chapter 2: Usability Issues	4
2.1 What is Usability?	4
2.2 What do we mean by “Usable”?	5
2.3 What is Usability Evaluation?	6
2.4 Why do we need to Evaluate Usability?	8
2.4.1 Informing Design	9
2.4.2 Eliminating Design Problems and Frustration	9
2.4.3 Improving Profitability	9
2.5 Usability Attributes	10
2.6 User Experience Goals	12
2.7 General Mobile Usability Research Problems	13
2.7.1 Direct Usage of Desktop Approaches	13
2.7.2 Lack of Trained Specialists	13
2.7.3 Direct Porting between Different Mobile Platforms	13
2.7.4 No Established Mobile Software Usability Culture	13
2.7.5 Fast Software Market Environment	14
2.7.6 Simplicity of the User Interface of the Mobile Application	14
2.7.7 Different Technologies and Standards	14
Chapter 3: The Nature of Mobile Applications	16
3.1 Introduction	16
3.2 Why evaluating Usability of a Mobile Application is Different from a Desktop One?	17
3.2.1 Wide Range of Platforms	17
3.2.2 Wide Range of Input Methods	18
3.2.3 Different Screen Sizes	20
3.2.4 Context of Use	21
3.2.5 Battery Consumption	21
3.2.6 Limited Processing Power	22
3.2.7 Recovery after System Failures	22
3.2.8 Standardization Issues faced by Mobile Applications’ Developers	22
3.2.9 Limited Bandwidth	23
3.2.10 Different Interaction Styles	23
Chapter 4: State of the Art of Usability Evaluation for Mobile Applications	25
4.1 Introduction	25
4.2 Usability Evaluation Paradigms	25
4.2.1 "Quick and dirty" Evaluation	26
4.2.2 Usability Testing	26
4.2.3 Field Studies	28
4.2.4 Field or Laboratory Evaluation?	30
4.2.5 Predictive Evaluation	37
4.3 Usability Evaluation Techniques	38
4.3.1 Observing Users	38
4.3.2 Asking Users	42
4.3.3 Asking Experts	46
4.3.4 Testing Users' Performance	54
4.3.5 Modeling Users' Task Performance	56
Chapter 5: Our Framework to Support Usability Evaluation of Mobile Applications	61
5.1 Introduction	61
5.2 What is Log File?	62
5.3 Why Log file Analysis?	63
5.3.1 How to Record User’s Events While Using a Mobile Application?	63
5.3.2 Success of Web Usability Evaluations Using Log File Analysis	65
5.4 Our Suggested Evaluation Framework	66
Chapter 6: Applying the Suggested Framework on Two Case Study Applications	71
6.1 Introduction	71
6.2 The Mobile Logger Application (MobLog)	71
6.3 The Case Studies	72
6.4 The Results	75
6.5 Results Validation	86
Chapter 7: Evaluation and Future Work	89
7.1 Evaluation of Our Work	89
7.1.1 Usability Testing	89
7.1.2 Field Studies	89
7.1.3 Our Framework	89
7.2 Limitations of MobLog	90
7.3 Future Work	90
References	91


List of Figures
Figure 1: Definition of Usability	4
Figure 2: A group of smart phones with different platforms (from Google images)	18
Figure 3: a sample for a log file	65
Figure 4: Our suggested framework to support the usability evaluation of mobile applications	72
Figure 5: The front page of the MobLog; listing all applications installed to enable the user choose one to monitor.	73
Figure 6: Snapshot of the mobile screen containing one of the result files after using “Cinta” application.	74
Figure 7: Snapshot of the mobile screen for a user while performing task no.1 in “eBuddy” application	76
Figure 8: Summary of task completion times for “create an account for you” in “eBuddy”	76
Figure 9: the average “TCT at first use” of task 1 for both “Cinta” and “eBuddy” applications	77
Figure 10: The average values for TCT at first use, TCT, and TCT after two weeks of task 2 for both applications	78
Figure 11: The average values for TCT at first use, TCT, and TCT after two weeks of task 3 for both applications	79
Figure 12: The average values for TCT at first use, TCT, and TCT after two weeks of task 4 for both applications	79
Figure 13: The average values for TCT at first use, TCT, and TCT after two weeks of task 5 for both applications	80
Figure 14: The average values for TCT at first use, TCT, and TCT after two weeks of task 6 for both applications	81
Figure 15: The average values for TCT at first use, TCT, and TCT after two weeks of task 7 for both applications	81
Figure 16: The average number of calls on help and the average number of errors of task 1 for both applications	82
Figure 17: The average number of calls on help and the average number of errors of task 2 for both applications	83
Figure 18: The average number of calls on help and the average number of errors of task 3 for both applications	83
Figure 19: The average number of calls on help and the average number of errors of task 4 for both applications	84
Figure 20: The average number of calls on help and the average number of errors of task 5 for both applications	85
Figure 21: The average number of calls on help and the average number of errors of task 6 for both applications	85
Figure 22: The average number of calls on help and the average number of errors of task 7 for both applications	86
Figure 23: The number of steps needed to accomplish each task in both applications	87
Figure 24: The time taken to train the test participants on each application	87
Figure 25: Average SUS scores of the two applications	90


List of Tables
Table 1: Severity Ranking Scale (SRS)	53
Table 2: The standard set of approximate times for the main kinds of operators used during a task by (Card et al., 1983)	62
Table 3: summarization of the usage data of all users using the average values for all measuring variables	77
Table 4: the average task completion time at first use (TCT at first use) of task 1 for both “Cinta” and “eBuddy” applications	77
Table 5: The average values for TCT at first use, TCT, and TCT after two weeks of task 2 for both applications	78
Table 6: The average values for TCT at first use, TCT, and TCT after two weeks of task 3 for both applications	78
Table 7: The average values for TCT at first use, TCT, and TCT after two weeks of task 4 for both applications	79
Table 8: The average values for TCT at first use, TCT, and TCT after two weeks of task 5 for both applications	80
Table 9: The average values for TCT at first use, TCT, and TCT after two weeks of task 6 for both applications	80
Table 10: The average values for TCT at first use, TCT, and TCT after two weeks of task 7 for both applications	81
Table 11: The average number of calls on help and the average number of errors of task 1 for both applications	82
Table 12: The average number of calls on help and the average number of errors of task 2 for both applications	82
Table 13: The average number of calls on help and the average number of errors of task 3 for both applications	83
Table 14: The number of calls on help and the average number of errors of task 4 for both applications	84
Table 15: The average number of calls on help and the average number of errors of task 5 for both applications	84
Table 16: The average number of calls on help and the average number of errors of task 6 for both applications	85
Table 17: The average number of calls on help and the average number of errors of task 7 for both applications	86
Table 18: The number of steps needed to accomplish each task in both applications	87
Table 19: The time taken to train the test participants on each application	87
Table 20: SUS scores of the test participants	89


Chapter 1: Introduction
1.1 Background and Motivation
Interaction design is now big business. In particular, website consultants, start-up companies, and mobile computing industries have all realized its pivotal role in successful interactive products. To get noticed in the highly competitive field of web and mobile products requires standing out. Being able to say that your product is easy and effective to use is seen as central to this [ CITATION Placeholder5 \l 1033 ].
According to Preece et al. (2002), the process of interaction design involves four basic activities:
    1. Identifying needs and establishing requirements.
    2. Developing alternative designs that meet those requirements.
    3. Building interactive versions of the designs so that they can be communicated and assessed.
    4. Evaluating what is built throughout the process.
Evaluating usability mobile applications is particularly challenging given the variability of users, uses, and environments involved. Over the past decades various usability evaluation methods have been developed and implemented to improve and assure easy-to-use user interfaces and systems [ CITATION Ber08 \l 1033 ].
Isomursu et al. (2004) argued that evaluating usability of mobile applications is an emerging area of research in the field of human computer interaction (HCI). It is commonly accepted that data collection for evaluation of mobile devices and applications is a central challenge, and that novel methods must be found for that [ CITATION Iso04 \l 1033 ].
The current trend of users demanding mobile information and communication technologies is to support their everyday life and work. This has led to new generations of mobile devices. Mobile devices have expanded their functionality step by step. Looking at today’s generation of mobile phones, various functions are offered. People may communicate via voice and text (short message service), receive information from the Internet, or use calendars on their cell phone to organize their daily lives. An endless variation of functionality exists on these pocket-sized devices. Mobile devices are used in various situations and contexts. They are used to support social life, peoples’ social network, and are enablers of new ways of communication and coordination behaviors. The development of these mobile devices with their specific applications must follow an iterative process using appropriate usability evaluation methods at each stage of development to ensure the usability of mobile devices and applications[ CITATION Jon06 \l 1033 ].
1.2 Problem Statement
Most of the usability evaluation methods developed during the last 25 years have focused primarily on desktop applications [ CITATION Ber08 \l 1033 ]. The challenges for usability evaluation of mobile devices stem from their special characteristics like the very wide range of platforms, the small screens with low resolution (compared to desktops), less power supply, the limited bandwidth, and the trend to make devices smaller and smaller. At the same time the number of functions supported has increased. Non-standardized software development due to various operating systems used on these devices additionally complicate the matter. But these peculiarities are not the result of hardware and software trends only. 
Mobile devices are used in a variety of environments and contexts: at home, on the move, especially during travel. The location of the users is not the only determinant for the usage of mobile devices. The devices are used in a number of ways and situations not only influenced by the location, but rather by our activities: how we coordinate getting to places, how we adapt our daily routines, and how we organize and define our social networks are central usage behaviors for mobile devices. The context of use has a high impact on mobile device usage, and thus must be appropriately reflected in the usability evaluation [ CITATION Cat08 \l 1033 ].
Since most of the so-called ‘classical’ methods of evaluation have demonstrated shortcomings when used in the field of mobile applications [ CITATION Ber08 \l 1033 ] due to the very wide range of mobile platforms, interaction styles, input methods, and screen sizes, my mission is to adapt one of the traditional usability evaluation methods to meet the needs of mobile evaluations. 
1.3 Scope
To develop an evaluation framework to support the usability design for mobile applications.
1.4 Objectives
    • To explore the current issues of usability. 
    • To explore the current evaluation paradigms and techniques.
    • To introduce a new evaluation framework to support the usability design for mobile applications.
    • To apply and assess this new evaluation framework on two case study applications.
1.5 Methodology
In my research, we will rely on literature review from different sources such as the internet, books in library, the papers published in trusted journals such as ACM, IEEE, and other specialized sources. We aim also to apply and assess the suggested framework on two case study applications. And then validate the results. 
Chapter 2: Usability Issues
2.1 What is Usability?
The definition of usability offered by the International Standards Organization, that is, in ISO9241, part 11, is, “The extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use” [ CITATION Har08 \l 1033 ]; [ CITATION Abr03 \l 1033 ].
This led me to summarize this definition in Figure 1: 

Figure 1: Definition of Usability
The implications are that, first, usability is the consequence of a given user employing a given product to perform a given activity in a given environment (as stated) and, second, that it is possible to measure aspects of this relationship in terms of effectiveness, efficiency, and user satisfaction. It is important to note that these three aspects are inter-connected and that any evaluation activity ought to try to measure some aspect of each [ CITATION Frø00 \l 1033 ].
If one is able to speak of measures, then it makes sense to be able to determine some criteria that indicate good or poor performance on these measures. Good et al. (1986) proposed that it is important to define both evaluation targets and metrics that relate to these targets. For example, in a study of a conferencing system, Good et al. (1986) identified 10 attributes that they felt reflected the use of the conferencing system, for example, ranging from a fear of feeling foolish to a number of errors made during task performance[ CITATION Goo86 \l 1033 ].
For each attribute, Baber (1999) defined a method for collecting data about that attribute, for example, questionnaires, observation, and so forth, and then set performance limits relating to best, worst, and planned levels. A study of a wearable computer for paramedics made by Baber et al. (1999) used this concept to produce three measures of performance, that is, predictive modeling (using critical path analysis), user trials, and performance improvement arising from practice. 
The ISO9241 notion of usability requires the concept of evaluation targets, for example, one could begin with a target of “66% of the specified users would be able to use the 10 main functions of product X after a 30 minute introduction.” Once this target has been met, the design team might want to increase one of the variables, for example, 85% of the specified users, or 20 main functions, or 15 minute introduction, or might want to sign-off that target [ CITATION Bab08 \l 1033 ].
2.2 What do we mean by “Usable”?
In large part, what makes something usable is the absence of frustration in using it. To be usable, a product or service should be useful, efficient, effective, satisfying, learnable, and accessible [ CITATION Rub08 \l 1033 ]. 
The word usability is not so strange nowadays. More and more software designers are concerned about usability issues, such as easiness to learn, to use and to remember, efficient to use, few errors and user’s subjective satisfaction. Not only users gain benefits of a system with good usability but also organizations or companies, because when the usability of the system is high, they can reduce training and support costs and rise productivity [ CITATION Tan05 \l 1033 ]. 
Usability was also defined by IEEE in [ CITATION Abr03 \l 1033 ] as ‘the ease with which a user can operate, prepare inputs for, and interpret outputs of a system or component’, and outlined five main attributes, Learnability, Efficiency, Memorability, Errors, and Satisfaction.
Usability tends to measure the quality of a user's experience when interacting with the system. Whether a Web site, a software application, mobile technology, or any device, it is important to believe that usability is not a single, one-dimensional property of a user interface. Usability is a combination of factors including [ CITATION Usa10 \l 1033 ]:
    • Ease of learning: How fast can a user who has never seen the user interface before learn it sufficiently well to accomplish basic tasks? 
    • Efficiency of use:  Once an experienced user has learned to use the system, how fast can he or she accomplish tasks? 
    • Memorability: If a user has used the system before, can he or she remember enough to use it the next time or does the user have to start over again learning everything? 
    • Error frequency: How often do users make errors while using the system? 
    • Subjective satisfaction: How much does the user like using the system?
Like other quality attributes, we can view usability both from design and evaluation perspectives [ CITATION Fol01 \l 1033 ]. Usability is considered one of a range of non-functional requirements, such as recovery time, security, response time, and safety, which should be satisfied as part of the design phase. So, it should be clearly specified during requirements analysis phase and designed during the implementation phase. Usability needs also to be evaluated from a user-centric point of view during all the phases of design life cycle [ CITATION Hag05 \l 1033 ]. 
2.3 What is Usability Evaluation?
Rubin & Chisnell (2008) argued that the term usability evaluation is often used rather indiscriminately to refer to any technique used to evaluate a product or system. They argued also that the usability evaluation is a process that employs people as testing participants who are representative of the target audience to evaluate the degree to which a product meets specific usability criteria[ CITATION Rub08 \l 1033  ].
Usability evaluation is an important part of the overall user interface design process, which consists of iterative cycles of designing, prototyping, and evaluating. Usability evaluation is itself a process that entails many activities depending on the method employed. Common activities include [ CITATION Ivo01 \l 1033 ]:
Capture: collecting usability data, such as task completion time, errors, guideline violations, and subjective ratings.
Analysis: interpreting usability data to identify usability problems in the interface.
Critique: suggesting solutions or improvements to mitigate problems.
Usability evaluation can be seen as an irreplaceable usability practice, since it gives direct input on how real users use the system. Usability evaluation focuses on measuring a human-made product's capacity to meet its intended purpose. Examples of products that commonly benefit from usability testing are foods, consumer products, web sites or mobile applications, computer interfaces, documents, and devices. Usability evaluation measures the usability, or ease of use, of a specific object or set of objects, whereas general human-computer interaction studies attempt to formulate universal principles[ CITATION Wik11 \l 1033  ].
Usability evaluation is a research tool, with its roots in classical experimental methodology. The range of tests one can conduct is considerable, from true classical experiments with large sample sizes and complex test designs to very informal qualitative studies with only a single participant. Each evaluation approach has different objectives, as well as different time and resource requirements[ CITATION Ing07 \l 1033 ]. 
A wide range of usability evaluation techniques have been proposed, and a subset of these is currently in common use. Some evaluation techniques, such as formal user testing, can only be applied after the interface design or prototype has been implemented. Others, such as heuristic evaluation, can be applied in the early stages of design. Each technique has its own requirements, and generally different techniques uncover different usability problems[ CITATION Kos10 \l 1033 ]. 
An objective dimension generally aims to evaluate how well users conduct their tasks with the use of performance measures like task completion time and the number of errors. However, objective dimensions do not always predict the user’s assessment of usability because it does not reflect users’ feeling or satisfaction. Subjective dimensions therefore, need to be assessed to provide a holistic and complete usability evaluation [ CITATION Tre94 \l 1033 ].
Usability evaluation methods can be classified into three types: usability testing, usability inquiry, and usability inspection. Usability testing employs representative users on typical tasks using a system or a prototype and then evaluates how user interface supports the users to do their tasks. Typical methods include co-discovery learning, question-asking protocol, and shadowing method. Usability inquiry talks to users, observes their using a system in real work, and lets them answer questions in order to understand users’ feelings about the system and their information needs. Field observation, focus groups, and questionnaire survey are typical usability inquiry methods. In usability inspection, usability experts examine usability related aspects. Typical methods are cognitive walkthrough and heuristic evaluation. It cannot be said that one method is the best in all situations. It is thus necessary to choose an appropriate method, taking into account evaluation purposes, available time, measures to be collected, and so on [ CITATION Zha03 \l 1033  ].
Ham et al. (2006) argued that so far the general usability concepts are described without considering the features peculiar to mobile devices. To examine the usability of mobile devices, the user interface, functions or tasks, and the context of use needs to be understood[ CITATION Ham06 \l 1033  ].
Scholtz (2005) defined usability engineering as the discipline that provides structured methods for achieving usability in user interface design during product development.  Usability evaluation is part of this process.  While theoretically any software product could be evaluated for usability, the evaluation is unlikely to produce good results unless a usability engineering process has been followed[ CITATION Sch05 \l 1033 ].
Scholtz (2005) divided usability engineering into three basic phases:  requirements analysis, design/testing/development, and installation.  Usability goals are established during requirements analysis.  Iterative testing is done during the design/testing/development phases and the results are compared to the usability goals.  User feedback should also be obtained after installation as a check on the usability and functionality of the product[ CITATION Sch05 \l 1033 ].   
2.4 Why do we need to Evaluate Usability?
From the point of view of some companies, usability evaluation is part of a larger effort to improve the profitability of products. There are many aspects to doing so, which in the end also benefits users greatly: design decisions are informed by data gathered from representative users to expose design issues so they can be remedied, thus minimizing or eliminating frustration for users.
2.4.1 Informing Design
According to Usability.gov (2006), the overall goal of usability testing is to inform design by gathering data from which to identify and rectify usability deficiencies existing in products and their accompanying support materials prior to release. The intent is to ensure the creation of products that[ CITATION Usa06 \l 1033 ]:
    • Are useful to and valued by the target audience
    • Are easy to learn
    • Help people be effective and efficient at what they want to do
    • Are satisfying (and possibly even delightful) to use
2.4.2 Eliminating Design Problems and Frustration
One side of the profitability coin is the ease with which customers can use the product. Artim (2004) argued that when we minimize the frustration of using a product for our target audience by remedying flaws in the design ahead of product release, we also accomplish these goals[ CITATION Art04 \l 1033 ]:
    • Set the stage for a positive relationship between our organization and our customers.
    • Establish the expectation that the products our organization sells are high quality and easy to use.
    • Demonstrate that the organization considers the goals and priorities of its customers to be important.
    • Release a product that customers find useful, effective, efficient, and satisfying.
2.4.3 Improving Profitability
Kock et al. (2009) pointed out the goals or benefits of testing usability for our organization as (quoted from [ CITATION Koc09 \l 1033 ]):
    • Creating a historical record of usability benchmarks for future releases. By keeping track of test results, a company can ensure that future products either improve on or at least maintain current usability standards.
    • Minimizing the cost of service and support calls. A more usable product will require fewer service calls and less support from the company.
    • Increasing sales and the probability of repeat sales. Usable products create happy customers who talk to other potential buyers or users. Happy customers also tend to stick with future releases of the product, rather than purchase a competitor’s product.
    • Acquiring a competitive edge because usability has become a market separator for products. Usability has become one of the main ways to separate one’s product from a competitor’s product in the customer’s mind. One need only scan the latest advertising to see products described using phrases such as ‘‘simple’’ and ‘‘easy’’ among others. Unfortunately, this information is rarely truthful when put to the test.
    • Minimizing risk. Actually, all companies and organizations have conducted usability testing for years. Unfortunately, the true name for this type of testing has been “product release”, and the “testing” involved trying the product in the marketplace. Obviously, this is a very risky strategy, and usability testing conducted prior to release can minimize the considerable risk of releasing a product with serious usability problems.
2.5 Usability Attributes
Jakob Nielsen (1993) identified five key attributes of usability of systems with which humans interact. The common attributes are: ease of learning (learnability), speed of performance (efficiency), low error rate, retention over time (memorability), and user attitude (subjective satisfaction). Such attributes are commonly used for usability goal setting and benchmarking of a system throughout a development lifecycle. Usability specialists use different evaluation techniques to measure usability based on these five usability attributes. 
    1. Efficiency: Once users have learned the design, how quickly they can perform tasks? 
Measuring Variable: 
    • Task completion time ([ CITATION Chi02 \l 1033 ]; [ CITATION Chr03 \l 1033 ]; [ CITATION Placeholder1 \l 1033  ]). 

    2. Errors: involves how many errors do users make while accomplishing a task? 
Measuring Variables: 
    • Number of errors (e.g., deviation from the right path, wrong answers, percentage of completed task correctly) [ CITATION Gul04 \l 1033 ]; [ CITATION Chi02 \l 1033 ]; [ CITATION Chr03 \l 1033 ]; [ CITATION Jon99 \l 1033 ]. 

    3. Learnability: How easy is it for users to accomplish basic tasks the first time they encounter the design? It is well known that people don't like spending a long time learning how to use a system. 
Measuring Variables: 
    • Time used to accomplish tasks at the first use [ CITATION Par04 \l 1033 ]. 
    • Time spent on training users until reaching a level of satisfaction [ CITATION Mac99 \l 1033 ]; [ CITATION Jon99 \l 1033 ]. 
    • Number of calls on help or assistance [ CITATION Placeholder1 \l 1033  ]. 

    4. Memorability: refers to how easy a system is to remember how to use, once learned. If users haven't used a system or an operation for few days or weeks, they should be able to remember or at least rapidly be reminded how to use it. 
Measuring Variables: 
    • Time used to finish tasks after not using applications for a period of time (e.g., 3 days or weeks) [ CITATION Placeholder2 \l 1033  ]; [ CITATION Placeholder2 \l 1033  ]. 

    5. Effectiveness: refers to how good a system is at doing what it is supposed to do. 
Measuring Variables: 
    • Number of steps done to achieve a goal [ CITATION Ebl00 \l 1033 ]. 

It is easy to specify usability metrics, but hard to collect them. Typically, usability is measured relative to users' performance on a given set of test tasks. The most basic measures are[ CITATION Nie01 \l 1033 ]: 
    • success rate (whether users can perform the task at all), 
    • the time a task requires, 
    • the error rate, and 
    • users' subjective satisfaction. 
It is also possible to collect more specific metrics, such as the percentage of time that users follow an optimal navigation path.
2.6 User Experience Goals
As well as focusing primarily on improving efficiency and productivity at work, interaction design is increasingly concerning itself with creating systems that are[ CITATION Placeholder5 \l 1033 ]: 
    1. satisfying
    2. enjoyable
    3. fun
    4. entertaining
    5. helpful
    6. motivating
    7. aesthetically pleasing
    8. supportive of creativity
    9. rewarding
    10. emotionally fulfilling

Studying the measuring variables discussed in section 2.5, we can notice that the most frequently used variable is time (e.g. task completion times). Other common variable is navigation, by this we mean, the number of steps and which components are accessed to reach a goal. Hence using the log file analysis will be an excellent way to extract such information.
2.7 General Mobile Usability Research Problems
2.7.1 Direct Usage of Desktop Approaches
Currently, usability evaluation approaches accepted for personal computers are applied to mobile software. Developers directly transfer an ideology of PC development to the development of programs for handhelds. As a result, there emerge programs for cell phones and PDA with interfaces rather satisfactory if run on PC but never on a handheld. On a handheld they are inconvenient, bulky and sophisticated[ CITATION Bug08 \l 1033 ].
2.7.2 Lack of Trained Specialists
Another problem here is the lack of trained specialists. This problem in Ukraine has its specificity. A subject about user interface development is absent in the curriculum of many specialized colleges and universities. Some of them propose a course “Ergonomics of Web-site”. But this is extremely insufficient for development of successful interfaces for desktop, mobile or embedded applications [ CITATION Bug08 \l 1033 ].
2.7.3 Direct Porting between Different Mobile Platforms
The next problem is "direct" porting between different mobile platforms. In the struggle for the market share, software developers aim at making their programs support diverse mobile platforms and diverse models of handhelds. However, since the platforms are technologically different the programs are to be re-developed for almost each of them. Accordingly, their interface must take into account architecture and phone model, too. Thus, to be successful, a programmer has to acquire a wide range of knowledge. Such professionals are few, and very often programs migrate among platforms without changes in user interface which affects usability dramatically[ CITATION Rap10 \l 1033 ].
2.7.4 No Established Mobile Software Usability Culture
Many programmers who currently work on creation of the mobile programs were trained to develop PC software. During decades of PC software development practice there emerged programs and interface libraries that became a standard de facto. Even programmers who are not quite experienced in what concerns usability can borrow such standard interfaces, "automatically" creating fairly usable programs. In mobile industry, the situation differs a lot. Usability culture of this kind does not exist here yet. Conventional programs and libraries are few so far. Moreover, for each platform there is its own standard, the platforms are technologically diverse, and they look as well as operate differently[ CITATION Bug08 \l 1033 ].
2.7.5 Fast Software Market Environment
Currently software development projects are generally time-restricted. "To make faster" is the motto of our time. However, "faster" and "thought-out" do not always combine in one project. In the long run one has to sacrifice something. High-quality interface design is often the very thing that is sacrificed to a tight schedule.
2.7.6 Simplicity of the User Interface of the Mobile Application
When you are on the move, you cannot afford to read instructions or manuals when you want to perform certain operations. The ability to access applications and data within, literally, a few keystrokes is very important in mobile devices. User interfaces need to be friendly and attractive. Graphical capability is another issue that constrains the development of attractive interfaces on mobile devices.
2.7.7 Different Technologies and Standards
Until now, vendors competed to provide mobile devices without adhering to any particular standards. This will change in the future, as mobile device manufacturers start to realize that it is in their interests to cooperate and use open standard technology. In May 2002, Nokia and Siemens Information and Communication Mobile announced that the companies agreed on a framework of collaboration to create and drive the implementation of mobile terminal software based on open standards. Such efforts will help to open the door for more players and more collaboration.
Identifying user requirements and designing usable interactions have long been acknowledged for their role in successful software design. Although these two factors still apply to designing successful mobile data services, there is one significant difference when compared to conventional software design for the desktop environment. On a desktop PC, a user is typically aware of most of the services that are provided by the PC. These often take the form of applications that the user has personally installed. In contrast, in the mobile environment, users may often be unaware of the services that are available through their handset or even how to find them. Services come with different labeling, structures, content and initiation processes. 
This lack of standardization is similar to the vast and unregulated services that one can find on the Internet. However, interacting with a PC in order to discover such services is more tolerable. Users can find their desired service by applying a variety of techniques, such as browsing, using search engines or following indices of services. However, these explicit user actions are harder to perform through the limited interaction methods of the mobile device. Therefore, our hypothesis is that the process of service discovery on mobile devices will be a major factor contributing to low usage of data services. If this is true, awareness of the available services, their approximate content and the procedure for invoking them will play a central role in the level of usage of the services. We define this knowledge as the Service Awareness that any user has at any given moment. Although this awareness can be raised through formal and informal advertising (e.g. word of mouth), our concern is with how context-aware systems could address service awareness more dynamically[ CITATION Gar07 \l 1033 ].
Chapter 3: The Nature of Mobile Applications
3.1 Introduction
Rangel and Ferreira (2009) argued in their article that in recent years, a variety of mobile computing devices has emerged, including palmtops, and PDAs. In consequence, the evolution of mobile computing devices has imposed a clear need for evaluation methods that are specifically suited to mobile devices. Nonetheless, the more evolved is the computing market toward mobile computing systems, the more difficult becomes for HCI evaluators to choose among the approaches for mobile application usability evaluation recently proposed in the literature [ CITATION Ran09 \l 1033 ]. 
It is a fact that a user is likely to be mobile is the greatest difference in context between users of mobile and desktop devices, and of course that mobility leads to dynamic changes in users’ context. As Koutsabasis et al. (2007) said, “It has to be taken into account that it is difficult, if not impossible, to compare between different studies, based on a lot of claims made without solid statistical results. Given the inherent features of these devices (e.g., mobility, restrictive I/O and storage capabilities, and dynamic use contexts), this has imposed a clear need for specifically suited evaluation methods”[ CITATION Kou07 \l 1033 ]. 
One of the major questions in the literature is related to the possibility of adapting concepts, methodologies, and approaches commonly used in traditional lab and field testing of desktop applications to mobile ones[ CITATION Placeholder4 \l 1033 ]. Another question has been related to whether to adopt a field or a lab approach [ CITATION Ran09 \l 1033 ]. However, little discussion is given of which technique or combination of them is best suited for a specific application and its context of use [ CITATION Placeholder6 \l 1033 ]. Beyond these levels of choice and for a successful choice, it seems equally essential to know about the effectiveness of the chosen approach. Practitioners need to know which methods are more effective and in what ways and for what purposes. Otherwise the evaluation process may result in a big effort with a small payoff [ CITATION Ran09 \l 1033 ].
3.2 Why evaluating Usability of a Mobile Application is Different from a Desktop One?
The most important differences to be aware of when evaluating usability for mobile applications are:
3.2.1 Wide Range of Platforms
In a very beautiful article called, Blazing Platforms, published in the Economist Journal, “In the past three years we have seen a number of new smart phone operating systems burst onto the scene. From Apple's wildly successful iPhone OS, to Google's increasingly popular Android and Palm's praised yet still floundering Web OS -- to name a few, figure 2 is taken from Google images, it shows a group of smart phones with a wide range of platforms. As published in a very beautiful article in “The Economist”, Nokia for example could not cope with the flood of smart phones technology. With the emergence of smart phones, in particular Apple’s iPhone, which appeared in 2007. Nokia still ships a third of all handsets, but Apple astonishingly pulls in more than half of the profits, despite having a market share of barely 4%. More Americans now have smart phones than Europeans. As for standards, Verizon, America’s biggest mobile operator, is leading the world in implementing the next wireless technology, called LTE” [ CITATION Placeholder16 \l 1033  ]. 
All of them are characterized for powering a new generation of touch-driven, fast and feature-rich devices that evermore seem to be turning into pocket-sized computers. The fact that three of the most significant companies in desktop computing -- Apple, Google and Microsoft -- now stand to occupy the same positions in mobile is a clear indication of where the industry is going[ CITATION Eco10 \l 1033  ]. 
Mobile applications have become the backbone of our mobile communication system these days. Standardization is a big trouble for the developer, especially if he/she intends to create a multi-platform mobile application [ CITATION Gra09 \l 1033 ]. The Mobile Web, which implies the use of browser-based mobile applications, offers end users many functionalities and features, but sadly suffers from many issues today. These issues mostly deal with incompatibility and usability problems[ CITATION Gra09 \l 1033 ] .

Figure 2: A group of smart phones with different platforms (from Google images)
Incompatibility or interoperability issues are a consequence multiple mobile platforms, operating systems, a vast degree of platform fragmentation and browsers[ CITATION Vis10 \l 1033 ]. In her article, Priya Viswanathan (2010) argued that usability problems arise due to the small form factor of a mobile device, resolution, sound quality, difficulty in operation and so on. The emergence of smart phones with a multitude of operating systems has further added to the list of worries for the developer. There are many mobile platforms, each with its own patterns and constraints. The more the evaluator understands each platform, the better he/she can decide how to evaluate.
3.2.2 Wide Range of Input Methods
In a study by Fernandez et al. (2009), he argued that mobile phones with advanced capabilities often employ touch screen as one of the main interaction methods. By adding touch screen capability, it is easier for users to carry out certain actions depending on the task they want to accomplish. However, even with the emergence of such technologies and input capabilities, the design of applications running on such devices is critical for the success of both the device and the application. Usability still plays a big role in its acceptance in the mobile market. It may be difficult to change the design of the actual hardware itself, but a well designed application can be a big help in order to overcome such limitations [ CITATION Fer09 \l 1033 ].
Bourguet & Chang (2008) pointed out that multimodal interfaces, characterized by multiple parallel recognition-based input modes such as speech and hand gestures, have been of research interest for some years. A common claim is that they can provide greater usability than more traditional user interfaces. For example, they have the potential to be more intuitive and easily learnable because they implement interaction means that are close to the ones used in everyday human-human communication. When users are given the freedom of using the modalities of interaction of their choice, multimodal systems can also be more flexible and efficient [ CITATION Bou08 \l 1033 ]. 
In particular, mobile devices, which generally suffer from usability problems due to their small size and typical usage in adverse and changing environments, can greatly benefit from multimodal interfaces as Grifoni (2009) argued in her book. Moreover, the emergence of novel pervasive computing applications, which combine active interaction modes with passive modality channels based on perception, context, environment and ambience, raises new possibilities for the development of effective multimodal mobile devices. For example, context-aware systems can sense and incorporate data about lightning, noise level, location, time, people other than the user, as well as many other pieces of information to adjust their model of the user’s environment[ CITATION Gri09 \l 1033 ]. 
More robust interaction is then obtained by fusing explicit user inputs (the active modes) and implicit contextual information (the passive modes). In affective computing, sensors that can capture data about the user’s physical state or behavior are used to gather cues which can help the system perceive users’ emotions [ CITATION Bou081 \l 1033  ].  
In his next paper Bourguet and Chang (2008) argued that our lack of understanding of how recognition-based technologies can be best used and combined in the user interface often leads to interface designs with poor usability and added complexity. For designers and developers in the industry, developing multimodal interaction systems presents a number of challenges, such as how to choose optimal combinations of modalities, how to deal with uncertainty and error-prone human natural behavior, how to integrate and interpret combinations of modalities, and how to evaluate sets of multimodal commands[ CITATION Bou08 \l 1033 ]. 
3.2.3 Different Screen Sizes
Weiss (2002) provided a list of guidelines for designing user-interfaces for handheld devices. However, these guidelines are very general and do not have concrete examples of UI components to be used based on specific models of handheld devices[ CITATION Fer09 \l 1033 ]. Oehl et al. (2007) looked into the correlation between the size of a screen’s display to the difficulty of the pointing task influenced by the size of the target for touch screen displays using a stylus pen. They focused on the actual UI components that should be used in order to easily input information[ CITATION Oeh07 \l 1033  ]. The investigation done by Hoggan et al. (2008) regarding the interaction with mobile devices without physical keyboards resulted in the conclusion that performance improved when tactile feedback is involved[ CITATION Hog08 \l 1033 ]. 
In a report on trends on mobile screen sizes, mhjerde (2008) said that over the past years, the relative screen size difference has increased. The difference between the smallest (128 x 128) and the largest (800 x 480) is now a factor of 23. This means that th largest screen is 23 times bigger than the smallest one. We can see that the smaller screens have a portrait orientation and the large screens have a landscape orientation. Between them are the phones that can change orientation, they can work in both landscape and portrait[ CITATION mhj10 \l 1033 ]. 
The physical limitations of mobile displays limit the amount of information that can be conveyed. Techniques that have been applied to overcome this are the use of sound, rendering and restructuring the content that would normally appear in larger displays or cloning the small display on nearby bigger displays [ CITATION Lem05 \l 1033 ].
While screen resolution continues to improve, and color screens are becoming the norm, screen sizes are likely to remain small, as users prefer small and portable devices. However, this might change when electronic paper becomes available. Electronic paper is a paper-like sheet made up of thousands of microcapsules that are electrically charged to display white or black ink, or any other pair of colors. When this technology is commercialized, it will revolutionize the mobile devices industry, due to the fact that electronic papers can be folded and stored in a small pocket.
Other Problems regarding the Size of Mobile Screen:
The following two bullets are quoted from [ CITATION Moh10 \l 1033 ].
    • Excessive Scrolling: Phones have been getting smaller and sleeker over the past years, which mean screens are even smaller. While the width of a standard desktop screen is around 1052px, the standard width for a mobile site is 250px. Unfortunately because of the small screen restriction, viewing a site is very frustrating because of scrolling and/or zooming in. I’ve run into this problem so many times, when I search on my blackberry, I always have to zoom into the site to be able to read, which of course means I have to scroll from right to left to read the content properly.
    • Page Height: Along with having to zoom, the page could be longer than necessary, causing the user to have to scroll down.
3.2.4 Context of Use
According to [ CITATION Dey01 \l 1033 ], context of use can be defined as “any information that characterizes a situation related to the interaction between users, applications, and the surrounding environment It typically includes the location, identities of nearby people, objects, as well as environmental elements that may distract users’ attention. It is very difficult to select a methodology that can include all possibilities of mobile context in a single usability test [ CITATION Dey01 \l 1033 ].
3.2.5 Battery Consumption
Batteries are important for the mobility and portability of mobile devices. Batteries run out quickly in most handheld devices. This is especially true if the user is on the move and has no time to charge the battery. There is an ongoing effort to reduce consumption of power and increase battery life for mobile devices. Some of these efforts include new battery technologies, like the fuel cell.
3.2.6 Limited Processing Power 
Many of the services provided over the Internet, such as gaming, music, and video, require faster machines and higher memory levels that more suits to PCs than the current processing power of mobile devices.
3.2.7 Recovery after System Failures
The mobile device should have the ability to recover after failures; so that the user’s data cannot be lost after failures. In order to maintain connectivity and minimize dropouts, mobile devices should have the ability to save and minimize the loss of data, while conducting transactions. For many users on the move, connectivity is very important. In order to maintain connectivity and minimize dropouts, mobile devices should have the ability to save and minimize the loss of data, while conducting transactions.
3.2.8 Standardization Issues faced by Mobile Applications’ Developers
The mobile applications industry is highly fragile and fragmented, with developers working on tight schedules and highly limited budgets. To make matters more complicated, there are numerous devices, brands and smart phone applications coming into the market every day. Developers are constantly trying to create more innovative applications for giants like the iPhone, Android and BlackBerry and brands such as Nokia, HTC, Samsung and so on. Developers hence start to wonder what they should start on first - the hardware or the software. Budgeting concerns take foreground too, with so many apps working on an unimaginably low budget [ CITATION Vis10 \l 1033 ]. 
Yet another point of import is that not all devices go down well with all sections of society. For example, it is generally seen that the business community largely prefers smart phones such as BlackBerry and HTC, whereas the younger generation prefers more flashy devices such as the iPhone. So a developer has to keep his apps open to all these platforms, without having to spend too much on purchasing dedicated software and too much time on debugging issues.
Then there is the pertinent problem of popular application stores not accepting a developer's applications, no matter how good they may be. Many app stores stipulate stringent restrictions and submission processes, which can further frustrate the developer.
3.2.9 Limited Bandwidth
As the number of mobile users increases, there will be greater demand for shared wireless capacity. There is still a lot of development work to be done to improve the limitations of mobile devices and network bandwidth (Jameson, 2006). New products, such as Wi-Fi5, which is based on high-speed standards and runs in 5 Ghz spectrums and provides up to 54 Mbps, are starting to reach the market. The network bandwidth is expected to improve in the next few years, and the number of users is expected to grow as well. 
3.2.10 Different Interaction Styles
Over one billion people own or use cellular mobile telephones. Therefore, industry practitioners are faced with a question: how big steps can they take when designing the user interfaces for their new products, or how closely should they stick with the already existing user interface conventions that may already be familiar to the consumers. In his thesis, Kiljander (2004) clarified that interaction style denotes the framework consisting of the physical interaction objects, the abstract interaction elements, and the associated behavior or interaction conventions that are applied throughout the core functionality of the mobile phone, but excludes the stylistic appearance elements of the user interface.
In his literature review, Kiljander (2004) presented a literature study that compared the interaction styles applied in mainstream computing domains against the aspects relevant in the mobile phones domain. A heuristic analysis of contemporary mobile phones is used to formulate an understanding of the available interaction styles and analyze whether there is convergence towards specific types of interaction styles in the industry. An empirical usability testing experiment with 38 test users is conducted with a novel mobile phone interaction style to investigate differences between users who are already familiar with different mobile phone interaction styles. The study reveals that interaction styles applied in contemporary mobile telephones are designed around menu navigation, and they implement the three primary operations (Select, Back and Menu access) with dedicated hardkeys, context-sensitive softkeys, or using special control devices like joysticks or jog dials. The control keys in the contemporary interaction styles are converging around various two- and three-softkey conventions.
The aspects related to indirect manipulation and small displays pose specific usability and UI design challenges on mobile phone user interfaces. The study shows that the mobile handset manufacturers are applying their usually proprietary interaction styles in a rather consistent manner in their products, with the notable exception of mobile Internet browsers that often break the underlying interaction style consistency[ CITATION Dum08 \l 1033 ]. 
Chapter 4: State of the Art of Usability Evaluation for Mobile Applications
4.1 Introduction
Oxford Dictionaries defined a paradigm as “world view underlying the theories and methodology of a particular scientific subject”[ CITATION Oxf11 \l 1033 ], while Cambridge Dictionaries Online defined a paradigm as “a model of something, or a very clear and typical example of something” [ CITATION Cam11 \l 1033 ]. In most of dictionaries, the word “approach” is a synonym of the word “paradigm”. Cambridge Dictionaries Online defined a “technique” as “a way of doing an activity which needs skill” [ CITATION Cam111 \l 1033  ], but Oxford Dictionaries defined it as “the body of specialized procedures and methods used in any specific field, it is a method of performance; way of accomplishing” [ CITATION Oxf111 \l 1033 ]. 
Most of the usability evaluation approaches developed during the last 25 years have focused primarily on desktop applications. The challenges for usability evaluation of mobile devices stem from their special characteristics like small screens with low resolution (compared to desktops), less power supply, and the trend to make devices smaller and smaller. At the same time the number of functions supported has increased [ CITATION Ber08 \l 1033 ].
4.2 Usability Evaluation Paradigms
Some of the paradigms developed for standard desktop applications have been adopted to be used during the development process of mobile devices and applications. A selection of some of the variations and adoptions of classical paradigms to fit usability evaluation of mobile   devices is described in the following:
Usability evaluation paradigms are classified within the following framework as:
    • "Quick and dirty" evaluation
    • Usability testing
    • Field studies
    • Predictive evaluation
4.2.1 "Quick and dirty" Evaluation
A "quick and dirty" evaluation is a common practice in which designers informally get feedback from users or stackholders to verify that their ideas are aligned with users' needs. "Quick and dirty" evaluations can be done at any stage and the weight is on fast input rather than carefully documented findings. For example, early in design developers may meet informally with users to get opinions on thoughts for a new product [ CITATION Placeholder3 \l 1033  ]. At later stages similar meetings may occur to try out an idea for an icon, check whether a graphic is liked, or confirm that information has been appropriately categorized on a webpage. This approach is often called "quick and dirty" because it is meant to be done in a short space of time. Getting this kind of feedback is an essential ingredient of successful design.
The data collected is usually descriptive and informal and it is fed back into the design process as verbal or written notes, sketches and anecdotes, etc [ CITATION Free08 \l 1033 ].  Another source comes from experts, who use their knowledge and experience with user behavior and the market place to review software quickly and provide suggestions for improvement. It is an approach that has become particularly popular in web design where the emphasis is usually on short time scales [ CITATION Kje041 \l 1033 ].
4.2.2 Usability Testing
Usability testing means measuring typical users' performance on system typical prepared tasks that are representatives of those for which the system was designed. Users' performance is generally measured in terms of number of errors and time to complete the task. As the users perform these tasks, they are watched and recorded on video and by logging their interactions with software [ CITATION Imp11 \l 1033 ]. This observational data is used to calculate performance times, identify errors, and help explain why the users did what they did. User satisfaction questionnaires and interviews are also used to elicit users' opinions.
In a beautiful paper in the journal of usability studies, Kawalek  and his team (2008) argued that conducting usability tests in the lab is organizationally much easier than testing in the field because almost all influencing factors can be controlled. Interaction and think-aloud data can be recorded with several cameras and screen capturing tools because of having easy access to all technical options. In addition, participants are comfortable with thinking aloud because there are no passers-by. All in all, the context, which is a big part of mobile interaction, is not considered. Thus, results of usability studies of mobile applications conducted in the lab cannot be assigned to mobile use [ CITATION Kaw08 \l 1033 ]. 
This paradigm involves collecting information on users’ performance while they are carrying out specific tasks and assessing their attitude towards using a particular product service. The performance measures include time taken to complete a task, number of errors, and many others. In addition, in terms of assessing users’ attitude towards the application; questionnaires and interviews are commonly used. Most usability testing tasks take place under controlled conditions in a usability lab [ CITATION Lov05 \l 1033 ].
Duh et al. (2006) argued that the main factors which influence the interaction between users and mobile devices include contextual awareness, task hierarchy, visual attention, hand manipulation, and mobility. These are critical issues of usability and mobile application design [ CITATION Duh06 \l 1033 ].  
The defining characteristic of usability testing is that it is strongly controlled by the evaluator [ CITATION May99 \l 1033 ]. Typically tests take place in laboratory-like conditions that are controlled. Casual visitors are not allowed and telephone calls are stopped, and there is no possibility of talking to colleagues, checking email, or doing any of the other tasks that most of us rapidly switch among in our normal lives. Everything that the participant does is recorded-every key-press, comment, pause, expression, etc., so that it can be used as data [ CITATION Free08 \l 1033 ].
Kaikkonen et al. (2008) argued in his article that the goal of a usability test is to improve the system being developed. They argued also that sufficient results are often achieved with 5 to 10 users per test iteration, although all problems may not be detected. The goal of academic HCI research is to better understand users’ behavior and interaction models, as well as improve the methods used in product development. In order for the results to be reliable and to help comparison between studies, the number of test users should be higher. A minimum of 95% of usability problems were found with 20 users and variation between groups was fairly small [ CITATION Placeholder4 \l 1033 ].
4.2.3 Field Studies
Field studies take place in natural surroundings where the application is designed to be used, with the aim of understanding how the system impacts the user in their everyday context of use. One can see the importance of this evaluation approach in relation to mobile applications and systems. There are various techniques that can be used to gather evaluation information using a field study approach, such as observations and ethnography [ CITATION Lov05 \l 1033 ].
The distinguishing feature of field studies is that they are done in natural settings with the aim of increasing understanding about what users do naturally and how technology impacts them. In product design, field studies can be used to (1) help identify opportunities for new technology; (2) determine requirements for design; (3) facilitate the introduction of technology; and (4) evaluate technology[ CITATION Free08 \l 1033 ].
Kawalek et al. (2008) clarified that in field studies, context factors are taken into consideration. Therefore, the evaluator has less control of the setting. Capturing interaction data might be difficult as participants might refuse to think aloud while walking along a street. In addition, following them with a camera is obtrusive and might change their normal behavior. Nevertheless, in some cases testing in the field is indispensable, e.g., when testing a pedestrian navigation application [ CITATION Kaw08 \l 1033 ].
Usability testing in the field is more time consuming than laboratory testing. Without concrete proof to support the theory that testing in a real-life context is significantly better than a laboratory test, companies have good reason to question whether investing in more expensive and more time consuming field tests is worthwhile [ CITATION Ozt06 \l 1033 ].
According to Preece et al. (2002), there are two overall approaches to field studies. The first involves observing explicitly and recording what is happening, as an outsider looking on. Qualitative techniques are used to collect the data, which may then be analyzed qualitatively or quantitatively. For example, the number of times a particular event is observed may be presented in a bar graph with means and standard deviations. In some field studies the evaluator may be an insider or even a participant [ CITATION Placeholder5 \l 1033 ]. 
Ethnography is a particular type of insider evaluation in which the aim is to explore the details of what happens in a particular social setting. In the context of human computer interaction, ethnography is a means of studying work (or other activities) in order to inform the design of information systems and understand aspects of their use [ CITATION Fre08 \l 1033 ].
Pascoe, Ryan, and Morse (2000) studied the effects of using mobile devices while on the move, especially HCI related issues involved in using mobile devices in a real world (field) environment [ CITATION Pas00 \l 1033 ]. 
Kjeldskov et al. (2005) found that taking usability studies in the field only added little value, discovering the same problems both in the lab as well as in the field. On the other hand, several mobile HCI research studies assume benefits when conducting user testing in the field [ CITATION Kje05 \l 1033 ].
Kjeldskov et al. (2005) argued that “expensive time in the field should perhaps not be spent on usability evaluation (in the field) if it is possible to create a realistic laboratory setup including elements of context …” Duh et al. (2006) reported that more usability problems could be found in the field compared to the lab. Whether usability studies should be conducted in the lab or in the field is still a matter of discussion and needs further research [ CITATION Duh06 \l 1033 ].
Duh, Tan, and Chen (2006) also undertook a comparison of lab and field evaluations. Two groups of participants undertook an evaluation of a mobile-phone based application in one of two settings: seated in a lab with the usage scenario textually described or in the field in the actual usage scenario [ CITATION Duh06 \l 1033  ] ; [ CITATION Cre \l 1033  ]. In both cases, the think aloud technique was used and the participants’ interaction with the application was recorded [ CITATION Cre \l 1033 ]. In contrast to Kjeldskov et al. (2004), significantly more critical errors were found by the participants in the field than by those in the lab. Although no definitive reason is given, there are several possibilities [ CITATION Kje04 \l 1033 ]. The lab-based participants were seated during the evaluation so no attempt was made to mimic the real-life context of use. Also, the participants in the field expressed increased nervousness and stress which may have been an experimental artifact caused by the requirement to verbally describe everything they were doing in a public location [ CITATION Placeholder6 \l 1033 ].
4.2.4 Field or Laboratory Evaluation?
Fling (2011) pointed out in his book that since our interaction with mobile devices happens in different way to desktop computers, it seems a logical conclusion that the context of use is important in order to observe realistic behavior. Brian Fling also stated that you should “go to the user; don’t have them come to you”. However, testing users in the field has its own problems, especially when trying to record everything going on during tests (facial expressions, screen capture and hand movements) [ CITATION Fli09 \l 1033 ]. 
Kjeldskov and Skov (2003) argued that conducting mobile device usability evaluations in the field raises a number of potential problems. First, the use contexts of mobile devices are often highly dynamic involving several temporally and physically distributed actors. Additionally, mobile systems for highly specialized use contexts such as safety-critical or hazardous environments may prohibit exploratory usability evaluations since errors involving risks for people and equipment cannot be tolerated. In addition, field evaluations complicate data collection and limit means of control since users are moving physically in an environment with a number of unknown variables potentially affecting the setup[ CITATION Kje10 \l 1033  ]. 
Usability evaluation has grown into a well-established discipline. The first approaches to usability evaluation as well as today’s mainstream methods are inherently based on the use of a dedicated laboratory[ CITATION Nie08 \l 1033  ]. For several years, this focus on the laboratory has been countered by others who argue in favor of conducting usability evaluations in the field. The discussion of this distinction between field and laboratory has mostly been a matter of opinions, and it has not been prominent in the literature on experimental comparisons of evaluation methods. There are, however, examples of experimental comparisons field and laboratory evaluations, e.g. [ CITATION Her99 \l 1033 ]. 
The advent of mobile devices and systems has revived the controversies about this distinction. Usability evaluation of mobile systems is still an immature discipline[ CITATION Ber08 \l 1033 ]. Therefore, basic questions are being discussed. One such question is: should usability evaluation of a mobile system be conducted in the field or in a usability laboratory? 
Some argue that a usability evaluation of a mobile system should always be conducted in the field. It is important that systems for mobile devices are tested in realistic settings, since testing in a conventional usability laboratory is not likely to find all problems that would occur in real mobile usage. It also seems to be an implicit assumption that the usability of a mobile system can only be properly evaluated in the field. However, usability evaluation in the field is time consuming, complicates data collection and reduces experimental control. There are, however, practical guidelines for handling these challenges [ CITATION Bai03 \l 1033 ].
Others argue that usability evaluations in laboratory settings are not troubled with the problems that arise in field evaluations. In a laboratory, the conditions for the evaluation can be controlled, and it is possible to employ facilities for collection of high-quality data such as video recordings of the display and user interaction [ CITATION Kje07 \l 1033 ].
The similarities and differences between field and lab-based usability evaluations of mobile systems are beginning to be explored. Some of the comparisons that have been made have observed that there are different interaction behaviors in the laboratory and in the field settings, and they conclude that it is worthwhile carrying out evaluations in the field, even though it is problematic due to difficulties in capturing screen content and the interaction between the user and the mobile device [ CITATION Bai03 \l 1033 ]. 
A paper presented at the journal of usability studies concluded that the added value of conducting usability evaluations in the field is very limited and recreation of central aspects of the use context in a laboratory setting enables the identification of the same usability problems. These results are supported by another comparative study where it was concluded that the same usability problems were found both in the laboratory and in the field [ CITATION Placeholder4 \l 1033 ].
Kjeldskov and Skov (2010) argued that carrying out contextual enquiries using diary studies are beneficial, they also have drawbacks as they rely on the participant to provide an accurate account of their behavior which is typically not always easy to achieve, even with the best intentions. Carrying out research in a coffee shop for example provides the real-world environment which maximizes external validity. However, for those who field studies are impractical for one reason or another, simulating a real-world environment within a testing lab has been adopted. Researchers believe they can also help to provide external validity which traditional lab testing cannot. In the past researchers have attempted a variety of techniques to do this and are listed below [ CITATION Kje10 \l 1033  ]:
    • Playing music or videos in the background while a participant carries out tasks 
    • Periodically inserting people into the test environment to interact with the participant, acting as a temporary distraction 
    • Distraction tasks including asking participants to stop what they are doing, perform a prescribed task and then return to what they’re doing (e.g. Whenever you hear the bell ring, stop what you are doing and write down what time it is in this notebook.) Having participants walk on a treadmill while carrying out tasks (continuous speed and varying speed) 
    • Having participants walk at a continuous speed on a course that is constantly changing (such as a hallway with fixed obstructions) 
    • Having participants walk at varying speeds on a course that is constantly changing 
Although realism and context of use would appear important to the validity of research findings, previous research has refuted this assumption. Comparing the usability findings of a field test and a realistic laboratory test (where the lab was set up to recreate a realistic setting such a hospital ward) found that there was little added value in taking the evaluation into a field condition. The research revealed that lab participants on average experienced 18.8% usability problems compared to field participants who experienced 11.8%. In addition to this, 65 man-hours were spent on the field evaluation compared to 34 man-hours for the lab evaluation, almost half the time[ CITATION Ber06 \l 1033 ].
Subsequent research has provided additional evidence to suggest that lab environments are as effective in uncovering usability issues as effective. In this study, researchers did not attempt to recreate a realistic mobile environment, instead comparing their field study with a traditional usability test laboratory set-up. They found that the same issues were found in both environments. Laboratory tests found more cosmetic or low-priority issues than in the field and the frequency of findings in general varied. The research also found that the field study provided a more relaxed setting which influenced how much verbal feedback the participant provided, however this is refuted by other studies which found the opposite to be true[ CITATION Placeholder4 \l 1033 ].
Both studies concluded that the laboratory tests provided sufficient information to improve the user experience, in one case without trying to recreate a realistic environment. Both found field studies to be more time-consuming. Unsurprisingly this also means the field studies are more expensive and require more resources to carry out. Many user experience practitioners will agree that any testing is always better than none at all. However, there will always be exceptions where field testing will be more appropriate. For example, if a geo-based mobile application is being evaluated this will be easier to do in the field than in the laboratory.
Duh, Tan, and Chen (2006) also undertook a comparison of lab and field evaluations. Two groups of participants undertook an evaluation of a mobile-phone based application in one of two settings: seated in a lab with the usage scenario textually described or in the field in the actual usage scenario. In both cases, the think aloud technique was used and the participants’ interaction with the application was recorded. In contrast to [ CITATION Kje041 \l 1033 ], significantly more critical errors were found by the participants in the field than by those in the lab. Although no definitive reason is given, there are several possibilities. The lab-based participants were seated during the evaluation so no attempt was made to mimic the real-life context of use. Also, the participants in the field expressed increased nervousness and stress which may have been an experimental artifact caused by the requirement to verbally describe everything they were doing in a public location[ CITATION Duh06 \l 1033 ].
Studying the usability of mobile technologies, however, raises new questions and concerns. Mobile systems are typically used in highly dynamic contexts involving a close interaction between people, systems, and their surroundings. Therefore, studying mobile technology use in situ seems like an appealing or even indispensable approach rather than trying to recreate the use situation realistically in a laboratory. However, studying mobile technology usability “in the real world” is difficult. It is difficult to capture key situations of use, apply established usability techniques such as observation and “thinking aloud” without interfering with the situation, and it is complicated to collect data of an acceptable quality[ CITATION Koc09 \l 1033 ].
Deciding how to capture data is something important to think about. Finding the best way to capture all relevant information is trickier on mobile devices than desktop computers. Various strategies have been adopted by researchers, a popular one being the use of a sled which the participant can hold comfortably and have a camera positioned above to capture the screen. In addition to this it is possible to capture the mobile screen using specialized software specific to each platform.
Rangel and Ferreira (2009) argued that even though laboratory testing is widely and effectively utilized in the evaluation of software interfaces, it has some limitations. Laboratory based usability studies capture a snapshot of the use in a simulated use environment. Simulating the use setting is very hard, time consuming, expensive and sometimes impossible to attain. In the laboratory, users are isolated from contextual factors (often distractions), such as interactions with other users and products which draw their attention away from the product being used. On the other hand, new electronic devices are used in a context, in which multitasking is a key factor. These devices are used while walking, talking, running or even driving [ CITATION Ran09 \l 1033 ]. 
Kjeldskov et al. (2005) evaluated a mobile guide to support the use of public transportation with 4 distinct evaluation methods: field evaluation, lab evaluation, heuristic walkthrough and rapid reflection. They categorized the usability problems as critical (stopped users from completing the tasks), serious (inhibited/slowed down the users from completing the tasks) and cosmetic (did not inhibit users from completing the tasks). The results showed that field and lab evaluations had a significant overlapping in critical and serious problems, although field was slightly more efficient in serious problems. However, field studies were shown to be the least effective method in identifying cosmetic problems. Laboratory-based approach drew attention to device-oriented issues and field approach drew attention to issues such as the real-world validity and precision of the data presented by the system and the social comfort [ CITATION Kje05 \l 1033 ].
In a different study by Rubin and Chisnell (2008), six evaluation techniques were compared for their effectiveness in evaluating the use of a mobile device. One of them involved walking in a pedestrian street and the remaining five where in the lab: sitting at a desk, walking on a treadmill at constant or varying speed, walking at a constant or varying speed on a course that was constantly changing. The results indicate that the sitting down technique was the most effective but largely because of its effectiveness in identifying cosmetic problems. On the other hand, techniques involving movement were more effective in finding user interface layout problems. Overall, the authors report that “there were no significant differences between the techniques in terms of user performance” but the field technique exhibited significantly more mental workload in terms of perceived effort and overall workload which highly simulates the actual way of mobile use[ CITATION Rub08 \l 1033 ]. 
Another study by Kjeldskov (2004) compared field and lab evaluations for a context-aware mobile electronic patient record system prototype reported the lab evaluation as more effective. However, the difference of effectiveness of the methods was non-significant for critical problems. In particular, lab evaluation identified 13% more critical problems, 42% more serious problems and 40% more cosmetic problems, while both methods identified all context-aware related problems. We should, however, note that the evaluation time in the lab lasted twice as much as in the field per participant and the nurses participating in the field evaluation were less experienced[ CITATION Kje04 \l 1033 ]. 
In a paper by [ CITATION Nie08 \l 1033  ], they have presented and compared the results from two usability evaluations of the same system conducted in two different setting: field and laboratory. By employing identical test procedure and data collection equipment, they have established a solid foundation for comparing these two evaluations. When the evaluations were conducted in the same way, the field evaluation was more successful as this setting enabled identification of significantly more usability problems compared to the laboratory setting. In addition, it was only in the field evaluation we identified usability problems related to cognitive load and interaction style. This indicates that evaluations conducted in field settings can reveal problems not otherwise identified in laboratory evaluations. Thus the overall conclusion of is that it is worthwhile conducting user-based usability evaluations in the field, even though it is more complex and time-consuming. The added value is a more complete list of usability problems that include issues not detected in the laboratory setting [ CITATION Nie08 \l 1033  ].
When conducting usability studies in laboratory-based settings, experimental control and collection of high quality data is typically not a problem [ CITATION Kje10 \l 1033 ]. Kaikkonen et al. (2008) categorized laboratory usability evaluations to be either expert, theoretical, or user-based or a combination of these. They argued that the similarities and differences between theoretical evaluations such as heuristic inspection, and user-based evaluations such as usability testing with think-aloud, have been shown for standard desktop applications, and for web-sites. They argued also that it is generally acknowledged that user-based evaluations tend to find a higher number of problems and more relevant problems. On the other hand, user-based evaluations tend to be more time consuming than theoretical evaluations. We studied user-based and theoretical evaluations for collaborative mobile systems and found that user based evaluations seem to find more problems than theoretical evaluations [ CITATION Placeholder4 \l 1033 ]. 
Kjeldskov & Skov (2003) argued that laboratory evaluations of mobile systems raise a number of challenges. First, the relation between the system and activities in the physical surroundings can be difficult to capture in expert evaluations such as heuristic evaluation or recreate realistically in a usability laboratory. Secondly, working with systems for highly specific domains, laboratory studies may be impeded by limited access to prospective users on which such studies rely. While benefiting from the advantages of a controlled experimental space, evaluating the usability of mobile systems without going into the field thus challenges established methods for usability evaluations in controlled environments [ CITATION Kje10 \l 1033 ].
In his research on evaluating mobile human computer interaction, Garzonis (2007) argued that the attempts to adapt usability evaluation methods for mobile devices and services are limited. One of the issues that have been addressed by the research community is the tradeoff between lab and field mobile evaluation. Although the studies carried out are limited and involve a small number of participants, they seem to agree on certain points. First, lab evaluations are more efficient in identifying cosmetic problems, which do not hinder interaction and user performance [ CITATION Gar207 \l 1033 ]. Second, field evaluation is more likely to identify issues that are related to the real context of use, such as navigation and social comfort. Perhaps it is time for a new hybrid approach to be introduced, in which users might perform cooperative evaluation sessions in real world contexts [ CITATION Gar207 \l 1033 ].
4.2.5 Predictive Evaluation
According to Preece et al. (2002), In predictive evaluations, experts apply their knowledge of typical users, often guided by heuristics, to predict usability problems. Another approach involves theoretically based models. The key feature of predictive evaluation is that users need not be present, which makes the process quick, relatively inexpensive, and thus attractive to companies; but it has limitations [ CITATION Placeholder5 \l 1033 ].
Love (2005) defined a heuristic as it is a principle that is used in making a decision. He argued that the idea behind heuristic evaluation is the evaluators independently conduct a usability evaluation of a system or prototype to spot any potential problems with the design, with the help of a list of design heuristics as an aid. These evaluators also indicate the severity of the problems found. One thing to notice about heuristics is that they are related to design principles and guidelines as it makes sense to evaluate a system on these principles. In addition, heuristic evaluation is normally conducted at an early stage of evaluation, before users are involved in the evaluation process[ CITATION Lov05 \l 1033 ].
In recent years, heuristic evaluation in which experts review the software product guided by tried and tested heuristics has become popular [ CITATION Placeholder5 \l 1033 ]. Usability guidelines (e.g., always provide clearly marked exits) were designed primarily for evaluating screen-based products[ CITATION Placeholder5 \l 1033 ]. 
Tanaka et al. (2005) in his paper argued that this type of evaluation is easy, fast (almost one day long for most evaluations) and can be as cheap as needed. He argued also that the predictive evaluation is conducted by a little group of evaluation experts evaluating the intended system interface and judging its problems with respect to a set of heuristics. He recommended also to use three to five evaluators because it is difficult for only a person to find all interface usability problems. He pointed out that the results of this evaluation can be improved by discussion between the members of the group of evaluators because different experts tend to find different problems in an interface [ CITATION Tan05 \l 1033 ].
Since time and resources are critical, companies look into the most efficient ways to find usability problems in products. Sometimes this means taking shortcuts that should not be taken. When resources are limited, attention must be paid to expertise in testing. There usually is not much time for trial and error or training[ CITATION Placeholder4 \l 1033 ]. 
Preece, Sharp, Rogers (2002) divided the predictive evaluation method into two phases. In the first phase, each evaluator judges the interaction design against the heuristics. The second phase consists of a group discussion session to collect and summarize all the evaluators’ lists of problems. The discussion about the lists may include rating the severity for each problem detected[ CITATION Placeholder5 \l 1033 ]. 
“With the advent of a range of new interactive products (e.g., the web, mobiles, collaborative technologies), this original set of heuristics has been found insufficient. While some are still applicable (e.g., speak the users' language), others are inappropriate. New sets of heuristics are also needed that are aimed at evaluating different classes of interactive products. In particular, specific heuristics are needed that are tailored to evaluating web-based products, mobile devices, collaborative technologies, computerized toys, etc. These should be based on a combination of usability and user experience goals, new research findings and market research” [ CITATION Placeholder5 \l 1033 ].
4.3 Usability Evaluation Techniques
There are many evaluation techniques and they can be categorized in various ways:
    • Observing users
    • Asking users their opinions
    • Asking experts their opinions
    • Testing users' performance
    • Modeling users' task performance to predict the efficacy of a user interface
A brief description for each is given in the following subsections.
4.3.1 Observing Users
Observation involves watching and listening to users [ CITATION Placeholder5 \l 1033 ]. Observing users interacting with the system can inform the designer with an enormous amount of data about what they do, the context in which they do it, how well technology supports them, and what else is needed.
Users can be observed in a laboratory, as in usability testing, or in the field environments in which the system is used as in field studies. There is many types observations including structured, semi structured, and descriptive. The choice of which type to use depends on the evaluation goals, the specific questions being addressed, and practical constraints. Notes, audio, video, and interaction logs are well-known ways of recording. Obvious challenges for evaluators are how to observe without disturbing the people being observed and how to analyze the data. And this is a core point in our contribution as we minimize the bias occurring in the data collected due to the up normal or uncomfortable behavior of the user because of the feeling of being observed.
Using Observation in "Quick and dirty" Paradigm
"Quick and dirty" observations can occur anywhere, anytime. For example, evaluators often go into a school, home, or office to watch and talk to users in a casual way to get immediate feedback about a prototype or product. Evaluators can also join a group for a short time, which gives them a slightly more insider role. Quick and dirty observations are just that, ways of finding out what is happening quickly and with little formality[ CITATION Uni10 \l 1033 ].
Using Observation in "Usability Testing" Paradigm
Video and interaction logs capture everything that the user does during a usability test including keystrokes, mouse clicks, and their conversations. In addition, observers can watch the users while using the system through a one-way mirror or via a remote TV screen. The observational data is used to see and analyze what users do and how long they spend on different aspects of the task. It also provides insights into users' affective reactions. For example, sighs, tense shoulders, frowns, and scowls speak of users' dissatisfaction and frustrations. The environment is controlled but users often forget that they are being observed. In addition, many evaluators also supplement findings from the laboratory with observations in the field[ CITATION Chi09 \l 1033  ].
One of the problems with observation in lab is that the observer doesn't know what users are thinking, and can only guess from what he/she sees. So, Think-aloud technique can be effective here as the technique requires people to say out loud everything that they are thinking and trying to do, so that their ideas are externalized.
If a user is silent during a think-aloud protocol, the evaluator could interrupt and remind him to think out loud, but that would be intrusive. Another solution is to have two people work together so that they talk to each other. Working with another person is often more natural and revealing because they talk in order to help each other along. This technique has been found particularly successful with children. It is also very effective when evaluating systems intended to be used synchronously by groups of users, e.g., shared whiteboards.
Using Observation in "Field Studies" Paradigm
In field studies, observers may be anywhere along the outsider insider range. Looking on as an outsider, being a participant observer, or being an ethnographer brings a viewpoint and practices that affects what data is collected, how data is collected, and how it is analyzed and reported[ CITATION Usa101 \l 1033 ]. 
Whether and in what ways observers influence those being observed depends on the type of observation and the observer's skills. The goal is to cause as little disruption as possible. An example of outsider observation is when an observer is interested only in the presence of certain types of behavior. 
Whether the observer sets out to be an outsider or an insider, events in the field can be complex and rapidly changing. There is a lot for evaluators to think about, so many experts have a framework to structure and focus their observation [ CITATION Placeholder5 \l 1033 ].
Data Collection Methods
    1) Notes or diaries: Taking notes is the least technical way of collecting data.
Advantages:
        ◦ Handwritten notes are flexible in the field but must be transcribed.
        ◦ The cheapest method of collecting data.
Disadvantages:
    • It can be difficult and tiring to write and observe at the same time.
    • Observers also get bored and the speed at which they write is limited.
Equipment:
    • Paper, pencil and camera
    2) Audio Recording: Audio can be a useful alternative to note taking.
Advantages:
        ◦ Less intrusive than video.
        ◦ Tapes, batteries, and the recorder are now relatively inexpensive.
Disadvantages:
    • Lack of a visual record.
    • Transcribing the data, which can be onerous if the contents of many hours of recording have to be transcribed.
Equipment:
    • Inexpensive, handheld recorder with a good microphone. Headset useful for easy transcription.
    3) Video Recording:
Advantages:
    • Has the advantage of capturing both visual and audio data.
Disadvantages:
    • Attention becomes focused on what is seen through the lens. It is easy to miss other things going on outside of the camera view.
    • When recording in noisy conditions, e.g., in rooms with many computers running or outside when it is windy, the sound may get muffled.
    • Analysis of video data can be very time-consuming as there is so much to take note of.
Equipment:
    • Editing, mixing, and analysis needed.
    4) Interaction Logging:
A Logfile is a file that lists events that have occurred while the user is using the system. For illustration, Web servers keep log files listing every request sent to the server. With log file analysis tools, it's possible to get a good idea of where visitors are coming from, how often they return, and how they navigate through a site. Using cookies enables Webmasters to log even more detailed information about how individual users are accessing a site [ CITATION Web \l 1033  ].
Advantages:
            1) It is unobtrusive.
            2) Large volumes of data can be logged automatically.
Disadvantages:
    1) It raises ethical concerns that need careful consideration ( as all observation techniques).
4.3.2 Asking Users
According to Preece et al. (2002), asking users what they think of a product-whether it does what they want; whether they like it; whether the visual design appeals; whether they had problems using it; whether they want to use it again is an obvious way of getting feedback. Interviews and questionnaires are the main techniques for doing this [ CITATION Placeholder5 \l 1033 ].
Data Collection Methods
1) Interviews: 
Interviewing is a common technique for getting users to reflect on their experience in their own words. As with other forms of evaluation which require direct interaction with the user, the value of the session depends a lot on the quality of the interviewer. A good interviewer will be able to put the interviewee at ease, let them express their point of view without being influenced, and be able to detect and follow up on any interesting points made in the course of conversation. Furthermore, a good interviewer can build trusted relationships with the subjects, thus easing the way to conduct further interviews. Whilst experienced interviewers will naturally direct the questioning to pursue the most interesting issues, it may be worth creating a script of questions for those interviewers with less experience[ CITATION Jon06 \l 1033 ].
When developing interview questions, they should be kept short, straightforward and not too many. Here are some general guidelines [ CITATION Ste \l 1033  ]:
    • Avoid long questions because they are difficult to remember.
    • Avoid compound sentences by splitting them into two separate questions.
    • Avoid using jargon and language that the interviewee may not understand.
    • Avoid leading questions
Asking colleagues to review the questions and running a pilot study will help to identify problems in advance and gain practice in interviewing.
There are six types of interviews (the definitions of these types are quoted from [ CITATION Placeholder5 \l 1033 ] :
    1) Structured Interviews: Structured interviews pose predetermined questions similar to those in a questionnaire. Structured interviews are useful when the study's goals are clearly understood and specific questions can be identified. To work best, the questions need to be short and clearly worded, Responses may involve selecting from a set of options that are read aloud or presented on paper.
    2) Unstructured Interviews: Open-ended or unstructured interviews are at one end of a spectrum of how much control the interviewer has on the process. Questions posed by the interviewer are open, meaning that the format and content of answers is not predetermined. The interviewee is free to answer as fully or as briefly as she wishes. Both interviewer and interviewee can steer the interview. Thus one of the skills necessary for this type of interviewing is to make sure that answers to relevant questions are obtained. It is therefore advisable to be organized and have a plan of the main things to be covered.
Advantages:
    2) They generate rich data.
Disadvantages:
    1) A lot of unstructured data is generated, which can be very time-consuming.
    2) Difficult to analyze due to large amounts of data collected.
    3) Semi-structured Interviews: Semi-structured interviews combine features of structured and unstructured interviews and use both closed and open questions. For consistency the interviewer has a basic script for guidance, so that the same topics are covered with each interviewee. The interviewer starts with pre-planned questions and then probes the interviewee to say more until no new relevant information is forthcoming.
Focus Groups: A Moderator organizes a small group of 4 to 8 participants by showing them a product. The participants are encouraged to freely give their honest opinions about the product, including suggestions to make it better [ CITATION Usa210 \l 1033 ].
A risk of focus groups is “groupthink” – a social tendency in which persons agree with the most accepted opinions of the group, rather than voicing their own. Groupthink can be lessened or prevented by giving participants a preliminary “homework” assignment before they attend the exercise. A homework assignment allows participants to begin thinking about the general topic about a week before they meet as a group, and it gives individual participants an opportunity to come up with their own opinions before hearing others’ opinions in the group setting [ CITATION Usa210 \l 1033 ].
    4) Advantages: 
        1) It allows diverse or sensitive issues to be raised that would otherwise be missed.
Disadvantages:
    1) The facilitator needs to be skilful so that time is not wasted on irrelevant issues.
    2) It can also be difficult to get people together in a suitable location.
    3) Getting time with any interviewees can be difficult
    5) Telephone Interviews: can be used as a good way of asking participants with whom we cannot meet. Telephone interviews have much similar to face-to-face interviews except for not seeing the body language of the interviewee [ CITATION MIT10 \l 1033 ].
    6) Online Interviews: Online interviews, using either asynchronous communication as in email or synchronous communication as in chats, can also be used. For interviews that involve sensitive issues, answering questions anonymously may be preferable to meeting face to face [ CITATION Placeholder5 \l 1033 ].
2) Questionnaires:
Another popular technique for garnering user opinion is the questionnaire. This technique is popular as it has the potential to reach a very wide audience, is cheap to administer and can be analyzed rapidly. Of course, it can never be as flexible as an interview and requires a lot of effort to design, especially if the user is completing it with no external help. Designing a good questionnaire is a complicated process, especially if the results are to be reliable [ CITATION Jon06 \l 1033 ].
Many questionnaires start by asking for basic demographic information (e.g., gender, age) and details of user experience (e.g., level of expertise). This background information is useful in finding out the range within the sample group.
Online Questionnaires:
With the increasing use of the Internet, online questionnaires have become a popular way of collecting information. The design of an online questionnaire often has an effect on the the quality of data gathered. There are many factors in designing an online questionnaire; guidelines, available question formats, administration, quality and ethic issues should be reviewed [ CITATION Log07 \l 1033 ]
Online questionnaires are effective for reaching a lot of users quickly and easily. The advantage of email questionnaire is that you can target specific users. However, email questionnaires are usually limited to text, but web-based questionnaires can include check boxes, combo boxes, drop down menus and graphics. 
Advantages:
    1) Responses are usually received quickly. 
    2) Copying and postage costs are lower than for paper surveys or often nonexistent.
    3) Data can be transferred immediately into a database for analysis.
    4) The time required for data analysis is reduced.
    5) Errors in questionnaire design can be corrected easily (though it is better to avoid them in the first place).
Disadvantages:
    1) Obtaining Random sample of respondents. 
Rating Scales: 
There are a number of different types of rating scales that can be used, each with its own purpose [ CITATION Opp92 \l 1033 ]. Here we describe two commonly used scales, Likert and semantic differential scales.
A Likert scale is defined in Wikipedia (2009) as a psychometric scale commonly used in questionnaires, and is the most widely used scale in survey research, such that the term is often used interchangeably with rating scale even though the two are not synonymous. When responding to a Likert questionnaire item, respondents specify their level of agreement or disagreement on a symmetric agree-disagree scale for a series of statements. Thus the scale captures the intensity of their feelings. The scale is named after its inventor, psychologist Rensis Likert [ CITATION Wik10 \l 1033 ]. 
Example of likert scales:
The use of drop down menu to list options to the user is excellent
1: Agree	2: Neutral	3: Disagree
Semantic differential scales are used less frequently than Likert scales. They explore a range of bipolar attitudes about a particular item. Each pair of attitudes is represented as a pair of adjectives. The participant is asked to place a cross in one of a number of positions between the two extremes to indicate agreement with the poles. 
Example of semantic differential scales:
The use of drop down menu to list options to the user is:
Excellent   |___|___|___|___|___|   Poor
4.3.3 Asking Experts
Software inspections and reviews are long established techniques for evaluating software code and structure. During the 1980s versions of similar techniques were developed for evaluating usability. Guided by heuristics, experts step through tasks role-playing typical users and identify problems. Developers like this approach because it is usually relatively inexpensive and quick to perform compared with laboratory and field evaluations that involve users. In addition, experts frequently suggest solutions to problems.
Heuristic Evaluation
Heuristic evaluation [ CITATION Placeholder7 \l 1033  ] is an inspection usability evaluation method. In heuristic evaluation, experts scrutinize the interface and its elements against established design rules. The experts should have some background knowledge or experience in HCI design and usability evaluation. 
Three to five experts are considered to be sufficient to detect most of the usability problems. The enlisted experts individually evaluate the system/prototype under consideration. They assess the user interface as a whole and also the individual user interface elements. The assessment is performed with reference to some usability heuristics. When all the experts are through with the assessment, they come together and compare and appropriately aggregate their findings[ CITATION Bert08 \l 1033 ].
Nielsen's set of Heuristics:
Rolf Molich and Jakob Nielsen (1990) initially proposed a set of usability heuristics for the design of user interfaces [ CITATION Mol90 \l 1033 ]; [ CITATION Nie90 \l 1033 ]. Aiming to maximize the explanatory power of the heuristics, Nielsen later refined them [ CITATION Nie93 \l 1033  \m Hai04], thereby deriving the following set:
    1. Visibility of system status: 
The system should always keep users informed about what is going on, through appropriate feedback within reasonable time.
    2. Match between system and the real world:
	The system should speak the users’ language, with words, phrases, and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.
    3. User control and freedom: 
Users often choose system functions by mistake and will need a clearly marked “emergency exit” to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.
    4. Consistency and standards: 
Users should not have to wonder whether different words, situations, or actions mean the same thing. Follow platform conventions.
    5. Error prevention: 
Even better than good error messages is a careful design which prevents a problem from occurring in the first place. Either eliminate error-prone conditions or check for them and present users with a confirmation option before they commit to the action.
    6. Recognition rather than recall: 
Make objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.
    7. Flexibility and efficiency of use: 
Accelerators— unseen by the novice user—may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions.
    8. Aesthetic and minimalist design: 
Dialogues should not contain information which is irrelevant or rarely needed. Every extra unit of information in a dialogue competes with the relevant units of information and diminishes their relative visibility. 
    9. Help users recognize, diagnose, and recover from errors: 
Error messages should be expressed in plain language (no codes), precisely indicate the problem and constructively suggest a solution.
    10. Help and documentation: 
Even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user’s task, list concrete steps to be carried out, and not be too large. 
In predictive evaluations, experts apply their knowledge of typical users, often guided by heuristics, to predict usability problems. The key feature of predictive evaluation is that users need not be present, which makes the process quick, relatively inexpensive, and thus attractive to companies [ CITATION Placeholder5 \l 1033 ]. 
In recent years heuristic evaluation in which experts review the software product guided by tried and tested heuristics has become popular. With the advent of a range of new interactive products (e.g. web, mobiles, collaborative technologies), researchers found that this original set of heuristics has been found insufficient because the user expectations from the new interactive products are different from the legacy products. Besides that they are changing so rapidly due to the rapid development of more new interactive products.
New sets of heuristics are needed that are aimed at evaluating different classes of interactive products. In particular, specific heuristics are needed that are tailored to evaluating web based products, mobile devices, collaborative technologies [ CITATION Placeholder5 \l 1033 ] .
Mobile Heuristics:
Enrico Bertini (2008) used a group of three usability researchers to come up with a new set of heuristics better suited to be applied on mobile evaluation settings [ CITATION Bert08 \l 1033  ].
Each of the three usability researchers was provided with a table reporting Nielsen’s traditional heuristics [ CITATION Sef09 \l 1033 ] together with their corresponding definitions. Each researcher worked individually at assessing: which of Nielsen’s heuristics were considered irrelevant for mobile settings; which of Nielsen’s heuristics were relevant, but needed some revision or modification; and which additional heuristics needed to be included in the original set to cover relevant aspects of mobile applications.
A discussion meeting among the usability researchers was held to arrive at a shared table consolidated from the three developed before by each researcher; then, submitted this set of heuristics (with their definitions) to a number of targeted HCI researchers and professionals in the mobile computing and usability community, to elicit feedback on the adequacy of the heuristics proposed. They arrived at the final set of mobile usability heuristics which are listed below [ CITATION Ber061 \l 1033 ]:
“Heuristic 1: Visibility of system status of the mobile device:  
Through the mobile device, the system should always keep users informed about what is going on. Moreover, the system should prioritize messages regarding critical and contextual information such as battery status, network status, environmental conditions, and so forth. Since mobile devices often get lost, adequate measures such as encryption of data should be taken to minimize loss. If the device is misplaced, the device, system, or application should make it easy to recover it.
Heuristic 2: Match between system and the real world: 
Enable the mobile user to interpret the information provided correctly, by making it appear in a natural and logical order; whenever possible, the system should have the capability to sense its environment and adapt the presentation of information accordingly.
Heuristic 3: Consistency and mapping:
The user's conceptual model of the possible function/interaction with the mobile device or system should be consistent with the context. It is especially crucial that there be a consistent mapping between user actions/ interactions (on the device buttons and controls) and the corresponding real tasks (e.g., navigation in the real world).
Heuristic 4: Good ergonomics and minimalist design: 
Mobile devices should be easy and comfortable to hold/carry along as well as robust to damage (from environmental agents). Also, since screen real estate is a scarce resource, use it with parsimony. Dialogues should not contain information that is irrelevant or rarely needed.
Heuristic 5: Ease of input, screen readability, and glace ability: 
Mobile systems should provide easy ways to input data, possibly reducing or avoiding the need for the user to use both hands. Screen content should be easy to read and navigate through notwithstanding different light conditions. Ideally, the mobile user should be able to quickly get the crucial information from the system by glancing at it. 
Heuristic 6: Flexibility, efficiency of use, and personalization: 
Allow mobile users to tailor/personalize frequent actions, as well as to dynamically configure the system according to contextual needs. Whenever possible, the system should support and suggest system-based customization if such would be crucial or beneficial.
Heuristic 7: Aesthetic, privacy, and social conventions: 
Take aesthetic and emotional aspects of the mobile device and system use into account. Make sure that users' data is kept private and safe. Mobile interaction with the system should be comfortable and respectful of social conventions.
Heuristic 8: Realistic error management: 
Shield mobile users from errors. When an error occurs, help users to recognize, to diagnose, if possible, and to recover from the error. Mobile computing error messages should be plain and precise. Constructively suggest a solution (which could also include hints, appropriate FAQs, etc.). If there is no solution to the error or if the error would have negligible effect, enable the user to gracefully cope with the error” [ CITATION Ber061 \l 1033 ].
An experiment study was done after that by eight usability experts to perform heuristic evaluation of two mobile applications. Each of the experts was given Nilsen's set of heuristics, their proposed set of mobile heuristics, and the Nilsen's severity ranking scale (SRS) which is given below[ CITATION HOC03 \l 1033  ]:
Rating
Description
0
I don't agree that this is a usability problem at all
1
Cosmetic problem: Need not be fixed unless extra time is available for the project
2
Minor usability problem: fixing this should be given low priority.
3
Major usability problem: fixing this should be given high priority (important to fix)
4
Usability catastrophes. Imperative to fix it before the product can be released.
Table 1: Severity Ranking Scale (SRS) (quoted from [ CITATION HOC03 \l 1033 ])
Data analysis showed that the use of the mobile heuristics has increased the number of flaws identified in the analysis of both applications, and has reduced variation among experts’ analyses.
Cognitive Walkthrough
Walkthroughs involve walking through a task with the system and noting problematic usability features. Most walkthrough techniques do not involve users. Others, such as pluralistic walkthroughs, involve a team that includes users, developers, and usability specialists.
"Cognitive walkthroughs involve simulating a user's problem-solving process at each step in the human-computer dialog, checking to see if the user's goals and memory for actions can be assumed to lead to the next correct action." [ CITATION Placeholder7 \l 1033  ]. The defining feature is that they focus on evaluating designs for ease of learning-a focus that is motivated by observations that users learn by exploration [ CITATION Placeholder7 \l 1033  ].
The characteristics of typical users are identified and documented and sample tasks are developed that focus on the aspects of the design to be evaluated. A description or prototype of the interface to be developed is also produced, along with a clear sequence of the actions needed for the users to complete the task.
A designer and one or more expert evaluators then come together to do the analysis. The evaluators walk through the action sequences for each task, placing it within the context of a typical scenario, and as they do this they try to answer the following questions: Will the correct action be sufficiently evident to the user? (Will the user know what to do to achieve the task?) Will the user notice that the correct action is available? [ CITATION Placeholder5 \l 1033 ].
Advantages:
    1) It focuses on users' problems in detail.
    2) Users do not need to be present.
Disadvantages:
    1) It is very time-consuming.
    2) The technique has a narrow focus that can be useful for certain types of system but not others.
Pluralistic Walkthrough
"Pluralistic walkthroughs are another type of walkthrough in which users, developers and usability experts work together to step through a task scenario, discussing usability issues associated with dialog elements involved in the scenario steps" [ CITATION Placeholder7 \l 1033  ]. Each group of experts is asked to assume the role of typical users.
It is a method of usability inspection where a diverse group of stakeholders in a design are brought together to review the design, including user interface designers, users, developers, and management. The walkthrough is conducted by identifying primary tasks for the system and stepping through those tasks, identifying usability problems along the way. The purpose of bringing together various stakeholders is that each one brings a certain perspective, expertise, and set of goals for the project that enables a greater number of usability problems to be found [ CITATION For09 \l 1033  ].
Advantages:
    1) Focuses on users' tasks
    2) Performance data is produced and many designers like the apparent clarity of working with quantitative data.
    3) This technique also lends itself well to participatory design practices by involving a multidisciplinary team in which users play a key role.
Disadvantages:
    1) Having to get all the experts together at once and then proceed at the rate of the slowest.
    2) Only a limited number of scenarios, and hence paths through the interface, can usually be explored because of time constraints.
4.3.4 Testing Users' Performance
User testing is an applied form of experimentation used by developers to test whether the product they develop is usable by the intended user population to achieve their tasks. In user testing the time it takes typical users to complete clearly defined, typical tasks is measured and the number and type of errors they make are recorded[ CITATION Dum06 \l 1033 ].
User testing involves measuring the performance of typical users doing typical tasks in controlled laboratory-like conditions. Its goal is to obtain objective performance data to show how usable a system or product is in terms of usability goals, such as ease of use or learnability. More generally, usability testing relies on a combination of techniques including observation, questionnaires and interviews as well as user testing, but user testing is of central concern, and in this chapter we focus upon it[ CITATION Placeholder5 \l 1033 ].
Because user testing has features in common with scientific experiments, it is sometimes confused with experiments done for research purposes. Both measure performance. However, user testing is a systematic approach to evaluating user performance in order to inform and improve usability design, whereas research aims to discover new knowledge[ CITATION Jon06 \l 1033 ].
Typically 5-12 users are involved in user testing, but often there are fewer and compromises are made to work within budget and schedule constraints. “Quick and dirty” tests involving just one or two users are frequently done to get quick feedback about a design idea [ CITATION Usa02 \l 1033  ]. 
There are many things to consider before doing user testing. Controlling the test conditions is central, so careful planning is necessary. This involves ensuring that the conditions are the same for each participant, that what is being measured is indicative of what is being tested and that assumptions are made explicit in the test design[ CITATION Placeholder9 \l 1033  ].
User testing is most suitable for testing prototypes and working systems. Although the goal of a test can be broad, such as determining how usable a product is, more specific questions are needed to focus the study, such as, "can users complete a certain task within a certain time, or find a particular item, or find the answer to a question".
User testing falls in the usability testing paradigm and sometimes the term "user testing" is used synonymously with usability testing. It involves recording data using a combination of video and interaction logging, user satisfaction questionnaires, and interviews.
Deciding on which tasks to test users' performance is critical. Typically, a number of "completion" tasks are set, such as finding a website, writing a document or creating a spreadsheet. Quantitative performance measures are obtained during the tests that produce the following types of data: time to complete a task, time to complete a task after a specified time away from the product, number and type of errors per task, number of errors per unit of time, number of navigations to online help or manuals, number of users making a particular error, and number of users completing a task successfully [ CITATION Wix97 \l 1033 ].
The type of test prepared will depend on the type of prototype available for testing as well as study goals and questions. For example, whether testing a paper prototype, a simulation, or a limited part of a system's functionality will influence the breadth and complexity of the tasks set.
Generally, each task lasts between 5 and 20 minutes and is designed to probe a problem. Tasks are often straightforward and require the user to find this or do that, but occasionally they are more complex, such as create a design, join an online community or solve a problem.
It is important to have a representative sample to ensure that the findings of the user test can be generalized to the rest of the user population. Selecting participants according to clear objectives helps evaluators to avoid unwanted bias. For example, if 90% of the participants testing a product for 9-12 year-olds were 12, it would not be representative of the full age range. The results of the test would be distorted by the large group of users at the top-end of the age range.
User testing requires the testing environment to be controlled to prevent unwanted influences and noise that will distort the results. Many companies, such as Microsoft and IBM, test their products in specially designed usability laboratories to try to prevent this. These facilities often include a main testing laboratory, with recording equipment and the product being tested, and an observation room where the evaluators sit and subsequently analyze the data[ CITATION Dum08 \l 1033 ]. 
The space may be arranged to superficially mimic features of the real world. For example, if the product is an office product or for use in a hotel reception area, the laboratory can be set up to match. But in other respects it is artificial. Soundproofing and lack of windows, telephones, fax machines, co-workers, etc. eliminate most of the normal sources of distraction. The observation room is usually separated from the main laboratory by a one-way mirror so that evaluators can watch testers but testers cannot see them.
Typically performance measures (time to complete specified actions, number of errors, etc.) are recorded from video and interaction logs. Since most user tests involve a small number of participants, only simple descriptive statistics can be used to present findings: maximum, minimum, average for the group and sometimes standard deviation, which is a measure of the spread around the mean value. These basic measures enable evaluators to compare performance on different prototypes or systems or across different tasks[ CITATION Lov05 \l 1033 ].
4.3.5 Modeling Users' Task Performance
In contrast to the other forms of evaluation we have discussed, predictive models provide various measures of user performance without actually testing users. This is especially useful in situations where it is difficult to do any user testing.
The most well-known predictive modeling technique in human-computer interaction is GOMS. This is a generic term used to refer to a family of models that vary in their granularity as to what aspects of a user's performance they model and make predictions about. These include the time it takes to perform tasks and the most effective strategies to use when performing tasks. The models have been used mainly to predict user performance when comparing different applications and devices. Below we describe two of the most well-known members of the GOMS family: the GOMS model and the keystroke level model.
The GOMS Model
Modeling user tasks and processes in general attained much attention from researchers, not only in the last decades. The pioneering work of Card, Moran and Newell introduced the GOMS (Goals, Operators, Methods, Selection rules) model in 1980.
The GOMS elements enable designers to model expert user behavior during a given task and therefore to analyze user complexity for interactive systems including cognitive information processing activities[ CITATION Car83 \l 1033 ].
The GOMS model was developed in the early eighties by Stu Card, Tom Moran and Alan Newell. It was an attempt to model the knowledge and cognitive processes involved when users interact with systems. The term GOMS stands for goals, operators, methods, and selection rules[ CITATION Car83 \l 1033 ].
Goals refer to a particular state the user wants to achieve (e.g., find a website on interaction design).
Operators refer to the cognitive processes and physical actions that need to be performed in order to attain those goals (e.g., decide on which search engine to use, think up and then enter keywords in search engine). 
Methods are learned procedures for accomplishing the goals. They consist of the exact sequence of steps required (e.g., drag mouse over entry field, type in keywords, press the "go" button). 
Selection rules are used to determine which method to select when there is more than one available for a given stage of a task. For example, once keywords have been entered into a search engine entry field, many search engines allow users to press the return key on the keyboard or click the "go" button using the mouse to progress the search. A selection rule would determine which of these two methods to use in the particular instance. 
In its beginning, GOMS was targeted mainly at text editing tasks on office desktop computers. For that purpose, the Keystroke-Level Model [ CITATION Kie03 \l 1033 ], a tailored instance of GOMS, was developed. A task can be described using operators that model unit tasks like key presses, pointing, hand switches between mouse and keyboard, mental acts, system response times and others. With a set of user studies the authors were able to give estimates for the duration of these actions and evaluate those estimates. 
Several projects (e.g., [ CITATION Bii00 \l 1033 ] and [ CITATION Hau93 \l 1033 ]) successfully used and validated those values in various application areas. Others slightly adjusted or added one or more operators for a specific application setting (e.g., [ CITATION Man98 \l 1033 ]).
Benefits of GOMS Model:
    1) It allows comparative analyses to be performed for different interfaces or computer systems relatively easily. 
Problems with GOMS Model [ CITATION Joh96 \l 1033 ]:
    1) It can only really model computer based tasks that involve a small set of highly routine data-entry type tasks.
    2) It is intended to be used only to predict expert performance, and does not allow for errors to be modeled.
    3) It much more difficult (and sometimes impossible) to predict how an average user will carry out their tasks when using a range of systems.
The Keystroke Level Model
Experience has shown that it is essential to assess designs and applications early in the development phase. The phone company NYNEX probably saved millions of dollars [ CITATION Placeholder10 \l 1033  ] because the Keystroke-Level Model (KLM) was used to find out that the interaction performance of a newly designed workstation would have been worse than the existing system. This was possible without having to actually build and test the new system at all.
Even though time to completion of a task is only one aspect of a promising  application, it is an important factor for a large set of applications ranging from small games to reservation systems, from sub tasks of larger systems to support and search systems. In addition to games and entertainment, mobile phones are increasingly used to enhance productivity and throughput in various fields like security or ticket sale [ CITATION Hol07 \l 1033 ].
The keystroke level model differs from the GOMS model in that it provides actual numerical predictions of user performance. Tasks can be compared in terms of the time it takes to perform them when using different strategies. The main benefit of making these kinds of quantitative predictions is that different features of systems and applications can be easily compared to see which might be the most effective for performing specific kinds of tasks.
When developing the keystroke level model,[ CITATION Car83 \l 1033 ] analyzed the findings of many empirical studies of actual user performance in order to derive a standard set of approximate times for the main kinds of operators used during a task. In so doing, they were able to come up with the average time it takes to carry out common physical actions (e.g., press a key, click on a mouse button) together with other aspects of user-computer interaction (e.g., the time it takes to decide what to do, the system response rate). Below are the core times they proposed for these.
Operator
Description
Time in seconds
K
Pressing a single key or button 
    • Skilled typist (55 wpm) 
    • Average typist (40 wpm) 
    • User unfamiliar with the keyboard 
    • Pressing shift or control key
0.35 (average)
0.22
0.28
0.2
0.08
P
Pointing with a mouse or other device to a target on a display
1.10

P1
Clicking the mouse or similar device
0.2
H
Homing hands on the keyboard or other device
0.4
D
Draw a line using a mouse
Variable depending on the length of the line.
M
Mentally prepare to do something (e.g., make a decision)
1.35
R(t)
System response time--counted only if it causes the user to wait when carrying out their task
t (variable due to the machine capabilities)
Table 2: The standard set of approximate times for the main kinds of operators used during a task by [ CITATION Car83 \l 1033 ]
The predicted time it takes to execute a given task is then calculated by describing the sequence of actions involved and then summing together the approximate times that each one will take:

Problems with Keystroke Level Model: (Quoted from [ CITATION Kie03 \l 1033 ])
    1) The time P, the time taken for pointing with a mouse or other device to a target on a display varies due to the size of the target to be reached.
    2) Some decisions are seen quite arbitrary or obvious, then to include M time is debated.
    3) Just like typing skills vary between individuals, so too do the mental preparation times people spend thinking about what to do. Mental preparation can vary from under 0.5 of a second to well over a minute.
Chapter 5: Our Framework to Support Usability Evaluation of Mobile Applications
5.1 Introduction
In chapter 4, we argued for the following points:
    • The nature of mobile applications which is purely different from the nature of desktop applications.
    • The current debate about what is more suitable for the usability evaluation of mobile applications: Laboratory testing or field studies.
    • The suggestions in some researches that it is time to introduce a hybrid approach that combines the advantages of both laboratory and field studies.
Therefore, we are in this chapter about to introduce our usability evaluation framework for mobile applications which will combine the advantages of both with a relatively low cost.
Studying the literature, we found that the log file analysis have been widely used as a usability evaluation method for the web applications. It was very successful in such evaluations as we will show in section 2 of this chapter.
Services and applications are growing in complexity, which makes them harder to study and evaluate. Event logging provides a promising solution to the problems encountered when evaluating complex applications [ CITATION Placeholder13 \l 1033  ]. 
In recent years, the mobile device has continuously been growing as a platform for applications. Earlier mobile applications were simple and used straightforward page layouts, now mobile applications have complicated user interfaces[ CITATION Att06 \l 1033 ].
There are two problems all web and mobile application designers’ face: understanding what tasks people are trying to accomplish on a mobile application and discovering what difficulties people meet in completing these tasks. Just knowing one or the other is insufficient. For example, an interaction designer could know that someone wants to find and purchase gifts, but this isn’t useful unless the designer also knows what problems hinder the user from completing the task. Similarly, the designer could know that this user left the application at the checkout process, but this isn’t meaningful unless the designer also knows that he truly intended to buy something and is not simply browsing [ CITATION Hon02 \l 1033 ].
There are a lot of methods for discovering what people want to do on a mobile application, such as structured interviews, ethnographic observations, and questionnaires. Instead, the target is to find techniques for solving the other problem, which is understanding what obstacles people are facing on a mobile application. Traditionally, this kind of information is gathered by running usability tests in a laboratory. A usability specialist brings in several participants to a usability lab and asks them to complete a few predefined tasks. The usability engineer observes what stumbling people come across and follows up with a survey and an interview to gain more insights into the issues [ CITATION Hon02 \l 1033  ].
The shortcoming to this traditional approach is that it is very time consuming to conduct usability tests with large number of participants because this takes an extensive effort to schedule participants, observe them, and analyze the results. Therefore, the data is expected to reflect only a few people and is qualitative. These small numbers also make it difficult to cover all of the possible tasks on a site. Additionally, small samples are less convincing when asking management to make expensive changes to a site. Finally, a small set of participants may not find the majority of usability problems [ CITATION Tec09 \l 1033 ]. 
In spite of claims that around five participants are enough to find the majority of usability problems, a recent study by Spool and Schroeder suggests that this number may be nowhere near enough. Better techniques and tools are needed to increase the number of participants and tasks that can be managed for a usability test [ CITATION Hon02 \l 1033  ].
5.2 What is Log File?
According to the Webopedia, encyclopedia of computing technology, a log file is defined as “a file that lists actions that have occurred”. These files are generated by servers – a computer or a device on a network that manages network resources and contain a list of all requests made to the server by the network’s users [ CITATION Web \l 1033 ]. A sample of a log file is shown in Figure 3.

Figure 3: a sample for a log file
In contrast to all traditional usability evaluation methods, log analysis is a method for quantitatively understanding what large numbers of users are trying to do on a website. Log files also have the benefit of letting the participants of the evaluation work distantly in their own sites instead of coming to a single laboratory. In addition, usability test participants can evaluate the targeted application from any location on their own time, using their own equipment and network connection[ CITATION Hon02 \l 1033  ].
5.3 Why Log file Analysis?
5.3.1 How to Record User’s Events While Using a Mobile Application?
Some research has been done on usability testing methods comparing user interaction with mobile devices in the laboratory or in the mobile context. Results show that it is appropriate to test mobile user interaction in the lab [ CITATION Jam06 \l 1033 ]. But, testing in the lab does not work for mobile applications like navigation applications or electronic tourist guides, as these applications do not work without contextual information. Approaches such as following the user with a cameraman or letting her/him wear a capture vest with a camera on her/his shoulder are supposed to influence natural behavior. Therefore, an unobtrusive way of testing in the field is needed [ CITATION Kaw08 \l 1033 ].
Evaluating usability for mobile phone applications is a quite difficult task. Typical mobile  usability testing collect information by means of attaching an external camera to capture a view of the mobile screens during the course of experiment to record the different behaviors of the users with system [ CITATION Duh06 \l 1033  ]. Using an external camera to view the mobile device’s screen is quite difficult because of restrictions like the small size of the mobile screen.
As Ridene et al (2009) suggested, an alternative to this is to use software that takes snapshots of mobile screen similar to the software available for desktop. However, because of limitations of mobile devices, it is quite difficult to find such software that can accurately and efficiently capture user interaction with the mobile applications being tested [ CITATION Rid09 \l 1033 ].
Logging of events on the other hand can be an accurate source of usage information. The challenge in event logging though is in the whole process of preparing the system for data collection and the extraction and interpretation of the vast amount of logged data [ CITATION Tak08 \l 1033 ].
We found that using log file analysis is very suitable for the usability evaluation of mobile applications. Because of many reasons: the most important is that for companies who which to conduct field studies evaluations or laboratory based evaluations and use observation method to record user behavior while using their mobile application, they will be hindered by the problem of biasing the user’s behavior when the test participants know that they are being observed by camera or an observer. Hence, using log file analysis will not affect the user’s behavior as it does not require an observer to be there with the test participants during performing the test tasks. It also does not require the existence of cameras that take photos for the facial expressions of the user.  
5.3.2 Success of Web Usability Evaluations Using Log File Analysis
Event logging was originated in websites. Browser logging analysis has made its way into logging and analyzing events [ CITATION Placeholder13 \l 1033  ].Based on Matera et al. (2008) research, Logging analysis has proven as an efficient and effective method to examine usability of websites. In addition, usability testing in the early development phase, such improvements can now also be based on the insights gained from real users by analyzing their usage behavior. Looking at interaction logs seems to be one of the more efficient ways to do this [ CITATION Mat08 \l 1033 ].
The review of the literature in the research of Haigh and Megarity (2004) has shown that the log file analysis was used a lot and successfully in the usability evaluation of websites. Logs can be used to collect a huge amount of quantitative information about web usability. If analyzed properly, log information provides a baseline of statistics that specify levels of usage and support comparisons between sites over time. Such analysis also provides some technical information regarding server load, unusual activity, or unsuccessful requests, and can assist in marketing and site development and management activities [ CITATION Hai04 \l 1033 ].
Drott (1998) described how the usage of logging can give the web designers a much complete vision of how users are accessing their site. Server logs can be used to monitor use patterns and analyze them to improve the design and functionality of the web site. Web log data has been used to analyze and redesign a wide range of web-based material, including: online tutorials, databases, fact sheets, and reference material[ CITATION Placeholder12 \l 1033  ].
Many researchers discussed the need to study how students use and react via educational websites. Therefore, they can establish correlation between student behavior and other factors such as academic performance and the quality of their learning experiences ([ CITATION Nac03 \l 1033 ]; [ CITATION Zai01 \l 1033 ]; [ CITATION Zai011 \l 1033 ]). However, the developers of these websites have very few support tools to help them evaluate the behavior of the learners within these environments. Analysis of log file data of student interactions to gain information about learning behavior has typically focused on frequencies of Web page visits and duration of visits [ CITATION Nac03 \l 1033 ]. Other researchers have focused on exploring navigation paths of website visits, represented by sequences of interactions, to provide deeper insights into student learning behavior. The data abstractions and analysis techniques used in these studies are useful for comparisons of behavior; however, they are limited in their capacity to provide detailed information about the activities of the student[ CITATION Tak08 \l 1033 ]. 
5.4 Our Suggested Evaluation Framework
The proposed evaluation framework simply contains five stages (as shown in Figure 4): 
    1. Decide whether to perform laboratory or field study evaluation or both.
    2. Design tasks for participants.
    3. Install MobLog and the applications to be evaluated.
    4. Train the test participants on the application under evaluation.
    5. Conduct the evaluation sessions.
    6. Analyze data.
The first stage is for the evaluators to decide whether to perform the evaluation in a laboratory based settings or in the field. There are several advantages of performing usability testing of mobile applications through controlled laboratory experiments [ CITATION Buy02 \l 1033 ]. First, a tester has full control over an experiment. He/she can define particular tasks and procedures that match the goal of a usability study, and ensure that participants follow experimental instructions. For example, if the objective of a study is to investigate the effectiveness of a data entry method while a user is moving around, then a laboratory experiment is more appropriate than a field study, because testers can explicitly require and ensure participants to use a mobile device while moving. Second, it is easy to measure usability attributes and interpret results through controlling other irrelevant variables in a laboratory environment[ CITATION Bug08 \l 1033 ]. As a result, the laboratory experiment approach is very helpful to usability studies that focus on comparing multiple interface designs or data input mechanisms for mobile devices. Third, it makes it possible to use video or audio recording to capture participants’ reaction (including emotions) when using an application.
A major limitation of the laboratory testing method is that it ignores mobile context and unreliable connection of wireless networks [ CITATION Placeholder6 \l 1033 ]. A mobile application tested in a real environment may not work as well as it does in a controlled laboratory setting due to the changing and unpredictable network conditions and other environmental factors. In a lab, participants may not experience the potential adverse effects of those contextual factors[ CITATION Bug08 \l 1033 ].
On the other hand, a major advantage of conducting usability tests through field studies is that it takes dynamic mobile context and unreliable wireless networks into consideration, which are difficult to simulate in laboratory experiments as what Kjeldskov et al. (2005) proved in their paper . The perceived usability of a mobile application is derived based on participants’ experience in a real environment, which is potentially more reliable and realistic compared to laboratory experiments[ CITATION Kje05 \l 1033 ] 
However, performing field studies for mobile applications is far from trivial. A major challenge of this methodology lies in the lack of sufficient control over participants in a study. There are three fundamental difficulties reported in the literature [ CITATION Bec03 \l 1033 ]. First, it can be complicated to establish realistic environments that capture the richness of the mobile context. Second, it is not easy to apply established evaluation techniques such as observation and verbal protocol when a test is conducted in a field[ CITATION Kje10 \l 1033  ]. Third, because users will physically move around in a dynamically changing environment, it is challenging for data collection and condition control. Therefore, in a field study, testers must define the scope of mobile contexts (e.g., physical body movement such as walking, standing, or sitting, and environment such as home/office, quiet/noisy, bright/dark) and use effective methods to collect data in the field [ CITATION Kje10 \l 1033  ].
How to Decide?
Longoria (2006) argued that laboratory testing is more suitable for standalone mobile applications – those without the need of dealing with network connectivity. While designing and conducting a laboratory experiment for a mobile application that involves data transfer through a wireless network, the testers should focus on evaluating components of mobile applications, such as interface layout, information presentation schemes, design of menu and link structures, and data entry methods, that are not significantly influenced by mobility, network connectivity, and other contextual factors [ CITATION Lon06 \l 1033 ].
Field studies, on the other hand, are more appropriate for usability testing when major concerns are application performance related issues that are highly dependent on the mobile context. For example, it has shown that mobile context has strong effect on the usability of Internet surfing via mobile devices [ CITATION Kim \l 1033  ]. In addition, field studies are appropriate for studying user behavior and attitude toward mobile applications [ CITATION Pal05 \l 1033 ]. For example, if a usability study attempts to examine the user perceived usefulness and efficiency of a mobile Web portal application, then a field study should be deployed in order to enable participants to provide feedback based on their experience with the system in a real-world setting.
Garzonis & O’Neill (2007) argued that the attempts to adapt usability evaluation methods for mobile devices and services are limited. One of the issues that have been addressed by the research community is the tradeoff between lab and field mobile evaluation. Although the studies carried out are limited and involve a small number of participants, they seem to agree on certain points. First, lab evaluations are more efficient in identifying cosmetic problems, which do not hinder interaction and user performance. Second, field evaluation is more likely to identify issues that are related to the real context of use, such as navigation and social comfort [ CITATION Gar07 \l 1033 ].
The unique thing about the usage of log file analysis in the usability evaluation of mobile applications is that it reduces the cost of field studies evaluation. Because performing the evaluation through log file analysis makes the evaluation remote. This means that the test participants remain in their normal setting and need not to be present in a laboratory (Brush, Ames & Davis, 2004). This has two advantages: first, the test participants’ behavior is not affected when they know that they are observed. Second, we assure that the user uses the mobile application in the natural setting (Interruptions, movement, noise, multitasking etc.). Hence the evaluators don’t need to travel to the field and utilize portable usability laboratories and buy many portable cameras to capture the user reactions while using the application.
The second stage is to design the evaluation tasks for the participating users. This includes four sub-stages:
                a) The evaluators have an interview or more with the stakeholders of the mobile application to find out all system functionalities.
                b) Decide which functionalities will be included in the evaluation.
                c) Design the evaluation tasks based on the chosen system functionalities. 
                d) Prepare papers containing clear description of each task and its expected completion time (determined by the evaluators) in order to distribute them among test participants during the evaluation sessions.
The third stage involves installing the MobLog application and the application(s) which we tend to evaluate on the mobile devices that will be used throughout the evaluation session(s). 
The fourth stage involves two sub-stages:
    a) Training the test participants on the usage of the mobile application under evaluation.
    b) Distributing the task description papers among the test participants.
The fifth stage involves conducting the actual evaluation sessions with the participating test users. 
The sixth and final stage involves combining the log files resulting from the test experiments of the users. And then analyze them via Microsoft Excel or any other analysis software.

Figure 4: Our suggested framework to support the usability evaluation of mobile applications

Chapter 6: Applying the Suggested Framework on Two Case Study Applications
6.1 Introduction
We have developed a mobile application to help the evaluators get the data needed for the evaluation. This application is called “MobLog”. It records all events occurred by the user during the usage of specific application “the application under evaluation”. These events include: the keystrokes, time in which each stroke occurs, and the in-between time between keystrokes. A detailed description of the MobLog is given in section 6.2.
We then conducted an experiment to use the MobLog during the application of the suggested framework. Twenty users were involved. Two mobile applications were used. We applied the suggested framework and combined the data of the users’ behaviors while using the applications into Excel sheets and analyzed them to present the results.
6.2 The Mobile Logger Application (MobLog)
We have developed the Mobile Logger application (MobLog) to record all events occurred by the user while using the target “to be evaluated” installed mobile application. The MobLog front page lists all installed applications on the mobile device as shown in Figure 5.

Figure 5: The front page of the MobLog; listing all applications installed to enable the user choose one to monitor.

Once the user selects which application to monitor, the MobLog opens this application for the user and starts working in background to record each key press, the time it was pressed, and the in-between time between presses as shown in Figure 6.

Figure 6: Snapshot of the mobile screen containing one of the result files after using “Cinta” application.
These data is reported in a file of extension “csv”, so that it can be further analyzed by the evaluators via Microsoft Excel or any other analysis software.
6.3 The Case Studies
An experiment was conducted in the university of Modern Sciences and Arts (MSA), to test the MobLog on real mobile users during the application of our suggested framework. Twenty users were involved. This number of test participants was first recommended by Jackob Nielsen in [ CITATION Nie01 \l 1033 ] to discover most of the usability problems. We ensured that they are all using same mobile communication company “Etisalat” so that they are all have same internet connection speed so that the data about task completion times is more accurate. We also ensured that all mobile devices are Symbian S60.
Two mobile applications were used (“eBuddy”, and “Cinta”). They are two chatting software having seven common functionalities which are: create account, show offline friends, send message to an offline friend, chat with online friend, set your status as busy, add new friend, and view a friend’s profile. 
First, we chose to conduct both laboratory and field evaluation, because the applications under evaluation are neither risky nor geographical as explained in section 6.4. We then designed the tasks of the evaluation based on the main common functionalities of the two systems (Cinta and eBuddy). They were seven tasks which are: create account, show offline friends, send message to an offline friend, chat with online friend, set your status as busy, add new friend, and view a friend’s profile. Each one of them represents a task for the test participants.
Clear description for each task was prepared and printed in order to be distributed among test participants during the evaluation sessions. Then we began to install the MobLog and the applications to be evaluated on the mobile devices. We then asked half of the participants to perform tasks in the laboratory (this is the laboratory testing) and the other half to perform the tasks away at any time (this is the field studies). For the field studies test participants, they were asked to send us their log files in a maximum period of one week since this training session.
A training session was conducted for each application. The training session for “eBuddy” took 27 minutes, and for “Cinta” took 36 minutes. Task descriptions were distributed among participants. First evaluation session was conducted to collect the “Task Completion Time at first use” data. The second evaluation session was conducted after 2 days to collect data about: “Task Completion Time (TCT)”, “Number of calls on help”, and “Number of errors made by user”. The third evaluation session was conducted after two weeks to collect the “Task Completion Time after two weeks” which is the measuring variable for the “Memorability” usability attribute. One participant was absent in the third session (user no. 16).
Figure 7Figure 7 shows a snapshot of the mobile screen for a user while performing task1 (create account) in “eBuddy” application.  

Figure 7: Snapshot of the mobile screen for a user while performing task no.1 in “eBuddy” application
Evaluation sessions were conducted and data were collected and summarized in Excel sheets. A sample is shown in Figure 8 for the summarization of task completion time for task 1 which is “create an account for you”. The data collected for each task were: task completion time at first use, task completion time (after 2 days), number of errors, number of calls on help, number of steps, and task completion time after two weeks. Those were the measuring variables of the usability attributes which were presented in section 2.5.

Figure 8: Summary of task completion times for “create an account for you” in “eBuddy”

6.4 The Results
The following is the summarization of the usage data of all users using the average values for all measuring variables.
App
Task
TCT at first use
TCT
No of helps
No of errors
TCT after 2 weeks
No of steps
Cinta
1
3.17
 
0.29
0.41
 
4
eBuddy
1
1.51
 
0.24
0.29
 
3
Cinta
2
1.04
0.51
0.50
0.25
0.58
3
eBuddy
2
1.14
0.53
0.50
0.16
0.59
4
Cinta
3
0.57
0.51
0.25
0.15
0.58
3
eBuddy
3
1.22
0.55
0.15
0.25
0.55
2
Cinta
4
0.36
0.39
0.20
0.10
0.38
3
eBuddy
4
0.28
0.27
0.20
0.05
0.25
1
Cinta
5
1.50
1.23
0.75
0.35
1.24
5
eBuddy
5
1.24
1.15
0.55
0.00
1.11
2
Cinta
6
1.52
1.59
0.15
0.35
2.11
4
eBuddy
6
1.45
1.48
0.20
0.40
1.41
4
Cinta
7
1.43
1.02
0.50
0.35
1.21
3
eBuddy
7
1.21
1.36
0.35
0.20
1.00
2
Table 3: summarization of the usage data of all users using the average values for all measuring variables
The tables from table 4 till table 17) are subsets of table 3. Each is further clarified by a following chart that reflects the values in the table.
Task
App
TCT at first use
1
eBuddy
1.51
1
Cinta
3.17
Table 4: the average task completion time at first use (TCT at first use) of task 1 for both “Cinta” and “eBuddy” applications

Figure 9: the average “TCT at first use” of task 1 for both “Cinta” and “eBuddy” applications
Figure 9 shows that the “TCT at first use” of eBuddy in task 1 is less than that of Cinta, which means that the learnability of eBuddy is higher than the learnability of Cinta. 
Task
App
TCT at first use
TCT
TCT after 2 weeks
2
eBuddy
1.14
0.53
0.59
2
Cinta
1.04
0.51
0.58
Table 5: The average values for TCT at first use, TCT, and TCT after two weeks of task 2 for both applications

Figure 10: The average values for TCT at first use, TCT, and TCT after two weeks of task 2 for both applications
Figure 10 shows that regarding task 2, the “TCT at first use” of eBuddy is greater than that of Cinta, which means that the learnability of Cinta is higher than the learnability of eBuddy. It shows also that the TCT of eBuddy is less than that of Cinta which means that the efficiency of eBuddy is greater than that of Cinta. It shows also that the TCT after 2 weeks of eBuddy is less than that of Cinta, which means that the memorability of eBuddy is higher than that of Cinta.
Task
App
TCT at first use
TCT
TCT after 2 weeks
3
eBuddy
1.22
0.55
0.55
3
Cinta
0.57
0.51
0.58
Table 6: The average values for TCT at first use, TCT, and TCT after two weeks of task 3 for both applications

Figure 11: The average values for TCT at first use, TCT, and TCT after two weeks of task 3 for both applications
Figure 11 shows that regarding task 3, the “TCT at first use” of eBuddy is greater than that of Cinta, which means that the learnability of Cinta is less than the learnability of eBuddy. It shows also that the TCT of eBuddy is greater than that of Cinta which means that the efficiency of Cinta is greater than that of eBuddy. It shows also that the TCT after 2 weeks of eBuddy is less than that of Cinta, which means that the memorability of eBuddy is higher than that of Cinta. 
Task
App
TCT at first use
TCT
TCT after 2 weeks
4
eBuddy
0.28
0.27
0.25
4
Cinta
0.36
0.39
0.38
Table 7: The average values for TCT at first use, TCT, and TCT after two weeks of task 4 for both applications

Figure 12: The average values for TCT at first use, TCT, and TCT after two weeks of task 4 for both applications
Figure 12 shows that regarding task 4, the “TCT at first use” of eBuddy is less than that of Cinta, which means that the learnability of eBuddy is higher than the learnability of Cinta. It shows also that the TCT of eBuddy is less than that of Cinta which means that the efficiency of eBuddy is greater than that of Cinta. It shows also that the TCT after 2 weeks of eBuddy is less than that of Cinta, which means that the memorability of eBuddy is higher than that of Cinta.
Task
App
TCT at first use
TCT
TCT after 2 weeks
5
eBuddy
1.24
1.15
1.11
5
Cinta
1.50
1.23
1.24
Table 8: The average values for TCT at first use, TCT, and TCT after two weeks of task 5 for both applications

Figure 13: The average values for TCT at first use, TCT, and TCT after two weeks of task 5 for both applications
Figure 13 shows that regarding task 5, the “TCT at first use” of eBuddy is greater than that of Cinta, which means that the learnability of Cinta is higher than the learnability of eBuddy. It shows also that the TCT of eBuddy is less than that of Cinta which means that the efficiency of eBuddy is greater than that of Cinta. It shows also that the TCT after 2 weeks of eBuddy is less than that of Cinta, which means that the memorability of eBuddy is higher than that of Cinta.
Task
App
TCT at first use
TCT
TCT after 2 weeks
6
eBuddy
1.45
1.48
1.41
6
Cinta
1.52
1.59
2.11
Table 9: The average values for TCT at first use, TCT, and TCT after two weeks of task 6 for both applications

Figure 14: The average values for TCT at first use, TCT, and TCT after two weeks of task 6 for both applications
Figure 14 shows that regarding task 6, the “TCT at first use” of eBuddy is greater than that of Cinta, which means that the learnability of Cinta is higher than the learnability of eBuddy. It shows also that the TCT of eBuddy is less than that of Cinta which means that the efficiency of eBuddy is greater than that of Cinta. It shows also that the TCT after 2 weeks of eBuddy is less than that of Cinta, which means that the memorability of eBuddy is higher than that of Cinta.
Task
App
TCT at first use
TCT
TCT after 2 weeks
7
eBuddy
1.21
1.36
1.00
7
Cinta
1.43
1.02
1.21
Table 10: The average values for TCT at first use, TCT, and TCT after two weeks of task 7 for both applications

Figure 15: The average values for TCT at first use, TCT, and TCT after two weeks of task 7 for both applications
Figure 15 shows that regarding task 7, the “TCT at first use” of eBuddy is greater than that of Cinta, which means that the learnability of Cinta is higher than the learnability of eBuddy. It shows also that the TCT of eBuddy is higher than that of Cinta which means that the efficiency of eBuddy is less than that of Cinta. It shows also that the TCT after 2 weeks of eBuddy is less than that of Cinta, which means that the memorability of eBuddy is higher than that of Cinta.
Task
App
No of helps
No of errors
1
eBuddy
0.24
0.29
1
Cinta
0.29
0.41
Table 11: The average number of calls on help and the average number of errors of task 1 for both applications

Figure 16: The average number of calls on help and the average number of errors of task 1 for both applications
Figure 16 shows that regarding task 1, the average number of calls on help for eBuddy is less than that of Cinta, which means that the learnability of eBuddy is higher than Cinta. It shows also that the average number of errors the test participants made while using eBuddy is less than Cinta.
Task
App
No of helps
No of errors
2
eBuddy
0.50
0.16
2
Cinta
0.50
0.25
Table 12: The average number of calls on help and the average number of errors of task 2 for both applications

Figure 17: The average number of calls on help and the average number of errors of task 2 for both applications
Figure 17 shows that regarding task 2, the average number of calls on help for eBuddy is equal to Cinta, which means that both applications has the same level of learnability. It shows also that the average number of errors the test participants made while using eBuddy is less than Cinta.
Task
App
No of helps
No of errors
3
eBuddy
0.15
0.25
3
Cinta
0.25
0.15
Table 13: The average number of calls on help and the average number of errors of task 3 for both applications

Figure 18: The average number of calls on help and the average number of errors of task 3 for both applications
Figure 18 shows that regarding task 3, the average number of calls on help for eBuddy is less than that of Cinta, which means that the learnability of eBuddy is higher than Cinta. It shows also that the average number of errors the test participants made while using Cinta is less than eBuddy.
Task
App
No of helps
No of errors
4
eBuddy
0.20
0.05
4
Cinta
0.20
0.10
Table 14: The number of calls on help and the average number of errors of task 4 for both applications

Figure 19: The average number of calls on help and the average number of errors of task 4 for both applications
Figure 19 shows that regarding task 4, the average number of calls on help for eBuddy is equal to Cinta, which means that both applications has the same level of learnability. It shows also that the average number of errors the test participants made while using eBuddy is less than Cinta.
Task
App
No of helps
No of errors
5
eBuddy
0.55
0.00
5
Cinta
0.75
0.35
Table 15: The average number of calls on help and the average number of errors of task 5 for both applications

Figure 20: The average number of calls on help and the average number of errors of task 5 for both applications
Figure 20 shows that regarding task 5, the average number of calls on help for eBuddy is less than that of Cinta, which means that the learnability of eBuddy is higher than Cinta. It shows also that the average number of errors the test participants made while using Cinta is less than eBuddy.
Task
App
No of helps
No of errors
6
eBuddy
0.20
0.40
6
Cinta
0.15
0.35
Table 16: The average number of calls on help and the average number of errors of task 6 for both applications

Figure 21: The average number of calls on help and the average number of errors of task 6 for both applications
Figure 21 shows that regarding task 6, the average number of calls on help for eBuddy is greater than that of Cinta, which means that the learnability of eBuddy is less than Cinta. It shows also that the average number of errors the test participants made while using Cinta is less than eBuddy.
Task
App
No of helps
No of errors
7
eBuddy
0.35
0.20
7
Cinta
0.50
0.35
Table 17: The average number of calls on help and the average number of errors of task 7 for both applications

Figure 22: The average number of calls on help and the average number of errors of task 7 for both applications
Figure 22 shows that regarding task 7, the average number of calls on help for eBuddy is less than that of Cinta, which means that the learnability of eBuddy is higher than Cinta. It shows also that the average number of errors the test participants made while using eBuddy is less than Cinta.
no of steps
 
eBuddy
Cinta
Task 1
3
4
Task 2
4
3
Task 3
2
3
Task 4
1
3
Task 5
2
5
Task 6
4
4
Task 7
2
3
Table 18: The number of steps needed to accomplish each task in both applications

Figure 23: The number of steps needed to accomplish each task in both applications
Figure 23 shows that the number of steps required to accomplish tasks in e Buddy is less than Cinta for all tasks except for task 6, which means that the effectiveness of eBuddy is higher than Cinta.
Training Time
eBuddy
Cinta
36
27
Table 19: The time taken to train the test participants on each application

Figure 24: The time taken to train the test participants on each application
Figure 24 shows that the training time for eBuddy is less than Cinta, which means that the learnability of eBuddy is higher than Cinta. Studying the conclusions derived from the above 21 graphs we can conclude that eBuddy is more usable than Cinta.
6.5 Results Validation
To validate our results, we used a validated standardized questionnaire. It is “The System Usability Scale (SUS)” which was released into this world by John Brooke in 1986. 
It was originally created as a "quick and dirty" scale for administering after usability tests on systems like VT100 Terminal ("Green-Screen") applications. SUS is technology independent and has since been tested on hardware, consumer software, websites, cell-phones, and even the yellow-pages. It has become an industry standard with references in over 600 publications [ CITATION Whe09 \l 1033 ].
The SUS is a 10 item questionnaire with 5 response options: 
    1) I think that I would like to use this system frequently.
    2) I found the system unnecessarily complex.
    3) I thought the system was easy to use.
    4) I think that I would need the support of a technical person to be able to use this system.
    5) I found the various functions in this system were well integrated.
    6) I thought there was too much inconsistency in this system.
    7) I would imagine that most people would learn to use this system very quickly.
    8) I found the system very cumbersome to use.
    9) I felt very confident using the system.
    10) I needed to learn a lot of things before I could get going with this system.
The SUS uses the following response format. our version of SUS questionnaire is given in the appendix.

Scoring SUS [ CITATION Ban09 \l 1033 ]
    • For odd items: subtract one from the user response.
    • For even-numbered items: subtract the user responses from 5
    • This scales all values from 0 to 4 (with four being the most positive response). 
    • Add up the converted responses for each user and multiply that total by 2.5. This converts the range of possible values from 0 to 100 instead of from 0 to 40. 
A SUS score above a 68 would be considered above average and anything below 68 is below average [ CITATION Ban09 \l 1033 ].
The same twenty test participants were involved in the validation process. They answered the two questionnaires. Their responses were scored as shown in Table 20: SUS scores of the test participants and analyzed as shown in Table 20.
 User
eBuddy
Cinta
1
65
70
2
80
60
3
85
72.5
4
70
62.5
5
60
70
6
72.5
67.5
7
95
75
8
67.5
70
9
75
72.5
10
60
70
11
95
62.5
12
67.5
65
13
77.5
60
14
65.5
75
15
85
67.5
16
80
65
17
100
75
18
92.5
70
19
65
72.5
20
82.5
62.5
21
85
65
22
90
72.5
Average score
78
68
Table 20: SUS scores of the test participants

Figure 25: Average SUS scores of the two applications
The analysis of the average SUS scores shows that both eBuddy and Cinta are above average usability but eBuddy is much usable than Cinta as it got higher average SUS score as shown in Figure 25. And this validates the results we got in our results.
Chapter 7: Evaluation and Future Work 
7.1 Evaluation of Our Work
7.1.1 Usability Testing 
Usability evaluation of mobile devices in the laboratory has several advantages: 
    1) The conditions for conducting the test can be controlled.
    2) All participants experience the same setting leading to higher quality data.
7.1.2 Field Studies
Field studies take place in natural surroundings where the application is designed to be used. The distinguishing feature of field studies is that they are done in natural settings with the aim of increasing understanding about what users do naturally and how technology impacts them. Field studies have the following advantages:
    1) Context factors are taken into consideration. 
    2) The evaluator has less control of the setting.
7.1.3 Our Framework
The first advantage of our framework is that it lowers the cost of the field study evaluation as the only two costs we have is the cost of developing the Mobile Logger application plus the time taken to train the participants. 
Our new evaluation framework combines the advantages of both usability testing and field studies, as by installing the Mobile Logger on the phone to evaluate any application, the evaluator can conduct the evaluation session in a laboratory or let the participants perform the tasks away in the natural usage settings, and then send the usage log file by email or any other medium to the evaluator to be further analyzed. 
Another advantage here is that in case of field study, test participants need not to be present with the evaluator at all. They can be sent the Mobile logger and the application to be evaluated by email. This should be done with the advent of installation instructions and clear task descriptions. This of course lowers the cost of performing field study which was suffering from high cost. 
One of the difficulties of the MobLog is tracking the number of steps the user navigated through in order to achieve the task goal, especially because the MobLog saves only user keystrokes. We thought we can handle this by expanding MobLog so that it can record the user’s transitional forms through performing the tasks.
7.2 Limitations of MobLog
The MobLog is developed for mobile devices with keys as input method. It can not record other interaction styles such as the touch screen or gestures. 
7.3 Future Work
We suggest the following:
    • Adding a new function to the MobLog that takes snapshots or photos for the user while using the application under evaluation so that these photos can be used to analyze the facial expressions of the user. 
    • Expanding the MobLog so that it can perform some analysis to the collected log data.
    • Finding some way to generalize the MobLog so that it can be used on multiple platforms.
    • Expanding the MobLog so that it can record the user navigation throughout performing the tasks.
References
Abran, A., Khelifi, A., Suryn, W. & Seffah, A., 2009. Consolidating the ISO Usability Models. In The 11th International Software Quality Management Conference. Montreal, Canada, 2009. CiteSeer.
Artim, J., 2004. Usability and User Interface. [Online] Available at: http://www.primaryview.org/methodsProcess/Methods/notebooks/Usability.pdf [Accessed 14 July 2007].
Atterer, R., Wnuk, M. & Schmidt, A., 2006. Knowing the User’s Every Move – User Activity Tracking for Website Usability Evaluation and Implicit Interaction. In WWW 2006. Edinburgh, UK, 2006. ACM Press.
Baber, C., 2008. Evaluating Mobile Human-Computer Interaction. In J. Lumsden, ed. Handbook of Research on User Interface Design and Evaluation for Mobile Technology. USA: Information Science Reference (an imprint of IGI Global). pp.731-48.
Baber, C., Haniff, D. & wolleey, S., 1999. Contrasting paradigms for the development of wearable computers. IBM Systems Journal, pp.551-65.
Baillie, L., 2003. Future Telecommunication: Exploring actual use. In IFIP TC13 International Conference on Human-Computer Interaction (INTERACT ´03)., 2003. IOS Press.
Bangor, A., Kortum, P. & Miller, J., 2009. Determining What individual SUS Scores Mean. Journal of Usability Studies, 4(3), pp.114-23.
Beck, T., Christiansen, K. & Kjeldskov, J., 2003. Experimental Evaluation of Techniques for Usability Testing of Mobile Systems in a Laboratory Setting. In Proceedings of OzCHI 2003. Brisbane, Australia, 2003.
Bernhaupt, R., Mihalic, K. & Obrist, M., 2008. Usability Evaluation Methods for Mobile Applications. In Lumsden, L. Handbook of Research on User Interface Design and Evaluation for Mobile Technology. New York: Information Science Reference. pp.745-58.
Bertini, E., 2008. Appropriating Heuristic Evaluation Methods for Mobile Computing. In Lumsden, J. Handbook of Research on User Interface Design and Evaluation for Mobile Technology. New York: Information Science Reference. pp.780-801.
Bertini, E. et al., 2006. Conclusive Report on Accessibility and Usability in Mobile Computing. Roma: Multichannel Adaptive Information Systems (MAIS).
Bertini, E., Gabrielli, S. & Kimani, S., 2006. Appropriating and Assessing Heuristics for Mobile Computing. In AVI’06. Venezia, Italy, 2006. ACM.
Bevan, N., 1997. Quality and Usability: A New Framework.
Bevan, N., 1999. Quality in use: Meeting user needs for quality. Journal of Systems and Software, pp.89-96.
Bevan, N., 2008. [Online] Available at: http://www.nigelbevan.com/papers/ [Accessed 5 September 2009].
Biilter, O., 2000. Keystroke Level Analysis of Email Message Organization. In CHI '2000. The Hague, Amsterdam, 2000. ACM Press.
Bly, S., 1997. Field work: is it product work? ACM Interactions Magazine, pp.25-30.
Bourguet, M., 2008. Handling uncertainty in multimodal pervasive computing applications. Computer Communications, 31(18), pp.4234-41.
Bourguet, M. & Chang, J., 2008. Design and usability evaluation of multimodal interaction with finite state machines: a conceptual framework. JOURNAL ON MULTIMODAL USER INTERFACES, 2(1), pp.53-60.
BugHuntress QA Lab, 2008. Mobile Usability Testing: Problems and Solutions. In Quality Assurance: Management & Technologies (QAMT Ukraine 2007)., 2008. BugHuntress QA Lab.
Buyukkokten, O., Garcia-Molina, H., Paepcke, A. & Winograd, T., 2002. Efficient Web Browsing on Handheld Devices Using Page and Form Summarization. Transaction on Information Systems, 20(1), pp.82-115.
Cambridge University Press, 2011. Paradigm. [Online] Available at: http://dictionary.cambridge.org/dictionary/british/paradigm?q=paradigm [Accessed 23 March 2011].
Cambridge University Press, 2011. Technique. [Online] Available at: http://dictionary.cambridge.org/dictionary/british/technique?q=technique [Accessed 22 March 2011].
Card, S., Moran, T. & Newell, A., 1983. The Psychology of Human-Computer Interaction. United States of America: Lawrence Erlbaum Associates,Inc.
Catarci, T. et al., 2008. Appropriating Heuristic Evaluation Methods for Mobile Computing. In Lumsden, J. Handbook of Research on User Interface Design and Evaluation for Mobile Technology. New York: Information Science Refernece. pp.731-44.
Chisnell, D., 2009. Usability Testing. [Online] Available at: http://usabilitytestinghowto.blogspot.com/2009/02/consensus-on-observations-in-real-time.html [Accessed 15 December 2009].
Chittaro, L. & Cin, P., 2002. Evaluating Interface Design Choices on WAP Phones: Navigation and Selection. Personal and Ubiquitous Computing, pp.237-44.
Christie, J., Klein, R. & Watters, C., 2003. A comparison of simple hierarchy and grid metaphors for option layouts on small-size screens. International Journal of Human-Computer Studies, pp.564-84.
Crease, M. & Longworth, R., 2008. Mobile Evaluations in a Lab Environment. In Lumsden, J. Handbook of Research on User Interface Design and Evaluation for Mobile Technology. United States of America: Information Science Reference. pp.910-26.
Crease, M., Lumsden, J. & Longwort, B., 2007. A Technique for Incorporating Dynamic Paths in Lab Based Mobile Evaluations. In CHI 2007. San Jose, USA, 2007. ACM.
Dey, K., Salber, D. & Abowd, D., 2001. A Conceptual Framework and a Toolkit for Supporting the Rapid Prototyping of Context-Aware Applications. Human Computer Interaction, 16, pp.97-166.
Dillon, A., 2001. Beyond usability: process, outcome and affect in human-computer interactions. Canadian Journal of Information Science, pp.57-69.
Drott, M., 1998. Using Web Server Logs to Improve Site Design. In Proceedings of the 16th annual international conference on Computer documentation (SIGDOC '98)., 1998. ACM.
Duh, H., Tan, G. & Chen, V., 2006. Usability Evaluation for Mobile Device: A Comparison of Laboratory and Field Tests. In Proceedings of the 8th conference on Human-computer interaction with mobile devices and services (MobileHCI’06). Finland, 2006. ACM.
Dumala, R., 2006. Usability Testing. [Online] Available at: http://web4.uwindsor.ca/units/its/insight/insight.nsf/babe0ebac149fc7b852567d700715d2a/c806cb0081fdc71c852571800051497f!OpenDocument [Accessed 23 August 2009].
Dumas, J. & Loring, B., 2008. Moderating Usability Tests. Burlington: Elsevier Inc.
Ebling, M. & John, B., 2000. On the contributions of different empirical data in usability testing. In Proceedings of the 3rd conference on Designing interactive systems: processes, practices, methods, and techniques (DIS'00). Brooklyn, New York, 2000. ACM.
Fernandez, F., Forrai, J. & Hussmann, H., 2009. Evaluation of User Interface Design and Input Methods for Applications on Mobile Touch Screen Devices. In Gross, T., Gulliksen, J. & Kotzé, P., eds. 12th IFIP International Conference Uppsala. Germany, 2009. IFIP International Federation for Information Processing.
Fling, B., 2009. Mobile Design and Development. 1st ed. United States of America: O’Reilly Media, Inc.
Folmer, E., Gurp, J. & Bosch, J., 2001. Framework for capturing the Relationship between Usability and Software Architecture. In WWW10. Hong Kong, 2001. ACM.
Foraker Labs, 2009. Pluralistic Walkthrough. [Online] Available at: http://www.usabilityfirst.com/glossary/pluralistic-walkthrough/ [Accessed 25 October 2010].
Free Books Online, [ca.2008]. Evaluation paradigms and techniques. [Online] Available at: http://free-books-online.org/computers/human-computer-interaction/evaluation-paradigms-and-techniques/ [Accessed 15 January 2010].
Free Books Online, 2008. Evalaution Paradigms and Techniques. [Online] Available at: http://free-books-online.org/computers/human-computer-interaction/evaluation-paradigms-and-techniques/ [Accessed 2010].
Frøkjær, E., Hertzum, M. & Hornbæk, K., 2000. Measuring usability: are effectiveness, efficiency and satisfaction really correlated? In Proceedings of the ACM CHI 2000 Conference on Human Factors in Computing Systems. New York, 2000. ACM Press.
Garzonis, S., 2007. Usability Evaluation of context-aware mobile systems: A review. [Online] Available at: http://www.citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.104.8293. [Accessed 1 August 2008].
Garzonis, S. & O’Neill, F., 2007. Factors Contributing to Low Usage of Mobile Data Services: User Requirements, Service Discovery and Usability. In N. Bryan-Kinns, A. Blanford, P. Curzon & L. Nigay, eds. People and Computers XX — Engage. 5th ed. London: Springer-Verlag London Limited. pp.221-36.
Gong, J. & Tarasewich, P., 2005. Alphabetically Constrained Keypad Designs for Text Entry on Mobile Devices. In CHI 2005. Oregon, USA, 2005. ACM.
Good, M., Spine, T., Whiteside, J. & George, P., 1986. User-derived impact analysis as a tool for usability engineering. In CHI '86 Proceedings of the SIGCHI conference on Human factors in computing systems., 1986. ACM Press.
Gould, J. & Lewis, C., 1985. Designing for usability: Key principles and what designers think. Communications of the ACM, pp.300-11.
Grami, A. & Schell, B., 2009. Future Trends in Mobile Commerce: Service Offerings, Technological Advances and Security Challenges. [Online] Available at: http://dev.hil.unb.ca/Texts/PST/pdf/grami.pdf [Accessed 17 April 2010].
Gray, W., John, P. & Atwood, M., 1992. The Precis of Project Ernestine or an Overview of a Validation of GOMS. In Proceedings of the SIGCHI conference on Human factors in computing systems (CHI’92)., 1992. ACM Press.
Grifoni, P., 2009. Multimodal Human Computer Interaction and Pervasive Services. 1st ed. IGI Global snippet.
Gulliver, R., Serif, T. & Ghinea, G., 2004. Pervasive and Standalone Computing: the Perceptual Effects of Variable Multimedia Quality. International Journal of Human-Computer Studies, pp.640-65.
Hagen, P., Robertson, T., Kan, M. & Sadler, K., 2005. Emerging Research Methods for Understanding Mobile Technology Use. In OZCHI '05 Proceedings of the 17th Australia conference on Computer-Human Interaction. Australia, 2005. ACM.
Haigh, S. & Megarity, J., 2004. Web Usability. [Online] Available at: http://www.collectionscanada.ca/9/1/p1-256-e.html [Accessed 12 May 2009].
Ham, H. et al., 2006. Conceptual Framework and Models for Identifying and Organizing Usability Impact Factors of Mobile Phones. In OZCHI 2006. Sydney, Australia, 2006. ACM.
Ham, D. et al., 2006. Conceptual framework and models for identifying and organizing usability impact factors of mobile phones. In Proceedings of the 18th Australia conference on Computer-Human Interaction: Design: Activities, Artefacts and Environments (OZCHI '06). Sydney, Australia, 2006. ACM.
Hardy, R. & Rukzio, E., 2008. Touch & Interact: Touch-based Interaction of Mobile Phones with Displays. In MobileHCI 2008. Amsterdam, the Netherlands, 2008. ACM.
Haunold, P. & Kuhn, W., 1993. A Keystroke Level Analysis of ManualMap Digitizing. In Lecture Notes in Computer Science. Heidelberg-Berlin: Springer Verlag. pp.337-43.
Hertzum, M., 1999. User Testing in Industry: A Case Study of Laboratory, Workshop, and Field Tests. In the 5th ERCIM Workshop. New York, 1999. CiteseerX.
Hocko, J.M., 2003. Categorizing the “Badness” of Usability. [Online] Available at: http://www.jenhocko.com/bentley/hf750/assignments/paper_1.pdf [Accessed 9 July 2008].
Hoffer, J., George, J. & Valacich, J., 2008. Modern Systems Analysis and Design. USA: Pearson Prentice Hall.
Hoggan, E., Brewster, A. & Johnston, J., 2008. Investigating the effectiveness of tactile feedback for mobile touchscreens. In Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems. Florence, 2008. ACM.
Holleis, P., Otto, F., Hußmann, H. & Schmidt, A., 2007. Keystroke Level Model for Advanced Mobile Phone Interaction. In Models of Mobile Interaction (CHI 2007). San Jose, CA, USA, 2007. ACM Press.
Hong, J. & Landay, J., 2001. WebQuilt: A Framework for Capturing and Visualizing the Web Experience. In Proceedings of the 10th international conference on World Wide Web (WWW'10). Hong Kong, 2001. ACM.
Hughes, J., King, V., Rodden, T. & Andersen, H., 1994. Moving out of the control room: ethnography in system design. In Proceedings of the 1994 ACM conference on Computer supported cooperative work (CSCW'94). Chapel Hill, NC, USA, 1994. ACM Press.
Imperial College London, 2011. The Evaluation of Interactive Systems. [Online] Available at: http://www.doc.ic.ac.uk/~frk/frank/da/hci/Evaluation%20handout.PDF [Accessed 17 Febreuary 2011].
Ingale, J., ca.2007. Beyond Usability-The Testing Revolution. [Online] Avani Publications Available at: http://www.beyondtesting.co.in/images/stories/PDF/Issue%203-testing.pdf [Accessed 2009 June 2008].
Isomursu, M., Kuutti, K. & Vainamo, S., 2004. Experience clip: Method for user participation and evaluation of mobile concepts. In The Participatory Design Conference., 2004. ACM.
Ivory, M. & Hearst, M., 2001. The State of the Art in Automating Usability Evaluation of User Interfaces. ACM Computing Surveys, p.470–516.
Jambon, F., 2006. Reality Testing of Mobile Devices: How to Ensure Analysis Validity? In CHI2006. Montreal, Canada, 2006. ACM.
John, B. & Kieras, D., 1996. Using GOMS for User Interface Design and Ealuation: Which technique? [Online] Available at: http://reference.kfupm.edu.sa/content/u/s/using_goms_for_user_interface_design_and_336423.pdf [Accessed 3 Febreuary 2009].
Johnson, P., 1998. Usability and mobility: Interaction on the move. In Proceedings of the first Workshop on Human-Computer Interaction with Mobile Device (Interact'99). Edinburgh, 1998.
Jones, M. & Marsden, G., 2006. Mobile Interaction Design. England: John Wiley & Sons Ltd.
Jones, M., Marsden, G., Nasir, N. & Boone, K., 1999. Improving Web Interaction on Small Displays. In Proceedings of the Eighth International Conference on World Wide Web. Canada, 1999.
Kaikkonen, A., Kallio, T., Kekäläinen, A. & Cankar, M., 2005. Usability testing of mobile applications: A comparison between laboratory and field testing. Journal of Usability Studies, 1(1), pp.4-16.
Kaikkonen, A. et al., 2008. Will Laboratory Test Results be Valid in Mobile Contexts? In Lumsden, J. Handbook of Research on User Interface Design and Evaluation for Mobile Technology. United States of America: Information Science Reference. pp.897-909.
Kawalek, J., Stark, A. & Riebeck, M., 2008. A New Approach to Analyze Human-Mobile Computer Interaction. Journal of Usability studies, pp.90-98.
Keinonen, T., 2007. Usability of Interactive Products. [Online] Available at: http://www2.uiah.fi/projekti/metodi/158.htm [Accessed 26 March 2009].
Ketola, P. & Roykkee, M., 2001. The Three Facets of Usability In Mobile Handsets (2001). [Online] Available at: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.3.4302 [Accessed 8 March 2008].
Kieras, D., 2003. Keystroke-Level Model. [Online] Available at: http://www.pitt.edu/~cmlewis/KSM.pdf [Accessed 11 Oct. 2007].
Kim, L. & Albers, M., 2001. Web design issues when searching for information in a small screen display. In SIGDOC’01. New Mexico, 2001. ACM.
Kim, H. et al., 2006. An Empirical Study of the Use Contexts and Usability Problems in Mobile Internet. In Proceedings of the 35th Hawaii International Conference on System Sciences. Hawaii, 2006.
Kjeldskov, J. et al., 2005. Evaluating the usability of a mobile guide: The influence of location, participants and resources. In Behaviour and Information Technology., 2005.
Kjeldskov, J. & Skov, M., 2003. Creating Realistic Laboratory Settings: Comparative Studies of Three Think-Aloud Usability Evaluations of a Mobile System. In Interact'03., 2003. ACM.
Kjeldskov, J., Skov, M., Als, B. & Høegh, R., 2004. Is It Worth the Hassle? Exploring the Added Value of Evaluating the Usability of Context-Aware Mobile Systems in the Field. In Mobile Human-Computer Interaction (MobileHCI 2004). Heidelberg, 2004. Springer-Verlag.
Kjeldskov, J., Skov, M. & Stage, J., 2004. Instant Data Analysis: Conducting Usability Evaluations in a Day. In Proceedings of the third Nordic conference on Human-computer interaction (NordiCHI '04). Tampere, Finland, 2004. ACM.
Kjeldskov, J. & Stage, J., 2007. New Techniques for Usability Evaluation of Mobile Systems. International Journal of Human-Computer Studies, 60(4), pp.599-620.
Kock, E., Biljon, J. & Pretorius, M., 2009. Usability evaluation methods: Mind the gaps. In Proceedings of the 2009 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists (SAICSIT '09). Vanderbijlpark, South Africa, 2009. ACM.
Kort, J. & Poot, H., 2005. Usage Analysis: Combining Logging and Qualitative Methods. In Proceedings of the 5th international conference on Mobile systems, applications and services (CHI 2005). USA, 2005. ACM.
Kostaras, N., Stavrinoudis, D., Sokoli, S. & Xenos, M., 2010. Combining Experimental and Inquiry Methods in software usability evaluation: the paradigm of LvS Educational Software. Journal of Systems and Information Technology, 12(2), p.120‐139.
Koutsabasis, P., Spyrou, T. & Darzentas, J., 2007. Evaluating Usability Evaluation Methods: Criteria, Method and a Case Study. In Jacko, J., ed. 12th International Conference, HCI International 2007. Berlin, 2007. Springer-Verlag.
Lee, Y. et al., 2006. Systematic evaluation methodology for cell phone user interfaces. Interacting with Computers, pp.304-25.
Lemlouma, T. & Layaïda, N., 2005. Content Interaction and Formatting for Mobile Devices. In DocEng’05. Bristol, United Kingdom, 2005. acm.
Lindholm, C., Keinonen, T. & Kiljander, H., 2003. Mobile Usability: How Nokia Changed the Face of the Mobile Phone. United States of America: The McGraw-Hill Companies, Inc.
Logistica Solutions Inc., 2007. Online Questionnaire. [Online] Available at: http://www.interpriseo.com/resources/general_info_articles/Online%20Questionaire.pdf [Accessed 14 May 2009].
Longoria, R., 2006. Designing Mobile Applications: Challenges, Methodologies, and Lessons Learned. In Usability Evaluation and Interface Design: Cognitive Engineering, Intelligent Agents and Virtual Reality. New Jersey: Lawrence Erlbaum Associates Inc. pp.91-95.
Love, S., 2005. Understanding Mobile Human-Computer Interaction. Great Britain: Elsevier Ltd.
Mackenzie, S. & Zhang, S., 1999. The Design and Evaluation of a High-Performance Soft Keyboard. In CHI'99. USA, 1999. ACM.
Mackenzie, S., Zhang, S. & Soukoreff, W., 1999. Text entry using soft keyboards. Behaviour and information Technology, pp.235-44.
Manes, D., Green, P. & Hunter, D., 1998. Prediction of Destination Entry and Retrieval Times Using Keystroke-Level Models. Beverly Hills, USA: Road Commission of Oakland County (RCOC).
Matera, M., Rizzo, F. & Carughi, G., 2008. Web Usability: Principles and Evaluation. [Online] Available at: http://www.webml.org/webml/upload/ent5/1/WebUsability-MateraEtAl.pdf [Accessed 11 July 2009].
Mayhew, D., 1999. The Usability Engineering Lifecycle. United States of America: Academic Press.
mhjerde, 2010. Mobile screen size trends. [Online] Available at: http://www.mbricks.no/blog/?p=38 [Accessed 4 August 2010].
MIT Carrer Development Center, 2010. Telephone Interviews. [Online] Available at: http://www.mit.edu/~career/guide/telephone.html [Accessed 14 January 2011].
Mohamed, A., 2010. e-nor Blog : Guidelines to Mobile Usability. [Online] Available at: http://www.e-nor.com/blog/index.php/2010/07/ [Accessed 2 March 2011].
Molich, R. & Nielsen, J., 1990. Improving a human-computer dialogue. Communications of the ACM, pp.338-48.
Nachmias, R. & Segev, L., 2003. Students’ use of content in Web-supported academic courses. Internet and Higher Education , pp.145-57.
Nielsen, J., 1993. Usability Engineering. Oxford: UK: Academic Press.
Nielsen, J., 1995. Usability Inspection Methods. In Conference companion on Human factors in computing systems (CHI'95). Colorado, USA, 1995. ACM.
Nielsen, J., 2001. Usability Metrics. [Online] Available at: www.useit.com [Accessed 3 November 2008].
Nielsen, J. & Molich, R., 1990. Heuristic evaluation of user interfaces. In Proceedings of the SIGCHI conference on Human factors in computing systems (CHI '90)., 1990. ACM Press.
Nielsen, C. et al., 2006. It’s Worth the Hassle! The Added Value of Evaluating the Usability of Mobile Systems in the Field. In NordiCHI 2008. Oslo, Norway, 2006. ACM.
Nowick, E., 2001. [Online] Available at: www.uidaho.edu/~mbolin/lppv4n1.htm [Accessed 27 November 2009].
Oehl, M., Sutter, C. & Ziefle, M., 2007. Considerations on efficient touch interfaces – how display size influences the performance in an applied pointing task. In G. Salvendy & J. Smith, eds. HCII 2007. Heidelberg: Springer. p.136–143.
Oppenheim, A., 1992. Questionnaire Design, Interviewing and Attitude measurement. New York: Biddles Ltd.
Oxford University Press, 2011. Paradigm. [Online] Available at: http://oxforddictionaries.com/definition/paradigm [Accessed 22 March 2011].
Oxford University Press, 2011. Technique. [Online] Available at: http://dictionary.reference.com/browse/technique [Accessed 23 March 2011].
Oztoprak, A. & Erbug, C., 2006. Field versus Laboratory Usability Testing: a First Comparison. [Online] Available at: http://www.aydinoztoprak.com/images/HFES_Oztoprak_.pdf [Accessed 2008 July 2008].
Palen, L. & Salzman, M., 2005. Voice-Mail Diary Studies for Naturalistic Data Capture under Mobile Conditions. In CSCW’05. Louisiana, USA, 2005. ACM.
Parush, A. & Gavish, N., 2004. Web navigation structures in cellular phones: the depth/breadth trade-off issue. International Journal of Human-Computer Studies, pp.753-70.
Pascoe, J., Ryan, N. & Morse, D., 2000. Using while moving: Human- computer interaction issues in fieldwork environments. In Annual Conference Meeting Transactions on Computer-Human Interaction., 2000. ACM Press.
Pirhonen, A., Brewster, S. & Holguin, C., 2002. Gestural an Audio Metaphors as a Means of Control for Mobile Devices. In HCI'2002., 2002. ACM.
Preece, J., Rogers, Y. & Sharp, H., 2002. Interaction Design : beyond Human Computer Interaction. United Stated of America: John Wiley & Sons, Inc.
Preez, A. & Dyk, V., 2008. Computer Hardware and Software. Cape Ttown: Pearson Education South Africa (Pty) Ltd.
Rangel, J. & Ferreira, D., 2009. A Multidimensional Approach for the Evaluation of Mobile Application User Interfaces. In J. Jacko, ed. Human-Computer Interaction: New Trends. Berlin Heidelberg: Springer-Verlag. pp.242-51.
Rapidsoft Systems, Inc., 2010. Mobile Porting Services. [Online] Available at: http://www.rapidsoftsystems.com/pdocs/mobile-porting-services.pdf [Accessed 7 October 2010].
Ridene, Y., Belloir, N., Barbier, F. & Couture, N., 2009. A DSML for Mobile Phone Applications Testing. [Online] Available at: http://www.dsmforum.org/events/DSM10/Papers/Ridene.pdf [Accessed 11 June 2010].
Rubin, J. & Chisnell, D., 2008. Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests. Second Edition ed. Indianapolis, Indiana: Wiley Publishing, Inc.
Scholtz, J., ca.2005. Usability Evaluation. [Online] National Institute of Standards and Technology Available at: http://www.itl.nist.gov/iad/IADpapers/2004/Usability%20Evaluation_rev1.pdf [Accessed 18 August 2009].
Seffah, A. & Metzker, E., 2009. Adoption-Centric Usability Engineering. London: Springer-Verlag London Limited.
Spiliopoulou, M., 2000. Web Usage Statistics and Web Site Evaluation. Communications of the ACM, pp.128-34.
Steenstra, H., 2003. Interaction and the Standardized Survey Interview. USA: Cambridge University Press.
Suleman, H., Fox, E. & Abrams, M., 2000. Building Quality into a Digital Library. In ACM International Conference on Digital Libraries. San Antonio, 2000. ACM Press.
Taksa, I., Spink, A. & Jansen, B., 2008. Web Log Analysis:Diversity Of Research Methodologies. In B. Jansen, A. Spink & I. Taksa, eds. Handbook of Research on Web Log Analysis. 1st ed. USA: IGI Global snippet. pp.504-20.
Tanaka, E., Bim, S. & Rocha, H., 2005. Comparing accessibility evaluation and usability evaluation in HagáQuê. In CLIHC '05 Proceedings of the 2005 Latin American conference on Human-computer interaction. Mexico, 2005. ACM Press.
Tec-Ed, Inc., 1999. Website usability published work. [Online] Available at: http://www.teced.com/PDFs/whitepap.pdf [Accessed 6 March 2008].
TechSmith, 2009. Usability Testing Basics. [Online] Available at: http://download.techsmith.com/morae/docs/customer-research/Usability-Testing-Basics.pdf [Accessed 19 December 2009].
The Economist, 2010. Living on a platform. The Economist.
The Economist, 2011. Blazing platforms. [Online] The Economist Available at: http://www.economist.com/node/18114689?story_id=18114689 [Accessed 25 March 2011].
Treu, S., 1994. User interface evaluation: a structured approach. New York: Plenum Press.
Tsiatsos, T., Andreas, K. & Pomportsis, A., 2010. Evaluation Framework for Collaborative Educational Virtual Environments. Educational Technology & Society, p.65–77.
University of Saskatchewan, 2010. Evaluation with Users. [Online] Available at: http://www.cs.usask.ca/faculty/gutwin/481/lecture-notes/13-evaluation-with-users.pdf [Accessed 22 February 2011].
Usability Consulting Services, 2002. User Centered Design Methods. [Online] Available at: http://www.indiana.edu/~usable/presentations/ucd_methods.pdf [Accessed 2010 April 2010].
Usability.gov, 2010. Usability Basics. [Online] Available at: www.usability.gov/basics/ [Accessed May 19 2010].
Usability.gov, ca.2006. Guidelines. [Online] U.S. Department of Health & Human Services Available at: http://www.usability.gov/pdfs/chapter18.pdf [Accessed 23 October 2007].
UsabilityFirst, 2010. Focus Groups. [Online] Available at: http://www.usabilityfirst.com/usability-methods/focus-groups/ [Accessed 25 September 2010].
UsabilityNet, 2010. User observation/field studies. [Online] Available at: http://www.usabilitynet.org/tools/userobservation.htm [Accessed 13 August 2010].
Viswanathan, P., 2010. Formatting Issues for Mobile Applications. [Online] Available at: http://mobiledevices.about.com/od/mobileappbasics/a/Formatting-Issue-For-Mobile-Applications.htm [Accessed 1 April 2011].
Webopedia, [ca.2009]. Log File. [Online] Available at: http://www.webopedia.com/TERM/L/log_file.html [Accessed 23 December 2009].
Weiss, S., 2002. Handheld Usability. England: John Wiley & Sons Ltd.
Wheatley, D. & Basapur, S., 2009. A Comparative Evaluation of TV Video Telephony with Webcam and Face to Face Communication. In Proceedings of the seventh european conference on European interactive television conference EuroITV '09. Leuven, Belgium, 2009. ACM.
Wikimedia Foundation, Inc., 2011. WIKIPEDIA. [Online] Available at: http://en.wikipedia.org/wiki/Usability_evaluation [Accessed 2 March 2011].
Wikipedia, 2009. Online questionnaires. [Online] Available at: http://en.wikipedia.org/wiki/Online_questionnaires [Accessed 17 August 2009].
Wikipedia, 2010. Likert Scales. [Online] Available at: http://en.wikipedia.org/wiki/Likert_Scales [Accessed 14 March 2011].
Wixon, D. & Wilson, C., 1997. The Usability Engineering Framework for Product Design. In Helander, M., Landauer, T. & Prabhu, P. Handbook of Human-Computer Interaction. Amsterdam: Elsevier. pp.653-69.
Zaiane, O., 2001. Web Usage Mining for a BetterWeb-Based Learning Environment. In Proceedings of the Advanced Technology for Education conference. Canada, 2001.
Zaiane, O. & Luo, J., 2001. Towards Evaluating Learners’ Behaviour in a Web-Based Distance Learning Environment. In Proceedings of the Advanced Learning Technologies. USA, 2001.
Zhang, Z., 2003. Overview of Usability Evaluation Methods. [Online] Available at: http://www.usabilityhome.com [Accessed 9 December 2008].


Jameson, A. (2006),  Analysis of Region-Of-Interest Compression Over Limited-Bandwidth systems For Small-Screen Devices, Thesis, (Master),  Napier University.
Kiljander, H. (2004). Evolution and usability of mobile phone interaction styles. Thesis, (PhD). Helsinki University of Technology.
Ozer, P. (2008). Actual Avatar Behavior Tracking in Second Life. Thesis, (Master). Radboud University.