Instrumental Interaction:  An Interaction Model for 
Designing Post-WIMP User Interfaces

Michel Beaudouin-Lafon

Dept of Computer Science
University of Aarhus

Aabogade 34
DK-8200 Aarhus N - Denmark
mbl@daimi.au.dk

ABSTRACT
This article introduces a new interaction model called 
Instrumental Interaction that extends and generalizes the 
principles of direct manipulation. It covers existing 
interaction styles, including traditional WIMP interfaces, as 
well as new interaction styles such as two-handed input and 
augmented reality. It defines a design space for new 
interaction techniques and a set of properties for comparing 
them. Instrumental Interaction describes graphical user 
interfaces in terms of domain objects and interaction 
instruments. Interaction between users and domain objects 
is mediated by interaction instruments, similar to the tools 
and instruments we use in the real world to interact with 
physical objects. The article presents the model, applies it 
to describe and compare a number of interaction techniques, 
and shows how it was used to create a new interface for 
searching and replacing text.

Keywords
Interaction model, WIMP interfaces, direct manipulation, 
post-WIMP interfaces, instrumental interaction

INTRODUCTION

In the early eighties, the Xerox Star user interface [27] and 
the principles of direct manipulation [26] led to a powerful 
graphical user interface model, referred to as WIMP  
(Windows, Icons, Menus and Pointing). WIMP interfaces 
revolutionized computing, making computers accessible to 
a broad audience for a variety of applications.

In the last decade, HCI researchers have introduced 
numerous new interaction techniques, such as toolglasses 
[5] and zoomable user interfaces [3]. Although some have 
been shown to be more efficient than traditional techniques, 
e.g., marking menus [19], few have been incorporated into 
commercial systems. A likely reason is that integrating 
new interaction techniques into an interface is challenging 
for both designers and developers. Designers find it faster 
and easier to stick with a small set of well-understood 
techniques. Similarly, developers find it more efficient to 
take advantage of the extensive support for WIMP 
interaction provided by current development tools.

The leap from WIMP to newer "post-WIMP" graphical 
interfaces, which take advantage of novel interaction 
techniques, requires both new interaction models and 
corresponding tools to facilitate development. This paper 
focuses on the first issue by introducing a new interaction 
model, called Instrumental Interaction, that extends and 
generalizes the principles of direct manipulation to also 
encompass a wide range of graphical interaction techniques. 
The Instrumental Interaction model has the following goals:
€ cover the state-of-the-art in graphical interaction 
techniques;
€ provide qualitative and quantitative ways to compare 
interaction techniques, to give designers the basis for 
an informed choice when selecting a given technique 
to address a particular interface problem;
€ define a design space in which unexplored areas can be 
identified and lead to new interaction techniques; and
€ open the way to a new generation of user interface 
development tools that make it easy to integrate the 
latest interaction techniques in interactive applications.

After a review of related work, this paper analyzes the 
limits of current WIMP interfaces. The Instrumental 
Interaction model is introduced and applied to several 
existing interaction techniques as well as to the design of a 
new interface for searching and replacing text. Finally the 
paper concludes with suggestions for future work.

RELATED WORK

In this paper, an interaction model is defined as follows:

An interaction model is a set of principles, rules and 
properties that guide the design of an interface. It 
describes how to combine interaction techniques in a 
meaningful and consistent way and defines the "look 
and feel" of the interaction from the user's 
perspective. Properties of the interaction model can 
be used to evaluate specific interaction designs.

Direct Manipulation [26] is a generic interaction model, 
while style guides, e.g., Appleąs guidelines [2], describe 
more precise and specific models. Took introduced a model 
called Surface Interaction [29] and Holland & Oppenheim a 
model called Direct Combination [15].

An interaction model differs from the architectural model of 
an interface, which describes the functional elements in the 
implementation of the interface and their relationships (see 
[12] for a review). User interface development environments 
have generated a variety of implementation models for 
developing interfaces (see [24] for a review), e.g. the widget 
model of the X/Motif toolkit [14] or Garnet's Interactors 
model [23]. MVC [18] is a well-known model that was 
created to support the Xerox Star user interface and has 
influenced many other architectural and implementation 
models. Whereas architectural models are aimed at interface 
development, an interaction model is aimed at interface 
design.

The model-based approach and its associated tools [28] help 
bridge the gap between interaction and architectural models 
by offering a higher-level approach to the design of 
interactive systems.

Device-level models such as logical input devices [11] or 
Card et al.'s taxonomy [7] operate at a lower level of 
abstraction than interaction models. Understanding the role 
of the physical devices in interaction tasks is a critical 
component of the definition of the Instrumental Interaction 
model.

At the theoretical level, Activity Theory [6] provides a 
relevant framework for analyzing interaction as a mediation 
process between users and objects of interest.

Finally, Instrumental Interaction is grounded in the large 
(and growing) number of graphical interaction techniques 
that have been developed in recent years, some of which are 
referenced in the rest of this article.

FROM WIMP TO POST-WIMP INTERFACES

The WIMP interaction model can be outlined as follows: 
€ application objects are displayed in document windows; 
€ objects can be selected and sometimes dragged and 
dropped between different windows; and
€ commands are invoked through menus or toolbars, 
often bringing up a dialog box that must be filled in 
before the command's effect on the object is visible.

This section uses Shneiderman's principles of direct 
manipulation [26] to analyze WIMP interfaces:

1. Continuous Representation of objects of interest

Objects of interest are central to direct manipulation. They 
are the objects that the user is interested in to achieve a 
given task, such as the text and drawings of a document or 
the formulae and values in a spreadsheet. Principle 1 asserts 
that objects of interest should be present at all times. Since 
objects of interest are often larger than the screen or window 
in which they are displayed, WIMP interfaces makes them 
accessible at all times through scrolling, panning or 
zooming. This accessibility is hindered by the growing 
number of interface objects that are not objects of interest 
such as toolbars, floating palettes and menu bars. These use 
increasing amounts of screen real-estate, forcing the user to 
shrink the windows displaying objects of interest. Dialog 
boxes also often occlude significant parts of the screen, 
making the rest of the interface inaccessible to the user. 

Finally, there are more objects of interest than meet the 
eye: in many applications users must manipulate secondary 
objects to achieve their tasks, such as style sheets in 
Microsoft Word, graphical layers in Adobe Photoshop or 
Deneba Canvas, or paint brushes in MetaCreations Painter. 
Once the user is familiar with the application, these objects 
become part of his or her mental model and may acquire the 
status of object of interest. Unfortunately, these are rarely 
implemented as first-class objects. Thus, for example, 
Word's styles are editable only via transient dialog boxes 
that must be closed before returning to the text editing task. 

2. Physical actions on objects vs. complex syntax

Most computers have only a mouse and keyboard as input 
devices limiting the set of user actions to: typing text or 
"special" keys (e.g. function keys, keyboard shortcuts and 
modifiers), pointing, clicking, and dragging. Given the 
mismatch between this small vocabulary of actions and the 
large vocabulary of commands, WIMP interfaces must rely 
on additional interface elements, usually menus and dialog 
boxes, to specify commands. The typical sequence of 
actions to carry out a command is:
€ select the objet of interest by clicking it;
€ select a command from a menu or keyboard shortcut;
€ fill in the fields of a dialog box; and
€ click the OK button to see the result.

This is conceptually no different from typing a command in 
a command-line interface: The user must type a command 
name, file name (the object of interest), arguments (the 
fields in the dialog box) and return key (the OK button). In 
both cases the syntax is complex and cannot be considered 
direct manipulation of the objects of interest. In fact, 
WIMP interfaces directly violate principle 2 and often use 
indirect manipulation of the objects of interest, through 
(direct) manipulation of interface elements such as menus 
and dialog boxes.

3. Fast, incremental and reversible operations  with an 
immediately-apparent effect on the objects of interest

The heavy graphical syntax imposed on the user results in 
commands that are neither fast nor incremental. Specifying 
a command is not fast because of the amount of time used 
for non-semantic actions such as displacing windows and 
flipping through tabs in a tabbed dialog. Inputting 
parameter values for a command is often inefficient because 
of the small set of interactors, such as when numeric values 
are entered as text. Finally, the specification is not 
incremental: users must explicitly commit to a command 
that uses a dialog box before seeing the result. If the result 
does not match the user's expectations, the whole cycle of 
command activation must be started over again. This is 
especially cumbersome when trial-and-error is an integral 
part of the task, as when a graphics designer selects a font 
size: specifying the point size numerically is annoying 
when the goal is to see the visual result on the page.

4. Layered or spiral approach to learning

The small number of interaction techniques used by WIMP 
interfaces makes it easy to learn the basics of any new 
application. However, interaction shortcuts, such as 
combining keyboard modifiers with mouse buttons to 
activate the frequent commands, are concealed and 
inconsistent across applications and make the transition 
from novice to power user more difficult.

Towards a new interaction model

Some commercial applications, especially those dedicated to 
creative tasks such as painting, graphic design or music, 
extend the basic WIMP model to address some of the 
shortcomings identified above. For example, some painting 
programs make brushes first class objects that can be edited 
and saved into files. Some text editors have inspector 
windows that display the state of the current selection and 
update it when the user enters relevant values. Techniques 
such as the HotBox [21] were designed to access larger 
numbers of commands.

These new interaction techniques illustrate the transition 
from WIMP to post-WIMP interaction: Windows are not 
used in zoomable interfaces, icons and text are replaced by 
richer representations in interactive visualization, menus are 
complemented by more powerful interaction widgets such 
as toolglasses, pointing and dragging are superseded by 
bimanual input and gesture input. Designing Post-WIMP 
interfaces that are more faithful to the principles of direct 
manipulation and that take advantage of novel interaction 
techniques requires new interaction models. To guide 
interface designers, these models should be: 
€ descriptive, incorporating both existing and new 
applications;
€ comparative, providing metrics for comparing 
alternative designs (as opposed to prescriptive, 
deciding a priori what is good and what is bad); and
€ generative, facilitating creation of new interaction 
techniques.

INSTRUMENTAL INTERACTION

As shown in the previous analysis, WIMP interfaces do not 
follow the principles of direct manipulation. Instead, they 
introduce interface elements such as menus, dialog boxes 
and scrollbars that act as mediators between users and the 
objects of interest. Users have a (limited) sense of 
engagement, as advocated by direct manipulation, because 
they manipulate these intermediate objects directly. This 
matches our experiences in the physical world: We rarely 
fingerpaint, but often use pens and pencils to write. We 
cook with pots and pans, hang pictures with hammers and 
power drills, open doors with handles and turn off lights 
with switches. Our interaction with the physical world is 
governed by our use of tools. Direct manipulation of 
physical objects of interest occurs when we bring them into 
our current context of operation, before we manipulate them 
with the appropriate tools, usually with two hands [13]. 
The Instrumental Interaction model is based on how we 
naturally use tools (or instruments) to manipulate objects 
of interest in the physical world. Objects of interest are 
called domain objects, and are manipulated with computer 
artifacts called interaction instruments.

Domain objects

In computer systems, applications operate on data that 
represent phenomena or objects. For computer users, this 
data is the primary focus of their actions. For example, 
when creating a text document, the focus of the user is on 
the text of the document. Everything else on the screen is 
there to support the user's task of editing the text document.

Domain objects form the set of potential objects of interest 
for the user of a given application. Domain objects have 
attributes that describe their characteristics. Attributes can 
be simple values or more complex objects. For example, in 
a 3D modeller, the position and size of a sphere are simple 
values (integer or real numbers), while the material of the 
sphere is a complex entity (color, texture, transparency, 
etc.). The user may shift the object of interest, 
concentrating on the material as the focus of the interaction. 
Similarly, text styles that describe the formatting attributes 
of text also may also obtain the status of objects of 
interest. Materials and styles are therefore also domain 
objects in their respective interfaces.

In summary, domain objects form the basis of the 
interaction as well as its purpose: Users operate on domain 
objects by editing their attributes. They also manipulate 
them as a whole, e.g. to create, move and delete them.

Interaction instruments

An interaction instrument is a mediator or two-way 
transducer between the user and domain objects. The user 
acts on the instrument, which transforms the user's actions 
into commands affecting relevant target domain objects. 
Instruments have reactions enabling users to control their 
actions on the instrument, and provide feedback as the 
command is carried out on target objects (Figure 1).

A scrollbar is a good example of an interaction instrument. 
It operates on a whole document by changing the part that 
is currently visible. When the user clicks on one of the 
arrows of the scrollbar, the scrollbar sends the document a 
scrolling command. Note that the transduction here consists 
of sending scrolling commands as long as the user presses 
the arrow. The reaction of the scrollbar consists of 
highlighting the arrow being pressed. The feedback consists 
of updating the thumb to reflect the new position of the 
document. In addition, the object also responds to the 
instrument by updating its view in the window.

Another example is an instrument that creates rectangles in 
a drawing editor. As the user clicks and drags the mouse, 
the instrument provides a reaction in the form of a rubber-
band rectangle. When the user releases the button, the 
creation operation is actually carried out and a new domain 
object is created. The feedback of this operation consists in 
displaying the new object.

An instrument decomposes interaction into two layers: the 
interaction between the user and the instrument, defined as 
the physical action of the user on the instrument and the 
reaction of the instrument and the interaction between the 
instrument and the domain object, defined as the command 
sent to the object and the response of the object, which the 
instrument may transform into feedback to the user. The 
instrument is composed of a physical part, the input device, 
and a logical part, the representation of the instrument in 
software and on the screen.

Activating instruments

At any one time, an interface provides a potentially large 
number of instruments. However the user can manipulate 
only a few of them at the same time, usually only one, 
because of the limited number of input devices. In the most 
common case of a keyboard and mouse, a single input 
device (the mouse) must be multiplexed between a 
potentially large number of instruments, i.e. a single 
physical part may be associated with different logical parts. 
                               
Figure 1: Interaction instrument mediating the
interaction between a user and a domain object

An instrument is said to be activated when it is under the 
user's control, i.e. when the physical part has been 
associated with the logical part. In the case of the scrollbar, 
the user activates the instrument by pointing at it and it 
remains active as long as the pointer is within the scrollbar. 
When creating a rectangle,  the user activates the instrument 
by clicking a button in a tool palette and it remains active 
until another instrument is activated.

These two types of activation are quite different. The 
activation of the scrollbar is spatial because it is caused by 
moving the mouse (and cursor) inside the area of the 
scrollbar. The activation of the rectangle creation 
instrument is temporal because it is caused by a former 
action and remains in effect until the activation of another 
instrument. (This is traditionally called a mode). Each type 
of activation has an associated cost: Spatial activation 
requires the instrument to be visible on the screen, taking 
up screen  real-estate and requiring the user to point at it and 
potentially dividing the user's attention. Temporal 
activation requires an explicit action to trigger the 
activation, making it slower and less direct. 

Interface designers often face a design trade-off between 
temporal and spatial multiplexing of instruments because 
the activation costs become significant when the user must 
frequently change instruments. Using extra input devices 
can reduce these costs. For example, the thumbwheel on 
Microsoft's Intellimouse is a scrolling instrument that is 
always active. An extreme example is an audio mixing 
console, which may contain several hundred potentiometers 
and switches, each corresponding to a single function. This 
permits very fast access to all functions, which is crucial 
for sound engineers working in real-time and cannot afford 
the cost of activating each function indirectly. A large 
design space lies between a single mouse and hundreds of 
potentiometers, posing design challenges to maximally 
exploit physical devices and reduce activation costs.

Reification and Meta-instruments

Reification is a process for turning concepts into objects. In 
user interfaces, the resulting objects can be represented 
explicitly on the screen and operated upon. For example, a 
style in a text editor is the reification of a collection of text 
attributes; the notion of material in a 3D modeller is the 
reification of a set of rendering properties. This type of 
reification generates new domain objects such as styles and 
materials that complement the "primary" domain objects of 
the application domain.

Instrumental Interaction introduces a second type of 
reification: an interaction instrument is the reification of 
one or more commands. For example, a scrollbar is the 
reification of the command that scrolls a document. This 
link between the traditional notion of command and the 
notion of instrument makes it easy to analyze existing 
interfaces with the Instrumental Interaction model. It is also 
a useful guideline to identify instruments when designing a 
new interface. In the last part of this paper, this rule is used 
to reify the traditional search-and-replace command of a text 
editor into a search instrument.

The result of this reification rule is that instruments are 
themselves potential objects of interest. This is indeed the 
case in real life, when the focus of attention shifts from the 
object being manipulated to the tool used to manipulate it. 
For example a pencil is a writing instrument and the 
domain object is the text being written. When the lead 
breaks, the focus shifts to a new instrument, a pencil 
sharpener, which operates on the shifted domain object, the 
pencil lead. The focus may even shift to the pen sharpener, 
if we need a screwdriver to fix it. Such "meta-instruments" 
(instruments that operate on instruments) are not only 
useful for "fixing" instruments, but can also be used to 
organize instruments in the workspace, e.g. a toolbox, or to 
tailor instruments to particular tasks, e.g. turning a power-
drill into a power-saw. In graphical user interfaces, common 
meta-instruments include menus and tool palettes used to 
select commands and tools, i.e. to activate instruments.

Properties of Instruments

An important role of an interaction model is to provide 
properties to evaluate and compare alternative designs. This 
can help interface designers who face difficult choices when 
selecting the interaction techniques for a particular 
application. The goal of defining properties of instruments 
is not to decide which instruments are good and which are 
bad, but to evaluate them so that designers can make an 
informed choice and so that researchers can identify and 
explore areas of the design space that are not mapped by 
existing instruments.

The literature on user interface evaluation techniques is 
considerable. Here, we use a particular type of evaluation 
based on properties. This is a common approach in software 
engineering and has also proved valid and useful for 
evaluating interactive systems [12]. The rest of this section 
introduces three properties of interaction instruments.

Degree of indirection

The degree of indirection is a 2D measure of the spatial and 
temporal offsets generated by an instrument. The spatial 
offset is the distance on the screen between the logical part 
of the instrument and the object it operates on. Some 
instruments, such as the selection handles used in graphical 
editors, have a very small spatial offset since they are next 
to or on top of the object they control. Other instruments, 
such as dialog boxes, can be arbitrarily far away from the 
object they operate on and therefore have a large spatial 
offset. A large spatial offset is not necessarily undesirable. 
For example, placing a light switch far from the light bulb 
it controls makes it easier to turn on the light. Similar 
examples can be found in user interfaces.

The temporal offset is the time difference between the 
physical action on the instrument and the response of the 
object. In some cases, the object responds to the user's 
action in real-time. For example, clicking an arrow in a 
scrollbar scrolls the document while the mouse button is 
depressed. In other cases, the object responds to the user's  
action only when the action reaches closure. For example, 
the arguments specified in a dialog box are taken into 
account only when the OK or Apply button is activated. In 
general, short temporal offsets are desirable because they 
exploit the human perception-action loop and give a sense 
of causality [22].

Figure 2 shows the degree of indirection of various WIMP 
instruments on a 2D chart. Scrollbars occupy a range in the 
diagram. For example, some scrollbars provide immediate 
response when the thumb is moved while others only scroll 
the document when the mouse button is released. The figure 
shows that the degree of indirection describes a continuum 
between direct manipulation (lower-left corner) and indirect 
manipulation (upper-right corner).

Figure 2: Degree of indirection

Degree of integration

The degree of integration measures the ratio between the 
number of degrees of freedom (DOF) provided by the logical 
part of the instrument and the number of DOFs captured by 
the input device. This term comes from the notion of 
integral tasks [17]: some tasks are performed more 
efficiently when the various DOFs are controlled 
simultaneously with a single device. A scrollbar is a 1D 
instrument controlled by a 2D mouse, therefore its degree of 
integration is 1/2. The degree of integration can be larger 
than 1: controlling 3 rotation angles with a 2D mouse [16] 
has a degree of integration of 3/2. This property can be used 
to compare instruments that perform similar operations. 
For example, panning over a document can be achieved 
with two scrollbars or a 2D panner. The latter has a degree 
of integration of 1 and is therefore more efficient than two 
scrollbars, which have a degree of integration of 1/2 and 
incur additional activation costs.

Degree of compatibility

The degree of compatibility measures the similarity 
between the physical actions of the users on the instrument 
and the response of the object. Dragging an object has a 
high degree of compatibility since the object follows the 
movements of the mouse. Scrolling with a scrollbar has a 
low degree of compatibility because moving the thumb 
downwards moves the document upwards. Using text input 
fields to specify  numerical values in a dialog box, e.g. the 
point size of a font, has a very low degree of compatibility 
because the input data type is different from the output data 
type. Similarly, specifying the margin in a text document 
by entering a number in a text field has a lower degree of 
compatibility than dragging a tab in a ruler.

APPLYING THE MODEL

This section uses the Instrumental Interaction model to 
analyze existing interaction techniques, both from WIMP 
interfaces and from more recent research. It demonstrates the 
descriptive power of the model. The generative power of the 
model is then illustrated by the design of a new instrument 
for searching and replacing text.

Analyzing WIMP Interfaces

The primary components of WIMP interfaces can be easily 
mapped to instruments and compared (Table 1):

Table 1: Comparing WIMP interaction techniques

Menus and toolbars are meta-instruments used to select the 
command or tool to activate. This use of meta-instruments 
slows down interaction and generates shifts of attention 
between the object of interest, the meta-instrument and the 
instrument. Contextual menus have a small spatial offset 
and are therefore more efficient than toolbars and menu bars. 
Toolbars, which can be moved next to their context of use, 
have a better spatial offset than menu bars.

Dialog boxes are used for complex commands. They have a 
high degree of spatial and temporal indirection. They often 
use a small set of standard interactors such as text fields for 
numeric values, resulting in a low degree of compatibility.

Inspectors and property boxes are an alternative to dialog 
boxes that have a lower degree of temporal indirection. 
Since they can stay open, they can be activated with 
pointing (positional activation) rather than selection in a 
menu (temporal activation).

Handles are used for graphical editing and provide a very 
direct interaction: low degree of indirection, high degree of 
compatibility and good degree of integration.

Window titles and borders are instruments activated 
positionally to manipulate the window (move, resize, 
iconify, zoom, close). Scrollbars control the content of the 
window. Because of their low degree of integration, they are 
not optimal, especially for panning documents in 2D. Also, 
their spatial offset generates a division of attention, 
especially since they are activated positionally: the user 
must be sure to point at the right part of the scrollbar.

Keyboard shortcuts and accelerator keys are meta-
instruments, used to quickly switch between instruments 
and save the activation costs of menus and toolbars. Some 
accelerator keys affect the way the current instrument 
works. For example, on the Macintosh, the Shift key 
constrains the move tool to horizontal and vertical moves 
and the resize tool to maintain the current aspect ratio.

Drag and drop is a generic instrument for transferring or 
copying information. Compared to traditional cut/copy/ 
paste commands that use a hidden clipboard, it has a smaller 
degree of indirection. There is no spatial offset because the 
objects are manipulated directly and the temporal offset is 
low because there is feedback about potential drop-zones as 
the user drags the object.

Over the past few years, interaction techniques such as 
inspectors, property boxes, drag and drop and contextual 
menus have become more common in commercial 
applications. The above analysis explains why these 
techniques are more efficient than their WIMP counterparts, 
demonstrating a useful contribution of the Instrumental 
Interaction model.

Analyzing Post-WIMP Interaction Techniques

Table 2 summarizes the comparison of several post-WIMP 
interaction techniques. Interactive visualization helps users 
explore large quantities of visual data and make sense of it 
through filtering and displaying it to exhibit patterns [8]. 
These systems use two categories of instruments:
€ navigation instruments specify which part of the data to 
visualize and how; and 
€ filtering instruments specify queries and display results.

A key aspect of these systems is a strong coupling between 
user actions and system response. In other words, these 
instruments must have a small temporal offset. For 
example, in the Information Visualizer [9], the instruments 
used to control Cone Trees and Perspective Walls provide 
immediate responses and use smooth animations to display 
changes in visualization parameters. In Dynamic Queries 
[1], double sliders are used to specify the range of query 
parameters; any change in a slider updates the display of 
filtered data. 

Both navigation and filtering are usually multi-dimensional 
tasks: the user wants to control several dimensions 
simultaneously to navigate along arbitrary trajectories. This 
calls for the ability to manipulate several instruments 
simultaneously (which requires additional input devices) 
and/or for instruments with a high degree of integration. 
Current systems do not address this well. For example, 
Dynamic Queries permit only one side of a slider to be 
manipulated at a time, forcing the user to navigate along 
rectangular trajectories in the parameter space.

Zoomable user interfaces such as Pad++ [3] are based on the 
display of an infinite flat surface that can be viewed at any 
resolution. Exploring this surface requires navigation 
instruments to pan and zoom until the desired objects are in 
sight. Pad++ navigation instruments are activated by mouse 
buttons or modifier keys. This temporal activation is fast 
and provides access to navigation anywhere on the surface, 
unlike a scrollbar which requires positional activation. It 
also has high degrees of compatibility and integration. 
Editing the objects on the surface has led to Dropable Tools 
[4]: tools can be dropped anywhere on the surface and 
grabbed later. Activating these instruments is more direct 
than with a traditional toolbar because it does not involve a 
meta-instrument and the associated switch of attention.

Table 2: Comparing post-WIMP interaction techniques

A number of recent interaction techniques rely on new or 
additional input devices. This reduces activation costs by 
allowing several instruments to be active simultaneously. 
For example, the thumb wheel of the Intellimouse is 
always attached to a scrolling instrument. ToolGlasses [5] 
are semi-transparent palettes operated with a track-ball in 
the left (or non-dominant) hand. The right hand is used to 
click through the palette onto a domain object, therefore 
specifying both the action to perform and the object  to 
operate on. Here the toolglass is a meta-instrument under 
the control of the left hand, while the instruments it 
contains are activated by the right hand. In the TTT 
prototype [20], a combination of three instruments can be 
active simultaneously: the toolglass itself, an instrument in 
the toolglass and a navigation instrument to pan and zoom 
the drawing surface. This makes it possible, for example, to 
pan and zoom while creating an object. The design exploits 
the trackball and mouse input devices to minimize 
activation costs, to reduce the degree of indirection and to 
increase the degree of integration.

Graspable interfaces [10] use physical objects as input 
devices to manipulate virtual objects. In effect, they transfer 
most of the characteristics usually found in the logical part 
of the instrument into the physical part. This approach was 
pioneered by Augmented Reality [30], which explores ways 
to reconcile the physical and computer world by embedding 
computational facilities into physical objects. Here, the 
domain objects, in addition to the instruments, have a 
strong physical component. This increases the degrees of 
compatibility and integration since interaction occurs in the 
real world.

Designing a Text Search Instrument

In most text editors, searching and replacing text uses a 
dialog box where the user specifies the string to be searched 
and the string to replace it with. The operation is controlled 
with buttons to find the next or previous occurrence and 
replace it or not. Undoing the command usually means 
restoring the search string everywhere it has been replaced. 
This results in a sequential form of interaction where the 
system prompts the user and forces him or her to decide 
what to do with each occurrence, generating a very large 
temporal offset.

An instrumental approach to search and replace has led to 
the following design, implemented in a Tcl/Tk prototype 
(Figure 3). The top part of the window is the logical part of 
the instrument, used to specify the search and replace 
strings. No buttons are necessary: as the user types a search 
string, the occurrences highlight in yellow both in the text 
window and in the scrollbar. To replace an occurrence, the 
user simply clicks on it, immediately replacing it with the 
replace string (if any) highlighted in red. Editing the replace 
string changes all the replaced occurrences. A replaced 
occurrence can be undone by clicking on it in the same 
way: the search string is substituted and highlighted in 
yellow. Typing text in the document does not cancel the 
operation of the search instrument: each time the text is 
changed, a new search is performed. 

Figure 3: Search and replace instrument

The scrollbar was modified to facilitate browsing. First, it 
includes tick marks representing the occurrences and giving 
an overview of the search. Second the arrow buttons were 
changed so that clicking on an arrow scrolls the document 
at a variable speed according to the distance between the 
cursor and the arrow: clicking on the up arrow scrolls the 
document slowly downwards (as in traditional scrollbars); 
moving the cursor up speeds up scrolling; moving the 
cursor down slows it down; moving the cursor further 
down, scrolling stops, then reverts. This reduces the 
division of attention that occurs when operating the various 
parts of a scrollbar while focusing on the document. It is 
similar in effect to the thumbwheel described earlier but 
does not require a separate input device.

The instrument provides feedback about the current state of 
the search/replace operation by highlighting all  the 
occurrences in the text, as in the Document Lens [25], and 
in the scrollbar. This design results in a low degree of 
indirection: The spatial offset is reduced by getting rid of 
the traditional Next, Previous and Replace buttons; The 
temporal offset is reduced by displaying the occurrences as 
the user types the search string, by showing the effect of 
replacing an occurrence immediately and by allowing the 
user to undo any change, not necessarily in the order they 
were made.

The design of the scrollbar improves the degree of 
integration since it uses the vertical position of the mouse 
to control the speed and direction of scrolling (1 output 
DOF / 2 input DOFs) while traditional scrollbars only use 
the fact that the user clicked in the arrow (0 output DOF / 2 
input DOFs). This design also improves the degree of 
compatibility since the mouse now works as a joystick. In 
effect, the scrollbar thumb and arrows are functionally 
equivalent: the thumb provides positional control while the 
arrows provide rate control of the visible part of the 
document.

Two variants of the search instrument were also developed. 
The first variant allows several search instruments to be 
active simultaneously, each independent of the others and 
using a different pair of colors to highlight occurrences. In 
the second variant, multiple strings can be specified in the 
search string and a regular expression can be specified in the 
replace string. The instrument highlights all the occurrences 
of all search strings at once. This variant was used to build 
the index of a book: a list of words to index was entered as 
a set of search strings. The replace string added the proper 
markup to include the occurrence in the index. Indexing the 
book became simply a matter of picking which occurrences 
to include in the index, taking advantage of the display of 
all occurrences to avoid putting in the index occurrences of 
the same word that are close together in the text. At any 
time it was possible to change the list of words in the 
index, the content of the text and the occurrences to index.

CONCLUSION AND FUTURE WORK

This article has introduced the Instrumental Interaction 
model, which generalizes and operationalizes Direct 
Manipulation. The model has been used to analyze WIMP 
interfaces as well as more recent interaction techniques and 
to design a new interface for searching and replacing text. 
This demonstrates the descriptive, comparative and 
generative power of the Instrumental Interaction model. 

We are currently testing the model as we design a new 
graphical editor for Colored Petri Nets. The model is used 
both as a design guide and an evaluation tool to integrate 
existing interaction techniques and create new ones.

However further work is needed to develop the model in 
more detail and assess its limits. This requires a more 
thorough analysis of graphical interfaces and interaction 
techniques, the definition and evaluation of new properties, 
a taxonomy of interaction instruments, and an exploration 
of the design space defined by the model. Design principles 
are also needed to help designers create instruments and 
integrate them effectively into a user interface.

The other important area for future work is to make 
Instrumental Interaction useful not only to user interface 
designers but also to user interface developers by developing 
a user interface toolkit based on the model. This would 
support the adoption of novel interaction techniques in a 
wide range of applications and allow a shift from WIMP to 
post-WIMP interfaces.

ACKNOWLEDGMENTS

Many thanks to Wendy Mackay for numerous discussions 
about the model and its implications and for her help with 
the English. Thanks to Peter Břgh Andersen, Susanne 
Břdker, Yves Guiard and Austin Henderson for discussions 
and feedback on earlier versions of this paper.

REFERENCES

1.	Ahlberg, C., Williamson, C., Shneiderman, B. 
Dynamic Queries for Information Exploration: An 
Implementation and Evaluation.  In Proc. ACM 
Human Factors in Computing Systems, CHI'92, ACM 
Press, p.619-626, 1992.

2.	Apple Computer. Macintosh Human Interface 
Guidelines, Addison-Wesley, 1992.

3.	Bederson, B. and Hollan, J. Pad++: A Zooming 
Graphical Interface for Exploring Alternate Interface 
Physics. In Proc. ACM Symposium on User Interface 
Software and Technology, UISTą94, ACM Press, p.17-
26, 1994

4.	Bederson, B., Hollan, J., Druin, A., Stewart, J., 
Rogers, D., Proft, D. Local Tools : an Alternative to 
Tool Palettes. In Proc. ACM Symposium on User 
Interface Software and Technology, UISTą94, ACM 
Press, p.169-170, 1994.

5.	Bier, E., Stone, M., Pier, K., Buxton, W., De Rose, 
T. Toolglass and Magic Lenses : the See-Through 
Interface. In Proc. ACM SIGGRAPH, p.73-80, 1993.

6.	Břdker, S. Through the Interface. A Human Activity 
Approach to User Interface Design. Lawrence Erlbaum 
Associates, 1991.

7.	Card, S.K., Mackinlay, J.D., Robertson, G.G. A 
Morphological Analysis of the Design Space of Input 
Devices. ACM Trans. Information Systems, 9(2):99-
122, 1991.

8.	Card, S., Mackinlay, J., Shneiderman, B. Readings in 
Information Visualization: Using Vision to Think. 
Morgan Kaufmann Publishers, 1998.

9.	Card, S., Robertson, G., Mackinlay, J. The 
Information Visualizer, an Information Workspace. In 
Proc. ACM Human Factors in Computing Systems, 
CHI'91, ACM Press, p.181-187, 1991.

10.	Fitzmaurice, G., Ishii, H., Buxton, W. Laying the 
Foundations for Graspable User Interfaces.  In Proc. 
ACM Human Factors in Computing Systems, CHI'95, 
ACM Press, p.442-449, 1995.

11.	Foley, J., Wallace, V., Chan, P. The Human Factors 
of Computer Graphics Interaction Techniques. 
Computer Graphics and Applications, 4(11):13-48, 
1984.

12.	Gram, C. & Cockton, G. Design Principles for 
Interactive Software, Chapman & Hall, 1996.

13.	Guiard, Y. Asymmetric division of labor in human 
skilled bimanual action: The kinematic chain as a 
model. Journal of Motor Behavior, 19:486-517, 1987.

14.	Heller, D., Ferguson, P.M., Brennan, D. Motif 
Programming Manual, O'Reilly & Associates, 1994.

15.	Holland, S. & Oppenheim, D. Direct Combination.  
Proc. ACM Human Factors in Computing Systems, 
CHI'99, ACM Press, p.262-269, 1999.

16.	Jacob, I. & Oliver, J. Evaluation of Techniques for 
Specifying 3D Rotations with a 2D Input Device. 
Proc. HCI'95 Conference, People and Computers X, 
p.63-76, 1995.

17.	Jacob, R., Sibert, L., McFarlane, D., Preston Mullen, 
M. Integrability and Separability of Input Devices. 
ACM Trans. Human Computer Interaction, 1(1), p.3-
26, 1994.

18.	Krasner, G.E. & Pope, S.T. A Description of the 
Model-View-Controller User Interface Paradigm in the 
Smalltalk80 System. J. Object Oriented Programming 
1(3):26-49, 1988.

19.	Kurtenbach, G. & Buxton, W. User Learning and 
Performance with Marking Menus. Proc. ACM Human 
Factors in Computing Systems, CHI'94, ACM Press, 
p.258-264, 1994.

20.	Kurtenbach, G., Fitzmaurice, G., Baudel, T., Buxton. 
W. The Design of a GUI Paradigm based on Tablets, 
Two-hands, and Transparency.  In Proc. ACM Human 
Factors in Computing Systems, CHI'97, ACM Press, 
p.35-42, 1997.

21.	Kurtenbach, G., Fitzmaurice, G.W., Owen, R.N., 
Baudel, T. The Hotbox: efficient access to a large 
number of menu-items.  Proc. ACM Human Factors in 
Computing Systems, CHI'99, ACM Press, p.231-237, 
1999.

22.	Michotte, A. La perception de la causalité. 
Publications Universitaires de Louvain, 1946.

23.	Myers, B.A. A New Model for Handling Input. ACM 
Trans. Information Systems, 8(3):289-320, 1990.

24.	Myers, B.A. User Interface Software Tools. ACM 
Trans. Computer-Human Interaction, 2(1):64-103, 
1995.

25.	Robertson, G.G. & Mackinlay, J.D. The Document 
Lens Visualizing Information. Proc. ACM 
Symposium on User Interface Software and 
Technology, p.101-108, 1993.

26.	Shneiderman, B. Direct Manipulation : a Step Beyond 
Programming Languages. IEEE Computer, 16(8), pp 
57-69, 1983.

27.	Smith, D., Irby, C., Kimball, R., Verplank, B., 
Harslem E. Designing the Star User Interface. Byte, 
7(4), p.242:282, 1982.

28.	Szekely, P., Luo. P., Neches, R. Beyond Interface 
Builders: Model-Based Interface Tools Model-Based. 
Proc. ACM Human Factors in Computing Systems, 
INTERCHI'93, ACM Press, p.383-390, 1993.

29.	Took, R. Surface Interaction: A Paradigm and Model 
for Separating Application and Interface. Proc. ACM 
Human Factors in Computing Systems, CHI'90, ACM 
Press, p.35-42, 1990.

30.	Wellner, P., Mackay, M., Gold R. Computer-
Augmented Environments. Special Issue of 
Communications of the ACM, 23(7).