Position paper for CHI'99 Workshop on End-User Programming and Blended-User Programming


Catherine Letondal, letondal@pasteur.fr
Pasteur Institute, Scientific Computing Center

A practical and empirical approach for biologists who almost program.


Abstract:
This paper addresses the programming needs of biologists who have to adapt and combine existing programs, and sometimes develop new ones, for their research. We explain why existing end-user programming tools do not work well for them, and we offer a critical perspective on how computer scientists conceive of programming. Finally, we introduce our approach, based on participatory design and technical exploration.
Keywords:
end-user programming, participatory design, prototype languages, scripting languages, spreadsheets.
Page updated on: 1999 Feb 27 11:35.

1. Introduction.

Computers are now widely used in molecular biology, e.g. to run simulations or sequence analysis and to test hypotheses against genetic databanks.
Biology researchers need not only to use the software currently available to them, but also to develop new programs or adapt existing ones. Some biologists already write their own programs and even release public software. A larger number are end- or blended-users who do not want to spend too much time and effort on programming and computer issues. These users need a scaffolding environment so that they can process their data themselves or try out simple ideas more easily.

My purpose in this position paper is not to propose a new end-user language or programming environment for these users, but to explain why, in my experience, this is a difficult problem.

I observed biologists programming in Scheme during a three-month programming course in which I helped them with practical exercises. By the end of the course, they were able to design algorithms and define data structures. After the course, however, they found it very hard to apply what they had learned to real problems and to build actual software. They discovered that writing a program is more than implementing an algorithm.

For two years I have also observed biologists using Web interfaces that I developed to give access to biological sequence analysis programs on Unix. A popular feature of this system is the ability to chain programs, the equivalent of a Unix pipeline or shell script. These observations suggest that users would probably write shell scripts if scripting were more "at hand".
What, then, would it take to make programming "at hand" for these users?

I propose that part of the problem lies on the computer scientists' side. What biologists find difficult is not what professionals consider hard in programming, such as writing algorithms. It is the many "details" around it, such as input/output, databank file formats and visualization.
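The following minimal Python sketch illustrates the kind of "detail" meant here. It reads the FASTA sequence format and assumes only the common convention of a `>` header line followed by one or more sequence lines. `read_fasta` is a hypothetical helper written for this illustration, not an existing library function. Real databank formats carry many more fields and special cases.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text (simplified)."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # a sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


example = io.StringIO(">seq1 first record\nACGT\nTTGA\n\n>seq2\nGGCC\n")
for name, seq in read_fasta(example):
    print(name, len(seq))
```

Even this simplified reader has to decide how to handle blank lines, multi-line sequences and malformed input. None of these decisions is part of any sequence analysis algorithm.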


2. Current techniques.

2.1. Visual programming.

Visual languages (as opposed to visual environments) can be divided into two kinds. The first is bi-modal systems with separate editing and executing modes, such as LabView [barroth95]. The second is direct "programming in the interface" (PITUI) tools, such as Self [rbsmith-95] or Forms/3 [burnett95].

Bi-modal systems support the graphical manipulation of programming concepts such as loops, conditionals, classes and component interfaces. The idea behind these tools is that the difficulty for non-professional programmers is a syntactic one, and that replacing textual coding with graphical coding will solve the problem. However, Nardi [nardi93] showed that the visual versus textual distinction does not really discriminate between end-user and programmer languages. She also gives examples of textual languages used by non-programmers, such as knitting machine instructions and baseball game coding.

The PITUI approach, on the other hand, does not distinguish between programming and using, and it makes both activities available simultaneously. This does not mean, however, that everything in the language has to be graphical or visual. Users such as biologists need a PITUI environment, although it need not be exclusively graphical.

2.2. Programming by demonstration.

The "programming by demonstration (or by example)" approach [cypher93] infers a more general program from a sequence of user actions. The more interactive the interface, the more useful and widely applicable the approach. There are, however, some limits. Programming by example is designed to reuse previous actions, so it only works if such actions have already been performed, and it is sometimes easier to describe an action than to demonstrate it. Therefore, a macro or history facility, while useful for biologists, should be complemented by a general language.
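The following toy Python sketch shows both the idea and its limits under a strong simplifying assumption: the "inference" is reduced to replacing one example value, chosen by the user, with a parameter. The command names, file names and the `generalize` helper are hypothetical. Actual programming-by-demonstration systems infer much more than this.

```python
from typing import Callable, List, Tuple

# One recorded user action: a command name and its arguments.
Action = Tuple[str, Tuple[str, ...]]


def generalize(history: List[Action], example_value: str) -> Callable[[str], List[Action]]:
    """Turn a recorded history into a reusable macro.

    Every argument equal to `example_value` becomes a parameter, so the
    recorded steps can be replayed on another value. Nothing else is inferred.
    """
    def replay(new_value: str) -> List[Action]:
        return [
            (command, tuple(new_value if arg == example_value else arg for arg in args))
            for command, args in history
        ]
    return replay


# A session as it might be recorded (hypothetical commands and file names).
recorded: List[Action] = [
    ("load", ("sample1.fasta",)),
    ("align", ("sample1.fasta", "reference.fasta")),
    ("plot", ("sample1.fasta",)),
]
macro = generalize(recorded, "sample1.fasta")
print(macro("sample2.fasta"))
```

The resulting macro can only replay what was demonstrated. Iterating over many files or adding a condition would still require a general language.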

2.3. Software flexibility.

One can define degrees of software flexibility [nierstrasz95a]: functional parameterization, software composition (where the structure itself is a parameter), and general programming languages. The first level is insufficient for biologists. The second level seems to be either a software engineering matter (and too difficult for end-users) or not general enough (visual composition environments).
We are interested in the component composition level (or component reuse level, a better term here). We are also interested in the component definition level, since some components may have to be slightly adapted to the user's needs. This does not mean that end-users should be able to use components as a framework, nor that they should be able to define new components from scratch. Rather, they should be able to go down to the component definition level and incrementally modify a component, or clone it to try out a new feature.
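The difference between using a component and going down to its definition level can be sketched in Python with prototype-style objects. As an assumption, the sketch models a component as a dictionary of named slots that is copied and then modified. The `clone` and `run` helpers and the `site_finder` component are hypothetical illustrations. The motifs are the EcoRI (GAATTC) and BamHI (GGATCC) restriction sites.

```python
import copy
from typing import Any, Dict, List

# A component is modelled as a dictionary of named slots (data and behaviour).
Component = Dict[str, Any]


def clone(prototype: Component, **overrides: Any) -> Component:
    """Copy a working component, then replace only the slots the user changes."""
    new = copy.copy(prototype)  # shallow copy: the original is left untouched
    new.update(overrides)
    return new


def run(component: Component, seq: str) -> List[int]:
    """Invoke the component's behaviour slot on a sequence."""
    return component["find"](component, seq)


def exact_find(self: Component, seq: str) -> List[int]:
    """Return every start position of the motif in seq (overlaps allowed)."""
    motif = self["motif"]
    return [i for i in range(len(seq) - len(motif) + 1) if seq[i:i + len(motif)] == motif]


# Hypothetical existing component: a finder for the EcoRI site.
site_finder: Component = {"motif": "GAATTC", "find": exact_find}

# Parameterization: same behaviour, different value (BamHI site).
other_site = clone(site_finder, motif="GGATCC")


# Definition level: clone, then adapt the behaviour itself (here, ignore case).
def case_insensitive_find(self: Component, seq: str) -> List[int]:
    return exact_find(self, seq.upper())


lenient = clone(site_finder, find=case_insensitive_find)

print(run(site_finder, "ttgaattcAAGAATTC"))  # [10]
print(run(lenient, "ttgaattcAAGAATTC"))      # [2, 10]
```

Because `clone` copies the prototype before overriding slots, adapting a component never alters the original, so users can experiment safely.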

3. What is programming, by the way?

It therefore appears that we need a general programming language that is domain-oriented and can be used directly from the user interface. We should now be almost done, because we know what programming is. Well, do we? What seems strange and confusing is that the boundary between what is and what is not programming is not clear at all. The table below lists several definitions of programming, each of which can easily be challenged.

Programming is...  |  but...
Programming is abstraction and generalization: writing a program involves modelling objects in a sufficiently general way.  |  Programming also requires specialization, e.g. going into the code to customize part of it. Writing a device driver, for instance, is the opposite of generalization. Taking a general algorithm (dynamic programming) and tuning its equations for a specific field (DNA sequence comparison) is still considered programming.
The essence of programming is the design of algorithms.  |  Very few programs contain a new algorithm. It is well established that most of a program is not algorithmic in nature but consists of "glue" code, input/output and user interface. Should we consider that using, e.g., a loop is programming?
Programming is automation: saving some instructions into a file to be reused later is programming.  |  Does this mean that entering complex commands into an interactive shell is not programming?
Programming, by saving instructions to be executed at a given time, is also planning.  |  Does this mean that using the at or cron commands on Unix is programming?
Programming is translating to a meta or symbolic level, e.g. dealing with the name of an object (this also relates to the use-mention dichotomy [myers92b-SUC]).  |  Is defining an alias programming?
Programming is writing code.  |  Is editing a configuration file programming?
Programming is using a compiler or an interpreter.  |  Is submitting a file to LaTeX, or even to Netscape, programming?
Programming is building and implementing software or software components.  |  How often do programmers build actual software, or even software components?

The purpose of this list is not to define what programming is, but to show that we do not really know what it is. Yet when I tell a colleague "I am programming", he or she knows exactly what I mean.
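The dynamic programming example from the table can be made concrete with a short Python sketch. It uses a generic global alignment score, the Needleman-Wunsch recurrence with a linear gap penalty. The score is "tuned" for DNA by passing a substitution function that penalizes transitions (A/G, C/T) less than transversions. The scoring values (+2, -1, -2, gap -2) are illustrative assumptions, not recommended parameters.

```python
from typing import Callable


def global_score(a: str, b: str, sub: Callable[[str, str], int], gap: int) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b aligned against gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = max(
                prev[j - 1] + sub(a[i - 1], b[j - 1]),  # match or mismatch
                prev[j] + gap,                         # gap in b
                cur[j - 1] + gap,                      # gap in a
            )
        prev = cur
    return prev[-1]


PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def dna_sub(x: str, y: str) -> int:
    """Illustrative DNA scoring: match +2, transition -1, transversion -2."""
    if x == y:
        return 2
    transition = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return -1 if transition else -2


print(global_score("ACGT", "GCGT", dna_sub, -2))  # 5 (A/G is a transition)
print(global_score("ACGT", "CCGT", dna_sub, -2))  # 4 (A/C is a transversion)
```

With a plain identity score (+1 for a match, -1 for a mismatch), ACGT aligns equally well with GCGT and CCGT. The DNA-specific scores prefer GCGT. Only the `dna_sub` function changed, not the algorithm.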

4. An empirical approach.

These questions about what programming is are important and have to be addressed: after all, to design an end-user programming language, we ought to know what a programming language is. A different perspective is to start from the users. User-centered design has proved useful in many areas, and I believe it is particularly important in a field where professionals think they know what the problem is.

4.1. Participatory design.

From the user perspective, the first and most important question to ask is: "What and how would biologists program?" To address this question, I have chosen to let them participate very early in the design process and to apply participatory design techniques [schuler93]. The following studies have been conducted so far: This approach has already shown a number of immediate and potential benefits:

4.2. Requirements and technical guidelines.

My goal is not (yet) to propose a toolkit or a GUI, but rather to evaluate potential approaches and identify requirements for building a first prototype. This prototype will be used to test such trigger ideas during brainstorming and prototyping workshops with users.
  1. A scripting language is a good choice for gluing pieces of software together and achieving actual reuse, as well as for prototyping. Textual coding is not an issue, because the language is intended more for intermediate users than for beginners. Moreover, textual coding is definitely not the most difficult part of programming.
  2. Spreadsheets have shown the power of combining direct manipulation with symbolic access (naming) for dealing with objects.
  3. The source code should be accessible from the user interface. Web pages have this feature, which makes it easy for anyone to become a publisher by copying and pasting from any page. Another example is the Self environment [rbsmith-95], where the implementation of every object is accessible through its outliner, directly from the user interface. This contradicts the encapsulation principle, which is important in software engineering but not in a scaffolding and learning context.
  4. Meta-object protocols [kiczales91], or more generally open implementation ideas, go in a similar direction by giving access to the underlying decision level. Reflective features in the chosen scripting language should help in manipulating symbolic information.
  5. I like the idea, taken from prototype languages, of not having to declare types or classes. Optimization concerns in compilers have made such declarations common, but more and more scripting languages, like Perl or Tcl, are only string-based, which is a convenient feature. Besides, programming is not necessarily a modeling process.
  6. A good repository or library of working, realistic objects and functions with appropriate default behaviour should be available, because an already instantiated framework is easier to use than an abstract one. For example, the Web interfaces benefit from an extensive collection of installed biological software.
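Guideline 2 can be illustrated with a minimal Python sketch of spreadsheet-style evaluation. It assumes the simplest possible model: a hypothetical `Sheet` class whose cells are named and hold either a constant or a formula written as a function of the sheet. Recalculation here is naive (every read re-evaluates the formula), and circular references are not detected.

```python
from typing import Any, Dict


class Sheet:
    """A toy spreadsheet: every cell has a name and holds a value or a formula."""

    def __init__(self) -> None:
        self.cells: Dict[str, Any] = {}

    def __setitem__(self, name: str, content: Any) -> None:
        # Direct manipulation: the user simply puts something into a named cell.
        self.cells[name] = content

    def __getitem__(self, name: str) -> Any:
        # Symbolic access: formulas refer to other cells by name and are
        # re-evaluated on every read, so edits propagate automatically.
        # (A circular reference would recurse until Python raises RecursionError.)
        content = self.cells[name]
        return content(self) if callable(content) else content


sheet = Sheet()
sheet["seq"] = "ATGCGC"
sheet["length"] = lambda s: len(s["seq"])
sheet["gc"] = lambda s: sum(base in "GC" for base in s["seq"]) / s["length"]
print(sheet["length"], round(sheet["gc"], 2))  # 6 0.67

sheet["seq"] = "ATAT"  # edit one cell...
print(sheet["gc"])     # ...and the dependent formula follows: 0.0
```

Changing `seq` is enough for `length` and `gc` to follow. This is the combination of direct manipulation and naming that spreadsheets offer.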
These guidelines and existing techniques are being used as a starting point in participatory design sessions. The aim is to let biologist programmers figure out their own needs and learn from existing solutions.
The main idea is to lower the step a user needs to climb to start programming. This means accepting that programming is not necessarily software engineering or modeling, and that, for a first try, pottering is better than nothing.

References

[barroth95]
Ed Baroth and Chris Hartsough
"Visual Programming in the Real World"
Visual Object-Oriented Programming, Concepts and Environments, 1995, Prentice Hall

[burnett95]
Bay-Wei Chang, David Ungar, and Randall B. Smith
"Getting Close to Objects: Object-Focused Programming Environments"
Visual Object-Oriented Programming, Concepts and Environments, 1995, Prentice Hall

[cypher93]
Allen Cypher
"Watch What I Do: Programming by Demonstration", 1993, MIT Press

[kiczales91]
G. Kiczales, J. des Rivieres and D. G. Bobrow
"The Art of the Meta-Object Protocol", 1991, MIT Press

[mackay92a]
W. E. Mackay
"Beyond iterative design: User innovation in co-adaptive systems."
Rank Xerox EuroPARC, Cambridge, England, 1992

[myers92b-SUC]
Randall B. Smith, David Ungar, and Bay-Wei Chang
"The Use-Mention Perspective on Programming for the Interface"
Languages for Developing User Interfaces, Jones and Bartlett

[nardi93]
Bonnie A. Nardi
"A Small Matter of Programming: Perspectives on End User Computing", 1993, MIT Press

[nierstrasz95a]
Oscar Nierstrasz and Laurent Dami
"Component-Oriented Software Technology"
Object-Oriented Software Composition, 1995, Prentice Hall, pp 3-28

[rbsmith-95]
Randall B. Smith and David Ungar
"Programming as an Experience: The Inspiration for Self"
in Proc. ECOOP '95, 1995

[sagot97]
M.-F. Sagot and A. Viari and H. Soldano
"Multiple Sequence Comparison: A Peptide Matching Approach"
Theoretical Computer Science, 1997, pp 115-137

[schuler93]
Douglas Schuler and Aki Namioka
"Participatory Design: Principles and Practices", 1993, Hillsdale, NJ: LEA