Building a Language for an Interactive Experience



Interactive multimedia systems are often designed and implemented primarily through intuition, and thus interactive system developers need a common language for creating an interactive experience. This paper discusses building such a language by outlining the steps required and the principles involved in designing an interactive environment. A preliminary language is achieved by offering a working definition for interactivity, outlining the steps in the interactive process, and discussing issues around the most important component of the system – the user.


Design, experience, interactive environments, interactivity, multimedia system development, user-centered design.


“A technology is interactive to the degree that it reflects the consequences of our actions or decisions back to us. It follows that an interactive technology is a medium through which we communicate with ourselves… a mirror.” [13]

This paper discusses ideas around building a language for an interactive experience. A common language for concepts and approaches to designing interactive environments is needed because “research on multimedia system development shows that contemporary multimedia systems are designed and created primarily by intuition” [14]. To tackle the problem, this paper begins by offering a working definition for interactivity, and then proceeds to outline and discuss the steps in the interactivity feedback process, from capturing and interpreting information in the environment to designing the user experience.


The word interactivity has become a problematic term. It is used extensively to describe works of new media and interactive art, encompassing everything from simple hyperlinks to immersive virtual environments where the user navigates a 3-dimensional space. Interactivity is used to describe the reading of a novel, the menu presented to a user when playing a DVD movie, and the experience players of a Massive Multiplayer Online Game (MMOG) have when they decide to go against the game’s narrative spine and design their own quests. For this reason, interactivity needs to be understood as a term that encompasses many different levels of interaction.

In discussing the Japanese journalist and curator Itsuo Sakane, David Rokeby notes that Sakane considered any art to be interactive if we accept that viewing and interpreting a work of art is a form of participation [13]. This illustrates an important point: even though interactivity is commonly used to refer to participatory digital technologies, its implicit meaning has broader applications.

Rokeby defines interactive art as a dialogue between the interactor and the digital artwork: “The interactive system responds to the interactor, who in return responds to that response” [13]. This refers to the concept of a ‘feedback loop’ that many people associate with interactive systems, and which I believe is a very effective way of creating agency.

The extreme view is that a media object cannot be said to be interactive unless both the spectator/interactor and the media object are in some way changed permanently by each other or affected by the exchange [13]. In order for a media object to meet this requirement it would need to have enough intelligence to be able to adapt and grow according to its surroundings and user interactions. Limiting the definition to intelligent media or to where there is mutual transformation is disputable, however, as one can be said to interact with a book or magazine when they pick it up and turn the pages to read it. In a Choose-Your-Own-Adventure book (also called a Gamebook), for example, the user navigates a multi-linear story by choosing to follow particular forking paths – the reader is periodically confronted with a story decision, and is directed to a specific page number based on their choice. A Choose-Your-Own-Adventure book is an analog form and yet is definitely interactive on some level.

In his article “Against Hypertext,” Eric Zimmerman recognizes the broad usage of the term; he proposes four ‘modes of interactivity’ to help bring some clarity to the discourse [16]:

Interpretative Interactivity

Interpretative interactivity is cognitive participation with a media object. The interaction is with the ‘content’ of the media, such as the interpreted story from reading a book or watching a movie. This is the level which Itsuo Sakane was referring to, where the interaction is on the level of the “psychological, emotional, hermeneutic, semiotic, reader-response” [16]. In this domain almost anything can be considered to be interactive as long as it involves some form of viewing and interpreting.

Utilitarian Interactivity

Utilitarian interactivity is functional participation with a media object. This includes interaction with the physical, such as the weight and texture of a book, as well as the functional textual apparatuses, such as a table of contents or index. Although the manipulation of a mouse pointer on a computer screen is on a different level of interactivity, the physical movements of the mouse and button depression by the user fall under this category.

Designed Choice Interactivity

Designed choice interactivity is explicit participation with a media object. This is interactivity in the sense that it is typically used when referring to new media. Included here are participatory actions such as clicking a hyperlink, working your way through an action-adventure game, experiencing a flight simulator, etc. This is a programmed interactive experience. Designed choice interactivity involves a two-way communication system between the media object and the user, and just as the user has the power to exert authority on the media, the media also has the ability to exert its influence in the exchange. In designed choice interactivity, the user receives feedback from the media object based on their input – but the media is not just reacting, it is also manipulating the user on a functional level (through programmed limitations, constraints, guidance, etc.) and on an emotional level (impacting a user’s psychology). The user is thus transformed (psychologically speaking) by the media object, just as they themselves transform it (via the responsiveness and flexibility of the program).

When referring to interactivity throughout the rest of this paper, I am referring to designed choice interactivity. Thus, an interactive experience refers to a relationship between a user and a system where the user has agency over the system’s form and/or content, and thus actively contributes to the construction of meaning and/or influences the system’s response(s). A standard novel would therefore not be interactive under this definition because the reader has no control over the book’s form or content, whereas a Choose-Your-Own-Adventure book, while not digital, is still interactive because it allows for the user to co-construct meaning by deciding on their own story path through the branching narrative structure.


Macro-Interactivity

Macro-interactivity is cultural participation with a media object. Examples of this type of interactivity are fan culture; The Sims Exchange, a website where players of The Sims computer game can download playable game items (such as families, houses, skins, etc.) uploaded to the site by other players; the rise in popularity of the Lara Croft character from the video game Tomb Raider to become a sex symbol; and the in-game communities built up amongst players of Massive Multiplayer Online Games (MMOGs). Macro-interactivity is where “readers appropriate, deconstruct, and reconstruct… media, participating in and propagating massive narrative worlds” [16].


In order to create interactivity, both a system for knowledge representation and computer intelligence are needed. The computer must have at least some a priori knowledge about the interactive environment – basic assumptions that allow the program to focus its design. Complementary to the a priori knowledge is the contextual knowledge that the system acquires during runtime through its information-capture and decision-making algorithms. Computer intelligence is expressed by how effectively the system’s algorithms capture and interpret the sensory data and manipulate the system to produce an appropriate response for the user.

According to Robb Lovell, there are three kinds of abilities that are required for computer intelligence in the interactive arts: Perception, Reasoning, and Dexterity [11]. Perception is how well the computer understands the environment which it is trying to analyze through its sensory mechanisms, which can include video images, infrared sensors, proximity sensors, spoken words, text, UI interactions (such as mouse/keyboard input), etc. Reasoning is the computer’s ability to interpret the sensed information in a meaningful way, and to make decisions based on the acquired data. Finally, dexterity is the computer’s ability to provide a feedback response to the user/performer based on the interaction with the system. Responses involve the electronic manipulation of media, such as changes in lighting, sounds, visuals, robotics, etc.

“The issue of who is controlling whom becomes blurred. The intelligence of the human interactors spreads around the whole loop, often coming back in ways they don’t recognize, causing them to attribute intelligence to the system.” [12]

In complex, well-designed interactive systems, computer intelligence can facilitate emergent or pseudo-emergent phenomena. Emergent and pseudo-emergent phenomena provide interactive artists with dynamic responses that are generated from the program’s algorithms themselves, and produce results that go beyond the system’s predictable behaviors.


The steps that can be used to build an interactive media space are outlined by Lovell in his paper “A Blueprint for Using a Reactive Performance Space” [8]. Lovell encapsulates these categories in the term media structure. Below is a slightly modified version of Lovell’s media structure; it describes the information feedback loop that occurs between the interactor and the system in an interactive space:

  • Action: The physical phenomena, such as fingers typing on a keyboard or the movement of a person’s limbs, which are used by the system as user input.
  • Sensing: The digitization of the action, captured from a peripheral device such as a keyboard or a camera sensor.
  • Processing: The transformation of the digitized data into meaningful units of information. Processing involves the algorithmic interpretation of the input data.
  • Translation: The transformation of the interpreted information into a response decision based on the system’s control program.
  • Generation: The system’s response to the user, based on the decisions made in the translation stage.
  • Presentation: The part of the system that physically produces the media that the user experiences, such as a computer monitor or projector.

The next step in this process is the reaction of the user to what is presented, which can be treated as an action, and thus the feedback loop commences once more.
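The six stages above can be read as a pipeline in which each stage hands its result to the next. The following is a minimal sketch of that idea, not an implementation from Lovell's paper: the `Stage` class, the `run_loop` function, the raised-hand scenario, and the 150 cm threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Stage:
    """One step of the media structure (hypothetical helper type)."""
    name: str
    run: Callable[[Any], Any]


def run_loop(stages: List[Stage], action: Any) -> Tuple[Any, List[str]]:
    """Pass one user action through every stage in order.

    Returns the final presented output and the names of the stages visited.
    """
    value = action
    trace = []
    for stage in stages:
        value = stage.run(value)
        trace.append(stage.name)
    return value, trace


# Illustrative installation (assumed): raising a hand brightens a light.
stages = [
    # Sensing: digitize the physical action into a measurement.
    Stage("sensing", lambda action: {"hand_height_cm": action["hand_height_cm"]}),
    # Processing: interpret the measurement as a meaningful event.
    Stage("processing", lambda data: "hand_raised" if data["hand_height_cm"] > 150 else "hand_lowered"),
    # Translation: map the event to a response decision.
    Stage("translation", lambda event: {"hand_raised": "brighten", "hand_lowered": "dim"}[event]),
    # Generation: turn the decision into a concrete media value.
    Stage("generation", lambda decision: {"brighten": 1.0, "dim": 0.2}[decision]),
    # Presentation: produce what the user actually experiences.
    Stage("presentation", lambda level: f"light level set to {level:.1f}"),
]

shown, trace = run_loop(stages, {"hand_height_cm": 170})
print(shown)
```

The user's reaction to the presented output would become the next `action` passed to `run_loop`, which is what closes the loop.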


There are generally four categories of media that can be controlled by a computer in an interactive environment: visuals, light, sound, and mechanical systems. Technologies to sense and interpret events that occur within the interactive space include “video-based sensing, tracking systems, sound samplers, pitch detection, and analog sensors (heat, touch, bend, acceleration, etc)” [9]. Below is a discussion on one of these technologies – video-based sensing – as an example of the capture and analysis process.

Video-Based Sensing

“The environment will, as a given, be ambiguous in nature. The information contained within a video image is incomplete and limited, and since this is the computer’s view of the world, its representation of the current state of its surroundings will be unreliable.” [10]

The task of having the computer understand what is happening in the physical world is a difficult one, for items contained in a scene are just blobs to the computer. Fortunately, these blobs have a location, size, velocity, and other measurable characteristics – characteristics that can be used for interactivity. But the task is never trivial. Special effects filmmakers get around some of the obstacles of video tracking by matching points on a custom-designed wearable suit to a 3-D model in the computer, but this is not practical for a general audience. While computers today cannot approach the level of perception of the human body, they can understand the outside world in a limited way through image processing techniques, through knowledge representation, and through assumptions about the physical environment [10].

A major conceptual hurdle for artists and programmers dealing with video-based sensing is that a camera does not see the world the way humans do. Distance information is very difficult to measure, and as such, “actions that cut across the camera’s view appear different than actions that move towards or away from the camera” [9]. Lighting is a crucial factor when doing video-based sensing. You need to be very aware of the lighting conditions of the installation space, and to have as much control over them as possible. How light falls on the people and props in the space determines how they will be seen by the computer.

Because of the great challenges with live video capture, many interactive installations use programs where “responses are based upon randomized manipulations or heuristic road maps for the computer to follow” [10]. The definitive challenge, then, is to be able to take the interactivity that one step further, to create a reactive space based on highly meaningful user interactions.

Video-Based Information Extraction

Because of the challenges in getting a computer to interpret a space in the desired manner, the space must be well controlled. This means that no universal rules for interpretation can effectively be established; rather, each environment is unique, and video-based sensing is heavily determined by context.

“The person creating the means for a computer to understand part of an environment must make assumptions about the structure and content of the environment in order to create algorithms to extract information for the computer to use.” [9]

Extraction Techniques:

  1. Motion: “Motion is calculated by subtracting successive images from each other, and counting the number of pixels that have changed… Under constant lighting conditions, motion is the change in surface area of objects in the scene” [9]. An object that is closer to the camera will appear to have more motion than the same object farther away from the camera because being closer to the camera causes many pixels to be affected.
  2. Presence: To detect presence is to detect the simple presence or absence of light.
  3. Background: A common practice with video-based sensing techniques is to grab a snapshot of the background without any objects in it to use as a baseline for comparisons. This is effective at showing foreground objects more clearly provided the foreground objects are not the same color and intensity as the background.
  4. Objects: This technique tries to distinguish single entities within the camera’s view. In order to do this, the objects need to appear different in some way to the computer. The most common methods are by having high contrast objects (e.g. light colored objects against a dark background), or by color tracking. Once an object is identified, it can be measured for traits such as motion, speed, location, etc.
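The first three techniques in the list above can be sketched in a few lines. This is an illustrative sketch only, not code from [9]: frames are assumed to be small grayscale images stored as lists of rows with values from 0 to 255, and the function names, the difference threshold of 30, and the brightness cutoff of 128 are assumptions made for the example.

```python
def changed_pixels(frame_a, frame_b, threshold=30):
    """Motion: count pixels whose brightness changed by more than `threshold`
    between two successive frames."""
    return sum(
        1
        for row_a, row_b in zip(frame_a, frame_b)
        for a, b in zip(row_a, row_b)
        if abs(a - b) > threshold
    )


def is_present(frame, region, brightness=128):
    """Presence: True if any pixel inside `region` is brighter than `brightness`.

    `region` is (top, left, bottom, right), with bottom and right exclusive.
    """
    top, left, bottom, right = region
    return any(
        frame[y][x] > brightness
        for y in range(top, bottom)
        for x in range(left, right)
    )


def foreground_mask(frame, background, threshold=30):
    """Background: mark pixels that differ from the empty-scene snapshot."""
    return [
        [abs(p - b) > threshold for p, b in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


# A dark 4x4 scene, and a bright 2x2 object that moves one pixel to the right.
background = [[10] * 4 for _ in range(4)]
frame1 = [[10, 10, 10, 10], [200, 200, 10, 10], [200, 200, 10, 10], [10, 10, 10, 10]]
frame2 = [[10, 10, 10, 10], [10, 200, 200, 10], [10, 200, 200, 10], [10, 10, 10, 10]]

motion = changed_pixels(frame1, frame2)
mask = foreground_mask(frame2, background)
foreground_count = sum(cell for row in mask for cell in row)
print(motion, foreground_count, is_present(frame2, (1, 1, 3, 3)))
```

Note that a foreground pixel whose brightness is within the threshold of the background would be missed by `foreground_mask`, which is the limitation described in the Background item.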

In an article on human body tracking, Wren et al. describe how they used a combination of color and object detection to build a model for tracking people in an installation space:

“The person model is built by first detecting a large change in the scene, and then building up a multi-blob model of the user over time. The model building process is driven by the distribution of color on the person’s body, with blobs added to account for each differently-colored region. Typically separate blobs are required for the person’s hands, head, feet, shirt and pants.” [15]
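To make the idea of a blob with measurable traits concrete, the sketch below groups the "on" pixels of a foreground mask into connected regions and reports each region's size and centroid. It is a far simpler stand-in than the color-driven model that Wren et al. describe, and it is not their algorithm: the function names, the 4-neighbor connectivity rule, and the sample mask are all assumptions made for illustration.

```python
def find_blobs(mask):
    """Group truthy pixels into 4-connected blobs.

    Returns one dict per blob with its pixel count and (x, y) centroid,
    in the order the blobs are met when scanning row by row.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood fill from this pixel to collect the whole blob.
            stack = [(y, x)]
            seen[y][x] = True
            pixels = []
            while stack:
                cy, cx = stack.pop()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            size = len(pixels)
            centroid = (sum(p[1] for p in pixels) / size, sum(p[0] for p in pixels) / size)
            blobs.append({"size": size, "centroid": centroid})
    return blobs


def velocity(previous_centroid, centroid, seconds):
    """Velocity of a blob between two frames, in pixels per second."""
    return (
        (centroid[0] - previous_centroid[0]) / seconds,
        (centroid[1] - previous_centroid[1]) / seconds,
    )


mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
blobs = find_blobs(mask)
print(blobs)
```

Once a blob has a centroid in successive frames, location, velocity, and direction of travel follow directly, which is what makes these traits usable for interactivity.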


“The interpersonal, back-channel communications and ancillary activities of the audience, which currently remain largely unsensed and unprocessed, can be just as important as the primary authored experience” [3]

Human-Computer Interaction

Human-computer interaction, or HCI, is a discipline concerned with the analysis, design, and implementation of how people interact with computers. HCI is often seen as a remediation of other media, both past and present. The Macintosh’s famous GUI, for example, first introduced in 1984, uses the metaphor of an office desktop for the personal computer user interface – a metaphor that has since been adopted by Windows and has become ubiquitous across current Windows and Mac operating systems.

The study of Human-Computer Interaction is crucial to the development of a successful interface and navigation structure for interactive experiences. A major obstacle to effective design in today’s rapidly changing and expanding digital world is that too often software and hardware engineers fail to adequately address how their audience will interact with their product. One of the most important issues in bridging the gap between man and machine is developing a user interface (UI) that is intuitive and easy to learn. HCI and UI issues are important considerations in the development of an interactive experience.

Principles of User Design

The following discussion outlines five principles for user design. The first principle is that the design for an interactive system should be user-centered. As Mitchell Kapor states in his “Software Design Manifesto”:

“If a user interface is designed ‘after the fact’ it is like designing an automobile dashboard after the engine, chassis, and all other components and functions are specified” [6]

The important questions in creating a user-centered design are: Who are the users? What are the main functions that the user will need? Why does a user want to interact with the interactive system? Is the system’s interactivity accessible and understandable by users of different experience levels? What is the most intuitive way that the user could interact with the computer program controlling the environment?

One key issue that designers and programmers alike often forget (or neglect) is that interface design incorporates many different disciplines: hardware and software engineering, ergonomics, psychology, sociology, linguistics, computer science, etc. As such, the second important design consideration is integration of knowledge and experience from all of the HCI-related disciplines.

The third design consideration is that the system should be thoroughly tested before release to ensure that it contains no bugs that will inhibit its function. The quickest way to inhibit user enjoyment is to create frustration over simple interface and navigation issues.

The fourth design factor is an issue of commodity: the interface and navigation of the interactive environment should be well suited to complement the functionality of the system.

The fifth and final design consideration is less quantifiable, yet important nonetheless. The idea is that the design should be pleasurable to use, incorporating visual, aural, and functional aesthetics.

In summary, the five principles of user design that should be incorporated into creating an interactive experience are:

  1. The design should be user-centered.
  2. It should effectively integrate the HCI-related disciplines.
  3. The interactive system should be free of major bugs.
  4. The interface and navigation should be well suited to the functionality of the system, as well as the narrative structure and content.
  5. The system for interaction should be enjoyable to use.

Principles of User Navigation

The following discussion outlines five principles for user navigation. One of the most important concepts in navigation is that it should be easily learned – if it takes too long to grasp the navigational flow of a program, the user will become frustrated and lose interest. An easily learned navigation structure is the first step in creating user satisfaction. Instructions can be useful, but the navigation concepts should be intuitive enough that an excessive amount of preparation is not necessary.

“The interactive artist must strike a balance between the interactor’s sense of control, which enforces identification, and the richness of the responsive system’s behaviour, which keeps the system from becoming closed” [13]

Another important concept to help the user maintain a sense of spatial orientation is to remain consistent. From the audience perspective, this means not only a consistent look and feel to the physical interfaces, but also that the interactive presentation responds to similar sensor data in similar ways. If a user waves their arms, for example, and this is a method of interaction with the piece, the system should respond in a logical manner, and respond in the same manner if the user repeats the action (unless, of course, a sense of abstraction, confusion, or mystery is desired in the particular piece).

“The interactor waves his hand to trigger a sound. He then waves again, in a similar manner, to find out if the same sound will be triggered again. If something else is heard, the interactor may conclude that the system does not function well, or that it isn’t really interactive at all.” [5]

A feedback mechanism is crucial for effective navigation – the user needs to feel that their actions have meaning. For a participant navigating an interactive presentation, feedback could come as sound cues, visual cues, tactile cues, etc. A sound cue could be a noise generated when a user steps into a certain area of the environment. A visual cue could be that when a user steps into a certain area the presentation changes in response to the user’s position.
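The area-based cues described above can be sketched as a lookup from the user's position to a response. This is a minimal illustration under stated assumptions: the `Zone` class, the `cue_for_position` function, the floor coordinates in metres, and the cue names are all hypothetical. Because the lookup keeps no hidden state, the same position always yields the same cue, which also supports the consistency principle.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Zone:
    """A rectangular floor area that triggers a cue (hypothetical type)."""
    name: str
    left: float
    top: float
    right: float
    bottom: float
    cue: str

    def contains(self, x: float, y: float) -> bool:
        # Right and bottom edges are exclusive so adjacent zones do not overlap.
        return self.left <= x < self.right and self.top <= y < self.bottom


def cue_for_position(zones: List[Zone], x: float, y: float) -> Optional[str]:
    """Return the cue of the first zone containing (x, y), or None."""
    for zone in zones:
        if zone.contains(x, y):
            return zone.cue
    return None


# Assumed layout: a 4 m x 4 m floor split into two zones.
ZONES = [
    Zone("entrance", 0.0, 0.0, 2.0, 4.0, "play chime"),
    Zone("centre", 2.0, 0.0, 4.0, 4.0, "shift projection colour"),
]

print(cue_for_position(ZONES, 1.0, 1.0))
print(cue_for_position(ZONES, 3.0, 2.0))
```

A piece that deliberately aims for mystery or abstraction could replace the fixed lookup with something less predictable, at the cost of the user's sense of control.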

“[T]he interactor can never have an absolute control over the system. Rather, he enters into an on-going and evolving dialogue, a ‘cybernetic feed-back loop’ without a final resolution.” [5]

The navigational tools should be based on the goals of the user, meaning that they should appear in context and support the flow of the composition. In interactive storytelling, for example, it is important to have continuous interaction such that the system does not need to poll the participant for input, and where the interactivity is inherent to the story. In order for the user’s experience to be highly immersive, “the user’s interaction should be a smooth and continuous stream of input that influences the story world, much as a rudder steers a boat” [4].

Does the navigational structure support users coming from different technical or cultural backgrounds? Along with meeting the user’s goals, the navigation structure should also be appropriate and support the interactive environment. It is often desirable in an interactive installation to have a transparent interface because it allows the user to participate without having to consider their direct relationship with the underlying system, but no interface can be completely transparent. The most one can hope for is that the interface be so well integrated as to be subconsciously accepted by the interactor as transparent.

“When an interface is accepted as transparent, the user and his or her world are changed; the transforming characteristics of the interface, no longer contained by a visible apparatus, are incorporated into the user and the user’s world view” [13]

To summarize, the navigation framework that should be used in designing an interactive experience is based on the idea that the navigation should:

  1. Be easily learned.
  2. Remain consistent.
  3. Provide feedback.
  4. Appear in context.
  5. Support the user’s goals.


What kind of experience does an author wish a user to have in an interactive environment? What kind of experience does the user themselves wish to have?

“Rather than creating finished works, the interactive artist creates relationships… Rather than broadcasting content, interactive media have the power to broadcast modes of perception and action.” [13]

Blom and Chaplin discuss ideas around the experiential body of knowledge [1]; I discuss these ideas below in the context of building a language for an interactive environment.

The first concept is kinesthetic awareness, meaning that in interactive environments participants have the opportunity to explore meaning behind their own movements. For example, audience members may ask themselves some of the following questions: “What will happen to the system if I twist my body around this way?” “Is there a different effect produced if I move just my head and try to keep the rest of my body still?” “What kind of content produces what kind of effects on my kinesthetic awareness?”

“An interactive system can be seen as giving the user the power to affect the course of the system, or as interfering in the interactor’s subjective process of exploration” [12]

The second concept in Blom and Chaplin’s theory is phrasing: “All movement contains innate rhythms and phrases which provide the magic ingredients in any of the performing arts” [1]. The third concept is form. Questions the participants could ask themselves include: “What kind of responses will I get from the system if I focus on grouping my body movements into circular patterns?” The fourth concept is relating – this concept is explored by the participant discovering their relationship to the sensors, the presented material, and the medium through which the content is presented. The fifth and final concept is abstraction – because the movements are not in any way choreographed, each time a participant enters the interactive space they create a new narrative, a new unfolding, and a new experience.

Stephen Levinson talks about how “our thinking is fundamentally spatial” [7]. This concept relates to interactive environments because there is an inherent spatial element to how the user relates to the interface and sensors, especially when the interface is something more than a mouse and keyboard.

“It is just because, for us, spatial knowledge is a matter of higher-level thinking… that we can be deeply intrigued and tantalized by the artful manipulations of space by which the architect and sculptor play on our minds.” [7]

An important issue in designing a user-experience for an interactive narrative is what role the audience plays in the story or plot. The story could be designed around a first-person perspective, where all of the story elements are presented from the audience’s point of view, and the audience is a protagonist in the narrative. Some theorists believe, however, that taking a third-person perspective produces the best results. In this case, it is still beneficial to give the participant a role within the narrative to increase the immersive qualities of the experience, but one that only indirectly affects the plot. The idea of staying away from a first-person perspective may seem, at first, to contradict the goal for increased immersion, but there are other interactivity theorists who make a similar argument:

“Most designers of cyber worlds remain committed to creating first-person experiences, which immerse the participant in an unknown world of authored action and consequence, despite the limited success of this form… [T]he first-person viewpoint loses meaning as soon as the participant steps back to a more distant experience.” [3]


“Some exciting work has taken place in sensor technology for music applications and in vision algorithms that can, for instance, read sign language. However, no one has unearthed a more general, universal language for gesture, and none may be forthcoming.” [2]

This paper discusses interactivity for a novice user domain (i.e. not performers). Therefore, there are certain precautions which must be acknowledged:

  • You cannot predict what people are going to do. Therefore, you must build into the system a certain amount of acceptance that some people won’t be doing the ‘right’ things. One way to accomplish this is to limit the flexibility of the interactions.
  • If you capture a great deal of subtle variations it’s hard for the audience to know how they are affecting the system; if you have the system respond to larger and fewer variations, the audience understands better. More than 3 or 4 different things happening at the same time is too many for people to understand what effect they are having.

“It is difficult to sense interaction in situations where one is simultaneously affecting all of the parameters… The constraints provide a frame of reference, a context, within which interaction can be perceived.” [13]

  • The most reliable tracking techniques are based on detecting location, motion, velocity, and direction of travel. All of these except location are cyclical, meaning that they can reverse direction rapidly. For cyclical traits you usually have to average the values to get meaningful results.
  • You need a comfort level for movements and gestures.
  • There is typically a great deal of ambiguity to deal with; context-specific rules need to be made for each installation space. The more rules and constraints that are applied to the installation space, however, the narrower the sensory data flow into the computer.

“By increasing the amount of filtering that is applied in the perceptual process that the interactive system employs, the designer increases the reliability of the resulting information and therefore the unambiguity of control, but at the same time, the richness of that information is reduced” [13]
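The averaging of rapidly reversing traits mentioned in the list above, and the trade-off between reliability and richness in the quotation, can both be seen in a simple moving average. This is one possible filter among many, sketched under assumptions: the `MovingAverage` class, the window size of 4, and the sample velocity readings are invented for illustration.

```python
from collections import deque


class MovingAverage:
    """Average of the most recent `window` readings (hypothetical helper)."""

    def __init__(self, window: int):
        if window < 1:
            raise ValueError("window must be at least 1")
        # A deque with maxlen discards the oldest reading automatically.
        self._values = deque(maxlen=window)

    def update(self, value: float) -> float:
        """Add a reading and return the current average."""
        self._values.append(value)
        return sum(self._values) / len(self._values)


# Assumed horizontal velocity readings: jitter first, then a steady move right.
readings = [4.0, -4.0, 4.0, -4.0, 6.0, 6.0]

smoother = MovingAverage(window=4)
smoothed = [smoother.update(r) for r in readings]
print(smoothed)
```

A larger window gives steadier values, so the audience sees fewer spurious responses, but the system also reacts more slowly and discards fine detail in the movement.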


This paper has outlined concepts and ideas around building a language for an interactive experience. Interactivity in this context refers to designed choice interactivity, which is a relationship between a user and a system where the user has agency over the system’s form and/or content, and thus actively contributes to the construction of meaning and/or influences the system’s response(s).

The steps in the interactive process can be summarized as: (1) Action; (2) Sensing; (3) Processing; (4) Translation; (5) Generation; and (6) Presentation. The concepts discussed in this paper relate to steps 1 to 4. The action step was discussed in the section on user experience, and deals with how the user physically interacts with the system. The sensing step was discussed in the section on capture and analysis, and used video-based sensing as an example of the detection and interpretation process. The processing and translation steps were discussed in the section on user interaction, and deal with how the overall logic of the system’s algorithms should function at the user level.

The goal of this paper was to provide a preliminary language for interactive system developers for creating an effective interactive user experience. The hope is that the principles and processes outlined here can be used towards the creation of a formal methodology for interactivity design.


  1. Blom, L.A. & L. Tarin Chaplin. “The Experiential Body of Knowledge,” in The Moment of Movement: Dance Improvisation. London: Dance Books, 1988.
  2. Davenport, Glorianna. “Smarter Tools for Storytelling: Are They Just Around the Corner?” IEEE Multimedia Spring 1996: 10-14.
  3. Davenport, Glorianna. “Curious Learning, Cultural Bias, and the Learning Curve.” IEEE Multimedia April-June 1998.
  4. Galyean, Tinsley A. Narrative Guidance of Interactivity. Ph.D. Thesis, M.I.T., 1995.
  5. Huhtamo, Erkki. “Silicon remembers Ideology, or David Rokeby’s meta-interactive art.” Accessed Jan. 22nd 2003.
  6. Kapor, Mitchell. “The Software Design Manifesto.” mkapor/Software_Design_Manifesto.html. Accessed Nov. 8th 2002.
  7. Levinson, S.C. “Space and Place,” in Some of the Facts – Exhibition Catalogue for Antony Gormley at Tate St. Ives, 2001.
  8. Lovell, Robb. “A Blueprint for Using a Reactive Performance Space.” http://www.intelligentstage.com/papers/blueprint/Blueprint.html. Accessed Dec. 15th 2002a.
  9. Lovell, Robb. “Video Based Sensing in Reactive Performance Spaces.” http://www.intelligentstage.com/papers/VBS.html. Accessed Dec. 15th 2002b.
  10. Lovell, Robb. “Towards Computer Cognition of Gesture.” Institute for Studies in the Arts, Arizona State University. http://www.intelligentstage.com/papers/gesture%20paper.html. Accessed Dec. 15th 2002c.
  11. Lovell, Robb. “Computer Intelligence in the Theater.” cit/cit.html. Accessed Dec. 15th 2002d.
  12. Rokeby, David. “The Construction of Experience: Interface as Content” in Digital Illusion: Entertaining the Future with High Technology, Clark Dodsworth, Jr., Contributing Editor. ACM Press, 1998.
  13. Rokeby, David. “Transforming Mirrors.” mirrorsintro.html. Accessed Dec. 15th 2002.
  14. Skov, M. B., and J. Stage. “Designing Interactive Narrative Systems: Is Object-Orientation Useful?” Computers & Graphics Vol. 26 No. 1 2002: 57 – 66.
  15. Wren, C. R., Azarbayejani, A., Darrell, T., and A. Pentland. “Pfinder: Real-Time Tracking of the Human Body.” IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 19 No. 7 July 1997: 780-785.
  16. Zimmerman, Eric. “Against Hypertext.” American Letters & Commentary Issue #12 2001.


Adrian, or AJ, is the founder and Director of Technology of Pop Digital. He has spoken at tech conferences around the world, and published numerous articles about Agile methodologies, UX design, Information Architecture, and Web Development.