We are building a series of prototypes to learn if abstract representation of activity data does indeed convey a sense of remote presence and does so in a sufficiently subdued manner to allow the user to concentrate on his or her main activity. We have done some initial testing of the technical feasibility of our designs. What still remains is an extensive effort of designing a symbolic language of remote presence, done in parallel with studies of how people will connect and communicate through such a language as they live with the AROMA system.
Having shifted the focus away from work, we are also ready to broaden our prospective usage domain beyond the workplace: enhancement of social awareness over geographical distances is certainly of interest to people outside of working life. One can easily imagine useful scenarios, such as the care of elderly relatives, or situations where you are travelling and want to stay in closer touch with your loved ones than current telephone technology allows. By extending the usage domains we face hard questions, such as balancing privacy against availability and choosing capture and display techniques that fit into, and work in, settings likely to be more heterogeneous than the traditional office environment.
The phenomenon we are after may be the "preattentive processes" described in the Oxford Companion to The Mind under the entry Vision: early warning: "a preattentive (process) for the almost instantaneous detection of textural changes in our environment that indicates the occurrence of objects, and an attentive (serial) one that can shift focal attention to any of the objects detected by the preattentive process". The phenomenon is related to those of subliminal perception and intuitive conduct, and further studies of a more theoretical nature may prove useful in our design.
People have an amazing ability to make sense of even very few and scattered snippets of information - just think of the hunter who is reading the ground for traces of animals passing by. At the same time, the skill of reading is an acquired competence. For most of us who are not hunters, the ground would tell us absolutely nothing about the passage of a deer some hours ago.
Most people have developed skills in reading the environment; perhaps not the set of skills used by the hunter in the forest, but others more appropriate for the everyday needs of the individual, such as the awareness of the neighbors maintained by the urban citizen through the lively soundscape of the apartment building and the neighborhood. Those of us who work in offices with visual and auditory closeness to our colleagues know the efficiency of peripheral awareness: most of the time we have a pretty good idea about who is around, who is having a meeting with someone from outside, and who is frantically trying to get a paper out in half an hour and therefore should not be disturbed.
For the "reader" there is a balance between learning about the environment and attending to whatever is one's primary activity. Examples of the affordances for the reader, and their price tags, are:
+ being prepared when approached (please note that reading most likely takes place as a peripheral process)
+ a sense of when others may be approached
+ a sense of having company, not being alone
- risk of interruption if the events feeding the peripheral awareness slide into focus
One example: it would sometimes be nice to have a better sense of what is going on at the callee's site before the telephone call is made, e.g., whether the time is appropriate for an idle chat or a serious conversation.
For the people being "read" by other users there is a balance between making oneself available and preserving one's privacy and personal integrity. Examples of the affordances for the one being read, and their price tags, are:
+ being able to "announce" one's availability
- risk of accidental revelation of personal/private information if events not meant to be public are "overheard"
- sense of violation of personal integrity when "too much" is available for others to hear, see, etc.
One example: it would sometimes be nice if people wouldn't call us when we are busy doing something important. Usually our body language would be very easy to read, provided it could be communicated to those who might think of calling us.
Two people who know each other well and work closely together have become geographically separated for an extended period of time. They try to stay in touch through the usual technologies such as telephone and email, and in addition they have established a kind of media space to share. The media space is organized as a pair of windows on their workstations, each displaying abstract visual and auditory effects that together reflect the state of affairs at the remote site.
The visual effect could look like an abstract, dynamic painting in which the dynamics reflect the changes in the combined auditory and visual state of the remote site (as it would be picked up by, say, a microphone and an ultrasound sensor); the auditory effects could be created as the sound landscape of a forest: audio events and processes could be structurally analyzed and processed, or they could simply be mapped directly into waterfall sounds, bird song, the sound of a chain saw against fir trees, etc.
The display of presence data may be characterized by its abstractness with respect to the fullness of the original source of the signals. By "abstract" we mean the amount of data removed from the original signal; the more we throw away, the more abstract our display becomes.
However, another kind of abstractness is at play too: as we process and transform the original signal, we may need increasingly more interpretation to "read" the display properly. Here the abstractness relates to the immediacy of the reading.
The display representation can be enriched semantically by more extensive processing at the capture site, through recognition of certain high-level objects in the captured data, as well as through identification of patterns in series of events. We see these as different approaches which may be used alone or in combination. The object recognition approach lends itself to recreation of the original scenery, whereas the other -- which is our current preference -- is directed more towards the creation of symbolic representations of the scenery. When combined, high-level objects would be used to enhance the recognition of patterns in the event data. However, such complex capture-site processing is not yet covered by our initial prototyping.
The prototype also provides facilities for experimenting with remapping across media: for example, what was originally captured as an auditory input (for instance by a microphone) can be processed/abstracted through a number of pipelined media manipulation modules resulting in streams of abstract "activity data". These abstract data are no longer linked to any specific media and can be used to control many different display types.
In a (much too) simplistic form we may think of a simple binding of auditory change to color and visual changes to speed of a slowly evolving display scene. Slightly more complex is a setup where audio is remapped into simple state data which in turn may be used as parameters to a visual display mechanism: an audio signal is picked up by a microphone and used to determine the number of people in a location, but no detailed audio information is transmitted to the remote location(s); on the display site the data is used to select the number of animated cartoon characters.
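The audio-to-cartoon-characters remapping above can be sketched in a few lines. This is a hypothetical illustration, not code from the prototype: the threshold values and function names are invented, and a real system would estimate occupancy from a richer audio analysis than a single level reading.

```python
def estimate_occupancy(audio_level_db):
    """Map a captured sound level (dB) to a rough number of people.
    Thresholds are invented for illustration only."""
    if audio_level_db < 30:
        return 0          # near silence: nobody around
    elif audio_level_db < 45:
        return 1          # one quiet person
    elif audio_level_db < 60:
        return 2          # a conversation
    else:
        return 4          # a lively group

def characters_to_draw(audio_level_db):
    """Only the abstract head count crosses the network; the display
    site then selects that many cartoon characters to animate."""
    return ["cartoon_character"] * estimate_occupancy(audio_level_db)

print(len(characters_to_draw(50)))  # conversation-level signal -> 2
```

Note that no detailed audio ever leaves the capture site: the display end sees only a small integer, which is the privacy-shielding property discussed below.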
We have found it useful to differentiate between the intentional awareness of others one may seek for the purpose of deciding if they can be approached, versus the unintentional awareness that one may maintain about others in one's surroundings for no direct purpose at all. A particular system may support intentional awareness while being useless with respect to the unintentional variety. An example of such a system would be a media space where you have to keep a button pressed to see and hear the remote site; such a system could be useful for determining whether someone looks like he or she is available for an interruption.
Well, we are exploring an assumption about the benefits of abstract representation over direct media transfers: we propose (1) that abstract representations will provide a kind of "shielding" for the privacy of the people in the spaces, (2) that abstractions may be preferable to more media-rich representations by providing a better peripheral, non-attention-demanding awareness, and (3) that abstraction is a painless accommodation of our perpetual bandwidth shortage (there will always be more items to transport over the net than we have capacity for). Furthermore, we find abstract representations particularly interesting because (4) they lend themselves directly to media remapping, allowing each user to choose the display medium that is most effective and, in general, accommodating individual preferences (some people hate visual cues and like auditory ones, while others have the opposite preference).
These assumed advantages of abstract representation need to be assessed in the context of long-term use, including the effort it may take to get them internalized initially.

Figure 1: Charting our design space according to the relative concreteness/abstractness of the representation of captured signals (x-axis), and the location of the display device with respect to the traditional computing system (y-axis)
The Sal scenario from Weiser's paper on Ubiquitous Computing [18], in which a window pane is used to display the recent traffic in the neighborhood, points to several key aspects of our design: the use of non-computer screens, the use of history and abstract representations of people's movements, and the social, non-work purpose of the (imagined) installation.
Hiroshi Ishii and Natalie Jeremijenko led us to play with the concept of free remapping across media, i.e., what was captured as an audio signal may be abstracted into a media-independent activity measure and later synthesized into a different medium. In his visionary video from 1994 [11], Ishii shows us a painter and a flute player performing together with music and paint: the music is audible but also mapped into an active painting that evolves and intertwines with the painter's more traditional paint strokes. Natalie Jeremijenko was an artist in the Xerox PARC PAIR program (PARC artist in residence); she created the installation "Dangling String", a short piece of ethernet cable hanging suspended from the ceiling; the cable moves, waving calmly or shaking violently, relative to the traffic load on the local computer network (see also the description in [19]).
As the design for awareness evolved it became clear that the contextual awareness provided by full media representations might lead to unwanted revelations, and many experiments have been done on controlled muffling or distortion of the signal.
An interesting extraction technique for audio was described by Smith & Hudson [16] as "Low Disturbance Audio For Awareness and Privacy in Media Space Applications". What they do is process a speech signal into non-speech audio, resulting in a sound that "(...) allows one to determine who is speaking, but not what they are saying, and which is not demanding of attention and hence can fall into background noise". Something similar to what Smith & Hudson did to audio signals can of course be done to video too. We can analyse the video signal of a scene and select some characteristic visual features to preserve while others are abstracted out. We prefer to call these abstraction techniques "silhouetting" because they have certain parallels to that old art of portraiture.
Having extracted certain high-level features it is also possible to use this information at the display site, for instance to control the behaviors of avatar-like characters [1, 12].
The two approaches clearly tap into different sides of human perception. Silhouetting relies on human perceptual faculties to "fill out" a few missing elements, whereas we are tapping into our symbolic abilities. Our approach is more risky: when symbolic and abstract representations really work, they are immensely powerful and efficient, but when they fail, we are left with something entirely unintelligible.
In the first round we mostly aimed at understanding the technical feasibility of our ideas. However, we have had some preliminary usage experience with the prototype which is installed between an office and a home.
In this section we describe an architecture for a generic "awareness system", and then describe the components and main processing of our current prototype within this generic architecture.
The input devices can be microphones, video cameras, or simpler special-purpose sensors of various kinds. Sample sensors are pressure sensors, ultrasonic sensors, and simple binary on/off sensors (switches).
Each input device is tied to a timer controlled object, called a capture object; the timer operates at a sampling rate appropriate for the specific device. Each capture object interfaces with the rest of the system through a circular buffer used to store the most recently captured data.
These buffers of the capture objects are available to so-called abstractor objects, doing basic signal processing, accumulations, and comparative analyses (such as history processing). An abstractor object is defined by a specific process performed on one or more (capture or abstractor) objects and possibly the recent history of the abstractor. This recent history is represented by a circular buffer of recent processing results. The data contained in this buffer can be shipped "as is" to the remote sites or used as input to other abstractors.
An abstractor can make use of data from more than one (capture or abstractor) object and more abstractors can make use of the same (capture or abstractor) object.
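The capture/abstractor pipeline described above can be sketched as follows. This is a minimal illustration, not the prototype's C++ code: class and parameter names are our own, device I/O is stubbed out, and a real capture object would be driven by a timer at a device-appropriate sampling rate.

```python
from collections import deque

class CaptureObject:
    """Wraps an input device; keeps only the most recently captured
    data in a circular buffer (old samples drop off the far end)."""
    def __init__(self, size=64):
        self.buffer = deque(maxlen=size)
    def sample(self, value):
        # In the real system a timer polls the device; here we push
        # a value by hand.
        self.buffer.append(value)

class Abstractor:
    """Defined by a process over one or more (capture or abstractor)
    buffers; keeps its own circular buffer of recent results, which
    can be shipped to remote sites or feed further abstractors."""
    def __init__(self, sources, process, size=64):
        self.sources = sources
        self.process = process
        self.buffer = deque(maxlen=size)
    def step(self):
        result = self.process([list(s.buffer) for s in self.sources])
        self.buffer.append(result)
        return result

# Example: an abstractor computing the change between the two most
# recent samples of a microphone's input level.
mic = CaptureObject()
delta = Abstractor([mic], lambda bufs: abs(bufs[0][-1] - bufs[0][-2])
                   if len(bufs[0]) >= 2 else 0.0)
for level in (0.2, 0.5, 0.4):
    mic.sample(level)
print(round(delta.step(), 2))  # |0.4 - 0.5| -> 0.1
```

Because an abstractor's sources may themselves be abstractors, arbitrary processing pipelines can be composed from these two object types.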
The rates at which data are shipped are chosen to fit the characteristic time of the abstractors balanced with the available bandwidth. An exception to this rule is those abstractors that analyze history to identify complex events: they ship their results whenever ready.
Possible output devices are speakers, displays/projectors, and a whole range of transducers that produce elements of haptic and kinetic response, etc. A sample transducer could be an electromechanical vibrator in the seat or back of a chair or a thermoelectric device to control the heat on parts of a work surface.
Incoming messages are dispatched using the message type to a set of synthesizer objects. Each synthesizer object is responsible for a particular abstract representation, i.e., a mapping from presence data to some display method. Typical synthesizer tasks are transformation of the incoming data to fit the dynamic range of the specific display device. Each synthesizer can make use of several different types of data from the remote site and the same data can be delivered to a number of synthesizers.
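A sketch of the display-site dispatch just described, under assumed names: each synthesizer maps normalized presence data into the dynamic range of one display device, and one message type can fan out to several synthesizers.

```python
class Synthesizer:
    """Maps incoming presence data onto one display method; a typical
    task is rescaling to the device's dynamic range."""
    def __init__(self, out_min, out_max):
        self.out_min, self.out_max = out_min, out_max
    def render(self, value):
        # value is assumed already normalized to the range 0..1
        return self.out_min + value * (self.out_max - self.out_min)

# Dispatch table keyed by message type; the same data is delivered
# to every synthesizer registered for that type.
synthesizers = {
    "bustle": [Synthesizer(15, 45),    # e.g. handrest temperature, deg C
               Synthesizer(0, 100)],   # e.g. merry-go-round speed, percent
}

def dispatch(message_type, value):
    return [s.render(value) for s in synthesizers.get(message_type, [])]

print(dispatch("bustle", 0.5))  # [30.0, 50.0]
```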
An important class of synthesizer objects are what we call abstract animations. Our initial intuitions, confirmed in our experimentation, suggest (a) that discrete signals put higher demands on attention than continuous signals, and (b) that although a monotonous signal may demand little attention, it may also demand too little, making the signal too easy to ignore. This made us focus on the visual display of animated objects, whose dynamic characteristics include moving around in certain patterns and changing appearance in shape, color, and size. By tying some dynamic characteristics to presence data and others to simple timers, we are able to create imagery that is neither too monotonous nor too abrupt. We are aware that adding dynamics unrelated to remote activity may add to the difficulty of interpreting the abstract representations, and we need to study this issue further.
The most recent setup is inspired by "inner office windows" which allow the office inhabitant to stay aware of activities in the immediate surroundings [18]. The mediated surroundings could be the offices of close colleagues and/or the living rooms of close friends and family members. This is also the prototype we have used most extensively in experiments. It demonstrates crucial elements of particular abstract representations and media remapping. Figure 2 illustrates the configuration of this prototype.
The hardware in this prototype consists of two Power Macintosh 8100AV machines with built-in speakers and greyscale Connectix QuickCams attached, and a National Instruments multifunction interface card with a set of A/D and D/A converters; this device controls the temperature of a handrest (keeping it within the range of 15-45 °C using Peltier elements) and an electromechanical merry-go-round (15 cm diameter). The code is written in C++, using the Apple QuickTime and Apple Game Sprockets libraries.
The capture site consists of two capture objects, one for each device, and three abstractor objects: one calculates the differences between consecutive video frames, another calculates the difference between consecutive samples of the sound input level, and a third combines the data generated by the other two and creates a compound value, the "bustle" factor.
We use the "bustle" factor as input to four different synthesizer objects on the display site: it determines the rotation speed of the merry-go-round, it sets the current sound level in a seashore soundscape, it is mapped into the temperature of a surface used as a handrest, and finally, it sets the speed of drifting clouds on a display. The display is also controlled by the sound level differences, which determine the shades of gray used when painting the clouds.
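The capture-site computation of the "bustle" factor might look roughly like the following. This is a sketch under stated assumptions: the paper does not specify how the video and audio differences are combined, so the equal weighting here is invented, as are the function names.

```python
def frame_difference(prev_frame, frame):
    """Mean absolute pixel difference between consecutive video frames
    (frames shown here as flat lists of pixel intensities)."""
    return sum(abs(a - b) for a, b in zip(prev_frame, frame)) / len(frame)

def bustle(video_delta, audio_delta, w_video=0.5, w_audio=0.5):
    """Compound activity value shipped to the display site; the
    weighting is an assumption, not taken from the prototype."""
    return w_video * video_delta + w_audio * audio_delta

vd = frame_difference([10, 10], [14, 10])  # mean pixel change = 2.0
print(bustle(vd, 1.0))                     # 0.5*2.0 + 0.5*1.0 = 1.5
```

Only this single compound value needs to cross the network, where the four synthesizers each rescale it for their own device.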

Figure 2: Current AROMA prototype, virtual "inner office windows"
Learning curves: Our users encountered problems in learning how to decipher the abstract representations, in particular when the user had not herself designed the particular mapping from capture data to display data. One of our users liked to have a full video representation next to the AROMA display, though she suggested it might only be needed during initial training.
History and memory: Since we are trying to support peripheral awareness we cannot expect people to constantly monitor the various display elements. While using our first prototype, it soon became clear that our activity representations were too volatile for the occasional glance: representations of events that would be important to know of disappeared and left no trace behind. Some kind of memory was needed to sustain the display of activity bursts. We looked around for ways to somehow "stretch time" to facilitate a view into the most recent past, rather than just providing a snapshot. We discovered that the active objects we used for displays offered a natural or inherent inertia: the surface temperature changed only slowly, allowing the user to feel the recent activity, and the motor controlling the merry-go-round did not stop immediately when the activity level dropped to zero. In general, we are often able to utilize the inherent relaxation time of mechanical, hydrodynamic, or thermoelectric systems/transducers as the vehicle for displaying history. We also found that abstract representations in general (i.e., not only the active objects, but also the various visual representations) seem to lend themselves readily to history representations (as exemplified by color-fading mechanisms and ghostly outlines of earlier states).
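The inherent inertia observed above can be imitated in software for displays that lack it. A minimal sketch, assuming a simple first-order relaxation (the constant `tau` is illustrative): the displayed value drifts toward the current activity level instead of tracking it instantly, so a burst leaves a fading trace.

```python
def relax(displayed, target, tau=0.8):
    """One time step of first-order relaxation toward the target;
    a larger tau gives a longer memory of recent activity."""
    return tau * displayed + (1 - tau) * target

# A single burst of activity followed by silence: rather than dropping
# to zero at once, the display fades gradually, like a cooling surface
# or a merry-go-round spinning down.
display = 0.0
trace = []
for activity in [1.0, 0.0, 0.0, 0.0]:
    display = relax(display, activity)
    trace.append(round(display, 3))
print(trace)  # [0.2, 0.16, 0.128, 0.102]
```

This mirrors the relaxation behavior of the physical transducers: the Peltier handrest and the motor effectively compute such a decay for free.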
Art and aesthetics: Finally, we found that a lot remains to be done on the aesthetics: we experienced how we (the designers) soon grew tired of the abstract displays we had chosen, and rather than "blaming" the very idea of using abstract representations, we suggest that we could benefit immensely from involving the appropriate artistic and communicative expertise in our work.
We are aware that we are reporting some very early findings, and we have to allow for serious problems to be uncovered in the practical use of abstract representations. Perhaps even more important, we need to incorporate a wide range of skills and knowledge in designing and evaluating what may be thought of as an abstract symbolic language of presence, proximity, and reticence.
The entire research field of social awareness in work and non-work settings seems wide open and rich in fascinating opportunities for design and invention. We would like to see collaborations in areas such as basic research into human perception and socializing patterns, design work on interaction and on the integration of awareness systems with other media space components, and, finally, technical work on signal processing, networking, etc.