The Psychology of Representation

October 1969

J. J. Gibson, Cornell University

The World Wide Web distribution of James Gibson’s “Purple Perils” is for scholarly use with the understanding that Gibson did not intend them for publication. References to these essays must cite them explicitly as unpublished manuscripts. Copies may be circulated if this statement is included on each copy.

Fifteen years ago, in an effort to promote straight thinking, I suggested a general definition of a representative picture (Gibson, 1954). A picture with fidelity, as I put it, was a surface so treated that it delivered to a special point of observation in front of the surface the same sheaf of light rays as the sheaf of rays coming from the original scene to a point of observation. The picture lacked fidelity insofar as the pictorial sheaf of rays differed. There had to be a correspondence between the intensities and colors of the light-rays from the picture and those from the scene, or a point-to-point correspondence between the elements of the picture surface and the elements of a theoretical cross-section of the original ray-sheaf. Perfect fidelity was impossible, but approximations could be devised. A picture was said to yield an experience “at second hand” in comparison to the “first hand experience” obtained when facing the original scene.

This conception of a faithful picture was derived, of course, from the theory of perspective in painting and the theory of light rays in optics. The concept of a transparent picture-plane and its station-point to which a certain layout of surfaces is projected by a dense bundle of lines intersecting at the point is very old. The inverted image on the rear wall of a pinhole camera (and thence the theory of photography) follows from this concept of geometrical projection. The still more general concept of light coming in straight lines from all directions, and doing so at every possible point of observation, is the basis for ecological optics, as we shall see.

Following this first theoretical paper, a number of other studies and some experimental investigations were carried out at Cornell (Gibson, 1956; Ryan and Schwartz, 1956; Smith and Gruber, 1958; Gibson, 1960; Smith and Smith, 1961; Hochberg, 1962; Hochberg and Brooks, 1962). A full discussion of representation in art, with a survey of conflicting psychological theories of perception, was published by Gombrich (1960). After a time, it began to be evident that the proposed definition of a picture was inadequate. The difficulty hinged on the concept of fidelity as point-to-point correspondence of the pictorial array to the original array. It was not enough merely to say that a non-representative painting lacked fidelity. Surely it might convey information, in some sense of that term, without correspondence of elements. A successful caricature, for example, could be said to supply truer information about a person than a photographic portrait did.

I had assumed in 1954 that when an artist sacrificed projective fidelity in a picture the distortion could only be justified in the interests of graphic conventions or symbols. I argued that these were codes which, like words, had to be learned as arbitrary associations, agreed upon by artist and perceiver, before they could be effective. I assumed that the two possible kinds of specification were by projection (a man and his image) and by convention (a man and his name). There could be mixtures of these two types but there was no alternative way in which light rays could carry information.

This assumption about the information in light was consistent with physical optics and the doctrine of visual sensations, or sense data. The light entering the eye consisted of irreducible points of color (or at best patches of color) and these could, of necessity, carry no information about the objects and surfaces from which the light came. The distance of an object, for instance, the “third dimension,” was unspecified by the points of color projected by the object. The clues for depth had to be interpreted, either by associative learning or by innate intuition.

It is now becoming clear that the classical doctrine is wrong, and that light can and does carry information about the objects and surfaces from which it comes. The theory of gradients was a step toward this conclusion (Gibson, 1950) and the recent development of ecological optics is proof of it (Gibson, 1966, Ch. 10). The heart of this new discipline is the notion of the ambient optic array at a given point in the air, consisting not of rays but of components in a nested hierarchy of structure. This notion means that there is another kind of specification than by projection and by convention – specification by invariants of structure. If this is true, an artist can capture the information about a person, animal, object, or place by means of a picture without having to replicate the sheaf of rays. He can display the same information for perception without having to display the same stimuli for sensation. The conventional symbol is not the only alternative to the literal image.

A New Definition

I therefore want to propose a new and broader definition of a representative picture, in terms not of projective fidelity of the array but of mathematical invariants in the array. A picture is a surface so treated that it delivers an angular optic array to a point of observation in front of it that contains some of the information in a sector of the original optic array at the point of observation. A picture can provide an eye with almost the same pattern of light energies that the original array provided. This is what a photographic color transparency can do. But a picture can also provide an eye with a different pattern of stimulus energies and the same stimulus information that the original array provided. This is what a good caricature can do. Two optic arrays with the same stimulus energies at all points necessarily contain the same information. But two arrays can contain the same stimulus information without having the same stimulus energies at all points.

Note the implication that two perceptions can be the same without any necessity that the accompanying sensations be the same. This, of course, is what I have argued in asserting that a perceptual system should not be confused with a channel of sense, or a modality of sense-impressions (Gibson, 1966).

The obtaining of optical information and the having of visual sensations are distinct psychological processes. Classical theories of perception assume that the having of sensations is a prerequisite for perceiving whereas my theory assumes that it is at best only an accompaniment of perceiving, and that the direct obtaining of information can sometimes occur without the having of sensations at all. Visual sensations are a sort of luxury incidental to the serious business of perceiving the environment.

If this is true, there are different possible kinds of representation in art. At one extreme is the re-presentation by the artist of information about the world, with no attention to visual sensations. At the other extreme is the re-arousal of visual sensations with no attention to information about the world. And somewhere in between is both the re-presentation of information for perception and the re-arousal of stimulation for sensation, the case when the artist has a double aim, both to point out something to us and to excite us as it would do.

Consider the first case, the effort by some artists to extract the essential information about something from all the distracting and irrelevant non-essentials. Assuming physical optics and elementary sensations, the only way to describe this effort is to say that it tries for abstractions, as writers do, and that accordingly there is a “language” of vision (Kepes, 1944). But if we assume ecological optics instead of physical optics, and perception instead of sensation, the effort can be described as the portraying or showing of the distinctive features of something by means of invariants in the optic array from the picture. The invariants are not to be found in the light rays and color patches, and not even in the forms of these elements, but rather in the forms of form. And learning to see these subtleties of visual structure is not the same as learning to use words, as every artist knows.

We are accustomed to think of a representative picture as one made by an unselective process of registering the light-rays, either by employing a camera or following the prescriptions of perspective painting. But this seems to be a mistake. Even the photographer explores and selects. He chooses a point of observation and, of the ambient light at that point, he chooses a sample. Every time we look around the world we select a part (roughly a hemisphere) of the total ambient array, selecting the information that interests us on that occasion. We are compelled to select by sampling the light and so is the photographer.

Reversible and Ambiguous Drawings

The new definition of representation makes intelligible the cases of reversible and ambiguous drawings, those puzzling phenomena that have long interested both artists and psychologists. A line in a display, or a contour, is often said to represent an edge in the world, but this is vague, for the term “edge” has several different meanings, and it is the task of ecological optics to sort out and specify these meanings. One kind is the occluding edge (subdivided into sharp and curved occluding edges) and another is the dihedral angle edge (subdivided into convex and concave dihedral angles, and these further subdivided). The fact is that a line or contour in a display can be arranged to represent two incompatible edges at the same time, or two different things in the same place. That is, the information conveyed by the connections of the line (its relation to other lines, or the topological invariants of the display) is equivocal in the sense that it equally specifies two kinds of edge that could not coexist. The corresponding percepts of layout (or “space”) cannot coexist for the same line, and therefore the appearance of the line reverses from time to time. This explains the displays of reversible “figure and ground,” which have been known since the work of Rubin, and the displays of reversible “perspective.” In the first case it is equivocal on which side of the lines an occluding surface is represented, that is, in which direction the edge occludes. In the second case what is equivocal is the convexity or concavity of the dihedral angle or corner. The reader is invited to check this explanation for himself.

In still other cases, the connections of a line at one end provide information that contradicts the information from the connections of the line at the other end. For example, an edge may seem to occlude at one end and to be a corner at the other, or it may occlude one way at one end and the other way at the other end. These impossibilities can be noticed in the “ambiguous tuning-fork.” Or a line can be so connected up as to specify the apex of a certain corner at one end but of a different corner at the other end, the two dihedral angles being non-congruent. This holds, I think, for the rectangular impossible object illustrated. Equivocalities of this sort have been exploited brilliantly in the drawings of Escher (1967) together with other discrepancies of information not here considered.

Non-Representative Displays

The new definition of representation in terms of optical information still has reference to an original scene, that is, to part of a perceptible environment. The information referred to is information about the world. But what about displays that do not have any reference to the environment, that do not represent anything? An artist sometimes claims to be doing no more than producing optical structure as such, trying out its limitless variations, when he paints. Is he providing us with information for perception even though it is not information about? Is there such a thing as pure information, information as such? Can we have perceptions that are not perceptions of?

There are psychologists as well as artists who believe that (to use the present terminology) we must distinguish between the structure of an optic array and the causes of structure of the array. Garner, for example, assumes that the information in stimulation is its structure and that is all there is to it (Uncertainty and Structure as Psychological Concepts, 1962). This is equivalent to saying, I think, that perceiving is a matter of discriminating, distinguishing, differentiating, and not so much a matter of coping with the world.

Artists and psychologists should certainly be concerned with optical structure as such, with lines, boundaries, margins, contours, textures, patterns, forms, and the like. But we should never forget that, in the last analysis, the structure of the ambient optic array is deeply connected to the structure of the world. There exist edges, surfaces, substances, corners, curvatures, scratches, and the like in the world that correspond in special ways to the lines, boundaries, and margins in the array. The two are lawfully related. It is highly unlikely, therefore, that an artist will ever be able to draw or paint or invent a pictorial element that has no information in it for something in the world.

The Discovery of Pictorial Representation and the Learning of the Pictorial Attitude

I have argued that the origin of pictures in prehistory is comparable in importance to the development of speech in human evolution (Gibson, 1966, Ch. 11). Ice-age man and his predecessors had presumably been doodling, finger painting, scratching, or scribbling on rock surfaces and the walls of caves for uncounted generations. Man had discovered trace-making, the fundamental graphic act, just as the modern infant discovers scribbling and takes pleasure in it. It is a case of noticing the lasting results of one’s handiwork, of displaying one’s handiwork, and a special case of the eyes guiding the hand. This activity is interesting enough in itself, but when it developed, gradually or suddenly, into picture making, the discovery was momentous. The evidence suggests that this happened around twenty or thirty thousand years ago.

The primitive artist who succeeded in displaying not only his handiwork on a surface of rock but also (let us say) a mammoth must have astonished himself. He had made a mammoth appear where no mammoth existed.

He had invented a new kind of perception, perception at second hand, for the people who came to look would also see what he saw, the image, the ghost, the spirit, the form, but not the substance of an animal. The artist had invented a new way of communicating with others without having to use conventional speech or gestures. The miracle of representation, of the illusion of reality (Gombrich, 1960), was thenceforth to be handed down from generation to generation. The artificer had become the artist.

But a new method of communication was not all that developed. Along with it, I believe, went a new way of seeing the world, that is, a new way of looking at things. I will call this the pictorial attitude in perception as distinct from the naive attitude.

There is evidence to show that young children and animals do not notice the forms or aspects of objects as they appear from a stationary point of view and when motionless in space. Neither do children or animals notice the appearance of the environment as a frozen patchwork of flat colors, confined by the boundaries of the temporary field of view. Instead, they notice only the crude distinguishing features of objects that are given by invariants of transformation in time. They only perceive the rigid layout of the environment as given by the non-change in the field of view that underlies the changes. This is the naive attitude. Children and animals do not see “in perspective.” They do not experience the visual field; they detect the visual world. They do not notice visual forms; they attend to the formless invariants that specify objects. The evidence for this conclusion is summarized in Gibson (1966) and E. Gibson (1969). I believe that primitive men before the discovery of representation could only take the naive attitude toward the world. They had never noticed that a mammoth had a different appearance from in front, the side, the rear, and above. Why should they? Why should a man have noticed that an object can have the appearance of getting smaller as it goes farther away? Why should he be expected to have seen the increasing density of the items in his visual field at increasing distances of the elements of the terrain? Of what value is linear perspective? It would be useless. But as our ancestors began increasingly to see pictures they began to notice these appearances. The man who made a picture had to notice the perspectives, for he had to pay attention to the tracings on the surface. And so men began to take the pictorial attitude in viewing the environment, at least sometimes. But they had to learn to do so.

The modern child also learns to adopt the pictorial attitude for certain purposes. He is surrounded by pictures, and is encouraged to convert his scribblings into representations as soon as possible. The development of his ability to perceive by means of pictorial information is not allowed to lag far behind the development of his ability to perceive by means of natural information (E. Gibson, 1969, Ch. 18).

If I am right, then, the modern adult can take either a pictorial attitude or a naive attitude in viewing the environment, or a mixture of the two. The former tends to yield the experience of a flat patchwork of color sensations (the visual field). The latter tends to yield a direct awareness of the layout of surfaces and the features of things (the visual world). The combination of the two attitudes yields the phenomenal experience of objects whose sizes and shapes and colors are not “constant,” and of a world whose space is not Euclidian. The perceptions are compromises between what would be expected from the stimulation on the retina considered as an image and from the invariant information in the stimulus flux considered as a complex of relations.

Similarly, the modern adult can take either a pictorial attitude or a naive attitude in viewing a picture, or a combination of the two. The former means paying attention to the artist’s handiwork, to the picture as such, the medium, the technique, the style, the surface, and the way the surface has been treated. The latter means paying no attention to the medium but only to the information for the perception of what is represented. It is easy to shift from one attitude to the other and some pictures almost compel us to go back and forth, from the surface to the information, from the object to the virtual object.

Ordinarily, the viewer of a picture can perceive both the picture as a thing and the thing pictured. There are two levels of perception, as it were, and two corresponding levels of space perception, one being the space in which the picture lies and the other being the space in which the objects pictured lie. If you place a photomural showing a road and trees on the wall of a room and set an observer at the proper station point (fixed by the focal length of the camera times the degree of enlargement) you can get him to judge how far away a tree is (“a hundred paces”) and how high it is (“twenty feet”). Now if you ask him to judge how far away the picture is and how high it is he might reply “three paces” and “four feet.” But the two spaces are not continuous with one another, and they are not commensurate. The ambient optic array from the room specifies the location of the picture. The angular optic array from the picture specifies the location of the tree.
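The rule for the station point stated above (focal length times degree of enlargement) can be checked with a small calculation. What follows is only a sketch under simplifying assumptions that are not in the text: an idealized pinhole camera, a tree standing upright and centered on the optical axis, and invented numbers for the lens, the enlargement, and the tree. The function names are likewise hypothetical.

```python
import math


def station_point_distance(focal_length_mm, enlargement):
    """Viewing distance (mm) given by the rule: focal length times enlargement."""
    return focal_length_mm * enlargement


def visual_angle_deg(size, distance):
    """Angle subtended at the eye by a centered extent at a given distance."""
    return math.degrees(2 * math.atan(size / (2 * distance)))


# Hypothetical values chosen only for illustration.
focal_length_mm = 50.0   # assumed camera lens
enlargement = 40.0       # assumed enlargement from negative to photomural
tree_height_m = 6.0      # assumed tree in the original scene
tree_distance_m = 75.0   # assumed distance from camera to tree

# Pinhole model: image height on the negative is f * H / D.
image_on_film_mm = focal_length_mm * tree_height_m / tree_distance_m
image_on_print_mm = image_on_film_mm * enlargement

viewing_distance_mm = station_point_distance(focal_length_mm, enlargement)

scene_angle = visual_angle_deg(tree_height_m, tree_distance_m)
picture_angle = visual_angle_deg(image_on_print_mm, viewing_distance_mm)

print(f"station point: {viewing_distance_mm / 1000:.1f} m from the picture")
print(f"angle of tree in scene:   {scene_angle:.3f} deg")
print(f"angle of tree in picture: {picture_angle:.3f} deg")
```

Under these assumptions the pictured tree subtends the same angle at the station point as the real tree did at the camera, which is why the observer can judge the distance and height of the tree in the space of the picture while the room specifies quite different values for the picture itself.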

The consequence of being able to take the pictorial attitude toward the environment is the ability to experience visual sensations. These are curious and interesting experiences, worthy of all the investigation they have received. To assume, however, that we get nothing from our eyes but these sensations is quite wrong.

But What about the Illusion of Reality?

Gombrich (1960, p. 206) repeats a story of Pliny about the success of Greek painting, to the effect that Zeuxis painted a bunch of grapes so realistically that birds came to peck at them. And Parrhasius even painted a curtain so deceptively that his rival tried to lift it. Pictures that could “fool the eye” were celebrated, especially after the prescriptions for painting in perspective were worked out in the 15th century. Stories of this kind are popular, and the effort to make an image completely lifelike still fascinates us. It is a continuation of the legend of Pygmalion. A perfect picture should be one where the image is indistinguishable from the reality.

I suspect that all such stories are pardonable exaggerations. For the hard facts are that a painting or a photograph can have projective fidelity to the original scene only for one eye (not two), only when the picture and the original are delimited by apertures, only when the eye is stationary (not moving) and only when the eye is at a unique point (the station point). The laws of perspective for an observer looking through a peep-hole are entirely different from the laws of perspective for an ordinary observer who can look around and move around. No picture has ever fooled an ordinary observer into thinking he was face to face with a person, or that he was actually visiting a distant place.

What is the Contribution of the Perceiver to Perception?

We cannot hope to understand pictorial perception without having a theory of ordinary perception. Most of the theories of ordinary perception assert either that we learn to supplement our sensations with memory images or that we have inborn ideas of form and space with which to construct the world from sensations. There has to be a major contribution of the perceiver himself to the perceptual process inasmuch as the sensations are not sufficient for perception. The present theory of perception, based on the hypothesis of available information in the flowing sea of stimulus energy, asserts that the perceptual process is not primarily one of supplementing or constructing but of selecting. If so, the contribution of the perceiver is only one of paying attention, of looking, of exploring, of adjusting the eyes, and of detecting the “formless invariants.”

Information of this sort does not consist of signals that have to be interpreted, nor of data that must be supplemented from a storehouse of knowledge. I am suggesting nothing less than the hypothesis that meanings are not subjective contributions but objective facts. I prefer to call the meanings of things their affordances, that is, what they afford the observer. And the meanings or values of things in this sense are perceptible properties of things, like their shape, layout, composition, and color, although properties of very much higher order. But they are given as information in the light to the eye. This is a radical hypothesis, and the theory is a radical theory.

Gombrich, in trying to define what he calls the “beholder’s share” in the act of perception, speaks for the classical theories of sense-perception. He has recently tried to persuade us, as he has done before, that imagery is a necessary ingredient of perception. “Visual evidence never comes neat, as it were, unmixed with imagination” (Gombrich, 1960, p. 56). But I can argue that memory images, like sensations, are at most accompaniments of perception, not ingredients. I can argue that the important effect of learning and memory on perception is to educate the attention, not to recall the memories of all past perceiving. The evidence that Gombrich has marshaled so persuasively for the conclusion that memories are part and parcel of perception is evidence to show that different perceivers get different percepts from the same array, and that the same perceiver may get different percepts from the same array at different times. But this is not evidence to prove that imagery is a necessary ingredient of perception. For “the same array,” according to the new optics, may contain multiple information, ambiguous information, equivocal information, or information that is not noticed on the first exposure. And if this is true we need a theory of information-based perception in place of the theories of sensation-based perception.

References

Escher, M. C. [Title?]. Meredith Press, 1967.

Garner, W. R. Uncertainty and Structure as Psychological Concepts. Wiley, 1962.

Gibson, E. J. Principles of Perceptual Learning and Development. Appleton-Century-Crofts, 1969.

Gibson, J. J. The Perception of the Visual World. Houghton-Mifflin, 1950.

Gibson, J. J. A theory of pictorial perception. Audio-Visual Communication Review, 1954, 1, 3-23.

Gibson, J. J. The non-projective aspects of the Rorschach experiment IV. The Rorschach blots considered as pictures. Journal of Social Psychology, 1956, 44, 203-206.

Gibson, J. J. Pictures, perspective, and perception. Daedalus, 1960, 89, 216-227.

Gibson, J. J. The Senses Considered as Perceptual Systems. Houghton-Mifflin, 1966.

Gombrich, E. H. Art and Illusion: A Study in the Psychology of Pictorial Representation. Pantheon, 1960.

Hochberg, J. E. The psychophysics of pictorial perception. Audio-Visual Communication Review, 1962, 10, 22-54.

Hochberg, J. E. & Brooks, V. Pictorial recognition as an unlearned ability: A study of one child’s performance. American Journal of Psychology, 1962, 75, 624-628.

Kepes, G. The Language of Vision. Geo. Theobald, 1944.

Ryan, T. A. & Schwartz, C. Speed of perception as a function of mode of representation. American Journal of Psychology, 1956, 69, 60-69.

Smith, O. W. & Gruber, E. Perception of depth in photographs. Perceptual and Motor Skills, 1958, 8, 307-313.

Smith, P. C. & Smith, O. W. Ball throwing responses to photographically portrayed targets. Journal of Experimental Psychology, 1961, 62, 223-233.