Gibson’s 1972 revised theory of pictures

October 1972

On the Nature of Pictorial Representation

J. J. Gibson, Cornell University

The World Wide Web distribution of James Gibson’s “Purple Perils” is for scholarly use with the understanding that Gibson did not intend them for publication. References to these essays must cite them explicitly as unpublished manuscripts. Copies may be circulated if this statement is included on each copy.

What do we mean by pictorial representation? Just what does a pictorial representation do? There is a very old assumption that a faithful picture of an object is one that resembles the object, or is similar to it. There is also a very old notion that the resemblance of a picture to its object is explained by a point-to-point projective correspondence between them. This latter is closely related to the assumption that the retinal image is a picture of the object, in accordance with the still current and accepted theory of image optics that stems from Kepler. And finally there is the assumption that the idea of an object, depending on a special sort of projection of the retinal image to the brain, has the same relation to the object that a picture does. These four assumptions all go together. But I believe that all of them are mistaken, and I reject all of them.

Considering them one by one, I will try to show how they are incorrect. Then I will suggest an alternative, a general theory of optical information available in pictures. This theory is based in optics but not on the orthodox theory of image optics.

1. A faithful picture of an object is one that resembles it. This assumption is superficially plausible and, in one form or another, has been widely believed from the time of the ancient Greek thinkers. A good picture is similar to what it is a picture of. A bad picture is one that is not sufficiently similar to what it is a picture of. It must have the same form as the object and it should also have the same color if it is to be really faithful. The best picture is a copy of the object, a replica or simulacrum. The image of a man, his portrait, is said to be like him, and one who portrays strives for a likeness.

A believer in this assumption may be at a loss if required to say what he means by resemblance, similarity, or likeness. But he can give an example. He can reply that the name of an object does not resemble the object, or not in the way its picture does. The name is not “iconic”. He has a point, of sorts. A photograph of one’s wife, carried around in a wallet, is similar to her in a way that her name is not. The shadow of an automobile specifies it in a way that its license-plate does not.

But the assumption, despite its plausibility, is surely mistaken. In the first place, and most obviously, a rectangle of paper does not resemble a wife and a shadow does not resemble an automobile. I will return to this point later. Less obviously and more deeply, a caricature of a person may not even resemble the person in the way that a photograph does and yet it may specify the person, that is, it may show us his distinctive features. Similarly a cartoon drawing of a Volkswagen may convey more information about its peculiarities than a photograph could. Even more crucial is the objection that a picture can be made to specify an object that does not exist and has never existed. In this case the question of the resemblance of a picture to an object does not even arise, since there is nothing for it to resemble. I am using the term object to mean a topologically closed or semi-closed surface that reflects or emits light, the usage of a perceptionist, not a philosopher. A whole layout of reflecting surfaces, such as a room or a street, will be called a place, which includes both objects proper and a surface of support, a ground. A picture, therefore, cannot possibly have the same form as an object, and still less can a plane.

2. A faithful picture of an object or a place is to be understood as a projection of it on a plane surface by a sheaf of light rays intersecting at a common point such that for every color-point on the former there is a color-point on the latter. This assumption is more explicit and exact than the first one since it appeals to the mathematical notion of point-to-point correspondence and thus can be said to explain the resemblance. It generalizes to the theory of perspective geometry when the plane is taken to be a transparent picture-plane analogous to a window and the point of intersection is taken to be a station point where an eye could be positioned. It can be still further generalized to the mathematical discipline of projective geometry if the notions of the window and the eye are dropped. And thence it can lead to all the abstract geometries of a pure transparent space with ghostly planes and forms instead of surfaces.
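The correspondence that this assumption describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the station point is placed at the origin, the picture-plane at a distance d in front of it, and the function name and coordinates are hypothetical.

```python
def project_to_picture_plane(point, d=1.0):
    """Central projection of a 3-D point onto the plane z = d.

    Assumption: the station point (the eye) sits at the origin and
    looks along the +z axis. The ray from a scene point to the
    origin crosses the picture-plane at exactly one picture point.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the station point")
    scale = d / z
    return (x * scale, y * scale)


# Two scene points that lie on the same ray through the station point.
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)

near_image = project_to_picture_plane(near)
far_image = project_to_picture_plane(far)
```

Both scene points land on one and the same picture point, so when surfaces are opaque only the nearer of the two can supply a color-point to the picture.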

This assumption has been powerful and productive in the history of geometrical thought. But I believe that it has led to hopeless confusion in the history of the theory of depicting. Its mathematical elegance has prevented us from examining the muddles and contradictions to which it leads when point-to-point correspondence is taken to be the basis of vision.

For example, this assumption implies that the far side of an object and the background surface behind it cannot be depicted since no corresponding color-points for them exist in the picture. If surfaces are opaque, as they generally are, only the unhidden surfaces of the environment can be portrayed for only they are “projected” (unhappy term) to the point of intersection of the light rays. But I have evidence to show that the far side of an object and the background behind can be depicted in the sense that they can be specified in a picture. The specification of one surface behind another can be quite adequate in a still picture and can become very precise in a motion picture. There are invariants in the structure of an optic array that make the relation of in-front/behind clear and unequivocal. But these invariants are formless and are not encompassed by ordinary geometry.

The notion of point-to-point projective correspondence simply does not apply to the relation between an object (or a place) and its picture. It applies only to the relation of a given plane form to a perspective transformation of it, as in projective geometry. That is, it applies only to the relation of one picture to another, including the special case of a duplicate or copy, the identity transformation. Hence the truth of a picture, or the information it provides an observer, cannot depend on projective correspondence of points.

3. The retinal image of an object or a place is a projective picture in the above sense, differing only in being of the inverted type found in a camera. The sheaf of rays through a picture-plane to a station point yields an image that is upright; the sheaf of rays through a pinhole into a camera (a darkened room) yields an image that is inverted, but the correspondence of color-points is the same in both cases. The substitution of a lens for a pinhole does not affect matters: there is still a one-to-one correspondence between the radiating points of the object and the focus-points of the image. This is the accepted theory of image formation in geometrical optics. And, since the eye is taken to be a camera, it has been the unchallenged basis for all theories of vision until recently.
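The claim that the upright and the inverted image share one correspondence of color-points can be checked with a minimal sketch. The geometry is an assumption made for illustration: the pinhole (or station point) is at the origin, the picture-plane lies at z = +d in front of it, and the camera's screen lies at z = -d behind it. The function names are hypothetical.

```python
def picture_plane_image(point, d=1.0):
    """Upright image: the ray to the origin crosses the plane z = +d."""
    x, y, z = point
    return (x * d / z, y * d / z)


def pinhole_image(point, d=1.0):
    """Inverted image: the ray continues through the origin to z = -d."""
    x, y, z = point
    return (-x * d / z, -y * d / z)


# A few scene points in front of the pinhole (z > 0).
scene = [(1.0, 2.0, 4.0), (-2.0, 1.0, 4.0), (0.0, -3.0, 2.0)]

upright = [picture_plane_image(p) for p in scene]
inverted = [pinhole_image(p) for p in scene]

# The inverted image is the upright one turned through 180 degrees,
# so every scene point keeps exactly one image point in either case.
same_correspondence = all(
    (-u, -v) == (a, b) for (u, v), (a, b) in zip(upright, inverted)
)
```

The sketch shows only what the accepted theory asserts about the two images; it says nothing about whether either image is something that is seen.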

This third assumption is just as mistaken as the second, on which it depends, but with added errors of its own. One can see the inverted image on the ground glass screen of a view camera, to be sure, but to assume that the possessor of an eye sees the image on the retina is a pernicious fallacy. Even to suppose that the retinal image is a picture is quite wrong, for a picture is something to be looked at, a display. Yet ever since Kepler, philosophers, psychologists, and physiologists have accepted and taught the doctrine that “a picture of an object is painted by light at the back of the eye”, in the words of Isaac Newton. The popularity of photography and the prestige of orthodox optics will make this fallacy difficult to refute, but the fact is that a new theory of the eye not comparing it to a camera is possible, along with a new optics of structured ambient light as contrasted with the old optics of radiant rays.

An ocular system, as distinguished from a single chambered eye, is mobile. Its function is to explore the ambient light. Eyes are for seeing the surroundings, not merely for seeing the light entering an eye at a temporary position. An image or picture of the whole surrounding world is an impossibility, for a picture cannot be wholly panoramic. The perception of the surrounding world cannot therefore be based on an image or picture of it.

4. The relation of an idea to its object is the same as the relation of a picture to its object and thus to perceive anything is to have a mental representation of it. Similarly, to remember or imagine anything is to have a mental picture of it. The chief plausibility of this assumption comes from the notion that a retinal image is projected to an area of the surface of the brain (the visual projection area, so-called) in more or less the same way that the front surface of an object is projected to the surface of the retina, not point-to-point of course but in some vaguely analogous way. The force of this notion is diminishing as we learn more about the nervous system, or so it seems to me, but neurophysiologists are reluctant to give it up because they have nothing better to take its place. Since the cerebral cortex is supposed to be the seat of consciousness this notion leads directly to the sensation-based theories of perception.

You might suppose that psychologists would hesitate to speculate about a mental picture when they had no clear understanding of what a literal picture was, but this has not deterred them from doing so. It is tempting to explain the unfamiliar in terms of the familiar instead of explaining the unintelligible in terms of the intelligible.

The act of making a representation has never been seriously studied by psychologists, that is, the activity that underlies the seemingly different behaviors of painting, drawing, engraving, sculpting, modeling, and shadow-casting. The artist knows that he can represent from “life,” or from “memory,” or from “imagination.” He knows that he can “copy” another representation, either “freehand” or by “tracing.” He knows he can apply pigment to a surface, or make lines on a surface. But none of these acts is understood by the psychologist. The existing literature of the scribbling and drawing of children and that of graphic education are almost worthless. All we can be sure of is that the act of representing is more complex than the act of perceiving, not less. For the artist has to perceive his own display, as he is making it, as well as whatever he is displaying. He has to have a dual experience, not a single one. And hence the attempt to explain perceiving in terms of picturing is ridiculous; we shall have to do it the other way round. Perceiving cannot possibly be the having of a mental picture of the thing perceived.

All four of the assumptions, I have suggested, go together. What they have in common is an implied theory of perception based on projective correspondence. First, the surfaces of the world that face an observer at one moment of time are mapped into the forms of the retinal image. Then the latter are mapped into areas of the brain. Finally these are mapped into the color patches of primary sensory awareness. Only then is the process of perception supposed to begin. I reject both the assumptions and the theory of perception. A faithful picture is not one that resembles the object it represents. A true picture is not simply a projection of its object or place in either this sense or any other sense. And the experience of an object or scene is certainly not a picture of it in any meaning of the term.

What is the Alternative?

What is a true picture of something if it is not a point-to-point projection of it? And what is a perception of something if it is not a mental picture of it? I am going to suggest that a true picture is one that provides partial but genuine optical information about the world, and that a perception is the “picking up” of this information by the visual system. But first we must clear away what seems to be an obvious alternative to the projective theory of a picture and to the projective picture theory of visual perception.

If perceiving is not the having of a mental picture of something, the next most familiar metaphor would be that it is the having of a mental description of it, a “word-picture”. This formula also has a long history. If the mind does not picture a thing to itself it speaks to itself about that thing. We seem to know what it is to describe the world at least as well as we seem to know what it is to depict it; at least intellectuals do if not artists. An extension of this formula goes so far as to assert that a pictorial representation is itself to be understood as a verbal description. We read a picture as we would read a text, and we have to learn to read writing. The most explicit and disciplined formulation of this theory has been carried out by Nelson Goodman in his Languages of Art (1968).

All that a picture can give us is signs or signals that have to be learned, and learned by association. They are no more than cues or clues for an act of interpretation. Moreover, since the retinal image is the basis of vision and is itself a picture, visual perception is also an interpretation. The retinal picture provides only sensations. Depth and distance, meaning and value, all the qualities that make perception useful have to be added by the mind. So runs the argument of associationism. By substituting intuition for association one gets the argument of nativism, but it is essentially the same argument. In contrast to this, I maintain that raw sensations are not the basis of perception. They are only the incidental, occasional, and irrelevant accompaniments of the act of information pickup.

The notion of signals sent up the optic nerve to a mind seated in the brain is no better than the notion of pictures sent up the optic nerve to the mind. A little interpreter of signals in the brain is no better than a little seer of pictures as an explanation of perception. The world does not talk to an animal through its nerves any more than the world sends photocopies of objects through the nerves, or simulacra, or eidola, or images. The analogy with language is even worse, if anything, than the analogy with images, for animals do not have language and yet they certainly perceive the world around them.

The flowing sea of energy in which an animal is immersed has certain invariants of structure that specify the environment. But these invariants are not sent through the animal’s nerves; what the animal does is explore, adjust and orient its organs of perception so as to extract the invariants. When the system has picked up the facts that are relevant, then the animal has perceived. And this activity of visual attention is not located in the brain but depends on a continuous input-output process that tends toward an optimal state. My theory of perception is not based on point-to-point correspondence. Consequently my theory of perception mediated by displays is not based on point-to-point correspondence. Perception is based on invariants that specify the environment. The information in light consists of these invariants, whether the light comes from all around or only from a display. The invariants do not resemble what they specify and they do not say what they specify.

These invariants, I said, were formless, meaning that they are not triangles or squares or circles or anything of that sort. We pick them up whenever we walk around a solid object, or whenever the object turns. The perspective changes but the invariants do not. It is the same for a room, or a house, or a street: invariants underlie the changing perspectives as one moves about. The changes are reversible, since any transformation caused by a given movement of the observer is exactly canceled by an opposite movement. And these invariants are what animals and young children notice, not the frozen pictures. As adults we only begin to notice the perspective projection of things when we learn to take the pictorial attitude (Gibson, 1966, Ch. 11).
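A rough sketch can illustrate two claims made here: that the perspective changes while something remains unchanged, and that the change is reversed by the opposite movement. The cross-ratio of four collinear points is used only as a familiar textbook stand-in for a quantity that survives a change of perspective; it is an assumption of this sketch and is not offered as one of the formless invariants themselves. The names and the two-dimensional layout are likewise hypothetical.

```python
from fractions import Fraction


def view(points, observer_x):
    """Project points (x, z) onto the line z = 1 in front of an observer
    standing at (observer_x, 0). Exact fractions avoid rounding."""
    return [Fraction(x - observer_x, z) for x, z in points]


def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear image positions."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


# Four points along one receding edge, such as a rail.
edge = [(0, 2), (1, 3), (2, 4), (3, 5)]

start = 0
moved = start + 1      # the observer steps sideways
returned = moved - 1   # and then makes the opposite movement

first_view = view(edge, start)
second_view = view(edge, moved)
returned_view = view(edge, returned)

first_ratio = cross_ratio(*first_view)
second_ratio = cross_ratio(*second_view)
```

In this layout the two views differ at every point, the cross-ratio is the same in both, and the view after the opposite movement is identical to the first.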

Formless invariants specify the distinctive features of objects, places, animals, persons, and events. A baby who perceives a kitten is not aware of a certain form together with a motion and a color. It does not see the animal from the front, the side, or the top as the painter learns to do; the baby does not see views but invariants. The ancient doctrine that form perception is primary and that object perception is secondary has to be turned upside down.

The artist, then, and especially the caricaturist, is not so much manipulating forms as he is invariants, the subtle structures that underlie the forms as such. In a drawing, for example, the lines as geometrical elements are unimportant. What counts are the relations that make a line specify an edge, or a corner, or a thread, or a margin. The apex of an angle in a drawing can be the so-called vanishing point of railroad tracks if a horizon is indicated, and then the viewer does not perceive the apex of an angle, a “form,” but very great distance. The invariant is noticed but the meeting of two lines is not.

What a good picture does, if I am right, is to specify the distinctive features of something in the environment, including anything that has been or might be. It makes available optical information for perception. The information has been narrowed down by the selective attention of the picture-maker, and this is true of the photographer as well as the painter. The information is contained in an optic array but it cannot be reduced to a set of light rays. It does not consist of forms, but of invariants that underlie forms. And it does not consist of graphic symbols like written words. If traditional optics fails to explain the kind of perception that is mediated by pictures then the remedy is to improve on the optics, not to leap into the discipline of linguistics.

References

Gibson, J. J. The senses considered as perceptual systems. Boston: Houghton Mifflin, 1966.

Gibson, J. J. The information available in pictures. Leonardo, 1971, 4, 27-35.

Gibson, J. J. On the concept of formless invariants in visual perception. Leonardo, 1973, 6, 43-45.

Goodman, N. Languages of art: An approach to a theory of symbols. New York: Bobbs-Merrill, 1968.