Axiomatization of the Background Knowledge in Bayesian Theory of Perception

YUKI OZAKI 1,a)

Abstract: Bayesian theories of human object perception have recently been widely studied. Helmholtz’s idea of perception as unconscious inference is formalized by Bayes’ theorem. Human object perception is now described as Bayesian inference or statistical inference, and obtaining a quantitative Bayesian model of human object perception has become a primary goal for consciousness scientists who draw on Helmholtz’s idea. An adequate Bayesian theory of perception seems to require the axiomatization of the so-called “background knowledge”. This paper discusses that axiomatization.

We deal with the problem of the ambiguity of the Necker cube as a special case. The difficulty of duplicating or modeling human object perception arises partly from the so-called ambiguity of an image. A probabilistic account of the ambiguity of an image is lacking in recent constructivist approaches to the duplication of human object perception. In this paper, we propose a reframing of the problem of the ambiguity of the Necker cube, and we point out a deficiency in the prevailing statement of the problem, namely that the underlying task of the theory of visual processes is to derive properties of the three-dimensional world from two-dimensional images of it. A linguistic definition of sense modalities and a language setup are proposed as the basis of the axiomatization of background knowledge in the Bayesian theory of perception.

Keywords: the ambiguity of an image, the duplication of the establishment of human object perception, consciousness science, Necker cube, Bayesian theory of perception, background knowledge

1. Introduction

The duplication of human object perception, or reliable object recognition, is a harder problem than the duplication of human inference, such as the ability to play chess or to perform numerical calculus or symbolic algebra. The duplication of human visual perception is a central issue in the duplication of human object perception. In general, the issue is stated as follows: How is the experience of a world of objects arranged in three-dimensional space created from the two-dimensional image on our retina? Marr stated that the purpose of human vision is to build a description of the shapes and positions of things from images (Marr 1982). Recently, the issue has been widely tackled by constructivists who draw on Helmholtz's idea of perception as unconscious inference or unconscious top-down processes (Helmholtz 1867). For instance, Frith stated that when we perceive something, we actually start on the inside, with a prior belief (Frith 2007). In search of quantitative models of human visual perception, constructivists use Bayes' theorem to formalize Helmholtz's idea. They now widely describe human object perception as Bayesian inference or statistical inference. Obtaining a Bayesian model of human object perception has become a primary goal for the constructivists who draw on Helmholtz's idea (Vincent 2015).

It is said that there are two problems to be overcome. One is the problem of inferring the causes of sensory input. Friston claims that the brain employs Bayesian models to infer the causes of sensations (Friston, Kilner, and Harrison 2006). The other is the problem of optical illusions that concern the ambiguity of a two-dimensional image. In general, it is difficult to understand the exact mechanism of an optical illusion. Here the term ambiguity of an image concerns the inversion of depth, and it means that different three-dimensional objects can produce the same two-dimensional retinal image. The difficulty of duplicating or modeling human object perception arises partly from the problem of the ambiguity of images (Kersten, Mamassian, and Yuille 2004). The Necker cube is a well-known example of an optical illusion that concerns the ambiguity of an image (Figure 1). According to Gregory, the ambiguity of the Necker cube, that is, the existence of two different three-dimensional interpretations of the same two-dimensional image, lacks an explanation in Bayesian terms (Gregory 2006).

1 School of Science, Hokkaido University, Japan a) [email protected]

Figure 1: The Necker cube, a wire-frame cube with no depth cues, and its two possible interpretations.

The prevailing constructivist approach to the duplication of human object perception presupposes ontological realism about the object of human perception. This paper proposes a constructivist approach to the duplication of human object perception that presupposes anti-realism about the object of human perception. We point out a deficiency in the prevailing statement of the problem, namely that the underlying task of the theory of visual processes is to reliably derive properties of the three-dimensional world from two-dimensional images of it. Based on the reframing of this problem from the anti-realistic constructivist viewpoint, we propose a probabilistic account of the ambiguity of the Necker cube implemented by Carnapian inductive logic. The goal of this paper is to give a linguistic definition of sense modalities and a language setup as the basis of the probabilistic logical construction of the Necker cube. I expect that the logical construction will show us how to go about programming human object perception on a computer. The probabilistic logical construction can provide a computer program or software that expresses the process by which human object perception is established.

2. Bayesian Theories of Perception

Bayesian theories of human object perception have recently been widely studied. Helmholtz’s idea of perception as unconscious inference is formalized by Bayes’ theorem. Human object perception is now described as Bayesian inference or statistical inference, and obtaining a quantitative Bayesian model of human object perception has become a primary goal for consciousness scientists who draw on Helmholtz’s idea. It is also said that this approach is attractive because it has been used in computer vision to develop theories and algorithms that extract information from natural images useful for perception and robot actions (Kersten, Mamassian, and Yuille 2004).

In the recent Bayesian framework, perception is the inference from a 2-D image description to a 3-D object description (Kersten, Mamassian, and Yuille 2004). A difficulty arises because 2-D images are objectively ambiguous. That is, roughly speaking, the same 3-D object can result in different 2-D images, and different 3-D objects can result in very similar 2-D images, depending on the environment. The Bayesian theory of perception uses Bayes’ theorem to overcome this difficulty. According to the Bayesian framework, humans use the prior probability to disambiguate when ambiguity arises. Here the prior probability is the probability of each possible 3-D object prior to receiving the 2-D retinal image. When ambiguity arises, humans use not only the data (the 2-D image description) but also the prior probability to compute the posterior probability, which is the probability of each possible 3-D object given the 2-D retinal image.

As is well known, the probabilities that appear in Bayes’ theorem are all relativized to the individual's stock of current background knowledge. Here, background knowledge is a “theory” in the syntactic sense. So, an adequate Bayesian theory of perception seems to require the axiomatization of background knowledge.
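The computation of the posterior from the prior and the data can be illustrated with a minimal sketch. The hypothesis names, the prior values, the likelihood values, and the function name `posterior` are all assumptions made for illustration; none of them is given in this paper. The sketch shows only that, when two 3-D interpretations produce the same 2-D image equally well, the posterior is fixed by the prior alone.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Bayes' theorem over a finite set of 3-D object hypotheses.

    prior[h]      : probability of hypothesis h before the image is received
    likelihood[h] : probability of the observed 2-D image given h
    """
    # Unnormalized posterior: P(image | h) * P(h)
    joint = {h: likelihood[h] * prior[h] for h in prior}
    # Normalizing constant: P(image)
    evidence = sum(joint.values())
    return {h: joint[h] / evidence for h in joint}

# Assumed numbers, for illustration only.
prior = {"seen_from_above": Fraction(3, 5), "seen_from_below": Fraction(2, 5)}
# An ambiguous image: both interpretations yield the same 2-D image equally well.
likelihood = {"seen_from_above": Fraction(1, 2), "seen_from_below": Fraction(1, 2)}

post = posterior(prior, likelihood)
```

With equal likelihoods the image carries no information that separates the two hypotheses, so `post` coincides with `prior`. Exact fractions are used here only to keep the arithmetic free of rounding.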

3. Logical construction as duplication of human object perception

The goal of this paper is to give a linguistic definition of sense modalities and a language setup as the basis of the probabilistic logical construction of the Necker cube and as the basis of the axiomatization of background knowledge. In this section, I will briefly explicate what kind of attempt logical construction is. One can view the logical construction as the duplication of human object perception from the philosophical standpoints shown in this chapter.

3.1 The ontological status of sense: anti-realism about the object of human perception

On the realistic presupposition that nature, natural processes, and natural objects are independent of humans and that humans are subject to or governed by natural laws, sense (colors, sounds, temperatures, pressures, and so on) obtains the ontological status of representation. That is, sense is an intermediary between humans and natural objects, and sense represents natural objects to humans. This realistic presupposition yields the indirect or representational theory of perception and the causal theory of perception. The former claims that a natural object is inferred from sense by humans, and the latter claims that the relation between a natural object and sense is the relation of cause and effect. The realistic presupposition also yields the view that sense has the epistemological status of appearance or Schein, or in other words, the view that sense is the origin of the illusion or error that inhibits or blocks humans' pursuit of truth. According to this view, sense is a veil over the world, while humans pursue true or accurate knowledge of the world as it is (i.e. independent of our actions).

In contrast to the realistic presupposition, one can adopt an anti-realistic presupposition about the object of human perception. It yields the direct theory of perception, which claims that sense is not a representation for humans but immediately real for humans, and that objects are not a human inference from sense but a higher-order construction or a fiction from sense. The anti-realistic presupposition also yields the view that sense is the basis of intersubjective knowledge of the world, including illusion. For example, this idea is stated by the 19th-20th century Austrian physicist E. Mach as follows (Mach 1984):

Bodies do not produce sensations, but complexes of sensations make up bodies. There is no "sensation" to which an external "thing," different from sensation, corresponds.

As we shall see in chapter 5, one can reframe the problem of inferring the causes of sensory input on this (somewhat extreme) philosophical presupposition.

3.2 Constructivism about the object of human perception

On the empiricist presupposition that all knowledge comes from experience, humans are passive in object perception. That is, an object is given to humans. The empiricist presupposition yields the empiricist motto that one should perceive an object as it is and without any preconception. The meaning of the term "discovery" as removing a blanket or lid from something that is there as it is reflects the empiricist motto.

In contrast to the empiricist presupposition, one can adopt a constructivist presupposition that a concrete object of perception is an artifact, or that humans artificially make and produce a concrete object of perception with reference to our a priori format of object perception. Here the term format means humans' subjective framework or form of object perception, and the term a priori means preceding any possible object perception. On the constructivist presupposition, an a priori format is considered to be the condition of the possibility of any intersubjective knowledge of the world, and time and space are such formats (Kant 1781A/1787B). The constructivist presupposition yields indirect theories of perception, which claim that object perception involves a mental kind of information processing. Helmholtz's idea of perception as unconscious inference or unconscious top-down processes is considered to be a kind of constructivism.

From the constructivist view, one can regard human object perception not as a discovery but as the test of one’s prediction. To say that human object perception is a prediction means, firstly, that it is done in a top-down manner, that is, with reference to premises. Secondly, it means that its conclusion is uncertain. For example, this idea is stated by the German constructivist philosopher P. Janich as follows (Janich 1992):

Sensory perceptions are tests whether our perceptual predictions that are formed from the past and directed toward our future states or purposes are fulfilled.

On this presupposition about the object of human perception, one can obtain a probabilistic approach to the problem of the ambiguity of a two-dimensional image.

3.3 What is logical construction?

Logical construction is an attempt to linguistically define basic human concepts such as things, my body, places, etc. in terms of logic. It is also an attempt to show the process of the definition.

According to the theory of relations developed by the British logician B. Russell, all relations are classified according to their formal properties: symmetry, transitivity, and reflexivity. Here a "relation" is a propositional function that contains two or more variables. A symmetrical and reflexive relation is called a similarity relation, and a symmetrical, reflexive, and transitive relation is called an equivalence relation. Russell also developed the formal method of bundling together, as a set, all the relata that are related by these relations. A set of relata that are related by a similarity relation is called a similarity circle, and a set of relata that are related by an equivalence relation is called an abstraction class, which corresponds to the concept of an equivalence class in modern mathematics. A set of relata that are related by a transitive and nonsymmetrical relation is called a series. By operations on relations, one can construct a similarity relation from any given relation. Further, one can construct an equivalence relation from any similarity relation. Russell gave a rough scenario for linguistically defining, for example, "thing" as a similarity circle and "motion" as a series (Russell 1914). The German philosopher R. Carnap carried out the logical definition sketched by Russell (Carnap 1928). Their attempt is called logical construction. Further, because Russell adopted sense (colors, sounds, temperatures, pressures, and so on) as the relata, their construction is called the logical construction from sense.

There is a view that sense (colors, sounds, temperatures, pressures, and so on) is an abstract concept which has been derived from experience as a whole rather than a concrete element of which experience consists. Indeed, Carnap understood sense in this way. On the basis of this understanding of sense, he chose experience as a whole as the primitive of his logical construction, and he showed the process by which sense is defined as a higher-order concept. This process is called quasi-analysis (Carnap 1928). In contrast with this view, there is a view that sense is an element of which experience consists. For instance, it is well known that E. Mach took this view of sense (Mach 1984). These two views of sense stand in sharp contrast.
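The classification of relations by formal properties can be made concrete with a small sketch. It assumes a finite domain and represents a two-place relation as a set of ordered pairs; the function names and the two example relations (`near`, `same_parity`) are hypothetical illustrations, not part of Russell's or Carnap's text.

```python
def is_reflexive(domain, rel):
    # Every element is related to itself.
    return all((x, x) in rel for x in domain)

def is_symmetric(rel):
    # Whenever x is related to y, y is related to x.
    return all((y, x) in rel for (x, y) in rel)

def is_transitive(rel):
    # Whenever x~y and y~w hold, x~w holds.
    return all((x, w) in rel
               for (x, y) in rel
               for (z, w) in rel
               if y == z)

def is_similarity(domain, rel):
    return is_reflexive(domain, rel) and is_symmetric(rel)

def is_equivalence(domain, rel):
    return is_similarity(domain, rel) and is_transitive(rel)

def abstraction_classes(domain, rel):
    """Bundle the relata of an equivalence relation into classes."""
    if not is_equivalence(domain, rel):
        raise ValueError("relation is not an equivalence relation")
    classes = []
    for x in domain:
        cls = frozenset(y for y in domain if (x, y) in rel)
        if cls not in classes:
            classes.append(cls)
    return classes

# Hypothetical example data.
domain = [1, 2, 3, 4]
near = {(x, y) for x in domain for y in domain if abs(x - y) <= 1}
same_parity = {(x, y) for x in domain for y in domain if x % 2 == y % 2}

parity_classes = abstraction_classes(domain, same_parity)
```

Here `near` is a similarity relation that is not transitive (1 is near 2 and 2 is near 3, but 1 is not near 3), whereas `same_parity` is an equivalence relation whose abstraction classes partition the domain. Similarity circles are not computed in this sketch.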

This paper adopts the view that sense is not an abstract concept but an element of which our experience consists. On the basis of this view about sense, we will choose sense as the primitive of our logical construction.

Carnap implemented his logical construction by means of deductive logic. In contrast, we will use inductive logic, based on the view that human object perception is a prediction and the view that a prediction is uncertain. So our logical construction is probabilistic (not deductive) and proceeds from sense (not from experience as a whole). One can regard human object perception as such a logical construction under the two presuppositions stated above.

3.4 Logical construction as the duplication of human object perception

Carnap understood his logical construction as a rational reconstruction rather than the duplication of the actual process of human object perception. But now, our goal is not a rational reconstruction but the duplication of human object perception. One can regard logical construction as the duplication of the actual process of human object perception on the basis of the two presuppositions shown in this chapter. The goal of this paper is to give the basis for the probabilistic logical construction of the Necker cube with Carnapian inductive logic. As mentioned in chapter 6, we can utilize Carnapian inductive logic for this purpose because we regard human object perception as tests of prediction from the constructivist viewpoint about the object of perception.

4. Linguistic definition of sense modalities

4.1 Some general remarks

To use inductive logic for logical construction, we need a language setup. So we first need a linguistic definition of sense modalities. What is lacking for an adequate discussion of human object perception is an analysis of it from the viewpoint of modality. We will define sense modalities as the relata of spatial and temporal relations, from the constructivist viewpoint that time and space are the forms of human object perception. Note that we are now adopting the view of constructivism shown in section 3.2. We think of "size" and "shape" as forms of perception rather than contents.

The phenomenological research program lacks a linguistic definition of sensory modalities, which means that it is incapable of providing a logical construction. Not only philosophical perception theory but also physiology lacks a linguistic definition of sensory modalities (Janich 1992). The term "perception" lacks a definition. Moreover, many undefined terms are used to describe the content of sensations, such as tone or sound, color or warmth, etc. It is said that sensory modalities can only be experienced and that they cannot be strictly and explicitly defined.

Another problem, as many have noted (Fish 2010), is that the vast majority of philosophical work on sense has focused only on sight. This seems to be partly because Aristotle's Metaphysics, one of the oldest and most influential philosophical works, asserted the priority of sight in its opening. Many philosophers, including the English philosopher A. J. Ayer, Russell, and Carnap, seem to have followed this thesis. Compared to sight, touch has not been sufficiently discussed (an exception is Mach’s work). We will question the assumption that the adequate way to think about the perception of three-dimensionality is to deal only with sight. Instead, we will adopt the view that an adequate understanding of our capacity to perceive objects and three-dimensional space requires a unified theory of sight and touch. Hereafter we deal with these two sense modalities.

4.2 Preliminary: the problem of whether space is intermodal or not

We define sense modalities as the relata of spatial and temporal relations, from the constructivist viewpoint that time and space are the formats of human object perception. This definition concerns the controversial problem of whether we have distinct spatial concepts for each sense or whether the same spatial concepts are employed on the basis of both sight and touch (Fish 2010). For example, the former view is adopted by the 17th-18th century Irish philosopher G. Berkeley, while the latter view is adopted by Mach. A typical version of this problem is the so-called Molyneux's problem. Throughout the definition we adopt the former view: spatial concepts such as point, distance, position, and direction are not intermodal. The claims in this chapter are based largely on the analysis given by G. Berkeley (Berkeley 1709), which enables us to linguistically define sense modalities.

First, we state the view we adopt in plain terms. Consider an ordinary situation in which a person holds a cup in the left hand and a pen in the right hand and sees both the cup and the pen. The view we adopt is as follows: the person's tangible cup is to the left of the tangible pen, and the tangible pen is to the right of the tangible cup. Analogously, the person's visible cup is to the left of the visible pen, and the visible pen is to the right of the visible cup. In contrast, the tangible cup is neither to the left nor to the right of the visible pen, and the visible cup is neither to the left nor to the right of the tangible pen. Analogously, the tangible pen is neither to the right nor to the left of the visible cup, and the visible pen is neither to the right nor to the left of the tangible cup. That is, the spatial concepts of right and left apply only within the same modality. Let us take these claims as the presupposition of our linguistic definition of sense modalities. Analogous claims hold for the spatial relation of up and down. For example, according to our presupposition, the tangible cup is not below the visible ceiling but below the tangible ceiling, which corresponds to the visible ceiling. That is, no spatial direction holds between a tangible thing and a visible thing.

3.2.1 The heterogeneity of two perceptual spaces

Firstly, we adopt the view that each sensory input has a minimum below which humans cannot perceive (Berkeley 1709). In this paper, we call the tangible minimum a "tangible point" and the visible minimum a "visible point". Because the issue here is perception, the term "point" denotes a minimum; that is, these points are not mathematical points that have no area. The tangible points and the visible points themselves are not considered to be the objects of human perception but the raw material from which humans make those objects. We adopt the anti-realistic view that the object of human perception is a complex of these points. These terms require some clarification. The visible points are the points that constitute the human visual field. Humans see an equal number of visible points in the field at all times. That is, when my view is bounded by the walls of my room, for example, I see just as many visible points as I would see if I had a full prospect of the circumjacent fields, mountains, sea, etc. The number of visible points in a view that is bounded by near objects is the same as the number of visible points in a view that extends to a remoter landscape. A visible point cannot be moved within our visual field, just as a point on an electronic message board cannot be moved or have its position on the board changed. A visible point is not the same as a so-called "pixel", because a pixel can contain a great number of "visible points" in the sense of this section.

Secondly, we adopt the view that there are two distinct distances, that is, tangible distance and visible distance (Berkeley 1709). Here the distance between two points means the number of intermediate points. Because the issue here is perception, the term "distance" denotes a number of points or minimums; that is, it is not a mathematical distance that comprises an infinite number of points. The tangible distance between two given tangible points is defined by the number of intermediate tangible points. Likewise, the visible distance between two given visible points is defined by the number of intermediate visible points. The distance between a tangible point and a visible point is indefinable according to this definition, because it would consist neither of tangible points nor of visible points. Accordingly, it is nonsense to talk of distance (far or near) between a tangible point and a visible point. On this definition of perceptual distance, spatial distance can be defined only between two points that are homogeneous, that is, that have the same mode.
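The definition of perceptual distance can be sketched as follows. The sketch assumes, purely for illustration, that the minima of one modality are arranged along a single row and indexed by integers; the names `Point` and `perceptual_distance` are hypothetical. An indefinable distance is represented by `None`.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Point:
    mode: str   # "visible" or "tangible"
    index: int  # place along one row of minima (an assumption of this sketch)

def perceptual_distance(a: Point, b: Point) -> Optional[int]:
    """Number of intermediate points between two homogeneous points.

    Returns None when the points have different modes, since the
    distance is then indefinable on the definition above.
    """
    if a.mode != b.mode:
        return None
    # Points strictly between the two indices; zero for adjacent or identical points.
    return max(abs(a.index - b.index) - 1, 0)

d_visible = perceptual_distance(Point("visible", 2), Point("visible", 7))
d_mixed = perceptual_distance(Point("visible", 2), Point("tangible", 7))
```

Between visible points 2 and 7 lie the points 3, 4, 5, and 6, so `d_visible` is 4, while `d_mixed` is `None` because no count of tangible or visible points is available across modes.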

Thirdly, we adopt the view that there are two distinct positions, that is, tangible position and visible position (Berkeley 1709). A tangible position of a tangible point is defined by the tangible distance from another tangible point, and a visible position of a visible point is defined by the visible distance from another visible point. The position of a tangible point with relation to a visible point and the position of a visible point with relation to a tangible point are indefinable according to this definition. According to this definition, the position of any point is only determined with respect to the points that are homogeneous or have the same mode.

Fourthly, we adopt the view that there are two distinct directions, that is, tangible direction and visible direction. A tangible direction is defined by two tangible positions of two tangible points, and a visible direction is defined by two visible positions of two visible points. For example, there are the directions of tangible right and visible right, tangible left and visible left, tangible up and visible up, and tangible down and visible down, etc. According to this definition, a tangible direction from a visible point is indefinable. A visible direction from a tangible point is also indefinable.

Analogous claims hold when the above spatial concepts are replaced by the concepts of size or magnitude, shape, and displacement. We adopt the view that there are a tangible magnitude and a visible magnitude. A magnitude is greater or less according to the number of points. A tangible magnitude is defined by the number of tangible points, and a visible magnitude is defined by the number of visible points. We also adopt the view that there are two distinct shapes, that is, tangible shape and visible shape. For example, there are the tangible circle and the visible circle, the tangible ellipse and the visible ellipse, the tangible square and the visible square, etc. Lastly, we adopt the view that there are two distinct displacements, that is, tangible displacement and visible displacement. They are defined by the concepts of position and direction.

In ordinary speech, of course, we do not distinguish these as concepts or terms. For example, we do not use terms like tangible left and visible left; we use one and the same term, left. That is, there is only one spatial concept and one spatial term in ordinary speech. We will discuss this problem of intermodal correspondence in 3.2.3. Summarizing the points so far, we adopt the view that spatial concepts such as point, distance, position, direction, and so on are not intermodal, because such concepts are defined independently in each perceptual space. This view gives a positive answer to the question of whether we have distinct shape concepts for each sense and a negative answer to the question of whether the same shape concepts are employed on the basis of both sight and touch.

3.2.2 The incomparability of the heterogeneous spatial concepts

In the second place, we adopt the view that each spatial concept (distance, position, direction, etc.) can only be compared to the same spatial concept that is homogeneous or has the same mode (Berkeley 1709).

Firstly, according to this view, a tangible distance and a visible distance cannot be quantitatively compared to each other. The relations "greater than" and "less than" do not hold between a tangible distance and a visible distance. That is, a tangible distance is neither greater than nor less than a visible distance, and a visible distance is neither greater than nor less than a tangible distance. The relation "equal to" also does not hold between a tangible distance and a visible distance. According to this view, quantitative comparison holds only between two homogeneous distances. A tangible distance is "greater than" or "less than" only another tangible distance, and a visible distance is "greater than" or "less than" only another visible distance.

Secondly, a tangible direction and a visible direction cannot be compared to each other. According to this view, the relations "contrary to" and "same as" do not hold between a tangible direction and a visible direction. For example, a tangible right is neither contrary to nor the same as a visible right. The same applies when right is replaced by left, up, or down. The comparison holds only between two homogeneous directions. That is, a tangible direction is "contrary to" or "same as" only another tangible direction, and a visible direction is "contrary to" or "same as" only another visible direction.
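The incomparability claim for distances can be sketched as a comparison that is defined only within one mode. The names `Distance` and `compare` are hypothetical, and representing "no relation holds" by `None` is a choice made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Distance:
    mode: str    # "visible" or "tangible"
    points: int  # number of intermediate minima

def compare(a: Distance, b: Distance) -> Optional[str]:
    """Return the quantitative relation of a to b, or None when
    the two distances are heterogeneous and no relation holds."""
    if a.mode != b.mode:
        return None
    if a.points > b.points:
        return "greater than"
    if a.points < b.points:
        return "less than"
    return "equal to"

same_mode = compare(Distance("tangible", 9), Distance("tangible", 4))
cross_mode = compare(Distance("tangible", 9), Distance("visible", 4))
```

Note that `cross_mode` is `None` even though the stored counts differ: on the view above, none of "greater than", "less than", or "equal to" applies across modes.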

3.2.3 The correspondence between the heterogeneous spatial concepts: the problem of intermodal correspondence

In the third place, we address the problem of intermodal correspondence, which makes it possible for us to use one and the same term for two spatial concepts that are heterogeneous or have different modes. We do not distinguish them as terms in ordinary speech. For example, we do not use the terms tangible left and visible left. We use the one term left. We adopt the view that it is temporal relations that relate two spatial concepts that are heterogeneous or have different modes (Berkeley 1709). In particular, we consider that the temporal relation of synchronicity makes it possible for humans to give two heterogeneous spatial concepts the same name or term. For example, because tangible right is perceived synchronously with visible right, the two directions can share the one term right. Analogously, because tangible up is perceived synchronously with visible up, the two directions can share the one term up. This framework of intermodal correspondence is represented in Figure 2.

According to this view, the synchronic correlation between two heterogeneous spatial concepts is arbitrary (Berkeley 1709). That is, the actual synchronic correlation between tangible spatial concepts and visible spatial concepts could have been a different one. For example, if a tangible right were perceived synchronously with a visible left, the two directions could share the one term right (or left). Analogously, if a tangible up were perceived synchronously with a visible down, the two directions could share the one term up (or down). According to this view, it is also logically possible, for example, that the greater tangible distance correlates with the lesser visible distance and that the tangible shape of a straight line correlates with the visible shape of a curved line. Such a synchronic correlation is possible as long as it allows us to guide our own actions. Because the synchronic correlation is arbitrary, humans can give two heterogeneous spatial concepts one and the same term only after experiencing the synchronic correlation. For example, the fact that humans use the one term right (not the two terms tangible right and visible right) depends on the experience of the synchronic correlation between the two heterogeneous directions.
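The dependence of the shared term on experienced synchronicity can be sketched as follows. The function name `learn_correspondence`, the direction labels, and the rule of pairing each tangible direction with the visible direction it most often coincided with are assumptions of this sketch; the text above says only that the correlation is learned from experience and could have been otherwise.

```python
from collections import Counter

def learn_correspondence(experiences):
    """Build an intermodal correspondence from synchronous experiences.

    experiences: iterable of (tangible_direction, visible_direction)
    pairs, each pair perceived synchronously.
    Returns a dict mapping each tangible direction to the visible
    direction it coincided with most often (an assumed rule).
    """
    counts = {}
    for tangible, visible in experiences:
        counts.setdefault(tangible, Counter())[visible] += 1
    return {t: c.most_common(1)[0][0] for t, c in counts.items()}

# Hypothetical experience streams.
actual = [("t_right", "v_right"), ("t_left", "v_left"), ("t_right", "v_right")]
inverted = [("t_right", "v_left"), ("t_left", "v_right")]

actual_map = learn_correspondence(actual)
inverted_map = learn_correspondence(inverted)
```

The same procedure yields opposite pairings for the two streams, which illustrates the arbitrariness of the correlation: nothing in the directions themselves fixes which tangible direction shares a term with which visible direction.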

Based on this view, we will consider the object of human perception that is referred to by the single term "this" to be two synchronous heterogeneous points or shapes. In addition, we will consider the object of human perception that is referred to by the single term "that" to be two heterogeneous points or shapes that are related by the temporal relation "earlier than" or "later than".

So we adopt the view that spatial relations are not intermodal but temporal relations are intermodal. Two heterogeneous points can coincide only temporally. Analogous claims hold when points are replaced by distance, position, direction, shape, displacement, etc. The spatial relations are the ordering schema that relates two homogeneous sensory inputs, and the temporal relations are the ordering schema that relates two heterogeneous sensory inputs.

Summarizing, we give a positive answer to the question of whether we have distinct shape concepts for each sense and a negative answer to the question of whether the same shape concepts are employed on the basis of both sight and touch.

[Figure 2 (diagram): visual space and tactual space, each ordered by the directions up, down, left, and right, are related by synchronicity.]

4.3 Linguistic definition of sense modality

One can draw a linguistic definition of sense modality from the above considerations. Spatial relations hold only between two homogeneous points. In other words, spatial relations are the form for ordering homogeneous minimum sensory inputs. One can therefore linguistically define sameness of sense modality through the relata of a spatial relation: points of the same sense modality are characterized as the relata of spatial relations such as right, left, up, and down.

• The relations of right and left hold only between two points that have the same mode.
• The relations of right and left don't hold between two points that have different modes.
• The relations of up and down hold only between two points that have the same mode.
• The relations of up and down don't hold between two points that have different modes.

Temporal relations hold only between two heterogeneous points. In other words, temporal relations are the form for ordering heterogeneous minimum sensory inputs. One can therefore linguistically define difference of sense modality through the relata of a temporal relation: points of different sense modalities are characterized as the relata of temporal relations such as synchronicity, "earlier than", and "later than".

• The relation of synchronicity holds only between two points that have different modes.
• The relation of synchronicity doesn't hold between two points that have the same mode.
• The relations of "earlier than" and "later than" hold only between two points that have different modes.
• The relations of "earlier than" and "later than" don't hold between two points that have the same mode.
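The rules of this section can be read as a typing discipline on relations. The following is a minimal Python sketch of that reading, not part of the original formalism: the class `Point`, the relation names, and the function `relation_applies` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

# Hypothetical relation names standing for the spatial and temporal relations of 4.3.
SPATIAL = {"right_of", "left_of", "above", "below"}
TEMPORAL = {"synchronous_with", "earlier_than", "later_than"}


@dataclass(frozen=True)
class Point:
    mode: str  # e.g. "visible" or "tangible"
    name: str


def relation_applies(relation: str, x: Point, y: Point) -> bool:
    """Return True if the relation is allowed to hold between x and y.

    Spatial relations are allowed only between points of the same mode;
    temporal relations are allowed only between points of different modes.
    """
    if relation in SPATIAL:
        return x.mode == y.mode
    if relation in TEMPORAL:
        return x.mode != y.mode
    raise ValueError(f"unknown relation: {relation}")
```

Under this sketch, asking whether two points can stand in any spatial relation is the same as asking whether they belong to the same sense modality, which is the linguistic definition proposed above.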

5. A scenario of the logical construction of human object perception

In this chapter, we will sketch a scenario of the logical construction of a concrete object of perception and three-dimensionality. As mentioned in chapter 2, we are now regarding human object perception not as a discovery but as the test of prediction. A concrete object of perception and three-dimensionality constructed according to the scenario give humans such predictions. That is, humans derive predictions following the inverse course of the scenario of the construction.

5.1 Preliminary: the problem of the order of construction

There is a controversial problem of whether humans obtain the concept of an object before the concept of depth. For example, the English philosopher A. J. Ayer holds that humans construct the concept of depth or three-dimensionality before the concept of an object. He claims that whatever the difference in content between visual and tactual experiences may be, it does not prevent their having a similarity of structure (Ayer 1940). Based on this presupposition, he claims that it is because of this similarity of structure that one finds it natural to regard a visual and a tactual construct as one material thing (Ayer 1940). He explicates how these separate groups of visual and tactual sense-contents are correlated in the following way (Ayer 1936):

any two of one's visual and tactual groups belong to the same material thing when every element of the visual group which is of minimal visual depth forms part of the same sense-experience as an element of the tactual group which is of minimal tactual depth.

According to Ayer, therefore, the visual structure and the tactual structure each have their own depth, and the concept of an object derives from these two concepts of depth.

Figure 2. Synchronicity between two perceptual spaces

In contrast with Ayer's view, we will adopt the view that humans construct the concept of an object before the concept of depth. This is because, in the first place, we are now adopting the view that there is no similarity between visual-spatial structure and tactual-spatial structure. Following the claim in the previous section, we can define the concept of an object as the pairs of a visible point and a tangible point that are related by the temporal relation of synchronicity. In the second place, there are diverse views about whether depth is a kind of data or a kind of higher-order human construction. There are also diverse views about whether depth is a spatial or a temporal distance. We adopt the view that depth is not a kind of data but a higher-order construction, and, unlike Ayer, we adopt the view that depth is a temporal distance, following the claim in the previous section. That is, we can define the concept of depth not as a spatial distance between two homogeneous points but as a temporal distance between two heterogeneous points. Roughly speaking, the concept of depth concerns not the spatially distant sight but the temporally distant touch. According to this view, "three-dimensionality" is not reality but a human temporal construct. This idea is stated, for example, by the 17th-18th century Irish philosopher G. Berkeley as follows (Berkeley 1713):

When I approach a distant object, the visible size and shape change perpetually. Therefore sight doesn’t inform us that the visible object exists at a distance. There is a continuing series of visible size and shape succeeding each other during the whole time of my approach. From them, I have by experience learned to collect what other senses I will be affected with after such a certain succession of time and motion.

Neither the visible distance nor the tangible distance defined in the previous section can be considered the so-called depth. Following the claim in the previous section, we can define the concept of depth as a temporal distance from a visible point on a visual field to a tangible point that is going to be perceived in the future. According to the view we are now adopting, the temporal distance from a visible point that is perceived now to a tangible point that is going to be perceived in the future will be a prediction based only on the past experience of the synchronicity of the two heterogeneous points. Here, as mentioned in the previous section, the synchronized points constitute a concrete object of perception in our definition. So, unlike Ayer, we adopt the view that humans construct the concept of an object before the concept of depth.

5.2 A scenario of logical construction

The definition of the concrete object of perception precedes that of depth or three-dimensionality.

• The concrete object of perception (demonstrative "this") is a higher-order construction:

The relation of synchronicity temporally relates a visible point to a tangible point. The concrete object of perception ("this") can be constructed as the pairs of two heterogeneous points that are related by the relation of synchronicity.
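The construction of "this" as a set of synchronous pairs can be sketched in a few lines of Python. This is only an illustration under a strong assumption that is not in the paper: each minimum sensory input is tagged with a numeric time of perception, and synchronicity is modeled as equality of those times. The function name `construct_this` and the dictionary layout are hypothetical.

```python
def construct_this(visible, tangible):
    """Construct a concrete object of perception ("this").

    visible, tangible: dicts mapping a point name to its (assumed) time of perception.
    Returns the set of (visible point, tangible point) pairs perceived at the same time.
    """
    return frozenset(
        (v, t)
        for v, time_v in visible.items()
        for t, time_t in tangible.items()
        if time_v == time_t  # synchronicity, modeled here as equal time stamps
    )


# Usage: a1 and b1 are perceived together, so they form one pair of "this".
this = construct_this({"a1": 0, "a2": 1}, {"b1": 0, "b2": 2})
```

No spatial comparison between a visible and a tangible point appears anywhere in the sketch; the pairing rests on the temporal relation alone, as the definition requires.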

The definition of depth or three-dimensionality follows this definition of an object. Two quantities can be defined. First, the ratio between the visible displacement and the tangible displacement of an object can be considered one of the quantities that represent depth. This is because, in ordinary speech, we say that object A is nearer than object B when the visible displacement of A is greater than that of B while the tangible displacements of A and B are equal. Second, the ratio between the visible magnitude and the tangible magnitude of an object can also be considered one of the quantities that represent depth. This is because, in ordinary speech, we say that object A is nearer than object B when the visible magnitude of A is greater than that of B while the tangible magnitudes of A and B are equal. These quantities are defined on the basis of the temporal relation of synchronicity because objects A and B are defined on the basis of that relation. The visible space or visual field that is temporally coordinated with the tangible space by these quantities is to be considered not a mere colored pattern but stereopsis. The temporal structure constructed between the two heterogeneous spatial structures is depth, or the so-called three-dimensionality.

• Depth or three-dimensionality (demonstrative "that") is a higher-order construction:

Depth or three-dimensionality can be constructed as the temporal structure between the visible spatial structure and the tangible spatial structure.
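The two depth quantities described above are both ratios of a visible quantity to a tangible quantity, so one helper can serve for displacement and for magnitude. The sketch below is illustrative only: the function names are hypothetical, the units are arbitrary, and comparing ratios when the tangible quantities differ is an assumption that extends the paper's example (which compares objects with equal tangible quantities).

```python
def visible_to_tangible_ratio(visible_quantity, tangible_quantity):
    """Ratio of a visible displacement (or magnitude) to the tangible one."""
    if tangible_quantity == 0:
        raise ValueError("tangible quantity must be non-zero")
    return visible_quantity / tangible_quantity


def is_nearer(a, b):
    """Return True if object a counts as nearer than object b.

    a, b: (visible quantity, tangible quantity) pairs.
    With equal tangible quantities this reduces to the paper's criterion:
    the object with the greater visible quantity is the nearer one.
    """
    return visible_to_tangible_ratio(*a) > visible_to_tangible_ratio(*b)
```

For example, `is_nearer((4.0, 2.0), (1.0, 2.0))` compares two objects with the same tangible displacement and evaluates to `True` because the first has the greater visible displacement.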

As mentioned earlier, unlike Ayer we cannot say that there is always a similarity between visible spatial structure and tangible spatial structure. According to our view, rather, humans get temporal information from the difference between the two spatial structures. For example, consider a circular coin that looks elliptical from some points of view. According to our view, the term "circular" denotes a tangible shape, and the term "elliptical" denotes a visible shape. From the difference between the two heterogeneous shapes, humans get the temporal information that the tangible points on the tangible circle lie at different temporal distances from the corresponding visible points. If the tangible shape and the visible shape are the same, then all the tangible points on the tangible shape lie at the same temporal distance.

We are now regarding human object perception not as a discovery but as the test of prediction. The concrete object of perception and three-dimensionality constructed according to this scenario give humans such predictions. That is, humans derive predictions following the inverse course of the scenario of the construction.

6. Reframing the problems

In this chapter, we reframe the problem of inferring the cause of sensory input and the problem of the ambiguity of the Necker cube.

6.1 The necessity of reframing the problems

From the view that we are now adopting, the prevailing statement of the problem of human visual perception and depth recognition is inadequate to understand the whole problem. As can be seen in the quotation from Marr, it is said that in the theory of visual processes the underlying task is to reliably derive properties of the world from images of it (Marr 1982). Here the term ''image'' is understood as something that has two-dimensionality and the world is understood as something that has three-dimensionality. Based on this understanding, the problem of the ambiguity of an image is now widely being tackled. We claim that the term ''image'' that emerges in the prevailing statement of the problem needs to be analyzed from the viewpoint of its modality.

It is sometimes said that the explanation of the Necker cube is simple as follows (Frith 2007):

Our brain sees it as a cube rather than as the two-dimensional drawing it really is. But, as a cube, it is ambiguous. It has two possible three-dimensional versions. Our brain randomly switches from one to the other in its continuous attempts to find a better fit for the sensory signals.

However, as we will see below, an account like this is not considered to essentially solve the problem.

6.2 The problem of inferring the cause of sensory inputs

As mentioned in chapter 2, the realistic presupposition derives the indirect or representational theory of perception and the causal theory of perception. The former claims that natural object is inferred from sense and the latter claims that the relation between natural object and sense is the relation of cause and effect. In contrast to the realistic presupposition, one can adopt an anti-realistic presupposition about the object of human perception that derives the direct theory of perception, which claims that sense is not a representation but immediately real for humans and that object is not an inference from sense but construction from sense. The view that object is not an inference from sense but construction from sense is a view also found in Russell (Russell 1914). While the ontological status of the inferred thing may be different from that of sense, the ontological status of the constructed thing can be the same as that of sense.

The problem of inferring the cause of sensory inputs is reframed as the problem of calculating or predicting the temporal structure between the visible spatial structure and the tangible spatial structure. One can take the problem not as the problem of inferring the object in the objective world from two heterogeneous sensory inputs but as the problem of constructing the temporal correspondence between two heterogeneous sensory inputs.

6.3 The problem of the ambiguity of the Necker cube

As mentioned in chapter 2, the realistic presupposition derives the view that sense has the epistemological status of appearance or Schein, or in other words, the view that sense is the origin of the error that inhibits or blocks humans' pursuit of truth. According to this view, sense is a veil of the world while humans pursue a true or an accurate knowledge of the world as it is. In contrast to the realistic presupposition, one can adopt an anti-realistic presupposition about the object of human perception, which derives the view about sense that sense is the basis of intersubjective knowledge of the world including optical illusions.

In general, it is difficult to understand the exact mechanism of an optical illusion. Intellectual knowledge does not always correct incorrect visual knowledge. According to Gregory, object perception is well discussed in Bayesian terms (Kersten, D., Mamassian, P. and Yuille, A. 2004). In contrast with perception, however, illusion has not been sufficiently discussed in this way. How can perspective illusions fit a Bayesian account? Gregory has classified well-known illusions into three categories: illusions that are Bayesian, illusions that are counter-Bayesian, and illusions that are not related to Bayesian terms. He classified the Necker cube as an illusion that is not related to Bayesian terms, while he classified the so-called hollow face and the Kanizsa triangle as Bayesian illusions (Gregory 2006).

We proceed to take up the problem of the ambiguity of the Necker cube. The Necker cube is a well-known optical illusion that concerns the problem of the ambiguity of an image. The existence of two different but perfectly plausible three-dimensional interpretations of the two-dimensional image is the ambiguity of the Necker cube (Marr 1982). Based on the preceding discussions in this paper, we can reinterpret the ambiguity of the Necker cube as the two possible temporal structures between the same two heterogeneous spatial structures:

• The Necker cube can be constructed as the erroneous mental inversion of the veridical temporal relation between the spatial structure of visible space and the spatial structure of tangible space.

This interpretation of the Necker cube basically can be applied to all illusions that concern the inversion of the depth. For example, the illusion named ‘Reverspective’ by Patrick Hughes can be interpreted as the collapse of the constructed temporal structure between two heterogeneous spaces. That is, the illusion occurs when a constructed temporal structure comes out logically impossible. The problem of the ambiguity of the Necker cube is reframed as the problem of two possible temporal structures between the same two heterogeneous spatial structures. Because the constructed temporal structure between two heterogeneous spaces is a prediction, one can consciously switch from one to the other.

According to this viewpoint, a visual shape itself does not have "two-dimensionality". The concept of depth is defined as a temporal relation. A visual shape is called two-dimensional or flat when all the visible points that constitute the visual shape have the same or proportional (linear) temporal distances to each corresponding tangible point that is going to be perceived in the future. The prevailing statement that an image is two-dimensional is, according to this viewpoint, itself a statement about depth or three-dimensionality. We consider that Frith's account of the Necker cube quoted at the beginning of this chapter begs the question and that such an account is not an essential solution to the problem. According to our viewpoint, in contrast, a visual shape is two-dimensional or flat when all the visible points that constitute the visual shape have linear temporal distances to each corresponding tangible point. If the temporal distances are quadratic, then the tangible shape is called a cylinder, and the visible shape that corresponds to the tangible shape will be called not two-dimensional but three-dimensional.
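The distinction between linear and second-order temporal distances can be made concrete with finite differences. The following Python sketch rests on assumptions that are not in the paper: temporal distances are given as numbers, sampled at equally spaced visible points along one line of the visual shape. The function names and the labels "linear", "quadratic", and "other" are hypothetical.

```python
def differences(values):
    """Successive differences of a sequence of numbers."""
    return [b - a for a, b in zip(values, values[1:])]


def classify_profile(temporal_distances, tol=1e-9):
    """Classify temporal distances sampled at equally spaced visible points.

    "linear":    second differences vanish (the shape would be called flat).
    "quadratic": third differences vanish but second differences do not.
    "other":     neither of the above.
    """
    second = differences(differences(temporal_distances))
    if all(abs(v) <= tol for v in second):
        return "linear"
    third = differences(second)
    if all(abs(v) <= tol for v in third):
        return "quadratic"
    return "other"
```

With fewer than three samples every profile counts as linear, and with exactly three samples every non-linear profile counts as quadratic, so the classification is informative only for four or more samples.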

This interpretation of the ambiguity of the Necker cube can be implemented by Carnapian inductive logic, which will give us a probability account of the Necker cube.

7. Language setup

We set up language for formal discussion implemented by Carnapian inductive logic.

7.1 Some deviations from the ordinary usage of logic

We will use logic for the logical construction of a concrete perceptual object. As such, our usage of logic will have some deviations from the ordinary usage of logic. Roughly speaking, the deviation concerns the relation between term, sentence, and inference.

The so-called individual constants ordinarily denote particulars. In contrast with this ordinary usage, our individual constants denote visible points and tangible points. We assume these points are not particulars but the minimum inputs from which particulars are made. In this sense, our usage of logic deviates from the ordinary usage. This deviation occurs mainly because the so-called propositional function in modern logic was born in a philosophical analytic context. After Frege, it is generally considered that one must not ask for the meaning of a term apart from the sentence that contains it. This view is called the context principle. Frege also proposed the priority thesis of judgment, which claims that judgment (sentence) precedes concept (term). According to these views, we do not make sentences by connecting ready-made terms; rather, we obtain terms only through the analysis of ready-made sentences (Frege 1879). Because the propositional function in modern logic was born in such a philosophical analytic context, blanks in a propositional function are filled only with terms, and a filled propositional function is always a judgment. In contrast with this ordinary usage of the propositional function, our individual constants are not terms and our filled propositional functions will not be sentences in the everyday sense (e.g. "A tangible point a is synchronous with a visible point b.").

The general definition of logic is that it is the analysis and cataloging of valid arguments (Mendelson 1987). Inference is the method of producing a new sentence from a set of sentences. An argument consists of two sentences, the premise and the conclusion, and logic concerns the form of the argument. In contrast with these general definitions of logic, the logical relation between our sentences will not be inference in the everyday sense. This is because we will use logic for the logical construction of a concrete perceptual object. We are going to construct a concrete perceptual object (demonstratives such as "this" and "that") out of inference, which means that inference precedes concept in a certain sense. This produces the above deviations from the ordinary usage of logic.

7.2 Carnapian inductive logic

It can be said that logic concerns the strength of the relation between the premises and the conclusion of arguments. The relations can have various degrees of strength. For example, if the premises logically entail the conclusion, the relation is the strongest. If the premises logically entail the negation of the conclusion, the relation is the weakest. If the argument is inductive, the strength of the relation between the premises and the conclusion will fall somewhere between the above two cases. The concept of probability is utilized to grade the strength of the logical relation between the premise and the conclusion of arguments. This probability is called inductive probability (Maher 2006). The concept of inductive probability applies to arguments and the inductive probability of an argument is the probability that its conclusion is true given that its premises are true.

One of the most important uses of inductive logic is to predict the future (Skyrms 1966). We use induction when we predict the future, or when we use our knowledge of the past and present as a guide to our predictions of the future.

Carnap attempted to, so to speak, embed both deductive inference and enumerative induction into the mathematical probability theory axiomatized by Kolmogorov (Kolmogorov 1933). That is, the numerical value 1 is assigned to the strongest relation and the numerical value 0 is assigned to the weakest relation. In addition, numerical values are assigned to all logical relations in the given language L so as to satisfy Kolmogorov's axioms:

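(1) For any A, B: $c(A, B) \geq 0$.

(2) For any A: $c(A, A) = 1$.

(3) For any A, B: $c(A, B) + c(\neg A, B) = 1$.

(4) For any A, B, C: $c(A \,\&\, B, C) = c(A, C)\, c(B, A \,\&\, C)$.

(5) For any A, B, C, D: if A is logically equivalent to C ($A \equiv C$) and B is logically equivalent to D ($B \equiv D$), then $c(A, B) = c(C, D)$.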

Here A, B, C, and D are well-formed formulae in the given language L; each serves as a conclusion or a premise, with the first argument of c read as the conclusion and the second as the premise. The function c is called the inductive probability. Carnap attempted to show how the function c is uniquely determined for any fixed language (Carnap 1962).
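To see what a function c satisfying axioms (1) to (4) can look like, the following Python sketch builds one for a toy language with two atomic sentences. Everything here is an illustrative assumption: the atoms `p` and `q`, the helper names, and above all the choice of a uniform measure over truth assignments, which is only one simple candidate and not the function that the paper seeks for the language of perception.

```python
from fractions import Fraction
from itertools import product

ATOMS = ("p", "q")  # hypothetical atomic sentences of a toy language
# Every truth assignment to the atoms (a "state description").
WORLDS = [dict(zip(ATOMS, values)) for values in product([True, False], repeat=len(ATOMS))]


def c(conclusion, premise):
    """Inductive probability of the conclusion given the premise.

    Sentences are modeled as functions from a world to True/False.
    Assumes a uniform measure over worlds; undefined for inconsistent premises.
    """
    premise_worlds = [w for w in WORLDS if premise(w)]
    if not premise_worlds:
        raise ValueError("premise is inconsistent; c is undefined here")
    favourable = [w for w in premise_worlds if conclusion(w)]
    return Fraction(len(favourable), len(premise_worlds))


def p(w):
    return w["p"]


def q(w):
    return w["q"]


def tautology(w):
    return True


def neg(a):
    return lambda w: not a(w)


def conj(a, b):
    return lambda w: a(w) and b(w)
```

With these definitions, `c(p, p)` is 1 as in axiom (2), `c(p, q) + c(neg(p), q)` is 1 as in axiom (3), and `c(conj(p, q), tautology)` equals `c(p, tautology) * c(q, conj(p, tautology))` as in axiom (4). Axiom (5) holds by construction, since logically equivalent sentences are true in the same worlds.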

Inductive logic concerns to what extent the premise supports or justifies the conclusion (What inductive logic concerns is not the context of discovery but the context of justification). As mentioned in chapter 2, we regard human object perception as a test of one’s prediction. Therefore we can link the constructivist view of human object perception and Carnapian inductive logic as a method.
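The link to Bayesian accounts of perception can be made explicit, because a form of Bayes' theorem follows from axioms (4) and (5) above. The short derivation below is a sketch; reading A as a hypothesis, B as evidence, and C as background knowledge is an interpretive assumption added here for illustration.

```latex
\begin{align*}
c(A \,\&\, B,\ C) &= c(A, C)\, c(B,\ A \,\&\, C)
  && \text{axiom (4)} \\
c(B \,\&\, A,\ C) &= c(B, C)\, c(A,\ B \,\&\, C)
  && \text{axiom (4) with $A$ and $B$ exchanged} \\
c(A \,\&\, B,\ C) &= c(B \,\&\, A,\ C)
  && \text{axiom (5), since $A \,\&\, B \equiv B \,\&\, A$} \\
c(A,\ B \,\&\, C) &= \frac{c(A, C)\, c(B,\ A \,\&\, C)}{c(B, C)}
  && \text{provided } c(B, C) > 0
\end{align*}
```

The last line has the familiar shape of posterior, prior, and likelihood, with every term conditional on the same background sentence C.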

7.3 Preliminary: the problem of the choice of primitive

Language consists of primitive predicates, individual constants, and logical symbols. As primitive predicates, one can choose either relations or properties, where, as usual, the term "relation" means a propositional function that contains two or more variables and the term "property" means a propositional function that contains only one variable. Let us assume that the a priori forms of space and time are linguistically expressed by binary relations, that is, propositional functions that contain two variables. We choose the binary relations of space and time as primitive predicates.

7.4 Two technical deficiencies of current inductive logic

Because of the following technical deficiencies of current inductive logic, full construction is not available for now. There are two main technical deficiencies. Firstly, Carnap dealt with only unary predicates. Because we defined sense modalities as the relata of binary relations, we need the inductive logic that can deal with binary predicates. Secondly, Carnap dealt with only first-order predicates. Because we defined both veridical perception and illusion as a higher-order construction or a fiction, we need the inductive logic that can deal with higher-order predicates.

At present, there are two kinds of extended Carnapian inductive logic. One extends the species of inference: Hintikka has extended Carnapian inductive logic to deal with not only enumerative induction but also analogy (Hintikka 1966). The other extends the species of sentence: recently, Paris has been extending Carnapian inductive logic to deal with not only unary predicates but also polyadic predicates, including relations (Paris, J., Vencovska, A., 2015). The probabilistic logical construction of the Necker cube needs a more comprehensive and applicable inductive logic.

7.5 Language setup

We set up the language L to start the probabilistic logical construction. Language consists of primitive predicates, individual constants, and logical symbols. According to the preceding discussions, the language L for the probabilistic logical construction of human object perception will be as follows.

Firstly, primitive predicates, which correspond to a priori forms, are the following relations.

Space:

• S1: x is located to the right of y.
• S2: x is located to the left of y.
• S3: x is located above y.
• S4: x is located below y.

Time:

• T1: x is synchronous with y.
• T2: x is earlier than y.
• T3: x is later than y.

These are supposed to be exclusive and exhaustive.

Secondly, individual constants are the following. These are minimum sensory inputs:

• visible points: a1, a2, …
• tangible points: b1, b2, …

Thirdly, logical symbols are

• not: ¬
• and: &
• or: ∨
• if … then: →

The atomic sentences consist of individual constants and predicates.

• a1S1a2, b1S1b2, …
• a1T1b1, a2T1b2, …

The molecular sentences consist of atomic sentences and logical symbols. Lastly, the sentences are atomic sentences and molecular sentences.
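For a finite stock of points, the atomic sentences of L can be enumerated mechanically. The Python sketch below does so under two assumptions that go beyond the text: spatial predicates are applied only to two distinct points of the same modality, and temporal predicates are written with the visible point first, following the pattern of the examples a1T1b1 and a2T1b2. The function name `atomic_sentences` is hypothetical.

```python
from itertools import permutations, product

SPATIAL = ("S1", "S2", "S3", "S4")  # right of, left of, above, below
TEMPORAL = ("T1", "T2", "T3")       # synchronous with, earlier than, later than


def atomic_sentences(n_visible, n_tangible):
    """List the atomic sentences of L for a1..a_n (visible) and b1..b_m (tangible)."""
    visible = [f"a{i}" for i in range(1, n_visible + 1)]
    tangible = [f"b{i}" for i in range(1, n_tangible + 1)]
    sentences = []
    # Spatial predicates relate two distinct points of the same modality.
    for points in (visible, tangible):
        for x, y in permutations(points, 2):
            sentences.extend(f"{x}{s}{y}" for s in SPATIAL)
    # Temporal predicates relate a visible point to a tangible point.
    for x, y in product(visible, tangible):
        sentences.extend(f"{x}{t}{y}" for t in TEMPORAL)
    return sentences
```

With two visible and two tangible points this yields 16 spatial and 12 temporal atomic sentences. Strings such as a1S1b1 never appear, which reflects the linguistic definition of sense modality in section 4.3.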

Future work is to develop a more comprehensive inductive logic and to obtain the function c that models human object perception for the language L.

8. Closing Remarks

An adequate Bayesian theory seems to require the axiomatization of background knowledge and this paper has argued how to go about it.

The problem of the ambiguity of an image is a crucial problem for the duplication or modeling of human object perception. I attempted to point out a deficiency of the prevailing statement of the problem, namely that the underlying task of the theory of visual processes is to derive properties of the three-dimensional world from two-dimensional images of it. Instead of the prevailing Bayesian statistical constructivist approach that presupposes realism about the object of human perception, this paper proposed a logical constructivist approach that presupposes anti-realism about the object of human perception. It can be expected that the probabilistic logical construction implemented by Carnapian inductive logic will show how to go about programming human object perception on a computer, that is, that it will provide a computer program or software that expresses the process of the establishment of human object perception.

As is stated by Gregory (Gregory 2006), the problem of optical illusion in human object perception can be stated as the problem that intellectual knowledge does not always correct incorrect visual knowledge. This problem concerns the following questions: How do perceptions relate to conceptions? How can perceptual experience justify our conceptual belief? This problem of the relation between perceptual experience and conceptual belief is one of the central issues in the contemporary epistemology of perception. One of the prevailing views in this discipline is that the relation between them cannot be logical, since sensations are not beliefs or other propositional attitudes. Since one can have the inferential relation of justification only when both relata have conceptual content, according to this view, a perceptual experience cannot support a conceptual belief. If the object of human perception is logically constructed in accordance with the constructivist view, then the view that the relation between perceptual experience and conceptual belief is a logical one may be defended against this prevailing view.

If both veridical perception and optical illusion are constructed out of the same primitive in the attempt of logical construction, the anti-realistic direct theory of perception, which claims that sense is not a representation for humans but immediately real for humans, can be defended against the so-called ‘argument from illusion’. In the philosophy of perception, the argument from illusion is a well-known argument that supports the realistic indirect theory of perception.

These philosophical problems of optical illusion can concern the pursuit of the duplication or modeling of human object perception.

As Marr put it, the bones of the theory of vision should correspond to the perspective of the plain man, that is, what the plain man knows to be true about visual perception. In addition, it should concern the perspective of the neurophysiologists, who know a great deal about how the nervous system is built. For example, it has become known that there are properties of vision that one cannot explain only by the neuronal activity of area MT, which is known to have a strong connection to visual motion (Curr Biol. 21 (23): 2023-2028. 2011). These findings may be consistent with the view that an adequate understanding of our capacity to perceive objects and three-dimensional space will require a united theory of sight and touch. Because one of the major aims of neurophysiology is to understand the neural basis of perceptual phenomena, whether there is a consistency with neurophysiological work must be asked.

References

Ayer, A. J. (1936). Language, Truth and Logic. Gollancz.

Ayer, A. J. (1940). Foundations of Empirical Knowledge, Macmillan and Co., Limited.

Berkeley, G. (1709). An Essay Towards a New Theory of Vision, The works of George Berkeley bishop of Cloyne 1. eds. Luce, A. A., Jessop, T. E., 159–239, London: Thomas Nelson and Sons.

Berkeley, G. (1713). Three Dialogues between Hylas and Philonous, The works of George Berkeley bishop of Cloyne 2. eds. Luce, A. A., Jessop, T. E., 163–263, London: Thomas Nelson and Sons.

Carnap, R. (1928). Der logische Aufbau der Welt. Berlin: Weltkreis Verlag.

Carnap, R. (1962). Logical Foundations of Probability (2nd ed.), University of Chicago Press.

Fish, W. (2010). Philosophy of perception. Routledge.

Friston, K., Kilner, J., & Harrison, L. (2006). ‘A free energy principle for the brain,’ Journal of Physiology Paris, 100: 70-87.

Frith, C. D. (2007). Making up the Mind: How the Brain Creates our Mental World, Malden: Blackwell.

Gregory, R. L. (2006). ‘Bayes Window (4): Table of illusions,’ Perception, 35(4): 431-2.

Gregory, R. L. (2006). ‘Bayes Window (2),’ Perception, 35: 143-144.

Helmholtz, H. (1867/1962). Treatise on Physiological Optics volume 3, (New York: Dover, 1962); English translation by J P C Southall for the Optical Society of America (1925) from the 3rd German edition of Handbuch der physiologischen Optik (first published in 1867, Leipzig: Voss).

Hintikka, J. (1966). ‘A Two-Dimensional Continuum of Inductive Methods,’ Aspects of Inductive Logic. North-Holland Publishing Company.

Janich, P. (1992). Grenzen der Naturwissenschaft. München: C. H. Beck.

Kersten, D., Mamassian, P. and Yuille, A. (2004). ‘Object perception as Bayesian inference,’ Annual Review of Psychology, 55: 271-304.

Mach, E. (1984). The analysis of sensations and the relation of the physical to the psychical. Trans. by C. M. Williams, La Salle, Open Court.

Maher, P. (2006). ‘The concept of inductive probability,’ Erkenntnis, 65: 185-206.

Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Cambridge, MIT Press.

Mendelson, E. (1987). Introduction to Mathematical Logic. Wadsworth & Brooks/Cole.

Paris, J., Vencovska, A. (2015). Pure Inductive Logic, Cambridge University Press.

Russell, B. (1914). Our Knowledge of the External World as a Field for Scientific Method in Philosophy. Open Court Publishing.

Skyrms, B. (1966). Choice and Chance, Dickenson Publishing Company, Inc.

Vincent (2015). ‘A tutorial on Bayesian models of perception,’ Journal of mathematical psychology, 66: 103-114.

Acknowledgments The main idea of this paper was presented at the 16th Congress of the Logic, Methodology, and the Philosophy of Science and Technology (Prague, Czech Republic, 2019). I thank the participants for their comments.