Knowledge Problems

Monday, August 13, 2007

The Fundamental Operation: Projection

To illustrate a further important point, below is an actual object-property table. It shows 10 objects and 5 attributes. You can think of it as representing 10 objects that just came off an assembly line. Each of those objects has 5 properties, for example `\{P_1,P_2,P_3,P_4,P_5\}=\{text{color},text{shape},text{texture},text{size},text{orientation}\}`. These objects can then have three possible colors, `P_1 in \{0,1,2\}`, which might be `\{text{blue},text{red},text{green}\}`. Not all of the objects need be distinct, and in fact there are several identical objects in this set. (Can you find them? Just kidding.)


Object Property
  `P_1` `P_2` `P_3` `P_4` `P_5`
`O_1` 1 2 0 1 1
`O_2` 1 2 0 1 1
`O_3` 2 0 0 1 0
`O_4` 0 0 1 2 1
`O_5` 2 1 0 2 1
`O_6` 0 0 1 2 2
`O_7` 2 0 0 1 0
`O_8` 0 1 2 2 1
`O_9` 2 1 0 2 2
`O_{10}` 2 0 0 1 0


The crucial fact that must be observed is that the distinguishability of objects depends on which attributes are considered. For example, in the table above, we can already note that objects `\{O_1,O_2\}` are indistinguishable, as are `\{O_3,O_7,O_{10}\}`. The objects in these sets correspond to the same point in feature space, even if they are physically distinct objects (which we assume they are). However, if we were to further assume that attributes `\{P_1,P_2,P_3\}` are inaccessible, and were therefore able to consider only attributes `P_4` and `P_5`, we would observe (table below) that several other objects now also become indistinguishable, collapsing to the same point in feature space.


Object Property
  `P_4` `P_5`
`O_1` 1 1
`O_2` 1 1
`O_3` 1 0
`O_4` 2 1
`O_5` 2 1
`O_6` 2 2
`O_7` 1 0
`O_8` 2 1
`O_9` 2 2
`O_{10}` 1 0


In particular, we now find that we have the following sets of indistinguishable objects:

`{(\{O_1,O_2\}),(\{O_3,O_7,O_{10}\}),(\{O_4,O_5,O_8\}),(\{O_6,O_9\}):}`

So there are only four discriminable kinds of objects represented in this knowledge system after attributes `\{P_1,P_2,P_3\}` have been dropped and consideration has been restricted to attributes `\{P_4,P_5\}`. These four kinds are called "equivalence classes" because the entities contained in each class are indistinguishable (equivalent) based on the attributes under consideration. The process of "dropping attributes" is called projection and corresponds to an orthogonal (parallel to axes) geometric projection in feature space. In the previous case, we would say that the data is projected onto dimensions `\{P_4,P_5\}`. In general, the projection onto a given set of attributes will produce a collection of equivalence classes containing objects which cannot be disambiguated based on those attributes. This will be extremely important in what follows.
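The projection just described can be sketched in a few lines of Python. This is only an illustrative sketch: the names `TABLE`, `project`, and `equivalence_classes` are assumptions introduced here, not terminology from the Rough Set literature, and the data is simply the 10-object table shown earlier in this post.

```python
from collections import defaultdict

# The 10-object table from this post; each tuple holds (P_1, ..., P_5).
TABLE = {
    "O_1": (1, 2, 0, 1, 1),
    "O_2": (1, 2, 0, 1, 1),
    "O_3": (2, 0, 0, 1, 0),
    "O_4": (0, 0, 1, 2, 1),
    "O_5": (2, 1, 0, 2, 1),
    "O_6": (0, 0, 1, 2, 2),
    "O_7": (2, 0, 0, 1, 0),
    "O_8": (0, 1, 2, 2, 1),
    "O_9": (2, 1, 0, 2, 2),
    "O_10": (2, 0, 0, 1, 0),
}


def project(row, attributes):
    """Keep only the listed attributes (1-based, so 4 means P_4)."""
    return tuple(row[a - 1] for a in attributes)


def equivalence_classes(table, attributes):
    """Group objects whose projections onto `attributes` coincide."""
    classes = defaultdict(list)
    for name, row in table.items():
        classes[project(row, attributes)].append(name)
    return list(classes.values())


full = equivalence_classes(TABLE, [1, 2, 3, 4, 5])
reduced = equivalence_classes(TABLE, [4, 5])
print(len(full), len(reduced))
for members in reduced:
    print(members)
```

With all five attributes the ten objects fall into seven classes; restricted to `P_4` and `P_5`, only the four classes listed above remain.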

Wednesday, August 01, 2007

A Basic Knowledge Representation Framework

Reminder: There is some mathematical notation in this post. If you use Internet Explorer, you may need to download the free MathPlayer plug-in from Design Science. If you can see the equation below (rather than a bunch of ASCII text), you should already be good to go:

`sum_(k=1)^n k = 1+2+ cdots +n=(n(n+1))/2`

OK. On with the show...


Perhaps the most basic knowledge representation framework is the rectangular object-property table or array, in which rows represent individual objects or entities, and columns represent properties or variables. An example is shown below:


Object Property
  `P_1` `P_2` `P_3` `\ldots` `P_n`
`O_1` 2 1 100 `\ldots` -0.01
`O_2` 3 0 430 `\ldots` 0.23
`O_3` 2 1 (missing) `\ldots` 0.30
`\vdots` `\vdots` `\vdots` `\vdots` `\vdots` `\vdots`
`O_m` 3 0 430 `\ldots` -0.43


Sometimes this format is simply called "raw data" or "flat data," because this is the basic representation for data acquired from multiple sensors, channels, etc., and the format used in familiar spreadsheet applications. In the Rough Set literature (pawlak_91), this framework is called an "information system" (pawlak_81), "knowledge representation system" (wong_ziarko_86), "attribute-value system" (ziarko_shan_96) or "information table" (yao_yao_02). In the case where the property values are binary (i.e., present or absent), the format is what Watanabe (1985) calls an "object-predicate table," or "Aristotelian table". It can be given a rigorous mathematical definition, but we'll skip that here, since it's pretty clear what's going on. A great many knowledge discovery problems can be represented in this model (ziarko_shan_96).

The `m \times n` attribute-value table above represents a set of `m` objects (also called entities or situations), wherein each object or situation (i.e., each row in the table) is described by a set of `n` properties (also called variables, attributes, features, or dimensions). We use these terms interchangeably, as do others (goldstone_98,bruner_goodnow_86), although many authors also find it convenient to draw distinctions. In the example above, all properties have been coded numerically, although it makes no difference for our discussion. For example, the `P_1` property might represent "day of week," and the actual names of the days could be substituted for the numerical indices shown. The example also illustrates that different properties may have values drawn from different domains. For example, whereas property `P_1` appears to adopt single-digit integer values, property `P_2` appears to be a (binary) attribute adopting values in the domain `\{0,1\}`, property `P_3` appears to adopt larger integer values, and property `P_{n}` apparently adopts real (floating point) values, perhaps in the domain `(-1,1)`. The issue of missing values, such as that for property `P_3` on object `O_3`, will not be relevant to this discussion.
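For readers who think in code, here is one possible way to hold such a table in Python. It is a minimal sketch under stated assumptions: the names `COLUMNS`, `ROWS`, and `observed_domain` are made up for illustration, only the rows and columns actually visible in the example are included (the elided ones are left out rather than guessed), and `None` is an arbitrary choice of marker for the missing value.

```python
# Columns and rows visible in the example table; elided entries are omitted.
COLUMNS = ("P_1", "P_2", "P_3", "P_n")
ROWS = {
    "O_1": (2, 1, 100, -0.01),
    "O_2": (3, 0, 430, 0.23),
    "O_3": (2, 1, None, 0.30),  # None marks the missing P_3 value
    "O_m": (3, 0, 430, -0.43),
}


def observed_domain(column):
    """Set of values actually observed for one property, ignoring gaps."""
    index = COLUMNS.index(column)
    return {row[index] for row in ROWS.values() if row[index] is not None}


for column in COLUMNS:
    print(column, sorted(observed_domain(column)))
```

Note that this only recovers the values that happen to occur in the visible rows; the true domain of a property may be larger.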

Observations


We will move to a more concrete example a bit later, but let us first pause to note a number of points about this kind of representation.

  1. An object or entity in this representation is just a list of property values, called a "tuple" in database jargon. Thus, object `O_1` can be described as the property vector or ordered list `[2,1,100,\ldots,-.01]`, which is just convenient shorthand for `[P_1=2,P_2=1,P_3=100,\ldots,P_{n}=-.01]`. Geometrically, then, each object represents a point in an `n`-dimensional "feature space."

  2. An object in this framework is just a list of property values. Two objects having the same set of property values are entirely indistinguishable. Depending on the application, it may still be the case that these objects are physically distinct from one another, but they are nevertheless indiscriminable based on their properties, and therefore completely interchangeable. Identical objects occupy the same point in feature space. There is in some respect a commitment here to something like Russell's "bundle theory" in which an object is nothing more or less than its descriptions (in all possible worlds), there being no "substrata" or essence to it; that is, there is no substrate in which the properties inhere. However, since all we can ever deal with rationally are descriptions of one sort or another, this doesn't seem like a shortcoming unique to this particular representational system. More on this later.

  3. This kind of system can represent many kinds of knowledge. The objects in question may be individual entities, such as donuts on a shelf, where the properties may indicate donut attributes such as flavor, topping, calories, etc. Alternatively, the objects may represent the state of some system at discrete times. For example, an "object" or "situation" might be the weather in Central Park at a given moment, so that `O_1` is the weather at 9am, `O_2` is the weather at 10am, `O_3` is the weather at 11am, and so on. The attributes in this case could be descriptors of weather such as temperature, pressure, precipitation, cloud cover, etc. Thus, the knowledge base in this instance represents a multidimensional time-series or multidimensional signal. This idea is incredibly powerful, because each object/row then represents "the state of the universe" at a given moment in time. By "universe," I mean "universe of discourse," that is, the set of all properties that we are concerned about in a given application, which in the limit we can imagine to include all the properties in the actual universe. (We can imagine!)

  4. A real-valued (i.e., "continuous") or high-cardinality variable such as `P_{n}` in the example above would typically need to be discretized (quantized) for purposes of analysis. Patterns in data often only become (statistically) visible when we step back and view the data at a distance, i.e., at a coarser level of resolution (ziarko_89). To do this, we represent an entire range of variable values with a single symbol, just as we do when we round off decimal numbers to the nearest integer. This process of quantization is actually a matter of categorization, which we will discuss in much greater detail later. In the examples that follow, we will play only with integer values.

  5. In almost all cases of any interest, a given object-property table will be considerably smaller than the maximum size that such a data table might have if all combinations of attribute values were to occur. That is to say, in practice, not every object that can logically occur does actually occur (Mervis_Rosch81). For example, although one might logically conceive of a flying animal that weighs more than 300 pounds, in fact there are no such animals. This means that not every value of the attribute "able to fly" defined on domain `{yes,no}` co-occurs with every value of attribute "weight" defined on the real numbers. The observation that not all objects that can occur do occur, or — more generally — that their probabilities of co-occurrence are not uniform — this is an observation about the very essence of structure (pomerantz_lockhead_91). Structure in data manifests through the unequal co-occurrence of certain sets of attribute values, i.e., the tendency of certain attribute combinations to occur with greater or lesser frequency than other attribute combinations. The "empty locations" in feature space — the events that could logically have happened but did not — are the hallmark of structure. We will return to this important idea a little later.

  6. There is an object-property duality that becomes apparent by rotating the object-property table 90° counterclockwise. Just as it is possible to describe an object as a vector or tuple of property values, it is possible to describe a property as a vector of "object values." A given property then corresponds to a particular point in "object space." This duality is very interesting and, as I understand it, forms the basis for the field of Formal Concept Analysis (FCA), but we will not be concerned with it here.
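Observations 4 through 6 can be made concrete with a small Python sketch. Everything beyond the data is an assumption made for illustration: the function names `quantize`, `occupancy`, and `transpose`, the bin width of 0.5, and the decision to treat the values observed in the visible rows as the full domain of each property. The data is the visible `P_1`, `P_2`, and `P_n` values from the example table.

```python
import math

# Visible (P_1, P_2, P_n) values of O_1, O_2, O_3 and O_m from the table above.
ROWS = {
    "O_1": (2, 1, -0.01),
    "O_2": (3, 0, 0.23),
    "O_3": (2, 1, 0.30),
    "O_m": (3, 0, -0.43),
}


def quantize(value, width=0.5):
    """Map a real value to the index k of its bin [k*width, (k+1)*width)."""
    return math.floor(value / width)


def occupancy(rows):
    """Return (occupied cells, possible cells) over the observed domains."""
    domains = [set(column) for column in zip(*rows)]
    possible = math.prod(len(domain) for domain in domains)
    return len(set(rows)), possible


def transpose(rows):
    """Swap rows and columns: each property becomes a tuple of object values."""
    return list(zip(*rows))


# Observation 4: replace the real-valued P_n by a coarse integer symbol.
coarse = [(p1, p2, quantize(pn)) for p1, p2, pn in ROWS.values()]
print(coarse)

# Observation 5: compare the cells that occur with the cells that could occur.
print(occupancy(coarse))

# Observation 6: view the same table in "object space."
print(transpose(coarse))
```

In this tiny sample only 4 of the 8 logically possible cells are occupied, because `P_1 = 2` always appears with `P_2 = 1` and `P_1 = 3` with `P_2 = 0`. That unequal co-occurrence is the kind of structure observation 5 refers to.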


Finally, before we get to the major limitation of this framework, let us point out a minor limitation: The information system representation described above has been accused of failing to adequately represent relational information (e.g., arnone_71). As Hahn & Chater (1997) point out, a "bird" is not just a collection of features {wings, beak, feathers, ...}, but is rather a collection of these features having a particular relationship to one another: "A creature with all the right features in the wrong arrangement would not be a bird!" However, while the information system may not be the optimal representation for relational information, it is still possible to represent such information within the system. If, for example, the distance from wing tip to wing tip is an important relational aspect of birds, we can introduce a property that represents this distance, and we can do the same for any other relational qualities. If it is a relation among many features that is of importance, then we can introduce a new variable to code that as well. This is not to say that the information system gives us the reasoning apparatus by which to deduce new relational information, such as would automatically allow us to know, for example, that "A larger than B and B larger than C entails A larger than C." No, we are simply stipulating that it is possible to represent relational information within the object-property table, and this should be enough.

The Major Limitation


Now, having gotten the more neutral observations out of the way, it is crucial to point out a number of limitations on the sort of knowledge representation system we have been describing above. Principally, the very idea of there being "objects" and "attributes" is a philosophically troublesome one. Firstly, it is just not obvious that we can represent the totality of the external world in terms of objects (Watanabe_85). Who instructs us on what are the "correct" properties by which to distinguish legitimate objects? In the opinion of Bruner, Goodnow, & Austin (1986), all that is required of an attribute is that it be a distinguishable element: "An attribute, in brief, is any discriminable feature of an event that is susceptible of some discriminable variation from event to event. Indeed, if it did not vary it would very likely not be discriminable in any case — the fish will be the last to discover water." So essentially anything discriminable constitutes an attribute, which therefore places almost no constraint on the set of possible attributes for a given system. It goes without saying (maybe) that if we have a discriminable property `P_1` and a discriminable property `P_2`, we can always imagine an additional discriminable property `Q` that is equal to `P_1^2`, or `P_1 \times P_2`, or `\pi P_1 sqrt(P_2) + 42`. So what makes one feature set more legitimate than another? And if what distinguishes two objects is just their features, as our object-property representation scheme assumes, then this ambiguity on what constitutes legitimate features propagates to an ambiguity on what constitutes legitimate objects (watanabe_85).

More provocatively, are there indeed really such things as features or objects at all? This question did not appear to bother the medieval or ancient thinkers, who believed that objects and features were indeed objective aspects of the external world. "Tails" and "hooves" are indeed genuine features of horses, and "horses" are in turn genuine objects. The ancients could debate how significant the possession of tails or hooves is to a horse's horseness, but they could not debate the fact that these objects and attributes genuinely exist. However, in the wake of Berkeley and Kant, the confidence about what constitutes objects and features has evaporated, and it is now clear that the things we subjectively regard as objects and features are themselves the product of a complex interaction between what exists in the external world (distal stimulus) and a sequence of processing performed by our perceptual and cognitive apparatus. Thus, the features to which we have access are symbols employed by a knowledge representation system (our mind) that correspond in some consistent way to aspects of the external world (markman_99), but these features are still notably the product of a classification already imposed by the mind on the world. We cannot of course assume that features are strictly internal symbols, since then we would face a much larger problem of explaining in what way (if any) these features are grounded in reality (harnad_90).

This bifurcation between external and internal features has been given the catchy name of Occam's Cleaver (panaccio_05), which, stated poetically, instructs us that "we should be cautious not [to] take as features of the things signified the features of the signs that signify them. In other words, we should not conflate representational features with ontological ones." This cautionary attitude toward features and objects certainly also characterizes the position of modern cognitive science, where the topic of how objects and features are "made" by minds remains a vigorous area of empirical study, often yielding surprising results. In machine learning and statistics, the question of "What are the features?" plays out practically in the areas of "feature selection" and "feature creation," which include such techniques as Independent Components Analysis and Projection Pursuit. Although there is no final answer to this question, it always being a matter of needs and expectations for a particular application, in some sense a learning theorist might say that the "right" set of features is one that captures the majority of the intrinsic structure in a given system. Any set of features having this property is then a "right" set of features, and there may obviously be many such sets.

This last observation gives us a way to recover somewhat from criticisms regarding a severe over-commitment to objects and features. Let it be the case that an actual system (in the distal world) can indeed be described in a myriad of ways; that is, using a large or infinite variety of different feature sets. We can still assume that there is at least one such description that will adequately capture all the interesting structure in that system. In other words, there is some property-based description — not necessarily available to a given observer — but there is some description, in some property language, that adequately represents the phenomenon in question. This assumption can be rephrased to state that there is no system in the distal world that is completely impervious to description in some property language — i.e., by some set of attributes or variables. This is not an assumption that every distal system is describable by us — just that it is, in principle, susceptible to description.

The Major Objection


It will probably immediately be objected that in fact there is a perfect and devastating counterexample to the assumption that "there is no system in the distal world that is completely impervious to description in some property language," and that this counterexample is the Deity himself. After all, have not Rambam and a thousand lesser souls admonished us that God is beyond and above all description? Wasn't this their whole point: That there is at least one distal system, God, that is not amenable to any description whatsoever?

My answer is that, yes, in some instances this may have been exactly their point. But, I believe that many thinkers (e.g., Philo, Aquinas, see earlier) probably intended something much less sweeping, and meant by their remarks merely that God is not describable by or to human minds. Not that he is resistant to description altogether. (Can even God not describe God?) This more moderate unknowability proposal poses no problem for us, since all we are assuming of distal systems is susceptibility to some description, not necessarily description by the human mind. The more extreme claim, however — that God is resistant to property-based description of any kind whatsoever, whether the description be accessible by human minds or not — this would in fact pose a problem for us, if it were a coherent claim. But it is not.

It is certainly possible to say that "God is resistant to any description whatsoever," but this is writing a check that cannot be cashed. I do not believe we can claim to offer a coherent concept of "that which cannot be described in any possible description language." That is, we can neither offer an example of such an indescribable entity, nor can we provide an explanation of how or why it is that an entity would be resistant to description in every possible language. In light of this, we have to consider the idea of the absolutely indescribable entity to be incoherent. Now, I do realize that there are arguments made by Rambam and many others as to why it is that God cannot have qualities of any kind, which as I understand the arguments (see earlier posts) all come down to the issue of the unity or non-compositionality of God. And, as I said earlier as part of my "unavoidable heresy", the idea of non-compositionality is also incoherent, and so proofs from that direction cannot ameliorate the incoherence in the notion of a system that has absolute resistance to description.

And so my position here is as follows: The notion of an entity that is completely impervious to description under any possible description language is a notion that is incoherent, and a concept which can neither be explained nor exemplified. Hamilton (1864) writes that "What in reality has no qualities, has no existence in thought, — it is a logical nonentity," and by this I understand him to mean that it is incoherent to talk about a system or phenomenon which has no attributes whatsoever. Such a notion represents not profundity, but stupidity.

That being said, the idea that a system or phenomenon might be indescribable to some observer or by some property language is not stupidity. It may not be especially profound either, as it turns out, since we are well familiar with many observers having limited descriptive abilities: infants, mosquitoes, chess programs, etc. It is entirely coherent to state that a given phenomenon in the distal world is describable by certain observers and not by others. This is the approach we will take in what follows.