Concept learning

How do people generalize from examples?

Simplicity and complexity in concept learning. One of the main hypotheses in the lab is that humans choose the simplest generalization consistent with the objects that have been observed. The key idea of "complexity" can be formalized in various ways, all of which involve the length of the most compact representation of the concept or the observations (an idea generally referred to as Kolmogorov complexity). We study concepts defined over many different kinds of features, and thus many different kinds of complexity measure, but the idea is always the that the learner compresses or simplifies what has been observed, and that this simplification forms the basis for induction to future examples. The more difficulty the observations are to compress faithfully—the more complex the concept—the more difficulty humans have learning it.

Boolean complexity. Much of our research involves concepts defined over Boolean (binary-valued) features, in which case one can define the Boolean complexity as the length of the shortest logical formula that expresses the observations. For example, big red things or small red things is equivalent to red things, and thus has low Boolean complexity, while big red things or small blue things cannot be similarly compressed, and hence has higher Boolean complexity. The higher a concept's Boolean complexity, the more difficulty humans have learning the concept.

Algebra of concepts. With features that take on more than two values, we have developed a richer concept expression language, called the concept algebra, and an associated complexity measure, called algebraic complexity (see this paper for all the details, or Appendix I of this paper for a summary, or here for software for computing algebraic complexity). In the concept algebra, each concept is represented as the composition (algebraic combination) of all the featural patterns or "regularities" it contains. We have also developed MDL-based complexity metrics for concepts defined over continuous features.

Eureka! Simple patterns seem like "real" patterns—that is, patterns that are unlikely to be accidental. But how surprising is it really to see a given degree of order when none actually exists? To answer this question, you have to consider the statistical properties of the complexity measure. We call this the "Eureka!" hypothesis: the idea that simple patterns are psychologically compelling because they are likely if the world is orderly but unlikely if it is actually random. Quantifying this idea leads to a recipe for computing the (statistical) significance of a given level of simplicity.

Comprehensive concept sets. When conducting this sort of research, it is desirable to survey a comprehensive range of concepts, rather than focusing on a narrow segment of the types human learners might encounter. With Boolean concepts with a given number of features, it is possible to exhaustively enumerate the logically distinct concept types. See this paper for a discussion and comprehensive tables of Boolean concepts.

Links:

A paper summarizing the idea of complexity-minimization in human concept learning

A paper on Boolean complexity in human learning

A paper on complexity in concepts defined over continuous features

A paper on the statistical surprisingness of simplicity

A paper tabulating and illustrating the varieties of logically possible Boolean concepts.

A paper on representing concepts as algebraic combinations of regularities

A Matlab package for computing algebraic complexity