«
»
V

Summary: What makes patterns interesting

February 5, 2009

Summary of “What makes patterns interesting in knowledge discovery systems.”

Summary: What makes patterns interesting

By: Chris Malek

Feb 05 2009

Category: Summaries

No Comments »

Silberschatz, A. and Tuzhilin, A. (1996). What makes patterns interesting in knowledge discovery systems. IEEE Transactions on Knowledge and Data Engineering, 8(6):970-974.

This paper is about devising a measure of interestingness of patterns.  Early on in knowledge discovery research, researchers realized that knowledge discovery systems can easily generate a huge number of patterns from a data set, most of which may be of no interest to the user of the system (p. 970).    And so researchers began to work on the problem of interestingness.   How do we know when a pattern found by the knowledge discovery system would be of interest to the user, so that we can only present such patterns to the user?    Researchers have taken two approaches towards developing interestingness measures:  objective measures, and subjective measures.

Objective measures are based only on the structure of the pattern and the underlying data used in the discovery process.  Such measures can be useful in some cases, but largely fail to capture the full complexities of the pattern discovery process, and can generate patterns that are objectively interesting, but which are still of little interest to the user.  Subjective measures of interestingness consider patterns from the point of view of the user, as well as the structure of the pattern and underlying data.     Prior work in subjective measures were limited in scope to a particular domain of patterns.

The measure that the authors propose, and the scientific contribution that they hope to make, is a domain-independent subjective measure of interestingness which can be used on patterns revealed by a knowledge discovery process.

They start by saying that there are two characteristics of a pattern that cause it to be interesting to a user: unexpectedness (it is “surprising” to the user) and actionability (the user can act on it to his advantage).   They go on to say that they believe that the majority of unexpected patterns are actionable, and  conversely that the majority of actionable patterns are unexpected.  Thus actionability and unexpectedness act as viable approximations of each other — we can look for patterns which exhibit one of those two characteristics, and assume that it also has the other.

It is hard to determine the actionability of a pattern for several reasons: we need to associate an action with each potential pattern (or pattern cluster, if we can partition the pattern space into subsets of similar patterns), which can be difficult or impossible (we may not know the full pattern space); and the mapping of actions to patterns may change over time, which means we have to do this difficult mapping over and over again.   Thus the authors chose to develop a measure of unexpectedness of a pattern.  Thus, this paper is about a domain-independent measure of subjective interestingness of a pattern which is a function of the unexpectedness of that pattern (p. 791).

The authors want to determine unexpectedness by examining the effects of new data on a person’s belief system: the more the new data changes a person’s belief system, the more unexpected it is.   A belief system is made up of one or more beliefs, which they define  as logical statements with an associated confidence level given prior evidence.   If b is a belief, then this confidence level or degree of belief based on previous evidence ? can be expressed as d(b??).  People have two kinds of beliefs:  hard beliefs, which are unchanged by new evidence (we suspect the evidence to be faulty instead); and soft beliefs, which can be changed by new evidence.

A pattern which contradicts hard beliefs is always unexpected and thus is always interesting (p. 975), so computing interestingness of such data is trivial.  Thus, much of the rest of the paper is dedicated to devising a measure of unexpectedness in relation to soft beliefs.

The authors provide four potential measures of degrees of soft beliefs and how they are affected by new evidence.  They eliminate three of them as impractical or too limited in applicability and propose a Bayesian approach (p. 973), which updates the degree of belief in a particular belief given new evidence by using Bayes rule.

They then write an expression which uses this result to express interestingness as how much degrees of belief within a belief system were changed as a result of new data.  They then show how, with this definition of interestingness, unexpected patterns do indeed cause more change in degree in beliefs than expected patterns.   Since we’re already saying unexpectedness as a proxy for interestingness, we can use this measure to estimate subjective interestingness: the bigger the measure, the more interesting the pattern.

They conclude by describing a general procedure for how one would use these concepts to look for interesting patterns in a database that gets updated with new data:

“When new data arrives, degrees of all the beliefs are revised based on new data.  If some of the degrees change above predetermined threshold levels, this means that there are some interesting patterns in the data and that the discovery processes to extract interesting patterns should be launched” (p. 974).

They call this belief-driven discovery.  The authors then say: see our upcoming paper, in which we describe those discovery processes.

Critique

In terms of Hevner et. al. (2004):

  • The artifacts that the paper describes are constructs: interestingness, unexpectedness, actionability.   Those constructs are intended to address a real business problem that knowledge discovery systems have: knowledge discovery systems can easily generate a huge number of patterns from a data set, most of which may be of no interest to the user of the system (p. 970), which makes the system not very useful.
  • The artifact could definitely provide utility if implemented correctly.
  • They do evaluate their concepts (mathematically) to show that they do solve the problem they set out to do.
  • Their measure does offer scientific contribution, because it is a domain-independent measure of interestingness, which had not yet been devised.
  • They certainly are rigorous in defining and representing their artifacts by deriving mathematical metrics for estimating them.
  • They do detail their search space for potential metrics for the effect of new data on belief systems (part of building the interestingness measure).
  • They obviously communicate their results to a technical audience.

The only potential failures to stick to the seven characteristics of good design research as described by Hevner et. al. (2004) is in direct utility and in communicating to a managerial audience (they might have done this in another venue).

Utility

I see two issues around instantiating these metrics: defining beliefs, and computing prior and conditional probabilities.

Defining beliefs seems tricky.  Clearly, in order to use the metric they provide on p. 973, one has to express a user’s belief in a mathematically testable way.   One must also be able to quantify the degree to which one holds a belief (for soft beliefs).   It seems like both of these could be hard for most people.   I am guessing that how you might implement a belief is as a function or set of functions on a feature vector extracted from the dataset (p. 973).  Your belief is about what you believe the normal range of the output of that function should be given the data you’ve already seen.   Then you state how confident you are in that range of values.

In Bayes rule, computing appropriate prior probabilities and conditional probabilities (likelihoods, or posterior probabilities) can be hard (especially the likelihoods), and yet doing it correctly is critical for successful use of Bayes rule.  If you do it wrong, you still may end up finding uninteresting patterns.  For prior probabilites, you also clearly need some experience with the data before you can even have a belief ?and begin getting a good estimate of your prior probability P(?| ?).  For the conditional probabilities, they do seem to address this on p. 973.    In both cases, you need to know what aspects of the data to look at, which goes back to defining beliefs.

Leave a Reply