 Meeting Notes

Notes from April 21, 2004

Automating the Modeling Process through Structural Risk Minimization (SRM)

Dr. Robert Cooley, VP at KXEN (Knowledge Extraction Engines)

The talk presented structural risk (the total modeling error) as a function of model complexity. It consists of two components: the modeling/fit error, which decreases with model complexity, and the confidence-interval risk/error, which increases with model complexity.  The total risk therefore has a minimum, and the model at that minimum is the best model.
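
The trade-off between the two components can be illustrated with a small sketch. It is not taken from the talk: the noisy data, the piecewise-constant model family (one mean per bin) and the textbook VC confidence term used as the complexity penalty are all assumptions chosen for illustration. Adding a squared-error fit term to a bound derived for classification is only meant to show the shape of the curve, not to give a valid guarantee.

```python
import math
import random

def vc_confidence(h, n, eta=0.05):
    # Textbook VC confidence term for capacity h and sample size n.
    # It grows with h as long as h < 2n. Used here only as an illustrative penalty.
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)

def binned_mean_fit_error(xs, ys, bins, lo=-1.0, hi=1.0):
    # Fit a piecewise-constant model (one mean per bin); return its training MSE.
    width = (hi - lo) / bins
    groups = [[] for _ in range(bins)]
    for x, y in zip(xs, ys):
        idx = min(int((x - lo) / width), bins - 1)
        groups[idx].append(y)
    sse = 0.0
    for g in groups:
        if g:
            mean = sum(g) / len(g)
            sse += sum((y - mean) ** 2 for y in g)
    return sse / len(xs)

# Hypothetical noisy data set (an assumption, not data from the talk).
random.seed(0)
n = 256
xs = [-1.0 + 2.0 * (i + 0.5) / n for i in range(n)]
ys = [math.sin(3 * x) + random.gauss(0.0, 0.3) for x in xs]

# Model complexity is the number of bins; each partition refines the previous one.
bin_counts = [1, 2, 4, 8, 16, 32, 64]
fit_error = [binned_mean_fit_error(xs, ys, k) for k in bin_counts]
confidence = [vc_confidence(k, n) for k in bin_counts]
total = [f + c for f, c in zip(fit_error, confidence)]
best_bins = bin_counts[total.index(min(total))]

for k, f, c, t in zip(bin_counts, fit_error, confidence, total):
    print(f"bins={k:3d}  fit={f:.3f}  confidence={c:.3f}  total={t:.3f}")
print("complexity with the smallest total:", best_bins)
```

In this sketch the fit error falls as bins are added, the penalty rises, and their sum is smallest at an intermediate number of bins.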

As the most popular implementation of SRM, the speaker pointed to Support Vector Machines (SVM), which apply kernel operators to machine learning.  However, KXEN doesn't use kernel operators because the results obtained are 'too difficult to explain to the clients'.  Instead, the concept of Vapnik-Chervonenkis (VC) dimension is used, but we didn't learn much about it because it would take too long to explain during our meeting (Ph.D. dissertations are still being written on it).  Fortunately, there is a simpler concept, the 'fractal dimension' contributed by Chaos Theory, that appears to have basic properties similar to those of the VC dimension.

Let us imagine the x-y plane (a data set restricted to just two columns/attributes) and the line x=y.  If the data points are confined to this line, they have a fractal dimension of 1.  If the data points are somewhat frizzed (scattered around the line), the fractal dimension will be somewhat higher than 1.  If the data points are uniformly distributed on the x-y plane, the fractal dimension will be 2.  Suppose we add a z attribute/axis and find that the line is now given by x=y=z.  Even if the x-y data was confined to the x=y line, there can be some frizzing in the z direction, and the fractal dimension of the x-y-z data would then be more than 1, higher than it was on the x-y plane.  Data with a fractal dimension of one require just one variable to describe/model, even though in our example that variable will be neither x nor y.

This example points to a practical use of the fractal dimension: calculate it using all variables in the data set, and the result is an upper bound on the number of variables needed to construct the model.  What is left is to find the best set of variables.  (Of course, there are some pitfalls: the fractal dimension is local, i.e. in general it varies from point to point, and, independently, the best variable set is also local and may change even if the dimension doesn't change much.  The process is also quite computation intensive.)
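
The x-y plane example can be tried out with a small sketch. The notes do not say which definition of fractal dimension the speaker had in mind; the code below assumes the box-counting estimate (the slope of log N against log m, where N is the number of occupied cells in an m-by-m grid over the unit square). The three synthetic data sets and the grid sizes are hypothetical choices for illustration.

```python
import math
import random

def box_counting_dimension(points, grid_sizes=(2, 4, 8, 16)):
    """Estimate the box-counting dimension of points in the unit square."""
    log_m, log_n = [], []
    for m in grid_sizes:
        # Set of grid cells that contain at least one point.
        occupied = {(min(int(x * m), m - 1), min(int(y * m), m - 1))
                    for x, y in points}
        log_m.append(math.log(m))
        log_n.append(math.log(len(occupied)))
    # Least-squares slope of log N(m) against log m.
    mean_m = sum(log_m) / len(log_m)
    mean_n = sum(log_n) / len(log_n)
    num = sum((a - mean_m) * (b - mean_n) for a, b in zip(log_m, log_n))
    den = sum((a - mean_m) ** 2 for a in log_m)
    return num / den

def clip(v):
    # Keep a coordinate inside the unit interval.
    return min(1.0, max(0.0, v))

random.seed(1)
count = 20000
# Points exactly on the line x = y.
line = [(t, t) for t in (random.random() for _ in range(count))]
# "Frizzed" points: the same line with some scatter in y.
frizzed = [(t, clip(t + random.uniform(-0.1, 0.1)))
           for t in (random.random() for _ in range(count))]
# Points spread uniformly over the plane.
plane = [(random.random(), random.random()) for _ in range(count)]

dim_line = box_counting_dimension(line)
dim_frizzed = box_counting_dimension(frizzed)
dim_plane = box_counting_dimension(plane)
print("line:", dim_line, "frizzed:", dim_frizzed, "plane:", dim_plane)
```

The estimate depends on the range of grid sizes used, which is one way the "local" pitfall mentioned above shows up in practice.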

Other properties of VC dimension and SRM that we learned are:
*  It measures the complexity (number of variables?) of a set of mappings, in a way that employs the Ockham's razor principle (among the explanations that fit the data, prefer the simplest one).  It can be linked to generalization results and indicates generalization capacity.
*  Additional variables improve modeling quality at low cost: no harm from random or correlated variables, no over-fitting, no need for exploratory analysis.
*  The modeling process can be automated, which is the primary objective/benefit of SRM, rather than improved prediction/learning/generalization.  It also makes it simple to manage generalizations.
*  The SRM process is robust in several ways/aspects: regression (resistant to outliers), statistical (free of distribution assumptions, not harmed by skewed distributions), training (small training set(?), missing values), engineering (gives an answer, doesn't crash), deployment (not stymied by values not seen before, tells you how good a model you have, the model degrades slowly and one can tell when it goes bad).
*  It implements smart segmentation/classification of data, as a Neural Network (NN) does.  It employs binning (discretization of a continuous variable into a finite set of intervals), e.g. in piece-wise models.
What is also good to remember is that:
*  When you have a hammer, everything looks like a nail,
*  Simplicity is hard.  When the solution is simple, God is answering (Albert Einstein),
*  Estimating a distribution is harder than estimating a function.

The absence of details on the promised Support Vectors and VC dimension was compensated for by an example of first-order (piece-wise linear) regression of a sine function on the binned (-Pi, Pi) interval.  Overall, the talk shed light on yet another aspect/technique of modeling, just when I thought we had heard it all in our meetings.
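
The sine example can be reconstructed roughly as follows. This is a sketch under assumptions, not the speaker's code: the interval (-Pi, Pi) is cut into equal-width bins, and an ordinary least-squares line is fitted to sampled sine values inside each bin. The bin counts, the number of samples per bin and the function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    # Ordinary least-squares fit of y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def piecewise_linear_sine(bins, samples_per_bin=50):
    # Bin (-pi, pi) into equal intervals and fit one line per bin.
    width = 2 * math.pi / bins
    models = []      # one (left, right, slope, intercept) tuple per bin
    max_error = 0.0  # largest absolute error over all sampled points
    for b in range(bins):
        left = -math.pi + b * width
        xs = [left + width * (i + 0.5) / samples_per_bin
              for i in range(samples_per_bin)]
        ys = [math.sin(x) for x in xs]
        slope, intercept = fit_line(xs, ys)
        models.append((left, left + width, slope, intercept))
        bin_error = max(abs(slope * x + intercept - y) for x, y in zip(xs, ys))
        max_error = max(max_error, bin_error)
    return models, max_error

models8, err8 = piecewise_linear_sine(8)
models16, err16 = piecewise_linear_sine(16)
print("max error with 8 bins: ", err8)
print("max error with 16 bins:", err16)
```

Each bin contributes two parameters (a slope and an intercept), so the number of bins plays the role of model complexity here: more bins give a closer fit at the price of a larger model.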