featured Dr. Auslender's newest results in Associations (Market Affinity)
analysis (AA) from the speaker's upcoming book. As usual for this speaker,
the talk was illustrated by an effusive 81-slide show.
Association rules (heuristic patterns) categorize customers by means of
actionable profiles. To reduce the typically large (exponential) number of
combinations of binary attributes, various functions of 'interestness' are
used. The example shown has a function LIFT(A->B), defined as
S(A->B) / (S(A) * S(B)), where S stands for the 'support' (frequency) of an
event and '->' means 'causes' (or rather 'coincides with', since a causal
relation should be antisymmetric, whereas LIFT is symmetric in A and B).
LIFT measures how far two events are from independence; it equals 1 when
they are independent. Values of LIFT > 1 make events A and B (and the rule
A->B) 'interesting' (positively correlated). Note that if LIFT(A->B) > 1
then LIFT(A->~B) < 1, which pushes us to be even more selective and to set
the interest threshold higher. The values of LIFT prove to be more selective
than the regression coefficients, particularly when there is no reliable
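As a minimal sketch of the LIFT definition above (not code from the talk),
the following Python computes support and LIFT over a handful of made-up
baskets. The basket contents and the helper names support, lift and
lift_not are assumptions introduced only for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def lift(transactions, a, b):
    """LIFT(A->B) = S(A and B) / (S(A) * S(B)); symmetric in A and B."""
    return support(transactions, [a, b]) / (
        support(transactions, [a]) * support(transactions, [b])
    )

def lift_not(transactions, a, b):
    """LIFT(A->~B): the same ratio with B replaced by 'B is absent'."""
    n = len(transactions)
    s_a_not_b = sum(a in t and b not in t for t in transactions) / n
    s_not_b = 1.0 - support(transactions, [b])
    return s_a_not_b / (support(transactions, [a]) * s_not_b)

# Hypothetical baskets, invented for this example.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"milk", "eggs"},
    {"bread", "eggs"},
    {"eggs"},
    {"milk", "butter"},
]

print(lift(baskets, "bread", "butter"))      # 1.5: positively associated
print(lift_not(baskets, "bread", "butter"))  # 0.5: below 1, as noted above
```

Here bread and butter each appear in half of the baskets and together in
3 of 8, so LIFT = 0.375 / (0.5 * 0.5) = 1.5.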
Association Analysis is also used for knowledge discovery (mining for
nuggets of relevant information) and is useful in variable/rule selection
in conjunction with an 'interestness' function value greater than 1,
support/frequency, and a high confidence percentage. Market Basket Analysis
typically prunes rules with support below 5%, which tends to remove
uncorrelated or negatively
Unfortunately, despite all that pruning, many of the rules we are left with
may be irrelevant and anecdotal. Simpson's paradox casts even more doubt on
the usefulness of the methodology: the direction of an association may be
reversed when another factor is added to the analysis. This was illustrated
(slide 60) with an example of racial bias in death penalty sentencing:
overall, 11% of white defendants get the death penalty, compared to 7.9% of
black defendants. However, when broken down by the race of the victim, with
white victims 11.3% of white defendants get the death penalty vs 22.9% of
black defendants, and with black victims 0% of white defendants vs 2.8% of
black defendants.
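The reversal can be checked with a few lines of Python. The report gives
only percentages, so the counts below are an assumption: they are
illustrative counts chosen because they reproduce the quoted rates, and
they are not taken from the slides. The function name rate is likewise
introduced just for this sketch.

```python
# (victim race, defendant race) -> (death sentences, number of defendants)
# ASSUMPTION: illustrative counts chosen to match the quoted percentages.
counts = {
    ("white", "white"): (53, 467),
    ("white", "black"): (11, 48),
    ("black", "white"): (0, 16),
    ("black", "black"): (4, 143),
}

def rate(defendant, victim=None):
    """Death-penalty rate in percent for one defendant race,
    pooled over victims or restricted to one victim race."""
    deaths = total = 0
    for (v, d), (k, n) in counts.items():
        if d == defendant and (victim is None or v == victim):
            deaths += k
            total += n
    return 100.0 * deaths / total

for victim in (None, "white", "black"):
    label = "all victims" if victim is None else f"{victim} victims"
    print(f"{label:13s} white: {rate('white', victim):5.1f}%  "
          f"black: {rate('black', victim):5.1f}%")
```

With these counts the pooled rate is higher for white defendants, yet
within each victim group it is higher for black defendants. The reversal
arises because most white defendants fall in the white-victim group, where
death sentences are far more frequent overall.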
Other methods mentioned were:
- Association tree - structure induced by descending Confidence values
- Association Chi-Square: items deemed dependent per the Chi-Square test
become composite (clustering), and the process continues until no more
composite items are possible.
- Terse Representations of AA - utilize a regression-type representation
- Bayes Nets - classification rules on items/variables linked by
- Log-linear model to summarize the results (?)
- Tree modeling - un-interpretable and non-intuitive,
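To make the Chi-Square item above more concrete, here is a minimal sketch
of the dependence test for a single pair of items, using the Pearson
Chi-Square statistic on a 2x2 table. It covers only the test, not the
merging of items into composites. The function names, the example counts
and the 5% significance level (critical value 3.841 for one degree of
freedom) are assumptions made for illustration; the talk's exact procedure
is not described in this report.

```python
def chi_square_2x2(n11, n10, n01, n00):
    """Pearson Chi-Square statistic for a 2x2 table of counts:
    n11 = both items present, n10 = only the first, n01 = only the
    second, n00 = neither. Assumes every row and column total is > 0."""
    n = n11 + n10 + n01 + n00
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    stat = 0.0
    for observed, row, col in (
        (n11, row1, col1), (n10, row1, col0),
        (n01, row0, col1), (n00, row0, col0),
    ):
        expected = row * col / n  # count expected under independence
        stat += (observed - expected) ** 2 / expected
    return stat

CRITICAL_5PCT_1DF = 3.841  # Chi-Square critical value, 1 d.f., 5% level

def dependent(n11, n10, n01, n00):
    """True when independence of the two items is rejected at 5%."""
    return chi_square_2x2(n11, n10, n01, n00) > CRITICAL_5PCT_1DF

# Hypothetical counts over 100 transactions.
print(chi_square_2x2(30, 10, 20, 40), dependent(30, 10, 20, 40))
print(chi_square_2x2(20, 20, 30, 30), dependent(20, 20, 30, 30))
```

The first table gives a statistic of about 16.7, so the pair would be
treated as dependent; the second matches the independence expectation
exactly and gives 0.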
An interesting method presented for visualizing correspondence/association
data is the link graph (Giudici and Passerone), which graphically depicts
the coincidence/causality of events. By varying the threshold on the
coincidence/odds ratio levels we can obtain event/transaction clusterings of
varying importance. Promotional variables are added to the clusters to check
whether promotions affect sales.
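The thresholding idea can be sketched as follows. This is not the authors'
algorithm, only one plausible reading of it: link two items when their
odds ratio reaches a threshold, then read clusters off as connected
components of the resulting graph. The baskets, the function names and the
0.5 added to each cell (the Haldane-Anscombe correction, used here to
avoid division by zero) are all assumptions of this sketch.

```python
from itertools import combinations

def odds_ratio(transactions, a, b):
    """Odds ratio of items a and b, with 0.5 added to every cell."""
    n11 = n10 = n01 = n00 = 0
    for t in transactions:
        if a in t and b in t:
            n11 += 1
        elif a in t:
            n10 += 1
        elif b in t:
            n01 += 1
        else:
            n00 += 1
    return ((n11 + 0.5) * (n00 + 0.5)) / ((n10 + 0.5) * (n01 + 0.5))

def link_clusters(transactions, threshold):
    """Link item pairs whose odds ratio is >= threshold and return the
    connected components of that graph as sorted lists of items."""
    items = sorted(set().union(*transactions))
    neighbours = {item: set() for item in items}
    for a, b in combinations(items, 2):
        if odds_ratio(transactions, a, b) >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)
    clusters, seen = [], set()
    for start in items:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over linked items
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters

# Hypothetical baskets, invented for this example.
baskets = [
    {"bread", "butter"}, {"bread", "butter"}, {"bread", "butter"},
    {"beer", "chips"}, {"beer", "chips"}, {"beer", "chips"},
    {"bread", "chips"}, {"beer", "butter"},
]

for threshold in (10.0, 2.0, 0.1):
    print(threshold, link_clusters(baskets, threshold))
```

Lowering the threshold from 10 to 2 to 0.1 takes these baskets from four
isolated items, to the two clusters beer/chips and bread/butter, to a
single cluster containing everything.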
For all its shortcomings and paradoxes, AA is expected to help with many
hard practical problems, such as:
click stream analysis, creating profiles for fraud detection, police,
identification of party voting affinity, book purchase associations
(Amazon, Vignette), e-mail link analysis and web search, printing/selection
of point-of-sale retail coupons, cross-selling opportunities, sequencing
of promotional offers,…
Fortunately, nobody said statistics and data analysis were easy (the speaker
said several times that there is no rose garden), or we wouldn't expect to
earn big bucks doing it.
The talk slides are available under