Monday, February 20, 2012

Question on association models: MINIMUM_ITEMSET_SIZE

I've been experimenting with the algorithm parameters for a market basket association model. The default MINIMUM_ITEMSET_SIZE is 1. This doesn't seem to make sense: what is the point of a single-member itemset? However, changing the value to 2 substantially reduces the proportion of good recommendations obtained (which I'm testing via a holdout approach).

So I'm obviously misunderstanding what the parameter means. Can someone explain it please, and also explain the observation above?

Every item in a dataset is itself an "itemset", so there are always "1-itemsets": they are just the list of items and their counts. MINIMUM_ITEMSET_SIZE is really only there for those who are interested only in itemsets at or above a given size, such as 2 or more items. Of course, the process of building the model still has to create all the smaller itemsets to get to the larger ones, but it then discards them for prediction and reporting purposes.
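To make the "build everything, then discard" idea concrete, here is a minimal Python sketch of a level-wise (Apriori-style) itemset search. It is only an illustration under stated assumptions, not the actual implementation behind the parameter: the function name `frequent_itemsets`, its arguments `min_support`, `min_itemset_size` and `max_itemset_size`, and the toy baskets are all hypothetical.

```python
from itertools import combinations
from collections import Counter


def frequent_itemsets(baskets, min_support, min_itemset_size=1, max_itemset_size=3):
    """Toy level-wise itemset search (hypothetical helper, for illustration only).

    Itemsets of every size are built, because larger itemsets are grown
    from smaller frequent ones. The minimum size is applied only at the end.
    """
    all_frequent = {}
    for size in range(1, max_itemset_size + 1):
        level = Counter()
        for basket in baskets:
            items = sorted(set(basket))
            for combo in combinations(items, size):
                # Pruning step: a candidate is counted only if every
                # (size - 1)-subset was already found to be frequent.
                if size > 1 and any(
                    sub not in all_frequent
                    for sub in combinations(combo, size - 1)
                ):
                    continue
                level[combo] += 1
        kept = {k: v for k, v in level.items() if v >= min_support}
        if not kept:
            break  # no frequent itemsets of this size, so none larger either
        all_frequent.update(kept)
    # Only now are the smaller itemsets discarded from the result.
    return {k: v for k, v in all_frequent.items() if len(k) >= min_itemset_size}


baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread"],
]

with_singles = frequent_itemsets(baskets, min_support=2, min_itemset_size=1)
pairs_only = frequent_itemsets(baskets, min_support=2, min_itemset_size=2)
```

In this sketch `with_singles` still contains the per-item counts (the 1-itemsets), while `pairs_only` keeps only the two-item sets. The 1-itemsets were needed to find the pairs in both calls; the size setting changes only what is left afterwards.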
