Hi, all,
I really need some advice on describing distinct cluster profiles. When we profile the clusters produced by a clustering algorithm, one cluster can match several profiles at the same time. For example, after clustering the data I select value 1 for attribute a, and cluster 1 is the darkest cluster, so it could be called the cluster of value 1 of attribute a. Then when I select value 2 of attribute b, cluster 1 is again the darkest one, so it could also be called the cluster of value 2 of attribute b. So I am really confused. In this case, what is the best way to describe cluster 1? Is there a way to combine the common characteristics of these two profiles and use them to define cluster 1? Or are there any other good ideas?
I am really looking forward to hearing from those of you who are veterans of data mining and its applications. Any advice on this would be appreciated.
Thanks a lot in advance.
With best regards,
Yours sincerely,
There are a couple of ways to figure this out - in all cases, how to name the clusters is a business decision, but I can help you determine what is interesting.
First, you can go to cluster discrimination and look at what discriminates a cluster from all other clusters. This is usually a good bet. However, you have to be careful. For example, assume you have a population that is 95% female and 5% male. Say you have a cluster that shows up in cluster discrimination with "Gender=Male" at the top. You need to double check, because given the marginal distribution it's possible that this cluster is only 15% male - which, at three times the marginal rate, makes "male" a highly discriminating factor even though 85% of the cluster is still female.
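To make the arithmetic in that example concrete, here is a minimal sketch. The function name and the 50% "majority" cutoff are my own assumptions for illustration; this is not how the viewer computes its discrimination score, just the ratio of the in-cluster rate to the marginal rate.

```python
def discrimination_check(cluster_share, population_share):
    """Compare an attribute value's share inside a cluster with its
    share in the whole population (hypothetical helper, not a viewer API)."""
    lift = cluster_share / population_share  # how many times the marginal rate
    is_majority = cluster_share > 0.5        # assumed cutoff for naming a cluster
    return lift, is_majority

# Numbers from the example: 5% male overall, 15% male in the cluster.
lift, is_majority = discrimination_check(cluster_share=0.15, population_share=0.05)
print(f"lift = {lift:.1f}, majority of cluster = {is_majority}")
```

A lift of about 3 explains why "Gender=Male" ranks at the top, while the majority flag shows why "the male cluster" would still be a misleading name.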
Another thing to do is to look at the cluster profiles page and click on a cluster header. This sorts the attributes by what is important to that cluster. Again, you have to consider whether what is important is strong enough to name the cluster or not.
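If you export the cases with their cluster assignments, the same idea can be sketched outside the viewer. The code below is only a rough illustration under my own assumptions: the row layout (a list of dicts with a "cluster" key), the `describe_cluster` name, and the `min_share` threshold are all hypothetical, and the ranking uses a simple rate ratio rather than whatever the viewer uses internally. It keeps every attribute value that is strong enough inside the cluster, so a cluster that stands out for both attribute a = 1 and attribute b = 2 gets described by both together.

```python
from collections import Counter

def describe_cluster(rows, cluster_id, min_share=0.5):
    """List (attribute, value, in_cluster_share, lift) for one cluster,
    keeping only values that reach min_share inside the cluster."""
    members = [r for r in rows if r["cluster"] == cluster_id]
    overall, inside = Counter(), Counter()
    for r in rows:
        for attr, value in r.items():
            if attr != "cluster":
                overall[(attr, value)] += 1
    for r in members:
        for attr, value in r.items():
            if attr != "cluster":
                inside[(attr, value)] += 1
    result = []
    for (attr, value), count in inside.items():
        share = count / len(members)                      # rate inside the cluster
        lift = share / (overall[(attr, value)] / len(rows))  # vs. marginal rate
        if share >= min_share:
            result.append((attr, value, share, lift))
    # Strongest discriminators first.
    return sorted(result, key=lambda t: t[3], reverse=True)

# Made-up toy data, for illustration only.
rows = [
    {"cluster": 1, "a": 1, "b": 2},
    {"cluster": 1, "a": 1, "b": 2},
    {"cluster": 1, "a": 1, "b": 3},
    {"cluster": 2, "a": 2, "b": 2},
    {"cluster": 2, "a": 2, "b": 3},
    {"cluster": 2, "a": 1, "b": 3},
]
for attr, value, share, lift in describe_cluster(rows, 1):
    print(f"{attr}={value}: share={share:.2f}, lift={lift:.2f}")
```

Raising `min_share` makes the description stricter; whether the surviving values are good enough to name the cluster is still the business decision mentioned above.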
HTH
-Jamie
Hi, Jamie,
Thanks.
With best regards,
Yours sincerely,