I have built a data mining model to predict which study methods give a student the best chance of passing the examination:
Create Mining Model StudentAssociation (
    Student_No long key,
    Gender text discrete predict,
    PassOrFail text discrete predict,
    StudyMethod table predict ( MethodName text key )
) Using Microsoft_Association_Rules ( Minimum_Support=0.02, Minimum_Probability=0.03 )
The mining table will contain all the methods the students use, whether they passed or failed the examination.
PassOrFail will contain either 'Pass' or 'Fail'.
Using the above model, can I query for the best study methods?
Or should I train the model only on the students who passed the examination and ignore those who failed?
Thanks.
Joe.
I wouldn't use association rules (AR) here; try Decision Trees, Naive Bayes, or Neural Nets instead. Also, if you're trying to predict which study methods indicate a pass or a fail, you only need to make PassOrFail predictable, not the nested table.
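As a rough illustration of why a tree helps, the Python sketch that follows scores each study method by how much knowing "this student uses the method" reduces uncertainty about PassOrFail, which is what a single decision-tree split does. It is a stand-in under stated assumptions: the student data is hypothetical, and Microsoft_Decision_Trees uses its own split-scoring methods rather than this exact entropy-based information gain.

```python
import math
from collections import Counter

# Hypothetical cases (not real data): (result, set of study methods used).
STUDENTS = [
    ("Pass", {"A", "B"}),
    ("Pass", {"A", "B", "C"}),
    ("Pass", {"C", "D"}),
    ("Fail", {"C"}),
    ("Fail", {"D"}),
    ("Fail", {"A", "D"}),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(students, method):
    """Gain from splitting on 'uses method' versus 'does not use method'."""
    labels = [result for result, _ in students]
    with_m = [result for result, methods in students if method in methods]
    without_m = [result for result, methods in students if method not in methods]
    n = len(students)
    remainder = (len(with_m) / n) * entropy(with_m) + (len(without_m) / n) * entropy(without_m)
    return entropy(labels) - remainder

all_methods = sorted(set().union(*(methods for _, methods in STUDENTS)))
gains = {m: information_gain(STUDENTS, m) for m in all_methods}
best = max(gains, key=gains.get)
print(best, round(gains[best], 3))
```

With this toy data, method B has the highest gain because every student who uses it passed, whereas A, C and D each appear among both passing and failing students.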
You definitely need both positive and negative examples to predict the result. You could also create a clustering model on only the passing students to see which study methods group together among passing students.
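To make the point about positive and negative examples concrete, here is a minimal Python sketch (hypothetical data, not from the thread) that estimates P(Pass | method) first from all students and then from the passing students alone.

```python
from collections import Counter

# Hypothetical cases: (result, set of study methods used).
STUDENTS = [
    ("Pass", {"A", "B"}),
    ("Pass", {"A", "B", "C"}),
    ("Pass", {"C", "D"}),
    ("Fail", {"C"}),
    ("Fail", {"D"}),
    ("Fail", {"A", "D"}),
]

def pass_rate_by_method(students):
    """Estimate P(Pass | student uses method) for every method seen."""
    used = Counter()
    passed = Counter()
    for result, methods in students:
        for method in methods:
            used[method] += 1
            if result == "Pass":
                passed[method] += 1
    return {method: passed[method] / used[method] for method in used}

with_failures = pass_rate_by_method(STUDENTS)
passing_only = pass_rate_by_method([s for s in STUDENTS if s[0] == "Pass"])
print("all students:", dict(sorted(with_failures.items())))
print("passers only:", dict(sorted(passing_only.items())))
```

With the failing students included, the rates differ by method (B is 1.0, D is about 0.33); with only passing students, every method scores 1.0, so the data cannot rank methods by their pass rate.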
Following your suggestion to create a clustering model on only the passing students, I suppose I would find that students who use both "Method A" and "Method B" have the highest chance of passing the examination.
However, "Method A" and "Method B" might also show the highest probability of failing if I studied the failed students. Am I right?
Do I need to create two clustering models, one for passing and one for failing students, to get the complete picture?
Or is there another method to accomplish this task?
Thanks, Jamie.
Actually, if you only use passing students, you won't see that Method A and Method B have higher chances than any other method, since all the students passed. What you will see is which methods passing students use together. For example, there may be one group of passing students who use methods A and B, another using methods C and D, and yet another using A and D.
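A minimal sketch of what a passing-only analysis can reveal, assuming hypothetical data shaped like the example above: counting which pairs of methods the same passing student uses. This is simple pair co-occurrence counting, not the Microsoft Clustering algorithm, and it says nothing about pass rates.

```python
from collections import Counter
from itertools import combinations

# Hypothetical passing students only, each as the set of methods they used.
PASSING = [{"A", "B"}, {"A", "B"}, {"C", "D"}, {"C", "D"}, {"A", "D"}]

# Count how often each pair of methods is used by the same student.
pair_counts = Counter()
for methods in PASSING:
    for pair in combinations(sorted(methods), 2):
        pair_counts[pair] += 1

for (first, second), count in pair_counts.most_common():
    print(f"{first}+{second}: {count} passing students")
```

Every row comes from a passing student, so the counts describe the habits of passers, not which habits lead to passing.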
You are right that these methods could just as easily be used by failing students, so you should also create a similar model for the failing students. The clusters may turn out the same, different, or similar but with different proportions. For example, both models may contain an A and B cluster, but it may hold 40% of the students in the passing model and only 5% in the failing model.
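The comparison described above can be sketched in Python, with one loud assumption: each "cluster" is simply an exact combination of study methods, whereas a real clustering model groups students with similar (not identical) method sets. The counts are hypothetical and chosen to reproduce the 40% versus 5% example.

```python
from collections import Counter

# Hypothetical students, each represented by the set of methods they used.
# Assumption: an exact method combination stands in for a cluster.
PASSING = [frozenset("AB")] * 4 + [frozenset("CD")] * 3 + [frozenset("AD")] * 3
FAILING = [frozenset("AB")] * 1 + [frozenset("CD")] * 7 + [frozenset("AD")] * 12

def group_shares(students):
    """Fraction of students in each exact study-method combination."""
    counts = Counter(students)
    return {group: count / len(students) for group, count in counts.items()}

pass_shares = group_shares(PASSING)
fail_shares = group_shares(FAILING)
for group in sorted(set(pass_shares) | set(fail_shares), key=sorted):
    name = "+".join(sorted(group))
    print(f"{name}: passing {pass_shares.get(group, 0):.0%}, failing {fail_shares.get(group, 0):.0%}")
```

The A+B group is common among passers and rare among failures, which is the kind of difference in proportions that a single passing-only model cannot show.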
Another option, if you want to use clustering, is to make the Pass/Fail column "Predict Only". In this case the algorithm clusters on all the other attributes, ignoring pass/fail, and then computes pass/fail statistics across the resulting clusters. This shows whether or not pass/fail is independent of the method groupings (you can use the cluster diagram for this). However, if you specifically want to predict pass/fail from study methods, you are better off using Trees or Neural Nets.
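The idea behind "Predict Only" can be approximated in a few lines of Python: group students using their study methods alone, then look at the pass rate inside each group. Treating each exact method combination as a cluster is a simplifying assumption, and the counts are hypothetical.

```python
from collections import defaultdict

# Hypothetical counts per method combination: (methods, passes, fails).
COUNTS = [("AB", 4, 1), ("CD", 3, 7), ("AD", 3, 12)]
STUDENTS = [
    (result, frozenset(methods))
    for methods, n_pass, n_fail in COUNTS
    for result in ["Pass"] * n_pass + ["Fail"] * n_fail
]

def pass_rate_per_group(students):
    """Group on study methods only, then compute the pass rate per group."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for result, methods in students:
        group = methods  # grouping ignores the result, like a Predict Only column
        totals[group] += 1
        if result == "Pass":
            passes[group] += 1
    return {group: passes[group] / totals[group] for group in totals}

rates = pass_rate_per_group(STUDENTS)
overall = sum(result == "Pass" for result, _ in STUDENTS) / len(STUDENTS)
print(f"overall pass rate: {overall:.2f}")
for group, rate in sorted(rates.items(), key=lambda item: sorted(item[0])):
    print("+".join(sorted(group)), f"{rate:.2f}")
```

If every group's pass rate sat close to the overall rate, pass/fail would look independent of the method groupings; here the A+B group (0.8) stands well above the overall rate of 1/3.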
Your answer is very useful to me.
Thanks, Jamie.