Wednesday, March 28, 2012

Question Regarding Scenario On DataMining

Hi to all

I want to know how achieve the following scenario, I will highly encourage if anyone knows the solution please mention it through steps. The scenario is I have a table named “CreditCardTable ” with following schema , the table is pretty much what one expect from a simple credit card application that will present form to user who fills the application of credit card and data going to be filled into the table. Later an analyst will review the data and will be accepting or rejecting the Credit Card along with some other fields as seen in the schema.

--Schema --
CustomerName Bigint
Age int
DOB DateTime
LimitAssigned Bigint
Analystname varchar
Salary varchar
Gender bit
Qualification varchar
Country varchar
JobTitle varchar
MartialStatus varchar
IsApproved bit
IsDeclined bit

Now what I want is that by using Microsoft Detaining Features after giving it the data t should purpose analyst the solution i-e (reject / approve ) the credit card application . Now I gather than there are many algorithms in SQL Server 2005 Analysis Services that can be used (Decision Trees , Association Rules ) , I research on the two I just mentioned and want to use Decision Tree for this purpose , but there is a bottleneck that is mentioned in a book named “Detaining with SQL Server 2005 by Wiely”. It says that the rules that the Decision Tree will build and use are not available to end user in contrast to Association Rule algorithm which provide the rules. I essentially need the rules by which the decision has been made. Please I really need help on these issue to summarize I need the following

1) With the table “CreditCardTable” I need to perform Decision Trees and Association Rules Algorithm on it , please provide me steps to do so , moreover please also mention how to specify Query (DMX I think) to get the final result i-e approve or reject based on the data .

2) The rules on basis of which the Algorithm makes it decision, please provide insights on how to achieve this by both Decision Trees and Association Rules Algorithm.

I really need an urgent reply on this , you can also email me at razi_rais@.yahoo.com , thanks for your feedback.

Here is a tutorial for building a mining model: http://www.sqlserverdatamining.com/DMCommunity/Tutorials/Links_LinkRedirector.aspx?id=60 . Note that the site requires you to create a (free) user account in order to download documents. One of the examples in the tutorial explains how to create a decision trees model in order to predict whether a customer is a Bike Buyer or not. I think that example is very similar to your problem. I would expect the Decision Tree algorithm to do a much better job in this case than Association Rules, but there is no harm in trying both.

One more suggestion: if IsApproved and IsDeclined describe basically the same thing (i.e., IsApproved = 1 - IsDeclined), I would ignore the IsDeclined column and only build a mining model with IsApproved as target. This way, the model will predict 1 or 0 for the IsApproved column, and you can conclude that 0 means IsDeclined.

On the query issue: there are two kinds of queries you might be interested in executing:

1) The so-called 'singleton' query -- allows executing a prediction for a new row. Example:

SELECT IsApproved FROM MyModel [NATURAL] PREDICTION JOIN (SELECT 30 as Age, 50000 as LimitAssigned ..... ) AS T [ON...]

This executes a prediction against a customer represented by the fields in the inner select statement

2) The 'batch' query -- allows executing predictions against a table or view (a collection of rows) and it is optimized for performance. Syntax:

SELECT IsApproved FROM MyModel [NATURAL] PREDICTION JOIN OPENQUERY(DataSource, 'Query') AS T [ON...]

The query builder tool inside BI Dev Studio can be used to generate both kinds of queries. Note that the prediction query is independent of the algorithm. That means that you can use these queries with both Association Rules and Decision Trees.

On the last issue (rules), the answer is different, depending on the algorith. Association Rules is designed to construct rules. The rules can be inspected visually in the viewer. In the case of Decision Trees, the rule would be the path from the root of the tree to the leaf node that is used in prediction. Clicking on a decision trees node in the model viewer will present the information for that node, including the rule (in the form of Node Path from the root to that node)

For decision trees, you can extract the rule that leads to each individual prediction. Details here: http://www.sqlserverdatamining.com/DMCommunity/TipsNTricks/784.aspx

|||

Thanks Bogdan for your prompt reply I will be reviewing the links you provided .Regarding the Rules I reviewed description at http://www.sqlserverdatamining.com/DMCommunity/TipsNTricks/784.aspx , now can you please tell me is there any way to get whole set of rules not only the branch executed for making the decison in the form of xml. I mean like say after the mining algorithm executes and hence trained on the data can we write some function that return us whole rules (preferbly in form of xml) that will be used to make the design.

Best Regards

Razi

|||

You can try

SELECT NODE_RULE, NODE_CAPTION FROM MyModel.CONTENT

This will return the rules and the XMl representation of the rules for each tree node. However, you need to manually analyze the distribution for each node to see what the prediction is for that rule (the returned results contain the left hand side of the rule).

|||

Hmm what i want to know is when you guys want to capture complete rules and want to do it without mannual interction is there is any possiblity to do so ? I mean is microsoft provide any Object Model on top of this so that I can capture whole rules like

If <xyz> then <abc>

If <abc> then <pqr>

If <pqr> then <swf>

I think there might be any possibility to acieve this or is plainly impossible to achieve using Datamining in SQL 2005.

One more thing i want to ask is the query u mentioned

SELECT NODE_RULE, NODE_CAPTION FROM MyModel.CONTENT is independent of Algorithm used ? If it is then which model is more effective among DST and AR for prediction purposes specially in Credit Card Application Scenarion that i mentioned in the first post , you mentioned DS but can please amplify why and how to capture its inner workings?

|||

It really depends on what you want to do. Association rules provides rules of the sort

A,B,C -> D

Decision trees gives you rules of the form

If A and B and C then D

In both cases rules are probabilistic.

All data mining algorithms expose their content though the content schema rowset which has an specified form, and each algorithm uses that form as appropriate for that algorithm. For example, the content for a tree model is a parent-child rowset with each row indicating a node in the tree. For Association Rules, each row is either an itemset or a rule. You can access the content through the queries like Bogdan specifies, or using the object model with ADOMD.Net. The object model essentially wraps the content and provides easier navigation, but it's the same. On SQL Server Data MIning.com there is a sample viewer that allows you to view the raw content in an easier way, and it contains source code. That may be a good place to start.

No comments:

Post a Comment