Wednesday, March 7, 2012

Question on dealing with missing values for training models

Hi, all,

I am wondering what a good way to deal with missing values is. Should we leave the missing values in the training data set, or replace them with other values?

My real concern is this: if we simply replace the missing values with other values, how will that affect the correctness of the trained models?

I look forward to hearing from you on this issue, and it would be great if there are any best practices for dealing with it.

Thanks.

With best regards,

Yours sincerely,

I have the same question: what is the best practice for handling NULL values in the source data? Should they be left as NULL or replaced with some predetermined default value?

For most algorithms, a NULL and a NULL replaced with a value do not mean the same thing. A NULL is usually ignored, whereas a substituted value is treated as a real state. The correct modeling approach depends on your data. If NULL carries information in your scenario (e.g. a NULL State means the customer is from a different country), it is a good idea to substitute a default value for NULL before training. If, however, NULL means absence of data and carries no information, it should be left as is.
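To make the difference concrete, here is a minimal sketch in plain Python. It is not tied to any particular mining algorithm or product; the `states` list, the `state_distribution` helper, and the "Foreign" label are all made up for illustration. It only shows how the distribution a model would see changes depending on whether the NULLs are skipped or turned into a state of their own.

```python
from collections import Counter

# Hypothetical customer records; None stands in for a NULL State value.
states = ["WA", "CA", None, "WA", None, "CA", "WA", None]


def state_distribution(values, null_label=None):
    """Return the relative frequency of each state.

    If null_label is None, missing values are skipped (NULL is ignored).
    Otherwise every missing value is counted as the state null_label.
    """
    if null_label is None:
        observed = [v for v in values if v is not None]
    else:
        observed = [null_label if v is None else v for v in values]
    counts = Counter(observed)
    total = len(observed)
    return {state: count / total for state, count in counts.items()}


# NULL left as is: only the five known states contribute.
ignored = state_distribution(states)

# NULL substituted: "Foreign" becomes a real state alongside WA and CA.
substituted = state_distribution(states, null_label="Foreign")

print(ignored)
print(substituted)
```

With the NULLs skipped, the proportions are computed over the five known rows only. With the substitution, "Foreign" shows up as a third state and the shares of WA and CA shrink accordingly, which is only desirable when the NULL really does mean something.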

Hope this helps.


Hi, Shuvro,

Thanks a lot. It is quite clear to me now how to deal with the NULL values.

With best regards,

Yours sincerely,
