Friday, March 9, 2012

Question on large volume of training dataset

Hi, all experts here,

Thanks a lot for your kind attention.

I have a question about training on a large dataset. In this case, training takes a long time to complete. Is there anything we can do to speed it up? I know we obviously can't split the training dataset into several smaller datasets, so what else can we do?

I hope my question is clear.

Thank you very much in advance for your advice and help. I look forward to hearing from you shortly.

With best regards,

Yours sincerely,

Generally, the time a training operation takes depends on the size of the training set, and there is not much one can do about this. Sometimes, however, adding more data does not significantly improve the accuracy of the model. You might try smaller samples first and see whether you really need all the data.
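One way to check whether all the data is really needed is to train on samples of increasing size and compare accuracy on a fixed holdout set. The following is only a minimal sketch in Python, not tied to any particular data mining product: the synthetic data, the nearest-centroid stand-in model, and all function names (`make_data`, `train_centroids`, `learning_curve`) are assumptions made for illustration.

```python
import random

def make_data(n, seed=0):
    """Synthetic two-class data: one numeric feature, classes centred at 0 and 2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        rows.append((rng.gauss(2.0 * label, 1.0), label))
    return rows

def train_centroids(rows):
    """Stand-in 'model' (hypothetical): the mean feature value of each class."""
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for x, label in rows:
        sums[label] += x
        counts[label] += 1
    return {c: sums[c] / counts[c] for c in sums if counts[c]}

def accuracy(model, rows):
    """Fraction of rows whose nearest class centroid matches the true label."""
    correct = 0
    for x, label in rows:
        predicted = min(model, key=lambda c: abs(x - model[c]))
        if predicted == label:
            correct += 1
    return correct / len(rows)

def learning_curve(train, holdout, sizes, seed=1):
    """Train on random samples of each size and score every model on the same holdout."""
    rng = random.Random(seed)
    results = []
    for size in sizes:
        sample = rng.sample(train, size)
        model = train_centroids(sample)
        results.append((size, accuracy(model, holdout)))
    return results

data = make_data(20000)
train, holdout = data[:16000], data[16000:]
curve = learning_curve(train, holdout, [100, 1000, 4000, 16000])
for size, acc in curve:
    print(f"{size:>6} rows -> holdout accuracy {acc:.3f}")
```

If the accuracy stops improving well before the full dataset is used, training on a sample of that size is probably good enough, and it will finish much sooner.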

Certain optimizations are possible, depending on the algorithm. For example, if you are using the Neural Network algorithm, you might want to make sure that continuous columns are treated as continuous and not discretized, particularly if the column is predictable. Also, make sure the model does not include unnecessary columns and that only the required columns are marked as Predictable.

All of these would improve performance, but not significantly.

Alternatively, if you are just trying to do some sort of data exploration, you might want to start with Naive Bayes, which takes little time to train.
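To illustrate why Naive Bayes trains quickly: on discrete attributes, training amounts to a single counting pass over the rows. The sketch below is a generic categorical Naive Bayes in Python with Laplace smoothing, not the implementation used by any specific product. The column names (`outlook`, `windy`, `play`) and the toy rows are made up for the example.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(rows, target):
    """One pass over the data: count classes and per-class attribute values."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)  # (attribute, class) -> counts of values
    values_seen = defaultdict(set)       # attribute -> distinct values observed
    for row in rows:
        label = row[target]
        class_counts[label] += 1
        for attr, value in row.items():
            if attr != target:
                value_counts[(attr, label)][value] += 1
                values_seen[attr].add(value)
    return class_counts, value_counts, values_seen

def predict(model, row):
    """Return the class with the highest log-probability under the model."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # class prior
        for attr, value in row.items():
            if attr not in values_seen:
                continue  # ignore attributes the model was not trained on
            count = value_counts[(attr, label)][value]
            # Laplace (add-one) smoothing avoids zero probabilities
            score += math.log((count + 1) / (n + len(values_seen[attr])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data, for illustration only
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "no"},
]
model = train_naive_bayes(rows, target="play")
print(predict(model, {"outlook": "sunny", "windy": "no"}))
print(predict(model, {"outlook": "rain", "windy": "yes"}))
```

Because training only increments counters, its cost grows linearly with the number of rows and columns, with no iterative optimization as in a neural network.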

Hi, Bogdan,

Thanks a lot for your kind advice.

With best regards,

Yours sincerely,
