R - H2O randomForest variable importance


I am using the h2o package to create a randomForest regression model, and I have problems with the variable importance. The model is created as shown below, and this part works fine.

Some of the variables are numeric and some are categorical.

    randomforest <- h2o.randomForest(x = c("year", "month", "day", "time", "show", "gen", "d", "lead"),
                                     y = "ratio", data = data.hex, importance = TRUE,
                                     stat.type = "GINI", ntree = 50, depth = 50, nodesize = 5,
                                     oobee = TRUE, classification = FALSE, type = "BigData")

However, when I try to view the variable importance, the output looks like this:

    Classification: FALSE
    Number of trees: 50

    Tree statistics:
                   Min.     Max.    Mean.
    Depth        30       40    33.26
    Leaves    20627    21450 21130.24

    Variable importance:
                                year    month      day     time     show      gen        d     lead
    Relative importance 20536.64 77821.76 26742.55 67476.75 283447.3 60651.24 87440.38 3658.625
    Standard Deviation        NA       NA       NA       NA       NA       NA       NA       NA
    Z-Scores                  NA       NA       NA       NA       NA       NA       NA       NA

    Overall Mean-squared Error:

What I would like to know is:
1) Why are there NA values?
2) What does relative importance mean? Shouldn't it be between 1 and 100?
3) Why is there no confusion matrix in the output?

Thanks for the help!

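Firstly, I recommend downloading the latest version of H2O-3; this may solve the problem of getting NA values for the standard deviation. Relative importance quantifies the contribution of a specific predictor compared with the contributions made by the other individual predictors in predicting the response variable, so it is not on a fixed scale. The normalized number you might be thinking of is the scaled importance, which divides each relative importance by the largest one: the most important variable gets 1 and the others fall between 0 and 1 (multiply by 100 if you want a 0 to 100 scale). Lastly, the reason you are not getting a confusion matrix in the output is that you have a regression model rather than a classification model; confusion matrices are only produced for classification models.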
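To illustrate why a regression model reports an error metric instead of a confusion matrix, here is a minimal base-R sketch. The numbers and variable names are made-up toy values, not the asker's data: the mean squared error corresponds to the kind of value on the "Overall Mean-squared Error" line of the question's output, while a confusion matrix cross-tabulates actual against predicted class labels, which a numeric response does not have.

```r
# Toy, made-up values for a numeric (regression) response
actual    <- c(1.2, 0.8, 1.5, 0.9)
predicted <- c(1.0, 0.9, 1.4, 1.1)

# Regression models are summarized by an error metric such as the mean squared error
mse <- mean((actual - predicted)^2)
print(mse)

# A confusion matrix counts actual vs. predicted *class labels*,
# so it is only defined when the response is categorical (classification)
actual_class    <- factor(c("a", "b", "b", "a"))
predicted_class <- factor(c("a", "b", "a", "a"), levels = c("a", "b"))
cm <- table(actual = actual_class, predicted = predicted_class)
print(cm)
```

Rows of `cm` are the actual classes and columns the predicted ones, matching the "vertical: actual; across: predicted" layout H2O prints for classifiers.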

You can run a random forest example in R with the following commands:

    library(h2o)
    conn <- h2o.init()
    demo(h2o.randomForest)

You can then see the confusion matrix and the relative and scaled importance table by doing the following:

    > h2o.confusionMatrix(iris.rf)
    Confusion Matrix - (vertical: actual; across: predicted):
                    Iris-setosa Iris-versicolor Iris-virginica  Error      Rate
    Iris-setosa       50.000000        0.000000       0.000000 0.0000 =  0 / 50
    Iris-versicolor    0.000000       47.000000       3.000000 0.0600 =  3 / 50
    Iris-virginica     0.000000        6.000000      44.000000 0.1200 =  6 / 50
    Totals            50.000000       53.000000      47.000000 0.0600 = 9 / 150

    > h2o.varimp(iris.rf)
    Variable Importances:
       variable relative_importance scaled_importance percentage
    1 petal_len         1926.421509          1.000000   0.445738
    2 petal_wid         1756.277710          0.911679   0.406370
    3 sepal_len          493.782562          0.256321   0.114252
    4 sepal_wid          145.390717          0.075472   0.033641
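The two normalized columns in the `h2o.varimp(iris.rf)` output can be reproduced from `relative_importance` alone, which makes their meaning concrete. This minimal sketch copies the relative importances from that iris output and assumes `scaled_importance` is each value divided by the maximum and `percentage` is each value divided by the sum; the results agree with the printed columns to six decimals.

```r
# relative_importance values copied from the h2o.varimp(iris.rf) output
rel <- c(petal_len = 1926.421509, petal_wid = 1756.277710,
         sepal_len = 493.782562,  sepal_wid = 145.390717)

# scaled_importance: divide by the largest value, so the top variable gets 1
scaled <- rel / max(rel)

# percentage: divide by the total, so the column sums to 1
percentage <- rel / sum(rel)

print(round(scaled, 6))
print(round(percentage, 6))
```

Applied to the question's model, the same division by the largest relative importance (283447.3 for `show`) would put every variable on a 0 to 1 scale.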

Thanks, and I hope this helps!

