R - h2o randomForest variable importance
I am using the h2o package to create a randomForest regression model, and I have some problems with the variable importance. The model I am creating is shown here. It works fine.
Some of the variables are numeric, and some are categorical.
randomforest <- h2o.randomForest(x = c("year", "month", "day", "time", "show", "gen", "d", "lead"),
                                 y = "ratio", data = data.hex, importance = T, stat.type = "GINI",
                                 ntree = 50, depth = 50, nodesize = 5, oobee = T,
                                 classification = FALSE, type = "BigData")
However, when I want to see the variable importance, the output looks like this.
Classification: FALSE
Number of trees: 50
Tree statistics:
          Min.   Max.    Mean.
Depth       30     40    33.26
Leaves   20627  21450 21130.24

Variable importance:
                        year    month      day     time     show      gen        d     lead
Relative importance 20536.64 77821.76 26742.55 67476.75 283447.3 60651.24 87440.38 3658.625
Standard deviation        NA       NA       NA       NA       NA       NA       NA       NA
Z-scores                  NA       NA       NA       NA       NA       NA       NA       NA

Overall mean-squared error:
What I want to know is:
1) Why are there NA values?
2) What does relative importance mean? Shouldn't it be between 1 and 100?
3) Why is there no confusion matrix in the output?
Thanks for your help!
Firstly, I recommend downloading the latest version of H2O-3. This may solve the problem of getting NA values for the standard deviation. Relative importance quantifies the contribution of a specific predictor against the contributions made by the other individual predictors in predicting the response variable. The number you might be thinking of, the one that needs to be between 1 and 100, is the scaled importance (H2O-3 prints it on a 0 to 1 scale, next to a percentage column). Lastly, the reason you are not getting a confusion matrix in the output is that you have a regression model rather than a classification model. Confusion matrices are only produced for classification models.
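For reference, here is a minimal sketch of what the regression model might look like after upgrading. It assumes a current H2O-3 release, where (to my knowledge) the arguments are named training_frame, ntrees, max_depth and min_rows rather than data, ntree, depth and nodesize. The built-in mtcars data is only a stand-in for data.hex, which is not available here, and the column choices are hypothetical.

```r
library(h2o)
h2o.init()  # starts or connects to a local H2O cluster (requires Java)

# Stand-in data: mtcars replaces the question's data.hex.
cars.hex <- as.h2o(mtcars)
cars.hex$cyl <- as.factor(cars.hex$cyl)  # treat one predictor as categorical

# A numeric response column (mpg) makes this a regression forest.
rf <- h2o.randomForest(
  x = c("cyl", "disp", "hp", "wt"),
  y = "mpg",
  training_frame = cars.hex,
  ntrees = 50,
  max_depth = 50,
  min_rows = 5,
  seed = 1
)

# Variable importance table: variable, relative_importance,
# scaled_importance, percentage.
vi <- h2o.varimp(rf)
print(vi)

# Regression models report error metrics such as MSE, not a confusion matrix.
print(h2o.mse(rf))
```

The mapping of the old argument names to the new ones is an assumption based on the current documentation, so check ?h2o.randomForest for the version you install.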
You can run the random forest example in R by running the following commands:
library(h2o)
conn <- h2o.init()
demo(h2o.randomForest)
You can then see the confusion matrix and the relative and scaled importance table by doing the following:
> h2o.confusionMatrix(iris.rf)
confusion matrix - (vertical: actual; across: predicted):
                iris-setosa iris-versicolor iris-virginica  error      rate
iris-setosa       50.000000        0.000000       0.000000 0.0000 =  0 / 50
iris-versicolor    0.000000       47.000000       3.000000 0.0600 =  3 / 50
iris-virginica     0.000000        6.000000      44.000000 0.1200 =  6 / 50
totals            50.000000       53.000000      47.000000 0.0600 = 9 / 150

> h2o.varimp(iris.rf)
variable importances:
   variable relative_importance scaled_importance percentage
1 petal_len         1926.421509          1.000000   0.445738
2 petal_wid         1756.277710          0.911679   0.406370
3 sepal_len          493.782562          0.256321   0.114252
4 sepal_wid          145.390717          0.075472   0.033641
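To make the relationship between the columns concrete, here is a small base R sketch. It assumes, as the numbers printed by h2o.varimp above suggest, that scaled importance is the relative importance divided by the largest value, and that percentage is the relative importance divided by the total. The helper importance_table is hypothetical and not part of h2o.

```r
# Relative importances copied from the h2o.varimp(iris.rf) output above.
iris_rel <- c(petal_len = 1926.421509, petal_wid = 1756.277710,
              sepal_len = 493.782562, sepal_wid = 145.390717)

# Hypothetical helper (not an h2o function): rebuilds the two derived columns.
importance_table <- function(rel) {
  data.frame(
    variable            = names(rel),
    relative_importance = unname(rel),
    scaled_importance   = unname(rel / max(rel)),  # largest predictor becomes 1
    percentage          = unname(rel / sum(rel))   # shares that sum to 1
  )
}

iris_tab <- importance_table(iris_rel)
print(iris_tab)

# The same arithmetic applied to the relative importances from the question.
question_rel <- c(year = 20536.64, month = 77821.76, day = 26742.55,
                  time = 67476.75, show = 283447.3, gen = 60651.24,
                  d = 87440.38, lead = 3658.625)
question_tab <- importance_table(question_rel)
print(question_tab[order(-question_tab$relative_importance), ], row.names = FALSE)
```

Under that assumption the raw relative importance has no fixed range, which is why values like 283447.3 are not a problem. Multiply the percentage column by 100 if you want numbers on a 0 to 100 scale.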
Thanks, and I hope this helps!