I want to predict a variable with Naive Bayes. I tried it with another variable from the same dataset and it worked perfectly, but not with the one I actually want. The variable to predict contains values like 'OL', 'D'.

In what way is it problematic? Also, consider that using parRF can potentially square the number of processes you create (train runs its resamples in parallel, and inside each worker parRF parallelizes again). I think running the sequential random forest in parallel across resamples (instead of using parRF) is more efficient, since there is far less I/O and fewer worker startups, but I don't have a lot of data to back that up.
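A minimal sketch of the "parallelize the resamples, not the forest" idea, assuming the doParallel backend and caret's sequential `rf` method (the dataset and tuning settings here are illustrative, not from the original post):

```r
library(caret)
library(doParallel)  # also attaches foreach and parallel

# Register a parallel backend: train() farms the cross-validation
# resamples out to these workers.
cl <- makePSOCKcluster(2)
registerDoParallel(cl)

set.seed(1)
dat <- twoClassSim(200)

ctrl <- trainControl(method = "cv", number = 5, allowParallel = TRUE)

# method = "rf" fits each forest sequentially inside one worker,
# avoiding the worker-inside-worker nesting that parRF would create.
fit <- train(Class ~ ., data = dat, method = "rf",
             trControl = ctrl, tuneLength = 2)

stopCluster(cl)
registerDoSEQ()  # restore the sequential backend
```

With this layout the number of processes is bounded by the cluster size, rather than (workers × parRF workers) as with nested parallelism.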
```r
timestamp <- Sys.time()
library(caret)
library(plyr)
library(recipes)
library(dplyr)
model <- "lda"

#########################################################################

set.seed(1)
training <- twoClassSim(50, linearVars = 2)
testing <- twoClassSim(500, linearVars = 2)

trainX <- training[, -ncol(training)]
trainY <- training$Class

rec_cls <- recipe(Class ~ ., data = training) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors())

cctrl1 <- trainControl(method = "cv", number = 3, returnResamp = "all",
                       classProbs = TRUE,
                       summaryFunction = twoClassSummary)
cctrl2 <- trainControl(method = "LOOCV",
                       classProbs = TRUE, summaryFunction = twoClassSummary)
cctrl3 <- trainControl(method = "none",
                       classProbs = TRUE, summaryFunction = twoClassSummary)

set.seed(849)
test_class_cv_model <- train(trainX, trainY,
                             method = "lda",
                             trControl = cctrl1,
                             metric = "ROC",
                             preProc = c("center", "scale"))

set.seed(849)
test_class_cv_form <- train(Class ~ ., data = training,
                            method = "lda",
                            trControl = cctrl1,
                            metric = "ROC",
                            preProc = c("center", "scale"))

test_class_pred <- predict(test_class_cv_model, testing[, -ncol(testing)])
test_class_prob <- predict(test_class_cv_model, testing[, -ncol(testing)], type = "prob")
test_class_pred_form <- predict(test_class_cv_form, testing[, -ncol(testing)])
test_class_prob_form <- predict(test_class_cv_form, testing[, -ncol(testing)], type = "prob")

set.seed(849)
test_class_loo_model <- train(trainX, trainY,
                              method = "lda",
                              trControl = cctrl2,
                              metric = "ROC",
                              preProc = c("center", "scale"))

set.seed(849)
test_class_none_model <- train(trainX, trainY,
                               method = "lda",
                               trControl = cctrl3,
                               tuneGrid = test_class_cv_model$bestTune,
                               metric = "ROC",
                               preProc = c("center", "scale"))

test_class_none_pred <- predict(test_class_none_model, testing[, -ncol(testing)])
test_class_none_prob <- predict(test_class_none_model, testing[, -ncol(testing)], type = "prob")

set.seed(849)
test_class_rec <- train(x = rec_cls,
                        data = training,
                        method = "lda",
                        trControl = cctrl1,
                        metric = "ROC")

if (!isTRUE(all.equal(test_class_cv_model$results,
                      test_class_rec$results)))
  stop("CV weights not giving the same results")

test_class_imp_rec <- varImp(test_class_rec)
test_class_pred_rec <- predict(test_class_rec, testing[, -ncol(testing)])
test_class_prob_rec <- predict(test_class_rec, testing[, -ncol(testing)],
                               type = "prob")

test_levels <- levels(test_class_cv_model)
if (!all(levels(trainY) %in% test_levels))
  cat("wrong levels")

#########################################################################

test_class_predictors1 <- predictors(test_class_cv_model)

#########################################################################

tests <- grep("test_", ls(), fixed = TRUE, value = TRUE)

sInfo <- sessionInfo()
timestamp_end <- Sys.time()

save(list = c(tests, "sInfo", "timestamp", "timestamp_end"),
     file = file.path(getwd(), paste(model, ".RData", sep = "")))

if (!interactive())
  q("no")
```
I have the same problem with RFE. I do not understand it, since I explicitly pass a function built from twoClassSummary (I tried using twoClassSummary itself as well; that did not work either). What am I doing wrong?
Example:
Session:

```
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=hu_HU.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=hu_HU.UTF-8
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=hu_HU.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=hu_HU.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] plyr_1.8.4      kernlab_0.9-25  caret_6.0-76    ggplot2_2.2.1   lattice_0.20-35

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12       magrittr_1.5       splines_3.4.1      MASS_7.3-47        munsell_0.4.3      colorspace_1.3-2   rlang_0.1.2
 [8] foreach_1.4.3      minqa_1.2.4        stringr_1.2.0      car_2.1-5          tools_3.4.1        nnet_7.3-12        parallel_3.4.1
[15] pbkrtest_0.4-7     grid_3.4.1         gtable_0.2.0       nlme_3.1-131       mgcv_1.8-18        quantreg_5.33      e1071_1.6-8
[22] class_7.3-14       MatrixModels_0.4-1 iterators_1.0.8    lme4_1.1-13        lazyeval_0.2.0     tibble_1.3.3       Matrix_1.2-11
[29] nloptr_1.0.4       reshape2_1.4.2     ModelMetrics_1.1.0 codetools_0.2-15   stringi_1.1.5      compiler_3.4.1     scales_0.4.1
[36] doMC_1.3.4         stats4_3.4.1       SparseM_1.77
```
Error:

```
Warning messages:
1: In rfe.default(x = iris[, -c(5, 6)], y = iris[, 6], sizes = c(1,  :
  Metric 'ROC' is not created by the summary function; 'Accuracy' will be used instead
2: In train.default(x, y, ...) :
  The metric 'Accuracy' was not in the result set. ROC will be used instead.
```
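For what it's worth, the first warning typically means the summary function was attached to trainControl but not to the functions list used by rfeControl; rfe() computes its own resampling summaries from `functions$summary`. A minimal sketch of wiring twoClassSummary into RFE, assuming caret's `caretFuncs` and a two-class subset of iris similar to the one hinted at in the warning (names here are illustrative, not the original code):

```r
library(caret)

# Two-class version of iris, since twoClassSummary needs exactly two classes
ir <- iris[iris$Species != "setosa", ]
ir$Species <- factor(ir$Species)

# The summary function must live in the functions list passed to
# rfeControl, not only in trainControl, for rfe() to report ROC.
funcs <- caretFuncs
funcs$summary <- twoClassSummary

ctrl <- rfeControl(functions = funcs, method = "cv", number = 3)

set.seed(1)
rfe_fit <- rfe(x = ir[, 1:4], y = ir$Species,
               sizes = c(1, 2, 3),
               rfeControl = ctrl,
               metric = "ROC",
               # extra arguments are forwarded to the inner train() call
               method = "lda",
               trControl = trainControl(method = "cv",
                                        classProbs = TRUE,
                                        summaryFunction = twoClassSummary))
```

With `funcs$summary` set, rfe() produces the ROC column itself, and the inner trainControl (with `classProbs = TRUE`) lets the fitted models generate the class probabilities twoClassSummary requires.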