CTVsuggestTrain carries out the model building and creates the
data.frame containing the classification probabilities that is output
by the CTVsuggest package.
These R packages are based on follow-up work to my fourth-year university dissertation, supervised by Ioannis Kosmidis.
The CTVsuggestTrain R package has a single exported function,
Train_model(), that constructs features and trains a multinomial
logistic regression model with the objective of classifying CRAN
packages to available CRAN Task Views. For a more detailed description
of the model, view the Model Section of the CTVsuggest Overview
Vignette.
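To give a rough idea of what a multinomial logistic regression classifier produces, here is a minimal sketch on made-up data. It is not the code inside Train_model(): the features, the three task view labels, and the use of nnet::multinom as the fitting routine are all assumptions chosen for illustration, and the real feature construction is described in the Overview Vignette.

```r
# Toy illustration only: nnet::multinom is an assumed stand-in and is not
# necessarily the fitting routine that Train_model() uses.
library(nnet)

set.seed(1)

# Hypothetical data: 150 made-up "packages", two made-up numeric features,
# and three made-up task view labels.
n <- 50
views <- c("Bayesian", "Econometrics", "Spatial")
features <- data.frame(
  x1 = c(rnorm(n, 0), rnorm(n, 2), rnorm(n, 4)),
  x2 = c(rnorm(n, 0), rnorm(n, 2), rnorm(n, 0)),
  view = factor(rep(views, each = n), levels = views)
)

# Fit a multinomial logistic regression of the label on the features.
fit <- multinom(view ~ x1 + x2, data = features, trace = FALSE)

# Matrix with one row per package and one column per task view;
# each row holds class probabilities that sum to 1.
probs <- predict(fit, newdata = features, type = "probs")

# Collect the probabilities in a data.frame keyed by a made-up package name.
probs_df <- data.frame(
  package = paste0("pkg", seq_len(nrow(features))),
  probs
)

# The suggested task view is the column with the highest probability.
suggested <- colnames(probs)[max.col(probs)]

head(probs_df)
```

The resulting probs_df has the same general shape as the output described above: one row per package and one probability column per task view.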
Note that you can ignore the CTVsuggestTrain package entirely if you only want to output suggestions using the CTVsuggest package. I use CTVsuggestTrain to train the model weekly in order to update the predictions provided by CTVsuggest. Having the code packaged makes it easier for me to carry out model training, and makes the model building transparent for others to inspect.
For further detail on the workflow, view the Packages Workflow Section of the CTVsuggest Overview Vignette.
You can install the development version of CTVsuggestTrain from GitHub with:
# install.packages("devtools")
devtools::install_github("DylanDijk/CTVsuggestTrain")
The following code saves the model, model accuracy and data.frame
containing classification probabilities for packages to an "OUTPUT"
directory in your current working directory.
library(CTVsuggestTrain)
Train_model(save_output = TRUE, save_path = "OUTPUT/")
The code example above is the code I run to retrieve an up-to-date
model. The Train_model() function takes a while to run: on my machine
(Windows, Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, 2112 MHz, 4 cores,
8 logical processors) it takes 30 minutes.