Prep, polish, top coat: this vanity case is exactly what you need to put the finishing touch on your statistical analysis.
Thanks to the Ollama API, which makes it possible to run Large Language Models (LLMs) locally, we developed a small package designed for interpreting continuous or categorical latent variables. You provide a data set containing a latent variable you want to understand, along with some explanatory variables. The package returns a description of the latent variable based on the explanatory variables, and suggests a name for it. ‘NaileR’ is an R package that uses convenience functions offered by the ‘FactoMineR’ package (condes(), catdes(), descfreq()) in conjunction with the ‘ollamar’ package.
Its two main goals are to:
* generate latent variable descriptions with the help of AI
* offer similarity measure tools for textual data
install.packages('devtools')
devtools::install_github('Nelhe/NaileR')
library(NaileR)
‘NaileR’ currently features 9 datasets and 7 functions.
For complete case studies and a showcase of the main functions of the ‘NaileR’ package, see the documentation.
Let’s have a look at how we can interpret HCPC clusters:
library(FactoMineR)
data(local_food)
set.seed(1) # for consistency
res_mca <- MCA(local_food, quali.sup = 46:63, ncp = 100, level.ventil = 0.05, graph = FALSE)
plot.MCA(res_mca, choix = "ind", invisible = c("var", "quali.sup"), label = "none")
res_hcpc <- HCPC(res_mca, nb.clust = 3, graph = FALSE)
plot.HCPC(res_hcpc, choice = "map", draw.tree = FALSE, ind.names = FALSE)
don_clust <- res_hcpc$data.clust
Because the variable names are very long and explicit, the raw category description is practically illegible. Let’s provide clear context and see how an LLM can make sense of it:
res <- nail_catdes(don_clust, ncol(don_clust),
introduction = 'A study on sustainable food systems was led on several French participants. This study had 2 parts.
In the first part, participants had to rate how acceptable "a food system that..." (e.g, "a food system that only uses renewable energy") was to them.
In the second part, they had to say if they agreed or disagreed with some statements.',
request = 'I will give you the answers from one group.
Please explain who the individuals of this group are, what their beliefs are. Then, give this group a new name, and explain why you chose this name.',
isolate.groups = T, drop.negative = T)
The output is a list of results, one per group.
In the same fashion, nail_condes can be used to interpret the axes of a PCA, although a bit more work is needed to bind the original data frame with the individuals' coordinates on the PCA axes.
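The binding step mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not one of the package's case studies: it uses the decathlon data set shipped with ‘FactoMineR’, the object names don_axis and res_condes are made up for the example, and the commented nail_condes call assumes the function takes the data set and the index of the variable to describe in the same way as nail_catdes above.

```r
library(FactoMineR)

# Example data shipped with FactoMineR (an assumption; not taken from the NaileR case studies)
data(decathlon)

# PCA on the ten event results; rank and points are supplementary quantitative
# variables, competition is a supplementary qualitative variable
res_pca <- PCA(decathlon, quanti.sup = 11:12, quali.sup = 13, graph = FALSE)

# Bind the original active variables with the individuals' coordinates on the first axis
don_axis <- cbind(decathlon[, 1:10], Dim.1 = res_pca$ind$coord[, 1])

# Classical description of the axis with FactoMineR's condes()
res_condes <- condes(don_axis, num.var = ncol(don_axis))
head(res_condes$quanti)

# Hypothetical call (assumed signature; needs a running Ollama server):
# res <- nail_condes(don_axis, ncol(don_axis),
#                    introduction = "These data describe athletes' performances in a decathlon.",
#                    request = "Please explain what sets apart athletes with high and low values on this axis.")
```

The axis coordinates are placed in the last column so that ncol(don_axis) can be passed as the index of the variable to describe, mirroring the nail_catdes example.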
This package is under the GPL (>= 2) License. Details can be found here.
Sébastien Lê
Project link: https://github.com/Nelhe/NaileR