-
Notifications
You must be signed in to change notification settings - Fork 10
/
Copy pathREADME.Rmd
89 lines (58 loc) · 3.3 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
---
output: github_document
---
# modeldb <img src="man/figures/logo.png" align="right" alt="" width="120" />
```{r setup, include=FALSE}
library(dplyr)
library(modeldb)
```
[![R-CMD-check](https://github.com/tidymodels/modeldb/workflows/R-CMD-check/badge.svg)](https://github.com/tidymodels/modeldb/actions)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/modeldb)](https://CRAN.R-project.org/package=modeldb)
[![Codecov test coverage](https://codecov.io/gh/tidymodels/modeldb/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/modeldb?branch=main)
[![Downloads](http://cranlogs.r-pkg.org/badges/modeldb)](https://CRAN.R-project.org/package=modeldb)
Fit models inside the database! **modeldb works with most database back-ends** because it leverages [dplyr](https://dplyr.tidyverse.org/) and [dbplyr](https://dbplyr.tidyverse.org/) for the final SQL translation of the algorithm. It currently supports:
- K-means clustering
- Linear regression
## Installation
Install the CRAN version with:
```{r, eval = FALSE}
install.packages("modeldb")
```
The development version is available from GitHub using remotes:
```{r, eval = FALSE}
# install.packages("remotes")
remotes::install_github("tidymodels/modeldb")
```
## Linear regression
An easy way to try out the package is by creating a temporary SQLite database, and loading `mtcars` to it.
```{r}
con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")
RSQLite::initExtension(con)
dplyr::copy_to(con, mtcars)
```
```{r}
library(dplyr)
tbl(con, "mtcars") %>%
select(wt, mpg, qsec) %>%
linear_regression_db(wt)
```
The model output can be parsed by [tidypredict](https://tidypredict.tidymodels.org/) to run the predictions in the database. Please see the "Linear Regression" article to learn more about how to use `linear_regression_db()`
## K Means clustering
To use the `simple_kmeans_db()` function, simply pipe the database back end table to the function. This returns a list object that contains two items:
- A sql query table with the final center assignment
- A local table with the information about the centers
```{r}
km <- tbl(con, "mtcars") %>%
simple_kmeans_db(mpg, wt)
colnames(km)
```
The SQL statement from `tbl` can be extracted using dbplyr's `remote_query()`
```{r}
dbplyr::remote_query(km)
```
## Contributing
This project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on Posit Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).
- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/modeldb/issues).
- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code. Check out [this helpful article on how to create reprexes](https://dbplyr.tidyverse.org/articles/reprex.html) for problems involving a database.
- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).