Showing 5 changed files with 53 additions and 21 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
# Set the default behavior (used when a rule below doesn't match) | ||
* text=auto | ||
*.R text | ||
*.Rd text | ||
*.Rmd text | ||
*.*proj text | ||
*.targets text | ||
*.settings text | ||
*.vssettings text | ||
|
||
*.dll -text | ||
*.lib -text | ||
*.sln -text | ||
*.ico -text | ||
*.bmp -text | ||
*.png -text | ||
*.snk -text | ||
*.mht -text | ||
*.pickle -text | ||
*.Rdata -text | ||
*.Rhistory -text | ||
|
||
# Some Windows-specific files should always be CRLF | ||
*.bat eol=crlf | ||
*.cmd eol=crlf |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,15 +1,19 @@ | ||
Package: dplyrXdf | ||
Title: Interface to xdf files for the dplyr package | ||
Version: 0.10.0 | ||
Title: Tools for working with Microsoft R Server Xdf files and the dplyr package | ||
Version: 1.0.0 | ||
Authors@R: c( | ||
person("Hong", "Ooi", , "[email protected]", role = c("aut", "cre")), | ||
person("Microsoft", role="cph"), | ||
person("Hadley", "Wickham", role = "ctb", comment = "Some functions based on code in dplyr and httr"), | ||
person("Ali-Kazim", "Zaidi", role = "ctb", comment = "Invaluable assistance on Spark") | ||
person("Ali-Kazim", "Zaidi", role = "ctb", comment = "Invaluable assistance on Spark"), | ||
person("Mario", "Inchiosa", role = "ctb", comment = "Invaluable assistance on Spark") | ||
) | ||
Description: Interface to xdf files for the dplyr package | ||
Description: A suite of tools for working with Microsoft R Server. Its most | ||
visible feature is a dplyr interface for the Xdf file format and other MRS | ||
data sources. It supports Hadoop and Spark clusters, as well as in-database | ||
processing with Microsoft SQL Server. | ||
Depends: | ||
R (>= 3.3.2), | ||
R (>= 3.3), | ||
dplyr (>= 0.7), | ||
RevoScaleR (>= 8.0) | ||
Imports: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,32 +1,35 @@ | ||
# dplyrXdf | ||
|
||
The [dplyr package](https://cran.r-project.org/package=dplyr) is a toolkit for data transformation and manipulation. Since its introduction, dplyr has become very popular in the R community, for the way in which it streamlines and simplifies many common data manipulation tasks. | ||
The dplyrXdf package is a suite of tools to facilitate working with [Microsoft Machine Learning Server](https://www.microsoft.com/en-au/cloud-platform/r-server), previously known as Microsoft R Server (MRS). Its features include: | ||
|
||
The dplyrXdf package implements a dplyr backend for [Microsoft R Server](https://www.microsoft.com/en-au/cloud-platform/r-server) (MRS). A key feature of MRS is that it allows you to break R's memory barrier. Instead of storing data in memory as data frames, it is stored on disk, in a file format identifiable by the `.xdf` extension. The data is then processed in chunks, so that you only need enough memory to store each chunk. This allows you to work with datasets of potentially unlimited size. | ||
- A backend to the popular [dplyr package](http://dplyr.tidyverse.org) for the Xdf file format. Xdf files are a technology provided by MRS to break R's memory barrier: instead of keeping data in-memory in data frames, it is saved on disk. The data is then processed in chunks, so that you only need enough memory to handle each chunk. | ||
- Interfaces to Microsoft SQL Server and HDInsight Hadoop and Spark clusters. dplyrXdf, in conjunction with dplyr, provides the ability to execute pipelines natively in-database and in-cluster, which for large datasets can be much more efficient than executing them locally. | ||
- Several functions to ease working with Xdf files, including functions for file management and for transferring data to and from remote backends. | ||
- Workarounds for various glitches and unexpected behaviour in MRS and dplyr. | ||
|
||
MRS includes a suite of data transformation and modelling functions in the RevoScaleR package that can handle xdf files. These functions are highly optimised and efficient, but their user interface can be complex. dplyrXdf allows you to work with xdf files within the framework supplied by dplyr, which reduces the learning curve and allows you to become productive more quickly. It works with data in the native filesystem and in HDFS, and can take advantage of a Spark or Hadoop cluster. | ||
|
||
_Note that dplyrXdf is a shell on top of the existing functions provided by Microsoft R Server, which is a commercial distribution of R. You must have MRS installed to make use of dplyrXdf. In particular, Microsoft R Open does not include support for xdf files._ | ||
|
||
## Obtaining dplyrXdf | ||
|
||
The current version of dplyrXdf is **0.10.0 beta**. You can download and install dplyrXdf from within R via the devtools package: | ||
The current version of dplyrXdf is **1.0.0**. You can download and install dplyrXdf from within R via the devtools package: | ||
|
||
```r | ||
install.packages("devtools") | ||
devtools::install_github("RevolutionAnalytics/dplyrXdf") | ||
``` | ||
|
||
dplyrXdf 0.10 requires dplyr 0.7 and Microsoft R Server release 8.0 or higher. If you are on an earlier release of MRS and/or dplyr, you can install dplyrXdf 0.9.2 instead: `install_github("RevolutionAnalytics/[email protected]")`. | ||
|
||
## Obtaining dplyr | ||
dplyrXdf requires Microsoft R Server release 8.0 or later, and dplyr 0.7 or later. If you want to use sparklyr and SQL Server integration, you will also have to install the dbplyr, sparklyr and odbc packages (and their dependencies). | ||
|
||
At the moment, dplyr 0.7 is not in the MRAN snapshot that is the default repo for MRS users. You can install it from CRAN instead: | ||
If you are using MRS 9.1 or earlier, the necessary packages will not be in the MRAN snapshot that is your default repo. You can install them from CRAN instead: | ||
|
||
```r | ||
install.packages("dplyr", repos="https://cloud.r-project.org") | ||
install.packages(c("dplyr", "dbplyr", "sparklyr", "odbc"), repos="https://cloud.r-project.org") | ||
``` | ||
|
||
Make sure you install dplyr 0.7 before you install dplyrXdf. | ||
Make sure you install dplyr 0.7 _before_ you install dplyrXdf. | ||
|
||
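The ordering requirement above (dplyr 0.7 before dplyrXdf) can be checked from the R prompt before installing. The following is a minimal sketch using only base R; `has_min_version` is a hypothetical helper name introduced for illustration and is not part of dplyrXdf or dplyr.

```r
# Hypothetical helper (not part of dplyrXdf): TRUE if `pkg` is installed
# and its version is at least `minimum`.
has_min_version <- function(pkg, minimum) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= minimum
}

if (has_min_version("dplyr", "0.7")) {
  message("dplyr ", as.character(utils::packageVersion("dplyr")),
          " is recent enough to install dplyrXdf.")
} else {
  message("Install dplyr 0.7 or later first, then install dplyrXdf.")
}
```

The check only reports the state of the library; it does not install anything, so it is safe to run repeatedly.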
## Earlier versions | ||
|
||
The previous version of dplyrXdf, 0.9.2, is also available. You can install this with `install_github("RevolutionAnalytics/[email protected]")`. This version requires dplyr 0.5 or earlier; it may run into problems with dplyr 0.7. | ||
|
||
|
||
|
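The README text in this diff describes Xdf data as being processed in chunks, so that only one chunk needs to fit in memory at a time. The sketch below illustrates that general idea in base R with an ordinary CSV file; it is an assumption-laden illustration, not how RevoScaleR or dplyrXdf implement chunking, and `chunked_mean` is a hypothetical function name.

```r
# Write a small CSV to stand in for a dataset too large to load at once.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:10), path, row.names = FALSE)

# Hypothetical helper: compute the mean of one column while holding
# at most `chunk_rows` rows in memory at any time.
chunked_mean <- function(path, column, chunk_rows = 3) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Read the header line once; write.csv quotes column names, so strip quotes.
  header <- strsplit(readLines(con, n = 1), ",")[[1]]
  header <- gsub('"', "", header)

  total <- 0
  n <- 0
  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0) break            # end of file
    chunk <- read.csv(text = lines, header = FALSE, col.names = header)
    total <- total + sum(chunk[[column]])    # accumulate running totals only
    n <- n + nrow(chunk)
  }
  total / n
}

chunked_mean(path, "x")
```

Only the running sum and row count persist between chunks, which is why memory use depends on the chunk size rather than on the size of the file.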