To document our process and findings, and to collaborate and share our work more effectively, we want a version control system that makes it easy for multiple users to work on the same project.
Git is a version control system designed for just this purpose. GitHub is a hosting service that allows us to put Git-based projects on the internet. While we've seen how to make the workflow for a project reproducible, ideally we would also save different versions of our analysis, write-up, etc. along the way to document how the work developed. Rather than having to create differently named copies of the files, Git lets us simply track the changes that we commit to the files. For understanding, installing, and using Git/GitHub with R we will be reading different parts of the book happygitwithr. This book is written by some folks at the University of British Columbia for students taking their course and workshops. Please read chapter 1 of that book now and then come back to this page to get started installing Git and getting R to work with Git and GitHub!
Everyone at NCSU has a free private GitHub account through github.ncsu.edu. I've had issues connecting RStudio to the NCSU GitHub, so I'd go ahead and create a free account at github.com to make sure RStudio can connect to it.
This material is distilled from chapter 7 of happygitwithr. We need to install Git on our local machine. Depending on your operating system things will differ.
On Windows, install Git for Windows, also known as msysgit or “Git Bash”, to get Git in addition to some other useful tools, such as the Bash shell. Yes, all those names are totally confusing, but you might encounter them elsewhere and I want you to be well-informed.
We like this because Git for Windows leaves the Git executable in a conventional location, which will help you and other programs, e.g. RStudio, find it and use it. This also supports a transition to more expert use, because the “Git Bash” shell will be useful as you venture outside of R/RStudio.
- NOTE: Select “Use Git from the command line and also from 3rd party software” during installation. Otherwise, I believe it’s OK to accept the defaults.
- Note that RStudio for Windows prefers for Git to be installed below C:/Program Files; for example, the Git executable on my Windows system is found at C:/Program Files/Git/bin/git.exe. Unless you have specific reasons to do otherwise, follow this convention.
On a Mac, install the Xcode command line tools (not all of Xcode), which include Git. If your OS is older than 10.11 El Capitan, it is possible that you must install the Xcode command line tools in order for RStudio to find and use Git. (If you need more instructions than what is below, check out this web site for installation commands.)
Go to the shell and enter one of these commands to elicit an offer to install developer command line tools:
git --version
git config
Accept the offer! Click on “Install”.
Here’s another way to request this installation, more directly:
xcode-select --install
We just happen to find this Git-based trigger apropos.
Note also that, after upgrading macOS, you might need to redo the above and/or re-agree to the Xcode license agreement. Skipping this has caused the RStudio Git pane to disappear on a system where it was previously working. Use commands like those above to tickle Xcode into prompting you for what it needs, then restart RStudio.
On Linux, install Git via your distro’s package manager.
Ubuntu or Debian Linux:
sudo apt-get install git
Fedora or RedHat Linux:
sudo yum install git
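Whichever operating system you are on, it is worth confirming that the installation worked before moving on. The following is a minimal sketch, assuming a Bash-like shell (Git Bash on Windows, Terminal on a Mac or Linux); the messages it prints are illustrative only.

```shell
# Check whether a Git executable is on the PATH and report its version.
if command -v git >/dev/null 2>&1; then
  echo "Git found at: $(command -v git)"
  git --version
else
  echo "Git not found on the PATH; revisit the installation steps above."
fi
```

If the second message appears, restart the shell first, since a freshly installed program is sometimes not visible to a shell that was already open.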
At this point we are ready to learn about using the "shell", or command line interface (CLI). macOS is built on Unix (as is Linux), so these systems come with a CLI naturally. Windows does not come with a Unix-style shell by default. However, if you are using Windows and have now installed Git, you have access to a CLI called Git Bash. At this point you should read Appendix A of happygitwithr. Pay special attention to section A.4, as these are common commands you might run in the CLI. Really, if you want to get deep into the data science world you should get very comfortable using the CLI, as it will improve your efficiency and allow you to do things you can't do just in RStudio.
Let's get just a little practice using the CLI. Open the CLI (on Windows, type Git Bash into the search bar; on a Mac, go to /Applications/Utilities/ and launch Terminal).
- Type
ls
to see what folders are in the current directory.
- Create a folder on your computer that you know the path to. Set your current directory to that path using
cd "path/to/directory"
Now run
ls
and you probably shouldn't see any subfolders.
- Type
cd ..
to move up one directory. Use the cd command to move back into the directory.
- Create a new R file in the folder by typing
touch exampleFile.R
- Now run
ls
and you should see the file in the directory.
- You can actually edit the file through your shell. There are many great programs (such as emacs) you could use, but by default I think you should have vim and nano. Let's use nano to open the file. Type
nano exampleFile.R
and you should enter an editing mode.
- Add the following command to the file (putting your path in),
write.csv(rnorm(100), file = "path/to/your/newdirectory/data.csv")
On your keyboard press Ctrl+O to write out the changes, then hit Enter to confirm the file name. Then press Ctrl+X to exit the editor. (nano uses the Control key on a Mac as well, not Cmd.)
- Now we can run the file from the CLI! We just need to call R (or in this case Rscript) and tell it where the file is. Since we are in the directory of the file we can use a relative path. This code runs the file for me (your code may need to be slightly different).
"C:/Program Files/R/R-3.5.1/bin/Rscript" exampleFile.R
- You should now have a .csv file in the directory. You can remove files in the directory using rm. Be careful, you can erase a lot of things by accident!
rm exampleFile.R
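The practice steps above can also be run in one go. This is only a sketch under a few assumptions: the folder name cli-practice is made up for illustration, the file is written with printf rather than edited interactively in nano, the R command writes data.csv with a relative path, and the script is run only if Rscript happens to be on your PATH (on Windows you may need the full path to Rscript instead).

```shell
# Create a practice folder (hypothetical name) and move into it.
mkdir -p cli-practice
cd cli-practice
ls                         # a freshly created folder lists nothing

cd ..                      # move up one directory
cd cli-practice            # and back into the practice folder

touch exampleFile.R        # create an empty R file
ls                         # now lists exampleFile.R

# Write the R command into the file non-interactively (stand-in for nano).
printf 'write.csv(rnorm(100), file = "data.csv")\n' > exampleFile.R

# Run the script only when Rscript is available on the PATH (an assumption).
if command -v Rscript >/dev/null 2>&1; then
  Rscript exampleFile.R    # should create data.csv next to the script
fi

rm exampleFile.R           # remove the script; rm does not ask for confirmation
cd ..                      # return to where we started
```

Working inside a dedicated practice folder keeps rm away from files you care about while you are still getting used to the commands.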
At this point we are ready to interact with Git through our computer. Read chapters 8 and 9 of happygitwithr - you may wish to install SourceTree (use an Atlassian account) at this time, but it is not required. Note: The editor I like to use is Notepad++. You may want to install that and configure it to work with Git (optional).
Read through chapter 10 and try to modify a repository. We'll learn more about pushing, pulling, and committing in a bit so if you don't understand what is going on exactly, don't worry. Just try to run the commands they do there and see if you get the appropriate changes to the files.
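If you would like to see the basic stage-and-commit cycle before involving GitHub at all, here is a purely local sketch. It is not the sequence from the book: the folder name git-practice, the file contents, and the name and email are placeholders, and nothing is pushed or pulled.

```shell
# A purely local sketch: no GitHub account or network access is needed.
mkdir -p git-practice
cd git-practice                    # hypothetical folder name
git init                           # turn the folder into a repository

# Identify yourself for this repository only (placeholder values);
# adding --global would apply the same settings to every repository.
git config user.name "Jane Doe"
git config user.email "jane@example.com"

echo "First line" > README.md
git add README.md                  # stage the new file
git commit -m "Add README"         # record the first version

echo "Second line" >> README.md
git status                         # reports README.md as modified
git diff                           # shows the line that was added
git add README.md
git commit -m "Extend README"      # record the second version

git log --oneline                  # lists both commits, newest first
cd ..
```

Each commit is a saved version of the file, which is what replaces keeping several differently named copies around.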
You'll probably want to do automatic authentication (optional for now). Chapter 11 gives an easy method for doing this but if you have the time I'd try to figure out the SSH method from chapter 12 as this method is often used in industry.
Lastly, we want to make it easy to connect to our repositories through RStudio. If we get this done, we won't need to download files and reupload them. This can all be done using commands in RStudio! Read chapter 13 to get things working. If you are having an issue, chapters 14 and 15 give some common issues and how to resolve them.
Once you've worked through this, you are ready to go!
There are two resources below to help you understand Git and version control. You can view whichever you prefer (or both) - the video is a bit long but uses RStudio so that is nice :)
- Webinar from RStudio to see the functionality with Git (corresponding web site is here):