This is a page used in ST 558 about getting R and GitHub to work together.

Understanding Git/GitHub

In order to document our process and findings as well as collaborate and share our work more effectively, we want to be using a version control system that easily allows multiple users to work on the same project.

Version Control Comic

Git is a version control system designed just for this purpose. GitHub is a hosting service that allows us to host Git-based projects on the internet. While we've seen how to make our workflow for a project reproducible, ideally we would also save different versions of our analysis, write-up, etc. along the way to document how the work evolved.  Rather than having to create different names for the files, Git allows us to simply track the changes that we commit to the files.  For understanding, installing, and using Git/GitHub with R we will be reading different parts of the book happygitwithr. This book is written by some folks at the University of British Columbia for students taking their course and workshops.  Please read chapter 1 of that book now and then come back to this page to get started installing Git and getting R to work with Git and GitHub!

GitHub

Everyone at NCSU has a free private GitHub account through github.ncsu.edu.  I've had issues connecting RStudio to the NCSU GitHub, so I'd go ahead and create a free account at github.com to make sure RStudio can connect to it.

Installing Git on your Computer

This material is distilled from chapter 7 of happygitwithr.  We need to install Git on our local machine.  Depending on your operating system things will differ. 

Windows

Install Git for Windows, also known as msysgit or “Git Bash”, to get Git in addition to some other useful tools, such as the Bash shell. Yes, all those names are totally confusing, but you might encounter them elsewhere and I want you to be well-informed.

We like this because Git for Windows leaves the Git executable in a conventional location, which will help you and other programs, e.g. RStudio, find it and use it. This also supports a transition to more expert use, because the “Git Bash” shell will be useful as you venture outside of R/RStudio.

  • NOTE: Select “Use Git from the command line and also from 3rd party software” during installation. Otherwise, I believe it’s OK to accept the defaults.
  • Note that RStudio for Windows prefers for Git to be installed below C:/Program Files, for example the Git executable on my Windows system is found at C:/Program Files/Git/bin/git.exe. Unless you have specific reasons to otherwise, follow this convention.

Mac

Install the Xcode command line tools (not all of Xcode), which includes Git. If your OS is older than 10.11 El Capitan, it is possible that you must install the Xcode command line tools in order for RStudio to find and use Git.  (If you need more instructions than what is below, check out this web site for installation commands.)

Go to the shell and enter one of these commands to elicit an offer to install developer command line tools:

git --version
git config

Accept the offer! Click on “Install”.

Here’s another way to request this installation, more directly:

xcode-select --install

We just happen to find this Git-based trigger apropos.

Note also that, after upgrading your Mac OS, you might need to re-do the above and/or re-agree to the Xcode license agreement. Skipping this has caused the RStudio Git pane to disappear on a system where it was previously working. Use commands like those above to tickle Xcode into prompting you for what it needs, then restart RStudio.

Linux

Install Git via your distro’s package manager.

Ubuntu or Debian Linux:

sudo apt-get install git

Fedora or RedHat Linux:

sudo yum install git
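
Whichever operating system you are on, a quick check from the shell can confirm that Git is installed and visible on your PATH. This is only a small sketch, not something taken from happygitwithr, and the function name check_git is made up for illustration.

```shell
#!/usr/bin/env bash
# check_git is a hypothetical helper name used only for this example.
check_git() {
  if command -v git >/dev/null 2>&1; then
    # Show where the executable lives and which version it is.
    echo "git found at: $(command -v git)"
    git --version
  else
    echo "git not found on PATH; revisit the install steps for your OS"
  fi
}

check_git
```

On a Mac without the command line tools, the git --version call is the same command that triggers the install offer described in the Mac section.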

Command Line Interpreter (CLI)

At this point we are ready to learn about using the "shell" or command line interpreter.  macOS is a Unix-based system (a close relative of Linux), so Macs come with a Unix-style CLI built in.  Windows does not ship with one.  However, if you are using Windows and have now installed Git, you have access to a CLI called Git Bash.  At this point you should read Appendix A of happygitwithr.  Pay special attention to section A.4 as these are common commands you might run in the CLI.  Really, if you want to get deep into the data science world you should get very comfortable using the CLI as it will improve your efficiency and allow you to do things you can't do just in RStudio.

Let's get just a little practice using the CLI.  Open the CLI (on Windows, type Git Bash into the search bar; on a Mac, go to /Applications/Utilities/ and launch Terminal).

  1. Type ls to see what folders are in the current directory
  2. Create a folder on your computer that you know the path to. Set your current directory to that path using cd "path/to/directory". Now run ls; because the folder is new, you shouldn't see any files or subfolders.
  3. Type cd .. to move up one directory. Use the cd command to move back into the directory.
  4. Create a new R file in the folder by typing touch exampleFile.R
  5. Now run ls and you should see the file in the directory
  6. You can actually edit the file through your shell. There are many great programs (such as emacs) you could use but by default I think you should have vim and nano. Let's use nano to open the file. Type nano exampleFile.R and you should enter into an editing mode.
  7. Add the following command to the file (putting your path in): write.csv(file = "path/to/your/newdirectory/data.csv", rnorm(100)) Then press Ctrl+O to write out the changes (nano uses the Control key on a Mac too, not Cmd) and press Enter to confirm the file name. Finally, press Ctrl+X to exit the editor.
  8. Now we can run the file from the CLI!  We just need to call R (or in this case Rscript) and tell it where the file is.  Since we are in the directory of the file we can use a relative path.  This code runs the file for me (your path to Rscript will depend on your R version and where it is installed).  
    "C:/Program Files/R/R-3.5.1/bin/Rscript" exampleFile.R
  9. You should now have a .csv file in the directory. You can remove files in the directory using rm. Be careful, you can erase a lot of things by accident! For example: rm exampleFile.R
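
The practice steps above can be collected into one script, which may help if you want to see them run end to end. This is a sketch under a few assumptions: it works in a temporary folder created with mktemp instead of a folder you made yourself, it writes the R command with printf as a stand-in for the nano step, and it saves data.csv using a relative path instead of a full one. The variable name practice_dir is made up for illustration.

```shell
#!/usr/bin/env bash
set -e

# Work in a throwaway directory so nothing important can be deleted.
practice_dir="$(mktemp -d)"
cd "$practice_dir"
ls                      # a brand-new folder, so nothing is listed

cd ..                   # move up one directory
cd "$practice_dir"      # and back in again

touch exampleFile.R     # create an empty R file
ls                      # exampleFile.R now appears

# Stand-in for the nano step: write the R command into the file.
printf 'write.csv(file = "data.csv", rnorm(100))\n' > exampleFile.R

# Run the script only if Rscript is on the PATH (often it is not on Windows).
if command -v Rscript >/dev/null 2>&1; then
  Rscript exampleFile.R
  ls                    # data.csv should now be listed too
else
  echo "Rscript not on PATH; call it by its full path instead"
fi

rm exampleFile.R        # remove the script; rm does not ask for confirmation
```

If Rscript is not found, the full-path form shown in step 8 is the fallback.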

Talking with Git/GitHub

At this point we are ready to interact with Git through our computer.  Read chapters 8 and 9 of happygitwithr - you may wish to install SourceTree (use Atlassian account) at this time, but it is not required.  Note: The editor I like to use is Notepad++.  You may want to install that and configure it to work (optional).

Read through chapter 10 and try to modify a repository.  We'll learn more about pushing, pulling, and committing in a bit so if you don't understand what is going on exactly, don't worry.  Just try to run the commands they do there and see if you get the appropriate changes to the files.  
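
If you would like to see committing in isolation before working through the book's exercise, here is a minimal sketch that stays entirely on your own machine, so it shows committing but not pushing or pulling. Everything in it is an assumption made for illustration: the temporary folder, the file name notes.txt, and the placeholder name and email. It is not the exercise from happygitwithr.

```shell
#!/usr/bin/env bash
set -e

# Scratch repository in a temporary folder; safe to delete afterwards.
repo_dir="$(mktemp -d)"
cd "$repo_dir"
git init --quiet

# Identify yourself for this repository only (placeholder values).
git config user.name "Jane Doe"
git config user.email "jane@example.com"

# First version of a file, staged and committed.
echo "first draft" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"

# Change the file and commit again: same file name, new version in history.
echo "second draft" >> notes.txt
git add notes.txt
git commit --quiet -m "Revise notes"

# One line per commit, newest first.
git log --oneline
```

The log should list two commits for the one file, which is the "track changes instead of renaming files" idea from the start of this page.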

You'll probably want to do automatic authentication (optional for now).  Chapter 11 gives an easy method for doing this but if you have the time I'd try to figure out the SSH method from chapter 12 as this method is often used in industry.
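
For a feel of what the SSH method involves, the sketch below generates a key pair with ssh-keygen. It is a hedged illustration, not the book's procedure: it writes the keys to a temporary folder with an empty passphrase so that nothing in your real ~/.ssh folder is touched, and the email and the file name id_ed25519_demo are placeholders. For real use you would follow the chapter, keep the key in the default location, and consider setting a passphrase.

```shell
#!/usr/bin/env bash
set -e

# Temporary folder so the demo never touches your real ~/.ssh directory.
key_dir="$(mktemp -d)"

# -t: key type, -C: comment (usually your email), -f: output file,
# -N "": empty passphrase (demo only), -q: quiet.
ssh-keygen -t ed25519 -C "jane@example.com" -f "$key_dir/id_ed25519_demo" -N "" -q

# The .pub file is the public half; this is the part you would give to GitHub.
cat "$key_dir/id_ed25519_demo.pub"
```

The file without the .pub extension is the private key and should never be shared.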

Using Git/GitHub through RStudio

Lastly, we want to make it easy to connect to our repositories through RStudio.  If we get this done, we won't need to download files and reupload them.  This can all be done using commands in RStudio! Read chapter 13 to get things working.  If you are having an issue, chapters 14 and 15 give some common issues and how to resolve them.

Once you've worked through this, you are ready to go!

Understanding Git/GitHub

There are two resources below to help you understand Git and version control.  You can view whichever you prefer (or both) - the video is a bit long but uses RStudio so that is nice :) 

  • Webinar from RStudio to see the functionality with Git (corresponding web site is here):