diff --git a/episodes/08-setup.md b/episodes/08-setup.md index 2f23b8852..620f0d17d 100644 --- a/episodes/08-setup.md +++ b/episodes/08-setup.md @@ -27,14 +27,14 @@ of configurations we will set as we get started with Git: On a command line, Git commands are written as `git verb options`, where `verb` is what we actually want to do and `options` is additional optional information which may be needed for the `verb`. So here is how -Dracula sets up his new laptop: +you can configure `git` on your own machine: ```bash -$ git config --global user.name "Vlad Dracula" -$ git config --global user.email "vlad@tran.sylvan.ia" +$ git config --global user.name "" +$ git config --global user.email "" ``` -Please use your own name and email address instead of Dracula's. This user name and email will be associated with your subsequent Git activity, +This user name and email will be associated with your subsequent Git activity, which means that any changes pushed to [GitHub](https://github.com/), [BitBucket](https://bitbucket.org/), @@ -84,7 +84,7 @@ $ git config --global core.autocrlf true :::::::::::::::::::::::::::::::::::::::::::::::::: -Dracula also has to set his favorite text editor, following this table: +You can also set your favorite text editor, following this table: | Editor | Configuration command | | :----------- | :------------------------------ | @@ -118,8 +118,8 @@ If you want to save your changes and quit, press Esc then type `:wq` :::::::::::::::::::::::::::::::::::::::::::::::::: Git (2.28+) allows configuration of the name of the branch created when you -initialize any new repository. Dracula decides to use that feature to set it to `main` so -it matches the cloud service he will eventually use. +initialize any new repository. +You should set this to `main` so it matches the cloud service you will eventually use. 
```bash $ git config --global init.defaultBranch main diff --git a/episodes/09-create.md b/episodes/09-create.md index 3cf285bc4..d7b0a6092 100644 --- a/episodes/09-create.md +++ b/episodes/09-create.md @@ -20,43 +20,30 @@ exercises: 0 Once Git is configured, we can start using it. -We will continue with the story of Wolfman and Dracula who are investigating if it -is possible to send a planetary lander to Mars. - -![](fig/motivatingexample.png){alt='The main elements of the story: Dracula, Wolfman, the Mummy, Mars, Pluto and The Moon'} -[Werewolf vs dracula](https://www.deviantart.com/b-maze/art/Werewolf-vs-Dracula-124893530) -by [b-maze](https://www.deviantart.com/b-maze) / [Deviant Art](https://www.deviantart.com/). -[Mars](https://en.wikipedia.org/wiki/File:OSIRIS_Mars_true_color.jpg) by European Space Agency / -[CC-BY-SA 3.0 IGO](https://creativecommons.org/licenses/by/3.0/deed.en). -[Pluto](https://commons.wikimedia.org/wiki/File:PIA19873-Pluto-NewHorizons-FlyingPastImage-20150714-transparent.png) / -Courtesy NASA/JPL-Caltech. -[Mummy](https://commons.wikimedia.org/wiki/File:Mummy_icon_-_Noun_Project_4070.svg) -© Gilad Fried / [The Noun Project](https://thenounproject.com/) / -[CC BY 3.0](https://creativecommons.org/licenses/by/3.0/deed.en). -[Moon](https://commons.wikimedia.org/wiki/File:Lune_ico.png) -© Luc Viatour / [https://lucnix.be](https://lucnix.be/) / -[CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/deed.en). +To demonstrate the use of `git`, we will build a +[**data dictionary**](https://en.wikipedia.org/wiki/Data_dictionary) for the data we obtained +from UKHSA. 
First, let's create a new directory in the `Desktop` folder for our work and then change the current working directory to the newly created one: ```bash $ cd ~/Desktop -$ mkdir planets -$ cd planets +$ mkdir data-dictionary +$ cd data-dictionary ``` -Then we tell Git to make `planets` a [repository](../learners/reference.md#repository) -\-- a place where Git can store versions of our files: +Then we tell Git to make `data-dictionary` a [repository](../learners/reference.md#repository) +-- a place where Git can store versions of our files: ```bash $ git init ``` It is important to note that `git init` will create a repository that -can include subdirectories and their files---there is no need to create -separate repositories nested within the `planets` repository, whether +can include subdirectories and their files -- there is no need to create +separate repositories nested within the `data-dictionary` repository, whether subdirectories are present from the beginning or added later. Also, note -that the creation of the `planets` directory and its initialization as a +that the creation of the `data-dictionary` directory and its initialization as a repository are completely separate processes. If we use `ls` to show the directory's contents, @@ -67,7 +54,7 @@ $ ls ``` But if we add the `-a` flag to show everything, -we can see that Git has created a hidden directory within `planets` called `.git`: +we can see that Git has created a hidden directory within `data-dictionary` called `.git`: ```bash $ ls -a @@ -85,7 +72,7 @@ we will lose the project's history. Next, we will change the default branch to be called `main`. This might be the default branch depending on your settings and version of git. -See the [setup episode](02-setup.md#default-git-branch-naming) for more information on this change. +See the [setup episode](08-setup.md#default-git-branch-naming) for more information on this change. 
```bash $ git checkout -b main @@ -116,33 +103,33 @@ wording of the output might be slightly different. ## Places to Create Git Repositories -Along with tracking information about planets (the project we have already created), -Dracula would also like to track information about moons. -Despite Wolfman's concerns, Dracula creates a `moons` project inside his `planets` +Along with tracking information about the data dictionary (the project we have already created), +we would also like to track information about related datasets. +Despite any concerns, we create a `related-data` project inside the `data-dictionary` project with the following sequence of commands: ```bash $ cd ~/Desktop # return to Desktop directory -$ cd planets # go into planets directory, which is already a Git repository -$ ls -a # ensure the .git subdirectory is still present in the planets directory -$ mkdir moons # make a subdirectory planets/moons -$ cd moons # go into moons subdirectory -$ git init # make the moons subdirectory a Git repository +$ cd data-dictionary # go into data-dictionary directory, which is already a Git repository +$ ls -a # ensure the .git subdirectory is still present in the data-dictionary directory +$ mkdir related-data # make a subdirectory data-dictionary/related-data +$ cd related-data # go into related-data subdirectory +$ git init # make the related-data subdirectory a Git repository $ ls -a # ensure the .git subdirectory is present indicating we have created a new Git repository ``` -Is the `git init` command, run inside the `moons` subdirectory, required for -tracking files stored in the `moons` subdirectory? +Is the `git init` command, run inside the `related-data` subdirectory, required for +tracking files stored in the `related-data` subdirectory? ::::::::::::::: solution ## Solution -No. 
Dracula does not need to make the `moons` subdirectory a Git repository -because the `planets` repository can track any files, sub-directories, and -subdirectory files under the `planets` directory. Thus, in order to track -all information about moons, Dracula only needed to add the `moons` subdirectory -to the `planets` directory. +No. We do not need to make the `related-data` subdirectory a Git repository +because the `data-dictionary` repository can track any files, sub-directories, and +subdirectory files under the `data-dictionary` directory. Thus, in order to track +all information about related data, we only needed to add the `related-data` subdirectory +to the `data-dictionary` directory. Additionally, Git repositories can interfere with each other if they are "nested": the outer repository will try to version-control @@ -164,9 +151,9 @@ fatal: Not a git repository (or any of the parent directories): .git ## Correcting `git init` Mistakes -Wolfman explains to Dracula how a nested repository is redundant and may cause confusion -down the road. Dracula would like to go back to a single git repository. How can Dracula undo -his last `git init` in the `moons` subdirectory? +Now that we know that a nested repository is redundant and may cause confusion +down the road, we would like to go back to a single git repository. How can we undo +our last `git init` in the `related-data` subdirectory? ::::::::::::::: solution @@ -189,11 +176,11 @@ becomes another change that we will need to track, as we will see in the next ep ### Solution Git keeps all of its files in the `.git` directory. 
-To recover from this little mistake, Dracula can remove the `.git` -folder in the moons subdirectory by running the following command from inside the `planets` directory: +To recover from this little mistake, we can remove the `.git` +folder in the `related-data` subdirectory by running the following command from inside the `data-dictionary` directory: ```bash -$ rm -rf moons/.git +$ rm -rf related-data/.git ``` But be careful! Running this command in the wrong directory will remove diff --git a/episodes/10-changes.md b/episodes/10-changes.md index 337dea69b..4f7627c50 100644 --- a/episodes/10-changes.md +++ b/episodes/10-changes.md @@ -21,28 +21,35 @@ exercises: 0 :::::::::::::::::::::::::::::::::::::::::::::::::: First let's make sure we're still in the right directory. -You should be in the `planets` directory. +You should be in the `data-dictionary` directory. ```bash -$ cd ~/Desktop/planets +$ cd ~/Desktop/data-dictionary ``` -Let's create a file called `mars.txt` that contains some notes -about the Red Planet's suitability as a base. -We'll use `nano` to edit the file; -you can use whatever editor you like. -In particular, this does not have to be the `core.editor` you set globally earlier. But remember, the bash command to create or edit a new file will depend on the editor you choose (it might not be `nano`). For a refresher on text editors, check out ["Which Editor?"](https://swcarpentry.github.io/shell-novice/03-create.html#which-editor) in [The Unix Shell](https://swcarpentry.github.io/shell-novice/) lesson. +Let's create a file called `amr-data-dictionary.txt` that will contain the description of the +variables in our AMR data. +We'll use RStudio to edit the file; but you can use whatever editor you like. +In particular, this does not have to be the `core.editor` you set globally earlier. 
+First create the file from the command line: ```bash -$ nano mars.txt +$ touch amr-data-dictionary.txt ``` -Type the text below into the `mars.txt` file: +Now open the file in RStudio (or your preferred editor). + +Type the text below into the `amr-data-dictionary.txt` file: ```output -Cold and dry, but everything is my favorite color +AMR data +100,000 rows of 12 variables + +These data represent the sort of data that might be obtained from the Second Generation Surveillance System (SGSS) ``` +and save the file. + Let's first verify that the file was properly created by running the list command (`ls`): ```bash @@ -50,17 +57,20 @@ $ ls ``` ```output -mars.txt +amr-data-dictionary.txt ``` -`mars.txt` contains a single line, which we can see by running: +`amr-data-dictionary.txt` should contain 4 lines, which we can see by running: ```bash -$ cat mars.txt +$ cat amr-data-dictionary.txt ``` ```output -Cold and dry, but everything is my favorite color +AMR data +100,000 rows of 12 variables + +These data represent the sort of data that might be obtained from the Second Generation Surveillance System (SGSS) ``` If we check the status of our project again, @@ -78,7 +88,7 @@ No commits yet Untracked files: (use "git add ..." to include in what will be committed) - mars.txt + amr-data-dictionary.txt nothing added to commit but untracked files present (use "git add" to track) ``` @@ -88,7 +98,7 @@ that Git isn't keeping track of. We can tell Git to track a file using `git add`: ```bash -$ git add mars.txt +$ git add amr-data-dictionary.txt ``` and then check that the right thing happened: @@ -105,23 +115,23 @@ No commits yet Changes to be committed: (use "git rm --cached ..." to unstage) - new file: mars.txt + new file: amr-data-dictionary.txt ``` -Git now knows that it's supposed to keep track of `mars.txt`, +Git now knows that it's supposed to keep track of `amr-data-dictionary.txt`, but it hasn't recorded these changes as a commit yet. 
To get it to do that, we need to run one more command: ```bash -$ git commit -m "Start notes on Mars as a base" +$ git commit -m "Start data dictionary for AMR data" ``` ```output [main (root-commit) f22b25e] Start notes on Mars as a base 1 file changed, 1 insertion(+) - create mode 100644 mars.txt + create mode 100644 amr-data-dictionary.txt ``` When we run `git commit`, @@ -161,10 +171,10 @@ $ git log ```output commit f22b25e3233b4645dabd0d81e651fe074bd8e73b -Author: Vlad Dracula +Author: Date: Thu Aug 22 09:51:46 2013 -0400 - Start notes on Mars as a base + Start data dictionary for AMR data ``` `git log` lists all commits made to a repository in reverse chronological order. @@ -180,7 +190,7 @@ and the log message Git was given when the commit was created. ## Where Are My Changes? -If we run `ls` at this point, we will still see just one file called `mars.txt`. +If we run `ls` at this point, we will still see just one file called `amr-data-dictionary.txt`. That's because Git saves information about files' history in the special `.git` directory mentioned earlier so that our filesystem doesn't become cluttered @@ -189,18 +199,21 @@ so that our filesystem doesn't become cluttered :::::::::::::::::::::::::::::::::::::::::::::::::: -Now suppose Dracula adds more information to the file. -(Again, we'll edit with `nano` and then `cat` the file to show its contents; +Now suppose we add more information to the file. +(Again, we'll edit with RStudio and then `cat` the file to show its contents; you may use a different editor, and don't need to `cat`.) 
```bash -$ nano mars.txt -$ cat mars.txt +$ cat amr-data-dictionary.txt ``` ```output -Cold and dry, but everything is my favorite color -The two moons may be a problem for Wolfman +AMR data +100,000 rows of 12 variables + +These data represent the sort of data that might be obtained from the Second Generation Surveillance System (SGSS) + +* id Integer - A unique identifier for each person ``` When we run `git status` now, @@ -216,7 +229,7 @@ Changes not staged for commit: (use "git add ..." to update what will be committed) (use "git checkout -- ..." to discard changes in working directory) - modified: mars.txt + modified: amr-data-dictionary.txt no changes added to commit (use "git add" and/or "git commit -a") ``` @@ -237,13 +250,16 @@ $ git diff ``` ```output -diff --git a/mars.txt b/mars.txt +diff --git a/amr-data-dictionary.txt b/amr-data-dictionary.txt index df0654a..315bf3a 100644 ---- a/mars.txt -+++ b/mars.txt -@@ -1 +1,2 @@ - Cold and dry, but everything is my favorite color -+The two moons may be a problem for Wolfman +--- a/amr-data-dictionary.txt ++++ b/amr-data-dictionary.txt +@@ -2,3 +2,5 @@ AMR data + 100,000 rows of 12 variables + + These data represent the sort of data that might be obtained from the Second Generation Surveillance System (SGSS) ++ ++* id Integer - A unique identifier for each person ``` The output is cryptic because @@ -265,7 +281,7 @@ If we break it down into pieces: After reviewing our change, it's time to commit it: ```bash -$ git commit -m "Add concerns about effects of Mars' moons on Wolfman" +$ git commit -m "Add entry for the id column" ``` ```output @@ -274,7 +290,7 @@ Changes not staged for commit: (use "git add ..." to update what will be committed) (use "git checkout -- ..." to discard changes in working directory) - modified: mars.txt + modified: amr-data-dictionary.txt no changes added to commit (use "git add" and/or "git commit -a") ``` @@ -284,13 +300,13 @@ Git won't commit because we didn't use `git add` first. 
Let's fix that: ```bash -$ git add mars.txt -$ git commit -m "Add concerns about effects of Mars' moons on Wolfman" +$ git add amr-data-dictionary.txt +$ git commit -m "Add entry for the id column" ``` ```output -[main 34961b1] Add concerns about effects of Mars' moons on Wolfman - 1 file changed, 1 insertion(+) +[main 7464434] Add entry for the id column + 1 file changed, 2 insertions(+) ``` Git insists that we add files to the set we want to commit @@ -340,17 +356,31 @@ Let's watch as our changes to a file move from our editor to the staging area and into long-term storage. First, -we'll add another line to the file: +we'll add a few more lines to the file and complete our dictionary: ```bash -$ nano mars.txt -$ cat mars.txt +$ cat amr-data-dictionary.txt ``` ```output -Cold and dry, but everything is my favorite color -The two moons may be a problem for Wolfman -But the Mummy will appreciate the lack of humidity +AMR data + +100,000 rows of 12 variables + +These data represent the sort of data that might be obtained from the Second Generation Surveillance System (SGSS) + + * id Integer - A unique identifier for each person + * dob Character - a string giving the date of birth + * spec_date Character - a string giving the date a specimen was taken + * sex_male Binary - indicates whether the person from whom the specimen was taken was male or not. 1 (male) 0 (not male) + * region Character - a string indicating the England region of laboratory testing the specimen + * had_surgery_past_yr Binary - indicates whether person from whom sample was taken had undergone surgery in hospital in the past year before specimen taken. 1 (surgery within last year) 0 (No surgery within last year) + * ethnicity Character - indicates self-reported ethnicity group according to Office for National Statistics groupings + * imd Integer - indicates the Index of Multiple Deprivation for residence for person from whom specimen was taken. 
Range: 1 (least deprived) - 5 (most deprived) + * organism Character - indicates the species name for the organism detected + * coamox Binary - indicates specimen was resistant to Coamoxiclav + * gentam Binary - indicates specimen was resistant to Gentamicin + * ciprof Binary - indicates specimen was resistant to Ciprofloxacin ``` ```bash @@ -358,24 +388,35 @@ $ git diff ``` ```output -diff --git a/mars.txt b/mars.txt -index 315bf3a..b36abfd 100644 ---- a/mars.txt -+++ b/mars.txt -@@ -1,2 +1,3 @@ - Cold and dry, but everything is my favorite color - The two moons may be a problem for Wolfman -+But the Mummy will appreciate the lack of humidity +diff --git a/amr-data-dictionary.txt b/amr-data-dictionary.txt +index f2c537e..c9a8214 100644 +--- a/amr-data-dictionary.txt ++++ b/amr-data-dictionary.txt +@@ -4,3 +4,14 @@ AMR data + These data represent the sort of data that might be obtained from the Second Generation Surveillance System (SGSS) + + * id Integer - A unique identifier for each person ++* dob Character - a string giving the date of birth ++* spec_date Character - a string giving the date a specimen was taken ++* sex_male Binary - indicates whether the person from whom the specimen was taken was male or not. 1 (male) 0 (not male) ++* region Character - a string indicating the England region of laboratory testing the specimen ++* had_surgery_past_yr Binary - indicates whether person from whom sample was taken had undergone surgery in hospital in the past year before specimen taken. 1 (surgery within last year) 0 (No surgery within last year) ++* ethnicity Character - indicates self-reported ethnicity group according to Office for National Statistics groupings ++* imd Integer - indicates the Index of Multiple Deprivation for residence for person from whom specimen was taken. 
Range: 1 (least deprived) - 5 (most deprived) ++* organism Character - indicates the species name for the organism detected ++* coamox Binary - indicates specimen was resistant to Coamoxiclav ++* gentam Binary - indicates specimen was resistant to Gentamicin ++* ciprof Binary - indicates specimen was resistant to Ciprofloxacin ``` So far, so good: -we've added one line to the end of the file +we've added the necessary lines to the end of the file (shown with a `+` in the first column). Now let's put that change in the staging area and see what `git diff` reports: ```bash -$ git add mars.txt +$ git add amr-data-dictionary.txt $ git diff ``` @@ -391,14 +432,25 @@ $ git diff --staged ``` ```output -diff --git a/mars.txt b/mars.txt -index 315bf3a..b36abfd 100644 ---- a/mars.txt -+++ b/mars.txt -@@ -1,2 +1,3 @@ - Cold and dry, but everything is my favorite color - The two moons may be a problem for Wolfman -+But the Mummy will appreciate the lack of humidity +diff --git a/amr-data-dictionary.txt b/amr-data-dictionary.txt +index f2c537e..c9a8214 100644 +--- a/amr-data-dictionary.txt ++++ b/amr-data-dictionary.txt +@@ -4,3 +4,14 @@ AMR data + These data represent the sort of data that might be obtained from the Second Generation Surveillance System (SGSS) + + * id Integer - A unique identifier for each person ++* dob Character - a string giving the date of birth ++* spec_date Character - a string giving the date a specimen was taken ++* sex_male Binary - indicates whether the person from whom the specimen was taken was male or not. 1 (male) 0 (not male) ++* region Character - a string indicating the England region of laboratory testing the specimen ++* had_surgery_past_yr Binary - indicates whether person from whom sample was taken had undergone surgery in hospital in the past year before specimen taken. 
1 (surgery within last year) 0 (No surgery within last year) ++* ethnicity Character - indicates self-reported ethnicity group according to Office for National Statistics groupings ++* imd Integer - indicates the Index of Multiple Deprivation for residence for person from whom specimen was taken. Range: 1 (least deprived) - 5 (most deprived) ++* organism Character - indicates the species name for the organism detected ++* coamox Binary - indicates specimen was resistant to Coamoxiclav ++* gentam Binary - indicates specimen was resistant to Gentamicin ++* ciprof Binary - indicates specimen was resistant to Ciprofloxacin ``` it shows us the difference between @@ -407,12 +459,12 @@ and what's in the staging area. Let's save our changes: ```bash -$ git commit -m "Discuss concerns about Mars' climate for Mummy" +$ git commit -m "Complete the data dictionary" ``` ```output -[main 005937f] Discuss concerns about Mars' climate for Mummy - 1 file changed, 1 insertion(+) +[main 1c642ba] Complete the data dictionary + 1 file changed, 11 insertions(+) ``` check our status: @@ -433,23 +485,23 @@ $ git log ``` ```output -commit 005937fbe2a98fb83f0ade869025dc2636b4dad5 (HEAD -> main) -Author: Vlad Dracula -Date: Thu Aug 22 10:14:07 2013 -0400 +commit 1c642ba5c32a722081ea9e3c80ef0634b4e071f3 (HEAD -> main) +Author: John Doe +Date: Wed Aug 14 15:01:56 2024 +0100 - Discuss concerns about Mars' climate for Mummy + Complete the data dictionary -commit 34961b159c27df3b475cfe4415d94a6d1fcd064d -Author: Vlad Dracula -Date: Thu Aug 22 10:07:21 2013 -0400 +commit 746443401e4f3e41a4bb67844dfb03e0241a1721 +Author: John Doe +Date: Wed Aug 14 14:57:35 2024 +0100 - Add concerns about effects of Mars' moons on Wolfman + Add entry for the id column -commit f22b25e3233b4645dabd0d81e651fe074bd8e73b -Author: Vlad Dracula -Date: Thu Aug 22 09:51:46 2013 -0400 +commit 0f988204ddcf33c060ecb849d640b3bd7aec71cc +Author: John Doe +Date: Wed Aug 14 14:54:11 2024 +0100 - Start notes on Mars as a base + 
Start data dictionary ``` ::::::::::::::::::::::::::::::::::::::::: callout @@ -497,11 +549,11 @@ $ git log -1 ``` ```output -commit 005937fbe2a98fb83f0ade869025dc2636b4dad5 (HEAD -> main) -Author: Vlad Dracula -Date: Thu Aug 22 10:14:07 2013 -0400 +commit 1c642ba5c32a722081ea9e3c80ef0634b4e071f3 (HEAD -> main) +Author: John Doe +Date: Wed Aug 14 15:01:56 2024 +0100 - Discuss concerns about Mars' climate for Mummy + Complete the data dictionary ``` You can also reduce the quantity of information using the @@ -512,9 +564,9 @@ $ git log --oneline ``` ```output -005937f (HEAD -> main) Discuss concerns about Mars' climate for Mummy -34961b1 Add concerns about effects of Mars' moons on Wolfman -f22b25e Start notes on Mars as a base +1c642ba (HEAD -> main) Complete the data dictionary +7464434 Add entry for the id column +0f98820 Start data dictionary ``` You can also combine the `--oneline` option with others. One useful @@ -528,9 +580,9 @@ $ git log --oneline --graph ``` ```output -* 005937f (HEAD -> main) Discuss concerns about Mars' climate for Mummy -* 34961b1 Add concerns about effects of Mars' moons on Wolfman -* f22b25e Start notes on Mars as a base +* 1c642ba (HEAD -> main) Complete the data dictionary +* 7464434 Add entry for the id column +* 0f98820 Start data dictionary ``` :::::::::::::::::::::::::::::::::::::::::::::::::: @@ -545,13 +597,13 @@ Two important facts you should know about directories in Git. Try it for yourself: ```bash - $ mkdir spaceships + $ mkdir notes $ git status - $ git add spaceships + $ git add notes $ git status ``` - Note, our newly created empty directory `spaceships` does not appear in + Note, our newly created empty directory `notes` does not appear in the list of untracked files even if we explicitly add it (*via* `git add`) to our repository. This is the reason why you will sometimes see `.gitkeep` files in otherwise empty directories. 
Unlike `.gitignore`, these files are not special @@ -568,16 +620,16 @@ Two important facts you should know about directories in Git. Try it for yourself: ```bash - $ touch spaceships/apollo-11 spaceships/sputnik-1 + $ touch notes/2024-08-14.txt notes/2024-08-13.txt $ git status - $ git add spaceships + $ git add notes $ git status ``` Before moving on, we will commit these changes. ```bash - $ git commit -m "Add some initial thoughts on spaceships" + $ git commit -m "Add meeting notes" ``` :::::::::::::::::::::::::::::::::::::::::::::::::: @@ -594,11 +646,11 @@ repository (`git commit`): ## Choosing a Commit Message Which of the following commit messages would be most appropriate for the -last commit made to `mars.txt`? +last commit made to `amr-data-dictionary.txt`? 1. "Changes" -2. "Added line 'But the Mummy will appreciate the lack of humidity' to mars.txt" -3. "Discuss effects of Mars' climate on the Mummy" +2. "Added lines 7-17 to amr-data-dictionary.txt" +3. "Add description for each data variable" ::::::::::::::: solution @@ -658,10 +710,10 @@ to my local Git repository? The staging area can hold changes from any number of files that you want to commit as a single snapshot. -1. Add some text to `mars.txt` noting your decision - to consider Venus as a base -2. Create a new file `venus.txt` with your initial thoughts - about Venus as a base for you and your friends +1. Add some text to `amr-data-dictionary.txt` to mention that more detailed information on each + variable can be found in a separate file with the same name as the variable. +2. Create a new file `ciprof.txt` with a short description of what Ciprofloxacin is + (feel free to make something up). 3. Add changes from both files to the staging area, and commit those changes. @@ -669,53 +721,52 @@ that you want to commit as a single snapshot. 
## Solution -The output below from `cat mars.txt` reflects only content added during +The output below from `cat amr-data-dictionary.txt` reflects only content added during this exercise. Your output may vary. -First we make our changes to the `mars.txt` and `venus.txt` files: +First we make our changes to the `amr-data-dictionary.txt` and `ciprof.txt` files: ```bash -$ nano mars.txt -$ cat mars.txt +$ cat amr-data-dictionary.txt ``` ```output -Maybe I should start with a base on Venus. +More information on each variable can be found in the corresponding file. ``` ```bash -$ nano venus.txt -$ cat venus.txt +$ cat ciprof.txt ``` ```output -Venus is a nice planet and I definitely should consider it as a base. +Ciprofloxacin is a fluoroquinolone antibiotic used to treat a number of bacterial infections. ``` Now you can add both files to the staging area. We can do that in one line: ```bash -$ git add mars.txt venus.txt +$ git add amr-data-dictionary.txt ciprof.txt ``` Or with multiple commands: ```bash -$ git add mars.txt -$ git add venus.txt +$ git add amr-data-dictionary.txt +$ git add ciprof.txt ``` Now the files are ready to commit. You can check that using `git status`. If you are ready to commit use: ```bash -$ git commit -m "Write plans to start a base on Venus" +$ git commit -m "Add more detailed information about ciprofloxacin" +``` ``` ```output [main cc127c2] - Write plans to start a base on Venus + Add more detailed information about ciprofloxacin 2 files changed, 2 insertions(+) - create mode 100644 venus.txt + create mode 100644 ciprof.txt ``` ::::::::::::::::::::::::: @@ -737,7 +788,7 @@ $ git commit -m "Write plans to start a base on Venus" ## Solution -If needed, move out of the `planets` folder: +If needed, move out of the `data-dictionary` folder: ```bash $ cd .. 
diff --git a/episodes/11-history.md b/episodes/11-history.md index b4633ef44..71e2b0307 100644 --- a/episodes/11-history.md +++ b/episodes/11-history.md @@ -25,38 +25,45 @@ As we saw in the previous episode, we can refer to commits by their identifiers. You can refer to the *most recent commit* of the working directory by using the identifier `HEAD`. -We've been adding one line at a time to `mars.txt`, so it's easy to track our +We've been adding one line at a time to `amr-data-dictionary.txt`, so it's easy to track our progress by looking, so let's do that using our `HEAD`s. Before we start, -let's make a change to `mars.txt`, adding yet another line. +let's make a change to `amr-data-dictionary.txt`, adding yet another line. ```bash -$ nano mars.txt -$ cat mars.txt +$ tail amr-data-dictionary.txt ``` +Showing only the last few lines: + ```output -Cold and dry, but everything is my favorite color -The two moons may be a problem for Wolfman -But the Mummy will appreciate the lack of humidity -An ill-considered change +* sex_male Binary - indicates whether the person from whom the specimen was taken was male or not. 1 (male) 0 (not male) +* region Character - a string indicating the England region of laboratory testing the specimen +* had_surgery_past_yr Binary - indicates whether person from whom sample was taken had undergone surgery in hospital in the past year before specimen taken. 1 (surgery within last year) 0 (No surgery within last year) +* ethnicity Character - indicates self-reported ethnicity group according to Office for National Statistics groupings +* imd Integer - indicates the Index of Multiple Deprivation for residence for person from whom specimen was taken. 
Range: 1 (least deprived) - 5 (most deprived) +* organism Character - indicates the species name for the organism detected +* coamox Binary - indicates specimen was resistant to Coamoxiclav +* gentam Binary - indicates specimen was resistant to Gentamicin +* ciprof Binary - indicates specimen was resistant to Ciprofloxacin +* name Character - a string giving the name of the person from whom the specimen was taken ``` Now, let's see what we get. ```bash -$ git diff HEAD mars.txt +$ git diff HEAD amr-data-dictionary.txt ``` ```output -diff --git a/mars.txt b/mars.txt -index b36abfd..0848c8d 100644 ---- a/mars.txt -+++ b/mars.txt -@@ -1,3 +1,4 @@ - Cold and dry, but everything is my favorite color - The two moons may be a problem for Wolfman - But the Mummy will appreciate the lack of humidity -+An ill-considered change. +diff --git a/amr-data-dictionary.txt b/amr-data-dictionary.txt +index c9a8214..d7d742c 100644 +--- a/amr-data-dictionary.txt ++++ b/amr-data-dictionary.txt +@@ -15,3 +15,4 @@ These data represent the sort of data that might be obtained from the Second Gen + * coamox Binary - indicates specimen was resistant to Coamoxiclav + * gentam Binary - indicates specimen was resistant to Gentamicin + * ciprof Binary - indicates specimen was resistant to Ciprofloxacin ++* name Character - a string giving the name of the person from whom the specimen was taken ``` which is the same as what you would get if you leave out `HEAD` (try it). The @@ -66,26 +73,39 @@ that by adding `~1` to refer to the commit one before `HEAD`. 
```bash -$ git diff HEAD~1 mars.txt +$ git diff HEAD~1 amr-data-dictionary.txt ``` If we want to see the differences between older commits we can use `git diff` again, but with the notation `HEAD~1`, `HEAD~2`, and so on, to refer to them: ```bash -$ git diff HEAD~2 mars.txt +$ git diff HEAD~2 amr-data-dictionary.txt ``` ```output -diff --git a/mars.txt b/mars.txt -index df0654a..b36abfd 100644 ---- a/mars.txt -+++ b/mars.txt -@@ -1 +1,4 @@ - Cold and dry, but everything is my favorite color -+The two moons may be a problem for Wolfman -+But the Mummy will appreciate the lack of humidity -+An ill-considered change +diff --git a/amr-data-dictionary.txt b/amr-data-dictionary.txt +index 55895cc..d7d742c 100644 +--- a/amr-data-dictionary.txt ++++ b/amr-data-dictionary.txt +@@ -2,3 +2,17 @@ AMR data + 100,000 rows of 12 variables + + These data represent the sort of data that might be obtained from the Second Generation Surveillance System (SGSS) ++ ++* id Integer - A unique identifier for each person ++* dob Character - a string giving the date of birth ++* spec_date Character - a string giving the date a specimen was taken ++* sex_male Binary - indicates whether the person from whom the specimen was taken was male or not. 1 (male) 0 (not male) ++* region Character - a string indicating the England region of laboratory testing the specimen ++* had_surgery_past_yr Binary - indicates whether person from whom sample was taken had undergone surgery in hospital in the past year before specimen taken. 1 (surgery within last year) 0 (No surgery within last year) ++* ethnicity Character - indicates self-reported ethnicity group according to Office for National Statistics groupings ++* imd Integer - indicates the Index of Multiple Deprivation for residence for person from whom specimen was taken. 
Range: 1 (least deprived) - 5 (most deprived) ++* organism Character - indicates the species name for the organism detected ++* coamox Binary - indicates specimen was resistant to Coamoxiclav ++* gentam Binary - indicates specimen was resistant to Gentamicin ++* ciprof Binary - indicates specimen was resistant to Ciprofloxacin ++* name Character - a string giving the name of the person from whom the specimen was taken ``` We could also use `git show` which shows us what changes we made at an older commit as @@ -93,23 +113,26 @@ well as the commit message, rather than the *differences* between a commit and o working directory that we see by using `git diff`. ```bash -$ git show HEAD~2 mars.txt +$ git show HEAD~2 amr-data-dictionary.txt ``` ```output -commit f22b25e3233b4645dabd0d81e651fe074bd8e73b -Author: Vlad Dracula -Date: Thu Aug 22 09:51:46 2013 -0400 +commit 0f988204ddcf33c060ecb849d640b3bd7aec71cc +Author: John Doe +Date: Wed Aug 14 14:54:11 2024 +0100 - Start notes on Mars as a base + Start data dictionary -diff --git a/mars.txt b/mars.txt +diff --git a/amr-data-dictionary.txt b/amr-data-dictionary.txt new file mode 100644 -index 0000000..df0654a +index 0000000..55895cc --- /dev/null -+++ b/mars.txt -@@ -0,0 +1 @@ -+Cold and dry, but everything is my favorite color ++++ b/amr-data-dictionary.txt +@@ -0,0 +1,4 @@ ++AMR data ++100,000 rows of 12 variables ++ ++These data represent the sort of data that might be obtained from the Second Generation Surveillance System (SGSS) ``` In this way, @@ -128,23 +151,36 @@ and "unique" really does mean unique: every change to any set of files on any computer has a unique 40-character identifier. 
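If you would like to look up these identifiers without reading through `git log`, `git rev-parse` will print them for you. The following is a minimal sketch rather than part of the lesson's project: it assumes a throwaway repository, and the file name `notes.txt` and the user identity are made up for illustration.

```bash
# Work in a scratch directory so nothing here touches your real project.
# notes.txt and the identity below are hypothetical, for illustration only.
scratch=$(mktemp -d)
cd "$scratch"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.org"

# Make two commits so that HEAD has a parent.
echo "first line" > notes.txt
git add notes.txt
git commit --quiet -m "Start notes"

echo "second line" >> notes.txt
git commit --quiet -am "Add second line"

full_id=$(git rev-parse HEAD)           # full identifier of the latest commit
short_id=$(git rev-parse --short HEAD)  # abbreviated form of the same commit
parent_id=$(git rev-parse HEAD~1)       # the commit one before HEAD

echo "HEAD is    $full_id"
echo "short form $short_id"
echo "HEAD~1 is  $parent_id"
```

The identifiers you see will differ from anyone else's, because they depend on the content, author and time of each commit.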
Our first commit was given the ID -`f22b25e3233b4645dabd0d81e651fe074bd8e73b`, +`0f988204ddcf33c060ecb849d640b3bd7aec71cc`, so let's try this: ```bash -$ git diff f22b25e3233b4645dabd0d81e651fe074bd8e73b mars.txt +$ git diff 0f988204ddcf33c060ecb849d640b3bd7aec71cc amr-data-dictionary.txt ``` ```output -diff --git a/mars.txt b/mars.txt -index df0654a..93a3e13 100644 ---- a/mars.txt -+++ b/mars.txt -@@ -1 +1,4 @@ - Cold and dry, but everything is my favorite color -+The two moons may be a problem for Wolfman -+But the Mummy will appreciate the lack of humidity -+An ill-considered change +diff --git a/amr-data-dictionary.txt b/amr-data-dictionary.txt +index 55895cc..d7d742c 100644 +--- a/amr-data-dictionary.txt ++++ b/amr-data-dictionary.txt +@@ -2,3 +2,17 @@ AMR data + 100,000 rows of 12 variables + + These data represent the sort of data that might be obtained from the Second Generation Surveillance System (SGSS) ++ ++* id Integer - A unique identifier for each person ++* dob Character - a string giving the date of birth ++* spec_date Character - a string giving the date a specimen was taken ++* sex_male Binary - indicates whether the person from whom the specimen was taken was male or not. 1 (male) 0 (not male) ++* region Character - a string indicating the England region of laboratory testing the specimen ++* had_surgery_past_yr Binary - indicates whether person from whom sample was taken had undergone surgery in hospital in the past year before specimen taken. 1 (surgery within last year) 0 (No surgery within last year) ++* ethnicity Character - indicates self-reported ethnicity group according to Office for National Statistics groupings ++* imd Integer - indicates the Index of Multiple Deprivation for residence for person from whom specimen was taken. 
Range: 1 (least deprived) - 5 (most deprived) ++* organism Character - indicates the species name for the organism detected ++* coamox Binary - indicates specimen was resistant to Coamoxiclav ++* gentam Binary - indicates specimen was resistant to Gentamicin ++* ciprof Binary - indicates specimen was resistant to Ciprofloxacin ++* name Character - a string giving the name of the person from whom the specimen was taken ``` That's the right answer, @@ -152,26 +188,14 @@ but typing out random 40-character strings is annoying, so Git lets us use just the first few characters (typically seven for normal size projects): ```bash -$ git diff f22b25e mars.txt -``` - -```output -diff --git a/mars.txt b/mars.txt -index df0654a..93a3e13 100644 ---- a/mars.txt -+++ b/mars.txt -@@ -1 +1,4 @@ - Cold and dry, but everything is my favorite color -+The two moons may be a problem for Wolfman -+But the Mummy will appreciate the lack of humidity -+An ill-considered change +$ git diff 0f98820 amr-data-dictionary.txt ``` All right! So we can save changes to files and see what we've changed. Now, how can we restore older versions of things? Let's suppose we change our mind about the last update to -`mars.txt` (the "ill-considered change"). +`amr-data-dictionary.txt` (we realise the `name` variable is not actually part of the data). `git status` now tells us that the file has been changed, but those changes haven't been staged: @@ -185,8 +209,7 @@ On branch main Changes not staged for commit: (use "git add ..." to update what will be committed) (use "git restore ..." 
to discard changes in working directory) - - modified: mars.txt + modified: amr-data-dictionary.txt no changes added to commit (use "git add" and/or "git commit -a") ``` @@ -195,14 +218,21 @@ We can put things back the way they were by using `git restore`: ```bash -$ git restore mars.txt -$ cat mars.txt +$ git restore amr-data-dictionary.txt +$ tail amr-data-dictionary.txt ``` ```output -Cold and dry, but everything is my favorite color -The two moons may be a problem for Wolfman -But the Mummy will appreciate the lack of humidity +* spec_date Character - a string giving the date a specimen was taken +* sex_male Binary - indicates whether the person from whom the specimen was taken was male or not. 1 (male) 0 (not male) +* region Character - a string indicating the England region of laboratory testing the specimen +* had_surgery_past_yr Binary - indicates whether person from whom sample was taken had undergone surgery in hospital in the past year before specimen taken. 1 (surgery within last year) 0 (No surgery within last year) +* ethnicity Character - indicates self-reported ethnicity group according to Office for National Statistics groupings +* imd Integer - indicates the Index of Multiple Deprivation for residence for person from whom specimen was taken. 
Range: 1 (least deprived) - 5 (most deprived) +* organism Character - indicates the species name for the organism detected +* coamox Binary - indicates specimen was resistant to Coamoxiclav +* gentam Binary - indicates specimen was resistant to Gentamicin +* ciprof Binary - indicates specimen was resistant to Ciprofloxacin ``` As you might guess from its name, @@ -214,15 +244,18 @@ If we want to go back even further, we can use a commit identifier instead, using `-s` option: ```bash -$ git restore -s f22b25e mars.txt +$ git restore -s 0f98820 amr-data-dictionary.txt ``` ```bash -$ cat mars.txt +$ cat amr-data-dictionary.txt ``` ```output -Cold and dry, but everything is my favorite color +AMR data +100,000 rows of 12 variables + +These data represent the sort of data that might be obtained from the Second Generation Surveillance System (SGSS) ``` ```bash @@ -234,7 +267,7 @@ On branch main Changes not staged for commit: (use "git add ..." to update what will be committed) (use "git restore ..." to discard changes in working directory) - modified: mars.txt + modified: amr-data-dictionary.txt no changes added to commit (use "git add" and/or "git commit -a") @@ -243,14 +276,28 @@ Notice that the changes are in the working directory and have not been staged.
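One way to check for yourself which area holds a change is to compare `git diff` (working directory against the staging area) with `git diff --staged` (staging area against the last commit). The sketch below assumes a throwaway repository; the file name `notes.txt` and the user identity are hypothetical and not part of the lesson's project.

```bash
# Work in a scratch directory so nothing here touches your real project.
# notes.txt and the identity below are hypothetical, for illustration only.
scratch=$(mktemp -d)
cd "$scratch"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.org"

echo "first line" > notes.txt
git add notes.txt
git commit --quiet -m "Start notes"
first_id=$(git rev-parse HEAD)

echo "second line" >> notes.txt
git commit --quiet -am "Add second line"

# Bring back the file as it was in the first commit.
git restore -s "$first_id" notes.txt

# With --quiet, git diff prints nothing and exits with 1 if there are differences.
if git diff --quiet; then unstaged=no; else unstaged=yes; fi
if git diff --staged --quiet; then staged=no; else staged=yes; fi

echo "unstaged changes: $unstaged"
echo "staged changes:   $staged"
```

Without the `--staged` option, `git restore -s` only rewrites the file in the working directory, so this reports unstaged changes and no staged ones, matching the `Changes not staged for commit` heading in the `git status` output.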
Again, we can put things back the way they were by using `git restore`: ```bash -$ git restore mars.txt -$ cat mars.txt +$ git restore amr-data-dictionary.txt +$ cat amr-data-dictionary.txt ``` ```output -Cold and dry, but everything is my favorite color -The two moons may be a problem for Wolfman -But the Mummy will appreciate the lack of humidity +AMR data +100,000 rows of 12 variables + +These data represent the sort of data that might be obtained from the Second Generation Surveillance System (SGSS) + +* id Integer - A unique identifier for each person +* dob Character - a string giving the date of birth +* spec_date Character - a string giving the date a specimen was taken +* sex_male Binary - indicates whether the person from whom the specimen was taken was male or not. 1 (male) 0 (not male) +* region Character - a string indicating the England region of laboratory testing the specimen +* had_surgery_past_yr Binary - indicates whether person from whom sample was taken had undergone surgery in hospital in the past year before specimen taken. 1 (surgery within last year) 0 (No surgery within last year) +* ethnicity Character - indicates self-reported ethnicity group according to Office for National Statistics groupings +* imd Integer - indicates the Index of Multiple Deprivation for residence for person from whom specimen was taken. Range: 1 (least deprived) - 5 (most deprived) +* organism Character - indicates the species name for the organism detected +* coamox Binary - indicates specimen was resistant to Coamoxiclav +* gentam Binary - indicates specimen was resistant to Gentamicin +* ciprof Binary - indicates specimen was resistant to Ciprofloxacin ``` @@ -430,10 +477,10 @@ Venus is beautiful and full of love. ## Checking Understanding of `git diff` -Consider this command: `git diff HEAD~9 mars.txt`. What do you predict this command +Consider this command: `git diff HEAD~9 amr-data-dictionary.txt`. What do you predict this command will do if you execute it? 
What happens when you do execute it? Why? -Try another command, `git diff [ID] mars.txt`, where [ID] is replaced with +Try another command, `git diff [ID] amr-data-dictionary.txt`, where [ID] is replaced with the unique identifier for your most recent commit. What do you think will happen, and what does happen? @@ -446,7 +493,7 @@ and what does happen? `git restore` can be used to restore a previous commit when unstaged changes have been made, but will it also work for changes that have been staged but not committed? -Make a change to `mars.txt`, add that change using `git add`, +Make a change to `amr-data-dictionary.txt`, add that change using `git add`, then use `git restore` to see if you can remove your change. ::::::::::::::: solution @@ -460,7 +507,7 @@ Let's look at the output of `git status`: On branch main Changes to be committed: (use "git restore --staged ..." to unstage) - modified: mars.txt + modified: amr-data-dictionary.txt ``` @@ -468,13 +515,13 @@ Note that if you don't have the same output you may either have forgotten to change the file, or you have added it *and* committed it. -Using the command `git restore mars.txt` now does not give an error, +Using the command `git restore amr-data-dictionary.txt` now does not give an error, but it does not restore the file either. Git helpfully tells us that we need to use `git restore --staged` first to unstage the file: ```bash -$ git restore --staged mars.txt +$ git restore --staged amr-data-dictionary.txt ``` @@ -490,7 +537,7 @@ Changes not staged for commit: (use "git add ..." to update what will be committed) (use "git git restore ..." 
to discard changes in working directory) - modified: mars.txt + modified: amr-data-dictionary.txt no changes added to commit (use "git add" and/or "git commit -a") ``` @@ -499,7 +546,7 @@ This means we can now use `git restore` to restore the file to the previous commit: ```bash -$ git restore mars.txt +$ git restore amr-data-dictionary.txt $ git status ``` @@ -519,16 +566,16 @@ nothing to commit, working tree clean Exploring history is an important part of Git, and often it is a challenge to find the right commit ID, especially if the commit is from several months ago. -Imagine the `planets` project has more than 50 files. -You would like to find a commit that modifies some specific text in `mars.txt`. +Imagine the `data-dictionary` project has more than 50 files. +You would like to find a commit that modifies some specific text in `amr-data-dictionary.txt`. When you type `git log`, a very long list appeared. How can you narrow down the search? Recall that the `git diff` command allows us to explore one specific file, -e.g., `git diff mars.txt`. We can apply a similar idea here. +e.g., `git diff amr-data-dictionary.txt`. We can apply a similar idea here. ```bash -$ git log mars.txt +$ git log amr-data-dictionary.txt ``` Unfortunately some of these commit messages are very ambiguous, e.g., `update files`. @@ -539,7 +586,7 @@ for you. Is it possible to combine both? Let's try the following: ```bash -$ git log --patch mars.txt +$ git log --patch amr-data-dictionary.txt ``` You should get a long list of output, and you should be able to see both commit messages and