-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathNotes
77 lines (59 loc) · 2.77 KB
/
Notes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
Functions -
1. seq() generates numbers from
2. typeof() for details of the variable
3. starting a variable with "." makes it hidden
4. NA is not available. is.na
* The generic function is.na indicates which elements are missing.
* The generic function is.na<- sets elements to NA.
Vector is a set of one type of data (numbers or letters)
for help - > ?ls e.g. ?function
> ??model to find a particular function??model
> vignette is for help in the whole package
> vignette(package = "tidyverse")
> vignette("manifesto", "tidyverse")
* www.cran.csiro.au CRAN Task Views to find package of interest
* search google for help with the package name.
* www.stackoverflow.com search box -> "[R] change point color on R ggplot"
A tidy table
1. Each variable in the data set is placed in its own column
2. Each observation is placed in its own row
3. Each value is placed in its own cell
A tidy table header from a 96 well plate date.
plate row column value
Packages function conflicts
In case of conflickts R would use the most recent lodaded package.
To force it for a particular function use the package name -> base::stat
select(data, arguments)
arguments could be
column names,
column numbers
ranges
-sign to remove
start_with
end_with
contains etc
Renaming coulumn
select(data, new_name = old_name) #it will give you only remaned column back
rename(data, new_name = old_name)
Fixing all column look package "janitor"
Filtering rows
filter(data, column_head "column_value_to_be_Filtered" mathematical_function column_head "column_value_to_be_Filtered")
filter(gapminder, continent %in% c("Africa", "Asia", "Europe"))
#add or remove new columns using mutate
#create a new column called gdp by multiplying gdpPercep by pop
mutate(gapminder, gdp = gdpPercap * pop)
#create a new column to calculate the polupations in millions
mutate(gapminder, popMillion = pop/10000)
mutate(gapminder, popMillion = log(pop))
#to pull string from value of a column to a new column (to pull variables from a sample name)
mutate(gapminder, country_abbr = str_sub(country, start = 1, end = 4))
#using "-" minus sign, it would start from the end of the string
mutate(gapminder, country_end_letter = str_sub(country, start = -1))
Group_by
summarise(gapminder, mean_pop = mean(pop), median_pop = median(pop))
summarise(gapminder_by_country, mean_LE = mean(lifeExp), median_LE = median(lifeExp))
#Summarise and Grouping
#Calculate meand and median for the population for the gapminder dataframe
summarise(gapminder, mean_pop = mean(pop), median_pop = median(pop))
pipe
filter(gapminder, country == "Australia") %>% select(., "country", "year" , "pop")