may you share link from where to get data please #2

Open
Sandy4321 opened this issue May 24, 2020 · 8 comments

Comments

@Sandy4321

As usual, great code and ideas, but the script reads the data from a local path:
d0_train <- fread(paste0("/var/data/airline/",yr-1,".csv"))
Could you please share a link to where the data can be downloaded?

@szilard
Owner

szilard commented May 24, 2020

here:

# for yr in 1990 1991; do
# wget http://stat-computing.org/dataexpo/2009/$yr.csv.bz2
# bunzip2 $yr.csv.bz2
# done
## TODO: loop 1991..2000
yr <- 1991
d0_train <- fread(paste0("/var/data/airline/",yr-1,".csv"))
d0_train <- d0_train[!is.na(DepDelay)]
d0_test <- fread(paste0("/var/data/airline/",yr,".csv"))
d0_test <- d0_test[!is.na(DepDelay)]
d0 <- rbind(d0_train, d0_test)

(see the commented out lines with wget for the URLs)

@Sandy4321
Author

Thanks for the quick answer.
Something goes wrong with this code.
Could you please clarify what can be done?

set.seed(123)
for yr in 1990 1991; do
Error: unexpected symbol in "for yr"
wget http://stat-computing.org/dataexpo/2009/$yr.csv.bz2
Error: unexpected symbol in " wget http"
bunzip2 $yr.csv.bz2
Error: object 'bunzip2' not found
wget http://stat-computing.org/dataexpo/2009/$1990.csv.bz2
Error: unexpected symbol in "wget http"
yr <- 1990
wget http://stat-computing.org/dataexpo/2009/$yr.csv.bz2
Error: unexpected symbol in "wget http"

install.packages("wget")
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:

https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘C:/Users/sndr/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
Warning in install.packages :
package ‘wget’ is not available (for R version 3.5.1)

install.packages("Rtools ")
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:

https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘C:/Users/sndr/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
Warning in install.packages :
package ‘Rtools ’ is not available (for R version 3.5.1)

install.packages("Rtools")
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:

https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘C:/Users/sndr/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
Warning in install.packages :
package ‘Rtools’ is not available (for R version 3.5.1)

@szilard
Owner

szilard commented May 25, 2020

Well, that's not R code, it's a bash (Unix shell) script, so it has to be run in a terminal, not in the R console.

you can also just download those files manually, e.g. http://stat-computing.org/dataexpo/2009/1991.csv.bz2
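For reference, the commented-out lines can be turned into a small standalone script. This is only a minimal sketch, assuming `wget` and `bunzip2` are installed and that the stat-computing.org URL still serves the files (it may not). The helper names `airline_url` and `fetch_year` and the `DATA_DIR` variable are made up for illustration; the directory is the one the R code reads from.

```shell
#!/usr/bin/env bash
# Sketch: run this in a terminal (bash), not in the R console.

BASE_URL="http://stat-computing.org/dataexpo/2009"   # URL from the thread; may no longer serve files
DATA_DIR="${DATA_DIR:-/var/data/airline}"            # directory the R code reads from

# Hypothetical helper: build the download URL for one year.
airline_url() {
  printf '%s/%s.csv.bz2\n' "$BASE_URL" "$1"
}

# Hypothetical helper: download and unpack one year,
# skipping the download if the CSV is already present.
fetch_year() {
  local yr="$1"
  if [ -f "$DATA_DIR/$yr.csv" ]; then
    return 0
  fi
  wget -O "$DATA_DIR/$yr.csv.bz2" "$(airline_url "$yr")" &&
    bunzip2 "$DATA_DIR/$yr.csv.bz2"
}

# Usage (uncomment to actually download):
# mkdir -p "$DATA_DIR"
# for yr in 1990 1991; do fetch_year "$yr"; done
```

On Windows this needs a bash environment (for example Git Bash or WSL); otherwise downloading the files manually in a browser and unpacking them works just as well.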

@Sandy4321
Author

Great, thanks for the quick answer,
but I get this:
image

and this

curl("http://stat-computing.org/dataexpo/2009/$yr.csv.bz2")
A connection with
description "http://stat-computing.org/dataexpo/2009/$yr.csv.bz2"
class "curl"
mode "r"
text "text"
opened "closed"
can read "yes"
can write "no"
yr
[1] 1990

and this
image

@szilard
Owner

szilard commented May 25, 2020

Yeah, I see. It seems the provider has deleted the data.

http://stat-computing.org/dataexpo/2009/the-data.html

You might be able to find a copy somewhere else, though.

@szilard
Owner

szilard commented May 25, 2020

E.g. here: https://github.com/h2oai/h2o-2/wiki/Hacking-Airline-DataSet-with-H2O

Airlines all years 1987-2008:
https://s3.amazonaws.com/h2o-airlines-unpacked/allyears.csv (12 GB)

though I'm not 100% sure it is exactly the same data (that is, the same rows and the same columns).
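One way to check this, and to cut per-year files out of the combined file, is sketched below. It is only a rough sketch under assumptions: the helper names are made up, the header is assumed to be the first line of each file, and `extract_year` assumes that `Year` is the first column (as in the per-year Data Expo files), which should be verified with `csv_columns` first.

```shell
# Hypothetical helper: print the column names of a CSV, one per line.
csv_columns() {
  head -n 1 "$1" | tr -d '\r' | tr ',' '\n'
}

# Hypothetical helper: succeed only if two CSV files have identical headers.
same_columns() {
  [ "$(csv_columns "$1")" = "$(csv_columns "$2")" ]
}

# Hypothetical helper: keep the header plus the rows of one year.
# Assumes Year is the first column; check with csv_columns before relying on it.
extract_year() {
  awk -F',' -v yr="$2" 'NR == 1 || $1 == yr' "$1"
}

# Usage (paths are examples):
# csv_columns allyears.csv
# extract_year allyears.csv 1990 > /var/data/airline/1990.csv
# extract_year allyears.csv 1991 > /var/data/airline/1991.csv
```

If an original per-year file is still available somewhere, `same_columns` and a row count with `wc -l` give a quick sanity check that the extracted file matches it.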

@Sandy4321
Author

Szilard,
many thanks for the help,
very kind of you.
I will try to download this data.
