Download-only option #32

Open
j-woz opened this issue Oct 31, 2018 · 7 comments
Comments

@j-woz
Contributor

j-woz commented Oct 31, 2018

Allow user to invoke Benchmark in download-only mode, which will simply download the input data if it does not exist. This is necessary on supercomputers. This mode should not import keras or any other modules not required for data download.

@jmohdyusof
Contributor

jmohdyusof commented Oct 31, 2018

See, for example, p3b1. The line
fpath = fetch_data(gParameters)
is basically what you want to run separately from the 'run' command. We could also modify fetch_file to accept a different base Data directory location, which would address your other issue.

def fetch_data(gParameters):
    """ Downloads and decompresses the data if not locally available.
        Since the training data depends on the model definition it is not loaded,
        instead the local path where the raw data resides is returned
    """
    
    path = gParameters['data_url']
    fpath = candle.fetch_file(path + gParameters['train_data'], 'Pilot3', untar=True)
    
    return fpath
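
A minimal sketch of how a configurable base directory might be resolved before any download happens. The names resolve_data_path and DEFAULT_DATA_DIR are hypothetical and are not part of candle; the sketch does not call candle.fetch_file and only computes where a file would be stored.

```python
import os

# Hypothetical default location, used only for illustration.
DEFAULT_DATA_DIR = os.path.join("Benchmarks", "Data")


def resolve_data_path(fname, subdir, data_dir=None):
    """Return the local path where `fname` would be cached.

    `data_dir` overrides the default base directory when given.
    No network access is performed here.
    """
    base = data_dir if data_dir else DEFAULT_DATA_DIR
    # Keep only the file name, in case `fname` is a full URL.
    return os.path.join(base, subdir, os.path.basename(fname))


def needs_download(path):
    """True when the target file is not yet present locally."""
    return not os.path.exists(path)
```

A download-only mode could call something like needs_download on the resolved path and fetch only when it returns True.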

@j-woz
Contributor Author

j-woz commented Oct 31, 2018

That sounds good.

@jmohdyusof
Contributor

So probably a command like this should work for both tickets:

python benchmark --dl_only --basedir='/scratch/candle/'

@j-woz
Contributor Author

j-woz commented Oct 31, 2018

They will read that as Deep Learning only :)
How about --data-dir? Will that be a standard flag for all Benchmark invocations? The default would be the current behavior (data directory == Benchmarks/Data).

@jmohdyusof
Contributor

Whatever we choose for keywords we can make part of the standard parser, so just decide on ones that don't conflict with other standard (keras/neon/etc) keywords.

--data_dir is fine (we currently use underscore, not dash, to separate words)

Is --get_data_only clear enough without being too long?

@j-woz
Contributor Author

j-woz commented Nov 1, 2018

Yes, those are fine.
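
For illustration, the two flags agreed on above could be registered with Python's standard argparse roughly as follows. This is only a sketch: build_parser is a hypothetical helper, and how the flags would actually be wired into the project's standard parser is not shown in this thread.

```python
import argparse


def build_parser():
    """Build a parser exposing only the two flags discussed in this thread."""
    parser = argparse.ArgumentParser(description="Illustrative sketch only")
    # None here stands for "keep the current default data directory".
    parser.add_argument("--data_dir", default=None,
                        help="base directory for downloaded input data")
    # A boolean switch: present means download the data and exit.
    parser.add_argument("--get_data_only", action="store_true",
                        help="download input data if missing, then stop")
    return parser


# Example invocation with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--get_data_only", "--data_dir", "/scratch/candle/"])
```

Underscores in the flag names follow the convention mentioned above, and argparse exposes them as args.data_dir and args.get_data_only.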

@jmohdyusof
Contributor

How strict is the 'don't import Keras' restriction? We need to read the default_model file to get the data locations, and to import the command line parser, so this implies some sort of split between the initialize_parameters stage, the data load, and the actual run. I think it makes sense to make initialize_parameters a standalone function as well.
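
One way the three-stage split could look, as a rough sketch under stated assumptions: the function bodies below are placeholders rather than the benchmark's real code, the "Pilot3" subdirectory is borrowed from the fetch_data excerpt earlier in the thread, and the keras import is deferred into run so that the download-only path never touches it.

```python
import argparse
import os


def initialize_parameters(argv=None):
    """Stage 1: parse options only. Placeholder for reading default_model."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_dir", default=os.path.join("Benchmarks", "Data"))
    parser.add_argument("--get_data_only", action="store_true")
    return vars(parser.parse_args(argv))


def fetch_data(params):
    """Stage 2: placeholder for the download; returns the local data path."""
    # A real version would download and unpack the data here if missing.
    return os.path.join(params["data_dir"], "Pilot3")


def run(params, fpath):
    """Stage 3: the heavy import happens only when training is requested."""
    import keras  # deferred so download-only mode does not need it
    # Model construction and training would go here.
    return None


def main(argv=None):
    params = initialize_parameters(argv)
    fpath = fetch_data(params)
    if params["get_data_only"]:
        return fpath  # stop before anything imports keras
    return run(params, fpath)
```

With this layout, main(["--get_data_only"]) returns after the fetch stage, so the strictness of the restriction reduces to keeping the first two stages free of keras-dependent imports.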
