Download-only option #32

j-woz · 2018-10-31T18:02:43Z

Allow the user to invoke a Benchmark in download-only mode, which simply downloads the input data if it is not already present. This is necessary on supercomputers. This mode should not import keras or any other module not required for the data download.

jmohdyusof · 2018-10-31T20:27:32Z

See, for example, p3b1. The line
fpath = fetch_data(gParameters)
is basically what you want to run separately from the 'run' command. To address your other issue, we could also modify fetch_file to accept a different base Data directory location.

def fetch_data(gParameters):
    """Download and decompress the data if it is not available locally.

    Because the training data depends on the model definition, it is not
    loaded here; the local path where the raw data resides is returned instead.
    """
    path = gParameters['data_url']
    fpath = candle.fetch_file(path + gParameters['train_data'], 'Pilot3', untar=True)

    return fpath
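
For illustration only, here is a minimal sketch of what a fetch helper with a configurable base directory might look like. It is not the actual candle.fetch_file; the name fetch_file_sketch, the data_dir parameter, and the default of a Data directory under the current working directory are all assumptions. It imports only the standard library, so calling it does not pull in keras.

```python
import os
import tarfile
import urllib.request


def fetch_file_sketch(url, subdir, data_dir=None, untar=False):
    """Download url into <data_dir>/<subdir> unless the file is already there.

    Hypothetical stand-in for candle.fetch_file, written for illustration.
    """
    # Assumed default: a Data directory under the current working directory.
    base = data_dir if data_dir is not None else os.path.join(os.getcwd(), 'Data')
    target_dir = os.path.join(base, subdir)
    os.makedirs(target_dir, exist_ok=True)

    fpath = os.path.join(target_dir, os.path.basename(url))

    # Skip the download when the file is already present locally.
    if not os.path.exists(fpath):
        urllib.request.urlretrieve(url, fpath)

    # Optionally unpack the archive next to the downloaded file.
    if untar and tarfile.is_tarfile(fpath):
        with tarfile.open(fpath) as archive:
            archive.extractall(target_dir)

    return fpath
```

With something like this, fetch_data could pass an optional base directory through from gParameters and keep the current behavior when none is given.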

j-woz · 2018-10-31T20:31:25Z

That sounds good.

jmohdyusof · 2018-10-31T20:35:41Z

So probably a command like this should work for both tickets:

python benchmark --dl_only --basedir='/scratch/candle/'

j-woz · 2018-10-31T21:14:10Z

They will read that as Deep Learning only :).
How about --data-dir? Will that be a standard flag for all Benchmark invocations? The default will be the current behavior (data directory == Benchmarks/Data).

jmohdyusof · 2018-10-31T22:53:07Z

Whatever we choose for keywords we can make part of the standard parser, so just decide on ones that don't conflict with other standard (keras/neon/etc) keywords.

--data_dir is fine (we currently use underscore, not dash, to separate words).

Is --get_data_only clear enough without being too long?
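
As a rough sketch of how these two keywords could be declared with argparse (the actual standard parser may be structured differently; build_parser, the help strings, and the None default are assumptions, with None standing for the current Benchmarks/Data behavior):

```python
import argparse


def build_parser():
    """Hypothetical parser holding only the two keywords discussed here."""
    parser = argparse.ArgumentParser(description='Benchmark options (sketch)')
    # None is assumed to mean "use the current default data directory".
    parser.add_argument('--data_dir', default=None,
                        help='base directory for downloaded input data')
    # store_true makes this a simple on/off switch that defaults to False.
    parser.add_argument('--get_data_only', action='store_true',
                        help='download the input data, then exit without running')
    return parser


if __name__ == '__main__':
    args = build_parser().parse_args()
    print(args.data_dir, args.get_data_only)
```

An invocation would then look like: python benchmark --get_data_only --data_dir='/scratch/candle/'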

j-woz · 2018-11-01T16:00:58Z

Yes, those are fine.

jmohdyusof · 2018-11-01T17:49:26Z

How strict is the 'don't import Keras' restriction? We need to read the default_model file to get the data locations and to import the command-line parser, which implies some sort of split between the initialize_parameters stage, the data load, and the actual run. I think it also makes sense to make initialize_parameters a standalone function.
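
One possible shape for that split, sketched under assumptions: the stage names follow this thread, but the bodies are placeholders, 'example_train.tar.gz' is a made-up file name standing in for a value from the default_model file, and the real download is elided. The point is only that the framework import lives inside run, so the download-only path never reaches it.

```python
import argparse
import os


def initialize_parameters(argv=None):
    """Stage 1: build the parameter dict using only the standard library."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', default='Data')
    parser.add_argument('--get_data_only', action='store_true')
    params = vars(parser.parse_args(argv))
    # Hypothetical stand-in for a value read from the default_model file.
    params.setdefault('train_data', 'example_train.tar.gz')
    return params


def fetch_data(params):
    """Stage 2: resolve the local data path (the actual download is elided)."""
    return os.path.join(params['data_dir'], 'Pilot3', params['train_data'])


def run(params, fpath):
    """Stage 3: the only stage that imports the deep learning framework."""
    import keras  # deferred so that the earlier stages stay framework-free
    # Model construction and training would go here.
    return keras.__version__


def main(argv=None):
    params = initialize_parameters(argv)
    fpath = fetch_data(params)
    if params['get_data_only']:
        # Download-only mode: stop before anything imports keras.
        return fpath
    return run(params, fpath)
```

Calling main(['--get_data_only']) returns the data path without ever executing the import inside run.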
