
gihanpanapitiya patch 1 #95

Open
wants to merge 655 commits into base: master

655 commits
4750804
Simplify PID handling
j-woz Dec 23, 2022
d0cf3c9
Update Swift/T for Polaris
j-woz Dec 23, 2022
1242847
Propagate CANDLE/IMPROVE settings
j-woz Dec 23, 2022
ecad83b
Spelling
j-woz Dec 23, 2022
d6482d0
Fix header
j-woz Dec 23, 2022
00c3add
Better PARAM_SET_FILE handling
j-woz Dec 23, 2022
8310f24
Do not need PYTHONPATH setting here for Polaris
j-woz Dec 23, 2022
3c4f7e8
Initial test for GraphDRP
j-woz Dec 23, 2022
c57ca41
Clean up and support IMPROVE
j-woz Dec 23, 2022
aecff19
New set-pythonpath to set the PYTHONPATH
j-woz Dec 23, 2022
da919ea
Fix header
j-woz Dec 23, 2022
b1e89dd
Fix PPN for single-node runs
j-woz Dec 23, 2022
62d38e0
Clean up
j-woz Dec 23, 2022
602d3a6
Add node/rank log model.log
j-woz Dec 23, 2022
1530cfd
Add better mlrMBO output
j-woz Dec 23, 2022
b82668f
Fix comments
j-woz Dec 23, 2022
1330abc
Make GDRP a little bigger
Dec 31, 2022
be3b526
Move APP_PYTHONPATH logic
j-woz Jan 9, 2023
2dbd5cb
o working version on lambda0
rajeeja Jan 11, 2023
9863d05
Configure EQ-Py on Lambda
j-woz Jan 11, 2023
554a328
Add Lambda to list of systems with Swift/T PYTHONPATH
j-woz Jan 11, 2023
97b67b4
o Changes for GA
rajeeja Jan 11, 2023
627483f
Drop known-benchmarks - see set-pythonpath.sh
j-woz Jan 11, 2023
f904a5d
o Run one job at a time and don't use Benchmarks/common for candle lib
rajeeja Jan 11, 2023
c790d40
Merge branch 'develop' of https://github.com/ECP-CANDLE/Supervisor in…
rajeeja Jan 11, 2023
85b126b
Improve log_path() for unset variables
j-woz Jan 11, 2023
105a04e
Initial CANDLE-compliant model
j-woz Jan 12, 2023
670bc00
First model call works
j-woz Jan 12, 2023
84ad741
o Moving towards containers (Singularity) for GA
rajeeja Jan 13, 2023
3df1628
o Add GraphDRP Singularity with GA workflow
rajeeja Jan 15, 2023
e9baca4
o GA problem for Polaris GraphDRP
Jan 17, 2023
2d1ce31
Provide EQ/Py; export PYTHONPATH
j-woz Jan 18, 2023
889ccb2
Update header
j-woz Jan 18, 2023
7557d31
Shorten job name to fit for PBS
j-woz Jan 18, 2023
861fa8d
Use single-node job for debugging
j-woz Jan 18, 2023
17734e8
o Increase time and make epochs constant for test runs on polaris
Jan 18, 2023
b9b5471
o Add One D test model
rajeeja Jan 19, 2023
3beb13c
o Bigger GraphDRP HPO setup
Jan 19, 2023
f7a9088
o Few fixes, need more epoch stuff with history object
rajeeja Jan 19, 2023
16a0f19
o Add R file and fix one baseline run, mlrMBO still has some Error in…
rajeeja Jan 20, 2023
440ecce
o Working version of oneD, needed to remove impute.y.fun and transfor…
rajeeja Jan 24, 2023
d6a191f
o OneDim problem now works with GA also.
rajeeja Jan 24, 2023
554cbf5
Report critical paths
j-woz Jan 24, 2023
4a883ff
o Add python settings
rajeeja Jan 24, 2023
8d33489
o bring back impute, still avoid transformation(MBO)
rajeeja Jan 25, 2023
4ee1a59
Merge - start on extracting learning rate (lr)
j-woz Jan 25, 2023
f291302
Merge in branch jpg_crusher
j-woz Feb 7, 2023
931059d
Handle errors in workflow.sh parameters
j-woz Feb 8, 2023
a031e6f
Clean up comments
j-woz Feb 8, 2023
67a70da
Add comment about additional env variables
j-woz Feb 8, 2023
7193674
Don't run UPF in mode for SINGULARITY by default
j-woz Feb 8, 2023
574acad
Attempt to fix error capture
j-woz Feb 8, 2023
1dcb5cd
Add more comments, etc.
j-woz Feb 13, 2023
1c565b0
Better log parsing
j-woz Feb 13, 2023
6d3be63
Merge branch 'develop' of github.com:ECP-CANDLE/Supervisor into develop
j-woz Feb 13, 2023
5e9e6bc
WIP Comparator
j-woz Feb 14, 2023
9fc53e0
Basic Crusher tests
j-woz Feb 14, 2023
5598477
Settings for Crusher
j-woz Feb 14, 2023
d958ae4
New utility scripts
j-woz Feb 14, 2023
5f4da72
Add usage note
j-woz Feb 15, 2023
61105c7
Check for python
j-woz Feb 15, 2023
a25a41e
Fix output file location
j-woz Feb 15, 2023
dcccde7
Support node selection in print-node-info
j-woz Feb 16, 2023
3317309
Document node selection
j-woz Feb 16, 2023
b0c4345
Merge branch 'develop' of github.com:ECP-CANDLE/Supervisor into develop
j-woz Feb 16, 2023
8164bff
Handle user error
j-woz Feb 16, 2023
2b0f4dd
Fix comment
j-woz Feb 17, 2023
d592047
Quick restart example
j-woz Feb 17, 2023
ba7f98b
WS
j-woz Feb 17, 2023
d50f77f
Zero-pad node IDs
j-woz Feb 17, 2023
097cc92
Remove debug output
j-woz Feb 17, 2023
756efaa
Merge branch 'develop' of github.com:ECP-CANDLE/Supervisor into develop
j-woz Feb 17, 2023
f91c08f
Fix parent weights location
j-woz Feb 20, 2023
d81a7c5
Drop obj_prio()
j-woz Feb 21, 2023
f0ffa6b
Log the result from model.sh
j-woz Feb 21, 2023
8f265c8
Fix call signature between obj_app and obj_py
j-woz Feb 21, 2023
0cce568
Fix return if NVM is not enabled
j-woz Feb 21, 2023
e5b2133
Fix out-*.txt if not on Summit
j-woz Feb 21, 2023
54d31e5
Better error handling
j-woz Feb 21, 2023
3320349
Simple test for Lambda
j-woz Feb 21, 2023
8fc7ca6
Draft settings for Frontier
j-woz Feb 22, 2023
5b4b269
Drop reference to EQR here
j-woz Feb 22, 2023
2d380a8
Fix bad merge
j-woz Feb 22, 2023
7f1aa8f
Add PY for Frontier
j-woz Feb 22, 2023
793e0cc
Fix obj_py() for iter_indiv_id
j-woz Feb 22, 2023
9cbaf4d
Fix bad merge
j-woz Feb 22, 2023
a1a59b1
Turn off UNBUFFERED
j-woz Feb 22, 2023
a03387d
Fixes for Frontier
j-woz Feb 23, 2023
93648e5
Fixes for Frontier
j-woz Feb 23, 2023
6662ba3
Default to 1 hour walltime
j-woz Feb 23, 2023
e751f88
Set GPUs
j-woz Feb 25, 2023
8e2481d
Add Uno to PYTHONPATH
j-woz Feb 25, 2023
1212fc2
PYTHONPATH setting for Frontier
j-woz Feb 25, 2023
d455ecd
Add warning message
j-woz Feb 25, 2023
b057e4c
Use smaller job name
j-woz Feb 25, 2023
a63c999
Set default PROCS, PPN to 8 on Frontier
j-woz Feb 25, 2023
e717b6f
Record new PLAN_JSON
j-woz Feb 25, 2023
c02933a
Set default walltime to 1h
j-woz Feb 25, 2023
d88e64e
Check for directory
j-woz Feb 25, 2023
1f316ac
Fail fast if CANDLE_DATA_DIR is not set
j-woz Feb 25, 2023
8886f6c
Handle normal timeouts on Frontier
j-woz Feb 27, 2023
6e363a0
Be more verbose
j-woz Feb 27, 2023
de17c76
Remove generated training data by default
j-woz Feb 27, 2023
38bc5e9
Clean up
j-woz Feb 27, 2023
5507a76
Support big plan N=16
j-woz Feb 27, 2023
42fce3d
Support multi-digit node components
j-woz Feb 27, 2023
642d944
sbcast test works
j-woz Mar 1, 2023
180d804
Adding hook-1.tcl mpi-io.sh
j-woz Mar 1, 2023
e8a2868
Adding README.adoc
j-woz Mar 1, 2023
2e0c410
Respect user TURBINE_PRELAUNCH
j-woz Mar 1, 2023
a811803
Say "guide"
j-woz Mar 1, 2023
a880664
o Few steps to get the comparator workflow working
rajeeja Mar 4, 2023
9cea247
Adding cmp-cv/swift/workflow.sh
j-woz Mar 8, 2023
bf00a43
Adding cmp-cv/swift/workflow.swift
j-woz Mar 8, 2023
83e628c
Initial structure
j-woz Mar 8, 2023
9454737
o Use CANDLE_OUTPUT_DIR as this is a required variable for sending o/…
rajeeja Mar 15, 2023
3b1c4c7
o Add export of CANDLE_OUTPUT_DIR to workflow.sh for non SINGULARITY …
rajeeja Mar 15, 2023
bf222c0
o Use model.sh to populate CANDLE_OUTPUT_DIR
rajeeja Mar 15, 2023
f0e4eae
WIP cmp-cv workflow
j-woz Mar 16, 2023
853b065
workflow to find domain errors
Mar 22, 2023
33d0ca2
readme updated
Mar 22, 2023
cd127f3
readme updated
Mar 22, 2023
458581f
readme updated
Mar 22, 2023
50f55ef
readme updated
Mar 22, 2023
c048739
instructions to run the examplle
Mar 22, 2023
bceb974
Update README.adoc
gihanpanapitiya Mar 22, 2023
f7d99f0
Update README.adoc
gihanpanapitiya Mar 22, 2023
3d7e503
Update README.adoc
gihanpanapitiya Mar 22, 2023
fad4c98
Update model.sh
gihanpanapitiya Mar 22, 2023
72ea275
model.sh, obj_app.swift
Mar 23, 2023
dfb18a9
obj_container
Mar 23, 2023
8e89a8c
about compare.py
Mar 24, 2023
a3ba809
Merge pull request #94 from gihanpanapitiya/gihan_cmp
rajeeja Mar 24, 2023
3a8779e
o Fix comments formatting etc.
rajeeja Mar 24, 2023
c7786d8
o set SWIFT_IMPL, more fixes needed for CDD and singularity runs
rajeeja Mar 24, 2023
d6005e9
o Try to run singularity workflows
rajeeja Mar 24, 2023
987cece
o get candle model type from swift file
rajeeja Mar 25, 2023
2fded36
o change the default setting logic for CMT
rajeeja Mar 25, 2023
5205a2d
o set model name to cmp if none specified for comparison workflows
rajeeja Mar 25, 2023
341367e
o Fix flags for cmp workflow and singularity
rajeeja Mar 25, 2023
e8a4e6d
o Changes for Mac OSX M1 and also fixes for GA test-1 that would fix …
rajeeja Apr 3, 2023
7e1f810
o Remove hard coded name
rajeeja Apr 3, 2023
691e659
o Get cmp-cv to run
Apr 4, 2023
43b1ede
Merge branch 'develop' of https://github.com/ECP-CANDLE/Supervisor in…
Apr 4, 2023
e6684e6
Do not quote strings
j-woz Apr 4, 2023
25eceae
WS
j-woz Apr 4, 2023
e7f08af
Add numbers
j-woz Apr 4, 2023
4a31351
o Write empty results.txt if nothing found, add more Params to singul…
Apr 5, 2023
61d14e3
o Move compare.py to cmp-cv workflow level
Apr 5, 2023
a842222
o enable drug_features.csv, a manual process to put the file in CDD f…
Apr 6, 2023
2c100d7
o Some fixes to the noise workflow
rajeeja Apr 12, 2023
324d2ac
o Add new workflow cross-study-generalization CSG, also add new param…
Apr 12, 2023
5833ddd
o Fix args
Apr 12, 2023
614ef9f
o Add comments
Apr 12, 2023
8ae9d8b
o Fixes for various workflows GA, Noise, CSG on site Polaris ANL
Apr 12, 2023
f51f443
Initial dense-noise workflow
j-woz Apr 18, 2023
c54222c
Call everything CANDLE_MODEL_IMPL, "container" is now one of these
j-woz Apr 25, 2023
c64b70b
Update Swift/T/Frontier
j-woz Apr 26, 2023
beb84f2
Better messaging on Frontier
j-woz Apr 26, 2023
b675a35
Fix error message
j-woz Apr 26, 2023
1c80ec1
More checks
j-woz Apr 26, 2023
9214ee7
Shorten job name on Frontier
j-woz Apr 26, 2023
4edc456
WS
j-woz Apr 26, 2023
f1a4960
Update function name
j-woz Apr 26, 2023
8f23034
Support for model.log
j-woz May 3, 2023
0d9efa8
o obj is not called candle_model_train with an extra argument model_name
rajeeja May 3, 2023
713dbb3
o fix duration variable not declared error
rajeeja May 3, 2023
495fbf5
o Change workflow for GA with new args for candle_model_train
rajeeja May 3, 2023
d7b01fa
Fixes for Frontier
j-woz May 8, 2023
2fef679
Merge branch 'develop' of github.com:ECP-CANDLE/Supervisor into develop
j-woz May 8, 2023
a90343b
Merge
j-woz May 8, 2023
ec7752c
Use new Swift/T
j-woz May 11, 2023
615d6cf
Use plan file from NVMe
j-woz May 11, 2023
193a34c
o Fix things as per new defs
May 12, 2023
b9bd396
o Fix cmp-cv as per candle_model..
May 12, 2023
eaab019
Handle more errors
j-woz May 17, 2023
e618201
Try new Swift/T
j-woz May 17, 2023
9ad1888
Merge branch 'develop' of github.com:ECP-CANDLE/Supervisor into develop
j-woz May 17, 2023
07a750a
Backup DB and its log
j-woz May 17, 2023
39451c3
Better output and logging
j-woz May 17, 2023
b9adec3
Better output and logging
j-woz May 17, 2023
8e59501
Minor changes
j-woz May 17, 2023
2c4f786
Update for new log format
j-woz May 17, 2023
424d085
Better logging
j-woz May 22, 2023
14fac21
Merge branch 'develop' of github.com:ECP-CANDLE/Supervisor into develop
j-woz May 22, 2023
b2ada6e
Merge
j-woz May 22, 2023
0381057
Set default timeout
j-woz May 23, 2023
bf7432d
New Swift/T
j-woz May 24, 2023
1e986ec
Update header
j-woz May 24, 2023
89e744d
Update to new names
j-woz May 24, 2023
f8c95f4
Back to CANDLE_MODEL_IMPL="container"
j-woz May 24, 2023
36ba170
Merge
j-woz May 24, 2023
a9ecbc7
Set CANDLE_MODEL_IMPL
j-woz May 24, 2023
2c4e0cb
Reduce get_expid arguments
j-woz May 24, 2023
f447109
Improve error message
j-woz May 24, 2023
95b98ca
Fix usage message
j-woz May 24, 2023
48417b8
Set PARAM_SET_FILE for now
j-woz May 24, 2023
feff4e2
Rename to MODEL_RETURN
j-woz May 24, 2023
fac09e4
Provide PARAM_SET_FILE=graphdrp_small.R
j-woz May 24, 2023
1369a4f
Rename to MODEL_RETURN
j-woz May 24, 2023
39c9dc2
Set MODEL_RETURN
j-woz May 24, 2023
b7c8428
Auto-set CANDLE_MODEL_IMPL="container" when CANDLE_MODEL_TYPE==SINGUL…
j-woz May 24, 2023
dc4abf1
Readability improvement; drop TURBINE_STDOUT for now
j-woz May 24, 2023
aa09e78
Change back to IMPROVE_RESULT for now
j-woz May 25, 2023
ddbbdac
Fix experiments directory for Singularity runs
j-woz May 26, 2023
6d62bdd
Settings for Lambda7
May 26, 2023
c1c02d0
Merge branch 'develop' of github.com:ECP-CANDLE/Supervisor into develop
j-woz May 26, 2023
044ecb2
Fix syntax
May 26, 2023
83b80d6
New test from Wilke
j-woz May 26, 2023
0fdd2cc
Adding random_baseline_keras2.py
j-woz May 30, 2023
8fc43d7
get_expid() takes 1 argument
j-woz May 30, 2023
35622c6
Clean up
j-woz May 30, 2023
92e45fe
Allow this to be unset - user can set it
j-woz May 30, 2023
df4f3ac
Settings for Lambda7
j-woz May 30, 2023
8e40608
Better structure when not using a Benchmark; add Random
j-woz May 30, 2023
90ed602
Better messaging and comments
j-woz May 30, 2023
13d7d72
Update names
j-woz May 30, 2023
8d7b0f2
New generic test for GA
j-woz May 30, 2023
8a9dff0
Clean up
j-woz May 30, 2023
558af6e
Remove PARAM_SET_FILE
j-woz May 30, 2023
ae05b5b
Add nice output log
j-woz May 30, 2023
4a3ba55
Add human-readable report at end
j-woz May 30, 2023
1741625
Prevent "SUCCESS" on job failure
j-woz May 30, 2023
68a8d0d
Clean up
j-woz May 30, 2023
5695aa6
Update names
j-woz May 30, 2023
04a1b7f
Fix typo
j-woz May 30, 2023
d17d647
New GA test for SIFs
j-woz May 30, 2023
0e8e840
New param space for HiDRA
j-woz May 31, 2023
9cf04f2
Probably better error handling
j-woz May 31, 2023
ba12653
Better final output
j-woz May 31, 2023
e85c949
Update header
j-woz Jun 1, 2023
16ffa52
Add an iteration report
j-woz Jun 1, 2023
c1df72c
Merge
j-woz Jun 1, 2023
c2ac94c
Merge
j-woz Jun 1, 2023
2db6133
Adding data/paccman_param_space.json data/graphdrp_param_space.json
j-woz Jun 2, 2023
326fe31
Adding data/paccmann_param_space.json
j-woz Jun 2, 2023
1d6d108
o Add IGTD for lambda
Jun 3, 2023
1fede6f
Drop- misspelled
j-woz Jun 5, 2023
3859ef4
Use modern Swift
j-woz Jun 5, 2023
d6a122c
Initial tests for Paccmann and tCNNS
j-woz Jun 5, 2023
9c1b24b
Merge branch 'develop' of github.com:ECP-CANDLE/Supervisor into develop
j-woz Jun 5, 2023
7d5f299
Adding data/tcnns_param_space.json
j-woz Jun 5, 2023
bc7ba09
Enable error handling in DEAP
j-woz Jun 6, 2023
7cd829f
Merge
j-woz Jun 6, 2023
5be9fb4
New Swift/T for Polaris
j-woz Jun 6, 2023
ab525e1
Clean up
j-woz Jun 6, 2023
0f07c1f
Ignore errors in tCNNS
j-woz Jun 6, 2023
20455ab
Support random fake crashes
j-woz Jun 6, 2023
a410b6c
Enable TURBINE_STDOUT for cmp-cv
j-woz Jun 9, 2023
522e0dc
Update compare.py
gihanpanapitiya Jun 9, 2023
15 changes: 15 additions & 0 deletions .github/workflows/pre-commit.yml
@@ -0,0 +1,15 @@
name: pre-commit

on:
  pull_request:
  push:
    branches:
      - master

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/[email protected]
      - uses: pre-commit/[email protected]
22 changes: 22 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,22 @@
repos:
  - repo: https://github.com/pre-commit/mirrors-yapf # To format the code to conform to YAPF
    rev: v0.31.0
    hooks:
      - id: yapf
        args: ['--in-place', '--recursive', '--style', 'google']

  - repo: https://github.com/myint/docformatter # To format the docstrings to conform to PEP 257
    rev: v1.4
    hooks:
      - id: docformatter
        args: [--in-place]

  - repo: https://github.com/pre-commit/pre-commit-hooks # Some common pre-commit hooks
    rev: v3.4.0
    hooks:
      - id: check-yaml # Checks the syntax of .yaml files.
        args: [--allow-multiple-documents]
        exclude: 'meta.yaml' # Excluded because the '%' in line 1 causes an error that has not been fixed yet
      - id: end-of-file-fixer # Makes sure files end with a newline.
      - id: trailing-whitespace # Checks for any tabs or spaces after the last non-whitespace character on the line.
      - id: check-docstring-first # Checks that code comes after the docstrings.
38 changes: 38 additions & 0 deletions README.adoc
@@ -1 +1,39 @@
See the https://ecp-candle.github.io/Supervisor/home.html[Home Page] for more information.

# Running the feature-domain-based comparison

- Create the CANDLE_DATA_DIR. Place drug_features.csv in the CANDLE_DATA_DIR.
- drug_features.csv should contain the drug features of at least the test-set drug molecules.
- The paths of the model directories have to be added to the PYTHONPATH in workflow.sh.
- Start the run using the command ./test-small-1.sh SITE, where SITE is the name of the computing system. test-small-1.sh is at workflows/cmp-cv/test.
- upf-1.txt is used as the input file to specify the model hyperparameters as well as the model name and candle_image location.

```
{"id": "RUN000", "epochs": 1, "model_name": "DrugCell", "candle_image": "/path/to/sif/DrugCell.sif"}
{"id": "RUN001", "epochs": 2, "model_name": "DrugCell", "candle_image": "/path/to/sif/DrugCell.sif"}
{"id": "RUN002", "epochs": 1, "model_name": "SWnet_CCLE", "candle_image": "/path/to/sif/SWnet.sif"}
{"id": "RUN003", "epochs": 2, "model_name": "SWnet_CCLE", "candle_image": "/path/to/sif/SWnet.sif"}
```
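Each line of upf-1.txt is a standalone JSON object describing one run. As a minimal sketch of how such a file can be consumed (this is illustrative only, not the Supervisor implementation; the function name `read_upf` is made up):

```python
import json

def read_upf(path):
    """Parse an unrolled parameter file (UPF): one JSON object per line.

    Returns a list of dicts, one per run, e.g. with keys "id", "epochs",
    "model_name", and "candle_image".
    """
    runs = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            runs.append(json.loads(line))
    return runs
```

Each resulting dict then supplies the hyperparameters and container location for one model invocation.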

### Running the specific example at workflows/cmp-cv/test

- Clone Supervisor from https://github.com/ECP-CANDLE/Supervisor
- Clone the DrugCell and SWnet model directories from https://github.com/gihanpanapitiya/DrugCell/tree/to_candle and https://github.com/gihanpanapitiya/SWnet/tree/to_candle
- Check out the to_candle branches and create the Singularity containers (.sif files) using the commands:

```
singularity build --fakeroot /path/for/sif/DrugCell.sif /path/to/DrugCell.def
singularity build --fakeroot /path/for/sif/SWnet.sif /path/to/SWnet.def
```

- Add /path/for/sif/DrugCell.sif and /path/for/sif/SWnet.sif to the PYTHONPATH in workflow.sh
- Create the CANDLE_DATA_DIR. Place drug_features.csv in the CANDLE_DATA_DIR
- Run the command ./test-small-1.sh SITE


#### Known issues

- Some input files required for analysis have to be added manually to the CANDLE_DATA_DIR
- Outputs get written to 'experiments', not the CANDLE_DATA_DIR
- Python paths have to be specified explicitly in workflow.sh
- The Singularity container is not used even though CANDLE_MODEL_TYPE=SINGULARITY is specified
6 changes: 3 additions & 3 deletions archives/py-loc/p.swift
@@ -5,11 +5,11 @@ import location;

L0 = locationFromRank(0);
L1 = locationFromRank(1);

@location=L0 python_persist("L = []");
@location=L1 python_persist("L = []");
string D[];
foreach j in [0:9] {
L = locationFromRank(j%%2);
D[j] = @location=L python_persist("L.append(repr(2+%i)) " % j);
}
22 changes: 11 additions & 11 deletions archives/templates/README.md
@@ -16,17 +16,17 @@ In more detail, here are the steps required for running an arbitrary workflow on
1. Ensure the `$SITE` and `$CANDLE` variables are exported to the environment as specified [here](#CANDLE-settings-at-different-SITEs).
1. Copy the submission script `$CANDLE/Supervisor/templates/submit_candle_job.sh` to a working directory.
1. Specify the model in the submission script:
1. Set the `$MODEL_PYTHON_SCRIPT` variable to one of the models in the `$CANDLE/Supervisor/templates/models` directory (currently either "resnet", "unet", "uno", or "mnist_mlp"). Or, specify your own [CANDLE-compliant](https://ecp-candle.github.io/Candle/html/tutorials/writing_candle_code.html) Python model by setting both the `$MODEL_PYTHON_DIR` and `$MODEL_PYTHON_SCRIPT` variables as appropriate.
1. Specify the corresponding default model parameters by setting the `$DEFAULT_PARAMS_FILE` variable to one of the files in the `$CANDLE/Supervisor/templates/model_params` directory. Or, copy one of these template files to the working directory, modify it accordingly, and point the `$DEFAULT_PARAMS_FILE` variable to this file.
1. Specify the workflow in the submission script:
1. Set the `$WORKFLOW_TYPE` variable as appropriate (currently supported are "upf" and, to a less-tested extent, "mlrMBO").
1. Specify the corresponding workflow settings by setting the `$WORKFLOW_SETTINGS_FILE` variable to one of the files in the `$CANDLE/Supervisor/templates/workflow_settings` directory. Or, copy one of these template files to the working directory, modify it accordingly, and point the `$WORKFLOW_SETTINGS_FILE` variable to this file.
1. Adjust any other variables in the submission script such as the output directory (specified by `$EXPERIMENTS`), the scheduler settings, etc.
1. Run the script from a submit node like `./submit_candle_job.sh`.

## Background

In general, it would be nice to allow for an arbitrary model (U-Net, ResNet, etc.) to be run using an arbitrary workflow (UPF, mlrMBO, etc.), all in an external working directory. For example, here is a sample submission script:

```bash
#!/bin/bash
@@ -60,13 +60,13 @@ export WORKFLOW_SETTINGS_FILE="/home/weismanal/notebook/2019-02-28/unet/upf1.txt
$CANDLE/Supervisor/workflows/$WORKFLOW_TYPE/swift/workflow.sh $SITE -a $CANDLE/Supervisor/workflows/common/sh/cfg-sys-$SITE.sh $WORKFLOW_SETTINGS_FILE
```

When this script is run (no arguments accepted) on a Biowulf submit node, the file `$MODEL_PYTHON_DIR/$MODEL_PYTHON_SCRIPT.py`, which must be [CANDLE-compliant](https://ecp-candle.github.io/Candle/html/tutorials/writing_candle_code.html), will be run using the default parameters specified in `$DEFAULT_PARAMS_FILE`. The CANDLE workflow used will be UPF (specified by `$WORKFLOW_TYPE`) and will be run using the parameters specified in `$WORKFLOW_SETTINGS_FILE`. The results of the job will be output in `$EXPERIMENTS`. Note that we can choose a different workflow by simply changing the value of the `$WORKFLOW_TYPE` variable, e.g.,

```bash
export WORKFLOW_TYPE="mlrMBO"
```

In the sample submission script above, the Python script containing the model (my_specialized_unet.py), the default model parameters (default_params.txt), and the unrolled parameter file (upf1.txt) are all specified in the "unet" subdirectory of the working directory "/home/weismanal/notebook/2019-02-28". However, often a model, its default parameters, and a workflow's settings can be reused.

Thus, we provide templates of these three types of files in the `$CANDLE/Supervisor/templates` directory, the current structure of which is:

@@ -102,7 +102,7 @@ export WORKFLOW_SETTINGS_FILE="/home/weismanal/notebook/2019-02-28/unet/upf1.txt
export WORKFLOW_SETTINGS_FILE="$CANDLE/Supervisor/templates/workflow_settings/upf1.txt"
```

The template submission script located at `$CANDLE/Supervisor/templates/submit_candle_job.sh` utilizes all three of these types of templates and will just work (running an HPO on the MNIST dataset) as long as the `$CANDLE` and `$SITE` variables are set correctly.

## Notes

@@ -119,10 +119,10 @@ mymodel_common = candle.Benchmark(file_path, os.getenv("DEFAULT_PARAMS_FILE"), '

I'd recommend this be added to the standard method for making a model [CANDLE-compliant](https://ecp-candle.github.io/Candle/html/tutorials/writing_candle_code.html).

Note further that `$DEFAULT_PARAMS_FILE` must be a full pathname. Otherwise, if we just used the filename "default_params.txt" hardcoded into the `$MODEL_PYTHON_SCRIPT`, the script would look for this global parameter file in the same directory that it's in (i.e., `$MODEL_PYTHON_DIR`), but that would preclude using a `$MODEL_PYTHON_SCRIPT` that's a symbolic link. In that case, we'd have to always copy the `$MODEL_PYTHON_SCRIPT` to the current working directory, which is inefficient because this leads to unnecessary duplication of code.
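The symbolic-link point can be seen in a small runnable sketch (the paths and file names below are hypothetical, made up for illustration):

```python
import os
import tempfile

# Hypothetical layout: the symlink lives in the working directory, while the
# real script lives in $MODEL_PYTHON_DIR.
workdir = tempfile.mkdtemp()   # stands in for the user's working directory
modeldir = tempfile.mkdtemp()  # stands in for $MODEL_PYTHON_DIR
script = os.path.join(modeldir, "model.py")
open(script, "w").close()
link = os.path.join(workdir, "model.py")
os.symlink(script, link)

# The common idiom for locating files "next to" a script follows symlinks:
resolved_dir = os.path.dirname(os.path.realpath(link))

# So a hardcoded relative "default_params.txt" would be looked up in
# $MODEL_PYTHON_DIR, not in the working directory where the symlink sits:
assert resolved_dir == os.path.realpath(modeldir)
assert resolved_dir != os.path.realpath(workdir)
```

This is why passing a full pathname via `$DEFAULT_PARAMS_FILE` avoids copying the script into every working directory.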

### CANDLE settings at different SITEs

| `$SITE` | `$CANDLE` |
| :-----: | :--------------------------: |
| biowulf | /data/BIDS-HPC/public/candle |
2 changes: 1 addition & 1 deletion archives/templates/language_agnostic/submit_candle_job.sh
@@ -8,7 +8,7 @@ export SITE="biowulf"
# Job specification
export EXPERIMENTS="$MY_DIR"
#TODO GZ: These 2 variables are not needed
export MODEL_NAME="mnist_upf_test"
export OBJ_RETURN="val_loss"

# Scheduler settings
8 changes: 4 additions & 4 deletions archives/templates/language_agnostic/train_model.py
@@ -1,8 +1,8 @@
import os
import pickle
import random
import sys

# Generate a random loss function
print(str(sys.argv))
print(random.uniform(0, 1))
2 changes: 1 addition & 1 deletion archives/templates/model_params/mnist1.txt
@@ -3,4 +3,4 @@ epochs=20
batch_size=128
activation='relu'
optimizer='rmsprop'
num_filters=32
2 changes: 1 addition & 1 deletion archives/templates/model_params/uno1.txt
@@ -51,4 +51,4 @@ use_landmark_genes = True
validation_split = 0.2
verbose = None
warmup_lr = False
save='save/uno'
12 changes: 7 additions & 5 deletions archives/templates/models/mnist/mnist.py
@@ -1,13 +1,14 @@
# add candle_keras library in path
candle_lib = "/data/BIDS-HPC/public/candle/Candle/common"
import sys

sys.path.append(candle_lib)

import os

# import sys
file_path = os.path.dirname(os.path.realpath(__file__))
lib_path = os.path.abspath(os.path.join(file_path, "..", "..", "common"))
sys.path.append(lib_path)

import candle_keras as candle
@@ -19,10 +20,11 @@
additional_definitions = None
required = None


class MNIST(candle.Benchmark):

def set_locals(self):
if required is not None:
self.required = set(required)
if additional_definitions is not None:
self.additional_definitions = additional_definitions

75 changes: 41 additions & 34 deletions archives/templates/models/mnist/mnist_mlp.py
@@ -1,62 +1,64 @@
import os

import mnist
from keras import backend as K
from keras.callbacks import CSVLogger


def initialize_parameters():
mnist_common = mnist.MNIST(
mnist.file_path,
os.getenv("DEFAULT_PARAMS_FILE"),
"keras",
prog="mnist_mlp",
desc="MNIST example",
)

import candle_keras as candle

# Initialize parameters
gParameters = candle.initialize_parameters(mnist_common)
csv_logger = CSVLogger("{}/params.log".format(gParameters))

return gParameters


def run(gParameters):
##########################################
# Your DL start here. See mnist_mlp.py #
##########################################
"""Trains a simple deep NN on the MNIST dataset.

Gets to 98.40% test accuracy after 20 epochs (there is *a lot* of
margin for parameter tuning). 2 seconds per epoch on a K520 GPU.
"""

# from __future__ import print_function

import keras
from keras.datasets import mnist
from keras.layers import Dense, Dropout
from keras.models import Sequential
from keras.optimizers import RMSprop

batch_size = gParameters["batch_size"]
num_classes = 10
epochs = gParameters["epochs"]

activation = gParameters["activation"]
optimizer = gParameters["optimizer"]

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")
x_train /= 255
x_test /= 255
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
@@ -67,32 +69,37 @@ def run(gParameters):
model.add(Dropout(0.2))
model.add(Dense(512, activation=activation))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation="softmax"))

model.summary()

model.compile(loss="categorical_crossentropy",
optimizer=optimizer,
metrics=["accuracy"])

history = model.fit(
x_train,
y_train,
batch_size=batch_size,
epochs=epochs,
verbose=1,
validation_data=(x_test, y_test),
)
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])
##########################################
# End of mnist_mlp.py ####################
##########################################
return history


def main():
gParameters = initialize_parameters()
run(gParameters)


if __name__ == "__main__":
main()
try:
K.clear_session()