Add storage import and export features to command line and web API (Epistimio#1082)

* Add files for scripts `orion db dump` and `orion db load`
* Write `orion db dump`
* Write `orion db load` for parameter `--exp`
* Simplify code to get an experiment from its name
* Wrap destination database into a storage. Move dump logic into a function.
* dump:
  - add and use basic CLI arguments
  - use both version and name to find the experiment to dump
* Rewrite load
* Hardcode collection list in dump
* Raise a runtime error if resolve is 'bump' for benchmarks.
* Add tests for `orion db dump`:
  - Raise DatabaseError when an error occurs in the script
  - If many experiments are found for a given name, pick the experiment with the lowest version
* Add tests for `orion db load`:
  - Raise DatabaseError when errors occur in the script
  - If many experiments are found for a given name, pick the experiment with the lowest version
* Reformat code
* Move function dump_database() into new module orion.core.worker.storage_backup
* Add entry /dump to Web API.
* [web api] Add download suffix to dumped file
* Use one module for both import/export web API endpoints.
* [web api] Receive a POST request to import data
* Add function load_database() into module storage_backup and move import logic into this function.
* Rename param `experiment` to `name` for function dump_database() so it uses the same param names as load_database()
* Check conflicts before making the import.
* [Web API] Make the load entry launch a process to manage the import:
  - Capture import progress messages printed by the log
  - Add entry import-status to get import progress messages and status
* [web api] Allow following import progress using a callback in the backend
* Add documentation for the web API.
* Add documentation for the command line.
* Add tests for web API /dump, /load and /import-status (see the client sketch below)
* Fix tests.
* Move main functions to the top of module storage_backup
* For dump from pickleddb, use the db object directly instead of locking the database first (the database is always locked before any read/write/remove operation)
* Use storage instead of database for export:
  - NB: With this new interface, dumping the whole database is slower, because we must dump experiments one by one, making many more database calls.
  - Add heartbeat field to LockedAlgorithmState
  - Update BaseStorageProtocol interface: allow setting the initial algo state in create_experiment(); allow setting the algo state in initialize_algorithm_lock(), and rename it to write_algorithm_lock()
  - Update test_db_dump: with the new dumping interface, only algos related to available experiments are dumped, so dumped data no longer contains algo state for unknown experiments; check the dumped algo state for an experiment
* Update doc in storage_resource. Update test for storage_resource.
* For load (from a pickleddb file), use the db object directly instead of locking the database first (the database is always locked before any read/write/remove operation)
* Add benchmarks to tested databases for dump/load.
* Use storage instead of database to import. BaseStorageProtocol: add new function delete_benchmark() and implement it in child class Legacy.
* Check progress callback messages in test_db_load:test_load_overwrite()
* Fix pylint
* Try to set logging level in the running import task
* Update docs/src/user/storage.rst
  Co-authored-by: Xavier Bouthillier <[email protected]>
* Allow not specifying a resolve strategy. If not specified, an exception is raised as soon as a conflict is detected.
* Log the storage instead of `storage._db` in command lines `dump` and `load`.
* Storage export: add an option `--force` to explicitly overwrite the dumped file if it already exists.
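A hedged client sketch for the endpoints above, based only on the endpoint names and behaviour listed in this log; the upload field name, the `resolve` form parameter, and the `/import-status` response schema are assumptions, not the documented API:

```python
"""Hypothetical client for the /dump, /load and /import-status endpoints."""
import time

import requests

BASE_URL = "http://127.0.0.1:8000"  # assumed address of a running Orion Web API

# Export: GET /dump downloads the dumped database produced by the server.
response = requests.get(f"{BASE_URL}/dump", timeout=30)
response.raise_for_status()
with open("dump.pkl", "wb") as fh:
    fh.write(response.content)

# Import: POST /load uploads a dumped file; per the log above, the server
# launches a separate process for the import and exposes its progress
# through /import-status.
with open("dump.pkl", "rb") as fh:
    response = requests.post(
        f"{BASE_URL}/load",
        files={"file": fh},          # assumed field name
        data={"resolve": "ignore"},  # 'ignore', 'overwrite' or 'bump' per this log
        timeout=30,
    )
response.raise_for_status()

# Poll the import status until the background task finishes; the key names
# and status values below are assumptions about the response schema.
while True:
    status = requests.get(f"{BASE_URL}/import-status", timeout=30).json()
    print(status)  # progress messages captured from the import sub-process log
    if status.get("status") != "active":
        break
    time.sleep(1)
```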
* Import/export: if not specified, get the latest instead of the oldest version for a specific experiment
* Use NamedTemporaryFile to generate a temporary file in storage_resource.
* Rewrite docstring for class ImportTask in NumPy style.
* Fix a function name
* Rename test fixture used to test storage export.
* Add comment in test_dump_unknown_experiment() to explain why the output file is created in any case.
* Remove unused logger in module storage_resource.
* Write generic checking functions for dump unit tests that also verify the expected number of child experiments or trials.
  Discovered a corner case: imported/exported experiments keep old `refers` links. Currently remove refers links if only 1 specified experiment is dumped.
* Update TODO
* Regenerate experiment parent links in dst according to src when dumping, using correct dst IDs.
* Regenerate experiment parent links and experiment-to-trial links in dst according to src when loading, using correct dst IDs.
* Factorize tests in test_db_load and stop using pre-computed PKL files (use pkl_* fixtures instead)
* Refactor code for test_storage_resource.
* Use test_helpers for test_db_dump
* Remove useless tests/functional/commands/__init__.py. Add a test to check default storage content (we have more algos than expected)
* Check that new data are ignored or indeed written when importing with the ignore or overwrite resolution.
* Set and check trial links.
* Add tests/functional/conftest
* Regenerate links to root experiments in imported/exported experiments.
* Add a common function to write an experiment into a destination storage, used for both dump and load features.
* Add a lock to ImportTask to prevent concurrent execution when updating task info. Encapsulate ImportTask to make sure the lock is used when necessary.
* Use Orion TreeNode to manage experiment and trial links.
* Try to fix failing CI tests. Tests seem to pass for Python 3.8 but not for Python 3.7:
  - In Python 3.8, we can clear previous logging handlers before setting a new stream just by using the new argument `force` in logging.basicConfig
  - In Python 3.7, we must clear previous handlers manually before setting a new stream.
* Correctly update the algorithm when setting a deterministic experiment ID in tests/functional/commands/conftest
* Remove irrelevant calls to `logger.setLevel`
* Remove irrelevant calls to `logger.setLevel` that were added in this PR.
* [web api] Make sure to transfer the main root log level into the import sub-process
* [test_storage_resource]
  - Now that storage_resource logging depends on root logging, we must set client logging in unit tests, using caplog, to be sure expected logging messages are printed in the import sub-process.
  - Using caplog, we can rewrite the logging unit test with a simulated client, instead of launching a real sub-process server.
* Clean the dumped file if an error occurs when dumping, **except** if the file already existed and *no* overwrite was specified.
* Remove useless pylint directive in `orion.core.cli.db.load`
* Move module `orion.core.worker.storage_backup` into `orion.storage.backup` (see the usage sketch below)
* Remove fixed TODO
* Remove fixed TODO in test_db_load
* Clean dumped files only if no error occurs in storage_resource (files should already have been deleted if an error occurred)
* Test dump using a temporary output path for most unit tests, except `test_dump_default`
* Work on a temporary file when dumping and move it to the output file only if no error occurred. Move function _gen_host_file() from orion/serving/storage_resource to orion/core/utils and rename it to generate_temporary_file().
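Since the final module location is `orion.storage.backup`, a minimal usage sketch could look as follows; the module path, the function names, the shared `name` parameter and the resolve strategies come from this log, while the storage setup, argument order and keyword names are assumptions:

```python
"""Hypothetical use of orion.storage.backup; not the verbatim API."""
from orion.storage.backup import dump_database, load_database
from orion.storage.base import setup_storage  # assumed factory name

storage = setup_storage()  # storage built from the active Orion configuration

# Export: dump a single experiment (the latest version when none is given,
# per this log) into a pickled database file; `overwrite` is assumed to
# mirror the CLI `--force` flag described above.
dump_database(storage, "dump.pkl", name="my-experiment", overwrite=True)

# Import: load the dumped file back. Per this log, 'ignore', 'overwrite'
# and 'bump' are the known conflict strategies, and leaving the strategy
# unspecified raises as soon as a conflict is detected.
load_database(storage, "dump.pkl", name="my-experiment", resolve="bump")
```

On the command line, the same behaviour is exposed through `orion db dump` and `orion db load`, with `--force` controlling overwrite on export.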
* Use pytest fixture `tmp_path` instead of a manually-created temp dir to test db dump.

---------

Co-authored-by: Xavier Bouthillier <[email protected]>
Co-authored-by: Setepenre <[email protected]>