Add storage import and export features to command line and web API #1082

Merged
merged 79 commits into from
Aug 14, 2023
Commits
832f652
Add files for scripts `orion db dump` and `orion db load`
notoraptor Oct 3, 2022
e18f592
Write orion db dump
notoraptor Oct 4, 2022
47e5d5b
Write orion db load for parameter --exp
notoraptor Oct 4, 2022
66e586d
Simplify code to get an experiment from its name
notoraptor Oct 4, 2022
0595921
Wrap destination database into a storage.
notoraptor Oct 7, 2022
23f4923
dump:
notoraptor Oct 7, 2022
2133e68
Rewrite load
notoraptor Oct 7, 2022
01dc611
Hardcode collection list in dump
notoraptor Oct 7, 2022
7627989
Raise runtime error if resolve is 'bump' for benchmarks.
notoraptor Oct 11, 2022
b09394e
- Add tests for orion db dump
notoraptor Oct 11, 2022
7e5cb85
- Add tests for orion db load
notoraptor Oct 11, 2022
4dfe5e2
Reformat code
notoraptor Oct 11, 2022
a1d73ec
Move function dump_database() into new module orion.core.worker.stora…
notoraptor Oct 18, 2022
0d6ed79
Add entry /dump to Web API.
notoraptor Oct 18, 2022
5be6782
[web api] Add download suffix to dumped file
notoraptor Feb 7, 2023
bfec342
Use one module for both import/export web API endpoints.
notoraptor Oct 19, 2022
6223c18
[web api] Receive a POST request to import data
notoraptor Feb 7, 2023
d578389
Add function load_database into module storage_backup and move import…
notoraptor Oct 24, 2022
29e6358
Rename param `experiment` to `name` for function dump_database() to h…
notoraptor Oct 24, 2022
8b4f871
Check conflicts before making import.
notoraptor Oct 25, 2022
ae06753
[Web API]
notoraptor Feb 7, 2023
b90bc23
[web api] Allow to follow import progress using a callback in backend
notoraptor Feb 7, 2023
c6d8319
Add documentation for web API.
notoraptor Nov 7, 2022
d0f929f
Add documentation for command line.
notoraptor Nov 8, 2022
5b2d71e
Add tests for web API /dump, /load and /import-status
notoraptor Nov 8, 2022
b2abaa0
Fix tests.
notoraptor Feb 8, 2023
9dac382
Move main functions to top of module storage_backup
notoraptor Feb 8, 2023
b3ca7c3
For dump from pickledb, use db object directly instead of locking dat…
notoraptor Feb 8, 2023
10d2015
Use storage instead of database for export.
notoraptor Feb 10, 2023
0d437ee
Update doc in storage_resource
notoraptor Feb 10, 2023
3ef7bf4
For load (from pickledb file), use db object directly instead of lock…
notoraptor Feb 10, 2023
0a0d354
Add benchmarks to tested databased for dump/load.
notoraptor Feb 10, 2023
79908f5
Use storage instead of database to import.
notoraptor Feb 10, 2023
a77dabf
Check progress callback messages in test_db_load:test_load_overwrite()
notoraptor Feb 10, 2023
a84115b
Fix pylint
notoraptor Feb 10, 2023
4e134d9
Tru to set logging level in running import task
notoraptor Feb 10, 2023
722cd68
Update docs/src/user/storage.rst
notoraptor Feb 22, 2023
dcfecc2
Allow to not specify a resolve strategy. If not specified, an excepti…
notoraptor Feb 22, 2023
a927e68
Just logging storage instead of `storage._db` in command lines `dump`…
notoraptor Feb 22, 2023
10ee313
Storage export: add an option `--force` to explicitly overwrite dumpe…
notoraptor Feb 22, 2023
cd70d07
Import/export: if no specified, get latest instead of oldest version …
notoraptor Feb 22, 2023
4e57dd1
Use NamedTemporaryFile to generate temporary file in storage_resource.
notoraptor Feb 23, 2023
4a6a1de
Rewrite docstring for class ImportTask in Numpy style.
notoraptor Feb 23, 2023
8def8e9
Fix a function name
notoraptor Feb 23, 2023
955505d
Rename test fixture used to test storage export.
notoraptor Feb 23, 2023
030ed95
Add comment in test_dump_unknown_experiment() to explain why output f…
notoraptor Feb 23, 2023
d664f24
Remove unused logger in module storage_resource.
notoraptor Feb 23, 2023
cdbe15f
Write generic checking functions for dump unit tests, that also verif…
notoraptor Feb 24, 2023
bb0ef86
Update TODO
notoraptor Feb 24, 2023
05e1e99
Regenerate experiment parent links in dst according to src when dumpi…
notoraptor Feb 27, 2023
4d0cb5f
Regenerate experiment parent links and experiment-to-trial links in d…
notoraptor Feb 27, 2023
719feb1
Factorize tests in test_db_load and do not use prec-computed PKL file…
notoraptor Feb 27, 2023
85a399a
Refactorize code for test_storage_resource.
notoraptor Feb 27, 2023
21ea712
Use test_helpers for test_db_dump
notoraptor Feb 27, 2023
c4880c7
Remove useless tests/functional/commands/__init__.py
notoraptor Feb 27, 2023
7a210fe
Check new data are ignored or indeed written when importing with igno…
notoraptor Feb 27, 2023
207f2e4
Set and check trial links.
notoraptor Feb 27, 2023
9da5717
Add tests/functional/conftest
notoraptor Feb 27, 2023
0e06fcc
Regenerate links to root experiments in imported/exported experiments.
notoraptor Feb 28, 2023
5b09e52
Add common function to write experiment in a destination storage, to …
notoraptor Feb 28, 2023
86f85e9
Add a lock to ImportTask to prevent concurrent execution when updatin…
notoraptor Feb 28, 2023
70c4e47
Use Orion TreeNode to manage experiment and trial links.
notoraptor Feb 28, 2023
8ce9486
Try to fix CI failing tests. Tests seems to pass for python 3.8 but n…
notoraptor Mar 1, 2023
ded35c3
Correctly update algorithm when setting deterministic experiment ID i…
notoraptor Mar 3, 2023
7319237
Remove irrelevant calls to `logger.setLevel`
notoraptor Mar 6, 2023
d530fdf
Remove irrelevant calls to `logger.setLevel` that were added in this PR.
notoraptor Mar 6, 2023
5ba3f9b
[web api] make sure to transfer main root log level into import sub-p…
notoraptor Mar 9, 2023
79bb627
[test_storage_resource]
notoraptor Mar 9, 2023
d8b6ac6
Clean dumped file if an error occurs when dumping, **except** if file…
notoraptor Mar 16, 2023
644de4c
Remove useless pylint in `orion.core.cli.db.load`
notoraptor Mar 16, 2023
73a818f
Move module `orion.core.worker.storage_backup` into `orion.storage.ba…
notoraptor Mar 16, 2023
6d8282d
Remove fixed TODO
notoraptor Mar 16, 2023
d256890
Remove fixed TODO in test_db_load
notoraptor Mar 16, 2023
f3e6b9d
Clean dumped files only if no error occurs in storage_resource (files…
notoraptor Mar 16, 2023
e6b23fc
Test dump using a temporary output path for most unit tests except `t…
notoraptor May 12, 2023
922744d
- Work on temporary file when dumping and move it to output file only…
notoraptor May 12, 2023
91a10d9
Use pytest fixture `tmp_path` instead of manually-created temp dir to…
notoraptor May 24, 2023
34ee5fe
Merge branch 'develop' into import-export-cli-and-webapi
Delaunay Aug 10, 2023
18d3e42
Merge branch 'develop' into import-export-cli-and-webapi
Delaunay Aug 14, 2023
40 changes: 40 additions & 0 deletions docs/src/user/storage.rst
@@ -153,6 +153,46 @@ simply run the upgrade command.

.. _storage_python_apis:

``dump`` Export database content
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``dump`` command exports database content to a PickledDB PKL file.

.. code-block:: sh

   orion db dump -o backup.pkl

You can also dump a specific experiment.

.. code-block:: sh

   orion db dump -n exp-name -v exp-version -o backup-exp.pkl
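
When the export is driven from a script, the same command line can be assembled programmatically. The following is only a minimal sketch: `build_dump_command` is a hypothetical helper (not part of Oríon), and it relies solely on the `-o`, `-n`, `-v` and `-f` flags shown in this pull request.

```python
import shlex


def build_dump_command(output="dump.pkl", name=None, version=None, force=False):
    """Build the argument list for ``orion db dump`` (hypothetical helper)."""
    command = ["orion", "db", "dump", "-o", output]
    if name is not None:
        command += ["-n", name]
        # A version is only meaningful together with an experiment name.
        if version is not None:
            command += ["-v", str(version)]
    if force:
        # -f/--force asks dump to overwrite an existing output file.
        command.append("-f")
    return command


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without Oríon installed.
    print(shlex.join(build_dump_command("backup-exp.pkl", name="exp-name", version=1)))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where Oríon is installed.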

``load`` Import database content
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``load`` command imports database content
from any PickledDB PKL file (including files generated by the ``dump`` command).

Use the ``-r/--resolve`` argument to choose the conflict resolution policy
applied when conflicts are detected during import.
If no policy is specified, an exception is raised on any detected conflict.
Available policies are:

- ``ignore``, to ignore imported data
- ``overwrite``, to replace old data with imported data (prior trials are deleted)
- ``bump``, to bump the version of imported data and then import it

By default, the whole PKL file is imported.

.. code-block:: sh

   orion db load backup.pkl -r ignore

You can also import a specific experiment.

.. code-block:: sh

   orion db load backup.pkl -r overwrite -n exp-name -v exp-version
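
To make the three policies concrete, here is a toy model of how they could behave on a conflict. This is an illustration under stated assumptions, not Oríon's storage code: experiments are reduced to a dictionary keyed by `(name, version)`, `resolve_conflict` is a hypothetical function, and "bump" is assumed to pick the next free version for that name.

```python
def resolve_conflict(existing, incoming, resolve=None):
    """Merge ``incoming`` into ``existing`` in place and return ``existing``.

    Both arguments map (name, version) -> payload. Toy model only.
    """
    for (name, version), payload in incoming.items():
        key = (name, version)
        if key not in existing:
            # No conflict: simply import.
            existing[key] = payload
        elif resolve == "ignore":
            # Keep what is already stored, skip the imported entry.
            continue
        elif resolve == "overwrite":
            existing[key] = payload
        elif resolve == "bump":
            # Assumption: the imported entry gets the next free version.
            new_version = max(v for (n, v) in existing if n == name) + 1
            existing[(name, new_version)] = payload
        else:
            # No policy given: any conflict is an error.
            raise RuntimeError(f"Conflict detected for {name} v{version}")
    return existing
```

With `existing = {("exp", 1): "old"}` and an imported `("exp", 1)`, `ignore` leaves the stored entry alone, `overwrite` replaces it, and `bump` stores the imported entry as version 2.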

Python APIs
===========

96 changes: 95 additions & 1 deletion docs/src/user/web_api.rst
@@ -393,7 +393,6 @@ visualize your experiments and their results.

:statuscode 404: When the specified experiment doesn't exist in the database.


Benchmarks
----------
The benchmark resource permits the retrieval of in-progress and completed benchmarks. You can
@@ -487,6 +486,101 @@ retrieve individual benchmarks as well as a list of all your benchmarks.
or assessment, task or algorithms are not part of the existing benchmark
configuration.

Database dumping
----------------

The database dumping resource dumps database content
into a PickledDB and lets you download it as a PKL file.

.. http:get:: /dump

   Return a PKL file containing database content.

   :query name: Optional name of experiment to export. If unspecified, the whole database is dumped.
   :query version: Optional version of the experiment to retrieve.
      If unspecified and name is specified, the **latest** version of the experiment is exported.
      If both name and version are unspecified, the whole database is dumped.

   :statuscode 404: When an error occurred during dumping.
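
A client has to turn the optional `name` and `version` parameters into a query string. The sketch below shows one way to do that with the standard library; `build_dump_url` and the base URL are hypothetical, and dropping `version` when no `name` is given is an assumption, since the documentation only describes `version` alongside `name`.

```python
from urllib.parse import urlencode


def build_dump_url(base_url, name=None, version=None):
    """Return the URL of the /dump entry for the given filters (hypothetical helper)."""
    params = {}
    if name is not None:
        params["name"] = name
        # Assumption: a version without a name is ignored.
        if version is not None:
            params["version"] = version
    url = base_url.rstrip("/") + "/dump"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    # The host and port are placeholders for wherever the web API is served.
    print(build_dump_url("http://localhost:8000", name="exp-name", version=1))
```

The returned URL can then be fetched with any HTTP client, and the response body saved as a PKL file.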

Database loading
----------------

The database loading resource imports data from a PKL file.

.. http:post:: /load

   Import data into database from a PKL file.
   This is a POST request, as a file must be uploaded.
   Launch an import task in a separate process in the backend and return the task ID,
   which may be used to get task progress.

   :query file: PKL file to import
   :query resolve: policy to resolve conflicts during import. Either:

      - ``ignore``: ignore imported data on conflict
      - ``overwrite``: overwrite existing data on conflict
      - ``bump``: bump version of imported data before insertion on conflict

   :query name: Optional name of experiment to import. If unspecified, all data from the PKL file is imported.
   :query version: Optional version of experiment to import.
      If unspecified and name is specified, the **latest** version of the experiment is imported.
      If both name and version are unspecified, all data from the PKL file is imported.

   **Example response**

   .. sourcecode:: http

      HTTP/1.1 200 OK
      Content-Type: text/javascript

   .. code-block:: json

      {
         "task": "e453679d-e36b-427a-a14d-58fe5e42ca19"
      }

   :>json task: The ID of the running task that is importing data.

   :statuscode 400: When an invalid query parameter is passed in the request.
   :statuscode 403: When an import task is already running.
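
Because a file must be uploaded, a client will typically send the parameters as a form body. The sketch below builds a `multipart/form-data` payload with the standard library only. The field names `file`, `resolve`, `name` and `version` come from the documentation above; the multipart encoding itself and the helper names are assumptions, so check them against the actual server before relying on this.

```python
import uuid


def _field(boundary, key, value):
    """Encode one plain-text form field."""
    return (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{key}"\r\n'
        "\r\n"
        f"{value}\r\n"
    ).encode("utf-8")


def encode_load_request(filename, content, resolve=None, name=None, version=None):
    """Return (headers, body) for a POST to /load (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    parts = []
    for key, value in (("resolve", resolve), ("name", name), ("version", version)):
        if value is not None:  # optional fields are simply left out
            parts.append(_field(boundary, key, value))
    # The PKL file itself is sent as a binary part named "file".
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    parts.append(file_header + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return headers, b"".join(parts)
```

The returned headers and body can be handed to `urllib.request.Request(url, data=body, headers=headers, method="POST")`; the JSON response then carries the `task` ID.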

Import progression
------------------

The import progression resource monitors an import task launched by the ``/load`` entry.

.. http:get:: /import-status/:name

   Return the status of a running import task identified by the given ``name``.
   ``name`` is the task ID returned by the ``/load`` entry.

   **Example response**

   .. sourcecode:: http

      HTTP/1.1 200 OK
      Content-Type: text/javascript

   .. code-block:: json

      {
         "messages": ["latest", "logging", "lines", "from", "import", "process"],
         "progress_message": "description of current import step",
         "progress_value": 0.889,
         "status": "active"
      }

   :>json messages: Latest logging lines printed by the import process since the last call to the ``/import-status`` entry.
   :>json progress_message: Description of the current import process step.
   :>json progress_value: Floating-point value (between 0 and 1 inclusive) representing current import progress.
   :>json status: Import process status. Either:

      - ``"active"``: still running
      - ``"error"``: terminated with an error (see latest messages for error info)
      - ``"finished"``: successfully terminated

   :statuscode 400: When an invalid query parameter is passed in the request.
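
Since each call only returns the log lines printed since the previous call, a client that wants the full log has to accumulate them while polling. A minimal polling loop might look like the following; `wait_for_import` and the `fetch_status` callable (which is expected to perform the HTTP GET and return the decoded JSON) are hypothetical names introduced for this sketch.

```python
import time


def wait_for_import(fetch_status, task_id, interval=1.0, sleep=time.sleep):
    """Poll an import task until it is no longer "active".

    ``fetch_status`` is a hypothetical callable: it takes the task ID and
    returns the decoded JSON payload of ``/import-status/<task_id>``.
    Returns the final status and every log line collected on the way.
    """
    messages = []
    while True:
        payload = fetch_status(task_id)
        # Accumulate log lines, as each response only holds the new ones.
        messages.extend(payload["messages"])
        if payload["status"] != "active":
            return payload["status"], messages
        sleep(interval)
```

A final status of `"error"` means the collected messages should be inspected for the cause; `"finished"` means the import completed.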

Errors
------
Oríon uses `conventional HTTP response codes <https://en.wikipedia.org/wiki/List_of_HTTP_status_codes>`_
61 changes: 61 additions & 0 deletions src/orion/core/cli/db/dump.py
@@ -0,0 +1,61 @@
#!/usr/bin/env python
# pylint: disable=,protected-access
"""
Storage export tool
===================

Export database content into a file.

"""
import logging

from orion.core.cli import base as cli
from orion.core.io import experiment_builder
from orion.storage.backup import dump_database
from orion.storage.base import setup_storage

logger = logging.getLogger(__name__)

DESCRIPTION = "Export storage"


def add_subparser(parser):
    """Add the subparser that needs to be used for this command"""
    dump_parser = parser.add_parser("dump", help=DESCRIPTION, description=DESCRIPTION)

    cli.get_basic_args_group(dump_parser)

    dump_parser.add_argument(
        "-o",
        "--output",
        type=str,
        default="dump.pkl",
        help="Output file path (default: dump.pkl)",
    )

    dump_parser.add_argument(
        "-f",
        "--force",
        action="store_true",
        help="Whether to force overwrite if destination file already exists. "
        "If specified, delete destination file and recreate a new one from scratch. "
        "Otherwise (default), raise an error if destination file already exists.",
    )

    dump_parser.set_defaults(func=main)

    return dump_parser


def main(args):
    """Script to dump storage"""
    config = experiment_builder.get_cmd_config(args)
    storage = setup_storage(config.get("storage"))
    logger.info(f"Loaded src {storage}")
    dump_database(
        storage,
        args["output"],
        name=config.get("name"),
        version=config.get("version"),
        overwrite=args["force"],
    )
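
The help text of `--force` describes a simple rule for an existing destination file. As a rough illustration of that rule only (the body of `dump_database` is not shown on this page), a hypothetical `prepare_output` helper could look like this; the exception type is an assumption.

```python
import os


def prepare_output(path, overwrite=False):
    """Apply the documented ``--force`` rule to an output path (hypothetical helper)."""
    if os.path.exists(path):
        if not overwrite:
            # Assumption: the exact exception type raised by Oríon is not shown here.
            raise FileExistsError(f"Output file already exists: {path}")
        # With overwrite enabled, start again from scratch.
        os.remove(path)
    return path
```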
61 changes: 61 additions & 0 deletions src/orion/core/cli/db/load.py
@@ -0,0 +1,61 @@
#!/usr/bin/env python
"""
Storage import tool
===================

Import database content from a file.

"""
import logging

from orion.core.cli import base as cli
from orion.core.io import experiment_builder
from orion.storage.backup import load_database
from orion.storage.base import setup_storage

logger = logging.getLogger(__name__)

DESCRIPTION = "Import storage"


def add_subparser(parser):
    """Add the subparser that needs to be used for this command"""
    load_parser = parser.add_parser("load", help=DESCRIPTION, description=DESCRIPTION)

    cli.get_basic_args_group(load_parser)

    load_parser.add_argument(
        "file",
        type=str,
        help="File to import",
    )

    load_parser.add_argument(
        "-r",
        "--resolve",
        type=str,
        choices=("ignore", "overwrite", "bump"),
        help="Strategy to resolve conflicts: "
        "'ignore', 'overwrite' or 'bump' "
        "(bump version of imported experiment). "
        "When overwriting, prior trials will be deleted. "
        "If not specified, an exception will be raised on any conflict detected.",
    )

    load_parser.set_defaults(func=main)

    return load_parser


def main(args):
    """Script to import storage"""
    config = experiment_builder.get_cmd_config(args)
    storage = setup_storage(config.get("storage"))
    logger.info(f"Loaded dst {storage}")
    load_database(
        storage,
        load_host=args["file"],
        resolve=args["resolve"],
        name=config.get("name"),
        version=config.get("version"),
    )
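
Note that `--resolve` has no default, so omitting it yields `None`, which is the "no policy" case described in the help text. The standalone sketch below reproduces only the two arguments defined in this file, on a plain `argparse` parser rather than Oríon's subparser machinery; `make_load_parser` is a hypothetical name.

```python
import argparse


def make_load_parser():
    """Standalone parser mirroring the ``file`` and ``--resolve`` arguments."""
    parser = argparse.ArgumentParser(prog="load")
    parser.add_argument("file", type=str, help="File to import")
    parser.add_argument(
        "-r",
        "--resolve",
        type=str,
        choices=("ignore", "overwrite", "bump"),
        help="Strategy to resolve conflicts",
    )
    return parser


if __name__ == "__main__":
    parser = make_load_parser()
    # Without -r, resolve is None; with -r, it must be one of the three choices.
    print(parser.parse_args(["backup.pkl"]).resolve)
    print(parser.parse_args(["backup.pkl", "-r", "bump"]).resolve)
```

Any value outside the three choices is rejected by `argparse` before `main` is reached.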
2 changes: 1 addition & 1 deletion src/orion/core/cli/db/upgrade.py
@@ -135,7 +135,7 @@ def upgrade_documents(storage):
)

storage.update_experiment(uid=experiment, **experiment)
- storage.initialize_algorithm_lock(uid, algorithm)
+ storage.write_algorithm_lock(uid, algorithm)

for trial in storage.fetch_trials(uid=uid):
# trial_config = trial.to_dict()
1 change: 0 additions & 1 deletion src/orion/core/cli/frontend.py
@@ -17,7 +17,6 @@
from gunicorn.app.base import BaseApplication

logger = logging.getLogger(__name__)
- logger.setLevel(logging.INFO)

DESCRIPTION = "Starts Oríon Dashboard"

11 changes: 11 additions & 0 deletions src/orion/core/utils/__init__.py
@@ -14,6 +14,7 @@
from contextlib import contextmanager
from glob import glob
from importlib import import_module
from tempfile import NamedTemporaryFile

import pkg_resources

@@ -229,3 +230,13 @@ def sigterm_as_interrupt():
yield None

signal.signal(signal.SIGTERM, previous)


def generate_temporary_file(basename="dump", suffix=".pkl"):
    """Generate a temporary file where data could be saved.

    Create an empty file without collision.
    Return name of generated file.
    """
    with NamedTemporaryFile(prefix=f"{basename}_", suffix=suffix, delete=False) as tf:
        return tf.name
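
Because the file is created with `delete=False`, it still exists (empty) after the function returns, and the caller is responsible for cleaning it up. One possible use, sketched here under assumptions, is to write to the temporary file first and move it to its final location only once writing has succeeded; `write_then_move` is a hypothetical caller, and the helper is repeated so the example is self-contained.

```python
import os
import shutil
from tempfile import NamedTemporaryFile


def generate_temporary_file(basename="dump", suffix=".pkl"):
    """Same helper as in the diff, repeated so the sketch is self-contained."""
    with NamedTemporaryFile(prefix=f"{basename}_", suffix=suffix, delete=False) as tf:
        return tf.name


def write_then_move(data, output):
    """Write ``data`` to a temporary file, then move it to ``output``.

    Hypothetical caller: the temporary file is removed if anything fails.
    """
    tmp_path = generate_temporary_file()
    try:
        with open(tmp_path, "wb") as stream:
            stream.write(data)
        shutil.move(tmp_path, output)
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return output
```

This keeps a half-written file from ever appearing at the destination path.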