Primitive device approximation: a machine learning extension for the PREDICT Toolbox, based on PyTorch Lightning. Train neural networks on PREDICT data to model the behavior of primitive devices.
Tested with:

- conda: 4.9.2
- pip: 21.0.1
- python: 3.8.8
- CUDA: 11.2
- Nvidia Driver: 460.73.01

Everything else is specified in requirements.txt. Other/higher versions of these dependencies may work, but are untested.
Clone this repository and cd into the directory:

$ git clone https://github.com/electronics-and-drives/precept.git
$ cd precept

Then install the python package:

$ pip install .
This makes the precept API available, as well as the two CLIs pct (for training) and prc (for inference).

Both CLIs are configured via .yml files. For more information about all options, see the corresponding help:
$ pct --help
$ prc --help
Training is started by passing a configuration file:

$ pct --config ./examples/train.yml

The training configuration supports the following options:
model:
  learning_rate: <float, default = 0.001>
  beta_1: <float, default = 0.9>
  beta_2: <float, default = 0.999>
data:
  data_path: <string>        # Path to HDF5 database
  params_x: <[string]>       # List of input column names
  params_y: <[string]>       # List of output column names
  trafo_mask_x: <[string]>   # List of input parameters that will be transformed
  trafo_mask_y: <[string]>   # List of output parameters that will be transformed
  batch_size: <int, default = 2000>
  test_split: <float, default = 0.2>
  num_workers: <int, default = 6>
  rng_seed: <int>
  serialize: <bool, default = true>
  device_name: <string>      # File name for output
  model_prefix: <string>     # Path where to store output
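For orientation, a filled-in train.yml might look like the following sketch. The parameter names match the HDF5 example further below; the paths and values are placeholders, not defaults:

```yaml
model:
  learning_rate: 0.001
  beta_1: 0.9
  beta_2: 0.999
data:
  data_path: ./data/nmos.h5          # placeholder path
  params_x: [W, L, Vgs, Vds, Vbs]
  params_y: [id, gm, gds, vth]
  trafo_mask_x: []
  trafo_mask_y: [id]                 # placeholder transformation mask
  batch_size: 2000
  test_split: 0.2
  num_workers: 6
  rng_seed: 666
  serialize: true
  device_name: nmos
  model_prefix: ./models
```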
A default config can be generated by running:
$ pct --print_config > default.yml
Additional documentation for Lightning-specific configuration options can be found in the PyTorch Lightning documentation.
The preferred file format for training data is currently HDF5, for which two different layouts are supported. In the first layout, the file has two fields: one named columns, containing a list of strings corresponding to the operating point parameter names, and another named data, containing the data matrix.
In [1]: list(map(lambda k: f"{k}: {hdf_file[k].shape}", hdf_file.keys()))
Out [1]: ['columns: (18,)', 'data: (18, 16105100)']
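As a sketch, such a file could be created with h5py; the parameter names and the number of operating points here are placeholders:

```python
import h5py
import numpy as np

# Placeholder operating point parameters (inputs and outputs in one table)
columns = ["W", "L", "Vgs", "Vds", "Vbs", "id", "gm", "gds", "vth"]
data = np.random.rand(len(columns), 1000)  # one row per parameter

with h5py.File("example.h5", "w") as h5:
    # Variable-length strings for the parameter names
    h5.create_dataset("columns", data=np.array(columns, dtype=h5py.string_dtype()))
    # Data matrix aligned with the order of `columns`
    h5.create_dataset("data", data=data)
```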
Alternatively, if storing/reading strings is not wanted or possible (command line utilities, Octave, ...), the file may be formatted such that each parameter names its own group in the file.
In [2]: list(map(lambda k: f"{k}: {f[k].shape}", f.keys()))
Out[2]:
['L: (14641000,)',
'Vbs: (14641000,)',
'Vds: (14641000,)',
'Vgs: (14641000,)',
'W: (14641000,)',
'cdb: (14641000,)',
'cds: (14641000,)',
'cgb: (14641000,)',
'cgd: (14641000,)',
'cgs: (14641000,)',
'csb: (14641000,)',
'fug: (14641000,)',
'gbd: (14641000,)',
'gbs: (14641000,)',
'gds: (14641000,)',
'gm: (14641000,)',
'gmbs: (14641000,)',
'id: (14641000,)',
'vdsat: (14641000,)',
'vth: (14641000,)']
The columns entries or group names are the headers for the stored data. They must align with the params_x and params_y specification in the given train.yml.
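The group-per-parameter layout can be produced the same way; a minimal sketch with placeholder names and values:

```python
import h5py
import numpy as np

num_points = 1000  # placeholder number of operating points
params = {"Vgs": np.linspace(0.0, 1.2, num_points),
          "Vds": np.linspace(0.0, 1.2, num_points),
          "id":  np.random.rand(num_points)}

with h5py.File("example_grouped.h5", "w") as h5:
    for name, values in params.items():
        # Each parameter names its own dataset; the name doubles as the header
        h5.create_dataset(name, data=values)
```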
If you need some toy data, check out pyrdict.
The inference interface prc is much simpler. Its configuration consists only of a dictionary of all the models that should be served:
host: <string, default = localhost>  # IP or hostname
port: <int, default = 5000>          # Port
models:
  <model-name>:
    model_path: <string>    # Path to <name>-model.bin
    config_path: <string>   # Path to <name>-model.yaml
  ...
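For example, an infer.yml serving a single model could look like this (the model name and paths are hypothetical):

```yaml
host: localhost
port: 5000
models:
  nmos:                                    # hypothetical model name
    model_path: ./models/nmos-model.bin
    config_path: ./models/nmos-model.yaml
```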
Start the Flask server with the prc command and a configuration like the one shown in examples/infer.yml:
$ prc --config ./examples/infer.yml
* Serving Flask app 'prc' (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
Then, models can be evaluated by sending requests with the following structure:
$ curl -X POST -H "Content-Type: application/json" \
-d '{"<model-name>": {"<param 1>": [vals...], "<param 2>": [vals...], ... }}' \
127.0.0.1:5000/predict
The values for each parameter must be given as a list, even if there is only one value. Every parameter must provide the same number of values, and all parameters must have been specified in params_x during training.
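The same request can also be issued programmatically, for example with the requests package. The model name and parameters below are hypothetical, and the JSON shape of the response is an assumption based on the curl example above:

```python
import requests

# Two operating points for a hypothetical model named "nmos"
payload = {"nmos": {"W":   [1.0e-6, 2.0e-6],
                    "L":   [1.5e-7, 1.5e-7],
                    "Vgs": [0.6, 0.9],
                    "Vds": [0.6, 0.6],
                    "Vbs": [0.0, 0.0]}}

response = requests.post("http://127.0.0.1:5000/predict", json=payload)
response.raise_for_status()
print(response.json())  # assumed: predicted output parameters per model
```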
An alternative inference method for TorchScript models is currently in the works, but still in very early development. It can be found in the precppt repository.
Commented API usage examples for both training and inference can be found in examples/training.py and examples/inference.py, respectively.
soon™
- Split classes into separate modules
- Install requirements in setup
- Dump scaler and transformer in after-fit
- Infer input and output size from x-parameters and y-parameters
- Implement serialization and compile model trace to torch script
- Implement model inference based on Flask
- TorchScript C++ interface
- CSV data support
- Don't hardcode CSV format
- TSV, ASCII, PSF, nutmeg, nutbin etc... support
- Alternative scaling and transforming for better use with serialized models
- Get rid of hard coded processing for additional parameter calculation
- Add training and inference API examples
- Add toy models for inference examples
- Notebooks as well
- Deprecate transformation, should be part of manual preprocessing
- Add better logging
- Add manpages for CLI(1), API(8) and CFG(5)
- Add tests
Copyright (C) 2021, Electronics & Drives Lab
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.