* Add thinc.config.Config class
* Update azure pipelines
* Fix Python2.7 support
* Add catalogue dependency
* Update test_config
* Import registry
* Use catalogue for Optimizer
* Fix load optimizer from config
* Add config.from_str method
* Add registry class
* Add config.md docs
* Use JSON values in config
* Remove currently not set initializers and wires registries
* Add SimpleEmbed class
* Register layers
* Register FeatureExtractor and LayerNorm layers
* Tweak config docs
Showing 14 changed files with 599 additions and 31 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,115 @@ | ||
# Configuration files | ||
|
||
You can describe Thinc models and experiments using a configuration format | ||
based on Python's built-in `configparser` module. We add a few conventions to | ||
the built-in format to support non-string values and nested objects, and to | ||
integrate with Thinc's *registry system*. | ||
|
||
## Example config | ||
|
||
|
||
``` | ||
[some_section] | ||
key1 = "string value" | ||
another_key = 1.0 | ||
# Comments, naturally | ||
third_key = ["values", "are parsed with", "json.loads()"] | ||
some_other_key = | ||
    { | ||
        "multiline values?": true | ||
    } | ||
# Describe nested sections with a dot notation in the section names. | ||
[some_section.subsection] | ||
# This will be moved, producing: | ||
# config["some_section"]["subsection"] = {"hi": true, "bye": false} | ||
hi = true | ||
bye = false | ||
[another_section] | ||
more_values = "yes!" | ||
null_values = null | ||
interpolation = ${some_section:third_key} | ||
``` | ||
|
||
The config format has two main differences from the built-in `configparser` | ||
module's behaviour: | ||
|
||
* JSON-formatted values. Thinc passes all values through `json.loads()` to | ||
interpret them. You can use atomic values like strings, floats, integers, | ||
or booleans, or you can use complex objects such as lists or maps. | ||
|
||
* Structured sections. Thinc uses a dot notation to build nested sections. If | ||
you have a section named `[outer_section.subsection]`, Thinc will parse that | ||
into a nested structure, placing `subsection` within `outer_section`. | ||
|
||
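As a rough illustration of these two conventions, the sketch below parses a small config using only the standard library. The helper name `parse_config` is hypothetical and not part of Thinc; it only mirrors the behaviour described above (JSON-decoded values and dot-separated section names turned into nested dictionaries).

```python
import configparser
import json

CONFIG_TEXT = """
[some_section]
key1 = "string value"
another_key = 1.0
third_key = ["a", "b"]

[some_section.subsection]
hi = true
bye = false
"""


def parse_config(text):
    """Parse config text into a nested dict (illustrative helper, not Thinc API)."""
    parser = configparser.ConfigParser(
        interpolation=configparser.ExtendedInterpolation()
    )
    parser.read_string(text)
    result = {}
    for section in parser.sections():
        # Walk (or create) one nested dict per dot-separated part of the name.
        node = result
        for part in section.split("."):
            node = node.setdefault(part, {})
        # Every value is decoded as JSON, so strings need explicit quotes.
        for key in parser[section]:
            node[key] = json.loads(parser.get(section, key))
    return result


parsed = parse_config(CONFIG_TEXT)
print(parsed["some_section"]["subsection"])
```

Because values go through `json.loads()`, an unquoted string such as `key1 = string value` would raise a decoding error rather than being read as text.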
## Registry integration | ||
|
||
Thinc's registry system lets you map string keys to functions. For instance, | ||
let's say you want to define a new optimizer. You would define a function that | ||
constructs it, and add it to the right registry, like so: | ||
|
||
```python | ||
|
||
import thinc | ||
|
||
@thinc.registry.optimizers.register("my_cool_optimizer.v1") | ||
def make_my_optimizer(learn_rate, gamma): | ||
    return MyCoolOptimizer(learn_rate, gamma) | ||
|
||
# Later you can retrieve your function by name: | ||
create_optimizer = thinc.registry.optimizers.get("my_cool_optimizer.v1") | ||
``` | ||
|
||
The registry lets you refer to your function by string name, which is | ||
often more convenient than passing around the function itself. This is | ||
especially useful for configuration files: you can provide the name of your | ||
function and the arguments in the config file, and you'll have everything you | ||
need to rebuild the object. | ||
|
||
Since this is a common workflow, the registry system provides a shortcut for | ||
it, the `registry.make_from_config()` function. To use it, you just need to | ||
follow a simple convention in your config file. | ||
|
||
If a section contains a key beginning with `@`, the `registry.make_from_config()` | ||
function will interpret the rest of that key as the name of the registry. The | ||
value will be interpreted as the name to look up. The rest of the section will | ||
be passed to your function as arguments. Here's a simple example: | ||
|
||
``` | ||
[optimizer] | ||
@optimizers = "my_cool_optimizer.v1" | ||
learn_rate = 0.001 | ||
gamma = 1e-8 | ||
``` | ||
|
||
The `registry.make_from_config()` function will fetch your | ||
`make_my_optimizer` function from the `optimizers` registry, call it using the | ||
`learn_rate` and `gamma` arguments, and set the result of the function under | ||
the key `"optimizer"`. | ||
|
||
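The lookup-and-call step can be sketched without Thinc at all. In the snippet below, the `OPTIMIZERS` dict and the `register_optimizer` decorator are hypothetical stand-ins for a registry, and a plain dict stands in for the optimizer object; only the `@` convention itself comes from the description above.

```python
# Hypothetical stand-in for a Thinc registry: a plain dict keyed by name.
OPTIMIZERS = {}


def register_optimizer(name):
    """Return a decorator that stores a constructor under `name`."""
    def decorator(func):
        OPTIMIZERS[name] = func
        return func
    return decorator


@register_optimizer("my_cool_optimizer.v1")
def make_my_optimizer(learn_rate, gamma):
    # A dict stands in for a real optimizer object.
    return {"learn_rate": learn_rate, "gamma": gamma}


# The parsed [optimizer] section from the config above.
section = {
    "@optimizers": "my_cool_optimizer.v1",
    "learn_rate": 0.001,
    "gamma": 1e-8,
}

# The "@" key names the function; every other key becomes a keyword argument.
func = OPTIMIZERS[section["@optimizers"]]
kwargs = {key: value for key, value in section.items() if not key.startswith("@")}
optimizer = func(**kwargs)
print(optimizer)
```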
You can even use the `registry.make_from_config()` function to build recursive | ||
structures. Let's say your optimizer supports some sort of fancy visualisation | ||
plugin that Thinc has never heard of. All you would need to do is create a new | ||
registry, named something like `visualizers`, and register a constructor | ||
function, such as `my_visualizer.v1`. You would also make a new version of your | ||
optimizer constructor, to pass in the new value. Now you can describe the | ||
visualizer plugin in your config, so you can use it as an argument to your optimizer: | ||
|
||
``` | ||
[optimizer] | ||
@optimizers = "my_cool_optimizer.v2" | ||
learn_rate = 0.001 | ||
gamma = 1e-8 | ||
[optimizer.visualizer] | ||
@visualizers = "my_visualizer.v1" | ||
format = "jpeg" | ||
host = "localhost" | ||
port = "8080" | ||
``` | ||
|
||
The `optimizer.visualizer` section will be placed under the | ||
`optimizer` object, using the key `visualizer` (see "structured sections" | ||
above). The `registry.make_from_config()` function will build the visualizer | ||
first, so that the result value is ready for the optimizer. |
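
A minimal sketch of this inside-out resolution order follows. It assumes plain dicts in place of the real registries, and the `resolve` helper and both constructor functions are hypothetical; unlike the `make_from_config()` implementation in this commit, the sketch also descends into sections that have no `@` key.

```python
def make_visualizer(format, host, port):
    # A string stands in for a real visualizer object.
    return "%s://%s:%s" % (format, host, port)


def make_optimizer_v2(learn_rate, gamma, visualizer):
    # A dict stands in for a real optimizer object.
    return {"learn_rate": learn_rate, "gamma": gamma, "visualizer": visualizer}


# Hypothetical registries: plain dicts standing in for Thinc's registry system.
REGISTRIES = {
    "optimizers": {"my_cool_optimizer.v2": make_optimizer_v2},
    "visualizers": {"my_visualizer.v1": make_visualizer},
}


def resolve(config):
    """Recursively build objects from a nested config dict (illustrative)."""
    id_keys = [key for key in config if key.startswith("@")]
    if len(id_keys) > 1:
        raise ValueError("Multiple registry keys in config: %s" % id_keys)
    # Resolve child sections first so their results can be passed as arguments.
    kwargs = {}
    for key, value in config.items():
        if key.startswith("@"):
            continue
        kwargs[key] = resolve(value) if isinstance(value, dict) else value
    if not id_keys:
        return kwargs
    registry_name = id_keys[0][1:]
    func = REGISTRIES[registry_name][config[id_keys[0]]]
    return func(**kwargs)


config = {
    "optimizer": {
        "@optimizers": "my_cool_optimizer.v2",
        "learn_rate": 0.001,
        "gamma": 1e-8,
        "visualizer": {
            "@visualizers": "my_visualizer.v1",
            "format": "jpeg",
            "host": "localhost",
            "port": "8080",
        },
    }
}
built = resolve(config)
print(built["optimizer"]["visualizer"])
```

By the time `make_optimizer_v2` is called, its `visualizer` argument is already the constructed object rather than the raw config section.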
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,3 +5,4 @@ | |
import numpy # noqa: F401 | ||
|
||
from .about import __name__, __version__ # noqa: F401 | ||
from ._registry import registry |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
import catalogue | ||
|
||
|
||
class registry(object): | ||
    optimizers = catalogue.create("thinc", "optimizers", entry_points=True) | ||
    schedules = catalogue.create("thinc", "schedules", entry_points=True) | ||
    layers = catalogue.create("thinc", "layers", entry_points=True) | ||
|
||
    @classmethod | ||
    def get(cls, name, key): | ||
        # `name` is the registry (e.g. "optimizers"), `key` the entry in it. | ||
        if not hasattr(cls, name): | ||
            raise ValueError("Unknown registry: %s" % name) | ||
        reg = getattr(cls, name) | ||
        func = reg.get(key) | ||
        if func is None: | ||
            raise ValueError("Could not find %s in %s" % (key, name)) | ||
        return func | ||
|
||
    @classmethod | ||
    def make_optimizer(cls, name, args, kwargs): | ||
        func = cls.optimizers.get(name) | ||
        return func(*args, **kwargs) | ||
|
||
    @classmethod | ||
    def make_schedule(cls, name, args, kwargs): | ||
        func = cls.schedules.get(name) | ||
        return func(*args, **kwargs) | ||
|
||
    # NOTE: the `initializers`, `combinators` and `transforms` registries are | ||
    # not defined on this class, so the three helpers that use them raise | ||
    # AttributeError when called. | ||
    @classmethod | ||
    def make_initializer(cls, name, args, kwargs): | ||
        func = cls.initializers.get(name) | ||
        return func(*args, **kwargs) | ||
|
||
    @classmethod | ||
    def make_layer(cls, name, args, kwargs): | ||
        func = cls.layers.get(name) | ||
        return func(*args, **kwargs) | ||
|
||
    @classmethod | ||
    def make_combinator(cls, name, args, kwargs): | ||
        func = cls.combinators.get(name) | ||
        return func(*args, **kwargs) | ||
|
||
    @classmethod | ||
    def make_transform(cls, name, args, kwargs): | ||
        func = cls.transforms.get(name) | ||
        return func(*args, **kwargs) | ||
|
||
    @classmethod | ||
    def make_from_config(cls, config, id_start="@"): | ||
        """Unpack a config dictionary, creating objects from the registry | ||
        recursively. | ||
        """ | ||
        id_keys = [key for key in config.keys() if key.startswith(id_start)] | ||
        if len(id_keys) >= 2: | ||
            raise ValueError("Multiple registry keys in config: %s" % id_keys) | ||
        elif len(id_keys) == 0: | ||
            return config | ||
        else: | ||
            getter = cls.get(id_keys[0].replace(id_start, ""), config[id_keys[0]]) | ||
            args = [] | ||
            kwargs = {} | ||
            for key, value in config.items(): | ||
                # Build nested sections first so their results can be passed on. | ||
                if isinstance(value, dict): | ||
                    value = cls.make_from_config(value, id_start=id_start) | ||
                # Numeric keys are positional arguments, ordered by index. | ||
                if isinstance(key, int) or key.isdigit(): | ||
                    args.append((int(key), value)) | ||
                elif not key.startswith(id_start): | ||
                    kwargs[key] = value | ||
            args = [value for key, value in sorted(args)] | ||
            return getter(*args, **kwargs) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
from __future__ import unicode_literals | ||
|
||
import configparser | ||
import json | ||
from pathlib import Path | ||
|
||
|
||
class Config(dict): | ||
    def __init__(self, data=None): | ||
        dict.__init__(self) | ||
        if data is None: | ||
            data = {} | ||
        self.update(data) | ||
|
||
    def interpret_config(self, config): | ||
        for section, values in config.items(): | ||
            parts = section.split(".") | ||
            node = self | ||
            for part in parts: | ||
                node = node.setdefault(part, {}) | ||
            for key, value in values.items(): | ||
                node[key] = json.loads(config.get(section, key)) | ||
|
||
    def from_str(self, text): | ||
        config = configparser.ConfigParser( | ||
            interpolation=configparser.ExtendedInterpolation()) | ||
        config.read_string(text) | ||
        for key in list(self.keys()): | ||
            self.pop(key) | ||
        self.interpret_config(config) | ||
        return self | ||
|
||
    def from_bytes(self, byte_string): | ||
        return self.from_str(byte_string.decode("utf8")) | ||
|
||
    def from_disk(self, path): | ||
        with Path(path).open("r", encoding="utf8") as file_: | ||
            text = file_.read() | ||
        return self.from_str(text) |