Releases · ufal/udpipe

16 Nov 12:47

foxik

v2.1.0

ba295ec

UDPipe 2.1.0 (Latest)

Compared to UDPipe 2.0.0:

Add support for using a morphological dictionary via ufal.morphodita during prediction – if the dictionary returns some analyses for a given form, we return the one most probable according to the predicted logits.
Add support for --no_single_root in the evaluation script.
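The dictionary-constrained prediction described above can be illustrated with a minimal sketch. It is not the UDPipe 2 implementation: the function name, the flat list of tag logits, and the fallback to the unconstrained best tag when the dictionary returns nothing are all assumptions made for illustration, and the ufal.morphodita API itself is not called here.

```python
def pick_analysis(logits, tag_vocab, dictionary_tags):
    """Return the highest-scoring tag, restricted to the dictionary analyses if any.

    logits          -- one predicted score per entry of tag_vocab (hypothetical layout)
    tag_vocab       -- list of tags the model can predict
    dictionary_tags -- set of tags the dictionary returned for this form
    """
    # Indices of model tags that the dictionary also proposes for the form.
    allowed = [i for i, tag in enumerate(tag_vocab) if tag in dictionary_tags]
    # Assumption: with no usable dictionary analysis, fall back to all tags.
    candidates = allowed if allowed else range(len(tag_vocab))
    best = max(candidates, key=lambda i: logits[i])
    return tag_vocab[best]


tag_vocab = ["NOUN", "VERB", "ADJ"]
logits = [0.2, 1.5, 0.9]
constrained = pick_analysis(logits, tag_vocab, {"NOUN", "ADJ"})
unconstrained = pick_analysis(logits, tag_vocab, set())
```

Here the model alone would prefer VERB, but the dictionary only admits NOUN and ADJ, so the constrained choice is the better-scoring of those two.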

16 Nov 09:01

foxik

v1.3.1

da13c46

UDPipe 1.3.1

Maintenance release of UDPipe1.

Changes since UDPipe 1.3.0:

Update MorphoDiTa to 1.11.2.

16 Feb 18:24

foxik

v1.3.0

14a5ba7

UDPipe 1.3.0

Maintenance release of UDPipe1.

Changes since UDPipe 1.2.0:

Fix all UndefinedBehaviorSanitizer and AddressSanitizer findings.
Add segment_size and learning_rate_final parameters to tokenizer training.
Add several options to udpipe_server.
Fix bug in returning the trained model as a string; use bytes instead.
Fix a bug where newlines after URLs/emails were treated as plain spaces.
Fix a silent error on aarch64 caused by assuming char is signed.
On Windows, the file paths are now UTF-8 encoded, instead of ANSI. This change affects the API, binary arguments, and program outputs.
The Windows binaries are now compiled with VS 2019; systems older than Windows 7 are no longer supported.
Add ARM64 macOS build.
Python wheels are provided for Python 3.6-3.11.

05 Aug 10:49

foxik

v2.0.0

57e342c

UDPipe 2.0.0

Compared to UDPipe 1:

UDPipe 2 is Python-only and tested only on Linux,
UDPipe 2 is meant as a research tool, not as a user-friendly UDPipe 1 replacement,
UDPipe 2 achieves much better accuracy, but requires a GPU for reasonable speed,
UDPipe 2 does not perform tokenization by itself – it uses UDPipe 1 for that.

UDPipe 2 is available as a REST service running at https://lindat.mff.cuni.cz/services/udpipe. If you like, you can use the udpipe2_client.py script to interact with it.

However, if you prefer to run UDPipe 2 locally, you can use this release.

Running Inference with Existing Models

To run UDPipe 2, you need to first download a model from the list of UDPipe 2 models. Then you can run UDPipe 2 as a local REST server, and use the udpipe2_client.py script to interact with it (in the same way as with the official service).
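As a rough illustration of what a client request to the service might look like, here is a minimal sketch using only the Python standard library. It is not the udpipe2_client.py script. The /api/process path, the data, model, tokenizer, tagger and parser parameter names, and the "result" field of the JSON response are assumptions based on the public UDPipe REST API and should be checked against the service documentation.

```python
import json
import urllib.parse
import urllib.request

OFFICIAL_SERVICE = "https://lindat.mff.cuni.cz/services/udpipe"


def build_process_request(text, model=None, service=OFFICIAL_SERVICE):
    """Build (but do not send) a POST request asking for tokenizing, tagging and parsing."""
    # Assumed parameter names; empty values request the default behaviour of each step.
    params = {"data": text, "tokenizer": "", "tagger": "", "parser": ""}
    if model is not None:
        params["model"] = model
    body = urllib.parse.urlencode(params).encode("utf-8")
    # Supplying a body makes urllib issue a POST request.
    return urllib.request.Request(service + "/api/process", data=body)


def process(text, model=None, service=OFFICIAL_SERVICE):
    """Send the request and return the assumed "result" field of the JSON reply."""
    request = build_process_request(text, model, service)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["result"]
```

To talk to a local server instead of the official service, pass its address as `service`, for example `process("Hello world.", service="http://localhost:8001")`, where the port is whatever the local server was started with.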

To run the server, use the udpipe2_server.py script.

Install the dependencies listed in requirements.txt. While only TF 1 is supported for model training (ancient, I know), you can also use TF 2 for inference.
The script has the following required options:
- port: the port to listen on. We use SO_REUSEPORT to allow multiple processes to run concurrently, supporting seamless upgrades;
- default_model: model name to use when no model is specified in the request;
- models: each model is then a quadruple of the following parameters (each published model contains a file MODEL.txt with these parameters):
  - model names: any number of model names separated by :; furthermore, any hyphen-separated prefix of any model name can also be used as a name (e.g., czech-pdt-ud-2.10-220711:cs_pdt-ud-2.10-220711:cs:ces:cze);
  - model path: path to the model directory;
  - treebank name: because multiple treebanks can be handled by a single model, we need to specify a treebank name to use (this also specifies which tokenizer to use from the model directory);
  - acknowledgements: a URL to the model's acknowledgements.
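The rule that any hyphen-separated prefix of a model name is also a valid name can be made concrete with a short sketch. The function name is hypothetical, and how the server resolves a prefix shared by several models is not described above, so it is not modelled here.

```python
def model_aliases(names_field):
    """List every name accepted for one model: each listed name and each of its
    hyphen-separated prefixes, in order of first appearance."""
    aliases = []
    for name in names_field.split(":"):
        parts = name.split("-")
        # Prefixes made of the first 1, 2, ..., len(parts) hyphen-separated pieces.
        for end in range(1, len(parts) + 1):
            alias = "-".join(parts[:end])
            if alias not in aliases:
                aliases.append(alias)
    return aliases


aliases = model_aliases("czech-pdt-ud-2.10-220711:cs_pdt-ud-2.10-220711:cs:ces:cze")
```

For the example names field, the aliases include czech, czech-pdt, czech-pdt-ud-2.10 and cs_pdt, in addition to the five names listed explicitly.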
The script has the following optional parameters:
- --batch_size: batch size to use (default 32);
- --logfile: if specified, log to this file instead of standard error;
- --max_request_size: maximum request size, in bytes (default 4MB);
- --preload_models: list of models to preload (or all) immediately after start (default none);
- --threads: number of threads to use (default is to use all physical cores);
- --wembedding_server: for deployment purposes, it might be useful to compute the contextualized embeddings (mBERT, RobeCzech) not in the UDPipe 2 service, but in a specialized service – see https://github.com/ufal/wembedding_service for documentation of the wembeddings service (default is to compute the embeddings directly in the UDPipe 2 service).
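Putting the options together, the following sketch assembles a command line for the server. It assumes that port, default_model and the model quadruples are positional arguments given in that order, which the list above suggests but does not state. The helper name and every concrete value in the usage example (port, model path, treebank name, acknowledgements URL) are hypothetical; the real values come from the MODEL.txt file shipped with each model.

```python
def server_command(port, default_model, models, batch_size=None, threads=None,
                   script="udpipe2_server.py"):
    """Return an argv list for starting the server; nothing is executed here.

    models -- iterable of (names, path, treebank, acknowledgements) quadruples
    """
    argv = ["python3", script]
    if batch_size is not None:
        argv += ["--batch_size", str(batch_size)]
    if threads is not None:
        argv += ["--threads", str(threads)]
    # Assumed positional order: port, default model, then the flattened quadruples.
    argv += [str(port), default_model]
    for names, path, treebank, acknowledgements in models:
        argv += [names, path, treebank, acknowledgements]
    return argv


command = server_command(
    8001,                                    # hypothetical port
    "cs",                                    # default model name
    [(
        "czech-pdt-ud-2.10-220711:cs_pdt-ud-2.10-220711:cs:ces:cze",
        "models/czech-pdt-ud-2.10-220711",   # hypothetical model directory
        "cs_pdt",                            # hypothetical treebank name
        "https://example.org/acknowledgements",  # hypothetical URL
    )],
    batch_size=16,
)
```

The resulting list can be passed to `subprocess.run(command)` once the paths point to a downloaded model.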

The service can be stopped by a SIGINT (Ctrl+C) signal or by a SIGUSR1 signal. Once such a signal is received, the service stops accepting new requests, but waits until all existing connections are handled and closed.
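The shutdown behaviour described above (refuse new requests, finish the ones in progress) can be sketched as follows. This is an illustrative pattern under stated assumptions, not the server's actual code: the class and method names are hypothetical, and a real server would call `try_begin` and `end` around each request it handles.

```python
import signal
import threading


class GracefulStop:
    """Track in-flight requests and refuse new ones once a stop signal arrives."""

    def __init__(self):
        self.stopping = threading.Event()
        self._in_flight = 0
        self._cond = threading.Condition()

    def install(self, signals=(signal.SIGINT, signal.SIGUSR1)):
        # Must be called from the main thread; the handler only sets a flag.
        for sig in signals:
            signal.signal(sig, lambda signum, frame: self.stopping.set())

    def try_begin(self):
        """Register a new request; return False if the service is stopping."""
        with self._cond:
            if self.stopping.is_set():
                return False
            self._in_flight += 1
            return True

    def end(self):
        """Mark one request as finished and wake up anyone waiting for idle."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def wait_idle(self, timeout=None):
        """Block until no requests are in flight; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)
```

After a signal sets `stopping`, the main loop would stop accepting connections, call `wait_idle()`, and then exit.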

The models are loaded on-demand, but they are never freed. If a GPU is available, then all computation is performed on it (and an OOM might occur if too many models are loaded). If you would like to run BERT on a GPU and the remaining computation on a CPU, you could use GPU-enabled wembeddings service plus a CPU-only UDPipe 2 service.

02 Aug 19:08

foxik

v1.2.0

3fd3a2c

UDPipe 1.2.0

Changes since UDPipe 1.1.0:

On-demand loading of models in REST server, with a pool of least recently used models.
Make GRU tokenizer dimension configurable (16, 24, 64 supported).
Track paragraph boundaries even under normalized_spaces.
Support experimental sentence segmentation jointly using both the tokenizer and the parser.
Add EPE output format.
Make default model in REST server explicit.
Support pre-filling according to URL params in the webapp.
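The on-demand model loading with a pool of least recently used models, mentioned in the first item above, follows a standard caching pattern. The sketch below shows that pattern in Python for illustration only; UDPipe 1 itself is not written in Python, and the class name, the loader callback and the capacity parameter are all hypothetical.

```python
from collections import OrderedDict


class ModelPool:
    """Load models on first use and keep only the most recently used ones."""

    def __init__(self, loader, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._loader = loader          # callable: model name -> loaded model
        self._capacity = capacity
        self._models = OrderedDict()   # oldest entry first, newest last

    def get(self, name):
        if name in self._models:
            # Cache hit: mark the model as most recently used.
            self._models.move_to_end(name)
        else:
            # Cache miss: load on demand, then evict the least recently used.
            self._models[name] = self._loader(name)
            if len(self._models) > self._capacity:
                self._models.popitem(last=False)
        return self._models[name]


loads = []

def fake_loader(name):
    loads.append(name)        # record each real load for the demonstration
    return name.upper()

pool = ModelPool(fake_loader, capacity=2)
for requested in ["a", "b", "a", "c", "a", "b"]:
    pool.get(requested)
```

In this run, requesting "c" evicts "b" (the least recently used model at that point), so the final request for "b" triggers a second load while "a" is never reloaded.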

29 Mar 10:24

foxik

v1.1.0

a938e36

UDPipe 1.1.0

Changes since UDPipe 1.0.0:

Morphodita_parsito models (now version 3) require at least UDPipe version 1.1.0.
CoNLL-U v2 format is supported. Notably spaces in forms and lemmas are now allowed, as are empty nodes.
Support options for input_format and output_format instances.
Preserve all spacing when tokenizing.
Optionally generate document-level token ranges in the original text.
Optionally respect given segmentation during tokenization.
Tokenizer can be trained to allow spaces in tokens (default if there are forms with spaces in the training data).
Parser can be trained to always return one root per sentence (default).
Improve input_format API to allow inter-block state (for correct tracking of inter-sentence spaces and document-level offsets).
Improve output_format API to support begin/end document marks and to allow state in the output_format instance (to allow numbering output sentences, for example).
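To show what the CoNLL-U v2 additions mentioned above mean for a reader of the format, here is a small sketch that classifies one token line. It relies only on the CoNLL-U conventions that columns are tab-separated, that multiword token IDs are ranges such as 1-2, and that empty node IDs are decimals such as 1.1. The function name is hypothetical and this is not UDPipe's own reader.

```python
def classify_conllu_line(line):
    """Return (kind, form) for one CoNLL-U token line.

    kind is "multiword" for ID ranges, "empty" for empty nodes, "word" otherwise.
    """
    # Split on tabs only, so spaces inside forms and lemmas are preserved.
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 10:
        raise ValueError("expected 10 tab-separated columns, got %d" % len(columns))
    token_id, form = columns[0], columns[1]
    if "-" in token_id:
        kind = "multiword"
    elif "." in token_id:
        kind = "empty"
    else:
        kind = "word"
    return kind, form


spaced = classify_conllu_line("1\tNew York\tNew York\tPROPN\t_\t_\t0\troot\t_\t_")
empty = classify_conllu_line("1.1\tlikes\tlike\tVERB\t_\t_\t_\t_\t2:conj\t_")
```

A reader that split on arbitrary whitespace would break on the first line, which is why support for spaces in forms requires tab-only splitting.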

27 May 06:54

foxik

v1.0.0

83d7c77

UDPipe 1.0.0

Initial public release.
