
Get ML request validation to 100% #2621

Open · pquentin opened this issue Jun 18, 2024 · 0 comments
For ML requests (not responses), 97.9% of the YAML tests pass type validation. Passing means that the specification is sufficient to type the request, which in turn means that all clients relying on the spec can send that request. If we get to 100%, we can start treating any regression as an error, to hopefully stay at 100%.
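To illustrate the goal, here is a minimal sketch (the function name and counts are illustrative, not actual tooling code) of the kind of CI gate that becomes possible once we reach 100%:

```typescript
// Illustrative sketch only: at 100%, any regression can fail the build
// instead of just lowering a statistic.
function requestValidationGate(passed: number, total: number): boolean {
  // Anything below 100% fails the gate.
  return passed === total;
}

// At today's ~97.9% (e.g. 979 of 1000 tests), the gate would still fail:
requestValidationGate(979, 1000);  // false
requestValidationGate(1000, 1000); // true
```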

Here are all the current issues, sorted by API. Sometimes the issue is in our tooling, so I'd appreciate it if @elastic/ml-core and/or @elastic/ml-ui could verify what I'm saying here and, ideally, send fixes. Thank you!

  • get_trained_models
    • include_model_definition is missing
  • post_data
    • body is specified as an Array of data, but the actual type appears to be a plain string that is parsed afterwards?
  • preview_datafeed
    • The datafeed config accepts "indices" according to the docs, but 7 YAML tests send "indexes". Example
  • put_data_frame_analytics
    • body._meta is missing
  • put_datafeed
    • The datafeed config accepts "aggregations" according to the docs (and spec), but the "Test put datafeed with aggregations" test uses "aggs"
  • put_job
    • body._job_id is missing
    • body.analysis_limits.model_memory_limit can be sent as an integer
    • body.ignore_throttled is missing from the docs and spec. According to the docs, it's supposed to only be possible inside indices_options
  • put_trained_model
    • bert_ja, used by "Test put model config with Japanese tokenizer", is missing from
      export class TokenizationConfigContainer {
        /** Indicates BERT tokenization and its options */
        bert?: NlpBertTokenizationConfig
        /**
         * Indicates MPNET tokenization and its options
         * @availability stack since=8.1.0
         * @availability serverless
         */
        mpnet?: NlpBertTokenizationConfig
        /**
         * Indicates RoBERTa tokenization and its options
         * @availability stack since=8.2.0
         * @availability serverless
         */
        roberta?: NlpRobertaTokenizationConfig
      }
  • update_job
    • description is missing from
      export class Detector {
        /**
         * The field used to split the data. In particular, this property is used for analyzing the splits with respect to their own history. It is used for finding unusual values in the context of the split.
         */
        by_field_name?: Field
        /**
         * Custom rules enable you to customize the way detectors operate. For example, a rule may dictate conditions under which results should be skipped. Kibana refers to custom rules as job rules.
         */
        custom_rules?: DetectionRule[]
        /**
         * A description of the detector.
         */
        detector_description?: string
        /**
         * A unique identifier for the detector. This identifier is based on the order of the detectors in the `analysis_config`, starting at zero. If you specify a value for this property, it is ignored.
         */
        detector_index?: integer
        /**
         * If set, frequent entities are excluded from influencing the anomaly results. Entities can be considered frequent over time or frequent in a population. If you are working with both over and by fields, you can set `exclude_frequent` to `all` for both fields, or to `by` or `over` for those specific fields.
         */
        exclude_frequent?: ExcludeFrequent
        /**
         * The field that the detector uses in the function. If you use an event rate function such as count or rare, do not specify this field. The `field_name` cannot contain double quotes or backslashes.
         */
        field_name?: Field
        /**
         * The analysis function that is used. For example, `count`, `rare`, `mean`, `min`, `max`, or `sum`.
         */
        function?: string
        /**
         * The field used to split the data. In particular, this property is used for analyzing the splits with respect to the history of all splits. It is used for finding unusual values in the population of all splits.
         */
        over_field_name?: Field
        /**
         * The field used to segment the analysis. When you use this property, you have completely independent baselines for each value of this field.
         */
        partition_field_name?: Field
        /**
         * Defines whether a new series is used as the null series when there is no value for the by or partition fields.
         * @server_default false
         */
        use_null?: boolean
      }

Also, put_trained_model_definition_part and put_trained_model_vocabulary don't have any known YAML tests.

And finally, there are issues caused by our test recording, which splits all comma-separated values into lists. That allows validating parameters like expand_wildcards, but breaks for job_id and model_id. I'll fix that.
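The recording problem can be sketched like this (illustrative code, not the actual recorder):

```typescript
// Hypothetical sketch of the recording bug: every comma-separated string
// is split into a list. That is right for genuinely multi-valued
// parameters such as expand_wildcards, but wrong for parameters typed as
// a single string (job_id, model_id), whose recorded value then no
// longer matches the spec's `string` type.
function recordValue(raw: string): string | string[] {
  return raw.includes(",") ? raw.split(",") : raw;
}

// Fine: expand_wildcards really is a list.
recordValue("open,closed"); // ["open", "closed"]
// Broken: a comma-separated job_id expression is recorded as a list,
// so it fails validation against the spec's string type.
recordValue("job-1,job-2");
```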
