
Get ML request validation to 100% #2621

Open · pquentin opened this issue Jun 18, 2024 · 0 comments
For ML requests (not responses), 97.9% of the YAML tests pass type validation. Passing means that the specification is sufficient to type the request, which in turn means that all clients relying on the spec can send that request. If we get to 100%, we can start treating any regression as an error, to hopefully stay at 100%.
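To illustrate the goal, here is a minimal sketch (the function name and counts are illustrative, not actual tooling code) of the kind of CI gate that becomes possible once we reach 100%:

```typescript
// Illustrative sketch only: at 100%, any regression can fail the build
// instead of just lowering a statistic.
function requestValidationGate(passed: number, total: number): boolean {
  // Anything below 100% fails the gate.
  return passed === total;
}

// At today's ~97.9% (e.g. 979 of 1000 tests), the gate would still fail:
requestValidationGate(979, 1000);  // false
requestValidationGate(1000, 1000); // true
```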

Here are all the current issues, sorted by API. Sometimes the issue is in our tooling, so I'd appreciate it if @elastic/ml-core and/or @elastic/ml-ui could verify what I'm saying here and, ideally, send fixes. Thank you!

  • get_trained_models
    • include_model_definition is missing
  • post_data
    • body is specified as an Array of data, but the actual type appears to be a plain string that is parsed afterwards?
  • preview_datafeed
    • The datafeed config accepts "indices" according to the docs, but 7 YAML tests send "indexes". Example
  • put_data_frame_analytics
    • body._meta is missing
  • put_datafeed
    • The datafeed config accepts "aggregations" according to the docs (and spec), but the "Test put datafeed with aggregations" test uses "aggs"
  • put_job
    • body._job_id is missing
    • body.analysis_limits.model_memory_limit can be sent as an integer
    • body.ignore_throttled is missing from the docs and spec. According to the docs, it's supposed to only be possible inside indices_options
  • put_trained_model
    • bert_ja, used by "Test put model config with Japanese tokenizer", is missing from
      export class TokenizationConfigContainer {
        /** Indicates BERT tokenization and its options */
        bert?: NlpBertTokenizationConfig
        /**
         * Indicates MPNET tokenization and its options
         * @availability stack since=8.1.0
         * @availability serverless
         */
        mpnet?: NlpBertTokenizationConfig
        /**
         * Indicates RoBERTa tokenization and its options
         * @availability stack since=8.2.0
         * @availability serverless
         */
        roberta?: NlpRobertaTokenizationConfig
      }
  • update_job
    • description is missing from
      export class Detector {
        /**
         * The field used to split the data. In particular, this property is used for analyzing the splits with respect to their own history. It is used for finding unusual values in the context of the split.
         */
        by_field_name?: Field
        /**
         * Custom rules enable you to customize the way detectors operate. For example, a rule may dictate conditions under which results should be skipped. Kibana refers to custom rules as job rules.
         */
        custom_rules?: DetectionRule[]
        /**
         * A description of the detector.
         */
        detector_description?: string
        /**
         * A unique identifier for the detector. This identifier is based on the order of the detectors in the `analysis_config`, starting at zero. If you specify a value for this property, it is ignored.
         */
        detector_index?: integer
        /**
         * If set, frequent entities are excluded from influencing the anomaly results. Entities can be considered frequent over time or frequent in a population. If you are working with both over and by fields, you can set `exclude_frequent` to `all` for both fields, or to `by` or `over` for those specific fields.
         */
        exclude_frequent?: ExcludeFrequent
        /**
         * The field that the detector uses in the function. If you use an event rate function such as count or rare, do not specify this field. The `field_name` cannot contain double quotes or backslashes.
         */
        field_name?: Field
        /**
         * The analysis function that is used. For example, `count`, `rare`, `mean`, `min`, `max`, or `sum`.
         */
        function?: string
        /**
         * The field used to split the data. In particular, this property is used for analyzing the splits with respect to the history of all splits. It is used for finding unusual values in the population of all splits.
         */
        over_field_name?: Field
        /**
         * The field used to segment the analysis. When you use this property, you have completely independent baselines for each value of this field.
         */
        partition_field_name?: Field
        /**
         * Defines whether a new series is used as the null series when there is no value for the by or partition fields.
         * @server_default false
         */
        use_null?: boolean
      }

Also, put_trained_model_definition_part and put_trained_model_vocabulary don't have any known YAML tests.

And finally, there are issues caused by our test recording, which splits all comma-separated values into lists. That allows validating parameters like expand_wildcards, but breaks for job_id and model_id. I'll fix that.
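The recording problem can be sketched like this (illustrative code, not the actual recorder):

```typescript
// Hypothetical sketch of the recording bug: every comma-separated string
// is split into a list. That is right for genuinely multi-valued
// parameters such as expand_wildcards, but wrong for parameters typed as
// a single string (job_id, model_id), whose recorded value then no
// longer matches the spec's `string` type.
function recordValue(raw: string): string | string[] {
  return raw.includes(",") ? raw.split(",") : raw;
}

// Fine: expand_wildcards really is a list.
recordValue("open,closed"); // ["open", "closed"]
// Broken: a comma-separated job_id expression is recorded as a list,
// so it fails validation against the spec's string type.
recordValue("job-1,job-2");
```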
