Refactor & test deployment configuration (e.g. RulesTxtDeploymentService) for Elasticsearch support #56

Open
pbartusch opened this issue Dec 1, 2020 · 13 comments

Comments

@pbartusch
Collaborator

Deployment possibilities for SMUI have grown rapidly. The configuration is hard to understand and the corresponding code is hard to maintain. This applies especially to:

  • RulesTxtDeploymentService
  • application.conf

Approach:

Step #1: Document all deployment possibilities that SMUI should support (already taking future Elasticsearch support, #43, into account).
Step #2: Derive a config schema (for application.conf).
Step #3: Refactor the code (breaking change).

@pbartusch pbartusch changed the title Refactor & test deployment configuration (RulesTxtDeploymentService) Refactor & test deployment configuration (e.g. RulesTxtDeploymentService) for Elasticsearch support Dec 15, 2020
@pbartusch
Collaborator Author

pbartusch commented Dec 15, 2020

The major goal of this story is to:

  • Refactor the deployment part of SMUI so that the complexity of supporting a second search engine (Elasticsearch) can be handled and maintained in the future.
  • Include an ES deployment procedure as a proof of concept (and thereby prepare Support for Elasticsearch #43).

Constraints:

  • As the deployment configuration needs to evolve, a breaking change to v4 is suggested. Prior versions of SMUI need to be adjusted to the new configuration specification.
  • Within the scope of this issue, the deployment options within SMUI should be reworked to realise a deployment to Elasticsearch with the status-quo rules.txt files coming from SMUI. Adjusting and validating Solr vs. Elasticsearch rules should be part of Support for Elasticsearch #43.

SMUI's deployment options (plan):

  • local deployment: for DEV setup, no need for a Querqy-enabled search engine.
  • solr-local deployment: evolution of conf/smui2solr.sh.
  • git-repository deployment: evolution of conf/smui2git.sh.
  • elasticsearch deployment: new deployment procedure to support Elasticsearch.
  • Furthermore custom deployment procedures should be supported. The deployment procedure from Chorus is a good example. It should be adopted as a proof of concept for a custom deployment procedure.

The following refactoring steps are suggested in order to sustain maintainability for SMUI with respect to the deployment options:

  • All shell scripts realising the different deployment procedures should be bundled in conf/deployment. Renaming should be done accordingly (e.g. conf/deployment/solr-local.sh, see above).
  • All deployment shell scripts should be operable in an "ECHO mode" (echoing an identification and all of their command line params) in order to be testable without the target system setup (e.g. Solr) being present (see e.g.: test/services/RulesTxtDeploymentServiceConfigVariantsSpec.scala).
  • The concept of a "solr index" should evolve to a "rules collection". This includes:
    • the database schema (SQL),
    • the scala model (e.g.: app/models/SolrIndex.scala and all of its references),
    • the REST API (e.g.: /api/v1/solr-index, see conf/routes),
    • the documentation (setup instructions and Chorus),
    • the frontend code.
  • Resolve all deployment configurations done in app/models/FeatureToggleModel.scala into an explicit deployment model under app/models/config.
  • Refactor app/services/RulesTxtDeploymentService.scala accordingly.
  • Temp files should be an implementation detail of SMUI (if necessary). There should exist an explicit export folder for rules.txt files (as a default to all the deployment scripts above).
  • local should be the default deployment (especially for a "Quickstart", see documentation).
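
The "ECHO mode" idea from the list above could be sketched roughly as follows. This is only an illustration in Python, not SMUI code: the `deploy` helper, its `echo_mode` flag and the `ECHO` prefix are hypothetical names chosen for this example.

```python
import shlex
import subprocess
from typing import List


def deploy(procedure: str, args: List[str], echo_mode: bool = False) -> str:
    """Run a deployment procedure, or only report what would be run."""
    command = [procedure, *args]
    if echo_mode:
        # ECHO mode: identify the procedure and list all of its params,
        # without touching any target system (e.g. Solr).
        return "ECHO " + " ".join(shlex.quote(part) for part in command)
    # Normal mode: actually execute the script and return its output.
    completed = subprocess.run(command, check=True, capture_output=True, text=True)
    return completed.stdout


print(deploy(
    "conf/deployment/solr-local.sh",
    ["PRELIVE", "ecommerce", "/export", "rules.txt"],
    echo_mode=True,
))
```

A test can then assert on the echoed string instead of requiring a running search engine.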

Explicit deployment configuration:

  • Configuration options are unstructured and "all over the place" in v3. Those should be resolved (see above) and made explicit (using custom parameters specific to the deployment procedure), e.g.:
smui.deployment.PRELIVE = {
  'procedure': 'conf/deployment/git-repository.sh',
  'params': {
    'repo': 'https+ssh://my-repo-on.domain.tld'
    ...
  }
}
  • The interface to the deployment script could look like this:
{SMUI_DEPLOYMENT_PROCEDURE}.sh {DEPLOYMENT_INSTANCE} {RULES_COLLECTION_NAME} {EXPORT_PATH} {RULES_TXT_FILE(S) as ordered comma separated list} {PROCEDURE_SPECIFIC_PARAMS as --key=value}

e.g.:
git-repository.sh PRELIVE ecommerce /export common-rules.txt,decompound-rules.txt,spelling-rules.txt --repo=https+ssh://my-repo-on.domain.tld ...
  • There could also be the possibility to address a native Scala deployment procedure, e.g.:
smui.deployment.PRELIVE = {
  'procedure': 'services.deployment.ElasticsearchDeployment',
  'params': {
    'url': 'https://my-elasticsearch-instance-on.domain.tld'
    ...
  }
}
  • Configuration should be done in SMUI through an explicit smui.conf file (like Chorus does it). There should be no env var option as the configuration is too complex (local deployment will remain default).
  • Explicit validation of SMUI startup (presenting an error on misconfiguration) is needed.
  • As the change for SMUI - with this planned v4 - is breaking it should be considered to split configuration into "setup" & "customisation" in general, where only "setup" configurations can be controlled via env vars, and all "customisation" configurations should be done via a smui.conf (see above, this could account for e.g. toggle.activate-spelling). This should include the tag configuration, now being done via an explicit, extra JSON file.
  • The documentation on querqy-docs for SMUI must be adapted accordingly.
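
To make the proposed script interface and the startup validation more concrete, here is a minimal sketch in Python. It assumes a hypothetical `DeploymentConfig` structure holding one `smui.deployment.<INSTANCE>` entry. The class, the `validate` rules and `build_command` are illustrative names only and do not exist in SMUI.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class DeploymentConfig:
    instance: str   # e.g. "PRELIVE"
    procedure: str  # e.g. "conf/deployment/git-repository.sh"
    params: Dict[str, str] = field(default_factory=dict)


def validate(config: DeploymentConfig) -> List[str]:
    """Return a list of configuration errors (empty if the config is usable)."""
    errors = []
    if not config.instance:
        errors.append("deployment instance name must not be empty")
    if not config.procedure:
        errors.append("deployment procedure must not be empty")
    for key in config.params:
        if not key or "=" in key or key.startswith("-"):
            errors.append(f"invalid param key: {key!r}")
    return errors


def build_command(config: DeploymentConfig, rules_collection: str,
                  export_path: str, rules_files: List[str]) -> List[str]:
    """Build the argument list following the interface proposed above."""
    errors = validate(config)
    if not rules_files:
        errors.append("at least one rules.txt file is required")
    if errors:
        # Fail loudly on misconfiguration instead of deploying something odd.
        raise ValueError("; ".join(errors))
    command = [config.procedure, config.instance, rules_collection,
               export_path, ",".join(rules_files)]
    # Procedure-specific params are appended as --key=value.
    command += [f"--{key}={value}" for key, value in config.params.items()]
    return command


prelive = DeploymentConfig(
    instance="PRELIVE",
    procedure="conf/deployment/git-repository.sh",
    params={"repo": "https+ssh://my-repo-on.domain.tld"},
)
print(build_command(prelive, "ecommerce", "/export",
                    ["common-rules.txt", "decompound-rules.txt", "spelling-rules.txt"]))
```

Returning an argument list rather than a single string avoids shell quoting issues when the command is eventually executed.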

Note: At the time of planning this major change, SMUI refactorings (splitting frontend and backend implementation) are taking place. The following branches are relevant:

@epugh
Contributor

epugh commented Dec 15, 2020

I'm planning on removing the jackhanna script in favour of the single upload capability for ConfigSets, which should probably be how the zk-solr-cloud.sh interacts with Solr! Maybe rename it to solr-cloud.sh? See querqy/chorus#22.

@renekrie
Collaborator

@epugh @pbartusch Please keep in mind that querqy/querqy#76 will be a breaking change: the rules.txt will no longer be deployed as such, but the rules will be embedded into a JSON HTTP request (very similar to Querqy for ES). Also, the direct interaction with ZK or any direct interaction with the configset will be removed (and the collection reload as well).

It is very likely, that we can test a release candidate in production as soon as January. I think we need this kind of 'beta version' this time given the scope of the change.

Long story short: please do not invest any time into making the current deployment of rules.txt to Solr better - it will be replaced very soon.

@pbartusch
Collaborator Author

@renekrie , thanks for the hint.

> Long story short: please do not invest any time into making the current deployment of rules.txt to Solr better - it will be replaced very soon.

that is not the plan. the focus of the concept described above lies on different deployment options in general.

@renekrie
Collaborator

> that is not the plan. the focus of the concept described above lies on different deployment options in general.

@pbartusch I was a bit worried because earlier you said:

> Chorus should be adjusted to the newly adopted zk-solr-cloud deployment procedure as a first proof of concept.

I assume that zk-solr-cloud deployment will become outdated very soon.

@pbartusch
Collaborator Author

Ah, got it. OK, it wasn't meant to be the focus, but I understand the concern. Thanks, @renekrie.

Then it seems better to make the smui2solrcloud.sh a proof of concept for a custom deployment procedure. I will adjust #56 (comment) accordingly.

@pbartusch
Collaborator Author

@epugh , now I got your point as well. Regarding:

> [...] in favour of the single upload capability for ConfigSets, which should probably be how the zk-solr-cloud.sh interacts with Solr

I suggest adding this deployment procedure (once it's available in Solr/Querqy) to SMUI instead of Chorus, as the solr-cloud.sh you suggested.

I will not make this part of this issue/story (obviously ;-)), but we should develop it within the scope of SMUI and adjust Chorus accordingly.

@pbartusch
Collaborator Author

pbartusch commented Dec 15, 2020

@renekrie, will the solr-local deployment procedure remain possible in Solr? (meaning: cp the rules.txt and then perform a core reload)

Or will that be deprecated as well?

@renekrie
Collaborator

This will be the same HTTP call as for SolrCloud.

@renekrie
Collaborator

renekrie commented Dec 19, 2020

Just a heads-up: I've just merged a PR for querqy/querqy#116 to querqy-core.

This would give you the option to manage ES/Solr specifics via templates in the rules file. For example, a down boost on a field could look like this:

notebook =>
  UP(10): asus
  << field_down: factor=20 || fieldname=category || value=accessories >>

At the beginning of the file, you would have to prepend the search-engine-specific template:

# either Solr:
def field_down(factor, fieldname, value):
  DOWN($factor): * $fieldname:($value)

# or Elasticsearch:
def field_down(factor, fieldname, value):
  DOWN($factor): *  "match": { "$fieldname": { "query": "$value" }}
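
On the SMUI side, prepending the engine-specific template could look roughly like the following Python sketch. The `ENGINE_TEMPLATES` table and `prepend_engine_templates` are hypothetical names for this illustration, and the Solr template assumes `$value` is the intended placeholder.

```python
# Template texts copied from the examples above; the Solr variant assumes
# "$value" is the intended placeholder.
ENGINE_TEMPLATES = {
    "solr": (
        "def field_down(factor, fieldname, value):\n"
        "  DOWN($factor): * $fieldname:($value)\n"
    ),
    "elasticsearch": (
        "def field_down(factor, fieldname, value):\n"
        '  DOWN($factor): *  "match": { "$fieldname": { "query": "$value" }}\n'
    ),
}


def prepend_engine_templates(rules_txt: str, engine: str) -> str:
    """Return rules_txt with the template definitions for `engine` on top."""
    if engine not in ENGINE_TEMPLATES:
        raise ValueError(f"unsupported search engine: {engine!r}")
    return ENGINE_TEMPLATES[engine] + "\n" + rules_txt


RULES = (
    "notebook =>\n"
    "  UP(10): asus\n"
    "  << field_down: factor=20 || fieldname=category || value=accessories >>\n"
)

print(prepend_engine_templates(RULES, "elasticsearch"))
```

The rules themselves stay engine-neutral; only the prepended header differs per deployment target.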

If it helps, we could probably add docstring documentation to the templates à la:

def field_down(factor, fieldname, value):
  """Use this to penalise documents that contain a certain value in the specified field.

  :param factor: the penalisation factor
  :param fieldname: the field name
  :param value: the field value
  :type factor: float
  :type fieldname: string
  :type value: string
  """
  DOWN($factor): * $fieldname:($value)

This would probably enable SMUI to generate a form input in the UI from the template. At the most advanced end, we could let users create and manage their own templates in SMUI, including for more complex function queries.
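
As a rough illustration of how a UI could derive form inputs from such a documented template, here is a small Python sketch. The docstring format is only the proposal from this comment, not an existing Querqy feature, and `form_fields` is a hypothetical helper.

```python
import re
from typing import Dict, List

TEMPLATE = '''def field_down(factor, fieldname, value):
  """Use this to penalise documents that contain a certain value in the specified field.

  :param factor: the penalisation factor
  :param fieldname: the field name
  :param value: the field value
  :type factor: float
  :type fieldname: string
  :type value: string
  """
  DOWN($factor): * $fieldname:($value)
'''


def form_fields(template: str) -> List[Dict[str, str]]:
    """Derive one form field description per template parameter."""
    signature = re.search(r"def\s+(\w+)\(([^)]*)\)\s*:", template)
    if signature is None:
        raise ValueError("no template signature found")
    names = [name.strip() for name in signature.group(2).split(",") if name.strip()]
    # ":param name: text" becomes the label, ":type name: text" the input type.
    labels = dict(re.findall(r":param\s+(\w+):[ \t]*(.+)", template))
    types = dict(re.findall(r":type\s+(\w+):[ \t]*(.+)", template))
    return [
        {"name": name, "label": labels.get(name, name), "type": types.get(name, "string")}
        for name in names
    ]


for form_field in form_fields(TEMPLATE):
    print(form_field)
```

Parameters without a `:param` or `:type` entry fall back to their name as label and `string` as type, so undocumented templates would still render a usable form.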

@dobestler
Contributor

dobestler commented Apr 24, 2021

Do you think it might be useful to be able to define a raw query for a rule as well (i.e. everything after the '*')? E.g. as a specific option in the UI instead of choosing from suggested fields and entering a value for the field. The advantage would be to enable basically all use cases for rules through SMUI. It could enable Elastic Rules completely as a first step and circumvent the templates discussion and similar approaches.
The tradeoff is a higher risk of human error when writing raw query syntax, unless validation is added to these inputs.

Update: It seems to be already possible through toggle.ui-concept.all-rules.with-solr-fields=false which renders the Term as is and does not throw any validation errors. So import, UI edit, export seems to be all working with Elastic Rules.

@Paul-Blanchaert
Contributor

@pbartusch
Is there some activity planned on this issue?
While refactoring, could the concept of a SOLR_BASE_URL (e.g. http://localhost:8983/solr) be considered instead of SOLR_HOST (which then gets hardcoded into the base URL)? The advantage of SOLR_BASE_URL would be that it enables the customer to use either http or https (and possibly a different application root replacing "/solr").
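
A minimal Python sketch of that idea, assuming the legacy behaviour is to build `http://<SOLR_HOST>/solr` (my reading of the current scripts, not verified here). `solr_base_url` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlsplit


def solr_base_url(base_url: Optional[str] = None, host: Optional[str] = None) -> str:
    """Prefer an explicit SOLR_BASE_URL and fall back to the SOLR_HOST style."""
    if base_url:
        parts = urlsplit(base_url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not a valid Solr base URL: {base_url!r}")
        # Scheme and application root are taken as configured.
        return base_url.rstrip("/")
    if host:
        # Assumed legacy behaviour: scheme and "/solr" root are hardcoded.
        return f"http://{host}/solr"
    raise ValueError("neither SOLR_BASE_URL nor SOLR_HOST is set")


print(solr_base_url(base_url="https://search.example.org/custom-root/"))
print(solr_base_url(host="localhost:8983"))
```

Keeping the SOLR_HOST fallback would let existing setups continue to work while new ones switch to the full base URL.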

@epugh
Contributor

epugh commented Nov 4, 2021

See #82, which is specific to @pbartusch's comment back in December 2020!
