docs: add OS 101 to readme
niek-mereu committed Jan 16, 2023
1 parent c22770d commit 2ebe646
Showing 1 changed file with 30 additions and 0 deletions.
README.md
@@ -15,6 +15,7 @@ work with OpenSearch.
- [Local linting](#local-lintering)
- [Versioning](#versioning)
- [Contributors](#contributors)
- [OpenSearch](#OpenSearch)
- [Specs](#specs)

## <a name="installation">:computer: Installation</a>
@@ -257,6 +258,35 @@ For running linters from GitHub actions locally, you need to do the following.
- [Niek Mereu](https://github.com/niekstrv)
- [Vladimír Kadlec](https://github.com/vladimirkadlec-strv)

## <a name="OpenSearch">:scroll: OpenSearch</a>

### <a name="Analyzers">:clipboard: Analyzers</a>
Analyzers are used to process text fields at indexing and search time. An analyzer is composed of a single tokenizer and zero or more token filters: the tokenizer breaks the text into individual terms, and the token filters are then applied to each term. This section contains a small overview of the analyzers available in OpenSearch. For analyzers in general, see [Analyzers in OpenSearch](https://opensearch.org/docs/latest/opensearch/analyzers/).
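
A quick way to see what an analyzer does is the `_analyze` API, which returns the tokens produced for a given piece of text. The sketch below uses the `opensearch-py` client; the local node address and the sample text are assumptions for illustration, not part of this project's setup.

```python
from opensearchpy import OpenSearch

# Assumes a local OpenSearch node; adjust host, port, and auth for your setup.
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Ask the standard analyzer to tokenize a sample sentence.
response = client.indices.analyze(
    body={"analyzer": "standard", "text": "Quick Brown-Foxes jumped!"}
)

# The standard analyzer splits on word boundaries and lowercases each term,
# so this prints: ['quick', 'brown', 'foxes', 'jumped']
print([token["token"] for token in response["tokens"]])
```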

- **Standard**
- The standard analyzer is used to index and search for complete words. It combines the standard tokenizer, which breaks text into terms on word boundaries, with the lowercase token filter, which converts all tokens to lowercase.
- Because the standard tokenizer splits on whitespace as well as on many special characters, it may not be useful for data like email addresses. An email address like "[email protected]" is broken down into "123", "strv", "2", "com".
- **Whitespace**
- The whitespace analyzer is also used to index and search for complete words. It uses the whitespace tokenizer, which breaks text into terms only on whitespace.
- Because it does not break words on special characters, it is better suited to data like email addresses.
- **N-gram**
- N-gram analyzers are used to index and search for partial words. The tokenizer breaks the text into individual terms and an n-gram token filter then expands each term into its n-grams. A custom n-gram analyzer can be defined in the index settings, as shown in the sketch after this list.
- N-gram analyzers should not be used for large text fields, because the number of n-grams grows quickly and can consume a lot of disk space.
- Words also lose their meaning when they are broken into n-grams. For example, with 2-grams the word "search" is broken into "se", "ea", "ar", "rc", "ch", so a search for "sea" will match the word "search".
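
Below is a minimal sketch of such a custom 2-gram analyzer defined in the index settings and attached to a field, again assuming a local node and the `opensearch-py` client. The index, filter, and analyzer names are made up for the example.

```python
from opensearchpy import OpenSearch

# Assumes a local OpenSearch node; adjust connection details for your setup.
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

client.indices.create(
    index="ngram-demo",  # hypothetical index name
    body={
        "settings": {
            "analysis": {
                "filter": {
                    # Expands each term into its 2-character grams.
                    "bigram_filter": {
                        "type": "ngram",
                        "min_gram": 2,
                        "max_gram": 2,
                    }
                },
                "analyzer": {
                    # Custom analyzer: standard tokenizer, lowercase, then n-grams.
                    "bigram_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "bigram_filter"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "name": {"type": "text", "analyzer": "bigram_analyzer"}
            }
        },
    },
)

# "search" is indexed as the bigrams se, ea, ar, rc, ch,
# so a match query for "sea" (analyzed into se, ea) will hit it.
```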

### <a name="field-types">:card_index: Field Types</a>

This section contains a small overview of the field types that are available in OpenSearch. For field types that are not discussed here, please refer to the [official documentation](https://opensearch.org/docs/opensearch/rest-api/create-index/).

- **Keyword**
- Keyword fields store data that is not meant to be analyzed. They are not analyzed and support exact-value searches, aggregations, and sorting.
- Keyword fields are not used for full-text search.
- Keyword fields are also used in multi-fields, which index the same value under the same field name with different field types (see the mapping sketch after this list).

- **Text**
- Text fields are used to store textual data, such as the body of an email or the description of a product in an e-commerce store. Text fields are analyzed and support full-text search.
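
The following minimal mapping sketch combines both field types, once more assuming a local node and the `opensearch-py` client; the index and field names (`emails-demo`, `sender`, `subject`, `subject.raw`) are hypothetical.

```python
from opensearchpy import OpenSearch

# Assumes a local OpenSearch node; index and field names are made up for the example.
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

client.indices.create(
    index="emails-demo",
    body={
        "mappings": {
            "properties": {
                # Keyword field: exact-value search, aggregations, sorting.
                "sender": {"type": "keyword"},
                # Text field with a keyword multi-field for exact matches on the raw value.
                "subject": {
                    "type": "text",
                    "fields": {"raw": {"type": "keyword"}},
                },
            }
        }
    },
)

client.index(
    index="emails-demo",
    body={"sender": "alice@example.com", "subject": "Weekly search roadmap"},
    refresh=True,
)

# Exact-value lookup on the keyword field.
client.search(
    index="emails-demo",
    body={"query": {"term": {"sender": "alice@example.com"}}},
)

# Full-text (analyzed) search on the text field.
client.search(
    index="emails-demo",
    body={"query": {"match": {"subject": "roadmap"}}},
)
```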


## <a name="specs">:books: Specs</a>

#### Requirements
