Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Centralizing the emoji keyword generation logic #379

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
82 changes: 82 additions & 0 deletions src/scribe_data/unicode/generate_emoji_keyword.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,82 @@
"""
centralized emoji_keyword file
.. raw:: html
<!--
* Copyright (C) 2024 Scribe
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <https://www.gnu.org/licenses/>.
-->
"""

import argparse
import json
from pathlib import Path

from scribe_data.unicode.process_unicode import gen_emoji_lexicon
from scribe_data.utils import export_formatted_data

DATA_TYPE = "emoji-keywords"
EMOJIS_PER_KEYWORD = 3

# Define the path to the languages JSON file.
LANGUAGES_JSON = Path(__file__).parent / "supported_language.json"


def main(file_path):
# Read the language codes and names from the JSON.
with open(LANGUAGES_JSON, "r", encoding="utf-8") as f:
languages = json.load(f)

for code, language in languages.items():
print(f"Generating emoji keywords for {language} ({code})...")

language_dir = file_path / f"{language}"
emoji_dir = language_dir / "emoji_keywords"
init_file = emoji_dir / "__init__.py"

# Ensure that the emoji_keywords directory and __init__.py file exist.
emoji_dir.mkdir(parents=True, exist_ok=True)

if not init_file.exists():
# Create the __init__.py file if it doesn't exist.
init_file.touch()
print(f"Created __init__.py in {emoji_dir}.")

if emoji_keywords_dict := gen_emoji_lexicon(
language=language,
emojis_per_keyword=EMOJIS_PER_KEYWORD,
):
export_formatted_data(
file_path=emoji_dir / f"{code}_emoji_keywords.json",
formatted_data=emoji_keywords_dict,
query_data_in_use=True,
language=language,
data_type=DATA_TYPE,
)
print(f"Emoji keywords for {language} saved.\n")


if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--file-path", required=True, help="Path to save the emoji keywords files."
)
args = parser.parse_args()

# Ensure the directory exists.
output_dir = Path(args.file_path)
output_dir.mkdir(parents=True, exist_ok=True)

# Call the main function.
main(output_dir)
124 changes: 124 additions & 0 deletions src/scribe_data/unicode/process_unicode.py
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,130 @@
def gen_emoji_lexicon(
language: str,
emojis_per_keyword: int,
gender=None,
region=None,
):
"""
Generate emoji lexicon for a given language with optional gender and region customizations.

Parameters:
- language (str): The language for which emoji keywords are generated.
- emojis_per_keyword (int): Number of emojis to associate with each keyword.
- gender (str, optional): Gender-based customization for emojis (e.g., "male", "female").
- region (str, optional): Regional customization for emojis (e.g., "US", "JP").

Returns:
- dict: A dictionary containing emoji keywords and associated emojis.
"""

# Initialize the emoji dictionary
emoji_keywords_dict = {}

# Define grouped languages and their specific languages
grouped_languages = {
"Hindustani": ["Hindi", "Urdu"],
"Norwegian": ["Bokmål", "Nynorsk"],
# Add more grouped languages as needed
}

# Function to add emojis based on gender and region
def add_emojis_for_gender_region(lang, gender, region):
"""
This function generates a set of emojis based on the specified language,
gender, and region. It aims to ensure that the emojis are relevant and
culturally appropriate for the given context.

Parameters:
- lang (str): The language for which emojis are being generated. This could
affect the representation of certain emojis or their usage.
- gender (str): A string that indicates the gender for which emojis should
be selected. Accepted values are "male" and "female".
- region (str): A string representing the geographical region, which can
influence the selection of emojis to include those that are culturally
significant or popular in that area (e.g., "IN" for India).

Implementation Details:
1. **Placeholder Logic**:
- The function currently contains placeholder comments indicating where
the actual logic for selecting emojis should be implemented. This allows
contributors to easily identify where to add the necessary emoji-selection
logic based on gender and region.

2. **Gender-Based Emoji Selection**:
- The function checks the gender parameter. Depending on whether the
gender is "male" or "female", different sets of emojis should be
included. For example, if the gender is "male", the logic for selecting
male-specific emojis will be executed. Similarly, for "female",
female-specific emojis should be considered.
- The `pass` statement is a placeholder for the logic that should be
implemented later. This could involve referencing a predefined list of
emojis or generating emojis based on specific criteria related to gender.

3. **Region-Based Emoji Selection**:
- The function also checks the region parameter. If the region is "IN",
the logic for selecting emojis that are relevant to India will be executed.
- Just like with gender, the `pass` statement indicates where to add
this logic. The selected emojis should reflect cultural significance or
popular usage in the specified region.

4. **Returning Emoji Data**:
- The function is designed to return a dictionary containing the emojis that
have been selected based on the provided parameters. The current implementation
returns an empty dictionary, which should be replaced with the actual logic to
populate it with emoji data generated from the gender and region logic.

Need for Modularity:
- As the project scales and the emoji selection logic becomes more complex,
it is essential to keep the code modular. This means separating different
functionalities into distinct modules or files.


This function serves as a foundational component for generating
emojis tailored to specific user demographics, and implementing it in a
modular fashion will support future enhancements and maintenance.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes Sense!! Does that mean there's a lot more scope for emoji generation by expanding the demographics? Just confirming. Actually sounds interesting. Can we do anything in reference to this? Maybe expand the emoji generation? What do you say @Ekikereabasi-Nk and @andrewtavis .
Thank you @Ekikereabasi-Nk for the work.


if gender == "male":
# Include male-specific emojis logic
pass
elif gender == "female":
# Include female-specific emojis logic
pass

if region == "IN":
# Include region-specific emojis logic
pass

# Return any generated emoji data for the given language
return {}

# Check if the language is a grouped language
for grouped_language, sub_languages in grouped_languages.items():
if language == grouped_language:
# Process each sub-language in the grouped language
for sub_lang in sub_languages:
print(f"Processing sub-language: {sub_lang}")
# Add emojis for each sub-language based on gender and region
emojis = add_emojis_for_gender_region(sub_lang, gender, region)
emoji_keywords_dict[sub_lang] = emojis # Add to the dictionary

# If you want to combine results for the grouped language
emoji_keywords_dict[grouped_language] = emoji_keywords_dict

return emoji_keywords_dict # Return the dict for grouped languages

# If it's not a grouped language, handle it as a single language
else:
# Generate emojis for the given single language
emojis = add_emojis_for_gender_region(language, gender, region)
emoji_keywords_dict[language] = emojis

return emoji_keywords_dict


def gen_emoji_lexicon_old(
language: str,
emojis_per_keyword: int,
):
"""
Generates a dictionary of keywords (keys) and emoji unicode(s) associated with them (values).
Expand Down
Loading