-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
emoji: Process emoji keywords for flags. #31
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only thing that doesn't make sense here is the limited selection of possible tags 🤦♂️🤦♂️😊 Thanks for this, @wkyoshida! Will generate the data now and use all this as a basis for the numbers in #32 :)
@wkyoshida, did a bit of a rookie mistake here and didn't try to run it before merging... 😅 Read it and it made sense 😇 I commented out the code so I could do #32, but then when I try to run ---> 92 emoji_flags = Char.getBinaryPropertySet(UProperty.RGI_EMOJI_FLAG_SEQUENCE)
93 emoji_tags = Char.getBinaryPropertySet(UProperty.RGI_EMOJI_TAG_SEQUENCE)
94 regexp_flag_keyword = re.compile(r".*\: (?P<flag_keyword>.*)")
AttributeError: type object 'icu.Char' has no attribute 'getBinaryPropertySet' Do you have any ideas on this? Really happy to get the emojis in the table so we have some documentation of all the work we're putting in here, btw! 😊 |
I know! Those three definitely make sense as cases, but there definitely are others that should qualify too.
Hmm, so The output of
So that looks right. You may recall that PyICU does have some oddities as far as installing though, unfortunately.. There are other things that have to be installed as well. Would you by chance be using a different device or environment? Just want to check so that we cover our bases. |
I'm on the same device and the same environment supposedly, but then your comment reminds me that randomly my local env was missing some packages the other day 🤷♂️ Have it set up to auto activate on entering the dir, so that was a surprise. Would you say I should just reinstall the whole env as a first check? |
Could perhaps be a good sanity check at least, particularly since packages missing have happened recently too. If the issue is still there after, we can look at what else it could be. |
Hey @wkyoshida 👋 Deleted and remade the conda env and am still getting the issue 🤷♂️ I also am getting:
from |
😭 Let's see - So when you installed the dependencies for PyICU, did you go with Homebrew to install or one of the other installation methods (from these instructions)? Despite being on Ubuntu, I went with Homebrew. Running One thing that I could try is spin up an Ubuntu Docker container to get a clean slate, do some partial installations, and see if I can mimic what you're seeing perhaps. |
|
I played around with the installation a bit more and I'm still at a standstill with
This is new since the addition of Any other suggestions you have would be great 😊 |
Hmm I'm not sure. A container could help for development definitely, since we would both be able to use the same OS/environment (other contributors also). However, this wouldn't resolve the issue for users, I think. I'm guessing that the differences are coming from installation process differences due to OS/env. To make Scribe-Data a general-use package, I think that cross-platform usability should be something to aspire for. So kind of a bummer that we're running into issues with PyICU..
Yeah 😞 I spun up an Ubuntu container so I could get a clean slate to try some things too. Sadly, I wasn't able to actually get a working installation via Homebrew 🤔. I did get it working via the Debian option though. TBH I don't actually recall which installation option I went with months ago, so maybe I actually installed via Debian and then later just ran the Also, it does actually seem that a non-insignificant number of people have run into issues installing on Mac sadly - just looking at how more wordy the installation instructions for Mac is. Might notice also that there are links to GitLab Issues with additional asides for Mac too. For my sanity lol would you be able to share what steps you took? Also, have you tried the Macports option (no idea if it'd be different, but who knows)?
I guess we could. I would like to figure out what's happening of course, because I am considering that, if PyICU ends up being a dud, a worst-case scenario could be us having to look at a different option to get the same data. So there is a small chance of some unpleasant rework. But I'll get a PR with the JSONs ready tomorrow, and we can keep it on standby if it comes to do that. |
Sure thing! And I haven't tried the Macports option, but then looking at it I was like 🤨
Still the same error:
We'll figure this out somehow! 😊 And thanks for the willingness to send along the files for release purposes 🚀 |
Let's also not worry about getting this data in here right this moment, @wkyoshida :) I definitely have releasing today in mind as I've got loads for next week and also want to make sure the new stuff is up by the time I go to my Swift meetup on Wednesday 😅😄 We can take a look at it all and do some edits for a patch release in the coming weeks 😊 Making an issue for this might make sense just so we have it organized. There's also already some emoji stuff that I'm seeing that we might want to find a way to avoid — specifically the usage data is putting some emojis that some people wouldn't want to see all the time as suggestions ( Edit: I'm actually gonna go through and do a removal of some emojis in the process that have to do with relationships and such, as |
@wkyoshida, I'm gonna put it simply 😅 The creation of emoji_utils.py was one of the worst coding experiences of my life 😄 My autoformat changing emojis into their respective combinations and then undo reformatting and making it even worse meant that I had to restart soooo many times 🤦♂️😊 Let me know if you think that anything else should be added though. I think that what's there to be removed and what it leaves us with is ok. |
Hey @andrewtavis! So I was actually looking to ready the data yesterday, but then I noticed that the generated This is perhaps why I was also not understanding when you would point out things like:
Cause that IS the behavior that I see already 😆
Agreed, I was thinking the same. Wanted to avoid having to, though, if the problem was quick, but it looks like there might be some work here. #33 is a start
I understand that some emojis may not be ones that some may not want to see all the time. I'm with you 👍 Although, to play devil's advocate perhaps, there is a reason why these emojis are higher up in the popularity rankings though, no? In the aggregate, people do use them. Also, I'd say that all-round filtering them out does eliminate these emojis from being used in other cases where they could be more universally appropriate. Another thing is simply cultural or contextual differences. The suggested emojis, in many cases, do end up being the same across languages. However, there are instances when they differ. For instance, 💏 does appear for the German "mann", but it also does for the French "couple". Perhaps what I'm getting at is that we might want to be careful. Excluding something that some people might not want may mean excluding something that others expect. That is why I thought going off the standard (e.g. 🏴 🏴 🏴, tags with official support) and off researched usage (e.g. the popularity data) could be a decent baseline. However, my gut is that we might want to think through custom decisions thoroughly.
Yea.. 😄 I experienced that too as I was doing some of the emoji investigations a while back. Messing with Unicode can get frustrating. Also, thanks for sharing the installation steps you took! I'll think through them if there's anything that jumps out at me. |
The current JSONs come from the most recent runs of the process in the notebooks. In thinking about it, The only other major thing I've changed is the
I was getting base emojis before adding that in though, and it's solely to keep the size down as the dictionaries do maintain the rank on export to the JSONs.
I 100% agree. Makes sense that some of these would be more popular, and personally speaking removing any of the
Very much so! The ideality is that we delete emoji_utils.py. Big thing is that when demoing Scribe I also use "woman" as an example for that which would be "die", "la", etc, and I really don't want to have conversations with people about how the I'll look into this all a bit more before releasing 😊 You can likely tell that it didn't go through last night, which I'd say was the right decision as last night I was way too tired to do the proper checks. Thanks for thinking through all this with me! 🙏 |
f9df300 did a simple change to make the emoji comparison happen with unicode instead, which then prevented us from losing |
Gotchu! That makes sense; I think that the aspect with the demoing concerns are valid. For clarity, I'm definitely not opposed to custom decisions; just wanted to point out to think through them carefully too. With the emoji suggestions being WIP as they are though as you said, agreed that we'll continue to refine. |
Contributor checklist
Description
Reference the example above ⬆️
We are currently getting emoji keywords from the JSON list under
"default"
. Issue with flag emojis is that the only keyword in that list is the word "Flag" (or "Flagge" in the case of German) and not the corresponding territory of the flag, e.g. "Deutschland". That detail is in the JSON list under"tts"
. This change adds some quick regex to extract the territory name, "Deutschland" in the example, and add it as another keyword.Quick note: CLDR has a distinction between flags and tags. Tags are more for subdivision/regional flags. There are only three that are officially supported it seems, England, Scotland, and Wales.
Related issue