Add interpunct (·) as a keyword for dot product, etc. #3584
A couple of questions:
|
Also, this doesn't need to happen right now, but |
The problem with those two is that they're in with a bunch of characters that I think are definitely alphabetic:
i1 : apply(splice(150..160,180..190), x -> ascii {195, x})
o1 = (Ö, ×, Ø, Ù, Ú, Û, Ü, Ý, Þ, ß, à, ô, õ, ö, ÷, ø, ù, ú, û, ü, ý, þ)
I'm not sure which is more important -- allowing characters like à in symbol names or × and ÷ as keywords.
From what I understand from Wikipedia, since A0 is between 80 and 7FF (and can be represented in 11 bits), it gets encoded in two bytes. The first byte is 110 followed by the first 5 bits, and the second byte is 10 followed by the last 6 bits. In this case, A0 -> 00010100000, and so the first byte is 11000010, or 194. |
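The arithmetic in the comment above can be checked with a short Python sketch. The function name `utf8_two_byte` is made up for illustration and is not part of Macaulay2; it only implements the standard two-byte UTF-8 layout described in the comment.

```python
def utf8_two_byte(code_point):
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("not in the two-byte UTF-8 range")
    first = 0b11000000 | (code_point >> 6)          # 110 followed by the top 5 bits
    second = 0b10000000 | (code_point & 0b111111)   # 10 followed by the low 6 bits
    return first, second

# U+00A0 is the A0 discussed above; its first byte comes out as 194.
print(utf8_two_byte(0xA0))

# Cross-check against Python's built-in encoder.
assert bytes(utf8_two_byte(0xA0)) == "\u00a0".encode("utf-8")
```

The same calculation gives (194, 183) for the interpunct U+00B7 and (195, 151) for × (U+00D7), which is why the two groups of characters are distinguished by their first byte.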
Can't we single out those specific characters rather than the whole range? |
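One way to picture "singling out" specific characters is sketched below in Python. This is only an illustration of the idea, not the code that was pushed to the PR: the names `OPERATOR_SECOND_BYTES_195` and `is_alphabetic_195` are hypothetical, and the real lexer is not written in Python.

```python
# Hypothetical illustration: among characters whose first UTF-8 byte is 195,
# only × (195, 151) and ÷ (195, 183) are treated as non-alphabetic.
OPERATOR_SECOND_BYTES_195 = {151, 183}

def is_alphabetic_195(second_byte):
    """Classify the second byte of a two-byte character whose first byte is 195."""
    is_continuation = 0x80 <= second_byte <= 0xBF
    return is_continuation and second_byte not in OPERATOR_SECOND_BYTES_195

for ch in "à×÷ö":
    first, second = ch.encode("utf-8")
    print(ch, first, second, is_alphabetic_195(second))
```

With this approach à and ö stay usable in symbol names while × and ÷ remain available as keywords.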
tangentially related, while I was testing this:
Clearly the second error message is incorrect. (edited: and yes it's a moot point because you can't create a new symbol if it's protected anyway.) This can be traced back to
the second def of getglobalsym is missing the line about invalid symbol. |
also, I'm confused about
first dictionary listed in |
Force-pushed from 70c9f06 to 4f705af.
We definitely can! I've pushed an updated version with support for this. |
I haven't looked at the code in detail yet, but there are definitely differences in behaviour. For example, in the example above, |
something's not right:
after
|
BTW: Another question, which I had avoided dealing with at the time of my own PR because of a conflict with a package, is the following: should users be allowed to define their own "mathematical symbols"? It's kind of silly because they can't create new binary operators anyway (or should that be changed?), and it can have weird effects: (this is run on non-PR M2)
edited: to make the weirdness clearer:
|
I have wanted to define shuffle products using (ш), so I am following this (and the previous) thread. On a related note: Is it possible to define subscripted operators in Macaulay2? Something like ⊗_p or ш_p? |
Oops -- I definitely messed up mathematical operator characters inside symbols! I'll take a look. |
It's technically possible to define keywords Also I should mention that |
This is in better shape now. I've fixed it so that |
We need to look at two bytes to figure out whether certain Unicode characters are math symbols, so we add a "peek2" function to take two bytes from the file and concatenate them together as an int for easy bitwise comparison with the Unicode math operators:
https://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode
We also no longer exclude 226 (the first byte of many of these characters) from the ALPHA chartype, but instead add an "ismathoperator" check where appropriate.
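The two-byte idea can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the interpreter's code: the signatures are invented, the sketch assumes the two peeked bytes are the ones following the leading 226, and it covers only the Mathematical Operators block (U+2200 to U+22FF, encoded as E2 88 80 through E2 8B BF), whereas the real check presumably covers more blocks.

```python
def peek2(data, pos):
    """Return the two bytes at pos combined into one int, or -1 if unavailable."""
    if pos + 2 > len(data):
        return -1
    return (data[pos] << 8) | data[pos + 1]

def is_math_operator(data, pos):
    """Assumed check: byte 226 followed by two bytes in the range for U+2200..U+22FF."""
    if pos >= len(data) or data[pos] != 226:
        return False
    following = peek2(data, pos + 1)
    return 0x8880 <= following <= 0x8BBF

source = "a⊗b".encode("utf-8")
print([is_math_operator(source, i) for i in range(len(source))])
```

Because the check runs only where a character starts with 226, other characters with that first byte can still be treated as ALPHA, which matches the change described in the commit message.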
Add a helper function that performs the last step for both cases. In particular, this means that we now check if the symbol is in the global case.
How about this symbol? https://decodeunicode.org/U+29E2 That doesn't seem to be the Cyrillic letter and comes from the Miscellaneous Mathematical Symbols block. |
yep, that one is already in the list of mathematical symbols (before and after this PR, in fact) |
I suppose it's as good a time as any to make ⧢ a keyword. @moorewf -- what should its precedence be? Maybe the same as |
I think we might be able to implement this as a binary operator without making too many changes to the interpreter. |
Yes, I considered that, but it feels like a hack for the sake of syntactic sugar, and at least in the case of tensor I decided it's not worth it. |
Could you also add ⊠ (\boxtimes) as a binary operator?
i6 : 2⊠2
stdio:6:4:(3): error: no method for adjacent objects:
⊠ (of class Symbol)
SPACE 2 (of class ZZ) |
It would also be good to (eventually) add Unicode equivalents for all existing symbols, like ≥ as a synonym for >=, etc. |
already the case I believe:
|
I went ahead and added both |
I think this is getting out of hand. We can't add math symbols one at a time, there are thousands of them. We should let the user define them. (in some other PR, that is -- I understand there are complications with that) |
We've only asked for like 5 total symbols in this thread because we have applications for them ... |
sure but there are plenty more. For example, it would be nice to have |
actually that one is fairly universal, so would make sense in this PR. |
Hmm actually I think (This is certainly out of scope here, I'll make a new issue) |
Any further comments? Is this ready to merge? |
I'm happy with it. |
edited: never mind everything works fine. keeping for the record, because there's definitely room for improvement, but there's no bug per se. In the changes in |
I've just noticed belatedly that this PR breaks this:
vs before
|
For this to work, we no longer count unicode characters beginning with 194 as alphabetic (just like how we don't count characters beginning with 226 as alphabetic). These are the "Latin-1 Punctuation and Symbols".
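To see which characters are affected by the first byte 194, the following Python sketch lists every code point whose UTF-8 encoding starts with that byte. It relies only on Python's built-in encoder; the variable name `lead_194` is just for illustration.

```python
# Collect all two-byte code points whose UTF-8 encoding starts with byte 194 (0xC2).
lead_194 = [cp for cp in range(0x80, 0x800)
            if chr(cp).encode("utf-8")[0] == 194]

# The range runs from U+0080 to U+00BF; the printable part (U+00A0 onward)
# is the Latin-1 punctuation and symbols, including the interpunct U+00B7.
print(hex(lead_194[0]), hex(lead_194[-1]), len(lead_194))
print(list("·".encode("utf-8")))
```

Accented letters such as à start with 195 instead, so they are not affected by this particular change.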
So afterwards, we can use this new keyword to define things like dot products without having to worry about surrounding it with spaces:
Closes: #3434