Add interpunct (·) as a keyword for dot product, etc. #3584

Merged
merged 7 commits into from
Nov 26, 2024


d-torrance
Member

For this to work, we no longer count Unicode characters whose UTF-8 encoding begins with byte 194 as alphabetic (just as we don't count characters beginning with byte 226 as alphabetic). These are the "Latin-1 Punctuation and Symbols".

So afterwards, we can use this new keyword to define things like dot products without having to worry about surrounding it with spaces:

i1 : Vector · Vector := (v, w) -> ((transpose v#0) * w#0)_(0, 0);

i2 : v = vector {1, 2, 3}; w = vector {4, 5, 6};

       3
o2 : ZZ

       3
o3 : ZZ

i4 : v·w

o4 = 32

Closes: #3434

@d-torrance d-torrance requested review from mahrud and pzinn November 18, 2024 16:53
@mahrud
Member

mahrud commented Nov 18, 2024

A couple of questions:

  1. While you're doing this, could you also add exceptions for U+00D7 and U+00F7 for \times and \div?
  2. I don't understand where 194 is coming from. Latin-1 punctuation and symbols starts at U+00A0 which is 160, no?

@mahrud
Member

mahrud commented Nov 18, 2024

Also, this doesn't need to happen right now, but texMath symbol · should give \cdot (and maybe the same for a handful of others that have TeX names).

@d-torrance
Member Author

  1. While you're doing this, could you also add exceptions for U+00D7 and U+00F7 for \times and \div?

The problem with those two is that they're in with a bunch of characters that I think are definitely alphabetic:

i1 : apply(splice(150..160,180..190), x -> ascii {195, x})

o1 = (Ö, ×, Ø, Ù, Ú, Û, Ü, Ý, Þ, ß, à, ô, õ, ö, ÷, ø, ù, ú, û, ü, ý, þ)

I'm not sure which is more important -- allowing characters like à in symbol names or × and ÷ as keywords.

  2. I don't understand where 194 is coming from. Latin-1 punctuation and symbols starts at U+00A0 which is 160, no?

From what I understand from Wikipedia, since A0 is between 80 and 7FF (and can be represented in 11 bits), it gets encoded in two bytes. The first byte is 110 followed by the first 5 bits, and the second byte is 10 followed by the last 6 bits. In this case, A0 -> 00010100000, and so the first byte is 11000010, or 194.
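
The bit arithmetic above can be checked with a short sketch. This is plain Python rather than anything from the Macaulay2 sources, and the helper name `utf8_two_byte` is made up for illustration. It builds the two bytes by hand and compares them with Python's own UTF-8 encoder.

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("two-byte UTF-8 covers U+0080..U+07FF only")
    lead = 0b11000000 | (code_point >> 6)          # 110 + top 5 of the 11 bits
    cont = 0b10000000 | (code_point & 0b00111111)  # 10 + low 6 bits
    return bytes([lead, cont])


# U+00A0 (start of the Latin-1 punctuation block) and U+00B7 (the interpunct)
for cp in (0xA0, 0xB7):
    encoded = utf8_two_byte(cp)
    assert encoded == chr(cp).encode("utf-8")  # agrees with the built-in encoder
    print(hex(cp), list(encoded))
```

This prints `0xa0 [194, 160]` and `0xb7 [194, 183]`. Since the lead byte only carries the top five bits, every code point from U+0080 through U+00BF shares the lead byte 194, which is why a test on that single byte covers the whole range.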

M2/Macaulay2/m2/latex.m2 (review thread, outdated and resolved)
@mahrud
Member

mahrud commented Nov 18, 2024

The problem with those two is that they're in with a bunch of characters that I think are definitely alphabetic:

Can't we single out those specific characters rather than the whole range?
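
One way to picture "singling out" specific characters is a small exception table keyed on the UTF-8 byte pair. The sketch below is Python and purely illustrative: the names `OPERATOR_PAIRS` and `latin1_pair_is_alphabetic` are hypothetical, and it is not the code that ended up in ctype.d.

```python
# Hypothetical exception table: UTF-8 byte pairs treated as operators.
OPERATOR_PAIRS = {
    (195, 151),  # U+00D7 MULTIPLICATION SIGN
    (195, 183),  # U+00F7 DIVISION SIGN
}


def latin1_pair_is_alphabetic(lead: int, cont: int) -> bool:
    """Classify a two-byte sequence whose lead byte is 194 or 195.

    Assumption for this sketch: everything under lead byte 194 is
    punctuation, and everything under 195 is a letter except the
    pairs listed in OPERATOR_PAIRS.
    """
    if lead == 194:
        return False
    if lead == 195:
        return (lead, cont) not in OPERATOR_PAIRS
    raise ValueError("this sketch only covers lead bytes 194 and 195")


for ch in "à×÷·":
    lead, cont = ch.encode("utf-8")
    print(ch, (lead, cont), latin1_pair_is_alphabetic(lead, cont))
```

With a table like this, à stays usable in symbol names while × and ÷ are excluded, at the cost of one extra byte of lookahead.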

@pzinn
Contributor

pzinn commented Nov 19, 2024

tangentially related, while I was testing this:

i1 : getSymbol "⟎"

o1 = ⟎

o1 : Symbol

i2 : getSymbol "⟎⟎"
stdio:2:9:(3): error: invalid symbol

i3 : getGlobalSymbol "⟎"

o3 = ⟎

o3 : Symbol

i4 : getGlobalSymbol "⟎⟎"
stdio:4:15:(3): error: attempted to create symbol in protected dictionary

Clearly the second error message is incorrect. (edited: and yes, it's a moot point because you can't create a new symbol if it's protected anyway.) This can be traced back to actors5.d:

getglobalsym(d:Dictionary,s:string):Expr := (
     w := makeUniqueWord(s,parseWORD);
     when lookup(w,d.symboltable) is x:Symbol do Expr(SymbolClosure(globalFrame,x))
     is null do (
          if !isvalidsymbol(s) then return buildErrorPacket("invalid symbol");
	  if d.Protected then return buildErrorPacket("attempted to create symbol in protected dictionary");
	  t := makeSymbol(w,tempPosition,d);
	  globalFrame.values.(t.frameindex)));

getglobalsym(s:string):Expr := (
     w := makeUniqueWord(s,parseWORD);
     when globalLookup(w)
     is x:Symbol do Expr(SymbolClosure(if x.thread then threadFrame else globalFrame,x))
     is null do (
	  if globalDictionary.Protected then return buildErrorPacket("attempted to create symbol in protected dictionary");
	  t := makeSymbol(w,tempPosition,globalDictionary);
	  globalFrame.values.(t.frameindex)));

the second def of getglobalsym is missing the line about invalid symbol.
but why can't the second definition just call the first with globalDictionary? the only difference is this thread thing which I don't fully understand.
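
The suggestion above, sharing the final "validate, then check protection, then create" step between both definitions, could look roughly like the following Python model. Everything in it is a stand-in: `Dictionary`, `is_valid_symbol` and `create_symbol` are hypothetical names, the validity rule is a toy, and the thread-frame distinction in the real code is ignored.

```python
class Dictionary:
    """Toy stand-in for an interpreter dictionary."""

    def __init__(self, protected: bool = False):
        self.protected = protected
        self.symbols = {}


def is_valid_symbol(name: str) -> bool:
    # Toy rule, NOT Macaulay2's: a single character, or alphanumerics only.
    return len(name) == 1 or name.isalnum()


def create_symbol(name: str, d: Dictionary) -> str:
    """Shared last step: validate first, then check protection, then create."""
    if not is_valid_symbol(name):
        raise ValueError("invalid symbol")
    if d.protected:
        raise PermissionError("attempted to create symbol in protected dictionary")
    d.symbols[name] = name
    return name


def get_global_sym_in(d: Dictionary, name: str) -> str:
    """Model of the two-argument definition."""
    return d.symbols[name] if name in d.symbols else create_symbol(name, d)


def get_global_sym(name: str, global_dictionary: Dictionary) -> str:
    """Model of the one-argument definition, reusing the same helper."""
    return get_global_sym_in(global_dictionary, name)
```

Because both entry points go through `create_symbol`, an invalid name such as "⟎⟎" reports "invalid symbol" in either case, even when the dictionary is protected.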

@pzinn
Contributor

pzinn commented Nov 19, 2024

also, I'm confused about help getGlobalSymbol:

If dict is omitted, then the first symbol found in the dictionaries listed in dictionaryPath will be returned. If none is found, one will be created in the first dictionary listed in dictionaryPath, unless it is not mutable, in which case an error will be signalled; perhaps that behavior should be changed.

first dictionary listed in dictionaryPath??? it's Varieties.Dictionary...

@d-torrance d-torrance force-pushed the cdot branch 2 times, most recently from 70c9f06 to 4f705af Compare November 19, 2024 04:45
@d-torrance
Member Author

Can't we single out those specific characters rather than the whole range?

We definitely can! I've pushed an updated version with support for this.

@pzinn
Contributor

pzinn commented Nov 19, 2024

I haven't looked at the code in detail yet, but there are definitely differences in behaviour. For example, in the example above, getSymbol "⟎⟎" no longer causes an error message.

@pzinn
Contributor

pzinn commented Nov 19, 2024

something's not right:
before

i1 : a⊗b
stdio:1:1:(3): error: no method for binary operator ** applied to objects:
            a (of class Symbol)
     **     b (of class Symbol)

after

i1 : a⊗b

o1 = a⊗b

o1 : Symbol

@pzinn
Contributor

pzinn commented Nov 19, 2024

BTW: Another question, which I had avoided dealing with at the time of my own PR because of a conflict with a package, is the following: should users be allowed to define their own "mathematical symbols"? It's kind of silly because they can't create new binary operators anyway (or should that be changed?), and it can have weird effects (this is run on non-PR M2):

i1 : a⊗b
stdio:1:1:(3): error: no method for binary operator ** applied to objects:
            a (of class Symbol)
     **     b (of class Symbol)

i2 : getSymbol "⊗"

o2 = ⊗

o2 : Keyword

i3 : a⊗b

i4 : 

edited: to make the weirdness clearer:

i3 : debugLevel=123

o3 = 123
------------ top of loop
-- bumpLineNumber
-- promptWanted
-- file: stdio:4:0
-- topLevelPrompt: previousLineNumber = -1; lineNumber = 4
-- topLevelPrompt: prompt = "[3]  \ni4 : "
[3]  
i4 : getSymbol "⊗"
-- next character: g
-- ordinary token, ready to parse: stdio:4:0:(3):
-- parsing successful
-- parse tree size: 144

o4 = ⊗

o4 : Keyword
------------ top of loop
-- bumpLineNumber
-- promptWanted
-- file: stdio:5:0
-- topLevelPrompt: previousLineNumber = -1; lineNumber = 5
-- topLevelPrompt: prompt = "[4]  \ni5 : "
[4]  
i5 : a⊗b
-- next character: a
-- ordinary token, ready to parse: stdio:5:0:(3):
-- parsing successful
-- parse tree size: 200
/home/pzinn/temp/M2/M2/BUILD/fedora/usr-dist/x86_64-Linux-Fedora-35/bin/M2-binary: error: dummy binary function called
------------ top of loop
-- bumpLineNumber
-- promptWanted
-- file: stdio:6:0
-- topLevelPrompt: previousLineNumber = -1; lineNumber = 6
-- topLevelPrompt: prompt = "[5]  \ni6 : "
[5]  

@moorewf
Contributor

moorewf commented Nov 19, 2024

I have wanted to define shuffle products using (ш), so I am following this (and the previous) thread.

On a related note: Is it possible to define subscripted operators in Macaulay2? Something like ⊗_p or ш_p?

@d-torrance
Member Author

Oops -- I definitely messed up mathematical operator characters inside symbols! I'll take a look.

M2/Macaulay2/d/ctype.d (two review threads, outdated and resolved)
@mahrud
Member

mahrud commented Nov 20, 2024

On a related note: Is it possible to define subscripted operators in Macaulay2? Something like ⊗_p or ш_p?

It's technically possible to define keywords **_ or ⊗_ or ш_, but I think they would have to be a new kind of ternary operator with new parsing and evaluation logic to handle A **_ B C.

Also I should mention that M **_R N is defined as tensor(M, f, N) where M and N are modules and f is a ring map from R, but it's essentially evaluating tensor(M, tensor(f, N)).

M2/Macaulay2/d/ctype.d (review thread, outdated and resolved)
@d-torrance
Member Author

This is in better shape now. I've fixed it so that getSymbol should once again raise an error if its argument contains a mathematical operator character as a proper substring. We also check for some more mathematical operators, including ш.

M2/Macaulay2/d/ctype.d (review thread, outdated and resolved)
We need to look at two bytes to figure out whether certain unicode
characters are math symbols, so we add a "peek2" function to take two
bytes from the file and concatenate them together as an int for easy
bitwise comparison with the unicode math operators:
https://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode

We also no longer exclude 226 (the first byte of many of these
characters) from the ALPHA chartype, but instead add an
"ismathoperator" check where appropriate.

Add a helper function that performs the last step for both cases.  In
particular, this means that we now check if the symbol is valid in the
global case.
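
The two-byte comparison described in the first commit message above can be sketched in Python. This is an illustration of the idea only: the function names mirror the commit message, but the bodies, the `-1` sentinel, and the choice to test just the Mathematical Operators block (U+2200..U+22FF) are assumptions, not the contents of ctype.d.

```python
def peek2(data: bytes, pos: int = 0) -> int:
    """Combine the two bytes at pos into one int, high byte first.

    Returns -1 when fewer than two bytes remain (sentinel chosen for this sketch).
    """
    if pos + 2 > len(data):
        return -1
    return (data[pos] << 8) | data[pos + 1]


def ismathoperator(data: bytes, pos: int = 0) -> bool:
    """True if the bytes at pos start a character in U+2200..U+22FF.

    That block encodes as E2 88 80 .. E2 8B BF, so the first two bytes lie
    in 0xE288..0xE28B; masking off the low two bits leaves 0xE288.
    """
    return peek2(data, pos) & 0xFFFC == 0xE288


for ch in ("⊗", "∈", "ш", "a"):
    print(ch, ismathoperator(ch.encode("utf-8")))
```

Packing both bytes into one integer lets a whole block be matched with a single mask and compare, and other blocks could be added as further mask and value pairs in the same style.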
@moorewf
Contributor

moorewf commented Nov 22, 2024

How about this symbol?

https://decodeunicode.org/U+29E2

That doesn't seem to be the Cyrillic letter and comes from the Miscellaneous Mathematical Symbols block.

@pzinn
Contributor

pzinn commented Nov 22, 2024

How about this symbol?

https://decodeunicode.org/U+29E2

That doesn't seem to be the Cyrillic letter and comes from the Miscellaneous Mathematical Symbols block.

yep, that one is already in the list of mathematical symbols (before and after this PR, in fact)

@d-torrance
Member Author

I suppose it's as good a time as any to make ⧢ a keyword. @moorewf -- what should its precedence be? Maybe the same as **?

@d-torrance
Member Author

It's technically possible to define keywords **_ or ⊗_ or ш_, but I think they would have to be a new kind of ternary operator with new parsing and evaluation logic to handle A **_ B C.

I think we might be able to implement this as a binary operator without making too many changes to the interpreter. A **_ B could return a function that takes C as an argument.
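
The currying idea can be shown in a few lines of Python. The function `tensor_over` is a hypothetical stand-in for `A **_ B`, and the returned string is only a placeholder for whatever the real tensor call would compute.

```python
from typing import Callable


def tensor_over(a: str, b: str) -> Callable[[str], str]:
    """Model of `A **_ B`: a binary operation that returns a function."""

    def apply_to(c: str) -> str:
        # Placeholder result; a real implementation would build the tensor product.
        return f"tensor({a}, {b}, {c})"

    return apply_to


# `A **_ B C` then parses as ordinary application of the returned function to C.
partial = tensor_over("M", "R")
print(partial("N"))
```

This prints `tensor(M, R, N)`. The parser only ever sees a binary operator followed by function application, so no ternary syntax is needed, though the intermediate function value is visible to the user.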

@mahrud
Member

mahrud commented Nov 22, 2024

Yes, I considered that, but it feels like a hack for the sake of syntactic sugar, and at least in the case of tensor I decided it's not worth it.

@mahrud
Member

mahrud commented Nov 23, 2024

Could you also add ⊠ (\boxtimes) as a binary operator?

i6 : 2⊠2
stdio:6:4:(3): error: no method for adjacent objects:
            ⊠ (of class Symbol)
    SPACE   2 (of class ZZ)

@mahrud
Member

mahrud commented Nov 23, 2024

It would also be good to (eventually) add equivalent unicodes for all existing symbols, like ≥ as synonym for >=, etc.

@pzinn
Contributor

pzinn commented Nov 23, 2024

already the case I believe:

i1 : symbol ≥ 

o1 = >=

o1 : Keyword

@d-torrance
Member Author

I went ahead and added both ⧢ and ⊠ as keywords. Same precedence as ** -- does that seem ok?

@pzinn
Contributor

pzinn commented Nov 23, 2024

I think this is getting out of hand. We can't add math symbols one at a time, there are thousands of them. We should let the user define them. (in some other PR, that is -- I understand there are complications with that)

@mahrud
Member

mahrud commented Nov 23, 2024

We've only asked for like 5 total symbols in this thread because we have applications for them ...

@pzinn
Contributor

pzinn commented Nov 23, 2024

sure but there are plenty more. For example, it would be nice to have ∈ as a binary-method synonym for isMember.

@pzinn
Contributor

pzinn commented Nov 23, 2024

actually that one is fairly universal, so would make sense in this PR.

@mahrud
Member

mahrud commented Nov 23, 2024

Hmm actually I think ∈ should be a synonym for in so you can write for x ∈ L do ..., though maybe we can also add new syntactic sugar x in L which calls isMember!

(This is certainly out of scope here, I'll make a new issue)

@d-torrance
Member Author

Any further comments? Is this ready to merge?

@mahrud
Member

mahrud commented Nov 26, 2024

I'm happy with it.

@d-torrance d-torrance merged commit 854d105 into Macaulay2:development Nov 26, 2024
5 checks passed
@pzinn
Contributor

pzinn commented Nov 26, 2024

edited: never mind everything works fine. keeping for the record, because there's definitely room for improvement, but there's no bug per se.

In the changes in latex.m2, why is symbol · => "\\cdot" added to keywordTexMath whereas the other new symbols are added to texMathLiteralTable? To be fair, this part of latex.m2 is somewhat cryptic (I wrote it recently, so my fault). I do feel like the separation between keywords and symbols is stupid and needs rethinking.

@pzinn pzinn mentioned this pull request Nov 26, 2024
@pzinn
Contributor

pzinn commented Dec 2, 2024

I've just noticed belatedly that this PR breaks this:

i1 : 1⇒2
stdio:1:1:(3): warning: character '�' immediately following number
stdio:1:1:(3): error: no method for adjacent objects:
            1 (of class ZZ)
    SPACE   ⇒2 (of class Symbol)

vs before

i1 : 1⇒2

o1 = 1 => 2

o1 : Option

4 participants