Fin n for array indexes? #313

Open · Ericson2314 opened this issue Sep 23, 2024 · 6 comments

Comments
@Ericson2314
Collaborator

Happy currently has a lot of "untyped" array indexing, where nothing stops one from using an index for the wrong kind of thing, e.g. mixing up tokens, non-terminals, and terminals.

I haven't done type-level finite sets in Haskell for a while, so I'm not sure of the state of things these days (still KnownNat and friends, I suppose?), but I wonder if we should try using them. It would be a lot less scary to refactor the code if these indexes couldn't be confused for one another.
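
(For concreteness, here is a minimal sketch of what such a Fin type could look like on top of GHC.TypeNats. The names are illustrative only, not an existing Happy module; Hackage packages like fin and vec provide polished versions of the same idea.)

{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}

import GHC.TypeNats (Nat, KnownNat, natVal)
import Data.Proxy (Proxy (..))
import Numeric.Natural (Natural)

-- An index statically known to be < n.
newtype Fin (n :: Nat) = Fin Natural
  deriving (Eq, Ord, Show)

-- Smart constructor: the bound is checked once, here, so that
-- every later use of the index needs no further check.
mkFin :: forall n. KnownNat n => Natural -> Maybe (Fin n)
mkFin i
  | i < natVal (Proxy :: Proxy n) = Just (Fin i)
  | otherwise                     = Nothing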

@sgraf812
Collaborator

sgraf812 commented Sep 24, 2024

What utility would that offer? It certainly would incur a high complexity toll on the code base.

@Ericson2314
Collaborator Author

Well, have you seen this?

    All the tokens in the grammar are mapped onto integers, for speed.
    The namespace is broken up as follows:

        epsilon        = 0
        error          = 1
        dummy          = 2
        %start         = 3..s
        non-terminals  = s..n
        terminals      = n..m
        %eof           = m

        where n_nonterminals = n - 3 (including %starts)
              n_terminals    = 1{-error-} + (m-n) + 1{-eof-} (including error and %eof)

    These numbers are deeply magical, change at your own risk. Several
    other places rely on these being arranged as they are, including
    ProduceCode.lhs and the various HappyTemplates.

    Unfortunately this means you can't tell whether a given token is a
    terminal or non-terminal without knowing the boundaries of the
    namespace, which are kept in the Grammar structure.

    In hindsight, this was probably a bad idea.

in particular

    In hindsight, this was probably a bad idea.

Makes me chuckle. Fin would remove a major cost from this, as Fin (a + b + c) won't get mixed up with Fin a, Fin b, or Fin c.

My trying to turn Name into a mere newtype in #223 and somehow getting <<loop>> also makes me think the current code is quite fragile.

@Ericson2314
Collaborator Author

Also, it gives us a way to pull some metadata out of band. I am thinking of something like

data Grammar eliminator
      = Grammar {
              productions       :: [Production eliminator],
              lookupProdNo      :: Int -> Production eliminator,
              lookupProdsOfName :: Name -> [Int],
              token_specs       :: [(Name, TokenSpec)],
              terminals         :: [Name],
              non_terminals     :: [Name],
              starts            :: [(String,Name,Name,Bool)],
              types             :: Array Int (Maybe String),
              token_names       :: Array Int String,
              first_nonterm     :: Name,
              first_term        :: Name,
              eof_term          :: Name,
              priorities        :: [(Name,Priority)]
       }

to

type Name nStarts nTerms nNonTerms = Fin (4 + nStarts + nTerms + nNonTerms)

data Production nStarts nTerms nNonTerms = Production
      (Fin nNonTerms)
      [Name nStarts nTerms nNonTerms]
      -- IIRC these are indices into the previous list,
      -- so could do more dependent types
      [Word]
      Priority

data Grammar nProds nStarts nTerms nNonTerms
      = Grammar {
              -- No longer need the eliminator type parameter
              productions       :: Vector nProds (Production nStarts nTerms nNonTerms),
              lookupProdsOfName :: Map (Fin nNonTerms) (Set (Fin nProds)),
              starts            :: Vector nStarts (Name nStarts nTerms nNonTerms, Bool),
              priorities        :: Map (Fin nNonTerms) Priority
       }

which is the core formal language part (?)

and

data GrammarExtras nStarts nTerms nNonTerms
    = GrammarExtras {
              prod_elims        :: Vector nNonTerms Text,
              token_elims       :: Vector nTerms TokenSpec,
              start_names       :: Vector nStarts Text,
              types             :: Vector nTerms (Maybe Text),
              token_names       :: Vector nTerms Text
    }

has all the extra bits for human legibility and/or codegen, with guarantees from the type parameters that everything has the right arities.
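
(To make "right arities" concrete, here is a minimal sketch of a length-indexed Vector: one runtime check at construction, and lookups at a Fin index are then total. Purely illustrative; vector-sized and vec on Hackage are real implementations, and Fin is the same illustrative newtype as in the sketch further up.)

{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}

import qualified Data.Vector as V
import GHC.TypeNats (Nat, KnownNat, natVal)
import Data.Proxy (Proxy (..))
import Numeric.Natural (Natural)

newtype Fin (n :: Nat) = Fin Natural          -- as sketched above

-- A vector whose length is carried in its type.
newtype Vector (n :: Nat) a = Vector (V.Vector a)

-- The only place a length check happens.
fromList :: forall n a. KnownNat n => [a] -> Maybe (Vector n a)
fromList xs
  | fromIntegral (length xs) == natVal (Proxy :: Proxy n)
      = Just (Vector (V.fromList xs))
  | otherwise
      = Nothing

-- Total lookup: a Fin n cannot be out of bounds for a Vector n.
index :: Vector n a -> Fin n -> a
index (Vector v) (Fin i) = v V.! fromIntegral i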

@Ericson2314
Collaborator Author

From what I remember, I don't like the overhead of dependently-typed stuff in Haskell either, but I am not sure how else to lock down the data structures without changing the terms around more.

For example, one can zip together all lists/arrays of the same arity to ditch the coherence conditions, and use "expansion points" as needed to factor out the extra info, and that works pretty well; but the various index types we need to store (because it's a graph, at the end of the day) are not helped by this.
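
(Roughly what I mean by zipping, as a toy illustration and not actual Happy code: parallel tables that must agree in length get fused into a single table, so the length coherence condition disappears by construction. The stored cross-indices are the part this doesn't solve.)

-- Before: two parallel tables, coherent only by convention.
data TerminalTablesBefore = TerminalTablesBefore
  { termTypes :: [Maybe String]   -- one entry per terminal
  , termNames :: [String]         -- must be the same length!
  }

-- After: one table, so there is nothing left to keep in sync.
data TerminalInfo = TerminalInfo
  { termType :: Maybe String
  , termName :: String
  }

newtype TerminalTablesAfter = TerminalTablesAfter [TerminalInfo]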

@sgraf812
Collaborator

    In hindsight, this was probably a bad idea.

    Makes me chuckle. Fin would remove a major cost from this, as Fin (a + b + c) won't get mixed up with Fin a, Fin b, or Fin c.

But you ultimately need Fin (a + b + c) because they all index the same array.
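
(Concretely, with a Fin like the illustrative one sketched above, each sub-namespace index would have to be injected into the combined namespace; the offset arithmetic does not go away, it just lives in one place.)

{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables, TypeOperators #-}

import GHC.TypeNats
import Data.Proxy (Proxy (..))
import Numeric.Natural (Natural)

newtype Fin (n :: Nat) = Fin Natural          -- as sketched above

-- Viewing a per-namespace index as an index into the combined namespace.
injectLeft :: Fin a -> Fin (a + b)
injectLeft (Fin i) = Fin i

injectRight :: forall a b. KnownNat a => Fin b -> Fin (a + b)
injectRight (Fin i) = Fin (natVal (Proxy :: Proxy a) + i)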

I still don't see the utility of using Fin n specifically. Why not SpecialName, TerminalName, NonterminalName? You would still need to know how many of each sort there are in order to compute array indices, but that should work just fine.
It's pretty easy to check that no Name ever generated is out of bounds at generation time; no need to prove it in the type system (which would be very awkward to do).
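
(A toy sketch of that alternative, with illustrative names rather than anything that exists in Happy today: distinct wrappers keep the namespaces apart, and the boundary bookkeeping plus the flattening arithmetic live in one place, checked at generation time.)

-- Distinct wrappers, so the namespaces cannot be mixed up in types.
newtype SpecialName     = SpecialName     Int deriving (Eq, Ord, Show)
newtype NonterminalName = NonterminalName Int deriving (Eq, Ord, Show)
newtype TerminalName    = TerminalName    Int deriving (Eq, Ord, Show)

-- The namespace sizes are plain run-time data ...
data Namespaces = Namespaces
  { numSpecials     :: Int
  , numNonterminals :: Int
  , numTerminals    :: Int
  }

-- ... and the flattening into a single array index happens here, once.
flatIndex :: Namespaces -> Either SpecialName (Either NonterminalName TerminalName) -> Int
flatIndex ns name = case name of
  Left  (SpecialName i)             -> i
  Right (Left  (NonterminalName i)) -> numSpecials ns + i
  Right (Right (TerminalName i))    -> numSpecials ns + numNonterminals ns + i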

@Ericson2314
Collaborator Author

Ericson2314 commented Sep 26, 2024

Fair enough. I was thinking of #223 as a stepping stone already, and you are rightly pointing out that there are more newtypes we can create as further stepping stones.
