Skip to content

LR(0) JSON parser written in rust to learn about LR parsers.

Notifications You must be signed in to change notification settings

satchelbaldwin/rust-lrparser-json

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Rust JSON LR(0) Parser

About

In order to put to use everything I read about in Compilers: Principles, Techniques, and Tools to use, I decided to implement an LR parser as is described in Chapter 4 of the book.

Parsing is a large chunk of the textbook, and it goes into great detail on the theory and rationale behind the different approaches. The chapter dedicated to parsing is much significantly larger than the others around it, and having a project to show my understanding of the material instead of dumping a large list of exercises done seemed much better to do.

The textbook is broken into 12 chapters, and the book describes how a usual undergrad course would read 1-6. I read 1-6 and implemented this project to serve as a comprehensive demonstration of understanding.

This parser uses the LR(0) recursive ascent approach to parse and validate JSON.

JSON served as a good specification to target due to its small grammar and widespread use. Unlike targeting a programming language, the specification was small enough to fully implement and check for correctness, whereas other programming languages I would need to target only a subset to compile in the essence of time. Intermediate code generation will not come into play here, although I found that section very interesting to read.

Note: this does not handle unicode properly.

I needed to include a library for handling floats as not NaN so that they could be hashable in maps.

Parse Table

We begin with the cleanly-written grammar here, and use it to generate the LR parse tables.

object → '{' pairs '}'

pairs      → pair pairs_tail | ε
pair       → STRING ':' value
pairs_tail → ',' pairs | ε

value → STRING | NUMBER | 'true' | 'false' | 'null' | object | array
array → '[' elements ']'

elements      → value elements_tail | ε
elements_tail → ',' elements | ε

We add a start symbol and production S -> object and use it as the first state, placing the cursor at the first position. (Single quotes are omitted for space.)

S0: S -> • object $
         • { pairs }

We see that there is a nonterminal left that has not been followed through on. We make a new state S1 for this item set and add it to the GOTO table, no shift action. (Double arrows are used to show state transitions in the states themselves, not part of the table.)

S0:  S -> • object $  => S1
          • { pairs }

S1:  S -> object • $
S|    ACTION    |       GOTO                   
 |              | o  
------------------------------
0|              | 1  
1|              |
2|              |
3|              |
4|              |
...     

From S1, we see there's nowhere to go but advancing the cursor to the end of input (S2), with an ACCEPT action thereafter. All valid JSON ends at the end of an object followed by the end of input.

S0:  S -> • object $  => S1
          • { pairs } 

S1:  S -> object • $  => S2

S2:  S -> object $ •  => ACCEPT
S|    ACTION    |       GOTO                   
 | $            | o $ 
------------------------------
0|              | 1  
1|              |   2
2| a            |
3|              |
4|              |
...     

Back in S0, if the terminal { is found right after the cursor, we go to a new state with the cursor moved to the right (S3) and see what can follow from there, taking the closure as before. Since { is terminal, we add a shift action from S0 -> S3 on {.

S0:  S -> • object $  => S1
          • { pairs } => S3

S1:  S -> object • $  => S2

S2:  S -> object $ •  => ACCEPT

S3:  S -> { • pairs }
     pairs -> • pair pairs_tail 
     pairs -> • STRING : value pairs_tail
     pairs -> •
S|    ACTION    |       GOTO                   
 | $ {          | o $ 
------------------------------
0|   s3         | 1  
1|              |   2
2| a            |
3|              |
4|              |
...     

Next we will follow the process, over and over, until we reach our finished parse table as follows: (I moved to a graphical sheets program for this, screenshot below.)

Productions labelled for reduce:

1:  object → '{' pairs '}'

2:  pairs      → pair pairs_tail 
3:             | ε

4:  pair       → STRING ':' value

5:  pairs_tail → ',' pairs 
6:             | ε

7:  value → STRING 
8:        | NUMBER 
9:        | 'true' 
10:       | 'false' 
11:       | 'null' 
12:       | object 
13:       | array

14: array → '[' elements ']'

15: elements      → value elements_tail 
16:               | ε
17: elements_tail → ',' elements 
18:               | ε

States to generate the table:

S0:  S -> • object $  => S1
        | • { pairs } => S3

S1:  S -> object • $  => S2

S2:  S -> object $ •  => ACCEPT

S3:  object -> { • pairs }      => S4 (GOTO)
     pairs -> • pair pairs_tail => S6
            | •                 => S4
     pair  -> • STRING : value  => S5
            

S4:  object -> { pairs • }      => S7

S5:  pair -> STRING • : value   => S8

S6:  pairs -> pair • pairs_tail => S10
     pairs_tail -> • ',' pairs  => S9
     pairs_tail -> •            => S7

S7:  object -> { pairs } •

S8:  pair -> STRING : • value  => S11
     value -> STRING           => S12
            | NUMBER           => S13
            | 'true'           => S14
            | 'false'          => S15
            | 'null'           => S16
            | object           => S17
            | array            => S18

S9:  pairs_tail -> ',' • pairs  => S19
     pairs -> • pair pairs_tail => S6
            | •                 => S4
     pair  -> • STRING : value  => S5
-
S10: pairs -> pair pairs_tail •
-
S11: pair -> STRING : value • 

S12: value -> STRING •

S13: value -> NUMBER •

S14: value -> 'true' •

S15: value -> 'false' •

S16: value -> 'null' •

S17: value -> object •

S18: value -> array •

S19: pairs_tail -> ',' pairs •

S20: array -> [ • elements ]
     elements -> • value elements_tail
               | • [ elements ]
               | •
     value -> STRING 
            | NUMBER 
            | 'true' 
            | 'false' 
            | 'null' 
            | object 
            | array 
     object -> • { pairs }

S21: array -> [ elements • ]

S22: elements -> value • element_tail 
     elements_tail -> • , elements
                    | •

S23: array -> [ elements ] •

S24: elements -> value elements_tail •

S25: elements_tail -> , • elements
     elements -> • value elements_tail
               | • [ elements ]
               | •
     value -> STRING 
            | NUMBER 
            | 'true' 
            | 'false' 
            | 'null' 
            | object 
            | array 
     object -> • { pairs }

S26: elements_tail -> , elements •

Differences in final: State 2 was actually unnecessary.

parsechart

References Used

Building

cargo build to create the executable.

Usage

About

LR(0) JSON parser written in rust to learn about LR parsers.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages