Skip to content

Commit

Permalink
Better CST result for inputs like (0 . 0)
Browse files Browse the repository at this point in the history
In such a cases, special casing helps to put the CST children into
more suitable spots of the result CST. See source comments for
details.

Also improves runtime and consing:

  (time
   (loop repeat 400000
         do (eclector.concrete-syntax-tree:read-from-string
             "((0 . 0) (0 . 0) (0 . 0) (0 . 0))")))

Before the change

  Evaluation took:
    14.376 seconds of real time
    14.366840 seconds of total run time (12.522099 user, 1.844741 system)
    [ Real times consist of 0.136 seconds GC time, and 14.240 seconds non-GC time. ]
    [ Run times consist of 0.139 seconds GC time, and 14.228 seconds non-GC time. ]
    99.94% CPU
    43,040,377,482 processor cycles
    7,294,680,432 bytes consed

After the change

  Evaluation took:
    10.796 seconds of real time
    10.787757 seconds of total run time (9.173336 user, 1.614421 system)
    [ Real times consist of 0.048 seconds GC time, and 10.748 seconds non-GC time. ]
    [ Run times consist of 0.046 seconds GC time, and 10.742 seconds non-GC time. ]
    99.93% CPU
    32,310,576,549 processor cycles
    2,584,884,960 bytes consed
  • Loading branch information
scymtym committed Feb 25, 2024
1 parent bb0d5aa commit 3ce1521
Show file tree
Hide file tree
Showing 4 changed files with 167 additions and 33 deletions.
13 changes: 13 additions & 0 deletions NEWS
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,19 @@ Release 0.10 (not yet released)

(frob r1 r2 :k3 4 #4; :k5 6 :k6 7)

* The concrete-syntax-tree module now produces a better tree structure for
certain inputs like (0 . 0). Before this change the produced CST had the same
CONCRETE-SYNTAX-TREE:ATOM-CST object as the CONCRETE-SYNTAX-TREE:FIRST and
CONCRETE-SYNTAX-TREE:REST of the outer CONCRETE-SYNTAX-TREE:CONS-CST node.
After this change the CONCRETE-SYNTAX-TREE:FIRST child is the
CONCRETE-SYNTAX-TREE:ATOM-CST which corresponds to the first 0 in the input
and the CONCRETE-SYNTAX-TREE:REST child is the CONCRETE-SYNTAX-TREE:ATOM-CST
which corresponds to the second 0 in the input. In contrast to the previous
example, an input like (#1=0 . #1#) continues to result in a single
CONCRETE-SYNTAX-TREE:ATOM-CST in both the CONCRETE-SYNTAX-TREE:FIRST and
CONCRETE-SYNTAX-TREE:REST slots of the outer CONCRETE-SYNTAX-TREE:CONS-CST
object.

Release 0.9 (2023-03-19)

* The deprecated function ECLECTOR.CONCRETE-SYNTAX-TREE:CST-READ has been
Expand Down
126 changes: 94 additions & 32 deletions code/concrete-syntax-tree/client.lisp
Original file line number Diff line number Diff line change
Expand Up @@ -11,35 +11,97 @@
;;; `concrete-syntax-tree:reconstruct' for difficult cases.
(defmethod eclector.parse-result:make-expression-result
((client cst-client) expression children source)
(cond ((atom expression)
(make-instance 'cst:atom-cst :raw expression :source source))
;; List structure with corresponding elements.
((and (eql (ignore-errors (list-length expression))
(length children))
(every (lambda (sub-expression child)
(eql sub-expression (cst:raw child)))
expression children))
(loop for expression in (loop with reversed = '()
for sub-expression on expression
do (push sub-expression reversed)
finally (return reversed))
for child in (reverse children)
for previous = (make-instance 'cst:atom-cst :raw nil) then node
for node = (make-instance 'cst:cons-cst :raw expression
:first child
:rest previous)
finally (return (reinitialize-instance node :source source))))
;; Structure mismatch, try heuristic reconstruction.
(t
;; We don't use
;;
;; (cst:reconstruct client expression children)
;;
;; because we want SOURCE for the outer `cons-cst' but not
;; any of its children.
(destructuring-bind (car . cdr) expression
(make-instance 'cst:cons-cst
:raw expression
:first (cst:reconstruct client car children)
:rest (cst:reconstruct client cdr children)
:source source)))))
;; Our goal is to return a CST, say c, with the following properties:
;;
;; 1. The structure and raw values of c should match the structure
;; of EXPRESSION, abbreviated as e, in the sense that for any
;; "path" p from the set U_{L >= 0} {car,cdr}^L that is "valid"
;; for e the following should hold:
;;
;; (eql (cst:raw (apply-path/cst p c)) (apply-path p e))
;;
;; where the `apply-path' functions repeatedly apply appropriate
;; readers according to the supplied path.
;;
;; 2. The elements of CHILDREN, which is a "pool" of available
;; sub-CSTs, should be incorporated as nodes into the CST rooted
;; as c whenever possible.
;;
;; Note that property 2. does not imply that all elements of
;; CHILDREN should appear in the CST rooted at c. For example, when
;; this method is called for an EXPRESSION of the form (0 . 0),
;; there will be three elements in CHILDREN: an atom for the first
;; 0, an atom for the consing dot and another atom for the second 0.
;; The middle child which represents the consing dot should not
;; appear as a node in the CST rooted at c.
;;
;; Furthermore, there are often multiple ways for c to satisfy the
;; properties 1. and 2. Consider again the example (0 . 0).
;; Property 1. can be fulfilled by setting the car and cdr of c to
;; either the first or the third child, so there are four equally
;; valid combinations.
;;
;; The code below tries to construct good CSTs by picking off a few
;; special cases and falling back to
;; `concrete-syntax-tree:reconstruct' for the general case. There
;; are two reasons for this approach:
;;
;; 1. For special cases, more information may be available.
;; Consider once again (0 . 0). It is obvious that the car of c
;; should be the `atom-cst' which corresponds to the first 0 and
;; the cdr of c should be the `atom-cst' which corresponds to the
;; second 0. In contrast, the reconstructing heuristic for the
;; general case would use the first `atom-cst' in both cases
;; since it has no way of distinguishing (0 . 0) and (morally)
;; (#1=0 . #1#).
;;
;; 2. `concrete-syntax-tree:reconstruct' is an expensive operation.
;; Special-casing common expression shapes improves performance
;; for typical inputs.
(let (children-length)
(cond ((atom expression)
(make-instance 'cst:atom-cst :raw expression :source source))
;; EXPRESSION has a list structure with elements
;; corresponding to the elements of CHILDREN.
((and (eql (ignore-errors (list-length expression))
(setf children-length (length children)))
(every (lambda (sub-expression child)
(eql sub-expression (cst:raw child)))
expression children))
(loop for expression in (loop with reversed = '()
for sub-expression on expression
do (push sub-expression reversed)
finally (return reversed))
for child in (reverse children)
for previous = (make-instance 'cst:atom-cst :raw nil) then node
for node = (make-instance 'cst:cons-cst :raw expression
:first child
:rest previous)
finally (return (reinitialize-instance node :source source))))
;; EXPRESSION is a CONS that resulted from reading a dotted
;; list such that the elements of CHILDREN correspond to car
;; of EXPRESSION, the consing dot and the cdr of EXPRESSION.
((and (not (consp (cdr expression)))
(= 3 children-length)
(destructuring-bind (car dot cdr) children
(eql (car expression) (cst:raw car))
(eql eclector.reader::*consing-dot* (cst:raw dot))
(eql (cdr expression) (cst:raw cdr))))
(make-instance 'cst:cons-cst :raw expression
:first (first children)
:rest (third children)
:source source))
;; Structure mismatch, try heuristic reconstruction.
(t
;; We don't use
;;
;; (cst:reconstruct client expression children)
;;
;; because we want SOURCE for the outer `cons-cst' but not
;; any of its children.
(destructuring-bind (car . cdr) expression
(make-instance 'cst:cons-cst
:raw expression
:first (cst:reconstruct client car children)
:rest (cst:reconstruct client cdr children)
:source source))))))
23 changes: 22 additions & 1 deletion data/changes.sexp
Original file line number Diff line number Diff line change
Expand Up @@ -72,7 +72,28 @@
"to" "SRFI 62" "for" "scheme" "." "One" "difference" "is" "that" "a"
"numeric" "infix" "argument" "can" "be" "used" "to" "comment" "out" "a"
"number" "of" "s-expressions" "different" "from" "1" ":")
(:code :common-lisp "(frob r1 r2 :k3 4 #4; :k5 6 :k6 7)")))
(:code :common-lisp "(frob r1 r2 :k3 4 #4; :k5 6 :k6 7)"))
(:item
(:paragraph
"The" (:tt "concrete-syntax-tree") "module" "now" "produces" "a"
"better" "tree" "structure" "for" "certain" "inputs" "like"
(:tt "(0 . 0)") "." "Before" "this" "change" "the" "produced" "CST"
"had" "the" "same" (:symbol "concrete-syntax-tree:atom-cst") "object"
"as" "the" (:symbol "concrete-syntax-tree:first") "and"
(:symbol "concrete-syntax-tree:rest") "of" "the" "outer"
(:symbol "concrete-syntax-tree:cons-cst") "node" "." "After" "this"
"change" "the" (:symbol "concrete-syntax-tree:first") "child" "is" "the"
(:symbol "concrete-syntax-tree:atom-cst") "which" "corresponds" "to" "the"
"first" (:tt "0") "in" "the" "input" "and" "the"
(:symbol "concrete-syntax-tree:rest") "child" "is" "the"
(:symbol "concrete-syntax-tree:atom-cst") "which" "corresponds" "to" "the"
"second" (:tt "0") "in" "the" "input" "." "In" "contrast" "to" "the"
"previous" "example" "," "an" "input" "like" (:tt "(#1=0 . #1#)")
"continues" "to" "result" "in" "a" "single"
(:symbol "concrete-syntax-tree:atom-cst") "in" "both" "the"
(:symbol "concrete-syntax-tree:first") "and"
(:symbol "concrete-syntax-tree:rest") "slots" "of" "the" "outer"
(:symbol "concrete-syntax-tree:cons-cst") "object" ".")))

(:release "0.9" "2023-03-19"
(:item
Expand Down
38 changes: 38 additions & 0 deletions test/concrete-syntax-tree/read.lisp
Original file line number Diff line number Diff line change
Expand Up @@ -191,6 +191,44 @@
(scons ())) ; nil
(scons ())))))))) ; nil

;;; Identity

(test read/identity
"Test node identity properties in constructed CSTs."
;; Test special case for proper list.
(let* ((result (eclector.concrete-syntax-tree:read-from-string "(2 2 2)"))
(first (cst:first result))
(second (cst:first (cst:rest result)))
(third (cst:first (cst:rest (cst:rest result)))))
(is (eql 2 (cst:raw first)))
(is (eql 2 (cst:raw second)))
(is (eql 2 (cst:raw third)))
(is (notany #'eq (list first second third) (list second third first))))
(let* ((result (eclector.concrete-syntax-tree:read-from-string
"(#1=2 2 #1#)"))
(first (cst:first result))
(second (cst:first (cst:rest result)))
(third (cst:first (cst:rest (cst:rest result)))))
(is (eql 2 (cst:raw first)))
(is (eql 2 (cst:raw second)))
(is (eql 2 (cst:raw third)))
(is (not (eq first second)))
(is (eq first third)))
;; Test special case for dotted list.
(let* ((result (eclector.concrete-syntax-tree:read-from-string "(2 . 2)"))
(car (cst:first result))
(cdr (cst:rest result)))
(is (eql 2 (cst:raw car)))
(is (eql 2 (cst:raw cdr)))
(is (not (eq car cdr))))
(let* ((result (eclector.concrete-syntax-tree:read-from-string
"(#1=2 . #1#)"))
(car (cst:first result))
(cdr (cst:rest result)))
(is (eql 2 (cst:raw car)))
(is (eql 2 (cst:raw cdr)))
(is (eq car cdr))))

;;; Custom client

(defclass custom-client (eclector.concrete-syntax-tree:cst-client)
Expand Down

0 comments on commit 3ce1521

Please sign in to comment.