Update/improve implementation; add annotations, doctests, and README example.
lapets committed Mar 28, 2024
1 parent 99980cb commit ad35021
Showing 2 changed files with 116 additions and 13 deletions.
46 changes: 46 additions & 0 deletions README.rst
@@ -36,6 +36,52 @@ The library can be imported in the usual way:
 
     from parsial import parsial
 
+Example
+^^^^^^^
+
+.. |parsial| replace:: ``parsial``
+.. _parsial: https://parsial.readthedocs.io/en/0.1.0/_source/parsial.html#parsial.parsial.parsial
+
+The |parsial|_ function accepts a parsing function (that takes a string input) and returns a new parsing function. This new function attempts to parse an input string using the original parsing function *even if parsing errors occur*. This is accomplished by selectively removing portions of the input that cause errors:
+
+.. code-block:: python
+
+    >>> lines = [
+    ...     'x = 123',
+    ...     'y =',
+    ...     'print(x)',
+    ...     'z = x +',
+    ...     'print(2 * x)'
+    ... ]
+    >>> import ast
+    >>> parser = parsial(ast.parse)
+    >>> (a, slices) = parser('\\n'.join(lines))
+    >>> exec(compile(a, '', 'exec'))
+    123
+    246
+
+.. |slice| replace:: ``slice``
+.. _slice: https://docs.python.org/3/library/functions.html#slice
+
+In addition to returning the result, the new function also returns a list of |slice|_ instances (one for each line found in the input string):
+
+.. code-block:: python
+
+    >>> for s in slices:
+    ...     print(s)
+    slice(0, 7, None)
+    slice(0, 0, None)
+    slice(0, 8, None)
+    slice(0, 0, None)
+    slice(0, 12, None)
+
+Each |slice|_ instance indicates what portion of the corresponding line in the input was included in the successful parsing attempt:
+
+.. code-block:: python
+
+    >>> [l[s] for (l, s) in zip(lines, slices)]
+    ['x = 123', '', 'print(x)', '', 'print(2 * x)']
+
 Development
 -----------
 All installation and development dependencies are fully specified in ``pyproject.toml``. The ``project.optional-dependencies`` object is used to `specify optional requirements <https://peps.python.org/pep-0621>`__ for various development tasks. This makes it possible to specify additional options (such as ``docs``, ``lint``, and so on) when performing installation using `pip <https://pypi.org/project/pip>`__:
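As a rough illustration of how the returned slices might be consumed, the sketch below lists the numbers of the lines that were dropped. It hard-codes the `lines` and `slices` values printed in the README example rather than calling the library, and `skipped_line_numbers` is a hypothetical helper, not part of `parsial`. One caveat when copying the README snippet: in the module docstring, `'\\n'` is unescaped to `'\n'` before doctest sees it, but in a literal `.rst` code block it stays a backslash followed by `n`, so the joined input would contain no line breaks at all. The sketch therefore joins with `'\n'`.

```python
# Values copied from the README example; no call to the library is made here.
lines = [
    'x = 123',
    'y =',
    'print(x)',
    'z = x +',
    'print(2 * x)'
]
slices = [slice(0, 7), slice(0, 0), slice(0, 8), slice(0, 0), slice(0, 12)]


def skipped_line_numbers(lines, slices):
    """Hypothetical helper: 1-based numbers of lines not retained in full."""
    return [
        number
        for number, (line, s) in enumerate(zip(lines, slices), start=1)
        if line[s] != line
    ]


print(skipped_line_numbers(lines, slices))  # [2, 4]

# Separator check: a real newline yields five lines, the escaped form yields one.
print('\n'.join(lines).count('\n'))   # 4
print('\\n'.join(lines).count('\n'))  # 0
```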
83 changes: 70 additions & 13 deletions src/parsial/parsial.py
@@ -3,40 +3,97 @@
 that skips portions of the input that contain syntax errors.
 """
 from __future__ import annotations
-from typing import Any, List, Callable
+from typing import Any, List, Tuple, Callable
 import doctest
 
-def parsial(parse: Callable[[str], Any]) -> Callable[[str], List[int]]:
+def parsial(
+    parse: Callable[[str], Any]
+) -> Callable[[str], Tuple[Any, List[slice]]]:
     """
-    Accept a parsing function that takes string inputs and return a function
-    that returns some subset of the lines in the input string that, when
-    removed, allow the the parsing function to succeed.
+    Accept a parsing function (that takes a string input) and return a new
+    parsing function. This new function attempts to parse an input string
+    using the original parsing function even if parsing errors occur. This
+    is done by selectively removing portions of the input that cause
+    errors.
+
+    >>> lines = [
+    ...     'x = 123',
+    ...     'y =',
+    ...     'print(x)',
+    ...     'z = x +',
+    ...     'print(2 * x)'
+    ... ]
+    >>> import ast
+    >>> parser = parsial(ast.parse)
+    >>> (a, slices) = parser('\\n'.join(lines))
+    >>> exec(compile(a, '', 'exec'))
+    123
+    246
+
+    In addition to returning the result, the new function also returns a
+    list of :obj:`slice` instances (one for each line found in the input
+    string).
+
+    >>> for s in slices:
+    ...     print(s)
+    slice(0, 7, None)
+    slice(0, 0, None)
+    slice(0, 8, None)
+    slice(0, 0, None)
+    slice(0, 12, None)
+
+    Each :obj:`slice` instance indicates what portion of the corresponding
+    line in the input was included in the successful parsing attempt.
+
+    >>> [l[s] for (l, s) in zip(lines, slices)]
+    ['x = 123', '', 'print(x)', '', 'print(2 * x)']
+
+    For a string that can be parsed successfully, the parser supplied to
+    this function is invoked exactly once. In the worst case, it is invoked
+    once per line of the input string.
     """
+    # Define the new parsing function.
     def parse_(source: str) -> List[int]:
         lines = source.split('\n')
         lines_ = None
+        result = None
 
+        # Find the longest stretch of lines that begins with the first line
+        # and leads to a successful parse.
         for end in range(len(lines), -1, -1):
             try:
-                parse('\n'.join(lines[:end]))
+                result = parse('\n'.join(lines[:end]))
                 lines_ = lines[:end]
                 break
-            except Exception as _:
+            except Exception as _: # pylint: disable=broad-exception-caught
                 pass
 
-        skips = []
+        # If the entire input was not parsed via the block above, attempt to
+        # include each remaining line to see if a parse succeeds. Keep track
+        # of which lines are skipped.
+        skips = set()
         if end < len(lines):
-            skips.append(end)
+            skips.add(end)
             lines_ = lines[:end] + ['']
             for i in range(end + 1, len(lines)):
                 try:
                     lines__ = lines_ + [lines[i]]
-                    parse('\n'.join(lines__))
+                    result = parse('\n'.join(lines__))
                     lines_ = lines__
-                except Exception as _:
+                except Exception as _: # pylint: disable=broad-exception-caught
                     lines_ += ['']
-                    skips.append(i)
+                    skips.add(i)
 
-        return skips
+        # Return the result of a successful parsing attempt, as well as a list
+        # of slices indicating what portions of each line were included to
+        # obtain the result.
+        return (
+            result,
+            [
+                slice(0, len(line) if i not in skips else 0)
+                for (i, line) in enumerate(lines)
+            ]
+        )
 
     return parse_
 
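To make the two passes in this hunk easier to follow, here is a minimal stand-alone sketch that mirrors the added (`+`) side of the diff and counts how often the wrapped parser is invoked. The names `parsial_sketch` and `counting` are hypothetical and exist only for this illustration; the sketch is a reading of the hunk, not the published module. It also annotates the inner function with the tuple return type, whereas the `parse_` line in the hunk still carries `-> List[int]`, which looks like a leftover from the earlier version.

```python
import ast
from typing import Any, Callable, List, Tuple


def parsial_sketch(
    parse: Callable[[str], Any]
) -> Callable[[str], Tuple[Any, List[slice]]]:
    """Stand-in following the two passes shown in the hunk (assumption)."""
    def parse_(source: str) -> Tuple[Any, List[slice]]:
        lines = source.split('\n')
        result = None

        # Pass 1: shrink from the full input to the longest parsable prefix.
        for end in range(len(lines), -1, -1):
            try:
                result = parse('\n'.join(lines[:end]))
                break
            except Exception:  # the wrapped parser may raise anything
                pass

        # Pass 2: blank out the first failing line, then try to append each
        # remaining line, keeping it only if the whole text still parses.
        skips = set()
        kept = lines[:end]
        if end < len(lines):
            skips.add(end)
            kept = kept + ['']
            for i in range(end + 1, len(lines)):
                candidate = kept + [lines[i]]
                try:
                    result = parse('\n'.join(candidate))
                    kept = candidate
                except Exception:
                    kept = kept + ['']
                    skips.add(i)

        return (
            result,
            [slice(0, 0 if i in skips else len(line))
             for (i, line) in enumerate(lines)]
        )

    return parse_


def counting(parse):
    """Hypothetical wrapper that records every input passed to `parse`."""
    calls = []
    def wrapped(source):
        calls.append(source)
        return parse(source)
    return wrapped, calls


lines = ['x = 123', 'y =', 'print(x)', 'z = x +', 'print(2 * x)']
wrapped, calls = counting(ast.parse)
tree, slices = parsial_sketch(wrapped)('\n'.join(lines))
print([l[s] for (l, s) in zip(lines, slices)])
# ['x = 123', '', 'print(x)', '', 'print(2 * x)']
print(len(calls))  # 8

ok_wrapped, ok_calls = counting(ast.parse)
parsial_sketch(ok_wrapped)('x = 1\ny = 2')
print(len(ok_calls))  # 1
```

On this five-line input the counter reports 8 calls: five shrinking-prefix attempts, then one attempt for each of the three lines after the first skipped one. The docstring's "once per line" worst case is therefore perhaps better read as "linear in the number of lines", while the single call for fully valid input matches the docstring as written.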
