Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update parser to handle hydrates, rational subscripts, complexes, and caged species #205

Merged
merged 14 commits into from
Mar 19, 2022

Conversation

jeremyagray
Copy link
Collaborator

@jeremyagray jeremyagray commented Feb 9, 2022

Update parser to handle hydrates (via the .. operator, not .), rational subscripts (subscripts as decimals, with .), coordination complexes (uses grouping symbols (), [], and {} but does not attach special meaning), and caged species (via the @ operator). Add parsing of some state symbols to the main parser, but suppress them for now so that the phase identification code in chemistry.py does not need changed; this does not affect the existing state symbol handling. Update parsing tests to reflect changes and make minor updates to the rest of the code and other tests to accommodate parser changes. Implement the planned deprecation of / in charge strings.

Breaking change: Breaks the old hydrate operator, .. Hydrates must now be indicated with the .. operator, as in BaCl2..2H2O. The old hydrate operator, ., is now used for rational subscripts. As warned in the last deprecation cycle, / is now invalid in charge strings.
Closes: closes #202, #176

Add failing tests for complexes and fraction subscripts.  Reorganize
``test_formula_to_composition`` into several thematic tests.  Add a
few examples.

Signed-off-by: Jeremy A Gray <[email protected]>
Expand regular expression for legibility and add a results name to
find the elements after parsing.

Signed-off-by: Jeremy A Gray <[email protected]>
Deprecate the use of slashes in charge strings and add tests for
enforcement.  Remove use of slash in charges in other tests.

Signed-off-by: Jeremy A Gray <[email protected]>
Parse square bracket and curly brace groups, without added
information.  Add tests to check composition and to detect grouping
symbol mismatch.

Signed-off-by: Jeremy A Gray <[email protected]>
Require complete parsing of formula input string to prevent
typographical errors.  Add a new `primes` group to match one or more
`*` or `'` (attached to the species and before any charge or state
symbol) used to indicate excited or otherwise special species.

Signed-off-by: Jeremy A Gray <[email protected]>
Parse fractional subscripts, provisionally changing the hydrate
separator to `..` from `.` to disambiguate from decimal number
subscripts.  Add multiple `sympy.nsimplify()` calls in chemistry.py
and test_chemistry.py to convert decimals to rationals to integers.

BREAKING CHANGE: Change hydrate separator from `.` to `..`.
Closes: closes bjodah#176
Signed-off-by: Jeremy A Gray <[email protected]>
Parse species like `Li@C60` indicated one species trapped by another,
such as a lithium atom inside a buckminsterfullerene.

Closes: closes bjodah#202
Signed-off-by: Jeremy A Gray <[email protected]>
Add more parser tests including a negative test for bad
charges (`Na+Cl-`).

Signed-off-by: Jeremy A Gray <[email protected]>
Fix hydrate example in docstring to reflect change from `.` to `..`.

Signed-off-by: Jeremy A Gray <[email protected]>
Pass subscripts as integers if at all possible to prevent problems
with ``sympy.linsolve()``.  Fix some linting issues.

Signed-off-by: Jeremy A Gray <[email protected]>
Parse and suppress state symbols to avoid problems parsing formulas
with states that are not in provided suffix lists, as revealed in the
doctests in chempy/chemistry.py.  Add additional chemistry and parsing
tests to check parsing and duplicate doctests.  Fix some flake8 lint
warnings.

Signed-off-by: Jeremy A Gray <[email protected]>
@bertiewooster
Copy link

bertiewooster commented Feb 10, 2022

Great, thanks @jeremyagray! I just visually verified that some tests we discussed (@ cage operator, .. hydrate operator, and non-integer subscripts) were included, and they are. A couple thoughts on exceptions testing:

  • Would it be worth adding exceptions testing for negative subscripts and charges, and fractional charges, for example Ca-2 and Ca-2.832 (which would presumable be interpreted as a charge) and Ca+2.832?
  • If there are multiple statements in a with pytest.raises(): block, such as:
@requires(parsing_library)
def test_formula_to_composition_deprecated_charge():
    with pytest.raises(ValueError):
        # Ions.
        formula_to_composition("Cl/-")
        formula_to_composition("Cl/-(aq)")

I did a check for a simple case, and it doesn't seem to require all of the statements to raise that error. For example, this test passes:

def test_raises():
    with pytest.raises(ZeroDivisionError):
        9 / 0
        2 / 1

despite the second division statement not raising ZeroDivisionError. So unless I'm missing some context (I might be!), I think each of those checks might need to be done individually. To continue the good practice of avoiding re-writing the test code, the test can be parametrized, for example like this:

@requires(parsing_library)
@pytest.mark.parametrize(
    "formula",
    [
        ("Cl/-"),
        ("Cl/-(aq)"),
    ],
)
def test_formula_to_composition_deprecated_charge(formula):
    with pytest.raises(ValueError):
        formula_to_composition(formula)

@jeremyagray
Copy link
Collaborator Author

No, @bertiewooster you're right about the tests. I was just adding tests for this PR without much thought for organization as I planned to do another PR on some testing issues that needed attention. All the parsing tests and probably some others should be parameterized and I thought to do that and inspect the coverage in another issue/PR.

If there are no other problems with this PR, one of us should probably get around to merging it in the not to distant future.

@bertiewooster
Copy link

Hi @jeremyagray is there anything else that needs to be done to push all your hard work across the finish line and justify a new release, or does it just need to be merged?

@jeremyagray
Copy link
Collaborator Author

jeremyagray commented Mar 15, 2022 via email

@bertiewooster
Copy link

@jeremyagray Could you parametrize the tests which contain
with pytest.raises(...

so that each parameter set is tested, that it actually fails, and not just one parameter set? I believe that should be done for the following:

  • def test_formula_to_composition_fail(): currently has four raises blocks
  • def test_formula_to_composition_deprecated_charge(): currently has 11 parameter sets
  • def test_formula_to_composition_bad_complexes(): currently has three raises blocks

@jeremyagray
Copy link
Collaborator Author

I just parametrized all the parsing tests. They probably need some more organization, but this at least gets all the tests running as they should and has the added benefit of boosting the test statistics.

The checks are failing because I've got some version and configuration drift between my local black and flake8 and the CI machine's versions. I'll fix that shortly.

Copy link

@bertiewooster bertiewooster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I visually reviewed the changes and they seem good, so once the checks pass, I recommend merging this commit. I like the elegance of the tests now that they're parametrized, for example
assert formula_to_composition(species) == composition

Thanks for putting in the extra effort to parametrize all the tests!

Parametrize all parsing tests.

Signed-off-by: Jeremy A Gray <[email protected]>
@bertiewooster
Copy link

Changes seem fine and checks passed. Unless @jeremyagray has any more changes in the works, recommend merging.

@bjodah
Copy link
Owner

bjodah commented Mar 29, 2022

Some really nice work here, thank you @jeremyagray for getting this done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Interpreting malformed chemical formulas as substances
3 participants