You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In v2, the way records are constructed (directly, not parsed from a file) is not very nice, which hints at an underlying problem. Luckily this is internal and can be solved in a non-breaking feature release.
The idea is that we only want one source of what is a valid FASTX file, so when a user constructs a record from e.g. a string, we use the Automa parser also. Two issues with this:
First, the Automa parser only works on a TranscodingStream due to the way Automa generates the code. This means that to parse a bytevector like a string, we need to needlessly wrap it in a NoopStream(IOBuffer(data)), which creates way more overhead than needed.
To construct records from raw parts this is even more roundabout, since we first print the parts to an IOBuffer, then convert to a bytearray, then back to an IOBuffer.
The second issue is what happens if the user does parse(FASTARecord, ">A\nG\n>A\nG"). Clearly this should error as there are two records. But the Automa machine simply returns that it found a record. What the code does now is to try to load another record, then throw an error if that succeeds. This is simply bad design. [EDIT: I've changed that to instead use NoopStream's internals. It's better design, but still not great]
A better solution would be to somehow make parsing of a record from a byte buffer the single fundamental operation, then have the IO-based code operate on top of that. This is pretty tricky and requires a rework of Automa (but it would also make the Automa code way easier to reason about, so it should probably happen)
The text was updated successfully, but these errors were encountered:
In v2, the way records are constructed (directly, not parsed from a file) is not very nice, which hints at an underlying problem. Luckily this is internal and can be solved in a non-breaking feature release.
The idea is that we only want one source of what is a valid FASTX file, so when a user constructs a record from e.g. a string, we use the Automa parser also. Two issues with this:
First, the Automa parser only works on a
TranscodingStream
due to the way Automa generates the code. This means that to parse a bytevector like a string, we need to needlessly wrap it in aNoopStream(IOBuffer(data))
, which creates way more overhead than needed.To construct records from raw parts this is even more roundabout, since we first print the parts to an IOBuffer, then convert to a bytearray, then back to an IOBuffer.
The second issue is what happens if the user does
parse(FASTARecord, ">A\nG\n>A\nG")
. Clearly this should error as there are two records. But the Automa machine simply returns that it found a record. What the code does now is to try to load another record, then throw an error if that succeeds. This is simply bad design. [EDIT: I've changed that to instead use NoopStream's internals. It's better design, but still not great]A better solution would be to somehow make parsing of a record from a byte buffer the single fundamental operation, then have the IO-based code operate on top of that. This is pretty tricky and requires a rework of Automa (but it would also make the Automa code way easier to reason about, so it should probably happen)
The text was updated successfully, but these errors were encountered: