Use web streams instead of Node.js streams #61

Draft: wants to merge 17 commits into master


garrettjstevens
Contributor

This PR migrates the streaming portions of the library to use web streams instead of the Node.js streams API. From the overview in the Node.js documentation:

The WHATWG Streams Standard (or "web streams") defines an API for handling streaming data. It is similar to the Node.js Streams API but emerged later and has become the "standard" API for streaming data across many JavaScript environments.

The motivation for this is 1) to provide a consistent way to handle streams with the library in the browser or Node.js and 2) to remove any Node.js dependencies from the library so that no polyfilling is necessary in the browser.

The streaming functionality is exported as the GFFTransformer and GFFFormattingTransformer classes. The synchronous functionality remains unchanged, and the new streaming functionality should be easily tree-shaken out by bundlers if it isn't used.
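Because both classes are handed to the `TransformStream` constructor in the examples, they follow the WHATWG `Transformer` shape: an object with a `transform(chunk, controller)` method and optional `flush(controller)` and `start(controller)` methods. As an illustration of that pattern only, here is a minimal sketch of a hypothetical `LineSplitter` (not part of `@gmod/gff`) that turns a byte stream into complete lines, carrying any partial line over to the next chunk. The controller is typed structurally so the sketch does not depend on DOM or Node.js type declarations:

```typescript
// Structural stand-in for TransformStreamDefaultController<string>
interface LineController {
  enqueue(line: string): void
}

// Hypothetical example class, not part of @gmod/gff. It has the same shape as
// a WHATWG Transformer, so it can be passed to `new TransformStream(...)`.
class LineSplitter {
  // Streaming UTF-8 decoder: an incomplete multi-byte character at the end of
  // a chunk is held back until the next chunk arrives
  private decoder = new TextDecoder('utf-8')
  // Text after the last newline seen so far (an unfinished line)
  private remainder = ''

  // Called once per incoming chunk of bytes
  transform(chunk: Uint8Array, controller: LineController): void {
    this.remainder += this.decoder.decode(chunk, { stream: true })
    const lines = this.remainder.split(/\r?\n/)
    // The last piece may be incomplete, so keep it for the next chunk
    this.remainder = lines.pop() ?? ''
    for (const line of lines) {
      controller.enqueue(line)
    }
  }

  // Called once after the last chunk; emits a final line with no trailing newline
  flush(controller: LineController): void {
    this.remainder += this.decoder.decode()
    if (this.remainder !== '') {
      controller.enqueue(this.remainder)
    }
  }
}
```

It is used the same way as the library's classes, e.g. `byteStream.pipeThrough(new TransformStream(new LineSplitter()))`. Since the decoder and the unfinished line live on the instance, each `TransformStream` in this sketch needs its own transformer object.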

Here is an example from the updated README of how the stream parsing can be used in Node.js and the browser:

Node.js example

import {
  createReadStream,
  createWriteStream,
  readFileSync,
  writeFileSync,
} from 'fs'
// Readable.toWeb and Writable.toWeb are only available in Node.js v18 and up
// in Node.js 16, you'll have to provide your own stream source and sink
import { Readable, Writable } from 'stream'
// TransformStream is available without importing in Node.js v18 and up
import { TransformStream } from 'stream/web'
import {
  formatSync,
  parseStringSync,
  GFFTransformer,
  GFFFormattingTransformer,
} from '@gmod/gff'

// parse a file from a file name. parses only features and sequences by default,
// set options to parse directives and/or comments
;(async () => {
  const readStream = createReadStream('/path/to/my/file.gff3')
  const streamOfGFF3 = Readable.toWeb(readStream).pipeThrough(
    new TransformStream(
      new GFFTransformer({ parseComments: true, parseDirectives: true }),
    ),
  )
  for await (const data of streamOfGFF3) {
    if ('directive' in data) {
      console.log('got a directive', data)
    } else if ('comment' in data) {
      console.log('got a comment', data)
    } else if ('sequence' in data) {
      console.log('got a sequence from a FASTA section')
    } else {
      console.log('got a feature', data)
    }
  }
})()
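The comment in the Node.js example leaves the Node.js 16 stream source up to the reader. One way to provide it, sketched under the assumption that `stream/web` is available (it was added, as experimental, in Node.js 16.5), is to pull from any async iterable of bytes. Node.js `Readable` streams, including the one returned by `createReadStream`, are async iterable. The helper name `fromAsyncIterable` is hypothetical:

```typescript
import { ReadableStream } from 'stream/web'

// Hypothetical helper: wraps any async iterable of byte chunks (for example a
// Node.js Readable from fs.createReadStream) in a web ReadableStream.
// Chunks are pulled only when the consumer asks for more, so backpressure is
// preserved.
function fromAsyncIterable(
  source: AsyncIterable<Uint8Array>,
): ReadableStream<Uint8Array> {
  const iterator = source[Symbol.asyncIterator]()
  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      const { value, done } = await iterator.next()
      if (done) {
        controller.close()
      } else {
        controller.enqueue(value)
      }
    },
    async cancel() {
      // Let the underlying source clean up (e.g. close the file descriptor)
      await iterator.return?.()
    },
  })
}
```

With it, the example's source could become `fromAsyncIterable(createReadStream('/path/to/my/file.gff3')).pipeThrough(new TransformStream(new GFFTransformer({ parseComments: true, parseDirectives: true })))`, with `TransformStream` imported from `stream/web` as in the example.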

Browser example

import { GFFTransformer } from '@gmod/gff'

// parse a file from a URL. parses only features and sequences by default, set
// options to parse directives and/or comments
;(async () => {
  const response = await fetch('http://example.com/file.gff3')
  if (!response.ok) {
    throw new Error('Bad response')
  }
  if (!response.body) {
    throw new Error('No response body')
  }
  const reader = response.body
    .pipeThrough(
      new TransformStream(
        new GFFTransformer({ parseComments: true, parseDirectives: true }),
      ),
    )
    .getReader()
  let result
  do {
    result = await reader.read()
    if (result.done) {
      continue
    }
    const data = result.value
    if ('directive' in data) {
      console.log('got a directive', data)
    } else if ('comment' in data) {
      console.log('got a comment', data)
    } else if ('sequence' in data) {
      console.log('got a sequence from a FASTA section')
    } else {
      console.log('got a feature', data)
    }
  } while (!result.done)
})()
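The browser example reads with `getReader()` rather than `for await`, presumably (an assumption) because async iteration of `ReadableStream` is not supported in every browser. If that loop is needed in several places, it can be wrapped in an async generator. The helper name `iterateStream` is hypothetical and not part of `@gmod/gff`:

```typescript
// Hypothetical helper: iterate a ReadableStream with an explicit reader, which
// works in environments where ReadableStream is not async iterable.
async function* iterateStream<T>(stream: ReadableStream<T>): AsyncGenerator<T> {
  const reader = stream.getReader()
  try {
    while (true) {
      const { done, value } = await reader.read()
      if (done) {
        return
      }
      yield value
    }
  } finally {
    // Release the lock so the stream can be cancelled or read elsewhere,
    // including when the consumer stops iterating early
    reader.releaseLock()
  }
}
```

The `do ... while` loop could then be written as `for await (const data of iterateStream(stream)) { ... }`, where `stream` is the response body piped through the transformer.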

This would be a major version bump, and a couple of other breaking changes are included:

  • The parseAll and encoding parse options have been removed.
    • parseAll could be difficult to reason about if a user tried to use it at the same time as, e.g., parseFeatures
    • encoding was unused, and the only spec-compliant encoding of GFF3 is UTF-8
  • The parseStream and formatStream functions were removed, since they no longer add any functionality beyond piping through GFFTransformer or GFFFormattingTransformer
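With `encoding` gone, a transformer can always decode its input as UTF-8. One detail to keep in mind when doing that on a stream is that a multi-byte character can be split across two chunks; `TextDecoder` handles this when called with `{ stream: true }`. A short demonstration, independent of `@gmod/gff`:

```typescript
// "é" (U+00E9) is two bytes in UTF-8 (0xC3 0xA9); split them across two chunks
const bytes = new TextEncoder().encode('Name=caf\u00e9')
const firstChunk = bytes.slice(0, bytes.length - 1)
const secondChunk = bytes.slice(bytes.length - 1)

// Without streaming mode each chunk is decoded on its own, so the dangling
// lead byte becomes U+FFFD and the lone continuation byte does too
const naive = new TextDecoder('utf-8')
const broken = naive.decode(firstChunk) + naive.decode(secondChunk)

// In streaming mode the decoder holds the incomplete sequence until the
// next chunk arrives; the final decode() call flushes anything left over
const streaming = new TextDecoder('utf-8')
const intact =
  streaming.decode(firstChunk, { stream: true }) +
  streaming.decode(secondChunk, { stream: true }) +
  streaming.decode()

console.log(intact) // Name=café
console.log(broken === intact) // false
```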

One remaining improvement that would be nice to add is better TypeScript inference for the streaming functionality. For example, with

const result = parseStringSync(gff3String, { parseSequences: false, parseComments: true })

the type of result is (GFF3Feature | GFF3Comment)[], since it's able to reason about what is being parsed. I haven't been able to figure out how to do that with the Transformers, though, so with

const resultStream = new TransformStream(new GFFTransformer({ parseSequences: false, parseComments: true }))

the type of resultStream is TransformStream<Uint8Array, GFF3Item>, when I'd like it to be TransformStream<Uint8Array, GFF3Feature | GFF3Comment>.
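One way this could work, offered as a sketch rather than a drop-in change for the library, is to make the transformer class generic over the literal type of its options object and derive the output type with conditional types. Everything here is a hypothetical stand-in: the item interfaces, `ParseOptions`, `ItemFor`, and `TypedGFFTransformer` do not come from `@gmod/gff`, and the line classification is a toy. The defaults encoded in `ItemFor` follow the README comments (features and sequences on, directives and comments off). Because the constructor parameter is typed `O extends ParseOptions`, TypeScript keeps `true`/`false` as literal types when inferring `O` from an object literal, which is what lets the conditional types pick the right members:

```typescript
// Hypothetical stand-ins for the library's item types (not the real shapes)
interface FeatureItem { feature: string }
interface DirectiveItem { directive: string }
interface CommentItem { comment: string }
interface SequenceItem { sequence: string }
type AnyItem = FeatureItem | DirectiveItem | CommentItem | SequenceItem

interface ParseOptions {
  parseFeatures?: boolean
  parseDirectives?: boolean
  parseComments?: boolean
  parseSequences?: boolean
}

// Included unless the option is the literal `false`
type OnUnlessDisabled<O, K extends keyof ParseOptions, T> =
  O extends { [P in K]: false } ? never : T
// Excluded if the option is absent or `false`; included if it is `true` or a
// plain `boolean` whose value is not known at compile time
type OffUnlessEnabled<O, K extends keyof ParseOptions, T> =
  O extends { [P in K]?: false } ? never : T

// Features and sequences default to on, directives and comments to off
type ItemFor<O extends ParseOptions> =
  | OnUnlessDisabled<O, 'parseFeatures', FeatureItem>
  | OnUnlessDisabled<O, 'parseSequences', SequenceItem>
  | OffUnlessEnabled<O, 'parseDirectives', DirectiveItem>
  | OffUnlessEnabled<O, 'parseComments', CommentItem>

// Structural stand-in for TransformStreamDefaultController
interface Sink<T> { enqueue(chunk: T): void }

class TypedGFFTransformer<O extends ParseOptions> {
  private decoder = new TextDecoder('utf-8')
  private remainder = ''
  private inFasta = false

  // O is inferred from the object literal, keeping `true`/`false` literals
  constructor(private readonly options: O) {}

  transform(chunk: Uint8Array, controller: Sink<ItemFor<O>>): void {
    this.remainder += this.decoder.decode(chunk, { stream: true })
    const lines = this.remainder.split(/\r?\n/)
    this.remainder = lines.pop() ?? ''
    for (const line of lines) this.emit(line, controller)
  }

  flush(controller: Sink<ItemFor<O>>): void {
    this.remainder += this.decoder.decode()
    if (this.remainder !== '') this.emit(this.remainder, controller)
  }

  // Toy classifier: after ##FASTA only header lines become sequence items,
  // and the residue lines under them are dropped
  private classify(line: string): AnyItem | undefined {
    const { parseFeatures, parseDirectives, parseComments, parseSequences } =
      this.options
    if (this.inFasta) {
      return line.startsWith('>') && parseSequences !== false
        ? { sequence: line.slice(1) }
        : undefined
    }
    if (line === '##FASTA') {
      this.inFasta = true
      return undefined
    }
    if (line.startsWith('##')) {
      return parseDirectives ? { directive: line.slice(2) } : undefined
    }
    if (line.startsWith('#')) {
      return parseComments ? { comment: line.slice(1) } : undefined
    }
    if (line.trim() === '') return undefined
    return parseFeatures !== false ? { feature: line } : undefined
  }

  private emit(line: string, controller: Sink<ItemFor<O>>): void {
    const item = this.classify(line)
    // The runtime checks in classify mirror ItemFor<O>; the compiler cannot
    // prove that link on its own, hence the cast
    if (item !== undefined) controller.enqueue(item as ItemFor<O>)
  }
}

// Compile-time example: this resolves to FeatureItem | CommentItem
type Example = ItemFor<{ parseSequences: false; parseComments: true }>
```

With this sketch, `new TransformStream(new TypedGFFTransformer({ parseSequences: false, parseComments: true }))` should be inferred as `TransformStream<Uint8Array, FeatureItem | CommentItem>`. When the options are a variable typed as plain `ParseOptions`, the `boolean` properties make `ItemFor` fall back to the full union, which stays sound.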

@garrettjstevens garrettjstevens self-assigned this Apr 3, 2023
 this.disableDerivesFromReferences =
   args.disableDerivesFromReferences || false

 // number of lines to buffer
-this.bufferSize = args.bufferSize === undefined ? 1000 : args.bufferSize
+this.bufferSize = args.bufferSize === undefined ? 50000 : args.bufferSize
Contributor


I do want to call out this change. I have run into the buffer size not being large enough, and it is a maddening error. I think it is a potential source of serious, mysterious bugs, and it may be worth removing the limit entirely by making it Infinity (similar to a chunk size error or similar: just something the user shouldn't have to configure).
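The failure mode described here can be illustrated with a toy model. This is an assumption-laden sketch, not the `@gmod/gff` implementation of `bufferSize`: records are held in a buffer so that later children can be attached to earlier parents, the buffer is emptied at a `###` directive (which in GFF3 marks that all forward references seen so far are resolved) or at the end of input, and a hypothetical cap empties it early once more than `bufferSize` records are held. A child that arrives after its parent was flushed can no longer be attached:

```typescript
// Toy record: an ID and an optional Parent ID (hypothetical, not a real GFF3 type)
interface ToyRecord {
  id: string
  parent?: string
}

// Toy model of a buffering parser, NOT the @gmod/gff implementation.
// Records are grouped into families (a top-level feature plus descendants).
// The buffer is emptied at '###', at the end of input, or when more than
// bufferSize records are held.
function groupFeatures(
  records: Array<ToyRecord | '###'>,
  bufferSize: number,
): { emitted: string[][]; orphans: string[] } {
  const emitted: string[][] = []
  const orphans: string[] = []
  let families: string[][] = []
  const familyOf = new Map<string, string[]>() // buffered ID -> its family
  let buffered = 0

  const flush = (): void => {
    emitted.push(...families)
    families = []
    familyOf.clear()
    buffered = 0
  }

  for (const record of records) {
    if (record === '###') {
      flush()
      continue
    }
    if (record.parent === undefined) {
      const family = [record.id]
      families.push(family)
      familyOf.set(record.id, family)
    } else {
      const family = familyOf.get(record.parent)
      if (family === undefined) {
        // The parent was already flushed (or never existed): unresolvable
        orphans.push(record.id)
        continue
      }
      family.push(record.id)
      familyOf.set(record.id, family)
    }
    buffered += 1
    // An early flush here is what a too-small cap causes
    if (buffered > bufferSize) {
      flush()
    }
  }
  flush()
  return { emitted, orphans }
}

const records: Array<ToyRecord | '###'> = [
  { id: 'gene1' },
  { id: 'mRNA1', parent: 'gene1' },
  { id: 'exon1', parent: 'mRNA1' },
  { id: 'exon2', parent: 'mRNA1' },
]
console.log(groupFeatures(records, 2).orphans) // [ 'exon2' ]
console.log(groupFeatures(records, Infinity).orphans) // []
```

In this model, `Infinity` removes the failure mode at the cost of memory: a file with no `###` directives is held entirely in memory until the stream ends.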
