Formalize sqibble #83
Labels
discussion
Development direction idea to quarrel over
enhancement
We don't do that here... yet
refactor
Because we have too much free time
`sqibble` is currently an informal idea; formalizing it could benefit the package in several ways.

We can define a `sqibble` as a `tibble` containing at least one column of type `sq`. Exactly one of the `sq` columns has the special role of being the "sequence" column. A `sqibble` also has an attribute, `column_roles`, which is a named character vector with at least one element. That element is named `sequence`, and its value is the name of the "sequence" column (which is usually `"sequence"`).

Other columns in the `sqibble` can also have roles specified. The `column_roles` attribute maps each role (the role name is determined by the functions that use and generate the column) to the column's actual name (which can change). Another frequently used role will probably be "name", a column that holds the names of the sequences.

With roles specified this way, we will be able to create a function (working title: `extract_role_column`) that extracts the column with the required role from a `sqibble`. If no such column is available, the function will either give a warning and return a column of `NA`, or raise an error; the user will be able to specify the security level (as with other functions).

Why do we need such a formalization? It will allow us to write functions that operate on a single object instead of functions that take several vectors, one of which is a sequence vector. A current example is `write_fasta`, which takes two vectors: `x` and `name`. With the formalization described above, the function could instead take a single parameter: `sqibble`. The requirement will be that the `sqibble` has columns with the roles "sequence" (which, recall, is a general requirement for a `sqibble`) and "name". A call to `write_fasta(some_sqibble)` will then be equivalent to a call to
`write_fasta(x = some_sqibble %>% extract_role_column("sequence"), name = some_sqibble %>% extract_role_column("name"))`
which currently, with unformalized `sqibbles`, looks like this:
`write_fasta(x = some_sqibble %>% pull("whatever-name-sequence-column-has-i-have-no-freaking-idea"), name = some_sqibble %>% pull("whatever-name-name-has"))`
This would make the package easier to use and more convenient for potential developers.
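
To make the proposal more concrete, here is a minimal sketch of how the `column_roles` attribute and `extract_role_column` might work. Everything in it is an assumption for discussion: `new_sqibble` and the `on_missing` argument are hypothetical names, and plain character columns stand in for real `sq` columns so that the example only needs `tibble`.

```r
library(tibble)

# Hypothetical constructor: attaches the proposed `column_roles` attribute.
# Plain character columns stand in for `sq` columns in this sketch.
new_sqibble <- function(data, column_roles = c(sequence = "sequence")) {
  data <- as_tibble(data)
  stopifnot(
    is.character(column_roles),
    "sequence" %in% names(column_roles),      # the mandatory role
    all(column_roles %in% colnames(data))     # every role points at a real column
  )
  attr(data, "column_roles") <- column_roles
  data
}

# Working title from this issue; `on_missing` is an assumed stand-in for
# the user-selectable handling of a missing role.
extract_role_column <- function(sqibble, role,
                                on_missing = c("warning", "error")) {
  on_missing <- match.arg(on_missing)
  roles <- attr(sqibble, "column_roles")
  if (!role %in% names(roles)) {
    msg <- sprintf("no column with role '%s'", role)
    if (on_missing == "error") stop(msg, call. = FALSE)
    warning(msg, call. = FALSE)
    return(rep(NA, nrow(sqibble)))            # column of NA, as proposed
  }
  sqibble[[roles[[role]]]]
}

# Column names differ from role names; the attribute does the mapping.
some_sqibble <- new_sqibble(
  tibble(seq_col = c("ACGT", "TTGA"), id = c("first", "second")),
  column_roles = c(sequence = "seq_col", name = "id")
)

extract_role_column(some_sqibble, "sequence")
extract_role_column(some_sqibble, "name")
```

A `sqibble`-aware `write_fasta` could then pass these two extracted columns to the existing two-vector version, without the caller needing to know the actual column names.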