Releases: tskit-dev/tskit
C API 0.99.3
C API release.
Breaking changes
- `tsk_mutation_table_add_row` has an extra `time` argument. If the time is unknown, `TSK_UNKNOWN_TIME` should be passed. (@benjeffery, #672)
- Change genotypes from unsigned to signed to accommodate missing data (see #144 for discussion). This only affects users of the `tsk_vargen_t` class. Genotypes are now stored as `int8_t` and `int16_t` types rather than the former unsigned types. The field names in the genotypes union of the `tsk_variant_t` struct returned by `tsk_vargen_next` have been renamed to `i8` and `i16` accordingly; care should be taken when updating client code to ensure that types are correct. The number of distinct alleles supported by 8-bit genotypes has therefore dropped from 255 to 127, with a similar reduction for 16-bit genotypes.
- Change the `tsk_vargen_init` method to take an extra parameter, `alleles`. To keep the current behaviour, set this parameter to `NULL`.
- Edges can now have metadata. Hence, edge methods now take two extra arguments: `metadata` and `metadata_length`. The file format has also changed to accommodate this, but is backwards compatible. Edge metadata can be disabled for a table collection with the `TSK_NO_EDGE_METADATA` flag. (@benjeffery, #496, #712)
- Migrations can now have metadata. Hence, migration methods now take two extra arguments: `metadata` and `metadata_length`. The file format has also changed to accommodate this, but is backwards compatible. (@benjeffery, #505)
- The text dump of tables with metadata now includes the metadata schema as a header. (@benjeffery, #493)
- Bad tree topologies are detected earlier, so that it is no longer possible to create a `tsk_treeseq_t` object which contains a parent with contradictory children on an interval. Previously, an error occurred only when an operation that builds the trees was attempted. (@jeromekelleher, #709)
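A minimal Python sketch of why signed storage is needed, assuming that missing data is encoded as the sentinel -1 (the value tskit's `TSK_MISSING_DATA` constant is assumed to take here). The helper `encode_genotypes` is hypothetical and not part of tskit; it packs allele indexes into a signed 8-bit `array`, the closest standard-library analogue of `int8_t`, and rejects indexes that no longer fit.

```python
from array import array

# Sentinel for missing data. -1 is assumed here to mirror tskit's
# TSK_MISSING_DATA constant.
MISSING_DATA = -1
# Largest value a signed 8-bit integer (int8_t) can hold.
INT8_MAX = 127


def encode_genotypes(allele_indexes):
    """Pack allele indexes (None = missing) into a signed 8-bit array.

    Hypothetical helper for illustration; not part of the tskit API.
    """
    genotypes = array("b")  # typecode "b" is a signed char, like int8_t
    for index in allele_indexes:
        if index is None:
            genotypes.append(MISSING_DATA)
        elif 0 <= index <= INT8_MAX:
            genotypes.append(index)
        else:
            raise ValueError(f"allele index {index} does not fit in int8_t")
    return genotypes


print(list(encode_genotypes([0, 1, None, 2])))  # [0, 1, -1, 2]
```

An unsigned 8-bit type cannot hold -1 at all, so reserving a negative sentinel for missing data is what forces the switch to signed types and roughly halves the range available for allele indexes.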
New features
- New methods to perform set operations on table collections. `tsk_table_collection_subset` subsets and reorders table collections by nodes (@mufernando, @petrelharp, #663, #690). `tsk_table_collection_union` forms the node-wise union of two table collections (@mufernando, @petrelharp, #381, #623).
- Mutations now have an optional double-precision floating-point `time` column. If not specified, this defaults to a particular `NaN` value (`TSK_UNKNOWN_TIME`) indicating that the time is unknown. For a tree sequence to be considered valid, it must meet new criteria for mutation times; see Mutation requirements. Add `tsk_table_collection_compute_mutation_times` and a new flag for `tsk_table_collection_check_integrity`: `TSK_CHECK_MUTATION_TIME`. Table sorting orders mutations by non-increasing time per site, which is also a requirement for a valid tree sequence. (@benjeffery, #672)
- Add `metadata` and `metadata_schema` fields to the table collection, with accessors on the tree sequence. These store arbitrary bytes and are optional in the file format. (@benjeffery, #641)
- Add the `TSK_KEEP_UNARY` option to simplify (@gtsambos). See #1 and #143.
- Add a `set_root_threshold` option to `tsk_tree_t`, which sets the number of samples a node must be an ancestor of to be considered a root. (#462)
- Change the semantics of `tsk_tree_t` so that sample counts are always computed, and add a new `TSK_NO_SAMPLE_COUNTS` option to turn this off. (#462)
- Tables with metadata now have an optional `metadata_schema` field that can contain arbitrary bytes. (@benjeffery, #493)
- Tables loaded from a file can now be edited in the same way as any other table collection. (@jeromekelleher, #536, #530)
- Support for reading/writing to arbitrary file streams with the `loadf`/`dumpf` variants for tree sequence and table collection load/dump. (@jeromekelleher, @grahamgower, #565, #599)
- Add a low-level sorting API and the `TSK_NO_CHECK_INTEGRITY` flag. (@jeromekelleher, #627, #626)
- Add an extension of the Kendall-Colijn tree distance metric for tree sequences, computed by `tsk_treeseq_kc_distance`. (@daniel-goldstein, #548)
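The mutation-time changes rest on two ideas: a sentinel for unknown times that is one specific `NaN`, and non-increasing times within each site. The Python sketch below illustrates both under stated assumptions: the NaN bit pattern is arbitrary and merely stands in for `TSK_UNKNOWN_TIME`, and `is_unknown_time` and `times_non_increasing_per_site` are hypothetical helpers, not tskit functions. Because a NaN compares unequal to everything, including itself, the sentinel has to be recognised by its bit pattern rather than with `==`.

```python
import math
import struct

# A quiet NaN with a non-default payload. This exact bit pattern is an
# arbitrary choice for illustration, not necessarily the value tskit uses.
_UNKNOWN_TIME_BITS = 0x7FF80000000006E1
UNKNOWN_TIME = struct.unpack("<d", struct.pack("<Q", _UNKNOWN_TIME_BITS))[0]


def is_unknown_time(value):
    """True only for the sentinel NaN, compared bit for bit."""
    return struct.unpack("<Q", struct.pack("<d", value))[0] == _UNKNOWN_TIME_BITS


def times_non_increasing_per_site(mutations):
    """Check that (site, time) rows list times in non-increasing order per site.

    Rows with an unknown time are skipped in this simplified check.
    """
    previous = {}  # site -> last known time seen at that site
    for site, time in mutations:
        if is_unknown_time(time):
            continue
        if site in previous and time > previous[site]:
            return False
        previous[site] = time
    return True


print(math.isnan(UNKNOWN_TIME), UNKNOWN_TIME == UNKNOWN_TIME)  # True False
print(is_unknown_time(UNKNOWN_TIME), is_unknown_time(float("nan")))  # True False
```

Comparing bits also keeps an ordinary `NaN` produced by arithmetic from being mistaken for the unknown-time marker. Skipping unknown times in the ordering check is a simplification of this sketch; the actual validity rules are those described under Mutation requirements.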
Deprecated
- The `TSK_SAMPLE_COUNTS` option is now ignored and will print out a warning if used. (#462)
Minor feature release
Minor feature release, providing a tree distance metric and various methods to manipulate tree sequence data.
New features
- Kendall-Colijn tree distance metric computed by Tree.kc_distance (@awohns, #172).
- New “timeasc” and “timedesc” orders for tree traversals (@benjeffery, #246, #399).
- Up to 2X performance improvements to tree traversals (@benjeffery, #400).
- Add trim, delete_sites, keep_intervals and delete_intervals methods to edit tree sequence data. (@hyanwong, #364, #372, #377, #390).
- Initial online documentation for CLI (@hyanwong, #414).
- Various documentation improvements (@hyanwong, @jeromekelleher, @petrelharp).
- Rename the `map_ancestors` function to `link_ancestors` (@hyanwong, @gtsambos; #406, #262). The original function is retained as a deprecated alias.
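The "timeasc" and "timedesc" orders visit the nodes of a tree sorted by node time. In tskit these orders are requested through the tree traversal methods, e.g. `tree.nodes(order="timeasc")`. As an illustration only, the Python sketch below computes such orders from a plain dictionary of node times; `traversal_order` is a hypothetical helper, and breaking ties by node ID (and defining "timedesc" as the exact reverse of "timeasc") are assumptions of the sketch rather than documented tskit behaviour.

```python
def traversal_order(node_times, order):
    """Return node IDs sorted for a "timeasc" or "timedesc" traversal.

    node_times maps node ID -> time. Ties are broken by node ID, and
    "timedesc" is taken to be the exact reverse of "timeasc"; both choices
    are assumptions of this sketch.
    """
    ascending = sorted(node_times, key=lambda u: (node_times[u], u))
    if order == "timeasc":
        return ascending
    if order == "timedesc":
        return ascending[::-1]
    raise ValueError(f"unknown order: {order!r}")


# Node times for a small tree: samples 0 and 1 at time 0, internal node 2, root 3.
times = {3: 3.0, 2: 1.5, 1: 0.0, 0: 0.0}
print(traversal_order(times, "timeasc"))   # [0, 1, 2, 3]
print(traversal_order(times, "timedesc"))  # [3, 2, 1, 0]
```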
Bugfixes
- Fix height scaling issues with SVG tree drawing (@jeromekelleher, #407, #383, #378).
- Do not reuse buffers in LdCalculator (@jeromekelleher). See #397 and #396.
Bugfix release
Minor bugfix release.
Relaxes overly strict input requirements on individual location data that caused some SLiM tree sequences to fail loading in version 0.2.1 (see issue #351).
New features
- Add `log_time` height scaling option for drawing SVG trees (@marianne-aspbury). See PR #324 and issue #303.
Bugfixes
- Allow 4G metadata columns (@jeromekelleher). See PR #342 and issue #341.
Major feature release
Major feature release, adding support for population genetic statistics,
improved VCF output and many other features.
Note: Version 0.2.0 was skipped because of an error uploading to PyPI
which could not be undone.
Breaking changes
- Genotype arrays returned by `TreeSequence.variants` and `TreeSequence.genotype_matrix` have changed from unsigned 8-bit values to signed 8-bit values to accommodate missing data (see issue #144 for discussion). Specifically, the dtype of the genotype arrays has changed from numpy `uint8` to `int8`. This should not affect client code in any way unless it specifically depends on the type of the returned numpy array.
- The VCF written by the `write_vcf` method is no longer compatible with previous versions, which had significant shortcomings. Position values are now rounded to the nearest integer by default, and REF and ALT values are derived from the actual allelic states (rather than always being A and T). Sample names are now of the form `tsk_j` for sample ID j. Most of the legacy behaviour can, however, be recovered with new options.
- The positional parameter `reference_sets` in the `genealogical_nearest_neighbours` and `mean_descendants` TreeSequence methods has been renamed to `sample_sets`.
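The new VCF conventions (integer positions, REF and ALT taken from the allelic states, and `tsk_j` sample names) can be illustrated with a short Python sketch. The helpers `vcf_site_fields` and `default_sample_names` are hypothetical, and treating the first allele as REF and the remaining alleles as ALT is an assumption about how "derived from the actual allelic states" plays out, not a description of `write_vcf` internals.

```python
def vcf_site_fields(position, alleles):
    """Return (POS, REF, ALT) for one site.

    Assumes alleles[0] is the reference state and any further alleles are
    alternates; "." stands for "no ALT allele" in VCF.
    """
    pos = round(position)  # nearest integer; exact halves round to even
    ref = alleles[0]
    alt = ",".join(alleles[1:]) if len(alleles) > 1 else "."
    return pos, ref, alt


def default_sample_names(num_samples):
    """Sample names of the form tsk_j for sample ID j."""
    return [f"tsk_{j}" for j in range(num_samples)]


print(vcf_site_fields(10.7, ["A", "G"]))  # (11, 'A', 'G')
print(default_sample_names(3))  # ['tsk_0', 'tsk_1', 'tsk_2']
```

Python's built-in `round` rounds exact halves to the nearest even integer (for example, `round(2.5)` is `2`); this sketch does not claim that `write_vcf` breaks ties the same way.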
New features
- Support for general windowed statistics. Implementations of diversity, divergence, segregating sites, Tajima's D, Fst, Patterson's F statistics, Y statistics, trait correlations and covariance, and k-dimensional allele frequency spectra (@petrelharp, @jeromekelleher, @molpopgen).
- Add the `keep_unary` option to simplify (@gtsambos). See issue #1 and PR #143.
- Add the `map_ancestors` method to TableCollection (@gtsambos). See PR #175.
- Add the `squash` method to EdgeTable (@gtsambos). See issue #59 and PR #285.
- Add support for individuals to VCF output, and fix major issues with the output format (@jeromekelleher). Position values are transformed in a much more straightforward manner and output has been generalised substantially. Adds `individual_names` and `position_transform` arguments. See PR #286 and issues #2, #30 and #73.
- Control height scale in SVG trees using `tree_height_scale` and `max_tree_height` (@hyanwong, @jeromekelleher). See issue #167 and PR #168. Various other improvements to tree drawing (PRs #235, #241, #242, #252, #259).
- Add `Tree.max_root_time` property (@hyanwong, @jeromekelleher). See PR #170.
- Improved input checking on various methods taking numpy arrays as parameters (@hyanwong). See issue #8 and PR #185.
- Define the branch length over roots in trees to be zero (previously this raised an error).
- Implementation of the genealogical nearest neighbours statistic (@hyanwong, @jeromekelleher).
- New `delete_intervals` and `keep_intervals` methods for the TableCollection to allow slicing out topology from specific intervals (@hyanwong, @andrewkern, @petrelharp, @jeromekelleher). See PRs #225 and #261.
- Support for missing data via a topological definition (@jeromekelleher). See issue #270 and PR #272.
- Add the ability to set columns directly in the Tables API (@jeromekelleher). See issue #12 and PR #307.
- Various documentation improvements from @brianzhang, @hyanwong, @petrelharp and @jeromekelleher.
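`EdgeTable.squash` condenses edges that describe the same parent-child relationship over contiguous intervals: two edges are merged when they share parent and child and the right coordinate of one equals the left coordinate of the other. A minimal Python sketch of that merge, using plain `(left, right, parent, child)` tuples instead of a real `EdgeTable`; the function `squash_edges` and its sort key are assumptions for illustration, not the tskit implementation.

```python
def squash_edges(edges):
    """Merge abutting edges that share the same parent and child.

    `edges` is a list of (left, right, parent, child) tuples. Sorting by
    (parent, child, left) first puts mergeable edges next to each other.
    """
    squashed = []
    for left, right, parent, child in sorted(edges, key=lambda e: (e[2], e[3], e[0])):
        if squashed:
            prev_left, prev_right, prev_parent, prev_child = squashed[-1]
            if (prev_parent, prev_child) == (parent, child) and prev_right == left:
                # Extend the previous edge instead of adding a new row.
                squashed[-1] = (prev_left, right, parent, child)
                continue
        squashed.append((left, right, parent, child))
    return squashed


edges = [(0, 5, 10, 1), (5, 10, 10, 1), (0, 10, 10, 2)]
print(squash_edges(edges))  # [(0, 10, 10, 1), (0, 10, 10, 2)]
```

Edges separated by a gap (for example `(0, 3, ...)` and `(5, 10, ...)`) are left as separate rows, because merging them would claim the relationship also holds on the gap.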
Deprecated
- Deprecate `Tree.length` in favour of `Tree.span` (@hyanwong). See PR #169.
- Deprecate `TreeSequence.pairwise_diversity` in favour of the new `diversity` method. See issue #215 and PR #312.
Bugfixes
- Catch NaN and infinity values within tables (@hyanwong). See issue #293 and PR #294.
Alpha access to AFS and VCF updates
An alpha release for testing new stats, allele frequency spectrum and VCF updates.
Alpha access to general stats
Alpha release to give early access to new stats and drawing APIs.
Removing Python 2 support
This release removes support for Python 2, adds more flexible tree access and a new tskit
command line interface.
New features
- Remove support for Python 2 (@hugovk). See issue #137 and PR #140.
- More flexible tree API (PR #121). Adds `TreeSequence.at` and `TreeSequence.at_index` methods to find specific trees, and efficient support for backwards traversal using `reversed(ts.trees())`.
- Add initial `tskit` CLI (issue #80).
- Add `tskit info` CLI command (issue #66).
- Enable drawing SVG trees with coloured edges (@hyanwong; issue #149).
- Add `Tree.is_descendant` method (issue #120).
- Add `Tree.copy` method (issue #122).
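One straightforward way to answer the question `Tree.is_descendant` asks is to walk up the parent pointers from one node and see whether the other node is reached. The Python sketch below does this on a plain parent list; `is_descendant` here is a hypothetical stand-alone function, not the tskit implementation, and it assumes -1 marks "no parent" (mirroring tskit's NULL node) and counts a node as its own descendant.

```python
NULL = -1  # assumed "no parent" marker, mirroring tskit's NULL node


def is_descendant(parent, u, v):
    """Return True if v lies on the path from u up to its root.

    parent[x] is the parent of node x, or NULL if x is a root. A node is
    treated as a descendant of itself.
    """
    while u != NULL:
        if u == v:
            return True
        u = parent[u]
    return False


# Nodes 0 and 1 are children of 3; nodes 2 and 3 are children of root 4.
parent = [3, 3, 4, 4, NULL]
print(is_descendant(parent, 0, 4), is_descendant(parent, 2, 3))  # True False
```

The walk costs time proportional to the depth of `u`, which is adequate for occasional queries on a single tree.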
Bugfixes
- Fixes to the low-level C API (issues #132 and #157).
C API Bugfixes
Feature update
Draft C API release
Draft of the C API. The tables API should be quite mature and well documented. Changes will only be made if serious problems occur. The tree sequence and tree APIs are more provisional and are subject to change.
Changes:
- Change the `_tbl_` abbreviation to `_table_` to improve readability. Hence, we now have, e.g., `tsk_node_table_t` etc.
- Change `tsk_tbl_size_t` to `tsk_size_t`.
- Standardise the public API to use `tsk_size_t` and `tsk_id_t` as appropriate.
- Add the `tsk_flags_t` typedef and consistently use it as the type used to encode bitwise flags. To avoid confusion, functions now have an `options` parameter.
- Rename `tsk_table_collection_position_t` to `tsk_bookmark_t`.
- Rename `tsk_table_collection_reset_position` to `tsk_table_collection_truncate` and `tsk_table_collection_record_position` to `tsk_table_collection_record_num_rows`.
- Generalise `tsk_table_collection_sort` to take a bookmark as its start argument.
- Relax the restriction that nodes in the `samples` argument to simplify must currently be marked as samples. (#72)
- Allow `tsk_table_collection_simplify` to take a NULL `samples` argument to specify "all samples in the current tables".
- Add support for building as a meson subproject.
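The renamed bookmark functions describe a simple pattern: record how many rows each table holds, append rows, and later truncate every table back to the recorded counts. The Python analogue below is only a sketch of that pattern; the `Tables` class and its `record_num_rows` and `truncate` methods are hypothetical stand-ins for `tsk_table_collection_record_num_rows`, `tsk_bookmark_t` and `tsk_table_collection_truncate`, and the real C signatures are not reproduced here.

```python
class Tables:
    """Toy stand-in for a table collection: named lists of rows.

    Hypothetical class for illustration; not the tskit C structures.
    """

    def __init__(self):
        self.tables = {"nodes": [], "edges": [], "sites": [], "mutations": []}

    def record_num_rows(self):
        """Capture the current row count of every table (a "bookmark")."""
        return {name: len(rows) for name, rows in self.tables.items()}

    def truncate(self, bookmark):
        """Drop any rows added after the bookmark was recorded."""
        for name, rows in self.tables.items():
            del rows[bookmark[name]:]


t = Tables()
t.tables["nodes"].append((1, 0.0))  # (flags, time) for one sample node
bookmark = t.record_num_rows()
t.tables["nodes"].append((0, 1.0))
t.tables["edges"].append((0.0, 1.0, 1, 0))  # (left, right, parent, child)
t.truncate(bookmark)
print(len(t.tables["nodes"]), len(t.tables["edges"]))  # 1 0
```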