
Releases: tskit-dev/tskit

C API 0.99.3

27 Jul 14:46
ab54128
Pre-release

C API release.

Breaking changes

  • tsk_mutation_table_add_row has an extra time argument. If the time is unknown, TSK_UNKNOWN_TIME should be passed. (@benjeffery, #672)
  • Change genotypes from unsigned to signed to accommodate missing data (see #144 for discussion). This only affects users of the tsk_vargen_t class. Genotypes are now stored as int8_t and int16_t types rather than the former unsigned types. The field names in the genotypes union of the tsk_variant_t struct returned by tsk_vargen_next have been renamed to i8 and i16 accordingly; take care when updating client code to ensure that types are correct. The number of distinct alleles supported by 8-bit genotypes has therefore dropped from 255 to 127, with a similar reduction for 16-bit genotypes.
  • Change the tsk_vargen_init method to take an extra alleles parameter. To keep the previous behaviour, set this parameter to NULL.
  • Edges can now have metadata. Hence edge methods now take two extra arguments: metadata and metadata length. The file format has also changed to accommodate this, but is backwards compatible. Edge metadata can be disabled for a table collection with the TSK_NO_EDGE_METADATA flag. (@benjeffery, #496, #712)
  • Migrations can now have metadata. Hence migration methods now take two extra arguments: metadata and metadata length. The file format has also changed to accommodate this, but is backwards compatible. (@benjeffery, #505)
  • The text dump of tables with metadata now includes the metadata schema as a header. (@benjeffery, #493)
  • Bad tree topologies are now detected earlier, so it is no longer possible to create a tsk_treeseq_t object that contains a parent with contradictory children on an interval. Previously, an error occurred only when an operation that builds the trees was attempted. (@jeromekelleher, #709)
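To make the signed genotype encoding concrete, here is a minimal sketch in plain Python rather than the tskit C API. It assumes, as a hypothetical convention for this example only, that -1 marks a missing genotype; encode_genotypes and MISSING are illustrative names, not part of tskit.

```python
from array import array

MISSING = -1  # hypothetical sentinel for a missing genotype
INT8_MAX = 127  # largest value a signed 8-bit integer can hold


def encode_genotypes(allele_indexes):
    """Pack allele indexes (None means missing) into a signed 8-bit array."""
    out = array("b")  # typecode "b" is a signed char
    for index in allele_indexes:
        if index is None:
            out.append(MISSING)
        elif 0 <= index <= INT8_MAX:
            out.append(index)
        else:
            raise OverflowError(f"allele index {index} does not fit in int8")
    return out


print(list(encode_genotypes([0, 1, None, 2])))  # [0, 1, -1, 2]
```

Reserving negative values for missing data leaves only the non-negative half of the int8 range for allele indexes, which is why the allele limit shrinks.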

New features

  • New methods to perform set operations on table collections. tsk_table_collection_subset subsets and reorders table collections by nodes (@mufernando, @petrelharp, #663, #690). tsk_table_collection_union forms the node-wise union of two table collections. (@mufernando, @petrelharp, #381, #623)
  • Mutations now have an optional double-precision floating-point time column. If not specified, this defaults to a particular NaN value (TSK_UNKNOWN_TIME) indicating that the time is unknown. For a tree sequence to be considered valid, it must meet new criteria for mutation times; see Mutation requirements. Add tsk_table_collection_compute_mutation_times and a new TSK_CHECK_MUTATION_TIME flag for tsk_table_collection_check_integrity. Table sorting orders mutations by non-increasing time per site, which is also a requirement for a valid tree sequence. (@benjeffery, #672)
  • Add metadata and metadata_schema fields to table collection, with accessors on tree sequence. These store arbitrary bytes and are optional in the file format. (@benjeffery, #641)
  • Add the TSK_KEEP_UNARY option to simplify (@gtsambos). See #1 and #143.
  • Add a set_root_threshold option to tsk_tree_t, which sets the number of samples a node must be an ancestor of to be considered a root. (#462)
  • Change the semantics of tsk_tree_t so that sample counts are always computed, and add a new TSK_NO_SAMPLE_COUNTS option to turn this off. (#462)
  • Tables with metadata now have an optional metadata_schema field that can contain arbitrary bytes. (@benjeffery, #493)
  • Tables loaded from a file can now be edited in the same way as any other table collection (@jeromekelleher, #536, #530)
  • Support for reading/writing to arbitrary file streams with the loadf/dumpf variants for tree sequence and table collection load/dump. (@jeromekelleher, @grahamgower, #565, #599)
  • Add low-level sorting API and TSK_NO_CHECK_INTEGRITY flag. (@jeromekelleher, #627, #626)
  • Add an extension of the Kendall-Colijn tree distance metric to tree sequences, computed by tsk_treeseq_kc_distance (@daniel-goldstein, #548)
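The per-site ordering rule for mutation times can be illustrated with a small check. This is a hypothetical sketch in plain Python, not tsk_table_collection_check_integrity: it uses an ordinary NaN as a stand-in for TSK_UNKNOWN_TIME, skips unknown times, and does not attempt the other mutation requirements.

```python
import math

UNKNOWN_TIME = math.nan  # stand-in for TSK_UNKNOWN_TIME, which is a specific NaN


def times_non_increasing_per_site(sites, times):
    """True if, at each site, known mutation times never increase row to row.

    Unknown (NaN) times are skipped in this simplified check.
    """
    last_time = {}
    for site, time in zip(sites, times):
        if math.isnan(time):
            continue
        if site in last_time and time > last_time[site]:
            return False
        last_time[site] = time
    return True


print(times_non_increasing_per_site([0, 0, 1], [5.0, 2.0, 3.0]))  # True
print(times_non_increasing_per_site([0, 0], [1.0, 4.0]))  # False
```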

Deprecated

  • The TSK_SAMPLE_COUNTS option is now ignored and prints a warning if used. (#462)

Minor feature release

22 Nov 15:37
842996b

Minor feature release, providing a tree distance metric and various methods to manipulate tree sequence data.
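Assuming the tree distance metric referred to here is the Kendall-Colijn metric listed in the C API notes above, its topology-only form (lambda = 0) for a single pair of rooted trees can be sketched in plain Python. Trees are given as hypothetical child-to-parent dicts; the tree sequence extension is not reproduced, and none of these helpers are tskit functions.

```python
import itertools
import math


def path_to_root(parent, node):
    """The node followed by each of its ancestors; parent maps child -> parent."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def kc_vector(parent, leaves):
    """Topology-only (lambda = 0) Kendall-Colijn vector of a rooted tree."""
    vector = []
    for a, b in itertools.combinations(sorted(leaves), 2):
        on_path_a = set(path_to_root(parent, a))
        mrca = next(u for u in path_to_root(parent, b) if u in on_path_a)
        # depth of the MRCA = number of edges between it and the root
        vector.append(len(path_to_root(parent, mrca)) - 1)
    vector.extend([1] * len(leaves))  # pendant-edge entries, all 1 when lambda = 0
    return vector


def kc_distance(parent1, parent2, leaves):
    """Euclidean distance between the two trees' KC vectors."""
    v1, v2 = kc_vector(parent1, leaves), kc_vector(parent2, leaves)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(v1, v2)))


tree_a = {0: 3, 1: 3, 3: 4, 2: 4}  # ((0,1),2)
tree_b = {0: 3, 2: 3, 3: 4, 1: 4}  # ((0,2),1)
print(kc_distance(tree_a, tree_a, [0, 1, 2]))  # 0.0
print(round(kc_distance(tree_a, tree_b, [0, 1, 2]), 4))  # 1.4142
```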

New features

Bugfixes

Bugfix release

01 Sep 09:46
8759e8c

Minor bugfix release.

Relaxes overly strict input requirements on individual location data that caused some SLiM tree sequences to fail to load in version 0.2.1 (see :issue:351).

New features

  • Add log_time height scaling option for drawing SVG trees (:user:marianne-aspbury). See :pr:324 and :issue:303.

Bugfixes

  • Allow 4G metadata columns (:user:jeromekelleher). See :pr:342 and :issue:341.
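Metadata is stored as a ragged column: one flat byte column plus num_rows + 1 offsets, with row j spanning offsets[j] to offsets[j + 1]. The sketch below uses plain Python with hypothetical helper names; the assumption that offsets are unsigned 32-bit integers (and hence the roughly 4G cap) is ours, made to show where such a limit comes from.

```python
MAX_OFFSET = 2**32 - 1  # assumed limit: offsets stored as unsigned 32-bit integers


def pack_ragged(rows):
    """Concatenate per-row byte strings into a data column plus offsets."""
    data = bytearray()
    offsets = [0]  # num_rows + 1 entries; row j spans offsets[j]:offsets[j + 1]
    for row in rows:
        data.extend(row)
        if len(data) > MAX_OFFSET:
            raise OverflowError("metadata column too large for 32-bit offsets")
        offsets.append(len(data))
    return bytes(data), offsets


def get_row(data, offsets, j):
    """Recover the metadata of row j."""
    return data[offsets[j]:offsets[j + 1]]


data, offsets = pack_ragged([b"abc", b"", b"de"])
print(offsets)  # [0, 3, 3, 5]
print(get_row(data, offsets, 2))  # b'de'
```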

Major feature release

23 Aug 16:48
67a8335

Major feature release, adding support for population genetic statistics,
improved VCF output and many other features.

Note: Version 0.2.0 was skipped because of an error uploading to PyPI
which could not be undone.

Breaking changes

  • Genotype arrays returned by TreeSequence.variants and
    TreeSequence.genotype_matrix have changed from unsigned 8-bit values
    to signed 8-bit values to accommodate missing data (see :issue:144 for
    discussion). Specifically, the dtype of the genotype arrays has changed
    from numpy uint8 ("u1") to int8 ("i1"). This should not affect client code
    unless it specifically depends on the type of the returned numpy array.

  • The VCF written by write_vcf is no longer compatible with that of previous
    versions, which had significant shortcomings. Position values are now rounded
    to the nearest integer by default, and REF and ALT values are derived from the
    actual allelic states (rather than always being A and T). Sample names
    are now of the form tsk_j for sample ID j. However, most of the legacy
    behaviour can be recovered with new options.

  • The positional parameter reference_sets in genealogical_nearest_neighbours
    and mean_descendants TreeSequence methods has been renamed to
    sample_sets.
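Two of the new VCF conventions, integer positions and tsk_j sample names, can be sketched in plain Python. These helpers are hypothetical and only illustrate the rules described above; they use Python's built-in round(), which sends exact halves to the nearest even integer, and whether write_vcf's default transform behaves identically on halves is an assumption.

```python
def vcf_positions(positions):
    """Round site positions to the nearest integer for the POS column."""
    # Python's round() sends exact halves to the nearest even integer.
    return [round(x) for x in positions]


def vcf_sample_names(sample_ids):
    """Column names of the form tsk_j for sample ID j."""
    return [f"tsk_{j}" for j in sample_ids]


print(vcf_positions([0.4, 2.5, 3.6]))  # [0, 2, 4]
print(vcf_positions([1.2, 1.4]))  # [1, 1]
print(vcf_sample_names(range(3)))  # ['tsk_0', 'tsk_1', 'tsk_2']
```

The second call shows something worth checking for downstream: distinct positions can round to the same integer.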

New features

  • Support for general windowed statistics. Implementations of diversity,
    divergence, segregating sites, Tajima's D, Fst, Patterson's F statistics,
    Y statistics, trait correlations and covariance, and k-dimensional allele
    frequency spectra (:user:petrelharp, :user:jeromekelleher, :user:molpopgen).

  • Add the keep_unary option to simplify (:user:gtsambos). See :issue:1
    and :pr:143.

  • Add the map_ancestors method to TableCollection (:user:gtsambos). See :pr:175.

  • Add the squash method to EdgeTable (:user:gtsambos). See :issue:59 and
    :pr:285.

  • Add support for individuals to VCF output, and fix major issues with output
    format (:user:jeromekelleher). Position values are transformed in a much
    more straightforward manner and output has been generalised substantially.
    Adds individual_names and position_transform arguments.
    See :pr:286, and issues :issue:2, :issue:30 and :issue:73.

  • Control height scale in SVG trees using 'tree_height_scale' and 'max_tree_height'
    (:user:hyanwong, :user:jeromekelleher). See :issue:167, :pr:168.
    Various other improvements to tree drawing (:pr:235, :pr:241, :pr:242,
    :pr:252, :pr:259).

  • Add Tree.max_root_time property (:user:hyanwong, :user:jeromekelleher).
    See :pr:170.

  • Improved input checking on various methods taking numpy arrays as parameters
    (:user:hyanwong). See :issue:8 and :pr:185.

  • Define the branch length over roots in trees to be zero (previously this raised an error).

  • Implementation of the genealogical nearest neighbours statistic
    (:user:hyanwong, :user:jeromekelleher).

  • New delete_intervals and keep_intervals methods for the TableCollection
    to allow topology to be sliced out of specific intervals (:user:hyanwong,
    :user:andrewkern, :user:petrelharp, :user:jeromekelleher). See
    :pr:225 and :pr:261.

  • Support for missing data via a topological definition (:user:jeromekelleher).
    See :issue:270 and :pr:272.

  • Add ability to set columns directly in the Tables API (:user:jeromekelleher).
    See :issue:12 and :pr:307.

  • Various documentation improvements from :user:brianzhang, :user:hyanwong,
    :user:petrelharp and :user:jeromekelleher.
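As an illustration of what a windowed statistic computes, here is a plain-Python sketch of site-based diversity: at each site, the proportion of sample pairs carrying different alleles, summed within each window and divided by the window's span. It works on a genotype matrix rather than on the trees, so it is a sketch of the definition under these assumptions, not tskit's implementation; windowed_diversity is a hypothetical name.

```python
import bisect
import itertools


def windowed_diversity(positions, genotypes, windows):
    """Mean pairwise diversity from sites, per window, divided by window span.

    positions: site positions; genotypes: allele indexes of every sample at
    each site; windows: sorted breakpoints [0, ..., sequence_length].
    """
    totals = [0.0] * (len(windows) - 1)
    for x, g in zip(positions, genotypes):
        pairs = list(itertools.combinations(g, 2))
        differing = sum(1 for a, b in pairs if a != b)
        k = bisect.bisect_right(windows, x) - 1  # window containing x
        totals[k] += differing / len(pairs)
    return [t / (windows[k + 1] - windows[k]) for k, t in enumerate(totals)]


values = windowed_diversity([1, 6], [[0, 0, 1, 1], [0, 0, 0, 1]], [0, 5, 10])
print([round(v, 4) for v in values])  # [0.1333, 0.1]
```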

Deprecated

  • Deprecate Tree.length in favour of Tree.span (:user:hyanwong).
    See :pr:169.

  • Deprecate TreeSequence.pairwise_diversity in favour of the new
    diversity method. See :issue:215, :pr:312.

Bugfixes

  • Catch NaN and infinity values within tables (:user:hyanwong).
    See :issue:293 and :pr:294.

Alpha access to AFS and VCF updates

13 Aug 15:10
056939f
Pre-release

An alpha release for testing new stats, allele frequency spectrum and VCF updates.

Alpha access to general stats

14 Jun 13:42
defa0f3
Pre-release

Alpha release to give early access to new stats and drawing APIs.

Removing Python 2 support

27 Mar 19:52
5c7b0f8

This release removes support for Python 2, adds more flexible tree access and a new tskit command line interface.

New features

  • Remove support for Python 2 (:user:hugovk). See :issue:137 and :pr:140.
  • More flexible tree API (:pr:121). Adds TreeSequence.at and TreeSequence.at_index methods to find specific trees, and efficient support for backwards traversal using reversed(ts.trees()).
  • Add initial tskit CLI (:issue:80)
  • Add tskit info CLI command (:issue:66)
  • Enable drawing SVG trees with coloured edges (:user:hyanwong; :issue:149).
  • Add Tree.is_descendant method (:issue:120)
  • Add Tree.copy method (:issue:122)
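Finding the tree that covers a given position, as TreeSequence.at does, amounts to a binary search over the tree boundaries. This plain-Python sketch assumes the breakpoints are available as a sorted list and that each tree covers a half-open interval [left, right); it is not how tskit implements the method, and tree_index_at is a hypothetical name.

```python
import bisect


def tree_index_at(breakpoints, position):
    """Index of the tree whose half-open interval [left, right) covers position.

    breakpoints: sorted tree boundaries [0, ..., sequence_length].
    """
    if not breakpoints[0] <= position < breakpoints[-1]:
        raise ValueError("position is outside the sequence")
    return bisect.bisect_right(breakpoints, position) - 1


breakpoints = [0, 10, 25, 100]
print(tree_index_at(breakpoints, 0))  # 0
print(tree_index_at(breakpoints, 10))  # 1
print(tree_index_at(breakpoints, 99.5))  # 2
```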

Bugfixes

  • Fixes to the low-level C API (:issue:132 and :issue:157)

C API Bugfixes

27 Mar 16:09
ed2cc1f
Pre-release

Bugfix release. Changes:

  • Fix incorrect errors on tbl_collection_dump (#132)
  • Catch table overflows (#157)

Feature update

01 Feb 16:40
3c03952

Minor feature update, using C API 0.99.1.

New features

  • Add interface for setting TableCollection.sequence_length: #107
  • Add support for building and dropping TableCollection indexes: #108
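The edge indexes give the order in which edges enter and leave the trees as they are built left to right. As a sketch, assuming the ordering used by tskit's index-building code (insertion by left coordinate, then parent time, parent ID and child ID; removal by right coordinate, then decreasing parent time, parent ID and child ID), both orders can be computed in plain Python. build_edge_indexes is a hypothetical helper, not the TableCollection API.

```python
def build_edge_indexes(edges, node_time):
    """Return (insertion_order, removal_order) as lists of edge IDs.

    edges: list of (left, right, parent, child) tuples;
    node_time: node times indexed by node ID.
    """
    ids = range(len(edges))
    insertion = sorted(
        ids,
        key=lambda e: (edges[e][0], node_time[edges[e][2]], edges[e][2], edges[e][3]),
    )
    removal = sorted(
        ids,
        key=lambda e: (edges[e][1], -node_time[edges[e][2]], -edges[e][2], -edges[e][3]),
    )
    return insertion, removal


node_time = [0.0, 0.0, 0.0, 1.0, 2.0]  # nodes 0-2 are samples
edges = [(0, 10, 4, 2), (0, 10, 4, 3), (0, 10, 3, 0), (0, 10, 3, 1)]
print(build_edge_indexes(edges, node_time))  # ([2, 3, 0, 1], [1, 0, 3, 2])
```

Edges to younger parents are inserted first and removed last, so a parent's subtree is always complete before the parent itself is attached higher up.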

Draft C API release

28 Jan 11:30
f24ad8d
Pre-release

Draft of the C API. The tables API should be quite mature and well documented. Changes will only be made if serious problems occur. The tree sequence and tree APIs are more provisional and are subject to change.

Changes:

  • Change the _tbl_ abbreviation to _table_ to improve readability. Hence, we now have, e.g., tsk_node_table_t etc.
  • Change tsk_tbl_size_t to tsk_size_t.
  • Standardise public API to use tsk_size_t and tsk_id_t as appropriate.
  • Add tsk_flags_t typedef and consistently use this as the type used to encode bitwise flags. To avoid confusion, functions now have an options parameter.
  • Rename tsk_table_collection_position_t to tsk_bookmark_t.
  • Rename tsk_table_collection_reset_position to tsk_table_collection_truncate and tsk_table_collection_record_position to tsk_table_collection_record_num_rows.
  • Generalise tsk_table_collection_sort to take a bookmark as start argument.
  • Relax the restriction that nodes in the samples argument to simplify must be marked as samples. (#72)
  • Allow tsk_table_collection_simplify to take a NULL samples argument to specify "all samples in the current tables".
  • Add support for building as a meson subproject.
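The renamed bookmark functions follow a record-and-roll-back pattern. Here is a minimal sketch with Python lists standing in for the tables; record_num_rows and truncate are hypothetical stand-ins for tsk_table_collection_record_num_rows and tsk_table_collection_truncate, and the row tuples are arbitrary placeholders.

```python
def record_num_rows(tables):
    """A bookmark: the current number of rows in each table."""
    return {name: len(rows) for name, rows in tables.items()}


def truncate(tables, bookmark):
    """Discard every row added after the bookmark was taken."""
    for name, rows in tables.items():
        del rows[bookmark[name]:]


tables = {"nodes": [(1, 0.0), (1, 0.0)], "edges": []}
bookmark = record_num_rows(tables)
tables["nodes"].append((0, 1.0))
tables["edges"].append((0.0, 1.0, 2, 0))
truncate(tables, bookmark)
print({name: len(rows) for name, rows in tables.items()})  # {'nodes': 2, 'edges': 0}
```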