Major feature release

@jeromekelleher jeromekelleher released this 23 Aug 16:48
67a8335

Major feature release, adding support for population genetic statistics,
improved VCF output and many other features.

Note: Version 0.2.0 was skipped because of an error uploading to PyPI
which could not be undone.

Breaking changes

  • Genotype arrays returned by TreeSequence.variants and
    TreeSequence.genotype_matrix have changed from unsigned 8 bit values
    to signed 8 bit values to accommodate missing data (see :issue:144 for
    discussion). Specifically, the dtype of the genotype arrays has changed
    from numpy uint8 ("u1") to int8 ("i1"). This should not affect client code
    unless it depends specifically on the dtype of the returned numpy array.

  • The VCF written by write_vcf is no longer compatible with previous
    versions, which had significant shortcomings. Position values are now rounded
    to the nearest integer by default, and REF and ALT values are derived from the
    actual allelic states (rather than always being A and T). Sample names
    are now of the form tsk_j for sample ID j. Most of the legacy behaviour
    can be recovered with new options, however.

  • The positional parameter reference_sets in the TreeSequence methods
    genealogical_nearest_neighbours and mean_descendants has been renamed to
    sample_sets.
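
The move from unsigned to signed 8 bit genotypes can be illustrated without tskit. The sketch below uses Python's standard array module as a stand-in for numpy's unsigned and signed 8 bit types, and assumes that a missing genotype is encoded as -1 (an assumption made here for illustration; the note above says only that signed values accommodate missing data). The helper store_genotypes is hypothetical.

```python
from array import array

MISSING = -1  # assumed marker for a missing genotype


def store_genotypes(typecode, genotypes):
    """Try to store genotypes in an 8 bit array; return None if a value does not fit."""
    try:
        return array(typecode, genotypes)
    except OverflowError:
        return None


genotypes = [0, 1, MISSING, 1]

# "B" is unsigned 8 bit (like numpy uint8): a negative marker cannot be stored.
unsigned = store_genotypes("B", genotypes)
# "b" is signed 8 bit (like numpy int8): the negative marker fits.
signed = store_genotypes("b", genotypes)

print("unsigned:", unsigned)
print("signed:", signed)
```

Client code that compares or casts against a specific unsigned dtype is the kind of code that may need updating.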

New features

  • Support for general windowed statistics. Implementations of diversity,
    divergence, segregating sites, Tajima's D, Fst, Patterson's F statistics,
    Y statistics, trait correlations and covariance, and k-dimensional allele
    frequency spectra (:user:petrelharp, :user:jeromekelleher, :user:molpopgen).

  • Add the keep_unary option to simplify (:user:gtsambos). See :issue:1
    and :pr:143.

  • Add the map_ancestors method to TableCollection (:user:gtsambos). See :pr:175.

  • Add the squash method to EdgeTable (:user:gtsambos). See :issue:59 and
    :pr:285.

  • Add support for individuals to VCF output, and fix major issues with output
    format (:user:jeromekelleher). Position values are transformed in a much
    more straightforward manner and output has been generalised substantially.
    Adds individual_names and position_transform arguments.
    See :pr:286, and issues :issue:2, :issue:30 and :issue:73.

  • Control height scale in SVG trees using 'tree_height_scale' and 'max_tree_height'
    (:user:hyanwong, :user:jeromekelleher). See :issue:167, :pr:168.
    Various other improvements to tree drawing (:pr:235, :pr:241, :pr:242,
    :pr:252, :pr:259).

  • Add Tree.max_root_time property (:user:hyanwong, :user:jeromekelleher).
    See :pr:170.

  • Improved input checking on various methods taking numpy arrays as parameters
    (:user:hyanwong). See :issue:8 and :pr:185.

  • Define the branch length over roots in trees to be zero (previously raise

  • Implementation of the genealogical nearest neighbours statistic
    (:user:hyanwong, :user:jeromekelleher).

  • New delete_intervals and keep_intervals methods for the TableCollection
    to allow slicing out of topology from specific intervals (:user:hyanwong,
    :user:andrewkern, :user:petrelharp, :user:jeromekelleher). See
    :pr:225 and :pr:261.

  • Support for missing data via a topological definition (:user:jeromekelleher).
    See :issue:270 and :pr:272.

  • Add ability to set columns directly in the Tables API (:user:jeromekelleher).
    See :issue:12 and :pr:307.

  • Various documentation improvements from :user:brianzhang, :user:hyanwong,
    :user:petrelharp and :user:jeromekelleher.
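
As a rough illustration of what a windowed statistic computes, the sketch below evaluates diversity (mean pairwise differences per unit of sequence length) in each window, directly from a small genotype matrix. This is not the library's implementation. The function name windowed_diversity, its arguments and the toy data are all assumptions made for this example.

```python
from itertools import combinations


def windowed_diversity(positions, genotypes, windows):
    """
    Mean pairwise differences per unit length in each window.

    positions: site positions, one per row of genotypes
    genotypes: one row per site, one allele per sample
    windows: increasing breakpoints; window i is [windows[i], windows[i + 1])
    """
    num_samples = len(genotypes[0])
    num_pairs = num_samples * (num_samples - 1) // 2
    result = []
    for left, right in zip(windows[:-1], windows[1:]):
        total = 0.0
        for pos, row in zip(positions, genotypes):
            if left <= pos < right:
                # Count the sample pairs that carry different alleles at this site.
                differing = sum(a != b for a, b in combinations(row, 2))
                total += differing / num_pairs
        # Normalise by the window length.
        result.append(total / (right - left))
    return result


positions = [1.0, 4.0, 7.0]
genotypes = [
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]
windows = [0.0, 5.0, 10.0]
diversity = windowed_diversity(positions, genotypes, windows)
print(diversity)
```

The second window contains only a monomorphic site, so its diversity is zero, while the first window accumulates contributions from two variable sites.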

Deprecated

  • Deprecate Tree.length in favour of Tree.span (:user:hyanwong).
    See :pr:169.

  • Deprecate TreeSequence.pairwise_diversity in favour of the new
    diversity method. See :issue:215, :pr:312.
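
For readers unsure what a deprecation means in practice, the following is a minimal sketch of how a deprecated alias typically behaves in Python: the old name keeps working but emits a DeprecationWarning pointing to the new one. ToyTree is a hypothetical class written for this example, not the library's Tree, and the warning text is invented.

```python
import warnings


class ToyTree:
    """Hypothetical stand-in for a tree covering the interval [left, right)."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    @property
    def span(self):
        # Preferred name: the genomic distance covered by the tree.
        return self.right - self.left

    @property
    def length(self):
        # Deprecated alias: still works, but warns callers to migrate.
        warnings.warn(
            "length is deprecated; use span instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.span


tree = ToyTree(2.0, 5.0)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_value = tree.length

print(old_value, tree.span, [w.category.__name__ for w in caught])
```

Running a test suite with warnings enabled (for example `python -W error::DeprecationWarning`) is one way to find remaining uses of deprecated names.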

Bugfixes

  • Catch NaN and infinity values within tables (:user:hyanwong).
    See :issue:293 and :pr:294.
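
The kind of check described above can be sketched as follows. This is an illustration only, assuming a hypothetical helper check_finite applied to a plain list standing in for a numeric table column; it does not reproduce the library's own validation code or error messages.

```python
import math


def check_finite(name, values):
    """Raise ValueError if any value in the column is NaN or infinite."""
    for index, value in enumerate(values):
        if not math.isfinite(value):
            raise ValueError(f"{name}[{index}] is not finite: {value!r}")


# A well-formed column passes silently.
check_finite("time", [0.0, 1.5, 2.0])

# Columns containing NaN or infinity are rejected.
errors = []
for bad_column in ([0.0, float("nan")], [float("inf"), 1.0]):
    try:
        check_finite("time", bad_column)
    except ValueError as error:
        errors.append(str(error))

print(errors)
```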