v0.8.0

@Bribak Bribak released this 03 Aug 08:59
a2edec5

Change Log

For Version 0.8.0

  • Linted the package with flake8
  • Increased code coverage
  • Added another optional extras install, [chem], including glyles, requests, and pubchempy

glycan_data

  • Changed lib to be a dict of type glycoletters:index, as it’s faster to index a dict vs. a long list; also adapted all functions using lib to reflect this change
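The effect of this change can be illustrated with a minimal sketch. The five glycoletters below are a made-up stand-in for the real library, and `encode` is a hypothetical helper, not a glycowork function.

```python
# Hypothetical mini-library; the real lib holds far more glycoletters.
glycoletters = ["Gal", "GlcNAc", "Man", "a1-3", "b1-4"]

# List form: finding a position requires a linear scan via .index().
lib_list = glycoletters

# Dict form (glycoletter -> index): lookups are constant time on average.
lib_dict = {letter: idx for idx, letter in enumerate(glycoletters)}


def encode(tokens, lib):
    """Map each glycoletter to its integer index using a dict-based lib."""
    return [lib[token] for token in tokens]


encoded = encode(["Man", "a1-3", "Man"], lib_dict)
print(encoded)
```

Both forms return the same indices; only the lookup cost differs, which matters when a long library is queried many times.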

loader

  • Added replace_every_second helper function
  • Updated linkages list
  • Changed linkages, Hex, etc. to be sets instead of lists

motif

processing

  • Added variance_stabilization for variance stabilization normalization, both globally and group-specific
  • Added in_lib helper function to check whether all glycoletters of glycan are in lib
  • Deprecated small_motif_find
  • cohen_d now also returns the variance of the effect size and supports paired samples (calculating Cohen’s dz in that case)
  • Added mahalanobis_distance to calculate Mahalanobis distance as an effect size for multivariate comparisons
  • Added mahalanobis_variance to estimate variance of Mahalanobis distance via bootstrapping
  • Added MissForest for random forest based data imputation
  • Cleaned up canonicalize_iupac and made it slightly faster
  • Added variance_based_filtering
  • Added impute_and_normalize and underlying helper functions
  • Fixed numpy random seed for reproducibility
  • Sped up presence_to_matrix
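As a rough illustration of the cohen_d change, the sketch below computes Cohen’s d for independent samples and Cohen’s dz for paired samples, each with a variance estimate. The name `cohen_d_sketch` is hypothetical, and the variance formulas are common large-sample approximations; whether the library uses exactly these is an assumption.

```python
from math import sqrt
from statistics import mean, stdev, variance


def cohen_d_sketch(x, y, paired=False):
    """Return (effect size, approximate variance of the effect size)."""
    if paired:
        if len(x) != len(y):
            raise ValueError("paired samples must have equal length")
        # Cohen's dz: mean of the pairwise differences over their sample SD.
        diffs = [a - b for a, b in zip(x, y)]
        n = len(diffs)
        d = mean(diffs) / stdev(diffs)
        var_d = 1 / n + d**2 / (2 * n)  # assumed approximation
    else:
        nx, ny = len(x), len(y)
        # Cohen's d: mean difference over the pooled sample SD.
        pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
        d = (mean(x) - mean(y)) / sqrt(pooled_var)
        var_d = (nx + ny) / (nx * ny) + d**2 / (2 * (nx + ny))  # assumed approximation
    return d, var_d


d_unpaired, var_unpaired = cohen_d_sketch([2, 4, 6], [1, 2, 3])
d_paired, var_paired = cohen_d_sketch([2, 4, 6], [1, 2, 3], paired=True)
```

With the same toy data, the paired effect size is larger because the pairwise differences vary less than the raw groups do.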

tokenization

  • Deprecated mz_to_composition
  • mz_to_composition2 is now the new mz_to_composition
  • Adapted mz_to_structures, compositions_to_structures, and match_composition_relaxed to work with this change

annotate

  • Added create_correlation_network to identify clusters of highly correlated glycans/motifs
  • Added count_unique_subgraphs_of_size_k as a helper function within get_k_saccharides
  • Refactored get_k_saccharides to be faster and more complete (and to be, effectively, a replacement for motif_matrix)
  • annotate_dataset now uses get_k_saccharides for mono- and disaccharides, instead of motif_matrix
  • Deprecated motif_matrix
  • annotate_dataset now also creates relevant ?-containing motifs if ‘terminal’ in feature_set, even if they don’t explicitly occur in the glycan strings
  • Big speed-up for annotate_dataset if known=True, as we now cache the precalculated motif graphs
  • Added quantify_motifs as a wrapper around annotate_dataset to adequately distribute relative abundances across extracted motifs
  • Deprecated estimate_lower_bound as speed-ups make it no longer necessary
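The idea behind create_correlation_network can be sketched in plain Python: correlate every pair of features, connect the pairs above a threshold, and read off the connected components as clusters. The function names, the 0.9 threshold, the use of Pearson correlation, and the toy abundances are all assumptions for illustration; the library function may differ in each of these.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def correlated_clusters(abundances, threshold=0.9):
    """Group features whose pairwise correlation exceeds the threshold."""
    names = list(abundances)
    neighbors = {name: set() for name in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if pearson(abundances[u], abundances[v]) > threshold:
                neighbors[u].add(v)
                neighbors[v].add(u)
    # Connected components via depth-first search.
    clusters, seen = [], set()
    for name in names:
        if name in seen:
            continue
        stack, component = [name], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Made-up abundances across four samples; keys are placeholder feature names.
toy = {
    "glycan_A": [1, 2, 3, 4],
    "glycan_B": [2, 4, 6, 8.5],
    "glycan_C": [4, 3, 2, 1],
    "glycan_D": [1, 3, 2, 5],
}
clusters = correlated_clusters(toy)
```

Here only glycan_A and glycan_B end up in a shared cluster; the other two remain singletons.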

analysis

  • Renamed make_heatmap to get_heatmap
  • Renamed make_volcano to get_volcano
  • Deprecated replace_zero_with_random_gaussian (this is now handled by MissForest in .processing within impute_and_normalize)
  • Added hotellings_t2 for multivariate comparisons
  • Changed multiple-testing correction method from Holm-Sidak to Benjamini-Hochberg
  • Added variance_stabilization in get_differential_expression
  • Added the option to analyze highly correlated sets of glycans/motifs (via create_correlation_network) within get_differential_expression
  • Implemented hotellings_t2 and the Mahalanobis distance (as effect size) for the case in which sets are analyzed within get_differential_expression
  • get_heatmap and get_differential_expression now scale abundances by the actual counts of motifs per glycan, not just absence/presence
  • Added get_meta_analysis to estimate combined effect sizes from the results of multiple studies (both fixed-effects and random-effects models can be estimated)
  • Added variance_based_filtering in get_differential_expression
  • Effect size variances can now also be retrieved within get_differential_expression via the effect_size_variance keyword argument
  • get_differential_expression can now also handle paired samples when paired=True
  • get_differential_expression now also tests the homogeneity of variances using Levene’s test in all settings (also corrected for multiple testing)
  • Added get_glycanova to use ANOVA-based analyses on glycomics datasets (uses basically all the improvements of get_differential_expression, including analysis on the motif level)
  • Added get_pca to plot glycomics data (also has the motif interface)
  • Added get_pval_distribution to plot the distribution of p-values
  • Added get_ma to create a Bland-Altman plot
  • Added get_glycan_change_over_time to detect significant changes in time-course data via OLS fitting
  • Added get_time_series as a wrapper around get_glycan_change_over_time to do time series analyses, with all the motif & normalization functionality
  • Added get_coverage to visualize glycan expression across samples (ordered by average intensity) in a coverage plot
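For readers unfamiliar with the new multiple-testing correction, the Benjamini-Hochberg procedure can be sketched as follows. This is a standalone illustration of the method, not the library’s code (which may well delegate to a statistics package); the function name is hypothetical.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Unlike Holm-Sidak, which controls the family-wise error rate, this procedure controls the false discovery rate and is typically less conservative when many features are tested.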

draw

  • Added import warning if draw dependencies are not installed
  • Removed pycairo from dependencies
  • Modified annotate_figure to be compatible with .svg files from older Matplotlib versions
  • Changed “output” to “filepath” in GlycoDraw
  • If there are “?” in the provided filepath for GlycoDraw, they will now be automatically replaced with “_” to avoid saving errors
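The filepath handling described in the last item amounts to a simple character substitution. A minimal sketch, using a hypothetical helper name and a made-up file name:

```python
def sanitize_filepath(filepath):
    """Replace '?' (not allowed in file names on some systems) with '_'."""
    return filepath.replace("?", "_")


safe_path = sanitize_filepath("Gal(b1-?)GlcNAc.svg")
print(safe_path)
```

This matters because glycan strings with uncertain linkages contain “?”, and such strings are a natural choice for file names.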

graph

  • Sped up glycan_to_graph/glycan_to_nxGraph (and thereby all downstream functions, of which there are many)
  • Also improved the runtime of downstream functions, such as subgraph_isomorphism, independently of these advances
  • subgraph_isomorphism now also accepts precalculated motif graphs as inputs (in addition to the already supported precalculated glycan graphs)

ml

  • Rephrased import warnings to reflect optional install strategy for extra dependencies

model_training

  • Sped up train_ml_model

network

biosynthesis

  • create_neighbors no longer uses the libr keyword