v0.8.0
Change Log
For Version 0.8.0
- Linted the package with flake8
- Increased code coverage
- Added another optional extras install, [chem], including glyles, requests, and pubchempy
glycan_data
- Changed `lib` to be a dict of type glycoletters:index, as it’s faster to index a dict vs. a long list; also adapted all functions using `lib` to reflect this change
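As a minimal sketch of why this change helps (using a made-up six-entry library, not the real `lib`), a dict maps each glycoletter straight to its index, whereas a list has to be scanned to find the position:

```python
# Hypothetical miniature library for illustration; the real lib is far larger.
glycoletters = ["Fuc", "Gal", "GlcNAc", "Man", "a1-3", "b1-4"]

# List style: finding the index of a glycoletter scans the list (O(n)).
lib_list = sorted(glycoletters)
idx_from_list = lib_list.index("Man")

# Dict style (glycoletter -> index): lookup is O(1) on average.
lib_dict = {letter: i for i, letter in enumerate(lib_list)}
idx_from_dict = lib_dict["Man"]

# Both approaches agree on the index; only the lookup cost differs.
assert idx_from_list == idx_from_dict
```

Code that previously called `.index(...)` on a list would instead use key lookup on the dict.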
loader
- Added `replace_every_second` helper function
- Updated `linkages` list
- Changed `linkages` and `Hex` etc. to be sets instead of lists
motif
processing
- Added `variance_stabilization` for variance stabilization normalization, both globally and group-specific
- Added `in_lib` helper function to check whether all glycoletters of glycan are in lib
- Deprecated `small_motif_find`
- `cohen_d` now also returns the variance of the effect size and supports paired samples as well (calculating Cohen’s dz in this case)
- Added `mahalanobis_distance` to calculate Mahalanobis distance as an effect size for multivariate comparisons
- Added `mahalanobis_variance` to estimate variance of Mahalanobis distance via bootstrapping
- Added `MissForest` for random forest based data imputation
- Cleaned up `canonicalize_iupac` and made it slightly faster
- Added `variance_based_filtering`
- Added `impute_and_normalize` and underlying helper functions
- Fixed numpy random seed for reproducibility
- Sped up `presence_to_matrix`
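To illustrate the difference between the unpaired and paired effect sizes mentioned for `cohen_d`, here is a minimal standard-library sketch. The function names are hypothetical, and the variance formulas are common large-sample approximations; the package itself may compute these quantities differently.

```python
from math import sqrt
from statistics import mean, stdev, variance


def cohen_d_unpaired(x, y):
    """Cohen's d with a pooled standard deviation, plus an approximate variance."""
    n1, n2 = len(x), len(y)
    pooled_sd = sqrt(((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2))
    d = (mean(x) - mean(y)) / pooled_sd
    # Assumption: large-sample approximation of the variance of d.
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return d, var_d


def cohen_dz_paired(x, y):
    """Cohen's dz: mean of the paired differences over their standard deviation."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    dz = mean(diffs) / stdev(diffs)
    # Assumption: large-sample approximation of the variance of dz.
    var_dz = 1 / n + dz ** 2 / (2 * n)
    return dz, var_dz


# Made-up abundances for four samples measured under two conditions.
after = [2.0, 4.0, 5.0, 5.0]
before = [1.0, 2.0, 3.0, 4.0]

d, var_d = cohen_d_unpaired(after, before)
dz, var_dz = cohen_dz_paired(after, before)
```

With paired data, dz uses only the within-pair differences, so it can differ substantially from d computed on the same numbers.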
tokenization
- Deprecated `mz_to_composition`
- `mz_to_composition2` is now the new `mz_to_composition`
- Adapted `mz_to_structures`, `compositions_to_structures`, and `match_composition_relaxed` to work with this change
annotate
- Added `create_correlation_network` to identify clusters of highly correlated glycans/motifs
- Added `count_unique_subgraphs_of_size_k` as a helper function within `get_k_saccharides`
- Refactored `get_k_saccharides` to be faster and more complete (and be, effectively, a replacement of `motif_matrix`)
- `annotate_dataset` now uses `get_k_saccharides` for mono- and disaccharides, instead of `motif_matrix`
- Deprecated `motif_matrix`
- `annotate_dataset` now also creates relevant ?-containing motifs if ‘terminal’ in feature_set, even if they don’t explicitly occur in the glycan strings
- Big speed-up for `annotate_dataset` if known=True, as we now cache the precalculated motif graphs
- Added `quantify_motifs` as a wrapper around `annotate_dataset` to adequately distribute relative abundances across extracted motifs
- Deprecated `estimate_lower_bound` as speed-ups make it no longer necessary
analysis
- Renamed `make_heatmap` to `get_heatmap`
- Renamed `make_volcano` to `get_volcano`
- Deprecated `replace_zero_with_random_gaussian` (this is now handled by `MissForest` in .processing within `impute_and_normalize`)
- Added `hotellings_t2` for multivariate comparisons
- Changed multiple-testing correction method from Holm-Sidak to Benjamini-Hochberg
- Added `variance_stabilization` in `get_differential_expression`
- Added the option to analyze highly correlated sets of glycans/motifs (via `create_correlation_network`) within `get_differential_expression`
- Implemented usage of `hotellings_t2` and the Mahalanobis distance (as effect size) when sets are analyzed within `get_differential_expression`
- `get_heatmap` and `get_differential_expression` now scale abundances by the actual counts of motifs per glycan, not just absence/presence
- Added `get_meta_analysis` to estimate combined effect sizes from the results of multiple studies (both fixed-effects and random-effects models can be estimated)
- Added `variance_based_filtering` in `get_differential_expression`
- Effect size variances can now also be retrieved within `get_differential_expression` via the effect_size_variance keyword argument
- `get_differential_expression` can now also handle paired samples when paired=True
- `get_differential_expression` now also tests the homogeneity of variances using Levene’s test in all settings (also multiple-testing controlled)
- Added `get_glycanova` to use ANOVA-based analyses on glycomics datasets (uses basically all the improvements of `get_differential_expression`, including analysis on the motif level)
- Added `get_pca` to plot glycomics data (also has the motif interface)
- Added `get_pval_distribution` to plot the distribution of p-values
- Added `get_ma` to plot a Bland-Altman plot
- Added `get_glycan_change_over_time` to detect significant changes in time-course data via OLS fitting
- Added `get_time_series` as a wrapper around `get_glycan_change_over_time` to do time series analyses, with all the motif & normalization functionality
- Added `get_coverage` to visualize glycan expression across samples (ordered by average intensity) in a coverage plot
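For readers unfamiliar with the Benjamini-Hochberg correction that now replaces Holm-Sidak, the following is a minimal pure-Python sketch of the standard step-up adjustment. The function name is hypothetical and the p-values are made up; this is not the package's own implementation.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four glycans or motifs.
pvals = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(pvals)
```

Benjamini-Hochberg controls the false discovery rate rather than the family-wise error rate, so it is typically less conservative than Holm-Sidak when many features are tested.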
draw
- Added import warning if draw dependencies are not installed
- Removed `pycairo` from dependencies
- Modified `annotate_figure` to be compatible with .svg files from older Matplotlib versions
- Changed “output” to “filepath” in `GlycoDraw`
- If there are “?” in the provided filepath for `GlycoDraw`, they will now be automatically replaced with “_” to avoid saving errors
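The filepath clean-up described in the last item amounts to a simple character substitution. A minimal sketch, with a hypothetical helper name and an illustrative filename (the actual logic lives inside `GlycoDraw`):

```python
def sanitize_filepath(filepath: str) -> str:
    """Replace '?' (not allowed in file names on some systems) with '_'."""
    return filepath.replace("?", "_")


# Glycans with uncertain linkages often contain '?', e.g. in 'b1-?'.
cleaned = sanitize_filepath("Gal(b1-?)GlcNAc.svg")
```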
graph
- Sped up `glycan_to_graph`/`glycan_to_nxGraph` (and all downstream functions, which are a lot)
- Also improved the runtime of downstream functions, such as `subgraph_isomorphism`, independent of these advances
- `subgraph_isomorphism` now also accepts precalculated motif graphs as inputs (in addition to the already supported precalculated glycan graphs)
ml
- Rephrased import warnings to reflect optional install strategy for extra dependencies
model_training
- Sped up `train_ml_model`
network
biosynthesis
- `create_neighbors` no longer uses the libr keyword