Releases: KxSystems/ml
Modification to tab2df to handle single character columns
This release covers two changes to the interface:
- Fix a minor bug in .ml.tab2df relating to the incorrect conversion of character (type 'c') columns
// Define a table which will highlight the incorrect behaviour
q)tab:([]s:`a`b`c;j:1 2 3;c:"ABC")
// Old behaviour (duplicating 'ABC')
q)print .ml.tab2df tab
s j c
0 a 1 ABC
1 b 2 ABC
2 c 3 ABC
// New behaviour
q)print .ml.tab2df tab
s j c
0 a 1 A
1 b 2 B
2 c 3 C
- Minor change to test scripts for continuous-integration purposes, following an update to default behaviour on the Python side
Change to Kolmogorov-Smirnov behaviour to account for scipy version
- The Kolmogorov-Smirnov update in release 1.0.0-rc.2 changed the behaviour of the feature-significance tests to account for an update in scipy, but broke on older versions of scipy. This has been fixed with a check on the installed scipy version.
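A version check of this kind can be sketched in Python, the side on which scipy runs. This is a minimal illustration, not the toolkit's code: the helper names and the threshold (1, 5, 0) are hypothetical placeholders, because the release note does not say which scipy version introduced the change. The point it shows is that versions must be compared numerically, since as plain strings "1.10.0" sorts before "1.9.0".

```python
# Hypothetical sketch of a scipy version check; names and threshold are placeholders.

def parse_version(version):
    """Convert '1.10.0' or '1.4.1rc1' into a comparable tuple of three ints."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits, so suffixes such as 'rc1' are ignored.
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '1.5' to three components.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def use_new_ks_behaviour(scipy_version, threshold=(1, 5, 0)):
    """Return True if scipy is at or above the (hypothetical) threshold version."""
    return parse_version(scipy_version) >= threshold


# String comparison gets this wrong; tuple comparison gets it right.
print("1.10.0" < "1.9.0")                                # True (lexicographic)
print(parse_version("1.10.0") > parse_version("1.9.0"))  # True (numeric)
```

In practice the argument would be `scipy.__version__`, read once at load time so the branch is chosen before any test runs.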
Release candidate update, df2tab conversion handling of NaT
- The previous version of df2tab did not account for null temporal types (NaT) as introduced in pandas v1.0.0; the updated functionality addresses this.
- Supported pandas version moved to >1.0.0, as this is the production release of pandas and therefore the current stable version of its API.
Initial release candidate for version 1.0.0
Added clustering
Kx Clustering brings unsupervised machine learning techniques directly to kdb+ data, enabling users to discover patterns and infer hidden relationships within their datasets.
Features include:
- K-means clustering
- Hierarchical clustering
- DBSCAN clustering
- Affinity Propagation clustering
- CURE clustering
- k-d tree implementation (for optimized nearest-neighbor calculations)
- Range of distance metrics and linkage algorithms
- Clustering scoring metrics
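Kx Clustering implements these algorithms in q. To illustrate the idea behind the first item only, here is a minimal Python sketch of k-means using Lloyd's algorithm. Seeding the centroids with the first k points is a simplification chosen for reproducibility and is an assumption, not necessarily how the toolkit initialises; the function name is hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps.

    Centroids are seeded with the first k points (a simplification for
    reproducibility, not necessarily what Kx Clustering does).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


pts = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
labels, centroids = kmeans(pts, 2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

The assignment step is where a nearest-neighbour structure such as a k-d tree pays off on large datasets; this sketch simply scans every centroid.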
v0.3.4
Updated mproc to manage multiple loads (e.g. fresh and xval)
Minor changes to match scipy/numpy
Date and timezone management in pandas functions
Fixed tests
v0.3.3
MODIFICATIONS:
Example notebooks (and associated data/images) moved to mlnotebooks repo
v0.3.2
MODIFICATIONS:
- Update to pandas requirements, needed because of modifications to .ml.df2tab and .ml.tab2df to handle date and time types in conversions.
-> Pandas>=0.21
v0.3.1
NEW
Multiprocessing library (mproc) for transparently distributing jobs
Serialization/deserialization (pickle) library for Python objects
Cross-validation functions
- .ml.xv.kfshuff (K-Fold cross-validation with randomized indices)
- .ml.xv.kfsplit (K-Fold cross-validation with sequential indices)
- .ml.xv.kfstrat (K-Fold cross-validation with stratified indices)
- .ml.xv.mcsplit (Monte-Carlo cross-validation with random split indices)
- .ml.xv.pcsplit (Percentage split cross-validation)
- .ml.xv.tschain (Chain-forward cross-validation)
- .ml.xv.tsrolls (Roll-forward cross-validation)
Grid search functions
- .ml.gs.kfshuff (grid search with K-Fold cross-validation using randomized indices)
- .ml.gs.kfsplit (grid search with K-Fold cross-validation using sequential indices)
- .ml.gs.kfstrat (grid search with K-Fold cross-validation using stratified indices)
- .ml.gs.mcsplit (grid search with Monte-Carlo cross-validation using random split indices)
- .ml.gs.pcsplit (grid search with percentage split cross-validation)
- .ml.gs.tschain (grid search with chain-forward cross-validation)
- .ml.gs.tsrolls (grid search with roll-forward cross-validation)
Cross-validation and grid search functions automatically support multiprocessing jobs
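The index patterns behind several of these splitting schemes can be sketched in a few lines of Python. This is an illustration under assumptions, not the toolkit's q code: the function names are hypothetical, and the mapping to .ml.xv.kfsplit, .ml.xv.kfshuff, .ml.xv.tschain and .ml.xv.tsrolls is inferred only from their one-line descriptions above. Stratified, Monte-Carlo and percentage splits are omitted. The grid search variants evaluate each hyperparameter combination over splits of this kind.

```python
import random


def fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    size, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds


def kfold_sequential(n, k):
    """K-fold with sequential indices: each fold is the test set exactly once."""
    folds = fold_indices(n, k)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def kfold_shuffled(n, k, seed=0):
    """K-fold with randomized indices: permute once, then split sequentially."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    for train, test in kfold_sequential(n, k):
        yield [perm[j] for j in train], [perm[j] for j in test]


def chain_forward(n, k):
    """Train on every fold before fold i, test on fold i; time order is kept."""
    folds = fold_indices(n, k)
    for i in range(1, k):
        yield [j for fold in folds[:i] for j in fold], folds[i]


def roll_forward(n, k):
    """Train on fold i-1 only, test on fold i; the window rolls forward."""
    folds = fold_indices(n, k)
    for i in range(1, k):
        yield folds[i - 1], folds[i]


for train, test in chain_forward(6, 3):
    print(train, test)
# [0, 1] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

The two time-series schemes never train on data that comes after the test fold, which is why they suit ordered data where shuffled K-fold would leak future information.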
UPDATES
FRESH automatically supports multiprocessing jobs
Pandas conversion functions (.ml.df2tab and .ml.tab2df) support temporal conversions
v0.2.1
NEW
- Ten new statistical metrics (fbscore, r2score, Matthews correlation coefficient, etc.).
- Two categorical encoding schemes (lexicographical and frequency).
- Time/Date encoding.
- Multiple hyper-parameter inputs now supported in FRESH.
- Two new significant-feature selection options (k-best and percentile).
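Two of the new metrics, the F-beta score and the Matthews correlation coefficient, have standard definitions that can be sketched in Python as below. The function names and the `positive` argument are hypothetical; the toolkit's q functions may differ in signature and in how they handle empty denominators (here 0.0 is returned).

```python
import math


def confusion(y_true, y_pred, positive=1):
    """Count true/false positives/negatives for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def fbeta_score(y_true, y_pred, beta, positive=1):
    """Weighted harmonic mean of precision and recall (beta > 1 favours recall)."""
    tp, fp, fn, _ = confusion(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


def matthews_corrcoef(y_true, y_pred, positive=1):
    """MCC in [-1, 1]; 0.0 is returned when any confusion-matrix margin is empty."""
    tp, fp, fn, tn = confusion(y_true, y_pred, positive)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]  # tp=2, fp=1, fn=1, tn=2
print(fbeta_score(y_true, y_pred, beta=1))  # about 0.667
print(matthews_corrcoef(y_true, y_pred))    # about 0.333
```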
MODIFICATIONS
- Input structure modification to .ml.fresh.createfeatures (full explanation at code.kx.com/ml/toolkit/fresh).
- Input structure modification to .ml.fresh.significantfeatures to account for additional significant-feature selection methods.
- Removal of the .ml.util namespace, which has been folded into .ml. This tidies implementations and removes ambiguity about whether functions were true utilities.
NOTE: functions below here may previously have been in the .ml.util namespace.
- Underlying file structure change to tidy code locations within the toolkit:
  statistical functions -> util/metrics.q
  true utils -> util/util.q
  preprocessing functions -> util/preproc.q
- .ml.onehot no longer supports lists; input is expected as a table. Encoding can be set to operate on a column-by-column basis.
- .ml.comb returns combinations in ascending order; the previous implementation had a non-obvious return pattern.
- .ml.filltab has a modified expected dictionary input. The previous behaviour was `linear`mean`median!`x`x1`x2; this has been changed to a more 'q-like' mapping of columns to desired behaviours, `x`x1`x2!`linear`mean`median.
- .ml.filltab no longer defaults to forward+backward fill on entry of ()!(); entry of an empty dictionary now returns the original table. The default forward+backward fill is achieved by entering :: in place of the dictionary.
- .ml.dropconstant now supports removal of constant keys of a dictionary.
FIXES
- .ml.infreplace only worked correctly when both positive and negative infinities existed within the vector. The function now operates correctly if positive, negative or no infinities are present in the vector.
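The fixed behaviour can be illustrated with a Python sketch. It assumes that positive infinities are replaced by the largest finite value in the vector and negative infinities by the smallest; that substitution rule is an assumption, since the note above only describes the bug. The function name is hypothetical. Computing each replacement independently is what keeps the function correct when only one kind of infinity, or none, is present.

```python
import math


def inf_replace(values):
    """Replace +inf with the largest finite value and -inf with the smallest.

    Each substitution is made independently, so the function behaves the same
    whether the vector holds both kinds of infinity, only one, or none.
    """
    finite = [v for v in values if math.isfinite(v)]  # excludes +/-inf and NaN
    if not finite:
        return list(values)  # nothing finite to substitute, so leave unchanged
    hi, lo = max(finite), min(finite)
    return [hi if v == math.inf else lo if v == -math.inf else v for v in values]


print(inf_replace([1.0, math.inf, 3.0]))   # [1.0, 3.0, 3.0]
print(inf_replace([-math.inf, 2.0, 5.0]))  # [2.0, 2.0, 5.0]
```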
REMOVED
- .ml.util.traintestsplitseed; the same behaviour can be obtained by setting the random seed via q)\S x prior to applying .ml.traintestsplit.
v0.1.2
Fix to the significant-features function and addition of AppVeyor tests for Windows installs.
- Changes to the feature-significance function. In the previous release it had been performing incorrectly because of how .ml.fresh.benjhochfind and .ml.fresh.featuresignificance were interacting.
- Tests of the Benjamini-Hochberg procedure used for feature selection have been made more rigorous to ensure the function performs correctly.
- AppVeyor tests are now explicitly run on upload of new changes.
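The Benjamini-Hochberg procedure controls the false-discovery rate when many features are tested for significance at once. A minimal Python sketch of the standard step-up procedure follows; it is not the toolkit's .ml.fresh.benjhochfind code, and the function name and the alpha default of 0.05 are assumptions. The usage line exercises the step-up detail: the largest passing rank is kept, rather than stopping at the first p-value that misses its threshold.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the sorted indices of hypotheses rejected at false-discovery rate alpha.

    Step-up procedure: rank p-values ascending, find the largest rank k with
    p_(k) <= k / m * alpha, and reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff = rank  # keep the largest qualifying rank, not the first failure
    return sorted(order[:cutoff])


# 0.028 misses its own threshold (0.025) but is still rejected, because the
# larger p-value 0.03 passes its threshold (0.0375) at rank 3.
print(benjamini_hochberg([0.01, 0.028, 0.03, 0.5]))  # [0, 1, 2]
```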