Releases: KxSystems/ml

Modification to tab2df to handle single character columns

10 Aug 11:27
3405543

This release covers two changes to the interface

  1. Fix a minor bug in .ml.tab2df where char ('c') columns were converted incorrectly: the full string was duplicated in every row rather than one character per row.
// Define a table which will highlight the incorrect behaviour
q)tab:([]s:`a`b`c;j:1 2 3;c:"ABC")
// Old behaviour (duplicating 'ABC')
q)print .ml.tab2df tab
   s  j    c
0  a  1  ABC
1  b  2  ABC
2  c  3  ABC
// New behaviour
q)print .ml.tab2df tab  
   s  j    c
0  a  1  A
1  b  2  B
2  c  3  C
  2. Minor change to the test scripts used for continuous integration, required by an update to default behaviour on the Python side.
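
The duplication shown in the first item is the pattern that appears when a string column is treated as a single scalar rather than as a sequence of characters. The sketch below illustrates that distinction in plain Python. It is not the .ml.tab2df implementation; `to_rows` and its `split_strings` flag are hypothetical names introduced only for this illustration.

```python
# Hypothetical illustration only: this is not the .ml.tab2df implementation.
def to_rows(columns, split_strings):
    """Turn a dict of column name -> values into a list of row dicts."""
    n_rows = max(len(v) for v in columns.values())
    rows = []
    for i in range(n_rows):
        row = {}
        for name, values in columns.items():
            if isinstance(values, str) and not split_strings:
                # Old-style result: the string is treated as one scalar
                # value and repeated in every row.
                row[name] = values
            else:
                # New-style result: a string is a sequence of characters,
                # so each row receives its own element.
                row[name] = values[i]
        rows.append(row)
    return rows


table = {"s": ["a", "b", "c"], "j": [1, 2, 3], "c": "ABC"}
old_rows = to_rows(table, split_strings=False)
new_rows = to_rows(table, split_strings=True)
print([r["c"] for r in old_rows])  # ['ABC', 'ABC', 'ABC']
print([r["c"] for r in new_rows])  # ['A', 'B', 'C']
```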

Change to Kolmogorov Smirnov behaviour to account for scipy version

08 Jul 12:28
d69657c
  • The Kolmogorov-Smirnov update in release 1.0.0-rc.2 changed the behaviour of the feature significance tests to account for an update in scipy, but broke on older versions of scipy. This has been fixed with a scipy version check.
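
A version check of this kind can be sketched as follows. This is a minimal illustration, not the toolkit's code: `parse_version`, `choose_ks_path` and the `THRESHOLD` value are hypothetical, and the release note does not state which scipy version introduced the change.

```python
def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


# Placeholder threshold: the release note does not say which scipy version
# changed behaviour, so this value is an assumption for illustration.
THRESHOLD = (1, 5, 0)


def choose_ks_path(installed_version):
    """Pick a code path name based on the installed library version."""
    if parse_version(installed_version) >= THRESHOLD:
        return "new"
    return "legacy"


print(choose_ks_path("1.4.1"))
print(choose_ks_path("1.10.0"))
```

Comparing tuples of integers rather than raw strings avoids ordering "1.10.0" before "1.5.0".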

Release candidate update, df2tab conversion handling of NaT

06 Jun 18:55
7cf1ae7
  • The previous version of df2tab did not account for null temporal types (NaT) as introduced in pandas v1.0.0; the updated function handles them.
  • Pandas support has moved to versions >1.0.0, as this is the production version of pandas and therefore currently the stable version of its API.

Initial release candidate for version 1.0.0

12 May 10:40
7c0c8f0

Added clustering
Kx Clustering brings unsupervised machine learning techniques directly to kdb+ data, enabling users to discover patterns and infer hidden relationships within their datasets.
Features include:

  • K-means clustering
  • Hierarchical clustering
  • DBSCAN clustering
  • Affinity Propagation clustering
  • CURE clustering
  • KD Tree implementation (for optimized nearest neighbor calculations)
  • Range of distance metrics and linkage algorithms
  • Clustering scoring metrics

v0.3.4

06 Jan 13:17
fab4c32

Updated mproc to manage multiple loads (e.g. fresh and xval)
Minor changes to match scipy/numpy
Date and timezone management in pandas functions
Fixed tests

v0.3.3

19 Sep 09:14
9f653e8

MODIFICATIONS:

Example notebooks (and associated data/images) moved to mlnotebooks repo

v0.3.2

17 Jul 15:57
b63cf7f

MODIFICATIONS:

  • Updated the pandas requirement, needed because of modifications to .ml.df2tab and .ml.tab2df that handle date and time types in conversions.
    -> Pandas>=0.21

v0.3.1

05 Jul 16:26
5cc5af2

NEW
Multiprocessing library (mproc) for transparently distributing jobs
Serialization/deserialization (pickle) library for Python objects
Cross validation functions

  • .ml.xv.kfshuff (K-Fold cross-validation with randomized indices)
  • .ml.xv.kfsplit (K-Fold cross-validation with sequential indices)
  • .ml.xv.kfstrat (K-Fold cross-validation with stratified indices)
  • .ml.xv.mcsplit (Monte-Carlo cross-validation with random split indices)
  • .ml.xv.pcsplit (Percentage split cross-validation)
  • .ml.xv.tschain (Chain-forward cross-validation)
  • .ml.xv.tsrolls (Roll-forward cross-validation)

Grid search functions

  • .ml.gs.kfshuff (K-Fold cross-validation with randomized indices)
  • .ml.gs.kfsplit (K-Fold cross-validation with sequential indices)
  • .ml.gs.kfstrat (K-Fold cross-validation with stratified indices)
  • .ml.gs.mcsplit (Monte-Carlo cross-validation with random split indices)
  • .ml.gs.pcsplit (Percentage split cross-validation)
  • .ml.gs.tschain (Chain-forward cross-validation)
  • .ml.gs.tsrolls (Roll-forward cross-validation)

Cross validation and grid search automatically support multiprocessing jobs
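
The difference between sequential and randomized K-Fold indices listed above can be sketched in plain Python. This is a generic illustration of K-Fold splitting, not the .ml.xv implementation; `kfold_indices` and its parameters are hypothetical names.

```python
import random


def kfold_indices(n, k, shuffle=False, seed=None):
    """Split range(n) into k folds; return a (train, test) pair per fold."""
    idx = list(range(n))
    if shuffle:
        # Randomized indices: shuffle once, then cut into folds.
        random.Random(seed).shuffle(idx)
    # Spread any remainder over the first n % k folds.
    base, extra = divmod(n, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(idx[start:start + size])
        start += size
    splits = []
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        splits.append((train, test))
    return splits


sequential = kfold_indices(10, 5)
shuffled = kfold_indices(10, 5, shuffle=True, seed=42)
print(sequential[0][1])  # first test fold with sequential indices: [0, 1]
```

In both variants every index appears in exactly one test fold; only the assignment of indices to folds differs.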

UPDATES
FRESH automatically supports multiprocessing jobs
Pandas conversion functions (.ml.df2tab and .ml.tab2df) support temporal conversions

v0.2.1

12 Apr 16:03
d765f9a

NEW

  • Ten new statistical metrics (fbscore, r2score, Matthews correlation coefficient, etc.).
  • Two categorical encoding schemes (lexicographical and frequency).
  • Time/Date encoding.
  • Multiple hyper-parameter inputs now supported in FRESH.
  • Two new significant features selection options (k-best & percentile).

MODIFICATIONS

  • Input structure modification to .ml.fresh.createfeatures; full explanation at
    code.kx.com/ml/toolkit/fresh.
  • Input structure modification to .ml.fresh.significantfeatures to account for
    additional significant feature selection methods.
  • Removal of the .ml.util namespace, with its contents folded into .ml. This tidies implementations and
    removes ambiguity over whether functions were true utils.
    NOTE: functions below here may have previously been in the .ml.util namespace.
  • Underlying file structure changed to tidy code locations within the toolkit:
    statistical functions -> util/metrics.q,
    true utils -> util/util.q,
    preprocessing functions -> util/preproc.q.
  • .ml.onehot no longer supports lists; input is expected to be a table. Encoding can be set to
    operate on a column-by-column basis.
  • .ml.comb returns combinations in ascending order, previous implementation
    had non-obvious return pattern.
  • .ml.filltab expects a modified dictionary input. The previous form was
    `linear`mean`median!`x`x1`x2; this has been changed to a more 'q like'
    mapping of columns to desired behaviours, `x`x1`x2!`linear`mean`median.
  • .ml.filltab no longer defaults to forward+backward fill when passed ()!(); an
    empty dictionary now returns the original table. The default forward+backward fill is
    obtained by passing :: in place of the dictionary.
  • .ml.dropconstant now supports removal of constant keys of a dictionary

FIXES

  • .ml.infreplace only worked correctly under the condition that both positive and
    negative infinities existed within the vector. Function now operates if positive,
    negative or no infinities are present in the vector.

REMOVED

  • .ml.util.traintestsplitseed; the same behaviour can be obtained by setting the seed via q)\S x prior
    to application of .ml.traintestsplit.
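
The .ml.infreplace fix listed under FIXES concerns vectors that contain only one kind of infinity, or none at all. The sketch below shows those cases in plain Python, under the assumption that positive infinities are replaced by the largest finite value and negative infinities by the smallest. That replacement rule is an assumption here, and `inf_replace` is a hypothetical name, not the toolkit's q code. NaN values are not considered.

```python
import math


def inf_replace(values):
    """Replace +inf with the largest finite value and -inf with the smallest."""
    finite = [v for v in values if not math.isinf(v)]
    if not finite:
        # Nothing finite to substitute, so return the input unchanged.
        return list(values)
    hi, lo = max(finite), min(finite)
    out = []
    for v in values:
        if v == math.inf:
            out.append(hi)
        elif v == -math.inf:
            out.append(lo)
        else:
            out.append(v)
    return out


print(inf_replace([1.0, math.inf, 3.0]))             # only +inf present
print(inf_replace([-math.inf, 2.0, 5.0]))            # only -inf present
print(inf_replace([1.0, 2.0]))                       # no infinities
print(inf_replace([-math.inf, 1.0, 4.0, math.inf]))  # both present
```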

v0.1.2

17 Dec 12:49
48ab13a
Pre-release

Fix to the significant features function and addition of an Appveyor test for Windows installs.

  • Changes to the feature significance function. In the previous release it performed incorrectly because of how .ml.fresh.benjhochfind and .ml.fresh.featuresignificance interacted.
  • Tests of the Benjamini-Hochberg procedure applied to features have been made more rigorous to ensure the function performs correctly.
  • Appveyor tests are now explicitly called on upload of new changes.
  • 6034eba: Fix to feature significance function
  • 48ab13a: Change path function to allow it to load the library on Windows
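
For reference, the Benjamini-Hochberg procedure mentioned above can be sketched as follows. This is the standard textbook step-up procedure written in plain Python, not the .ml.fresh.benjhochfind implementation; the function name and the `alpha` default are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k / m * alpha.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
print(benjamini_hochberg([0.03, 0.03, 0.03, 0.04]))
```

The second call shows the step-up property: the two smallest p-values fail their own thresholds, yet all four hypotheses are rejected because a larger rank passes.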