This project, called SeqCluPV, is an extension of the original SeqClu algorithm (developed by Dr.ir. Sicco Verwer of the Delft University of Technology) that is characterized by voting for cluster prototypes. The framework has been developed as part of the course CSE3000 Research Project at the Delft University of Technology. For instructions on how to get a local copy up and running, please refer to the Getting started section.
To get a local copy up and running, follow these simple steps.
The project was made with Python 3.9, so having Python 3.9 installed is a prerequisite.
- Install Cython
pip install Cython
- Clone the sktime repository in a separate directory
git clone https://github.com/alan-turing-institute/sktime.git
- After navigating to the sktime project root, install sktime
python setup.py install
- Install SeqCluPV
pip install seqclupv
- Clone the repo
git clone https://github.com/rtewierik/seqclupv.git
- Install Cython
pip install Cython
- Clone the sktime repository in a separate directory
git clone https://github.com/alan-turing-institute/sktime.git
- After navigating to the sktime project root, install sktime
python setup.py install
- After navigating to the SeqCluPV project root, install SeqCluPV
python setup.py install
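After either installation route, it can help to confirm that the interpreter and the installed packages are visible before running an experiment. The following is a minimal sketch and not part of SeqCluPV: the function name `check_environment` is hypothetical, and the import names `Cython`, `sktime` and `seqclupv` are assumed to match the packages installed above. The README only states that the project was made with Python 3.9, so treating newer versions as acceptable is also an assumption.

```python
import importlib.util
import sys


def check_environment(modules=("Cython", "sktime", "seqclupv")):
    """Report the Python version check and whether each module can be found.

    Hypothetical helper: the module names are assumptions based on the
    installation steps, not something SeqCluPV itself provides.
    """
    # Assumption: Python 3.9 or newer is acceptable.
    report = {"python_3_9_or_newer": sys.version_info >= (3, 9)}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'ok' if ok else 'missing'}")
```

Any entry reported as missing points at the installation step that should be repeated.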
The algorithm can be run on the following three data sets.
- GesturePebbleZ1 (http://www.timeseriesclassification.com/description.php?Dataset=GesturePebbleZ1)
- UJI Pen Characters (https://archive.ics.uci.edu/ml/datasets/UJI+Pen+Characters)
- PLAID (http://www.timeseriesclassification.com/description.php?Dataset=PLAID)
The command-line interface can be used as follows.
python -m seqclupv numPrototypes numRepresentativePrototypes maxPerTick dataSourceParameters seqCluParameters maxIter online onlySeqClu experimentName
The potential values for the above parameters are as follows.
- numPrototypes: integer - The number of prototypes that will be used by all variants of the algorithm.
- numRepresentativePrototypes: integer - The number of representative prototypes that will be used by all variants of the algorithm.
- maxPerTick: integer - The maximum number of sequences that can be processed per tick.
- dataSourceParameters: list[character] or list[boolean,string] - Two data sources can be used: the handwritten character data source and the data source for the data sets from TimeSeriesClassification.com. For the handwritten character data source, this parameter is a JSON-formatted list of characters chosen from ['C', 'U', 'V', 'W', 'S', 'O', '1', '2', '3', '5', '6', '8', '9']. For the data sets from TimeSeriesClassification.com, this parameter is a list with two items, namely a boolean and a string, in that order. The boolean indicates whether or not the pair-wise distances between all items in the data set should be computed upfront; the string is the name of the data set that is used and can be either of [\"pebble\",\"plaid\"]. NOTE: Since the list is JSON-formatted, the boolean values should be either true or false. Moreover, spaces are NOT allowed.
- seqCluParameters: list[integer, float, float, boolean, boolean] - The values in the list represent the following parameters in that order.
  - bufferSize: integer - The maximum size of the buffer.
  - minimumRepresentativeness: float - The minimum average representativeness that the prototypes of a cluster must have before the distance from a sequence to that cluster is approximated.
  - prototypeValueratio: float - The value 'a' in a:1, where a:1 is the ratio between the representativeness and the weight. This ratio is used to compute the value of a prototype as a linear combination of the representativeness and the weight of the prototype.
  - clusterAssignment: boolean - A boolean value indicating whether or not to approximate the distance to the cluster. NOTE: Since the list is JSON-formatted, the boolean values should be either true or false. Moreover, spaces are NOT allowed.
  - buffering: boolean - A boolean value indicating whether or not the buffering feature should be used. NOTE: Since the list is JSON-formatted, the boolean values should be either true or false. Moreover, spaces are NOT allowed.
- maxIter: integer - The maximum number of iterations that the offline baseline variant of the algorithm is allowed to execute. NOTE: This parameter is only needed when online and onlySeqClu are both set to False; in all other cases any integer is fine and the input will be ignored.
- online: boolean - If set to True, the online baseline variant of the SeqClu algorithm is executed; if set to False, the offline baseline variant is executed. NOTE: Only the values 'True' or 'False' are possible here.
- onlySeqClu: boolean - A boolean value indicating whether or not only the SeqClu algorithm should be executed. NOTE: Only the values 'True' or 'False' are possible here.
- experimentName: string - The name of the experiment. This is used to compare the prototypes at the end of executing (the online baseline variant of) the SeqClu algorithm. The possible values are o29, o295w and pebbleFull.
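To make the a:1 ratio of prototypeValueratio concrete, the following is a minimal sketch of one plausible reading of "a linear combination of the representativeness and the weight". It is not taken from the SeqCluPV source: the function name `prototype_value` is hypothetical, and dividing by (a + 1) so that the result stays on the same scale as its inputs is an assumption.

```python
def prototype_value(representativeness, weight, ratio):
    """Combine representativeness and weight in the ratio `ratio`:1.

    Assumption: the combination is normalised by (ratio + 1); the actual
    implementation may use the unnormalised sum or a different scaling.
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    return (ratio * representativeness + weight) / (ratio + 1.0)


# With a ratio of 2.0, representativeness counts twice as much as weight.
print(prototype_value(representativeness=0.8, weight=0.2, ratio=2.0))
```

Under this reading, a larger ratio makes the prototype value track the representativeness more closely, and a ratio of 0 reduces it to the weight alone.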
A few examples of commands that are executed to run specific experiments are as follows.
Experiment with characters O, 2 and 9 of the handwritten character data set using both the SeqClu algorithm and the online baseline variant of the SeqClu algorithm
python -m seqclupv 8 3 1 [\"O\",\"2\",\"9\"] [15,0.5,2.0,false,true] 0 True True o29
Experiment with the Pebble data set using just the SeqClu algorithm
python -m seqclupv 8 3 1 [false,\"pebble\"] [15,0.5,3.0,true,false] 0 True True pebbleFull
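Because the two list parameters must be JSON-formatted without spaces, it can be convenient to assemble the arguments programmatically. The sketch below is an illustration rather than part of SeqCluPV: the helper names `compact_json` and `build_seqclupv_args` are hypothetical, and only the positional order and value formats described above are taken from this document.

```python
import json
import sys


def compact_json(values):
    """Serialize a list as JSON without spaces, as the CLI requires."""
    return json.dumps(values, separators=(",", ":"))


def build_seqclupv_args(num_prototypes, num_representative_prototypes,
                        max_per_tick, data_source_parameters,
                        seqclu_parameters, max_iter, online, only_seqclu,
                        experiment_name):
    """Return the argument vector for `python -m seqclupv` (hypothetical helper)."""
    return [
        sys.executable, "-m", "seqclupv",
        str(num_prototypes),
        str(num_representative_prototypes),
        str(max_per_tick),
        compact_json(data_source_parameters),  # JSON booleans: true/false
        compact_json(seqclu_parameters),
        str(max_iter),
        str(online),       # Python-style 'True' or 'False'
        str(only_seqclu),  # Python-style 'True' or 'False'
        experiment_name,
    ]


# Arguments matching the first example command above.
args = build_seqclupv_args(8, 3, 1, ["O", "2", "9"],
                           [15, 0.5, 2.0, False, True], 0, True, True, "o29")
print(" ".join(args[1:]))
```

The resulting list can be passed to `subprocess.run(args, check=True)`. Since no shell is involved in that case, the quotes inside the JSON lists do not need the backslash escaping used in the commands above.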
See the open issues for a list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the project
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a pull request
Distributed under the MIT License. See LICENSE for more information.
R.E.C. te Wierik - [email protected]
Project link: https://github.com/rtewierik/seqclupv