Data profiler is an attempt to model the behavior of a given operator for a set of datasets.

data-profiler

data-profiler is a Go project that transforms a set of datasets, based on characteristics such as distribution similarity and correlation, and then uses Machine Learning techniques to model the behavior of an operator applied on top of them.

Screenshots

Similarity Matrix

Dataset Space

SVM Modeling

SVM Residuals Distribution

Installation

You have two ways of installing data-profiler:

  1. Through Go:
# GOPATH must be set
~> go get github.com/giagiannis/data-profiler
  2. Using Docker:
~> docker pull ggian/data-profiler

Usage

data-profiler can be used both through a CLI and a Web interface.

  1. CLI

You can access the CLI client through the data-profiler-utils binary.

~> $GOPATH/bin/data-profiler-utils

This command prints an overview of the available actions.

Note: use this client only if you know how data-profiler works.

  2. Web UI

First, run the Docker container, providing a directory that contains the dataset files.

~> docker run -v /src/datasets:/datasets -p 8080:8080 -d ggian/data-profiler

This command mounts the host's /src/datasets directory at /datasets inside the container and forwards the host's port 8080 to the container's port 8080. Once the container has started, open http://dockerhost:8080 (where dockerhost is the machine running Docker) and add the first set of datasets for analysis.

License

Apache License v2.0 (see the LICENSE file for details)

Contact

Giannis Giannakopoulos [email protected]