Merge pull request #24 from rec-jvm/v2
v2 prototype
kenpusney authored Mar 15, 2017
2 parents d58c07f + 5260991 commit c43a22b
Showing 120 changed files with 168,431 additions and 2,178 deletions.
12 changes: 9 additions & 3 deletions .gitignore
@@ -1,13 +1,19 @@
.idea/
.gradle/
out/
loadData/

/*.txt
/*.net.kimleo.rec
/*.rec
/*.patch
/default.rule

.DS_Store
build
classes/
classes/

*.txt
*.bin

logs/

*.log
2 changes: 1 addition & 1 deletion .travis.yml
@@ -4,5 +4,5 @@ jdk:
script:
- gradle build
- gradle fatJar
- ./testrec.sh
- ./testjs.sh
- ./rec dump rec-core/src/test/resources/caching.bin.test 3
194 changes: 2 additions & 192 deletions README.md
@@ -6,196 +6,6 @@
It provides several DSLs to describe relationships and transformation rules,
and to check data correctness and integrity.

## Example
Rec v2 is currently under design, and you can see `doc/design.md` for details.

Say you have the following data files:

person.txt
```csv
Kimmy | Leo | 1999/99/99 | self | 1000001 | not married | this is just a test data
Kimbryo | Leo | 1999/99/99 | brother | 1000002 | not married | this is just a test data
Kim | Leo | 1999/99/99 | brother | 1000003 | married | this is just a test data
Lisa | Leo | 1999/99/99 | sister | 1000004 | not married | this is just a test data
Amanda | Leo | 1999/99/99 | cusion | 1000005 | married | this is just a test data
```

salary.txt
```csv
1000001 | 10k
1000002 | 1000k
1000008 | 3t
```

If you provide the following `.rec` files, Rec will check the data and report the details:

person.txt.rec
```
Person
delimiter=|
escape="
key=ID
format=first name, last name, {2}, ID, ..., comment
```
salary.txt.rec
```
Salary
delimiter=|
escape="
format=pid, amount
```

Rules(default.rule):
```
unique: Person[first name]
unique: Person.ID
exist: Salary.pid, Person.ID
uinque: Person.comment
```

When you run the jar, you will see the following output:
```
1000008 of Salary[pid] cannot be found in Person[ID]
duplicate record found with comment: this is just a test data
duplicate record found with comment: this is just a test data
duplicate record found with comment: this is just a test data
duplicate record found with comment: this is just a test data
duplicate record found with comment: this is just a test data
```

## Rec file

A Rec file contains the parameters of a data file, which you can
use later to process that data file.

The Rec file format is:
```Rec
<Record Name>
delimiter=<the delimiter>
escape=<the escape char>
key=<optional; the key of record; must be one of the format field>
format=<the format of the rec file, must be separated by commas>
```

## Initializer

Rec provides an initializer that generates your *.rec file from
a few parameters.

Just run it as follows:
```shell
java -jar rec-app.jar init <data file> [<name>=<value>]
```
The names and values you provide will be added to the .rec file.
If you don't provide enough parameters, you will be
prompted to complete them.

## Format

Format is used to generate the accessors for records.

By default, you should provide the names of the fields you want to
use in later analysis (the rules or the scripts).

It is called a format because it has some useful conventions that help
you get the correct field as easily as possible.

The following introduction is based on this record:
```Rec
Kimmy, Leo, 12, 1999/99/99, male, "Chengdu, Sichuan, China", Software Engineer
```

#### Fields

A cell in a format is just a field name, which can contain spaces
or be enclosed in double quotes. Each field name is matched to the cell
at the same position (column).

So for the record above, this format extracts its first name and last name:

```
first name, last name
```

#### Paddings

Usually we don't care about every detail in the data file. For example, if
we are only interested in a person's name and address, we can ignore
the other cells. You can use padding to **skip** these unnecessary
cells.

```
first name, last name, {3}, address
```

Here `{3}` means that the next 3 cells are ignored.

#### Placeholder

This is another useful convention for when you need to look up fields
from the end of the record.

For example, if we only care about the person and their job title,
we can use this convention:
```
first name, last name, ..., job title
```
Here, `...` means: ignore whatever cells lie between the first
2 cells and the last field.

You can also add more reversed fields, and it will work as expected:
```
first name, last name, ..., job title
```

You can even combine everything together:
```
first name, {3}, gender, ..., dob, {2}, job title
```

It may seem weird that there are only 7 fields in the data
record but 9 cells (excluding `...`) in the format.

Currently this is OK. Rec uses the placeholder (`...`)
as a separator between the ordered and the reverse-ordered field accessors, so this
format only requires a data record at least 5 cells long, i.e. the cells
addressed on either side of `...` can overlap.
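
The index resolution described above can be sketched in Java as follows. This is only an illustration under stated assumptions, not Rec's actual implementation: the class `FormatSketch` and its `resolve` method are hypothetical names, the record is assumed to be already split into cells, and quoted field names are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: maps each field name in a format to a cell index.
public class FormatSketch {

    // Resolves field names to cell indexes for a record with cellCount cells.
    static Map<String, Integer> resolve(String format, int cellCount) {
        String[] tokens = format.split(",");
        int placeholder = -1;
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = tokens[i].trim();
            if (tokens[i].equals("...")) {
                placeholder = i;
            }
        }

        Map<String, Integer> indexes = new LinkedHashMap<>();

        // Tokens before `...` are counted forward from the first cell.
        int end = placeholder < 0 ? tokens.length : placeholder;
        int index = 0;
        for (int i = 0; i < end; i++) {
            index = place(tokens[i], index, 1, indexes);
        }

        // Tokens after `...` are counted backward from the last cell.
        if (placeholder >= 0) {
            index = cellCount - 1;
            for (int i = tokens.length - 1; i > placeholder; i--) {
                index = place(tokens[i], index, -1, indexes);
            }
        }
        return indexes;
    }

    // Records a field at the current index, or skips n cells for a `{n}` padding.
    private static int place(String token, int index, int step, Map<String, Integer> indexes) {
        if (token.matches("\\{\\d+\\}")) {
            int skipped = Integer.parseInt(token.substring(1, token.length() - 1));
            return index + step * skipped;
        }
        indexes.put(token, index);
        return index + step;
    }

    public static void main(String[] args) {
        System.out.println(resolve("first name, {3}, gender, ..., dob, {2}, job title", 7));
    }
}
```

For the 7-cell record used in this section, the sketch resolves `first name` to cell 0, `gender` to cell 4, `dob` to cell 3 and `job title` to cell 6, which shows how the two sides of `...` can address overlapping cells.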

## Rules

Rules are for analysing the data files. Currently there are only the following
rules:

- Unique Rule, to check if a group of data cells is unique
- Exist Rule, to check if a data field is included in another set

Rules follow the format:
```$xslt
<Rule Name>: [<Query>]
```
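
As a rough illustration of what the two rules check, here is a minimal Java sketch. It is not Rec's rule implementation: `RuleSketch`, `duplicates` and `missing` are hypothetical names, and the field values are assumed to be already extracted into plain lists of strings.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the checks behind the Unique and Exist rules.
public class RuleSketch {

    // Unique check: returns every value that has already been seen earlier in the list.
    static List<String> duplicates(List<String> values) {
        Set<String> seen = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String value : values) {
            if (!seen.add(value)) {
                result.add(value);
            }
        }
        return result;
    }

    // Exist check: returns every value that does not appear in the reference set.
    static List<String> missing(List<String> values, Collection<String> reference) {
        Set<String> known = new HashSet<>(reference);
        List<String> result = new ArrayList<>();
        for (String value : values) {
            if (!known.contains(value)) {
                result.add(value);
            }
        }
        return result;
    }
}
```

With the `Salary.pid` values `1000001`, `1000002`, `1000008` and the `Person.ID` values `1000001` to `1000005` from the example data, `missing` returns only `1000008`.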

You can also add more rules and register them with the rule factory.

We have planned a more powerful tool, scripts, to do more extensible and
efficient analysis.

## Query

[TODO]

## Script

Currently there is bare Kotlin scripting support and some Groovy scripting
support.

For Groovy, see `rec-core/src/test/GroovyScripting.groovy` for an example.

For Kotlin, there is an inner module, `kotlin-api`, that provides a more detailed API.

## TODO

* Rec transformer
* More constraints
* API to java and groovy
* Annotation binding support for Java beans
* Typing

see doc/design.md for details.
For the v1 documentation, see `doc/v1.md`.
2 changes: 1 addition & 1 deletion build.gradle
@@ -1,3 +1,3 @@
allprojects {
version = '0.1.3'
version = '0.2.0'
}
7 changes: 0 additions & 7 deletions common/src/main/java/net/kimleo/rec/Pair.java
@@ -18,11 +18,4 @@ public V getSecond() {
return second;
}

public K component1() {
return first;
}

public V component2() {
return second;
}
}
@@ -123,9 +123,7 @@ public V put1(K key, V value) {

public static <V, U> MultiMap<V, U> from(Stream<V> collection, Function<V, U> transform) {
LinkedMultiHashMap<V, U> multiMap = new LinkedMultiHashMap<>();
collection.forEach(item -> {
multiMap.put1(item, transform.apply(item));
});
collection.forEach(item -> multiMap.put1(item, transform.apply(item)));

return multiMap;
}
@@ -1,6 +1,6 @@
package net.kimleo.rec.concept;

public interface Indexible<T> {
public interface Accessible<T> {
T get(int index);

int size();
@@ -0,0 +1,11 @@
package net.kimleo.rec.exception;

public class InitializationException extends RuntimeException {
    public InitializationException(String s, Exception ex) {
        super(s, ex);
    }

    public InitializationException(String message) {
        super(message);
    }
}
@@ -0,0 +1,7 @@
package net.kimleo.rec.exception;

public class ResourceAccessException extends RuntimeException {
    public ResourceAccessException(String message, Exception ex) {
        super(message, ex);
    }
}
5 changes: 5 additions & 0 deletions common/src/main/java/net/kimleo/rec/logging/LogAppender.java
@@ -0,0 +1,5 @@
package net.kimleo.rec.logging;

public interface LogAppender {
    void append(String logEntry);
}
5 changes: 5 additions & 0 deletions common/src/main/java/net/kimleo/rec/logging/LogFormatter.java
@@ -0,0 +1,5 @@
package net.kimleo.rec.logging;

public interface LogFormatter {
    String format(String name, LoggingLevel level, String message);
}
23 changes: 23 additions & 0 deletions common/src/main/java/net/kimleo/rec/logging/Logger.java
@@ -0,0 +1,23 @@
package net.kimleo.rec.logging;

import static net.kimleo.rec.logging.LoggingLevel.*;

public interface Logger {
    default void trace(String msg) {
        log(TRACE, msg);
    }
    default void debug(String msg) {
        log(DEBUG, msg);
    }
    default void info(String msg) {
        log(INFO, msg);
    }
    default void warn(String msg) {
        log(WARN, msg);
    }
    default void error(String msg) {
        log(ERROR, msg);
    }

    void log(LoggingLevel level, String msg);
}
19 changes: 19 additions & 0 deletions common/src/main/java/net/kimleo/rec/logging/LoggingLevel.java
@@ -0,0 +1,19 @@
package net.kimleo.rec.logging;

public enum LoggingLevel {
    TRACE(0),
    DEBUG(1),
    INFO(2),
    WARN(3),
    ERROR(4);

    private final int level;

    LoggingLevel(int level){
        this.level = level;
    }

    public int level() {
        return level;
    }
}
@@ -0,0 +1,27 @@
package net.kimleo.rec.logging.impl;

import net.kimleo.rec.logging.LogAppender;
import net.kimleo.rec.logging.LogFormatter;
import net.kimleo.rec.logging.Logger;
import net.kimleo.rec.logging.LoggingLevel;

public class DefaultLogger implements Logger {
    private final String name;
    private final LoggingLevel loggingLevel;
    private final LogFormatter logFormatter;
    private final LogAppender appender;

    public DefaultLogger(String name, LoggingLevel loggingLevel, LogFormatter logFormatter, LogAppender appender) {
        this.name = name;
        this.loggingLevel = loggingLevel;
        this.logFormatter = logFormatter;
        this.appender = appender;
    }

    @Override
    public void log(LoggingLevel level, String msg) {
        if (level.level() >= loggingLevel.level()) {
            appender.append(logFormatter.format(name, level, msg));
        }
    }
}
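
The logging types added in this commit could be wired together roughly as follows. This is a sketch, not code from the commit: `LoggerSketch` and `demo` are hypothetical names, the commit's types are re-declared in trimmed form as nested types so the example compiles on its own, and the formatter pattern and in-memory appender are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: trimmed copies of the commit's logging types plus a small demo.
public class LoggerSketch {

    enum LoggingLevel {
        TRACE(0), DEBUG(1), INFO(2), WARN(3), ERROR(4);

        private final int level;

        LoggingLevel(int level) {
            this.level = level;
        }

        int level() {
            return level;
        }
    }

    interface LogAppender {
        void append(String logEntry);
    }

    interface LogFormatter {
        String format(String name, LoggingLevel level, String message);
    }

    interface Logger {
        default void debug(String msg) {
            log(LoggingLevel.DEBUG, msg);
        }

        default void warn(String msg) {
            log(LoggingLevel.WARN, msg);
        }

        void log(LoggingLevel level, String msg);
    }

    static class DefaultLogger implements Logger {
        private final String name;
        private final LoggingLevel loggingLevel;
        private final LogFormatter logFormatter;
        private final LogAppender appender;

        DefaultLogger(String name, LoggingLevel loggingLevel, LogFormatter logFormatter, LogAppender appender) {
            this.name = name;
            this.loggingLevel = loggingLevel;
            this.logFormatter = logFormatter;
            this.appender = appender;
        }

        @Override
        public void log(LoggingLevel level, String msg) {
            // Entries below the configured threshold are dropped.
            if (level.level() >= loggingLevel.level()) {
                appender.append(logFormatter.format(name, level, msg));
            }
        }
    }

    // Builds an INFO-level logger that appends formatted entries to an in-memory list.
    static List<String> demo() {
        List<String> sink = new ArrayList<>();
        LogFormatter formatter = (name, level, message) -> "[" + level + "] " + name + ": " + message;
        Logger logger = new DefaultLogger("demo", LoggingLevel.INFO, formatter, sink::add);
        logger.debug("below the threshold, dropped");
        logger.warn("kept");
        return sink;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Because `LogAppender` and `LogFormatter` each declare a single method, both can be supplied as lambdas or method references, which keeps the threshold check in `DefaultLogger` separate from where and how entries are written.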