support #[venndb(any)] filters #4

Merged: 11 commits, Apr 15, 2024
94 changes: 93 additions & 1 deletion CHANGELOG.md
@@ -5,7 +5,58 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

# 0.2.0 (2024-04-12)
# 0.2.1 (2024-04-15)

A backwards compatible patch for [v0.2.0](#020-2024-04-15),
to support rows that allow any value for a specific column.

Non-Breaking changes:

* support `#[venndb(any)]` filters;
  * these are possible only for `T` filter maps, where `T: ::venndb::Any`;
  * `bool` filters cannot be `any` as `bool` doesn't implement the `::venndb::Any` trait;
  * rows that are `any` will match regardless of the query filter used for that property;

Example usage:

```rust
use venndb::{Any, VennDB};

#[derive(Debug, PartialEq, Eq, Clone, Hash)]
pub enum Department {
    Any,
    Hr,
    Engineering,
}

impl Any for Department {
    fn is_any(&self) -> bool {
        // `self` is a `&Department`, so compare against a reference
        self == &Department::Any
    }
}

#[derive(Debug, VennDB)]
pub struct Employee {
    name: String,
    #[venndb(filter, any)]
    department: Department,
}

let db = EmployeeDB::from_iter([
    Employee { name: "Jack".to_owned(), department: Department::Any },
    Employee { name: "Derby".to_owned(), department: Department::Hr },
]);
let mut query = db.query();

// will match Jack and Derby, as Jack is marked as Any, meaning it matches whatever value is queried
let hr_employees: Vec<_> = query.department(Department::Hr).execute().unwrap().iter().collect();
assert_eq!(hr_employees.len(), 2);
```

If you combine it with an optional filter map property (`department: Option<Department>`),
it still works the same: rows with `None` register no value at all and are simply ignored.
This has no effect on the correct functioning of `Any`.
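
The matching rule described above can be modelled in plain Rust without the crate. This is only a sketch of the semantics, not how `venndb` implements them: `row_matches` is a hypothetical helper, and `Department` is a stand-in copy of the enum from the example.

```rust
/// Stand-in for the `Department` enum used in the example above.
#[derive(Debug, PartialEq, Eq, Clone, Hash)]
pub enum Department {
    Any,
    Hr,
    Engineering,
}

/// Hypothetical helper (not part of `venndb`): decides whether a row's
/// optional `department` value satisfies a query for `wanted`.
pub fn row_matches(row_value: &Option<Department>, wanted: &Department) -> bool {
    match row_value {
        // `None` registers no value at all, so it never matches a filter.
        None => false,
        // The designated `Any` value matches whatever the query asks for.
        Some(Department::Any) => true,
        // Every other value has to be equal to the queried one.
        Some(value) => value == wanted,
    }
}
```

With this model a query for `Department::Hr` accepts rows holding `Some(Department::Hr)` or `Some(Department::Any)`, and rejects rows holding `None` or any other department.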

# 0.2.0 (2024-04-15)

Breaking Changes:

@@ -22,10 +73,51 @@ While this changes behaviour of `filters` and `filter maps` it is unlikely that
`Option<T>` for these types before, as their ergonomics have been a bit weird prior to this version.
Even more so for `filter maps` it could have resulted in panics.

Options, be it filters or filter maps, allow you to have rows that do not register any value for optional
properties, allowing them to exist without affecting the rows which do have it.

Non-Breaking Changes:

* improve documentation;

Updated Example from 0.1:

```rust
use venndb::VennDB;

#[derive(Debug, VennDB)]
pub struct Employee {
    #[venndb(key)]
    id: u32,
    name: String,
    is_manager: Option<bool>,
    is_admin: bool,
    #[venndb(skip)]
    foo: bool,
    #[venndb(filter)]
    department: Department,
    #[venndb(filter)]
    country: Option<String>,
}

fn main() {
    let db = EmployeeDB::from_iter(/* .. */);

    let mut query = db.query();
    let employee = query
        .is_admin(true)
        .is_manager(false) // rows which have `None` for this property will NOT match this filter
        .department(Department::Engineering)
        .execute()
        .expect("to have found at least one")
        .any();

    println!("non-manager admin engineer: {:?}", employee);
    // as we didn't specify a `country` filter, even rows without a country specified will
    // match here if they match the defined (query) filters
}
```

# 0.1.1 (2024-04-10)

Non-Breaking Changes:
4 changes: 2 additions & 2 deletions Cargo.toml
@@ -12,7 +12,7 @@ repository = "https://github.com/plabayo/venndb"
keywords = ["database", "db", "memory", "bits"]
categories = ["database"]
authors = ["Glen De Cauwsemaecker <[email protected]>"]
version = "0.2.0"
version = "0.2.1"
rust-version = "1.75.0"

[package.metadata.docs.rs]
@@ -23,7 +23,7 @@ rustdoc-args = ["--cfg", "docsrs"]
bitvec = "1.0.1"
hashbrown = "0.14.3"
rand = "0.8.5"
venndb-macros = { version = "0.2.0", path = "venndb-macros" }
venndb-macros = { version = "0.2.1", path = "venndb-macros" }

[dev-dependencies]
divan = "0.1.14"
106 changes: 87 additions & 19 deletions README.md
@@ -94,7 +94,7 @@ pub struct Employee {
is_admin: bool,
#[venndb(skip)]
foo: bool,
#[venndb(filter)]
#[venndb(filter, any)]
department: Department,
#[venndb(filter)]
country: Option<String>,
@@ -138,34 +138,54 @@ The benchmarks tests 3 different implementations of a proxy database

The benchmarks are created by:

1. running `just bench`
2. copying the output into [./scripts/plot_bench_charts](./scripts/plot_bench_charts.py) and running it
1. running `just bench`;
2. copying the output into [./scripts/plot_bench_charts](./scripts/plot_bench_charts.py) and running it.

Snippet that is run for each of the 3 implementations:

```rust,ignore
fn test_db(db: &impl ProxyDB) {
    let i = next_round();

    let pool = POOLS[i % POOLS.len()];
    let country = COUNTRIES[i % COUNTRIES.len()];

    let result = db.get(i as u64);
    divan::black_box(result);

    let result = db.any_tcp(pool, country);
    divan::black_box(result);

    let result = db.any_socks5_isp(pool, country);
    divan::black_box(result);
}
```

### Benchmark Performance Results

Performance for Database with `100` records:

| Proxy DB | Fastest (µs) | Median (µs) | Slowest (µs) |
| --- | --- | --- | --- |
| naive_proxy_db_100 | 6.87 | 7.48 | 19.16 |
| sql_lite_proxy_db_100 | 34.16 | 36.33 | 78.04 |
| venn_proxy_db_100 | 0.92 | 0.99 | 8.50 |
| naive_proxy_db_100 | 6.50 | 8.00 | 18.04 |
| sql_lite_proxy_db_100 | 32.58 | 37.37 | 302.00 |
| venn_proxy_db_100 | 0.89 | 0.92 | 2.74 |

Performance for Database with `12_500` records:

| Proxy DB | Fastest (µs) | Median (µs) | Slowest (µs) |
| --- | --- | --- | --- |
| naive_proxy_db_12_500 | 402.20 | 407.60 | 434.30 |
| sql_lite_proxy_db_12_500 | 1099.00 | 1182.00 | 1519.00 |
| venn_proxy_db_12_500 | 16.79 | 17.54 | 23.16 |
| naive_proxy_db_12_500 | 404.00 | 407.70 | 478.70 |
| sql_lite_proxy_db_12_500 | 1061.00 | 1073.00 | 1727.00 |
| venn_proxy_db_12_500 | 16.04 | 16.97 | 25.54 |

Performance for Database with `100_000` records:

| Proxy DB | Fastest (µs) | Median (µs) | Slowest (µs) |
| --- | --- | --- | --- |
| naive_proxy_db_100_000 | 3769.00 | 3882.00 | 5285.00 |
| sql_lite_proxy_db_100_000 | 8334.00 | 8628.00 | 10070.00 |
| venn_proxy_db_100_000 | 128.30 | 136.50 | 152.10 |
| naive_proxy_db_100_000 | 3790.00 | 3837.00 | 5731.00 |
| sql_lite_proxy_db_100_000 | 8219.00 | 8298.00 | 9424.00 |
| venn_proxy_db_100_000 | 124.20 | 129.20 | 156.30 |

We are not database nor hardware experts though. Please do open an issue if you think
these benchmarks are incorrect or if related improvements can be made.
@@ -179,24 +199,24 @@ Allocations for Database with `100` records:

| Proxy DB | Fastest (KB) | Median (KB) | Slowest (KB) |
| --- | --- | --- | --- |
| naive_proxy_db_100 | 0.38 | 0.38 | 0.38 |
| sql_lite_proxy_db_100 | 4.53 | 4.53 | 4.53 |
| naive_proxy_db_100 | 0.33 | 0.33 | 0.33 |
| sql_lite_proxy_db_100 | 4.04 | 4.04 | 4.04 |
| venn_proxy_db_100 | 0.05 | 0.05 | 0.05 |

Allocations for Database with `12_500` records:

| Proxy DB | Fastest (KB) | Median (KB) | Slowest (KB) |
| --- | --- | --- | --- |
| naive_proxy_db_12_500 | 40.22 | 40.22 | 40.22 |
| sql_lite_proxy_db_12_500 | 5.02 | 5.03 | 5.03 |
| naive_proxy_db_12_500 | 40.73 | 40.73 | 40.73 |
| sql_lite_proxy_db_12_500 | 5.03 | 5.02 | 5.03 |
| venn_proxy_db_12_500 | 3.15 | 3.15 | 3.15 |

Allocations for Database with `100_000` records:

| Proxy DB | Fastest (KB) | Median (KB) | Slowest (KB) |
| --- | --- | --- | --- |
| naive_proxy_db_100_000 | 324.00 | 324.00 | 324.00 |
| sql_lite_proxy_db_100_000 | 5.02 | 5.02 | 5.03 |
| naive_proxy_db_100_000 | 323.30 | 323.30 | 323.70 |
| sql_lite_proxy_db_100_000 | 5.02 | 5.02 | 5.01 |
| venn_proxy_db_100_000 | 25.02 | 25.02 | 25.02 |

We are not database nor hardware experts though. Please do open an issue if you think
@@ -219,13 +239,21 @@ Please [open an issue](https://github.com/plabayo/venndb/issues) and also read [

Alternatively you can also [join our Discord][discord-url] and start a conversation / discussion over there.

> ❓ Can I use _any_ type for a `#[venndb(filter)]` property?
> ❓ Can I use _whatever_ type for a `#[venndb(filter)]` property?

Yes, as long as it implements `PartialEq + Eq + Hash + Clone`.
That said, we do recommend that you use `enum` values if you can, or some other highly restricted form.

Using a `String` directly, for example, is a bad idea, as it would mean that `bE` != `Be` != `BE` != `Belgium` != `Belgique` != `België`, even though all of these refer to the same country. In such cases it is much better to at least create a wrapper type such as `struct Country(String)`, which lets you enforce sanitization/validation when creating the value and ensures that the hashes are the same for values that are conceptually the same.
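
A wrapper type along those lines could look like the sketch below. The `parse` constructor, the alias list, and the choice of an upper-case two-letter code are assumptions made for illustration; none of them come from `venndb` itself.

```rust
/// Hypothetical wrapper type; the derives are the ones a filter map needs.
#[derive(Debug, PartialEq, Eq, Hash, Clone)]
pub struct Country(String);

impl Country {
    /// Illustrative constructor: trims and upper-cases the input, then maps
    /// a few known spellings onto one canonical code. Unknown input is rejected.
    pub fn parse(input: &str) -> Option<Self> {
        let cleaned = input.trim().to_uppercase();
        let code = match cleaned.as_str() {
            "BE" | "BELGIUM" | "BELGIQUE" | "BELGIË" => "BE",
            _ => return None,
        };
        Some(Country(code.to_owned()))
    }
}
```

Because every accepted spelling ends up as the same inner string, the derived `Eq` and `Hash` implementations treat them as one value, which is what a filter map needs in order to group those rows together.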

> ❓ How do I make a filter optional?

Both filters (`bool` properties) and filter maps (`T != bool` properties with the `#[venndb(filter)]` attribute)
can be made optional by wrapping the types with `Option`, resulting in `Option<bool>` and `Option<T>`.

Rows that have the `Option::None` value for such an optional column never match a filter on that property,
but there is no other consequence beyond that.

> ❓ Why do keys have to be unique and non-optional?

Within `venndb` keys are meant to be able to look up,
@@ -236,6 +264,46 @@ As such it makes no sense for such keys to be:
- duplicate: as that can result in multiple rows or the wrong row being returned;
- optional: as that would mean the row cannot be looked up when the key is not defined;
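
The reasoning can be illustrated with a small key index built on the standard library. This is a hedged sketch, not `venndb`'s implementation: `KeyIndex` and `DuplicateKey` are hypothetical names, and the index simply maps a key to a row position.

```rust
use std::collections::HashMap;

/// Hypothetical error returned when a key is already taken.
#[derive(Debug, PartialEq)]
pub struct DuplicateKey(pub u32);

/// Hypothetical index from a unique, always-present key to a row position.
#[derive(Debug, Default)]
pub struct KeyIndex {
    rows_by_id: HashMap<u32, usize>,
}

impl KeyIndex {
    /// Registers `row` under `id`, refusing to overwrite an existing entry,
    /// since a lookup could otherwise return the wrong row.
    pub fn insert(&mut self, id: u32, row: usize) -> Result<(), DuplicateKey> {
        if self.rows_by_id.contains_key(&id) {
            return Err(DuplicateKey(id));
        }
        self.rows_by_id.insert(id, row);
        Ok(())
    }

    /// Looks up the row position registered for `id`, if any.
    pub fn get(&self, id: u32) -> Option<usize> {
        self.rows_by_id.get(&id).copied()
    }
}
```

The key parameter is a plain `u32` rather than an `Option<u32>`: a row without a key would have no entry in the map and could never be found through `get`.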

> ❓ How can I allow some rows to match for _any_ value of a certain (filter) column?

A filter map can designate one value that matches every other value. It is up to you to declare the filter as such,
and to also define for that type what the _one_ value to rule them all is.

Usage:

```rust,ignore
use venndb::{Any, VennDB};

#[derive(Debug, PartialEq, Eq, Clone, Hash)]
pub enum Department {
    Any,
    Hr,
    Engineering,
}

impl Any for Department {
    fn is_any(&self) -> bool {
        // `self` is a `&Department`, so compare against a reference
        self == &Department::Any
    }
}

#[derive(Debug, VennDB)]
pub struct Employee {
    name: String,
    #[venndb(filter, any)]
    department: Department,
}

let db = EmployeeDB::from_iter([
    Employee { name: "Jack".to_owned(), department: Department::Any },
    Employee { name: "Derby".to_owned(), department: Department::Hr },
]);
let mut query = db.query();

// will match Jack and Derby, as Jack is marked as Any, meaning it matches whatever value is queried
let hr_employees: Vec<_> = query.department(Department::Hr).execute().unwrap().iter().collect();
assert_eq!(hr_employees.len(), 2);
```

## Example

Here follows an example demonstrating all the features of `VennDB`.