v0.2: optional filters #2

Merged
merged 7 commits into from
Apr 15, 2024
21 changes: 21 additions & 0 deletions CHANGELOG.md
Expand Up @@ -5,6 +5,27 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

# 0.2.0 (2024-04-12)

Breaking Changes:

* Support `Option<T>` in a special way:
  * for filters it means that both the positive and negative bits will be set to false if the value is `None`;
  * for filter maps it means that the filter is not even registered;
  * keys cannot be optional;
  * while technically this is a breaking change, it is not expected to actually break anyone,
    as keys always had to be unique already and two `None` values would result in the same hash, so it is unlikely
    that an `Option<T>` was already used by someone;
  * this is potentially breaking as some implementations from `0.1*` might have already used `Option` in a different way;

While this changes behaviour of `filters` and `filter maps` it is unlikely that someone was already using
`Option<T>` for these types before, as their ergonomics have been a bit weird prior to this version.
Even more so for `filter maps` it could have resulted in panics.
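
The semantics described above can be illustrated with a minimal, standalone sketch. It does not use the code generated by `venndb`: the `BoolFilter` and `FilterMap` types below are hypothetical stand-ins, and plain `Vec<bool>` and `HashMap` values are assumed in place of the real bit vectors. It only shows the intended behaviour: a `None` filter sets neither bit, and a `None` filter map value is never registered.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one boolean filter column:
/// one positive and one negative bit per row.
#[derive(Default)]
struct BoolFilter {
    positive: Vec<bool>,
    negative: Vec<bool>,
}

impl BoolFilter {
    /// Append a row. `None` sets neither bit, so the row
    /// matches neither `true` nor `false` queries.
    fn push(&mut self, value: Option<bool>) {
        self.positive.push(value == Some(true));
        self.negative.push(value == Some(false));
    }

    /// Indices of all rows matching the wanted value.
    fn matching_rows(&self, wanted: bool) -> Vec<usize> {
        let bits = if wanted { &self.positive } else { &self.negative };
        bits.iter()
            .enumerate()
            .filter_map(|(row, set)| set.then_some(row))
            .collect()
    }
}

/// Hypothetical stand-in for a filter map: value -> row indices.
#[derive(Default)]
struct FilterMap {
    rows_by_value: HashMap<String, Vec<usize>>,
}

impl FilterMap {
    /// Register a row. A `None` value is simply skipped.
    fn register(&mut self, row: usize, value: Option<&str>) {
        if let Some(value) = value {
            self.rows_by_value
                .entry(value.to_owned())
                .or_default()
                .push(row);
        }
    }

    /// Indices of all rows registered under the wanted value.
    fn matching_rows(&self, wanted: &str) -> Vec<usize> {
        self.rows_by_value.get(wanted).cloned().unwrap_or_default()
    }
}

fn main() {
    let rows = [(Some(true), Some("USA")), (None, None), (Some(false), Some("BE"))];
    let mut is_active = BoolFilter::default();
    let mut country = FilterMap::default();
    for (row, &(active, land)) in rows.iter().enumerate() {
        is_active.push(active);
        country.register(row, land);
    }
    // Row 1 (`None`, `None`) is found by none of these queries.
    assert_eq!(is_active.matching_rows(true), vec![0]);
    assert_eq!(is_active.matching_rows(false), vec![2]);
    assert_eq!(country.matching_rows("USA"), vec![0]);
}
```

Note that in this sketch `None` is not treated as `false`: a query for `false` only returns rows that explicitly carry `Some(false)`.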

Non-Breaking Changes:

* improve documentation;

# 0.1.1 (2024-04-10)

Non-Breaking Changes:
Expand Down
4 changes: 2 additions & 2 deletions Cargo.toml
Expand Up @@ -12,7 +12,7 @@ repository = "https://github.com/plabayo/venndb"
keywords = ["database", "db", "memory", "bits"]
categories = ["database", "db"]
authors = ["Glen De Cauwsemaecker <[email protected]>"]
version = "0.1.1"
version = "0.2.0"
rust-version = "1.75.0"

[package.metadata.docs.rs]
Expand All @@ -23,7 +23,7 @@ rustdoc-args = ["--cfg", "docsrs"]
bitvec = "1.0.1"
hashbrown = "0.14.3"
rand = "0.8.5"
venndb-macros = { version = "0.1.0", path = "venndb-macros" }
venndb-macros = { version = "0.2.0", path = "venndb-macros" }

[dev-dependencies]
divan = "0.1.14"
Expand Down
128 changes: 101 additions & 27 deletions README.md
Expand Up @@ -5,8 +5,8 @@ This database is designed for a very specific use case where you have mostly sta
like these can be large and should be both fast and compact.

For the limited usecases where `venndb` can be applied to,
ithas less dependencies and is faster then traditional choices,
such as a naive implementation or a more heavy lifted dependency then _Sqlite_.
it has fewer dependencies and is faster than traditional choices,
such as a naive implementation or a heavier dependency such as _Sqlite_.

> See [the benchmarks](#benchmarks) for more information on this topic.

Expand Down Expand Up @@ -90,12 +90,14 @@ pub struct Employee {
#[venndb(key)]
id: u32,
name: String,
is_manager: bool,
is_manager: Option<bool>,
is_admin: bool,
#[venndb(skip)]
foo: bool,
#[venndb(filter)]
department: Department,
#[venndb(filter)]
country: Option<String>,
}

fn main() {
Expand Down Expand Up @@ -224,6 +226,16 @@ That said, we do recommend that you use `enum` values if you can, or some other

Using for example a `String` directly is a bad idea as that would mean that `bE` != `Be` != `BE` != `Belgium` != `Belgique` != `België`. Even though these are really referring all to the same country. In such cases a much better idea is to at the very least create a wrapper type such as `struct Country(String)`, to allow you to enforce sanitization/validation when creating the value and ensuring the hashes will be the same for those values that are conceptually the same.

> ❓ Why do keys have to be unique and non-optional?

Within `venndb`, keys are meant for looking up
a row that was previously received via filters.

As such it makes no sense for such keys to be:

- duplicate: as that could result in multiple rows or the wrong row being returned;
- optional: as that would mean the row cannot be looked up when the key is not defined;
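
The reasoning above can be sketched with a plain `HashMap`. This is only an illustration under assumptions: `KeyIndex` and `DuplicateKey` are hypothetical names, not part of the `venndb` API, and the real generated code may be structured differently.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical error returned when a key is already taken.
#[derive(Debug, PartialEq)]
struct DuplicateKey(u32);

/// Hypothetical key index: maps a (non-optional) key to exactly one row.
#[derive(Default)]
struct KeyIndex {
    row_by_key: HashMap<u32, usize>,
}

impl KeyIndex {
    /// Register `row` under `key`, refusing duplicates so that a
    /// lookup can never return the wrong row.
    fn insert(&mut self, key: u32, row: usize) -> Result<(), DuplicateKey> {
        match self.row_by_key.entry(key) {
            Entry::Occupied(_) => Err(DuplicateKey(key)),
            Entry::Vacant(slot) => {
                slot.insert(row);
                Ok(())
            }
        }
    }

    /// Look up the row index for `key`, if any.
    fn get(&self, key: u32) -> Option<usize> {
        self.row_by_key.get(&key).copied()
    }
}

fn main() {
    let mut index = KeyIndex::default();
    assert!(index.insert(8, 0).is_ok());
    // A second row with the same key is rejected.
    assert_eq!(index.insert(8, 1), Err(DuplicateKey(8)));
    // The key still resolves to the one row it was registered for.
    assert_eq!(index.get(8), Some(0));
}
```

Because the key type here is `u32` rather than `Option<u32>`, every row is guaranteed to be reachable through its key.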

## Example

Here follows an example demonstrating all the features of `VennDB`.
Expand Down Expand Up @@ -254,7 +266,10 @@ pub struct Employee {
name: String,
is_manager: bool,
is_admin: bool,
is_active: bool,
// filter (booleans) can be made optional,
// meaning that the row will not be able to be filtered (found)
// on this column when the row has a `None` value for it
is_active: Option<bool>,
// booleans are automatically turned into (query) filters,
// use the `skip` arg to stop this. As such it is only really needed for
// bool properties :)
Expand All @@ -272,18 +287,24 @@ pub struct Employee {
// trying to do so will result in a compile-time failure.
#[venndb(filter)]
department: Department,
// similar to regular bool filters,
// filter maps can also be optional.
// When a filter map is optional and the row's property for that filter is None,
// it will not be registered and thus cannot be filtered (found) on that property
#[venndb(filter)]
country: Option<String>,
}

fn main() {
let db = EmployeeInMemDB::from_iter([
RawCsvRow("1,John Doe,true,false,true,false,Engineering"),
RawCsvRow("2,Jane Doe,false,true,true,true,Sales"),
RawCsvRow("3,John Smith,false,false,true,false,Marketing"),
RawCsvRow("4,Jane Smith,true,true,false,true,HR"),
RawCsvRow("5,John Johnson,true,true,true,true,Engineering"),
RawCsvRow("6,Jane Johnson,false,false,false,false,Sales"),
RawCsvRow("7,John Brown,true,false,true,false,Marketing"),
RawCsvRow("8,Jane Brown,false,true,true,true,HR"),
RawCsvRow("1,John Doe,true,false,true,false,Engineering,USA"),
RawCsvRow("2,Jane Doe,false,true,true,true,Sales,"),
RawCsvRow("3,John Smith,false,false,,false,Marketing,"),
RawCsvRow("4,Jane Smith,true,true,false,true,HR,"),
RawCsvRow("5,John Johnson,true,true,true,true,Engineering,"),
RawCsvRow("6,Jane Johnson,false,false,,false,Sales,BE"),
RawCsvRow("7,John Brown,true,false,true,false,Marketing,BE"),
RawCsvRow("8,Jane Brown,false,true,true,true,HR,BR"),
])
.expect("MemDB created without errors (e.g. no duplicate keys)");

Expand Down Expand Up @@ -337,9 +358,22 @@ fn main() {
.any();
assert!(manager.id == 1 || manager.id == 5);

println!(">>> Optional bool filters have three possible values, where None != false. An important distinction to make...");
let mut query = db.query();
query.is_active(false);
let inactive_employees: Vec<_> = query
.execute()
.expect("to have found at least one")
.iter()
.collect();
assert_eq!(inactive_employees.len(), 1);
assert_eq!(inactive_employees[0].id, 4);

println!(">>> If you want you can also get the Employees back as a Vec, dropping the DB data all together...");
let employees = db.into_rows();
assert_eq!(employees.len(), 8);
assert!(employees[1].foo);
println!("All employees: {:?}", employees);

println!(">>> You can also get the DB back from the Vec, if you want start to query again...");
// of course better to just keep it as a DB to begin with, but let's pretend this is ok in this example
Expand All @@ -358,13 +392,23 @@ fn main() {
assert_eq!(sales_employees.len(), 1);
assert_eq!(sales_employees[0].name, "Jane Doe");

println!(">>> Filter maps that are optional work as well, e.g. you can query for all employees from USA...");
query.reset().country("USA".to_owned());
let usa_employees: Vec<_> = query
.execute()
.expect("to have found at least one")
.iter()
.collect();
assert_eq!(usa_employees.len(), 1);
assert_eq!(usa_employees[0].id, 1);

println!(">>> At any time you can also append new employees to the DB...");
assert!(db
.append(RawCsvRow("8,John Doe,true,false,true,false,Engineering"))
.append(RawCsvRow("8,John Doe,true,false,true,false,Engineering,"))
.is_err());
println!(">>> This will fail however if a property is not correct (e.g. ID (key) is not unique in this case), let's try this again...");
assert!(db
.append(RawCsvRow("9,John Doe,false,true,true,false,Engineering"))
.append(RawCsvRow("9,John Doe,false,true,true,false,Engineering,"))
.is_ok());
assert_eq!(db.len(), 9);

Expand All @@ -381,9 +425,10 @@ fn main() {

println!(">>> You can also extend it using an IntoIterator...");
db.extend([
RawCsvRow("10,Glenn Doe,false,true,true,true,Engineering"),
RawCsvRow("11,Peter Miss,true,true,true,true,HR"),
]).unwrap();
RawCsvRow("10,Glenn Doe,false,true,true,true,Engineering,"),
RawCsvRow("11,Peter Miss,true,true,true,true,HR,USA"),
])
.unwrap();
let mut query = db.query();
query
.department(Department::HR)
Expand All @@ -398,6 +443,19 @@ fn main() {
assert_eq!(employees.len(), 1);
assert_eq!(employees[0].id, 11);

println!(">>> There are now 2 employees from USA...");
query.reset().country("USA".to_owned());
let employees: Vec<_> = query
.execute()
.expect("to have found at least one")
.iter()
.collect();
assert_eq!(employees.len(), 2);
assert_eq!(
employees.iter().map(|e| e.id).sorted().collect::<Vec<_>>(),
[1, 11]
);

println!(">>> All previous data is still there as well of course...");
query
.reset()
Expand Down Expand Up @@ -425,14 +483,29 @@ where
{
fn from(RawCsvRow(s): RawCsvRow<S>) -> Employee {
let mut parts = s.as_ref().split(',');
let id = parts.next().unwrap().parse().unwrap();
let name = parts.next().unwrap().to_string();
let is_manager = parts.next().unwrap().parse().unwrap();
let is_admin = parts.next().unwrap().parse().unwrap();
let is_active = match parts.next().unwrap() {
"" => None,
s => Some(s.parse().unwrap()),
};
let foo = parts.next().unwrap().parse().unwrap();
let department = parts.next().unwrap().parse().unwrap();
let country = match parts.next().unwrap() {
"" => None,
s => Some(s.to_string()),
};
Employee {
id: parts.next().unwrap().parse().unwrap(),
name: parts.next().unwrap().to_string(),
is_manager: parts.next().unwrap().parse().unwrap(),
is_admin: parts.next().unwrap().parse().unwrap(),
is_active: parts.next().unwrap().parse().unwrap(),
foo: parts.next().unwrap().parse().unwrap(),
department: parts.next().unwrap().parse().unwrap(),
id,
name,
is_manager,
is_admin,
is_active,
foo,
department,
country,
}
}
}
Expand Down Expand Up @@ -473,11 +546,12 @@ pub struct Employee {
name: String,
is_manager: bool,
is_admin: bool,
is_active: bool,
is_active: Option<bool>,
#[venndb(skip)]
foo: bool,
#[venndb(filter)]
department: Department,
country: Option<String>,
}
```

Expand Down Expand Up @@ -519,8 +593,8 @@ Query (e.g. `EmployeeInMemDBQuery`)
| - | - |
| `EmployeeInMemDBQuery::reset(&mut self) -> &mut Self` | reset the query, bringing it back to the clean state it has on creation |
| `EmployeeInMemDBQuery::execute(&self) -> Option<EmployeeInMemDBQueryResult<'a>>` | return the result of the query using the set filters. It will be `None` in case no rows matched the defined filters. Or put otherwise, the result will contain at least one row when `Some(_)` is returned. |
| `EmployeeInMemDBQuery::is_manager(&mut self, value: bool) -> &mut Self` | a filter setter for a `bool` filter. One such method per `bool` filter (that isn't `skip`ped) will be available. E.g. if you have ` foo` filter then there will be a `EmployeeInMemDBQuery:foo` method. |
| `EmployeeInMemDBQuery::department(&mut self, value: Department) -> &mut Self` | a filter (map) setter for a non-`bool` filter. One such method per non-`bool` filter will be available. You can also `skip` these, but that's of course a bit pointless. The type will be equal to the actual field type. And the name will once again be equal to the original field name. |
| `EmployeeInMemDBQuery::is_manager(&mut self, value: bool) -> &mut Self` | a filter setter for a `bool` filter. One such method per `bool` filter (that isn't `skip`ped) will be available. E.g. if you have a `foo` filter then there will be an `EmployeeInMemDBQuery::foo` method. For _bool_ filters that are optional (`Option<bool>`) this method is generated just the same. |
| `EmployeeInMemDBQuery::department(&mut self, value: Department) -> &mut Self` | a filter (map) setter for a non-`bool` filter. One such method per non-`bool` filter will be available. You can also `skip` these, but that's of course a bit pointless. The type will be equal to the actual field type. And the name will once again be equal to the original field name. Filter maps that have an `Option<T>` type have exactly the same signature. |

Query Result (e.g. `EmployeeInMemDBQueryResult`)

Expand Down
9 changes: 9 additions & 0 deletions fuzz/fuzz_targets/fuzz_employee_db.rs
Expand Up @@ -10,8 +10,11 @@ pub struct Employee {
id: u16,
_name: String,
earth: bool,
alive: Option<bool>,
#[venndb(filter)]
faction: Faction,
#[venndb(filter)]
planet: Option<Planet>,
}

#[derive(Clone, Debug, Arbitrary, PartialEq, Eq, Hash)]
Expand All @@ -20,6 +23,12 @@ pub enum Faction {
Empire,
}

#[derive(Clone, Debug, Arbitrary, PartialEq, Eq, Hash)]
pub enum Planet {
Earth,
Mars,
}

fuzz_target!(|rows: Vec<Employee>| {
let _ = EmployeeDB::from_rows(rows);
});
2 changes: 1 addition & 1 deletion venndb-macros/Cargo.toml
Expand Up @@ -9,7 +9,7 @@ repository = "https://github.com/plabayo/venndb"
keywords = ["database", "db", "memory", "bits"]
categories = ["database", "db"]
authors = ["Glen De Cauwsemaecker <[email protected]>"]
version = "0.1.1"
version = "0.2.0"
rust-version = "1.75.0"

[package.metadata.docs.rs]
Expand Down
16 changes: 11 additions & 5 deletions venndb-macros/src/field.rs
Expand Up @@ -13,7 +13,7 @@ pub struct StructField<'a> {
/// The original parsed field
field: &'a syn::Field,
/// The parsed attributes of the field
attrs: FieldAttrs,
attrs: FieldAttrs<'a>,
/// The field name. This is contained optionally inside `field`,
/// but is duplicated non-optionally here to indicate that all field that
/// have reached this point must have a field name, and it no longer
Expand Down Expand Up @@ -52,6 +52,7 @@ impl<'a> KeyField<'a> {

pub struct FilterField<'a> {
pub name: &'a Ident,
pub optional: bool,
}

impl<'a> FilterField<'a> {
Expand All @@ -71,7 +72,7 @@ impl<'a> FilterField<'a> {
impl<'a> StructField<'a> {
/// Attempts to parse a field of a `#[derive(VennDB)]` struct, pulling out the
/// fields required for code generation.
pub fn new(_errors: &Errors, field: &'a syn::Field, attrs: FieldAttrs) -> Option<Self> {
pub fn new(_errors: &Errors, field: &'a syn::Field, attrs: FieldAttrs<'a>) -> Option<Self> {
let name = field.ident.as_ref().expect("missing ident for named field");
Some(StructField { field, attrs, name })
}
Expand All @@ -81,12 +82,16 @@ impl<'a> StructField<'a> {
self.attrs.kind.as_ref().map(|kind| match kind {
FieldKind::Key => FieldInfo::Key(KeyField {
name: self.name,
ty: &self.field.ty,
ty: self.attrs.option_ty.unwrap_or(&self.field.ty),
}),
FieldKind::Filter => FieldInfo::Filter(FilterField {
name: self.name,
optional: self.attrs.option_ty.is_some(),
}),
FieldKind::Filter => FieldInfo::Filter(FilterField { name: self.name }),
FieldKind::FilterMap => FieldInfo::FilterMap(FilterMapField {
name: self.name,
ty: &self.field.ty,
ty: self.attrs.option_ty.unwrap_or(&self.field.ty),
optional: self.attrs.option_ty.is_some(),
}),
})
}
Expand All @@ -95,6 +100,7 @@ impl<'a> StructField<'a> {
pub struct FilterMapField<'a> {
pub name: &'a Ident,
pub ty: &'a syn::Type,
pub optional: bool,
}

impl<'a> FilterMapField<'a> {
Expand Down
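
The `option_ty.unwrap_or(&self.field.ty)` pattern in the `field.rs` changes above can be sketched in isolation. The example below is a simplified model, not the macro's actual code: `Ty` is a hypothetical stand-in for `syn::Type`, and `inner_option_ty`, `FilterMapInfo` and `filter_map_info` are illustrative names. It shows how an optional field may be reduced to its inner type while remembering that it was optional.

```rust
/// Hypothetical, heavily simplified stand-in for `syn::Type`.
#[derive(Debug, PartialEq)]
enum Ty {
    /// A plain named type such as `Department` or `String`.
    Path(&'static str),
    /// An `Option<T>` wrapping another type.
    Option(Box<Ty>),
}

/// Return the inner type if `ty` is an `Option<T>`, else `None`.
fn inner_option_ty(ty: &Ty) -> Option<&Ty> {
    match ty {
        Ty::Option(inner) => Some(inner.as_ref()),
        Ty::Path(_) => None,
    }
}

/// Illustrative counterpart of the `FilterMapField` data in the diff.
struct FilterMapInfo<'a> {
    ty: &'a Ty,
    optional: bool,
}

/// Use the inner type when the field is optional, the field type otherwise.
fn filter_map_info(field_ty: &Ty) -> FilterMapInfo<'_> {
    let option_ty = inner_option_ty(field_ty);
    FilterMapInfo {
        ty: option_ty.unwrap_or(field_ty),
        optional: option_ty.is_some(),
    }
}

fn main() {
    let country = Ty::Option(Box::new(Ty::Path("String")));
    let info = filter_map_info(&country);
    // The filter map is keyed on `String`, not on `Option<String>`.
    assert_eq!(info.ty, &Ty::Path("String"));
    assert!(info.optional);

    let department = Ty::Path("Department");
    let info = filter_map_info(&department);
    assert_eq!(info.ty, &Ty::Path("Department"));
    assert!(!info.optional);
}
```

Keeping the inner type would explain why the generated query setters for `Option<T>` filter maps can take a `T` directly, as the README table in this pull request states.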