Releases · G-Research/spark-dgraph-connector
v0.4.2 (Spark 2.4) - 2020-07-28
Fixed
- Fixed dependency conflicts between the connector's dependencies and Spark.
v0.4.1 (Spark 3.0) - 2020-07-27
Added
- Add an example of how to load Dgraph data in PySpark.
Fixed
- Fixed dependency conflicts between the connector's dependencies and Spark.
v0.4.1 (Spark 2.4) - 2020-07-27
Added
- Add an example of how to load Dgraph data in PySpark; a Scala sketch of the same load follows below.
Fixed
- Fixed dependency conflicts between the connector's dependencies and Spark.
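For reference, a minimal Scala sketch of loading triples through Spark's generic DataFrame reader, which is also the route PySpark takes; the format name `uk.co.gresearch.spark.dgraph.triples` and the target `localhost:9080` are assumptions based on the connector's package naming, not taken from these notes:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder()
  .appName("dgraph-connector-example")
  .getOrCreate()

// Assumed format name and Dgraph Alpha target; adjust to your deployment.
val triples: DataFrame = spark.read
  .format("uk.co.gresearch.spark.dgraph.triples")
  .load("localhost:9080")

triples.show()
```

From PySpark, the same format string would be passed to `spark.read.format(...)`.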
v0.4.0 (Spark 3.0) - 2020-07-24
Added
- Add Spark filter pushdown and projection pushdown to improve efficiency when loading only subgraphs. Filters like `.where($"revenue".isNotNull)` and projections like `` .select($"subject", $"`dgraph.type`", $"revenue") `` will be pushed to Dgraph so that only the relevant graph data is read (issue #7); see the sketch after this list.
- Improve performance of `PredicatePartitioner` for multiple predicates per partition. Restores the default of 1000 predicates per partition from before 0.3.0 (issue #22).
- The `PredicatePartitioner` combined with `UidRangePartitioner` is now the default partitioner.
- Add stream-like reading of partitions from Dgraph. Partitions are split into smaller chunks, which lets Spark read Dgraph partitions of any size.
- Add Dgraph metrics to measure throughput, visible on the Spark UI Stages page and through `SparkListener`.
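A rough Scala sketch of the pushdown: the `.where` and `.select` calls mirror the examples above, while the format name and target are assumptions about a local deployment.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("pushdown-example").getOrCreate()
import spark.implicits._

// Assumed format name and Dgraph Alpha target.
val nodes = spark.read
  .format("uk.co.gresearch.spark.dgraph.nodes")
  .load("localhost:9080")

// Both the filter and the projection are pushed down to Dgraph, so only
// nodes with a revenue value, and only these three columns, are read.
nodes
  .where($"revenue".isNotNull)
  .select($"subject", $"`dgraph.type`", $"revenue")
  .show()
```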
Security
- Move Google Guava dependency version to 24.1.1-jre due to a known security vulnerability fixed in 24.1.1.
v0.4.0 (Spark 2.4) - 2020-07-24
Added
- Add Spark filter pushdown and projection pushdown to improve efficiency when loading only subgraphs. Filters like `.where($"revenue".isNotNull)` and projections like `` .select($"subject", $"`dgraph.type`", $"revenue") `` will be pushed to Dgraph so that only the relevant graph data is read (issue #7).
- Improve performance of `PredicatePartitioner` for multiple predicates per partition. Restores the default of 1000 predicates per partition from before 0.3.0 (issue #22).
- The `PredicatePartitioner` combined with `UidRangePartitioner` is now the default partitioner.
- Add stream-like reading of partitions from Dgraph. Partitions are split into smaller chunks, which lets Spark read Dgraph partitions of any size.
- Add Dgraph metrics to measure throughput, visible on the Spark UI Stages page and through `SparkListener`; see the listener sketch after this list.
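A sketch of picking these metrics up via a `SparkListener`; it assumes the connector's metrics surface as named task accumulators whose names contain "dgraph", which these notes do not confirm:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("dgraph-metrics").getOrCreate()

// Assumption: the connector's Dgraph metrics appear as named task accumulators.
spark.sparkContext.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    taskEnd.taskInfo.accumulables
      .filter(_.name.exists(_.toLowerCase.contains("dgraph")))
      .foreach(acc => println(s"${acc.name.get} = ${acc.value.getOrElse("-")}"))
  }
})
```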
Security
- Move Google Guava dependency version to 24.1.1-jre due to a known security vulnerability fixed in 24.1.1.
v0.3.0 (Spark 3.0) - 2020-06-22
Added
- Load data from Dgraph cluster as a GraphFrames `GraphFrame`.
- Use exact uid cardinality for uid range partitioning. Combined with predicate partitioning, large predicates get split into more partitions than small predicates (issue #2).
- Improve performance of `PredicatePartitioner` for a single predicate per partition (`dgraph.partitioner.predicate.predicatesPerPartition=1`). This becomes the new default for this partitioner; see the sketch after this list.
- Move to Spark 3.0.0 release (was 3.0.0-preview2).
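A sketch of tuning this setting: the option key is quoted from the note above, while passing it as a DataFrame reader option, the format name, and the target are assumptions.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("partitioner-example").getOrCreate()

// Option key taken from the release note above; format name and
// target ("localhost:9080") are assumptions about a local deployment.
val triples = spark.read
  .format("uk.co.gresearch.spark.dgraph.triples")
  .option("dgraph.partitioner.predicate.predicatesPerPartition", "1")
  .load("localhost:9080")

println(s"partitions: ${triples.rdd.getNumPartitions}")
```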
Fixed
- Dgraph groups with no predicates caused a `NullPointerException`.
- Predicate names need to be escaped in Dgraph queries.
v0.3.0 (Spark 2.4) - 2020-06-22
Added
- Use exact uid cardinality for uid range partitioning. Combined with predicate partitioning, large predicates get split into more partitions than small predicates (issue #2).
- Improve performance of `PredicatePartitioner` for a single predicate per partition (`dgraph.partitioner.predicate.predicatesPerPartition=1`). This becomes the new default for this partitioner.
- Move to Spark 2.4.6 release (was 2.4.5).
Fixed
- Dgraph groups with no predicates caused a `NullPointerException`.
- Predicate names need to be escaped in Dgraph queries.
v0.2.0 (Spark 2.4) - 2020-06-11
Added
- Load nodes from Dgraph cluster as wide nodes (fully typed property columns); see the sketch after this list.
- Added `dgraph.type` and `dgraph.graphql.schema` predicates to be loaded from Dgraph cluster.
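A sketch of reading wide nodes; the `dgraph.nodes.mode=wide` option key and value, the format name, and the target are all assumptions not confirmed by these notes:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("wide-nodes-example").getOrCreate()

// Assumed option key/value for wide mode, plus assumed format name and target.
val wideNodes = spark.read
  .format("uk.co.gresearch.spark.dgraph.nodes")
  .option("dgraph.nodes.mode", "wide")
  .load("localhost:9080")

// Expect one fully typed column per predicate, including dgraph.type.
wideNodes.printSchema()
```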
v0.2.0 (Spark 3.0) - 2020-06-11
Added
- Load nodes from Dgraph cluster as wide nodes (fully typed property columns).
- Added `dgraph.type` and `dgraph.graphql.schema` predicates to be loaded from Dgraph cluster.
v0.1.0 (Spark 3.0) - 2020-06-09
First release of the project
Added
- Load data from Dgraph cluster as triples (as strings or fully typed), edges, or node `DataFrame`s.
- Load data from Dgraph cluster as an Apache Spark GraphX `Graph`; see the sketch below.
- Partitioning by Dgraph Group, Alpha node, predicates, and uids.
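A sketch of the GraphX load; the import path and the `dgraph` reader method are assumptions about the connector's API, as is the target:

```scala
import org.apache.spark.sql.SparkSession
// Assumed import path for the connector's GraphX support.
import uk.co.gresearch.spark.dgraph.graphx._

val spark = SparkSession.builder().appName("dgraph-graphx").getOrCreate()

// Assumption: this package adds a `dgraph` method to DataFrameReader that
// returns a GraphX Graph; "localhost:9080" is a local Dgraph Alpha target.
val graph = spark.read.dgraph("localhost:9080")

println(s"vertices: ${graph.vertices.count()}, edges: ${graph.edges.count()}")
```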