v0.4.0 (Spark 2.4) - 2020-07-24
Added
- Add Spark filter pushdown and projection pushdown to improve efficiency when loading only subgraphs. Filters like `.where($"revenue".isNotNull)` and projections like ``.select($"subject", $"`dgraph.type`", $"revenue")`` will be pushed to Dgraph and only the relevant graph data will be read (issue #7).
- Improve performance of `PredicatePartitioner` for multiple predicates per partition. Restore the default number of predicates per partition of `1000` from before 0.3.0 (issue #22).
- The `PredicatePartitioner` combined with `UidRangePartitioner` is now the default partitioner.
- Add stream-like reading of partitions from Dgraph. Partitions are split into smaller chunks, which lets Spark read Dgraph partitions of any size.
- Add Dgraph metrics to measure throughput, visible in the Spark UI Stages page and through `SparkListener`.
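
The filter and projection pushdown described above can be sketched as follows. This is a minimal illustration, not the project's reference usage: the data source name `uk.co.gresearch.spark.dgraph.nodes`, the `dgraph.nodes.mode` option, the `localhost:9080` target, and the presence of a `revenue` predicate are assumptions to check against the connector's README. The object and method names (`PushdownSketch`, `revenueSubgraph`) are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PushdownSketch {

  // Keep only nodes that have a revenue value, and only three columns.
  // These are ordinary Spark calls; whether they are pushed down depends
  // on the data source backing the DataFrame.
  def revenueSubgraph(nodes: DataFrame): DataFrame = {
    import nodes.sparkSession.implicits._
    nodes
      .where($"revenue".isNotNull)
      .select($"subject", $"`dgraph.type`", $"revenue")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pushdown-sketch")
      .master("local[*]")
      .getOrCreate()

    // Assumption: a Dgraph alpha node is reachable at localhost:9080 and the
    // source name and option below match the installed connector version.
    val nodes = spark.read
      .format("uk.co.gresearch.spark.dgraph.nodes")
      .option("dgraph.nodes.mode", "wide") // one column per predicate (assumed option)
      .load("localhost:9080")

    val subgraph = revenueSubgraph(nodes)

    // Print the physical plan to inspect what was handed to the data source.
    subgraph.explain()
    subgraph.show(10, truncate = false)

    spark.stop()
  }
}
```

Inspecting the plan printed by `explain()` is one way to check whether the filter and the selected columns reached the source rather than being applied by Spark after a full read.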
Security
- Move Google Guava dependency version to 24.1.1-jre due to a known security vulnerability fixed in 24.1.1.