
Releases: G-Research/spark-dgraph-connector

v0.4.2 (Spark 2.4) - 2020-07-28

14 Jul 10:27

Fixed

  • Fixed dependency conflicts between connector dependencies and Spark.

v0.4.1 (Spark 3.0) - 2020-07-27

14 Jul 10:26

Added

  • Add an example of how to load Dgraph data in PySpark.
  • Fixed dependency conflicts between connector dependencies and Spark.

v0.4.1 (Spark 2.4) - 2020-07-27

14 Jul 10:26

Added

  • Add an example of how to load Dgraph data in PySpark.
  • Fixed dependency conflicts between connector dependencies and Spark.

v0.4.0 (Spark 3.0) - 2020-07-24

14 Jul 10:23

Added

  • Add Spark filter pushdown and projection pushdown to improve efficiency when loading only subgraphs.
    Filters like .where($"revenue".isNotNull) and projections like .select($"subject", $"`dgraph.type`", $"revenue")
    will be pushed to Dgraph and only the relevant graph data will be read (issue #7).
  • Improve performance of PredicatePartitioner for multiple predicates per partition.
    This restores the default of 1000 predicates per partition that applied before 0.3.0 (issue #22).
  • The PredicatePartitioner combined with UidRangePartitioner is the default partitioner now.
  • Add stream-like reading of partitions from Dgraph. Partitions are split into smaller chunks.
    This lets Spark read Dgraph partitions of any size.
  • Add Dgraph metrics to measure throughput, visible in Spark UI Stages page and through SparkListener.
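
To make the pushdown idea concrete, here is a minimal sketch in plain Python of how a not-null filter and a column projection could be folded into a single Dgraph query. The helper names `escape_predicate` and `build_query` are hypothetical, the mapping of the `subject` column to `uid` is an assumption, and the connector's real query generation may well differ.

```python
from typing import Sequence


def escape_predicate(name: str) -> str:
    """Wrap a predicate name in angle brackets so names like dgraph.type are safe."""
    return f"<{name}>"


def build_query(not_null: str, columns: Sequence[str]) -> str:
    """Build a query string for nodes that have `not_null`, returning only `columns`.

    Hypothetical helper: an illustration of pushdown, not the connector's code.
    """
    # Filter pushdown: a not-null condition on a column becomes has(<predicate>).
    root = f"has({escape_predicate(not_null)})"
    # Projection pushdown: request only the uid and the selected predicates.
    fields = " ".join(["uid"] + [escape_predicate(c) for c in columns])
    return f"{{ result (func: {root}) {{ {fields} }} }}"
```

Under these assumptions, `build_query("revenue", ["dgraph.type", "revenue"])` yields `{ result (func: has(<revenue>)) { uid <dgraph.type> <revenue> } }`, so nodes without a revenue and predicates outside the projection never leave Dgraph.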

Security

v0.4.0 (Spark 2.4) - 2020-07-24

14 Jul 10:23

Added

  • Add Spark filter pushdown and projection pushdown to improve efficiency when loading only subgraphs.
    Filters like .where($"revenue".isNotNull) and projections like .select($"subject", $"`dgraph.type`", $"revenue")
    will be pushed to Dgraph and only the relevant graph data will be read (issue #7).
  • Improve performance of PredicatePartitioner for multiple predicates per partition.
    This restores the default of 1000 predicates per partition that applied before 0.3.0 (issue #22).
  • The PredicatePartitioner combined with UidRangePartitioner is the default partitioner now.
  • Add stream-like reading of partitions from Dgraph. Partitions are split into smaller chunks.
    This lets Spark read Dgraph partitions of any size.
  • Add Dgraph metrics to measure throughput, visible in Spark UI Stages page and through SparkListener.
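
The stream-like reading of partitions can be pictured as cursor-based paging: each request asks for a bounded number of uids after the last one seen. The sketch below is plain Python with an in-memory list standing in for Dgraph. The names `fetch_chunk` and `read_partition` are hypothetical, and the connector's actual chunking logic is not shown in these notes.

```python
from typing import Iterable, Iterator, List


def fetch_chunk(store: Iterable[int], after: int, first: int) -> List[int]:
    """Stand-in for one request: up to `first` uids greater than `after`, ascending."""
    return sorted(u for u in store if u > after)[:first]


def read_partition(store: Iterable[int], chunk_size: int) -> Iterator[List[int]]:
    """Yield a partition chunk by chunk so only one chunk is held at a time."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    store = list(store)
    after = 0  # assumption: uids are positive, so 0 means "from the start"
    while True:
        chunk = fetch_chunk(store, after, chunk_size)
        if not chunk:
            return
        yield chunk
        after = chunk[-1]  # the cursor moves to the last uid seen
        if len(chunk) < chunk_size:
            return  # a short chunk means the partition is exhausted
```

With uids `[1, 2, 3, 5, 8, 13, 21]` and a chunk size of 3, this yields `[1, 2, 3]`, `[5, 8, 13]` and `[21]`. Memory use depends on the chunk size rather than the partition size.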

Security

v0.3.0 (Spark 3.0) - 2020-06-22

14 Jul 10:20

Added

  • Load data from Dgraph cluster as GraphFrames GraphFrame.
  • Use exact uid cardinality for uid range partitioning. Combined with predicate partitioning, large predicates get split into more partitions than small predicates (issue #2).
  • Improve performance of PredicatePartitioner for a single predicate per partition (dgraph.partitioner.predicate.predicatesPerPartition=1). This becomes the new default for this partitioner.
  • Move to Spark 3.0.0 release (was 3.0.0-preview2).
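
As a rough illustration of why exact uid cardinality matters, the following plain-Python sketch plans partitions with one predicate per partition and then splits each predicate by uid range. The function `plan_partitions` and the `uids_per_partition` parameter are hypothetical names for illustration, not the connector's API.

```python
from math import ceil
from typing import Dict, List, Tuple


def plan_partitions(cardinality: Dict[str, int],
                    uids_per_partition: int) -> List[Tuple[str, int, int]]:
    """Return (predicate, offset, size) tuples, one per planned partition.

    `cardinality` maps each predicate to the exact number of uids that have it.
    Hypothetical helper: a sketch of the idea, not the connector's planner.
    """
    if uids_per_partition <= 0:
        raise ValueError("uids_per_partition must be positive")
    partitions = []
    for predicate, uids in sorted(cardinality.items()):
        # Large predicates get more partitions; every predicate gets at least one.
        parts = max(1, ceil(uids / uids_per_partition))
        for i in range(parts):
            offset = i * uids_per_partition
            size = min(uids_per_partition, max(uids - offset, 0))
            partitions.append((predicate, offset, size))
    return partitions
```

For `{"name": 2500, "revenue": 300}` and 1000 uids per partition, the plan contains three partitions for `name` (sizes 1000, 1000 and 500) and one for `revenue` (size 300).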

Fixed

  • Dgraph groups with no predicates caused a NullPointerException.
  • Predicate names are now escaped in Dgraph queries.

v0.3.0 (Spark 2.4) - 2020-06-22

14 Jul 10:19

Added

  • Use exact uid cardinality for uid range partitioning. Combined with predicate partitioning, large predicates get split into more partitions than small predicates (issue #2).
  • Improve performance of PredicatePartitioner for a single predicate per partition (dgraph.partitioner.predicate.predicatesPerPartition=1). This becomes the new default for this partitioner.
  • Move to Spark 2.4.6 release (was 2.4.5).

Fixed

  • Dgraph groups with no predicates caused a NullPointerException.
  • Predicate names are now escaped in Dgraph queries.

v0.2.0 (Spark 2.4) - 2020-06-11

14 Jul 10:08

Added

  • Load nodes from Dgraph cluster as wide nodes (fully typed property columns).
  • Added dgraph.type and dgraph.graphql.schema predicates to be loaded from Dgraph cluster.
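
The "wide nodes" shape can be thought of as a pivot of (subject, predicate, value) triples into one row per node with one column per predicate. The sketch below shows that pivot in plain Python. The function `to_wide_nodes` is a hypothetical name, and it assumes single-valued predicates, so a later value overwrites an earlier one.

```python
from typing import Any, Dict, Iterable, List, Tuple

Triple = Tuple[int, str, Any]


def to_wide_nodes(triples: Iterable[Triple]) -> List[Dict[str, Any]]:
    """Pivot triples into one dict per subject, with None for missing predicates."""
    triples = list(triples)
    # Every predicate seen anywhere becomes a column on every row.
    predicates = sorted({predicate for _, predicate, _ in triples})
    rows: Dict[int, Dict[str, Any]] = {}
    for subject, predicate, value in triples:
        row = rows.setdefault(
            subject, {"subject": subject, **{p: None for p in predicates}}
        )
        row[predicate] = value  # assumption: one value per predicate
    return [rows[subject] for subject in sorted(rows)]
```

Given the triples `(1, "name", "Luke")`, `(1, "revenue", 7.5)` and `(2, "name", "Leia")`, the result has two rows. Node 2 gets `None` in its `revenue` column, much as a missing property would appear as a null in a typed property column.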

v0.2.0 (Spark 3.0) - 2020-06-11

14 Jul 10:14
Pre-release

Added

  • Load nodes from Dgraph cluster as wide nodes (fully typed property columns).
  • Added dgraph.type and dgraph.graphql.schema predicates to be loaded from Dgraph cluster.

v0.1.0 (Spark 3.0) - 2020-06-09

14 Jul 10:07
Pre-release

First release of the project

Added