Investigate possibility of PipelineDP API for Spark SQL #288
Labels
Type: New Feature ➕
Introduction of a completely new addition to the codebase
Type: Research 🔬
When further investigation into a subject is required
Context
PipelineDP supports anonymization with the Spark RDD API (example). It seems worthwhile to also support the Spark SQL API.
Goal
Investigate and design a Spark SQL API for PipelineDP.
Possible example of PipelineDP SparkSQL API:
Note: This task consists of researching possible options (both API and implementation design) and proposing something that is useful for users and can be implemented reasonably simply.
Additional information
On PipelineDP Architecture
DPEngine (code) is the class that implements the differentially private (DP) logic independently of the pipeline framework (running with Apache Spark, with Apache Beam, and without any framework is currently supported). DPEngine.aggregate() is the main method; it can perform any supported DP aggregation. Essentially, it is equivalent to running a SQL query in which the supported dp_aggregate_function values come from the metric list.
On implementation
The implementation will likely consist of parsing the SQL query and calling DPEngine.aggregate().
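To make the parsing step concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not a proposed design: the `DP_*` function names, the `ParsedDPQuery` type, the `parse_dp_query` helper, and the restricted query shape (`SELECT key, DP_FUNC(col), ... FROM table GROUP BY key`) are all hypothetical and not part of PipelineDP. The metric names are assumed to mirror entries in the metric list. The sketch stops at producing a framework-independent description of the query; mapping it to the parameters of DPEngine.aggregate() is left out.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical SQL function names for this sketch; not an existing PipelineDP API.
SUPPORTED_DP_FUNCTIONS = {
    "DP_COUNT", "DP_SUM", "DP_MEAN", "DP_PRIVACY_ID_COUNT"
}


@dataclass
class ParsedDPQuery:
    """Framework-independent description of a restricted DP SQL query."""
    table: str
    partition_key: str
    aggregations: List[Tuple[str, str]]  # (metric name, value column)


# Only one query shape is accepted: SELECT ... FROM table GROUP BY key.
_QUERY_RE = re.compile(
    r"^\s*SELECT\s+(?P<select>.+?)\s+FROM\s+(?P<table>\w+)"
    r"\s+GROUP\s+BY\s+(?P<key>\w+)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)
# A single-argument function call such as DP_SUM(spent).
_CALL_RE = re.compile(r"^(?P<func>\w+)\s*\(\s*(?P<col>\w+|\*)\s*\)$")


def parse_dp_query(sql: str) -> ParsedDPQuery:
    """Parses the restricted query shape and validates the DP functions."""
    match = _QUERY_RE.match(sql)
    if match is None:
        raise ValueError("only SELECT ... FROM ... GROUP BY ... is supported")
    key = match.group("key")
    aggregations = []
    for item in (part.strip() for part in match.group("select").split(",")):
        if item.lower() == key.lower():
            continue  # the partition key column itself
        call = _CALL_RE.match(item)
        if call is None or call.group("func").upper() not in SUPPORTED_DP_FUNCTIONS:
            raise ValueError(f"unsupported select item: {item!r}")
        # Strip the "DP_" prefix to obtain the metric name, e.g. "SUM".
        metric = call.group("func").upper()[len("DP_"):]
        aggregations.append((metric, call.group("col")))
    if not aggregations:
        raise ValueError("at least one DP aggregation is required")
    return ParsedDPQuery(match.group("table"), key, aggregations)


# Example with hypothetical table and column names.
query = parse_dp_query(
    "SELECT day, DP_COUNT(visitor_id), DP_SUM(spent) FROM visits GROUP BY day"
)
print(query)
```

In a real implementation the hand-written regular expressions would probably give way to a proper SQL parser or to Spark's own query plan, and the parsed metrics would be translated into the arguments expected by DPEngine.aggregate(). Privacy-specific inputs such as the privacy ID column and contribution bounds have no obvious place in standard SQL, so how to express them is one of the design questions this investigation would need to answer.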
Open questions from Spark