From 23617f064055889ed79da581c23b5c8b9c833b1b Mon Sep 17 00:00:00 2001
From: vibhatha
Date: Thu, 21 Sep 2023 08:09:30 +0530
Subject: [PATCH] fix: address reviews v2

---
 java/source/c_data.rst      | 33 ++++++++++++++-----
 java/source/python_java.rst | 66 +++++++++++++++++++++++--------------
 2 files changed, 66 insertions(+), 33 deletions(-)

diff --git a/java/source/c_data.rst b/java/source/c_data.rst
index 2ccd84d6..851e8dec 100644
--- a/java/source/c_data.rst
+++ b/java/source/c_data.rst
@@ -1,12 +1,29 @@
-.. _c-data-java:
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
 
-==================
-C Data Integration
-==================
+.. http://www.apache.org/licenses/LICENSE-2.0
 
-C Data interface is an important aspect of supporting multiple languages in Apache Arrow.
-A Java programme can seamlessly work with C++ and Python programmes. The following examples
-demonstrates how it can be done.
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
 
-:ref:`arrow-python-java`
+.. _c-data:
+
+================
+C Data Interface
+================
+
+The Arrow C Data Interface enables zero-copy sharing of Arrow data between language
+runtimes. A Java programme can seamlessly work with C++ and Python programs.
+The following examples demonstrates how it can be done.
+
+:ref:`Python Java `
 ------------------------
diff --git a/java/source/python_java.rst b/java/source/python_java.rst
index b8f6cdce..b6794641 100644
--- a/java/source/python_java.rst
+++ b/java/source/python_java.rst
@@ -1,3 +1,20 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
 .. _arrow-python-java:
 
 ========================
@@ -12,21 +29,28 @@ This document provides a guide on how to enable seamless data exchange between P
 
 Dictionary Data Roundtrip
 =========================
 
-    This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java,
-    and finally re-accessed and validated in Python for data consistency.
+This section demonstrates a data roundtrip where C Data interface is being used to provide
+the seamless access to data across language boundaries.
+
+
+Python Component
+----------------
+In the Python-based component, the data roundtrip process is demonstrated through a sequential workflow.
 
-Python Component:
------------------
+1. Create data in Python
+2. Export data to Java
+3. Import updated data from Java
+4. Validate the data consistency
 
-    The Python code uses jpype to start the JVM and make the Java class MapValuesConsumer available to Python.
-    Data is generated in PyArrow and exported through C Data to Java.
+The Python code uses `jpype `_ to start the JVM and make the Java class MapValuesConsumer available to Python.
+Data is generated in PyArrow and exported through C Data to Java.
 
 .. code-block:: python
 
     import jpype
     import jpype.imports
-    from jpype.types import *
+    from jpype.types import JClass
     import pyarrow as pa
     from pyarrow.cffi import ffi as arrow_c
 
@@ -43,7 +67,7 @@ Python Component:
     dictionary = pa.dictionary(pa.int64(), pa.utf8())
     array = pa.array(["A", "B", "C", "A", "D"], dictionary)
     print("From Python")
-    print("Dictionary Created: ", array)
+    print("Dictionary Created:", array)
 
     # create the CDataDictionaryProvider instance which is
     # required to create dictionary array precisely
@@ -62,7 +86,7 @@ Python Component:
     array.type._export_to_c(c_schema_ptr)
 
     # Send Array and its Schema to the Java function
-    # that will update the dictionary
+    # update values in Java
     consumer.update(c_array_ptr, c_schema_ptr)
 
     # Importing updated values from Java to Python
@@ -92,7 +116,7 @@ Python Component:
     # In Java and Python, the same memory is being accessed through the C Data interface.
     # Since the array from Java and array created in Python should have same data.
     assert updated_array.equals(array)
-    print("Updated Array: ", updated_array)
+    print("Updated Array:", updated_array)
 
     del updated_array
 
@@ -134,19 +158,17 @@ Python Component:
         3
       ]
 
-In the Python component, the following steps are executed to demonstrate the data roundtrip:
 
-1. Create data in Python
-2. Export data to Java
-3. Import updated data from Java
-4. Validate the data consistency
+Java Component
+--------------
 
+In the Java-based component of the system, the following operations are executed:
 
-Java Component:
----------------
+1. Receives data from the Python component.
+2. Updates the data.
+3. Exports the updated data back to Python.
 
-    In the Java component, the MapValuesConsumer class receives data from the Python component through C Data.
-    It then updates the data and sends it back to the Python component.
+MapValuesConsumer class uses C Data interface to access the data created in Python.
 
 .. testcode::
 
@@ -251,11 +273,5 @@ Java Component:
 
     [2, 7, 93]
 
-The Java component performs the following actions:
-
-1. Receives data from the Python component.
-2. Updates the data.
-3. Exports the updated data back to Python.
-
 By integrating PyArrow in Python and Java components, this example demonstrates that
 a system can be created where data is shared and updated across both languages seamlessly.
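
The roundtrip in this patch hinges on dictionary encoding: the array `["A", "B", "C", "A", "D"]` is stored as a list of unique values plus integer indices into that list. The sketch below illustrates that layout in plain Python, so it runs without PyArrow or a JVM. The helper names `dictionary_encode` and `dictionary_decode` are hypothetical and are not part of PyArrow or of this patch; the code illustrates the idea only and makes no claim about how Arrow implements it.

```python
def dictionary_encode(values):
    """Return (dictionary, indices) for a list of hashable values.

    Hypothetical helper for illustration; not a PyArrow API.
    """
    dictionary = []  # unique values, in first-seen order
    positions = {}   # value -> its index in `dictionary`
    indices = []     # one index per input value
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original values from the dictionary and the indices."""
    return [dictionary[i] for i in indices]


# Same input values as the array created in the Python component.
values = ["A", "B", "C", "A", "D"]
dictionary, indices = dictionary_encode(values)
roundtrip = dictionary_decode(dictionary, indices)

print("dictionary:", dictionary)
print("indices:", indices)
print("roundtrip matches:", roundtrip == values)
```

The repeated `"A"` maps to the same index both times, which is why a consumer on the other side of the C Data interface needs the dictionary as well as the indices to interpret the array.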