Vector udf via regular rdd #1025

jdries · 2025-01-30T12:54:48Z

No description provided.

jdries · 2025-02-21T12:59:28Z

@soxofaan I plan on merging this next week, feel free to review.
I think the most important changes to the tests have now been reduced.

soxofaan · 2025-02-27T14:27:10Z

FYI: I just merged master into the feature branch to trigger another Jenkins build.

It's still failing though.

soxofaan

Some quick comments from an initial review.

soxofaan · 2025-02-27T14:29:47Z

openeogeotrellis/vectorcube.py

-        #pyspark.pandas.set_option('compute.max_rows', None)
+        id_index = columns.index("feature_index")
+
+        def mapTimeseriesRows(id_bands):


Suggested change

def mapTimeseriesRows(id_bands):

def map_timeseries_rows(id_bands):

Is it also possible to give id_bands a type annotation? It's not very clear what this variable currently is.
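A minimal sketch of what such an annotation could look like, assuming `id_bands` is the `(key, values)` pair that `groupByKey()` hands to the mapper, with the key being the feature index and the values being the rows of that feature. The `Row` alias, the unpacked names and the returned dict are hypothetical placeholders, not the PR's actual code.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Assumption: every row is a plain list of column values (the result of
# `df.rdd.map(list)`), and the key is an integer feature index.
Row = List[Any]


def map_timeseries_rows(id_bands: Tuple[int, Iterable[Row]]) -> Dict[str, Any]:
    # Unpacking the pair makes it explicit what the two elements are.
    feature_index, bands = id_bands
    rows = list(bands)  # materialize the grouped rows of this feature
    # Placeholder result: the real function would run the UDF on these rows.
    return {"feature_index": feature_index, "row_count": len(rows)}
```

Unpacking into two named variables at the top of the function also removes the need for `id_bands[1]` style indexing further down.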

soxofaan · 2025-02-27T14:30:04Z

openeogeotrellis/vectorcube.py


-        processed_df = csv_df.groupby("feature_index").apply(callback).reset_index()
+        csv_as_list = df.rdd.map(list).map(lambda x: (x[id_index],x)).groupByKey().map(mapTimeseriesRows)


Suggested change

csv_as_list = df.rdd.map(list).map(lambda x: (x[id_index],x)).groupByKey().map(mapTimeseriesRows)

csv_as_list = df.rdd.map(list).map(lambda x: (x[id_index],x)).groupByKey().map(map_timeseries_rows)

I'm still figuring this out, but csv_as_list seems a bit confusing: this isn't a list of CSV anymore, but an RDD of dicts, right?
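To make the shape of the data at each step easier to follow, here is a plain-Python stand-in for the key-by, group and map steps of that chain. It does not use Spark: `group_rows_by_key` is a hypothetical local helper that mimics `.map(lambda x: (x[id_index], x)).groupByKey()`, and the column names, sample rows and returned dict are made up for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Row = List[Any]


def group_rows_by_key(rows: Iterable[Row], id_index: int) -> List[Tuple[Any, List[Row]]]:
    """Local stand-in for keying each row by x[id_index] and grouping by key."""
    grouped: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        grouped[row[id_index]].append(row)
    return list(grouped.items())


def map_timeseries_rows(id_bands: Tuple[Any, Iterable[Row]]) -> Dict[str, Any]:
    # Placeholder mapper: the real one would apply the UDF to the rows.
    feature_index, bands = id_bands
    return {"feature_index": feature_index, "rows": list(bands)}


# Hypothetical columns and rows, standing in for `df.rdd.map(list)`.
columns = ["feature_index", "date", "band_0"]
id_index = columns.index("feature_index")
rows = [
    [0, "2025-01-01", 0.1],
    [1, "2025-01-01", 0.4],
    [0, "2025-01-02", 0.2],
]

# One result per feature, not one per CSV row.
per_feature = [map_timeseries_rows(kv) for kv in group_rows_by_key(rows, id_index)]
```

Under these assumptions the result holds one entry per feature, which is why a name describing per-feature results would probably read more clearly than `csv_as_list`.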

soxofaan · 2025-02-27T14:35:15Z

openeogeotrellis/vectorcube.py

            # TODO: also pass feature_index to udf?
-            processed = udf_function(feature_data)
+            bands = id_bands[1]
+            import pandas as pd


This local import shouldn't be necessary, right?

soxofaan · 2025-02-27T14:37:06Z

openeogeotrellis/vectorcube.py

-            feature_data.index = feature_data.index.strftime("%Y-%m-%dT%H:%M:%SZ")
+            #if "date" in feature_data.columns:
+            #    feature_data = feature_data.set_index("date")
+            #feature_data.index = feature_data.index.strftime("%Y-%m-%dT%H:%M:%SZ")


Can't we remove these lines of code instead of commenting them out? What is the use of keeping them?

jdries added 7 commits January 28, 2025 16:19

invoke vectorcube udf on executor

7b70d1a

#458

convert date to string to avoid error when using as column label after pivot #458

c435133

use plain RDD to apply vector udf, this avoid the need to know the returned schema up front #458

8308e33

use plain RDD to apply vector udf, this avoid the need to know the returned schema up front #458

3109b32

reset_index to be consistent with previous behaviour

d96f447

#458

unit test fixes

474bd52

#458

unit test fixes

ba40979

#458

jdries requested a review from soxofaan January 30, 2025 12:54

jdries added 5 commits February 4, 2025 21:15

geojson not declared as output format

b03fa7d

test if pyspark accumulators work to track orfeo errors

1eb7c11

remove print statement

c1e5614

reduce changes to test behaviour

98aa052

Merge branch 'master' into vector_udf

9b85271

Merge remote-tracking branch 'origin/master' into vector_udf

8ad729a

soxofaan reviewed Feb 27, 2025
