[BUG] - ModuleNotFoundError when calling a function that uses a User Defined Function (UDF) #975
Comments
I have the same issue. Maybe to add to this, it fails with the decorator form:

    @udf
    def str_hex_to_numeric(hex_value: str, data_type_name: str) -> float:
        ...

or with a wrapper function:

    def udf_wrapper():
        def str_hex_to_numeric(hex_value: str, data_type_name: str) -> float:
            ...
        return udf(str_hex_to_numeric, FloatType())

What also doesn't work is referencing things from outside the function's scope, constants for example.
I have the same issue. I found it with this use case:

    df = df.withColumn('result', my_udf(col('some_data')))

where my_udf is defined in a helper module. The only solution I've found so far is to package the helper module as a wheel, install the wheel on the cluster, and then run my notebook from the Databricks workspace rather than from VS Code.
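One plausible explanation for why installing the wheel helps: pickling a function that lives in an importable module usually stores only a reference (module name plus function name), so whoever unpickles it must be able to import that module. The sketch below simulates this with the standard library's pickle only. It is an assumption that the UDF serialization in this setup behaves the same way, and `helper_module_demo`, `double`, and `load_on_worker` are hypothetical names made up for illustration.

```python
import pickle
import sys
import types

# Build a stand-in for a helper module (hypothetical name) and register it,
# so that it is importable in the "driver" process.
helper = types.ModuleType("helper_module_demo")
exec("def double(x):\n    return 2 * x\n", helper.__dict__)
sys.modules["helper_module_demo"] = helper

# The payload holds a reference to helper_module_demo.double, not its code.
payload = pickle.dumps(helper.double)


def load_on_worker(data):
    """Simulate unpickling in a process where the helper module is missing."""
    saved = sys.modules.pop("helper_module_demo")
    try:
        return pickle.loads(data)
    except ModuleNotFoundError as exc:
        return exc
    finally:
        # Restore the module so the "driver" side keeps working.
        sys.modules["helper_module_demo"] = saved


result = load_on_worker(payload)
print(type(result).__name__, result)
```

If this is indeed the mechanism, any approach that makes the helper module importable on the cluster should have the same effect as the wheel.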
Had the same issue. Full resolution code sample with the decorator:

    # helper_module.py

    # From the Python Standard Library
    import struct

    # From PySpark
    import pyspark.sql.functions as F
    import pyspark.sql.types as T
    from pyspark.sql import DataFrame
    from pyspark.sql.functions import udf


    @udf(T.FloatType())
    def str_hex_to_numeric(
        hex_value: str,
        data_type_name: str
    ) -> float:
        """Convert a hex string to a numeric value."""
        if data_type_name == "Float":
            return struct.unpack('!f', bytes.fromhex(hex_value))[0]
        raise ValueError(f"Unknown data type: {data_type_name}")


    def value_col_hex_to_numeric(
        df: DataFrame,
        value_col: str = "VALUE",
        data_type_name_col: str = "DATA_TYPE_NAME"
    ) -> DataFrame:
        """Replace the hex strings in `value_col` with their numeric values."""
        return df.withColumn(
            value_col,
            str_hex_to_numeric(F.col(value_col), F.col(data_type_name_col))
        )
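The conversion inside the UDF above can be checked locally without Spark, which helps separate logic errors from the module-loading problem. This is a minimal sketch: `hex_to_float` is a hypothetical plain-Python copy of the "Float" branch, not part of the helper module as posted.

```python
import struct


def hex_to_float(hex_value: str) -> float:
    """Decode 8 hex digits as a big-endian IEEE 754 single-precision float."""
    # '!f' means network (big-endian) byte order, 4-byte float.
    return struct.unpack("!f", bytes.fromhex(hex_value))[0]


print(hex_to_float("3f800000"))  # 1.0
print(hex_to_float("c0000000"))  # -2.0
```

Note that `struct.unpack` raises `struct.error` if the decoded input is not exactly 4 bytes long, so malformed hex values would surface as errors inside the UDF.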