Commit
Merge branch 'main' into 2956-athena-read_sql_query-provides-completely-wrong-results-for-qmark-style-parametrized-queries-with-cache-enabled
LeonLuttenberger committed Oct 9, 2024
2 parents 64c4e13 + 9b2cdd9 commit 85e7a48
Showing 37 changed files with 2,132 additions and 1,791 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.toml
@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "3.9.2b1"
current_version = "3.10.0"
commit = false
tag = false
tag_name = "{new_version}"
80 changes: 40 additions & 40 deletions README.md
@@ -94,27 +94,27 @@ FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
## At scale
AWS SDK for pandas can also run your workflows at scale by leveraging [Modin](https://modin.readthedocs.io/en/stable/) and [Ray](https://www.ray.io/). Both projects aim to speed up data workloads by distributing processing over a cluster of workers.

Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/scale.html) or head to our latest [tutorials](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials) to learn more.
Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/scale.html) or head to our latest [tutorials](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials) to learn more.

> ⚠️ **Ray is currently not available for Python 3.12. While AWS SDK for pandas supports Python 3.12, it cannot be used at scale.**
## [Read The Docs](https://aws-sdk-pandas.readthedocs.io/)

- [**What is AWS SDK for pandas?**](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/about.html)
- [**Install**](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html)
- [PyPi (pip)](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html#pypi-pip)
- [Conda](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html#conda)
- [AWS Lambda Layer](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html#aws-lambda-layer)
- [AWS Glue Python Shell Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html#aws-glue-python-shell-jobs)
- [AWS Glue PySpark Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html#aws-glue-pyspark-jobs)
- [Amazon SageMaker Notebook](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html#amazon-sagemaker-notebook)
- [Amazon SageMaker Notebook Lifecycle](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html#amazon-sagemaker-notebook-lifecycle)
- [EMR](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html#emr)
- [From source](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/install.html#from-source)
- [**At scale**](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/scale.html)
- [Getting Started](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/scale.html#getting-started)
- [Supported APIs](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/scale.html#supported-apis)
- [Resources](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/scale.html#resources)
- [**What is AWS SDK for pandas?**](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/about.html)
- [**Install**](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html)
- [PyPi (pip)](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html#pypi-pip)
- [Conda](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html#conda)
- [AWS Lambda Layer](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html#aws-lambda-layer)
- [AWS Glue Python Shell Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html#aws-glue-python-shell-jobs)
- [AWS Glue PySpark Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html#aws-glue-pyspark-jobs)
- [Amazon SageMaker Notebook](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html#amazon-sagemaker-notebook)
- [Amazon SageMaker Notebook Lifecycle](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html#amazon-sagemaker-notebook-lifecycle)
- [EMR](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html#emr)
- [From source](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/install.html#from-source)
- [**At scale**](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/scale.html)
- [Getting Started](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/scale.html#getting-started)
- [Supported APIs](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/scale.html#supported-apis)
- [Resources](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/scale.html#resources)
- [**Tutorials**](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials)
- [001 - Introduction](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/001%20-%20Introduction.ipynb)
- [002 - Sessions](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/002%20-%20Sessions.ipynb)
@@ -155,30 +155,30 @@ Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/scale.html) or
- [039 - Athena Iceberg](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/039%20-%20Athena%20Iceberg.ipynb)
- [040 - EMR Serverless](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/040%20-%20EMR%20Serverless.ipynb)
- [041 - Apache Spark on Amazon Athena](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/041%20-%20Apache%20Spark%20on%20Amazon%20Athena.ipynb)
- [**API Reference**](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html)
- [Amazon S3](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#amazon-s3)
- [AWS Glue Catalog](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#aws-glue-catalog)
- [Amazon Athena](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#amazon-athena)
- [Amazon Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#amazon-redshift)
- [PostgreSQL](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#postgresql)
- [MySQL](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#mysql)
- [SQL Server](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#sqlserver)
- [Oracle](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#oracle)
- [Data API Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#data-api-redshift)
- [Data API RDS](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#data-api-rds)
- [OpenSearch](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#opensearch)
- [AWS Glue Data Quality](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#aws-glue-data-quality)
- [Amazon Neptune](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#amazon-neptune)
- [DynamoDB](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#dynamodb)
- [Amazon Timestream](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#amazon-timestream)
- [Amazon EMR](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#amazon-emr)
- [Amazon CloudWatch Logs](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#amazon-cloudwatch-logs)
- [Amazon Chime](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#amazon-chime)
- [Amazon QuickSight](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#amazon-quicksight)
- [AWS STS](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#aws-sts)
- [AWS Secrets Manager](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#aws-secrets-manager)
- [Global Configurations](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#global-configurations)
- [Distributed - Ray](https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/api.html#distributed-ray)
- [**API Reference**](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html)
- [Amazon S3](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#amazon-s3)
- [AWS Glue Catalog](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#aws-glue-catalog)
- [Amazon Athena](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#amazon-athena)
- [Amazon Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#amazon-redshift)
- [PostgreSQL](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#postgresql)
- [MySQL](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#mysql)
- [SQL Server](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#sqlserver)
- [Oracle](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#oracle)
- [Data API Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#data-api-redshift)
- [Data API RDS](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#data-api-rds)
- [OpenSearch](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#opensearch)
- [AWS Glue Data Quality](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#aws-glue-data-quality)
- [Amazon Neptune](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#amazon-neptune)
- [DynamoDB](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#dynamodb)
- [Amazon Timestream](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#amazon-timestream)
- [Amazon EMR](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#amazon-emr)
- [Amazon CloudWatch Logs](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#amazon-cloudwatch-logs)
- [Amazon Chime](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#amazon-chime)
- [Amazon QuickSight](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#amazon-quicksight)
- [AWS STS](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#aws-sts)
- [AWS Secrets Manager](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#aws-secrets-manager)
- [Global Configurations](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#global-configurations)
- [Distributed - Ray](https://aws-sdk-pandas.readthedocs.io/en/3.10.0/api.html#distributed-ray)
- [**License**](https://github.com/aws/aws-sdk-pandas/blob/main/LICENSE.txt)
- [**Contributing**](https://github.com/aws/aws-sdk-pandas/blob/main/CONTRIBUTING.md)

2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
3.9.2b1
3.10.0
2 changes: 1 addition & 1 deletion awswrangler/__metadata__.py
@@ -7,5 +7,5 @@

__title__: str = "awswrangler"
__description__: str = "Pandas on AWS."
__version__: str = "3.9.2b1"
__version__: str = "3.10.0"
__license__: str = "Apache License 2.0"
3 changes: 3 additions & 0 deletions awswrangler/_config.py
@@ -224,6 +224,9 @@ def _apply_type(name: str, value: Any, dtype: type[_ConfigValueType], nullable:
raise exceptions.InvalidArgumentValue(
f"{name} configuration does not accept a null value. Please pass {dtype}."
)
# Handle case where string is empty, "False" or "0". Anything else is True
if isinstance(value, str) and dtype is bool:
return value.lower() not in ("false", "0", "")
try:
return dtype(value) if isinstance(value, dtype) is False else value
except ValueError as ex:
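The string-to-bool special case added in the `_apply_type` hunk above can be sketched in isolation. This is a minimal illustration, not the library's code: the helper name `parse_bool_config` is hypothetical, and only the membership test on `("false", "0", "")` is taken from the diff.

```python
def parse_bool_config(value):
    """Interpret a configuration value as a bool.

    Hypothetical helper mirroring the check in the diff: the strings
    "", "0" and "false" (any letter case) are False; any other string is True.
    """
    if isinstance(value, str):
        return value.lower() not in ("false", "0", "")
    # Non-string values fall back to Python's ordinary truthiness.
    return bool(value)


# Plain bool() treats every non-empty string as True, including "False",
# which is the pitfall the special case avoids.
for raw in ("False", "0", "", "true", "yes"):
    print(repr(raw), bool(raw), parse_bool_config(raw))
```

This matters mostly for values that arrive as text, such as environment variables, where a user writing `False` or `0` would otherwise switch the setting on.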
2 changes: 2 additions & 0 deletions awswrangler/_data_types.py
@@ -213,6 +213,8 @@ def pyarrow2postgresql( # noqa: PLR0911
return pyarrow2postgresql(dtype=dtype.value_type, string_type=string_type)
if pa.types.is_binary(dtype):
return "BYTEA"
if pa.types.is_list(dtype):
return pyarrow2postgresql(dtype=dtype.value_type, string_type=string_type) + "[]"
raise exceptions.UnsupportedType(f"Unsupported PostgreSQL type: {dtype}")


2 changes: 2 additions & 0 deletions awswrangler/_databases.py
@@ -359,6 +359,8 @@ def generate_placeholder_parameter_pairs(
"""Extract Placeholder and Parameter pairs."""

def convert_value_to_native_python_type(value: Any) -> Any:
if isinstance(value, list):
return value
if pd.isna(value):
return None
if hasattr(value, "to_pydatetime"):
16 changes: 8 additions & 8 deletions awswrangler/athena/_read.py
@@ -793,11 +793,11 @@ def read_sql_query(
**Related tutorial:**
- `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/
- `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.10.0/
tutorials/006%20-%20Amazon%20Athena.html>`_
- `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/
- `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.10.0/
tutorials/019%20-%20Athena%20Cache.html>`_
- `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/
- `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.10.0/
tutorials/021%20-%20Global%20Configurations.html>`_
**There are three approaches available through ctas_approach and unload_approach parameters:**
@@ -861,7 +861,7 @@ def read_sql_query(
/athena.html#Athena.Client.get_query_execution>`_ .
For a practical example check out the
`related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/
`related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.10.0/
tutorials/024%20-%20Athena%20Query%20Metadata.html>`_!
@@ -1138,11 +1138,11 @@ def read_sql_table(
**Related tutorial:**
- `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/
- `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.10.0/
tutorials/006%20-%20Amazon%20Athena.html>`_
- `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/
- `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.10.0/
tutorials/019%20-%20Athena%20Cache.html>`_
- `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/
- `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.10.0/
tutorials/021%20-%20Global%20Configurations.html>`_
**There are three approaches available through ctas_approach and unload_approach parameters:**
@@ -1206,7 +1206,7 @@ def read_sql_table(
/athena.html#Athena.Client.get_query_execution>`_ .
For a practical example check out the
`related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/
`related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.10.0/
tutorials/024%20-%20Athena%20Query%20Metadata.html>`_!
8 changes: 7 additions & 1 deletion awswrangler/athena/_write_iceberg.py
@@ -115,7 +115,13 @@ def _determine_differences(

catalog_column_types = typing.cast(
Dict[str, str],
catalog.get_table_types(database=database, table=table, catalog_id=catalog_id, boto3_session=boto3_session),
catalog.get_table_types(
database=database,
table=table,
catalog_id=catalog_id,
filter_iceberg_current=True,
boto3_session=boto3_session,
),
)

original_column_names = set(catalog_column_types)
4 changes: 2 additions & 2 deletions awswrangler/catalog/_create.py
@@ -1100,7 +1100,7 @@ def create_csv_table(
If True allows schema evolution (new or missing columns), otherwise an exception will be raised.
(Only considered if dataset=True and mode in ("append", "overwrite_partitions"))
Related tutorial:
https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/tutorials/014%20-%20Schema%20Evolution.html
https://aws-sdk-pandas.readthedocs.io/en/3.10.0/tutorials/014%20-%20Schema%20Evolution.html
sep
String of length 1. Field delimiter for the output file.
skip_header_line_count
@@ -1280,7 +1280,7 @@ def create_json_table(
If True allows schema evolution (new or missing columns), otherwise an exception will be raised.
(Only considered if dataset=True and mode in ("append", "overwrite_partitions"))
Related tutorial:
https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/tutorials/014%20-%20Schema%20Evolution.html
https://aws-sdk-pandas.readthedocs.io/en/3.10.0/tutorials/014%20-%20Schema%20Evolution.html
serde_library
Specifies the SerDe Serialization library which will be used. You need to provide the Class library name
as a string.
9 changes: 8 additions & 1 deletion awswrangler/catalog/_get.py
@@ -107,6 +107,7 @@ def get_table_types(
database: str,
table: str,
catalog_id: str | None = None,
filter_iceberg_current: bool = False,
boto3_session: boto3.Session | None = None,
) -> dict[str, str] | None:
"""Get all columns and types from a table.
@@ -120,6 +121,9 @@ def get_table_types(
catalog_id
The ID of the Data Catalog from which to retrieve Databases.
If ``None`` is provided, the AWS account ID is used by default.
filter_iceberg_current
If True, returns only current iceberg fields (fields marked with iceberg.field.current: true).
Otherwise, returns all fields. Defaults to False.
boto3_session
The default boto3 session will be used if **boto3_session** receive ``None``.
@@ -139,7 +143,10 @@ def get_table_types(
response = client_glue.get_table(**_catalog_id(catalog_id=catalog_id, DatabaseName=database, Name=table))
except client_glue.exceptions.EntityNotFoundException:
return None
return _extract_dtypes_from_table_details(response=response)
return _extract_dtypes_from_table_details(
response=response,
filter_iceberg_current=filter_iceberg_current,
)


def get_databases(
10 changes: 8 additions & 2 deletions awswrangler/catalog/_utils.py
@@ -31,10 +31,16 @@ def _sanitize_name(name: str) -> str:
return re.sub("[^A-Za-z0-9_]+", "_", name).lower() # Replacing non alphanumeric characters by underscore


def _extract_dtypes_from_table_details(response: "GetTableResponseTypeDef") -> dict[str, str]:
def _extract_dtypes_from_table_details(
response: "GetTableResponseTypeDef",
filter_iceberg_current: bool = False,
) -> dict[str, str]:
dtypes: dict[str, str] = {}
for col in response["Table"]["StorageDescriptor"]["Columns"]:
dtypes[col["Name"]] = col["Type"]
# Only return current fields if flag is enabled
if not filter_iceberg_current or col.get("Parameters", {}).get("iceberg.field.current") == "true":
dtypes[col["Name"]] = col["Type"]
# Add partition keys as columns
if "PartitionKeys" in response["Table"]:
for par in response["Table"]["PartitionKeys"]:
dtypes[par["Name"]] = par["Type"]
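The effect of the new `filter_iceberg_current` flag can be tried on a hand-built response without calling AWS. The sketch below is an assumption-laden illustration: `extract_dtypes` is a hypothetical stand-in for `_extract_dtypes_from_table_details`, and `sample_response` is invented data shaped like the `Table` / `StorageDescriptor` / `Columns` structure the hunk reads.

```python
def extract_dtypes(response, filter_iceberg_current=False):
    """Collect column name -> type, optionally keeping only current Iceberg fields."""
    dtypes = {}
    for col in response["Table"]["StorageDescriptor"]["Columns"]:
        # A column counts as current only when its parameter is the string "true".
        is_current = col.get("Parameters", {}).get("iceberg.field.current") == "true"
        if not filter_iceberg_current or is_current:
            dtypes[col["Name"]] = col["Type"]
    # Partition keys are always added as columns, as in the original hunk.
    for par in response["Table"].get("PartitionKeys", []):
        dtypes[par["Name"]] = par["Type"]
    return dtypes


# Invented example data: one current field and one that is no longer current.
sample_response = {
    "Table": {
        "StorageDescriptor": {
            "Columns": [
                {"Name": "id", "Type": "bigint",
                 "Parameters": {"iceberg.field.current": "true"}},
                {"Name": "old_name", "Type": "string",
                 "Parameters": {"iceberg.field.current": "false"}},
            ]
        }
    }
}

print(extract_dtypes(sample_response))
print(extract_dtypes(sample_response, filter_iceberg_current=True))
```

Note that with the flag enabled, a column carrying no `Parameters` entry at all is also dropped, because the comparison against `"true"` fails for a missing key.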
3 changes: 3 additions & 0 deletions awswrangler/data_api/rds.py
@@ -359,6 +359,9 @@ def _create_value_dict( # noqa: PLR0911
if isinstance(value, Decimal):
return {"stringValue": str(value)}, "DECIMAL"

if isinstance(value, uuid.UUID):
return {"stringValue": str(value)}, "UUID"

raise exceptions.InvalidArgumentType(f"Value {value} not supported.")
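To make the new UUID branch concrete, here is a reduced sketch covering only the two branches visible in this hunk. The function name `to_value_dict` is hypothetical, and `TypeError` stands in for the library's own `exceptions.InvalidArgumentType`.

```python
import uuid
from decimal import Decimal


def to_value_dict(value):
    """Return a (field, type_hint) pair for the two value types shown in the hunk."""
    if isinstance(value, Decimal):
        return {"stringValue": str(value)}, "DECIMAL"
    if isinstance(value, uuid.UUID):
        # UUIDs are sent as their canonical hyphenated string form.
        return {"stringValue": str(value)}, "UUID"
    # Stand-in for exceptions.InvalidArgumentType in the original code.
    raise TypeError(f"Value {value!r} not supported.")


example_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
field, type_hint = to_value_dict(example_id)
print(field, type_hint)
```

Before this change a `uuid.UUID` value would have fallen through to the final `raise`, so callers presumably had to convert it to a string themselves.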


8 changes: 6 additions & 2 deletions awswrangler/opensearch/_write.py
@@ -504,6 +504,7 @@ def index_documents(
initial_backoff: int | None = None,
max_backoff: int | None = None,
use_threads: bool | int = False,
enable_refresh_interval: bool = True,
**kwargs: Any,
) -> dict[str, Any]:
"""
@@ -559,6 +560,8 @@ def index_documents(
True to enable concurrent requests, False to disable multiple threads.
If enabled os.cpu_count() will be used as the max number of threads.
If integer is provided, specified number is used.
enable_refresh_interval
True (default) to enable ``refresh_interval`` modification to ``-1`` (disabled) while indexing documents
**kwargs
KEYWORD arguments forwarded to bulk operation
elasticsearch >= 7.10.2 / opensearch: \
@@ -614,7 +617,7 @@ def index_documents(
widgets=widgets, max_value=total_documents, prefix="Indexing: "
).start()
for i, bulk_chunk_documents in enumerate(actions):
if i == 1: # second bulk iteration, in case the index didn't exist before
if i == 1 and enable_refresh_interval: # second bulk iteration, in case the index didn't exist before
refresh_interval = _get_refresh_interval(client, index)
_disable_refresh_interval(client, index)
_logger.debug("running bulk index of %s documents", len(bulk_chunk_documents))
@@ -655,6 +658,7 @@ def index_documents(
raise e

finally:
_set_refresh_interval(client, index, refresh_interval)
if enable_refresh_interval:
_set_refresh_interval(client, index, refresh_interval)

return {"success": success, "errors": errors}
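The control flow that `enable_refresh_interval` gates can be sketched with a fake client. Everything here is hypothetical scaffolding (`FakeIndexClient`, `index_in_chunks`, the `"1s"` starting value); only the pattern comes from the diff: disable the refresh interval at the second bulk chunk, restore it in `finally`, and skip both steps when the flag is off. The `saved is not None` guard is a simplification added for this sketch.

```python
class FakeIndexClient:
    """Hypothetical stand-in for an OpenSearch client; it only records setting changes."""

    def __init__(self):
        self.refresh_interval = "1s"
        self.history = []
        self.indexed = []

    def set_refresh_interval(self, value):
        self.refresh_interval = value
        self.history.append(value)


def index_in_chunks(client, chunks, enable_refresh_interval=True):
    """Index chunks, optionally disabling the refresh interval from the second chunk on."""
    saved = None
    try:
        for i, chunk in enumerate(chunks):
            if i == 1 and enable_refresh_interval:
                # Second iteration: remember the current value, then disable refresh.
                saved = client.refresh_interval
                client.set_refresh_interval("-1")
            client.indexed.extend(chunk)
    finally:
        # Restore only if the setting was actually changed by this call.
        if enable_refresh_interval and saved is not None:
            client.set_refresh_interval(saved)
    return len(client.indexed)


demo = FakeIndexClient()
index_in_chunks(demo, [[1, 2], [3], [4]])
print(demo.history)
```

Passing `enable_refresh_interval=False` leaves the index setting untouched for the whole call, which may be useful when the caller lacks permission to change index settings or manages them separately.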
4 changes: 2 additions & 2 deletions awswrangler/s3/_read_orc.py
@@ -225,7 +225,7 @@ def read_orc(
must return a bool, True to read the partition or False to ignore it.
Ignored if `dataset=False`.
E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
https://aws-sdk-pandas.readthedocs.io/en/3.10.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
columns
List of columns to read from the file(s).
validate_schema
@@ -384,7 +384,7 @@ def read_orc_table(
must return a bool, True to read the partition or False to ignore it.
Ignored if `dataset=False`.
E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
https://aws-sdk-pandas.readthedocs.io/en/3.9.2b1/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
https://aws-sdk-pandas.readthedocs.io/en/3.10.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
columns
List of columns to read from the file(s).
validate_schema