
chore(data-warehouse): Some minor memory optimisations #28621

Merged: 8 commits into master from tom/sqlalchemy-updates on Feb 13, 2025

Conversation

@Gilbert09 (Member) commented Feb 12, 2025

Changes

  • Some minor optimisations to the pipeline
    • Package updates
    • SQL source settings
    • Clean up PyArrow memory
    • Added PyArrow debug mode (see the sketch after this list)
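The description does not include code, but a minimal sketch of what "clean up PyArrow memory" plus a debug mode can look like follows. The helper name and structure are assumptions for illustration, not the PR's actual implementation; the PyArrow calls themselves (`log_memory_allocations`, `release_unused`, `total_allocated_bytes`) are real APIs.

```python
import pyarrow as pa


def cleanup_arrow_memory(debug: bool = False) -> None:
    """Hypothetical helper: release PyArrow's unused buffers between pipeline chunks."""
    if debug:
        # PyArrow's built-in debug aid: log each allocation/deallocation
        pa.log_memory_allocations(enable=True)

    # Hand pages the default memory pool no longer needs back to the OS
    pa.default_memory_pool().release_unused()

    if debug:
        print(f"Arrow bytes still allocated: {pa.total_allocated_bytes()}")
```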

@posthog-bot (Contributor) commented:

Hey @Gilbert09! 👋
This pull request seems to contain no description. Please add useful context, rationale, and/or any other information that will help make sense of this change now and in the distant Mars-based future.

@greptile-apps bot (Contributor) left a comment:

PR Summary

This PR updates SQL-related dependencies and improves database connection handling in the data import pipelines.

  • Fixed critical SQLAlchemy engine execution options bug in /posthog/temporal/data_imports/pipelines/sql_database/__init__.py by properly chaining method calls
  • Added memory optimization in /posthog/temporal/data_imports/pipelines/sql_database/helpers.py with max_row_buffer and stream_results settings
  • Updated PostgreSQL connection string to explicitly use the psycopg driver via the 'postgresql+psycopg://' format (see the connection sketch after this list)
  • Upgraded core SQL packages including SQLAlchemy (2.0.38), psycopg (3.2.4), and related dependencies while maintaining backwards compatibility
  • Reorganized imports and removed redundancies in arrow_helpers.py for better code organization

5 file(s) reviewed, 2 comment(s)

In /posthog/temporal/data_imports/pipelines/sql_database/__init__.py:

```diff
-    engine.execution_options(stream_results=True, max_row_buffer=2 * chunk_size)
+    engine = engine_from_credentials(credentials, may_dispose_after_use=True).execution_options(
+        stream_results=True, max_row_buffer=2 * chunk_size
+    )
     metadata = metadata or MetaData(schema=schema)

     table_obj: Table | None = metadata.tables.get("table")
```
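Why the chaining matters: SQLAlchemy's `Engine.execution_options()` returns a configured copy of the engine rather than mutating it in place, so the unchained call silently discarded the configured engine. A minimal illustration (placeholder DSN):

```python
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg://user:password@localhost/posthog")

# Engine.execution_options() returns a NEW Engine; the original is untouched.
engine.execution_options(stream_results=True)           # bug: configured copy discarded
engine = engine.execution_options(stream_results=True)  # fix: keep the copy
```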
@greptile-apps bot commented on this line:
logic: table lookup uses hardcoded string 'table' instead of the table parameter

Suggested change:

```diff
-table_obj: Table | None = metadata.tables.get("table")
+table_obj: Table | None = metadata.tables.get(table)
```

Comment on lines +140 to +142 (in /posthog/temporal/data_imports/pipelines/sql_database/helpers.py):

```python
result = conn.execution_options(
    yield_per=self.chunk_size, max_row_buffer=DEFAULT_CHUNK_SIZE * 2, stream_results=True
).execute(query)
```
@greptile-apps bot commented:

style: Setting max_row_buffer to 2x chunk_size could still cause OOM issues with very large chunk sizes. Consider adding an upper bound or making this configurable.
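One possible shape for that suggestion (illustrative only; the cap name and value are assumptions, not code from the PR):

```python
# Hypothetical upper bound on the server-side cursor's row buffer
MAX_ROW_BUFFER_CAP = 10_000

result = conn.execution_options(
    yield_per=self.chunk_size,
    max_row_buffer=min(DEFAULT_CHUNK_SIZE * 2, MAX_ROW_BUFFER_CAP),
    stream_results=True,
).execute(query)
```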

@Gilbert09 changed the title from "[WIP] Updates to sql source and packages" to "chore(data-warehouse): Some minor memory optimisations" on Feb 13, 2025
@Gilbert09 Gilbert09 requested a review from a team February 13, 2025 15:09
@Gilbert09 Gilbert09 enabled auto-merge (squash) February 13, 2025 16:00
@Gilbert09 Gilbert09 merged commit b2dd622 into master Feb 13, 2025
95 checks passed
@Gilbert09 Gilbert09 deleted the tom/sqlalchemy-updates branch February 13, 2025 21:15