updated project to use poetry

mwojtyczka committed Feb 23, 2024
1 parent 1586c14 commit 028adf8
Showing 7 changed files with 48 additions and 23 deletions.
8 changes: 8 additions & 0 deletions README.md
@@ -89,4 +89,12 @@ For integration testing, please use `pytest`:
```
source $(poetry env info --path)/bin/activate
pytest tests/integration --cov
```

### Reinstalling virtual env

```
poetry env list
poetry env remove marcin-project-4eO9IBzv-py3.10
poetry install
```
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -14,6 +14,8 @@ pytest = "^8.0.1"
pytest-cov = "^4.0.0"
pytest-spark = "^0.6.0"
chispa = "^0.9.2"
databricks-sdk = "^0.20.0"
#databricks-connect = "^14.3.0"

[build-system]
requires = ["poetry-core"]
6 changes: 3 additions & 3 deletions src/marcin_project/main.py
@@ -2,12 +2,12 @@
from marcin_project.functions import filter_taxis


def get_taxis():
spark = SparkSession.builder.getOrCreate()
def get_taxis(spark: SparkSession):
return filter_taxis(spark.read.table("samples.nyctaxi.trips"))

def main():
get_taxis().show(5)
spark = SparkSession.builder.getOrCreate()
get_taxis(spark).show(5)


if __name__ == '__main__':
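The change to `main.py` above moves `SparkSession` creation out of `get_taxis` and into `main`, so the function receives its session as a parameter. One payoff of this dependency injection is that `get_taxis` can be exercised without any Spark installation at all. A minimal sketch, assuming `unittest.mock`; the `filter_taxis` stub below is a stand-in, not the project's real implementation:

```python
from unittest.mock import MagicMock


def filter_taxis(df):
    # Stand-in for marcin_project.functions.filter_taxis (assumption:
    # the real function transforms the DataFrame; identity is enough here).
    return df


def get_taxis(spark):
    # Mirrors src/marcin_project/main.py after this commit: the session
    # is injected rather than created inside the function.
    return filter_taxis(spark.read.table("samples.nyctaxi.trips"))


# Inject a mock in place of a real SparkSession.
spark = MagicMock()
get_taxis(spark)

# Verify the function read from the expected table, no cluster needed.
spark.read.table.assert_called_once_with("samples.nyctaxi.trips")
```

Because the mock records every attribute access, the test can assert on *which* table was read without ever touching Databricks or local Spark.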
6 changes: 6 additions & 0 deletions tests/how_to_run_options.txt
@@ -0,0 +1,6 @@
With Databricks Connect installed, Spark cannot be started in local mode.

There are a couple of options for running unit and integration tests:
1. Run both unit and integration tests using Databricks Connect. This requires a cluster in the Databricks workspace.
2. Run unit tests using Spark in local mode, and integration tests using a Databricks job (e.g. created via the SDK).
3. Use separate virtual environments: one for unit tests (no databricks-connect installed) and one for integration tests (databricks-connect installed).
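Option 3 above can also be approximated inside a single `pyproject.toml` using Poetry dependency groups, so `databricks-connect` is only installed when explicitly requested. A sketch under assumptions (the group name `integration` and the version pin are illustrative, not from this commit):

```toml
# Mark the group optional so a plain `poetry install` skips it.
[tool.poetry.group.integration]
optional = true

[tool.poetry.group.integration.dependencies]
# Only needed for integration tests against a Databricks cluster;
# keeping it out of the default env lets Spark run in local mode.
databricks-connect = "^14.3.0"
```

Unit tests then run in the default environment (`poetry install`), while integration tests use `poetry install --with integration`.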
17 changes: 0 additions & 17 deletions tests/integration/main_test.py

This file was deleted.

26 changes: 26 additions & 0 deletions tests/integration/main_test_dbx_connect.py
@@ -0,0 +1,26 @@
# from databricks.connect import DatabricksSession
# from marcin_project import main
#
# # doc: https://docs.databricks.com/en/dev-tools/databricks-connect/python/index.html
#
# # Create a new Databricks Connect session. If this fails,
# # check that you have configured Databricks Connect correctly.
# # See https://docs.databricks.com/dev-tools/databricks-connect.html
#
# # Takes the connection from the .databrickscfg file (DEFAULT profile)
# # https://docs.databricks.com/dev-tools/databricks-connect-ref.html#requirements
#
# spark = DatabricksSession.builder.getOrCreate()
#
# #SparkSession.builder = DatabricksSession.builder.profile("DEFAULT")
# #spark = SparkSession.builder.getOrCreate()
#
# # spark = DatabricksSession.builder.remote(
# # host=f"https://adb-8870486534760962.2.azuredatabricks.net/?o=8870486534760962",
# # token="dapi03fec0a64fcc088adc1a27864050a598-2",
# # cluster_id="0222-221408-a9yml4v"
# # ).getOrCreate()
#
# def test_main():
# taxis = main.get_taxis(spark)
# assert taxis.count() > 5
6 changes: 3 additions & 3 deletions tests/unit/main_test.py
@@ -1,9 +1,9 @@
from marcin_project import functions

from chispa.dataframe_comparer import *

from pyspark.sql import SparkSession
#spark_session = SparkSession.builder.getOrCreate()

# instead of using pytest-spark
#spark = SparkSession.builder.getOrCreate()

def test_get_taxi(spark_session: SparkSession): # using pytest-spark
schema = "trip_distance: double, fare_amount: double"
