Fix syntax error in version check command #21

Merged on Jul 24, 2024 (26 commits)

Commits:
- a42a4bf Removed notebooks and test and build job in the workflow (Vicbi, Jul 8, 2024)
- f1de97e Updated README.md (Vicbi, Jul 8, 2024)
- a7464e9 Renamed QuestionnaireExplorer to QuestionnaireResponseExplorer and up… (Vicbi, Jul 8, 2024)
- 33089ee Linting (Vicbi, Jul 8, 2024)
- f9a4736 Linting (Vicbi, Jul 8, 2024)
- cd067d9 Removed Colab badge from README (Vicbi, Jul 8, 2024)
- b654e96 Fixed mutli-line job for checking if the version exists (Vicbi, Jul 8, 2024)
- 03291db Fixed mutli-line job for checking if the version exists (Vicbi, Jul 8, 2024)
- b647345 Fixed mutli-line job for checking if the version exists (Vicbi, Jul 8, 2024)
- cf505c2 Fixed mutli-line job for checking if the version exists (Vicbi, Jul 8, 2024)
- 7978d2f Fixed mutli-line job for checking if the version exists (Vicbi, Jul 8, 2024)
- 1dccb6a Added documentation deployment (Vicbi, Jul 8, 2024)
- f059f93 Added documentation deployment (Vicbi, Jul 8, 2024)
- 83945fa Added empty line in the end of the yaml file (Vicbi, Jul 8, 2024)
- 722697f Merge branch 'main' into fix-publish-workflow (Vicbi, Jul 8, 2024)
- 12db8f9 Used raw image URL for figures in README (Vicbi, Jul 8, 2024)
- e194f33 Merge branch 'fix-publish-workflow' of github.com:StanfordSpezi/Spezi… (Vicbi, Jul 8, 2024)
- 4248891 Renamed `ElectrocardiogramClassification` to `AppleElectrocardiogramC… (Vicbi, Jul 16, 2024)
- 540bfb0 Updated `explore_total_records_number` (Vicbi, Jul 16, 2024)
- 14fbfc5 Updated string value for `ECG_RECORDING_UNIT` (Vicbi, Jul 16, 2024)
- dbf0443 Updated handling of the ECG_RECORDING based on each type (Vicbi, Jul 16, 2024)
- 042f9ab Updated handling of the `ECG_RECORDING_UNIT` in `data_explorer.py` an… (Vicbi, Jul 17, 2024)
- 0aab4bb Fixed lint errors (Vicbi, Jul 17, 2024)
- 1de1614 Added test for `explore_total_records_number` (Vicbi, Jul 17, 2024)
- f047a3d Refactored `FirebaseFHIRAccess` class to accept a Firestore client in… (Vicbi, Jul 17, 2024)
- 341a091 Made `project_id` an Optional argument in FirebaseFHIRAccess initiali… (Vicbi, Jul 17, 2024)

13 changes: 10 additions & 3 deletions .github/workflows/publish-to-pypi.yml
@@ -94,9 +94,16 @@ jobs:

- name: Check if version already exists on PyPI/Test PyPI
run: |
VERSION_EXISTS=$(curl -s ${{ needs.determine_environment.outputs.repo }}pypi/spezi_data_pipeline/json
| jq -r ".releases
| has(\"${{ needs.determine_environment.outputs.version }}\")")
run: |
REPO_URL=${{ needs.determine_environment.outputs.repo }}
PACKAGE_VERSION=${{ needs.determine_environment.outputs.version }}
if [ "$REPO_URL" == "https://upload.pypi.org/legacy/" ]; then
PYPI_URL="https://pypi.org/pypi/spezi_data_pipeline/json"
else
PYPI_URL="https://test.pypi.org/pypi/spezi_data_pipeline/json"
fi
RESPONSE=$(curl -s $PYPI_URL)
VERSION_EXISTS=$(echo $RESPONSE | jq -r ".releases | has(\"$PACKAGE_VERSION\")")
if [ "$VERSION_EXISTS" = "true" ]; then
echo "Version already exists. Exiting."
exit 1
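The old command split the `curl | jq` pipeline so that a line began with `|`, which bash rejects, and it appended `pypi/spezi_data_pipeline/json` to the upload URL rather than to the JSON API host. The rewritten step picks the JSON endpoint that matches the upload URL, fetches it, and asks `jq` whether the requested version is a key under `releases`. For reference, here is a minimal Python sketch of the same check, assuming the public PyPI JSON API layout (`https://<host>/pypi/<project>/json`, whose `releases` object is keyed by version string). The helper names `json_api_url`, `version_in_releases`, and `version_exists` are hypothetical and not part of the workflow. Unlike the shell step, the sketch treats an HTTP 404 (project never uploaded to that index) as "not published".

```python
import json
import urllib.error
import urllib.request

# Upload endpoint the workflow compares against (taken from the step above).
PYPI_UPLOAD_URL = "https://upload.pypi.org/legacy/"


def json_api_url(repo_url: str, package: str) -> str:
    """Map the workflow's upload URL to the matching JSON API endpoint."""
    host = "pypi.org" if repo_url == PYPI_UPLOAD_URL else "test.pypi.org"
    return f"https://{host}/pypi/{package}/json"


def version_in_releases(payload: dict, version: str) -> bool:
    """Python equivalent of `jq -r '.releases | has("<version>")'`."""
    return version in payload.get("releases", {})


def version_exists(
    repo_url: str, package: str, version: str, timeout: float = 10.0
) -> bool:
    """Fetch the project metadata and report whether `version` is published."""
    url = json_api_url(repo_url, package)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # project has never been uploaded to this index
            return False
        raise
    return version_in_releases(payload, version)
```

A caller would exit non-zero when `version_exists(repo_url, "spezi_data_pipeline", version)` returns `True`, matching the `exit 1` branch above.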
7 changes: 4 additions & 3 deletions README.md
@@ -182,8 +182,9 @@ visualizer.set_date_range(selected_start_date, selected_end_date)
figs = visualizer.create_static_plot(processed_fhir_dataframe)
```

![daily_steps_data_plot.png](https://github.com/StanfordSpezi/SpeziDataPipeline/blob/main/Figures/daily_steps_data_plot.png)
![heart_rate_data_plot.png](https://github.com/StanfordSpezi/SpeziDataPipeline/blob/main/Figures/heart_rate_data_plot.png)
![daily_steps_data_plot.png](https://raw.githubusercontent.com/StanfordSpezi/SpeziDataPipeline/main/Figures/daily_steps_data_plot.png)
![heart_rate_data_plot.png](https://raw.githubusercontent.com/StanfordSpezi/SpeziDataPipeline/main/Figures/heart_rate_data_plot.png)


## ECG Observations

@@ -209,7 +210,7 @@ visualizer.set_date_range(selected_start_date, selected_end_date)
figs = visualizer.plot_ecg_subplots(processed_fhir_dataframe)
```

![ecg_data_plot.png](https://github.com/StanfordSpezi/SpeziDataPipeline/blob/main/Figures/ecg_data_plot.png)
![ecg_data_plot.png](https://raw.githubusercontent.com/StanfordSpezi/SpeziDataPipeline/main/Figures/ecg_data_plot.png)


### Questionnaire Responses
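The README hunks replace `github.com/.../blob/...` image links with `raw.githubusercontent.com/...` links. The raw host returns the file itself instead of GitHub's HTML viewer page, so the figures can render wherever the README is displayed, for instance on a PyPI project page. A small sketch of that rewrite, assuming the usual `owner/repo/blob/<ref>/<path>` layout; `blob_to_raw` is a hypothetical helper, and this naive split cannot tell a branch name containing `/` apart from a path segment.

```python
from urllib.parse import urlsplit


def blob_to_raw(url: str) -> str:
    """Rewrite https://github.com/<owner>/<repo>/blob/<ref>/<path> to its raw form."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _blob, ref, *path = segments
    # Assumes `ref` is a single path segment (e.g. "main"), as in the README links.
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{'/'.join(path)}"
```

Applied to the old `ecg_data_plot.png` link, it yields exactly the new link in the second hunk.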
19 changes: 14 additions & 5 deletions src/spezi_data_pipeline/data_access/firebase_fhir_data_access.py
@@ -35,7 +35,7 @@
# Standard library imports
import json
import os
from typing import Any
from typing import Any, Optional

# Related third-party imports
from dataclasses import dataclass
@@ -77,22 +77,31 @@ class FirebaseFHIRAccess: # pylint: disable=unused-variable

Attributes:
project_id (str): Identifier of the Firebase project.
service_account_key_file (str): Path to the Firebase service account key file for
service_account_key_file (str | None): Path to the Firebase service account key file for
authentication.
db (Optional[firestore.Client]): A Firestore client instance for database operations,
db (firestore.Client | None): A Firestore client instance for database operations,
initialized upon successful connection.
"""

def __init__(
self, project_id: str, service_account_key_file: str | None = None
self,
project_id: Optional[ # pylint: disable=consider-alternative-union-syntax
str
] = None,
service_account_key_file: Optional[ # pylint: disable=consider-alternative-union-syntax
str
] = None,
db: Optional[ # pylint: disable=consider-alternative-union-syntax
firestore.client
] = None,
) -> None:
"""
Initializes the FirebaseFHIRAccess instance with Firebase service account
credentials and project ID.
"""
self.project_id = project_id
self.service_account_key_file = service_account_key_file
self.db = None
self.db = db

def connect(self) -> None:
"""
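With this change, a caller can pass an existing Firestore client to `FirebaseFHIRAccess` through `db`. This is what lets `tests/test_data_access.py` supply a `MagicMock` instead of connecting to a real project. The sketch below isolates that pattern in a hypothetical `InjectableFirestoreAccess` class. It is not the library's class, it does not model `connect()`, and its `count_documents` method is invented for illustration. It assumes only the Firestore client shape `client.collection(name).stream()`. The diff annotates `db` with `firestore.client` while the docstring says `firestore.Client`; the sketch types the parameter as `Any` to stay independent of either.

```python
from typing import Any, Optional
from unittest.mock import MagicMock


class InjectableFirestoreAccess:
    """Hypothetical stand-in showing optional Firestore client injection."""

    def __init__(
        self,
        project_id: Optional[str] = None,
        service_account_key_file: Optional[str] = None,
        db: Optional[Any] = None,
    ) -> None:
        self.project_id = project_id
        self.service_account_key_file = service_account_key_file
        # An injected client is stored as-is; None means "connect later".
        self.db = db

    def count_documents(self, collection: str) -> int:
        """Count documents in a top-level collection via the (possibly mocked) client."""
        if self.db is None:
            raise RuntimeError("no Firestore client; inject one or connect first")
        return sum(1 for _ in self.db.collection(collection).stream())


# Usage: configure a mock the way a unit test would, then inject it.
mock_db = MagicMock()
mock_db.collection.return_value.stream.return_value = [object(), object()]
access = InjectableFirestoreAccess(db=mock_db)
```

Because the client is a constructor argument, tests no longer need to patch module-level `firestore` calls just to obtain a database handle.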
49 changes: 34 additions & 15 deletions src/spezi_data_pipeline/data_exploration/data_explorer.py
@@ -425,11 +425,22 @@
)

if row[ColumnNames.ECG_RECORDING.value] is not None:
ecg_array = np.array(
row[ColumnNames.ECG_RECORDING.value].split(), dtype=float
)
if isinstance(row[ColumnNames.ECG_RECORDING.value], list):
ecg_array = np.array(
row[ColumnNames.ECG_RECORDING.value], dtype=float
)
else:
ecg_array = np.array(
row[ColumnNames.ECG_RECORDING.value].split(), dtype=float
)

if row[ColumnNames.ECG_RECORDING_UNIT.value] == ECG_MICROVOLT_UNIT:
ecg_array = ecg_array / 1000 # Convert uV to mV
elif row[ColumnNames.ECG_RECORDING_UNIT.value] != ECG_MICROVOLT_UNIT:
print(
"ECG units must be in either uV or mV. Check units and plot again."
)
return figures

sample_rate = row.get(
ColumnNames.SAMPLING_FREQUENCY.value, DEFAULT_SAMPLE_RATE_VALUE
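The updated branch accepts the recording either as a list of samples or as a whitespace-separated string, then rescales microvolts to millivolts. As written, though, the `elif` tests `!= ECG_MICROVOLT_UNIT`, which is always true once the `if` has failed. A recording already in mV would therefore also take the error branch, despite the message saying mV is accepted. A minimal sketch of the behaviour the error message describes follows. The unit strings `"uV"` and `"mV"` are assumptions, because the values of `ECG_MICROVOLT_UNIT` and any millivolt constant are not shown in the diff. `to_millivolts` is a hypothetical helper that raises instead of printing.

```python
from typing import Sequence, Union

import numpy as np

# Assumed unit labels; the real constants' values are not shown in the diff.
MICROVOLT_UNIT = "uV"
MILLIVOLT_UNIT = "mV"


def to_millivolts(recording: Union[str, Sequence[float]], unit: str) -> np.ndarray:
    """Parse an ECG recording and return its samples in millivolts."""
    if isinstance(recording, str):
        # Serialized form: samples separated by whitespace.
        samples = np.array(recording.split(), dtype=float)
    else:
        # Already a sequence of numbers (a list, tuple, or NumPy array).
        samples = np.asarray(recording, dtype=float)

    if unit == MICROVOLT_UNIT:
        return samples / 1000.0  # 1 mV = 1000 uV
    if unit == MILLIVOLT_UNIT:
        return samples  # already in the target unit
    raise ValueError(f"ECG units must be uV or mV, got {unit!r}")
```

Testing for `str` first, rather than for `list`, means tuples and NumPy arrays are also accepted without being split.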
@@ -690,34 +701,42 @@
- None
"""

df["EffectiveDateTime"] = pd.to_datetime(df["EffectiveDateTime"])
df[ColumnNames.EFFECTIVE_DATE_TIME.value] = pd.to_datetime(
df[ColumnNames.EFFECTIVE_DATE_TIME.value]
)

if start_date is not None and end_date is not None:
df = df[
(df["EffectiveDateTime"] >= start_date)
& (df["EffectiveDateTime"] <= end_date)
(df[ColumnNames.EFFECTIVE_DATE_TIME.value] >= start_date)
& (df[ColumnNames.EFFECTIVE_DATE_TIME.value] <= end_date)
]

if isinstance(user_ids, str):
user_ids = [user_ids]

if user_ids is not None:
df = df[df["UserId"].isin(user_ids)]
df = df[df[ColumnNames.USER_ID.value].isin(user_ids)]

counts = df.groupby(["LoincCode", "UserId"]).size().unstack(fill_value=0)
counts = (
df.groupby([ColumnNames.LOINC_CODE.value, ColumnNames.USER_ID.value])
.size()
.unstack(fill_value=0)
)

plt.figure(figsize=(40, 50))
counts.plot(kind="bar")
plt.title("Number of records by Loinc code", fontsize=16)
plt.xlabel("Loinc code", fontsize=14)
plt.ylabel("Count", fontsize=14)
plt.xticks(rotation=45, ha="right", fontsize=12)
plt.figure(figsize=(20, 10))
ax = counts.plot(kind="bar", stacked=True, figsize=(20, 10))
plt.title("Number of Records by LOINC Code", fontsize=20)
plt.xlabel("LOINC Code", fontsize=20)
plt.ylabel("Count", fontsize=20)
plt.xticks(rotation=45, ha="right", fontsize=16)
plt.legend(
title="User ID",
fontsize=12,
fontsize=14,
title_fontsize=14,
bbox_to_anchor=(1.05, 1),
loc="upper left",
)
plt.tight_layout()
plt.show()

return ax # For test inspection
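In the new plotting code, `plt.figure(figsize=(20, 10))` opens a figure. However, `counts.plot(..., figsize=(20, 10))` is not given an axes, so pandas draws into a new figure of its own and leaves the first one empty. The sketch below creates the figure and axes once with `plt.subplots` and passes `ax=` to pandas. `count_records` and `plot_record_counts` are hypothetical names. The column names default to the strings the old code used (`EffectiveDateTime`, `UserId`, `LoincCode`), and the Agg backend is selected only so the example runs headless.

```python
from typing import Iterable, Optional, Union

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd


def count_records(
    df: pd.DataFrame,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    user_ids: Optional[Union[str, Iterable[str]]] = None,
    date_col: str = "EffectiveDateTime",
    user_col: str = "UserId",
    code_col: str = "LoincCode",
) -> pd.DataFrame:
    """Return a LOINC-code x user table of record counts, filtered as in the diff."""
    df = df.copy()  # avoid mutating the caller's frame
    df[date_col] = pd.to_datetime(df[date_col])
    if start_date is not None and end_date is not None:
        df = df[(df[date_col] >= start_date) & (df[date_col] <= end_date)]
    if isinstance(user_ids, str):
        user_ids = [user_ids]
    if user_ids is not None:
        df = df[df[user_col].isin(list(user_ids))]
    return df.groupby([code_col, user_col]).size().unstack(fill_value=0)


def plot_record_counts(counts: pd.DataFrame) -> plt.Axes:
    """Draw one stacked bar per LOINC code on a single, explicitly created axes."""
    fig, ax = plt.subplots(figsize=(20, 10))
    counts.plot(kind="bar", stacked=True, ax=ax)  # reuse ax: no second figure
    ax.set_title("Number of Records by LOINC Code", fontsize=20)
    ax.set_xlabel("LOINC Code", fontsize=20)
    ax.set_ylabel("Count", fontsize=20)
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right", fontsize=16)
    ax.legend(
        title="User ID",
        fontsize=14,
        title_fontsize=14,
        bbox_to_anchor=(1.05, 1),
        loc="upper left",
    )
    fig.tight_layout()
    return ax
```

Splitting counting from plotting also lets a test check the count table directly instead of inferring it from drawn rectangles.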
@@ -155,7 +155,7 @@ class ColumnNames(Enum):
NUMBER_OF_MEASUREMENTS: Number of measurements taken.
SAMPLING_FREQUENCY: Frequency at which data was sampled.
SAMPLING_FREQUENCY_UNIT: Unit for the sampling frequency.
ELECTROCARDIOGRAM_CLASSIFICATION: Classification of the ECG observation.
APPLE_ELECTROCARDIOGRAM_CLASSIFICATION: Classification of the ECG observation.
HEART_RATE: Observed heart rate.
HEART_RATE_UNIT: Unit of the observed heart rate.
ECG_RECORDING_UNIT: Unit for ECG recording data.
@@ -179,10 +179,10 @@ class ColumnNames(Enum):
NUMBER_OF_MEASUREMENTS = "NumberOfMeasurements"
SAMPLING_FREQUENCY = "SamplingFrequency"
SAMPLING_FREQUENCY_UNIT = "SamplingFrequencyUnit"
ELECTROCARDIOGRAM_CLASSIFICATION = "ElectrocardiogramClassification"
APPLE_ELECTROCARDIOGRAM_CLASSIFICATION = "AppleElectrocardiogramClassification"
HEART_RATE = "HeartRate"
HEART_RATE_UNIT = "HeartRateUnit"
ECG_RECORDING_UNIT = "ECGDataRecordingUnit"
ECG_RECORDING_UNIT = "ECGRecordingUnit"
ECG_RECORDING = "ECGRecording"
AUTHORED_DATE = "AuthoredDate"
QUESTIONNAIRE_TITLE = "QuestionnaireTitle"
@@ -382,7 +382,7 @@ def __init__(self, resource_type: FHIRResourceType):
ColumnNames.NUMBER_OF_MEASUREMENTS,
ColumnNames.SAMPLING_FREQUENCY,
ColumnNames.SAMPLING_FREQUENCY_UNIT,
ColumnNames.ELECTROCARDIOGRAM_CLASSIFICATION,
ColumnNames.APPLE_ELECTROCARDIOGRAM_CLASSIFICATION,
ColumnNames.HEART_RATE,
ColumnNames.HEART_RATE_UNIT,
ColumnNames.ECG_RECORDING_UNIT,
@@ -586,7 +586,7 @@ def flatten(
.get(KeyNames.COMPONENT.value, [{}])[1]
.get(KeyNames.VALUE_QUANTITY.value, {})
.get(KeyNames.UNIT.value, None),
ColumnNames.ELECTROCARDIOGRAM_CLASSIFICATION.value: observation.dict()
ColumnNames.APPLE_ELECTROCARDIOGRAM_CLASSIFICATION.value: observation.dict()
.get(KeyNames.COMPONENT.value, [{}])[2]
.get(KeyNames.VALUE_STRING.value, None),
ColumnNames.HEART_RATE.value: observation.dict()
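Two `ColumnNames` values change here: `ElectrocardiogramClassification` becomes `AppleElectrocardiogramClassification`, and `ECGDataRecordingUnit` becomes `ECGRecordingUnit`. Data frames that were flattened and saved before this change will still carry the old headers. If your workflow keeps such frames, a `DataFrame.rename` call along these lines brings them in line with the new names. `LEGACY_COLUMN_NAMES` and `upgrade_column_names` are hypothetical names, not part of the package.

```python
import pandas as pd

# Old column header -> new header, as renamed in ColumnNames by this change.
LEGACY_COLUMN_NAMES = {
    "ElectrocardiogramClassification": "AppleElectrocardiogramClassification",
    "ECGDataRecordingUnit": "ECGRecordingUnit",
}


def upgrade_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the pre-rename ECG column headers updated."""
    # rename() ignores keys that are absent, so already-migrated frames pass through.
    return df.rename(columns=LEGACY_COLUMN_NAMES)
```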
2 changes: 1 addition & 1 deletion tests/test_data_access.py
@@ -61,6 +61,7 @@ class TestFirebaseFHIRAccess(unittest.TestCase): # pylint: disable=unused-varia
def setUp(self):
self.project_id = "test-project"
self.service_account_key_file = "/path/to/service/account.json"
self.mock_db = MagicMock()

@patch("os.path.exists")
@patch("os.environ")
@@ -167,7 +168,6 @@ def test_fetch_data_valid_loinc_code(self, mock_firestore):
"users", "HealthKit", [ECG_RECORDING_LOINC_CODE]
)

# Verify
self.assertIsNotNone(result)
self.assertEqual(len(result), 0)

66 changes: 62 additions & 4 deletions tests/test_data_exploration.py
@@ -48,6 +48,7 @@
DataExplorer,
ECGExplorer,
QuestionnaireResponseExplorer,
explore_total_records_number,
)

USER_ID1 = "user1"
@@ -176,10 +177,12 @@
user_data = self.fhir_dataframe.df[
self.fhir_dataframe.df[ColumnNames.USER_ID.value] == USER_ID1
]
figs = self.explorer.plot_single_user_ecg(user_data, USER_ID1)
self.assertIsNotNone(figs)
self.assertIsInstance(figs, list)
self.assertIsInstance(figs[0], plt.Figure)

if figs := self.explorer.plot_single_user_ecg(user_data, USER_ID1):
self.assertIsInstance(figs[0], plt.Figure)
self.assertIsInstance(figs, list)
else:
self.assertEqual(len(figs), 0)

def test_no_ecg_data(self):
self.explorer.set_date_range("2024-01-01", "2024-01-31")
@@ -269,5 +272,60 @@
self.assertIsNone(fig)


class TestExploreTotalRecordsNumber(
unittest.TestCase
): # pylint: disable=unused-variable
"""
Test the explore_total_records_number function.

This test class ensures that the function behaves correctly by creating a bar plot
showing the count of rows with the same LoincCode column value within the specified
date range and for the specified user IDs.

The tests include:
- Verifying that the function can handle input data and generate a plot.
- Ensuring that plt.show() is called to display the plot.
- Checking that the number of bars in the plot corresponds to the number of unique
LOINC codes in the input data.

Methods:
- setUp: Initializes mock data and the required objects for testing.
- test_explore_total_records_number: Tests the function with mock data, ensuring the
plot is generated and the number of bars is correct.
"""

@patch("matplotlib.pyplot.show")
def test_explore_total_records_number(self, mock_show):

data = {
ColumnNames.EFFECTIVE_DATE_TIME.value: [
"2023-01-01",
"2023-01-02",
"2023-01-03",
],
ColumnNames.USER_ID.value: ["user1", "user2", "user1"],
ColumnNames.LOINC_CODE.value: ["code1", "code1", "code2"],
}
df = pd.DataFrame(data)

df[ColumnNames.EFFECTIVE_DATE_TIME.value] = pd.to_datetime(
df[ColumnNames.EFFECTIVE_DATE_TIME.value]
)

ax = explore_total_records_number(
df,
start_date="2023-01-01",
end_date="2023-01-31",
user_ids=["user1", "user2"],
)

mock_show.assert_called_once()
num_unique_loinc_codes = df[ColumnNames.LOINC_CODE.value].nunique()
num_bars = (
len(ax.patches) // num_unique_loinc_codes
) # Since bars are stacked, divide by num_unique_loinc_codes
self.assertEqual(num_bars, num_unique_loinc_codes)


if __name__ == "__main__":
unittest.main()
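In `test_explore_total_records_number`, pandas draws one rectangle per (LOINC code, user) cell of the stacked table, zero-height cells included, so `len(ax.patches)` equals codes × users. Dividing by the number of LOINC codes therefore yields the number of users. The assertion passes only because the fixture happens to have two of each. The sketch below, which relies only on pandas and matplotlib, uses three users to show that dividing by the number of stacked segments (users) is what recovers the bar count.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {
        "UserId": ["user1", "user2", "user3", "user1"],
        "LoincCode": ["code1", "code1", "code2", "code2"],
    }
)
counts = df.groupby(["LoincCode", "UserId"]).size().unstack(fill_value=0)

fig, ax = plt.subplots()
counts.plot(kind="bar", stacked=True, ax=ax)

n_codes = df["LoincCode"].nunique()  # bars along the x axis
n_users = df["UserId"].nunique()  # stacked segments per bar

# One Rectangle per (code, user) cell, including zero-height ones.
n_patches = len(ax.patches)
bars_drawn = n_patches // n_users
plt.close(fig)
```

Here `n_patches // n_codes` would give 3 (the number of users), whereas `n_patches // n_users` gives the expected 2 bars.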