Postgres with db #223

Status: Open. Wants to merge 52 commits into base: master.

Commits (52):
5ee1013  Initial code for postgres (varkha-d-sharma, Jun 25, 2024)
7e96edd  Made changes to make mlmd postgres compataible (varkha-d-sharma, Jun 26, 2024)
0a1c7af  Pushing changes related to implementation of postgres (varkha-d-sharma, Jun 27, 2024)
6db03f2  changing CmfQuery call (varkha-d-sharma, Jun 27, 2024)
34811c1  made changes in docker_compose-server.yml to fix startup sequence of … (varkha-d-sharma, Jul 2, 2024)
dd578ea  Adding env variables to server/Dockerfile and docker-compose-server.yml (varkha-d-sharma, Aug 7, 2024)
6dec28d  Merge branch 'HewlettPackard:master' into postgres (varkha-d-sharma, Aug 9, 2024)
879ece8  Merge branch 'HewlettPackard:master' into postgres (varkha-d-sharma, Aug 14, 2024)
e315c9e  Initial code for postgres (varkha-d-sharma, Jun 25, 2024)
9e249f1  Made changes to make mlmd postgres compataible (varkha-d-sharma, Jun 26, 2024)
be5e758  Pushing changes related to implementation of postgres (varkha-d-sharma, Jun 27, 2024)
510441a  changing CmfQuery call (varkha-d-sharma, Jun 27, 2024)
c3a12ff  made changes in docker_compose-server.yml to fix startup sequence of … (varkha-d-sharma, Jul 2, 2024)
ce99644  Adding env variables to server/Dockerfile and docker-compose-server.yml (varkha-d-sharma, Aug 7, 2024)
96c1b0a  pulling latest changes and resolving discard commits issue (varkha-d-sharma, Aug 22, 2024)
a327f27  Merge branch 'master' into postgres (varkha-d-sharma, Sep 2, 2024)
2ddcc09  pushing latest changes (varkha-d-sharma, Sep 2, 2024)
6f01226  making changes reduce postgres server call (varkha-d-sharma, Sep 5, 2024)
985e76c  Merge branch 'HewlettPackard:master' into postgres (varkha-d-sharma, Sep 12, 2024)
d5cff97  Merge branch 'master' into postgres (varkha-d-sharma, Oct 8, 2024)
0611af3  addressing review comments (varkha-d-sharma, Oct 8, 2024)
dd24a72  Pushing changes to local branch to track (varkha-d-sharma, Oct 9, 2024)
a28c4cf  adding code changes for python env artifact (varkha-d-sharma, Oct 14, 2024)
23e33c1  adding code for python env artifact (varkha-d-sharma, Oct 15, 2024)
ff508cf  review comments (varkha-d-sharma, Oct 15, 2024)
9c74e35  fixing compile time errors (varkha-d-sharma, Oct 15, 2024)
4d0bba5  Merge branch 'HewlettPackard:master' into postgres (varkha-d-sharma, Oct 15, 2024)
df1dc3d  adding server related changes (varkha-d-sharma, Oct 15, 2024)
80e038d  resolving discard commits issue (varkha-d-sharma, Dec 3, 2024)
5dcd9dd  Modify name of database (AyeshaSanadi, Dec 3, 2024)
5d55529  Merge branch 'HewlettPackard:master' into postgres (AyeshaSanadi, Dec 5, 2024)
c1a273c  Created gui for postgres (AyeshaSanadi, Dec 6, 2024)
902678f  Added postgres related api (AyeshaSanadi, Dec 6, 2024)
bc27120  Added string_value instead of int_value (AyeshaSanadi, Dec 6, 2024)
f009c7c  Added expanded row to postgres table (AyeshaSanadi, Dec 6, 2024)
295d789  Featched data using artifact type (AyeshaSanadi, Dec 9, 2024)
88135d3  removing code already availble in another feature (varkha-d-sharma, Dec 10, 2024)
6d938e0  cleaning code (varkha-d-sharma, Dec 10, 2024)
2d56f5e  removing old code (varkha-d-sharma, Dec 11, 2024)
313a2af  Added filter related half code (AyeshaSanadi, Dec 11, 2024)
f084030  Done sorting and filtering part. (AyeshaSanadi, Dec 11, 2024)
9197267  Merge branch 'HewlettPackard:master' into postgres_with_db (AyeshaSanadi, Dec 12, 2024)
d7e4af2  Added Pagination and Changed sorting logic (AyeshaSanadi, Dec 12, 2024)
5976739  Added filter for custom properties. (AyeshaSanadi, Dec 16, 2024)
d186720  Added filter for custom properties and half code for execution page (AyeshaSanadi, Dec 18, 2024)
eb0f1be  Merge branch 'HewlettPackard:master' into postgres_with_db (AyeshaSanadi, Dec 18, 2024)
6ca466b  Added sorting based on date and name column (AyeshaSanadi, Dec 18, 2024)
a0d0398  Merge branch 'HewlettPackard:master' into postgres_with_db (AyeshaSanadi, Dec 19, 2024)
7d88a2b  Added Execution page, search bar code (AyeshaSanadi, Dec 20, 2024)
8d0cc03  made some changes in artifact related query (varkha-d-sharma, Jan 6, 2025)
22adc79  Added filter inside execution. (AyeshaSanadi, Jan 6, 2025)
3214664  Modify psql query for execution and artifact tab. (AyeshaSanadi, Jan 7, 2025)
43 changes: 25 additions & 18 deletions cmflib/cmf.py
@@ -25,7 +25,6 @@

# This import is needed for jupyterlab environment
from ml_metadata.proto import metadata_store_pb2 as mlpb
from ml_metadata.metadata_store import metadata_store
from cmflib.dvc_wrapper import (
dvc_get_url,
dvc_get_hash,
@@ -40,6 +39,8 @@
git_commit,
)
from cmflib import graph_wrapper
from cmflib.store.sqllite_store import SqlliteStore
from cmflib.store.postgres import PostgresStore
from cmflib.metadata_helper import (
get_or_create_parent_context,
get_or_create_run_context,
@@ -54,7 +55,7 @@
link_execution_to_input_artifact,
)
from cmflib.utils.cmf_config import CmfConfig
from cmflib.utils.helper_functions import get_python_env, change_dir
from cmflib.utils.helper_functions import change_dir
from cmflib.cmf_commands_wrapper import (
_metadata_push,
_metadata_pull,
@@ -103,16 +104,9 @@ class Cmf:
"""

# pylint: disable=too-many-instance-attributes
# Reading CONFIG_FILE variable
cmf_config = os.environ.get("CONFIG_FILE", ".cmfconfig")
ARTIFACTS_PATH = "cmf_artifacts"
DATASLICE_PATH = "dataslice"
METRICS_PATH = "metrics"
if os.path.exists(cmf_config):
attr_dict = CmfConfig.read_config(cmf_config)
__neo4j_uri = attr_dict.get("neo4j-uri", "")
__neo4j_password = attr_dict.get("neo4j-password", "")
__neo4j_user = attr_dict.get("neo4j-user", "")

def __init__(
self,
@@ -128,17 +122,30 @@ def __init__(
else os.getcwd()

logging_dir = change_dir(self.cmf_init_path)
temp_store = ""
if is_server is False:
Cmf.__prechecks()
temp_store = SqlliteStore({"filename":filepath})
else:
IP = os.getenv('MYIP')
POSTGRES_DB = os.getenv('POSTGRES_DB')
POSTGRES_USER = os.getenv('POSTGRES_USER')
POSTGRES_PASSWORD = os.getenv('POSTGRES_PASSWORD')
#print(f"The value of POSTGRES_DB is {POSTGRES_DB}")
#print(f"The value of POSTGRES_USER: {POSTGRES_USER}")
#print(f"The value of POSTGRES_PASSSWORD: {POSTGRES_PASSWORD}")
#print(f"The value of POSTGRES_HOST: {IP}")
config_dict = {"host":IP, "port":"5432", "user": POSTGRES_USER, "password": POSTGRES_PASSWORD, "dbname": POSTGRES_DB}
temp_store = PostgresStore(config_dict)
#print("temp_store type", type(temp_store))
if custom_properties is None:
custom_properties = {}
if not pipeline_name:
# assign folder name as pipeline name
cur_folder = os.path.basename(os.getcwd())
pipeline_name = cur_folder
config = mlpb.ConnectionConfig()
config.sqlite.filename_uri = filepath
self.store = metadata_store.MetadataStore(config)
self.store = temp_store.connect()
#print("self.store = ", self.store)
self.filepath = filepath
self.child_context = None
self.execution = None
@@ -419,7 +426,7 @@ def create_execution(
git_repo = git_get_repo()
git_start_commit = git_get_commit()
cmd = str(sys.argv) if cmd is None else cmd
python_env=get_python_env()

self.execution = create_new_execution_in_existing_run_context(
store=self.store,
# Type field when re-using executions
@@ -433,15 +440,14 @@
pipeline_type=self.parent_context.name,
git_repo=git_repo,
git_start_commit=git_start_commit,
python_env=python_env,
custom_properties=custom_props,
create_new_execution=create_new_execution,
)
uuids = self.execution.properties["Execution_uuid"].string_value
if uuids:
self.execution.properties["Execution_uuid"].string_value = uuids+","+str(uuid.uuid1())
else:
self.execution.properties["Execution_uuid"].string_value = str(uuid.uuid1())
self.execution.properties["Execution_uuid"].string_value = str(uuid.uuid1())
self.store.put_executions([self.execution])
self.execution_name = str(self.execution.id) + "," + execution_type
self.execution_command = cmd
@@ -451,7 +457,7 @@
self.execution_label_props["Execution_Name"] = (
execution_type + ":" + str(self.execution.id)
)

self.execution_label_props["execution_command"] = cmd
if self.graph:
self.driver.create_execution_node(
@@ -602,7 +608,6 @@ def merge_created_execution(
# print(custom_props)
git_repo = properties.get("Git_Repo", "")
git_start_commit = properties.get("Git_Start_Commit", "")
python_env = properties.get("Python_Env", "")
#name = properties.get("Name", "")
create_new_execution = True
execution_name = execution_type
@@ -623,7 +628,6 @@
pipeline_type=self.parent_context.name,
git_repo=git_repo,
git_start_commit=git_start_commit,
python_env=python_env,
custom_properties=custom_props,
create_new_execution=create_new_execution
)
@@ -658,13 +662,15 @@ def merge_created_execution(
self.execution.id,
custom_props,
)

return self.execution

def log_dvc_lock(self, file_path: str):
"""Used to update the dvc lock file created with dvc run command."""
print("Entered dvc lock file commit")
return commit_dvc_lock_file(file_path, self.execution.id)


def log_dataset(
self,
url: str,
@@ -1996,6 +2002,7 @@ def commit_existing(self, uri: str, props: t.Optional[t.Dict] = None, custom_pro
# print(last)
# os.symlink(str(index), slicedir + "/ " + last)


def metadata_push(pipeline_name: str, filepath = "./mlmd", tensorboard_path: str = "", execution_id: str = ""):
""" Pushes MLMD file to CMF-server.
Example:
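In the `__init__` diff above, the `is_server` branch assembles the Postgres connection settings entirely from environment variables (`MYIP`, `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_DB`), while clients fall back to SQLite. A minimal sketch of that selection logic; the helper function below is illustrative, not part of cmflib:

```python
import os

def build_store_config(is_server: bool, filepath: str = "mlmd") -> dict:
    """Mirror the backend selection in Cmf.__init__: SQLite config for
    clients, Postgres config (read from the environment) for the server."""
    if not is_server:
        return {"backend": "sqlite", "filename": filepath}
    return {
        "backend": "postgres",
        "host": os.getenv("MYIP", "127.0.0.1"),
        "port": "5432",  # hard-coded in the diff
        "user": os.getenv("POSTGRES_USER"),
        "password": os.getenv("POSTGRES_PASSWORD"),
        "dbname": os.getenv("POSTGRES_DB"),
    }
```

Since the resulting dict mirrors what the diff feeds to `PostgresStore(config_dict)`, a check like this makes the env-var dependency explicit before any connection is attempted.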
11 changes: 5 additions & 6 deletions cmflib/cmf_merger.py
@@ -165,18 +165,18 @@ def create_original_time_since_epoch(mlmd_data):
"Pipeline"
][0]["create_time_since_epoch"]
for i in mlmd_data["Pipeline"][0]["stages"]:
i["custom_properties"]["original_create_time_since_epoch"] = i[
i["custom_properties"]["original_create_time_since_epoch"] = str(i[
"create_time_since_epoch"
]
])
original_stages.append(
i["custom_properties"]["original_create_time_since_epoch"]
)
stages.append(i["create_time_since_epoch"])
# print(i['custom_properties']['original_create_time_since_epoch'])
for j in i["executions"]:
j["custom_properties"]["original_create_time_since_epoch"] = j[
j["custom_properties"]["original_create_time_since_epoch"] = str(j[
"create_time_since_epoch"
]
])
original_execution.append(
j["custom_properties"]["original_create_time_since_epoch"]
)
@@ -185,7 +185,7 @@ def create_original_time_since_epoch(mlmd_data):
for k in j["events"]:
k["artifact"]["custom_properties"][
"original_create_time_since_epoch"
] = k["artifact"]["create_time_since_epoch"]
] = str(k["artifact"]["create_time_since_epoch"])
original_artifact.append(
k["artifact"]["custom_properties"][
"original_create_time_since_epoch"
@@ -197,4 +197,3 @@
return mlmd_data



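The `str(...)` wrappers added throughout `cmf_merger.py` store the epoch timestamps as strings, consistent with commit bc27120 ("Added string_value instead of int_value") switching these custom properties to `string_value`. A minimal stand-alone illustration of the per-node normalization (the helper name is ours, not cmflib's):

```python
def stamp_original_epoch(node: dict) -> dict:
    """Copy create_time_since_epoch into custom_properties as a string,
    mirroring the str(...) wrappers the diff adds in cmf_merger.py."""
    node["custom_properties"]["original_create_time_since_epoch"] = str(
        node["create_time_since_epoch"]
    )
    return node
```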
26 changes: 21 additions & 5 deletions cmflib/cmfquery.py
@@ -14,13 +14,16 @@
# limitations under the License.
###
import abc
import os
import json
import logging
import typing as t
from enum import Enum
from google.protobuf.json_format import MessageToDict
import pandas as pd
from ml_metadata.metadata_store import metadata_store
#from ml_metadata.metadata_store import metadata_store
from cmflib.store.sqllite_store import SqlliteStore
from cmflib.store.postgres import PostgresStore
from ml_metadata.proto import metadata_store_pb2 as mlpb
from cmflib.mlmd_objects import CONTEXT_LIST

@@ -113,10 +116,23 @@ class CmfQuery(object):
filepath: Path to the MLMD database file.
"""

def __init__(self, filepath: str = "mlmd") -> None:
config = mlpb.ConnectionConfig()
config.sqlite.filename_uri = filepath
self.store = metadata_store.MetadataStore(config)
def __init__(self, filepath: str = "mlmd", is_server=False) -> None:
if is_server:
IP = os.getenv('MYIP')
POSTGRES_DB = os.getenv('POSTGRES_DB')
POSTGRES_USER = os.getenv('POSTGRES_USER')
POSTGRES_PASSWORD = os.getenv('POSTGRES_PASSWORD')
#print(f"The value of POSTGRES_DB is {POSTGRES_DB}")
#print(f"The value of POSTGRES_USER: {POSTGRES_USER}")
#print(f"The value of POSTGRES_PASSSWORD: {POSTGRES_PASSWORD}")
#print(f"The value of POSTGRES_HOST: {IP}")
config_dict = {"host":IP, "port":"5432", "user": POSTGRES_USER, "password": POSTGRES_PASSWORD, "dbname": POSTGRES_DB}
temp_store = PostgresStore(config_dict)
else:
temp_store = SqlliteStore({"filename":filepath})
#print("temp_store type", type(temp_store))
self.store = temp_store.connect()
#print("self.store = ", self.store)

@staticmethod
def _copy(
1 change: 1 addition & 0 deletions cmflib/store/__init__.py
@@ -0,0 +1 @@

13 changes: 13 additions & 0 deletions cmflib/store/cmfstore.py
@@ -0,0 +1,13 @@
from abc import ABC, abstractmethod
from ml_metadata.metadata_store import metadata_store

class CmfStore(ABC):

def __init__(self, config):
self.config = config
super().__init__()

@abstractmethod
def connect(self)-> metadata_store:
cmf_store = metadata_store.MetadataStore(self.config)
return cmf_store
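The new `CmfStore` base class holds a backend-specific connection config and lets each subclass hand out a `MetadataStore` via `connect()`. Note that the diff marks `connect()` as `@abstractmethod` yet gives it a body that subclasses invoke through `super().connect()`, which is legal in Python (abstract methods may carry an implementation). The shape of the pattern, sketched with the ml-metadata dependency mocked out (everything below is a stand-alone sketch, not cmflib code):

```python
from abc import ABC, abstractmethod

class BaseStore(ABC):
    """Stand-in for CmfStore: hold a backend config, hand out connections."""
    def __init__(self, config):
        self.config = config

    @abstractmethod
    def connect(self):
        """Return a live store built from self.config."""

class SqliteLikeStore(BaseStore):
    def __init__(self, filename):
        super().__init__({"filename": filename})

    def connect(self):
        # The real subclass builds an mlpb.ConnectionConfig and returns
        # metadata_store.MetadataStore(...); here we just echo the config.
        return ("sqlite", self.config["filename"])

class PostgresLikeStore(BaseStore):
    def __init__(self, host, dbname):
        super().__init__({"host": host, "dbname": dbname})

    def connect(self):
        return ("postgres", self.config["host"], self.config["dbname"])
```

The payoff of the pattern is visible in the `Cmf`/`CmfQuery` diffs: callers pick a store once and only ever call `connect()`, so adding a third backend would not touch them.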
20 changes: 20 additions & 0 deletions cmflib/store/postgres.py
@@ -0,0 +1,20 @@
#from cmfstore import CmfStore
from cmflib.store.cmfstore import CmfStore
from ml_metadata.proto import metadata_store_pb2 as mlpb
from ml_metadata.metadata_store import metadata_store

class PostgresStore(CmfStore):

def __init__(self, config):
self.connection_config = mlpb.ConnectionConfig()
self.connection_config.postgresql.host = config["host"]
self.connection_config.postgresql.port = config["port"]
self.connection_config.postgresql.user = config["user"]
self.connection_config.postgresql.password = config["password"]
self.connection_config.postgresql.dbname = config["dbname"]
super().__init__(self.connection_config)

def connect(self)-> metadata_store:
cmf_store = super().connect()
return cmf_store

17 changes: 17 additions & 0 deletions cmflib/store/sqllite_store.py
@@ -0,0 +1,17 @@
from cmflib.store.cmfstore import CmfStore
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2 as mlpb

class SqlliteStore(CmfStore):

def __init__(self, config):

self.connection_config = mlpb.ConnectionConfig()
self.connection_config.sqlite.filename_uri = config["filename"]
self.connection_config.sqlite.connection_mode = 3
super().__init__(self.connection_config)

def connect(self)-> metadata_store:
cmf_store = super().connect()
return cmf_store

2 changes: 1 addition & 1 deletion cmflib/utils/helper_functions.py
@@ -131,4 +131,4 @@ def generate_osdf_token(key_id, key_path, key_issuer) -> str:
except Exception as err:
print(f"Unexpected {err}, {type(err)}")

return dynamic_pass
return dynamic_pass
26 changes: 26 additions & 0 deletions docker-compose-server.yml
@@ -15,6 +15,23 @@
###

services:
postgres:
image: postgres:13
container_name: postgres
environment:
POSTGRES_USER: ${POSTGRES_USER}
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
POSTGRES_DB: ${POSTGRES_DB}
ports:
- "5432:5432"
volumes:
- /home/xxxx/cmf-server/data/postgres_data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -d $${POSTGRES_DB} -U $${POSTGRES_USER}"]
interval: 10s
timeout: 5s
retries: 5

tensorboard:
image: tensorflow/tensorflow
command: tensorboard --logdir /logs --host 0.0.0.0
@@ -24,6 +41,8 @@ services:
# directory path should be updated as per user's environment
- /home/xxxx/cmf-server/data/tensorboard-logs:/logs
container_name: tensorboard


server:
image: server:latest
# both the directory paths should be updated as per user's environment
@@ -39,6 +58,9 @@ services:
environment:
MYIP: ${IP:-127.0.0.1}
HOSTNAME: ${hostname:-localhost}
POSTGRES_DB: ${POSTGRES_DB}
POSTGRES_USER: ${POSTGRES_USER}
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
ports:
- "8080:80"
expose:
@@ -47,6 +69,9 @@
test: if [ -z $IP ]; then curl -f http://$hostname:8080; else curl -f http://$IP:8080; fi
interval: 15s
retries: 32
depends_on:
postgres:
condition: service_healthy

ui:
image: ui:latest
Expand All @@ -65,3 +90,4 @@ services:
depends_on:
server:
condition: service_healthy

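Both the new `postgres` service and the `server` service in the compose diff read `POSTGRES_USER`, `POSTGRES_PASSWORD`, and `POSTGRES_DB` from the shell environment; a variable that is unset silently becomes an empty string in compose interpolation. A small fail-fast check one could run before `docker compose up` (this helper is an assumption on our part, not part of the PR):

```python
import os

REQUIRED_VARS = ("POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB")

def missing_compose_vars(env=None):
    """Return the required Postgres variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```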
3 changes: 0 additions & 3 deletions server/Dockerfile
@@ -39,9 +39,6 @@ RUN apt-get update -y && apt-get install -y curl
# Copy the requirements.txt file from the server to the cmf-server.
COPY ./server/requirements.txt /cmf-server/requirements.txt

# library required for lineage
RUN apt-get install -y graphviz graphviz-dev

# Install requirements.txt
RUN pip install --no-cache-dir --upgrade -r /cmf-server/requirements.txt

14 changes: 14 additions & 0 deletions server/app/dbconfig.py
@@ -0,0 +1,14 @@
import os
from dotenv import load_dotenv

# Load .env variables
load_dotenv()

# Database configuration
DB_CONFIG = {
"user": os.getenv("POSTGRES_USER"),
"password": os.getenv("POSTGRES_PASSWORD"),
"database": os.getenv("POSTGRES_DB"),
"host": os.getenv("POSTGRES_HOST"),
"port": os.getenv("POSTGRES_PORT"),
}
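`DB_CONFIG` above passes `None` through for any variable the `.env` file omits, such as `POSTGRES_PORT` or `POSTGRES_HOST`, which are not set anywhere else in this diff. One defensive variant with explicit fallbacks (the default values are assumptions, chosen to match the `5432` hard-coded elsewhere in the PR):

```python
import os

def load_db_config(env=None):
    """Build the DB_CONFIG dict with explicit fallbacks instead of None."""
    env = os.environ if env is None else env
    return {
        "user": env.get("POSTGRES_USER"),
        "password": env.get("POSTGRES_PASSWORD"),
        "database": env.get("POSTGRES_DB"),
        "host": env.get("POSTGRES_HOST", "localhost"),
        "port": env.get("POSTGRES_PORT", "5432"),
    }
```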