[Bug] Cannot use S3 as a remote manifest path #1126
Hello @ogescalante! I'm here to assist you with any bugs, questions, or contributions you have. Let's work together to resolve the issue you're encountering.

To resolve this, point `manifest_path` at the S3 URI of your manifest file. Here's an example of how to set the manifest path:

```python
from cosmos import ProjectConfig

import include.constants as ct  # the user's constants module

# Example of setting the manifest path (a plain string rather than
# pathlib.Path, since Path would collapse the "//" in "s3://")
manifest_path = "s3://your-bucket/path/to/manifest.json"

# Ensure the path is set correctly in your configuration
project_config = ProjectConfig(
    dbt_project_path=ct.ARTEMIS_PROJECT_DIR,
    manifest_path=manifest_path,
    project_name="artemis_pipeline",
)

# Validate the project to check for the existence of the manifest file
project_config.validate_project()
```

Additionally, ensure that the required Amazon provider dependencies are installed. Here's the relevant code for handling S3 paths:

```python
from typing import Optional, Tuple


def bucket_and_key(path: str) -> Tuple[str, str]:
    from urllib.parse import urlsplit

    # Split an s3://bucket/key URI into its bucket and key parts
    parsed_url = urlsplit(path)
    bucket = parsed_url.netloc
    key = parsed_url.path.lstrip("/")
    return bucket, key


def open_s3_file(path: str, conn_id: Optional[str]) -> str:
    from airflow.providers.amazon.aws.hooks.s3 import S3Hook
    from botocore.exceptions import ClientError

    if conn_id is None:
        conn_id = S3Hook.default_conn_name
    hook = S3Hook(aws_conn_id=conn_id)
    bucket, key = bucket_and_key(path)
    try:
        content = hook.read_key(key=key, bucket_name=bucket)
    except ClientError as e:
        # Translate a missing object into FileNotFoundError for the caller
        if e.response.get("Error", {}).get("Code", "") == "NoSuchKey":
            raise FileNotFoundError(f"{path} does not exist")
        raise e
    return content
```

If the issue persists, also check your `ProfileConfig`. For example:

```python
from cosmos import ProfileConfig
from cosmos.profiles import PostgresUserPasswordProfileMapping

profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=PostgresUserPasswordProfileMapping(
        conn_id="example_conn",
        profile_args={"schema": "public"},
    ),
)
```

Make sure the connection ID `example_conn` exists in your Airflow connections. By following these steps, you should be able to resolve the issue with using S3 as a remote location for your manifest file in Astronomer Cosmos version 1.6.0a3 [1][2][3].
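For illustration, here is a minimal sketch of how those two helpers would be invoked; the bucket, key, and `aws_default` connection ID below are placeholders, not values taken from this issue:

```python
# Hypothetical S3 URI for illustration only
uri = "s3://example-bucket/target/manifest.json"

bucket, key = bucket_and_key(uri)
print(bucket, key)  # example-bucket target/manifest.json

# Read the manifest body via the "aws_default" Airflow connection
# (assumes that connection exists and can read the bucket)
manifest_json = open_s3_file(uri, conn_id="aws_default")
```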
hi @ogescalante , we have not yet included PR #1109, which adds support for using remote manifest paths, in any of our pre-releases. We plan to include it in an upcoming pre-release.
Hi @pankajkoti, thanks for getting back on this. Please consider removing it from the official docs in the meantime, since it is pretty frustrating not to be able to do this.
For those with the same problem, my workaround was implementing the S3 download with a hook myself:

```python
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
# airflow.hooks.S3_hook is deprecated; use the provider package instead
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from cosmos import (
    ProjectConfig,
    ExecutionConfig,
    DbtTaskGroup,
    ExecutionMode,
    RenderConfig,
    LoadMode,
    ProfileConfig,
)

import include.constants as ct

manifest_path = ct.ARTEMIS_PROJECT_DIR / "target" / "manifest.json"


def download_file_from_s3():
    s3 = S3Hook()
    bucket_name = "***"  # bucket name redacted
    file_key = "manifest.json"
    # Download the manifest next to the dbt project so Cosmos can read it
    s3.get_key(file_key, bucket_name).download_file(str(manifest_path))


with DAG(
    dag_id="artemis_dag",
    start_date=datetime(2023, 7, 10),
    schedule_interval=None,
    default_args=ct.DEFAULT_ARGS,
    catchup=False,
    tags=["Artemis Pipeline"],
    description="Orchestrating the DBT models of the artemis-pipeline project",
    # DAG-level arguments are not Jinja-templated, so read the Variables
    # directly (the original template strings also had unbalanced parentheses)
    max_active_runs=int(Variable.get("artemis_active_runs", default_var=1)),
    max_active_tasks=int(Variable.get("artemis_max_active_tasks", default_var=32)),
) as dag:
    run_artemis_pipeline = DbtTaskGroup(
        group_id="artemis_pipeline",
        render_config=RenderConfig(load_method=LoadMode.DBT_MANIFEST),
        project_config=ProjectConfig(
            dbt_project_path=ct.ARTEMIS_PROJECT_DIR,
            manifest_path=manifest_path,
        ),
        execution_config=ExecutionConfig(
            execution_mode=ExecutionMode.KUBERNETES,
        ),
        operator_args={
            "image": Variable.get("DBT_IMAGE"),
            "namespace": Variable.get("NAMESPACE"),
            "get_logs": True,
            "is_delete_operator_pod": True,
        },
    )

    run_artemis_pipeline
```
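A note on the design of this workaround: with `LoadMode.DBT_MANIFEST`, Cosmos reads the manifest while the DAG file is parsed, so the download has to happen at import time rather than inside a task; as posted, `download_file_from_s3` is never actually invoked by the DAG. A minimal parse-time sketch, assuming the helper and `manifest_path` defined above:

```python
# Runs at DAG parse time, before DbtTaskGroup loads the manifest.
# The existence check avoids an S3 round trip on every scheduler parse;
# drop it if the manifest may change between deployments.
if not manifest_path.exists():
    download_file_from_s3()
```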
Yes, the docs are confusing for users at the moment. I have created an issue to fix them: #1128
@ogescalante Thanks for voicing this. Apologies for the frustration caused by the docs rendering from the main branch. I just took another look at the docs, and they do mention that this will only be available from Cosmos 1.6 onwards, which is yet to be released. Thanks @pankajastro for creating an issue to fix the docs.
hi @ogescalante, we have just created a pre-release, 1.6.0a4 (https://pypi.org/project/astronomer-cosmos/1.6.0a4/), that includes the relevant PR for remote manifest loading. We'd appreciate it if you could test it and provide feedback! :)
The ability to use S3 for the remote manifest has been released with Cosmos 1.6.0. I invite you to test it out. I'm closing this for now, but feel free to re-open if you observe any issues.
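For anyone landing here later, a minimal sketch of what the released remote-manifest configuration looks like; the bucket path, local project path, and `aws_default` connection ID are placeholders, not values from this issue:

```python
from cosmos import ProjectConfig

# Remote manifest loading as released in Cosmos 1.6.0: point manifest_path
# at the S3 URI and name the Airflow connection used to read it.
project_config = ProjectConfig(
    dbt_project_path="/usr/local/airflow/dbt/artemis",  # hypothetical local project path
    manifest_path="s3://your-bucket/path/to/manifest.json",
    manifest_conn_id="aws_default",  # assumed connection with access to the bucket
)
```

With this in place, the manual parse-time download in the workaround above should no longer be needed.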
Astronomer Cosmos Version
Other Astronomer Cosmos version (please specify below)
If "Other Astronomer Cosmos version" selected, which one?
1.6.0a3
dbt-core version
1.8.1
Versions of dbt adapters
No response
LoadMode
CUSTOM
ExecutionMode
KUBERNETES
InvocationMode
None
airflow version
2.9.2
Operating System
Debian GNU/Linux 12 (bookworm)
If you think it's a UI issue, which browsers are you seeing the problem on?
No response
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
What happened?
I cannot see how to use S3 as a remote location for my manifest file; I keep getting this error:
My DAG code:
I've installed `astronomer-cosmos[amazon]`, but the problem persists.
If I try to use the `manifest_conn_id` argument, the UI says the argument does not exist.
Relevant log output
No response
How to reproduce
try creating a DAG passing s3 as the manifest path location.
Anything else :)?
No response
Are you willing to submit PR?
Contact Details
No response