Set up script for Composer and support for getting gcsDagLocation from Composer environment #585
gcloud config set project $PROJECT
gcloud auth login --activate $EMAIL
sudo gcloud components repositories add https://storage.googleapis.com/composer-trusted-tester/components-2.json
The first step should be to install the Composer-specific gcloud components, before any gcloud * operations. The installation may vary slightly depending on the environment (e.g., VM, shell, local).
sudo apt-get --only-upgrade install kubectl google-cloud-sdk google-cloud-sdk-datastore-emulator \
  google-cloud-sdk-pubsub-emulator google-cloud-sdk-app-engine-go google-cloud-sdk-app-engine-java \
  google-cloud-sdk-app-engine-python google-cloud-sdk-cbt google-cloud-sdk-bigtable-emulator \
  google-cloud-sdk-datalab
We can put apt-get first, then the Composer gcloud installation, and finally the Composer-related configs and environment creation.
PROJECT=${1:-cloud-ml-dev}
EMAIL=${2:[email protected]}
ZONE=${3:-us-central1}
ENVIRONMENT=${3:-rajivpb-airflow}
Should it be 4?
please fix.
Correct; fixed now.
I think you fixed the Composer one but not the Airflow one yet?
Yeah, my bad - I kept thinking that all the comments referred to the composer's setup.sh and was a little confused on getting multiple comments for the same change. Will fix in a different PR mostly because I'm trying to get this in before demo time today.
aadc6c6 to 94a3a75
def gcs_dag_location(self):
  if not self._gcs_dag_location:
    environment_details = Api.environment_details_get(self._zone, self._environment)
    if 'config' not in environment_details \
Per style guide, don't use \ for line continuation. Wrap the statement in parens instead.
Fixed.
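For illustration, a minimal sketch of the parenthesized form the style comment asks for. The helper name `has_dag_location` and the sample dictionaries are hypothetical and not part of this PR; only the `config` and `gcsDagLocation` keys come from the diff above.

```python
def has_dag_location(environment_details):
    """Return True if the details dict carries config.gcsDagLocation.

    Hypothetical helper for illustration only; not the PR's actual code.
    """
    # Wrapping the condition in parentheses lets it span two lines
    # without a trailing backslash.
    return ('config' in environment_details and
            'gcsDagLocation' in environment_details['config'])


print(has_dag_location({'config': {'gcsDagLocation': 'gs://bucket/dags'}}))
print(has_dag_location({'config': {}}))
```

The parentheses keep the continuation robust against trailing whitespace, which silently breaks a backslash continuation.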
# parts after the bucket. In those cases, the final file path needs to include those as well
additional_parts = ''
if len(gcs_dag_location_splits) > 4:
  additional_parts = '/' + '/'.join(gcs_dag_location_splits[4:])
Instead of reconstructing this, just use the maxsplit argument to split so that the path section won't be split in the first place.
Ah, cool. Done.
bucket = client.get_bucket(self.bucket_name)
filename = 'dags/{0}.py'.format(name)
try:
  gcs_dag_location_splits = self.gcs_dag_location.split('/')
_, _, bucket, filepath = self.gcs_dag_location.split('/', 3)
You should have gcs_dag_location() verify that what it returns is in the expected format (with a trailing slash, etc.). Also test whether this works with a top-level directory in the bucket (i.e., gcs_dag_location = "gs://bucket/").
That made my life easy, thanks!
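A small sketch of the `maxsplit` suggestion, assuming the location has the form `gs://bucket/path/`. The function name `split_gcs_location` and the bucket name `my-bucket` are hypothetical; the PR applies the split inline.

```python
def split_gcs_location(gcs_dag_location):
    """Split 'gs://bucket/path/' into (bucket, path) without re-joining.

    Hypothetical helper for illustration only.
    """
    # maxsplit=3 yields at most four pieces: 'gs:', '', the bucket name,
    # and everything after the bucket left unsplit.
    _, _, bucket, path = gcs_dag_location.split('/', 3)
    return bucket, path


print(split_gcs_location('gs://my-bucket/dags/'))          # ('my-bucket', 'dags/')
print(split_gcs_location('gs://my-bucket/nested/dags/'))   # ('my-bucket', 'nested/dags/')
print(split_gcs_location('gs://my-bucket/'))               # ('my-bucket', '')
```

Note that a location without a trailing slash after the bucket (`gs://my-bucket`) splits into only three pieces, so the unpacking raises `ValueError`. This is one reason for having `gcs_dag_location()` guarantee the trailing slash.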
PROJECT=${1:-cloud-ml-dev}
EMAIL=${2:[email protected]}
ZONE=${3:-us-central1}
ENVIRONMENT=${3:-datalab-testing-1}
4 instead of 3 here?
Done.
gcloud config set project $PROJECT
gcloud config set account $EMAIL
gcloud auth login --activate $EMAIL
--activate is the default
Is this setup.sh only ever run manually? (Asking because users shouldn't gcloud auth login on a VM.)
Yes, setup.sh is only ever run manually on a local machine at the moment. It's not for the VM scenario.
(Removed --activate)
@@ -16,6 +16,7 @@
# Compiles the typescript sources to javascript and submits the files
# to the pypi server specified as first parameter, defaults to testpypi
# In order to run this script locally, make sure you have the following:
# - A Python 3 environment (due to urllib issues)
Is there an issue open to fix this for Py2?
    or 'gcsDagLocation' not in environment_details.get('config'):
  raise ValueError('Dag location unavailable from Composer environment {0}'.format(
      self._environment))
self._gcs_dag_location = environment_details['config']['gcsDagLocation']
Add a slash to the end of the location if it doesn't end with a slash?
Done.
4a7c7ee to 58b9167
Thanks Chris and Bradley! Feedback addressed; PTAL.
58b9167 to 2934bd5
    'Dag location {0} from Composer environment {1} is in incorrect format'.format(
        gcs_dag_location, self._environment))

self._gcs_dag_location = gcs_dag_location + '/'
What if it already ends with a slash?
Initially disallowed it via regex, but later decided to be more tolerant. Inserted check now.
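A sketch of the tolerant behavior described here: validate the location, then append a slash only when one is missing. The function name `normalize_dag_location` and the regular expression are assumptions made for illustration; the PR's actual pattern is not shown in this conversation.

```python
import re

# Hypothetical pattern: 'gs://', a non-empty bucket name, then an optional path.
_GCS_LOCATION = re.compile(r'^gs://[^/]+(/.*)?$')


def normalize_dag_location(gcs_dag_location):
    """Validate a GCS location and make sure it ends with exactly one added slash at most."""
    if not _GCS_LOCATION.match(gcs_dag_location):
        raise ValueError(
            'Dag location {0} is in incorrect format'.format(gcs_dag_location))
    # Append the slash only if it is missing, so 'gs://bucket/dags/' is unchanged.
    if gcs_dag_location.endswith('/'):
        return gcs_dag_location
    return gcs_dag_location + '/'


print(normalize_dag_location('gs://my-bucket/dags'))
print(normalize_dag_location('gs://my-bucket/dags/'))
```

Normalizing once at this point means downstream code can split on `/` with `maxsplit` and rely on getting four pieces.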
0917b1f to 009a5e5
@@ -21,6 +20,10 @@ def _create_pipeline_subparser(parser):
    'transform data using BigQuery.')
pipeline_parser.add_argument('-n', '--name', type=str, help='BigQuery pipeline name',
                             required=True)
pipeline_parser.add_argument('-e', '--environment', type=str,
                             help='The name of the Composer or Airflow environment.')
It's only the name of the Composer environment, right? It isn't used for Airflow?
You're right. At the time of writing the comment, I made it slightly forward-looking since we plan to work on an Airflow environment. Will change in a different PR, mostly because I'm trying to get this in before demo time today.
@@ -44,11 +47,17 @@ def _pipeline_cell(args, cell_body):
bq_pipeline_config = utils.commands.parse_config(
    cell_body, utils.commands.notebook_environment())
pipeline_spec = _get_pipeline_spec_from_config(bq_pipeline_config)
# TODO(rajivpb): This import is a stop-gap for
# https://github.com/googledatalab/pydatalab/issues/593
import google.datalab.contrib.pipeline._pipeline
pipeline = google.datalab.contrib.pipeline._pipeline.Pipeline(name, pipeline_spec)
utils.commands.notebook_environment()[name] = pipeline

# If a composer environment and zone are specified, we deploy to composer
if 'environment' in args and 'zone' in args:
I think "'environment' in args and 'zone' in args" is always True. If the user does not provide values for them, they will be None, but 'environment' and 'zone' still exist in args.
Perhaps you want "if args['environment'] and args['zone']:".
Oh, I didn't know that. Will verify and fix in a different PR mostly because I'm trying to get this in before demo time today.
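To verify the reviewer's point, here is a standalone sketch using plain `argparse`. It assumes the parsed arguments end up in a dict (via `vars()`), and the `-z`/`--zone` flag definition is a guess, since only `--environment` appears in the diff.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-e', '--environment', type=str)
parser.add_argument('-z', '--zone', type=str)  # assumed flag; not shown in the diff

# Parse an empty command line: neither flag is supplied.
args = vars(parser.parse_args([]))

# argparse still creates both keys, with a default value of None.
keys_present = 'environment' in args and 'zone' in args
# Checking the values distinguishes "not supplied" from "supplied".
values_set = bool(args['environment'] and args['zone'])

print(keys_present)  # True
print(values_set)    # False
```

With the membership test, the Composer deployment branch would run even when the user passes neither flag; the truthiness test on the values avoids that.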
No description provided.