This repository has been archived by the owner on Sep 3, 2022. It is now read-only.

Set up script for Composer and support for getting gcsDagLocation from Composer environment #585

Merged: 21 commits into master from rajivpb-composer-setup on Nov 10, 2017

@rajivpb rajivpb commented Nov 2, 2017

No description provided.

@coveralls
Coverage Status

Coverage remained the same at 77.771% when pulling b5eafa6 on rajivpb-composer-setup into 025b300 on master.


gcloud config set project $PROJECT
gcloud auth login --activate $EMAIL
sudo gcloud components repositories add https://storage.googleapis.com/composer-trusted-tester/components-2.json

The first step should be to install the Composer-specific gcloud components, before any gcloud * operations. The exact steps may vary slightly depending on the environment (e.g., VM, shell, local).

sudo apt-get --only-upgrade install kubectl google-cloud-sdk google-cloud-sdk-datastore-emulator \
google-cloud-sdk-pubsub-emulator google-cloud-sdk-app-engine-go google-cloud-sdk-app-engine-java \
google-cloud-sdk-app-engine-python google-cloud-sdk-cbt google-cloud-sdk-bigtable-emulator \
google-cloud-sdk-datalab

We can put apt-get first, then the Composer gcloud installation, and finally the Composer-related configuration and environment creation.

@coveralls
Coverage Status

Coverage remained the same at 77.825% when pulling 57111a7 on rajivpb-composer-setup into 8bd7431 on master.

@rajivpb rajivpb requested a review from qimingj November 9, 2017 01:46
@coveralls
Coverage Status

Coverage remained the same at 77.985% when pulling c422b10 on rajivpb-composer-setup into 2295d7b on master.

PROJECT=${1:-cloud-ml-dev}
EMAIL=${2:[email protected]}
ZONE=${3:-us-central1}
ENVIRONMENT=${3:-rajivpb-airflow}
Contributor

Should it be 4?

Contributor

please fix.

Contributor Author

Correct; fixed now.

Contributor

I think you fixed the composer one but not airflow one yet?

Contributor Author

Yeah, my bad - I kept thinking that all the comments referred to the composer's setup.sh and was a little confused on getting multiple comments for the same change. Will fix in a different PR mostly because I'm trying to get this in before demo time today.

@rajivpb rajivpb force-pushed the rajivpb-composer-setup branch from aadc6c6 to 94a3a75 Compare November 10, 2017 04:08
@rajivpb rajivpb changed the title from "Bash script for setting up Composer" to "Set up Composer and get gcsDagLocation" Nov 10, 2017
@coveralls
Coverage Status

Coverage increased (+0.06%) to 77.486% when pulling 959d4d5 on rajivpb-composer-setup into 7a472e9 on master.

@coveralls
Coverage Status

Coverage increased (+0.06%) to 77.486% when pulling dc02c7c on rajivpb-composer-setup into 7a472e9 on master.

@coveralls
Coverage Status

Coverage increased (+0.07%) to 77.499% when pulling dc02c7c on rajivpb-composer-setup into 7a472e9 on master.

@rajivpb rajivpb requested a review from chmeyers November 10, 2017 05:21
@coveralls
Coverage Status

Coverage increased (+0.07%) to 77.499% when pulling e642b2f on rajivpb-composer-setup into 7a472e9 on master.

@coveralls
Coverage Status

Coverage increased (+0.06%) to 77.493% when pulling a44f131 on rajivpb-composer-setup into 7a472e9 on master.

@coveralls
Coverage Status

Coverage increased (+0.06%) to 77.489% when pulling 9774d10 on rajivpb-composer-setup into 7a472e9 on master.

def gcs_dag_location(self):
  if not self._gcs_dag_location:
    environment_details = Api.environment_details_get(self._zone, self._environment)
    if 'config' not in environment_details \
Contributor

Per style guide, don't use \ for line continuation. Wrap the statement in parens instead.
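
For illustration, here is a minimal sketch of the parenthesized form this comment asks for. The helper name `find_dag_location` and its arguments are hypothetical and are not code from this PR; only the condition and the error message mirror the hunk above.

```python
def find_dag_location(environment_details, environment_name):
    """Return config.gcsDagLocation or raise ValueError (hypothetical helper)."""
    # Wrapping the condition in parentheses lets it span two lines
    # without a trailing backslash.
    if ('config' not in environment_details or
            'gcsDagLocation' not in environment_details['config']):
        raise ValueError(
            'Dag location unavailable from Composer environment {0}'.format(
                environment_name))
    return environment_details['config']['gcsDagLocation']
```

Python continues a statement implicitly inside parentheses, brackets, and braces, so no continuation character is needed.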

Contributor Author

Fixed.

# parts after the bucket. In those cases, the final file path needs to include those as well
additional_parts = ''
if len(gcs_dag_location_splits) > 4:
  additional_parts = '/' + '/'.join(gcs_dag_location_splits[4:])
Contributor

Instead of reconstructing this, just use the maxsplit argument to split so that the path section won't be split in the first place.

https://docs.python.org/2/library/stdtypes.html#str.split

Contributor Author

Ah, cool. Done.

bucket = client.get_bucket(self.bucket_name)
filename = 'dags/{0}.py'.format(name)
try:
gcs_dag_location_splits = self.gcs_dag_location.split('/')
Contributor

_, _, bucket, filepath = self.gcs_dag_location.split('/', 3)

You should have gcs_dag_location() verify that what it returns is in the expected format (trailing slash, etc.). Also test whether this works with a top-level directory in the bucket (i.e., gcs_dag_location = "gs://bucket/").
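
A rough sketch of how the `maxsplit` approach might look, including the top-level-directory case raised above. `split_gcs_location` is a hypothetical helper, not the PR's code, and the error messages are assumptions.

```python
def split_gcs_location(gcs_dag_location):
    """Split 'gs://bucket/path/' into (bucket, path). Hypothetical helper."""
    if not gcs_dag_location.startswith('gs://'):
        raise ValueError('Expected a gs:// URL, got {0}'.format(gcs_dag_location))
    # maxsplit=3 yields at most four parts: 'gs:', '', the bucket, and the
    # remainder of the path, which is left unsplit.
    parts = gcs_dag_location.split('/', 3)
    bucket = parts[2]
    # The fourth part is absent when there is no slash after the bucket.
    filepath = parts[3] if len(parts) > 3 else ''
    if not bucket:
        raise ValueError('Missing bucket in {0}'.format(gcs_dag_location))
    return bucket, filepath


# A nested path stays intact; a bucket root gives an empty path.
nested = split_gcs_location('gs://bucket/a/b/')
root = split_gcs_location('gs://bucket/')
```

Unpacking `split('/', 3)` directly into four names raises `ValueError` for an input such as `gs://bucket` with no trailing slash, which is one reason to normalize the location before splitting.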

Contributor Author

That made my life easy, thanks!

PROJECT=${1:-cloud-ml-dev}
EMAIL=${2:[email protected]}
ZONE=${3:-us-central1}
ENVIRONMENT=${3:-datalab-testing-1}
Contributor

4 instead of 3 here?

Contributor Author

Done.


gcloud config set project $PROJECT
gcloud config set account $EMAIL
gcloud auth login --activate $EMAIL
Contributor

--activate is the default

Is this setup.sh only ever run manually? (Asking because users shouldn't gcloud auth login on a VM.)

Contributor Author

Yes, setup.sh is only ever run manually on a local machine at the moment. It's not for the VM scenario.

Contributor Author

(Removed --activate)

@@ -16,6 +16,7 @@
# Compiles the typescript sources to javascript and submits the files
# to the pypi server specified as first parameter, defaults to testpypi
# In order to run this script locally, make sure you have the following:
# - A Python 3 environment (due to urllib issues)
Contributor

Is there an issue open to fix this for Py2?

Contributor Author

        or 'gcsDagLocation' not in environment_details.get('config'):
      raise ValueError('Dag location unavailable from Composer environment {0}'.format(
          self._environment))
    self._gcs_dag_location = environment_details['config']['gcsDagLocation']
Contributor

Add a slash to the end of the location if it doesn't end with a slash?

Contributor Author

Done.

@coveralls
Coverage Status

Coverage increased (+0.06%) to 77.489% when pulling bb69916 on rajivpb-composer-setup into 7a472e9 on master.

@rajivpb rajivpb force-pushed the rajivpb-composer-setup branch from 4a7c7ee to 58b9167 Compare November 10, 2017 18:47
Contributor Author

@rajivpb rajivpb left a comment

Thanks Chris and Bradley! Feedback addressed; PTAL.

@rajivpb rajivpb force-pushed the rajivpb-composer-setup branch from 58b9167 to 2934bd5 Compare November 10, 2017 18:54
'Dag location {0} from Composer environment {1} is in incorrect format'.format(
gcs_dag_location, self._environment))

self._gcs_dag_location = gcs_dag_location + '/'
Contributor

What if it already ends with a slash?

Contributor Author

Initially disallowed it via regex, but later decided to be more tolerant. Inserted check now.
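
A sketch of what such a tolerant check could look like. The regex actually used in the PR is not shown in this thread, so the pattern below, the helper name `normalize_dag_location`, and the shortened error message are all assumptions.

```python
import re

# Hypothetical pattern: 'gs://', a non-empty bucket name, then an optional path.
_GCS_LOCATION = re.compile(r'^gs://[^/]+(/.*)?$')


def normalize_dag_location(gcs_dag_location):
    """Validate the location and make sure it ends with exactly one added slash."""
    if not _GCS_LOCATION.match(gcs_dag_location):
        raise ValueError(
            'Dag location {0} is in incorrect format'.format(gcs_dag_location))
    # Append the slash only when it is missing, so an input that already
    # ends with '/' is returned unchanged.
    if not gcs_dag_location.endswith('/'):
        gcs_dag_location += '/'
    return gcs_dag_location
```

With this shape, both `gs://bucket/dags` and `gs://bucket/dags/` normalize to the same value, so callers can join file names onto the result without checking for a separator.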

@coveralls
Coverage Status

Coverage increased (+0.05%) to 77.483% when pulling 2934bd5 on rajivpb-composer-setup into 7a472e9 on master.

@rajivpb rajivpb force-pushed the rajivpb-composer-setup branch from 0917b1f to 009a5e5 Compare November 10, 2017 19:38
@coveralls
Coverage Status

Coverage increased (+0.05%) to 77.483% when pulling 009a5e5 on rajivpb-composer-setup into 7a472e9 on master.

@coveralls
Coverage Status

Coverage increased (+0.06%) to 77.489% when pulling 009a5e5 on rajivpb-composer-setup into 7a472e9 on master.

@@ -21,6 +20,10 @@ def _create_pipeline_subparser(parser):
'transform data using BigQuery.')
pipeline_parser.add_argument('-n', '--name', type=str, help='BigQuery pipeline name',
required=True)
pipeline_parser.add_argument('-e', '--environment', type=str,
help='The name of the Composer or Airflow environment.')
Contributor

It's only the name of the Composer environment, right? It isn't used for Airflow?

Contributor Author

You're right. At the time of writing the comment, I made it slightly forward-looking since we plan to work on an Airflow environment. Will change in a different PR, mostly because I'm trying to get this in before demo time today.

@@ -44,11 +47,17 @@ def _pipeline_cell(args, cell_body):
bq_pipeline_config = utils.commands.parse_config(
cell_body, utils.commands.notebook_environment())
pipeline_spec = _get_pipeline_spec_from_config(bq_pipeline_config)
# TODO(rajivpb): This import is a stop-gap for
# https://github.com/googledatalab/pydatalab/issues/593
import google.datalab.contrib.pipeline._pipeline
pipeline = google.datalab.contrib.pipeline._pipeline.Pipeline(name, pipeline_spec)
utils.commands.notebook_environment()[name] = pipeline

# If a composer environment and zone are specified, we deploy to composer
if 'environment' in args and 'zone' in args:
Contributor

I think "'environment' in args and 'zone' in args" is always True. If the user does not provide values for them, they will be None, but 'environment' and 'zone' still exist in args.

Perhaps you want "if args['environment'] and args['zone']:".
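
The behavior described here can be reproduced with a small standalone parser. This is only a sketch: it assumes `args` is a dict built from the parsed namespace (as the subscript syntax above suggests), and the `-z/--zone` flag is an assumed definition, since the hunk shows only `--name` and `--environment`.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-n', '--name', type=str, required=True)
parser.add_argument('-e', '--environment', type=str)
# Assumed flag; the PR's definition of the zone argument is not shown here.
parser.add_argument('-z', '--zone', type=str)

# Parse a command line that omits both optional flags.
args = vars(parser.parse_args(['--name', 'my_pipeline']))

# argparse stores every declared option, defaulting to None, so the
# membership test passes even though the user supplied nothing.
key_present = 'environment' in args and 'zone' in args

# Testing the values instead distinguishes "not provided" from "provided".
values_given = bool(args['environment'] and args['zone'])
```

Here `key_present` is `True` while `values_given` is `False`, which is why the membership test cannot be used to decide whether to deploy.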

Contributor Author

Oh, I didn't know that. Will verify and fix in a different PR mostly because I'm trying to get this in before demo time today.

5 participants