-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add method to upload sources from schema and metadata #82
Changes from 3 commits
a4ea5d0
77e2c0d
e550231
8d9e924
f7d0546
01f2a2d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -52,6 +52,27 @@ def wait_for_batch_status(self, batch, status): | |||||
else: | ||||||
raise ValueError("The batch did not reach the '%s' state in the " | ||||||
"given time. Please check again later." % status) | ||||||
|
||||||
def add_schema_metadata(self, ds, source_url, filename, fp, mimetype, schema, metadata): | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You don't need the dataset here, it's only being used for the session, this could be obtained from the connection I believe. This signature should be:
Suggested change
I think that for this case, since it's going to be crunch lake, the mimetype is always known because it has to be a parquet file, we don't accept anything else do we? Note that Additionally, add a docstring indicating what are the types of the arguments. Particularly, schema and metadata, are they dictionaries? Or are they file pointers to my local json files? |
||||||
response = ds.session.post( | ||||||
source_url, | ||||||
files={ | ||||||
"uploaded_file": (filename, fp, mimetype) | ||||||
}, | ||||||
data={ | ||||||
"schema": schema, | ||||||
"metadata": metadata, | ||||||
"crunchlake": "create", | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is this argument? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's crunchlake specific. @mgdotdev can expand on this. |
||||||
"dataset_id": ds.get("self", "").split("/")[-2] | ||||||
} | ||||||
) | ||||||
|
||||||
return shoji.Entity(ds.session, body={ | ||||||
"status_code": response.status_code, | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These aren't the attributes of a new Source entity. These So, the only attribute you should be able to fill form there is the See how it's being done here:
The Then the consumer will be able to do |
||||||
"payload": response.payload, | ||||||
"source_url": response.headers.get("Location") | ||||||
} | ||||||
) | ||||||
|
||||||
def add_source(self, ds, filename, fp, mimetype): | ||||||
"""Create a new Source on the given dataset and return its URL.""" | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
import json | ||
import subprocess | ||
import time | ||
|
||
|
||
class TestCrunchLakeWorkflow: | ||
def test_ping(self): | ||
command = ["cr.core.launch", "ping", "crunch-lake-ping", '--kwargs={"pong": "test-ping"}'] | ||
p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is this test? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is to check if we have connection to the lake service from pycrunch There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So it's just a temporary test? It doesn't feel like this is the right place. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @Alig1493 something to notice is that this pycrunch is an open source consumer library. It does not communicate with our internals at Crunch. The tests we implement here should be able to run stand alone alone. We can do the integration tests in other internal repositories. |
||
start = time.time() | ||
stdout = p.stdout.readline() | ||
stderr = p.stderr.readline() | ||
|
||
while time.time() - start < 30: | ||
try: | ||
assert "Workflow launched with id" in stderr.decode("utf-8") | ||
except AssertionError as assertion_error: | ||
if not time.time() - start < 30: | ||
raise assertion_error | ||
time.sleep(.01) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this print