-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Content bytes option #56
Conversation
Hi @kd2718, thanks for the pull request. Before we can merge it, we need you to sign our Contributor License Agreement. You can do so electronically here: http://opensource.box.com/cla Once you have signed, just add a comment to this pull request saying, "CLA signed". Thanks! |
|
Hi, thanks for the PR. Before we can merge it, the unit tests will need to be updated, and the option should probably also be added to the download method. Are you interested in doing that? If not, I can pick up where you've left off - just let me know. |
@Jeff-Meadows I see what I can finish over the next few days. Thanks! |
|
""" | ||
Get the content of a file on Box. | ||
|
||
:peram bytes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: param, not peram.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, bytes
is a built-in type in Python 3. Even though this is valid to do in Python, it still might be better to avoid that name. Something like byte_range
or byte_range_set
is more descriptive anyway.
I updated @jmoldow suggestions. I am not familiar with building unit tests. I could make a few guesses, but I felt it would be better if you all implemented them correctly. Once they are in, I can look over them for an idea for next time. Thanks, |
Hi @kd2718 - thanks for the updates. I can merge this to a branch and work on some unit tests for it. In the meantime, could you make sure you've signed the CLA, and once you have, leave a comment here saying "CLA signed"? |
I was originally going to suggest that the function should take a start and end position, something like def content(self, first_byte_position=None, last_byte_position=None) I find this to be more Pythonic, as it is similar to functions like But then I realized that there are a few different scenarios that the RFC calls for:
We could do this within the function, but I think this would be hard to do in a way that is future-proof (in case the other forms become allowed in the future), and (if more forms become available in the future) hard to do in a way that is not confusing to users of the SDK. The docstring would need to explain the semantics of first_byte_position, last_byte_position, and suffix_length; and explain the three different valid combinations of parameters. And the function would need to figure out what to do depending on what parameters are passed. Instead, I think we should move all of this into a new class with a smart constructor. My proposal is something like this: class ByteRange(object):
"""Represents a byte range for a byte range retrieval request, from section 14.35 of W3 RFC 2616: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35."""
def range_header_value(self):
return 'bytes={}'.format(self.byte_range_set())
@abc.abstractmethod
def byte_range_set(self):
pass
class ByteRangeSpec(ByteRange):
"""Represents a single, continuous byte range, where first_byte_position is the (0-indexed) position of the first byte of the entity that should be concluded, and last_byte_position is the position of the last byte that should be included (None is taken to mean EOF). E.g., ByteRangeSpec(2, 4) represents skipping bytes 0 and 1, including bytes 2, 3, and 4, and skipping all bytes beyond that. And ByteRangeSpec(2) includes all bytes except for 0 and 1.
"""
def __init__(self, first_byte_position, last_byte_position=None):
self.first_byte_position = first_byte_position
self.last_byte_position = last_byte_position
def byte_range_set(self):
return '{}-{}'.format(self.first_byte_position, (self.last_byte_position or ''))
# In the hypothetical future.
#class SuffixByteRangeSpec(ByteRange):
# def __init__(self, suffix_length):
# self.suffix_length = suffix_length
# def byte_range_set(self):
# return '-{}'.format(self.suffix_length) We can use these class's docstrings to explain all the semantics and behaviors, rather than cluttering the content and download_to functions. For example, we'll want to explain:
This way, we also don't have to duplicate logic in the content and download_to functions. I guess one downside is that right now, only the ByteRangeSpec form is supported, so this seems a little heavy-handed, especially if none of the other forms ever get implemented. I still like it though. @Jeff-Meadows @kd2718 what do you think? |
@jmoldow and @Jeff-Meadows This plays into an idea I had. If the file being downloaded was large, it may be more beneficial to download it in chunks, rather than all at once. What if Download_to function returned an an object that could be repeatedly called to get specified chunks from that file? However, at this point, we would have to make a new api call for each chunk that is downloaded. If I am not making my self clear, Think of this as similar to fid.readlines(). But we would call file.get_bytes() and the object would keep track of where we are in the file. let me know if this sounds ok. |
|
I got distracted and had to step away from this for a while. Is this still something you are still interested in? @Jeff-Meadows @jmoldow |
I'm closing this PR for now. The work will be tracked in issue #95, which I've started work on. This will be a more generic and Pythonic implementation, along the lines of what I mentioned in my comment. Thanks a lot for the initial pass at this! With regards to your other comment:
I definitely agree that we should support chunked downloading. However, we don't need to do this via Byte Ranges, as HTTP/1.0 already has support for receiving a non-byte-range request in chunks, and the requests library already exposes this in its API. I've opened a ticket for my idea in #96. |
Added the option to include bytes when requesting the contents of a file. This is a feature that is available in API calls, but not originally in the boxsdk for python.
Example:
CLA signed