-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parallelize get_movie Calls #35
Conversation
app/util/concurrency.py
Outdated
>>> call_parallel(calls, timeout=2) | ||
[None, 1, None, None] | ||
""" | ||
if timeout == None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the declaration, you can just do call_parallel(funcs, timeout=DEFAULT_TIMEOUT)
The None
thing is useful if there's an object you want to assign (b/c python keeps using the same instance of the object), but for an integer it's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
app/util/concurrency.py
Outdated
|
||
start_time = get_now() | ||
for func in funcs: | ||
new_thread = ResultThread(None, target=func, daemon=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit more concise:
threads = [ResultThread(None, target=func, daemon=True) for func in funcs]
for thread in threads:
thread.start()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The list comprehension ran into an overwriting issue like this:
https://stackoverflow.com/questions/32595586/in-python-why-do-lambdas-in-list-comprehensions-overwrite-themselves-in-retrosp
Didn't want to invest too much time into this, so just using a for..in loop.
app/util/concurrency.py
Outdated
""" | ||
class ResultThread(Thread): | ||
def run(self): | ||
self.result = self._target() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally you're not supposed to call _
-prefixed properties directly (since Python doesn't have private methods, it's just done by convention). I don't know anything about python threads, but this seems like a red flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading through this more, is the target
function not supposed to have a return value? We could also like construct a bunch of functions that store their values in an array (although that sounds like more overhead than we want...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (overrode init to use a variable self.result)
app/util/concurrency.py
Outdated
class ResultThread(Thread): | ||
def run(self): | ||
self.result = self._target() | ||
def get_result(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need an accessor method? I think we could just do thread.result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, Java habit
threads.append(new_thread) | ||
new_thread.start() | ||
for thread in threads: | ||
time_elapsed = get_now() - start_time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How important is the extra few milliseconds here? I think there's a bit less complexity if we just pass in timeout, even if technically the thread gets to run for a few extra milliseconds. Then we don't need to worry about get_now()
, recording start time, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thread.join's first argument is the amount of time we're actually waiting for the thread to finish since calling thread.join on that thread. As an extreme example, if there are 20 threads, and each one exceeds our maximum timeout of 10 seconds, we will wait for 200 seconds. Keeping the 'start time' keeps this timeout relative to when we actually kicked them all off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooooh, interesting. Makes sense.
app/util/concurrency.py
Outdated
thread.join(timeout - time_elapsed) | ||
# If the thread is still alive, it's running past the timeout. | ||
if thread.is_alive(): | ||
results.append(None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it seems like we'd want a way to distinguish between the function returning None and the function timing out - could we change this to not fail silently? I think maybe we'd want to throw an error so that the client could retry maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
threads.append(new_thread) | ||
new_thread.start() | ||
for thread in threads: | ||
time_elapsed = get_now() - start_time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooooh, interesting. Makes sense.
Parallelized get_movie calls in movie_preference_controller by creating a utility function which calls service calls in parallel.