Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AsyncMarkItDown as a wrapper #32

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

0xRaduan
Copy link

@0xRaduan 0xRaduan commented Dec 15, 2024

My take on how this code should support async, given all underlying libraries are not supporting async.

For more details/context, see: #13 (comment)

pyproject.toml Outdated Show resolved Hide resolved
tests/test_async_markitdown.py Outdated Show resolved Hide resolved
@0xRaduan
Copy link
Author

@microsoft-github-policy-service agree

@0xRaduan 0xRaduan mentioned this pull request Dec 15, 2024
@gagb gagb requested a review from jackgerrits December 16, 2024 21:59
@gagb
Copy link
Contributor

gagb commented Dec 16, 2024

@0xRaduan -- thanks for the PR.

@afourney @jackgerrits @0xRaduan,
I love this PR, but I am worried about the code duplication because of asynchronous tests.
What if we stuck with sync implementation, but added an FAQ showing how a user could create an asynchronous wrapper similar to your approach.

This was users know how to create an asynchronous wrapper, but we avoid the code/test duplication? Please lmk your thoughts.

@gagb gagb self-assigned this Dec 16, 2024
@0xRaduan
Copy link
Author

0xRaduan commented Dec 17, 2024

Hey @gagb - good feedback, I agree.

In general, this async code is just a wrapper over sync code [1], so technically you don't need to have a full suite of async tests, as all of the functionality should be tested in normal tests.

I would suggest keeping just one async test to verify that the wrapper works, and remove everything else. Initially I've migrated all tests to just verify that it works identically for my own's sake.

@gagb
Copy link
Contributor

gagb commented Dec 17, 2024

Hey @gagb - good feedback, I agree.

In general, this async code is just a wrapper over sync code [1], so technically you don't need to have a full suite of async tests, as all of the functionality should be tested in normal tests.

I would suggest keeping just one async test to verify that the wrapper works, and remove everything else. Initially I've migrated all tests to just verify that it works identically for my own's sake.

I like your framing. Would love @jackgerrits feedback on the strategy.

@gagb
Copy link
Contributor

gagb commented Dec 17, 2024

@0xRaduan can you fix the precommit and test errors

@0xRaduan
Copy link
Author

@gagb - fyi, updated the code, fixed linter, removed all tests and just kept one async test to make sure wrapper works in async environment.

@0xRaduan 0xRaduan requested a review from gagb December 18, 2024 09:49
Comment on lines +21 to +29
async def __aenter__(self):
"""Async context manager entry."""
return self

async def __aexit__(self, exc_type, exc_val, exc_tb):
"""Async context manager exit."""
pass
Copy link
Member

@jackgerrits jackgerrits Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why bother making this a context manager if it doesn't do anything? Not sure if I'm missing something here though

Comment on lines +43 to +46
func = partial(self._markitdown.convert, file_path, **kwargs)
return await self._loop.run_in_executor(None, func)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you confirm that this will run in a non blocking way?

@gagb gagb added the awaiting op response The PR is awaiting response/edits from the original poster. label Dec 20, 2024
Copy link
Contributor

@gagb gagb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please resolve feedback by Jack!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting op response The PR is awaiting response/edits from the original poster.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants