[ENH] : Implementation of read_archive function #1438

Closed

@Sabrina-Hassaim Sabrina-Hassaim commented Jan 21, 2025

PR Description

Please describe the changes proposed in the pull request:

1. Implementation of the read_archive Function:

  • Added a new method to read archive files (.zip, .tar, .tar.gz) and extract their contents as a DataFrame or a list of compatible files.
  • Supports CSV and Excel file formats within the archives.
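The discovery step described above could look roughly like the following. This is a minimal, standard-library-only sketch and not the code from this PR: the helper names `list_compatible_files` and `read_member_bytes` and the exact suffix list are assumptions made for illustration.

```python
import tarfile
import zipfile
from pathlib import Path

# Suffixes treated as "compatible" here; the exact set used by the PR is an assumption.
COMPATIBLE_SUFFIXES = (".csv", ".xls", ".xlsx")


def list_compatible_files(archive_path):
    """Return the names of CSV/Excel members inside a .zip, .tar or .tar.gz archive."""
    archive_path = Path(archive_path)
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            names = zf.namelist()
    elif tarfile.is_tarfile(archive_path):
        # The default read mode detects gzip compression transparently.
        with tarfile.open(archive_path) as tf:
            names = [m.name for m in tf.getmembers() if m.isfile()]
    else:
        raise ValueError(f"{archive_path} is not a supported archive")
    return [n for n in names if n.lower().endswith(COMPATIBLE_SUFFIXES)]


def read_member_bytes(archive_path, member_name):
    """Return the raw bytes of one archive member without extracting to disk."""
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            return zf.read(member_name)
    with tarfile.open(archive_path) as tf:
        extracted = tf.extractfile(member_name)
        if extracted is None:  # directories and special members have no payload
            raise ValueError(f"{member_name} is not a regular file")
        return extracted.read()
```

The returned bytes can be wrapped in `io.BytesIO` and passed to `pandas.read_csv` or `pandas.read_excel`, both of which accept file-like objects, to obtain the DataFrame mentioned above.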

2. Unit Tests

  • Added tests to validate the behavior of the read_archive method:
    • Ensure files are read correctly from .zip and .tar.gz archives.
    • Handle cases where the file is not a valid archive or does not contain compatible files.
    • Cover the interactive file-selection behavior.
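Tests of this kind usually need small archive fixtures. The sketch below shows one way to build them in a temporary directory using only the standard library. The helper names (`make_zip`, `make_tar_gz`, `is_archive`) are hypothetical and are not taken from the PR's test suite.

```python
import io
import tarfile
import tempfile
import zipfile
from pathlib import Path

CSV_BYTES = b"a,b\n1,2\n"


def make_zip(path, members):
    """Write a .zip archive at `path` from a {name: bytes} mapping."""
    with zipfile.ZipFile(path, "w") as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)
    return path


def make_tar_gz(path, members):
    """Write a gzip-compressed tar archive at `path` from a {name: bytes} mapping."""
    with tarfile.open(path, "w:gz") as tf:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)  # size must be set before adding the payload
            tf.addfile(info, io.BytesIO(payload))
    return path


def is_archive(path):
    """True when `path` is readable as a zip or tar archive."""
    return zipfile.is_zipfile(path) or tarfile.is_tarfile(path)


def test_fixtures_are_valid_archives():
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        assert is_archive(make_zip(tmp / "data.zip", {"data.csv": CSV_BYTES}))
        assert is_archive(make_tar_gz(tmp / "data.tar.gz", {"data.csv": CSV_BYTES}))


def test_plain_file_is_rejected():
    with tempfile.TemporaryDirectory() as tmp:
        plain = Path(tmp) / "plain.txt"
        plain.write_bytes(b"not an archive")
        assert not is_archive(plain)
```

Both test functions follow the `test_*` naming that pytest collects by default, and they can also be called directly.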

This PR resolves #(put issue number here, and remove parentheses).

PR Checklist

Please ensure that you have done the following:

  1. Open the PR from a feature branch on your fork. Do not PR from <your_username>:dev, but rather from <your_username>:<feature-branch_name>.
  2. If you're not on the contributors list, add yourself to AUTHORS.md.
  3. Add a line to CHANGELOG.md under the latest version header (i.e. the one that is "on deck") describing the contribution.
    • Do use some discretion here; if there are multiple PRs that are related, keep them in a single line.

Automatic checks

There will be automatic checks run on the PR. These include:

  • Building a preview of the docs on Netlify
  • Automatically linting the code
  • Making sure the code is documented
  • Making sure that all tests are passed
  • Making sure that code coverage doesn't go down.

Relevant Reviewers

Please tag maintainers to review.

Member

@ericmjl ericmjl left a comment

@Sabrina-Hassaim thank you for kickstarting this PR! It was a relatively easy one to review, and I just happened to have a small chunk of time to review it. I'm going to request changes here, as it seems to me that anything related to I/O should live in io.py and follow the patterns there. Once we're done with these changes and @samukweku has reviewed, I think we can merge and cut a new release.

Review comments were left on the following files (all now marked outdated):

  • tests/functions/test_complete.py
  • janitor/functions/read_archive.py
  • tests/test_simple.py
  • tests/test_documentation_build.py
  • tests/functions/test_read_archive.py
  • janitor/spark/backend.py

@samukweku
Collaborator

@Sabrina-Hassaim thanks for the PR. There are some failed checks in the CI. Kindly fix those before the maintainers review.

@Sabrina-Hassaim
Author

Hello @samukweku! Thanks for your response.

I’m encountering an issue with building the documentation using mkdocs, and I believe it’s related to the AUTHORS.md file.

Here’s what I did: I added my name to the AUTHORS.md file located in the root directory.

In the mkdocs.yml configuration file, the AUTHORS.md file is referenced in the navigation section, but it points to mkdocs/AUTHORS.md, which contains a relative reference (../AUTHORS.md) to the root file.

However, when I try to build the documentation, it fails both locally and on GitHub Actions with the following error:

ERROR - File not found: AUTHORS.md
ERROR - Error reading page 'AUTHORS.md': [Errno 2] No such file or directory: '/home/runner/work/pyjanitor/pyjanitor/mkdocs/AUTHORS.md'

It seems the ../AUTHORS.md reference in mkdocs/AUTHORS.md is not being resolved correctly. Locally, I ran make docs in my environment, but the same error occurs.

Do you have any idea how I should handle this issue? Would it be acceptable to point mkdocs.yml directly to the root AUTHORS.md file instead of using the relative reference?

Thanks in advance for your guidance! Let me know if more details are needed.
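
One way to narrow down an error like the one above is to check what mkdocs/AUTHORS.md actually is on disk. The sketch below assumes the file is meant to be a symbolic link to the root AUTHORS.md, which this thread does not confirm. The function name `describe_doc_link` is hypothetical.

```python
import os
from pathlib import Path


def describe_doc_link(path):
    """Report whether `path` is a working symlink, a dangling one, a plain file, or absent."""
    path = Path(path)
    if path.is_symlink():
        target = os.readlink(path)
        # Path.exists() follows the link, so False here means the target is missing.
        state = "ok" if path.exists() else "broken"
        return f"symlink ({state}) -> {target}"
    if path.is_file():
        return "regular file"
    return "missing"


if __name__ == "__main__":
    # Run from the repository root; the path comes from the error message above.
    print(describe_doc_link("mkdocs/AUTHORS.md"))
```

A "broken" or "missing" result would be consistent with the "No such file or directory" error. A "regular file" result would suggest that the link was checked out as a plain text file, which Git can do on systems where symlink support is disabled.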

3 participants