Refactoring #21

Open: wants to merge 50 commits into base: main

Conversation

Andrey170170
Collaborator

Completed the first version of the Distributed downloader package. It is runnable and installable.
Some non-critical issues remain:

  • Tests still need to be written.
  • The package should expose only selected functions and keep internal classes out of the public interface.
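
One common way to approach the second point is to prefix internal classes with an underscore and list the public names in `__all__`. The sketch below is only an illustration: the package name `demo_downloader`, the class `_Scheduler`, and the function `download_images` are hypothetical and not taken from this PR. It builds a throwaway module in memory so the effect of `__all__` on `from ... import *` can be checked directly.

```python
import sys
import types

# Hypothetical package source; every name here is illustrative only.
PACKAGE_SOURCE = '''
__all__ = ["download_images"]


class _Scheduler:
    """Internal helper; the leading underscore marks it as private."""

    def plan(self, urls):
        return sorted(urls)


def download_images(urls):
    """Public entry point that delegates to the internal class."""
    return _Scheduler().plan(urls)
'''


def build_package(name="demo_downloader"):
    """Create an in-memory module from PACKAGE_SOURCE and register it."""
    module = types.ModuleType(name)
    exec(PACKAGE_SOURCE, module.__dict__)
    sys.modules[name] = module
    return module


def star_import_names(name):
    """Return the names that `from <name> import *` brings into scope."""
    namespace = {}
    exec(f"from {name} import *", namespace)
    # Drop dunder entries such as __builtins__ that exec adds itself.
    return {key for key in namespace if not key.startswith("__")}


build_package()
public_names = star_import_names("demo_downloader")
print(public_names)
```

Note that `__all__` and the underscore prefix are conventions: they control star-imports and signal intent, but they do not prevent a caller from importing `_Scheduler` explicitly.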

Andrey170170 and others added 30 commits June 11, 2024 18:52
Prepared file structure for package creation
some little fixes
fixes for scripts to be runnable in new file structure
Added config file in yaml format
Created new wrapper scripts that control the whole process, to make the project more package-like.
Finished `main.py` script
Added folder structure initialization steps to `server_prep.py` script (now renamed to `initialization.py`)
Created `fake_profiler.py`, which initializes profiles with a constant rate_limit
Rewrote `MPI_download_prep` to follow the new structure logic
Moved downloader job submission into schedule_creator
Added a restriction that prevents the user from running main.py if schedule_creation has already been scheduled and has not completed yet
Some minor changes
Small fix
Small fix
Added filtering scripts: one based on image size and one based on MD5 hash matches
Also added scripts to delete the images that were filtered out
Some minor changes and fixes
Added name_table to keep names stable across several stages of data transfer
minor updates
Fixed bug in schedule creation script.
Made downloader scripts consistent with new format of configuration (using `.yaml` file)
Added a verification step inside the downloading job (`slurm` files) to reduce the total number of jobs that are scheduled
Added a check to the main function for a possible infinite loop and for whether all servers have already been downloaded
Added scripts to perform data merging
some small adjustments
Moved the code of all filters into the new file structure.
Changed how the registry works; it now uses decorators
Added wrapper runner scripts for each stage of the tool
Completed tools refactoring; not tested yet
# Conflicts:
#	README.md
#	requirements.txt
Some minor fixes
Some minor fixes
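
For readers unfamiliar with the decorator-based registry mentioned in the commits above, here is a minimal sketch of the general pattern. It is not the code from this PR: `FILTER_REGISTRY`, `register_filter`, `keep_large_images`, `run_filter`, and the 224-pixel threshold are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical registry that maps a filter name to a predicate function.
FILTER_REGISTRY: Dict[str, Callable[[dict], bool]] = {}


def register_filter(name: str):
    """Return a decorator that stores the decorated function under `name`."""

    def decorator(func: Callable[[dict], bool]) -> Callable[[dict], bool]:
        if name in FILTER_REGISTRY:
            raise ValueError(f"filter {name!r} is already registered")
        FILTER_REGISTRY[name] = func
        return func  # the function itself is left unchanged

    return decorator


@register_filter("size_based")
def keep_large_images(image: dict) -> bool:
    """Keep images whose sides are both at least 224 pixels (assumed threshold)."""
    return image["width"] >= 224 and image["height"] >= 224


def run_filter(name: str, images: List[dict]) -> List[dict]:
    """Look up a registered filter by name and apply it to a list of images."""
    keep = FILTER_REGISTRY[name]
    return [image for image in images if keep(image)]


sample = [
    {"id": "a", "width": 100, "height": 500},
    {"id": "b", "width": 640, "height": 480},
]
kept = run_filter("size_based", sample)
print([image["id"] for image in kept])
```

The appeal of this design is that adding a new filter only requires defining a decorated function; the runner script can then select it by name without a hand-maintained lookup table.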
Andrey170170 and others added 5 commits July 23, 2024 15:27
Extracted Config and Checkpoint logic into separate classes
Updated all scripts to follow it
Updated tools to follow new Config/Checkpoint logic
Refactored code to follow snake_case scheme for all file fields
Added config checking mechanism (compares config with a template)
Added reset options for the downloader and tools, so they can now be relaunched automatically
Updated structure to be package installable
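
The config checking mechanism (comparing a config with a template) can be pictured with a sketch like the following. This is an assumed approach, not the PR's implementation: the function `missing_keys` and the keys `path_to_input`, `downloader_parameters`, `batch_size`, and `rate_limit` are hypothetical, and the configs are shown as plain dictionaries, as they would look after a `.yaml` file has been loaded.

```python
from typing import Any, Dict, List


def missing_keys(template: Dict[str, Any], config: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted paths of template keys that the config lacks.

    A key whose template value is a mapping is checked recursively. If the
    config holds a non-mapping value there, the key itself is reported.
    """
    missing: List[str] = []
    for key, expected in template.items():
        path = f"{prefix}{key}"
        if key not in config:
            missing.append(path)
        elif isinstance(expected, dict):
            actual = config[key]
            if isinstance(actual, dict):
                missing.extend(missing_keys(expected, actual, prefix=f"{path}."))
            else:
                missing.append(path)
    return missing


# Hypothetical template and user config; values in the template are ignored.
template = {
    "path_to_input": None,
    "downloader_parameters": {"batch_size": None, "rate_limit": None},
}
config = {
    "path_to_input": "/data/input.csv",
    "downloader_parameters": {"batch_size": 10},
}

problems = missing_keys(template, config)
print(problems)
```

Reporting dotted paths rather than failing on the first missing key lets the user fix the whole config in one pass.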
Andrey170170 and others added 5 commits July 29, 2024 23:42
Updated documentation (Readme.md file)
Added example for ignored_servers
Small readme fixes
Andrey170170 and others added 6 commits August 5, 2024 17:55