Refactoring
General Guide
Since it is deployed directly from the repository, master
branch will stay.
We will have a separate main
branch and will merge refactored code into main
.
Once we finish an integration tests and confirms that it is doing at least the same job as the current implementation,
we can set the main
as a new master.
TL;DR: Open a pull request into main
General Guide
Since it is deployed directly from the repository, master
branch will stay.
We will have a separate main
branch and will merge refactored code into main
.
Once we finish an integration tests and confirms that it is doing at least the same job as the current implementation,
we can set the main
as a new master.
TL;DR: Open a pull request into main
, not into master
.
Higher Level TODOs
- Create a package
scicat_ingestor
and gradually move all the existing code to the package.
It not for deployment so it just needs to be accessible to the entry point script (The command or a script that is used for
deployment) - Setting up smaller scope tests environments
- Setting up automated continuous integration tests environments
- Update documentation
- Turn
main
branch the new master branch and archivemaster
Brainstorming what should be done for refactoring.
Feel free to shoot down the ideas : D
Encapsulating partial code into modules/functions
For example, these can be split into another module...?
scicat-filewriter-ingest/online_ingestor.py
Lines 116 to 124 in 3c5c22e
Kafka Consumer/Message Handling
As it is carrying the core-logic, we maybe better wrap the confluent kafka interfaces so that we can easily test them as well...?
For example,
scicat-filewriter-ingest/online_ingestor.py
Lines 71 to 77 in 3c5c22e
This kind of error-handling can be separated into a smaller function.
Or
Wrap this into a function or automate this somehow...?
scicat-filewriter-ingest/online_ingestor.py
Lines 32 to 40 in 3c5c22e
Logging
Logging seems like to require some configuration steps but despite of the configuration the logging interface logger.log(info/debug/error)
should be the same.
But since the configuration of the logger requires file or link handling, it'd better be tested I guess...?
Ingestor
This guy definitely seems like to need some chopping.
scicat-filewriter-ingest/ingestor_lib.py
Line 222 in 3c5c22e
For example,
I think this should be wrapped into a separate file handling interface so that we can test them.
scicat-filewriter-ingest/ingestor_lib.py
Lines 258 to 261 in 3c5c22e
Redefining Interfaces
Nexus structure path
scicat-filewriter-ingest/ingestor_lib.py
Lines 28 to 33 in 3c5c22e
This mixture of filter
and indexing
can maybe refactored into more explicit way...?
But I also think it is already working conveniently already... hmm...
Ingestor
-
Maybe not redefining, but just better type-hinting or wrapping some of the arguments into a tighter data structure.
scicat-filewriter-ingest/ingestor_lib.py
Line 222 in 3c5c22e
-
Some of
if
statements can be merged into one if we letget_instrument
to acceptdefaultInstrument
likeget_nested_value_with_default
.
scicat-filewriter-ingest/ingestor_lib.py
Lines 384 to 392 in 3c5c22e
Documentation / Docstrings
Maybe we can collect the comments into docstrings. Copilot is good at it : D
And the documentation in the readme might be sufficient but we probably need to clean that up too.
Here is a list of some sections that might be needed in the documentation.
- Deployment Environment/Method
- Whole structure (briefly, maybe mermaid graph : D)
- Logging options
- API dependencies
- Kafka Consumer/Message Handling
It seems quite straight forward but since it's the core part of it. Maybe we can document it better.
Shall we write some mermaid graph : D....??