Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance documentation #53

Merged
merged 2 commits into from
Apr 26, 2024
Merged

Enhance documentation #53

merged 2 commits into from
Apr 26, 2024

Conversation

kouloumos
Copy link
Member

Refined project documentation and added comments during an in-depth review to map out the project's architecture and component interactions.

@urvishp80 I would appreciate if you can review and verify that my attempt to explain the flow and what each script does is correct. Also, I prefixed a few comments with @?, those are questions that I have and I thought that it would be easier to discuss/answer them in-line. I will delete them (or transform them to @TODO) before merging this.

The cron-jobs are also documented on the sequence diagram that I created for Bitcoin Search & friends. I would also appreciate a review of that one.

@kouloumos kouloumos changed the title Add docs Enhance documentation Apr 10, 2024
Copy link
Collaborator

@urvishp80 urvishp80 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kouloumos, I've reviewed all the files that you've updated and provided comments wherever necessary. Please feel free to add TODO and merge this PR to the main after resolving conflicts. Once merged, I'll start refactoring the code you highlighted along with some I found while investigating the code.

src/xml_utils.py Outdated
if len(xmls_list) > 0 and not any(combined_filename in item for item in files_list):
# If individual summaries exist but no combined summary,
# extract and append their content to the dictionary
# @? in what scenario is this true?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using it as a fallback strategy, there are many cases where individual summaries are present and not the combined ones, for example:
i. The combined summary file generated isn't up to the mark due to some garbage individual posts. So, we have to manually delete the combined summary file and regenerate it.
ii. For delving bitcoin data we haven't generated all the XML files for the month before November 2023 so it might help in some exceptional cases of individual summary files.

So, let's keep it that way only.

@@ -55,6 +55,8 @@
logger.success(f"TOTAL THREADS RECEIVED FOR '{dev_name}': {len(data_list)}")

# NEW THREADS POSTS
# @TODO you already identify the original post by type==original_post
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out, It seems like we can implement this change now. Earlier not all the domains had a type field so we used to filter the original post based on created_at and title.

src/xml_utils.py Outdated

# For each individual summary (XML file) that exists for the
# given thread, extract and append their content to the dictionary
# @? This method is called for every post without a summary, which means that
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function file_not_present_df indeed needs a refactoring. Especially, improvements in file handling, reducing the number of operations inside the loop, and streamlining the data-appending logic.

Please transform it to TODO so I can make the necessary changes in it.

src/xml_utils.py Outdated
@@ -180,6 +204,9 @@ def file_not_present_df(self, columns, source_cols, df_dict, files_list, dict_da
self.append_columns(df_dict, file, title, namespace)

if combined_filename in file:
# @? the code will never reach this point
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, Let's add it to TODO will update this function.

@kouloumos
Copy link
Member Author

@urvishp80 I rebased and converted my observations to TODOs

During rebase, I also modified a comment based on my observation here.

@urvishp80 urvishp80 merged commit e5445b7 into bitcoinsearch:main Apr 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants