Enhance documentation #53

kouloumos · 2024-04-10T13:12:25Z

Refined project documentation and added comments during an in-depth review to map out the project's architecture and component interactions.

@urvishp80 I would appreciate it if you could review and verify that my explanations of the flow and of what each script does are correct. I also prefixed a few comments with @?. Those are questions I have, and I thought it would be easier to discuss and answer them in-line. I will delete them (or convert them to @TODO) before merging this.

The cron jobs are also documented in the sequence diagram I created for Bitcoin Search & friends. I would also appreciate a review of that one.

urvishp80

Hi @kouloumos, I've reviewed all the files you updated and left comments where necessary. Please feel free to add the TODOs and merge this PR into main after resolving the conflicts. Once it is merged, I'll start refactoring the code you highlighted, along with some code I found while investigating.

urvishp80 · 2024-04-25T05:17:02Z

src/xml_utils.py

-        if len(xmls_list) > 0 and not any(combined_filename in item for item in files_list):
+        # If individual summaries exist but no combined summary,
+        # extract and append their content to the dictionary
+        # @? in what scenario is this true?


We use it as a fallback strategy. There are many cases where the individual summaries are present but the combined one is not, for example:
i. The generated combined summary file isn't up to standard because of some low-quality individual posts, so we have to delete it manually and regenerate it.
ii. For Delving Bitcoin data, we haven't generated all the XML files for the period before November 2023, so the fallback helps in the exceptional cases where only individual summary files exist.

So, let's keep it as it is.
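The fallback rule described above can be sketched as follows. This is a hypothetical, simplified helper: `select_summary_sources` and its parameters are illustrative names, not the actual `xml_utils.py` API, and it assumes that every non-combined `.xml` file in a thread's file list is an individual summary.

```python
from typing import List


def select_summary_sources(files_list: List[str], combined_filename: str) -> List[str]:
    """Pick which XML summaries to read for one thread (hypothetical helper).

    Prefer the combined summary. If it is missing (for example, because it
    was deleted so it could be regenerated), fall back to the individual
    post summaries instead.
    """
    # Substring match, mirroring `combined_filename in item` from the diff
    combined = [f for f in files_list if combined_filename in f]
    if combined:
        return combined
    # Fallback: treat every remaining XML file as an individual summary
    return [f for f in files_list if f.endswith(".xml")]
```

For example, `select_summary_sources(["post_1.xml", "post_2.xml"], "combined_thread.xml")` returns both individual files, because no combined summary is present.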

urvishp80 · 2024-04-25T06:08:26Z

generate_weekly_newsletter_json.py

@@ -55,6 +55,8 @@
        logger.success(f"TOTAL THREADS RECEIVED FOR '{dev_name}': {len(data_list)}")

        # NEW THREADS POSTS
+        # @TODO you already identify the original post by type==original_post


Thanks for pointing this out. It seems like we can implement this change now. Previously, not all domains had a type field, so we used to identify the original post based on created_at and title.
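A minimal sketch of the transition described above, assuming each post is a dict with `title`, `created_at` and an optional `type` key. The helper name `original_posts` is hypothetical. The "earliest `created_at` per `title`" rule is an assumed reading of the older heuristic, and comparing timestamps as strings assumes they are all ISO 8601 strings in the same format and timezone.

```python
from typing import Dict, List


def original_posts(posts: List[Dict]) -> List[Dict]:
    """Pick the opening post of each thread (hypothetical helper).

    Uses the explicit ``type`` field when it exists; for posts without it,
    falls back to taking the earliest ``created_at`` for each ``title``.
    """
    # Preferred path: domains that label the opening post explicitly
    typed = [p for p in posts if p.get("type") == "original_post"]

    # Legacy path: posts from domains that have no type field at all
    earliest: Dict[str, Dict] = {}
    for p in posts:
        if "type" in p:
            continue
        current = earliest.get(p["title"])
        # Assumes same-format ISO 8601 strings, which sort chronologically
        if current is None or p["created_at"] < current["created_at"]:
            earliest[p["title"]] = p
    return typed + list(earliest.values())
```

Once every domain provides `type`, the legacy branch can simply be dropped.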

urvishp80 · 2024-04-25T07:58:00Z

src/xml_utils.py

+
+        # For each individual summary (XML file) that exists for the
+        # given thread, extract and append their content to the dictionary
+        # @? This method is called for every post without a summary, which means that


This function, file_not_present_df, does need refactoring, especially better file handling, fewer operations inside the loop, and streamlined data-appending logic.

Please convert it to a TODO so I can make the necessary changes.
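One way to reduce the work done inside the loop is to build a lookup table once, before iterating over posts, instead of filtering `files_list` for every post. This is a sketch under stated assumptions. `index_files_by_thread` is a hypothetical name, and the one-directory-per-thread layout is assumed, not confirmed by this PR.

```python
import posixpath
from collections import defaultdict
from typing import Dict, List


def index_files_by_thread(files_list: List[str]) -> Dict[str, List[str]]:
    """Group XML paths by their parent directory in a single pass (hypothetical).

    Assumes each thread's summaries live in their own directory, so a post's
    files can later be looked up by key instead of rescanning files_list.
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for path in files_list:
        # e.g. "t1/p1.xml" is filed under the key "t1"
        index[posixpath.dirname(path)].append(path)
    return dict(index)
```

Inside the per-post loop, `index.get(thread_dir, [])` would then replace a scan over the full `files_list`.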

urvishp80 · 2024-04-25T08:07:43Z

src/xml_utils.py

@@ -180,6 +204,9 @@ def file_not_present_df(self, columns, source_cols, df_dict, files_list, dict_da
                    self.append_columns(df_dict, file, title, namespace)

                    if combined_filename in file:
+                        # @? the code will never reach this point


True. Let's add it as a TODO, and I will update this function.

kouloumos · 2024-04-26T07:54:07Z

@urvishp80 I rebased and converted my observations to TODOs.

During the rebase, I also modified a comment based on my observation here.

kouloumos changed the title from "Add docs" to "Enhance documentation" on Apr 10, 2024

urvishp80 reviewed Apr 25, 2024


update README · eadf4c9

kouloumos force-pushed the add-docs branch from da80893 to 4f847bb on April 26, 2024 07:40

add comments · 0d7cf28

kouloumos force-pushed the add-docs branch from 4f847bb to 0d7cf28 on April 26, 2024 07:48

urvishp80 merged commit e5445b7 into bitcoinsearch:main Apr 26, 2024
