Fix/overwrite behavior in combined summary #69

Closed
wants to merge 2 commits into from
12 changes: 12 additions & 0 deletions push_combined_summary_to_es.py
@@ -1,3 +1,6 @@
+from datetime import datetime, timedelta
+
+import pytz
import tqdm
import traceback
from loguru import logger
@@ -16,6 +19,8 @@
xml_reader = XMLReader()
elastic_search = ElasticSearchClient()

+now = datetime.now(pytz.UTC)
+
total_combined_files = []
static_dirs = [
'bitcoin-dev',
@@ -46,6 +51,13 @@
# remove timestamps from author's names and collect unique names only
xml_file_data['authors'] = remove_timestamps_from_author_names(xml_file_data['authors'])

+updated_at = xml_file_data['updated']
+
+if (now - updated_at) > timedelta(days=2):
+    continue
+
+del xml_file_data['updated']
+
res = elastic_search.es_client.update(
index=ES_INDEX,
id=file_name,
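
The check added in this hunk skips any combined summary whose `updated` timestamp is more than two days old. Below is a minimal sketch of that comparison as a standalone function. `is_recently_updated` is a hypothetical name, not part of the PR, and the sketch uses the standard library's `timezone.utc` where the PR uses `pytz.UTC`. The `None` and naive-timestamp branches are assumptions about how such inputs could be handled; the PR itself does not handle them.

```python
from datetime import datetime, timedelta, timezone


# Hypothetical helper: the PR performs this comparison inline in the loop body.
def is_recently_updated(updated_at, now, max_age=timedelta(days=2)):
    """Return True when `updated_at` is no older than `max_age` relative to `now`."""
    if updated_at is None:
        # Assumption: a missing timestamp is treated as stale.
        return False
    if updated_at.tzinfo is None:
        # Assumption: naive timestamps are UTC. Subtracting a naive datetime
        # from an aware one would otherwise raise TypeError.
        updated_at = updated_at.replace(tzinfo=timezone.utc)
    return (now - updated_at) <= max_age


now = datetime(2023, 8, 3, tzinfo=timezone.utc)
fresh = datetime(2023, 8, 2, 12, 0, tzinfo=timezone.utc)
stale = datetime(2023, 7, 30, tzinfo=timezone.utc)

for stamp in (fresh, stale):
    action = "push" if is_recently_updated(stamp, now) else "skip"
    print(stamp.isoformat(), action)
```

Capturing `now` once, as the PR does at module level, keeps the cutoff identical for every file processed in a single run.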
20 changes: 19 additions & 1 deletion src/xml_utils.py
@@ -1,6 +1,7 @@
import re
import pandas as pd
from feedgen.feed import FeedGenerator
+from lxml import etree
from tqdm import tqdm
import platform
import shutil
@@ -73,6 +74,7 @@ def read_xml_file(self, full_path):
tree = ET.parse(full_path)
root = tree.getroot()
title = root.findall(".//atom:entry/atom:title", namespaces)[0].text
+updated = datetime.fromisoformat(root.findall(".//atom:entry/atom:updated", namespaces)[0].text)
title_for_id = title.replace('Combined summary - ', '')
id = 'combined_' + clean_title(title_for_id)
summary = root.findall(".//atom:entry/atom:summary", namespaces)[0].text
@@ -107,7 +109,8 @@ def read_xml_file(self, full_path):
'type': "combined-summary",
'domain': domain if domain else None,
'thread_url': link if link else None,
-'indexed_at': indexed_at if indexed_at else None
+'indexed_at': indexed_at if indexed_at else None,
+'updated': updated if updated else None
}
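
The `updated` value returned here comes from the first entry's `<updated>` element, parsed with `datetime.fromisoformat`. Below is a rough sketch of that lookup on an in-memory feed. `read_entry_updated` and the sample feed are illustrative assumptions, and the sketch returns `None` for a missing element, where the PR's `[0]` index would raise `IndexError`.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


# Hypothetical helper: the PR does this inline inside read_xml_file.
def read_entry_updated(xml_text):
    """Return the first entry's <updated> value as a datetime, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find(".//atom:entry/atom:updated", ATOM_NS)
    if node is None or not node.text:
        return None
    # Timestamps such as 2023-08-01T03:25:41.622490+00:00 carry a UTC offset,
    # so the parsed datetime is timezone-aware.
    return datetime.fromisoformat(node.text.strip())


sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Combined summary - BIP 31</title>
    <updated>2023-08-01T03:25:41.622490+00:00</updated>
  </entry>
</feed>"""

updated = read_entry_updated(sample)
print(updated, updated.tzinfo)
```

Because the parsed value is timezone-aware, it can be subtracted from `datetime.now(pytz.UTC)` without a `TypeError`.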


@@ -466,3 +469,18 @@ def generate_local_xml(cols, combine_flag, url):
logger.info(f"No new files are found for: {url}")
else:
logger.info(f"No input data found for: {url}")
+
+def update_url_in_xml(self, xml_file, new_url):
+    tree = etree.parse(xml_file)
+    root = tree.getroot()
+    ns = {'atom': 'http://www.w3.org/2005/Atom'}
+
+    # Find the <entry> element and update its <link> with rel="alternate"
+    for entry in root.findall('atom:entry', ns):
+        for link in entry.findall('atom:link', ns):
+            if link.attrib.get('rel') == 'alternate':
+                link.set('href', new_url)
+                logger.info(f"Updated link: {link.attrib}")
+
+    # Save the updated XML file
+    tree.write(xml_file, encoding='utf-8', xml_declaration=True, pretty_print=True)
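
The link rewrite in `update_url_in_xml` can be tried without touching files on disk. The sketch below is an assumption-laden variant, not the PR's code: it swaps `lxml` for the standard library's `xml.etree.ElementTree` so that it runs without extra dependencies, works on a string instead of a file path, and uses a hypothetical function name and example URLs.

```python
import xml.etree.ElementTree as ET

ATOM_URI = "http://www.w3.org/2005/Atom"
NS = {"atom": ATOM_URI}


# Hypothetical string-based variant of the PR's update_url_in_xml.
def set_alternate_link(xml_text, new_url):
    """Point every entry's rel="alternate" link at new_url.

    Returns the serialized feed and the number of links changed.
    """
    # Keep Atom as the default namespace on output instead of an ns0: prefix.
    ET.register_namespace("", ATOM_URI)
    root = ET.fromstring(xml_text)
    changed = 0
    # Like the PR, this only matches <entry> elements directly under the root.
    for entry in root.findall("atom:entry", NS):
        for link in entry.findall("atom:link", NS):
            if link.get("rel") == "alternate":
                link.set("href", new_url)
                changed += 1
    return ET.tostring(root, encoding="unicode"), changed


feed = (
    '<feed xmlns="http://www.w3.org/2005/Atom"><entry>'
    '<link href="https://example.org/old" rel="alternate"/>'
    '<link href="https://example.org/self" rel="self"/>'
    "</entry></feed>"
)

new_xml, changed = set_alternate_link(feed, "https://example.org/new")
print(changed, new_xml)
```

Returning the count of changed links makes it easy to detect feeds with no `rel="alternate"` link, a case the PR's version passes over silently.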
@@ -48,7 +48,7 @@
<id>2</id>
<title>Combined summary - Adding request/reply id in messages</title>
<updated>2023-08-01T03:26:12.874194+00:00</updated>
-<link href="https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2012-April/001387.html" rel="alternate"/>
+<link href="https://gnusha.org/url/https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2012-April/001375.html" rel="alternate"/>
<summary>Amidst an email conversation regarding a proposed improvement to the Bitcoin protocol, the author of the proposal defends it against claims that it would make the protocol more stateful. They argue that each reply is only the result of the current request and does not introduce new session information. On the other hand, another individual suggests that there is still state involved in the proposal, as certain commands require keeping track of previous requests. However, the author counters this argument by stating that many of the described commands already have a natural nonce and do not require additional administration. The proposed change in question focuses on distinguishing between replies and spontaneous notifications from the other peer, with all state still being tracked locally on the client side. The author is confused as to why the proposal would be seen as making the protocol more stateful, as each reply is simply a result of the current request and does not introduce new session information. Additionally, the proposed change does not directly improve peers that do not answer requests, but rather enables easier implementation of this improvement when all peers are running the modified code. The proposal aims to add a new "requestid" field in messages as a part of improving blockchain downloading. This addition would provide context information to help distinguish between responses to "getblocks" requests and spontaneous "inv" notifications. However, some members of the email conversation express concerns about the change. They argue that stateless protocols are easier to implement and prove correct, and they worry about the potential costs and vulnerabilities associated with relying on external parties for state maintenance. Despite these concerns, the discussion emphasizes the importance of ensuring the reliability and security of the Bitcoin network. In the email exchange, Christian Bodt proposes a bitcoin protocol improvement that involves adding a request/reply ID in all messages. This is intended to facilitate robust blockchain downloading by providing better handling of response and request messages. However, Jeff Garzik disagrees with the proposal, stating that the problems mentioned can be addressed without adding "requestid" fields. He argues that stateless protocols are easier to implement and prove correct, and he cautions against relying on external parties for state maintenance due to the potential for exploitation. The context also mentions a patch for modifying the behavior of the Bitcoin protocol related to making a second "getblocks" request. This modification always results in one "inv" reply with [0-500] elements and removes the filtering on previously transmitted block invs. The patch can be found on the GitHub repository of the Bitcoin project. In another email exchange between Pieter Wuille and Gavin Andresen, the topic of adding request/reply ID to all messages in the Bitcoin protocol is discussed. While Andresen sees it as a reasonable improvement, Wuille points out that the Bitcoin P2P protocol is mostly stateless, which enhances its security. Wuille suggests the addition of a "denied" message instead, to indicate when a client is unwilling to answer and report transactions not accepted into the memory pool. Gavin Andresen posts on the bitcoin developers' mailing list about the proposed improvement suggested by Christian Bodt. The proposal involves adding a request/reply ID in all messages similar to the "checksum" field. Andresen finds the proposal reasonable and seeks others' opinions. However, Pieter Wuille notes that the Bitcoin P2P protocol is not fully request-reply based, making the proposed change less intuitive. Wuille suggests including an additional "denied" message to indicate an unwillingness to answer and report rejected transactions. Christian Bodt from France proposes a Bitcoin protocol improvement by adding a "request/reply id" field in all messages. This aims to facilitate robust blockchain downloading by providing context information. Christian has already implemented a prototype of this proposal and shares it for reference. The discussion surrounding this proposal arises from reading the PONG BIP and observing a similar nonce field, leading to uncertainty about the necessity of the nonce field with the presence of request/reply ids.</summary>
<published>2012-04-13T06:30:49+00:00</published>
</entry>
@@ -16,7 +16,7 @@
<id>2</id>
<title>Combined summary - Announcing the IFEX Project</title>
<updated>2023-08-01T03:26:54.647274+00:00</updated>
-<link href="https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2012-April/001391.html" rel="alternate"/>
+<link href="https://gnusha.org/url/https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2012-April/001389.html" rel="alternate"/>
<summary>A project called XChange has been developed to create a unified API for accessing financial exchanges, starting with Bitcoin exchanges like MtGox and Intersango. The goal of the project is to expand its support to other exchanges in the future. However, one challenge faced by the XChange team is that each exchange has its own data model and protocol. To address this, they aim to provide a reference implementation to guide exchanges in publishing their data. In addition to XChange, there is also the Internet Financial EXchange (IFEX) Project, which aims to enhance interoperability between various financial settlement systems. This includes both conventional and digital currencies, alternative financial communities, and financial service providers. The IFEX Protocol, still under development, seeks to establish a standard protocol for transaction and settlement path negotiation involving any financial instruments, currencies, or assets. The objective is to facilitate better connectivity, reduce settlement fees, enable real-time redundant financial routing, and handle arbitrary instruments, currencies, or assets. The IFEX Project offers a broader and more inclusive scope compared to existing vendor-specific APIs and conventional finance industry networking protocols. It does not aim to be a currency or settlement network itself, but rather a mechanism for bridging them. The project hopes to move towards an open-source implementation of the IFEX protocol that can work with major and emerging settlement networks. To further enhance interoperability, the IFEX Project has proposed the IIBAN Proposal (v1), which introduces a 13-character identifier for financial endpoint identification. This identifier, available on the project's website, is theoretically compatible with conventional banking infrastructure in Europe and other countries. By providing this identifier, the project aims to allocate identifiers in a democratic manner for the benefit of the community. Overall, both the XChange project and the IFEX Project aim to improve interoperability and connectivity within the financial ecosystem. While XChange focuses on creating a unified API for accessing financial exchanges, the IFEX Project aims to establish a standard protocol for transaction and settlement path negotiation across various financial systems. The hope is that these initiatives will lead to better connectivity, lower settlement fees, and more efficient handling of financial instruments, currencies, and assets for the benefit of all stakeholders in the community.</summary>
<published>2012-04-13T10:16:07+00:00</published>
</entry>
2 changes: 1 addition & 1 deletion static/bitcoin-dev/April_2012/combined_BIP-31.xml
@@ -20,7 +20,7 @@
<id>2</id>
<title>Combined summary - BIP 31</title>
<updated>2023-08-01T03:25:41.622490+00:00</updated>
-<link href="https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2012-April/001374.html" rel="alternate"/>
+<link href="https://gnusha.org/url/https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2012-April/001372.html" rel="alternate"/>
<summary>During a discussion on April 11, 2012, Luke-Jr and Mike Hearn debated the implementation of a Bitcoin Improvement Proposal (BIP) for the pong message. Mike shared a link to BIP 0031 on the en.bitcoin.it wiki, which outlined rules for handling the pong message. However, Luke-Jr had previously suggested using protocol version bump 60001 instead. In reference to this, Pull #1081 was mentioned as making some revisions along these lines. On the same day, Mike Hearn emailed Jeff Garzik about the BIP for the pong message. The email included a link to the Bitcoin Improvement Proposals (BIP) wiki page. Although the exact context of the message is unclear, it seems that there was a discussion regarding the protocol version bump, specifically whether to adopt the value 60001. Jeff requested a BIP related to the pong message, and the provided link directed to the Bitcoin Wiki, where BIP 0031 was described in detail. This proposal outlined specific rules for handling the pong message, which serves to check if a connection between two nodes is still active. The BIP included comprehensive technical specifications and guidelines for implementing these rules. Adhering to these standards ensures compatibility among Bitcoin implementations, ultimately enhancing the network's overall quality and reliability.</summary>
<published>2012-04-11T17:00:25+00:00</published>
</entry>
@@ -28,7 +28,7 @@
<id>2</id>
<title>Combined summary - BIP to improve the availability of blocks</title>
<updated>2023-08-01T03:27:44.714263+00:00</updated>
-<link href="https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2012-April/001416.html" rel="alternate"/>
+<link href="https://gnusha.org/url/https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2012-April/001408.html" rel="alternate"/>
<summary>The discussion in this context revolves around a proposal to improve the availability of blocks in Bitcoin. One member proposes adding a checksum for the blocks to save bandwidth among behaving nodes, but another argues that bandwidth is not the bottleneck of the Bitcoin system and it is the immense time needed to validate the blockchain. Clients should never send blocks first, but always an inv packet, then request the block. Amir Taaki, a bitcoin developer, stated that the proposed Bitcoin Improvement Proposal (BIP) to improve the availability of blocks is an optimization that is not needed because bandwidth is not the bottleneck of the Bitcoin system. He also stated that clients should never send blocks first and this change will cause disruption and bring little benefit. On the other hand, Rebroad suggested adding a hash to the block header to enable nodes to reject an already downloaded block, which would aid in saving bandwidth among well-behaving nodes. Wladimir noted that it would make sense for clients to be able to reject blocks they already have, but there would be no need for a BIP if you want to somehow fetch the block chain outside the bitcoin protocol. It could be downloaded from some http server or passed along on a USB stick. Currently, nodes with restrictive or slow internet have options such as going via a tor proxy; however, the problem with multiple receptions of the same block still occurs due to latency. If one is behind such a slow internet connection and concerned about every bit of bandwidth, it is better to run a lightweight node like Electrum. The email thread discusses a proposal to improve the availability of blocks in the Bitcoin system. The proposal suggests advertising hash in addition to the size in the header block, which would help nodes reject downloads if they already have a matching block to save bandwidth. Wladimir agrees that it would make sense for clients to be able to reject blocks they already have. 
However, he points out that introducing a new feature like this can cause maintenance and compatibility issues. Another part of the proposal is to allow nodes to request upload and download blocks that have already been partially downloaded. However, Wladimir thinks that introducing a BIP for this is unnecessary as users can simply download the blockchain from an HTTP server or pass it along on a USB stick. He suggests that running a lightweight node like Electrum could be a better option for those with restrictive or slow internet connections. In a forum discussion on SourceForge, Rebroad proposed adding hash advertisements in the header of a block, which would help nodes determine if they want to reject a download. This proposal could aid in saving bandwidth among behaving nodes. Rebroad also suggested modifying existing methods or adding a new method to allow nodes to request upload and download partially downloaded blocks. This would help nodes obtain the blockchain who have restrictive ISPs, especially if they are being served on port 80 or 443. Another option for those with slow or restrictive internet is to run a lightweight node like Electrum. However, Wladimir pointed out that even with partial blocks, the download will still be substantial. The proposal put forth by Ed to Bitcoin developers is to extend the protocol to allow partial block download and upload for people with intermittent or restricted internet connectivity. The proposal suggests adding the hash along with the size in the header of a block, which would help nodes determine whether they want to reject the download if they already have the same block. This addition could save bandwidth amongst behaving nodes. The other part of the proposal is to allow nodes to request the upload and download of blocks that have already been partially downloaded, which can be done by modifying existing methods of upload and download or by adding a new method, possibly using HTTP/HTTPS or similar methods. 
This could help nodes obtain the blockchain if they have restrictive ISPs, especially if it's served on ports 80 or 443. Moreover, web caches could keep caches of the blockchain, making it more widely available. Currently, nodes with restrictive or slow internet have some options, such as going via a tor proxy, but there are problems with multiple receptions of the same block due to latency. In conclusion, Ed's proposal aims to make the blockchain more accessible to nodes with limited internet connectivity while also saving bandwidth among nodes.</summary>
<published>2012-04-30T20:54:37+00:00</published>
</entry>