Ingester PSC stream misses events #270
The issue has been discovered to be on our side, not any issue in the Companies House PSC stream. It has been traced to faulty handling of the HTTP stream itself: prior to the payload being passed to the next stage, prior to any attempt to parse the JSON, and certainly prior to any actual ingestion attempt or publishing to the Kinesis stream. It seems that incorrect handling of newlines in the stream causes parts of the stream to be silently discarded in some cases. This ultimately leads to events being dropped without errors, without logging, and without even appearing when debugging the payloads received from the stream itself. The only way to observe this issue is to debug the individual chunks coming through the stream, prior to any newline parsing. When that is done, the missing events can be observed, interleaved between the parsed events. The fix will have to be made in Register Common.
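To illustrate the failure mode, here is a minimal sketch of a naive newline-buffering loop (illustrative only, not the actual Register Common adapter code): because `split("\n")` drops trailing empty fields, a chunk that ends in a newline is indistinguishable from one that ends mid-record, so complete records get held back as supposed partials and silently fused with the next chunk.

```ruby
# Sketch of the failure mode, assuming naive newline buffering
# (illustrative only -- not the actual Register Common adapter code).
# split("\n") drops trailing empty fields, so a chunk that ends in a
# newline looks identical to one that ends mid-record.

def naive_records(chunks)
  buffer = ''
  out = []
  chunks.each do |chunk|
    buffer += chunk
    lines = buffer.split("\n")  # trailing "" is silently dropped
    buffer = lines.pop || ''    # last element kept back as a "partial"
    out.concat(lines)
  end
  out
end

# Three complete records arrive, but "b" is held back as a supposed
# partial and then fused with "c" -- only "a" is ever emitted:
p naive_records(["a\nb\n", "c\n"])
# => ["a"]
```

Note how the lost records don't just vanish cleanly: "b" and "c" are concatenated into one garbled pending buffer, which matches events disappearing from the stream without any parse error being raised.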
TODO
Effect on Ingester PSC

The HTTP adapter is used in …, which is in turn used in:

(1) … uses … That doesn't appear to have any issues.
(2) … appears to be the main use at issue here. It uses …
(3) … uses …

Effect on Ingester DK

The HTTP adapter is used in …, which doesn't actually appear to be used anywhere. It seems there is also no other direct use of HTTP libraries.

Effect on Ingester SK

The HTTP adapter is used in …, which also doesn't actually appear to be used anywhere, since …

Effect on Sources BODS

The HTTP adapter is used in …, which doesn't actually appear to be used anywhere. It seems there is also no other direct use of HTTP libraries.
Reproduction of broken HTTP handling

To reproduce the broken HTTP handling and debug the stream data, it is possible to apply these patches:

Ingester PSC:

```diff
diff --git a/lib/register_ingester_psc/streams/clients/psc_stream.rb b/lib/register_ingester_psc/streams/clients/psc_stream.rb
index 1f28239..8fd2b0e 100644
--- a/lib/register_ingester_psc/streams/clients/psc_stream.rb
+++ b/lib/register_ingester_psc/streams/clients/psc_stream.rb
@@ -31,6 +31,7 @@ module RegisterIngesterPsc
           Authorization: "Basic #{basic_auth_key}"
         }
       ) do |content|
+        File.write('/tmp/psc-stream.log', content + "\n\n", mode: 'a+')
         timepoint_err = false
         parsed = JSON.parse(content, symbolize_names: true)
         match = %r{/company/(?<company_number>\w+)/}.match(parsed[:resource_uri])
```

Register Common:

```diff
diff --git a/lib/register_common/adapters/http_adapter.rb b/lib/register_common/adapters/http_adapter.rb
index 848bf96..89bb30d 100644
--- a/lib/register_common/adapters/http_adapter.rb
+++ b/lib/register_common/adapters/http_adapter.rb
@@ -22,6 +22,7 @@ module RegisterCommon
         URI(url), params, headers
       ) do |req|
         req.options.on_data = proc do |chunk, _overall_received_bytes|
+          File.write('/tmp/http-adapter.log', chunk + "\n", mode: 'a+')
          current_chunk += chunk
          lines = current_chunk.split("\n")
          if current_chunk[-1] == "\n"
```

For ease of comparison, newlines are added between each stream event (as parsed, incorrectly) and each chunk (coming directly over HTTP). Then, tailing each of these logs side-by-side shows two streams:
Ingester PSC stream excerpt:
Timepoints: 11856669, 11856671, 11856673

Register Common stream chunk-by-chunk excerpt:
Timepoints: 11856669, 11856670, 11856671, 11856672, 11856673
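The comparison above can be automated with a short script over the two debug logs. This is a sketch: it assumes the logs written by the patches above exist, and that each payload carries a `"timepoint":<n>` field (the exact field layout is an assumption, not confirmed here). Scanning with a regex rather than `JSON.parse` avoids choking on partial chunks in the raw log.

```ruby
# Sketch: extract timepoints from both debug logs and diff them.
# Assumes /tmp/psc-stream.log and /tmp/http-adapter.log were written by
# the debugging patches, and that payloads contain "timepoint":<n>
# (field layout is an assumption).

def timepoints(path)
  return [] unless File.exist?(path)
  # Regex scan tolerates partial JSON fragments in the raw chunk log.
  File.read(path).scan(/"timepoint":\s*(\d+)/).flatten.map(&:to_i)
end

parsed = timepoints('/tmp/psc-stream.log')
raw    = timepoints('/tmp/http-adapter.log')

puts "Dropped timepoints: #{(raw - parsed).join(', ')}"
```

Against the excerpts above, this would report 11856670 and 11856672 as dropped.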
At first, I thought this was because of incorrect handling of the PSC stream keepalives. The PSC Streaming API documentation states:
https://developer-specs.company-information.service.gov.uk/streaming-api/guides/overview

However, the Register Common HTTP adapter code seems to handle this.

Rather, the issue appears to be incorrect handling of whether or not a stream chunk ends in a newline. Consider the following:

```ruby
bfr = "a"
p bfr.split("\n")
# => ["a"]

bfr = "a\nb\nc"
p bfr.split("\n")
# => ["a", "b", "c"]
```
That is, chunks containing inner newlines can be correctly separated from a partial record using `split`. However, consider the case where a chunk ends with a newline:

```ruby
bfr = "a\n"
p bfr.split("\n")
# => ["a"]

bfr = "a\nb\nc\n"
p bfr.split("\n")
# => ["a", "b", "c"]
```
That is, the difference between a partial and a complete final record (ignoring the zero or more complete records which might be in the buffer prior to this) is lost. Since these are JSON payloads, this can lead to … However, note that there does appear to be code to handle this: …

In practice, however, this logic does not appear to be correct. I am not completely sure why, but perhaps it is something to do with when the newline heartbeats also occur. Certainly, while debugging the data coming from the HTTP stream directly, I found that sometimes the buffer would be a partial record (OK), no record would be emitted in that pass (OK), the buffer would be appended to become a complete record (OK), no record would be emitted in that pass (not OK), and then the buffer would again be the complete record but still with no record emitted (not OK). This admittedly doesn't answer at which stage the buffer gets discarded, but certainly, something isn't working correctly.

There are a number of potential solutions, but perhaps the simplest is this:
https://www.rubydoc.info/stdlib/core/String:split

That is, with a negative limit, trailing empty fields are preserved:

```ruby
bfr = "a\nb\nc\n"
p bfr.split("\n")
# => ["a", "b", "c"]

bfr = "a\nb\nc\n"
p bfr.split("\n", -1)
# => ["a", "b", "c", ""]
```

This can be used to simplify the code significantly, and to split the buffer into head and tail, always taking the -1st element as the new buffer.
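A minimal sketch of that head/tail approach (the class and `emit` callback are placeholders for illustration, not the actual Register Common adapter API):

```ruby
# Newline-safe chunk buffering using split(sep, -1).
# With limit -1, a chunk ending in "\n" yields a trailing "" element,
# so popping the last element always leaves exactly the unfinished tail
# in the buffer -- a complete final record is never mistaken for a partial.

class LineBuffer
  def initialize(&emit)
    @buffer = ''
    @emit = emit # placeholder for whatever consumes a complete record
  end

  # Append a raw chunk; emit every complete newline-terminated record.
  def <<(chunk)
    parts = (@buffer + chunk).split("\n", -1)
    @buffer = parts.pop # "" if chunk ended in "\n", else the partial tail
    parts.each { |line| @emit.call(line) unless line.empty? }
  end
end
```

Skipping empty lines also handles the heartbeat newlines cleanly: a chunk consisting only of "\n" emits nothing and leaves the buffer untouched.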
As per: openownership/register#270 (comment)

Also simplify the code significantly.
All the fixes have been merged and deployed. So far, so good: no gaps. I intend to check in on it tomorrow, by which time it should definitely have started having gaps again if there is still a problem (since typically gaps appear within a couple of minutes or so).
Looks good. Events are still consecutive after running for around 16 hours, with no apparent gaps. Taking a sample of 500 events:
I think this is fixed.
This has been discussed at length elsewhere, including in #264. However, I'm creating this new ticket to make it clearer what the resolutions were.