-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Elastic Buffer #343
base: master
Are you sure you want to change the base?
Elastic Buffer #343
Conversation
Thanks for the putting in the hard work! If you need testers I'm happy to build this into my workflow and see how it performs |
Thanks for the proposed fix. It looks like there is a problem with the Python wrapper, as it now crashes. I'm not sure that such a drastic change is needed. An alternative approach would be to use a 64-frame circular buffer (as the NRSC-5 specification hints at), with silence inserted for any missing audio frames. When a batch of samples arrives through the rtlsdr callback, a corresponding batch of audio samples could immediately be returned from the circular buffer, and then decoding of the RF samples (which can take a variable amount of time) could proceed. I think this approach would result in a constant output rate, and allow the digital output to be properly synchronized with analog, without needing any change to the API. |
That would make sense since I didn't test the Python wrapper. Probably due to me removing the AUDIO callback.
I agree about the drastic change being pretty annoying. I tried to not add silence. I'll do more testing and update the PR with changes. I know it is possible to constantly send very SMALL amounts of audio packets. However, it was VERY on edge but that was without silence. You have more experience reading whitepapers. I read it so many times. |
The specification only describes how to build a transmitter, so one has to infer how to build a receiver. The "PDU control word" is what hints at a 64-frame circular buffer, with the "Starting Sequence Number" field describing where to begin inserting frames into the buffer when decoding a PDU. The "Audio Transport" section also describes how elastic buffering is supposed to work at a high level. The receiver is supposed to use an elastic buffer so that it outputs exactly 32 audio frames per PDU, even though the incoming PDUs can have anywhere from 24 to 40 audio frames each. This is the buffer that nrsc5 is currently missing, which causes our output audio rate to be variable. |
Yes. Should I assume packets are meant to be ordered per stream? If its not reset buffer? It would be nice to support the alignment of many streams. I don't have a recording of any enhanced streams to test alignment. Should I start playback once it receives full PDU Sequence? The maximum would be 64 in the buffer. I think delaying more would cause the audio to be more delayed than a normal receiver. I am not sure about 1-3 codec with 4 packets maximum. Does calculating minimum and maximum latency matter?
If I know correctly, PDU timing is also variable. I've had a PDU take 1.8 seconds before. Since you are willing for me to touch other areas for this, it looks like |
The API streams out all audio programs at once, so we'd need a separate buffer for each one.
I've only seen enhanced streams used on AM. But nrsc5 doesn't currently decode them, so I think it's fine to ignore them for now. (Related: #245) If you need a recording, I can send you one.
A real receiver would output analog audio (for the HD1 channel, anyway). nrsc5 doesn't currently demodulate the analog, but for now it could output silence, to maintain a constant input / output ratio. The same will need to be done for audio frames that are rejected due to an invalid CRC. Slots in the circular buffer could simply default to silence.
Those are used on logical channels (e.g. P3) that have a "transfer frame modulus" of 8, i.e. 8 transfer frames per block. With an average of 4 audio frames per transfer frame, it still comes out to an average of 32 audio frames per block. The PDU control word may work a bit differently though; it's been a while since I looked at that.
Yeah, as long as it happens before the digital demodulation steps (which can take a variable amount of time) we should be good. |
Right, I do believe it should be possible to do this without adding a thread. |
The reason I wanted a recording is because I am not sure how much delayed the enhanced stream compared to the core stream. Figured out i don't need to worry about it. |
180e697
to
e59d8dd
Compare
@argilo (Pinging for redundancy) Hey! I implemented a version of a circular buffer for outputting at a fixed ratio. This buffer is at the bleeding edge! I store packets in a buffer based on delay instead of 64 audio frames unlike one of your WIPs:
The buffer is meant to look like this: Such that: Dynamically allows any delay into account. For example, enhanced stream delay can be added. When reading the whitepapers, it does read like latency is in a different buffer. There wasn't an easier way to properly take that account without the code being worse. There is an "output buffer" for each program. This is meant to compensate for the FFT delay & provide IQ samples equal to audio frames. This is very useful for huge IQ buffer sizes. When HD is first received, the program will always start in the middle of a huge IQ sample. There would be no audio available to fill in the gaps before the program starts after the next IQ sample is received from the SDR + There's always a small number of IQ samples left due to FFT. This output buffer is useful for analog blending for later / do all sorts of buffering techniques. Little tidbits:
I tried the buffer on a semi-bad signal. I thought it was cool how well it handled it so I recorded it: https://youtu.be/pJEh8rs55fs Please let me know what you think |
I compiled this branch (https://github.com/TheDaChicken/nrsc5/tree/elastic-buffering) under MSYS2 to do a little testing. The BER climbs to very high rates after several minutes of playback and loses sync for several seconds causing large gaps in playback even on very strong stations where there's no interference or weak signal. |
This is either: bad gain / low signal / bad performance I ran a profiler just in case this new code degrades performance. The new code is very low. If audio is dropping out with a good **MER (BER doesn't matter) then its a new code issue and would be great to see a recording. |
The bad gain / low signal /bad performance just doesn't make sense because I'm running an i9-11980H CPU at 3.6Ghz on stations with a receive signal strength of 30-50db. And yes, I used -DUSE_SSE=ON at build time. The average BER stays in the normal range of around .02-.04% for about 10 minutes and then climbs to anywhere from 10% to 40% with the binary compiled from your branch. The audio then drops out for 5 seconds or so and then the error rate drops back into the normal range for another minute or two with the audio track resuming for several minutes before the cycle continues a few minutes later. This problem doesn't not occur at all when running the binary compiled from the original theori-oi/nrsc5 repo. Here's a screen shot immediately after the audio drops out showing the bit rates and error rate info as well as signal strength using the binary compiled from your branch. This problem occurs whether I use the binary from the command line or in conjunction with a python gui. I've verified the same problem on another PC (i7 CPU) using a completely different SDR dongle thinking it might be a system specific problem but the problem persists across both systems. |
I think you are seeing this issue here except its worse:
I will see what I can do. |
I way out of my depth when it comes to this type of coding. I'm creating a recording at the moment if you think it may hold some clues. |
Here's a 7 minute sample. The first audio drop occurs at 3:12 and the second at 6:25 so it appears to be happening at a very regular interval. I can make a longer recording to confirm this if you'd like. https://www.dropbox.com/scl/fi/u9w7bozp02wpigmtq8ob8/sample.wav?rlkey=zmk5y47vlgn7uaxq4zbvn1wl4&st=1f9bg6s0&dl=0 |
Here's a longer, completely different sample (15 minutes). Curiously, it also experiences the first audio drop at 3:12 and again at 6:25, so there does appear to be a pattern to this. https://www.dropbox.com/scl/fi/ofia7o5fz4m0grm568a35/sample2.wav?rlkey=c063rqif26rl0aj6ej4tz3g2p&st=ue9t9hor&dl=0 |
Yep. I figured out the problem - I should have tested subchannels with 4 packet maximum. Thanks for testing. Try the new latest commit. |
I ran the binary on my other test system which has an i7-4710Q CPU and the audio drop outs are back. They seem to occur every 3 min and 12 seconds apart. This isn't happening on my system equipped with an i9-11980HK CPU. Strange. https://www.dropbox.com/scl/fi/iat6xymv8iwl2qcpx8vjh/sample5.wav?rlkey=qwx6fwxsekdgnuuxbzgixd1ql&st=31y1oyf8&dl=0 |
Could you send me the output of the IQ samples using -w? |
Here's an IQ file that's 10 minutes in length. Everything really starts go out of sync badly around the 7 minute mark. https://www.dropbox.com/scl/fi/53p06s6avcq2v88ml44q0/sample.zip?rlkey=j5otk9jv625vx0o03bms1d0aw&st=j0bzgpo0&dl=0 |
It should be fixed now. The reader clock was going way too fast. Latency was incorrect. The value wasn't multiplied for the delay. Ignore the continued dumb commits. I deselect things on Clion. It pushes them away. I should use the terminal instead. |
Thanks for the update. This appears to have fixed things. I still get the occasional drop out from time to time but I'm not sure if it's code related or if it's RF interference but I don't experience any drops or interference using the theori-io code. Thanks for all the effort you're putting into this. Here's a screenshot of the the occasional audio drops: And the IQ and WAV files: https://drive.google.com/file/d/1pR4pjmE8DZ5Ko2L5GnTIume4zF8lVWIq/view?usp=sharing |
It was both RF interference and code-related. I made a silly mistake. Anyway should be fixed. |
There's still a code related issue. Here's the output from the latest build. There were only 4 audio drops during this 15 min run, but 3 of them were at regular intervals of appox. 1 min 45 secs apart. Getting much better though. https://drive.google.com/file/d/1_uewc4lwYNZxejcp2KI4wSAHt11dLcBZ/view?usp=sharing |
Thanks for the testing yet again even how repetitive this is. I am glad someone else is testing this out. The HDRadio stations here aren't super different. Unless I couldn't tell from the recording, it looks like RF interference or unrelated to the buffer. From the IQ file, I can see it lost synchronization and resyncs. The buffer properly resyncs tied to that while also keeping playback. nrsc5 still needs more things to improve reception. |
You're welcome. I've enjoyed the testing. Looking forward to any improvements that come about as time progresses. |
I compiled and tested with a strong signal inside my city. Everything worked well as far as I could tell!
I took a 10 minute sample and there only seemed to be one hiccup at 2:12 that was barely noticeable. https://share.foxxmd.dev/-bnJNyqjYh3 -- .wav audio
https://share.foxxmd.dev/-hUTvtaekFg -- IQ sample file @TheDaChicken EDIT: Been running for the last 16 hours and the stream is still smooth, no hiccups. 💪 |
Just wanted to post another update after some further testing. I recompiled after your latest commits and have let the binary run for about 12 hours now without any issues. This is the most stable version of nrsc5 yet on my systems. Whenever I used the binary compiled from theori-io, it would freeze or the audio would get choppy at anywhere from 15 minutes to 4 hours of playback. I'm going to let your binary run for another 24 hours or so just to see what happens. I don't expect any glitches though. Thanks again for the hard work you've put into your fork. I really appreciate it. |
Co-authored-by: Clayton Smith <[email protected]>
@@ -41,13 +41,13 @@ | |||
#include "log.h" | |||
|
|||
#define AUDIO_BUFFERS 128 | |||
#define AUDIO_THRESHOLD 40 | |||
#define AUDIO_DATA_LENGTH 8192 | |||
#define MAX_AUDIO_DATA_LENGTH 31072 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this number come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. This is the maximum audio size this PR outputs due to the SDR chuck size (512 * 1204)
defined in nrsc5.c.
Basically: (512 * 1024) / 1488375 * 44100 * sizeof(int16_t)
rounded up a little bit.
EDIT: During testing, there was a bug that caused it to peak up a little more than it should. That's why it's not rounded up exactly.
#define AUDIO_FRAME_CHANNELS 2 | ||
#define AUDIO_FRAME_LENGTH (NRSC5_AUDIO_FRAME_SAMPLES * AUDIO_FRAME_CHANNELS) | ||
|
||
#define OUTPUT_BUFFER_LENGTH (64 * AUDIO_FRAME_LENGTH) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does 64 come from? Is it MAX_AUDIO_PACKETS
or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
64 is technically an arbitrary number. I chose the output buffer to be 64 frames. This is due to the flaw that the output buffer is limited in size. The output buffer is kind of based on IQ buffer size and probably doesn't need to be bigger than MAX_AUDIO_PACKETS
. If you think its better, I can replace 64 with MAX_AUDIO_PACKETS
I can switch to a linked list approach. That would require more code when splitting frames to output at a fixed ratio. Wanted the code to be simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, if it's unrelated to MAX_AUDIO_PACKETS
then it's best to stay as it is. You could perhaps give this 64 its own name.
I'll have a look over the rest of the design to see what I think of linked list vs. buffer.
I did some testing. I wanted to see what aligning analog and digital manually would be like. I may have not properly tested this. I played back raw recordings in SDRSharp & used nrsc5 to output the audio to a file. I tried 4 stations. KQMV, KBKS, KHTP, KING-FM. I first tried adding Something may be wrong with the buffer somewhere - other stations were either behind too much or in the future. For example, KING-FM was behind 24 frames ish on HD when adding **All stations in my area are mostly properly aligned with HD receivers. KBKS digital audio was behind by latency + FFT delay. This could be what The output buffer may need to be a dynamic size instead of a constant or the size needs to be increased then a huge delay can be added. I don't know if there is an easy way of doing a dynamic size. I also wanted to put all of this out there. |
Just a thought, but as you've seen: I'm pretty sure that stations are going to have different timing between their HD streams and their FM programming. Each uses a different method of producing the stream, therefore creating variable delays - much like various TV streams have differing lags versus their on-air equivalents. I would expect that there may not be a single way to align these in all cases. You may want to just get it close and insert some 'dead air' when switching between the two. The initial buffering period may be sufficient to do this. Hopefully, it doesn't switch in and out too rapidly, which could have pretty annoying effects. |
HDRadio has a parameter called |
@argilo sorry for the ping! I want to remind you about this PR since it's been a month. |
Sometimes when acquiring a station, I see the following:
|
It seems to happen 50% of the time with my synthetic signal. If the buffer creation message has |
Could I have a quick IQ sample? That would be helpful. Rarely I got this on stations but never enough |
Give this a try: |
There are a few sign-compare warnings that could be cleaned up like so: diff --git a/src/output.c b/src/output.c
index 7c063d8..3685b4c 100644
--- a/src/output.c
+++ b/src/output.c
@@ -213,7 +213,7 @@ void output_align(output_t *st, unsigned int program, unsigned int stream_id, un
elastic->size = (elastic->delay * 2) + MAX_AUDIO_PACKETS;
elastic->ptr = malloc(elastic->size * sizeof(*elastic->ptr));
- for (int i = 0; i < elastic->size; i++)
+ for (unsigned int i = 0; i < elastic->size; i++)
{
elastic->ptr[i].size = 0;
elastic->ptr[i].seq = -1;
@@ -274,12 +274,12 @@ void output_advance_elastic(output_t *st, int pos, unsigned int used)
elastic->clock += (int)used;
// Decode packets based on average
- while (elastic->clock >= (int)sample_avg)
+ while (elastic->clock >= (unsigned int)sample_avg)
{
int16_t *audio;
unsigned int decoded_frames;
- for (int j = 0; j < elastic->avg; j++)
+ for (unsigned int j = 0; j < elastic->avg; j++)
{
elastic_decode_packet(st, i, &audio, &decoded_frames);
#ifdef USE_FAAD2 |
Thank you so much! I fixed it. Silly mistake. The available size was calculated using read as head. That would be normal behavior on a regular buffer. I couldn't get IDE autocomplete since a new build doesn't work - fftw.org is down.
That should be fixed now. Let me know if I have made a mistake somewhere. |
Fixing #330
Use a circular buffer to store packets to be played later temporarily. This allows libnrsc5 to handle buffering properly by outputting at a fixed ratio. This allows missing packets to be silenced or handled properly.
Replace audio callback with a buffering system that involves a polling function instead with any buffer size + implement latency buffering.New functions:1.nrsc5_open_program
2.nrsc5_close_program
3.nrsc5_reset_program
4.nrsc5_read_program_blocking
5.nrsc5_read_program_nonblocking
Please let me know what you think. Code review as much as you want! :D