Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify (and remove) delays that are not accounted for by the SCSI specification #1287

Closed
5 of 6 tasks
uweseimet opened this issue Nov 2, 2023 · 50 comments · Fixed by #1357 or #1361
Closed
5 of 6 tasks

Verify (and remove) delays that are not accounted for by the SCSI specification #1287

uweseimet opened this issue Nov 2, 2023 · 50 comments · Fixed by #1357 or #1361
Assignees
Labels
cleanup Refactoring, fixing SonarQube issues enhancement New feature or request

Comments

@uweseimet
Copy link
Contributor

uweseimet commented Nov 2, 2023

The existing code contains delays that are suspicious. It shall be verified which of these are needed and why. These and similar timing issues might be the reason for piscsi failing with some setups (see #1098) and the tickets where bus phases are aborted and/or not enough bytes transferred. In case a delay cannot be avoided it shall be documented in the sources why this delay is needed and how this has been tested.
The branch used for this ticket is "testing_issue_1287". When testing ensure that you have a backup of your disk image file.

Delays to be addressed:

  • 1. Delay during the transfer of Daynaport data. In case this delay is required, investigate further.
  • 2. Delays in the SCSI command and data handshake not covered by the SCSI specification.
  • 3. Delays in the SCSI controller not covered by the SCSI specification.
  • 4. When waiting for timeouts for 3 s the Pi's special timer is used. This shall be replaced by a standard C++ timer. Using a proprietary timer for timeouts of 3 s does not make sense.
  • 5. Use nanosleep for the daynaport delay
  • 6. Test data transfer with interrupts enabled

Details on 2.:

  • For each byte during the handshakes there are settle bus delays. Thess delays are not covered by the SCSI specification. Comments say that the delays wait until the signal line stabilizes. This IMO does not make sense, because the delays are only applied before reading/writing data bytes and not for any other signal. Why would the data signals need a delay before they are stable, but the REQ, ACK and other signals would not?
  • The SCSI controller class does not accept commands as fast as it could. Again, this is not covered by the SCSI specification. Why should the controller not process commands as fast as it could, i.e. as fast as the computers send them?

The issues_1263_1278_1283 and issue_1295 branches currently do not contain delays 2. and 3. and have successfully been tested against a second PiSCSI board running the same branch, a hard drive, a card reader, a streamer (each time in initiator mode using scsidump) and an Atari TT (in target mode). The speed gain in target mode is a bit less than 1% with a Pi 4.

@uweseimet uweseimet added enhancement New feature or request cleanup Refactoring, fixing SonarQube issues labels Nov 2, 2023
@uweseimet uweseimet self-assigned this Nov 2, 2023
@uweseimet uweseimet added the help wanted Extra attention is needed label Nov 2, 2023
@uweseimet uweseimet pinned this issue Nov 2, 2023
@uweseimet uweseimet changed the title Remove delays that are not accounted for by the SCSI specification Verify (and remove) delays that are not accounted for by the SCSI specification Nov 9, 2023
@uweseimet
Copy link
Contributor Author

uweseimet commented Nov 9, 2023

@benjamink This is the ticket for testing delays, e.g. for the daynaport emulation. If you would like to help, please check out the testing_issue_1287 branch. In the current code of this branch there is no delay. You already tested this with your SE and it failed. I would be interested in the log with trace:6. You already know the procedure ;-).
@akuker Thank you for pointing out the potential issue with slow Macs. Upcoming PRs will add a more detailed comment on this.

@uweseimet
Copy link
Contributor Author

uweseimet commented Nov 9, 2023

@benjamink As soon as you have confirmed that the code in this ticket, branch testing_issue_1287, does not work (because it contains the offending delay change) I will revert this change (change 1. in the list above) and will apply change 2 instead. Testing should be done with your SE for now. No need to add a logfile or screenshot unless I explicitly ask for it.
Note that these changes do not target the upcoming release but the one after. So no need to spend the whole night with testing ;-).

@benjamink
Copy link
Collaborator

First test fails (as expected). piscsi crashes & the Mac shows the following on the screen:

image

I will be labeling any log files with the commit hash being tested from now on:

piscsi-issue1287-branch-c61b5c003c84f0778a8d6fb4f8d9f5d8e5e67321.log

@uweseimet
Copy link
Contributor Author

@benjamink No screenshots, please. They make this ticket harder to read and usually do not provide any important information for this ticket.
From the log you added I do not see that piscsi was crashing. Did the console report something like "Segmentation fault"?

@uweseimet
Copy link
Contributor Author

@benjamink Regardless of the answer to my previous question (we definitely need to know whether there was an actual crash and piscsi terminated with an error message) I have already committed the next change set: I have reverted the delay change. We need to make this test in order to ensure that we have a clean starting point for any further change.

@benjamink
Copy link
Collaborator

@uweseimet piscsi didn't error out, it just froze. Want me to run with tracing or debug on for the HDD device as well & see what happens? I only had tracing on for the Daynaport device.

@benjamink
Copy link
Collaborator

@uweseimet I might have some contention going on here with another device. Let me remove everything else & try again. I'll get back to you.

@uweseimet
Copy link
Contributor Author

@benjamink OK. Please note that the current branch for this ticket is identical with what you successfully tested on your SE yesterday. So this should definitely be working.

@benjamink
Copy link
Collaborator

Confirmed a4b523b9163689ed3f018579b04c78143b1653d6 works

@uweseimet
Copy link
Contributor Author

@benjamink OK, so we have a clean starting point. I will need some time to prepare the next change set, because I will first test it with my Atari. It would not make sense if you were testing something that does not even work with my hardware ;-).

@uweseimet
Copy link
Contributor Author

@benjamink The next change set is available, tested with my Atari. Before you start testing ensure that you have a backup of the disk image file(s) you attach. The changes affect everything, not just the daynaport. If possibe, do not use the daynaport emulation at all but just the hard drive emulation.
In case of an issue please attach a log (no screenshot, but just a note what happens, e.g. Mac freezes or Mac reports an error). Limit logging to your hard drive, e.g. trace:0, depending on the SCSI ID you use for the drive. And test with your SE only.
My Atari is in a different room, by the way, but I don't need to climb any stairs to get there ;-).

@benjamink
Copy link
Collaborator

Commit 0db39ea19a0a96b60f3d1669946e63288a7da9f2 worked fine. I ran with this command so it was only the disk ($HASH is just the current git hash - this command is in a small script I wrote):

sudo "bin/fullspec-1287-debug/piscsi-${HASH}" -L trace:1 -id1 /home/pi/images/test-HD0-BK_SYS_7.1.hda

@uweseimet
Copy link
Contributor Author

@benjamink Sounds good! What you have just tested is item 2. in my list. I just updated the change set to also include item 3. Please re-test. Running "make" without "make clean" after updating is fine for this ticket.

@benjamink
Copy link
Collaborator

Commit 70b6e69e52379a8d69e169f42cdf3937864d5c54 worked fine (again with just disk attached)

@uweseimet
Copy link
Contributor Author

uweseimet commented Nov 9, 2023

@benjamink Thank you. Can you please test the same binary also with the daynaport? Please confirm that you are using an optimized build, with "-O3" being displayed when building.

@benjamink
Copy link
Collaborator

Looks like they're optimized. Example:

g++ -O3 -Wall -Werror -Wextra -DNDEBUG -Wno-psabi -std=c++20 -iquote . -D_FILE_OFFSET_BITS=64 -DFMT_HEADER_ONLY -DSPDLOG_FMT_EXTERNAL -MD -MP  -DCONNECT_TYPE_FULLSPEC -c generated/piscsi_interface.pb.cpp -o obj/fullspec/piscsi_interface.pb.o

@uweseimet
Copy link
Contributor Author

uweseimet commented Nov 9, 2023

Yes, they are optimized. I have already pushed a new change set for testing with both disk and daynaport, but before you update please test your current binary also with the daynapor.t

@benjamink
Copy link
Collaborator

Everything including networking worked fine with commit 70b6e69e52379a8d69e169f42cdf3937864d5c54

@uweseimet
Copy link
Contributor Author

@benjamink OK, please proceed with the latest change set. This one I expect not to work with the daynaport, but we'll see.

@benjamink
Copy link
Collaborator

@uweseimet Thank you! Looking forward to you having everything 100% fixed on Tuesday 😄

@uweseimet uweseimet unpinned this issue Nov 11, 2023
@uweseimet
Copy link
Contributor Author

uweseimet commented Nov 14, 2023

@benjamink I hope you enjoyed your vacation. I'm afraid not everything is fixed ;-). But there is an updated testing_issue_1287 branch at least, potentially affecting both the DaynaPort and hard drives. After testing, please just provide your verdict. No screenshots or logs, in order to keep this ticket clear. Thank you!

@benjamink
Copy link
Collaborator

@uweseimet thanks! I gave 2f4354095df3577618257dae0e00e48ef6575bed a test. It works fine with HDD only but it freezes on boot when I include Daynaport. Same thing as before, just freezes, no crash or error.

@uweseimet
Copy link
Contributor Author

@benjamink Thanks. Any change with my latest commit?

@benjamink
Copy link
Collaborator

Yes, looks like 785b0b30e6cd93ac946392f1a50201cff90d7c60 fixes it. Networking is working again.

@uweseimet
Copy link
Contributor Author

@benjamink This is excellent news! It means that we do not need the proprietary Pi hardware timer to get the work-around needed for the DaynaPort right.

@uweseimet
Copy link
Contributor Author

uweseimet commented Nov 14, 2023

@benjamink Can you please test the new "improve_daynaport_delay" branch with the DaynaPort? This branch is identical with the current develop branch, except for exactly the change you just tested. We should double-check.

@benjamink
Copy link
Collaborator

Confirmed commit 7b8ae25005ce0f5ead725010c25163ad761d1916 in the improve_daynaport_delay branch works fine with Daynaport.

@uweseimet
Copy link
Contributor Author

uweseimet commented Nov 14, 2023

@benjamink I'm afraid I have to ask you for yet another test with the improve_daynaport_delay branch, after updating. Our code analysis tool complained.

@benjamink
Copy link
Collaborator

@benjamink I'm afraid I have to ask you for yet another test with the improve_daynaport_delay branch, after updating. Our code analysis tool complained.

No problem at all. I have the machine setup next to my desk so I can run tests easily whenver you need them. I'll update & try again now.

@benjamink
Copy link
Collaborator

Commit 3bf0a06ead38bae9516a50cf4ba783bb1db48762 still works with Daynaport

@uweseimet
Copy link
Contributor Author

uweseimet commented Nov 14, 2023

@benjamink Please note that further testing should happen again on the testing_issue_1287 branch. There is a new changeset, with the latest commits to the develop branch included. Note that the binaries are no longer in cpp/fullspec/bin but in cpp/bin. Nothing should have changed regarding the daynaport.

@benjamink
Copy link
Collaborator

Commit 22dfa0c1958734ed65516bb9418d7b319ebd7bc3 of the testing_issue_1287 works fine with networking enabled.

@uweseimet
Copy link
Contributor Author

@benjamink There is a new branch scsi_compliant_handshake. This branch is identical with the current develop branch except for changes covering item 2. of my list, related to the SCSI handshake.
You have already tested these changes, but before creating a PR, please test scsi_compliant_handshake (both hard drive and daynaport), to ensure that everything works before a potential merge. Thank you!

@uweseimet uweseimet removed the help wanted Extra attention is needed label Nov 15, 2023
@benjamink
Copy link
Collaborator

Worked fine with networking enabled.

Branch: scsi_compliant_handshake - Hash: 08e0f44

@uweseimet
Copy link
Contributor Author

@benjamink There is a new branch testing_issue_1287_2, which is identical with the current develop branch except for one kind of change. When you have time, I would appreciate a test with hard drive and daynaport.

@benjamink
Copy link
Collaborator

@uweseimet worked fine w/ Daynaport enabled

Branch: testing_issue_1287_2 - Hash: f02d0d9

@rdmark
Copy link
Member

rdmark commented Nov 20, 2023

I recommend asking for testing on more platforms in Discord. We have users on f.e. DEC workstations, RISCOS, Apple II etc. Sampler devices may be impacted as well as they are notoriously quirky.

If the suspicious delays originate in the RaSCSI codebase, they may have been workarounds for NEC PC-98, X68k or MSX2 quirks.

Is there anything concrete that we gain in compatibility or performance by making these low-level changes? I'm thinking of the tradeoff between standards compliance and the risk of breaking something for the broader user base.

@uweseimet
Copy link
Contributor Author

uweseimet commented Nov 20, 2023

@rdmark Please see my notes in the related ticket: Why are the data signal lines of the PI considered unstable, but all the other lines are considered stable? Are the 8 lines used for the data byte special (less reliable)? This IMO does not make any sense from a technical perspective, does it? This is why I claim that just the fact that the RaSCSI/PiSCSI hardware/software works proves that this change is justified. The original comment implies that this delay is needed because of the Pi, not because of the connected computers/samplers. The handshake with ACK/REQ is what ensures the data are correctly transferred, not timings. Where you do need a precise timing is synchronous SCSI, because you do not handshake there. But piscsi is asynchronous.

Regarding the original RaSCSI codebase: Some drivers/computers/workstations that work today did not work at all with it ;-).

In general, I am increasingly interested in feedback from users with exotic platforms. I now know much more about the low level bus code than before and this might help me with addressing these issues, as long as the users reporting them can reproduce them and are willing to compile test branches.

This change just gives a1% performance improvement, but a very important benefit is that one can completely get rid of using the system timer. This makes porting PiSCSI to other hardware without this time (maybe even the Pi 5) much easier. And as far as I can tell this was also one of the obstacles when attempting to port to the Banana Pi. And it reduces compile times ;-).

What I suggest is to introduce a switch (macro) to enable/disable this change. For the develop branch the change should be enabled (so that anybody can easily test it and is testing it by default), but by enabling the macro in config.h you can disable it. One might just switch back with the next release, or maybe not.

By the way, remember that there is at least one platform which does not work because of delays not covered by the SCSI specs, see #1098.

Important: Even though my words above may not reflect this I fully understand your concerns. But the argument with only the data pins being considered instable and the others not is IMO a very strong one. I claim that this particular delay does not improve compatibility, but it does just the opposite. Currently there is nothing that proves this claim wrong.

@uweseimet
Copy link
Contributor Author

uweseimet commented Nov 20, 2023

@rdmark I added the switch mentioned above to the scsi_compliant_handshake branch. This switch also controls another related timing change, which has also already been tested by @benjamink and me. With this switch enabled the proprietary Pi system timer is not used anymore in the SCSI controller. The only remaining use (waiting for a 3 s timeout in scsidump) is going to be replaced in one of the scsidump tickets in progress by a standard C++ timer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup Refactoring, fixing SonarQube issues enhancement New feature or request
Projects
None yet
3 participants