Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XCVR code cleanup #118

Merged
merged 3 commits into from
Jun 8, 2022
Merged

XCVR code cleanup #118

merged 3 commits into from
Jun 8, 2022

Conversation

mactcp
Copy link

@mactcp mactcp commented Jun 1, 2022

I noticed the XCVR changes had TRANSCEIVER_IO_SET(vTR_DBP,TR_OUTPUT) in the writeDataLoop. That results in it happening every block, rather than once per transfer, and probably messes up the code alignment potentially impacting performance.

Matching code to set the transceivers back to input also seemed to be missing. It might not actually matter, as typically commands will be followed by a status output (SCSI_DB_INPUT() might also not matter for the same reason), but that would require further investigation.

I don't have the XCVR hardware to test this change.

Add matching TRANSCEIVER_IO_SET(vTR_DBP,TR_INPUT) to writeDataPhase() and writeDataPhaseSD()
@erichelgeson erichelgeson requested a review from androda June 1, 2022 20:13
@androda
Copy link
Collaborator

androda commented Jun 2, 2022

Tested, and the results are mixed.
Before the change, benchmark on Beige G3 maxes at 1300 / 1342 read/write k per second.
With these changes, benchmark on the same machine with same SD card reaches 1253/1357. Read speed decreases a fair bit, but write speed increases.

@androda
Copy link
Collaborator

androda commented Jun 2, 2022

Matching code to set the transceivers back to input also seemed to be missing.

Code which sets the data bit transceivers back to input is unnecessary, unless there's a later phase where the initiator needs to send data/messages to the target. At the beginning of the loop() there's a block which sets transceivers back to input.

@mactcp
Copy link
Author

mactcp commented Jun 3, 2022

Tested, and the results are mixed.
Before the change, benchmark on Beige G3 maxes at 1300 / 1342 read/write k per second.
With these changes, benchmark on the same machine with same SD card reaches 1253/1357. Read speed decreases a fair bit, but write speed increases.

It occurred to me I can actually run this code on a regular BlueSCSI, the transceiver control lines just don't do anything. I had a look at what is going on, and there are two performance issues.

  1. writeDataLoop is currently optimized for peak performance on a Mac LC III+. While it is possible to get slightly more performance on SCSI-2 machines, there is a tradeoff in performance on older machines. The problem is shaving off a cycle between the WAIT_ACK_INACTIVE and WAIT_ACK_ACTIVE loops can cause the loop to be taken more often on slower machines, adding 4 cycles. This is what is happening with the alignment change caused by adding the transceiver code to writeDataLoop().

  2. For the larger transfer size tests, the benchmark is getting up to 1024 blocks being transferred, resulting in any timing sensitivity in writeDataPhaseSD being multiplied by 1024. I have yet to look at trying to pin the alignment of, and optimizing, the writeDataPhaseSD function, but it is something I want to take a look at.

The other thing that occurred to me: I think leaving TRANSCEIVER_IO_SET(vTR_DBP,TR_OUTPUT) until the writeDataLoop() function is actually temporarily leaving the SMT32F103 and the transceiver chips both driving the GPIO lines at the same time?

@mactcp
Copy link
Author

mactcp commented Jun 3, 2022

Matching code to set the transceivers back to input also seemed to be missing.

Code which sets the data bit transceivers back to input is unnecessary, unless there's a later phase where the initiator needs to send data/messages to the target. At the beginning of the loop() there's a block which sets transceivers back to input.

I had a look, and we can actually remove SCSI_DB_INPUT() in the same places for the same reason, so I have done that.

@androda
Copy link
Collaborator

androda commented Jun 4, 2022

Benchmark of the latest change results in 1253/1350

@mactcp
Copy link
Author

mactcp commented Jun 5, 2022

I've somewhat narrowed down an alignment issue being something linked in to the binary after the loop() function, which means it is a coin toss as to if there is a slight performance impact depending on the size of the code up to and including the loop() function. I've done a quick temporary fix by adding a 4 byte nop in the XCVR code path.

Here is what I am seeing performance wise:
Non-XCVR:
LC III+: 1247/926
G3/266: 1286/1389

XCVR without any of this change:
LC III+: 1137/917
G3/266: 1300/1368

XCVR with the latest change:
LC III+: 1247/926
G3/266: 1286/1389

@@ -920,9 +920,6 @@ void writeDataLoop(uint32_t blocksize, const byte* srcptr)
// Start the first bus cycle.
FETCH_BSRR_DB();
REQ_OFF_DB_SET(bsrr_val);
#ifdef XCVR
TRANSCEIVER_IO_SET(vTR_DBP,TR_OUTPUT)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restoring this segment increases read speed to 1335/1383 on beige G3 (three benchmarks run on two different disk images, this is the average of results).

Despite making the code appear less efficient, speeds are increased with this segment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will only be faster on SCSI-2 Macs (Quadras and later). If you take a look at the numbers I saw, the LC III+ read speed dropped from 1247 to 1137, about a 9% performance hit, so that alignment change is not a clear win. I've been deliberately tuning the performance to try and get good speeds across the board, rather than favoring any particular computers.

@androda
Copy link
Collaborator

androda commented Jun 6, 2022

Latest benchmark here on XCVR shows 1286 read and 1375 write on G3. See my code-comment about restoring the segment inside writeDataLoop

@erichelgeson erichelgeson merged commit 437b253 into erichelgeson:main Jun 8, 2022
@dotsam dotsam mentioned this pull request Feb 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants