SYNC_IN jitter #16
I tried with the fully loaded Tester variant and Urukul on EEM0/1 on the hardware here. That also reproduces the narrow window (2 taps wide at validation delay 1) and the bouncing. The bouncing is something I can't really explain with regular FPGA supply rail crowding. Then I tried a variant containing only that Urukul EEM and nothing else, with Urukul on EEM0/1 and on EEM5/4. Both are very jittery as well (window size ~4 at validation delay 1). I also tried clocking the SYNC output register from its own dedicated BUFG. No significant change.
@jordens we look at SMP_ERRs and scanned to find the optimum point, and see similar things to you (i.e. similarly narrowed window widths for the Kasli clock vs the DDS sync_out). I am also a bit worried about this, but have not had time to look at it properly yet. Empirically it is working without problems on two systems in our lab. When I tested this thoroughly (#3 (comment)) I saw several SMP_ERRs per 10^10 samples (300 ps validation window), but I did not see any losses of sync. (To see this, I logged channels from 2 different Urukuls on a scope on persist, started the system from cold, and ran it for a day to achieve 10^10 resyncs.)
@jordens We set the SYNC_IN delay by scanning it and manually choosing the centre of the "eye" (i.e. error-free region). Two examples from a while ago, using two Urukul v1.0 connected to the same Kasli v1.0:
Urukul v1.1 connected to a Kasli v1.1:
These are the numbers of SMP_ERRs per 1000 trials for each of the channels, with validation tap setting
As Chris mentioned, we haven't seen any errors in production yet, but we haven't been looking very hard (i.e. only indirectly through relatively crappy quadrupole laser gates).
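The manual "pick the centre of the eye" step can be sketched as below. This is a minimal illustration, assuming the scan result is a plain list of SMP_ERR counts indexed by SYNC_IN delay tap; the `eye_center` helper and the example counts are hypothetical, not part of the ARTIQ driver or the data above.

```python
def eye_center(errors):
    """Pick the SYNC_IN delay tap at the centre of the longest error-free run.

    errors[i] is the SMP_ERR count measured at SYNC_IN delay tap i.
    The scan is treated as a straight line (no wrap-around at the ends).
    Returns (centre_tap, run_length).
    """
    best_start, best_len = None, 0
    start = None
    # A non-zero sentinel closes a run that reaches the last tap.
    for i, count in enumerate(list(errors) + [1]):
        if count == 0:
            if start is None:
                start = i
        elif start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_start is None:
        raise ValueError("no error-free SYNC_IN delay in scan")
    # For an even-length run this picks the lower of the two middle taps.
    return best_start + (best_len - 1) // 2, best_len


# Hypothetical scan: SMP_ERR counts per 1000 trials, one entry per tap.
counts = [12, 3, 0, 0, 0, 0, 0, 1, 40, 200, 35, 0, 0, 2]
print(eye_center(counts))  # (4, 5)
```

Choosing the centre rather than an edge maximises the margin on both sides, which matters most when the window is only a few taps wide.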
At 62.5 MHz SYNC_IN there are already 1e10 resyncs after 3 minutes of running. And since SMP_ERR is latching and checks each one of them, I haven't seen an invalid (re)sync in >1e13. I am also uncertain how "loss of sync" would manifest itself on the outputs if there is no frequency/phase change. My guess is that it would not even be a 16 ns transient, and even if there is a 16 ns transient, you'd only capture it if you mix or diff the channels on a scope. The reasoning is based on conjecture about how the DDS works internally (ADI patents and SAWG): SYNC_CLK will have a pair of short and long cycle-length glitches, but since the outputs run at 1 GHz, the output will be exactly the same. I.e. I am not worried about using a window that is 6 taps wide at validation delay 1. But I am worried about dealing with a window that is only 2 taps wide at validation delay 1.
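The numbers above follow from simple arithmetic, assuming one resync per SYNC_IN cycle as the comment implies. The helper names are illustrative only.

```python
SYNC_IN_HZ = 62.5e6  # SYNC_IN rate quoted above


def resyncs(seconds, rate_hz=SYNC_IN_HZ):
    """Resync events in `seconds`, assuming one per SYNC_IN cycle."""
    return rate_hz * seconds


def hours_to_reach(n_resyncs, rate_hz=SYNC_IN_HZ):
    """Running time in hours needed to accumulate n_resyncs resyncs."""
    return n_resyncs / rate_hz / 3600


period_ns = 1e9 / SYNC_IN_HZ  # time between corrective resyncs
print(f"period: {period_ns:g} ns")                      # period: 16 ns
print(f"3 min: {resyncs(180):.4g} resyncs")             # 3 min: 1.125e+10 resyncs
print(f"1e13 resyncs: {hours_to_reach(1e13):.1f} h")    # 1e13 resyncs: 44.4 h
```

The 16 ns period is also why any mis-sync would be corrected at the next SYNC_IN edge at the latest.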
You shouldn't need to repeatedly check SMP_ERR. It's latching. Just let it hammer for a couple µs.
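To illustrate why a single read suffices, here is a toy model of a latching (sticky) error bit. It is a simulation sketch, not the Urukul CPLD logic; `LatchingFlag`, `hammer`, and the per-edge failure probability `p_fail` are hypothetical. A couple of µs at 62.5 MHz is about 125 SYNC_IN edges, each of which gets a chance to set the bit.

```python
import random


class LatchingFlag:
    """Sticky error bit: any failed sample sets it, only clear() resets it."""

    def __init__(self):
        self.value = False

    def sample(self, ok):
        self.value = self.value or not ok

    def clear(self):
        self.value = False


def hammer(p_fail, duration_s, rate_hz=62.5e6, seed=0):
    """Clear the flag, let it see duration_s worth of SYNC_IN edges, read once."""
    rng = random.Random(seed)
    flag = LatchingFlag()
    flag.clear()
    n_edges = round(duration_s * rate_hz)
    for _ in range(n_edges):
        # Each edge independently fails with probability p_fail.
        flag.sample(ok=rng.random() >= p_fail)
    return flag.value


print(hammer(0.0, 2e-6))  # False: no edge ever failed
print(hammer(1.0, 2e-6))  # True: the first failure latched
```

With a per-edge failure probability p, the chance that n edges leave the bit clear is (1 - p)^n, so hammering longer only makes a marginal delay setting more likely to be caught.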
I forgot your posts on the other issue. Thanks for digging them out. The IO_UPDATE delay tuning is done. That is now measured without external hardware and can be done at runtime. And it is stable to the ns over all PVT cases I have looked at.
Yep, I saw your (nice) commits; we'll definitely have a look at porting our code over from the quick stopgap fix to your driver soon.
@jordens in our work we are using the 'clear phase accumulator on IO_UPDATE' mode. This means that a sync error looks like a 1 ns phase origin glitch, which is very obvious (i.e. 90 degrees at 250 MHz). For my sync tests I checked the alignment of the RF outputs with an RTIO TTL output. I confirmed that sitting outside of the window caused obvious phase alignment errors, and that sitting at the edge of the window (as measured with 0 validation delay) caused a small but measurable phase alignment error rate.
Yeah, the repeated checking on the eye scans is just to get an error rate estimate.
@cjbe Could you clarify what you mean by "sync error"? Not "SMP_ERR", probably. And how are you seeing that "sync error" if the next SYNC_IN event (corrective reset of the SYNC_CLK generator) is just 16 ns away? How long are the "glitches"?
But then you iterated over the 1000-iteration loop another 4 times...
@klickverbot ACK. But we should figure out where that high SYNC_IN jitter comes from. If someone with access to jitter measurement tools could have a look, that would be great. Might also move this to Kasli.
From a quick look with a spectrum analyzer and scope, SYNC_IN after going through the fanout, another IDC cable and an LVDS-to-CMOS converter is pretty clean. Spurs (from sys_clk logic modulating rtio_clk) on the SYNC_IN fundamental are down ~50 dB, on the 7th harmonic down ~30 dB. Also very clean close in to the carrier (1 kHz to 1 MHz). The jitter is on rather fast timescales, as already a couple dozen µs of sampling show the problem.
By 'sync error' I mean observing that the relationship between the DDS phase and an RTIO event is incorrect. The DDS is in a mode where the phase accumulator is reset to zero on IO_UPDATE. If the DDS state machine is not properly synced when it registers the IO_UPDATE, the DDS phase is incorrect (i.e. the DDS chooses the wrong edge of the 1 GHz clock as the phase origin, leading to ~90 degree phase shifts for 250 MHz output). I triggered the scope from an RTIO TTL output at a fixed delay from the IO_UPDATE. If everything is working correctly, the DDS phase should be fixed relative to the RTIO output event. If this phase is incorrect, the DDS was not properly synced at IO_UPDATE.
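The ~90 degree figure is the phase that one 1 GHz system-clock period represents at a 250 MHz output: phase = 360° × f_out × Δt. A minimal sketch of that arithmetic (the function name and the `n_edges` parameter are illustrative):

```python
SYSCLK_HZ = 1e9  # DDS system clock quoted above (1 GHz)


def phase_error_deg(f_out_hz, n_edges=1, sysclk_hz=SYSCLK_HZ):
    """Phase offset, in degrees wrapped to [0, 360), when the phase origin
    lands n_edges system-clock periods away from the intended edge."""
    dt = n_edges / sysclk_hz
    return (360.0 * f_out_hz * dt) % 360.0


print(round(phase_error_deg(250e6), 6))  # 90.0
print(round(phase_error_deg(100e6), 6))  # 36.0
```

Higher output frequencies make a one-edge slip more visible, which is why 250 MHz is a convenient test frequency.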
Ah, these are the SMP_ERR counts for channels 0..3, so that
sinara-hw/Urukul#16 Signed-off-by: Robert Jördens <rj@quartiq.de>
It's worth looking at the CPLD IO supply rail. The SMPS may be operating in discontinuous mode, causing large ripple on the CPLD supply.
That signal doesn't go through the CPLD. It would need to be crosstalk from the control lines of the fanout. The fanout supply seemed clean.
With the current (extremely lenient) algorithm and about two taps of margin, even CFL tubes being switched on will reliably cause SMP_ERR to latch here. This is in a grounded, closed enclosure (albeit not RF-shielded). There is something wrong here.
You could try changing the FSEN pin state on the LVDS receivers. It may affect the jitter.
I have run sync_scan from the ad9910 test suite several times on two cards of the old (v1.0) and two cards of the new (v1.3) hardware versions of Urukul. Overall I did about 30 runs on each card. For cards of revision 1.0, the errors resulting from different validation delays were quite variable; typical results for one card would look like these: about 70% of all runs:
about 20% of all runs:
about 10% of all runs:
and for the other card
about 1/3 of runs:
For two cards of the new revision 1.3, all runs basically gave the same result:
The results for the (Creotec) v1.3 boards are consistent with pk-pk jitter of approx. 400 ps.
We were setting up to look into the sync in jitter issue and see if we could locate its origin, however we weren't able to reproduce it in our test setup. Some other details (probably not material, but for completeness):
@gkasprow that 300 ps is almost all data-dependent jitter, right? I'd have to double-check what we're doing in the calibration code, but I'm not sure that could account for what we see. It also doesn't explain the observation that we see no "eye" for some window sizes on the older hardware.
Yes. So with a square-wave pattern it should not be visible. The boards could differ in the level of 3V3 rail noise, which could affect jitter significantly.
Such deterministic jitter depends on activity on neighbouring channels, so you can check whether something happens on SPI during calibration.
Even so, 300 ps of deterministic jitter still wouldn't explain the issues @jordens and @cjbe observed.
From a quick skim over the schematics, I didn't see any changes to the power supplies which could explain this, but maybe I missed something. Could also be something to do with the clocking of the Urukuls from Kasli, since I'm using the newer Kasli and @cjbe was using the older Kasli with worse clock distribution/floated MMCXs.
That could be a matter of, e.g., the capacitors used. A different vendor means different characteristics.
@gkasprow I was wondering about that kind of thing. If the decoupling somewhere is a bit marginal, then the quality of the capacitors used could have a large impact on performance. Anyway, even our results with the v1.0 hardware look better than the data @jordens posted at the top of this issue, so I don't think this is just down to the vendor of Urukul.
If you have an SSA, you can simply pass a known clock signal through the Urukul and back and see how it gets degraded.
I had already checked crosstalk from busy SPI lines, and I had looked at the signal after the fanout and another LVDS-to-CMOS converter, with an SA though, not an SSA. My suspicion is that something is going on between the fanout and the DDS input. The jitter timescales are not slow (<100 µs).
Okay, so the clock buffer is not giving a deterministic input-to-output phase relationship for the DDS clock?
Are all control pins on that buffer correctly driven (e.g. no floating divider reset etc.)?
@hartytp During tests IN_SEL on IC19 switches high for a moment.
Odd... I'd have to recheck the ARTIQ code to make sure that's unexpected.
@marmeladapk can you try shorting IN_SEL so that the MMCX clock is always used and then take another eye scan, please? If that looks good, then this is just some SW issue with the CPLD config.
MMCX OSC sel = 1, IN_SEL = 0, triggered from Kasli MMCX, sync_sel = 0 (FPGA): `[0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0]`
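The scan above is a list of SMP_ERR results indexed by SYNC_IN delay tap. A small helper (hypothetical, not part of ARTIQ) lists the error-free runs. The error bands at taps 2-4, 8-9, 16-17, 21-23 and 29-30 come in pairs that repeat every 13 to 14 taps, which appears consistent with the "double window" artefact discussed further down rather than a single clean eye.

```python
scan = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0,
        1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0]


def error_free_runs(errors):
    """Return (first_tap, length) for every run of zero-error SYNC_IN delays."""
    runs, start = [], None
    for i, e in enumerate(errors):
        if e == 0 and start is None:
            start = i                      # a run begins
        elif e != 0 and start is not None:
            runs.append((start, i - start))  # a run ends
            start = None
    if start is not None:                  # run reaches the last tap
        runs.append((start, len(errors) - start))
    return runs


print(error_free_runs(scan))
# [(0, 2), (5, 3), (10, 6), (18, 3), (24, 5), (31, 1)]
```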
Thanks! Okay, well, still bad eye scans even though there are no visible glitches on the SYNC line (previous measurements suggest that there wasn't excessive jitter there either). I'm out of ideas for things to test on the hw...
@marmeladapk if you still have that setup intact, can you repeat that last measurement on a finer timebase? I'd like to triple-check there are no glitches on the SYNC_IN line.
@hartytp I'll do it on Tuesday.
Thanks.
We can try to look at both the FPGA clock and the SYNC signal with the same scope, directly on Kasli. SYNC is available on the EEM connector, and the clock is also easy to observe. This way we would see exactly what the FPGA sees.
Thanks @marmeladapk! Nothing there looks suspicious. Since you're looking at SYNC_OUT as well as SYNC_IN, I believe this also rules out things like noise on the DDS PLL (as SYNC_OUT is derived from the internal clock independently of SYNC_IN). I suggest we leave this for now and produce a release that fixes the other issues.
@cjbe spent some more time looking at this issue a few weeks back. IIRC he was looking at the output of IC16 on a fast scope using a balun and coax lines soldered to the EEM connector pins. Scope on persist to also catch glitches etc. Modified the ARTIQ driver so that we can switch between DDS0 and Kasli as the sync sources by only changing CLK_SEL and nothing else. Running eye scans on DDS0 with the two sync sources. All done with 0 validation window as recommended by ADI (IIRC there is a post on the forum about how other validation delays aren't expected to work, but that's not explained in the data sheet). Good eye scans with DDS0 as SYNC source, bad with Kasli as source. On the scope, there is no visible difference in the jitter with the two sync sources. The measurement was good enough to rule out there being enough jitter to account for the bad eye scans. The only visible change in waveforms (other than the DC phase being different due to cable lengths etc.) was that the Kasli duty cycle is ~55%, while DDS0 is bang on 50%. Hard to see how small changes in duty cycle could cause issues (the DDS is DC-coupled and edge-sensitive). The next thing to do would be to repeat this measurement at the DDS pin, but so far this remains a bit of a mystery.
Can you include the EEM carrier, or at least a long ribbon cable, in the loop? It might be an issue with crosstalk...
That was included and doesn't appear to be the issue.
Interesting. Let's understand this. ODDR with both phases hooked up to the TTL frequency generator output? Which mode and which kind of pipelining?
Yes, and SAME_EDGE.
AFAIK, rtio.phy.ttl_simple.ClockGen didn't pack the register into the "I/O Tile" before adding the ODDR. If it did, the two cases would be implemented identically and it would make no difference.
OSERDES and OLOGIC (where the register sits) are alternatives to each other, aren't they?
Ok. SAME_EDGE would still leave it toggling on the rising edge.
I'll try to have a look.
Yes, true, makes sense.
@WeiDaZhang did you check the floorplan? Can you also fix this by setting
That's what I was about to do.
I frankly don't know how to do it in migen.
Without this, the final register in the SYNC signal TTLClockGen isn't (always) placed in the I/O tile, leading to more jitter than necessary, and causing "double window" artefacts. See sinara-hw/Urukul#16 for more details. (Patch based on work by Weida Zhang, testing by various members of the community in Oxford and elsewhere.)
This appears to be FPGA jitter, not a hardware issue with Urukul.
The jitter on the SYNC_IN signal from Kasli to the AD9910 (through the LVDS buffers and the fanout) is very high in some cases (the tester setup connected to the buildbot).
At validation delay 1 (hold and setup margin 1 tap) the window is just 2 taps wide (a tap is about 75 ps).
http://buildbot.m-labs.hk/builders/artiq/builds/2669/steps/python_unittest_2/logs/stdio
This is the SMP_ERR matrix on tester; rows are increasing validation delay, columns are SYNC_IN delay on the AD9910:
There also seems to be some bounce around the edges (top row).
On the systems I have here I get about 5-6 tap wide windows at validation delay 1. That's not stellar but OK.
When using the SYNC signal on board from the first DDS, the window at validation delay 1 is 8 taps wide on tester, 8-9 taps here.
Assuming equal tap delay for the validation delays and the SYNC_IN delays, the theoretical best case is validation delay 4 with a window width of ~4, or validation delay 1 with a window width of ~10 (i.e. the SYNC_IN delay periodicity minus twice the validation delay).
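The two quoted best cases fit a simple relation, window ≈ P - 2·V, with a periodicity of P ≈ 12 taps (4 = 12 - 2·4 and 10 = 12 - 2·1). The sketch below just evaluates that relation; P = 12 is inferred from the two quoted numbers rather than taken from the AD9910 datasheet, and `TAP_PS` uses the ~75 ps tap quoted in this issue. Note that 12 taps at ~75 ps is ~0.9 ns, roughly one period of the 1 GHz DDS system clock.

```python
TAP_PS = 75        # approximate delay tap quoted in this issue
PERIOD_TAPS = 12   # periodicity inferred from the two quoted cases (assumption)


def ideal_window(validation_delay, period_taps=PERIOD_TAPS):
    """Best-case error-free window in taps: periodicity minus the setup
    and hold margins (validation_delay taps each)."""
    return period_taps - 2 * validation_delay


for v in (1, 4):
    w = ideal_window(v)
    print(f"validation delay {v}: ~{w} taps (~{w * TAP_PS} ps)")
# validation delay 1: ~10 taps (~750 ps)
# validation delay 4: ~4 taps (~300 ps)

# Observed on tester at validation delay 1: a 2-tap window.
shortfall = ideal_window(1) - 2
print(f"shortfall: {shortfall} taps (~{shortfall * TAP_PS} ps)")
# shortfall: 8 taps (~600 ps)
```

Under this simple model, the shortfall between the ideal and observed window gives a rough upper bound on the timing uncertainty eating into the eye; it does not say whether that comes from Kasli, the fanout, or Urukul.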
In both cases Kasli/v1.1 and Urukul-AD9910/v1.3, connected via MMCX to Kasli-J1.
The jitter seems to come in part from Urukul and in part (the larger part) from Kasli. And it varies between setups.
I changed a couple of things (see the ARTIQ changelog) to optimize jitter on the RTIO clock, but there was little effect. According to Vivado, the max peak-to-peak jitter on the clock driving the SYNC output buffer in the FPGA is ~90 ps.
I tried running the SYNC fanout from the supposedly quieter P1V8A rail but that doesn't seem to work at all.
@gkasprow @marmeladapk could you have a look at the jitter on SYNC_IN (EEM0:7, before and after the sync fanout, compared to the Kasli MMCX clock)?
@cjbe @klickverbot When you were playing with SYNC, did you look at SMP_ERR? How did you select the SYNC_IN delay and the validation delay? Did you scan them?
Cf. m-labs/artiq#1143