RGMII Ethernet + MiSoC core does not work on Sayma #854

Closed
sbourdeauducq opened this issue Nov 23, 2017 · 464 comments

Comments

@sbourdeauducq
Member

No description provided.

@jbqubit
Contributor

jbqubit commented Nov 24, 2017

I've puzzled over how this Sayma subsystem is set up. @gkasprow please correct me if any of this is wrong.

  • There is a single outward-facing Ethernet interface on Sayma_AMC.

    • 1000BaseX over copper traces on microTCA Port 0 of the AMC backplane.
    • Ways of connecting to Port 0 are
      • a card edge adapter from WUT that provides power and RJ45 breakout
      • microTCA Carrier Hub Ethernet switch on NAT MCHBasic v3.5
  • Broadcom BCM5396 is Gigabit Ethernet Switch on NAT MCH

    • twelve 1000BaseX (SGMII) interfaces for fabric A consisting of 12 Port0's for up to 12 AMC modules
    • single 1000BaseT (GMII/RGMII/RvMII) on front panel
  • MAX24287 is Ethernet interface chip on Sayma

    • Application: Any System with a Need to Interface a Component with a Parallel MII Interface (GMII, RGMII, TBI RTBI, 10/100 MII) to a Component with an SGMII or 1000BASE-X Interface
  • LPC1776FET180 is MMC uP on Sayma

    • uses a parallel MII type interface
    • supports 10 Mbit/s or 100 Mbit/s PHY devices including 10 Base-T, 100 Base-TX, 100 Base-FX, and 100 Base-T4
  • Role of MAX24287 in Sayma:

    • Interfaces between outward-facing 1000BaseX and inward-facing MII or RGMII.

    • for 1000BaseX: RDP, RDN, TDP, TDN are connected to AMC Port0.

    • inward-facing interface can be exposed to either MMC (LPC1776 ARM uP) or the FPGA (Ultrascale).

    • A digital mux (SN74CB3Q32245ZKE) selects which of the two chips is connected to the MAX24287 under the control of trace SEL_RGMII driven by the MMC.

    • The FPGA uses a parallel RGMII-1000 type interface.

There was discussion of white wire modifications in a Sinara Issue. sinara-hw/sinara#327

@gkasprow
Collaborator

gkasprow commented Nov 24, 2017 via email

@jbqubit
Contributor

jbqubit commented Nov 25, 2017

@gkasprow Please edit your previous response to add additional line breaks between quote carets '<'.

a card edge adapter from WUT that provides power and RJ45 breakout

it provides SATA connector

OK. The MAX24287 1000BaseX serial interface is connected to both AMC Port0 and SATA connector J10. But 1000BaseX and SATA are not easy to use; most Ethernet hubs are 1000BaseT with RJ45 plug. So to test Ethernet on bench top a different adapter is needed.

Exactly. By default the PHY is routed to the FPGA

When you say "by default" do you mean a combination of a) defined by components on PCB and b) defined by MMC after it's booted. Do you have a list of power-on steps taken by the MMC and what the state of various lines is? Or is there an annotated source file that I should read?

@gkasprow some Q&A that might help with trouble M-Labs is having.

  • How did you test Ethernet connectivity in Sayma hardware testing checklist sinara-hw/sinara#224? What are the steps you took to look for "MMC ethernet" and "AMC Ethernet connectvity"?
  • Can you confirm that the MII interface is "up" from the perspective of MMC?
    • Have you tried toggling MAX24287 from MMC to FPGA and back to MMC? Is the interface still "up"?
  • Have you tried using NAT MCH Basic hub and AMC backplane to communicate with MMC over Ethernet?

Reset-based Reconfiguration

Toggling MAX24287 from MMC to FPGA is pretty complicated as it involves resetting the chip. Table 6.1 of the datasheet describes the value of the 15 configuration pins. These pins have different roles after the reset phase is finished, so many are driven by multiple sources.

  • From the SPD annotation on sheet 6 of the Sayma_AMC layout it looks like you're configuring MMC for 100 Mbps. LPC1787 data sheet says that there's an integrated MAC so using MII DCE mode. Did I get that right?

  • Is FPGA acting as MAC or PHY? This dictates how GP02 should be pulled upon reset.

I see some odd looking things comparing Table 6.1 and sheet 6 of the Sayma_AMC layout for the MAX24287. Here's a comparison for the two modes. Odd things are noted.

select MII for use with MMC

  • line SEL_RGMII should be set to 0 by MMC
  • pin CRS should be 0 for 100 Mbps MII, is set by MMC via PHY_CFG_DDR
  • pin GPO2 should be 0 for DCE mode (parallel interface is to the MAC on LPC1787), !!BUT!! is pulled high
  • pin GPO1 should be 0 for high impedance, !!BUT!! state depends on forward bias value of LD20
  • pins RXD[1:0] should be 0,1 for 100 Mbps MII, is set by MMC via PHY_CFG_SPD0, PHY_CFG_SPD0
  • pins RXD[3:2] should be 1,0 for 25MHz, OK set by resistors
  • pins RX_ER, RXD[7:4] should be 0,0,1,0,0 for PHYAD 00100, ok
  • pin RX_DV should be 1 for autonegotiation, ok
  • pin TXCLK is ignored in MII mode
  • then PHY_RESETn is pulled low for 100e-6 s

select RGMII for use with FPGA

  • line SEL_RGMII should be set to 1 by MMC
  • pin CRS should be 1 for 1000Mbps RGMII-1000, is set by MMC via PHY_CFG_DDR !!BUT!! oddly is pulled low over 10k (R363)
  • pin GPO2 should be 0 for DCE mode (parallel interface is to the MAC on FPGA), !!BUT!! is pulled high
  • pin GPO1 should be 0 for high impedance, !!BUT!! state depends on forward bias value of LD20
  • pins RXD[1:0] should be 0,1 for 100 Mbps MII, is set by MMC via PHY_CFG_SPD0, PHY_CFG_SPD0
  • pins RXD[3:2] should be 1,0 for 25MHz, OK set by resistors
  • pins RX_ER, RXD[7:4] should be 0,0,1,0,0 for PHYAD 00100, !!BUT!! oddly MII_RX_ER is unnecessarily pulled low over 10k (R364)
  • pin RX_DV should be 1 for autonegotiation, ok
  • pin TXCLK is ignored in MII mode
  • then PHY_RESETn is pulled low for 100e-6 s
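
As a cross-check of the PHYAD bullets above, here is a minimal C sketch that packs the five strap levels into a 5-bit MDIO address. The bit order (RX_ER as the most significant bit, RXD[4] as the least significant) is an assumption read off the order in which the pins are listed, and `phyad_from_straps` is a hypothetical helper, not part of the MMC firmware.

```c
#include <stdint.h>

/* Pack the PHYAD strap levels (each 0 or 1) into a 5-bit MDIO address.
 * Assumed bit order: RX_ER = bit 4 (MSB) ... RXD[4] = bit 0 (LSB). */
uint8_t phyad_from_straps(unsigned rx_er, unsigned rxd7, unsigned rxd6,
                          unsigned rxd5, unsigned rxd4)
{
    return (uint8_t)(((rx_er & 1u) << 4) |
                     ((rxd7  & 1u) << 3) |
                     ((rxd6  & 1u) << 2) |
                     ((rxd5  & 1u) << 1) |
                      (rxd4  & 1u));
}
```

Under that assumption, the strap levels 0,0,1,0,0 give address 4 (binary 00100).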

@jbqubit
Contributor

jbqubit commented Nov 27, 2017

@sbourdeauducq What do you mean by "RGMII Ethernet + MiSoC core does not work on Sayma"? What have you tried?

@gkasprow
Collaborator

Guys, I will work on it very soon, hopefully tomorrow.

@sbourdeauducq
Member Author

No packet can be transmitted or received. When the PHY is clocked, and my cable is not broken (the SATA hack is very fragile), then autonegotiation succeeds.

@sbourdeauducq
Member Author

@gkasprow Any findings? Now that the clocking and DACs are mostly working, Ethernet seems to be the major blocker to get RF output using ARTIQ.

@gkasprow
Collaborator

@sbourdeauducq Today I built a setup to test Ethernet and there is partial success: it does not work at all. I'm quite happy with that, because in this case I can find and solve the problem.
Moreover, I used the same board that was used to test Ethernet before. So it seems something has changed since then, and the same issue has probably emerged on other boards.

@jbqubit
Contributor

jbqubit commented Nov 30, 2017

@gkasprow Glad there's now something tangible that looks wrong on your side too. Progress comes in many colors. :)

@jbqubit
Contributor

jbqubit commented Nov 30, 2017

Debugging this is top priority. The Sayma hardware and lots of M-Labs gateware are ready to test. Getting Ethernet up and running is the bottleneck to forward progress right now.

@gkasprow
Collaborator

gkasprow commented Dec 1, 2017

I think I know where the problem is. I implemented a simple condition in the MMC firmware that resets the PHY chip after the FPGA gets configured and DONE goes low.
But in fact the PHY is held in reset when DONE is low, which is wrong.
The corrected piece of code is here:

```
// check if FPGA is programmed
// DONE line is high - FPGA not ready
if (LPC_GPIO0->PIN & (1 << 5))
{
    // RESET PHY
    LPC_GPIO0->DIR |= (1 << 23);
    LPC_GPIO0->CLR |= (1 << 23);
    // ETH LED OFF
    LPC_GPIO0->DIR |= (1 << 31);
    LPC_GPIO0->CLR |= (1 << 31);
}
else
{
    // un-RESET PHY
    LPC_GPIO0->DIR |= (1 << 23);
    LPC_GPIO0->SET |= (1 << 23);
    // ETH LED ON
    LPC_GPIO0->DIR |= (1 << 31);
    LPC_GPIO0->SET |= (1 << 31);
}
```

@gkasprow
Collaborator

gkasprow commented Dec 1, 2017

Here is the binary file:

lpc1776_ethernet_I2C.zip

@gkasprow
Collaborator

gkasprow commented Dec 1, 2017

I will test it on Monday.

@sbourdeauducq
Member Author

sbourdeauducq commented Dec 2, 2017

Why do I still get autonegotiation to work, then? Is that PHY chip still doing autonegotiation while in reset?

@gkasprow
Collaborator

gkasprow commented Dec 2, 2017

In my media converter it shows LINK state when I plug SFP, even with AMC power supply off.
Is there any form of autonegotiation in 1Gbit Ethernet over SFP? There is only link state when valid symbols are decoded.
The reset line also disables the PHY clock generator so it's impossible to have any activity. The PHY also needs the Tx clock from the FPGA to send anything, which is why I release the reset after the FPGA gets configured.

@sbourdeauducq
Member Author

In my media converter it shows LINK state when I plug SFP

Yes, that was a problem with one of my media converters too. Some of those just use (and require) the EEPROM and/or the LOS signal, which was one of the problems with the cable you gave me, since you had removed its chips entirely. Some other media converters show the status of the autonegotiation instead.
https://ssl.serverraum.org/lists-archive/artiq/2017-November/001165.html

Is there any form of autonegotiation in 1Gbit Ethernet over SFP?

Yes, see section 36.2.5.2.7 "Auto-negotiation process" of IEEE 802.3-2008. The autonegotiation is optional and can be disabled with a switch on some media converters.

There is only link state when valid symbols are decoded.

No, there is more (optionally).

The reset line also disables the PHY clock generator so it's impossible to have any activity.

In this case this is not the problem on my boards, since another of my media converters is sensitive to whether the SATA side of the cable is plugged or not.

@gkasprow
Collaborator

gkasprow commented Dec 2, 2017

@sbourdeauducq I don't have access to AMC board right now, I simply found this issue looking at the code. University is closed right now. I will test it on Monday. Is the MII LED on?

@sbourdeauducq
Member Author

What is the MII LED?

@gkasprow
Collaborator

gkasprow commented Dec 2, 2017

It is an LED on the front panel which is connected to the CPU.
Its original role was to signal who is talking to the PHY chip; at the moment it shows whether the PHY is in reset or not. So when the LED is lit, the PHY operates normally.

@sbourdeauducq
Member Author

@gkasprow Have you been able to get RGMII Ethernet to work again with your demonstration code?
If so, can you share a minimal Vivado project?

@gkasprow
Collaborator

gkasprow commented Dec 4, 2017

I just noticed that I was wrong, DONE pin when high indicates correct FPGA configuration. So the MMC code you have is correct.
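
With the polarity settled (DONE high means the FPGA is configured), the intended policy can be sketched as a pure function that computes which GPIO bits to set or clear. This is only an illustration: the bit positions (5 for DONE, 23 for the PHY reset line, 31 for the Ethernet LED) are taken from the snippet posted earlier, the reset line is assumed to be active low as that snippet implies, and `phy_reset_policy` and `struct gpio_update` are hypothetical names.

```c
#include <stdint.h>

#define FPGA_DONE_MASK  (UINT32_C(1) << 5)   /* GPIO0 bit read as DONE */
#define PHY_RESETN_MASK (UINT32_C(1) << 23)  /* assumed active-low PHY reset */
#define ETH_LED_MASK    (UINT32_C(1) << 31)  /* front-panel Ethernet LED */

struct gpio_update {
    uint32_t set_mask;  /* bits to drive high */
    uint32_t clr_mask;  /* bits to drive low */
};

/* DONE high: FPGA configured, release the PHY reset and light the LED.
 * DONE low:  FPGA not configured, hold the PHY in reset, LED off. */
struct gpio_update phy_reset_policy(uint32_t gpio0_pin)
{
    const uint32_t outputs = PHY_RESETN_MASK | ETH_LED_MASK;
    struct gpio_update u = { 0, 0 };

    if (gpio0_pin & FPGA_DONE_MASK)
        u.set_mask = outputs;
    else
        u.clr_mask = outputs;
    return u;
}
```

On the MMC the two masks would then be written to the port's set and clear registers; keeping the decision separate from the register access makes the polarity easy to check on a host machine.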

@gkasprow
Collaborator

gkasprow commented Dec 4, 2017

So far no success. I tested 3-pin mode and 15-pin configuration mode. I observe PHY transferring data and Rx data on PHY pin. But there is no activity on DV line at all.
I will access MDIO registers to see what's really going on.

@gkasprow
Collaborator

gkasprow commented Dec 5, 2017

@sbourdeauducq @jbqubit I found it!
The PHY works in SGMII mode instead of 1000BASE-X.
The media converter that I used before works in both modes and detects the mode automatically.
Now I use a media converter that works only in 1000BASE-X mode.

@gkasprow
Collaborator

gkasprow commented Dec 5, 2017

Funny thing, I wrote a little piece of code that dumps the PHY registers:
Register, ADDR, DATA
BMCR 0 0x1000
BMSR 1 0x7969
ID1 2 0x0
ID2 3 0x0
AN_ADV 4 0x20
AN_RX 5 0x41a0
AN_EXP 6 0x2
EXT_STAT 15 0x8000

And the value 0x20 in the AN_ADV register means that we operate in 1000BASE-X mode! So the PHY setting is OK.
[image]
So it seems my media converter is simply broken.
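
For readers following the dump, here is a small C sketch that decodes a few of these values. The bit positions are the standard IEEE 802.3 Clause 22 (BMCR, BMSR) and Clause 37 (1000BASE-X advertisement) definitions; the function names are hypothetical. The last helper is only a heuristic: SGMII configuration words always have bit 0 set, whereas that bit is reserved (zero) in a Clause 37 1000BASE-X word.

```c
#include <stdbool.h>
#include <stdint.h>

/* IEEE 802.3 Clause 22 basic control/status bits */
#define BMCR_AN_ENABLE   (1u << 12)
#define BMSR_LINK_STATUS (1u << 2)   /* latches low until read */
#define BMSR_AN_COMPLETE (1u << 5)

/* IEEE 802.3 Clause 37 (1000BASE-X) ability bits */
#define AN37_FULL_DUPLEX (1u << 5)
#define AN37_ACK         (1u << 14)

bool bmcr_an_enabled(uint16_t bmcr)  { return (bmcr & BMCR_AN_ENABLE) != 0; }
bool bmsr_link_up(uint16_t bmsr)     { return (bmsr & BMSR_LINK_STATUS) != 0; }
bool bmsr_an_complete(uint16_t bmsr) { return (bmsr & BMSR_AN_COMPLETE) != 0; }
bool an37_full_duplex(uint16_t word) { return (word & AN37_FULL_DUPLEX) != 0; }
bool an37_ack(uint16_t word)         { return (word & AN37_ACK) != 0; }

/* Heuristic only: bit 0 is always 1 in an SGMII configuration word. */
bool looks_like_sgmii_word(uint16_t word) { return (word & 1u) != 0; }
```

Applied to the dump, BMCR 0x1000 has autonegotiation enabled, BMSR 0x7969 reports autonegotiation complete with the (latching) link status bit clear, and both AN_ADV 0x20 and AN_RX 0x41a0 carry the 1000BASE-X full-duplex bit with bit 0 clear.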

@gkasprow
Collaborator

gkasprow commented Dec 5, 2017

With another media converter I get reasonable data on Rx lines and observe them with chipscope

@gkasprow
Collaborator

gkasprow commented Dec 5, 2017

There could be yet another issue, which depends on the particular chip. The datasheet says:
[image]
I will add it to the MMC and see what happens.

@gkasprow
Collaborator

gkasprow commented Dec 5, 2017

We have revision B of the chip

@marmeladapk
Contributor

marmeladapk commented Apr 18, 2018

Link up - no Ethernet, link down - Ethernet works

@gkasprow That's no longer true. One sfp converter doesn't work. I swapped it and in all cases I get transmission. (but still sometimes

I don't get link up led with direct SFP connection on Kasli (not with SFP-ethernet converter).

@gkasprow
Collaborator

It seems it defaults with SGMII interface...

@gkasprow
Collaborator

OK, I discovered that the PHY on the board that works with Kasli is not configured by the MMC but only by pin straps. The other one, which shows link up but no traffic, is configured by the MMC. The PHY gets configured only occasionally, once every few power cycles, so it looks like a floating pin strap that is normally connected to the FPGA.

@gkasprow
Collaborator

@sbourdeauducq thanks for yak shaving tools, they are really helpful here.

@marmeladapk
Contributor

marmeladapk commented Apr 18, 2018

I made Kasli transmitter and Sayma sniffer to test transmission in other direction.

It doesn't work only in one case: kasli -> sfp -> sfp-sata -> sata -> sayma. If there is any media converter in this chain transmission works.

@gkasprow It's the board on which phy is programmed.

ed: PAUSE is off.

@gkasprow
Collaborator

So we have quite an intriguing situation.
The PHY that is not initialized by the MMC (pin straps only) sends packets correctly without the media converters.
And the PHY that is correctly initialized sends packets only with media converters.
At least we see where we can dig later on :)

@marmeladapk
Contributor

marmeladapk commented Apr 18, 2018

Ok, this is getting ridiculous. I need a table to keep track of working configurations. These are Kasli -Sayma connections.

Sayma with not init. phy - with heatsink

Sayma with init. phy - without heatsink

PAUSE off:

| | Sayma with not init. phy | Sayma with init. phy |
|---|---|---|
| Without converters | TX ✓, RX X | TX X, RX X |
| With converters | TX ✓, RX X | TX X, RX ✓ (doesn't work with media converter, only with GLC-T) |

@jbqubit
Contributor

jbqubit commented Apr 18, 2018

Have you tried Sayma_AMC -> SATA -> SATA-SFP -> SFP ------> switch? If this works what are the components so I can reproduce?

@marmeladapk
Contributor

@jbqubit It seems now we only have communication in one direction, we didn't try to plug it into switch.

@gkasprow
Collaborator

lack of PHY configuration means that the Rx clock is generated and two clock sources are fighting.
That's why without init you won't be able to get the Rx working.

@gkasprow
Collaborator

Funny thing is that the only registers the MMC modifies are:

```
// power up Rx CDR
phy_write(0x04, 16, 0x4000);
phy_write(0x04, 0, 0x8000); // reset the data path BMCR.DP_RST

// select page 0
phy_write(0x04, 31, 0x10);
// force RGMII, TXCLKEN=0
phy_write(0x04, 18, 0b1000100010000000);
```

which disables the Rx clock and fixes the silicon issue described in the datasheet.
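
To make this sequence easy to replay and inspect off the board, here is a hedged C sketch that stores the four writes in a table and runs them through a caller-supplied write function. The table values are copied from the calls above (0b1000100010000000 is 0x8880); everything else (`struct mdio_write`, `phy_run_init`, `phy_write_logged`, the log buffer) is a hypothetical host-side stand-in and not the MMC's actual `phy_write`.

```c
#include <stddef.h>
#include <stdint.h>

struct mdio_write {
    uint8_t  phyad;
    uint8_t  reg;
    uint16_t value;
};

/* The four writes quoted above, in the same order. */
static const struct mdio_write phy_init_seq[] = {
    { 0x04, 16, 0x4000 },  /* power up Rx CDR */
    { 0x04,  0, 0x8000 },  /* reset the data path */
    { 0x04, 31, 0x0010 },  /* select page 0 */
    { 0x04, 18, 0x8880 },  /* force RGMII, TXCLKEN=0 */
};
#define PHY_INIT_LEN (sizeof phy_init_seq / sizeof phy_init_seq[0])

typedef void (*phy_write_fn)(uint8_t phyad, uint8_t reg, uint16_t value);

/* Replay the table through the given write function; returns the count. */
size_t phy_run_init(phy_write_fn write)
{
    for (size_t i = 0; i < PHY_INIT_LEN; i++)
        write(phy_init_seq[i].phyad, phy_init_seq[i].reg,
              phy_init_seq[i].value);
    return PHY_INIT_LEN;
}

/* Host-side stand-in that records writes instead of driving MDIO. */
struct mdio_write phy_log[8];
size_t phy_log_len;

void phy_write_logged(uint8_t phyad, uint8_t reg, uint16_t value)
{
    if (phy_log_len < sizeof phy_log / sizeof phy_log[0]) {
        phy_log[phy_log_len].phyad = phyad;
        phy_log[phy_log_len].reg = reg;
        phy_log[phy_log_len].value = value;
        phy_log_len++;
    }
}
```

Calling `phy_run_init(phy_write_logged)` fills `phy_log` with the sequence, which can then be compared against a register dump taken after init.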

@gkasprow
Collaborator

There is no other way of setting this register.
[image]
Maybe there is some issue with this CDR power up?
The datasheet says:
[image]

@gkasprow
Collaborator

What we discovered with @marmeladapk today:

  • the PHY does not produce the Rx clock if not initialised. So this Rx CDR is critical and the MMC needs to init it.
  • the default pin-strap configuration is OK; the Tx part works even with a non-initialised chip (reset is enough)
  • we observe data activity on the Rx path: clock, RxD 3:0 data and the RX CTL line. But Microscope does not show any data, not even the rubbish we would expect if the Rx clock phase were wrong. @marmeladapk is investigating it.
  • when the Rx CDR works (after init), the LINK UP LED is on.
  • the Tx part may work without the Rx CDR; in that case the LINK UP LED is off.
  • for some reason, the PHY on the board we got from Joe does not want to communicate over MDIO. It works once every few power cycles. Anyway, it gets configured correctly via pin straps.
  • the PHY on another board communicates correctly over MDIO
  • @wizath scanned all possible MDIO addresses of the PHY, so it is not a problem with a wrong MDIO address; the PHY simply doesn't want to talk over MDIO.
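
An address scan like the one mentioned in the last bullet can be sketched in C as follows. This is not @wizath's code: `mdio_scan`, `mdio_read_fn` and `fake_mdio_read` are hypothetical names, and the test for presence (a BMSR read that is not all ones, since an undriven MDIO line is normally pulled high) is an assumption about how such a scan is usually done.

```c
#include <stdint.h>

#define MDIO_REG_BMSR 1u

typedef uint16_t (*mdio_read_fn)(uint8_t phyad, uint8_t reg);

/* Probe all 32 Clause 22 addresses. Returns a bitmask in which bit N is
 * set if address N answered with something other than 0xFFFF. */
uint32_t mdio_scan(mdio_read_fn read)
{
    uint32_t present = 0;

    for (uint8_t phyad = 0; phyad < 32; phyad++) {
        uint16_t bmsr = read(phyad, MDIO_REG_BMSR);
        if (bmsr != 0xFFFFu)
            present |= UINT32_C(1) << phyad;
    }
    return present;
}

/* Host-side stand-in for a bus with a single PHY at address 4. */
uint16_t fake_mdio_read(uint8_t phyad, uint8_t reg)
{
    if (phyad == 4 && reg == MDIO_REG_BMSR)
        return 0x7969u;
    return 0xFFFFu;
}
```

A scan that returns 0 on real hardware would match the observation above: no address answers, so the address itself is not the problem.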

@hartytp
Collaborator

hartytp commented Apr 23, 2018

@gkasprow How did you get on with this? Any progress?

@gkasprow
Collaborator

I prepared a simple design that just forwards the RGMII data to the IO header. It is really strange that I see really nice RGMII signals on the FPGA vias but the Microscope tool does not detect any traffic.
I will test it tomorrow. I can spend only 3 days per week in the lab, so it takes some time to test all the ideas.

@gkasprow
Collaborator

Today I connected a logic analyzer to the DIO header where the RGMII Rx port was forwarded.
And we get nice RGMII packets.
But we don't see them using Microscope...
[image: tek00015]

@gkasprow
Collaborator

@sbourdeauducq any idea why Microscope doesn't see any packets?

@gkasprow
Collaborator

funny thing, with RGMII pins forwarded to DIO, the Microscope sees some packets...

@sbourdeauducq
Member Author

What is the code you are using for Microscope?
How are you forwarding the pins to DIO? It's a bit tricky since those are DDR signals. Can you connect a pin to both the IDDR input and combinatorially to the fabric with Ultrascale?

If the behavior changes when you are forwarding the pins to DIO and you are sure you did not make other changes to the code, this looks like another Ultrascale/Vivado bug. Considering the design of the new Ultrascale I/O cell, that would not be very surprising.

@gkasprow
Collaborator

No, you cannot connect it directly. At the moment Microscope samples only rising-edge data.
With the IODDR primitive you don't have access to the IO pin anymore.
These are the observations we made:
RGMII Rx -> GMII converter -> Microscope works on one Sayma board, doesn't work on the second one
RGMII_Rx -> Microscope and -> DIO works on the second one
We are investigating it right now with @marmeladapk
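
For context on why rising-edge-only sampling is not enough: RGMII is double data rate, so each byte is split across the two clock edges. Below is a minimal software model in C of what an RGMII Rx to GMII converter does per clock cycle, following the RGMII convention (RXD[3:0] on the rising edge, RXD[7:4] on the falling edge, RX_CTL carrying RX_DV on the rising edge and RX_DV xor RX_ER on the falling edge). The names are hypothetical and this is not the gateware used on Sayma.

```c
#include <stdbool.h>
#include <stdint.h>

struct gmii_sample {
    uint8_t rxd;    /* full byte, as seen on GMII RXD[7:0] */
    bool    rx_dv;
    bool    rx_er;
};

/* Combine the two DDR half-samples of one RGMII clock cycle. */
struct gmii_sample rgmii_to_gmii(uint8_t rxd_rise, uint8_t rxd_fall,
                                 bool ctl_rise, bool ctl_fall)
{
    struct gmii_sample s;

    s.rxd   = (uint8_t)((rxd_rise & 0x0Fu) | ((rxd_fall & 0x0Fu) << 4));
    s.rx_dv = ctl_rise;
    s.rx_er = ctl_rise != ctl_fall;  /* falling edge carries DV xor ER */
    return s;
}
```

A capture that only sees the rising-edge nibble gets the low half of each byte, so a start-of-frame delimiter (0xD5) looks the same as a preamble byte (0x55).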

@gkasprow
Collaborator

@marmeladapk added RGMII converter and it works 😀

@marmeladapk
Contributor

@sbourdeauducq I think it was my fault - I wrote (or more like copy-pasted) it second time and it works. Code looks like it was duct-taped together.

@gkasprow
Collaborator

Here are short instructions on how to modify the TRST connection.
[image]
And the photo:
[image: 2018-05-17 12 57 13]

@sbourdeauducq
Member Author

Isn't it simpler to remove the capacitor, put it somewhere else, and solder on its pads?

@sbourdeauducq
Member Author

Is this a decoupling capacitor? Does the board work without it?

@gkasprow
Collaborator

gkasprow commented May 17, 2018 via email

@sbourdeauducq
Member Author

According to Greg, ARTIQ Ethernet is also working on the WUT board.
