Author Topic: Understanding Metastability Symptoms (Read 1619 times)

safarir · « **on:** August 17, 2024, 08:49:45 pm »

Hi all,

I need some help understanding some metastability issues I am experiencing (or what I think is metastability).

As part of a larger design, I need to asynchronously capture the value of a ~3 MHz counter and send it over SPI. I was experiencing a lot of issues and slowly worked to simplify my design until I came up with the following code:

Code: [Select]


module meas(
  input RESET,
  input REF,
  input SPI_CLK,
  input SPI_CS,
  output SPI_MISO,
);

  wire [63:0] meas_value;

  reg [59:0] count;
  always @(posedge REF or posedge RESET) begin
    if (RESET)
      count <= 0;
    else
      count <= count + 1;
  end
  
  assign meas_value[3:0] = 'b1111;
  assign meas_value[63:4] = count;

  spi spi1 (
    .clk(SPI_CLK),
    .cs(SPI_CS),
    .miso(SPI_MISO),
    .value(meas_value),
  );
  
endmodule

// Mode 0 spi
module spi
(
  input clk,
  input cs,
  input miso,
  input [63:0] value,
);

  reg [63:0] data;
  reg [5:0] bit_to_send;

  assign miso = data[bit_to_send];

  always @(negedge cs) begin
      data <= value;
  end

  always @(negedge clk or posedge cs) begin
    if (cs) begin
       bit_to_send <= 63;
 
    end else begin
      bit_to_send <= bit_to_send - 1;
    end
  end

endmodule

The lowest 4 bits of the value should always be 0xf; in the full design, those bits are driven by other signals.

I understand there is a possible metastability issue if my SPI_CS edge falls close to the REF positive edge. From my understanding, if this is the case, I expect any bit in `data` that has a different value than in the previous sample to have an undefined state. However, I still expect the lowest 4 bits to stay at 0xf at all times.

However, this is not what I am seeing experimentally. Here is the output from my little Python script that queries the SPI at 10 Hz:

Code: [Select]

b'000000000000000f' <- Reset value, makes sense
b'0000000000061a2f'
b'00000000000641cf'
b'00000000000653ef'
b'000000000006770f'
b'000000000006ab0f'
b'00000000000b676f'
b'000000000059218f'
b'0000000000a6e06f'
b'0000000000f49f5f'
b'000000000142578f'
b'000000000190106f'
b'0000000001ddc99f'
b'00000000022b892f'
b'000000000279498f'
b'0000000002c708df'
b'000000000314d09f'
b'000000000362916f'
b'0000000003b0515f'
b'0000000003fe148f'
... value removed ...
b'0000000056fff68f'
b'00000000574dc26f'
b'00000000579b9d6f'
b'0000000057e98000'   <--- Occasional bad value: why are the lower bits not 0xf?
b'000000005837650f'
b'000000005885378f'
b'0000000058d3146f'
b'000000005920ef0f'
b'00000000596ed05f'
... value removed ...
b'000000015bee401f'
b'000000015c3c3f7f'
b'000000015c8a3cff'
b'000000015cd83b7f'
b'000000015d263a8f'
b'000000015d74392f'
b'0000000000000002'    <--- Occasional bad value: why are the lower bits not 0xf?
b'000000015e1035df'
b'000000015e5e358f'
b'000000015eac242f'
b'000000015efa21df'
b'000000015f4813cf'
b'000000015f9611af'
... value removed ...
b'000000019eaa0edf'
b'000000019ef8145f'
b'000000019f46196f'
b'000000019f94255f'
b'0000000000000003'    <--- Occasional bad value: why are the lower bits not 0xf?
b'00000001a03024cf'
b'00000001a07e2b0f'
b'00000001a0cc31cf'
... value removed ...
b'00000001feb1579f'
b'00000001feff6acf'
b'00000001ff4d8a9f'
b'00000001ff9bad5f'
b'00000001ffe9d0ff'
b'0000000000000000'    <--- Tons of bad values after 0x200000000
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'00000002020c95bf'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000008'
b'0000000000000000'
b'0000000000000000'
... value removed ...
b'0000000000000008'
b'0000000000000000'
b'000000023d36967f'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'000000023ff4e27f'
b'000000024042fe0f'
b'0000000000000000'
b'0000000000000000'
b'00000002412d569f'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000242b41e3f'
b'0000000000000000'
b'0000000000000000'
... similar pattern until 0x800000000 ...
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'00000007fd6d1ecf'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'00000007fea5615f'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'0000000000000000'
b'00000008002bc50f'
b'000000080079dd8f'
b'0000000800c7ee3f'
b'000000080115febf'
b'0000000801640fbf'
b'0000000801b220bf'
b'00000008020031cf'
b'00000008024e427f'
b'00000008029c56ef'
b'0000000802ea679f'
b'0000000803387b5f'
b'0000000803868e0f'
b'0000000803d4a04f'
b'000000080422bddf'
b'000000080470e81f'
b'0000000804befedf'
b'00000008050d1dcf'
b'00000008055b3d3f'

The most striking thing for me is why I get so many bad values once the reported value reaches 0x200000000, and why it comes back to normal after 0x800000000. The counter seems to be running fine, since I still occasionally get a correct value. This seems to be fully reproducible from one run to the next.

I also can't wrap my head around why there would be any value where the last 4 bits are not 4'b1111. Those should never change; how is this possible?

My experience tells me that there is some basic concept that I am missing. I would appreciate it if someone could point me in the right direction about what is happening here. I am not really looking for a working solution; I am mostly interested in understanding what I am seeing.

PS: I am using an iCE40LP1K and yosys/nextpnr-ice40.

ejeffrey · « **Reply #1 on:** August 18, 2024, 03:10:56 am »

Metastability is honestly only one relatively small part of clock domain crossing issues. It is a failure that happens with a single bit.

When you have multi-bit values such as a counter, preventing metastability is not enough as even with single bit synchronizers you can get tearing.

Instead, to transfer a multi-bit word across a clock boundary you need to freeze it, send a 1-bit signal to the other clock domain, and wait for the acknowledgement that it has been received before unfreezing the value.

Otherwise, use a dual-clock FIFO. What I described above is essentially a 1-element dual-clock FIFO.

BrianHG · « **Reply #2 on:** August 18, 2024, 03:43:35 am »

Quote from: ejeffrey on August 18, 2024, 03:10:56 am

When you have multi-bit values such as a counter, preventing metastability is not enough as even with single bit synchronizers you can get tearing.

Yup, that's the crux of it. For counters, I usually only use the MSB as a 1-bit toggle indicator. Example:

In the source clock domain:
I usually load my parallel data register first, then toggle my single status bit on the next clock in this clock domain. That status bit is monitored by the second clock domain.

In the destination clock domain:
I wait for any toggle of the single status bit from the first clock domain. Once it has toggled, I load a copy of the parallel data register into the local clock domain's parallel register, and use that register's output to drive additional actions. I also register the status bit in this domain so it can be sent back to the first domain as an acknowledgement that the message was received.
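
The toggle scheme described above might be sketched roughly as follows. This is only an illustration, not BrianHG's actual code: the module name `toggle_cdc` and all signal names are made up, and it assumes both clocks are free-running (which is not true of `SPI_CLK` in the original post, since it only runs during a transfer).

```verilog
// Sketch of a toggle-based multi-bit clock domain crossing.
// All names are hypothetical; both clocks are assumed to run continuously.
module toggle_cdc #(
  parameter WIDTH = 60
) (
  input  wire             src_clk,
  input  wire [WIDTH-1:0] src_data,  // changing value, e.g. a counter
  input  wire             dst_clk,
  output reg  [WIDTH-1:0] dst_data   // stable copy in the destination domain
);
  // ---- Source clock domain ----
  reg [WIDTH-1:0] hold     = 0;      // frozen copy of src_data
  reg             req      = 1'b0;   // status bit, toggles once per transfer
  reg             armed    = 1'b0;   // hold was loaded on the previous clock
  reg [1:0]       ack_sync = 2'b00;  // 2-FF synchronizer for ack

  // ---- Destination clock domain ----
  reg [2:0] req_sync = 3'b000;       // 2-FF synchronizer plus one history bit
  reg       ack      = 1'b0;         // copy of req sent back as acknowledge

  // The previous transfer is finished once the returned ack matches req.
  wire idle = (ack_sync[1] == req);

  always @(posedge src_clk) begin
    ack_sync <= {ack_sync[0], ack};
    if (armed) begin
      req   <= ~req;       // data has been stable for one clock: announce it
      armed <= 1'b0;
    end else if (idle) begin
      hold  <= src_data;   // load the parallel register first
      armed <= 1'b1;
    end
  end

  always @(posedge dst_clk) begin
    req_sync <= {req_sync[1:0], req};
    if (req_sync[2] != req_sync[1]) begin
      dst_data <= hold;         // hold stays frozen until ack has returned
      ack      <= req_sync[1];  // acknowledge by echoing the toggle
    end
  end
endmodule
```

Only the single-bit `req` and `ack` signals pass through synchronizers; the wide `hold` register is read only while it is guaranteed not to change, which is what avoids tearing.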

hamster_nz · « **Reply #3 on:** August 18, 2024, 05:02:40 am »

This isn't a metastability issue (at least on the counter); as others have said, that can only affect bits that are in transition.

However, if you want to transfer a counter from one clock domain to the other, and don't want any glitches:

Convert the counter to Gray code, so only one bit changes per increment.
Capture the Gray code value from the target clock domain, with a 2-FF synchronizer (with proper constraints like ASYNC_REG for best performance).
Convert back to binary.

Job done.

Or if you are on Xilinx, and your counter is no wider than 32 bits, then use an XPM macro:

Code: [Select]

Library xpm;
use xpm.vcomponents.all;

...
   xpm_cdc_gray_inst : xpm_cdc_gray
   generic map (
      DEST_SYNC_FF => 4,          -- DECIMAL; range: 2-10
      INIT_SYNC_FF => 0,          -- DECIMAL; 0=disable simulation init values, 1=enable simulation init values
      REG_OUTPUT => 0,            -- DECIMAL; 0=disable registered output, 1=enable registered output
      SIM_ASSERT_CHK => 0,        -- DECIMAL; 0=disable simulation messages, 1=enable simulation messages
      SIM_LOSSLESS_GRAY_CHK => 0, -- DECIMAL; 0=disable lossless check, 1=enable lossless check
      WIDTH => 2                  -- DECIMAL; range: 2-32
   )
   port map (
      src_clk          => src_clk,
      src_in_bin     => src_in_bin,
      dest_clk        => dest_clk,
      dest_out_bin => dest_out_bin
   );
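
For targets without XPM (such as the iCE40 in the original post), the three Gray code steps above could be sketched in plain Verilog along these lines. The module and signal names are hypothetical, and the sketch assumes the source value only ever changes by one count per `src_clk` cycle and that `dst_clk` is free-running.

```verilog
// Sketch of a Gray-coded counter crossing; names are hypothetical.
module gray_cdc #(
  parameter WIDTH = 8
) (
  input  wire             src_clk,
  input  wire [WIDTH-1:0] src_bin,  // binary counter in the source domain
  input  wire             dst_clk,
  output reg  [WIDTH-1:0] dst_bin   // binary value in the destination domain
);
  // Step 1: convert to Gray code and register it, so the crossing signal
  // comes straight from flip-flops with no combinational glitches.
  reg [WIDTH-1:0] src_gray = 0;
  always @(posedge src_clk)
    src_gray <= src_bin ^ (src_bin >> 1);

  // Step 2: 2-FF synchronizer on every bit in the destination domain.
  reg [WIDTH-1:0] sync0 = 0, sync1 = 0;
  always @(posedge dst_clk) begin
    sync0 <= src_gray;
    sync1 <= sync0;
  end

  // Step 3: convert back to binary. Bit i is the XOR of Gray bits WIDTH-1..i.
  integer i;
  reg [WIDTH-1:0] bin_next;
  always @* begin
    bin_next[WIDTH-1] = sync1[WIDTH-1];
    for (i = WIDTH-2; i >= 0; i = i - 1)
      bin_next[i] = bin_next[i+1] ^ sync1[i];
  end

  always @(posedge dst_clk)
    dst_bin <= bin_next;
endmodule
```

Because consecutive Gray codes differ in exactly one bit, a sample taken during a transition resolves to either the old or the new count, never to a torn mixture of the two.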

safarir · « **Reply #4 on:** August 18, 2024, 08:46:22 pm »

Quote from: hamster_nz on August 18, 2024, 05:02:40 am

This isn't a metastability issue (at least on the counter); as others have said, that can only affect bits that are in transition.

I think you are right. I did additional testing to prove it to myself. Here is what I did:

- I let the counter count until I started getting bad values
- I stopped the SPI capture
- I stopped the REF clock
- I started the SPI again

At this point, there is only one clock domain running (and pretty slowly), and I am still getting a mix of good and bad values:

Code: [Select]

b'0000001a729d8bdf'
b'0000001a729d8bdf'
b'0000001800000069'
b'0000001a729d8bdf'
b'0000001a729d8bdf'
b'0000001a729d8bdf'
b'0000001a729d8bdf'
b'0000001a729d8bdf'
b'0000001a729d8bdf'
b'0000001800000069'
b'0000001a729d8bdf'
b'0000001a729d8bdf'
b'0000001a729d8bdf'
b'0000001a729d8bdf'
b'0000001a729d8bdf'
b'0000001a729d8bdf'
b'0000001a729d8bdf'
b'0000001a729d8bdf'
b'0000001800000069'
b'0000001a729d8bdf'

I really can't explain this!

I went even further and removed the latch from the SPI module and started reading the bits directly from the counter:

Code: [Select]

module meas(
  input RESET,
  input REF,
  input SPI_CLK,
  input SPI_CS,
  output SPI_MISO,
);

  wire [63:0] meas_value;

  reg [59:0] count;
  reg previous_cs;
  always @(posedge REF or posedge RESET) begin
    if (RESET) begin
      count <= 0;
    end else begin
      count <= count + 1;
    end
  end
  
  assign meas_value[3:0] = 'b1111;
  assign meas_value[63:4] = count;

  spi spi1 (
    .clk(SPI_CLK),
    .cs(SPI_CS),
    .miso(SPI_MISO),
    .value(meas_value),
  );
  
endmodule

// Mode 0 spi
module spi
(
  input clk,
  input cs,
  input miso,
  input [63:0] value,
);

  reg [5:0] bit_to_send;

  assign miso = value[bit_to_send];

  always @(negedge clk or posedge cs) begin
    if (cs) begin
       bit_to_send <= 63;
 
    end else begin
      bit_to_send <= bit_to_send - 1;
    end
  end

endmodule

Once again, after I started getting bad values, I stopped the REF clock before doing the SPI read. Once again, I got a mix of good and bad values. The read values also go back to being good once I restart the counter and it gets to a higher number.

At this point I am thinking that either the open source toolchain is generating a bad bitstream or there is something seriously wrong with my iCESugar-nano.

Rainwater · « **Reply #5 on:** August 19, 2024, 12:23:36 am »

I'm new, so the following will likely be wrong in so many ways.
I see a lot going on here, so I want to try to unroll as much as needed to satisfy the itch in the back of my brain.
Yeah, there is a clock domain crossing; that is easy to spot and needs fixing, but it does nothing to explain the bit errors.
I think the problem is more complex. The pattern after 0x200000000 has my full attention and is a red flag for a timing violation.

Code: [Select]

  reg [63:0] data;
......
  assign miso = data[bit_to_send];

I think this line is basically guaranteed to generate a timing violation due to how slow this operation will be.
I also think this violation will not show up on any timing report because the destination of this signal is not a flip flop.

When I see this, I see a 64:1 multiplexer being inferred.
A quick Google search finds the datasheet, which says you don't have mux hardware, only 4-input LUTs.
Each of these can be used to make a 2:1 mux, and if you connect 63 of them together into a 6-level-deep binary tree, you get the 64:1 mux needed.
This is only part of the problem.
I can't see your top module, clocks, constraints, or simulation, so I'm going to make a few generic assumptions.
First is that your logic is good and your simulations are perfect.
I'm not sure what SPI mode 0 is off the top of my head; I think it's sample on the rising edge. But put that to the side for now. Let's look at the triple O's (order of operations).

Code: [Select]

always @(negedge clk or posedge cs) begin
    if (cs) begin
   ...
    end else begin
      bit_to_send <= bit_to_send - 1;
    end
  end

On the negedge of the clock, update `bit_to_send`.
`bit_to_send` then begins propagating through the 64:1 mux.
So if this is the critical path, it will look a lot like

Code: [Select]

data_d > net > lut0 > net > lut1 > net > lut2 > net > lut3 > net > lut4 > net > lut5 > net > miso

Now the spec. analog.com defines SPI mode 0 as

Quote

Clock Polarity in Idle State - Logic low
Clock Phase Used to Sample and/or Shift the Data - Data sampled on rising edge and shifted out on the falling edge

OK, so we have the triple O's and the spec. What does not match?
The order of operations says that on the falling edge we update bit_to_send and shift out the data, but with a huge delay.
So I think the root of the error is the amount of time it takes to propagate the 64:1 mux.
Quick experiment: register the output and trade the mux for a shift register.
You will have to figure out some new always blocks to handle all of this, since you're only assigning data at the falling edge of cs.
Maybe something like:

Code: [Select]

assign miso = data[63];
....
  always @(negedge cs or negedge clk) begin
    if( /*cs edge detector magic*/ ) begin
      data <= value;
    end else begin
      data <= data << 1;
    end
  end

That should remove the 64:1 mux and replace it with a 2:1 mux that assigns the value to `data`.
Input 0 would be the `value` port and input 1 would be (data << 1).
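
One possible way to fill in the "cs edge detector magic" without needing an asynchronous load of `value` is sketched below. It is an untested illustration, not a verified fix for the original problem: the names `spi_shift`, `held`, `sr` and `first` are hypothetical, and the snapshot of `value` on the falling edge of cs is still an unsynchronized clock domain crossing, exactly as in the original module.

```verilog
// Mode 0 SPI transmitter using a shift register instead of a 64:1 mux.
// Sketch only; all internal names are hypothetical.
module spi_shift (
  input  wire        clk,    // SPI clock, idle low
  input  wire        cs,     // chip select, active low
  output wire        miso,
  input  wire [63:0] value
);
  reg [63:0] held;   // snapshot taken when cs falls
  reg [63:0] sr;     // shift register, MSB goes out first
  reg        first;  // high until the first falling clk edge of a transfer

  // Snapshot on the falling edge of cs, as in the original module.
  always @(negedge cs)
    held <= value;

  // cs high re-arms `first` asynchronously. A constant asynchronous set
  // maps onto an ordinary flip-flop, unlike an asynchronous load of `value`.
  always @(negedge clk or posedge cs) begin
    if (cs)
      first <= 1'b1;
    else
      first <= 1'b0;
  end

  // A 2:1 choice per bit: load the snapshot (already shifted by one) on the
  // first falling edge, then keep shifting left on every later falling edge.
  always @(negedge clk) begin
    if (first)
      sr <= {held[62:0], 1'b0};
    else
      sr <= {sr[62:0], 1'b0};
  end

  // Bit 63 must be valid before the first rising edge, so it comes straight
  // from the snapshot; every later bit comes from the top of the shift register.
  assign miso = first ? held[63] : sr[63];
endmodule
```

With this arrangement the path from the falling clock edge to `miso` is a single flip-flop output plus one 2:1 mux, rather than six LUT levels.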

safarir · « **Reply #6 on:** August 20, 2024, 01:56:59 am »

Quote from: Rainwater on August 19, 2024, 12:23:36 am

So I think the root of the error is the amount of time it takes to propagate the 64:1 mux.

I really like your way of thinking. I really think you might be onto something, and I will definitely test the shifting idea.

The 6-level-deep LUT tree seems to match what I am seeing in the synthesized code; I really did not understand that before.

I should be able to verify the CLK to MISO delay with my scope very easily. However, if this is the problem, it really does not explain how the counter value could have an effect on the result. I am also running the SPI relatively slowly, at 100 kHz...

Rainwater · « **Reply #7 on:** August 20, 2024, 02:40:30 am »

My understanding is that you always take external signals in through a few flip-flops before using them. And all my books say never to use an external input as a clock without special precautions, such as using a clock pin or synchronizing the signal.
https://www.eevblog.com/forum/fpga/ice40hx-can-it-be-clocked-from-any-input-pin/ mentioned this,
as well as
https://www.doulos.com/knowhow/verilog/synchronization-and-edge-detection/
I have had designs flat out not work because I tied an external pin directly into logic.
I don't know why, but when I used my crystal clock to synchronize it, all my problems went away.
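
A minimal sketch of that kind of input synchronizer with edge detection is shown below. The module and port names are hypothetical, and it assumes a free-running local clock that is several times faster than the external signal (for example, a crystal clock sampling a 100 kHz SPI clock).

```verilog
// Synchronize an external signal into a local clock domain and detect edges.
// Sketch only; names are hypothetical.
module sync_edge (
  input  wire clk,       // fast, free-running local clock
  input  wire async_in,  // external pin, e.g. SPI_CLK or SPI_CS
  output wire level,     // synchronized copy of async_in
  output wire rise,      // one-clock pulse on a rising edge
  output wire fall       // one-clock pulse on a falling edge
);
  // sr[0] and sr[1] form the 2-FF synchronizer; sr[2] holds the previous
  // synchronized value so that edges can be detected.
  reg [2:0] sr = 3'b000;

  always @(posedge clk)
    sr <= {sr[1:0], async_in};

  assign level = sr[1];
  assign rise  = (sr[2:1] == 2'b01);  // was 0, now 1
  assign fall  = (sr[2:1] == 2'b10);  // was 1, now 0
endmodule
```

The rest of the logic can then run entirely on `clk` and use `rise` and `fall` as clock enables, so the external pin is never used directly as a clock.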

Rainwater · « **Reply #8 on:** August 21, 2024, 01:42:03 am »

Just some quick thoughts before bed.

Code: [Select]

hex1 - 1a729d8bdf
Bin1 - 0001101001110010100111011000101111011111
Bin2 - 0001100000000000000000000000000001101001
Hex2 - 1800000069

Nothing too obvious when looking at the binary.
Can yosys generate a timing report? I'm really interested in the LUT and path delays.

The datasheet for the iCE40 has a 9.4 ns LUT delay listed, which seems rather slow to me. But it does say something about being pin to pin. And let's say the net delays are the same.
So that's 10 ns × 6 LUTs × 2 (each LUT plus its net) = 120 ns, and 1 ÷ (120×10⁻⁹ s) ≈ 8.33×10⁶ Hz, i.e. about 8.33 MHz. And you're running at 100 kHz.

So maybe the clock or reset is bouncing?
Set up a counter and toggle a GPIO every arbitrary number of clock cycles. That would reveal something.
There are only 3 places the problem could be: clock in, reset, data out.
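
The counter-and-GPIO experiment could look something like the sketch below. The module name `clk_probe`, its ports, and the `DIVIDE` parameter are all made up for illustration.

```verilog
// Toggle a spare GPIO every DIVIDE clock edges to check for extra or
// missing edges on a clock, or for glitches on a reset. Names are hypothetical.
module clk_probe #(
  parameter DIVIDE = 64  // clock edges per output toggle
) (
  input  wire clk,    // clock under test, e.g. SPI_CLK or REF
  input  wire rst,    // asynchronous reset under test, active high
  output reg  probe   // route to a spare pin and watch it on a scope
);
  reg [31:0] n;

  always @(posedge clk or posedge rst) begin
    if (rst) begin
      n     <= 32'd0;
      probe <= 1'b0;
    end else if (n == DIVIDE - 1) begin
      n     <= 32'd0;
      probe <= ~probe;
    end else begin
      n <= n + 32'd1;
    end
  end
endmodule
```

With `DIVIDE` set to 64 and `clk` tied to the SPI clock, `probe` should toggle exactly once per 64-bit transfer. A toggle that arrives early would suggest the FPGA is seeing extra clock edges, and a `probe` that unexpectedly drops low would suggest a glitch on the reset.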

radiolistener · « **Reply #9 on:** August 21, 2024, 07:12:20 am »

You can use a FIFO block to transfer data from one clock domain to another.

