Author Topic: Gating the clock (Read 10986 times)

agehall · « **on:** February 26, 2024, 09:02:09 am »

So I'm not an HDL wizard by any means. I dip my toes in Verilog from time to time for hobby projects. One thing that I'm not sure I fully understand is how to reason about clock signals.

In most literature I've read, it is a big no-no to gate the clock. But then they happily show how you can reduce the clock by implementing a counter and taking the output from the highest bit of the counter and using that as a clock for other parts of the system. Isn't that also gating the clock?

I'm sure this is a stupid question, but I'd really appreciate it if someone could elaborate and explain a bit. I feel I'm missing something here...

woofy · « **Reply #1 on:** February 26, 2024, 10:00:56 am »

Not a stupid question at all, but a huge subject.
There is nothing wrong with clock gating; modern ICs do it all the time to reduce power consumption in inactive circuits. Dividing a clock down to multiple frequencies is also OK. The real issue is having multiple clocks, as in your divider example, and passing data from one clock domain to another. You have to ensure the data produced by one clock is valid at the time a second clock processes it. There are many ways to deal with that; have a look here for starters:
https://www.fpga4fun.com/CrossClockDomain.html

Berni · « **Reply #2 on:** February 26, 2024, 10:16:33 am »

The reasoning for avoiding clock gating is a bit more in depth than a simple yes or no.

FPGAs are very good at clock distribution because it is a common task, so they have dedicated wires that carry clocks around with minimal distortion and skew. However, once you gate a clock it has to branch off from the dedicated clock distribution network and be carried through regular logic wires. This causes extra delay, skew and timing uncertainty, so the logic driven by that clock will not be as well in sync, and it might not reach as high a frequency before things break due to timing getting out of whack.

The clock distribution network does have more than 1 channel, so it can distribute multiple clocks, but there is still a limited number of dedicated wires (exactly how many depends on the FPGA family) so you want to keep the number of widely used clock signals down to a minimum. One way of doing this is avoiding clock gating.

Also in general the gate always adds a tiny bit of extra delay, so when the clock gated part of the logic interacts with other logic, it will be slightly behind in timing.

So it is not like clock gating should never ever ever be used, just that it should be a last resort when other methods can't do what you need. For example, to disable part of a circuit it is often more performant to disconnect the data inputs to that circuit and leave its clock running constantly, rather than stopping its clock.

MarginallyStable · « **Reply #3 on:** February 26, 2024, 05:59:52 pm »

Using the output of a timer (or directly from a register) is more like generating a clock in my mind. Clock gating usually utilizes combinational logic (think 'and' gate with the clock as one input) and can be tricky to get right from a timing perspective. You can end up with undesirable glitches on the gated clock that do not meet minimum pulse width requirements, etc. Many FPGAs do have dedicated clock gating blocks that help timing closure complete successfully. The usual recommended alternative is the use of clock enables that are synchronous to the clock and thus can go through normal timing closure. But this does little to save on energy consumption, as the clocks still propagate to the disabled circuitry.
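
The runt-pulse problem can be illustrated with a small discrete-time model. This is only a sketch in Python rather than HDL: the 10-step clock period, the 40-step window and the point where `enable` drops (t=27) are arbitrary assumptions chosen for illustration, and `clock` and `pulse_widths` are hypothetical helpers.

```python
def clock(t, period=10):
    """Idealized 50% duty-cycle clock: low in the first half of each period, high in the second."""
    return 1 if (t % period) >= period // 2 else 0


def pulse_widths(samples):
    """Return the length of every run of consecutive 1s in a sampled waveform."""
    widths, run = [], 0
    for s in samples:
        if s:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


STEPS = 40
clk = [clock(t) for t in range(STEPS)]

# Assumed scenario: enable drops at t=27, in the middle of a high phase (25..29).
enable = [1 if t < 27 else 0 for t in range(STEPS)]

# Naive gating: a plain AND of clock and enable.
gated = [c & e for c, e in zip(clk, enable)]

clk_widths = pulse_widths(clk)      # every pulse is 5 steps wide
gated_widths = pulse_widths(gated)  # the last pulse is cut short
print(clk_widths, gated_widths)
```

In this model the last gated pulse is only 2 steps wide instead of 5, which is the kind of truncated pulse that can violate a flip-flop's minimum clock pulse width.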

radar_macgyver · « **Reply #4 on:** February 26, 2024, 08:05:12 pm »

The issue becomes a bit more clear when you figure out why this is a recommendation. If one uses a divider implemented in general logic to generate a slower clock, the edges are no longer perfectly synchronous with the source clock. Additionally, the routing resources used for such general-purpose signals have routing delays that are hard to predict, especially over process, voltage and temperature.
Dedicated clock routing resources, on the other hand, are designed for low skew between any two points on the chip. They have much better characterized routing delays, and these are baked into the tables that the place-and-route uses when solving the routing of an FPGA design.
Dedicated clock resources, at least on Xilinx chips (I don't have experience on others), offer clock dividers, PLLs, multiplexers and buffers with a disable control. These can be used to drive the clocks needed in a design, and also gate them off, for example to save power. One can certainly use a divider to generate a clock for something non-critical (e.g. blinking an LED), and usually the tools will warn you and then exclude such paths from the timing optimization. However, there are usually only a few such buffers and multiplexers on a given FPGA, and the global clock nets they drive are also limited resources (as @Berni said), so the usual approach is to have a synchronous design with one (or relatively few) clocks, taking advantage of the synchronous clock enable inputs on each FF. This is much easier for the tools to analyze and verify timing.
On Xilinx FPGAs, a 'BUFG' (global clock buffer) drives a global clock net. For most smaller FPGAs, there are maybe 8 or 16 global clock nets, each driven by its own BUFG. If you want to implement a gated clock, use a BUFGMUX (multiplexer variant of a BUFG), with one input set to '0', and the select lines of the BUFGMUX implementing the gating signal. The big difference between using a BUFGMUX and implementing the same thing in the general logic fabric is:
1. Each BUFGMUX drives its own dedicated global clock line, so the tools have a known delay when computing the clock skew. If one uses general fabric to do this, the tools will use general purpose routing to bring the gated clock to the target flip flops, and the routing delay for these nets is not easy to predict.
2. The BUFGMUX has extra logic built in to avoid generating runt pulses on the global clock lines. For example, if you're gating a clock with a signal that can transition close to one of the edges of the clock, it can produce a runt pulse that's a fraction of the period of the clock. BUFGMUXes include logic to avoid this.

Each Xilinx family of chips has a clock resources user guide where these and other topics are explored in detail.

agehall · « **Reply #5 on:** February 27, 2024, 06:13:01 am »

Thanks for all the responses! This really helps me formulate a better understanding of the problem.

zapta · « **Reply #6 on:** June 17, 2024, 09:56:26 pm »

Quote from: agehall on February 26, 2024, 09:02:09 am

> In most literature I've read, it is a big no-no to gate the clock. But then they happily show how you can reduce the clock by implementing a counter and taking the output from the highest bit of the counter and using that as a clock for other parts of the system. Isn't that also gating the clock?

I use clock enable instead of clock gating and just feed the system clock globally to all the modules. For example, if you want something to happen every N system clocks, have a divider that emits a high pulse only once every N clocks and use this to enable state transitions in downstream modules. This way I stay in the perfect world of synchronous design.
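
The enable-tick scheme described above can be sketched as a cycle-level model. This is a rough Python illustration, not HDL: `TickDivider`, `EnabledCounter` and `run` are hypothetical names, and every call to `step` stands for one rising edge of the single system clock.

```python
class TickDivider:
    """Emits a one-clock-wide enable pulse once every n system clocks."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def step(self):
        """Advance one system clock; return 1 on the cycle where the tick is high."""
        tick = 1 if self.count == self.n - 1 else 0
        self.count = 0 if tick else self.count + 1
        return tick


class EnabledCounter:
    """Downstream state that sees every system clock but only changes when enabled."""

    def __init__(self):
        self.value = 0

    def step(self, enable):
        if enable:
            self.value += 1  # state transition happens only on enabled cycles


def run(n, cycles):
    """Clock both blocks from the same system clock for the given number of cycles."""
    div, ctr = TickDivider(n), EnabledCounter()
    ticks = []
    for _ in range(cycles):
        tick = div.step()
        ctr.step(tick)
        ticks.append(tick)
    return ticks, ctr.value


ticks, value = run(n=4, cycles=12)
print(ticks, value)
```

With `n=4` the tick is high on one cycle in every four, so after 12 system clocks the downstream counter has advanced three times. There is still only one clock in the model; the "slow" behaviour comes entirely from the enable.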

langwadt · « **Reply #7 on:** June 17, 2024, 10:30:48 pm »

Quote from: zapta on June 17, 2024, 09:56:26 pm

> Quote from: agehall on February 26, 2024, 09:02:09 am
>
> > In most literature I've read, it is a big no-no to gate the clock. But then they happily show how you can reduce the clock by implementing a counter and taking the output from the highest bit of the counter and using that as a clock for other parts of the system. Isn't that also gating the clock?
>
> I use clock enable instead of clock gating and just feed the system clock globally to all the modules. For example, if you want something to happen every N system clocks, have a divider that emits a high pulse only once every N clocks and use this to enable state transitions in downstream modules. This way I stay in the perfect world of synchronous design.

I never had a reason to try, but I wonder if the tools will handle it for you if you use something like Xilinx's BUFGCE instead of a clock enable.

glenenglish · « **Reply #8 on:** June 18, 2024, 12:06:53 am »

That's what CLOCK ENABLE is for on the primitives. You don't need to clock gate the whole clock tree (unless that provides the required effect).

Yes, to divide down the clock it must be a synchronous divider, and the tools must know all about the timing relationships.

Register regularly and make it easy for your tools.

Don't ignore tool warnings.

Don't distribute and use async resets unless they are locally registered against the local clock signal.

matrixofdynamism · « **Reply #9 on:** July 26, 2024, 10:22:05 am »

I have not read through the other answers so there might be repetition.

Clock gating is one way to save power in a digital design. Why? The clock is a signal that toggles all the time. Every time a signal changes state, it causes a surge of current in the device. Less switching activity means less power dissipation (this applies to all signals). By stopping the clock signal from toggling, we reduce the power dissipation.
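
As a rough first-order model (the standard CMOS approximation, not something stated in this thread), the dynamic power of a switching net is often written as

$$P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f$$

where $\alpha$ is the activity factor (the average number of charging transitions per clock cycle), $C$ is the switched capacitance, $V_{DD}$ is the supply voltage and $f$ is the clock frequency. A free-running clock net has $\alpha = 1$, which is why it tends to be one of the most active nets in a design; gating it off drives $\alpha$ toward zero for that net and for the registers it feeds.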

Using an AND gate is not the correct way to "gate" a clock. Please take note. It will work in theory but is the wrong method. In practice it will add delay to the clock signal, due to the propagation delay of the logic gate you use to gate the clock. The end result is clock skew in the design. Clock skew is bad because it can lead to timing violations in a design. Timing violation is a separate topic.

Using a dedicated clock buffer or "integrated clock gating cell" (ICGC) in the design is the correct way to gate a clock. These hardware resources are designed specifically to gate clocks. Yes, you will need to look for these in the FPGA design library of the tool you are using and then instantiate them directly in the design. These types of resources generally can't be inferred from VHDL.
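
A common structure for such a cell is a level-sensitive latch followed by an AND gate: the latch lets the enable through only while the clock is low, so the enable seen by the AND gate cannot change while the clock is high. The sketch below models that idea in Python. It is an idealized illustration under assumed timing (10-step period, enable high for 7 <= t < 27), not the implementation of any particular vendor cell, and all helper names are hypothetical.

```python
def clock(t, period=10):
    """Idealized 50% duty-cycle clock: low in the first half of each period, high in the second."""
    return 1 if (t % period) >= period // 2 else 0


def pulse_widths(samples):
    """Return the length of every run of consecutive 1s in a sampled waveform."""
    widths, run = [], 0
    for s in samples:
        if s:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


def and_gated_clock(clk, enable):
    """Naive gating: plain AND of clock and enable."""
    return [c & e for c, e in zip(clk, enable)]


def latch_gated_clock(clk, enable):
    """Latch-based gating: enable is captured only while the clock is low."""
    latched = 0
    out = []
    for c, e in zip(clk, enable):
        if c == 0:
            latched = e  # latch is transparent during the low phase
        out.append(c & latched)  # latch holds its value during the high phase
    return out


STEPS = 40
clk = [clock(t) for t in range(STEPS)]
# Assumed scenario: enable rises and falls in the middle of high phases.
enable = [1 if 7 <= t < 27 else 0 for t in range(STEPS)]

naive_widths = pulse_widths(and_gated_clock(clk, enable))
latched_widths = pulse_widths(latch_gated_clock(clk, enable))
print(naive_widths, latched_widths)
```

In this model the plain AND produces a 3-step and a 2-step runt pulse around the enable transitions, while the latch-based version passes only whole 5-step pulses.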

Please note that by clock gating, I mean that we basically "disable" the clock signal going into a bunch of logic in the design. No clock means no switching activity in that logic at all. If a design is idle, i.e. it has no valid input, it makes sense to disable the clock until that circuit block is needed again. The amount of power saved depends on the size of the circuit block.

There is another approach which is different but has the same end result of stopping design activity. This is where we use a clock enable signal. The registers in the design maintain their value while clock enable is low and only perform their normal function of registering the input when clock enable is high. This can also save power, but the clock signal itself will still be switching; it is just ignored. So this alternative to clock gating, which does not actually turn off the clock signal, saves less power than the proper method.

For large chunks of logic that process data in sequence, one other way of saving power is to tie the inputs to zero and keep them that way. This will save a tiny amount of power too.

Reducing power consumption is a whole topic, but that is another discussion. Just remember that sticking an AND gate into the clock signal path is the wrong way to do clock gating.

glenenglish · « **Reply #10 on:** July 26, 2024, 10:36:46 am »

Quote from: matrixofdynamism on July 26, 2024, 10:22:05 am

> I have not read through the other answers so there might be repetition.

THEN READ BEFORE WRITING

radiolistener · « **Reply #11 on:** July 28, 2024, 11:45:32 pm »

You can do clock gating and use a counter to divide the clock. But the problem is that the output of these operations will have a lot of issues and give you intermittent errors, or just not work at all, if the clock speed is high enough. This is because such an output will have some delay and as a result will not be synchronized with the main clock. It may lead to metastability issues and other synchronization problems.

The delay may also be variable, depending on conditions. So you cannot use such an output as a clock. If you try, it leads to many intermittent issues. For example, it may fail when you change some variable in another module that is not even related to this clock, just because different elements get used after synthesis and that changes some signal delay slightly.

