The issue becomes clearer once you understand why this is a recommendation. If you use a divider implemented in general logic to generate a slower clock, its edges are no longer perfectly synchronous with the source clock: they lag by the divider flip-flop's clock-to-output delay plus whatever routing delay follows. Additionally, the routing resources used for such general-purpose signals have delays that are hard to predict, especially over process, voltage and temperature.
Dedicated clock routing resources, on the other hand, are designed for low skew between any two points on the chip. They also have much better characterized routing delays, and these are baked into the tables that the place-and-route tools use when solving the routing of an FPGA design.
Dedicated clock resources, at least on Xilinx chips (I don't have experience with others), offer clock dividers, PLLs, multiplexers and buffers with a disable control. These can be used to drive the clocks needed in a design, and also to gate them off, for example to save power. One can certainly use a fabric divider to generate a clock for something non-critical (e.g. blinking an LED), and usually the tools will warn you and then exclude such paths from the timing optimization. However, there are usually only a few such buffers and multiplexers on a given FPGA, and the global clock nets they drive are also limited resources (as @Berni said), so the usual approach is to have a synchronous design with one (or relatively few) clocks, taking advantage of the synchronous clock enable inputs on each FF. This is much easier for the tools to analyze and verify timing on.
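To make the clock-enable idea concrete, here is a small behavioral model in Python (not HDL, and not tied to any vendor tool). It is only a sketch: the names `clock_enable` and `run_slow_counter` and the divide ratio of 4 are made up for illustration. Every loop iteration stands for one rising edge of the single system clock, and the "slow" register simply ignores the edges on which its enable is low.

```python
def clock_enable(divide_by, n_cycles):
    """Yield a one-cycle-wide enable pulse once every `divide_by` clock cycles."""
    count = 0
    for _ in range(n_cycles):
        yield count == divide_by - 1
        count = 0 if count == divide_by - 1 else count + 1


def run_slow_counter(divide_by, n_cycles):
    """Model a register on the fast clock that only updates when enabled."""
    slow_counter = 0
    history = []
    for enable in clock_enable(divide_by, n_cycles):
        # One iteration = one rising edge of the single system clock.
        if enable:
            slow_counter += 1
        history.append(slow_counter)
    return history


if __name__ == "__main__":
    # The register advances on every 4th edge; no second clock is created.
    print(run_slow_counter(divide_by=4, n_cycles=12))
    # -> [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3]
```

The point of the pattern is that everything stays in one clock domain, so the timing tools only have a single clock to analyze.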
On Xilinx FPGAs, a 'BUFG' (global clock buffer) drives a global clock net. For most smaller FPGAs, there are maybe 8 or 16 global clock nets, each driven by its own BUFG. If you want to implement a gated clock, use a BUFGMUX (multiplexer variant of a BUFG), with one input set to '0' and the gating signal driving the select input of the BUFGMUX. The big differences between using a BUFGMUX and implementing the same thing in the general logic fabric are:
1. Each BUFGMUX drives its own dedicated global clock line, so the tools have a known delay when computing the clock skew. If one uses general fabric to do this, the tools will use general purpose routing to bring the gated clock to the target flip flops, and the routing delay for these nets is not easy to predict.
2. The BUFGMUX has extra logic built in to avoid generating runt pulses on the global clock lines. For example, if you're gating a clock with a signal that can transition close to one of the edges of the clock, it can produce a runt pulse that's a fraction of the period of the clock. BUFGMUXes include logic to avoid this.
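The runt-pulse problem in point 2 can be illustrated with another small Python model. This is a sketch under stated assumptions: the time units, the gate timing and all function names are hypothetical, and `latched_gate` models a generic latch-based clock gate (the enable may only change while the clock is low), not the actual internal circuit of a BUFGMUX.

```python
PERIOD = 10      # time steps per clock period (arbitrary units, assumed)
HIGH_TIME = 5    # the clock is high for the first half of each period


def clock(t):
    return 1 if t % PERIOD < HIGH_TIME else 0


def gate(t):
    # Hypothetical asynchronous gating signal that changes mid-pulse.
    return 1 if 13 <= t < 32 else 0


def pulse_widths(wave):
    """Return the width of every high pulse in a list of 0/1 samples."""
    widths, run = [], 0
    for sample in wave:
        if sample:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


def naive_gate(gate_fn, n_steps):
    """Plain AND of clock and gate, as fabric logic would implement it."""
    return [clock(t) & gate_fn(t) for t in range(n_steps)]


def latched_gate(gate_fn, n_steps):
    """AND of the clock with a gate that is only sampled while the clock is low."""
    held, out = 0, []
    for t in range(n_steps):
        clk = clock(t)
        if clk == 0:
            held = gate_fn(t)   # latch is transparent only during clock low
        out.append(clk & held)
    return out


if __name__ == "__main__":
    print(pulse_widths(naive_gate(gate, 50)))    # -> [2, 5, 2]
    print(pulse_widths(latched_gate(gate, 50)))  # -> [5, 5]
```

In the naive version the first and last pulses are only 2 steps wide instead of 5; those are the runt pulses. With the gate held while the clock is high, every pulse that gets through is full width.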
Each Xilinx family of chips has a clock resources user guide where these and other topics are explored in detail.