Wide format, and even LGA (checkerboard pinout), are widely used for interposer or back-of-die/co-pack bypassing of modern SoCs, CPUs, etc.
Conversely: I have seen wide-format chips used all too often with a lazy dogleg and a single via to each plane, lol. You aren't gaining anything if you aren't also using the minimum-inductance connection (e.g. a pair of vias flanking each pad, as close as the pad, drill, via-in-pad and, if used, tenting rules allow).
To be completely clear, inductance goes as µhl/w, for height above plane h, length l and width w. In PCB, µ will be µ_0. "Goes as", because this is only strictly true in the thin/wide-stripline case, where side fringing and end connections can be ignored. Neither assumption really holds in a capacitor geometry, but the general trend remains.
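As a rough illustration of that scaling, here is a minimal Python sketch of L ≈ µ_0·h·l/w. The function name and the two geometries are hypothetical, picked only to contrast a long/narrow part with the same footprint turned sideways; fringing and end connections are ignored, so treat the numbers as trend indicators rather than real ESL figures.

```python
import math

MU_0 = 4e-7 * math.pi  # permeability of free space, H/m


def plate_inductance(h, l, w, mu=MU_0):
    """Approximate inductance (H) of a thin, wide current sheet over a plane.

    h: height above the plane (m), l: length along current flow (m),
    w: width across current flow (m). Ignores fringing and end effects.
    """
    if min(h, l, w) <= 0:
        raise ValueError("h, l and w must all be positive")
    return mu * h * l / w


# Hypothetical geometries (assumed for illustration, not from a datasheet):
# same 1.6 mm x 0.8 mm footprint, 0.2 mm above the plane.
narrow = plate_inductance(h=0.2e-3, l=1.6e-3, w=0.8e-3)  # current along the long axis
wide = plate_inductance(h=0.2e-3, l=0.8e-3, w=1.6e-3)    # current along the short axis

print(f"long/narrow: {narrow * 1e9:.3f} nH")
print(f"short/wide:  {wide * 1e9:.3f} nH")
print(f"ratio:       {narrow / wide:.1f}x")
```

Swapping l and w halves one term and doubles the other, which is the whole argument for wide-format parts; halving h helps linearly as well.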
You also tend to see very thin chips in interposers, which are themselves PCBs made with quite thin and fine-pitch geometry, hence can have quite low Zo, low stray L, while fanning out quite dense structures like ICs. Again, low h makes for low L.
The PDN (power distribution network) design of a modern BGA/LGA device spans many tiers. The die itself has to handle harmonics into the 10s of GHz, and this is largely handled by nearby gates themselves -- with careful design, it can be ensured that only a modest fraction of gates in a general region toggle each cycle, and thus the ones that are static (on that edge) have capacitance that acts as bypass for the neighbors that are switching. Further up the die stackup, alternating metal grids are used to distribute power, contributing much capacitance in the process (I don't know actual figures, but I would guess this can total several µF; they can interleave a lot here!). Going from chip to interposer, power bandwidth is in the 100s of MHz to low GHz, and a fine-pitch PCB is required, with low-profile, wide-format chip caps. Finally, at the board level, lands/bumps can only ensure up to maybe 100 MHz or so of bandwidth -- there's simply too much length between PCB and package, at the impedances demanded, to have any effect above that, even if the whole package were an alternating checkerboard of balls -- so the fine tiers of on-package filtering are mandatory for practical commercial application!
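To put a number on "too much length at the impedances demanded": a series inductance L has reactance X = 2πfL, so it stops being useful once X exceeds the target impedance. The sketch below just evaluates that. The 10 pH effective inductance and 5 mΩ target are placeholder assumptions chosen for illustration, not measured or datasheet values, and the function names are hypothetical.

```python
import math


def reactance(f_hz, l_henry):
    """Inductive reactance X = 2*pi*f*L, in ohms."""
    return 2 * math.pi * f_hz * l_henry


def max_useful_frequency(z_target_ohm, l_henry):
    """Frequency (Hz) at which the reactance of L equals the target impedance."""
    return z_target_ohm / (2 * math.pi * l_henry)


# Placeholder values (assumptions, not from the post or any datasheet):
L_EFFECTIVE = 10e-12  # 10 pH: many balls/vias in parallel between board and package
Z_TARGET = 5e-3       # 5 milliohm supply impedance target

f_max = max_useful_frequency(Z_TARGET, L_EFFECTIVE)
print(f"reactance at 100 MHz: {reactance(100e6, L_EFFECTIVE) * 1e3:.2f} mOhm")
print(f"board-level caps stop helping above roughly {f_max / 1e6:.0f} MHz")
```

With anything like these values the crossover lands in the tens of MHz to ~100 MHz range; pushing it higher needs either a lower L (shorter, wider, thinner paths) or capacitance placed on the package side of that inductance.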
On the upside, the on-package filtering relaxes PCB layout, so that VCORE/VIO/etc. and GND can be placed in a more orderly fashion (square arrays, rings, rows, etc. of each connection), and fairly large caps placed on the board (say, 10s of µF). Still further away, bulk caps can be placed, perhaps 100s of µF polymers, and finally down at a power bandwidth of 100s of kHz, maybe low MHz, the PoL converter (across however many phases) supplies VCORE from whatever other bulk source is in use (which might in turn be in the 10s of kHz bandwidth, and ultimately supplied from, say, a PFC reservoir capacitor with bandwidth in the single-Hz range, or a battery in the mHz to µHz range).
Tim