Though not with this code: in Altera's Quartus II v9 and above, just adding an extra stage of DFFs, without any additional logic or deliberate pipelining, before or after a piece of HDL code like this can have the same effect, potentially slimming the LUT count and doubling the FMAX. That may just have been the way I was coding at the time, but the compiler does have features to decompose and reconstruct logic to achieve the best possible FMAX, both in the compile stage and in the fitting/physical synthesis stage.
Darn, if I had Quartus installed on one of my PCs today, I would have played with this VHDL code already and posted the results...
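To make the "extra DFF stage" idea concrete, here is a minimal sketch. The entity name, port names, widths and the adder itself are hypothetical and not taken from the code discussed above. All the logic is written in a single stage and is followed by one plain register with nothing in front of it. If register retiming is enabled, the tool is free to push that spare register back into the adder logic. Whether it actually does, and what that does to LUT count and FMAX, has to be checked in the compilation report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: names and widths are for illustration only.
entity sum_with_extra_stage is
  port (
    clk        : in  std_logic;
    a, b, c, d : in  unsigned(15 downto 0);
    sum        : out unsigned(15 downto 0)
  );
end entity;

architecture rtl of sum_with_extra_stage is
  signal sum_r  : unsigned(15 downto 0);  -- result register
  signal sum_rr : unsigned(15 downto 0);  -- extra DFF stage, no logic in front of it
begin
  process (clk)
  begin
    if rising_edge(clk) then
      sum_r  <= a + b + c + d;  -- all of the logic is written as one stage
      sum_rr <= sum_r;          -- spare register that the tool may retime
    end if;
  end process;

  sum <= sum_rr;
end architecture;
```

Note that the extra stage costs one more clock cycle of latency, so whatever consumes the result has to tolerate that.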
There are a lot of undocumented tricks on how you can get great results with inference - how to cast your code 'just right' so a DSP block is inferred, with all the right pipeline registers, or so it uses block RAM, or so LUTs are used as shift registers rather than a chain of FFs.
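As one illustration of the shift register case, here is a sketch of the kind of description that tools commonly recognise as a shift register. The entity name, generic and ports are made up for the example. The usual assumption is that it has no reset and only the last tap is used, since resets or access to every tap tend to force a chain of discrete FFs. Whether a particular tool and device actually packs this into LUT or RAM based shift registers is something to confirm in the synthesis report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: a single-bit delay line, DEPTH assumed to be >= 2.
entity delay_line is
  generic (DEPTH : positive := 16);
  port (
    clk : in  std_logic;
    ce  : in  std_logic;
    d   : in  std_logic;
    q   : out std_logic
  );
end entity;

architecture rtl of delay_line is
  signal taps : std_logic_vector(DEPTH-1 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- No reset on purpose: only a clock enable and a shift.
      if ce = '1' then
        taps <= taps(DEPTH-2 downto 0) & d;
      end if;
    end if;
  end process;

  -- Only the last tap is read; intermediate taps are not exposed.
  q <= taps(DEPTH-1);
end architecture;
```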
The thing that annoys me is that the patterns that work are not well defined. For example, in a clocked process:
data <= memory(to_integer(unsigned(address)));
should infer a block RAM if memory is big enough, but as a general rule, anything with an expression for the array index won't:
data <= memory(to_integer(unsigned(address)+1));
will only infer LUTs and flip-flops. It leaves you with 'land mines' in your code:
addr_temp := unsigned(address)+1; -- assign address to variable
data <= memory(to_integer(addr_temp)); -- look up address
They have big flags waving away saying "Who wrote this junk! Make me shiny!", and when you touch them your design blows up.
(The example is somewhat contrived. I would have to test to find an exact case where I can prove this happens, but you get the idea.)
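For anyone who wants to try it, here is the 'land mine' version wrapped in a complete entity so it can be dropped straight into a project. Only memory, address, data and addr_temp come from the snippets above. The entity name, the write port, and the 1024 x 8 size are assumptions added to make it self-contained. Like the original, it is untested: whether a given tool maps it to block RAM or to LUTs and flip-flops has to be read from the synthesis report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical wrapper: entity name, write port and sizes are assumptions.
entity ram_next_addr is
  port (
    clk     : in  std_logic;
    we      : in  std_logic;
    address : in  std_logic_vector(9 downto 0);
    wdata   : in  std_logic_vector(7 downto 0);
    data    : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of ram_next_addr is
  type mem_t is array (0 to 1023) of std_logic_vector(7 downto 0);
  signal memory : mem_t;
begin
  process (clk)
    variable addr_temp : unsigned(9 downto 0);
  begin
    if rising_edge(clk) then
      -- Write port, added so the memory is not optimised away as constant.
      if we = '1' then
        memory(to_integer(unsigned(address))) <= wdata;
      end if;

      -- Work out the address in a variable first (wraps at 1023 -> 0) ...
      addr_temp := unsigned(address) + 1;
      -- ... so the array index itself is a plain name, not an expression.
      data <= memory(to_integer(addr_temp));
    end if;
  end process;
end architecture;
```

Swapping the last two assignments for the single-line form with the +1 inside the index gives the other variant, so the two reports can be compared side by side.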