I've done some mapping experiments on their technology using direct synthesis, i.e. bypassing their tool flow, which at that time was based on Xilinx primitive emulation.
The agreements made won't let me publish the results, but the bottom line: the CPE primitives perform very well with cascaded signal processing networks. My dry dock experiments with post-mapping co-simulation hinted that this would perform way better than the classical DSP48 architectures in terms of routing, while using only the bits that are needed (in 'stupid' AI acceleration applications, a 4x4 bit truncated multiplier can be just sufficient).
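For anyone unfamiliar with the term, here is a minimal Python sketch of what a 4x4 bit truncated multiplier computes. This is only a bit-level model for illustration, not the CPE mapping or anything from my experiments; the function name `truncated_mul4` and the `keep` parameter are made up for this example. The idea is that the low partial-product columns are never built at all, which saves logic at the cost of a small, bounded error.

```python
def truncated_mul4(a: int, b: int, keep: int = 4) -> int:
    """Model of a 4x4 bit unsigned truncated multiplier (hypothetical helper).

    Only the partial-product bits in the upper `keep` columns of the
    8-bit product are summed; the lower columns are never generated.
    """
    assert 0 <= a < 16 and 0 <= b < 16
    assert 1 <= keep <= 8
    drop = 8 - keep  # number of low product columns that are omitted
    acc = 0
    for i in range(4):
        for j in range(4):
            if i + j >= drop:  # skip partial-product bits below the cutoff
                acc += (((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
    return acc >> drop  # result has `keep` bits


if __name__ == "__main__":
    # Compare against the exact product with its low 4 bits cut off.
    worst = max((a * b >> 4) - truncated_mul4(a, b)
                for a in range(16) for b in range(16))
    print("worst-case error in output LSBs:", worst)
```

With `keep=4` the result never exceeds the exact upper nibble and undershoots it by at most 3 output LSBs, which is the kind of error such low-precision workloads can often tolerate.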
However, this stopped at place & route: the tools just weren't ready at the time and the LUT trees were organized non-optimally, on top of several issues with yosys and memory inference. So, eventually I ran out of time, gave up, and had to revert to classic LUT architectures.
When it comes to power, I would *suspect* that the CPE architecture leaves more room for power optimization when not going through some emulation, but again, I never got past the mapping stage. I might want to revisit some issues for academic fun, since the yosys CXXRTL back end turned out to handle quite a bit of complexity.
About the company, all I know so far is that they have been doing ISDN chipsets for a long time. They have obviously gotten some of the (sparsely available) technology funding to get going with their FPGA architecture, and they have avoided all the other hype, apart from an announcement from Tachyum about a cooperation (no comment...).