Perhaps it's not so simple. Every vendor advertises the number of bits in the DAC, the higher the better. But with DDS, the accuracy of waveform rendering depends not only on the clock frequency and DAC resolution, but also on the look-up table (LUT) size, which defines how many waveform points are tabulated. Even with a good DAC, it's easy to end up with a poor generator if the LUT is too small. Size does matter: the larger the better. Yet few vendors advertise the LUT size, and there must be a reason for that.
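To put rough numbers on that, here is a toy model, not any vendor's actual design. It assumes a full-cycle sine LUT addressed by a truncated phase value and feeding an ideal DAC; the function name `lut_sine_error` and the 14-bit DAC figure are made up for illustration.

```python
import math

def lut_sine_error(lut_bits, dac_bits, n_phases=1 << 16):
    """Worst-case error (as a fraction of full scale) of a sine read from a LUT.

    Assumptions: the LUT holds one full cycle, the phase is truncated
    (not interpolated) to the LUT address, and the DAC is ideal.
    """
    lut_size = 1 << lut_bits
    full_scale = (1 << (dac_bits - 1)) - 1  # largest positive DAC code
    lut = [round(full_scale * math.sin(2 * math.pi * k / lut_size))
           for k in range(lut_size)]
    worst = 0.0
    for p in range(n_phases):
        phase = p / n_phases              # fraction of a cycle, 0 <= phase < 1
        index = int(phase * lut_size)     # phase truncation to a LUT address
        ideal = full_scale * math.sin(2 * math.pi * phase)
        worst = max(worst, abs(lut[index] - ideal))
    return worst / full_scale

if __name__ == "__main__":
    # Same 14-bit DAC every time; only the LUT size changes.
    for bits in (6, 8, 10, 12):
        print(f"LUT 2^{bits} points: worst error {lut_sine_error(bits, 14):.5f} of full scale")
```

In this model the DAC stays at 14 bits throughout, yet the worst-case error is set almost entirely by the LUT size until the table gets large. A real generator may interpolate or dither, so take the numbers as an illustration only.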
When the waveform's math function is known (as with a built-in function waveform), it's not necessary to load all the points of a waveform cycle into the LUT at once. A more economical way is to split the points into several parts and load just one part into the LUT at a time. E.g., with a sine waveform, it's enough to load the points of 1/4 cycle into the LUT and then update the LUT with the next part at run time. That way, the effective LUT resolution can be four times the native LUT resolution. It's an economy measure. Perhaps the imperfections in the waveform at the part boundaries are related to something like that.
Edit: I have to leave for some time, so let me clarify some obviously illogical statements above. They were a simplification. The key point is that with a function waveform, only part of the points are loaded into the LUT. But a modern DDS generator can produce waveforms of hundreds of MHz. At those frequencies, there is no time to even think about loading kilobytes of data at run time. So it does not work exactly like this, but much faster. This stuff is very function-specific, and many related patents have been published for every popular function. For instance, with the sine function, only the points of the first quadrant of the cycle are in the LUT. At run time, the points are addressed by the RAM address counter in the FPGA. The counter starts in up-count mode, so the points are output in ascending order. When the last point of the first quadrant has been output, the waveform amplitude reaches its peak. At that moment, the counter switches to down-count mode, and from then on the same points are output in descending order, making the second quadrant. At the end of that quadrant, the waveform amplitude reaches zero. At that moment, the counter switches back to up-count mode, and one more reconfiguration is done: a NOT gate is inserted in the data path of every data bit of the point. That starts the third quadrant, where the waveform amplitude is negative. But an inversion is not a negation. In two's complement, inverting zero gives 111111..., which is -1, while negating zero gives zero. It's easy to correct the problem by adding 1, but that would require one more clock cycle and more complex hardware for carry propagation. It seems they decided to just leave it as is.
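The quadrant trick and the off-by-one at the zero crossing are easy to see in a few lines of Python. This is only a sketch of the scheme as I described it, not the actual FPGA logic. The names `quarter_table` and `one_cycle`, the 8-point table and the amplitude of 127 are all made up for the example, and Python's `~` operator stands in for the per-bit NOT gates (for integers, `~v` equals `-v - 1`).

```python
import math

def quarter_table(n_points, amplitude):
    """First-quadrant sine samples, from 0 at index 0 up to the peak."""
    return [round(amplitude * math.sin(math.pi / 2 * i / (n_points - 1)))
            for i in range(n_points)]

def one_cycle(table, negate_exactly=False):
    """Build a full cycle from the quarter table, as an up/down counter would."""
    up = list(table)                 # quadrant 1: counter counts up
    down = list(reversed(table))     # quadrant 2: counter counts down
    positive_half = up + down
    if negate_exactly:
        # True negation: what "invert, then add 1" would give.
        negative_half = [-v for v in positive_half]
    else:
        # Bitwise inversion only (the NOT gates): ~v == -v - 1.
        negative_half = [~v for v in positive_half]
    return positive_half + negative_half

if __name__ == "__main__":
    table = quarter_table(8, 127)
    inverted = one_cycle(table)
    exact = one_cycle(table, negate_exactly=True)
    start_q3 = 2 * len(table)        # first sample of the third quadrant
    print("end of Q2:", inverted[start_q3 - 1])
    print("start of Q3, inverted:", inverted[start_q3])
    print("start of Q3, negated: ", exact[start_q3])
```

With inversion only, every sample in the negative half sits one LSB below the true negated value, so the waveform steps from 0 to -1 at the start of the third quadrant instead of staying at 0. Whether a real instrument does exactly this is my guess, as said above.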