A couple of thoughts below. Remember that this comes from experience building profitable circuits in a production environment, where customers pay for reliability and 24/7/365 operation, or where the device might go into shutdown for an hour, a day, a month or more. For us, failure in the field is NOT an option (at least in spirit), and if there is a field failure we take it very seriously and try to improve the product so that particular failure -never- happens again. A hobbyist may or may not choose to do it this way, depending on the application goal. I have to keep it to the highlights here; I can't give you an exact recipe. Use the information at your own risk, since your situation may call for a different procedure.
RE: Burn in procedure: Yes, there are many temperature cycles that go into burn in, which is a combination of a long steady run followed by a period of thermal cycles, etc. We throw everything at the LTZ and it had better work within spec after every trip thru the thermal chamber. This is usually a good way to weed out bad performers in other areas of the board too.
You have to realize that if your product is shipped air freight in an unheated airplane cargo area, it's going to get very cold. Or if it goes somewhere like a hot customs inspection area and sits for a week, it's going to get very hot. The customer doesn't care - they need it to work out of the box, and we provide guidelines for a minimum stabilization time before guaranteed specs are met after shipping. Hint: it's not really the extreme temperatures that cause real issues, it's the -rate of change- of temperature that can cause surprises on your Vref, so that is something we test for on every device to meet customer design requirements. Sometimes you see something break with a small 30°C shock test when it worked fine before over a slow 100°C ramp. Everything around the LTZ has to remain in working condition after every thermal cycle, no matter what we throw at it in our test design. Our test will include the customer requirements plus another safety factor to make -absolutely sure- it works to specification at the customer site - with a comfortable margin for error.
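To put rough numbers on the rate-of-change point, here is a minimal Python sketch that pulls the worst-case ramp rate out of a chamber temperature log. The function name `max_ramp_rate`, the log format, and both example profiles (including the 10 hour and 2 minute durations) are assumptions made up for illustration, not an actual test procedure.

```python
def max_ramp_rate(samples):
    """Return the largest |dT/dt| in degC per minute between consecutive samples.

    samples: list of (time_minutes, temp_c) tuples, sorted by time.
    """
    worst = 0.0
    for (t0, temp0), (t1, temp1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        worst = max(worst, abs(temp1 - temp0) / dt)
    return worst


# Hypothetical profiles (assumed numbers, not measured data):
# a slow 100 degC ramp spread over 10 hours, and a 30 degC shock in 2 minutes.
slow_ramp = [(0, 25.0), (600, 125.0)]
shock = [(0, 25.0), (2, 55.0)]

print("slow ramp:", max_ramp_rate(slow_ramp), "degC/min")
print("shock:    ", max_ramp_rate(shock), "degC/min")
```

With these made-up profiles the 30°C shock is about 90 times steeper than the 100°C ramp, even though the ramp covers more than three times the temperature span.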
We test over the whole -65°C to 150°C range for storage, and that can be over a day or some weeks, and then there is the actual operating temp range, which varies by customer design requirement. We never use the non-A version, only the LTZ1000A (same as most 3458a's, which we find overall is more stable over the long term, at least in our tests). And yes, there is a small power cycle hysteresis, but that should be well under 0.5ppm on a stable LTZ, and most of the time we need to hit a 2ppm or 4ppm stable operational window - so we're not trying to replicate a Fluke 732.
We do develop a stabilization time procedure for the customer so they know when it should be back in spec after a power down of, say, 15 minutes, 1hr, 12hr, 24hr and 30+ days. It should typically recover within a couple of hours or a day or two, sometimes longer...but even then it should stay within the guaranteed 24hr drift rate spec. Like any precision equipment, if it has been powered off for a long time, you expect to power it on and let it stabilize for some time before it settles back to "in spec" condition and goes back to work.
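One way to turn a recovery log into a stabilization time is sketched below in Python. It finds the first timestamp from which every later reading stays inside a ppm window around the pre-power-down value. The function names, the 10V reference value, the 2ppm window and the readings themselves are all hypothetical, chosen only to illustrate the idea.

```python
def ppm_deviation(reading_v, reference_v):
    """Deviation of a reading from the reference value, in ppm."""
    return (reading_v - reference_v) / reference_v * 1e6


def settling_time(readings, reference_v, window_ppm):
    """Return the first timestamp from which all later readings stay
    within +/- window_ppm of reference_v, or None if it never settles.

    readings: list of (time_hours, volts) tuples, sorted by time.
    """
    settled_at = None
    for t, v in readings:
        if abs(ppm_deviation(v, reference_v)) <= window_ppm:
            if settled_at is None:
                settled_at = t      # candidate start of the settled region
        else:
            settled_at = None       # fell out of the window, start over
    return settled_at


# Hypothetical recovery log after a power-down (assumed values, not real data).
recovery = [
    (0.0, 10.000050),
    (0.5, 10.000030),
    (1.0, 10.000015),
    (2.0, 10.000008),
    (4.0, 10.000004),
]

print(settling_time(recovery, reference_v=10.0, window_ppm=2.0))
```

Requiring every later reading to stay in the window, rather than just the first one that lands in it, avoids calling a unit settled when it is only passing through the window on its way somewhere else.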
It just depends on the design requirements, and usually each customer has a very specific need. Our job is to deliver customer satisfaction - because happy customers are the very best variety.
RE: Keysight 34470a as design inspiration - That's not really a metrology grade instrument. The Keysight sales guy brought in an early model to demo, and the stupid thing drifted more in 4 hours than an old reliable 3456a drifts in a month. It could be better now, but look at the specs - be careful when they start listing accuracy in % rather than PPM, and compare the 24hr 10V range accuracy to a 3458a...and let me know which one is much more stable. Plus those danged Playskool child-safe recessed banana plugs tell you right there they aren't paying attention to low-thermal design goals (I know they are trying to sell to the safety-spec market but that is not the metrology market). I know the 34470 is not a replacement for the 3458 and is at a price point to match - but the new design techniques don't inspire anything from a low-ppm tool point of view. They may have some of those problems worked out, but we sent the salesman out the door with his stuff...he didn't sell us on any of the newer, cheaper DMMs. Our fleet tends to stay at the 3456a / 3458a style for lower drift. In the long term the 3458a's don't cost much to own (for the accuracy provided) since they are usually very reliable - especially if you get one under the extended warranty.
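When a datasheet lists accuracy in %, it helps to convert to ppm before comparing (1% = 10,000ppm). The Python sketch below does that and also evaluates a "ppm of reading + ppm of range" style spec in microvolts. The function names and the spec numbers are made up for illustration; they are NOT taken from any Keysight or HP datasheet, so plug in the real figures yourself.

```python
def percent_to_ppm(percent):
    """Convert a percentage to parts per million (1 % = 10,000 ppm)."""
    return percent * 1e4


def spec_error_uv(reading_v, range_v, reading_term_ppm, range_term_ppm):
    """Error bound in microvolts for a '(ppm of reading + ppm of range)' spec.

    1 ppm of 1 V is 1 uV, so volts * ppm gives microvolts directly.
    """
    return reading_v * reading_term_ppm + range_v * range_term_ppm


# Hypothetical spec (assumed numbers): 0.0008 % of reading + 0.0002 % of range,
# evaluated for a 10 V reading on a 10 V range.
reading_ppm = percent_to_ppm(0.0008)
range_ppm = percent_to_ppm(0.0002)

print("reading term:", reading_ppm, "ppm")
print("range term:  ", range_ppm, "ppm")
print("error bound: ", spec_error_uv(10.0, 10.0, reading_ppm, range_ppm), "uV")
```

For this made-up spec the two terms are 8ppm and 2ppm, which at 10V on the 10V range adds up to a 100µV bound. Putting both meters into the same units this way makes the comparison straightforward.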
RE: Slots on LTC6655... That package is SMT, and is prone to every circuit board stress / flex problem known - which really shows up on the trimmed die resistors as increased noise. Not a low ppm-style part - that's like a toy compared to LTZ, but not every design needs an LTZ either. Sometimes even slots aren't enough for that package, but use what the design calls for, and in this case slots help and are recommended. But comparing a '6655 package to an LTZ is like comparing apples and oranges - the LTZ is much more insulated from PCB mechanical stresses.
The other thing to watch out for with slots - be careful that adding slots doesn't increase trace length so much that now you've increased EMI / current loop antenna effects. If you try to help too much in one area you wind up with decreased performance in another area.
Have fun!
PS EDIT: Also look at something like a Datron 4808 calibrator 10V accuracy spec in both long and short term (apparently with a slotted LTZ, I haven't seen this in person) and tell me if that's really different from a standard 3458a...and then compare it to a High Stab 3458a with 02 option (exact same board as any 3458a, still no slots, just a rather well performing selected LTZ). Is the calibrator LTZ Vref design really more stable overall than what is basically the LTZ datasheet circuit in the 3458a?