I have a SAMD10, which is similar enough. Some of those GCLK registers have specific read/write requirements, including checking for syncbusy first, so accessing them via anything other than the 'top' struct member (the full register size) means those requirements are not met.
A couple of examples from my GCLK functions, which do full-register writes (32-bit, or 16-bit for CLKCTRL) and do not preserve any bits. These are simple functions; if you need other GENCTRL options, they need modification or you need more functions. Usage first, then the source:
Gclk.generatorSource( Gclk.GEN3, Gclk.OSC8M );
Gclk.generatorDivide( Gclk.GEN3, 0 );
Gclk.generatorSource( Gclk.MAIN, Gclk.DFLL48M );
Gclk.generatorDivide( Gclk.MAIN, mhz <= 12_MHz ? 2 : mhz <= 24_MHz ? 1 : 0 );
Gclk.generatorUser( Gclk.GEN3, Gclk.SERCOM0 );
//etc.
//source
auto
generatorSource (GENERATOR g, SOURCE s)
{
    while( busy() ){} //wait for syncbusy to clear
    reg_.GCLK_GENCTRL = GENENbm bitor (s<<8) bitor g; //32bit write, generator id in low bits
}
auto
generatorDivide (GENERATOR g, u16 d)
{
    while( busy() ){} //wait for syncbusy to clear
    reg_.GCLK_GENDIV = (d<<8) bitor g; //32bit write, generator id in low bits
}
auto
generatorUser (GENERATOR g, USER user, bool enable = true)
{
    while( busy() ){} //wait for syncbusy to clear
    reg_.GCLK_CLKCTRL = (enable ? CLKENbm : 0) bitor (g<<8) bitor user; //16bit write, user id in low bits
}
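If you want to check the bit packing on a PC before touching hardware, the values written above can be built with plain functions. This is only a sketch under assumptions: the helper names are made up for illustration and are not part of the driver above, and the field positions and widths (id in the low bits, SRC/DIV/GEN starting at bit 8, GENEN at bit 16, CLKEN at bit 14) are taken from the code above and should be verified against the datasheet for your part.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not from the driver above). They only compute the
// value that would be written to each register in one full-width access.
const std::uint32_t GENEN_bm = 1u << 16; // assumed GENCTRL.GENEN position
const std::uint32_t CLKEN_bm = 1u << 14; // assumed CLKCTRL.CLKEN position

// Value for the 32-bit GENCTRL write: enable bit, source at bit 8, generator id.
inline std::uint32_t genctrlValue(std::uint8_t gen, std::uint8_t src) {
    assert(gen < 16 && src < 32); // assumed field widths: 4-bit id, 5-bit source
    return GENEN_bm | (std::uint32_t{src} << 8) | gen;
}

// Value for the 32-bit GENDIV write: divider at bit 8, generator id.
inline std::uint32_t gendivValue(std::uint8_t gen, std::uint16_t div) {
    assert(gen < 16); // assumed 4-bit id field
    return (std::uint32_t{div} << 8) | gen;
}

// Value for the 16-bit CLKCTRL write: optional enable bit, generator at bit 8, user id.
inline std::uint16_t clkctrlValue(std::uint8_t gen, std::uint8_t user, bool enable = true) {
    assert(gen < 16 && user < 64); // assumed field widths: 4-bit gen, 6-bit user id
    const std::uint32_t v = (enable ? CLKEN_bm : 0u) | (std::uint32_t{gen} << 8) | user;
    return static_cast<std::uint16_t>(v);
}
```

Each helper returns the whole register value, so the caller still does a single write after waiting for syncbusy, the same as the functions above.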
You also have flash wait states to configure for the cpu speed and voltage you are running at, so if you skip that and switch to a fast clock, that will also cause problems.
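Picking the wait state count can be sketched as a table lookup. This is a hypothetical helper, not code from my driver, and the two limits in the example table are illustrative placeholders only; the real limits depend on the supply voltage and come from the NVM characteristics table in the datasheet. The result would need to be written to the flash controller's wait state setting (I believe that is the RWS field in NVMCTRL CTRLB on these parts) before switching to the faster clock.

```cpp
#include <cstddef>
#include <cstdint>

// ILLUSTRATIVE placeholder limits, one entry per wait-state count (index 0 is
// zero wait states). Replace with the datasheet values for your voltage range.
const std::uint32_t exampleLimits[] = { 24000000u, 48000000u };

// Hypothetical helper: returns the smallest wait-state count n such that
// hz <= limits[n]. If hz is above every listed limit it returns count, which
// the caller should treat as "frequency not supported by this table".
inline unsigned waitStatesFor(std::uint32_t hz, const std::uint32_t* limits, std::size_t count) {
    for (std::size_t n = 0; n < count; ++n) {
        if (hz <= limits[n]) return static_cast<unsigned>(n);
    }
    return static_cast<unsigned>(count);
}
```

Set the wait states first and raise the clock second when speeding up; when slowing down, do it in the opposite order.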