You're quite right that the transformer is there to provide isolation, which is a required part of the Ethernet spec.
The capacitor to GND is there for EMC. Connecting it to a locally quiet ground keeps the common-mode component on each of the differential pairs close to zero, and if the differential components cancel (as they should, in theory), the result is minimal emissions from an unshielded cable.
The capacitor is rated at 2 kV to match the dielectric strength of the transformer; that rating is chosen so that the user is not exposed to a shock hazard if the cable is shorted to the mains supply.
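
To put rough numbers on why a single capacitor can serve both purposes, here is a small sketch of its impedance at mains frequency versus at RF. The 1 nF value, the 230 V / 50 Hz mains figures and the 100 MHz test frequency are my own assumptions for illustration (none of them come from the question), and the function names are made up for this example.

```python
import math

# Assumed values for illustration only; they are not taken from the question.
C_ASSUMED_F = 1e-9        # 1 nF capacitor (assumed)
MAINS_V_RMS = 230.0       # mains voltage in volts RMS (assumed)
MAINS_HZ = 50.0           # mains frequency in hertz (assumed)
RF_HZ = 100e6             # a representative RF noise frequency (assumed)


def capacitor_impedance_ohms(capacitance_f, frequency_hz):
    """Magnitude of an ideal capacitor's impedance: |Z| = 1 / (2*pi*f*C)."""
    return 1.0 / (2.0 * math.pi * frequency_hz * capacitance_f)


def leakage_current_a(voltage_rms, capacitance_f, frequency_hz):
    """RMS current through an ideal capacitor with voltage_rms across it."""
    return voltage_rms / capacitor_impedance_ohms(capacitance_f, frequency_hz)


z_mains = capacitor_impedance_ohms(C_ASSUMED_F, MAINS_HZ)
z_rf = capacitor_impedance_ohms(C_ASSUMED_F, RF_HZ)
i_leak = leakage_current_a(MAINS_V_RMS, C_ASSUMED_F, MAINS_HZ)

print(f"|Z| at {MAINS_HZ:.0f} Hz: {z_mains / 1e6:.2f} Mohm")
print(f"|Z| at {RF_HZ / 1e6:.0f} MHz: {z_rf:.2f} ohm")
print(f"Current at {MAINS_V_RMS:.0f} V, {MAINS_HZ:.0f} Hz: {i_leak * 1e6:.1f} uA")
```

With those assumed values the capacitor looks like a few megohms at mains frequency but only an ohm or two at 100 MHz, so it passes very little mains current while still giving high-frequency common-mode noise a low-impedance path to ground. This treats the capacitor as ideal; a real part has parasitic inductance and resistance that matter at RF.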