A lot of people still pile in and start writing code instead of planning it.
If you fail to plan, you plan to fail.
True.
Although, I do tend to write code before I finalize a plan: I write separate test cases to check the key parts and algorithms of the core logic, to make sure the plan I'm building on stands on solid ground. I usually learn a lot in the process, and end up rewriting that code anyway in the actual project, so one could consider it wasted effort, but I don't mind: understanding the key parts of the algorithm is worth the time and effort to me.
Example:
In certain types of molecular dynamics simulations, temperature control is implemented by simply scaling particle velocities. This works well for bulk systems, but for small independent clusters, the initial random velocities do not cancel out perfectly, and the temperature control just makes the clusters rotate faster. In some cases this is acceptable; in others it is unwanted. So, one solution is to determine which particles form such clusters, and dampen their rotation.
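To make the velocity-scaling idea concrete, here is a minimal sketch in Python/NumPy. The function name and the reduced units (k_B = 1) are my own choices for illustration; a real simulator does this inside its integrator, usually with a gentler scheme than instantaneous rescaling:

```python
import numpy as np

def rescale_velocities(v, masses, target_temp, k_B=1.0):
    """Naive velocity-scaling thermostat (illustrative sketch, reduced units).

    v           : (N, 3) array of particle velocities
    masses      : (N,) array of particle masses
    target_temp : desired temperature
    """
    n, dim = v.shape
    # Instantaneous kinetic temperature via equipartition:
    # (dim/2) * N * k_B * T = sum over particles of (1/2) m |v|^2
    kinetic = 0.5 * np.sum(masses * np.sum(v**2, axis=1))
    current_temp = 2.0 * kinetic / (dim * n * k_B)
    # Scale every velocity by sqrt(T_target / T_current)
    return v * np.sqrt(target_temp / current_temp)
```

Note that this scales every particle's velocity by the same factor, which is exactly why a small cluster's net rotation gets amplified along with everything else: the scaling cannot tell thermal motion apart from collective rotation.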
How exactly do you determine which atoms are part of a cluster, when you have some kind of gas permeating the simulation volume? You might use rules of thumb, but they'll likely fail. If the simulation is about growing clusters in a very low-density hot gas (a very common real-life case), atoms are constantly aggregating onto the clusters. In fact, you don't even know in advance how many clusters you have in the simulation. Asking a human to select the atoms in a cluster is not viable either, since simulations tend to be run on HPC clusters in batch queues, not interactively.
It turns out there is a trivial solution that costs very little: the disjoint-set (union-find) data structure. Basically, before each time step, you initialize the disjoint set with a unique integer for each atom, essentially numbering them. Then, when you calculate pairwise interactions (these simulations almost always use classical potential models, not quantum mechanical ones), you merge any sets whose atoms are within a cut-off distance, approximating covalent and ionic bonds. (You could use a more precise rule, based on the potential energy and the force between the pair of atoms, but it turns out not to be necessary here.)
At the end of the time step, you flatten the disjoint set paths, and you end up with a cluster identifier for each individual atom.
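The whole scheme fits in a few dozen lines. Here is a sketch in Python; the O(N^2) pair loop is purely illustrative (a real simulator would piggyback on its existing neighbour lists), and the function name and cutoff value are mine:

```python
import numpy as np

def cluster_ids(positions, cutoff):
    """Label atoms by cluster using a disjoint set (union-find).

    positions : (N, 3) array of atom coordinates
    cutoff    : bond distance threshold
    Returns an (N,) array; atoms sharing a value belong to one cluster.
    """
    n = len(positions)
    parent = list(range(n))  # each atom starts as its own set

    def find(i):
        # Walk up to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    # Merge the sets of every pair within the cut-off distance,
    # approximating covalent/ionic bonds. (Illustrative O(N^2) loop;
    # in a simulator this happens inside the pairwise-interaction pass.)
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(positions[i] - positions[j]) <= cutoff:
                union(i, j)

    # Flatten the paths: each atom's root is its cluster identifier.
    return np.array([find(i) for i in range(n)])
```

With path compression the per-step cost is essentially negligible next to the force calculation, which is why this solution is so cheap.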
To implement this, you wouldn't just start modifying your favourite simulator like Gromacs or LAMMPS (both are open source, so you're certainly allowed to). Rather, you'd start by experimenting and testing, before formulating a plan.
Without a plan, just making changes that seem to produce the results you want, you risk invalidating the results of every simulation run with the modified version!
After doing those initial tests, so that you roughly know what to add or modify and where, you write a plan. Then you check the plan against the logic of the existing simulator, perhaps talk to the developers on the mailing list for confirmation, and only *then* do you start implementing the actual changes.
Now, this may sound like a lot of work. The thing is, it actually saves total effort, at least once we count the effort spent debugging haphazardly made changes, the erroneous results they produce, and other users' effort in discovering why their results went wrong.
I do not treat scientific simulations any differently than I do normal applications. I do not want my applications to silently garble or lose data; at minimum, I want them to tell me whenever they detect something unexpected. Having them do retries and workarounds is a plus. None of this is really hard; it just takes developers who care.