I find information theory provides a perspective that ties things together nicely.
We know from physics that all elementary processes are reversible. Why, then, should entropy increase?
Suppose we have a system of interacting particles. Each particle can exchange some information with its neighbors, those neighbors with theirs, and so on. We shall take the system to be at some finite temperature, so that there is an average energy in the system, and therefore there is information to exchange.
The finite temperature is necessary: if the system were at absolute zero, all particles would sit in the ground state, and there would be no information to exchange.
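To put a number on that, here is a little sketch of how much information a single particle even has to offer as a function of temperature. The two-level model, the 1 meV splitting, and the use of Shannon entropy as the measure are just illustrative choices on my part:

# Occupation and entropy of a single two-level particle in thermal equilibrium.
# Assumed for illustration only: energy splitting E = 1 meV, Boltzmann statistics.
import math

k_B = 8.617e-5          # Boltzmann constant, eV/K
E   = 1e-3              # level splitting, eV (assumed)

def entropy_bits(T):
    # Shannon entropy (bits) of the two-level occupation at temperature T (K).
    if T == 0:
        return 0.0      # everything in the ground state: nothing to exchange
    p_excited = 1.0 / (1.0 + math.exp(E / (k_B * T)))
    p_ground  = 1.0 - p_excited
    return -sum(p * math.log2(p) for p in (p_ground, p_excited) if p > 0)

for T in (0, 1, 10, 100, 300):
    print(f"T = {T:>3} K   entropy = {entropy_bits(T):.3f} bits")

At absolute zero the entropy is exactly zero; it only approaches one bit per particle once the thermal energy is comparable to the level splitting.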
Because interactions are necessarily reversible, the essence of "information" here is that a given particle, or set of particles, is definitely known to be in some particular state.
Now, consider one particle. It starts in some definite state. (If we're looking at, say, nuclear spins, then it might be spin-up or something.)
If that state is mixed with a neighboring state, the result is an entanglement of those two states.
In the new combined state, the fixed, known information (that the particle was spin-up) remains present. But after this step, you must now measure two particles, and subtract out the "interference" (the unknown state entangled with it), to recover the original state.
As those new states mix with their neighbors in turn, more and more individual states become entangled (and in more and more possible permutations), so more and more particles must be measured, and their interactions subtracted, in an ever more complex series of operations.
Effectively, the kernel of information contained in any one particle, at any point in this state-transition diagram, diffuses outward as more (unknown) states interact with, and dilute, it.
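Here is a toy, purely classical stand-in for that diffusion, with a 2x2 rotation playing the role of each reversible pairwise interaction. The rotation angle, the array size, and the number of interactions are arbitrary choices of mine, and of course this is an analogy, not real entanglement:

# Classical stand-in for pairwise "mixing": each interaction is a 2x2 rotation
# acting on two neighboring variables. It is perfectly reversible, yet the value
# that started at site 0 spreads over more and more sites, and recovering it
# requires undoing every interaction in exactly the reverse order.
import numpy as np

rng = np.random.default_rng(0)
n = 16                      # number of "particles" (arbitrary)
x = np.zeros(n)
x[0] = 1.0                  # the one piece of known information
theta = 0.4                 # interaction strength (arbitrary)
c, s = np.cos(theta), np.sin(theta)

def mix(v, i, j):
    v[i], v[j] = c * v[i] + s * v[j], -s * v[i] + c * v[j]

def unmix(v, i, j):
    v[i], v[j] = c * v[i] - s * v[j], s * v[i] + c * v[j]

steps = [0]
mix(x, 0, 1)                              # the known value mixes with its neighbor
for _ in range(50):                       # then repeated random neighbor interactions
    i = int(rng.integers(0, n - 1))
    mix(x, i, i + 1)
    steps.append(i)

print("nonzero sites after mixing:", int(np.count_nonzero(np.abs(x) > 1e-12)))
print("value remaining at site 0:", round(float(x[0]), 4))

for i in reversed(steps):                 # only the full reverse sequence recovers it
    unmix(x, i, i + 1)
print("site 0 after exact reversal:", round(float(x[0]), 4))   # back to 1.0 (up to rounding)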
It is very difficult to explain a system like this without using chronological terms ("before", "after"), but the system can be described in the abstract without any assumption of "time". You can simply have, for example, an array of variables in computer memory, and iterate a function on it. There is always a successor operation, and because the function is by definition reversible, there is always a predecessor operation, too. You can apply either function to the array.
But only one function, when iterated, will result in something that looks like "after".
Indeed, the arrow of time is uniquely, and necessarily, identified as the direction in which more states become mixed.
It is because of the inevitability of state mixing, of information diffusion, of information saturation -- the effect I spoke of earlier, that information can always become more scrambled, but almost never less -- that total entropy rises, and it is in this direction that we experience what we call "time".
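If you want to play with this yourself, here is roughly the kind of thing I mean by "an array of variables in memory" with a reversible update. The particular XOR-with-the-past rule, the 64-cell array, and the bit count used as a crude "mixedness" measure are just choices I've made for the sketch:

# Reversible dynamics on an array in memory: a second-order "XOR with the past"
# update. step() and unstep() are exact inverses, so nothing in the rule itself
# picks out a direction; only the low-entropy starting state does.
# (The rule, the size, and the measure are illustrative choices.)
import numpy as np

N = 64
prev = np.zeros(N, dtype=np.uint8)
curr = np.zeros(N, dtype=np.uint8)
curr[N // 2] = 1                          # one localized, known bit

def f(state):
    # Any fixed local rule works; here, XOR of the two neighbors.
    return np.roll(state, 1) ^ np.roll(state, -1)

def step(prev, curr):                     # the "successor" operation
    return curr, f(curr) ^ prev

def unstep(prev, curr):                   # the "predecessor" operation (exact inverse)
    return f(prev) ^ curr, prev

history = []
for _ in range(40):
    history.append(int(curr.sum()))       # crude mixedness: how many cells are active
    prev, curr = step(prev, curr)
print("active cells over time:", history) # grows (irregularly) from 1 as the pattern spreads

for _ in range(40):                       # iterate the other function instead...
    prev, curr = unstep(prev, curr)
print("recovered start:", int(curr.sum()), "active cell at site", int(curr.argmax()))

Both functions are equally valid dynamics; only one of them, iterated from the localized seed, produces the ever-more-mixed sequence that looks like "after".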
Perhaps this even gives insight into how one might attack my proposed problem:
Manipulate the statistics of Johnson-Nyquist noise in an otherwise ordinary resistor.
For simplicity's sake, perhaps we shall use a somewhat contrived resistor: one which is metallic, ohmic, has a nice characteristic impedance (so the noise is flat with frequency), and consists of as few atoms as possible, being perhaps nanometers wide. We shall use a resistive film for the element itself, and superconducting wires to connect it to a very sensitive amplifier. It will be cold, but hardly near absolute zero. The dominant thermal modes will still be vibrational (phonon) and electronic.
We also put it in a vacuum, so we avoid phonon coupling from gas molecules. Being small, it also avoids coupling to basically any E&M radiation that would be a problem: as long as the whole resistor and its superconducting lead-outs fit within some tens of nanometers, the structure should be literally "too small to see" -- far smaller than the wavelengths of any relevant thermal radiation.
Furthermore, we construct as much as possible from isotopically pure substances, using spin-zero isotopes (for example, a carbon film resistor, diamond substrate and insulation). This prevents nuclear moments from coupling as well.
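For a sense of scale, the standard Johnson-Nyquist formula v_rms = sqrt(4 k_B T R Δf) gives the noise we would be trying to manipulate. The 50 ohm, 4 K, and 1 GHz figures below are assumptions of mine, not part of the setup:

# Back-of-envelope Johnson-Nyquist noise for the assumed resistor.
# Assumed numbers (not fixed by the setup above): R = 50 ohm, T = 4 K, Δf = 1 GHz.
import math

k_B = 1.380649e-23      # J/K
R   = 50.0              # ohm
T   = 4.0               # K ("cold, but hardly near absolute zero")
df  = 1e9               # Hz, measurement bandwidth

v_rms = math.sqrt(4 * k_B * T * R * df)
print(f"open-circuit noise: {v_rms * 1e6:.3f} uV RMS")   # ~3.3 uV with these numbers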
That's the setup. Now for the hard part. Suppose we construct a phased array of frequency/energy-agile phonon amplifiers, attached to the base of this resistor fixture.[1] Since the only energy in the system comes from thermal phonons, and phonon-electron interactions, it should be -- in principle, if the world is as ideal as you suppose -- possible to manipulate, in some way, the fluctuations seen at the resistor terminal.
Now, with "only" a few trillion atoms present, it should perhaps be possible to analyze the number of states and, by applying the correct sequence of phonon (vibrational) stimuli, steer the waves influencing the electrons, and ultimately manipulate the terminal voltage itself.
This has to be subject to an additional constraint: because you essentially have control over most vibrations in the system, it is trivially easy to change its temperature, and thus vary the noise intensity. So I would further require that the average energy delivered by the phonon array be constant. With, say, a 1000^2 array of elements, this still leaves more than enough degrees of freedom to manipulate the signal in more precise ways (i.e., not just turning the noise on and off). In general, the data sent to the array will satisfy Maxwell-Boltzmann statistics, because it's the result of a deconvolution with some assumed model of the system's scattering behavior. Likewise, the output signal can't exhibit a different RMS value, but, for example, its frequency spectrum could be modulated to convey a message.
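In pure signal-processing terms, that last sentence amounts to something like the following toy. Nothing here is about the physical array; I'm just filtering white Gaussian noise to tilt its spectrum and rescaling so the RMS, standing in for the noise temperature, is unchanged:

# Toy version of "same RMS, different spectrum": take white Gaussian noise,
# low-pass it with a simple moving average to impose a spectral tilt, then
# rescale so the total power (the stand-in for noise intensity) is unchanged.
import numpy as np

rng = np.random.default_rng(1)
n = 1 << 16
white = rng.standard_normal(n)

kernel = np.ones(8) / 8.0                      # crude low-pass (assumed shape)
shaped = np.convolve(white, kernel, mode="same")
shaped *= np.std(white) / np.std(shaped)       # restore the original RMS

print("RMS white / shaped:", round(float(np.std(white)), 4), round(float(np.std(shaped)), 4))

# Power in the lower half of the spectrum: about 50% for white noise,
# strongly skewed low for the shaped noise -- a property a receiver could detect.
def low_band_fraction(x):
    p = np.abs(np.fft.rfft(x)) ** 2
    half = len(p) // 2
    return p[:half].sum() / p.sum()

print("low-band power fraction: white =", round(float(low_band_fraction(white)), 3),
      " shaped =", round(float(low_band_fraction(shaped)), 3))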
[1] No one knows how to do this right now. A piezo array would be a good start, but ferroelectricity breaks down at small scales, so we can't make a nanometer-scale phased array. Some research has been done with phonon lasers and such, which might eventually give rise to such a generalized instrument. For the sake of argument, let's say we have one, though.
I'm not sure just how many states are necessary to analyze such a system. Things go exponential (or factorial) really freaking quickly. A million-element phonon array might not even be sufficient to control a trillions-of-atoms system. And the amount of memory and processing required to solve the deconvolution in real time is at least quadratic in that, so you're talking about basically doing DSP on the human genome in real time. And you only have a single output variable (voltage), so you have to watch it for an extremely long time to study what any given input sequence is doing to it, if anything.
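Rough numbers, just to show how quickly it blows up. The 10^12 atoms and the 1000^2 array are the figures from above; "two states per atom" and "quadratic in the array size" are crude assumptions of mine:

# Crude scaling estimates for the numbers quoted above.
# Assumptions: ~1e12 atoms with 2 states each, a 1000^2-element array, and a
# deconvolution cost taken (optimistically) as merely quadratic in array size.
import math

n_atoms    = 1e12
n_elements = 1_000 ** 2

print(f"microstates (2 per atom): ~10^{n_atoms * math.log10(2):.2e}")
print(f"deconvolution work per update (quadratic): ~{n_elements ** 2:.0e} operations")
# ...and all of it has to be inferred back out through a single voltage output.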
Clearly, doing this even for a very small, contrived system is difficult beyond belief; doing it for any macroscopic system is patently impossible in the most literal sense: even if each and every other atom in the Observable Universe could be turned towards computing the history of a single cubic centimeter of matter, there is no time scale over which accurate-enough observations could be made, nor the analysis performed, for that state evolution ever to be computed!
And it is in this sense that the task is the most thoroughly impossible: in the thermodynamic, statistical, information-theoretic sense, it is impossible to analyze bulk randomness. It is not simply chaotic; it is manifest, pervasive and fully saturating. To misunderstand this fact is to delude oneself worse than thinking 1 = 2!
Tim