It's always advisable to set derivative to zero unless proven to be needed.
I don't like this kind of grossly incorrect generalization. When a PID controller is used as a generic control component without conversion (e.g. integrating or differentiating) of its inputs and outputs, which terms are needed depends on what is being measured and what is being controlled. So the roles, and the mental expectations of what P, I, or D do, are not actually fixed!
Very relevantly in OP's case, for example, if you control, say, a motorized valve based on a water temperature measurement, or a motorized potentiometer based on a volume (or reference resistance track) measurement, which terms are needed depends on how the motor is driven:
If the motor itself accepts a position input, you are correct: D should initially be set to zero, and a PI controller is the primary method. P gives the transient response and I removes the offset error. P should be set fairly high, e.g., raised until the loop oscillates and then halved, with just a little bit of I added to remove the offset.
BUT:
If the motor is given relative movement commands (e.g., pulses of varying length based on the PID output value), then the motor itself becomes a mechanical integrator, and the roles of the terms become integrated, too: the I term becomes a double integral (integral of an integral), which is useless. The P term becomes an integral and removes the offset (so only a small amount is used). The D term becomes P and provides the transient response! So you need a PD controller with quite a lot of D term, and you can't have a responsive and stable feedback loop without it.
I know the whole of PID control theory is a huge can of worms, and it is best tackled with an iterative hybrid strategy of just building stuff and experimenting while looking at the literature. I discovered this PD thing when building a controller for a motorized heating shunt valve, which mixes hot and cold water and measures the output temperature. Because the motor is just a synchronous AC motor, driven in pulses clockwise or counterclockwise, attempts to control it with PI control all failed miserably. The elements for success were:
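To make the role shift concrete, here is a minimal numerical sketch. It is only an illustration under assumptions: the actuator is treated as an ideal integrator (position is the running sum of the command), and the function names, gains, and error sequence are all made up for the example.

```python
def pd_through_integrator(errors, kp, kd, dt):
    """Position reached by an ideal integrating actuator driven by a PD output.

    Assumption: the actuator has no backlash or saturation; it simply
    accumulates the commanded movement.
    """
    position = 0.0
    prev_error = errors[0]
    positions = []
    for e in errors:
        derivative = (e - prev_error) / dt
        u = kp * e + kd * derivative  # PD output = commanded movement rate
        position += u * dt            # the motor integrates the command
        prev_error = e
        positions.append(position)
    return positions


def pi_direct(errors, kp, ki, dt):
    """Output of an ordinary PI controller on the same error sequence."""
    integral = 0.0
    outputs = []
    for e in errors:
        integral += e * dt
        outputs.append(kp * e + ki * integral)
    return outputs


if __name__ == "__main__":
    errors = [1.0, 0.8, 0.5, 0.3, 0.1, 0.0, -0.1]  # made-up error samples
    kp, kd, dt = 0.2, 3.0, 0.1                      # made-up gains and step

    pd_positions = pd_through_integrator(errors, kp, kd, dt)
    # Same trajectory from a PI controller with the gains swapped:
    # the PD controller's D gain acts as P, and its P gain acts as I.
    pi_outputs = pi_direct(errors, kp=kd, ki=kp, dt=dt)

    offset = kd * errors[0]  # constant difference from the initial condition
    for p, r in zip(pd_positions, pi_outputs):
        print(round(p + offset, 6), round(r, 6))
```

In this idealized setting the two columns match: a PD controller feeding an integrating motor traces the same positions as a PI controller feeding a position-input motor, apart from a constant offset set by the first error sample.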
* Backlash removal logic (simple piece of code which detects when direction changes and then lengthens the pulse)
* A lot of D term, and removal of I term
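The backlash removal idea can be sketched roughly like this. This is not the original controller code, just a guess at the general shape: it assumes signed pulse lengths in milliseconds (positive for clockwise, negative for counterclockwise) and a fixed extra pulse length on each reversal. The class name and the `extra_ms` value are hypothetical and would have to be found by experiment on real hardware.

```python
class BacklashCompensator:
    """Lengthens the first pulse after a direction change.

    Assumption: a fixed extra pulse length (extra_ms) is enough to take up
    the mechanical slack; the real value depends on the gearbox.
    """

    def __init__(self, extra_ms):
        self.extra_ms = extra_ms
        self.last_direction = 0  # 0 means the motor has not moved yet

    def adjust(self, pulse_ms):
        """Return the pulse to actually send for a requested signed pulse."""
        if pulse_ms == 0:
            return 0  # no movement requested, remembered direction unchanged
        direction = 1 if pulse_ms > 0 else -1
        reversed_direction = (
            self.last_direction != 0 and direction != self.last_direction
        )
        self.last_direction = direction
        if reversed_direction:
            # Extend the pulse in the new direction to cross the dead zone.
            return pulse_ms + direction * self.extra_ms
        return pulse_ms


if __name__ == "__main__":
    comp = BacklashCompensator(extra_ms=50)
    for requested in [100, 80, -30, -20, 0, 40]:
        print(requested, "->", comp.adjust(requested))
```

The requested pulse would come from the PD output; only the pulse that follows a reversal gets lengthened, so steady movement in one direction is left untouched.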
Very often, PID alone doesn't give very good results, even if you hire an infinite number of monkeys for an infinite time to generate optimal coefficients. Usually the system needs something else too, like the backlash removal I mentioned, or some physical feedforward (e.g., in a robot, from the gyroscope's yaw angular rate directly to the motor currents).