I suppose an increment of 1 might be slightly faster on some platforms. The difference is that there's usually a dedicated INC instruction for that case, whereas other increments need LDI reg, INCREMENT / ADD accum, reg, or ADD imm: more or longer instructions either way.
I would not worry about it, at all. There are far more important things to think about. Loop optimization is a late, late stage of development.
The loop check may also vary. For example, an increasing loop uses CMP accum, LIMIT, which may be longer than the TST accum needed when decreasing to zero (or no test at all, when the ADD/SUB instruction itself sets the zero flag). Mind that a comparison against zero only hits when the initial value is a multiple of the increment.
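As a rough sketch of the two loop shapes in C (the function names and the byte-sum workload are made up for illustration, and whether the count-down form actually saves an instruction depends entirely on your compiler and target):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count up: the loop test compares the index against a limit (CMP-style). */
uint32_t sum_up(const uint8_t *buf, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += buf[i];
    }
    return total;
}

/* Count down: the loop test is against zero, which the decrement
 * itself may already flag on many CPUs. */
uint32_t sum_down(const uint8_t *buf, size_t n)
{
    uint32_t total = 0;
    for (size_t i = n; i != 0; i--) {
        total += buf[i - 1];
    }
    return total;
}

/* Count down by `step`, visiting every step-th byte.
 * The `!= 0` test only terminates correctly when n is a
 * multiple of step, so that is checked up front. */
uint32_t sum_down_step(const uint8_t *buf, size_t n, size_t step)
{
    assert(step != 0 && n % step == 0);
    uint32_t total = 0;
    for (size_t i = n; i != 0; i -= step) {
        total += buf[i - step];
    }
    return total;
}
```

If the count might not be a multiple of the step, use a `<` or `>` style test instead of `!= 0`, and accept the longer compare.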
Loops can be on pointers as well. Instead of
for (i = 0; i < size; i++) {
    q[i] = p[i];
}
you might have something like,
/* assuming p and q point to byte buffers */
unsigned char* qq = q;
const unsigned char* pp = p;
const unsigned char* ppend = pp + size;
while (pp < ppend) {
    *qq++ = *pp++;
}
In certain (simple) cases, the compiler may do this transformation for you; check the output listings. The pointer version tends to be more compact, but is not necessarily faster: for example, on 8-bit platforms the 16-bit pointer compare requires at least two instructions (compare, compare with carry), and usually four (a load immediate for each byte of the comparison).
And in such cases, you could go even further and think about page-aligned accesses, for example copying a buffer whose start address and length are both multiples of 256 bytes, so that the low-byte compare becomes trivial (0x00). The compiler won't know about this; it only applies once you're deep in the assembler, desperate for CPU cycles, pushing around object allocations for lucky breaks like this.
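To give a feel for the page-aligned idea in C rather than assembler, here is a minimal sketch. The helper name copy_pages is hypothetical, and it assumes both buffers start on a 256-byte boundary and the length is a whole number of 256-byte pages. The inner loop uses an 8-bit index that wraps to zero, so the only loop test is "did the low byte come back to 0x00". Whether your compiler turns this into the tight code you'd write by hand is not guaranteed; check the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `pages` blocks of 256 bytes each.
 * Assumption: dst and src are both 256-byte aligned, so on an 8-bit
 * target the low address byte can double as the index. */
void copy_pages(uint8_t *dst, const uint8_t *src, uint8_t pages)
{
    assert(((uintptr_t)dst & 0xFF) == 0);
    assert(((uintptr_t)src & 0xFF) == 0);

    while (pages != 0) {
        uint8_t i = 0;
        do {
            dst[i] = src[i];
            i++;              /* 8-bit index wraps from 255 back to 0 */
        } while (i != 0);     /* test against zero only, no 16-bit compare */
        dst += 256;           /* only the high address byte changes */
        src += 256;
        pages--;
    }
}
```

The alignment asserts are only there to make the assumption explicit; on a real 8-bit target you'd arrange the alignment in the linker script or allocator and drop them.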
Tim