Maybe something like:
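#include <stdint.h>
#define bf_sz 32 // Number of cached entries
typedef struct{
    uint8_t data; // Byte fetched from the display over i2c
    uint8_t row; // Row the byte belongs to
} row_data_t;
row_data_t oled_cache[bf_sz]; // Stored in priority order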
Any time a row that isn't cached is accessed, fetch its data over i2c and overwrite the lowest-priority entry. Whenever a cached row is accessed, raise its priority and re-sort.
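A rough sketch of that replacement policy, in case it helps. Instead of an explicit sort it moves the accessed entry to the front of the array, which keeps the array in priority order (effectively least-recently-used). The valid flag, oled_cache_get(), and the stubbed oled_i2c_read_row() are made-up names for illustration; the stub only fakes a read so the logic can be tried on a PC.

```c
#include <stdint.h>
#include <string.h>

#define BF_SZ 32

typedef struct {
    uint8_t data;   /* cached byte */
    uint8_t row;    /* row the byte belongs to */
    uint8_t valid;  /* 0 until the entry has been filled (assumed extra field) */
} row_data_t;

/* Index 0 is the highest priority, BF_SZ - 1 the lowest. */
static row_data_t oled_cache[BF_SZ];
static unsigned i2c_reads;  /* counts simulated bus reads */

/* Hypothetical stand-in for the real i2c read; returns fake data. */
static uint8_t oled_i2c_read_row(uint8_t row)
{
    i2c_reads++;
    return (uint8_t)(row ^ 0xA5);
}

/* Return the byte for a row, fetching it over i2c only on a miss. */
uint8_t oled_cache_get(uint8_t row)
{
    int i;
    row_data_t entry;

    /* Look for the row in the cache. */
    for (i = 0; i < BF_SZ; i++) {
        if (oled_cache[i].valid && oled_cache[i].row == row)
            break;
    }

    if (i == BF_SZ) {
        /* Miss: overwrite the lowest-priority slot. */
        i = BF_SZ - 1;
        oled_cache[i].data = oled_i2c_read_row(row);
        oled_cache[i].row = row;
        oled_cache[i].valid = 1;
    }

    /* Move the accessed entry to the front, shifting the rest down one. */
    entry = oled_cache[i];
    memmove(&oled_cache[1], &oled_cache[0], (size_t)i * sizeof(row_data_t));
    oled_cache[0] = entry;

    return entry.data;
}
```

Calling oled_cache_get(3) twice in a row only hits the bus once. The memmove on every access costs something too, so on a small MCU a per-entry counter might be cheaper than keeping the array sorted.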
But I think this would only work in very specific cases, like writing to a very small area.
Fetching the data over i2c probably isn't that slow, but it needs some smart prefetching to avoid a lot of overhead.
The drawing routines could detect the rows beforehand, and request filling the cache as required.
For example, drawing a line from xy(0,0) to xy(127,0) would need access to row 0, columns 0-127.
If the buffer holds 32 bytes, break the operation into 32-byte chunks (four burst reads for the 128 columns), which is much faster than single-byte reads.
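Something along these lines for the chunking, as a sketch only. It assumes the display can be read back in bursts, and oled_i2c_read_burst(), oled_i2c_write_burst(), oled_fill_span(), and the 8x128 fake_display array are all invented here so the loop can be tested off-target. The real driver calls would replace the two stubs.

```c
#include <stdint.h>
#include <string.h>

#define BF_SZ     32   /* chunk size in bytes */
#define OLED_ROWS 8    /* assumed display geometry */
#define OLED_COLS 128

/* Stand-in for the display memory, so the sketch runs without hardware. */
static uint8_t fake_display[OLED_ROWS][OLED_COLS];
static unsigned burst_reads;  /* counts simulated burst reads */

/* Hypothetical burst read: one i2c transaction for len bytes. */
static void oled_i2c_read_burst(uint8_t row, uint8_t col,
                                uint8_t *dst, uint8_t len)
{
    burst_reads++;
    memcpy(dst, &fake_display[row][col], len);
}

/* Hypothetical burst write: one i2c transaction for len bytes. */
static void oled_i2c_write_burst(uint8_t row, uint8_t col,
                                 const uint8_t *src, uint8_t len)
{
    memcpy(&fake_display[row][col], src, len);
}

/* OR mask into every byte of one row, columns x0..x1 inclusive,
 * working in chunks of at most BF_SZ bytes.
 * The caller must keep row and columns inside the display. */
void oled_fill_span(uint8_t row, uint8_t x0, uint8_t x1, uint8_t mask)
{
    uint8_t buf[BF_SZ];
    uint16_t col = x0;

    while (col <= x1) {
        uint16_t left = (uint16_t)(x1 - col + 1);
        uint8_t len = (uint8_t)(left < BF_SZ ? left : BF_SZ);
        uint8_t i;

        oled_i2c_read_burst(row, (uint8_t)col, buf, len);   /* fetch chunk */
        for (i = 0; i < len; i++)
            buf[i] |= mask;                                 /* modify */
        oled_i2c_write_burst(row, (uint8_t)col, buf, len);  /* write back */

        col = (uint16_t)(col + len);
    }
}
```

With this, oled_fill_span(0, 0, 127, 0x01) does four burst reads instead of 128 single-byte ones.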