Keep an array of 250 entries holding the number of visible entries in each million-entry block of the main array. Call it the block size array, BSA (though preferably you would come up with a better name). So if the main array has 130 million entries, all expanded, the first 130 entries of the BSA will hold the value 1 million and the rest zero.
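In C the bookkeeping might look something like this (a sketch only; the names and the 250-million cap are my assumptions for illustration):

    #include <stdint.h>

    #define MAX_ENTRIES 250000000ULL                 /* assumed cap on the main array */
    #define BLOCK_SIZE  1000000ULL                   /* main entries per BSA slot     */
    #define BSA_SIZE    (MAX_ENTRIES / BLOCK_SIZE)   /* = 250 slots                   */

    static uint32_t bsa[BSA_SIZE];     /* visible entries in each block */
    static uint64_t total_visible;     /* running total, kept in step   */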
When a set of entries is collapsed or expanded, the relevant BSA entry is adjusted; if the change straddles a block boundary, two BSA entries may need adjusting.
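A sketch of the toggle handler, continuing the declarations above and assuming a collapsed set hides the count entries that follow its head entry:

    /* sign = +1 on expand, -1 on collapse; the 'count' entries after
       the head entry at index 'head' are the ones shown or hidden.
       Each affected entry bumps the slot of the block it lies in, so
       a set straddling a block boundary naturally touches two slots. */
    static void bsa_adjust(uint64_t head, unsigned count, int sign)
    {
        for (unsigned k = 1; k <= count; k++) {
            if (sign > 0) bsa[(head + k) / BLOCK_SIZE]++;
            else          bsa[(head + k) / BLOCK_SIZE]--;
        }
        if (sign > 0) total_visible += count;
        else          total_visible -= count;
    }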
Calculating the scrollbar position now takes, in the worst case, 250 iterations through the BSA (summing its entries) plus up to 1 million iterations through the relevant block of the main array. By keeping track of the total number of visible entries you can cap the BSA search at 125 iterations, starting from whichever end of the BSA is nearer and working forwards from the beginning or backwards from the end.
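For instance, to get the visible rank of the entry at the top of the view (which is what the scrollbar position is derived from), with entry_is_visible an assumed helper and the other names as in the sketches above:

    extern int entry_is_visible(uint64_t index);   /* assumed helper */

    /* Visible rank of main-array entry 'pos': sum whole blocks via
       the BSA - forwards, or backwards from total_visible, whichever
       end is nearer (<= 125 slots) - then scan the entry's own block. */
    static uint64_t visible_rank(uint64_t pos)
    {
        uint64_t block = pos / BLOCK_SIZE, rank = 0;

        if (block <= BSA_SIZE / 2) {
            for (uint64_t b = 0; b < block; b++)
                rank += bsa[b];
        } else {
            rank = total_visible;
            for (uint64_t b = block; b < BSA_SIZE; b++)
                rank -= bsa[b];
        }
        for (uint64_t i = block * BLOCK_SIZE; i < pos; i++)
            if (entry_is_visible(i))     /* the up-to-1M-entry scan */
                rank++;
        return rank;
    }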
One million iterations is still quite a lot; coded sensibly it should be fast enough on any non-Neolithic PC, but it might not make for a very smooth scrollbar. Increasing the BSA to 2500 entries cuts the block search range to 100K entries but raises the BSA search to 1250 iterations maximum.
Searching a block of the main array requires iterating through the currently visible entries, so it would be reasonable for every entry to have a size field: an expanded entry has size 1, the head of a collapsed set holds the number of entries in the set, and the hidden members hold 0. So if three associated entries are collapsed into a single visible entry and surrounded by expanded entries, the size fields would read ...1,1,1,3,0,0,1,1,1...
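With that encoding, hopping from one visible entry to the next is just i += size[i]. For example, to find the v-th visible entry of a block (a sketch, assuming the block starts on a visible entry):

    #include <stdint.h>

    /* Main-array offset (within its block) of the v-th visible entry:
       an expanded entry has size 1, the head of a collapsed set
       carries the set size, and the hidden members are skipped over. */
    static uint64_t nth_visible(const uint8_t *size, uint64_t v)
    {
        uint64_t i = 0;                 /* assumes size[0] != 0 */
        while (v--)
            i += size[i];
        return i;
    }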
Extracting these size fields into a separate array would improve cache behaviour. Since a size is always 7 or less, this can be an array of 64-bit words, each packing 8 x 8-bit fields, each field holding a size value of 0 to 7. Since the object of the search is to count the number of entries from the start of the block, speed can be improved by processing the size array in 64-bit chunks: 36 64-bit additions can be performed before any of the summed size fields can overflow its 8-bit lane (36 x 7 = 252, which still fits in 8 bits), followed by 8 steps to add up the 8 partial sums. This reduces the 1 million iterations to 10^6/8 * (36+8)/36 ≈ 153k operations.
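A sketch of that chunked summation (assuming the fields are packed 8-to-a-uint64_t):

    #include <stdint.h>
    #include <stddef.h>

    /* Sum packed 8-bit size fields, 8 per word.  Because every field
       is at most 7, up to 36 words can be added lane-wise before any
       8-bit lane can overflow (36 x 7 = 252 <= 255); the 8 lane
       totals are then folded into the running sum. */
    static uint64_t sum_size_fields(const uint64_t *packed, size_t nwords)
    {
        uint64_t total = 0;

        while (nwords) {
            size_t   run = nwords < 36 ? nwords : 36;
            uint64_t acc = 0;

            nwords -= run;
            while (run--)
                acc += *packed++;            /* 8 lane sums at once  */

            for (int lane = 0; lane < 8; lane++) {
                total += acc & 0xff;         /* fold the 8 partials  */
                acc >>= 8;
            }
        }
        return total;
    }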
Reducing the number of bits per size field saves memory but increases the total number of operations (the little program after this list reproduces the figures):
7-bit fields => 10^6/9 * (18+9)/18 ≈ 167k ops,
6-bit fields => 10^6/10 * (9+10)/9 ≈ 211.1k ops,
5-bit fields => 10^6/12 * (4+12)/4 ≈ 333.3k ops,
4-bit fields => 10^6/16 * (2+16)/2 = 562.5k ops.
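Here fields per word = 64/bits and the safe addition count = (2^bits - 1)/7, since each field is at most 7:

    #include <stdio.h>

    /* Estimated operations to process 10^6 size fields: each group of
       'adds' lane-wise additions is followed by 'fields' folding steps. */
    static double ops(int bits)
    {
        int fields = 64 / bits;              /* fields packed per word  */
        int adds   = ((1 << bits) - 1) / 7;  /* additions before a lane
                                                holding 0..7 overflows  */
        return 1e6 / fields * (adds + fields) / adds;
    }

    int main(void)
    {
        for (int bits = 8; bits >= 4; bits--)
            printf("%d-bit fields: %.1fk ops\n", bits, ops(bits) / 1000.0);
        return 0;
    }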
More complex data structures, B-trees etc., can obviously be used, at the cost of considerably more complexity. I would start with a BSA of 2500 entries, which brings a block search down to 15.3k operations (10^5/8 * (36+8)/36) in exchange for a little more memory.
If that is still too slow, a BSA of 250,000 entries would probably still be acceptable memory-wise compared to the size of the main array; a block search is then reduced to approximately 153 operations maximum, but searching the BSA itself grows to 125,000 iterations. This can be addressed quite simply with multi-level BSA arrays: e.g. a top-level, 1000-entry array holding the total number of visible entries in each 1000th of the main array reduces the search to at most 500 iterations of the top level (again working from the nearer end) plus 250 iterations of the relevant part of the BSA, with not much extra complexity. 1000 probably wouldn't be the optimum size - it's just an illustration.
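A sketch of the two-level lookup with those illustrative sizes (finding the block that contains the v-th visible entry, assuming v is less than the total visible count):

    #include <stdint.h>

    #define N_TOP   1000                  /* illustrative sizes only    */
    #define N_BSA   250000
    #define PER_TOP (N_BSA / N_TOP)       /* 250 BSA slots per top slot */

    extern uint32_t top_level[N_TOP];     /* visible per 1000th of array  */
    extern uint32_t second_level[N_BSA];  /* visible per 1000-entry block */

    /* Return the BSA slot holding the v-th visible entry; *rem gets
       v's offset among that block's visible entries.  (For brevity
       this scans the top level forwards only; working from the nearer
       end halves its worst case, as before.) */
    static unsigned long find_block(uint64_t v, uint64_t *rem)
    {
        unsigned long t = 0, b;

        while (v >= top_level[t])         /* <= 1000 iterations */
            v -= top_level[t++];
        b = t * PER_TOP;
        while (v >= second_level[b])      /* <= 250 iterations  */
            v -= second_level[b++];
        *rem = v;
        return b;
    }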
I hope the above isn't too confusing but I did try to keep the algorithm as simple as possible.