Author Topic: Ten-minute code-crunch challenge  (Read 2862 times)


Offline Syntax ErrorTopic starter

  • Frequent Contributor
  • **
  • Posts: 584
  • Country: gb
Ten-minute code-crunch challenge
« on: February 06, 2021, 12:47:50 pm »
A ten-minute code-crunch challenge

The following JavaScript snippet was extracted from an enterprise environment; it computes an integer 'i'.

Variables A through G each hold an integer from 0 to 1,073,741,824 (2^30). After right bit-shifting, the lowest nibble is used to index a signed integer held in array X.

Can you normalize (in pseudocode) this JavaScript so it runs faster and/or is easier to maintain? Or is this block of code bloat already optimal for most JS scripting engines? Incidentally, the developers appear to have placed no traps for numeric over/underflow.

No prizes or a job with Google, just kudos :-+

Code: [Select]
// var X[0]..[15]  signed integers up to +/-1,073,741,824
// var A...G       unsigned int up to 1,073,741,824
var i=
X[(A>>28)&15] + X[(A>>24)&15] + X[(A>>20)&15] + X[(A>>16)&15] + X[(A>>12)&15] + X[(A>>8)&15] + X[(A>>4)&15] + X[A&15] +
X[(B>>28)&15] + X[(B>>24)&15] + X[(B>>20)&15] + X[(B>>16)&15] + X[(B>>12)&15] + X[(B>>8)&15] + X[(B>>4)&15] + X[B&15] +
X[(C>>28)&15] + X[(C>>24)&15] + X[(C>>20)&15] + X[(C>>16)&15] + X[(C>>12)&15] + X[(C>>8)&15] + X[(C>>4)&15] + X[C&15] +
X[(D>>28)&15] + X[(D>>24)&15] + X[(D>>20)&15] + X[(D>>16)&15] + X[(D>>12)&15] + X[(D>>8)&15] + X[(D>>4)&15] + X[D&15] +
X[(E>>28)&15] + X[(E>>24)&15] + X[(E>>20)&15] + X[(E>>16)&15] + X[(E>>12)&15] + X[(E>>8)&15] + X[(E>>4)&15] + X[E&15] +
X[(F>>28)&15] + X[(F>>24)&15] + X[(F>>20)&15] + X[(F>>16)&15] + X[(F>>12)&15] + X[(F>>8)&15] + X[(F>>4)&15] + X[F&15] +
X[(G>>28)&15] + X[(G>>24)&15] + X[(G>>20)&15] + X[(G>>16)&15] + X[(G>>12)&15] + X[(G>>8)&15] + X[(G>>4)&15] + X[G&15];
 

Offline capt bullshot

  • Super Contributor
  • ***
  • Posts: 3033
  • Country: de
    • Mostly useless stuff, but nice to have: wunderkis.de
Re: Ten-minute code-crunch challenge
« Reply #1 on: February 06, 2021, 12:58:09 pm »
Slightly off topic: this is one example where writing "(something) & 0x0F" is easier to understand than "(something) & 15". The code lines made me think twice: "yes, it really is the least significant nibble".

It'd be easier to maintain if the lines were written as loops instead of being manually unrolled as they are here. I don't think the performance advantage of unrolling (unrolled loops being considered faster) is significant.

temp_A = A; i=0;
j=8;
while (j--) {i+=X[temp_A&0x0f]; temp_A>>=4; }
(rinse and repeat for B, C, ...)
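For reference, a runnable form of that sketch might look like this (the X contents and input values below are made up purely for illustration):

```javascript
// Made-up X table and inputs, just for illustration.
const X = [3, -1, 4, -1, 5, -9, 2, -6, 5, 3, -5, 8, -9, 7, 9, -3];
const values = [0x12345678, 0x0FEDCBA9];   // stands in for A, B, ...

let i = 0;
for (const v of values) {
  let t = v;
  let j = 8;                 // 8 nibbles per 32-bit value
  while (j--) {
    i += X[t & 0x0f];        // lowest nibble indexes X
    t >>>= 4;                // unsigned shift to the next nibble
  }
}
console.log(i);
```

Using `>>>` rather than `>>` is a small safety margin: for values up to 2^30 both behave the same, but the unsigned shift cannot drag a sign bit along.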
Safety devices hinder evolution
 
The following users thanked this post: Syntax Error

Offline Syntax ErrorTopic starter

  • Frequent Contributor
  • **
  • Posts: 584
  • Country: gb
Re: Ten-minute code-crunch challenge
« Reply #2 on: February 06, 2021, 02:50:20 pm »
temp_A = A; i=0;
j=8;
while (j--) {i+=X[temp_A&0x0f]; temp_A>>=4; }
(rinse and repeat for B, C, ...)

:-+ Kudos.

Very much the same as my interpretation. I left the 0x0F as decimal 15, so +1 to you.

I was wondering if there is a valid reason for doing the code in this long form way? Is it easier on the Javascript engine? In the actual payload (a dense block of compacted JS ~180K bytes), array X is filled by what looks like an eliptic curve algorithm - a lot of Math.log and Math.exp function calls. Whoever wrote this was no recent boot camp graduate, so there must have been some deep thought involved. Alternatively, this was cut and pasted from Github under the assumption the code was already optimised? Either method, this client side JS burns a lot of CPU cycles.
 

Online oPossum

  • Super Contributor
  • ***
  • Posts: 1447
  • Country: us
  • Very dangerous - may attack at any time
Re: Ten-minute code-crunch challenge
« Reply #3 on: February 06, 2021, 03:21:14 pm »
If X[] changes infrequently...

Code: [Select]
for (i = 0; i < (1 << 16); ++i) z[i] = X[(i >> 12) & 0xF] + X[(i >> 8) & 0xF] + X[(i >> 4) & 0x0F] + X[i & 0x0F];  // Do this once only when X[] changes

i = z[A >> 16] + z[A & 0xFFFF] +
z[B >> 16] + z[B & 0xFFFF] +
z[C >> 16] + z[C & 0xFFFF] +
z[D >> 16] + z[D & 0xFFFF] +
z[E >> 16] + z[E & 0xFFFF] +
z[F >> 16] + z[F & 0xFFFF] +
z[G >> 16] + z[G & 0xFFFF];

This will be unkind to the cache. Probably lower performance if implemented in C/C++/Java.
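As a sanity check, the precomputed table can be verified against the naive nibble sum (a self-contained sketch; the X contents and test values are made up):

```javascript
// Made-up X table: 16 smallish signed values.
const X = Array.from({ length: 16 }, (_, n) => ((n * 2654435761) | 0) % 1000);

// Build the 65536-entry intermediate table once.
const z = new Array(1 << 16);
for (let i = 0; i < (1 << 16); ++i)
  z[i] = X[(i >> 12) & 0xF] + X[(i >> 8) & 0xF] + X[(i >> 4) & 0xF] + X[i & 0xF];

// Two z lookups replace eight X lookups per variable.
const fast  = (v) => z[v >>> 16] + z[v & 0xFFFF];
const naive = (v) => { let s = 0; for (let k = 0; k < 32; k += 4) s += X[(v >>> k) & 15]; return s; };

for (const v of [0, 1, 0x40000000, 0x12345678, 0x3FFFFFFF])
  console.assert(fast(v) === naive(v));
```

Note that z deliberately sums four lookups for every index, including indices below 16, so the X[0] padding for zero nibbles stays consistent with the original expression.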
 
The following users thanked this post: Syntax Error

Offline Syntax ErrorTopic starter

  • Frequent Contributor
  • **
  • Posts: 584
  • Country: gb
Re: Ten-minute code-crunch challenge
« Reply #4 on: February 06, 2021, 03:52:44 pm »
If X[] changes infrequently...
..etc..
This will be unkind to the cache. Probably lower performance if implemented in C/C++/Java.
Kudos and a chicken dinner :-+

Indeed, X[] is populated once and then left alone. So turning it into an intermediate binary table is a cool technique.

You're right about the cache. But hey, this is written in Javacrap, so as long as the server doesn't die, no worries if the client browser is sagging, or worse, if the client can no longer connect because their mobile data budget just burned.

FYI, the code this originates from is some kind of page-analytics script that tracks user behaviour on a page. These are becoming ever more popular in the e-commerce world, showing which images and <div>s have the most 'attention dwell', so they can slam an IFRAME ad right into the DOM node you're viewing. This came to my attention due to a large number of blind websocket calls a browser was making. The developers assumed no one blocks wss calls.
 

Offline capt bullshot

  • Super Contributor
  • ***
  • Posts: 3033
  • Country: de
    • Mostly useless stuff, but nice to have: wunderkis.de
Re: Ten-minute code-crunch challenge
« Reply #5 on: February 06, 2021, 04:50:32 pm »

I was wondering if there is a valid reason for doing the code in this long form way?

I've no idea what would be going on in a JS engine. To me it looks like code optimization by manually unrolling the loop. Sometimes I use this technique to gain a few cycles in an embedded C application.
Safety devices hinder evolution
 
The following users thanked this post: Syntax Error

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6830
  • Country: fi
    • My home page and email address
Re: Ten-minute code-crunch challenge
« Reply #6 on: February 06, 2021, 07:17:58 pm »
If X[] changes infrequently...
Indeed, X[] is populated once and then left alone. So turning it into an intermediate binary table is a cool technique.
The question remains: what size of intermediate table yields the optimum performance? The possible sizes are 2^(4n), but only 2^16 (two lookups per variable; 65536 entries), 2^12 (three lookups per variable; 4096 entries), and 2^8 (four lookups per variable; 256 entries) make sense, because JavaScript numbers are represented using IEEE-754 Binary64, which has 53 bits of precision, and can therefore represent integers between -2^53 and 2^53 exactly.
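The 53-bit bound is comfortable here; the worst case is easy to check with a one-liner of arithmetic:

```javascript
// 7 variables x 8 nibbles = 56 lookups, each |X[n]| <= 2**30.
const worstCase = 56 * 2 ** 30;   // about 2**35.8
console.log(worstCase <= Number.MAX_SAFE_INTEGER); // true
```

So neither the final sum nor any intermediate table entry (at most four X values) comes anywhere near the exact-integer limit.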

Constructing any of these tables is trivial,
Code: [Select]
for (i = (1 << 16) - 1; i >= 0; i--) Z16[i] = X[i&0xF] + X[(i>>4)&0xF] + X[(i>>8)&0xF] + X[(i>>12)&0xF];
for (i = (1 << 12) - 1; i >= 0; i--) Z12[i] = X[i&0xF] + X[(i>>4)&0xF] + X[(i>>8)&0xF];
for (i = (1 << 8)  - 1; i >= 0; i--) Z8[i]  = X[i&0xF] + X[(i>>4)&0xF];

Technically, there are 7 variables with 8 nibbles each, or 56 nibbles, or 56×4=224 bits.  To compute the result, we need to use 14 lookups into Z16[], or 19 lookups into Z12[], or 28 lookups into Z8[].

Unfortunately, we don't know anything about the underlying JavaScript interpreter or its cache lookup behaviour, and therefore have no real way to realistically compare their efficiencies.  (You can microbenchmark them on one browser, but it will only apply to that version of that JS engine on that hardware architecture and OS.)  We don't even have any way of comparing the lookup table access "cost" to that of a binary shift-and-mask.

I'm pretty sure Z12[] and Z16[] generally perform similarly well, and purely for the smaller memory footprint I would prefer Z12[] over Z16[].  However, because the Z12[] expression,
Code: [Select]
i = Z12[  A        & 0xFFF               ]
  + Z12[ (A >> 12) & 0xFFF               ]
  + Z12[((A >> 20) & 0xFF0) + (B & 0x00F)]
  + Z12[ (B >> 4)  & 0xFFF               ]
  + Z12[ (B >> 16) & 0xFFF               ]
  + Z12[((B >> 20) & 0xF00) + (C & 0x0FF)]
  + Z12[ (C >> 8)  & 0xFFF               ]
  + Z12[ (C >> 20) & 0xFFF               ]
  + Z12[  D        & 0xFFF               ]
  + Z12[ (D >> 12) & 0xFFF               ]
  + Z12[((D >> 20) & 0xFF0) + (E & 0x00F)]
  + Z12[ (E >> 4)  & 0xFFF               ]
  + Z12[ (E >> 16) & 0xFFF               ]
  + Z12[((E >> 20) & 0xF00) + (F & 0x0FF)]
  + Z12[ (F >> 8)  & 0xFFF               ]
  + Z12[ (F >> 20) & 0xFFF               ]
  + Z12[  G        & 0xFFF               ]
  + Z12[ (G >> 12) & 0xFFF               ]
  + Z12[ (G >> 20) & 0xFF0               ]  // low nibble here is zero padding,
  - X[0];                                   // so subtract the extra X[0] it adds
is utterly horrible compared to the Z16[] one,
Code: [Select]
i = Z16 [ A & 0xFFFF ] + Z16[ (A >> 16) & 0xFFFF ]
  + Z16 [ B & 0xFFFF ] + Z16[ (B >> 16) & 0xFFFF ]
  + Z16 [ C & 0xFFFF ] + Z16[ (C >> 16) & 0xFFFF ]
  + Z16 [ D & 0xFFFF ] + Z16[ (D >> 16) & 0xFFFF ]
  + Z16 [ E & 0xFFFF ] + Z16[ (E >> 16) & 0xFFFF ]
  + Z16 [ F & 0xFFFF ] + Z16[ (F >> 16) & 0xFFFF ]
  + Z16 [ G & 0xFFFF ] + Z16[ (G >> 16) & 0xFFFF ];
the Z16[] approach (as shown by oPossum above) should in usual circumstances be considered superior (i.e. excluding code obfuscation and such).  In particular, compare the bit shifts and masks: in the latter there are 14 binary ANDs and 7 bit shifts, all trivially verified; in the former there are 23 binary ANDs and 16 bit shifts that are rather complicated and non-obvious.  Whether the cache effects and memory-use "cost" are comparable to having many more operations is basically impossible to tell in a JavaScript environment, so we need to fall back on other criteria for deciding which form to use, like maintainability and obvious correctness.

BTW, I utterly despise "these" "code challenges".  Who cares how fast you can spew out code that solves one particular problem; the best solution for those is always one that just emits the precalculated solution.  Adding arbitrary rules like "but keep your left foot on top of your right hand while writing the code" is just clownish.  I am of the opinion that when you solve a problem, you do it thoroughly, but only once (meaning you also record the solution, findings, and relevant details), and that takes time and effort.  Anything else is just a pissing contest upwind.
 
The following users thanked this post: Syntax Error

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15244
  • Country: fr
Re: Ten-minute code-crunch challenge
« Reply #7 on: February 06, 2021, 07:23:58 pm »
I was wondering if there is a valid reason for doing the code in this long form way?

Except if it's auto-generated code with some tool, IMHO there is NEVER a good reason for writing such horrendous code.

Side note, I don't know enough Javascript to tell whether a better factored code would be less efficient. If so, that may be one reason. But that's still very ugly, and IMHO if you're using Javascript for performance code, something looks fishy there too. Oh well.

If the actual reason is none of the above, then it may just be a very bad developer. Yeah those exist.
 
The following users thanked this post: Syntax Error

Offline Syntax ErrorTopic starter

  • Frequent Contributor
  • **
  • Posts: 584
  • Country: gb
Re: Ten-minute code-crunch challenge
« Reply #8 on: February 06, 2021, 09:30:47 pm »
 :-+ Chicken dinners (with fries) all round.

BTW, I utterly despise "these" "code challenges".  Who cares how fast you can spew out code that solves one particular problem; the best solution for those is always one that just emits the precalculated solution. 
Agreed; it's programmers versus the code hacks. We know the FEWER lines of code produced, the MORE thought has been put into them. Sadly, too many managers think the MORE lines of code hacked out, the MORE productive their programmers are being. Which is the essence of my little Saturday-afternoon post: thinking versus doing. Anyway, thank you for your in-depth Z16[] analysis - we all learn something new on the EEVblog.

I've no idea what would be going on in a JS engine. To me it looks like code optimization by manually unrolling the loop.
If unrolling the loop is the most efficient methodology in JS, then this kind of defeats the need for FOR, WHILE and DO loops. Why iterate when you can complicate? Maybe I should try to benchmark some examples.
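For what it's worth, a rough microbenchmark along those lines might look like this (engine- and version-specific, so the numbers only mean anything on the engine it runs on; the X contents and inputs are made up):

```javascript
// Made-up table and a pool of inputs <= 2**30.
const X = Array.from({ length: 16 }, (_, n) => n * 7 - 50);
const vals = Array.from({ length: 1000 }, (_, k) => (k * 2654435761) >>> 2);

// The original manually unrolled form, one variable's worth.
const unrolled = (v) =>
  X[(v >> 28) & 15] + X[(v >> 24) & 15] + X[(v >> 20) & 15] + X[(v >> 16) & 15] +
  X[(v >> 12) & 15] + X[(v >> 8) & 15] + X[(v >> 4) & 15] + X[v & 15];

// The rolled-up loop form.
const looped = (v) => {
  let s = 0;
  for (let k = 0; k < 32; k += 4) s += X[(v >> k) & 15];
  return s;
};

// Time each form and keep the accumulated sum so the work can't be optimized away.
const time = (f) => {
  const t0 = Date.now();
  let acc = 0;
  for (let r = 0; r < 2000; r++) for (const v of vals) acc += f(v);
  return [Date.now() - t0, acc];
};

const [tu, su] = time(unrolled);
const [tl, sl] = time(looped);
console.log({ unrolledMs: tu, loopedMs: tl, agree: su === sl });
```

Run several times and ignore the first pass (JIT warm-up); a single run mostly measures compilation, not the code.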

Except if it's auto-generated code with some tool, IMHO there is NEVER a good reason for writing such horrendous code.
Javascript is being used to offload so much functionality to the client side; often without regard for browser performance, code optimisation or, even the need to use those features. There's some real junk code in the wild, and it's not all JS framework bloat either.
 

Offline RenThraysk

  • Regular Contributor
  • *
  • Posts: 107
  • Country: gb
Re: Ten-minute code-crunch challenge
« Reply #9 on: February 06, 2021, 10:55:26 pm »
Personally would just create a helper function

Code: [Select]
const sub = (x) => (X[(x >> 28) & 15] + X[(x >> 24) & 15] + X[(x >> 20) & 15] + X[(x >> 16) & 15] + X[(x >> 12) & 15] + X[(x >> 8) & 15] + X[(x >> 4) & 15] + X[x & 15]);
var i = sub(A) + sub(B) + sub(C) + sub(D) + sub(E) + sub(F) + sub(G);

Removing the repeating code *.
Then test performance within the application.

* There is a vague hope the JS compiler will inline sub() back in, removing the call overhead and making it virtually identical to the original verbose version, but I'd only really care whether that happens if the code is still too slow after testing.

« Last Edit: February 06, 2021, 10:59:07 pm by RenThraysk »
 
The following users thanked this post: Syntax Error

Offline Syntax ErrorTopic starter

  • Frequent Contributor
  • **
  • Posts: 584
  • Country: gb
Re: Ten-minute code-crunch challenge
« Reply #10 on: February 08, 2021, 10:50:23 am »
@RenThraysk  :-+ kudos

A helper function (or JS prototype) as you suggest is a no-brainer - in any computer language. Sadly, there are plenty of coders with no brain.
Code: [Select]
function codeHack(){ return void 0; }
« Last Edit: February 08, 2021, 10:52:02 am by Syntax Error »
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6830
  • Country: fi
    • My home page and email address
Re: Ten-minute code-crunch challenge
« Reply #11 on: February 08, 2021, 01:24:51 pm »
Sadly, there are plenty of coders with no brain.
Even sadder: there are no code contests where the primary objective is robustness (across all possible inputs) and maintainability, with efficiency at most a secondary objective.

Our code culture is emphasizing the wrong things to aspiring competitive programmers.  It's sick, really.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15244
  • Country: fr
Re: Ten-minute code-crunch challenge
« Reply #12 on: February 08, 2021, 02:58:57 pm »
Sadly, there are plenty of coders with no brain.
Even sadder: there are no code contests where the primary objective is robustness (across all possible inputs) and maintainability, with efficiency at most a secondary objective.

Unfortunately, that's right.

Our code culture is emphasizing the wrong things to aspiring competitive programmers.  It's sick, really.

That assertion would probably require some elaboration as to what competitive means.

For all we know, writing robust code could be seen as competitive if that was considered a major goal.
So, that's being "competitive" in contexts where "productivity" is favored. But now, we also need to define productivity when it comes to software development. These days, it's usually a very simple metric which measures how fast a given developer can implement a given feature. That's a very short-sighted approach. Beyond the code quality issues, the long-term cost can actually be much higher than taking a bit more time writing better code.

Which eventually gets us off-topic, as I think this stems from our whole current economic model based on short-term results and perpetual debt, which we can extend to software design in particular, and even engineering in general. It translates to "technical debt", which, in our model, like financial debt, doesn't matter so much if the short-term results are good.

 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6830
  • Country: fi
    • My home page and email address
Re: Ten-minute code-crunch challenge
« Reply #13 on: February 08, 2021, 04:10:36 pm »
Our code culture is emphasizing the wrong things to aspiring competitive programmers.  It's sick, really.
That assertion would probably require some elaboration as to what competitive means.
You're right.  I probably should have written "Our code culture is emphasizing the wrong things to aspiring programmers interested in honing and comparing their skills against others'." or something like that: I meant "competitive" as in "interested in competitions".

I do not see any problem in a person being competitive (in the sense of being interested in competing against others); I consider that a natural and often healthy aspect of many humans.  I would just prefer the object/goal of the competition to be better/healthier; more suited to the actual needs of the industry and society in general.

As it is, I've only encountered code competitions where the goal is minimal runtime (on a specific hardware and compiler and compiler options), or minimal code size (code golf).  Neither alone is a healthy metric.

These days, [productivity is] usually a very simple metric which measures how fast a given developer can implement a given feature. That's a very short-sighted approach. Beyond the code quality issues, the long-term cost can actually be much higher than taking a bit more time writing better code.
So very true, and so very depressing, in my opinion.



In the context of the question at hand, the above means that we really should ask a bit more general question: how should one write this kind of code in JavaScript?

I believe we (as in members partaking in this thread) have pieced together a reasonable answer in two parts:
 1. Use a helper function per variable, to minimize code duplication
 2. Use an intermediate table of larger size, to reduce lookups and code size at the cost of a larger table

My own suggestion would be to precalculate 65536-entry intermediate table z as shown above by oPossum, with helper function
    function zmap(x) { return z[x & 0xFFFF] + z[(x >> 16) & 0xFFFF]; }
(with the hope that any JS engine compiles/interprets the calls to that function into efficient array lookups), and refactor the code that assigns the values to variables A through G so that instead of
Code: [Select]
    var A = ...;
    var B = ...;
    var C = ...;
    var D = ...;
    var E = ...;
    var F = ...;
    var G = ...;
    var i = zmap(A) + zmap(B) + zmap(C) + zmap(D) + zmap(E) + zmap(F) + zmap(G);
we'd have
Code: [Select]
    var i = 0;

    i += zmap(...);
    i += zmap(...);
    i += zmap(...);
    i += zmap(...);
    i += zmap(...);
    i += zmap(...);
    i += zmap(...);
and, if possible depending on the "..." code, merge the i += zmap(...); lines into a for loop.
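If the seven values can be collected first, the merged form might look like this (a self-contained sketch; X, the values, and the surrounding code are all assumptions):

```javascript
// Illustration only: X and the seven values are made up.
const X = Array.from({ length: 16 }, (_, n) => n - 8);

// Build the 65536-entry intermediate table once.
const z = new Array(1 << 16);
for (let k = 0; k < (1 << 16); ++k)
  z[k] = X[(k >> 12) & 0xF] + X[(k >> 8) & 0xF] + X[(k >> 4) & 0xF] + X[k & 0xF];

const zmap = (x) => z[x & 0xFFFF] + z[(x >>> 16) & 0xFFFF];

const values = [0x12345678, 0x0BADF00D, 0x40000000, 7, 0, 0x3FFFFFFF, 42]; // stands in for A..G
let i = 0;
for (const v of values) i += zmap(v);
console.log(i);
```

The loop body is now trivially small, so whether the engine inlines zmap or not, the hot path is two array reads and an add per 32-bit half.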

My core point here is that we don't really have enough information here to suggest an exact form, because the suggested form depends on the surrounding code, and we don't know it.  (Syntax Error obviously omitted the full code to avoid copyright issues, so I'm not blaming them at all.)

So, to someone like me, who really wants to help others write "better" code – "better" as in choosing an appropriate algorithm, and implementing it in a robust, efficient manner on the given architecture/language – the "challenge" part in the title is quite depressing.  It is like hearing high-school students interested in sports discuss what kind of anabolic steroids they think they're taking, the dosages, and where on the shady net they're buying the stuff; all without medical (or even adult) advice, or any care about the long-term effects such intake will have.

It is interesting, but to even try to sway the thread into a, uh, "healthier" path, I feel like I'm butting in and preaching to people who really don't want to hear it.  Apologies.
 

Offline RenThraysk

  • Regular Contributor
  • *
  • Posts: 107
  • Country: gb
Re: Ten-minute code-crunch challenge
« Reply #14 on: February 09, 2021, 07:44:48 pm »
If this was hot code and took too long (gut feeling: it's not), you could use C compiled to wasm.

WebAssembly SIMD has been enabled by default in the latest Chrome (v88), but is still only in nightlies for Firefox, so you can't quite rely on it yet.

https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md
 

Offline PlainName

  • Super Contributor
  • ***
  • Posts: 7186
  • Country: va
Re: Ten-minute code-crunch challenge
« Reply #15 on: February 10, 2021, 09:39:51 am »
Quote
BTW, I utterly despise "these" "code challenges".  Who cares how fast you can spew out code that solves one particular problem;

I think it demonstrates skill in the art. The solution may be rubbish, in quality terms, but actually getting a solution in a very short time does require considerable knowledge and skill.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15244
  • Country: fr
Re: Ten-minute code-crunch challenge
« Reply #16 on: February 10, 2021, 02:56:52 pm »
Quote
BTW, I utterly despise "these" "code challenges".  Who cares how fast you can spew out code that solves one particular problem;

I think it demonstrates skill in the art. The solution may be rubbish, in quality terms, but actually getting a solution in a very short time does require considerable knowledge and skill.

It certainly requires skills, but I don't think these particular skills are actually desirable for software development.

I'd even venture that people who write quality code would usually suck at such contests. Except probably for a very few exceptions, I think the two skill sets are pretty much orthogonal.
I'd draw a parallel with people who are very good and very fast at mental arithmetic.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6830
  • Country: fi
    • My home page and email address
Re: Ten-minute code-crunch challenge
« Reply #17 on: February 10, 2021, 03:59:14 pm »
Yes, I admit it: I suck at code challenges.  I suck at many other things as well, so meh.
That does not change the fact that I believe we need better-quality code, not more of it, nor any bubblegum-and-spittle ramshackle agglomerations that work with one perfect input and silently produce garbage for anything else.



Circling back to the original topic, perhaps there is interest in looking at the different patterns of generating the intermediate table?
For a table only generated once it does not matter, but the alternate patterns could come in handy at some point.  If not for anything else, then for recognizing them in code written by somebody else.

The obvious one as shown by oPossum, but edited to extend the X array in place, is
Code: [Select]
for (i = 16; i < 65536; i++)
    X[i] = X[i & 0xF] + X[(i >> 4) & 0xF] + X[(i >> 8) & 0xF] + X[(i >> 12) & 0xF];

Because this extension happens to be from N to N^(2^k) entries (with N=16 and k=2), we can use a doubling method to simplify the loop body, without changing the number of iterations:
Code: [Select]
for (i = 16; i < 256; i++)
    X[i] = X[i & 0xF] + X[(i >> 4) & 0xF];
for (i = 256; i < 65536; i++)
    X[i] = X[i & 0xFF] + X[(i >> 8) & 0xFF];

If we increase the complexity of the code a bit, we can extend from N to N^k entries, where the inner loop adds a constant value to consecutive entries in the array to build the new entries.  The total number of inner-loop iterations stays the same, with minimal added work outside the innermost loop; essentially, one of the operands inside the innermost loop is reduced to a constant.  This does not really help with JavaScript, but some other languages (and their interpreters or compilers) can vectorize such loops, making the initialization much more efficient:
Code: [Select]
size = 16;
for (k = 1; k < 4; k++) {
    dst = size;
    for (digit = 1; digit < 16; digit++) {
        digitvalue = X[digit];
        for (src = 0; src < size; src++, dst++)
            X[dst] = X[src] + digitvalue;
    }
    size = dst;
}
Essentially, the last version is useful in many combinatorial cases, as it treats the original array size as "one digit" – here, a hexadecimal digit – with each outer-loop iteration extending the array size by one digit.  When N is not a power of two, and modulo N (% N) is a relatively costly operation, the last version really shines.  The downside is that it is non-obvious.
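As a sketch of the same pattern where N is not a power of two, take decimal digits (N = 10) extended to all 1000 three-digit combinations. The D values are made up; D[0] is set to 0 so zero digits contribute nothing, which keeps T[i] equal to the zero-padded digit sum:

```javascript
const N = 10;                                   // decimal digits; N not a power of two
const D = [0, 5, -3, 7, 1, -8, 2, 9, -4, 6];    // made-up per-digit values; D[0] = 0

// Extend D in place to N**3 entries: T[i] = sum of D over the digits of i.
// The inner loop adds a constant to consecutive entries -- no % N anywhere.
const T = D.slice();
let size = N;
for (let k = 1; k < 3; k++) {
  let dst = size;
  for (let digit = 1; digit < N; digit++) {
    const digitvalue = T[digit];
    for (let src = 0; src < size; src++, dst++)
      T[dst] = T[src] + digitvalue;
  }
  size = dst;
}

console.log(T.length === 1000, T[987] === D[9] + D[8] + D[7]); // true true
```

Note the D[0] = 0 assumption: the in-place extension never adds the "zero digit" term, so a nonzero D[0] would make T diverge from the padded digit sum for numbers containing zeros.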

Just for fun, I wrote a test program in GNU C, using uint32_t __attribute__((vector_size (16*4))) vector type, so that GCC can easily vectorize the code to any SIMD processor (and still generate acceptable code for non-SIMD processors from the exact same source), and it indeed cuts the CPU cycles used to initialize the array to about a quarter compared to the original version.
 
The following users thanked this post: oPossum, Syntax Error

Offline Syntax ErrorTopic starter

  • Frequent Contributor
  • **
  • Posts: 584
  • Country: gb
Re: Ten-minute code-crunch challenge
« Reply #18 on: February 11, 2021, 05:17:47 pm »
Just for fun, I wrote a test program in GNU C, using uint32_t __attribute__((vector_size (16*4))) vector type, so that GCC can easily vectorize the code to any SIMD processor (and still generate acceptable code for non-SIMD processors from the exact same source), and it indeed cuts the CPU cycles used to initialize the array to about a quarter compared to the original version.
I think that goes under the heading of above and beyond :-+

The intermediate table, whether used for bitwise cycle-saving or just for being nice to enterprise network managers, is a methodology that should be part of the developer's "101 toolkit". I've written a few intermediate database tables with the objective of turning mind-bending (outer-join hell) SQL queries into simple scalar lookups. But as you must appreciate, it is often an uphill struggle convincing key opinion leaders (idiot managers) that this is more important than having Facebook integration. The mantra is: never calculate twice when you can initialise once.

Maybe using the term "challenge" was a bit hackathonish, but really it was subtext for: what the hell is wrong with this code, and just how much better can you do it? Well, it seems most of you guys can do it a whole lot better. For free. :clap:
« Last Edit: February 11, 2021, 05:52:43 pm by Syntax Error »
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6830
  • Country: fi
    • My home page and email address
Re: Ten-minute code-crunch challenge
« Reply #19 on: February 11, 2021, 05:56:36 pm »
The Intermediate table, whether used for bitwise cycle saving or, just being nice to enterprise network managers, is a methodology that should be part of the Developer's "101 toolkit".
Fully agreed!

I've written a few intermediate database tables with the objective of turning mind bending SQL lookups into simpler scalar lookups. But as you must appreciate, this is often an uphill struggle convincing key opinion leaders (idiot managers) that this is more important than having Facebook integration.
That hits a nerve: I've broken myself twice in very similar struggles.



As to the GCC C code, it isn't complicated at all:
Code: [Select]
typedef  uint32_t  u32x16  __attribute__((vector_size (16*4)));
#define U32X16(v)  { v, v, v, v, v, v, v, v, v, v, v, v, v, v, v, v }

static void extend(u32x16 *const data)
{
    u32x16  *dst = data + 1;
    u32x16  *const dst_end = data + 4096;

    while (dst < dst_end) {
        u32x16  *const src_end = dst;

        for (size_t  k = 1; k < 16; k++) {
            const u32x16  c = U32X16(data[0][k]);
            u32x16       *src = data;

            while (src < src_end)
                *(dst++) = *(src++) + c;
        }
    }
}
You first allocate a sufficiently aligned array of uint32_t's, say
    uint32_t  X[65536] __attribute__((aligned (64)));
then fill the first 16 entries, and then call
    extend((u32x16 *)X);

The thing I don't like about it is how non-obvious it is.  That is, unless you know this pattern, or develop it yourself (in the three steps I outlined above, as I did), you're unlikely to realize it is just an "optimized" version of
Code: [Select]
void extend(uint32_t *data)
{
    size_t  i;

    for (i = 16; i < 256; i++)
        data[i] = data[i & 0xF] + data[(i >> 4) & 0xF];

    for (i = 256; i < 65536; i++)
        data[i] = data[i & 0xFF] + data[(i >> 8) & 0xFF];
}
This kind of complexity leads to increased maintenance cost: it is quite easy to break that sort of code, unless you know exactly what the code is trying to accomplish.

Which gets me into one of my favourite rants, about how comments should describe the intent of the code (as opposed to what the code does, as that can be read from the code itself directly).
 

