Pfhorrest wrote:This is also why difficulties with threading programs for multiprocessing always struck me as counterintuitive, because in my mind -- not really understanding how exactly imperative languages get translated into the series of boolean logical functions that everything boils down to on the hardware level -- it seems like the computer itself must be inherently functional, so why doesn't it automatically just evaluate all the first tier of sub-expressions in parallel
It does. You do get a lot of low-level parallelism "for free." Even 20 years ago, CPUs were a long way from the abstraction of executing one instruction at a time per CPU core.
- First there are the pipelines: executing an instruction involves a sometimes deep sequence of sub-steps, from decoding it to figure out what kinds of resources it will need (integer math, RAM/cache fetch, RAM/cache store, I/O buses and locking, register manipulation, etc.), to actually using those CPU sub-units to do the work. One instruction doesn't need to make it all the way through the pipeline before the next one can start: there can be many instructions in the pipeline at once, each at a different stage.
- Then there's actual parallelism: multiple pipelines and multiple execution units of a given type. Sometimes the CPU can tell right away that 2 instructions won't interfere with each other's results, and thus they can be executed entirely in parallel. E.g., if the CPU has at least 2 integer math units, it can run two pipelines, both doing instructions that need integer math. There's a large element of resource scheduling: deciding which instructions to put into which pipeline, to minimize "pipeline stalls" in which an instruction is "waiting" for a given execution unit type to become available. (There are of course other stalls, like waiting for RAM cache, which also need to be minimized by advanced planning on the part of the CPU.)
- Sometimes the CPU can't tell up front whether the first instruction will affect the second and third, in which case it may speculatively execute them anyway, not actually committing their results until the first instruction finishes and the CPU finds out whether that work was wasted. A twist on this is branch prediction, in which the CPU sees a conditional branch instruction ("if x = y then goto z"), guesses which way it's going to go, and speculatively starts executing the instructions at location z, again committing nothing until it has finished determining the result of "x = y".
- And sometimes you get parallelism by having your compiler be smart enough to explicitly ask for it. For example: "take each of these 25 thousand bytes that represent an image, and fade each pixel a little step toward black by subtracting a small number from each byte." That's where technology with names like MMX, SSE, or AltiVec comes in. You say "loop through all the bytes of this memory" and your compiler says "obviously the order doesn't matter, so just set up the MMX unit to do this in large chunks of bytes at once."
...Though none of this has much to do with functional programming per se.