Approximate speed classes of programming languages

A place to discuss the implementation and style of computer programs.

Moderators: phlip, Moderators General, Prelates

Jplus
Posts: 1692
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: Netherlands

Approximate speed classes of programming languages

Postby Jplus » Sat Apr 19, 2014 12:10 am UTC

(This could have been a fleeting thought, but I figured it might be considered useful enough to give it its own place.)

Note: NOT a holy wars thread.

Since performance differences between languages can be quite subtle, I thought it might be more meaningful to just roughly group languages by their approximate speed. First the TLDR version; a more detailed explanation follows below.

Tier 1 (fast-fast): C++, C, Ada, Fortran, ATS, Rust.
Tier 2 (fast): Java, LuaJIT, Julia, Haskell, Scala, Ocaml, C#, Go, Common Lisp (SBCL), Pascal, F#.
"Fringe" (fast-ish): Clojure, Racket, JavaScript (V8 and others), Dart, PyPy.
Tier 3 (intermediate): Erlang HiPE.
Tier 4 (slow): Erlang, PHP, Smalltalk (VisualWorks), Lua, Perl, CPython, Hack, Ruby.
Tier 5 (very slow): R, Matlab, Octave.

See here for an interesting suggestion by KnightExemplar for "tier 0" (hardware acceleration languages) and "tier -1" (hardware description languages). The languages above are all of a different category, i.e. high-level languages. See subsequent posts for discussion.

If you have questions or comments, or if you know about elaborate benchmarks that could help to refine this categorization, please post. I will edit this upper part of the post to keep it in sync with new insights.


So I decided to waste some of my time and use the Computer Language Benchmarks Game (CLBG) to group programming languages by approximate orders of speed. To do this I identified the "flat" parts of the graph that results if you merge the boxplot charts on the "Which programs are fastest?" page, averaged across the four processor setups. Ordering within a tier reflects relative speed but shouldn't be considered important. I've used C++ as the reference speed, as it's currently the all-round winner. Below is what I ended up with.
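The grouping idea can be sketched in code: sort languages by their relative score and start a new tier wherever there is a large multiplicative gap. This is only an illustration of the procedure; the scores and the 1.8x gap threshold below are made up, not actual CLBG data.

```python
# Sketch of the tiering procedure: sort languages by slowdown relative
# to the reference, and break into a new tier at every large gap
# (a "step" between flat regions of the merged boxplot chart).
# The example scores and the 1.8x threshold are invented.

def group_into_tiers(scores, gap_factor=1.8):
    """scores: dict of language -> slowdown relative to the reference.
    Returns a list of tiers, each a list of language names."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    tiers = [[ranked[0][0]]]
    for (_, prev), (lang, score) in zip(ranked, ranked[1:]):
        if score / prev > gap_factor:   # big jump: start a new tier
            tiers.append([lang])
        else:                           # same flat region
            tiers[-1].append(lang)
    return tiers

example = {"C++": 1.0, "C": 1.05, "Java": 2.5, "Go": 2.8, "Erlang": 10.0}
print(group_into_tiers(example))  # → [['C++', 'C'], ['Java', 'Go'], ['Erlang']]
```

The threshold is the subjective part; move it and the borderline "fringe" languages shuffle between tiers, which is exactly why they get their own category below.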

Tier 1: maximally fast (about as fast as C++)
C++, C, Ada, Fortran. (Also Rust as of November 2015.)
C++ is probably only leading this group (by a small margin) because it receives the most effort from program contributors.

Tier 2: fast (about 2-3 times slower than C++)
Java, Haskell, Scala, Ocaml, C# (Mono), Go, Common Lisp (SBCL), Rust, Pascal, F# (Mono).
This is the largest group; I don't know whether that's because of a bias in the language selection of the CLBG or because of a trend in modern programming languages. Microsoft's own .NET implementation of C# and F# would probably also fall in this group (consider this).

"Fringe" between tiers 2 and 3 (about 4-5 times slower than C++)
Clojure, Racket, Dart.
Based on the graphs you could hold endless debates on whether these languages belong in tier 2 or tier 3, so I put them in neither.

Tier 3: intermediate (about 10 times slower than C++)
Erlang (HiPE).
Apparently something is unique about Erlang.

Tier 4: slow (about 20-40 times slower than C++)
The Perl family: PHP, Perl, Python, Ruby.
After tier 2 this is the group that languages seem most likely to belong to, see below.

The CLBG also includes a few languages which are only tested on one or two processor setups, so they can't be compared to the others in an entirely fair way. This includes JavaScript (V8), which probably belongs in the "fringe", as well as ("plain") Erlang, Smalltalk (VisualWorks), ("plain") Lua and Hack, all of which seem to belong to tier 4.

There are a few languages which used to be in the CLBG but aren't anymore: from memory, ATS would be in tier 1 and LuaJIT would be in tier 2 (the latter is still supported by LuaJIT's own benchmarks). I think Forth would be in tier 2 as well, but I'm not sure whether Forth was ever in the CLBG or whether I'm only basing that on my own private benchmarks. SpiderMonkey also used to be in the CLBG; while it was not as fast as V8, I think it would still be a "fringe" implementation, and I expect the same to be true of Nitro/SquirrelFish and Chakra.

Finally, I know from personal experience that R and Octave are still much slower than the tier 4 languages (in fact Octave itself seems much slower than R). I've heard conflicting accounts about whether the same is true of Matlab.

I've merged the "CLBG verified" tiers with the other languages to the best of my knowledge to produce the TLDR version at the top of this post. Additional benchmarks contributed in the posts below are also taken into account.
Last edited by Jplus on Tue Apr 05, 2016 10:46 pm UTC, edited 7 times in total.
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)

ahammel
My Little Cabbage
Posts: 2135
Joined: Mon Jan 30, 2012 12:46 am UTC
Location: Vancouver BC

Re: Approximate speed classes of programming languages

Postby ahammel » Sat Apr 19, 2014 12:28 am UTC

I collected some data a while back about the average SLOC of the programs that were used to produce those benchmarks. I'll dig them out later. It was pretty interesting, as I recall.

Ada is the talkiest of the bunch.
He/Him/His/Alex
God damn these electric sex pants!

LucasBrown
Posts: 293
Joined: Thu Apr 15, 2010 2:57 am UTC
Location: Poway, CA

Re: Approximate speed classes of programming languages

Postby LucasBrown » Sat Apr 19, 2014 12:51 am UTC

It would be interesting to see where Python interpreted via PyPy falls. Is there any data on that?

ahammel
My Little Cabbage
Posts: 2135
Joined: Mon Jan 30, 2012 12:46 am UTC
Location: Vancouver BC

Re: Approximate speed classes of programming languages

Postby ahammel » Sat Apr 19, 2014 2:55 am UTC

Ah, here's my chart:
Spoiler:
[image: chart of average SLOC per language]
Conclusions:

  • I suck at R
  • ATS and Ada are very verbose
  • I suspect that in some cases (Haskell and Clojure, for instance) optimizing to compete in a benchmark involves adding lots of code, and that idiomatic code is much more concise.
  • I guess you can do the 'optimize by coding lots' thing more with SBCL than other lisps? Seriously, why is Common Lisp third most verbose?

Jplus
Posts: 1692
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: Netherlands

Re: Approximate speed classes of programming languages

Postby Jplus » Sat Apr 19, 2014 7:04 am UTC

@LucasBrown: I seem to recall the PyPy folks have been running their own benchmarks. If those are decent enough they might provide us sufficient information to put PyPy in one of the tiers. Unfortunately right now I don't have time to look into that. I seem to recall the speed boost isn't as impressive as the boost of LuaJIT over plain Lua, so maybe PyPy would be in the "fringe".

@Alex: thanks for the nice (quirky) graph. I find it especially interesting that languages seem to differ much less in verbosity than in speed. The CLBG used to have a similar graph in log scale and it was almost flat.
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)

Derek
Posts: 2123
Joined: Wed Aug 18, 2010 4:15 am UTC

Re: Approximate speed classes of programming languages

Postby Derek » Sat Apr 19, 2014 8:49 am UTC

Your analysis seems about right, from my impressions. I think the rough explanation is: tier 1 is the languages that compile to optimized machine code, tier 2 is the languages that compile to virtual-machine bytecode which is then JITted at runtime, and tier 4 is the interpreted scripting languages.

Now I'm curious to see a graph of verbosity versus speed.

Jplus
Posts: 1692
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: Netherlands

Re: Approximate speed classes of programming languages

Postby Jplus » Sat Apr 19, 2014 10:36 am UTC

Actually, Haskell, Ocaml, Go, Rust and Pascal are entirely compiled to machine code before execution. That's about half of the languages in tier 2. I'm seeing a much more convincing difference between tiers 1 and 2: none of the languages in tier 1 has garbage collection, while all of the languages in tier 2 do (Rust is a bit of an exception in this regard, but it was probably benchmarked with GC on, and it has a few other reasons not to be in tier 1).

The languages in tier 4 are still byte-compiled and executed by a virtual machine if I'm right, but you are right that (the lack of) JIT compilation distinguishes tier 4 from tiers 2 and 3. Only tier 5 is not even byte-compiled, I think (and hope).

The CLBG already has a graph of verbosity vs. speed.
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)

igouy
Posts: 8
Joined: Sat Apr 19, 2014 3:53 pm UTC

Re: Approximate speed classes of programming languages

Postby igouy » Sat Apr 19, 2014 4:05 pm UTC

The CLBG already has a graph of verbosity vs. speed.

Code-used Time-used Shapes provides additional comparison data -- Shortest C++

wumpus
Posts: 485
Joined: Thu Feb 21, 2008 12:16 am UTC

Re: Approximate speed classes of programming languages

Postby wumpus » Sun Apr 20, 2014 6:01 pm UTC

I'd expect PyPy to be roughly similar to LuaJIT. Note that PyPy isn't as flexible as Python (my issue is that you can wrap C/C++ libraries with SWIG and easily run them in Python; I haven't seen a similar means in PyPy).

Forth is a funny case. It is old enough that all the optimization was expected to be done by the programmer, and it appears likely to trip all over a pipeline. It should be extremely fast on old/small/low-power chips (tier 2, possibly tier 1 on an FPGA soft-CPU that has rudimentary stack operations), but I expect it to be at least a tier lower on chips that rely on pipelines.

Nyktos
Posts: 138
Joined: Mon Mar 02, 2009 4:02 pm UTC

Re: Approximate speed classes of programming languages

Postby Nyktos » Sun Apr 20, 2014 6:33 pm UTC

PyPy's not as fast as LuaJIT. Eyeballing PyPy's benchmarks supports the idea that "fringe" is probably the right category for it.

Note that pypy isn't as flexible as python (my issue is that you can wrap C/C++ libraries with swig and easily run them in python, haven't seen a similar means in pypy).
cffi, which was created by the PyPy team, is the right way to do this. It works on CPython as well. PyPy does also have some support for imitating CPython's API, but I have no idea how well it may work on SWIG-generated code.

Jplus
Posts: 1692
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: Netherlands

Re: Approximate speed classes of programming languages

Postby Jplus » Mon Apr 21, 2014 2:46 pm UTC

Thanks for checking the PyPy benchmarks, Nyktos. They claim to be 6.3 times faster than CPython (2.7), and since CPython (version 3) is about 40 times slower than C++, it seems reasonable to assume that PyPy is about 6 times slower than C++, which would put it in "fringe" indeed. I'll update the opening post accordingly.
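Spelling out the arithmetic behind that estimate (using the rounded 40x and 6.3x factors quoted above):

```python
# Back-of-the-envelope only: if CPython is ~40x slower than C++ and
# PyPy is ~6.3x faster than CPython, PyPy lands at roughly
# 40 / 6.3 ≈ 6.3x slower than C++ -- inside the "fringe" band.
cpython_slowdown_vs_cpp = 40.0   # rounded figure from the tiers above
pypy_speedup_vs_cpython = 6.3    # figure quoted from PyPy's benchmarks
pypy_slowdown_vs_cpp = cpython_slowdown_vs_cpp / pypy_speedup_vs_cpython
print(round(pypy_slowdown_vs_cpp, 1))  # → 6.3
```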

I tried to find some good benchmarks that compare Forth with C or C++. The best I could find were two fairly old pages: this one (there's a link to a barplot about halfway down that page) and this one. Both seem to suggest that an older version of Gforth used to be about 6 times slower than C if you did not switch on the -fast option. I think that's sufficient evidence to put it in the "fringe". The first source also shows that several implementations exist that clearly belong in tier 2, as well as several that belong in tier 3 and even one that belongs in tier 4. So Forth appears to be just terribly diverse. I'll put iForth in tier 2 and Gforth in the "fringe".
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)

KnightExemplar
Posts: 5489
Joined: Sun Dec 26, 2010 1:58 pm UTC

Re: Approximate speed classes of programming languages

Postby KnightExemplar » Tue Apr 22, 2014 12:41 am UTC

Tier -1: Hardware Languages
Verilog
VHDL

VHDL / Verilog are technically hardware description languages. So obviously, VHDL / Verilog are the fastest. (Assembly is eventually run on a computer, isn't it?). Anything and everything that ever runs on a computer is "interpreted" by some machine that was written in some hardware description language (like Verilog or VHDL).

If you wanted to write a competitive Bitcoin Miner, you'd probably do it in VHDL / Verilog, "compile" down into a netlist, simulate the chip a few times on a computer... and then hand it off to a Fabrication Lab for mass production.

Bitcoin miners, and the Deep Blue Chess AI are written at this level. FPGAs are the hardware chips you use to most efficiently process this kind of code, although they're obviously slower than a custom chip made at a fabrication lab.

Tier0: Specialized Languages
OpenCL, CUDA, SIMD Assembly (ie: ARM NEON, x86 MMX, x86 SSE, AMD64 AVX assembly instructions), AES Assembly Instructions, Intel QuickSync, AMD HSA.

It should be noted that "Specialized" languages are significantly faster than C / C++ and even Assembly when used correctly. These languages are so specialized, that outside of their niche, they are impractical to use. (Either they have no speed advantage in "general purpose" applications, or are physically impossible to use outside of their respective niche). Furthermore, these languages run on specialized hardware.

Spoiler:
OpenCL runs best on AMD Graphics Cards.
CUDA runs best on NVidia Graphics Cards
ARM Neon is your SIMD for ARM Assembly.
AVX is the latest in Intel / AMD SIMD instructions.
The AES Instruction Set is blazingly fast.
Quick Sync is the fastest way to encode video on Intel chips starting with Sandy Bridge (2nd Gen Core i3/i5/i7 series and later)
AMD HSA is AMD's new push on their Kaveri line of chips (ie; A10-7850K)


Since SIMD is a Tier-0 "language" (at least, a subset of assembly language), there are issues when your slower languages compile into SIMD. For example, if you use Matlab or NumPy (both very slow languages...), you'll get faster-than-C performance whenever you hit a corner case where SIMD optimizations are enabled. I would argue that Hardware Load Balancers fall into this category (even if they are "programmed" with web GUIs).

In fact, I bet you that Matlab or optimized NumPy will execute better than general-purpose C code. SIMD means every "add" can perform 16 different additions per clock tick. A NumPy program compiling to SIMD AVX will beat out any general-purpose C code. Period.

Obviously, properly written C or C++ code using SIMD intrinsics (or an SIMD library like BLAS) will be faster. (With a better idea of how memory is laid out, you can more reliably ensure that the cache is properly "pipelined". The memory system often becomes the bottleneck with SIMD code). But the key here is recognizing that it isn't "C", or "Python" that is making things fast... it is because you're accessing the SIMD supercomputer that is now standardized on most consumer hardware. (even cellphones generally have access to ARM NEON, the SIMD instruction set for ARM)
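As a minimal illustration of the vectorization point: the same dot product written as a Python-level loop and as a single NumPy call. Whether np.dot actually hits SIMD depends on the NumPy build (BLAS backend) and the CPU, so this only shows the structural contrast, not a guaranteed speedup.

```python
import numpy as np

# The same dot product two ways: a Python-level loop doing one
# multiply-add per element, and a single np.dot call that dispatches
# to compiled code (typically BLAS, often SIMD-vectorized).
a = np.arange(100_000, dtype=np.float64)
b = np.ones_like(a)

def dot_loop(x, y):
    total = 0.0
    for xi, yi in zip(x, y):
        total += xi * yi
    return total

loop_result = float(dot_loop(a, b))
vec_result = float(np.dot(a, b))
assert loop_result == vec_result  # same answer; np.dot is just much faster
```

Timing the two with timeit typically shows the vectorized call winning by orders of magnitude on most machines, which is the "slow language beating naive code" effect described above.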

----------------------------

Basically, Tier0 means using a specialized CPU for a particular computation. "Tier -1" means creating a fully customized specialized computer for a particular problem. Naturally, these approaches result in much faster execution than simply using a general purpose computer.

I guess "Tier -2" would be Quantum computing, which is about taking advantage of newly discovered laws of physics and computational theory. But arguably, these will have to be fabricated in some way some how... so it really is just an extension of "Tier -1": inventing new hardware.
First Strike +1/+1 and Indestructible.

Great Justice
Posts: 54
Joined: Sun Aug 15, 2010 5:28 am UTC

Re: Approximate speed classes of programming languages

Postby Great Justice » Tue Apr 22, 2014 4:27 am UTC

That's more like comparing hardware features rather than implementations of languages on the same hardware.
Many compilers let you choose which extensions/opcodes to use or disable if the arch supports it. Remember when the x87 was an optional coprocessor?

I suppose you could throw in microcode, though. It's like firmware for your typical CPU to "emulate" an architecture (even machine code is an abstraction now) and take care of internal tasks. But again, not really usable and tied to a particular chip revision.

As far as actual comparisons go, I haven't analyzed the programs the CLBG uses; are there any conclusions to draw about which language is best suited for which types of tasks?
An "overall" is good, but it's kind of wasteful to have to implement your program in 40 languages to see which one you should be using for any given part.
It usually isn't Congress or the State that tries to abridge free expression or free speech, [...] actually, in the present situation, the main threat to expression comes from public opinion.
~Christopher Hitchens

Jplus
Posts: 1692
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: Netherlands

Re: Approximate speed classes of programming languages

Postby Jplus » Tue Apr 22, 2014 7:57 am UTC

@KnightExemplar: thanks for the interesting suggestion. I'll link to your post from the opening post. I think tier 0, as you already suggested, is mostly orthogonal to the choice of a general programming language (you can use SIMD instructions in Python or Octave if you want, or combine a Fortran program with CUDA code). Tier -1 doesn't seem like a real alternative to tiers 1-5 (to me) except in the extreme cases that you mentioned. By the way, I wonder whether it would be possible to benchmark a tier -1 "program" against a program from a higher tier in a meaningful way?

@Great Justice: there's some differentiation between the minibenchmarks. Some are multi-threaded, others single-threaded. Some are memory-intensive, others aren't. Some are mostly numeric, others text-based. You would have to take the summary data, manually annotate the benchmarks with the properties you're interested in, and do your own analysis to compare languages specifically by such properties, though. Note that the summary data only include the fastest implementation for each language+benchmark combination; if you want more complete statistics you can go here (one page for each language, includes problem size but not CPU load) or here (one page for each benchmark, includes CPU load but only a single problem size).
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)

KnightExemplar
Posts: 5489
Joined: Sun Dec 26, 2010 1:58 pm UTC

Re: Approximate speed classes of programming languages

Postby KnightExemplar » Tue Apr 22, 2014 2:45 pm UTC

Jplus wrote:Tier -1 doesn't seem like a real alternative to tiers 1-5 (to me) except in the extreme cases that you mentioned. By the way, I wonder whether it would be possible to benchmark a tier -1 "program" against a program from a higher tier in a meaningful way?


(Tier 0: SIMD Instructions) The Intel Sandy Bridge E5-2650 (manufactured on a 32nm process, 125Watts, 8 cores) has been benchmarked at 129 kHash/s for Litecoin (Scrypt). (Intel Ivy Bridge is more power efficient, but I can't find any numbers on an Ivy Bridge Xeon.)

(Tier 0: GPGPU) The ATI R290x (28nm, 300 Watts) has been benchmarked at 1000 kHash/s on SCrypt.

(Tier -1) The KnCMiner Mini-Titan (28nm Process, 500 Watts) has been benchmarked at 150 MEGAHashes / s for Litecoin (Scrypt).

Sooooo... going by "work done per watt", we're looking at something on the order of 29000% faster than a server-grade CPU. Mind you, the scrypt algorithm was designed to be harder to implement in ASICs. Something like SHA-256 (ie: Bitcoin) would be much much much faster than scrypt (when implemented in ASICs, BTC hashing is thousands of times faster, as opposed to "only" hundreds of times faster).

BTW: I'm wondering where to put a Network Load Balancer. They're somewhere in the realm of -1 or 0. I know Cisco makes ASICs specifically for IP Network processing. (When you have a fat 10Gbps pipe, you will want a computer that can process 10Gbps worth of data and either route it, load-balance it, or perform security scans). And btw: they get expensive. Your standard CPU is not going to be able to process 10Gbps worth of packets, while encrypting, load balancing and routing it.

Tier -1 doesn't seem like a real alternative to tiers 1-5 (to me) except in the extreme cases that you mentioned.


Naturally. The further up the chain you go, the less and less practical it gets. It's typically impractical to rework algorithms to Tier 0... in a SIMD-friendly way, let alone the more difficult Tier 0 stuff (OpenCL or CUDA). But for those who need maximum performance, the solution is to invent a new computer.

igouy
Posts: 8
Joined: Sat Apr 19, 2014 3:53 pm UTC

Re: Approximate speed classes of programming languages

Postby igouy » Wed Apr 23, 2014 2:09 am UTC

Jplus wrote:I tried to find some good benchmarks that compare Forth with C or C++.


I tried providing you with appropriate URLs for the Internet Archive, WayBack Machine -- but the message is flagged as spam by the message board.

Try the Internet Archive, WayBack Machine using

shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all

for October 19, 2008.

EvanED
Posts: 4324
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: Approximate speed classes of programming languages

Postby EvanED » Wed Apr 23, 2014 3:35 am UTC

igouy wrote:I tried providing you with appropriate URLs for the Internet Archive, WayBack Machine -- but the message is flagged as spam by the message board.
Users with < 5 posts can't post links.

I randomly chose this version of the site (I didn't notice the specific date recommendation until later, but mine is newer anyway); the 25th-75th percentile range of GNU Gforth relative to C was 11.09-21.58, with a median of 16.34. That would put it on the fringe between intermediate and slow. bigForth somehow managed 2.25-4.77 with a median of 3.7, putting it on the fringe between fast and the "fringe". :-)
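For anyone repeating that exercise, the percentile arithmetic is easy to reproduce with the standard library. The ratios below are illustrative stand-ins, not the actual archived per-benchmark numbers:

```python
import statistics

# Illustrative per-benchmark slowdown ratios (program time / C time);
# the real values are the ones read off the archived shootout pages.
ratios = [11.1, 13.0, 14.8, 16.3, 17.9, 19.5, 21.6]

median = statistics.median(ratios)
q1, _, q3 = statistics.quantiles(ratios, n=4)  # 25th/50th/75th percentiles
print(q1, median, q3)  # → 13.0 16.3 19.5
```

Note that statistics.quantiles defaults to the "exclusive" method, so the quartiles it reports may differ slightly from whatever interpolation the old shootout charts used.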

Jplus
Posts: 1692
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: Netherlands

Re: Approximate speed classes of programming languages

Postby Jplus » Wed Apr 23, 2014 8:40 am UTC

Thanks to both of you. To be honest I'm a bit lost about what to do with the Forth implementations at this point. The old CLBG scores seem at odds with the other two benchmarks that I linked to before. According to the latter two sources GForth should be in the fringe between fast and intermediate while according to the former it should be in the fringe between intermediate and slow. Shall I put it in tier 3 (intermediate), on the "slower" side of Erlang HiPE, and remove the other Forth implementations for lack of sufficient information?
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)

igouy
Posts: 8
Joined: Sat Apr 19, 2014 3:53 pm UTC

Re: Approximate speed classes of programming languages

Postby igouy » Wed Apr 23, 2014 4:43 pm UTC

EvanED wrote:but mine is newer anyway);

But the data shown is not newer -- it's from the same mid-2008 measurements; fewer of the language implementations were being shown for that machine in 2012, so less data was archived by the Internet Archive.

The earliest of those boxplot summaries in the Internet Archive seems to be 15 May 2009:

shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all&box=1

igouy
Posts: 8
Joined: Sat Apr 19, 2014 3:53 pm UTC

Re: Approximate speed classes of programming languages

Postby igouy » Wed Apr 23, 2014 4:57 pm UTC

Jplus wrote:The old CLBG scores seem at odds with the other two benchmarks…

I am biased but… :-)

complang.tuwien.ac.at --"The benchmarks we used were the ubiquitous Sieve (counting the primes <16384 a thousand times); bubble-sorting (6000 integers) and matrix multiplication (200*200 matrices) … we computed the 34th Fibonacci number using a recursive algorithm…"

dada.perl.it -- Note that a weighting is applied to the CRAPS![TM] scoring, and that those are the old Doug Bagley programs the benchmarks game started with in 2004, before replacing them with other tasks in 2005.

benchmarksgame.alioth.debian.org/dont-jump-to-conclusions.php#history

Jplus
Posts: 1692
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: Netherlands

Re: Approximate speed classes of programming languages

Postby Jplus » Fri Apr 25, 2014 3:12 pm UTC

I decided to just remove all Forth implementations because I felt there was no certain way to do justice to them (not even GForth).

I'm itching to run my own copy of the benchmarking game so I can just compare whatever I want, but I don't have time.
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)

igouy
Posts: 8
Joined: Sat Apr 19, 2014 3:53 pm UTC

Re: Approximate speed classes of programming languages

Postby igouy » Fri Apr 25, 2014 5:15 pm UTC

Jplus wrote:I'm itching to run my own copy of the benchmarking game so I can just compare whatever I want, but I don't have time.


Me neither -- but the provided Python measurement script "bencher" doesn't need to sleep, and creates a rolling-log file, and (because it writes a data file after measuring each program) can be halted and restarted anytime without much loss... etc

benchmarksgame.alioth.debian.org/download/bencher.zip

Jplus
Posts: 1692
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: Netherlands

Re: Approximate speed classes of programming languages

Postby Jplus » Fri Apr 25, 2014 6:17 pm UTC

That's the least of my concerns. What really costs time is installing it, checking that I configured things correctly, retrieving source code for programs that aren't included in the snapshot anymore, analysing the data, and so on.
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)

igouy
Posts: 8
Joined: Sat Apr 19, 2014 3:53 pm UTC

Re: Approximate speed classes of programming languages

Postby igouy » Fri Apr 25, 2014 7:58 pm UTC

Jplus wrote:What really costs time is…

Those are small one-off costs.
The repeat-costs are installing new versions of languages implementations and re-measuring all the programs.

The trick is to "compare whatever [you] want" without feeling a need to duplicate what the benchmarks game already does :-)

quick-dudley
Posts: 3
Joined: Mon Mar 28, 2011 10:49 am UTC

Re: Approximate speed classes of programming languages

Postby quick-dudley » Thu May 08, 2014 10:43 am UTC

I'm curious about whether Haskell would benchmark better or worse if the Haskell code used in the benchmarks wasn't such a spectacular example of someone fighting against the language rather than taking advantage of its features.

obfpen
Posts: 59
Joined: Mon Jun 20, 2011 1:16 pm UTC

Re: Approximate speed classes of programming languages

Postby obfpen » Fri May 09, 2014 9:58 am UTC

The folks developing Julia have done a few cross-language comparisons that seem to tally with your own. I can't say how selective they've been with their examples, but the graph is pretty.

Jplus
Posts: 1692
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: Netherlands

Re: Approximate speed classes of programming languages

Postby Jplus » Fri May 09, 2014 11:36 am UTC

That's a pretty graph indeed! From the names of the benchmarks I get the impression the tasks used are a bit more "micro" than the ones used in the CLBG, so it might be a bit more selectively favourable to Julia. Nonetheless I think the graph definitely provides sufficient grounds to put Julia in tier 2 (fast). Would you agree?
(Tier 1 seems unlikely as Julia is dynamically typed and garbage-collected, but these data are not clear on the classification).
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)

wumpus
Posts: 485
Joined: Thu Feb 21, 2008 12:16 am UTC

Re: Approximate speed classes of programming languages

Postby wumpus » Mon May 12, 2014 3:06 pm UTC

KnightExemplar wrote:Tier -1: Hardware Languages
Verilog
VHDL
[detailed explanation deleted. Read it again, anyway]
Tier0: Specialized Languages
OpenCL, CUDA, SIMD Assembly (ie: ARM NEON, x86 MMX, x86 SSE, AMD64 AVX assembly instructions), AES Assembly Instructions, Intel QuickSync, AMD HSA.
[more deleted. You did read the whole quoted post, didn't you?]

The one catch is that of the "specialized languages", OpenCL and CUDA aren't all that specialized; the limitations are pretty much built into the hardware that they run on (and AMD at least has OpenCL compilers for bog-standard x86/AMD64 chips). The limitations:

  • Must be extremely (pretty much embarrassingly) parallel. SIMD falls apart if Amdahl's law comes anywhere near it.
  • Should be mostly multiplies, preferably multiply-accumulates. If your code is based on multiply-accumulates (like a lot of DSP and numerical work), it will be extremely difficult/expensive to design an ASIC that will perform better than a GPU. I suspect it will only work if you have a memory access pattern that won't fit into a GPU (but will fit on your ASIC), or if you can redefine multiply in some non-IEEE754-based system (and that won't buy you much).
  • Must fit unit-stride (or nearly unit-stride) vectors. GPUs can really only load unit-stride vectors (but can re-arrange them however you want). If you have an n-stride vector, be prepared to do n loads/stores to get it into/out of your registers.
  • Must fit into a 16k-48k "working area". Note that you not only have to fit in the parallelism of your vectors, you probably need to run a few threads of vectors to hide any latency. Loading from this local memory is fast; loading from other memory can take hundreds of cycles. The Cell in the PS3 "failed" because (I'm assuming) programmers had to fit data into 256k per hardware thread; fitting into 16-48k is much more painful by comparison.
  • Must be cache friendly. There is a limited amount of on-chip memory to swap data between all your little caches, but it is nothing like the few gigabytes of memory on your card. While you seem to have a huge amount of bandwidth (relative to a CPU), it isn't anywhere near the difference in GFLOPS. You must stay on chip much more than with a CPU (and GPU caches are nowhere near as big as CPU caches).
  • Must not care about protected memory (exception [implied by AMD]: the latest APU from AMD). Cryptographers take note: this makes timing attacks seem trivial by comparison. Attackers need only read out the entire memory to snag keys.
  • Should use only floats, not doubles. Titans, and to a lesser degree top-of-the-line AMD cards (the older generations tend to be just as fast as the newer; get 'em while the bitcoin miners sell them off), can do doubles without problems, but everything else takes a big hit. This isn't so good, as once you multiply-add the same data 20-30 times you really start to need doubles (I'm going by my memory of how a 32k single-precision FFT sounded, YMMV, but "real" work needs doubles).
  • Must fit in a few GB of RAM. While most loads must come from local (16-48k) memory, and a few more lines from cache, there will always be those that hit main memory. You only get a few GB of RAM on a board (and I haven't seen one that can be upgraded since 2000), so know your limit. PCIe transfers are pretty slow (and almost the entire optimization handbook from either AMD or Nvidia is about limiting them).

Bitcoin mining worked very, very well in OpenCL/CUDA (CUDA's problems can be assumed to be tied to Nvidia's issues with SHA256), because it hit pretty much every requirement (except "use a lot of multiply-adds"). The lack of multiplies meant that custom ASICs could run much faster/cooler.

igouy
Posts: 8
Joined: Sat Apr 19, 2014 3:53 pm UTC

Re: Approximate speed classes of programming languages

Postby igouy » Sat Nov 21, 2015 6:51 pm UTC

Jplus wrote:So I decided to waste some of my time and use the Computer Language Benchmarks Game (CLBG) to group programming languages by approximate orders of speed. To do this I identified the "flat" parts of the graph that results if you merge the boxplot charts on the "Which programs are fastest?" page.


Now that I've stumbled across this discussion again, I should mention that one of the recent changes to those benchmarks game box plot charts is that I've broken the boxes into separate groups, at the minima of the kernel density estimate of the geometric mean scores.

http://www.rsc.org/Membership/Networkin ... sities.asp

User avatar
Jplus
Posts: 1692
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: Netherlands

Re: Approximate speed classes of programming languages

Postby Jplus » Sat Nov 21, 2015 7:24 pm UTC

Thanks for pointing that out. I visited this page and it struck me that the bottom graph doesn't appear to have been subdivided into groups. It looks as though tiers 1 and 2 have been merged. Is that because the kernel density estimate doesn't detect clear "breaks"? I have to admit that with this ordering, I wouldn't know where to put a break based on eyesight, either.

But if the distinction between tiers 1 and 2 still stands, it appears like Rust might have to be promoted to tier 1.
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)

igouy
Posts: 8
Joined: Sat Apr 19, 2014 3:53 pm UTC

Re: Approximate speed classes of programming languages

Postby igouy » Mon Nov 23, 2015 12:26 am UTC

Jplus wrote:Is that because the kernel density estimate doesn't detect clear "breaks"?


No inflection in the curve.

Unlike the 32-bit one-core data, which had a break between Java and Scala --

http://benchmarksgame.alioth.debian.org ... stest.html

pavelpotocek
Posts: 1
Joined: Mon Dec 21, 2015 11:30 am UTC

Re: Approximate speed classes of programming languages

Postby pavelpotocek » Mon Dec 21, 2015 11:37 am UTC

Yup, Rust should definitely move to tier 1. It now beats C on, like, half the benchmarks, and is ranked better than C++ (at least) on u64q.

Avalon
Posts: 1
Joined: Sat Mar 26, 2016 4:16 pm UTC

Re: Approximate speed classes of programming languages

Postby Avalon » Sun Mar 27, 2016 10:31 am UTC

You could throw in microcode, though. It's like firmware for your typical CPU: it "emulates" an architecture (even machine code is an abstraction now) and takes care of internal tasks. But it's not really usable by programmers, and it's tied to a particular chip revision.
Last edited by Avalon on Wed May 25, 2016 12:23 pm UTC, edited 1 time in total.

wumpus
Posts: 485
Joined: Thu Feb 21, 2008 12:16 am UTC

Re: Approximate speed classes of programming languages

Postby wumpus » Mon Mar 28, 2016 9:18 pm UTC

Avalon wrote:You could throw in microcode, though. It's like firmware for your typical CPU to "emulate" an architecture (even machine code is an abstraction now) and take care of internal tasks. But it's not really usable and tied to a particular chip revision.


Microcode has a weird place in this. It has to be slower than tier -1 (there's a reason that microcode has been replaced by Verilog, and sometimes VHDL), but it should sit a little above tier 0 (because it can make some of the same effects happen). Presumably something like the AES opcodes in AMD64 are coded in microcode (or not; a lot would depend on which group coded it and how old the engineer who decided on a method was). I'd just call it tier -0.5 on principle, but in reality it could be slower than tier 1 (because once you hit microcode, you are typically hitting legacy opcodes that were deprecated a *long* time ago, where full speed is rarely required).

Back in its heyday it would certainly have been tier 0 (SIMD would come later). Verilog/VHDL (when they later showed up) would then take their place as tier -1 (presumably leaving hand-placed transistors as tier -2, so far not mentioned in this thread; as far as I know, only Intel still employs people to do this). How to fit in SIMD once microcode was tier 0 is left as an exercise for the reader.

- Microcode was the thing I remember best as being taught in multiple engineering classes when I was at school, yet it was obsolete before I graduated (1992). While it is still used, it is nothing like the stuff I was taught, where the microcode engine *was* the computer, in a bizarre meta way. I suspect they kept teaching this system for a while, if only to explain the magical scheme by which a microcode engine and some microcode can turn a bunch of circuits into a computer.

