Basic Q about compiled executables



Basic Q about compiled executables

Postby Dopefish » Mon Oct 21, 2013 7:46 pm UTC

So, I just realized I don't know something that is probably extremely basic. Namely, under what circumstances can I write code, compile it into an executable file, and then just copy that exe to other computers and have it still work?

Does it depend specifically on what the code is/does? (e.g. if it's basic C code using just the standard library and built in stuff, compared to using various fancy libraries?) Is it just a matter of having the same OS? Or something more general like having the same underlying filesystem? Or is it actually the case that a simple program executable could be put on a USB and run pretty much everywhere? Or is the compiled result always going to specifically work just on the computer it was compiled on unless I specifically do something?

I've generally just passed around source code and recompiled between machines, but it occurs to me that's probably not always going to be convenient, and I don't really have enough systems available to me these days to just test things.

I suppose my ignorance regarding the above stuff comes from not 'really' knowing what happens when something is compiled. I gather it takes my human written code and converts it into 'machine code' which the computer actually uses, but I don't know how machine specific that code is, and how much is really contained within it (compared to there being code that tells it to look into library files elsewhere or something).


Re: Basic Q about compiled executables

Postby heatsink » Mon Oct 21, 2013 8:11 pm UTC

The full story about application portability is quite complicated. A compiled program depends on an instruction set, an executable file format, an operating system binary interface, and some library binary interfaces.

If you want to write a portable compiled binary, you could statically link all the libraries, and the executable would be portable across all computers running the same operating system.
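As a minimal sketch of that (assuming gcc on Linux; the commands in the comment are just illustrative), static linking copies the library code into the executable itself:

Code: Select all

/* hello.c - building with "gcc -static -o hello hello.c" copies the needed
 * parts of the C library into the executable, so the target machine doesn't
 * need a compatible libc installed.  A plain "gcc -o hello hello.c" links
 * libc dynamically instead, and the binary loads it at run time. */
#include <stdio.h>

int main(void)
{
    printf("Hello, portable world!\n");
    return 0;
}

The statically linked binary is much larger, but it no longer depends on the libraries present on the target machine.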

There are some related questions on stack overflow: http://stackoverflow.com/q/4101239/507803


Re: Basic Q about compiled executables

Postby korona » Mon Oct 21, 2013 9:15 pm UTC

If you compile on x86 your compiler produces x86 assembly code, so of course the code won't run on another architecture, e.g. ARM. If you enable instruction set extensions like SSE, the code won't run on older x86 CPUs that don't support SSE. Note that instruction sets like MMX are often used for implementing things like memcpy, so this might be true even if your code doesn't explicitly use SSE or MMX.

It also won't work on an operating system that uses a different binary format but AFAIK all Linux systems use ELF today and Windows uses PE so that is usually not a problem. The code also depends on calling conventions expected by the dynamically linked libraries it uses. If you change a cdecl function to a fastcall function in a dynamic library all code that uses the library has to be recompiled. This obviously includes the C library.
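For illustration (a hedged sketch: frobnicate is a made-up library function, and __fastcall is an MSVC keyword that the #if below simply drops on other compilers), a calling-convention change is invisible at the C source level but changes the binary contract between caller and callee:

Code: Select all

#include <stdio.h>

/* On 32-bit MSVC, __fastcall passes the first arguments in registers and the
 * callee cleans up the stack; the default __cdecl passes everything on the
 * stack and the caller cleans up.  A binary compiled against one declaration
 * cannot safely call a library built with the other, even though the C code
 * looks identical. */
#if defined(_MSC_VER) && defined(_M_IX86)
#  define LIBCALL __fastcall
#else
#  define LIBCALL   /* other compilers/targets: platform default convention */
#endif

int LIBCALL frobnicate(int a, int b)
{
    return a * 10 + b;
}

int main(void)
{
    /* Fine here, because this translation unit sees the LIBCALL declaration.
     * A pre-built caller that still expects __cdecl would pass the arguments
     * in the wrong places and, with stack-passed arguments, leave the stack
     * unbalanced. */
    printf("%d\n", frobnicate(2, 3));
    return 0;
}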

What is more important is that operating system APIs might vary between different versions of the kernel. If you statically link libc you can run into trouble here. A system call like write(STDOUT_FILENO, "hello", 5); is usually compiled to something like:

Code: Select all

move the stdout file descriptor number to a certain register
move the pointer to "hello" to a certain register
move 5 to a certain register
move a magic number telling the os that you want to call write() to a certain register
perform a magical instruction that elevates the current privileges and jumps to kernel code at an address that was previously setup by the kernel

Now, if this magic number that tells the kernel "hey, I want to do a write() call" changes, or the registers that are expected to hold the arguments change, your code will break.

This is not a problem on Windows because Windows doesn't statically link system libraries into your applications. All system calls are inside .dll files and are linked dynamically to the application. It is also not a problem on Linux if you don't force gcc to statically link the C library. Even if you do, system call interfaces rarely change, so you're probably safe unless you use system calls that were introduced in a recent kernel build and are not stable yet.
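To make that dependency concrete, here's a minimal sketch (assuming x86-64 Linux with glibc; syscall() and SYS_write come from glibc's headers) that invokes write() by its raw syscall number, which is exactly what a statically linked libc bakes into your binary:

Code: Select all

#define _GNU_SOURCE         /* glibc: expose the syscall() declaration */
#include <unistd.h>
#include <sys/syscall.h>    /* SYS_write - the "magic number" for write() */

int main(void)
{
    /* Same effect as write(STDOUT_FILENO, "hello", 5), but the syscall
     * number is chosen at compile time.  If the kernel ever renumbered it
     * or changed the argument registers, this binary would break. */
    syscall(SYS_write, STDOUT_FILENO, "hello", 5);
    return 0;
}

Either way the number behind SYS_write (1 on x86-64) is fixed when the program is built, so the binary relies on the kernel keeping it stable.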

EDIT:
heatsink wrote:If you want to write a portable compiled binary, you could statically link all the libraries, and the executable would be portable across all computers running the same operating system.

Where "operating system" means operating system build (as the system call interface can change), and "all computers" means all computers supporting the same instruction set (as the memcpy() in libc could require MMX).
Last edited by korona on Mon Oct 21, 2013 9:33 pm UTC, edited 2 times in total.


Re: Basic Q about compiled executables

Postby speising » Mon Oct 21, 2013 9:26 pm UTC

Another interesting trap: if you use Visual Studio with dynamically linked libraries and try to distribute a debug exe, it will fail because the target machine probably won't have the debug versions of the DLLs.
And the release version will need the redistributable package installed.


Re: Basic Q about compiled executables

Postby EvanED » Mon Oct 21, 2013 10:55 pm UTC

The Linux syscall interface is, AFAIK, stable as a rock. Compare the Linux 2.2 syscall interface to 2.6.35.4; there are a bunch of additions, but the old interfaces have stuck around. I only count 3 things that are not supported by the newer kernel -- sys_idle (112), sys_get_kernel_syms (130), and sys_query_module (167) -- and nothing that is changed, at least at the level of function signature. Linux promises a stable syscall interface, and from the kernel's perspective (e.g. how IOCTLs are interpreted and such aren't reflected in the above picture) they seem to deliver.

Of course there's no guarantee that an old program will continue to run, but it's my understanding that if you don't do fairly esoteric stuff (or mess with X, probably) and statically link (or arrange for the libraries to be present), then you can carry compiled programs forward. I know I've run programs compiled for 2.4 on 2.6, and I also use some X libraries that I'm too scared to find out how old they are.


Re: Basic Q about compiled executables

Postby Dopefish » Mon Oct 21, 2013 10:56 pm UTC

Hmm. That was helpful, although I almost feel like I know less than before, given all the abbreviations whose meanings I don't know (and even if I did, they probably still wouldn't mean much to me), as well as various other tidbits related to the under-the-hood stuff.

Still, what I'm getting is that different builds of even the same OS can work on different kinds of magic, and things will break if the magic is different. Different OS's pretty much always use different magic. If I'm using libraries, there are potentially options that can be changed to make the magic somewhat more likely to spit out something portable, as long as I'm not doing anything with relatively recently introduced kinds of magic.

Interesting that what I thought was a basic question actually had lots of complications; I'm glad I decided to ask instead of deciding it was too basic to be worth the forum space.


Re: Basic Q about compiled executables

Postby Carnildo » Tue Oct 22, 2013 5:22 am UTC

EvanED wrote:(or mess with X, probably)

From an application standpoint, X is even more solid than the Linux syscall interface. I recently needed to compile an X11-based program that dated from 1994, and it compiled and ran on a modern system with no changes needed. The X11 client-server protocol is unchanged (except for being extended) since 1987, and the libX11 API hasn't changed much either. The server ABI and API are unstable, but that only matters if you're writing drivers.


Re: Basic Q about compiled executables

Postby phlip » Tue Oct 22, 2013 6:00 am UTC

Basically, it boils down to three points: architecture, dynamic linking, and OS ABI. That is, how the machine code itself is built, where to find any other bits of machine code you need, and how the OS starts your program and they talk to each other.

For architecture, this is things like the instruction set, the system layout, the BIOS, how all the assorted electrical gubbins inside the box talks to each other.
Now, chances are you only care about the PC: x86 or x86_64, and everything that comes along with it. A bunch of options and extensions and expansions exist (MMX, SSE, etc have been mentioned already), but for the most part these are backwards-compatible... ie a newer CPU can run code targeted to an older CPU, so it's just a question of how far back in time you want to support (usually trading compatibility for performance). If you're only building something for yourself, then you might compile with everything your own CPU supports (aka "the Gentoo plan"), but if you're planning to mass-distribute something, you might want to only use those extensions you are confident your users will support (your compiler defaults are probably fine, as a conservative starting point). This is also where the 32-bit vs 64-bit difference comes in... most 64-bit OSes will still run 32-bit code, under some form of compatibility layer... while 32-bit OSes are unable to run 64-bit code at all.
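As a sketch of how that trade-off can also be handled at run time (this uses GCC/Clang-specific built-ins and only works when compiling for x86 targets), a program can check which extensions the CPU it actually landed on supports, instead of assuming them at compile time:

Code: Select all

#include <stdio.h>

int main(void)
{
    /* GCC/Clang built-ins that query the CPU the program is running on.
     * Code compiled with e.g. -mavx2 can fault with an illegal instruction
     * on an older CPU; checking first lets a single binary pick a code path
     * that the host actually supports. */
    __builtin_cpu_init();
    printf("SSE2: %s\n", __builtin_cpu_supports("sse2") ? "yes" : "no");
    printf("AVX2: %s\n", __builtin_cpu_supports("avx2") ? "yes" : "no");
    return 0;
}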
As for other architectures, ARM et al... they are of course not going to run any x86 binaries short of emulating a full x86 system, they have their own machine code, and their own system layout that you'll need to handle. Exactly how portable something is between one ARM-powered device and another will depend on the specifics, but I imagine it's usually pretty limited.
There are also fake architectures that are designed to run entirely within a virtual machine, rather than on actual hardware. Java, .Net, Flash... these are all examples of "compiled" languages that target a VM. For these, the same basic idea applies... the person running the program needs to have a VM that is of a compatible architecture (you can't run a .Net program without having that version of .Net installed, or run a Java program without that version or higher of the JVM).

Next up, libraries. Chances are, not every single byte of machine code your program needs is actually stored in the binary file itself. Usually a fair amount will be pulled in at runtime from libraries. Some of those libraries you can expect your users to already have (eg system libraries that are part of the OS); others you'll need to bundle along with your program, or have some other way of getting them. If you're using a closed system, like MSVC, you'll need to be careful about what you're allowed to redistribute and what you aren't... MSVC includes documentation about what libraries you're allowed to ship with your program, usually the ones that your program still requires at runtime after compilation. For instance, if I write this basic program, and compile it with MSVC:

Code: Select all

#include <stdio.h>
int main()
{
  printf("Hello!\n");
  return 0;
}
then my .exe file will have the definition of main(), but it'll depend on the definition of printf() from the C library, in MSVCR100.DLL. That, in turn, relies on the low-level code that actually writes to standard out, which lives in KERNEL32.DLL. Now, I'm not allowed to redistribute kernel32, but since that's a part of Windows, I shouldn't have to - everyone running Windows should already have it. On the other hand, msvcr100 isn't a part of Windows, and is allowed to be redistributed, so I'd need to install that library along with my program. Tools like Dependency Walker will help you track exactly what libraries your program needs.
On free systems, with a good distro, this whole mess is a bit easier to handle... the package manager just records that your program depends on, say, libc, and then when you try to install your program from the package manager, it'll pick up all the dependencies for you. Though when you're distributing your program you still need to know what your dependencies are, so you can set those flags correctly.

The third item, the OS interface, is basically the part that stops Windows programs running on Linux, and vice-versa. It's the format of the executable file itself, all its headers and metadata and how it actually stores the machine code itself. It's exactly what state the process is in when your program starts, and how your program can expect to get input from the outside world. It's exactly what your process has to do to make calls to functions the operating system provides. This also ties into the previous point, as it also includes instructions to the OS to find dynamic libraries and link everything together. This is stuff you probably rarely have to actually worry about, though... the Windows Portable Executable file format hasn't changed since Windows 95, and ELF has been the standard format on *nix for about as long. When you're making calls to the OS, on Windows that's all handled by DLLs, so your code just calls some function in user32, or kernel32, or the like, and then that does the work, and they try to keep backward-compat with those as much as possible with new versions of Windows - so on newer versions, all the old stuff is still in there to be called. On Linux, system calls can be done via C functions in libraries, but they can also be called directly via an interrupt handler... but the workings for this, again, are very stable and backward-compatible, so your binary will continue to work on newer versions of the kernel.
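For instance (a small sketch; the only facts assumed are the two well-known magic numbers), the very first bytes of an executable file already identify which container format it uses, which is one reason a Linux loader refuses a Windows .exe outright:

Code: Select all

#include <stdio.h>
#include <string.h>

/* Peek at the first bytes of a file: ELF executables start with 0x7F 'E' 'L' 'F',
 * while Windows PE executables start with the old DOS "MZ" stub.
 * Usage: ./magic <file> */
int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    unsigned char buf[4] = {0};
    size_t n = fread(buf, 1, 4, f);
    fclose(f);

    if (n == 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        puts("looks like an ELF binary");
    else if (n >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        puts("looks like a PE (MZ) binary");
    else
        puts("not a recognisable executable format");
    return 0;
}

Everything after that magic number (section layout, entry point, dynamic-library references) is the metadata described in the paragraph above.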

So generally in most cases for PC software there'll just be a "minimum version" of the OS that you support, and then you'll go on to support everything after that. Where there are compatibility problems with newer versions, it's usually at a much higher level than just "it's compiled for an earlier version"... like when Windows Vista introduced UAC, if you had a compatibility problem with that, you couldn't just "recompile targeting Vista", you needed to actually change your program logic at a higher level.

Code: Select all

enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};
void ┻━┻︵​╰(ಠ_ಠ ⚠) {exit((int)⚠);}
[he/him/his]


Re: Basic Q about compiled executables

Postby Jplus » Tue Oct 22, 2013 1:11 pm UTC

Dopefish wrote:So, I just realized I don't know something that is probably extremely basic. Namely, under what circumstances can I write code, compile it into an executable file, and then just copy that exe to other computers and have it still work?

I think it's possible to give a short answer. Your executable will work on another computer if all of the following requirements are met:
  1. The other computer understands the machine instructions that appear in your executable. Usually this means either that it has the same machine architecture (such as x86 or ARM) or that it has an implementation of the same virtual machine (such as JVM or CLR). In both cases your executable should not rely on new instructions that the receiving computer doesn't know about.
  2. The other computer can read the file format of your executable (the file does not only contain machine instructions but also references to libraries that should be linked and other data). Usually this means that the computer runs the same operating system or, again, the same virtual machine. This is the first meaning of "application binary interface" (ABI 1).
  3. All libraries (and other additional resources) that are dynamically linked by your executable are either included in the package that you ship, or already present on the other computer. The more a library is considered to be standard on the platform that you target, the less likely you'll need to ship it yourself.
  4. In the case of libraries being already present on the other computer, they must follow the same conventions about the way of finding and calling functions as your executable (ABI 2). Executables and libraries tend to be ABI-compatible if they were compiled with the same compiler for the same operating system, or if one of the compilers used a compatibility mode for the other.
  5. The other computer can handle the interrupts raised by your executable or the libraries that it links to (usually only the standard library of the programming language), and conversely, your program can handle the signals from the other computer. Basically this means that the other computer uses the same convention on what we call "system calls" as your program (ABI 3), and this again implies either the same operating system + machine architecture or the same virtual machine.
So to crudely summarise: same machine architecture + same OS + same compiler = binary compatibility.

Dopefish wrote:I've generally just passed around source code and recompiled between machines, but it occurs to me that's probably not always going to be convenient, and I don't really have enough systems available to me these days to just test things.

So most of the time this is the right approach. If somebody else has roughly the same kind of system as you do (same machine architecture, same operating system with a version of similar age) you can usually just send the executable, but otherwise you need to recompile. This is indeed inconvenient, and that's the reason why people invent and use virtual machines. Of course, virtual machines have their own drawbacks.
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)


Re: Basic Q about compiled executables

Postby skeptical scientist » Tue Oct 22, 2013 3:34 pm UTC

Others have already answered this very well (especially phlip, I learned a lot from your answer).

However, they left out one thing: the existence of fat binaries/libraries. As others have said, compiled binaries are in machine-code which is processor-specific, so an x86 binary will not run on an ARM processor or vice-versa. The exception to this is fat binaries, which contain both an x86 assembly section and an ARM assembly section (for example—other architectures could be used). The various assembly sections are bundled together with an architecture-independent header, which will tell the computer to jump to the appropriate section (depending on the architecture of the processor running the executable). The resulting files are much larger, because they contain multiple versions of the same program (one per supported architecture), which is the source of the name "fat". This allows the same program to run on multiple architectures, so you could e.g. distribute a single version of an iOS app which would run natively on both a 32-bit iPhone 4 and a 64-bit iPhone 5s (without using the 32-bit backwards compatibility mode).
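As a sketch of how that looks with Apple's toolchain (clang and lipo are the relevant tools; the exact -arch values are only illustrative and depend on the SDK), any ordinary program can be built as a fat binary:

Code: Select all

/* universal.c - nothing architecture-specific in the source itself.
 * On macOS, something like
 *     clang -arch x86_64 -arch arm64 -o hello universal.c
 * produces one file containing a slice per architecture, and
 *     lipo -info hello
 * lists the slices bundled inside it.  (Older toolchains used
 * -arch i386 / -arch ppc for the same purpose.) */
#include <stdio.h>

int main(void)
{
    printf("Hello from whichever slice the loader picked!\n");
    return 0;
}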

The architecture-independent header is still OS-specific, and not all operating systems have support for fat binaries. (Today it is mainly Apple OSes which support fat binaries.) However, it is possible to create one (simple) executable which will run on an Intel Mac, a PowerPC Mac, and an iPhone.
I'm looking forward to the day when the SNES emulator on my computer works by emulating the elementary particles in an actual, physical box with Nintendo stamped on the side.

"With math, all things are possible." —Rebecca Watson


Re: Basic Q about compiled executables

Postby Jplus » Tue Oct 22, 2013 5:12 pm UTC

Now that we're at it we may also mention the use of intermediate bytecode. Plan 9 from Bell Labs uses an intermediate binary representation which can then be compiled to native machine code at load time, i.e. just before the program is going to execute. This allows you to compile the program once (to the intermediate format) and then use it on any machine architecture that runs the same operating system. It's like JIT compilation but very low-level (probably comparable to LLVM), built into the OS and much more dated. :)
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)


Re: Basic Q about compiled executables

Postby korona » Tue Oct 22, 2013 7:15 pm UTC

Conceptually that is the same as Java or .NET code. In theory one could build a heavily optimizing compiler that compiles Java or .NET to native executables and libraries ahead of time (well, there are some obstacles with dynamic class loading, but I think they can be solved). If you want to guarantee memory safety, the compiler could attach cryptographic signatures to the binary files that guarantee that they were built from safe bytecode. I think this is a very promising path for future operating systems that want to combine the safety of managed languages with the heavy optimizations that were developed for ahead-of-time compiling.

