Safely executing unkown code

A place to discuss the implementation and style of computer programs.

Moderators: phlip, Moderators General, Prelates

User avatar
Eomund
Posts: 115
Joined: Fri Jun 22, 2012 11:48 pm UTC
Location: The Great White North (Left side)

Safely executing unkown code

Postby Eomund » Fri Sep 20, 2013 1:49 am UTC

I am thinking of making a game where the players would submit some code that would control their units. This should be easy enough to do; however, I do not know the best method of doing safely. Basically, I would want to take an input source code file and be able to run a subset of a programing language, knowing that whatever is in the code would not do anything nasty to my computer. Some things I would want are a strict limit on execution time and a limit on memory. I think some sort of whitelisting would be better than blacklisting. I do not think they would need to be able to do much more than basic control statements and variable manipulation. I suppose I could write some sort of analyzer/parser myself but this seems like it would be tedious and is something useful enough that somebody probably has already made something like it. I am quite flexible in terms of language as I know java, python, c++ and php and wouldn't mind learning something new.

User avatar
Cleverbeans
Posts: 1378
Joined: Wed Mar 26, 2008 1:16 pm UTC

Re: Safely executing unkown code

Postby Cleverbeans » Fri Sep 20, 2013 2:58 am UTC

I think it would be best to make your own language with only the commands you'd want to allow. If we can't solve the halting problem then I can't see how you'd be able to check stuff like execution time if you just allowed code submissions.
"Labor is prior to, and independent of, capital. Capital is only the fruit of labor, and could never have existed if labor had not first existed. Labor is the superior of capital, and deserves much the higher consideration." - Abraham Lincoln

User avatar
Xenomortis
Not actually a special flower.
Posts: 1455
Joined: Thu Oct 11, 2012 8:47 am UTC

Re: Safely executing unkown code

Postby Xenomortis » Fri Sep 20, 2013 8:18 am UTC

Write your own language and an execution environment.
e.g. Core Wars runs in an abstract virtual machine and the "warriors" are programs written in the VM's own (very small) assembler language.

You don't need a particularly sophisticated language. You need a few basic arithmetic operations, conditional jumps and some sort of addressing mechanism.
Memory could be limited to a single array (or two, if you want to separate code and data), provided by your runtime environment, and addresses could just wrap around (e.g. the 11th element of a 10 element array is just the first element).

I don't think there's a lot you can do to truly be safe without writing your own runtime environment.
Image

korona
Posts: 495
Joined: Sun Jul 04, 2010 8:40 pm UTC

Re: Safely executing unkown code

Postby korona » Fri Sep 20, 2013 2:49 pm UTC

I heavily recommend against building your own language. Designing a programming language is a much harder task than writing a game itself. I designed several programming languages in the past; writing a decent interpreter alone usually takes in the order of 10k sloc (remember that you have to include a simple runtime library).

There are many languages available that can be easily embedded into your game, the most common ones being JavaScript, Python and Lua.

I don't know much about embedding Lua and Python and I don't know if they can be embedded in a safe but I have some experience with JavaScript. As JavaScript is originally designed for the web the language itself is safe and does not provide any interfaces to manipulate extern data. Embedding Google's virtual machine v8 is really easy (it takes less than 20 lines of C++ code to run a script) and adding bindings for your game's interface is simple as well. If you don't want to code in C++ Java's standard library has an interface to interact with JavaScript (javax.script) or you could use Mozilla Rhino directly.

EDIT: The halting problem is irrelevant to practical execution time - just run the script in a separate thread and abort it if it takes too long. Memory limits are easy to implement as well.

User avatar
Cleverbeans
Posts: 1378
Joined: Wed Mar 26, 2008 1:16 pm UTC

Re: Safely executing unkown code

Postby Cleverbeans » Fri Sep 20, 2013 5:16 pm UTC

korona wrote:I heavily recommend against building your own language. Designing a programming language is a much harder task than writing a game itself.


Sure but we're not talking about a turing complete language, just a subset with certain features enabled which simplifies the problem. The goal is after all to prevent dangerous code from running, and I can't see a way to allow code submissions without leaving vulnerabilities or taking measures that are more difficult than writing a simple, custom API.
"Labor is prior to, and independent of, capital. Capital is only the fruit of labor, and could never have existed if labor had not first existed. Labor is the superior of capital, and deserves much the higher consideration." - Abraham Lincoln

alessandro95
Posts: 109
Joined: Wed Apr 24, 2013 1:33 am UTC

Re: Safely executing unkown code

Postby alessandro95 » Fri Sep 20, 2013 6:37 pm UTC

korona wrote:I don't know much about embedding Lua and Python and I don't know if they can be embedded in a safe but I have some experience with JavaScript.


Lua was born to be embedded so it's trivial to embed it, especially in C/C++ programs as for the safety it's supposed to be easily sandboxable, but I never tried it myself
The primary reason Bourbaki stopped writing books was the realization that Lang was one single person.

korona
Posts: 495
Joined: Sun Jul 04, 2010 8:40 pm UTC

Re: Safely executing unkown code

Postby korona » Fri Sep 20, 2013 7:51 pm UTC

Cleverbeans wrote:
korona wrote:I heavily recommend against building your own language. Designing a programming language is a much harder task than writing a game itself.


Sure but we're not talking about a turing complete language, just a subset with certain features enabled which simplifies the problem. The goal is after all to prevent dangerous code from running, and I can't see a way to allow code submissions without leaving vulnerabilities or taking measures that are more difficult than writing a simple, custom API.

A language with integer variables, an addition operator and a while loop is already turing complete (apart from having limited memory which applies to all programming languages in the real world).

Making the language not turing complete is not the way to go. The right approach is having a turing complete language but disallowing things like interaction with arbitrary C code, the file system, networking, hardware etc.

If you are embedding a proper virtual machine like Google's v8 you get a sandboxed environment for free and you just have to add functions the script should be allowed to access e.g. functions to move units in the game etc. The script is not able to access the operating system or external code or data in any way.

JavaScript is designed to run in this sort of scenario: Your browser allows untrusted JavaScript to run on your machine each time you visit a website. However it doesn't allow the script to access arbitrary operating system interfaces so running it is still safe.

User avatar
Xanthir
My HERO!!!
Posts: 5423
Joined: Tue Feb 20, 2007 12:49 am UTC
Location: The Googleplex
Contact:

Re: Safely executing unkown code

Postby Xanthir » Fri Sep 20, 2013 8:08 pm UTC

Yeah, a custom language is the easiest way to handle this. You can prevent TC without too much difficulty - only allow finite loops (that is, loops which require specifying how many iterations it does, rather than having arbitrary stopping conditions), and disallow recursion. As long as you don't allow tricky jumping, this is sufficient.
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: Safely executing unkown code

Postby EvanED » Fri Sep 20, 2013 8:24 pm UTC

korona wrote:Making the language not turing complete is not the way to go. The right approach is having a turing complete language but disallowing things like interaction with arbitrary C code, the file system, networking, hardware etc.

100x times this. What are you going to do to it to not make it Turing complete that isn't going to also artificially limit lots of useful stuff to do?

For instance, can you have two variables, increment and decrement them, and branch based on whether those variables are zero? Congrats; if your registers are infinite size, your language is Turing complete.

Furthermore, not only is it kind of hard to produce a non-Turing-complete, useful programming language, but IMO it's not even particularly useful to do so for purposes of this question. (Well, at least "Turing complete" in the sense of "not technically Turing complete because of memory limitations, but still an absolutely freaking enormous state space.") You'd still have to figure out a way to determine the execution time for programs, which could still be non-trivial. (Or even impossible!)

The right way to do it is to just ... do it. And if it actually uses too much time or memory, then kill it. I'd recommend starting the program in a different process rather than different thread to avoid certain problems, but either way that will be much much much much much easier to do than trying to design a language that will respect your computational limits.

I think the Javascript route sounds most promising. I wouldn't suggest Python; my impression is sandboxing that is really hard. Lua is an unknown -- I could see it working really well too, but I can't be sure of it. You could also look into seeing if any judging software for ICPC Programming Competition-style competitions are available, because they face this problem. They have software that limits system calls, CPU time, and total memory. But that is a much more heavyweight answer.

korona
Posts: 495
Joined: Sun Jul 04, 2010 8:40 pm UTC

Re: Safely executing unkown code

Postby korona » Fri Sep 20, 2013 8:25 pm UTC

Xanthir wrote:Yeah, a custom language is the easiest way to handle this. You can prevent TC without too much difficulty - only allow finite loops (that is, loops which require specifying how many iterations it does, rather than having arbitrary stopping conditions), and disallow recursion. As long as you don't allow tricky jumping, this is sufficient.


I don't understand which anyone would like to do that. Allowing only finite loops just makes the programmers life harder without giving any more "real" safety. They can still lock your code (just copy and paste a few hundred "times(100000) { do_something(); }" statements).

Simply track how much time the users code consumes and interrupt it if it takes too long. That's the most practical method: It allows arbitrary turing complete code and is still safer than a finite-loops-only language.

Turing completeness is the bare minimum a programming language has to support in order to be useful. Even special purpose languages like XSLT and SQL are turing complete. You cannot implement any non-trivial operation without turing completeness.

Ben-oni
Posts: 278
Joined: Mon Sep 26, 2011 4:56 am UTC

Re: Safely executing unkown code

Postby Ben-oni » Fri Sep 20, 2013 8:46 pm UTC

korona wrote:A language with integer variables, an addition operator and a while loop is already turing complete (apart from having limited memory which applies to all programming languages in the real world).

I don't know where you're getting your information, but this is not the case. The language must also have assignment and dynamic memory allocation.

korona wrote:You cannot implement any non-trivial operation without turing completeness.

Still not true. If the programmer is required to provide proof that an operation is deterministic and finite, it can be run in a non-turing complete language.

Making a language that is both useful and non-turing complete is a tricky proposition, though, but a few non-toy examples exist, notably Coq which is pretty domain specific. I have a suspicion that turing completeness is overrated, and most applications programmers could build their projects with something non-turing complete that provides more compile-time guarantees. When we're not writing compilers, interpreters, etc, we usually want our programs to terminate in a deterministic manner. I suppose I'm imagining a Prolog-like "glue" language for describing the overall logic, with some kind of magic under the covers to bind non-deterministic IO, ala monads. I'm rather fuzzy on the details, but I see no reason it can't work.

User avatar
headprogrammingczar
Posts: 3072
Joined: Mon Oct 22, 2007 5:28 pm UTC
Location: Beaming you up

Re: Safely executing unkown code

Postby headprogrammingczar » Fri Sep 20, 2013 8:51 pm UTC

Perhaps you could use the library version of mueval or something similar. You'd be making people learn Haskell, but it's not really any weirder than say, Source engine console scripting.

It's survived many years of public IRC use, so I can't imagine it being unsafe.
<quintopia> You're not crazy. you're the goddamn headprogrammingspock!
<Weeks> You're the goddamn headprogrammingspock!
<Cheese> I love you

korona
Posts: 495
Joined: Sun Jul 04, 2010 8:40 pm UTC

Re: Safely executing unkown code

Postby korona » Fri Sep 20, 2013 9:06 pm UTC

Ben-oni wrote:
korona wrote:A language with integer variables, an addition operator and a while loop is already turing complete (apart from having limited memory which applies to all programming languages in the real world).

I don't know where you're getting your information, but this is not the case. The language must also have assignment and dynamic memory allocation.

Yeah it must have assignment, but when I said you need integer variables I expected that they do not only exist but can also be assigned a value :|. Honestly I don't know what anyone would do with variables if there is no assignment. That is what being a "variable" actually means.
And no, you don't need dynamic memory allocation. Using variables only you can build arbitrary large memory in the same sense as C can access arbitrary much memory (i.e. it's not infinite but still very very big).

Before questioning my source of information you should do your own homework.

Ben-oni wrote:
korona wrote:You cannot implement any non-trivial operation without turing completeness.

Still not true. If the programmer is required to provide proof that an operation is deterministic and finite, it can be run in a non-turing complete language.


Aside from Coq and ATS I don't know any languages that allow you to supply certificates for termination. Coq is not really a programming language but a proof assistant that can emit code. The benefit of using such a language is small: Using SMT techniques even unannotated C code can be proven to halt in almost all cases. Additionally you just replaced turing-completeness by requiring a theorem prover (possibly for full first order logic). Proving first order theorems is much harder than any problem in NP (or PSPACE) as there are exponentially large resolution refutations for some formulas e.g. for the pigeon hole problem.

Using a certifying language for a game is both overkill and stupid because it decreases the programmers productivity without any real benefits.

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: Safely executing unkown code

Postby EvanED » Fri Sep 20, 2013 9:44 pm UTC

Ben-oni wrote:
korona wrote:A language with integer variables, an addition operator and a while loop is already turing complete (apart from having limited memory which applies to all programming languages in the real world).

I don't know where you're getting your information, but this is not the case. The language must also have assignment and dynamic memory allocation.
Depends on what you mean. Assignment is not at all a requirement, and dynamic memory allocation is only required if your non-dynamic memory has a fixed size. If you had a language with true integers instead of a fixed size, that also gives you infinite memory and dynamic allocation isn't required. (You could also argue that it in such a case it would still be present, just hidden by the language, and that would be a reasonable position. But it definitely needn't be exposed to the programmer. You could also just take the position that said variables would be, I dunno, a couple hundred megs each -- technically fixed size so no dynamic allocation even internally, but still Turing complete for practical purposes.)

Like I said before, the following is sufficient to get Turing completeness:
1. You can have two variables...
2. ...which hold integers...
3. ...which you can increment and decrement...
4. ...and branch based on whether those variables are zero.


korona wrote:The benefit of using such a language is small: Using SMT techniques even unannotated C code can be proven to halt in almost all cases.
That's... not true. Termination is a much harder problem than just correctness checking, and automatic correctness checking for arbitrary C code is still only mediocre in general.

korona
Posts: 495
Joined: Sun Jul 04, 2010 8:40 pm UTC

Re: Safely executing unkown code

Postby korona » Fri Sep 20, 2013 9:54 pm UTC

EvanED wrote:
korona wrote:The benefit of using such a language is small: Using SMT techniques even unannotated C code can be proven to halt in almost all cases.
That's... not true. Termination is a much harder problem than just correctness checking, and automatic correctness checking for arbitrary C code is still only mediocre in general.

It's powerful enough to check the termination of all of Window's device driver bindings: http://research.microsoft.com/en-us/um/ ... nreach.pdf

Derek
Posts: 2181
Joined: Wed Aug 18, 2010 4:15 am UTC

Re: Safely executing unkown code

Postby Derek » Sat Sep 21, 2013 1:34 am UTC

headprogrammingczar wrote:but it's not really any weirder than say, Source engine console scripting.

Speaking of which, this is a good example of a language that is not Turing complete and yet let's you do almost anything you would want to do (given the context of console scripting). I don't have a formal proof for it, but the language has the computational power of DFAs.

korona wrote:Even special purpose languages like XSLT and SQL are turing complete. You cannot implement any non-trivial operation without turing completeness.

I was under the impression that SQL was not Turing-complete, but it turns out that the recursive Common Table Expressions feature added in SQL:1999 makes the language Turing-complete. This is a feature I had never heard of. Still, a lot of SQL is useful without being Turing-complete.

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: Safely executing unkown code

Postby EvanED » Sat Sep 21, 2013 5:11 am UTC

korona wrote:It's powerful enough to check the termination of all of Window's device driver bindings: http://research.microsoft.com/en-us/um/ ... nreach.pdf
...and device drivers are a great ground for static analysis because they're abnormally easy. They are very small in comparison to applications, they have a lot of properties that can be easily specified, they tend to do little dynamic memory allocation, and they don't really take variable-sized inputs like normal programs. In addition, drivers are also abnormally important (as they'll crash all your programs instead of just the buggy program). I'm not knocking Terminator or other work that focuses on them in the slightest.

But don't fool yourself: the techniques that Terminator use don't scale to typical programs, and a lot more work needs to happen before that becomes true (if it ever does). In addition, I'm pretty sure that Terminator wouldn't work in an adversarial setting because it (I think) assumes memory safety on the part of the analysis target. (It's been a while since I read the paper. I've emailed a friend who knows Terminator in more detail; I'll post again/edit when he gets back.) Adding a requirement to prove memory safety along the way (as you'd have to do if you want to run code from actually untrusted sources) makes the problem a lot harder.

korona
Posts: 495
Joined: Sun Jul 04, 2010 8:40 pm UTC

Re: Safely executing unkown code

Postby korona » Sat Sep 21, 2013 10:33 am UTC

I don't think device drivers are easier to analyse than application programs. It is true that they usually receive fixed size data but they do dynamic memory allocation (think of a hard disk driver that has to cache writes on a block level) and have to access variable-length data structures on the heap (e.g. a list of threads that are currently waiting to access a certain resource). Plus they have to deal with kernel-wide locks to guarantee that only one thread accesses certain pieces of hardware concurrently. So I think it's fair to use them as a basis for testing termination checkers.

It is true that Terminator does not scale to large inputs. It depends on what you mean by large: the paper reports that a 35k sloc driver took about 2 days to check. That's a lot of time but when testing critical code like device drivers it can be worth this cost. Many applications are larger than 35k sloc and checking them is not feasible. But I think that only 10 false positives (i.e. functions that halt but cannot be proven to halt) in 35k sloc of code is an amazing result.

Now it is possible that Terminator will never scale to a 100k sloc program. If you really want to prove the termination of such a large program you will have to use different techniques. You could split the program into many mostly independent modules if that is possible and prove termination for each module. If that does not help you would need to build tools into your programming language that allow you to reason about the termination manually.

What do you mean by memory safety? Of course Terminator assumes that no memory is manipulated concurrently. I don't know if it can handle access-after-free bugs. From what I read it seems that there are some issues with functions that change the heap size and they had to manually tell Terminator how the size of vectors changes when adding elements to them.

The discussion whether turing completeness is essential for programming languages is orthogonal to the discussion about the best way to run untrusted code in a game engine. I think that it could be moved to a different thread if anyone wants to continue the discussion.

I agree that turing completeness is not needed for domain specific languages like most parts of SQL. But there are several problems with allowing only programs that can be proven to halt e.g. by annotating the code with the proof like in ATS. First of all it is much more complicated for the programmer. In the real (non safety-critical application) world it is usally better to accept code that cannot be proven not to contain a halting bug but is much easier to maintain and write than to spends hundreds of programmer hours on formally verifying programs. And secondly not all programs can be proven to halt i.e. programs that depend on user input can never be proven to halt. If the program contains a GUI typical "while(peek_event() != QUIT) { dispatch_event(); }" loop the program can never be proven to halt.

Formally proving that programs halt is important in some areas (device driver, operating system kernels, safety critical embedded systems) but it is not applicable to all programs (large parts of end-user applications).

User avatar
Eomund
Posts: 115
Joined: Fri Jun 22, 2012 11:48 pm UTC
Location: The Great White North (Left side)

Re: Safely executing unkown code

Postby Eomund » Sun Sep 22, 2013 8:58 pm UTC

I think I'm going to have to give a try to C++ with some embedded Lua. Lua seems really easy to sandbox:
http://lua-users.org/wiki/SandBoxes

Carnildo
Posts: 2023
Joined: Fri Jul 18, 2008 8:43 am UTC

Re: Safely executing unkown code

Postby Carnildo » Mon Sep 23, 2013 7:50 pm UTC

korona wrote:I heavily recommend against building your own language. Designing a programming language is a much harder task than writing a game itself. I designed several programming languages in the past; writing a decent interpreter alone usually takes in the order of 10k sloc (remember that you have to include a simple runtime library).


This really depends on the language. My bytecode compiler for RoboTalk is 743 lines including comments; the interpreter is 2078 lines. It helps that the language is essentially an assembly language for a stack-based virtual machine.

Eomund wrote:I think I'm going to have to give a try to C++ with some embedded Lua. Lua seems really easy to sandbox:
http://lua-users.org/wiki/SandBoxes

Yes and no. Restricting the available commands to only those that won't step out of your sandbox (or let the user break out) is easy; restricting memory use and CPU time is a good deal harder.

korona
Posts: 495
Joined: Sun Jul 04, 2010 8:40 pm UTC

Re: Safely executing unkown code

Postby korona » Mon Sep 23, 2013 8:40 pm UTC

Carnildo wrote:
korona wrote:I heavily recommend against building your own language. Designing a programming language is a much harder task than writing a game itself. I designed several programming languages in the past; writing a decent interpreter alone usually takes in the order of 10k sloc (remember that you have to include a simple runtime library).


This really depends on the language. My bytecode compiler for RoboTalk is 743 lines including comments; the interpreter is 2078 lines. It helps that the language is essentially an assembly language for a stack-based virtual machine.

Sure, it depends on the language. If you develop an assembly like language you don't need an elaborate parser or type system. I wrote a compiler for a small managed curly-brace toy language in about 4k sloc of JavaScript (the equivalent C++ code would have been much larger!). It had integers, strings, structs, first class functions and automatic type inference and was mature enough to write a bytecode-to-assembly compiler in it which I used to compile the language to native code. The parser took around 700 lines, the remaining lines are code generation. An interpreter for the generated bytecode is very small and < 500 sloc.

Another project I worked on included a vm for a more complete language. It had everything Java has plus a few things Java doesn't have, for example unsigned types and real templates. The bytecode compiler, virtual machine (including a x86-JIT compiler) and runtime library together are about 30k sloc.

Carnildo wrote:
Eomund wrote:I think I'm going to have to give a try to C++ with some embedded Lua. Lua seems really easy to sandbox:
http://lua-users.org/wiki/SandBoxes

Yes and no. Restricting the available commands to only those that won't step out of your sandbox (or let the user break out) is easy; restricting memory use and CPU time is a good deal harder.

I don't know about Windows, but for unix-like systems limiting CPU and memory is trivial: Run the script in a different process and use setrlimit() to set CPU and memory limits.

Ubik
Posts: 1016
Joined: Thu Oct 18, 2007 3:43 pm UTC

Re: Safely executing unkown code

Postby Ubik » Mon Sep 23, 2013 8:56 pm UTC

Limiting memory usage of Lua might be easy, on Stack Overflow (http://stackoverflow.com/questions/9671793/limiting-a-lua-scripts-memory-usage) there is an answer that tells the interpreter can be given your own allocation function. I think also I came across some other discussion that talked about hooking into some part of the Lua VM that gets called every nth instruction to control execution time. But if allocation is already controlled, maybe just running the VM in another thread is good enough. I mean, maybe easier to handle than managing another process?

User avatar
Eomund
Posts: 115
Joined: Fri Jun 22, 2012 11:48 pm UTC
Location: The Great White North (Left side)

Re: Safely executing unkown code

Postby Eomund » Tue Sep 24, 2013 1:14 am UTC

Ubik wrote:Limiting memory usage of Lua might be easy, on Stack Overflow (http://stackoverflow.com/questions/9671793/limiting-a-lua-scripts-memory-usage) there is an answer that tells the interpreter can be given your own allocation function. I think also I came across some other discussion that talked about hooking into some part of the Lua VM that gets called every nth instruction to control execution time. But if allocation is already controlled, maybe just running the VM in another thread is good enough. I mean, maybe easier to handle than managing another process?


Yeah, I had seen that thread on Stack Overflow for memory. For time I was thinking of just running the unknown code in a child thread and then killing it, if it was still running when its time was up.

User avatar
Jplus
Posts: 1721
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: Netherlands

Re: Safely executing unkown code

Postby Jplus » Tue Sep 24, 2013 10:58 am UTC

A tangential consideration: LuaJIT is a drop-in replacement for the main Lua implementation that executes considerably faster. Depending on how time-consuming the task is that the players need to complete, this might be an advantage. On the other hand, it does restrict you to version 5.1 of the language.
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)


Return to “Coding”

Who is online

Users browsing this forum: No registered users and 3 guests