Sanitizing Code Submissions

A place to discuss the implementation and style of computer programs.

Moderators: phlip, Moderators General, Prelates

MostlyHarmless
Posts: 154
Joined: Sun Oct 22, 2006 4:29 am UTC

Sanitizing Code Submissions

Postby MostlyHarmless » Sun Aug 30, 2015 7:23 am UTC

I'm trying to set up a homework submission system for my class (introduction to numerical methods). The students will submit a python file, then I'll run it and check the output. I realized that someone could easily put some malicious code in one of their homework files and cause serious problems. How do I sanitize (or, more likely, quarantine) their submissions to prevent this? If it matters, I'm writing most of this in python, but I can pick something else up if needed.
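For concreteness, my grading step currently looks roughly like this (`run_submission` is just a placeholder name for my own helper; it's the sandboxing around it that I'm asking about):

```python
import subprocess
import sys

def run_submission(path, timeout=5):
    """Run one student script in a child process with a wall-clock timeout.

    A subprocess alone is NOT a security boundary; it still runs with my
    user's permissions, which is exactly the problem.
    """
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return result.returncode, result.stdout
    except subprocess.TimeoutExpired:
        return None, ""
```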

Xanthir
My HERO!!!
Posts: 5400
Joined: Tue Feb 20, 2007 12:49 am UTC
Location: The Googleplex

Re: Sanitizing Code Submissions

Postby Xanthir » Sun Aug 30, 2015 7:38 am UTC

Most hammer-y solution: run the code in a virtual machine, torn down and restarted between runs.

There might be more subtle things you can do to make it safe, but that's guaranteed to work.
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))

operator[]
Posts: 156
Joined: Mon May 18, 2009 6:11 pm UTC
Location: Stockholm, Sweden

Re: Sanitizing Code Submissions

Postby operator[] » Sun Aug 30, 2015 9:24 am UTC

Within the context of competitive programming, I've heard good things about isolate. Something along those lines should work for Python too.

This SO question describes other approaches to sandboxing.
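From what I've seen, an isolate run looks roughly like this (flag names from memory, so double-check against the man page of whatever version you install):

```python
def isolate_cmd(script, box_id=0, time_limit=2, mem_kb=262144):
    """Build an `isolate --run` command line for one Python submission.

    Flag names follow isolate's documentation; verify them against your
    installed version before relying on this.
    """
    return [
        "isolate",
        f"--box-id={box_id}",      # which sandbox directory to use
        f"--time={time_limit}",    # CPU-time limit in seconds
        f"--mem={mem_kb}",         # memory limit in kilobytes
        "--processes=1",           # no forking allowed
        "--run", "--",
        "/usr/bin/python3", script,
    ]

# Lifecycle: `isolate --init` creates the box directory, you copy the
# submission in, execute the command above, then `isolate --cleanup`.
```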

DaveInsurgent
Posts: 207
Joined: Thu May 19, 2011 4:28 pm UTC
Location: Waterloo, Ontario

Re: Sanitizing Code Submissions

Postby DaveInsurgent » Wed Sep 02, 2015 4:17 pm UTC

RUN IT IN DOCKER!!!!!1oneonelevenhypemachine

MostlyHarmless
Posts: 154
Joined: Sun Oct 22, 2006 4:29 am UTC

Re: Sanitizing Code Submissions

Postby MostlyHarmless » Thu Sep 03, 2015 4:49 pm UTC

Thanks for the suggestions! I've checked out isolate, plus some of the other suggestions from that Stack Overflow thread, and there are some good ideas there. I had considered the virtual machine approach before, but other threads have confirmed my fear that it would be too slow for this sort of thing (hundreds of submissions, each taking only a few seconds). Unfortunately, I have to put the project on hold for a while. (We inherited a system that mostly works, so this isn't really high priority, but the current code is such a morass of uncommented nonsense that the department has given up on fixing bugs.) Assuming I remember, I'll report back here once I get back to this thing.

elasto
Posts: 3751
Joined: Mon May 10, 2010 1:53 am UTC

Re: Sanitizing Code Submissions

Postby elasto » Fri Sep 04, 2015 1:42 pm UTC

MostlyHarmless wrote:I had considered the virtual machine approach before, but other threads have validated my fears that it would be too slow for this sort of thing (hundreds of submissions, each only taking a few seconds).

Realistically, it's quite unlikely that someone would actually code something dangerous. So in practice you wouldn't need to tear down and recreate the VM for every submission. Likely 9999 times out of 10000 all submissions would run in sequence without a hitch.

It's just for the off chance that some weirdo would do something dangerous that you'd want the protection.

DaveInsurgent
Posts: 207
Joined: Thu May 19, 2011 4:28 pm UTC
Location: Waterloo, Ontario

Re: Sanitizing Code Submissions

Postby DaveInsurgent » Thu Sep 10, 2015 1:49 am UTC

Oh, in all seriousness - consider Docker. You can have a set number of containers that represent each submission environment (C++, Java, whatever) and you can even create problem sets FROM those containers, and then execute their submission that way. These will start up in milliseconds.

You can even make use of GitHub and Docker Hub's automated builds so you don't need much infrastructure. Just create N * M GitHub repositories, where N is the number of languages you want to support and M is the number of problems. Your Dockerfile will look something like:

# java-problem1
# (the java base image is itself FROM ubuntu:14.04)
FROM java
ADD input.txt /input.txt
ADD expected.txt /expected.txt
ADD check.sh /check.sh
VOLUME /app
ENTRYPOINT ["/bin/bash", "/check.sh"]

check.sh is the actual 'guts' of what you need to do, e.g.:

- build the submission (javac)
- run the program (java -jar foo.jar) with some input: pipe it to stdin or give it a path, depending on the problem
- compare the program's actual output against the expected output
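The compare step at the end boils down to something like this (a Python sketch with illustrative file names matching the ones ADDed in the Dockerfile above; for a Python submission you'd skip the javac step entirely):

```python
import subprocess
from pathlib import Path

def check(run_cmd, input_path="input.txt", expected_path="expected.txt",
          timeout=10):
    """Feed the problem input to the submission on stdin and compare its
    stdout against the expected output."""
    with open(input_path) as stdin:
        result = subprocess.run(
            run_cmd, stdin=stdin, capture_output=True, text=True,
            timeout=timeout,
        )
    expected = Path(expected_path).read_text()
    return result.stdout.strip() == expected.strip()
```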


Then, on however many machines you actually need for execution, the submission gets copied to some tmp directory, and you run:

docker run -ti -v "$(pwd)/tmp:/app" {problem#}

and poof, you get repeatable, consistent execution environments (and the associated problem sets), all managed via Docker's layer magic, that start up and shut down in fractions of a second. (The actual compilation etc. is a different story; you'd need to do things like kill long-running processes.)
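Sketched as a harness (image names are placeholders; Docker wants an absolute host path for -v, which is why the abspath call is there):

```python
import os
import shutil
import subprocess
import tempfile

def docker_cmd(workdir, image):
    """Build the `docker run` command that mounts one submission dir at /app."""
    return ["docker", "run", "--rm",
            "-v", f"{os.path.abspath(workdir)}:/app", image]

def grade(submission, image, timeout=30):
    """Copy one submission into a fresh temp dir, run the problem image
    against it, and kill anything that runs too long."""
    workdir = tempfile.mkdtemp(prefix="grade-")
    try:
        shutil.copy(submission, workdir)
        result = subprocess.run(docker_cmd(workdir, image),
                                capture_output=True, text=True,
                                timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False  # long-running submission: treat as failed
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
```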

