Hard drive seek performance

Hard drive seek performance

Postby Shivahn » Sun Jun 03, 2012 1:08 am UTC

I have a hardware-related question prompted by some software I'm developing, but since I'm not a hardware person, I've had trouble finding anything beyond the vaguest information. First, a bit of background.

I'm developing a game engine and am currently deciding how best to implement the environment. I think a good approach would be a quadtree in which each node has pointers to children that are null if there's nothing interesting in that quadrant (e.g., if three quadrants are entirely dirt and the fourth is dirt and sky, the pointers to the first three are null and they're filled in with the node's base material, in this case dirt). This would let me have continuous materials (not blocks) while still not taking up a ton of space on the hard drive, unless something goes Horribly Wrong, such as the world being entirely filled with streaks of multiple materials.
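Something like this is roughly what I have in mind (a rough C sketch; the names are made up):

Code: Select all
/* Hypothetical node layout: each node covers a square region, and
   children[i] is NULL when that quadrant is uniformly the node's
   base material. */
typedef unsigned char Material;    /* one byte per material, e.g. dirt, sky */

typedef struct QuadNode {
    Material base;                 /* fill material for NULL quadrants */
    struct QuadNode *children[4];  /* 0=NW, 1=NE, 2=SW, 3=SE */
} QuadNode;

/* Material at (x, y), where the node covers the size-by-size square with
   top-left corner (ox, oy). Cost is one pointer hop per tree level. */
Material material_at(const QuadNode *n, int ox, int oy, int size, int x, int y)
{
    while (size > 1) {
        int half = size / 2;
        int qx = (x >= ox + half);   /* 0 = west half, 1 = east half   */
        int qy = (y >= oy + half);   /* 0 = north half, 1 = south half */
        const QuadNode *child = n->children[qy * 2 + qx];
        if (child == NULL)
            return n->base;          /* quadrant is uniform: done */
        ox += qx * half;
        oy += qy * half;
        size = half;
        n = child;
    }
    return n->base;                  /* single pixel: its own base */
}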

So, designing this, I realized I don't know enough about hard disk access to know if this is a feasible idea. In an ideal world this would be no problem - lookup is just following pointers that logarithmically cut the area down, so the total depth of the tree wouldn't be a big deal. But that ignores disk access. I don't know how far apart pieces of data have to be before it starts to matter, and I don't really know enough about filesystems to know whether a few-hundred-megabyte file is likely to be in pieces spread far across the disk or close together. If the maximum depth of the tree is, say, ten nodes, is there going to be appreciable slowdown from jumping from pointer to pointer on disk ten times? Or is it going to be pretty negligible? Would it be worth storing the pointers themselves in a quadtree in memory at start time?

I can find broad information (seek time isn't linear in distance, because the disk head has to accelerate and decelerate) and some other broad stuff, but nothing that lets me truly answer my question.

If anyone has a vastly superior idea for representing the world, feel free to let me know. But I'd really like to know about hard drive access as well - it's disconcerting to be in the middle of designing something and hit a wall that I don't know how to begin to research in detail.

Re: Hard drive seek performance

Postby Carnildo » Mon Jun 04, 2012 9:02 am UTC

Hard drives are blindingly fast for sequential reads, and extremely slow for random access. A good estimate is that a random seek will take 10 milliseconds, so if you're chasing pointers ten levels deep, that's a tenth of a second per lookup.

Re: Hard drive seek performance

Postby Shivahn » Mon Jun 04, 2012 4:21 pm UTC

Ok. I guess my question really boils down to: does it count as random access if you're jumping around within ten megabytes? Or a hundred? Is it sequential only if the information is literally 'touching', or is there a bit of a fuzzy region?

Re: Hard drive seek performance

Postby PhoenixEnigma » Tue Jun 05, 2012 1:20 am UTC

In general, you can probably assume that if the next byte you want isn't in the same sector (512 B or 4 KiB) as the one you just read, the drive is going to have to wait for the platter to come around again, and possibly move the read head to another track. That takes somewhere on the order of 10 milliseconds - a little less for fast drives, a little more for slow ones. You get slightly better performance if the data is all fairly local, since shorter head travel helps, but it's not a huge effect: if you have to wait for the drive to spin, it will only spin so fast, after all.
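To put rough numbers on that (back-of-the-envelope, assuming a typical 7200 RPM desktop drive): 7200 RPM is 120 revolutions per second, i.e. about 8.3 ms per revolution, so the average rotational latency alone is half a turn, ~4.2 ms. An average seek adds a few more milliseconds on top of that, which is where the ~10 ms figure comes from.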

If you can batch your reads into somewhat bigger sequential chunks, you'll be far better off - the difference in throughput between 4 kB random reads and 128 kB random reads on a spinning disk is in the neighborhood of a hundredfold. The first few entries on this graph should give you some idea, bearing in mind that the VelociRaptors are high-end 10,000 RPM drives, as opposed to the 7200 RPM of most desktop HDDs or the 5400-ish RPM of most laptop drives.
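As a toy illustration (not a benchmark - the file name and sizes here are made up), the idea is one big read instead of a pile of scattered little ones:

Code: Select all
#include <stdio.h>
#include <stdlib.h>

enum { CHUNK = 128 * 1024 };       /* one 128 kB read instead of 32 x 4 kB */

int main(void)
{
    FILE *f = fopen("world.dat", "rb");
    if (!f) { perror("fopen"); return 1; }

    unsigned char *buf = malloc(CHUNK);
    if (!buf) { fclose(f); return 1; }

    /* One seek plus one large sequential read: the ~10 ms positioning
       cost is paid once, then the data streams at full platter speed. */
    fseek(f, 0L, SEEK_SET);        /* wherever the chunk lives in the file */
    size_t got = fread(buf, 1, CHUNK, f);
    printf("read %zu bytes in one go\n", got);

    free(buf);
    fclose(f);
    return 0;
}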

FWIW, fragmentation is becoming a non-issue on modern systems, as I think all current OSes take care of it in the background by default. So you can assume that what looks like a sequential read to the filesystem will almost certainly still be sequential when it hits the disk.
"Optimism, pessimism, fuck that; we're going to make it happen. As God is my bloody witness, I'm hell-bent on making it work." -Elon Musk
Shivahn wrote:I am a motherfucking sorceror.

User avatar
Shivahn
Posts: 2200
Joined: Tue Jan 06, 2009 6:17 am UTC

Re: Hard drive seek performance

Postby Shivahn » Tue Jun 05, 2012 1:37 am UTC

Ok. I think I'm either going to have to do some really serious (and frequent) rearranging of the world file, or change the format. The latter seems like the better choice.

Thanks. I'll see if I can find an idea that leaves the data more sequential. Space on the disk is a secondary concern to load speed unless it becomes ridiculous in size, I think.

Though, of course, another option is to just space the loading out over a second or two by doing it in steps. Hmm.

Re: Hard drive seek performance

Postby Sc4Freak » Thu Jun 07, 2012 6:18 am UTC

Shivahn wrote:Ok. I guess my question really boils down to: does it count as random access if you're jumping around within ten megabytes? Or a hundred? Is it sequential only if the information is literally 'touching', or is there a bit of a fuzzy region?


"It depends". The OS is liable to readahead and cache data in memory, meaning the lookup could either be practically free, or it could be extremely expensive depending on the phase of the moon.

But practically speaking, unless you have a ridiculous amount of data, this seems unnecessary. For any reasonable game area, I'm betting you'll be able to fit the whole thing comfortably in memory. And if not, you can partition your world into sections and load an entire section at a time behind loading zones. It's only if you really do need access to an enormous amount of data all at once that you'd need to implement the system you describe - a space game with huge scale and detail, where you can fly from space down to any arbitrary part of a planet's surface, would be a good example. In that case, yes, you'll probably need some kind of hierarchical data structure that can be streamed from disk in real time. But in 95% of other cases it probably isn't necessary.

Re: Hard drive seek performance

Postby Shivahn » Thu Jun 07, 2012 2:35 pm UTC

Well, explaining what I want in detail might help. I want a game world, roughly the size of Terraria, procedurally generated, with pixel-level granularity (and, yes, I realize that these are rather extreme demands, but as it's a hobby project I'm doing myself, I can afford to be demanding).

Doing this the naive way leads to a world size of about a gigabyte, assuming each pixel is a byte. Obviously I'm not going to do it the naive way (hence the original question), which should make it much more reasonable in size. I'll have to settle the format before I can estimate well. Loading it all into memory is an option, but even at a hundred megabytes it seems... wasteful. Perhaps I'm optimizing prematurely, though.
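To sanity-check that (back-of-the-envelope, and going from memory on Terraria's numbers): a small Terraria world is something like 4200 x 1200 tiles, and tiles render at 16 x 16 pixels, so that's roughly 67,000 x 19,000 = ~1.3 billion pixels - call it 1.3 GB at a byte per pixel. So "about a gigabyte" for the naive encoding is in the right ballpark.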

In any case, I also realized I was being silly trying to fit it all into one file. It'd be reasonable to preemptively divide the world into sections, each of which gets its own file within the world folder. That lets me extend the files easily (each section can grow independently, without my having to do weird things with placement inside one larger file), and it means all reads will be sequential, since a section is always loaded into memory in one go.
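Something like this, roughly (the naming scheme is invented):

Code: Select all
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical layout: section (sx, sy) lives in its own file, so loading
   it is one open plus one sequential read of the whole file. */
unsigned char *load_section(const char *world_dir, int sx, int sy, long *len_out)
{
    char path[512];
    snprintf(path, sizeof path, "%s/sec_%d_%d.bin", world_dir, sx, sy);

    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    fseek(f, 0, SEEK_END);               /* find the section's size */
    long len = ftell(f);
    rewind(f);

    unsigned char *data = malloc(len);
    if (data && fread(data, 1, len, f) != (size_t)len) {
        free(data);                      /* short read: treat as failure */
        data = NULL;
    }
    fclose(f);
    if (data) *len_out = len;
    return data;
}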

Re: Hard drive seek performance

Postby Sagekilla » Fri Jun 08, 2012 2:34 am UTC

Are you familiar at all with B-trees? They're one of the data structures typically used in databases for storing data. The way they work makes them fairly efficient with respect to read/write access on a hard drive, and there are other B-tree-like structures that index information on a spatial basis.
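The disk-friendliness comes from sizing each node to match a disk page, so following one link costs one seek plus one sequential page read. Roughly (layout and sizes are just an illustration):

Code: Select all
#include <stdint.h>

#define PAGE_SIZE     4096
#define KEYS_PER_NODE 255    /* chosen so one node fills one 4 kB page */

typedef struct {
    uint16_t nkeys;                       /* keys currently in use         */
    uint8_t  is_leaf;
    uint64_t keys[KEYS_PER_NODE];         /* e.g. Morton codes of regions  */
    uint64_t child[KEYS_PER_NODE + 1];    /* file offsets of child pages   */
} BTreeNode;                              /* ~4 kB: one node, one page read */

_Static_assert(sizeof(BTreeNode) <= PAGE_SIZE, "node must fit in a page");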

Is there any way you could use similar data structures, perhaps some composite data structure, to store your data?

Towards the root you could store data in a quadtree format, then start specializing when you reach a sufficient depth.


(NB: these are just ideas, I have no idea if they're practical at all)

Re: Hard drive seek performance

Postby Carnildo » Fri Jun 08, 2012 4:23 am UTC

It sounds like a modified quadtree should work for you: once you start dividing space finer than a certain level, stop dividing and use a leaf node that stores a "map block" (essentially a bitmap of the area). You'll need to find the optimal size for a map block by trial and error, but it's probably going to be something in the range of 100KB to 10MB. You can store the entire quadtree in RAM for speed, while only loading map blocks on an as-needed basis.
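In rough C, a node might look something like this (names and sizes are just placeholders):

Code: Select all
#define BLOCK_DIM 1024   /* leaf bitmap covers 1024 x 1024 pixels (~1 MB) */

typedef struct MQNode {
    struct MQNode *child[4];   /* NULL for uniform quadrants, as before   */
    unsigned char  base;       /* fill material for uniform quadrants     */
    long           block_off;  /* where this leaf's bitmap lives on disk  */
    unsigned char *block;      /* NULL until loaded on demand             */
} MQNode;

/* The tree of MQNodes itself stays in RAM (it's tiny next to the bitmaps);
   only the ~1 MB map blocks get read from disk as needed. */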

Re: Hard drive seek performance

Postby Shivahn » Sat Jun 09, 2012 2:43 pm UTC

Sagekilla, I'm gonna do sort of the opposite of what you suggested: bins at the top level, so I can break the files into predictable chunks, and a quadtree below that. And yeah, Carnildo, that sounds good - I'll make the leaves bitmaps and fill the non-leaf nodes with their base material. I don't think I'll bother holding the whole thing in RAM, though, since I can make all the map block reads sequential just by storing related sections in the same bins to begin with.

Thanks everyone!

