Choosing datastructures

A place to discuss the implementation and style of computer programs.

Moderators: phlip, Moderators General, Prelates

HungryHobo
Posts: 1708
Joined: Wed Oct 20, 2010 9:01 am UTC

Choosing datastructures

Postby HungryHobo » Sun Oct 24, 2010 12:04 pm UTC

I've been thinking lately about the huge difference that using the right datastructures in your code can make.
Now I've learned about the basic data structures but I've well aware that there's quite a few traps situations where they're far less than optimal.

I was thinking of trying to make an app where I could tick boxes or choose options to specify my criteria.

exampe:
"Will it be updated more often than read or read more often than updated"

or perhaps something to be able to specify the expected ratio of reads:inserts:updates:deletions since I know some datatypes are cheaper to update at a cost of read speed or cheaper to read at the cost of slower updates or inserts.

Some datastructures are better in highly parallel environments than others.
Some are better when the amount of memory available is significantly larger than the amount of data to be stored.

etc.

And once the parameters are set it lists the datastructures which are the best fit for your requirements.

I'm familiar with most of the common ones,
BST's, lists, arrays, heaps, queues, hashtables etc etc

but there's no shortage of ones which I've never heard of - even the list on wikipedia http://en.wikipedia.org/wiki/List_of_data_structures has plenty I've never even heard of.

So is my idea madness or feasible?

If feasible any suggestions for criteria/options?
Give a man a fish, he owes you one fish. Teach a man to fish, you give up your monopoly on fisheries.

Axidos
Posts: 167
Joined: Tue Jan 20, 2009 12:02 pm UTC
Location: trapped in a profile factory please send help

Re: Choosing datastructures

Postby Axidos » Sun Oct 24, 2010 3:16 pm UTC

Your idea sounds useful if it can be comprehensive enough, and as long as nobody comes to believe it's the be-all end-all (I am sure there are useful structures it will not know, or structures it will know but fail to recommend in some situations it should).

I only have a comment on the design of the test: Start off with a handful of generic questions to narrow down the scope some, then begin dynamically serving new questions to narrow the options down within the likely scope.
E.G. If you ask whether they're working with numbers, you can immediately grade string-exclusive data structures as irrelevant, and serve up some new questions which try to narrow down the sorts of numeric data structures they should use. My concern is that if you try to serve them every possible question, 90% of them will probably turn out to be unnecessary in that they didn't really contribute to finding out the answer.

Since that could make the test seem endless after a little while, I'd suggest some kind of progress meter saying basically "This is how close I am to being reasonably sure of your options", if you take that approach.

letterX
Posts: 535
Joined: Fri Feb 22, 2008 4:00 am UTC
Location: Ithaca, NY

Re: Choosing datastructures

Postby letterX » Sun Oct 24, 2010 8:10 pm UTC

Axidos wrote:Your idea sounds useful if it can be comprehensive enough, and as long as nobody comes to believe it's the be-all end-all (I am sure there are useful structures it will not know, or structures it will know but fail to recommend in some situations it should).

I only have a comment on the design of the test: Start off with a handful of generic questions to narrow down the scope some, then begin dynamically serving new questions to narrow the options down within the likely scope.
E.G. If you ask whether they're working with numbers, you can immediately grade string-exclusive data structures as irrelevant, and serve up some new questions which try to narrow down the sorts of numeric data structures they should use. My concern is that if you try to serve them every possible question, 90% of them will probably turn out to be unnecessary in that they didn't really contribute to finding out the answer.

Since that could make the test seem endless after a little while, I'd suggest some kind of progress meter saying basically "This is how close I am to being reasonably sure of your options", if you take that approach.


Speaking of: there's a datastructure for this kind of process!

Carnildo
Posts: 2023
Joined: Fri Jul 18, 2008 8:43 am UTC

Re: Choosing datastructures

Postby Carnildo » Tue Oct 26, 2010 9:43 am UTC

Axidos wrote:I only have a comment on the design of the test: Start off with a handful of generic questions to narrow down the scope some, then begin dynamically serving new questions to narrow the options down within the likely scope.
E.G. If you ask whether they're working with numbers, you can immediately grade string-exclusive data structures as irrelevant, and serve up some new questions which try to narrow down the sorts of numeric data structures they should use. My concern is that if you try to serve them every possible question, 90% of them will probably turn out to be unnecessary in that they didn't really contribute to finding out the answer.

There are two ways of classifying data structures. The first is on what operations are possible (eg. determining relative ordering of two elements isn't possible in a set); the second is on what operations are fast (eg. a perfect hash table has O(1) lookup time). I think I'd start by asking what operations need to be possible.

Keep in mind that some data structures have unusual limitations on "possible". For example, with a Bloom filter, "is this element in the data structure" is not a possible operation, while "is this element not in the data structure" is possible.

User avatar
MHD
Posts: 630
Joined: Fri Mar 20, 2009 8:21 pm UTC
Location: Denmark

Re: Choosing datastructures

Postby MHD » Tue Nov 02, 2010 7:57 pm UTC

This is actually an interesting concept... Might make a simple non interactive decision tree.

But things like a 2,3 tree with list ends as a sequence is pretty much all purpose... So what about them?
EvanED wrote:be aware that when most people say "regular expression" they really mean "something that is almost, but not quite, entirely unlike a regular expression"

Carnildo
Posts: 2023
Joined: Fri Jul 18, 2008 8:43 am UTC

Re: Choosing datastructures

Postby Carnildo » Wed Nov 03, 2010 3:17 am UTC

MHD wrote:This is actually an interesting concept... Might make a simple non interactive decision tree.

But things like a 2,3 tree with list ends as a sequence is pretty much all purpose... So what about them?

What operations are fast? What operations are slow? There are plenty of data structures that can perform all common operations, but they don't do them all equally well. You wouldn't use a linked list if you needed fast random access, and you wouldn't use an array if you needed fast random inserts. This is about picking the best data structure for a given task.

User avatar
Yakk
Poster with most posts but no title.
Posts: 11129
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Choosing datastructures

Postby Yakk » Wed Nov 03, 2010 2:56 pm UTC

And why use O-notation?

Give an estimate of the cost of copying the data being stored and the keys, and comparing. These estimates should be ranges.

Give a range of possible container sizes.

Generate explicit running times from the above (which can be distributions, because some data structures are effectively random in their performance). Expose best, worst, average and 95% confidence running time ranges for the entire domain above, possibly on a range of hardware models.

You'd also want a 20-questions like system, where it drills down on some questions to find out if stranger data structures are usable.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

User avatar
Jplus
Posts: 1721
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: Netherlands

Re: Choosing datastructures

Postby Jplus » Wed Nov 03, 2010 4:16 pm UTC

Yakk wrote:(...) and 95% confidence running time ranges for the entire domain above, possibly on a range of hardware models.

Maybe this is a bit overdone.
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)

User avatar
Yakk
Poster with most posts but no title.
Posts: 11129
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Choosing datastructures

Postby Yakk » Wed Nov 03, 2010 4:58 pm UTC

:-)
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

User avatar
MHD
Posts: 630
Joined: Fri Mar 20, 2009 8:21 pm UTC
Location: Denmark

Re: Choosing datastructures

Postby MHD » Thu Nov 04, 2010 9:26 pm UTC

Maybe have parameters of "How easy to code?" for imperative and functional languages. Also, add pointer-tangle-ness and maybe something about using it in purely functional where no data is ever changed and all apparent changes are copies.
EvanED wrote:be aware that when most people say "regular expression" they really mean "something that is almost, but not quite, entirely unlike a regular expression"

User avatar
Yakk
Poster with most posts but no title.
Posts: 11129
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Choosing datastructures

Postby Yakk » Thu Nov 04, 2010 10:16 pm UTC

With a decent language & library support, you shouldn't have to code reasonably famous versions of these creatures.

The pure functional bit is interesting: for a data structure to be purely functional friendly, you need internal references to be monodirectional (some kind of table might get around this?) and maybe high locality (so when you make a change, you don't end up with tonnes of changes all over the place).

Or maybe ... a data structure that can be cheaply and efficiently delta'd is a good pure functional data structure.

I'm not sure what you mean by pointer-tangle-ness -- you mean the ease to which someone manually trying to walk the pointers can get lost?
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

User avatar
headprogrammingczar
Posts: 3072
Joined: Mon Oct 22, 2007 5:28 pm UTC
Location: Beaming you up

Re: Choosing datastructures

Postby headprogrammingczar » Fri Nov 05, 2010 10:29 pm UTC

Multi-directional references can be simulated with a zipper, with varying degrees of ugliness. A doubly linked list can be implemented as:

Code: Select all

data DLL a = DLL [a] a [a]

forward (DLL fs x (l:ls)) = DLL (x:fs) l ls

backward (DLL (f:fs) x ls) = DLL fs f (x:ls)

fromList (a:as) = DLL [] a as
<quintopia> You're not crazy. you're the goddamn headprogrammingspock!
<Weeks> You're the goddamn headprogrammingspock!
<Cheese> I love you


Return to “Coding”

Who is online

Users browsing this forum: No registered users and 3 guests