Coding: Fleeting Thoughts

A place to discuss the implementation and style of computer programs.

Moderators: phlip, Prelates, Moderators General

Re: Coding: Fleeting Thoughts

Postby Ubik » Fri Mar 02, 2012 11:36 pm UTC

Annoying: I want a resizable array of structs in C#, and List&lt;T&gt; actually is that. The problem is that it doesn't let me get a direct reference to the internal array or to the array's first element, which is what I would need, because I need to pass that contiguous block of memory to OpenGL. It does have a method that creates a copy of the data, but I'd like to minimize copying stuff around if possible. I guess I'll just use List for now, but it still feels a bit ugly.
User avatar
Ubik
 
Posts: 892
Joined: Thu Oct 18, 2007 3:43 pm UTC

Postby Thesh » Sat Mar 03, 2012 1:45 am UTC

I doubt List<T> actually stores everything in a contiguous block of memory. It probably allocates a new block whenever it fills up. It's more reliable and faster to add data that way. Otherwise you have to copy all the data every time you have to allocate more space, and if you don't have a large enough contiguous block of memory, it will outright fail.
Big are getting bigger and hearts are getting harder, an imaginary game
eating at every living thing, a voice dripping with sarcasm like a bloody fat slash
grinning over bleached white-fang teeth that glow like green warning signs of sickness.
User avatar
Thesh
Has the Brain Worms, In Case You Forgot.
 
Posts: 3438
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: California, Southern USA

Postby EvanED » Sat Mar 03, 2012 1:57 am UTC

Maybe I'm biased by the design of the STL's vector (which is required to have contiguous storage), but I'd absolutely expect it to... it's slower to add data, but a little faster to access.
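That contiguity requirement is also what makes the usual C++ idiom of handing a vector straight to a C API safe. A minimal sketch (the helper names here are made up, just for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a C API that wants a raw contiguous buffer (think glBufferData).
// Hypothetical name, just for this sketch.
static const float *raw_buffer(const std::vector<float> &v) {
    return v.empty() ? 0 : &v[0];   // v.data() in C++11
}

// Verify that every element sits at the expected offset from the first one.
static bool is_contiguous(const std::vector<float> &v) {
    for (std::size_t i = 0; i < v.size(); ++i)
        if (&v[i] != raw_buffer(v) + i)
            return false;
    return true;
}
```

The standard requires this layout for vector, which is why &v[0] can be passed to OpenGL; C#'s List&lt;T&gt; simply doesn't expose the analogous pointer.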
EvanED
 
Posts: 4038
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI

Postby Thesh » Sat Mar 03, 2012 2:07 am UTC

Well, apparently Mono's implementation does store it as a contiguous block.

https://github.com/mono/mono/blob/maste ... ic/List.cs

When most people use lists in .NET, they read the data sequentially via an enumerator or through iteration. I wouldn't expect performance to be that bad in that case (provided you optimized for sequential access). Granted, the necessary checks would still be more costly than accessing an array directly.

Postby Aaeriele » Sat Mar 03, 2012 2:11 am UTC

Thesh wrote:I doubt List<T> actually stores everything in a contiguous block of memory. It probably allocates a new block whenever it fills up. It's more reliable and faster to add data that way. Otherwise you have to copy all the data every time you have to allocate more space, and if you don't have a large enough contiguous block of memory, it will outright fail.


http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx

The List<T> class is the generic equivalent of the ArrayList class. It implements the IList<T> generic interface using an array whose size is dynamically increased as required.


Array-based lists have O(log n) amortized time for addition (assuming a basic "allocate twice as much space when you hit a boundary" scheme), and O(1) amortized time for lookup. Linked lists have O(1) amortized time for addition, and O(n) amortized time for lookup.

There's a reason why most people tend to use array-based lists by default.
Vaniver wrote:Harvard is a hedge fund that runs the most prestigious dating agency in the world, and incidentally employs famous scientists to do research.

afuzzyduck wrote:ITS MEANT TO BE FLUTTERSHY BUT I JUST SEE AAERIELE! CURSE YOU FORA!
User avatar
Aaeriele
 
Posts: 2096
Joined: Tue Feb 23, 2010 3:30 am UTC
Location: San Francisco, CA

Postby EvanED » Sat Mar 03, 2012 2:26 am UTC

Aaeriele wrote:Array-based lists have O(log n) amortized time for addition (assuming a basic "allocate twice as much space when you hit a boundary" scheme)

No, that's amortized O(1).
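To spell out why: with capacity doubling, inserting n elements performs fewer than n element copies in total across all reallocations (1 + 2 + 4 + ... + n/2 = n - 1 when n is a power of two), so the per-insert amortized cost is O(1). A throwaway counter (hypothetical code) checks the arithmetic:

```cpp
#include <cassert>
#include <cstddef>

// Simulate a doubling push_back and count how many element copies the
// reallocations perform over n inserts.
static std::size_t copies_for(std::size_t n) {
    std::size_t cap = 1, size = 0, copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == cap) {   // full: "reallocate" and copy everything over
            copies += size;
            cap *= 2;
        }
        ++size;
    }
    return copies;
}
```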

It's also not the case that a single array and traditional linked list are the only ways to do things. For instance, one typical implementation of the STL's deque is essentially an array of arrays. If there's no space in the "inner" array when adding an element, a new one is created and a pointer added to the "outer" array. If no space is left in the outer array, it is reallocated. Doesn't change the worst-case time complexity (still O(1), O(1) amortized), but it does affect the constant quite a bit. It speeds insertion, and slows lookup (because of a double indirection).
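A toy version of that array-of-arrays layout (made-up names; a real deque's bookkeeping is more involved): growing never moves existing elements, and lookup pays one extra indirection.

```cpp
#include <cstddef>
#include <vector>

// Toy two-level container: fixed-capacity inner blocks, plus an outer
// vector of pointers to them.  push_back allocates a fresh inner block
// when the last one is full; existing elements are never copied.
struct BlockList {
    static const std::size_t B = 4;   // inner block capacity
    std::vector<int*> blocks;         // the "outer" array
    std::size_t count;

    BlockList() : count(0) {}
    ~BlockList() { for (std::size_t i = 0; i < blocks.size(); ++i) delete[] blocks[i]; }

    void push_back(int v) {
        if (count % B == 0)                 // last block full (or none yet)
            blocks.push_back(new int[B]);
        blocks[count / B][count % B] = v;
        ++count;
    }
    // Random access: double indirection, outer array then inner block.
    int &operator[](std::size_t i) { return blocks[i / B][i % B]; }
};
```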

Postby Thesh » Sat Mar 03, 2012 2:33 am UTC

Aaeriele wrote:The List<T> class is the generic equivalent of the ArrayList class. It implements the IList<T> generic interface using an array whose size is dynamically increased as required.


That may very well be what it does internally, but I don't think the documentation guarantees implementation details, just how it appears to the programmer. It would be easy to find out for sure by adding a watch in the debugger and probing the private members, but I don't have it on my computer at the moment.

If that's indeed the case (which it probably is) and you are really concerned about performance, you could implement a lightweight version yourself: a class or struct containing only a public array, a Count property, fields (set in the constructor) for the initial size of the array and the number of extra elements to allocate on growth, and an Add method that calls Array.Resize() whenever Count == Array.Length, thereby avoiding the extra copy.
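The same scheme in C++ terms (a hypothetical sketch of the described class, not actual .NET code): a directly exposed contiguous buffer, a count, a growth increment, and an add that only reallocates when full.

```cpp
#include <cstddef>
#include <cstdlib>

// Bare-bones growable array exposing its contiguous storage directly,
// analogous to the described C# wrapper around Array.Resize().
struct RawList {
    float *data;          // public, contiguous: pass straight to the API
    std::size_t count;    // elements in use
    std::size_t capacity; // allocated slots
    std::size_t grow;     // extra slots added per reallocation

    RawList(std::size_t initial, std::size_t grow_by)
        : data(static_cast<float*>(std::malloc(initial * sizeof(float)))),
          count(0), capacity(initial), grow(grow_by) {}
    ~RawList() { std::free(data); }

    void add(float v) {
        if (count == capacity) {  // the Array.Resize() moment
            capacity += grow;
            data = static_cast<float*>(std::realloc(data, capacity * sizeof(float)));
        }
        data[count++] = v;
    }
};
```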

Aaeriele wrote:Array-based lists have O(log n) amortized time for addition (assuming a basic "allocate twice as much space when you hit a boundary" scheme), and O(1) amortized time for lookup. Linked lists have O(1) amortized time for addition, and O(n) amortized time for lookup.

There's a reason why most people tend to use array-based lists by default.


For random access, an array is clearly going to outperform. For sequential access, you can still achieve O(1), and I would argue that the vast majority of the time, Lists are accessed sequentially rather than randomly.

And actually, if you took away the remove method or accepted a total resize in the case of the remove, an array of equally sized arrays would still be O(1) for sequential access, requiring only a modulus and division for each random read.

Postby Aaeriele » Sat Mar 03, 2012 2:39 am UTC

EvanED wrote:
Aaeriele wrote:Array-based lists have O(log n) amortized time for addition (assuming a basic "allocate twice as much space when you hit a boundary" scheme)

No, that's amortized O(1).

My mistake, although I think one could introduce characteristics that would result in something other than O(1), depending on the relative cost of reallocation versus a basic addition when space is available (i.e., if reallocation time heavily dominates addition time).

That said, it actually only makes the point I was making stronger.

EvanED wrote:It's also not the case that a single array and traditional linked list are the only ways to do things. For instance, one typical implementation of the STL's deque is essentially an array of arrays. If there's no space in the "inner" array when adding an element, a new one is created and a pointer added to the "outer" array. If no space is left in the outer array, it is reallocated. Doesn't change the worst-case time complexity (still O(1), O(1) amortized), but it does affect the constant quite a bit. It speeds insertion, and slows lookup (because of a double indirection).


Sure; I was keeping things down to the simplistic options to make the example.

Postby EvanED » Sat Mar 03, 2012 2:55 am UTC

Aaeriele wrote:Sure; I was keeping things down to the simplistic options to make the example.

I only brought that up because Thesh basically suggested that option, implicitly, earlier.

By the way, it's also possible to experimentally demonstrate what I said about the array-of-arrays thing being a common deque implementation, using the following program:
Spoiler:
Code: Select all
#include <iostream>
#include <vector>
#include <deque>

int const vector_insertions = 9;
int const deque_insertions = 1000;

struct IntWrapper
{
    int val;

    IntWrapper(int v)
        : val(v)  {}

    IntWrapper(IntWrapper const & other)
        : val(other.val)
    {
        std::cout << "    copying " << val << "\n";
        val = other.val;
    }   

    void operator= (IntWrapper const & other) {
        std::cout << "    copying " << val << "\n";
        val = other.val;
    }
};

int main()
{
    std::vector<IntWrapper> v;

    std::cout << "************************\n";
    std::cout << "Vector:\n";
   
    for (int i=0; i<vector_insertions; ++i) {
        std::cout << "  Inserting " << i << ":\n";
        v.push_back(i);
    }

    std::deque<IntWrapper> d;

    std::cout << "************************\n";
    std::cout << "Deque:\n";
   
    for (int i=0; i<deque_insertions; ++i) {
        std::cout << "  Inserting " << i << ":\n";
        d.push_back(i);
    }
}

When compiled and run with GCC 4.3.2, vector reallocates during the second, third, fifth, and ninth insertions. (I'm a bit surprised it doesn't have a minimum size. Oh well.) Yet the deque never reallocates the arrays that actually hold the data (the "inner arrays"), even with 100,000 insertions. (The easiest way to establish that is to pipe the output into wc -l. :-)) Visual Studio behaves similarly, except with even more allocation in the vector. (I don't know what its growth strategy is.)

As for Thesh's hypothesis that random lookup isn't used much, I'm not sure. It'd be interesting to see.

Postby Thesh » Sat Mar 03, 2012 3:54 am UTC

You know, if you really don't care about random read performance, you can use a staggered array in which each new allocation is double the previous one. For sequential reads there's no performance hit compared to a square array (well, one additional operation per row, since you have to double the variable holding the number of columns). For random reads you have to take the binary logarithm of the index to figure out which row it's in. It needs no more allocations of contiguous storage than a plain doubling array (provided both allocate 2x the current length), but with no copying at all.
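A sketch of that staggered layout (hypothetical code: row r holds 2^r elements, so element i lives in row floor(log2(i+1)) at offset i+1-2^r):

```cpp
#include <cstddef>
#include <vector>

// Staggered array: each new row is double the size of the previous one
// (1, 2, 4, 8, ...), so growth never copies existing elements.  Random
// access needs a binary logarithm to find the row.
struct Staggered {
    std::vector<int*> rows;
    std::size_t count;

    Staggered() : count(0) {}
    ~Staggered() { for (std::size_t i = 0; i < rows.size(); ++i) delete[] rows[i]; }

    // Row index = position of the highest set bit of (i + 1).
    static std::size_t row_of(std::size_t i) {
        std::size_t r = 0;
        while ((i + 1) >> (r + 1)) ++r;   // the binary logarithm
        return r;
    }

    void push_back(int v) {
        std::size_t r = row_of(count);
        if (r == rows.size())
            rows.push_back(new int[std::size_t(1) << r]);  // double the last row
        rows[r][count + 1 - (std::size_t(1) << r)] = v;
        ++count;
    }
    int &operator[](std::size_t i) {
        std::size_t r = row_of(i);
        return rows[r][i + 1 - (std::size_t(1) << r)];
    }
};
```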

I think when no one is looking I'm going to write this class at work on Monday and test the performance against List<T> for both sequential and random reads.

Postby Ubik » Sat Mar 03, 2012 11:40 am UTC

What Thesh said about guarantees on the implementation: it's true that the documentation mentioning an internal array doesn't mean the storage is necessarily contiguous in all cases, even though the current implementation apparently makes it so; this is unlike std::vector, where contiguity is guaranteed.

I also realized that when using a bare array I can actually fiddle with the elements directly instead of creating a new instance of the struct and replacing the one in the list by it.

The thing I'm doing is probably not very performance-critical anyway; it's a system to manage a set of 3D sprites. It's probably more efficient to handle their vertices manually and then do a single draw call to render them all than to set rendering parameters and make a draw call for each two-triangle sprite independently. Ah, the joys of premature optimization.

Postby Sc4Freak » Sun Mar 04, 2012 12:56 am UTC

EvanED wrote:Visual Studio behaves similarly, except with even more allocation in the vector. (I don't know what its growth strategy is.)


Visual Studio grows arrays by 1.5x each time it reallocates. Most implementations use 2x. It's still the same big-O complexity, of course, but MSVC's strategy trades reduced memory usage for additional allocation cost.
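The usual argument for a sub-2 factor can be sketched numerically (a back-of-envelope model, not anything from MSVC's documentation): with factor 2, the memory freed by all earlier blocks combined is always smaller than the next request, so an idealized allocator can never reuse it; with 1.5 it eventually can.

```cpp
// With growth factor f, check whether the k-th reallocation could fit into
// the memory freed by all earlier blocks combined (assuming those blocks
// were adjacent and are all still free -- an idealized allocator).
static bool prior_blocks_cover_next(double f, int k) {
    double cap = 16.0, freed = 0.0;
    for (int i = 0; i < k; ++i) {
        freed += cap;   // the old block is released...
        cap *= f;       // ...when we grow to the new size
    }
    return freed >= cap;
}
```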
User avatar
Sc4Freak
 
Posts: 673
Joined: Thu Jul 12, 2007 4:50 am UTC
Location: Redmond, Washington

Postby Yakk » Sun Mar 04, 2012 5:16 am UTC

I saw a good talk by Stroustrup on Friday.

Suppose you have a container of integers. And suppose you add X elements to it (to the end). You then iterate over the list to a random nth element (you don't get to random-access!), and remove it. You repeat this removal until the list is empty.

At what size does the std::list<int> outperform the std::vector<int> at this operation?
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.
User avatar
Yakk
Poster with most posts but no title.
 
Posts: 10324
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Postby Jplus » Sun Mar 04, 2012 10:54 am UTC

Because of block moves, and because iterating by incrementing a pointer is cheaper than iterating by chasing pointers through the heap, I suspect std::list&lt;int&gt; will outperform std::vector&lt;int&gt; only for very large values of X. Say, when substantially fewer than X integers fit into the CPU cache.

But please tell us the answer, I'm curious! :)
Feel free to call me Julian. J+ is just an abbreviation.
Image coding and xkcd combined
User avatar
Jplus
 
Posts: 1427
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: classified

Postby EvanED » Sun Mar 04, 2012 6:27 pm UTC

Jplus wrote:Because of block moves and because iterating by incrementing a pointer is cheaper than iterating by fetching new pointers from the heap...

Not to mention the time the allocator spends freeing blocks. I don't have a good sense of how expensive this is with typical allocators... it might be a dozen instructions or a thousand.

Postby korona » Mon Mar 05, 2012 2:13 am UTC

Jplus wrote:Say, when substantially less than X integers fit into the CPU cache.

I'm pretty sure that's not the case, because shifting half of the elements from index i to i-1 is very costly and will dominate the runtime once there are more than a handful (maybe a few hundred) of elements in the list.
Allocation is cheap if you're using a modern allocator.
korona
 
Posts: 300
Joined: Sun Jul 04, 2010 8:40 pm UTC

Postby Yakk » Mon Mar 05, 2012 3:44 pm UTC

As far as Stroustrup could figure out, the answer is "never". (I think he graphed it up to 500k, and the gap continued to grow massively, with no sign of narrowing.)

Graph is here:
http://www.computer.org/portal/web/comp ... =cnhome-v1
Direct to PDF:
http://www.computer.org/cms/Computer.or ... ucture.pdf

Page 52 (5 of 13 in the PDF). Even with a preallocated list, the cost of iterating via a linked list is so much higher than iterating via a vector that it never gets close. (The bottom axis of the graph is in 100k units.)

Postby korona » Mon Mar 05, 2012 7:32 pm UTC

That comparison is biased in favor of the vector, because insertions happen at random places, so the list is actually doing random accesses to the heap, causing a cache miss on every access (which is Stroustrup's point here).
If the elements are added to the end of the list (as you wrote in your original post), the list will outperform the vector for large values of N.
EDIT: I wrote a quick benchmark and vector seems to perform better than list even if the memory is accessed in a linear pattern during list traversal.

Postby Yakk » Mon Mar 05, 2012 8:13 pm UTC

Good catch, I didn't relay what he said exactly. Oops.

But, as noted, vectors still rock, even with having to move every element in the container every time you delete an element.

It is hard to beat a compact array.

We got annoyed by slow performance of sets/maps in one project, and noticed that most of our set/map manipulation was a "setup" phase followed by a "use" phase. So I wrote a vector-based map/set that stored the elements (or pairs) using push_back, kept track of which prefix was sorted, and re-sorted whenever you searched the container for an element (if needed). (The most important performance improvement was the much, much faster duplication.)

Postby korona » Mon Mar 05, 2012 8:19 pm UTC

The result above was wrong. Looking at the assembly I found that gcc 4.4 was able to optimize away ALL memory accesses in the vector case!
Here is the benchmark code in case anyone wants to play around with it. The dummy call makes the code very slow (it acts as an "every value in memory can change at this point" barrier for gcc), and I don't have time to run the benchmark for multiple minutes/hours. For n = 10000, vector still wins (50 seconds vs. a run I interrupted after 3 minutes).

Code: Select all
#include <cstdint>
#include <list>
#include <vector>
#include <iostream>
#include <random>

/* defined as empty functions in another compilation unit so that
   gcc is not able to determine that insertion/removal of
   the vector elements does not have any side effects */
void do_anything(std::list<int> container);
void do_anything(std::vector<int> container);

template<typename Container>
void perform(Container &container, int index, uint64_t &result) {
   auto j = container.begin();
   for(int i = 0; i < index; ++i) {
      /* make sure the iteration does not get optimized away.
         since this operation is O(1) it does not change the comparison results. */
      do_anything(container);
      result += *j; /* this operation was not enough to keep gcc from optimizing away
            all memory accesses in case of the vector */
      j++;
   }
   container.erase(j);
}

template<typename Container, typename RndEngine>
uint64_t benchmark() {
   Container container;

   const int size = 10000;

   RndEngine engine;
   engine.seed(0x12345678);

   std::uniform_int<int> ins_distrib(0, 1 << 30);
   for(int i = 0; i < size; ++i)
      container.push_back(ins_distrib(engine));   

   uint64_t result = 0;
   while(container.size() > 0) {
      std::uniform_int<int> rem_distrib(0, container.size() - 1);
      perform(container, rem_distrib(engine), result);
   }
   return result;
}

int main(int argc, char **argv) {
   std::cout << "Result: " << benchmark<std::list<int>, std::mt19937>() << std::endl;
}


Stroustrup's results are most likely wrong too, as his timings are much better (by multiple orders of magnitude) than what a state-of-the-art CPU can achieve using the benchmark above.

Postby EvanED » Mon Mar 05, 2012 8:26 pm UTC

It does make me want to rewrite some stuff and see what happens.

The most promising thing I could do easily would be to replace some sets and maps. Yakk, do you have or know of a drop-in, vector-based replacement for them? (I'd change to unordered_map but I'm worried that we might actually depend on it being in sorted order.)

Postby You, sir, name? » Mon Mar 05, 2012 8:36 pm UTC

korona wrote:The result above was wrong. Looking at the assembly I found that gcc 4.4 was able to optimize away ALL memory accesses in the vector case!
Here is the benchmark code in case anyone wants to play around with it. The dummy call makes the code very slow (it acts as an "every value in memory can change at this point" barrier for gcc), and I don't have time to run the benchmark for multiple minutes/hours. For n = 10000, vector still wins (50 seconds vs. a run I interrupted after 3 minutes).

Spoiler:
[benchmark code snipped; same code as quoted above]


Stroustrup's results are most likely wrong too, as his timings are much better (by multiple orders of magnitude) than what a state-of-the-art CPU can achieve using the benchmark above.


Code: Select all
void do_anything(std::list<int> container);
void do_anything(std::vector<int> container);


Uh, these pass "container" by value. That is likely the bulk of the slowdown.
I now occasionally update my rarely-updated blog.

I edit my posts a lot and sometimes the words wrong order words appear in sentences get messed up.
User avatar
You, sir, name?
 
Posts: 6446
Joined: Sun Apr 22, 2007 10:07 am UTC
Location: Chako Paul City

Postby korona » Mon Mar 05, 2012 8:47 pm UTC

Ahhh :D That's true; I had wondered why the slowdown was so huge. These functions only make sense when passing the container by reference, of course.

Postby You, sir, name? » Mon Mar 05, 2012 8:55 pm UTC

korona wrote:Ahhh :D That's true I wondered why the slowdown was that huge. These functions make only sense when passing the container by reference of course.


You could also declare the variable volatile; that's supposed to prevent the type of optimization you're trying to get rid of.

Postby korona » Mon Mar 05, 2012 9:01 pm UTC

volatile would force gcc to keep every memory access to the variable. I just want to make sure that gcc actually writes values to the vector; I don't want it to read back the size, capacity and storage pointer every time it performs an operation.
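For completeness, the GCC-specific way to get an "all memory may change here" barrier without the cost of a real function call is an empty asm statement with a memory clobber (a sketch; this is a GCC extension, not standard C++):

```cpp
#include <cassert>

// Empty inline asm that claims to read p and to clobber all memory: the
// compiler must materialize the pointed-to value and may not cache memory
// contents across the barrier, yet no instructions are emitted.
static inline void escape(void *p) {
    asm volatile("" : : "g"(p) : "memory");
}

static int sum_with_barrier(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i) {
        s += a[i];
        escape(&s);   // forces s to live in memory with its current value
    }
    return s;
}
```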

Postby Yakk » Mon Mar 05, 2012 9:05 pm UTC

This isn't drop-in. A few interfaces break.
Code: Select all
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

template<typename T, typename Comp = std::less<T>, bool multi = false>
struct FastSet
{
  typedef FastSet<T, Comp, multi> MyType;
  typedef std::vector< T > raw_vector_t;
  mutable raw_vector_t raw_vector;
  mutable size_t sorted_highwater;

  // dtor and copy do not need to be written; the defaults work.  The ctor
  // does, to zero the highwater mark:
  FastSet() : sorted_highwater(0) {}

  typedef typename raw_vector_t::const_iterator const_iterator;
  typedef typename raw_vector_t::iterator iterator;

  // Equivalence in terms of the ordering, for unique():
  static bool equiv( T const& a, T const& b ) { Comp c; return !c(a, b) && !c(b, a); }

  void ensure_sorted() const // a lie
  {
    if (sorted_highwater == raw_vector.size())
      return;
    std::sort( raw_vector.begin()+sorted_highwater, raw_vector.end(), Comp() );
    std::inplace_merge( raw_vector.begin(), raw_vector.begin()+sorted_highwater, raw_vector.end(), Comp() );
    if (!multi)
      raw_vector.erase( std::unique( raw_vector.begin(), raw_vector.end(), equiv ), raw_vector.end() );
    sorted_highwater = raw_vector.size();
  }
  iterator begin() { ensure_sorted(); return raw_vector.begin(); }
  iterator end() { ensure_sorted(); return raw_vector.end(); }
  const_iterator begin() const { ensure_sorted(); return raw_vector.begin(); }
  const_iterator end() const { ensure_sorted(); return raw_vector.end(); }
  iterator lower_bound( T const& t) { ensure_sorted(); return std::lower_bound( raw_vector.begin(), raw_vector.end(), t, Comp() ); }
  iterator upper_bound( T const& t) { ensure_sorted(); return std::upper_bound( raw_vector.begin(), raw_vector.end(), t, Comp() ); }
  std::pair<iterator, iterator> equal_range( T const& t ) { ensure_sorted(); return std::equal_range( raw_vector.begin(), raw_vector.end(), t, Comp() ); }
  const_iterator lower_bound( T const& t) const { ensure_sorted(); return std::lower_bound( raw_vector.begin(), raw_vector.end(), t, Comp() ); }
  const_iterator upper_bound( T const& t) const { ensure_sorted(); return std::upper_bound( raw_vector.begin(), raw_vector.end(), t, Comp() ); }
  std::pair<const_iterator, const_iterator> equal_range( T const& t ) const { ensure_sorted(); return std::equal_range( raw_vector.begin(), raw_vector.end(), t, Comp() ); }
  iterator find( T const& t ) { auto range = equal_range( t ); if (range.first == range.second) return raw_vector.end(); return range.first; }
  const_iterator find( T const& t ) const { auto range = equal_range( t ); if (range.first == range.second) return raw_vector.end(); return range.first; }

  // Could be faster?  Dunno...  This scans the unsorted tail linearly, then
  // binary-searches the sorted prefix, so it never forces a sort.
  bool search( T const& t ) const {
    for (size_t i = sorted_highwater; i < raw_vector.size(); ++i)
      if (equiv( raw_vector[i], t )) return true;
    return std::binary_search( raw_vector.begin(), raw_vector.begin()+sorted_highwater, t, Comp() );
  }

  iterator erase( iterator it ) { assert( size_t(it - raw_vector.begin()) < sorted_highwater ); --sorted_highwater; return raw_vector.erase( it ); }
  iterator erase( iterator first, iterator last ) { assert( size_t(last - raw_vector.begin()) <= sorted_highwater ); sorted_highwater -= last - first; return raw_vector.erase( first, last ); }

  void clear() { raw_vector.clear(); sorted_highwater = 0; }
  void swap( MyType& other ) { raw_vector.swap( other.raw_vector ); std::swap( sorted_highwater, other.sorted_highwater ); }

  T & insert_unsorted(T const& src) { raw_vector.push_back(src); return raw_vector.back(); }
  T & insert_unsorted() { raw_vector.push_back(T()); return raw_vector.back(); }
  iterator insert(T const& src) {
    auto range = equal_range(src);   // sorts as a side effect
    if (!multi && range.first != range.second) { *range.first = src; return range.first; }
    insert_unsorted(src);
    ensure_sorted();
    return std::lower_bound( raw_vector.begin(), raw_vector.end(), src, Comp() );
  }
};

Standard disclaimer: the above code was written off the top of my head, and has never seen either side of a compiler. I'm sure the one I used in production was more polished as well. Oh, and I am doing far too much stuff on one line, it needs more asserts, etc. But it should get you started.

It doesn't handle collisions all that well; if you insert_unsorted a duplicate, which copy survives is essentially arbitrary from the caller's point of view, since sort isn't stable and unique keeps the first element of each equivalent run. For closer to real map/set behavior, you'd want std::stable_sort and a unique variant that keeps the last equivalent element instead of the first.

This is mainly a problem with unsorted insert, and equivalence classes (like you'd have in a FastMap typically).
Last edited by Yakk on Mon Mar 05, 2012 10:45 pm UTC, edited 4 times in total.

Postby EvanED » Mon Mar 05, 2012 10:33 pm UTC

Thanks.

I made my own vector vs list performance test implementation, which I'll post shortly. (Edit: clarified what I meant by "implementation.")

Note that GCC is smart enough to optimize
Code: Select all
    typename T::iterator place = container.begin();
    int index = rand() % container.size();
   
    for(int i=0; i<index; ++i) {
        ++place;
    }

into a single addition with -O2 for the vector case.

However, I seem to still get consistent results.

Edit: Here's my test. Run on Linux, not on Windows, but it should work. That paste has 4 files; if you want to actually run it, you can paste the whole thing into one file then run the two commands that are in the comments at the top of the file.

Postby moiraemachy » Mon Mar 19, 2012 1:23 am UTC

Ok, I'm finally fiddling with Linux, and just covered file permissions. Something is bothering me: I'm at a terminal in Xubuntu, in my home folder, and do this:
Code: Select all
mkdir deeper
cd deeper
mkdir DEEPER
cd DEEPER
chmod 000 ..


Now, I've just removed my permission to access a folder up the tree (deeper), and that should cut off access to my current folder (DEEPER), correct? However, if I don't cd out of DEEPER, I can still do stuff in DEEPER using relative pathnames, but not using absolute pathnames. Is that supposed to happen?
moiraemachy
 
Posts: 95
Joined: Wed Jan 04, 2012 9:47 pm UTC

Re: Coding: Fleeting Thoughts

Postby Yakk » Mon Mar 19, 2012 2:51 am UTC

First, give things different names. Case sensitivity is not an excuse to be silly.
Second, if what you wanted to happen were to happen, all access to directory X would require checking every directory above X to determine whether access is permitted, and/or recursively walking all children and applying some magic meta-permission set every time you change the permissions on a directory. On top of that, ".." is really a construct of your shell and not of the file system (hard links allow a directory to exist in two places at once).

The shell similarly doesn't walk the directory tree on each command. It just keeps a handle to the current directory.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.
User avatar
Yakk
Poster with most posts but no title.
 
Posts: 10324
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Coding: Fleeting Thoughts

Postby EvanED » Mon Mar 19, 2012 3:25 am UTC

Yakk wrote:First, give things different names. Case sensitivity is not an excuse to be silly.
Second, if what you wanted to happen were to happen, all access to directory X would require checking every directory above X to determine whether access is permitted, and/or recursively walking all children and applying some magic meta-permission set every time you change the permissions on a directory. On top of that, ".." is really a construct of your shell and not of the file system (hard links allow a directory to exist in two places at once).

You can't hard link a directory on Unix.

I'm with moiraemachy; that seems like strange behavior to me. Or at least an ugly leaky abstraction.
EvanED
 
Posts: 4038
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI

Re: Coding: Fleeting Thoughts

Postby phlip » Mon Mar 19, 2012 3:39 am UTC

If a parent directory of the directory you want isn't accessible, then you won't be able to find the dir. But if you've already found it before, and have a handle to it (say, by having it as the working dir), then you can keep using that. It doesn't check the permissions for the full tree on every filesystem access; that would be prohibitively expensive. I suppose you can call this an abstraction leak.
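To make the handle-vs-path distinction concrete, here's a throwaway sketch of moiraemachy's experiment (made-up temp paths; run as a normal user, since root bypasses permission checks):

```shell
# Sketch: the shell keeps a handle to the current directory, so relative
# access keeps working even after the parent loses all permissions.
# Run as a non-root user; root bypasses these checks.
tmp=$(mktemp -d)
mkdir -p "$tmp/deeper/DEEPER"
cd "$tmp/deeper/DEEPER"
chmod 000 ..                     # drop all permissions on "deeper"

rel=$(ls . >/dev/null 2>&1 && echo ok || echo denied)
abs=$(ls "$tmp/deeper/DEEPER" >/dev/null 2>&1 && echo ok || echo denied)
echo "relative: $rel"            # ok - the handle to DEEPER is still valid
echo "absolute: $abs"            # denied for non-root - re-traverses "deeper"

chmod 755 "$tmp/deeper"          # restore so cleanup works
cd / && rm -rf "$tmp"
```

The absolute path fails because resolving it needs search (execute) permission on every component, while the already-open working directory needs no re-traversal.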
While no one overhear you quickly tell me not cow cow.
but how about watch phone?
User avatar
phlip
Restorer of Worlds
 
Posts: 7091
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia

Re: Coding: Fleeting Thoughts

Postby moiraemachy » Mon Mar 19, 2012 4:53 am UTC

Well, I guess I should look up how file-system access is implemented to understand this. I thought it was, basically, "absolute pathname goes in, data comes out". Still,
Yakk wrote: (...) and/or recursively walk all children and applying some magic meta-permissions set everytime you change the permissions on a directory

doesn't sound that bad. I mean, the OS already has to somehow keep track of the validity of file handles (or whatever part of file handles programs have direct access to).

EDIT: I'm assuming you can't just save file handles on disk and use them later on another session.
moiraemachy
 
Posts: 95
Joined: Wed Jan 04, 2012 9:47 pm UTC

Re: Coding: Fleeting Thoughts

Postby Steax » Mon Mar 19, 2012 10:27 am UTC

I find HTML 5's video element support detection hilarious.

The canPlayType() function doesn’t return true or false. In recognition of how complex video formats are, the function returns a string:

"probably" if the browser is fairly confident it can play this format
"maybe" if the browser thinks it might be able to play this format
"" (an empty string) if the browser is certain it can’t play this format
In Minecraft, I use the username Rirez.
User avatar
Steax
SecondTalon's Goon Squad
 
Posts: 3029
Joined: Sat Jan 12, 2008 12:18 pm UTC

Re: Coding: Fleeting Thoughts

Postby Ubik » Mon Mar 19, 2012 1:33 pm UTC

Yakk wrote:On top of that, ".." is really a construct of your shell and not of the file system (hard links allow a directory to exist in two places at once).
I actually have the impression that the . and .. are real hard links to directories, and are the only exceptions that are allowed. Can't remember where I first read it, but googling seems to turn up pages that agree with my memories.
User avatar
Ubik
 
Posts: 892
Joined: Thu Oct 18, 2007 3:43 pm UTC

Re: Coding: Fleeting Thoughts

Postby shawnhcorey » Mon Mar 19, 2012 1:42 pm UTC

Ubik wrote:
Yakk wrote:On top of that, ".." is really a construct of your shell and not of the file system (hard links allow a directory to exist in two places at once).
I actually have the impression that the . and .. are real hard links to directories, and are the only exceptions that are allowed. Can't remember where I first read it, but googling seems to turn up pages that agree with my memories.


. and .. are hard links in most file systems. Only the root user (or a sudo user) can create hard links of directories. Anyone can create hard links for regular files.
User avatar
shawnhcorey
 
Posts: 42
Joined: Sun Jan 08, 2012 2:08 pm UTC

Re: Coding: Fleeting Thoughts

Postby phlip » Mon Mar 19, 2012 1:48 pm UTC

I remember reading somewhere that the "hard link count" given by stat on a directory (as long as you don't somehow break the filesystem by having actual cross-linked directories) is the number of subdirectories of that dir, plus 2... because those are all the different links to it: it's linked to by name in its parent, as "." in itself, and as ".." in each subdir.
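That arithmetic is easy to check with a throwaway sketch (made-up temp paths; GNU stat assumed, and a traditional filesystem like ext4 or tmpfs — btrfs, notably, always reports 1 for directories):

```shell
# A directory's link count = number of subdirectories + 2:
# its name in the parent, "." in itself, and ".." in each subdir.
# Assumes GNU stat (on BSD/macOS use `stat -f %l`) and a traditional
# filesystem; btrfs always reports 1 for directories.
tmp=$(mktemp -d)
mkdir "$tmp/a" "$tmp/b" "$tmp/c"
links=$(stat -c %h "$tmp")       # 3 subdirs + 2 = 5 on ext4/tmpfs
echo "link count: $links"
rm -rf "$tmp"
```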

shawnhcorey wrote:Only the root user (or a sudo user) can create hard links of directories.

On Linux, at the very least, not even that.
ln(1) wrote:-d, -F, --directory
allow the superuser to attempt to hard link directories (note: will probably fail due to system restrictions, even for the superuser)

Code: Select all
phlip@boris:/tmp$ mkdir test
phlip@boris:/tmp$ ln test test2
ln: `test': hard link not allowed for directory
phlip@boris:/tmp$ sudo ln test test2
ln: `test': hard link not allowed for directory
phlip@boris:/tmp$ sudo ln -F test test2
ln: creating hard link `test2' => `test': Operation not permitted

Maybe you could do it if you went direct to the filesystem and started dicking with the inodes, but then we're talking less "cross-linked directories" and more "corrupted filesystem".
While no one overhear you quickly tell me not cow cow.
but how about watch phone?
User avatar
phlip
Restorer of Worlds
 
Posts: 7091
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia

Re: Coding: Fleeting Thoughts

Postby Yakk » Mon Mar 19, 2012 2:38 pm UTC

Interesting, that .. is a hard link. I'd have been tempted to implement it the other way...
moiraemachy wrote:Well, guess I should look up on how the access to the file system is implemented to understand this. I thought it was, basically, "absolute pathname goes in, data out". Still,

No. Pathnames are names of the files, they aren't the files.

A file handle is not the name of the file. The name of a file isn't even unique in most file systems (a file can live in two spots at once in most file systems out there, with both of them being "first class" instances of the file).
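For regular files, the two-names-one-file situation is easy to demonstrate (throwaway sketch with made-up temp paths; GNU stat assumed):

```shell
# A hard link gives a regular file a second, equally "first class" name.
tmp=$(mktemp -d)
echo hello > "$tmp/one"
ln "$tmp/one" "$tmp/two"         # not a copy: same inode, second directory entry
count=$(stat -c %h "$tmp/one")   # link count is now 2 (GNU stat)
echo world >> "$tmp/two"
content=$(cat "$tmp/one")        # a write via either name hits the same data
echo "links=$count"
echo "$content"
rm -rf "$tmp"
```

Neither name is "the" file; deleting one just drops the link count back to 1 and the data survives under the other name.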
Yakk wrote: (...) and/or recursively walk all children and applying some magic meta-permissions set everytime you change the permissions on a directory
doesn't sound that bad. I mean, the OS already has to somehow keep track of validity of file handles(or whatever the part of file handles that programs have direct access to is).
What does that have to do with having to recursively walk all children? Are you talking about deleting a directory?

...

I'm sad that directory hardlinking is being blocked. No infinite directory trees! *sad face*
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.
User avatar
Yakk
Poster with most posts but no title.
 
Posts: 10324
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Coding: Fleeting Thoughts

Postby Xanthir » Mon Mar 19, 2012 10:03 pm UTC

Steax wrote:I find HTML 5's video element support detection hilarious.

The canPlayType() function doesn’t return true or false. In recognition of how complex video formats are, the function returns a string:

"probably" if the browser is fairly confident it can play this format
"maybe" if the browser thinks it might be able to play this format
"" (an empty string) if the browser is certain it can’t play this format

Yup, funny history there.

Ideally, we'd just return a boolean. Unfortunately, media formats are *complicated*, and we can't ever *really* be sure until we feed the file to the decoder and see if we get an error or not. Further, some media formats are just containers, so we're *extra* not sure if we can play what's inside of them.

It was decided that it's useful to be able to distinguish "as far as we can tell, we can play this" from "we can play *some* things like this", thus the current API.

Note that you can still treat it as a boolean, since the empty string is falsey and non-empty strings are truthy.
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))
User avatar
Xanthir
My HERO!!!
 
Posts: 4225
Joined: Tue Feb 20, 2007 12:49 am UTC
Location: The Googleplex

Re: Coding: Fleeting Thoughts

Postby EvanED » Mon Mar 19, 2012 10:23 pm UTC

Xanthir wrote:Yup, funny history there.

Ideally, we'd just return a boolean. Unfortunately, media formats are *complicated*, and we can't ever *really* be sure until we feed the file to the decoder and see if we get an error or not. Further, some media formats are just containers, so we're *extra* not sure if we can play what's inside of them.

It was decided that it's useful to be able to distinguish "as far as we can tell, we can play this" from "we can play *some* things like this", thus the current api.

Note that you can still treat it as a boolean, since the empty string is falsey and non-empty strings are truthy.

I just wanted to say how awesome it is to have you around sometimes. There have been a few times when JavaScript has gotten some ribbing around here, and while it rightly deserves a lot of it IMO, you've stepped in with at least a somewhat reasonable and definitely interesting explanation of why whatever weirdness is happening arises. (The other time that comes to mind is the brief discussion about the Wat talk, but I think there are more.)
EvanED
 
Posts: 4038
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI

Re: Coding: Fleeting Thoughts

Postby Xanthir » Mon Mar 19, 2012 11:29 pm UTC

Aw shucks, you're making me blush. This is just a natural result of being paid to do standards work, and thus spending 5+ hours a day on CSS, HTML, and JS language mailing lists.

On that note, woo, my Variables module got published as a First Public Working Draft!
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))
User avatar
Xanthir
My HERO!!!
 
Posts: 4225
Joined: Tue Feb 20, 2007 12:49 am UTC
Location: The Googleplex
