Coding: Fleeting Thoughts

A place to discuss the implementation and style of computer programs.

Moderators: phlip, Moderators General, Prelates

User avatar
You, sir, name?
Posts: 6983
Joined: Sun Apr 22, 2007 10:07 am UTC
Location: Chako Paul City
Contact:

Re: Coding: Fleeting Thoughts

Postby You, sir, name? » Tue Jul 07, 2015 8:33 pm UTC

Yeah, that was an improvement! Now you can see both the small and large limits easily.

Spoiler:
plot.png
plot.png (6.86 KiB) Viewed 4380 times


--edit--

More testing. Interestingly, although obvious when you think about it, this 128 byte limitation is per-argument.

A, B, C are plain structs so that sizeof(A) + sizeof(B) = sizeof(C) > 128, then invoking a function void f(A,B) is four times as fast as invoking f function g(C) on clang with -O3 (and twice on g++ with -O3).

Of course, it's all a bit academic since nobody in their right mind passes >128 byte datastructures by value.

--edit-- and I accidentally divided the x axis by two?!

--edit--

I re-read the >128 case assembly. It seems to basically do a memcopy (rep movsq). ... but it's slower?
I edit my posts a lot and sometimes the words wrong order words appear in sentences get messed up.

User avatar
Jplus
Posts: 1721
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: Netherlands

Re: Coding: Fleeting Thoughts

Postby Jplus » Tue Jul 07, 2015 9:31 pm UTC

I find it interesting that your graphs seem to suggest that passing by const reference is basically always a safe guess. I mean, it seems like it's never much worse than passing by value.
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)

User avatar
You, sir, name?
Posts: 6983
Joined: Sun Apr 22, 2007 10:07 am UTC
Location: Chako Paul City
Contact:

Re: Coding: Fleeting Thoughts

Postby You, sir, name? » Tue Jul 07, 2015 9:49 pm UTC

Jplus wrote:I find it interesting that your graphs seem to suggest that passing by const reference is basically always a safe guess. I mean, it seems like it's never much worse than passing by value.


My consumer function is basically a no-op though. Passing by value offers the compiler greater room to optimize than passing by reference. Consider:

Code: Select all

int doTheThingByReference(const Bar& b) {
  int i = bar.x;
  foo();
  return i;
}


In this case, the compiler can't know that foo() isn't somehow altering b through spooky action at a distance. Whereas

Code: Select all

int doTheThingByValue(Bar b) {
  int i = bar.x;
  foo();
  return i;
}


and

Code: Select all

int doTheThingByValue(Bar b) {
  foo();
  return bar.x;
}


are guaranteed to be equivalent.
I edit my posts a lot and sometimes the words wrong order words appear in sentences get messed up.

User avatar
Thesh
Made to Fuck Dinosaurs
Posts: 6568
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Coding: Fleeting Thoughts

Postby Thesh » Wed Jul 08, 2015 2:40 am UTC

You, sir, name? wrote:In this case, the compiler can't know that foo() isn't somehow altering b through spooky action at a distance. Whereas

Code: Select all

int doTheThingByValue(Bar b) {
  int i = bar.x;
  foo();
  return i;
}


and

Code: Select all

int doTheThingByValue(Bar b) {
  foo();
  return bar.x;
}


are guaranteed to be equivalent.


Code: Select all

#include <iostream>
#include <map>

class Wtf
{
   private:
      int index;
   public:
      int x;

      Wtf();
      Wtf(const Wtf &);
      Wtf & operator=(const Wtf &);
      ~Wtf();
};

int nextIndex=0;
std::map<int,Wtf *> Ugh;

void foo()
{
   std::map<int,Wtf *>::iterator i;
   for (i=Ugh.begin(); i!=Ugh.end(); i++)
   {
      i->second->x++;
   }
}

int bar(Wtf a)
{
   int i = a.x;
   foo();
   return i;
}
int baz(Wtf a)
{
   foo();
   return a.x;
}

Wtf::Wtf(const Wtf &o)
{
   x = o.x;
   index = nextIndex++;
   Ugh[index] = this;
}

Wtf & Wtf::operator=(const Wtf &o)
{
   x = o.x;
   index = nextIndex++;
   Ugh[index] = this;
   return *this;
}

Wtf::Wtf()
{
   x=0;
   index = nextIndex++;
   Ugh[index] = this;
}

Wtf::~Wtf()
{
   Ugh.erase(this->index);
}

int main()
{
   Wtf a;
   std::cout << bar(a) << std::endl;

   Wtf b;
   std::cout << baz(b) << std::endl;

   return 0;
}
Summum ius, summa iniuria.

User avatar
Jplus
Posts: 1721
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: Netherlands

Re: Coding: Fleeting Thoughts

Postby Jplus » Wed Jul 08, 2015 6:16 am UTC

Can't wait to compile that. :)
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)

User avatar
You, sir, name?
Posts: 6983
Joined: Sun Apr 22, 2007 10:07 am UTC
Location: Chako Paul City
Contact:

Re: Coding: Fleeting Thoughts

Postby You, sir, name? » Wed Jul 08, 2015 6:32 am UTC

Should be noted that the presence of custom ctor/dtor code is obscured from developers, not from compilers.

--oh--

I thought of another benchmark to confirm the 128 byte boundary thing is caused by the plain data copy constructor.

Spoiler:

Code: Select all

struct data {
        uint32_t payload[M];
};

data x, y;
__attribute__((noinline)) int test(data& a, data& b) {
        b = a;
        return 0;
}

int main() {
  for (int i = 0; i < 0x0FFFFFFF; i++) {
          test(x,y);
  }
  return EXIT_SUCCESS;
}


Answer seems to be... yup. Graphs below:
Spoiler:
plot2-log.png
plot2-log.png (5.77 KiB) Viewed 4281 times
plot2-lin.png
plot2-lin.png (5.81 KiB) Viewed 4281 times


Clang seems much faster at coping structs than gcc though. Curious.

--edit-- label fix.
I edit my posts a lot and sometimes the words wrong order words appear in sentences get messed up.

MostlyHarmless
Posts: 154
Joined: Sun Oct 22, 2006 4:29 am UTC

Re: Coding: Fleeting Thoughts

Postby MostlyHarmless » Sat Jul 11, 2015 10:07 am UTC

Today I learned that closures are weird. Maybe tomorrow I will learn why they're useful.

User avatar
ucim
Posts: 6819
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: Coding: Fleeting Thoughts

Postby ucim » Sat Jul 11, 2015 11:31 pm UTC

What are closures?

Jose
Order of the Sillies, Honoris Causam - bestowed by charlie_grumbles on NP 859 * OTTscar winner: Wordsmith - bestowed by yappobiscuts and the OTT on NP 1832 * Ecclesiastical Calendar of the Order of the Holy Contradiction * Heartfelt thanks from addams and from me - you really made a difference.

User avatar
Thesh
Made to Fuck Dinosaurs
Posts: 6568
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Coding: Fleeting Thoughts

Postby Thesh » Sat Jul 11, 2015 11:36 pm UTC

Summum ius, summa iniuria.

User avatar
ucim
Posts: 6819
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: Coding: Fleeting Thoughts

Postby ucim » Sun Jul 12, 2015 1:24 am UTC

Alas, that page makes sense only to you newfangled computer programmers. I'm an old DOShead whose newest language is PHP (and not that object-oriented stuff either). I get the sense that "closure" doesn't really apply there (and also, it's not the same as, say, closing a file or freeing memory).

Got a simple explanation in old-world terms?

Jose
Order of the Sillies, Honoris Causam - bestowed by charlie_grumbles on NP 859 * OTTscar winner: Wordsmith - bestowed by yappobiscuts and the OTT on NP 1832 * Ecclesiastical Calendar of the Order of the Holy Contradiction * Heartfelt thanks from addams and from me - you really made a difference.

User avatar
Thesh
Made to Fuck Dinosaurs
Posts: 6568
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Coding: Fleeting Thoughts

Postby Thesh » Sun Jul 12, 2015 1:42 am UTC

A closure is simply a function that is created in the context of a second function - so it can reference any variables within that second function. See the psuedocode on the Wikipedia page, it's pretty simple, here's a full example:

Code: Select all

function startAt(x)
   function incrementBy(y)
       return x + y #Note that this references x, even though x is a parameter of startAt
   return incrementBy


So, let's say you do this:

Code: Select all

incrementBy1 = startAt(1)
incrementBy2 = startAt(2)


This is equivalent to:

Code: Select all

function incrementBy1(y)
   return 1 + y

function incrementBy2(y)
   return 2 + y


Edit:

For a use case:

Code: Select all

function logBasen(n)
   b = log(n)
   function logx(x)
      return log(x)/b
   return logx


Then you can do:

Code: Select all

log2 = logBasen(2)
print log2(16) #prints 4
print log2(8) #prints 3
Summum ius, summa iniuria.

User avatar
ahammel
My Little Cabbage
Posts: 2135
Joined: Mon Jan 30, 2012 12:46 am UTC
Location: Vancouver BC
Contact:

Re: Coding: Fleeting Thoughts

Postby ahammel » Sun Jul 12, 2015 2:46 am UTC

ucim wrote: I'm an old DOShead whose newest language is PHP (and not that object-oriented stuff either). I get the sense that "closure" doesn't really apply there
I believe PHP does, in fact, support closures of a sort. Only crazy, like everything else in PHP.
He/Him/His/Alex
God damn these electric sex pants!

User avatar
ucim
Posts: 6819
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: Coding: Fleeting Thoughts

Postby ucim » Sun Jul 12, 2015 3:16 am UTC

It looks like this is all tied up in the idea of never letting functions take more than one variable. For example, the set of functions

function incrementBy1(y)
{ return 1 + y; }

function incrementBy2(y)
{ return 2 + y; }

could also be implemented
function increment(x, by)
{ return x + by; }

but instead there's this thing about functions within functions.

What is the advantage of doing this?

In the example:

Code: Select all

function startAt(x)
   function incrementBy(y)
       return x + y #Note that this references x, even though x is a parameter of startAt
   return incrementBy
the incrementBy function is the "closure"?

And then... when you do:

incrementBy2 = startAt(2)
is this a statement (assigning the variable incrementBy2 to the return value of startAt(2)? Which would be itself a function?

Or is it a definition of a function you are calling "incrementBy2"?

Tell me once again how sheep's bladders can be used to prevent earthquakes.

Jose
Order of the Sillies, Honoris Causam - bestowed by charlie_grumbles on NP 859 * OTTscar winner: Wordsmith - bestowed by yappobiscuts and the OTT on NP 1832 * Ecclesiastical Calendar of the Order of the Holy Contradiction * Heartfelt thanks from addams and from me - you really made a difference.

User avatar
Thesh
Made to Fuck Dinosaurs
Posts: 6568
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Coding: Fleeting Thoughts

Postby Thesh » Sun Jul 12, 2015 3:33 am UTC

ucim wrote:It looks like this is all tied up in the idea of never letting functions take more than one variable.


That's a separate concept called currying, which is an application of closures.

ucim wrote:What is the advantage of doing this?


Where closures really shine is when functions take functions as arguments. Let's say you have a map function, which calls a second function on every value in an array and assigns the new result, and you want to compute (continuing from my log example above) the base 7 logarithm on every element. Now, you can't pass log(x,7) to that map function, because the function that is passed can only take one parameter. However, you can pass logBasen(7) to that function, which has the same effect. Without closures, you would have to define a separate function in that computes the base 7 logarithm.

Code: Select all

a = [7,49,343]
map(logBasen(7),a)
print a #prints [1,2,3]



incrementBy2 = startAt(2)
is this a statement (assigning the variable incrementBy2 to the return value of startAt(2)? Which would be itself a function?

Or is it a definition of a function you are calling "incrementBy2"?


It is a statement assigning the return value of startAt - which is a function - to the variable "incrementBy2".
Summum ius, summa iniuria.

User avatar
ucim
Posts: 6819
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: Coding: Fleeting Thoughts

Postby ucim » Sun Jul 12, 2015 4:27 am UTC

Thesh wrote:Now, you can't pass log(x,7) to that map function, because the function that is passed can only take one parameter.
Why?

To me it looks like the reason is that the "x" isn't available until inside the map function, which is given the set to work on.

It looks like the map function:
map(thing_to_do, array_to_do_it_on)
is syntactic sugar for a loop:

newset = empty;
foreach(element in the set as x)
{ newset[newlement]=log(x,7); } // thing_to_do

Is this what goes on behind the scenes?

In the StartAt example,
answer=StartAt(2);
so, StartAt is called, with 2 as the argment. StartAt then calls incrementBy, feeding it the value of y. incrementBy uses the existing value of x, but where does it get y from?

Thesh wrote:It is a statement assigning the return value of startAt - which is a function - to the variable "incrementBy2".
Yeah, that's what I meant. Damn lisdexia!

Jose
Order of the Sillies, Honoris Causam - bestowed by charlie_grumbles on NP 859 * OTTscar winner: Wordsmith - bestowed by yappobiscuts and the OTT on NP 1832 * Ecclesiastical Calendar of the Order of the Holy Contradiction * Heartfelt thanks from addams and from me - you really made a difference.

User avatar
Thesh
Made to Fuck Dinosaurs
Posts: 6568
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Coding: Fleeting Thoughts

Postby Thesh » Sun Jul 12, 2015 4:57 am UTC

ucim wrote:To me it looks like the reason is that the "x" isn't available until inside the map function, which is given the set to work on.

It looks like the map function:
map(thing_to_do, array_to_do_it_on)
is syntactic sugar for a loop:

newset = empty;
foreach(element in the set as x)
{ newset[newlement]=log(x,7); } // thing_to_do

Is this what goes on behind the scenes?


Well, map is a function, not syntax so it's no more syntactic sugar than the factorial function is, but no, that would not be a correct interpretation. Use this C implementation of the map function as an example; it might be easier for to follow:

Code: Select all

void map(int (*foo) (int x), int *a, size_t length)
{
    size_t i;
    for(i=0; i<length; i++) a[i] = foo(a[i]);
}


The important thing to note is that if the function that you want to call has two arguments, you have to wrap it in another function if you want to call map, since it does not support functions with more than one argument. Using closures is just a cleaner way to handle it.

ucim wrote:so, StartAt is called, with 2 as the argment. StartAt then calls incrementBy, feeding it the value of y. incrementBy uses the existing value of x, but where does it get y from?


StartAt does not call incrementBy, startAt defines incrementBy, it then returns that as a function that takes y as a parameter.
Summum ius, summa iniuria.

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: Coding: Fleeting Thoughts

Postby EvanED » Sun Jul 12, 2015 5:03 am UTC

ucim wrote:It looks like this is all tied up in the idea of never letting functions take more than one variable.
It doesn't have to be restricted to one, you could have it take more than that. Thesh gave an example, but I'll give another one of something that I wrote a week or two ago. (Unfortunately, my example only builds closures that only take one argument too. :-))

In simplified version, I have a program that is printing a bunch of lines -- but I want to exclude certain lines from consideration. For example, I might print a line if it matches some regular expression, or doesn't match some regular expression, or contains some fixed string (which may be more efficient to implement in a dedicated manner than using the regex matching), or has a certain length.

Just a note before starting: I realized when I was 80% through that I maybe should have called the regex bits "search" instead of "match", because the interface will allow you to specify multiple regexes (or other criteria) that must all "match". But "match" for regexes usually means starting from the string start (while search can start in the middle), and it's unlikely that specifying multiple regexes to match would be a useful feature. (In actuality my thing was more complicated and it wasn't just an AND of all the criteria.) But whatever, I'll stick with it.

So suppose I have a bunch of functions as follows:

Code: Select all

bool matches_regex(string line, regex r) {
    ...
}

bool contains_substring(string line, string search) {
    ...
}

bool matches_length(string line, int desired_length) {
    return line.size() == desired_length;
}


Finally for the setup, I have some code that reads in a specification from the user of what to match and what not to match, e.g. maybe the user passes --matches-regex=... --matches-regex=... --matches-length=... as arguments, where all have to hold if a line is to be printed.

OK, so how do we write the code that determines whether to output a line?

Option 1 is to store a list of the regexes, anti-regexes, substrings, and lengths we want to match, and then do something like:

Code: Select all

bool should_print_line(string line) {
    for (regex r in regexes_to_match)
        if (!matches_regex(line, r))
            return false;
    for (regex r in regexes_to_not_match)
        if (matches_regex(line, r))
            return false;
    for (substring s in substrings_to_match)
        if (!contains_substring(line, s))
            return false;
    for (int l in lengths_to_match)
        if (!matches_length(line, l))
            return false;
    return true;
}


This works, but it's not great from a design standpoint. First, I didn't tell you how should_print_line gets all of the regexes_to_match, regexes_to_not_match, etc. lists. You could have them as globals, but that clearly sucks. You could pass them as parameters, but that's ugly and inconvenient too, especially if there are more layers between where they are populated and used. You might be able to make them members of some LineMatcher class or something; that may or may not be a good solution depending on context. Furthermore, suppose now I want some other kind of matcher. Now I have to not only write the matcher and add support to the command-line parsing, but I have to modify should_print_line to support the new matcher.

Conceptually, what I really want to do is pass should_print_line a list of predicates -- functions that will take the line as a parameter and return yes or no, and have it do something like the following:

Code: Select all

bool should_print_line(string line, list[string -> bool] predicates) {
    for (string->bool p in predicates)
        if (!p(line))
            return false;
    return true;
}

I've sort of invented a syntax here with list[string -> bool]. The string -> bool part means "a function that takes a string argument and returns a bool", and it's taking a list of those. So p, inside the loop, has type string->bool which means p(line) makes sense and evaluates to a bool.

But the problem is all my functions above take two arguments. Not just the line. Without closures, I have to somehow figure out a way to get the second parameter into should_print_line. How can I do that?

One way, which you're likely to see in a language like C++ (prior to C++11) or Java, is to do something like this:

Code: Select all

interface LinePredicate {
    bool matches(string line);
}

class RegexMatcher {
    regex regex_to_match;
    RegexMatcher(regex r) {
        regex_to_match = r;
    }
    bool matches(string line) {
        return ...;
    }
}

Then you can make should_print_line take a list[LinePredicate] and call p.matches(line).

This totally works, but the construction of that class hierarchy is a lot of boilerplate.

The other thing you can do, which you see in a lot of C APIs, is to make should_print_line take a list of pairs, where each pair is a (1) a two-argument function pointer and (2) a void* (or intptr_t). That would look something like this:

Code: Select all

typedef int bool; const bool true = 1; const bool false = 0;
struct MatchPair {
    bool (*match)(const char *, void *);
    void * param;
};
bool should_print_line(const char * line, MatchPair pairs[], size_t pairs_len) {
    for (int i=0; i<pairs_len; ++i)
        if (!pairs[i]->match(line, pairs[i]->param))
            return false;
    return true;
}

Again, it works. But there are two big problems. First, the void* means type safety is lost, and there's not a particularly good way to get around this problem. Second, the matcher still has to be only two parameters. That works for all of the example ones, but what happens if you had a matcher that takes three? You'd need to launder both things through the single param, probably by declaring another struct to hold the other two arguments, allocating an instance of that struct and filling it out when parsing the command-line option, and then being sure to deallocate it later.

By contrast, what happens if we have closures?

Well, if we have closures, we can write should_print_line exactly as we did for what I said was the ideal version:

Code: Select all

bool should_print_line(string line, list[string -> bool] predicates) {
    for (string->bool p in predicates)
        if (!p(line))
            return false;
    return true;
}


Then all we have to do is make the closures. I'll show two ways of this. The first uses the syntax Thesh used above (and is used on the Wikipedia page), which is Pythonesque. We will start out with a bit of boilerplate, which is a closure-making function for each of the matchers:

Code: Select all

def make_regex_matcher(regex_to_match):
    def matcher(line):
        return matches_regex(line, regex_to_match)
    return matcher

def make_antiregex_matcher(regex_to_not_match):
    def matcher(line):
        return not matches_regex(line, regex_to_not_match)
    return matcher

def make_substring_matcher(substring_to_find):
    def matcher(line):
        return matches_substring(line, substring_to_find)
    return matcher

def make_length_matcher(length):
    def matcher(line):
        return length_matches(length, line)
    return matcher

In this I'm showing the inner functions just calling to the ones I defined above, but you could of course just implement them right inside the respective matcher inner function; I'm sort of assuming that either the 2-argument matches_regex function is either useful separately or you already had it lying around or something.

Now we can handle command line parsing as follows. Suppose we parse --matches-regex=ABC --matches-regex=DEF --matches-length=7 into an array of pairs such as [["matches-regex", "ABC"], ["matches-regex", "DEF"], ["matches-length", 7]]; then we can run this to make a list of callable predicates:

Code: Select all

def make_predicate_list(command_line_arguments):
    predicates = [] # empty list
    for arg, value in command_line_arguments:
        if arg == "matches-regex":
            predicates.append(make_regex_matcher(value))
        elif arg == "matches-antiregex":
            predicates.append(make_antiregex_matcher(value))
        elif arg == "matches-substring":
            predicates.append(make_substring_matcher(value))
        elif arg == "matches-length":
            predicates.append(make_length_matcher(value))
    return predicates


When make_predicate_list encounters ["matches-regex", "ABC"] for example, it calls make_regex_matcher("ABC") which returns a closure where the type, if Python were typed, would be string->bool, just as the predicates list needs. And when called passing a line, that closure would match the line against the regex "ABC".

One more modification. Most languages that support closures support making them through "lambda syntax", which basically means an anonymous function. (Note that while "closure" and "lambda" are often used interchangeably, they are not always. For example, Python has full support for closures -- see the example above -- but only has very weak support for lambdas. Hypothetically you could also have the other way around: the ability to construct an anonymous function but without support for closures, but I can't think of any language that falls into that off the top of my head.)

Lambda closures will let us despense with the make_... functions as follows:

Code: Select all

def make_predicate_list(command_line_arguments):
    predicates = [] # empty list
    for arg, value in command_line_arguments:
        if arg == "matches-regex":
            predicates.append(lambda line: matches_regex(line, value))
        elif arg == "matches-antiregex":
            predicates.append(lambda line: not matches_regex(line, value))
        elif arg == "matches-substring":
            predicates.append(lambda line: matches_substring(line, value))
        elif arg == "matches-length":
            predicates.append(lambda line: matches_length(line, value))
    return predicates

Basically this is the same as above; the lambda line: matches_regex(line, value), when evaluated in the context of ["matches-regex", "ABC"], evaluates to a closure that can be called with a line and is checked against ABC.

One final note, since I've been typing this for a long time now. :-) In languages without first-class support for closures, you often see implementations of it using something like the class-based thing I gave way above. In fact, C++ supports closures and anonymous functions in C++11 and forward. For example, lamba line: matches_regex(line, value) in C++11 syntax would be written something like [value](std::string const & line) { return matches_regex(line, value); }, but the compiler will "desugar" it to something like this:

Code: Select all

// a declaration the compiler would inject:
struct __that_lambda {
    std::string /*or whatever type*/ __closed_value;
    __that_lambda(std::string v) {
        __closed_value = v;
    }
    bool operator() (std::string const & line) {
        return matches_regex(line, value);
    }
};
// then [value](std::string const & line) { return matches_regex(line, value); } just gets translated to:
__that_lambda(value)

so really C++11 lambdas evaluate to objects of some compiler-defined type. Similar in Java. So you could make an argument that the class-based solution above is really implementing closures, just with a lot of manual syntax overhead. Like almost every abstraction that programming languages provide, closures don't strictly give you more power, they just make it easier to express things.

User avatar
Thesh
Made to Fuck Dinosaurs
Posts: 6568
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Coding: Fleeting Thoughts

Postby Thesh » Sun Jul 12, 2015 5:53 am UTC

Similar to that is most uses of LINQ in C#, which makes heavy use of lambda expressions. Let's say you have a list containing log entries, and you want to get all of the entries that start with "2015-07-11" and contain the word "invalid":

Code: Select all

IEnumerable GetErrorLines(IEnumerable<string> Log)
{
    Regex r = new Regex("^\d{4}-\d{2}-\d{2} Error:");
    return Log.Where(x => r.IsMatch(x));
}


Note the "x => r.Matches(x)" defines an anonymous function (lambda) that is also a closure, which takes the string x as a parameter, checks it against the regular expression "^\d{4}-\d{2}-\d{2} Error:" and returns true if there is a match. The Where function returns all lines where the supplied function returns true.
Summum ius, summa iniuria.

User avatar
Flumble
Yes Man
Posts: 2227
Joined: Sun Aug 05, 2012 9:35 pm UTC

Re: Coding: Fleeting Thoughts

Postby Flumble » Sun Jul 12, 2015 12:00 pm UTC

Can I just call it "argument binding"/"function wrapping" or is that too specific?

Ubik
Posts: 1016
Joined: Thu Oct 18, 2007 3:43 pm UTC

Re: Coding: Fleeting Thoughts

Postby Ubik » Sun Jul 12, 2015 12:18 pm UTC

I'd say that while those words capture something of the larger context in an important way, they apply to the concept only partially. To put this in another way, they don't give this feeling of completeness.

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: Coding: Fleeting Thoughts

Postby EvanED » Sun Jul 12, 2015 3:23 pm UTC

Flumble wrote:Can I just call it "argument binding"/"function wrapping" or is that too specific?
It's not a terrible way of thinking about it, but I'd say it's a bit too specific.

In practice, usually closures are doing more than just wrapping another function call and you'd put the logic from that function into the closure itself. So in my example, I almost certainly wouldn't actually bother writing a matches_length function and then making the corresponding closure lambda line: matches_line(line, length); I would just say lambda line: len(line) == length as the closure itself.

User avatar
ucim
Posts: 6819
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: Coding: Fleeting Thoughts

Postby ucim » Sun Jul 12, 2015 4:44 pm UTC

Thanks all; much to think about.

Thesh wrote:

Code: Select all

function logBasen(n)
   b = log(n)
   function logx(x)
     return log(x)/b
   return logx

and then later,
StartAt does not call incrementBy, startAt defines incrementBy, it then returns that as a function that takes y as a parameter.
So... logBasen(n) "is" (evaluates to) log(x)/n ?

If I got this right,
baz = logBasen(foo)

works as follows:
foo is passed to the logBasen function

Inside that function, the log of foo is assigned to b. A throwaway function is defined which takes the log of its input and divides it by the newly calculated b. This throwaway function is "returned" by logBasen.

So, when logBasen is called with foo as the argument, the thing that actually happens is that baz becomes the name of a (new) function which can be passed an argument.

IOW, baz "becomes" the logx function with log(foo) as a "built-in" parameter.

Pretty clever.
Spoiler:
Along the lines of EvanED's example, I have a cleanall function which calls cleanone (which takes an array[datatype|dirty|clean|isok]. I use it to sanitize user input. cleanone() is itself a huge elseif statement (if datatype matches... then...elseif...) Perhaps this can be simplified by feeding individual cleanthis() functions for each datatype to cleanone(). Would that make each cleanthis() function a closure?
Why the name "closure"? Is it somehow related to close (opposite of open)?

Thesh wrote:

Code: Select all

    void map(int (*foo) (int x), int *a, size_t length)
    {   size_t i;
        for(i=0; i<length; i++) a[i] = foo(a[i]);
    }
The important thing to note is that if the function that you want to call has two arguments, you have to wrap it in another function if you want to call map, since it does not support functions with more than one argument. Using closures is just a cleaner way to handle it.
Is this just because of this specific implementation, or is it a feature of closures in general? For example, if I had a two dimensional map, could I do:

Code: Select all

    void map(int (*foo) (int x, int y), int *a, size_t length, size_t width)
    {   size_t i;
        for(i=0; i<length; i++)
           for (j=0; j<width; j++)
              a[i,j] = foo(a[i,j]);
    }

So now this map function supports a two argument closure, yes?

EvanED wrote:It doesn't have to be restricted to one, you could have it take more than that.
Seems like I'm on the right track.

Jose
Order of the Sillies, Honoris Causam - bestowed by charlie_grumbles on NP 859 * OTTscar winner: Wordsmith - bestowed by yappobiscuts and the OTT on NP 1832 * Ecclesiastical Calendar of the Order of the Holy Contradiction * Heartfelt thanks from addams and from me - you really made a difference.

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: Coding: Fleeting Thoughts

Postby EvanED » Sun Jul 12, 2015 5:06 pm UTC

ucim wrote:Inside that function, the log of foo is assigned to b. A throwaway function is defined which takes the log of its input and divides it by the newly calculated b. This throwaway function is "returned" by logBasen.

So, when logBasen is called with foo as the argument, the thing that actually happens is that baz becomes the name of a (new) function which can be passed an argument.
Yep, exactly.

Along the lines of EvanED's example, I have a cleanall function which calls cleanone (which takes an array[datatype|dirty|clean|isok]. I use it to sanitize user input. cleanone() is itself a huge elseif statement (if datatype matches... then...elseif...) Perhaps this can be simplified by feeding individual cleanthis() functions for each datatype to cleanone(). Would that make each cleanthis() function a closure?
Possibly? I'm not sure I follow in enough detail to be sure.

Why the name "closure"? Is it somehow related to close (opposite of open)?
People say that the "throwaway" function "closes over" the variables it captures from the surrounding scope -- so for logBasen, the throwaway function logx "closes over b" because b is the variable in the containing scopes that it uses. The way this is implemented usually, as suggested by the end of my post, is that the language will bundle the function logx up with its environment (containing b), which you can sort of view as closing something. But I don't know what "open" would mean.

So now this map function supports a two argument closure, yes?
Well, yes and no. Depends on what exactly you mean. (This is especially true because your example is a bit messed up right now and you're only passing one argument to foo. :-) I'll assume you wanted to call foo(i,j).)

C doesn't support closures at all (without doing really fancy extra-language stuff where you're dynamically generating runnable code, e.g. via libffi), so if you take that as a strict reading of C code, you couldn't give it closures.

But if you view it more broadly if you say "imagine a hypothetical C + closures", then yes. You'd pass a closure that takes two arguments. And maybe that closure really requires three other pieces of data to do it's thing, and that other data is held by the variables that were closed over when it was defined.

User avatar
ahammel
My Little Cabbage
Posts: 2135
Joined: Mon Jan 30, 2012 12:46 am UTC
Location: Vancouver BC
Contact:

Re: Coding: Fleeting Thoughts

Postby ahammel » Sun Jul 12, 2015 7:38 pm UTC

EvanED wrote:
So now this map function supports a two argument closure, yes?
Well, yes and no. Depends on what exactly you mean. (This is especially true because your example is a bit messed up right now and you're only passing one argument to foo. :-) I'll assume you wanted to call foo(i,j).)
Wait, a map that operates on array indices?

Other for multi-argument closures: Clojure (heh) has a variadic map that behaves like zipWith when given more than one sequence. E.g:

Code: Select all

map(function(a,b){ a ++ "-" ++ b }, ["kung", "pseudo"], ["fu", "code"])
=> ["kung-fu", "pseudo-code"]


Some languages provide a sortBy function which takes a two argument function to use for comparisons.
He/Him/His/Alex
God damn these electric sex pants!

Dr. Willpower
Posts: 197
Joined: Wed May 28, 2008 3:55 pm UTC

Re: Coding: Fleeting Thoughts

Postby Dr. Willpower » Mon Jul 13, 2015 4:11 pm UTC

ahammel wrote:Some languages provide a sortBy function which takes a two argument function to use for comparisons.


For some reason I remember reading a very in depth consideration into different ways to implement a generic search. The article was all about C++ implementations, while discussing the best way to-- That's what it was talking about! Type Erasure!

I wonder how this is related to closures?
Image
Hat me, bro

User avatar
Yakk
Poster with most posts but no title.
Posts: 11115
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Coding: Fleeting Thoughts

Postby Yakk » Mon Jul 13, 2015 5:27 pm UTC

Type erasure can use closures as in implementation detail.

A really light type erasure of some operation, call it Foo, with signature R(Args...), in C++ looks like:

Code: Select all

template<class Foo, class R, class...Args>
struct erase<Foo, R(Args...)>:
  std::function<R(Args...)>
{
  template<class T,
    // do not type-erase instances of erase itself:
    class=std::enable_if_t<!std::is_same<std::decay_t<T>,erase>{}>
  >
  erase( T&& t ):
    std::function<R(Args...)>(
      [pt = std::addressof(t)](Args&&...args)->R{
        return Foo{}( *pt, std::forward<Args>(args)... );
      }
    )
  {}
};

where `Foo` is some stateless function object that accepts the type to be erased as the first parameter, and the other args next.

So for print_to:

Code: Select all

struct printer {
  template<class T>
  void operator()(T const& t, std::ostream& o)const{
    o<<t;
  }
};
using print_to = erase<printer, void(std::ostream&)>;

now creates an object that can take any value that can be printed to an ostream, and erases down to the ability to print it, without actually printing it at that time.

A more efficient erase is:

Code: Select all

template<class Foo, class R, class...Args>
struct erase<Foo, R(Args...)> {
  template<class T,
    // do not type-erase instances of erase itself:
    class=std::enable_if_t<!std::is_same<std::decay_t<T>,erase>{}>
  >
  erase( T&& t ):
    p(std::addressof(t)),
    f([](void const* p, Args&&...args)->R{
      auto* pt = static_cast<std::decay_t<T>*>(const_cast<void*>(p));
      return Foo{}( *pt, std::forward<Args>(args)... );
    })
  {}
  void const* p = nullptr;;
  R(*f)(Args&&...) = nullptr;
  explicit operator bool(){ return p; }
  R operator()(Args...args)const{
    return f(p, std::forward<Args>(args)...);
  }
};

where we manage the captured state manually.

The lambda syntax in C++ is [what state is captured](arguments to the resulting function)->return value { code }.

Most parts of that are optional, other than [] (which can be empty, or just contain & or = depending on how you want to capture) and {} (which can also be empty).

Now, amusingly, this also corresponds to bind_1st or partial application, where we bind the first (or multiple) argument(s) of a function call, and return the resulting bound function. Which is related to currying, where a function of type (A,B,C)->R can be reinterpreted as a function of type A->(B->(C->R)), which we write A->B->C->R -- each function takes one 1 argument, and returns a function which takes the next one, or which is the result once we are done.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

User avatar
Xanthir
My HERO!!!
Posts: 5400
Joined: Tue Feb 20, 2007 12:49 am UTC
Location: The Googleplex
Contact:

Re: Coding: Fleeting Thoughts

Postby Xanthir » Mon Jul 13, 2015 7:55 pm UTC

Type erasure is one of those weird details of languages with strict typing but an insufficiently strong type system, so you can't define interfaces properly, right?
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))

User avatar
Thesh
Made to Fuck Dinosaurs
Posts: 6568
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Coding: Fleeting Thoughts

Postby Thesh » Mon Jul 13, 2015 8:08 pm UTC

When I hear type erasure, I think about the ability to pass a reference to a derived type to a function that takes a reference to the base type (since you no longer have compile-time information about the actual type of the object within that function). When people hear talk about using type erasure in C++, it just ends up with me being confused because in C# I never hear the term being used outside of discussing theoretical stuff about languages and C++ people talk about it like it's some super-special thing they are explicitly doing.
Summum ius, summa iniuria.

User avatar
Yakk
Poster with most posts but no title.
Posts: 11115
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Coding: Fleeting Thoughts

Postby Yakk » Tue Jul 14, 2015 12:45 am UTC

So C++ distinguishes between compile-time known information and run-time known information.

Types are mostly compile-time, but RTTI exists. It is statically determined unless the type is virtual, which implies some overhead already.

A rule of C++ is that you don't pay for what you don't use.

If a type can be invoked via () with the signature "int, double", it does not mean it is related to any other type. However, you can type erase down to that property with a bit of template tomfoolery, where you create an ad-hoc interface (C++ style virtual, or C-style void based, or aligned storage based, or exploit std::function's type erasure). You can type erase down to nearly any property with a fixed type signature, making the set of types erased and their values be stored, and the rest of the signature be exposed separately.

std::function type erases copying, invoking and destroying a copy of what you make it with. You can type erase printing, type erase convert to string (both construction and assignment), type erase indexing, type erase incrementing/dereferencing/equality (ie, iteration), etc.

Another name for it is "run time concepts" -- a concept isn't a type, but rather a kind of type. templates duck type their arguments to have certain operations on them: type erasure lets you (so long as your signature is fixed) do this at run time on distinct types.

A way of looking at it is that C#/Java generics are type erasure (almost of the work and checks are done at run time, basically).

The advantage of doing this is that you can strip type information out of type, and store it uniformly and operate on it uniformly. It throws away type information, and keeps only what you demand be kept.

It does not rely on interfaces, as interfaces imply binary layout requirements on the types mixed. Instead, it adapts any type with a custom template-written wrapper that handles the problem of interop.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

User avatar
Thesh
Made to Fuck Dinosaurs
Posts: 6568
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Coding: Fleeting Thoughts

Postby Thesh » Tue Jul 14, 2015 1:11 am UTC

C# generics do not use type erasure; the ability to create new types with all the compile time benefits rather than relying on type erasure is what sets it apart from Java.
Summum ius, summa iniuria.

Ubik
Posts: 1016
Joined: Thu Oct 18, 2007 3:43 pm UTC

Re: Coding: Fleeting Thoughts

Postby Ubik » Tue Jul 14, 2015 12:37 pm UTC

I'm tailing some log files. It would be neat if there was a utility that would otherwise just pass through what it's fed through stdin, but it would also add hourly lines programmatically.

Actually, that could also be separated into two tasks. A "reverse tee" that would read from multiple files and print any content as it arrives. Then the hourly or whatever-ly lines could maybe just be outputted by a simple loop (say, via process substitution) and given to eet as one input file.

User avatar
Yakk
Poster with most posts but no title.
Posts: 11115
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Coding: Fleeting Thoughts

Postby Yakk » Tue Jul 14, 2015 2:24 pm UTC

Thesh wrote:C# generics do not use type erasure; the ability to create new types with all the compile time benefits rather than relying on type erasure is what sets it apart from Java.
Really? Strange that they have such restrictions on types passed in. Is that just for dogmatic reasons?

I guess it can create new objects of that passed-in type, unlike Java?

Ah, of course, in C# object types are a sort of kind of object. The constructors and everything about the type are run-time concepts that can be accessed by the other side's runtime. So they can literally pass in the object type to the generic "class factory" that generates a class of that type at runtime?
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

User avatar
Thesh
Made to Fuck Dinosaurs
Posts: 6568
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Coding: Fleeting Thoughts

Postby Thesh » Tue Jul 14, 2015 2:59 pm UTC

Yakk wrote:Really? Strange that they have such restrictions on types passed in. Is that just for dogmatic reasons?


C# compiles to IL, and generics are implemented in the IL and can exist in DLLs; when you run the program and first try to create the type, the JIT compiler kicks in and it gets compiled to native. The main advantage of this is that if you can modify a generic class in a DLL without having to update all executables or libraries that call it. If there were no restrictions on types, then you couldn't know if the code works until you compile from IL into native and you would be able to update the classes in a way that break all calling code without any warnings.

So, C# generics create new types that are compiled at run-time through JIT, as opposed to Java which actually do use type erasure and use a single type for everything.
Summum ius, summa iniuria.

User avatar
Xanthir
My HERO!!!
Posts: 5400
Joined: Tue Feb 20, 2007 12:49 am UTC
Location: The Googleplex
Contact:

Re: Coding: Fleeting Thoughts

Postby Xanthir » Tue Jul 14, 2015 9:30 pm UTC

So, like, Rust has traits, which classes can opt into. A trait describes some set of methods/etc that the class has to implement, and then you can use the trait in your type declarations. For example, the Error trait requires that you already implement the Debug and Display traits, and then description() and cause() methods. You can then write a function that operates on Errors generically, depending just on the properties guaranteed by the Error trait (compile-time enforced), and pass in any object you want so long as it derives Error.

This is a first-class handling of the same sort of thing that type erasure is a hack around, yes?

(Sorry if I'm sounding dumb or something; every single time Yakk has talked about type erasure and provided sample code, it goes straight over my head.)
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))

User avatar
ahammel
My Little Cabbage
Posts: 2135
Joined: Mon Jan 30, 2012 12:46 am UTC
Location: Vancouver BC
Contact:

Re: Coding: Fleeting Thoughts

Postby ahammel » Wed Jul 15, 2015 1:20 am UTC

Xanthir wrote:So, like, Rust has traits, which classes can opt into. A trait describes some set of methods/etc that the class has to implement, and then you can use the trait in your type declarations. For example, the Error trait requires that you already implement the Debug and Display traits, and then description() and cause() methods. You can then write a function that operates on Errors generically, depending just on the properties guaranteed by the Error trait (compile-time enforced), and pass in any object you want so long as it derives Error.

Sounds a bit like a type class, yes?
He/Him/His/Alex
God damn these electric sex pants!

User avatar
Thesh
Made to Fuck Dinosaurs
Posts: 6568
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Coding: Fleeting Thoughts

Postby Thesh » Wed Jul 15, 2015 11:09 am UTC

I really hate that C++ calls its dynamic array a "vector" - then when I want to use my own type, I have to call it a column_matrix to avoid confusion, even though my type - an immutable, fixed-length, passed by value array that supports scalar addition and xor, scalar multiplication, and matrix multiplication - is much more deserving of the name.
Summum ius, summa iniuria.

User avatar
jaap
Posts: 2094
Joined: Fri Jul 06, 2007 7:06 am UTC
Contact:

Re: Coding: Fleeting Thoughts

Postby jaap » Wed Jul 15, 2015 12:03 pm UTC

Thesh wrote:I really hate that C++ calls its dynamic array a "vector" - then when I want to use my own type, I have to call it a column_matrix to avoid confusion, even though my type - an immutable, fixed-length, passed by value array that supports scalar addition and xor, scalar multiplication, and matrix multiplication - is much more deserving of the name.

That's what namespaces are for. You can have std::vector and your own vector side by side. If that is too confusing, keep your own vector inside a separate namespace too so that you have for example linalg::vector and std::vector side by side. This is exactly the reason why "using namespace std;" is bad practice.

User avatar
Thesh
Made to Fuck Dinosaurs
Posts: 6568
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Coding: Fleeting Thoughts

Postby Thesh » Wed Jul 15, 2015 12:14 pm UTC

Yeah, that's what I originally did. It's just that vector is one of the most commonly used types in the STL, so it feels odd to use it.
Summum ius, summa iniuria.

User avatar
Xanthir
My HERO!!!
Posts: 5400
Joined: Tue Feb 20, 2007 12:49 am UTC
Location: The Googleplex
Contact:

Re: Coding: Fleeting Thoughts

Postby Xanthir » Wed Jul 15, 2015 2:10 pm UTC

ahammel wrote:
Xanthir wrote:So, like, Rust has traits, which classes can opt into. A trait describes some set of methods/etc that the class has to implement, and then you can use the trait in your type declarations. For example, the Error trait requires that you already implement the Debug and Display traits, and then description() and cause() methods. You can then write a function that operates on Errors generically, depending just on the properties guaranteed by the Error trait (compile-time enforced), and pass in any object you want so long as it derives Error.

Sounds a bit like a type class, yes?

Yeah, right, same concept.
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))

User avatar
Yakk
Poster with most posts but no title.
Posts: 11115
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Coding: Fleeting Thoughts

Postby Yakk » Wed Jul 15, 2015 2:54 pm UTC

Xanthir wrote:So, like, Rust has traits, which classes can opt into. A trait describes some set of methods/etc that the class has to implement, and then you can use the trait in your type declarations. For example, the Error trait requires that you already implement the Debug and Display traits, and then description() and cause() methods. You can then write a function that operates on Errors generically, depending just on the properties guaranteed by the Error trait (compile-time enforced), and pass in any object you want so long as it derives Error.

What does "derives Error" mean?

If you have the properties of Error "accidentally", do you qualify as an Error?

std::function<void()> doesn't require anything about the value passed in, other than (A) your destructor is accessible (ie, cleanup code), (B) your copy constructor is accessible, and (C) if your value is x, you can do x() on it.

You don't have to say "this type X is a function-type". The only requirements (enforced) are duck-type (it walks like a duck, quacks like a duck, it's a duck). You can document other requirements (for example, that "<" provides a total order could be a requirement for some sorting interface).

The advantage of this is that you can take objects or types designed independently, and wrap them uniformly.

C++ has inheritance from interfaces (which are just types with pure virtual methods (methods with vtable entries, but no implementations for said methods)), which seems similar to Rust traits. It sadly takes a bit of work to say that "only accept a type that inherits from these two interfaces" in C++, but it is doable. That typically isn't what is meant by type erasure/run time concepts in C++.

Another advantage is that you can erase down to a concept that the writer of the type didn't consider, or isn't even technically *part* of the type. For example, I could erase down to the concept of "you can call the free function to_string on the type, and get a std::string". People can write "to_string" for a type they don't own, and suddenly it qualifies as a legal argument. to_string could be written in namespace std for a bunch of basic types, and those types are now augmented. Or, when you write a new type, you could write a to_string free function to ensure it qualifies. Or you could write a to_string function that does compile-time probing to see if the type has a `.to_string()` method that returns a `std::string`, and if so calls it and returns it, but if not is not considered a valid overload of the type.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.


Return to “Coding”

Who is online

Users browsing this forum: No registered users and 14 guests