
Re: Coding: Fleeting Thoughts

Posted: Tue Dec 05, 2017 2:01 am UTC
by Qaanol
Xeio wrote:Today's Google doodle is nice.

They appear to measure “shortest solution” in terms of fewest instructions, not least movement.

Re: Coding: Fleeting Thoughts

Posted: Tue Dec 05, 2017 4:08 am UTC
by hotaru
Qaanol wrote:
Xeio wrote:Today's Google doodle is nice.

They appear to measure “shortest solution” in terms of fewest instructions, not least movement.

also, the "shortest solution" is wrong for the last one.

edit #1: same for number 4.

edit #2: and number 5.

Spoiler:
number 4, 6 instructions instead of 7: { { forward, forward, right } loop 4 times, left } loop 3 times

number 5, 5 instructions instead of 6: { { forward, right } loop 10 times, left } loop 18 times

number 6, 5 instructions instead of 6: { forward, right, forward, forward } loop 11 times

Re: Coding: Fleeting Thoughts

Posted: Tue Dec 05, 2017 10:16 am UTC
by speising
hotaru wrote:
Qaanol wrote:
Xeio wrote:Today's Google doodle is nice.

Spoiler:
number 6, 5 instructions instead of 6: { forward, right, forward, forward } loop 11 times

there's a solution in 4. (hint: you're doing something 3 times)

Re: Coding: Fleeting Thoughts

Posted: Mon Dec 11, 2017 9:06 pm UTC
by Yakk
Xanthir wrote:You might be interested in looking into Streams, like RxJS and the like. The concepts are very similar, and they've thought about many of these problems already.

I've poked at them, but the documentation tends to focus on *using* them, not how they were written and how they work.

A lot of what they are doing -- passing key-value property bundles around -- isn't what I'm aiming for. I want type safety and argument descriptions and the like.

Also, they seem OO, and I've been trying to be functional instead of OO. My pipe/source/sink elements are fundamentally *functions* (with a bit of type-dressing), while RxJS seems to have *objects* as its fundamental unit. I can see how to solve it with objects; maybe I should just bite the bullet, create real objects with a richer API, and have a factory-from-function for when users don't care?

Re: Coding: Fleeting Thoughts

Posted: Tue Dec 12, 2017 12:39 pm UTC
by Tub
Yakk wrote:I've poked at them, but the documentation tends to focus on *using* them, not how they were written and how they work.

Well, they are open source...

I'm not entirely sure I read your notation right, but you seem to be trying to mix the two fundamental control principles of streams: push and pull.

In push-based systems, the Sources will generate new data whenever they feel like it, and call into their sources with the new data. This is useful in complex event processing, e.g. when data arrives in near real-time from sensors.

In pull-based systems, the call stack is the other way around. You call into your sources, requesting a piece of data, which in turn request data from their sources etc. This is commonly called an operator graph in database systems. Because your sink will only request as much data as it needs, this allows early aborting of queries suffixed with 'LIMIT n'. It's also used anywhere you get an Iterator.
Pull-based systems require all data to be available when requested, e.g. in memory or on a local disk. You can kinda get around that in a language with async/await or something, but that'll only turn your source<T> into source<Promise<T>>, and then you're usually better off with a push-based system.
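
To make the difference concrete, here's a minimal sketch in C++ (not from rxjs or any particular library; the PullSource/PushSink names are purely illustrative):

Code: Select all

#include <cstddef>
#include <functional>
#include <iostream>
#include <optional>
#include <vector>

// Pull-based: the consumer drives; each call asks the source for the next item.
template <class T>
using PullSource = std::function<std::optional<T>()>;   // empty optional = exhausted

// Push-based: the producer drives; sinks register a callback and get called later.
template <class T>
using PushSink = std::function<void(const T&)>;

int main() {
    // Pull: consume only as much as needed (cheap early abort, like LIMIT n).
    std::vector<int> data{1, 2, 3, 4, 5};
    std::size_t i = 0;
    PullSource<int> source = [&]() -> std::optional<int> {
        if (i < data.size()) return data[i++];
        return std::nullopt;
    };
    for (int taken = 0; taken < 3; ++taken)       // stop after 3 items
        if (auto v = source()) std::cout << *v << '\n';

    // Push: the producer decides when data arrives and calls its sinks.
    std::vector<PushSink<int>> sinks;
    sinks.push_back([](const int& v) { std::cout << "got " << v << '\n'; });
    for (int v : data)                            // e.g. sensor readings arriving
        for (auto& s : sinks) s(v);
    return 0;
}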

In push-based systems, no pipe or sink needs to know about its sources. In a pull-based system, no pipe or source needs to know about its sinks. Your definitions of source and sink seem to imply a push-based system, but then your pipes get a reference to both a source and a sink, and that seems weird.

You can try to create a system that somehow does both (rxjs appears to do so), but (from a cursory glance) rxjs just implements two systems in the same library with similar APIs. Pick one and start with it.


You can do everything functionally, but IMHO it's easier to model it as an (acyclic, connected, directed) graph of nodes, each being an object with a standardized API, like push<T> or observe<callback<T>> (for push-based) or get<T> or getIterator<T> (for pull-based). Passing the connections in a type-safe way in the constructor just seems cleaner and more robust than mixing connections and data in an arbitrary argument list.

Re: Coding: Fleeting Thoughts

Posted: Wed Dec 13, 2017 4:17 pm UTC
by Yakk
Tub wrote:
Yakk wrote:I've poked at them, but the documentation tends to focus on *using* them, not how they were written and how they work.

Well, they are open source...

And so is Linux. That doesn't mean it is reasonable to learn the Linux kernel's overall architecture by reading the source. ;)

Tub wrote:I'm not entirely sure I read your notation right, but you seem to be trying to mix the two fundamental control principles of streams: push and pull.

So, I'm using continuation passing style.

A function that returns X instead takes a function that is passed X.

This permits a few things. First, a function can return *more than once*, or zero times. Second, you can return *view types* into stack data.

The first seems useful for streaming operations; sometimes one unit of data becomes 3 or 0. The second gets rid of an inefficiency in returning data of statically dynamic size (static in implementation; dynamic in interface).
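
Concretely, the shape is roughly this (a minimal sketch of the idea, not my actual code; the names are made up):

Code: Select all

#include <functional>
#include <iostream>
#include <span>

// "Returning" T means calling a continuation with T.
template <class T>
using Continuation = std::function<void(T)>;

// A U -> T* element: one input may produce 0, 1, or many outputs.
void halves(int u, Continuation<int> k) {
    if (u % 2 != 0) return;   // "return" zero times
    k(u / 2);                 // once
    k(u / 2);                 // ... or more than once
}

// A view into stack data is safe to hand out, because the continuation
// runs before the local array goes out of scope.
void first_three(Continuation<std::span<const int>> k) {
    int local[5] = {1, 2, 3, 4, 5};
    k(std::span<const int>(local, 3));   // no copy, no allocation
}

int main() {
    halves(8, [](int v) { std::cout << v << '\n'; });   // prints 4 twice
    halves(7, [](int)   { std::cout << "never\n"; });   // prints nothing
    first_three([](std::span<const int> s) { std::cout << s.size() << " ints\n"; });
    return 0;
}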

Tub wrote:In push-based systems, the Sources will generate new data whenever they feel like it, and call into their sources with the new data. This is useful in complex event processing, e.g. when data arrives in near real-time from sensors.

I think you mean "call into their sinks".

Tub wrote:In pull-based systems, the call stack is the other way around. You call into your sources, requesting a piece of data, which in turn request data from their sources etc. This is commonly called an operator graph in database systems. Because your sink will only request as much data as it needs, this allows early aborting of queries suffixed with 'LIMIT n'. It's also used anywhere you get an Iterator.
Pull-based systems require all data to be available when requested, e.g. in memory or on a local disk. You can kinda get around that in a language with async/await or something, but that'll only turn your source<T> into source<Promise<T>>, and then you're usually better off with a push-based system.

Mine is a pull-based system with continuation passing style.

Tub wrote:In push-based systems, no pipe or sink needs to know about its sources. In a pull-based system, no pipe or source needs to know about its sinks. Your definitions of source and sink seem to imply a push-based system, but then your pipes get a reference to both a source and a sink, and that seems weird.

The sink is only known about insofar as I *return through it*.

Instead of T(U), we get void(U, continuation(T)). The transformation of (U->T) to (U->(T->void)->void) is a brainless one. Except we get (U->T*) to (U->(T->void)->void) out of it for free.
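
The lift itself is about this much code (just a sketch; "lift" is a made-up name):

Code: Select all

#include <functional>
#include <iostream>
#include <utility>

template <class T>
using Continuation = std::function<void(T)>;

// Wrap a plain U -> T function into the CPS shape U -> (T -> void) -> void.
template <class F>
auto lift(F f) {
    return [f](auto&& u, auto&& k) { k(f(std::forward<decltype(u)>(u))); };
}

int main() {
    auto twice = [](int x) { return 2 * x; };                // ordinary U -> T
    auto twice_cps = lift(twice);                            // now CPS-shaped
    twice_cps(21, [](int v) { std::cout << v << '\n'; });    // prints 42
    return 0;
}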

Tub wrote:You can do everything functionally, but IMHO it's easier to model it as an (acyclic, connected, directed) graph of nodes, each being an object with a standardized API, like push<T> or observe<callback<T>> (for push-based) or get<T> or getIterator<T> (for pull-based). Passing the connections in a type-safe way in the constructor just seems cleaner and more robust than mixing connections and data in an arbitrary argument list.

There isn't any data in my system. Pipes don't *take data*, they take connections.

Source, Sink and Pipe are 3 kinds of graphs. They can be composed in certain ways. If you connect a source to a pipe, you get a source of the pipe's output. If you connect a pipe to a sink, you get a sink of the pipe's input. If you connect a pipe to a pipe, you get a pipe.

If you connect a source to a sink, you get an executable object. Running it executes the graph.

You can work with any of the 3 types of graphs. You can consume data from a source by passing in a continuation. You can provide data to a sink. You can do both (at once) to a pipe to process data.

The fact that nodes are also sinks/sources or pipes simply reflects that a node is a graph.

The syntax ends up being pretty clean, and as I don't type erase unless requested everything can be inlined.
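
Roughly this shape, though this is a simplified sketch rather than the real implementation:

Code: Select all

#include <functional>
#include <iostream>

template <class T>
using Continuation = std::function<void(T)>;

// Source<T>: given a continuation, feed it zero or more Ts.
// Pipe<In, Out>: given an In and a continuation, emit zero or more Outs.
// Sink<T>: consume a single T.
template <class T>             using Source = std::function<void(Continuation<T>)>;
template <class In, class Out> using Pipe   = std::function<void(In, Continuation<Out>)>;
template <class T>             using Sink   = std::function<void(T)>;

// source | pipe  ->  source of the pipe's output
template <class T, class Out>
Source<Out> operator|(Source<T> src, Pipe<T, Out> pipe) {
    return [=](Continuation<Out> k) { src([=](T t) { pipe(t, k); }); };
}

// source | sink  ->  executable object; running it executes the graph
template <class T>
std::function<void()> operator|(Source<T> src, Sink<T> sink) {
    return [=] { src(sink); };
}

int main() {
    Source<int> nums = [](Continuation<int> k) { for (int i = 1; i <= 4; ++i) k(i); };
    Pipe<int, int> evens_doubled = [](int x, Continuation<int> k) {
        if (x % 2 == 0) k(2 * x);        // 0 or 1 outputs per input
    };
    Sink<int> print = [](int v) { std::cout << v << '\n'; };

    auto run = (nums | evens_doubled) | print;   // source + pipe = source; + sink = runnable
    run();                                       // prints 4 and 8
    return 0;
}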

What I'm missing is a clean way to handle multiple inputs/outputs. Reading the react streams has given me a few thoughts, but their system seems really tied to timers and async and property bags and all that mess.

Re: Coding: Fleeting Thoughts

Posted: Wed Dec 13, 2017 11:12 pm UTC
by Tub
Yakk wrote:
Tub wrote:Well, they are open source...

And so is Linux. That doesn't mean it is reasonable to learn the Linux kernel's overall architecture by reading the source. ;)

Eh, if you wish to save time, just skip the comments. ;)

Yakk wrote:Source, Sink and Pipe are 3 kinds of graphs. They can be composed in certain ways. If you connect a source to a pipe, you get a source of the pipe's output. If you connect a pipe to a sink, you get a sink of the pipe's input. If you connect a pipe to a pipe, you get a pipe.

Yeah, but that's exactly the part where I think you're overcomplicating things. You wish for the ability to extend the graph on all ends, and that's going to end up more complicated than a graph that you can recursively create bottom-up, or maybe iteratively create bottom-up in a long chained method call. A single data type for nodes ("source" when pull-based) should suffice, and it makes design and reasoning a lot simpler.

I've worked with quite a few operator graphs, iterator implementations, workflow systems and other "stream" implementations, and in exactly none of them did anyone see a requirement to create three different data types, or to extend graphs on multiple sides. I'm sure it can be done (though I'm too rusty to do it in a functional language), but I wonder why one should.
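
I.e., something along these lines (a pull-based sketch; the names are just for illustration):

Code: Select all

#include <functional>
#include <iostream>
#include <optional>
#include <utility>

// One node type: a pull-based source. Graphs are built bottom-up by wrapping.
template <class T>
using Source = std::function<std::optional<T>()>;

template <class T, class F>
auto map(Source<T> src, F f) -> Source<decltype(f(std::declval<T>()))> {
    using Out = decltype(f(std::declval<T>()));
    return [=]() -> std::optional<Out> {
        if (auto v = src()) return f(*v);
        return std::nullopt;
    };
}

int main() {
    int i = 0;
    Source<int> counter = [&]() -> std::optional<int> {
        if (i < 3) return i++;
        return std::nullopt;
    };
    auto doubled = map(counter, [](int x) { return 2 * x; });  // built bottom-up
    while (auto v = doubled()) std::cout << *v << '\n';        // prints 0, 2, 4
    return 0;
}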

Yakk wrote:What I'm missing is a clean way to handle multiple inputs/outputs.

Yeah, that seems easier in an OO language where you just slap an array of inputs/outputs on each object. It's also easy in a functional language if you restrict yourself to constructing the graph from one side. What you're trying to do in a functional language... well, I see where your troubles come from.

Re: Coding: Fleeting Thoughts

Posted: Thu Dec 14, 2017 4:40 am UTC
by Xeio
The heck... how the hell does an execution plan for an Index Scan read 2+ million rows from an index for a table with only 10 thousand rows?

And then when I drop and recreate the index the problem magically goes away...

Re: Coding: Fleeting Thoughts

Posted: Thu Dec 14, 2017 5:26 am UTC
by Thesh
Inaccurate statistics. Dropping and recreating the index rebuilds the statistics too. Why so many rows? Depends. If it's a loop, it could be reading the same rows multiple times and then filtering them out. Sometimes I've seen SQL Server do a cartesian join and then filter when it could have done a merge join, because fuck you that's why.

SQL Server can be quirky with its execution plans. For example, when it joins it always assumes some rows won't join; we had a huge staging database with several million rows per day per table, each record having a 1:1 correspondence, with hundreds of tables. After 5 or 6 joins, it would assume only 1 row would come back and optimize the execution plan accordingly - index seek, nested loops. The only way I could fix it was with join hints.
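
If you don't want to drop and recreate the whole thing, something like this usually does the job (the index and table names here are placeholders, and the MERGE hint is just an example):

Code: Select all

-- Refresh statistics / rebuild in place instead of drop-and-recreate:
UPDATE STATISTICS dbo.Prices;
ALTER INDEX IX_Prices_ItemID ON dbo.Prices REBUILD;

-- Forcing a join strategy when the row estimates are way off:
SELECT p.Id, p.ItemID, p.Marks
FROM dbo.Prices AS p
INNER MERGE JOIN dbo.Items AS i
    ON i.ID = p.ItemID;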

Re: Coding: Fleeting Thoughts

Posted: Thu Dec 14, 2017 5:53 am UTC
by Xeio
No loop, just a "where in" clause which apparently caused the actual read rows to grow exponentially as the number of items in the list grew.

Granted, it was still slow-ish after the index rebuild, so I tweaked the way I was requesting the data from Entity Framework. Now it just does some voodoo magic with an IQueryable rather than a list, which turned the "where in" clause into a straight inner join, which is neat.

So I learned a little about leaving things as IQueryable when I don't necessarily need to materialize them.

Re: Coding: Fleeting Thoughts

Posted: Thu Dec 14, 2017 6:21 am UTC
by Thesh
I meant a nested loop join in the execution plan; they can result in the table being read multiple times in some situations. For example, the outer table has 10,000 rows, the subquery returns 200 rows, and for each record in the outer table it reads all 200 rows from the inner table (subquery result), joins them, and then filters the result.

Re: Coding: Fleeting Thoughts

Posted: Thu Dec 14, 2017 4:28 pm UTC
by Tub
Xeio wrote:No loop, just a "where in" clause which apparently caused the actual read rows to grow exponentially as the number of items in the list grew.

WHERE IN (1, 2, 3, 4) or WHERE IN (SELECT ...) ?

The first shouldn't be much of a problem for short lists, but you'd need to figure out if longer lists get properly indexed so they can be joined. If the optimizer translates that into (WHERE a == 1 OR a == 2 OR a == 3 OR a == 4), that'd be bad. Temporary tables (or, apparently, IQueryable) can help.

The latter is a dependent subquery, and using those means taunting the optimizer. Make sure the query optimizer actually converts the dependent subquery into a proper join. Better yet, rewrite the query to be an actual join. Dependent subqueries aren't always optimized well; if the subquery is executed once per row, then you'll see what you're seeing.
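
Schematically (the table and column names here are invented, and assuming the id column is unique):

Code: Select all

-- The IN (SELECT ...) form, which the optimizer may run once per row:
SELECT o.*
FROM Orders AS o
WHERE o.CustomerId IN (SELECT c.Id FROM Customers AS c WHERE c.Active = 1);

-- The same thing written as an explicit join, which gives the optimizer
-- much less room to do anything row-by-row:
SELECT o.*
FROM Orders AS o
INNER JOIN Customers AS c
    ON c.Id = o.CustomerId
WHERE c.Active = 1;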

Re: Coding: Fleeting Thoughts

Posted: Thu Dec 14, 2017 6:32 pm UTC
by Xeio
Tub wrote:WHERE IN (1, 2, 3, 4) or WHERE IN (SELECT ...) ?

The first shouldn't be much of a problem for short lists, but you'd need to figure out if longer lists get properly indexed so they can be joined. If the optimizer translates that into (WHERE a == 1 OR a == 2 OR a == 3 OR a == 4), that'd be bad. Temporary tables (or, apparently, IQueryable) can help.

The latter is a dependent subquery, and using those means taunting the optimizer. Make sure the query optimizer actually converts the dependent subquery into a proper join. Better yet, rewrite the query to be an actual join. Dependent subqueries aren't always optimized well; if the subquery is executed once per row, then you'll see what you're seeing.

For comparison, you can look at the queries (except for the 3rd "select *" line, this is all SQL generated by Entity Framework):

Slower Where In clause:

Code: Select all

SELECT [p].[Id], [p].[ApiKeyId], [p].[ExpiresIn], [p].[ItemID], [p].[Marks], [p].[Time], [p.Item].[ID], [p.Item].[IsExtraordinary], [p.Item].[ItemCategory], [p.Item].[Name], [p.Item].[Rarity]
FROM (
    select * from Prices where Id in (select max(Id) from Prices group by ItemId)
) AS [p]
INNER JOIN [Items] AS [p.Item] ON [p].[ItemID] = [p.Item].[ID]
WHERE [p].[ItemID] IN (1871, 1898, 1908, 2063, 1909, 1918, 1915, 2056, 1893, 1913, 1892, 1873, 1904, 2066, 2062, 1919, 1911, 2058, 1891, 1906, 1914, 1899, 1901, 1905, 1916, 1896, 2060, 1894, 2057, 1870, 1920, 2064, 1903, 1900, 1872, 1895, 1912, 1907, 2065, 2059, 1897, 1917, 1973)
ORDER BY [p.Item].[Rarity] DESC, [p.Item].[Name]


Newer Faster Query:

Code: Select all

SELECT [p].[Id], [p].[ApiKeyId], [p].[ExpiresIn], [p].[ItemID], [p].[Marks], [p].[Time], [p.Item].[ID], [p.Item].[IsExtraordinary], [p.Item].[ItemCategory], [p.Item].[Name], [p.Item].[Rarity]
FROM (
    select * from Prices where Id in (select max(Id) from Prices group by ItemId)
) AS [p]
INNER JOIN [Items] AS [p.Item] ON [p].[ItemID] = [p.Item].[ID]
WHERE [p].[ItemID] IN (
    SELECT TOP(40) [i].[ID]
    FROM [Items] AS [i]
    WHERE (CHARINDEX(@__name_1, [i].[Name]) > 0) OR (@__name_1 = N'')
    ORDER BY [i].[Name]
)
ORDER BY [p.Item].[Rarity] DESC, [p.Item].[Name]

Re: Coding: Fleeting Thoughts

Posted: Thu Dec 14, 2017 9:33 pm UTC
by Yakk
So thinking about those graphs as functions has helped.

There are a bunch of ways to join single-argument functions:

Code: Select all

U->T* + U->V* = U->(T  or   V)*

U->T  + U->V  = U->(T  and  V)

U->T* + U->V* = U->(T* and V*)

U->T* + U->V* = U->((T and V)* and (T* or V*))

A->B  + U->T  = (A or  U)->(B or  T)

A->B  + U->T  = (A and U)->(B and T)

which correspond to certain graph transformations.

I'll note my decision to have a U->T* primitive instead of a U->T primitive prevents some of the simpler transformations above (tupling output) from working.

There are probably simpler ways to "add" functions together to run in parallel. There is also the idea of splitting functions into two pieces (a function that returns both A and B).

Continuation Passing Style only "really" supports arguments with a * at the end.

U->T* + U->V* = U->(T or V)* is easy,
A->B* + U->T* = (A or U)->(B or T)* is also easy
A->B* + U->T* = (A and U)->(B or T)*

So I think I can conclude that by restricting myself to T* return values (0 or more copies of the return value), I cannot easily manage the "regrouping" into tuples without doing allocation while passing.

Unless I can think of a better way to express (X* and Y*) other than (X or Y)*.

Maybe I should just embrace (X or Y)* -- the idea that streams get individual changes. Mayhap what I need is "word" barriers or something, and other than that there is no atomic changing of more than one thing.

So unpacking a tuple is (As and ...)->(As or ...)*
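
For the record, the easy (T or V)* case from above really is only a few lines; a sketch with std::variant standing in for "or":

Code: Select all

#include <functional>
#include <iostream>
#include <string>
#include <variant>

template <class T>
using Continuation = std::function<void(T)>;

// U -> T*  plus  U -> V*  becomes  U -> (T or V)*.
template <class U, class T, class V>
auto join_or(std::function<void(U, Continuation<T>)> f,
             std::function<void(U, Continuation<V>)> g) {
    return [=](U u, Continuation<std::variant<T, V>> k) {
        f(u, [&](T t) { k(std::variant<T, V>(std::in_place_index<0>, t)); });
        g(u, [&](V v) { k(std::variant<T, V>(std::in_place_index<1>, v)); });
    };
}

int main() {
    std::function<void(int, Continuation<int>)> doubled =
        [](int u, Continuation<int> k) { k(2 * u); };
    std::function<void(int, Continuation<std::string>)> label =
        [](int u, Continuation<std::string> k) { if (u > 0) k("positive"); };

    auto both = join_or(doubled, label);
    both(3, [](std::variant<int, std::string> out) {
        std::visit([](auto&& v) { std::cout << v << '\n'; }, out);   // 6, then "positive"
    });
    return 0;
}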

Re: Coding: Fleeting Thoughts

Posted: Thu Dec 14, 2017 9:42 pm UTC
by Thesh
Xeio wrote:Newer Faster Query:


If that is causing problems, the Prices subquery is a good candidate for an indexed view.


But this made me throw up in my mouth:

Code: Select all

    SELECT TOP(40) [i].[ID]
    FROM [Items] AS [i]
    WHERE (CHARINDEX(@__name_1, [i].[Name]) > 0) OR (@__name_1 = N'')
    ORDER BY [i].[Name]

Re: Coding: Fleeting Thoughts

Posted: Thu Dec 14, 2017 9:57 pm UTC
by Xeio
Thesh wrote:If that is causing problems, the Prices subquery is a good candidate for an indexed view.

Initially I had assumed that would have to be the slowest part of the query, but I ran into some other performance problems (including the above) that dwarfed any issue the subquery may have caused.

I've still been thinking about making the "most recent price for each item" a table of its own and just using a trigger to keep it up-to-date, but I'm reasonably happy with performance at the moment.
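
If I ever do, it'd probably look something like this (a rough sketch; the LatestPrices table and trigger name are made up):

Code: Select all

CREATE TABLE LatestPrices (
    ItemID  int NOT NULL PRIMARY KEY,
    PriceId int NOT NULL
);
GO

CREATE TRIGGER trg_Prices_Latest ON Prices
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Fold the newly inserted rows into the one-row-per-item table.
    MERGE LatestPrices AS t
    USING (SELECT ItemID, MAX(Id) AS PriceId FROM inserted GROUP BY ItemID) AS s
        ON t.ItemID = s.ItemID
    WHEN MATCHED AND s.PriceId > t.PriceId THEN
        UPDATE SET PriceId = s.PriceId
    WHEN NOT MATCHED THEN
        INSERT (ItemID, PriceId) VALUES (s.ItemID, s.PriceId);
END;
GO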

It no longer takes >1s to render (@Html.DisplayFor IS SLOOOOW) or >6s to do a search (before I rebuilt the index and changed the query).


Thesh wrote:But this made me throw up in my mouth:

Code: Select all

    SELECT TOP(40) [i].[ID]
    FROM [Items] AS [i]
    WHERE (CHARINDEX(@__name_1, [i].[Name]) > 0) OR (@__name_1 = N'')
    ORDER BY [i].[Name]

Yeah, string contains is pretty ugly in particular. The LINQ isn't so bad though.

Code: Select all

var items = _marketContext.Items
    .Where(i => i.Name.Contains(name))
    .OrderBy(i => i.Name)
    .Take(60)
    .Select(i => i.ID);

var prices = await _marketContext.Prices
    .FromSql("select * from Prices where Id in (select max(Id) from Prices group by ItemId)")
    .Include(p => p.Item)
    .OrderByDescending(p => p.Item.Rarity)
    .ThenBy(p => p.Item.Name)
    .Where(p => items.Contains(p.ItemID))
    .AsNoTracking()
    .ToListAsync();