My Unix CLI manifesto, aka why PowerShell is the bees knees


Postby EvanED » Tue Dec 13, 2011 2:20 am UTC

This rant isn't really apropos of anything; I was just thinking about shells and such, and it came out a bit ranty. Whatever -- it'll help me let off steam. Note that I am conflating the shell a bit with the core binaries like ls, but I think there are strong enough interactions between them that it makes sense to treat them as a unit.

I personally think that the idea behind PowerShell -- abstractly, passing objects between processes instead of just a stream of bytes -- is really awesome and wish that Unix shells would pick it up. I've seen a lot of arguments from guys with Unix beards that this breaks the simple elegance of the Unix philosophies. The purpose of this manifesto is to argue that this position is dumb, and that a rich set of tools built on the PowerShell ideas ("rich" basically means rewrite the Unix coreutils and textutils and such) would be easier to use, more consistent, and in the end, more Unixy.

So first, what's my vision? PowerShell uses .Net objects on pipes, but you could also pick some textual representation of objects. JSON seems promising to me, but it's not the only choice. (TermKit uses JSON. I'm not enamored with everything they do, but I think that was a good choice.) I don't think the particular choice of object format is very relevant; the important bit is that programs would tend to operate on objects as a whole. For instance, ls would output a stream of objects with fields for the file name, last modification time, size, etc. grep would take a stream of objects and match a whole object at a time, likely having flags for "search names of keys", "search the value of this particular key", "search the values of all keys", etc.
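To make this concrete, here's a rough sketch of what I have in mind. The records, field names, and the --key flag below are invented for illustration; none of this is a real tool today:

$ ls
{"name": "notes.txt", "size": 1432, "mtime": "2011-12-12T21:14:07Z", "type": "file"}
{"name": "src", "size": 4096, "mtime": "2011-12-10T09:02:51Z", "type": "dir"}
$ # match against one field of each record instead of grepping a formatted line
$ ls | grep --key=name notes
{"name": "notes.txt", "size": 1432, "mtime": "2011-12-12T21:14:07Z", "type": "file"}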

So what benefits would this have?

Why find sucks

Right now, there's a lot of effort that goes into getting parsing and output code to work right, and a lot of bugs that arise because of it. How many times have you seen find and xargs used together? (We'll come back to -exec.) What percentage of those do not pass -print0 to find and -0 to xargs? Basically every one of those commands is buggy, and will fail with filenames with spaces. (Let alone weirder characters.) In general, dealing with obnoxious filenames like this can be a PITA -- and it's one that can be largely obviated if you have programs which deal with objects.

Imagine if find outputted a sequence of Filename objects (or perhaps a sequence of Strings; you might have to go with the weaker typing to keep today's shells' flexibility) and xargs read a sequence of such objects. By its very nature, the object format delimits the filenames properly, so this problem evaporates.
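To spell that out: the -print0/-0 pair below is the real GNU workaround that exists today; the last pipeline is the hypothetical object-pipe version, where the naive command is simply correct because each filename travels as a whole value:

$ # today: broken on filenames containing spaces or newlines
$ find . -name '*.log' | xargs rm
$ # today: correct, but only with the GNU -print0/-0 extensions
$ find . -name '*.log' -print0 | xargs -0 rm
$ # hypothetical object pipes: the same naive command, with nothing to escape or delimit
$ find . -name '*.log' | xargs rm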

"What about -exec?" I hear you ask. Personally, I think that this exemplifies the anti-Unix. (We'll see more of this later.) What's one of the core Unix philosophies? "This is the Unix philosophy: Write programs that do one thing and do it well." (Yes, Doug McIlroy continues by saying you should write to text streams, but work with me here.) So why do we have find doing two things? It's both looking through the file system, and it's going off execing new processes. But we already have a program to do the latter, given a file name... it's called xargs!

And of course, woe betide anyone who generates a list of filenames with some program that doesn't have an -exec flag.

By making find output a list of strings and xargs read a list of strings, you can split up those two tasks and make the pair more Unixy.

Why ls and ps suck

Let's look at another example. Suppose I want a list of files in the current directory, sorted by file size. How do I do it? ls -S. What about a list of processes sorted by CPU time? ps --sort=%cpu, or something like that. What about a list of lines in a file? sort foo.txt.

But wait... there's something strange in there. Let's look at that again. ls and ps support --sort options... but we already have a sort command! Why did those programs feel the need to add those options?

Of course the answer is that you can pipe the output of ls or ps to the sort utility, but now you have to figure out how to specify the field. We're back to the text parsing problem -- the problem that disappears if you use objects.
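Spelled out (the first two pipelines are real commands that work today, but notice how they hard-code knowledge of column positions; the last two use the hypothetical object-based sort I describe below):

$ # today: size is column 5 of `ls -l` (and you still have to skip the "total" line)
$ ls -l | tail -n +2 | sort -k5 -n
$ # today: %CPU is column 3 of `ps aux`
$ ps aux | sort -k3 -rn
$ # hypothetical object pipes: name the field, not the column
$ ls | sort --field=size
$ ps | sort --field=cpu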

Which of the following sounds more Unixy?

  • Option 1:

    • The author of ls implements a bunch of code to read the --sort flag and its cousins and then actually sort the files before it outputs them
    • The author of ps implements a bunch of code to read the --sort flag and its cousins and then actually sort the processes before it outputs them
    • The author of sort writes sort
    • The user comes along and learns that ps and ls take --sort, learns their (different) syntaxes, and learns about the sort utility
    • The user runs those three different commands I gave above
  • Option 2:
    • The author of ls... doesn't do anything extra to support sorting
    • The author of ps... doesn't do anything extra to support sorting
    • The author of sort writes sort
    • The user learns that ls outputs records with a size field (which, BTW, they'd see through normal use of the program, at least in the equivalent of -l, and would likely not have to look up in the man page), the user learns that ps outputs records with a cpu field, and learns how to use sort
    • To sort files by size, the user runs ls | sort --field=size; to sort processes by cpu time, the user runs ps | sort --field=cpu; to sort a file, the user runs sort foo.txt

Which one of those follows the "write one program that does one thing well" philosophy?

Which one is more consistent? Which one is the user going to learn easier?


Why find sucks, take two

But wait. Let's go back to find. Because I have an odd question.

Why do we even have find in the first place?

And the reason, of course, is that it's currently necessary. If I want to find all the files bigger than a certain size without using find, it's next to impossible. But let's look at a world with object pipes.

In a world of object pipes, you would have something like grep which would select objects that match some criteria. So why not have that utility understand numbers?

That turns find -size +10M (whose syntax, btw, I had to look up and then ask a friend about, because the information you need to come up with it isn't in any single place in find's manpage) into ls | select where size ">" 10000000.

Which is more unixy? Write a special-purpose program (find) with a terrible, arcane syntax which is broken if you use its default (non-print0) mode with another program... or write one generally-useful select program, pair it with ls, and trash find?

(OK, so the 10000000 instead of 10M is obnoxious. I'm not sure how to deal with that. I don't claim to have all the answers, but I am quite confident that you could get this to work well.)
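(As a rough approximation you could build today: assuming a hypothetical ls that emits one JSON object per line, a real tool like jq can already play the part of select, and it can do the unit arithmetic for you:)

$ ls | jq -c 'select(.size > 10 * 1024 * 1024)'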

Summary

I think we should replace the whole Unix command-line userland to support object piping. (I will avoid the powderkeg of saying that I think we should also extend the filesystem to support object storage. Just imagine either using a textual-at-the-base format like JSON or XML, or that your objects support serialization and deserialization.)

Rethinking how the command line looks will make it possible to arrive at a set of utilities which are easier to use, more consistent, and in the end, fit Unix's "write a program that does one thing well" philosophy far better than today's Unix programs actually do.

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby markfiend » Wed Dec 14, 2011 11:52 am UTC

What you say here...
EvanED wrote:abstractly, passing objects between process instead of just a stream of bytes
... well, I'm ignoring that word "abstractly" here for the sake of argument, but at some level, your object will have to be represented by a stream of bytes.

The Unix tools already work on "a stream of bytes", which means that they will work on any stream of bytes, and not just on the EvanED-object-representation stream of bytes. You seem to be suggesting breaking all of these tools so that they only work on your object model, rather than on any arbitrary bytestream.

What would a shell-user do if she wanted to run (say) grep on a standard text file and only had your version?

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby troyp » Wed Dec 14, 2011 11:00 pm UTC

markfiend wrote:The Unix tools already work on "a stream of bytes", which means that they will work on any stream of bytes, and not just on the EvanED-object-representation stream of bytes. You seem to be suggesting breaking all of these tools so that they only work on your object model, rather than on any arbitrary bytestream.

What would a shell-user do if she wanted to run (say) grep on a standard text file and only had your version?

Presumably, there'd be some simple built-in way to create a text object. I think the main point of EvanED's rant is that raw text is just too unstructured a format to be ideal[1]. There's not enough scope to differentiate between different uses of the same character (eg. space in a filename vs between filenames) so you have to start escaping things all the time. You say that his scheme would stop tools from working on an arbitrary bytestream, but that's the whole point: if you don't accept an arbitrary bytestream, it means you've got some bytes reserved for *your* use. You could use them to structure your input data as you pass it about, eg. passing multiple arguments instead of just one, or providing keyword arguments[2]. This would mean you don't have to rely on fragile kludges like using spaces and dashes for these kinds of things. The cost is that when you pass arbitrary data it would have to be wrapped/escaped/indicated somehow. The idea would be to make this as simple as possible.
(I hope I haven't misrepresented the point too badly)

Still, I'm also curious about how this system would handle text, especially in the case of programs that typically receive input directly from the keyboard. I don't doubt you can handle that cleanly, I'm just curious what your proposal would be.

[1] Of course, you don't want data *too* structured either - that's clearly ununixy - just *slightly* structured.
[2] I should probably point out that I'm using "input" and "argument" in a general way here, to mean all the data a command receives, and the individual pieces of it.

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby Derek » Wed Dec 14, 2011 11:24 pm UTC

troyp wrote:Still, I'm also curious about how this system would handle text, especially in the case of programs that typically receive input directly from the keyboard. I don't doubt you can handle that cleanly, I'm just curious what your proposal would be.

Termkit (since Evan referenced it) handles this by making input from other processes (via pipes) and from the user (via keyboard) separate.

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby troyp » Thu Dec 15, 2011 12:58 am UTC

Derek wrote:
troyp wrote:Still, I'm also curious about how this system would handle text, especially in the case of programs that typically receive input directly from the keyboard. I don't doubt you can handle that cleanly, I'm just curious what your proposal would be.

Termkit (since Evan referenced it) handles this by making input from other processes (via pipes) and from the user (via keyboard) separate.

So it expects JSON from piped output but raw text from stdin? I assume that's only a default though, and there's some sort of switch to use the other format? So if you wanted to enter JSON from stdin you could? Or do you have to use another command to collect the input and then pipe it?

all my sentences are questions...(aren't they?)

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby Derek » Thu Dec 15, 2011 4:07 am UTC

troyp wrote:So it expects JSON from piped output but raw text from stdin? I assume that's only a default though, and there's some sort of switch to use the other format? So if you wanted to enter JSON from stdin you could? Or do you have to use another command to collect the input and then pipe it?

Something like that. I've never used it myself, I only know what you can read on the page for it (EvanED posted a link).

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby EvanED » Thu Dec 15, 2011 4:38 am UTC

markfiend wrote:The Unix tools already work on "a stream of bytes", which means that they will work on any stream of bytes,

This is the crux of my argument: They don't work on "any stream of bytes", if you define "work" to mean "work in a usable way".

This is why find has to know how to exec things, and why ls has to know how to sort things: because xargs and sort don't work on the outputs of find and ls without a heroic amount of effort. Edit: OK that's a little bit of hyperbole on the part of find/xargs -- there it just requires a moderate amount of effort. But they still don't work together by default. Oh, unless you're talking about POSIX xargs, which doesn't have -0, or POSIX find, which doesn't have -print0. Then you're back to heroics.

What would a shell-user do if she wanted to run (say) grep on a standard text file and only had your version?

I'm not saying the system would be devoid of tools that would work on text files; at the very least, you'll still have READMEs and other files that are actually text, as opposed to encoding some higher-level, machine-readable structure in text.

This is also why I'm pretty happy with the idea of JSON, or something like JSON, as an object model: if you have two object-based tools that well and truly won't cooperate, you can always go through text munging as an intermediary. (My feeling is that this will be far less common than two tools refusing to cooperate now.)
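(For instance, dropping from a hypothetical JSON-emitting ls back down to plain lines for an old text-only tool would be a one-step filter -- jq is a real tool, and -r prints raw field values, one per line:)

$ ls | jq -r '.name' | wc -l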


troyp wrote:[1] Of course, you don't want data *too* structured either - that's clearly ununixy - just *slightly* structured.

Yeah, I'm not sure where the tradeoff lies. You really do want the quick-hack flexibilities that you sort of get from a shell now.

Derek wrote:
troyp wrote:Still, I'm also curious about how this system would handle text, especially in the case of programs that typically receive input directly from the keyboard. I don't doubt you can handle that cleanly, I'm just curious what your proposal would be.

Termkit (since Evan referenced it) handles this by making input from other processes (via pipes) and from the user (via keyboard) separate.

I don't actually like this idea at all, really. In my ideal world there would be virtually no detection of "am I connected to a terminal or a user"; about the only thing where that's good is if you have an editor or ncurses-style app or something like that. I don't think it's particularly appropriate for, say, ls (which does it now) even if the current benefits (nice compact display when the user runs it and easy-to-handle, one-file-per-line when it's piped somewhere) mean it's sort of worth it. But I'd rather see ls output a list of files and have some higher-level formatter decide that it can display them to the user in columns. :-)

Think about how you build up more complex pipelines. (Or at least how I build up more complex pipelines.) Start with a couple commands, then tack on more as you go along until you get what you want. If a program changes behavior when you pipe it to something else, it means that what you saw isn't what the next program sees.

(Fortunately, most uses of this kind of switching seem to be benign. I do have --color=force set on a few aliases though...)

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby markfiend » Thu Dec 15, 2011 11:16 am UTC

EvanED wrote:This is the crux of my argument: They don't work on "any stream of bytes", if you define "work" to mean "work in a usable way".

This is why find has to know how to exec things, and why ls has to know how to sort things: because xargs and sort don't work on the outputs of find and ls without a heroic amount of effort. Edit: OK that's a little bit of hyperbole on the part of find/xargs -- there it just requires a moderate amount of effort. But they still don't work together by default. Oh, unless you're talking about POSIX xargs, which doesn't have -0, or POSIX find, which doesn't have -print0. Then you're back to heroics.

The phrase "a bad workman blames his tools" is springing to my mind...

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby EvanED » Thu Dec 15, 2011 3:24 pm UTC

markfiend wrote:
EvanED wrote:This is the crux of my argument: They don't work on "any stream of bytes", if you define "work" to mean "work in a usable way".

This is why find has to know how to exec things, and why ls has to know how to sort things: because xargs and sort don't work on the outputs of find and ls without a heroic amount of effort. Edit: OK that's a little bit of hyperbole on the part of find/xargs -- there it just requires a moderate amount of effort. But they still don't work together by default. Oh, unless you're talking about POSIX xargs, which doesn't have -0, or POSIX find, which doesn't have -print0. Then you're back to heroics.

The phrase "a bad workman blames his tools" is springing to my mind...

So if I give you a steak knife and ask you to rip a sheet of plywood for me (or I give you a table saw and ask you to help slice some cheese), and you whine, you're a bad workman?

If it's reasonable to pair ls with sort, why does ls sort things itself? Do the developers of ls just like adding useless features for the hell of it?

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby Yakk » Thu Dec 15, 2011 3:50 pm UTC

The object system has an advantage over a JSON stream because serializing objects is non-free.

In particular, let's look at your find -size +10M vs ls | filter where megsize > 10

In a push based model, ls has to dump all information that its readers could want, or you'd have to tweak what data it outputs in the ls command.

In a pull based model, filter only requests what it needs to know from ls, and then passes the ls pull option through.

Pull based models require either bidirectional communication, or outputting objects (in a generator-like way) with lifetimes (so we can pass-through, and mutate).

It gets worse when you think about the pretty print option. The console needs to know how to display information in a user-readable fashion -- which either requires rather extensive "best view" information about every possible program, or programs need to format their own output. Which means that ls is not just dumping every possible field in a JSON that the next process requires, it is pretty-printing and formatting everything!

And the problem of making such output better looking becomes ridiculous -- if you want to be able to modify someone else's pretty-print output, you need to do a parallel UNIX style string formatting effort. Of course, I'm not sure how this is much easier with the object model.

Then again, maybe that kind of formatting and extraneous output isn't a serious cost in today's universe.

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby EvanED » Thu Dec 15, 2011 4:40 pm UTC

Yakk wrote:The object system has an advantage over a JSON stream because serializing objects is non-free.

There are a lot of disadvantages as well. In particular, it means that you need a cross-language object model for every language people might want to write a shell program in. In other words, all of them. You also need the serialization/deserialization anyway so that you can store intermediate results in a file (which I feel is very important).

You could rig something up encoding dynamic objects in a C struct I guess, but that of course has its own concerns.

It gets worse when you think about the pretty print option. The console needs to know how to display information in a user-readable fashion -- which either requires rather extensive "best view" information about every possible program, or programs need to format their own output. Which means that ls is not just dumping every possible field in a JSON that the next process requires, it is pretty-printing and formatting everything!

Ah, what I envision here is a generic* pretty-printing formatter which the shell basically tacks on to the end of any pipeline ending with one of these object-based utilities.

* I do think it's possible to write a reasonably generic formatter. For instance, programs like ls and ps that I think the objects work really well for lend themselves very naturally to a tabular format. It should probably be possible to include some formatting information, like what columns to include/omit by default, or perhaps something akin to a printf specification of how to print each object. I think this would need some brainstorming.
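To sketch the flavor with tools that exist today (assuming, hypothetically, that the upstream command emits one JSON object per line, and with an invented field list -- jq's @tsv filter and column -t are real):

$ ls | jq -r '[.name, .size, .mtime] | @tsv' | column -t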

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby Yakk » Thu Dec 15, 2011 4:52 pm UTC

If you are restricting it to JSON fields of text, we just back up a stage.

We have an interface that lets you query for a particular field by name. The content of fields is text.

Note that "flat JSON" is a valid view of this data. But instead of an object model to handle a stream of text, we have an object model to handle a generator (in the python sense) of the interface, and a field list/query interface.

(A more complex version might allow for a tree structure, but that leads to complexity that might not be warranted.)

The tricky part is the two way communication, which is what lets the pipe be a lazy pipe based on pulling instead of pushing.

There is also the problem with persistence of such objects, which JSON sidesteps. How long must such objects be valid? Having tool writers have to provide a persistent object that can be repeatedly queried for information might make writing tools harder than just having to dump them to JSON. However, writing a simple JSON<->the above interface mapping is really easy: tools that produce JSON can be transparently treated like tools that produce the interface, and tools that consume JSON can be fed by tools that produce the above interface mapping, and vice versa.

The advantage of the above interface mapping is that you can include expensive queries in your tool output and have them be pulled.

---

The generic display is interesting. However, if we display everything, we end up having to overly restrict what data we provide to the stream. We could have a "pretty print" list of fields that by default should be displayed. Later consumers can echo this down the tool chain, or change it in any way they like.

This would allow ls to output tsize, gsize, msize, ksize and size, while only one of them would be pretty-printed -- or, even the pretty-print size field might be distinct from the numerical size fields (so it can contain stuff like 20k instead of 20).

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby Meteorswarm » Thu Dec 15, 2011 4:54 pm UTC

EvanED wrote:* I do think it's possible to write a reasonably generic formatter. For instance, programs like ls and ps that I think the objects work really well for lend themselves very naturally to a tabular format. It should probably be possible to include some formatting information, like what columns to include/omit by default, or perhaps something akin to a printf specification of how to print each object. I think this would need some brainstorming.


And what about when people start passing arbitrary data structures on this object pipeline? There's no way you can come up with a generic pretty-printer that performs well on trees, forests, graphs, and whatever other wacky data structures people use, while the programs using those structures would know how to interpret and represent them.

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby EvanED » Fri Dec 16, 2011 6:54 am UTC

Oh man, you don't even need a -l flag to ls... you can just pipe to stat. :-)

Yakk wrote:(A more complex version might allow for a tree structure, but that leads to complexity that might not be warranted.)

I think sort of "limited-depth" trees would be useful. For instance, ls (or stat!) would want to output objects with a mtime field which is a date-time -- but it would be useful for that date-time to itself have explicit fields for year, month, day, etc.
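Something like this, say -- the nested fields are invented for illustration, and select is the same hypothetical utility from my OP:

$ stat notes.txt
{"name": "notes.txt", "size": 1432, "mtime": {"iso": "2011-12-12T21:14:07Z", "year": 2011, "month": 12, "day": 12}}
$ stat notes.txt | select where mtime.year "=" 2011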

What I see as being very far out-of-the-norm is more general trees, or graphs.

The tricky part is the two way communication, which is what lets the pipe be a lazy pipe based on pulling instead of pushing.

So I swung a little toward your view for a bit because of this argument. I was thinking about having each utility be a shared library with, say, a next_object function. (Probably not exactly this, but let's just play with the idea.)

But I was talking to a friend, and he reminded me that Unix pipes have a fixed and relatively small buffer size. So if you run foo | bar, and foo is outputting way more than bar wants by some point, then foo will block until bar clears some of the backlog.

While this isn't quite pull semantics, it seems Good Enough when you consider the benefits of sticking with the fundamentally-textual nature of something JSON-like.
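(You can see that backpressure-plus-SIGPIPE behavior with ordinary tools today: seq is nominally asked for a billion lines, but the pipe keeps it roughly in step with head, and it's killed as soon as head exits.)

$ seq 1 1000000000 | head -n 3
1
2
3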

This would allow ls to output tsize, gsize, msize, ksize and size, while only one of them would be pretty-printed -- or, even the pretty-print size field might be distinct from the numerical size fields (so it can contain stuff like 20k instead of 20).

My current feeling is that there should be some sense of weak typing used in here. There are a bunch of "special" kinds that show up with a lot of Unix commands, like file names, permission masks, sizes, date-times, etc.

I'm not sure whether you want to have types for those in the traditional sense, or just have some indicator which says something like "format this field as if it were a size" or what, but I think the answer involves some sense of semantic markup along those lines.

Meteorswarm wrote:And what about when people start passing arbitrary data structures on this object pipeline? There's no way you can come up with a generic pretty-printer that performs well on trees, forests, graphs, and whatever other wacky data structures people use, while the programs using those structures would know how to interpret and represent them.

Oh, I fully agree there. However, like I said above, I expect these cases to be rare. And even when they do arise, there's nothing to prevent the programmer of the utility that outputs them in the first place from providing a separate pretty-printing program. Sure, it splits things up and isn't ideal in that regard, but I still think the tradeoffs go in favor of keeping a textual intermediary.

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby Yakk » Fri Dec 16, 2011 2:19 pm UTC

As noted, Unix pipes are pull-based. But the only argument on the pull is "more, please".

What if our request wasn't merely "more, please"?

We could even do away with the objects completely and only stream JSON. But instead of merely saying "more, please", there is a back channel where the reader of the pipe says what it wants to be told, where "everything" or "default data" is an option (and even the default option if you don't hook up the back channel).

So the dumb data dump would be equivalent to "default" "next" commands on the back pipe.

A smarter reader could say "megsize", check if it is bigger than 20 megs, and if not, say "next".

A filter (on megsize > 20) would do this for each "line", and when it found something with > 20 megs, would wait for its reader to request data, and echo any such requests back up the chain, until it got a "next".

The shell could easily take a dumb json dumper and parse the fields, allowing even dumb json dumpers (the one-way pipe output ones) to act like two-way pipe programs as far as the consumer is concerned. And it's even easier for the shell to make dumb readers (one-way pipe readers) look like smart readers -- "default", "next", repeated until end of file.

I don't know if it is worth it, but it would be neat.

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby phlip » Mon Dec 19, 2011 3:56 am UTC

But then you're getting into the realm of things that would probably end up going better if you just scrapped all pretenses and built a full RPC system... because that's what this is turning into - a system where the in/out pipes are RPC objects, and there are a bunch of built-in RPC interfaces available (list, map, etc) to allow clean interop between various tools.

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby markfiend » Mon Dec 19, 2011 9:38 am UTC

EvanED wrote:So if I give you a steak knife and ask you to rip a sheet of plywood for me (or I give you a table saw and ask you to help slice some cheese), and you whine, you're a bad workman?

If it's reasonable to pair ls with sort, why does ls sort things itself? Do the developers of ls just like adding useless features for the hell of it?

Heh. Touché.

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby markfiend » Thu Dec 22, 2011 3:11 pm UTC

this may be worth reading...

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby EvanED » Thu Dec 22, 2011 5:53 pm UTC

Thanks for the link. I won't get a chance to look for a couple days, and I don't think it'll change my mind that I should try "duplicating" a lot of coreutils and see what I think about using them, but I will take a look.

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby markfiend » Sat Dec 24, 2011 10:01 am UTC

Main point:
Text streams are a valuable universal format because they're easy for human beings to read, write, and edit without specialized tools. These formats are (or can be designed to be) transparent.

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby Derek » Sat Dec 24, 2011 10:51 am UTC

I feel that much the same can be said about XML or JSON. They're more expressive than simple text streams, and they're standards for which numerous tools already exist, so you don't need anything "specialized" to manipulate them. They're slightly less human readable, but they're by no means unreadable.

I kind of feel that that philosophy was more relevant back when these standards didn't exist, and in particular programmers were tempted to use binary data in order to save a few bytes, for which every program would have its own format that would be incompatible with others. But these days we have something even better than raw text streams. In fact, I suspect that any program manipulating sufficiently complex data will end up creating an ad hoc, informally-specified, bug-ridden system to represent objects in data, so why not just use something standardized and for which tools already exist, like XML or JSON, to begin with? (Yeah, I totally just ripped off Greenspun's Tenth Rule)

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby Yakk » Sat Dec 24, 2011 1:30 pm UTC

phlip wrote:But then you're getting into the realm of things that would probably end up going better if you just scrapped all pretenses and built a full RPC system... because that's what this is turning into - a system where the in/out pipes are RPC objects, and there are a bunch of built-in RPC interfaces available (list, map, etc) to allow clean interop between various tools.

Sort of. Except it is an RPC system that a program can emulate by simply dumping JSON records of each line. I just wanted the JSON records to be (optionally) pull based instead of forcing them to be push based.

It is also a no-going-back RPC item generator by default.

Yes, it could be extended to full-fledged RPC -- heck, you could implement a smalltalk like method calling syntax (you request field [, then field object name, then field method name, then the next method prefix, and finally ] to call the method) given the ability to request named fields and an abuse of the assumed use. However, that is just a symptom of the richness of the ability to pass strings, not a condemnation of the protocol.

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby markfiend » Tue Jan 03, 2012 3:23 pm UTC

Derek wrote:I suspect that any program manipulating sufficiently complex data will end up creating an ad hoc, informally-specified, bug-ridden system to represent objects in data

Like XML or JSON? (I kid... mostly.)

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby EvanED » Wed Jan 04, 2012 4:01 am UTC

markfiend wrote:
Derek wrote:I suspect that any program manipulating sufficiently complex data will end up creating an ad hoc, informally-specified, bug-ridden system to represent objects in data

Like XML or JSON? (I kid... mostly.)

To be fair, ESR actually has reasonably nice things to say about XML in that link you posted before. His most damning critiques are (1) that it's overkill for simpler formats, (2) that the data is lost in the markup (both related things which JSON alleviates to a large degree), and (3) that it doesn't play well with the traditional Unix tools (to which I say: make new tools, perhaps like the xmltk he mentions and I still need to look into).

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby markfiend » Wed Jan 04, 2012 11:46 am UTC

EvanED wrote:To be fair, ESR actually has reasonably nice things to say about XML in that link you posted before. His most damning critiques are (1) that it's overkill for simpler formats, (2) that the data is lost in the markup (both related things which JSON alleviates to a large degree), and (3) that it doesn't play well with the traditional Unix tools (to which I say: make new tools, perhaps like the xmltk he mentions and I still need to look into).

Agreed.

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby tetsujin » Fri Jun 15, 2012 11:04 pm UTC

I've been away a while, but how can I resist a thread like this? I don't remember if I discussed my ideas with you before EvanED, though I'm thinking I probably did...

But, hell yes. I absolutely want something along these lines. I intend to make it, in fact. It's something I've worked on here and there, done a lot of thought on the design but sadly not enough on implementation yet.

First, various rebuttals:

Why Text isn't the One True Data Format:
Spoiler:
First, a common argument in favor of "plain text" formats is that they are easy to read and "universally" supported. This is only somewhat true, and to the extent that it is true at all, it's only because work was done to make it true.

For instance: code for dealing with "plain text" files is built into the standard C and C++ libraries, and most programming languages. Read line, printf, and so on. And because computer enthusiasts and programmers tend to need to work with text files from time to time, we have text editors and an arsenal of text-handling utilities at our disposal. Lots of code has been written to deal with text.

Now, suppose the same were true of another format - like a spreadsheet format that was so commonly-used that it had already been incorporated into all the various programming libraries, and everybody who did anything complicated on a computer would have loads of tools on hand built specifically for dealing with these spreadsheets. In that case, this spreadsheet format would be just as "universal" as a plain-text file.

One might counter this by saying that it still takes more work to write a new library for this spreadsheet format than for "plain text" - but that depends entirely on what you're doing with it. Once you start considering what you do with the plain text files, it becomes a lot more complicated. Do you have a regexp class handy? How about a parser generator? As the need for greater functionality, extensibility, and flexibility in a text format increases, the need for these things rises, because text, by itself, doesn't provide a way to separate "structure" from "payload". With the spreadsheet format, poking around at the format with a hex editor might be more challenging - the file may be a linked structure, or have tagged chunks with payload size recorded - but this allows the format to clearly separate "structure" from "payload", which means that once your library is written, you don't have to worry about questions like "what do I have to do in order to put my delimiter character into a field?" or "How can I add a new field to this file format without breaking compatibility with older code that doesn't need to read the new field?" XML can solve a problem like that and still be, technically, a text file - but if you're using XML, you're probably using an XML library to read and write it. And (IMO anyway) if you're smart you won't use a text editor to edit it. XML offers the reliable "structure vs. payload" separation that complicated data structures need, but despite the fact that the format is still "plain text", it is no longer "simple and universal" unless you have libraries and utilities installed to help you deal with XML.

Another possible response would be to point out that, while this kind of universality could exist for another format, it in fact does not. And that is a fair point. But what I'm driving at is that there is nothing intrinsically holy about "plain text". And while it's easier to write the basic support for "text" as a file format, once you start doing anything complicated with it, the complication falls on the user of the library. It's a question of who pays the price for dealing with the complexities of file structure: the library writer, once, or the library user, every time they write something using the library.

There are complications, as well. Newline conventions are one example (and while lots of utilities and libraries are sensible enough to deal with the issue intelligently, there are exceptions - various programs on Windows will play dumb if there's no CR before the LF, and there's the potential for archival or source control utilities to not realize that a file is text and so not bring the newline handling rules into play...) Character encoding is an issue as well, at least if you want to be able to play nice with the rest of the world. It's not quite as simple as saying "Use UTF-8 and get on with your life". Local character sets are still used in some cases for various reasons, there's UTF-8 vs. UTF-16, and when you've got Unicode you have to start dealing with things like normalization, at least if you want to handle it sensibly. Dealing with text properly these days is a challenge unto itself.


Why using a format other than "plain text" does NOT contradict the "Unix Philosophy", but, rather, embodies it better than "plain text" possibly can:
Spoiler:
There are various definitions and variations of the "Unix Philosophy". Some of them do, specifically, state that "plain text" is part of the formula.
However, I believe this is far from being the most important part of the formula. Another common clause is that the environment is built out of small, simple, single-purpose tools which are made to work together.
However, as EvanED says, the various tools have accumulated functionality which apparently doesn't belong in these "small, single-purpose" tools. ls has a suite of sorting options, in part because it's not easy to take the output of ls and sort it using the common set of Unix tools. To sort by date, for instance: you would need to format the output of ls such that it includes a date field - then pick out the date field from each line, sort lines according to that field interpreted as a date (I believe strongly in ISO-8601 - but in practice the output of ls is formatted for the user and so may be subject to locale-based decisions about date formatting) and print them out. That would not be entirely unreasonable as an inline Perl script - but it's not as convenient as it should be for such a common and simple task.
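(For the record, here's one way to do that sort-by-date today with GNU ls and sort -- which mostly proves my point about how much output-format trivia the user has to carry around: --time-style=full-iso puts an ISO date in field 6 and the time in field 7, and the tail skips the "total" line.)

$ ls -l --time-style=full-iso | tail -n +2 | sort -k6,7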
Additionally, I see the popularity of scripting languages like Perl or Python as a symptom of deficiencies in the Unix shell model as it's implemented. It's a chore to get all those "small, single-purpose" utilities to work together so instead people use a different programming language - whose libraries are chock full of "small, single-purpose" utilities which, for the most part, actually are easy to use together. The reason this is the case is because the modules and classes and functions in Python, for instance, have a real type system backing them up. There's never a question of what's a date and what's a string, how to compare dates properly, how to tell when a space in a list of names is a space and when it's a delimiter, and so on. Having this structure to the environment makes it a lot easier to use these tools together to actually get things done. In the Unix shell, all the tools use "plain text" as their "common" format - which really means there's no rules about what format any program might be using. Worse, the data interchange format for these programs is usually the same as its display format for the user's benefit, which means that the interchange format may be changed on a whim, subject to locale or preferences or a desire to make the program's output look better. Thus, for programs to communicate effectively with one another, they also need to include options to control how to parse their input, and how to format their output.
One of the really agonizing bits of this scenario, IMO, is that it's very difficult to make it any better within the framework of the current shells. For instance, in a normal programming language, if you wanted to handle XML, you would write or obtain an XML library, and make your code deal with that library. You could make this kind of approach work well in the Unix shell, but it would be tough. For instance, suppose you needed to pull records out of an XML file, filter them according to the value of one field, sort them according to the value of another, and then transform a third field before spitting the whole record out - without having XML-specific utilities to perform each of those tasks. In other words, how can a set of tools that don't deal in XML be made to handle XML data, just by the addition of a tool or library that adds "XML capability" to the shell?
And that's the issue: you can't. There's nothing you can translate XML to which will both retain the structure and data of the original format and be robust and easy-enough to parse that you can expect all the currently-existing tools to handle it. To deal with that data usefully, the tools need to be able to communicate in a format which, at least, respects the distinction between structure and payload. (In such a case, you could re-format the records so that each one contains robustly-delimited fields for the values you need to operate upon, do your work, and then translate it back to an XML record at the end.) Even better, if this format supported some notion of hierarchy, then a more direct translation of the original record would be possible, and the way the individual tools access the records of interest would be more natural and easier to understand.
One could respond to the situation by writing a suite of tools for the shell which replicates all the shell functionality that the user needs, but does it in an XML-specific manner. This works, but not for the reason you might assume: the important point here is that if you build that toolset around a format (even XML) which provides structure and reliable delimitation of values, then you have the option of processing other formats by temporarily translating to this common format, doing your work, and then translating back. In this way you could add support for any format that's comparable to XML in terms of its structure and complexity, just by adding utilities to translate bits of the data back and forth.
So if a shell environment adopts a more structured format as its "common language" for communication between tools, then the problem of satisfying that requirement of the "Unix Philosophy" of "small, simple tools that work together well" (to my mind, this is the most important aspect of the "Unix Philosophy") is easier. It's easier to implement the tools because you can target just one input and output format rather than implementing a bunch of options for controlling the specific formatting of the output or method for picking fields of interest out of the input, and there won't be as many issues with individual tools implementing flawed formats for their input or output thus causing problems - and it would be simpler for the user to use these tools, because their command pipelines wouldn't be bogged down with a lot of parsing/reformatting options.


With all that in mind, I come to the following conclusions about how a shell environment ought to be designed:
1: Core tools should all work with a common interchange format.
In the interest of making it easier to communicate complex abstractions between the "small, single-purpose tools", they should all speak the same "language". In order to make the language suitable as an intermediate representation for other forms of data it must provide at least robust delimitation of payload fields, with no restrictions on what can be contained in the payload fields, and no ambiguities about where payload ends. In order to make it efficient as an intermediate representation of other forms of data, it should be compact and somewhat flexible in terms of how it encodes the data payload.
Whether the shell's chosen data format is a new one or not, the shell's default toolset should include everything the user needs to work with the shell's interchange format - because it's those tools (and not the data format itself) which make the format "easy to use", "intuitive", "universal" - all the virtues attributed to plaintext formats by Unix traditionalists.

2: Within the set of core tools, significant attention should be paid to making it easy to translate other data formats into and out of the shell's interchange format.
Obviously it's important to be realistic on this point:
There will never come a point in time in which all programs in the world will adopt this "common" data format used by the shell's tools
It should go without saying at this point - but it tends to be something people raise as an objection to the whole idea, as in "you can't hope to get the kind of adoption for your new file format that you'd need for people to willingly switch over to it." Putting all file-formats under one meta-format was something that was attempted in the '80s with efforts like IFF, and there are various reasons it doesn't work. Suffice to say that different kinds of data often need different kinds of storage. Historical lesson learned. But if the scope of this "adoption" is considered to be more limited, then the idea still has potential as a way to provide a comfortable operating environment in which users can do their work.
Additionally, it's to be expected that the user will frequently encounter file types for which no sensible conversion method is already defined and available within the shell environment. With that in mind, it's clearly important to provide tools to perform translations. The Unix shell already includes some powerful tools to do this sort of thing (though without a versatile, structured common format available for the output) but I think there is potential to improve upon those, and produce tools that are both more capable and easier to use. So for instance, the environment should provide parser generators to help users build reliable tools for handling formatted text streams - as well as tools for "parsing" structuring conventions commonly seen in "binary" files - things like tagged chunks with payload size, delimiters, offsets to data fields and so on.

3: The shell should provide assistance to the user to the extent that it is reasonable, and the overall environment of the machine should be set up to support this.
As a basic example, consider tab-completion in command arguments. Bash and other current shells provide mechanisms to make tab-completion of arguments do various useful things: like tab-completing filename arguments for "unzip" to only those filenames that end in ".zip", or tab-completing option switches to the set of options supported by a program. The problem with these mechanisms is that they're centralized, and the set of data that makes these features work is maintained separately from the programs they apply to: which means the tab-completion feature can fall out of sync with the set of options a program really provides, for instance. So having a mechanism to de-centralize the storage of information about the kinds of assistance the shell can provide for programs on the system, and ultimately getting support for that into the distribution would be very helpful.
One of the ways in which this program meta-data can be useful is providing means to translate data from one format to another. Certain programs and certain file-formats could have sensible defaults for how they are translated into (or encapsulated in) the shell's interchange format. "find" for instance, produces a list of filenames. In cases where a particular translation is used frequently, it should be easily accessible, and easily discoverable.

5: "default" format conversions should not be "automatic"
This decision feels like a compromise but I think it's important: programs that don't speak the shell's native language should not generally be considered directly compatible with those that do. There has to be a translation step in the pipeline, and it has to be explicit in some way. I like the idea of having a shorthand syntax for specifying a translation where a sensible default exists for a particular tool (for instance, there's a sensible default for how you'd translate the output of (GNU) "find" to a data stream in the shell's "common" format). But using that default all the time would be problematic for various reasons. Some users (or all users in some circumstances) wouldn't want to. Some systems may not have GNU find set up that way.

I've already talked about some of the benefits I think the shell could derive from this approach - like being generally easier to work in, more perfectly following the model of the "Unix Philosophy" (and just about any well-designed library in any sane programming language, if you think about it) in which complex tasks are built up out of relatively simple tools, made to work together. But I've also spent some time considering more elaborate cases, trying to figure out potential benefits from this approach... For instance, loop parallelization. This is something current shells really don't address. Sometimes when you have a loop you're running (over a set of files or values or whatever) there's a benefit to parallelizing it. For instance:

$ #I want to encode my directory full of WAV files into MP3's, and add ID3 tags!
$ for f in *.wav; do mp3-encode "$f" "${f%.wav}.mp3"; id3 "${f%.wav}.mp3" -c "I encoded the hell out of this file!"; done

It's mostly a CPU-bound job so I want to take advantage of my multi-core CPU. Bash actually can do things like this:

$ for f in *.wav; do (mp3-encode "$f" "${f%.wav}.mp3"; id3 "${f%.wav}.mp3" -c "some comment") & done

But it doesn't give you a means of specifying how many jobs you want to have running at once, and if the jobs within the loop produce an output stream, bash can't merge those streams in a useful way. (With most stdio programs it'll wind up merging the output on newline boundaries, just because of the way stdio line buffering works - but in some cases the output of one job thread could wind up right in the middle of another job thread's output.)
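(For what it's worth, GNU parallel gets you part of the way there today: -j caps the number of simultaneous jobs, and by default it buffers each job's output so the streams don't interleave mid-line. mp3-encode and id3 are still the made-up commands from my example above.)

$ parallel -j4 'mp3-encode {} {.}.mp3 && id3 {.}.mp3 -c "some comment"' ::: *.wav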

"for" could be extended to provide a way to specify a program that would merge the output streams of the jobs within the loop - but then, of course, the user needs to either write such a program or have it available. But if the output of the job-threads of the parallel loop are in the interchange format, then the shell is able to provide a sensible (and flexible, and broadly applicable) strategy for merging the output streams: simply interleave them on value boundaries.

Yakk wrote:As noted, Unix pipes are pull-based. But the only argument on the pull is "more, please".


Not quite true. Suppose the program feeding the pipe is something that actually has to do a pretty significant amount of work to generate each piece of output, while the program consuming the pipe does something like test the piece of data that was generated in order to decide whether to close the input pipe or not. The generator program, once started, will run continuously. It will pause if the pipe's data buffer fills up, and stop if the consumer closes the pipe (SIGPIPE termination).

This is good enough for a pretty simple approach to quasi-lazy-evaluation. There's a good chance that the generator won't generate too much more data than it really needs to. But it will keep working until the consumer breaks the pipe. If it takes the consumer a while to figure out that it's going to break the pipe, then the generator will keep working and generate more data than it actually needs to.

So it's not really "more please", it's more like "no more just yet" (i.e. "the pipe buffer is full, so the generator's call to write() blocks") or "no more ever" (i.e. "the consumer has closed the pipe, so the generator gets SIGPIPE when it tries to write to it")... I guess this raises another issue, which is that the generator doesn't get SIGPIPE until it tries to write more data to the pipe, which may not happen until after the generator does another chunk of work. Even if you tried to get clever and had the generator write out one byte immediately before starting the next chunk of work, it'd still be a race condition: does the far end of the pipe close before or after you start the big job?
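(You can see the timing issue with plain bash. Here the "expensive work" is just a sleep, and head only wants one line:)

Code: Select all
$ # the generator logs each chunk of "work" to stderr; head exits after the first line of stdout
$ (for i in 1 2 3 4 5; do sleep 1; echo "working on chunk $i" >&2; echo "chunk $i"; done) | head -n 1
$ # typically you'll see "working on chunk 2" appear on stderr: the generator doesn't get SIGPIPE until
$ # its next write, so the second chunk's work has already been done by the time it dies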

To me the bigger issue is when I start thinking I'd like to generate a stream of data, have one program consume the data until some criterion is satisfied, and then pass the data on to another consumer after that criterion is satisfied. For instance, if the data stream were a list of e-mail messages, and the first consumer filtered out all the ones prior to a certain date, and the second consumer did some other operation on the rest. It could be done by having the first consumer act as a filter - essentially turning itself into "cat" once the condition is met... But that's kind of wasteful. It would be cool to just make the first consumer eat up all the messages prior to the date, and then hand the stream over (directly) to the second consumer.

That leads to problems of buffering and data format integrity of the stream. If the first consumer consumes too much data (like pulling an extra several bytes out of the input stream) before making the decision to close its input pipe, then the pipe will be left in a state that's not at a boundary between messages. And if the format for the pipe relies on headers at the start of the data stream or anything like that, those wouldn't be present in the "partially eaten" stream the second consumer gets. If the stream format has an easily-identifiable delimiter and the consumer is careful to never read more data out of the stream than it's actually "consuming", the model can work... But to be more generally reliable it would need a real "pull" mechanism where the consumers provide information on how much of the stream they're actually consuming.
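(This isn't hypothetical, by the way - it's exactly why the obvious "consume the first line, then hand the rest to the next consumer" trick fails on a pipe today:)

Code: Select all
$ # on a pipe, GNU head reads a whole buffer, not just the line it prints, so cat typically gets nothing -
$ # the stream has been "partially eaten" well past the line boundary
$ printf 'line 1\nline 2\nline 3\n' | { head -n 1; cat; }
line 1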
---GEC
I want to create a truly new command-line shell for Unix.
Anybody want to place bets on whether I ever get any code written?
tetsujin

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby EvanED » Sat Jun 16, 2012 2:31 am UTC

tetsujin wrote:I've been away a while, but how can I resist a thread like this? I don't remember if I discussed my ideas with you before EvanED, though I'm thinking I probably did...

I'm not sure either. I've mentioned the idea to various people over time; my OP is an amalgamation of lots of those discussions, my experiences at the Unix and Windows command lines, and other tools. (In particular, I'm deeply indebted to TermKit for the specific idea of using JSON, and perhaps for the broader idea of using a textual serialization of objects instead of building everything into Python or something and just giving it a better syntax for shell operations. I don't agree with a lot of what he did or how he did it, and I simultaneously think it doesn't go far enough, but there's a lot about it that's awesome.)

But, hell yes. I absolutely want something along these lines. I intend to make it, in fact. It's something I've worked on here and there - I've put a lot of thought into the design, but sadly not enough into implementation yet.

We should pool resources if our ideas are compatible enough. :-p

I've done a small amount of supportive coding, working in Python because I expect the language to be flexible enough to accommodate changes as I discover more about what should happen. So far, I've written a wrapper around readdir that gives an iterator-like interface and more information than Python's os.listdir(), but less than stat and without the performance penalty of stat. My plan is to work toward a JSON-emitting ls, then a "human-friendly" display utility, then start to use them and see what I want next. :-)

various programs on Windows will play dumb if there's no CR before the LF

To be fair, it's not just Windows programs that have newline problems. Actually my experience is that more Unix programs have problems with CRLF line endings than Windows programs have problems with LF.

To sort by date, for instance: you would need to format the output of ls such that it includes a date field - then pick out the date field from each line, sort lines according to that field interpreted as a date

I'll also point out: even with ls -l, the information you need to sort files by mtime completely accurately isn't present. In fact, there are technically two separate problems: recent files are shown only to the minute, with no seconds, and files more than about six months old get just a date, with no time of day at all. So you literally can't feed it to another tool and sort, no matter how easy it'd be to pick out the date column and compare. Maybe there's another ls flag that always outputs the full mtime (GNU ls does have --full-time, it turns out), but having been an on-and-off moderately heavy user of *nix for about a decade, I didn't know it off-hand -- which rather proves the point.


As a basic example, consider tab-completion in command arguments. Bash and other current shells provide mechanisms to make tab-completion of arguments do various useful things: like tab-completing filename arguments for "unzip" to only those filenames that end in ".zip", or tab-completing option switches to the set of options supported by a program. The problem with these mechanisms is that they're centralized, and the set of data that makes these features work is maintained separately from the programs they apply to: which means the tab-completion feature can fall out of sync with the set of options a program really provides, for instance. So having a mechanism to de-centralize the storage of information about the kinds of assistance the shell can provide for programs on the system, and ultimately getting support for that into the distribution would be very helpful.

Did you steal my brain when I wasn't looking? :-)

A long-term goal is to do something which would keep this information local to the program in question. My ideal picture would be something like: (1) you write a program using a good argument-parsing library (read: not getopt) that knows what options exist, what their arguments are, their help text, etc.; (2) in addition to generating the code to parse them, you generate a description of the interface, which is turned into metadata stored in or next to the program (for executables, a separate ELF section would be ideal; for scripts, either an extended attribute or (acknowledging that xattrs have a lot of practical problems, much to my chagrin) a file right next door like script.args or something, or a file somewhere else); (3) the shell gives you things like standard Unix shell tab completion and even intellisense-style completion.
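(For contrast, the current centralized approach - bash's programmable completion - is a hand-maintained registration like the sketch below, living in a completion file far away from the program itself and silently rotting when the program's options change. "my-ls" and its flags are made up here; the complete/compgen machinery is real bash.)

Code: Select all
# hand-maintained completion spec for a hypothetical "my-ls"
_my_ls() {
    COMPREPLY=( $(compgen -W "--json --long --all --help" -- "${COMP_WORDS[COMP_CWORD]}") )
}
complete -F _my_ls my-ls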
EvanED

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby tetsujin » Sat Jun 16, 2012 8:49 am UTC

Most of the work I've done so far has been to define and implement a binary interchange format for the shell. Apart from my distaste for text-encoding everything there are other reasons I think XML and JSON aren't suitable for use as the interchange format...

For instance, one thing I want to be able to do is wrap existing programs so they can output data in the interchange format with minimal impact - which is to say that when possible, I don't want to insert another process into the pipeline to translate data formats. To that end, the format I designed has provisions to prefix a data stream with a header, so that the resulting stream is a valid interchange-format stream.

So when wrapping a program that produces an output stream containing a newline-delimited list of filenames, the wrapper program would output a stream header that says "what follows is text of such-and-such encoding, with values delimited by newlines, with no provision for a newline appearing as part of a value, and the stream terminated by EOF." Once the wrapper program finishes writing out that header, it would then exec() the wrapped program (avoiding additional overhead).
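(A minimal sketch of that trick as a wrapper script - the header line itself is invented, since the real format isn't pinned down here:)

Code: Select all
#!/bin/sh
# wrap-find: describe the stream (newline-delimited text, EOF-terminated), then replace
# this process with the real program, so its output follows the header with no extra process in the pipeline
printf 'STREAM v0 type=text encoding=utf-8 delim=lf end=eof\n'
exec find "$@"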

JSON and XML can't do that for a few reasons - you can't change how values are delimited, you can't stuff data into the stream without escaping certain characters, etc. - so to wrap a program you'd need another process in the pipeline filtering its output. JSON and XML also aren't great choices for packaging binary data - and there are Unix tools that write out non-textual bytestreams, so I see no merit in either disabling that capability or forcing it through a base64-type scheme in order to work with the shell interchange format. As with the previous example, it's possible, when wrapping a program that outputs binary data, to simply prepend a header that identifies the data type going over the stream and some basic information about how it's encoded (essentially, "the remainder of the stream until EOF is one big binary payload"). In the case where multiple bytestream values need to be aggregated into a list, there are provisions for filtering the data to encapsulate it within the stream - prefixing each bytestream with a payload-size field avoids the need to filter the data but requires foreknowledge of the size of each payload, while sentinel termination rules are more flexible but require filtering the payload to avoid including the (unescaped) sentinel byte.

The work has hit a bit of a snag due to my EEE 901 flaking out constantly over the last year until I got fed up with the machine and hit it, smashing the screen... All my code is still on its flash drive. :)
---GEC
I want to create a truly new command-line shell for Unix.
Anybody want to place bets on whether I ever get any code written?
tetsujin

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby la11111 » Wed Jul 04, 2012 10:10 am UTC

As someone who writes a lot of PowerShell code at work, I've found being able to pass objects around on the pipeline to be magnificently useful.

I've also considered the possibility of implementing that functionality in a linux shell environment, and it looks like this discussion has gotten about as far as I've thought it out. Honestly, unix needs this.

It seems to me that the biggest problem would be trying to get all of the hundreds of coreutils programs to speak object... all with different maintainers, and who knows how similar the guts of those programs even are? I haven't even gotten so far as to bother looking at any GNU coreutils code.

I thought about Data::Dumper and JSON and whatnot, but my gut feeling is that writing individual wrappers, or tacking specialized I/O functions onto each individual utility, would be excessively difficult to implement and maintain.

So naturally my thoughts turned to busybox. I haven't had a whole lot of time to inspect the code and figure out exactly what goes on, but I think it's reasonable to imagine that, due to the monolithic nature of its design, the i/o interfaces surely have to be a lot more standardized internally than across the whole gamut of coreutils. (not to mention the decompressed source, with docs, is only about 15MB, and it's designed to be cruft-free, so it ought to be a lot more hackable...) So what if you just exposed the internals of (at least a subset of) busybox as a library? Since your shell would need to speak object as well, you could interface it to the language you're using for your shell (I'd personally use Perl, because I'm a masochist... and because there's already perl-shell) and access the raw data directly - saving the trouble and overhead of system() calls, marshaling and the like. Then turn them into built-ins, like PowerShell and its cmdlets. And you'd still be POSIX-compliant via busybox's built-in functionality when operating from other shells. It would be a beautiful thing.

And actually, now that I think of it, there's already a sub-project of busybox called bblib, and I believe they're working on precisely that. (I'm gonna look that up :)

If anything, this would be the area where I'd most want to contribute to such a project. I've also been itching to learn more about parsers, so there's that.

if you have anything concrete, maybe you should open up a sourceforge.
la11111

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby EvanED » Fri Jul 06, 2012 3:53 am UTC

la11111 wrote:It seems to me that the biggest problem would be trying to get all of the hundreds of coreutils programs to speak object... all with different maintainers, and who knows how similar the guts of those programs even are? I haven't even gotten so far as to bother looking at any GNU coreutils code.

It's not really just a matter of making each individual coreutils program object aware; it requires rethinking what the set of utilities should be as well.

And while I'm at it, I also think that there should be some overarching design that aims for more predictable and uniform names and command line flags. (GNU-style long options help tremendously, but don't completely solve the problem.)

I thought about Data::Dumper and JSON and whatnot, but my gut feeling is that writing individual wrappers, or tacking specialized I/O functions onto each individual utility, would be excessively difficult to implement and maintain.

Are you talking about what happens for final output to the user, or what? I sort of think that there's not much additional problem if you have to split it up, and the benefits of being able to use different languages and such for different pieces are essential. Imagine if you could only pipe the textual output of one program into another if both were written in C.

(not to mention the decompressed source, with docs, is only about 15MB, and it's designed to be cruft-free, so it ought to be a lot more hackable...)

Ah, that's a very different goal than what I'd have in terms of implementation. For a variety of reasons, I wouldn't even remotely consider hacking coreutils to get this.

Then turn them into built-ins, like PowerShell and its cmdlets. And you'd still be POSIX-compliant via busybox's built-in functionality when operating from other shells

To be honest, I view POSIX compliance as almost an anti-goal. If you want POSIX compliance, the standard utilities are still going to be sitting there for you to use. I see interactions between my utilities and POSIX stuff as being either trivial (e.g. cat) or better done via an explicit conversion function (e.g. stripping out filenames to pass to xargs).
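(By "explicit conversion function" I mean something along these lines - new-ls, where, and field are all invented names, while xargs -0 and rm are the ordinary POSIX-world tools:)

Code: Select all
$ # convert object records back into a plain NUL-delimited byte stream for a traditional tool
$ new-ls ./incoming | where size --greater 1000000 | field name --print0 | xargs -0 rm --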

if you have anything concrete, maybe you should open up a sourceforge.

It'd go on GitHub. And as a warning, it'd be in Python. :-)

I've actually just done a bit more work on this on a couple bus rides this week, and I'm starting to work on a replacement for ls. I've got some very basic functionality implemented; no command line flags are taken, but it takes a list of files and directories and does the usual ls thing except it prints out JSON stuff instead.

I can probably put up what I have. Give me a few minutes.

Edit: GitHub links:
pyreaddir
futureix (I was feeling a little cheeky when naming it :-))

Note that neither of these are really usable without some effort at the moment. Though I'd like to package up the pyreaddir one so that it is. Maybe I'll work on that next.

Edit again: Oh yeah, and you'll need Cython too. And right now it only works for POSIX systems (only tested on Linux).
EvanED

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby la11111 » Sun Jul 08, 2012 5:11 am UTC

Nice, I'll look at those in a bit.

I'd ask that you forgive my last post, given that I wrote it at a time of morning when I shouldn't even have been awake, much less trying to design an entirely new shell! Also that I hadn't done a whole lot of research on the topic.

I perused the source of busybox, only to find that it's essentially the familiar coreutils suite re-implemented, with each utility pared down to a small size and placed in its own C source file.

I also read the book suggested by markfiend earlier in this thread, and found it to be basically awesome. I highly recommend it and definitely agree with it about "the importance of being textual". Thanks markfiend! (re: http://catb.org/~esr/writings/taoup/html/textualitychapter.html) I've been spending -waay- too much time in PowerShell, obviously, and reading this book brought a great deal of insight to the problem.

In light of that read, I'm of the opinion that something like JSON would be a great solution, and in fact preferable to using any kind of binary interchange format. After all, I see two fine solutions to the problem of exchanging binary data over pipes - if that was ever even necessary in the first place. One would be simply converting the block to ASCII hex for output; the other would be something along the lines of (size=4,block=xxxx). As a 'for instance', I'd cite the existence of PNG/SNG - until I read TAOUP, I had no idea such a thing even existed.

That said - in my opinion, all we're trying to do is allow for the passing of -structured- text between programs. I can't think of any CLI utility off the top of my head that uses (or would necessarily need to use) binary data over pipes. Not even in PowerShell is this -really- ever necessary. (Sure, it's implemented that way underneath, but PowerShell is just a huge horrible .NET beast which mocks the principles of what a shell should ever be.) Even without any specialized tools, a human being would still be able to at least make sense of (name:foo;size:234234;date:...). Not only that, but it's just as easy to do rgrep /date:([^;]*);/ (and possibly easier, depending on your perspective) than it is to use some arcane combination of cut/sed/what have you.

On top of that, the original tools could stay intact, and structured I/O would simply need to be added to their functionality. To steal a PowerShell-ism, you could have something like this, in plain sh for example:

Code: Select all
$ ls --json=name,date,size
(name:foo.pl; size:234234; date:20120708)
$ ls --json=name,date,size | ft
name     date     size
========+========+========
foo.pl   20120708 234234
$ ls --json=name,date,size | perl -ne 'print "$1\n" if /name:([^;]*);/'
foo.pl
$ ls
foo.pl
$ ls -l
total 0
-rw-r--r-- 1 me me 0 Jul  8 00:13 foo.pl
$


Or however it looks in JSON - I don't know JSON. (And ft == format-table.)
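(In actual JSON, that first record would presumably be something like the line below - one object per line is a common convention. The --json flag is still the made-up one from above.)

Code: Select all
$ ls --json=name,date,size
{"name": "foo.pl", "date": "20120708", "size": 234234}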

And if one were to modify or create a new shell, you could implement a default translator for JSON output or some such, just as in PowerShell (objects get ToString()'d by default when they come out of a pipe).

The point being, it's a fairly small modification (per program) that could be extremely useful. The problem would be implementing it in all of those hundreds of different utilities, consistently, without it being a kludge or something that would piss off the greybeards - i.e. something that, if it worked, was stable, was consistent with Unix, etc., could possibly be accepted into the standard coreutils distribution, so as not to require the constant maintenance of a separate fork of coreutils! (Another good reason to start with something like busybox.)

Also, in the TAOUP book, ESR reiterates something that's easy to forget when you spend a lot of time dealing with Windows machines and fat bloaty GUI programs: fork()ing in Linux is _cheap_ - faster than threads in a lot of cases. So some sort of intermediate process to translate raw data to proper JSON and back might not be such a bad idea, if only for the sake of consistency (and to save the need to include a json.h-type library in -every- coreutils program, or add senseless bloat to libc...)
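(I.e. something like this, where "ls2json" is an imaginary little translator process and ft is the same format-table stand-in from before:)

Code: Select all
$ # out-of-process translation instead of patching ls itself; ls is real, the other two names are stand-ins
$ ls -l | ls2json | ft name size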

---

Regarding Python - I've been reading about Perl 5 and Perl 6 today... it's pretty clear that Perl 5 is slowly becoming irrelevant, and although Perl 6 is really, really awesome, it looks like it may go the way of DN Forever... so sad :( So it may be time that I learn me some Python anyway.

Honestly, the main reason I don't want to like Python is - no semicolons! That's 90% of it!

The beauty is that, done properly, it shouldn't even matter :)
la11111

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby tetsujin » Mon Jul 09, 2012 6:07 pm UTC

la11111 wrote:As someone who writes a lot of PowerShell code at work, I've found being able to pass objects around on the pipeline to be magnificently useful.

I've also considered the possibility of implementing that functionality in a linux shell environment, and it looks like this discussion has gotten about as far as I've thought it out. Honestly, unix needs this.


I think it'd be lovely, yeah. Though one thing about PowerShell is that it's based (of course) on .NET - which has numerous implications for anyone wanting to try the same sort of thing outside .NET. So for instance, on Linux it's a much harder sell to get everybody to use the same access-managed virtual machine so that we can safely pass live objects from one program to another. This is why I've been focusing most of my efforts on a design based around serializing data - going that route preserves process boundaries and allows different programming environments to speak the shell's language without having too much impact on the implementation of those environments. A good serialization language could then be extended to provide an IPC-style calling architecture, which would probably be the way to implement full "object passing" while still preserving process boundaries.

It seems to me that the biggest problem would be trying to get all of the hundreds of coreutils programs to speak object... all with different maintainers, and who knows how similar the guts of those programs even are? I haven't even gotten so far as to bother looking at any GNU coreutils code.


From a "political" standpoint, I believe this is never going to happen. (Well, I could be wrong about "never" but I think it'd be a very hard sell) I expect that most of the people behind such utilities are very strongly in the "Traditional UNIX/Posix" camp and won't want to rewrite their code to play nice with a new shell designed around serialization of robust data structures or live objects. (By some accounts, serializing anything other than plain, ad-hoc formatted text is against "The Unix Philosophy")

Best strategy, IMO, is to focus on re-implementing the things we need, and make the new shell popular enough that people will ultimately want to start playing in its backyard, "Unix Philosophy" or no.

In the mean time, one must assume that the rest of the world will not rewrite their code to work nicely in the new shell. Some may even violently oppose the idea. So the best bet, I figure, is to make the new shell (including its very own suite of core programs) work really nicely, but also make sure that it's still a good environment for working with all the other programs in the world.

The last bit has been a challenge for me as I've thought about shell syntax. There's lots of things I want to do with syntax, like reclaim "<" and ">" for use as infix comparison operators and possibly also for XML-style "tagging" of data, treat any undecorated name anywhere on a command line as a subcommand to be found on the path, treat string literals as self-evaluating objects by default, and as command names only when explicitly requested, and so on... This has numerous complications, of course. The angle brackets thing goes against decades of experience with shells on multiple platforms. Treating undecorated names as subcommands means the whole command-line interface of "cvs" suddenly requires quoting. Making strings self-evaluate means the user has to take special action to use a string literal or variable as a command name.

That is to say:
Spoiler:
Code: Select all
$ # Comparison: < and > actually evaluate to "<" and ">" and are taken as command arguments of the first item being compared.
$ 5 > 4
true

$ # Redirection will be probably limited to just the pipe character, and redirecting to files will be done as in ancient times, by redirecting to a command which opens the file:
$ find | write filename   # or something

$ # "Tagging" - used in part for structuring XML data right on the command line (if you need to send XML to something) but also more generically to pack metadata in with a piece of data.
$ # (It doesn't really generate XML, just structured data, expressed in a way that should translate nicely to XML.)
$ # This is just an idea I've been tossing around, not sure I want to use exactly XML syntax for it.
$ <some-tag score=18>"tagged string"</some-tag>
<some-tag score=18>"tagged string"</some-tag>

$ # Strings self-evaluate.
$ "ls"
"ls"
$ "ls" --length
2
$ # Pathnames are also string literals, as are GNU-style command arguments.
$ --length
--length
$ --length --length
8
$ /usr/bin/ls
/usr/bin/ls
$ # To "dereference" a string to run it as a command requires some application of syntax.  I haven't worked out all the details.
$ # In this example, $() reinterprets a string as a full command - as in current shells.  But this means you can't provide arguments to the named command...
$ $("ls")
$ # In this example, prefixing a string with a dollar-sign would "dereference" the name, turning it into a command to be run.
$ $cc
"/usr/bin/gcc"
$ $$cc -c ./file.c   #Bleah, double dollarsign following a dollarsign prompt...
$ $/usr/bin/gcc -c ./other-file.c    # Unfortunate side-effect of "strings self-evaluate", need to decorate pathnames to run them.

$ # Undecorated names as command arguments are command invocations!  Actually I think I've already given up on this idea...
$ # Hard to keep track of some of these decisions sometimes.  I kept a lot of my notes on my old phone, and not all the data has been moved to the new one.
$ echo ls    # Works like "echo $(ls)" in bash
$ # There's various motivations for this - for instance, so I can provide things like unit specifiers on the path:
$ 5 meters --per second   # "meters" and "second" are objects found on the path.  Of course if they just became string literals that could work, too.
5m/s
$ # But this complicates things like apt-get and cvs:
$ cvs checkout filename   # Problem, problem, problem!  Shell attempts to run "checkout" and "filename" as subcommands...
$ cvs 'checkout ./filename   # Scheme-style name quoting and filename specified as a relative path to treat it as a string literal...  It's workable but I think people won't like it.  Stupid CVS, why did you not prefix your command names with double-dash?
$ apt-get --install 'vlc     # Again, got to use a quote on the package name...


If I get rid of "undecorated names as command arguments are command invocations" then I may have some issues with other bits of syntax. For instance, suppose I use parentheses to specify a list of stuff to be supplied as an argument to a command:
Code: Select all
$ some-command (a, b, c)

Are "a", "b", and "c" commands to be run (i.e. does being in parens establish a "command position" in which undecorated names are commands) or are they string literals (i.e. do parens in a command argument inherit the context of being a command argument...?)


So I have some ideas about this stuff but I'm really not sure it'll fly. I may have to rethink some of it, as much as I'd prefer not to.

So naturally my thoughts turned to busybox. I haven't had a whole lot of time to inspect the code and figure out exactly what goes on, but I think it's reasonable to imagine that, due to the monolithic nature of its design, the i/o interfaces surely have to be a lot more standardized internally than across the whole gamut of coreutils. (not to mention the decompressed source, with docs, is only about 15MB, and it's designed to be cruft-free, so it ought to be a lot more hackable...) So what if you just exposed the internals of (at least a subset of) busybox as a library?


The problem with library linkage as a way to connect programs together is that the two programs then share the same process, which has implications both for security and reliability. This sort of approach works in .NET because the virtual machine guards against certain forms of misbehavior. It's not a big deal to load shell extensions as shared objects when you're just talking about utilities that are part of the shell's core set of tools - but opening that up to everybody increases the risk that someone will do something mean or sneaky or just harmfully careless.

As for reusing stuff from busybox or other pre-existing tools... I think for the most part there's no point. A lot of the base utilities are pretty easy to implement, and the ones that are more complex are probably going to be significantly different in their new implementations anyway. Like, what does "grep" look like when it takes a structured data stream instead of a text stream? Well, it might not be called "grep" any more for starters, and it may not even match lines of text against a regular expression - it may check data fields in data structures, things like that. The implementation of the equivalent program would be a lot different. Something like "find" would probably require that a lot of the predicates and command-invocation arguments be subject to a pretty high degree of redesign to make the new version really well-suited to the new shell's environment.
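(Just to sketch the flavor of the redesign - every command name and flag here is invented:)

Code: Select all
$ # a structured "grep" matches on fields of records rather than lines of text
$ list-files ./src | match --field name --regex '\.c$' | match --field size --min 4096 | pick name mtime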

if you have anything concrete, maybe you should open up a sourceforge.


I'm not quite there yet, unfortunately. I really need to replace my broken laptop so I can more easily work on this stuff again.

EvanED wrote:To be honest, I view POSIX compliance as almost an anti-goal. If you want POSIX compliance, the standard utilities are still going to be sitting there for you to use. I see interactions between my utilities and POSIX stuff as being either trivial (e.g. cat) or better done via an explicit conversion function (e.g. stripping out filenames to pass to xargs).


Yeah... Personally, there's just too much I want to do in a new shell that's probably flat-out incompatible with POSIX 1003.2. At any rate, the designs I have in mind are different enough from the established Unix shells that making the new shell entirely POSIX compatible doesn't necessarily make a lot of sense anyway. A shell that passes structured data or live objects around is going to be pretty different from what people are used to in any case.
---GEC
I want to create a truly new command-line shell for Unix.
Anybody want to place bets on whether I ever get any code written?
tetsujin

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby EvanED » Wed Jul 11, 2012 6:32 am UTC

tetsujin wrote:Best strategy, IMO, is to focus on re-implementing the things we need, and make the new shell popular enough that people will ultimately want to start playing in its backyard, "Unix Philosophy" or no.

In the mean time, one must assume that the rest of the world will not rewrite their code to work nicely in the new shell. Some may even violently oppose the idea. So the best bet, I figure, is to make the new shell (including its very own suite of core programs) work really nicely, but also make sure that it's still a good environment for working with all the other programs in the world.

This sums up my view too.

I'd also go so far as to say that the second part doesn't even go far enough. It's also the case that you want that anyway, as there are formats that are just going to stay plain text -- like, well, plain text. :-) So you'll still want tools like grep and wc and whatever that can operate on plain text, and those should mesh as well as possible with the other parts of the system.

The last bit has been a challenge for me as I've thought about shell syntax.

This is where I start to disagree. :-)

There's lots of things I want to do with syntax, like reclaim "<" and ">" for use as infix comparison operators...

For instance, this. How often do I want to do a comparison? I looked through my zsh history, and in several hundred commands I never did. Redirections, though, are common. I'd prefer to keep that as a natural notation, even if it means a strange syntax for comparisons.

treat any undecorated name anywhere on a command line as a subcommand to be found on the path, treat string literals as self-evaluating objects by default, and as command names only when explicitly requested, and so on... This has numerous complications, of course. The angle brackets thing goes against decades of experience with shells on multiple platforms. Treating undecorated names as subcommands means the whole command-line interface of "cvs" suddenly requires quoting. Making strings self-evaluate means the user has to take special action to use a string literal or variable as a command name.

Note that this has a lot more consequences than you acknowledge.

For instance, gcc foo.c -o foo: foo can't evaluate to a file, because the output file can't exist yet -- you'd either need special shell syntax for "create this file" (which adds extra overhead), require the user to quote output filenames, or make it context-sensitive, where a name like that evaluates to a file if it exists but to a string if not (which seems somewhat unpredictable, and I'm really worried about a TOCTTOU vulnerability). Note that this sort of thing -- putting the name of a nonexistent file on the command line -- is incredibly common.

It also means you can't say make all. It means you can't say grep something file. It means you can't say ssh compy, or locate file, or ps2pdf basename.

You're talking about a radical change in the command-line syntax, and personally I don't buy the benefits.

That said, I still think the fact that you're thinking about this is awesome even if I disagree with some of the specific directions.
EvanED

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby tetsujin » Wed Jul 11, 2012 9:17 pm UTC

EvanED wrote:
There's lots of things I want to do with syntax, like reclaim "<" and ">" for use as infix comparison operators...

How often do I want to do a comparison? I looked through my zsh history, and in several hundred commands I never did. Redirections, though, are common. I'd prefer to keep that as a natural notation, even if it means a strange syntax for comparisons.


It's probably more commonly seen in scripts. To me it seems that the lousy syntax for comparisons and expressions in the shell is one of the reasons people don't use them in the first place. :) With a more comprehensive type system in the shell, and a better syntax for doing stuff like that, the functionality would become both more valuable and easier to access.

I guess one way to put it is, I don't think about my shell design just as a "new command shell" or even a "scripting language" - it's to be a comfortable interactive environment for getting data from different sources and doing things with it. There are different languages and programs that I use at times for this sort of thing - as a place where I can comfortably experiment on pieces of data I've generated or collected... Like writing a Python wrapper for a module I've written in C so I can call the methods and see what happens, or loading a bunch of numerical data into Matlab so I can take a look at it. I can't hope to make a shell that will be great for every problem domain but I want to open those sorts of possibilities a bit.

There is also the problem that "redirect to file" is a less automatic thing than it used to be, once you move away from the idea that you're piping raw data. If you're writing data structures to a file, at some level a decision must be made about what the format of that file is to be...

For simple cases, however, there's a simple solution: in the syntax for my shell design, angle brackets, in isolation, are strings:
Code: Select all
$ 5 < 3    # Equivalent to {5 "<" 3}

So what if an angle bracket is in command position?
Code: Select all
$ < ./filename   # This command could be shorthand for "open this file for reading", or "cat"

Then, you could redirect to/from files by combining this with pipeline syntax:
Code: Select all
$ < ./filename | rot13 | > ./sekrit


Though I'm not entirely happy with the idea of doing that if I'm also using the angle brackets as syntax in some cases. Like, if <name> is some kind of syntax for containing or encapsulating something, then using angle brackets in a way like that, which doesn't encapsulate anything, is kind of awkward. (I guess the same would apply to comparison operators, then... <sigh />)

This style of redirection actually goes back to really early Unix shells. Turning the angle brackets into one-step redirection syntax was an improvement that came later on. Of course, I recognize that this is nevertheless problematic and grating for people who are familiar with current Unix shells. As with all my design decisions, it is a work in progress and a necessary compromise between how I want to see the shell work and how it needs to work in order to not drive away potential users in the first 5 minutes. :)

To give you an idea of what I'm taking about with the whole "redirection is no longer simple" thing, consider a case where I'm building some data records and storing them to a file:
Code: Select all
$ for {$x} --in {some set of inputs} --emit { ['name=(extract-name $x),'address=(extract-address $x),'phone=(extract-phone $x)] } >> ./record-table


The shell has some internal concept of how a simple data structure like this works (in this example, the square brackets encapsulate some kind of object, in this case a list of values with associated field names) - as well as an encoding that it uses to pass data structures like this to other processes. It could simply write that encoding to the end of ./record-table - so the (encoded) contents of ./record-table, starting at the point where the file previously ended, would look kind of like this:

Code: Select all
<stream header>
  <record...>
  <record...>
  ...
<stream terminator>


That could work, and it would mean that each time you do something like this, you simply add a new stream header and stream terminator to the file, surrounding the new data. However, I expect it would often be a better choice to append data to a structure already established in the file. To do that, you would have to open the file as an object (that is, some library or process has to go read what's in the file, figure out how to append new records to the file while maintaining the overall structure of the file) - and establish what the file's structure is when you create it. Like, is it an XML file with records encoded according to some schema? (xCard, for instance?) Is it a Palm "Contacts" database file? And so on. Those sorts of questions can impact how you structure the data within the loop, of course, but they also impact how you go about saying "I want to add new records".

For instance: without my decision to hijack the angle brackets as new syntax, one could say that a redirect operator like ">>" does something like "figure out what kind of file the target is, open it and get ready to append new stuff." However, that's not what people want when they say "echo '</tagname>' >> derp.xml" - so redefining the simple forms of the redirect operators to do something with that level of complication wouldn't be appropriate.

I still haven't made a clear decision of how to handle situations like this but at the moment I'm toying with this sort of syntax:
Code: Select all
$ generate-records | $./records.xml --append-records


That is, "$./records.xml" is the bit that says, "take this file, figure out what it is and open it with an appropriate object-interface library". That reference basically launches a process which handles the job of interacting with that file for a while. It is assumed that the file is already of valid structure, and that (barring any malfunctions) the file will be of valid structure once that object-wrapper process goes away. The newly-constructed object process is then instructed to receive new records (from input pipe) and append them to the end of the list encoded in the file. "generate-records" then must generate records in a form that the XML object wrapper's "append-records" method will accept.

treat any undecorated name anywhere on a command line as a subcommand to be found on the path, treat string literals as self-evaluating objects by default, and as command names only when explicitly requested, and so on... This has numerous complications, of course. The angle brackets thing goes against decades of experience with shells on multiple platforms. Treating undecorated names as subcommands means the whole command-line interface of "cvs" suddenly requires quoting. Making strings self-evaluate means the user has to take special action to use a string literal or variable as a command name.

Note that this has a lot more consequences than you acknowledge.

For instance, gcc foo.c -o foo: foo can't evaluate to a file, because the output file can't exist yet -- you'd either need special shell syntax for "create this file" (which adds extra overhead), require the user to quote output filenames, or make it context-sensitive, where a name like that evaluates to a file if it exists but to a string if not (which seems somewhat unpredictable, and I'm really worried about a TOCTTOU vulnerability). Note that this sort of thing -- putting the name of a nonexistent file on the command line -- is incredibly common.


Exactly... This is exactly why I've given serious thought to abandoning this thread of my design... There's just too much stuff that depends on string literals being passed in as command arguments, and people are used to being able to just type them in.

I think you misunderstood what I was getting at here, though. At least a bit.

If I included this rule in my shell design (which does seem a bit dubious at this point - most of the benefits seem pretty minor, and the problems with the concept are pretty unsettling) - what it means for a command like "gcc foo.c -o foo" is that every undecorated name on that command line (in any position) is treated as a command on $PATH. So it doesn't even matter if ./foo.c or ./foo exist, the shell would look for /usr/bin/foo.c and /bin/foo.c (and so on) instead.

To get around that, the shell syntax provides various ways of writing string literals:
  • Tokens beginning with a dash are assumed to be GNU-style command arguments, so everything from the dash up to whitespace is included as part of the string.
  • Tokens beginning with /, ./, or ../ are considered filename literals. They are subject to wildcard globbing rules and the shell will tab-complete them as filenames.
  • For cases where the user wants to put a string literal into a command and it doesn't fit the other two rules, there's a compact word-quoting rule inspired by the symbol-quoting in Scheme: prefix with a single quote, and everything following until the first whitespace is part of the string literal.

As a few examples:
Code: Select all
$ gcc foo.c -o foo    # Doesn't work as expected under the "undecorated names are subcommands" rule.  This would execute like "gcc $(foo.c) -o $(foo)"
$ gcc ./foo.c -o ./foo  # User doesn't have to "quote" the filenames because file path syntax is already a sort of quoting rule which generates strings.
$ gcc 'foo.c -o 'foo   # I know people aren't going to like having to use the single-quote prefix in cases like this...  Price paid, perhaps, for not using path syntax.


(EDIT): One of the complications here is the fact that a quoted string is (in my design) always a string, while an undecorated name is context-sensitive.
This is gonna be a big one, so open the spoiler tags for a wall of text of me struggling with the constraints of my design.
Spoiler:
Code: Select all
$ echo "ls"   #Writes the string "ls"...  Since "echo" is a regular Unix program and its output isn't redirected, it writes to the TTY instead of feeding its output to the shell's REPL.
ls
$ emit "ls"    # Emit is like echo, except it writes out shell-data rather than text.  Since it is a shell-native program and its output isn't redirected, its output goes to the shell's REPL.
"ls"
$ "ls"       # String self-evaluates, so the shell's REPL loop prints a representation of the string object...  Something you could type back in to get basically the same object.
"ls"
$ "ls" -l      # What does the string object do if you pass it "-l" as an argument?  Who knows?  But this doesn't call /bin/ls.
$ echo ls    # If "undecorated names are subcommands" is thrown out, then this is equivalent to {echo "ls"}
ls
$ ls         # Undecorated name in command position is a command.


This gets further awkward when you consider that pathnames, too, are effectively "quoted strings".
Code: Select all
$ echo /bin/ls    # Equivalent to {echo "/bin/ls"} except that the shell handles tab-completion and wildcards according to filename rules if you use path syntax to write a string.
$ emit /bin/ls
"/bin/ls"
$ "/bin/ls"          # As previously noted, quoted strings in command position self-evaluate, yielding a string object, unless they are given command arguments.
"/bin/ls"
$ /bin/ls            # Here's the trick.  Does this run /bin/ls, which is probably what users expect in this case, even though it's inconsistent with other quoting rules?  Or should it self-evaluate as a string and yield "/bin/ls"?
$ "/bin/ls" --length   # String literal treated as an object and used to invoke one of the string class's methods.
7
$ /bin/ls --length     # Does this result in running /bin/ls, passing it the argument "--length" which it will reject?  Or is this another string object invocation?
$ a="/bin/ls"; b=/bin/ls    # $a and $b should be equal.
$ $a
"/bin/ls"
$ $b
"/bin/ls"


I may have painted myself into a corner, syntactically speaking... I think the best way out, short of backpedaling away from huge chunks of my design, would be to say that pathnames are just another quoting rule, and that, like quoted strings, they don't do anything special in command position, they just evaluate to themselves, or invoke some string class method. That means people can't run commands by specifying their location:
Code: Select all
$ ./a.out    #Sorry, dude, that's a string literal!


I have considered using the "$./filename" syntax from my redirection example as a way to (less) conveniently allow users to invoke commands by their filename:
Code: Select all
$ $./a.out     # Or $"./a.out" or a="./a.out" $a, etc.

But that has issues, too. The behavior of the dollarsign thing then depends on whether the execute bit is set. It executes the file if it's executable, otherwise tries to find out what kind of file it is and construct an object out of it. The latter rule is in a similar vein to normal variable expansion (as if to say, "this file contains something I want to treat as a variable") while the former is somewhat like subcommand substitution $() syntax... I think either rule could make sense on its own but together it seems like an accident waiting to happen...

The other way out, I guess, would be to say that string literals are treated as command invocations when they're in command position, too. That would complicate the process of accessing string methods, or possibly necessitate that strings simply not be treated as "objects" at all:
Code: Select all
# For instance...
$ "/bin/ls" -l    # Just runs /bin/ls with "-l" as an argument.
$ emit ("/bin/ls" --length)    # The string literal is no longer in command position, if parenthesis don't create a new command position within them...
7
$ a="/bin/ls"
$ $a -l      # Still runs /bin/ls...
# If we go farther, and say strings just flat-out aren't objects...
$ emit ("/bin/ls" -l)   # If parens don't create a command position, then this is an error...  Trying to invoke something that can't be invoked.
$ string-length /bin/ls   # String operations must become separate commands, since strings aren't objects.


I don't really want to go that direction, but it's a question of consistency that has to be addressed in my design... Most programming languages don't have this problem because there's a distinct syntax for invoking something:
Code: Select all
command(args);   // for instance
// while this is a reference to the function itself:
(command)


In my case there's no way to distinguish the name of something from an invocation of it with no arguments... Which works fine in Haskell, but this isn't Haskell.
Code: Select all
$ ls     # For a million different reasons, this has to be a 0-arg invocation of ls


And people need to be able to name a command to be run in other ways and specify what level of dereferencing they want. The status quo of the shell is that they get one level for free (text expansion prior to evaluation)... All this mess is just a bad interaction between constraints of my design and that expectation.

This is probably what led me to "undecorated names are subcommand invocations" in the first place. It may be a nasty rule to introduce into the shell, it may be prone to trip people up in a million different ugly and possibly dangerous ways - but at least it's consistent. Undecorated names, under this rule, are simply always some kind of command invocation. But it can't work. It would trip me up, too. :)


It also means you can't say make all. It means you can't say grep something file. It means you can't say ssh compy, or locate file, or ps2pdf basename.


Well, yeah. I mentioned cvs and apt-get as examples of this problem but the problem does extend farther as you point out. Basically, I figure in some of those cases people just need to get used to quoting things:
Code: Select all
$ # In all these cases, unless otherwise noted, we cease to require quoting if "undecorated names are subcommands" is abandoned.
$ make 'all
$ grep "pattern" ./file   # - but if you write the filename without path syntax, then the shell doesn't know how to apply tab-completion...  That is, it has to assume undecorated names are filenames and tab-complete them as filenames, unless it has some information about the command being run that tells it how to tab-complete a particular argument.
$ # The choice to use double-quotes instead of the single-quote prefix for the pattern was kind of an arbitrary choice, too - simple patterns could be specified with the single-quote prefix, while more complicated patterns would tend to require double quotes.
$ ssh 'compy    # - though in this case tab-completion probably can't work regardless of syntax
$ locate 'file
$ ps2pdf ./basename


You're talking about a radical change in the command-line syntax, and personally I don't buy the benefits.


Yeah, I feel the same. I've tossed around a lot of syntax ideas over the years (radical changes, a lot of them) and a lot of them just haven't panned out. In fact I would bet that once I charge up my old phone and take a look at the notes I made for this idea, I'll probably find that I already shot it down myself.

However, I think with the direction my shell design is going, users will have to get used to quoting some things they didn't quote before. It stinks, because I know the more uncomfortable my shell is for people used to traditional Unix shells, the less likely people will want to keep using it long enough to start enjoying it... But even without that dubious "undecorated names are subcommands" rule, there's just too many things going on in my syntax design to allow undecorated names to always just work the way they used to. It's like, when I moved from DOS to Linux, I was taught that I should abandon the idea that the CWD is always on the search path... that I could add "." to $PATH if I wanted to and get that behavior back, but that it was a bad idea and I should get used to writing "./program" instead of "program"... Likewise I think with the direction my design is headed, there's going to be some nudging to get people to eschew "cmd filename" in favor of "cmd ./filename".

The way I look at it, the direction I'm looking at for the shell design is inherently radical in its departures from the familiar Unix shells. I'm trying to make something new by taking elements of the old shells that I like and combining with other ideas - and given the wide range of calling argument formats people have defined for their programs over the years, it's inevitable that it won't all fit nicely. I want to do what I can to make the new shell comfortable for people familiar with the Unix shell, but the syntax has to serve the new design, too.

A basic problem is that I want to do things that don't necessarily fit in the existing syntax... Almost any change to the syntax breaks something. So the shell design is like a series of compromises, with a trail of abandoned ideas along the way.

As an example, a very basic difference between my shell design and a regular Unix shell is that there's a concept of datatypes that extends not just to variables (as in the typed variables Bash provides when you declare them with "declare" or "typeset") but to any context where you enter a value. For instance:
  • 10 is a number
  • "10" is a string

The whole concept that 10 isn't the same as "10" is very common in programming languages but very contentious in the context of a shell. And I recognize that it's a problematic idea when applied to the Unix shell, all the existing programs that need to run in it, etc. But I think it's a good and valuable idea, too, especially since the shell design is "object-oriented" in the sense that it attempts to help the user (in contexts like tab-completion and so on) to find commands relevant to a piece of data by using the type of that data as a starting point.

The most common example: is 103 less than or greater than 20? String comparison says it's less-than, numeric comparison says it's greater-than. So shells have separate comparison operations for string-compare and numeric-compare - and so does Perl, which uses == and < for numbers but eq and lt for strings - while other programming languages more typically just say "they're strings, so the default comparison is lexicographic according to the encoding" or "they're numbers, so compare their values." (Oddball languages like PHP instead look at the strings and guess which comparison is appropriate: their loose comparison operators treat numeric-looking strings as numbers.) Likewise, string concatenation is distinguished from addition (not necessarily a bad thing).
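(Bash itself demonstrates the 103-versus-20 problem nicely:)

Code: Select all
$ [[ 103 < 20 ]] && echo "string compare says less-than"       # inside [[ ]], < is lexicographic
string compare says less-than
$ (( 103 < 20 )) || echo "numeric compare says not less-than"  # inside (( )), < is arithmetic
numeric compare says not less-than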

But there's a further complication that I wanted to describe, the unexpected complication that comes from using tools designed for an environment with extremely minimalist syntax in an environment with more syntax:
Code: Select all
$ some-command | head -10

The problem there is that, conventionally, -10 would be considered a negative number. There's a further problem, which is that "names beginning with a dash" is one of my shell's quoting rules. Really, the "head" command should take a positive number, but its argument looks like a negative number because it uses the dash to distinguish between filename arguments and command options (unless it's just "-"). So that makes the inclusion of a pre-existing "head" command into my shell a bit awkward.

As another example, I mentioned before that arithmetic operators work in the syntax by evaluating to strings, which are then passed as "command arguments" to the object in "command position". In other words, arithmetic operators aren't real infix syntax, they're symbols, and a large arithmetic expression is handled as a method invocation on some numeric class, passing the "infix" symbols and other operands as arguments. (I think Smalltalk works this way, too?)
Code: Select all
$ 5 + 13    # Equivalent to {5 "+" 13}
$ 4 * 2     # Star isn't in pathname syntax, so it doesn't glob.  Yes, this means the shell won't glob on commands like "rm *", you'd need to say "rm ./*"
$ 10 / 5    # Perhaps the awkwardest of all.  The slash makes the shell's pathname syntax kick in...  Which doesn't really do any harm here, slash just evaluates to "/", but it does have other implications...


This leads to a bunch of whitespace sensitivity issues unless I make the operator syntax create implicit word breaks on either side of itself: but I can't do that for "-" and "/", because those characters have special meaning that can't be undermined.
Code: Select all
$ 5 /2     # Is that {5 / 2} or {5 "/2"}?  Is /2 a file in the root directory?
$ 5 -2     # Either {-2} should evaluate as "-2" or it should evaluate as a negative number...  So when 5 is called with the argument -2, it shouldn't yield 3, because...
$ a=-2    # a is negative 2
$ b="-2"  # b is "-2"
$ 5 $a     # This shouldn't be treated as "5 - 2" or "5 + -2", doesn't really make sense.
$ 5 $b     # Likewise.


As another, very simple example, I use comma as a command/value separator - like semicolon but with higher precedence:
Code: Select all
$ a, b, c | filter    #This is like (a; b; c) | filter


Of course, there are commands that use comma as part of their argument syntax - and not necessarily as value separators. But even if an existing command does use comma as a value separator in a command argument, turning comma into a syntax character means I have to make some kind of decision about how (or if) compatibility is going to work.
Code: Select all
$ chmod u+w,go-w ./file

I could write my own "chmod" to deal with the problem (but that doesn't solve the problem for everything else) - I could make the shell dodge the issue when calling a non-"native" program by passing in the raw text as the argument, as long as doing so doesn't conflict too badly with the shell's syntax rules (which I kind of have to do anyway, for compatibility to work) - or I could make the users of these programs suffer, by forcing them to quote the argument (or value list, or whatever) before passing it to the program.

I ran into similar problems when I considered taking at-sign and colon as infix syntax (they're commonly used for things like user@hostname or hostname:port... and IPv6 uses colons as well, of course). Sometimes it seems like there are just no characters left to claim as syntax. :)
---GEC
I want to create a truly new command-line shell for Unix.
Anybody want to place bets on whether I ever get any code written?

Postby EvanED » Wed Jul 11, 2012 10:19 pm UTC

tetsujin wrote:It's probably more commonly seen in scripts. To me it seems that the fact that the lousy syntax for comparisons and expressions in the shell is one of the reasons people don't use them in the first place. :) ... I guess one way to put it is, I don't think about my shell design just as a "new command shell" or even a "scripting language" - it's to be a comfortable interactive environment for getting data from different sources and doing things with it.

This reflects a fairly fundamental difference in goals... to some extent, I don't even care about scripts. That's not fully true, and there are design decisions I've thought of which are guided by script vs interactive, but it definitely has some truth to it. I'm much more interested in the interactive case: my general feeling is that if you want to write a script, there are languages like Python that already do that really well for most things shell scripts are used for, whereas the tools for interactive use aren't nearly as good. And interactive use has some very peculiar requirements relative to most programming languages. (In particular, it's biased far, far more toward "write" than "read", so somewhat cryptic syntax that makes things easier to write and harder to read is a good tradeoff.)

If you're writing data structures to a file, at some level a decision must be made about what the format of that file is to be...

My feeling is both "you have to" and "you want to" do that anyway; I don't view piping things to a file as being any less useful with a new shell than it is now.

For simple cases, however, there's a simple solution: in the syntax for my shell design, angle brackets, in isolation, are strings:
Code: Select all
$ 5 < 3    # Equivalent to {5 "<" 3}

So what if an angle bracket is in command position?
Code: Select all
$ < ./filename   # This command could be shorthand for "open this file for reading", or "cat"

Then, you could redirect to/from files by combining this with pipeline syntax:
Code: Select all
$ < ./filename | rot13 | > ./sekrit

I don't see that as being particularly useful compared to just naming them in and out or from and to or whatever, but note that there is one thing which becomes notably less convenient: if you want to specify both input and output, you don't do it on the same side of the command.
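
(For reference, the way you'd write that pipeline in Bash today puts both redirections on the same side of the command - rot13 spelled out with tr, since there's no standard rot13 program:)
Code: Select all
$ tr 'A-Za-z' 'N-ZA-Mn-za-m' < ./filename > ./sekrit    # input and output both named after the command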

Code: Select all
<stream header>
  <record...>
  <record...>
  ...
<stream terminator>

While I admit my solution to this problem doesn't exactly leave all questions answered, my answer to that problem would be you don't have a stream header or terminator.

For instance, with my JSON utilities, I don't plan on outputting a JSON list of things -- I will be outputting several JSON objects in a row. This means appending to the end of the file really is just appending: the result is still a valid stream of objects.
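
A quick sketch of what I mean, using jq as a stand-in for those utilities (it already accepts a bare sequence of top-level JSON values) and a throwaway stream.json:
Code: Select all
$ printf '{"n": 1}\n{"n": 2}\n' > stream.json
$ printf '{"n": 3}\n' >> stream.json     # appending is just appending; nothing to unwrap and re-wrap
$ jq '.n' stream.json
1
2
3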

So it doesn't even matter if ./foo.c or ./foo exist, the shell would look for /usr/bin/foo.c and /bin/foo.c (and so on) instead. To get around that, the shell syntax provides various ways of writing string literals:

I picked up that, though not the 'foo syntax. Still, it's back to the writability thing: you're breaking expectations while making it a little more annoying to do a common task and you haven't convinced me of its utility, especially as it sounds like ./file will just be a way of quoting a string and not some File data type.

$ ssh 'compy # - though in this case tab-completion probably can't work regardless of syntax

I disagree with that comment, at least in some cases. If you type in an arbitrary host name, no, you're not going to get help. But if you type the name of a host that's listed in ssh's config, you could autocomplete that. Even for arbitrary host names, the shell could keep a record of what you've typed in the past and complete from that.
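
(Bash's programmable completion can already do the config-file half of that; a minimal sketch, assuming a standard ~/.ssh/config:)
Code: Select all
$ complete -W "$(awk '/^Host / {print $2}' ~/.ssh/config 2>/dev/null)" ssh    # offer the Host entries as completions for ssh arguments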

Code: Select all
$ some-command | head -10

The problem there is that, conventionally, -10 would be considered a negative number. There's a further problem, which is that "names beginning with a dash" is one of my shell's quoting rules. Really, the "head" command should take a positive number, but its argument looks like a negative number because it uses the dash to distinguish between filename arguments and command options (unless it's just "-"). So that makes the inclusion of a pre-existing "head" command into my shell a bit awkward.

FWIW I didn't even know head accepted that syntax, and it's not documented in the man page that I pulled up. (Well, maybe I did know and forgot. I always say head -n10.) I'm not sure I'd be too broken up about that sort of thing.

But nor am I convinced it's a problem, at least in this very particular case... suppose that you call head -10 and your shell parses the -10 as a negative integer. What is it going to do then? It can't pass the integer -10 to head anyway -- it either has to fail or convert it to a string. And if you convert it to a string, you're likely to wind up with "-10" anyway.
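
(Both spellings do work, at least with GNU head - the -n form is just the one that's actually documented:)
Code: Select all
$ seq 100 | head -10     # the historical form
$ seq 100 | head -n 10   # the documented, POSIX form; same ten lines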

This leads to a bunch of whitespace sensitivity issues unless I make the operator syntax create implicit word breaks on either side of itself: but I can't do that for "-" and "/", because those characters have special meaning that can't be undermined.

* is a problem too: is 4*2 doing 4 times 2, or globbing files that start with a 4 and end with a 2?
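
(In Bash today the answer depends on what files happen to exist, which is sort of the point:)
Code: Select all
$ echo 4*2        # no matching file, so the pattern falls through as literal text
4*2
$ touch 4x2
$ echo 4*2        # now the glob matches, and the "expression" quietly changes meaning
4x2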

As another, very simple example, I use comma as a command/value separator - like semicolon but with higher precedence:
Code: Select all
$ a, b, c | filter    #This is like (a; b; c) | filter

I do like this idea. :-)

I could write my own "chmod" to deal with the problem (but that doesn't solve the problem for everything else) - I could make the shell dodge the issue when calling a non-"native" program by passing in the raw text as the argument, as long as doing so doesn't conflict too badly with the shell's syntax rules (which I kind of have to do anyway, for compatibility to work) - or I could make the users of these programs suffer, by forcing them to quote the argument (or value list, or whatever) before passing it to the program.

FWIW I suspect that case is rare enough that requiring quoting is fine. In my zsh history, commas only show up in two cases: -Wl,-rpath,/path/blah, and commit messages given with -m, where they're already quoted.

You could do something like have a shell option that would warn the user if comma is typed without whitespace following and ask if he/she wants to proceed.

I ran into similar problems when I considered taking at-sign and colon as infix syntax (they're commonly used for things like user@hostname or hostname:port... and IPv6 uses colons as well, of course). Sometimes it seems like there are just no characters left to claim as syntax. :)

Just for more info, the @s in my history consist of one user@host, one file@rev to Subversion, and a handful of uses of a file called @sys. (@sys is a magical name on AFS that acts like an alias for something like amd64_rhel6, i386_rhel5, etc. depending on the current system.) Colon never appears.

I wouldn't mind requiring quoting on those either. (I'd like to use :name to name pipes and xattrs/alternate data streams.)

Postby tetsujin » Thu Jul 12, 2012 12:40 am UTC

EvanED wrote:
tetsujin wrote:It's probably more commonly seen in scripts. To me it seems that the fact that the lousy syntax for comparisons and expressions in the shell is one of the reasons people don't use them in the first place. :) ... I guess one way to put it is, I don't think about my shell design just as a "new command shell" or even a "scripting language" - it's to be a comfortable interactive environment for getting data from different sources and doing things with it.

This reflects a fairly fundamental difference in goals... to some extent, I don't even care about scripts.


Well, one of the things I like about shells in general is that you can use them interactively, and generate progressively more complicated commands as you need them - and apply what you developed in the interactive session directly to a script when you're ready for it to be automated.

there is one thing which becomes notably less convenient: if you want to specify both input and output, you don't do it on the same side of the command.


Anything I add or change, there's always a price. :) At least that's been my experience. Nothing good comes for free. Kind of like the more elaborate forms of variable substitution syntax... Having it, or removing it, either way is a kind of trade-off. You gain (or lose) convenience and lose (or gain) clarity and freedom to define other new syntax which might conflict with it.

Code: Select all
<stream header>
  <record...>
  <record...>
  ...
<stream terminator>

While I admit my solution to this problem doesn't exactly leave all questions answered, my answer to that problem would be you don't have a stream header or terminator.

Well, having another header and another terminator isn't exactly a problem, if the format can accommodate it - it's just not the greatest way to form the accumulated file, and depending on what format you're writing you can't count on a simple "append" always working the way you intend. There have been other situations where having a header/terminator in my serialization format has been kind of unfortunate. For instance, suppose something like this:
Code: Select all
$ primes 2 | (for --input {$x} --until {$x > 10} --do {};  for --input {$x} --yield {$x * $x} --until {$x > 100})

The idea there is that the first loop would consume values until we reach a prime larger than 10 - and then the second loop would yield the square of each prime until we reach one greater than 100. Suppose for the moment that "for" is not internal to the shell, it's a program on the PATH - so the shell has to hand each of those processes a file descriptor in turn. If each process read one whole value from the input stream at any given time, and never read more values than it had to, the shell could run the first "for", wait for it to terminate, then feed the same input fd to the second "for" - and everything would work as intended... if the second "for" is able to read from a "partially-consumed" input stream, missing a header and starting at some arbitrary point in the stream. (Note, however, this strategy does rely on "for" being very well-behaved about how much input it consumes... Most programs aren't so well-behaved due to buffering rules.)
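
You can already see that hazard with today's tools: chaining two consumers onto one pipe only works if the first consumer reads no more than it needs.
Code: Select all
$ seq 10 | { read first; echo "first=$first"; cat; }    # read consumes exactly one line, so cat sees 2 through 10
$ seq 10 | { head -n 1; cat; }                          # GNU head reading a pipe buffers ahead, so cat typically sees nothing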

There's a variety of reasons I can't do that in my serialization format. For starters, in the worst-case, if a program doesn't know what data format it's going to get as input, I want it to be able to find out what it's getting by reading the header. (Normally I think the idea is the shell would simply tell the program what to expect when it's launched - but cases may exist where that's not going to work.) Termination is less crucial, perhaps - mostly it's important for error-checking data over an unreliable link, or if you want to terminate and then write another header and start something else.

So it doesn't even matter if ./foo.c or ./foo exist, the shell would look for /usr/bin/foo.c and /bin/foo.c (and so on) instead. To get around that, the shell syntax provides various ways of writing string literals:

I picked up that, though not the 'foo syntax. Still, it's back to the writability thing: you're breaking expectations while making it a little more annoying to do a common task and you haven't convinced me of its utility, especially as it sounds like ./file will just be a way of quoting a string and not some File data type.


I guess go back and read my edit to the previous post if you want more insight into the chain of decisions (and their unfortunate complications) - at least if you're wondering about why I came up with the "undecorated names are subcommands" thing.

I mentioned before that while I want to do new things with my shell design, I also want to make it a comfortable environment for people familiar with current shells... And that a big motivation for the latter goal is because if I make it a pain for people to use the stuff they use now, the way they use it now, people won't want to run my shell. But there's always this give-and-take between my ideal goals and what I can actually do. All this mess is just the fallout. :)

Code: Select all
$ some-command | head -10


But nor am I convinced it's a problem, at least in this very particular case... suppose that you call head -10 and your shell parses the -10 as a negative integer. What is it going to do then? It can't pass the integer -10 to head anyway -- it either has to fail or convert it to a string. And if you convert it to a string, you're likely to wind up with "-10" anyway.


Well, I think this form of the line-count argument to "head" is vestigial at this point... Not sure. But the point is that tools like this exist (and are in common use) whose design doesn't really jive with the design of my shell. Even when simple changes to syntax don't break existing tools, the difference in conceptual foundation can make the transition very jarring, a very awkward fit.

Is 4*2 doing 4 times 2 or globbing files that start with a 4 and end with a 2?


A rule in my design is that globbing never happens unless you use path syntax.
Code: Select all
$ emit a*      # No globbing.  Assume for the moment that an undecorated name just becomes some kind of string.
"a*"
$ emit ./a*    # Path syntax, so globbing kicks in.
(./a1, ./a2, ./a3)


As another, very simple example, I use comma as a command/value separator - like semicolon but with higher precedence:
Code: Select all
$ a, b, c | filter    #This is like (a; b; c) | filter

I do like this idea. :-)


Yeah, me too. I introduced it mainly as a value separator, and for a while I wasn't sure exactly what set it apart from semicolon:
Code: Select all
$ a = [2, 4, 6, 8]   # I introduced comma as syntax 'cause I thought it looked nice for things like this.  But in this case semicolon would do the same thing.

I tossed around a few wild ideas at various points in the design, like different ways of dispatching multiple commands to a process or one set of arguments to multiple processes/objects, things like that:

Code: Select all
$ (cmd1, cmd2, cmd3) --arg   # Like (cmd1 --arg; cmd2 --arg; cmd3 --arg)
$ cmd (--arg, --arg2)       # Like (cmd --arg; cmd --arg2)


You could do something like have a shell option that would warn the user if comma is typed without whitespace following and ask if he/she wants to proceed.


I'm not sure that really solves the problem. I put whitespace after commas as a style thing, but since the idea is that comma would be actual syntax, I wouldn't want it to be necessary to have (or not have) whitespace there.

I think probably the chmod case (and similar) would just have to be quoted. In my design, comma has higher precedence than pipe characters but still lower than argument binding, so a comma in the middle of a command argument would end the command.
Code: Select all
$ chmod u=rwx,go=r ./*    # Equivalent to {chmod u=rwx; go=r ./*}

To use commas as part of argument text would require quoting - or to use commas as a value separator in an argument would require parens.
Code: Select all
$ my-chmod (u=rwx,go=r) ./*    #This could work, maybe.
$ chmod "u=rwx,go=r" ./*     #And obviously there's this...


(I'd like to use :name to name pipes and xattrs/alternate data streams.)


Early on I was thinking value:constructor as a way to coerce data into another type. At this point I might use "::" for that or I might just leave it out. Currently I'm toying with the idea of name: being a prefix for a file path that specifies domain... Kind of like DOS drive letters.
Code: Select all
$ echo sdcard:/DCIM/100ANDRO/IMG0001.JPG      # This is shell syntax, the rest of the system doesn't respect it.  So in this case, sdcard: resolves to a mount point:
/media/sdcard/DCIM/100ANDRO/IMG0001.JPG
# Of course somehow the shell or system has to be set up so the shell knows how to resolve "sdcard:"
$ echo http://www.google.com                         # This is kind of a "harmless degenerate case" of the colon rule: "http:" is taken as the domain of a file path, but the common URL protocols are set up in the syntax to simply self-evaluate.  Sort of a disparity but I think it's more comfortable that way.
http://www.google.com


Another place I'm thinking of using this "domain specifier" is as a way to disambiguate commands on systems that might have multiple commands with the same name.
Code: Select all
$ find -print0 | xargs -0      # Back when we had a Solaris machine in the office, it had its own "find" and GNU "find" was "gfind"...  So what if I could do this?
$ {vendor=GNU}:find -print0 | {vendor=GNU}:xargs -0       # It's ugly, but the idea is to make it easier to write reliably portable scripts.  Either we'll run GNU find or the script will fail with an error stating that it's not available.
$ {version='3.4.6}:gcc           # In case you have a bunch of gcc's and need to run a particular one...

Obviously a feature like that can't work unless the software is installed in such a way that the shell can identify these bits of information in the available alternates... But I think having the framework there could be a good start.

Named pipes, xattrs, and so on... Are good stuff. Early in the design I spent a lot of time thinking about how to incorporate xattr access into the syntax, and make it jive with mechanisms for accessing metadata internal to the file. I abandoned most of it, though I'll probably revisit the problem later on. I'd really like xattrs to be something that's "centrally" supported in the shell as opposed to tacked-on with an external utility.

I guess one way to do that would be, again, use the colon as a domain specifier:
Code: Select all
$ ./foo.mp3:artist="Splashdown"          # This sets filesystem-level metadata (xattrs)
$ $./foo.mp3:artist="Splashdown"        # This constructs an object from the MP3 file, and accesses the variable in terms of that object...  The MP3 object interface lib recognizes this as a request to set an ID3 field.
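
(The xattr half of that maps onto the existing attr tools already - roughly, and assuming the user.* namespace:)
Code: Select all
$ setfattr -n user.artist -v "Splashdown" ./foo.mp3    # set the attribute with today's tools
$ getfattr -n user.artist ./foo.mp3                    # read it back; prints user.artist="Splashdown"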


Though this does raise awkward questions, again, about the difference between a string that's a file path and a string that's just a string. Are they the same thing? Are they different?
Code: Select all
$ a=./foo.mp3
$ emit $a | create ./b           # "./foo.mp3" is stored in new file ./b
$ $./b:artist="Splashdown"     # Is that creating metadata for the data stored in $./b?  Or is it creating metadata for the file specified within $./b?
$ $a:artist="Splashdown"       # Is that creating metadata on the file, or on the variable containing the filename?


I guess one way to resolve that would be to require something else (syntax or something) to distinguish cases where we're accessing variable bindings in a file's xattr's rather than on some piece of immediate data...
Code: Select all
$ ./foo.mp3:xattr:artist =...?   # Though I guess that still doesn't solve the case of $./b:artist
$ (xattr ./foo.mp3):artist = ...?   # Less pretty, but maybe more robust...


I've gone through a bunch of concepts for named pipes, too, but I'm not sure I've really found anything I'm entirely happy with. One thought I've had is to hijack # for these kinds of "special named entities" - and maybe use something else for comments...
Code: Select all
$ #special-name      ## Comment


So I could do things like this:
Code: Select all
$ (1, 1, (< #a)) | read {$prev}, for --input {$x} --do {emit ($x + $prev); prev=$x} | tee (> #a)     # "tee" is a shell-native version, and (< #a) and (> #a) are the read and write ends of a pipe.


Of course, somehow "read" has to pull out just the first value and leave the rest of the stream in a state that "for" can deal with - which as I said before is an issue... The point of making the pipe a special syntax instead of just creating some kind of pipe object and piping to it is so the shell can handle issues of type conversion and multiplexing more gracefully.
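
Today's nearest equivalent is a filesystem FIFO, which gives you the plumbing (though none of the type handling) - a throwaway sketch:
Code: Select all
$ mkfifo /tmp/loop
$ gzip -c < /tmp/loop > out.gz &     # the reader blocks until a writer shows up
$ some-command > /tmp/loop           # the writer; the two now run as an ad-hoc pipeline
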
---GEC
I want to create a truly new command-line shell for Unix.
Anybody want to place bets on whether I ever get any code written?

Postby EvanED » Thu Jul 12, 2012 4:26 pm UTC

tetsujin wrote:Well, one of the things I like about shells in general is that you can use them interactively, and generate progressively more complicated commands as you need them - and apply what you developed in the interactive session directly to a script when you're ready for it to be automated.

I agree here to an extent, and I do plan on having control constructs and the ability to read scripts, but I still feel like the goals of an interactive shell and the goals of a well-designed scripting language are somewhat contradictory. And as I'm more interested in the former, I'm willing to sacrifice the latter a bit to do so.

Well, having another header and another terminator isn't exactly a problem, if the format can accommodate it - it's just not the greatest way to form the accumulated file, and depending on what format you're writing you can't count on a simple "append" always working the way you intend.

The "if the format can accommodate it" is a biggie. I can elaborate more, but I don't think either XML or JSON fits that bill and you have to do what I said and drop the header.

(That doesn't mean there can't be some sort of semantically special first element, just that you can't have [ and ] or <foo> and </foo> surround your whole document, because then you can't just concatenate them together and have the same structure.)


Suppose for the moment that "for" is not internal to the shell, it's a program on the PATH

Just out of curiosity, this means that you can't do the usual 'for' thing of setting an environment variable if you don't provide a way for subprocesses to do that. Do you plan on having 'for' take the command to run as opposed to having it be a shell construct? Have a way for subprocesses to set environment variables? Or is that just an example and really you'd have a more traditional 'for'?

I mentioned before that while I want to do new things with my shell design, I also want to make it a comfortable environment for people familiar with current shells... And that a big motivation for the latter goal is because if I make it a pain for people to use the stuff they use now, the way they use it now, people won't want to run my shell. But there's always this give-and-take between my ideal goals and what I can actually do. All this mess is just the fallout. :)

So my philosophy is that you should make it be what you want. If you'll get frustrated after a day and give up on your own shell, then obviously you didn't make a good design decision. :-) But I wouldn't put too much stock in what other people (e.g. me) say except in terms of making you think of things you didn't come up with yourself, to break ties, etc.; make it for yourself first.

A rule in my design is that globbing never happens unless you use path syntax.

Ah, gotcha.

I'm not sure that really solves the problem. I put whitespace after commas as a style thing, but since the idea is that comma would be actual syntax, I wouldn't want it to be necessary to have (or not have) whitespace there.

So you could make whitespace matter, though I'd admit it'd be a bit inconsistent. Actually, that could get you a way to have < and > do what you want: with whitespace, they could be redirection operators, and without, comparison operators.

I'm not sure I like that idea, but it's an idea. :-)

$ echo sdcard:/DCIM/100ANDRO/IMG0001.JPG # This is shell syntax, the rest of the system doesn't respect it. So in this case, sdcard: resolves to a mount point:
/media/sdcard/DCIM/100ANDRO/IMG0001.JPG

Not a bad idea - though just to play Devil's Advocate for a second, it's not much different from just $sdcard/DCIM/.... Or you could use something like Zsh's feature that lets you type ~var/ (like any other ~user) for variables holding paths. One extra character in those.
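
(For reference, the zsh version of that looks something like this - hash -d defines a named directory:)
Code: Select all
$ hash -d sdcard=/media/sdcard
$ echo ~sdcard/DCIM/100ANDRO/IMG0001.JPG
/media/sdcard/DCIM/100ANDRO/IMG0001.JPG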

You also have to decide whether that resolves if I type something like --prefix=foo:bar/baz.

Named pipes, xattrs, and so on... Are good stuff. Early in the design I spent a lot of time thinking about how to incorporate xattr access into the syntax, and make it jive with mechanisms for accessing metadata internal to the file. I abandoned most of it, though I'll probably revisit the problem later on. I'd really like xattrs to be something that's "centrally" supported in the shell as opposed to tacked-on with an external utility.

Personally, I'd really like them to be more usable across the system, but it's probably a chicken-and-egg problem, so I decided I may try to be the egg. :-)

I'm not 100% sure I can make it work, but we'll see.

Code: Select all
$ ./foo.mp3:artist="Splashdown"          # This sets filesystem-level metadata (xattrs)

This is pretty much what I'm thinking, though I wouldn't use the assignment syntax; instead, you could do echo "Splashdown" > foo.mp3:artist. This is, of course, inspired by the syntax for NTFS alternate data streams.

Though this does raise awkward questions, again, about the difference between a string that's a file path and a string that's just a string. Are they the same thing? Are they different?

My awkward question is what happens when you give file:attr to a program that doesn't understand it? The semi-stupid thing about xattrs on Linux is that the people who designed them forgot "everything is a file", and you can't name them and go through the normal file APIs (at least AFAIK). So what should you pass into such a program? I don't have a good answer for that, unfortunately. I can think of a couple hacks, but nothing is a great solution of course.

The other awkward question is what happens if someone actually names a file with a colon -- how do you specify that name?

I've gone through a bunch of concepts for named pipes, too, but I'm not sure I've really found anything I'm entirely happy with. One thought I've had is to hijack # for these kinds of "special named entities" - and maybe use something else for comments...
Code: Select all
$ #special-name      ## Comment

Hmm, interesting idea. Personally, I view xattrs and named pipes as being pretty related: e.g. if cmd emits multiple named streams then cmd > file should write those streams to the corresponding xattrs. So that's why I was thinking of a unified syntax.

That may make more sense if I tell you that your named pipes aren't the same as mine :-). My named pipes allow a single command to have multiple inputs and/or outputs.

Postby tetsujin » Thu Jul 12, 2012 6:29 pm UTC

EvanED wrote:
Suppose for the moment that "for" is not internal to the shell, it's a program on the PATH

Just out of curiosity, this means that you can't do the usual 'for' thing of setting an environment variable if you don't provide a way for subprocesses to do that. Do you plan on having 'for' take the command to run as opposed to having it be a shell construct? Have a way for subprocesses to set environment variables? Or is that just an example and really you'd have a more traditional 'for'?


Well, really I'd have "for" implemented internally, because I expect it to be integrated into the shell to ease the implementation of its more complicated features. The point of saying "suppose for was an external" was to show that the intent was that "for" in this case had no "magical" properties granted by integration with the shell.

Even environment variables are not an issue: "for" doesn't need to propagate its variable changes out to the calling shell, it just needs to propagate those changes to programs it runs. So I might need to throw an "export" in there but otherwise having for as an external would work, if I wanted to. I wouldn't want to write for as an external, but there may be occasions where someone wants to do something similar to that for whatever reason...
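
(In that sense xargs is already a kind of external "loop" today - the loop variable reaches the child processes through their environment, and never touches the calling shell:)
Code: Select all
$ seq 3 | xargs -I{} env N={} sh -c 'echo "N is $N"'
N is 1
N is 2
N is 3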

I'm not sure that really solves the problem. I put whitespace after commas as a style thing, but since the idea is that comma would be actual syntax, I wouldn't want it to be necessary to have (or not have) whitespace there.

So you could make whitespace matter, though I'd admit it'd be a bit inconsistent. Actually, that could get you a way to have < and > do what you want: with whitespace, they could be redirection operators, and without, comparison operators.


I really want to avoid that sort of thing as much as possible. I always hated that in bash, etc. you had to have spaces around the square brackets. I understand why that's the case (i.e. they're not syntax) but to me it's very awkward and uncomfortable to have that be the case. Still, there are places in my design where I probably can't avoid it (like infix operators, which, again, aren't real syntax)...

You also have to decide whether that resolves if I type something like --prefix=foo:bar/baz.


That brings up another tricky question - whether the equals sign is syntax. I kind of think it should be (and that's how I've used it in examples and such) - but it's the usual compatibility issue, making it work with existing programs in a way that makes sense and does what people want it to do. One of those cases where the shell would parse things according to its own rules, but then if it's calling a normal program it'd have to just pass in basically what was typed.

If equals is syntax, then the path would start with "foo:", "--prefix=" wouldn't be part of the domain specifier.

My awkward question is what happens when you give file:attr to a program that doesn't understand it? The semi-stupid thing about xattrs on Linux is that the people who designed them forgot "everything is a file", and you can't name them and go through the normal file APIs (at least AFAIK). So what should you pass into such a program? I don't have a good answer for that, unfortunately. I can think of a couple hacks, but nothing is a great solution of course.


Yeah, can't even get a file descriptor to xattr data, I think. (If we could then a solution could be to pass in a /dev/fd file)

The other awkward question is what happens if someone actually names a file with a colon -- how do you specify that name?


Quoting syntax, I guess. I haven't worked out the details of quoting characters in pathnames in my design (I'm slightly averse to using backslashes because they look similar to forward slashes and are the directory separator character on DOS, etc.) - but if Bash had to escape a colon in a pathname it'd just prefix it with a backslash or put it in double quotes.

I've gone through a bunch of concepts for named pipes, too, but I'm not sure I've really found anything I'm entirely happy with. One thought I've had is to hijack # for these kinds of "special named entities" - and maybe use something else for comments...
Code: Select all
$ #special-name      ## Comment

Hmm, interesting idea. Personally, I view xattrs and named pipes as being pretty related: e.g. if cmd emits multiple named streams then cmd > file should write those streams to the corresponding xattrs. So that's why I was thinking of a unified syntax.


There are the unfortunate limits on the size of xattrs to consider, though. I forget the exact numbers but IIRC on some filesystems (incl. the EXT2/3/4 family?) it's really low, like a couple kilobytes per value.

That may make more sense if I tell you that your named pipes aren't the same as mine :-). My named pipes allow a single command to have multiple inputs and/or outputs.


Yeah, I think we discussed this before. IIRC you pointed me at something else that implemented named pipes with those kinds of capabilities - and, for instance, a line filtering program in the vein of "grep" would write out its matches to one stream and the rejected lines to another stream...

At a very basic level that's something I want for the sake of basic I/O redirection (stdin/stdout/stderr) - numeric FDs are fine for that but symbolic names would be lovely... In any case I feel like the existing shell syntax for redirecting those three FDs is pretty awkward (it's hard, for instance, to redirect the stdout of one program in the middle of the job to another program... there are very minimal provisions for "non-linear pipelines") - which is why I started by addressing that question of how to write non-linear pipelines.

But beyond stdin/stdout/stderr I am interested in the idea of being able to attach more input/output file descriptors to a process. It can be done in existing shells, of course, but most programs don't support it and when they do you usually have to attach the FD and also pass in its number as an argument... Since I'm basically defining a new interface for programs to interact with the shell anyway, I could address that one as well.

One basic way to do it would be to simply specify the FDs as arguments on the command line:
Code: Select all
$ gen1 | (> #a)& gen2 | (> #b)& some-program --input1=(< #a) --input2=(< #b)


(Though in this example, "(< #a)" doesn't expand to the data extracted from the named pipe #a - that is, it's not like "cat", it's just yielding a file descriptor.)
When the shell sees a file descriptor passed as an argument to a program it can attach the FD to that program and substitute a /dev/fd path (or something else representing a FD number) in the command argument to tell the program where to get the data...

Syntactically that's the easiest way to do it, it has virtually no impact on other syntax... But it's not the sort of interface that can tell you things like "what input and output streams does this program support?" - and it'd be a lousy way to redirect standard streams (since each program would have to implement that individually...)

I have toyed with the idea of specifying multiple redirects for a job in some kind of block: though as I said I haven't come up with a syntax I'm really happy with yet.
Code: Select all
$ |{#input1=#a, #input2=#b} cmd {#x=#output1, #y=#output2}|


Basically each command in a job can define its own namespace of bindable file descriptors, and the pipe syntax can accept patch blocks on each side to map external stream names to internal ones either positionally or by mapping them to other names.
Code: Select all
## cmd1 generates streams #a and #b, cmd2 takes streams #b and #c as inputs.
## The pipe syntax basically means that file streams #a and #b on the left are mapped to #b and #c on the right.
$ cmd1 {#a, #b}|{#b, #c} cmd2


The syntax isn't fully worked out yet, though. Among other things I have to work out issues like how to reliably access local vs. global pipelines (i.e. pipelines that are defined at the scope of the whole job for the purpose of patching together a non-linear job vs. pipe names established internally by individual programs.)
---GEC
I want to create a truly new command-line shell for Unix.
Anybody want to place bets on whether I ever get any code written?

Postby EvanED » Thu Jul 12, 2012 7:11 pm UTC

tetsujin wrote:Even environment variables are not an issue: "for" doesn't need to propagate its variable changes out to the calling shell, it just needs to propagate those changes to programs it runs.

That's sort of what I was asking, whether you view for as running its commands. It's an interesting idea I hadn't thought of much... but it's not without its consequences. (You couldn't do something like for ...: var=something and have var visible after the loop unless you provide a way for subprocesses to update the environment. Which I want to. :-))
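
(It's the same trap as Bash's pipe-into-while: the loop body runs in a subprocess, so its assignments evaporate.)
Code: Select all
$ count=0
$ seq 3 | while read x; do count=$((count + 1)); done
$ echo $count     # still 0 - the loop ran in a subshell, so the assignment never reached the parent
0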

If equals is syntax, then the path would start with "foo:", "--prefix=" wouldn't be part of the domain specifier.

Right. Actually I think it's kind of unfortunate now that you can't do --prefix=~ and have things work. (Well, have things work without cooperation of the program you're running.)

Yeah, can't even get a file descriptor to xattr data, I think. (If we could then a solution could be to pass in a /dev/fd file)

One thing I'm considering is to pass in an intermediary file or pipe to programs that don't advertise enhanced xattr support - essentially using a proxy.

There are the unfortunate limits on the size of xattrs to consider, though. I forget the exact numbers but IIRC on some filesystems (incl. the EXT2/3/4 family?) it's really low, like a couple kilobytes per value.

Looking around, it's 4K even on Btrfs (which is a bit surprising). For Ext, it may be limited to 4K per file, even. Still, there's an absolute ton you can do in 4K, and it may be possible to come up with a convention for dealing with overflowing one large "logical" xattr onto multiple "physical" xattrs. Or just tell people to use a better file system. :-)
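
(Easy enough to bump into the ceiling directly - a sketch against a throwaway file; the exact failure message depends on the filesystem:)
Code: Select all
$ touch ./victim
$ setfattr -n user.blob -v "$(printf 'x%.0s' {1..100})" ./victim       # a 100-byte value is fine
$ setfattr -n user.blob -v "$(printf 'x%.0s' {1..10000})" ./victim     # on ext4 this typically fails once the per-inode xattr space (~4K) is exhausted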

Yeah, I think we discussed this before. IIRC you pointed me at something else that implemented named pipes with those kinds of capabilities - and, for instance, a line filtering program in the vein of "grep" would write out its matches to one stream and the rejected lines to another stream...

Yes.

(Though in this example, "(< #a)" doesn't expand to the data extracted from the named pipe #a - that is, it's not like "cat", it's just yielding a file descriptor.)

When the shell sees a file descriptor passed as an argument to a program it can attach the FD to that program and substitute a /dev/fd path (or something else representing a FD number) in the command argument to tell the program where to get the data...

This sounds almost just like process substitution: diff <(some-command) <(some-other-command), just in case you don't know about this (relatively-new-to-Bash) feature.
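
A tiny demo (with a couple of made-up files) of what the shell is doing under the hood there:
Code: Select all
$ diff <(sort ./a.txt) <(sort ./b.txt)   # each <(...) runs in the background and is replaced by a /dev/fd path
$ echo <(true)                           # you can see the substituted path directly (the number varies)
/dev/fd/63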

Postby tetsujin » Thu Jul 12, 2012 10:23 pm UTC

EvanED wrote:
tetsujin wrote:Even environment variables are not an issue: "for" doesn't need to propagate its variable changes out to the calling shell, it just needs to propagate those changes to programs it runs.

That's sort of what I was asking, whether you view for as running its commands. It's an interesting idea I hadn't thought of much... but it's not without its consequences. (You couldn't do something like for ...: var=something and have var visible after the loop unless you provide a way for subprocesses to update the environment. Which I want to. :-))


That's true. Forgot about that little detail. I have given thought to giving programs the ability to write to the shell's environment, though implementation-wise I think it's something that would come later.

Yeah, can't even get a file descriptor to xattr data, I think. (If we could then a solution could be to pass in a /dev/fd file)

One thing I'm considering is to pass in an intermediary file or pipe to programs that don't advertise enhanced xattr support - essentially using a proxy.


Another solution would be to use something like FUSE to create filesystem-level access to xattrs. Requires more setup at the admin level, of course, but it could be one of the cleaner solutions overall. I guess another way to go would be to substitute a temp file, and watch it for changes (with one of the notify APIs maybe) and update the xattr data accordingly.
I guess the things to consider, though, are what kinds of things do you want programs to be able to do with data from xattrs? Simple cases like piping data from or to them are pretty straightforward. Providing a full (random-access) file interface to xattr data and passing in the xattr as though it were a filename is a bit more complicated. I'm not sure I'd bother going to that extent. :)

There are the unfortunate limits on the size of xattrs to consider, though. I forget the exact numbers but IIRC on some filesystems (incl. the EXT2/3/4 family?) it's really low, like a couple kilobytes per value.

Looking around, it's 4K even on Btrfs (which is a bit surprising). For Ext, it may be limited to 4K per file, even. Still, there's an absolute ton you can do in 4K, and it may be possible to come up with a convention for dealing with overflowing one large "logical" xattr onto multiple "physical" xattrs. Or just tell people to use a better file system. :-)

I think 4K would go pretty fast. That's like 4 screens of text (80x25). Though I'm not so concerned about the frequency with which I'd hit the limit as I am simply with the fact that it's there waiting for me when I finally do hit one of those cases. I'd hoped xattrs could be used as file forks, and just store whatever. Having those limits in place means they can't really be exploited to that kind of degree. Still, I'm happy to have xattrs... xattrs is better than no xattrs.

When the shell sees a file descriptor passed as an argument to a program it can attach the FD to that program and substitute a /dev/fd path (or something else representing a FD number) in the command argument to tell the program where to get the data...

This sounds almost just like process substitution: diff <(some-command) <(some-other-command), just in case you don't know about this (relatively-new-to-Bash) feature.


Yep. In working on my design I've also taken the time to study the existing shells to see what kinds of features they provide outside of the ones I use regularly. I think it's important to understand some of the more arcane or recent additions so I don't fall into the trap of saying things like "You can't do this in Bash" (and then be told that, actually, there's a pretty good way to do it in bash) - and if I cast off some of these features I should at least know what I'm losing.

I thought process substitution was pretty neat and I was kind of surprised to learn about /dev/fd*. It seems like kind of a hackish way to get a file descriptor reference into a program that (probably) isn't written to accept numeric FDs for its file arguments... But at the same time it's a pretty elegant approach. A program can be totally ignorant of /dev/fd* and it'll still work...
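
For instance (on Linux at least), a program as plain as wc will happily read from an inherited descriptor this way - it just opens the path it was handed:
Code: Select all
$ wc -l /dev/fd/3 3< /etc/passwd     # the shell attaches fd 3; wc treats "/dev/fd/3" like any other filename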
---GEC
I want to create a truly new command-line shell for Unix.
Anybody want to place bets on whether I ever get any code written?