Random Linux stuffs, comparisons to Windows, and other ...




Postby EvanED » Tue Aug 07, 2007 12:35 am UTC

[I would have liked to continue the title with "and other streams of consciousness", but it wouldn't let me. Consider that a technical limitation.]

So over the weekend I learned a bunch of stuff about Linux shells in particular. I hope everything I say is accurate; I believe it is, but I'm obviously not an expert, especially on the Linux side.

For instance, I had a question for a friend of mine, which was how do you recursively grep a directory looking for an expression in all .c files. The only way I knew how at the time, and the way he told me, involved some incantation of find which I wouldn't be able to reproduce. The problem, of course, is that

Code: Select all

grep "foo" --recursive *.c

won't work, because the shell will expand the *.c into all matching files in the current directory before calling grep, so you might not have passed the --recursive. If you quote *.c, grep looks for a file actually called *.c, and so that doesn't work either.
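
One way to see exactly what grep ends up receiving is to stick an echo in front of the command (the directory contents here are hypothetical):

Code: Select all

$ ls
bar.c  foo.c  subdir
$ echo grep "foo" --recursive *.c
grep foo --recursive bar.c foo.c

So grep never sees the pattern at all; it just gets whatever happened to match in the current directory.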

By comparison, if grep were written for Windows (instead of just ported to have the same behavior it does on Unix), it's very likely that the command line I gave above would work. The Windows shell won't expand filenames, so grep would actually receive *.c as the argument, just as if you had quoted it to Bash. It would then recursively traverse, looking at .c files as one might naively expect the command to do above. (dir provides a case study -- try 'dir /s *.cpp' in a directory with subdirectories with cpp files.)

Thus what is a fairly common operation for me -- recursively grepping files of a particular type -- would be easier on Windows. I've used this example to justify in my mind why I think the Windows shell vs. Unix shell argument, while it clearly favors Unix, isn't quite as one-sided as it initially appears and as a lot of people make it out to be.

(The Windows model is also potentially more efficient, in addition to sometimes being easier to use. For instance, if the user enters something like 'ls moon*' on Unix, there are at least two linear directory traversals being undertaken: one by the shell, one by ls. With a smart implementation of each that stops when it gets to the first file after moon, you're looking at an average of n calls to readdir, where n is the size of the directory, each of which probably does a constant amount of work. On Windows, if there is only one match say, there will be at most two system calls. In general, there will be m, where m is the number of matches, but with a smart file system, the first of those will have to do only log(n) work. This is a consequence of both the shell behavior (cmd.exe doesn't expand moon*) and the fact that 'dir' doesn't even expand it, but passes "moon*" to the file system code in the kernel, thus allowing it to take advantage of the additional structure. This is more of an interesting side note, and probably doesn't actually do much of anything in practice. For instance, changing it so that the pattern isn't a prefix, so we say "ls *.c", means that Windows is going to almost certainly do a linear pass through the directory. Though if system calls are a significant part of this time, things are still looking good, as Windows will only need to make system calls for each matching file, not every file. Even so, this would only matter for large directories, and there are probably plenty of other particulars in play that this is only a puny piece of the performance puzzle. (Like that sentence? ;-)))

(Another example is something like "ren *.JPG *.jpg" which behaves "correctly" on Windows, but s/ren/mv/ doesn't work on Unix. This example is also lessened by new discoveries, in that I thought you had to do something like "for $file in *.JPG; do mv $file ${file%.JPG}.jpg; done" (not 100% sure of the syntax, which is part of my point), but I just found out about rename(1), which lets you do this almost as cleanly: "rename .JPG .jpg *.JPG".)
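
For reference, I believe the working form of that loop is the following (quoted so that odd filenames don't break it):

Code: Select all

for file in *.JPG; do mv -- "$file" "${file%.JPG}.jpg"; done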

Anyway, while to my knowledge my argument still holds for Bash and tcsh, it doesn't for zsh, which would let you write this:

Code: Select all

grep "foo" **/*.c



In addition, there were a few other things that I thought the Windows shell does better (at least in some respect) that you can turn on even in Bash, though they aren't the default. For instance, though I haven't tried it, you can apparently tell Bash to ignore consecutive duplicate commands in the history with HISTCONTROL=ignoredups. This is something that comes in handy, as I often go back and forth between just a couple of commands... for instance, I'll issue a bunch of make commands, then run into a problem that requires I run a program explicitly (as opposed to through the check target in the makefile) so I can debug it, but if I use the up arrow I have to pass over the dozen make commands first. (I also found out about some things that I've wanted for a while that Windows doesn't do, like autopushd, auto_cd, and ctrl-R for reverse incremental search, which to a large extent obviates the HISTCONTROL option above ;-).)
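
In config-file form these look something like the following (assuming the usual ~/.bashrc and ~/.zshrc locations):

Code: Select all

# ~/.bashrc: drop consecutive duplicate commands from history
export HISTCONTROL=ignoredups

# ~/.zshrc: the conveniences mentioned above
setopt autopushd   # every cd pushes the old directory onto the directory stack
setopt autocd      # a bare directory name acts as a cd
# Ctrl-R reverse incremental search is bound by default in both shells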

Now that I've set those options, along with nocaseglob and a similar thing for tab completions (I'm a fervent believer that file systems -- and indeed almost everything else I can think of -- shouldn't be case sensitive, and the fact that I was doing this under cygwin meant that having the shell behave in a case-sensitive manner was pretty stupid), as well as configuring what I think is a pretty decent prompt, I didn't really say "stupid shell" at all today, which is a first, and I'm now pretty much strictly happier in zsh than I am in cmd.exe, instead of routinely going "this behavior is stupid, I wish it worked like {the other shell} in this respect". Hopefully it will hold.
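
For reference, the bash spellings of the case-insensitivity settings are below (zsh does the same thing through its own option names and completion styles):

Code: Select all

# ~/.bashrc
shopt -s nocaseglob             # globs match regardless of case
# ~/.inputrc
set completion-ignore-case on   # tab completion ignores case too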

Anyway, I'm not done writing in this thread, because I have other things to say, and questions to ask of the form "will Linux/Unix do blah?". This thread is going to be a dumping ground for a few different things on my mind. This particular one was posted in the hopes that it would be helpful to other people, and also in the hopes that someone will say "you dolt! there's a much better way of doing that" and teach me something.


Postby EvanED » Tue Aug 07, 2007 12:49 am UTC

So for this post, I pose a question.

On Windows, it is possible to register a debugger, such as windbg or the non-Express Editions of Visual Studio (if you know how to do this with the Express Editions, please tell me) as a just-in-time debugger. Basically what this means is that, instead of getting the typical "this program has performed an illegal operation and will be shut down" dialog, it will offer you the opportunity to debug the dying process. (Or even just open the debugger automatically.)

Now, on Unix, postmortem debugging is done through the use of core files. When a program segfaults, it leaves behind a dropping. You can then load up the core file into GDB or whatever, and investigate away.
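
From what I've read, the basic flow is something like this (the program and core file names here are made up):

Code: Select all

$ gdb ./myprog core
(gdb) bt           # stack trace at the point of the crash
(gdb) frame 2      # select a particular frame
(gdb) print some_var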

First question: when debugging a core file (I've never done this, though I really ought to learn how and start, rather than re-running the program in a debugger and hoping it's a deterministic bug so it crashes again), can you do things like change a variable and continue running if you "fixed" the problem?

Second question: is it possible to rig things so that on a segfault you can go right into GDB, like on Windows? This would be really nice to do, say, on a school system where I apparently don't have access to ulimit ("command not found") and core dumps are disabled.


After I eat dinner, I'll make one more post, in which I describe why my favorite adjective to use to describe Windows is schizophrenic, and why all the psych people should be mad at me for it.



Postby davean » Tue Aug 07, 2007 5:06 am UTC

This is too long, and I'm too tired to talk about principles of OS design rigorously, but I'll give a (very) brief reply to the truly stunning points.

EvanED wrote:For instance, I had a question for a friend of mine, which was how do you recursively grep a directory looking for an expression in all .c files. The only way I knew how at the time, and the way he told me, involved some incantation of find which I wouldn't be able to reproduce. The problem, of course, is that

Code: Select all

grep "foo" --recursive *.c

won't work, because the shell will expand the *.c into all matching files in the current directory before calling grep, so you might not have passed the --recursive. If you quote *.c, grep looks for a file actually called *.c, and so that doesn't work either.

By comparison, if grep were written for Windows (instead of just ported to have the same behavior it does on Unix), it's very likely that the command line I gave above would work. The Windows shell won't expand filenames, so grep would actually receive *.c as the argument, just as if you had quoted it to Bash. It would then recursively traverse, looking at .c files as one might naively expect the command to do above. (dir provides a case study -- try 'dir /s *.cpp' in a directory with subdirectories with cpp files.)



That shouldn't work, for the simple reason that you are telling it to recurse into every directory ending with '.c'. You have to be able to tell it *which* directories to search in. To decide which files to look at, you'd use the much more powerful --include/--exclude.
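
Concretely, with a reasonably recent GNU grep the search you want is just:

Code: Select all

grep -r --include='*.c' foo .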

Your view of it would leave a tool you couldn't do much with. Without being able to tell it which directories to look in, you'd have to assume one of two starting points: the current directory or the root directory. The first is the only real option, since the root is far too unrestricted to be practical or useful. But it is not always possible to change to the directory you want, and doing so would also necessitate a separate run for each one. Changing directories like that is impractical and would highly complicate basic usage patterns like searching in every public_html folder.

You have a conceptual flaw here, in that you couldn't make a complete, general-purpose tool out of that in any way, shape, or form. (And also in that you don't know what those options actually do.)



EvanED wrote:(The Windows model is also potentially more efficient, in addition to sometimes being easier to use. For instance, if the user enters something like 'ls moon*' on Unix, there are at least two linear directory traversals being undertaken: one by the shell, one by ls.


No it isn't. ls doesn't make a linear traversal, it makes a direct lookup (one syscall per file). The shell need not necessarily make a linear pass either (glob can be smart; only its interface is specified, not its implementation; in practice, I expect no one cares to optimize away this O(n) (ok, I actually know of ways to do this through in-kernel stuff if you have a real strong reason, but that's basically custom kernel scripts ... people wouldn't use that (they use those for real scalability issues at high load with massive numbers of file handles etc.) - I can't really think of any time it is a limiting factor).

In short, the POSIX-like system is at most n lookups.

Oh, and as for "a smart filesystem", I'll point out that name sorted isn't the fastest case for all workloads.

Also, is Find... actually *kernel* code? A quick look at MSDN lists a needed library ... not that I know how win32 is stuck together, but requiring a library would (slightly) imply it isn't actually part of the kernel. (Looking at file system drivers for windows it looks like there is some support for it there, but not in the same way as that API, so I'm not sure what performance actually looks like for this.)


As for case sensitivity, Windows actually *is* case sensitive, I just discovered while looking at the docs for those functions; it just "hides" it for most uses, which has amusing consequences such as creating files a user can't access with the normal tools.

Furthermore, a case-insensitive file system requires workarounds from users. Case denotes differences between things in real life. You now need to define some way to handle case-affected files if you are making automated tools (these may not come up in your basic usage, but they do come up in the real world, and not all that rarely either).

As just one real killer case, I'll point out RFC 2616 (HTTP 1.1) which states:

The IETF wrote:3.2.3 URI Comparison


When comparing two URIs to decide if they match or not, a client
SHOULD use a case-sensitive octet-by-octet comparison of the entire
URIs, with these exceptions:

- A port that is empty or not given is equivalent to the default
port for that URI-reference;

- Comparisons of host names MUST be case-insensitive;

- Comparisons of scheme names MUST be case-insensitive;

- An empty abs_path is equivalent to an abs_path of "/".


So, on your case-insensitive file system, how do you plan to mirror a website?

A less real example is how would you save a file for each person under their name? Some names are differentiated only by case! (rare, never mind some names aren't differentiated) (more real is where a word refers to a specific thing or a general concept/set depending on case.) (Many real world cases do come up.)

It is quite clear that a case-sensitive file system is significantly more general and much easier to use in the long run for anyone but a non-power user. (and a case-insensitive file system is emulatable on a case-sensitive one, not really the other way around in any reasonable way)


Postby taggedunion » Tue Aug 07, 2007 6:46 am UTC

As far as "mv" goes, when it has >2 args it tries to fit [0...n-1] args into [n-1] arg, which it believes to be a directory.

Code: Select all

mv *.html web
would move all files ending in .html into the directory "web".

Code: Select all

mv *.htm *.html
would attempt to move all the files ending in .htm, plus all but the last of the files ending in .html, into whatever the last .html file is (which it assumes is a directory), which of course it would choke on.

Putting in "man mv" into your own terminal will say as much. So, that's the described behavior (with such behavior easily discovered!), so saying it's "incorrect" when you're coming from a Windows perspective seems a bit low.

Anyway, ren != mv. One might pity you more if they had the same name.

Have you looked at the man and/or info pages for the commands you're having trouble with? A lot of what you need you could find by just reading those, I'm sure.


Postby davean » Tue Aug 07, 2007 7:59 am UTC

EvanED wrote:First question: when debugging a core file (I've never done this, though I really ought to learn how and start, rather than re-running the program in a debugger and hoping it's a deterministic bug so it crashes again), can you do things like change a variable and continue running if you "fixed" the problem?


Technically yes, though there are a lot of problems with this, the most critical being reopening the file descriptors, since the record of what file each descriptor pointed to is gone unless you saved it somewhere (a fairly simple thing to do with ptrace or an LD_PRELOAD-injected library). Numerous checkpointing systems exist, implemented at different levels (user, kernel).

EvanED wrote:Second question: is it possible to rig things so that on a segfault you can go right into GDB, like on Windows? This would be really nice to do, say, on a school system where I apparently don't have access to ulimit ("command not found") and core dumps are disabled.


Yah you could ... it generally is a bad idea though. Things don't usually crash. On your personal workstation going right to a debugger might be ok, but on other systems it is a great way to DOS the system, fast. So are core files. If you enable core dumps, there is a fair bit of responsibility to be ready for your disks to start filling up fast.

'ulimit' isn't a command, it is generally a bash shell builtin. You probably aren't on bash. If, for example, you were on a csh, you'd use 'limit'.

Just read the help for your shell.
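
Where core dumps are allowed, turning them on is just:

Code: Select all

ulimit -c unlimited            # sh/bash builtin
limit coredumpsize unlimited   # csh/tcsh builtin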


Postby djn » Tue Aug 07, 2007 12:31 pm UTC

I can condense one of his points down quite a lot:
In DOS-like shells, the programs get passed the bare arguments. In sh-like shells, they are expanded beforehand.

The good thing about the former is that it's easy to give glob patterns (which makes e.g. renaming a bunch of files fairly trivial).
The good thing about the latter is that it can make scripting easier, and programs a bit easier to implement.

I still think the DOS model has more possibilities:
It is possible to implement sh-style expansion if you want, you just have to do it in the program when parsing the arguments instead of the shell doing it for you. On the other hand, it's not possible to implement DOS-style argument handling in an sh-style shell.
On the flipside, the sh style makes for more predictable utilities.



Postby EvanED » Tue Aug 07, 2007 12:54 pm UTC

davean wrote:That shouldn't work, for the simple reason that you are telling it to recurse into every directory ending with '.c'.


...according to the Unix interpretation, yes. According to the DOS interpretation, no. According to the DOS/Windows interpretation, it recurses in the current directory looking at files matching *.c.

Your view of it would leave a tool you couldn't do much with.


On the contrary, my assertion is that you would be able to do MORE (without a supporting find or zsh-like **/ globbing), or at least more useful things.

There's no reason you couldn't implement an --include/--exclude type mechanism for Windows as well, which would I think give you back most of the power you want.

Edit: I misinterpreted what include/exclude did; my impression was that they specified whether to recurse into the directories named. What they actually are is, again, a better way of doing what I want than the incantation of find I mentioned. In my defense, though, the man page I have easy access to doesn't include the --include and --exclude options. This is partly because the manpage setup of my university's CS dept. is somewhat whacked out, I think.

Edit 2: It's not just the man page that's missing the options; the version of grep we have itself doesn't support them.

No it isn't. ls doesn't make a linear traversal, it makes a direct lookup (one syscall per file).


Yeah, I realized this later, and intended to come back, but forgot. Actually I'm fairly surprised I didn't...

In short, the POSIX-like system is at most n lookups.


You'll still have n readdir calls by the shell (Edit 3: no you don't, you have a few getdents64(2) calls) plus m stat calls by ls, as compared to m FindNextFile calls by dir, not all of which are actually going to be system calls. (Edit 3: so the final tally is that the Unix getdents calls are roughly equivalent to the Windows find-file calls, in that both return multiple entries per system call, but Unix still has the extra stat calls by ls.)
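
If anyone wants to check the tally on a Linux box, something like this should show the counts (the exact syscall names vary by platform and libc):

Code: Select all

strace -f -c -e trace=getdents64,stat,lstat sh -c 'ls moon*'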

Oh, and as for "a smart filesystem", I'll point out that name sorted isn't the fastest case for all workloads.


True, but it's a pretty good way of storing things for common ones -- opening a file by name.

Also, is Find... actually *kernel* code?


Not sure what you mean by "Find", but if you're referring to the *.c matching stuff, then yes.

As for case sensitivity, Windows actually *is* case sensitive, I just discovered while looking at the docs for those functions; it just "hides" it for most uses, which has amusing consequences such as creating files a user can't access with the normal tools.


Yes.

I too happen to think that behavior is broken, though the alternative is not much better. Mixing case-sensitive and case-insensitive behavior in the same system seems like asking for trouble either way.

Furthermore, a case-insensitive file system requires workarounds from users. Case denotes differences between things in real life.


Occasionally, but rarely. Mostly case differences come down to typos and whether a word happens to be at the beginning of a sentence. Requiring the user to manage different cases means that the memory and typing burden is higher, for no convincing use case I've heard.

As just one real killer case, I'll point out RFC 2616 (HTTP 1.1) which states:

The IETF wrote:3.2.3 URI Comparison


When comparing two URIs to decide if they match or not, a client
SHOULD use a case-sensitive octet-by-octet comparison of the entire
URIs, with these exceptions:

- A port that is empty or not given is equivalent to the default
port for that URI-reference;

- Comparisons of host names MUST be case-insensitive;

- Comparisons of scheme names MUST be case-insensitive;

- An empty abs_path is equivalent to an abs_path of "/".



(1) Yes, I think that's dumb and wrong
(2) You could still implement a case-sensitive view of the file system when serving it over HTTP and hence match the RFC, though it won't help you here:

So, on your case-insensitive file system, how do you plan to mirror a website?


The point is that I don't think any website SHOULD be depending on the case-sensitivity.

A less real example is how would you save a file for each person under their name? Some names are differentiated only by case! (rare, never mind some names aren't differentiated)


You already can't do that since multiple people can have the same name. You already need another key.

(more real is where a word refers to a specific thing or a general concept/set depending on case.) (Many real world cases do come up.)


A more convincing example I've heard is Fortran's convention of *.f for plain source files and *.F for source files that get run through the preprocessor.

I'm not claiming that you don't lose something by switching to a case-insensitive file system, just that what you lose is IMO very small compared with the usability benefits of not having to worry about case. I will admit that I'm less strong in my belief than I was and I'm not *totally* convinced that it should actually be a filesys feature and not a feature of the shell, but I still lean towards that view.
Last edited by EvanED on Tue Aug 07, 2007 1:36 pm UTC, edited 3 times in total.


Postby EvanED » Tue Aug 07, 2007 1:00 pm UTC

taggedunion wrote:Have you looked at the man and/or info pages for the commands you're having trouble with? A lot of what you need you could find by just reading those, I'm sure.


Um, yeah, I know all that. The man page also says "mv - move (rename) files", and if a newbie user asks how you rename a file, your answer is going to be mv.

My point isn't about the way mv behaves; it's that I didn't think there was a good way to do what I suggested.


Postby EvanED » Tue Aug 07, 2007 1:03 pm UTC

davean wrote:Yah you could ... it generally is a bad idea though. Things don't usually crash.


They do if you're a developer. ;-)

'ulimit' isn't a command, it is generally a bash shell builtin. You probably aren't on bash. If, for example, you were on a csh, you'd use 'limit'.


Ahhhhhh, thanks. Forgot about that. Should have done a man ulimit.


Postby iw » Tue Aug 07, 2007 1:48 pm UTC

I get around limitations of the bash globbing system by using find and xargs. xargs basically runs a command on every file passed to it. So your example could work like this:

Code: Select all

find . -name "*.c" | xargs grep foo

find, by default, lists every file in the current directory recursively, so you get the entire tree. -name of course only prints the .c files, and the pipe sends the output to xargs, which then runs "grep foo x" for every x printed.

Of course, there's one problem you need to watch out for if you are a sysadmin, and that's the fact that xargs splits its input on blanks and newlines, which are, of course, legal filename characters in UNIX. This can lead to files that make your automated scripts do naughty things when you didn't mean them to (e.g. "test\n/etc/passwd"). In this case, you have to use a handy option, -print0:

Code: Select all

find . -name "*.c" -print0 | xargs -0 grep foo

-print0 causes the file names to be separated by null characters instead of newlines, and the -0 passed to xargs tells it to use those.

You can also use:

Code: Select all

for i in `find . -name "*.c"`
do
    grep foo "$i"
done

but it has the same newline character problems, and you need to set some magic variable to make it split on a null character instead of a newline.
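
If you do want a loop, a version that survives both spaces and newlines reads null-delimited names instead (bash-specific):

Code: Select all

find . -name "*.c" -print0 | while IFS= read -r -d '' f
do
    grep foo "$f"
done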


Postby demon » Tue Aug 07, 2007 8:22 pm UTC

or you could compress this to:

Code: Select all

find . -name "*.c" -exec grep foo {} \;

this should take care of weird chars in filenames, although I didn't check.
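
If your find is new enough, you can also batch the filenames the way xargs does, by ending with + instead of \;:

Code: Select all

find . -name "*.c" -exec grep foo {} +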


Postby taggedunion » Wed Aug 08, 2007 2:42 am UTC

Sorry, EvanED -- it sounded like you hadn't read the man pages, but it turned out it wasn't your fault. My apologies.


Postby ToLazyToThink » Wed Aug 08, 2007 3:36 am UTC

djn wrote:I can condense one of his points down quite a lot:
In DOS-like shells, the programs get passed the bare arguments. In sh-like shells, they are expanded beforehand.

The good thing about the former is that it's easy to give glob patterns (which makes e.g. renaming a bunch of files fairly trivial).
The good thing about the latter is that it can make scripting easier, and programs a bit easier to implement.


You also forget another benefit of the unix model. Don't think globbing is powerful enough? Write your own shell and do it however you like. You aren't stuck with decisions made decades ago.

Besides if Windows used the unix model I wouldn't have spent hours trying to figure out why java.lang.Runtime.exec(String[]) is so incredibly stupid (why the hell do I have to escape quotes when I'm passing arguments as separate strings???? I bet that's going to cause some pain when I move the code over to the unix box).


Postby EvanED » Wed Aug 08, 2007 4:10 am UTC

ToLazyToThink wrote:You also forget another benefit of the unix model. Don't think globbing is powerful enough? Write your own shell and do it however you like. You aren't stuck with decisions made decades ago.


I didn't forget that, I just didn't mention it. I figured the command line benefits of Unix get enough airtime that I would point out some good things about Windows.

Besides, you could still do that with Windows. (Though since things that are full-fledged programs on Unix, like ls, are builtins of the Windows shell, you wouldn't have access to them and would have to do more work, unless you leveraged someone else's work like Cygwin.)

Besides if Windows used the unix model I wouldn't have spent hours trying to figure out why java.lang.Runtime.exec(String[]) is so incredibly stupid (why the hell do I have to escape quotes when I'm passing arguments as separate strings???? I bet that's going to cause some pain when I move the code over to the unix box).


Personally, while I don't know the exact problem, that sure sounds like it should be blamed much more on Java than on Windows. That's a detail that the API should hide from you. If I understand you correctly, if I ran across that I would have reported it to Sun as a bug. (Or at least posted it to some message board and tried to get an explanation of the behavior, in case there's something I'm overlooking why it shouldn't do it automatically.)


Postby ToLazyToThink » Wed Aug 08, 2007 10:11 am UTC

EvanED wrote:Personally, while I don't know the exact problem, that sure sounds like it should be blamed much more on Java than on Windows. That's a detail that the API should hide from you. If I understand you correctly, if I ran across that I would have reported it to Sun as a bug. (Or at least posted it to some message board and tried to get an explanation of the behavior, in case there's something I'm overlooking why it shouldn't do it automatically.)


True, I think I was just venting. Write once run anywhere my hiney.

NOTE: Undid stupid edit.
Last edited by ToLazyToThink on Wed Aug 22, 2007 10:34 pm UTC, edited 2 times in total.


Postby djn » Wed Aug 08, 2007 12:28 pm UTC

ToLazyToThink wrote:You also forget another benefit of the unix model. Don't think globbing is powerful enough? Write your own shell and do it however you like. You aren't stuck with decisions made decades ago.


Ah, but. If I use a Unix shell, and wanted to be able to pass an application a glob pattern instead of having it expanded, I'd have to first change the shell, and then replace each and every other CLI application: "Stuck with decisions made decades ago" right there.
By leaving the decision to the app, I would merely have to replace the application.

Of course, doing sweeping changes would in theory be harder, but there's a solution for that as well: Let the glob-expanding code be one standard library. Change that, and all apps gain more power ... without having to rewrite anything else.


Postby ToLazyToThink » Thu Aug 09, 2007 12:27 am UTC

djn wrote:Ah, but. If I use an unix shell, and wanted to be able to pass an application a glob pattern instead of having it expanded, I'd have to first change the shell, and then replace each and every other CLI application: "Stuck with decisions made decades ago" right there.
By leaving the decision to the app, I would merely have to replace the application.


That or you could just stick your pattern in double quotes (like you would with find or grep). :>)

And I don't get why you would need to change every CLI program? Unless you're actually wanting to strip it out for all applications all the time for all cases? Seems kind of overkill just to avoid a couple double quotes?

djn wrote:Of course, doing sweeping changes would in theory be harder, but there's a solution for that as well: Let the glob-expanding code be one standard library. Change that, and all apps gain more power ... without having to rewrite anything else.


Unless they were statically compiled. Or their language used a home brew runtime for some reason or the other. Or you have a program that calls another program using arguments that aren't compatible with your changes....


Postby zenten » Thu Aug 09, 2007 12:51 am UTC

Or just use something other than bash for your own shell, as any good script will have at the beginning what shell to use.
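
That is, the shebang line at the top of the script:

Code: Select all

#!/usr/bin/env zsh
# everything below runs under zsh, regardless of the user's login shell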


Postby Porges » Thu Aug 09, 2007 1:47 am UTC

demon wrote:or you could compress this to:

Code: Select all

find . -name "*.c" -exec grep foo {} \;

this should take care of weird chars in filenames, although I didn't check.


I can't check (computer down at the moment), but I think the {} need to be quoted: "{}" for spaces and suchlike.


Postby ToLazyToThink » Wed Aug 22, 2007 10:34 pm UTC

ToLazyToThink wrote:
EvanED wrote:Personally, while I don't know the exact problem, that sure sounds like it should be blamed much more on Java than on Windows. That's a detail that the API should hide from you. If I understand you correctly, if I ran across that I would have reported it to Sun as a bug. (Or at least posted it to some message board and tried to get an explanation of the behavior, in case there's something I'm overlooking why it shouldn't do it automatically.)


True, I think I was just venting. Write once run anywhere my hiney.

NOTE: Undid stupid edit.



After further thought and research, it is Windows' fault. Java can't really do anything about it, because it has no idea how the end program is going to parse the command line. Different Windows programs do use different runtimes (or even just different versions of the same runtime). Some even bypass the runtime and parse the command line themselves.

There's no way Java can predict what a random program is going to do, so the only logical choice is to push the responsibility back on the caller (after all there's at least a chance they know what the program is going to do). Java does share some fault though for failing to warn about this issue in the javadoc.


Postby TomBot » Fri Aug 24, 2007 6:50 am UTC

Regarding just-in-time debugging: you basically make a shared object file that installs a signal handler for SIGSEGV, which launches whichever debugger you want. Then you put LD_PRELOAD=/path/to/your-library.so in the environment, and any dynamically linked non-suid programs you run will link against that library. You can even make your own versions of library functions, because preloaded libraries are searched before the normal libraries. The hard part is getting code to run when a library is loaded. You can do this portably in C++ using static constructors.
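
The shell side of it is just something like this (filenames made up):

Code: Select all

gcc -shared -fPIC -o segv-to-gdb.so segv-to-gdb.c
LD_PRELOAD=$PWD/segv-to-gdb.so ./myprogram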

Wildcards: Windows' solution of having the programs themselves decode the wildcards only works for programs that actually do that, which in practice ends up being only a few standard command line tools. The unix way is more universal in that regard. Sucks that bash's wildcards can't express what you want, but you can always do something like grep foo `find . -iname '*.c'`.

Case sensitivity: have to disagree with you there. Unix treats filenames as opaque strings of bytes, that happen to be separated by '/'. That's quite arbitrary enough for me. Otherwise you're building a dependence on ASCII and English into the kernel. Keep in mind that lots of languages don't have a bijective mapping between upper and lower case like English does. To actually handle this right, you need to use a whole system such as unicode, and then you have this huge complexity, as well as security problems arising from being able to make different files that look the same in xyz font. In other words, case insensitivity is ugly. It should live in userspace. The kernel connects the hardware and the software, it's the software that should deal with the humans.


Postby djn » Fri Aug 24, 2007 11:07 am UTC

ToLazyToThink wrote:That or you could just stick your pattern in double quotes (like you would with find or grep). :>)

That's a workaround, but slightly inelegant. :)

And I don't get why you would need to change every CLI program? Unless you're actually wanting to strip it out for all applications all the time for all cases? Seems kind of overkill just to avoid a couple double quotes?

If I did want to change the default behavior to "pass globs unexpanded", all programs that expect them to be expanded (which would be all of them) would need some rewriting.


Unless they were statically compiled. Or their language used a home brew runtime for some reason or the other. Or you have a program that calls another program using arguments that aren't compatible with your changes....

Well, of course, but that could be worked around.


For the record, I don't mind things the way they currently are, I'm just saying that the alternative wouldn't be universally worse. :)


Postby ToLazyToThink » Fri Aug 24, 2007 5:53 pm UTC

djn wrote:
ToLazyToThink wrote:That or you could just stick your pattern in double quotes (like you would with find or grep). :>)

That's a workaround, but slightly inelegant. :)

True, but it has the advantage of not needing to change anything.

djn wrote:
And I don't get why you would need to change every CLI program? Unless you're actually wanting to strip it out for all applications all the time for all cases? Seems kind of overkill just to avoid a couple double quotes?

If I did want to change the default behavior to "pass globs unexpanded", all programs that expect them to be expanded (which would be all of them) would need some rewriting.

Nope, that's not the only choice in the unix model. You could change your shell to pass the unexpanded globs in environment vars (or some other method). Your program could then be changed to use that instead of (or in addition to) the normal args. Then you only need to change the programs that you wanted to change anyway.

djn wrote:
Unless they were statically compiled. Or their language used a home brew runtime for some reason or the other. Or you have a program that calls another program using arguments that aren't compatible with your changes....

Well, of course, but that could be worked around.


How can it be worked around? That's exactly the problem facing Java's Runtime.exec. There's no way to treat the programs generically if the programs don't behave in a generic fashion.

You can avoid the problem by mandating that all programs use the same runtime, but that didn't work for Windows. Besides, you started off wishing for a program that would have to bypass the runtime so it could treat globs specially.


Postby djn » Fri Aug 24, 2007 6:58 pm UTC

ToLazyToThink wrote:True, but it has the advantage of not needing to change anything.

I was arguing that if it had been like that from the start, we'd have a different, and not in all ways worse, sort of flexibility. Being able to work around it with quotes is of course useful, but the answer to a different question. :)


djn wrote:Nope, that's not the only choice in the unix model. You could change your shell to pass the unexpanded globs in environment vars (or some other method). Your program could then be changed to use that instead of (or in addition to) the normal args. Then you only need to change the programs that you wanted to change anyway.

Ok, that is actually a good idea. The possibility for confusion would be huge, but that's also true for the "just pass everything unchanged" model. One drawback is that the same tool with the same arguments could produce horribly different results depending on whether the shell did the right thing or not, while an "always pass untreated" system would be more predictable in that particular way.

How can it be worked around? That's exactly the problem facing Java's Runtime.exec. There's no way to treat the programs generically if the programs don't behave in a generic fashion.

You can avoid the problem by mandating that all programs use the same runtime, but that didn't work for Windows. Besides, you started off wishing for a program that would have to bypass the runtime so it could treat globs specially.


In the worst case, you'd have some statically linked programs using old variants, some using subtly different homebrew variants, and some using whatever the locally newest is. Not at all unlike the current state of regexp matching, then. (Remember that we're talking about glob expansion here. It's not the fastest changing area in the world, nor one that's likely to break too horribly if there's a minimal subset people agree on.)

Changing the glob library so much that it loses backwards compatibility would break the oddest things, of course. A similar change to bash or sh wouldn't exactly be painless either, but possibly easier to work around?

The main problem seems to be that a working pass-unexpanded system with the possibility of change would need a certain bit of platform culture. Something like "all programs have a switch that makes them return the version of the most current subset they support fully", some standard way to make a program fall back to an old standard version, and a BSD- or LGPL-licensed library that roughly everybody used, would seem to solve most of the possible problems. Getting all programs to follow non-binding standards like that would be the problem, but things like that have been done. (See e.g. the Amiga standards for how to store settings.)

Understand that I'm basically doing speculative alternate reality fiction here, and my position should be clearer. :D


Postby zenten » Fri Aug 24, 2007 7:46 pm UTC

I still don't see why you don't just use a different shell.


Postby djn » Fri Aug 24, 2007 8:07 pm UTC

zenten wrote:I still don't see why you don't just use a different shell.


Because that would require writing special cases in the shell for every utility I'd like to act differently, or rewriting the programs to work with a special feature of the shell to do whatever I wanted them to.

Because having such serious differences between my shell and the shell scripts of the rest of the world would be somewhat inconvenient.

And most importantly:
Because I really don't mind the current situation. I'm trying to imagine how the world would have looked if the first sh had left glob expansion to the apps, and this had been taken as the way to do things on *nix, solely because it's an interesting thought experiment.
Last edited by djn on Fri Aug 24, 2007 8:17 pm UTC, edited 2 times in total.


Postby ToLazyToThink » Fri Aug 24, 2007 8:12 pm UTC

djn wrote:Ok, that is actually a good idea. The possibility for confusion would be huge, but that's also true for the "just pass everything unchanged" model. One drawback is that the same tool with the same arguments could produce horribly different results depending on whether the shell did the right thing or not, while an "always pass untreated" system would be more predictable in that particular way.


True, but if the tool was any good, it would bail with an error message if the unexpanded args weren't available (possibly with a workaround for those who don't have the new shell available).

djn wrote:Changing the glob library so much that it loses backwards compatibility would break the oddest things, of course. A similar change to bash or sh wouldn't exactly be painless either, but possibly easier to work around?


Much easier to work around. Either you rename the shell, or provide a command line switch to enable the new behavior (or both). Most of the more advanced shells that can be used as sh replacements do this.


djn wrote:Understand that I'm basically doing speculative alternate reality fiction here, and my position should be clearer. :D


I realize; it's just that after wasting several hours fighting Windows/Java over this, I feel a little need to vent. Besides, where would the internet be without pointless battles that don't really matter? :)

