Beautiful regexes


Iv
Posts: 1207
Joined: Thu Sep 13, 2007 1:08 pm UTC
Location: Lyon, France

Beautiful regexes

Postby Iv » Mon Mar 08, 2010 8:49 am UTC

Hi all,
I'll be giving a presentation soon on concise code and compact expression, and I thought it would be appropriate to talk a bit about regexes. I remember seeing, 7 or 8 years ago, a repository of nifty obfuscated regexes, almost puzzles. I thought it was on PerlMonks but I couldn't find it again. I especially remember a three-line regex that was a Morse code interpreter. Does anyone know which site it might be? So far my google-fu has proven weak.

And if you have a beautiful and obfuscated regex, do not hesitate to post it here!

I think I'll introduce the concept through the simple-yet-hermetic-at-first expression s/\\/\//g (converts backslashes into slashes, useful in many environments) and then go on to more complex ones...
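
For readers who do not use sed or Perl, here is a minimal Python sketch of the same substitution. The helper name `to_forward_slashes` and the sample path are made up for illustration; only `re.sub` comes from the standard library.

```python
import re

def to_forward_slashes(path: str) -> str:
    # r"\\" is a regex that matches one literal backslash.
    # re.sub replaces every match, which plays the role of the /g flag.
    return re.sub(r"\\", "/", path)

print(to_forward_slashes(r"C:\Users\iv\notes.txt"))  # C:/Users/iv/notes.txt
```

The raw string prefix keeps the pattern at two backslashes; in a normal string literal it would need four.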

Zamfir
I built a novelty castle, the irony was lost on some.
Posts: 7590
Joined: Wed Aug 27, 2008 2:43 pm UTC
Location: Nederland

Re: Beautiful regexes

Postby Zamfir » Mon Mar 08, 2010 1:31 pm UTC

I suggest solitary confinement for people who use "beautiful" to mean "obfuscated".

With whippings at random intervals, so they start to love their torturer as the only human being they still have contact with.

Briareos
Posts: 1940
Joined: Thu Jul 12, 2007 12:40 pm UTC
Location: Town of the Big House

Re: Beautiful regexes

Postby Briareos » Mon Mar 08, 2010 1:50 pm UTC

Iv wrote:I think I'll introduce the concept through the simple-yet-hermetic-at-first expression s/\\/\//g (converts backslashes into slashes, useful in many environments) and then go on to more complex ones...
You know that sed lets you use different delimiters as long as you're consistent, right?

s_\\_\/_g
Sandry wrote:Bless you, Briareos.

Blriaraisghaasghoasufdpt.
Oregonaut wrote:Briareos is my new bestest friend.

Iv
Posts: 1207
Joined: Thu Sep 13, 2007 1:08 pm UTC
Location: Lyon, France

Re: Beautiful regexes

Postby Iv » Mon Mar 08, 2010 2:02 pm UTC

Briareos wrote:
Iv wrote:I think I'll introduce the concept through the simple-yet-hermetic-at-first expression s/\\/\//g (converts backslashes into slashes, useful in many environments) and then go on to more complex ones...
You know that sed lets you use different delimiters as long as you're consistent, right?
s_\\_\/_g

I didn't know that. I have mainly used regexes with Perl. So when you do that, do you still need to escape slashes? And suddenly you need to escape underscores. Hmm, that could be a nice way to obfuscate even more!

Zamfir wrote:I suggest solitary confinement for people who use "beautiful" to mean "obfuscated".

With whippings at random intervals, so they start to love their torturer as the only human being they still have contact with.
Can I quote you on that? It will fit nicely in the introduction. The people who asked for it are consenting adults, and it will deliberately be a presentation about obfuscated code :-)

Zamfir
I built a novelty castle, the irony was lost on some.
Posts: 7590
Joined: Wed Aug 27, 2008 2:43 pm UTC
Location: Nederland

Re: Beautiful regexes

Postby Zamfir » Mon Mar 08, 2010 2:11 pm UTC

Iv wrote:
Zamfir wrote:I suggest solitary confinement for people who use "beautiful" to mean "obfuscated".

With whippings at random intervals, so they start to love their torturer as the only human being they still have contact with.
Can I quote you on that? It will fit nicely in the introduction. The people who asked for it are consenting adults, and it will deliberately be a presentation about obfuscated code :-)

Sure, but why? Why is there a need for a presentation on this? Even if /\\/\\///a\\r\g\//h really is the best way to perform a task, can't you just wrap it in a function with a sensible name?

chridd
Has a vermicelli title
Posts: 843
Joined: Tue Aug 19, 2008 10:07 am UTC
Location: ...Earth, I guess?

Re: Beautiful regexes

Postby chridd » Mon Mar 08, 2010 4:51 pm UTC

Iv wrote:I didn't know that. I have mainly used regexes with Perl. So when you do that, do you still need to escape slashes? And suddenly you need to escape underscores. Hmm, that could be a nice way to obfuscate even more!
You can do that with perl as well (though if you use underscores, you need a space after the s). And you don't need to escape slashes if you do it this way. s!\\!/!g or s(\\)(/)g would be a more readable version of the one you have.
~ chri d. d. /tʃɹɪ.di.di/ (Phonotactics, schmphonotactics) · she · Forum game scores
mittfh wrote:I wish this post was very quotable...

lulzfish
Posts: 1214
Joined: Tue Dec 16, 2008 8:17 am UTC

Re: Beautiful regexes

Postby lulzfish » Mon Mar 08, 2010 5:55 pm UTC

Zamfir wrote:I suggest solitary confinement for people who use "beautiful" to mean "obfuscated".
With whippings at random intervals, so they start to love their torturer as the only human being they still have contact with.

Good thinking.

Beautiful code should mean "easy to understand". So I'd rather use something like `string.replace('\\', '/')`, or whatever the local string API uses.
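
As a concrete instance of that suggestion, here is a small Python sketch using the built-in `str.replace`. The function name `to_forward_slashes_plain` is invented for the example.

```python
def to_forward_slashes_plain(path: str) -> str:
    # In a normal Python string literal, "\\" is a single backslash character.
    # str.replace substitutes every occurrence and involves no regex at all.
    return path.replace("\\", "/")

print(to_forward_slashes_plain("docs\\regex\\intro.txt"))  # docs/regex/intro.txt
```

The backslash still has to be escaped once for the string literal, but the regex layer of escaping disappears.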

Yakk
Poster with most posts but no title.
Posts: 11128
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Beautiful regexes

Postby Yakk » Mon Mar 08, 2010 6:00 pm UTC

Was it perl3 or perl4 that let you use whitespace as delimiters to regexp expressions?

Code: Select all

s   \                 ;
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

0rm
Posts: 81
Joined: Wed Feb 17, 2010 2:30 pm UTC

Re: Beautiful regexes

Postby 0rm » Mon Mar 08, 2010 11:29 pm UTC

Iv wrote:Hi all,
I'll be giving a presentation soon on concise code and compact expression, and I thought it would be appropriate to talk a bit about regexes. I remember seeing, 7 or 8 years ago, a repository of nifty obfuscated regexes, almost puzzles. I thought it was on PerlMonks but I couldn't find it again. I especially remember a three-line regex that was a Morse code interpreter. Does anyone know which site it might be? So far my google-fu has proven weak.

And if you have a beautiful and obfuscated regex, do not hesitate to post it here!

I think I'll introduce the concept through the simple-yet-hermetic-at-first expression s/\\/\//g (converts backslashes into slashes, useful in many environments) and then go on to more complex ones...


I have many words to describe regexes, and beautiful ain't one of em. :?
They say it's unhackable; I think it can be hacked.
They say it's fast; I think it could be faster.
They say it's the best; I think it can be done better.

lulzfish
Posts: 1214
Joined: Tue Dec 16, 2008 8:17 am UTC

Re: Beautiful regexes

Postby lulzfish » Tue Mar 09, 2010 1:34 am UTC

0rm wrote:I have many words to describe regexes, and beautiful ain't one of em. :?

I prefer phrases. Here's a few of my favorites:
"Write-only"
"Confusing"
"Loads of backslashes"
"Requires extra commenting"
"Still easier to set up than Qt's XML parser"

insom
Posts: 40
Joined: Mon Feb 25, 2008 11:29 am UTC

Re: Beautiful regexes

Postby insom » Tue Mar 09, 2010 2:02 am UTC

Pure sed, but not pure regex:

Code: Select all

Didn't look nice; this forum's syntax highlighting is too fancy :(
see link below instead...

http://solstorm.doesntexist.com/files/kiloseconds.txt
It is perhaps not pretty, but few enough people see anything but noise in it, so I take my chances to brag about it whenever I can (:

Now, where's that solitary confinement I was promised?
Normal cynics think they are realists. Hardcore cynics know they are optimists.
Woo I draw stuff - how incredibly awesome

Rysto
Posts: 1460
Joined: Wed Mar 21, 2007 4:07 am UTC

Re: Beautiful regexes

Postby Rysto » Tue Mar 09, 2010 2:09 am UTC

Briareos wrote:You know that sed lets you use different delimiters as long as you're consistent, right?

s_\\_\/_g

s/sed/GNU sed/

jaap
Posts: 2094
Joined: Fri Jul 06, 2007 7:06 am UTC

Re: Beautiful regexes

Postby jaap » Tue Mar 09, 2010 5:47 am UTC

Let's not forget this classic regex for validating email addresses.

Aaeriele
Posts: 2127
Joined: Tue Feb 23, 2010 3:30 am UTC
Location: San Francisco, CA

Re: Beautiful regexes

Postby Aaeriele » Tue Mar 09, 2010 8:53 am UTC

jaap wrote:Let's not forget this classic regex for validating email addresses.


There's a reason why, when someone says they've got a regex to validate email addresses, the usual response is "you've probably got it wrong".
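
A small Python sketch of how that tends to go wrong. The pattern `NAIVE_EMAIL` is a made-up "obvious" attempt, shown only to illustrate the failure modes; it is not a recommended validator.

```python
import re

# Hypothetical naive pattern, for illustration only.
NAIVE_EMAIL = re.compile(r"[a-z0-9.]+@[a-z0-9.]+\.[a-z]+")

def looks_like_email(address: str) -> bool:
    # fullmatch requires the whole string to fit the pattern.
    return NAIVE_EMAIL.fullmatch(address) is not None

print(looks_like_email("iv@example.com"))        # True
print(looks_like_email("iv+forum@example.com"))  # False, although "+" is allowed in the local part
print(looks_like_email("a@b..com"))              # True, although an empty domain label is invalid
```

The pattern both rejects a valid address and accepts an invalid one, which is the usual outcome of a short email regex.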
Vaniver wrote:Harvard is a hedge fund that runs the most prestigious dating agency in the world, and incidentally employs famous scientists to do research.

afuzzyduck wrote:ITS MEANT TO BE FLUTTERSHY BUT I JUST SEE AAERIELE! CURSE YOU FORA!

Iv
Posts: 1207
Joined: Thu Sep 13, 2007 1:08 pm UTC
Location: Lyon, France

Re: Beautiful regexes

Postby Iv » Tue Mar 09, 2010 10:05 am UTC

OK, maybe I wasn't very clear: I wholeheartedly agree that regexes are almost always bad practice in production code. They are obfuscated, not really self-descriptive, hard to debug, etc. The goal here is not to teach good programming practices; this will be a presentation in a half-tech, half-artistic hackerspace. They want to see convoluted pieces of code, and I proposed regexes as a sort of quintessence of conciseness and obfuscation.

Have you ever had the feeling that over-obfuscated code (like the entries in the IOCCC) has some kind of artistic value? Sure, I would scream if I had to work with someone who coded exclusively like that, but I feel that these small programs have an appeal as artworks, puzzles and finely crafted gearboxes.

The validating regex for email is nice, thanks.

elminster
Posts: 1560
Joined: Mon Feb 26, 2007 1:56 pm UTC
Location: London, UK, Dimensions 1 to 42.

Re: Beautiful regexes

Postby elminster » Tue Mar 09, 2010 1:15 pm UTC

I'd describe regexes as "powerful" in that, in a very small number of characters, you can do a huge amount of work.
I've saved tens of hours by using regexes, compared with the time it would take to do the same tasks manually or even to write a program for them.

'; DROP DATABASE;--
Posts: 3284
Joined: Thu Nov 22, 2007 9:38 am UTC
Location: Midwest Alberta, where it's STILL snowy

Re: Beautiful regexes

Postby '; DROP DATABASE;-- » Tue Mar 09, 2010 2:27 pm UTC

On the flip side, regex can be such a pain that I've found myself doing repetitive tasks manually, having decided it'd take longer to write a regex. Or much more commonly, spending 5 times as long as expected writing a regex, and realizing I should've done it manually. :P
poxic wrote:You suck. And simultaneously rock. I think you've invented a new state of being.

Meteorswarm
Posts: 979
Joined: Sun Dec 27, 2009 12:28 am UTC
Location: Ithaca, NY

Re: Beautiful regexes

Postby Meteorswarm » Tue Mar 09, 2010 3:56 pm UTC

Regexes are much easier to read if you use the (well, in perl) option to ignore non-explicit whitespace, which lets you structure them on multiple lines, with indentation, and the like.
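
Python has a comparable option, `re.VERBOSE`. A minimal sketch, assuming a made-up ISO-style date pattern as the example:

```python
import re

# With re.VERBOSE, unescaped whitespace in the pattern is ignored
# and "#" starts a comment that runs to the end of the line.
DATE = re.compile(r"""
    (\d{4})   # year
    -         # literal hyphen
    (\d{2})   # month
    -         # literal hyphen
    (\d{2})   # day
""", re.VERBOSE)

match = DATE.fullmatch("2010-03-09")
print(match.groups())  # ('2010', '03', '09')
```

The compact form of the same pattern is `(\d{4})-(\d{2})-(\d{2})`; the verbose form matches exactly the same strings.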
The same as the old Meteorswarm, now with fewer posts!

Berengal
Superabacus Mystic of the First Rank
Posts: 2707
Joined: Thu May 24, 2007 5:51 am UTC
Location: Bergen, Norway

Re: Beautiful regexes

Postby Berengal » Tue Mar 09, 2010 3:58 pm UTC

The nice thing about regexes is that they're regular expressions. They're concise and relatively easy to write simply because they're less powerful than other things we're working with.

If you just want concise and elegant but dubious code however then I've got some point-free Haskell expressions for you:

Code: Select all

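import Control.Arrow ((&&&)) -- (&&&) is used by runlengthEncode
import Control.Monad (filterM) -- filterM is used by powerSet
import Data.List (group, nubBy) -- group and nubBy are used below

powerSet :: [a] -> [[a]] -- Takes a list to a list of lists, each inner list containing a subset of the elements of the input, and all possible subsets are present
powerSet = filterM (const [True, False])

runlengthEncode :: (Eq a) => [a] -> [(a, Int)] -- Takes a list of equatable items to a list of items and integers by counting sequential occurrences of the same item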
runlengthEncode = map (head &&& length) . group

runlengthDecode :: [(a, Int)] -> [a] -- Takes a list of items and integers to a list of items, reversing the above encoding function
runlengthDecode = concatMap . uncurry . flip $ replicate

primes :: [Integer] -- An infinite list of prime numbers. No, this isn't a function, it's an honest-to-god value.
primes = nubBy (fmap fmap fmap (==0) mod) [2..]

fibs :: [Integer] -- Since we're on the topic of infinite lists, the infinite list of fibonacci numbers, and self-recursive value (as opposed to self-recursive functions)
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

omg :: [[Double]] -- Finally this gem from the #haskell logs.
omg = let o_o = 0.0 ;o' =(, ); ;o (*)=(*) ;( lol, xD :p )= o' o' $o.o$ (:[]) $o.o$ (:[]) o_o in (:[]) o_o :p
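
For comparison, a rough Python sketch of the run-length pair above, using `itertools.groupby` in the role of Haskell's `group`. The function names are chosen for this example and are not from the post.

```python
from itertools import groupby

def runlength_encode(items):
    # groupby clusters consecutive equal items, like Haskell's `group`;
    # each cluster becomes an (item, count) pair.
    return [(key, len(list(run))) for key, run in groupby(items)]

def runlength_decode(pairs):
    # Repeat each item `count` times and flatten, like concatMap with replicate.
    return [item for item, count in pairs for _ in range(count)]

encoded = runlength_encode("aaabccdd")
print(encoded)                             # [('a', 3), ('b', 1), ('c', 2), ('d', 2)]
print("".join(runlength_decode(encoded)))  # aaabccdd
```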
It is practically impossible to teach good programming to students who are motivated by money: As potential programmers they are mentally mutilated beyond hope of regeneration.

Peter Galbavy
Posts: 76
Joined: Wed Dec 23, 2009 11:11 am UTC
Location: London, UK

Re: Beautiful regexes

Postby Peter Galbavy » Tue Mar 09, 2010 4:25 pm UTC

I love regexps but I would never call any of them beautiful. A bit like being realistic about your own children ;-)

Iv
Posts: 1207
Joined: Thu Sep 13, 2007 1:08 pm UTC
Location: Lyon, France

Re: Beautiful regexes

Postby Iv » Tue Mar 09, 2010 4:41 pm UTC

Let's settle on "twisted beauty" then ;-)

Aaeriele
Posts: 2127
Joined: Tue Feb 23, 2010 3:30 am UTC
Location: San Francisco, CA

Re: Beautiful regexes

Postby Aaeriele » Tue Mar 09, 2010 10:01 pm UTC

Meteorswarm wrote:Regexes are much easier to read if you use the (well, in perl) option to ignore non-explicit whitespace, which lets you structure them on multiple lines, with indentation, and the like.


Agreed. AND COMMENTS.

(It's /x by the way, I believe; /m changes how ^ and $ match.)
Vaniver wrote:Harvard is a hedge fund that runs the most prestigious dating agency in the world, and incidentally employs famous scientists to do research.

afuzzyduck wrote:ITS MEANT TO BE FLUTTERSHY BUT I JUST SEE AAERIELE! CURSE YOU FORA!

|Erasmus|
Branson
Posts: 2643
Joined: Tue Oct 30, 2007 7:53 am UTC
Location: Sydney, Australia

Re: Beautiful regexes

Postby |Erasmus| » Tue Mar 09, 2010 10:07 pm UTC

regexes (and other ridiculous one-liners in difficult to read languages) do have a bit of a mystical quality about them (it does -what-?). I guess some might call it beauty.

I can't even remember what the actual circumstance for writing this particular line was (it is part of a shell script I wrote to run a series of regexes to refactor a large amount of code).

Code: Select all

perl -p -i -e "undef $/; s/(\",)\s+(\"[^\"*]\")\"/\1 \2/ $1"

evilbeanfiend
Posts: 2650
Joined: Tue Mar 13, 2007 7:05 am UTC
Location: the old world

Re: Beautiful regexes

Postby evilbeanfiend » Tue Mar 09, 2010 10:28 pm UTC

the real problems come when people start trying to parse stuff with regexp. they are usually fine for tokenising but parsing quickly gets messy even if the regexp is technically capable of parsing the grammar required, and of course there are plenty of grammars you can't parse with regexps.
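
A minimal Python sketch of that division of labour, assuming a made-up token set of integers, parentheses, `+` and `*`. The regex only splits the input into tokens; the nesting check is done with an ordinary counter, because arbitrarily deep nesting is not a regular language.

```python
import re

# Hypothetical token set for the example: integers, parentheses, + and *.
TOKEN = re.compile(r"\d+|[()+*]")

def tokenize(text):
    # Tokenising: a flat scan, which is what regexes are good at.
    return TOKEN.findall(text)

def balanced(tokens):
    # Parsing-style check: track nesting depth in code, not in the regex.
    depth = 0
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

tokens = tokenize("(1+(2*3))")
print(tokens)                        # ['(', '1', '+', '(', '2', '*', '3', ')', ')']
print(balanced(tokens))              # True
print(balanced(tokenize("(1+2))")))  # False
```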
in ur beanz makin u eveel

Aaeriele
Posts: 2127
Joined: Tue Feb 23, 2010 3:30 am UTC
Location: San Francisco, CA

Re: Beautiful regexes

Postby Aaeriele » Tue Mar 09, 2010 11:10 pm UTC

evilbeanfiend wrote:the real problems come when people start trying to parse stuff with regexp. they are usually fine for tokenising but parsing quickly gets messy even if the regexp is technically capable of parsing the grammar required, and of course there are plenty of grammars you can't parse with regexps.


*cough* http://stackoverflow.com/questions/1732 ... ained-tags
Vaniver wrote:Harvard is a hedge fund that runs the most prestigious dating agency in the world, and incidentally employs famous scientists to do research.

afuzzyduck wrote:ITS MEANT TO BE FLUTTERSHY BUT I JUST SEE AAERIELE! CURSE YOU FORA!

roboman
Posts: 12
Joined: Wed Jul 16, 2008 12:12 am UTC

Re: Beautiful regexes

Postby roboman » Wed Mar 10, 2010 2:34 am UTC

I think you should definitely mention Abigail's prime number identifier. (Got this one out of Perl Best Practices, Conway).

Code: Select all

sub is_prime{
    my ($number) = @_;
    return (1 x $number) !~ m/\A (?: 1? | (11+?) (?> \1+ ) ) \Z/xms;
}

I still have not figured out how it does it, might have to give it another try though.

Yakk
Poster with most posts but no title.
Posts: 11128
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Beautiful regexes

Postby Yakk » Wed Mar 10, 2010 3:19 am UTC

x is the repeat operator.

So it does a regular expression search on a unary encoding of the number.

The unary encoding of the number ... looks like it might be factoring the number. The first bit detects 1 or 0, the second bit finds a unary encoded number of value at least 2 that, when repeated a number of times, ends up consuming the entire string. If such a number exists, it is composite -- and if 0 or 1, it is (edit: also non-prime).

The !~ is the negative match. (Edit:) So the expression matches 0, 1 or numbers that can be encoded as 1 or more repeats of a string of 1s that is at least 2 long -- ie, composite (as we factored it).

Lots of little technical details I glossed over -- I don't know what \A, \Z, or the xms flags do. I think the (?:re) syntax is just a non-backslash matched bracket. I don't know what the ? at the end of the (11+?) expression is for. Etc.
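
To experiment with the idea outside Perl, here is a rough Python port. It is a sketch under one assumption: the atomic group `(?> ... )` from the original is dropped for portability, which changes how the engine backtracks but not which strings match. The name `NOT_PRIME` is invented for the example.

```python
import re

# Matches the unary form of 0, 1, or any composite number.
# First branch: zero or one "1". Second branch: a block of two or more 1s,
# followed by one or more further copies of that same block.
NOT_PRIME = re.compile(r"(?:1?|(11+?)\1+)")

def is_prime(number: int) -> bool:
    unary = "1" * number  # same role as Perl's (1 x $number)
    # fullmatch stands in for the \A ... \Z anchors; no match means prime.
    return NOT_PRIME.fullmatch(unary) is None

print([n for n in range(20) if is_prime(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```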
Last edited by Yakk on Wed Mar 10, 2010 3:16 pm UTC, edited 1 time in total.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

Iv
Posts: 1207
Joined: Thu Sep 13, 2007 1:08 pm UTC
Location: Lyon, France

Re: Beautiful regexes

Postby Iv » Wed Mar 10, 2010 11:12 am UTC

Thanks roboman. It matches prime numbers made entirely of ones thanks to a property I am not sure I understand well. Not sure why there is a \1 as well.

Well, I think regexes began to be considered evil when anyone who could write a "hello world" in PHP began to be called a "developer". Regexes do a horrible job in the realm of nested HTML tags and complex grammars. They are nonetheless a very powerful tool for less recursive grammars. But these days, because parsing HTML tags with them is a bad idea, too many people consider them good for nothing.

|Erasmus|: I think you summed up the idea quite well.

Xanthir
My HERO!!!
Posts: 5410
Joined: Tue Feb 20, 2007 12:49 am UTC
Location: The Googleplex

Re: Beautiful regexes

Postby Xanthir » Thu Mar 11, 2010 5:03 am UTC

Iv wrote:Thanks roboman. It matches prime numbers made entirely of ones thanks to a property I am not sure I understand well. Not sure why there is a \1 as well.

No, this isn't right.

First, it matches against a number *written in unary*. Unary is base 1, otherwise known as "those hash marks you use when keeping score in things". In unary, 1 is 1, 2 is 11, 3 is 111, 5 is 11111, etc. So the function first converts the given number to unary, and then passes the regex over it.

The technical details of most of the regex are irrelevant; the important part is where it says "11+?" and then, later, "\1+". The "11+?" part will *lazily* (non-greedily) match 2 or more 1s. Lazy matching means that it first starts with the smallest possible match (2 1s), and then grows from there (3 1s, then 4 1s, etc.) (normally, regexes match as much as possible, and then back off if this causes other parts to not match).
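
The difference between the two matching modes is easy to see in isolation. A minimal Python sketch on the unary form of 9:

```python
import re

unary_nine = "1" * 9

# Greedy: 1+ grabs as many 1s as it can.
greedy = re.match(r"11+", unary_nine).group()
# Lazy: 1+? takes as few 1s as it can while still matching.
lazy = re.match(r"11+?", unary_nine).group()

print(len(greedy))  # 9
print(len(lazy))    # 2
```

With nothing after the quantifier, the lazy version stops at the minimum of two 1s; it only grows when the rest of a pattern forces it to.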

The "\1+" part uses a backreference within the regex. When you use regexes, parentheses do more than just group things; they also store the matched bit within the parens as a backreference that you can access later. This is how, frex, you can use "(\d+):(\d\d)(am|pm)" to match against a time, and then later use $1 to get at the hours, $2 to get at the minutes, etc (in some languages this may work slightly differently, frex PHP stores them in the array you pass to matching function, so $matches[1] contains hours, etc.). The "\1" is referring to the first group within the same regex, which is just the "11+?" part. It means that, after you've matched the 11+? bit, you should try to match that string *again* 1 or more times. The \A and \Z just ensure that you're matching the entire string here.
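
Both uses of groups can be sketched in Python. The time pattern is the one from the paragraph above; the `doubled` pattern and the sample strings are made up for the example.

```python
import re

# Reading groups after the match: group(1) is hours, group(2) minutes, and so on.
m = re.fullmatch(r"(\d+):(\d\d)(am|pm)", "10:45pm")
print(m.group(1), m.group(2), m.group(3))  # 10 45 pm

# Using a backreference inside the pattern: \1 must repeat
# exactly the text that group 1 matched.
doubled = re.compile(r"(\w+) \1")
print(bool(doubled.fullmatch("hey hey")))  # True
print(bool(doubled.fullmatch("hey you")))  # False
```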

The end result is that this uses regular expressions to do trial division on the number. Frex, if you pass it 9, it will first convert this to 111111111. Then it will try and match 11 (the shortest string it can get that matches the "11+?" expression). Then it will try to match "11" multiple times, which it can't do and still satisfy the \Z at the end (there will be a 1 left over). So it will go back and try the next largest string to match the "11+?" expression, which is 111. Then it will match 111 multiple times, which exhausts the string exactly, so the regex reports a successful match. Because is_prime negates the match with !~, 9 is reported as not prime.
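
The search order described above can be written out without any regex. This Python sketch is only a model of what the engine ends up trying, not how a regex engine is implemented; `first_divisor` is a name invented for the example.

```python
def first_divisor(number: int):
    # Try block lengths 2, 3, 4, ... in the order the lazy "11+?" grows.
    # A block of length d fits when d divides the number and is repeated
    # at least once more, so d can be at most number // 2.
    for d in range(2, number // 2 + 1):
        if number % d == 0:
            return d
    # No block fits. For numbers of 2 or more this means prime;
    # 0 and 1 are caught by the separate "1?" branch of the regex.
    return None

print(first_divisor(9))  # 3
print(first_divisor(7))  # None
```

For 9 the loop rejects 2 (one 1 left over) and accepts 3, which is the same sequence of attempts as in the walkthrough.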
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))

headprogrammingczar
Posts: 3072
Joined: Mon Oct 22, 2007 5:28 pm UTC
Location: Beaming you up

Re: Beautiful regexes

Postby headprogrammingczar » Thu Mar 11, 2010 11:36 pm UTC

Basically, it does brute force multiplication.
<quintopia> You're not crazy. you're the goddamn headprogrammingspock!
<Weeks> You're the goddamn headprogrammingspock!
<Cheese> I love you

hall2k
Posts: 30
Joined: Wed Jan 20, 2010 4:11 am UTC

Re: Beautiful regexes

Postby hall2k » Sat Mar 13, 2010 5:19 pm UTC

One thing I've always thought was beautiful came up in my CS Concepts course wayyy back in the first term of first year. We were using Miranda (a mostly dead functional language similar to Haskell). I forget what the task was, but I was being lazy and googling around for solutions, which can be very hard when the language is the programming equivalent of Latin, except that it was never popular at all. I came across a solution that used mutual recursion. I've long forgotten the original problem, and have gone through several new hard drives since, but basically there were two functions that call each other, and control goes back and forth between the two until one of them hits its base case (they each have their own base case). Probably not the most efficient, but it was the epitome of what I consider "beautiful code".

Berengal
Superabacus Mystic of the First Rank
Posts: 2707
Joined: Thu May 24, 2007 5:51 am UTC
Location: Bergen, Norway
Contact:

Re: Beautiful regexes

Postby Berengal » Sat Mar 13, 2010 7:23 pm UTC

Code: Select all

import Prelude hiding (even, odd) -- avoid clashing with the Prelude's even and odd

even 0 = True
even n = odd (n - 1)

odd 0 = False
odd n = even (n - 1)
It is practically impossible to teach good programming to students who are motivated by money: As potential programmers they are mentally mutilated beyond hope of regeneration.

fazzone
Posts: 186
Joined: Wed Dec 10, 2008 9:38 pm UTC
Location: A boat

Re: Beautiful regexes

Postby fazzone » Sat Mar 13, 2010 8:15 pm UTC

The inefficiency...it burns!

or

Code: Select all

import Prelude hiding (even) -- avoid clashing with the Prelude's even

even 0 = True
even n = not(even(n - 1))

