Repeated newlines with regular expressions

Fieari
Posts: 102
Joined: Mon Jan 29, 2007 2:16 am UTC
Location: Okayama, Japan

Postby Fieari » Tue Jul 03, 2007 7:06 pm UTC

I've got a plaintext database (ugh) which repeats a particular field at the beginning and end of the record. I'm trying to get rid of the repeated field. Each record has a variable number of fields.

As such, is there any way to use regular expressions to search for a specifically formatted string, followed by any number of other lines containing anything, followed by that same string again? I would LOVE to use the following expression:

Code: Select all

([[:alpha:]][[:alpha:]][[:digit:]]+ \.[[:alpha:]][[:digit:]]+ (([[:graph:]]+)+)? )(.*\n)*\1

Which seems like it SHOULD work... but!

I'm using TextPad (POSIX syntax), whose regular expression interpreter specifically says in the documentation that you can't use \n with any of the "repeat" characters. There's no option to let the dot wildcard read newlines either.

Is there a way around this? Is there another free Windows text search/replace program that DOES support repeating newlines? (I know about PowerGREP, which I'm pretty sure would work, but I can't really justify spending money on it for one single task at work.)

(Bonus points* to first person to recognize what kind of records this database contains based on the above regex)

(*Bonus points not actually worth anything)
Surely it is as ridiculous to consider sqrt(-1) "imaginary" because you can't use it to count pieces of chalk as to consider the number 200 imaginary because by itself it cannot express the location of one point with reference to another. -Isaac Asimov

Torn Apart By Dingos
Posts: 817
Joined: Thu Aug 03, 2006 2:27 am UTC

Postby Torn Apart By Dingos » Tue Jul 03, 2007 10:57 pm UTC

e text editor. Not free, but free to use as long as it's in beta (even after your 30 day trial expires).

Fieari
Posts: 102
Joined: Mon Jan 29, 2007 2:16 am UTC
Location: Okayama, Japan

Postby Fieari » Wed Jul 04, 2007 12:03 am UTC

Fantastic program! It may replace TextPad as my default editor.

Jach
Posts: 167
Joined: Sat May 05, 2007 8:38 pm UTC

Postby Jach » Wed Jul 04, 2007 5:15 am UTC

Use Java's replaceAll() method? *Shrugs* I don't often have to use regex in my text editor. =P
I love reading quotes.

Yakk
Poster with most posts but no title.
Posts: 11129
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Postby Yakk » Wed Jul 04, 2007 7:21 pm UTC

perl. python. C++.

All of them have decent regex libraries that can be taught to treat \n as "just another character".
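
As a rough illustration of that point, here is a minimal Python sketch. The sample text, the simplified key pattern, and the helper name `drop_trailing_repeat` are all assumptions made up for the example; they are not the original poster's data or regex. `re.DOTALL` makes `.` match `\n`, and the named back-reference `(?P=key)` plays the role of `\1`.

```python
import re

# Hypothetical record: the first line (the "key" field) is repeated at the end.
text = (
    "AB12 .C34 V.1 \n"
    "  Title: Example\n"
    "  copy:1\n"
    "AB12 .C34 V.1 \n"
)

# re.DOTALL lets '.' cross line boundaries; re.MULTILINE anchors '^' at each line start.
# The key pattern is a simplified stand-in for a call-number-like field.
pattern = re.compile(
    r"^(?P<key>[A-Z]{2}\d+ \.[A-Z]\d+ [^\n]*\n)"  # first occurrence of the field
    r"(?P<body>.*?)"                               # everything in between, lazily
    r"(?P=key)",                                   # the same field again
    re.DOTALL | re.MULTILINE,
)

def drop_trailing_repeat(s):
    """Remove the second copy of the key, keeping the key and the lines between."""
    return pattern.sub(lambda m: m.group("key") + m.group("body"), s)

print(drop_trailing_repeat(text))
```

The lazy `.*?` stops at the nearest repeat of the key, so each substitution covers one record rather than swallowing everything up to the last repeat in the file.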

evilbeanfiend
Posts: 2650
Joined: Tue Mar 13, 2007 7:05 am UTC
Location: the old world

Postby evilbeanfiend » Thu Jul 05, 2007 9:02 am UTC

Yakk wrote: perl. python. C++.

All of them have decent regex libraries that can be taught to treat \n as "just another character".


as does tcl and im sure ruby and others too.

btw i find this very useful for building and testing complex regexps

http://www.doulos.com/knowhow/tcltk/examples/trev/
in ur beanz makin u eveel

Fieari
Posts: 102
Joined: Mon Jan 29, 2007 2:16 am UTC
Location: Okayama, Japan

Postby Fieari » Fri Jul 06, 2007 2:06 pm UTC

Hey, been using e-TextEditor, and it's been working great, until I came across a crash bug. I'd just report it on their forums (and I have), except that it seems to crash only when searching a specific block of text. There doesn't seem to be anything particularly unique about this block. As such, I thought you guys might enjoy this puzzle:

Using the following regex:

Code: Select all

( ?[A-Z][A-Z]\d+(\.\d+)? ?\.[A-Z]\d+[A-Z]? ([^\s]+ )* )(.+\n)+\1

When searching the following block of text:

Code: Select all

 BF30.A56 V.1                             1950
                 Title: Annual review of psychology.
     Electronic access: http://psych.annualreviews.org
 BF30.A56 V.1                           
  copy:1       id:31405000342737             location:STACKS   
 BF30 .A56 V.2                           
  copy:1       id:31405000342745             location:STACKS   
 BF30 .A56 V.3                           
  copy:1       id:31405000342752             location:STACKS   
 BF30 .A56 V.4                           
  copy:1       id:31405000342760             location:STACKS   
 BF30 .A56 V.5                           
  copy:1       id:31405000342778             location:STACKS   
 BF30 .A56 V.6                           
  copy:1       id:31405000342786             location:STACKS   
 BF30 .A56 V.7                           
  copy:1       id:31405000342794             location:STACKS   
 BF30 .A56 V.8                           
  copy:1       id:31405000342802             location:STACKS   
 BF30 .A56 V.9                           
  copy:1       id:31405000342828             location:STACKS   
 BF30 .A56 V.10                         
  copy:1       id:31405000342836             location:STACKS   
 BF30 .A56 V.11                         
  copy:1       id:31405000342844             location:STACKS   
 BF30 .A56 V.12                         
  copy:1       id:31405000342851             location:STACKS   
 BF30 .A56 V.13                         
  copy:1       id:31405000342869             location:STACKS   
 BF30 .A56 V.14                         
  copy:1       id:31405000342877             location:STACKS   
 BF30 .A56 V.15                         
  copy:1       id:31405000342885             location:STACKS   
 BF30 .A56 V.16                         
  copy:1       id:31405000342893             location:STACKS   
 BF30 .A56 V.17                         
  copy:1       id:31405000342901             location:STACKS   
 BF30 .A56 V.18                         
  copy:1       id:31405000342919             location:STACKS   
 BF30 .A56 V.19                         
  copy:1       id:31405000342927             location:STACKS   
 BF30 .A56 V.20                         
  copy:1       id:31405000342935             location:STACKS   
 BF30 .A56 V.21                         
  copy:1       id:31405000342943             location:STACKS   
 BF30 .A56 V.22                         
  copy:1       id:31405000342950             location:STACKS   
 BF30 .A56 V.23                         
 

e-TextEditor crashes to desktop. Now, here's the funny bit. Remove any given line of text. Any line. Doesn't matter WHICH line, just any line. It no longer crashes. You can remove text from a line, just so long as you don't remove the line itself.

Any idea why it's crashing?

Furthermore, is there a way I could alter my regex to expressly exclude this type of situation, thus avoiding the crash?
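
One way to sidestep the question, sketched in Python rather than in e's own engine: handle the text line by line, so no single pattern has to span many lines. Whether the crash really comes from deep backtracking in `(.+\n)+` is only a guess. The `KEY` pattern below is a simplified reading of the regex above, and the function name and the idea of passing in one record's lines at a time are assumptions for illustration.

```python
import re

# Assumed shape of a call-number key at the start of a line, e.g. " BF30.A56 V.1".
# A simplification guessed from the sample text, not the poster's exact pattern.
KEY = re.compile(r" ?[A-Z]{2}\d+(?:\.\d+)? ?\.[A-Z]\d+[A-Z]?(?: \S+)*")

def drop_repeated_key_lines(lines):
    """Given the lines of ONE record, drop later lines that hold only the record's first key."""
    first_key = None
    kept = []
    for line in lines:
        m = KEY.match(line)
        key = m.group(0).strip() if m else None
        # A repeat is a line whose whole content is the key that opened the record.
        if key is not None and key == first_key and line.strip() == key:
            continue
        if key is not None and first_key is None:
            first_key = key
        kept.append(line)
    return kept

record = [
    " BF30.A56 V.1                             1950",
    "                 Title: Annual review of psychology.",
    " BF30.A56 V.1                           ",
    "  copy:1       id:31405000342737             location:STACKS   ",
]
for line in drop_repeated_key_lines(record):
    print(line)
```

Each line is matched on its own, so the work grows with the number of lines and there is no multi-line repetition for an engine to recurse through.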

evilbeanfiend
Posts: 2650
Joined: Tue Mar 13, 2007 7:05 am UTC
Location: the old world

Postby evilbeanfiend » Fri Jul 06, 2007 2:15 pm UTC

solution? use emacs :wink:

Fieari
Posts: 102
Joined: Mon Jan 29, 2007 2:16 am UTC
Location: Okayama, Japan

Postby Fieari » Fri Jul 06, 2007 2:26 pm UTC

*sigh* Cygwin full install, here I come... man, my boss is getting edgy. I said this would be a quick job.

Torn Apart By Dingos
Posts: 817
Joined: Thu Aug 03, 2006 2:27 am UTC

Postby Torn Apart By Dingos » Fri Jul 06, 2007 2:34 pm UTC

I might use this trick for closing e quickly. ;) e is wonderful and has everything I want and need in an editor, but its one flaw is that it's become pretty slow in later versions.

I'm no expert at REs, so I can't help with your problem. The only interest in REs I have in e is to be able to search-and-replace newlines, which only one other editor I've tried (I've tried about 35 Windows editors - and no I won't use vim or emacs) can do (metapad). Why is this a hard problem for editors? Many can find newlines, but they fuck up on replacing them.

iw
Posts: 150
Joined: Tue Jan 30, 2007 3:58 am UTC

Postby iw » Wed Jul 11, 2007 1:01 pm UTC

Torn Apart By Dingos wrote: e text editor. Not free, but free to use as long as it's in beta (even after your 30 day trial expires).

Notepad++ is free, and has a lot of really neat features (like regexp search and replace) and works well with Windows (unlike Emacs).

evilbeanfiend
Posts: 2650
Joined: Tue Mar 13, 2007 7:05 am UTC
Location: the old world

Postby evilbeanfiend » Wed Jul 11, 2007 1:43 pm UTC

what? emacs works fine with windows, it wont look or feel like a native app but it will definitely work, i use it at home on windows.

pete
Posts: 126
Joined: Thu Apr 19, 2007 2:32 pm UTC

Postby pete » Wed Jul 11, 2007 5:16 pm UTC

Torn Apart By Dingos wrote: Many can find newlines, but they fuck up on replacing them.


I find that strange. You do know that DOS format text files use 2 characters for a new line (0D, 0A for carriage return, line feed) while Unix format uses only one (0A)?

If \n leaves unprintable characters in your text, try \r\n.
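
A small sketch of the same idea in Python, assuming the text is already in memory as a string: normalize the line endings first, so that a single `\n` in a pattern is enough afterwards. The function name `normalize_newlines` is made up for the example.

```python
def normalize_newlines(text):
    """Convert DOS (\\r\\n) and bare \\r line endings to Unix-style \\n."""
    # Replace the two-character sequence first so its \r is not converted twice.
    return text.replace("\r\n", "\n").replace("\r", "\n")

sample = "line one\r\nline two\r\n"
print(repr(normalize_newlines(sample)))
```

If the file has to stay in DOS format, the endings can be converted back with `text.replace("\n", "\r\n")` before writing it out.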

