Text Processing Help Request

Izawwlgood
WINNING
Posts: 18686
Joined: Mon Nov 19, 2007 3:55 pm UTC
Location: There may be lovelier lovelies...

Postby Izawwlgood » Wed Jun 24, 2015 2:54 am UTC

I have a log from a game, and I want to extract some data from the log. There's a mess of useless text here, but thankfully, the information I want is bracketed by ***, or really anything I want.

So, for example:

Code:

>power
You reach out with your senses and see luminous streams of black Necromantic mana oozing through the area.
Letting your senses extend further, you feel there is pulsating mana to the southwest, and lambent mana to the northeast.
You sense the Obfuscation spell upon you, which will last for about two roisaen.
Roundtime: 3 sec.
>
The mountain giant moves into a position to parry.
>
* Looking as if this were a bad idea, a mountain giant feints to the side at you.  You block with a gargoyle-hide targe. 
[You're solidly balanced and opponent has slight advantage.]
>stance shie

You are now set to use your shield stance:

  Attack :  100%
  Evade  :  80%
  Parry  :  0%
  Block  :  100%

>
A mountain giant spreads her mouth and gingerly plucks a cracked rib from between her blunt yellow teeth, and casually tosses it asides with a pleased expression.
>stow left
>stow right
Stow what?  Type 'STOW HELP' for details.
>
You put your longbow in your kidskin baldric.
>
***bow took 5 hits to kill!***
>face next
setVariable: exp=lt


And I only want the '***bow took 5 hits to kill!***' part. Assuming this is copied into a Word or Text document, how can I delete all lines of text that aren't sandwiched between ***? Ideally, I'd want the output to be something like this -

Code:

***bow took 5 hits to kill!***
***mace took 12 hits to kill!***
***scimitar took 11 hits to kill!***


etc. Super duper ideally, I'd rekajigger the output so it read ***scimitar 11*** or ***bow 5*** and then just copy the whole mess to Prism for stats.
... with gigantic melancholies and gigantic mirth, to tread the jeweled thrones of the Earth under his sandalled feet.

PeteP
What the peck?
Posts: 1451
Joined: Tue Aug 23, 2011 4:51 pm UTC

Re: Text Processing Help Request

Postby PeteP » Wed Jun 24, 2015 3:11 am UTC

If you have the text in something like Notepad++ and enable regex mode in the search dialog, this pattern: \*\*\*([a-zA-Z]*) [a-zA-Z ]*([0-9]*)[a-zA-Z !]*\*\*\* should find the line and put the first word and the number in capturing groups 1 and 2. If you put "\1 \2" in the replacement field, the bow line turns into "bow 5". Of course, that does nothing to remove the rest of the text (I would just run it through a script that finds the matching lines and returns only those), but I need to sleep now.
Last edited by PeteP on Wed Jun 24, 2015 11:40 am UTC, edited 1 time in total.
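
A minimal sketch of the kind of script mentioned above, assuming Python is available. It reuses the pattern from this post unchanged; the function name extract_kills and the sample text are made up for illustration, not taken from anyone's actual tooling.

```python
import re

# The pattern from the post above: group 1 is the first word (the weapon),
# group 2 is the hit count.
KILL_LINE = re.compile(r"\*\*\*([a-zA-Z]*) [a-zA-Z ]*([0-9]*)[a-zA-Z !]*\*\*\*")


def extract_kills(log_text):
    """Return (weapon, hits) pairs for every *** ... *** line in log_text."""
    kills = []
    for line in log_text.splitlines():
        match = KILL_LINE.fullmatch(line.strip())
        if match and match.group(2):  # skip marker lines that carry no number
            kills.append((match.group(1), int(match.group(2))))
    return kills


if __name__ == "__main__":
    # Hypothetical sample; in practice the text would come from the saved log.
    sample = """\
>stow right
You put your longbow in your kidskin baldric.
***bow took 5 hits to kill!***
>face next
***mace took 12 hits to kill!***
"""
    for weapon, hits in extract_kills(sample):
        print(f"***{weapon} {hits}***")
```

Because every other line fails the match, this both drops the noise and produces the shortened "***bow 5***" form in one pass.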

Thesh
Made to Fuck Dinosaurs
Posts: 6166
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Text Processing Help Request

Postby Thesh » Wed Jun 24, 2015 3:22 am UTC

PeteP wrote:If you have the text in something like Notepad++ and enable regex mode in the search dialog, this pattern: \*\*\*([a-zA-Z]*) [a-zA-Z ]*([0-9]*)[a-zA-Z !]*\*\*\* should find the line and put the first word and the number in capturing groups 1 and 2. If you put "\1 \2" in the replacement field, the bow line turns into "bow 5". Of course, that does nothing to remove the rest of the text (I would just run it through a script that finds the matching lines and returns only those), but I need to sleep now.


Normally what I do is find the whole line and replace it with something like "`\1", then replace "^[^`\r\n].*$" with nothing (anchored with ^ so that only lines that don't start with the backtick marker get blanked), then replace "\n\n+" with "\n".

Also, if you can use sed (if you're not on *nix/Mac, you can get it through GnuWin32), you can do something like:

Code:

sed -n -r "/\*{3}.*\*{2}$/p" log.txt
Summum ius, summa iniuria.

hotaru
Posts: 1040
Joined: Fri Apr 13, 2007 6:54 pm UTC

Re: Text Processing Help Request

Postby hotaru » Wed Jun 24, 2015 4:09 am UTC

Thesh wrote:Also, if you can use sed (if you're not on *nix/Mac, you can get it through GnuWin32), you can do something like:

Code:

sed -n -r "/\*{3}.*\*{2}$/p" log.txt

why not grep instead of sed?

Code:

grep -E '^(\*{3}).*\1$' log.txt
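
For anyone without grep to hand, here is a rough Python equivalent of that command, offered as a sketch rather than a drop-in replacement. The \1 backreference means "whatever group 1 matched", so the line has to end with the same three asterisks it starts with. The name marked_lines is hypothetical.

```python
import re

# Same pattern as the grep command: a line that starts with three asterisks
# and ends with whatever group 1 captured (three asterisks again).
MARKED = re.compile(r"^(\*{3}).*\1$")


def marked_lines(lines):
    """Yield only the lines bracketed by *** at both ends, much as grep -E prints them."""
    for line in lines:
        stripped = line.rstrip("\r\n")  # drop the line ending before matching
        if MARKED.search(stripped):
            yield stripped
```

It accepts any iterable of lines, so an open file object works as well as a list.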

Re: Text Processing Help Request

Postby Thesh » Wed Jun 24, 2015 4:22 am UTC

Because I always prefer sed when modifying a file, and pretty much limit grep to searching.

Re: Text Processing Help Request

Postby hotaru » Wed Jun 24, 2015 5:36 am UTC

Thesh wrote:Because I always prefer sed when modifying a file, and pretty much limit grep to searching.

but you're not modifying a file there. you're searching for lines that match that regular expression and then printing them, which is exactly what grep does. also, if you do want to modify a file, i'm pretty sure you'd want to use ed instead of sed.

Re: Text Processing Help Request

Postby Thesh » Wed Jun 24, 2015 5:40 am UTC

hotaru wrote:
Thesh wrote:Because I always prefer sed when modifying a file, and pretty much limit grep to searching.

but you're not modifying a file there. you're searching for lines that match that regular expression and then printing them, which is exactly what grep does. also, if you do want to modify a file, i'm pretty sure you'd want to use ed instead of sed.


Fine, editing a stream. Either way, sed does it just as well as grep, just without highlighting. What are the advantages in this case?

Re: Text Processing Help Request

Postby Izawwlgood » Wed Jun 24, 2015 11:20 am UTC

So, I have a PC, but am unfamiliar with gnuwin32. Or all these commands.

Re: Text Processing Help Request

Postby hotaru » Wed Jun 24, 2015 5:51 pm UTC

Thesh wrote:Fine, editing a stream. Either way, sed does it just as well as grep, just without highlighting. What are the advantages in this case?

the biggest advantage is that grep is a much simpler program than sed. grep's primary purpose is exactly what you're doing there (print lines matching a regex), while sed also does other things, and you therefore have to specify more behavior yourself (matching and printing, in addition to the regex itself). why make someone new to the command line learn three new languages, when they can do the same thing with just two?

Re: Text Processing Help Request

Postby PeteP » Wed Jun 24, 2015 10:57 pm UTC

Izawwlgood wrote:So, I have a PC, but am unfamiliar with gnuwin32. Or all these commands.

Well, if you install MinGW, for instance, and have the text in a file called log.txt, either put it in MinGW's home directory (msys\1.0\home\username is the default, I think) or switch to the right folder. Then you can just copy Thesh's command, or one of the others, and it will output the relevant lines.

----
Ah, whatever, here: match.exe
Put this into a folder and put your log into a file named log.txt in the same folder. Run the exe and it should write its output to a file called result.txt. I am reasonably sure that it shouldn't delete all the files on your hard drive!

Flumble
Yes Man
Posts: 2023
Joined: Sun Aug 05, 2012 9:35 pm UTC

Re: Text Processing Help Request

Postby Flumble » Thu Jun 25, 2015 1:56 am UTC

Izawwlgood wrote:So, I have a PC, but am unfamiliar with gnuwin32. Or all these commands.

On Windows you can use the findstr command in the Command Prompt:

Code:

findstr /B /L "***" original_log.txt > filtered_log.txt

where /B matches the three asterisks only at the beginning of a line, /L makes findstr treat "***" as literal text rather than as a regular expression, original_log.txt is the input log file (just drag the file to the prompt instead of navigating to the directory in the prompt and typing the file name) and filtered_log.txt is the output log file. Use a different name for the output than for the input: the > redirection empties the output file before findstr starts reading, so pointing it at the original log would wipe it.
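
The same prefix filter can be sketched in Python, assuming it is installed. The function names keep_marked and filter_log are invented for this example; the file names are the ones used in the command above.

```python
from pathlib import Path


def keep_marked(lines, prefix="***"):
    """Return the lines that begin with prefix, mirroring findstr /B."""
    return [line for line in lines if line.startswith(prefix)]


def filter_log(source, destination):
    """Read source, write only the marked lines to destination, return how many were kept."""
    source, destination = Path(source), Path(destination)
    # Refuse to overwrite the original log, so the unfiltered text is never lost.
    if source.resolve() == destination.resolve():
        raise ValueError("choose a different output file than the input log")
    kept = keep_marked(source.read_text().splitlines())
    destination.write_text("\n".join(kept) + ("\n" if kept else ""))
    return len(kept)
```

A call such as filter_log("original_log.txt", "filtered_log.txt") would then play the role of the findstr line.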


Another approach is to open the file in your favorite web browser, open up the console (F12 or something like that) and paste:

Code:

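// Read the whole log from the page (the .txt file opened in the browser).
var input = document.body.textContent;
var output = "";
// At the start of each line: ***, then the first word, then the first number after it.
var regex = /^\*\*\*(\S+).*?(\d+)/gm;
var match;

while ((match = regex.exec(input)) !== null) {
   output += match[1] + ": " + match[2] + "\n";
}

// Show the result on the page, keeping the line breaks.
document.body.textContent = output;
document.body.setAttribute("style", "white-space: pre");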

Now you can copy the filtered text and save it.

(regarding the regex declaration: see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp#Special_characters_meaning_in_regular_expressions for basically every character on that line)

Xanthir
My HERO!!!
Posts: 5311
Joined: Tue Feb 20, 2007 12:49 am UTC
Location: The Googleplex

Re: Text Processing Help Request

Postby Xanthir » Thu Jun 25, 2015 1:56 am UTC

Izawwlgood wrote:So, I have a PC, but am unfamiliar with gnuwin32. Or all these commands.

Ignore all of them, then. Unix commands aren't helpful (nor is telling you to install a unix utils package). Here's what you need to do:

1. Install a decent text editor. I'm very partial to Sublime Text; it'll do everything you need, and is free to trial (read: use indefinitely, with an occasional nag popup). (But you should totes pay for it if you start actually using it; it's worth supporting.)

2. Load up the file there. In Sublime, you can then hit Ctrl-H to bring up the "Search and Replace" panel at the bottom.

3. In "Find What", enter "^(?!\*\*\*).*\n?" (without the quotes).

4. Make sure "Replace With" is empty.

5. Hit "Replace All".

You'll be left with only the lines starting with three *s.
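
To see what that "Find What" pattern does outside an editor, here is a small Python sketch of the same replace-with-nothing step. It assumes Python's re module treats the pattern the same way Sublime does for this input, which holds for the simple case shown; delete_unmarked is a made-up name.

```python
import re

# The "Find What" pattern from step 3: at the start of a line (^), if the next
# three characters are not *** (negative lookahead), match the rest of the
# line plus its trailing newline.
NOT_MARKED = re.compile(r"^(?!\*\*\*).*\n?", re.MULTILINE)


def delete_unmarked(text):
    """Replace every line that does not start with *** with nothing."""
    return NOT_MARKED.sub("", text)
```

The lookahead consumes no characters, so lines that do start with *** are simply skipped and stay in place, newline included.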
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))

WanderingLinguist
Posts: 237
Joined: Tue May 22, 2012 5:14 pm UTC
Location: Seoul

Re: Text Processing Help Request

Postby WanderingLinguist » Thu Jun 25, 2015 3:20 am UTC

Xanthir wrote:
Izawwlgood wrote:So, I have a PC, but am unfamiliar with gnuwin32. Or all these commands.

Ignore all of them, then. Unix commands aren't helpful (nor is telling you to install a unix utils package). Here's what you need to do:

1. Install a decent text editor. I'm very partial to Sublime Text; it'll do everything you need, and is free to trial (read: use indefinitely, with an occasional nag popup). (But you should totes pay for it if you start actually using it; it's worth supporting.)


This. Sublime Text is awesome. Another way (if you don't feel like using regular expressions) is to select the *** on one line and choose "Find -> Quick Find All". Then press HOME, SHIFT+END, CTRL+C. You've now copied all lines that contain ***, and you can paste them into a fresh document. Sublime Text's multi-select feature will also let you get the ***scimitar 11*** or ***bow 5*** format that you want (I think you'll figure it out if you just play around with the feature a bit; it's super easy, but if not, post here with questions...).

Re: Text Processing Help Request

Postby Izawwlgood » Thu Jun 25, 2015 10:56 am UTC

Gotcha, much obliged!