Searching with python


Postby >-) » Fri Aug 10, 2012 11:11 pm UTC

Is it possible to search a document (say, an XML file) for a term and print out how many matches are found?
For example, given a short text file containing:
abcd sdjfklj bsndfkl jskdla abcd abcd jklds
could I search for the term abcd and have Python print out 3?
>-)
 
Posts: 164
Joined: Tue Apr 24, 2012 1:10 am UTC

Re: Searching with python

Postby WanderingLinguist » Sat Aug 11, 2012 12:33 am UTC

Very easy...
Code:
found = 0
# open() works in both Python 2 and 3; file() exists only in Python 2
with open("foo.xml") as f:
    for line in f:
        found += line.count("abcd")
print(found)


Edit:

A slightly shorter way...
Code:
sum(line.count("abcd") for line in open("foo.xml"))


Or, if you don't mind reading the whole file into memory:
Code:
open("foo.xml").read().count("abcd")


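If you're on Mac or Linux or something unixy, you don't even need Python. Just use grep (-o prints each match on its own line and wc -l counts them; plain grep -c would count matching lines, not occurrences):
Code:
grep -o abcd foo.xml | wc -l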
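
Back in Python: wrapped up as a reusable function for Python 3, the line-by-line approach might look something like the sketch below. The name count_in_file is made up for illustration, and the UTF-8 default is only an assumption about the file. Because it counts one line at a time, it will miss a term that is split across a line break.

```python
import os
import tempfile


def count_in_file(path, term, encoding="utf-8"):
    """Count non-overlapping occurrences of term, reading one line at a time."""
    total = 0
    # 'with' closes the file even if something goes wrong while reading.
    with open(path, encoding=encoding) as handle:
        for line in handle:
            total += line.count(term)
    return total


# Try it on the sample text from the question, written to a throwaway file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".xml", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("abcd sdjfklj bsndfkl jskdla abcd abcd jklds\n")

try:
    print(count_in_file(tmp.name, "abcd"))
finally:
    os.remove(tmp.name)
```

Reading line by line keeps memory use flat for big files; for small ones the read().count() one-liner is just as good.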
Last edited by WanderingLinguist on Sat Aug 11, 2012 12:41 am UTC, edited 1 time in total.
WanderingLinguist
 
Posts: 150
Joined: Tue May 22, 2012 5:14 pm UTC
Location: Seoul

Re: Searching with python

Postby >-) » Sat Aug 11, 2012 12:39 am UTC

OK, thanks. By the way, where do you figure stuff like this out? I've gone through several Python tutorials and none of them mention it (although maybe they did and I've just forgotten).
>-)
 
Posts: 164
Joined: Tue Apr 24, 2012 1:10 am UTC

Re: Searching with python

Postby WanderingLinguist » Sat Aug 11, 2012 12:53 am UTC

Just go to python.org (the official Python web site) and click Documentation in the left-hand sidebar.

There's a section on the Python Standard Library. I suggest browsing it and being generally familiar with the stuff there. Tutorials will only get you so far.

One of the things you'll find there is a section called "file objects". This explains how to open and read files.

If you're using Python interactively, there's the dir() function, which tells you what attributes and methods are available for an object. Most of the built-in objects and methods also have an attribute called __doc__ that contains a brief explanation.

So, knowing how to open a file, I guessed the rest. Here's my interactive session:

Code:
$ python
Python 2.7.2 (default, Jun 20 2012, 16:23:33)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> f = file("foo.xml")
>>> dir(f)
['__class__', '__delattr__', '__doc__', '__enter__', '__exit__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'closed', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'mode', 'name', 'newlines', 'next', 'read', 'readinto', 'readline', 'readlines', 'seek', 'softspace', 'tell', 'truncate', 'write', 'writelines', 'xreadlines']
>>> dir(f.read)
['__call__', '__class__', '__cmp__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__self__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
>>> print(f.read.__doc__)
read([size]) -> read at most size bytes, returned as a string.

If the size argument is negative or omitted, read until EOF is reached.
Notice that when in non-blocking mode, less data than what was requested
may be returned, even if no size parameter was given.
>>> s = f.read()
>>> dir(s)
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_formatter_field_name_split', '_formatter_parser', 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
>>> print(s.count.__doc__)
S.count(sub[, start[, end]]) -> int

Return the number of non-overlapping occurrences of substring sub in
string S[start:end].  Optional arguments start and end are interpreted
as in slice notation.
>>> s.count("abcd")
3
>>> ^D
$


(The stuff after >>> is what I typed into the Python interpreter.)

The point being, you can figure out a LOT with some guesswork and using dir() and __doc__.
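
The same kind of poking around also works from a script. Here's a rough sketch, assuming Python 3; the variable names sample and public_names are just for illustration. The last line checks what "non-overlapping" in the count docstring means in practice.

```python
sample = "abcd sdjfklj bsndfkl jskdla abcd abcd jklds"

# dir() returns attribute names; dropping the underscore-prefixed ones
# leaves the ordinary public methods.
public_names = [name for name in dir(sample) if not name.startswith("_")]
print("count" in public_names)  # True

# Methods carry their short explanation in __doc__.
print(sample.count.__doc__)

print(sample.count("abcd"))  # 3

# "Non-overlapping": once "aa" matches at position 0, the search resumes
# at position 2, so "aaaa" contains two matches rather than three.
print("aaaa".count("aa"))  # 2
```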
WanderingLinguist
 
Posts: 150
Joined: Tue May 22, 2012 5:14 pm UTC
Location: Seoul

Re: Searching with python

Postby EvanED » Sat Aug 11, 2012 1:28 am UTC

Also, help(thingy) is usually easier to use than print(thingy.__doc__), at least IMO. (It gives you a searchable pager -- use / to search.) You could also give ipython a try; I haven't made the switch yet, but looking around it seems like it has some neat features and, if memory serves, one of them is a different & even easier help system.
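
If you want the help() text without the pager (say, to search or log it from a script), the standard pydoc module can hand it back as a string. A small sketch, assuming Python 3; help_text is just an illustrative variable name.

```python
import pydoc

# render_doc() builds the documentation text as a string instead of
# paging it; the plaintext renderer avoids terminal bold sequences.
help_text = pydoc.render_doc(str.count, renderer=pydoc.plaintext)
print(help_text)
```

In an interactive session, plain help(str.count) is still the quicker route.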
EvanED
 
Posts: 3781
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI

Re: Searching with python

Postby WanderingLinguist » Sat Aug 11, 2012 2:22 am UTC

EvanED wrote:Also, help(thingy) is usually easier to use than print(thingy.__doc__), at least IMO. (It gives you a searchable pager -- use / to search.) You could also give ipython a try; I haven't made the switch yet, but looking around it seems like it has some neat features and, if memory serves, one of them is a different & even easier help system.


Wow, cool. I didn't know about help(). Is that something new?
WanderingLinguist
 
Posts: 150
Joined: Tue May 22, 2012 5:14 pm UTC
Location: Seoul

Re: Searching with python

Postby troyp » Sat Aug 11, 2012 2:44 am UTC

EvanED wrote:Also, help(thingy) is usually easier to use than print(thingy.__doc__), at least IMO. (It gives you a searchable pager -- use / to search.)

The help system is useful, although tbh, I think it's pretty second-rate. It doesn't even seem to have any error handling. Typing "modules" would crash the help system (and pop up a box asking if I want to quit Idle altogether) because pdfrecycle wasn't handling the "-n" option. I removed pdfrecycle, thinking it might improve things, but it did the opposite. Now when I ask for a module list, Idle hangs irrevocably and has to be killed (presumably because of the misbehaviour of one of the countless other python modules on my Ubuntu system).

You could also give ipython a try; I haven't made the switch yet, but looking around it seems like it has some neat features and, if memory serves, one of them is a different & even easier help system.

I haven't made the switch either, but I have used IPython and it's tremendously useful. It's got all kinds of help features, special commands, macros, etc. It's integrated with the shell as well: you can call shell commands, and you can mostly send information back and forth between Python and the shell (although I remember there was some limitation with this, I can't remember what it was). It comes with the browser-based IPython Notebook now (similar to the Mathematica or Sage notebooks), but I haven't tried it.

There's also bpython, a lightweight alternative front end to the interactive Python interpreter.
troyp
 
Posts: 398
Joined: Thu May 22, 2008 9:20 pm UTC
Location: Lismore, NSW

Re: Searching with python

Postby EvanED » Sat Aug 11, 2012 3:01 am UTC

WanderingLinguist wrote:
EvanED wrote:Also, help(thingy) is usually easier to use than print(thingy.__doc__), at least IMO. (It gives you a searchable pager -- use / to search.) You could also give ipython a try; I haven't made the switch yet, but looking around it seems like it has some neat features and, if memory serves, one of them is a different & even easier help system.


Wow, cool. I didn't know about help(). Is that something new?

Nope. :-) (Been around for at least a few years.)

It's funny... that was one of the first things I learned, but there are a lot of people who have been using Python for at least a little while who haven't seen it. So perhaps one group or the other took a strange path.

troyp wrote:
EvanED wrote:Also, help(thingy) is usually easier to use than print(thingy.__doc__), at least IMO. (It gives you a searchable pager -- use / to search.)

The help system is useful, although tbh, I think it's pretty second-rate. It doesn't even seem to have any error handling. Typing "modules" would crash the help system (and pop up a box asking if I want to quit Idle altogether) because pdfrecycle wasn't handling the "-n" option. I removed pdfrecycle, thinking it might improve things, but it did the opposite. Now when I ask for a module list, Idle hangs irrevocably and has to be killed (presumably because of the misbehaviour of one of the countless other python modules on my Ubuntu system).

Ah, I don't use Idle either. :-) (I just use Python from the console.)

I'm not going to say that help is great or even good, but I haven't had it crash Python either.
EvanED
 
Posts: 3781
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI

Re: Searching with python

Postby troyp » Sat Aug 11, 2012 5:22 am UTC

Oh, it's not Idle; it happens in the terminal too (although at least there I get error messages and I can get out of it with Ctrl-Z without having to kill the terminal emulator itself). Usually I use Python from Emacs, but sometimes I use Idle to play around in because it has tooltips for function arguments, which is convenient for unfamiliar modules (and which I haven't set up in Emacs). It's handy to have a separate "scratch area" anyway (and I'm kinda fond of Idle, god knows why). If I just want to do some quick calculations or scripting, I'll use Python/IPython from a terminal (sometimes it's handy to keep Python in a drop-down terminal just as a calculator).

Anyway, if you've never had a problem with help(), maybe there's something screwy with my setup :-S
troyp
 
Posts: 398
Joined: Thu May 22, 2008 9:20 pm UTC
Location: Lismore, NSW

