[TAG] 2c tip : search google from command line
Raj Shekhar
rajshekhar at hotpop.com
Tue Sep 21 07:46:40 MSD 2004
Hello all,
This is an ugly hack that I use to search Google from the command
line. Any decent Python programmer would be able to make it much better.
You need to have the PyGoogle module (http://pygoogle.sourceforge.net/)
installed. In its unaltered form, the script requires Python 2.3 to
run. However, if you remove the "ugly hack" part (see the comments in
the code), it will run with Python 2.2 too.
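A side note on the Python 2.3 dependency: the hack relies on the codecs error handlers that arrived in 2.3, which replace characters that ASCII cannot represent with XML character references. Here is a minimal sketch of that idea on its own, written for a current Python 3 rather than 2.3. The function name and the sample title are made up for illustration.

```python
# Sketch only: shows the character-reference fallback in isolation.
# to_ascii_with_charrefs and the sample title are hypothetical names/data.

def to_ascii_with_charrefs(text):
    """Return text as pure ASCII, with other characters written as &#NNN;."""
    # 'xmlcharrefreplace' is a built-in error handler for str.encode().
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_with_charrefs("M\u00fcnchen travel guide"))
```

The result contains only ASCII, so printing it cannot raise UnicodeEncodeError, whatever encoding the terminal or the Emacs buffer expects.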
``
#!/usr/bin/python2.3
import google, sys, codecs
from sgmllib import SGMLParser

# HTML stripper class to strip the HTML out of the titles that Google
# returns. Shamelessly copy-pasted from
# http://mail.python.org/pipermail/tutor/2002-September/017573.html
class HTMLStripper(SGMLParser):
    def __init__(self):
        SGMLParser.__init__(self)
        self._text = []

    def handle_data(self, data):
        # collect every piece of text found between the tags
        self._text.append(data)

    def read_text(self):
        return ''.join(self._text)

def strip_html(text):
    stripper = HTMLStripper()
    stripper.feed(text)
    return stripper.read_text()

print "Searching the World Live Web "
# You must get your own key from http://www.google.com/apis/
# (free, but requires registration).
google.setLicense('your google key')
# Change the number of search results that are shown here.
n_show_results = 10
# Register the error handler used by the hack below (needs Python 2.3).
codecs.register_error('xml', codecs.xmlcharrefreplace_errors)
# Everything after the script name is the search query.
search_str = " ".join(sys.argv[1:])
print "Searching for ", search_str
data = google.doGoogleSearch(search_str, 0, n_show_results)
print 'Search took %f seconds and I found a total of %d results\n' % \
    (data.meta.searchTime, data.meta.estimatedTotalResultsCount)
for result in data.results:
    # if you are going to call this script from within emacs, replace
    # this part with the code within the #begin hack -- #end hack code
    print 'Title\t:', strip_html(result.title)
    print 'URL\t:', result.URL
    print
    #-- begin hack
    # If you want to call this script from within emacs, then you have
    # to put in this ugly hack. Otherwise emacs will stop with the
    # error message "UnicodeEncodeError: 'ascii' codec can't encode
    # character u'\xfc' in position 1: ordinal not in range(128)".
    # See http://www.informit.com/articles/article.asp?p=31272&seqNum=5
    # to know why this ugly hack is needed.
    ## temp = result.title
    ## in_tuple = codecs.getencoder('ASCII')(temp, 'xml')
    ## # the encoder returns (encoded string, length); keep the string
    ## in_str = in_tuple[0]
    ## print 'Title\t:', strip_html(in_str)
    ## print 'URL\t:', result.URL
    ## print
    #-- end hack
print "\n "
''
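If sgmllib is not available on your Python (it was dropped in Python 3), the same stripper can be sketched with html.parser from the standard library. This is only an assumption about how one might port the helper; it is not part of the script above and has not been tried together with PyGoogle.

```python
# Sketch of the HTML stripper for Python 3, using html.parser in place of
# sgmllib. The class and function names mirror the script; the port itself
# is an illustration, not the original code.
from html.parser import HTMLParser


class HTMLStripper(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._text = []

    def handle_data(self, data):
        # called for each run of text found between tags
        self._text.append(data)

    def read_text(self):
        return "".join(self._text)


def strip_html(text):
    stripper = HTMLStripper()
    stripper.feed(text)
    stripper.close()  # flush any text still buffered by the parser
    return stripper.read_text()


if __name__ == "__main__":
    print(strip_html("<b>Python</b> Programming <i>Language</i>"))
```

Calling close() matters here because the parser may hold back trailing text until it knows no more input is coming.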
--
,-.___,-. Raj Shekhar
\_/_ _\_/ System Administrator, programmer and slacker
)O_O( home : http://rajshekhar.net
{ (_) } blog : http://rajshekhar.net/blog/
`-^-' work : http://netphotograph.com