[TAG] 2c tip : search google from command line

Raj Shekhar rajshekhar at hotpop.com
Tue Sep 21 07:46:40 MSD 2004


Hello all,

This is an ugly hack that I use to search Google from the command
line. Any decent Python programmer would be able to make it much better.
You need to have the Pygoogle (http://pygoogle.sourceforge.net/) module
installed. In its unaltered form, the script requires Python 2.3 to
run. However, if you remove the ugly hack part (see the comments in
the code), it will run with Python 2.2 too.

``
#!/usr/bin/python2.3
import google,sys,codecs
from sgmllib import SGMLParser

# HTML Stripper class to strip out html from the google search
# returned.  shamelessly copy pasted from
# http://mail.python.org/pipermail/tutor/2002-September/017573.html

class HTMLStripper(SGMLParser):
     def __init__(self):
         SGMLParser.__init__(self)
         self._text = []

     def handle_data(self, data):
         self._text.append(data)

     def read_text(self):
         return ''.join(self._text)


def strip_html(text):
     stripper = HTMLStripper()
     stripper.feed(text)
     return stripper.read_text()

print "Searching the World Live Web "

# You must get your own key from http://www.google.com/apis/
# (free, but requires registration).
google.setLicense('your google key')

# Change the number of search results that are shown here.
n_show_results = 10

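# Register the 'xml' error handler used by the Emacs hack below
# (codecs.register_error is new in Python 2.3).
codecs.register_error('xml', codecs.xmlcharrefreplace_errors)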

# Join all command-line arguments into a single query string.
search_str = " ".join(sys.argv[1:])

print "Searching for " ,search_str

data = google.doGoogleSearch(search_str,0,n_show_results)

print 'Search took %f seconds and I found an estimated %d results\n' % \
       (data.meta.searchTime, data.meta.estimatedTotalResultsCount)

for result in data.results:

     # If you are going to call this script from within Emacs, replace
     # the three print statements below with the commented-out code
     # between "#-- begin hack" and "#-- end hack".

     print 'Title\t:', strip_html(result.title)
     print 'URL\t:', result.URL
     print

     #-- begin hack

     # If you want to call this script from within Emacs, then you have
     # to put in this ugly hack. Otherwise Emacs will stop with an
     # error message ``UnicodeEncodeError: 'ascii' codec can't encode
     # character u'\xfc' in position 1: ordinal not in range(128)''

     # see  http://www.informit.com/articles/article.asp?p=31272&seqNum=5
     # to know why this ugly hack is needed

##     temp = result.title
##     in_tuple=codecs.getencoder('ASCII')(temp, 'xml')
##     in_str = in_tuple[0]  # the encoder returns (encoded string, length)
##     print 'Title\t:', strip_html(in_str)
##     print 'URL\t:', result.URL
##     print
     #-- end hack

print "\n "



''
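
For anyone trying the same idea on a present-day Python 3, here is a minimal sketch of the two display steps the script performs on each title: stripping the HTML, and making the text ASCII-safe with XML character references. It is only an illustration, not a port of the script. It assumes Python 3, where sgmllib no longer exists, so html.parser.HTMLParser is swapped in for SGMLParser. The to_ascii helper and the sample strings are made up for the example, and no Google call is made.

```python
import codecs
from html.parser import HTMLParser


class HTMLStripper(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self._text = []

    def handle_data(self, data):
        self._text.append(data)

    def read_text(self):
        return ''.join(self._text)


def strip_html(text):
    stripper = HTMLStripper()
    stripper.feed(text)
    stripper.close()  # flush any text still buffered by the parser
    return stripper.read_text()


def to_ascii(text):
    # Hypothetical helper: non-ASCII characters become &#NNN; references,
    # then the bytes are decoded back to str so they can be printed.
    return text.encode('ascii', 'xmlcharrefreplace').decode('ascii')


# The same handler registered under the name 'xml', as the script does.
codecs.register_error('xml', codecs.xmlcharrefreplace_errors)

# The encoder returns a tuple (encoded bytes, number of characters consumed),
# so only the first element is the text itself.
encoded, consumed = codecs.getencoder('ascii')('f\xfcr', 'xml')

if __name__ == '__main__':
    title = '<b>M\xfcnchen</b> guide'  # made-up sample title
    print('Title\t:', to_ascii(strip_html(title)))
```

Stripping first and encoding second keeps the character references out of the parser, so they reach the terminal (or the Emacs buffer) unchanged.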


-- 
,-.___,-.  Raj Shekhar
\_/_ _\_/  System Administrator, programmer and  slacker
   )O_O(    home : http://rajshekhar.net
  { (_) }   blog : http://rajshekhar.net/blog/
   `-^-'    work : http://netphotograph.com




