Changes between Version 7 and Version 8 of BluePrint/Importer


Timestamp:
04/03/10 13:16:48 (12 years ago)
Author:
Nitin Rastogi
Comment:

--

  • BluePrint/Importer

    v7 v8  
    4 4  * http://wiki.github.com/fizx/parsley/
    5 5  * http://developer.yahoo.com/yql/guide/
    6   * PDFminer is a tool that converts PDF documents into text; it is open source [http://www.unixuser.org/~euske/python/pdfminer/index.html#license (Licence)]. Some hacking in the source code is a good option for coding the importing tool [http://trac.sahanapy.org/wiki/SpreadsheetImporter Spreadsheet Importer]
     6  * PDFminer is a tool that converts PDF documents into text; it is open source [http://www.unixuser.org/~euske/python/pdfminer/index.html#license (Licence)]. Some hacking in the source code is a good option for coding the importing tool [http://trac.sahanapy.org/wiki/SpreadsheetImporter Spreadsheet Importer] by codestasher
     7  * Code snippet to extract hyperlinks from HTML docs.
{{{
# Python 2 only: sgmllib was removed in Python 3.
import sgmllib
import urllib


class MyParser(sgmllib.SGMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self, verbose=0):
        sgmllib.SGMLParser.__init__(self, verbose)
        self.hyperlinks = []

    def parse(self, s):
        # Feed the whole document, then flush any buffered data.
        self.feed(s)
        self.close()

    def start_a(self, attributes):
        # Called by SGMLParser for each opening <a> tag.
        for name, value in attributes:
            if name == "href":
                self.hyperlinks.append(value)

    def get_hyperlinks(self):
        return self.hyperlinks


# Fetch a page and print the links found in it.
f = urllib.urlopen("http://www.python.org")
s = f.read()
f.close()

myparser = MyParser()
myparser.parse(s)

print myparser.get_hyperlinks()
}}}
by codestasher
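
The snippet above depends on `sgmllib`, which exists only in Python 2. For anyone running the importer experiments on Python 3, a rough equivalent can be sketched with the standard-library `html.parser` module. The names `LinkParser` and `extract_hyperlinks` are hypothetical and not part of the original snippet. The sample input is an inline string rather than a fetched page, so the sketch runs without network access.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper name)."""

    def __init__(self):
        super().__init__()
        self.hyperlinks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute
        # names are already lowercased by HTMLParser.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hyperlinks.append(value)


def extract_hyperlinks(html_text):
    """Return all href values found in html_text, in document order."""
    parser = LinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.hyperlinks


if __name__ == "__main__":
    sample = '<p><a href="http://www.python.org">Python</a> <a name="top">x</a></p>'
    print(extract_hyperlinks(sample))
```

To process a live page as the original does, the text passed to `extract_hyperlinks` could come from `urllib.request.urlopen(url).read().decode(...)`. Which encoding to decode with depends on the page, so that step is left to the caller.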