Channel: Parsing a partial XML with python lxml - Stack Overflow

Answer by Mikhail T. for Parsing a partial XML with python lxml

Ten years later, my "solution" is still to reparse the XML-file (or blob) from the beginning any time new data arrives (which in the example below is reported by select()). To avoid reacting to the already-processed "events", I keep count of the already reacted-to...

In the code below, processing of a new event consists of simply logging it. But it could be anything else, of course.

My only justification for the reparsing is that the files I'm dealing with are small -- no more than 100 elements, usually under 10. But the XML-text arrives sporadically and I want the new arrivals reported immediately, without waiting for the sending process to finish.

I wish there were a way to tell xml.etree.ElementTree to resume parsing a file for which it has thrown an error earlier, but there is not...
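A possible alternative that avoids the error in the first place, sketched under the assumption that Python 3.4 or later is available: `xml.etree.ElementTree.XMLPullParser` accepts data in pieces through `feed()` and hands back completed elements through `read_events()`. The helper name `feed_and_report` and the sample chunks are made up for illustration. With recent Expat versions, very small chunks may be reported with some delay; as far as I know, newer Python releases add a `flush()` method on the parser for that case.

```python
import xml.etree.ElementTree as ET

def feed_and_report(parser, chunk):
    """Feed one chunk and return (tag, text) for elements completed so far.

    Hypothetical helper for illustration; not part of the original answer.
    """
    parser.feed(chunk)
    new = []
    for event, element in parser.read_events():
        # Only 'end' events were requested, so each element here is complete.
        if element.text:
            new.append((element.tag, element.text))
    return new

parser = ET.XMLPullParser(events=('end',))
# Made-up chunks that split the document at awkward places on purpose.
for chunk in [b'<log><entry>fir', b'st</entry><ent', b'ry>second</entry>']:
    for tag, text in feed_and_report(parser, chunk):
        print('%s: %s' % (tag, text))
```

The parser is never closed here, so the still-open root element does not raise a ParseError, and nothing is parsed twice.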

Maybe we ought to use something under xml.sax.* instead...
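A rough sketch of what the xml.sax route could look like, assuming the default parser returned by `xml.sax.make_parser()` is the Expat-based one, which supports incremental `feed()`. The handler class `EndReporter` and the sample chunks are hypothetical. Note that the handler gathers all text directly inside an element, which is slightly broader than ElementTree's `element.text`.

```python
import xml.sax

class EndReporter(xml.sax.ContentHandler):
    """Collects (tag, text) for each element as soon as its end tag is seen."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.stack = []      # text fragments of the currently open elements
        self.completed = []  # (tag, text) pairs not yet picked up by the caller

    def startElement(self, name, attrs):
        self.stack.append([])

    def characters(self, content):
        if self.stack:
            self.stack[-1].append(content)

    def endElement(self, name):
        text = ''.join(self.stack.pop())
        if text.strip():
            self.completed.append((name, text))

handler = EndReporter()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# Made-up chunks; in the real program they would come from os.read().
for chunk in [b'<log><entry>fir', b'st</entry><ent', b'ry>second</entry>']:
    parser.feed(chunk)
    for tag, text in handler.completed:
        print('%s: %s' % (tag, text))
    del handler.completed[:]   # forget what has already been reported
```

Since the parser keeps its state between `feed()` calls, no event counter is needed.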

The code below works with both Python 2.x and 3.x:

    import os
    import select
    # Assumed import: the text above refers to xml.etree.ElementTree
    import xml.etree.ElementTree as ET

    reported = 0    # count of the already-reported events
    while True:
        ...
        r, w, x = select.select([reader], writers, [reader])
        if x:
            log.warning('Exceptions %s', x)
        if w:
            ...
        if not r:
            continue
        chunk = os.read(reader, 251)
        if not chunk:
            log.debug('There was nothing to read')
            continue
        ...
        # XXX: Here we repeatedly reparse the XML-text collected so
        # XXX: far -- cannot find another way to reliably report new entries.
        # XXX: To avoid reprinting the earlier entries, we keep count...
        count = 0
        try:
            # The logfile is not complete yet, so parsing will
            # eventually fail:
            for event, element in ET.iterparse(logfile):
                if event != 'end' or not element.text:
                    continue
                count += 1
                if count > reported:
                    log.info('%s: %s', element.tag, element.text)
        except SyntaxError:
            # In Python 2.6 this is a syntax-error
            pass
        except ET.ParseError:
            # In later Pythons this is a ParseError
            pass
        reported = count
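To try the reparse-and-count idea without a pipe or select(), here is a small self-contained sketch. The function name `report_new` and the sample chunks are assumptions for illustration; the text received so far is kept in a bytes buffer rather than a logfile.

```python
import io
import xml.etree.ElementTree as ET

def report_new(xml_so_far, reported):
    """Reparse everything received so far; return (new_entries, total_count).

    Hypothetical helper: 'reported' is the number of entries already handled.
    """
    count = 0
    new_entries = []
    try:
        # iterparse yields only 'end' events by default.
        for event, element in ET.iterparse(io.BytesIO(xml_so_far)):
            if not element.text:
                continue
            count += 1
            if count > reported:
                new_entries.append((element.tag, element.text))
    except SyntaxError:
        # ET.ParseError is a subclass of SyntaxError; the incomplete
        # document ends up here once the parser runs out of input.
        pass
    return new_entries, count

buf = b''
reported = 0
# Made-up chunks standing in for what os.read() would return.
for chunk in [b'<log><entry>fir', b'st</entry><ent', b'ry>second</entry>']:
    buf += chunk
    new_entries, reported = report_new(buf, reported)
    for tag, text in new_entries:
        print('%s: %s' % (tag, text))
```

Each call costs time proportional to everything received so far, which is tolerable only because the documents are small.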
