Ten years later, my "solution" is still to reparse the XML file (or blob) from the beginning any time new data arrives (which, in the example below, is reported by select()). To avoid reacting to the already-processed "events", I keep a count of the ones I have already reacted to...
In the code below, processing of a new event consists of simply logging it. But it could be anything else, of course.
My only justification for the reparsing is that the files I'm dealing with are small -- no more than 100 elements, usually under 10. But the XML-text arrives sporadically and I want the new arrivals reported immediately, without waiting for the sending process to finish.
I wish there were a way to tell xml.etree.ElementTree to resume parsing a file for which it has thrown an error earlier, but there is not...
Maybe we ought to use something under xml.sax.* instead...
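For what it's worth, here is a rough sketch of what the xml.sax route might look like. With the default expat-based parser, the object returned by xml.sax.make_parser() accepts data through feed() as it arrives, and the handler callbacks fire as soon as an element closes, so nothing has to be reparsed. The names EndTextHandler and make_incremental_parser() are made up for this illustration, and "element with non-blank text" only approximates the element.text test used with ElementTree. I have not tried this against the real log files.

```python
import xml.sax


class EndTextHandler(xml.sax.ContentHandler):
    """Hypothetical handler: records (tag, text) when an element closes."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.events = []   # (tag, text) pairs seen so far
        self._stack = []   # one text buffer per currently open element

    def startElement(self, name, attrs):
        self._stack.append([])

    def characters(self, content):
        # Text may arrive in several pieces, so collect and join later.
        if self._stack:
            self._stack[-1].append(content)

    def endElement(self, name):
        text = ''.join(self._stack.pop())
        if text.strip():
            self.events.append((name, text))


def make_incremental_parser():
    """Return (parser, handler); call parser.feed(chunk) per chunk."""
    parser = xml.sax.make_parser()
    handler = EndTextHandler()
    parser.setContentHandler(handler)
    return parser, handler
```

In the select() loop, each chunk returned by os.read() would go straight to parser.feed(chunk), and any new entries in handler.events could be logged right away. Calling parser.close() is only appropriate once the sender has finished, because it raises an error on an incomplete document.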
The code below works with both Python 2.x and 3.x:
    import os
    import select
    import xml.etree.ElementTree as ET

    reported = 0  # count of the already-reported events
    while True:
        ...
        r, w, x = select.select([reader], writers, [reader])
        if x:
            log.warn('Exceptions %s', x)
        if w:
            ...
        if not r:
            continue
        chunk = os.read(reader, 251)
        if not chunk:
            log.debug('There was nothing to read')
            continue
        ...
        # XXX: Here we repeatedly reparse the XML-text collected so
        # XXX: far -- cannot find another way to reliably report new entries.
        # XXX: To avoid reprinting the earlier entries, we keep count...
        count = 0
        try:
            # The logfile is not complete yet, so parsing will
            # eventually fail:
            for event, element in ET.iterparse(logfile):
                if event != 'end' or not element.text:
                    continue
                count += 1
                if count > reported:
                    log.info('%s: %s', element.tag, element.text)
        except SyntaxError:
            # In Python 2.6 this is a syntax-error
            pass
        except ET.ParseError:
            # In later Pythons this is a ParseError
            pass
        reported = count
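The loop above leaves out the setup, so here is a self-contained sketch of just the count-and-reparse step, in case someone wants to try it in isolation. It assumes the XML collected so far is held in memory as bytes rather than in a log file; the function name report_new() and its report callback are hypothetical and not part of my actual program.

```python
import io
import xml.etree.ElementTree as ET


def report_new(xml_so_far, reported, report):
    """Reparse xml_so_far (bytes) from the start.

    Calls report(tag, text) for every finished element with text beyond
    the first `reported` ones, and returns the new count.
    """
    count = 0
    try:
        # iterparse() yields only 'end' events by default.
        for event, element in ET.iterparse(io.BytesIO(xml_so_far)):
            if not element.text:
                continue
            count += 1
            if count > reported:
                report(element.tag, element.text)
    except ET.ParseError:
        # Expected while the document is still incomplete.
        pass
    return count
```

The caller keeps the returned count and passes it back in on the next call, the same way the loop above assigns reported = count.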