One of the most common questions I am asked about my Embedded RSS Reader is how it processes the XML data of an RSS feed on an 8-bit microcontroller.
In a conventional application, it is trivial to use an XML library to parse the data into a tree-like structure. In PHP5, for instance, you could simply write:
$xml = simplexml_load_file("myfeed.xml");
Voila! A complete XML tree is now stored in $xml. On the AVR this is slightly more problematic because there is simply not enough memory to store the textual feed, let alone the corresponding tree structure.
The good news is that we don’t actually have to store anything except the required information. We process the incoming data stream one character at a time using a regular expression: it matches the appropriate tags, and its capture groups extract the title, description and link.
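To make the one-character-at-a-time idea concrete, here is a minimal sketch in C of matching a single literal tag in a stream without buffering the document. The names tag_matcher, tag_matcher_feed and count_tag are hypothetical and chosen only for illustration; this is not the code from the reader itself. The simple restart rule assumes that the first character of the pattern ('<') appears nowhere else in it, which holds for a tag such as "<title>".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: matches one literal tag in a character stream.
 * The only state kept is a pointer to the pattern and one counter. */
typedef struct {
    const char *pattern;  /* e.g. "<title>" */
    uint8_t pos;          /* number of pattern characters matched so far */
} tag_matcher;

/* Feed one character. Returns true when the last character of the
 * pattern has just been seen. */
bool tag_matcher_feed(tag_matcher *m, char c)
{
    if (c == m->pattern[m->pos]) {
        m->pos++;
        if (m->pattern[m->pos] == '\0') {
            m->pos = 0;   /* ready for the next occurrence */
            return true;
        }
        return false;
    }
    /* Mismatch: start over. This is only correct because the first
     * pattern character ('<') does not occur later in the pattern. */
    m->pos = (c == m->pattern[0]) ? 1 : 0;
    return false;
}

/* Demonstration only: on the AVR the characters would come from the
 * network as they arrive, not from a string held in memory. */
int count_tag(const char *stream, const char *tag)
{
    tag_matcher m = { tag, 0 };
    int hits = 0;
    for (; *stream != '\0'; stream++) {
        if (tag_matcher_feed(&m, *stream)) {
            hits++;
        }
    }
    return hits;
}
```

The matcher needs only a pointer and a single byte of state per tag, however long the feed is.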
Obviously, the AVR cannot use a Perl-style regular expression engine. That would use too much memory and require too much computation. Furthermore, some RegEx engines backtrack, revisiting earlier input, so the entire downloaded document would need to be stored somewhere.
We must go back to the raw elements of how a RegEx works. It is, after all, a representation of a non-deterministic finite automaton (try saying that five times fast), and any such automaton can be converted into an equivalent deterministic one. Digital electronics love state machines, and computer software is no exception. The net result is a deterministic finite automaton (DFA) with a hundred-or-so states that can extract the relevant parts of an RSS document on the fly.
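As a rough illustration of what such a state machine can look like, the sketch below hand-codes a tiny DFA in C that captures the text of the first title element. It is far smaller than the hundred-or-so states mentioned above, and every name in it (state_t, parser_t, parser_feed and so on) is an assumption made for this example rather than the reader's actual code. It also assumes a plain <title> tag with no attributes and no CDATA section, and it silently truncates long titles to fit a fixed buffer.

```c
#include <stddef.h>

/* One state per prefix of "<title>" that has been seen so far. */
typedef enum {
    S_IDLE,     /* waiting for '<' */
    S_LT,       /* seen "<" */
    S_T,        /* seen "<t" */
    S_TI,       /* seen "<ti" */
    S_TIT,      /* seen "<tit" */
    S_TITL,     /* seen "<titl" */
    S_TITLE,    /* seen "<title" */
    S_CAPTURE,  /* inside the element, copying characters */
    S_DONE      /* first title captured, ignore the rest */
} state_t;

typedef struct {
    state_t state;
    char title[32];  /* fixed buffer; longer titles are truncated */
    size_t len;
} parser_t;

void parser_init(parser_t *p)
{
    p->state = S_IDLE;
    p->len = 0;
    p->title[0] = '\0';
}

/* Advance to `next` if `c` is the expected character. Otherwise fall
 * back to S_LT (if c starts a new tag) or S_IDLE. */
static state_t step(char c, char want, state_t next)
{
    if (c == want) {
        return next;
    }
    return (c == '<') ? S_LT : S_IDLE;
}

/* Feed exactly one character of the incoming stream. */
void parser_feed(parser_t *p, char c)
{
    switch (p->state) {
    case S_IDLE:  p->state = step(c, '<', S_LT);      break;
    case S_LT:    p->state = step(c, 't', S_T);       break;
    case S_T:     p->state = step(c, 'i', S_TI);      break;
    case S_TI:    p->state = step(c, 't', S_TIT);     break;
    case S_TIT:   p->state = step(c, 'l', S_TITL);    break;
    case S_TITL:  p->state = step(c, 'e', S_TITLE);   break;
    case S_TITLE: p->state = step(c, '>', S_CAPTURE); break;
    case S_CAPTURE:
        if (c == '<') {
            p->state = S_DONE;  /* assumed to be the start of </title> */
        } else if (p->len < sizeof p->title - 1) {
            p->title[p->len++] = c;
            p->title[p->len] = '\0';
        }
        break;
    case S_DONE:
        break;
    }
}

/* Convenience for testing on a desktop machine. */
void parser_feed_str(parser_t *p, const char *s)
{
    for (; *s != '\0'; s++) {
        parser_feed(p, *s);
    }
}
```

Each call to parser_feed does a constant amount of work and never looks back at earlier input, so nothing but the captured title has to be kept in RAM. Extending the idea to description and link means adding more states of the same kind.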

10/05/2010 at 10:06 am Permalink
I was just wondering if you got any further with this. I just started with the Arduino, and creating (sort of) an RSS parser seems pretty hard.
10/05/2010 at 11:05 am Permalink
Whoops, I should have looked more closely at your site.