
HTML Parser / Beautifier

Once upon a time I was asked to make some webpages, but all the HTML in the files I was given had been generated by something that clearly never considered that a human being might work on it later. I spent ten minutes looking for a non-shareware beautifier for my Mac. I might've found some viable options. I didn't really care. I just wanted an excuse to code up a parser in Lua.

I have since pushed that version aside (you can find it in the "old" subfolder of the zip file). The new version is one I came up with while entertaining a desire to spider all my old blogs off a website or two before closing my accounts. The first version was too simple to handle things like embedded scripts or styles. I would've gone for JavaScript's DOM interface, but it can only be run in a browser, leaving GreaseMonkey as the strongest bet. Second on that list would be Rhino with an HTML parser jar -- either a 3rd party one or Mozilla's (30-meg) parser. In the end I decided to settle on the comfortable, flexible language of Lua anyway. I just wrote another parser.

I fixed a bug in an older version that caused it to choke when element attribute values weren't surrounded by quotes. I'm willing to bet it'll still choke if it runs into an escaped quote inside a value.
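For the curious, here is a rough sketch of how quoted and unquoted attribute values can be told apart. It is not the code from the parser itself; parseAttrs is a made-up name for illustration, and it only looks at the attribute portion of a tag. It also assumes plain HTML rules, where a quoted value ends at the first matching quote character and a backslash is not an escape, so a backslash-escaped quote would still cut the value short.

```lua
-- Hypothetical helper, not taken from htmlparser-lua: parse the attribute
-- portion of a start tag into a table of name -> value.
local function parseAttrs(s)
	local attrs = {}
	local pos = 1
	while true do
		-- skip whitespace, then read an attribute name
		local _, nameEnd, name = s:find('^%s*([^%s=/>]+)', pos)
		if not name then break end
		pos = nameEnd + 1
		-- look for '=' (with optional whitespace around it)
		local _, eqEnd = s:find('^%s*=%s*', pos)
		if not eqEnd then
			-- bare attribute such as "disabled"
			attrs[name] = true
		else
			pos = eqEnd + 1
			local quote = s:sub(pos, pos)
			if quote == '"' or quote == "'" then
				-- quoted: read up to the next matching quote character
				-- (assumption: no backslash escapes, as in plain HTML)
				local close = s:find(quote, pos + 1, true) or (#s + 1)
				attrs[name] = s:sub(pos + 1, close - 1)
				pos = close + 1
			else
				-- unquoted: read up to the next whitespace or '>'
				local _, valueEnd, value = s:find('^([^%s>]*)', pos)
				attrs[name] = value
				pos = valueEnd + 1
			end
		end
	end
	return attrs
end
```

Calling parseAttrs('href=index.html class="a b" disabled') should give a table with href, class and disabled keys. Picking the branch by peeking at the first character after the '=' is what keeps the unquoted case from being mistaken for a malformed quoted one.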

Download the Source Code Here
GitHub: http://github.com/thenumbernine/htmlparser-lua