The Valley of Premature Optimization

Posted on February 11, 2012 in Computrons, Literature

So this is the first text produced by the new Nokogiri-based HTML parser unassisted. Getting it to work had me looking at a lot of the individual paragraphs of Ulysses on their own, and it made me want to read the book again. We’ll see how long I hold on to that.

One really important thing is that generating text based on a book this big (~15 megs in plain text, and 16 with HTML markup) is taking FOREVER. I’m not sure what I’m going to do about that. For one thing, there’s still a strong argument for having the parsing happen in JavaScript, which would make optimizing the current code a waste of time. If I don’t decide to do that, then I think dealing with it on the character level is probably the way to go. For now, suffering is the only option that doesn’t expose me directly to the root of all evil.
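For anyone wondering what "on the character level" could look like, here is a rough sketch. It assumes the generator is a Markov-chain style remix of the source text, which this post never actually states, and it is not the code that produced the text below. The names `build_model`, `generate`, and `ORDER` are made up for illustration.

```ruby
# Hypothetical sketch of character-level Markov text generation.
# Assumption: the generator is Markov-chain based; the post does not confirm this.
# build_model, generate and ORDER are illustrative names, not from the real project.

ORDER = 4 # how many preceding characters form the lookup key (assumed value)

# Map every ORDER-character window to the characters seen right after it.
def build_model(text, order = ORDER)
  model = Hash.new { |hash, key| hash[key] = [] }
  (0...(text.length - order)).each do |i|
    model[text[i, order]] << text[i + order]
  end
  model
end

# Start from the first ORDER characters of the text and keep appending
# a randomly chosen follower of the last ORDER characters of the output.
def generate(text, length, order = ORDER, rng = Random.new)
  model = build_model(text, order)
  output = text[0, order].dup
  while output.length < length
    followers = model.fetch(output[-order, order], nil)
    break if followers.nil? # dead end: this window was never followed by anything
    output << followers.sample(random: rng)
  end
  output
end

sample = "the quick brown fox jumps over the lazy dog and the quick red fox naps"
puts generate(sample, 60)
```

Keying on fixed-width character windows avoids tokenizing the book into words at all, which might help with speed, though building the table still takes one full pass over the text.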

The Project Gutenberg EBook of Ulysses, by James Joyce — I — Stately, plump Buck Mulligan wiped again his gem, turned it and put it neatly into her mouth, asking: –What time is that? –Seven d., sir… Thank you, sir. Mr Bloom said. The drain, you mean. –Drain? Lenehan said. It was an infinite great fall of dung, the breeders in hobnailed boots trudging through the meshes of his body laid. Dolor! O, he did. And Jacky Caffrey shouted to look, look, look, look: you look for some money somewhere? Dilly said. Give me my Wordsworth. Enter Magee Mor Matthew,

This is the best one of these that I have put up by some distance.

