Refactoring a String Parser
A post on the Ruby list [ruby-talk:20893] included some code, with the question "Is there a more Ruby-esque way to do this...?" In this case, I didn't have automated unit tests, so I wasn't doing <strong>real</strong> refactoring. But it's not production code, and I have been refactoring without unit tests for many years (long before I knew it had a name). I certainly may have broken something here, so don't put this code into a pacemaker!
My first step was to do some basic restructuring so I could understand the code better. Extracting out some methods simplified the main loop so I could see what was happening. I had to make some of the variables global. That got me to here: source code
Next, I removed some dead code and unused comments. It seemed to me that the loop to handle the single quoted string would be simpler if it looked ahead for backslashes, rather than constantly lugging around the last character that may or may not be a backslash. The result: source code
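To make the look-ahead idea concrete, here is a minimal sketch; it is not the code from the original post. The method name `read_single_quoted`, its arguments, and the rule that a backslash simply makes the next character literal are all assumptions chosen for illustration.

```ruby
# Hypothetical helper (not from the original post).
# Reads a single-quoted string from `text`, where `start` is the index of the
# opening quote. Returns [contents, index just past the closing quote].
# Assumption: a backslash makes the following character literal.
def read_single_quoted(text, start)
  result = "".dup
  i = start + 1                       # skip the opening quote
  while i < text.length
    ch = text[i]
    if ch == "\\" && i + 1 < text.length
      # Look ahead: consume the backslash and the character after it together,
      # so no "was the last character a backslash?" flag is carried around.
      result << text[i + 1]
      i += 2
    elsif ch == "'"
      return [result, i + 1]          # closing quote found
    else
      result << ch
      i += 1
    end
  end
  raise ArgumentError, "unterminated single-quoted string"
end

contents, next_index = read_single_quoted("'it\\'s' rest", 0)
```

Because the backslash and the character after it are consumed in one step, the loop body never needs to remember anything about the previous iteration.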
After doing the same thing for the double quoted string loop, I could see that the loops were really similar. So I combined them, passing in just enough information to distinguish between the two cases. I also used a hash to map the escape characters to the appropriate hex values. That gave me this: source code
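As a rough illustration of combining the two loops, the sketch below passes in the quote character and an escape table, which is "just enough information" to tell the cases apart. The names `read_quoted` and `ESCAPES`, and the particular escapes listed, are assumptions for this example and are not taken from the original post.

```ruby
# Assumed escape table: maps the character after a backslash to a hex code.
ESCAPES = {
  "n" => 0x0A,   # newline
  "t" => 0x09,   # tab
  "r" => 0x0D,   # carriage return
  "0" => 0x00    # NUL
}.freeze

# Hypothetical combined reader for both quote styles.
# `quote` is the delimiter to stop at; `escapes` is the table to apply
# (empty for single-quoted strings in this sketch).
# Returns [contents, index just past the closing quote].
def read_quoted(text, start, quote, escapes = {})
  result = "".dup
  i = start + 1                         # skip the opening quote
  while i < text.length
    ch = text[i]
    if ch == "\\" && i + 1 < text.length
      following = text[i + 1]
      code = escapes[following]
      # Known escape: append the mapped character; otherwise keep it literally.
      result << (code ? code.chr : following)
      i += 2
    elsif ch == quote
      return [result, i + 1]
    else
      result << ch
      i += 1
    end
  end
  raise ArgumentError, "unterminated string"
end

double = read_quoted("\"a\\tb\" x", 0, "\"", ESCAPES)
single = read_quoted("'it\\'s'", 0, "'")
```

The only things that differ between the two calls are the delimiter and the table, so adding a new escape means adding one hash entry and leaving the loop alone.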
I hate global variables in any language, so I got rid of the few that were left. A couple of other minor tweaks, and I felt good enough to stop. I don't think it's perfect, but I can't see anything in it that jumps out at me and screams to be refactored further. The end: source code
Now, I <em>didn't</em> convert this to an object-oriented program. If I were serious about it, I probably would do so. I tend to think in objects, so it's natural for my code to reflect that. I suppose I should also mention that I'm not sure it's worth writing your own parser these days. I would go with a yacc-like solution if this were my project.
I also didn't use many Ruby-specific coding tricks. My style is pretty plain, sticking mostly to conventions that Java and C++ coders will be comfortable with. I want the code to be understood by as many people as possible, so I only resort to a deep bag of tricks when it's absolutely necessary.
Special thanks to Charles D Hixson for posting the original code, giving me this chance to learn and share.