Java boilerplate

I fixed a Wicket bug today for compressing whitespace in HTML. The wrinkle is that you need to avoid doing that inside <pre> tags, otherwise your code formatting goes all wrong.

You should probably just gzip your responses instead, as it’ll be much more efficient than this, but apparently, someone cares enough to raise a bug report, so I rolled up my sleeves. I mean, how hard can it be to strip out some whitespace?

You can do this fairly nicely in Perl or PHP or some other language that understands regexp callbacks. However, I wanted to do it in Java without the gnu-regexp library, and the neatest I could come up with was this:

/**
 * Remove whitespace from raw markup
 * 
 * @param rawMarkup
 * @return rawMarkup with compressed whitespace.
 */
protected String compressWhitespace(String rawMarkup)
{
	// We don't want to compress whitespace inside <pre> tags, so we look
	// for matches and:
	//  - Do whitespace compression on everything before the first match.
	//  - Append the pre match with no compression.
	//  - Loop to find the next match.
	//  - Append with compression everything between the two matches.
	//  - Repeat until no match, then special-case the fragment after the
	//    last pre.
 
	Pattern preBlock = Pattern.compile("<pre>.*?</pre>", Pattern.DOTALL | Pattern.MULTILINE);
	Matcher m = preBlock.matcher(rawMarkup);
	int lastend = 0;
	StringBuffer sb = null;
	while (true)
	{
		boolean matched = m.find();
		String nonPre = matched
				? rawMarkup.substring(lastend, m.start())
				: rawMarkup.substring(lastend);
		nonPre = nonPre.replaceAll("[ \t]+", " ");
		nonPre = nonPre.replaceAll("( ?[\r\n] ?)+", "\n");
 
		// Don't create a StringBuffer if we don't actually need one.
		// This optimises the trivial common case where there is no <pre>
		// tag at all down to just doing the replaceAlls above.
		if (lastend == 0)
		{
			if (matched)
			{
				sb = new StringBuffer(rawMarkup.length());
			}
			else
			{
				return nonPre;
			}
		}
		sb.append(nonPre);
		if (matched)
		{
			sb.append(m.group());
			lastend = m.end();
		}
		else
		{
			break;
		}
	}
	return sb.toString();
}

And something vaguely equivalent in Perl:

$_ = "text to compress";
s#(.*?)(<pre>.*?</pre>|$)#($_, $pre) = ($1, $2); s/\s+/ /g; $_.$pre#emg;
print "$_";

Ugh. Java really sucks sometimes. :-(

It’s not so much that the regular expression stuff in Java is all that much less powerful than Perl, it’s just that by exposing it all as classes and methods, and without such niceties as closures, it’s all so very verbose.
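
To make the comparison concrete, here is a rough sketch of the Perl idea transliterated into Java: one pattern that captures "stuff before a pre block" and "the pre block (or end of input)", with the compression done by hand inside the loop. The class name PreAwareCompressor, the constant CHUNK and the method name compress are made up for illustration and are not part of Wicket. It uses only java.util.regex, and assumes Java 5 or later for Matcher.quoteReplacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only; not part of Wicket.
class PreAwareCompressor
{
	// Group 1: text before the next <pre> block (or before end of input).
	// Group 2: the <pre> block itself, or empty when we hit end of input.
	private static final Pattern CHUNK =
			Pattern.compile("(.*?)(<pre>.*?</pre>|\\z)", Pattern.DOTALL);

	static String compress(String rawMarkup)
	{
		Matcher m = CHUNK.matcher(rawMarkup);
		StringBuffer sb = new StringBuffer(rawMarkup.length());
		while (m.find())
		{
			// Same two replacements as the method above, applied only to
			// the text outside the <pre> block.
			String nonPre = m.group(1).replaceAll("[ \t]+", " ");
			nonPre = nonPre.replaceAll("( ?[\r\n] ?)+", "\n");

			// quoteReplacement stops '$' and '\' in the markup from being
			// interpreted as group references by appendReplacement.
			m.appendReplacement(sb, Matcher.quoteReplacement(nonPre + m.group(2)));
		}
		m.appendTail(sb);
		return sb.toString();
	}
}
```

Calling PreAwareCompressor.compress("a  b\n\n<pre>x  y</pre>  c") should leave the contents of the pre block alone and squash the whitespace on either side of it. It is shorter than the first version, but the loop, the buffer and the quoting are all still things the Perl one-liner gets for free.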
