This short article explains how to trim unnecessary code (comments, empty lines) from HTML documents. It is a follow-up to an article published a couple of weeks ago on this blog: Building Web Applications With Apache Ant. The idea is to use Ant’s optional replaceregexp task, as shown below:

<target name="-trim.html.comments">
    <!-- build.dir is an assumed property: point it at the directory to process -->
    <fileset id="html.fileset" dir="${build.dir}"
        includes="**/*.jsp, **/*.php, **/*.html"/>
    <!-- HTML Comments -->
    <replaceregexp replace="" flags="g"
        match="&lt;![ \r\n\t]*(--([^\-]|[\r\n]|-[^\-])*--[ \r\n\t]*)&gt;">
        <fileset refid="html.fileset"/>
    </replaceregexp>
    <!-- Empty and whitespace-only lines -->
    <replaceregexp match="^\s*[\r\n]" replace="" flags="mg">
        <fileset refid="html.fileset"/>
    </replaceregexp>
</target>

Update: Use this code very carefully: rewriting markup with regular expressions is dangerous territory. (Thanks to my co-worker Ryan Grove for pointing out some of the shortcomings.)
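
Given that warning, one cautious way to try the target is to run it against a copy of the sources, so the originals stay untouched. The build file below is a minimal sketch, not a definitive setup: the project name, the src.dir and build.dir properties, and the -copy.sources and build targets are all assumptions made for illustration, and the patterns are adapted from the target above. Because Ant reads a leading hyphen on the command line as an option, a target such as -trim.html.comments cannot be invoked directly and has to be reached through another target's depends attribute.

```xml
<project name="trim-example" default="build" basedir=".">
    <!-- Hypothetical locations: adjust both to your project layout. -->
    <property name="src.dir" value="src"/>
    <property name="build.dir" value="build"/>

    <!-- Copy the sources first so the regular expressions only touch the copies. -->
    <target name="-copy.sources">
        <mkdir dir="${build.dir}"/>
        <copy todir="${build.dir}">
            <fileset dir="${src.dir}"/>
        </copy>
    </target>

    <target name="-trim.html.comments" depends="-copy.sources">
        <fileset id="html.fileset" dir="${build.dir}"
            includes="**/*.jsp, **/*.php, **/*.html"/>
        <!-- Remove HTML comments (this also matches IE conditional comments). -->
        <replaceregexp replace="" flags="g"
            match="&lt;![ \r\n\t]*(--([^\-]|[\r\n]|-[^\-])*--[ \r\n\t]*)&gt;">
            <fileset refid="html.fileset"/>
        </replaceregexp>
        <!-- Remove empty and whitespace-only lines. -->
        <replaceregexp match="^\s*[\r\n]" replace="" flags="mg">
            <fileset refid="html.fileset"/>
        </replaceregexp>
    </target>

    <!-- Public entry point: the hyphen-prefixed targets run only as dependencies. -->
    <target name="build" depends="-trim.html.comments"/>
</project>
```

With this layout, running ant build (or plain ant, since build is the default target here) copies src into build and trims only the copies, which makes it easy to diff the two directories before trusting the result.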

6 Responses to Trimming comments in HTML documents using Apache Ant

  1. Mike Henke says:

    Awesome. Keep the Ant scripts coming; they are great.

  4. Steve says:

    How do you prevent the regex from removing JavaScript enclosed in HTML comments in order to hide it from browsers that have JavaScript disabled?
    I fiddled around a lot but never managed to get Ant to ignore comments that end with //-->. It always results in an infinite loop. :-(

  5. @Steve

    Do not wrap inline JavaScript code inside HTML comments. Nobody uses Netscape 1 anymore…

  6. Jaime Bueza says:

    Here’s one for removing console.log calls within a combined, YUI-compressed file.

    <replaceregexp file="my_combined_file.js" match="(console\.log\(.*\))" flags="g" replace="\/\/\1"></replaceregexp>