Common Look and Feel – CLF 2.0 – Argh!

If you know the term Common Look and Feel (CLF 2.0), chances are you also know that the deadline is fast approaching (December 2008). Across many Canadian Government sites, progress is being made to convert content to meet the new standards. Being fully compliant means far more than just converting HTML pages; it also means converting legacy PDFs and updating applications and dynamic content. For the past year and a half, I’ve been helping a group of webmasters convert static pages from CLF 1 to CLF 2.

One of the breakthroughs that helped us speed up conversion from CLF 1 to CLF 2 was the use of Dreamweaver and Regular Expressions (regex). Essentially, this is just a fancy ‘search and replace’ that can run through a whole bunch of pages all at once. It doesn’t do everything automatically, but it does cut down the manual work.

The way it works is this: you open an old web page that needs to be converted and identify the parts you’d like to copy, for instance the page title, the metatags, the body of the page, the date modified and so on. Then you write a regex that splits the page into these component parts. Next, you create a blank CLF 2 template page with variables in the spots that will be filled with the old content. With a click, the search and replace runs through a section of the site and voilà! You have new CLF 2 pages. The last step is to verify that everything worked: clean up old deprecated tags to meet XHTML 1.0 Strict (this can be done with a click using Dreamweaver), validate, then publish.
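
For anyone who would rather script the batch step than click through it, here is a rough Python sketch of the same idea. It is not what Dreamweaver does internally. The two patterns, the bare-bones template and the function names are all made up for illustration, and a real CLF 2 template would be much longer. Converted pages go into a separate folder so the originals stay untouched, and pages that don’t fit the expected structure are listed for manual work.

```python
import re
from pathlib import Path

# Hypothetical split of an old page: the text inside <title> and inside <body>.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)

# Hypothetical stand-in for a blank CLF 2 template with two variables.
TEMPLATE = """<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head><title>{title}</title></head>
<body>
<div class="content">
{body}
</div>
</body>
</html>
"""


def convert_page(old_html):
    """Return the page rebuilt on the template, or None if it doesn't fit."""
    title = TITLE_RE.search(old_html)
    body = BODY_RE.search(old_html)
    if title is None or body is None:
        return None
    return TEMPLATE.format(title=title.group(1).strip(),
                           body=body.group(1).strip())


def convert_section(src_dir, out_dir):
    """Convert every .html file under src_dir into out_dir.

    Returns the list of files that were skipped and need manual conversion.
    """
    src, out = Path(src_dir), Path(out_dir)
    skipped = []
    for old_file in sorted(src.rglob("*.html")):
        new_html = convert_page(old_file.read_text(encoding="utf-8"))
        if new_html is None:
            skipped.append(old_file)
            continue
        target = out / old_file.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(new_html, encoding="utf-8")
    return skipped
```

Calling convert_section("old_site/about", "clf2_site/about") (both folder names are placeholders) would write the converted pages and hand back the ones still needing attention. Verifying, validating and publishing are still manual steps.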

You don’t need Dreamweaver to run a search and replace using regular expressions. Other text/HTML editors, such as UltraEdit, will also do the job. It’s important to note, though, that not all regex engines are the same, so the syntax varies depending on the tool you’re using. For Dreamweaver, the code we use looks like the following. Keep in mind that this code varies from section to section of our site, depending on how the old page was structured.

(<%@)([^]*)(<title>)([^]*)(</title>)([^]*)(<html lang="en-ca">)([^]*)(name="dc.creator" content=")([^]*)(">\n<meta name="dc.title")([^]*)(<!-- Content Begins Here -->)([^]*)(<!-- Content Ends Here -->)([^]*)

Now before you start thinking, “Whoa! This is way too complicated, I need to be a programmer to understand this!”, no worries, it’s actually pretty simple. Each pair of brackets () matches a part of your old HTML 4 web page. For instance, the match might be something very specific such as a title tag, or it could be anything that falls between the last match and the next one. Think of that second kind, ([^]*), as a wild card that says, ok, I don’t care what’s there, but I do know that it starts at this spot and ends at this other spot. Each part of the content between brackets is stored in memory and can then be pasted into the new template by using a dollar sign ($) followed by its number. In the above regex there are 17, uh, sorry, sixteen matches that will occur, so the variable $4 will be any text that makes up the page’s title but nothing else.
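
To make the numbered matches a bit more concrete, here is a small sketch in Python rather than Dreamweaver. The pattern is a shortened version of the one above (seven groups instead of sixteen), and the old page and the template are invented for the example. Two dialect differences to watch for: Python’s re module rejects [^], so [\s\S]*? stands in for “anything” (and matches lazily, unlike the greedy [^]*), and Python writes \g<2> where Dreamweaver writes $2.

```python
import re

# Shortened, hypothetical version of the pattern: each pair of brackets is a
# numbered group. Groups 2 and 6 are the "wild cards" we want to keep.
PATTERN = re.compile(
    r"(<title>)([\s\S]*?)(</title>)"
    r"([\s\S]*?)"
    r"(<!-- Content Begins Here -->)([\s\S]*?)(<!-- Content Ends Here -->)"
)

# Hypothetical CLF 2 template. \g<2> is the page title, \g<6> is the content.
TEMPLATE = r"""<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head><title>\g<2></title></head>
<body>
<div class="content">\g<6></div>
</body>
</html>"""

# Invented old page, laid out the way the pattern expects.
OLD_PAGE = """<html lang="en-ca">
<head>
<title>Contact Us</title>
</head>
<body>
<!-- Content Begins Here -->
<p>Call us.</p>
<!-- Content Ends Here -->
</body>
</html>"""


def convert(old_html):
    """Split the old page into groups and paste groups 2 and 6 into the template."""
    match = PATTERN.search(old_html)
    if match is None:
        raise ValueError("page does not fit the expected structure")
    # expand() fills in each \g<n> with the text captured by group n.
    return match.expand(TEMPLATE)


new_page = convert(OLD_PAGE)
```

Here match.group(2) holds only the title text, just as $4 does in the full pattern, and everything outside the kept groups (the old head, the marker comments) is dropped from the new page.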

Trying to explain regex and how this all works in greater detail here would be pretty silly, so if you have a sense of what I’m talking about and think this is something you’d like to try for converting your old government pages, give me a shout and we can discuss further.

  1. Tim A
    November 23rd, 2008 at 16:39 | #1

    This works well on smaller files. Once html file sizes get above 60k or so (it’s not consistent) you have to make sure that you have backed up the “content” because it becomes too big to be a variable and you lose a big chunk of content.

  2. Daphne
    November 23rd, 2008 at 16:52 | #2

    As I understand, first you run through all the English pages and convert them, then you do the French pages using a French template page.

  3. Dave
    November 23rd, 2008 at 20:24 | #3

    hi Jobe, when you have some free time this week, please drop me an email. I gave it a try but got stuck in one spot. It works for half the page..

  4. Tim A
    November 24th, 2008 at 06:10 | #4

    That’s what I do, Daphne. I never really used Dreamweaver before, preferring to code everything by hand. The nice feature of Dreamweaver’s search and replace is that you can make up your own regular expression and save it as a file for reuse at a later date. You can also create search and replace queries to replace recurring styles/classes that no longer exist. You have to be really careful with these. I tend to run them on individual open files rather than a whole directory until I’m sure there is no bug, which allows you to “revert to saved” if anything unexpected happens.

  5. December 5th, 2008 at 02:27 | #5

    Hi Jobe,

    We've been doing something similar with a .Net app. You will still need to upgrade to XHTML 1.0 Strict – if the legacy pages you're working on are anything like mine they probably don't conform with that standard.

    HTML Tidy is the de facto tool for upgrading your markup, and with a little programming you can create a simple app that will clean up your pages. It has been ported to most languages, and I stumbled across a C# version a little while ago. I wrote a short description of how to use it: http://www.nextdesigns.ca/articles/tidy.aspx

  6. December 6th, 2008 at 12:09 | #6

    I'd also highly recommend Tidy. To give non-webmasters the ability to publish CLF 2 web pages, we've been using TinyMCE, a JavaScript in-browser XHTML editor that tidies input and enforces XHTML Strict. WP is also now using TinyMCE.

  7. January 7th, 2009 at 15:20 | #7

    I’ve used Tidy & TinyMCE. There are a bunch of other options available within http://www.Drupal.org too that are worth considering. For WYSIWYG editors, http://www.fckeditor.net/ is a good choice that is available for many systems. A neat option that is probably even better than a WYSIWYG editor is a WYSIWYM editor like http://drupal.org/project/wymeditor, which lets authors concentrate on a document’s structure and meaning while trying to give the user as much comfort as possible.

    Thanks again for the regex, it looks useful. Hopefully most departments are moving to CMS based solutions that will allow them to stay more future compatible.

  8. April 16th, 2009 at 19:21 | #8

    hi Jobe,

    Just noticed this article. Good stuff. We are delivering CLF 2.0 compliant survey software for Gov’t departments and it definitely was not an easy task for us… Once you get the hang of it, it’s not that bad at all though…

    Aydin.
