It's just zipped XML!

If you found this page, you have probably noticed that ODF (Open Document Format) is nothing more than a zip archive of some XML files. You would probably also like to version-control that data as something more than a series of binary snapshots.

Better than Binary

Of course, this inherently transparent and hackable just-zipped-XML format is a great thing if you want to do something cool and interesting with it, such as getting it into your Subversion repository in a way that stores each version as something other than an opaque and nearly impenetrable blob. If that statement seems a bit rash, quick: tell me the difference between versions 12 and 50 of whatever document you have handy.

It's just XML, right?

If only it were really that simple. The XML has no linebreaks, so each file is one enormous line and a line-based diff tells you nothing. Without a special tool, we are stuck in some kind of workflow like:

  unzip -d .file.dir file.sxw
  for i in content.xml meta.xml settings.xml styles.xml
    do xml_linebreak .file.dir/$i
  done
  svn_adder .file.dir
  svn ci .file.dir -m ".file.dir - unpack of file.sxw"

You might notice that there are two mythical tools involved in those commands (or at least, they currently live only in my Subversion repository). So, even with a couple of mythical tools, this is still a royal pain, and it doesn't answer the question of how you deal with updates coming across the wire from someone else's change to the same file.
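The linebreaking step itself does not need much code. Below is a minimal Python sketch of the idea. It is not the actual xml_linebreak tool, and the function name add_linebreaks is made up for illustration. It naively puts a newline between adjacent tags, which assumes that whitespace between a closing ">" and the next "<" is harmless in these files; that may not hold for every document.

```python
def add_linebreaks(xml_text):
    """Put a newline between adjacent tags so that diffs work line by line.

    Assumption: extra whitespace between '>' and the next '<' does not
    change the meaning of the document.
    """
    return xml_text.replace("><", ">\n<")


sample = '<?xml version="1.0"?><doc><p>Hello</p><p>World</p></doc>'
print(add_linebreaks(sample))
```

Running it twice gives the same result as running it once, so refreshing an already-unpacked directory should not pile up blank lines.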

Designing the Workflow

In the interest of answering these questions, I've taken a crack at what I'm calling oovc.

For starters, we have to initialize the process:

  oovc init Foo.sxw

This will create the directory .Foo.sxw/, unpack the file into that directory, add linebreaks to the XML, note the CRCs from the zip archive, set an svn:ignore property on the current directory with the value "Foo.sxw", and svn add the directory. I think that's the right set of steps to take, though I still question whether ignoring the binary file is appropriate.
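To make those steps concrete, here is a rough Python sketch of what an init could look like. It is an illustration under assumptions, not oovc's actual code: the function names, the ".crcs" file beside the directory, and the one-line-per-member format are all hypothetical. It also assumes it is run from the directory that holds Foo.sxw.

```python
import os
import subprocess
import zipfile


def unpack_with_crcs(archive, workdir):
    """Unpack archive into workdir, linebreak the XML, return {name: crc}."""
    crcs = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            crcs[info.filename] = info.CRC  # CRC-32 as stored in the zip
        zf.extractall(workdir)
    for name in crcs:
        if name.endswith(".xml"):
            path = os.path.join(workdir, name)
            with open(path, encoding="utf-8") as fh:
                text = fh.read()
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(text.replace("><", ">\n<"))  # naive linebreaking
    return crcs


def init(archive):
    workdir = "." + archive  # Foo.sxw -> .Foo.sxw
    crcs = unpack_with_crcs(archive, workdir)
    # Hypothetical storage: one "crc name" line per member, kept beside
    # the directory rather than inside it.
    with open(workdir + ".crcs", "w") as fh:
        for name, crc in sorted(crcs.items()):
            fh.write("%08x %s\n" % (crc, name))
    # Note: propset replaces any svn:ignore value already on the directory.
    subprocess.check_call(["svn", "propset", "svn:ignore", archive, "."])
    subprocess.check_call(["svn", "add", workdir])
```

Reading the CRCs from the zip directory costs almost nothing, since the archive already stores one per member.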

As far as what else happens, I'm still working on the details, but you'll probably want something like:

  oovc ci Foo.sxw -m "message goes here"

This should unpack the file and check in the directory, probably emulating the svn command's commit-message editor behavior (or just passing off control to svn).
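The "passing off control" option is the cheap one, because svn opens its own editor whenever a commit has no -m (or -F) argument. A small Python sketch of that choice follows; commit_command and checkin are hypothetical names, and the unpack step is left out.

```python
import subprocess


def commit_command(archive, message=None):
    """Build the svn command that checks in the unpacked directory."""
    cmd = ["svn", "ci", "." + archive]  # Foo.sxw is tracked as .Foo.sxw/
    if message is not None:
        cmd += ["-m", message]
    # With no -m, svn launches its own commit-message editor.
    return cmd


def checkin(archive, message=None):
    # Refreshing the unpacked directory would happen here first (not shown).
    subprocess.check_call(commit_command(archive, message))
```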

And then this is the really tricky part:

  oovc up Foo.sxw

What happens if Joe just did a check-in and you have local changes? Recall that this is the entire reason that the manual approach is not workable. Note the bit above where I mentioned that I grab the CRCs out of the archive as part of the unpack. Well, that's part of the answer to how this will work.
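One guess at how the CRCs could help: compare the values noted at the last unpack with the ones in the file on disk now, and you know whether the binary has been edited locally without unpacking anything. The Python sketch below shows only that comparison. The function names are hypothetical, and how the result would feed into the update is still an open design question.

```python
import zipfile


def archive_crcs(archive):
    """Return {member name: CRC-32} as recorded in the zip directory."""
    with zipfile.ZipFile(archive) as zf:
        return {info.filename: info.CRC for info in zf.infolist()}


def changed_members(recorded, archive):
    """List members that were added, removed, or changed since 'recorded'."""
    current = archive_crcs(archive)
    names = set(recorded) | set(current)
    return sorted(n for n in names if recorded.get(n) != current.get(n))
```

If the list comes back empty, the binary has not changed since the last unpack, so updating the directory and repacking cannot lose local work. If not, the local changes presumably need to be unpacked first so that svn can merge them as text.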

Questions and RFC

I could really use some feedback on how everybody wants their eggs scrambled here, so ping me if you have some thoughts on this.

About that svn:ignore thing

Do you want the binary checked in alongside the unpacked directory? This is certainly feasible, but the advantage of ignoring it and having the directory hidden (note the leading ".") is that it encourages you to use the oovc tool instead of the svn tool to manage the directory. Think about all of the bad things that could happen if we don't ignore the binary file and it gets checked in a few times without the directory getting refreshed and sent too.

On the other hand, if someone without oovc installed checks out your directory, they can't open the file without packing the directory, etc. However, I'm not inclined to write a lot of extra code into my freely available and redistributable tool just to accommodate people who didn't install it. Maybe that's just today's (Fri Feb 10, 2006) state of mind... Convince me otherwise if you think it needs to be so.
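For what it's worth, packing by hand is not a lot of work either. Here is a minimal Python sketch, with a hypothetical pack function, that zips a directory back up. It follows the ODF package convention of writing the mimetype entry first and uncompressed, skips svn's .svn metadata, and assumes the application does not mind the linebreaks that were added to the XML.

```python
import os
import zipfile


def pack(workdir, archive):
    """Zip workdir back into archive, mimetype first and uncompressed."""
    names = []
    for root, dirs, files in os.walk(workdir):
        dirs[:] = [d for d in dirs if d != ".svn"]  # skip svn metadata
        for f in files:
            full = os.path.join(root, f)
            rel = os.path.relpath(full, workdir)
            names.append(rel.replace(os.sep, "/"))  # zip uses forward slashes
    names.sort(key=lambda n: (n != "mimetype", n))  # mimetype sorts first
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            if name == "mimetype":
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            zf.write(os.path.join(workdir, name), name, compress_type=method)
```

Something like pack(".Foo.sxw", "Foo.sxw") would then rebuild the binary from a fresh checkout.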


All material Copyright © 2005-2009 Scratch Computing.