Monday, 22 September 2008

Atoms Moved

I've now successfully imported a 700-post blog into draft.blogger.com.


There were a few more quirks of blogger.com along the way. Having next and previous links in the Atom feed causes blogger to reject the entire import file, so I ended up putting them in as comments. There were also some old posts that didn't have an author (perhaps from some beta version of Roller), and blogger rejected those due to the empty name field.

Finally, I had an old comment from the blog's early days that was not UTF-8 encoded in the database (it doesn't appear correctly in Roller either), so I had to reencode that before blogger.com would accept it for import.
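
The re-encoding step can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for this blog: the function name is hypothetical, and the choice of Windows-1252 as the legacy encoding is an assumption, since the real encoding of the old comment isn't recorded here.

```python
def to_utf8(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Return `raw` as valid UTF-8 bytes.

    If the bytes already decode as UTF-8 they are returned unchanged.
    Otherwise they are decoded with `fallback` (an assumed legacy
    encoding) and re-encoded as UTF-8.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Bytes the fallback codec cannot map become U+FFFD.
        return raw.decode(fallback, errors="replace").encode("utf-8")
```

Running each comment body through something like this before export leaves text that is already UTF-8 untouched and only rewrites the stragglers.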


Here is the final export template for Apache Roller 4.0.



#set($pager = $model.getWeblogEntriesPager())
#set($map = $pager.getEntries())
<?xml version="1.0" encoding='utf-8'?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:thr="http://purl.org/syndication/thread/1.0">
<id>$model.weblog.id</id>
<title>$utils.escapeXML($model.weblog.name)</title>
<subtitle>$utils.escapeXML($model.weblog.description)</subtitle>
<updated>$utils.formatIso8601Date($model.weblog.lastModified)</updated>
#if ($pager.nextLink)
<!-- link rel="next" type="application/atom+xml" href="$pager.nextLink"/ -->
#end
#if ($pager.prevLink)
<!-- link rel="previous" type="application/atom+xml" href="$pager.prevLink"/ -->
#end
#foreach( $day in $map.keySet())
#set($entries = $map.get($day))
#foreach( $entry in $entries )
<entry>
<id>$entry.id</id>
<title>$utils.escapeXML($entry.title)</title>

<author>
#if ("$entry.creator.fullName" != "")
<name>$entry.creator.fullName</name>
#end
</author>

<published>$utils.formatIso8601Date($entry.pubTime)</published>
<updated>$utils.formatIso8601Date($entry.updateTime)</updated>

<content type="html"><![CDATA[$entry.text]]></content>
<category scheme="http://schemas.google.com/g/2005#kind"
term="http://schemas.google.com/blogger/2008/kind#post"/>
<link rel="self" type="application/atom+xml" href="http://example.com/$entry.id" />
</entry>
## use Atom threading extensions for comment annotation
#foreach( $comment in $entry.comments )
<entry>
<id>$comment.id</id>
<title>$utils.escapeXML($utils.truncate($utils.removeHTML($comment.content), 40, 50, "..."))</title>

<author>
#if ("$comment.name" != "")
<name>$utils.escapeXML($comment.name)</name>
#end
#if ("$comment.url" != "")
<uri>$utils.escapeXML($comment.url)</uri>
#end
#if ("$comment.email" != "")
<email>$comment.email</email>
#end
</author>
<published>$utils.formatIso8601Date($comment.postTime)</published>
<updated>$utils.formatIso8601Date($comment.postTime)</updated>
<content>$utils.escapeXML($utils.removeHTML($comment.content))</content>
<thr:in-reply-to ref="$entry.id" type="application/atom+xml" href="$entry.permalink"/>
<category scheme="http://schemas.google.com/g/2005#kind"
term="http://schemas.google.com/blogger/2008/kind#comment"/>
<link rel="self" type="application/atom+xml" href="http://example.com/$comment.id" />
</entry>
#end
#end
#end
</feed>
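
Because the next link ends up inside an XML comment, an external script can still pick it up and walk through the pages. The following is a minimal Python sketch under that assumption; `next_page_url` and `fetch_all_pages` are hypothetical helper names, and the `fetch` argument stands in for whatever actually downloads a page.

```python
import re
from typing import Callable, Iterator, Optional

# Matches the commented-out link emitted by the template above:
# <!-- link rel="next" type="application/atom+xml" href="..."/ -->
NEXT_LINK = re.compile(r'<!--\s*link\s+rel="next"[^>]*?href="([^"]*)"')


def next_page_url(feed_text: str) -> Optional[str]:
    """Return the href of the commented-out next link, or None."""
    match = NEXT_LINK.search(feed_text)
    return match.group(1) if match else None


def fetch_all_pages(start_url: str,
                    fetch: Callable[[str], str]) -> Iterator[str]:
    """Yield each page of the export, following next links until they run out."""
    seen = set()
    url: Optional[str] = start_url
    while url and url not in seen:  # guard against a loop of links
        seen.add(url)
        page = fetch(url)
        yield page
        url = next_page_url(page)
```

For a live blog, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read().decode("utf-8")`. Passing it in as a parameter keeps the page-walking logic testable without a network connection.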

Sunday, 14 September 2008

Moving Atoms

Having found an apartment in Malaysia, I'm now faced with the logistics of moving. One problem is that for the last 5 years I've run my own server, which now supports several blogs, a forum and my mobile phone's OTA backup. If I knew it was just for a week or two, I could take it offline temporarily. But from what I can tell, internet connectivity in Malaysia is probably not going to be reliable enough to keep all this online for a reasonable proportion of the time, and unless I pay serious money I'll be stuck with the hassle of a dynamic IP address. So I'm looking to migrate at least the blogs and forum off to other services.


I've just finished migrating the posts and comments from my first blog (there are still hard links back to my server that need fixing). It was no easy task: although draft.blogger.com supports Atom-based import, it is very particular about some things, with very little documentation and completely useless error handling. It just seems to stop processing as soon as it finds something it doesn't like. If nothing has been imported yet, you get a generic, meaningless error message; if one or more posts were successfully imported, it silently fails to import the posts from the point of failure onwards. So here is what I found.


In addition to needing to be a valid Atom 1.0 feed, each entry needs a unique self-referencing link: <link rel="self" type="application/atom+xml" href="post-id"/>. The href does not have to be real, just unique, so I used example.com in my export from Roller. The import process also does not accept empty tags for the post or comment author's email, name or uri (according to the RNC schema, only email cannot be empty).
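

Since the importer gives so little feedback, it can help to check the self link requirement on an exported file before uploading it. The following is a minimal Python sketch using only the standard library; `check_self_links` is a hypothetical name, and it assumes the export is already well-formed XML.

```python
import xml.etree.ElementTree as ET
from collections import Counter

ATOM = "{http://www.w3.org/2005/Atom}"


def check_self_links(feed_xml):
    """Return (ids of entries lacking a self link, hrefs used more than once)."""
    root = ET.fromstring(feed_xml)
    missing, hrefs = [], []
    for entry in root.iter(ATOM + "entry"):
        # Collect the href of every rel="self" link in this entry.
        links = [link.get("href")
                 for link in entry.findall(ATOM + "link")
                 if link.get("rel") == "self" and link.get("href")]
        if links:
            hrefs.extend(links)
        else:
            missing.append(entry.findtext(ATOM + "id"))
    duplicates = sorted(h for h, n in Counter(hrefs).items() if n > 1)
    return missing, duplicates
```

Both lists should come back empty for a file that satisfies the requirement described above. Whether blogger.com actually compares hrefs across posts and comments is not something I can confirm, so treat a reported duplicate as a warning rather than a guaranteed rejection.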


The following is the template I used to export posts and comments from Apache Roller 4.0. It is based on an earlier export template for JRoller by Damien Bonvillain that output Atom 0.3, with updates for Roller 4.0, Atom 1.0 and blogger.com's undocumented quirks. The exported content still needs some post-processing: removing or filling in empty author child tags, checking that the truncation of comment titles or malformed content has not broken anything, and replacing relative references (which shouldn't be there in the first place). Generally, though, it works for short blogs. There seems to be a hard-coded limit of 100 posts in getRecentWeblogEntries, so it needs rework to use a pager and an external script to fetch all the pages.


As with the original, paste the contents below into a new Roller template, then use the template to access your blog. If you have too many posts for blogger.com or the export script to handle, you could try using a pager in the export template, or break up the export file manually after extracting everything.




#set($entries = $model.weblog.getRecentWeblogEntries('', 100))
<?xml version="1.0" encoding='utf-8'?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:thr="http://purl.org/syndication/thread/1.0">
<id>$model.weblog.id</id>
<title>$utils.escapeXML($model.weblog.name)</title>
<subtitle>$utils.escapeXML($model.weblog.description)</subtitle>
<updated>$utils.formatIso8601Date($model.weblog.lastModified)</updated>

#foreach( $entry in $entries )
<entry>
<id>$entry.id</id>
<title>$utils.escapeXML($entry.title)</title>

<author>
<name>$entry.creator.fullName</name>
</author>

<published>$utils.formatIso8601Date($entry.pubTime)</published>
<updated>$utils.formatIso8601Date($entry.updateTime)</updated>

<content type="html"><![CDATA[$entry.text]]></content>
<category scheme="http://schemas.google.com/g/2005#kind"
term="http://schemas.google.com/blogger/2008/kind#post"/>
<link rel="self" type="application/atom+xml" href="http://example.com/$entry.id"/>
</entry>
## use Atom threading extensions for comment annotation
#foreach( $comment in $entry.comments )
<entry>
<id>$comment.id</id>
<title>$utils.escapeXML($utils.truncate($utils.removeHTML($comment.content), 40, 50, "..."))</title>

<author>
<name>$utils.escapeXML($comment.name)</name>
<uri>$utils.escapeXML($comment.url)</uri>
<email>$comment.email</email>
</author>
<published>$utils.formatIso8601Date($comment.postTime)</published>
<updated>$utils.formatIso8601Date($comment.postTime)</updated>
<content>$utils.escapeXML($utils.removeHTML($comment.content))</content>
<thr:in-reply-to ref="$entry.id" type="application/atom+xml" href="$entry.permalink"/>
<category scheme="http://schemas.google.com/g/2005#kind"
term="http://schemas.google.com/blogger/2008/kind#comment"/>
<link rel="self" type="application/atom+xml" href="http://example.com/$comment.id"/>
</entry>
#end
#end
</feed>
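
The first post-processing step mentioned above, removing or filling in empty author child tags, can be sketched in Python. This is an illustration rather than the script actually used: `clean_authors` and its `default_name` parameter are hypothetical, and it assumes the exported file parses as well-formed XML.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
THR_NS = "http://purl.org/syndication/thread/1.0"
ATOM = "{" + ATOM_NS + "}"


def clean_authors(feed_xml, default_name=None):
    """Drop empty <name>, <uri> and <email> children of every <author>.

    If `default_name` is given, authors left without a <name> get one
    filled in with that value.
    """
    # Keep the original prefixes when the tree is written back out.
    ET.register_namespace("", ATOM_NS)
    ET.register_namespace("thr", THR_NS)
    root = ET.fromstring(feed_xml)
    for author in root.iter(ATOM + "author"):
        for child in list(author):  # copy, since we remove while iterating
            if not (child.text or "").strip():
                author.remove(child)
        if default_name and author.find(ATOM + "name") is None:
            name = ET.Element(ATOM + "name")
            name.text = default_name
            author.insert(0, name)
    return ET.tostring(root, encoding="unicode")
```

One side effect to be aware of: ElementTree does not preserve CDATA sections, so post content comes back out as escaped text. That is equivalent XML, but I have not verified that the importer treats it identically.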

Tuesday, 12 August 2008

Flat hunting

Having just taken up a new job in Malaysia, I'm over here looking for somewhere to live before the family join me in late September. I've been here a week now, and am still getting a handle on what flats, houses and apartments are available and where all the facilities are that we will need.




Tuesday, 30 October 2007

Unicode and fonts


Kenichi Handa has been hard at work on a Unicode-based Emacs for some years now. For Windows users, there is nothing radical in the default build: a few more languages are supported, along with a wider range of Unicode characters, but the Windows-specific code has only been updated enough to keep working. In future, optimisations and simplifications can be made because the internal encodings of Emacs and Windows are both based on Unicode. Messing around with code pages to get fonts displaying will be a thing of the past, and already can be, thanks to the new font backend.



While work progressed in parallel on Emacs 22 and the Unicode codebase, there were several other developments happening outside the core Emacs development team. Multiple terminal support (multi-tty) has already been merged with the CVS trunk, though it doesn't mean much to Windows users. Limitations in the way Windows handles console output mean that it is never likely to provide much in the way of new features on Windows, though it may be possible to rid ourselves of the runemacs.exe hack without sacrificing console support using the multi-tty feature.


Another new development was support for X FreeType font rendering. On the face of it, this doesn't seem to mean much to Windows users either, but after being merged with the Unicode branch, it has been refactored into a new font backend design, with better support for Unicode fonts. Fonts are no longer defined by their character set; Emacs can make use of font metadata to determine which Unicode subranges each font supports. Currently the font backend is not enabled by default; it has to be enabled with a configure option. A backend has been implemented for Windows native fonts, and is ready for testing.



  • Checking out the Unicode branch:

    cvs -d:pserver:anonymous@cvs.sv.gnu.org:/sources/emacs co -r emacs-unicode-2 emacs

  • Building Emacs with font backend support:

    cd emacs\nt

    configure --enable-font-backend

    make bootstrap



Future work



The new font backend is noticeably slower on Windows. A lot of this is probably down to the fact that the old font code cached the font metrics for ASCII characters of fixed width fonts, whereas the new font backend does no such caching yet. We can probably do a better job of caching by calculating which ranges of characters the fixed width applies to, rather than just ASCII. We might even allow multiple such range/width combinations to be associated with a font, to speed up CJK text display (where characters in the font are one of two widths).
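

The range/width idea can be illustrated with a small data structure. The sketch below is in Python purely for readability (the Emacs font code itself is C), the class and method names are hypothetical, the pixel widths in the usage note are made up, and it assumes the ranges registered for a font never overlap.

```python
import bisect


class RangeWidthCache:
    """Maps inclusive code point ranges to a single fixed advance width."""

    def __init__(self):
        self._starts = []  # sorted range start code points
        self._ranges = []  # parallel list of (start, end, width)

    def add_range(self, start, end, width):
        """Record that every character in start..end has the same width."""
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._ranges.insert(i, (start, end, width))

    def width(self, char):
        """Return the cached width for `char`, or None on a cache miss."""
        cp = ord(char)
        # Find the last range starting at or before this code point.
        i = bisect.bisect_right(self._starts, cp) - 1
        if i >= 0:
            _start, end, width = self._ranges[i]
            if cp <= end:
                return width
        return None
```

A CJK font might register printable ASCII at one width and the unified ideographs at double that, for example `add_range(0x20, 0x7E, 8)` and `add_range(0x4E00, 0x9FFF, 16)`. A lookup that returns None would fall back to asking the font for its metrics.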


There is no support for BDF fonts in the new font backend. BDF fonts will be given their own font backend, hopefully mostly reusable on other platforms.


A Uniscribe font backend may be introduced to enable some of the more advanced layout features in Windows XP and later. The new font backend design makes it easier to add new support like this without complex dynamic loading logic to support older versions of Windows.


topic:[emacs]

Thursday, 29 March 2007

Izakaya ryori (pub food) - 居酒屋料理

Tomoro: Coaster

An izakaya is a typical drinking establishment in Japan, though they have always had more emphasis on food than the typical English or New Zealand pub. Like English pubs, and New Zealand cafes, some izakaya have recently started to modernize their menus, combining different styles and bringing foreign influence to traditional Japanese favorites to create new "modern Japanese" dishes. This is especially noticeable in competitive areas like Ginza, where izakaya have to differentiate themselves from the hundreds of other eating and drinking establishments in the area to attract customers.


Lobster Salad


Unlike Western pubs, much of an izakaya is given over to private areas where you reserve a table, so it is more like what we would see as a restaurant, although many after-work groups use it as we would a pub. Some traditional izakaya have seats at the bar, where you interact with and are served by the owner, but more upmarket ones are strictly table service, with waiters and waitresses rushing about in response to bells at each table, making you imagine you are on an aeroplane.


Oden

Despite being in notoriously expensive Ginza at the heart of downtown Tokyo, a variety of dishes and several hours' drinking at Tomoru came to approximately ¥6000 per head. As well as the crayfish salad, branded omelette and meaty morsels pictured, we also had several other mouth-watering dishes washed down with wine and beer.


Okonomiyaki (savoury pancake) - お好み焼き

Okonomiyaki is native to the Osaka and Hiroshima areas of Japan. It is a pancake made with potato flour, containing cabbage and assorted other vegetables and meats, and in the Hiroshima variation, noodles. Once cooked, the pancake is topped with brown sauce, mayonnaise and bonito flakes or tiny flakes of nori (seaweed), which appear to dance as the heat rises around them. If you are a connoisseur of Korean pa-jeon and Vietnamese banh xeo, then okonomiyaki is a must-try dish, along with takoyaki, small dough balls containing octopus that are often sold from carts on the street.


Okonomiyaki


This Okonomiyaki Special was ¥900 from a small store near Hiroshima station. We had another in a quick lunch stop at Shin-Osaka for ¥650.


Hiroshima Okonomiyaki Restaurant


You can easily make okonomiyaki at home following this recipe or many like it. Often at restaurants, you will be given a bowl of raw mix, and cook it yourself on a hotplate at the table. At other restaurants, you might sit at a counter while the chef cooks in front of you, as is common in many types of Japanese restaurants, and probably the closest you'll come to "teppanyaki" style cooking in Japan.


Sunday, 18 March 2007

Adding some zest to Picasa's HTML export.


Google's Picasa is great for managing your photos. It even has an Export as HTML Page option, which lets you generate a webpage for your photos. But all of the page styles it generates are very simple - if all your photos aren't the same size and orientation, the result is messy.



There are some good web page designs around for photos. Lightbox 2.0 is one, but you have to code all the HTML pages by hand, which is tedious and error-prone.



Picasa lets you generate ugly web pages easily, and Lightbox 2.0 lets you generate good looking web pages with a lot of effort. Wouldn't it be great if there was a way to generate good looking web pages as easily as you can with Picasa?



When exporting from Picasa, you have the option to export the page as XML Code. This isn't very useful by itself, but with an appropriate stylesheet, you can easily transform it to anything you want. I wrote a stylesheet for converting to a Lightbox 2.0 style blog entry.



picasa-to-lightbox.xsl



Because I use this for blog entries, not complete webpages, some stylesheet definitions are missing from the generated HTML, along with the rest of the HTML head section. See the Lightbox 2.0 webpage for details on what is needed.



To convert Picasa's generated index.xml using Xalan, I use the following command line:



java -cp xalan2.jar org.apache.xalan.xslt.Process -IN index.xml -XSL picasa-to-lightbox.xsl -OUT blogpost.html



Leave a comment if you have any more Picasa tips.