I write this blog so that I can look back in years to come and see what I spent time thinking about. Computers have this nasty habit of sucking up time without you noticing. Recently I’ve been doing relatively low-brow stuff. But, for my own benefit, I want to record it so that I don’t redo this again from scratch next year.
I’ve recently been considering hosting my own website. Until now, I’ve paid a hosting company to do it for me, but a few things made me reconsider this. Firstly, I’ve been hosting my own email for a while now, and it has been more reliable than any commercial email provider. I can set up more sophisticated spam filtering this way, and diagnose faults without having to spend hours discussing it with a support department. The server runs on a low-power fan-less mini-ITX board in my hall cupboard, and the only downtime so far has been due to me tripping off the power to the entire flat once or twice (even then it auto-reboots … I could plug it into a trickle-charged 12V motorbike battery and have a UPS).
So, hosting my own website would give me much more flexibility. I get hit by an awful lot of blog spam (to the extent where I’ve switched off comments for now). Hosting locally would give me direct access to the database which underpins my blog, which would make it easier to tidy up things. Also, I’d like to have direct access to the webserver logs, which is something my current provider doesn’t give. I’ve got a reasonably fast internet connection to my home which is idle most of the day, and so it seems a bit daft to pay data-transfer costs to a commercial web-hosting company when I’m already paying for unused data-transfer to my home. Finally, I already have a “server” in my cupboard and it could easily take the (light) load of running my website too.
I looked into running the webserver on a User-mode Linux (UML) machine. It’s effectively like a Linux-specific VMware. There were two reasons for this. Firstly, running the webserver on its own machine increases security a bit. If someone used an exploit against the webserver and gained root, I certainly wouldn’t want them to then have access to my email or whatever other services are running on that machine. That’s why I have a separate server in the first place. UML makes it easy to have a separate machine for each service you wish to expose, effectively sandboxing them, without buying more hardware. Secondly, running as a UML instance makes backup really easy. UML is really easy to run. You have an executable called “linux-2.6.9” and a second file which is the image for the root disk. You run the executable, and you see a new copy of Linux booting within your existing one, mounting the root disk image and leaving you at a login prompt. It doesn’t require you to tweak your existing kernel at all – brilliant. So, to back up that virtual machine you tell it to briefly pause (or shut down), take a copy of the kernel file and root disk file, and you’re done. My root disk for a Debian 3.0 system running Apache, MySQL and PHP compressed down to about 90MB. I chose Debian because on a server, unlike on a developer machine where I choose Gentoo, I have no need for bleeding-edge software or libraries.
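The “copy two files and you’re done” backup can be sketched in a few lines of Python. This is only an illustration: the function name `backup_uml` and the file names are hypothetical, and it assumes the virtual machine has already been paused or shut down so the disk image isn’t changing underneath the copy.

```python
import tarfile
from pathlib import Path


def backup_uml(kernel: Path, root_disk: Path, archive: Path) -> Path:
    """Bundle the UML kernel executable and root disk image into one .tar.gz.

    Assumes the virtual machine is paused or shut down, so the disk image
    is not being written to while it is read.
    """
    with tarfile.open(archive, "w:gz") as tar:
        for path in (kernel, root_disk):
            # Store each file under its bare name so the archive unpacks flat.
            tar.add(path, arcname=path.name)
    return archive
```

Restoring is the reverse: unpack the archive and run the kernel executable against the root disk image again.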
Setting up Apache was easy, even though it’s been years since I last did this. Since I already needed a MySQL database for my blog, I added mod_log_sql to put all the access logs into a MySQL database. This was really overkill. I could see the module being very useful if you had a complicated multiple-VirtualHosts setup. But I was just doing it because I could .. and because I don’t really like Webalizer much. I like the idea of being able to phrase arbitrary queries and do some data-mining. Plus, it gave me a chance to refresh my SQL knowledge from University.
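To give a flavour of what “arbitrary queries” over access logs look like, here is a small sketch. It uses Python’s built-in SQLite as a stand-in for MySQL, and the `access_log` table, its columns and the sample rows are all made up for illustration; the real mod_log_sql schema may well differ.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL log database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access_log ("
    "remote_host TEXT, request_uri TEXT, status INTEGER, bytes_sent INTEGER)"
)

# Invented sample rows (hypothetical hosts and URLs).
rows = [
    ("10.0.0.1", "/index.php", 200, 5120),
    ("10.0.0.2", "/index.php", 200, 5120),
    ("10.0.0.1", "/feed", 200, 2048),
    ("10.0.0.3", "/wp-comments-post.php", 403, 512),
    ("10.0.0.3", "/wp-comments-post.php", 403, 512),
    ("10.0.0.3", "/wp-comments-post.php", 403, 512),
]
conn.executemany("INSERT INTO access_log VALUES (?, ?, ?, ?)", rows)


def top_pages(conn, limit=3):
    """Return (uri, hits, total_bytes) for the most requested pages."""
    return conn.execute(
        "SELECT request_uri, COUNT(*) AS hits, SUM(bytes_sent) AS total_bytes "
        "FROM access_log GROUP BY request_uri "
        "ORDER BY hits DESC, request_uri LIMIT ?",
        (limit,),
    ).fetchall()


def rejected_hosts(conn):
    """Return (host, count) for hosts whose requests were refused with 403."""
    return conn.execute(
        "SELECT remote_host, COUNT(*) FROM access_log "
        "WHERE status = 403 GROUP BY remote_host"
    ).fetchall()
```

The appeal is that a new question (say, “which hosts keep hammering the comment form?”) is one more `SELECT`, not a new report generator.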
There’s something very cute about the way you back up MySQL databases. Most applications, such as word processors, persist their data by writing a snapshot of their current state to disk. MySQL writes out a sequence of commands which, when played back, will rebuild the database. So the start of the dump file will be a “CREATE TABLE …” followed by a series of “INSERT INTO …” lines. This is quite elegant. Why invent an entirely new serialization format when you already have a language which is expressive enough to do everything you need?
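The same idea is easy to see in miniature. The sketch below uses Python’s built-in SQLite rather than MySQL (so it is an analogy, not what mysqldump itself does): `Connection.iterdump()` yields the SQL statements that rebuild the database, and replaying them into an empty database restores the data. The `posts` table is invented for the example.

```python
import sqlite3

# Build a tiny source database.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
src.executemany(
    "INSERT INTO posts (title) VALUES (?)",
    [("Hello",), ("Hosting at home",)],
)
src.commit()

# The "backup" is just text: a CREATE TABLE followed by INSERT statements.
dump = "\n".join(src.iterdump())

# Restoring means replaying that text against an empty database.
dst = sqlite3.connect(":memory:")
dst.executescript(dump)

restored = dst.execute("SELECT id, title FROM posts ORDER BY id").fetchall()
```

Printing `dump` shows the statements themselves, which is about as human-readable as a backup format gets.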
Although I don’t deal with databases in my day-job, it’s quite an interesting field in some ways. It’s well accepted that separating data-storage from the rest of your application logic is a wise plan. But SQL-backed applications have a further advantage that, say, an XML-backed application doesn’t have. By making such a clean separation in your application, you can leave the whole data-storage problem to someone else. There are lots of really clever people who’ve figured out the best way to store and query big relational datasets – laying them out, and moving them between disk/main-memory/cache-memory in a pretty optimal way. As long as you can fit your data into the right shape, you can then magically take advantage of decades of cleverness. That’s a pretty impressive level of reuse.
On to the last part of the Linux/Apache/MySQL/PHP cluster: PHP. I spent some time looking through the source code for WordPress, my blog software. Blog software ought to be pretty simple. It’s just a glue layer which sucks data out of a database, munges it into HTML and sends it to a browser. But to my eyes, WordPress (and probably most PHP apps) is pretty dire. The code is pretty state-happy, with lots of imperative updating which wouldn’t be needed in a language with better building blocks. It’s a domain where people who think Perl is a fine language (and I mean that in a derogatory way) would be happy. But would I want these people to be writing secure e-commerce sites in this way?! I don’t want to think about that (because I know it’s true). I wasn’t impressed.
So, despite the fact that today I’m writing about setting up webservers, this brings me back to Philip Wadler’s Links project. The aim of this project is to take the Good Stuff from the world of research, and apply it to make a Better Way to produce web applications. When I started working with XML, I thought “Great, we have schemas which define the structure of the data .. that means we can integrate that with our language’s static type system”. Hah, no such luck in the Real World … but projects like CDuce are showing the way. Similarly, if you write a web application you need to juggle with the inside-out-ness of the programming model: you can’t just call a function to get some user input, because your code is being driven at the top level by the HTTP request/response model and you always need to return up to the top level. Continuations offer a possible solution to this, as a richer means of controlling the flow of a program, as Avi Bryant’s Seaside framework demonstrates. Today, if you are writing a web application you need to worry constantly about what happens if the user hits the “back” button, or reloads a page, or clicks “submit” twice when they’re making a payment. Perhaps in the future, with better building blocks, these things will come “for free”, and we can wave a fond farewell to a whole class of common web-app bugs.
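To make the “inside-out-ness” concrete, here is a rough sketch of how a suspended computation lets a multi-page interaction be written as straight-line code. It uses a Python generator as a poor man’s continuation; this is not how Seaside or Links are implemented, and `checkout_flow` and `Server` are names made up for the example. Note the big limitation: a generator can only be resumed once from each point, so unlike a real continuation this does nothing for the “back” button.

```python
def checkout_flow():
    """A multi-page interaction written top to bottom.

    Each `yield` sends a page to the user and suspends the function; the
    value sent back in is the user's reply from the next request.
    """
    name = yield "What is your name?"
    item = yield f"Hello {name}, what would you like to buy?"
    confirm = yield f"Buy {item}? (yes/no)"
    if confirm == "yes":
        return f"Thanks {name}, your order for {item} is placed."
    return "Order cancelled."


class Server:
    """Toy request handler that keeps one suspended flow per session id."""

    def __init__(self):
        self.sessions = {}

    def handle(self, session_id, reply=None):
        flow = self.sessions.get(session_id)
        try:
            if flow is None:
                # First request for this session: start the flow and run it
                # up to its first page.
                flow = checkout_flow()
                self.sessions[session_id] = flow
                return next(flow)
            # Later requests: resume where the flow left off, handing it
            # the user's reply.
            return flow.send(reply)
        except StopIteration as done:
            # The flow returned its final page; forget the session.
            del self.sessions[session_id]
            return done.value
```

Each call to `handle` plays the role of one HTTP request. The handler still returns to the top level every time, but the application logic in `checkout_flow` no longer has to be chopped into one callback per page.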
Web-based applications have lots of advantages (and disadvantages too). I personally really like the “your customers are always using the latest version of the software” aspect. But a lot of today’s web technologies are rooted too much in a perl-hacker mindset. It may be that this is indeed a rewarding place to apply newer programming technologies. I still think the world will not be ready for the Links project for many years to come, but perhaps it will pave the way.
Oh, back to the original story. Having installed everything and got it all working, I flipped my DNS record so that www.nobugs.org went to my home box. But the next morning, I flipped it back. Why? At the end of the day, paying someone about 30UKP a year to host my site is pretty good value. I don’t really want to be worrying about my website’s response time every time I’m downloading big files over the same link. And if my website ever gets cracked, I’d still rather it was on someone else’s LAN and not mine. Although it might seem like a waste of time to spend hours setting all this up and not use it, I know that I’ve learned lots of useful information and lessons. C’est la vie.