Edinburgh Software Companies

I finally made my oft-promised list of Edinburgh software companies. The consensus so far is “wow, Edinburgh is smaller than I thought”. Exhaustively enumerating the companies has rather reduced the scope for “gosh, maybe there’s a really cool company I’ve not heard about”. Still, I’m almost certainly missing a few companies at the moment … feedback welcome. Please tell people about this resource – it’ll help make the list exhaustive.

Predictably, soon there’ll be a (free) map of Edinburgh showing you where everyone is located. Unfortunately, the “where is each postcode” information is copyrighted by the Royal Mail. Sigh.

Maps Revisited

I finally figured out the shearing problem with my map of Edinburgh. The GPS receiver outputs longitude/latitude coordinates, and the viewing software was defaulting to a plain lon/lat projection to display them onscreen. However, if you buy a streetmap of Edinburgh it’ll probably use the “British Grid” projection, which is quite different. There are just so many ways to project the surface of a sphere/ellipsoid onto a 2d screen. Using the raw lon/lat coordinates as your x/y pixel positions isn’t very helpful. While lines of longitude (the ones which run N/S) are all the same length, lines of latitude get shorter as you head further north. So by the time you get to Edinburgh (roughly 56 degrees north), the circle of latitude is cos(56 degrees), or roughly half, the length of the equator (thank you Frink). Near Edinburgh, 1/1000th of a degree of latitude corresponds to about 111m north/south, but 1/1000th of a degree of longitude corresponds to a mere 62m east/west. Hence my first attempt at mapping was stretched horizontally.
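(For the curious, here’s a back-of-the-envelope Python sketch of that scale factor. It assumes a spherical earth, which is plenty good enough for a sanity check.)

```python
import math

def metres_per_thousandth_degree(latitude_deg):
    """Rough distance covered by 1/1000th of a degree at a given latitude,
    assuming a spherical earth of radius ~6371km (fine for a sanity check)."""
    earth_radius_m = 6371000.0
    per_degree_lat = math.pi * earth_radius_m / 180.0                       # ~111km per degree, everywhere
    per_degree_lon = per_degree_lat * math.cos(math.radians(latitude_deg))  # shrinks towards the poles
    return per_degree_lat / 1000.0, per_degree_lon / 1000.0

north_south, east_west = metres_per_thousandth_degree(56.0)       # Edinburgh, roughly
print("1/1000 deg latitude  ~ %.0fm north/south" % north_south)   # ~111m
print("1/1000 deg longitude ~ %.0fm east/west" % east_west)       # ~62m
```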

So, just switch the viewing projection to Transverse Mercator, which has the desired “up and across both use the same scale” property, and suddenly the map looks comfortingly like every other map of Edinburgh I’ve ever seen.

However, this necessitated a change of software. JUMP doesn’t support alternative viewing projections, and neither does QGIS. So for now I’m using an evaluation copy of GlobalMapper.
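Another option would have been to reproject the data itself rather than just the view. Here’s a minimal sketch using the pyproj library – my choice for illustration, not something GlobalMapper needs – converting WGS84 lon/lat into the British National Grid, which is itself a Transverse Mercator projection:

```python
from pyproj import Transformer

# WGS84 lon/lat (EPSG:4326) -> British National Grid (EPSG:27700),
# a Transverse Mercator projection: up and across use the same scale.
to_bng = Transformer.from_crs("EPSG:4326", "EPSG:27700", always_xy=True)

lon, lat = -3.1883, 55.9533            # roughly the centre of Edinburgh
easting, northing = to_bng.transform(lon, lat)
print("easting=%.0fm, northing=%.0fm" % (easting, northing))
```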

Now to add some depth. A few years ago, the space shuttle flew around a bit measuring the height of the earth’s surface at a resolution of about 100m. And the Americans, being much cooler than the UK government, make all the data freely available. So you can just download a large GeoTIFF file from their website for your neighbourhood (Scotland, in my case). A GeoTIFF file is just a 16-bit greyscale image with some metadata to tell you what part of the earth it corresponds to, its projection and stuff like that.
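Reading the heights back out is straightforward too. Here’s a rough sketch using the rasterio library (my choice of reader – any GeoTIFF library would do); the filename is made up, and the coordinates are only approximately Arthur’s Seat:

```python
import rasterio

# Hypothetical filename; a real SRTM tile for Scotland will be named differently.
with rasterio.open("srtm_scotland.tif") as dem:
    heights = dem.read(1)                      # the single 16-bit band of elevations, in metres
    print(dem.crs, dem.bounds, heights.shape)  # the metadata that makes it a *Geo*TIFF

    # Sample the elevation under (roughly) Arthur's Seat. Coordinates must be in the
    # file's own CRS, which for SRTM is plain lon/lat.
    lon, lat = -3.1620, 55.9441
    row, col = dem.index(lon, lat)
    print("height:", heights[row, col], "m")
```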

The results can be seen in the following images: Edinburgh, where Arthur’s Seat, Calton Hill and the Castle Rock can be easily seen; Scotland, showing all the nearby hills; and a profile of the slope as you head up Leith Walk (cyclists will recognise the nasty steep bit at the end).

The height data isn’t terribly precise. For a start, the shuttle will have measured the height of the tallest buildings rather than the ground level! However, in practical terms it’s accurate enough to feed into a “cyclists routefinder” application, whereby you could request “find me a route from A to B avoiding steep hills”. Very important use.
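In case it isn’t obvious how the routefinder would use the heights: the sketch below bolts a made-up “climbing penalty” onto plain Dijkstra, so that a flatter-but-longer route can beat the direct way up the hill. The road fragment and all the numbers are entirely invented.

```python
import heapq

def hill_avoiding_route(graph, start, goal, hill_penalty=10.0):
    """Dijkstra over a road graph where each edge is (neighbour, length_m, climb_m).
    Climbing is made artificially expensive, so flatter routes can beat shorter ones.
    The penalty factor is made up -- a real routefinder would want to tune it."""
    queue = [(0.0, start, [start])]
    best = {start: 0.0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        for neighbour, length_m, climb_m in graph.get(node, []):
            new_cost = cost + length_m + hill_penalty * max(climb_m, 0.0)
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour, path + [neighbour]))
    return None

# Entirely invented road fragment: the direct climb vs a longer but flatter detour.
roads = {
    "foot_of_the_walk": [("top_of_the_walk", 1500, 70), ("easter_road", 900, 5)],
    "easter_road":      [("top_of_the_walk", 1100, 5)],
    "top_of_the_walk":  [],
}
print(hill_avoiding_route(roads, "foot_of_the_walk", "top_of_the_walk"))
```

With these made-up numbers the detour wins (cost 2100 vs 2200 for the direct climb), which is exactly the behaviour a “avoid steep hills” option would want.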

I’m now working on producing semantic data about the roads. For example, “Princes St joins the Mound at coordinates x,y but cars cannot turn eastwards onto Princes St”. I’m also inquiring about the price of aerial photographs and satellite data, since that’d be a quick and easy way of making a map (although less fun than the GPS route).
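I haven’t settled on a representation yet, but something along these lines is what I have in mind – junctions, road segments, and turn restrictions stored as banned (junction, incoming, outgoing) triples. All the identifiers and coordinates below are invented:

```python
# Purely illustrative -- the ids, names and coordinates are all invented.
junctions = {
    "mound/princes_st": {"lon": -3.1953, "lat": 55.9517},
}

segments = {
    "the_mound":       {"name": "The Mound", "oneway": False},
    "princes_st_east": {"name": "Princes St (heading east)", "oneway": False},
}

# "cars cannot turn eastwards onto Princes St" becomes a banned
# (junction, incoming segment, outgoing segment) triple.
banned_turns = {
    ("mound/princes_st", "the_mound", "princes_st_east"),
}

def turn_allowed(junction, incoming, outgoing):
    return (junction, incoming, outgoing) not in banned_turns
```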

Oh, it turns out there’s going to be an article about free-as-in-speech map making in New Scientist in a couple of days’ time.

Apache + MySQL + PHP + not

I write this blog so that I can look back in years to come and see what I spent time thinking about. Computers have this nasty habit of sucking up time without you noticing. Recently I’ve been doing relatively low-brow stuff. But, for my own benefit, I want to record it so that I don’t redo this again from scratch next year.

I’ve recently been considering hosting my own website. Until now, I’ve paid a hosting company to do it for me, but a few things made me reconsider this. Firstly, I’ve been hosting my own email for a while now, and it has been more reliable than any commercial email provider. I can set up more sophisticated spam filtering this way, and diagnose faults without having to spend hours discussing it with a support department. The server runs on a low-power fanless mini-ITX board in my hall cupboard, and the only downtime so far has been due to me tripping off the power to the entire flat once or twice (even then it auto-reboots … I could plug it into a trickle-charged 12v motorbike battery and have a UPS).

So, hosting my own website would give me much more flexibility. I get hit by an awful lot of blog spam (to the extent that I’ve switched off comments for now). Hosting locally would give me direct access to the database which underpins my blog, which would make it easier to tidy things up. Also, I’d like to have direct access to the webserver logs, which is something my current provider doesn’t give me. I’ve got a reasonably fast internet connection to my home which is idle most of the day, so it seems a bit daft to pay data-transfer costs to a commercial web-hosting company when I’m already paying for unused data-transfer to my home. Finally, I already have a “server” in my cupboard and it could easily take the (light) load of running my website too.

I looked into running the webserver on a User-Mode Linux (UML) machine. It’s effectively like a Linux-specific VMware. There were two reasons for this. Firstly, running the webserver on its own machine increases security a bit. If someone used an exploit against the webserver and gained root, I certainly wouldn’t want them to then have access to my email or whatever other services are running on that machine. That’s why I have a separate server in the first place. UML makes it easy to have a separate machine for each service you wish to expose, effectively sandboxing them, without buying more hardware. Secondly, running as a UML instance makes backup really easy. UML is really easy to run. You have an executable called “linux-2.6.9” and a second file which is the image for the root disk. You run the executable, and you see a new copy of Linux booting within your existing one, mounting the root disk image and leaving you at a login prompt. It doesn’t require you to tweak your existing kernel at all – brilliant. So, to back up that virtual machine you tell it to briefly pause (or shut down), take a copy of the kernel file and root disk file, and you’re done. My root disk for a Debian 3.0 system running Apache, MySQL and PHP compressed down to about 90MB. I chose Debian because on a server, unlike on a developer machine where I choose Gentoo, I have no need for bleeding-edge software or libraries.
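The backup itself could be a few lines of script. Here’s roughly what I mean, as a Python sketch – it assumes the UML instance was started with a umid (here “webserver”, which is made up) and that uml_mconsole’s stop/go commands pause and resume it, which is my understanding of the management console:

```python
import shutil
import subprocess

UMID = "webserver"                  # hypothetical umid the UML instance was started with
FILES = ["linux-2.6.9", "root_fs"]  # the kernel binary and the root disk image

def backup_uml():
    # Pause the virtual machine so the root filesystem image is consistent...
    subprocess.run(["uml_mconsole", UMID, "stop"], check=True)
    try:
        for f in FILES:
            shutil.copy2(f, f + ".backup")
    finally:
        # ...and let it carry on, even if the copy failed.
        subprocess.run(["uml_mconsole", UMID, "go"], check=True)

if __name__ == "__main__":
    backup_uml()
```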

Setting up Apache was easy, even though it’s been years since I last did this. Since I already needed a MySQL database for my blog, I added mod_log_sql to put all the access logs into a MySQL database. This was really overkill. I could see the module being very useful if you had a complicated multiple-VirtualHosts setup, but I was just doing it because I could … and because I don’t really like Webalizer much. I like the idea of being able to phrase arbitrary queries and do some data-mining. Plus, it gave me a chance to refresh my SQL knowledge from University.
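For example, here’s roughly the kind of query I had in mind – a “top broken links” report that Webalizer doesn’t give you. The table and column names are my understanding of mod_log_sql’s defaults, so check your own config before trusting them, and the credentials are obviously made up:

```python
import MySQLdb  # the classic Python MySQL bindings; any SQL client would do

# Hypothetical credentials; table/column names assume mod_log_sql's default schema.
conn = MySQLdb.connect(host="localhost", user="logreader", passwd="secret", db="apachelogs")
cur = conn.cursor()
cur.execute("""
    SELECT request_uri, COUNT(*) AS hits
    FROM access_log
    WHERE status = 404
    GROUP BY request_uri
    ORDER BY hits DESC
    LIMIT 20
""")
for uri, hits in cur.fetchall():
    print(hits, uri)
```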

There’s something very cute about the way you back up MySQL databases. Most applications, such as word processors, persist their data by writing a snapshot of their current state to disk. A MySQL dump is instead a sequence of commands which, when played back, will rebuild the database. So the start of the dump file will be a “CREATE TABLE …” followed by a series of “INSERT INTO …” lines. This is quite elegant. Why invent an entirely new serialization format when you already have a language which is expressive enough to do everything you need?
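Driving the dump and the replay is a one-liner each; here’s a little sketch (the database name and credentials are made up):

```python
import subprocess

# Hypothetical database name and credentials.
DB, USER, PASSWORD = "wordpress", "backup", "secret"

def dump_database(outfile):
    """Write the database out as replayable SQL: CREATE TABLE ... then INSERT INTO ..."""
    with open(outfile, "w") as f:
        subprocess.run(
            ["mysqldump", "--user=" + USER, "--password=" + PASSWORD, DB],
            stdout=f, check=True,
        )

def restore_database(infile):
    """Playing the commands back rebuilds the database from scratch."""
    with open(infile) as f:
        subprocess.run(
            ["mysql", "--user=" + USER, "--password=" + PASSWORD, DB],
            stdin=f, check=True,
        )

dump_database("blog-backup.sql")
```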

Although I don’t deal with databases in my day-job, it’s quite an interesting field in some ways. It’s well accepted that separating data-storage from the rest of your application logic is a wise plan. But SQL-backed applications have a further advantage that, say, an XML-backed application doesn’t have. By making such a clean separation in your application, you can leave the whole data-storage problem to someone else. There are lots of really clever people who’ve figured out the best way to store and query big relational datasets – laying them out, and moving them between disk, main memory and cache in a pretty optimal way. As long as you can fit your data into the right shape, you can then magically take advantage of decades of cleverness. That’s a pretty impressive level of reuse.

On to the last part of the Linux/Apache/MySQL/PHP cluster: PHP. I spent some time looking through the source code for WordPress, my blog software. Blog software ought to be pretty simple. It’s just a glue layer which sucks data out of a database, munges it into HTML and sends it to a browser. But to my eyes, WordPress is pretty dire (and probably most PHP apps are too). The code is pretty state-happy, with lots of imperative updating which wouldn’t be needed in a language with better building blocks. It’s a domain where people who think Perl is a fine language (and I mean that in a derogatory way) would be happy. But would I want these people to be writing secure e-commerce sites in this way?! I don’t want to think about that (because I know it’s already happening). I wasn’t impressed.

So, despite the fact that today I’m writing about setting up webservers, this brings me back to Philip Wadler’s Links project. The aim of this project is to take the Good Stuff from the world of research, and apply it to make a Better Way to produce web applications. When I started working with XML, I thought “Great, we have schemas which define the structure of the data … that means we can integrate that with our language’s static type system”. Hah, no such luck in the Real World … but projects like CDuce are showing the way. Similarly, if you write a web application you need to juggle with the inside-out-ness of the programming model – you can’t just call a function to get some user input, because your code is being driven at the top level by the HTTP request/response model and you always need to return up to the top level. Continuations offer a possible solution to this, as a richer means of controlling the flow of a program, as Avi Bryant’s Seaside framework demonstrates. Today, if you are writing a web application you need to worry constantly about what happens if the user hits the “back” button, or reloads a page, or clicks “submit” twice when they’re making a payment. Perhaps in the future, with better building blocks, these things will come “for free”, and we can wave a fond farewell to a whole class of common web-app bugs.
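To make the continuation idea concrete, here’s a toy sketch using Python generators. It has nothing to do with the real Seaside or Links APIs – it only shows the shape of the programming model: the flow reads as straight-line code that just “asks the user”, and a tiny driver suspends and resumes it between requests.

```python
def checkout():
    # Each yield is "show this page and wait for the user's answer".
    item = yield "Which item would you like?"
    quantity = yield f"How many '{item}'?"
    confirmed = yield f"Buy {quantity} x {item}? (yes/no)"
    if confirmed == "yes":
        return f"Order placed: {quantity} x {item}"
    return "Order cancelled"

# The "web framework": each user response resumes the suspended flow exactly
# where it left off, so there's no hand-rolled session state machine.
flow = checkout()
page = next(flow)                        # first request starts the flow
for response in ["teapot", "2", "yes"]:  # pretend these arrive as form submissions
    print("PAGE:", page, "-> USER:", response)
    try:
        page = flow.send(response)
    except StopIteration as done:
        print("PAGE:", done.value)
        break
```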

Web-based applications have lots of advantages (and disadvantages too). I personally really like the “your customers are always using the latest version of the software” aspect. But a lot of today’s web technologies are rooted too much in a Perl-hacker mindset. It may be that this is indeed a rewarding place to apply newer programming technologies. I still think the world will not be ready for the Links project for many years to come, but perhaps it will pave the way.

Oh, back to the original story. Having installed everything and got it all working, I flipped my DNS record so that www.nobugs.org went to my home box. But the next morning, I flipped it back. Why? At the end of the day, paying someone about 30UKP a year to host my site is pretty good value. I don’t really want to be worrying about my website response time every time I’m downloading big files over the same link. And if my website ever gets cracked, I’d still rather it was on someone else’s LAN and not mine. Although it might seem like a waste of time to spend hours setting all this up and not use it, I know that I’ve learned lots of useful information and lessons. C’est la vie.

Making a map

As mentioned before, I am interested in producing a free/non-copyrighted map of Edinburgh. There are several reasons for this, but the main motivation is ideological. Information about the city I live in ought to be free. It’s *our* city. Information about our city ought to be a public asset. The Ordnance Survey keeps a very tight hold on its data, and charges lots of money for it, despite being a department of our own government. This situation is unlikely to change, unless some crazy geeks bypass the whole establishment and produce their own (totally non-derived) map data. That’s the ideology.

The second reason is more pragmatic. If I want to find where someone lives, I can look at multimap. However, if I’m trying to write a route-finder computer program then I need the data about roads/junctions in a form which my program can process. Multimap doesn’t help me with this. So, a secondary benefit of producing a map myself is that I can annotate it with metadata (like streetname, one-way status, steepness of hill) in a form that a computer can understand. Additionally, any other location-related computerized data sources (such as postcode regions, location of wifi hotspots, or pollution measurements) can be meshed together with the map data.

There are three methods that can be used to produce a map. The classical way is to perform laborious ground surveys. That’s soo yesterday! The more modern way is to use satellite imagery or aerial photographs as a starting point, and trace roads/buildings manually or with computer assistance. While high-res satellite imagery is available to the public, it’s expensive and so I discounted that option (for now). The third option, which I’m looking at just now, is to use a handheld GPS receiver and gather trails as I walk/cycle/motorbike around the city.

I wasn’t sure how well GPS would work in the city. There are a number of GPS satellites orbiting around the earth, each broadcasting a time signal. If your handset can see enough of these satellites, it can figure out its longitude and latitude to some degree of accuracy. In the open, accuracy is typically to within 10m, but in a city you often don’t have a good view of the sky and accuracy suffers. An accuracy of 10m doesn’t sound great, but consider that most roads are probably 10m wide so it’s not too bad.

So, I borrowed my brother’s eTrex GPS receiver and carried it around as I travelled round the city. The GPS handset shows you a graphical view of where you’ve been, and this was enough to confirm that GPS probably did work well enough in the city.

The next step was to get the data onto my PC for processing. GPSbabel took care of downloading the data from the handset into GPX, which appears to be the preferred interchange format for GPS data. I then converted this into the shapefile format, a vector-data format commonly accepted by GIS systems. GIS systems are usually hulking great beasts of software, designed to slurp in terabytes of satellite imagery, vector roadmaps, elevation data and the like, and allow you to query it efficiently. However, many GIS systems are obscure and have a steep learning curve. After looking through lots of options, I settled on the JUMP project as the most hopeful candidate. It happily imported my raw shapefile/GPS data, and I was able to generate a simple map layer from the data and annotate the roads with attributes like “streetname”.
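For what it’s worth, GPX is simple enough that you don’t even need GIS software to get at the trackpoints. Here’s a minimal Python sketch (the filename is made up; it matches on the local tag name so it doesn’t care whether the file uses the GPX 1.0 or 1.1 namespace):

```python
import xml.etree.ElementTree as ET

def read_trackpoints(gpx_file):
    """Pull (lat, lon) pairs out of a GPX file's track segments.
    Matching on the local tag name copes with both GPX 1.0 and 1.1 namespaces."""
    points = []
    for _, elem in ET.iterparse(gpx_file):
        if elem.tag.endswith("}trkpt") or elem.tag == "trkpt":
            points.append((float(elem.get("lat")), float(elem.get("lon"))))
    return points

trail = read_trackpoints("leith_walk.gpx")   # hypothetical filename
print(len(trail), "points, first:", trail[0] if trail else None)
```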

And so …. *drumroll* … here are the beginnings of what will hopefully turn into my free Edinburgh streetmap.

Now, there are still some issues to be resolved here. The map data has been sheared at some point on its journey into the JUMP system. If you are familiar with Edinburgh, you’ll know that the roads which join Princes St should all join at right-angles, which isn’t the case in the above map. I imagine that there’s some disagreement about coordinate systems somewhere. The cause will doubtless be blindingly obvious after I’ve figured out what is going wrong, but this is all part of the learning curve.

So, this represents a pretty successful spike solution. I’ve done a pretty minimal amount of work to establish that the GPS method works, and that the software exists to allow me to make a pretty reasonable map. Now, I might actually start gathering data a bit more seriously, and see about organising a bit of infrastructure to allow other similarly-minded people to contribute GPX trails of their own. I’ll also see about integrating SRTM elevation data (which was gathered on a space shuttle mission) to provide height data – although it’s only on a 100m grid, and the presence of tall buildings will cause problems.

Parsing and stuff

I’ve been busy doing lots of different stuff recently (going to my first ever wedding, recording my first ever CD). But in the computer world, I got a few emails last week from people asking about C++ parsing, so I updated my Parsing C++ page to bring it up to date. Whilst updating it this morning, I realised that I make a strong distinction between “real problems” which are interesting and worthwhile to solve, as opposed to other kinds of problems which are just annoying and time-consuming. C++ parsing falls in the latter category, which is surprising since I’d personally use a C++ parsing toolkit quite a lot. But it should be a non-problem. Languages should be designed so that they’re easy for humans and computers to parse. End of story. A language which fails this test is plain bad. Writing glue code between APIs is another annoying problem. I’d rather spend my time on Real problems. But what do I mean by “real” problems in programming? I’m not entirely sure myself. Problems which are not specific to a particular programming language, or to a particular hardware architecture, I guess. But at the same time, I’m interested in using computers as tools to achieve some purpose – to learn about electromagnetism, for example. I think I’ve just got to a point in my life where I’m looking back over what I’ve done in the past 10 or so years, and planning how to spend the next 10 years, and wanting to make sure I actively choose what to do rather than passively ending up doing something.

I enjoyed Kim’s summary of the Composable Memory Transactions paper. I like this kind of mini-summary of papers, since my list of “papers to read” grows faster than I can keep up with. To most people, “concurrency” means threads and mutexes. This is a horribly primitive way of approaching concurrency – notoriously difficult for us puny humans to get right, and little better than a house of cards. Higher-level models such as CSP are much better for building safe, stable programs. I don’t think I’d really explicitly understood the importance of this kind of composability before, but clearly it’s a vital ingredient in building larger and larger systems.
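A tiny illustration of the difference in style – nothing to do with the paper’s actual STM interface, just Python’s queue.Queue standing in for a CSP-style channel. The two threads never share mutable state; they only pass messages, so there are no locks to acquire in the wrong order.

```python
import threading
import queue

# A queue.Queue standing in for a CSP-style channel between two threads.
channel = queue.Queue()

def producer():
    for i in range(5):
        channel.put(i * i)
    channel.put(None)          # sentinel: no more work

def consumer():
    while True:
        item = channel.get()
        if item is None:
            break
        print("got", item)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```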

I’ve also discovered the dirtsimple blog, which has provided lots of interesting reading.