Categories
General

Making a map

As mentioned before, I am interested in producing a free/non-copyrighted map of Edinburgh. There are several reasons for this, but the main motivation is ideological. Information about the city I live in ought to be free. It’s *our* city. Information about our city ought to be a public asset. The Ordnance Survey keeps a very tight hold on its data, and charges lots of money for it, despite being a department of our own government. This situation is unlikely to change, unless some crazy geeks bypass the whole establishment and produce their own (totally non-derived) map data. That’s the ideology.

The second reason is more pragmatic. If I want to find where someone lives, I can look at multimap. However, if I’m writing trying to write a route-finder computer program then I need the data about roads/junctions in a form which my program can process. Multimap doesn’t help me with this. So, a secondary benefit of producing a map myself is that I can annotate with with metadata (like, streetname, one-way status, steepness of hill) in a form that a computer can understand. Additionally, any other location-related computerized data sources (such as postcode regions, location of wifi hotspots, or pollution measurements) can be meshed together with the map data.

There are three methods that can be used to produce a map. The classical way is to perform laborious ground surveys. That’s soo yesterday! The more modern way is to use satellite imagery or aerial photographs as a starting point, and trace roads/buildings manually or with computer assistance. While high-res satellite imagery is available to the public, it’s expensive and so I discounted that option (for now). The third option, which I’m looking at just now is to use a handheld GPS system and gather trails as I walk/cycle/motorbike around the city.

I wasn’t sure how well GPS would work in the city. There are a number of GPS satellites orbiting around the earth, each broadcasting a time signal. If your handset can see enough of these satellites, it can figure out its longitude and latitude to some degree of accuracy. In the open, accuracy is typically to within 10m, but in a city you often don’t have a good view of the sky and accuracy suffers. An accuracy of 10m doesn’t sound great, but consider that most roads are probably 10m wide so it’s not too bad.

So, I borrowed my brother’s Etrex GPS system and carried it around as I travelled round the city. The GPS handset shows you a graphical view of where you’ve been, and this was enough to confirm that GPS probably did work well enough in the city.

Next step was to get the data onto my PC for processing. GPSbabel took care of downloading the data from the handset into GPX, which appears to be the preferred interchange format for GPS data. I then converted this into the shapefile format, which is a format for vector data commonly accepted by GIS systems. GIS systems are usually hulking great beasts of software, designed to slurp in terabytes of satellite imagery, vector roadmaps, elevation data and the like, and allow you to query it efficiently. However, many GIS systems are obscure and have a steep learning curve. After looking through lots of options, I settled on the JUMP project as being the most hopeful candidate. It happily imported my raw shapefile/GPS data, and I was able to generate a simple map layer from the data and annotated the roads with attributes like “steetname”.

And so …. *drumroll* … here is the beginings of what will hopefully turn into my free Edinburgh streetmap.

Now, there are still some issues to be resolved here. The map data has been sheared at some point on its journey into the JUMP system. If you are familiar with Edinburgh, you’ll know that the roads which join Princes St should all join at right-angles, which isn’t the case in the above map. I imagine that there’s some disagreement about coordinate systems somewhere. The cause will doubtless be blindingly obvious after I’ve figured out what is going wrong, but this is all part of the learning curve.

So, this represents a pretty succesful spike solution. I’ve done a pretty minimal amount of work to establish that the GPS method works, and that the software exists to allow me to make a pretty reasonable map. Now, I might actually start gathering data a bit more seriously, and see about organising a bit of infrastructure to allow other similarly minded people to contribute GPX trails of their own. I’ll also see about integrating SRTM elevation data (which was gathered on a space shuttle mission) to provide height data – although it’s only on a 100m grid, and the presence of tall buildings will cause problems.

Categories
General

Parsing and stuff

I’ve been busy doing lots of different stuff recently (going to my first ever wedding, recording my first ever CD). But in the computer world, I got a few emails last week from people asking about C++ parsing, so I updated my Parsing C++ page to bring it up to date. Whilst updating it this morning, I realised that I make a strong distinction between “real problems” which are interesting and worthwhile to solve, as opposed to other kinds of problems which are just annoying and timeconsuming. C++ parsing falls in the latter category, which is surprising since I’d personally use a C++ parsing toolkit quite a lot. But it should be a non-problem. Languages should be designed so that they’re easy for humans and computers to parse. End of story. A language which fails this test is plain bad. Writing glue code between API’s is another annoying problem. I’d rather spend my time on Real problems. But what do I mean by “real” problems in programming? I’m not entirely sure myself. Problems which are not specific to a particular programming language, or to a particular hardware architecture, I guess. But at the same time, I’m interested in using computers as tools to achieve some purpose – learn about electromagnetism for example. I think I’ve just got to a point in my life where I’m looking back over what I’ve done in the past 10 or so year, and planning how to spend the next 10 years, and wanting to make sure I actively choose what to do rather than passively ending up doing something.

I enjoyed Kim’s summary of the Composable Memory Transactions paper. I like these kind of mini-summaries of papers, since my list of “papers to read” grows faster than I can keep up with. To most people, “concurrency” means threads and mutexes. This is a horribly primitive way of approaching concurrency – notoriously difficult for us puny humans to get right, and little better than a house of cards. Higher level models such as CSP are much better for building safe, stable programs. I don’t think I’d really explictly understood the importance of this kind of composability before, but clearly it’s a vital ingredient in building larger and larger systems.

I’ve also discovered the dirtsimple blog, which has provided lots of interesting reading.

Categories
General

Cuckoo pointers

A cuckoo pointer is a particularly nasty kind of bug to track down in a C/C++ program. I’ve only encountered two of them in the eleven or so years I’ve been using these languages, but when they occur it can be very difficult to understand what is going on. A cuckoo pointer starts life as an old fashioned dangling pointer – a reference to a object on the heap which has been deallocated. It becomes a cuckoo pointer if you are unlucky enough to allocate an object of the same type, at the same address as the old one. Got that? You’ve got some buggy code which hasn’t realised that the old object is now dead, but now suddenly down plops a new object right in its place.

Normally dangling pointers aren’t too tricky to track down. If you write to memory using one, you’ll almost certainly spectacularly trash a bit of memory and your program will die. But in the cuckoo pointer scenario, the write to memory will probably kinda make sense. If the heap object was a string, the cuckoo pointer will be doing string-like operations to the memory, and your program will probably continue running for a while.

You might argue that the chances of a new object of the same type being allocated at exactly the same address are pretty slim. That’d be true, up to a point, but if your heap implementation is using a block-allocator strategy, the chances shoot right up. A block allocator means that allocations of blocks of 8 bytes will come from a special pool of contiguous 8-byte blocks. There’s a seperate pool for each size of blocks – 2, 4, 8, 16, 32 – up to some maximum size. This is a fast way of allocating memory, because the regular structure means that only minimal bookkeeping is required. But under this scheme, the chances of an new object landing at the same address as a recently-freed object of the same type are much increased.

Why are cuckoo pointers so hard to track down? It is because your application will continue executing for a while after the initial error (the dangling pointer). The application will potentially crash a good while after that error, although at least the crash is likely to be in an object of the same type. It’s not too unlike a data race, where two threads are writing to the same bit of memory. Here you have two objects – one real, one not – existing in the same bit of memory.

How do you track down cuckoo pointers? Well, the first step is to realise that they exist! Then you can switch on the heap’s “do not reuse free’d memory blocks” option and see if that changes the behaviour of the program at all.

Categories
General

Words

I like discovering the story behind words. A few months, I realised that “mer” was greek for “parts” and so “polymer” and “monomer” just meant “many parts” and “one part”. Words which come from other languages, and any “native” words, must themselves have ultimately just been made up by someone at some point. The problem is that the birth of most words is not recorded. You might be able to guess why the word was chosen (like, “anteater”) but I think it’s unusual to be able to pinpoint exactly when a word first entered the language.

I’ve been reading a recent biography of Michael Faraday, who did a lot of important early work in electromagentism, and from this book I read the following historic nugget. By 1831, a lot had been discovered about the nature of electromagnetism, but the language used to describe the phenomena hadn’t caught up. People were often using analogies to water – they talked of “electrical fluid – but this analogy could be confusing. Faraday started a correspondance with Revd William Whewell at Cambridge Uni, which I’ll paraphrase:

FARADAY: “I think we need some names for the terminals of a battery. I’ve came up with: exode, zetode, zetexode but I also think westode and eastode are pretty catchy too”

WHEWELL: “Hmm, that’s a bit of a mouthful. How about anode and cathode? Pretty solid greek background for those words”

FARADAY: “Cheers Will, but the guys down at the pub weren’t very convinced by anode and cathode. They laughed at my poor use of greek”

[ FARADAY goes away and writes a paper using the terms DEXIODE and SKAIODE instead ]

WHEWELL: “Let me give you a quick crash course in greek, and you can tell your mates where to stick their criticisms”

FARADAY: “Y’know Will, I think you might just be right after all”

And that’s how the words “anode” and “cathode” first entered the language of electricity. That’s why we have “cathode ray tubes”. Not content with that, Faraday and Whewell went on to add the words “ion”, “dielectric” and also “diamagnetic” and “paramagetic” to the language, all terms which are still used today when describing electricity and magnetism.

Categories
General

The Future

I always liked Technetcast and now I’m listenting to it’s spiritual descendent, IT Conversations. It’s a collection of presentations and lectures from various conferences around the world, all in mp3 format. So I copy them onto my mp3 player, and I get to listen to Steve Wozniak telling his life story while I’m at the gym, or Stephen Wolfram talking about physics as I cycle somewhere.

This is the kind of thing that The Future always promised when I was younger. I can have video chats with my family over the internet. I can listen to audio lectures anywhere I want on a tiny portable device. I can check my email using GPRS halfway down a mountain biking trail. I can watch a degree level course on electromagnetism from MIT whenever I want, from the comfort of my own flat.

Okay, so we don’t have teleporters yet. And hard AI didn’t do so well. But, all things considered, the Future is doing pretty well.