Marc Wick is probably not a name you will have come across in the media. After all, he’s not a world-famous computer programmer and doesn’t hold a position at the renowned Massachusetts Institute of Technology. But despite lacking a hi-tech pedigree, this self-employed software engineer has developed something quite fascinating. He picked up a degree in computer science from the Federal Institute of Technology in Zurich, then worked as a software engineer for Siemens Transportation Systems, followed by a stint at some major Swiss banks.
Launched towards the end of 2005, GeoNames is a free and open source geographical database. Designed primarily for developers wanting to integrate the project into web services and applications, it brings together worldwide geographical data including names of places in a host of languages, elevations, population, and any latitude/longitude coordinate you could wish for.
The project is funded by donations, sponsors and contributions in the form of data, know-how and time. GeoNames users are able to manually edit, delete, correct and add new names through a user-friendly wiki interface.
The project is already handling upwards of 4 million web service requests per day, and its popularity is soaring, with a number of organisations, including the BBC, Microsoft Popfly (a mashup tool), Nike and Greenpeace, quietly discarding aggregation databases and turning to the new application.
Like many open source projects though, it all started accidentally when Marc needed to develop a holiday apartments application.
- Marc, tell me how you came to develop GeoNames?
- Well, it was a solution to a problem, really. I needed geographical data for a project I was working on and simply couldn’t afford the commercially available data. So I had to start aggregating the sources that were freely available. Some months later Google released Google Maps and I realized that I was not the only one in need of geographical data who couldn’t afford the commercial type. Commercial geographical data was and is very expensive, very detailed and complex to work with, so I began developing something simpler which did the job for most applications. It seemed a waste of time and effort if so many people were doing the same thing and reinventing the wheel. I therefore decided to separate the geographical part of the application into a standalone project, polish it up and put it on a server as an open-data project.
- What languages, coding, scripting and software do you and the development team use in working on GeoNames?
- For the coding we mainly use Java. The beauty of Java is the tremendous number of fantastic open source libraries available for all kinds of tasks, like Lucene for search. We also use Postgres/PostGIS with PROJ.4 for coordinate reprojection and the FWTools set bundled and developed by Frank Warmerdam.
- Is GeoNames improving gazetteer standards, do you think?
- I don’t really see much progress towards gazetteer standards, with the exception perhaps of the wider use of the WGS84 datum. Because GPS is increasingly important both for collecting data and for the people using it, many providers release their data using the WGS84 datum. The main problem for an aggregating gazetteer is still the unavailability of data. The more data is available, the easier it is to map features and feature codes between data sources, and so the lack of standards could be overcome.
- What was the reasoning behind making the project open source? Did you do it to scale up quickly? And being open source, you must have a few contributors?
- It didn’t occur to me not to make it open source. It was a natural consequence to release it under a liberal license. It is data that should be freely available and the main work was already done when it was released. Of course it has developed a lot since then and it is hardly comparable to the first versions. GeoNames is aggregating data from the whole globe and it is impossible to do this without a lot of help from many contributors. Making it appealing for contributors certainly is a reason for a liberal license. The cc-by license was the best fit as it is nearly as free as public domain and the attribution concept is more compatible internationally than the public domain concept. It seems only fair to give credit when using someone else’s work and the visibility helps the project gain new users and contributors. The ‘cc’ brand is widely recognised. It may not be legally enforceable for data collections, but in any case we have no intention of enforcing the ‘by’ clause. Also, a ‘share-alike’ license was completely out of the question since I don’t consider ‘share-alike’ a free license and I don’t contribute to ‘share-alike’ projects.
- What is the biggest technical challenge involved in running GeoNames?
- The main challenge is to deal with a huge number of data providers and the absence of gazetteer standards. In Australia, for instance, we cannot use the place name data provided by the Australian Government as it is not freely available. Instead we have to contact the authorities of the states to ask for their data. Three states, Victoria, South Australia and New South Wales, have so far offered their data to GeoNames. All of them use different data formats and feature types and we have to process each source differently, while also making sure we do not insert duplicates of locations that already exist.
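To make the aggregation problem Marc describes a little more concrete, here is a minimal Python sketch of the general idea: convert each source into one common record shape, map its feature types onto shared codes, and skip records that look like duplicates of existing ones. This is not GeoNames code. The source layouts, field names, feature mappings, sample coordinates and the 1 km threshold are all assumptions made up for illustration.

```python
import math
import unicodedata

EARTH_RADIUS_KM = 6371.0

# Hypothetical mappings from each source's own feature types to shared codes.
FEATURE_MAP_A = {"TOWN": "PPL", "MOUNT": "MT"}
FEATURE_MAP_B = {"locality": "PPL", "mountain": "MT"}


def normalize_name(name):
    """Strip accents, case and extra whitespace so names can be compared."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def from_source_a(row):
    """Adapter for an assumed source A layout (upper-case keys, string coordinates)."""
    return {
        "name": row["PLACE_NAME"],
        "lat": float(row["LAT"]),
        "lon": float(row["LONG"]),
        "feature": FEATURE_MAP_A.get(row["FEAT"], "UNKNOWN"),
    }


def from_source_b(row):
    """Adapter for an assumed source B layout (coordinate tuple, lower-case types)."""
    lat, lon = row["coords"]
    return {
        "name": row["title"],
        "lat": lat,
        "lon": lon,
        "feature": FEATURE_MAP_B.get(row["type"], "UNKNOWN"),
    }


def is_duplicate(candidate, existing, max_km=1.0):
    """Same normalized name, same feature code, and within max_km of each other."""
    return (
        normalize_name(candidate["name"]) == normalize_name(existing["name"])
        and candidate["feature"] == existing["feature"]
        and haversine_km(candidate["lat"], candidate["lon"],
                         existing["lat"], existing["lon"]) <= max_km
    )


def merge(existing, incoming, max_km=1.0):
    """Return a new list: existing records plus incoming ones that are not duplicates."""
    merged = list(existing)
    for record in incoming:
        if not any(is_duplicate(record, known, max_km) for known in merged):
            merged.append(record)
    return merged


# Illustrative data only; coordinates are approximate.
existing = [{"name": "Wagga Wagga", "lat": -35.1082, "lon": 147.3598, "feature": "PPL"}]
incoming = [
    from_source_a({"PLACE_NAME": "WAGGA WAGGA ", "LAT": "-35.1090", "LONG": "147.3600", "FEAT": "TOWN"}),
    from_source_b({"title": "Mount Kosciuszko", "coords": (-36.4559, 148.2636), "type": "mountain"}),
]
merged = merge(existing, incoming)
```

A linear scan like this only suits small examples. A real gazetteer would more plausibly lean on a spatial index, such as the PostGIS setup mentioned earlier, and on fuzzier name matching.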
- What are some examples of business/enterprise uses for GeoNames technology? There must be one killer application out there that you would like to develop?
- Typical uses for GeoNames data are travel, real estate, classifieds or dating applications. But large enterprises use it too. Judging from my inbox, nearly all large companies somehow related to technology use GeoNames in one way or another. Really, GeoNames is a data provider and aggregator. It’s a lot of fun finding new ways and sources to extract geographical facts from data available on the web. I think of geocoded Wikipedia articles, geocoded Flickr images or the web itself. It would also be good to perfect the natural language geocoding and better understand the geographical information in normal unstructured text. Perhaps ‘killer application’ doesn’t really describe the examples I’ve just read out!
- Do you think the ubiquity of GPS-enabled devices is driving most of the interest in GeoNames or is it more complex than this?
- In the early days of GeoNames, the integration of Google Maps and other online mapping solutions into web applications generated a lot of demand for geographical data.
In the meantime I think the growth trend has clearly shifted to location-based services for GPS-enabled devices. The iPhone in particular is spurring a lot of innovation in the consumer-oriented sector. Online mapping solutions have lost the sex appeal they had some years ago, but they remain extremely useful for a lot of applications, of course.
- What future developments are you planning in GeoNames?
- We are currently talking to a couple of national and sub-national mapping providers who would like to work together with us and provide data to the GeoNames project. We are also discussing co-operation with a couple of corporations.
- Where do you see GeoNames in, say, three years’ time?
- I’m optimistic that we will see more official, governmental sources make their data freely available. Authoritative geodata in most countries, with the exception of the US, is not freely available and we have to make do with other sources. Change will take some time, because political processes always do, and it will certainly take longer than three years to see a fundamental paradigm shift.
Economic reasons clearly speak for the US model, where free access to data is driving many businesses and engendering new applications and technologies. In the age of GPS devices and navigation systems, people are using technical devices for navigation and the economic advantage for a country of releasing authoritative data freely will increase tremendously. Insisting on standards could be damaging, though. The authorities should just release the data as-is in whatever format they have available for their own purposes.
They should not be forced to make it available in this or that format. That said, it is an intrinsic trait of all politicians to act for their vested interests and not for the benefit of their country.
Another factor that is slowing down development is the Web 2.0 hype of recent years. I really hope that this is coming to an end and we can start thinking again about what really makes sense, for example how we organize the process of data collection effectively, efficiently and reliably. In the wake of Web 2.0, crowdsourcing has become the panacea for everything, whether it makes sense or not.
I think this has also slowed down the political process of understanding the economic and cultural importance of the availability of geographical data.
For GeoNames itself there will be more ways to generate revenues to make the project independent, stronger, economically healthy and self-sustainable in the long run. I’m very much looking forward to it.