Science and Technology


11
Aug 08

Ants, Overlays and Open Data

Whilst standing behind the yellow line on the platform this morning, waiting for a train to Oxford, I noticed an ant on the floor wending its way along the tarmac, within the bounds of the thick yellow paint. The little black speck stood out quite sharply against the bright yellow.
Obviously the ant wasn’t following the line, but neither was it moving randomly. It was clearly following its own little invisible marker, an ant scent trail, that just happened to coincide with the platform markings.
Last night BBC 1 showed Britain from Above, an aerial view of Britain during a 24-hour period. The show had some great information visualisations, including traffic patterns for taxis, garbage collection, commuters, shipping and aircraft, as well as more static landmarks such as railway lines, electricity cables, water courses and telephone and network cabling. If you didn’t catch it, the programme is definitely worth a watch.
It was this bird’s-eye view of the world that led me to reflect on that ant and its invisible trail. I wonder how many other layers of information could have been added to the human-centric views shown in the programme? Animal migratory paths are an obvious one. Paths of dispersal, ranges and colonization are some others. It doesn’t take long to come up with many, many more.
The combinations of different paths and layers are also interesting to explore. Are many of these chance overlaps, like the ant on the paint, or are there dependencies or inter-relations? For example, how are migratory routes affected by no-fly zones or shipping lanes? Do migratory pathways begin to align with man-made features like roads and railways? And where have features like fish ladders and toad tunnels been introduced to avoid clashes between competing uses for the same space?
It’s doubtful that these kinds of questions will be answered in the rest of the series. Judging by the trailer for next week’s episode there seems to be more of a “Pop geography” focus. (I’ll be tuning in regardless.)
The truly exciting thing is that we can do this kind of exploration of layered information sources through map-based visualizations ourselves, using a huge, and growing, range of commodity tools and data sets.
Whilst watching the programme, what intrigued me more than the admittedly beautiful animations were questions such as: how did they approach the information holders in order to get permission to use their data? What steps were taken towards privacy and anonymity? For the BBC it’s going to be very easy to get access to all kinds of data. Not least because they have resources to spend, but also because their reputation precedes them and the result of the sharing of data is immediate: “don’t you want to be on the telly”?
Open data advocates may do well to band together to form an organization that can become the focal point for activism and, importantly, trust. Such an organization could recommend best practices, including auditing of data for privacy results. It could also put together a showcase of the end results: creative visualizations of published data. It may be easier to approach data owners as a member or representative of such a collective, open, distributed, collegial organization than as an independent interested hacker.
But creating a compelling presentation is about more than just having the right technology and data. A good visualization tells a story. It’s through stories that data really comes alive. The open data movement needs the involvement of strongly creative people as much as (and perhaps more than) technology people.
You need to be able to do more than animate a little black speck against a yellow band: where was that little ant going?


16
Dec 05

The Modern Palimpsest

The following is a brief summary of a talk I gave recently at the Ingenta Publisher Forum on the 28th November. The slides are available as a PowerPoint presentation.
In the presentation I tried to highlight some of the possibilities that could become available if academic publishers begin to share more metadata about the content they publish, ideally by engaging with the scientific community to expose “raw” data and results.

Continue reading →


25
Nov 05

Nature Quote

There’s a short article in Nature (subscribers only I’m afraid) this week about Google Base and its potential impact on the science community, in particular whether it might galvanise greater data sharing between scientists.
I’ve been corresponding with Declan Butler, the author of the piece, on this and some related topics recently, and he ended up quoting me:


16
Nov 05

WebCite

Alf Eaton posts today to point to the new WebCite service. This is going to be very useful. Don’t think so? Well there’s plenty of research to show that link atrophy is a big problem in scientific literature:
Persistence of Web References in Scientific Research
See also: A study of missing Web-cites in scholarly articles: towards an evaluation framework, which reports that “[a]fter evaluating 2162 bibliographic references it was found that 48.1% (1041) of all citations used in the papers referred to a Web-located resource. A significant number of references to URLs were found to be missing (45.8%)…”


3
Nov 05

iSpecies and taxonomy (no, not that kind)

For the last few years I’ve been lurking on a mailing list run by the Taxonomic Databases Working Group. It’s a low volume list used by scientists interested in capturing and marking up taxonomies. That’s taxonomy in the Linnaean sense not the semantic web sense. I’ve been lurking there since I wrote this paper a while back proposing an XML format to replace a text based format that had been popular.
Yesterday on the list this interesting little mash-up was announced: ispecies.org. It works by searching NCBI, Yahoo images and Google Scholar to attempt to find relevant information on biological species. Lions for example.
I found it interesting mainly because it was one of the first mashups I’ve seen that isn’t a combination of the same old APIs (maps, music, bookmarks), but also because it’s clearly aimed at a particular scientific community.
The author, Rod Page (apparently a big RDF fan), built this as an off-shoot of a wider project that’s storing phylogenetic data as RDF. His site also has a Taxonomic Search Engine which federates a number of taxonomic name databases. Perform a search and it links you to metadata about the organism. There’s a paper on the application on BioMedCentral.
Given an LSID (Life Sciences Identifier) it turns out you can get RDF metadata about the organism. Lions for example.
There’s a lot of interesting mash-up potential in this data, as well as that available from a few other projects in this area.
I’ve been keeping half an eye on this space recently, after reading this paper on how bioinformatic researchers are bumping into limits of XML and looking at RDF instead: “…the syntactic and document-centric XML cannot achieve the level of interoperability required by the highly dynamic and integrated bioinformatics applications”.
These guys have a lot of data that needs integrating and merging. Modern classification is about much more than the old Linnaean system. It has to be able to merge together data sources ranging from molecular biology through to field observations, and depending on what sources you draw on, and from what level, the tree of life can be drawn quite differently.
The early web was pioneered in part by the needs of scientists exchanging research papers. It strikes me that “eScience” and bioinformatics may very well become the driving forces behind a more semantic web.


16
Sep 05

Information Aesthetics

I don’t normally do link blogging, but the information aesthetics blog is too cool not to share. Where else can you read about an augmented reality kitchen, the gori node garden, or street clocks?
No attribution as I can’t remember where I discovered it. Quite possibly via oishii! which is often a source of my random browsing.


6
Jan 05

Konfabulator

Via Catalogablog I’ve just learnt that Konfabulator is available for Windows. Looked interesting, so I installed it.
I’m in love.
Looking forward to seeing this del.icio.us based widget.


1
Sep 04

Working In A Small World

Stumbled over these musings on how small world theory applies to company organization. They’ve been languishing in my personal wiki for many months, thought I might as well post them as is.
Whilst reading the first few chapters of “Small World” by Mark Buchanan, I was fascinated by the work of Granovetter (see “The Strength of Weak Ties”). This basically highlights the fact that it is weak ties between individuals that are the important ones in a social network; not strong ties as one would expect. People with strong ties in common often have strong ties between them also, hence these links are less important than weak ties (acquaintances) as their removal has little effect on the structure of the graph (as measured in number of degrees between points). Previous descriptions I’ve read about small world phenomena have focussed on hubs/authorities which is a much less human-centric metaphor; quite rightly perhaps as “small worldism” isn’t tied to any particular type of graph, but it’s not very evocative.
This led me to thinking about relationships within companies. Exploiting social networks to find work, etc. seems well explored; indeed it’s behind the current drive for many of the social networking sites and applications that are springing up at the moment. Work relationships seem like a different framework within which to explore the small world phenomenon. Or at least it’s the one that occurred to me whilst washing up after dinner.

Continue reading →


31
Aug 04

Champernowne’s Constant

Whilst reading von Baeyer’s ‘Information’ recently, I came across the following fun mathematical tidbit which I thought was worth sharing. Mainly because I couldn’t find many references to it elsewhere on the ‘net.
In the chapter on “Randomness”, von Baeyer introduces several definitions of the term “random”, iteratively showing how each is slightly flawed. Considering a binary sequence of digits, the first definition describes a random number as one in which there is no pattern to the series of 1’s and 0’s. However a sequence such as 000110000100 is not random as it has an unequal proportion of the binary digits. A slightly improved definition is one which states that the numbers of each digit are approximately equal. But not only that: the combinations of the two digits (00, 01, 10, 11) must also occur in roughly equal proportions. And so on for combinations of three, four, five digits. Sequences that meet this restriction are apparently known as “normal numbers”.
The first explicit (rather than theoretical) example of a normal number is Champernowne’s Constant which was produced (discovered?) in 1933. David Champernowne pointed out that if one starts with zero, then one, then strings together all possible pairings, then all eight triples, and so on, you end up with a number which must, by construction, contain all possible patterns, and is therefore “normal”.
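To make the "equal proportions" test concrete, here is a minimal Python sketch that counts how often each block of digits appears in a binary string. The function name block_frequencies is my own invention for illustration, and counting overlapping windows is an assumption on my part rather than anything von Baeyer specifies.

```python
from collections import Counter


def block_frequencies(sequence, length):
    """Count every overlapping block of `length` digits in `sequence`.

    Hypothetical helper for illustration: overlapping windows are an
    assumption, not something taken from the book.
    """
    windows = (sequence[i:i + length]
               for i in range(len(sequence) - length + 1))
    return Counter(windows)


example = "000110000100"  # the sequence quoted in the text above

# Single digits: nine 0s against three 1s, so the proportions are unequal.
print(block_frequencies(example, 1))

# Pairs: "00" dominates while "11" appears only once.
print(block_frequencies(example, 2))
```

For a sequence to look "normal" in the sense described, these counts would need to even out for every block length as the sequence gets longer.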
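The construction as described here is easy to sketch in Python. This follows the binary recipe in the paragraph above (all strings of one digit, then two, then three, and so on); note that the form usually quoted elsewhere is the base-10 one, 0.123456789101112…, so treat this as an illustration of the idea rather than the canonical definition. The function names are hypothetical.

```python
from itertools import product


def champernowne_bits(max_len):
    """Concatenate every binary string of length 1, 2, ..., max_len.

    Within each length the strings are emitted in counting order:
    0, 1, 00, 01, 10, 11, 000, ...
    """
    pieces = []
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            pieces.append("".join(bits))
    return "".join(pieces)


def contains_all_patterns(sequence, length):
    """Check that every binary pattern of `length` digits occurs in `sequence`."""
    return all("".join(p) in sequence for p in product("01", repeat=length))


prefix = champernowne_bits(4)
print(prefix[:10])                       # the first ten digits
print(contains_all_patterns(prefix, 4))  # every 4-digit pattern is present
```

Because each pattern is written out whole at some point, any finite pattern is guaranteed to turn up once the prefix is long enough. The catch is the length: the prefix covering all patterns up to n digits grows roughly like n times 2 to the power n.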
Von Baeyer then points out that this number in its binary form is “a fabulous object. Using Morse code, or some other translation of zeroes and ones into typographical symbols, it can be transformed into a string of letters, spaces and punctuation marks. Since every conceivable finite sequence of words is buried somewhere in the string’s tedious gobbledygook, every poem, every traffic ticket, every love letter and every novel ever written, or ever to be composed in the future is there in that string…You may have to travel out along the string for billions of light years before you find them, but they are all in there somewhere….” (pp101-102).
So who needs a million chimpanzees with typewriters? Distributed computing project anyone?


16
Jan 04

Searching Small Worlds

Interesting “small world” article in New Scientist this week (“Know Thy Neighbour”, January 17 2004, Mark Buchanan), this time discussing how people and information can be located within a small world network.
The essay discusses Milgram’s famous experiment in which he asked people to attempt to route a letter, via their contacts, to a given person. Most of the letters got there within a small number of hops and apparently the strategy that most people, quite naturally, adopted was along the lines of “Mr X (the end-point) works in the financial sector, who else do I know that works in that sector…”. In essence people were comparing their contacts with what they know about the end point, categorising them into groups.
Groups are therefore an important feature of small world networks that are “searchable”. Classifying nodes in this way allows your local knowledge of the network (your contacts) to help manipulate it. In the case of the Milgram experiment, that manipulation was to use people to route letters; however, the New Scientist article suggests that similar techniques could be used to benefit internet search engines.
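The strategy described above can be sketched as a greedy search in Python: each person only looks at their own contacts and passes the letter to whoever shares the most groups with the target. Everything here (the names, the groups, the route function and the tie-breaking) is made up for illustration; it is a toy under those assumptions, not the method from the article or from Milgram’s study.

```python
def route(contacts, profiles, start, target, max_hops=10):
    """Greedily forward a letter using only each holder's local contacts.

    At every hop the current holder hands the letter to the unvisited
    contact sharing the most groups with the target (ties go to the
    contact listed first). Returns the path, or None on a dead end.
    """
    path = [start]
    visited = {start}
    current = start
    while current != target and len(path) - 1 < max_hops:
        options = [c for c in contacts[current] if c not in visited]
        if not options:
            return None  # nobody new to pass the letter to
        if target in options:
            nxt = target  # the holder knows the target directly
        else:
            nxt = max(options,
                      key=lambda c: len(profiles[c] & profiles[target]))
        path.append(nxt)
        visited.add(nxt)
        current = nxt
    return path if current == target else None


# Hypothetical people, each described by the groups they belong to.
profiles = {
    "alice": {"teaching", "oxford"},
    "bob": {"finance", "oxford"},
    "carol": {"finance", "new_york"},
    "dave": {"teaching", "leeds"},
    "erin": {"retail", "oxford"},
    "mr_x": {"finance", "new_york"},
}
contacts = {
    "alice": ["dave", "bob"],
    "bob": ["alice", "erin", "carol"],
    "carol": ["bob", "mr_x"],
    "dave": ["alice"],
    "erin": ["bob"],
    "mr_x": ["carol"],
}

print(route(contacts, profiles, "alice", "mr_x"))
```

Nobody in this sketch sees the whole network. Alice picks Bob over Dave purely because Bob is in finance, which is the kind of group-based guess the essay describes.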

Continue reading →