Apparently there's a recognised stage that children go through during the development of their language skills in which they start to experiment with syntax and grammar. Basically they make up their own words based on the grammar rules that they've absorbed.
Ethan came up with a good one recently: "funnin", meaning "having fun", e.g:
"Where did you go today Ethan?"
"Went funnin in the park"
Imagine my surprise when I discovered that it's actually in the Urban slang Dictionary. So now I'm waiting for:
"What are you doing Ethan?"
"I'm having a crazy ass time with nanny!"
It's amazing how kids just absorb words. I came home the other day, and Debs told me that Ethan has been riding around on his Thomas the Tank Engine scoot-along, pretending to go shopping for "tek-noh-low-gee". That's m'boy!
Interesting "small world" article in New Scientist this week ("Know Thy Neighbour", January 17 2004, Mark Buchanan), this time discussing how people and information can be located within a small world network.
The essay discusses Milgram's famous experiment in which he asked people to attempt to route a letter, via their contacts, to a given person. Most of the letters got there within a small number of hops and apparently the strategy that most people, quite naturally, adopted was along the lines of "Mr X (the end-point) works in the financial sector, who else do I know that works in that sector...". In essence people were comparing their contacts with what they know about the end point, categorising them into groups.
Groups are therefore an important feature of small world networks that are "searchable". Classifying nodes in this way allows your local knowledge of the network (your contacts) to help manipulate it. In the case of the Milgram experiment, that manipulation was to use people to route letters; however, the New Scientist article suggests that similar techniques could be used to benefit internet search engines.
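To convince myself how little the strategy needs, here's a minimal sketch of that kind of group-based routing. It's my own toy illustration, not anything taken from the article or from Milgram's work: the names, the contact lists, the group labels and the `route` function are all made up, and "closeness" is simply the number of groups a contact shares with the target.

```python
def route(contacts, groups, start, target, max_hops=10):
    """Greedy routing using only local knowledge.

    contacts: person -> list of people they know (hypothetical data)
    groups:   person -> set of group labels (hypothetical data)
    At each hop the letter goes to the unvisited contact who shares
    the most groups with the target.
    """
    path = [start]
    visited = {start}
    current = start
    while current != target and len(path) <= max_hops:
        if target in contacts[current]:
            nxt = target  # we know the target directly
        else:
            candidates = [c for c in contacts[current] if c not in visited]
            if not candidates:
                return None  # dead end: nobody new to pass the letter to
            nxt = max(candidates, key=lambda c: len(groups[c] & groups[target]))
        path.append(nxt)
        visited.add(nxt)
        current = nxt
    return path if current == target else None


# Made-up network: nobody sees more than their own contacts.
contacts = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "erin"],
    "dave": ["bob"],
    "erin": ["carol", "mr_x"],
    "mr_x": ["erin"],
}
groups = {
    "alice": {"teaching"},
    "bob": {"teaching", "cycling"},
    "carol": {"finance"},
    "dave": {"cycling"},
    "erin": {"finance", "london"},
    "mr_x": {"finance", "london"},
}

path = route(contacts, groups, "alice", "mr_x")
print(path)
```

Alice has never heard of Mr X's contacts, but because Carol is in finance the letter heads in the right direction from the first hop.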
Filippo Menczer at Indiana University is carrying out research in this area. His list of papers is online, and a quick surf through them makes interesting reading.
For example in Topical Web Crawlers: Evaluating Adaptive Algorithms (PDF) Menczer et al describe "topical crawlers" (emphasis mine):
Topical crawlers (also known as focused crawlers) respond to the particular information needs expressed by topical queries or interest profiles. These could be the needs of an individual user (query time or online crawlers) or those of a community with shared interests (topical or vertical search engines and portals). Topical crawlers support decentralizing the crawling process, which is a more scalable approach...An additional benefit is that such crawlers can be driven by a rich context (topics, queries, user profiles) within which to interpret pages and select the links to be visited.
In other words, the crawler can get away with indexing fewer pages as it's guided to the most relevant material by other cues. The paper Search Engine-Crawler Symbiosis: Adapting to Community Interests describes how a community search engine can improve web crawler performance, and vice versa, through learning the community's interests.
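The general idea of being "guided by other cues" can be sketched as a best-first crawl. To be clear, this is not one of the algorithms evaluated in Menczer's papers, just my own simplification: the toy `pages` data, the word-overlap `score` and the `topical_crawl` function are all assumptions, and a real crawler would fetch pages over HTTP rather than look them up in a dictionary.

```python
import heapq


def score(text, topic_words):
    """Fraction of the words in `text` that are topic words."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in topic_words for w in words) / len(words)


def topical_crawl(pages, seed, topic_words, budget):
    """Best-first crawl over an in-memory 'web'.

    pages: url -> (page text, {linked url: anchor text})
    Each link is queued with a priority based on how topical the
    linking page and the anchor text are. Only `budget` pages are
    visited, always taking the most promising link next.
    """
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so negate the scores
    seen = {seed}
    visited = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        text, links = pages[url]
        visited.append(url)
        page_score = score(text, topic_words)
        for link, anchor in links.items():
            if link in pages and link not in seen:
                seen.add(link)
                priority = page_score + score(anchor, topic_words)
                heapq.heappush(frontier, (-priority, link))
    return visited


# Made-up pages for illustration.
pages = {
    "start": ("my links page",
              {"semweb": "foaf and rdf notes", "cats": "cat pictures"}),
    "semweb": ("notes on foaf and rdf", {"foaf-spec": "the foaf vocabulary"}),
    "cats": ("pictures of my cats", {"more-cats": "more cats"}),
    "foaf-spec": ("foaf rdf vocabulary", {}),
    "more-cats": ("even more cats", {}),
}

crawled = topical_crawl(pages, "start", {"foaf", "rdf"}, budget=3)
print(crawled)
```

With a budget of three pages the crawl never touches the cat pictures, which is the whole point: fewer pages indexed, but the right ones.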
It seems to me that FOAF could play a role here: rather than relying solely on machine learning to discover information about a document and community interests, one could explicitly gather that data from the aggregated FOAF descriptions of that community.
E.g. Using a FOAF description one can not only determine the interests of a person linking to a given document, but one can also determine the interests of the author of that document, assuming there are appropriate links from the HTML to the FOAF description (cf: autodiscovery, and I Made This). There's even a grouping mechanism that can further help search agents to adapt their paths, using the same technique as Milgram's test subjects. Again the algorithm seems natural: if you want to learn more about a particular topic or event, you'd start looking at the websites, documents and blogs of people you know are interested in that area.
Cool stuff. Just wish I had the maths to understand it all properly!
I've had half an eye on Share Your OPML as another possible source of FOAF data. I've been scraping and converting various data sources to help facilitate interesting applications where possible. For instance this week I published some converted Freshmeat data, and in the past I've done various OPML conversions, e.g. for BloggerCon.
Making as much data available as possible also helps resolve specification issues, as we can then explore the pros and cons of various ways of modelling the data, especially in the light of implementation experience.
For example there's an open issue on the Wiki concerning subscriptions, and related to this there's been some interesting work happening on community aggregations using FOAF group data. Planet RDF uses FOAF as its data source, and while I'm not sure about Planets Apache and Gnome they're certainly publishing the data in this format.
So with this in mind I took a look at the Share Your OPML SDK. Unfortunately I can't convert it to any other format as the documentation notes that you must not:
...convert the data into a format other than OPML, for redistribution, it's likely we'll say yes, but you must ask first. We want the data to be useful to you, but we also want to create an installed base of compatible data to encourage others to follow. We've learned that it's necessary to say basically that you can't use this data to thwart the purpose of our project. We wish it weren't this way, but it is, so we have to say it.
Strikes me as a bit overly restrictive but I wouldn't want to be accused of not playing nicely, so I've asked. Perhaps there should be a preference which says "I'd like to share my data, no matter what the end format is"?
Let's see what happens...
Update: what happened was that I'm not allowed to convert the file to FOAF. As Edd notes in the comments, my request was deleted. I'm updating this posting to record that fact. I've little more to say on the matter other than to note that the best way you can share your data, whatever format it's in, is to simply publish it on the web: there's no need for a central service as people can discover your data in a number of different ways.
It also seems to me that if I use a service under the impression that I'm doing so under a Creative Commons share-alike licence, then I expect that service to offer the same terms. After all, the licence notes that: "If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one." So any site that publishes this data, e.g. by aggregating it with that of others, should make the data freely available, and it should be legal to transform that data so long as the transformed version is provided under the same licence.
Tim Bray's Two Laws of Explanation are good reading. I've tried to subscribe to these myself wherever possible, and especially in my writing.
Of course these aren't new formulations, as they're really a modern version of John Locke's tabula rasa concept. See for example Some Thoughts Concerning Education:
One of the aspects of this philosophical view was the concept of people being born as "tabula rasa": a blank sheet, which was gradually filled in by experience. This may explain why Locke considered education an important activity that deserved careful consideration: education meant helping to fill that blank with knowledge and morals. Which in turn meant that the educator ought to take care to further such knowledge and morals, as would be useful both for the pupil himself and for the community as a whole.
We had a cracking storm this afternoon: the sky went dark, thunder, lightning, the full monty. It didn't last that long, but shortly afterwards I heard some bangs and thuds from outside as it started to hail. Hard.

What you can see here fell in the space of a couple of minutes -- the time it took me to pick up the camera and go to the back door. I didn't dare venture outside as the hailstones were the size of pennies; literally:

I think I'd have had a very sore head if I'd gone outside. The noise from the old plastic roofing we've got on our out-house was deafening; I thought it was going to come down around my ears.
The storm didn't last long though, and there were no further incidents throughout the rest of the afternoon. I was just catching up with the news (I've had my head in my laptop writing all day) when I learned there was a tornado spotted on the Bristol Channel this afternoon. Seems it was seen around 1.30pm. The timestamp on my pictures was 1.53pm, so I'm guessing the dark skies, thunder and lightning must have roughly coincided with the tornado, with the hail following shortly thereafter.
I was taken to task by my mother over Xmas. She'd been browsing my website during her lunch hour and had neglected to find any new photos, and precious few of her latest grandchild.
After setting aside thoughts that I'd slipped into an issue of The Onion I realised she was right, and that those dozens and dozens of images I've taken with my spangly new digital camera really ought to be published somewhere.
But I don't want to do it half-heartedly; I want to publish as much metadata as possible along with the images themselves. There's lots of fun to be had with co-depiction and RDF annotation.
But I'm essentially a lazy person so want a really, really simple way to publish and annotate the photos. So far I've been able to think of two, each with its own merits.
JAlbum is a Java application that generates online photo albums using a simple scripting language and templating system. It's straightforward to customise it to spit out RDF instead of, or as well as, HTML.
As JAlbum already understands EXIF data, it'd also be easy to generate additional metadata taken from the JPEGs themselves.
The second option is to use Movable Type. This article, "Beyond The Blog", opened my eyes to a lot of ways to hack MT to be a more general CMS. Using such a hack to generate RDF would also be straightforward. Not quite as flexible as JAlbum but it could handle a good 80% of the functionality I want (descriptions, people, depicts).
Happily other people have been through a similar process and implemented RDF annotation solutions for both of these approaches. Phil Wilson has documented his RDF-skin for JAlbum; I've corresponded with Phil and suggested some improvements and alterations to the skin. Bryce Benton has also written up how he's doing RDF annotation with Movable Type.
I'm going to adopt the Movable Type approach: it's a tool I use pretty much daily and I think my wife would be happier using Movable Type for uploading and publishing the images. It's either that or a combination of JAlbum plus an SFTP session to upload every image and annotation.
So mum, if you're reading this, I've still not got any new photos online for you to look at, but at least I've selected the technology!
Some interesting discussion has been triggered by Jon Udell's comments on FOAF. I agree with Edd and Dan that FOAF is about more than social networking and have said as much here on several occasions. Personally I see two problems with FOAF, neither of them big.
Firstly the name causes people to adopt certain expectations about its intended usage, particularly with the general surge of interest (fad?) in social software. I certainly wouldn't advocate a name change but, as the exchange with Udell has demonstrated, we need to take care to present FOAF correctly.
The second problem is just about data. Because there is no central repository of FOAF data, it's harder to create FOAF applications: you either need to run a scutter yourself to collect up what's available, or generate FOAF out of the back-end of another site. Of course you can also hang out on #foaf and badger someone (e.g. Jim Ley or Matt Biddulph) to give you a data export; that's what I did.
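For anyone who hasn't met the term, a scutter is essentially a little crawler that hops from one FOAF document to the next via their "see also" links, merging what it finds. Here's a rough sketch of the idea. It's illustrative only: the `scutter` function, the `fetch` callback and the document layout are my own invention, and a real one would fetch and parse RDF rather than read from a dictionary. I've assumed people are merged on their hashed mailbox, and that each merged entry remembers which documents it came from.

```python
from collections import deque


def scutter(fetch, seed, max_docs=50):
    """Breadth-first walk over linked FOAF-like documents.

    fetch(url) is a hypothetical callback returning either None (the
    document couldn't be retrieved) or a dict of the form:
        {"people": [(mbox_hash, name), ...], "see_also": [url, ...]}
    Returns mbox_hash -> {"names": set, "sources": set}.
    """
    queue = deque([seed])
    seen = {seed}
    people = {}
    fetched = 0
    while queue and fetched < max_docs:
        url = queue.popleft()
        doc = fetch(url)
        if doc is None:
            continue  # broken link: skip it and carry on
        fetched += 1
        for mbox_hash, name in doc["people"]:
            entry = people.setdefault(
                mbox_hash, {"names": set(), "sources": set()}
            )
            entry["names"].add(name)
            entry["sources"].add(url)  # remember where each claim came from
        for link in doc["see_also"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return people


# Made-up documents standing in for FOAF files on the web.
docs = {
    "a.rdf": {
        "people": [("hash-alice", "Alice"), ("hash-bob", "Bob")],
        "see_also": ["b.rdf", "missing.rdf"],
    },
    "b.rdf": {
        "people": [("hash-bob", "Bob B.")],
        "see_also": ["a.rdf"],
    },
}

found = scutter(docs.get, "a.rdf")
print(sorted(found))
```

Even in this toy version you can see the awkward questions appear: Bob turns up in two documents under two different names, and it's only the recorded sources that let you decide which to believe.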
I firmly believe that playing with the FOAF data that's out in the wild will generate the most interesting applications, and provide essential implementation feedback on the vocabulary itself.
So I'm going to try encouraging folk to regularly and visibly publish the results of their scutter runs. An "official" data set hung off the FOAF homepage would also be useful. This should hopefully encourage the development of more FOAF applications.
Incidentally I mentally classify those applications as follows:
For me this classification separates out some of the implementation issues: a FOAF-consuming application doesn't typically have to worry about attribution, trust, etc. The data is coming from a limited number of sources. FOAF-gathering applications have to deal with a much more difficult set of problems.
Isn't it always the way that no matter how early you get into work, there's always one bright-eyed smug git who's there before you, eyeing their watch and pulling the universal facial expression of "call THIS early? I've been in for hours mate?!".
Well it's 6am and I'm in work. Not the usual state of affairs; I'm here to shepherd in an application release. But today it was Murphy who got in before us, as the development machine holding the release scripts is down. Nice eh? Thank god for back-ups.
Doesn't bode well for the rest of the day though. Think I'm going to find a cupboard to hide in.
Anyway, hello from 6am. It's dark.