If anyone were to ask me to write a corporate blogging policy, this would be it: "Disclose or disclaim, but don't stay silent".
Disclosure is very important since it helps the reader make up their own mind. It helps them judge the intent behind the author's words.
Here is the best example of disclosure I have ever seen. It was published in the Dear Mary section of The Spectator magazine, where readers ask Mary Killen for help solving their problems. Someone wrote in to ask what to do when the person in front of you in an aeroplane reclines their seat, leaving you no space of your own. A fellow reader offered this advice:
"May I suggest you advise J.B. of London N1 that next time he is travelling long-haul he should fly Cathay Pacific, whose economy-class seats have a rigid back-shell which does not recline into the space of the passenger behind. The recline is achieved by the seat ingeniously sliding forward instead. Cathay Pacific also has four flights daily from London to Hong Kong, and up to two daily onward flights to New Zealand — all at very competitive fares! I apologise for the commercial, but we’ve gone to great trouble to prevent just the problem JB complains of, and I can’t resist the opportunity to point this out. The problem you could solve for me is the collapse of air-travel demand! Any bright ideas?
T.T., Cathay Pacific"
Even though this is an advert for Cathay Pacific, the full disclosure by T.T. lets you make up your own mind whether to accept it or not. There is no hidden agenda and the facts speak for themselves.
Thursday, July 23, 2009
How to archive the web
The following are my notes and thoughts from the Web Archiving Conference held at the British Library on July 21st, 2009.
The meeting was organized jointly by JISC, the DPC and the UK Web Archiving Consortium, and attracted more than 100 participants. It was chaired by William Kilbride, Executive Director of the DPC, and Neil Grindley, programme manager for digital preservation at JISC. The presentations are available here.
Adrian Brown of UK Parliamentary Archives raised the interesting issue of how to preserve dynamic websites, ones that personalize on the fly. If every page on a website is individually created per user, then what version do you archive?
He also talked about versions across time. For instance, what is the best way to archive a wiki? Take a snapshot every so often, or archive the full audit trail? Versioning is also an issue when a site is harvested over a period of time, because there is a chance the site has been updated between harvests. He called this a lack of temporal cohesion, or temporal inconsistency.
Someone from the BBC noted that "the BBC used to only record the goals in football matches and not the whole match". Now they realize how stupid this was; we should avoid the same pitfall of applying too much collection decision-making to archiving. This touches on one of the main issues facing web archivists: what to collect and what to discard? Most seem to make this decision on pragmatic grounds: do we have permission to crawl or archive? How much budget do we have? Do we have a mandate to collect a particular domain?
It strikes me that this is only a problem when there is a single collection point. The reality is that all sorts of people all over the world are archiving the web from multiple perspectives at the same time. If enough people and organizations do this, then all of the web will be archived somewhere, sometime. So, for instance, if there were a referees' foundation archiving football matches for training purposes, and a football coaching organization, and the two clubs playing, then it wouldn't matter that the BBC only saved the goals. The problem was that the BBC were the only ones filming the matches: a single collection point.
This touches on another main issue: the relationship between the content creator and the archivist. More on that later.
Peter Murray-Rust was quoted several times during the meeting. This is intriguing since he mostly seems to advocate against building digital archives, which he thinks are effectively impossible and a waste of time. Instead, we should disseminate data as widely as possible; if people are interested enough, they will take copies somehow. Or as he puts it: "Create and release herds of cows, not preserve hamburgers in a deep-freeze". The wider point here is that web archives should be part of the web themselves rather than hidden away in offline storage systems.
Another big issue here: access. If the archive is fully accessible, then how do you know whether what you find through Google is the archived version or the live version? And suppose there were multiple copies of the entire web, archived by different institutions, all accessible at the same time? Sounds like chaos to me. A chaos that only metadata can solve. Or so it seems to me.
I think it would help if there were metadata standards for archived websites. There could be a minimum set of data that is always recorded along with the archived contents. Archives could then be made interoperable, either by using the same metadata schema or by exposing their metadata in some sort of data dictionary that is addressable in a standard way. If the standards were adhered to, it would be possible to de-duplicate archived websites and easily identify the "live" version. It would also be easy to keep track of the versions of a website across time, so that a single link could resolve to the multiple versions in the archive.
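To make this concrete, here is a minimal sketch of what such a record might contain. The field names, the example URL and the choice of Python are entirely my own invention, not any existing standard; the point is simply that a content hash plus a capture timestamp would be enough to de-duplicate copies held by different institutions and to order versions across time.

```python
import hashlib
import json

def make_archive_record(url, capture_time, archive_id, content):
    """Build a minimal, hypothetical metadata record for one capture.

    The content hash lets two archives recognise that they hold
    identical captures of the same page, which is the basis for
    de-duplication; the timestamp orders versions across time."""
    return {
        "original-url": url,                 # where the page lives on the live web
        "capture-timestamp": capture_time,   # when this copy was harvested (ISO 8601)
        "archive-id": archive_id,            # which institution holds this copy
        "content-sha256": hashlib.sha256(content).hexdigest(),
    }

record = make_archive_record(
    url="http://example.org/index.html",
    capture_time="2009-07-21T10:00:00Z",
    archive_id="uk-web-archive",
    content=b"<html>...</html>",
)
print(json.dumps(record, indent=2))
```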
Kevin Ashley made the point that we should not only collect the contents of the web, but also content about the web, if future generations are to make sense of the archive. One simple example is the words used in the websites archived today. Perhaps we need to archive dictionaries along with the content, so that 100 years from now people will know what the content means.
There seems to be a consensus in the web archiving community to use the WARC format to capture and store web pages. As I understand it, this is a format for packaging and compressing the data, including embedded images, PDFs, videos and so forth. When a record is accessed, it is presumably unpacked and delivered back as web pages. But what if the embedded file formats are no longer compatible with modern operating systems or browsers? One answer to this problem is to upgrade the archive files to keep pace with new software releases. Presumably this means unpacking the WARC file, converting the embedded formats to the new versions, and then repacking.
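For the curious, here is a rough sketch of what reading a WARC file looks like in practice, using the open-source warcio library for Python (my choice of tool, not one discussed at the meeting; the file name is made up). It lists each captured response with its original URL and MIME type:

```python
from warcio.archiveiterator import ArchiveIterator

# List every captured page in a (hypothetical) WARC file, together
# with its original URL and MIME type. The payloads (HTML, images,
# PDFs, ...) are stored as verbatim bytes, which is exactly why
# format obsolescence matters: the archive preserves the bytes,
# not the ability to render them.
with open("example.warc.gz", "rb") as stream:
    for record in ArchiveIterator(stream):
        if record.rec_type == "response":
            url = record.rec_headers.get_header("WARC-Target-URI")
            mime = record.http_headers.get_header("Content-Type")
            payload = record.content_stream().read()
            print(f"{url} ({mime}, {len(payload)} bytes)")
```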
Jeffrey van der Hoeven believes that emulation is a solution to this problem. He is part of the project team that developed the Dioscuri emulator. He is currently working to provide emulation as a web service as part of the KEEP project.
If you would like to dig into the history of browsers, go to evolt.org, where you'll find an archive of web browsers, including the one Tim Berners-Lee built in 1991, called simply "WorldWideWeb".
Probably the single biggest issue facing web archivists is permissions. Obtaining permission to crawl and archive is time-consuming and fraught with legal complications. Large institutions like the British Library take great care to respect the rights of content creators; as a result, UKWAC is unable to harvest up to 70% of the sites it selects. Others operate a remove-upon-request policy. Edgar Cook of the National Library of Australia reported that they have decided to collect even without permission; they simply keep the content dark unless permission to archive is granted. Edgar challenged the group: "are we being too timid? - hiding behind permissions as an explanation for why archives cannot be complete". Several people noted that it was difficult to reach out to content creators; Helen Hockx-Yu said "communication with content creators is a luxury".
I wonder if this is perhaps the most important issue of all: connecting the creator to the archivist. It seems to me that, to be successful, both need to care about digital preservation. I think Edgar Cook is right: the danger in hiding behind permissions, or in hoping for strong legal deposit legislation, is that it avoids the issue. Content creators need to understand that they have a part to play in keeping their own work accessible for future generations, and archive organizations have a big role to play in helping them understand that. For instance, archives could issue badges for content creators to place on their websites to show that their work has been considered worthy of inclusion in an archive.
Kevin Ashley set me thinking about another idea. Suppose there were a simple self-archiving service that anyone could use for their own digital content. In return for using this tool, content creators would agree to donate their content to an archive. It would be a little like someone donating their personal library or their collection of photos upon their death, except this would be a living donation: archiving as the content is created, in a partnership between creator and archive. Mind you, I am sure that a simple self-archiving tool would be anything but simple to create.
Indeed, it is clear that web archiving is not at all easy. There are lots of questions, problems, issues and challenges, and this meeting highlighted many of them. Unfortunately, there don't seem to be many answers yet!
Monday, July 6, 2009
Lessons from bagged salad
I attended a webinar last week given by Professor Ranjay Gulati of Harvard Business School. One of the great examples he spoke about was, of all things, bagged salad. Bagged salad is still one of the fastest-growing food retail product lines, despite the E. coli scares of recent years. The convenience factor has been lauded by chefs and nutritionists alike for popularizing salads. In short, it is a great case study in innovation.
Professor Gulati told us that it was not the lettuce growers who came up with this idea. "How did they miss this?", he asked. "How did they not see the bagged opportunity coming?" His answer was that they were too busy asking their customers how good their lettuce tasted. Too busy with their Salad Net Promoter Scores.
The message to the audience was plain. Don't be blinded by your existing business. Don't rely on metrics that measure how satisfied your customers are with your existing products. If you do, you risk missing opportunities around your product. Study how your customers use your products, and always be on the lookout for new ways to deliver or package what you produce.
I think this is very valuable advice. Learning how your customers use your products is a great way to discover new opportunities for product development.
I was intrigued by the bagged salad, though. I mean, putting food in bags seems really obvious. How could anyone not see that coming, however lettuce-obsessed they were? So I turned to Google to find out how bagged lettuce was invented.
It turns out that (of course) if you put pieces of lettuce in an ordinary plastic bag it will rot very quickly. Fine for the trip home from the grocer, no good for shipping and storage. Once lettuce is cut, it consumes oxygen and gives off carbon dioxide, water and heat. Left in the open air it will consume oxygen until it rots away completely. Keeping lettuce fresh requires a bag that will regulate the oxygen and carbon dioxide levels inside the bag.
I could not find out who first had the idea, but people were experimenting as far back as the 1960s. Nothing worked well enough. Lettuce is very sensitive to wilting, and the plastic bag technology just wasn't good enough. It worked for delivery to fast food chains, but shelf life was too short for retail sales channels.
Not until 1980, that is. It took almost twenty years to develop a plastic film that was breathable and that could be made into bags by machine. Along the way they also learned how to fill the bags with nitrogen to lower the oxygen level and extend shelf life.
Twenty years of research and development to make the idea of bagged salad real.
I like this untold part of Professor Gulati's story. It is similar to the story of Personalized M&M's that I have written about before: the same dogged determination to figure out how to solve the problem, the same kind of technology breakthrough that finally made it possible, and the same belief that the problem could be solved, no matter what people said.
Just in case you are thinking that bagged salad has nothing at all to do with publishing, let me remind you of how it all started. The breakthrough that finally brought printing to the world was the availability of cheap paper. Before that, the printing press was an academic experiment: what use was a cheap way to print if paper was prohibitively expensive? If Gutenberg or Caxton were alive today and working in the corporate world, their ideas would never make it past a Dragons' Den, let alone a business case review board!
There's no doubt in my mind that customer insight is key to innovation. Seeing things that customers think or do that no-one else has seen before. Realizing that people like to eat lettuce, but that many find it a pain to wash and prepare. We have great techniques nowadays, such as ethnographic studies, to help us do this, and user experience experts to apply them.
That only takes us so far, though, as the M&M people found out, the lettuce folks learnt, and Gutenberg discovered as well. Believing that your team can solve something no-one else has ever solved before is at least as important as the insights that led you to see the problem in the first place.
Never forget that innovation requires dogged determination and sheer hard work. Or as Guy Kawasaki wrote in Rules for Revolutionaries: "create like a god, command like a king, and work like a slave".
Thursday, July 2, 2009
Innovation is a Risky Business
I have heard many, many senior executives talk about failure. "We have to become more tolerant of failure", they say, or "we have to learn to fail", or "fail often, fail early". And yet, when things do go wrong, their first questions are often: "Who is responsible for this? Who's accountable? Who is to blame?" Well-intentioned project post-mortems turn into blamestorming sessions.
And I don't blame them!
It's human to think like this. My first reaction when things go wrong is to blame myself. I've tried telling myself that I should be more tolerant of my own failings but somehow I don't listen to myself.
Nowhere is this struggle more intense than in innovation. You must try new ideas out to see if they work. Sometimes, despite all the research, you only know that the idea will work after launching. Innovation is a risky business.
I don't think we have to learn how to fail. I think we have to learn how to understand risk and how to mitigate it, how to manage it.
Here's an example. Some years ago I was working with someone from Accenture. His previous assignment had been as a Product Manager with (if memory serves me correctly) Vodafone. He had headed a new product development that, upon launch, was not as successful as had been hoped, and it was discontinued soon afterwards.
Vodafone had been very clever. They had assessed the risk of this particular idea and decided that it was high. Too high to risk assigning one of their own Product Managers to lead the development: if it were unsuccessful, it would be highly career-limiting for that person. In their company culture, a track record of success was important for building a career. So they hired a contractor instead.
It turns out this was normal practice for their product development group. Risky new projects were handled by contractors, less risky ones by their own staff. If a risky product ended up being successful, they either hired the contractor or replaced them with one of their own staff to take it forward in the life cycle.
This was how they managed innovation risk. It seems to me to be a lot easier than trying to change their passion-for-winning culture.