Tuesday, January 26, 2016

Online news archives

A mid-career journalist recently bemoaned that he should have saved copies of his works. Many of his stories published online are not there anymore.

For whatever reason, Republica edited out some portions of this piece I wrote for the newspaper. To be fair to my sources and interviewees (at least one missing in the published piece), and in the spirit of dispassionate media criticism, I have reproduced both versions of the article below.

Unedited article submitted to Republica:

Online News Archives 
Dharma Adhikari
Online archives of news outlets form a key part of what French historian Pierre Nora calls the “mémoire prothèse”, artificial memory. It is a giddying array of data, stored or scattered across the internet.

Compare this ‘material’ site of memory (“cloud” storage) to something ethereal—the aakashik records of antiquity, a cosmic memory bank that recorded every moment of every being in existence. Our ancient sages in the Himalayas, as the myth goes, were able to access it whenever they liked to.

Accessing online archives of Nepali media outlets is something different. You are often greeted with the “link rot” message: Page Not Found. The Web is so humongous, ephemeral, and unreliable. And, just like journalism, it has an unconditional bias for the perpetual now. The past is something for historians.

But historians are often more interested in a distant past than the more recent past. Without the journalists’ complete “first draft”, historians will be clueless. Link rot, and the practice of deleting or overwriting pages, trouble many academics, professionals and general users.

Studies have shown that URL citations in scholarly papers often don’t work for long. The average lifespan of a webpage is estimated to be around one to three months. Journalists themselves are afflicted. A mid-career journalist recently bemoaned that he should have saved copies of all his works. Many of his stories were once published online, but they are not there anymore.

Karl-Heinz Krämer, professor at the South Asia Institute, University of Heidelberg in Germany, has, since 1998, maintained one of the most comprehensive websites linking stories from Nepal: nepalresearch.com (alternatively .org). Link rot “is a big problem, indeed”, he observes. “I invest a lot of time in updating my website. So I simply do not have the time to check for invalid links”. Luckily for him, he saves PDFs of articles, links and references on his hard disk, which is searchable.

Mark Turin, Chair of the First Nations and Endangered Languages Program at the University of British Columbia, Canada, and Director of the Digital Himalaya Project, believes link rot is inevitable. “The key issue is that the content is stable, and findable, through search engines. Links can always be updated, but as long as the content can be located, how one gets there is secondary”.

For Krämer, linking sources online is useful; it is a lot faster than going through print books or public libraries. And library catalogues in Nepal sometimes have gaping holes in their records of historical news. Turin notes the ease of access and democratic potential of online archives. But users need internet connections, technical and language skills, or sometimes subscriptions to realize this potential, he observes.

In spite of the hype that the Web offers everything instantly, online news archives of Nepali outlets are often poorly maintained, incomplete and inaccessible. Go search online for major stories published in our publications some 10-15 years back: there is little to be found.

It’s not clear how many issues of Gorkhapatra, the country’s oldest newspaper, are online; the archive page offers the option of downloading a PDF copy, but the download time will test your patience. Its sister-publication, The Rising Nepal, maintains issues since June 2014, with copies missing in between.

The Annapurna Post archive goes back to Sept 2014 and Rajdhani maintains e-copies since April 2014. The Himalayan Times archive includes the option to navigate through a wide range of years, and actual copies in PDF are available for the past week. Republica and Nagarik maintain archives since 2011, with a gap between 2009 and 2010. Before their recent revamping of websites, the Kantipur Media Group (KMG) maintained copies until 2006; now you can access two weeks’ worth of e-copies of its daily newspapers.

Digital-only outlets usually bury their content too deep, and search features are often disappointing. The online archives of broadcast media are patchy; many now upload selected clips on YouTube, but not consistently.

Archival retention is an issue. Saving a copy every single day, as Krämer does, is a way to ensure that you have access to print content that matters to you. If newspapers cannot maintain a complete archive, then perhaps they may opt for a third-party host. The weekly Nepali Times, for example, maintains a complete archive of its copies through the Digital Himalaya Project, headed by Turin.

The problem of access is not just with the PDF versions, but also with webpage stories with unique URLs. It is common for newspapers to delete or overwrite pages, without editorial clarifications: either to defend their reputation, appease newsmakers, or to rectify technical errors. As a prominent example, in 2014, a number of outlets deleted their misrepresented stories on the Dil Shobha episode. If a future historian decides to study the case, she will have to rely on internet archives, such as the Wayback Machine. But even such memory projects do not maintain a complete record of online publications. Their robots crawl our websites only a few times a month whereas they scan news websites of major world publications several times a day.

A change in domain name or the server also results in massive link rot. Leaving out metadata, such as date or author’s name in an individual story or a caption on an image, is a big problem for researchers, aggregators and archivists who need to contextualize information specifically. For example, The Rising Nepal’s individual story pages almost always disappoint us: they usually come without dates, and even without the year. A story may be retrieved successfully, but it is arduous, or even impossible, to trace when it was published.

A vast majority of less-known Nepali news outlets do not maintain any archive at all. For them, the Web appears to be a sand mandala: why work hard on retaining our impermanent imprints? Meanwhile, some well-known publications are actually upgrading their archives.

KMG is working on a complete archive to include all editions of its outlets since their beginnings. “It is going to take time; we are short of manpower and there are also technical issues”, says Yangesh Raj, online coordinator at ekantipur.com. He believes missing stories or copies cost readers ease of access, and they also hurt the publications’ credibility.

Easy access to useful and authentic information is a key concern to everyone, including the users. But the producers, aggregators, archivists and researchers face additional issues of appraising, processing and selecting information, acquiring it, organizing and storing, creating metadata and descriptions about the content itself, and making it accessible and usable for the users.

That is a long list of tasks, especially for outlets that lack resources and the will to keep pace with ever-advancing technologies. And as Turin points out, the linguistic diversity of our press also poses an additional challenge in developing and maintaining strong digital archives. “It would be tragic if the creativity expressed in Nepal's free media and news platforms were lost due to poor archiving standards”, he observes. His advice: have a documentation officer on staff; ensure that you have more than one backup, in keeping with the LOCKSS principle (Lots of Copies Keep Stuff Safe).

While Turin emphasizes endurance, consistency and longevity of data, Krämer suggests effective website oversight: don’t change links as often as done in the past; make sure that older links do not get broken or deleted; don’t forget to upload PDF copies every now and then, as is the case with some outlets; improve search options of the archives. Unlike Nagarik, according to Krämer, the e-paper version of Republica is extremely complicated to read, copy and download. He wonders why the publishers differentiate like that between these two newspapers.

That is the type of question many other users are asking of news websites as their archives emerge as compelling sites of modern memories. And, let’s not forget completeness, another key attribute of any aakashik records.

The following was published (26 January, 2016) with Republica editorial judgement:



Missing links
DHARMA ADHIKARI

A mid-career journalist recently bemoaned that he should have saved copies of his works. Many of his stories published online are not there anymore.

Online archives of news outlets form a key part of what French historian Pierre Nora calls the "mémoire prothèse", artificial memory. It is a giddying array of data, stored and scattered across the internet.

Compare this 'material' site of memory ("cloud" storage) to something ethereal: the aakashik records of antiquity, a cosmic memory bank that recorded every moment of every being in existence. Our ancient sages in the Himalayas, as the myth goes, were able to access it whenever they liked.

Accessing online archives of Nepali media outlets is something different. You are often greeted with the "link rot" message: Page Not Found. The Web is so humongous, ephemeral, and unreliable. And, just like journalism, it has an unconditional bias for the perpetual now. The past is something for historians.

But historians are often more interested in a distant past than the more recent past. Without the journalists' complete "first draft", historians will be clueless. Link rot, and the practice of deleting or overwriting pages, trouble many academics, professionals and general users.

Studies have shown that URL citations in scholarly papers often don't work for long. The average lifespan of a webpage is estimated to be around one to three months. Journalists themselves are afflicted. A mid-career journalist recently bemoaned that he should have saved copies of all his works. Many of his stories were once published online, but they are not there anymore.

Karl-Heinz Krämer, professor at the South Asia Institute, University of Heidelberg in Germany, has, since 1998, maintained one of the most comprehensive websites linking stories from Nepal: nepalresearch.com (alternatively .org). Link rot "is a big problem, indeed", he observes. "I invest a lot of time in updating my website. So I simply do not have the time to check for invalid links". Luckily for him, he saves PDFs of articles, links and references on his hard disk, which is searchable.

Mark Turin, Chair of the First Nations and Endangered Languages Program at the University of British Columbia, Canada, and Director of the Digital Himalaya Project, believes link rot is inevitable. "The key issue is that the content is stable, and findable, through search engines. Links can always be updated, but as long as the content can be located, how one gets there is secondary".

For Krämer, linking sources online is useful; it is a lot faster than going through print books or public libraries. And library catalogues in Nepal sometimes have gaping holes in their records of historical news. Turin notes the ease of access and democratic potential of online archives. But users need internet connections, technical and language skills, or sometimes subscriptions to realize this potential, he observes.

In spite of the hype that the Web offers everything instantly, online news archives of Nepali outlets are often poorly maintained, incomplete and inaccessible. Go search online for major stories published in our publications 10-15 years ago: there is little to be found.

It's not clear how many issues of Gorkhapatra, the country's oldest newspaper, are online; the archive page offers the option of downloading a PDF copy, but the download time will test your patience. Its sister-publication, The Rising Nepal, maintains issues since June 2014, with copies missing in between.

The Annapurna Post archive goes back to Sept 2014 and Rajdhani maintains e-copies since April 2014. The Himalayan Times archive includes the option to navigate through a wide range of years, and actual copies in PDF are available for the past week. Republica and Nagarik maintain archives since 2011, with a gap between 2009 and 2010.

Digital-only outlets usually bury their content too deep, and search features are often disappointing. The online archives of broadcast media are patchy; many now upload selected clips on YouTube, but not consistently.

Archival retention is an issue. Saving a copy every single day, as Krämer does, is a way to ensure that you have access to print content that matters to you. If newspapers cannot maintain a complete archive, then perhaps they may opt for a third-party host. The weekly Nepali Times, for example, maintains a complete archive of its copies through the Digital Himalaya Project, headed by Turin.

The problem of access is not just with the PDF versions, but also with webpage stories with unique URLs. It is common for newspapers to delete or overwrite pages, without editorial clarifications: either to defend their reputation, appease newsmakers, or to rectify technical errors. As a prominent example, in 2014, a number of outlets deleted their misrepresented stories on the Dil Shobha episode. If a future historian decides to study the case, she will have to rely on internet archives, such as the Wayback Machine. But even such memory projects do not maintain a complete record of online publications. Their robots crawl our websites only a few times a month whereas they scan news websites of major world publications several times a day.

A change in domain name or the server also results in massive link rot. Leaving out metadata, such as date or author's name in an individual story or a caption on an image, is a big problem for researchers, aggregators and archivists who need to contextualize information specifically. For example, The Rising Nepal's individual story pages almost always disappoint us: they usually come without dates, and even without the year. A story may be retrieved successfully, but it is arduous, or even impossible, to trace when it was published.

A vast majority of less-known Nepali news outlets do not maintain any archive at all. For them, the Web appears to be a sand mandala: why work hard on retaining our impermanent imprints? Meanwhile, some well-known publications are actually upgrading their archives.

Easy access to useful and authentic information is a key concern for everyone, including the users. But the producers, aggregators, archivists and researchers face additional issues of appraising, processing and selecting information, acquiring it, organizing and storing, creating metadata and descriptions about the content itself, and making it accessible and usable for the users.

That is a long list of tasks, especially for outlets that lack resources and the will to keep pace with ever-advancing technologies. And as Turin points out, the linguistic diversity of our press also poses an additional challenge in developing and maintaining strong digital archives. "It would be tragic if the creativity expressed in Nepal's free media and news platforms were lost due to poor archiving standards", he observes. His advice: have a documentation officer on staff; ensure that you have more than one backup, in keeping with the LOCKSS principle (Lots of Copies Keep Stuff Safe).

While Turin emphasizes endurance, consistency and longevity of data, Krämer suggests effective website oversight: don't change links often; make sure older links do not get broken or deleted; don't forget to upload PDF copies every now and then, as is the case with some outlets; improve search options of the archives. Unlike Nagarik, according to Krämer, the e-paper version of Republica is hard to read, copy and download.

That is the type of question many other users are asking of news websites as their archives emerge as compelling sites of modern memories. And, let's not forget completeness, another key attribute of any aakashik records.

Published in Republica, 26 January, 2016