
1909:"Digital Resource Lifespan"

Posted: Mon Oct 30, 2017 4:21 pm UTC
by fearless_fool

https://xkcd.com/1909/

Mouse-over text reads

I spent a long time thinking about how to design a system for long-term organization and storage of subject-specific informational resources without needing ongoing work from the experts who created them, only to realize I'd just reinvented libraries.


This reads exactly like the mission statement of the Long Now Foundation (http://longnow.org/) - a really worthwhile group.

Re: 1909:"Digital Resource Lifespan"

Posted: Mon Oct 30, 2017 4:32 pm UTC
by wumpus
How long will it take to get a PDF variant that won't include JavaScript (or Flash and EXE exploit holes)? I've been fond of the format since .ps files were available on FTP, but can't recommend an Adobe format for "digital resource lifespan".

PS: what does it take to replace microfilm lamps with high-lifespan LEDs?

Re: 1909:"Digital Resource Lifespan"

Posted: Mon Oct 30, 2017 4:34 pm UTC
by Heimhenge
And why does Randall think that the PDF format will be eternal? I have some older PDFs that can still be opened with the latest Adobe Reader, but the formatting is all messed up and embedded graphics show up as gray boxes. How long till they're completely broken?

Re: 1909:"Digital Resource Lifespan"

Posted: Mon Oct 30, 2017 4:46 pm UTC
by Rombobjörn
Now that PDF is an open standard with multiple independent implementations it has a much better chance of being readable in the future than some corporation's proprietary file format.

This is pure speculation, but if those older PDF files are from the time when PDF was a proprietary format, then perhaps they don't conform to the standard?

Re: 1909:"Digital Resource Lifespan"

Posted: Mon Oct 30, 2017 5:05 pm UTC
by ObsessoMom
Um, has Randall visited a university library recently? He might be alarmed to find that a lot of those once-reliable physical resources (books, microfilm) have been moved to offsite storage or de-accessioned, after they've been digitally scanned.*

*Badly. By inattentive and underpaid students who didn't much care if a few pages were stuck together or folded here and there. Or if non-Latin alphabets got digitally approximated to the closest Latin alphabet equivalent, so that the result is unintelligible in any language.

Re: 1909:"Digital Resource Lifespan"

Posted: Mon Oct 30, 2017 5:15 pm UTC
by golden.number
I play board games. A number of more recent board games now require an app to play. For example, Alchemists https://www.boardgamegeek.com/boardgame/161970/alchemists (essentially a board game about getting a science PhD, disguised as a game about alchemy). I've pointed out on board game forums that at some point the app will probably cease functioning when it is no longer maintained. But non-programmers constantly pooh-pooh that idea. They can't conceive of an app not being constantly maintained indefinitely.

Re: 1909:"Digital Resource Lifespan"

Posted: Mon Oct 30, 2017 5:18 pm UTC
by Mutex
Heimhenge wrote:And why does Randall think that the PDF format will be eternal?

I'm pretty sure he's just saying they currently still work, not that anything listed there will necessarily be eternal.

Re: 1909:"Digital Resource Lifespan"

Posted: Mon Oct 30, 2017 5:20 pm UTC
by Soupspoon
(Re: "Now PDF is an open standard…") It sounds like there is a need to insert Backwards Compatibility Mode kludges into modern open-source PDF libraries, to detect and account for the historical kinks. Ideally these would autodetect, fingerprinting the document format to work out which issues it might have, but maybe with a selectable "parse and view as if on FOO version…" option as well (a drop-down in a GUI implementation, a handler-object setting within the library code behind it). It sounds like it'd be a branched and interwoven patchwork of kludging measures, though, rather than something linear along a timeline. Imagine HTML rendering with "view as if IE2" alongside "view as Netscape Navigator 2" alongside "…NCSA Mosaic 2.1" and, of course, Lynx 2.3ish.
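
As a rough illustration of the "fingerprint, then pick a mode" idea, here is a minimal Python sketch. It relies only on the fact that a PDF file declares its version in a header comment such as `%PDF-1.4` near the start of the file. The function names, the mode names, and the 1.4 cut-off are all made up for the example, and a real fingerprint would probably also have to look at which software produced the file, since the declared version says little about actual quirks.

```python
import re

# A PDF header is a comment line such as "%PDF-1.4" near the start of the file.
_HEADER = re.compile(rb"%PDF-(\d+)\.(\d+)")


def declared_pdf_version(data: bytes):
    """Return (major, minor) from the header comment, or None if it is missing."""
    # Only the first 1024 bytes are searched, in case junk precedes the header.
    match = _HEADER.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def pick_compat_mode(data: bytes) -> str:
    """Hypothetical mapping from declared version to a rendering mode name."""
    version = declared_pdf_version(data)
    if version is None:
        return "unknown"
    # The (1, 4) threshold is an arbitrary assumption for illustration only.
    if version < (1, 4):
        return "legacy"
    return "modern"
```

A viewer built this way would call `pick_compat_mode` on the first bytes of the file and let the user override the result from the drop-down.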

Re: 1909:"Digital Resource Lifespan"

Posted: Mon Oct 30, 2017 5:38 pm UTC
by Justin Lardinois
Reminds me of the BBC Domesday Project.

Re: 1909:"Digital Resource Lifespan"

Posted: Mon Oct 30, 2017 5:52 pm UTC
by Heimhenge
Mutex wrote:
Heimhenge wrote:And why does Randall think that the PDF format will be eternal?

I'm pretty sure he's just saying they currently still work, not that anything listed there will necessarily be eternal.


I was interpreting his right-facing arrow to mean "forever into the future" as he implied with books and microfiche.

Re: 1909:"Digital Resource Lifespan"

Posted: Mon Oct 30, 2017 6:16 pm UTC
by Yu_p
Heimhenge wrote:
Mutex wrote:
Heimhenge wrote:And why does Randall think that the PDF format will be eternal?

I'm pretty sure he's just saying they currently still work, not that anything listed there will necessarily be eternal.


I was interpreting his right-facing arrow to mean "forever into the future" as he implied with books and microfiche.


Replace modern-day subjects by "senate protocols of the great city of Rome" and suddenly books (scrolls?) are mostly out-of-order too. It just takes longer.

Btw, isn't the standard only PDF/A? After my experience when submitting my diploma thesis, I'm not even sure if compliance with the standard is well defined, unless you assume a specific version of some proprietary checker.

Re: 1909:"Digital Resource Lifespan"

Posted: Mon Oct 30, 2017 6:58 pm UTC
by Ranbot
Anyone who has experienced a fire or flood in their home or workplace knows that having the physical resource doesn't guarantee its longevity either. I often have to research public documents for my job and I am often told that records prior to some date were destroyed by a fire/flood. Theft of physical resources can be a problem too. There's a place for both physical and digital information storage, particularly for the most important documents.

Heimhenge wrote:And why does Randall think that the PDF format will be eternal? I have some older PDFs that can still be opened with the latest Adobe Reader, but the formatting is all messed up and embedded graphics show up as gray boxes. How long till they're completely broken?

Apologies if this is obvious to you, but you might improve the lifespan of those older PDFs by flattening the [remaining] file and resaving it. Theoretically, :wink: that should make the file readable for as long as "PDF" is a thing.

Re: 1909:"Digital Resource Lifespan"

Posted: Mon Oct 30, 2017 9:44 pm UTC
by DanD
This is definitely something that Librarians and Archivists are deeply aware of, and deeply concerned about.

Re: 1909:"Digital Resource Lifespan"

Posted: Mon Oct 30, 2017 11:42 pm UTC
by qvxb
This is why we need more monks majoring in computer science.

Re: 1909:"Digital Resource Lifespan"

Posted: Mon Oct 30, 2017 11:46 pm UTC
by Mutex
That's probably quite a small intersection on a Venn diagram.

Re: 1909:"Digital Resource Lifespan"

Posted: Tue Oct 31, 2017 12:03 am UTC
by rmsgrey
golden.number wrote:I play board games. A number of more recent board games now require an app to play. For example, Alchemists https://www.boardgamegeek.com/boardgame/161970/alchemists (essentially a board game about getting a science PhD, disguised as a game about alchemy). I've pointed out on board game forums that at some point the app will probably cease functioning when it is no longer maintained. But non-programmers constantly pooh-pooh that idea. They can't conceive of an app not being constantly maintained indefinitely.


1) The app runs on a wide variety of platforms, including an in-browser version, so it's likely to continue running longer than, say, an obscure DOS game (though most obscure DOS games are still playable on modern PCs through things like DOSBox, assuming you have a copy of the game)
2) At some point, the supply of deduction grids that came with the game will run out, probably significantly before the app becomes unusable
3) It would take me maybe 2 hours to create a workable replacement for the app, and the first hour would be remembering how to use Visual Studio. Or I might decide to save time and do it in VBScript (in which case most of the development time would be coming up with a way to encode/decode permutations as numbers in the range 1-40320)
4) Alchemists does come with a non-digital alternative - which admittedly is pretty non-fun for the person playing as the app - so even after the fall of civilisation leaves no working electronics, the survivors can still play the game. At least until one of the seals goes missing and the deduction grids run out...
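
For what it's worth, the encode/decode step in point 3 can be sketched in a few lines. 40320 is 8!, so this assumes the numbers stand for permutations of eight items. The sketch below is in Python rather than VBScript, uses the standard lexicographic ranking (Lehmer code), and the function names are invented for the example. It is not how the actual app does it.

```python
from math import factorial


def perm_to_number(perm):
    """Rank a permutation of 0..n-1 as a number in 1..n! (lexicographic order)."""
    items = sorted(perm)  # values not yet used, in ascending order
    rank = 0
    for i, value in enumerate(perm):
        idx = items.index(value)  # how many unused values are smaller
        rank += idx * factorial(len(perm) - 1 - i)
        items.pop(idx)
    return rank + 1  # shift from 0-based rank to the 1..n! range


def number_to_perm(number, n=8):
    """Inverse of perm_to_number: recover the permutation from a number in 1..n!."""
    if not 1 <= number <= factorial(n):
        raise ValueError("number out of range")
    rank = number - 1
    items = list(range(n))
    perm = []
    for i in range(n, 0, -1):
        idx, rank = divmod(rank, factorial(i - 1))
        perm.append(items.pop(idx))
    return perm
```

With n=8 the identity permutation maps to 1 and the fully reversed one maps to 40320, so every setup fits in a five-digit code.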

Re: 1909:"Digital Resource Lifespan"

Posted: Tue Oct 31, 2017 2:39 am UTC
by Mirkwood
Yu_p wrote:Btw, isn't the standard only PDF/A? After my experience when submitting my diploma thesis, I'm not even sure if compliance with the standard is well defined, unless you assume a specific version of some proprietary checker.


There are multiple PDF standards. The different versions of PDF/A are an archival standard. Compliance is well-defined, but to my knowledge there are no good free compliance-checkers. There is free software that claims to support the standard, though. And the most important parts of the standard are fairly straightforward.

Re: 1909:"Digital Resource Lifespan"

Posted: Tue Oct 31, 2017 3:44 am UTC
by da Doctah
In 2000, I found a couple of homemade 8-track tapes of some radio shows I recorded off the air in 1978. Knowing an online community that would be interested in the contents, I hit the thrift stores looking for an 8-track player, and finally managed to dig one up on eBay for just ten bucks. One drive belt needed tension adjustment, but apart from that it was out-of-the-box ready to hook up to my line-in jack and turn the tapes into MP3s.

(To further complicate the tech involved in this story, one of the songs included in the recorded shows had originally come out on acetate in the 1920s, the radio show itself had been transcribed on vinyl, and I had an edited version of the program on cassettes during the years the 8-tracks were out of my immediate access. And the radio station itself was in 1978 broadcasting in quadraphonic FM.)

Re: 1909:"Digital Resource Lifespan"

Posted: Tue Oct 31, 2017 3:54 am UTC
by Pfhorrest
I remember there was some urban legend (not sure if that's quite the right term for this) in the 80s or 90s about a guy on UseNet who claimed to be a time traveller from the not-too-distant future, whose mission was simply to recover some old piece of technology because some important records in the future were in a medium that nothing in that time period could read. (I guess the format specs must have been lost too for it to be more economical to send a time traveller than to just custom-build a new reader).

Re: 1909:"Digital Resource Lifespan"

Posted: Tue Oct 31, 2017 7:41 am UTC
by DavCrav
qvxb wrote:This is why we need more monks majoring in computer science.


Not the monks! They destroyed more classical works of science for their stupid fairy stories than fire and flood. OK, probably not that many, but many. Some Greek works only survive because the monk involved wasn't very good at washing the original away.

Re: 1909:"Digital Resource Lifespan"

Posted: Tue Oct 31, 2017 8:52 am UTC
by Phasma Felis
Digital preservation is a fascinating and criminally undervalued field. There are detailed standards out there for ways for digital information to survive (and remain readable) many years after the deaths of both the individuals and the institutions that maintained them--just for one example, by periodically transcoding all files in the archive into more current/accessible formats, while also retaining all earlier versions *and* complete specifications for all of them.

Hardly anyone *follows* those standards. But they do exist.

Re: 1909:"Digital Resource Lifespan"

Posted: Tue Oct 31, 2017 9:04 am UTC
by Jiffy
Heimhenge wrote:
Mutex wrote:
Heimhenge wrote:And why does Randall think that the PDF format will be eternal?

I'm pretty sure he's just saying they currently still work, not that anything listed there will necessarily be eternal.


I was interpreting his right-facing arrow to mean "forever into the future" as he implied with books and microfiche.


Given that the left-facing arrows clearly don't mean "forever into the past", I doubt that the right-facing arrows mean what you're saying they mean. In any case, books and microfilm won't necessarily last forever. They'll stop being reprinted, decay and be destroyed. It's not like we currently have accessible copies of all books that ever existed.

Re: 1909:"Digital Resource Lifespan"

Posted: Tue Oct 31, 2017 11:08 am UTC
by svenman
Phasma Felis wrote:Digital preservation is a fascinating and criminally undervalued field. There are detailed standards out there for ways for digital information to survive (and remain readable) many years after the deaths of both the individuals and the institutions that maintained them--just for one example, by periodically transcoding all files in the archive into more current/accessible formats, while also retaining all earlier versions *and* complete specifications for all of them.

Hardly anyone *follows* those standards. But they do exist.

That certainly seems like a form of maintenance to me. So basically what this boils down to in my view is: to continue ensuring the preservation of a digital resource, there has to be somebody, individual or organisation, maintaining it in some form. If the original maintainer is no longer available, eventually another one has to step in.

Of course, if you choose a wide enough definition of "maintenance" that blends into preservation, then in the end the situation is actually not fundamentally different from the one with traditional media like books and microfilm. In the case of these, the role of maintainer/preserver is traditionally being performed by libraries and archives, just the methods are different.

Re: 1909:"Digital Resource Lifespan"

Posted: Tue Oct 31, 2017 11:38 am UTC
by Soupspoon
That best practice doesn't rely on continual maintenance. It more works towards spreading the bets so that any unforeseen pause in the process will still leave various mirroring versions of archival material around (and probably around in different places, each differently vulnerable to the vicissitudes of time) so that once the future turns back to an interest in the old material there's a multi-pronged possibility of finding something recoverable. Either directly or by piecing together fragments from across the canon. Moreover, it is not inconceivable that fragments of acetate-stored records, together with hardy but lost-art digital representations and various other breadcrumbs could form a Rosetta Stone towards the better understanding of a yet wider range of materials, given the opportunity for further study.

That's ignoring the more immediate recovery from "Oh no! All our CD-RW archive media from 15 years ago have degraded!" problems, necessitating pulls from the bulk deep-paper archives and/or Last Good Read copies to more contemporaneous (or at least more recently re-written) forms of media.

Re: 1909:"Digital Resource Lifespan"

Posted: Tue Oct 31, 2017 12:50 pm UTC
by cellocgw
qvxb wrote:This is why we need more monks majoring in computer science.


And thus you have invented Anathem

Re: 1909:"Digital Resource Lifespan"

Posted: Tue Oct 31, 2017 3:27 pm UTC
by orthogon
Soupspoon wrote:That best practice doesn't rely on continual maintenance. It more works towards spreading the bets so that any unforeseen pause in the process will still leave various mirroring versions of archival material around (and probably around in different places, each differently vulnerable to the vicissitudes of time) so that once the future turns back to an interest in the old material there's a multi-pronged possibility of finding something recoverable. Either directly or by piecing together fragments from across the canon. Moreover, it is not inconceivable that fragments of acetate-stored records, together with hardy but lost-art digital representations and various other breadcrumbs could form a Rosetta Stone towards the better understanding of a yet wider range of materials, given the opportunity for further study.

That's ignoring the more immediate recovery from "Oh no! All our CD-RW archive media from 15 years ago have degraded!" problems, necessitating pulls from the bulk deep-paper archives and/or Last Good Read copies to more contemporaneous (or at least more recently re-written) forms of media.

These discussions often confuse two different things, though. There's the longevity of the medium, which is what people are referring to when they say "oh, yeah, they told us CDs would last forever". We've obviously realised that individual digital media can't be trusted, and that we need to keep backups in multiple places and continually migrate them onto the latest technology, as you describe. But the great thing about digital files is that this is extremely easy and cheap to do, because it doesn't matter what the data is: providing reliable storage for digital files is a standard off-the-shelf service. It doesn't need specialists in particular video tape standards or microfiche formats. The files themselves are the archives, not the magnetic discs or whatever on which they're stored.
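
To make the "it doesn't matter what the data is" point concrete, here is a minimal Python sketch of a content-agnostic integrity check across several copies of one file. It hashes each copy with SHA-256 and flags any copy that disagrees with the most common hash. The function names and location labels are hypothetical, and a real archive would stream files from storage rather than hold them in memory as this does.

```python
import hashlib
from collections import Counter


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a blob; the content type is irrelevant."""
    return hashlib.sha256(data).hexdigest()


def find_damaged_copies(copies: dict) -> list:
    """Given {location: file bytes}, list locations that disagree with the majority hash.

    If there is a tie, the hash seen first counts as the reference, so this is
    only a heuristic, not proof of which copy is the good one.
    """
    digests = {loc: sha256_of(data) for loc, data in copies.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(loc for loc, digest in digests.items() if digest != majority_digest)
```

Anything this flags can be overwritten from a copy that still matches, which is the cheap, generic migration step described above.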

That leaves the problem of longevity of the file format, which I think is what's being discussed here and is the major problem. Nevertheless, I'd expect that text should last as long as the writing systems themselves survive, provided there's a way of pulling the wheaty text out of all the other chaff in the file. (Diagrams and photos are another story). This implies that a good format ought definitely to preserve the order of text characters and keep passages of text together. PDFs don't always do this; quite often the letters of a word will be separated from one another (I know this from trying to search a PDF for a word that I can damn well see occurs at least once).

Re: 1909:"Digital Resource Lifespan"

Posted: Tue Oct 31, 2017 3:40 pm UTC
by Archgeek
Pfhorrest wrote:I remember there was some urban legend (not sure if that's quite the right term for this) in the 80s or 90s about a guy on UseNet who claimed to be a time traveller from the not-too-distant future, whose mission was simply to recover some old piece of technology because some important records in the future were in a medium that nothing in that time period could read. (I guess the format specs must have been lost too for it to be more economical to send a time traveller than to just custom-build a new reader).

Oh yeah, John Thomas! Or..John Connor? No...oh, right, John Titor -- made famous to the otaku crowd by something called Steins;Gate. If I recall that was an actual series of posts that did happen back then, part of the inspiration for Pretend to Be a Time Traveler Day, which turned up in the mid-aughts.

Re: 1909:"Digital Resource Lifespan"

Posted: Tue Oct 31, 2017 5:07 pm UTC
by gmalivuk
DavCrav wrote:
qvxb wrote:This is why we need more monks majoring in computer science.


Not the monks! They destroyed more classical works of science for their stupid fairy stories than fire and flood. OK, probably not that many, but many. Some Greek works only survive because the monk involved wasn't very good at washing the original away.

And others only survive because they were preserved and copied in monasteries, which were also the main sources of any kind of literacy in Europe for several centuries.

Re: 1909:"Digital Resource Lifespan"

Posted: Tue Oct 31, 2017 5:20 pm UTC
by eviloatmeal
Meh. All the important information only exists on floppy zine, anyway.

Re: 1909:"Digital Resource Lifespan"

Posted: Tue Oct 31, 2017 5:35 pm UTC
by petercooperjr
Everybody should know and heed lessons from the Lunar Orbiter Image Recovery Project.
They had all these tapes from the Lunar Orbiter missions of the Apollo era, and to actually get the data off they needed to find and restore old tape drives, including scouring eBay to find old parts for those drives. All this amazing historical data came very close to being lost just because we couldn't read it anymore, as technology keeps marching on very quickly.

Re: 1909:"Digital Resource Lifespan"

Posted: Tue Oct 31, 2017 5:39 pm UTC
by JohnTheWysard
Two extremes:

Sumerian cuneiform tablets (properly baked): 3500 BCE - millennia hence
Abacus (soroban, suan-pan): until it gets tilted or shaken

Re: 1909:"Digital Resource Lifespan"

Posted: Tue Oct 31, 2017 7:04 pm UTC
by Kit.
Isn't it more a problem of abundance of digital information potentially worth keeping than of longevity of digital media?

Re: 1909:"Digital Resource Lifespan"

Posted: Tue Oct 31, 2017 10:57 pm UTC
by svenman
orthogon wrote:[...] providing reliable storage for digital files is a standard off-the-shelf service.

Contrasting with providing reliable storage for paper-based media, which usually is a standard on-the-shelf service.

Sorry, I had to.

Re: 1909:"Digital Resource Lifespan"

Posted: Wed Nov 01, 2017 7:32 am UTC
by ProphetZarquon
Randall forgot my personal favorite: UTF-8 formatted .txt files. Since 1993 & counting, never had an issue opening one. I still have my first copy of The Anarchist's Cookbook, copied from a Kaypro II running CP/M on a 5-1/4" floppy to an 8088XT running MS-DOS on a 30 MB hard drive to an IBM PS/2 286 on a 20 MB hard drive to an Asus 486 on a 3.5" floppy to a 1.2 GHz Pentium on a 100 MB Zip drive to a Core 2 Duo on a CD-R to an i7 system on a 128 GB solid state drive, which was finally backed up to a 1 TB hard drive & archived, as there's a newer copy to carry around. That original .txt file still opens just fine on any PC I've ever used (including mobile).

Also, I believe Linus Torvalds once said (talking about code, but it applies to anything sufficiently desirable) "Only wimps use tape backup, real men just upload their important stuff on ftp, and let the rest of the world mirror it ;)" I can certainly attest to that. I once made a torrent of all the Star Trek I'd accumulated (i.e., all the Star Trek ever) & uploaded that to several public torrent indexes. Two years later an old hard drive died & I was able to recover all 200+ GB in a little over 6 hours, simply by downloading my own torrent from other seeds. Thanks Trekkies!

Re: 1909:"Digital Resource Lifespan"

Posted: Wed Nov 01, 2017 7:37 am UTC
by ProphetZarquon
wumpus wrote:How long will it take to get a PDF variant that won't include JavaScript (or Flash and EXE exploit holes)? I've been fond of the format since .ps files were available on FTP, but can't recommend an Adobe format for "digital resource lifespan".

PS: what does it take to replace microfilm lamps with high-lifespan LEDs?


As far as security goes, just disable scripting & external plugin use entirely, in your PDF reader. When's the last time you had a .pdf file with actual Flash content in it, anyway? I keep it turned off & I've never run into a .pdf that wouldn't display that way.

Re: 1909:"Digital Resource Lifespan"

Posted: Wed Nov 01, 2017 7:40 am UTC
by ProphetZarquon
ObsessoMom wrote:Um, has Randall visited a university library recently? He might be alarmed to find that a lot of those once-reliable physical resources (books, microfilm) have been moved to offsite storage or de-accessioned, after they've been digitally scanned.*

*Badly. By inattentive and underpaid students who didn't much care if a few pages were stuck together or folded here and there. Or if non-Latin alphabets got digitally approximated to the closest Latin alphabet equivalent, so that the result is unintelligible in any language.


I & l & 1 & | & ¡ & \ are all the same letter, right? Right?

Re: 1909:"Digital Resource Lifespan"

Posted: Wed Nov 01, 2017 7:46 am UTC
by ProphetZarquon
cellocgw wrote:
qvxb wrote:This is why we need more monks majoring in computer science.


And thus you have invented Anathem


Well, strictly speaking, the monks continued the research while the Ita maintained the computers...

Re: 1909:"Digital Resource Lifespan"

Posted: Wed Nov 01, 2017 7:50 am UTC
by ProphetZarquon
Archgeek wrote:
Pfhorrest wrote:I remember there was some urban legend (not sure if that's quite the right term for this) in the 80s or 90s about a guy on UseNet who claimed to be a time traveller from the not-too-distant future, whose mission was simply to recover some old piece of technology because some important records in the future were in a medium that nothing in that time period could read. (I guess the format specs must have been lost too for it to be more economical to send a time traveller than to just custom-build a new reader).

Oh yeah, John Thomas! Or..John Connor? No...oh, right, John Titor -- made famous to the otaku crowd by something called Steins;Gate. If I recall that was an actual series of posts that did happen back then, part of the inspiration for Pretend to Be a Time Traveler Day, which turned up in the mid-aughts.


Time travel: The ultimate archive recovery method.

Re: 1909:"Digital Resource Lifespan"

Posted: Wed Nov 01, 2017 8:00 am UTC
by ProphetZarquon
gmalivuk wrote:
DavCrav wrote:
qvxb wrote:This is why we need more monks majoring in computer science.


Not the monks! They destroyed more classical works of science for their stupid fairy stories than fire and flood. OK, probably not that many, but many. Some Greek works only survive because the monk involved wasn't very good at washing the original away.

And others only survive because they were preserved and copied in monasteries, which were also the main sources of any kind of literacy in Europe for several centuries.


Yes & let's not forget that the first printing press with movable type was used to print copies of the Gutenberg Bible. The best way to ensure something will remain available still seems to be "make lots of copies".

M-DISC Blu-ray discs & hard drives in a fire safe are not bad either.

Re: 1909:"Digital Resource Lifespan"

Posted: Wed Nov 01, 2017 12:45 pm UTC
by gmalivuk
Please do not double (or sextuple) post.