August 14, 2009

The Real Google Documents

Here’s an idea for a great Google web application - an online archive where you can tag, sort, and store all of your used-to-be-paper documents (i.e., PDFs), and share those same documents with other people, or even everybody.

I use many, many applications that perform a similar service with the PDFs on my hard drive: Yep!, Papers, Zotero, Scrivener, Evernote. And I use Dropbox to back up and sync my PDFs between machines. I also use Scribd to read PDFs and share them with the world. But Google could easily offer a service that does everything these applications do and more. They’re already offering a web-reader for PDFs. What they need is something that actually lets you USE them.

Here’s how I imagine this goes. Let’s say someone emails a PDF to your Gmail account, or appends one to a feed you read in Google Reader. Instead of downloading it onto your computer (or, egads, a public machine), you have the option to load it into Docs. Just like that, it’s in your archive. You can also have Google Desktop scan for and index your PDFs and auto-load them into your archive.

Once you import it, you don’t have to do anything else. It’ll either pull the text — or if there’s no text layer, it’ll OCR the document FOR you. You can auto-tag it or add your own tags to help you sort your docs together. It can also pull metadata, like Zotero. And you can create smart collections that link PDFs with text documents, emails, and stuff from Google Books, Scholar, even Maps or Groups.

You can also customize levels of privacy and security. Some files you might want to have public, like on Scribd. Maybe you’ll even create RSS channels so folks can receive your new images/PDFs/ebooks/XML documents automatically. Others you want to share with specified users, like Dropbox or Groups. Still others (tax and employment info, etc.), you’ll encrypt with extra passwords.

In fact, this is awfully close to the Google Grid, the vision two enterprising chaps put forward years ago.

Seriously: Google says it wants to index the world’s information. Well, let me tell you - I’m chock full of information that I don’t know what to do with. Why can’t it start by taking some of mine - and giving me some tools so that I can do things with it as payment?

Posted August 14, 2009 at 12:37
I like it, but you have to admit: The distribution graph of PDF ownership is a long tail, and it sounds like you are in the head -- the Coldplay of PDF ownership :-)

Well actually, let me rephrase that as a question: Do you think PDFs are on the rise, or fading away? Does the PDF have a bright, busy future -- or is it on its way to legacy-ville?

Zotero is très chic, but I can't imagine using PDFs much outside of academic journals. If I could easily convert everything I had on paper into PDF or text/web (a la LaTeX), I would go text/web every time.

I don't know if the PDF specification itself is the be-all and end-all of distribution of display-stable text+image documents... But something that DOES that, yes, I think that's the future. We need images with machine-readable text.

And I think PDFs are just everywhere, way more common than DOC files or PPTs or anything except maybe JPGs. And JPGs don't have text.

I should add that the reason PDFs are so common is because they're really good for forms. That's why the government uses 'em. That's why software companies use 'em for instruction manuals.

The problem with HTML and DOC and their variants is layout consistency, especially with fonts. Anything you want to look the same on any screen works best in PDFs.

Also, unless you're using janky-super-slow Adobe Reader, PDFs load way faster than DOC files in an office suite - a bare-bones text editor, again, can't swing images.

Yeah, you know, this is actually a really important (and surprisingly controversial!) point:

"Anything you want to look the same on any screen works best in PDFs."

I mean, the argument w/ a lot of web content is that it SHOULDN'T always look the same -- it should flex and flow, resize, reformat, etc. "Separation of content and presentation" is a major rallying cry for a lot of web designers/developers.

And one of the *virtues* of something like the Kindle is, ostensibly, its flexibility. You can change the font size! That's cool, right?

As a writer/designer, I find myself *hating* the flexibility of both the web & the Kindle, at least when it comes to presenting stuff I've created.

It goes back to something we were talking about before -- about wanting to engineer things like the page flip. Things like: Where does a sentence fall on a page? At the top? In the middle? That's part of the experience, too.

I can feel my sentiments hardening even as I write this. The Project Gutenberg plain-text version of a book, stripped of all design entirely, *is not the book*.

"Display-stable." That's a nice term, and a characteristic I am definitely after.

Well, dynamic display definitely has its place, and the ideal would be something that handles variations on display GRACEFULLY. But it turns out, that's actually really hard to do right, even if IE6 went the way of all janky code.

Display-stability leads into an idea that (I hope) we'll be kicking around a lot next week, which is that the test case for the electronic reader of the future isn't, or shouldn't be, the novel. I mean, again, a Project Gutenberg-style text file can work on the tiniest screen, on the dumbest text-capable cell phone. Strings of continuous text are just not that crucial.

Nor should it be the newspaper, really. Forgive me as an unabashed and sincere fan of newsprint, but it's a testament to the genius of the newspaper, as an early form of hypertext, that it works so well in genuine hypertext, i.e., on the web.

The test case for electronic reading machines is or ought to be 1) the comic book or 2) the children's book. Image- and text-intensive forms where the organization of a single page (or spread of two pages) forms a single unit. These are the real codices, the texts that already understand that the page is a screen.

(I think there totally will and ought to be brand-new forms that take advantage of the electronic reading experience - readies, anyone? - but if we're looking to existing forms for a while, those are the two I'd look to.)

I wonder if there's a way for certain experiential aspects - like the position of the page turn - to be coded into electronic texts, so that something simple like that would display with some uniformity across devices. That shouldn't be hard, right?

I really like the readies example too. The Globe and Mail iPhone app has something similar and the first time I used it, I felt as if someone were pushing my brain to work in new ways. What's fascinating is the way that it takes its sense of 'temporality' or whatever from video and applies it to text. I think that's an interesting development for the attention economy - "here, read this and, no, you can't look away". By further(?) putting text into time, it demands attention.

But to return to the most appropriate test-case - if it's the organisation of text and images on a page, then why not also the magazine? I ask b/c it seems like it has the benefit of the formal requirements, but comes 'bundled' with its own cultural urgency and anxiety: it sits there on the screen, painfully aware that it, like all things displayed on screens, may flicker and disappear forever.

Totally. It's hard now to think about a magazine like Ray Gun, for example, without thinking about it as a kind of terminus of the print magazine; or the weird pop cultural relevance of the Absolut Vodka ads, from around the same time--

--You can certainly imagine a new life for magazines (and, um, advertising) on a full-color, design-/display-stable reading machine.

Tim, or anyone else: do you have thoughts on Mendeley? I like what they're trying to do, but the current version doesn't seem to handle my already-huge PDF library very well.

Geh! It's actually the first I've heard of it.

Just downloaded; will experiment and report back.

