PDF TextOnline - PDF Text Extraction in Your Browser
pdftextonline.com: PDF TextOnline is a new (beta) Ajax application that lets you upload PDF documents right from your browser and convert them to text that you can easily copy and paste anywhere you need, without the hassle you usually get from Adobe Acrobat or other PDF viewers.
- 886 diggs
- webtech, on 10/12/2007, -4/+2 More in here:
http://go2web2.blogspot.com/2006/09/pdf-textonline-pdf-text-extraction-in.html
- chesterjosiah, on 10/12/2007, -1/+9 If you use Adobe, stop using it and get Foxit Reader. It opens PDFs INSTANTLY and lets you copy and paste text very easily. It's a small executable (no install) and it just plain rools. I will NEVER use Adobe again.
Link: http://www.foxitsoftware.com/pdf/rd_intro.php
- spyres, on 10/12/2007, -0/+2 Foxit Reader is fast, but there are complex PDFs that it sometimes cannot render. If one needs fidelity, one needs to use Adobe's reader.
- chad78, on 10/12/2007, -2/+8 Not everyone can afford Acrobat, and not everyone can download Reader (people who get online at Internet cafes or libraries, etc.). Adobe Reader is huge. It's like a 70 MB file or something. People on dialup might not want to download it when they can go to this site, for free.
I really like the site. It's a good service. I could see people with vision problems loving this site.
- OMGWTFROFLMAO, on 10/12/2007, -1/+22 I suggest Foxit Reader. I have turned my back on Adobe Reader since discovering it.
www.foxitsoftware.com
The download is less than 1 MB in size.
- mjm01010101, on 10/12/2007, -0/+4 Foxit is fine, I use it myself for quick PDF viewing, but be aware of the following issues:
Rendering of PDFs is subpar overall compared to Acrobat's.
Printing of PDFs to some printers is subpar compared to Acrobat's. In our environment we cannot get consistent printing from Adobe to Konica printers, and we can't get good output to Xerox printers using Foxit.
Overall, for quick "nonprofessional" viewing, Foxit cannot be beat.
- dankoleary, on 10/12/2007, -5/+4 A simple OCR like that is fairly easy to do. Good for them for bringing this service to the public.
- edvas, on 10/12/2007, -0/+13 Cool service, but note that it isn't OCR. The text stream is embedded in the PDF.
- FishPoisonCon, on 10/12/2007, -14/+5 crap, bury this comment :(
- rocke86, on 10/12/2007, -0/+9 Nice service. You can also email the PDF to your Gmail account and view it as HTML, or use the pdfdownload FF extension that gives you the option to view as HTML.
https://addons.mozilla.org/firefox/636/
- edzieba, on 10/12/2007, -0/+5 Isn't it just easier to copy & paste from the reader (I use Foxit, but I'm pretty sure you can switch to text-select in Acrobat too)?
- anonyjames, on 10/12/2007, -8/+0 I don't think you can copy and paste from Adobe Reader, though I haven't used it for some time.
- jdawg19, on 10/12/2007, -0/+12 @anonyjames
You can copy and paste from Reader, it's not hard at all, and it's been capable of that for a long time.
- aarons44, on 10/12/2007, -0/+3 Not necessarily. Complex (or poorly designed) documents cannot be copied from using Acrobat. For example, SPIDynamics' WebInspect application security scanning tool produces a .pdf report. There's something about the layout that won't let you copy lines individually, only larger sections. So if I just want to copy and paste a URL into my browser from the report, I can't. I get the lines above and below as well. The other question would be whether this service can convert protected documents. Nothing makes me more upset than purchasing a .pdf document (ISO Standards come to mind) that I can't copy and paste from. I'd like to test this service with a protected document, but it's suffering the Digg effect right now.
- aarons44, on 10/12/2007, -0/+2 Well, I was just about to try a protected .pdf, when I saw the big warning that they are going to store copies of all uploaded documents for "quality control purposes" and that you should own the IP rights to any document you upload. The tinfoil-hat wearer in me is not allowed to use a service that states they will save copies of documents. Shoot. Still might go home and create a protected document with the full Acrobat product and try it just for fun.
- Str8Dog, on 10/12/2007, -9/+3 cease and desist in 3...2...1...
- CypherXero, on 10/12/2007, -4/+2 Because we all know it's illegal to open a PDF document...
- oneeyedelf1, on 10/12/2007, -1/+2 I use kpdf, and I thought everyone could naturally select text out of pdf documents.
- kendawg, on 10/12/2007, -1/+5 you can, this site isn't needed.
- Leiterfluid, on 10/12/2007, -3/+0 but can you COPY PAST anywhere, like the submitter states?
- Phocion55, on 10/12/2007, -1/+1 It just expedites the process, I guess.
- jaderobbins, on 10/12/2007, -0/+2 that is assuming it's inserted into the pdf as text, i've seen soooooo many pdf's which are scans of documents that are thrown into the pdf as an image, in which case you do need OCR :(
- FishPoisonCon, on 10/12/2007, -0/+2 tell me about it :^|
/works in prepress
- wqwert, on 10/12/2007, -0/+1 Can you recommend an OCR for converting scanned PDFs to text?
- strcmp, on 10/12/2007, -0/+3 Those are pretty prevalent in online journal databases (e.g. JSTOR). They give the PDF file format a bad name because they are generally bloated and slow to print and render.
- FishPoisonCon, on 10/12/2007, -0/+2 Omnipage SE is OK, and it's free... saves scans and flattened files (.jpgs, tiffs, etc) as word docs, which can be copy/pasted or imported into quark or indesign - which is what you should be making pdfs with in the first place (NOT WORD!!!)
- aarons44, on 10/12/2007, -0/+2 FishPoisonCon,
You really DO work in prepress, don't you. Quark, Indesign, Macs? I'll bet you've seen some 200+ MB .pdf files in your time (I used to work at a newspaper as well, in IT). You're right though. In an environment like that you learn a whole new way of looking at .pdfs (and a slew of expletives for dealing with them as well).
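As edvas and jaderobbins point out above, text extraction only works when the PDF carries an embedded text layer; a scanned page stored as an image yields little or nothing, and that is when OCR is needed. A minimal sketch of that check, assuming the text has already been pulled out by some extraction tool. The function name and the 20-characters-per-page threshold are hypothetical choices for illustration, not anything PDFTextOnline is known to do.

```python
def likely_needs_ocr(extracted_text: str, page_count: int,
                     min_chars_per_page: int = 20) -> bool:
    """Guess whether a PDF is image-only, given the text a tool extracted from it.

    The threshold is an arbitrary illustration value; tune it for real documents.
    """
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    # Count only visible characters: extractors often emit form feeds and
    # blank lines even for pages that contain no text at all.
    visible = sum(1 for ch in extracted_text if not ch.isspace())
    return visible / page_count < min_chars_per_page
```

A scanned journal article would typically come back with almost no visible characters and be flagged, while a PDF exported from a layout program would not.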
- FishPoisonCon, on 10/12/2007, -7/+2 apple/control + p , save to file/save as PDF
- FishPoisonCon, on 10/12/2007, -8/+2 *****... i totally read the headline wrong, i need to go smoke... bury this comment as well
- FishPoisonCon, on 10/12/2007, -2/+3 this seems pretty cool... this site isn't working for me, but.... what if the fonts aren't embedding correctly? (it's more common than you think)
- ephemeral, on 10/12/2007, -2/+5 It's the rad AJAX fad, c'mon. Jump on board. Don't worry about using a silly application PDF reader for this. Desktop applications are passé, it's much more fun to do it through 70 layers of abstraction through the browser interweb, not to mention more inefficient. GO AJAX!
- WaterDragon, on 10/12/2007, -3/+1 I thought the entire point of PDF (protected document format) was to enable people to create documents that COULDN'T be copied and pasted, or picked apart into little pieces by plagiarists, but must remain whole, with the original context.
- strcmp, on 10/12/2007, -0/+3 You thought wrong. PDF stands for Portable Document Format, although the format does support DRM to disable copying, pasting and printing.
- FishPoisonCon, on 10/12/2007, -0/+1 you can enable password security to restrict adjustments/importing/printing (but there are ways to circumvent it)
- noksagt, on 10/12/2007, -0/+1 You can do much better locally without having to wait for your PDF to upload.
If you don't like selecting/copying/pasting or using the in-built text conversion in many PDF viewers, you can use pdftotext, which is available as part of xpdf on all operating systems:
http://www.foolabs.com/xpdf
If you're uncomfortable with the command line, it would be trivial to wrap this in a script which could provide a drag & drop target, just as this site does. pdftotext is minuscule & can be run from a usb drive.
- FishPoisonCon, on 10/12/2007, -4/+1 meant to be a reply, wtf is going on?
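The wrapper script noksagt describes could look roughly like the following. This is a minimal sketch, assuming the pdftotext executable from xpdf is on the PATH; the function names are made up for illustration. It relies on two pdftotext options: `-layout`, which tries to keep the physical page layout, and `-` as the output file, which sends the text to standard output.

```python
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path, layout=False):
    """Build the argument list for pdftotext, writing the text to stdout."""
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")  # keep the physical page layout as far as possible
    cmd += [str(pdf_path), "-"]  # "-" as the output file means stdout
    return cmd


def convert(pdf_path, layout=False):
    """Run pdftotext on one file and save the result next to it as .txt."""
    pdf_path = Path(pdf_path)
    result = subprocess.run(
        pdftotext_command(pdf_path, layout),
        check=True,            # raise if pdftotext reports an error
        capture_output=True,   # collect stdout instead of printing it
    )
    out_path = pdf_path.with_suffix(".txt")
    out_path.write_bytes(result.stdout)
    return out_path
```

On desktops that pass dropped files to a script as command-line arguments, calling `convert()` on each entry of `sys.argv[1:]` would give the drag & drop behaviour.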
- Coreguy, on 10/12/2007, -0/+1 Why did it keep greeting me with an error when I attempted to upload a file?
It always stopped at 12% and then "got a network error".
- mv36, on 10/12/2007, -0/+3 Thanks for visiting, but we're currently getting hammered by digg!
It's nice to be noticed like that, but we didn't plan on getting pounded like this so soon. So, we're signing off for a while.
Don't worry, we'll be back soon enough with a bigger server! If you want, you can bookmark us so you'll remember to come back.
In the meantime, do check us out at http://snowtide.com to learn about PDFTextStream, the real brains behind PDFTextOnline.
Regards,
The Snowtide/PDFTextOnline Team
- cemerick, on 10/12/2007, -1/+0 Sorry, wrong reply link.
- Flareman, on 10/12/2007, -3/+1 A question: does it also work with password protected PDFs?
- mv36, on 10/12/2007, -0/+1 TiAS = Try it and see.
But you'll just have to wait till it is back now.
- cemerick, on 10/12/2007, -0/+2 Flareman --
No, PDFTextOnline does not yet support extracting content from password-protected PDFs. That is coming, but first we need to recover and regroup from the pounding we've gotten! :-)
- Chas @ Snowtide
http://blog.snowtide.com
- OrangeTide, on 10/12/2007, -0/+2 I never had a problem cutting and pasting PDF text in unprotected documents. Of course the protected ones won't let you do that, if this webpage lets you get around the protection then that's a DMCA violation and people in the US are not allowed to use it. If it's hosted in the US then Adobe's lawyers will probably be sending them a letter.
- bonked, on 10/12/2007, -0/+1 Anyone know a simple way to convert a PDF to a JPG?? (without the obvious screen capture method, preferably with free software)
- gmillerd, on 10/12/2007, -0/+2 some of the hylafax tools will do this, take the pdf, break it down by pages into tiff. then take each 'page' of the tiff and break that into whatever. easy perl script really.
- alspar, on 10/12/2007, -0/+2 Hylafax actually uses ghostscript for all the heavy lifting (available for lots of platforms at http://www.cs.wisc.edu/~ghost/ ). For converting PDF to JPEG you probably want something like
gs -sDEVICE=jpeg -sOutputFile=foo.jpg foo.pdf
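A slightly fuller version of alspar's command, as a hedged sketch. As far as I know, the one-liner as written leaves Ghostscript waiting at its interactive prompt and, for a multi-page PDF, writes every page to the same foo.jpg. The sketch below adds `-dBATCH` and `-dNOPAUSE` to run unattended, `-r` and `-dJPEGQ` to set resolution and JPEG quality, and a `%03d` page counter in the output name. The function name and default values are assumptions for illustration, and `gs` is assumed to be on the PATH.

```python
import subprocess


def ghostscript_jpeg_command(pdf_path, out_pattern="page-%03d.jpg",
                             dpi=150, quality=90):
    """Build a Ghostscript command that renders each PDF page to its own JPEG."""
    return [
        "gs",
        "-dSAFER",                      # restrict file access while interpreting
        "-dBATCH",                      # exit when done instead of showing a prompt
        "-dNOPAUSE",                    # do not wait for a keypress between pages
        "-sDEVICE=jpeg",                # same output device as the one-liner above
        f"-r{dpi}",                     # rendering resolution in dots per inch
        f"-dJPEGQ={quality}",           # JPEG quality, 0 to 100
        f"-sOutputFile={out_pattern}",  # %03d is replaced by the page number
        str(pdf_path),
    ]


def pdf_to_jpegs(pdf_path, **options):
    """Run Ghostscript; raises CalledProcessError if the conversion fails."""
    subprocess.run(ghostscript_jpeg_command(pdf_path, **options), check=True)
```

Calling `pdf_to_jpegs("foo.pdf")` would then write page-001.jpg, page-002.jpg, and so on into the current directory.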
- firus, on 10/12/2007, -3/+2 Thanks for visiting, but we're currently getting hammered by digg!
It's nice to be noticed like that, but we didn't plan on getting pounded like this so soon. So, we're signing off for a while.
Don't worry, we'll be back soon enough with a bigger server! If you want, you can bookmark us so you'll remember to come back.
In the meantime, do check us out at http://snowtide.com to learn about PDFTextStream, the PDF text extraction brains behind PDFTextOnline.
Regards,
The Snowtide/PDFTextOnline Team
- Haggismuncher, on 10/12/2007, -0/+0 Does anyone know of a way to combine multiple PDF files without using Acrobat?
- cemerick, on 10/12/2007, -0/+0 PDFTextOnline is back: http://blog.snowtide.com/2006/09/11/pdftextonline-back-online-now-beta-2
Thanks for all of your interest!
- serval, on 10/12/2007, -0/+0 I just used this service with a resume and cover letter ... it was pretty dang good and very convenient. BUT the quality is not quite as high as the output from "PrimoPDF" which I have installed on my computer.
Can't comment on the quality of picture or odd symbol conversion, but it seems to be nearly as good as primopdf which is pretty good -- though I'll stick w/primo for now.
- thydzik, on 10/11/2007, -0/+1 this is genius.
the only pdf2text I have found that actually preserves formatting, well done
