Over the years I've acquired a bit of a collection of old service manuals, and a few other rare works. A lot of these are already available from various sources in electronic form, but some of them are not. I've always thought it would be helpful to others if I scanned some of my collection and put it online, whether through existing archive sites or my own.
So far, the scanning equipment and tools I have are crude, slow, and not really feasible for larger manuals. But I've been making a few experimental attempts. Some examples are here:
http://everist.org/archives/scans/
Any comments appreciated, especially suggestions for improvement.
Has anyone else here tried this, and what setup did you use?
The method I'm currently using to present the scans I've done is a bit unusual. I don't much like PDFs (more on that later) and the best alternative I've found is a scheme known as RAR-books. It's a bit clumsy and requires a non-freeware utility that probably most people don't have - WinRAR, which is an alternative to WinZip and other file compression tools.
The key feature of WinRAR is that it can open an archive in which the actual archive does not begin at the start of the file. It just skips over whatever it finds until it gets to the archive header. One side effect of this is that you can concatenate two files - the first a plain JPG image, and the second the RAR archive. The result will open and display as a JPG image (giving you a 'front cover' for the file) and WinRAR can extract the remaining contents.
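For anyone who wants to try it, here is a minimal Python sketch of the concatenation step. It assumes the RAR archive has already been made with WinRAR or similar; the function name and file names are just placeholders of mine, and the script does nothing more than glue the two files together.

```python
from pathlib import Path


def make_rar_book(cover_jpg: Path, archive_rar: Path, out_path: Path) -> int:
    """Write the cover image bytes followed by the RAR archive bytes.

    Returns the size of the combined file in bytes.
    """
    data = cover_jpg.read_bytes() + archive_rar.read_bytes()
    out_path.write_bytes(data)
    return len(data)


# Example call (hypothetical file names):
# make_rar_book(Path("cover.jpg"), Path("pages.rar"), Path("manual.jpg"))
```

The output keeps the .jpg extension so browsers and viewers show the cover, while an archiver that scans forward for the archive header can still unpack the rest.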
You can tell if you're looking at a RAR-book, as the JPG image will appear complete very quickly (it will only be a few K), but the browser will still be downloading the rest - possibly many megabytes. If the file name is something.jpg, the image is smallish, but the file is unexpectedly big, it's probably a RAR-book.
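If you'd rather check than guess, a rough Python sketch like the one below looks for a RAR signature inside a file that starts like a JPEG. The two byte strings are the standard RAR 4.x and RAR 5 signatures; the function name is mine, and a chance match inside ordinary image data is possible, so treat a hit as a strong hint rather than proof.

```python
# Signatures for RAR 1.5-4.x archives and RAR 5 archives respectively.
RAR_SIGNATURES = (b"Rar!\x1a\x07\x00", b"Rar!\x1a\x07\x01\x00")


def find_embedded_rar(data: bytes) -> int:
    """Return the offset of the first RAR signature after a JPEG start, or -1."""
    if not data.startswith(b"\xff\xd8"):  # JPEG start-of-image marker
        return -1
    offsets = [data.find(sig) for sig in RAR_SIGNATURES]
    offsets = [o for o in offsets if o > 0]
    return min(offsets) if offsets else -1


# Example (hypothetical file name):
# offset = find_embedded_rar(open("something.jpg", "rb").read())
# A small offset in a very large file fits the RAR-book pattern.
```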
This technique is very popular among people free-sharing scanned/OCR'd novels and other literature. It was common for a while on 4chan's 'Lit-thursday', before 4chan decided that even for that den of villainy, the amount of copyright infringement going on was getting excessive. You did realize that copyright infringement is far, far worse than rape, murder, sedition, every type of porn, and so on, didn't you?
For the moment I'm using the RAR-book form of wrapper, with page images strung together with simple html. It allows me complete freedom to optimize image compression for each page. I'm very much concerned with trying to cleanly retain the original appearance of historic documents. (Which these technical manuals are.) So that degree of image quality control is essential, I believe.
However I'm not totally happy with the RAR-book form since, apart from the non-freeware utility, it also doesn't allow inclusion of an accessible plain text version of the contents that can be searched, selected and transferred to an editor. This is a crucial feature. I'm thinking maybe I can incorporate that via the html, but not yet sure how to do it so the image and plain text are associated, word for word.
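One possible way to do it, purely as a sketch and not something I've tested on real scans: lay transparent text over the page image, one absolutely positioned span per word. The Python below assumes some OCR tool can supply each word with its bounding box in image pixels; the function name, the tuple layout and the inline styles are all my own assumptions.

```python
from html import escape


def page_html(image_name, width, height, words):
    """Build one page: the scan image with invisible, selectable words on top.

    words is an iterable of (text, left, top, box_width, box_height),
    all measured in image pixels (assumed to come from an OCR tool).
    """
    spans = []
    for text, x, y, w, h in words:
        spans.append(
            f'<span style="position:absolute;left:{x}px;top:{y}px;'
            f'width:{w}px;height:{h}px;font-size:{h}px;color:transparent">'
            f"{escape(text)}</span>"
        )
    return (
        f'<div style="position:relative;width:{width}px;height:{height}px">'
        f'<img src="{escape(image_name)}" width="{width}" height="{height}" alt="">'
        + "".join(spans)
        + "</div>"
    )


# Example with made-up coordinates:
# print(page_html("page01.png", 1700, 2200, [("Slotted", 210, 180, 160, 40)]))
```

The page still looks exactly like the scan, since the text is invisible, but the browser's find and select-and-copy work on the overlaid words.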
For scanning, I generally find that 400 dpi grayscale is adequate for text, but must be increased to 600 dpi if there are any screened images on the page. And colour if the page has colours, obviously. These both result in fairly huge scan files, typically 20 to 80 MB per page. Then the choice of final resolution and pixel-coding scheme depends on the content. B&W text works OK with 16-level grayscale, giving 4 bits/pixel. It's really necessary to retain some gray levels, or the horrible ragged 'FAX effect' occurs making character outlines look nasty. It can be quite tricky to choose the grayscale transfer curve to get a flat white background (if that's wanted) while retaining shading on character edges.
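To put rough numbers on that, here is a small Python sketch of the arithmetic, plus a simple linear transfer curve for reducing 8-bit gray to 16 levels. The page size (8.5 x 11 inches) and the black and white clip points are assumptions for illustration only; real pages and real curves need tuning by eye.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a scan at the given size and depth."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


def to_16_levels(gray, black=40, white=215):
    """Map an 8-bit gray value (0-255) to one of 16 levels (0-15).

    Values at or below `black` become 0 and values at or above `white`
    become 15, which flattens the paper background; values in between are
    scaled linearly so character edges keep some shading.
    """
    if gray <= black:
        return 0
    if gray >= white:
        return 15
    return round((gray - black) * 15 / (white - black))


# Assumed 8.5 x 11 inch page:
# raw_scan_bytes(8.5, 11, 400, 8)   -> 14,960,000 bytes (8-bit gray, 400 dpi)
# raw_scan_bytes(8.5, 11, 600, 24)  -> 100,980,000 bytes (24-bit colour, 600 dpi)
# raw_scan_bytes(8.5, 11, 400, 4)   -> 7,480,000 bytes (16-level gray, 400 dpi)
```

Lowering `white` gives a cleaner background at the cost of lighter strokes; raising `black` thickens the solid part of each character.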
The first book-scan I did was Vinge's 'True Names'. It's a SF classic - the first VR novel. Then it was out of print for over a decade. So... That one I OCR'd, but the software was less than ideal. I don't currently have a contemporary OCR tool.
The GR 900-LB Slotted Line manual I'm trying now is the most ambitious one yet. It's about 80 pages, with many diagrams.
If that turns out acceptably, I think I'll try something with foldout schematics next.
And if I ever get some scanning equipment that can cycle pages a lot faster than my present setup, there's a very large, rare, historic book of about 400 pages with many engravings, that I'd love to dump free on the net. Just to piss off a certain group of arseholes.
Why not PDF? Sigh, where to begin? I'll leave that for another comment.