Using the Pdf_Sum component for the Harvest Gatherer

The Pdf_Sum component summarizes Adobe Acrobat PDF files. It uses the program component to translate each PDF file temporarily into PostScript, then uses the ps2txt-1.0 converter to translate the PostScript into text, and finally summarizes the text. The Pdf_Sum software is provided by Dan Schmitt of the Center for Natural Resource Information Technology and is released under the GNU General Public License.

To use this component:

  1. Retrieve the Pdf_Sum component distribution (components/Pdf_Sum.tar.gz) from one of the Harvest software distribution sites.
  2. Unpack the distribution into the Harvest source tree, and add the component using the SetupComponent command:
         % gzip -dc Pdf_Sum.tar.gz | (cd harvest-1.x/components/gatherer; tar xvf -)
         % ./SetupComponent add gatherer Pdf_Sum
    
  3. Build and install the Harvest source tree as described in the Harvest User's Manual.
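
The unpack command in step 2 can be wrapped in a small shell function that checks its inputs first. This is only a sketch: the function name unpack_component and its two arguments are hypothetical and not part of the Harvest distribution. It covers the unpack step only, so SetupComponent still has to be run afterwards as shown in step 2.

```shell
#!/bin/sh
# Sketch only: unpack_component is a hypothetical helper, not a Harvest tool.
#   $1 = path to the component tarball (e.g. Pdf_Sum.tar.gz)
#   $2 = top of the Harvest source tree (e.g. harvest-1.x)
unpack_component() {
    tarball=$1
    dest="$2/components/gatherer"

    # Fail early instead of letting gzip or tar report a confusing error.
    if [ ! -f "$tarball" ]; then
        echo "tarball not found: $tarball" >&2
        return 1
    fi
    if [ ! -d "$dest" ]; then
        echo "gatherer components directory not found: $dest" >&2
        return 1
    fi

    # Same pipeline as step 2: decompress to stdout, extract inside $dest.
    gzip -dc "$tarball" | (cd "$dest" && tar xf -)
}

# Example call (paths are assumptions, adjust to your layout):
#   unpack_component Pdf_Sum.tar.gz harvest-1.x
```

A relative tarball path works here because gzip opens the file before the subshell changes directory.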

For more information about Harvest, see the Harvest home page.