Create, Read and Update PDF Using iText Tool

iText is a free and open source library for creating and manipulating PDF files in Java. iText has been ported to the .NET Framework under the name iTextSharp. iTextSharp is written in C# and has a separate codebase, but it is kept in sync with iText releases. iText focuses on automation: it is meant to be called from your own code rather than operated by hand.

When to use iText?

Typically, iText is used in projects that have one of the following requirements:

  • The content isn’t available in advance: it’s calculated based on user input or real-time database information.
  • The PDF files can’t be produced manually due to the massive volume of content: a large number of pages or documents.
  • Documents need to be created in unattended mode, in a batch process.
  • The content needs to be customized or personalized; for instance, the name of the end user has to be stamped on a number of pages.

Often you’ll encounter these requirements in web applications, where content needs to be served dynamically to a browser. Normally, you’d serve this information in the form of HTML, but for some documents, PDF is preferred over HTML for better printing quality, for identical presentation on a variety of platforms, for security reasons, or to reduce the file size.

iText’s main purpose is to create and manipulate PDF documents.

iText is a PDF library

iText is an API that was developed to allow developers to do the following (and much more):

  • Generate documents and reports based on data from an XML file or a database
  • Create maps and books, exploiting numerous interactive features available in PDF
  • Add bookmarks, page numbers, watermarks, and other features to existing PDF documents
  • Split or concatenate pages from existing PDF files
  • Fill out interactive forms
  • Serve dynamically generated or manipulated PDF documents to a web browser

iText is not an end-user tool. You have to build iText into your own applications so that you can automate the PDF creation and manipulation process.

Creating PDFs

With the Document and the PdfWriter class, you can create PDF documents from scratch, using data from a database, an XML file, or any other source. You can do this in three different ways:

  • Using high-level objects such as Chunk, Phrase, Paragraph, List, and so on. These objects are often referred to as iText’s basic building blocks.
  • Using low-level functionality. This is done with PdfContentByte, a class that consists of a series of methods that map to every operator and operand available in Adobe’s imaging model. This class also has numerous convenience methods to draw arcs, circles, rectangles, and text at absolute positions.
  • Using PdfGraphics2D, which is iText’s implementation of the abstract Graphics2D class in Java (not available in iTextSharp).
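
As a rough illustration of the first two approaches, here is a minimal sketch. It assumes the iText 5.x jar (the com.itextpdf.text packages) is on the classpath; the class name CreatePdfSketch, the sample text, and the coordinates are made up for this example. It writes the PDF to memory rather than to a file so that it has no dependency on the file system.

```java
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Element;
import com.itextpdf.text.List;
import com.itextpdf.text.ListItem;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfWriter;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical class name; assumes iText 5.x is on the classpath.
class CreatePdfSketch {

    static byte[] createPdf() throws DocumentException, IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Document document = new Document();   // default page size and margins
        PdfWriter writer = PdfWriter.getInstance(document, out);
        document.open();

        // High-level building blocks: iText handles the layout.
        document.add(new Paragraph("Report generated with iText"));
        List list = new List(List.ORDERED);
        list.add(new ListItem("First item"));
        list.add(new ListItem("Second item"));
        document.add(list);

        // Low-level functionality: draw directly on the page canvas.
        PdfContentByte canvas = writer.getDirectContent();
        canvas.rectangle(36, 600, 250, 40);   // x, y, width, height in points
        canvas.stroke();
        BaseFont helvetica = BaseFont.createFont();   // built-in Helvetica
        canvas.beginText();
        canvas.setFontAndSize(helvetica, 12);
        canvas.showTextAligned(Element.ALIGN_LEFT,
                "Text at an absolute position", 44, 615, 0);
        canvas.endText();

        document.close();   // finishes the PDF
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Created " + createPdf().length + " bytes");
    }
}
```

To write to disk instead, you could pass a FileOutputStream to PdfWriter.getInstance in place of the in-memory stream.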

iText ships with a plethora of classes that support encryption, different image types, color spaces, and fonts. There’s functionality to enhance the accessibility of the PDF file, support for the integration of Flash apps into the PDF, and so on. iText can convert an XML or an HTML file to PDF, but only on a very basic level. Converting documents from one format to another is outside the scope of iText. And no: iText does not convert Word documents to PDF!

Updating PDFs

You always need a PdfReader instance to access an existing document. You can use this reader in combination with PdfStamper to stamp extra content on the existing PDF document: page numbers, a watermark, annotations, and so on. PdfStamper is also the class you’ll use to fill out interactive forms. iText has almost complete support for AcroForms, but as soon as you have a form involving the XML Forms Architecture, the possibilities are limited.
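
The PdfReader plus PdfStamper combination might look like the following sketch, which stamps "Page X of Y" at the bottom of every page. It assumes iText 5.x on the classpath; StampPdfSketch, samplePdf, and pageCount are hypothetical names, and the sample document exists only so the example has something to stamp.

```java
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Element;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.Phrase;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.ColumnText;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.PdfWriter;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical class name; assumes iText 5.x is on the classpath.
class StampPdfSketch {

    // Helper for this example only: builds a small multi-page PDF in memory.
    static byte[] samplePdf(int pages) throws DocumentException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Document document = new Document();
        PdfWriter.getInstance(document, out);
        document.open();
        for (int i = 1; i <= pages; i++) {
            if (i > 1) {
                document.newPage();
            }
            document.add(new Paragraph("Original content of page " + i));
        }
        document.close();
        return out.toByteArray();
    }

    // Stamps "Page X of Y" on top of the existing content of each page.
    static byte[] stampPageNumbers(byte[] source)
            throws IOException, DocumentException {
        PdfReader reader = new PdfReader(source);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PdfStamper stamper = new PdfStamper(reader, out);
        int total = reader.getNumberOfPages();
        for (int page = 1; page <= total; page++) {
            Rectangle size = reader.getPageSize(page);
            float centerX = (size.getLeft() + size.getRight()) / 2;
            PdfContentByte over = stamper.getOverContent(page);
            ColumnText.showTextAligned(over, Element.ALIGN_CENTER,
                    new Phrase("Page " + page + " of " + total),
                    centerX, size.getBottom() + 20, 0);
        }
        stamper.close();   // writes the stamped document
        reader.close();
        return out.toByteArray();
    }

    // Helper for this example only: counts the pages of a PDF.
    static int pageCount(byte[] pdf) throws IOException {
        PdfReader reader = new PdfReader(pdf);
        int pages = reader.getNumberOfPages();
        reader.close();
        return pages;
    }

    public static void main(String[] args) throws Exception {
        byte[] stamped = stampPageNumbers(samplePdf(3));
        System.out.println("Stamped pages: " + pageCount(stamped));
    }
}
```

A watermark could be added the same way, using getUnderContent instead of getOverContent so that it sits beneath the existing page content.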

You can split and merge PDF documents with PdfCopy, PdfSmartCopy, PdfCopyFields, or even by using PdfImportedPage objects in combination with PdfWriter or PdfStamper.
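
A minimal sketch of merging and splitting with PdfCopy follows, again assuming iText 5.x on the classpath. MergePdfSketch and its helper methods are hypothetical names chosen for this example, and the documents are kept in memory to keep it self-contained.

```java
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical class name; assumes iText 5.x is on the classpath.
class MergePdfSketch {

    // Helper for this example only: builds a small multi-page PDF in memory.
    static byte[] samplePdf(int pages) throws DocumentException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Document document = new Document();
        PdfWriter.getInstance(document, out);
        document.open();
        for (int i = 1; i <= pages; i++) {
            if (i > 1) {
                document.newPage();
            }
            document.add(new Paragraph("Page " + i));
        }
        document.close();
        return out.toByteArray();
    }

    // Merge: copies every page of every source into one new document.
    static byte[] merge(byte[]... sources)
            throws IOException, DocumentException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Document document = new Document();
        PdfCopy copy = new PdfCopy(document, out);
        document.open();
        for (byte[] source : sources) {
            PdfReader reader = new PdfReader(source);
            for (int page = 1; page <= reader.getNumberOfPages(); page++) {
                copy.addPage(copy.getImportedPage(reader, page));
            }
            copy.freeReader(reader);
            reader.close();
        }
        document.close();
        return out.toByteArray();
    }

    // Split: copies only the pages from..to (1-based, inclusive).
    static byte[] extractPages(byte[] source, int from, int to)
            throws IOException, DocumentException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Document document = new Document();
        PdfCopy copy = new PdfCopy(document, out);
        document.open();
        PdfReader reader = new PdfReader(source);
        for (int page = from; page <= to; page++) {
            copy.addPage(copy.getImportedPage(reader, page));
        }
        copy.freeReader(reader);
        reader.close();
        document.close();
        return out.toByteArray();
    }

    // Helper for this example only: counts the pages of a PDF.
    static int pageCount(byte[] pdf) throws IOException {
        PdfReader reader = new PdfReader(pdf);
        int pages = reader.getNumberOfPages();
        reader.close();
        return pages;
    }

    public static void main(String[] args) throws Exception {
        byte[] merged = merge(samplePdf(2), samplePdf(1));
        System.out.println("Merged pages: " + pageCount(merged));
    }
}
```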

Converting a PDF document to another format is outside the scope of iText, but you can convert a PDF to XML if the PDF was tagged and contains a structure tree. Depending on how the PDF was created, you can also extract plain text from a page.

iText can also be used to sign existing PDF documents, as well as to encrypt them.

Reading PDFs

iText isn’t a PDF viewer: it can’t convert a PDF to an image, nor can it be used to print a PDF. The PdfReader class does, however, give you access to the objects that form a PDF document and to the content stream of each page. This content stream can be parsed, and if the content wasn’t added as rasterized text, you can convert a page to plain text. Note that iText doesn’t do OCR.
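
The sketch below shows one way to get at a page’s content stream and its plain text. It assumes iText 5.x on the classpath and uses PdfTextExtractor from the com.itextpdf.text.pdf.parser package; ReadPdfSketch and its method names are hypothetical, and the sample PDF is generated on the spot so the example is self-contained.

```java
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical class name; assumes iText 5.x is on the classpath.
class ReadPdfSketch {

    // Helper for this example only: a one-page PDF containing the given text.
    static byte[] samplePdf(String text) throws DocumentException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Document document = new Document();
        PdfWriter.getInstance(document, out);
        document.open();
        document.add(new Paragraph(text));
        document.close();
        return out.toByteArray();
    }

    // Returns the decoded content stream of a page: the raw PDF operators.
    static String rawContentStream(byte[] pdf, int pageNumber)
            throws IOException {
        PdfReader reader = new PdfReader(pdf);
        byte[] content = reader.getPageContent(pageNumber);
        reader.close();
        return new String(content, StandardCharsets.ISO_8859_1);
    }

    // Parses the content stream and returns the text found on the page.
    // This only works for real text; scanned (rasterized) pages yield nothing.
    static String extractText(byte[] pdf, int pageNumber) throws IOException {
        PdfReader reader = new PdfReader(pdf);
        String text = PdfTextExtractor.getTextFromPage(reader, pageNumber);
        reader.close();
        return text;
    }

    public static void main(String[] args) throws Exception {
        byte[] pdf = samplePdf("Hello reader");
        System.out.println(rawContentStream(pdf, 1));
        System.out.println(extractText(pdf, 1));
    }
}
```

How clean the extracted text is depends on how the PDF was produced, so treat the result as a best effort rather than a faithful reconstruction of the page.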
