3.13.  Images in PDF Documents

Overview

An outdated image in a document impresses a customer about as much as a repeated New Year's speech. Make sure that the document actually shows the new logo and not the old one.

Another source of errors is an image that was not found when the PDF was created, so it is missing from the document. Let an automated test detect this error, not your customer.

Finally, one more kind of error has to be mentioned: images sometimes appear on the wrong page.

All of these errors can be detected with the following tags:

<!-- Tags for image tests: -->

<hasNumberOfDifferentImages />
<hasNumberOfVisibleImages />
<containsImage file=".."   (required)
               
               on=".."                 (one of the page selection attributes
               onPage=".."             ...
               onEveryPageAfter=".."   ...
               onEveryPageBefore=".."  ...
               onAnyPageAfter=".."     ...
               onAnyPageBefore=".."    ... is required)
/>
<containsOneOfTheseImages />

The number of images stored in a PDF document is typically not the same as the number of images you see when it is printed. A logo visible on 10 pages is stored only once within the document. PDFUnit therefore provides two tags: <hasNumberOfDifferentImages /> validates the number of images stored internally, and <hasNumberOfVisibleImages /> validates the number of visible images.

Number of Different Images inside a PDF

The following listing shows the syntax for verifying the number of images stored internally in a PDF:

<testcase name="hasNumberOfDifferentImages">
  <assertThat testDocument="images/imageDemo.pdf">
    <hasNumberOfDifferentImages>2</hasNumberOfDifferentImages>
  </assertThat>
</testcase>

How do you know in this example that 2 is the right number? How do you know which images are stored internally in a given PDF? Both questions are answered by the utility program ExtractImages, which extracts all images from a document into separate files. Chapter 9.7: “Extract Images from PDF” describes this topic in detail.

Number of Visible Images inside a PDF

The next example validates the number of visible images:

<testcase name="hasNumberOfVisibleImages">
  <assertThat testDocument="images/imageDemo.pdf">
    <hasNumberOfVisibleImages>6</hasNumberOfVisibleImages>
  </assertThat>
</testcase>

The sample document shows 6 images across its 6 pages: page 3 contains 2 images and page 4 contains none.

The test for visible images can be limited to specified pages. In the following example, only the images on page 3 are counted:

<testcase name="hasNumberOfVisibleImages_OnPage3">
  <assertThat testDocument="images/imageDemo.pdf">
    <hasNumberOfVisibleImages onPage="3">2</hasNumberOfVisibleImages>
  </assertThat>
</testcase>

The same image shown twice on a page is counted twice.

The possibilities for limiting tests to specified pages are described in chapter 13.2: “Page Selection”.
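
Because the sample document shows no image on page 4, the same syntax can presumably be used to assert that a page is free of images. The following is only a sketch: it assumes that PDFUnit accepts 0 as an expected count, which this chapter does not state explicitly, and the test name is made up for illustration:

```xml
<!-- Sketch: assumes that an expected count of 0 is accepted. -->
<testcase name="hasNumberOfVisibleImages_OnPage4_None">
  <assertThat testDocument="images/imageDemo.pdf">
    <hasNumberOfVisibleImages onPage="4">0</hasNumberOfVisibleImages>
  </assertThat>
</testcase>
```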

Validate the Existence of an Expected Image

After counting images you might need to test the images themselves. In the following example, PDFUnit verifies that a given image is part of a PDF document:

<testcase name="containsImage">
  <assertThat testDocument="images/imageDemo.pdf">
    <containsImage file="images/apache-software-foundation-logo.png" 
                   on="ANY_PAGE" 
    />
  </assertThat>
</testcase>

The result of a comparison of two images depends on their file formats. PDFUnit can handle JPEG, PNG, GIF, BMP and WBMP. The images are compared byte by byte. Therefore, BMP and PNG versions of an image are not recognized as equal.

A tool that generates PDF may convert the format of an image when importing it from a file, because not all image formats are supported in PDF. So it might be impossible for PDFUnit to successfully compare an image inside your PDF file with the original image file. If you have such a problem, extract the desired image from a sample document into a new PNG file by following these steps:

  • Extract all pictures from a PDF file using ExtractImages. All pictures are stored as PNG.

  • Identify the extracted picture you want to use and check it.

  • Use PDFUnit as demonstrated in the listing above.
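
The last step might look like the following sketch. The file name images/extracted/logo-page1.png and the test name are hypothetical; the file name stands for whichever PNG file ExtractImages wrote for the picture you selected:

```xml
<testcase name="containsImage_ExtractedFromSample">
  <assertThat testDocument="images/imageDemo.pdf">
    <!-- Hypothetical file name: a PNG file written by ExtractImages -->
    <containsImage file="images/extracted/logo-page1.png"
                   on="ANY_PAGE"
    />
  </assertThat>
</testcase>
```

Because the reference file was taken from a PDF and not from the original image file, the byte-by-byte comparison should no longer be affected by the format conversion of the generating tool.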

Use Multiple Images for Comparison

A PDF document might contain one of three possible logos, or a signature that is one of five possible ones. Use the tag <containsOneOfTheseImages /> to test such a situation:

<testcase name="containsOneOfManyImages_alex">
  <assertThat testDocument="images/letter-signed-by-alex.pdf">
    <containsOneOfTheseImages on="LAST_PAGE">
      <image file="images/signature-alex.png" />
      <image file="images/signature-bob.png" />
    </containsOneOfTheseImages>
  </assertThat>
</testcase>
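
The logo case mentioned above could be sketched in the same way. All file names and the test name in this example are hypothetical, and it assumes that on="ANY_PAGE" is accepted by <containsOneOfTheseImages /> just as it is by <containsImage />:

```xml
<testcase name="containsOneOfManyImages_Logo">
  <!-- Hypothetical document and image files, for illustration only -->
  <assertThat testDocument="images/letter-with-logo.pdf">
    <containsOneOfTheseImages on="ANY_PAGE">
      <image file="images/logo-variant-a.png" />
      <image file="images/logo-variant-b.png" />
      <image file="images/logo-variant-c.png" />
    </containsOneOfTheseImages>
  </assertThat>
</testcase>
```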

This test can also refer to several pages of a document, as the following section shows.

Validate Images on Specified Pages

The tests for images can be restricted to a single page, to several individual pages, or to a range of contiguous pages. All possibilities are described in chapter 13.2: “Page Selection”.

Here are some examples:

<testcase name="containsImage_OnEveryPageAfter4">
  <assertThat testDocument="images/imageDemo.pdf">
    <containsImage file="images/apache-software-foundation-logo.png" 
                   onEveryPageAfter="4" 
    />
  </assertThat>
</testcase>
<testcase name="containsImage_OnMultipleSelectedPages">
  <assertThat testDocument="images/imageDemo.pdf">
    <containsImage file="images/apache-software-foundation-logo.png" 
                   onPage="1, 5"  />
  </assertThat>
</testcase>
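
The other page selection attributes from the tag overview follow the same pattern. The sketch below is illustrative only: the page numbers are arbitrary, the tests have not been checked against imageDemo.pdf, and whether the given page itself counts as "before" is defined in chapter 13.2, not here:

```xml
<!-- Sketch: page numbers are arbitrary and not verified against the sample. -->
<testcase name="containsImage_OnAnyPageBefore3">
  <assertThat testDocument="images/imageDemo.pdf">
    <containsImage file="images/apache-software-foundation-logo.png"
                   onAnyPageBefore="3"
    />
  </assertThat>
</testcase>
<testcase name="containsImage_OnEveryPageBefore3">
  <assertThat testDocument="images/imageDemo.pdf">
    <containsImage file="images/apache-software-foundation-logo.png"
                   onEveryPageBefore="3"
    />
  </assertThat>
</testcase>
```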

A tag can be used several times within one test, although it might be better to write two separate tests:

<testcase name="containsImage_MultipleInvocation">
  <assertThat testDocument="images/imageDemo.pdf">
    <containsImage file="images/apache-software-foundation-logo.png" 
                   onEveryPageAfter="4"
    />
    <containsImage file="images/apache-ant-logo.png" 
                   onPage="3"
    />
  </assertThat>
</testcase>
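
Split into two tests, the same checks could be written as in the following sketch. The test names are chosen for illustration. Each image then gets its own test result, which should make a failure easier to attribute:

```xml
<testcase name="containsImage_AsfLogo_OnEveryPageAfter4">
  <assertThat testDocument="images/imageDemo.pdf">
    <containsImage file="images/apache-software-foundation-logo.png"
                   onEveryPageAfter="4"
    />
  </assertThat>
</testcase>
<testcase name="containsImage_AntLogo_OnPage3">
  <assertThat testDocument="images/imageDemo.pdf">
    <containsImage file="images/apache-ant-logo.png"
                   onPage="3"
    />
  </assertThat>
</testcase>
```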

All images in a PDF document can be compared to the images of a master PDF. Those tests are described in chapter 4.10: “Comparing Images”.