9.12.  Extract XMP Data to XML

The utility program ExtractXMPData writes the document-level XMP data of a PDF document to an XML file. This file can be used for the PDFUnit tests described in section 3.38: “XMP Data”.

XMP data can also occur in places other than the document level, for example in the metadata of individual pages or images. Such XMP data is currently not extracted. Extraction of all XMP data is planned for the next release of PDFUnit.

Program Start

::
:: Extract XMP data from a PDF document as XML
::

@echo off
setlocal
set CLASSPATH=./lib/aspectj-1.8.7/*;%CLASSPATH%
set CLASSPATH=./lib/bouncycastle-jdk15on-153/*;%CLASSPATH%
set CLASSPATH=./lib/commons-logging-1.2/*;%CLASSPATH%
set CLASSPATH=./lib/pdfbox-2.0.0/*;%CLASSPATH%
set CLASSPATH=./lib/pdfunit-2016.05/*;%CLASSPATH%

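set TOOL=com.pdfunit.tools.ExtractXMPData
:: Directory for the generated XML file
set OUT_DIR=./tmp
:: PDF document to read
set IN_FILE=LXX_vocab.pdf
:: Password of the PDF document; leave empty for unencrypted documents
set PASSWD=

java  %TOOL%  %IN_FILE%  %OUT_DIR%  %PASSWD%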
endlocal
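The batch file above is Windows-specific (set, %VAR% expansion, ";" as classpath separator). For other platforms, or for calling the tool from a Java build step, the same command line could be assembled with ProcessBuilder. The following is only a sketch: it assumes nothing beyond what the script shows (the tool class is started with the input file, the output directory and an optional password as arguments), and the class RunExtractXmpSketch is hypothetical, not part of PDFUnit. Unlike the script, it passes the classpath with -cp, so an existing CLASSPATH environment variable is not included.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rebuilds the batch script's java call in a platform-neutral way
public class RunExtractXmpSketch {

    // Library folders copied from the batch script
    static final String[] LIB_DIRS = {
        "./lib/aspectj-1.8.7/*",
        "./lib/bouncycastle-jdk15on-153/*",
        "./lib/commons-logging-1.2/*",
        "./lib/pdfbox-2.0.0/*",
        "./lib/pdfunit-2016.05/*"
    };

    static List<String> buildCommand(String inFile, String outDir, String password) {
        // File.pathSeparator is ";" on Windows and ":" on Unix-like systems
        String classpath = String.join(File.pathSeparator, LIB_DIRS);
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add("com.pdfunit.tools.ExtractXMPData");
        cmd.add(inFile);
        cmd.add(outDir);
        // An empty %PASSWD% expands to nothing, so the script passes no third argument
        if (password != null && !password.isEmpty()) {
            cmd.add(password);
        }
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = buildCommand("LXX_vocab.pdf", "./tmp", "");
        // inheritIO() shows the tool's console output in this process's console
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(process.waitFor());
    }
}
```

Passing the "*" entries to -cp unexpanded is deliberate: ProcessBuilder does not involve a shell, and the Java launcher itself expands classpath wildcards to the JAR files in each folder.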

Input

The XMP data will be extracted from LXX_vocab.pdf.

Output

The following excerpt shows part of the output file _xmpdata_LXX_vocab.out.xml:

<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
<?adobe-xap-filters esc="CRLF"?>
<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='XMP toolkit 2.9.1-14, framework 1.6'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' 
            xmlns:iX='http://ns.adobe.com/iX/1.0/'
>
...
<rdf:Description rdf:about='uuid:f6a30687-f1ac-4b71-a555-34b7622eaa94' 
                 xmlns:pdf='http://ns.adobe.com/pdf/1.3/' 
                 pdf:Producer='Acrobat Distiller 6.0.1 (Windows)' 
                 pdf:Keywords='LXX, Septuagint, vocabulary, frequency'>
</rdf:Description>
<rdf:Description rdf:about='uuid:f6a30687-f1ac-4b71-a555-34b7622eaa94' 
                 xmlns:xap='http://ns.adobe.com/xap/1.0/' 
                 xap:CreateDate='2006-05-02T11:35:38-04:00' 
                 xap:CreatorTool='PScript5.dll Version 5.2.2' 
                 xap:ModifyDate='2006-05-02T11:37:57-04:00' 
                 xap:MetadataDate='2006-05-02T11:37:57-04:00'>
</rdf:Description>
...
</rdf:RDF>
</x:xmpmeta>
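PDFUnit compares against this file itself (section 3.38). To inspect individual values of the extracted file by hand, the JDK's namespace-aware DOM parser is sufficient. The following Java sketch is an illustration only, not part of PDFUnit; the class ReadXmpSketch and its methods are hypothetical. It reads only properties written as attributes of rdf:Description, as in the excerpt above. RDF also allows properties as child elements, which this sketch does not handle.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ReadXmpSketch {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String PDF_NS = "http://ns.adobe.com/pdf/1.3/";
    static final String XAP_NS = "http://ns.adobe.com/xap/1.0/";

    // Shortened copy of the excerpt above, used when no file is given
    static final String SAMPLE =
        "<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>"
        + "<x:xmpmeta xmlns:x='adobe:ns:meta/'>"
        + "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>"
        + "<rdf:Description rdf:about='uuid:f6a30687-f1ac-4b71-a555-34b7622eaa94'"
        + " xmlns:pdf='http://ns.adobe.com/pdf/1.3/'"
        + " pdf:Producer='Acrobat Distiller 6.0.1 (Windows)'/>"
        + "<rdf:Description rdf:about='uuid:f6a30687-f1ac-4b71-a555-34b7622eaa94'"
        + " xmlns:xap='http://ns.adobe.com/xap/1.0/'"
        + " xap:CreateDate='2006-05-02T11:35:38-04:00'/>"
        + "</rdf:RDF></x:xmpmeta>";

    // Parses with a namespace-aware DOM parser, which the *NS lookups below require
    static Document parse(InputSource source) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(source);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XMP data", e);
        }
    }

    static Document parseString(String xml) {
        return parse(new InputSource(new StringReader(xml)));
    }

    // Returns the first attribute ns:localName found on any rdf:Description, or null
    static String findProperty(Document doc, String ns, String localName) {
        NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
        for (int i = 0; i < descriptions.getLength(); i++) {
            Element description = (Element) descriptions.item(i);
            if (description.hasAttributeNS(ns, localName)) {
                return description.getAttributeNS(ns, localName);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Optional argument: path of an extracted file, e.g. the one written to OUT_DIR.
        // Parsing via the file URI lets the parser detect the file encoding itself.
        Document doc = args.length > 0
            ? parse(new InputSource(Paths.get(args[0]).toUri().toString()))
            : parseString(SAMPLE);
        System.out.println("Producer:   " + findProperty(doc, PDF_NS, "Producer"));
        System.out.println("CreateDate: " + findProperty(doc, XAP_NS, "CreateDate"));
        // With SAMPLE this prints:
        // Producer:   Acrobat Distiller 6.0.1 (Windows)
        // CreateDate: 2006-05-02T11:35:38-04:00
    }
}
```

Looking up attributes by namespace URI rather than by prefix keeps the sketch independent of the prefixes (pdf:, xap:) a particular producer happens to use.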