PDFUnit can handle Unicode. Section 11, “Unicode”, covers this topic in detail.
The following sections describe a utility program that converts a Unicode string into its hex code. The hex code can be used in many of your tests. If you use only a small number of Unicode characters, writing them as hex code is easier than installing a new font on your computer.
The utility ConvertUnicodeToHex converts any string into ASCII by escaping every non-ASCII character as its corresponding Unicode hex code. For example, the Euro sign (€) is converted into \u20AC.
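The escaping rule described above can be illustrated with a few lines of Java. This is only a minimal sketch of the idea, not the implementation used by ConvertUnicodeToHex; the class name UnicodeEscapeSketch and the method toAsciiWithEscapes are hypothetical and do not belong to PDFUnit.

```java
public class UnicodeEscapeSketch {

    /**
     * Hypothetical helper (not part of PDFUnit): copies 7-bit ASCII characters
     * unchanged and writes every other char as a backslash, the letter u and
     * four uppercase hex digits.
     */
    public static String toAsciiWithEscapes(String input) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 128) {
                sb.append(c);                                   // plain ASCII stays as it is
            } else {
                sb.append(String.format("\\u%04X", (int) c));  // e.g. the Euro sign becomes 20AC
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The string literal holds a-umlaut, o-umlaut, u-umlaut, a blank, the Euro sign, a blank and @.
        System.out.println(toAsciiWithEscapes("\u00E4\u00F6\u00FC \u20AC @"));
    }
}
```

Characters outside the Basic Multilingual Plane are stored as two chars in a Java string, so this sketch would emit two escape sequences for each of them.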
The input file can have any encoding, but you must declare that encoding when you start the program. Pass it to the Java program with the parameter -Dfile.encoding:
    ::
    :: Converting Unicode content of the input file to hex code.
    ::
    @echo off
    setlocal
    set CLASSPATH=./lib/pdfunit-2016.05/*;%CLASSPATH%
    set TOOL=com.pdfunit.tools.ConvertUnicodeToHex
    set OUT_DIR=./tmp
    set IN_FILE=unicode-to-hex.in.txt
    java -Dfile.encoding=UTF-8 %TOOL% %IN_FILE% %OUT_DIR%
    endlocal
The generated file _unicode-to-hex.out.txt then contains the following data:
    #Unicode created by com.pdfunit.tools.ConvertUnicodeToHex
    #Wed Jan 16 21:50:04 CET 2013
    unicode-to-hex.in_as-ascii=\u00E4\u00F6\u00FC \u20AC @
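The output shown above looks like a Java properties file (comment lines starting with #, followed by a key=value pair). Assuming that is the case, the standard class java.util.Properties can read it and decodes the hex codes back into the original characters. The following sketch demonstrates this with the key/value line from the listing; the class name ReadHexOutputSketch and the method decode are hypothetical and not part of PDFUnit.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ReadHexOutputSketch {

    /** Hypothetical helper: parses properties-style text and returns the value stored under key. */
    public static String decode(String propertiesText, String key) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText)); // load() translates the hex escapes into characters
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        // The key/value line from the listing, written as a Java string literal.
        String content = "unicode-to-hex.in_as-ascii=\\u00E4\\u00F6\\u00FC \\u20AC @\n";
        String decoded = decode(content, "unicode-to-hex.in_as-ascii");
        // Compare with the same characters written as escapes in Java source code.
        System.out.println(decoded.equals("\u00E4\u00F6\u00FC \u20AC @"));
    }
}
```

In a test you can paste the hex code directly into a Java string literal, because the Java compiler translates such escapes into the corresponding characters.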
Leading and trailing whitespace in the input string is trimmed. If you need it for your test, add it by hand afterwards.