3.34.  Text - Right to Left (RTL)

Overview

Tests for right-to-left (RTL) text work the same way as tests for left-to-right (LTR) text, so all text comparison methods can be used:

// Testing page content:
.hasText()                 // pages and regions must be specified beforehand

// Validating expected text:
.hasText().containing(..) 
.hasText().containing(.., WhitespaceProcessing)
.hasText().endingWith(..)
.hasText().endingWith(.., WhitespaceProcessing)
.hasText().equalsTo(..) 
.hasText().equalsTo(.., WhitespaceProcessing)
.hasText().matchingRegex(..) 
.hasText().startingWith(..) 

// Verifying that specified text is absent:
.hasText().notContaining(..) 
.hasText().notContaining(.., WhitespaceProcessing)
.hasText().notEndingWith(..)
.hasText().notMatchingRegex(..) 
.hasText().notStartingWith(..)

// Validating multiple texts in an expected order:
.hasText().inOrder(..)
.hasText().containingFirst(..).then(..)

Example - 'hello, world' from right to left

The following examples use two PDF documents that contain the text 'hello, world', one in Arabic and one in Hebrew:

// Testing RTL text:
@Test
public void hasRTLText_HelloWorld_Arabic() throws Exception {
  String filename = "helloworld_ar.pdf";
  String rtlHelloWorld = "مرحبا، العالم";  // English: 'hello, world'
  
  int leftX  = 97;
  int upperY = 69;
  int width  = 69;
  int height = 16;
  PageRegion pageRegion = new PageRegion(leftX, upperY, width, height);
  AssertThat.document(filename)
            .restrictedTo(FIRST_PAGE)
            .restrictedTo(pageRegion)
            .hasText()
            .startingWith(rtlHelloWorld)
  ;
}

// Testing RTL text (Hebrew):
@Test
public void hasRTLText_HelloWorld_Hebrew() throws Exception {
  String filename = "helloworld_iw.pdf";
  String rtlHelloWorld = "שלום, עולם";   // English: 'hello, world'
  
  int leftX  = 97;
  int upperY = 69;
  int width  = 69;
  int height = 16;
  PageRegion pageRegion = new PageRegion(leftX, upperY, width, height);
  AssertThat.document(filename)
            .restrictedTo(FIRST_PAGE)
            .restrictedTo(pageRegion)
            .hasText()
            .endingWith(rtlHelloWorld)
  ;
}

It is worth noting that the Java editor in Eclipse can handle text in both directions. Here is a screenshot of the Java code from the previous example:

Internally, PDFUnit uses the PDF parser PDFBox. PDFBox parses RTL text and converts it into a Java string without the need for any special method calls. Congratulations to its development team on this achievement!
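
The tests above use startingWith(..) and endingWith(..) on RTL text without any special handling. A plausible reason is that a Java string stores characters in logical (reading) order, regardless of the direction in which they are displayed. The following minimal sketch illustrates this with plain JDK classes only. It does not call PDFUnit or PDFBox, and it rests on two assumptions that this chapter does not confirm: that the extracted text reaches the comparison in logical order, and that matchingRegex(..) accepts java.util.regex syntax. The class name RtlLogicalOrderDemo and its members are hypothetical. The string is the Hebrew 'hello, world' ("שלום, עולם") from the example above, written with Unicode escapes so that the code does not depend on the source file encoding.

```java
import java.text.Bidi;
import java.util.regex.Pattern;

// Hypothetical demo class; it uses only the JDK, not PDFUnit or PDFBox.
class RtlLogicalOrderDemo {

    // 'hello, world' in Hebrew, the same text as in the Hebrew test above.
    static final String HELLO_WORLD =
        "\u05E9\u05DC\u05D5\u05DD, \u05E2\u05D5\u05DC\u05DD";

    // 'hello' in Hebrew: read first, displayed at the right-hand end.
    static final String HELLO = "\u05E9\u05DC\u05D5\u05DD";

    // 'world' in Hebrew: read last, displayed at the left-hand end.
    static final String WORLD = "\u05E2\u05D5\u05DC\u05DD";

    // Two runs of Hebrew letters separated by ", ", written in reading order.
    static final Pattern TWO_HEBREW_WORDS =
        Pattern.compile("^\\p{IsHebrew}+, \\p{IsHebrew}+$");

    /** Returns true if the text contains characters that need bidirectional layout. */
    static boolean needsBidiLayout(String text) {
        char[] chars = text.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    public static void main(String[] args) {
        // "Start" and "end" refer to reading order, not to the left or right edge.
        System.out.println("starts with 'hello': " + HELLO_WORLD.startsWith(HELLO));
        System.out.println("ends with 'world':   " + HELLO_WORLD.endsWith(WORLD));

        // A regular expression is also written and matched in reading order.
        System.out.println("matches pattern:     "
            + TWO_HEBREW_WORDS.matcher(HELLO_WORLD).matches());

        // Display direction is a separate concern, handled at rendering time.
        System.out.println("needs bidi layout:   " + needsBidiLayout(HELLO_WORLD));
    }
}
```

Under these assumptions, the expected value passed to a method such as startingWith(..) is the part of the text that is read first, even though an editor displays it at the right-hand end of the line.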