I’m currently trying to extract text from a PDF document, but I have run into some strange cases with the Tj operator. Normally I deal with cases like this:
Tc (SOME_TEXT) TJ
Now I have encountered a case like this:
Tm [ ( )1.828 (5)1.841 (2)1.828 (2)1.828 (4)1.841 (9)1.828 (.)1.828 (6)1.841 (4) ] TJ
Which converts to the string ‘52249.64’. Now I have encountered yet another strange case:
Td ( \t\004\007\020\007\016\016\026\020 ) Tj
The only information I could find is this: A string passed to Tj is always to be interpreted according to the Encoding or CMap for the font. (In this case I expect it is a CIDFont with a CMap)
I still don’t understand. Are these values indexes into some kind of character array, or do I have to decode them? Thanks!
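
For reference, here is a minimal Python sketch of how the operands above turn into raw character codes. It assumes the operand has already been cut out of the content stream as text, and the function names (decode_literal_string, strings_in_tj_array) are hypothetical helpers, not part of any PDF library. It only undoes the backslash escapes; it does not map the resulting codes to Unicode, because that depends on the font's Encoding or CMap.

```python
import re

# Single-character escapes allowed inside a PDF literal string.
_SIMPLE_ESCAPES = {"n": 10, "r": 13, "t": 9, "b": 8, "f": 12,
                   "(": 40, ")": 41, "\\": 92}


def decode_literal_string(body: str) -> bytes:
    """Undo backslash escapes in the text between ( and ) and return raw codes.

    Assumption: `body` holds one character per content-stream byte.
    """
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ord(ch) & 0xFF)
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(body):
            break
        nxt = body[i]
        if nxt in "01234567":
            # Octal escape: one to three octal digits, e.g. \004 or \020.
            j = i
            while j < len(body) and j < i + 3 and body[j] in "01234567":
                j += 1
            out.append(int(body[i:j], 8) & 0xFF)
            i = j
        elif nxt in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[nxt])
            i += 1
        elif nxt == "\n":
            i += 1  # backslash at end of line: the string continues
        else:
            out.append(ord(nxt) & 0xFF)  # unknown escape: keep the character
            i += 1
    return bytes(out)


def strings_in_tj_array(array_src: str) -> list[bytes]:
    """Collect the string elements of a TJ array, skipping the numbers.

    The numbers are glyph position adjustments and carry no text.
    Simplification: unescaped nested parentheses are not handled.
    """
    pattern = r"\(((?:\\.|[^\\()])*)\)"
    return [decode_literal_string(m) for m in re.findall(pattern, array_src)]


tj_array = "[ ( )1.828 (5)1.841 (2)1.828 (2)1.828 (4)1.841 (9)1.828 (.)1.828 (6)1.841 (4) ]"
tj_text = b"".join(strings_in_tj_array(tj_array))

td_codes = list(decode_literal_string(r" \t\004\007\020\007\016\016\026\020 "))
```

In this sketch the TJ array joins to the bytes of ' 52249.64' (with the leading space from the first element), while the Tj string from the last case yields small numeric codes such as 4, 7, 16, 14 and 22 rather than printable characters.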