Structured Text Options
- preserve-images
Keep images in output
- preserve-ligatures
Do not expand ligatures into constituent characters
- preserve-spans
Do not merge spans on the same line
- preserve-whitespace
Do not convert all whitespace into space characters
- inhibit-spaces
Do not add spaces between gaps in the text
- paragraph-break
Break blocks at paragraph boundaries
- dehyphenate
Attempt to join up hyphenated words
- ignore-actualtext
Do not apply ActualText replacements
- use-cid-for-unknown-unicode
Use character code if unicode mapping fails
- use-gid-for-unknown-unicode
Use glyph index if unicode mapping fails
- accurate-bboxes
Calculate char bboxes from the outlines
- accurate-ascenders
Calculate ascender/descender from font glyphs
- accurate-side-bearings
Expand character bboxes to completely include the width of glyphs
- collect-styles
Attempt to detect text features (fake bold, strikeout, underline, etc.)
- clip
Do not include text that is completely clipped
- clip-rect=x0:y0:x1:y1
Specify clipping rectangle within which to collect content
- structured
Collect structure markup
- vectors
Include vector bboxes in output
- segment
Attempt to segment the page
- table-hunt
Hunt for tables within a (segmented) page
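
The options above are plain names, with clip-rect being the only one that takes a value. As a minimal sketch, and assuming the options are combined into a single comma-separated string (this page does not state the separator), a small helper could assemble and check such a string. The helper name `build_stext_options` and the `KNOWN_FLAGS` set are hypothetical and are not part of any library API; the set simply mirrors the value-less options listed above.

```python
# Hypothetical helper: not part of any library, for illustration only.
# KNOWN_FLAGS mirrors the value-less options listed in this section.
KNOWN_FLAGS = {
    "preserve-images", "preserve-ligatures", "preserve-spans",
    "preserve-whitespace", "inhibit-spaces", "paragraph-break",
    "dehyphenate", "ignore-actualtext", "use-cid-for-unknown-unicode",
    "use-gid-for-unknown-unicode", "accurate-bboxes", "accurate-ascenders",
    "accurate-side-bearings", "collect-styles", "clip", "structured",
    "vectors", "segment", "table-hunt",
}


def build_stext_options(flags=(), clip_rect=None):
    """Return an option string such as 'dehyphenate,clip-rect=0:0:300:400'.

    Assumption: options are joined with commas. clip_rect is an optional
    (x0, y0, x1, y1) tuple rendered in the documented x0:y0:x1:y1 form.
    """
    parts = []
    for flag in flags:
        if flag not in KNOWN_FLAGS:
            raise ValueError(f"unknown structured text option: {flag}")
        parts.append(flag)
    if clip_rect is not None:
        x0, y0, x1, y1 = clip_rect
        parts.append(f"clip-rect={x0:g}:{y0:g}:{x1:g}:{y1:g}")
    return ",".join(parts)


options = build_stext_options(
    ["preserve-whitespace", "dehyphenate", "clip"],
    clip_rect=(0, 0, 300, 400),
)
print(options)  # preserve-whitespace,dehyphenate,clip,clip-rect=0:0:300:400
```

Rejecting unknown names early catches typos such as `dehyphenate` versus `de-hyphenate` before the string reaches the tool. Where the resulting string is passed (for example, as the value of an options switch on a command-line tool) depends on the tool in use, so check its own help output for the exact argument.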