Accurate PDF extraction for Word documents