Accurate OCR for PDF to DOC Conversion