A state-of-the-art non-XML parser for extracting text from PDF, DOC, and other formats.
Various PDF structures (dual column, special header elements)
Patent offices / full-text journals
Paragraph wrapping (disambiguating chemical names from paragraph / claim numbers)
Inconsistent section markers
Tables spanning multiple columns
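The paragraph-wrapping item can be illustrated with a minimal sketch. It is not the product's actual algorithm. The regular expressions, the function names starts_new_block and unwrap, and the sample claims are all assumptions made for illustration. The idea is that a wrapped line starting with digits may be either a new claim ("2. The compound...") or the tail of a chemical name ("2,6-dichlorophenol"), so the check for a chemical-name locant runs before the check for a claim or paragraph marker.

```python
import re

# Assumed marker shapes: "[0012]" style paragraph numbers or "2. The ..." style claims.
PARA_MARKER = re.compile(r"^(\[\d{4}\]|\d+\.\s+[A-Z])")
# Assumed chemical-name fragment: locants followed by a hyphen, e.g. "2,6-dichloro...".
LOCANT_START = re.compile(r"^\d+(,\d+)*-[A-Za-z(]")


def starts_new_block(prev_text, line):
    """Return True if `line` looks like the start of a new paragraph or claim."""
    if LOCANT_START.match(line):
        return False  # looks like the continuation of a chemical name
    if prev_text.rstrip().endswith(("-", ",")):
        return False  # previous text was broken mid-name or mid-clause
    return bool(PARA_MARKER.match(line))


def unwrap(lines):
    """Join hard-wrapped lines into one string per paragraph or claim."""
    blocks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if not blocks or starts_new_block(blocks[-1], line):
            blocks.append(line)
        else:
            # A trailing hyphen is kept and joined without a space, which suits
            # chemical names; ordinary hyphenated words would need extra handling.
            sep = "" if blocks[-1].endswith("-") else " "
            blocks[-1] = blocks[-1] + sep + line
    return blocks


claims = unwrap([
    "1. A compound which is 4-amino-",
    "2,6-dichlorophenol or a salt thereof.",
    "2. The compound of claim 1, in",
    "crystalline form.",
])
```

Here the second input line begins with digits but is folded into claim 1, while the third line opens claim 2.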
The XML parser consists of a library of parsers built for the XML formats of all major publishers and for NCBI-based formats such as JATS, NLM XML, PubMed, and SciELO XML. It can be customized quickly for any new format.
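A library of format-specific parsers of this kind might be organized as a registry keyed on the document's root element. The sketch below uses only the Python standard library. The registry, the function names, and the returned dictionary layout are hypothetical and do not describe the actual product. It relies on JATS and NLM articles using article as the root element and a single PubMed record using PubmedArticle.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry mapping a root tag to the function that handles it.
PARSERS = {}


def register(root_tag):
    """Decorator that adds a parser function to the registry."""
    def decorator(func):
        PARSERS[root_tag] = func
        return func
    return decorator


def _text(element):
    """Flatten an element and its inline children into plain text."""
    return "".join(element.itertext()).strip() if element is not None else ""


@register("article")  # JATS / NLM: <article> root
def parse_jats(root):
    return {
        "title": _text(root.find(".//article-title")),
        "paragraphs": [_text(p) for p in root.iter("p")],
    }


@register("PubmedArticle")  # a single PubMed record
def parse_pubmed(root):
    return {
        "title": _text(root.find(".//ArticleTitle")),
        "paragraphs": [_text(a) for a in root.iter("AbstractText")],
    }


def parse_xml(text):
    """Pick a parser from the root tag and return the extracted text."""
    root = ET.fromstring(text)
    if root.tag not in PARSERS:
        raise ValueError(f"no parser registered for <{root.tag}>")
    return PARSERS[root.tag](root)


sample = (
    "<article><front><article-meta><title-group>"
    "<article-title>Dual <italic>column</italic> parsing</article-title>"
    "</title-group></article-meta></front>"
    "<body><p>First paragraph.</p><p>Second paragraph.</p></body></article>"
)
result = parse_xml(sample)
```

Under this layout, supporting a new format amounts to writing one more function and registering it under that format's root tag.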