Key features:
Non-XML parser:
A state-of-the-art non-XML parser for extracting text from PDF, DOC, and other formats. It handles:
  • Various PDF Structures (Dual column, Special Header elements)
  • Patent Offices/Full-Text Journals
  • Paragraph wrapping (disambiguating chemical names from paragraph and claim numbers)
  • Inconsistent Section Markers
  • Tables spanning across columns
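The paragraph-wrapping case can be illustrated with a minimal sketch. It is not the product's implementation: the marker patterns (`[0012]`-style paragraph numbers and `1. `-style claim numbers) and the function names are assumptions chosen for illustration, and real documents need far more rules. The point it shows is that a wrapped line beginning with a chemical name such as `2,4-dinitrophenol` must not be mistaken for a new numbered claim.

```python
import re

# Assumed marker patterns (hypothetical; real documents vary widely).
PARA_MARKER = re.compile(r"^\[\d{4}\]\s")     # e.g. "[0012] The compound ..."
CLAIM_MARKER = re.compile(r"^\d+\.\s+[A-Z]")  # e.g. "1. A method ..."


def starts_new_block(line: str) -> bool:
    """Return True if the line opens a numbered paragraph or claim."""
    return bool(PARA_MARKER.match(line) or CLAIM_MARKER.match(line))


def unwrap(lines):
    """Join hard-wrapped lines into logical paragraphs and claims."""
    blocks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines left over from extraction
        if not blocks or starts_new_block(line):
            blocks.append(line)
        elif blocks[-1].endswith("-"):
            # Keep the hyphen and join without a space: in chemical
            # names the hyphen is usually part of the name itself.
            blocks[-1] += line
        else:
            blocks[-1] += " " + line
    return blocks


sample = [
    "[0012] The compound 1,2-",
    "dichloroethane was mixed with",
    "2,4-dinitrophenol at room temperature.",
    "1. A method comprising heating.",
    "2. The method of claim 1.",
]
for block in unwrap(sample):
    print(block)
```

Here the line starting with `2,4-dinitrophenol` is treated as a continuation because the digit is followed by a comma rather than a period and a space, while `1. A method` and `2. The method` each start a new claim.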
XML parser:
The XML parser consists of a library of parsers built for the XML formats of all major publishers and for NCBI-based formats such as JATS, NLM XML, PubMed, and SciELO XML. It can be customized for a new format in a short time.
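One way to picture a library of per-format parsers is a registry that dispatches on the root element of the document. The sketch below is an assumption about how such a library could be organized, not the product's actual design: the names `PARSERS`, `parse_jats`, `parse_pubmed`, and `parse_xml` are hypothetical, and only two formats are covered. It relies on the JATS root element being `article` and PubMed exports using `PubmedArticleSet`.

```python
import xml.etree.ElementTree as ET


def text_of(elem):
    """Return all text inside an element, including inline markup."""
    return "".join(elem.itertext()).strip() if elem is not None else ""


def parse_jats(root):
    # JATS / NLM articles: title in <article-title>, text in <p>.
    return {
        "format": "JATS",
        "title": text_of(root.find(".//article-title")),
        "paragraphs": [text_of(p) for p in root.iter("p")],
    }


def parse_pubmed(root):
    # PubMed records: title in <ArticleTitle>, abstract in <AbstractText>.
    return {
        "format": "PubMed",
        "title": text_of(root.find(".//ArticleTitle")),
        "paragraphs": [text_of(a) for a in root.iter("AbstractText")],
    }


# Hypothetical registry keyed by root element name. Supporting a new
# format means writing one function and adding one entry here.
PARSERS = {
    "article": parse_jats,
    "PubmedArticleSet": parse_pubmed,
}


def parse_xml(text):
    """Pick a parser from the root element and run it."""
    root = ET.fromstring(text)
    parser = PARSERS.get(root.tag)
    if parser is None:
        raise ValueError(f"no parser registered for root element <{root.tag}>")
    return parser(root)


jats_sample = (
    "<article><front><article-meta><title-group>"
    "<article-title>A <italic>test</italic> title</article-title>"
    "</title-group></article-meta></front>"
    "<body><p>First.</p><p>Second.</p></body></article>"
)
print(parse_xml(jats_sample))
```

Keeping each format in its own function is what would make customization for a new format quick under this design: the dispatch logic never changes, only the registry grows.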

Areas of Application

  • Research Articles
  • Patents
  • Reports
  • Manuscripts