Key features:
Non-XML parsers:
A state-of-the-art non-XML parser for extracting text from PDF, DOC, and other formats.
  • Various PDF Structures (Dual column, Special Header elements)
  • Patent Offices / Full-text Journals
  • Paragraph wrapping (disambiguating chemical names from paragraph and claim numbers)
  • Inconsistent Section Markers
  • Tables spanning across columns
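
To illustrate the paragraph-wrapping item above, here is a minimal sketch of how wrapped lines might be rejoined without mistaking a chemical name for a claim or paragraph number. It is not the product's actual method: the marker patterns (`[0012]` for paragraphs, `12. ` followed by a capital letter for claims) and the names `starts_new_block` and `unwrap` are assumptions made for this example.

```python
import re

# Assumed marker styles (hypothetical, for illustration only):
#   "[0012] ..."  -> numbered paragraph
#   "12. The ..." -> numbered claim
PARA_MARKER = re.compile(r"^\[\d{4}\]\s")
CLAIM_MARKER = re.compile(r"^\d{1,3}\.\s+[A-Z]")


def starts_new_block(line: str) -> bool:
    """Return True if the line opens a new paragraph or claim.

    A line such as "2-methylpropan-1-one" starts with a digit but has no
    "N. " prefix, so it is treated as a continuation, not a claim number.
    """
    return bool(PARA_MARKER.match(line) or CLAIM_MARKER.match(line))


def unwrap(lines):
    """Merge wrapped lines into blocks, one per paragraph or claim."""
    blocks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if not blocks or starts_new_block(line):
            blocks.append(line)
        elif blocks[-1].endswith("-"):
            # Keep the hyphen: in chemical names it is part of the name.
            blocks[-1] += line
        else:
            blocks[-1] += " " + line
    return blocks


wrapped = [
    "1. A compound selected from 1-(4-chlorophenyl)-",
    "2-methylpropan-1-one and",
    "2. The compound of claim 1.",
]
claims = unwrap(wrapped)
```

In this sketch the second line is appended to the first claim because it lacks the `N. ` prefix, while the third line opens a new claim. Keeping every line-end hyphen is a simplification; ordinary hyphenated words would need separate handling.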
XML parsers:
The XML parsers consist of a library of parsers based on the XML formats of all major publishers and on NCBI-based formats such as JATS, NLM XML, PubMed, and SciELO XML. The library can be customized for any new format in a very short span of time.
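
One common way to organize such a library is a registry keyed by the document's root element, so that supporting a new format means registering one more function. The sketch below is an assumption about how this could look, not the product's implementation; `PARSERS`, `register`, `parse_jats`, `parse_pubmed`, and `parse_document` are hypothetical names, and only a title and paragraph text are extracted.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: root element name -> parser function.
PARSERS = {}


def register(root_tag):
    """Decorator that registers a parser for documents with this root tag."""
    def decorator(func):
        PARSERS[root_tag] = func
        return func
    return decorator


@register("article")  # JATS documents use <article> as the root element
def parse_jats(root):
    title = root.findtext(".//article-title", default="")
    paragraphs = ["".join(p.itertext()).strip() for p in root.iter("p")]
    return {"title": title, "paragraphs": paragraphs}


@register("PubmedArticleSet")  # PubMed export root element
def parse_pubmed(root):
    title = root.findtext(".//ArticleTitle", default="")
    paragraphs = ["".join(a.itertext()).strip()
                  for a in root.iter("AbstractText")]
    return {"title": title, "paragraphs": paragraphs}


def parse_document(xml_text):
    """Dispatch to the parser registered for the document's root element."""
    root = ET.fromstring(xml_text)
    parser = PARSERS.get(root.tag)
    if parser is None:
        raise ValueError(f"no parser registered for root element <{root.tag}>")
    return parser(root)


sample = (
    "<article><front><article-meta><title-group>"
    "<article-title>Sample title</article-title>"
    "</title-group></article-meta></front>"
    "<body><p>Hello <italic>world</italic>.</p></body></article>"
)
result = parse_document(sample)
```

Dispatching on the root tag alone is a simplification: formats that share a root element would need an extra check, for example on the DOCTYPE or a version attribute.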

Areas of Application

  • Research Articles
  • Patents
  • Reports
  • Manuscripts