Newxp.PreProcessing (newxp v0.1.0)


Summary

Functions

get_html2text_handler()
Get configured html2text options.

process_for_general(html)
Process content for general applications.

process_for_summary(html)
Convert HTML to plain text for summarization.

Functions

get_html2text_handler()

Get an html2text handler configured with this module's conversion options.
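The page does not list which html2text options are configured. The Python sketch below shows the general pattern of building a preconfigured handler by setting option attributes on a converter object. The option values and the helper name configure_handler are assumptions for illustration. The attribute names (body_width, ignore_images, ignore_links) are real attributes of the Python html2text library's HTML2Text class.

```python
from types import SimpleNamespace

# Hypothetical option values: the settings this module actually uses are not documented.
# The keys are real attribute names on the Python html2text library's HTML2Text class.
ASSUMED_OPTIONS = {
    "body_width": 0,        # 0 disables hard line wrapping
    "ignore_images": True,  # drop image references from the output
    "ignore_links": False,  # keep link text and URLs
}


def configure_handler(handler, options=ASSUMED_OPTIONS):
    """Set each option as an attribute on the handler and return the same handler."""
    for name, value in options.items():
        setattr(handler, name, value)
    return handler


# Stand-in object so the sketch runs without html2text installed.
demo = configure_handler(SimpleNamespace())
```

With the html2text package installed, configure_handler(html2text.HTML2Text()) would return a converter whose handle(html) method produces the text. Keeping the options in one mapping means every caller shares the same conversion settings.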

process_for_general(html)

Process content for general applications.

This includes:

  • Core HTML cleaning of figures, tables, and "read more" elements
  • Conversion to plain text (preserving most of the HTML structure)
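The two steps of process_for_general can be illustrated with a minimal Python sketch built on the standard-library html.parser instead of html2text. It assumes that "cleaning" means dropping the whole subtree of figure and table elements, and of any element carrying a read-more CSS class. It also assumes that "preserving structure" means keeping line breaks at block-level tags. The helper names (process_for_general_sketch, _GeneralCleaner) are hypothetical and do not describe the module's internals.

```python
from html.parser import HTMLParser

# Assumptions: which elements count as "figures, tables, read-more", and how they are detected.
DROP_TAGS = {"figure", "table", "script", "style"}
READ_MORE_CLASS = "read-more"  # hypothetical CSS class marking "read more" elements
BLOCK_TAGS = {"p", "div", "li", "ul", "ol", "h1", "h2", "h3", "h4", "h5", "h6"}
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input", "source", "wbr"}


class _GeneralCleaner(HTMLParser):
    """Collect text, skipping dropped subtrees; assumes well-formed (closed) tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a dropped subtree

    def _is_dropped(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        return tag in DROP_TAGS or READ_MORE_CLASS in classes

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:  # void elements never get an end tag, so they do not affect depth
            if tag == "br" and not self.skip_depth:
                self.parts.append("\n")
            return
        if self.skip_depth or self._is_dropped(tag, attrs):
            self.skip_depth += 1
            return
        if tag in BLOCK_TAGS:
            self.parts.append("\n")  # start a new line for each block element

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
            return
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def process_for_general_sketch(html: str) -> str:
    """Step 1: drop unwanted elements. Step 2: emit text with one line per block."""
    parser = _GeneralCleaner()
    parser.feed(html)
    parser.close()
    lines = [" ".join(line.split()) for line in "".join(parser.parts).splitlines()]
    return "\n".join(line for line in lines if line)  # drop empty lines
```

For example, a paragraph followed by a figure, a table, and a read-more paragraph reduces to the paragraph text alone. html2text itself preserves more structure than this (for instance headings and list markers), so the sketch only shows the shape of the pipeline.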

process_for_summary(html)

Convert HTML to plain text for summarization.
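process_for_summary is documented only as converting HTML to plain text for summarization. One plausible reading, sketched here in Python with the standard-library html.parser, is that summarizer input should be compact: all markup removed, script and style contents discarded, and whitespace collapsed into a single run of text. That interpretation and the helper name summary_text_sketch are assumptions, not the module's documented behavior.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect text content, ignoring everything inside <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def summary_text_sketch(html: str) -> str:
    """Strip all tags and collapse whitespace into single spaces."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    # Joining chunks with a space keeps words from adjacent blocks apart
    # ("<p>a</p><p>b</p>" -> "a b"), at the cost of splitting words broken by inline tags.
    return " ".join(" ".join(parser.chunks).split())
```

Unlike a structure-preserving conversion, this output has no line breaks at all, which suits a model that only needs the running text.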