Stripper v1.1.0 API Reference

Modules

Stripper is a package made to help you normalize input from web scraping (or other questionable sources).

This module exists for dealing with quotes. When parsing text from word processors or web pages, it is inevitable that you will encounter various smart-quotes, curly quotes, and even some backticks masquerading as apostrophes!
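
As a rough illustration of the kind of normalization described here, the sketch below maps a handful of curly quotes and apostrophe look-alikes to plain ASCII quotes. It is written in Python for illustration only and is not Stripper's API: the `normalize_quotes` name and the contents of `QUOTE_MAP` are assumptions, and the real module may cover a different set of characters.

```python
# Hypothetical mapping for illustration; not taken from Stripper's source.
QUOTE_MAP = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201A": "'",  # single low-9 quotation mark
    "\u201B": "'",  # single high-reversed-9 quotation mark
    "\u0060": "'",  # backtick (grave accent) used as an apostrophe
    "\u00B4": "'",  # acute accent used as an apostrophe
    "\u201C": '"',  # left double quotation mark
    "\u201D": '"',  # right double quotation mark
    "\u201E": '"',  # double low-9 quotation mark
    "\u201F": '"',  # double high-reversed-9 quotation mark
}

# Build the translation table once; str.translate then works in a single pass.
_QUOTE_TABLE = str.maketrans(QUOTE_MAP)


def normalize_quotes(text: str) -> str:
    """Replace curly quotes and apostrophe look-alikes with ASCII quotes."""
    return text.translate(_QUOTE_TABLE)


if __name__ == "__main__":
    print(normalize_quotes("\u201cIt\u2019s a don`t-care\u201d"))
```

Note that mapping every backtick to an apostrophe is only safe for prose; input that contains code or Markdown would need a narrower rule.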

This module exists for dealing with whitespace. A space is a space is a space, right? Wrong. There are multiple Unicode characters that represent whitespace: tabs, newlines, line feeds, and a slew of lesser-known characters that are technically different entities but could all be referred to as "space".
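
To make the problem concrete, here is a minimal sketch that collapses any run of Unicode whitespace into a single ASCII space and trims the ends. It is written in Python for illustration and is not Stripper's API: the `normalize_whitespace` name is an assumption, as is the decision to also treat the zero-width space (U+200B) and the byte order mark (U+FEFF) as whitespace, since Unicode does not classify either of them that way.

```python
import re

# In Python 3, \s in a str pattern matches Unicode whitespace such as
# the no-break space (U+00A0), em space (U+2003), and ideographic space (U+3000).
# U+200B and U+FEFF are added explicitly as an assumption of this sketch.
_WHITESPACE_RUN = re.compile(r"[\s\u200b\ufeff]+")


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace to one ASCII space and trim both ends."""
    return _WHITESPACE_RUN.sub(" ", text).strip()


if __name__ == "__main__":
    messy = "a\u00a0\u2003b\t\n c\u3000"
    print(repr(normalize_whitespace(messy)))
```

Collapsing newlines along with spaces suits single-line fields such as titles or prices; text where line breaks carry meaning would need those characters handled separately.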