Module re_tuner

Helper functions for working with the Erlang re (regular expression) module.

Copyright © 2021 by Anatolii Kosorukov

Authors: Anatolii Kosorukov (java1cprog@yandex.ru) [web site: rustkas.github.io/].

Description

Helper functions for working with the Erlang re (regular expression) module.

Data Types

compile_option()

compile_option() = unicode | anchored | caseless | dollar_endonly | dotall | extended | firstline | multiline | no_auto_capture | dupnames | ungreedy | {newline, nl_spec()} | bsr_anycrlf | bsr_unicode | no_start_optimize | ucp | never_utf

mp()

mp() = {re_pattern, term(), term(), term(), term()}

nl_spec()

nl_spec() = cr | crlf | lf | anycrlf | any

Function Index

avoid_characters/0: The list of characters that raise an error when used without an escape character.
is_full_match/2: Check whether a string fits a certain pattern in its entirety.
is_match/2: Check whether a match can be found for a particular regular expression in a particular string.
mp/1: A shortened form of the re:compile/1 function.
mp/2: A shortened form of the re:compile/2 function.
replace/1: Replace any of the shorthand patterns from the list [\s, \w, \h, \v] in a pattern string.
save_pattern/1: Make a safe regex pattern that treats every character as a literal.
tune/1: Replace a regex pattern with a simpler one.
unicode_block/1: The Unicode character database divides all the code points into blocks.

Function Details

avoid_characters/0

avoid_characters() -> Result

returns: The list of special characters.

Returns the list of characters that raise an error when used without an escape character.
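The effect described above can be observed with the standard re module alone. The following is a minimal sketch; the module name escape_sketch is hypothetical and the code does not call re_tuner itself. It compiles an unescaped and an escaped parenthesis and compares the results.

```erlang
-module(escape_sketch).
-export([demo/0]).

%% Illustration only: shows how the standard re module reacts to an
%% unescaped versus an escaped special character.
demo() ->
    %% An unescaped "(" opens a group that is never closed, so
    %% compilation returns an error tuple.
    {error, {_Reason, _Position}} = re:compile("("),
    %% Escaped with a backslash, the parenthesis is an ordinary literal.
    {ok, MP} = re:compile("\\("),
    %% The literal "(" is found at offset 1 with length 1.
    {match, [{1, 1}]} = re:run("f(x)", MP),
    ok.
```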

is_full_match/2

is_full_match(Text, ReInput) -> Result

Text: string to test
ReInput: regex pattern

returns: true or false

Check whether a string fits a certain pattern in its entirety. A partial match is not sufficient.
See also: http://erlang.org/doc/man/re.html#compile_2.
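One way to obtain full-match behaviour with the standard re module is to anchor the pattern at both ends. The sketch below is an assumption about how such a check could be written, not the implementation used by re_tuner; the module name full_match_sketch is hypothetical, and the regex is assumed to be a plain string (list) so that it can be concatenated.

```erlang
-module(full_match_sketch).
-export([is_full_match/2]).

%% Returns true only when Regex matches the whole of Text.
%% \A anchors at the start of the subject and \z at its very end;
%% the non-capturing group keeps alternations inside the anchors.
is_full_match(Text, Regex) ->
    Anchored = "\\A(?:" ++ Regex ++ ")\\z",
    re:run(Text, Anchored, [{capture, none}]) =:= match.
```

With this sketch, "2021" fully matches "[0-9]+", while "year 2021" does not, because the digits cover only part of the string.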

is_match/2

is_match(Text, ReInput) -> Result

Text: string to search
ReInput: regex pattern

returns: true or false

Check whether a match can be found for a particular regular expression in a particular string. A partial match is sufficient.
See also: http://erlang.org/doc/man/re.html#compile_2.
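A partial-match check maps directly onto re:run/3. The following sketch assumes nothing about the internals of re_tuner; the module name match_sketch is hypothetical. It uses the {capture, none} option so that re:run/3 returns only match or nomatch.

```erlang
-module(match_sketch).
-export([is_match/2]).

%% Returns true when Regex matches anywhere inside Text.
is_match(Text, Regex) ->
    case re:run(Text, Regex, [{capture, none}]) of
        match -> true;
        nomatch -> false
    end.
```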

mp/1

mp(Regex) -> MP | {error, badarg}

Regex: regex pattern

returns: Opaque data type containing a compiled regular expression

A shortened form of the re:compile/1 function. Returns an opaque data type containing the compiled regular expression, or raises a badarg error.
See also: mp().
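The idea of a shortened re:compile/1 can be sketched as a thin wrapper that unwraps the {ok, MP} tuple. This is an illustration under assumptions: the module name mp_sketch is hypothetical, and mapping a compilation error to {error, badarg} is inferred from the signature above rather than taken from the re_tuner source.

```erlang
-module(mp_sketch).
-export([mp/1]).

%% Compiles Regex and returns the compiled pattern directly,
%% without the {ok, ...} wrapper of re:compile/1.
mp(Regex) ->
    case re:compile(Regex) of
        {ok, MP} -> MP;
        %% Assumption: any compilation error is reported as badarg.
        {error, _ErrSpec} -> {error, badarg}
    end.
```

The returned value can be passed to re:run/2 repeatedly, which avoids recompiling the same pattern for every subject string.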

mp/2

mp(Regex, Options) -> MP | {error, badarg}

Regex: regex pattern
Options: additional regular expression metadata

returns: Opaque data type containing a compiled regular expression

A shortened form of the re:compile/2 function. Returns an opaque data type containing the compiled regular expression, or raises a badarg error.
See also: mp().
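The effect of compile options can be illustrated with the caseless option from compile_option(). The sketch below assumes a wrapper around re:compile/2; the module name mp2_sketch is hypothetical and the error mapping is inferred from the signature above.

```erlang
-module(mp2_sketch).
-export([mp/2]).

%% Compiles Regex with the given compile options and returns the
%% compiled pattern without the {ok, ...} wrapper.
mp(Regex, Options) ->
    case re:compile(Regex, Options) of
        {ok, MP} -> MP;
        %% Assumption: any compilation error is reported as badarg.
        {error, _ErrSpec} -> {error, badarg}
    end.
```

A pattern compiled with [caseless] matches "Erlang/OTP" for the regex "erlang", whereas the same regex compiled with an empty option list does not.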

replace/1

replace(Pattern) -> UpdatedPattern

Pattern: regex pattern in which shorthands are replaced

returns: Updated Regex pattern string

Replaces any of the shorthand patterns from the list [\s, \w, \h, \v] in a pattern string.
Do not apply the \w shorthand to Unicode content.
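The source does not state what each shorthand is replaced with. One plausible reading is that a shorthand is expanded into an explicit character class. The sketch below shows that idea for \w only, using the ASCII class [A-Za-z0-9_]. Both the module name shorthand_sketch and the chosen expansion are assumptions. An ASCII-only class also suggests why such a replacement would be unsuitable for Unicode content.

```erlang
-module(shorthand_sketch).
-export([expand_word/1]).

%% Replaces every "\w" in Pattern with an explicit ASCII class.
%% The regex "\\\\w" (as an Erlang string) matches a literal
%% backslash followed by the letter w.
%% Caveat: this naive version also rewrites the "\w" inside "\\w",
%% where the backslash is itself escaped.
expand_word(Pattern) ->
    re:replace(Pattern, "\\\\w", "[A-Za-z0-9_]",
               [global, {return, list}]).
```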

save_pattern/1

save_pattern(Pattern) -> SavePattern

returns: Safe pattern

Makes a safe regex pattern in which every character is treated as a literal.
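Treating every character as a literal usually means prefixing each special character with a backslash. The following is a minimal sketch of that technique. The module name safe_pattern_sketch and the SPECIALS list are assumptions; the actual list returned by avoid_characters/0 may differ.

```erlang
-module(safe_pattern_sketch).
-export([escape/1]).

%% Assumed set of characters that have a special meaning in a regex.
-define(SPECIALS, "^$.|?*+()[]{}\\").

%% Prefixes every special character in Pattern with a backslash and
%% leaves all other characters unchanged.
escape(Pattern) ->
    lists:flatmap(
      fun(C) ->
              case lists:member(C, ?SPECIALS) of
                  true -> [$\\, C];
                  false -> [C]
              end
      end,
      Pattern).
```

After escaping, "1+1" becomes a pattern that matches the text "1+1" but no longer matches "11".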

tune/1

tune(Regex) -> Result

returns: Transformed Regex pattern.

Replaces a regex pattern with a simpler one.

unicode_block/1

unicode_block(BlockName) -> Range | nomatch

BlockName: regular expression block name

returns: Regular expression range of code points

The Unicode character database divides all the code points into blocks. Each block consists of a single range of code points. The code points U+0000 through U+FFFF are divided into 156 blocks in version 6.1 of the Unicode standard.

  ‹U+0000…U+007F \p{InBasicLatin}›
  ‹U+0080…U+00FF \p{InLatin-1Supplement}›
  ‹U+0100…U+017F \p{InLatinExtended-A}›
  ‹U+0180…U+024F \p{InLatinExtended-B}›
  ‹U+0250…U+02AF \p{InIPAExtensions}›
  ‹U+02B0…U+02FF \p{InSpacingModifierLetters}›
  ‹U+0300…U+036F \p{InCombiningDiacriticalMarks}›
  ‹U+0370…U+03FF \p{InGreekandCoptic}›
  ‹U+0400…U+04FF \p{InCyrillic}›
  ‹U+0500…U+052F \p{InCyrillicSupplement}›
  ‹U+0530…U+058F \p{InArmenian}›
  ‹U+0590…U+05FF \p{InHebrew}›
  ‹U+0600…U+06FF \p{InArabic}›
  ‹U+0700…U+074F \p{InSyriac}›
  ‹U+0750…U+077F \p{InArabicSupplement}›
  ‹U+0780…U+07BF \p{InThaana}›
  ‹U+07C0…U+07FF \p{InNKo}›
  ‹U+0800…U+083F \p{InSamaritan}›
  ‹U+0840…U+085F \p{InMandaic}›
  ‹U+08A0…U+08FF \p{InArabicExtended-A}›
  ‹U+0900…U+097F \p{InDevanagari}›
  ‹U+0980…U+09FF \p{InBengali}›
  ‹U+0A00…U+0A7F \p{InGurmukhi}›
  ‹U+0A80…U+0AFF \p{InGujarati}›
  ‹U+0B00…U+0B7F \p{InOriya}›
  ‹U+0B80…U+0BFF \p{InTamil}›
  ‹U+0C00…U+0C7F \p{InTelugu}›
  ‹U+0C80…U+0CFF \p{InKannada}›
  ‹U+0D00…U+0D7F \p{InMalayalam}›
  ‹U+0D80…U+0DFF \p{InSinhala}›
  ‹U+0E00…U+0E7F \p{InThai}›
  ‹U+0E80…U+0EFF \p{InLao}›
  ‹U+0F00…U+0FFF \p{InTibetan}›
  ‹U+1000…U+109F \p{InMyanmar}›
  ‹U+10A0…U+10FF \p{InGeorgian}›
  ‹U+1100…U+11FF \p{InHangulJamo}›
  ‹U+1200…U+137F \p{InEthiopic}›
  ‹U+1380…U+139F \p{InEthiopicSupplement}›
  ‹U+13A0…U+13FF \p{InCherokee}›
  ‹U+1400…U+167F \p{InUnifiedCanadianAboriginalSyllabics}›
  ‹U+1680…U+169F \p{InOgham}›
  ‹U+16A0…U+16FF \p{InRunic}›
  ‹U+1700…U+171F \p{InTagalog}›
  ‹U+1720…U+173F \p{InHanunoo}›
  ‹U+1740…U+175F \p{InBuhid}›
  ‹U+1760…U+177F \p{InTagbanwa}›
  ‹U+1780…U+17FF \p{InKhmer}›
  ‹U+1800…U+18AF \p{InMongolian}›
  ‹U+18B0…U+18FF \p{InUnifiedCanadianAboriginalSyllabicsExtended}›
  ‹U+1900…U+194F \p{InLimbu}›
  ‹U+1950…U+197F \p{InTaiLe}›
  ‹U+1980…U+19DF \p{InNewTaiLue}›
  ‹U+19E0…U+19FF \p{InKhmerSymbols}›
  ‹U+1A00…U+1A1F \p{InBuginese}›
  ‹U+1A20…U+1AAF \p{InTaiTham}›
  ‹U+1B00…U+1B7F \p{InBalinese}›
  ‹U+1B80…U+1BBF \p{InSundanese}›
  ‹U+1BC0…U+1BFF \p{InBatak}›
  ‹U+1C00…U+1C4F \p{InLepcha}›
  ‹U+1C50…U+1C7F \p{InOlChiki}›
  ‹U+1CC0…U+1CCF \p{InSundaneseSupplement}›
  ‹U+1CD0…U+1CFF \p{InVedicExtensions}›
  ‹U+1D00…U+1D7F \p{InPhoneticExtensions}›
  ‹U+1D80…U+1DBF \p{InPhoneticExtensionsSupplement}›
  ‹U+1DC0…U+1DFF \p{InCombiningDiacriticalMarksSupplement}›
  ‹U+1E00…U+1EFF \p{InLatinExtendedAdditional}›
  ‹U+1F00…U+1FFF \p{InGreekExtended}›
  ‹U+2000…U+206F \p{InGeneralPunctuation}›
  ‹U+2070…U+209F \p{InSuperscriptsandSubscripts}›
  ‹U+20A0…U+20CF \p{InCurrencySymbols}›
  ‹U+20D0…U+20FF \p{InCombiningDiacriticalMarksforSymbols}›
  ‹U+2100…U+214F \p{InLetterlikeSymbols}›
  ‹U+2150…U+218F \p{InNumberForms}›
  ‹U+2190…U+21FF \p{InArrows}›
  ‹U+2200…U+22FF \p{InMathematicalOperators}›
  ‹U+2300…U+23FF \p{InMiscellaneousTechnical}›
  ‹U+2400…U+243F \p{InControlPictures}›
  ‹U+2440…U+245F \p{InOpticalCharacterRecognition}›
  ‹U+2460…U+24FF \p{InEnclosedAlphanumerics}›
  ‹U+2500…U+257F \p{InBoxDrawing}›
  ‹U+2580…U+259F \p{InBlockElements}›
  ‹U+25A0…U+25FF \p{InGeometricShapes}›
  ‹U+2600…U+26FF \p{InMiscellaneousSymbols}›
  ‹U+2700…U+27BF \p{InDingbats}›
  ‹U+27C0…U+27EF \p{InMiscellaneousMathematicalSymbols-A}›
  ‹U+27F0…U+27FF \p{InSupplementalArrows-A}›
  ‹U+2800…U+28FF \p{InBraillePatterns}›
  ‹U+2900…U+297F \p{InSupplementalArrows-B}›
  ‹U+2980…U+29FF \p{InMiscellaneousMathematicalSymbols-B}›
  ‹U+2A00…U+2AFF \p{InSupplementalMathematicalOperators}›
  ‹U+2B00…U+2BFF \p{InMiscellaneousSymbolsandArrows}›
  ‹U+2C00…U+2C5F \p{InGlagolitic}›
  ‹U+2C60…U+2C7F \p{InLatinExtended-C}›
  ‹U+2C80…U+2CFF \p{InCoptic}›
  ‹U+2D00…U+2D2F \p{InGeorgianSupplement}›
  ‹U+2D30…U+2D7F \p{InTifinagh}›
  ‹U+2D80…U+2DDF \p{InEthiopicExtended}›
  ‹U+2DE0…U+2DFF \p{InCyrillicExtended-A}›
  ‹U+2E00…U+2E7F \p{InSupplementalPunctuation}›
  ‹U+2E80…U+2EFF \p{InCJKRadicalsSupplement}›
  ‹U+2F00…U+2FDF \p{InKangxiRadicals}›
  ‹U+2FF0…U+2FFF \p{InIdeographicDescriptionCharacters}›
  ‹U+3000…U+303F \p{InCJKSymbolsandPunctuation}›
  ‹U+3040…U+309F \p{InHiragana}›
  ‹U+30A0…U+30FF \p{InKatakana}›
  ‹U+3100…U+312F \p{InBopomofo}›
  ‹U+3130…U+318F \p{InHangulCompatibilityJamo}›
  ‹U+3190…U+319F \p{InKanbun}›
  ‹U+31A0…U+31BF \p{InBopomofoExtended}›
  ‹U+31C0…U+31EF \p{InCJKStrokes}›
  ‹U+31F0…U+31FF \p{InKatakanaPhoneticExtensions}›
  ‹U+3200…U+32FF \p{InEnclosedCJKLettersandMonths}›
  ‹U+3300…U+33FF \p{InCJKCompatibility}›
  ‹U+3400…U+4DBF \p{InCJKUnifiedIdeographsExtensionA}›
  ‹U+4DC0…U+4DFF \p{InYijingHexagramSymbols}›
  ‹U+4E00…U+9FFF \p{InCJKUnifiedIdeographs}›
  ‹U+A000…U+A48F \p{InYiSyllables}›
  ‹U+A490…U+A4CF \p{InYiRadicals}›
  ‹U+A4D0…U+A4FF \p{InLisu}›
  ‹U+A500…U+A63F \p{InVai}›
  ‹U+A640…U+A69F \p{InCyrillicExtended-B}›
  ‹U+A6A0…U+A6FF \p{InBamum}›
  ‹U+A700…U+A71F \p{InModifierToneLetters}›
  ‹U+A720…U+A7FF \p{InLatinExtended-D}›
  ‹U+A800…U+A82F \p{InSylotiNagri}›
  ‹U+A830…U+A83F \p{InCommonIndicNumberForms}›
  ‹U+A840…U+A87F \p{InPhags-pa}›
  ‹U+A880…U+A8DF \p{InSaurashtra}›
  ‹U+A8E0…U+A8FF \p{InDevanagariExtended}›
  ‹U+A900…U+A92F \p{InKayahLi}›
  ‹U+A930…U+A95F \p{InRejang}›
  ‹U+A960…U+A97F \p{InHangulJamoExtended-A}›
  ‹U+A980…U+A9DF \p{InJavanese}›
  ‹U+AA00…U+AA5F \p{InCham}›
  ‹U+AA60…U+AA7F \p{InMyanmarExtended-A}›
  ‹U+AA80…U+AADF \p{InTaiViet}›
  ‹U+AAE0…U+AAFF \p{InMeeteiMayekExtensions}›
  ‹U+AB00…U+AB2F \p{InEthiopicExtended-A}›
  ‹U+ABC0…U+ABFF \p{InMeeteiMayek}›
  ‹U+AC00…U+D7AF \p{InHangulSyllables}›
  ‹U+D7B0…U+D7FF \p{InHangulJamoExtended-B}›
  ‹U+D800…U+DB7F \p{InHighSurrogates}›
  ‹U+DB80…U+DBFF \p{InHighPrivateUseSurrogates}›
  ‹U+DC00…U+DFFF \p{InLowSurrogates}›
  ‹U+E000…U+F8FF \p{InPrivateUseArea}›
  ‹U+F900…U+FAFF \p{InCJKCompatibilityIdeographs}›
  ‹U+FB00…U+FB4F \p{InAlphabeticPresentationForms}›
  ‹U+FB50…U+FDFF \p{InArabicPresentationForms-A}›
  ‹U+FE00…U+FE0F \p{InVariationSelectors}›
  ‹U+FE10…U+FE1F \p{InVerticalForms}›
  ‹U+FE20…U+FE2F \p{InCombiningHalfMarks}›
  ‹U+FE30…U+FE4F \p{InCJKCompatibilityForms}›
  ‹U+FE50…U+FE6F \p{InSmallFormVariants}›
  ‹U+FE70…U+FEFF \p{InArabicPresentationForms-B}›
  ‹U+FF00…U+FFEF \p{InHalfwidthandFullwidthForms}›
  ‹U+FFF0…U+FFFF \p{InSpecials}›
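The table above can be read as a mapping from a block name to a code point range that fits inside a character class. The sketch below illustrates this for two blocks only. The module name block_sketch, the function names, and the exact format of the returned range are assumptions; the source does not show what unicode_block/1 returns beyond Range | nomatch.

```erlang
-module(block_sketch).
-export([block_range/1, in_block/2]).

%% Maps a block name to a range usable inside "[...]".
%% Only two rows of the table are reproduced here.
block_range("GreekandCoptic") -> "\\x{0370}-\\x{03FF}";
block_range("Cyrillic") -> "\\x{0400}-\\x{04FF}";
block_range(_) -> nomatch.

%% Checks whether a single code point belongs to the named block.
%% The unicode option lets re handle code points above 255.
in_block(CodePoint, BlockName) ->
    case block_range(BlockName) of
        nomatch ->
            false;
        Range ->
            Pattern = "[" ++ Range ++ "]",
            re:run([CodePoint], Pattern,
                   [unicode, {capture, none}]) =:= match
    end.
```

For example, U+0416 (a Cyrillic capital letter) falls in the Cyrillic block, while the Latin letter a does not.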


Generated by EDoc