Module re_tuner

Helper functions for working with the Erlang re (regular expression) module.

Copyright © 2021 by Anatolii Kosorukov

Authors: Anatolii Kosorukov (java1cprog@yandex.ru) [web site: rustkas.github.io/].

Description

Helper functions for working with the Erlang re (regular expression) module.

Data Types

compile_option()

compile_option() = unicode | anchored | caseless | dollar_endonly | dotall | extended | firstline | multiline | no_auto_capture | dupnames | ungreedy | {newline, nl_spec()} | bsr_anycrlf | bsr_unicode | no_start_optimize | ucp | never_utf

mp()

mp() = {re_pattern, term(), term(), term(), term()}

nl_spec()

nl_spec() = cr | crlf | lf | anycrlf | any

Function Index

avoid_characters/0    The list of characters that raise an error if they are not escaped.
first_match/2         Retrieve the matched text.
first_match_info/2    Determine the position and length of the match.
is_full_match/2       Check whether a string fits a certain pattern in its entirety.
is_match/2            Check whether a match can be found for a particular regular expression in a particular string.
mp/1                  A reduced form of the re:compile/1 function.
mp/2                  A reduced form of the re:compile/2 function.
replace/1             Replace any of the shorthand patterns in the list [\s, \w, \h, \v] in a pattern string.
save_pattern/1        Make a safe regex pattern in which every character is treated as a literal.
tune/1                Replace a regex pattern with a simpler one.
unicode_block/1       The Unicode character database divides all the code points into blocks.

Function Details

avoid_characters/0

avoid_characters() -> Result

returns: The list of special characters.

The list of characters that raise an error if they are not escaped.

first_match/2

first_match(Text, ReInput) -> Result

Text: subject text to search
ReInput: regex pattern

returns: String result

Retrieve the Matched Text. You have a regular expression that matches a part of the subject text, and you want to extract the text that was matched. If the regular expression can match the string more than once, you want only the first match.
See also: http://erlang.org/doc/man/re.html#compile_1, http://erlang.org/doc/man/re.html#run_2.
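The idea can be sketched with the standard re module alone. The module name first_match_sketch and its body are assumptions made for illustration; they are not the re_tuner source, which may handle options and return values differently.

```erlang
-module(first_match_sketch).
-export([first_match/2]).

%% Hypothetical stand-in for re_tuner:first_match/2, built only on re:run/3.
%% {capture, first, list} asks for the whole first match, returned as a string.
first_match(Text, Regex) ->
    case re:run(Text, Regex, [{capture, first, list}]) of
        {match, [Matched]} -> Matched;
        nomatch -> nomatch
    end.
```

Later matches in the subject are ignored because re:run/3 stops at the first match unless the global option is given.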

first_match_info/2

first_match_info(Text, Regex) -> any()

Determine the Position and Length of the Match. Instead of extracting the substring matched by the regular expression you want to determine the starting position and length of the match. With this information, you can extract the match in your own code or apply whatever processing you fancy on the part of the original string matched by the regex.
See also: http://erlang.org/doc/man/re.html#compile_1, http://erlang.org/doc/man/re.html#run_2.
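A minimal sketch of the same idea using re:run/3 directly. The module name first_match_info_sketch and the {Start, Length} return shape are assumptions; the documented spec only promises any().

```erlang
-module(first_match_info_sketch).
-export([first_match_info/2]).

%% Hypothetical stand-in for re_tuner:first_match_info/2.
%% {capture, first, index} yields the zero-based offset and the length
%% of the whole first match instead of the matched text.
first_match_info(Text, Regex) ->
    case re:run(Text, Regex, [{capture, first, index}]) of
        {match, [{Start, Length}]} -> {Start, Length};
        nomatch -> nomatch
    end.
```

Because the offset is zero-based, a plain list subject can be sliced with lists:sublist(Text, Start + 1, Length).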

is_full_match/2

is_full_match(Text, ReInput) -> Result

Text: subject text to check
ReInput: regex pattern

returns: true or false

Check whether a string fits a certain pattern in its entirety. A partial match is not sufficient.
See also: http://erlang.org/doc/man/re.html#compile_1.
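One common way to require a full match is to wrap the pattern in a non-capturing group anchored at both ends. The sketch below assumes that approach; the module name full_match_sketch is hypothetical and re_tuner may implement the check differently.

```erlang
-module(full_match_sketch).
-export([is_full_match/2]).

%% Hypothetical stand-in for re_tuner:is_full_match/2.
%% \A and \z anchor the pattern to the very start and very end of the
%% subject; (?:...) keeps alternations inside the pattern grouped.
is_full_match(Text, Regex) ->
    Anchored = ["\\A(?:", Regex, ")\\z"],
    re:run(Text, Anchored, [{capture, none}]) =:= match.
```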

is_match/2

is_match(Text, ReInput) -> Result

Text: subject text to search
ReInput: regex pattern

returns: true or false

Check whether a match can be found for a particular regular expression in a particular string. A partial match is sufficient.
See also: http://erlang.org/doc/man/re.html#compile_1.
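A partial-match test maps closely onto re:run/3 with capturing switched off. This is a sketch under that assumption; is_match_sketch is a hypothetical module, not the re_tuner implementation.

```erlang
-module(is_match_sketch).
-export([is_match/2]).

%% Hypothetical stand-in for re_tuner:is_match/2.
%% With {capture, none}, re:run/3 returns the bare atom match or nomatch.
is_match(Text, Regex) ->
    re:run(Text, Regex, [{capture, none}]) =:= match.
```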

mp/1

mp(Regex) -> MP | {error, badarg}

Regex: regex pattern

returns: Opaque data type containing a compiled regular expression

A reduced form of the re:compile/1 function. Returns an opaque data type containing a compiled regular expression, or signals a badarg error.
See also: mp().

mp/2

mp(Regex, Options) -> MP | {error, badarg}

Regex: regex pattern
Options: additional regular expression metadata

returns: Opaque data type containing a compiled regular expression

A reduced form of the re:compile/2 function. Returns an opaque data type containing a compiled regular expression, or signals a badarg error.
See also: mp().
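For orientation, re:compile/2 returns {ok, MP} on success, so a "reduced form" presumably unwraps that tuple. The sketch below assumes this and maps any compile error to {error, badarg} to mirror the documented spec; mp_sketch is a hypothetical module name.

```erlang
-module(mp_sketch).
-export([mp/1, mp/2]).

%% Hypothetical stand-ins for re_tuner:mp/1 and re_tuner:mp/2.
mp(Regex) ->
    mp(Regex, []).

%% Unwrap the {ok, MP} tuple from re:compile/2 so the compiled pattern
%% can be passed straight to re:run/2,3.
mp(Regex, Options) ->
    case re:compile(Regex, Options) of
        {ok, MP} -> MP;
        {error, _ErrSpec} -> {error, badarg}
    end.
```

The compiled pattern can then be reused across calls, for example re:run("AAA", mp_sketch:mp("a+", [caseless])).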

replace/1

replace(Pattern) -> UpdatedPattern

Pattern: regex pattern in which shorthands are to be replaced

returns: Updated Regex pattern string

Replace any of the shorthand patterns in the list [\s, \w, \h, \v] in a pattern string.
Do not apply the \w shorthand to Unicode content.

save_pattern/1

save_pattern(Pattern) -> SavePattern

returns: Safe pattern

Make a safe regex pattern in which every character is treated as a literal.
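One way to make a pattern literal is to put a backslash in front of every regex metacharacter. The sketch below assumes that technique and its own metacharacter set; escape_sketch is a hypothetical module, and save_pattern/1 may use a different method or a different character list (see avoid_characters/0).

```erlang
-module(escape_sketch).
-export([escape/1]).

%% Hypothetical illustration of literal-escaping, not the re_tuner source.
%% The character class lists the metacharacters . ^ $ * + ? ( ) [ ] { } | \
%% and the replacement "\\\\&" emits a backslash followed by the match.
escape(Pattern) ->
    re:replace(Pattern, "[.^$*+?()\\[\\]{}|\\\\]", "\\\\&",
               [global, {return, list}]).
```

With this, the pattern "1+1" stops meaning "one or more 1s followed by 1" and matches only the literal text 1+1.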

tune/1

tune(Regex) -> Result

returns: Transformed regex pattern.

Replace a regex pattern with a simpler one.

unicode_block/1

unicode_block(BlockName) -> Range | nomatch

BlockName: Unicode block name as written in a regular expression

returns: Regular Expressions range of code points

The Unicode character database divides all the code points into blocks. Each block consists of a single range of code points. The code points U+0000 through U+FFFF are divided into 156 blocks in version 6.1 of the Unicode standard.

  ‹U+0000…U+007F \p{InBasicLatin}›
  ‹U+0080…U+00FF \p{InLatin-1Supplement}›
  ‹U+0100…U+017F \p{InLatinExtended-A}›
  ‹U+0180…U+024F \p{InLatinExtended-B}›
  ‹U+0250…U+02AF \p{InIPAExtensions}›
  ‹U+02B0…U+02FF \p{InSpacingModifierLetters}›
  ‹U+0300…U+036F \p{InCombiningDiacriticalMarks}›
  ‹U+0370…U+03FF \p{InGreekandCoptic}›
  ‹U+0400…U+04FF \p{InCyrillic}›
  ‹U+0500…U+052F \p{InCyrillicSupplement}›
  ‹U+0530…U+058F \p{InArmenian}›
  ‹U+0590…U+05FF \p{InHebrew}›
  ‹U+0600…U+06FF \p{InArabic}›
  ‹U+0700…U+074F \p{InSyriac}›
  ‹U+0750…U+077F \p{InArabicSupplement}›
  ‹U+0780…U+07BF \p{InThaana}›
  ‹U+07C0…U+07FF \p{InNKo}›
  ‹U+0800…U+083F \p{InSamaritan}›
  ‹U+0840…U+085F \p{InMandaic}›
  ‹U+08A0…U+08FF \p{InArabicExtended-A}›
  ‹U+0900…U+097F \p{InDevanagari}›
  ‹U+0980…U+09FF \p{InBengali}›
  ‹U+0A00…U+0A7F \p{InGurmukhi}›
  ‹U+0A80…U+0AFF \p{InGujarati}›
  ‹U+0B00…U+0B7F \p{InOriya}›
  ‹U+0B80…U+0BFF \p{InTamil}›
  ‹U+0C00…U+0C7F \p{InTelugu}›
  ‹U+0C80…U+0CFF \p{InKannada}›
  ‹U+0D00…U+0D7F \p{InMalayalam}›
  ‹U+0D80…U+0DFF \p{InSinhala}›
  ‹U+0E00…U+0E7F \p{InThai}›
  ‹U+0E80…U+0EFF \p{InLao}›
  ‹U+0F00…U+0FFF \p{InTibetan}›
  ‹U+1000…U+109F \p{InMyanmar}›
  ‹U+10A0…U+10FF \p{InGeorgian}›
  ‹U+1100…U+11FF \p{InHangulJamo}›
  ‹U+1200…U+137F \p{InEthiopic}›
  ‹U+1380…U+139F \p{InEthiopicSupplement}›
  ‹U+13A0…U+13FF \p{InCherokee}›
  ‹U+1400…U+167F \p{InUnifiedCanadianAboriginalSyllabics}›
  ‹U+1680…U+169F \p{InOgham}›
  ‹U+16A0…U+16FF \p{InRunic}›
  ‹U+1700…U+171F \p{InTagalog}›
  ‹U+1720…U+173F \p{InHanunoo}›
  ‹U+1740…U+175F \p{InBuhid}›
  ‹U+1760…U+177F \p{InTagbanwa}›
  ‹U+1780…U+17FF \p{InKhmer}›
  ‹U+1800…U+18AF \p{InMongolian}›
  ‹U+18B0…U+18FF \p{InUnifiedCanadianAboriginalSyllabicsExtended}›
  ‹U+1900…U+194F \p{InLimbu}›
  ‹U+1950…U+197F \p{InTaiLe}›
  ‹U+1980…U+19DF \p{InNewTaiLue}›
  ‹U+19E0…U+19FF \p{InKhmerSymbols}›
  ‹U+1A00…U+1A1F \p{InBuginese}›
  ‹U+1A20…U+1AAF \p{InTaiTham}›
  ‹U+1B00…U+1B7F \p{InBalinese}›
  ‹U+1B80…U+1BBF \p{InSundanese}›
  ‹U+1BC0…U+1BFF \p{InBatak}›
  ‹U+1C00…U+1C4F \p{InLepcha}›
  ‹U+1C50…U+1C7F \p{InOlChiki}›
  ‹U+1CC0…U+1CCF \p{InSundaneseSupplement}›
  ‹U+1CD0…U+1CFF \p{InVedicExtensions}›
  ‹U+1D00…U+1D7F \p{InPhoneticExtensions}›
  ‹U+1D80…U+1DBF \p{InPhoneticExtensionsSupplement}›
  ‹U+1DC0…U+1DFF \p{InCombiningDiacriticalMarksSupplement}›
  ‹U+1E00…U+1EFF \p{InLatinExtendedAdditional}›
  ‹U+1F00…U+1FFF \p{InGreekExtended}›
  ‹U+2000…U+206F \p{InGeneralPunctuation}›
  ‹U+2070…U+209F \p{InSuperscriptsandSubscripts}›
  ‹U+20A0…U+20CF \p{InCurrencySymbols}›
  ‹U+20D0…U+20FF \p{InCombiningDiacriticalMarksforSymbols}›
  ‹U+2100…U+214F \p{InLetterlikeSymbols}›
  ‹U+2150…U+218F \p{InNumberForms}›
  ‹U+2190…U+21FF \p{InArrows}›
  ‹U+2200…U+22FF \p{InMathematicalOperators}›
  ‹U+2300…U+23FF \p{InMiscellaneousTechnical}›
  ‹U+2400…U+243F \p{InControlPictures}›
  ‹U+2440…U+245F \p{InOpticalCharacterRecognition}›
  ‹U+2460…U+24FF \p{InEnclosedAlphanumerics}›
  ‹U+2500…U+257F \p{InBoxDrawing}›
  ‹U+2580…U+259F \p{InBlockElements}›
  ‹U+25A0…U+25FF \p{InGeometricShapes}›
  ‹U+2600…U+26FF \p{InMiscellaneousSymbols}›
  ‹U+2700…U+27BF \p{InDingbats}›
  ‹U+27C0…U+27EF \p{InMiscellaneousMathematicalSymbols-A}›
  ‹U+27F0…U+27FF \p{InSupplementalArrows-A}›
  ‹U+2800…U+28FF \p{InBraillePatterns}›
  ‹U+2900…U+297F \p{InSupplementalArrows-B}›
  ‹U+2980…U+29FF \p{InMiscellaneousMathematicalSymbols-B}›
  ‹U+2A00…U+2AFF \p{InSupplementalMathematicalOperators}›
  ‹U+2B00…U+2BFF \p{InMiscellaneousSymbolsandArrows}›
  ‹U+2C00…U+2C5F \p{InGlagolitic}›
  ‹U+2C60…U+2C7F \p{InLatinExtended-C}›
  ‹U+2C80…U+2CFF \p{InCoptic}›
  ‹U+2D00…U+2D2F \p{InGeorgianSupplement}›
  ‹U+2D30…U+2D7F \p{InTifinagh}›
  ‹U+2D80…U+2DDF \p{InEthiopicExtended}›
  ‹U+2DE0…U+2DFF \p{InCyrillicExtended-A}›
  ‹U+2E00…U+2E7F \p{InSupplementalPunctuation}›
  ‹U+2E80…U+2EFF \p{InCJKRadicalsSupplement}›
  ‹U+2F00…U+2FDF \p{InKangxiRadicals}›
  ‹U+2FF0…U+2FFF \p{InIdeographicDescriptionCharacters}›
  ‹U+3000…U+303F \p{InCJKSymbolsandPunctuation}›
  ‹U+3040…U+309F \p{InHiragana}›
  ‹U+30A0…U+30FF \p{InKatakana}›
  ‹U+3100…U+312F \p{InBopomofo}›
  ‹U+3130…U+318F \p{InHangulCompatibilityJamo}›
  ‹U+3190…U+319F \p{InKanbun}›
  ‹U+31A0…U+31BF \p{InBopomofoExtended}›
  ‹U+31C0…U+31EF \p{InCJKStrokes}›
  ‹U+31F0…U+31FF \p{InKatakanaPhoneticExtensions}›
  ‹U+3200…U+32FF \p{InEnclosedCJKLettersandMonths}›
  ‹U+3300…U+33FF \p{InCJKCompatibility}›
  ‹U+3400…U+4DBF \p{InCJKUnifiedIdeographsExtensionA}›
  ‹U+4DC0…U+4DFF \p{InYijingHexagramSymbols}›
  ‹U+4E00…U+9FFF \p{InCJKUnifiedIdeographs}›
  ‹U+A000…U+A48F \p{InYiSyllables}›
  ‹U+A490…U+A4CF \p{InYiRadicals}›
  ‹U+A4D0…U+A4FF \p{InLisu}›
  ‹U+A500…U+A63F \p{InVai}›
  ‹U+A640…U+A69F \p{InCyrillicExtended-B}›
  ‹U+A6A0…U+A6FF \p{InBamum}›
  ‹U+A700…U+A71F \p{InModifierToneLetters}›
  ‹U+A720…U+A7FF \p{InLatinExtended-D}›
  ‹U+A800…U+A82F \p{InSylotiNagri}›
  ‹U+A830…U+A83F \p{InCommonIndicNumberForms}›
  ‹U+A840…U+A87F \p{InPhags-pa}›
  ‹U+A880…U+A8DF \p{InSaurashtra}›
  ‹U+A8E0…U+A8FF \p{InDevanagariExtended}›
  ‹U+A900…U+A92F \p{InKayahLi}›
  ‹U+A930…U+A95F \p{InRejang}›
  ‹U+A960…U+A97F \p{InHangulJamoExtended-A}›
  ‹U+A980…U+A9DF \p{InJavanese}›
  ‹U+AA00…U+AA5F \p{InCham}›
  ‹U+AA60…U+AA7F \p{InMyanmarExtended-A}›
  ‹U+AA80…U+AADF \p{InTaiViet}›
  ‹U+AAE0…U+AAFF \p{InMeeteiMayekExtensions}›
  ‹U+AB00…U+AB2F \p{InEthiopicExtended-A}›
  ‹U+ABC0…U+ABFF \p{InMeeteiMayek}›
  ‹U+AC00…U+D7AF \p{InHangulSyllables}›
  ‹U+D7B0…U+D7FF \p{InHangulJamoExtended-B}›
  ‹U+D800…U+DB7F \p{InHighSurrogates}›
  ‹U+DB80…U+DBFF \p{InHighPrivateUseSurrogates}›
  ‹U+DC00…U+DFFF \p{InLowSurrogates}›
  ‹U+E000…U+F8FF \p{InPrivateUseArea}›
  ‹U+F900…U+FAFF \p{InCJKCompatibilityIdeographs}›
  ‹U+FB00…U+FB4F \p{InAlphabeticPresentationForms}›
  ‹U+FB50…U+FDFF \p{InArabicPresentationForms-A}›
  ‹U+FE00…U+FE0F \p{InVariationSelectors}›
  ‹U+FE10…U+FE1F \p{InVerticalForms}›
  ‹U+FE20…U+FE2F \p{InCombiningHalfMarks}›
  ‹U+FE30…U+FE4F \p{InCJKCompatibilityForms}›
  ‹U+FE50…U+FE6F \p{InSmallFormVariants}›
  ‹U+FE70…U+FEFF \p{InArabicPresentationForms-B}›
  ‹U+FF00…U+FFEF \p{InHalfwidthandFullwidthForms}›
  ‹U+FFF0…U+FFFF \p{InSpecials}›
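A block from the table above can be matched with an explicit code point range in a character class. The sketch below does this for the Cyrillic block (U+0400 to U+04FF) using only re:run/3; the module and function names are hypothetical, and the exact string returned by unicode_block/1 is not shown in this document.

```erlang
-module(block_range_sketch).
-export([in_cyrillic/1]).

%% Hypothetical illustration: true if Text contains at least one code
%% point from the Cyrillic block. The unicode option makes re treat the
%% pattern and subject as Unicode, which \x{...} above 0xFF requires.
in_cyrillic(Text) ->
    re:run(Text, "[\\x{0400}-\\x{04FF}]", [unicode, {capture, none}]) =:= match.
```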


Generated by EDoc