Copyright © 2021 by Anatolii Kosorukov
Authors: Anatolii Kosorukov (java1cprog@yandex.ru) [web site: rustkas.github.io/].
compile_option() = unicode | anchored | caseless | dollar_endonly | dotall | extended | firstline | multiline | no_auto_capture | dupnames | ungreedy | {newline, nl_spec()} | bsr_anycrlf | bsr_unicode | no_start_optimize | ucp | never_utf
mp() = {re_pattern, term(), term(), term(), term()}
nl_spec() = cr | crlf | lf | anycrlf | any
avoid_characters/0 | The list of characters that raise an error when they are not escaped. |
first_match/2 | Retrieve the Matched Text. |
first_match_info/2 | Determine the Position and Length of the Match. |
is_full_match/2 | Check whether a string fits a certain pattern in its entirety. |
is_match/2 | Check whether a match can be found for a particular regular expression in a particular string. |
mp/1 | A reduced form of the re:compile/1 function. |
mp/2 | A reduced form of the re:compile/2 function. |
replace/1 | Replace one of the shorthand patterns from the list [\s, \w, \h, \v] in a pattern string. |
save_pattern/1 | Make a safe regex pattern that treats every character as a literal. |
tune/1 | Replace a regex pattern with a simpler one. |
unicode_block/1 | The Unicode character database divides all the code points into blocks. |
avoid_characters() -> Result
returns: The list of special characters.
The list of characters that raise an error when they are not escaped.
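As a rough illustration of why such a list matters, the sketch below checks whether a single character compiles as a pattern on its own, using the standard re module. The module name avoid_chars_sketch and the helper needs_escape/1 are hypothetical and not part of the documented module; the list actually returned by avoid_characters/0 is not reproduced here.

```erlang
-module(avoid_chars_sketch).
-export([needs_escape/1]).

%% Hypothetical helper (not part of the documented module).
%% Returns true when the single character C cannot be compiled
%% as a pattern on its own, false when it compiles without error.
needs_escape(C) ->
    case re:compile([C]) of
        {ok, _MP} -> false;
        {error, _ErrSpec} -> true
    end.
```

Characters such as `(` or `*` fail to compile unescaped, while an ordinary letter compiles fine.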
first_match(Text, ReInput) -> Result
Text: subject text to search
ReInput: regex pattern
returns: String result
Retrieve the Matched Text.
You have a regular expression that matches a part of the subject text, and you want to
extract the text that was matched. If the regular expression can match the string more
than once, you want only the first match.
See also:
http://erlang.org/doc/man/re.html#compile_1,
http://erlang.org/doc/man/re.html#run_2.
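A minimal sketch of this recipe on top of re:run/3, assuming the pattern is passed as a string and that a failed search yields nomatch (the original only states that a string is returned). The module name first_match_sketch is hypothetical.

```erlang
-module(first_match_sketch).
-export([first_match/2]).

%% Return the first substring of Text matched by Regex as a string.
%% Assumption: nomatch is returned when nothing matches.
first_match(Text, Regex) ->
    %% {capture, first, list} asks only for the overall match, as a list.
    case re:run(Text, Regex, [{capture, first, list}]) of
        {match, [Matched]} -> Matched;
        nomatch -> nomatch
    end.
```

Because re:run/3 stops at the first match unless the global option is given, later matches in the subject are ignored.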
first_match_info(Text, Regex) -> any()
Determine the Position and Length of the Match.
Instead of extracting the substring matched by the regular expression, you want to determine
the starting position and length of the match.
With this information, you can extract the match in your own code or apply whatever
processing you fancy on the part of the original string matched by the regex.
See also:
http://erlang.org/doc/man/re.html#compile_1,
http://erlang.org/doc/man/re.html#run_2.
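One way to obtain position and length is the index capture type of re:run/3. This is a sketch under assumptions: the return shape {Start, Length} and the nomatch result are chosen for illustration, since the original declares the return type only as any(). The module name first_match_info_sketch is hypothetical.

```erlang
-module(first_match_info_sketch).
-export([first_match_info/2]).

%% Return {Start, Length} of the first match of Regex in Text.
%% Start is a zero-based offset, as reported by re:run/3.
%% Assumption: nomatch is returned when nothing matches.
first_match_info(Text, Regex) ->
    case re:run(Text, Regex, [{capture, first, index}]) of
        {match, [{Start, Length}]} -> {Start, Length};
        nomatch -> nomatch
    end.
```

Since the offset is zero-based and lists:sublist/3 is one-based, the matched text can be recovered with lists:sublist(Text, Start + 1, Length).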
is_full_match(Text, ReInput) -> Result
Text: subject text to check
ReInput: regex pattern
returns: true or false
Check whether a string fits a certain pattern in its entirety.
A partial match is not sufficient.
See also:
http://erlang.org/doc/man/re.html#compile_1.
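A full-match check can be sketched by wrapping the pattern in a non-capturing group between the \A and \z anchors. This is an assumption about one possible approach, not the module's actual implementation; it also assumes the pattern is a string (list) so that it can be concatenated. The module name is_full_match_sketch is hypothetical.

```erlang
-module(is_full_match_sketch).
-export([is_full_match/2]).

%% True only when Regex matches Text from its first to its last character.
%% Assumption: Regex is a string (list), not a binary or compiled pattern.
is_full_match(Text, Regex) ->
    %% \A anchors at the start of the subject, \z at its very end.
    %% The (?: ... ) group keeps alternations inside the anchors.
    Anchored = "\\A(?:" ++ Regex ++ ")\\z",
    re:run(Text, Anchored, [{capture, none}]) =:= match.
```

The group matters for patterns such as a|ab: without it, the anchors would bind to only one alternative each.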
is_match(Text, ReInput) -> Result
Text: subject text to check
ReInput: regex pattern
returns: true or false
Check whether a match can be found for a particular regular expression in a particular string.
A partial match is sufficient.
See also:
http://erlang.org/doc/man/re.html#compile_1.
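A partial-match test needs no captured text, so a sketch can ask re:run/3 for no captures at all and compare the result with the atom match. The module name is_match_sketch is hypothetical, and this is only one plausible way to write such a check.

```erlang
-module(is_match_sketch).
-export([is_match/2]).

%% True when Regex matches anywhere inside Text.
is_match(Text, Regex) ->
    %% With {capture, none}, re:run/3 returns just match or nomatch.
    re:run(Text, Regex, [{capture, none}]) =:= match.
```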
mp(Regex) -> MP | {error, badarg}
Regex: regex pattern
returns: Opaque data type containing a compiled regular expression
A reduced form of the re:compile/1 function.
Returns an opaque data type containing a compiled regular expression, or reports the error badarg.
See also:
mp().
mp(Regex, Options) -> MP | {error, badarg}
Regex: regex pattern
Options: additional regular expression metadata
returns: Opaque data type containing a compiled regular expression
A reduced form of the re:compile/2 function.
Returns an opaque data type containing a compiled regular expression, or reports the error badarg.
See also:
mp().
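The two arities can be sketched as a thin wrapper that unwraps the {ok, MP} tuple returned by re:compile/2. The module name mp_sketch is hypothetical, and mapping every compile error to {error, badarg} is an assumption drawn from the signature above. Invalid options make re:compile/2 raise an exception, which this sketch does not catch.

```erlang
-module(mp_sketch).
-export([mp/1, mp/2]).

%% Compile Regex with no options.
mp(Regex) ->
    mp(Regex, []).

%% Compile Regex with Options (see compile_option()).
%% Assumption: any compilation error is reported as {error, badarg}.
mp(Regex, Options) ->
    case re:compile(Regex, Options) of
        {ok, MP} -> MP;
        {error, _ErrSpec} -> {error, badarg}
    end.
```

The compiled value can be passed to re:run/2,3 in place of the pattern string, which avoids recompiling a pattern that is used many times.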
replace(Pattern) -> UpdatedPattern
Pattern: regex pattern in which shorthands are replaced
returns: Updated regex pattern string
Replace one of the shorthand patterns from the list [\s, \w, \h, \v] in a pattern string.
Do not apply the \w shorthand to Unicode content.
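To illustrate what replacing a shorthand can look like, here is a sketch that substitutes one shorthand with an explicit ASCII character class. Everything in it is an assumption: the module name shorthand_sketch, the functions expand/1 and replace_shorthand/2, and the chosen expansions are hypothetical and may differ from what replace/1 produces.

```erlang
-module(shorthand_sketch).
-export([expand/1, replace_shorthand/2]).

%% Hypothetical ASCII-only expansions of four shorthands.
expand("\\w") -> "[A-Za-z0-9_]";
expand("\\s") -> "[ \\t\\n\\x0B\\f\\r]";
expand("\\h") -> "[ \\t]";
expand("\\v") -> "[\\n\\x0B\\f\\r]".

%% Replace every occurrence of Shorthand in Pattern with its expansion.
%% This is plain text substitution: it does not detect an escaped
%% backslash before the letter, nor a shorthand inside a character class.
replace_shorthand(Pattern, Shorthand) ->
    Parts = string:replace(Pattern, Shorthand, expand(Shorthand), all),
    unicode:characters_to_list(Parts).
```

The ASCII-only class for \w shows why that shorthand should not be replaced when the subject contains non-ASCII letters: the expanded class would no longer match them.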
save_pattern(Pattern) -> SavePattern
returns: Safe pattern
Make a safe regex pattern that treats every character as a literal.
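One common way to make a pattern literal is to put a backslash in front of every metacharacter. The sketch below does that for a fixed set of characters; the module name escape_sketch, the function escape/1, and the exact character set are assumptions and are not taken from save_pattern/1.

```erlang
-module(escape_sketch).
-export([escape/1]).

%% Prefix each regex metacharacter in Pattern with a backslash so that
%% the result matches the original text literally.
%% Assumption: Pattern is a string (list) and this set is sufficient.
escape(Pattern) ->
    Special = "\\^$.|?*+()[]{}",
    lists:flatmap(
        fun(C) ->
            case lists:member(C, Special) of
                true -> [$\\, C];
                false -> [C]
            end
        end,
        Pattern).
```

After escaping, a pattern such as a.c matches only the three characters a, dot, c and no longer matches abc.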
tune(Regex) -> Result
returns: Transformed regex pattern.
Replace a regex pattern with a simpler one.
unicode_block(BlockName) -> Range | nomatch
BlockName: Unicode block name as used in regular expressions
returns: Range of code points for use in a regular expression
The Unicode character database divides all the code points into blocks. Each block
consists of a single range of code points. The code points U+0000 through U+FFFF
are divided into 156 blocks in version 6.1 of the Unicode standard.
‹U+0000…U+007F \p{InBasicLatin}›
‹U+0080…U+00FF \p{InLatin-1Supplement}›
‹U+0100…U+017F \p{InLatinExtended-A}›
‹U+0180…U+024F \p{InLatinExtended-B}›
‹U+0250…U+02AF \p{InIPAExtensions}›
‹U+02B0…U+02FF \p{InSpacingModifierLetters}›
‹U+0300…U+036F \p{InCombiningDiacriticalMarks}›
‹U+0370…U+03FF \p{InGreekandCoptic}›
‹U+0400…U+04FF \p{InCyrillic}›
‹U+0500…U+052F \p{InCyrillicSupplement}›
‹U+0530…U+058F \p{InArmenian}›
‹U+0590…U+05FF \p{InHebrew}›
‹U+0600…U+06FF \p{InArabic}›
‹U+0700…U+074F \p{InSyriac}›
‹U+0750…U+077F \p{InArabicSupplement}›
‹U+0780…U+07BF \p{InThaana}›
‹U+07C0…U+07FF \p{InNKo}›
‹U+0800…U+083F \p{InSamaritan}›
‹U+0840…U+085F \p{InMandaic}›
‹U+08A0…U+08FF \p{InArabicExtended-A}›
‹U+0900…U+097F \p{InDevanagari}›
‹U+0980…U+09FF \p{InBengali}›
‹U+0A00…U+0A7F \p{InGurmukhi}›
‹U+0A80…U+0AFF \p{InGujarati}›
‹U+0B00…U+0B7F \p{InOriya}›
‹U+0B80…U+0BFF \p{InTamil}›
‹U+0C00…U+0C7F \p{InTelugu}›
‹U+0C80…U+0CFF \p{InKannada}›
‹U+0D00…U+0D7F \p{InMalayalam}›
‹U+0D80…U+0DFF \p{InSinhala}›
‹U+0E00…U+0E7F \p{InThai}›
‹U+0E80…U+0EFF \p{InLao}›
‹U+0F00…U+0FFF \p{InTibetan}›
‹U+1000…U+109F \p{InMyanmar}›
‹U+10A0…U+10FF \p{InGeorgian}›
‹U+1100…U+11FF \p{InHangulJamo}›
‹U+1200…U+137F \p{InEthiopic}›
‹U+1380…U+139F \p{InEthiopicSupplement}›
‹U+13A0…U+13FF \p{InCherokee}›
‹U+1400…U+167F \p{InUnifiedCanadianAboriginalSyllabics}›
‹U+1680…U+169F \p{InOgham}›
‹U+16A0…U+16FF \p{InRunic}›
‹U+1700…U+171F \p{InTagalog}›
‹U+1720…U+173F \p{InHanunoo}›
‹U+1740…U+175F \p{InBuhid}›
‹U+1760…U+177F \p{InTagbanwa}›
‹U+1780…U+17FF \p{InKhmer}›
‹U+1800…U+18AF \p{InMongolian}›
‹U+18B0…U+18FF \p{InUnifiedCanadianAboriginalSyllabicsExtended}›
‹U+1900…U+194F \p{InLimbu}›
‹U+1950…U+197F \p{InTaiLe}›
‹U+1980…U+19DF \p{InNewTaiLue}›
‹U+19E0…U+19FF \p{InKhmerSymbols}›
‹U+1A00…U+1A1F \p{InBuginese}›
‹U+1A20…U+1AAF \p{InTaiTham}›
‹U+1B00…U+1B7F \p{InBalinese}›
‹U+1B80…U+1BBF \p{InSundanese}›
‹U+1BC0…U+1BFF \p{InBatak}›
‹U+1C00…U+1C4F \p{InLepcha}›
‹U+1C50…U+1C7F \p{InOlChiki}›
‹U+1CC0…U+1CCF \p{InSundaneseSupplement}›
‹U+1CD0…U+1CFF \p{InVedicExtensions}›
‹U+1D00…U+1D7F \p{InPhoneticExtensions}›
‹U+1D80…U+1DBF \p{InPhoneticExtensionsSupplement}›
‹U+1DC0…U+1DFF \p{InCombiningDiacriticalMarksSupplement}›
‹U+1E00…U+1EFF \p{InLatinExtendedAdditional}›
‹U+1F00…U+1FFF \p{InGreekExtended}›
‹U+2000…U+206F \p{InGeneralPunctuation}›
‹U+2070…U+209F \p{InSuperscriptsandSubscripts}›
‹U+20A0…U+20CF \p{InCurrencySymbols}›
‹U+20D0…U+20FF \p{InCombiningDiacriticalMarksforSymbols}›
‹U+2100…U+214F \p{InLetterlikeSymbols}›
‹U+2150…U+218F \p{InNumberForms}›
‹U+2190…U+21FF \p{InArrows}›
‹U+2200…U+22FF \p{InMathematicalOperators}›
‹U+2300…U+23FF \p{InMiscellaneousTechnical}›
‹U+2400…U+243F \p{InControlPictures}›
‹U+2440…U+245F \p{InOpticalCharacterRecognition}›
‹U+2460…U+24FF \p{InEnclosedAlphanumerics}›
‹U+2500…U+257F \p{InBoxDrawing}›
‹U+2580…U+259F \p{InBlockElements}›
‹U+25A0…U+25FF \p{InGeometricShapes}›
‹U+2600…U+26FF \p{InMiscellaneousSymbols}›
‹U+2700…U+27BF \p{InDingbats}›
‹U+27C0…U+27EF \p{InMiscellaneousMathematicalSymbols-A}›
‹U+27F0…U+27FF \p{InSupplementalArrows-A}›
‹U+2800…U+28FF \p{InBraillePatterns}›
‹U+2900…U+297F \p{InSupplementalArrows-B}›
‹U+2980…U+29FF \p{InMiscellaneousMathematicalSymbols-B}›
‹U+2A00…U+2AFF \p{InSupplementalMathematicalOperators}›
‹U+2B00…U+2BFF \p{InMiscellaneousSymbolsandArrows}›
‹U+2C00…U+2C5F \p{InGlagolitic}›
‹U+2C60…U+2C7F \p{InLatinExtended-C}›
‹U+2C80…U+2CFF \p{InCoptic}›
‹U+2D00…U+2D2F \p{InGeorgianSupplement}›
‹U+2D30…U+2D7F \p{InTifinagh}›
‹U+2D80…U+2DDF \p{InEthiopicExtended}›
‹U+2DE0…U+2DFF \p{InCyrillicExtended-A}›
‹U+2E00…U+2E7F \p{InSupplementalPunctuation}›
‹U+2E80…U+2EFF \p{InCJKRadicalsSupplement}›
‹U+2F00…U+2FDF \p{InKangxiRadicals}›
‹U+2FF0…U+2FFF \p{InIdeographicDescriptionCharacters}›
‹U+3000…U+303F \p{InCJKSymbolsandPunctuation}›
‹U+3040…U+309F \p{InHiragana}›
‹U+30A0…U+30FF \p{InKatakana}›
‹U+3100…U+312F \p{InBopomofo}›
‹U+3130…U+318F \p{InHangulCompatibilityJamo}›
‹U+3190…U+319F \p{InKanbun}›
‹U+31A0…U+31BF \p{InBopomofoExtended}›
‹U+31C0…U+31EF \p{InCJKStrokes}›
‹U+31F0…U+31FF \p{InKatakanaPhoneticExtensions}›
‹U+3200…U+32FF \p{InEnclosedCJKLettersandMonths}›
‹U+3300…U+33FF \p{InCJKCompatibility}›
‹U+3400…U+4DBF \p{InCJKUnifiedIdeographsExtensionA}›
‹U+4DC0…U+4DFF \p{InYijingHexagramSymbols}›
‹U+4E00…U+9FFF \p{InCJKUnifiedIdeographs}›
‹U+A000…U+A48F \p{InYiSyllables}›
‹U+A490…U+A4CF \p{InYiRadicals}›
‹U+A4D0…U+A4FF \p{InLisu}›
‹U+A500…U+A63F \p{InVai}›
‹U+A640…U+A69F \p{InCyrillicExtended-B}›
‹U+A6A0…U+A6FF \p{InBamum}›
‹U+A700…U+A71F \p{InModifierToneLetters}›
‹U+A720…U+A7FF \p{InLatinExtended-D}›
‹U+A800…U+A82F \p{InSylotiNagri}›
‹U+A830…U+A83F \p{InCommonIndicNumberForms}›
‹U+A840…U+A87F \p{InPhags-pa}›
‹U+A880…U+A8DF \p{InSaurashtra}›
‹U+A8E0…U+A8FF \p{InDevanagariExtended}›
‹U+A900…U+A92F \p{InKayahLi}›
‹U+A930…U+A95F \p{InRejang}›
‹U+A960…U+A97F \p{InHangulJamoExtended-A}›
‹U+A980…U+A9DF \p{InJavanese}›
‹U+AA00…U+AA5F \p{InCham}›
‹U+AA60…U+AA7F \p{InMyanmarExtended-A}›
‹U+AA80…U+AADF \p{InTaiViet}›
‹U+AAE0…U+AAFF \p{InMeeteiMayekExtensions}›
‹U+AB00…U+AB2F \p{InEthiopicExtended-A}›
‹U+ABC0…U+ABFF \p{InMeeteiMayek}›
‹U+AC00…U+D7AF \p{InHangulSyllables}›
‹U+D7B0…U+D7FF \p{InHangulJamoExtended-B}›
‹U+D800…U+DB7F \p{InHighSurrogates}›
‹U+DB80…U+DBFF \p{InHighPrivateUseSurrogates}›
‹U+DC00…U+DFFF \p{InLowSurrogates}›
‹U+E000…U+F8FF \p{InPrivateUseArea}›
‹U+F900…U+FAFF \p{InCJKCompatibilityIdeographs}›
‹U+FB00…U+FB4F \p{InAlphabeticPresentationForms}›
‹U+FB50…U+FDFF \p{InArabicPresentationForms-A}›
‹U+FE00…U+FE0F \p{InVariationSelectors}›
‹U+FE10…U+FE1F \p{InVerticalForms}›
‹U+FE20…U+FE2F \p{InCombiningHalfMarks}›
‹U+FE30…U+FE4F \p{InCJKCompatibilityForms}›
‹U+FE50…U+FE6F \p{InSmallFormVariants}›
‹U+FE70…U+FEFF \p{InArabicPresentationForms-B}›
‹U+FF00…U+FFEF \p{InHalfwidthandFullwidthForms}›
‹U+FFF0…U+FFFF \p{InSpecials}›
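A block lookup of this kind can be sketched as a function that maps a block name to a character-class range. The module name unicode_block_sketch and the returned string format are assumptions made for illustration; the original does not show the exact shape of Range. Only three blocks from the list above are included.

```erlang
-module(unicode_block_sketch).
-export([unicode_block/1]).

%% Map a block name to a range usable inside a character class.
%% Assumption: the range is returned as a string of \x{...} escapes.
unicode_block("InBasicLatin") -> "\\x{0000}-\\x{007F}";
unicode_block("InCyrillic")   -> "\\x{0400}-\\x{04FF}";
unicode_block("InArmenian")   -> "\\x{0530}-\\x{058F}";
unicode_block(_Other)         -> nomatch.
```

The returned range can be wrapped in square brackets and passed to re:run/3 together with the unicode option, for example "[" ++ Range ++ "]+" to find a run of characters from one block.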