A comprehensive regex for phone number validation

I'm trying to put together a comprehensive regex to validate phone numbers. Ideally it would handle international formats, but it must handle US formats, including the following:
    1-234-567-8901
    1-234-567-8901 x1234
    1-234-567-8901 ext1234
    1 (234) 567-8901
    1.234.567.8901
    1/234/567/8901
    12345678901
I'll answer with my current attempt, but I'm hoping somebody has something better and/or more elegant....
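
As a rough illustration of the US formats listed above, here is a minimal Python sketch. The pattern and the helper name `parse_us_phone` are hypothetical, written only for this example; it is not the asker's attempt and makes no claim to cover international numbers.

```python
import re

# Hypothetical pattern: optional leading 1, then 3-3-4 digit groups with
# optional separators, then an optional x/ext extension.
US_PHONE = re.compile(
    r"""^\s*
    (?:1[-./\s]?)?                  # optional country code 1 plus one separator
    \(?(\d{3})\)?                   # area code, parentheses optional
    [-./\s]?
    (\d{3})                         # exchange
    [-./\s]?
    (\d{4})                         # line number
    (?:\s*(?:x|ext)\.?\s*(\d+))?    # optional extension
    \s*$""",
    re.VERBOSE | re.IGNORECASE,
)


def parse_us_phone(text):
    """Return (area, exchange, line, extension) or None if text does not match."""
    match = US_PHONE.match(text)
    return match.groups() if match else None


samples = [
    "1-234-567-8901",
    "1-234-567-8901 x1234",
    "1-234-567-8901 ext1234",
    "1 (234) 567-8901",
    "1.234.567.8901",
    "1/234/567/8901",
    "12345678901",
]
for sample in samples:
    print(sample, "->", parse_us_phone(sample))
```

The sketch is deliberately permissive: it does not check that parentheses are balanced or that separators are used consistently, so stricter validation would need extra rules.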

regex - How to validate an email address using a regular expression?

Over the years I have slowly developed a regular expression that validates MOST email addresses correctly, assuming they don't use an IP address as the server part. I use it in several PHP programs, and it works most of the time. However, from time to time I get contacted by someone that is having trouble with a site that uses it, and I end up having to make some adjustment (most recently I realized that I wasn't allowing 4-character TLDs). What is the best regular expression you have or have seen for validating emails? I've seen several solution...
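
The 4-character TLD problem mentioned above is easy to reproduce. The two patterns below are hypothetical and deliberately simplistic (they are not the asker's expression, and they are written in Python rather than PHP); they differ only in the TLD length limit.

```python
import re

# Hypothetical patterns for illustration only.
STRICT_TLD = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,3}$")  # TLD capped at 3 letters
OPEN_TLD = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")     # no upper bound on TLD length


def accepts(pattern, address):
    """Return True if the whole address matches the given pattern."""
    return pattern.match(address) is not None


for address in ["user@example.com", "user@example.info"]:
    print(address, accepts(STRICT_TLD, address), accepts(OPEN_TLD, address))
```

The capped pattern rejects the `.info` address even though it is well-formed, which is the kind of adjustment the excerpt describes.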

regex - VSCode syntax matching constants in string

I am having an issue in that VSCode syntax highlighting is recognizing numeric values within strings. I was under the impression that when a begin/end capture occurs, everything within it is treated as a string and subsequent rules do not execute.
In my syntax file I have the following definitions:
    {
      "name": "string.quoted.single.proc",
      "begin": "'",
      "beginCaptures": {
        "0": { "name": "punctuation.definition.quotes.begin.single.proc" }
      },
      "end": "'",
      "endCaptures": {
        "0": { "name": "punctuation.def...

regex - Insert missing commas in C source

I've got a Perl script (using the -p flag) that performs some corrections on a corrupted C source file. Here's part of the script:
    sub remove_sp {
        $_ = shift;
        s/ /, /g;
        return $_;
    }
    s/(\([^}]*\))/remove_sp($1)/eg;
This replaces spaces inside parentheses with ", ", e.g. foo(bar baz) becomes foo(bar, baz). However, it's not very smart. It also changes foo("bar baz") to foo("bar, baz"), which obviously isn't something I want. I can't think of a way to rewrite the script so that it replaces a space with a comma-space only when the space is not...
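
One common way to leave quoted strings alone is to match them explicitly and put them back unchanged. The sketch below shows that idea in Python rather than Perl; the function names are hypothetical, the outer pattern `\([^()]*\)` is an assumption that differs from the script's `\([^}]*\)`, and escaped quotes or nested parentheses are not handled.

```python
import re


def add_commas(args):
    """Replace bare spaces with ', ' but keep double-quoted strings intact."""
    # The alternation matches a whole quoted string first, so spaces inside
    # quotes are never seen as standalone matches.
    return re.sub(
        r'"[^"]*"| ',
        lambda m: ", " if m.group(0) == " " else m.group(0),
        args,
    )


def fix_line(line):
    """Apply add_commas to every non-nested parenthesized group in the line."""
    return re.sub(r"\([^()]*\)", lambda m: add_commas(m.group(0)), line)


print(fix_line('foo(bar baz)'))
print(fix_line('foo("bar baz")'))
print(fix_line('foo("a b" c)'))
```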

Bash regex matching with a wildcard

I am trying to check if a given string has .el6. in it. I am a little puzzled by the Bash regex behavior. What am I missing here?
    os=$(uname -r)   # set to string "2.6.32-504.23.4.el6.x86_64"
    [[ $os =~ *el6* ]] && echo yes    # doesn't match, I understand Bash is treating it as a glob expression
    [[ $os =~ el6 ]] && echo yes      # matches
    [[ $os =~ .el6 ]] && echo yes     # matches
    [[ $os =~ .el6. ]] && echo yes    # matches
    [[ $os =~ ".el6." ]] && echo yes  # m...
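
The glob-versus-regex distinction behind the first test can be shown with Python's `fnmatch` and `re` modules as stand-ins. This is only an analogy: Bash uses its own glob matcher and POSIX extended regular expressions, and its quoting rules are not reproduced here. The helper `is_valid_regex` is hypothetical.

```python
import fnmatch
import re

release = "2.6.32-504.23.4.el6.x86_64"


def is_valid_regex(pattern):
    """Return True if the pattern compiles as a Python regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# As a glob, * means "any run of characters", so *el6* is a substring test.
glob_hit = fnmatch.fnmatchcase(release, "*el6*")

# As a regex, a leading * has nothing to repeat, so the pattern is invalid.
star_is_regex = is_valid_regex("*el6*")

# An unescaped dot matches any character; an escaped dot matches only ".".
loose_hit = re.search(r".el6.", "xel6y") is not None
literal_hit = re.search(r"\.el6\.", "xel6y") is not None
release_hit = re.search(r"\.el6\.", release) is not None

print(glob_hit, star_is_regex, loose_hit, literal_hit, release_hit)
```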

Python RegEx - unintended full-stop match

I am trying to write a Python regex pattern matching quite a few different date formats, and I have encountered an error I cannot really explain. My current regex pattern looks like so:
    r'((?:\d?\d[-/ ])?(?:\d?\d|(?:(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*))(?:(?:\d?\d)?[,-/ ])\d{2,4})'
It works great for almost everything in my data set, but e.g. this string remains defiant:
    'Lithium 0.25 (7/11/77). LFTS wnl. Urine tox neg. Serum tox + fluoxetine 500; otherwise neg. TSH 3.28. BUN/Cr: 16/0.83. Lipids unremarkable. B12 363, Fo...
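
One detail in the pattern quoted above is worth isolating: inside a character class, `,-/` is a range from `,` to `/`, and in ASCII that range also contains `-` and `.`. The short check below demonstrates this with Python's `re` module; the variable names and the test string are made up for the example.

```python
import re

text = "a,b-c.d/e f"

# ",-/" is read as a range (code points 44 to 47), so "." is included.
as_range = re.findall(r"[,-/ ]", text)

# Moving the hyphen to the end of the class makes it a literal hyphen.
as_literals = re.findall(r"[,/ -]", text)

print(as_range)     # [',', '-', '.', '/', ' ']
print(as_literals)  # [',', '-', '/', ' ']
```

Placing the hyphen first or last in the class, or escaping it as `\-`, keeps it from forming a range.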

python 3.x - Slice a string by regex

So I want to split a string by space unless that part of the string is in exclamation marks.
Sample:
    ABC DEF !GHI JKL MNO! PQR
Would become:
    ["ABC", "DEF", "GHI JKL MNO", "PQR"]
Currently, this is my regex (I checked it with and it worked)
    [^\s]*![^!]*![^\s]*|[^\s!!]+
And my code to split it is
    sample = "ABC DEF !GHI JKL MNO! PQR"
    print(sample.split(r"[^\s]*![^!]*![^\s]*|[^\s!!]+").strip("!"))
...
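
Note that `str.split` treats its argument as a literal separator, not a regex, and it returns a list, which has no `strip` method. Since the pattern above describes the tokens rather than the separators, one possible approach is `re.findall` followed by stripping each token. This is a minimal sketch; the function name `tokenize` is hypothetical, and the duplicated `!` in the last character class has been dropped because it has no effect.

```python
import re

TOKEN = re.compile(r"[^\s]*![^!]*![^\s]*|[^\s!]+")


def tokenize(text):
    """Return whitespace-separated tokens, keeping !...! groups together."""
    # findall returns each full match; the surrounding "!" are then removed.
    return [token.strip("!") for token in TOKEN.findall(text)]


sample = "ABC DEF !GHI JKL MNO! PQR"
print(tokenize(sample))  # ['ABC', 'DEF', 'GHI JKL MNO', 'PQR']
```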

Alternative to Regex for large string format/replace

I have a very large string of key value pairs (old_string) that is formatted as so:
    "visitorid"="gh43k9sk-gj49-92ks-jgjs-j2ks-j29slgj952ks", "customer_name"="larry", "customer_state"="alabama",..."visitorid"="..."
This string is very large since it can be up to 30k customers. I am using this to write a file to upload to an online segmentation tool that requires that it is formatted this way with one modification -- the primary key (visitorid) needs to be tab separated and not in quotes. The end result needs to look like this (note the 4 spaces is...

string - How to use a selective regex to perform replace in a pandas series?

I would like to use a regex when applying pandas.Series.str.replace. I am aware that it takes in regex, but my output is not as intended. Here is a simple example. Suppose I have
    ser = pd.Series(['asd3', 'qwe3', 'asd4', 'zxc'])
I would like to turn the 'asd3' and 'asd4' into 'asd'. That is, simply removing any integer at the end. I am using the code:
    ser.str.replace('asd([0-9])','')
Note that I am using the ([0-9]) notation, which I interpret as saying: for any element of the series, if it looks like 'asd([0-9])', then replace the [0-9] with `` (th...
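
A regex replacement substitutes the entire match, not just the parenthesized group, so the usual fix is to capture the part to keep and refer to it in the replacement. The sketch below shows this with Python's standard `re` module on a plain list rather than a pandas Series, so pandas itself is not exercised. The same pattern and `\1` replacement should carry over to `Series.str.replace` with `regex=True`, but that is an assumption and is not tested here.

```python
import re

values = ["asd3", "qwe3", "asd4", "zxc"]

# Capture the prefix to keep; the trailing digits fall outside the group.
TRAILING_DIGITS = re.compile(r"^(asd)[0-9]+$")

# \1 re-inserts the captured prefix, which drops only the digits.
cleaned = [TRAILING_DIGITS.sub(r"\1", value) for value in values]

print(cleaned)  # ['asd', 'qwe3', 'asd', 'zxc']
```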

How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops

How can I use regular expressions in Excel and take advantage of Excel's powerful grid-like setup for data manipulation?
    In-cell function to return matched pattern or replaced value in string.
    Sub to loop through a column of data and extract matches to adjacent cells.
    What setup is necessary?
    What are Excel's special characters for regular expressions?
I understand Regex is not ideal for many situations (To use or not to use regular expressions?) since Excel can use Left, Mid, Right, Instr type commands for similar manipulations....

regex - Tempered Greedy Token - What is different about placing the dot before the negative lookahead

    <table((?!</table>).)*</table>
matches all my table tags, however,
    <table(.(?!</table>))*</table>
does not. The second one seems to make sense if I try to write out the expression in words, but I can't make sense of the first. Can someone explain the difference to me? For reference, I got the term 'Tempered Greedy Token' from here:
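
The difference between the two expressions can be observed directly. In the first, the lookahead is checked before each character is consumed, so the repetition stops exactly where `</table>` begins. In the second, the lookahead is checked after each character, so the character immediately before `</table>` can never be consumed. The Python sketch below uses a made-up HTML snippet to show the effect.

```python
import re

html = "<table><tr><td>x</td></tr></table>"

# Lookahead before the dot: consume a character only if </table> does not start here.
before_dot = re.compile(r"<table((?!</table>).)*</table>")

# Lookahead after the dot: consume a character only if </table> does not follow it.
after_dot = re.compile(r"<table(.(?!</table>))*</table>")

first = before_dot.search(html)
second = after_dot.search(html)

print(first.group(0) if first else None)
print(second.group(0) if second else None)
```

With this input the first pattern matches the whole table, while the second finds no match because the final `>` of `</tr>` is followed by `</table>` and is therefore rejected.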