Regular expressions: capturing vs. non-capturing groups

Mondo Technology Updated on 2024-02-24

Want to be able to read every regular expression in the world? Dream on, young man. Just do your best to learn.

I have been learning and using JS for a year or two. I use objects when I need them, inherit from something, and get going. But whenever I read other people's frameworks I get stuck, and there is one major reason: I can't read those strings of Martian script (regular expressions). So why study them? To find the gaps and fill them. I'm not afraid that you don't understand; I'm afraid that you think you understand everything. Before getting down to business, let me recommend a piece of software: RegexBuddy. Whether you are testing a regex or studying how a match proceeds, it is a powerful tool.

Grammar review, focusing on a few pieces of knowledge: greedy matching and the quantifiers related to it (? means 0 or 1 times, * means any number of times), and special characters such as = ! / (

Martian script is basically built from these. To match a special character literally, put \ before it in a regex literal; when the pattern is declared with new RegExp, the string needs \\ instead. Non-capturing metacharacters: (?:pattern), (?=pattern) (positive lookahead) and (?!pattern) (negative lookahead). Backreferences, which refer back to what an earlier group matched. Others: word boundaries, the various kinds of brackets, and so on. As for how a regex engine actually parses an expression, that is not something I can write about for now; I can only recommend reading an article on it.

Understand greedy matching first. It covers most day-to-day use of regular expressions, and the syntax section at the start of the rookie tutorial explains it in detail. For example, take the regex chapter[1-9]. It matches only chapter1 through chapter9, that is, first-level chapter titles. What if we want to match second-level or deeper titles? Here we use greedy matching, which takes the longest possible match in the target string: change chapter[1-9] to chapter[1-9]+ and it matches chapter1, chapter12 and chapter123. If we change it to /chapter[1-9]?/ instead, then no matter how many digits follow chapter, it matches at most one of them, giving chapter1; unlike the original expression, it also matches a bare chapter. This is the form x?: the x in front of the question mark may appear 0 times or 1 time. When we change it to /chapter[1-9]*/, we get the combined effect of + and ?, so the digit may occur any number of times, including zero. The count can also be bounded with {n,m}, meaning at least n and at most m occurrences.

The counterpart of greedy matching is lazy matching. Put a ? after any of the greedy quantifiers above and that quantifier becomes lazy, which can be understood as taking the shortest possible match. For example, against chapter12345, chapter[1-9]+ matches chapter12345, but chapter[1-9]+? matches only chapter1. Likewise, chapter[1-9]{n,m} takes as many digits as it can up to m, while chapter[1-9]{n,m}? stops at the lower bound n. Minimizing the match and taking the lower bound is what is usually called lazy mode.
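The greedy and lazy behavior described above can be checked with a short sketch. The variable names and the bounds {2,4} are chosen here purely for illustration; they are not from the original examples.

```javascript
const text = 'chapter12345';

// Greedy quantifiers take as many digits as they can.
const greedyPlus = text.match(/chapter[1-9]+/)[0];       // 'chapter12345'
const optional = text.match(/chapter[1-9]?/)[0];         // 'chapter1' (at most one digit)
const bareStar = 'chapter'.match(/chapter[1-9]*/)[0];    // 'chapter' (zero digits is fine)
const greedyRange = text.match(/chapter[1-9]{2,4}/)[0];  // 'chapter1234' (upper bound)

// Adding ? after the quantifier makes it lazy: it takes as few as it can.
const lazyPlus = text.match(/chapter[1-9]+?/)[0];        // 'chapter1'
const lazyRange = text.match(/chapter[1-9]{2,4}?/)[0];   // 'chapter12' (lower bound)

console.log(greedyPlus, optional, bareStar, greedyRange, lazyPlus, lazyRange);
```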

What about ?:, ?= and ?! ? I used to see them rarely and paid no attention, but recently I keep running into them all over the place, and they scared me. Earlier, while reading gulp, I came across this regex: -[0-9a-f]-? (it matches names like app-7ef5d9ee29.css). I dove in headfirst: what special meaning could '-?' possibly have? In the end I found out that it is just the ordinary ? quantifier applied to a hyphen, making the hyphen optional. Silly me. But I still don't understand what the source author had in mind. Maybe I have simply never come across a name like app-7ef5d9ee29-any.css. What is that '-?' doing there? It led me straight into the pit.
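A small sketch of what an optional trailing hyphen does. The pattern below is a hypothetical stand-in with a + quantifier added for illustration; it is not the exact expression from the gulp source, and stripHash is a made-up helper name.

```javascript
// Hypothetical pattern: a hyphen, one or more hex digits, then an optional hyphen.
const hashPart = /-[0-9a-f]+-?/;

// stripHash is an illustrative helper, not part of gulp.
function stripHash(fileName) {
  return fileName.replace(hashPart, '');
}

const plain = stripHash('app-7ef5d9ee29.css');        // no trailing hyphen: -? matches nothing
const suffixed = stripHash('app-7ef5d9ee29-any.css'); // trailing hyphen is swallowed as well

console.log(plain);    // 'app.css'
console.log(suffixed); // 'appany.css'
```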

Back to the topic. Let's first understand what a capturing group is. In short, it is a parenthesized pattern of the form '(pattern)'; whatever the part inside the parentheses matches is captured. Let's take a look at a definition from the rookie tutorial:

Four forms: (pattern), plus the ones that add a ?, namely (?:pattern), (?=pattern) and (?!pattern). What is the difference between capturing and non-capturing groups? It shows up when you match with the exec method: whatever a capturing group matches is stored separately in the result. Theory is too boring, so just look at an example, taken from page 106 of Professional JavaScript for Web Developers and slightly changed:

var str = 'mom and dad and baby';
var pattern = /mom( and dad( and baby))/;   // capturing form
var pat = /mom(?: and dad(?: and baby))/;   // non-capturing form
var mat = pattern.exec(str);
var match = pat.exec(str);
console.log(mat);
console.log(match);

Look at the results printed by DevTools. A bit of an eye-opener, isn't it? The overall match is the same in both cases, but with capturing groups each parenthesized unit is also saved as a separate entry in the result, while with non-capturing groups nothing is saved separately; only the complete match is stored. The familiar RegExp.$1, RegExp.$2 are in fact references to the results of the capturing groups.
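For readers without the DevTools output at hand, here is a minimal sketch of what the two result arrays contain. It also uses $1 and $2 in a replace call as one way of referring back to captured groups; the 'dad and mom' string is an illustrative assumption.

```javascript
const str = 'mom and dad and baby';

// Capturing form: each parenthesized group gets its own slot in the result.
const captured = /mom( and dad( and baby))/.exec(str);
// captured[0] is the full match, captured[1] and captured[2] are the groups.

// Non-capturing form: same overall match, but no extra slots.
const uncaptured = /mom(?: and dad(?: and baby))/.exec(str);

// $1 and $2 in a replacement string refer to the first and second captured groups.
const swapped = 'dad and mom'.replace(/(\w+) and (\w+)/, '$2 and $1');

console.log(Array.from(captured));   // ['mom and dad and baby', ' and dad and baby', ' and baby']
console.log(Array.from(uncaptured)); // ['mom and dad and baby']
console.log(swapped);                // 'mom and dad'
```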

Now that capturing and non-capturing groups are clear, what is the difference between (?:pattern) and (?=pattern)? There are two. Difference 1: with the former, the text matched inside the group is part of the match result; with the latter it is not. Difference 2: the former consumes characters (advances the index) when it matches; the latter does not. Let's look at an example:

var str = 'ababa';
var pattern = /ab(?:a)/g;
var pat = /ab(?=a)/g;
var mat = pattern.exec(str);
var match = pat.exec(str);
console.log(mat);
console.log(match);
mat = pattern.exec(str);   // global mode, second match
match = pat.exec(str);     // global mode, second match
console.log(mat);
console.log(match);

From the screenshot of the run above you can see difference 1: what (?:pattern) matches is kept in the final result, while what (?=pattern) matches is not. Difference 2 is less obvious, and we need RegexBuddy to see what actually happens during the match. Look at the screenshot of the run: if you are careful enough you will notice that after the first match, the second match starts from character index 3 for ?: but from index 2 for ?=. That is the consuming versus non-consuming behavior mentioned earlier.
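The same observation can be made without RegexBuddy by reading lastIndex after each exec call on a global regex. This is a sketch; the variable names are illustrative.

```javascript
const input = 'ababa';
const consuming = /ab(?:a)/g;  // the trailing 'a' is part of the match
const lookahead = /ab(?=a)/g;  // the trailing 'a' is only checked, not consumed

const consumingFirst = consuming.exec(input);   // matches 'aba' at index 0
const consumingNext = consuming.lastIndex;      // 3: the second search starts here
const consumingSecond = consuming.exec(input);  // null: only 'ba' is left

const lookaheadFirst = lookahead.exec(input);   // matches 'ab' at index 0
const lookaheadNext = lookahead.lastIndex;      // 2: the 'a' at index 2 is still available
const lookaheadSecond = lookahead.exec(input);  // matches 'ab' again, at index 2

console.log(consumingFirst[0], consumingNext, consumingSecond);
console.log(lookaheadFirst[0], lookaheadNext, lookaheadSecond[0], lookaheadSecond.index);
```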

Well, the last question: positive lookahead (?=pattern) versus negative lookahead (?!pattern). Taken literally, the Chinese term for negative lookahead is ambiguous. "Negative" here is simply the negation of the positive lookahead: the match succeeds only when the characters that follow do not satisfy the pattern.
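A minimal sketch contrasting the two lookaheads, reusing the chapter example from earlier; the test strings are illustrative assumptions.

```javascript
const followedByDigit = /chapter(?=[1-9])/;     // positive: a digit must follow
const notFollowedByDigit = /chapter(?![1-9])/;  // negative: a digit must NOT follow

const posOnNumbered = followedByDigit.test('chapter1');    // true
const posOnBare = followedByDigit.test('chapter');         // false
const negOnNumbered = notFollowedByDigit.test('chapter1'); // false
const negOnBare = notFollowedByDigit.test('chapter');      // true

// In both cases the looked-ahead digit never becomes part of the match.
const matchedText = 'chapter1'.match(followedByDigit)[0];  // 'chapter'

console.log(posOnNumbered, posOnBare, negOnNumbered, negOnBare, matchedText);
```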
