PHP Regular Expressions Tutorial

In this video tutorial we learn the basics of PHP regular expressions. Even these regex basics will let you match most online content. Make sure to watch the next video, which will be a practical example of using regular expressions to parse movie information from the imdb.com website.

Hey, what's going on, guys? It's Clever Techie, and in this video, we're going to learn PHP regular expressions. Okay, so this is a cheat sheet that I've created here. I'm going to refer to it throughout this video, and if you guys want to download a full-resolution version of this image, I have included a link in the description of this video, so go ahead and do so.

Okay, so first of all, let's go to a website called regex101.com. This is an awesome website that I've just discovered, where you can practice regular expressions. And regular expressions, you can think of them as basically wild cards on steroids. They're just patterns for matching text. It's really useful if you're a web developer and you wanna match some content on websites. You can easily do so with regular expressions, as well as matching any other content at all. As for the test string, I've included it in the description of this video as well. It's just a bunch of text that I found that we can use to practice regular expressions on.

Overview of Regular Expression Characters

We're gonna go over some of these quantifiers, character classes, groups and ranges, meta characters, as well as flags. So we're gonna go over all this stuff and I'm gonna explain what most of these things mean. So let's go ahead and get started here. Okay, so the . matches any character (except a line break, by default). So if we simply put . here, it's gonna match all the characters on here, one at a time. And if I put * after it, it's gonna match the whole thing together. So basically what the * means is zero or more. So, in this case, if we put any character followed by zero or more, .*, this is something that matches basically all the content.
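Here is a minimal PHP sketch of the . and .* behavior described above. The sample string is made up for illustration (the video's own test string is not reproduced here), and the calls used are PHP's standard preg_match() and preg_match_all().

```php
<?php
// Hypothetical sample text, standing in for the video's test string.
$text = "Hello World 123";

// "." matches one character at a time, so every character is its own match.
$dotCount = preg_match_all('/./', $text, $dotMatches);

// ".*" is "any character, zero or more times", so it takes the whole line at once.
preg_match('/.*/', $text, $starMatch);

echo $dotCount, "\n";      // 15 separate one-character matches
echo $starMatch[0], "\n";  // Hello World 123
```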

Wild Cards

So if you have a website or any other content, placing a .* will match anything. And you've probably seen something like this, which is a wild card: a * followed by .txt. For example, if I open up Notepad here and I go to File, Open, you're gonna see down here in the text documents filter the same idea, which is a *, followed by .txt, which means it's gonna match all the files with the extension .txt. Strictly speaking that one is a file wild card rather than a regular expression, but the * plays the same role. And there's another one down here, *.*, which basically says match any extension at all, and the * is exactly what that means, match anything. All right, so that's how you match anything.
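As a rough sketch of how the Notepad-style wild card translates into a regular expression in PHP: the file names below are hypothetical, and preg_grep() is just one convenient way to filter an array by a pattern.

```php
<?php
// Hypothetical file names, for illustration only.
$files = ['notes.txt', 'photo.png', 'todo.txt', 'archive.txt.bak'];

// Regex counterpart of the "*.txt" wild card:
// anything (.*), then a literal dot (\.), then "txt" at the very end ($).
$textFiles = array_values(preg_grep('/^.*\.txt$/', $files));

print_r($textFiles);  // notes.txt and todo.txt
```

Note that the dot has to be escaped here, because an unescaped . in a regular expression means "any character".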

Lazy vs Greedy Quantifiers

Okay, so + just means one or more, so there has to be something to match, whereas a * can match nothing or more. Then the question mark ? is where things get a little interesting. Okay, so let me explain the difference between greedy and lazy. So if I wanted to match these strong tags, for example, if you have content on a website, and for some reason, you wanna match all the HTML tags, here is what you might do. You might wanna do this, <.*>, right? So this is called a greedy quantifier, because what it will do here is start with the opening < here, and what you're expecting is that it stops at the first > so you match just this strong tag here. But, instead, this thing is greedy, so it's gonna go ahead and match the whole thing until it sees the last > here. And that's why it's called greedy, because it'll consume all the content in between this < and this > here. And if you wanted to make it lazy, you simply put a ? after the *, so <.*?>. So now you can see that it's matching the HTML tags like we wanted it to. And it's also doing that here. So that's the difference between greedy and lazy matching, or greedy and lazy quantifiers. And that's what the ? does. I usually use .* followed by ?, to make it non-greedy. But there are cases where you do wanna use the greedy one as well.
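The greedy versus lazy difference can be sketched in PHP like this. The HTML snippet is a made-up example, not the video's test string.

```php
<?php
// Hypothetical HTML snippet with two pairs of strong tags.
$html = '<strong>Hello</strong> and <strong>World</strong>';

// Greedy: ".*" runs to the LAST ">" on the line, so there is one big match.
preg_match_all('/<.*>/', $html, $greedy);

// Lazy: ".*?" stops at the FIRST ">" it can, so each tag is its own match.
preg_match_all('/<.*?>/', $html, $lazy);

print_r($greedy[0]);  // one element: the entire string
print_r($lazy[0]);    // <strong>, </strong>, <strong>, </strong>
```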

Matching Digits

Okay, let's go ahead and continue here. Putting a number inside the curly brackets { } means match it exactly this many times. And if you put a number followed by a comma, you can say X or more times. And then you can also have between X and Y, which is {X,Y}. So you can say {1,5}, and that will match whatever comes before it between one and five times. So just to give you an example here. Let's go back here and, okay, so say I wanted to match a three-digit number. So here I put a digit. The \d matches a digit. If we go back here under character classes, you'll see that \d is a digit character. And I follow that with the {3}, which matches the digit, or whatever comes before it, exactly this many times. So this'll match any three digits in a row. And this is exactly what it's doing here, it's matching all the three-digit numbers.
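A small PHP sketch of \d{3}, using a made-up string. One thing worth seeing is that \d{3} on its own will also pick three digits out of a longer number. The second pattern adds \b word boundaries to avoid that; \b is not covered in this video, so treat it as an extra.

```php
<?php
// Hypothetical sample text.
$text = "Call 555 or 1234, room 42";

// Exactly three digits in a row, anywhere in the text.
preg_match_all('/\d{3}/', $text, $anyThree);

// Extra (not from the video): \b word boundaries keep only standalone three-digit numbers.
preg_match_all('/\b\d{3}\b/', $text, $standalone);

print_r($anyThree[0]);    // 555 and 123 (the first three digits of 1234)
print_r($standalone[0]);  // 555 only
```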

Now if you notice down here, or on the right, you have Regex options, and the Global one is turned on. So the Global one will basically find all the matches in our string. If I remove it, it's only gonna match the first one, but we do wanna match all of them, so we're gonna turn the global one back on. And if you see here under Flags, I've also included the g, which is a modifier, or you can call it a flag, which will return all the matches that are found. It'll keep going until it matches all of them. All right, so this is how we match digits. And if we wanted a number that's between, for example, 1 and 3 digits long, you can also do that with {1,3}, and it'll match all the numbers with 1 to 3 digits. Or we can say 1 or more, {1,}, and that will match all the numbers, because they all have at least one digit. And if we did something like this, 5 or more, {5,}, it'll match all the numbers with 5 or more digits. And this counts as one number here because it's not separated by a space or anything like that. Okay, let's move on here.
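In PHP itself there is no g modifier on the pattern. The usual equivalent of the Global option is to call preg_match_all() instead of preg_match(). The sketch below shows that, along with the {X,} and {X,Y} quantifiers, on a made-up string.

```php
<?php
// Hypothetical sample text.
$text = "7 apples, 42 pears, 365 days, 123456 steps";

// preg_match() stops after the first match (like Global turned off).
preg_match('/\d+/', $text, $first);

// preg_match_all() keeps going until every match is found (like the g flag).
preg_match_all('/\d+/', $text, $all);

// {5,} means five or more digits in a row.
preg_match_all('/\d{5,}/', $text, $fiveOrMore);

// {1,3} means one to three digits, so a long number is split into chunks.
preg_match_all('/\d{1,3}/', $text, $oneToThree);

echo $first[0], "\n";        // 7
print_r($all[0]);            // 7, 42, 365, 123456
print_r($fiveOrMore[0]);     // 123456
print_r($oneToThree[0]);     // 7, 42, 365, 123, 456
```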

Matching White Space Characters

For white space characters, we just use \s, and that'll match all the white space characters here. And it's matching them right now. Then all the non-white space characters, which is a capital S, \S, is all the characters except the white space. And then all the digit characters, which we've already done, that'll be \d, and all the non-digit characters are matched with a capital D, \D. And then there's word and non-word. \w is the word character, and it will also match numbers (and the underscore), but it will not match other special characters or spaces. So keep that in mind. And then the reverse of that is the non-word character, \W, which is all the special characters and spaces.
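Here is a quick PHP sketch of these shorthand classes side by side, on a made-up string that includes an underscore so you can see which side of \w and \W it lands on.

```php
<?php
// Hypothetical sample text with spaces, digits, punctuation and an underscore.
$text = "Room 42, top_floor!";

preg_match_all('/\w+/', $text, $words);     // runs of word characters
preg_match_all('/\W+/', $text, $nonWords);  // runs of everything else
preg_match_all('/\d/', $text, $digits);     // single digits

// When only a count is needed, the third argument can be left out.
$spaceCount    = preg_match_all('/\s/', $text);
$nonSpaceCount = preg_match_all('/\S/', $text);

print_r($words[0]);     // Room, 42, top_floor (the underscore counts as a word character)
print_r($nonWords[0]);  // " ", ", " and "!"
print_r($digits[0]);    // 4, 2
echo $spaceCount, " ", $nonSpaceCount, "\n";  // 2 17
```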

Groups, Ranges and the "Or Alternator"

Okay, so if we had something like this here, a or b, a|b, the | just means or. So as you can see it's matching b and a, or a and b, a or b, and it's matching all of them here. Okay, that's cool, that's exactly what we wanted. Okay, if we do a, b, or c, a|b|c, it'll match all the a's, b's, and c's. So if you wanted to match, for example, Hello or World, you can do so with Hello|World, and it'll match Hello and World. You can match whole words as well as just characters, okay? That makes sense. The next one is matching a single character that is a or b or c.
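A minimal PHP sketch of the | alternation, using made-up strings. Note that matching is case-sensitive by default, so the lowercase "hello" below is not picked up.

```php
<?php
// Whole words: match either Hello or World.
$text = "Hello big World, hello again";
preg_match_all('/Hello|World/', $text, $wordMatches);

// Single characters: match a, b, or c.
preg_match_all('/a|b|c/', "abacus cab", $letterMatches);

print_r($wordMatches[0]);                    // Hello, World
echo implode('', $letterMatches[0]), "\n";   // abaccab
```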

Character Classes, Letter Ranges and Number Ranges

So what's inside the [ ] is called a character class. So if we had this, and we wanted to match some of the numbers here, this is how we list them. So this'll match any of the characters that are inside the character class, inside the [ ] here. Next, we can have a negated class. So if we put a little caret ^ here, right inside the square brackets, it'll match everything besides those characters that are inside the [ ], and it'll come in really useful as we start parsing some website content later on. Okay, so the next one is a range. And the range will do exactly what it says: [a-z] will match the whole range between a and z, so that's why it's matching all the lowercase letters here. We can also capitalize that, [A-Z], and it'll match all the capital letters, which you have here. And that's exactly what it's doing. It's matching the capital letters that we have throughout this test string. Okay, so next. Okay, so for the digits, we can put 0 through 9, [0-9], close the character class, and it'll match all the digits. We can test it out with [0-3] and see if it matches 0 through 3, and that's exactly what it's doing here.
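The character classes and ranges above might look like this in PHP. The sample string is made up for illustration.

```php
<?php
// Hypothetical sample text.
$text = "Order 66 shipped to Area 51 on Day 3";

preg_match_all('/[A-Z]/', $text, $capitals);     // capital letters only
preg_match_all('/[0-3]/', $text, $lowDigits);    // digits 0 through 3 only
preg_match_all('/[0-9]+/', $text, $numbers);     // runs of any digits
preg_match_all('/[^0-9 ]+/', $text, $notDigits); // negated: anything but digits and spaces

print_r($capitals[0]);   // O, A, D
print_r($lowDigits[0]);  // 1, 3
print_r($numbers[0]);    // 66, 51, 3
print_r($notDigits[0]);  // Order, shipped, to, Area, on, Day
```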

Escaping Meta Characters

Okay, so next are the meta characters, which must be escaped. So if you're gonna wanna match any of these characters literally, they'll have to be escaped with a \. So, for example, if I wanted to match a $, I would have to escape that $ with a \, so \$. And that's exactly what it's doing here, because otherwise it's a special character. For example, the $ on its own indicates the end of the string.
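Here is a short PHP sketch of escaping, on a made-up string. It also uses preg_quote(), a standard PHP function that is not mentioned in the video, which adds the backslashes for you when you want to match a piece of text literally.

```php
<?php
// Hypothetical sample text. Single quotes keep PHP from treating $ as a variable.
$text = 'Total: $25.00 (tax incl.)';

// \$ is a literal dollar sign and \. is a literal dot.
preg_match('/\$\d+\.\d+/', $text, $price);

// preg_quote() escapes the meta characters in a literal string for us.
// The second argument tells it which delimiter to escape as well.
$needle = preg_quote('(tax incl.)', '/');
preg_match('/' . $needle . '/', $text, $literal);

echo $price[0], "\n";    // $25.00
echo $needle, "\n";      // \(tax incl\.\)
echo $literal[0], "\n";  // (tax incl.)
```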

Capturing Matches in PHP Arrays

If I roll the mouse cursor over the word Hello, it's saying group 1. So that's how we capture it in group 1, by wrapping it in parentheses, and it's showing it here. If we wanted to capture another group, we would wrap that in parentheses as well. Now World will be in capturing group 2, which is exactly what it's saying here, and so on. So the more parentheses you use, the more capturing groups you can have. And this'll make more sense as we actually start programming and designing the script in PHP. The groups that are matched will actually be stored inside an array. And that'll make more sense later on.
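As a minimal sketch of how capturing groups end up in a PHP array: preg_match() fills its third argument with the full match at index 0, followed by one entry per group. The string and variable names here are just for illustration.

```php
<?php
$text = "Hello World";

// Two pairs of parentheses, so two capturing groups.
preg_match('/(Hello) (World)/', $text, $matches);

echo $matches[0], "\n";  // Hello World (the full match)
echo $matches[1], "\n";  // Hello (group 1)
echo $matches[2], "\n";  // World (group 2)
```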

Conclusion

So you guys can play around with this, and make sure to download the cheat sheet. Just go ahead and play around with this text, and see if you can do some unique matches. And just start getting used to using these regular expressions to match whatever you want. And make sure to check out the next video where we're gonna be doing something practical. We're actually gonna be matching, let me show you here, if I copy this URL here. We're gonna go to IMDB.com, which is a movie review site. And we're gonna match all these movies along with their names, reviews, descriptions, and all the stars and directors. We're gonna match all of that using regular expressions. And then later on, we're also gonna save it all in a database and display it on our own website. Okay, I hope you guys found this video useful. If you did, please like, share and subscribe, and I'll see you next time. Clever Techie, out.