Introducing REGEX Excel Functions - Extract, Clean, and Format Data Easily! (NEW!)

Introducing REGEX Excel Functions - Extract, Clean, and Format Data Easily! (NEW!) Microsoft just announced new REGEX functions in Excel, and they're going to make your life a lot simpler. So, what exactly is REGEX? It stands for Regular Expressions. It's a powerful way to search and manipulate text. Before you say, "This is probably for some geeky Excel users. I'm never going to understand this. I'm never going to need this," wait because you're probably going to come across cases where you will need exactly this function. So, just imagine your boss sends you.

Introducing REGEX Excel Functions - Extract, Clean, and Format Data Easily! (NEW!)

A list of data, and you need to find all the email addresses in those cells, hidden in the middle of all this text. Now, instead of going through this manually and copying and pasting email addresses for each single cell, you'll use the REGEXEXTRACT Excel function to get all the instances of email addresses in a dynamic way. So here, if I add "and also info@xelplus.com," we see the second email automatically extracted. You can use this function to extract any text between brackets. It doesn't.

Matter if the brackets are in the middle or at the end. You can use it to extract dates in the middle of text, even multiple dates from a single cell. You can extract website addresses from long sentences. You can break up text that's stuck together like we have here. So as you can see, whenever we have a pattern—and a pattern is anything that you can explain logically to someone—you can use the REGEXEXTRACT function to extract that pattern. Now,.

Sometimes you might want to replace text based on a pattern. You might download data from an app, and it comes with special characters, and you want to get rid of those. You can use the REGEXREPLACE function to replace the characters you don't want, so you end up with clean data. You can use this to get rid of leading zeros in your data. You can also format numbers, like credit card numbers, by adding a dash after every fourth digit. Anything you can explain that has a logic to.

It can be replaced or extracted using these two new Excel functions. Now let's work through these in Excel from scratch, and I'm going to show you how the functions work. You can grab the file from the link in the description of this video. If you don't have these functions yet, don't worry. It's a beta feature, and it's going to take some time until it rolls out to everyone. I've also included a REGEX cheat sheet to your downloadable file, so in case you want to know how this function works,.

You have all you need here. But let's just quickly do some examples. We're going to extract email addresses from text. In this case, I have information about some projects and the next steps, and sometimes I have one email address, sometimes I have two email addresses. I want to extract these email addresses and put them in a separate column, and doing this using standard Excel functions that we had access to until now is not going to be easy. But from now on,.

It is because we have the new REGEX functions. The function that we need here is the REGEXEXTRACT. Why? Because we want to extract a piece of information from this text. Now, this function just requires two things that are mandatory; the rest are optional. So, the first thing it requires is the text. This is where we want to extract our information from, which is this one. Next is the pattern. Now, the pattern is something you provide inside quotation marks. If I just.

Put nothing there and press Enter, I get nothing returned. Now optimally, what I'd love to be able to do is to tell it, "Extract email addresses," or just describe the pattern, like, "Match a sequence of letters that's followed by the '@' sign and extract everything until you get to the dot and everything after that until you get to a space," or just some type of logic, right? But you can't do that. What you need to do is give it a Regular Expression pattern in the way, in the code,.

That it understands. Now, Regular Expressions have been around for a while. They're part of most programming languages. So, the good news is that AI knows REGEX pretty well, so what you can do is get the help of AI. And that's what I did in this example. You can use Copilot or ChatGPT. So here, I'm in Copilot, and I told it, "Give me the REGEX pattern to extract email from text." So if I have "Great progress, please email Dave at dave@thecompany.com for the latest update,".

I want to return "dave@thecompany.com." So it's good practice to give it an example and what you want returned. Then Copilot went ahead and said, "To extract an email address from a given text, you can use the following Regular Expression." That's the expression. All I need to do is copy it, go back to Excel, and put it inside the quotation marks, press Enter, and voila, it works. Let's just drag this down. What I get here is the first email address, but I want all of.

Them. This is where we can take advantage of the optional arguments. Here we get the first match, which was by default, right? If we don't add any optional arguments, we just always get the first match. But we can switch this and get all matches. So if I put a 1 here, it should return all matches. If I drag this down, this spills. I get "support@helpme.com" and also "joho@helpme.com," but it spills downwards, right? So the problem is if I want to drag this formula down, I'm going to.

Get a #SPILL! error. I'm going to work around that. So what I'm going to do is put this function inside the TEXTJOIN function. With TEXTJOIN, I can define the delimiter. So how do I want the results to be separated? I want them to be separated with a comma and space. Then if I want to ignore empty cells or include empty cells, I'm just going to go with ignore empty cells. Okay, then for the text, that's going to be our REGEX argument. Let's close the bracket and press Enter..

Posts Related:

    I'm just going to double-click to send this down until here. Let's double-click again to send this

    All the way down. So now everything looks good, except whenever we don't have an email address, we get the #N/A error. So I'll just wrap this inside the IFERROR function. If we have an error, we want to return nothing. Close the bracket, double-click to send this down. The great thing is because it's a function, everything is dynamic, right? So if we add "and contact.

    Info@xelplus.com," our results update immediately. Now, the pattern does look crazy for anyone who isn't familiar with REGEX. You can ask Copilot or ChatGPT to step through this. You can also reference our cheat sheet to become familiar with what these letters and symbols mean. Now, once you get the hang of it and you understand the basics, it becomes easier to describe your pattern in this way. But honestly, AI works really well here, so why not take that shortcut? Next, let's.

    Take a look at how we can extract text from inside brackets. So here, for our project description, we have parts of the description inside brackets, and I want to grab that part and put it separately here. So because we're going to extract something, we need the REGEXEXTRACT function. The text is here. The pattern, quotation, quotation, because I have no idea, I'm going to go to ChatGPT this time and ask it for help. Over here in ChatGPT, I gave it the prompt to give me the REGEX pattern to.

    Extract text from inside brackets, and I gave it one example and what I want returned. The pattern it gave me is this. Optimally, I would just ask it for the entire Excel function, but ChatGPT, until now, has no idea that Excel has gotten these functions, so we're just going to go with the pattern for now. I'm going to copy this, and I'm going to put the pattern in here. Press Enter, and I get the content that I want. Now, the cool thing is instead of just referencing B2, I.

    Can also make everything spill by referencing the entire range. Now, you might be wondering, "But I don't want these brackets. I just want the text inside the brackets." Here, we can take advantage of the other optional arguments. Here, we can return the first match, which is the default, or all matches, which we saw before. Now, we could select 2 instead, which captures groups of the first match. Now, when I put a 2 and press Enter, I get the content that's inside the brackets. Now,.

    If you're confused about what this argument really does, the capture groups of the first match, let's take a look at our next example. We want to extract dates from the middle of text. So, for example, here we have different project milestones, and we have some dates mentioned. I want to grab these dates and split them into separate year, month, and day columns. Notice sometimes the months have one digit, sometimes they have two digits. The same for days. The.

    Pattern should account for that. Now, because we want to extract something based on our logic, we're going to use the REGEXEXTRACT. Text is this. The pattern that I found is this. Close the bracket, press Enter, and we get the dates back. What this basically means is look for four digits, then either one or two digits for the months, and then one or two digits for the days. Now remember, I want them separately though. This is where I can use the optional argument, Capture groups.

    Of first match. Each of these is a separate group. So when I put a 2 there and press Enter, I get each group separately. The problem is they spill vertically. I want them to spill horizontally. So what I'm going to do is put this inside the TOROW function, press Enter, and drag this down. And I have my dates split into a separate year, month, and a day column. Now, what if I have multiple dates and I want to extract all of them? For example, here I have.

    Three separate dates. I want each date separately in its own cell. So what I'm going to do is use the exact same function that we had before. So my text is this. My pattern is exactly what we had before, except for return mode, I'm going to go with all matches. So I'm going to put a 1, close the bracket, press Enter. Now as I start to pull this down, notice we can see again that everything is spilling vertically, right? So this is going to give me the #SPILL! error. What I want to.

    Do is spill these horizontally, so I'm going to use the TOROW function. Close the bracket,

    Press Enter. Now I can easily drag this down. If you wanted all the dates in the same cell, you can use the TEXTJOIN function. Now, how about extracting web addresses from text? So here I'm dealing with an Excel table. We can also see how these functions work in tables. I have a feedback column. The feedback includes web addresses. Sometimes these addresses start with "https,".

    Sometimes they start with "www.," and sometimes they're just "itcrowd.com." I want to write a function here that extracts these. So we're going to go with REGEXEXTRACT. The text is here. The pattern in quotations is this. Now, if I just close the bracket and press Enter, I get the first web address back. In general, I think I just have one, except this one where I have "ERORR404.com" and "google.com." But notice I'm not even getting the "ERORR404." There, any uppercase.

    Letters are missing. Why? Because my pattern doesn't include uppercase. So I could go and update this to include uppercase, or I can take advantage of that last argument in the function, which accounts for whether I want everything to be case sensitive or case insensitive. So by default, everything is case sensitive. I'm going to go with case insensitive, and now I get everything back. But again, notice I only get the first web address back. If I want to get more than one and return.

    All matches, I have to put a 1 in this argument. But the moment you do that, you're going to end up with a #SPILL! error because everything is spilling vertically. What you want to do is put this inside a function that can bring everything together. So I'm going to use the TEXTJOIN function, split everything based on the comma delimiter, and close the bracket and press Enter. And now we get everything back. Now you might find that you need to update your patterns because something.

    Might not work. So, for example, here, if I have "/courses," the entire web address isn't extracted. I will need to update that pattern. Okay, so let's just reverse that, and I've done it over here. So now we get to extract the entire web address with this pattern. So you're going to find yourself adding a pattern, and it works on the data set, but then when new data come, maybe it doesn't work on that. Figure out how you can update it so it works for all cases. And again,.

    You can use AI for this. Split text that's stuck together into multiple words. That's another thing you can do with the REGEXEXTRACT function. So here I have employee names that are stuck together, but notice there's a pattern. The first letter of the last name is always a capital letter. I'm going to use that pattern as my REGEX pattern and use that to create proper names. So even if I add another name to this, the name of my dog, it ends up as a separate word..

    Now, how about using REGEX to replace leading zeros? So here we have SAP product code numbers, and they have leading zeros. Sometimes it's three, sometimes two, sometimes one, and we just want to see the number. We want to get rid of these. So this time, we can use the REGEXREPLACE function. Here it is. The text is this. The pattern that we want is this in this case. Now we have one additional mandatory argument. We need to tell the function what we want to replace it with. Now,.

    I just want to get rid of the zeros, so I want to replace it with nothing. I'm going to put quotation, quotation, close the bracket, and press Enter. And these are my values. Now, of course, you can make this spill if you want. I could select the entire range and make this spill. These are actually currently seen as text and not numbers, so if I want them to be seen as numbers, I can put these inside the VALUE function. And now I have my product codes without the leading.

    Zeros. Cleaning up messy text is another great use case for the REGEXREPLACE function. So here, for my project descriptions, sometimes I have emojis, sometimes I have special characters that I want to remove. I want to get the description without the junk. I'm going to use the REGEXREPLACE function. The text is this. The pattern I came up with is this. I'm going to put it inside quotations. The replacement is nothing. If I want everything to spill, I can actually reference the.

    Entire range. So now when I press Enter, I get everything returned clean. For performance, that star is gone. For report, these special characters are removed. Now, if a cell doesn't have any special character, I get a hash. So I could put this inside the IFERROR function. So if I happen to have an error, I want to get the project description returned. Close the bracket and Enter. Now, if at any point in time you decide that you want to ignore something, so for example,.

    I actually want to keep this ampersand, all I have to do is add it to my pattern. So right here, inside the square brackets, I'm going to put in the ampersand. When I press Enter, the ampersand stays. Insert a character at specific positions. This is great for formatting credit card numbers or phone numbers. So here I have a list of credit card numbers. It's a mess. Sometimes the number is seen as text, sometimes it's seen as numbers, but because it's so big,.

    Excel shows it in this format. I want to format it so I get a dash after every fourth character but not a dash at the very end. So what I'm going to do is use the REGEXREPLACE function. Text is this. The pattern is this right here. I'm going to put it in quotations. Now for replacement, we're not going to go with nothing because if I go with nothing, I just get the last four characters. Instead, what I want to do is reference the group that's being captured, which is every.

    Fourth digit. And to do that, I can use the symbol $1. So this references the first capturing group, which in this case is my only capturing group, and then I want a dash. And then when I press Enter, I get these properly formatted. To find out more about these symbols and how they work, take a look at the REGEX Cheat Sheet that we've included in our downloadable file. The link to it is in the description of the video. Okay, so this was my review of the new REGEX functions..

    DISCLAIMER: In this description contains affiliate links, which means that if you click on one of the product links, I'll receive a small commission. This helps support the channel and allows us to continuetomake videos like this. All Content Responsibility lies with the Channel Producer. For Download, see The Author's channel. The content of this Post was transcribed from the Channel: https://www.youtube.com/watch?v=YFnXV2be9eg
Previous Post Next Post