Regex in Java | Regular Expression Java Tutorial

Since you are smart enough, you might have already guessed what Regex in Java is short for: Regular Expressions in Java. The abbreviation matches the name of the package that holds the relevant classes, java.util.regex. Java Regex is an API that facilitates pattern matching for strings. You can also manipulate a string with the help of regex in Java. So, to answer the big question:

What is a Regular Expression

A regular expression is a sequence of characters that describes a search pattern, written in a special syntax, to match against strings. The IT world often uses regex to enforce constraints on passwords and to validate email addresses.

Using regex in Java you could search, edit and manipulate any text or data.

That being said, regular expressions aren’t Java’s prerogative alone. They are widely supported across different languages. But since we are dealing with Java here, we are just going to see how they play out in Java.

Classes of Regex in Java

Inside the java.util.regex there are three important classes namely:

  1. Pattern class
  2. Matcher class
  3. PatternSyntaxException class

Pattern Class

There are no public constructors here. An instance of this class is a compiled representation of regular expressions.

In order to create a pattern, you must first call one of its static methods called compile() that will return a Pattern object. Like this:

Pattern p = Pattern.compile(regex);

Notice that the Pattern class offers only two compile() methods. One takes just the regex, which is the one we used above. The other takes an int of flags (for example Pattern.CASE_INSENSITIVE) as the second argument.
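To illustrate the two-argument form, here is a minimal sketch that compiles the same regex with and without the Pattern.CASE_INSENSITIVE flag. The class name FlagDemo, the helper matchesWithFlags and the sample strings are made up for this example.

```java
import java.util.regex.Pattern;

class FlagDemo {
    // Returns true when the whole input matches the regex under the given flags.
    static boolean matchesWithFlags(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // No flags (0): matching is case-sensitive, so this prints false.
        System.out.println(matchesWithFlags("hello", 0, "HELLO"));
        // CASE_INSENSITIVE: letter case is ignored, so this prints true.
        System.out.println(matchesWithFlags("hello", Pattern.CASE_INSENSITIVE, "HELLO"));
    }
}
```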

Here are some other important methods to remember:

  • static boolean matches(String regex, CharSequence input)
  • String[] split (CharSequence input)
  • String pattern()
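A quick sketch of these three methods in action may help. The class name PatternMethodsDemo, the helper splitOn and the sample inputs are assumptions chosen for illustration only.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternMethodsDemo {
    // Splits the input around every match of the regex.
    static String[] splitOn(String regex, String input) {
        return Pattern.compile(regex).split(input);
    }

    public static void main(String[] args) {
        // Static shortcut: compiles the regex and matches the whole input in one call.
        System.out.println(Pattern.matches("a.c", "abc"));          // true

        // split() returns the pieces that lie between the matches.
        System.out.println(Arrays.toString(splitOn(",", "a,b,c"))); // [a, b, c]

        // pattern() returns the regex the Pattern was compiled from.
        System.out.println(Pattern.compile("a.c").pattern());       // a.c
    }
}
```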

Matcher Class

Like Pattern Class, there are no public constructors here either.

The Matcher class is the engine that interprets the compiled pattern and tries to match it against the input string.

In order to get a Matcher object, you need to call matcher() method on a pattern object. If we take the above code into consideration, our Matcher declaration would become something like this:

Matcher m = p.matcher(input);

Replace input with the string you wish to match the pattern against. input must be a CharSequence (a String qualifies).

Here are some other methods to remember:

  • boolean matches()
  • boolean find()
  • int groupCount()
  • String group()
  • int start()
  • int end()

PatternSyntaxException Class

This is an unchecked exception that is thrown when the pattern you entered has a syntax error. It has built-in methods that help you locate the problem. These are namely:

  • String getDescription()
  • int getIndex()
  • String getPattern()
  • String getMessage()
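As a rough sketch of how these methods could be used, the snippet below tries to compile a deliberately broken regex and reports what went wrong. The class name SyntaxErrorDemo and the helper describeProblem are hypothetical, and the exact wording of the description depends on the Java version.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorDemo {
    // Returns null when the regex compiles, otherwise the description of the problem.
    static String describeProblem(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            System.out.println("Pattern: " + e.getPattern());
            System.out.println("Index:   " + e.getIndex());
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // "(abc" opens a group that is never closed, so compile() throws.
        System.out.println(describeProblem("(abc"));
        // "(abc)" is well formed, so this prints null.
        System.out.println(describeProblem("(abc)"));
    }
}
```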

Regex in Java Example

Let’s see our first example to test a regex in Java.

Here we are going to make use of the ‘.’ expression, which matches any single character. Look at it as a blank. We will use the matches() method, which tries to match the entire input against the regex.

Remember, before proceeding you need to import the Pattern and Matcher classes from the java.util.regex package. Or you could simply import java.util.regex.*; to avoid individual imports.

Pattern p = Pattern.compile(".s");

Matcher m = p.matcher("ts");

boolean b = m.matches();

System.out.println(b);

We are simply following what we have learnt so far. Our regular expression “.s” asks whether the input is exactly two characters long with s as the second one (‘.’ stands in for the first). The second line creates the matcher m for the input “ts”, and the third line runs the match.

If we run the above program we will get:

true

Yes, our second character out of “ts” is indeed “s”.

Writing it in a Single Line

What if you wish to see whether the first character is s? You guessed right, you simply need to move the ‘.’ after the s. We can also skip the reference variables and make the call in just one line like this:

System.out.println(Pattern.compile("s.").matcher("sd").matches());

It asks whether, out of two characters, the first one is ‘s’. It is, so we are going to get:

true

NOTE: matches() fails if the input has more characters than the pattern accounts for. Like:

System.out.println(Pattern.compile("s.").matcher("svdncr").matches());

The above will print:

false

The above is false since there are 6 characters in our input string and our regex describes exactly two.
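If you only care that the input starts with ‘s’, whatever its length, there are at least two ways to say so. The sketch below is one possibility: it uses the quantifier ".*" (quantifiers are covered further down) and Matcher's lookingAt() method, which matches from the start of the input without requiring the whole input to match. The class and method names are made up for this example.

```java
import java.util.regex.Pattern;

class PrefixDemo {
    // True when the whole input is an 's' followed by any number of characters.
    static boolean wholeInputStartsWithS(String input) {
        // ".*" means "any character, zero or more times".
        return Pattern.compile("s.*").matcher(input).matches();
    }

    // True when "s." matches at the start of the input; whatever follows is ignored.
    static boolean prefixMatches(String input) {
        return Pattern.compile("s.").matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(wholeInputStartsWithS("svdncr")); // true
        System.out.println(prefixMatches("svdncr"));         // true
        System.out.println(wholeInputStartsWithS("tsd"));    // false
    }
}
```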

Using Pattern.matches Method

What if we wanted to check whether our 5th character is ‘s’? You could simply use that many ‘.’ as blanks. We could also use Pattern’s static matches() method, which takes the regex and a CharSequence as input, to avoid one extra step. It combines compile() and matcher() in one call.

System.out.println(Pattern.matches("....s", "tgcds"));

The answer to the above would be:

true

once again, since the fifth character indeed is s.

Capturing Groups for Regex in Java

Since we will be dealing with searching, comparing and manipulating not just one character but groups, we first need to learn how to represent them in Java.

A capturing group is simply a way to treat multiple characters as one single unit. You just need to put the characters in parentheses. For instance:

(abc) will treat “abc” as a single group with individual letters ‘a’, ‘b’ and ‘c’.

In the case of multiple groups, count the opening parentheses from left to right. Their total is the number of groups, and the position of each opening parenthesis gives the number of its group.

For example:

((A(B))(CD)(E))

Here there are 5 opening parentheses meaning 5 different groups, numbered by the position of their opening parenthesis:

  1. ((A(B))(CD)(E))
  2. (A(B))
  3. (B)
  4. (CD)
  5. (E)

If you are having trouble finding out the number of groups you can call the groupCount() method of the Matcher class to check. It returns an int.

A simple call like this would suffice:

System.out.println(Pattern.compile("((A(B))(CD)(E))").matcher("").groupCount());

It gives you the same result:

5

There is also one special group known as group 0, which stands for the whole expression. It is not included in groupCount().
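To check the numbering for yourself, you could print every group of that expression against a matching input. This is only a sketch: the class name GroupNumberingDemo, the helper groups and the input "ABCDE" are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Returns the text captured by groups 0..groupCount() for a full match,
    // or an empty list when the input does not match.
    static List<String> groups(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.matches()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> found = groups("((A(B))(CD)(E))", "ABCDE");
        for (int i = 0; i < found.size(); i++) {
            System.out.println("group " + i + ": " + found.get(i));
        }
        // Groups 0 to 5 come out as: ABCDE, ABCDE, AB, B, CD, E
    }
}
```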

How to Look for Groups

Up until now, we have been counting groups using groupCount(). What if we could separate them out as well in our input string?

Can we do that? Yes. We can.

Well, that’s not difficult actually. You can call Matcher’s find() method in a while loop and print the results for as long as find() keeps matching something.

Here is how to do that:

Matcher m = Pattern.compile("((A)(B(C)))").matcher("ABC"); 

while(m.find()) { 

      System.out.println(m.group(0));
      System.out.println(m.group(1)); 
      System.out.println(m.group(2)); 
      System.out.println(m.group(3)); 
      System.out.println(m.group(4)); 

}

I am printing group(0) too to show you that it holds the entire match.

You will get the following result for the above:

ABC
ABC
A
BC
C

As you can see there are 4 groups here (plus group 0), and they are printed in order.

Regex Character Classes in Java

There are some character classes that you should remember if you want to match the right pattern.

Remember ^ is used for negation, && denotes intersection, and ‘-‘ denotes a range.

  • [abc]                –        a, b or c
  • [^abc]              –        any character except a, b or c
  • [a-zA-Z]           –        all characters from a to z (Uppercase too)
  • [a-d[m-p]]      –        characters from a to d or from m to p (union)
  • [a-z&&[def]]  –        d, e or f (the part of a to z that is also in [def])
  • [a-z&&[^hn]] –        all characters from a to z except h and n
  • [a-z&&[^l-n]] –        all characters from a to z except l to n
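One way to get a feel for these classes is to test single characters against them. The following is a small sketch; the class name CharClassDemo and the helper matchesClass are hypothetical.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True when the single character c is accepted by the given character class.
    static boolean matchesClass(String charClass, char c) {
        return Pattern.matches(charClass, String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(matchesClass("[^abc]", 'd'));        // true: d is not a, b or c
        System.out.println(matchesClass("[a-d[m-p]]", 'n'));    // true: n lies in m-p
        System.out.println(matchesClass("[a-z&&[def]]", 'g'));  // false: g is not d, e or f
        System.out.println(matchesClass("[a-z&&[^l-n]]", 'm')); // false: m is excluded
    }
}
```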

I will try to put some in the form of example to dumb things down further:

Matcher m = Pattern.compile("([abc])").matcher("Hello35634564wisconsin55634");

while(m.find()) {

            System.out.print(m.group(1));

}

The above prints every a, b or c that it finds in the String.

The result to above would be:

c

If you make use of [a-d[j-o]] this is what you will get:

Matcher m = Pattern.compile("([a-d[j-o]])").matcher("Hello35634564wisconsin55634"); 

while(m.find()) {

            System.out.print(m.group(1));

}

The result would be

lloconn

You see all characters from a to d, meaning ‘c’ in our string, and from j to o, meaning ‘l’, ‘l’, ‘o’, ‘o’, ‘n’ and ‘n’, are found and returned in the order they occur.

Here I am trying that intersection bit to separate out just ‘l’, ‘o’ and ‘n’.

Matcher m = Pattern.compile("([a-z&&[lon]])").matcher("Hello3563456wisconsin55634"); 

while(m.find()) { 

System.out.print(m.group(1)); 

}

If you run the above you will get:

lloonn

Regular Expression Metacharacters Syntax in Java

There are some predefined shorthands for the lengthy character classes above. These come in handy when you wish to save a lot of time typing complex regular expressions:

  •   .       –            Any character
  • \d      –            Any digit                   Also [0-9]
  • \D      –            Any non-digit         Also [^0-9]
  • \s       –            Any white-space character           Also [ \t\n\x0B\f\r]
  • \S      –            Any non-white-space character     Also [^\s]
  • \w     –            Any word character                     Also [a-zA-Z_0-9]
  • \W     –            Any non-word character           Also [^\w]
  • \b      –            A word boundary
  • \B      –            A non-word boundary
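The word boundary \b is probably the least obvious entry in this list, so here is a small sketch of what it changes. The class name WordBoundaryDemo, the helper countMatches and the sample text are made up for illustration. Note that each backslash is doubled inside the Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Counts how many times the regex is found in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "cat concat category";
        // Plain "cat" is found inside all three words.
        System.out.println(countMatches("cat", text));       // 3
        // With \b on both sides only the standalone word counts.
        System.out.println(countMatches("\\bcat\\b", text)); // 1
    }
}
```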

We will combine these with what we have learned so far and put them in groups so that the group() method can print things out:

Matcher m = Pattern.compile("(\\D)").matcher("Hello35634564wisconsin55634");

while(m.find()) {

            System.out.print(m.group(1));

}

If you try to run the above this is what you will get:

Hellowisconsin

Notice we are forced to write \\D and not \D because the backslash is itself the escape character in Java string literals. Writing \\ in the source code produces the single \ that the regex engine gets to see.

For this input, [a-zA-Z] would give the same result. Go ahead and put that in the first line and check it out for yourself.

What if we wanted to grab just the digits from the string? You can use either [0-9] or \\d to do that. Let’s see:

Matcher m = Pattern.compile("(\\d)").matcher("Hello35634564wisconsin55634"); 

while(m.find()) { 

           System.out.print(m.group(1));

}

Run the above and you will get this:

3563456455634

Works fine.

Regular Expression Quantifiers

Next come the quantifiers of regex in Java, which specify how many times a particular element or character must occur for a match.

  • X*              –              X occurs 0 or more times
  • X+              –             X occurs 1 or more times
  • X?              –             X occurs once or not at all
  • X{n}          –             X occurs exactly n times
  • X{n,}         –             X occurs n or more times
  • X{n,m}      –             X occurs at least n times but not more than m times
  • X*?             –            Reluctant quantifier. Matches as few repetitions as possible (a trailing ? makes any of the quantifiers above reluctant)

Let’s see examples right away:

Matcher m = Pattern.compile("([l*])").matcher("Hello3563456wisconsin55634"); 

while(m.find()) { 

           System.out.print(m.group(1));
           System.out.println(m.start());
}

If you try to run the above program you will get:

l2
l3

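Because ‘l’ occurs twice, at indices 2 and 3. Note that inside square brackets * loses its quantifier meaning, so [l*] is a character class that matches a single ‘l’ or a literal ‘*’. That is why each ‘l’ is reported separately; the quantified pattern (l+) would report one match, ll, at index 2. Notice we have used the start() method of Matcher, which gives you the starting index of the match. In a similar manner, the end() method gives you the index just past the last matched character.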
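To see actual quantifiers at work on the same input, a sketch along these lines could be used. The class name QuantifierDemo and the helper firstMatch are assumptions for this example. It also contrasts a greedy quantifier with its reluctant counterpart.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first substring matched by the regex, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "Hello3563456wisconsin55634";
        System.out.println(firstMatch("l+", input));     // one or more l in a row: ll
        System.out.println(firstMatch("\\d{3}", input)); // exactly three digits: 356
        System.out.println(firstMatch("\\d+", input));   // greedy, takes all it can: 3563456
        System.out.println(firstMatch("\\d+?", input));  // reluctant, takes as little as it can: 3
    }
}
```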

Replacement Methods

What if you want to replace something with something in your input string? What would you do then?

There are five different types of replacement methods available as part of Matcher class. These are:

  • String replaceFirst(String replacement)
  • String replaceAll(String replacement)
  • Matcher appendReplacement(StringBuffer sb, String replacement)
  • StringBuffer appendTail(StringBuffer sb)
  • static String quoteReplacement(String s)

Let us change “Hello” to a minion language:

Matcher m = Pattern.compile("([H])").matcher("Hello");

System.out.println(m.replaceAll("B"));

If you run the above you will find a minion saying:

Bello

Had we used replaceFirst() it would have replaced just the first match of the regex in our input.

Matcher m = Pattern.compile("([l])").matcher("Hello"); 

System.out.println(m.replaceFirst("B"));

If you run the above you will get:

HeBlo

appendReplacement() and appendTail() Methods

We can achieve the same “Hello” to “Bello” replacement using the appendReplacement() and appendTail() methods as well.

Matcher m = Pattern.compile("([H])").matcher("Hello"); 

StringBuffer sb = new StringBuffer(); 

while (m.find()){         

      m.appendReplacement(sb, "B"); 

}        

m.appendTail(sb); 

System.out.println(sb.toString());

If you run the above you will get the same result as with replaceAll():

Bello
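Where appendReplacement() really earns its keep is when the replacement has to be computed for each match. Below is a minimal sketch that upper-cases every match. The class name AppendReplacementDemo, the helper upperCaseMatches and the sample word are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementDemo {
    // Replaces every match of the regex with its upper-case form.
    static String upperCaseMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String upper = m.group().toUpperCase(Locale.ROOT);
            // quoteReplacement() keeps any '$' or '\' in the match from being interpreted.
            m.appendReplacement(sb, Matcher.quoteReplacement(upper));
        }
        // Copies whatever is left after the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseMatches("[bl]", "babble")); // BaBBLe
    }
}
```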

quoteReplacement() method

The static quoteReplacement() method returns a literal replacement string. It comes in handy when the replacement text contains ‘$’ or ‘\’, which otherwise have a special meaning in replacement strings (a group reference and an escape, respectively).

Here’s one example on how to replace using quoteReplacement() method:

String input = "i$\\";

Matcher m = Pattern.compile("(itch)").matcher("She is a bitch");

System.out.println(m.replaceFirst(Matcher.quoteReplacement(input)));

If you run the above you will get the following result:

She is a bi$\

The two backslashes in the source are printed as one. That is because "\\" in a Java string literal stands for a single backslash character, so don’t worry.
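To see why the quoting matters, compare a quoted and an unquoted replacement that contains ‘$’. This is only a sketch under made-up names (QuoteReplacementDemo, replaceLiterally) and a made-up input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteReplacementDemo {
    // Replaces every match of the regex with the given text, taken literally.
    static String replaceLiterally(String regex, String input, String literal) {
        return Pattern.compile(regex).matcher(input)
                      .replaceAll(Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        // Quoted: "$5" is inserted as plain text, so this prints "cost: $5".
        System.out.println(replaceLiterally("X", "cost: X", "$5"));

        // Unquoted: "$5" is read as a reference to capturing group 5,
        // which this pattern does not have, so replaceAll() throws.
        try {
            Pattern.compile("X").matcher("cost: X").replaceAll("$5");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unquoted replacement failed: " + e.getMessage());
        }
    }
}
```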

One Final Fun Example

Let’s make regex in Java more interesting by reading a file on our system and grabbing every occurrence of a character in it. I have created a file named “minions.txt” in my E:\ drive. The file has the following important top secret info:

Babble Banana Bello

Remembering what we had learned about reading from a file, let’s access it using BufferedReader and FileReader classes. NOTE: You need to import java.io.*; for that.

Here I have written the program:

[Image: program that grabs characters from a file using regex in Java]
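The listing itself is not reproduced in the text. What follows is only a minimal sketch of how such a program might look, not the original: the class name MinionsDemo and the helper collectMatches are hypothetical, and main writes the sample text to a temporary file as a stand-in for E:\minions.txt so that the example runs anywhere.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MinionsDemo {
    // Reads line by line and concatenates group 1 of every match.
    // The regex must therefore contain at least one capturing group.
    static String collectMatches(BufferedReader reader, String regex) {
        Pattern p = Pattern.compile(regex);
        StringBuilder found = new StringBuilder();
        // lines() wraps any IOException in an UncheckedIOException.
        reader.lines().forEach(line -> {
            Matcher m = p.matcher(line);
            while (m.find()) {
                found.append(m.group(1));
            }
        });
        return found.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temporary file stands in for the minions.txt file.
        Path file = Files.createTempFile("minions", ".txt");
        Files.write(file, "Babble Banana Bello".getBytes(StandardCharsets.UTF_8));
        try (BufferedReader reader = new BufferedReader(new FileReader(file.toFile()))) {
            System.out.println(collectMatches(reader, "([bB])"));
        } finally {
            Files.delete(file);
        }
    }
}
```

This sketch collects the matches into a string and prints them once at the end, instead of printing inside the loop, so the result is easy to check.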

If you run the above you will get the following result:

BbbBB

So there were 5 instances of B’s and b’s combined together in the file. Hmmm…

Let’s replace all B’s and b’s with “h”. For that you need to change the line System.out.print(m.group(1)); to:

System.out.print(m.replaceAll("h"));

That’s it.

Run the program and you get:

hahhle hanana hello

As mentioned in the String chapter before, regular expressions come in handy when you are trying to identify white spaces in a file too.

Now, what if you wish to replace the white spaces as well?

White space is denoted by ‘\s’, so you need to make just a minor adjustment:

Change this [bB] to [bB\\s]

Now run the program and you will get:

hahhlehhananahhello

Pretty cool, huh!

String split() method

The String split() method takes a regex too, which puts it squarely in the regex in Java category, so it is worth a look here. split() takes the regex as its parameter and returns a String array.

Here’s an example:

String s = "A Nightmare on the Elm Street";

String[] ss = s.split("\\s");

for (int i = 0; i < ss.length; i++) {

        System.out.println(ss[i]);

}

Now if you run the program, the string is split at every white space and each word is printed on its own line. Here’s the result:

A
Nightmare
on
the
Elm
Street
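One thing to watch out for: "\\s" splits at every single white-space character, so two spaces in a row produce an empty string in the result. Adding the + quantifier treats a run of white space as one separator. The sketch below shows the difference; the class name SplitDemo, the helper splitOn and the doubled spaces in the sample are assumptions for illustration.

```java
import java.util.Arrays;

class SplitDemo {
    // Thin wrapper around String.split() so both regexes can be compared.
    static String[] splitOn(String regex, String input) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        // Note the double spaces after "A" and after "Elm".
        String s = "A  Nightmare on the Elm  Street";

        // Every single white-space character is a separator: empty strings appear.
        System.out.println(Arrays.toString(splitOn("\\s", s)));
        // prints [A, , Nightmare, on, the, Elm, , Street]

        // A whole run of white space is one separator.
        System.out.println(Arrays.toString(splitOn("\\s+", s)));
        // prints [A, Nightmare, on, the, Elm, Street]
    }
}
```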

Alright, let’s call it a day then.

I think regex in Java has been tickled enough.

Time to do my favourite pastime: Slumber!

Scottshak

Poet. Author. Blogger. Screenwriter. Director. Editor. Software Engineer. Author of "Songs of a Ruin" and proud owner of four websites and two production houses. Also, one of the geekiest Test Automation Engineers based in Ahmedabad.
