How to Read XML file in Java using Jsoup | Easiest Way to Parse

Learn how to Read XML File in Java using Jsoup here, which is hands down one of the easiest ways to parse an XML file.

If you have been struggling to parse an XML file with all the usual approaches you find on the internet, like SAX, DOM, XPath etc., don't worry, just relax. Go have a soup! I have been down that road. Those approaches come with a lot of unnecessary overhead.


I present to you one of the easiest methods to read an XML in Java. Ladies and Gentlemen, have you ever come across Jsoup before? It's not a soup. It's the SOUP!

Jsoup

Jsoup is a Java library for working with real-world HTML. But you can use it on XML as well, and the good news is that it works just fine there. Jsoup's APIs are easy to use, so you can get the job done without having to write a colossal amount of code.

Here's a step-by-step process on How to Read XML file in Java using Jsoup. Just follow the simple steps and grab the tag, attribute or value you wish to obtain from an XML of your liking.

Steps on How to Read XML file in Java using Jsoup

So I am assuming you have an XML file with you that you are trying to read and get values from. Why would you be here, if you didn’t?

Here’s my directory structure where my XML file example.xml is located:

[Screenshot: project directory structure, with example.xml inside the project's Messages folder]

The content of this example.xml that we are trying to parse, read or grab data from is:

<note>
<to>Thanos</to>
<from>Captain America</from>
<head> <title value = "Confessions"></title></head>
<heading>Avengers</heading>
<body>Todi Naakhis Fodi Naakhis Bhukko Kari Naakhis...I will kill you!</body>
</note>

It could be anything really. This is just an example.

The first step in learning How to Read XML file in Java using Jsoup is to add Jsoup to your project. Let's take care of that bit first.

Download and Add Jsoup Jar

Step 1: First, download the Jsoup library from its official download page:

https://jsoup.org/download

Step 2: Click on the core library link as shown:

[Screenshot: the core library download link on the jsoup download page]

This downloads the Jsoup core library jar file, which you now have to add to your project.

Step 3: In Eclipse, right-click on your project and navigate to Build Path > Configure Build Path…

[Screenshot: Build Path > Configure Build Path… in the project's right-click menu]

This opens the Properties dialog. Click the Libraries tab, where the Add External JARs… button is available.

Step 4: Click the Add External JARs… button:

[Screenshot: the Add External JARs… button on the Libraries tab]

Step 5: In the dialog box that opens, select the downloaded Jsoup jar file and click Open.

[Screenshot: selecting the Jsoup jar in the file dialog]

Step 6: Click Apply, then OK.

[Screenshot: the Apply and OK buttons]

Create a Class for the Job

Step 7: Next comes the coding bit. First, create a class structure like this:

[Screenshot: the class structure used to read the XML]

Step 8: Since we want to read the XML file first, we will use the File class. Type the following:

File file = new File(System.getProperty("user.dir"), "Messages/example.xml");

That path is as per my directory structure. If your XML file is located on, say, the D: drive, provide the path like this:

File file = new File("D:/example.xml");

The overload of Jsoup's parse method we are going to use takes an InputStream as its first parameter, so in the next step we wrap the file in a FileInputStream.

You will need the following imports to get rid of the errors:

import java.io.File;
import java.io.FileInputStream;

Step 9: Type the following to obtain the stream from the file (example.xml):

 FileInputStream fis = new FileInputStream(file);

That should do it.

Now that we have the input stream, we can make use of Jsoup's parse method.

Jsoup Parse Method

Step 10: Type the following piece of code:

Document doc = Jsoup.parse(fis, null, "", Parser.xmlParser());

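As you can see, this overload takes four parameters. The first is the input stream (our FileInputStream). The second is charsetName, which we set to null so that jsoup tries to detect the encoding itself. The third is baseUri, which is used to resolve relative links; an empty String "" is fine here. The last is the Parser that parses the input: Parser.xmlParser() tells jsoup to treat it as XML rather than applying its HTML rules (this matters here because the example uses tag names like <head>, <title> and <body>, which mean something special in HTML).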

Jsoup.parse returns a Document, which we store in doc. Jsoup.parse can throw an IOException, and new FileInputStream(file) can throw a FileNotFoundException. Since FileNotFoundException is a subclass of IOException, declaring throws IOException (or catching IOException in a try-catch) takes care of both. I have used throws for now.
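If you'd rather use try-catch than throws, here is a minimal sketch of that variant. It assumes the same Messages/example.xml location as above; the class name ReadXmlSafely and the helper loadXml are hypothetical, and returning null on failure is just one possible choice. It also uses try-with-resources so the FileInputStream is closed once parsing is done.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class ReadXmlSafely {

    // Hypothetical helper: parses an XML file, or returns null if it cannot be read
    static Document loadXml(File file) {
        // try-with-resources closes the stream even if parsing fails
        try (FileInputStream fis = new FileInputStream(file)) {
            return Jsoup.parse(fis, null, "", Parser.xmlParser());
        } catch (IOException ex) {
            // Also catches FileNotFoundException, which is a subclass of IOException
            System.err.println("Could not read " + file + ": " + ex.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        // Assumed location, matching the directory structure used in this article
        File file = new File(System.getProperty("user.dir"), "Messages/example.xml");
        Document doc = loadXml(file);
        if (doc != null) {
            System.out.println(doc.select("body"));
        }
    }
}
```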

Import the following packages to get rid of the errors (plus import java.io.IOException; if you declare or catch IOException):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

Now it's time to loop through the matching elements and grab what you need from every occurrence of your tag, whether it appears once or many times. doc.select() returns all the matches as an Elements collection, so a for-each loop does the job:

Step 11: Type the following:

for (Element e : doc.select("body")) {
    System.out.println(e);
}

Replace "body" with whichever tag you want. select() actually accepts any CSS selector query, so a plain tag name is just the simplest case.
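Because select() takes a CSS selector query rather than just a tag name, you can narrow things down further. Here is a small sketch of a few selector forms; it parses a shortened copy of the note from a String (covered later in this article) so it runs on its own, and the class name SelectorDemo is hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class SelectorDemo {

    // Shortened copy of example.xml, inlined so the sketch is self-contained
    static final String XML = "<note><to>Thanos</to><from>Captain America</from>"
            + "<head><title value='Confessions'></title></head>"
            + "<heading>Avengers</heading><body>I will kill you!</body></note>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(XML, "", Parser.xmlParser());

        // Child combinator: only <title> elements directly inside <head>
        for (Element e : doc.select("head > title")) {
            System.out.println(e);
        }

        // Attribute selector: any element carrying a "value" attribute
        for (Element e : doc.select("[value]")) {
            System.out.println(e.tagName());
        }

        // Group selector: <to> and <from> in a single query
        for (Element e : doc.select("to, from")) {
            System.out.println(e.text());
        }
    }
}
```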

Import the following package for Element:

import org.jsoup.nodes.Element;

That’s it. That’s all there is to it. Here’s a glimpse of the whole code:

[Screenshot: the complete program]
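Since the screenshot isn't reproduced here, below is a sketch of what the complete program can look like when Steps 7 to 11 are put together. The class name ReadXml is an assumption, and the stream is opened with try-with-resources so it is closed after parsing; otherwise it follows the steps above, including throws IOException on main.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class ReadXml {

    public static void main(String[] args) throws IOException {
        // Path as per this article's directory structure: <project>/Messages/example.xml
        File file = new File(System.getProperty("user.dir"), "Messages/example.xml");

        // try-with-resources closes the stream once parsing is done
        try (FileInputStream fis = new FileInputStream(file)) {
            // null charset = let jsoup detect it; "" = no base URI; xmlParser() = parse as XML
            Document doc = Jsoup.parse(fis, null, "", Parser.xmlParser());

            // Print every <body> element (replace "body" with the tag you want)
            for (Element e : doc.select("body")) {
                System.out.println(e);
            }
        }
    }
}
```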

Time to run your program.

Output for the Program on How to Read XML file in Java using Jsoup

Step 12: Run the above program and you will get the following result:

<body>
 Todi Naakhis Fodi Naakhis Bhukko Kari Naakhis...I will kill you!
</body>

If you want to run it for a different tag, just replace body with the tag name of your choice:

for(Element e: doc.select("title")){
    System.out.println(e);
}

For title, the result becomes:

<title value="Confessions"></title>

Similarly, running it for <to> like this:

for(Element e: doc.select("to")){
    System.out.println(e);
}

will give you the following result:

<to>
 Thanos
</to>
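Printing the Element gives you the tag along with its content. If you only want what's inside, Element.text() returns just the text. A minimal sketch using an inline String so it runs on its own; the class name TextDemo and the helper textsOf are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class TextDemo {

    // Hypothetical helper: collects the text content of every element matching the tag
    static List<String> textsOf(Document doc, String tag) {
        List<String> texts = new ArrayList<>();
        for (Element e : doc.select(tag)) {
            texts.add(e.text()); // text only, without the surrounding <tag></tag>
        }
        return texts;
    }

    public static void main(String[] args) {
        String xml = "<note><to>Thanos</to><from>Captain America</from></note>";
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        System.out.println(textsOf(doc, "to"));   // [Thanos]
        System.out.println(textsOf(doc, "from")); // [Captain America]
    }
}
```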

Running it for <note>, the root node, like this:

for (Element e : doc.select("note")) {
    System.out.println(e);
}

will print the whole XML, because every other element is nested inside the root node:

<note> 
 <to>
 Thanos
 </to> 
 <from>
 Captain America
 </from> 
 <head> 
 <title value="Confessions"></title>
 </head> 
 <heading>
 Avengers
 </heading> 
 <body>
 Todi Naakhis Fodi Naakhis Bhukko Kari Naakhis...I will kill you!
 </body> 
</note>

Parsing an XML in String Format

So what if you don't have an XML file, but rather some XML in the form of a String that you wish to read? You don't need huge changes.

Just put your XML in a String variable like this:

String xml = "<note><to>Thanos</to><from>Captain America</from><head><title value = 'Confessions'>"
        + "</title></head><heading>Avengers</heading><body>Todi Naakhis Fodi Naakhis "
        + "Bhukko Kari Naakhis...I will kill you!</body></note>";

The main change you have to make is in the Jsoup.parse call: use its three-parameter overload, which takes the XML as a String.

So the line you have to change is:

Document doc = Jsoup.parse(xml, "", Parser.xmlParser());

We no longer pass a charsetName, because a String is already decoded text and there is no encoding for jsoup to work out. Just drop that parameter and pass the String variable as the first parameter.

So the whole code would appear something like this:

[Screenshot: the complete program reading XML from a String]
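As a sketch of the complete String version (the class name ReadXmlFromString is an assumption), notice that no File, stream or IOException handling is needed any more, because the three-parameter Jsoup.parse works on the String directly.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class ReadXmlFromString {

    public static void main(String[] args) {
        // The same note as example.xml, held in a String (note the space before "Bhukko")
        String xml = "<note><to>Thanos</to><from>Captain America</from><head><title value = 'Confessions'>"
                + "</title></head><heading>Avengers</heading><body>Todi Naakhis Fodi Naakhis "
                + "Bhukko Kari Naakhis...I will kill you!</body></note>";

        // No stream and no charset: the String is already decoded text
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());

        // Print every <body> element, exactly as in the file-based version
        for (Element e : doc.select("body")) {
            System.out.println(e);
        }
    }
}
```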

Other Methods of Document Class

There are other methods you can leverage to your advantage. Document extends Element, so it inherits all of Element's lookup methods.

getElementsByAttribute Method

For example, there is the getElementsByAttribute method, which finds every element that has an attribute with the given name, whatever its value.

Just replace the for-each loop above with the following:

for(Element e: doc.getElementsByAttribute("value")){
   System.out.println(e);
}

If you run the above program you will get:

<title value="Confessions"></title>

As you can see, it has found the element carrying the 'value' attribute and printed the whole element, tag and attribute value both.
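If you need a particular attribute value rather than just the presence of the attribute, Element also offers getElementsByAttributeValue(key, value). A short sketch on an inline String; the second <title> with value 'Draft' is made up purely to show the filtering, and the class name AttributeValueDemo is hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

public class AttributeValueDemo {

    public static void main(String[] args) {
        // Hypothetical input: two titles with different "value" attributes
        String xml = "<note><head><title value='Confessions'></title>"
                + "<title value='Draft'></title></head></note>";
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());

        // Every element that has a "value" attribute, whatever its value
        Elements withValue = doc.getElementsByAttribute("value");
        System.out.println(withValue.size()); // 2

        // Only elements whose "value" attribute is "Confessions"
        for (Element e : doc.getElementsByAttributeValue("value", "Confessions")) {
            System.out.println(e);
        }
    }
}
```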

getAllElements Method

In a similar fashion, you can use the getAllElements method of the Document class like this:

for(Element e: doc.getAllElements()){
   System.out.println(e);
}

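It spits out every element it can find in your XML file, starting with the element it is called on. That is why <note> shows up twice below: the first entry is the Document itself (printing a Document prints its whole content), and the second is the <note> element. After that come the nested elements, each printed with its own content. From there you can make the necessary adjustments as per your requirement. Here is the result you get if you run the above: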

<note> 
 <to>
 Thanos
 </to> 
 <from>
 Captain America
 </from> 
 <head> 
 <title value="Confessions"></title>
 </head> 
 <heading>
 Avengers
 </heading> 
 <body>
 Todi Naakhis Fodi Naakhis Bhukko Kari Naakhis...I will kill you!
 </body> 
</note>
<note> 
 <to>
 Thanos
 </to> 
 <from>
 Captain America
 </from> 
 <head> 
 <title value="Confessions"></title>
 </head> 
 <heading>
 Avengers
 </heading> 
 <body>
 Todi Naakhis Fodi Naakhis Bhukko Kari Naakhis...I will kill you!
 </body> 
</note>
<to>
 Thanos
</to>
<from>
 Captain America
</from>
<head> 
 <title value="Confessions"></title>
</head>
<title value="Confessions"></title>
<heading>
 Avengers
</heading>
<body>
 Todi Naakhis Fodi Naakhis Bhukko Kari Naakhis...I will kill you!
</body>
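To see more clearly what getAllElements() returns, a minimal sketch can print a label per element instead of the full markup. It uses a shortened inline copy of the note; the class name AllElementsDemo and the helper labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class AllElementsDemo {

    // Hypothetical helper: one label per element returned by getAllElements()
    static List<String> labels(Document doc) {
        List<String> out = new ArrayList<>();
        for (Element e : doc.getAllElements()) {
            // The first element returned is the Document itself, not <note>
            out.add(e == doc ? "(document)" : e.tagName());
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<note><to>Thanos</to><from>Captain America</from>"
                + "<head><title value='Confessions'></title></head>"
                + "<heading>Avengers</heading><body>I will kill you!</body></note>";
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        System.out.println(labels(doc));
        // [(document), note, to, from, head, title, heading, body]
    }
}
```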

getElementsByTag Method

If you want to grab elements by their tag name, which works on a similar principle to doc.select, you can do that by typing:

for(Element e: doc.getElementsByTag("heading")){
   System.out.println(e);
}

Running the above gives you:

<heading>
 Avengers
</heading>

getElementsByIndexGreaterThan Method

Yet another method that Document offers is getElementsByIndexGreaterThan, which takes an index as a parameter. The index here is each element's zero-based position among its sibling elements. Inside <note>, for instance, <to> has index 0, <from> 1, <head> 2, <heading> 3 and <body> 4, while <title> has index 0 inside <head>.

There are similar methods for the other comparisons as well:

[Screenshot: the getElementsByIndex… family of methods]

Here’s one example:

Document doc = Jsoup.parse(fis, null, "", Parser.xmlParser());
for (Element e : doc.getElementsByIndexGreaterThan(3)) {
    System.out.println(e);
}

Run the above and you get:

<body>
Todi Naakhis Fodi Naakhis Bhukko Kari Naakhis...I will kill you!
</body>

Only <body> sits at an index greater than 3. In a similar fashion, you can try out getElementsByIndexLessThan and getElementsByIndexEquals.
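To check which index each element has, Element.elementSiblingIndex() returns that zero-based position among sibling elements. A small sketch on a shortened inline copy of the note (class name IndexDemo is hypothetical):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class IndexDemo {

    public static void main(String[] args) {
        String xml = "<note><to>Thanos</to><from>Captain America</from>"
                + "<head><title value='Confessions'></title></head>"
                + "<heading>Avengers</heading><body>I will kill you!</body></note>";
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());

        // Print each child of <note> with its position among its sibling elements
        Element note = doc.select("note").first();
        for (Element child : note.children()) {
            System.out.println(child.elementSiblingIndex() + " " + child.tagName());
        }
        // prints 0 to, 1 from, 2 head, 3 heading, 4 body (one per line)

        // 4 is the only index greater than 3, hence just <body>
        System.out.println(doc.getElementsByIndexGreaterThan(3).size()); // 1
    }
}
```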

How to Get an Attribute

Now that you have grabbed a tag and everything it contains, grabbing its attributes is fairly simple. Use the attributes() method for that. Here's how:

for(Element e: doc.getElementsByTag("title")) {
   System.out.println(e.attributes());
}

It gives you all the attributes present on that tag, along with their values.

Executing the aforementioned will give you:

 value="Confessions"

Notice how it gives you both the attribute and its value.
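If you want the attribute names and values separately rather than as one string, you can iterate over attributes(), since it yields Attribute objects with getKey() and getValue(). A sketch assuming a hypothetical second attribute lang='en' on <title>, added only so there is more than one attribute to loop over; the class name AttributesDemo is hypothetical too.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class AttributesDemo {

    public static void main(String[] args) {
        // Hypothetical extra attribute "lang" to show more than one attribute
        String xml = "<note><head><title value='Confessions' lang='en'></title></head></note>";
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());

        for (Element e : doc.getElementsByTag("title")) {
            // Each Attribute exposes its key and value separately
            for (Attribute a : e.attributes()) {
                System.out.println(a.getKey() + " -> " + a.getValue());
            }
        }
    }
}
```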

How to Get Value from an Attribute

Well now that you know how to grab an attribute from a tag, what if you want a particular value from that attribute?

You can use the attr(String attributeKey) method for that. As its parameter, you gotta specify the attribute name for which you want the value.

for(Element e: doc.getElementsByTag("title")){
    System.out.println(e.attr("value"));
}

If you run the above you will get the result as:

Confessions

which is nothing but the value of "value" (I don't know why I named an attribute "value"... so confusing! Aargh). But you get it, right? That's how you get the value of an attribute.
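One thing to watch: attr() returns an empty string, not null, when the attribute isn't there. So if you need to tell "missing" apart from "empty", check hasAttr() first. A minimal sketch; the attribute name author, the fallback "unknown", the class name AttrDemo and the helper attrOrDefault are all hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class AttrDemo {

    // Hypothetical helper: the attribute's value, or a fallback if the element lacks it
    static String attrOrDefault(Element e, String key, String fallback) {
        // attr() returns "" for a missing attribute, so check hasAttr() first
        return e.hasAttr(key) ? e.attr(key) : fallback;
    }

    public static void main(String[] args) {
        String xml = "<note><head><title value='Confessions'></title></head></note>";
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());

        for (Element e : doc.getElementsByTag("title")) {
            System.out.println(e.attr("value"));                       // Confessions
            System.out.println("[" + e.attr("author") + "]");          // [] since there is no author attribute
            System.out.println(attrOrDefault(e, "author", "unknown")); // unknown
        }
    }
}
```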

Now that you have learned How to Read XML file in Java using Jsoup, you can experiment with the other methods of Document yourself.

Scottshak

Poet. Author. Blogger. Screenwriter. Director. Editor. Software Engineer. Author of "Songs of a Ruin" and proud owner of four websites and two production houses. Also, one of the geekiest Test Automation Engineers based in Ahmedabad.


3 Responses

  1. Zoé says:

    Hi,

    Parser.xmlParser() is deprecated. Any new method to replace it?

    Thanks

  2. Mick says:

    this website helped me so many times, thank you for sharing and trying to make complicated things simple
