Thijmen Kooy

What is XXE (XML eXternal Entity) injection?

A lot of modern web applications still use XML to transport and store data. The World Wide Web Consortium (W3C) began work on the standard in 1996 and published XML 1.0 as a Recommendation in 1998, and to this day it is used in a wide variety of implementations. XML has many features that developers are not always familiar with, offering hackers an opportunity for abuse.

Insecure implementations of some XML functionalities can introduce vulnerabilities, one of which is XML External Entity injection (XXE). XXE means that the XML functionality of the application can be used to fetch external sources through a reference in the XML. Vulnerable software that parses the XML interprets the reference, enabling XXE attacks. This vulnerability can sometimes be used to read files from the server, or even to execute commands on it.

The XXE vulnerability ranks among the most critical security issues according to the OWASP Top 10. In the 2017 edition it had its own category (A4:2017-XML External Entities). In the 2021 edition it is categorized under “A05:2021-Security Misconfiguration”, the fifth category on the list.

XML Basics

XML stands for Extensible Markup Language and, just like HTML, is based on Standard Generalized Markup Language (SGML). So, the XML format has a lot in common with the HTML format. It also has declarations, elements and attributes, as shown in the image below.

[Image: an XML document showing its declaration, elements and attributes]

An XML file starts with an XML declaration. This declares which XML version is used. In most cases this is set to <?xml version="1.0"?>.

After that, the document type declaration, written as <!DOCTYPE ...>, tells the software how the file is structured. It contains or points to a document type definition (DTD). The author can write the definition inline between square brackets [ ... ], or can reference a local or remote definition stored in a .dtd file. It’s also quite common to reference an external DTD on the internet.

The DTD is followed by the data structured in elements and attributes, where (external) entities can be used.

Entities in XML

Entities can be compared to variables in a programming language. In the following example the entity msg contains the value “Hello World”. A reference to this entity is written as &msg;, and when the document is parsed the value ends up in the <message> element.

<!DOCTYPE xml [
  <!ENTITY msg "Hello World">
]>
<xml>
  <message>&msg;</message>
</xml>
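
To see the expansion happen, the internal entity example above can be fed to a parser. The sketch below uses Python's standard xml.etree.ElementTree module. That choice is an assumption made for illustration, since the article does not prescribe a parser, and read_message is a hypothetical helper name. ElementTree expands internal entities that are declared in the document's own DTD.

```python
import xml.etree.ElementTree as ET

# The internal-entity document from the example above.
DOCUMENT = """<!DOCTYPE xml [ <!ENTITY msg "Hello World"> ]>
<xml> <message>&msg;</message> </xml>"""


def read_message(xml_text: str) -> str:
    """Parse the document and return the text of its <message> element."""
    root = ET.fromstring(xml_text)
    return root.findtext("message")


print(read_message(DOCUMENT))  # prints: Hello World
```

The parser replaces &msg; before the application ever sees the element, which is why the application code cannot tell an entity reference apart from literal text.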

This is an example of an internal entity. There are also external entities, and these are what XXE attacks use. In the example below the entity ext references an external source: http://example.com/. The SYSTEM keyword indicates that it is an external entity. Software that parses the XML with external entities enabled will fetch the external source when the XML file is interpreted.

<!DOCTYPE xml [
  <!ENTITY ext SYSTEM "http://example.com/">
]>
<xml>
  <site>&ext;</site>
</xml>

Besides requesting data from external sources, it is also possible to include local files (on the server) using an external entity.

XXE attack

An XXE attack is possible when XML functionalities are used that support dangerous features like external entities. To demonstrate this vulnerability, we’ve used the xxelab made by Joshua Barone. This lab environment is intentionally vulnerable to XXE attacks for testing purposes. It contains a vulnerable application with a registration form that uses XML. The following screenshot shows how the form data is structured and how the value of the <email> element is returned in the response.

[Screenshot: the registration request in XML and the response reflecting the <email> value]

To check if this form supports XML entities, use an internal entity, like this:

<!DOCTYPE foo [ <!ENTITY xxe "This is an internal entity" >]>

Because the value of <email> is reflected in the response, placing the reference &xxe; in that element shows whether the entity is expanded. In the following example the entity xxe is used to display the value “This is an internal entity” in the response.

[Screenshot: the response displaying “This is an internal entity”]

To exploit this vulnerability to read local files, use an external entity, like in the following example:

<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd" >]>

This external entity references the local file /etc/passwd on the file system of the server hosting the application. Because the value of <email> is reflected in the response, we can use this to display the contents of that file. The screenshot below illustrates how the external entity xxe can be used to read the local file /etc/passwd:

[Screenshot: the response containing the contents of /etc/passwd]

What else can you do with XXE?

An XXE vulnerability opens up several exploitation vectors. One of them is a DoS attack known as the Billion Laughs attack. This attack defines entities in terms of other entities, so a small document expands into a huge number of copies and the application has to use large amounts of server memory to process the XML.
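
The growth behind the Billion Laughs attack can be observed safely at a small scale. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree as the parser. The helper build_nested_entities, the entity names e0, e1, ... and the seed text "lol" are hypothetical choices for illustration. Each entity consists of several references to the previous one, so the expanded size grows exponentially with the number of levels.

```python
import xml.etree.ElementTree as ET


def build_nested_entities(levels: int, fanout: int, seed: str = "lol") -> str:
    """Return a document whose entity e<levels> expands to fanout**levels seeds."""
    declarations = [f'<!ENTITY e0 "{seed}">']
    for level in range(1, levels + 1):
        # Each entity is nothing more than `fanout` references to the previous one.
        references = f"&e{level - 1};" * fanout
        declarations.append(f'<!ENTITY e{level} "{references}">')
    dtd = "\n  ".join(declarations)
    return f"<!DOCTYPE xml [\n  {dtd}\n]>\n<xml>&e{levels};</xml>"


# Deliberately small: 4 levels with 10 references each.
document = build_nested_entities(levels=4, fanout=10)
expanded = ET.fromstring(document).text

print(len(document))  # a few hundred characters of input
print(len(expanded))  # 30000 characters after expansion (3 * 10**4)
```

A real attack uses more levels, so the same few hundred bytes of input would expand to gigabytes. Recent versions of the Expat library that Python builds on abort documents whose expansion grows too large, so the large-scale variant may simply fail with a parse error on an up-to-date system.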

In some cases an XXE vulnerability can be used for port scanning. This can be achieved with external entities that reference hosts on the internal network. In some situations the response or the response time indicates whether the port (referenced by the URL in the external entity) is open or closed.

And, in the worst case, an XXE vulnerability can also be used to execute commands directly on the server, for example when PHP's expect:// wrapper is available.

Fixing XXE vulnerabilities

Most XXE vulnerabilities arise when an application supports dangerous functionalities like external XML entities. So the most effective way to mitigate these vulnerabilities is to disable these functionalities or to limit the application through filtering. There are a lot of platforms and libraries that support XML, but here are some fixes for some common ones: 

  • Java – javax.xml.parsers.DocumentBuilderFactory
    • factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
  • PHP – libxml2
    • libxml_set_external_entity_loader(null);
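  • .NET – XmlTextReader
    • reader.ProhibitDtd = true; (marked obsolete in .NET Framework 4.0, where reader.DtdProcessing = DtdProcessing.Prohibit; replaces it)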
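
The Java and .NET settings above refuse any document that carries a DOCTYPE. The same idea can be sketched in Python by hooking the doctype callback of the standard xml.parsers.expat module. Python is not one of the platforms listed, so treat this as an illustration under that assumption. The names assert_no_doctype and DoctypeForbidden are hypothetical, as are the element names in the sample documents.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def assert_no_doctype(xml_bytes: bytes) -> None:
    """Parse the input once and fail as soon as a DOCTYPE starts."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Expat calls this at the start of <!DOCTYPE ...>, before any entity is used.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)


safe = b"<root><email>user@example.com</email></root>"
hostile = (
    b'<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd" > ]>'
    b"<root><email>&xxe;</email></root>"
)

assert_no_doctype(safe)  # returns without raising
try:
    assert_no_doctype(hostile)
except DoctypeForbidden as error:
    print("rejected:", error)
```

Rejecting the DOCTYPE outright is stricter than disabling external entities only, but it also blocks internal entity tricks such as the Billion Laughs attack.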

For mitigations for other platforms, please check the OWASP XML External Entity Prevention Cheat Sheet. Nowadays many libraries are protected from XXE attacks because loading of external entities is not enabled by default, but it is worth verifying this for the parser and version in use.
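
As one illustration of such a default, Python's standard xml.etree.ElementTree does not resolve external entities. According to the Python documentation it raises a ParseError when one is referenced. The sketch below assumes that parser. The helper parse_untrusted and the <root> and <email> element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The file-reading payload from the attack section, wrapped in placeholder elements.
PAYLOAD = """<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd" > ]>
<root><email>&xxe;</email></root>"""


def parse_untrusted(xml_text: str):
    """Return the parsed root element, or None if the parser rejects the input."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as error:
        print("parser refused the document:", error)
        return None


result = parse_untrusted(PAYLOAD)
print(result)  # None: the external entity was never fetched
```

Other parsers behave differently, so the same payload is a quick way to check what a given library does out of the box.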

The use of a local static DTD can enforce safe and correct XML input. A DTD file defines rules that have to be satisfied before the XML input can be loaded. Also make sure that the application does not accept DTDs from user input.

It might also be an option to look at other data storage formats like YAML and JSON. But be careful, these formats can be dangerous as well if not implemented correctly.