XXE
Theory
XML stands for Extensible Markup Language.
It is a standardized way to store structured data which is both human & machine readable.
An XML document is made up of elements (tags). Elements can contain text or other elements, and they can carry attributes, which hold values.
An optional component is the Document Type Definition [DTD].
It defines the structure of the document so that a validating parser can determine whether the document is valid, i.e. whether it conforms to that structure.
When an XML document is parsed by an XML parser, the parser generates a Document Object Model [DOM]. The DOM is later queried programmatically to extract data from the document.
For example, when the above document is parsed into a DOM in Java, the following queries are used to extract data from the DOM.
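A minimal sketch of the same parse-then-query idea, written in Python with the standard-library `xml.dom.minidom` module instead of Java. The sample `<users>` document and the `names_and_roles` helper are hypothetical and only illustrate walking the DOM by tag name and attribute.

```python
from xml.dom import minidom

# Hypothetical sample document used only for this illustration.
SAMPLE = """<?xml version="1.0"?>
<users>
  <user id="1"><name>alice</name><role>admin</role></user>
  <user id="2"><name>bob</name><role>guest</role></user>
</users>"""

def names_and_roles(xml_text: str) -> list[tuple[str, str, str]]:
    """Parse xml_text into a DOM and return (id, name, role) for each <user>."""
    dom = minidom.parseString(xml_text)
    rows = []
    for user in dom.getElementsByTagName("user"):
        # Each <name>/<role> element holds a single text node.
        name = user.getElementsByTagName("name")[0].firstChild.data
        role = user.getElementsByTagName("role")[0].firstChild.data
        rows.append((user.getAttribute("id"), name, role))
    return rows

print(names_and_roles(SAMPLE))  # [('1', 'alice', 'admin'), ('2', 'bob', 'guest')]
```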
XXE [XML External Entity Injection]
If a user wanted to create a <tag> with the content "I <3 XML", the XML parser would raise an error because it treats the < as the beginning of a tag. This is where entities are helpful.
Entities
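Entities are named references that represent specific characters or strings.
XML has 5 predefined entities: &lt; (<), &gt; (>), &amp; (&), &quot; (") and &apos; (').
Custom entities can be defined in the DTD.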
Custom entities are only expanded when the parser needs them. For example, an external HTTP URI is not fetched when the entity is declared, but only when the entity is referenced in the XML document.
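A short Python sketch (standard library only) of the "I <3 XML" case: escaping the text turns < into the predefined entity &lt;, and the parser decodes it back. The `wrap_text` and `is_well_formed` helpers are hypothetical names used for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The five predefined XML entities and the characters they stand for.
PREDEFINED = {"&lt;": "<", "&gt;": ">", "&amp;": "&", "&quot;": '"', "&apos;": "'"}

def wrap_text(tag: str, text: str) -> str:
    """Build <tag>text</tag>, escaping &, < and > so the parser accepts it."""
    return f"<{tag}>{escape(text)}</{tag}>"

def is_well_formed(text: str) -> bool:
    """Return True if ElementTree can parse text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

doc = wrap_text("tag", "I <3 XML")
print(doc)                      # <tag>I &lt;3 XML</tag>
print(ET.fromstring(doc).text)  # I <3 XML
print(is_well_formed("<tag>I <3 XML</tag>"))  # False: the raw < breaks parsing

# Every predefined entity decodes back to its character when parsed.
for entity, char in PREDEFINED.items():
    assert ET.fromstring(f"<r>{entity}</r>").text == char
```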
Internal Entities: Entities that represent some data within the document, defined as a string.
Useful when referencing the same long string multiple times in the document.
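As an illustrative sketch, Python's `xml.etree.ElementTree` (built on expat) expands an internal entity declared in the document's internal DTD subset wherever it is referenced. The `company` entity and the letter document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Internal entity "company" declared once in the internal DTD subset.
DOC = """<!DOCTYPE letter [
  <!ENTITY company "Example Widgets International Ltd.">
]>
<letter>
  <from>&company;</from>
  <signature>&company;</signature>
</letter>"""

root = ET.fromstring(DOC)
# Both references are replaced by the same declared string.
print(root.find("from").text)       # Example Widgets International Ltd.
print(root.find("signature").text)  # Example Widgets International Ltd.
```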
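Recursive Entities: An entity's value can reference other entities. When bar2 is referenced, the XML parser recursively expands the bar entity as well.
Denial of Service Attack: The Billion Laughs attack uses this concept, nesting entities so that a tiny document expands to 1 billion "lol" strings, about 3 GB of text (~4 GB of memory).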
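A bounded Python sketch of both ideas, assuming the standard-library `xml.etree.ElementTree` parser: first bar2 expands bar, then a billion-laughs style DTD is generated with only 3 levels (1,000 copies of "lol") so it is safe to parse. `laughs_payload` and `expanded_length` are hypothetical helpers; the second one only computes the size rather than parsing a 9-level document. Recent expat releases (2.4.0 and later) also include protection against this kind of amplification.

```python
import xml.etree.ElementTree as ET

# Recursive expansion: referencing &bar2; also expands &bar; inside it.
NESTED = '<!DOCTYPE r [<!ENTITY bar "World"><!ENTITY bar2 "Hello &bar;">]><r>&bar2;</r>'
print(ET.fromstring(NESTED).text)  # Hello World

def laughs_payload(levels: int, fanout: int = 10) -> str:
    """Build a nested-entity document: lol{i} references lol{i-1} `fanout` times."""
    decls = ['<!ENTITY lol0 "lol">']
    for i in range(1, levels + 1):
        refs = f"&lol{i - 1};" * fanout
        decls.append(f'<!ENTITY lol{i} "{refs}">')
    return "<!DOCTYPE lolz [" + "".join(decls) + f"]><lolz>&lol{levels};</lolz>"

def expanded_length(levels: int, fanout: int = 10) -> int:
    """Characters produced by fully expanding &lol{levels}; (computed, not parsed)."""
    return len("lol") * fanout ** levels

# 3 levels: 10**3 copies of "lol", small enough to parse safely.
text = ET.fromstring(laughs_payload(3)).text
print(len(text), expanded_length(3))  # 3000 3000

# 9 levels: one billion copies, about 3 GB of text; never parse this.
print(expanded_length(9))  # 3000000000
```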
External Entities: Entities which represent some data located outside of the document, defined as a URI. How the URI is processed to fetch the data is left up to the XML parser.
In this case, the external entity bar is defined with the keyword SYSTEM followed by a URI pointing to a web server. When the XML parser encounters &bar;, it issues an HTTP request to the URI and stores the response in the DOM.
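Other URI schemes a parser may support (support varies by parser and language):
file://
ftp://
php://filter/ (PHP only)
expect:// (PHP with the expect extension; useful for RCE)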
XXE occurs because developers do not disable external entity processing, and many XML libraries leave it enabled by default.
The following entity can be declared, for example:
You can then simply use the reference &foo; (don't forget to URL-encode the & as %26 when the payload is sent in a URL or form parameter) to get the corresponding result inserted into the XML document while it is parsed on the server side.
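A sketch of how the parser decides whether to resolve an external entity, using Python's SAX parser, which exposes this choice as the `feature_external_ges` flag (off by default since Python 3.7.1). `FakeResolver` is a hypothetical stand-in that records the requested URI and returns canned bytes instead of fetching the assumed URL `http://example.com/data.txt`, so nothing leaves the machine.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical document with an external entity pointing at an assumed URL.
DOC = b"""<!DOCTYPE r [
  <!ENTITY bar SYSTEM "http://example.com/data.txt">
]>
<r>&bar;</r>"""

class TextCollector(ContentHandler):
    """Collects all character data reported by the parser."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        self.parts.append(content)

class FakeResolver(EntityResolver):
    """Records requested URIs and returns canned bytes instead of fetching them."""
    def __init__(self):
        super().__init__()
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(b"fetched content"))
        return source

def parse(doc: bytes, resolve_external: bool):
    """Parse doc and return (collected text, URIs the resolver was asked for)."""
    handler, resolver = TextCollector(), FakeResolver()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.setEntityResolver(resolver)
    parser.setFeature(feature_external_ges, resolve_external)
    parser.parse(io.BytesIO(doc))
    return "".join(handler.parts), resolver.requested

print(parse(DOC, resolve_external=False))  # ('', [])
print(parse(DOC, resolve_external=True))   # ('fetched content', ['http://example.com/data.txt'])
```

With the flag off, the reference is silently skipped and the resolver is never consulted; with it on, whatever the resolver returns ends up in the document. In other libraries the equivalent switch has a different name.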
Parameter Entities
Can only be defined and referenced within a DTD.
Created by adding % between ENTITY and the entity name, e.g. <!ENTITY % name "value">, and referenced as %name;.
Useful when regular entities are disabled.
Useful for blind XXE.
They can be used to dynamically declare other parameter entities. The entity bar represents a string which contains another entity declaration. Here &#x25; is the hex character reference for %; otherwise the parser would think we're referencing another parameter entity. bar2 only exists once bar has been referenced, so %bar; must appear before %bar2;.
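A small Python sketch that only builds the declaration text, to show where &#x25; goes; `nested_param_entity` is a hypothetical helper and the value "hello" is arbitrary. It also checks with `xml.etree.ElementTree` that &#x25; is simply the character reference for %.

```python
import xml.etree.ElementTree as ET

def nested_param_entity(outer: str, inner: str, inner_value: str) -> str:
    """Declare parameter entity `outer` whose value declares parameter entity `inner`.

    The inner '%' is written as the character reference &#x25; so the parser
    does not read it as a parameter-entity reference while declaring `outer`.
    """
    return f"<!ENTITY % {outer} \"<!ENTITY &#x25; {inner} '{inner_value}'>\">"

decl = nested_param_entity("bar", "bar2", "hello")
print(decl)  # <!ENTITY % bar "<!ENTITY &#x25; bar2 'hello'>">

# &#x25; is just the character reference for '%'.
print(ET.fromstring("<r>&#x25;</r>").text)  # %
```

Declarations whose entity value references another parameter entity (for example, to splice file contents into a URL) have to live in an external DTD, because the XML specification forbids parameter-entity references inside markup declarations in the internal subset.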
External DTDs
A DTD with entity definitions can be hosted externally and then referenced in the XML document.
Blind XXE
Occurs when parsed entities are not returned to the user. The URIs are still accessed, however, so a combination of internal and external DTDs and dynamically declared parameter entities can be used to exfiltrate data.
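Blind XXE relies on a server the tester controls to observe those URI requests. A minimal standard-library Python sketch of such a listener, which only records request paths; `capture` is a hypothetical helper that starts the listener on a random local port and sends one request itself to simulate a vulnerable parser fetching the assumed path `/?data=secret`. For quick manual tests, `python3 -m http.server` logs incoming requests in a similar way.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class LoggingHandler(BaseHTTPRequestHandler):
    """Records the path of each GET request (e.g. /?data=...) on the server object."""

    def do_GET(self):
        self.server.received.append(self.path)
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence default stderr logging; paths are kept in server.received

def capture(path: str) -> list[str]:
    """Start a listener on a random local port, request `path` once, return what was logged."""
    server = HTTPServer(("127.0.0.1", 0), LoggingHandler)
    server.received = []
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        # Simulates a vulnerable parser resolving an external entity that points here.
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("GET", path)
        conn.getresponse().read()
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return server.received

print(capture("/?data=secret"))  # ['/?data=secret']
```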
Error based
The vulnerability can lead to:
Local File Inclusion
SSRF
Blind attacks (Data exfil)
Denial of service
RCE (sometimes)
Exploitation
XPath Injection
XPath is a query language that selects nodes from an XML document. Unlike SQL, which allows restrictions on databases, tables or columns, XPath has no access-level permissions, so a query can refer to almost any part of an XML document.
Imagine the XML document as a database, and XPath as an SQL query.
If you can manipulate the query, you will be able to retrieve elements to which you normally should not have access.
Theory of Attack
XML Query
If we insert a malicious payload as user name, the XPath query generated would be as follows:
Thus, the first part of the query becomes true and the second part is neglected. The password becomes irrelevant and the attacker gets unauthorized access to the website.
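A Python sketch of the string-building problem, assuming a hypothetical users document queried with `//user[name/text()=... and password/text()=...]`. In XPath, `and` binds tighter than `or`, so the injected expression is evaluated as `name='' or 1=1 or ('a'='a' and password='anything')`, which is always true. The hypothetical `xpath_literal` helper quotes input as an XPath 1.0 string literal; XPath 1.0 has no escape sequences, so values containing both quote types are rebuilt with `concat()`.

```python
def vulnerable_query(username: str, password: str) -> str:
    # String concatenation: user input becomes part of the XPath syntax.
    return ("//user[name/text()='" + username +
            "' and password/text()='" + password + "']")

def xpath_literal(value: str) -> str:
    """Quote value as an XPath 1.0 string literal (no escape sequences exist)."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Contains both quote types: split on ' and rebuild the string with concat().
    pieces = [f"'{part}'" for part in value.split("'")]
    return "concat(" + ", \"'\", ".join(pieces) + ")"

def safer_query(username: str, password: str) -> str:
    return (f"//user[name/text()={xpath_literal(username)}"
            f" and password/text()={xpath_literal(password)}]")

payload = "' or 1=1 or 'a'='a"
print(vulnerable_query(payload, "anything"))
# //user[name/text()='' or 1=1 or 'a'='a' and password/text()='anything']
print(safer_query(payload, "anything"))
# //user[name/text()="' or 1=1 or 'a'='a" and password/text()='anything']
```

Where the XPath library supports variables (for example, lxml accepts `$name` placeholders bound through keyword arguments), binding input as a variable is preferable to quoting it by hand.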
Exploitation
XPath 1.0 has no comment syntax, so to ignore the rest of the XPath expression you can try a NULL BYTE (which you will need to URL-encode as %00); some implementations stop reading the query at that byte.
Blind XXE
XXE in File Parsing
Open Document Format (ODF) and Office Open XML (OOXML) are zip-compressed, XML-based file formats.
Files in these formats: odt, ods, odp (ODF); docx, xlsx, pptx (OOXML); and more.
The files are zip archives of multiple XML files, which are parsed during processing.
A user can edit these XML files and inject an XXE payload. If the backend XML parser allows XML external entities, an attacker can abuse it to perform an XXE attack.
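Because these documents are ordinary zip archives, their XML parts can be listed with Python's `zipfile` module. `toy_docx` builds a hypothetical, heavily trimmed archive with a few part names typical of a .docx; it is not a complete Word file.

```python
import io
import zipfile

def xml_members(archive: bytes) -> list[str]:
    """Return the names of XML parts (.xml and .rels) inside a zip-based office document."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return [n for n in zf.namelist() if n.endswith((".xml", ".rels"))]

def toy_docx() -> bytes:
    """Build a tiny hypothetical archive mimicking a few .docx parts (not a valid Word file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("_rels/.rels", "<Relationships/>")
        zf.writestr("word/document.xml", "<document><body>Hello</body></document>")
        zf.writestr("docProps/core.xml", "<coreProperties><title>Report</title></coreProperties>")
    return buf.getvalue()

print(xml_members(toy_docx()))
# ['[Content_Types].xml', '_rels/.rels', 'word/document.xml', 'docProps/core.xml']
```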
Workflow
Methodology in Identifying Vulnerability
1. Upload a file and observe the response after a successful upload. Here the XML parser reads something from the docx file's configuration and returns it to the user.
2. Open file.docx in Vim (which can browse and edit zip archives directly) and identify the location where the above response is present.
3. Insert the payload at the correct parsing location.
4. Upload the file. The payload is executed.
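Steps 2 and 3 can also be scripted with `zipfile` instead of an editor. A minimal sketch with hypothetical helper names: `find_members_containing` locates the XML part that holds the echoed text, and `replace_in_member` rebuilds the archive with one string replaced. `MARKER` is a harmless placeholder; an actual payload would also need its DOCTYPE declared before the root element of that part.

```python
import io
import zipfile

def find_members_containing(archive: bytes, needle: str) -> list[str]:
    """Step 2: list the archive members whose raw bytes contain `needle`."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return [n for n in zf.namelist() if needle.encode() in zf.read(n)]

def replace_in_member(archive: bytes, member: str, old: str, new: str) -> bytes:
    """Step 3: rebuild the archive with `old` replaced by `new` inside one member."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            data = src.read(name)
            if name == member:
                data = data.replace(old.encode(), new.encode())
            dst.writestr(name, data)  # zip members cannot be edited in place
    return out.getvalue()

# Hypothetical two-part archive standing in for an uploaded document.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("docProps/core.xml", "<coreProperties><title>Report</title></coreProperties>")
sample = buf.getvalue()

print(find_members_containing(sample, "Report"))  # ['docProps/core.xml']
edited = replace_in_member(sample, "docProps/core.xml", "Report", "MARKER")
print(find_members_containing(edited, "MARKER"))  # ['docProps/core.xml']
```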