How to easily check whether a DOC, RTF, XLS, PPT, PPTX or PDF file contains malware without antivirus, like a digital forensics expert

In most cyberattack variants, threat actors use legitimate-looking documents loaded with malware, which is why researchers often say it all starts with a Word file, PowerPoint presentation, Excel spreadsheet, or even a book downloaded from a free PDF website.

This time, digital forensics experts from the International Institute of Cyber Security (IICS) will show you a simple method to manually verify any suspicious document and check whether it is loaded with malware.

Broadly speaking, all file analysis techniques include the following elements:

  • Check the document for dangerous tags and scripts
  • Detect embedded (inline) code such as shellcode, VBA macros, JavaScript, PowerShell and more
  • Extract the suspicious code or object from the file
  • If possible, deobfuscate the extracted code (although obfuscated code is very likely to be harmful in any case)

Tools for analyzing Microsoft Office files

Oletools: This is a powerful Python toolkit for analyzing Microsoft OLE2 files, primarily Microsoft Office documents such as Word or PowerPoint files, and one that digital forensics experts frequently mention.

For installation on Linux, simply run the following command:

sudo -H pip install -U oletools

On the other hand, if you want to install the tool on Windows systems, you must use the following command:

pip install -U oletools

Related tools that work alongside this package include:

PCODEDMP: This is a VBA p-code disassembler (p-code is the compiled pseudo-code form in which VBA macros are stored inside the document). Digital forensic experts mention that this tool requires oletools to function properly.

PDF analysis tools

PDF Stream Dumper: This is a Windows GUI utility for PDF analysis that is very popular in the cybersecurity community.

PDF-parser: This tool lets digital forensic experts extract individual elements from a PDF file, such as headers, links, and more, for detailed analysis.

PDFID: PDFID scans the PDF file and lists the object types and keywords it contains, with a count for each.

PEEPDF: This is a fairly powerful analysis framework that includes shellcode search, JavaScript analysis and more. PEEPDF is included by default in Kali Linux.

PDFxray: This tool provides most of the necessary utilities as separate Python scripts but, as digital forensic experts note, it requires many dependencies.

What should we look for when analyzing a PDF document?

First, digital forensic specialists recommend looking for the following parameters:

  • /OpenAction and /AA, as they can run scripts automatically
  • /JavaScript and /JS, which indicate embedded JavaScript to be run
  • /GoTo, since this action changes the visible page of the file and can automatically open and redirect to other PDF files
  • /Launch, which can start a program or open a document
  • /SubmitForm and /GoToR, which can send data to a URL
  • /RichMedia, which can be used to embed Flash
  • /ObjStm, which can hide objects inside an object stream
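The checklist above can be roughed out in a few lines of Python. This is only a minimal sketch in the spirit of PDFID, not a replacement for it: the function name count_suspicious_names is hypothetical, it reads the raw bytes only, and it will miss names that are hex-encoded or hidden inside compressed streams.

```python
import re

# PDF names taken from the checklist above.
SUSPICIOUS_NAMES = [
    b"/OpenAction", b"/AA", b"/JavaScript", b"/JS", b"/GoTo", b"/GoToR",
    b"/Launch", b"/SubmitForm", b"/RichMedia", b"/ObjStm",
]

def count_suspicious_names(data: bytes) -> dict:
    """Count how often each suspicious name appears in raw PDF bytes."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # Reject matches followed by another name character, so that
        # /GoTo is not counted when the file actually contains /GoToR.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9#])"
        counts[name.decode("ascii")] = len(re.findall(pattern, data))
    return counts

# Usage (the file name is a placeholder):
#   with open("suspicious.pdf", "rb") as handle:
#       for name, hits in count_suspicious_names(handle.read()).items():
#           if hits:
#               print(name, hits)
```

A non-zero count is not proof of malware, since many legitimate PDF files use /OpenAction or /ObjStm; it only tells you which objects deserve a closer look.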

It is rare to find clean, unobfuscated code in malicious PDF files. The simplest types of obfuscation are hex encoding, such as /J#61vaScript instead of /JavaScript, and line breaks:

/Ja\
 vascr\
 ipt
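Both tricks can be undone before searching for keywords. The sketch below is an illustration under stated assumptions: the helper names decode_pdf_name and join_continuations are hypothetical, and the second helper assumes the line breaks look like the example above (a backslash, a newline, then optional indentation).

```python
import re

def decode_pdf_name(raw: bytes) -> bytes:
    """Replace each #xx hex escape with the single byte it encodes."""
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda match: bytes([int(match.group(1), 16)]),
        raw,
    )

def join_continuations(raw: bytes) -> bytes:
    """Remove backslash-newline breaks and the indentation that follows."""
    return re.sub(rb"\\\r?\n[ \t]*", b"", raw)

print(decode_pdf_name(b"/J#61vaScript"))            # b'/JavaScript'
print(join_continuations(b"/Ja\\\n vascr\\\n ipt"))  # b'/Javascript'
```

Running a keyword search on the normalized bytes as well as on the original ones reduces the chance that a trivially disguised /JavaScript entry slips through.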

Security test

In this step, we will use a malware-laden document that exploits the flaw tracked as CVE-2017-11882.

Let’s review the VBA scripts:

olevba exploit.doc

The tool immediately prints a large number of VBA script lines and, at the end, a summary of what the code does. The next test is to analyze a PDF file using PDFID to view all the objects in the file.
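For the olevba step, the same check can also be scripted, because oletools exposes olevba as a Python module. The following is a minimal sketch assuming oletools is installed as described earlier; scan_document and format_findings are hypothetical helper names, while VBA_Parser, detect_vba_macros, analyze_macros and close come from oletools.olevba.

```python
def format_findings(results):
    """Turn (type, keyword, description) tuples into printable lines."""
    return [
        "{:<12} {:<20} {}".format(kind, keyword, description)
        for kind, keyword, description in results
    ]

def scan_document(path):
    """Return olevba's analysis results, or [] if no macros are found."""
    # Imported here so format_findings stays usable without oletools.
    from oletools.olevba import VBA_Parser

    parser = VBA_Parser(path)
    try:
        if not parser.detect_vba_macros():
            return []
        # Each result is a (type, keyword, description) tuple.
        return parser.analyze_macros()
    finally:
        parser.close()

# Usage (the file name matches the command above):
#   for line in format_findings(scan_document("exploit.doc")):
#       print(line)
```

This is handy when many files have to be triaged at once, although the command-line output remains the easier option for a single document.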

In this example, the PDF file contains /ObjStm objects. To make sure they pose no risk to our systems, we can extract these objects from the file and examine them separately using PDF-parser.
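To illustrate what such an extraction involves, object streams are typically compressed with the Flate (zlib) filter, so their contents only become readable after inflating them. The sketch below is a simplified stand-in for what PDF-parser does, not its actual implementation: inflate_streams is a hypothetical name, it ignores every other filter, and it does not parse the PDF structure.

```python
import re
import zlib

# Everything between "stream" and "endstream"; the lookbehind keeps the
# tail of "endstream" from being mistaken for the start of a new stream.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)

def inflate_streams(data: bytes) -> list:
    """Return the decompressed body of every stream zlib can inflate."""
    bodies = []
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj tolerates the end-of-line bytes that sit
            # between the compressed data and the endstream keyword.
            bodies.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            # Not Flate-compressed, or uses another filter: skip it.
            continue
    return bodies

# Usage (the file name is a placeholder):
#   with open("suspicious.pdf", "rb") as handle:
#       for body in inflate_streams(handle.read()):
#           print(body[:200])
```

The inflated bodies can then be searched for the same suspicious names as the rest of the file.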

To learn more about information security risks, malware, vulnerabilities and information technologies, feel free to visit the International Institute of Cyber Security (IICS) website.