
Sensitive Data Exposure

NodeZero attempts to discover and assess potentially sensitive information when a filesystem or service is compromised. Examples include, but are not limited to:

  • Business documents in file shares (.docx, .pdf, .xlsx)
  • Outlook PST files
  • Confluence RCE
  • Exchange RCE

When NodeZero gains access to files, it takes different actions depending on the file type. For example, if SSH keys or .aws/credentials files are found, NodeZero attempts to extract those credentials for use in the operation. When business documents are discovered, NodeZero first extracts the text from the documents and then uses a natural language processing engine and contextual awareness to look for personally identifiable information (PII) and protected health information (PHI). Currently supported protected data types are:

  • Social Security Numbers
  • Credit Card Numbers
  • US Bank Numbers
  • US Individual Tax Identification Numbers
  • US Passport Numbers
  • ABA Routing Numbers
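
To make the idea of pattern matching combined with contextual awareness concrete, here is a minimal sketch for two of the data types listed above. It is not NodeZero's detection engine, whose rules are not described in this document. The regular expressions, the `classify` function, the label strings, and the 40-character context window are all assumptions chosen for illustration. The Luhn check for card numbers and the 3-7-1 weighted checksum for ABA routing numbers are the standard public validation schemes.

```python
import re

# Hypothetical patterns for illustration only.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13-19 digits, optional separators
ABA_RE = re.compile(r"\b\d{9}\b")                   # exactly 9 contiguous digits


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def aba_valid(digits: str) -> bool:
    """Return True if a 9-digit string passes the ABA 3-7-1 checksum."""
    if len(digits) != 9 or not digits.isdigit():
        return False
    weights = [3, 7, 1] * 3
    return sum(int(c) * w for c, w in zip(digits, weights)) % 10 == 0


def classify(text: str) -> set[str]:
    """Return the types of protected data found, never the values themselves."""
    found: set[str] = set()
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            found.add("CREDIT_CARD")
    for match in ABA_RE.finditer(text):
        # Crude stand-in for contextual awareness: require the word
        # "routing" in the 40 characters before the candidate number.
        context = text[max(0, match.start() - 40):match.start()].lower()
        if "routing" in context and aba_valid(match.group()):
            found.add("ABA_ROUTING_NUMBER")
    return found
```

Checksums and nearby keywords reduce false positives from arbitrary digit runs such as order numbers, which is the kind of problem contextual analysis is meant to address.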

To protect against accidental exposure of PII or PHI, all text extraction, natural language processing, and contextual analysis happen inside the NodeZero container, which resides in the customer's network on the customer's Docker host. Once a file has been analyzed, it is deleted from the container, and the only information kept is metadata about the exposure. That metadata includes the filename, the file share where it was located, and the type of data discovered (e.g. Credit Card).

As an example, suppose NodeZero discovers a file called customer_invoice.pdf on the C$ share of a host. The discovery of the file causes NodeZero to execute its sensitive data routines. NodeZero copies the file from the share to the NodeZero container, which is running on the Docker host in the customer's network. It extracts the text from the .pdf, processes it using pattern matching, natural language processing, and contextual awareness, and discovers a credit card number. NodeZero deletes the file from the container and reports that it found sensitive data of type CREDIT_CARD in the file customer_invoice.pdf in the C$ share on that host. The actual credit card number is not extracted or used beyond the initial classification.
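
The copy, analyze, delete, and report sequence described above can be sketched as follows. This is an illustrative outline under stated assumptions, not NodeZero's code: `ExposureRecord`, `assess_file`, and the `classify` callback are hypothetical names, and reading the file as plain text stands in for real document text extraction.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class ExposureRecord:
    """Metadata kept about an exposure; no file contents are stored."""
    filename: str
    share: str
    data_type: str


def assess_file(
    source: Path,
    share: str,
    classify: Callable[[str], set[str]],
) -> list[ExposureRecord]:
    """Copy a file locally, classify its text, delete the copy, return metadata."""
    workdir = Path(tempfile.mkdtemp(prefix="scan-"))
    try:
        local_copy = workdir / source.name
        shutil.copy2(source, local_copy)
        # Assumption: plain-text read as a stand-in for text extraction.
        text = local_copy.read_text(errors="ignore")
        data_types = classify(text)
    finally:
        # The local copy is removed whether or not analysis succeeded.
        shutil.rmtree(workdir, ignore_errors=True)
    return [ExposureRecord(source.name, share, t) for t in sorted(data_types)]
```

Because `classify` returns only type labels, the matched values never reach the returned records, which mirrors the metadata-only reporting described above.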

For efficiency, NodeZero does not look at every file exposed in a share. Limits apply to both the number of files reviewed and the size of each file. In addition, if all other actions are completed, NodeZero ends the operation even if files remain to be processed. This prevents prolonged operations when large quantities of files are exposed.
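
One way to picture the count and size limits is a generator that skips oversized files and stops after a fixed number of reviews. The sketch below is only an illustration. The names `FileEntry` and `select_for_review` are hypothetical, and the actual limit values NodeZero uses are not stated in this document.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class FileEntry:
    path: str
    size_bytes: int


def select_for_review(
    entries: Iterable[FileEntry],
    max_files: int,
    max_size_bytes: int,
) -> Iterator[FileEntry]:
    """Yield at most max_files entries, skipping any larger than max_size_bytes."""
    reviewed = 0
    for entry in entries:
        if reviewed >= max_files:
            break  # file-count budget exhausted; remaining files are not reviewed
        if entry.size_bytes > max_size_bytes:
            continue  # too large to review; does not count against the budget
        reviewed += 1
        yield entry
```

Because the function is a generator, a caller can also stop consuming it early, for example when the rest of the operation has finished.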

Sensitive Data Exposure Results in the Portal

When sensitive data exposures are discovered, they are documented in the portal under the Data tab. This tab allows you to browse through exposed file share locations. For each location, the page displays the data store type, name, host location, downstream impacts, number of sensitive resources exposed, and types of protected data exposed (PII/PHI).


Drilling into a share shows details and examples of the sensitive files and the patterns they matched, the downstream impacts, the credentials used to gain access, and any protected data exposures. You can use this data to trace a sensitive data exposure back to its origin. NodeZero cannot tell you what the exposed data was, because it does not extract or store the actual contents.