OWASP Testing Guide
  • Foreword by Eoin Keary
  • Frontispiece
  • Introduction
  • The OWASP Testing Framework
    • The Web Security Testing Framework
    • Penetration Testing Methodologies
  • Web Application Security Testing
    • Introduction and Objectives
    • Information Gathering
      • Conduct Search Engine Discovery Reconnaissance for Information Leakage (WSTG-INFO-01)
      • Fingerprint Web Server (WSTG-INFO-02)
      • Review Webserver Metafiles for Information Leakage (WSTG-INFO-03)
      • Enumerate Applications on Webserver (WSTG-INFO-04)
      • Review Webpage Content for Information Leakage (WSTG-INFO-05)
      • Identify Application Entry Points (WSTG-INFO-06)
      • Map Execution Paths Through Application (WSTG-INFO-07)
      • Fingerprint Web Application Framework (WSTG-INFO-08)
      • Fingerprint Web Application (WSTG-INFO-09)
      • Map Application Architecture (WSTG-INFO-10)
    • Configuration and Deployment Management Testing
      • Test Network Infrastructure Configuration (WSTG-CONF-01)
      • Test Application Platform Configuration (WSTG-CONF-02)
      • Test File Extensions Handling for Sensitive Information (WSTG-CONF-03)
      • Review Old Backup and Unreferenced Files for Sensitive Information (WSTG-CONF-04)
      • Enumerate Infrastructure and Application Admin Interfaces (WSTG-CONF-05)
      • Test HTTP Methods (WSTG-CONF-06)
      • Test HTTP Strict Transport Security (WSTG-CONF-07)
      • Test RIA Cross Domain Policy (WSTG-CONF-08)
      • Test File Permission (WSTG-CONF-09)
      • Test for Subdomain Takeover (WSTG-CONF-10)
      • Test Cloud Storage (WSTG-CONF-11)
      • Testing for Content Security Policy (WSTG-CONF-12)
    • Identity Management Testing
      • Test Role Definitions (WSTG-IDNT-01)
      • Test User Registration Process (WSTG-IDNT-02)
      • Test Account Provisioning Process (WSTG-IDNT-03)
      • Testing for Account Enumeration and Guessable User Account (WSTG-IDNT-04)
      • Testing for Weak or Unenforced Username Policy (WSTG-IDNT-05)
    • Authentication Testing
      • Testing for Credentials Transported over an Encrypted Channel (WSTG-ATHN-01)
      • Testing for Default Credentials (WSTG-ATHN-02)
      • Testing for Weak Lock Out Mechanism (WSTG-ATHN-03)
      • Testing for Bypassing Authentication Schema (WSTG-ATHN-04)
      • Testing for Vulnerable Remember Password (WSTG-ATHN-05)
      • Testing for Browser Cache Weaknesses (WSTG-ATHN-06)
      • Testing for Weak Password Policy (WSTG-ATHN-07)
      • Testing for Weak Security Question Answer (WSTG-ATHN-08)
      • Testing for Weak Password Change or Reset Functionalities (WSTG-ATHN-09)
      • Testing for Weaker Authentication in Alternative Channel (WSTG-ATHN-10)
      • Testing Multi-Factor Authentication (MFA) (WSTG-AUTH-11)
    • Authorization Testing
      • Testing Directory Traversal File Include (WSTG-ATHZ-01)
      • Testing for Bypassing Authorization Schema (WSTG-ATHZ-02)
      • Testing for Privilege Escalation (WSTG-ATHZ-03)
      • Testing for Insecure Direct Object References (WSTG-ATHZ-04)
      • Testing for OAuth Authorization Server Weaknesses
      • Testing for OAuth Client Weaknesses
      • Testing for OAuth Weaknesses (WSTG-ATHZ-05)
    • Session Management Testing
      • Testing for Session Management Schema (WSTG-SESS-01)
      • Testing for Cookies Attributes (WSTG-SESS-02)
      • Testing for Session Fixation (WSTG-SESS-03)
      • Testing for Exposed Session Variables (WSTG-SESS-04)
      • Testing for Cross Site Request Forgery (WSTG-SESS-05)
      • Testing for Logout Functionality (WSTG-SESS-06)
      • Testing Session Timeout (WSTG-SESS-07)
      • Testing for Session Puzzling (WSTG-SESS-08)
      • Testing for Session Hijacking (WSTG-SESS-09)
      • Testing JSON Web Tokens (WSTG-SESS-10)
    • Input Validation Testing
      • Testing for Reflected Cross Site Scripting (WSTG-INPV-01)
      • Testing for Stored Cross Site Scripting (WSTG-INPV-02)
      • Testing for HTTP Verb Tampering (WSTG-INPV-03)
      • Testing for HTTP Parameter Pollution (WSTG-INPV-04)
      • Testing for Oracle
      • Testing for MySQL
      • Testing for SQL Server
      • Testing PostgreSQL
      • Testing for MS Access
      • Testing for NoSQL Injection
      • Testing for ORM Injection
      • Testing for Client-side
      • Testing for SQL Injection (WSTG-INPV-05)
      • Testing for LDAP Injection (WSTG-INPV-06)
      • Testing for XML Injection (WSTG-INPV-07)
      • Testing for SSI Injection (WSTG-INPV-08)
      • Testing for XPath Injection (WSTG-INPV-09)
      • Testing for IMAP SMTP Injection (WSTG-INPV-10)
      • Testing for File Inclusion
      • Testing for Code Injection (WSTG-INPV-11)
      • Testing for Command Injection (WSTG-INPV-12)
      • Testing for Buffer Overflow (WSTG-INPV-13)
      • Testing for Format String Injection (WSTG-INPV-13)
      • Testing for Incubated Vulnerability (WSTG-INPV-14)
      • Testing for HTTP Splitting Smuggling (WSTG-INPV-15)
      • Testing for HTTP Incoming Requests (WSTG-INPV-16)
      • Testing for Host Header Injection (WSTG-INPV-17)
      • Testing for Server-side Template Injection (WSTG-INPV-18)
      • Testing for Server-Side Request Forgery (WSTG-INPV-19)
      • Testing for Mass Assignment (WSTG-INPV-20)
    • Testing for Error Handling
      • Testing for Improper Error Handling (WSTG-ERRH-01)
      • Testing for Stack Traces (WSTG-ERRH-02)
    • Testing for Weak Cryptography
      • Testing for Weak Transport Layer Security (WSTG-CRYP-01)
      • Testing for Padding Oracle (WSTG-CRYP-02)
      • Testing for Sensitive Information Sent via Unencrypted Channels (WSTG-CRYP-03)
      • Testing for Weak Encryption (WSTG-CRYP-04)
    • Business Logic Testing
      • Introduction to Business Logic
      • Test Business Logic Data Validation (WSTG-BUSL-01)
      • Test Ability to Forge Requests (WSTG-BUSL-02)
      • Test Integrity Checks (WSTG-BUSL-03)
      • Test for Process Timing (WSTG-BUSL-04)
      • Test Number of Times a Function Can Be Used Limits (WSTG-BUSL-05)
      • Testing for the Circumvention of Work Flows (WSTG-BUSL-06)
      • Test Defenses Against Application Misuse (WSTG-BUSL-07)
      • Test Upload of Unexpected File Types (WSTG-BUSL-08)
      • Test Upload of Malicious Files (WSTG-BUSL-09)
      • Test Payment Functionality (WSTG-BUSL-10)
    • Client-Side Testing
      • Testing for Self DOM Based Cross-Site Scripting
      • Testing for DOM-Based Cross Site Scripting (WSTG-CLNT-01)
      • Testing for JavaScript Execution (WSTG-CLNT-02)
      • Testing for HTML Injection (WSTG-CLNT-03)
      • Testing for Client-side URL Redirect (WSTG-CLNT-04)
      • Testing for CSS Injection (WSTG-CLNT-05)
      • Testing for Client-side Resource Manipulation (WSTG-CLNT-06)
      • Testing Cross Origin Resource Sharing (WSTG-CLNT-07)
      • Testing for Cross Site Flashing (WSTG-CLNT-08)
      • Testing for Clickjacking (WSTG-CLNT-09)
      • Testing WebSockets (WSTG-CLNT-10)
      • Testing Web Messaging (WSTG-CLNT-11)
      • Testing Browser Storage (WSTG-CLNT-12)
      • Testing for Cross Site Script Inclusion (WSTG-CLNT-13)
      • Testing for Reverse Tabnabbing (WSTG-CLNT-14)
    • API Testing
      • Testing GraphQL (WSTG-APIT-01)
  • Reporting
    • Reporting
    • Vulnerability Naming Schemes
  • Appendix
    • Testing Tools Resource
    • Suggested Reading
    • Fuzz Vectors
    • Encoded Injection
    • History
    • Leveraging Dev Tools
  • Testing Checklist
  • Table of Contents
  • REST Assessment Cheat Sheet
  • API Testing
Powered by GitBook
On this page
  • Summary
  • Test Objectives
  • How to Test
  • Robots
  • META Tags
  • Sitemaps
  • Security TXT
  • Humans TXT
  • Other .well-known Information Sources
  • Tools
  1. Web Application Security Testing
  2. Information Gathering

Review Webserver Metafiles for Information Leakage (WSTG-INFO-03)

PreviousFingerprint Web Server (WSTG-INFO-02)NextEnumerate Applications on Webserver (WSTG-INFO-04)

Last updated 2 years ago

ID

WSTG-INFO-03

Summary

This section describes how to test various metadata files for information leakage of the web application's path(s), or functionality. Furthermore, the list of directories that are to be avoided by Spiders, Robots, or Crawlers can also be created as a dependency for . Other information may also be collected to identify attack surface, technology details, or for use in social engineering engagement.

Test Objectives

  • Identify hidden or obfuscated paths and functionality through the analysis of metadata files.

  • Extract and map other information that could lead to a better understanding of the systems at hand.

How to Test

Any of the actions performed below with wget could also be done with curl. Many Dynamic Application Security Testing (DAST) tools such as ZAP and Burp Suite include checks or parsing for these resources as part of their spider/crawler functionality. They can also be identified using various or leveraging advanced search features such as inurl:.

Robots

Web Spiders, Robots, or Crawlers retrieve a web page and then recursively traverse hyperlinks to retrieve further web content. Their accepted behavior is specified by the of the file in the web root directory.

As an example, the beginning of the robots.txt file from sampled on 2020 May 5 is quoted below:

User-agent: *
Disallow: /search
Allow: /search/about
Allow: /search/static
Allow: /search/howsearchworks
Disallow: /sdch
...

The Disallow directive specifies which resources are prohibited by spiders/robots/crawlers. In the example above, the following are prohibited:

...
Disallow: /search
...
Disallow: /sdch
...

The robots.txt file is retrieved from the web root directory of the web server. For example, to retrieve the robots.txt from www.google.com using wget or curl:

$ curl -O -Ss http://www.google.com/robots.txt && head -n5 robots.txt
User-agent: *
Disallow: /search
Allow: /search/about
Allow: /search/static
Allow: /search/howsearchworks
...

Analyze robots.txt Using Google Webmaster Tools

  1. Sign into Google Webmaster Tools with a Google account.

  2. On the dashboard, enter the URL for the site to be analyzed.

  3. Choose between the available methods and follow the on screen instruction.

META Tags

Robots META Tag

If there is no <META NAME="ROBOTS" ... > entry then the "Robots Exclusion Protocol" defaults to INDEX,FOLLOW respectively. Therefore, the other two valid entries defined by the "Robots Exclusion Protocol" are prefixed with NO... i.e. NOINDEX and NOFOLLOW.

Based on the Disallow directive(s) listed within the robots.txt file in webroot, a regular expression search for <META NAME="ROBOTS" within each web page is undertaken and the result compared to the robots.txt file in webroot.

Miscellaneous META Information Tags

Organizations often embed informational META tags in web content to support various technologies such as screen readers, social networking previews, search engine indexing, etc. Such meta-information can be of value to testers in identifying technologies used, and additional paths/functionality to explore and test. The following meta information was retrieved from www.whitehouse.gov via View Page Source on 2020 May 05:

...
<meta property="og:locale" content="en_US" />
<meta property="og:type" content="website" />
<meta property="og:title" content="The White House" />
<meta property="og:description" content="We, the citizens of America, are now joined in a great national effort to rebuild our country and to restore its promise for all. – President Donald Trump." />
<meta property="og:url" content="https://www.whitehouse.gov/" />
<meta property="og:site_name" content="The White House" />
<meta property="fb:app_id" content="1790466490985150" />
<meta property="og:image" content="https://www.whitehouse.gov/wp-content/uploads/2017/12/wh.gov-share-img_03-1024x538.png" />
<meta property="og:image:secure_url" content="https://www.whitehouse.gov/wp-content/uploads/2017/12/wh.gov-share-img_03-1024x538.png" />
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:description" content="We, the citizens of America, are now joined in a great national effort to rebuild our country and to restore its promise for all. – President Donald Trump." />
<meta name="twitter:title" content="The White House" />
<meta name="twitter:site" content="@whitehouse" />
<meta name="twitter:image" content="https://www.whitehouse.gov/wp-content/uploads/2017/12/wh.gov-share-img_03-1024x538.png" />
<meta name="twitter:creator" content="@whitehouse" />
...
<meta name="apple-mobile-web-app-title" content="The White House">
<meta name="application-name" content="The White House">
<meta name="msapplication-TileColor" content="#0c2644">
<meta name="theme-color" content="#f5f5f5">
...

Sitemaps

A sitemap is a file where a developer or organization can provide information about the pages, videos, and other files offered by the site or application, and the relationship between them. Search engines can use this file to more intelligently explore your site. Testers can use sitemap.xml files to learn more about the site or application to explore it more completely.

The following excerpt is from Google's primary sitemap retrieved 2020 May 05.

$ wget --no-verbose https://www.google.com/sitemap.xml && head -n8 sitemap.xml
2020-05-05 12:23:30 URL:https://www.google.com/sitemap.xml [2049] -> "sitemap.xml" [1]

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">
  <sitemap>
    <loc>https://www.google.com/gmail/sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.google.com/forms/sitemaps.xml</loc>
  </sitemap>
...

Exploring from there a tester may wish to retrieve the gmail sitemap https://www.google.com/gmail/sitemap.xml:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.google.com/intl/am/gmail/about/</loc>
    <xhtml:link href="https://www.google.com/gmail/about/" hreflang="x-default" rel="alternate"/>
    <xhtml:link href="https://www.google.com/intl/el/gmail/about/" hreflang="el" rel="alternate"/>
    <xhtml:link href="https://www.google.com/intl/it/gmail/about/" hreflang="it" rel="alternate"/>
    <xhtml:link href="https://www.google.com/intl/ar/gmail/about/" hreflang="ar" rel="alternate"/>
...

Security TXT

  • Identifying further paths or resources to include in discovery/analysis.

  • Open Source intelligence gathering.

  • Finding information on Bug Bounties, etc.

  • Social Engineering.

The file may be present either in the root of the webserver or in the .well-known/ directory. Ex:

  • https://example.com/security.txt

  • https://example.com/.well-known/security.txt

Here is a real world example retrieved from LinkedIn 2020 May 05:

$ wget --no-verbose https://www.linkedin.com/.well-known/security.txt && cat security.txt
2020-05-07 12:56:51 URL:https://www.linkedin.com/.well-known/security.txt [333/333] -> "security.txt" [1]
# Conforms to IETF `draft-foudil-securitytxt-07`
Contact: mailto:security@linkedin.com
Contact: https://www.linkedin.com/help/linkedin/answer/62924
Encryption: https://www.linkedin.com/help/linkedin/answer/79676
Canonical: https://www.linkedin.com/.well-known/security.txt
Policy: https://www.linkedin.com/help/linkedin/answer/62924

Humans TXT

humans.txt is an initiative for knowing the people behind a website. It takes the form of a text file that contains information about the different people who have contributed to building the website. This file often (though not always) contains information for career or job sites/paths.

The following example was retrieved from Google 2020 May 05:

$ wget --no-verbose  https://www.google.com/humans.txt && cat humans.txt
2020-05-07 12:57:52 URL:https://www.google.com/humans.txt [286/286] -> "humans.txt" [1]
Google is built by a large team of engineers, designers, researchers, robots, and others in many different sites across the globe. It is updated continuously, and built with more tools and technologies than we can shake a stick at. If you'd like to help us out, see careers.google.com.

Other .well-known Information Sources

It would be fairly simple for a tester to review the RFC/drafts and create a list to be supplied to a crawler or fuzzer, in order to verify the existence or content of such files.

Tools

  • Browser (View Source or Dev Tools functionality)

  • curl

  • wget

  • Burp Suite

  • ZAP

The directive refers to the specific web spider/robot/crawler. For example, the User-Agent: Googlebot refers to the spider from Google while User-Agent: bingbot refers to a crawler from Microsoft. User-Agent: * in the example above applies to all .

Web spiders/robots/crawlers can the Disallow directives specified in a robots.txt file. Hence, robots.txt should not be considered as a mechanism to enforce restrictions on how web content is accessed, stored, or republished by third parties.

Web site owners can use the Google "Analyze robots.txt" function to analyze the website as part of its . This tool can assist with testing and the procedure is as follows:

<META> tags are located within the HEAD section of each HTML document and should be consistent across a web site in the event that the robot/spider/crawler start point does not begin from a document link other than webroot i.e. a . Robots directive can also be specified through use of a specific .

was ratified by the IETF as which allows websites to define security policies and contact details. There are multiple reasons this might be of interest in testing scenarios, including but not limited to:

There are other RFCs and Internet drafts which suggest standardized uses of files within the .well-known/ directory. Lists of which can be found or .

Map execution paths through application
Google Dorks
Robots Exclusion Protocol
robots.txt
Google
User-Agent
web spiders/robots/crawlers
intentionally ignore
Google Webmaster Tools
deep link
META tag
security.txt
RFC 9116 - A File Format to Aid in Security Vulnerability Disclosure
here
here