OWASP Testing Guide
  • Foreword by Eoin Keary
  • Frontispiece
  • Introduction
  • The OWASP Testing Framework
    • The Web Security Testing Framework
    • Penetration Testing Methodologies
  • Web Application Security Testing
    • Introduction and Objectives
    • Information Gathering
      • Conduct Search Engine Discovery Reconnaissance for Information Leakage (WSTG-INFO-01)
      • Fingerprint Web Server (WSTG-INFO-02)
      • Review Webserver Metafiles for Information Leakage (WSTG-INFO-03)
      • Enumerate Applications on Webserver (WSTG-INFO-04)
      • Review Webpage Content for Information Leakage (WSTG-INFO-05)
      • Identify Application Entry Points (WSTG-INFO-06)
      • Map Execution Paths Through Application (WSTG-INFO-07)
      • Fingerprint Web Application Framework (WSTG-INFO-08)
      • Fingerprint Web Application (WSTG-INFO-09)
      • Map Application Architecture (WSTG-INFO-10)
    • Configuration and Deployment Management Testing
      • Test Network Infrastructure Configuration (WSTG-CONF-01)
      • Test Application Platform Configuration (WSTG-CONF-02)
      • Test File Extensions Handling for Sensitive Information (WSTG-CONF-03)
      • Review Old Backup and Unreferenced Files for Sensitive Information (WSTG-CONF-04)
      • Enumerate Infrastructure and Application Admin Interfaces (WSTG-CONF-05)
      • Test HTTP Methods (WSTG-CONF-06)
      • Test HTTP Strict Transport Security (WSTG-CONF-07)
      • Test RIA Cross Domain Policy (WSTG-CONF-08)
      • Test File Permission (WSTG-CONF-09)
      • Test for Subdomain Takeover (WSTG-CONF-10)
      • Test Cloud Storage (WSTG-CONF-11)
      • Testing for Content Security Policy (WSTG-CONF-12)
    • Identity Management Testing
      • Test Role Definitions (WSTG-IDNT-01)
      • Test User Registration Process (WSTG-IDNT-02)
      • Test Account Provisioning Process (WSTG-IDNT-03)
      • Testing for Account Enumeration and Guessable User Account (WSTG-IDNT-04)
      • Testing for Weak or Unenforced Username Policy (WSTG-IDNT-05)
    • Authentication Testing
      • Testing for Credentials Transported over an Encrypted Channel (WSTG-ATHN-01)
      • Testing for Default Credentials (WSTG-ATHN-02)
      • Testing for Weak Lock Out Mechanism (WSTG-ATHN-03)
      • Testing for Bypassing Authentication Schema (WSTG-ATHN-04)
      • Testing for Vulnerable Remember Password (WSTG-ATHN-05)
      • Testing for Browser Cache Weaknesses (WSTG-ATHN-06)
      • Testing for Weak Password Policy (WSTG-ATHN-07)
      • Testing for Weak Security Question Answer (WSTG-ATHN-08)
      • Testing for Weak Password Change or Reset Functionalities (WSTG-ATHN-09)
      • Testing for Weaker Authentication in Alternative Channel (WSTG-ATHN-10)
      • Testing Multi-Factor Authentication (MFA) (WSTG-AUTH-11)
    • Authorization Testing
      • Testing Directory Traversal File Include (WSTG-ATHZ-01)
      • Testing for Bypassing Authorization Schema (WSTG-ATHZ-02)
      • Testing for Privilege Escalation (WSTG-ATHZ-03)
      • Testing for Insecure Direct Object References (WSTG-ATHZ-04)
      • Testing for OAuth Authorization Server Weaknesses
      • Testing for OAuth Client Weaknesses
      • Testing for OAuth Weaknesses (WSTG-ATHZ-05)
    • Session Management Testing
      • Testing for Session Management Schema (WSTG-SESS-01)
      • Testing for Cookies Attributes (WSTG-SESS-02)
      • Testing for Session Fixation (WSTG-SESS-03)
      • Testing for Exposed Session Variables (WSTG-SESS-04)
      • Testing for Cross Site Request Forgery (WSTG-SESS-05)
      • Testing for Logout Functionality (WSTG-SESS-06)
      • Testing Session Timeout (WSTG-SESS-07)
      • Testing for Session Puzzling (WSTG-SESS-08)
      • Testing for Session Hijacking (WSTG-SESS-09)
      • Testing JSON Web Tokens (WSTG-SESS-10)
    • Input Validation Testing
      • Testing for Reflected Cross Site Scripting (WSTG-INPV-01)
      • Testing for Stored Cross Site Scripting (WSTG-INPV-02)
      • Testing for HTTP Verb Tampering (WSTG-INPV-03)
      • Testing for HTTP Parameter Pollution (WSTG-INPV-04)
      • Testing for Oracle
      • Testing for MySQL
      • Testing for SQL Server
      • Testing PostgreSQL
      • Testing for MS Access
      • Testing for NoSQL Injection
      • Testing for ORM Injection
      • Testing for Client-side
      • Testing for SQL Injection (WSTG-INPV-05)
      • Testing for LDAP Injection (WSTG-INPV-06)
      • Testing for XML Injection (WSTG-INPV-07)
      • Testing for SSI Injection (WSTG-INPV-08)
      • Testing for XPath Injection (WSTG-INPV-09)
      • Testing for IMAP SMTP Injection (WSTG-INPV-10)
      • Testing for File Inclusion
      • Testing for Code Injection (WSTG-INPV-11)
      • Testing for Command Injection (WSTG-INPV-12)
      • Testing for Buffer Overflow (WSTG-INPV-13)
      • Testing for Format String Injection (WSTG-INPV-13)
      • Testing for Incubated Vulnerability (WSTG-INPV-14)
      • Testing for HTTP Splitting Smuggling (WSTG-INPV-15)
      • Testing for HTTP Incoming Requests (WSTG-INPV-16)
      • Testing for Host Header Injection (WSTG-INPV-17)
      • Testing for Server-side Template Injection (WSTG-INPV-18)
      • Testing for Server-Side Request Forgery (WSTG-INPV-19)
      • Testing for Mass Assignment (WSTG-INPV-20)
    • Testing for Error Handling
      • Testing for Improper Error Handling (WSTG-ERRH-01)
      • Testing for Stack Traces (WSTG-ERRH-02)
    • Testing for Weak Cryptography
      • Testing for Weak Transport Layer Security (WSTG-CRYP-01)
      • Testing for Padding Oracle (WSTG-CRYP-02)
      • Testing for Sensitive Information Sent via Unencrypted Channels (WSTG-CRYP-03)
      • Testing for Weak Encryption (WSTG-CRYP-04)
    • Business Logic Testing
      • Introduction to Business Logic
      • Test Business Logic Data Validation (WSTG-BUSL-01)
      • Test Ability to Forge Requests (WSTG-BUSL-02)
      • Test Integrity Checks (WSTG-BUSL-03)
      • Test for Process Timing (WSTG-BUSL-04)
      • Test Number of Times a Function Can Be Used Limits (WSTG-BUSL-05)
      • Testing for the Circumvention of Work Flows (WSTG-BUSL-06)
      • Test Defenses Against Application Misuse (WSTG-BUSL-07)
      • Test Upload of Unexpected File Types (WSTG-BUSL-08)
      • Test Upload of Malicious Files (WSTG-BUSL-09)
      • Test Payment Functionality (WSTG-BUSL-10)
    • Client-Side Testing
      • Testing for Self DOM Based Cross-Site Scripting
      • Testing for DOM-Based Cross Site Scripting (WSTG-CLNT-01)
      • Testing for JavaScript Execution (WSTG-CLNT-02)
      • Testing for HTML Injection (WSTG-CLNT-03)
      • Testing for Client-side URL Redirect (WSTG-CLNT-04)
      • Testing for CSS Injection (WSTG-CLNT-05)
      • Testing for Client-side Resource Manipulation (WSTG-CLNT-06)
      • Testing Cross Origin Resource Sharing (WSTG-CLNT-07)
      • Testing for Cross Site Flashing (WSTG-CLNT-08)
      • Testing for Clickjacking (WSTG-CLNT-09)
      • Testing WebSockets (WSTG-CLNT-10)
      • Testing Web Messaging (WSTG-CLNT-11)
      • Testing Browser Storage (WSTG-CLNT-12)
      • Testing for Cross Site Script Inclusion (WSTG-CLNT-13)
      • Testing for Reverse Tabnabbing (WSTG-CLNT-14)
    • API Testing
      • Testing GraphQL (WSTG-APIT-01)
  • Reporting
    • Reporting
    • Vulnerability Naming Schemes
  • Appendix
    • Testing Tools Resource
    • Suggested Reading
    • Fuzz Vectors
    • Encoded Injection
    • History
    • Leveraging Dev Tools
  • Testing Checklist
  • Table of Contents
  • REST Assessment Cheat Sheet
  • API Testing
Powered by GitBook
On this page
  • Summary
  • Test Objectives
  • How to Test
  • Search Engines
  • Search Operators
  • Viewing Cached Content
  • Google Hacking, or Dorking
  • Remediation
  1. Web Application Security Testing
  2. Information Gathering

Conduct Search Engine Discovery Reconnaissance for Information Leakage (WSTG-INFO-01)

PreviousInformation GatheringNextFingerprint Web Server (WSTG-INFO-02)

Last updated 2 years ago

ID

WSTG-INFO-01

Summary

In order for search engines to work, computer programs (or robots) regularly fetch data (referred to as ) from billions of pages on the web. These programs find web content and functionality by following links from other pages, or by looking at sitemaps. If a website uses a special file called robots.txt to list pages that it does not want search engines to fetch, then the pages listed there will be ignored. This is a basic overview - Google offers a more in-depth explanation of .

Testers can use search engines to perform reconnaissance on websites and web applications. There are direct and indirect elements to search engine discovery and reconnaissance: direct methods relate to searching the indexes and the associated content from caches, while indirect methods relate to learning sensitive design and configuration information by searching forums, newsgroups, and tendering websites.

Once a search engine robot has completed crawling, it commences indexing the web content based on tags and associated attributes, such as <TITLE>, in order to return relevant search results. If the robots.txt file is not updated during the lifetime of the web site, and in-line HTML meta tags that instruct robots not to index content have not been used, then it is possible for indexes to contain web content not intended to be included by the owners. Website owners may use the previously mentioned robots.txt, HTML meta tags, authentication, and tools provided by search engines to remove such content.

Test Objectives

  • Identify what sensitive design and configuration information of the application, system, or organization is exposed directly (on the organization's website) or indirectly (via third-party services).

How to Test

Use a search engine to search for potentially sensitive information. This may include:

  • network diagrams and configurations;

  • archived posts and emails by administrators or other key staff;

  • logon procedures and username formats;

  • usernames, passwords, and private keys;

  • third-party, or cloud service configuration files;

  • revealing error message content; and

  • non-public applications (development, test, User Acceptance Testing (UAT), and staging versions of sites).

Search Engines

Do not limit testing to just one search engine provider, as different search engines may generate different results. Search engine results can vary in a few ways, depending on when the engine last crawled content, and the algorithm the engine uses to determine relevant pages. Consider using the following (alphabetically-listed) search engines:

Search Operators

A search operator is a special keyword or syntax that extends the capabilities of regular search queries, and can help obtain more specific results. They generally take the form of operator:query. Here are some commonly supported search operators:

  • site: will limit the search to the provided domain.

  • inurl: will only return results that include the keyword in the URL.

  • intitle: will only return results that have the keyword in the page title.

  • intext: or inbody: will only search for the keyword in the body of pages.

  • filetype: will match only a specific file type, i.e. .png, or .php.

For example, to find the web content of owasp.org as indexed by a typical search engine, the syntax required is:

site:owasp.org

Viewing Cached Content

To search for content that has previously been indexed, use the cache: operator. This is helpful for viewing content that may have changed since the time it was indexed, or that may no longer be available. Not all search engines provide cached content to search; the most useful source at time of writing is Google.

To view owasp.org as it is cached, the syntax is:

cache:owasp.org

Google Hacking, or Dorking

  • Footholds

  • Files containing usernames

  • Sensitive Directories

  • Web Server Detection

  • Vulnerable Files

  • Vulnerable Servers

  • Error Messages

  • Files containing juicy info

  • Files containing passwords

  • Sensitive Online Shopping Info

Remediation

Carefully consider the sensitivity of design and configuration information before it is posted online.

Periodically review the sensitivity of existing design and configuration information that is posted online.

, China's search engine.

, a search engine owned and operated by Microsoft, and the second worldwide. Supports .

, a search engine for binary Usenet newsgroups.

, "an open repository of web crawl data that can be accessed and analyzed by anyone."

, a privacy-focused search engine that compiles results from many different . Supports .

, which offers the world's search engine, and uses a ranking system to attempt to return the most relevant results. Supports .

, "building a digital library of Internet sites and other cultural artifacts in digital form."

, a service for searching Internet-connected devices and services. Usage options include a limited free plan as well as paid subscription plans.

Figure 4.1.1-1: Google Site Operation Search Result Example

Figure 4.1.1-2: Google Cache Operation Search Result Example

Searching with operators can be a very effective discovery technique when combined with the creativity of the tester. Operators can be chained to effectively discover specific kinds of sensitive files and information. This technique, called or Dorking, is also possible using other search engines, as long as the search operators are supported.

A database of dorks, such as , is a useful resource that can help uncover specific information. Some categories of dorks available on this database include:

crawling
how a search engine works
Baidu
most popular
Bing
most popular
advanced search keywords
binsearch.info
Common Crawl
DuckDuckGo
sources
search syntax
Google
most popular
search operators
Internet Archive Wayback Machine
Shodan
Google hacking
Google Hacking Database
Google Site Operation Search Result Example
Google Cache Operation Search Result Example