Although search engines like Yahoo!, Bing and Google index billions of web pages and other electronic documents, this represents only a tiny part of the total information available on the World Wide Web. To unearth the buried treasure, you have to understand how to mine the data.
Two Layers of Data
Think of the Web as having two layers: a shallow surface and an almost bottomless, deep level. In the top layer, the Surface Web, you will find all the web pages like the one that you’re now reading. This page and others like it have fixed web addresses or URLs (in this case, https://websitebuilders.com/how-to/search-the-deep-web). Also, the information contained in the page doesn’t change very often.
The Deep Web contains pages with dynamic content: data that changes frequently and can’t be indexed easily by search engines. Most of this information is stored in databases and is assembled “on the fly” when you query the database. For instance, when you search for an item on eBay, information is pulled from eBay’s database and instantly assembled on a web page for you. That page did not exist until you performed your search, which is what makes it dynamic; it was customized in response to your query. Because the page is created only when someone asks for it, search engines can’t readily index this information.
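The eBay example can be illustrated with a minimal sketch. It assumes a tiny in-memory SQLite database standing in for a site's listings; the table name, the sample rows, and the function names `build_catalog` and `render_results_page` are hypothetical and say nothing about how eBay actually works. The point is only that the results page is built at the moment of the query and has no fixed address a crawler could visit in advance.

```python
import sqlite3
from html import escape


def build_catalog():
    """Create a small in-memory database standing in for a site's listings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE listings (title TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO listings VALUES (?, ?)",
        [
            ("Alice in Wonderland (used)", 4.50),
            ("Through the Looking-Glass", 6.00),
            ("Treasure Island", 3.25),
        ],
    )
    return conn


def render_results_page(conn, query):
    """Assemble an HTML page for one search; nothing is stored between calls."""
    rows = conn.execute(
        "SELECT title, price FROM listings WHERE title LIKE ? ORDER BY title",
        (f"%{query}%",),
    ).fetchall()
    items = "".join(f"<li>{escape(t)}: ${p:.2f}</li>" for t, p in rows)
    return f"<h1>Results for {escape(query)}</h1><ul>{items}</ul>"


if __name__ == "__main__":
    # The page below exists only because this particular query was made.
    print(render_results_page(build_catalog(), "alice"))
```

A different search term produces a different page from the same database, which is why a crawler that only follows fixed links never sees most of this content.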
Other types of “deep” information include:
- Multimedia (audio, music and video)
- Photos and graphics
- Job listings
- Financial data (stock and bond prices, currency rates)
- News
- Travel-related data (airline and train schedules)
- Information on sites that require passwords
The Dark Web
When people talk about the “Deep Web” they often think of illegal or illicit activities. That’s because they’re mistaking the entirety of the Deep Web for a relatively small subset, known as the Dark Web.
So what is the Dark Web, and how does it compare to the Deep Web?
The Dark Web is a collection of information that cannot be easily searched by traditional search engines. This may be because the information is contained in password-protected areas or stored in databases, as we’ve already discussed for more conventional Deep Web content. More often, sites on the Dark Web utilize specialized programs, such as Tor, to mask their IP addresses, making it impossible for conventional search engines to locate or index them. In order to access this type of content, users need to know precisely where to look and utilize the same IP-masking technology.
The dark web – the anonymous network that empowers journalistic free speech and illegal commerce – is just one small part of the deep web. Most of the deep web consists of much more mundane content that just isn’t readily accessible to search engines.
The Dark Web has become a haven for illicit activity such as prostitution, drug trafficking, arms dealing, child pornography, and just about anything else you can imagine. Since Dark Web sites utilize IP-masking technology, they are nearly impossible to trace. And, much like the traditional Internet, the open nature of the Web means that even if authorities are able to shut down one illicit site, several will quickly rise up to take its place.
At this point, it’s important to reiterate that the Dark Web is only a small subset of the Deep Web. In order to find Dark Web content, you need to use specialized software and know where to look. You won’t accidentally fall into the Dark Web while searching Google or your library’s database search tool. From here on out, when we talk about searching the Deep Web, we’re talking about the far more prominent and perfectly legal content.
Digging Below the Surface
So how do you find Deep Web pages? Fortunately, you can uncover this wealth of information by using specialized tools designed to mine databases. For instance, let’s say you want to buy a used copy of “Alice in Wonderland.” How would you find it? Searching on eBay or Amazon.com (essentially querying their databases) will be more fruitful than using Yahoo! or Google. The same goes for job hunting. Since job postings are stored in a database, most search engines can’t find them; searching sites like Craigslist or Monster is a better way to go.
The secret to successful searching is to understand what you want to know, and then use the right Web resource to find it. Ask yourself these questions:
- Is the information time-sensitive, such as stock quotes or newspaper articles?
- Are you looking for a photo or a video clip?
- Do you want to find an MP3 music file or listen to a podcast?
- Are you searching for specific types of content, such as blogs?
- Are you looking for a special resource that is only available to a select group (and probably requires a password)?
- Are you searching for a scholarly resource, such as an article from a magazine or journal?
Start With a Traditional Search
Unless you know the exact search tool you need to use, the best place to start is the same place you start most web searches: your favorite search engine.
Yes, we just said Deep Web content was not searchable using traditional search engines, but there are two ways these search engines can help.
First, in recent years all of the major search engines have started building their own tools to search these previously “unsearchable” resources. At one time, you needed to go to a special section in Google or Yahoo! to search images. Now image results are displayed as part of a standard search (though you will still get fuller results by going to their dedicated image search page). The same is now true for much of the most popular dynamic web content.
Depending on who you ask, the Surface Web makes up between 1% and 5% of the web – the remaining 95% to 99% makes up the Deep Web.
How is this possible? The major search providers are trying to simplify your web experience, so they now include results from specialized search systems. Essentially, when you enter a search term, they’re running a traditional search and displaying results for indexed World Wide Web sites, and they’re running searches through specialized database search engines (sometimes their own database, sometimes that of a partner) and showing you those results as well. It’s as close as they have yet come to bringing the content of the Deep Web to the light.
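The blended approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a description of how any real search engine is built: the two data sources, their contents, and the function names `search_web_index`, `search_image_database`, and `blended_search` are all hypothetical stand-ins.

```python
def search_web_index(term):
    """Hypothetical stand-in for a crawler-built index of static pages."""
    index = {
        "https://example.com/alice-review": "a review of alice in wonderland",
        "https://example.com/tor-basics": "an introduction to tor",
    }
    return [url for url, text in index.items() if term.lower() in text]


def search_image_database(term):
    """Hypothetical stand-in for a specialized image database."""
    images = [
        ("alice-cover.jpg", "alice in wonderland cover"),
        ("rabbit.png", "white rabbit"),
    ]
    return [name for name, caption in images if term.lower() in caption]


def blended_search(term):
    """Run the traditional and the specialized search, grouped by source."""
    return {
        "web": search_web_index(term),
        "images": search_image_database(term),
    }


if __name__ == "__main__":
    print(blended_search("Alice"))
```

One search term goes to both sources, and the results come back together on a single page, which is roughly what you see when image or news results appear alongside ordinary web links.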
So what if a standard search doesn’t work? This is often the case, but don’t be discouraged. Try one more thing from that search window. Search for a specific type of database or search tool. Looking for scientific journal articles? Type “scientific databases.” Looking for podcasts? Type “podcast search.” While the search engine may not be able to mine the specific database you’re looking for, it can usually tell you what and where those databases are.
If you still can’t find what you’re looking for, you may need a very specialized search tool. We’ve listed some of the more common tools below, but it’s also a good idea to go old school. If you’re looking for a specific type of research or topic, check with your local library’s research librarian. They may know the exact tool you need.
And remember, when you find tools that you like, be sure to bookmark them for future use.
To Find | Try Using |
---|---|
Audio and Music Files | FindSounds, Yahoo! Music |
Books, Journals, and Reference | OAIster, Online Books Page, Library Spot |
Blogs | Alltop, Bing Blogs, IceRocket |
Databases | Science.gov, USA.gov, The WWW Virtual Library |
News | Bing News, Google News, NewsLookup, NewNow |
Newsgroups and Groups | Google Groups |
Photos and Graphics | Google Images, Picsearch, Yahoo! Image Search |
Podcasts | Podcastpedia, PodcastDirectory |
RSS Feeds | Feedage, RSS Micro |
Sound Effects | FindSounds |
Video | AOL Video Search, Google Videos, Yahoo! Video Search |
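If you keep your own list of bookmarked tools, the table above translates naturally into a simple lookup. The sketch below copies only a few rows from the table; the names `SEARCH_TOOLS` and `suggest_tools` are hypothetical, and you would extend the dictionary with whatever tools you find useful.

```python
# A partial copy of the table above: content type -> tools to try.
SEARCH_TOOLS = {
    "audio and music files": ["FindSounds", "Yahoo! Music"],
    "blogs": ["Alltop", "Bing Blogs", "IceRocket"],
    "podcasts": ["Podcastpedia", "PodcastDirectory"],
    "rss feeds": ["Feedage", "RSS Micro"],
}


def suggest_tools(content_type):
    """Return the tools listed for a content type, or an empty list."""
    return SEARCH_TOOLS.get(content_type.strip().lower(), [])


if __name__ == "__main__":
    print(suggest_tools("Podcasts"))
```

An empty result is a cue to fall back on the earlier advice: search for the type of database you need, or ask a research librarian.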
To learn more, visit the Deep Web Research Blog.