Alongside individual research methods, it is important to use every available avenue of information gathering. Some of the most reliable information comes from carefully curated databases and newsgroups, in addition to general web search. This section covers the basics of these methods, along with how to subscribe to newsgroups.
Searching the Web
The Web is the greatest collection of information the world has ever seen. It contains billions of pages of information about every topic known to man; so much information you could never exhaust the content that’s on the Web right now, to say nothing of the thousands of new websites launched every day. Search engines are the gateway used to access all of this information, and without good search engines, the amazing wealth of information that is only a few keystrokes away would remain largely inaccessible.
An Indexing and Retrieval System
Search engines work by doing three things:
- Scouring the web for content.
- Indexing the content they find.
- Returning the results that are most relevant to a search term by comparing the term to the indexed content.
Web crawlers, sometimes called spiders, locate a web page and send its contents back to the search engine index. The web crawler then follows every link on the web page to capture all of the content presented on the website, as well as the websites that the page links to. In this way, web crawlers capture not only the content of a web page but also the links that tie different websites together.
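The link-following step can be illustrated with a small sketch. It uses only Python's standard library and works on an HTML string rather than fetching anything from the network. The names `LinkCollector` and `extract_links`, and the example URLs, are made up for illustration; a real crawler does far more (politeness rules, de-duplication, scheduling).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> found on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return the links a crawler would queue up after reading this page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


page = '<p><a href="/about">About</a> <a href="https://example.org/">Other site</a></p>'
print(extract_links("https://example.com/index.html", page))
# ['https://example.com/about', 'https://example.org/']
```

The first link stays on the same website, while the second points to a different one, which is how a crawler moves from site to site.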
The information sent back by a web crawler is analyzed by the search engine and indexed. The search engine indexes web pages based on the web page’s title, headings, paragraph content, subheadings, and any other information contained on the page. The search engine uses this information to determine which topics the page provides relevant information about.
When a user visits a search engine and enters a search term, or query, the keywords are compared against the contents of the search engine index. The relevant pages, sorted by relevance as determined by the search engine, are then returned to the user.
The way that each search engine decides which results to return is called the search engine algorithm. Each search engine’s algorithm is proprietary, and varies to some degree, meaning that each search engine will return somewhat different results for the same search terms.
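To make the index-then-retrieve idea concrete, here is a toy sketch of an inverted index in Python. It is not how any real search engine ranks pages, since those algorithms are proprietary. The page addresses, the helper names (`tokenize`, `build_index`, `search`), and the "count the matching words" score are all assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of pages that contain it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, pages, query):
    """Return pages containing every query word, highest word count first."""
    words = tokenize(query)
    if not words:
        return []
    matching = set.intersection(*(index.get(w, set()) for w in words))

    def score(url):
        tokens = tokenize(pages[url])
        return sum(tokens.count(w) for w in words)

    # Ties are broken alphabetically so the order is predictable.
    return sorted(matching, key=lambda url: (-score(url), url))


pages = {
    "example.com/gopher": "Gopher is a menu based retrieval system",
    "example.com/ftp": "FTP copies files from one system to another system",
    "example.com/wais": "WAIS is a full text retrieval system",
}
index = build_index(pages)
print(search(index, pages, "retrieval system"))
# ['example.com/gopher', 'example.com/wais']
```

Swapping in a different `score` function changes the order of the results, which is a small-scale version of why different search engines return different results for the same query.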
Getting the Most out of a Search Engine
Have you ever searched for something on the Web and been unable to find exactly what you were looking for? If you don’t know them already, there are a few advanced searching tricks you can employ to get the most out of a search engine and to track down hard-to-find information.
Most search engines allow the use of Boolean operators, which are words you can use to refine your search. The most common Boolean operators are AND, OR, and NOT. Combining search terms with these operators will yield more specific results. If you’re looking for information on a specific website, a useful trick is to limit your search to just that domain by adding site:domain-to-be-searched.com to your search. Another helpful technique is to limit your search to results that match a phrase exactly by surrounding the phrase in quotation marks like this: “exact phrase you are searching for”.
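As a rough illustration, the operators above can be combined into one query string. The helper `build_query` and its parameter names are hypothetical, and the exact operator syntax (including the parentheses used here to group OR terms) varies between search engines, so treat this as a sketch rather than a reference.

```python
from urllib.parse import quote_plus


def build_query(all_of=(), any_of=(), none_of=(), phrase=None, site=None):
    """Assemble a search string from Boolean terms, an exact phrase and a site filter."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')  # exact-phrase match
    if all_of:
        parts.append(" AND ".join(all_of))
    if any_of:
        # Grouping with parentheses is an assumption; check your engine's help pages.
        parts.append("(" + " OR ".join(any_of) + ")")
    parts.extend(f"NOT {term}" for term in none_of)
    if site:
        parts.append(f"site:{site}")  # restrict results to one domain
    return " ".join(parts)


query = build_query(
    all_of=["gopher", "protocol"],
    none_of=["rodent"],
    phrase="menu-based search",
    site="example.com",
)
print(query)
# "menu-based search" gopher AND protocol NOT rodent site:example.com

# URL-encoded form, ready to be placed in a search engine's query parameter.
print(quote_plus(query))
```

Typing the printed query directly into a search box has the same effect as building it in code. The encoded form only matters when a program constructs the search URL.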
There are many other tools and operators you can use to get the most out of a search engine. If this is a topic you want to learn more about, each search engine’s own advanced search help pages are a good place to get started.
Search Engines You Should Know
The three most popular search engines are Google, Yahoo!, and Bing. Google is by far the most used, and the combined volume of the top three engines makes up more than 80% of global search engine volume. If you add in the volume of the leading Chinese-language search engine, Baidu, you are left with less than 5% of total search engine volume to spread between all other search engines.
Google was founded in 1998, and today dominates the search engine market with more than 60% of global market share. In addition to keyword searches that return web pages, Google allows you to search for several other types of content such as:
- Images and videos
- Shopping pages
- News articles
- Scholarly articles
- Airline flights
Yahoo!, founded in 1995, and Bing, unveiled by Microsoft in 2009, both work very similarly to Google and offer comparable content searching capabilities. However, due to algorithm differences, search results produced by each search engine will vary from the results provided by the others.
Google is available in 123 different languages, Yahoo! in more than 30, and Bing in 40. The availability of each in multiple languages has driven the adoption of all three search engines around the world. Baidu, on the other hand, is focused on the Chinese search engine market. This focus has made Baidu the market leader in China, commanding more than half of all of China’s internet search queries.
All of the leading search engines depend on advertising revenue and track user behavior to deliver targeted ads. Most users are aware that this is happening and don’t care. However, enough users do want to maintain their anonymity that a few search engines have appeared that do not collect user data. The most noteworthy search engine in this category is DuckDuckGo.
Newsgroups — message forums run on the Usenet system — are a valuable place to find information. Unfortunately, they can be a little difficult to search.
What is the Usenet?
The Usenet is a distributed messaging system that runs over the internet. It runs on a different protocol than the World Wide Web and predates the Web by several years. The system itself runs in a manner similar to email, and the user experience is similar to bulletin boards and internet forums. (In fact, many of the early users and communities from Usenet migrated to web-based message forums when they became available.)
Why search Usenet?
Usenet is a valuable repository of community information, much of which is not really duplicated anywhere on the public web. That is because Usenet is primarily a conversation-based system, so searching the Usenet allows you to drop in and out of conversations among people who are fairly dedicated to the topic being discussed.
What types of things can I find on Usenet?
Usenet is a platform for discussion and has also evolved in the last few years into a platform for file sharing.
File sharing newsgroups are mostly in the alt.binaries hierarchies, and contain a lot of material, very little of which would be discussed in polite company; there are a lot of illicit copies of copyright-protected materials, and there is a lot of pornographic and erotic content. Most of this material can be found in other places.
There are Newsgroups on all sorts of topics, from cooking to knitting to civil war re-enactments.
Newsgroups can be an especially helpful place to find information on technology, especially older or more advanced computer science concepts. This is largely because of the demographics of the people who regularly use Newsgroups. You aren’t likely to find as much information on the latest pop star, but if you need help with a command-line text processing tool, Newsgroups might be the place to find what you are looking for.
There are many conversations on Usenet among “power users” of various technologies, especially the less commercially popular things like Linux or older programming languages.
How to search for information on the Usenet
If you want to search for images and video files in the file-sharing groups (mostly in the alt.binaries hierarchies), you will likely need to subscribe to a premium Usenet service to do so.
If you are looking for text content — actual conversations between legitimate Usenet users — you can do that without paying for a premium membership.
Google Groups provides a completely searchable archive of messages from many of the most important Newsgroups hierarchies. Of the major groups, only the alt.* groups are not included. The best thing about Google’s archives is that they are very extensive. Google bought the archive from another major provider, and it has messages going back all the way to the early 1980s. There aren’t any websites from that period, so this is a valuable resource.
If you want to include the alt.* groups in your search, check out the archive at Eternal September. It includes all the non-binary alt groups, the “Big 8,” and most regional groups. The problem with Eternal September is that it isn’t intended as a historical archive: messages are only kept for a few months.
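Because a newsgroup's name encodes its hierarchy, a few lines of Python are enough to tell which of the categories mentioned above a group falls into. This is only a sketch: the function name, the category labels, and the sample group names are chosen for illustration. The "Big 8" set lists the eight traditional top-level hierarchies.

```python
# The eight traditional top-level hierarchies known as the "Big 8".
BIG_8 = {"comp", "humanities", "misc", "news", "rec", "sci", "soc", "talk"}


def classify_group(name):
    """Classify a newsgroup by the leading components of its dotted name."""
    parts = name.lower().split(".")
    if parts[:2] == ["alt", "binaries"]:
        return "alt.binaries (file sharing)"
    if parts[0] == "alt":
        return "alt (text)"
    if parts[0] in BIG_8:
        return "Big 8"
    return "other (regional or local)"


# Sample names used purely for illustration.
for group in ["comp.lang.c", "rec.food.cooking", "alt.folklore.computers",
              "alt.binaries.misc", "uk.rec.gardening"]:
    print(f"{group:26} {classify_group(group)}")
```

A filter like this makes it easy to see in advance whether a group is likely to be covered by a text-only archive or to require a service that carries the binary groups.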
Gopher, FTP, WAIS, Archie, Veronica, Jughead.
What do all these strange terms mean? Basically, these are older tools or network services that represent different ways of searching and retrieving files on the Internet. With the development of the World Wide Web, most of these services declined in popularity, and you really don't need to know the gory details of how to use these anymore, but in case you're interested, read on.
Gopher is an application that organizes access to Internet resources using a menu-based search and retrieval system. It indexes the many databases, online library catalogs, bulletin board systems and campus-wide information services available on the Internet, by subject, type of service, or geographic location. While you are “sniffing” around Gopherspace, you are actually doing things that are not obviously visible to you, like transferring files, changing directories, connecting to computers and querying servers all over the world.
Gopher automatically takes care of finding whatever data you want, no matter where it is. You may use a dozen or more different Gopher servers in a single session, but you hardly know it. You need a Gopher client to access Gopher. The good news is that several open-source browsers, including Lynx, support the Gopher protocol.
Veronica is an acronym for Very Easy Rodent-Oriented Net-wide Index to Computerized Archives. Whew, that's a mouthful! It's an application that offers a keyword search of most Gopher server menu titles. A Veronica search produces a menu of Gopher items, each of which is a direct pointer to a Gopher data source. Jughead is another, less powerful search utility for Gopher.
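For the curious, each item in a Gopher menu is a single line of text: a one-character item type followed by a display title, then a selector, a host, and a port, separated by tab characters. The sketch below parses one such line. The `GopherItem` type, the short type table, and the sample host are assumptions made for the example, and only three of the standard item types are named.

```python
from typing import NamedTuple

# A few of the standard item types; anything else is reported as "other".
ITEM_TYPES = {"0": "text file", "1": "menu", "7": "search service"}


class GopherItem(NamedTuple):
    kind: str
    title: str
    selector: str
    host: str
    port: int


def parse_menu_line(line):
    """Parse one tab-separated Gopher menu line into its fields."""
    line = line.rstrip("\r\n")
    fields = line[1:].split("\t")
    if len(fields) < 4:
        raise ValueError(f"not a Gopher menu line: {line!r}")
    title, selector, host, port = fields[:4]
    return GopherItem(ITEM_TYPES.get(line[0], "other"), title, selector, host, int(port))


item = parse_menu_line("1Campus Information\t/campus\tgopher.example.edu\t70\r\n")
print(item.kind, "|", item.title, "|", item.host, item.port)
# menu | Campus Information | gopher.example.edu 70
```

Each parsed item carries its own host and port, which is why following menu entries can move you between many servers without you noticing.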
FTP stands for File Transfer Protocol, which is a widely used method of copying files from one system to another on the Internet. With FTP you can list the files in a directory and upload or download files to and from that directory.
We should point out that the FTP protocol is still widely used to move files and directories on and off of password-protected web servers. However, in the past, FTP was also commonly used as a way to host files for public access. While this is still done on occasion, the practice has largely fallen by the wayside.
The transfer of publicly available information is one of the most widespread uses of the file transfer capability on the Internet. Many organizations connected to the Internet provide openly accessible file transfer sites with information that anyone can obtain. Files are stored in “open” areas of computers. You access them by using FTP to connect to those systems. These are called Anonymous FTP sites because to access them you log in with the word anonymous, and use your e-mail address as the password. If you are not using a web browser with built-in FTP capability, or if you want to upload files to a remote server, you need an FTP client program.
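The anonymous login described above can be sketched with Python's standard `ftplib` module. The function names, and the host, directory, and e-mail address in the usage note, are placeholders rather than a real server. The sketch assumes the server permits anonymous access.

```python
from ftplib import FTP


def list_public_files(ftp, directory, email):
    """Log in anonymously on an already-connected FTP object and list a directory."""
    # By convention the user name is "anonymous" and the password is your e-mail address.
    ftp.login(user="anonymous", passwd=email)
    ftp.cwd(directory)
    return ftp.nlst()


def fetch_listing(host, directory, email):
    """Connect to a host, list one public directory, and close the session."""
    # FTP(host) connects immediately; leaving the with-block ends the session.
    with FTP(host, timeout=30) as ftp:
        return list_public_files(ftp, directory, email)
```

A call such as `fetch_listing("ftp.example.org", "/pub", "you@example.com")` would return the file names in that directory. A graphical FTP client performs the same login and listing steps behind the scenes.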
If you go to one of the Internet software sites, like Download.com, you can find many FTP programs. We like WS_FTP for Windows and Fetch for the Mac. FTP is also built into many applications, like Web management tools, word processors and so on.
Archie is to FTP what Veronica is to Gopher. It lets you search publicly available FTP sites for files that match the keyword you are searching for. To use it you have to log in to an Archie server and type some commands. It will do a search and turn up a list of all the sites that have what you're looking for.
WAIS is an acronym for Wide Area Information Servers. It's a networked information retrieval system. Unlike Gopher, which searches files by their titles, WAIS servers search the full text of files and return a list of documents that contain the keyword you are searching for. The WAIS method of search and retrieval is what most search engines on the Web are based on.
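The difference between searching titles and searching full text is easy to see in a small sketch. The documents and function names below are invented for illustration, and real WAIS servers used considerably more elaborate indexing than a simple substring check.

```python
documents = {
    "Campus Phone Directory": "Staff phone numbers and office hours for the library.",
    "Library Catalog": "Search books, journals and maps held by the library.",
    "Weather Reports": "Daily forecasts for the campus and the surrounding region.",
}


def title_search(docs, keyword):
    """Match the keyword against titles only."""
    keyword = keyword.lower()
    return [title for title in docs if keyword in title.lower()]


def full_text_search(docs, keyword):
    """Match the keyword against the body text of each document."""
    keyword = keyword.lower()
    return [title for title, body in docs.items() if keyword in body.lower()]


print(title_search(documents, "library"))
# ['Library Catalog']
print(full_text_search(documents, "library"))
# ['Campus Phone Directory', 'Library Catalog']
```

The full-text search finds the phone directory even though its title never mentions the library, which is the kind of result a title-only search misses.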
Subscribing to Groups and Reading Postings
A Usenet newsgroup is a collection of notes and messages on a single topic, redistributed among Usenet servers. Newsgroups allow people to discuss topics of interest, and once you know how to access them, you can read about countless topics. To access Usenet directly, you will need an account with a Usenet provider.
You can sign up with a Usenet provider either by paying or by starting a free trial. There are many large Usenet providers, such as Newshosting and Giganews. After you sign up, the provider will email you credentials, which you then enter into your newsreader app’s settings. Some Usenet providers include their own newsreaders, or you may choose to use a standalone app.
You will then have to download a listing of all of the newsgroups using your newsreader app. The app fetches the complete list of available newsgroups, so this can take some time.
How to Find and Subscribe to Newsgroups
Newsgroups can be read either with a desktop newsreader app or with a reader built into your web browser or e-mail program, or on the web through a service such as Google Groups. Specialized desktop programs allow you to manage the huge volume of messages in many newsgroups. Those that are built into web browsers are more limited in what they can do.
Before you can read a newsgroup, you must first subscribe to it. Once you have the listing of all the newsgroups, you can search for ones that you may be interested in. However, searching within the newsreader will not always help you find the right newsgroups. This is because many newsgroup names do not reflect their contents, and you may not think to search for others at all.
It is easier to use Google to find newsgroups that you may want to subscribe to. There are also many web pages dedicated to helping you find newsgroups to subscribe to based on your interests. For example, Wikipedia has a list of the 8 most widely-used and distributed newsgroup hierarchies if you need somewhere to start. Once you find a newsgroup that you want to read, you can easily subscribe to it in the newsreader app.
Once subscribed, you can visit those newsgroups whenever you like. You can check just one or all of the newsgroups you've subscribed to. The simplest approach is to download all of the new responses from all newsgroups at the same time and then read them.
The newsreader software will display a list of all new responses for each newsgroup. The best way to manage this is to have your reader thread the postings so that related responses are grouped together. To do this, order the postings by subject. That way you will see the postings and the responses together. You can easily unsubscribe from newsgroups you no longer want to follow.
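Grouping by subject, as described above, amounts to ignoring any leading "Re:" and collecting postings that share what remains. The sketch below shows the idea in Python. The posting data and function names are invented, and an actual newsreader may use additional message headers to build its threads.

```python
import re
from collections import defaultdict

# Matches one or more leading "Re:" prefixes, in any letter case.
REPLY_PREFIX = re.compile(r"^(\s*re\s*:\s*)+", re.IGNORECASE)


def thread_key(subject):
    """Strip reply prefixes so a posting and its responses share one key."""
    return REPLY_PREFIX.sub("", subject).strip().lower()


def group_by_subject(postings):
    """Collect postings into threads, keeping their original order within each thread."""
    threads = defaultdict(list)
    for posting in postings:
        threads[thread_key(posting["subject"])].append(posting)
    return dict(threads)


postings = [
    {"author": "ann", "subject": "sed one-liner for joining lines"},
    {"author": "bob", "subject": "Re: sed one-liner for joining lines"},
    {"author": "cy", "subject": "awk field separators"},
    {"author": "ann", "subject": "RE: Re: sed one-liner for joining lines"},
]

for subject, thread in group_by_subject(postings).items():
    print(subject, [p["author"] for p in thread])
# sed one-liner for joining lines ['ann', 'bob', 'ann']
# awk field separators ['cy']
```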