Teaching Legal Professionals How To Do Research


 Web Searching with Advanced Commands

 

Genie Tyburski, Web Manager, The Virtual Chase

 


 
 

11 October 2007, updated and corrected 15 October 2007. Finding facts, or even basic information on a topic, is relatively easy: enter 2 or 3 relevant keywords at your favorite search engine. But going beyond the basics, or conducting investigative research, often means using advanced search commands, not to mention additional or more targeted finding tools. This article examines the first of these: using advanced search commands to refine or improve search results.

Word Order Counts. Although not a command per se, the order in which you enter search terms affects how the results are ranked. Enter the keyword representing the most important concept first. Compare the results of the queries asthma allergy link and link asthma allergy.

To Filter or Not. To limit the number of results from any single Web site, Google automatically filters matches so that only 1 or 2 from the same source appear. But in intellectual property or other investigative research, you may want to see every hit from a particular site. To turn the filter off, add the parameter &filter=0 to the end of the URL Google generates when it displays the search results. See: filtered example and unfiltered example.
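If you make this edit often, it can be scripted. The following is a minimal Python sketch, assuming a results URL of the form https://www.google.com/search?q=... (the helper name unfiltered is hypothetical, and whether Google still honors filter=0 is not something this 2007 article can vouch for). It removes any existing filter value and appends filter=0:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def unfiltered(results_url: str) -> str:
    """Return results_url with filter=0 set, replacing any existing filter value."""
    parts = urlsplit(results_url)
    # Keep every query parameter except an existing "filter".
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "filter"
    ]
    params.append(("filter", "0"))  # 0 turns off the one-or-two-per-site filtering
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    print(unfiltered("https://www.google.com/search?q=pretexting+site%3Agov"))
    # https://www.google.com/search?q=pretexting+site%3Agov&filter=0
```

Parsing the query string, rather than blindly appending text, avoids ending up with two conflicting filter values in the same URL.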

Limit Scope to a Web Site. Unfortunately, some key Web sites have lousy search engines, or worse, no search engine at all. The site command (site:) limits the results of a query to a single Web site. You can also use it to limit research to a particular top-level domain (site:.gov).

Yahoo enables a refinement of sorts. The command hostname: limits a search to a particular computer name, or host. For instance, suppose you want to find the keyword bites where it appears within the accidents and injuries section of FindLaw. (I don't mean to imply that FindLaw has a lousy search engine. It does not.) Note that the URL for this section (injury.findlaw.com) begins with a distinct host name. With the hostname command, you can exclude matches from other sections of FindLaw, like this: bites hostname:injury.findlaw.com.

Note how the results differ from those for bites site:injury.findlaw.com. The hostname command is even more limiting than the site command: site: also matches any host that ends in the named domain, while hostname: matches only that exact host. In this example, hostname: excludes several articles whose URLs begin with another host name, e.g., usatoday.injury.findlaw.com.
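The difference between the two commands can be modeled in a few lines of Python. This is a rough sketch under stated assumptions, not how Yahoo actually evaluates queries: site: is treated as "host equals the domain or is a subdomain of it," and hostname: as "host matches exactly." The helper names and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    return (urlsplit(url).hostname or "").lower()


def matches_site(url: str, domain: str) -> bool:
    """Approximate site:domain: the host equals domain or is a subdomain of it."""
    domain = domain.lower().lstrip(".")  # also allows site:.gov style TLD limits
    host = host_of(url)
    return host == domain or host.endswith("." + domain)


def matches_hostname(url: str, hostname: str) -> bool:
    """Approximate hostname:name: the host must match exactly."""
    return host_of(url) == hostname.lower()


# Hypothetical result URLs, for illustration only.
urls = [
    "http://injury.findlaw.com/bites.html",
    "http://usatoday.injury.findlaw.com/bites.html",
    "http://public.findlaw.com/bites.html",
]

site_hits = [u for u in urls if matches_site(u, "injury.findlaw.com")]
host_hits = [u for u in urls if matches_hostname(u, "injury.findlaw.com")]
print(len(site_hits), len(host_hits))  # 2 1
```

The usatoday subdomain passes the site test but fails the exact host test, which mirrors the excluded articles described above.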

Limit Scope to Part of a URL. To avoid missing potentially relevant information, such as the articles mentioned in the previous paragraph, you could alter the query slightly and use the inurl command (inurl:). For example, the query bites site:findlaw.com inurl:injury finds more results than bites hostname:injury.findlaw.com and fewer than bites site:findlaw.com. Which you use depends on the research issue and how thorough you want to be.
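A small Python model shows why the inurl query falls between the other two. It rests on simplifying assumptions: site: is treated as a domain-or-subdomain test, and inurl: as a case-insensitive substring test on the whole URL. The sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def matches_site(url: str, domain: str) -> bool:
    """Approximate site:domain: the host equals domain or is a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)


def matches_inurl(url: str, word: str) -> bool:
    # Simplification: treat inurl: as a case-insensitive substring test.
    return word.lower() in url.lower()


# Hypothetical result URLs, for illustration only.
urls = [
    "http://injury.findlaw.com/bites.html",
    "http://usatoday.injury.findlaw.com/bites.html",
    "http://public.findlaw.com/bites.html",
    "http://www.example.com/injury/bites.html",
]

hostname_only = [u for u in urls if urlsplit(u).hostname == "injury.findlaw.com"]
site_and_inurl = [
    u for u in urls if matches_site(u, "findlaw.com") and matches_inurl(u, "injury")
]
site_only = [u for u in urls if matches_site(u, "findlaw.com")]
print(len(hostname_only), len(site_and_inurl), len(site_only))  # 1 2 3
```

In this model the three result sets always nest: every URL on the injury.findlaw.com host contains "injury" and sits on findlaw.com, so the combined query can only add results relative to hostname:, and can only remove them relative to site:.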

Select Advanced Search Commands

| Limit To | Ask | Exalead | Google | Live | Yahoo |
|---|---|---|---|---|---|
| Title | intitle: | intitle: | intitle:, allintitle: | intitle: | intitle: |
| URL | inurl: | inurl: | inurl:, allinurl: | NA | inurl: |
| Link Text | NA | NA | inanchor:, allinanchor: | NA | NA |
| Text | NA | NA | intext:, allintext: | NA | NA |
| Cache | NA | NA | cache: | NA | NA |
| Definition | | | define: | | |
| File Type | NA | filetype: | filetype: | filetype: | originurlextension: |
| Link To | NA | link: | link: | +link: or +linkdomain: | link: |
| Phone | address of name, phone number of name, or phone listings for name | NA | phonebook:, rphonebook: | NA | NA |
| Phonetic Spelling | NA | soundslike: | NA | NA | NA |
| Site | site: (cannot exclude, e.g., -site:.gov) | site: | site: | site: | site:, hostname: |
| Stemming | automatic | wordroot* | automatic; prevent by using +keyword | automatic; prevent by using +keyword | automatic; prevent by using +keyword |
| Stocks | stock quote ticker or ticker stock quote | | stocks: | stock or quote ticker (no colon), or ticker stock | quote ticker (no colon) |
| Weather | location weather or forecast (no colon), or forecast location | | weather location (no colon) | weather or forecast location (no colon) | weather location (no colon) |


Limit Scope to a Type of File. In company research, it's often entertaining, if not enlightening, to limit the scope of a search to certain types of files. I once limited a company search to PowerPoint presentations and found a file the company probably hadn't intended to release.

The command you use depends on the search engine. At Yahoo, it's originurlextension: (originurlextension:ppt). At Exalead, Live (Microsoft) and Google, it's filetype: (filetype:ppt). Ask does not offer the command.
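Because the operator name differs by engine, a lookup table keeps the query consistent. This Python sketch encodes only the operators listed in the article's 2007 chart; the dictionary and function names are hypothetical, and current engines may behave differently.

```python
# Operator names as listed in the article's 2007 chart; None = no such command.
FILETYPE_OPERATOR = {
    "ask": None,
    "exalead": "filetype:",
    "google": "filetype:",
    "live": "filetype:",
    "yahoo": "originurlextension:",
}


def filetype_query(keywords: str, extension: str, engine: str) -> str:
    """Build a keyword query limited to one file extension for the given engine."""
    operator = FILETYPE_OPERATOR[engine.lower()]
    if operator is None:
        raise ValueError(f"{engine} has no file-type command")
    return f"{keywords} {operator}{extension}"


print(filetype_query("companyname", "ppt", "Yahoo"))   # companyname originurlextension:ppt
print(filetype_query("companyname", "ppt", "Google"))  # companyname filetype:ppt
```

Raising an error for Ask, rather than silently dropping the limit, makes it obvious that the query would otherwise return every file type.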

Using Cached Pages. Cached pages have lots of uses in research. You might want to examine information as it previously appeared. Or the page you need might not be available. Sometimes you just want to weed out all the bells and whistles to read the matching content. Recently, one corporate security researcher told me she uses cached pages to find information about defunct companies.

For many purposes, you would simply follow the cached link in the search results. But if you use cached pages to avoid contacting the site's server (out of concern about malicious code, or to work around network filters), limit your browsing to the text-only version of the cache.

As far as I know, Google is the only search engine that lets you view just the text on a page. To do this, add &strip=1 to the cached page URL. Because the cached page loads non-text elements, such as images, from the Web site's server by default, add the parameter before you open the page rather than after following the cached link: right-click the cached link, copy its URL, paste it into the browser's address bar, and append &strip=1.
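The same edit in Python, as a minimal sketch: it takes the URL you copied from the cached link and appends strip=1 exactly once. The cache host in the example is hypothetical; paste the real link from the results page.

```python
from urllib.parse import parse_qs, urlsplit


def text_only_cache(cache_url: str) -> str:
    """Append strip=1 to a copied cached-page URL once, keeping any #fragment last."""
    base, hash_mark, fragment = cache_url.partition("#")
    if parse_qs(urlsplit(base).query).get("strip") == ["1"]:
        return cache_url  # already the text-only version
    joiner = "&" if "?" in base else "?"
    return f"{base}{joiner}strip=1{hash_mark}{fragment}"


# Hypothetical copied link, for illustration only.
copied = "http://cache.example.com/search?q=cache:www.virtualchase.com/index.shtml"
print(text_only_cache(copied))
# http://cache.example.com/search?q=cache:www.virtualchase.com/index.shtml&strip=1
```

Nothing here fetches the page; the point is to build the final URL before the browser ever requests it, so the original site's server is not contacted for images or scripts.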

Each of the search engines, except for Ask, lets you view caches of certain file types as HTML. For instance, run a query, companyname filetype:ppt. Remember to replace the filetype command with originurlextension: if you use Yahoo. Look for the cached links at Live, the preview links at Exalead, or the view as HTML links at Yahoo or Google. These options let you display the proprietary file type as HTML.

Google lets you display the cache of a particular Web page. Use the cache (cache:) command followed by the URL, like this: cache:http://www.virtualchase.com/index.shtml.

Proximity Searching. Exalead is the only search engine that provides a command (NEAR) for proximity searching. It finds keyword1 within 16 words of keyword2, in any order. For example, to find e-mail addresses at a particular domain, you might search, email NEAR domain.com.
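To make the NEAR semantics concrete, here is a rough Python model that checks a block of text rather than querying Exalead. The counting rule (the two terms' word positions differ by at most 16) and the tokenization are assumptions; the function name near is hypothetical.

```python
import re


def near(text: str, term1: str, term2: str, window: int = 16) -> bool:
    """Rough model of NEAR: both terms within `window` words of each other, either order."""
    words = re.findall(r"[\w.@-]+", text.lower())
    t1, t2 = term1.lower(), term2.lower()
    # Substring test so that "domain.com" matches a token like "jane@domain.com".
    pos1 = [i for i, w in enumerate(words) if t1 in w]
    pos2 = [i for i, w in enumerate(words) if t2 in w]
    return any(abs(i - j) <= window for i in pos1 for j in pos2)


print(near("Send email to jane@domain.com today.", "email", "domain.com"))  # True
print(near("jane@domain.com prefers email.", "email", "domain.com"))        # True
print(near("domain.com " + "word " * 20 + "email", "email", "domain.com"))  # False
```

The second example shows the "any order" behavior; the third fails because 21 words separate the terms.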

While Exalead provides a familiar command, you can simulate this query at other search engines. Google and Yahoo let you use an asterisk as a wildcard, so the query email * domain.com finds the word email followed, one or more words later, by the domain name. (15 October 2007. In the article as originally published, I stated that Ask and Live also support wildcard searches. I should have run more tests. They do not. Credit for the correction goes to search experts Gwen Harris and Greg Notess.)

At Google, you can use any number of asterisks between keywords, but doing so seems to narrow the query. While it's not always precise, 2 asterisks return matches with at least 2 words (not including stop words) between the key terms. See the difference between privacy * pretexting and privacy * * pretexting.
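A rough Python model of the asterisk behavior described above: term1 must come before term2, with at least one intervening word per asterisk. The upper limit on the gap (max_gap) is a hypothetical parameter, since search engines do not publish one, and unlike Google this sketch counts stop words.

```python
import re


def wildcard_match(text: str, query: str, max_gap: int = 5) -> bool:
    """Rough model of 'term1 * term2': term1 before term2, with at least one word
    per asterisk between them and at most max_gap words in total (an assumption)."""
    tokens = query.lower().split()
    term1, term2 = tokens[0], tokens[-1]
    stars = tokens.count("*")
    words = re.findall(r"[\w.@-]+", text.lower())
    for i, word in enumerate(words):
        if word != term1:
            continue
        # Position j leaves j - i - 1 words between the two terms.
        for j in range(i + 1 + stars, min(len(words), i + 2 + max_gap)):
            if words[j] == term2:
                return True
    return False


print(wildcard_match("privacy and pretexting", "privacy * pretexting"))              # True
print(wildcard_match("privacy and pretexting", "privacy * * pretexting"))            # False
print(wildcard_match("privacy laws prohibit pretexting", "privacy * * pretexting"))  # True
```

The second call shows why adding asterisks narrows the query: a one-word gap satisfies one asterisk but not two.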

Because this technique is a word order search, don't forget to reverse it if the word order isn't important. For instance, in a search for information about treatments for a medical condition, you might try: treatment * conditionname, and then conditionname * treatment. You could combine the two search statements to run a single query, like this: treatment * dyslexia | dyslexia * treatment. (The vertical bar represents OR at Google.)
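Building the two-direction query by hand is error-prone when the terms are long. This small Python helper, with a hypothetical name, produces the combined statement using Google's vertical-bar OR exactly as shown above:

```python
def both_orders(term1: str, term2: str, stars: int = 1) -> str:
    """OR together 'term1 * term2' and 'term2 * term1' using Google's | operator."""
    if stars < 1:
        raise ValueError("use at least one asterisk")
    gap = " ".join(["*"] * stars)
    return f"{term1} {gap} {term2} | {term2} {gap} {term1}"


print(both_orders("treatment", "dyslexia"))
# treatment * dyslexia | dyslexia * treatment
```

Passing stars=2 produces treatment * * dyslexia | dyslexia * * treatment, which applies the narrower two-asterisk form in both directions.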

Date Searching. For the most part, date searching remains a problem because the date the search engines use is a server time stamp. Recently, however, Google added the ability to restrict queries to newly indexed Web pages. This helps somewhat by limiting a query to pages that Google has recently discovered and added to its index.

To find newly indexed pages, add the parameter &as_qdr=xN to the URL for the keyword search results, where x is d (days), w (weeks) or y (years) and N is a number. For example, &as_qdr=d15 limits a search for new (to Google) Web pages on pretexting to the past 15 days.
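A minimal Python sketch that builds such a URL from scratch. It assumes the https://www.google.com/search?q=... results URL form, and the function name is hypothetical; the unit letters and the as_qdr parameter follow the description above.

```python
from urllib.parse import urlencode


def recently_indexed_url(query: str, unit: str, n: int) -> str:
    """Results URL limited to pages indexed in the last n days, weeks or years."""
    if unit not in ("d", "w", "y"):
        raise ValueError("unit must be 'd' (days), 'w' (weeks) or 'y' (years)")
    if n < 1:
        raise ValueError("n must be a positive number")
    params = {"q": query, "as_qdr": f"{unit}{n}"}
    return "https://www.google.com/search?" + urlencode(params)


print(recently_indexed_url("pretexting", "d", 15))
# https://www.google.com/search?q=pretexting&as_qdr=d15
```

Validating the unit letter up front catches typos that the search engine might otherwise ignore silently, returning unrestricted results.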

Searching with Synonyms. In initial research, you might want to conduct trial-and-error queries to discover the keywords that retrieve relevant results. This technique may be especially useful if you are unfamiliar with the topic.

While it's best to use a thesaurus to identify possible synonyms, and then string them together with OR, you can do a quick-and-dirty synonym search at Google by inserting a tilde (~) in front of the search term. For example, the query, teens addictive ~behavior, also finds matches to teens addictive personality. Note that not all keywords will have synonyms at Google. Use of the tilde before teens or addictive, for instance, will not affect the search.
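Both approaches can be sketched in Python. The synonym list here is a hypothetical thesaurus result, and the helpers only assemble query text; they do not check whether Google actually has synonyms for a given term.

```python
def or_synonyms(term: str, synonyms: list[str]) -> str:
    """Join a term and its thesaurus synonyms with OR, quoting multiword phrases."""
    alternatives = [term] + [s for s in synonyms if s != term]
    return " OR ".join(f'"{a}"' if " " in a else a for a in alternatives)


def tilde(term: str) -> str:
    """Quick-and-dirty synonym search at Google: prefix the term with ~."""
    return "~" + term


print("teens addictive " + or_synonyms("behavior", ["personality", "conduct"]))
# teens addictive behavior OR personality OR conduct
print("teens addictive " + tilde("behavior"))
# teens addictive ~behavior
```

The OR version gives you control over exactly which alternatives are searched; the tilde version hands that choice to the search engine.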

Want More? Several Web sites follow developments in search commands and power searching. A few of my favorites include Search Engine Showdown, Google Guide and ResearchBuzz.

 
 


 
 

5-star rating in The Best (and Worst) Legal Sites on the Web

Copyright: 1996 - 2008 Ballard Spahr Andrews & Ingersoll, LLP all rights reserved. Select graphics copyrighted by Jupiterimages Corporation.

Disclaimer: The materials in The Virtual Chase® are informational and provided "as is" without express or implied warranty.

 

Created: 11 October 2007
Revised: 15 May 2008 (text revisions)
URL: http://www.virtualchase.com/articles/advanced_search_commands.html

Suggestions: Genie Tyburski, tvceditor [at] virtualchase [dot] com