Google is a full-text search engine: it indexes entire web pages rather than
just titles or site index pages.
Basic Boolean
Google's Boolean default is AND; that means if you enter query words without
modifiers, Google will search for all of them. If you search for:
snowblower Honda "Green Bay"
Google will search for all the words. If you want to specify that any one of
the items is acceptable, you put an OR between each item:
snowblower OR snowmobile OR "Green Bay"
If you want to definitely have one term and have one of two or more other
terms, you group them with parentheses, like this:
snowblower (snowmobile OR "Green Bay")
This query searches for the word "snowmobile" or phrase "Green
Bay" along with the word "snowblower." A stand-in for OR
borrowed from the computer programming realm is the | (pipe) character,
as in:
snowblower (snowmobile | "Green Bay")
If you want to specify that a query item must not appear in your results,
use a - (minus sign or dash).
snowblower snowmobile -"Green Bay"
This will search for pages that contain both the words "snowblower"
and "snowmobile," but not
the phrase "Green Bay."
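The three forms above (implicit AND, OR groups, and - exclusions) can be assembled mechanically. The Python sketch below is only an illustration of how the pieces fit together; the helper names quote_term and build_query are hypothetical and are not part of any Google tool.

```python
def quote_term(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(required=(), any_of=(), excluded=()) -> str:
    """Combine terms into one query string.

    required: terms that must all appear (Google's default AND)
    any_of:   terms of which at least one must appear (joined with OR)
    excluded: terms that must not appear (prefixed with -)
    """
    parts = [quote_term(t) for t in required]
    if any_of:
        group = " OR ".join(quote_term(t) for t in any_of)
        # Parentheses keep the OR group separate from the required terms.
        parts.append(f"({group})" if len(any_of) > 1 else group)
    parts.extend("-" + quote_term(t) for t in excluded)
    return " ".join(parts)


print(build_query(required=["snowblower"], any_of=["snowmobile", "Green Bay"]))
# snowblower (snowmobile OR "Green Bay")
print(build_query(required=["snowblower", "snowmobile"], excluded=["Green Bay"]))
# snowblower snowmobile -"Green Bay"
```

Building the string from lists makes it harder to forget the quotes around a phrase or the parentheses around an OR group.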
Google is not case sensitive. If you search for Three,
three, or THREE,
you're going to get the same results.
Rearranging your query can have quite an effect.
The order in which you put your keywords in a Google query can be every
bit as important as the query words themselves. Rearranging a query can
change not only your overall result count but also what results rise to
the top. While one might expect this of quote-enclosed phrases ("have
you any wool" versus "wool you any have"), it may come as a
surprise that it also affects sets of individual query words.
Even if you don't specify a search as a phrase, Google accords any occurrence
of the words as a phrase greater weight and more prominence. This is followed
by measures of adjacency between the words and then, finally, the weights
of the individual words themselves.
Example
Search for "pipe systems" gas. Now query for "pipe systems" gas gas. You'll
notice that the focus of your results changes slightly. Now try "pipe
systems" pipe pipe gas gas. Note how the focus slants back the other way.
Repetition matters when it comes to weighting the keywords in your queries.
Using keywords multiple times can have an impact on the types and number
of results you get.
Don't believe me? Try searching for internet. At the time of this writing,
Microsoft was the first result. Now try searching for internet internet. At this
writing, Yahoo! popped to the top.
10-word maximum
Google does not accept more than 10 query words. It also does not support
stemming or partial-word wildcards, but it does use * as a full-word wildcard.
Searching for "three * mice" in Google would find "three blind mice,"
"three blue mice," "three red mice," and so forth.
Google doesn't count the * wildcard toward the 10-word limit. So when you have
more than 10 words, substitute a wildcard for common words, like so:
"do as * say not as * do" quote origin English usage
Presto! Google runs the search without complaint.
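To check a query against the limit before running it, you can count its words while skipping the * wildcard. This is a minimal sketch that assumes every whitespace-separated token counts as one word; the function name counted_words and the constant WORD_LIMIT are made up for illustration.

```python
WORD_LIMIT = 10  # the query-word limit described above


def counted_words(query: str) -> int:
    """Count the words in a query, ignoring standalone * wildcards.

    Assumption: quotes only mark phrases, so they are dropped before counting.
    """
    words = query.replace('"', " ").split()
    return sum(1 for word in words if word != "*")


query = '"do as * say not as * do" quote origin English usage'
print(counted_words(query))                # 10
print(counted_words(query) <= WORD_LIMIT)  # True
```

With "I" and "you" left in place of the wildcards, the same query would count 12 words and exceed the limit.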
Common words such as "I," "a," "the,"
and "of" actually do no good in the first place. Called "stop
words," they are ignored by Google entirely. To force Google to take
a stop word into account, prepend it with a + (plus) character, as in:
+the.
Special Syntaxes
intitle:
intitle: restricts your search to the titles of web pages. The variation
allintitle: finds pages in which all the specified words appear in
the title of the web page. It's probably best to avoid the allintitle:
variation, because it doesn't mix well with some of the other syntaxes.
intitle:"george bush"
allintitle:"money supply" economics
inurl:
inurl: restricts your search to the URLs of web pages. This syntax tends
to work well
for finding search and help pages, because they tend to be rather regular
in composition.
An allinurl: variation finds all the words listed in a URL but doesn't
mix well with
some other special syntaxes.
inurl:help
allinurl:search help
intext:
intext: searches only body text (i.e., ignores link text, URLs, and titles).
There's an
allintext: variation, but again, this doesn't play well with others. While
its uses are
limited, it's perfect for finding query words that might be too common
in URLs or link
titles.
intext:"yahoo.com"
intext:html
inanchor:
inanchor: searches for text in a page's link anchors. A link anchor is
the descriptive
text of a link. For example, the link anchor in the HTML code <a
href="http://www.oreilly.com">O'Reilly and Associates</a>
is "O'Reilly and Associates."
inanchor:"tom peters"
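To see exactly which text counts as a link anchor, the sketch below pulls the anchor text out of a fragment of HTML using Python's standard html.parser module. It only illustrates the concept; it makes no claim about how Google extracts anchors, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Collect the visible text found between <a> and </a> tags."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True
            self.anchors.append("")  # start a new, empty anchor text

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.anchors[-1] += data


def anchor_texts(html: str) -> list:
    parser = AnchorTextParser()
    parser.feed(html)
    parser.close()
    return parser.anchors


sample = '<p>Visit <a href="http://www.oreilly.com">O\'Reilly and Associates</a> today.</p>'
print(anchor_texts(sample))  # ["O'Reilly and Associates"]
```

The surrounding words ("Visit", "today.") are ordinary body text, so an inanchor: search would not match on them.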
site:
site: allows you to narrow your search by either a site or a top-level
domain.
AltaVista, for example, has two syntaxes for this function (host: and
domain:), but
Google has only the one.
site:loc.gov
site:thomas.loc.gov
site:edu
site:nc.us
link:
link: returns a list of pages linking to the specified URL. Enter
link:www.google.com and you'll be returned a list of pages that link to
Google.
Don't worry about including the http:// bit; you don't need it, and, indeed,
Google
appears to ignore it even if you do put it in. link: works just as well
with "deep"
URLs—http://www.raelity.org/apps/blosxom/ for instance—as
with top-level URLs such
as raelity.org.
cache:
cache: finds a copy of the page that Google indexed even if that page
is no longer
available at its original URL or has since changed its content completely.
This is
particularly useful for pages that change often.
If Google returns a result that appears to have little to do with your
query, you're almost
sure to find what you're looking for in the latest cached version of the
page at Google.
cache:www.yahoo.com
daterange:
daterange: limits your search to a particular date or range of dates that
a page was
indexed. It's important to note that the search is not limited to when
a page was created,
but when it was indexed by Google. So a page created on February 2 and
not indexed by
Google until April 11 could be found with a daterange: search on April 11.
Remember also that Google reindexes pages. Whether the date range changes
depends on
whether the page content changed. For example, Google indexes a page on
June 1.
Google reindexes the page on August 13, but the page content hasn't changed.
The date
for the purpose of searching with daterange: is still June 1.
Note that daterange: works with Julian dates (a continuous count of days), not
Gregorian dates (the calendar we use every day). There are Gregorian/Julian
converters online.
"George Bush" daterange:2452389-2452389
neurosurgery daterange:2452389-2452389
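If you'd rather not use an online converter, the conversion is short enough to do yourself. The sketch below uses Python's standard datetime module and the fixed offset of 1721425 days between Python's date ordinals and Julian day numbers; the function names are hypothetical, and the fractional (time-of-day) part of a Julian date is ignored.

```python
from datetime import date

# Difference between a Julian day number and Python's date.toordinal(),
# where toordinal() returns 1 for January 1 of year 1.
JDN_OFFSET = 1721425


def to_julian_day(d: date) -> int:
    """Convert a Gregorian calendar date to a whole Julian day number."""
    return d.toordinal() + JDN_OFFSET


def from_julian_day(jdn: int) -> date:
    """Convert a whole Julian day number back to a Gregorian date."""
    return date.fromordinal(jdn - JDN_OFFSET)


def daterange(start: date, end: date) -> str:
    """Build a daterange: term covering start through end."""
    return f"daterange:{to_julian_day(start)}-{to_julian_day(end)}"


print(daterange(date(2002, 4, 24), date(2002, 4, 24)))
# daterange:2452389-2452389
print(from_julian_day(2452389))
# 2002-04-24
```

So the two example queries above search for pages indexed on April 24, 2002.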
filetype:
filetype: searches filename suffixes or extensions. These are usually,
but not
necessarily, different file types. I like to make this distinction, because
searching for
filetype:htm and filetype:html will give you different result counts,
even
though they're the same file type. You can even search for different page
generators, such
as ASP, PHP, CGI, and so forth—presuming the site isn't hiding them
behind redirection
and proxying. Google indexes several different Microsoft formats, including
PowerPoint (PPT), Excel (XLS), and Word (DOC).
homeschooling filetype:pdf
"leading economic indicators" filetype:ppt
related:
related:, as you might expect, finds pages that are related to the specified
page. Not
all pages are related to other pages. This is a good way to find categories
of pages; a
search for related:google.com would return a variety of search engines,
including HotBot, Yahoo!, and Northern Light.
related:www.yahoo.com
related:www.cnn.com
info:
info: provides a page of links to more information about a specified URL.
Information
includes a link to the URL's cache, a list of pages that link to that
URL, pages that are
related to that URL, and pages that contain that URL. Note that this information
is
dependent on whether Google has indexed that URL or not. If Google hasn't
indexed that
URL, information will obviously be more limited.
info:www.oreilly.com
info:www.nytimes.com/technology
phonebook:
phonebook:, as you might expect, looks up phone numbers.
phonebook:John Doe CA
phonebook:(510) 555-1212
You can mix syntaxes, but be careful or your results may be meaningless.
A good example: say you want to get an idea of what databases are offered
by the state of Texas. Run this search: intitle:search intitle:records
site:tx.us
You'll find 32 very targeted results. And of course, you can narrow down
your search even more
by adding keywords: birth intitle:search intitle:records site:tx.us
Don't mix syntaxes that will cancel each other out, such as
site:ucla.edu -inurl:ucla. Don't overuse a syntax in the same search (e.g., site:com
site:edu), and don't get too narrow (e.g., intitle:agriculture site:ucla.edu inurl:search).
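Two of the mistakes above are easy to catch before you run a search. The following Python sketch is a rough heuristic only, not a complete validator; the function name lint_query and its warning messages are invented for this example.

```python
def lint_query(query: str) -> list:
    """Return warnings for two common syntax-mixing mistakes."""
    tokens = query.split()
    sites = [t[len("site:"):] for t in tokens if t.startswith("site:")]
    excluded = [t[len("-inurl:"):] for t in tokens if t.startswith("-inurl:")]

    warnings = []
    # More than one site: restriction in the same query is overuse.
    if len(sites) > 1:
        warnings.append("more than one site: restriction")
    # Excluding a word that is part of the site name cancels the site: term.
    for site in sites:
        for word in excluded:
            if word and word in site:
                warnings.append(f"-inurl:{word} cancels out site:{site}")
    return warnings


print(lint_query("site:ucla.edu -inurl:ucla"))
# ['-inurl:ucla cancels out site:ucla.edu']
print(lint_query("site:com site:edu"))
# ['more than one site: restriction']
print(lint_query("birth intitle:search intitle:records site:tx.us"))
# []
```

Whether a query is too narrow is a judgment call that depends on the results, so the sketch does not try to detect it.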
ADVANCED SEARCH http://www.google.com/advanced_search?hl=en
PREFERENCES http://www.google.com/preferences?hl=en
To retain any changes you make on the Preferences page, you must have cookies turned on.
Filtering
Google's SafeSearch filtering affords you a method of avoiding
search results that may offend
your sensibilities. The default is no filtering. Moderate filtering rules
out explicit images, but not
explicit language. Strict filtering filters both text and images, but
be careful when you're searching for
words that might be caught by a filter, like "breast cancer."
The language tools are available by clicking "Language Tools"
on the front page or by going to
http://www.google.com/language_tools?hl=en.
Don't rely on Google's translation tools to give you more than the "gist"
of the meaning (machine translation isn't as good as a human translator).
The translation tools can still be useful. For example, select a word that
matches your topic and use the translator to translate it into another language.
(Google's translation tools work very well for single-word translations
like this.) Now, search for
that word in a country and language that don't match it. For example,
you might search for the
German word "Landstraße" (highway) on French pages in
Canada. Of course, you'll have to be
sure to use words that don't have English equivalents or you'll be overwhelmed
with results.
Specialized Vocabularies
The Glossarist site at http://www.glossarist.com
is a searchable subject index of about 6,000 different glossaries covering
a variety of different topics.
The On-Line Medical Dictionary
http://cancerweb.ncl.ac.uk/omd/
Law.com's Legal Dictionary
http://dictionary.law.com/lookup2.asp
MedTerms.com
http://www.medterms.com/
Whatis
http://whatis.techtarget.com
A searchable subject index of computer terminology, from software to telecom.
This is
especially useful if you've got a hardware- or software-specific word,
because the
definitions are divided up into categories. You can also browse alphabetically.
Annotations are good and are often cross-indexed.
Webopedia http://www.pcwebopaedia.com/
Searchable by keyword or browseable by category. Also has a list of the
newest entries on
the front page so you can check for new words.
Netlingo http://www.netlingo.com/framesindex.html
This is more Internet-oriented. This site shows up with a frame on the
left containing the
words, with the definitions on the right. It includes lots of cross-referencing
and really old
slang.
Tech Encyclopedia http://www.techweb.com/encyclopedia/
Features definitions and information on over 20,000 words. Top 10 terms
searched for are
listed so you can see if everyone else is as confused as you are. Though
entries had
before-the-listing and after-the-listing lists of words, I saw only moderate
cross-referencing.
Slang words can assist your searches
The Probert
Encyclopedia—Slang
http://www.probertencyclopaedia.com/slang.htm
Slang is from all over the world. It's often crosslinked, especially drug
slang.
A Dictionary of Slang
http://www.peevish.co.uk/slang/
This site focuses on slang heard in the United Kingdom.
Surfing for Slang
http://www.linkopp.com/members/vlaiko/slanglinks.htm
Of course each area in the world has its own slang.
With slang and specialized vocabularies, add words slowly, one word at a time, and
anticipate that each one will
narrow down your search results very quickly. For example, take the word
"spudding," often used
in association with oil drilling. Searching for spudding by itself finds
only about 2500 results
on Google. Adding Texas knocks it down to 525 results, and this is still
a very general search!
Add specialty vocabulary very carefully or you'll narrow down your search
results to the point
where you can't find what you want.
Searching for Images
http://images.google.com/advanced_image_search
Google Images indexes only JPEG and GIF files.
Google's image search starts with a plain keyword search. Images are
indexed under a variety of keywords, some broader than others; be as
specific as possible. If you're searching for cats, don't use cat as a
keyword unless you don't mind getting results that include "cat scan." Use
words that are more uniquely cat-related, like feline or kitten. Narrow
down your query as much as possible, using as few words as possible. A
query like feline fang, which would get you over 3,000 results on Google,
will get you no results on Google Image Search; in this case, cat fang
works better. (Building queries for image searching takes a lot of patience
and experimentation.)
Searching Google Images can be a real crapshoot, because it's difficult to
build multiple-word queries, and single-word queries lead to thousands of
results. You do have more options to narrow your search both through the
Advanced Image Search interface and through the Google Image
Search special syntaxes.
Google Images offers a few special syntaxes:
intitle:
Finds keywords in the page title. This is an excellent way to narrow down
search results.
filetype:
Finds pictures of a particular type. This only works for JPEG and GIF, not
BMP, PNG, or any number of other formats Google doesn't index. Note that
searching for filetype:jpg and filetype:jpeg will get you different
results, because the filtering is based on file extension, not some deeper
understanding of the file type.
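Because the match is on the extension, covering a file type fully means running one query per spelling of that extension. Here is a small Python sketch of the idea; the EXTENSION_VARIANTS table and the function name are assumptions made for this example and list only the two pairs of spellings mentioned in this document.

```python
# Alternate spellings of the same file type (illustrative, not exhaustive).
EXTENSION_VARIANTS = {
    "jpeg": ["jpg", "jpeg"],
    "html": ["htm", "html"],
}


def filetype_queries(keywords: str, file_type: str) -> list:
    """Return one query per known extension spelling for the file type."""
    extensions = EXTENSION_VARIANTS.get(file_type, [file_type])
    return [f"{keywords} filetype:{ext}" for ext in extensions]


for query in filetype_queries("kitten", "jpeg"):
    print(query)
# kitten filetype:jpg
# kitten filetype:jpeg
```

Run each query separately and combine the results by hand; a type with a single spelling, such as gif, produces just one query.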
inurl:
As with any regular Google search, finds the search term in the URL. The
results for this one can be confusing. For example, you may search for
inurl:cat and get the following URL as part of the search result:
www.example.com/something/somethingelse/something.html
Hey, where's the cat? Because Google indexes the graphic name as part of
the URL, it's probably there. If the page above includes a graphic named
cat.jpg, that's what Google is finding when you search for inurl:cat. It's
finding the cat in the name of the picture, not in the URL itself.
site:
As with any other Google web search, restricts your results to a specified
host or domain.
Don't use this to restrict results to a certain host unless you're really
sure what's there.
Instead, use it to restrict results to certain domains. For example, search
for football site:uk and then search for football site:com. The comparison
is a good example of how dramatic a difference using site: can make.
With the largest collection of web documents in the world, Google is a reflection
of the Web.
Other useful Google URLs include:
The Google Directory http://directory.google.com/
is a searchable subject index based
on The Open Directory Project.
Usenet is a worldwide network of discussion groups. Google Groups
http://groups.google.com/
has archived Usenet's discussions back 20 years in some
places, providing an archive that offers over 700 million messages.
Google Images http://images.google.com/
offers an archive of over 330 million images
culled from sites all over the web.
Google News http://news.google.com/
is still in beta at the time of this writing. It checks
over 4,000 sources for news and updates the database once an hour.
Searching print mail-order catalogs probably isn't the first thing that
pops into your mind
when you think of Google, but you can do it here. Google Catalogs
http://catalogs.google.com/
has digitized and made available catalogs in a dozen
different categories.
There's no telling what you'll find at Google Labs http://labs.google.com/;
it's where
Google parks its works-in-progress and lets the general public play
with 'em.
Google Answers
http://answers.google.com/
is all about smart folks. Independent Google Answers researchers answer
questions for a price set by the person asking the question. Sources
used are restricted to open
web collections, and Google is building a database of the answers.
http://www.google.com/advanced_search
provides narrowed views of Google's index along various lines and topics.
Googlism
What Google thinks about you or anything else.
http://www.googlism.com/
GooglePeople
http://www.avaquest.com/demos
GooglePeople takes a "Who Is" or "Who Was" query
(e.g., "Who was the first man on the moon?" or "Who was
the fifth president of the United
States?") and offers a list of possible candidates. It works well
for some questions, but for others
it's way off base.
For further information, see Google Hacks by Tara Calishain, published by
O'Reilly.