

Query Language
You can search for any word or phrase on a Web site by typing the word or
phrase into a query form and clicking the button to execute the query
(for example, the Execute Query button on the sample query form).
Searches produce a list of files that contain the word or phrase, no
matter where it appears in the text. The following rules apply when
formulating queries:
- Consecutive words are treated as a
phrase; they must appear in the same order within a matching document.
- Queries are case-insensitive, so you
can type your query in uppercase or lowercase.
- You can search for any word except
for those in the exception list (for English, this includes a,
an, and, as, and other common words),
which are ignored during a search.
- Words in the exception list are treated
as placeholders in phrase and proximity queries. For example, if
you searched for Word for Windows, the results could
give you Word for Windows and Word and Windows,
because for is a noise word and appears in the exception
list.
- Punctuation marks such as the period
(.), colon (:), semicolon (;), and comma (,) are ignored during
a search.
- To use specially treated characters
such as &, |, ^, #, @, $, (, ), in a query, enclose your query
in quotation marks (").
- To search for a word or phrase containing
quotation marks, enclose the entire phrase in quotation marks and
then double the quotation marks around the word or words you want
to surround with quotes. For example, """World-Wide Web"" or Web"
searches for "World-Wide Web" or Web.
- You can insert Boolean operators (AND, OR, and NOT) and the
proximity operator (NEAR) to specify additional search information.
- The wildcard character (*) can match words with a given prefix. The
query esc* matches the terms ESC, escape, and so on.
- Free-text queries can be specified without regard to query syntax.
- Vector space queries can be specified.
- ActiveX (OLE) and file attribute property value queries can be
issued.
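The quotation-mark rule in the list above can be sketched in Python. This is a minimal illustration, not part of Index Server; the helper name quote_phrase is hypothetical.

```python
def quote_phrase(text: str) -> str:
    """Wrap text in quotation marks, doubling any embedded ones.

    Hypothetical helper: it only applies the doubling rule described
    above and knows nothing else about the query language.
    """
    return '"' + text.replace('"', '""') + '"'


# Build a query that searches for: "World-Wide Web" or Web
print(quote_phrase('"World-Wide Web" or Web'))
```

A query form could apply such a helper to user input before submitting it, so that embedded quotation marks and specially treated characters are taken literally.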
Boolean and Proximity Operators
Boolean and proximity operators can create a more precise query.
| To Search For | Example | Results |
| Both terms in the same page | access and basic (or: access & basic) | Pages with both the words access and basic |
| Either term in a page | cgi or isapi (or: cgi | isapi) | Pages with the words cgi or isapi |
| The first term without the second term | access and not basic (or: access & ! basic) | Pages with the word access but not basic |
| Pages not matching a property value | not @size = 100 (or: ! @size = 100) | Pages that are not 100 bytes |
| Both terms in the same page, close together | excel near project (or: excel ~ project) | Pages with the word excel near the word project |
Hints:
- You can add parentheses to nest expressions
within a query. The expressions in parentheses are evaluated before
the rest of the query.
- Use double quotes (") to indicate
that a Boolean or NEAR operator keyword should
be ignored in your query. For example, "Abbott and Costello"
will match pages with the phrase, not pages that match the Boolean
expression. In addition to being an operator, the word and
is a noise word in English.
- The NEAR operator
is similar to the AND operator in that NEAR
returns a match if both words being searched for are in the same
page. However, the NEAR operator differs from AND
because the rank assigned by NEAR depends on the
proximity of words. That is, the rank of a page with the searched-for
words closer together is greater than or equal to the rank of a
page where the words are farther apart. If the searched-for words
are more than 50 words apart, they are not considered near enough,
and the page is assigned a rank of zero.
- The NOT operator
can be used only after an AND operator in content
queries; it can be used only to exclude pages that match a previous
content restriction. For property value queries, the NOT
operator can be used apart from the AND operator.
- The AND operator
has a higher precedence than OR. For example, the
first three queries are equal, but the fourth is not:
a AND b OR c
c OR a AND b
c OR (a AND b)
(c OR a) AND b
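The precedence rule in the last hint can be checked with a small truth table. The sketch below is only an illustration in Python, whose own `and` also binds more tightly than `or`; it does not call Index Server, and the names QUERIES and truth_table are hypothetical.

```python
from itertools import product

# Each query written as a Python expression. Python's `and` binds more
# tightly than `or`, mirroring the AND/OR precedence described above.
QUERIES = {
    "a AND b OR c": lambda a, b, c: a and b or c,
    "c OR a AND b": lambda a, b, c: c or a and b,
    "c OR (a AND b)": lambda a, b, c: c or (a and b),
    "(c OR a) AND b": lambda a, b, c: (c or a) and b,
}


def truth_table(query):
    """Evaluate a query for all eight combinations of a, b, and c."""
    return [bool(query(a, b, c)) for a, b, c in product([False, True], repeat=3)]


tables = {text: truth_table(fn) for text, fn in QUERIES.items()}
for text, table in tables.items():
    print(f"{text:16} {table}")
```

The first three rows print identical tables. The fourth differs, for example when only c is true: c OR (a AND b) matches, but (c OR a) AND b does not.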
Note: The symbols (&, |, !, ~) and the English keywords AND, OR, NOT,
and NEAR work the same way in all languages supported by Index Server.
Localized keywords are also available when the browser locale is set
to one of the following six languages:
| Language | Keywords |
| German | UND, ODER, NICHT, NAH |
| French | ET, OU, SANS, PRES |
| Spanish | Y, O, NO, CERCA |
| Dutch | EN, OF, NIET, NABIJ |
| Swedish | OCH, ELLER, INTE, NÄRA |
| Italian | E, O, NO, VICINO |
Note: The NEAR operator can be applied only to words or phrases.
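The NEAR behavior described in the hints (closer words rank at least as high, and words more than 50 words apart rank zero) can be sketched as follows. The scoring formula, the 0 to 1000 scale, and the function name near_rank are assumptions chosen for illustration; the rank that Index Server actually computes is not documented here.

```python
def near_rank(words, first, second, max_distance=50):
    """Return a hypothetical proximity rank for two terms in a word list.

    The rank is 0 when either term is missing or when the closest pair
    of occurrences is more than max_distance words apart. Otherwise it
    shrinks linearly as the terms move apart. The linear formula is an
    assumption, not Index Server's ranking function.
    """
    lowered = [w.lower() for w in words]
    first_positions = [i for i, w in enumerate(lowered) if w == first.lower()]
    second_positions = [i for i, w in enumerate(lowered) if w == second.lower()]
    if not first_positions or not second_positions:
        return 0
    distance = min(abs(i - j) for i in first_positions for j in second_positions)
    if distance > max_distance:
        return 0
    return min(1000, 1000 * (max_distance - distance + 1) // max_distance)


close = "excel and project".split()
far = ["excel"] + ["filler"] * 30 + ["project"]
too_far = ["excel"] + ["filler"] * 60 + ["project"]
for page in (close, far, too_far):
    print(near_rank(page, "excel", "project"))
```

The three sample pages show the intended ordering: the page with the terms close together ranks highest, the page with 30 words in between ranks lower, and the page with 60 words in between ranks zero.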
Wildcards
Wildcard operators help you find pages containing words similar to a
given word. For example, the query esc* matches words that begin with
esc, such as ESC and escape.
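Prefix matching with the wildcard character can be illustrated with a short Python sketch. It assumes, as stated in the query rules, that matching is case-insensitive; the function name matches_wildcard is hypothetical.

```python
def matches_wildcard(term: str, word: str) -> bool:
    """Case-insensitive match of a query term against one word.

    A trailing * turns the term into a prefix match; otherwise the
    whole word must match.
    """
    term, word = term.lower(), word.lower()
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term


words = ["ESC", "escape", "rescue", "Escher"]
print([w for w in words if matches_wildcard("esc*", w)])
```

The word rescue is not selected because the wildcard matches a prefix, not text in the middle of a word.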
Free-Text Queries
The query engine finds pages that best match the words and phrases in a
free-text query.
This is done by automatically finding pages that match the meaning,
not the exact wording, of the query. Boolean, proximity, and wildcard
operators are ignored within a free-text query. Free-text queries are
prefixed with $contents.
Vector Space Queries
The query engine supports vector space queries. Vector queries return
pages that match a list of words and phrases. The rank of each page
indicates how well the page matched the query.
| To Search For | Example | Results |
| Pages that contain specific words | light, bulb | Files with words that best match the words being searched for |
| Pages that contain weighted prefixes, words, and phrases | invent*, light[50], bulb[10], "light bulb"[400] | Files that contain words prefixed by invent, the words light, bulb, and the phrase light bulb (the terms are weighted) |
- Components in vector queries are separated
by commas.
- Components in vector queries can be
weighted by using the [weight] syntax.
- Pages returned by vector queries do
not necessarily match every term in the query.
- Vector queries work best when the
results are sorted by rank.
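The component and [weight] syntax listed above can be made concrete with a small parser. This is a sketch written for illustration only: the names split_on_commas and parse_vector_query are hypothetical, and a weight of None simply means that no [weight] was given (what weight Index Server then applies is not covered here).

```python
import re

# One component: a quoted phrase or a bare term, then an optional [weight].
COMPONENT = re.compile(r'\s*(?:"([^"]*)"|([^,\[\]"]+?))\s*(?:\[(\d+)\])?\s*$')


def split_on_commas(query: str):
    """Split on commas that are not inside quotation marks."""
    parts, current, in_quotes = [], [], False
    for ch in query:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_vector_query(query: str):
    """Return a list of (term, weight) pairs; weight is None if omitted."""
    components = []
    for part in split_on_commas(query):
        match = COMPONENT.match(part)
        if match is None:
            raise ValueError(f"cannot parse component: {part!r}")
        phrase, term, weight = match.groups()
        text = phrase if phrase is not None else term
        components.append((text, int(weight) if weight else None))
    return components


print(parse_vector_query('invent*, light[50], bulb[10], "light bulb"[400]'))
```

For the weighted example from the table, the parser yields four components: the prefix invent* without a weight, light with weight 50, bulb with weight 10, and the phrase light bulb with weight 400.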
Property Value Queries
With property value queries, you can find files that have property values
that match given criteria. The properties over which you can query
include basic file information like file name and file size, and ActiveX
properties including the document summary (information) that is stored
in files created by ActiveX-aware applications.
There are two types of property queries:
- Relational
property queries consist of an at character (@),
a property name, a relational
operator, and a property value.
For example, to find all of the files larger than one million bytes,
issue the query @size > 1000000.
- Regular expression property queries
consist of a number sign (#), a property name, and a regular
expression for the property value. For example, to find
all of the video (.avi) files, issue the query #filename *.avi.
Regular expressions will never match the special properties contents
(#contents) and all (#all). Properties that are not retrievable
at query time cannot be used in # queries. These include HTML META
properties not stored in the property cache.
Property Names
Property names are preceded by either the at (@) or number
sign (#) character. Use @ for relational queries, and # for regular
expression queries.
If no property name is specified, @contents
is assumed.
Properties available for all files include:
| Property Name | Description |
| All | Matches words, phrases, and any property |
| Contents | Words and phrases in the file |
| Filename | Name of the file |
| Size | File size |
| Write | Last time the file was modified |
ActiveX property values can also be used in queries. Web sites with
files created by most ActiveX-aware applications can be queried for
these properties:
| Property Name | Description |
| DocTitle | Title of the document |
| DocSubject | Subject of the document |
| DocAuthor | The document's author |
| DocKeywords | Keywords for the document |
| DocComments | Comments about the document |
For a complete list of property names, see the List of Property Names
later on this page.
Relational Operators
Relational operators are used in relational property queries.
| To Search For | Example | Results |
| Property values in relation to a fixed value | @size < 100; @size <= 100; @size = 100; @size != 100; @size >= 100; @size > 100 | Files whose size matches the query |
| Property values with all of a set of bits on | @attrib ^a 0x820 | Compressed files with the archive bit on |
| Property values with some of a set of bits on | @attrib ^s 0x20 | Files with the archive bit on |
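The ^a (all bits) and ^s (some bits) tests in the table above correspond to ordinary bitmask checks. The sketch below shows the arithmetic in Python; the function names are hypothetical, and the two constants are the Win32 file attribute values for the archive and compressed bits.

```python
# Win32 file attribute bits used in the table above.
FILE_ATTRIBUTE_ARCHIVE = 0x20
FILE_ATTRIBUTE_COMPRESSED = 0x800


def all_bits_on(value: int, mask: int) -> bool:
    """^a: every bit in mask is set in value."""
    return (value & mask) == mask


def some_bits_on(value: int, mask: int) -> bool:
    """^s: at least one bit in mask is set in value."""
    return (value & mask) != 0


compressed_archive = FILE_ATTRIBUTE_COMPRESSED | FILE_ATTRIBUTE_ARCHIVE  # 0x820
print(all_bits_on(compressed_archive, 0x820))       # @attrib ^a 0x820
print(all_bits_on(FILE_ATTRIBUTE_ARCHIVE, 0x820))   # archive only
print(some_bits_on(FILE_ATTRIBUTE_ARCHIVE, 0x820))  # @attrib ^s 0x820
```

A file with only the archive bit set fails the ^a test against 0x820 because the compressed bit is missing, but it passes the ^s test because one of the two bits is on.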
Property Values
| To Search For | Example | Results |
| A specific value | @DocAuthor = Bill Barnes | Files authored by Bill Barnes |
| Values beginning with a prefix | #DocAuthor George* | Files whose author property begins with George |
| Files with any of a set of extensions | #filename *.|(exe|,dll|,sys|) | Files with .exe, .dll, or .sys extensions |
| Files modified after a certain date | @write > 96/2/14 10:00:00 | Files modified after February 14, 1996 at 10:00 GMT |
| Files modified after a relative date | @write > -1d2h | Files modified in the last 26 hours |
| Vectors matching a vector | @vectorprop = { 10, 15, 20 } | ActiveX documents with a vectorprop value of { 10, 15, 20 } |
| Vectors where each value matches a criterion | @vectorprop >^a 15 | ActiveX documents with a vectorprop value in which all values in the vector are greater than 15 |
| Vectors where at least one value matches a criterion | @vectorprop =^s 15 | ActiveX documents with a vectorprop value in which at least one value is 15 |
- Be sure to use the pound (#) character
before the property name when using a regular expression in a property
value, and an at (@) character otherwise. The equal
(=) relational operator is assumed for regular-expression queries.
- File name (#filename) is the only
property that efficiently supports regular expressions with wildcards
to the left of text.
- Date and time values are of the form
yyyy/mm/dd hh:mm:ss or yyyy-mm-dd hh:mm:ss. The
first two characters of the year and the entire time can be omitted.
If you omit the first two characters of the year, then 29 or less
is interpreted as a year in the 2000s (20yy), and 30 or greater is
interpreted as a year in the 1900s (19yy). All dates and times are
in Greenwich Mean Time (GMT). A three-digit millisecond value can
optionally be specified after the seconds value in date expressions,
for example, 1997/12/8 10:10:03:452.
- Dates and times relative to the current
time can be expressed with a minus (-) character followed by one
or more integer and time unit pairs. Time units are expressed
as: (y) for years, (m) for months, (w) for weeks, (d) for days,
(h) for hours, (n) for minutes, and (s) for seconds. For example,
-1d2h is 26 hours before the current time.
- Currency values are of the form x.y,
where x is the whole value amount and y is the
fractional amount. There is no assumption about units.
- Boolean values are (t) or (true) for
TRUE and (f) or (false) for FALSE.
- Vectors (VT_VECTOR) are expressed
as an opening brace ({), followed by a comma-separated list of values,
then a closing brace (}).
- Single-value expressions that are
compared against vectors are expressed as a relational
operator, then a (^a) for all of or a (^s) for some
of.
- Numeric values can be in decimal or
hexadecimal (preceded by 0x).
- The contents property does
not support relational operators. If a relational operator is specified,
no results will be found. For example, @contents Microsoft will
find documents containing Microsoft, but @contents=Microsoft
will find none.
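The two-digit year rule and the relative time syntax from the list above can be sketched in Python. The helper names expand_year, relative_offset, and resolve_relative are hypothetical. The sketch supports only units of fixed length (w, d, h, n, s) and leaves out years and months, whose exact handling by Index Server is not described here.

```python
import re
from datetime import datetime, timedelta, timezone

# Units with a fixed length; years (y) and months (m) vary in length,
# so this sketch leaves them out.
UNIT_SECONDS = {"w": 604800, "d": 86400, "h": 3600, "n": 60, "s": 1}


def expand_year(year: int) -> int:
    """Apply the two-digit year rule: 0-29 becomes 20yy, 30-99 becomes 19yy."""
    if year >= 100:
        return year
    return 2000 + year if year <= 29 else 1900 + year


def relative_offset(expression: str) -> timedelta:
    """Convert a relative expression such as -1d2h into a timedelta."""
    if not re.fullmatch(r"-(?:\d+[wdhns])+", expression):
        raise ValueError(f"unsupported relative time: {expression!r}")
    pairs = re.findall(r"(\d+)([wdhns])", expression)
    return timedelta(seconds=sum(int(n) * UNIT_SECONDS[u] for n, u in pairs))


def resolve_relative(expression: str, now: datetime) -> datetime:
    """Return the absolute time that lies the given offset before now."""
    return now - relative_offset(expression)


now = datetime(1996, 2, 14, 10, 0, tzinfo=timezone.utc)
print(expand_year(96), relative_offset("-1d2h"), resolve_relative("-1d2h", now))
```

With the sample clock set to February 14, 1996 at 10:00 GMT, the query @write > -1d2h would compare against February 13, 1996 at 08:00 GMT, which is 26 hours earlier.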
Regular Expressions
Regular expressions in property queries are defined as follows:
- Any character except asterisk (*),
period (.), question mark (?), and vertical bar (|) defaults to
matching just itself.
- Regular expressions can be enclosed
in matching quotes ("), and must be enclosed in quotes if they
contain a space ( ) or closing parenthesis ()).
- The characters *, ., and ? behave
as they behave in Windows; they match any number of characters,
match (.) or end of string, and match any one character, respectively.
- The character | is an escape character.
After |, the following characters have special meaning:
( opens a group. Must be followed by a matching ).
) closes a group. Must be preceded by a matching (.
[ opens a character class. Must be followed by a matching (un-escaped)
].
{ opens a counted match. Must be followed by a matching }.
} closes a counted match. Must be preceded by a matching {.
, separates OR clauses.
* matches zero or more occurrences of the preceding expression.
? matches zero or one occurrences of the preceding expression.
+ matches one or more occurrences of the preceding expression.
Anything else, including |, matches itself.
- Between square brackets ([]) the following
characters have special meaning:
^ matches everything but following classes. Must be the first character.
] matches ]. May only be preceded by ^, otherwise it closes the class.
- range operator. Preceded and followed by normal characters.
Anything else matches itself (or begins or ends a range at itself).
- Between curly braces ({}) the following
syntax applies:
|{m|} matches exactly m occurrences of the preceding expression.
(0 < m < 256).
|{m,|} matches at least m occurrences of the preceding expression.
(1 < m < 256).
|{m,n|} matches between m and n occurrences of the
preceding expression, inclusive. (0 < m < 256, 0 < n <
256).
- To match *, ., and ?, enclose them
in brackets (for example, |[*]sample will match *sample).
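To make the rules above easier to follow, the sketch below translates a subset of this regular expression syntax into Python's re syntax. It is an illustration under stated assumptions, not Index Server's matcher: the function names are hypothetical, matching is assumed to be case-insensitive, and character classes (|[) and counted matches (|{) are deliberately left unsupported.

```python
import re


def to_python_regex(pattern: str) -> str:
    """Translate a subset of the property-query regex syntax.

    Handles *, ?, ., and the escaped forms |( |) |, |* |? |+.
    Character classes (|[) and counted matches (|{) are not handled.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "|" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in "[{":
                raise ValueError("character classes and counted matches are not supported")
            if nxt == "(":
                out.append("(?:")         # open a group
            elif nxt == ")":
                out.append(")")           # close a group
            elif nxt == ",":
                out.append("|")           # separate OR clauses
            elif nxt in "*?+":
                out.append(nxt)           # quantifier on preceding expression
            else:
                out.append(re.escape(nxt))  # anything else matches itself
            i += 2
            continue
        if ch == "*":
            out.append(".*")              # any number of characters
        elif ch == "?":
            out.append(".")               # any one character
        elif ch == ".":
            out.append(r"(?:\.|$)")       # a period or end of string
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def matches(pattern: str, value: str) -> bool:
    """Match the whole value; case-insensitivity is an assumption."""
    return re.fullmatch(to_python_regex(pattern), value, re.IGNORECASE) is not None


for name in ("setup.exe", "DRIVER.SYS", "readme.txt"):
    print(name, matches("*.|(exe|,dll|,sys|)", name))
```

With the extension pattern from the Property Values table, setup.exe and DRIVER.SYS match and readme.txt does not.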
Query Examples
| Example | Results |
| @size > 1000000 | Pages larger than one million bytes |
| @write > 95/12/23 | Pages modified after the date |
| Apple tree | Pages with the phrase apple tree |
| "apple tree" | Same as above |
| @contents apple tree | Same as above |
| Microsoft and @size > 1000000 | Pages with the word Microsoft that are larger than one million bytes |
| "microsoft and @size > 1000000" | Pages with the phrase specified (not the same as above) |
| #filename *.avi | Video files (the # prefix is used because the query contains a regular expression) |
| @attrib ^s 32 | Pages with the archive attribute bit on |
| @docauthor = John Smith | Pages with the given author |
| $contents why is the sky blue? | Pages that match the query |
| @size < 100 & #filename *.gif | Graphics Interchange Format (GIF) files less than 100 bytes in size |
List of Property Names
These properties are always available for queries. Additional properties
may also be available depending on the configuration of the Web server.
| Friendly Name | Datatype | Property |
| A_HRef | DBTYPE_WSTR | DBTYPE_BYREF | Text of HTML HREF. This property name was created for Microsoft® Site Server and corresponds with the Index Server property name HtmlHRef. Can be queried but not retrieved. |
| Access | VT_FILETIME | Last time file was accessed. |
| All | (not applicable) | Searches every property for a string. Can be queried but not retrieved. |
| AllocSize | DBTYPE_I8 | Size of disk allocation for file. |
| Attrib | DBTYPE_UI4 | File attributes. Documented in Win32 SDK. |
| ClassId | DBTYPE_GUID | Class ID of object, for example, WordPerfect, Word, and so on. |
| Characterization | DBTYPE_WSTR | DBTYPE_BYREF | Characterization, or abstract, of document. Computed by Index Server. |
| Contents | (not applicable) | Main contents of file. Can be queried but not retrieved. |
| Create | VT_FILETIME | Time file was created. |
| Directory | DBTYPE_WSTR | DBTYPE_BYREF | Physical path to the file, not including the file name. |
| DocAppName | DBTYPE_WSTR | DBTYPE_BYREF | Name of application that created the file. |
| DocAuthor | DBTYPE_WSTR | DBTYPE_BYREF | Author of document. |
| DocByteCount | DBTYPE_I4 | Number of bytes in a document. |
| DocCategory | DBTYPE_STR | DBTYPE_BYREF | Type of document such as a memo, schedule, or whitepaper. |
| DocCharCount | DBTYPE_I4 | Number of characters in document. |
| DocComments | DBTYPE_WSTR | DBTYPE_BYREF | Comments about document. |
| DocCompany | DBTYPE_STR | DBTYPE_BYREF | Name of the company for which the document was written. |
| DocCreatedTm | VT_FILETIME | Time document was created. |
| DocEditTime | VT_FILETIME | Total time spent editing document. |
| DocHiddenCount | DBTYPE_I4 | Number of hidden slides in a Microsoft® PowerPoint document. |
| DocKeywords | DBTYPE_WSTR | DBTYPE_BYREF | Document keywords. |
| DocLastAuthor | DBTYPE_WSTR | DBTYPE_BYREF | Most recent user who edited document. |
| DocLastPrinted | VT_FILETIME | Time document was last printed. |
| DocLastSavedTm | VT_FILETIME | Time document was last saved. |
| DocLineCount | DBTYPE_I4 | Number of lines contained in a document. |
| DocManager | DBTYPE_STR | DBTYPE_BYREF | Name of the manager of the document's author. |
| DocNoteCount | DBTYPE_I4 | Number of pages with notes in a PowerPoint document. |
| DocPageCount | DBTYPE_I4 | Number of pages in document. |
| DocParaCount | DBTYPE_I4 | Number of paragraphs in a document. |
| DocPartTitles | DBTYPE_STR | DBTYPE_VECTOR | Names of document parts. For example, in Excel part titles are the names of spreadsheets, in PowerPoint slide titles, and in Word for Windows the names of the documents in the master document. |
| DocPresentationTarget | DBTYPE_STR | DBTYPE_BYREF | Target format (35mm, printer, video, and so on) for a presentation in PowerPoint. |
| DocRevNumber | DBTYPE_WSTR | DBTYPE_BYREF | Current version number of document. |
| DocSlideCount | DBTYPE_I4 | Number of slides in a PowerPoint document. |
| DocSubject | DBTYPE_WSTR | DBTYPE_BYREF | Subject of document. |
| DocTemplate | DBTYPE_WSTR | DBTYPE_BYREF | Name of template for document. |
| DocTitle | DBTYPE_WSTR | DBTYPE_BYREF | Title of document. |
| DocWordCount | DBTYPE_I4 | Number of words in document. |
| FileIndex | DBTYPE_I8 | Unique ID of file. |
| FileName | DBTYPE_WSTR | DBTYPE_BYREF | Name of file. |
| HitCount | DBTYPE_I4 | Number of hits (words matching query) in file. |
| HtmlHRef | DBTYPE_WSTR | DBTYPE_BYREF | Text of HTML HREF. Can be queried but not retrieved. |
| HtmlHeading1 | DBTYPE_WSTR | DBTYPE_BYREF | Text of HTML document in style H1. Can be queried but not retrieved. |
| HtmlHeading2 | DBTYPE_WSTR | DBTYPE_BYREF | Text of HTML document in style H2. Can be queried but not retrieved. |
| HtmlHeading3 | DBTYPE_WSTR | DBTYPE_BYREF | Text of HTML document in style H3. Can be queried but not retrieved. |
| HtmlHeading4 | DBTYPE_WSTR | DBTYPE_BYREF | Text of HTML document in style H4. Can be queried but not retrieved. |
| HtmlHeading5 | DBTYPE_WSTR | DBTYPE_BYREF | Text of HTML document in style H5. Can be queried but not retrieved. |
| HtmlHeading6 | DBTYPE_WSTR | DBTYPE_BYREF | Text of HTML document in style H6. Can be queried but not retrieved. |
| Img_Alt | DBTYPE_WSTR | DBTYPE_BYREF | Alternate text for <IMG> tags. Can be queried but not retrieved. |
| Path | DBTYPE_WSTR | DBTYPE_BYREF | Full physical path to file, including file name. |
| Rank | DBTYPE_I4 | Rank of row. Ranges from 0 to 1000. Larger numbers indicate better matches. |
| RankVector | DBTYPE_I4 | DBTYPE_VECTOR | Ranks of individual components of a vector query. |
| ShortFileName | DBTYPE_WSTR | DBTYPE_BYREF | Short (8.3) file name. |
| Size | DBTYPE_I8 | Size of file, in bytes. |
| USN | DBTYPE_I8 | Update Sequence Number. NTFS drives only. |
| VPath | DBTYPE_WSTR | DBTYPE_BYREF | Full virtual path to file, including file name. If more than one possible path, then the best match for the specific query is chosen. |
| WorkId | DBTYPE_I4 | Internal ID for file. Used within Index Server. |
| Write | VT_FILETIME | Last time file was written. |
Defining New Property Names
To define properties that are not in the previous list, you must list
them in a [Names] section in the .idq file. To use these properties
in a restriction, sort specification, or as a retrieved column, you
must define them in the .idq file, using the following format:
[Names]
#Properties that are not in the standard list
Propertyname ( Datatype ) = GUID ["Name" | propid]
In the syntax, "Name" is the property name ("Sales" in the following
example), and propid is the property ID in hexadecimal. Note that you
need to surround the friendly name with quotation marks, but the
property ID does not take quotation marks.
For example, suppose you want to define an HTML meta tag as a property
name that somebody can search for. The property you want to define is
Sales.
To define the Sales property
- In the .idq file, under the [Names]
section, add the following line.
MetaDescription(DBTYPE_WSTR) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 "Sales"
The GUID number comes from the MetaTagClsid parameter
in the registry, at the following location:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\HtmlFilter\MetaTagClsid
- Then, in the HTML files where you
want the tag to appear, define the meta description.
For example, say you want to search for
all files that give sales projections for the future:
In File1.htm:
<META NAME="Sales" CONTENT="Projections for 1998">
In File2.htm:
<META NAME="Sales" CONTENT="Projections for 1999">
In File3.htm:
<META NAME="Sales" CONTENT="Sales in 1997">
Note: Be sure to add your META NAME tags between the <head> and </head>
HTML tags at the beginning of the file.
You can now search for all files that show sales projections. Send the
following query:
@metadescription projections
This query returns all the files with the word projections
in the CONTENT field of the meta tag. In this example, File1.htm and
File2.htm are returned.
But suppose you want to search for sales by year, for example a list
of sales in 1997. Send the following query:
@metadescription 1997
File3.htm is returned.
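The expected results of the two queries can be reproduced outside Index Server with a short Python simulation. The sketch uses the standard html.parser module to read the META tags from the three sample files; the names FILES, MetaContent, and search_meta are hypothetical, and the simple word comparison only approximates what the Index Server content filter and query engine do.

```python
from html.parser import HTMLParser

# The three sample files from the example above, reduced to their META tags.
FILES = {
    "File1.htm": '<head><META NAME="Sales" CONTENT="Projections for 1998"></head>',
    "File2.htm": '<head><META NAME="Sales" CONTENT="Projections for 1999"></head>',
    "File3.htm": '<head><META NAME="Sales" CONTENT="Sales in 1997"></head>',
}


class MetaContent(HTMLParser):
    """Collect the CONTENT values of META tags with a given NAME."""

    def __init__(self, name: str):
        super().__init__()
        self.name = name.lower()
        self.values = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == self.name:
            self.values.append(attributes.get("content") or "")


def search_meta(files, name, word):
    """Return the files whose META content contains word (case-insensitive)."""
    hits = []
    for filename, html in files.items():
        parser = MetaContent(name)
        parser.feed(html)
        words = " ".join(parser.values).lower().split()
        if word.lower() in words:
            hits.append(filename)
    return hits


print(search_meta(FILES, "Sales", "projections"))
print(search_meta(FILES, "Sales", "1997"))
```

The first search finds File1.htm and File2.htm, and the second finds only File3.htm, matching the results described for @metadescription projections and @metadescription 1997.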
© Regional Justice Information Service
4255 West Pine Boulevard St. Louis, Missouri 63108