Is Windows Vista index-based full-text search powerful enough?

  • Thread starter: Peter Frank
OK, good to hear. However, what about extensive adding, moving,
copying and deleting of files? My scenario would regularly include a
lot of these operations, so I wonder whether this would slow my
computer considerably due to the re-indexing that Windows Vista would
have to perform. By the way, does Windows Vista re-index immediately
after any changes in a location marked for indexing or only when it is
idle or ...?

Peter

I really don't know; I don't move my files around a lot.

I believe it runs after boot and indexes what's changed. I don't know
if it runs mid-session on changes.

Regards
 
It stays out of your way and only operates at idle time. It takes a while to
get the thing built initially. But once it's built the maintenance seems
pretty minor. I use it heavily. (In fact, one of the main reasons I hate
going back to XP now is because I'm spoiled and sick-and-tired of having to
navigate all over the place all the time like I've been doing since the DOS
era. And I'm sick of those stupid Search Companion searches too ;-).

But now I've offended all of the Search Companion fans.

I don't know what you mean by "extensive" file management. But as long as
you're using reasonably fast hardware you shouldn't have any problem with
the index keeping up in the background.

If you're going to load a zillion document files onto Vista right after
installation, let it sit idle whenever you can for a while, because that
first step of building the index is (as you might imagine) pretty
labor-intensive and takes some time.
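Why the initial build is the slow part can be seen in a toy inverted index: building it reads and tokenizes every document once, while a later lookup only touches the index. This Python sketch assumes plain-text input and naive word splitting (`build_index` and `lookup` are illustrative names, not a Windows API); a real indexer also needs per-file-type filters and on-disk storage.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

WORD = re.compile(r"\w+")


def build_index(docs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each lower-cased word to the set of paths whose text contains it.

    This is the expensive, one-off step: every document is read and tokenized.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for path, text in docs:
        for word in WORD.findall(text.lower()):
            index[word].add(path)
    return index


def lookup(index: Dict[str, Set[str]], *words: str) -> Set[str]:
    """Return paths containing all of `words`; no document is re-read."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()
```

A search then costs a few dictionary lookups and a set intersection instead of a pass over every file, which is the trade the up-front build buys.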

(If you've ever written a search index type of thing, you can only imagine
what a nightmare it must be in Vista where you have to worry about umpteen
jillion file types being impinged upon by umpteen million programs from
umpteen thousand software vendors, each of whom makes up their own rules on
the fly).
 
- We put a lot of effort into 'backing-off' the indexing so it doesn't
interfere with the user's normal use of the machine. So in general this
shouldn't be an issue.

- You can move the index files to a different location using the Indexing
Options Control Panel.
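The "only at idle time" and "backing-off" behaviour described above can be illustrated with a minimal Python sketch. It assumes a hypothetical `is_idle()` callback and a simple doubling back-off capped at 64 ticks; the class name `IdleGatedIndexer` is made up, and this is the generic queue-and-drain-when-idle technique, not the actual Windows Search implementation.

```python
from collections import deque
from typing import Callable, Deque, List


class IdleGatedIndexer:
    """Toy change queue that only indexes while the machine looks idle."""

    def __init__(self, is_idle: Callable[[], bool], batch_size: int = 2):
        self.is_idle = is_idle          # hypothetical idle detector
        self.batch_size = batch_size    # files indexed per idle tick
        self.pending: Deque[str] = deque()
        self.indexed: List[str] = []
        self.backoff = 1                # ticks to wait before the next check

    def notify_change(self, path: str) -> None:
        # Adds, moves and deletes only queue work; nothing is indexed yet.
        self.pending.append(path)

    def tick(self) -> int:
        """Run one scheduling step and return how many ticks to sleep."""
        if not self.is_idle():
            # User is active: wait longer each time, up to a cap.
            self.backoff = min(self.backoff * 2, 64)
            return self.backoff
        self.backoff = 1
        for _ in range(min(self.batch_size, len(self.pending))):
            self.indexed.append(self.pending.popleft())
        return self.backoff
```

In this sketch, file shuffling only grows the queue; the indexing cost is paid later, in small batches, once the machine looks idle.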

Dave
 
Oh, and while I'm on a roll:

The latest query syntax doc is here:

http://www.microsoft.com/windows/desktopsearch/addresources/advanced3.mspx

The one someone else posted was for Windows Search 2.6, which will be very
similar but not absolutely identical.

Also you discussed index size relative to documents size. There's no way to
estimate this exactly, because different files contribute different amounts
to the index size {e.g. pictures less, text docs more}. But we typically see
a range from less than 5% up to maybe 15%.
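As a rough worked example of that range, here is a small Python helper (the name `index_size_range` is illustrative) that applies the 5% and 15% figures to a document set. Since the low end is "less than 5%", treat the lower number as typical rather than a floor.

```python
from typing import Tuple

GIB = 1024 ** 3


def index_size_range(corpus_bytes: int, low_pct: int = 5, high_pct: int = 15) -> Tuple[int, int]:
    """Return (low, high) index-size estimates in bytes.

    Defaults follow the rule of thumb quoted in this thread; the real figure
    depends on the file mix (pictures contribute less, text documents more).
    """
    return corpus_bytes * low_pct // 100, corpus_bytes * high_pct // 100


low, high = index_size_range(20 * GIB)
print(f"{low / GIB:.1f} GiB to {high / GIB:.1f} GiB")  # 1.0 GiB to 3.0 GiB
```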

Dave
 
I thought I read something about it needing to be at least Reader version
6... so newer would be better, I presume.

Pardon me if I misread, though...

FG
 
While you are on a roll, would you like to find a correct query language
reference and post that?

If we are to believe the desktop search syntax doc (my point being that it
lies), it is not possible to search for contents (in 2.6 or 3). Yet Index
Server does have a Content= field (which, if I recall correctly, has severe
limitations), and yet the Indexing Server docs have disappeared from the Vista
PSDK.
 
Well, just typing text into the search field searches the contents of items
in indexed locations. But if you specifically want to search the contents
but not the filename, subject, etc., you can use contents:search_term.
 
Hmmmmmm. The syntax "contents:search_term" is not even listed on the syntax
page you specified below.

So, having said that... is the listing you linked to an abridged version?

If yes, then where is the full listing, please?

Otherwise it's like trying to repair a car using pliers and a couple of
spanners......

You can do some things, but not enough to complete the job!

Cheers

FG
 
I tried @content as well as content:

If I bother constructing a search, it's because I need to be precise to avoid
a massive number of false hits. Not specifying any field will find contents,
but using either Indexing Server syntax or AQS and specifying the contents
field, it is not found.

In XP one entered advanced search syntax in the containing text field, but it
would only parse it if indexing was on; otherwise it searched for the literal
characters. How do the Search Options affect searching?
 
Also, there is no mention of wildcards in the documentation. For instance, NT
wildcards are different from DOS ones, and were modified so DOS ones will
work on NT. So how does search parse wildcards: NT (a regexp with DOS
compatibility), pure NT (regexp), or simple DOS (the first * is assumed to be
the end of the search expression)? Or perhaps it uses a programming
language's regexps. What about word stemming?

What about a complete reference to search? I won't use it unless I know what
it is going to do. I need to have confidence in the results of a search.

I have RegExp scripts I use to search (for contents), and I use For in cmd to
traverse the tree. I find things, including Unicode text in non-Unicode
files. When my search is complete, I know that if nothing is found then there
is nothing to be found. Unfortunately, it takes a long time to parse a disk or
part of one.
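A brute-force scan of the kind described above can be sketched in Python with `os.walk` and the `re` module. This is a minimal sketch under stated assumptions: the search term is a literal (translating a general regex into UTF-16 byte patterns is more involved), it must be Latin-1 encodable, and each file is read whole into memory. The function name `find_term` is hypothetical. It looks for the term both as 8-bit text and as UTF-16LE text, which is how "Unicode in non-Unicode files" usually shows up.

```python
import os
import re
from typing import Iterator, Tuple


def find_term(root: str, term: str) -> Iterator[Tuple[str, str]]:
    """Yield (path, encoding_label) for files under `root` containing `term`.

    Matches the literal term as 8-bit text or as UTF-16LE text. Matching is
    case-insensitive for ASCII letters only; unreadable files are skipped.
    """
    patterns = {
        "8-bit": re.compile(re.escape(term.encode("latin-1")), re.IGNORECASE),
        "utf-16le": re.compile(re.escape(term.encode("utf-16-le")), re.IGNORECASE),
    }
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue  # locked or permission-denied files are skipped
            for label, rx in patterns.items():
                if rx.search(data):
                    yield path, label
```

Unlike an index, this gives the "nothing found means nothing there" guarantee (for readable files) at the price of reading every byte on each search.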
 
Yes, "content:" is missing from that doc unfortunately. I believe there's a
more complete MSDN doc in the works but it is not available yet.

The behavior is that by default a search term on its own searches all
properties, including file contents. If you specifically want to search only
file names you can use the "filename:" keyword. If you specifically want to
search only contents you can use the "content:" keyword.

If you change the search options to say "Always search filenames only" then
a search term on its own only searches filenames. If you need to search
contents also, you can use the "content:" keyword {indexed locations only}.

Is this the behavior you are seeing? If not let us know.
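The three cases Dave describes (bare term, "filename:", "content:") can be captured in a small Python helper that builds the text you would type into the search box. The `filename:` and `content:` prefixes come from the post above; the function name `build_query` is hypothetical, and wrapping multi-word terms in double quotes to keep them together as a phrase is an assumption about AQS, not something stated in this thread.

```python
def build_query(term: str, scope: str = "all") -> str:
    """Compose a search-box query string for `term`.

    scope="all"      -> bare term (searches all properties, incl. contents)
    scope="filename" -> restrict to file names
    scope="content"  -> restrict to file contents (indexed locations only)
    """
    prefixes = {"all": "", "filename": "filename:", "content": "content:"}
    if scope not in prefixes:
        raise ValueError(f"unknown scope: {scope!r}")
    # Assumption: quoting keeps a multi-word term together as one phrase.
    if " " in term:
        term = f'"{term}"'
    return prefixes[scope] + term


print(build_query("quarterly report", "content"))  # content:"quarterly report"
```

With "Always search filenames only" turned on, only the `content:` form would still reach file contents.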

Dave
 
I'm busy at the moment. I will reply in a day (at some length). Please keep
monitoring.

PS Sorry it's big. I spent decades searching and transforming data with
search and replace (and, when there was reuse, writing programs to do the
same).

What about wildcards?

Does *.t*x find txt files and not ttf files (as it would in DOS but not
cmd)?

Does * mean the same as *.* (as it does in DOS and cmd, since cmd maintains
compatibility with command.com)?

Do wildcards work in non-filename fields, like DocAuthor? If so, which
wildcards, and what rules apply?
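The difference between the wildcard styles asked about here can be made concrete in Python. This sketch does not claim to reproduce Vista's rules; `glob_match` uses pure glob semantics from `fnmatch` plus a "*.* means *" compatibility rule, and `dos_style_match` approximates the "simple DOS" behaviour described earlier in the thread, where anything after the first * in the base name or extension is ignored. Both function names are made up for illustration.

```python
from fnmatch import fnmatchcase
from typing import Tuple


def _split(name: str) -> Tuple[str, str]:
    # Split at the last dot into (base, extension); no dot means no extension.
    base, dot, ext = name.rpartition(".")
    return (base, ext) if dot else (name, "")


def glob_match(name: str, pattern: str) -> bool:
    """Pure glob semantics, case-insensitive, with '*.*' read as '*'."""
    if pattern == "*.*":
        pattern = "*"  # compatibility rule: also match names with no extension
    return fnmatchcase(name.lower(), pattern.lower())


def dos_style_match(name: str, pattern: str) -> bool:
    """Rough DOS rule: in the base name and in the extension, text after the
    first '*' is ignored, so 't*x' behaves like 't*'."""
    def cut(part: str) -> str:
        star = part.find("*")
        return part if star < 0 else part[: star + 1]

    name_base, name_ext = _split(name.lower())
    pat_base, pat_ext = _split(pattern.lower())
    return fnmatchcase(name_base, cut(pat_base)) and fnmatchcase(name_ext, cut(pat_ext))
```

Under pure glob rules, *.t*x needs an extension that starts with t and ends with x, so it matches paper.tex but neither notes.txt nor font.ttf; the DOS-style rule matches both notes.txt and font.ttf. Without the compatibility rule, plain fnmatch would not match a name like README against *.*.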

I only have Office 95 installed at the moment (for the Dictionary from
Bookshelf Basics; the next thing to install, when I get around to it, is Works
Suite 6 for Encarta. Later versions of Office are a long way away). In OLEDB
in Control Panel I don't seem to have access to the database driver for
Indexing. Is this because I have 95 and it can't use it, or is it not
included in Vista? If not, what is it included with?

One thing I want to do is compare Explorer's object properties with the
Search docs. E.g., these are the 267 properties Explorer can show [so I need
a few hours to compare; also, which DLL are the words for AQS in, so it can
parse them? Is it a MUI thing?]:

Name, Size, Type, Date modified, Date created, Date accessed, Attributes,
Offline status, Offline availability, Perceived type, Owner, Kinds, Date
taken, Artists, Album, Year, Genre, Conductors, Tags, Rating, Authors,
Title, Subject, Categories, Comments, Copyright, #, Length, Bit rate,
Protected, Camera model, Dimensions, Camera maker, Company, File
description, Program name, Duration, Is online, Is recurring, Location,
Optional attendee addresses, Optional attendees, Organizer address,
Organizer name, Reminder time, Required attendee addresses, Required
attendees, Resources, Free/busy status, Total size, Account name, Computer,
Anniversary, Assistant's name, Assistant's phone, Birthday, Business
address, Business city, Business country/region, Business P.O. box, Business
postal code, Business state or province, Business street, Business fax,
Business home page, Business phone, Callback number, Car phone, Children,
Company main phone, Department, E-mail Address, E-mail2, E-mail3, E-mail
list, E-mail display name, File as, First name, Full name, Gender, Given
name, Hobbies, Home address, Home city, Home country/region, Home P.O. box,
Home postal code, Home state or province, Home street, Home fax, Home phone,
IM addresses, Initials, Job title, Label, Last name, Mailing address, Middle
name, Cell phone, Nickname, Office location, Other address, Other city,
Other country/region, Other P.O. box, Other postal code, Other state or
province, Other street, Pager, Personal title, City, Country/region, P.O.
box, Postal code, State or province, Street, Primary e-mail, Primary phone,
Profession, Spouse, Suffix, TTY/TTD phone, Telex, Webpage, Status, Content
type, Date acquired, Date archived, Date completed, Date imported, Client
ID, Contributors, Content created, Last printed, Date last saved, Division,
Document ID, Pages, Slides, Total editing time, Word count, Due date, End
date, File count, Filename, File version, Flag color, Flag status, Space
free, Bit depth, Horizontal resolution, Width, Vertical resolution, Height,
Importance, Is attachment, Is deleted, Has flag, Is completed, Incomplete,
Read status, Shared, Creator, Date, Folder name, Folder path, Folder,
Participants, Path, Contact names, Entry type, Language, Date visited,
Description, Link status, Link target, URL, Media created, Date released,
Encoded by, Producers, Publisher, Subtitle, User web URL, Writers,
Attachments, Bcc addresses, Bcc names, Cc addresses, Cc names, Conversation
ID, Date received, Date sent, From addresses, From names, Has attachments,
Sender address, Sender name, Store, To addresses, To do title, To names,
Mileage, Album artist, Beats-per-minute, Composers, Initial key, Mood, Part
of set, Period, Color, Parental rating, Parental rating reason, Space used,
EXIF version, Event, Exposure bias, Exposure program, Exposure time, F-stop,
Flash mode, Focal length, 35mm focal length, ISO speed, Lens maker, Lens
model, Light source, Max aperture, Metering mode, Orientation, Program mode,
Saturation, Subject distance, White balance, Priority, Project, Channel
number, Episode name, Closed captioning, Rerun, SAP, Broadcast date, Program
description, Recording time, Station call sign, Station name, Auto summary,
Summary, Search ranking, Sensitivity, Shared with, Product name, Product
version, Source, Start date, Billing information, Complete, Task owner,
Total file size, Legal trademarks, Video compression, Directors, Data rate,
Frame height, Frame rate, Frame width, Total bitrate,
 
To give you an idea: I'm searching drive C, including non-indexed and hidden
locations, for a DLL with AQS in the name (Name = AQS via the field in
advanced search, i.e. name:*aqs*), and the search is still going minutes
later. Cmd searched the disk in 1 or 2 seconds, using

dir c:\*aqs*.* /a /s
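For comparison, a rough Python equivalent of that `dir` command walks the tree with `os.walk` and checks each name. Like `/a`, `os.walk` does not skip hidden or system files, and like `/s` it recurses into subdirectories; unreadable directories are silently skipped. The function name `find_names` is hypothetical, and it matches the fragment anywhere in the whole file name, extension included (use `os.path.splitext` to restrict it to the base name).

```python
import os
from typing import Iterator


def find_names(root: str, fragment: str) -> Iterator[str]:
    """Yield paths under `root` whose file or folder name contains
    `fragment`, case-insensitively, roughly what `dir c:\\*aqs*.* /a /s` lists."""
    needle = fragment.lower()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if needle in name.lower():
                yield os.path.join(dirpath, name)
```

Because it only reads directory entries and never opens files, a scan like this is bounded by directory traversal rather than file contents.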

[And did I say that after making a cup of coffee the search is still
searching? It's when the progress bar starts overwriting the stop button
that it gets slow.]

There must be some problem: this search for AQS in a file name is now over
15 minutes of disk churning. Am I in fact searching for AQS anywhere in a
filename (not the extension or path), where AQS may be at the start, in the
middle, or at the end of the name (or be the only string in the name)? I have
(I haven't counted for years) over 50,000 documents.

I must have been using the contents keyword; content: did find both names
and strings of digits (which humans call numbers).

Another 6 minutes go by since my last note about speed, while the progress
bar crawls across the stop button. It only took a minute or so to reach the
stop button (it's now halfway across the stop button).

Also, there is no Go button on the search field. How does a mouse user
initiate a search?
Why does it delete my search terms when I hit Stop? Where is
autocomplete? Where is a search history?

Have you noticed that since IE4 was released, MS has insisted we type rather
than use the mouse? I will cut and paste.

Another 5 minutes go by, search seems no closer to finishing. Can't wait for
this to finish.
 
Interesting: as soon as I sent my last news post, the search finished. My
hypothesis is that having the post open was interfering with the search.
 
I turned off search. It doesn't work. It can't stop, it can't find anything,
it defaults to not finding files, you can't tell it to start, it keeps
clearing Search All Files, and on and on. It's a UI nightmare.
 
And where has Search Document Summaries from 2000/XP gone? One could access
it via Computer Management - Services and Applications - Indexing Server, and
there were query forms under this.
 
A couple of things to bear in mind, which may or may not be relevant:

- By default the whole drive is not indexed, only the c:\users portion. So
unless you've changed the options {you can look in the Indexing Options cpl}
this is actually going to be doing a non-indexed search. Which explains
{somewhat} why it would be slower.

- Glad you got content: working.

- Generally you just type to make a search happen, there isn't a button to
start searching. As soon as you press a key we start searching so you don't
have to type the whole of a word you are looking for. The exception to this
is when you have the advanced search checked in which case there is a Search
button. But note that the search button converts your advanced search
options to a query string and puts it in the main search box, which triggers
a search.

- Yes, there isn't a search history unfortunately.

- Clicking the red 'X' should stop a search - if that doesn't work let me
know.

- As far as I can tell *.* works the same as * in filename searches. The
wildcards * and ? do work on other properties also. My co-worker Jonas has a
couple of blog entries which explain some of this:
http://blogs.msdn.com/jonasbar/default.aspx
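The "search starts as soon as you press a key" behaviour can be sketched as a filter that re-runs on every keystroke. This Python sketch is hypothetical (the class `AsYouTypeSearch` and its substring matching over a fixed name list are assumptions, not how Vista's search box is built); it shows one common optimisation: when the new text extends the previous text, any name containing the new text also contains the old text, so only the previous hits need rechecking.

```python
from typing import Iterable, List


class AsYouTypeSearch:
    """Re-run a substring filter on every change to the search text."""

    def __init__(self, names: Iterable[str]):
        self.all_names = list(names)
        self.last_text = ""
        self.last_hits: List[str] = list(self.all_names)

    def on_text_changed(self, text: str) -> List[str]:
        needle = text.lower()
        # Narrowing: the new text extends the old one, so reuse old hits.
        pool = self.last_hits if needle.startswith(self.last_text) else self.all_names
        hits = [n for n in pool if needle in n.lower()]
        self.last_text, self.last_hits = needle, hits
        return hits
```

Typing "b", then "budg", narrows the same result list; deleting back to a different prefix falls back to the full list.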

Dave

 