Windows Search

Windows Search, formerly known as Windows Desktop Search (WDS) on Windows XP and Windows Server 2003, is an indexed desktop search platform created by Microsoft for Microsoft Windows.

Overview
Windows Search collectively refers to the indexed search on Windows Vista and later versions of Windows (also referred to as Instant Search) as well as Windows Desktop Search, a standalone add-on for Windows 2000, Windows XP and Windows Server 2003 made available as freeware. All incarnations of Windows Search share a common architecture and indexing technology and use a compatible application programming interface (API).

Windows Search is the successor of the Indexing Service, a remnant of the Object File System feature of the Cairo project which never materialized. Windows Search uses a different architecture.

Windows Search builds a full-text index of files on a computer. (An add-in for 32-bit Windows XP, Windows Server 2003 and Windows Vista allows network shares to be added to the index.) The time required for the initial creation of this index depends on the amount and type of data to be indexed and can be up to several hours, but this is a one-time event. Once a file's contents have been added to the index, Windows Search can use the index to return results more rapidly than it could by searching through all the files on the computer. Searches are performed not only on file names, but also on the contents of the file (provided a proper handler for the file type is installed) as well as the keywords, comments and all other forms of metadata that Windows Search recognizes. For instance, searching the computer for "The Beatles" returns a list of music files on the computer which have "The Beatles" in their song titles, artists or album names, as well as any e-mails and documents that include the phrase "The Beatles" in their titles or contents.
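The idea of indexing both contents and metadata can be illustrated with a small inverted index, which maps each word to the files that contain it. The following Python sketch is only an illustration under stated assumptions: the file names, field names and functions are hypothetical, it is not the Windows Search implementation, and it matches files containing all query words rather than the exact phrase.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each token to the set of file names whose fields contain it."""
    index = defaultdict(set)
    for name, fields in documents.items():
        for value in fields.values():
            for token in tokenize(value):
                index[token].add(name)
    return index

def search(index, query):
    """Return the files that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical files: metadata fields and text contents are indexed alike.
documents = {
    "help.mp3": {"title": "Help!", "artist": "The Beatles"},
    "notes.txt": {"contents": "Setlist ideas: covers of the Beatles and the Kinks"},
    "budget.xls": {"contents": "Quarterly budget"},
}
index = build_index(documents)
matches = search(index, "The Beatles")  # the music file and the text note
```

Building the index is the expensive step; each query afterwards only intersects a few small sets instead of reading every file.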

Windows Search features incremental search (also known as "search as you type"). It begins searching as soon as characters are entered in the search box, and keeps refining and filtering the search results as more characters are typed. As a result, the required files are often found before the full search text has been entered.
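Incremental search can be sketched as repeated filtering of the previous result set after every keystroke. The Python example below is a minimal illustration, assuming a simple word-prefix match on file names; the function names and sample files are hypothetical and do not reflect how Windows Search matches items internally.

```python
def refine(candidates, typed):
    """Keep the candidates in which some word starts with the typed text."""
    prefix = typed.lower()
    return [c for c in candidates
            if any(word.startswith(prefix) for word in c.lower().split())]

def search_as_you_type(names, keystrokes):
    """Yield (typed text, remaining results) after every keystroke."""
    typed = ""
    results = list(names)
    for ch in keystrokes:
        typed += ch
        # A longer prefix can only narrow the previous result set,
        # so the earlier results are filtered instead of starting over.
        results = refine(results, typed)
        yield typed, results

names = ["Annual Report.docx", "Repair invoice.pdf", "Holiday photo.jpg"]
steps = list(search_as_you_type(names, "repo"))
# After "r" two files remain; after "repo" only "Annual Report.docx" is left.
```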

Windows Search supports IFilters, components that enable search programs to scan files for their contents and metadata. Once an appropriate IFilter has been installed for a particular file format, the IFilter is used to extract the text from files which were saved in that format.

Windows Search by default includes IFilters for common filetypes, including Word documents, Excel spreadsheets, PowerPoint presentations, HTML files, text files, MP3 and WMA music files, WMV, ASF and AVI video files and JPEG, BMP and PNG images.

Windows Search uses property handlers to handle metadata from file formats. A property handler needs a property description and a schema for the property for Windows Search to index the metadata. Protocol handlers are used for indexing specific data stores. For example, files are accessed using the File System Protocol Handler, Microsoft Office Outlook data stores using the Outlook Protocol Handler and the Internet Explorer cache using the IE History/Cache Protocol Handler.

Architecture
Windows Search is implemented as a Windows Service. The search service implements the Windows Search configuration and query APIs and also controls all indexing and query components. The most important component of Windows Search is the Indexer, which crawls the file system on initial setup, and then listens for file system notifications to pick up changed files in order to create and maintain the index of data. It achieves this using three processes:
 * 1) SearchIndexer.exe, which hosts the indexes and the list of URIs that require indexing, as well as exposes the external configuration and query APIs that other applications use to leverage the Windows Search features.
 * 2) SearchProtocolHost.exe, which hosts the protocol handlers. It runs with the least permission required for the protocol handler. For example, when accessing the file system, it runs with the credentials of the system account, but when accessing network shares, it runs with the credentials of the user.
 * 3) SearchFilterHost.exe, which hosts the IFilters and property handlers to extract metadata and textual content. It is a low integrity process, which means that it does not have any permission to change the system settings. Even if it encounters files with malicious content that manage to take over the process, they will not be able to change any system settings.

The search service consists of several components, including the Gatherer, the Merger, the Backoff Controller, and the Query Processor, among others. The Gatherer retrieves the list of URIs that need to be crawled and invokes the proper protocol handler to access the store that hosts the URI, and then the proper property handler (to extract metadata) and IFilter (to extract the document text). Different indices are created during different runs; it is the job of the Merger to periodically merge the indices. While indexing, the indices are generally maintained in memory and then flushed to disk after a merge to reduce disk I/O. The metadata is stored in the property store, which is a database maintained by the ESE database engine. The text is tokenized and the tokens are stored in a custom database built using inverted indices. Apart from the indices and property store, another persistent data structure is maintained: the Gather Queue. The Gather Queue maintains a prioritized queue of URIs that need indexing. The Backoff Controller mentioned above monitors the available system resources, and controls the rate at which the indexer runs. It has three states:
 * 1) Running: In this state, the indexer runs without any restrictions. The indexer runs in this state only when there is no contention for resources.
 * 2) Throttled: In this state, the crawling of URIs and extraction of text and metadata is deliberately throttled, so that the number of operations per minute is kept under tight control. The indexer is in this state when there is contention for resources, for example, when other applications are running. Throttling the operations ensures that other applications are not starved of the resources they might need.
 * 3) Backed off: In this state, no indexing is done. Only the Gather Queues are kept active so that items do not go unindexed. This state is activated on extreme resource shortage (less than 5 MB of RAM or 200 MB of disk space), or if indexing is configured to be disabled when the computer is on battery power, or if the indexer is manually paused by the user.
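The three states above can be summarized as a small decision function. The following Python sketch is an illustration only: the thresholds (5 MB of RAM, 200 MB of disk space) come from the description above, while the class, field and function names are hypothetical, and the real Backoff Controller's inputs and precedence rules are not documented here.

```python
from dataclasses import dataclass
from enum import Enum

class IndexerState(Enum):
    RUNNING = "running"
    THROTTLED = "throttled"
    BACKED_OFF = "backed off"

MIN_FREE_RAM_MB = 5      # below this, indexing stops
MIN_FREE_DISK_MB = 200   # below this, indexing stops

@dataclass
class SystemStatus:
    free_ram_mb: float
    free_disk_mb: float
    on_battery: bool = False
    disable_on_battery: bool = False   # hypothetical user setting
    paused_by_user: bool = False
    resource_contention: bool = False  # e.g. other applications are busy

def choose_state(status):
    """Pick the indexer state for the given system status."""
    if status.paused_by_user:
        return IndexerState.BACKED_OFF
    if status.free_ram_mb < MIN_FREE_RAM_MB or status.free_disk_mb < MIN_FREE_DISK_MB:
        return IndexerState.BACKED_OFF
    if status.on_battery and status.disable_on_battery:
        return IndexerState.BACKED_OFF
    if status.resource_contention:
        return IndexerState.THROTTLED
    return IndexerState.RUNNING

state = choose_state(SystemStatus(free_ram_mb=2048, free_disk_mb=50_000,
                                  resource_contention=True))
```

The conditions that stop indexing are checked first, so throttling applies only when indexing is allowed at all.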

Advanced Query Syntax
Windows Search queries are specified in Advanced Query Syntax (AQS), which supports not only simple text searches but also advanced property-based query operations. AQS defines keywords that refine the search query, for example by applying boolean operations to the search terms (AND, OR, NOT) or by adding filters based on file metadata or file type. It can also be used to limit results to specific information stores such as regular files, the offline files cache, or email stores. File type specific operators are available as well. WDS also supports wildcard prefix matching searches and includes several SQL-like operators such as GROUP BY. AQS is locale dependent and uses different keywords in international versions of Windows 7.

Programmability
The Windows Search index can be accessed programmatically from both managed and native code. Native code connects to the index catalog by using a Data Source Object retrieved from the Indexing Service OLE DB provider. Managed code uses the MSIDXS ADO.NET provider. A catalog on a remote machine can also be queried by specifying a UNC path. The search criteria are specified using an SQL-like syntax. The SQL query can either be created by hand, or by using an implementation of the  interface. Windows Search provides implementations of the interface to convert AQS or NQS queries to their SQL counterparts.

The OLE DB/SQL API implements the functionality for searching and querying across the indices and property stores. Queries are expressed in a variant of SQL (regular SQL with certain restrictions). Results are returned as OLE DB Rowsets. Whenever a query is executed, the parts of the index it used are temporarily cached, so that further searches filtering the result set need not access the disk again, which improves performance. Windows Search stores its index in an Extensible Storage Engine file named  that exists, by default, in the   folder at the root of the system drive in Windows Vista or later versions of Windows. (The corresponding location in Windows XP is  inside the   folder.)

The index store is called SystemIndex and contains all retrievable Windows IPropertyStore values for indexed items. For example, the name and location of documents in the system are exposed as a table with the column names System.ItemName and System.ItemURL respectively. A SQL query can directly refer to these tables and index catalogues and use the MSIDXS provider to run queries against them. The search index can also be used via OLE DB, using the CollatorDSO provider. However, the OLE DB provider is read-only, supporting only SELECT and GROUP ON SQL statements.
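A query against the SystemIndex table might be assembled as in the following Python sketch. It only builds the query text; running it requires an OLE DB connection on Windows, which is not shown. The build_query helper is hypothetical, and the use of a TOP clause and a CONTAINS predicate with a quoted phrase is an assumption about the SQL dialect rather than a statement of the full grammar.

```python
def build_query(terms, columns=("System.ItemName", "System.ItemURL"), limit=None):
    """Build a read-only SELECT statement against the SystemIndex table.

    Assumption: the provider accepts TOP and a CONTAINS predicate whose
    argument is a phrase in double quotes inside a single-quoted literal.
    """
    # Double quotes would end the phrase early, so they are dropped;
    # single quotes are doubled, as in ordinary SQL string literals.
    cleaned = terms.replace('"', "").replace("'", "''")
    phrase = '"' + cleaned + '"'
    top = f"TOP {int(limit)} " if limit is not None else ""
    column_list = ", ".join(columns)
    return f"SELECT {top}{column_list} FROM SystemIndex WHERE CONTAINS('{phrase}')"

sql = build_query("The Beatles", limit=10)
```

Because the provider is read-only, a helper like this only ever needs to produce SELECT statements.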

Windows Search also registers a  application protocol, which can be used to represent searches as URIs. The search parameters and filters are encoded in the URI using AQS, or its natural language counterpart, NQS. When the URI is invoked by Explorer, Windows Search (which is the default registered handler for the protocol) launches the Search Explorer with the results of the search. In Windows Vista SP1 or later, third party handlers can also register themselves as the application protocol handler, so that searches can be performed using any search engine which the user has set as default, and not just Windows Search.

The Windows Search service provides the Notifications API component to allow applications to "push" changed items that need indexing to the Windows Search indexer. Applications use the component to supply the URIs of the items that need to be indexed, and the URIs are written to the Gather Queue, where they are read off by the indexer. Microsoft Office Outlook 2007 and Microsoft Office OneNote 2007 use this ability to index the items they manage and use Windows Search queries to provide their in-application search features. The Notifications API is also used by the internal USN Journal Notifier component of Windows Search, which monitors the Change Journal of an NTFS volume to keep track of files that have changed on the volume. If a file is in a location indexed by Windows Search and does not have the FANCI (File Attribute Not Content Indexed) attribute set, the Windows Search service is notified of its path via the Notifications API.
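The filtering rule applied to changed files, and the prioritized Gather Queue that receives them, can be sketched as follows. This Python example is a simplified model, not the Notifications API itself: the class and function names are hypothetical, the file: URI format is an assumption, and only the 0x2000 value of the Win32 FILE_ATTRIBUTE_NOT_CONTENT_INDEXED flag is taken from the Windows headers.

```python
import heapq
import itertools

FILE_ATTRIBUTE_NOT_CONTENT_INDEXED = 0x2000  # Win32 value of the "FANCI" flag

class GatherQueue:
    """Prioritized queue of URIs awaiting indexing (lower number = sooner)."""
    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # keeps equal priorities first-in, first-out

    def push(self, uri, priority=1):
        heapq.heappush(self._heap, (priority, next(self._order), uri))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)

def should_notify(path, attributes, indexed_roots):
    """True if the file is under an indexed location and not marked FANCI."""
    if attributes & FILE_ATTRIBUTE_NOT_CONTENT_INDEXED:
        return False
    lowered = path.lower()
    return any(lowered.startswith(root.lower().rstrip("\\") + "\\")
               for root in indexed_roots)

def on_file_changed(queue, path, attributes, indexed_roots, priority=1):
    """Model of the notifier: queue the changed file's URI if it qualifies."""
    if should_notify(path, attributes, indexed_roots):
        queue.push("file:///" + path.replace("\\", "/"), priority)  # assumed URI form
        return True
    return False

roots = [r"C:\Users\dave"]
queue = GatherQueue()
on_file_changed(queue, r"C:\Users\dave\report.docx", 0x20, roots)    # queued
on_file_changed(queue, r"C:\Users\dave\private.txt", 0x2020, roots)  # FANCI set, skipped
on_file_changed(queue, r"C:\Windows\system.dll", 0x20, roots)        # not indexed, skipped
```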

Windows Search Configuration APIs are used to specify configuration settings, such as the roots of the URIs that need to be monitored and the frequency of crawling, and to view status information such as the number of items indexed, the length of the gather queue, or the reason for throttling the indexer. It also exposes APIs to register protocol handlers (via the  interface), property handlers (via the   interface) or IFilter implementations (via the   interface).   implementations allow only read-only extraction of text and properties, whereas   allows properties to be written as well.

Windows Desktop Search
Windows Desktop Search is the implementation of Windows Search for Windows XP and Windows Server 2003. Searches are specified using the Advanced Query Syntax and are executed while the user types (incremental find). By default, it comes with a number of IFilters for the most common file types—documents, audio, video as well as protocol handlers for Microsoft Outlook e-mails. Other protocol handlers and IFilters can be installed as needed.

User interface


The Windows Desktop Search functionality is exposed via a Windows Taskbar mounted deskbar. It provides a text field to type the query and the results are presented in a flyout pane. It also integrates as a Windows Explorer window. On selecting a file in the Explorer window, a preview of the file is shown in the right hand side of the window, without opening the application which created the file. Web searches can be initiated from both interfaces, but that will open the browser to search the terms using the default search engine.

The deskbar can also create application aliases, which are short strings that can be set to open different applications. This functionality is accessed by prefixing the ! character to the predefined string. For example, "!calc" opens the Windows Calculator. The help documentation includes syntax for creating application aliases out of any text string, regardless of prefix. This feature can also be used to create shortcuts for URLs which, when entered, open the specified URL in the browser. It can also pass parameters in the URL, which makes it possible to create search aliases. For example, "w text" can be configured to search for "text" in Wikipedia.

Releases
Windows Desktop Search was initially released as MSN Desktop Search, as a part of the MSN Toolbar suite. It was re-introduced as Windows Desktop Search with version 2, while still being distributed with MSN Toolbar Suite.

For Windows 2000, Windows XP and Windows Server 2003, it came in two flavors, one for home users and the other for enterprise use. The only difference between the two was that the latter could be configured via group policy. The home edition was bundled with MSN Toolbar, while the other was available as a standalone application. Later, when MSN Toolbar was discontinued in favor of Windows Live Toolbar, the home edition of Windows Desktop Search was discontinued as well. The last version available for Windows 2000 is Windows Desktop Search 2.66.

For Windows XP and Windows Server 2003, version 3.0 of Windows Desktop Search was provided as a standalone release, separate from Windows Live Toolbar. One significant new feature is that Windows Desktop Search 3.0 also installs on Windows XP the Property System introduced in Windows Vista. Windows Desktop Search 3.0 is geared for pre-Windows Vista users, and its indexer was implemented as a Windows Service rather than as a per-user application, so that the same index and a single instance of the service can be shared across all users, thereby improving performance. Windows Desktop Search found itself in the midst of a controversy on October 25, 2007, when Windows Desktop Search 3.01 was automatically pushed out and installed on Windows systems updated via Windows Server Update Services (WSUS). Microsoft responded with two posts on the WSUS Product Team Blog.

Windows Search
Windows Search is the indexed search platform in Windows Vista, Windows 7 and Windows Server 2008, and offers a superset of the features provided by Windows Desktop Search, while being API compatible with it. Unlike WDS, it can seamlessly search indexed as well as non-indexed locations – for indexed locations the index is used and for non-indexed locations, the property handlers and IFilters are invoked on the fly as the search is being performed. This allows for more consistent results, though at the cost of searching speed over non-indexed locations. Windows Search uses Group Policy for centralized management.

Windows Search indexes offline caches of network shares, in addition to the local file systems, Microsoft Outlook e-mail stores and Microsoft OneNote stores indexed by WDS. Windows Search also supports queries against a remote index. This means that if the file server on which a network file share is hosted is running either Windows Vista or a later version of Windows, or Windows Search 4.0 on Windows XP, any search against the share is run against the server's index and the results are presented to the client system, with the files the user does not have access to filtered out. This procedure is transparent to the user.

Unlike Windows Desktop Search on Windows XP, the Windows Search indexer performs its I/O operations with low priority, and the process also runs with low CPU priority. As a result, whenever other processes require the I/O bandwidth or processor time, they are able to pre-empt the indexer, thereby significantly reducing the performance hit associated with the indexer running in the background.

Windows Search supports natural language searches; so the user can search for things like "photo taken last week" or "email sent from Dave". However, this is disabled by default. Natural language search expresses the queries in Natural Query Syntax (NQS), which is the natural language equivalent of AQS.

User interface
The search functionality is exposed using the search bars in the Start menu and the upper right hand corner of Windows Explorer windows, as well as Open/Save dialog boxes. When searching from the Start menu, the results are shown in the Start menu itself, overlapping the recently used programs. From the Start menu, it is also possible to launch an application by searching for its executable image name or display name. Searching from the search bars in Explorer windows replaces the content of the current folder with the search results. The Explorer windows can also render thumbnails in the search results if a Thumbnail Handler is registered for a particular file type. It can also render enhanced previews of items in a Preview Pane without launching the default application, if the application has registered a Preview Handler. This can provide functionality such as file type-specific navigation (such as browsing a presentation using next/previous controls, or seeking inside a media file). Preview handlers can also allow certain kinds of selections (such as highlighting a text snippet) to be performed from the preview pane itself. In the Control Panel, the search bar in the window can also search for Control Panel options. However, unlike WDS, Windows Search does not support creating aliases.

There is also a Search Explorer, which is an integrated Windows Explorer window that is used for searches. It presents the user interface to specify the search parameters, including locations and file types that should be searched, and certain operators, without crafting the AQS queries by hand. With Windows Vista SP1, third party applications can override the Search Explorer as the default search interface, so that the registered third party application is launched instead of the Search Explorer whenever a search is invoked by any means.

In Windows Search, which is part of Windows Vista, it is also possible to save a search query as a Virtual Folder, called a Saved Search or Search Folder which, when accessed, runs the search with the saved query and returns the results as a folder listing. Physically, a search folder is just an XML file (with a  extension) which stores the search query (in either AQS or NQS), including the search operators as well. Windows Vista also supports query composition, where a saved search (called a scope) can be nested within the query string of another search. Search Folders are also distributable via RSS. They can also be shared as a SearchMelt, which is accessible over a network. Accessing a SearchMelt over the network, like a regular Search Folder, makes the results of the search available as a virtual shared folder. The search will be performed on the machine which shares the SearchMelt, and will return only the results accessible from the network. However, by default, search folders are scoped for local use only; before sharing, they must be configured for remote access. Microsoft makes a SearchMelt Creator tool available for this as well.

Windows Search 4.0
Windows Search 4.0 is the successor to the Windows Search platform for both Windows Desktop Search 3.0 on Windows XP as well as Instant Search on Windows Vista. It is mainly an update to the indexing components, with few changes to the XP user interface and none on Vista. It also enables remote query support on XP and Windows Server 2003 based systems, which previously was a Vista-only feature. This allows a user with a Vista client (or an XP client with Windows Search 4.0) to search the index of networked machines which are also running a supported operating system (Windows 8, 7, Vista, Windows Server 2008, or XP/2003 with Windows Search 4.0).

The first beta of Windows Search 4.0 was released on March 27, 2008. It included numerous performance improvements to the indexer and brought new features to XP, some of them previously exclusive to Vista: Group Policy integration, federation of searches to remote indexes, support for EFS-encrypted files and Vista-style preview handlers that allow document-type specific browsing of documents in the preview pane.

Windows Search 4.0 was released on June 3, 2008 and is supported on XP, Windows Server 2003, Vista, Windows Server 2008 and Windows Home Server.