Chapter 11: Searching and Categorizing Content

The Catalog is Zope's built-in search engine. It allows you to categorize and search all kinds of Zope objects. You can also use it to search external data such as relational data, files, and remote web pages. In addition to searching, you can use the Catalog to organize collections of objects.

The Catalog supports a rich query interface. You can perform full text searching, and can search multiple indexes at once. In addition, the catalog keeps track of meta-data about indexed objects. Here are the two most common ZCatalog usage patterns:

Mass Cataloging
Cataloging a large collection of objects all at once.
Automatic Cataloging
Cataloging objects as they are created and tracking changes made to them.

Getting started with Mass Cataloging

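Let's take a look at how to use the catalog to search documents. Cataloging a bunch of objects all at once is called mass cataloging. Mass cataloging involves three steps: creating a ZCatalog, finding objects and cataloging them, and creating a web page and result form to query the ZCatalog. Let's start with the first step.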

Choose ZCatalog from the product add list to create a ZCatalog object. This takes you to the ZCatalog add form, as shown in Figure 9-1.

Figure 9-1 ZCatalog add form

The Add form asks you for an Id and a Title. The third form element is the Vocabulary select box. For now, leave this box on "Create one for me". Give your ZCatalog the Id "AnimalTracker" and click Add to create your new catalog. The Catalog icon looks like a folder with a small magnifying glass on it. Select the AnimalTracker icon to see the Contents view of the Catalog.

A ZCatalog looks a lot like a folder, but it has a few more tabs. Six of the tabs on a ZCatalog are exactly the same six tabs you find on a standard folder. ZCatalogs have the following views: Contents, Catalog, Properties, Indexes, MetaData, Find Objects, Advanced, Undo, Security, and Ownership. When you click on a ZCatalog, you are on the Contents view. Here, you can add new objects and the ZCatalog will contain them just as any folder does. Note that containment does not imply that an object is searchable: objects become searchable only when they are cataloged.

Now that you have created a ZCatalog, you can move on to the next step, finding objects and cataloging them. Suppose you have a zoo site with information about animals. To work with these examples, create two DTML Documents that contain information about reptiles and amphibians:

Title: Chilean four-eyed frog
The Chilean four-eyed frog has a bright pair of spots on its rump that look like enormous eyes. When seated, the frog's thighs conceal these eyespots. When predators approach, the frog lowers its head and lifts its rump, creating a much larger and more intimidating head. Frogs are amphibians.
Title: Carpet python
Morelia spilotes variegata averages 2.4 meters in length. It is a medium-sized python with black-to-gray patterns of blotches, crossbands, stripes, or a combination of these markings on a light yellowish-to-dark brown background. Snakes are reptiles.

Visitors to your Zoo want to be able to search for information on the Zoo's animals. Eager herpetologists want to know if you have their favorite snake, so you should provide them with the ability to search for certain words and show all the documents that contain those words. Searching is one of the most useful and common web activities.

The AnimalTracker ZCatalog you created can catalog all of the documents in your Zope site and let your users search for specific words. To catalog your documents, go to the AnimalTracker ZCatalog and click on the Find Objects tab.

In this view, you tell the ZCatalog what kind of objects you are interested in. You want to catalog all DTML Documents so select DTML Document from the Find objects of type multiple selection and click Find and Catalog.

The ZCatalog will now start from the folder where it is located and search for all DTML Documents. It will search the folder and then descend down into all of the sub-folders and their sub-folders. If you have lots and lots of objects, this may take a long time to complete, so be patient.

After a period of time, the Catalog will take you to the Catalog view automatically, with a status message telling you what it just did.

Below the status information is a list of objects that are cataloged. They are all DTML Documents. To confirm that these are the objects you are interested in, you can click on them to visit them.

You have now completed the first two steps: creating a ZCatalog and cataloging your objects into it. Your documents are in the ZCatalog's database, so you can move on to the third step, creating a web page and result form to query the ZCatalog.

Search and Report Forms

To create search and report forms, make sure you are inside the AnimalTracker catalog and select Z Search Interface from the add list. Select the AnimalTracker ZCatalog as the searchable object, as shown in Figure 9-2.

Figure 9-2 Creating a search form for a ZCatalog

Name the Report Id "SearchResults" and the Search Input Id "SearchForm" and click Add. This will create two new DTML Methods in the AnimalTracker ZCatalog named SearchForm and SearchResults.

These objects are contained in the ZCatalog, but they are not cataloged by the ZCatalog. The AnimalTracker has only cataloged DTML Documents. The search Form and Report methods are just a user interface to search the animal documents in the Catalog. You can verify this by noting that the search and report forms are not listed in the Cataloged Objects tab.

To search the AnimalTracker ZCatalog, select the SearchForm method and click on its View tab. This form has a number of elements on it. There is one search element for each index in the ZCatalog. Indexes are explained further in the next section. For now, you want to use the PrincipiaSearchSource form element. You can leave all the other form elements blank.

By typing words into the PrincipiaSearchSource form element you can search all of the documents cataloged by the AnimalTracker ZCatalog. For example, type in the word "Reptiles". The AnimalTracker ZCatalog will be searched and return a simple table of objects that have the word "Reptiles" in them. The search results should include the carpet python. You can also try specifying multiple search terms like "reptile amphibian". Search results for this query should include both the Chilean four-eyed frog and the carpet python. Congratulations, you have successfully created a catalog, cataloged content into it, and searched it through the web.

Configuring Catalogs

The Catalog is capable of much more powerful and complex searches than the one you just performed. Let's take a look at how the Catalog stores information. This will help you tailor your catalogs to provide the sort of searching you want.

Defining Indexes

ZCatalogs store information about objects and their contents in fast databases called indexes. Indexes can store and retrieve large volumes of information very quickly. You can create different kinds of indexes that remember different kinds of information about your objects. For example, you could have one index that remembers the text content of DTML Documents, and another index that remembers any objects that have a specific property.

When you search a ZCatalog you are not searching through your objects one by one. That would take far too much time if you had a lot of objects. Before you search a ZCatalog, it looks at your objects and remembers whatever you tell it to remember about them. This process is called indexing. From then on, you can search for certain criteria and the ZCatalog will return objects that match the criteria you provide.

A good way to think of an index in a ZCatalog is just like an index in a book. For example, in a book's index you can look up the word Python:

        Python: 23, 67, 227

The word Python appears on three pages. Zope indexes work like this except that they map the search term, in this case the word Python, to a list of all the objects that contain it, instead of a list of pages in a book.
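The book-index analogy can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how ZCatalog is actually implemented; the document ids, the sample text and the build_index helper are made up for the example.

```python
# A toy inverted index: maps each word to the set of document ids
# that contain it. The ids and texts are hypothetical examples.
documents = {
    "frog_doc": "The Chilean four-eyed frog is an amphibian",
    "python_doc": "The carpet python is a medium-sized python",
}


def build_index(docs):
    """Return a dict mapping lowercased words to sets of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


index = build_index(documents)

# Looking up a word is a single dictionary access, no matter how many
# documents were indexed.
python_hits = sorted(index.get("python", set()))
```

Just as with a book's index, the expensive work of reading every document happens once, when the index is built, and each later lookup is cheap.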

In Zope 2.4, indexes can be added and removed from a Catalog using a new, "pluggable" index interface as shown in Figure 9-3:

Figure 9-3 Managing indexes

Here, you can see that ZCatalogs come with some predefined indexes. Each index has a name, like PrincipiaSearchSource, and a type, like TextIndex.

When you catalog an object the Catalog uses each index to examine the object. The catalog consults attributes and methods to find an object's value for each index. For example, in the case of the DTML Documents cataloged with a PrincipiaSearchSource index, the Catalog calls each document's PrincipiaSearchSource method and records the results in its PrincipiaSearchSource index. If the Catalog cannot find an attribute or method for an index, then it ignores it. In other words it's fine if an object does not support a given index. There are four kinds of indexes:
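The lookup rule described above (use an attribute or call a method named after the index, and skip the object if neither exists) might be sketched like this. The ToyDocument class and the value_for_index helper are hypothetical stand-ins, not part of Zope.

```python
class ToyDocument:
    """Hypothetical stand-in for a DTML Document."""

    def __init__(self, title, body):
        self.title = title  # a plain attribute
        self._body = body

    def PrincipiaSearchSource(self):
        # A method: it is called when the object is indexed.
        return self._body


def value_for_index(obj, index_name):
    """Return (found, value) for the attribute or method named index_name."""
    missing = object()
    value = getattr(obj, index_name, missing)
    if value is missing:
        # The object does not support this index, so it is skipped.
        return False, None
    if callable(value):
        value = value()
    return True, value


doc = ToyDocument("Carpet python", "Snakes are reptiles.")
results = {
    name: value_for_index(doc, name)
    for name in ("title", "PrincipiaSearchSource", "author")
}
```

Here the document has no author, so an author index would simply not record anything for it.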

TextIndex
Searches text. Use this kind of index when you want a full-text search.
FieldIndex
Searches objects for specific values. Use this kind of index when you want to search date objects, numbers, or specific strings.
KeywordIndex
Searches collections of specific values. This index is like a FieldIndex, but it allows you to search collections rather than single values.
PathIndex
Searches for all objects that contain certain URL path elements. For example, you could search for all the objects whose paths begin with /Animals/Zoo.

We'll examine these different indexes more closely later in the chapter. New indexes can be created from the Indexes view of a ZCatalog. There, you can enter the name and select a type for your new index. This creates a new empty index in the ZCatalog. To populate this index with information, you need to go to the Advanced view and click the Update Catalog button. Recataloging your content may take a while if you have lots of cataloged objects.

To remove an index from a Catalog, select the index on the Indexes view and click on the Delete button. This will delete the index and all of its indexed content. As usual, this operation can be undone.

Defining Meta Data

The ZCatalog can not only index information about your object, but it can also store information about your object in a tabular database called the Meta-Data Table. The Meta-Data Table works similarly to a relational database table. It consists of one or more columns that define the schema of the table, and it is filled with rows, one for each cataloged object, that hold the information you chose to store. Your meta-data columns don't need to match your Catalog's indexes. Indexes allow you to search; meta-data allows you to report search results.

The Meta-Data Table is useful for generating search reports. It keeps track of information about objects that goes on your report forms. For example, if you create a Meta-Data Table column called absolute_url, then your report forms can use this information to create links to your objects that are returned in search results.
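To make the idea of a meta-data schema concrete, here is a small sketch in plain Python. The Record type, the make_record helper and the example URL are assumptions made for illustration; they only mimic the behavior described above.

```python
from collections import namedtuple

# Hypothetical meta-data schema: one field per Meta-Data Table column.
Record = namedtuple("Record", ["title", "absolute_url"])


def make_record(obj_info):
    """Copy only the schema's columns out of a dict describing an object."""
    return Record(*(obj_info.get(column) for column in Record._fields))


python_info = {
    "title": "Carpet python",
    "absolute_url": "http://example.com/zoo/carpet_python",
    "body": "Snakes are reptiles.",  # not a column, so it is not stored
}

record = make_record(python_info)

# A report form can build a link from the stored columns without
# touching the original object.
link = '<a href="%s">%s</a>' % (record.absolute_url, record.title)
```

Only the columns named in the schema end up in the row, which is why a report can only display values that have a meta-data column.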

To add a new Meta-Data Table column, type in the name of the column on the Meta-Data Table view and click Add. To remove a column from the Meta-Data Table, select the column check box and click on the Delete button. This will delete the column and all of its content for each row. As usual, this operation can be undone. Next let's look more closely at how to search a Catalog.

Searching Catalogs

You can search a Catalog by passing it search terms. These search terms describe what you are looking for in one or more indexes. The Catalog can glean this information from the web request, or you can pass this information explicitly from DTML or Python. In response to a search request, a Catalog will return a list of records corresponding to the cataloged objects that match the search terms.

Searching with Forms

In this chapter you used the Z Search Interface to automatically build a Form/Action pair to query a Catalog (the Form/Action pattern is discussed in Chapter 4, "Dynamic Content with DTML"). The Z Search Interface builds a very simple form and a very simple report. These two methods are a good place to start understanding how Catalogs are queried and how you can customize and extend your search interface.

Suppose you have a catalog that holds news items. Each news item has content, an author and a date. Your catalog has three indexes that correspond to these attributes. The content index is a text index, and the author and date indexes are field indexes. Here is the search form that would allow you to query such a catalog:

        <dtml-var standard_html_header>

        <form action="Report" method="get">
        <h2><dtml-var document_title></h2>
        Enter query parameters:<br><table>

        <tr><th>Content</th>
            <td><input name="content" width=30 value=""></td></tr>
        <tr><th>Author</th>
            <td><input name="author" width=30 value=""></td></tr>
        <tr><th>Date</th>
            <td><input name="date"  width=30 value=""></td></tr>

        <tr><td colspan=2 align=center>
        <input type="SUBMIT" value="Submit Query">
        </td></tr>
        </table>
        </form>

        <dtml-var standard_html_footer>

This form consists of three input boxes named content, author, and date. The names of the form elements must match the names of the catalog's indexes for the catalog to find the search terms. Here is a report form that works with the search form:

        <dtml-var standard_html_header>

        <table>
          <dtml-in NewsCatalog>
          <tr>
            <td><dtml-var author></td>
            <td><dtml-var date></td>
          </tr>
          </dtml-in>
        </table>

        <dtml-var standard_html_footer>

There are a few things going on here which merit closer examination. The heart of the whole thing is the in tag:

        <dtml-in NewsCatalog>

This tag calls the NewsCatalog Catalog. Notice how the form parameters from the search form (content, author, date) are not mentioned here at all. Zope automatically makes sure that the query parameters from the search form are given to the Catalog. All you have to do is make sure the report form calls the Catalog. Zope locates the search terms in the web request and passes them to the Catalog.

The Catalog returns a sequence of Record Objects (just like ZSQL Methods). These record objects correspond to search hits, which are objects that match the search criteria you typed in. For a record to match a search, it must match all criteria for each specified index. So if you enter an author and some search terms for the contents, the Catalog will only return records that match both the author and the contents.
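The rule that a record must match every index named in the query can be pictured as a set intersection. The following sketch uses hypothetical data and a made-up search function; it only illustrates the matching rule and is not ZCatalog's code.

```python
# Each toy index maps a search value to the set of record ids that match.
indexes = {
    "author": {"bob": {1, 2}, "alice": {3}},
    "content": {"python": {2, 3}, "frog": {1}},
}


def search(query):
    """Intersect the hits from every index named in the query mapping."""
    result = None
    for index_name, term in query.items():
        hits = indexes[index_name].get(term, set())
        result = hits if result is None else result & hits
    # Simplification for this sketch: an empty query returns no hits.
    return set() if result is None else result


both = search({"author": "bob", "content": "python"})
author_only = search({"author": "bob"})
```

Adding a second criterion can only narrow the result: bob wrote records 1 and 2, but only record 2 also mentions "python".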

Record objects returned by ZSQL Methods have an attribute for every column in the database table. Record objects for Catalogs work very similarly, except that a Catalog Record object has an attribute for every column in the Meta-Data Table. In fact, the purpose of the Meta-Data Table is to define the schema for the Record objects that Catalog queries return.

Searching from Python

DTML makes querying a Catalog from a form very simple. For the most part, DTML will automatically make sure your search parameters are passed properly to the Catalog.

Sometimes, though, you may not want to search a Catalog from a web form; some other part of your application may want to query a Catalog. For example, suppose you want to add a sidebar to the Zope Zoo that shows only the news items that relate to the animals in the section of the site that you are currently looking at. As you've seen, the Zope Zoo site is built up from Folders that organize all the sections according to animal. Each Folder's id is a name that specifies the group or animal the folder contains. Suppose you want your sidebar to show you all the news items that contain the id of the current section. Here is a Script called relevantSectionNews that queries the news Catalog with the current folder's id:

        ## Script (Python) "relevantSectionNews"
        ##
        """ Returns news relevant to the current folder's id """
        id=context.getId()
        return context.NewsCatalog({'content' : id})

This script queries the NewsCatalog by calling it like a method. Catalogs expect a mapping as the first argument when they are called. The argument maps the name of an index to the search terms you are looking for. In this case, the content index will be queried for all news items that contain the name of the current Folder. To use this in your sidebar, just edit the Zope Zoo's standard_html_header to use the relevantSectionNews script:

        <html>
        <body>
        <dtml-var style_sheet>
        <dtml-var navigation>
        <ul>
        <dtml-in relevantSectionNews>
          <li><a href="&dtml-absolute_url;"><dtml-var title></a></li>
        </dtml-in>
        </ul>

This method assumes that you have defined absolute_url and title as meta-data columns in the news Catalog. Now, when you are in a particular section, the sidebar will show a simple list of links to news items that contain the id of the current animal section you are viewing.

Searching and Indexing Details

Earlier you saw that the Catalog supports four types of indexes: text indexes, field indexes, keyword indexes and path indexes. Let's examine these indexes more closely to understand what they are good for and how to search them.

Searching Text Indexes

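A Text Index is used to index text. After indexing, you can search the index for objects that contain certain words. Text Indexes support a rich search grammar for doing more advanced searches than just looking for a word. Among other things, ZCatalog's Text Index can combine terms with Boolean operators such as AND and NOT, group expressions with parentheses, and match wildcard patterns such as "Zoo*".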

All of these advanced features can be mixed together. For example, "((bob AND uncle) AND NOT Zoo*)" will return all objects that contain the terms "bob" and "uncle" but will not include any objects that contain words that start with "Zoo" like "Zoologist", "Zoology", or "Zoo" itself.
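One way to picture how such a mixed query is evaluated is as set arithmetic over the hits for each word. The sketch below does not parse query strings and is not the Text Index implementation; the word_index data and the glob helper are hypothetical, and the wildcard matching is borrowed from Python's standard fnmatch module.

```python
from fnmatch import fnmatchcase

# Toy word index: lowercased word -> set of object ids (hypothetical data).
word_index = {
    "bob": {1, 2, 3},
    "uncle": {1, 2},
    "zoo": {2},
    "zoology": {4},
}


def glob(pattern):
    """Union the hits of every indexed word matching a wildcard pattern."""
    hits = set()
    for word, ids in word_index.items():
        if fnmatchcase(word, pattern.lower()):
            hits |= ids
    return hits


# ((bob AND uncle) AND NOT Zoo*) written as set operations:
# AND is intersection, AND NOT is set difference.
result = (word_index["bob"] & word_index["uncle"]) - glob("Zoo*")
```

Object 2 contains both "bob" and "uncle" but is dropped because it also contains "zoo", which matches the wildcard.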

Querying a TextIndex with these advanced features works just like querying it with the original simple features. In the HTML search form for DTML Documents, for example, you could enter "Koala AND Lion" and get all documents about Koalas and Lions. Querying a TextIndex from Python with advanced features works much the same; suppose you want to change your relevantSectionNews Script to not include any news items that contain the word "catastrophic":

        ## Script (Python) "relevantSectionNews"
        ##
        """ Returns relevant, non-catastrophic news """
        id=context.getId()
        return context.NewsCatalog(
                 {'content' : id + ' AND NOT catastrophic'}
                )

TextIndexes are very powerful. When mixed with the Automatic Cataloging pattern described later in the chapter, they give you the ability to automatically free-text search all of your objects as you create and edit them.

Vocabularies

Vocabularies are used by text indexes. A vocabulary is an object that manages language specific text indexing options. In order for the ZCatalog to work with any kind of language, it must understand certain behaviors of that language. For example, all languages:

Current Vocabularies

There are a number of vocabularies currently available for ZCatalog:

Plain Vocabularies
Plain vocabularies are very simple and do minimal English language specific tasks.
Globbing Vocabularies
Globbing vocabularies are more complex vocabularies that allow wild card searches on English text to be performed. The down side of them is that they consume a lot more memory and database space than plain vocabularies.

The idea behind Vocabularies is to customize the way text in any language is indexed. Because of this, other languages may be supported in the future by people who create a Vocabulary specific to their language. Creating your own Vocabulary is an advanced topic, and beyond the scope of this book.

Using Vocabularies

When you create a new ZCatalog, the ZCatalog add form has a select box for you to choose a vocabulary to use. If you do not select a vocabulary, the ZCatalog automatically creates a Plain Vocabulary for you, and adds it to the ZCatalog's contents (this can be seen on the Contents view of the AnimalTracker you created for the examples in this chapter).

To use a Globbing Vocabulary or any other kind of Vocabulary, you must create it before you create the Catalog you want to use it on. A ZCatalog can use any Vocabulary inside its contents or any Vocabulary that it can find above it in the Zope Folder hierarchy.

Searching Field Indexes

FieldIndexes differ slightly from TextIndexes. A TextIndex will treat the value it finds in your object, for example the contents of a News Item, like text. This means that it breaks the text up into words and indexes all the individual words.

A FieldIndex does not break up the value it finds. Instead, it indexes the entire value it finds. This is very useful for tracking objects that have traits with fixed values.
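The difference between the two kinds of index comes down to what gets stored as a lookup key. Here is a minimal sketch of that difference; the two helper functions and the author name are made up for illustration.

```python
def text_index_keys(value):
    """A text-style index stores one entry per word."""
    return value.lower().split()


def field_index_keys(value):
    """A field-style index stores the whole value as a single entry."""
    return [value]


author = "Jane Doe"  # hypothetical author value

text_keys = text_index_keys(author)
field_keys = field_index_keys(author)
```

With the text-style keys a search for "doe" finds the item; with the field-style key only the exact value "Jane Doe" does.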

In the news item example, you created two FieldIndexes, date and author. With the existing search form, these fields are not very useful. To use them more effectively you have to customize your search form a little. Before doing that though, let's consider some use cases for these indexes.

The date index lets you search for News Items by the time they were created. The existing search form is not very useful for this, though, because you have to type in exactly the time you are looking for, right down to the second, to get any hits. It would be better to search for a range of dates, like all of the News Items added in the last 24 hours, or all of the News Items from last month.

The author index lets you search for News Items by certain authors. Unless you know exactly the name of the author you are looking for though, you will not get any results. It would be better to be able to select from a list of all the unique authors indexed by the author index.

FieldIndexes are designed to do both range searching and searching for a unique value in the index. To take advantage of these features, you need only change your search form a little bit. Let's try the first example, range searching with dates.

Like TextIndexes, FieldIndexes can be passed special options to enable these features. These special features need to be passed in as form elements that get turned into Catalog queries. Here is the search form used in the previous section Searching with Forms, but with some new form elements added to enable searching for News Items modified since "Yesterday", "Last Week", "Last Month", "Last Year" or "Ever":

        <dtml-var standard_html_header>

        <form action="Report" method="get">
        <h2><dtml-var document_title></h2>
        Search for News Items:<br><table>

        <tr><th>Content</th>
            <td><input name="content" width=30 value=""></td></tr>
        <tr><th>Author</th>
            <td><input name="author" width=30 value=""></td></tr>
        <tr>
          <td><p>modified since:</p></td>
          <td>
            <input type="hidden" name="date_usage" value="range:min">
            <select name="date:date">
              <option value="<dtml-var expr="ZopeTime(0)" >">Ever</option> 
              <option value="<dtml-var expr="ZopeTime() - 1" >">Yesterday</option>
              <option value="<dtml-var expr="ZopeTime() - 7" >">Last Week</option>
              <option value="<dtml-var expr="ZopeTime() - 30" >">Last Month</option>
              <option value="<dtml-var expr="ZopeTime() - 365" >">Last Year</option>
            </select>
          </td>
        </tr>

        <tr><td colspan=2 align=center>
        <input type="SUBMIT" value="Submit Query">
        </td></tr>
        </table>
        </form>
        <dtml-var standard_html_footer>

This should make your search form look like Figure 9-4.

Figure 9-4 Range searching by Date

This HTML form changes the date format from the old search form. Instead of just a text box, it offers you a selection box where you can choose a date. But remember, this is a range search. Can you spot the part that tells the date FieldIndex to search by range? Here it is:

        <input type="hidden" name="date_usage" value="range:min">

This is a special kind of HTML form element called a hidden element. It does not show up anywhere on the search form that you look at, but it is still passed into Zope when you submit the form. This special element, called date_usage, tells the date FieldIndex that the value in the date form element is a minimum range boundary. This means that the FieldIndex will not just return objects that have that date, but it will return objects that have that date or any later date.

Any kind of FieldIndex can be told what kind of range specifiers to use by adding an additional search argument that suffixes the index name with "_usage". In addition to specifying a minimum range boundary, you can specify a maximum range boundary by changing the hidden form element to:

        <input type="hidden" name="date_usage" value="range:max">

This will cause the search form to return all News Items modified before the specified date, instead of after.

The "_usage" syntax can also be used when calling a Catalog directly from a script, like this Script, relevantRecentSectionNews:

        ## Script (Python) "relevantRecentSectionNews"
        ##
        """ Return relevant, and recent, news for this section """
        id=context.getId()
        return context.NewsCatalog(
                 {'content'    : id,
                  # ZopeTime is not a global in Scripts; acquire it
                  # through context. Subtracting 7 means 7 days ago.
                  'date'       : context.ZopeTime() - 7,
                  'date_usage' : 'range:min',
                 }
                )

This works just like your old relevantSectionNews script, except that it only shows news items created in the last week.

You can also supply both a minimum and maximum range boundary. There's one catch to this, however. Normally if you specify no range boundary or just one boundary, ZCatalog uses the value you pass in as the search term. But when you provide two range boundaries, the ZCatalog needs two values, not one. Here is the relevantRecentSectionNews Script above with some slight modification to provide a list of date objects instead of just one:

        ## Script (Python) "relevantRecentSectionNews"
        ##
        """
        Return relevant news modified in the last month, but not the
        last week
        """
        id=context.getId()
        # ZopeTime is not a global in Scripts; acquire it through context.
        now=context.ZopeTime()
        return context.NewsCatalog(
                 {'content'    : id,
                  # minimum first, then maximum, to match 'range:min:max'
                  'date'       : [now - 30, now - 7],
                  'date_usage' : 'range:min:max',
                 }
                )

This script will return all of the relevant News Items modified in the last month, but not in the last week. When using two range specifiers, it is important that the order of the values matches the order of the range specifiers. If you accidentally switch "min" and "max" around without also switching the two dates, you will get no search results, because the query doesn't make sense: it provides a minimum value that is larger than the maximum value.
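The three kinds of range query can be modeled with a sorted list and Python's standard bisect module. This is only a sketch of the behavior described in this section: the range_search function is hypothetical, and plain day numbers stand in for date objects.

```python
from bisect import bisect_left, bisect_right

# Toy field index: sorted (value, record id) pairs. The values are day
# numbers standing in for dates; the data is hypothetical.
entries = sorted([(1, "a"), (10, "b"), (25, "c"), (29, "d")])
values = [value for value, _ in entries]


def range_search(bounds, usage):
    """Return record ids within the bounds described by a usage string."""
    lo, hi = 0, len(entries)
    if usage == "range:min":
        lo = bisect_left(values, bounds[0])
    elif usage == "range:max":
        hi = bisect_right(values, bounds[0])
    elif usage == "range:min:max":
        lo = bisect_left(values, bounds[0])
        hi = bisect_right(values, bounds[1])
    else:
        raise ValueError("unsupported usage: %r" % usage)
    return [record_id for _, record_id in entries[lo:hi]]


since = range_search([10], "range:min")           # value 10 or later
until = range_search([10], "range:max")           # value 10 or earlier
between = range_search([5, 26], "range:min:max")  # from 5 through 26
swapped = range_search([26, 5], "range:min:max")  # minimum above maximum
```

The last call shows the pitfall mentioned above: when the minimum is larger than the maximum, the range is empty and nothing is returned.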

The second use case you considered above was being able to search from a list of all unique authors. There is a special method on the ZCatalog, called uniqueValuesFor, that does exactly this. The uniqueValuesFor method returns a list of unique values for a certain index. Let's change your search form yet again, and replace the original author input box with something a little more useful:

        <dtml-var standard_html_header>

        <form action="Report" method="get">
        <h2><dtml-var document_title></h2>
        Search for News Items:<br><table>

        <tr><th>Content:</th>
            <td><input name="content" width=30 value=""></td></tr>
        <tr valign="top">
           <td><p>Author:</p></td>

           <td>
             <select name="author:list" size=6 MULTIPLE>
             <dtml-in expr="NewsCatalog.uniqueValuesFor('author')">
               <option value="<dtml-var sequence-item>">
               <dtml-var sequence-item></option>
             </dtml-in>
             </select>
           </td>
         </tr>

        <tr>
          <td><p>modified since:</p></td>
          <td>
            <input type="hidden" name="date_usage" value="range:min">
            <select name="date:date">
              <option value="<dtml-var "ZopeTime(0)" >">Ever</option> 
              <option value="<dtml-var "ZopeTime() - 1" >">Yesterday</option>
              <option value="<dtml-var "ZopeTime() - 7" >">Last Week</option>
              <option value="<dtml-var "ZopeTime() - 30" >">Last Month</option>
              <option value="<dtml-var "ZopeTime() - 365" >">Last Year</option>
            </select>
          </td>
        </tr>

        <tr><td colspan=2 align=center>
        <input type="SUBMIT" name="SUBMIT" value="Submit Query">
        </td></tr>
        </table>
        </form>
        <dtml-var standard_html_footer>

The new, important bit of code added to the search form is:

        <select name="author:list" size=6 MULTIPLE>
        <dtml-in expr="NewsCatalog.uniqueValuesFor('author')">
          <option value="<dtml-var sequence-item>">
          <dtml-var sequence-item></option>
        </dtml-in>
        </select>

The HTML was also changed a bit to make the on-screen presentation make sense.

In this example, you are changing the form element author from just a simple text box to an HTML multiple select box. This box contains a unique list of all the authors that are indexed in the author FieldIndex. Now, your search form should look like Figure 9-5.
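Conceptually, the list of unique values is just the set of distinct keys stored in the index. The sketch below imitates that idea in plain Python and builds the option elements for the select box; unique_values_for and render_options are made-up helpers, and sorting the values is a choice made here for display, not something this chapter says about uniqueValuesFor.

```python
from html import escape

# Toy field index for 'author': value -> set of record ids (hypothetical).
author_index = {
    "bob": {1, 2},
    "alice": {3},
    "carol": {4, 5, 6},
}


def unique_values_for(index):
    """Return each distinct indexed value once, sorted for display."""
    return sorted(index)


def render_options(values):
    """Build one <option> element per value, escaping it for HTML."""
    return [
        '<option value="%s">%s</option>' % (escape(v, quote=True), escape(v))
        for v in values
    ]


options = render_options(unique_values_for(author_index))
```

Each author appears once in the select box, no matter how many News Items that author wrote.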

Figure 9-5 Range searching and unique Authors

That's it. You can continue to extend this search form using HTML form elements to be as complex as you'd like. In the next section, we'll show you how to use the next kind of index, keyword indexes.

Searching Keyword Indexes

A KeywordIndex indexes a sequence of keywords for objects and can be queried for any objects that have one or more of those keywords.

Suppose that you have a number of Image objects that have a topics property. The topics property is a lines property that lists the relevant topics for a given Image, for example, "Portraits", "19th Century", and "Women" for a picture of Queen Victoria.

The topics provide a way of categorizing Images. Each Image can belong in one or more categories depending on its topics property. For example, the portrait of Queen Victoria belongs to three categories and can thus be found by searching for any of the three terms.

You can use a KeywordIndex to search the topics property. Define a KeywordIndex with the name topics on your ZCatalog. Then catalog your Images. Now you should be able to find all the Images that are portraits by creating a search form and searching for "Portraits" in the topics field. You can also find all pictures that represent 19th Century subjects by searching for "19th Century".

It's important to realize that the same Image can be in more than one category. This gives you much more flexibility in searching and categorizing your objects than you get with a field index. Using a field index, your portrait of Queen Victoria can only be categorized one way. Using a keyword index, it can be categorized in several different ways.

Often you will use a small list of terms with keyword indexes. In this case you may want to use the uniqueValuesFor method to create a custom search form. For example, here's a snippet of DTML that will create a multiple select box for all the values in the topics index:

        <select name="topics:list" multiple>
        <dtml-in expr="uniqueValuesFor('topics')">
          <option value="&dtml-sequence-item;"><dtml-var sequence-item></option>
        </dtml-in>
        </select>

Using this search form you can provide users with a range of valid search terms. You can select as many topics as you want and Zope will find all the Images that match one or more of your selected topics. Not only can each object have several indexed terms, but you can provide several search terms and find all objects that have one or more of those values.
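To make the idea of "one object, several categories" concrete, here is a minimal sketch of a keyword index in plain Python. It is a toy model, not Zope's KeywordIndex: the helper names (build_keyword_index, unique_values, search_any) and the sample image ids are assumptions made up for illustration.

```python
# Toy model of a keyword index: keyword -> set of object ids.
# Not Zope's implementation; all names and data here are hypothetical.
from collections import defaultdict


def build_keyword_index(objects):
    """Map each keyword to the ids of the objects that carry it."""
    index = defaultdict(set)
    for obj_id, topics in objects.items():
        for topic in topics:
            index[topic].add(obj_id)
    return index


def unique_values(index):
    """Sorted list of every keyword in the index (compare uniqueValuesFor)."""
    return sorted(index)


def search_any(index, terms):
    """Ids of the objects that have at least one of the given keywords."""
    found = set()
    for term in terms:
        found |= index.get(term, set())
    return found


# Hypothetical Images and their topics properties.
images = {
    "victoria.jpg": ["Portraits", "19th Century", "Women"],
    "lincoln.jpg": ["Portraits", "19th Century"],
    "lizard.jpg": ["Animals"],
}

topics_index = build_keyword_index(images)
portraits = search_any(topics_index, ["Portraits"])
women_or_animals = search_any(topics_index, ["Women", "Animals"])
```

The same image id appears under several keywords, which is why the portrait of Queen Victoria turns up for "Portraits" as well as for "Women".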

Searching Path Indexes

Path indexes allow you to search for objects based on their location in Zope. Suppose you have an object whose path is /zoo/animals/Africa/tiger.doc. You can find this object with the path queries: /zoo, or /zoo/animals, or /zoo/animals/Africa. In other words, a path index allows you to find objects within a given folder (and below).

If you place related objects within the same folders, you can use path indexes to quickly locate these objects. For example:

        <h2>Lizard Pictures</h2>

        <p>
        <dtml-in expr="Catalog(meta_type='Image',
                               path='/Zoo/Animals/Lizard')">
        <a href="&dtml-absolute_url;"><dtml-var title></a>
        </dtml-in>
        </p>

This query searches a catalog for all images that are located within the /Zoo/Animals/Lizard folder and below. It creates a link to each image.

Depending on how you choose to arrange objects in your site, you may find that path indexes are more or less effective. If you locate objects without regard to their subject (for example, if objects are mostly located in user "home" folders) then path indexes may be of limited value. In these cases, keyword and field indexes will be more useful.

Advanced Searching with Records

A new feature in Zope 2.4 is the ability to query indexes more precisely using record objects. Record objects contain information about how to query an index. A record is either a Python object with attributes or a mapping. Different indexes support different record attributes.

Keyword Index Record Attributes

query
Either a sequence of words or a single word. (mandatory)
operator
Specifies whether all keywords or only one need to match. Allowed values: and, or. (optional, default: 'or')

For example:

        # big or shiny
        results=Catalog(categories=['big', 'shiny'])

        # big and shiny
        results=Catalog(categories={'query':['big','shiny'], 
                                             'operator':'and'})

The second query matches objects that have both the keywords "big" and "shiny". Without using the record syntax you can only match objects that are big or shiny.
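The difference between the two operators comes down to set union versus set intersection. The following sketch shows this with a toy index that maps each keyword to a set of object ids. The function name query_keywords and the sample data are hypothetical, and this is not how Zope itself evaluates the query.

```python
def query_keywords(index, query):
    """Evaluate a plain or record-style keyword query against a toy index.

    index maps keyword -> set of object ids. query is a single word, a
    sequence of words, or a mapping with 'query' and optional 'operator'.
    """
    if isinstance(query, dict):
        terms = query["query"]
        operator = query.get("operator", "or")  # default mirrors the text
    else:
        terms, operator = query, "or"
    if isinstance(terms, str):
        terms = [terms]

    sets = [index.get(term, set()) for term in terms]
    if not sets:
        return set()
    if operator == "and":
        return set.intersection(*sets)  # every keyword must match
    if operator == "or":
        return set.union(*sets)         # any keyword may match
    raise ValueError("operator must be 'and' or 'or'")


# Hypothetical index contents.
categories = {
    "big": {"elephant", "whale"},
    "shiny": {"whale", "beetle"},
}

big_or_shiny = query_keywords(categories, ["big", "shiny"])
big_and_shiny = query_keywords(
    categories, {"query": ["big", "shiny"], "operator": "and"})
```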

Field Index Record Attributes

query
Either a sequence of objects or a single value to be passed as query to the index (mandatory)
range
Defines a range search on a Field Index (optional, default: not set).

Allowed values:

min
Searches for all objects with values larger than the minimum of the values passed in the query parameter.
max
Searches for all objects with values smaller than the maximum of the values passed in the query parameter.
minmax
Searches for all objects with values smaller than the maximum of the values passed in the query parameter and larger than the minimum of the values passed in the query parameter.

For example:

        # items modified in the last week
        results=Catalog(bobobase_modification_time={
                          'query':DateTime() - 7,
                          'range': 'min'}
                        )

This query matches objects whose bobobase_modification_time is no earlier than DateTime() - 7, that is, objects modified within the last week. Compare this query with the one defined in relevantRecentSectionNews earlier in this chapter, which uses date_usage to accomplish the same query.
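If the three range values are hard to keep apart, the following sketch may help. It models a field index as a plain dictionary and uses Python's datetime in place of Zope's DateTime. The function range_search and the sample data are hypothetical, and treating the bounds as inclusive is an assumption of this sketch.

```python
from datetime import datetime, timedelta


def range_search(values, query, range_mode=None):
    """Toy field-index lookup.

    values maps object id -> indexed value. query is a single value or a
    sequence of values. Bounds are inclusive here (an assumption).
    """
    terms = list(query) if isinstance(query, (list, tuple)) else [query]
    if range_mode is None:
        return {k for k, v in values.items() if v in terms}

    low, high = min(terms), max(terms)
    if range_mode == "min":
        return {k for k, v in values.items() if v >= low}
    if range_mode == "max":
        return {k for k, v in values.items() if v <= high}
    if range_mode == "minmax":
        return {k for k, v in values.items() if low <= v <= high}
    raise ValueError("range must be 'min', 'max' or 'minmax'")


# A fixed "now" keeps the example reproducible.
now = datetime(2001, 7, 15)
modified = {
    "fresh": now - timedelta(days=2),
    "edge": now - timedelta(days=7),
    "stale": now - timedelta(days=30),
}

last_week = range_search(modified, now - timedelta(days=7), "min")
older = range_search(modified, now - timedelta(days=8), "max")
between = range_search(
    modified, [now - timedelta(days=40), now - timedelta(days=10)], "minmax")
```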

Text Index Record Attributes

query
Either a sequence of words (separated by white space) or a single word to be passed as query to the index. (mandatory)
operator
Specifies how to combine the search terms. (optional, default: 'or').

Allowed values:

and
All terms must be present.
or
At least one term must be present.
andnot
The first term must be present, but none of the rest of the terms.

There's not much reason to use record queries with text indexes since you can embed the operator information in the query string itself in a very flexible manner.
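The andnot operator is the least familiar of the three, so here is a small sketch of how the operators combine per-word results. It uses a toy word index (word to set of document ids). The name text_query and the sample data are hypothetical, and the sketch ignores everything a real text index does, such as stemming and ranking.

```python
def text_query(word_index, words, operator="or"):
    """Combine per-word matches from a toy text index.

    words may be a whitespace-separated string or a sequence of words.
    """
    if isinstance(words, str):
        words = words.split()
    sets = [word_index.get(word.lower(), set()) for word in words]
    if not sets:
        return set()
    if operator == "and":
        return set.intersection(*sets)
    if operator == "or":
        return set.union(*sets)
    if operator == "andnot":
        # First term must match; none of the remaining terms may match.
        return sets[0].difference(*sets[1:])
    raise ValueError("operator must be 'and', 'or' or 'andnot'")


# Hypothetical index contents.
word_index = {
    "koala": {"doc1", "doc2"},
    "birth": {"doc1"},
    "zoo": {"doc2", "doc3"},
}

both = text_query(word_index, "koala birth", "and")
either = text_query(word_index, "koala birth", "or")
koala_not_birth = text_query(word_index, "koala birth", "andnot")
```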

Path Index Record Attributes

query
Path to search for either as a string (e.g. "/Zoo/Birds") or list (e.g. ["Zoo", "Birds"]). (mandatory)
level
The path level to begin searching at. (optional, default: '0')

Suppose you have a collection of objects with these paths:

  1. /aa/bb/aa
  2. /aa/bb/bb
  3. /aa/bb/cc
  4. /bb/bb/aa
  5. /bb/bb/bb
  6. /bb/bb/cc
  7. /cc/bb/aa
  8. /cc/bb/bb
  9. /cc/bb/cc

Here are some example queries and their results to show how the level attribute works:

You can use the level attribute to flexibly search different parts of the path.

As of Zope 2.4.1, you can also include level information in a search without using a record. Simply use a tuple containing the query and the level. Here's an example tuple: ("/aa/bb", 1).
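One way to picture the level attribute is as an offset into the list of path components. The sketch below applies that reading to the nine paths listed above. It is a plain Python model, not Zope's PathIndex: the function names are hypothetical, and only levels of 0 or greater are modeled.

```python
def split_path(path):
    """Turn '/Zoo/Birds' or ['Zoo', 'Birds'] into a list of components."""
    if isinstance(path, str):
        return [part for part in path.split("/") if part]
    return list(path)


def path_search(paths, query, level=0):
    """Return the numbers of the paths that contain query at level.

    paths maps a number -> path string. query is a path string, a list of
    components, or a (query, level) tuple as described in the text.
    """
    if isinstance(query, tuple):
        query, level = query
    if level < 0:
        raise ValueError("negative levels are not modeled in this sketch")
    wanted = split_path(query)
    hits = []
    for number, path in paths.items():
        parts = split_path(path)
        # Compare the slice of components that starts at `level`.
        if parts[level:level + len(wanted)] == wanted:
            hits.append(number)
    return sorted(hits)


paths = {
    1: "/aa/bb/aa", 2: "/aa/bb/bb", 3: "/aa/bb/cc",
    4: "/bb/bb/aa", 5: "/bb/bb/bb", 6: "/bb/bb/cc",
    7: "/cc/bb/aa", 8: "/cc/bb/bb", 9: "/cc/bb/cc",
}

level0 = path_search(paths, "/aa/bb")          # level defaults to 0
level1 = path_search(paths, "/bb/bb", level=1)
as_tuple = path_search(paths, ("/bb/aa", 1))   # tuple form: (query, level)
```

Under this model, "/aa/bb" at level 0 selects paths 1, 2, and 3, while "/bb/bb" at level 1 selects paths 2, 5, and 8, because the match starts at the second path component.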

Creating Records in HTML

You can also perform record queries using HTML forms. Here's an example showing how to create a search form using records:

        <form action="Report" method="get">
        <table>
        <tr><th>Search Terms (must match all terms)</th>
            <td><input name="content.query:record" size=30 value=""></td></tr>
            <input type="hidden" name="content.operator:record" value="and">
        <tr><td colspan=2 align=center>
        <input type="SUBMIT" value="Submit Query">
        </td></tr>
        </table>
        </form>

For more information on creating records in HTML see the section "Passing Parameters to Scripts" in Chapter 10, Advanced Zope Scripting.
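The :record suffix tells Zope to gather form fields that share a prefix into one record. The sketch below imitates that grouping for a submitted query string so you can see what the catalog ends up receiving. It is only an approximation written for illustration: marshal_records is a hypothetical helper, and Zope's real marshalling supports more converters than shown here.

```python
from urllib.parse import parse_qsl


def marshal_records(query_string):
    """Group 'name.attr:record' form fields into one mapping per name."""
    request = {}
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        if name.endswith(":record") and "." in name:
            # 'content.query:record' -> key 'content', attribute 'query'
            key, attribute = name[:-len(":record")].split(".", 1)
            request.setdefault(key, {})[attribute] = value
        else:
            request[name] = value
    return request


# What a browser might send after submitting the form above.
request = marshal_records(
    "content.query:record=koala+birth&content.operator:record=and")
```

The resulting mapping for content has the same shape as the record passed directly to the Catalog in the earlier Python examples.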

Stored Queries

While the main use of the Catalog is to provide interactive searching, you can also use stored queries to categorize and organize your site. For example, in the section on keyword indexes you saw how you can use the Catalog and properties to search for categories of Images such as portraits. In addition to providing interactive searching for categories of Images you can create web pages with canned queries. So for example, here's some DTML that you could use for a page that displays all your portraits:

      <dtml-var standard_html_header>

      <h1>Portraits</h1>

      <dtml-in expr="ImageCatalog({'topics':'Portraits'})">
      <p> 
      <dtml-var sequence-item>
      <dtml-var title_or_id>
      </p>
      </dtml-in>

      <dtml-var standard_html_footer>

The dynamic nature of this page is not visible to the viewer. However, just add another portrait, update the catalog and this page will automatically include the new Image.

This technique can be very powerful. Not only can you organize and display public resources, but you can easily institute workflow systems by tagging objects with properties to indicate their state and cataloging them. After that it's easy for you to create pages for different people that show which objects need their attention. This technique is even more powerful when using the Automatic Cataloging pattern.

Automatic Cataloging

Automatic Cataloging is an advanced Catalog usage pattern that keeps objects up to date as they are changed. It requires that as objects are created, changed, and destroyed, they are automatically tracked by a ZCatalog. This usually involves the objects notifying the Catalog when they are created, changed, or deleted.

This usage pattern has a number of advantages in comparison to mass cataloging. Mass cataloging is simple but has drawbacks. The total amount of content you can index in one transaction is limited by the amount of free virtual memory available to the Zope process, plus the amount of temporary storage the system has. In other words, the more content you want to index all at once, the better your computer hardware has to be. Mass cataloging works well for indexing up to a few thousand objects, but beyond that automatic indexing works much better.

Another major advantage of automatic cataloging is that it can handle objects that change. As objects evolve and change, the index information is always current, even for rapidly changing information sources like message boards.
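The notification pattern can be sketched in a few lines of plain Python. This is a toy model and not Zope's CatalogAware class: ToyCatalog and CatalogAwareItem are hypothetical, and the method names are chosen to echo the catalog_object, uncatalog_object, index_object, and unindex_object calls that ZCatalog and CatalogAware provide.

```python
class ToyCatalog:
    """Keeps a snapshot of the indexed attributes of each object."""

    def __init__(self):
        self.entries = {}  # uid -> indexed data

    def catalog_object(self, obj, uid):
        self.entries[uid] = {"author": obj.author, "content": obj.content}

    def uncatalog_object(self, uid):
        self.entries.pop(uid, None)

    def search(self, word):
        word = word.lower()
        return sorted(uid for uid, data in self.entries.items()
                      if word in data["content"].lower().split())


class CatalogAwareItem:
    """An object that tells its catalog whenever it changes."""

    def __init__(self, catalog, uid, author, content):
        self.catalog, self.uid = catalog, uid
        self.author, self.content = author, content
        self.index_object()        # cataloged on creation

    def index_object(self):
        self.catalog.catalog_object(self, self.uid)

    def unindex_object(self):
        self.catalog.uncatalog_object(self.uid)

    def edit(self, content):
        self.content = content
        self.index_object()        # re-cataloged on change

    def delete(self):
        self.unindex_object()      # removed from the catalog on deletion


catalog = ToyCatalog()
item = CatalogAwareItem(catalog, "/news/KoalaGivesBirth", "Bob",
                        "Koala gives birth")
found_before_edit = catalog.search("koala")
item.edit("Penguin escapes")
found_after_edit = catalog.search("koala")
```

Because every change goes through the object itself, the catalog never needs a separate "find and re-index everything" pass.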

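In this section, we'll show you an example that creates "news" items that people can add to your site. These items will get automatically cataloged. This example consists of two steps: creating a new type of object that can be cataloged automatically, and creating a Catalog to catalog those objects.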

As of Zope 2.3, none of the "out-of-the-box" Zope objects support automatic cataloging. This is for backwards compatibility reasons. For now, you have to define your own kind of objects that can be cataloged automatically. One of the ways this can be done is by defining a ZClass.

A ZClass is a Zope object that defines new types of Zope objects. In a way, a ZClass is like a blueprint that describes how new Zope objects are built. Consider a news item as discussed in examples earlier in the chapter. News items not only have content, but they also have specific properties that make them news items. Often these Items come in collections that have their own properties. You want to build a News site that collects News Items, reviews them, and posts them online to a web site where readers can read them.

In this kind of system, you may want to create a new type of object called a News Item. This way, when you want to add a new news item to your site, you just select it from the product add list. If you design this object to be automatically cataloged, then you can search your news content very powerfully. This example only skims over ZClasses, which are described in much more detail in Chapter 14, "Extending Zope."

New types of objects are defined in the Products section of the Control Panel. This is reached by clicking on the Control Panel and then clicking on Product Management. Products contain new kinds of ZClasses. On this screen, click "Add" to add a New product. You will be taken to the Add form for new Products.

Name the new Product "News" and click "Generate". This will take you back to the Products Management view and you will see your new Product.

Select the News Product by clicking on it. This new Product looks a lot like a Folder. It contains one object called Help and has an Add menu, as well as the usual Folder "tabs" across the top. To add a new ZClass, pull down the Add menu and select ZClass. This will take you to the ZClass add form, as shown in Figure 9-6.

Figure 9-6 ZClass add form

This is a complicated form which will be explained in much more detail in Chapter 14, "Extending Zope". For now, you only need to do three things to create your ZClass:

When you're done, don't change any of the other settings in the Form. To create your new ZClass, click Add. This will take you back to your News Product. Notice that there is now a new object called NewsItem as well as several other objects. The NewsItem object is your new ZClass. The other objects are "helpers" that you will examine more in Chapter 14, "Extending Zope".

Select the NewsItem ZClass object. Your view should now look like Figure 9-7.

Figure 9-7 A ZClass Methods View

This is the Methods View of a ZClass. Here, you can add Zope objects that will act as methods on your new type of object. For example, you can create DTML Methods or Scripts, and these objects will become methods on any new News Items that are created. Before creating any methods, however, let's review the needs of this new "News Item" object:

News Content
The News Item contains news content; this is its primary purpose. This content should be any kind of plain text or marked-up content like HTML or XML.
Author Credit
The News Item should provide some kind of credit to the author or organization that created it.
Date
News Items are timely, so the date that the item was created is important.
Keywords
News Items fit into various lists of categories. By convention, these lists of categories are often called keywords.

You may want your new News Item object to have other properties; these are just suggestions. To add new properties to your News Item, click on the Property Sheets tab. This takes you to the Property Sheets view.

Properties are added to new types of objects in groups called Property Sheets. Since your object has no property sheets defined, this view is empty. To add a New Property Sheet, click Add Common Instance Property Sheet, and give the sheet the name "News". Now click Add. This will add a new Property Sheet called News to your object. Clicking on the new Property Sheet will take you to the Properties view of the News Property Sheet, as shown in Figure 9-8.

Figure 9-8 The properties screen for a Property Sheet

This view is almost identical to the Properties view found on Folders and other objects. Here, you can create the properties of your News Item object. Create three new properties in this form:

content
This property's type should be text. Each newly created News Item will contain its own unique content property.
author
This property's type should be string. This will contain the name of the news author.
date
This property's type should be date. This will contain the time and date the news item was last updated. A date property requires a value, so for now you can enter the string "01/01/2000".

That's it! Now you have created a Property Sheet that describes your News Items and what kind of information they contain. Properties can be thought of as the data that an object contains. Now that we have the data all set, you need to create an interface to your new kind of objects. This is done by creating new Views for your object.

Click on the Views tab. This will take you to the Views view, as shown in Figure 9-9.

Figure 9-9 The Views view

Here, you can see that Zope has created three default Views for you. These views will be described in much more detail in Chapter 14, "Extending Zope", but for now, it suffices to say that these views define the tabs that your objects will eventually have.

To create a new view, use the form at the bottom of the Views view. Create a new View with the name "News" and select "propertysheets/News/manage" from the select box and click Add. This will create a new View on this screen under the original three Views, as shown in Figure 9-10.

Figure 9-10 The new News View

Since this View is going to give us the ability to edit the News Item, we want to make it the first view that you see when you select a News Item object. To change the order of the views, select the newly created News view and click the First button. This should move the new view from the bottom to the top of the list.

The final step in creating a ZClass is defining the methods for the class. Methods are defined on the Methods View. Click on the Methods tab and you will be taken to the Methods view. Select 'DTML Method' from the add list and add a new DTML Method with the id "index_html". This will be the default view of your news item. Add the following DTML to the new method:

      <dtml-var standard_html_header>

      <h1>News Flash</h1>

      <p><dtml-var date></p>

      <p><dtml-var author></p>

      <p><dtml-var content></p>

      <dtml-var standard_html_footer>

That's it! You've created your own kind of object called a News Item. When you go to the root folder, you will now see a new entry in your add list.

But don't add any new News Items yet, because the second step in this exercise is to create a Catalog that will catalog your new News Items. Go to the root folder and create a new catalog with the id Catalog.

Like the previous two examples of using a ZCatalog, you need to create Indexes and a Meta-Data Table that make sense for your objects. First, delete the default indexes in the new ZCatalog and create the following indexes to replace them:

content
This should be a TextIndex. This will index the content of your News Items.
title
This should be a TextIndex. This will index the title of your News Items.
author
This should be a FieldIndex. This will index the author of the News Item.
date
This should be a FieldIndex. This will index the date of the News Item.

After creating these Indexes, delete the default Meta-Data columns and add these columns to replace them:

After creating the Indexes and Meta-Data Table columns, create a search interface for the Catalog using the Z Search Interface tool described previously in this chapter.

Now you are ready to go. Start by adding some new News Items to your Zope. Go anywhere in Zope and select News Item from the add list. This will take you to the add Form for News items.

Give your new News Item the id "KoalaGivesBirth" and click Add. This will create a new News Item. Select the new News Item.

Notice how it has four tabs that match the four Views that were in the ZClass. The first View is News, which corresponds to the News Property Sheet you created in the News Item ZClass.

Enter your news in the Content box:

      Today, Bob the Koala bear gave birth to little baby Jimbo.

Enter your name in the Author box, and today's date in the Date box.

Click Change and your News Item should now contain some news. Because the News Item object is CatalogAware, it is automatically cataloged when it is changed or added. Verify this by looking at the Cataloged Objects tab of the ZCatalog you created for this example.

The News Item you added is the only object that is cataloged. As you add more News Items to your site, they will automatically get cataloged here. Add a few more items, and then experiment with searching the ZCatalog. For example, if you search for "Koala" you should get back the KoalaGivesBirth News Item.

At this point you may want to use some of the more advanced search forms that you created earlier in the chapter. You can see for example that as you add new News Items with new authors, the authors select list on the search form changes to include the new information.

Conclusion

The cataloging features of ZCatalog allow you to search your objects for certain attributes very quickly. This can be very useful for sites with lots of content that many people need to be able to search in an efficient manner.

Searching the ZCatalog works a lot like searching a relational database, except that the searching is more object-oriented. Not all data models are object-oriented however, so in some cases you will want to use the ZCatalog, but in other cases you may want to use a relational database. The next chapter goes into more details about how Zope works with relational databases, and how you can use relational data as objects in Zope.