Plone search: finding objects with the catalog
http://www.315ok.org/blogfolder/99
So far, we have discussed the object graph of the ZODB. We can walk this graph,
using getattr() to retrieve acquisition-wrapped objects, and we can scan folderish
objects using methods such as objectIds(), which returns a list of all the IDs in that
container, and objectValues(), which returns a list of contained objects.
Walking the entire ZODB every time we want to find an object is not ideal. In
particular, functions such as objectValues() should be avoided if possible, because
they can wake up a large number of objects from the ZODB—unpickling them, and
swapping them into the ZODB cache. Waking up objects is a relatively slow process,
and waking up lots of objects will make your code very slow.
Zope mitigates this problem with ZCatalogs—relational database-like tables of
objects. In Plone, there is a ZCatalog called portal_catalog in the root of the site,
which indexes all content objects. You can get a hold of it using:
>>> from Products.CMFCore.utils import getToolByName
>>> catalog = getToolByName(context, 'portal_catalog')
Here, context must be an object inside the Plone site, or the Plone site root itself.
The catalog is configured with indexes—which can be used to search various
attributes of indexed objects—and metadata—copies of certain attributes that can
be examined without fetching the underlying content object. Sometimes, the same
attribute is used both in an index, and as metadata. You can think of an index as
something you use to find an object, and a metadata item as something you use to
inspect the search results.
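The split between indexes and metadata can be sketched in plain Python, with no Plone installation needed. The ToyCatalog class below is a hypothetical illustration and not how ZCatalog is actually implemented: indexes map a value to the record ids that carry it, while the metadata table holds copies of selected attributes that come back with each result.

```python
class ToyCatalog:
    """Toy model: indexes find records, metadata describes them."""

    def __init__(self, index_names, metadata_names):
        # index name -> {indexed value -> set of record ids}
        self.indexes = {name: {} for name in index_names}
        self.metadata_names = list(metadata_names)
        # record id -> {column name -> copied value}
        self.metadata = {}

    def catalog_object(self, rid, attributes):
        for name, index in self.indexes.items():
            value = attributes.get(name)
            if value is not None:
                index.setdefault(value, set()).add(rid)
        # Attributes the object does not provide are stored as None.
        self.metadata[rid] = {name: attributes.get(name)
                              for name in self.metadata_names}

    def search(self, **query):
        rids = set(self.metadata)
        for name, value in query.items():
            # Every search term narrows the set of matching record ids.
            rids &= self.indexes[name].get(value, set())
        return [self.metadata[rid] for rid in sorted(rids)]


toy = ToyCatalog(index_names=["portal_type", "review_state"],
                 metadata_names=["Title", "Description"])
toy.catalog_object(1, {"portal_type": "News Item",
                       "review_state": "published", "Title": "Strat"})
toy.catalog_object(2, {"portal_type": "Document",
                       "review_state": "private", "Title": "Notes",
                       "Description": "Draft"})

results = toy.search(portal_type="News Item", review_state="published")
print(results)
```

The search terms name indexes, while the returned rows contain only metadata columns. Nothing outside those two tables is ever consulted.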
Where an object does not provide a particular attribute, the value of any
corresponding metadata item may be None (or it could be acquired from a parent).
Note that if too many attributes are listed in the metadata table, the catalog will grow
in size, and become slower, counteracting the efficiency benefits of using metadata
instead of fetching full objects.
To see the complete list of metadata columns, look at portal_catalog in the ZMI,
under the Metadata tab. Indexes are listed on the Indexes tab. We will see an
example of adding additional indexes and metadata columns via the catalog.xml
import step in the next chapter.
Plone's catalog has an implicit search parameter that ensures that only those
content objects that are viewable by the current user are returned, and since
non-content objects (e.g. CMF tools found in the portal root) are not indexed,
they will not be found when searching the catalog. However, if you want to
find objects that are not viewable by the current user regardless, you can use the
unrestrictedSearchResults() function of portal_catalog.
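A rough mental model of the implicit filter, in plain Python: each indexed record carries the set of roles and users allowed to view it, and the restricted search quietly adds the current user's identifiers to every query. The record layout and function names below are hypothetical. Plone's real filter also takes other things into account, such as effective and expiration dates, which this sketch leaves out.

```python
# Hypothetical records: "allowed" lists who may view each one.
RECORDS = [
    {"id": "strat", "allowed": {"Anonymous"}},
    {"id": "draft", "allowed": {"Owner", "user:alice"}},
]


def unrestricted_search(records):
    """Return every record id, regardless of who is asking."""
    return [record["id"] for record in records]


def restricted_search(records, user_identifiers):
    """Return only the ids viewable by someone holding these identifiers."""
    wanted = set(user_identifiers)
    return [record["id"] for record in records if record["allowed"] & wanted]


anonymous = restricted_search(RECORDS, ["Anonymous"])
alice = restricted_search(RECORDS, ["Anonymous", "user:alice"])
everything = unrestricted_search(RECORDS)
print(anonymous, alice, everything)
```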
When we search the catalog, it returns a lazy list of items known as brains. Catalog
brains have attributes consisting of values of the various columns in the metadata
table. Brains also contain a few useful methods for inspecting the object that was
cataloged. Most importantly, retrieving a catalog brain does not wake up the indexed
content object itself. To get the full object, you can use the getObject() method on
the brain.
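To make the cost difference concrete, here is a small stand-alone sketch of the idea behind brains. ToyBrain and load_from_storage are invented names for illustration. The point is only that reading a metadata attribute touches nothing but the brain, while getObject() triggers a load of the full object.

```python
class ToyBrain:
    """Holds copied metadata; loads the real object only on request."""

    def __init__(self, path, metadata, loader):
        self._path = path
        self._loader = loader
        # Expose each metadata column as a plain attribute.
        for name, value in metadata.items():
            setattr(self, name, value)

    def getPath(self):
        return self._path

    def getObject(self):
        # The expensive step: fetch the full object from storage.
        return self._loader(self._path)


loads = []  # records every expensive load, so we can count them


def load_from_storage(path):
    loads.append(path)
    return {"path": path, "body": "full content for " + path}


brains = [
    ToyBrain("/plone/guitars/strat", {"Title": "Strat"}, load_from_storage),
    ToyBrain("/plone/guitars/lp", {"Title": "Les Paul"}, load_from_storage),
]

titles = [brain.Title for brain in brains]  # metadata only, no loads
loads_after_titles = len(loads)
obj = brains[0].getObject()                 # exactly one load
print(titles, loads_after_titles, loads)
```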
For further information about ZCatalogs, see the Searching and Categorizing
Content chapter of the Zope Book, which can be found at
http://www.zope.org/Documentation/Books/ZopeBook. You will find examples of using the catalog in
the subsequent chapters, and all throughout Plone's source code. Below are a few
examples of common catalog usage.
To retrieve all published news items in the site, use:
>>> for brain in catalog(portal_type='News Item', review_state='published'):
...     print brain.getPath()
/plone/Members/test_user_1_/guitars/strat
/plone/Members/test_user_1_/guitars/lp
Here, we call the catalog object directly to execute a query. For the purposes of
testing, we simply print the path, as returned by the brain-specific getPath()
function. This is equivalent to using '/'.join(obj.getPhysicalPath()) on a
regular object.
To prove this, we will use getObject() to retrieve such an object. Note that
normally, we would try not to do this to avoid a performance hit:
>>> for brain in catalog.searchResults({'portal_type': 'News Item',
...                                     'review_state': 'published'}):
...     print '/'.join(brain.getObject().getPhysicalPath())
/plone/Members/test_user_1_/guitars/strat
/plone/Members/test_user_1_/guitars/lp
Here we also use the searchResults() method, which is equivalent to calling the
catalog object, and pass a dict of search terms instead of keyword parameters. The
keys refer to the names of indexes, while the values are the things to search for.
The getURL() method of a brain is complementary to getPath(). It returns the
referenced object's URL. As with the absolute_url() method on a regular object,
this takes into account the current server URL (which may be different from the
server URL at the time that the object was indexed):
>>> for brain in catalog(portal_type='News Item', review_state='published'):
...     print brain.getURL() == brain.getObject().absolute_url()
True
True
Different types of indexes accept different types of search parameters. The
most common kinds are the FieldIndex, which indexes a single field, and the
KeywordIndex, used when a field contains a list of values, and you would like to
be able to search for a subset of them. For example, the Subject index refers to the
Dublin Core subject (aka keywords) field. To find any documents (pages) or news
items that refer to Guitars or Fender, we could write:
>>> results = catalog(portal_type=('Document', 'News Item',),
... Subject=('Guitars', 'Fender'))
>>> sorted([r.getId for r in results])
['fender', 'lp', 'tele']
Here, we assume that there are three objects, fender, lp, and tele, which match the
given criteria. We also make use of the getId metadata attribute (which stores the
return value of the method with the same name), and reduce the lazy list of results to
a sorted list of string IDs for the purposes of validating the output reliably.
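The difference between the two index types can be mimicked with two small matching functions. This is a plain-Python sketch and not the ZCatalog implementation. The subject values assigned to each object are invented so that the result lines up with the example above.

```python
# Hypothetical content: one portal_type per object, several subjects.
DOCS = {
    "fender": {"portal_type": "Document", "Subject": ("Fender",)},
    "lp": {"portal_type": "News Item", "Subject": ("Guitars", "Gibson")},
    "tele": {"portal_type": "News Item", "Subject": ("Guitars", "Fender")},
    "pbass": {"portal_type": "Document", "Subject": ("Basses",)},
}


def as_tuple(value):
    """Accept either a single string or a sequence of strings."""
    return (value,) if isinstance(value, str) else tuple(value)


def field_match(stored, wanted):
    """FieldIndex-like: one stored value, equal to any wanted value."""
    return stored in as_tuple(wanted)


def keyword_match(stored, wanted):
    """KeywordIndex-like: several stored values, any overlap matches."""
    return bool(set(stored) & set(as_tuple(wanted)))


def search(portal_type, Subject):
    return sorted(
        doc_id for doc_id, doc in DOCS.items()
        if field_match(doc["portal_type"], portal_type)
        and keyword_match(doc["Subject"], Subject)
    )


result = search(portal_type=("Document", "News Item"),
                Subject=("Guitars", "Fender"))
print(result)
```

The different index criteria are combined with "and", while the values given for a single index are alternatives.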
The path index can be used to search for objects by location. By default, it will match
the specified path and all sub-paths. By passing a dictionary with the keys query and
depth to the index, we can search either for just the specified path (depth 0) or just
the objects directly inside it (depth 1):
>>> guitars_path = '/'.join(self.folder.guitars.getPhysicalPath())
>>> results = catalog(path=guitars_path)
>>> sorted([r.getId for r in results])
['basses', 'fender', 'guitars', 'jagstang', 'lp', 'pbass', 'strat', 'tele']
>>> results = catalog(path=dict(query=guitars_path, depth=0))
>>> sorted([r.getId for r in results])
['guitars']
>>> results = catalog(path=dict(query=guitars_path, depth=1))
>>> sorted([r.getId for r in results])
['basses', 'fender', 'jagstang', 'lp', 'strat', 'tele']
In these examples, pbass is a child of basses, which is why it shows up when searching
for all items under guitars, but not when searching for only those objects directly
inside guitars. Also notice that with no depth restriction, the guitars folder is included
in the search results, but it is excluded when searching for items at depth 1 (i.e. those
objects directly inside the folder).
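The three cases described above can be reproduced with a few lines of plain Python. The path_search function is a hypothetical stand-in for the path index and deliberately models only the cases covered here: no depth, depth 0, and depth 1. The folder layout is assumed from the example output.

```python
ROOT = "/plone/Members/test_user_1_/guitars"
PATHS = [
    ROOT,
    ROOT + "/basses",
    ROOT + "/basses/pbass",
    ROOT + "/fender",
    ROOT + "/jagstang",
    ROOT + "/lp",
    ROOT + "/strat",
    ROOT + "/tele",
]


def path_search(paths, query, depth=None):
    """Return the ids of paths at or below query, limited by depth."""
    if depth not in (None, 0, 1):
        raise ValueError("this sketch only models depth 0 and depth 1")
    base = query.rstrip("/").split("/")
    hits = []
    for path in paths:
        parts = path.split("/")
        if parts[:len(base)] != base:
            continue  # neither the query path nor anything below it
        levels_below = len(parts) - len(base)
        if depth is None or levels_below == depth:
            hits.append(parts[-1])
    return sorted(hits)


everything = path_search(PATHS, ROOT)
only_folder = path_search(PATHS, ROOT, depth=0)
children = path_search(PATHS, ROOT, depth=1)
print(everything, only_folder, children)
```

Comparing path segments rather than raw string prefixes keeps a sibling such as a hypothetical guitars2 folder from matching by accident.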
We can control the order of the returned items using the special sort_on and
sort_order parameters, and the maximum number of returned objects using
sort_limit. When using sort_limit, we could potentially get a few more items
back—it is only a hint to the search algorithms, and the lazy nature of the returned
list makes it possible that complex searches will cause them to overshoot a little.
Therefore, we normally also explicitly limit the number of items we iterate over:
>>> results = catalog(portal_type='Document', sort_on='sortable_title')
>>> [r.Title for r in results]
['Favorite guitars', 'Fender', 'Precision bass']
>>> results = catalog(portal_type='Document',
...                   sort_on='sortable_title', sort_order='descending')
>>> [r.Title for r in results]
['Precision bass', 'Fender', 'Favorite guitars']
>>> limit = 5
>>> results = catalog(portal_type='Document',
...                   sort_on='sortable_title', sort_limit=limit)[:limit]
>>> [r.Title for r in results]
['Favorite guitars', 'Fender', 'Precision bass']
The last query returns at most the first five documents, sorted on title. The sort_order
parameter can be "ascending" or "descending", with "reverse" being an alias for
"descending". The sortable_title index is a special version of the Title index
that uses some clever string manipulation to make sure that titles will sort the way
people normally expect them to.
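To give a feel for the kind of string manipulation involved, here is a rough approximation of a sortable title key in plain Python. It is only a sketch of the idea (lower-casing and zero-padding numbers), not Plone's actual implementation. The padding width of 6 and the sample titles are arbitrary choices.

```python
import re
from itertools import islice


def sortable_title(title):
    """Lower-case the title and zero-pad digit runs so numbers sort naturally."""
    lowered = title.strip().lower()
    return re.sub(r"\d+", lambda match: match.group().zfill(6), lowered)


titles = ["Precision bass", "fender", "Favorite guitars",
          "Guitar 10", "Guitar 9"]

plain = sorted(titles)                       # raw string comparison
natural = sorted(titles, key=sortable_title)
descending = sorted(titles, key=sortable_title, reverse=True)

# Explicitly cap the number of items we iterate over, as with [:limit].
limit = 2
first_two = list(islice(natural, limit))
print(plain, natural, descending, first_two, sep="\n")
```

With the raw comparison, capitalized titles sort before lower-case ones and "Guitar 10" lands before "Guitar 9". The normalized key avoids both surprises.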
When objects change, they need to be reindexed for the catalog to be updated. This is
done automatically when content is manipulated via the Plone user interface. When
making changes in code, however, we sometimes need to reindex manually.
>>> self.folder.favorites.setDescription("Contains a list of favorites")
>>> len(catalog(Description="list of favorites"))
0
>>> self.folder.favorites.reindexObject(idxs=['Description'])
>>> len(catalog(Description="list of favorites"))
1
>>> self.folder.favorites.setDescription("My favorites!")
>>> self.folder.favorites.setTitle("My favorite guitars")
>>> self.folder.favorites.reindexObject()
>>> len(catalog(Title="My favorite guitars"))
1
The reindexObject() function comes from the CMFCatalogAware mix-in class, used
in nearly all content objects. It tells the catalog to reindex the given object. Without
parameters, it reindexes all indexes, but we can save a bit of processing by passing
a list of indexes to re-index if we are certain nothing else has changed. There is also
reindexObjectSecurity(), which will automatically refresh the permission-related
indexes for the current object, and any children it may have.
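Why a catalog goes stale, and what a partial reindex does, can be shown with a tiny stand-alone model. ReindexDemoCatalog is a made-up class for illustration. It uses exact-match lookups rather than real text search, and a plain dict stands in for the content object.

```python
class ReindexDemoCatalog:
    """Keeps a copy of each object's values; copies go stale until reindexed."""

    def __init__(self):
        # object id -> {index name -> value seen at indexing time}
        self.indexed = {}

    def reindex(self, obj_id, obj, idxs=None):
        # Without idxs, refresh every index; otherwise only the named ones.
        names = idxs if idxs is not None else list(obj)
        entry = self.indexed.setdefault(obj_id, {})
        for name in names:
            entry[name] = obj[name]

    def count(self, **query):
        return sum(
            1 for entry in self.indexed.values()
            if all(entry.get(name) == value for name, value in query.items())
        )


favorites = {"Title": "Favorites", "Description": ""}
toy = ReindexDemoCatalog()
toy.reindex("favorites", favorites)

# Changing the object does not touch the catalog's copy.
favorites["Description"] = "Contains a list of favorites"
stale = toy.count(Description="Contains a list of favorites")

# Refresh just the one index we know has changed.
toy.reindex("favorites", favorites, idxs=["Description"])
fresh = toy.count(Description="Contains a list of favorites")

# After several changes, a full reindex is the safe choice.
favorites["Description"] = "My favorites!"
favorites["Title"] = "My favorite guitars"
toy.reindex("favorites", favorites)
by_title = toy.count(Title="My favorite guitars")
print(stale, fresh, by_title)
```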