Commit 92c26bc8 authored by Tim Peters

Improve OOV explanation, based on Guido's feedback.

parent 9b736188
@@ -61,11 +61,13 @@ class BaseIndex(Persistent):
 # of a docid-to-weight map.
 # There are two kinds of OOV words: wid 0 is explicitly OOV,
 # and it's possible that the lexicon will return a non-zero wid
-# for a word *we've* never seen (e.g., lexicons can be shared
-# across indices, and a query can contain a word some other
-# index knows about but we don't). A word is in-vocabulary for
-# this index if and only if _wordinfo.has_key(wid). Note that
-# wid 0 must not be a key in _wordinfo.
+# for a word we don't currently know about. For example, if we
+# unindex the last doc containing a particular word, that wid
+# remains in the lexicon, but is no longer in our _wordinfo map;
+# lexicons can also be shared across indices, and some other index
+# may introduce a lexicon word we've never seen.
+# A word is in-vocabulary for this index if and only if
+# _wordinfo.has_key(wid). Note that wid 0 must not be a key.
 self._wordinfo = IOBTree()
 # docid -> weight
...
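The OOV rules in the revised comment can be illustrated with a small, self-contained sketch. It is not the real lexicon or BaseIndex code: SimpleLexicon and SimpleIndex are hypothetical classes, plain dicts stand in for the IOBTree maps, an occurrence count stands in for the weight, and the `in` test plays the role of `_wordinfo.has_key(wid)`. It shows both kinds of OOV word: a wid of 0 from the lexicon, and a non-zero wid that this index has no _wordinfo entry for (because another index sharing the lexicon introduced it, or because the last doc containing it was unindexed).

```python
# Hypothetical sketch: dicts stand in for IOBTree, and these class and
# method names are illustrative, not the actual lexicon/index API.


class SimpleLexicon:
    """Maps words to word ids (wids); wid 0 is reserved to mean OOV."""

    def __init__(self):
        self._wids = {}
        self._next_wid = 1  # start at 1 so that wid 0 stays reserved

    def get_or_create_wid(self, word):
        wid = self._wids.get(word)
        if wid is None:
            wid = self._next_wid
            self._next_wid += 1
            self._wids[word] = wid
        return wid

    def lookup_wid(self, word):
        # Words the lexicon has never seen come back as wid 0 (explicitly OOV).
        return self._wids.get(word, 0)


class SimpleIndex:
    def __init__(self, lexicon):
        self._lexicon = lexicon
        self._wordinfo = {}  # wid -> {docid: weight}; wid 0 is never a key
        self._docwords = {}  # docid -> list of wids, needed for unindexing

    def index_doc(self, docid, words):
        wids = [self._lexicon.get_or_create_wid(w) for w in words]
        for wid in wids:
            postings = self._wordinfo.setdefault(wid, {})
            postings[docid] = postings.get(docid, 0) + 1  # count as weight
        self._docwords[docid] = wids

    def unindex_doc(self, docid):
        for wid in set(self._docwords.pop(docid)):
            postings = self._wordinfo[wid]
            del postings[docid]
            if not postings:
                # Last doc containing this word: the wid leaves _wordinfo
                # but stays in the (possibly shared) lexicon.
                del self._wordinfo[wid]

    def is_in_vocabulary(self, word):
        # In-vocabulary iff the wid is a key of _wordinfo. Because wid 0 is
        # never a key, explicitly OOV words fail this test as well.
        return self._lexicon.lookup_wid(word) in self._wordinfo


lexicon = SimpleLexicon()
a = SimpleIndex(lexicon)
b = SimpleIndex(lexicon)  # shares the lexicon with a

a.index_doc(1, ["cat", "dog"])
b.index_doc(7, ["fish"])

print(a.is_in_vocabulary("cat"))   # True
print(a.is_in_vocabulary("emu"))   # False: wid 0, explicitly OOV
print(a.is_in_vocabulary("fish"))  # False: non-zero wid, but only b has seen it
a.unindex_doc(1)
print(a.is_in_vocabulary("cat"))   # False: wid still in lexicon, not in _wordinfo
```

Keeping the single membership test on _wordinfo means callers never need to distinguish the two OOV cases, which is exactly why the comment insists that wid 0 must not be a key.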