- 20 May, 2002 15 commits
-
Tim Peters authored
well check it in. This yields an overall 133% speedup on a "hot" search for 'python' in my python-dev archive (a word that appears in all but 2 documents). For those who read the email: it turned out to be a significant speedup to iterate over an IIBTree's items rather than to materialize the items into an explicit list first. This is now within 20% of simply doing "IIBucket(the_IIBTree)" (i.e., no arithmetic at all), so little room remains for speeding up the inner score loop.
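The iteration speedup described above can be sketched in a few lines. This is an illustrative example, not the ZCTextIndex code: a plain dict stands in for the IIBTree (it exposes the same `.items()` protocol), and the function names are hypothetical.

```python
# Hypothetical sketch of the change: score by iterating the mapping's
# items directly instead of materializing them into a list first.

def score_via_list(wordinfo, weight):
    # builds a full intermediate list of (docid, freq) pairs first
    result = {}
    for docid, freq in list(wordinfo.items()):
        result[docid] = freq * weight
    return result

def score_via_iteration(wordinfo, weight):
    # walks the items lazily; no temporary list is created
    result = {}
    for docid, freq in wordinfo.items():
        result[docid] = freq * weight
    return result
```

Both produce the same scores; the second avoids allocating and filling a throwaway list, which matters when the word appears in nearly every document.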
-
Guido van Rossum authored
creating it anonymously and then pulling it out of the zc_index object.
-
Guido van Rossum authored
once we have more than one on the menu.)
-
Guido van Rossum authored
in percentages; strip the percent sign to avoid a traceback calling int() when these variables are used.
-
Guido van Rossum authored
I'm unclear whether this is really the right thing, but at least this prevents crashes when nothing is entered in the search box.
-
Guido van Rossum authored
_fieldname; simply return 0 in this case.
-
Guido van Rossum authored
is *disabled*.
-
Guido van Rossum authored
-
Guido van Rossum authored
-
Guido van Rossum authored
Fix typo in docstring.
-
Guido van Rossum authored
- Rephrased the description of the grammar, pointing out that the lexicon decides on globbing syntax.
- Refactored term and atom parsing (moving atom parsing into a separate method). The previously checked-in version accidentally accepted some invalid forms like ``foo AND -bar''; this is fixed.

tests/testQueryParser.py:
- Each test is now in a separate method; this produces more output (alas) but makes pinpointing the errors much simpler.
- Added some tests catching ``foo AND -bar'' and similar.
- Added an explicit test class for the handling of stopwords. The "and/" test no longer has to check self.__class__.
- Some refactoring of the TestQueryParser class; the utility methods are now in a base class TestQueryParserBase, in a different order; compareParseTrees() now shows the parse tree it got when raising an exception. The parser is now self.parser instead of self.p (see below).

tests/testZCTextIndex.py:
- setUp() no longer needs to assign to self.p; the parser is consistently called self.parser now.
-
Guido van Rossum authored
:-)
-
Guido van Rossum authored
-
Guido van Rossum authored
ILexicon.py:
- Added parseTerms() and isGlob().
- Added get_word(), get_wid() (get_word() is old; get_wid() for symmetry).
- Reflowed some text.

IQueryParser.py:
- Expanded docs for parseQuery().
- Added getIgnored() and parseQueryEx().

IPipelineElement.py:
- Added processGlob().

Lexicon.py:
- Added parseTerms() and isGlob().
- Added get_wid().
- Some pipeline elements now support processGlob().

ParseTree.py:
- Clarified the error message for calling executeQuery() on a NotNode.

QueryParser.py (lots of changes):
- Change private names __tokens etc. into protected _tokens etc.
- Add getIgnored() and parseQueryEx() methods.
- The atom parser now uses the lexicon's parseTerms() and isGlob() methods.
- Query parts that consist only of stopwords (as determined by the lexicon), or of stopwords and negated terms, yield None instead of a parse tree node; the ignored term is added to self._ignored. None is ignored when combining terms for AND/OR/NOT operators, and when an operator has no non-None operands, the operator itself returns None. When this None percolates all the way to the top, the parser raises a ParseError exception.

tests/testQueryParser.py:
- Changed test expressions of the form "a AND b AND c" to "aa AND bb AND cc" so that the terms won't be considered stopwords.
- The test for "and/" can only work for the base class.

tests/testZCTextIndex.py:
- Added copyright notice.
- Refactor testStopWords() to have two helpers, one for success, one for failures.
- Change testStopWords() to require parser failure for those queries that have only stopwords or stopwords plus negated terms.
- Improve compareSet() to sort the sets of keys, and use a more direct way of extracting the keys. This wasn't strictly needed (nothing fails without this), but the old approach of copying the keys into a dict in a loop depends on the dict hashing to always return keys in the same order.
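The "None percolates to the top" rule in the QueryParser changes above can be illustrated with a toy sketch. This is not the real QueryParser; the names and the tuple parse-tree representation are made up for the example.

```python
# Toy sketch of the stopword rule: operators drop None operands
# (stop-word clauses), an operator with no operands left becomes
# None itself, and a final None at the top raises ParseError.

class ParseError(Exception):
    pass

def combine_and(nodes):
    # drop None operands; if nothing survives, the AND node is None too
    kept = [n for n in nodes if n is not None]
    if not kept:
        return None
    return ("AND", kept)

def parse(clauses):
    # top level: a query that reduced entirely to None is an error
    tree = combine_and(clauses)
    if tree is None:
        raise ParseError("query consists only of stopwords")
    return tree
```

So `parse(["aa", None])` quietly drops the stop-word clause, while `parse([None, None])` raises ParseError, matching the behavior the tests above now require.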
-
Matt Behrens authored
guido@. When/if merge day comes for the installer, this will make for less confusion :-)
-
- 19 May, 2002 6 commits
-
Tim Peters authored
display the search time in milliseconds too.
-
Tim Peters authored
for start and end of run. Show elapsed wall-clock time in minutes.
-
Tim Peters authored
msgs to display). Changed the module docstring to separate the index-generation args from the query args.
-
Tim Peters authored
-
Tim Peters authored
original doc text gets restored.
-
Guido van Rossum authored
-
- 18 May, 2002 5 commits
-
Tim Peters authored
went wrong if they fail.
-
Tim Peters authored
-
Tim Peters authored
me to call it braindead <wink>).
-
Tim Peters authored
PACK_INTERVAL.
-
Tim Peters authored
-
- 17 May, 2002 14 commits
-
Tim Peters authored
uncomment the test cases that were failing in these contexts. Read it and weep <wink>:

In an AND context, None is treated like the universal set, which jibes with the convenient fiction that stop words appear in every doc. However, in AND NOT and OR contexts, None is treated like the empty set, which doesn't jibe with anything except that we want

    real_word AND NOT stop_word

and

    real_word OR stop_word

to act like

    real_word

If we treated None as if it were the universal set, these results would be (respectively) the empty set and the universal set instead.

At a higher level, we *are* consistent with the notion that a query with a stop word acts the same as if the clause with the stop word weren't present. That's what really drives this schizophrenic (context-dependent) treatment of None.
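The context-dependent treatment of None described above can be written out as a small sketch. This is not the ZCTextIndex code: Python sets stand in for result sets, None marks a stop-word clause, and the function names are illustrative.

```python
# Sketch of the context-dependent None semantics: None acts like the
# universal set under AND, but like the empty set under AND NOT and OR.

def and_(left, right):
    # AND: a stop word "matches every doc", so the other side wins
    if left is None:
        return right
    if right is None:
        return left
    return left & right

def and_not(left, right):
    # AND NOT: a stop word on the right removes nothing
    if right is None:
        return left
    return left - right

def or_(left, right):
    # OR: a stop word contributes nothing
    if left is None:
        return right
    if right is None:
        return left
    return left | right
```

With `real = {1, 2, 3}` standing in for real_word's result set, `and_(real, None)`, `and_not(real, None)`, and `or_(real, None)` all return `real`, which is exactly the "acts like real_word" behavior the commit describes.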
-
Jeremy Hylton authored
-
Tim Peters authored
empty.
-
Tim Peters authored
-
Tim Peters authored
tests that currently fail are commented out. Key question: if someone does a search on a stopword, and nothing else is in the query, what do we want to do? Return all docs in a random order? Return no docs? Raise an exception? Second question: what about a query on rare_word AND NOT stop_word?
-
Jeremy Hylton authored
do the same query and work as ZCTextIndex would do. Produce a result set, pump it into NBest, and extract the 10 best.
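The "pump the result set into NBest and extract the 10 best" step above can be sketched with the standard library. This is a hedged stand-in for ZCTextIndex's NBest class, using `heapq.nlargest`; the function name is made up.

```python
# Stand-in for the NBest step: given a result set mapping
# docid -> score, keep only the 10 highest-scoring documents.
import heapq

def ten_best(result_set):
    # nlargest returns the pairs sorted by score, best first
    return heapq.nlargest(10, result_set.items(), key=lambda kv: kv[1])
```

An NBest-style accumulator can do this incrementally with bounded memory; `nlargest` shows the same selection in one call.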
-
Tim Peters authored
-
Jeremy Hylton authored
but works with a TextIndex Lexicon.
-
Tim Peters authored
the index knows about the doc and the wid. _del_wordinfo and _add_wordinfo: s/map/doc2score/g. map is a builtin function, and it's needlessly confusing to give a variable that name too.
-
Tim Peters authored
-
Jeremy Hylton authored
-
Jeremy Hylton authored
I think that the default Lexicon for TextIndex does not use a stop word list. For the comparison with ZCTextIndex, explicitly pass the default stop word dict from TextIndex to the lexicon.
-
Jeremy Hylton authored
-
Jeremy Hylton authored
In unindex_doc(), call _del_wordinfo() for each unique wid in the doc, not for each wid. Before we had WidCode and phrase searching, _docwords stored a list of the unique wids. The unindex code wasn't updated when _docwords started storing all the wids, even duplicates.

Replace the try/except around __getitem__ in _add_wordinfo() with a .get() call.

Add an XXX comment about the purpose of the try/excepts in _del_wordinfo(); I suspect they existed only because _del_wordinfo() was called repeatedly when a wid occurred more than once.
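The unique-wid fix above can be sketched in miniature. This is an assumed, simplified version of the data structures (plain dicts for _docwords and the per-wid doc2score mappings), not the real index code.

```python
# Simplified sketch of the fix: unindex each *unique* wid once, even
# though docwords now stores every wid occurrence, duplicates included.

def unindex_doc(docwords, wordinfo, docid):
    for wid in set(docwords[docid]):        # unique wids only
        doc2score = wordinfo.get(wid)       # .get() instead of try/except
        if doc2score is None:
            continue
        doc2score.pop(docid, None)          # forget this doc's score
        if not doc2score:                   # no docs left for this wid
            del wordinfo[wid]
    del docwords[docid]
```

Without the `set()`, a wid that occurs twice in the doc would be deleted twice, which is exactly the situation the old try/excepts papered over.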
-