- 23 May, 2002 5 commits
-
-
Guido van Rossum authored
-
Guido van Rossum authored
Add -w and -W options to dump the word list (by word and by wid, respectively). Exempt KeyboardInterrupt from unqualified except clauses.
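The KeyboardInterrupt change can be illustrated with a minimal sketch. `run_guarded` is a hypothetical helper, not a name from the repository; it only shows the general pattern of re-raising the interrupt ahead of a bare `except:` clause.

```python
def run_guarded(func, *args):
    """Call func(*args), swallowing errors but letting Ctrl-C through."""
    try:
        return func(*args)
    except KeyboardInterrupt:
        # Listed before the bare clause so an interrupt is never swallowed.
        raise
    except:
        # Unqualified clause: every other exception is ignored here.
        return None
```

With this ordering, a long indexing run can still be stopped from the keyboard even though other per-item errors are ignored.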
-
Guido van Rossum authored
- Use slightly more portable values for the Data.fs and Zope/lib/python.
- Add -t NNN option to specify how often to commit a transaction; default 20,000.
- Change -p into -p NNN to specify how often (counted in commits) to pack (default 0 -- never pack).
- Reworked the commit and pack logic to maintain the various counters across folders.
- Store relative paths (e.g. "inbox/1").
- Store the mtime of indexed messages in doctimes[docid].
- Store the mtime of indexed folders in watchfolders[folder] (unused).
- Refactor updatefolder() to:
  (a) Avoid indexing messages it's already indexed and whose mtime hasn't changed. (This probably needs an override just in case.)
  (b) Unindex messages that no longer exist in the folder.
- Include the folder name and the message header fields from, to, cc, bcc, and subject in the text to be indexed.
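The updatefolder() refactoring described above might look roughly like the following sketch. It is not the script's actual code: `update_folder`, its parameters, and the choice to key `doctimes` by relative path (rather than by docid) for a single folder are simplifying assumptions made for illustration.

```python
def update_folder(current, doctimes, index, unindex):
    """Bring the index in line with one folder's current contents.

    current  -- dict mapping relative path (e.g. "inbox/1") to its mtime
    doctimes -- dict mapping already-indexed path to the mtime recorded
    index, unindex -- callbacks invoked with a path (hypothetical hooks)
    """
    # (a) Index only messages that are new or whose mtime has changed.
    for path, mtime in current.items():
        if doctimes.get(path) != mtime:
            index(path)
            doctimes[path] = mtime
    # (b) Unindex messages that no longer exist in the folder.
    for path in list(doctimes):
        if path not in current:
            unindex(path)
            del doctimes[path]
```

Iterating over `list(doctimes)` in step (b) takes a snapshot of the keys, so entries can be deleted safely during the loop.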
-
Tim Peters authored
-
Guido van Rossum authored
valid value is input, or the empty string, and interpret the empty string as the default. Indicate the default in the prompt.
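A small sketch of the prompting behaviour this message describes, under assumptions: `ask`, its parameters, and the prompt layout are hypothetical, and the `read` argument exists only so the function can be exercised without a terminal.

```python
def ask(prompt, valid, default, read=input):
    """Keep asking until the reply is a valid value or the empty string."""
    while True:
        # The default is shown in brackets as part of the prompt.
        reply = read("%s [%s]: " % (prompt, default)).strip()
        if reply == "":
            # An empty reply selects the default.
            return default
        if reply in valid:
            return reply
```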
-
- 22 May, 2002 12 commits
-
-
Guido van Rossum authored
Add glob support to the HTMLWordSplitter class.
-
Casey Duncan authored
selected in a mutually exclusive manner (such as splitters). Existing pipeline elements have been grouped appropriately. Added a stop word remover that does not remove single char words. Modified ZMI lexicon add form to use pipeline element groups to render form. Groups with multiple elements are rendered as selects, singletons are rendered as checkboxes.
-
Guido van Rossum authored
-
Guido van Rossum authored
but the pattern may not begin with a glob character (else someone specifying "*" as the pattern can tie up the CPU for a long time).
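The restriction can be sketched as a simple check. `check_pattern` is a hypothetical name, and treating `*` and `?` as the glob characters is an assumption; the commit message itself only mentions "*".

```python
GLOB_CHARS = "*?"  # assumed set of wildcard characters


def check_pattern(pattern):
    """Return pattern unchanged, rejecting one that starts with a wildcard."""
    if pattern and pattern[0] in GLOB_CHARS:
        # A leading wildcard would force a scan of the whole word list.
        raise ValueError(
            "pattern may not begin with a glob character: %r" % pattern)
    return pattern
```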
-
Andreas Jung authored
class
-
Andreas Jung authored
and recognizes the header attribute
-
Casey Duncan authored
* A pipeline factory registry now allows registration of possible pipeline elements for use by Zope lexicons.
* ZMI constructor form for lexicon uses pipeline registry to generate form fields
* ZMI constructor form for ZCTextindex allows you to choose between Okapi and Cosine relevance algorithms
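To make the registry idea concrete, here is a minimal sketch. The class name `PipelineRegistry`, its methods, and the example element classes are all hypothetical; the repository's actual API may differ.

```python
class PipelineRegistry:
    """Maps (group, name) to a factory that builds a pipeline element."""

    def __init__(self):
        self._groups = {}

    def register(self, group, name, factory):
        elements = self._groups.setdefault(group, {})
        if name in elements:
            raise ValueError("%r already registered in %r" % (name, group))
        elements[name] = factory

    def groups(self):
        # Sorted so a generated form lists groups in a stable order.
        return sorted(self._groups)

    def names(self, group):
        return sorted(self._groups[group])

    def create(self, group, name):
        # Call the stored factory to build a fresh element.
        return self._groups[group][name]()


class WhitespaceSplitter:  # hypothetical element
    def process(self, text):
        return text.split()


class LowerCaser:  # hypothetical element
    def process(self, words):
        return [w.lower() for w in words]


registry = PipelineRegistry()
registry.register("Word Splitter", "Whitespace splitter", WhitespaceSplitter)
registry.register("Case Normalizer", "Lower case", LowerCaser)
```

A form generator could then loop over `registry.groups()` and `registry.names(group)` to produce one field per group.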
-
Guido van Rossum authored
-
Fred Drake authored
instead of an extension type, and let StopWordRemover be a Python class that uses the helper if available.
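The "use the helper if available" arrangement is commonly written as an optional import with a pure-Python fallback. In this sketch the module name `_stopword_helper` and the `process` signature are hypothetical; only the class name StopWordRemover comes from the commit message.

```python
try:
    # Hypothetical optional C helper; normally absent, so the fallback runs.
    from _stopword_helper import process as _fast_process
except ImportError:
    _fast_process = None


class StopWordRemover:
    def __init__(self, stop_words):
        self.stop_words = frozenset(stop_words)

    def process(self, words):
        if _fast_process is not None:
            # Delegate to the accelerated helper when it could be imported.
            return _fast_process(self.stop_words, words)
        # Pure-Python fallback with the same result.
        return [w for w in words if w not in self.stop_words]
```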
-
Andreas Jung authored
-
Shane Hathaway authored
-
Tim Peters authored
-
- 21 May, 2002 12 commits
-
-
Jeremy Hylton authored
already ditched Python 1.5.2. The version of tempfile is many revisions behind the one in the Python std library.
-
Guido van Rossum authored
Remove redundant import. Ensure that ZCTextIndex implements the PluggableIndexInterface by adding an unimplemented uniqueValues() method.
-
Andreas Jung authored
(similar to getPhysicalPath())
-
Guido van Rossum authored
Verify that ZCTextIndex implements the PluggableIndexInterface.
-
Guido van Rossum authored
-
Guido van Rossum authored
neither 'pass' (v 1.2) nor 'break' (v 1.3) but 'continue'. Whitespace normalization.
-
Tim Peters authored
loop-invariant, save a little time by multiplying idf by 1024. outside the loop.
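The optimization amounts to hoisting a loop-invariant product out of the loop. The scoring formula below, `int(f * idf * 1024.)`, and the function names are assumptions made for illustration, since the message does not show the actual loop.

```python
def scores_inline(freqs, idf):
    """Scale inside the loop: two multiplications per document."""
    result = {}
    for docid, f in freqs.items():
        result[docid] = int(f * idf * 1024.0)
    return result


def scores_hoisted(freqs, idf):
    """Scale idf once up front: one multiplication per document."""
    scaled_idf = idf * 1024.0  # loop-invariant, computed outside the loop
    result = {}
    for docid, f in freqs.items():
        result[docid] = int(f * scaled_idf)
    return result
```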
-
Tim Peters authored
-
Guido van Rossum authored
the number of words in the index (at least to return a number comparable to the number displayed under "# objects" by TextIndex).
-
Guido van Rossum authored
Index management screen. Ditto for clear(). So group them together and adjust the comment. (So is manage_main, but since it's a DTML method, it can stay in its separate UI group.)
-
Andreas Jung authored
-
Guido van Rossum authored
still only supports a trailing *, so the pipeline should honor that; added a comment to the Splitter class referring to globToWordIds().
-
- 20 May, 2002 11 commits
-
-
Tim Peters authored
well check it in. This yields an overall 133% speedup on a "hot" search for 'python' in my python-dev archive (a word that appears in all but 2 documents). For those who read the email, turned out it was a significant speedup to iterate over an IIBTree's items rather than to materialize the items into an explicit list first. This is now within 20% of simply doing "IIBucket(the_IIBTree)" (i.e., no arithmetic at all), so there's no significant possibility remaining for speeding the inner score loop.
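The difference between the two approaches can be sketched as follows. A plain dict stands in for the IIBTree here, which is an assumption for the sake of a self-contained example, and the function names are hypothetical; the timings reported above are not reproduced by this sketch.

```python
def weigh_materialized(mapping, weight):
    """Build an explicit list of (docid, count) pairs, then loop over it."""
    pairs = list(mapping.items())  # the whole list is created up front
    out = {}
    for docid, count in pairs:
        out[docid] = count * weight
    return out


def weigh_iterated(mapping, weight):
    """Loop over the items directly, with no intermediate list."""
    out = {}
    for docid, count in mapping.items():
        out[docid] = count * weight
    return out
```

Both functions return the same result; the second simply avoids allocating the intermediate list.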
-
Guido van Rossum authored
creating it anonymously and then pulling it out of the zc_index object.
-
Guido van Rossum authored
once we have more than one on the menu.)
-
Guido van Rossum authored
in percentages; strip the percent sign to avoid a traceback calling int() when these variables are used.
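A minimal sketch of the fix, assuming the variables arrive as strings such as "75%"; `percent_to_int` is a hypothetical helper name.

```python
def percent_to_int(value):
    """Convert '75%' (or '75', or 75) to the integer 75."""
    # int('75%') raises ValueError, so drop a trailing percent sign first.
    return int(str(value).strip().rstrip("%"))
```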
-
Guido van Rossum authored
I'm unclear whether this is really the right thing, but at least this prevents crashes when nothing is entered in the search box.
-
Guido van Rossum authored
_fieldname; simply return 0 in this case.
-
Guido van Rossum authored
is *disabled*.
-
Guido van Rossum authored
-
Guido van Rossum authored
-
Guido van Rossum authored
Fix typo in docstring.
-
Guido van Rossum authored
- Rephrased the description of the grammar, pointing out that the lexicon decides on globbing syntax.
- Refactored term and atom parsing (moving atom parsing into a separate method). The previously checked-in version accidentally accepted some invalid forms like ``foo AND -bar''; this is fixed.
tests/testQueryParser.py:
- Each test is now in a separate method; this produces more output (alas) but makes pinpointing the errors much simpler.
- Added some tests catching ``foo AND -bar'' and similar.
- Added an explicit test class for the handling of stopwords. The "and/" test no longer has to check self.__class__.
- Some refactoring of the TestQueryParser class; the utility methods are now in a base class TestQueryParserBase, in a different order; compareParseTrees() now shows the parse tree it got when raising an exception. The parser is now self.parser instead of self.p (see below).
tests/testZCTextIndex.py:
- setUp() no longer needs to assign to self.p; the parser is consistently called self.parser now.
-