Commits · 2594a81b114ee92c7ff95153d537cddc6d8d0626 · Kirill Smelkov / Zope

15 May, 2002 11 commits

Okapi index now works w/zope. · 2594a81b

Casey Duncan authored May 15, 2002

Removed QueryParser as a persistent attribute of the ZCTextIndex so that
it doesn't need to be persistent (It stores no state).

Updated tests. Functionally tested in Zope.

2594a81b

Report Lexicon statistics after bulk indexing. · b975f42a
Guido van Rossum authored May 15, 2002

b975f42a

Keep some statistics about indexing: total number of bytes and words · c470e7d5

Guido van Rossum authored May 15, 2002

indexed (where the bytes are counted before entry into the pipeline,
and the words are counted after the pipeline is done). To get the
numbers, use the _nbytes and _nwords instance variables directly.

c470e7d5

More comments and small cleanups & rearrangements. No semantic changes · 79b99fbb
Tim Peters authored May 15, 2002

(unless I erred ...).

79b99fbb
Whitespace normalization. · aee22894
Tim Peters authored May 15, 2002

aee22894
New testMany() -- a meatier test of the mass set ops. · d0584c06
Tim Peters authored May 15, 2002

d0584c06
This no longer needs NBest. · 660aeac5
Tim Peters authored May 15, 2002

660aeac5
Simplified testPairs() a tad. · 59d8feca
Tim Peters authored May 15, 2002

59d8feca
Use the new SetOps for mass union/intersection. · db9cf10c
Tim Peters authored May 15, 2002

db9cf10c
Use the new SetOps for mass union/intersection. · f4c2c29b
Tim Peters authored May 15, 2002

f4c2c29b

Squash bug duplication by moving the clever mass-union and mass- · 08fe38f4

Tim Peters authored May 15, 2002

intersection gimmicks into their own functions with their own test suite.

This turned up two bugs:

1. The mass weighted union gimmick was incorrect when passed a list with
   a single mapping.  In that case, it neglected to multiply the mapping
   by the given weight.

2. The underlying weighted{Intersection, Union} code does something crazy
   if you pass it weights less than 0.  I had vaguely hoped to be able
   to subtract scores by passing 1 and -1 as weights, but this doesn't
   work.  It's hard to say exactly what it does then.  The line
       weightedUnion(IIBTree(), mapping, 1, -2)
   seems to return a mapping with the same keys, but *all* of whose
   values are -2, regardless of the original mapping's values.
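
A minimal sketch of the mass weighted union and intersection ideas, using plain dicts in place of BTrees. The function names and the (mapping, weight) pair format are assumptions made for illustration, not the code from this commit. Here a list with a single mapping goes through the same weighting path as any other list, which is the behaviour bug 1 lacked.

```python
def mass_weighted_union(pairs):
    """Sum score * weight per key over a list of (mapping, weight) pairs."""
    result = {}
    for mapping, weight in pairs:
        for key, score in mapping.items():
            result[key] = result.get(key, 0) + score * weight
    return result


def mass_weighted_intersection(pairs):
    """Keep only keys present in every mapping, summing score * weight."""
    if not pairs:
        return {}
    # Start from the smallest mapping so intermediate results stay small.
    ordered = sorted(pairs, key=lambda pair: len(pair[0]))
    first_mapping, first_weight = ordered[0]
    result = {key: score * first_weight for key, score in first_mapping.items()}
    for mapping, weight in ordered[1:]:
        result = {
            key: result[key] + mapping[key] * weight
            for key in result
            if key in mapping
        }
    return result
```

This sketch only covers non-negative weights; it makes no attempt to reproduce what the underlying BTrees functions do with negative ones.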

08fe38f4

14 May, 2002 21 commits

add some examples of what google and ultraseek return for queries of · 8927f982
Jeremy Hylton authored May 14, 2002

www.python.org.

next step is to add queries using ZCTextIndex

8927f982
Remove _ in call to a string's split() method. · cf64b682
Jeremy Hylton authored May 14, 2002

A little overzealous in the last checkin.

cf64b682

Make OkapiIndex the default index. · 154154ed

Jeremy Hylton authored May 14, 2002

ZCTextIndex has grown a new argument with a default value that can be
used to specify an Index class to use.  The default is OkapiIndex.Index.

There is a little kludge to make the test succeed.
testZCTestIndex.IndexTests uses the Index.Index tests instead of
OkapiIndex.Index.  Tim will probably fix this.

154154ed

Coding convention update: avoid use of "__" prefix for instance vars. · b3cb1b87
Fred Drake authored May 14, 2002

b3cb1b87
Consistently use a single leading underscore for instance variable · 9db492f1
Guido van Rossum authored May 14, 2002

names.

9db492f1
Use underscore for internal methods · 769fad63
Jeremy Hylton authored May 14, 2002

769fad63

Some cosmetic changes · dbdffd61

Jeremy Hylton authored May 14, 2002

Re-order imports so that all Zope imports go together and are separate
from all the ZCTextIndex imports.

Reformat _apply_index() doc string to use std Python style, which is
one-line summary followed by paragraphs of text that start at the same
offset as the function name.

Do comparison of None using is instead of ==.
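
A small illustration of why the identity test is the safer one: `==` can be redefined by a class, while `is` cannot. The class name below is hypothetical and exists only for this example.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

# == dispatches to __eq__, so this object "equals" None.
equal_by_value = (obj == None)  # noqa: E711 (deliberate, to show the pitfall)

# "is" compares object identity and cannot be overridden.
same_object = (obj is None)
```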

dbdffd61

Add a comment about some of the data structures. · 1d0d9654
Fred Drake authored May 14, 2002

1d0d9654
Added clear method to comply with plug-in index API. · be21d3ca
Casey Duncan authored May 14, 2002

be21d3ca
Added ZMI icons for index and lexicon objects. · a33a96b9
Casey Duncan authored May 14, 2002

a33a96b9
Removed nbest query option, since it is not supported. · fce7f51a
Casey Duncan authored May 14, 2002

fce7f51a

Integration with Zope complete. ZCTextIndex is now a bona fide Plug-in index. · 0226c34d

Casey Duncan authored May 14, 2002

Some additional plug-in index APIs were added to ZCTextIndex and support APIs added to Index and Lexicon.

_apply_index does not use NBest since ZCatalog has an incompatible strategy for finding the top results. NBest might be abstracted from this product for general consumption in application code.

0226c34d

Remove an obsolete comment. · e5cbcd43
Tim Peters authored May 14, 2002

e5cbcd43
Add mechanics of compiling the Products.ZCTextIndex.stopper module. · c800a471
Fred Drake authored May 14, 2002

c800a471
Add test cases for the C version of StopWordRemover. · 5069d2f2
Fred Drake authored May 14, 2002

5069d2f2
Fix _union() -- it wasn't getting what it expected from pop_smallest() · ca38cb41
Guido van Rossum authored May 14, 2002

inside the while loop either.

ca38cb41
Simplify code to allow multiple "false" end tags in CDATA content. · 59b09255
Fred Drake authored May 14, 2002

59b09255
Add double end tag to test cdata ignore · faa28298
matt@zope.com authored May 14, 2002

faa28298

There's no point in encoding the number of continuation bytes in the · 5ee2de80

Guido van Rossum authored May 14, 2002

first byte -- we always find the end of a particular encoded number by
searching for the next byte with the high bit set. This simplifies
the encoding and gives us more space for small encodings: 128 values
can now be encoded in 1 byte, and 16K in 2 bytes.
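
A sketch of such an encoding, written for modern Python 3 rather than the Python of 2002. It assumes the high bit marks the first byte of each number and that every byte carries 7 payload bits, most significant group first; the byte order and the function names are assumptions, not taken from the commit.

```python
def encode_number(n):
    """Encode one non-negative integer; only the first byte has the high bit set."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(n & 0x7F)
        n >>= 7
    groups.reverse()       # most significant 7-bit group first
    groups[0] |= 0x80      # flag the start of this number
    return bytes(groups)


def encode(numbers):
    """Concatenate the encodings of a sequence of integers."""
    return b"".join(encode_number(n) for n in numbers)


def decode(data):
    """Split on bytes with the high bit set and rebuild each integer."""
    numbers = []
    current = None
    for byte in data:
        if byte & 0x80:
            # A new number starts here, which also ends the previous one.
            if current is not None:
                numbers.append(current)
            current = byte & 0x7F
        else:
            if current is None:
                raise ValueError("data starts with a continuation byte")
            current = (current << 7) | byte
    if current is not None:
        numbers.append(current)
    return numbers
```

With 7 payload bits per byte, values 0 to 127 fit in one byte and values up to 16383 fit in two, matching the 128 and 16K figures in the message.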

5ee2de80

Merged TextIndexDS9-branch into trunk. · 61e89f2f
Guido van Rossum authored May 14, 2002

61e89f2f

Many small cleanups and simplifications. · a340cb9d

Jeremy Hylton authored May 14, 2002

_indexedSearch():

    Simplify logic that called _apply_index() for each index in the
    catalog.  The if statement under the comment "Optimization" had
    identical code on either branch.  Perhaps the odd indentation made
    this confusing.  Regardless, remove the conditional.

    Change computation of normalized scores to multiply first, then
    divide.  Use literal 100. to make sure mult and div are floating
    point ops.
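
The effect of the ordering can be sketched as follows. Python 3's `//` stands in for what `/` did on two integers in the Python of that era, and the function names are hypothetical.

```python
def normalize_divide_first(score, max_score):
    # Integer division happens first, so any score below max_score collapses to 0.
    return score // max_score * 100


def normalize_multiply_first(score, max_score):
    # The float literal forces floating point multiplication and division.
    return 100. * score / max_score
```

For a score of 3 out of 4, the first form yields 0 and the second yields 75.0.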

searchResults():

    Simplify logic at beginning of searchResults().  The first two
    conditionals depended on kw, so organize the logic to make that
    clearer.

    Write helper method to find "sort-on" and "sort-index" instead of
    duplicating code in searchResults().

    For the case where results are sorted, simplify construction of the
    final LazyCat and make it more efficient to boot.  Instead of using
    a list comprehension and a reduce + lambda to construct the list and
    the length of the contained lists, do it with one explicit for loop
    that constructs both values.

        Note: I did detailed timing stats on three ways to compute the
        length of a sequence of sequences.  reduce + lambda was the
        slowest.  For short lists, an explicit for loop is fastest.
        For long lists, reduce(operator.add, map(len, list)) is
        fastest.  The explicit for loop is a big win here, because we've
        got to walk over the elements anyway to undo the Schwartzian
        transform.
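
The three approaches can be sketched in modern Python 3 as below. The function names and the (sort_key, sequence) pair layout are assumptions for illustration, and no timing claims are attached to this sketch.

```python
from functools import reduce
import operator


def total_length_reduce_lambda(sequences):
    """reduce + lambda over the sequences."""
    return reduce(lambda total, seq: total + len(seq), sequences, 0)


def total_length_reduce_map(sequences):
    """reduce(operator.add, map(len, ...))."""
    return reduce(operator.add, map(len, sequences), 0)


def undecorate_and_count(decorated):
    """One explicit loop that strips the sort keys and sums the lengths.

    `decorated` is assumed to be a sorted list of (sort_key, sequence) pairs,
    i.e. the decorated half of a Schwartzian transform.
    """
    sequences = []
    total = 0
    for _sort_key, seq in decorated:
        sequences.append(seq)
        total += len(seq)
    return sequences, total
```

Because the sort keys have to be stripped in a loop anyway, the third form gets the total length without a second pass over the data.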

Sundry:

Use getattr() with default value of None in preference to hasattr()
followed by getattr().  This gets the same result with half the work.
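
As an illustration, the two forms below return the same value, but the second performs a single attribute lookup. The class and function names are hypothetical.

```python
class Record:
    """Hypothetical object that happens to have a title attribute."""
    title = "Front page"


def title_two_lookups(obj):
    # hasattr() looks the attribute up once, getattr() looks it up again.
    if hasattr(obj, "title"):
        return getattr(obj, "title")
    return None


def title_one_lookup(obj):
    # A single lookup; None is returned when the attribute is missing.
    return getattr(obj, "title", None)
```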

Changes for consistent and frequent use of whitespace.

Use types.StringType and isinstance() to test for strings.

a340cb9d

13 May, 2002 1 commit
- Remove unused function to silence compiler warning. · 8b2a64dc
  Jeremy Hylton authored May 13, 2002
  
  8b2a64dc
10 May, 2002 3 commits
- "Fix" false bug: When something that looks like an end tag occurs in CDATA · 1c2de949
  Fred Drake authored May 10, 2002
  content, but does not match the expected end tag, treat it as character data.
  This is mostly useful when a script includes string literals that include end
  tags.
  1c2de949
- Add a comment explaining why the new test is wrong. · 070b5be2
  Fred Drake authored May 10, 2002
  
  070b5be2
- Route html end tag inside html comment inside a CDATA mode triggering tag · 50b412e9
  matt@zope.com authored May 10, 2002
  from 2.5 branch.
  50b412e9
09 May, 2002 2 commits
- · 518fe525
  Andreas Jung authored May 09, 2002
```
      - Collector 386: workaround for hanging FTP connections
        with NcFTP
```
  518fe525
- avoid using PyObject_CallFunction when a single parameter may be a tuple · 9945cdf2
  Toby Dickenson authored May 09, 2002
  
  9945cdf2
07 May, 2002 2 commits

Add "remove_stale_bytecode()". This removes .pyc and .pyo files that · efd87296

Guido van Rossum authored May 07, 2002

don't have a corresponding .py file, to prevent tests that import
deleted modules from running using the stale bytecode files. This has
bitten enough people enough times that it's time it became a standard
part of every test suite runner. (Zope3 already has it.)
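
A minimal sketch of the idea, not the function added in this commit: walk a directory tree and delete any .pyc or .pyo file with no .py file of the same name beside it. The signature and the returned list are assumptions. It targets the legacy layout in which bytecode sits next to the source; modern Python writes bytecode under `__pycache__` with tagged file names, which this sketch does not handle.

```python
import os


def remove_stale_bytecode(root):
    """Delete .pyc/.pyo files under root that have no sibling .py file.

    Returns the list of removed paths.
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        names = set(filenames)
        for name in filenames:
            base, ext = os.path.splitext(name)
            if ext in (".pyc", ".pyo") and base + ".py" not in names:
                path = os.path.join(dirpath, name)
                os.remove(path)
                removed.append(path)
    return removed
```

A test runner could call this once on the source tree before collecting tests, so that imports of deleted modules fail instead of silently using old bytecode.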

Somebody merge this into the Zope2 trunk please.

efd87296

Merged distutils-config-branch. · 8215cdb5
Shane Hathaway authored May 07, 2002

8215cdb5