    Refactor the query parser to rely on the lexicon for parsing terms. · b82b2746
    Guido van Rossum authored
    ILexicon.py:
    
      - Added parseTerms() and isGlob().
    
      - Added get_word() and get_wid() (get_word() already existed;
        get_wid() is new, for symmetry).
    
      - Reflowed some text.
    
    IQueryParser.py:
    
      - Expanded docs for parseQuery().
    
      - Added getIgnored() and parseQueryEx().
    
    IPipelineElement.py:
    
      - Added processGlob().
    
    Lexicon.py:
    
      - Added parseTerms() and isGlob().
    
      - Added get_wid().
    
      - Some pipeline elements now support processGlob().
    
    ParseTree.py:
    
      - Clarified the error message for calling executeQuery() on a
        NotNode.
    
    QueryParser.py (lots of changes):
    
      - Changed private names (__tokens etc.) into protected names
        (_tokens etc.).
    
      - Added getIgnored() and parseQueryEx() methods.
    
      - The atom parser now uses the lexicon's parseTerms() and isGlob()
        methods.
    
      - Query parts that consist only of stopwords (as determined by the
        lexicon), or of stopwords and negated terms, yield None instead of
        a parse tree node; the ignored term is added to self._ignored.
        None is ignored when combining terms for AND/OR/NOT operators, and
        when an operator has no non-None operands, the operator itself
        returns None.  When this None percolates all the way to the top,
        the parser raises a ParseError exception.
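
The None-percolation rule described above can be sketched as follows. This is a minimal illustration, not the actual QueryParser code: the STOPWORDS set, the tuple-based nodes, and the helper names parse_atom(), combine() and parse_and() are all assumptions made for the example, and negated terms are not modelled.

```python
class ParseError(Exception):
    """Raised when a query contains no usable terms."""


# Hypothetical stopword list; the real parser asks the lexicon instead.
STOPWORDS = {"a", "the", "of"}


def parse_atom(word, ignored):
    """Return a leaf node, or None (and record the word) for a stopword."""
    if word.lower() in STOPWORDS:
        ignored.append(word)
        return None
    return ("ATOM", word)


def combine(op, operands):
    """Combine operands under op, skipping None operands.

    Returns None when no operand survives, so the None keeps
    percolating upwards.
    """
    nodes = [node for node in operands if node is not None]
    if not nodes:
        return None
    if len(nodes) == 1:
        return nodes[0]
    return (op, nodes)


def parse_and(words):
    """Parse a flat AND query; fail if only stopwords remain."""
    ignored = []
    tree = combine("AND", [parse_atom(w, ignored) for w in words])
    if tree is None:
        # The None reached the top level: nothing left to search for.
        raise ParseError("query contains only common words: %r" % ignored)
    return tree, ignored
```

In this sketch, parse_and(["the", "aa", "bb"]) yields an AND node over "aa" and "bb" and reports "the" as ignored, while parse_and(["the", "of"]) raises ParseError.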
    
    tests/testQueryParser.py:
    
      - Changed test expressions of the form "a AND b AND c" to "aa AND bb
        AND cc" so that the terms won't be considered stopwords.
    
      - The test for "and/" can only work for the base class.
    
    tests/testZCTextIndex.py:
    
      - Added copyright notice.
    
      - Refactored testStopWords() to use two helpers, one for success
        and one for failure.
    
      - Changed testStopWords() to require parser failure for those
        queries that have only stopwords or stopwords plus negated terms.
    
      - Improved compareSet() to sort the sets of keys, and to use a more
        direct way of extracting the keys.  This wasn't strictly needed
        (nothing fails without it), but the old approach of copying the
        keys into a dict in a loop depended on the dict always returning
        its keys in the same order.
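
An order-independent comparison along the lines of the compareSet() change might look like the sketch below. The function name compare_set() and its use of plain mappings are assumptions for illustration; the real test helper works on the index's own result sets.

```python
def compare_set(actual, expected):
    """Check that two mappings hold the same keys and values.

    Keys are sorted before comparison, so the result does not
    depend on the iteration order of either mapping.
    """
    actual_keys = sorted(actual.keys())
    expected_keys = sorted(expected.keys())
    assert actual_keys == expected_keys, (actual_keys, expected_keys)
    for key in actual_keys:
        assert actual[key] == expected[key], key
```

Sorting both key lists first makes a mismatch easy to read in the failure message and removes any reliance on how the underlying container orders its keys.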
Lexicon.py 4.71 KB