1. 12 Jun, 2024 2 commits
    • Fix support for files containing non-ascii chars. · d7f80021
      Vincent Pelletier authored
      Python's stdin encoding is based on external information which may not be
      related to the actual encoding of the file. For example, on one system
      this is:
        encoding='UTF-8'
        errors='surrogateescape'
      which then causes an exception to be raised if any surrogate was produced:
        UnicodeEncodeError: 'utf-8' codec can't encode character '\udca3' in position 4134: surrogates not allowed
      Reading the same file directly (instead of going through stdin) succeeds,
      because the replacement char is used instead.
      
      Reconfigure stdin encoding and error handling so it is consistent with
      files being opened by this tool directly.
      This is not to say that "ascii" and "replace" are the best possible choice
      (I am not completely convinced they are...), but at least this makes stdin
      work in exactly the same way as named files.
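      A minimal sketch of the behaviour difference, assuming Python 3 (the helper names below are hypothetical, not taken from apachedex): decoding with `surrogateescape` smuggles undecodable bytes through as lone surrogates that fail later when re-encoded, whereas an `ascii`/`replace` text layer turns them into U+FFFD up front. On a live process, `sys.stdin.reconfigure(encoding='ascii', errors='replace')` (Python 3.7+) applies the same settings to the already-open stdin.

```python
import io


def surrogate_roundtrip(raw: bytes) -> str:
    # What a UTF-8/surrogateescape stdin produces: byte 0xa3 becomes '\udca3'.
    return raw.decode('utf-8', errors='surrogateescape')


def open_like_named_file(raw: bytes) -> io.TextIOWrapper:
    # Hypothetical helper: decode a byte stream the way named files are
    # opened here, ASCII with undecodable bytes replaced by U+FFFD.
    return io.TextIOWrapper(io.BytesIO(raw), encoding='ascii', errors='replace')


raw_line = b'GET /caf\xa3 HTTP/1.1\n'
text = surrogate_roundtrip(raw_line)
try:
    text.encode('utf-8')  # fails: surrogates not allowed
except UnicodeEncodeError as exc:
    print('surrogateescape:', exc.reason)
print('ascii/replace:', ascii(open_like_named_file(raw_line).read()))
```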
    • Minor source reformatting · dafe8536
      Vincent Pelletier authored
      Word-wrap a long line.
      open()'s mode is text by default; make it explicit.
      No change expected.
  2. 28 May, 2024 1 commit
    • Relax regexes for quoted fields. · f22f9e03
      Vincent Pelletier authored
      Allows catching more log lines, especially for quoted fields which lack
      quote escaping. This is at the expense of some parsing performance (10%
      on a random real-world sample).
      Also simplify the code a bit by removing expensive matching logic.
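      To illustrate the trade-off, here is a sketch with two hypothetical patterns for a `"request" status` fragment (neither is the actual apachedex regex): the strict one only accepts backslash-escaped quotes inside the field, while the relaxed one lazily extends the field up to a `"` followed by the expected separator, so an unescaped inner quote no longer causes the line to be rejected.

```python
import re

# Strict: inside the quotes, only non-quote chars or backslash escapes.
STRICT = re.compile(r'^"((?:[^"\\]|\\.)*)" (\d{3})$')
# Relaxed: lazily extend the field until a quote followed by " <status>".
RELAXED = re.compile(r'^"(.*?)" (\d{3})$')

line = '"GET /a"b HTTP/1.1" 200'  # inner quote left unescaped by the server
print(STRICT.match(line))            # None: the line would be rejected
print(RELAXED.match(line).groups())  # ('GET /a"b HTTP/1.1', '200')
```

      The lazy `.*?` may have to try several candidate end quotes per field, which is the kind of extra work that can cost some parsing speed.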
  3. 07 May, 2024 1 commit
    • Fix failure when median is enabled and --period is not set and logs are not sorted · 1684838f
      Vincent Pelletier authored
      Replacing duration_list with an itertools.chain object breaks the API
      promise that it must have an append method. This causes all accumulate
      calls to fail when all of the following hold simultaneously:
      - median tracking is enabled
      - period is not set and some scaling happens (e.g. there are more than 4
        days of data)
      - log lines are being fed in a non-chronological order, where later lines
        are timestamped before the time some scaling happened (e.g. a log line
        for day 1, then day 5, then day 1 again)
      When this happens, data accumulation will fail on these later lines,
      causing the report to be only partial - for example errors will be missing.
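      The failure mode can be reproduced with the standard library alone: `itertools.chain` objects are iterators and have no `append` method. A minimal sketch of one way to keep the promise (not necessarily the fix applied in this commit) is to materialise the chain into a list; `merge_durations` is a hypothetical name.

```python
import itertools


def merge_durations(old_durations, new_durations):
    # Materialise the chained iterables so callers can keep calling .append().
    return list(itertools.chain(old_durations, new_durations))


chained = itertools.chain([120, 80], [300])
print(hasattr(chained, 'append'))  # False: an accumulate() call would raise AttributeError

durations = merge_durations([120, 80], [300])
durations.append(45)               # a late, out-of-order log line is still accepted
print(durations)                   # [120, 80, 300, 45]
```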
  4. 09 Jan, 2024 3 commits
  5. 27 Dec, 2023 7 commits
  6. 20 Dec, 2023 1 commit
  7. 20 Apr, 2021 2 commits
  8. 19 Apr, 2021 2 commits
    • apachedex: Tolerate non-ascii URLs. · d743c185
      Vincent Pelletier authored
      Otherwise, if `url` contains non-ASCII chars, startswith will fail with
      an error like:
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 84: ordinal not in range(128)
      because 'http' is unicode. So byte-ify it to avoid this transcoding.
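      The message describes Python 2, where mixing `str` and `unicode` triggers an implicit ASCII decode. In Python 3 the same mix fails differently (`bytes.startswith` rejects a `str` prefix with `TypeError`), but the remedy is the same idea: compare bytes with bytes. A minimal sketch, with a hypothetical helper name:

```python
def is_absolute_http(url: bytes) -> bool:
    # b'http' is a bytes literal, so no transcoding of the URL is attempted.
    return url.startswith(b'http')


url = b'http://example.com/caf\xc3\xa9'
print(is_absolute_http(url))  # True
try:
    url.startswith('http')    # str prefix on bytes: rejected in Python 3
except TypeError as exc:
    print('TypeError:', exc)
```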
    • apachedex: Add support for non-standard 444 and 499 status codes. · 1cff57c4
      Vincent Pelletier authored
      Give them captions.
      Also, treat 499 as a non-error: the client closed the connection before
      the server could respond, so it is likely not something the server could
      be held responsible for. Of course, the response time still matters, so
      if these statuses come after slow responses they will still affect the
      score.
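      A sketch of the classification described above. The caption strings follow the names nginx commonly uses for these codes (444 "No Response", 499 "Client Closed Request"); the dictionary, the helper name and the "400 and above is an error" rule are assumptions for illustration, not apachedex's code.

```python
# Hypothetical captions for the non-standard nginx status codes.
STATUS_CAPTIONS = {
    444: 'No Response',
    499: 'Client Closed Request',
}


def is_error(status: int) -> bool:
    # Assumed rule: 4xx and 5xx are errors, except 499, where the client
    # hung up before the server answered.
    return status >= 400 and status != 499


for status in (200, 444, 499, 503):
    print(status, STATUS_CAPTIONS.get(status, ''), is_error(status))
```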
  9. 21 Jan, 2021 2 commits
  10. 20 Jan, 2021 1 commit
  11. 02 Mar, 2020 2 commits
    • Bump to 1.7.1 . · 86913b29
      Vincent Pelletier authored
    • Fix NameError. · c0a37b9f
      Vincent Pelletier authored
      Traceback (most recent call last):
        File "apachedex/__init__.py", line 1274, in wrapper
          return func(*args, **kw)
        File "apachedex/__init__.py", line 1586, in main
          site_data.rescale(rescale, getDuration)
        File "apachedex/__init__.py", line 605, in rescale
          for value_date, data in getattr(self, attribute_id).iteritems():
      NameError: global name 'attribute_id' is not defined
  12. 28 May, 2019 3 commits
  13. 19 Mar, 2019 4 commits
    • Jérome Perrin's avatar
    • tests: check for zlib/bz2 encoded logs · 10d91ee0
      Jérome Perrin authored
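      For reference, the standard library can read both kinds of compressed logs in text mode. A minimal sketch, assuming compression is chosen from the file extension (the helper `open_log` is hypothetical) and using ASCII decoding with the `replace` error handler:

```python
import bz2
import gzip
import os
import tempfile


def open_log(path):
    # Hypothetical helper: pick a decompressor from the file extension.
    if path.endswith('.gz'):
        opener = gzip.open
    elif path.endswith('.bz2'):
        opener = bz2.open
    else:
        opener = open
    return opener(path, 'rt', encoding='ascii', errors='replace')


line = '127.0.0.1 - - [14/Jul/2017:09:41:41 +0200] "GET / HTTP/1.1" 200 10\n'
with tempfile.TemporaryDirectory() as tmp:
    for name, writer in (('a.log.gz', gzip.open), ('a.log.bz2', bz2.open)):
        path = os.path.join(tmp, name)
        with writer(path, 'wt', encoding='ascii') as out:
            out.write(line)
        with open_log(path) as log:
            print(name, log.read() == line)  # True for both formats
```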
    • Support non escaped referer in log · 128167a3
      Jérome Perrin authored
      Unlike Apache, which escapes non-ASCII characters in the referrer, Caddy
      writes the referrer as-is. Edge seems to send the referrer unescaped, so
      with Edge and Caddy we can have non-ASCII text in the referrer.
      
      For lines which cannot be decoded as ASCII, we use Python's `replace`
      error handler, which in this case allows the line to be processed if the
      decoding problem only concerns the encoding of the referrer.
      
      We don't implement this case as "skip and report ill-formed line",
      because Python does not provide utilities to do this easily.
      
      Reproduction with caddy:
      
      ```
      curl -k http://localhost -H 'Referer: héhé'
      ```
      
      With apache, `LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" common`
      ```
      127.0.0.1 - - [28/Feb/2019:10:03:33 +0100] "GET / HTTP/1.1" 200 2046 "h\xc3\xa9h\xc3\xa9" "curl/7.50.1" 4
      ```
      
      With caddy, `log / stdout "{remote} {>REMOTE_USER} [{when}] \"{method} {uri} {proto}\" {status} {size} \"{>Referer}\" \"{>User-Agent}\" {latency_ms}"`
      
      ```
      127.0.0.1 - [28/Feb/2019:10:05:00 +0100] "GET / HTTP/2.0" 200 1950 "héhé" "curl/7.50.1" 4
      ```
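      A sketch of the decoding side, assuming Python 3 and a simplified, hypothetical pattern for the tail of the line: decoding the Caddy line from the example above as ASCII with `errors='replace'` keeps the line parseable, and only the referrer carries replacement characters (one U+FFFD per undecodable byte).

```python
import re

# Status, size, then the quoted referrer and user agent (simplified pattern).
TAIL = re.compile(r'(\d{3}) (\d+|-) "([^"]*)" "([^"]*)"')


def decode_line(raw: bytes) -> str:
    # Non-ASCII bytes become U+FFFD instead of aborting the whole line.
    return raw.decode('ascii', errors='replace')


raw = (b'127.0.0.1 - [28/Feb/2019:10:05:00 +0100] "GET / HTTP/2.0" 200 1950 '
       b'"h\xc3\xa9h\xc3\xa9" "curl/7.50.1" 4\n')
status, size, referer, agent = TAIL.search(decode_line(raw)).groups()
print(status, size, ascii(referer), agent)
```

      A strict `raw.decode('ascii')` on the same bytes raises `UnicodeDecodeError`, which is what would otherwise reject the whole line.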
    • apachedex: Display row header when hovering its cells (report tables). · 48392feb
      Arnaud Fontaine authored
      When the table is large and requires scrolling, it becomes difficult
      to read, so add a 'title' attribute to <tr> on the 'Stats per module' and
      'Hits per status code' tables. For example, when hovering a particular
      cell on a module row, it displays the module name.
      
      /reviewed-on nexedi/apachedex!3
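      A minimal sketch of the idea, assuming rows are rendered as HTML strings (the `render_row` helper and the markup layout are hypothetical): HTML applies an element's `title` to descendants that lack their own, so putting the row header in the `<tr>` `title` makes browsers show it as a tooltip on every cell of that row.

```python
from html import escape


def render_row(header, cells):
    # title on <tr> is shown as a tooltip for every cell in the row.
    cell_html = ''.join('<td>%s</td>' % escape(str(cell)) for cell in cells)
    return '<tr title="%s"><th>%s</th>%s</tr>' % (
        escape(header, quote=True), escape(header), cell_html)


print(render_row('erp5_web', [1200, 3, '0.45']))
```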
  14. 23 Jan, 2018 2 commits
    • Bump to 1.6.3 . · 7905d8cd
      Vincent Pelletier authored
    • Prevent errors when parsing date on malformed lines · 855cee8e
      Jérome Perrin authored
      We observed lines in our logs where the timestamp field still matched
      the timestamp regexp, so the line was not reported as invalid, but
      parsing such a timestamp caused a ValueError in _matchToDateTime.
      
      The beginning of line was:
      127.0.0.1 - - [14/Jul/2017:127.0.0.1 - - [14/Jul/2017:09:41:41 +0200]
      
      Which uses `[14/Jul/2017:127.0.0.1 - - [14/Jul/2017:09:41:41 +0200]` as
      the timestamp, so it fails the simple .split() used to separate the
      timestamp and the timezone.
      
      Added a minimal test case to reproduce this specific problem.
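      A sketch of the defensive parsing described above, assuming Python 3: `split()` on the bogus timestamp yields more than two fields, so the tuple unpacking itself raises `ValueError`, which is caught along with invalid dates and reported as "no timestamp" instead of crashing. `parse_timestamp` is a hypothetical name, not apachedex's `_matchToDateTime`.

```python
from datetime import datetime


def parse_timestamp(value):
    # Hypothetical helper: split "date tz" and parse; any ValueError
    # (wrong field count, bad date) marks the line as malformed.
    try:
        date_part, tz_part = value.split()
        return datetime.strptime(date_part + ' ' + tz_part, '%d/%b/%Y:%H:%M:%S %z')
    except ValueError:
        return None


print(parse_timestamp('14/Jul/2017:09:41:41 +0200'))
print(parse_timestamp('14/Jul/2017:127.0.0.1 - - [14/Jul/2017:09:41:41 +0200'))
```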
  15. 14 Sep, 2016 3 commits
  16. 11 Jul, 2014 2 commits
  17. 17 Apr, 2014 2 commits