- 12 Jun, 2024 2 commits
Vincent Pelletier authored
Python's stdin encoding is based on external information, which may not be related to the actual encoding of the file. For example, on one system this is `encoding='UTF-8' errors='surrogateescape'`, which then causes an exception to be raised if any surrogate was produced: `UnicodeEncodeError: 'utf-8' codec can't encode character '\udca3' in position 4134: surrogates not allowed`. Reading the same file directly (instead of going through stdin) succeeds, because the replacement character is used instead. Reconfigure stdin encoding and error handling so they are consistent with files opened by this tool directly. This is not to say that "ascii" and "replace" are the ultimate best choice (of which I am not completely convinced...), but at least this makes stdin work in exactly the same way as named files.
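The failure and the fix can be sketched with an in-memory stream standing in for stdin. This is an illustrative sketch, not apachedex's code. `make_text_stream` is a hypothetical helper. The sketch assumes the fix amounts to calling `TextIOWrapper.reconfigure` (Python 3.7+) with `encoding='ascii'` and `errors='replace'` before anything is read. On a real process the equivalent call would be `sys.stdin.reconfigure(...)`.

```python
import io

def make_text_stream(raw_bytes, reconfigure=True):
    # Hypothetical helper: wrap bytes the way some systems set up stdin,
    # i.e. UTF-8 with the 'surrogateescape' error handler.
    stream = io.TextIOWrapper(
        io.BytesIO(raw_bytes), encoding='utf-8', errors='surrogateescape',
    )
    if reconfigure:
        # Must happen before any read: switch to the same decoding as
        # named files (ASCII, undecodable bytes become U+FFFD).
        stream.reconfigure(encoding='ascii', errors='replace')
    return stream

# 0xa3 is not valid UTF-8 on its own, so 'surrogateescape' maps it to '\udca3'.
raw = b'GET /caf\xa3 HTTP/1.1\n'
escaped = make_text_stream(raw, reconfigure=False).read()
try:
    escaped.encode('utf-8')
except UnicodeEncodeError as exc:
    print('without reconfigure:', exc.reason)  # surrogates not allowed
print('with reconfigure:', ascii(make_text_stream(raw).read()))
```

`ascii()` is used for printing so the U+FFFD replacement character is shown escaped, whatever the terminal encoding.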
-
Vincent Pelletier authored
Word-wrap a long line. open's mode is text by default; make it explicit. No change expected.
-
- 28 May, 2024 1 commit
Vincent Pelletier authored
Allows catching more log lines, especially those with quoted fields that lack quote escaping. This comes at the expense of some parsing performance (10% on a random real-world sample). Also simplify the code a bit by removing expensive matching logic.
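To illustrate the trade-off, here is a sketch with two hypothetical patterns for the trailing referrer and user-agent fields. These are not the patterns apachedex actually uses. The strict one only accepts backslash-escaped quotes inside a field. The relaxed, greedy one relies on backtracking to find the field boundaries. The relaxed pattern accepts a referrer containing bare quotes, and that backtracking is the kind of extra work that costs parsing time.

```python
import re

# Hypothetical patterns for the last two quoted fields of a log line.
STRICT = re.compile(
    r'"(?P<referrer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"'
)
RELAXED = re.compile(r'"(?P<referrer>.*)" "(?P<agent>.*)"')

# The referrer contains unescaped quotes, as some servers write it.
tail = '"http://example.com/?q="b"" "curl/7.50.1"'

print(STRICT.fullmatch(tail))  # None: the bare quote ends the field too early
match = RELAXED.fullmatch(tail)
print(match.group('referrer'), '|', match.group('agent'))
# http://example.com/?q="b" | curl/7.50.1
```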
-
- 07 May, 2024 1 commit
Vincent Pelletier authored
Replacing duration_list with an itertools.chain object breaks the API promise that it must have an append method. This causes all accumulate calls to fail when all of the following hold simultaneously:
  - median tracking is enabled
  - period is not set and some scaling happens (ex: there are more than 4 days of data)
  - log lines are fed in a non-chronological order, where later lines are timestamped before the time some scaling happened (ex: log line for day 1, then day 5, then day 1 again)
When this happens, data accumulation fails on these later lines, causing the report to be only partial (for example, errors will be missing).
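Here is a minimal reproduction of the API break, using hypothetical helper names. An `itertools.chain` object is a one-shot iterator and has no `append` method. Extending the existing list in place keeps the promise that `duration_list` supports `append`.

```python
import itertools

def merge_durations_broken(duration_list, other_list):
    # Returns an iterator: later accumulate() calls that do
    # duration_list.append(...) raise AttributeError.
    return itertools.chain(duration_list, other_list)

def merge_durations_fixed(duration_list, other_list):
    # Keep a real list so append() keeps working after rescaling.
    duration_list.extend(other_list)
    return duration_list

merged = merge_durations_broken([120, 80], [95])
try:
    merged.append(42)  # a late, out-of-order log line
except AttributeError as exc:
    print('broken:', exc)

merged = merge_durations_fixed([120, 80], [95])
merged.append(42)
print('fixed:', merged)  # fixed: [120, 80, 95, 42]
```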
-
- 09 Jan, 2024 3 commits
Jérome Perrin authored
Now that this is Python 3 only, we always have lzma.
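With Python 3, `lzma` is part of the standard library, so it can be imported unconditionally alongside `gzip` and `bz2`. The sketch below opens a log by file extension. `open_log` and the suffix table are hypothetical, and the text-mode `encoding='ascii', errors='replace'` arguments are an assumption about how logs would be decoded.

```python
import bz2
import gzip
import lzma

# Hypothetical mapping from file suffix to an opener function.
# All three accept (filename, mode, encoding=..., errors=...).
OPENER_BY_SUFFIX = {
    '.gz': gzip.open,
    '.bz2': bz2.open,
    '.xz': lzma.open,
}

def open_log(path):
    # Pick a decompressor by suffix, falling back to the built-in open().
    for suffix, opener in OPENER_BY_SUFFIX.items():
        if path.endswith(suffix):
            return opener(path, 'rt', encoding='ascii', errors='replace')
    return open(path, 'rt', encoding='ascii', errors='replace')
```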
-
Vincent Pelletier authored
-
Vincent Pelletier authored
For every measure, display the median in addition to the existing values (score, average, max). Optional, because it requires an amount of RAM proportional to the number of hits.
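Below is a sketch of optional median tracking; the class and attribute names are hypothetical. Average and max need only constant memory, but an exact median needs every value. The list therefore grows by one entry per hit, which is why the feature is opt-in.

```python
import statistics

class DurationStats:
    """Hypothetical per-measure accumulator with optional median."""

    def __init__(self, track_median=False):
        self.hit_count = 0
        self.total = 0
        self.max = 0
        # Only allocated when asked for: memory grows with the hit count.
        self.duration_list = [] if track_median else None

    def accumulate(self, duration):
        self.hit_count += 1
        self.total += duration
        self.max = max(self.max, duration)
        if self.duration_list is not None:
            self.duration_list.append(duration)

    def average(self):
        return self.total / self.hit_count if self.hit_count else 0

    def median(self):
        # None when tracking is disabled or no hit was recorded.
        if not self.duration_list:
            return None
        return statistics.median(self.duration_list)

stats = DurationStats(track_median=True)
for duration in (100, 400, 250, 10_000):
    stats.accumulate(duration)
print(stats.average(), stats.median(), stats.max)  # 2687.5 325.0 10000
```

The single slow hit drags the average far above the median, which is the kind of skew the median makes visible.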
-
- 27 Dec, 2023 7 commits
Vincent Pelletier authored
Switch the APDEXStats class to use slots. This should save a bit of memory and gain a bit of speed. Make word wrapping a bit more semantically sensible so that future diffs are more readable. No functional change expected.
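Here is a sketch of a slotted accumulator. The field names and the scoring rule are assumptions. They follow the usual Apdex definition: satisfied at or below a threshold T, tolerating up to 4T, and score = (satisfied + tolerating / 2) / total. They are not necessarily APDEXStats' exact attributes. Declaring `__slots__` removes the per-instance `__dict__`, which is where the memory saving comes from.

```python
class ApdexSketch:
    # No per-instance __dict__: attributes live in fixed slots.
    __slots__ = ('threshold', 'satisfied', 'tolerating', 'total')

    def __init__(self, threshold):
        self.threshold = threshold
        self.satisfied = 0
        self.tolerating = 0
        self.total = 0

    def accumulate(self, duration):
        self.total += 1
        if duration <= self.threshold:
            self.satisfied += 1
        elif duration <= 4 * self.threshold:
            self.tolerating += 1
        # Anything slower is "frustrated" and only counts in total.

    def score(self):
        if not self.total:
            return 0.0
        return (self.satisfied + self.tolerating / 2) / self.total

apdex = ApdexSketch(threshold=1_000_000)  # 1 s, in microseconds
for duration in (500_000, 1_000_000, 2_000_000, 5_000_000):
    apdex.accumulate(duration)
print(apdex.score())  # 0.625
try:
    apdex.typo = 1  # rejected: not a declared slot
except AttributeError:
    print('no __dict__ on slotted instances')
```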
-
Vincent Pelletier authored
-
Vincent Pelletier authored
-
Vincent Pelletier authored
-
Vincent Pelletier authored
As generated by pylint 2.17.4.
-
Vincent Pelletier authored
To avoid setup.py deprecation warnings without adding new dependencies.
-
Vincent Pelletier authored
Make it a bit more readable. Use non-abbreviated arguments when available. Use lower-case variable names, which is the de-facto convention for non-exported variables. Shift once instead of after each argument. Make the script exit if it expands an unset variable or if any call fails.
-
- 20 Dec, 2023 1 commit
Jérome Perrin authored
Drop support for Python 2.
-
- 20 Apr, 2021 2 commits
Vincent Pelletier authored
The common unit is microseconds. While this value was correct, its name was misleading.
-
Vincent Pelletier authored
-
- 19 Apr, 2021 2 commits
Vincent Pelletier authored
Otherwise, if `url` contains non-ASCII chars, startswith fails with an error like `UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 84: ordinal not in range(128)`, because 'http' is unicode. So byte-ify it to avoid this transcoding.
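In Python 2, comparing a byte string `url` against the unicode literal 'http' forced an implicit ASCII decode, hence the error above. Python 3 is stricter: `bytes.startswith` rejects a `str` prefix outright with a TypeError. The sketch below keeps both operands as bytes; the `is_absolute_url` helper is hypothetical.

```python
def is_absolute_url(url):
    # Hypothetical helper: `url` is raw bytes from the log line, so compare
    # against a bytes prefix instead of a text one.
    return url.startswith(b'http')

url = b'http://example.com/caf\xc3\xa9'  # non-ASCII bytes are fine
print(is_absolute_url(url))  # True
try:
    url.startswith('http')  # str prefix on bytes
except TypeError as exc:
    print('TypeError:', exc)
```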
-
Vincent Pelletier authored
Give them captions. Also, treat 499 as a non-error: the client closed the connection before the server could respond, so it is likely not something the server can be held responsible for. Of course, the response time still matters, so if these statuses come after slow responses they will still affect the score.
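The sketch below shows the classification under an assumption made for illustration only: that statuses 400 and above are counted as errors. The helper and the caption table are hypothetical. 499 is the non-standard "Client Closed Request" code used by nginx.

```python
# Hypothetical captions for a few statuses; 499 is not in the HTTP RFCs.
STATUS_CAPTION = {
    200: 'OK',
    404: 'Not Found',
    499: 'Client Closed Request',
    500: 'Internal Server Error',
}

def is_error(status):
    # Assumed rule: 4xx and 5xx are errors, except 499, where the client
    # gave up before the server could answer.
    return status >= 400 and status != 499

for status in (200, 404, 499, 500):
    print(status, STATUS_CAPTION.get(status, '?'), is_error(status))
```

Response times of 499 hits would still be fed to the score separately; this predicate only decides whether a hit shows up as an error.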
-
- 21 Jan, 2021 2 commits
Vincent Pelletier authored
-
Vincent Pelletier authored
-
- 20 Jan, 2021 1 commit
Jérome Perrin authored
Since 2.4.13, httpd supports `%{UNIT}T` in `LogFormat`.
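httpd's `%{UNIT}T` accepts `ms`, `us` or `s` as the unit, and plain `%T` is in seconds. The sketch below normalises such a field to microseconds. The helper name is hypothetical, and how the unit would be detected from the `LogFormat` string is left out as an assumption.

```python
# Multipliers from a %{UNIT}T unit to microseconds.
MICROSECONDS_PER_UNIT = {'s': 1_000_000, 'ms': 1_000, 'us': 1}

def duration_to_microseconds(value, unit='s'):
    # Hypothetical helper: `value` is the raw log field, `unit` the UNIT
    # taken from %{UNIT}T (plain %T being seconds).
    try:
        factor = MICROSECONDS_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f'unsupported %{{UNIT}}T unit: {unit!r}') from None
    return int(value) * factor

print(duration_to_microseconds('2'))           # 2000000
print(duration_to_microseconds('37', 'ms'))    # 37000
print(duration_to_microseconds('1234', 'us'))  # 1234
```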
-
- 02 Mar, 2020 2 commits
Vincent Pelletier authored
-
Vincent Pelletier authored
```
Traceback (most recent call last):
  File "apachedex/__init__.py", line 1274, in wrapper
    return func(*args, **kw)
  File "apachedex/__init__.py", line 1586, in main
    site_data.rescale(rescale, getDuration)
  File "apachedex/__init__.py", line 605, in rescale
    for value_date, data in getattr(self, attribute_id).iteritems():
NameError: global name 'attribute_id' is not defined
```
-
- 28 May, 2019 3 commits
Vincent Pelletier authored
-
Arnaud Fontaine authored
Allow defining how many pages are displayed in the 'Hottest pages' section.
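Selecting the N hottest pages does not require sorting every URL, because `heapq.nlargest` only keeps N candidates. The sketch below uses hypothetical names (`hit_count_by_url`, `page_count`) and is not the option's actual implementation.

```python
import heapq

def hottest_pages(hit_count_by_url, page_count):
    # Hypothetical: return the `page_count` URLs with the most hits,
    # most popular first.
    return heapq.nlargest(
        page_count, hit_count_by_url.items(), key=lambda item: item[1],
    )

hits = {'/': 120, '/login': 45, '/search': 300, '/about': 7}
print(hottest_pages(hits, 2))  # [('/search', 300), ('/', 120)]
```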
-
Arnaud Fontaine authored
By default (without passing this option), there is a single row grouping the results of all non-module URLs ('other'). With this new option, one row per URL is displayed for non-module results.
-
- 19 Mar, 2019 4 commits
Jérome Perrin authored
-
Jérome Perrin authored
-
Jérome Perrin authored
Unlike apache, which escapes non-ASCII characters in the referrer, caddy writes the referrer as is. Edge seems to send the referrer unescaped, so with Edge and caddy we can have non-ASCII text in the referrer.

For lines which cannot be decoded as ASCII, we use Python's `replace` error handler, which in this case allows the line to be processed if the decoding problem only concerns the encoding of the referrer. We don't implement this case as "skip and report ill-formed line", because Python does not provide utilities to do this easily.

Reproduction with caddy:

```
curl -k http://localhost -H 'Referer: héhé'
```

With apache, `LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" common`:

```
127.0.0.1 - - [28/Feb/2019:10:03:33 +0100] "GET / HTTP/1.1" 200 2046 "h\xc3\xa9h\xc3\xa9" "curl/7.50.1" 4
```

With caddy, `log / stdout "{remote} {>REMOTE_USER} [{when}] \"{method} {uri} {proto}\" {status} {size} \"{>Referer}\" \"{>User-Agent}\" {latency_ms}"`:

```
127.0.0.1 - [28/Feb/2019:10:05:00 +0100] "GET / HTTP/2.0" 200 1950 "héhé" "curl/7.50.1" 4
```
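The decoding choice can be shown on the caddy sample line above, as the raw UTF-8 bytes caddy writes. This is a sketch, not the tool's parsing code. Decoding with `errors='replace'` turns each undecodable byte into U+FFFD, so only the referrer is damaged and the rest of the line still parses. The regular expression is a simplified, hypothetical stand-in for the real log format.

```python
import re

# Simplified, hypothetical pattern:
# host ... [when] "request" status size "referrer" "agent"
LINE_RE = re.compile(
    r'(?P<host>\S+) .*? \[(?P<when>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d+) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

raw = (b'127.0.0.1 - [28/Feb/2019:10:05:00 +0100] "GET / HTTP/2.0" 200 1950 '
       b'"h\xc3\xa9h\xc3\xa9" "curl/7.50.1" 4')

line = raw.decode('ascii', errors='replace')  # each bad byte -> U+FFFD
match = LINE_RE.match(line)
print(match.group('status'), ascii(match.group('referrer')))
# 200 'h\ufffd\ufffdh\ufffd\ufffd'
```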
-
Arnaud Fontaine authored
When the table is large and requires scrolling, it becomes difficult to read, so add a 'title' attribute to `<tr>` in the 'Stats per module' and 'Hits per status code' tables. For example, when hovering over a particular cell in a module row, the module name is displayed. /reviewed-on nexedi/apachedex!3
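The sketch below illustrates the idea with hypothetical names. Each row carries its label in a `title` attribute, so hovering any cell shows it. `html.escape` with `quote=True` keeps the label safe inside the attribute value.

```python
from html import escape

def render_row(label, cells):
    # Hypothetical renderer: the row title repeats the first-column label.
    cell_html = ''.join(f'<td>{escape(str(cell))}</td>' for cell in cells)
    return (f'<tr title="{escape(label, quote=True)}">'
            f'<th>{escape(label)}</th>{cell_html}</tr>')

print(render_row('web "front"', [120, 3]))
```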
-
- 23 Jan, 2018 2 commits
Vincent Pelletier authored
-
Jérome Perrin authored
We observed lines in our logs where the timestamp field still matched the timestamp regexp, so the line was not reported as invalid, but parsing such a timestamp caused a ValueError in _matchToDateTime. The beginning of the line was: `127.0.0.1 - - [14/Jul/2017:127.0.0.1 - - [14/Jul/2017:09:41:41 +0200]`, which uses `[14/Jul/2017:127.0.0.1 - - [14/Jul/2017:09:41:41 +0200]` as the timestamp, so this fails the simple .split() used to separate timestamp and timezone. Added a minimal test case to reproduce this specific problem.
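Below is a sketch of a stricter timestamp parser; the helper name is hypothetical. Splitting a valid field yields exactly two parts, while the corrupted field above yields five, so the unpacking raises ValueError. Catching ValueError from both the split and `strptime` lets the caller report the line as invalid instead of crashing.

```python
from datetime import datetime

def parse_timestamp(field):
    # Hypothetical helper: `field` is the text between '[' and ']'.
    # Returns None for malformed values so the line can be reported.
    try:
        date_part, tz_part = field.split()
        return datetime.strptime(
            f'{date_part} {tz_part}', '%d/%b/%Y:%H:%M:%S %z',
        )
    except ValueError:
        return None

print(parse_timestamp('14/Jul/2017:09:41:41 +0200'))
# 2017-07-14 09:41:41+02:00
print(parse_timestamp('14/Jul/2017:127.0.0.1 - - [14/Jul/2017:09:41:41 +0200'))
# None
```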
-
- 14 Sep, 2016 3 commits
Vincent Pelletier authored
-
Vincent Pelletier authored
-
Vincent Pelletier authored
-
- 11 Jul, 2014 2 commits
Arnaud Fontaine authored
-
Arnaud Fontaine authored
-
- 17 Apr, 2014 2 commits
Vincent Pelletier authored
-
Vincent Pelletier authored
-