Commits · master · nexedi / apachedex

12 Jun, 2024 2 commits

Fix support for files containing non-ascii chars. · d7f80021

Vincent Pelletier authored Jun 12, 2024

Python's stdin encoding is based on external information which may not be
related to the actual encoding of the file. For example, on one system this
is:
  encoding='UTF-8'
  errors='surrogateescape'
which then causes an exception to be raised if any surrogate was produced:
  UnicodeEncodeError: 'utf-8' codec can't encode character '\udca3' in position 4134: surrogates not allowed
Reading the same file directly (instead of going through stdin) succeeds,
because the replacement char is used instead.

Reconfigure stdin encoding and error handling so it is consistent with
files being opened by this tool directly.
This is not to say that "ascii" and "replace" are the ultimate best choice
(of which I am not completely convinced...) but at least this makes stdin
work in exactly the same way as named files.

d7f80021
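
A minimal sketch of the behaviour described in this commit, using an in-memory stream in place of the real stdin. The sample bytes and the helper name are illustrative assumptions, not code from apachedex. `TextIOWrapper.reconfigure` is the standard-library call (Python 3.7+) that can apply the same change to `sys.stdin`.

```python
import io

# Hypothetical log fragment holding a byte that is not valid UTF-8.
RAW = b"GET /caf\xe9 HTTP/1.1\n"


def read_first_line(raw, reconfigure):
    # Mimic a stdin opened with UTF-8 and the surrogateescape handler.
    stream = io.TextIOWrapper(
        io.BytesIO(raw), encoding="utf-8", errors="surrogateescape"
    )
    if reconfigure:
        # The encoding can only be changed before any data is read.
        stream.reconfigure(encoding="ascii", errors="replace")
    return stream.readline()


escaped = read_first_line(RAW, reconfigure=False)
replaced = read_first_line(RAW, reconfigure=True)

# The surrogate produced by surrogateescape cannot be encoded strictly.
try:
    escaped.encode("utf-8")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True
```

With the reconfigured stream the undecodable byte becomes the replacement character, so later encoding of the line does not raise.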

Minor source reformatting · dafe8536

Vincent Pelletier authored Jun 12, 2024

Word-wrap a long line.
open's mode is text by default, make it explicit.
No change expected.

dafe8536

28 May, 2024 1 commit

Relax regexes for quoted fields. · f22f9e03

Vincent Pelletier authored May 28, 2024

Allows catching more log lines, especially for quoted fields which lack
quote escaping. This is at the expense of some parsing performance (10%
on a random real-world sample).
Also simplify the code a bit by removing expensive matching logic.

f22f9e03
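
To illustrate the trade-off, here is a small sketch with two hypothetical patterns for a quoted field followed by a status code. They are assumptions made for this example and are not the regexes used by apachedex.

```python
import re

# Strict: the field may only contain non-quote characters or escaped ones.
STRICT = re.compile(r'^"((?:[^"\\]|\\.)*)" (\d{3})$')
# Relaxed: anything up to the last quote that precedes the status code.
RELAXED = re.compile(r'^"(.*)" (\d{3})$')

escaped_line = '"GET /a\\"b HTTP/1.1" 200'  # inner quote is escaped
unescaped_line = '"GET /a"b HTTP/1.1" 200'  # inner quote is not escaped

strict_on_escaped = STRICT.match(escaped_line)
strict_on_unescaped = STRICT.match(unescaped_line)
relaxed_on_unescaped = RELAXED.match(unescaped_line)
```

The relaxed pattern accepts the line with the unescaped quote, at the cost of more backtracking work for the regex engine.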

07 May, 2024 1 commit

Fix failure when median is enabled and --period is not set and logs are not sorted · 1684838f

Vincent Pelletier authored May 07, 2024

Replacing duration_list with an itertools.chain object breaks the API
promise that it must have an append method. This causes all accumulate calls
to fail when simultaneously:
- median tracking is enabled
- period is not set and some scaling happens (ex: there is more than 4 days
  of data)
- log lines are being fed in a non-chronological order, where later lines
  are timestamped before the time some scaling happened (ex: log line for
  day 1, then day 5, then day 1 again)
When this happens, data accumulation will fail on these later lines,
causing the report to be only partial - for example errors will be missing.

1684838f
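
The root cause can be reproduced in isolation. The sketch below uses made-up duration values and is not necessarily how the commit fixes the problem. It only shows that an `itertools.chain` object cannot stand in for a list that callers later append to.

```python
import itertools

# Hypothetical per-period duration lists being merged during rescaling.
first_period = [120, 340]
second_period = [95]

# A chain object is a lazy iterator and has no append method.
lazy = itertools.chain(first_period, second_period)
has_append = hasattr(lazy, "append")

# Materialising the chain into a list keeps the append contract.
merged = list(itertools.chain(first_period, second_period))
merged.append(410)
```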

09 Jan, 2024 3 commits
- README: remove references to backports.lzma · a174cb05
  Jérome Perrin authored Jan 09, 2024
```
Now that this is python3 only, we always have lzma
```
  a174cb05
- setup.py: Fix twine warning · 63563841
  Vincent Pelletier authored Jan 09, 2024
  
  63563841
- Add optional median computation. · 039da94e
  Vincent Pelletier authored Dec 27, 2023
```
For every measure, display the median in addition to the existing values
(score, average, max).
Optional, because it requires an amount of ram proportional to the number
of hits.
```
  039da94e
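
A rough sketch of why the median costs memory while the other values do not, as noted for the optional median computation above. The class and attribute names are hypothetical and do not reproduce APDEXStats.

```python
import statistics


class RunningStats:
    """Hypothetical accumulator used only for this illustration."""

    def __init__(self, track_median=False):
        self.hit = 0
        self.duration_total = 0
        self.duration_max = 0
        # Only the median needs every individual value to be kept.
        self.duration_list = [] if track_median else None

    def accumulate(self, duration):
        # Average and max need constant memory per measure.
        self.hit += 1
        self.duration_total += duration
        self.duration_max = max(self.duration_max, duration)
        if self.duration_list is not None:
            self.duration_list.append(duration)

    def median(self):
        if not self.duration_list:
            return None
        return statistics.median(self.duration_list)


stats = RunningStats(track_median=True)
for duration in (10, 30, 1000):
    stats.accumulate(duration)
```

Here the list grows by one entry per hit, which is the proportional RAM cost the commit message mentions.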
27 Dec, 2023 7 commits
- Assorted preparatory work for a new feature · 6a4d6f5c
  Vincent Pelletier authored Dec 27, 2023
```
Switch APDEXStats class to use slots. This should save a bit of memory and
get a bit more speed.
Make word wrapping a bit more semantically sensible to make future diffs
more readable.
No functional change expected.
```
  6a4d6f5c
- Fix --stats output rendering · 3e0e69cd
  Vincent Pelletier authored Dec 27, 2023
  
  3e0e69cd
- Use a context manager for logfile resource management · 3cc66082
  Vincent Pelletier authored Dec 27, 2023
  
  3cc66082
- Make pylint somewhat happy · 466146fb
  Vincent Pelletier authored Dec 26, 2023
  
  466146fb
- .pylintrc: Initial import · 3efd68a4
  Vincent Pelletier authored Dec 26, 2023
```
As generated by pylint 2.17.4 .
```
  3efd68a4
- Make tests executable · fccf36d9
  Vincent Pelletier authored Dec 26, 2023
```
To avoid setup.py deprecation warnings without adding new dependencies.
```
  fccf36d9
- parallel_parse.sh: Cosmetic changes · ca5ae218
  Vincent Pelletier authored Dec 27, 2023
```
Make it a bit more readable.
Use non-abbreviated arguments when available.
Use lower-case for variable names, which is the de-facto convention for non-exported
variables.
Shift once instead of after each argument.
Make the script exit if it expands an unset variable or any call fails.
```
  ca5ae218
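
The switch to slots mentioned in the preparatory-work commit (6a4d6f5c) can be illustrated with a small stand-in class. The class and its attributes are assumptions for this example, not the real APDEXStats definition.

```python
class SlottedStats:
    # Declaring __slots__ removes the per-instance __dict__.
    __slots__ = ("hit", "duration_total")

    def __init__(self):
        self.hit = 0
        self.duration_total = 0


stats = SlottedStats()
has_dict = hasattr(stats, "__dict__")

# Attributes outside __slots__ are rejected instead of silently created.
try:
    stats.typo = 1
    rejected = False
except AttributeError:
    rejected = True
```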
20 Dec, 2023 1 commit
- update for python3 >= 3.9 · ffdc722a
  Jérome Perrin authored Dec 19, 2023
```
drop support for python 2
```
  ffdc722a
20 Apr, 2021 2 commits
- apachedex: Rename global. · 943a005d
  Vincent Pelletier authored Apr 20, 2021
```
The common unit is microseconds. While this value was correct, its name
was misleading.
```
  943a005d
- apachedex: Implement a duration cap. · 2a3edb84
  Vincent Pelletier authored Apr 20, 2021
  
  2a3edb84
19 Apr, 2021 2 commits

apachedex: Tolerate non-ascii URLs. · d743c185

Vincent Pelletier authored Apr 19, 2021

Otherwise, if `url` contains non-ascii chars, startswith will fail with
an error like:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 84: ordinal not in range(128)
because 'http' is unicode. So byte-ify it to avoid this transcoding.

d743c185

apachedex: Add support for non-standard 444 and 499 status codes. · 1cff57c4

Vincent Pelletier authored Apr 19, 2021

Give them captions.
Also, treat 499 as a non-error: client closed the connection before server
could respond, so it is likely not something the server could be
considered responsible for. Of course, the response time still matters,
so if these statuses come after slow responses it will still affect the
score.

1cff57c4
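
A sketch of the idea, under the assumption that errors are statuses of 400 and above. The caption strings, the threshold, and the function name are all hypothetical and are not taken from apachedex.

```python
# Hypothetical captions for the two non-standard codes.
EXTRA_STATUS_CAPTIONS = {
    444: "No Response",
    499: "Client Closed Request",
}


def is_error(status):
    # Assumed rule: 4xx and 5xx are errors, except 499, where the client
    # gave up before the server could respond.
    return status >= 400 and status != 499


error_flags = {status: is_error(status) for status in (200, 404, 444, 499, 500)}
```

The response duration of a 499 hit would still be accumulated separately, so slow responses keep lowering the score.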

21 Jan, 2021 2 commits
- all: versioneer-ify. · 6073b969
  Vincent Pelletier authored Jan 21, 2021
  
  6073b969
- setup.py: Fix use_2to3 usage. · b1fcea93
  Vincent Pelletier authored Jan 21, 2021
  
  b1fcea93
20 Jan, 2021 1 commit
- Support %{ms}T for duration in milliseconds · e07b1f25
  Jérome Perrin authored Nov 04, 2020
```
Since 2.4.13, httpd supports %{UNIT}T in LogFormat
```
  e07b1f25
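
Since durations are handled in microseconds elsewhere in this log, supporting another unit amounts to a conversion like the one sketched below. The mapping and function name are assumptions for illustration, not the apachedex code.

```python
# Microseconds per unit accepted by httpd's %{UNIT}T (s, ms, us).
US_PER_UNIT = {"us": 1, "ms": 1_000, "s": 1_000_000}


def to_microseconds(value, unit):
    # Log fields arrive as text, so convert before scaling.
    return int(value) * US_PER_UNIT[unit]


from_ms = to_microseconds("4", "ms")
from_s = to_microseconds("2", "s")
```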
02 Mar, 2020 2 commits

Bump to 1.7.1 . · 86913b29
Vincent Pelletier authored Mar 02, 2020

86913b29

Fix NameError. · c0a37b9f

Vincent Pelletier authored Mar 02, 2020

Traceback (most recent call last):
  File "apachedex/__init__.py", line 1274, in wrapper
    return func(*args, **kw)
  File "apachedex/__init__.py", line 1586, in main
    site_data.rescale(rescale, getDuration)
  File "apachedex/__init__.py", line 605, in rescale
    for value_date, data in getattr(self, attribute_id).iteritems():
NameError: global name 'attribute_id' is not defined

c0a37b9f

28 May, 2019 3 commits
- Bump to 1.7.0 . · b3884bbe
  Vincent Pelletier authored May 28, 2019
  
  b3884bbe
- apachedex: Add ``--n-hottest-pages'' command line parameter. · a7134ffa
  Arnaud Fontaine authored Mar 25, 2019
```
Allows defining how many pages are displayed in the 'Hottest pages' section.
```
  a7134ffa
- apachedex: Add ``--erp5-expand-other'' to display all 'other' results in 'Stats per module'. · 0ba22c47
  Arnaud Fontaine authored Apr 18, 2019
```
By default (without passing this option), there is a single row grouping
results of all non-module URLs ('other'). With this new option, one row per
URL is displayed for non-module results.
```
  0ba22c47
19 Mar, 2019 4 commits

wrap lzma.open, so that it supports encoding and errors on py2 · 2f4388dd
Jérome Perrin authored Mar 18, 2019

2f4388dd
tests: check for zlib/bz2 encoded logs · 10d91ee0
Jérome Perrin authored Mar 18, 2019

10d91ee0

Support non escaped referer in log · 128167a3

Jérome Perrin authored Feb 28, 2019

Unlike apache, which escapes non-ascii characters in the referrer, caddy
writes the referrer as is. Edge seems to send the referrer unescaped, so with
Edge and caddy we can have non-ascii text in the referrer.

For lines which cannot be decoded as ASCII, we use Python's `replace`
error handler, which in this case allows the line to be processed if
the decoding problem is only about the encoding of the referrer.

We don't implement this case as "skip and report ill-formed line",
because python does not provide utilities to do this easily.

Reproduction with caddy:

```
curl -k http://localhost -H 'Referer: héhé'
```

With apache, `LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" common`
```
127.0.0.1 - - [28/Feb/2019:10:03:33 +0100] "GET / HTTP/1.1" 200 2046 "h\xc3\xa9h\xc3\xa9" "curl/7.50.1" 4
```

With caddy, `log / stdout "{remote} {>REMOTE_USER} [{when}] \"{method} {uri} {proto}\" {status} {size} \"{>Referer}\" \"{>User-Agent}\" {latency_ms}"`

```
127.0.0.1 - [28/Feb/2019:10:05:00 +0100] "GET / HTTP/2.0" 200 1950 "héhé" "curl/7.50.1" 4
```

128167a3
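
The effect of the `replace` handler on the caddy line quoted in this commit can be checked directly. This is a sketch of the decoding step only, not of how apachedex reads its input.

```python
# The caddy log line from the commit message, as UTF-8 bytes on disk.
raw_line = (
    '127.0.0.1 - [28/Feb/2019:10:05:00 +0100] "GET / HTTP/2.0" 200 1950 '
    '"héhé" "curl/7.50.1" 4'
).encode("utf-8")

# Each non-ASCII byte becomes U+FFFD; the ASCII parts are untouched.
decoded = raw_line.decode("ascii", errors="replace")
```

Only the referrer is altered, so the remaining fields can still be matched.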

apachedex: Display row header when hovering its cells (report tables). · 48392feb

Arnaud Fontaine authored Mar 05, 2019

When the table is large and requires scrolling, it becomes difficult
to read, so add a 'title' attribute to <tr> in the 'Stats per module' and
'Hits per status code' tables. For example, when hovering over a particular
cell on a module row, it displays the module name.

/reviewed-on nexedi/apachedex!3

48392feb

23 Jan, 2018 2 commits

Bump to 1.6.3 . · 7905d8cd
Vincent Pelletier authored Jan 23, 2018

7905d8cd

Prevent errors when parsing date on malformed lines · 855cee8e

Jérome Perrin authored Jan 23, 2018

We observed lines in our logs where the timestamp field still
matched the timestamp regexp, so the line was not reported as
invalid, but parsing such a timestamp caused a ValueError in
_matchToDateTime.

The beginning of line was:
127.0.0.1 - - [14/Jul/2017:127.0.0.1 - - [14/Jul/2017:09:41:41 +0200]

Which uses `[14/Jul/2017:127.0.0.1 - - [14/Jul/2017:09:41:41 +0200]` as
timestamp, so this breaks the simple .split() used to separate timestamp
and timezone.

Added a minimal test case to reproduce this specific problem.

855cee8e
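
A minimal sketch of the failure and of one defensive way to handle it. The function below is hypothetical and is not the actual `_matchToDateTime` implementation.

```python
from datetime import datetime


def parse_timestamp(field):
    """Return (datetime, timezone), or None when the field is malformed."""
    try:
        # Unpacking raises ValueError unless there are exactly two parts.
        timestamp, timezone = field.split()
        return datetime.strptime(timestamp, "%d/%b/%Y:%H:%M:%S"), timezone
    except ValueError:
        return None


good = parse_timestamp("14/Jul/2017:09:41:41 +0200")
# The malformed field from the commit message, without the outer brackets.
bad = parse_timestamp("14/Jul/2017:127.0.0.1 - - [14/Jul/2017:09:41:41 +0200")
```

Returning None lets the caller count the line as malformed instead of aborting the whole run.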

14 Sep, 2016 3 commits
- apachedex: Add support for ~ when including configuration files. · bf83eb8d
  Vincent Pelletier authored Sep 14, 2016
  
  bf83eb8d
- apachedex: Simplify lzma detection code. · 12fe7cff
  Vincent Pelletier authored Sep 14, 2016
  
  12fe7cff
- parallel_parse: Typo in usage. · e457bb6f
  Vincent Pelletier authored Sep 14, 2016
  
  e457bb6f
11 Jul, 2014 2 commits
- Either --state-file or logfile arguments must be specified. · ca61b13f
  Arnaud Fontaine authored Jul 11, 2014
  
  ca61b13f
- Ignore Apache log formats which are not supported. · ea4376d0
  Arnaud Fontaine authored Jul 11, 2014
  
  ea4376d0
17 Apr, 2014 2 commits
- Bump to 1.6.2 . · 6f71bffe
  Vincent Pelletier authored Apr 17, 2014
  
  6f71bffe
- Brown paper bag: NameError in main() wrapper. · 6b60a90a
  Vincent Pelletier authored Apr 17, 2014
  
  6b60a90a