    Support non escaped referer in log · 128167a3
    Jérome Perrin authored
    Unlike apache, which escapes non-ASCII characters in the referrer, caddy
    writes the referrer as is. Edge seems to send the referrer unescaped, so
    with Edge and caddy we can have non-ASCII text in the referrer.
    
    For lines which cannot be decoded as ASCII, we use Python's `replace`
    error handler, which in this case allows the line to be processed if
    the only decoding problem is the encoding of the referrer.
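
    As an illustration of the `replace` error handler only (a minimal
    sketch assuming Python 3, not the code from this commit; `raw_line`,
    `strict_ok` and `decoded` are hypothetical names), decoding the caddy
    log line shown in this message could look like this:

```python
# Hypothetical sample: the caddy log line from this commit message, as UTF-8 bytes.
raw_line = (
    '127.0.0.1 - [28/Feb/2019:10:05:00 +0100] "GET / HTTP/2.0" 200 1950 '
    '"h\u00e9h\u00e9" "curl/7.50.1" 4\n'
).encode("utf-8")

# Strict ASCII decoding fails because of the non-ASCII referrer.
try:
    raw_line.decode("ascii")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With errors="replace", every undecodable byte becomes U+FFFD and the
# rest of the line (request, status, size, user agent) is left untouched.
decoded = raw_line.decode("ascii", errors="replace")
```

    Only the referrer field is altered, so the line can still be matched
    and counted; the original referrer text is lost in the process.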
    
    We don't handle this case as "skip and report ill-formed line",
    because Python does not provide utilities to do this easily.
    
    Reproduction with caddy:
    
    ```
    curl -k http://localhost -H 'Referer: héhé'
    ```
    
    With apache, `LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" common`
    ```
    127.0.0.1 - - [28/Feb/2019:10:03:33 +0100] "GET / HTTP/1.1" 200 2046 "h\xc3\xa9h\xc3\xa9" "curl/7.50.1" 4
    ```
    
    With caddy, `log / stdout "{remote} {>REMOTE_USER} [{when}] \"{method} {uri} {proto}\" {status} {size} \"{>Referer}\" \"{>User-Agent}\" {latency_ms}"`
    
    ```
    127.0.0.1 - [28/Feb/2019:10:05:00 +0100] "GET / HTTP/2.0" 200 1950 "héhé" "curl/7.50.1" 4
    ```