web_renderjs_ui: use lxml to extract data-i18n messages
The previous approach, based on regular expressions, sometimes failed to extract messages. Using an XML parser simplifies the code and fixes several messages that were not extracted, such as messages containing double quotes ("), square brackets ([]) or curly braces ({}).
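Here is a minimal sketch of parser-based extraction. It assumes well-formed (X)HTML and that each message is the literal value of a data-i18n attribute. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml, whose lxml.etree elements offer the same iter()/get() API. The helper name extract_i18n_messages is hypothetical. Because the parser decodes attribute values itself, quotes, brackets and braces inside a message need no special handling:

```python
import xml.etree.ElementTree as ET


def extract_i18n_messages(markup):
    """Return the set of data-i18n attribute values found in markup.

    Hypothetical helper: assumes well-formed (X)HTML and that each
    message is the literal value of a data-i18n attribute.
    """
    root = ET.fromstring(markup.strip())
    messages = set()
    # iter() walks every element in document order, including the root.
    for element in root.iter():
        message = element.get("data-i18n")
        if message:
            messages.add(message)
    return messages


template = """<div data-i18n="Title">
  <button data-i18n='Click "OK"'>OK</button>
  <a data-i18n="List [all] {items}">x</a>
</div>"""

print(sorted(extract_i18n_messages(template)))
```

A regular expression has to anticipate every quoting style and special character. The parser applies the real attribute grammar, which is why these cases stop being special.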
This also fixes some problems when looking up message sources:
- archived web pages were sometimes used instead of published ones
- messages from gadgets implemented as page templates/OFS files were not extracted.
A few more unit tests are added for the scripts involved in this process.