Git Product home page Git Product logo

Comments (3)

msb48 avatar msb48 commented on June 24, 2024

separation between stdout and stderr outputs

+1

remove HTTP method (GET), remove HTTP version (HTTP/1.1)

I've seen and appreciated those entries with other servers too, so I'd prefer not to remove them. In fact, it might make sense to add two more standard entries:

  • HTTP status code
  • size of response in bytes

remove path components (/brouter?)

That would be too much of a cleanup IMO. There might very well be different HTTP endpoints served from the same binary in the future (perhaps even today? - haven't checked), e.g. a v2, or additional helper services.

IP address of client

https://brouter.de/privacypolicy.html (as linked from https://brouter.de/brouter-web) returns 404. For GDPR compliance it should be clearly stated for what reasons and for how long that data is retained, though. See https://bikerouter.de for how it could be done properly.

Before that is fixed, storing the IP should be disabled immediately. IOW, logging the IP should be off by default, or it should be replaced by a static placeholder. Logging the real IP should be an option (config or env), to be enabled only by admins who know what they are doing (see paragraph above).

Is logging the IP even needed, or would a hash be enough?

from brouter.

abrensch avatar abrensch commented on June 24, 2024

Hallo marcus+msb48,

restored the https://brouter.de/privacypolicy.html ...

difficult for me to comment the RFC if you do not say what you plan to do with the logs. As I sayd, I am doing just some access statistics and quality assurance for the quality of service for the brouter.de instance. Here, parallel sessions summer peak is about 200 (so capping at 999 theoretical so far) and requests-per-day summer peak is about 400.000

Yesterday, however, requests per day was 900.000 and looking at the log shows what happened:

18.05.24 11:47 127 ip=147.45.47.133 ms=1 -> GET /brouter/suspects/Austria/Vienna/confirmed/843514405793336318/fixed?ndays=30-1))%20OR%20938=(SELECT%20938%20FROM%20PG_SLEEP(15))-- HTTP/1.1

Lessons learned her: 1) more then just standard routing requests in that log, 2) intrusion detection needs unformatted infoprmation, 3) IP adress needed to add it on the blacklist

However, for context you need to know there's the nginx access/error logs in addition, that do contain the IP and the URL as well. On brouter.de, they are setup for daily rotate/gzip and 2 weeks archive

brouter-logging up to now is rotated manually (which does no work well)

So depends, if you want to have a long-time archive of access statistics (and I would like to have..) that should be some hybrid that works in conjunction with the nginx-log, having the brouter-log free of IPs for GDPR compliance.

Hard to believe for me that you will get happy here with structured JSON when talking about 100 Mio Requests per year. Other Aspect here is that if the JSON comes with a library dependency then it comes with a price... ((Geo)JSON up to now we create low-level)

So maybe Marcus you can comment on the intended usage of the log?

regards, Arndt

from brouter.

mjaschen avatar mjaschen commented on June 24, 2024

Hi @abrensch,

my motivation for introducing structured logging is to analyze more efficiently. The structured logs can be directly imported into modern log management systems like Elasticsearch and then searched, analyzed, aggregated, or visualized with frontends like Kibana, Graylog, Grafana, etc.

Examples:

  • “Which are the most used routing profiles for the winter months vs. summer months?“,
  • “How often are the route alternatives used with a certain routing profile?”
  • “Show me the distribution of processing times vs. routing profiles”

All of this is possible with plain text logs too, but it requires more effort to parse the logs and extract the relevant information.

I'm using Elasticsearch, Graylog and Kibana for other projects. Depending on the time of day, hundreds to thousands of messages are ingested every second on some small virtual servers. So performance is definitely not an issue. 100M messages would require only a few gigabytes of storage, which is negligible nowadays.

As the log format is relatively simple, building the messages manually would not be a hard task if no further external dependencies are allowed.

Of course it'd be easily possible to pick up the existing log messages and convert them to structured logs by an external tool or script. Integration into BRouter is not necessarily important to me, but I thought it might also be useful for other server operators.

Re IP addresses/GDPR:

Logging, identifying, and blocking rogue clients are better done at least one level above BRouter, for example, in the reverse proxy, before the request reaches the BRouter server.

This way, the IP address can easily be printed in a hashed form to BRouter's log messages (even the session pool could work with the hashes instead of plain IP addresses).

In my development setup, I've already implemented logging hashes: SHA256(IP address + User-Agent), then I extract the first 10 characters. This way, each client remains unique but is not traceable to an IP address:

2024-05-19T10:46:18.273+02:00 new a9f14018d5 167 GET /?lonlats=13.377485,52.516247%7C13.351221,52.515004&profile=trekking&alternativeidx=0&format=geojson HTTP/1.1
2024-05-19T10:46:27.256+02:00   1 a9f14018d5 90 GET /?lonlats=13.377485,52.516247%7C13.351221,52.515004&profile=trekking&alternativeidx=0&format=geojson HTTP/1.1
2024-05-19T10:46:27.762+02:00   1 a9f14018d5 97 GET /?lonlats=13.377485,52.516247%7C13.351221,52.515004&profile=trekking&alternativeidx=0&format=geojson HTTP/1.1
2024-05-19T10:46:28.157+02:00   1 a9f14018d5 53 GET /?lonlats=13.377485,52.516247%7C13.351221,52.515004&profile=trekking&alternativeidx=0&format=geojson HTTP/1.1

@msb48, thank you for your remarks regarding the logged HTTP information. It makes total sense. I'll update the table in the first post accordingly.

from brouter.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.