
Comments (7)

StyXman commented on June 22, 2024

I tried to make sense of this part. We have four workers, 1-4, plus one main thread, 0. On the 19th, at ~20:50, all four workers
start working on polygon, line, roads and point respectively. 2h07m54s later worker 3 finishes clustering
roads, which is what is reported at the end of the run. But it immediately starts creating indexes for it, which take ~15m and
~5m respectively. It then starts analyzing roads, which I guess is the task that finishes at 22:57 (~2m runtime).

Then come two anonymous tasks: one finishes in 1ms, and the second lingers...? And it immediately starts indexing ways.
Meanwhile, nodes, which wasn't reported as being processed by any worker, also finishes. Maybe it's the main loop
that does it? And if so, why did it finish only now, after only 0s? All this happens in the same second.

Still on the 19th, at ~23:32, worker 4 starts creating indexes for point. This is ~3h30m after it started
clustering it, which is also what is reported at the end. Again, two indexes and one analysis for this table, then an
anonymous task... which I guess finishes immediately? Because in the same second it creates an index, which looks
like a pattern (W3 did the same, remember?). It finishes in ~1h40m, so I guess W3's "Done task" at the end is for the index
it had been creating since the 19th?

Given all that, I added the extra annotations that I think are needed to make sense of it. I hope I can use
some of my plentiful spare time to fix it:

    2023-08-19 20:49:52  [1] Clustering table 'planet_osm_polygon' by geometry...
    2023-08-19 20:49:52  [2] Clustering table 'planet_osm_line' by geometry...
    2023-08-19 20:49:52  [3] Clustering table 'planet_osm_roads' by geometry...
    2023-08-19 20:49:52  [4] Clustering table 'planet_osm_point' by geometry...
    2023-08-19 20:49:52  [1] Using native order for clustering table 'planet_osm_polygon'
    2023-08-19 20:49:52  [2] Using native order for clustering table 'planet_osm_line'
    2023-08-19 20:49:52  [3] Using native order for clustering table 'planet_osm_roads'
    2023-08-19 20:49:52  [4] Using native order for clustering table 'planet_osm_point'

    2023-08-19 22:35:50  [3] Creating geometry index on table 'planet_osm_roads'...
    2023-08-19 22:50:47  [3] Creating osm_id index on table 'planet_osm_roads'...
    2023-08-19 22:55:52  [3] Analyzing table 'planet_osm_roads'...
    2023-08-19 22:57:47  [3] Done task [Analyzing table 'planet_osm_roads'] in 7674389ms.
    2023-08-19 22:57:47  [3] Starting task [which one?]...
    2023-08-19 22:57:47  [3] Done task in 1ms.
    2023-08-19 22:57:47  [3] Starting task [which one?]...

    2023-08-19 22:57:47  [0] Done postprocessing on table 'planet_osm_nodes' in 0s

    2023-08-19 22:57:47  [3] Building index on table 'planet_osm_ways'

    2023-08-19 23:32:06  [4] Creating geometry index on table 'planet_osm_point'...
    2023-08-20 00:13:30  [4] Creating osm_id index on table 'planet_osm_point'...
    2023-08-20 00:20:35  [4] Analyzing table 'planet_osm_point'...
    2023-08-20 00:20:40  [4] Done task [Analyzing table 'planet_osm_point'] in 12647156ms.
    2023-08-20 00:20:40  [4] Starting task...

    2023-08-20 00:20:40  [4] Building index on table 'planet_osm_rels'
    2023-08-20 02:03:11  [4] Done task [Building index on table 'planet_osm_rels'] in 6151838ms.

    2023-08-20 03:17:24  [2] Creating geometry index on table 'planet_osm_line'...
    2023-08-20 03:54:40  [2] Creating osm_id index on table 'planet_osm_line'...
    2023-08-20 04:02:57  [2] Analyzing table 'planet_osm_line'...
    2023-08-20 04:03:01  [2] Done task [Analyzing table 'planet_osm_line'] in 25988218ms.

    2023-08-20 05:26:21  [1] Creating geometry index on table 'planet_osm_polygon'...
    2023-08-20 06:17:31  [1] Creating osm_id index on table 'planet_osm_polygon'...
    2023-08-20 06:30:46  [1] Analyzing table 'planet_osm_polygon'...
    2023-08-20 06:30:47  [1] Done task [Analyzing table 'planet_osm_polygon'] in 34854542ms.

    2023-08-20 10:48:18  [3] Done task [Building index on table 'planet_osm_ways'] in 42630605ms.

    2023-08-20 10:48:18  [0] Done postprocessing on table 'planet_osm_ways' in 42630s (11h 50m 30s)
    2023-08-20 10:48:18  [0] Done postprocessing on table 'planet_osm_rels' in 6151s (1h 42m 31s)

    2023-08-20 10:48:18  [0] All postprocessing on table 'planet_osm_point' done in 12647s (3h 30m 47s).
    2023-08-20 10:48:18  [0] All postprocessing on table 'planet_osm_line' done in 25988s (7h 13m 8s).
    2023-08-20 10:48:18  [0] All postprocessing on table 'planet_osm_polygon' done in 34854s (9h 40m 54s).
    2023-08-20 10:48:18  [0] All postprocessing on table 'planet_osm_roads' done in 7674s (2h 7m 54s).

    2023-08-20 10:48:18  [0] Overall memory usage: peak=85815MByte current=727MByte
    2023-08-20 10:48:18  [0] osm2pgsql took 154917s (43h 1m 57s) overall.
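
The per-step timings discussed above can be checked mechanically. The following is a minimal sketch, assuming every log line keeps the `timestamp [worker] message` layout shown in the excerpt; `parse_line` and `step_durations` are hypothetical helper names, not part of osm2pgsql. It attributes the gap between two consecutive lines of the same worker to the earlier message, which is itself an assumption about how the steps line up.

```python
import re
from datetime import datetime

# Assumed layout: "YYYY-MM-DD HH:MM:SS  [worker] message"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+\[(\d+)\]\s+(.*)$")


def parse_line(line):
    """Return (timestamp, worker, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp, worker, message = match.groups()
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), int(worker), message


def step_durations(lines):
    """List (worker, message, seconds until that worker's next log line)."""
    previous = {}  # worker -> (timestamp, message) of its latest line
    steps = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, worker, message = parsed
        if worker in previous:
            prev_stamp, prev_message = previous[worker]
            seconds = int((stamp - prev_stamp).total_seconds())
            steps.append((worker, prev_message, seconds))
        previous[worker] = (stamp, message)
    return steps


# Lines copied from the log excerpt (worker 3 only).
SAMPLE = [
    "2023-08-19 20:49:52  [3] Clustering table 'planet_osm_roads' by geometry...",
    "2023-08-19 22:35:50  [3] Creating geometry index on table 'planet_osm_roads'...",
    "2023-08-19 22:50:47  [3] Creating osm_id index on table 'planet_osm_roads'...",
    "2023-08-19 22:55:52  [3] Analyzing table 'planet_osm_roads'...",
]

for worker, message, seconds in step_durations(SAMPLE):
    print(f"[{worker}] {seconds:>6}s  {message}")
```

Because only consecutive lines are compared, a step that logs no start line (like the anonymous tasks) will not show up as its own entry.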

from osm2pgsql.

StyXman commented on June 22, 2024

The mix of ms, s and hms is a little haphazard, and the reported times don't seem to cover all postprocessing.
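
One way to make the units consistent is to pass every duration through a single formatter. This is only a sketch: `format_duration` is a hypothetical helper written in Python for illustration, not code from osm2pgsql, and it always emits the `Ns (Hh Mm Ss)` style that some of the log lines already use.

```python
def format_duration(milliseconds):
    """Render a millisecond count as 'Ns (Hh Mm Ss)', truncating to whole seconds."""
    total_seconds = milliseconds // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{total_seconds}s ({hours}h {minutes}m {seconds}s)"


# Millisecond values taken from the "Done task" lines in the log.
for ms in (7674389, 6151838, 42630605):
    print(format_duration(ms))
```

For the three values above this yields the same seconds and h/m/s figures that the main thread reports for roads, rels and ways at the end of the run.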


joto commented on June 22, 2024

This basically goes in the same direction as #207.

What is logged, and how, has changed over time, and there never was a grand plan for how to do this. I totally agree that the logging is hard to understand for somebody new to the project. You really have to know a lot about the internals of osm2pgsql processing to interpret the output. Osm2pgsql's internal processing is complex, and the question is how much the user should actually see of how the sausage is made. Maybe we should just move all that logging to the debug mode and only tell the user when we are done? Does the user actually need to know? What information is actually actionable for the user? On the other hand, we could add a lot more output to try to make things clearer, but that would be a lot of information.

So the question is really: What is that output for? And for whom? Currently it is for experts who want to see what's going on, either in their own setups, or, more importantly, when users report problems. @StyXman What do you expect of that output?

Coincidentally, I recently added https://osm2pgsql.org/contribute/how-osm2pgsql-processing-works.html to the website to help explain more about what goes on inside osm2pgsql. It could help with figuring things out, although it covers just a small part of what's going on.


StyXman commented on June 22, 2024

I'm using the logs to generate annotations on a grafana server like this:

[Screenshot: Screenshot_20230822_170209]

so I don't want to know how the sausage is made, but at least I want the manufacturing and expiry dates of each package I buy :)
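
Turning log lines into annotations could look roughly like the sketch below. It is not the script actually used here: `to_annotation` is a hypothetical helper, the `time`/`tags`/`text` field names follow what Grafana's annotations HTTP API accepts as far as I know, and the log timestamps are assumed to be UTC because the log itself carries no timezone.

```python
import json
import re
from datetime import datetime, timezone

# Assumed layout: "YYYY-MM-DD HH:MM:SS  [worker] message"
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+\[(\d+)\]\s+(.*)$")


def to_annotation(line, tags=("osm2pgsql",)):
    """Build an annotation dict from one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    stamp, worker, message = match.groups()
    # Assumption: timestamps are UTC; adjust if the import ran in local time.
    when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return {
        "time": int(when.timestamp() * 1000),  # epoch milliseconds
        "tags": [*tags, f"worker-{worker}"],
        "text": message,
    }


line = "2023-08-19 22:35:50  [3] Creating geometry index on table 'planet_osm_roads'..."
print(json.dumps(to_annotation(line), indent=2))
```

The sketch only builds the payload; sending it to a Grafana server (URL, authentication) is left out because those details depend on the setup.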


joto commented on June 22, 2024

But what are you creating those graphs for? What is it that you are trying to achieve in the end?


StyXman commented on June 22, 2024

Right now it's to investigate how disk usage changes during the import. Later it will let me see how updates change it too. I hope to finish a write-up about it soon.


StyXman commented on June 22, 2024

This level of logging could be enabled in a --log-level verbose mode. If you want, we can discuss it over IRC; I'm on the #osm channel on the OFTC network.

