Comments (7)
I tried to make sense of this part. We have four workers, 1-4, plus one main thread, 0. On the 19th at ~20:50, all four
workers start working on polygon, line, roads and point respectively. 2h07m54s later worker 3 finishes clustering roads,
which is what is reported at the end of the run. But it immediately starts creating indexes for it, which take ~15m and
~5m respectively. It then starts analyzing roads, which I guess is the task that finishes at 22:57 (~2m runtime).
Then come 2 anonymous tasks: one finishes in 1ms, and the second lingers...? And it immediately starts indexing ways.
Meanwhile, nodes, which wasn't reported as being processed by any worker, also finishes. Maybe it's the main loop
that does it? And if so, why did it finish only now, after only 0s? All this happens within the same second.
Still on the 19th, at ~23:32, worker 4 starts creating indexes for point. This is ~3h30m after it started
clustering it, which is also what is reported at the end. Again, 2 indexes and one analysis for this table, then an
anonymous task... which I guess finishes immediately? Because in the same second it creates an index for it, which looks
like a pattern (W3 did the same, remember?). It finishes in ~1h40m, so I guess W3's "Done task" at the end is the index
it had been creating since the 19th?
Given all that, I added the extra annotations that I think are the right ones to make sense of it. I hope I can use
some of my plentiful spare time to fix it:
2023-08-19 20:49:52 [1] Clustering table 'planet_osm_polygon' by geometry...
2023-08-19 20:49:52 [2] Clustering table 'planet_osm_line' by geometry...
2023-08-19 20:49:52 [3] Clustering table 'planet_osm_roads' by geometry...
2023-08-19 20:49:52 [4] Clustering table 'planet_osm_point' by geometry...
2023-08-19 20:49:52 [1] Using native order for clustering table 'planet_osm_polygon'
2023-08-19 20:49:52 [2] Using native order for clustering table 'planet_osm_line'
2023-08-19 20:49:52 [3] Using native order for clustering table 'planet_osm_roads'
2023-08-19 20:49:52 [4] Using native order for clustering table 'planet_osm_point'
2023-08-19 22:35:50 [3] Creating geometry index on table 'planet_osm_roads'...
2023-08-19 22:50:47 [3] Creating osm_id index on table 'planet_osm_roads'...
2023-08-19 22:55:52 [3] Analyzing table 'planet_osm_roads'...
2023-08-19 22:57:47 [3] Done task [Analyzing table 'planet_osm_roads'] in 7674389ms.
2023-08-19 22:57:47 [3] Starting task [which one?]...
2023-08-19 22:57:47 [3] Done task in 1ms.
2023-08-19 22:57:47 [3] Starting task [which one?]...
2023-08-19 22:57:47 [0] Done postprocessing on table 'planet_osm_nodes' in 0s
2023-08-19 22:57:47 [3] Building index on table 'planet_osm_ways'
2023-08-19 23:32:06 [4] Creating geometry index on table 'planet_osm_point'...
2023-08-20 00:13:30 [4] Creating osm_id index on table 'planet_osm_point'...
2023-08-20 00:20:35 [4] Analyzing table 'planet_osm_point'...
2023-08-20 00:20:40 [4] Done task [Analyzing table 'planet_osm_point'] in 12647156ms.
2023-08-20 00:20:40 [4] Starting task...
2023-08-20 00:20:40 [4] Building index on table 'planet_osm_rels'
2023-08-20 02:03:11 [4] Done task [Building index on table 'planet_osm_rels'] in 6151838ms.
2023-08-20 03:17:24 [2] Creating geometry index on table 'planet_osm_line'...
2023-08-20 03:54:40 [2] Creating osm_id index on table 'planet_osm_line'...
2023-08-20 04:02:57 [2] Analyzing table 'planet_osm_line'...
2023-08-20 04:03:01 [2] Done task [Analyzing table 'planet_osm_line'] in 25988218ms.
2023-08-20 05:26:21 [1] Creating geometry index on table 'planet_osm_polygon'...
2023-08-20 06:17:31 [1] Creating osm_id index on table 'planet_osm_polygon'...
2023-08-20 06:30:46 [1] Analyzing table 'planet_osm_polygon'...
2023-08-20 06:30:47 [1] Done task [Analyzing table 'planet_osm_polygon'] in 34854542ms.
2023-08-20 10:48:18 [3] Done task [Building index on table 'planet_osm_ways'] in 42630605ms.
2023-08-20 10:48:18 [0] Done postprocessing on table 'planet_osm_ways' in 42630s (11h 50m 30s)
2023-08-20 10:48:18 [0] Done postprocessing on table 'planet_osm_rels' in 6151s (1h 42m 31s)
2023-08-20 10:48:18 [0] All postprocessing on table 'planet_osm_point' done in 12647s (3h 30m 47s).
2023-08-20 10:48:18 [0] All postprocessing on table 'planet_osm_line' done in 25988s (7h 13m 8s).
2023-08-20 10:48:18 [0] All postprocessing on table 'planet_osm_polygon' done in 34854s (9h 40m 54s).
2023-08-20 10:48:18 [0] All postprocessing on table 'planet_osm_roads' done in 7674s (2h 7m 54s).
2023-08-20 10:48:18 [0] Overall memory usage: peak=85815MByte current=727MByte
2023-08-20 10:48:18 [0] osm2pgsql took 154917s (43h 1m 57s) overall.
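To check a reading like the one above against the timestamps, a small script can measure how long each worker spent between its consecutive log lines. This is only a sketch: `parse_line` and `step_durations` are hypothetical helper names, not part of osm2pgsql, and the line pattern is inferred from the excerpt above rather than from any documented log format.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the excerpt: "YYYY-MM-DD HH:MM:SS [worker] message"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) +\[(\d+)\] +(.*)$")


def parse_line(line):
    """Return (timestamp, worker id, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return ts, int(m.group(2)), m.group(3)


def step_durations(lines):
    """For each message, measure the time until the same worker logs its next line."""
    last = {}  # worker id -> (timestamp, message) of its most recent line
    out = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, worker, msg = parsed
        if worker in last:
            prev_ts, prev_msg = last[worker]
            out.append((worker, prev_msg, ts - prev_ts))
        last[worker] = (ts, msg)
    return out


SAMPLE = [
    "2023-08-19 20:49:52 [3] Clustering table 'planet_osm_roads' by geometry...",
    "2023-08-19 20:49:52 [3] Using native order for clustering table 'planet_osm_roads'",
    "2023-08-19 22:35:50 [3] Creating geometry index on table 'planet_osm_roads'...",
    "2023-08-19 22:50:47 [3] Creating osm_id index on table 'planet_osm_roads'...",
    "2023-08-19 22:55:52 [3] Analyzing table 'planet_osm_roads'...",
    "2023-08-19 22:57:47 [3] Done task [Analyzing table 'planet_osm_roads'] in 7674389ms.",
]

for worker, msg, delta in step_durations(SAMPLE):
    print(f"[{worker}] {delta}  {msg}")
```

Note that the gap is attributed to the earlier of the two lines, so it only approximates a step's runtime when the worker logs nothing else in between.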
from osm2pgsql.
The mix of ms, s and hms is a little bit haphazard, and the reported times don't seem to reflect all of the postprocessing.
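To compare the millisecond figures in the "Done task" lines with the summary lines at the end, it helps to bring both into one format. The following is a minimal sketch; `format_hms` and `ms_to_hms` are hypothetical helpers, and the output layout simply imitates the "42630s (11h 50m 30s)" style seen in the log.

```python
def format_hms(seconds):
    """Render whole seconds as 'Ns (Hh Mm Ss)', imitating the summary lines."""
    seconds = int(seconds)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{seconds}s ({hours}h {minutes}m {secs}s)"


def ms_to_hms(ms):
    """Truncate milliseconds to whole seconds, then format as above."""
    return format_hms(ms // 1000)


# Millisecond values taken from the 'Done task' lines in the log above
for label, ms in [("roads", 7674389), ("rels", 6151838), ("ways", 42630605)]:
    print(label, ms_to_hms(ms))
```

With these inputs the converted values line up with the per-table summaries for roads, rels and ways, which suggests the summary lines are the same measurements truncated to whole seconds.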
from osm2pgsql.
This basically goes in the same direction as #207.
What is logged, and how, has changed over time, and there never was a grand plan for how to do this. I totally agree that the logging is hard to understand for somebody new to the project. You really have to know a lot about the internals of osm2pgsql processing to interpret the output. Osm2pgsql's internal processing is complex, and the question is how much the user should actually see of how the sausage is made. Maybe we should just move all that logging to the debug mode and only tell the user when we are done? Does the user actually need to know? What information is actually actionable for the user? On the other hand, we could add a lot more output to try to make things clearer, but that would be a lot of information.
So the question is really: What is that output for? And for whom? Currently it is for experts who want to see what's going on, either in their own setups, or, more importantly, when users report problems. @StyXman What do you expect of that output?
Coincidentally I recently added https://osm2pgsql.org/contribute/how-osm2pgsql-processing-works.html to the website to help explain more about what goes on inside osm2pgsql. Could help with figuring out things, although it is just a small part of what's going on.
from osm2pgsql.
I'm using the logs to generate annotations on a Grafana server like this:
so I don't want to know how the sausage is made, but I do at least want the manufacturing and expiry dates of each package I buy :)
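For readers wondering what turning log lines into annotations might involve, here is a minimal sketch that converts one line into a dictionary with `time` (epoch milliseconds), `tags` and `text` fields, the shape Grafana's annotations HTTP API accepts. Everything else is an assumption made for illustration: the helper name `to_annotation`, the tag names, and treating the log timestamps as UTC (the log itself states no time zone). It is not the script used by the commenter.

```python
import re
from datetime import datetime, timezone

# Assumed layout, inferred from the log excerpt: "YYYY-MM-DD HH:MM:SS [worker] message"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) +\[(\d+)\] +(.*)$")


def to_annotation(line):
    """Build an annotation payload from one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Assumption: timestamps are UTC; adjust if the import ran in another zone.
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return {
        "time": int(ts.timestamp() * 1000),             # epoch milliseconds
        "tags": ["osm2pgsql", f"worker-{m.group(2)}"],  # hypothetical tag scheme
        "text": m.group(3),
    }


payload = to_annotation(
    "2023-08-19 20:49:52 [1] Clustering table 'planet_osm_polygon' by geometry..."
)
print(payload)
```

Sending the payload to a server is left out on purpose, since the endpoint and authentication depend on the particular Grafana setup.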
from osm2pgsql.
But what are you creating those graphs for? What is it that you are trying to achieve in the end?
from osm2pgsql.
Right now it's to investigate how disk usage changes during the import. Later it will also let me see how updates change it. I hope to finish a write-up about it soon.
from osm2pgsql.
This level of logging could be done in a --log-level verbose mode. If you want, we can discuss it over IRC, I'm on the #osm channel, OFTC network.
from osm2pgsql.