
Comments (8)

extremecarver commented on August 27, 2024

Thanks a lot for adding it back.

from pyhgtmap.

agrenott commented on August 27, 2024

The jobs parameter was actually working (at least in phyghtmap 2.23, the latest release); but due to the constraints imposed by parallelism (notably to handle single-file output), it was quite difficult to actually use more than 2-3 CPUs at any given time.

This is why I decided to remove single file output:

  • I don't need it :D
  • Constraints are different depending on the output format (OSM and O5M expect to have all nodes before the first way, PBF doesn't seem to care)
  • Actual writing of output file is now the most time-consuming part, and can't be parallelized with single output

I could probably re-introduce the single output option without adding too much complexity, but it means I probably won't bother handling parallelization in this case.


extremecarver commented on August 27, 2024

Is it still handling the node numbering correctly? Did you have any problems merging with osmconvert? If the current approach (merge with osmconvert, then delete the individual files) is much faster than writing to a single file up front, I think it's okay. You should mention in the instructions, however, how you intend them to be merged (for my use case individual files are not an option - but if I know merging is reliable, that's fine too).

Ah okay - I could not see any speed difference between jobs=2 and jobs=12 (hexa-core CPU with 12 threads). jobs=1 was much slower on 2.23.


agrenott commented on August 27, 2024

I kept the logic that avoids node and way numbering overlaps, so the resulting files should merge nicely. I didn't try it, though.
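The collision-avoidance idea can be illustrated with a minimal sketch (a hypothetical helper, not pyhgtmap's actual code): each output tile is handed a disjoint ID range by offsetting the start ID, so merged files never reuse an ID. The per-tile cap below is an assumption for illustration.

```python
# Hypothetical sketch: reserve a disjoint node/way ID range per output
# tile so the per-tile files can later be merged without ID collisions.
# NODES_PER_TILE_CAP is an assumed upper bound, not a pyhgtmap constant.

NODES_PER_TILE_CAP = 10_000_000

def id_range_for_tile(tile_index: int, start_id: int = 10_000_000) -> range:
    """Return the ID range reserved for the given tile."""
    first = start_id + tile_index * NODES_PER_TILE_CAP
    return range(first, first + NODES_PER_TILE_CAP)

# Adjacent tiles never share an ID: the first range ends where the next begins.
r0, r1 = id_range_for_tile(0), id_range_for_tile(1)
assert r0.stop <= r1.start
```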


extremecarver commented on August 27, 2024

Well, merging with osmconvert only works for o5m files and is quite slow... Multiple pbf files cannot be merged. Writing to o5m is much slower than writing to pbf, however.

So this is a bit imprecise, as Germany is small and I didn't look at seconds - but roughly it takes twice the time for me: 4 minutes, instead of 2 minutes for just writing to multiple pbf files.

Here is my sample command for Germany (note: somehow bash has a problem with _, so I need to set a variable for it):

Underline=_
nice -n 19 pyhgtmap --earthexplorer-user=extremecarver --earthexplorer-password=Testmap0 --jobs=12 --polygon=/home/contourlines/bounds/"$COUNTRY".poly --step=$step --no-zero-contour --void-range-max=-420 --output-prefix="$COUNTRY2" --line-cat=$detail --start-node-id=10000000 --start-way-id=10000000 --source=$SOURCE --max-nodes-per-way=230 --max-nodes-per-tile=0 --o5m --hgtdir=/home/contourlines/hgt --simplifyContoursEpsilon=0.00001 -j16
dup3="$COUNTRY2""$Underline".o5m
osmconvert $dup3 -o="$COUNTRY2""$Underline".osm.pbf
rm "$COUNTRY2""$Underline".o5m

So maybe in that case writing pbf directly would be faster? I don't know of any tool that is faster than osmconvert.


extremecarver commented on August 27, 2024

Following up here (instead of in the closed topic on Europe): the single file option will be needed, because otherwise it is not possible to create continents as a single file.
Osmconvert can only process 1001 files - and osmium capitulates very quickly with pbf input files (maybe 300 max), while with o5m it will run out of memory.
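A common workaround for such open-file limits is hierarchical merging: merge the tiles in batches smaller than the tool's limit, then merge the intermediate results. A hedged sketch of just the batching logic (the actual merge command is passed in as a placeholder callable, and the intermediate file names are invented for illustration):

```python
# Sketch: merge thousands of tile files in batches small enough to stay
# under osmconvert's/osmium's open-file limits, then merge the batch
# outputs in further rounds. run_merge is a placeholder for the real
# merge invocation (e.g. a subprocess call); it is NOT a real API.
from typing import Callable, List

def batched_merge(files: List[str], batch_size: int,
                  run_merge: Callable[[List[str], str], None]) -> str:
    """Repeatedly merge `files` in groups of `batch_size` until one remains."""
    level = 0
    while len(files) > 1:
        merged = []
        for i in range(0, len(files), batch_size):
            out = f"merge_l{level}_{i // batch_size}.o5m"
            run_merge(files[i:i + batch_size], out)
            merged.append(out)
        files, level = merged, level + 1
    return files[0]
```

With batch_size set to 1000, for example, each individual merge stays below osmconvert's 1001-file limit regardless of how many tiles Europe produces.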

Compiling Europe at 10m interval to o5m with pyhgtmap took 2:20 hours, plus 18 minutes to merge the files with osmium.
Compiling Europe at 10m interval to pbf took 1:13 hours - but there is no way to merge the files later. Osmium crashes quickly with:
"Open failed for 'europe10m_lon36.00_37.00lat65.00_66.00_srtm1v3.0.osm.pbf': Too many open files"
and I doubt it could do it anyway for big countries, as it would run out of memory too.

So for continents - Russia, China, Canada, USA and maybe Brazil - if you want them in a single file, a "slow" output into a single pbf file would be needed.

And yeah, it's clear that writing to pbf is much faster than writing to o5m. That's running with --max-nodes-per-tile=0.
And yeah, actually I usually need to split those files again later to max-nodes=6400000 - however, as many flat areas would result in much smaller 1"x1" tiles, I first need to merge them and then split them again. Because in the end, for my use case it makes a big difference whether I end up with 1077 tiles or 1700 tiles (the current approach creates many 1"x1" tiles that are much smaller than 6400000 nodes, with some, like in the Alps, being much bigger for 1"x1"). 1077 vs 1700 is approximate for Europe.
For maps that I create with 20m contour lines the difference will be double, as I then use twice the max-nodes value for splitting (or I would need to run pyhgtmap again with a 20m interval, instead of just having my map compiler drop the 10m, 30m, ... lines and use only the 20m, 40m, ... ones).
Some other people may have other use cases however and it may be important for them to actually have a single output file.

I still wonder a bit about the comment that the actual writing of the output file is now the most time-consuming part - because the time difference above for Europe between o5m and pbf is certainly not down to writing to the HDD. While my HDD isn't blazing fast, it can write 200 MB/s (continuous), or maybe 50 MB/s for less continuous writes, and it has a 512MB buffer that speeds things up even more for files less than 1GB in size (server-grade HDD).


agrenott commented on August 27, 2024

I'm off for a week; I'll have a look at the single file output when I'm back.

Concerning the file generation, it's not the IO taking time (that actually uses another thread with pyosmium, and is done in batches), but the computing. The pyosmium interface requires a function call per node, and for millions of nodes this takes a lot of CPU. I think in the latest profiling I did, this is now more than half of the total processing time.
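The batched, background-thread output pattern described above can be sketched generically (this is an illustration of the technique, not pyhgtmap's actual code): the main thread accumulates nodes into batches and queues them, while a writer thread drains the queue, so serialization overlaps with contour computation. The per-node function call cost inside write_batch is what remains CPU-bound.

```python
# Generic sketch of batched output on a background thread: the producer
# queues node batches; a single writer thread consumes them, so writing
# overlaps with computation. Assumed, simplified stand-in for the real
# pyosmium-based writer.
import queue
import threading

BATCH_SIZE = 4096
_SENTINEL = None  # signals the writer thread to stop

def writer_loop(q: "queue.Queue", write_batch) -> None:
    """Consume node batches until the sentinel arrives."""
    while True:
        batch = q.get()
        if batch is _SENTINEL:
            break
        write_batch(batch)  # still one serialization call per node inside

def write_nodes(nodes, write_batch) -> None:
    """Feed `nodes` to `write_batch` in batches via a background thread."""
    q: "queue.Queue" = queue.Queue(maxsize=8)  # bounded: applies backpressure
    t = threading.Thread(target=writer_loop, args=(q, write_batch))
    t.start()
    batch = []
    for node in nodes:
        batch.append(node)
        if len(batch) >= BATCH_SIZE:
            q.put(batch)
            batch = []
    if batch:
        q.put(batch)
    q.put(_SENTINEL)
    t.join()
```

The bounded queue keeps memory flat: if the writer falls behind, the producer blocks instead of buffering the whole node set.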


agrenott commented on August 27, 2024

More details concerning this point:

Concerning the file generation, it's not the IO taking time (that actually uses another thread with pyosmium, and is done in batches), but the computing. The pyosmium interface requires a function call per node, and for millions of nodes this takes a lot of CPU. I think in the latest profiling I did, this is now more than half of the total processing time.

Profiling the generation of a single output from 2 view1 local files (with python -m yappi -f callgrind -o yappi_ex1.out ../../pyhgtmap/main.py --pbf --log=DEBUG --max-nodes-per-tile=0 /mnt/g/git/garmin_mtb/work/hgt/VIEW1/N46E014.hgt /mnt/g/git/garmin_mtb/work/hgt/VIEW1/N46E015.hgt):

[image: callgrind profile of the run, showing three dominant regions]

  1. is the time spent writing NODES to PBF output (11215767 nodes in this example)
  2. is the time spent actually generating contours
  3. is the time spent writing WAYS to PBF output (50796 ways in this example)

At best, parallelization could allow processing 2 in parallel with (1+3), which would be a ~25% improvement of the overall elapsed time. Not really worth the added complexity until one finds a way to optimize the actual PBF output part.

