
Comments (10)

namurphy avatar namurphy commented on September 23, 2024

Also, I was thinking it would be great to coordinate with people from other databases (e.g., AtomDB/pyatomdb) who might end up storing data in HDF5 files in the future. It would be great for the format/structure of each of the files to be as similar as can be. This would allow direct comparisons between the different databases, and allow routines to be used interchangeably between them.

from fiasco.

wtbarnes avatar wtbarnes commented on September 23, 2024

Thanks for pulling this into an issue @namurphy. @Cadair and @dpshelio also both brought this up on the Sunpy-dev mailing list. It is an issue that I've thought a lot about and one that does not seem to have a clear solution.

For the foreseeable future, I think the best way to go about this is to rebuild the (ASCII-format) CHIANTI database into HDF5 locally the first time a user installs fiasco (and after any updates to the database). It is not a terribly intensive computation (~30 mins on my ancient Macbook Pro) and it means that users who already have CHIANTI installed (e.g. through SolarSoft) can continue to use that same version of the database. If the user has not specified a specific location to look for the ASCII database or it cannot be found, it could be downloaded automatically.

I do like services like Zenodo and I think hosting (and versioning!) the database is a good idea in principle. However, the responsibility of managing and updating a hosted database is not trivial. At this point, I think I'd rather not have this responsibility especially since this project is so new. The lack of a license may also be prohibitive here. All that being said, I think a hosted and versioned HDF5 (or another convenient format) version should be the end goal.

The license issue is another headache. It was brought up in chianti-atomic/ChiantiPy#76 and pretty much went nowhere. I'm in contact with a few of the CHIANTI developers so I could pose this question to them directly. From what I can gather, there isn't really any opposition to licensing the database; it's just that no one has thought to do it. I think if we present a convincing case they would probably be willing to add a license.


wtbarnes avatar wtbarnes commented on September 23, 2024

Also a good point about AtomDB. I know comparisons between all of these different atomic databases/codes have been a headache in the past so if this package could make these comparisons easier that would be great.

The current layout of the HDF5 file is pretty much exactly the same as the CHIANTI directory layout. However, to facilitate easy comparisons, I don't think the file layouts necessarily have to be the same. The data just has to be exposed (i.e. through some API) in the same way or at least in a flexible way. One of my goals is to abstract the details of the database away from the user-facing code. I think this type of approach would make the kind of comparisons you're talking about much easier.
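A minimal sketch of that idea, assuming a small shared interface with one backend per storage layout. The class names, the two layouts, and the `energy_levels` method are all hypothetical and are not part of fiasco or AtomDB; plain dicts stand in for the files, and the numbers are placeholders.

```python
from abc import ABC, abstractmethod


class AtomicDatabase(ABC):
    """Hypothetical common interface; each backend hides its own layout."""

    @abstractmethod
    def energy_levels(self, ion):
        """Return the list of level energies for an ion such as 'fe_16'."""


class NestedBackend(AtomicDatabase):
    """Layout that mirrors a directory tree: element -> ion -> filetype."""

    def __init__(self, tree):
        self.tree = tree

    def energy_levels(self, ion):
        element = ion.split("_")[0]
        return list(self.tree[element][ion]["elvlc"]["energy"])


class FlatBackend(AtomicDatabase):
    """Layout with one flat key per ion and quantity."""

    def __init__(self, table):
        self.table = table

    def energy_levels(self, ion):
        return list(self.table[f"{ion}/energy"])


def compare_levels(db_a, db_b, ion):
    """Element-wise differences between two databases for one ion."""
    return [a - b for a, b in zip(db_a.energy_levels(ion), db_b.energy_levels(ion))]
```

Because `compare_levels` only calls `energy_levels`, it never needs to know how either file is organized.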


namurphy avatar namurphy commented on September 23, 2024

Agreed - as long as there is a common API/user-facing code, then it would be enough to enable easy comparisons for users. It may come down to what ends up being simplest, i.e., is it easier in the long run to put the HDF5 files with the same layout, or to have two sets of methods to access the different HDF5 files. This may end up being a decision fiasco doesn't have to make, as it is the first of these databases to convert to HDF5 (as far as I know, though my knowledge is limited). It will probably be whoever does this second who has to make that decision. In any case, yay HDF5! 👍

If I remember correctly, AtomDB currently uses .fits files.


wtbarnes avatar wtbarnes commented on September 23, 2024

Just to be a bit more specific, here is what I'm thinking as far as downloading and accessing the data with fiasco... (this seemed the most logical place to record this and I wanted to write it down before I forgot it!)

As I mentioned above, I think it is best (for now) to rebuild the ASCII CHIANTI database on the user's end as an HDF5 file and not worry about distributing this ourselves. This could come later. Doing it this way, there are of course challenges with building and updating the user's database.

At import time, parse the config file ~/.fiasco/fiascorc. In my prototypes, I've structured it as follows,

[database]
dbase_root = '/path/to/chianti/dbase'
hdf5_dbase_root = '/path/to/hdf5/chianti/dbase.h5'

Read these paths into some defaults dict. If either key doesn't exist (or the rc file itself does not exist), default to ~/.fiasco/chianti_dbase and ~/.fiasco/chianti_dbase.h5, respectively. That way, everything is contained in ~/.fiasco unless explicitly stated by the user.
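One way this step might look, as a rough sketch using the standard-library `configparser`. The function name `read_defaults` and the `FALLBACKS` dict are made up for illustration; only the `[database]` section, the two keys, and the `~/.fiasco` fallbacks come from the description above.

```python
import configparser
from pathlib import Path

FIASCO_HOME = Path.home() / ".fiasco"
FALLBACKS = {
    "dbase_root": FIASCO_HOME / "chianti_dbase",
    "hdf5_dbase_root": FIASCO_HOME / "chianti_dbase.h5",
}


def read_defaults(rc_path=FIASCO_HOME / "fiascorc"):
    """Return both database paths, falling back to locations in ~/.fiasco."""
    parser = configparser.ConfigParser()
    parser.read(rc_path)  # a missing rc file is skipped without raising
    defaults = {}
    for key, fallback in FALLBACKS.items():
        # fallback=None also covers a missing [database] section
        raw = parser.get("database", key, fallback=None)
        if raw:
            # The values in the rc file above are quoted, so strip the quotes
            defaults[key] = Path(raw.strip("'\"")).expanduser()
        else:
            defaults[key] = fallback
    return defaults
```

Stripping the quotes is needed because `configparser` keeps them as part of the value.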

Next, check if the dbase_root directory exists (some additional checking could be done on the contents). If it does not, download (and unzip) the CHIANTI database from here.

Finally, check if the hdf5_dbase_root file exists. If it does not, build it from the ASCII files. If the file does exist, there could be a check for whether the ASCII files have been updated since the HDF5 file was created or last updated, with the affected datasets rebuilt as needed.
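A simple version of that update check could compare modification times. This is only a sketch under the assumption that file timestamps are trustworthy; `hdf5_is_stale` is a hypothetical helper, not an existing fiasco function.

```python
import os


def hdf5_is_stale(ascii_root, hdf5_path):
    """Return True if any file under ascii_root is newer than hdf5_path."""
    built = os.path.getmtime(hdf5_path)
    for dirpath, _dirnames, filenames in os.walk(ascii_root):
        for name in filenames:
            if os.path.getmtime(os.path.join(dirpath, name)) > built:
                return True
    return False
```

This only says that something changed. Working out which datasets to rebuild would need a per-file record.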

So in pseudocode,

defaults = parse_config('~/.fiasco/fiascorc')

if 'dbase_root' not in defaults:
    defaults['dbase_root'] = '~/.fiasco/chianti_dbase'
if 'hdf5_dbase_root' not in defaults:
    defaults['hdf5_dbase_root'] = '~/.fiasco/chianti_dbase.h5'

if not exists(defaults['dbase_root']):
    download_dbase(CHIANTI_URL, defaults['dbase_root'])
if not exists(defaults['hdf5_dbase_root']):
    build_hdf5_dbase(defaults['hdf5_dbase_root'])
else:
    check_for_updates(defaults['hdf5_dbase_root'])

This is just a rough outline and I'd be interested to hear people's thoughts on this.


namurphy avatar namurphy commented on September 23, 2024

Overall this sounds great to me!

One possible minor issue is that looking for ~/.fiasco/fiascorc might not work on a Windows machine. A possible way to fix this would be to have a different default file location on Windows, and then check which OS is being used to figure out what the default file location should be.


wtbarnes avatar wtbarnes commented on September 23, 2024

Good point about Windows. We'll have to be careful about being cross-platform. Historically, CHIANTI (and ChiantiPy) have relied on setting the XUVTOP (no idea what that could stand for, eXtreme UltraViolet TOP directory???) environment variable, an unfortunate dependence I don't want to carry over into fiasco.

I think using an approach like the one outlined in this SO answer should work though I don't have a Windows machine to actually test this on.
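For illustration, here is one possible way to pick the default location per platform. It is not necessarily what the linked answer recommends; the `%APPDATA%\fiasco` location on Windows is an assumption, as is the function name.

```python
import os
import sys
from pathlib import Path


def default_config_dir(platform=sys.platform, environ=os.environ):
    """Pick a per-user directory for fiasco's config and data files."""
    if platform.startswith("win"):
        # Assumed Windows location: %APPDATA%\fiasco, when APPDATA is set
        appdata = environ.get("APPDATA")
        if appdata:
            return Path(appdata) / "fiasco"
    # Everywhere else (and as a fallback): a dot-directory in the home folder
    return Path.home() / ".fiasco"
```

`Path.home()` resolves the user's home directory on Windows as well, so the fallback branch alone may already be enough.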


wtbarnes avatar wtbarnes commented on September 23, 2024

Over the past two or so days, I've pushed several commits that essentially implement the system I described above. The main parts are contained in

  • fiasco.util.download_dbase()
  • fiasco.util.build_hdf5_dbase()

both contained in fiasco/util/setup_db.py. They are a bit clunky (lots of if/else) and may not cover every corner case, but they'll do for now. In each case, the user is prompted before either downloading the data or building the HDF5 file.

One issue is where to do the downloading and file building. I don't want this to have to be a manual step for the user, but I also don't want to do too much under the hood. I originally did this at import fiasco, but ultimately decided to just do it when an IonBase object, which requires the existence of the HDF5 database, is instantiated.
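The pattern of deferring setup until instantiation might look roughly like this. `LazyIonBase` and `ensure_database` are hypothetical stand-ins, not the actual IonBase code, and the `build` callable is a placeholder for whatever creates the HDF5 file.

```python
import os


def ensure_database(hdf5_path, build):
    """Call build(hdf5_path) only if the HDF5 file is missing."""
    if not os.path.exists(hdf5_path):
        build(hdf5_path)
    return hdf5_path


class LazyIonBase:
    """Hypothetical stand-in for a class that needs the HDF5 database."""

    def __init__(self, ion_name, hdf5_path, build):
        self.ion_name = ion_name
        # Setup runs here rather than at import time
        self.hdf5_path = ensure_database(hdf5_path, build)
```

Importing the package stays cheap this way, and the build cost is paid once, by the first object that needs the data.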

If others could try this out and/or give their thoughts on how best to handle the downloading that'd be great.


dpshelio avatar dpshelio commented on September 23, 2024

@wtbarnes - regarding the updates, you could keep an MD5 checksum of the data files that CHIANTI offers. Ideally they would provide such checksums on their side (that's something you could add to the comments when talking with them about the license).
I completely agree with you that the easiest approach is to download it from CHIANTI directly and do the conversion on your side. I like the idea of Zenodo, but that should be done by CHIANTI as it's their database. I could imagine people getting anxious because you would be getting the citations, as it would be easier to find the data on Zenodo.
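A sketch of the checksum idea using `hashlib` from the standard library. The helper names and the `recorded` mapping (path to previously stored digest) are assumptions for illustration only.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(paths, recorded):
    """Return the paths whose current digest differs from the recorded one."""
    return [p for p in paths if recorded.get(p) != file_md5(p)]
```

Storing the digests next to the HDF5 file would let an update rebuild only the datasets whose source files changed.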


wtbarnes avatar wtbarnes commented on September 23, 2024

@dpshelio Yeah, I like the idea of some sort of hash/checksum to effectively version the data locally, though I'm really not sure about the best way to implement this. The CHIANTI team does provide a version number each time they release an update to the database as well.
