
Comments (18)

nevrome commented on July 22, 2024

That's exactly why I wanted to avoid the overhead of multiple packages. But you convinced me. We could call it the bazAARverse, with the packages

  • c14bazAAR
  • oslbazAAR
  • urathorbazAAR
  • isotopebazAAR
  • adnabazAAR
  • ...

and the general helper package bazAAR.

Now we only need a team of 5 and 3 weeks of time 👍. If only we could justify this investment...

from c14bazaar.

felixriede commented on July 22, 2024

We can look into it, perhaps experimenting with data import from existing databases for the Palaeolithic as part of @yesdavid's project. We'll put it on our to-do list.


MartinHinz commented on July 22, 2024

I will volunteer for the dendro part! Count me in!


MartinHinz commented on July 22, 2024

From my perspective, it does not make much sense to put OSL and dendro into a package that is called c14bazaar. Those would probably fit much better into an oslbazaar and a dendrobazaar package. Also, one could envisage extracting the filter functions (which might be useful for all the bazaars) into another package, bazaarSanitizaar?


nevrome commented on July 22, 2024

I generally agree with you. My experience with neolithicRC was that somehow nobody considered it for Bronze Age dates. Weird 😁.

On the other hand, it's tedious to establish and maintain multiple packages. I would only invest this time if we had a solid number of expected users.


MartinHinz commented on July 22, 2024

Admittedly, multiple packages require a bit more coordination. But on the other hand, the amount of code to maintain should stay approximately the same, shouldn't it?


MartinHinz commented on July 22, 2024

Can I detect a hint of sarcasm in those lines? ;-)


nevrome commented on July 22, 2024

Maybe a pinch. But seriously: This would be fantastic. A great standardization challenge that could advance the field. Some discussions I recently had with @stschiff inspired me to think in bigger dimensions again.

I only fear we're doing this mostly for ourselves at the moment. Is there a possibility that we reach the critical mass to make this an established tool in our field? Some developments in the last weeks give me hope, but I'm still wondering.


MartinHinz commented on July 22, 2024

This sounds mysterious. Nevertheless, starting with OSL and dendro would already be a (maybe manageable?) enterprise and a 'leap forward'. Although aDNA is also very tempting...
Maybe some other parties would be interested in joining if you agree on that? I will not name them just now, but I could think of some UK- or Iberia-based scientists who would also benefit from the interface and might invest some time to align it with existing or future packages. What do you think?


nevrome commented on July 22, 2024

I started to work on c14bazAAR because I profited from it for my own research projects (although the work went way beyond that at some points). I think this connection is crucial to do this in a feasible way.

@yesdavid Do you think you would need OSL (and/or Uranium-Thorium etc.) dates for your research? If yes, could you imagine creating and maintaining such packages if you receive proper support from us? Maybe this is interesting for @felixriede as well?

@MartinHinz Are you in a position where this would apply to you concerning dendro dates? Are there even open (!) databases out there for this kind of data?

I could volunteer to coordinate the process and apply the necessary changes to c14bazAAR to disentangle 14C-related functions from general functions. I'm sure @dirkseidensticker would be on board as well.

Is this a good way to approach this? A good investment of our time? I see this as a long-term, slow-pace transformation.


dirkseidensticker commented on July 22, 2024

What a great discussion! I see great value in the way we approached standardization within c14bazAAR, which could be translated to other kinds of data as well.

> That's exactly why I wanted to avoid the overhead of multiple packages. But you convinced me. We could call it the bazAARverse, with the packages
>
>   • c14bazAAR
>   • oslbazAAR
>   • urathorbazAAR
>   • isotopebazAAR
>   • adnabazAAR
>   • ...
>
> and the general helper package bazAAR.
>
> Now we only need a team of 5 and 3 weeks of time 👍. If only we could justify this investment...

I am thrilled about such an approach! We would need to discuss how the logic we already have should/could be split. A lot of our efforts with regard to standardization were aimed at the metadata associated with 14C dates. Our approaches towards 'thesaurification' in particular are only scratching the surface as of yet.

@nevrome is right, a critical mass is important as well as a focus on research questions that benefit from such 'investments' of time and energy.

Two action points from me:

  1. Should we introduce this as a possible topic for the hackathon at CAA?
  2. How many datasets are out there? If most data are only published behind paywalls as supplementary material to papers, we would not get very far.

Btw: aDRAC contains a few OSL dates as well ... might be a good time to turn myself in 😉


nevrome commented on July 22, 2024

> Our approaches towards 'thesaurification' in particular are only scratching the surface as of yet.

One interesting aspect @yesdavid brought forward: The amount of data we have manually compiled to simplify oddly specific sample material descriptions could be enough to try machine learning. I guess he was joking, but I would love to give this a try one day.

> Should we introduce this as a possible topic for the hackathon at CAA?

I got the impression that the topic for the hackathon is already pretty much fixed. But as this might not be the last of these events, I think this is a good idea.

> How many datasets are out there? If most data are only published behind paywalls as supplementary material to papers, we would not get very far.

I think this is the kind of data where it might be easiest to contact the authors and ask for a data publication in a long-term archive.

> Btw: aDRAC contains a few OSL dates as well ... might be a good time to turn myself in 😉

Off with his head! But seriously: We should check all databases for this: #92


MartinHinz commented on July 22, 2024

Machine learning sounds good! The only downside is that it should be consistent for every user, and you cannot trust the machines to do so on the client side. But on the 'server side' this might be worthwhile.

Hackathon: not at CAA, but what about a virtual hackathon, or let's call it a sprint, on the GitHub repo one of these days?

Paywall: whoever is not in with open science is out.


nevrome commented on July 22, 2024

> virtual hackathon

I like this idea. Maybe we could all try to free up one day in February to lay the foundation in a concerted attack.


dirkseidensticker commented on July 22, 2024

I am in!


stschiff commented on July 22, 2024

This all sounds great. Regarding aDNA data, just my five cents: This is so high-dimensional (a million genetic markers aren't uncommon) that it wouldn't fit into the exact same framework as the other data you have (C14, dendro, isotopes), which mostly come down to one number (plus extensive meta-information). One could of course think about summary statistics (like principal component coordinates or something).

But I like the generic setup of these bazAARverse packages, which would basically try to offer a consistent API into such datasets, perhaps even with cross-compatibility of at least overlapping meta-info fields (say, longitude, latitude, or even somehow universal individual IDs that would link experimental data for the same individual burial).


MartinHinz commented on July 22, 2024

Thanks for the clarification, I think you are absolutely right. On the other hand, things like haplogroup (mt or Y) could be made accessible that way, or just a link for downloading the original high-dimensional data. I am still not very familiar with the topic, but eager to learn, and I would already benefit from such a possibility.


stschiff commented on July 22, 2024

Yes, good point. Haplogroups could go into this, for sure, and they are already quite interesting. And of course, if we could even automatically download the full data somehow through a function, that would make people's lives a lot easier. I think a lot of this depends on the development of an open and consistent data format for aDNA data and its metadata, and we're working on that with @nevrome and others. So he's in the right position to help push this on the frontend side once we're making progress on the backend side.

