Comments (6)

joeroe commented on July 22, 2024

I think that's the way to go, yeah.

nevrome commented on July 22, 2024

Very cool! - I was not aware of this option.

This does sound like a case for a separate package, because the data is not as monolithic as in most of the other "databases" (tables) in c14bazAAR. But writing a parser that simply collects everything may be a good first step in that direction, as you can ignore the user input for now and nail down the tree merge algorithm first.

A PR would be very welcome! ORAU is extremely juicy.

joeroe commented on July 22, 2024

@nevrome That was my thinking too. I have a rough parser at joeroe/c14bazAAR/tree/intchron. It does seem to be worth it – crawling the full database returns over 11,000 dates, most of which are new for c14bazAAR:

intchron <- get_intchron("https://intchron.org/host")
# Or to save time:
# load("playground/intchron-cache-20201009.Rd")
length(unique(intchron$labcode))
#> [1] 11613

all <- get_c14data("all")
sum(!intchron$labcode %in% all$labnr)
#> [1] 9882

But it's extremely slow. Getting the whole database took about an hour on my fast university connection, because we have to make roughly 2000 separate HTTP requests.

So I'm thinking that splitting this off to its own package is a good idea after all. That way you could provide functions for getting subsets of the full IntChron database (e.g. by host/source, by country) and encourage the user to use that granularity in the c14bazAAR parser. Some sort of caching might also help.
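To make the caching idea concrete, here is a minimal sketch in base R, not the actual rintchron implementation. It wraps any fetch function in an in-memory cache keyed by URL, so a repeated request for the same host or source costs no extra HTTP round trip. The names make_cached_fetch and fake_fetch are hypothetical, and fake_fetch stands in for a real HTTP request so the example runs offline.

```r
# Hypothetical helper: wrap a fetch function with an in-memory cache keyed by URL.
make_cached_fetch <- function(fetch) {
  cache <- new.env(parent = emptyenv())
  function(url) {
    # Only call the (slow) fetch function the first time a URL is seen.
    if (!exists(url, envir = cache, inherits = FALSE)) {
      assign(url, fetch(url), envir = cache)
    }
    get(url, envir = cache, inherits = FALSE)
  }
}

# Stand-in for a real HTTP request (assumption); counts how often it is called.
n_requests <- 0
fake_fetch <- function(url) {
  n_requests <<- n_requests + 1
  data.frame(labcode = paste0("OxA-", 1:3), source = url)
}

cached_fetch <- make_cached_fetch(fake_fetch)
first  <- cached_fetch("https://intchron.org/host")
second <- cached_fetch("https://intchron.org/host")  # served from the cache
```

An in-memory cache only lasts for the R session; persisting results to disk, as the commented-out load() call above does by hand, would be the natural next step.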

nevrome commented on July 22, 2024

Alright, thanks for testing. Excellent work! Downloading the whole thing is not feasible then, and a separate package for specific queries is clearly the way to go.

Maybe one solution to ensure the interoperability with c14bazAAR would be to use the c14_date_list data format for this new package?

joeroe commented on July 22, 2024

I've split the basic API interaction and querying off into its own package: joeroe/rintchron. I'll rewrite the parser on my intchron branch to use it instead. I also managed to get the time taken to retrieve the whole database down to 7 minutes (joeroe/rintchron#3), so I think we're close to it being viable to use as a normal c14bazAAR database, especially if there are separate parsers for ORAU, NCRF, etc.

nevrome commented on July 22, 2024

Great job! So we could go through intchron to get the data from different individual databases? We could write a parser function get_orau() which calls rintchron::intchron()?
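A rough sketch of what such a per-database parser could look like, under several assumptions: the fetch step is passed in as a function rather than calling rintchron directly, since its signature is not given in this thread, and the input columns source, r_date and r_date_sigma are guesses (only labcode appears above). The helper intchron_to_c14 is hypothetical; the output uses the c14bazAAR column names sourcedb, labnr, c14age and c14std.

```r
# Hypothetical helper: keep one source and rename columns to c14bazAAR names.
# Assumed input columns: source, labcode, r_date, r_date_sigma.
intchron_to_c14 <- function(records, source_name) {
  subset_rows <- records[records$source == source_name, , drop = FALSE]
  data.frame(
    sourcedb = rep(source_name, nrow(subset_rows)),
    labnr    = subset_rows$labcode,
    c14age   = subset_rows$r_date,
    c14std   = subset_rows$r_date_sigma,
    stringsAsFactors = FALSE
  )
}

# The fetch function is an argument so the sketch runs offline; in practice it
# would be whatever query function rintchron exposes.
get_orau <- function(fetch) {
  intchron_to_c14(fetch(), "ORAU")
}

# Made-up records standing in for an IntChron query result.
fake_records <- data.frame(
  source       = c("ORAU", "NCRF", "ORAU"),
  labcode      = c("OxA-1", "X-2", "OxA-3"),
  r_date       = c(5000, 4200, 3100),
  r_date_sigma = c(30, 40, 25),
  stringsAsFactors = FALSE
)
orau <- get_orau(function() fake_records)
```

Separate parsers such as a get_ncrf() would then differ only in the source name passed to the shared helper.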
