Comments (6)
I think that's the way to go, yeah.
from c14bazaar.
Very cool! - I was not aware of this option.
This does sound like a case for a separate package, because the data is not as monolithic as for most of the other "databases" (tables) in c14bazAAR. But writing a parser that simply collects everything may be a good first step in that direction, as you can ignore the user input for now and nail down the tree merge algorithm first.
A PR would be very welcome! ORAU is extremely juicy.
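To make the "collect everything, then merge the tree" idea concrete, here is a minimal sketch in base R. It assumes the crawled index has already been turned into nested lists, where inner nodes carry `children` and leaves carry a `records` data frame. That structure, and the names `tree` and `collect_records()`, are hypothetical and not taken from IntChron or c14bazAAR.

```r
# Hypothetical stand-in for a crawled index: inner nodes have `children`,
# leaves have a `records` data frame. Not the real IntChron structure.
tree <- list(
  name = "host",
  children = list(
    list(name = "lab_a", children = list(
      list(name = "site_1",
           records = data.frame(labcode = c("A-1", "A-2"), age = c(5000, 5100))),
      list(name = "site_2",
           records = data.frame(labcode = "A-3", age = 4800))
    )),
    list(name = "lab_b",
         records = data.frame(labcode = "B-1", age = 7200))
  )
)

# Depth-first walk: a leaf returns its records, an inner node returns the
# row-bound records of all of its children.
collect_records <- function(node) {
  if (!is.null(node$records)) {
    return(node$records)
  }
  parts <- lapply(node$children, collect_records)
  do.call(rbind, parts)
}

merged <- collect_records(tree)
nrow(merged)
```

Row-binding like this only works if every leaf has the same columns; a real parser would probably need to harmonise them first.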
@nevrome That was my thinking too. I have a rough parser at joeroe/c14bazAAR/tree/intchron. It does seem to be worth it – crawling the full database returns over 11,000 dates, most of which are new for c14bazAAR:
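library(c14bazAAR)  # get_intchron() is only available on the intchron branch
# Crawl the full IntChron index (slow)
intchron <- get_intchron("https://intchron.org/host")
# Or to save time:
# load("playground/intchron-cache-20201009.Rd")
# Number of unique lab codes retrieved
length(unique(intchron$labcode))
#> [1] 11613
# Compare against everything c14bazAAR already provides
all <- get_c14data("all")
# Lab codes not yet in c14bazAAR
sum(!intchron$labcode %in% all$labnr)
#> [1] 9882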
But it's extremely slow. Getting the whole database took about an hour on my fast university connection, because we have to make roughly 2000 separate HTTP requests.
So I'm thinking that splitting this off to its own package is a good idea after all. That way you could provide functions for getting subsets of the full IntChron database (e.g. by host/source, by country) and encourage the user to use that granularity in the c14bazAAR parser. Some sort of caching might also help.
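As a rough illustration of the caching idea, the sketch below wraps a fetch function so that each URL is requested only once per session. `with_cache()` and `fake_fetch()` are hypothetical names, and the fetcher is a stand-in that makes no real HTTP request; this is not how rintchron or c14bazAAR handle it.

```r
# Wrap any fetch function so that each URL is only requested once per session.
with_cache <- function(fetch) {
  cache <- new.env(parent = emptyenv())
  function(url) {
    if (!exists(url, envir = cache, inherits = FALSE)) {
      assign(url, fetch(url), envir = cache)
    }
    get(url, envir = cache, inherits = FALSE)
  }
}

# Stand-in for a real HTTP request; it only counts how often it is called.
n_requests <- 0
fake_fetch <- function(url) {
  n_requests <<- n_requests + 1
  data.frame(source = url, labcode = paste0("X-", 1:3))
}

cached_fetch <- with_cache(fake_fetch)
first  <- cached_fetch("https://intchron.org/host")
second <- cached_fetch("https://intchron.org/host")  # served from the cache
n_requests
```

An in-memory cache like this is lost when the session ends. Writing the results to disk would be the obvious next step for a crawl of this size.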
Alright - thanks for testing - excellent work! Downloading the whole thing is not feasible then, and a separate package for specific queries is clearly the way to go.
Maybe one solution to ensure interoperability with c14bazAAR would be to use the c14_date_list data format for this new package?
I've split the basic API interaction and querying off into its own package: joeroe/rintchron. I'll rewrite the parser on my intchron branch to use these instead. I also managed to get the time taken to retrieve the whole database down to 7 minutes (joeroe/rintchron#3), so I think we're close to it being viable to use as a normal c14bazAAR database, especially if there are separate parsers for ORAU, NCRF, etc.
Great job! So we could go through intchron to get the data from different individual databases? We could write a parser function get_orau() which calls rintchron::intchron()?
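A possible shape for such a parser is sketched below. It is only a sketch under stated assumptions: the arguments of `rintchron::intchron()` are not given in this thread, so they are passed through `...` untouched, and the input columns `age` and `error` are hypothetical (only `labcode` appears above). The output uses c14bazAAR's `sourcedb`, `labnr`, `c14age` and `c14std` column names.

```r
# `fetch` defaults to rintchron::intchron but can be swapped out for testing.
# The default is only looked up if no `fetch` is supplied.
get_orau <- function(fetch = rintchron::intchron, ...) {
  raw <- fetch(...)
  # Map the (assumed) IntChron columns onto c14bazAAR variable names.
  data.frame(
    sourcedb = "orau",
    labnr = as.character(raw$labcode),
    c14age = raw$age,      # hypothetical input column
    c14std = raw$error,    # hypothetical input column
    stringsAsFactors = FALSE
  )
}

# Offline usage with a fake fetcher, so the sketch runs without a network.
fake_intchron <- function(...) {
  data.frame(labcode = c("OxA-1", "OxA-2"), age = c(3000, 4500), error = c(30, 45))
}
orau <- get_orau(fetch = fake_intchron)
names(orau)
```

The resulting data frame could then presumably be passed to `c14bazAAR::as.c14_date_list()` to get the c14_date_list format suggested earlier in the thread.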