Comments (5)
Hi,
Apologies for the error you encountered, I know that message is completely unhelpful. Sometime the ATTAINS API returns inconsistent results based on states or if there is no data. I'm struggling with how to handle that and provide actionable info for users. That said I dug into the results a little bit and I hope this helps:
We can use the tidy = FALSE
argument to return the plain JSON data that is sent by the ATTAINS webservice whenever there is an error in parsing the data. The [tidyjson
](https://github.com/colearendt/tidyjson) package provides functions to explore this data. When I used assessments(assessment_unit_id='KY-1749', state_code='KY', organization_id='21KY', reporting_cycle='2016', tidy = FALSE)
and parsed it with tidyjson, I got zero results. Is it possible that the KY-1749 assessment unit was not assessed for the 303(d) in 2014?
Here is another way I checked. The following will query all the assessment results for 2020 in Kentucky as a JSON, and I can filter by assessment unit:
library(tidyjson)
library(dplyr)
df_2020 <- assessments(state_code='KY', organization_id='21KY', reporting_cycle='2020', tidy = FALSE)
df_2020 |>
enter_object("items") |>
gather_array() |>
spread_all() |>
select(-c("array.index", "document.id")) |>
enter_object("assessments") |>
gather_array() |>
spread_all(recursive = TRUE) |>
filter(assessmentUnitIdentifier == 'KY-1749')
That returned a tibble with one row with the assessment results for the KY-1749 assessment unit:
# A tbl_json: 1 x 15 tibble with a "JSON" attribute
..JSON organ…¹ organ…² organ…³ repor…⁴ repor…⁵ array…⁶ asses…⁷ agenc…⁸ troph…⁹ ratio…˟ epaIR…˟ overa…˟ cycle…˟ yearL…˟
<chr> <chr> <chr> <chr> <chr> <chr> <int> <chr> <chr> <chr> <lgl> <chr> <chr> <chr> <chr>
1 "{\"assessm… 21KY Kentuc… State 2020 EPA Fi… 806 KY-1749 S <NA> NA 5 Not Su… 2010 ""
# … with abbreviated variable names ¹organizationIdentifier, ²organizationName, ³organizationTypeText, ⁴reportingCycleText,
# ⁵reportStatusCode, ⁶array.index, ⁷assessmentUnitIdentifier, ⁸agencyCode, ⁹trophicStatusCode, ˟rationaleText,
# ˟epaIRCategory, ˟overallStatus, ˟cycleLastAssessedText, ˟yearLastMonitoredText
Repeat this for 2014:
df_2014 <- assessments(state_code='KY', organization_id='21KY', reporting_cycle='2014', tidy = FALSE)
df_2014 |>
enter_object("items") |>
gather_array() |>
spread_all() |>
select(-c("array.index", "document.id")) |>
enter_object("assessments") |>
gather_array() |>
spread_all(recursive = TRUE) |>
filter(assessmentUnitIdentifier == 'KY-1749')
This returned a zero row tibble, indicating no results for that assessment unit. I am obviously not familiar with the particulars of that site, but it seems it was either not assessed by the state or not included in the data uploaded by the state.
alternative appraoch
I don't know your workflow, but it might work better to just download the assessment data for all the assessment units once and filter it locally instead of using the web api to filter it.
df_2020_tidy <- assessments(state_code='KY', organization_id='21KY', reporting_cycle='2020', tidy = TRUE)
df_2020_tidy$use_assessment |> filter(assessment_unit_identifier == "KY-1749")
# A tibble: 5 × 29
organizati…¹ organ…² organ…³ repor…⁴ repor…⁵ asses…⁶ troph…⁷ ratio…⁸ epa_i…⁹ overa…˟ cycle…˟ year_…˟ sourc…˟ sourc…˟ assoc…˟
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <chr> <chr> <chr> <chr> <chr> <chr> <lgl>
1 21KY Kentuc… State 2020 EPA Fi… KY-1749 NA NA 5 Not Su… 2010 "" NA NA NA
2 21KY Kentuc… State 2020 EPA Fi… KY-1749 NA NA 5 Not Su… 2010 "" NA NA NA
3 21KY Kentuc… State 2020 EPA Fi… KY-1749 NA NA 5 Not Su… 2010 "" NA NA NA
4 21KY Kentuc… State 2020 EPA Fi… KY-1749 NA NA 5 Not Su… 2010 "" NA NA NA
5 21KY Kentuc… State 2020 EPA Fi… KY-1749 NA NA 5 Not Su… 2010 "" NA NA NA
# … with 14 more variables: use_name <chr>, use_attainment_code <chr>, threatened_indicator <chr>, trend_code <lgl>,
# agency_code <chr>, assessment_metadata <lgl>, use_attainment_code_name <chr>,
# assessment_metadata_assessment_basis_code <lgl>, assessment_metadata_assessment_types <lgl>,
# assessment_metadata_monitoring_activity <lgl>, assessment_metadata_assessment_activity_assessment_date <chr>,
# assessment_metadata_assessment_activity_assessor_name <lgl>, assessment_type_code <chr>,
# assessment_confidence_code <chr>, and abbreviated variable names ¹organization_identifier, ²organization_name,
# ³organization_type_text, ⁴reporting_cycle_text, ⁵report_status_code, ⁶assessment_unit_identifier, ⁷trophic_status_code, …
# ℹ Use `colnames()` to see all variable names
and
df_2020_tidy$parameter_assessment |> filter(assessmentUnitIdentifier == "KY-1749")
## I just noticied I missed cleaning the column names here, sorry!!!
# A tibble: 7 × 28
organizati…¹ organ…² organ…³ repor…⁴ repor…⁵ asses…⁶ troph…⁷ ratio…⁸ epaIR…⁹ overa…˟ cycle…˟ yearL…˟ param…˟ param…˟ pollu…˟
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 21KY Kentuc… State 2020 EPA Fi… KY-1749 NA NA 5 Not Su… 2010 "" Cause FECAL … Y
2 21KY Kentuc… State 2020 EPA Fi… KY-1749 NA NA 5 Not Su… 2010 "" Cause CHLORI… Y
3 21KY Kentuc… State 2020 EPA Fi… KY-1749 NA NA 5 Not Su… 2010 "" Cause HABITA… N
4 21KY Kentuc… State 2020 EPA Fi… KY-1749 NA NA 5 Not Su… 2010 "" Cause NUTRIE… Y
5 21KY Kentuc… State 2020 EPA Fi… KY-1749 NA NA 5 Not Su… 2010 "" Cause ORGANI… Y
6 21KY Kentuc… State 2020 EPA Fi… KY-1749 NA NA 5 Not Su… 2010 "" Cause SEDIME… Y
7 21KY Kentuc… State 2020 EPA Fi… KY-1749 NA NA 5 Not Su… 2010 "" Cause TOTAL … Y
# … with 13 more variables: impairedWatersInformation.category4BInformation <lgl>,
# impairedWatersInformation.listingInformation.agencyCode <chr>,
# impairedWatersInformation.listingInformation.cycleFirstListedText <chr>,
# impairedWatersInformation.listingInformation.cycleScheduledForTMDLText <chr>,
# impairedWatersInformation.listingInformation.CWA303dPriorityRankingText <chr>,
# impairedWatersInformation.listingInformation.consentDecreeCycleText <chr>,
# impairedWatersInformation.listingInformation.alternateListingIdentifier <lgl>, associatedUseName <chr>, …
# ℹ Use `colnames()` to see all variable names
BUT when I went back to 2014 and 2012 with this approach, I got an error....
## Of course this returns an error
df_2014_tidy <- assessments(state_code='KY', organization_id='21KY', reporting_cycle='2014', tidy = TRUE)
So we can write a little function to tidy up the raw json instead:
## make sure we have the packages we need:
library(tidyjson)
library(dplyr)
library(tidyr)
library(purrr)
tidy_assessment_data <- function(content) {
## return documents
content %>%
enter_object("items") %>%
gather_array() %>%
spread_all() %>%
select(-c("array.index", "document.id")) %>%
enter_object("documents") %>%
gather_array() %>%
spread_all(recursive = TRUE) %>%
select(-"array.index") %>%
as_tibble() -> content_docs
## return use assessment data
content %>%
enter_object("items") %>%
gather_array() %>%
spread_all() %>%
select(-c("array.index", "document.id")) %>%
enter_object("assessments") %>%
gather_array() %>%
spread_all(recursive = TRUE) %>%
select(-c("array.index", "agencyCode")) %>%
mutate(
probableSources = map(.data$..JSON, ~{
.x[["probableSources"]] %>% {
tibble(
sourceName = map_chr(., "sourceName"),
sourceConfirmedIndicator = map_chr(., "sourceConfirmedIndicator"),
associatedCauseName = map(., ~{
.x[["associatedCauseNames"]] %>% {
tibble(
causeName = map_chr(., "causeName")
)}
})) %>%
unnest("associatedCauseName", keep_empty = TRUE)
}})
) %>%
tibble::as_tibble() %>%
janitor::clean_names()-> content_use_assessments
## return parameter assessment data
content %>%
enter_object("items") %>%
gather_array() %>%
spread_all() %>%
select(-c("array.index", "document.id")) %>%
enter_object("assessments") %>%
gather_array() %>%
spread_all(recursive = TRUE) %>%
select(-c("array.index", "agencyCode")) %>%
enter_object("parameters") %>%
gather_array() %>%
spread_all(recursive = TRUE) %>%
select(-c("array.index")) %>%
enter_object("associatedUses") %>%
gather_array() %>%
select(-"array.index") %>%
spread_all(recursive = TRUE) %>%
mutate(seasons = map(.data$..JSON, ~{
.x[["seasons"]] %>% {
tibble(
seasonStartText = map_chr(., "seasonStartText"),
seasonEndText = map_chr(., "seasonEndText")
)
}
})) %>%
unnest("seasons", keep_empty = TRUE) -> content_parameter_assessments
return(list(documents = content_docs,
use_assessment = content_use_assessments,
parameter_assessment = content_parameter_assessments))
}
## now make the query again but return the raw json
df_2014_raw <- assessments(state_code='KY', organization_id='21KY', reporting_cycle='2014', tidy = FALSE)
## and tidy it
df_2014_tidy <- tidy_assessment_data(df_2014_raw)
df_2014_tidy
$documents
# A tibble: 0 × 5
# … with 5 variables: organizationIdentifier <chr>, organizationName <chr>, organizationTypeText <chr>,
# reportingCycleText <chr>, reportStatusCode <chr>
# ℹ Use `colnames()` to see all variable names
$use_assessment
# A tibble: 2,572 × 13
organization_identifier organiza…¹ organ…² repor…³ repor…⁴ asses…⁵ troph…⁶ ratio…⁷ epa_i…⁸ overa…⁹ cycle…˟ year_…˟ probab…˟
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <chr> <chr> <chr> <chr> <list>
1 21KY Kentucky State 2014 Histor… KY0065… Unknown NA 2 Fully … 2008 "" <tibble>
2 21KY Kentucky State 2014 Histor… KY0066… Unknown NA 2 Fully … 2008 "" <tibble>
3 21KY Kentucky State 2014 Histor… KY0068… Unknown NA 2 Fully … 2009 "" <tibble>
4 21KY Kentucky State 2014 Histor… KY1269… NA NA 5 Not Su… 2012 "" <tibble>
5 21KY Kentucky State 2014 Histor… KY1269… NA NA 5 Not Su… 2012 "" <tibble>
6 21KY Kentucky State 2014 Histor… KY1269… NA NA 2 Fully … 2003 "" <tibble>
7 21KY Kentucky State 2014 Histor… KY1270… NA NA 5 Not Su… 2012 "" <tibble>
8 21KY Kentucky State 2014 Histor… KY1278… NA NA 2 Fully … 2001 "" <tibble>
9 21KY Kentucky State 2014 Histor… KY1291… NA NA 2 Fully … 2003 "" <tibble>
10 21KY Kentucky State 2014 Histor… KY1305… NA NA 2 Fully … 2007 "" <tibble>
# … with 2,562 more rows, and abbreviated variable names ¹organization_name, ²organization_type_text, ³reporting_cycle_text,
# ⁴report_status_code, ⁵assessment_unit_identifier, ⁶trophic_status_code, ⁷rationale_text, ⁸epa_ir_category,
# ⁹overall_status, ˟cycle_last_assessed_text, ˟year_last_monitored_text, ˟probable_sources
# ℹ Use `print(n = ...)` to see more rows
$parameter_assessment
# A tibble: 4,066 × 28
organizat…¹ organ…² organ…³ repor…⁴ repor…⁵ asses…⁶ troph…⁷ ratio…⁸ epaIR…⁹ overa…˟ cycle…˟ yearL…˟ param…˟ param…˟ pollu…˟
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 21KY Kentuc… State 2014 Histor… KY0066… Unknown NA 2 Fully … 2008 "" Observ… MERCUR… NA
2 21KY Kentuc… State 2014 Histor… KY0068… Unknown NA 2 Fully … 2009 "" Observ… MERCUR… NA
3 21KY Kentuc… State 2014 Histor… KY1269… NA NA 5 Not Su… 2012 "" Cause CAUSE … Y
4 21KY Kentuc… State 2014 Histor… KY1269… NA NA 5 Not Su… 2012 "" Cause ESCHER… Y
5 21KY Kentuc… State 2014 Histor… KY1270… NA NA 5 Not Su… 2012 "" Cause CAUSE … Y
6 21KY Kentuc… State 2014 Histor… KY1548… NA NA 5 Not Su… 2003 "" Cause POLYCH… Y
7 21KY Kentuc… State 2014 Histor… KY2355… NA NA 5 Not Su… 1998 "" Cause NUTRIE… Y
8 21KY Kentuc… State 2014 Histor… KY2355… NA NA 5 Not Su… 1998 "" Cause FECAL … Y
9 21KY Kentuc… State 2014 Histor… KY2543… NA NA 5 Not Su… 2009 "" Cause SEDIME… Y
10 21KY Kentuc… State 2014 Histor… KY2564… Unknown NA 5 Not Su… 2011 "" Cause DISSOL… Y
# … with 4,056 more rows, 13 more variables: impairedWatersInformation.category4BInformation <lgl>,
# impairedWatersInformation.listingInformation.agencyCode <chr>,
# impairedWatersInformation.listingInformation.cycleFirstListedText <chr>,
# impairedWatersInformation.listingInformation.cycleScheduledForTMDLText <chr>,
# impairedWatersInformation.listingInformation.CWA303dPriorityRankingText <lgl>,
# impairedWatersInformation.listingInformation.consentDecreeCycleText <chr>,
# impairedWatersInformation.listingInformation.alternateListingIdentifier <chr>, associatedUseName <chr>, …
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
This is pretty close to what the other functions return. I don't have any idea why it isn't returning a 303(d) document for 2014 though.
from rattains.
Thank you so much for your quick and thorough response!! I think you are right --- downloading all assessments and then filtering in R might be the best approach since it reduces the number of queries to the API. Thank you for providing a function to clean up the raw JSON results. I am adding your name to the acknowledgments and look forward to citing this package in our manuscript!
from rattains.
Thanks! Look forward to seeing the paper! Feel free to reach out if there are any other issues, this package is definitely a work in progress.
from rattains.
@TraciPopejoy you may want to try the development version in the dev-flatten branch:
remotes::install_github("mps9506/rATTAINS", ref = "dev-flatten")
I is going to be a while before I can upload to CRAN because I'm waiting on the tibblify package this release depends on to mature. Hopefully this returns more consistent data structures across all queries to EPA ATTAINS.
assessments(assessment_unit_id='KY-1749', state_code='KY', organization_id='21KY', reporting_cycle='2016')
$documents
# A tibble: 8 × 13
organizationIdentifier organizatio…¹ organ…² repor…³ repor…⁴ agenc…⁵ docum…⁶ docum…⁷ docum…⁸ docum…⁹ docum…˟ docum…˟ docum…˟
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 21KY Kentucky State 2016 EPA Fi… S Other … applic… KY 201… KY 201… NA NA https:…
2 21KY Kentucky State 2016 EPA Fi… S Integr… applic… 2016 I… 2016 I… NA Respon… https:…
3 21KY Kentucky State 2016 EPA Fi… S Assess… applic… Kentuc… Kentuc… NA NA https:…
4 21KY Kentucky State 2016 EPA Fi… E EPA De… applic… Kentuc… Kentuc… NA NA https:…
5 21KY Kentucky State 2016 EPA Fi… S Other … applic… Progra… Progra… NA NA https:…
6 21KY Kentucky State 2016 EPA Fi… S Other … applic… 2016IR… 2016 3… NA Includ… https:…
7 21KY Kentucky State 2016 EPA Fi… S Other … applic… 2016IR… 2016 3… NA NA https:…
8 21KY Kentucky State 2016 EPA Fi… S Other … applic… kdow_3… 2016 3… NA NA https:…
# … with abbreviated variable names ¹organizationName, ²organizationTypeText, ³reportingCycleText, ⁴reportStatusCode,
# ⁵agencyCode, ⁶documentTypeCode, ⁷documentFileType, ⁸documentFileName, ⁹documentName, ˟documentDescription,
# ˟documentComments, ˟documentURL
$use_assessment
# A tibble: 1 × 17
organizati…¹ organ…² organ…³ repor…⁴ repor…⁵ asses…⁶ agenc…⁷ troph…⁸ useAt…⁹ param…˟ proba…˟ docum…˟ ratio…˟ epaIR…˟ overa…˟
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <list<> <list<> <list<> <list<> <chr> <chr> <chr>
1 21KY Kentuc… State 2016 EPA Fi… NA NA NA NA NA NA
# … with 2 more variables: cycleLastAssessedText <chr>, yearLastMonitoredText <chr>, and abbreviated variable names
# ¹organizationIdentifier, ²organizationName, ³organizationTypeText, ⁴reportingCycleText, ⁵reportStatusCode,
# ⁶assessmentUnitIdentifier, ⁷agencyCode, ⁸trophicStatusCode, ⁹useAttainments, ˟parameters, ˟probableSources, ˟documents,
# ˟rationaleText, ˟epaIRCategory, ˟overallStatus
# ℹ Use `colnames()` to see all variable names
$delisted_waters
# A tibble: 1 × 10
organizationIdentifier organizationName organizationTypeText reportingCycl…¹ repor…² asses…³ cause…⁴ agenc…⁵ delis…⁶ delis…⁷
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 21KY Kentucky State 2016 EPA Fi… NA NA NA NA NA
# … with abbreviated variable names ¹reportingCycleText, ²reportStatusCode, ³assessmentUnitIdentifier, ⁴causeName,
# ⁵agencyCode, ⁶delistingReasonCode, ⁷delistingCommentText
from rattains.
Thank you so much for this update!! It works perfectly and has made my life a lot easier. I really appreciate all the hard work you've put into this r package.
from rattains.
Related Issues (17)
- Release rATTAINS 0.1.0
- CRAN check errors
- Release rATTAINS 0.1.1
- Empty folders created HOT 1
- Release rATTAINS 0.1.2
- Test Failure on CRAN HOT 1
- Release rATTAINS 0.1.3
- Migrate master to main
- Use pkgdown template
- Error with state_summary() tidy = TRUE HOT 5
- Release rATTAINS 0.1.4
- Remove caching? HOT 1
- Broken Test HOT 1
- Tasks for v1.0 releases
- leave data nested? [possible breaking change]
- Release rATTAINS 1.0.0
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from rattains.