This repository contains a set of libraries, tools, notebooks and documentation for working with medical claims data. In particular, it contains an example program and documentation for setting up periodic ingestion of FHIR claims data from Medicare's Beneficiary Claims Data API (BCDA) to local disk or to a GCP FHIR Store.
This repository also contains example analysis notebooks using synthetic data that showcase query patterns once the data is in FHIR Store and BigQuery.
Note: This is not an official Google product.
If using these tools with protected health information (PHI), please be sure to follow your organization's policies with respect to PHI.
bcda/
: A Go client package for interacting with the BCDA.
cmd/bcda_fetch/
: A configurable example CLI program for fetching data from BCDA, and optionally saving to disk or sending to your FHIR Store. The program is highly configurable, and can support pulling incremental data only, among other features.
analytics/
: A folder with some analytics notebooks and examples.
fhirstore/
: A Go helper package for uploading to FHIR Store.
fhir/
: A Go package with some helpful utilities for working with FHIR claims data.
The example bcda_fetch command line program can be used to fetch data from
the BCDA to save to disk or validate and upload to a FHIR Store. This program can
also be configured to run as a periodic cron job where it only fetches new data
since the program last successfully ran.
To build the program from source, run the following from the root of the repository (you must have Go installed):
go build cmd/bcda_fetch/bcda_fetch.go
To build on a GCP VM, you can follow these instructions to get the environment set up.
Or download a prebuilt binary from the GitHub releases tab.
You can check all of the various flag details by running ./bcda_fetch --help.
This section will detail common usage patterns for the command line tool.
If you want to try this out without using your real credentials, you can use the synthetic data sandbox credentials (client_id and client_secret) from one of the options here.
If using these tools with protected health information (PHI), please be sure to follow your organization's policies with respect to PHI.
- Fetch all BCDA data for your ACO to local NDJSON files:
  ./bcda_fetch \
    -client_id=YOUR_CLIENT_ID \
    -client_secret=YOUR_SECRET \
    -bcda_server_url="https://sandbox.bcda.cms.gov" \
    -output_prefix="/path/to/store/output/data/prefix_" \
    -use_v2=true \
    -alsologtostderr=true -stderrthreshold=0
Change the -bcda_server_url as needed. You will need to change it if you are using the production API servers.
Feel free to change the stderrthreshold depending on what kinds of logs you wish to see. More details on logging flags can be found here.
- Rectify the data to pass R4 validation. The FHIR R4 data returned by BCDA does not satisfy the default FHIR R4 profile at the time of this software release. This CLI provides an option to tag the expected missing fields that BCDA does not map with an extension (only if they are indeed missing), which allows the data to pass R4 profile validation (and be uploaded to FHIR Store or other R4 FHIR servers). To do this, pass the following flag:
  -rectify=true
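To illustrate the general idea of tagging a missing field with an extension, here is a minimal sketch. It is not the implementation behind -rectify: it assumes the standard FHIR data-absent-reason extension is used, and the helper tagIfMissing and the choice of the "created" field are hypothetical. In FHIR JSON, extensions on a primitive element are stored under the element name prefixed with an underscore.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dataAbsentReasonURL is the standard FHIR data-absent-reason extension.
// Whether bcda_fetch uses this exact extension is an assumption.
const dataAbsentReasonURL = "http://hl7.org/fhir/StructureDefinition/data-absent-reason"

// tagIfMissing (hypothetical helper) adds a data-absent-reason extension for
// a primitive field, but only when that field is absent from the resource.
func tagIfMissing(resource map[string]any, field string) {
	if _, ok := resource[field]; ok {
		return // field is present; leave the resource untouched
	}
	if _, ok := resource["_"+field]; ok {
		return // already tagged
	}
	resource["_"+field] = map[string]any{
		"extension": []any{
			map[string]any{"url": dataAbsentReasonURL, "valueCode": "unknown"},
		},
	}
}

func main() {
	// Sample resource with the (hypothetically required) "created" field missing.
	raw := []byte(`{"resourceType":"ExplanationOfBenefit","id":"example"}`)
	var resource map[string]any
	if err := json.Unmarshal(raw, &resource); err != nil {
		fmt.Println("error:", err)
		return
	}
	tagIfMissing(resource, "created")
	out, err := json.MarshalIndent(resource, "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```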
- Fetch all claims data since some timestamp. This is useful if, for example, you only wish to fetch new claims data since yesterday (or some other time). Simply pass a FHIR instant timestamp to the -since flag.
  -since="2021-12-09T11:00:00.123+00:00"
Note that every time fetch is run, it will log the BCDA transaction time, which can be used in future runs of fetch to only get data since the last run. If you will be using fetch in this mode frequently, consider the since file option below which automates this behavior.
- Automatically fetch new claims since last successful run. The program provides a -since_file option, which the program uses to store and read BCDA timestamps from successful runs. When using this option, the fetch program will automatically read the latest timestamp from the since_file and use that to only fetch claims since that time. When completed successfully, it will write a new timestamp back out to that file, so that the next time fetch is run, only claims since that time will be fetched. The first time the program is run with -since_file it will fetch all historical claims from BCDA and initialize the since_file with the first timestamp.
  -since_file="path/to/some/file"
Note: do not run concurrent instances of fetch that use the same since file.
- Upload claims to a GCP FHIR Store:
  ./bcda_fetch \
    -client_id=YOUR_CLIENT_ID \
    -client_secret=YOUR_SECRET \
    -bcda_server_url="https://sandbox.bcda.cms.gov" \
    -output_prefix="/path/to/store/output/data/prefix_" \
    -use_v2=true \
    -rectify=true \
    -enable_fhir_store=true \
    -fhir_store_gcp_project="your_project" \
    -fhir_store_gcp_location="us-east4" \
    -fhir_store_gcp_dataset_id="your_gcp_dataset_id" \
    -fhir_store_id="your_fhir_store_id" \
    -alsologtostderr=true -stderrthreshold=0
Note: If -enable_fhir_store=true is set, specifying -output_prefix is optional. If -output_prefix is not specified, no NDJSON output will be written to local disk and the only output will be to FHIR Store.
To set up the bcda_fetch program to run periodically, take a look at the documentation.