Bulk FHIR Tools

👀 Please tell us more about your interest in or usage of these tools at our survey here!

This repository contains bulk_fhir_fetch, an ingestion tool that connects to FHIR Bulk Data APIs and saves the fetched FHIR to local disk or to GCP's FHIR Store and BigQuery. bulk_fhir_fetch is feature rich, with support for scheduling and incremental data pulls, integrations with GCP logging and metrics, fetching binary data referenced by FHIR DocumentReferences, rectifying invalid FHIR, and more. Popular FHIR Bulk Data APIs that bulk_fhir_fetch can ingest data from include CMS's Beneficiary Claims Data API (BCDA), which is used in the examples below.

This is not an official Google product. If using these tools with protected health information (PHI), please be sure to follow your organization's policies with respect to PHI.

Overview

  • cmd/bulk_fhir_fetch/: A program for fetching FHIR data from a FHIR Bulk Data API and optionally saving it to disk or sending it to your FHIR Store. The tool is highly configurable via flags and supports pulling only incremental data, among other features. See the bulk_fhir_fetch configuration examples for details on how to use this program.
  • bulkfhir/: A generic client package for interacting with FHIR Bulk Data APIs.
  • analytics/: A folder with some analytics notebooks and examples.
  • fhirstore/: A Go helper package for uploading to GCP's FHIR Store.
  • fhir/: A Go package with helpful utilities for working with FHIR.

Set up bulk_fhir_fetch on GCP

The bulk_fhir_fetch command line program uses the bulkfhir/ client library to fetch FHIR data from a FHIR Bulk Data API.

There are three high-level ways to set up this tool:

  • On a GCP VM. This option is recommended for initial testing and exploration.
  • With our Orchestration tooling, which deploys on Cloud Batch using Cloud Workflows, Cloud Scheduler, and Cloud Secret Manager. This is the recommended setup for production.
  • Locally on your machine by following the Build instructions below.

By default, logs and metrics are written to STDOUT, but we have documented how to send them to GCP and set up dashboards there.

bulk_fhir_fetch Configuration Examples

This section details common usage patterns for the bulk_fhir_fetch command line program, using the BCDA Sandbox as an example. If you want to try this out without using real credentials, you can use the synthetic data sandbox credentials (client_id and client_secret) from the options listed here. You can see details for all of the flags by running ./bulk_fhir_fetch --help.

If using these tools with protected health information (PHI), please be sure to follow your organization's policies with respect to PHI.

  • Fetch all BCDA data for your ACO to local NDJSON files:

    ./bulk_fhir_fetch \
      -client_id=YOUR_CLIENT_ID \
      -client_secret=YOUR_SECRET \
      -fhir_server_base_url="https://sandbox.bcda.cms.gov/api/v2" \
      -fhir_auth_url="https://sandbox.bcda.cms.gov/auth/token" \
      -output_dir="/path/to/store/output/data"
  • Rectify the data to pass R4 validation. At the time of this software release, the FHIR R4 data returned by the BCDA sandbox does not satisfy the default FHIR R4 profile. bulk_fhir_fetch provides an option to tag the expected missing fields that BCDA does not map with an extension (if they are indeed missing), which allows the data to pass R4 profile validation (and be uploaded to FHIR Store or other R4 FHIR servers). To do this, pass the following flag:

    -rectify=true
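
    For example, a full invocation with rectification enabled might look like the following (a sketch that reuses the sandbox URLs, placeholder credentials, and output path from the first example above):

    ./bulk_fhir_fetch \
      -client_id=YOUR_CLIENT_ID \
      -client_secret=YOUR_SECRET \
      -fhir_server_base_url="https://sandbox.bcda.cms.gov/api/v2" \
      -fhir_auth_url="https://sandbox.bcda.cms.gov/auth/token" \
      -output_dir="/path/to/store/output/data" \
      -rectify=true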
  • Fetch all FHIR since some timestamp. This is useful if, for example, you only wish to fetch new FHIR since yesterday (or some other time). Simply pass a FHIR instant timestamp to the -since flag.

    -since="2021-12-09T11:00:00.123+00:00"

    Note that every time fetch is run, it logs the BCDA transaction time, which can be used in future runs of fetch to get only data since the last run. If you will be using fetch in this mode frequently, consider the -since_file option below, which automates this behavior.

  • Automatically fetch new FHIR since the last successful run. The program provides a -since_file option, which it uses to store and read BCDA timestamps from successful runs. When this option is used, fetch automatically reads the latest timestamp from the since_file and fetches only FHIR since that time. When it completes successfully, it writes a new timestamp back to the file, so the next run fetches only FHIR since that time. The first time the program is run with -since_file, it fetches all historical FHIR from BCDA and initializes the since_file with the first timestamp.

    -since_file="path/to/some/file"

    Do not run concurrent instances of fetch that use the same since file.
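
    To run this on a schedule, one option (a sketch; the script name, file paths, and schedule below are assumptions, and credentials are placeholders) is to wrap the fetch in a small script and invoke it from cron:

    #!/bin/bash
    # fetch_incremental.sh (hypothetical wrapper): pulls only FHIR that is new since
    # the last successful run, tracked via -since_file.
    /path/to/bulk_fhir_fetch \
      -client_id=YOUR_CLIENT_ID \
      -client_secret=YOUR_SECRET \
      -fhir_server_base_url="https://sandbox.bcda.cms.gov/api/v2" \
      -fhir_auth_url="https://sandbox.bcda.cms.gov/auth/token" \
      -output_dir="/path/to/store/output/data" \
      -since_file="/path/to/since_file"

    # Example crontab entry (crontab -e): run the wrapper nightly at 01:00, appending
    # output to a log file. Avoid schedules that could overlap a still-running fetch.
    # 0 1 * * * /path/to/fetch_incremental.sh >> /path/to/fetch.log 2>&1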

  • Upload FHIR to a GCP FHIR Store:

    ./bulk_fhir_fetch \
      -client_id=YOUR_CLIENT_ID \
      -client_secret=YOUR_SECRET \
      -fhir_server_base_url="https://sandbox.bcda.cms.gov/api/v2" \
      -fhir_auth_url="https://sandbox.bcda.cms.gov/auth/token" \
      -output_dir="/path/to/store/output/data/" \
      -rectify=true \
      -enable_fhir_store=true \
      -fhir_store_gcp_project="your_project" \
      -fhir_store_gcp_location="us-east4" \
      -fhir_store_gcp_dataset_id="your_gcp_dataset_id" \
      -fhir_store_id="your_fhir_store_id"

    Note: If -enable_fhir_store=true, specifying -output_dir is optional. If -output_dir is not specified, no NDJSON output is written to local disk and the only output is to FHIR Store; such an invocation is sketched below. If you are using an older version of the tool, use -output_prefix instead of -output_dir.
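
    For example, a FHIR-Store-only invocation (a sketch: the command above with -output_dir omitted, keeping the same placeholder credentials and store identifiers) might look like:

    ./bulk_fhir_fetch \
      -client_id=YOUR_CLIENT_ID \
      -client_secret=YOUR_SECRET \
      -fhir_server_base_url="https://sandbox.bcda.cms.gov/api/v2" \
      -fhir_auth_url="https://sandbox.bcda.cms.gov/auth/token" \
      -rectify=true \
      -enable_fhir_store=true \
      -fhir_store_gcp_project="your_project" \
      -fhir_store_gcp_location="us-east4" \
      -fhir_store_gcp_dataset_id="your_gcp_dataset_id" \
      -fhir_store_id="your_fhir_store_id"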

To set up the bulk_fhir_fetch program to run periodically on a GCP VM, take a look at the documentation. For a discussion of the different FHIR Store upload options, see the performance and cost documentation.

Cloning at a pinned version

If cloning for production use, we recommend cloning the repository at the latest released version, which can be found in the releases tab. For example, for version v0.1.5:

git clone --branch v0.1.5 https://github.com/google/bulk_fhir_tools.git

Build

To build the program from source, run the following from the root of the repository (note: you must have Go installed):

go build cmd/bulk_fhir_fetch/bulk_fhir_fetch.go

This builds the bulk_fhir_fetch binary and writes it to your current directory.
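
Once built, you can confirm the binary runs and see details for all of its flags (as mentioned in the configuration examples above):

./bulk_fhir_fetch --help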

Example Analytics

This repository also contains example analysis notebooks using synthetic data that showcase query patterns once the data is in FHIR Store and BigQuery.

Trademark

FHIR® is the registered trademark of HL7 and is used with the permission of HL7.
