bdclean's People

Contributors

ashwinagrawal16, sunn-e, thiloshon, tom-gu, vijaybarve

bdclean's Issues

Customizable bdclean

Is your feature request related to a problem? Please describe.
Right now bdclean is highly customizable, but only through source code.

Describe the solution you'd like
We have to bring the same level of customizability outside source code, so that anybody can easily change the bdclean structure.

Describe alternatives you've considered
Follow the bdchecks style: an XML or JSON file that populates the questions and checks, and a .config file that controls environment settings and Shiny app configuration.
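
A minimal sketch of the JSON idea above, written in Python purely for illustration (bdclean itself is an R package). The file layout, the field names (`id`, `prompt`, `checks`) and the check names are assumptions, not an existing bdclean or bdchecks format.

```python
import json

# Hypothetical configuration text; in practice this would live in a file
# that users can edit without touching the package source.
CONFIG_JSON = """
{
  "questions": [
    {"id": "taxon_level",
     "prompt": "What is the lowest taxonomic level you require?",
     "checks": ["taxo_level"]},
    {"id": "spatial_resolution",
     "prompt": "What spatial resolution do you require?",
     "checks": ["spatial_resolution"]}
  ]
}
"""


def load_questions(text):
    """Parse the JSON text and map each question id to its list of check names."""
    config = json.loads(text)
    return {q["id"]: list(q["checks"]) for q in config["questions"]}


question_checks = load_questions(CONFIG_JSON)
print(question_checks)
```

With a layout like this, adding a question or relinking a check would be an edit to the configuration file rather than to the source code.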

Input Visualizations

Is your feature request related to a problem? Please describe.
We need to provide proper visualizations to understand data in bdclean itself.

Describe the solution you'd like
Implement bdviz and bdtools features.

Kurator Project support

Is your feature request related to a problem? Please describe.
Good to have kurator integration/support or something similar.

Message panel

Describe the bug
Messages in the message panel are broken across several lines. Each message should be printed on a single line without breaking.

To Reproduce
Steps to reproduce the behavior:

  1. Initiate bdclean workflow '...'
  2. Allow messages to be populated '....'
  3. Click Notifications in the top right corner.
  4. See error

Fully automating cleaning workflow

Is your feature request related to a problem? Please describe.
The process currently requires a lot of manual input. If we could develop a system that automatically runs the entire process, it would be easily reproducible.

Describe the solution you'd like
We have limited console functionality for cleaning data. It doesn't have as many features as the Shiny app, but it can be improved further.

Reduce file size of bdclean

Describe the bug
Move the test CSV files out of bdclean and into bdtests.

finch will soon be archived on CRAN

👋 @thiloshon!

I see the GitHub version of bdclean does not import {finch} any more, good news as we plan to archive {finch} on CRAN since it's no longer maintained. Do you intend to soon submit {bdclean} to CRAN?

Input Mechanism

  • Storing mechanism for questions (Data frame/csv)
    - Editing / adding new questions and linking data check functions to questions
    - Handling conditional / sub questions (tree structure)
  • Darwinayzer and kurator-web
  • Input data
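
The tree structure for conditional questions mentioned above could look roughly like the following. This is a Python sketch for illustration only, not bdclean code: the `Question` class, its fields and the example prompts are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Question:
    key: str                      # unique identifier of the question
    prompt: str                   # text shown to the user
    check: Optional[str] = None   # name of the linked data check, if any
    # Maps an answer to the sub-questions that answer unlocks.
    follow_ups: Dict[str, List["Question"]] = field(default_factory=dict)


def active_questions(question, answers):
    """Return the keys of the questions that apply, depth-first,
    following only the branches selected by the given answers."""
    keys = [question.key]
    answer = answers.get(question.key)
    for child in question.follow_ups.get(answer, []):
        keys.extend(active_questions(child, answers))
    return keys


root = Question(
    key="spatial",
    prompt="Do you care about spatial resolution?",
    follow_ups={
        "yes": [
            Question(
                key="resolution",
                prompt="What resolution do you require?",
                check="spatial_resolution",
            )
        ]
    },
)

print(active_questions(root, {"spatial": "yes"}))
print(active_questions(root, {"spatial": "no"}))
```

The same tree can be flattened into a data frame or CSV by storing each question as a row with a parent key and the answer that triggers it.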

.rda needs documentation

We have two data files, quest and responses, which are used in the function get_config. Both need to be documented with roxygen.

Citations function

Generating all necessary citations. This is part of the report generation module and needs to be properly implemented in the data checks design.

DB aggregation and deduplication

Is your feature request related to a problem? Please describe.
Right now only one online DB can be queried at a given time.

Describe the solution you'd like
We need to be able to gather datasets from different sources, and possibly local data as well, for a single cleaning run, and then deduplicate the combined records.
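
As a rough illustration of the request above, the following Python sketch merges records from several sources and drops duplicates. It assumes that records are dictionaries and that the Darwin Core `occurrenceID` field is a usable duplicate key. That is a simplifying assumption, since real deduplication across databases may need fuzzier matching. The function name `merge_sources` is hypothetical.

```python
def merge_sources(*sources, key="occurrenceID"):
    """Concatenate record lists, keeping the first record seen for each key value."""
    seen = set()
    merged = []
    for records in sources:
        for record in records:
            identifier = record.get(key)
            if identifier is None:
                # Records without the key cannot be compared, so keep them.
                merged.append(record)
                continue
            if identifier in seen:
                continue  # duplicate of a record already kept
            seen.add(identifier)
            merged.append(record)
    return merged


online_db = [
    {"occurrenceID": "a1", "scientificName": "Puma concolor"},
    {"occurrenceID": "a2", "scientificName": "Panthera onca"},
]
local_file = [
    {"occurrenceID": "a2", "scientificName": "Panthera onca"},
    {"occurrenceID": "b7", "scientificName": "Lynx rufus"},
]

combined = merge_sources(online_db, local_file)
print([r["occurrenceID"] for r in combined])
```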

Data Handling

  • Storing resultant records
  • Storing records for cleaning / editing
  • Managing versions of data after each operation
  • File types
  • For more than 5M records, do we need a separate mechanism to run checks?
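
One possible answer to the last point is to run checks in fixed-size chunks, so that the full dataset never has to sit in memory at once. The Python sketch below illustrates the idea only. The helper names, the sample data and the `has_coordinates` check are hypothetical and are not part of bdclean or bdchecks.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from any iterable of rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def count_flagged(rows, check, chunk_size=100_000):
    """Apply check to every row, one chunk at a time, and count the failures."""
    flagged = 0
    for chunk in iter_chunks(rows, chunk_size):
        flagged += sum(1 for row in chunk if not check(row))
    return flagged


def has_coordinates(row):
    """Illustrative check: both coordinate fields must be non-empty."""
    return bool(row.get("decimalLatitude")) and bool(row.get("decimalLongitude"))


# A tiny in-memory CSV stands in for a large file on disk.
sample = io.StringIO(
    "occurrenceID,decimalLatitude,decimalLongitude\n"
    "1,10.5,20.1\n"
    "2,,\n"
    "3,5.0,\n"
)
flagged = count_flagged(csv.DictReader(sample), has_coordinates, chunk_size=2)
print(flagged)
```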

The questionnaire is not really resetting itself

After answering the questionnaire >> Flagging >> returning to the questionnaire and changing the answers >> some input from the first questionnaire session still persists.
A simple example:

Fresh bdclean session>> answering the questionnaire:
image

Performing flagging:
image

Results:
image

Now, changing the questionnaire to:
image

Results aren't changing as they should:
image

For more details see QA working file:
https://docs.google.com/spreadsheets/d/1VFhHy6uo7qVBiHhcuLqUJ3RXKpaG-4C3RD0jqvB0xtU/edit#gid=21537633&range=11:11

DwC grouping mechanism + tabular data R viewer

Combining our DwC field grouping mechanism with a really good tabular data viewer for R (if one exists) could be really helpful when reviewing DwC data in R. If we can't find a decent viewer, I wonder if we could implement the field grouping mechanism in OpenRefine instead.

Saving environment and code

Is your feature request related to a problem? Please describe.
We need to be able to save and load the current environment and code of the Shiny environment.

Visualizing bdclean architecture

I created a few slides to better explain how bdclean will be structured and how it will interact with other packages in the bdverse ecosystem.

Access the slides here; any suggestions are welcome.

Connecting bdclean to the new bdclean checks within bdchecks

Using bdclean checks within bdchecks:

  1. https://github.com/bd-R/bdchecks/blob/dev/R/dc_taxo_level.R
  2. https://github.com/bd-R/bdchecks/blob/dev/R/dc_spatial_resolution.R
  3. https://github.com/bd-R/bdchecks/blob/dev/R/dc_earliest_date.R
  4. https://github.com/bd-R/bdchecks/blob/dev/R/dc_temporal_resolution.R

Instead of bdclean's own quality checks: https://github.com/bd-R/bdclean/blob/dev/R/quality_checks.R

I think we just need to modify the questionnaire file (right @thiloshon ?):
https://github.com/bd-R/bdclean/blob/076fa9bf39c1f799e16665121a1260188a8d66d9/R/questionnaire.R
By:

  • Replacing all the bdclean data quality functions with the corresponding bdchecks functions
  • Changing the current names of the bdchecks functions to their new names (I can do it @tom-gu )
  • 🙏 Pray 🙏

Branches:

bdchecks>>dev | bdclean>>dev
