pacta.data.validation's Issues

Sweep `pacta.data.preparation/tests/testthat/helper-fake_data.R` for data faking functions

And replace those functions with consolidated and centralized faking functions.

nit: maybe add a fixme/comment about eventually being able to replace this with functions from pacta.data.validation? I'd rather see these faker functions consolidated into one location than duplicated in many places (possibly with slight variations).

Originally posted by @cjyetman in https://github.com/RMI-PACTA/pacta.data.preparation/pull/286#discussion_r1159527963

https://github.com/RMI-PACTA/pacta.data.preparation/blob/main/tests/testthat/helper-fake_data.R

AB#10853

Maybe: Add a suite of test input portfolios for iterating over and testing the `rmi_pacta` docker image

@cjyetman This is more of an open question: do you think this package would be a good place for such a thing to exist?

I was thinking of generating fake input portfolio data (with key columns: isin, market_value, currency), that represent a series of common cases:

  • No data
  • Data, but no ISINs in PACTA sectors
  • Only equities
  • Only bonds
    etc.

There is a complicating factor, which is that the ISINs would need to point to real companies in the ABCD.

We could envision, in parallel, generating fake ABCD (and every other dataset) so that we can invent the ISINs and ensure stability.

But in any case, just curious if you think this repo would be an ok place to host this sort of thing?

Alternatively, we could just host a function that generates a suite of test portfolios.
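As a rough illustration of the "function that generates a suite of test portfolios" idea, here is a minimal sketch. It is written in Python purely for illustration (the packages discussed here are R), and the function names, case names, and default values are all hypothetical. The ISINs are passed in by the caller because, as noted above, they would need to map to real (or consistently faked) entries in the ABCD and financial data; the strings in the usage example are placeholders, not real ISINs.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def make_portfolio(
    isins: Sequence[str],
    market_value: float = 1000.0,  # hypothetical default
    currency: str = "USD",         # hypothetical default
) -> List[Row]:
    """Build one portfolio as rows with the three key columns."""
    return [
        {"isin": isin, "market_value": market_value, "currency": currency}
        for isin in isins
    ]


def make_test_portfolios(
    equity_isins: Sequence[str],
    bond_isins: Sequence[str],
    out_of_sector_isins: Sequence[str],
) -> Dict[str, List[Row]]:
    """Return one named portfolio per common case listed in the issue."""
    return {
        "no_data": make_portfolio([]),
        "no_pacta_sector_isins": make_portfolio(out_of_sector_isins),
        "only_equities": make_portfolio(equity_isins),
        "only_bonds": make_portfolio(bond_isins),
    }


# Placeholder identifiers only; a real suite would need ISINs that the
# data sets used by the docker image actually recognise.
portfolios = make_test_portfolios(
    equity_isins=["EQUITY_ISIN_1", "EQUITY_ISIN_2"],
    bond_isins=["BOND_ISIN_1"],
    out_of_sector_isins=["OTHER_ISIN_1"],
)

for name, rows in portfolios.items():
    print(name, len(rows))
```

Keeping the ISINs as inputs means the same generator could work with either real identifiers or a parallel set of faked datasets.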

Allow technically invalid, but likely intentional, ISINs with a warning

  • These ISINs represent private equity
  • Private equity is not included in the analysis, so it is totally fine that we flag them as "Invalid ISIN", because for the purposes of CTM/P4I they ARE invalid ISINs (even if overall they are not)
  • The likelihood that a user is inputting private equity is relatively low, BUT even if they do, and they notice that we say "invalid ISIN", I highly doubt they will be surprised

It would be unfortunate to lose validation entirely just because of this handful of ISINs though... I would vastly prefer that you:

  • Update pacta.data.validation::validate_financial_data() to have the following behaviour:
      • Pass if the ISIN is formatted as you currently expect (checksum and all that)
      • Warn if the ISIN is of this particular type ((A-Z){2}(numeric/special char){10})
      • Error only if the ISIN is totally bunk (any other format)

and then implement pacta.data.validation in workflow.data.preparation

That way we will at least get a record of the presence of these ISINs in the logs every time we run data prep.
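The proposed pass/warn/error behaviour can be sketched roughly as follows. This is an illustrative Python sketch, not the R implementation in pacta.data.validation; the function names are hypothetical. The "pass" check uses the standard ISIN layout (two letters, nine alphanumerics, one check digit, verified with the Luhn algorithm). The "warn" pattern is an assumed reading of "(A-Z){2}(numeric/special char){10}" as two uppercase letters followed by ten non-letter characters; the exact set of special characters is not specified in the issue.

```python
import re

# Standard ISIN layout: country code, 9-character NSIN, numeric check digit.
ISIN_LAYOUT = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")

# Assumption: "numeric/special char" means any character that is not a letter.
LIKELY_INTENTIONAL = re.compile(r"[A-Z]{2}[^A-Za-z]{10}")


def isin_checksum_ok(isin: str) -> bool:
    """True if the string has the ISIN layout and a valid Luhn check digit."""
    if not ISIN_LAYOUT.fullmatch(isin):
        return False
    # Letters become two-digit numbers (A=10 ... Z=35); digits stay as-is.
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify_isin(isin: str) -> str:
    """Return "pass", "warn", or "error" following the three tiers above."""
    if isin_checksum_ok(isin):
        return "pass"
    if LIKELY_INTENTIONAL.fullmatch(isin):
        return "warn"
    return "error"


print(classify_isin("US0378331005"))  # well-formed ISIN with a valid check digit
print(classify_isin("XX#000000001"))  # made-up example of the "warn" shape
print(classify_isin("not-an-isin"))   # any other format
```

Note that under this literal reading, a two-letter, ten-digit string with a wrong check digit also lands in the "warn" tier rather than "error"; whether that is desirable is a design decision for the actual implementation.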

Originally posted by @jdhoffa in RMI-PACTA/workflow.data.preparation#196 (comment)


applies to both:

  • pacta.data.validation::validate_financial_data()
  • pacta.data.validation::validate_abcd_flags_equity()

related:

AB#10854
