rmi-pacta / pacta.data.validation
Home Page: https://rmi-pacta.github.io/pacta.data.validation/
License: Other
And replace those functions with consolidated and centralized faking functions.
nit: maybe add a fixme/comment about eventually being able to replace this with functions from pacta.data.validation? I'd rather see these faker functions consolidated into one location rather than duplicated many places (possibly with slight variation)
Originally posted by @cjyetman in https://github.com/RMI-PACTA/pacta.data.preparation/pull/286#discussion_r1159527963
https://github.com/RMI-PACTA/pacta.data.preparation/blob/main/tests/testthat/helper-fake_data.R
@cjyetman This is more of an open question, do you think this package would be a good place for such a thing to exist?
I was thinking of generating fake input portfolio data (with key columns: isin, market_value, currency), that represent a series of common cases:
There is a complicating factor, which is that the ISINs would need to point to real companies in the ABCD.
We could envision, in parallel, generating fake ABCD (and every other dataset) so that we can invent the ISINs and ensure stability.
But in any case, just curious if you think this repo would be an ok place to host this sort of thing?
Alternatively, we could just host a function that generates a suite of test portfolios.
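To make the "suite of test portfolios" idea concrete, here is a minimal sketch in R. The function names `fake_portfolio()` and `fake_portfolio_suite()`, the case names, and the placeholder ISINs are all hypothetical and not part of any existing package; the placeholder ISINs do not point to real companies, so they would only be useful alongside a matching fake ABCD.

```r
# Hypothetical helper: build one fake portfolio with the three key columns.
# The default ISIN is a made-up placeholder, not a real security.
fake_portfolio <- function(isin = "XX0000000000",
                           market_value = 1000,
                           currency = "USD") {
  data.frame(
    isin = isin,
    market_value = market_value,
    currency = currency,
    stringsAsFactors = FALSE
  )
}

# Hypothetical suite: a named list of portfolios, one per illustrative case.
# The cases here are assumptions chosen for the example only.
fake_portfolio_suite <- function() {
  list(
    baseline = fake_portfolio(),
    missing_market_value = fake_portfolio(market_value = NA_real_),
    multiple_holdings = fake_portfolio(
      isin = c("XX0000000001", "XX0000000002"),
      market_value = c(500, 1500)
    )
  )
}
```

Each test could then pick the case it needs by name, for example `fake_portfolio_suite()$missing_market_value`, instead of redefining its own fake data.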
- These ISINs represent private equity
- Private equity is not considered to be included in the analysis, so it is totally fine that we flag them as "Invalid ISIN", because for the purposes of CTM/P4I they ARE invalid ISINs (even if overall they are not)
- The likelihood that a user is inputting private equity is relatively low BUT even if they do, and they notice that we say "invalid ISIN" I highly doubt they will be surprised
It would be unfortunate to lose validation entirely just because of these handful of ISINs though... I would vastly prefer that you:
- Update `pacta.data.validation::validate_financial_data()` to have the following behaviour:
  - Pass if the ISIN is formatted as you currently expect (checksum and all that)
  - Warn if an ISIN is of this particular type (`(A-Z){2}(numeric/ special char){10}`)
  - Error only if the ISIN is totally bunk (any other format)

and then implement `pacta.data.validation` in `workflow.data.preparation`.
That way we will at least get a record of the presence of these ISINs in the logs every time we run data prep.
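The three-tier behaviour proposed above could be sketched roughly as follows. This is not the current implementation of `validate_financial_data()`; `luhn_ok()`, `classify_isin()`, and `check_isins()` are hypothetical names, and the regular expression for the "warn" tier is one possible reading of `(A-Z){2}(numeric/ special char){10}` (two upper-case letters followed by ten characters that are neither letters nor whitespace).

```r
# Hypothetical helper: Luhn check on an ISIN that already has the standard
# shape (2 letters, 9 alphanumerics, 1 check digit).
luhn_ok <- function(isin) {
  chars <- strsplit(isin, "")[[1]]
  # Letters expand to numbers (A = 10, ..., Z = 35); digits are kept as-is.
  expanded <- vapply(chars, function(ch) {
    if (grepl("[0-9]", ch)) ch else as.character(utf8ToInt(ch) - 55L)
  }, character(1))
  digits <- as.integer(strsplit(paste(expanded, collapse = ""), "")[[1]])
  # Double every second digit counting from the right, then sum the digits.
  rev_digits <- rev(digits)
  idx <- seq_along(rev_digits) %% 2 == 0
  rev_digits[idx] <- rev_digits[idx] * 2L
  rev_digits[rev_digits > 9L] <- rev_digits[rev_digits > 9L] - 9L
  sum(rev_digits) %% 10L == 0L
}

# Hypothetical classifier: returns "pass", "warn", or "error" per ISIN.
classify_isin <- function(isins) {
  standard <- grepl("^[A-Z]{2}[A-Z0-9]{9}[0-9]$", isins)
  passes <- standard
  passes[standard] <- vapply(isins[standard], luhn_ok, logical(1),
                             USE.NAMES = FALSE)
  # Assumed reading of the "warn" pattern from the proposal.
  warns <- !passes & grepl("^[A-Z]{2}[^A-Za-z[:space:]]{10}$", isins)
  ifelse(passes, "pass", ifelse(warns, "warn", "error"))
}

# Hypothetical wrapper: error on malformed ISINs, warn on non-standard ones.
check_isins <- function(isins) {
  status <- classify_isin(isins)
  if (any(status == "error")) {
    stop("Malformed ISIN(s): ",
         paste(isins[status == "error"], collapse = ", "), call. = FALSE)
  }
  if (any(status == "warn")) {
    warning("Non-standard ISIN(s): ",
            paste(isins[status == "warn"], collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}
```

One consequence of reading the pattern literally: an all-numeric ISIN with a wrong check digit also matches the "warn" pattern, so it would be warned about rather than rejected. Whether that is desirable would need to be decided before adopting anything like this.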
Originally posted by @jdhoffa in RMI-PACTA/workflow.data.preparation#196 (comment)
applies to both:
- `pacta.data.validation::validate_financial_data()`
- `pacta.data.validation::validate_abcd_flags_equity()`

related:
In `asset_valid_units`, it expects "tCO2/pkm".
In `assert_valid_value_range_for_sector_unit`, it expects "gCO2/pkm".
cc: @jacobvjk
I have been questioning whether there should be both `assert_valid_*()` and `is_valid_*()` style functions for each data type it knows, one that errors or returns TRUE invisibly and one that returns TRUE or FALSE. That would be a rather large "interface" change. There are also a few significant things missing.
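As a rough illustration of the paired style being considered, the sketch below defines one predicate and one assertion built on top of it. The function names and the currency-code rule are hypothetical, chosen only for the example; they are not existing functions in the package.

```r
# Hypothetical predicate: returns TRUE or FALSE and never errors on bad data.
is_valid_currency_code <- function(x) {
  is.character(x) && !anyNA(x) && all(grepl("^[A-Z]{3}$", x))
}

# Hypothetical assertion: errors on invalid input, otherwise returns TRUE
# invisibly so it can be used for its side effect in a pipeline.
assert_valid_currency_code <- function(x) {
  if (!is_valid_currency_code(x)) {
    stop("`x` must be a character vector of three-letter upper-case codes.",
         call. = FALSE)
  }
  invisible(TRUE)
}
```

Writing the assertion as a thin wrapper around the predicate keeps the validation rule in one place, so the two functions cannot drift apart.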
Originally posted by @cjyetman in #73 (comment)