sidharthmacherla / conjurer
R Package to generate synthetic data.
Home Page: https://foyi.co.nz/documentation-of-r-package-conjurer/
License: MIT License
Currently, product details have SKU number and price. Add more details such as product category, sub category and sub sub category.
Generating names with buildNames() results in an error message if one specifies a custom data frame of names. Here’s a reprex:
library(conjurer)
df_names = data.frame(names = c("Oliver", "Jack", "Harry"),
stringsAsFactors = FALSE)
new_names = buildNames(df_names, numOfNames = 3, minLength = 5, maxLength = 7)
#> Error in unlist(alphaList, use.names = FALSE) :
#> object 'alphaList' not found
R version 3.6.3 (2020-02-29)
Platform: x86_64-suse-linux-gnu (64-bit)
Running under: openSUSE Tumbleweed
Matrix products: default
BLAS: /usr/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/R/lib/libRlapack.so
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] conjurer_1.1.0
loaded via a namespace (and not attached):
[1] compiler_3.6.3
Generate spectral data to be used for signals such as EEG.
Currently, this package can generate string data (buildNames), numeric data (buildNum) and alphanumeric data (buildCust, buildProd). This new feature should generate data that follows a pattern, including special characters. Use cases include phone numbers and passwords.
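One possible shape for such a pattern generator, sketched in base R. The function name `buildPatternSketch` and the placeholder syntax (`#` for a digit, `@` for a lowercase letter, `*` for a special character) are assumptions made for illustration; none of this exists in conjurer.

```r
# Hypothetical helper, not part of conjurer: fills a template one character at a time.
# Assumed placeholder syntax: '#' = digit, '@' = lowercase letter,
# '*' = special character; any other character is copied through unchanged.
buildPatternSketch <- function(pattern, n = 1) {
  specials <- c("!", "$", "%", "&", "?")
  chars <- strsplit(pattern, "", fixed = TRUE)[[1]]
  fillOne <- function() {
    out <- vapply(chars, function(ch) {
      switch(ch,
             "#" = sample(as.character(0:9), 1),
             "@" = sample(letters, 1),
             "*" = sample(specials, 1),
             ch)  # default: keep literal characters such as '+' or '-'
    }, character(1), USE.NAMES = FALSE)
    paste(out, collapse = "")
  }
  replicate(n, fillOne())
}

set.seed(1)
phones <- buildPatternSketch("+64-##-###-####", n = 3)
passwords <- buildPatternSketch("@@@@##*", n = 2)
```

Keeping literals in the template means one function could cover phone numbers, passwords and similar fixed-layout identifiers.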
As the package is maturing 🚀 and the number of installations has crossed 5K 🥳, the author would like to work towards allowing contributions via pull requests in the future. This means that the necessary checks must be in place to ensure that the master branch is always protected. Currently, the administrator can commit to master without any review. Add more protection to the master branch.
This package currently enables generating data based on the descriptive statistics. A new approach could be to generate data based on a model performance measure.
Example:
For a given model type such as logistic regression, a model performance measure such as R2, and the variable types, distributions, etc., the corresponding data must be generated. If the generated data is then used to build the same model type, its performance should be similar to what was requested.
Currently, buildModelData only sources slopes from the model object. Add the intercept and the range of the independent variable.
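As a rough illustration of the information being asked for, the base R sketch below pulls the intercept, slope and predictor range out of a fitted `lm` object and uses them to simulate new data. It does not call buildModelData; the `cars` dataset and all variable names are assumptions chosen only to keep the example self-contained.

```r
# Base R only; 'cars' ships with R. Nothing here calls conjurer.
fit <- lm(dist ~ speed, data = cars)

# Intercept and slope come from the coefficient vector.
intercept <- unname(coef(fit)["(Intercept)"])
slope <- unname(coef(fit)["speed"])

# The model frame stored in the fit gives the observed range of the predictor.
xRange <- range(fit$model$speed)

# Simulate new data inside the observed range, with residual-sized noise.
set.seed(1)
x <- runif(50, min = xRange[1], max = xRange[2])
y <- intercept + slope * x + rnorm(50, sd = sigma(fit))
```

Restricting `x` to the observed range avoids extrapolating the fitted line beyond the data the model was built on.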
Publish an article to share details of this package.
This feature allows reconstructing the matrix/data frame based on the number of clusters.
💡 Given eigenvalues and eigenvectors, build the matrix.
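For a symmetric matrix, the reconstruction is A = V diag(λ) Vᵀ, where the columns of V are the orthonormal eigenvectors. A minimal base R sketch follows; the helper name `rebuildFromEigenSketch` is hypothetical, and the symmetric case is an assumption (a non-symmetric matrix would need the inverse of V instead of its transpose).

```r
# Hypothetical helper, not part of conjurer. Assumes a symmetric source matrix,
# so the eigenvectors are orthonormal and t(vectors) acts as the inverse.
rebuildFromEigenSketch <- function(values, vectors) {
  # nrow is given explicitly so a single eigenvalue yields a 1x1 matrix
  vectors %*% diag(values, nrow = length(values)) %*% t(vectors)
}

A <- matrix(c(4, 1, 1, 3), nrow = 2)
e <- eigen(A, symmetric = TRUE)

# Using every eigenpair recovers the original matrix (up to rounding).
rebuilt <- rebuildFromEigenSketch(e$values, e$vectors)

# Using only the leading k eigenpairs gives a rank-k approximation.
k <- 1
approxA <- rebuildFromEigenSketch(e$values[1:k], e$vectors[, 1:k, drop = FALSE])
```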
The documentation suggests that the products can be mapped to the hierarchy randomly, i.e. evenly. This needs to change to Pareto-based mapping.
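One way to read "Pareto-based" is that a few categories should hold most of the products. The base R sketch below does this with sampling weights that decay with category rank; the power-law weights, the column names and the SKU labels are all assumptions, not the package's method.

```r
# Illustrative only: names and weights are assumptions, not conjurer output.
set.seed(1)
products <- paste0("sku", 1:100)
categories <- paste0("cat", 1:5)

# Weights decay with rank so the first categories attract most products.
w <- 1 / seq_along(categories)^2
w <- w / sum(w)

mapping <- data.frame(
  SKU = products,
  category = sample(categories, length(products), replace = TRUE, prob = w),
  stringsAsFactors = FALSE
)

# Number of products assigned to each category.
counts <- table(factor(mapping$category, levels = categories))
```

An even mapping corresponds to equal weights, so the same code covers both behaviours by changing `w`.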
The R package installations have crossed 5K 🥳 on CRAN. The author believes that there could be a wider adoption of this approach if it is scaled to other languages. Start building a Python package that is a replica of the R package.
After the methodology document is published, add a citation that points to that document instead of the standard R package citation information.
Generating graphs is helpful in multiple use cases, e.g. neural network problems and route optimization problems. Currently, the genTree function generates a complete m-ary undirected graph.
GitHub usually shows the license type in the About section of the repository. For this repository, there is just View license, pointing to https://github.com/SidharthMacherla/conjurer/blob/0d303273aa60fb7fe2791cfa7ee15d02cd9e9f67/LICENSE.
Looking at its history (https://github.com/SidharthMacherla/conjurer/commits/master/LICENSE), af925b5#diff-c693279643b8cd5d248172d9c22cb7cf4ed163a3c98c8a3f69c2717edd3eacb7 basically erased all valuable license information from this file.
The license is still mentioned inside the README (although referring to the aforementioned LICENSE file for details, with the file being rather sparse on details).
It would be nice if the LICENSE file actually contained the corresponding information, with GitHub being able to correctly show the license (again) automatically.
Currently, a customer is identified by a customer ID. Add more details such as name, email ID, age, gender, marital status, phone number and national identity number (e.g. SSN). A sequence generator could be built so that it generates phone numbers but could also be used in the medical sciences domain to generate gene sequences. Such capabilities will make the package generic enough to be used in multiple domains.
Although the package claims to be useful in multiple domains, the documentation speaks only about a retail use case. Add a use case from medical/biological sciences as well.
A transaction currently does not have any store or online details. Add details such as store ID or online details.
This package currently uses a supervised generation approach. Plan for a semi-supervised approach.
Maybe use the base price as a mean and allow the user to set the standard deviation. I'll work on this as time permits.
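A minimal sketch of that idea in base R: draw prices from a normal distribution centred on the base price, with a user-supplied standard deviation. The function name `jitterPriceSketch` and the one-cent floor are assumptions made for illustration.

```r
# Hypothetical helper, not part of conjurer.
# basePrice: mean of the price distribution; sd: user-chosen spread.
jitterPriceSketch <- function(basePrice, sd, n = 1) {
  p <- rnorm(n, mean = basePrice, sd = sd)
  # Floor at 0.01 so a large sd cannot produce zero or negative prices.
  round(pmax(p, 0.01), 2)
}

set.seed(1)
prices <- jitterPriceSketch(basePrice = 20, sd = 2, n = 5)
```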
Currently, product price is generated but is not used as part of the use case in the vignette. This needs to be updated.
Build promotional information. Businesses commonly promote products, often at a discounted price, and track the impact of these promotions on sales. A new feature is needed to flag each transaction as on or off promotion. Promotional data could also serve as treatment data in the medical domain.
Time data. In conjunction with spatial data, this could enable spatio-temporal data.
Add a methodology vignette to explain the methodology for each function in a more detailed way. Add a DOI for that document.
This feature request is to generate natural language. An example use case is customer review data for products.
Currently, buildHierarchy uses the type equalSplit. Only m-ary complete graphs (e.g. binary, ternary) are generated. This function needs to be enhanced with the options manual and automatic, so that trees with unequal splits can be generated. The changes need to be made at the gen function level, which hands its output over to the build function level.
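To make "unequal splits" concrete, here is a base R sketch that grows a tree level by level and gives each node a random number of children. The function name `genUnequalTreeSketch`, its arguments and the edge-list output are assumptions; this is not how genTree or buildHierarchy work internally.

```r
# Hypothetical helper, not part of conjurer.
# Returns an edge list (parent, child); node 1 is the root.
genUnequalTreeSketch <- function(numOfLevels, maxChildren) {
  edges <- data.frame(parent = integer(0), child = integer(0))
  current <- 1L   # nodes on the level being expanded
  nextId <- 2L    # next unused node id
  for (lvl in seq_len(numOfLevels)) {
    newLevel <- integer(0)
    for (p in current) {
      k <- sample.int(maxChildren, 1)           # 1..maxChildren children
      kids <- seq.int(nextId, length.out = k)
      edges <- rbind(edges, data.frame(parent = p, child = kids))
      newLevel <- c(newLevel, kids)
      nextId <- nextId + k
    }
    current <- newLevel
  }
  edges
}

set.seed(1)
tree <- genUnequalTreeSketch(numOfLevels = 3, maxChildren = 3)
```

A manual option could replace the `sample.int` call with a user-supplied vector of child counts.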