
conjurer's Issues

New feature: Hierarchy

Currently, product details include only the SKU number and price. Add more details such as product category, sub-category and sub-sub-category.

Bug: Generating names based on custom training data doesn’t work

Generating names with buildNames() results in an error message if one specifies a custom data frame of names. Here’s a reprex:

library(conjurer)
df_names = data.frame(names = c("Oliver", "Jack", "Harry"),
                      stringsAsFactors = FALSE)
new_names = buildNames(df_names, numOfNames = 3, minLength = 5, maxLength = 7)
#> Error in unlist(alphaList, use.names = FALSE) : 
#> object 'alphaList' not found

R version 3.6.3 (2020-02-29)
Platform: x86_64-suse-linux-gnu (64-bit)
Running under: openSUSE Tumbleweed

Matrix products: default
BLAS: /usr/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/R/lib/libRlapack.so

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] conjurer_1.1.0

loaded via a namespace (and not attached):
[1] compiler_3.6.3

New feature: Pattern

Currently, this package can generate string data (buildNames), numeric data (buildNum) and alphanumeric data (buildCust, buildProd). This new feature should generate data that follows a pattern, including special characters. Use cases include phone numbers and passwords.
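A minimal sketch of what pattern-based generation could look like, in base R. The function name genPattern and the placeholder symbols ('#' for a digit, '@' for a lowercase letter) are assumptions made for illustration; they are not part of the conjurer API.

```r
# Hypothetical helper: expand a pattern into a random string.
# '#' becomes a random digit, '@' a random lowercase letter;
# every other character (spaces, dashes, brackets) is kept as-is.
genPattern <- function(pattern) {
  chars <- strsplit(pattern, "")[[1]]
  out <- vapply(chars, function(ch) {
    if (ch == "#") {
      sample(as.character(0:9), 1)
    } else if (ch == "@") {
      sample(letters, 1)
    } else {
      ch
    }
  }, character(1), USE.NAMES = FALSE)
  paste(out, collapse = "")
}

set.seed(1)
phone <- genPattern("(###) ###-####")
password <- genPattern("@@@@-##!@")
```

Keeping literal characters untouched means special characters only need to appear in the pattern itself.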

Protect master branch

As the package is maturing 🚀 and the number of installations has crossed 5K 🥳, the author would like to work towards allowing contributions via pull requests in the future. This means that the necessary checks must be in place to ensure that the master branch is always protected. Currently, the administrator can commit to master without any review. Add more protection to the master branch.

Model outcome based generation

This package currently enables generating data based on descriptive statistics. A new approach could be to generate data based on a model performance measure.
Example:
Given a model type such as logistic regression, a model performance measure such as R2, and the variable types, distributions etc., the corresponding data must be generated. This means that if the generated data is used to build the same model type, the model's performance must be similar to the requested measure.
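To make the idea concrete, here is a sketch for the simplest case: one predictor and ordinary linear regression (not logistic regression, which would need a different performance measure). The function name genByR2 and its arguments are hypothetical. The noise is made orthogonal to the predictor and then rescaled, so the fitted R-squared matches the target up to floating-point error.

```r
# Hypothetical helper: simulate (x, y) whose linear fit has a chosen R-squared.
genByR2 <- function(n, targetR2) {
  stopifnot(n > 2, targetR2 > 0, targetR2 < 1)
  x <- rnorm(n)
  z <- rnorm(n)
  # Residuals of z ~ x are orthogonal to both x and the intercept.
  noise <- unname(residuals(lm(z ~ x)))
  ssSignal <- sum((x - mean(x))^2)
  # Scale the noise so that ssSignal / (ssSignal + ssNoise) equals targetR2.
  noiseScale <- sqrt(ssSignal * (1 - targetR2) / (targetR2 * sum(noise^2)))
  data.frame(x = x, y = x + noiseScale * noise)
}

set.seed(42)
df <- genByR2(n = 200, targetR2 = 0.7)
fitR2 <- summary(lm(y ~ x, data = df))$r.squared
```

Refitting the same model type on the generated data recovers the requested measure, which is the property the issue asks for.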

New feature: Reconstruct matrix

This feature allows reconstructing a matrix/data frame based on the number of clusters.
💡 Given eigenvalues and eigenvectors, build the matrix.
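For a symmetric matrix, the reconstruction from eigenvalues and eigenvectors is A = V diag(λ) Vᵀ. The sketch below shows this with base R's eigen(); the helper name rebuildMatrix and its truncation argument k are assumptions for illustration, and how k would relate to the number of clusters is left open.

```r
# A small symmetric matrix to decompose and rebuild.
A <- matrix(c(4, 1, 2,
              1, 3, 0,
              2, 0, 5), nrow = 3, byrow = TRUE)

eig <- eigen(A, symmetric = TRUE)

# Hypothetical helper: rebuild a symmetric matrix from its eigen decomposition,
# optionally using only the first k eigenpairs (eigen() returns the values
# in decreasing order), which gives a rank-k approximation.
rebuildMatrix <- function(values, vectors, k = length(values)) {
  idx <- seq_len(k)
  V <- vectors[, idx, drop = FALSE]
  V %*% diag(values[idx], nrow = k) %*% t(V)
}

Afull <- rebuildMatrix(eig$values, eig$vectors)         # all eigenpairs
Arank1 <- rebuildMatrix(eig$values, eig$vectors, k = 1) # leading eigenpair only
```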

Start a pythonic implementation of conjurer

The R package installations have crossed 5K 🥳 on CRAN. The author believes that there could be a wider adoption of this approach if it is scaled to other languages. Start building a Python package that is a replica of the R package.

Add Citation

After the methodology document is published, add a citation that points to that document instead of the standard R package citation information.

New feature: Graph data

Generating graph data is helpful in multiple use cases, e.g. neural network problems and route optimization problems. Currently, the genTree function generates a complete m-ary undirected graph.
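For reference, a complete m-ary tree can be written down as an edge list using level-order node numbering. This is a standalone base R sketch; mAryEdges is a hypothetical name and is not how genTree is implemented.

```r
# Hypothetical helper: edge list of a complete m-ary tree with the given
# number of levels. Nodes are numbered level by level, the root being node 1.
mAryEdges <- function(m, levels) {
  stopifnot(m >= 1, levels >= 1)
  nNodes <- sum(m^(0:(levels - 1)))
  if (nNodes == 1) {
    return(data.frame(from = integer(0), to = integer(0)))
  }
  child <- 2:nNodes
  # In level-order numbering, the parent of node i is floor((i - 2) / m) + 1.
  parent <- (child - 2) %/% m + 1
  data.frame(from = parent, to = child)
}

edges <- mAryEdges(m = 2, levels = 3)  # binary tree with 7 nodes and 6 edges
```

A general graph generator would go beyond this by allowing edges that are not parent-child links.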

License information is hidden on GitHub

GitHub usually shows the license type in the About section of the repository. For this repository, there is only a "View license" link, pointing to https://github.com/SidharthMacherla/conjurer/blob/0d303273aa60fb7fe2791cfa7ee15d02cd9e9f67/LICENSE.

Looking at its history (https://github.com/SidharthMacherla/conjurer/commits/master/LICENSE), af925b5#diff-c693279643b8cd5d248172d9c22cb7cf4ed163a3c98c8a3f69c2717edd3eacb7 basically erased all valuable license information from this file.

The license is still mentioned inside the README (although referring to the aforementioned LICENSE file for details, with the file being rather sparse on details).

It would be nice if the LICENSE file actually contained the corresponding information, with GitHub being able to correctly show the license (again) automatically.

New feature: Build String and Numeric data

Currently, a customer is identified only by a customer ID. Add more details such as name, email ID, age, gender, marital status, phone number and national identity number (e.g. SSN). A sequence generator could be built so that it generates phone numbers but can also be used in the medical sciences domain to generate gene sequences. Such capabilities will make the package generic enough to be used in multiple domains.
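One way such a domain-neutral sequence generator could look is a sampler over an arbitrary alphabet. The sketch is base R only; the name genSequence and its arguments are hypothetical.

```r
# Hypothetical helper: draw n random sequences of a fixed length
# from a character alphabet (sampling with replacement).
genSequence <- function(alphabet, len, n = 1) {
  stopifnot(is.character(alphabet), length(alphabet) > 0)
  vapply(seq_len(n), function(i) {
    paste(sample(alphabet, len, replace = TRUE), collapse = "")
  }, character(1))
}

set.seed(7)
phones <- genSequence(as.character(0:9), len = 10, n = 3)    # digit strings
genes <- genSequence(c("A", "C", "G", "T"), len = 12, n = 2) # nucleotide strings
```

Only the alphabet changes between the two domains, which is the kind of reuse the issue describes.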

New feature: Spatial data

A transaction currently does not have any store or online details. Add details such as store ID or online details.

Semi supervised approach

This package currently uses a supervised generation approach. Plan for a semi-supervised approach.

New feature: Promotional data

Build promotional information. It is common for businesses to promote products, typically by offering a discounted price, and to track the impact of these promotions on sales. A new feature is needed to flag each transaction as on or off promotion. Promotional data could also be used as treatment data in the medical domain.
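A rough sketch of the on/off promotion assignment on a toy transaction table. The column names, the promotion share and the discount rate are all assumed values for illustration; none of them come from the package.

```r
set.seed(123)

# Hypothetical transaction table; the column names are illustrative.
transactions <- data.frame(
  transactionID = 1:10,
  price = c(20, 35, 12, 50, 8, 27, 64, 15, 40, 22)
)

promoShare <- 0.3  # assumed share of transactions that are on promotion
discount <- 0.2    # assumed discount rate while on promotion

# Flag each transaction independently, then apply the discount where flagged.
transactions$onPromo <- runif(nrow(transactions)) < promoShare
transactions$paidPrice <- ifelse(transactions$onPromo,
                                 transactions$price * (1 - discount),
                                 transactions$price)
```

The onPromo flag can then serve as a treatment indicator when comparing outcomes between the two groups.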

Methodology

Add a methodology vignette to explain the methodology for each function in a more detailed way. Add a DOI for that document.

New feature: add options to buildHierarchy

Currently, buildHierarchy uses the type equalSplit. Only complete m-ary graphs (e.g. binary, ternary) are generated. This function needs to be enhanced with the options manual and automatic, so that trees with unequal splits can be generated. The changes need to be made at the gen function level, which hands its output over to the build function level.
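As an illustration of what a manual option might accept, the sketch below builds one level of an unequal split from a vector of child counts. The function name manualSplitEdges and the node labels are hypothetical and only show the shape of the input and output.

```r
# Hypothetical helper: one level of a hierarchy with unequal splits,
# where splits[i] is the number of children under parent i.
manualSplitEdges <- function(splits) {
  stopifnot(all(splits >= 0), sum(splits) > 0)
  parents <- paste0("parent", seq_along(splits))
  from <- rep(parents, times = splits)
  to <- paste0("child", seq_along(from))
  data.frame(from = from, to = to, stringsAsFactors = FALSE)
}

edges <- manualSplitEdges(c(2, 3, 1))  # parents with 2, 3 and 1 children
```

An automatic option could draw the split vector at random and pass it to the same routine.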
