
libra's People

Contributors

a10mic, abhilash2000, abx393, aliaryan, anas-awadalla, dependabot[bot], devashish-sood, domoritz, jbofill10, kartikchugh, pahuja-gor, palashio, piyush1416, pragun-ananda, pranavnt, ramyabuva, rohan-rap-dev, sar1hak, sidakalwadi, sukkritsharmaofficial, tgood13, trigger-happy, ugolbck, umangj123, vagif12, vakhshoori101, vraj123, vyathakavilocana, yash19062000

libra's Issues

CNN Data Parameters

Most image datasets consist of a CSV file listing image paths and labels, plus a single image folder. Currently, having to enter a path for every class seems redundant and could be impractical for large image datasets. We might want to accept a CSV path and an image directory path instead.
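A minimal sketch of what accepting a CSV plus an image directory could look like, using only the standard library. The function names and the `image` / `label` column names are hypothetical and not part of Libra.

```python
import csv
from pathlib import Path


def load_image_labels(csv_path, image_dir, path_column="image", label_column="label"):
    """Return (image_path, label) pairs listed in a CSV file.

    The column names are hypothetical defaults; real datasets vary.
    """
    image_dir = Path(image_dir)
    pairs = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Paths in the CSV are treated as relative to the image folder.
            pairs.append((image_dir / row[path_column], row[label_column]))
    return pairs


def group_by_class(pairs):
    """Group image paths by label, so no per-class folder is needed."""
    classes = {}
    for path, label in pairs:
        classes.setdefault(label, []).append(path)
    return classes
```

With this shape the user supplies two paths in total, and the per-class grouping is derived from the label column.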

extensively testing preprocesser.

need to test the single_reg_preprocesser preprocessor with many structured datasets. Need to report how it fails and what needs improvement.

re-do keras logging

because Libra trains multiple models, we need to mute Keras's built-in logging, which outputs accuracies after every epoch, and instead log once per model. This'll help clean up the output when you're running the models.

current logger spams console box

need to figure out a way to make the log stay at the bottom of the console and update in place instead of re-printing every time.
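One possible approach, sketched here with plain carriage returns rather than any logging library: rewrite the same console line on every update. The helper names and the fixed line width are assumptions for illustration.

```python
import sys


def update_status(message, stream=sys.stdout, width=60):
    """Overwrite the current console line instead of printing a new one."""
    # "\r" moves the cursor back to the start of the line; padding with
    # spaces erases leftovers from a longer previous message.
    stream.write("\r" + message[:width].ljust(width))
    stream.flush()


def finish_status(stream=sys.stdout):
    """End the status line so later output starts on a fresh line."""
    stream.write("\n")
    stream.flush()
```

Calling `update_status(f"epoch {epoch}/3")` inside a training loop and `finish_status()` afterwards leaves a single line on screen. This only works on terminals that honor carriage returns, so output redirected to a file will still contain every update.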

create tuning for NLP

we need to be able to call .tune() for NLP tasks. Look into keras-tuner for this.

change query names

some of the query names, like regression_querry_ann, are quite tacky; a system for naming queries needs to be made.

removing noisy columns

need to create a way in structured_preprocesser() to identify, before training, columns that will reduce performance because of noise. This can be because columns are too similar to each other and/or aren't correlated at all.
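A rough sketch of one way to flag such columns with pandas, assuming "noise" means either near-zero Pearson correlation with a numeric target or near-duplication of another feature. The function name and both thresholds are illustrative assumptions, not tuned values, and Pearson correlation only captures linear relationships.

```python
import pandas as pd


def find_noisy_columns(data, target, min_target_corr=0.05, max_pair_corr=0.95):
    """Return numeric feature columns that look uninformative or redundant."""
    # Absolute pairwise Pearson correlations between numeric columns only.
    corr = data.select_dtypes("number").corr().abs()
    features = [c for c in corr.columns if c != target]
    noisy = []
    for i, column in enumerate(features):
        # Barely correlated with the target: little linear signal.
        if corr.loc[column, target] < min_target_corr:
            noisy.append(column)
            continue
        # Nearly identical to an earlier feature that was kept: redundant.
        if any(corr.loc[column, other] > max_pair_corr
               for other in features[:i] if other not in noisy):
            noisy.append(column)
    return noisy
```

Constant columns produce NaN correlations and are not flagged by this check, so they would need separate handling.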

support excel file datasets

right now we're only allowing users to upload .csv files. Need support for Excel files. We also need to make sure that when we read these files into pandas DataFrames they keep the same format/schema as .csv files.

automatically remove ID columns

sometimes datasets have columns with IDs; we need to remove them.

Here's my idea for removing them: columns that hold IDs are just non-numerical columns (which can be found using data[column].dtype.name == 'object') where the number of unique elements obtained by np.unique(data[column]) is equal to the number of rows in the dataset.
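The idea above could be sketched roughly as follows. The function name is hypothetical, and `pandas.api.types.is_numeric_dtype` is swapped in for the dtype-name comparison so the check does not depend on how a given pandas version labels string columns.

```python
import pandas as pd


def find_id_columns(data):
    """Return non-numeric columns in which every row holds a distinct value."""
    id_columns = []
    for column in data.columns:
        is_non_numeric = not pd.api.types.is_numeric_dtype(data[column])
        # An ID column has as many unique values as the dataset has rows.
        all_unique = data[column].nunique(dropna=False) == len(data)
        if is_non_numeric and all_unique:
            id_columns.append(column)
    return id_columns
```

The result can be passed to `data.drop(columns=...)`. Note that free-text columns such as names or comments can also be fully unique, so this heuristic may remove more than IDs, and integer IDs are not caught at all.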

using decreasing metric for shallow tuning doesn't work well

currently this line of code inside regression_query_ann: while(all(x > y for x, y in zip(losses, losses[1:]))): checks whether each new loss (with one more layer) is lower than the previous one. As a result, tuning almost always stops after very few layers. Need to find a better mechanism.
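To illustrate the problem and one candidate replacement: the first function below repeats the check quoted in the issue, and the second is a patience-based stop in the style of early stopping. The `should_stop` name and its `patience` and `min_delta` parameters are assumptions for this sketch, not existing Libra code.

```python
def strictly_decreasing(losses):
    """The check from the issue: True only if every loss beats the previous one."""
    return all(x > y for x, y in zip(losses, losses[1:]))


def should_stop(losses, patience=2, min_delta=0.0):
    """Stop once the last `patience` results all fail to beat the earlier best loss."""
    if len(losses) <= patience:
        return False
    best_before = min(losses[:-patience])
    return all(loss >= best_before - min_delta for loss in losses[-patience:])
```

With losses `[0.9, 0.95, 0.6]`, the strict check already fails at the second value, so a loop built on it never reaches the much better third model, while `should_stop` keeps going. A single noisy result no longer ends the search.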

improve documentation overall

documentation for the library is very weak right now; we need to develop a method to document it properly. Also create a file describing how the documentation works.

better splitting of files

currently, all the queries are just shoved into predictionQueries.py under the client class. What's a better way of distributing these?

text classification query

sentiment analysis NLP query. This should definitely be implemented. Could even be a pre-existing algorithm.

sequence to sequence query

part of the textual module. A sequence-to-sequence query that converts any input text to new text.

Replacing the Missing/NaN Data

Instead of filling NaN values with zero, replace missing numerical values with the column mean/median and missing categorical values with the last value before the gap in that column. This should give better accuracy.
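A short sketch of that proposal with pandas. The `fill_missing` name and its `strategy` parameter are hypothetical; the existing preprocessor may be organized quite differently.

```python
import pandas as pd


def fill_missing(data, strategy="mean"):
    """Fill numeric gaps with the column mean or median, other gaps with the previous value."""
    filled = data.copy()
    for column in filled.columns:
        series = filled[column]
        if pd.api.types.is_numeric_dtype(series):
            value = series.median() if strategy == "median" else series.mean()
            filled[column] = series.fillna(value)
        else:
            # Forward fill copies the last value seen before the gap.
            filled[column] = series.ffill()
    return filled
```

A categorical gap in the very first row has no earlier value to copy and stays missing, so a fallback such as the column mode would still be needed there.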

dealing with date/time columns

dates and times can come in so many different shapes and sizes; we need a whole method that can be integrated into structured_preprocesser() to deal with all possibilities.
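As a starting point, a sketch that tries a list of known formats and expands a parsed date into numeric features. The format list, function names, and chosen features are all illustrative assumptions; real data would need many more formats plus a policy for ambiguous day/month order.

```python
from datetime import datetime

# Candidate formats are illustrative; a real preprocessor would need many more.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y", "%Y-%m-%d %H:%M:%S")


def parse_date(text, formats=CANDIDATE_FORMATS):
    """Return a datetime for the first format that matches, or None."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


def date_features(text):
    """Expand a date string into numeric features a model can consume."""
    parsed = parse_date(text)
    if parsed is None:
        return None
    return {"year": parsed.year, "month": parsed.month,
            "day": parsed.day, "weekday": parsed.weekday()}
```

Because the first matching format wins, the order of `CANDIDATE_FORMATS` decides how ambiguous strings are read.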

convert all queries to pipeline setup

the main framework for the queries should be similar to how the pipeline is set up under the dev-pipeliner module. This whole conversion needs to happen immediately.

Loading Datasets within Libra

TensorFlow currently allows users to load multiple datasets within TensorFlow itself (e.g. MNIST, COCO, etc.). We could add this to Libra, but include datasets relevant to the present moment, such as COVID-19-related datasets, election-related datasets, etc.

reinforcement learning queries (q-learning / policy gradient)

look into implementing some sort of reinforcement learning query? How are most users setting up these queries? This is a bit more difficult because RL problems require actions and agents are constantly changing. Maybe look into something more practical? Deep Q Networks?

integrate Libra with pip

Libra currently cannot be installed with the pip install libra command. This needs to be done by publishing to PyPI.

managing dataset sizes

sometimes the datasets people provide to Libra are gonna be massive; how are we managing this before we accept the dataset?

add color to logger

The final accuracy and some other items should stand out by being displayed in other colors.

using word2vec to encode targets

right now, produceMask() just creates an array of size 26 (representing the alphabet) and adds one to every element for every character. We need to find a better way to do this. Maybe use word2vec?
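For reference, a reconstruction of the encoding as described above, reading it as a per-letter count. This is a guess at the behavior based on the description only, not the actual produceMask() source, and the function name is hypothetical.

```python
import string


def letter_count_mask(text):
    """Count occurrences of each letter a-z in `text`, ignoring case."""
    mask = [0] * 26
    for char in text.lower():
        if char in string.ascii_lowercase:
            mask[ord(char) - ord("a")] += 1
    return mask
```

Under this reading, anagrams such as "listen" and "silent" get identical masks, and digits or punctuation are dropped entirely, which shows why an embedding-based encoding could be worth exploring.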

Tabular Data File Support

In terms of supported data formats, we could add an integration with Google Sheets/Docs as well as support for .tsv files.
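The .tsv part could be as small as choosing a delimiter from the file extension. A standard-library sketch follows; the `read_table` name and the `DELIMITERS` mapping are assumptions, and Google Sheets support would need a separate API client that is not shown here.

```python
import csv
from pathlib import Path

# Map of supported extensions to their field delimiter.
DELIMITERS = {".csv": ",", ".tsv": "\t"}


def read_table(path):
    """Read a .csv or .tsv file into a list of row dictionaries."""
    delimiter = DELIMITERS.get(Path(path).suffix.lower())
    if delimiter is None:
        raise ValueError(f"unsupported file type: {path}")
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))
```

Both formats come back in the same row-dictionary shape, so downstream preprocessing would not need to know which one was loaded.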
