deepjetcore's Introduction

Hi there 👋

I'm a postdoc at Boston University and MIT, and founder of SimPPL, researching platform governance and building open-access trust and safety tools. I got my Ph.D. at NYU Data Science and CSMAP, where I researched methods to limit misleading information online. I use tools from machine learning and causal inference to estimate the effects of interventions that limit online harms on social networks. In the past I worked on productionizing machine learning for particle physics, prototyping and productionizing a graph-based trending hashtag recommendation system for videos, and graph-based deep learning in physics! I've also worked on ML in cybersecurity, chatbots, and open-source client APIs for IoT ML pipelines. I'm an engineer at heart and a researcher by profession, so I enjoy building scalable systems to tackle hard problems!

I've conceptualized and built a Google AI-backed ML course for undergrads. I (amateur-ishly) hosted a weekend reading group on advanced statistics following Cosma Shalizi's wonderful textbook, and I continue to mentor students doing impactful research. I was involved as a technical mentor for the Grand Challenge at CERN and the Stanford Scholar Initiative. Most notably, I co-founded and continue to lead Unicode, a fast-growing FOSS-development and mentorship program for undergrads, which continues to help its mentees land offers from FAANG companies and Ivy League universities!

I managed to land a couple of nice internships (although it's super-sad that I couldn't do them all) and wrote a guide for students at the undergraduate and graduate level. I'm also working on a book about engineering education in India at the undergraduate level, hopefully coming out in 2022!

deepjetcore's People

Contributors

astakia, emilbols, gouskos, hajohajo, hqucms, jkiesele, kirschen, pfs, shahrukhqasim, swapneelm

Forkers

ejdomi

deepjetcore's Issues

TFRecords Integration

The internal format that DeepJet writes when converting ROOT Ntuples to NumPy arrays should be retired. The new workflow should write directly to TFRecords.

TFRecords can then be read into a tf.data.Dataset and manipulated (batched, shuffled, mapped with a custom function, reshaped, etc.).
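The write and read steps described above can be sketched as follows. This is a minimal illustration, not the DeepJet converter: it assumes the ROOT Ntuples have already been loaded into NumPy arrays, and the feature names "x" and "y", the helper names write_tfrecords and read_tfrecords, and the file layout are all hypothetical.

```python
import numpy as np
import tensorflow as tf


def write_tfrecords(path, features, labels):
    """Write one tf.train.Example per row of `features` (hypothetical helper)."""
    with tf.io.TFRecordWriter(path) as writer:
        for x, y in zip(features, labels):
            example = tf.train.Example(features=tf.train.Features(feature={
                # "x" and "y" are illustrative keys, not DeepJet's schema.
                "x": tf.train.Feature(
                    float_list=tf.train.FloatList(value=x.tolist())),
                "y": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[int(y)])),
            }))
            writer.write(example.SerializeToString())


def read_tfrecords(path, n_features, batch_size=2):
    """Return a parsed, shuffled and batched tf.data.Dataset (hypothetical helper)."""
    spec = {
        "x": tf.io.FixedLenFeature([n_features], tf.float32),
        "y": tf.io.FixedLenFeature([], tf.int64),
    }

    def parse(record):
        # Decode one serialized Example back into (features, label) tensors.
        parsed = tf.io.parse_single_example(record, spec)
        return parsed["x"], parsed["y"]

    return (tf.data.TFRecordDataset(path)
            .map(parse)
            .shuffle(buffer_size=1024)
            .batch(batch_size))
```

Any further per-element transformation (reshaping, normalisation) would be added as another .map() call on the returned dataset.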

For reference on converting data from ROOT Ntuples to TFRecords, refer to the [Batch] Data Preprocessing Notebook in TrackingNtuples.

Triage needed for pip-installable library

DeepJet is now completely pip-installable (this includes auto-compiling the custom C++ extensions that earlier required a manual make install).

This needs to be tested thoroughly because it still results in an ImportError for the c_meanNormZeroPad.so shared library when it is not manually compiled.

Further, this depends on the path where the compiled shared libraries (DeepJetCore/compiled/src and others in the compiled folder) are installed.

The package is designed to be installed into a conda environment, so the libraries end up in the python/site-packages directory when pip install DeepJetCore is run. This needs to be tested on different setups (LXPLUS and FlatIron, for instance) to confirm in independent environments that DeepJet can be installed successfully and automatically without a manual git clone of DeepJetCore. Note that a manual git clone of DeepJet will still be required to run the sample models.
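When testing on a new setup, one quick check is whether any compiled shared libraries actually landed inside the installed package. The sketch below uses only the standard library. The helper name find_compiled_libs is hypothetical, and the "compiled" subdirectory and "*.so" pattern are assumptions based on the paths mentioned above, so adjust them to the real layout.

```python
import importlib.util
from pathlib import Path


def find_compiled_libs(package_name, subdir="compiled", pattern="*.so"):
    """List shared libraries under <package>/<subdir> without importing the package.

    Returns None if the package is not installed, otherwise a (possibly
    empty) sorted list of Path objects. Hypothetical helper for triage.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or not a package
    libs = []
    for location in spec.submodule_search_locations:
        base = Path(location) / subdir
        if base.is_dir():
            libs.extend(base.rglob(pattern))
    return sorted(libs)


if __name__ == "__main__":
    found = find_compiled_libs("DeepJetCore")
    if found is None:
        print("DeepJetCore is not installed in this environment")
    elif not found:
        print("DeepJetCore is installed, but no compiled libraries were found")
    else:
        for lib in found:
            print(lib)
```

An empty result on a fresh pip install would be consistent with the ImportError described above, since it means the extensions were not built or were placed somewhere else.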

Triage Needed on V100 (FlatIron) GPUs

We need to test the old version (Python 2, DL4Jets:master) and the new version (Python 3, SwapneelM:python-package) on the V100 GPUs at the FlatIron Institute to see whether DeepJet can be ported to other systems (and run faster).

Improve existing (minimal) Documentation

Documentation is available, but we need a complete set of docs describing what each function (or at least each submodule, e.g. DeepJetCore.preprocessing, DeepJetCore.evaluation) is for, along with the inputs and outputs of each function in these submodules.

Bare minimum: explain how DeepJetCore is structured and the idea behind each submodule, even if docstrings are not written for every function.
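As an illustration of the level of detail a per-function docstring could aim for, here is a sketch in the NumPy docstring style, which Sphinx can render. The function mean_norm_zero_pad is hypothetical: its name is loosely inspired by the c_meanNormZeroPad library mentioned elsewhere, but it is not the DeepJetCore implementation or API.

```python
def mean_norm_zero_pad(values, mean, std, length):
    """Normalise a variable-length sequence and pad it to a fixed length.

    Hypothetical example function, used only to show docstring layout.

    Parameters
    ----------
    values : list of float
        Raw input values, e.g. one variable for each particle in a jet.
    mean : float
        Value subtracted from every entry.
    std : float
        Non-zero value every entry is divided by after subtraction.
    length : int
        Output length. Longer inputs are truncated, shorter ones padded.

    Returns
    -------
    list of float
        Exactly `length` entries: normalised values followed by zeros.

    Raises
    ------
    ValueError
        If `std` is zero.
    """
    if std == 0:
        raise ValueError("std must be non-zero")
    normalised = [(v - mean) / std for v in values[:length]]
    return normalised + [0.0] * (length - len(normalised))
```

A submodule-level docstring can follow the same pattern, with a short summary of what the submodule is for in place of the parameter list.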

See also the process of generating documentation (and a PDF file containing all docs) based on the ReadTheDocs theme.

Start by building your own version of the docs using make html or one of the other commands available in this Makefile.

Note that this is on the python-package branch, not master.
