- 🧑‍💻 I'm currently a Lead Deep Learning Engineer at Chattermill; previously a Research Engineer at Ontocord.ai
- 🔬 I also carry out machine learning research for LAION (Stability AI) on the Ezra-1 UltraCluster, LUMI, and JUWELS supercomputers; previously worked on BigScience and the BLOOM evaluation
- 🎓 I did my Master's in Machine Learning & AI at Imperial College London, working on natural language generation
- 🌱 I'm an active contributor to machine learning libraries such as Hugging Face Transformers and Gem-benchmark
- 💬 I sometimes give talks for the NLP study group, the most popular NLP community on meetup.com
- 🔭 I'm currently working on Mixture of Experts and the open-source chat agent OpenAssistant
Need to check that my label smoothing implementation matches how the authors smoothed their objective, since their objective function includes negative sampling.
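As a baseline to check against, here is a minimal sketch of plain label-smoothed cross-entropy (in NumPy, without the negative-sampling term, which would depend on the paper's exact objective): the one-hot target is mixed with a uniform distribution over classes.

```python
import numpy as np

def label_smoothed_nll(logits, target, eps=0.1):
    """Label-smoothed NLL: (1 - eps) * one-hot loss + eps * uniform loss.

    Sketch only: the negative-sampling component from the paper's
    objective is NOT included here.
    """
    # numerically stable log-softmax
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # standard NLL of the gold labels
    nll = -log_probs[np.arange(len(target)), target]
    # uniform (smoothing) component: mean negative log-prob over classes
    smooth = -log_probs.mean(axis=-1)
    return ((1.0 - eps) * nll + eps * smooth).mean()
```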
Really not clear from the paper: "computed as cosine similarity with annealing between the encodings h_x and h_y. It starts at 1 and ends at √d, linearly increasing over the first 10K training batches."
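My reading of the quoted passage, sketched as code: a scaling factor on the cosine similarity that ramps linearly from 1 to √d over the first 10K batches, then holds. The function name and the hold-after-warmup behaviour are my assumptions, not from the paper.

```python
import numpy as np

def similarity_scale(step, d, warmup=10_000):
    """Linear anneal from 1 (step 0) to sqrt(d) (step `warmup`).

    Assumption: the scale is held constant at sqrt(d) after warmup;
    the paper does not say what happens beyond 10K batches.
    """
    frac = min(step / warmup, 1.0)
    return 1.0 + frac * (np.sqrt(d) - 1.0)
```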
Implement BPE from scratch with unk tokens hashed (although this may give worse results on downstream tasks), as it is perhaps not as general as bpemb's 25000.model.
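A minimal from-scratch BPE training sketch (greedy pair merges over word frequencies, `</w>` as end-of-word marker); the unk-token hashing mentioned above is omitted here.

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """Learn BPE merges: repeatedly merge the most frequent adjacent
    symbol pair across the word-frequency table.

    Sketch only: no unk hashing, no special tokens, whitespace pre-split.
    """
    vocab = Counter()
    for word in corpus.split():
        vocab[tuple(word) + ('</w>',)] += 1
    merges = []
    for _ in range(num_merges):
        # count adjacent symbol pairs, weighted by word frequency
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # rewrite every word with the chosen pair merged
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges
```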
Relative bias addition amounts to row-wise additions of permutations of a subset of the bias vector; need to find a way to get rid of the for loop and do it in one operation. Definitely parallelizable.
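One way the loop could collapse into a single call, assuming a T5-style setup where each row gathers a shifted window of bias buckets (the helper names and bucket layout below are my assumptions): NumPy fancy indexing does the whole 2-D gather at once.

```python
import numpy as np

def make_rel_idx(seq_len, num_buckets):
    """Hypothetical index matrix: row i holds clipped relative positions
    j - i, shifted into [0, num_buckets) — each row is a shifted window
    (a permutation of a subset) of the bias vector's indices."""
    rel = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]
    return np.clip(rel + num_buckets // 2, 0, num_buckets - 1)

def bias_loop(bias, idx):
    # Loop version: gather the permuted bias slice one row at a time.
    out = np.empty(idx.shape, dtype=bias.dtype)
    for i in range(idx.shape[0]):
        out[i] = bias[idx[i]]
    return out

def bias_vectorized(bias, idx):
    # Same gather as one fancy-indexing call: the 2-D index array
    # broadcasts over the 1-D bias vector, no Python loop.
    return bias[idx]
```

The vectorized form is what you would add directly to the attention logits before softmax.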