Introduction
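- Pretrained NLP models keep getting larger
- GPU/TPU resources become a bottleneck
- Training takes too long
- The paper introduces two parameter-reduction techniques
- The result is ALBERT: about 1/18 of the parameters, with similar performance!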
Factorized embedding parameterization
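- In the original BERT, the input embedding size E and the hidden layer size H are the same
- From a modeling perspective, E (= WordPiece embeddings) is meant to learn context-independent information
- H is meant to learn context-dependent information
- BERT's strength lies in learning the context-dependent information
- So, to learn those relationships effectively, the model is set up with H >> E
- But then the dimensions no longer match? A matrix multiplication is inserted in between to match them.
- The original embedding maps the vocabulary through a single V (vocab size) x H matrix (since E = H there)
- This becomes (V x E) + (E x H): one extra matrix is added to match the sizes (hence "factorized")
- I vaguely remember seeing a similar technique in a talk on model compression during my internship at 카엔
- Since E is set smaller than the original H, the parameter savings are large
- Performance does not drop much either (a decent trade-off?)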
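The savings from the factorization above can be checked with simple arithmetic. The following is a minimal sketch, not code from the paper: the function names are hypothetical, the sizes V = 30,000, E = 128, H = 768 are assumed purely for illustration, and bias terms in the projection are ignored.

```python
def embedding_parameters(vocab_size: int, hidden_size: int) -> int:
    """Parameters of a single V x H embedding matrix (the E = H case)."""
    return vocab_size * hidden_size


def factorized_embedding_parameters(
    vocab_size: int, embedding_size: int, hidden_size: int
) -> int:
    """Parameters of a V x E lookup followed by an E x H projection."""
    return vocab_size * embedding_size + embedding_size * hidden_size


# Illustrative sizes, assumed for this sketch.
V, E, H = 30_000, 128, 768

full = embedding_parameters(V, H)
factorized = factorized_embedding_parameters(V, E, H)

print(f"V x H:             {full:,}")
print(f"V x E + E x H:     {factorized:,}")
print(f"reduction factor:  {full / factorized:.2f}x")
```

The saving grows as E shrinks relative to H, because the V x E term dominates whenever the vocabulary is much larger than the hidden size.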
Cross-layer parameter sharing
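- Parameters are shared across Transformer layers
- It can be viewed as a Recursive Transformer
- Sharing the FFN (Feed Forward Network) lowers performance somewhat
- The paper describes some performance gaps as small and others as meaningful. What score threshold separates the two?
- But the paper gives no interpretation. Why not? And why does sharing work this well?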
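To make the sharing idea concrete, here is a toy sketch in which one layer object is reused at every depth. `ToyLayer` is a hypothetical stand-in with a single weight vector, not a real attention/FFN block, and all names are assumptions made for illustration.

```python
from typing import List


class ToyLayer:
    """Stand-in for a Transformer layer: holds one weight vector."""

    def __init__(self, size: int) -> None:
        self.weights = [0.5] * size

    def __call__(self, x: List[float]) -> List[float]:
        # Toy transform (not attention/FFN): scaled input plus a residual.
        return [xi + wi * xi for xi, wi in zip(x, self.weights)]


def build_encoder(num_layers: int, size: int, share: bool) -> List[ToyLayer]:
    """Return a stack of layers; with share=True every slot is the same object."""
    if share:
        layer = ToyLayer(size)
        return [layer] * num_layers
    return [ToyLayer(size) for _ in range(num_layers)]


def count_unique_parameters(layers: List[ToyLayer]) -> int:
    """Count weights once per distinct layer object."""
    unique = {id(layer): layer for layer in layers}
    return sum(len(layer.weights) for layer in unique.values())


def encode(layers: List[ToyLayer], x: List[float]) -> List[float]:
    """Apply the layers in order; with sharing this is one function applied repeatedly."""
    for layer in layers:
        x = layer(x)
    return x


shared = build_encoder(num_layers=12, size=4, share=True)
unshared = build_encoder(num_layers=12, size=4, share=False)

print(count_unique_parameters(shared))    # weights stored once
print(count_unique_parameters(unshared))  # weights stored per layer
```

The depth of computation is the same in both stacks; only the number of stored parameters differs, which is why the shared version reads like a recursive application of one layer.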
Inter-sentence coherence loss
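- Earlier papers (e.g., RoBERTa) reported that dropping NSP (Next Sentence Prediction) works better, but
- that is because deciding whether a sentence is the next one largely reduces to topic prediction, which is too easy a task compared to MLM
- SOP (sentence order prediction) is proposed instead
- Given two sentences, the task is to tell whether they are in the original or the reversed order
- Performance improves! (A model trained on NSP cannot solve SOP, but one trained on SOP can solve NSP.)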
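The SOP setup above can be sketched as a small data-preparation step. This is an illustrative assumption rather than the paper's pipeline: the function name, the label convention (1 = original order, 0 = swapped), the 50/50 swap probability, and the sample sentences are all hypothetical.

```python
import random
from typing import List, Tuple

# (first segment, second segment, label); label 1 = original order, 0 = swapped.
SopExample = Tuple[str, str, int]


def make_sop_examples(segments: List[str], rng: random.Random) -> List[SopExample]:
    """Build one SOP example from each pair of consecutive segments."""
    examples: List[SopExample] = []
    for first, second in zip(segments, segments[1:]):
        if rng.random() < 0.5:
            examples.append((first, second, 1))  # keep the original order
        else:
            examples.append((second, first, 0))  # swap to make a negative
    return examples


# Hypothetical document, split into consecutive segments.
document = [
    "The model is pretrained on a large corpus.",
    "It is then fine-tuned on a downstream task.",
    "Finally, it is evaluated on a held-out set.",
]

examples = make_sop_examples(document, random.Random(0))
for first, second, label in examples:
    print(label, "|", first, "|", second)
```

Both segments always come from the same document, so topic cues alone cannot separate positives from negatives; the model has to pick up on ordering and coherence.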
from deep-papers.
Helpful resources