
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (deep-papers issue, closed)

subinium commented on June 17, 2024

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

from deep-papers.

Comments (2)

subinium commented on June 17, 2024

Introduction

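Pretrained NLP models keep getting bigger:
- GPU/TPU memory becomes a bottleneck
- training takes far too long
The paper introduces 2 parameter-reduction techniques.
The result is ALBERT: a BERT-large-like configuration with about 1/18 the parameters, yet similar performance!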

Factorized embedding parameterization

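In the original BERT, the embedding size E and the hidden size H are the same.
- From a modeling perspective, E (the WordPiece embedding) is meant to learn context-independent information,
- while H is meant to learn context-dependent information.
- BERT's strength comes precisely from learning context-dependent representations,
- so to learn those relationships effectively, ALBERT makes H >> E.
- But then the sizes don't match??? An extra matrix multiplication in between fixes that.
  - Tokens are first embedded with a V (vocab size) x E matrix,
  - then one more E x H matrix projects them up to H, so the embedding costs (V x E) + (E x H) parameters instead of V x H (hence "factorized").
    - I vaguely remember seeing a similar technique when model compression was explained to me during my internship at Kakao Enterprise.
  - Since E is much smaller than H, this saves a lot of parameters.
  - And performance doesn't drop much (quite a good trade-off?)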
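A minimal plain-Python sketch of the parameter arithmetic and the two-step lookup described above. The sizes (V = 30000, H = 768, E = 128) are illustrative assumptions, and `embedding_params` / `factorized_embed` are hypothetical helper names, not code from the paper.

```python
from typing import List, Optional


def embedding_params(vocab_size: int, hidden_size: int,
                     embed_size: Optional[int] = None) -> int:
    """Parameter count of the token-embedding block.

    Without embed_size: BERT-style V x H table.
    With embed_size: ALBERT-style V x E table plus an E x H projection.
    """
    if embed_size is None:
        return vocab_size * hidden_size
    return vocab_size * embed_size + embed_size * hidden_size


def factorized_embed(token_id: int,
                     word_table: List[List[float]],   # V x E
                     projection: List[List[float]],   # E x H
                     ) -> List[float]:
    """Look up an E-dim vector, then project it up to H dims (vector-matrix product)."""
    e = word_table[token_id]          # length E
    hidden = len(projection[0])       # H
    return [sum(e[k] * projection[k][j] for k in range(len(e)))
            for j in range(hidden)]


V, H, E = 30000, 768, 128  # illustrative sizes (assumption)
print(embedding_params(V, H))     # 23040000
print(embedding_params(V, H, E))  # 3938304

# Toy lookup: V=3, E=2, H=3
print(factorized_embed(2, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                       [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # [5.0, 7.0, 9.0]
```

With these sizes the embedding block shrinks by roughly 6x; the saving grows as H grows while E stays fixed.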

Cross-layer parameter sharing

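Parameters are shared across Transformer layers.
You can think of it as a recursive Transformer (one layer applied over and over).
Sharing the FFN (Feed Forward Network) parameters costs some performance.
- The paper says the performance difference isn't large, but by what score threshold does it decide whether there is a difference?
And the paper gives no explanation. Why? Why does this work so well? Argh.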
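A sketch of what sharing buys in parameter count, assuming a standard BERT encoder layer (four H x H attention projections, an FFN with inner size 4H, two LayerNorms, all with biases). The function names are hypothetical, the counts are plain arithmetic rather than figures from the paper, and where the LayerNorms go in the partial-sharing modes is an assumption. `apply_shared` illustrates the "recursive Transformer" view: one layer applied L times.

```python
from typing import Callable, List


def attention_params(h: int) -> int:
    # Q, K, V and output projections: 4 weight matrices (h x h) + 4 bias vectors
    return 4 * h * h + 4 * h


def ffn_params(h: int) -> int:
    inner = 4 * h  # BERT-style FFN width (assumption)
    return h * inner + inner + inner * h + h


def layernorm_params(h: int) -> int:
    return 2 * (2 * h)  # two LayerNorms per layer, each with gain and bias


def encoder_params(h: int, num_layers: int, share: str = "none") -> int:
    """share: 'none', 'attention', 'ffn', or 'all'.

    In the partial modes the LayerNorms stay per-layer (assumption).
    """
    attn, ffn, ln = attention_params(h), ffn_params(h), layernorm_params(h)
    if share == "none":
        return num_layers * (attn + ffn + ln)
    if share == "attention":
        return attn + num_layers * (ffn + ln)
    if share == "ffn":
        return ffn + num_layers * (attn + ln)
    if share == "all":
        return attn + ffn + ln
    raise ValueError(f"unknown sharing mode: {share}")


def apply_shared(layer: Callable[[List[float]], List[float]],
                 x: List[float], num_layers: int) -> List[float]:
    """Recursive view: the *same* layer (same weights) is applied num_layers times."""
    for _ in range(num_layers):
        x = layer(x)
    return x


print(encoder_params(768, 12, "none"))  # 85054464
print(encoder_params(768, 12, "all"))   # 7087872
```

Note that sharing cuts parameters, not compute: the forward pass still runs all L layers, which is why ALBERT is smaller but not proportionally faster.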

Inter-sentence coherence loss

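Earlier papers (e.g., RoBERTa) reported that dropping NSP (Next Sentence Prediction) works better,
but that's because deciding whether one segment follows another mostly reduces to topic prediction, which is too easy compared to MLM.
So the paper proposes SOP (Sentence Order Prediction).
- Given two consecutive segments, predict whether they are in the original order or swapped.
- Performance improves! (A model trained on NSP can't solve SOP, but a model trained on SOP can solve NSP.)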
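A minimal sketch of how SOP training pairs could be built from consecutive segments of one document, with an NSP negative for contrast. The helper names, the 50% swap rate, and the label convention (1 = original order, 0 = swapped/random) are assumptions for illustration.

```python
import random
from typing import List, Tuple

Pair = Tuple[Tuple[str, str], int]


def make_sop_example(seg_a: str, seg_b: str, swap: bool) -> Pair:
    """SOP: both segments come from the same document, so topic can't give the answer.
    Label 1 = original order, 0 = swapped (assumed convention)."""
    if swap:
        return (seg_b, seg_a), 0
    return (seg_a, seg_b), 1


def make_nsp_example(seg_a: str, seg_b: str, random_seg: str, use_random: bool) -> Pair:
    """NSP, for contrast: the negative's second segment comes from another
    document, so a topic mismatch alone often reveals the label."""
    if use_random:
        return (seg_a, random_seg), 0
    return (seg_a, seg_b), 1


def make_sop_examples(segments: List[str], rng: random.Random) -> List[Pair]:
    """Turn each pair of consecutive segments into one SOP example."""
    return [make_sop_example(a, b, swap=rng.random() < 0.5)
            for a, b in zip(segments, segments[1:])]


doc = ["The cat sat down.", "Then it fell asleep.", "It woke up at noon."]
for pair, label in make_sop_examples(doc, random.Random(0)):
    print(label, pair)
```

The key design difference is where the negative comes from: SOP negatives reuse the same two segments, so the model has to learn discourse order rather than topic.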


subinium commented on June 17, 2024

Helpful resources

