
beomi / kcbert


🤗 Pretrained BERT model & WordPiece tokenizer trained on Korean comments

Home Page: https://huggingface.co/beomi/kcbert-base

License: MIT License

bert-model korean-nlp bert nlp transformers

kcbert's Issues

ํŒŒ์ผ์ด ์—†๋Š” ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.

์•ˆ๋…•ํ•˜์„ธ์š”! ์ข‹์€ ๋ชจ๋ธ ๋„ˆ๋ฌด ๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค!
๋‹ค๋ฆ„์ด ์•„๋‹ˆ๋ผ ์ œ๊ฐ€ ๊ฐ€์ง„ ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ ์ถ”๊ฐ€ ํ•™์Šต์‹œํ‚ค๋ ค๊ณ  ํ•˜๋Š”๋ฐ run_mlm.py ํŒŒ์ผ์ด ์—†๋Š” ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.
ํ˜น์‹œ ์‹ค๋ก€๊ฐ€ ์•ˆ๋œ๋‹ค๋ฉด ํŒŒ์ผ์„ ๋ฐ›์„ ์ˆ˜ ์žˆ์„๊นŒ์š” ? ใ…œใ…œ
๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค.

Error in predictions_tr = trainer.predict(dataloaders=model.val_dataloader())

Hello.

I'm a student working on a team project using kcbert.

I wanted to see which labels the trained model predicts for the training data, so after training I ran the following code:

predictions_tr = trainer.predict(dataloaders=model.val_dataloader())

But I keep getting the error below:
TypeError: Model.forward() takes 1 positional argument but 2 were given

bard์—๊ฒŒ ๋ฌผ์–ด๋ด์„œ

class Model(LightningModule): ์•ˆ์—

def training_step(self, batch, batch_idx):
data, labels = batch
output = self(input_ids=data, labels=labels)

def training_step(self, batch, batch_idx):
    data, labels = batch
    output = self.forward(input_ids=data, labels=labels) # self => self.forward

๋ผ๊ณ  ๊ณ ์ณค๋Š”๋ฐ ๊ทธ๋ž˜๋„ ์˜ค๋ฅ˜๋„ ๋‚˜๊ณ  ์–ด๋–ป๊ฒŒ ํ•ด๋ด๋„ ์•ˆ๋˜๋„ค์š”.

If anyone knows how to solve this, I'd appreciate an answer.
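For what it's worth, the traceback points at the call convention rather than at training_step: PyTorch's Module.__call__ hands its arguments straight to forward(), and the predict loop calls the model with the batch as a positional argument, so a forward(self, **kwargs) signature cannot receive it. A minimal pure-Python sketch (toy classes, not the real libraries):

```python
# Toy stand-ins for torch.nn.Module and the two model variants. Module.__call__
# delegates to forward(), as nn.Module does, and the predict loop (like
# Lightning's) calls the model with the batch positionally.

class Module:
    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)

class KwargsOnlyModel(Module):
    def forward(self, **kwargs):              # keyword-only: what the issue's model has
        return kwargs

class BatchFriendlyModel(Module):
    def forward(self, batch=None, **kwargs):  # also accepts one positional batch
        if batch is not None:
            data, labels = batch
            kwargs = {"input_ids": data, "labels": labels}
        return kwargs

broken, fixed = KwargsOnlyModel(), BatchFriendlyModel()

broken(input_ids=[1, 2], labels=[0])          # keyword call: fine
try:
    broken(([1, 2], [0]))                     # positional batch, as predict() passes it
except TypeError as exc:
    print(exc)                                # "... takes 1 positional argument but 2 were given"

print(fixed(([1, 2], [0])))                   # {'input_ids': [1, 2], 'labels': [0]}
```

In real code the usual fix is to let forward accept the batch positionally, or to implement predict_step(self, batch, batch_idx) on the LightningModule, rather than renaming self(...) to self.forward(...) (which changes nothing, since __call__ delegates to forward anyway).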

Question about pooler_num_attention_heads

Hello, thank you for sharing the model and hyperparameters.
I'm trying to train with them as a reference, but the two entries below don't appear in Hugging Face's BertConfig documentation, so I'd like to ask about them.

"pooler_size_per_head": 128,
"pooler_num_attention_heads": 12,

I understood the pooler in BERT as a projection, usually a fully connected layer, applied after the Transformer encoder output to fit the downstream task.
But the BertConfig you shared contains entries suggesting the pooler goes through a multi-head attention layer, and I couldn't confirm this in https://huggingface.co/transformers/model_doc/bert.html#transformers.BertConfig or any other documentation.

A similar question seems to have been raised at huggingface/transformers#788, but it was never answered.
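For context: these pooler_* keys appear to originate from the original google-research/bert TensorFlow configs, and Hugging Face's BertModel (whose pooler is a single dense layer over the [CLS] position) never reads them. PretrainedConfig simply stores unrecognized keyword arguments as attributes, so unknown keys are carried along but unused. A toy stand-in, not the real class:

```python
# Toy sketch of how transformers' PretrainedConfig treats unknown JSON keys:
# they are kept as plain attributes, and nothing in the model ever reads them.

KNOWN_KEYS = {"hidden_size", "num_attention_heads", "vocab_size"}  # illustrative subset

class ConfigSketch:
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)         # unknown keys land here too
        self.unused = sorted(k for k in kwargs if k not in KNOWN_KEYS)

cfg = ConfigSketch(hidden_size=768,
                   pooler_size_per_head=128,
                   pooler_num_attention_heads=12)
print(cfg.unused)   # ['pooler_num_attention_heads', 'pooler_size_per_head']
```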

Training data for pre-training

Hello! Thank you for releasing such a great model and code.
I ran additional training exactly as described at https://beomi.github.io/2021/03/15/KcBERT-MLM-Finetune/,
but the model still doesn't seem to predict [mask] tokens well for data in my domain.
So I'm thinking of building vocab.txt from my own training data, swapping it in, and then continuing training. Is that okay?
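One caution worth noting: replacing vocab.txt outright changes which row of the pretrained embedding matrix each token id points to, which effectively discards the pretraining. The usual alternative in transformers is tokenizer.add_tokens(...) followed by model.resize_token_embeddings(len(tokenizer)), which keeps the old rows and appends fresh ones. A pure-Python sketch of that resize, with toy 2-dimensional rows:

```python
import random

# Sketch of what model.resize_token_embeddings() does when tokens are *added*
# (tokenizer.add_tokens) instead of vocab.txt being replaced: the pretrained
# rows survive, and only the new tokens get freshly initialized rows.

def resize_embeddings(old_rows, new_vocab_size, dim, seed=0):
    rng = random.Random(seed)
    rows = [row[:] for row in old_rows]            # pretrained rows kept intact
    while len(rows) < new_vocab_size:              # fresh rows for new tokens
        rows.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return rows

pretrained = [[0.1, 0.2], [0.3, 0.4]]              # 2 tokens, dim 2
resized = resize_embeddings(pretrained, new_vocab_size=4, dim=2)
print(len(resized), resized[0])                    # 4 [0.1, 0.2]
```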

Hello, I have a question about the Colab code.

First of all, thank you for sharing such good material.
I'm still a beginner, but a student interested in deep learning.

I changed only the dataset loaded in the Colab notebook to my own data and ran training.

First, in the KcBERT Large Colab I changed only pretrained_model to beomi/kcbert-base and tried that,
but the trained model did not show up in the session storage after training finished.

Second, in the kcbert-nsmc Colab I changed only the dataset, and got Exception: Model doesn't exists! Train first!
Also, when I ran predict.py I got FileNotFoundError: [Errno 2] No such file or directory: './model/training_args.bin'.

Is there anything else I need to prepare to run predict.py in Colab?
I'd like to know how to use a model trained on my own data in Colab, or how to use it locally.

Splitting the pre-training corpus file into documents

Hello!
Thank you so much for releasing the corpus and the code.

I'd like to try building KcBERT myself from the released corpus.

The pre-training instructions in the official BERT repository (https://github.com/google-research/bert) say:

> Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the "next sentence prediction" task). Documents are delimited by empty lines.

In particular, the last sentence of the quote says that when a corpus consists of multiple documents, documents should be separated by empty lines.

The released corpus also appears to be many documents merged into a single file, rather than a single document.

However, I don't see any empty lines separating documents in the released corpus.
When you built the model, did you separate the individual documents in the corpus first, or did you train without separating documents, exactly as released?
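As a reference for anyone regenerating the data, the input format that create_pretraining_data.py in google-research/bert expects can be produced like this (hypothetical toy documents):

```python
# Write a pre-training corpus in the format google-research/bert expects:
# one sentence per line, documents delimited by an empty line.

documents = [
    ["First sentence of document one.", "Second sentence of document one."],
    ["Only sentence of document two."],
]

corpus = "\n\n".join("\n".join(doc) for doc in documents) + "\n"
print(corpus)
```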

Hello! I have a question.

Hello ~ first of all, thank you for sharing such good material.
It turns out you're the owner of a blog I used to visit often, haha.

I'm leaving an issue because I have some questions!
I'm a beginner, so apologies if they're basic.

  1. Is there a reason you set the tokenizer vocabulary size to 30,000?
  • I think I saw, perhaps in the BERT paper, that they used 30,000 — is that the reason?
  • Korean (especially the words netizens use) has an enormous variety of words, so I wonder whether 30,000 tokens can give good coverage!
  2. How should I go about fine-tuning?
  • I'd like to fine-tune the pre-trained model you built on my own dataset.
  • How would I fine-tune the model, and (if it's even possible) tune the tokenizer?

Thank you!

KcBERT Pre-Training Corpus (Korean News Comments)

Hello,

I have a question about the KcBERT Pre-Training Corpus (Korean News Comments).
You say the model was trained on this corpus — do the individual comments carry no labels?

Thanks in advance for your answer.

Question about pretraining

Hello.
First of all, thank you for sharing this great project and the data!

Looking at the data you uploaded to Kaggle, the comments don't appear to share any context with one another,
so vanilla BERT's NSP or ALBERT's SOP would seem hard to apply.
I'd like to ask whether you pretrained using only MLM.

Thank you!

Number of examples grows at predict time

Hello, I'm back with another issue .. ;-;
(screenshot)

I tried to train the model on 10,000 examples, but labels keep coming out for more than 10,000 examples. Do you know under what circumstances this happens?

The screenshot shows a test with batch size 1, after batch size 32 kept failing.

Hyperparameter optimization with Optuna and model ensembling

Hello.

I'm a researcher working on text classification with BERT.

I've noticed that examples using kobert or other Korean checkpoints don't seem to optimize hyperparameters with Optuna or use ensembling.
I'm curious whether there are no examples because it isn't very useful, or for some other reason.

If an example exists, I'd appreciate it if you could upload one.

Error in the kcbert-large Colab

(screenshot)

Hello. I've been making good use of the code you released.

I'm trying to run the large notebook in Colab; it worked fine until about a week ago, but suddenly started erroring.
I suspect it may be a version issue, but after wrestling with it for a while I couldn't resolve it, so I'm filing an issue.

If anyone knows a fix, I'd be grateful (_ _)

How to run predictions with kcbert-large

Hello. First of all, thank you so much for releasing kcbert.
I'd like to use it for prediction, but the tutorial doesn't seem to cover it.
How can I do this?
Also, I'd like to skip the validation step — is that possible?
My environment is Colab with a T4 GPU.

How to load a BERT model from a ckpt

Thank you for sharing this great resource.

After training on my own data, I'm stuck on how to load the resulting ckpt into a BERT model.

Could you provide some guidance?
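Not an official answer, but two routes are common. If the checkpoint came from a PyTorch-Lightning script, Model.load_from_checkpoint("path.ckpt") restores the whole LightningModule. To export the weights into a plain Hugging Face model instead, note that the ckpt file is a dict whose "state_dict" entry keys weights by the LightningModule attribute name; assuming that attribute is called bert (an assumption about the training code), stripping the prefix yields keys that load_state_dict() accepts. A toy sketch with strings standing in for tensors:

```python
# Toy .ckpt contents: strings stand in for real tensors.
ckpt = {
    "epoch": 3,
    "state_dict": {
        "bert.embeddings.weight": "tensor-A",
        "bert.encoder.layer.0.weight": "tensor-B",
    },
}

PREFIX = "bert."        # assumed attribute name on the LightningModule
bare_state_dict = {
    key[len(PREFIX):]: value
    for key, value in ckpt["state_dict"].items()
    if key.startswith(PREFIX)
}
print(sorted(bare_state_dict))   # ['embeddings.weight', 'encoder.layer.0.weight']
```

After loading the stripped dict into a transformers model, model.save_pretrained(...) would give a directory loadable with from_pretrained.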

Question about IndexError: Target 2 is out of bounds.

Hello!

I'm a student studying NLP.

I found KcBERT to be a perfect fit for processing news-comment data and have been using it happily. Thank you :)

However, when I use a different dataset with the NSMC fine-tuning code you shared, I hit an error I can't resolve.

At first I got RuntimeError: CUDA error: device-side assert triggered; I read that changing the Colab runtime type to None reveals the actual problem. The error I then encountered is: IndexError: Target 2 is out of bounds.

  • Colab environment (the shared Naver movie-review dataset fine-tuning Large Model notebook): https://colab.research.google.com/drive/1dFC0FL-521m7CL_PSd8RLKq67jgTJVhL?usp=sharing

  • Error: IndexError: Target 2 is out of bounds.

  • Situation: in the Colab environment you provided, the error keeps occurring while fine-tuning not on the Naver movie-review dataset but on a dataset with five sentiment classes (very negative 0, negative 1, neutral 2, positive 3, very positive 4).

  • Dataset size: comment data, train: 11,281 / test: 1,253

  • Dataset format: same as the NSMC dataset, with columns id, document, label, in a tab-separated txt file.

I only changed the data path; nothing else was touched.

What should I do to use a dataset with five labels in this code?

Can you help me figure out what I'm missing?

Full error traceback below.


IndexError Traceback (most recent call last)
in ()
----> 1 main()

18 frames
in main()
18 # tpu_cores=args.tpu_cores if args.tpu_cores else None,
19 )
---> 20 trainer.fit(model)

/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/states.py in wrapped_fn(self, *args, **kwargs)
46 if entering is not None:
47 self.state = entering
---> 48 result = fn(self, *args, **kwargs)
49
50 # The INTERRUPTED state can be set inside the run function. To indicate that run was interrupted

/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
1082 self.accelerator_backend = CPUBackend(self)
1083 self.accelerator_backend.setup(model)
-> 1084 results = self.accelerator_backend.train(model)
1085
1086 # on fit end callback

/usr/local/lib/python3.6/dist-packages/pytorch_lightning/accelerators/cpu_backend.py in train(self, model)
37
38 def train(self, model):
---> 39 results = self.trainer.run_pretrain_routine(model)
40 return results

/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py in run_pretrain_routine(self, model)
1237
1238 # CORE TRAINING LOOP
-> 1239 self.train()
1240
1241 def _run_sanity_check(self, ref_model, model):

/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/training_loop.py in train(self)
392 # RUN TNG EPOCH
393 # -----------------
--> 394 self.run_training_epoch()
395
396 if self.max_steps and self.max_steps <= self.global_step:

/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/training_loop.py in run_training_epoch(self)
489 # TRAINING_STEP + TRAINING_STEP_END
490 # ------------------------------------
--> 491 batch_output = self.run_training_batch(batch, batch_idx)
492
493 # only track outputs when user implements training_epoch_end

/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/training_loop.py in run_training_batch(self, batch, batch_idx)
842 opt_idx,
843 optimizer,
--> 844 self.hiddens
845 )
846 using_results_obj = isinstance(opt_closure_result.training_step_output, Result)

/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/training_loop.py in optimizer_closure(self, split_batch, batch_idx, opt_idx, optimizer, hiddens)
1013 else:
1014 training_step_output = self.training_forward(split_batch, batch_idx, opt_idx,
-> 1015 hiddens)
1016
1017 # ----------------------------

/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/training_loop.py in training_forward(self, batch, batch_idx, opt_idx, hiddens)
1224 # CPU forward
1225 else:
-> 1226 output = self.model.training_step(*args)
1227
1228 is_result_obj = isinstance(output, Result)

in training_step(self, batch, batch_idx)
15 def training_step(self, batch, batch_idx):
16 data, labels = batch
---> 17 loss, logits = self(input_ids=data, labels=labels)
18 preds = logits.argmax(dim=-1)
19

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

in forward(self, **kwargs)
11
12 def forward(self, **kwargs):
---> 13 return self.bert(**kwargs)
14
15 def training_step(self, batch, batch_idx):

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
1340 else:
1341 loss_fct = CrossEntropyLoss()
-> 1342 loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
1343
1344 if not return_dict:

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py in forward(self, input, target)
946 def forward(self, input: Tensor, target: Tensor) -> Tensor:
947 return F.cross_entropy(input, target, weight=self.weight,
--> 948 ignore_index=self.ignore_index, reduction=self.reduction)
949
950

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
2420 if size_average is not None or reduce is not None:
2421 reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2422 return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
2423
2424

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
2216 .format(input.size(0), target.size(0)))
2217 if dim == 2:
-> 2218 ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
2219 elif dim == 4:
2220 ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

IndexError: Target 2 is out of bounds.
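The traceback bottoms out in cross-entropy, which indexes the logits by the target label: a classification head built with the default num_labels=2 only has columns 0 and 1, so label 2 (neutral) falls out of bounds. The fix is to create the head with five outputs, e.g. BertForSequenceClassification.from_pretrained(..., num_labels=5) or the equivalent option in the shared notebook. A pure-Python sketch of the bound check:

```python
# Stand-in for the indexing cross-entropy performs: the target label selects a
# column of the logits row, so it must be < num_labels.

def nll_for_target(logits_row, target):
    if not 0 <= target < len(logits_row):
        raise IndexError(f"Target {target} is out of bounds.")
    return -logits_row[target]            # stand-in for -log softmax

two_class = [0.3, 0.7]                    # default num_labels=2
try:
    nll_for_target(two_class, 2)          # "neutral" label from the 5-class data
except IndexError as exc:
    print(exc)                            # Target 2 is out of bounds.

five_class = [0.1, 0.2, 0.3, 0.2, 0.2]    # head rebuilt with num_labels=5
print(nll_for_target(five_class, 2))      # -0.3
```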
