📚 Papers

GPT-1: Improving Language Understanding by Generative Pre-Training

2023. 6. 20. 02:58
Contents
  1. Abstract
  2. Introduction
  3. Related Work
  4. Framework
    1. Unsupervised pre-training
    2. Supervised fine-tuning
    3. Task-specific input transformation
  5. Experiments

Abstract

Natural language์—๋Š” unlabeled text์˜ ๋ฐ์ดํ„ฐ ์ˆ˜๊ฐ€ labeled text์˜ ๋ฐ์ดํ„ฐ ์ˆ˜๋ณด๋‹ค ํ›จ์”ฌ ๋งŽ๋‹ค. ํ•ด๋‹น ์‚ฌ์‹ค์— ๊ทผ๊ฑฐํ•˜์—ฌ OpenAI์—์„œ๋Š” ๋‹ค์–‘ํ•œ unlabeled text๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ generative ํ•˜๊ฒŒ pre-train ์‹œํ‚จ GPT ๋ชจ๋ธ์„ ์ œ์‹œํ–ˆ๋‹ค. ํ•ด๋‹น ๋ชจ๋ธ์€ ์ด์ „ ๋ชจ๋ธ๋“ค๋ณด๋‹ค ํ›จ์”ฌ ๋›ฐ์–ด๋‚œ ์„ฑ๋Šฅ์„ ๋ณด์—ฌ ๊ทธ ๊ฒฐ๊ณผ๋ฅผ ์ฆ๋ช…ํ–ˆ๋‹ค.

Introduction

Extracting more than word-level information from unlabeled data is hard for two reasons:

  1. It is unclear which optimization objectives are best for learning text representations that are useful for transfer.
  2. There is no consensus on the best way to transfer learned representations to the target task.

This uncertainty has made it difficult to develop effective semi-supervised learning for NLP.

ํ•ด๋‹น ๋…ผ๋ฌธ์—์„œ๋Š” unsupervised pre-training๊ณผ supervised fine-tuning์„ ๊ฒฐํ•ฉํ•œ semi-supervised ์ ‘๊ทผ์„ ์‹œ๋„ํ•œ๋‹ค.

์ด ๋…ผ๋ฌธ์˜ ๋ชฉ์ ์€ ์ ์€ ๋ณ€ํ™”๋กœ ๋‹ค์–‘ํ•œ ์ž‘์—…์— transfer ํ•  ์ˆ˜ ์žˆ๋Š” ๋ณดํŽธ์ ์ธ ํ‘œํ˜„๋“ค์„ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์ด๋‹ค. ์ด๋ฅผ ์œ„ํ•ด์„œ๋Š” ๋Œ€๋Ÿ‰์˜ corpus of unlabeled text๊ฐ€ ํ•„์š”ํ•˜๊ณ , ๋ชฉํ‘œ ์ž‘์—…์„ ์œ„ํ•œ labeled data๊ฐ€ ํ•„์š”ํ•˜๋‹ค. ์ด๋ฅผ ์œ„ํ•ด ๋‘ ๊ฐ€์ง€ ๊ณผ์ •์„ ๋งŒ๋“ค์—ˆ๋‹ค:

  1. Unlabeled data์— language modeling objective๋ฅผ ์ ์šฉ์‹œ์ผœ ์ดˆ๊ธฐ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ํ•™์Šตํ•˜๊ฒŒ ํ–ˆ๋‹ค.
  2. ํ•ด๋‹น ํŒŒ๋ผ๋ฏธํ„ฐ๋“ค์„ labeled data๋ฅผ ์ด์šฉํ•˜์—ฌ target task์— fine-tuning์‹œํ‚จ๋‹ค.

๋ชจ๋ธ ์•„ํ‚คํ…์ฒ˜๋กœ๋Š” transformer๋ฅผ ์‚ฌ์šฉํ•˜๋Š”๋ฐ, ์ด ๋ชจ๋ธ์€ long-term(๊ธด) ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด์„œ๋„ RNN๊ณผ ๊ฐ™์€ ์ด์ „ ๋ชจ๋ธ๋“ค๋ณด๋‹ค ํ›จ์”ฌ robust(ํŠผํŠผ)ํ•œ ๊ฒฐ๊ณผ๋ฌผ์„ ๋‚ด๋†“๋Š”๋‹ค. transfer์‹œ์—๋Š” traversal-style ์ ‘๊ทผ ๊ธฐ๋ฐ˜์—์„œ ์‚ฌ์šฉ๋œ task์— ํŠน์ •์ ์ธ input adaptation์„ ์‚ฌ์šฉํ•œ๋‹ค. ์ด๋Š” task๋งˆ๋‹ค ์š”๊ตฌํ•˜๋Š” input text๋ฅผ ์—ฐ์†์ ์ธ ์‹ฑ๊ธ€ ์‹œํ€€์Šค๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๊ฒƒ์ด๋‹ค. ์ฆ‰ task์— ๋งž๋Š” ๋ฏธ์„ธ์กฐ์ •์„ ์œ„ํ•ด์„œ pre-trained ๋ชจ๋ธ์˜ ์ถœ๋ ฅ์„ ๋ณ€ํ˜•์‹œํ‚จ ๊ฒƒ์ด๋‹ค. ์ด ๋•Œ๋ฌธ์— pre-trained ๋ชจ๋ธ์˜ ์ถœ๋ ฅ๋งŒ ๋ฐ”๊ฟ”๋„ ๋‹ค์–‘ํ•œ ์ž‘์—…์— ๋ฏธ์„ธ์กฐ์ •์ด ๊ฐ€๋Šฅํ•˜๋‹ค.

Related Work

Semi-supervised learning for NLP: early work used unlabeled data to compute word-level or phrase-level statistics, which were then fed into a supervised model as features. Over the past few years, word embeddings trained on unlabeled corpora have brought performance gains across a wide range of tasks. These approaches, however, mainly transfer word-level information, whereas this paper aims to transfer higher-level information. Accordingly, recent work has begun learning phrase-level and sentence-level embeddings from unlabeled data, going beyond word-level semantics.

 

Unsupervised pre-training: unsupervised pre-training is a special case of semi-supervised learning in which the goal is to find a good initialization point. Early work applied it to image classification and regression, and later studies showed that pre-training acts as an effective regularizer, improving generalization in deep neural networks.

The closest work to this paper pre-trains with a language modeling objective and then fine-tunes for a target task. That work, however, used LSTMs to capture linguistic information during pre-training, which limited the model's effectiveness to short ranges of text. This paper instead uses a transformer, which remains effective over long ranges, and it works across a broader set of tasks, including natural language inference, paraphrase detection, and story completion. Other lines of work use hidden representations taken from a pre-trained or machine translation model as auxiliary features during supervised training on the target task, but this requires a substantial number of new parameters for every task. In contrast, the model in this paper requires only minimal architectural changes at transfer time.

 

Auxiliary training objectives: adding an auxiliary unsupervised objective is an alternative form of semi-supervised learning. Early work used auxiliary NLP tasks such as POS tagging to improve semantic role labeling, and more recent work added an auxiliary language modeling objective to the target-task objective, demonstrating gains on sequence labeling. The experiments in this paper also use an auxiliary objective, although unsupervised pre-training by itself already learns linguistic information relevant to the target tasks.

  • Transfer information above the word level (phrase level, sentence level, etc.)
  • The goal of unsupervised pre-training is to find a good initialization point.
  • Using a transformer keeps the model effective over long ranges of text.
  • Transfer requires only minimal changes to the model.

Framework

์œ„์—์„œ ๋งํ–ˆ๋“ฏ์ด, GPT์˜ ํ•™์Šต์€ unsupervised pre-training๊ณผ supervised fine-tuning์˜ ๋‹จ๊ณ„๋กœ ์ด๋ฃจ์–ด์ง„๋‹ค.

1. Unsupervised pre-training

GPT uses only the transformer's decoder blocks: given the input embeddings, it runs them through a stack of decoder blocks and predicts the output from the result.

์œ„ ์‹์—์„œ ์œ ์ถ”ํ•  ์ˆ˜ ์žˆ๋“ฏ์ด, ๋ฐ”๋กœ ์ „ ๋‹จ๊ณ„์—์„œ k๋ฒˆ์งธ ์ด์ „ ๋‹จ๊ณ„๊นŒ์ง€์˜ token๋“ค์„ ์‚ดํŽด๋ณธ ์ดํ›„์—, ๊ทธ๊ฒƒ์„ ๋ฐ”ํƒ•์œผ๋กœ i๋ฒˆ์งธ ๋‹จ์–ด๊ฐ€ ๋ฌด์—‡์ธ์ง€์— ๋Œ€ํ•œ likelihood๋ฅผ ์ตœ๋Œ€ํ™”์‹œํ‚ค๋Š” ๊ฒƒ์ด unsupervied pre-training์˜ ๋ชฉ์ ์ด๋‹ค.

๋” ์ž์„ธํ•œ ์‹์œผ๋กœ ๋ณด์ž๋ฉด,

์œ„ ์‹์—์„œ ๊ฐ ๋ณ€์ˆ˜๋“ค์˜ ์˜๋ฏธ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค:

  • U = (u_(-k),...., u_(-1)): token๋“ค์˜ context vector
  • n: layer์˜ ๊ฐœ์ˆ˜ (์Œ“์•„ ์˜ฌ๋ฆฐ decoder block์˜ ๊ฐœ์ˆ˜, ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ)
  • We: token embedding matrix
  • Wp: position embedding matrix

์šฐ์„  ํ† ํฐ๋“ค์˜ context vector๊ฐ€ ์ž…๋ ฅ๋˜๊ณ  token embedding, position embedding์˜ ์ž‘์—…์„ ๊ฑฐ์ณ h0๊ฐ€ ์ƒ์„ฑ๋œ๋‹ค.

Then h_(l−1) is passed through each of the n transformer decoder blocks in turn, and the final hidden state h_n is projected with the transposed token embedding matrix and run through a softmax to produce the output distribution.
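The forward pass (h_0 = U·W_e + W_p, then n decoder blocks, then a softmax over the vocabulary) can be sketched with numpy. The `transformer_block` below is a placeholder (a real block contains masked self-attention and a feed-forward network), and all dimensions are toy values chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model, k, n_layers = 10, 4, 5, 2

W_e = rng.standard_normal((vocab, d_model)) * 0.1  # token embedding matrix
W_p = rng.standard_normal((k, d_model)) * 0.1      # position embedding matrix

def transformer_block(h):
    """Stand-in for a real decoder block (masked self-attention + feed-forward)."""
    return np.tanh(h)  # placeholder transformation, not the actual block

def forward(context):
    """h0 = U @ W_e + W_p; h_l = block(h_(l-1)); P(u) = softmax(h_n @ W_e.T)."""
    U = np.eye(vocab)[context]           # one-hot context tokens, shape (k, vocab)
    h = U @ W_e + W_p[:len(context)]     # h0: embeddings plus positions
    for _ in range(n_layers):            # n stacked decoder blocks
        h = transformer_block(h)
    logits = h @ W_e.T                   # project back to the vocabulary
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)  # next-token distribution per position

p = forward([1, 2, 3, 4, 5])
# p[-1] is the model's distribution over the next token
```

Reusing W_e for the output projection (weight tying) mirrors the paper's P(u) = softmax(h_n·W_e^T).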

A key point is that the tokens are processed with masked self-attention.
(Masked self-attention: when processing a token, the tokens that come after it in the sequence are not used.)
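The causal mask behind masked self-attention can be shown directly; this is a minimal numpy sketch of the masking step, not the paper's implementation:

```python
import numpy as np

def causal_mask(seq_len):
    """Lower-triangular mask: entry [i, j] is True iff position i may attend to j (j <= i)."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def masked_attention_weights(scores, mask):
    """Softmax over attention scores with future positions set to -inf before normalizing."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

m = causal_mask(3)
w = masked_attention_weights(np.zeros((3, 3)), m)
# row 0 attends only to position 0; row 2 attends uniformly to positions 0..2
```

With uniform (zero) scores, each row spreads its attention evenly over the positions it is allowed to see, which makes the effect of the mask easy to check.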

 

2. Supervised fine-tuning

y๋ผ๋Š” label์ด ์ฃผ์–ด์ง„ x_1๋ถ€ํ„ฐ x_m๊นŒ์ง€์˜ ํ† ํฐ๋“ค์˜ sequence๊ฐ€ input์œผ๋กœ ๋“ค์–ด์˜ค๊ฒŒ ๋˜๋ฉด, ํ•ด๋‹น input๋“ค์€ pre-trained model์— ๋“ค์–ด๊ฐ€ final transformer block's activation h_l^m์„ ์–ป๊ฒŒ ๋œ๋‹ค.

h_l^m is then fed into a newly added linear output layer to make the prediction.

์ฆ‰, GPT์˜ unsupervised hidden state์ธ x^m์˜ hidden state block์„ ๊ฐ€์ ธ๋‹ค๊ฐ€ linear layer์— ๋„ฃ๊ณ , softmax ํ•จ์ˆ˜๋ฅผ ๊ฑฐ์ณ ์ตœ์ข… ํ™•๋ฅ ์„ ๊ณ„์‚ฐํ•œ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค. ํ•ด๋‹น ํ™•๋ฅ ์ด ์•„๋ž˜ ๊ทธ๋ฆผ์˜ L2๊ฐ€ ๋œ๋‹ค.

์ดํ›„, ์ €์ž๋“ค์€ ์œ„ ๋‹จ๊ณ„๋“ค์„ ํ†ตํ•ด ๋” ํšจ๊ณผ์ ์œผ๋กœ ํ•™์Šตํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์•Œ์•„๋ƒˆ๋‹ค.

1. Pre-train the language model with L1(U), and

2. once a task-specific dataset is available, maximize a combination of the language modeling objective on that dataset and the supervised objective, L3(C) = L2(C) + λ·L1(C). They showed this yields better performance than the supervised objective alone.
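The combined fine-tuning objective, L3(C) = L2(C) + λ·L1(C), can be written directly; λ weights the auxiliary language modeling term (the paper uses λ = 0.5):

```python
def combined_objective(l2, l1, lam=0.5):
    """L3(C) = L2(C) + lam * L1(C): the supervised objective plus a weighted
    auxiliary language modeling objective on the task dataset.

    lam = 0.5 is the weight used in the paper.
    """
    return l2 + lam * l1

# Toy usage with illustrative log-likelihood values:
l3 = combined_objective(l2=-1.2, l1=-3.0)  # -1.2 + 0.5 * (-3.0) = -2.7
```

Both terms are log-likelihoods to be maximized, so the auxiliary term acts as a regularizer that keeps the model a good language model while it adapts to the task.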

3. Task-specific input transformation

์œ„์™€ ๊ฐ™์ด classification, entailment, similarity, multiple choice ๋“ฑ์˜ task๊ฐ€ ์žˆ๋‹ค.

๊ฐ๊ฐ์˜ task์— ๋Œ€ํ•˜์—ฌ input์„ ์œ„์™€ ๊ฐ™์ด ๋‹ค๋ฅด๊ฒŒ ๋งŒ๋“ค๋ฉด ํ›จ์”ฌ ํšจ๊ณผ์ ์ด์—ˆ์Œ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

 

Experiments

์œ„ ๊ทธ๋ฆผ์—์„œ, ์™ผ์ชฝ์—์„œ ๋ณด๋“ฏ์ด, decoding block์„ ์Œ“์„์ˆ˜๋ก ์„ฑ๋Šฅ์ด ์ ์  ์ข‹์•„์กŒ๋‹ค. (๋…ผ๋ฌธ์—์„œ๋Š” ์ตœ๋Œ€ 12๊ฐœ๊นŒ์ง€ ์Œ“์•˜๋‹ค.)

The right plot compares fine-tuning against zero-shot use of the pre-trained model, showing that fine-tuning further improves performance.

 
