📚 Papers

GPT-1: Improving Language Understanding by Generative Pre-Training

Abstract: In natural language, unlabeled text is far more abundant than labeled text. Building on this fact, OpenAI proposed the GPT model, which is generatively pre-trained on a diverse body of unlabeled text. The model clearly outperformed earlier models, demonstrating the effectiveness of the approach. Introduction: Extracting more than word-level information from unlabeled data is difficult for two reasons: (1) it is unclear which kind of optimization objective works best for learning text representations that are useful for transfer; (2) for transferring the learned representations to the target task, the best..

Attention is All You Need

Background: Seq2Seq model. In this approach, the Encoder and the Decoder are each built from an RNN. How it works: when the three tokens ‘나는’ ("I"), ‘호두를’ ("walnuts", as the object), and ‘사랑해’ ("love") are fed sequentially into the LSTM cells, a hidden state is output for each token in turn. Once the hidden states for all tokens have been output, the last hidden state serves as a vector that compresses the information in the input, and it is called the Context Vector. Starting from the Context Vector, feeding in subsequent tokens produces the hidden states used to predict the next token. Problem: when the sequence becomes long, the vanishing gradient problem arises, so information about tokens early in the sequence is lost from the Context Vector..
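As a rough illustration of the encoder step described in this excerpt, here is a minimal sketch in plain Python. It is not the post's or the paper's code: it swaps the LSTM cell for a simple tanh RNN cell to stay short, and the names `rnn_step`, `encode`, `random_matrix`, the layer sizes, and the random weights are all assumptions made up for the example.

```python
import math
import random


def rnn_step(x, h, W_x, W_h):
    """One plain tanh RNN step (a stand-in for an LSTM cell): tanh(W_x x + W_h h)."""
    size = len(h)
    return [
        math.tanh(
            sum(W_x[i][j] * x[j] for j in range(len(x)))
            + sum(W_h[i][j] * h[j] for j in range(size))
        )
        for i in range(size)
    ]


def encode(token_ids, embeddings, W_x, W_h, hidden_size):
    """Feed tokens one at a time; return all hidden states and the context vector."""
    h = [0.0] * hidden_size  # initial hidden state
    hidden_states = []
    for t in token_ids:
        h = rnn_step(embeddings[t], h, W_x, W_h)
        hidden_states.append(h)
    # The last hidden state is what the excerpt calls the Context Vector.
    context_vector = hidden_states[-1]
    return hidden_states, context_vector


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights, only for demonstration."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
vocab = {"나는": 0, "호두를": 1, "사랑해": 2}
embed_size, hidden_size = 4, 3  # assumed toy dimensions
embeddings = random_matrix(len(vocab), embed_size, rng)
W_x = random_matrix(hidden_size, embed_size, rng)
W_h = random_matrix(hidden_size, hidden_size, rng)

tokens = ["나는", "호두를", "사랑해"]
hidden_states, context_vector = encode(
    [vocab[t] for t in tokens], embeddings, W_x, W_h, hidden_size
)
print(len(hidden_states))                   # one hidden state per input token
print(context_vector == hidden_states[-1])  # the context vector is the last one
```

Because every token has to pass through the same fixed-size hidden state, a longer input leaves less room for what the early tokens contributed, which is the bottleneck the excerpt goes on to describe.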

장영준
Posts in the '📚 Papers' category (Page 3)