๐Ÿ“š ์Šคํ„ฐ๋””/CS224N

[CS224N] 4. Syntactic Structure and Dependency Parsing

์žฅ์˜์ค€ 2023. 11. 18. 03:31

4๋ฒˆ์งธ ๊ฐ•์˜๋Š” ๋ฌธ์žฅ์— ๋Œ€ํ•œ ๋ถ„์„ ๋ฐฉ๋ฒ•์— ๋‹ค๋ฃฌ๋‹ค. ํŠนํžˆ Dependency Parsing ๊ธฐ๋ฒ•์— ๋Œ€ํ•ด ์„ค๋ช…ํ•˜๋Š”๋ฐ, ๊ทธ๋™์•ˆ์˜ ๋ฐฉ์‹๋“ค๊ณผ ํ˜„๋Œ€์˜ neural dependency parsing ๋ฐฉ์‹์— ๋Œ€ํ•ด ์†Œ๊ฐœํ•œ๋‹ค.

1. Two views of linguistic structure

๋ฌธ์žฅ์˜ ๊ตฌ์กฐ๋ฅผ ํŒŒ์•…ํ•˜๋Š” ๊ฒƒ์€ ๋‘ ๊ฐ€์ง€๊ฐ€ ์žˆ๋Š”๋ฐ, ํ•˜๋‚˜๋Š” Constituency parsing, ๋‹ค๋ฅธ ํ•˜๋‚˜๋Š” Dependency parsing์ด๋‹ค.

๊ฐ„๋‹จํ•˜๊ฒŒ Consitituency parsing์€ ๋ฌธ์žฅ์˜ ๊ตฌ์„ฑ์š”์†Œ๋ฅผ ํ†ตํ•ด ๋ฌธ์žฅ ๊ตฌ์กฐ๋ฅผ ๋ถ„์„ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๊ณ , Dependency parsing์€ ๋‹จ์–ด ๊ฐ„ ์˜์กด ๊ด€๊ณ„๋ฅผ ํ†ตํ•ด ๊ตฌ์กฐ๋ฅผ ๋ถ„์„ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. ์กฐ๊ธˆ ๋” ๊นŠ๊ฒŒ ๋“ค์–ด๊ฐ€ ๋ณด์ž.

1. Constituency Parsing: Context-Free Grammars (CFGs)

Context free grammars๋ž€, ์˜ˆ์ „์— ์˜์–ดํ•™์›์—์„œ ๋ฌธ์žฅ ๋ถ„์„ํ•  ๋•Œ ๋‹จ์–ด ๋ฐ‘์—๋‹ค๊ฐ€ ํ’ˆ์‚ฌ๋ฅผ ์จ๋†“๋Š” ๋ฐฉ์‹์„ ์ƒ๊ฐํ•˜๋ฉด ํŽธํ•˜๋‹ค. 

์•„๋ž˜์ฒ˜๋Ÿผ ๋‹จ์–ด๋กœ ์‹œ์ž‘ํ•˜๋ฉด, the-Det, cat-N, cuddly-Adj, by-P, door-N ์ด ๋œ๋‹ค.

์ด ๋‹จ์–ด๋“ค์„ ์ ˆ๋กœ ๋ฐ”๊พธ๋ฉด the cuddly cat - Noun phrase, by the door - Preposition phrase๊ฐ€ ๋œ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์ด๋Ÿฐ ์ ˆ์€ ๊ฒฐํ•ฉ๋˜์–ด ๋” ํฐ ์ ˆ์„ ๋งŒ๋“ค ์ˆ˜๋„ ์žˆ๋‹ค.

๊ทธ๋Ÿฌ๋‚˜ ์ด ๋ฐฉ๋ฒ•์€ ๋ฌธ์žฅ ๊ตฌ์กฐ๋ฅผ ์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด ๋ถ„๋ช… ํ•œ๊ณ„๊ฐ€ ์žˆ๋‹ค. ๋ฐ”๋กœ ์˜๋ฏธ ๊ตฌ์กฐ๋ฅผ ํŒŒ์•…ํ•˜์ง€ ๋ชปํ•œ๋‹ค๊ฑฐ๋‚˜, ๋‹ค๋ฅธ ๋‹จ์–ด์™€์˜ ๊ด€๊ณ„๋ฅผ ์•Œ ์ˆ˜ ์—†๋‹ค๋Š” ์ ์ธ๋ฐ, ๊ฐ€๋ น ๋ฌธ์žฅ์—์„œ ๋Œ€๋ช…์‚ฌ๊ฐ€ ํ•œ ๋ช…์‚ฌ๋ฅผ ๋‚˜ํƒ€๋‚ด๋ ค๋Š” ๊ฒฝ์šฐ๊ฐ€ ์ด์— ํ•ด๋‹นํ•œ๋‹ค.

์ด๋ฅผ ์œ„ํ•ด Dependency Parsing์„ ์‚ฌ์šฉํ•œ๋‹ค.


2. Dependency Parsing

Dependency Parsing์˜ ๊ฐ€์žฅ ํฐ ์žฅ์ ์€ ์ด๊ฒƒ์ด ์–ด๋–ค ๋‹จ์–ด๊ฐ€ ๋‹ค๋ฅธ ์–ด๋–ค ๋‹จ์–ด์— ์˜์กดํ•˜๋Š”์ง€ ๋ณด์—ฌ์ค€๋‹ค๋Š” ์ ์ด๋‹ค.

์˜ˆ์‹œ๋กœ, ๋‹ค์Œ ๋ฌธ์žฅ์„ ์ƒ๊ฐํ•ด ๋ณด์ž:

Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas.

์ด ๋ฌธ์žฅ์„ dependency๋กœ ๋‚˜ํƒ€๋‚ด๋ฉด, ๋‹ค์Œ๊ณผ ๊ฐ™์ด tree ํ˜•ํƒœ๋กœ ๊ทธ๋ ค๋ณผ ์ˆ˜ ์žˆ๋‹ค.

๊ฐ•์˜์—์„œ ์„ค๋ช…ํ•œ ๋ช‡ ๊ฐ€์ง€ dependency structure์„ ๋‚˜ํƒ€๋‚ด๋Š” ๊ทœ์น™์„ ๋ณด๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค:

  1. Add a pseudo-element ROOT so that every word is ultimately headed by ROOT.
  2. Every word has exactly one head, but a single head can have many dependents.
  3. The lecture treats every preposition as a case marker, making it a dependent of the noun it accompanies. In the figure, for example, by appears as one of the dependents of Brownback.
  4. Arrows are drawn from the head to the dependent. The direction is ultimately the drawer's choice, but the lecture keeps this convention for consistency.

์ฐธ๊ณ ๋กœ ์œ„ ๊ทธ๋ฆผ์—์„œ ์„  ์œ„์— ์“ฐ์—ฌ์žˆ๋Š” ๊ฑด ๋‘ ๋‹จ์–ด ๊ฐ„์˜ ๊ด€๊ณ„๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ๊ฒƒ์ด๋‹ค.

Dependency Conditioning Preferences

๊ทธ๋ ‡๋‹ค๋ฉด dependency parsing์˜ sources of information์œผ๋กœ๋Š” ๋ฌด์—‡์ด ์žˆ์„๊นŒ?

  1. Bilexical affinities: how semantically plausible a relation between the two words is.
  2. Dependency distance: most dependencies hold between words that are close together.
  3. Intervening material: dependencies rarely span across an intervening verb or punctuation.
  4. Valency of heads: depending on what the head is, there are fairly regular patterns in the dependents it takes (their kinds and their side - left or right). For instance, the determiner the takes no dependents, while verbs take many. In English, nouns also take many dependents: adjective dependents (usually) sit on the left, while prepositional dependents sit on the right.

Projectivity

projective parse์˜ ์ •์˜์— ์˜๊ฑฐํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์›์น™๋“ค์ด ์ง€์ผœ์ ธ์•ผ ํ•œ๋‹ค.

  • When the dependency arcs are laid out above the sentence, no two arcs may cross.
  • Dependencies read off a CFG tree are necessarily projective; note also that a well-formed analysis has exactly one ROOT.

๋Œ€๋ถ€๋ถ„์˜ ๋ฌธ์žฅ ๊ตฌ์กฐ๋“ค์ด ์ด๋ ‡๊ฒŒ projective ํ•˜์ง€๋งŒ, ๊ฐ€๋” ์ฃผ์–ด์™€ ๋™์‚ฌ์˜ ์œ„์น˜๊ฐ€ ๋ณ€ํ•˜๋Š” ๊ตฌ์กฐ๋ฅผ ์œ„ํ•ด์„œ non-projective ๊ตฌ์กฐ๋ฅผ ํ—ˆ์šฉํ•˜๊ธฐ๋„ ํ•œ๋‹ค. ์˜ˆ์‹œ๋กœ ๋‹ค์Œ ๋ฌธ์žฅ 2๊ฐœ๋ฅผ ๋ณด์ž.

  • Who did Bill buy the coffee from yesterday?
  • From whom did Bill buy the coffee yesterday?

์ด๋ ‡๊ฒŒ ๋‘ ๋ฒˆ์งธ ๋ฌธ์žฅ๊ณผ ๊ฐ™์€ ๊ฒฝ์šฐ, non-projective ํ•œ ๊ตฌ์กฐ๊ฐ€ ํ•„์š”ํ•˜๊ธฐ๋„ ํ•˜๋‹ค.


2. Methods of dependency parsing

Dependency parsing์˜ ๋ฐฉ๋ฒ•์œผ๋กœ๋Š” DP, Graph alogrithms, Constrain Satisfaction ๋“ฑ ๋‹ค์–‘ํ•œ ๋ฐฉ๋ฒ•์ด ๋งŽ์ง€๋งŒ, ๋ณธ ๊ฐ•์˜์—์„œ๋Š” ๊ฐ€์žฅ ์œ ์šฉํ•˜๊ฒŒ ์‚ฌ์šฉ๋˜๋Š” Transition-based parsing์— ๊ด€ํ•ด ๋‹ค๋ฃฌ๋‹ค.

Greedy transition-based parsing

๋จผ์ € ์ด greedy๊ฐ€ ๋ถ™์€ ์ด๋ฆ„์„ ๋ณด์•„, ๊ฐ ์ƒํ™ฉ์— ๋”ฐ๋ผ ๊ฐ€์žฅ ์ตœ๊ณ ์˜ ์ด์ต์„ ์ทจํ•˜๋Š” ๋ฐฉ๋ฒ•๋ก ์ด๋ผ๋Š” ๊ฒƒ์„ ์ƒ๊ฐํ•ด ๋ณผ ์ˆ˜ ์žˆ๋‹ค.

์ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์‚ดํŽด๋ณด๋ฉด, ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ตฌ์„ฑ ์š”์†Œ๊ฐ€ ์žˆ๋‹ค:

  • ๋ฌธ์žฅ์˜ ๋ชจ๋“  ๋‹จ์–ด๋Š” stack๊ณผ buffer์— ์กด์žฌํ•œ๋‹ค.
  • ๊ฐ€๋Šฅํ•œ ๋™์ž‘์€ 3๊ฐ€์ง€์ด๋‹ค.
    1. Shift: buffer์˜ top word๋ฅผ stack์˜ top position์œผ๋กœ ์ด๋™ํ•œ๋‹ค.
    2. Left-Arc: stack์—์„œ top 2 words๋ฅผ ๊ณจ๋ผ dependency๋ฅผ ๋งŒ๋“ค๊ณ  (<-), dependent๋ฅผ ์ œ๊ฑฐํ•œ๋‹ค.
    3. Right-Arc: stack์—์„œ top 2 words๋ฅผ ๊ณจ๋ผ dependency๋ฅผ ๋งŒ๋“ค๊ณ  (->), dependent๋ฅผ ์ œ๊ฑฐํ•œ๋‹ค.
  • start/end condition
    • start: buffer์—๋Š” root ๋‹จ์–ด 1๊ฐœ, stack์—๋Š” ๋‚˜๋จธ์ง€ ๋ชจ๋“  ๋‹จ์–ด๋“ค์ด ์กด์žฌํ•œ๋‹ค.
    • end: buffer์—๋Š” root ๋‹จ์–ด 1๊ฐœ, stack์€ empty
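The mechanics above can be simulated directly. A minimal sketch on the toy sentence "I ate fish"; the sentence and the hand-picked action sequence are my own illustration, not from the lecture:

```python
def parse(words, actions):
    """Run a fixed action sequence through the transition system above.
    Returns the (head, dependent) arcs it builds."""
    stack, buffer, arcs = ["ROOT"], list(words), []
    for act in actions:
        if act == "SHIFT":            # buffer front -> top of stack
            stack.append(buffer.pop(0))
        elif act == "LEFT-ARC":       # second-from-top becomes dependent of top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif act == "RIGHT-ARC":      # top becomes dependent of second-from-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    assert not buffer and stack == ["ROOT"], "parse did not finish cleanly"
    return arcs

# "ate" is the root; "I" and "fish" both depend on it.
arcs = parse(["I", "ate", "fish"],
             ["SHIFT", "SHIFT", "LEFT-ARC",   # I <- ate
              "SHIFT", "RIGHT-ARC",           # ate -> fish
              "RIGHT-ARC"])                   # ROOT -> ate
print(arcs)   # [('ate', 'I'), ('ate', 'fish'), ('ROOT', 'ate')]
```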

๊ทธ๋ž˜์„œ Automatic Parser๋ž€ step2์˜ ๋™์ž‘ ์ค‘ ํ•˜๋‚˜๋ฅผ ์ž๋™์ ์œผ๋กœ ์„ ํƒํ•˜๋Š” classifier์ด๋‹ค.

** ์—ฌ๊ธฐ์„œ๋Š” ๋‹จ์ˆœํ™”๋ฅผ ์œ„ํ•ด 3๊ฐ€์ง€ action ๋ฟ์ด๋ผ๊ณ  ํ–ˆ์ง€๋งŒ real application ์—์„œ๋Š” ๊ด€๊ณ„๋งˆ๋‹ค label ๊นŒ์ง€ ์ •์˜ํ•˜๊ธฐ ๋•Œ๋ฌธ์— 80์—ฌ๊ฐœ ์ •๋„์˜ action ์ค‘์—์„œ ๊ณจ๋ผ์•ผ ํ•œ๋‹ค๊ณ  ํ•œ๋‹ค. ex) Left arc - modifier, Left arc - subj, .... ๋“ฑ๋“ฑ  (์ฐธ๊ณ ๋งํฌ)

 

๊ทธ๋Ÿผ Automatic Parser๊ฐ€ 3๊ฐ€์ง€ step ์ค‘ ์ž๋™์ ์œผ๋กœ ์„ ํƒํ•˜๋Š”(classification ํ•˜๋Š”) ๋ฐฉ์‹์€ ๋ฌด์—‡์ผ๊นŒ? ๋‹ต์€ ๋จธ์‹ ๋Ÿฌ๋‹์„ ํ†ตํ•ด classifier๋ฅผ ํ›ˆ๋ จ์‹œํ‚ค๋Š” ๊ฒƒ์ด๊ณ , ์ด ๋ฐฉ๋ฒ•๋ก ์„ MaltParser์ด๋ผ๊ณ  ํ•œ๋‹ค.

MaltParser์—์„œ๋Š” top of stack word, ๊ทธ ๋‹จ์–ด์˜ ํ’ˆ์‚ฌ(POS), first in buffer word, ๊ทธ ๋‹จ์–ด์˜ ํ’ˆ์‚ฌ(POS) ๋“ฑ์„ input์œผ๋กœ classifier์— ๋„ฃ์–ด ๊ฐ ๋‹จ๊ณ„์—์„œ ์ˆ˜ํ–‰ํ•  optimal ํ•œ parsing ์ „๋žต์„ ์œ„ํ•ด ์‚ฌ์šฉ๋œ๋‹ค. MaltParser๋Š” ์ •๋ง ๋น ๋ฅด๊ณ  ์‹œ๊ฐ„ ๋Œ€๋น„ ๋†’์€ ์ •ํ™•๋„๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์–ด์„œ web ๊ฐ™์€๊ฑธ parsing ํ•  ๋•Œ ์œ ์šฉํ•˜๋‹ค๊ณ  ํ•œ๋‹ค.


๊ทธ๋Ÿฌ๋‚˜, ์œ„์—์„œ ์„ค๋ช…ํ–ˆ๋‹ค์‹œํ”ผ MaltParser๋Š” feature๋“ค์„ one-hot vector๋กœ ์ธ์ฝ”๋”ฉํ•˜๊ณ , ๊ทธ๊ฑธ concatenate ํ•ด์„œ ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ฐจ์›์ด ๋งค์šฐ ์ปค์ง„๋‹ค๋Š” ๋ฌธ์ œ์ ์ด ์žˆ์—ˆ๋‹ค. ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด Neural Network๋ฅผ ์‚ฌ์šฉํ•˜๊ฒŒ ๋˜์—ˆ๋‹ค.



3. Neural dependency parser

๊ธฐ์กด parser์˜ ๋ฌธ์ œ์ ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์•˜๋‹ค. 
  1. Sparseness : ๋‹จ์–ด๋ฅผ bag of words๋กœ ๋‚˜ํƒ€๋‚ด๋ฉด dimension ์ด ์—„์ฒญ๋‚˜๊ฒŒ ์ปค์ง€๊ณ , ์—ฌ๊ธฐ์— ๋‹ค๋ฅธ ๊ฒƒ๋“ค๋„ ํ•ฉ์ณ์„œ ๊ฐ ๋‹จ์–ด์— ๋Œ€ํ•œ feature๋ฅผ ๋งŒ๋“œ๋Š”๋ฐ (feature table ์žˆ์Œ), ๊ทธ๋Ÿผ ๊ทธ feature๋Š” ์—„์ฒญ sparse ํ•ด์ง„๋‹ค. 
  2. Incomplete : configuration์—์„œ ๋ณธ ์ ์ด ์—†๋Š” rare words ๋‚˜ rare combination of words ์ด ์žˆ์„ ๊ฒฝ์šฐ, feature table์— ์กด์žฌํ•˜์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์— ์™„์ „ํ•˜์ง€ ์•Š์€ ๋ชจ๋ธ์ด ๋œ๋‹ค. 
  3. Expensive computation : 1๋ฒˆ๊ณผ ๊ด€๋ จ๋œ ๋ฌธ์ œ์ด๋‹ค. ์—„์ฒญ๋‚˜๊ฒŒ ๋ฐฉ๋Œ€ํ•œ feature space๋ฅผ ๊ฐ€์ง€๊ณ  ์ด๋ฅผ hash map์œผ๋กœ ํ‘œํ˜„ํ•œ๋‹ค ( feature id : weight of the feature) ๊ทธ๋Ÿฐ๋ฐ ํ•˜๋„ ๋ฐฉ๋Œ€ํ•œ feature space์—์„œ ๊ฒ€์ƒ‰์„ ํ•ด์•ผ ๋˜๋‹ค ๋ณด๋‹ˆ ์ด ์ž์ฒด๋งŒ์œผ๋กœ  ์—„์ฒญ๋‚˜๊ฒŒ ๋งŽ์€ ์‹œ๊ฐ„์„ ์žก์•„๋จน๊ฒŒ ๋˜๋Š” ๊ฒƒ์ด๋‹ค.

1. First Win: Distributed Representations

์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด, ๋‹ค์Œ ๋ฐฉ๋ฒ•๋“ค์„ ๋„์ž…ํ–ˆ๋‹ค:

  1. ๊ฐ ๋‹จ์–ด๋“ค์„ d-dimensional ๋ฒกํ„ฐ๋กœ ํ‘œํ˜„ํ•œ๋‹ค. (ex. word embedding) -> ๋น„์Šทํ•œ ๋‹จ์–ด๋Š” ๊ฐ€๊นŒ์šด ๋ฒกํ„ฐ ๊ฑฐ๋ฆฌ๋ฅผ ๊ฐ€์ง€๊ฒŒ ๋œ๋‹ค.
  2. ํ’ˆ์‚ฌ์™€ dependency label ๋˜ํ•œ ๋ฒกํ„ฐ๋กœ ํ‘œํ˜„ํ•˜์—ฌ ๋”ํ•œ๋‹ค.

์ด๋ ‡๊ฒŒ ์‚ฌ์ง„๊ณผ ๊ฐ™์ด, ๋‹จ์–ด์™€ ํ’ˆ์‚ฌ, dependency ๋ชจ๋‘๋ฅผ ๋ฒกํ„ฐํ™”์‹œ์ผœ ๋”ํ•ด, ํ•˜๋‚˜์˜ ๊ธด ๋ฒกํ„ฐ๋กœ ๋งŒ๋“ ๋‹ค.

2. Second Win: Deep Learning classifiers are non-linear classifiers

softmax์™€ ๊ฐ™์€ ๋น„์„ ํ˜•์„ฑ ๋ถ„๋ฅ˜๊ธฐ๋ฅผ ํ†ตํ•ด linear boundary๊ฐ€ ์•„๋‹Œ non-linear boundary๋ฅผ ์ œ๊ณตํ•จ์œผ๋กœ์จ ๋” ๋†’์€ ์ •ํ™•์„ฑ์„ ๊ฐ€์งˆ ์ˆ˜ ์žˆ์—ˆ๋‹ค.

 

๊ทธ๋ž˜์„œ ์œ„ ๋ฐฉ๋ฒ•๋“ค์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๋งŒ๋“ค์–ด์ง„ neural network multi-class classifier๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค:

์—ฌ๊ธฐ์„œ ์ด๋ฃจ์–ด์ง€๋Š” ์ž‘์—…์€, hidden layer์—์„œ ์—ญ์ „ํŒŒ(backpropagation) ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ํ†ตํ•ด ๊ฐ€์ค‘์น˜๋ฅผ ์—…๋ฐ์ดํŠธํ•˜๋ฉด์„œ, ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๋ฅผ ํŠน์ง• ๊ณต๊ฐ„(feature space)์ƒ์—์„œ ์žฌ๋ฐฐ์น˜ํ•˜๋Š” ๊ฒƒ์ด๋‹ค. ์ด ๊ณผ์ •์€ ๊ฐ ํด๋ž˜์Šค ๊ฐ„์˜ ๊ฒฐ์ • ๊ฒฝ๊ณ„๋ฅผ ๋ช…ํ™•ํ•˜๊ฒŒ ํ•ด์„œ softmax ํ•จ์ˆ˜๋ฅผ ํ†ตํ•œ ๋ถ„๋ฅ˜๊ฐ€ ๋” ํšจ์œจ์ ์œผ๋กœ ์ด๋ฃจ์–ด์งˆ ์ˆ˜ ์žˆ๋„๋ก ํ•œ๋‹ค.


4. More about neural networks: Dropout & Non-linearities

1. Dropout

์›๋ž˜ ๋จธ์‹ ๋Ÿฌ๋‹์—์„œ ์ •๊ทœํ™”๋Š” overfitting์„ ๋ง‰๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉ๋๋Š”๋ฐ, ์ด์ œ ์ •๊ทœํ™”๋Š” feature co-adaptation์„ ๋ง‰๋Š” ๊ฒƒ์ด ์ฃผ ๋ชฉํ‘œ๊ฐ€ ๋œ๋‹ค. ์ฆ‰, ํ•œ ํŠน์ • feature๋งŒ ์œ ๋ณ„๋‚˜๊ฒŒ ์‚ฌ์šฉ๋  ์ˆ˜ ์—†๋„๋ก ๋ง‰์€ ๊ฒƒ์ด๊ณ , ์ด๋ฅผ Dropout ๋ฐฉ์‹์ด๋ผ๊ณ  ํ•œ๋‹ค.

๊ทธ๋ž˜์„œ ๊ทธ๋ฆผ์—์„œ์™€ ๊ฐ™์ด, ๋ช‡ ๊ฐœ์˜ input์„ ๋žœ๋ค ํ•˜๊ฒŒ ์ œ์™ธ์‹œํ‚ค๋Š” ๊ฒƒ์ด๋‹ค.

2. Non-linearities

There are many important non-linearities, such as sigmoid, tanh, and hard tanh, but ReLU, the one most commonly used in neural nets, was devised to counter the vanishing-gradient problem: the sigmoid squashes its output between 0 and 1, and once it saturates over many layers the gradients shrink toward zero.

๊ฐ€์žฅ ์˜ค๋ฅธ์ชฝ์— ๋ณด์ด๋Š” ๊ฒƒ์ด ReLU ํ•จ์ˆ˜์ธ๋ฐ, ์ด๋ ‡๊ฒŒ ์Œ์ˆ˜ ์˜์—ญ์— ๋Œ€ํ•ด์„œ๋Š” 0์œผ๋กœ, ์–‘์ˆ˜ ์˜์—ญ์— ๋Œ€ํ•ด์„œ๋Š” ๊ทธ ๊ฐ’์„ ๊ทธ๋Œ€๋กœ ๋ฐ˜ํ™˜ํ•จ์œผ๋กœ์จ ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค ๋ฌธ์ œ๋ฅผ ์™„์ „ํžˆ ์ฐจ๋‹จํ–ˆ๋‹ค. ๋˜ํ•œ, ๊ธฐ์กด ํ™œ์„ฑํ™” ํ•จ์ˆ˜์— ๋น„ํ•ด ์†๋„๊ฐ€ 6๋ฐฐ๋‚˜ ๋น ๋ฅธ ์žฅ์ ๋„ ์žˆ๋‹ค๊ณ  ํ•œ๋‹ค.


์ด๋ ‡๊ฒŒ, Dependency Parser๊ฐ€ ์–ด๋–ป๊ฒŒ ๋ฐœ์ „ํ•ด ์™”๋Š”์ง€, NeuralNet์„ ์ด์šฉํ•œ Dependency Parser๊ฐ€ ์–ด๋–ป๊ฒŒ ์ด์šฉ๋˜๋Š”์ง€์— ๋Œ€ํ•ด ์•Œ์•„๋ณด์•˜๋‹ค.