
A Probabilistic Framework for Discovering New Intents

장영준 2023. 7. 27. 01:33

Maybe it's because of ChatGPT, but I've been really interested in conversational systems lately.

Among them, I ended up doing research on task-oriented dialogue systems.

While researching task-oriented dialogue systems, I learned that intent detection is an important task in this area. That makes sense: a task-oriented chatbot is a system designed to carry out specific tasks, and to carry out a task it first has to know the intent, that is, what the task is about.

So I started my research with a paper that was published at ACL very recently, A Probabilistic Framework for Discovering New Intents. I don't think I have ever read a paper with this much interest, or this carefully, digging through everything down to the appendix.


Abstract

Starting from the intuition that discovering new intents is beneficial to identifying already-known intents, the paper builds a probabilistic framework that treats intent assignments as a latent variable.

For optimization it adopts the Expectation-Maximization (EM) algorithm, whose two steps work as follows:

  1. E-step
    • Identify intents
    • Explore the intrinsic structure of the unlabeled data through the posterior probability of the intent assignments
  2. M-step
    • Optimize the discriminability of the labeled data to mitigate forgetting of the knowledge transferred from known intents

The paper is organized around these two steps.


Introduction

A Task-Oriented Dialogue System (TODS for short) has to capture potential new intents from its interactions with users. It does so by adaptively discovering the intents in the unlabeled data produced by those interactions, with the help of labeled data.

Previous Works

Earlier studies treated intent discovery as unsupervised cluster learning. The problem with this approach is that it focused on constructing pseudo-supervision signals to guide the clustering process while ignoring the labeled data that was available.

Can this be applied to a realistic setting, where a small amount of labeled data is available in advance to guide intent discovery, and where the large amount of unlabeled data generated during conversations contains both known and unknown intents?

First, let's see how labeled data can be used to find the intents in an unlabeled corpus.

 

The paper introduces DeepAligned, a well-known model from 2021.

This model

  1. generalizes prior knowledge, via pre-training, into semantic features of the unlabeled data, and
  2. creates a pseudo label for each unlabeled utterance and re-trains on them so that the representations stay up to date.

However, DeepAligned has the following two critical problems.

  1. When it is re-trained with the pseudo-supervision signal, the model forgets the knowledge transferred in the transfer stage.
  2. The softmax loss built from pseudo labels cannot see the intrinsic structure of the unlabeled data, so accurate clustering becomes impossible.

To solve these problems, the paper starts from the intuition that newly discovered intents do not harm the original (known) intents. The intents contained in the labeled data serve as a guide for intent discovery, and the information obtained from the unlabeled data in turn improves the identification (accuracy) of the known intents, forming a win-win relationship.

This is a principled, methodical framework that treats intent assignments as a latent variable, and it uses Expectation-Maximization as the principled template for optimizing it.

The template is divided into two steps, the E-step and the M-step.

  • In the E-step, the current model is used to discover intents, and the posterior probability of the intent assignments is computed to capture the intrinsic structure of the data.
  • In the M-step, the model parameters are optimized and updated by simultaneously maximizing two quantities: the probability of identifying the labeled data, which now includes the newly discovered intents from the unlabeled data, and the posterior probability of the intent assignments, which helps learn discovery-friendly features.

Related Work

Prior work falls into unsupervised clustering and semi-supervised clustering.

The paper describes a variety of methods in each category in about a line apiece, so it is better to read the Related Work section of the paper itself for those.

The drawback of unsupervised clustering is that it cannot use prior knowledge to guide the clustering, and the drawback of semi-supervised clustering is that it suffers from the forgetting described above.


Approach

1. Problem Definition

First, let D^l be the labeled dataset, Y^l the corresponding intents, and D^u the unlabeled dataset.

With D = D^l ∪ D^u, the goal of the paper is to assign intents to D^u. (D is the entire dataset.)

As stated above, the paper has to benefit from newly discovered intents without harming the known intents (the intents that were already known). This can be expressed with the following formula.

Z is the latent variable

Treating Z_D as a single value and taking the log of both sides gives the following.

L_obj

The goal is now to find a good Z_D and optimize L_obj.

 

Put more plainly:

  1. The dataset D contains a variety of dialogue utterances, and the utterances are clustered by intent.
  2. A latent variable Z is then introduced to assign an intent to each group.
  3. Z_D is a concrete value of the intent assignment; it maps groups to intents and is used to model the relationship between the data and the intents.
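As a hedged sketch, the two formulas referred to in this section can be written as follows. This assumes the standard chain-rule factorization over the latent assignment, chosen because it yields exactly the two probabilities, p(Z|D; θ) and p(Y^l|Z, D; θ), that the E-step computes later; the paper's own notation may differ.

```latex
p(Y^l, Z_D \mid D; \theta) = p(Y^l \mid Z_D, D; \theta)\, p(Z_D \mid D; \theta)

\mathcal{L}_{obj} = \log p(Y^l \mid Z_D, D; \theta) + \log p(Z_D \mid D; \theta)
```

Read this way, the first term rewards identifying the known intents correctly and the second rewards an intent assignment that fits the structure of the data.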

2. Intent Representation Transferring Knowledge

Initializing the model requires transferring knowledge from the labeled data, and the paper does this by fine-tuning BERT.

Given an utterance x_i, BERT produces contextual embeddings, and mean-pooling over them extracts the semantic representation z_i. The objective function used during fine-tuning is as follows:

Objective function for BERT fine-tuning

φ denotes a linear classifier, and K^l is the number of known intents.
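To make the fine-tuning step concrete, here is a minimal sketch in plain Python of mean-pooling followed by a linear classifier with softmax cross-entropy. The token embeddings, the weights `W`, and the bias `b` are made-up numbers standing in for BERT outputs and the classifier φ; they are assumptions for illustration, not values from the paper.

```python
import math

def mean_pool(token_embeddings):
    """Average token vectors (one list per token) into a single vector z_i."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]

def linear_logits(z, weights, bias):
    """Linear classifier phi: one logit per known intent."""
    return [sum(w * x for w, x in zip(row, z)) + b for row, b in zip(weights, bias)]

def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed with a stable log-sum-exp."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_norm - logits[label]

# Toy utterance: 3 tokens with 2-dimensional embeddings (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
z = mean_pool(tokens)

# Hypothetical classifier for K^l = 3 known intents.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
loss = cross_entropy(linear_logits(z, W, b), label=0)
```

In actual fine-tuning this loss would be averaged over D^l and back-propagated through BERT; the sketch only shows the forward computation.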

3. EM Framework for Optimization

The Z described above involves two questions:

  1. How to determine K, the number of intents in D
  2. How to assign intents to the utterances in D

To estimate K, the paper proceeds as follows:

  1. First set a rough value k (e.g., a multiple of the actual number of intents).
  2. Group the data into k semantic clusters, then refine them by removing clusters whose size is below a threshold.
    Here K-means is applied to D to form the k groups.
  3. Use K-means to assign an intent to each utterance.
  4. Work out how to optimize the L_obj expression above within the EM framework.
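Steps 1 and 2 above can be sketched as follows, assuming K-means has already been run with the rough k and has returned one cluster id per utterance. The function name `estimate_num_intents`, the toy assignments, and the `min_size` threshold are hypothetical; how the paper picks its threshold is not stated here.

```python
from collections import Counter

def estimate_num_intents(cluster_ids, min_size):
    """Count clusters with at least min_size members; smaller ones are dropped."""
    sizes = Counter(cluster_ids)
    return sum(1 for size in sizes.values() if size >= min_size)

# Hypothetical K-means output for 10 utterances with a rough k of 4.
assignments = [0, 0, 0, 0, 1, 1, 1, 2, 3, 3]
K = estimate_num_intents(assignments, min_size=3)
```

Here clusters 2 and 3 fall below the threshold, so only clusters 0 and 1 count toward the estimate of K.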

The steps of the EM algorithm are as follows:

1. E-step

  • Earlier models that used a cross-entropy loss over assigned pseudo labels were not accurate.
  • To better reflect the intrinsic structure of D and learn features suited to intent assignment, utterances with the same intent are placed close together in the semantic space and utterances with different intents are placed far apart.
    Inspired by contrastive learning, this idea is used to estimate the posterior probability p(Z|D; θ). The formula is as follows:

C_k is a cluster produced by Z, and the similarity between x and x+ is computed as the cosine similarity between their features.
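The idea of pulling same-cluster utterances together can be illustrated with a small cluster-level contrastive loss. This is a sketch in the style of supervised contrastive learning, not the paper's exact equation: the function names, the temperature `tau=0.1`, and the toy features are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def cluster_contrastive_loss(features, cluster_ids, tau=0.1):
    """Mean of -log(same-cluster similarity mass / similarity mass over all other samples)."""
    total, counted = 0.0, 0
    for i, (x, c) in enumerate(zip(features, cluster_ids)):
        others = [j for j in range(len(features)) if j != i]
        positives = [j for j in others if cluster_ids[j] == c]
        if not positives:
            continue  # singleton cluster: no positive pair to pull together
        pos = sum(math.exp(cosine(x, features[j]) / tau) for j in positives)
        denom = sum(math.exp(cosine(x, features[j]) / tau) for j in others)
        total += -math.log(pos / denom)
        counted += 1
    return total / counted if counted else 0.0

# Two tight groups of toy features (hypothetical values).
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
good = cluster_contrastive_loss(feats, [0, 0, 1, 1])  # clusters match the geometry
bad = cluster_contrastive_loss(feats, [0, 1, 0, 1])   # clusters cut across it
```

An assignment that matches the geometry of the features gives a lower loss than one that cuts across it, which is the sense in which this quantity tracks how well Z fits D.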

 

To overcome DeepAligned's forgetting, the L_obj expression has to be optimized, and for that p(Y^l|Z, D; θ) must be computed as well.

It is computed as follows:

φ is the same linear classifier described above, y is the label of x, K^l is the total number of known intents, and D^l is the labeled data in D.

D^l(D^u, Z) denotes the samples in D^u that, given Z, can be regarded as belonging to known intents. It is defined as follows:

x^l is a sample from D^l and y^l is its label. N_Z(x^l) is the set of nearest-neighbor unlabeled samples that Z places in the same cluster as x^l.
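One way to picture N_Z(x^l) is the sketch below: for each labeled sample, take the closest unlabeled samples that Z puts in the same cluster and let them inherit its label. The function name `nearest_same_cluster`, the use of Euclidean distance, and the `n_neighbors` cutoff are assumptions made for illustration; the paper's definition may use a different similarity or selection rule.

```python
import math

def nearest_same_cluster(labeled, unlabeled, n_neighbors=1):
    """labeled: list of (feature, cluster_id, intent_label).
    unlabeled: list of (feature, cluster_id).
    Returns {unlabeled_index: intent_label} for the selected neighbours."""
    selected = {}
    for feat_l, cluster_l, y_l in labeled:
        # Only unlabeled samples in the same cluster are candidates.
        candidates = [
            (math.dist(feat_l, feat_u), idx)
            for idx, (feat_u, cluster_u) in enumerate(unlabeled)
            if cluster_u == cluster_l
        ]
        for _, idx in sorted(candidates)[:n_neighbors]:
            selected[idx] = y_l
    return selected

# Toy data: one labeled utterance and three unlabeled ones (hypothetical values).
labeled = [([0.0, 0.0], 0, "check_balance")]
unlabeled = [([0.1, 0.0], 0), ([0.5, 0.5], 0), ([0.05, 0.0], 1)]
pseudo = nearest_same_cluster(labeled, unlabeled, n_neighbors=1)
```

The third unlabeled sample is the closest in raw distance but sits in a different cluster, so it is not selected; only the first one inherits the label.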

(Note) Comparing the effect with and without adding D^l shows that performance is much better when it is added.

The data labeled this way is then used for model training.

In this way, the model does not lose the knowledge transferred from the labeled data, and it can keep exploring the intrinsic structure of the dataset.

2. M-step

The goal of the M-step is to update θ in Equation 2.

Using the expressions built in the E-step, the final loss is written as follows:

Final loss

Here λ is a hyperparameter that balances the two log probabilities during training, and τ is a temperature-scaling hyperparameter, which also appears in contrastive learning.

The first term (the part multiplied by λ) explores the intrinsic structure of the unlabeled data and is called exploration, while the second term (the part multiplied by (1-λ)) reinforces the use of the knowledge transferred from the labeled data and is called utilization.

The plot from the experiments that vary λ shows that both exploration and utilization are needed.
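Under the assumption that the loss is the negative of a λ-weighted sum of the two log probabilities from the E-step, the combination described above could be written as below. The sign convention is an assumption, and τ is taken to sit inside the contrastive estimate of p(Z|D; θ) rather than appearing explicitly.

```latex
\mathcal{L} = -\lambda \,\log p(Z \mid D; \theta) \;-\; (1-\lambda)\,\log p(Y^l \mid Z, D; \theta)
```

In this form, λ = 1 keeps only the exploration term and λ = 0 keeps only the utilization term, so intermediate values trade one off against the other.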


Visualized, the final training process of the EM framework looks as follows:


Experiments

The experiments in the paper use the following three datasets to demonstrate the method's effectiveness.

1. CLINC

2. BANKING

3. StackOverflow

The results are as follows.


Results and Discussion

In the end, the framework showed strong performance on all three datasets above. By using the labeled data as a guide for model training, it led the model to better results.

More Than Remembering Knowledge

The paper argues that the model performs so well because the D^l samples discovered in the unlabeled data go beyond preventing forgetting and also help identify the known intents.

 
