
A Probabilistic Framework for Discovering New Intents

2023. 7. 27. 01:33
Contents
  1. Abstract
  2. Introduction
  3. Previous Works
  4. Related Work
  5. Approach
    1. Problem Definition
    2. Intent Representation Transferring Knowledge
    3. EM Framework for Optimization

Perhaps because of ChatGPT, interest in conversational systems is enormous these days.

Among these, I started doing research on task-oriented dialogue systems.

While researching task-oriented dialogue systems, I learned that intent detection is a key task in this area. That makes sense: a task-oriented chatbot is designed to carry out specific tasks, and before it can carry out a task it first needs to know the user's intent, such as what the task is about.

So I began my research with A Probabilistic Framework for Discovering New Intents, a paper published very recently at ACL. I think this is the first paper I have read this carefully and with this much interest, going through everything down to the appendix.


Abstract

Starting from the intuition that discovering new intents is beneficial to identifying the already-known intents, this paper builds a probabilistic framework that treats intent assignments as latent variables.

For optimization, it adopts the Expectation-Maximization (EM) algorithm, whose two steps are:

  1. E-step
    • Conduct intent discovery
    • Explore the intrinsic structure of the unlabeled data through the posterior probability of the intent assignments
  2. M-step
    • Optimize the discrimination of the labeled data to mitigate forgetting the knowledge transferred from the known intents

The paper is built around these two steps.


Introduction

A Task-Oriented Dialogue System (TODS) must capture potential new intents from its interactions with users. It does so by adaptively discovering the intents in the unlabeled data generated during those interactions, with the help of labeled data.

Previous Works

Previous work treated intent discovery as an unsupervised clustering process (Unsupervised Cluster Learning). The problem with this approach is that it focuses on constructing pseudo supervision signals to guide the clustering and ignores the labeled data that is available.

Can we instead handle the realistic setting, where a small amount of labeled data is available in advance to guide intent discovery, and both known and unknown intents appear in the large amount of unlabeled data generated during conversations?

First, let's look at how labeled data can be used to discover the intents in an unlabeled corpus.

 

The paper introduces DeepAligned, a well-known model from 2021.

์ด ๋ชจ๋ธ์€

  1. ์‚ฌ์ „์ง€์‹์— ๋Œ€ํ•œ pre-training์„ ํ†ตํ•ด unlabeled data์˜ ์˜๋ฏธ ํŠน์ง•์„ ํ•™์Šต์‹œํ‚ด์œผ๋กœ์จ ์ผ๋ฐ˜ํ™”ํ•œ๋‹ค.
  2. ์ตœ์‹ ์‹ ํ‘œํ˜„์„ ๋ฐ˜์˜ํ•˜๊ธฐ ์œ„ํ•ด ๊ฐ unlabeled ๋ฐœ์–ธ์— ๊ฐ€์ƒ์˜ label์„ ๋งŒ๋“ค๊ณ  re-train ์‹œํ‚จ๋‹ค.

However, DeepAligned has two critical problems:

  1. When it is retrained with pseudo supervision signals, the model forgets the knowledge it acquired in the transfer stage.
  2. A softmax loss built on pseudo labels cannot capture the intrinsic structure of the unlabeled data, which hinders accurate clustering.

To address these problems, the paper starts from the intuition that discovering new intents does not harm the original (known) intents and can even help identify them. The intents contained in the labeled data guide intent discovery, while the information obtained from the unlabeled data improves the identification (accuracy) of the known intents, forming a win-win relationship.

The result is a principled probabilistic framework that treats intent assignments as latent variables, with Expectation-Maximization as the principled template for optimizing it.

The template consists of two steps, the E-step and the M-step.

  • In the E-step, the current model is used to discover intents, and the posterior probability of the intent assignments is computed to capture the intrinsic structure of the data.
  • In the M-step, the model parameters are updated by simultaneously maximizing two quantities: the probability of identifying the labeled data (including unlabeled samples newly found to belong to known intents), and the posterior probability of the intent assignments, which helps learn discovery-friendly features.

Related Work

Prior work falls into two families: unsupervised clustering and semi-supervised clustering.

The paper describes many methods from each family in one line each, so it is better to read the Related Work section of the paper directly.

The drawback of unsupervised clustering is that it cannot use prior knowledge to guide the clustering; the drawback of semi-supervised clustering is that it suffers from the forgetting problem described above.


Approach

1. Problem Definition

First, let D^l be the labeled dataset, Y^l its intents (labels), and D^u the unlabeled dataset.

Given the whole dataset D = D^l ∪ D^u, the goal of the paper is to assign intents to D^u.

์œ„์—์„œ ๋ฐํ˜”๋‹ค์‹œํ”ผ, ์ด ๋…ผ๋ฌธ์—์„œ๋Š” known intent(์›๋ž˜ ์•Œ๊ณ  ์žˆ๋˜ ์˜๋„)๋ฅผ ํ›ผ์†ํ•˜์ง€ ์•Š์œผ๋ฉด์„œ ์ด์ต์„ ๊ฐ€์ ธ๊ฐ€์•ผ ํ•˜๋Š” ์ƒํ™ฉ์ด๋‹ค. ์ด๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์ˆ˜์‹์œผ๋กœ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ๋‹ค.

Z๋Š” ์ž ์žฌ ๋ณ€์ˆ˜

์—ฌ๊ธฐ์— Z_D๋ฅผ ํ•˜๋‚˜์˜ ๊ฐ’์œผ๋กœ ์ทจ๊ธ‰ํ•˜๊ณ  ๊ฐ ๋ณ€์— log๋ฅผ ์ทจํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ž‘์„ฑํ•  ์ˆ˜ ์žˆ๋‹ค.

L_obj

The goal now becomes finding a good Z_D that optimizes L_obj.

 

Put more simply:

  1. The data D contains various dialogue utterances, which are clustered according to their intents.
  2. A latent variable Z is then introduced to assign an intent to each group.
  3. Z_D is the value of this intent assignment; by mapping each group to an intent, it models the relationship between the data and the intents.

2. Intent Representation Transferring Knowledge

model์„ ์ดˆ๊ธฐํ™”ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” labeled ๋‹จ์–ด์—์„œ knowledge๋ฅผ tansfer ํ•˜๋Š” ๊ฒƒ์ด ํ•„์š”ํ•œ๋ฐ, ํ•ด๋‹น ๋…ผ๋ฌธ์—์„œ๋Š” BERT๋ฅผ fine-tuning ํ•˜์—ฌ ์‚ฌ์šฉํ•œ๋‹ค.

Given an utterance x_i, BERT produces contextual token embeddings, and mean-pooling over them yields the semantic representation z_i. The objective function for BERT fine-tuning is the cross-entropy over the labeled data:

$$L_{ce} = -\frac{1}{|D^l|} \sum_{(x_i, y_i) \in D^l} \log \frac{\exp(\phi(z_i)^{y_i})}{\sum_{j=1}^{K^l} \exp(\phi(z_i)^{j})}$$

Here φ is a linear classifier and K^l is the number of known intents.
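To make the fine-tuning objective concrete, here is a minimal pure-Python sketch of mean-pooling followed by a linear classifier φ and the averaged cross-entropy. It assumes the BERT token embeddings are already computed and passed in as plain lists (no real BERT is called), and the function names are hypothetical, not taken from the paper's code.

```python
import math
from typing import List, Sequence, Tuple


def mean_pool(token_embeddings: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1 (padding tokens are ignored)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, mask):
        if keep:
            count += 1
            for d in range(dim):
                total[d] += vec[d]
    return [t / count for t in total]


def linear_phi(z: Sequence[float], weight: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """phi(z): one logit per known intent; `weight` has K^l rows of size dim."""
    return [sum(w * x for w, x in zip(row, z)) + b for row, b in zip(weight, bias)]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """-log softmax(logits)[label], computed stably with the log-sum-exp trick."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_sum - logits[label]


def ce_objective(batch: Sequence[Tuple[Sequence[Sequence[float]], Sequence[int], int]],
                 weight: Sequence[Sequence[float]], bias: Sequence[float]) -> float:
    """Mean cross-entropy over labeled (token_embeddings, mask, label) triples."""
    losses = [cross_entropy(linear_phi(mean_pool(t, m), weight, bias), y) for t, m, y in batch]
    return sum(losses) / len(losses)
```

In a real setup the mean-pooling would run on BERT's last hidden states with the attention mask, and the loss would be minimized with a gradient-based optimizer; the sketch only evaluates the objective.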

3. EM Framework for Optimization

์œ„์—์„œ ์„ค๋ช…ํ•œ Z๋Š” ๋‹ค์Œ ๋‘ ๊ฐ€์ง€ ์†์„ฑ์œผ๋กœ ๊ตฌ์„ฑ๋ผ ์žˆ๋‹ค:

  1. D ์•ˆ์— ์–ผ๋งˆ๋‚˜ ๋งŽ์€ ์˜๋„๊ฐ€ ์žˆ๋Š”์ง€ ๋‚˜ํƒ€๋‚ด๋Š” ์ฒ™๋„์ธ K๋ฅผ ์–ด๋–ป๊ฒŒ ๊ฒฐ์ •ํ• ์ง€
  2. D์— ํ•ด๋‹นํ•˜๋Š” ์˜๋„๋ฅผ ์–ด๋–ป๊ฒŒ ํ• ๋‹นํ• ์ง€

To estimate K, the paper proceeds as follows:

  1. Set a rough value k (e.g., a multiple of the actual number of intents).
  2. Run K-means on D to group it into k semantic clusters, then refine the result by discarding clusters whose size is below a threshold; the number of remaining clusters is the estimate of K.
  3. Run K-means again with the estimated K to assign an intent to each utterance.
  4. Then decide how to optimize L_obj above within the EM framework.
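The K-estimation steps can be sketched in pure Python as below. This is a simplified illustration, not the paper's code: it uses a deterministic farthest-first initialization instead of the k-means++ seeding that scikit-learn uses by default, so the result is reproducible, and it assumes the threshold defaults to the mean cluster size N/k (the post only says "a threshold"). The 2-D points stand in for utterance features, and all names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points: Sequence[Point], k: int) -> List[List[float]]:
    """Deterministic seeding: start at points[0], then repeatedly add the point
    farthest from every centroid chosen so far (ties go to the earliest index)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        idx = max(range(len(points)),
                  key=lambda i: min(sq_dist(points[i], c) for c in centroids))
        centroids.append(list(points[idx]))
    return centroids


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's K-means; returns one cluster id per point."""
    centroids = farthest_first_init(points, k)
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest centroid (ties go to the lowest cluster id).
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def estimate_num_intents(points: Sequence[Point], rough_k: int,
                         threshold: Optional[float] = None) -> int:
    """Cluster into rough_k groups and count the clusters that reach `threshold`."""
    labels = kmeans(points, rough_k)
    sizes = [labels.count(c) for c in range(rough_k)]
    if threshold is None:
        threshold = len(points) / rough_k  # assumption: mean cluster size
    return sum(1 for s in sizes if s >= threshold)


def assign_intents(points: Sequence[Point], rough_k: int) -> Tuple[int, List[int]]:
    """Estimate K, then run K-means with K to give every utterance an intent id."""
    k = estimate_num_intents(points, rough_k)
    return k, kmeans(points, k)
```

For example, with three groups of 20 identical points plus two tiny groups of 2 points and rough_k = 5, the two tiny clusters fall below the threshold, so K is estimated as 3 and the tiny groups are then absorbed into one of the large clusters.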

The steps of the EM algorithm are as follows:

1. E-step

  • Earlier models that used a cross-entropy loss on assigned pseudo labels were not accurate.
  • To better reflect the intrinsic structure of D and to learn features suited to intent assignment, utterances with the same intent are placed close together in the semantic space, and utterances with different intents are placed far apart.
    Inspired by contrastive learning, this idea is used to estimate the posterior p(Z|D; θ).

In this estimate, C_k is a cluster produced by Z, and the similarity between a sample x and a positive x^+ from the same cluster is the cosine similarity of their features.
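A minimal sketch of such a contrastive estimate, assuming a supervised-contrastive (SupCon-style) form: every other sample in the same cluster C_k is a positive x^+, all other samples appear in the denominator, similarities are cosine similarities divided by the temperature τ, and each anchor's terms are averaged over its positives. The averaging choice and the function names are assumptions, not the paper's exact formulation; the function returns the negative log-probability so it can be used as a loss.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_assignment_loss(features: Sequence[Sequence[float]],
                                clusters: Sequence[int], tau: float = 0.1) -> float:
    """Negative of an assumed SupCon-style estimate of log p(Z_D | D; theta).

    Same-cluster pairs are positives; every other sample appears in the
    denominator. Anchors in singleton clusters have no positive and are skipped.
    """
    n = len(features)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and clusters[j] == clusters[i]]
        if not positives:
            continue
        logits = [cosine(features[i], features[j]) / tau for j in range(n)]
        others = [logits[j] for j in range(n) if j != i]
        top = max(others)  # log-sum-exp shift for numerical stability
        log_denom = top + math.log(sum(math.exp(v - top) for v in others))
        total += sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return -total / anchors if anchors else 0.0
```

An assignment that groups geometrically close features yields a lower loss than one that splits them, which is exactly the pressure that pulls same-intent utterances together.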

 

To overcome DeepAligned's forgetting, L_obj has to be optimized, which also requires computing p(Y^l|Z, D; θ).

It is computed as:

$$\log p(Y^l \mid Z_D, D; \theta) = \sum_{(x_i, y_i) \in D^l \cup D^l(D^u, Z_D)} \log \frac{\exp(\phi(z_i)^{y_i})}{\sum_{j=1}^{K^l} \exp(\phi(z_i)^{j})}$$

φ is the same linear classifier described above, y_i is the label of x_i, K^l is the total number of known intents, and D^l is the labeled data in D.

D^l(D^u, Z) denotes the samples in D^u that, through the assignment Z, can be regarded as belonging to known intents:

$$D^l(D^u, Z) = \{(x, y^l) \mid x \in N_Z(x^l),\ (x^l, y^l) \in D^l\}$$

Here x^l is a sample of D^l and y^l is its label. N_Z(x^l) is the set of nearby unlabeled samples that Z places in the same cluster as x^l.
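A minimal sketch of building D^l(D^u, Z), under two assumptions the post does not specify: N_Z(x^l) is taken to be the m nearest unlabeled samples (by Euclidean distance) that share x^l's cluster, and when an unlabeled sample is a neighbor of several labeled anchors, the first anchor's label is kept. The parameter `m`, the intent names, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def neighbors_in_cluster(anchor: Sequence[float], anchor_cluster: int,
                         unlabeled: Sequence[Sequence[float]],
                         unlabeled_clusters: Sequence[int], m: int) -> List[int]:
    """N_Z(x^l): indices of the m closest unlabeled samples in the anchor's cluster."""
    candidates = [j for j, c in enumerate(unlabeled_clusters) if c == anchor_cluster]
    candidates.sort(key=lambda j: math.dist(anchor, unlabeled[j]))
    return candidates[:m]


def expand_labeled_set(labeled: Sequence[Tuple[Sequence[float], str]],
                       labeled_clusters: Sequence[int],
                       unlabeled: Sequence[Sequence[float]],
                       unlabeled_clusters: Sequence[int],
                       m: int = 2) -> List[Tuple[int, str]]:
    """Return (unlabeled index, inherited known-intent label) pairs, i.e. D^l(D^u, Z)."""
    inherited: Dict[int, str] = {}
    for (x_l, y_l), cluster in zip(labeled, labeled_clusters):
        for j in neighbors_in_cluster(x_l, cluster, unlabeled, unlabeled_clusters, m):
            inherited.setdefault(j, y_l)  # first labeled anchor wins on conflicts
    return sorted(inherited.items())
```

Unlabeled samples in clusters that contain no labeled sample never inherit a label, so they stay available for discovering new intents.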

(Note) The paper compares performance with and without D^l(D^u, Z) and reports that adding it performs much better.

These labeled samples are then used to train the model.

In this way, the model keeps the knowledge transferred from the labeled data while it continues to explore the intrinsic structure of the dataset.

2. M-step

M-step์—์„œ๋Š” Equation 2๋ฒˆ์˜ ฮธ๋ฅผ ์—…๋ฐ์ดํŠธํ•˜๋Š” ๊ฒƒ์ด ๋ชฉ์ ์ด๋‹ค.

์œ„์— E-step์—์„œ ๋งŒ๋“  ์‹์œผ๋กœ, ์ตœ์ข… loss๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ž‘์„ฑ๋œ๋‹ค:

์ตœ์ข… loss

์—ฌ๊ธฐ์„œ ฮป๋Š” ํ•™์Šต ์ค‘ ๋‘ ๊ฐœ์˜ log ํ™•๋ฅ ์„ ๊ณ„์‚ฐํ•˜๊ธฐ ์œ„ํ•œ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ์ด๊ณ , ฯ„๋Š” temperautre scaling์„ ์œ„ํ•œ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ์ธ๋ฐ, ์ด๋Š” ๊ฐ€๋” contrastive learning์—๋„ ๋“ฑ์žฅํ•œ๋‹ค.

The first term (multiplied by λ) explores the intrinsic structure of the unlabeled data and is called exploration.

The second term (multiplied by (1-λ)) reinforces the use of the knowledge transferred from the labeled data and is called utilization.

ฮป์˜ ๊ฐ’์„ ์กฐ์ •ํ•˜์—ฌ ์‹คํ—˜ํ•œ ๊ทธ๋ž˜ํ”„๋ฅผ ๋ณด์•˜์„ ๋•Œ, exploration๊ณผ utilization ๋‘˜ ๋‹ค ํ•„์š”ํ•œ ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.


์ตœ์ข…์ ์ธ EM ํ”„๋ ˆ์ž„์›Œํฌ์˜ ํ•™์Šต ๊ณผ์ •์„ ์‹œ๊ฐํ™”ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค:


Experiments

๋…ผ๋ฌธ์—์„œ ์ œ์‹œ๋œ ์‹คํ—˜์—์„œ๋Š” ๋‹ค์Œ 3๊ฐœ์˜ ๋ฐ์ดํ„ฐ์…‹์„ ํ™œ์šฉํ•ด์„œ ํšจ์šฉ์„ฑ์„ ์ฆ๋ช…ํ•œ๋‹ค.

1. CLINC

2. BANKING

3. StackOverflow

The results are as follows:

[Table: results on CLINC, BANKING, and StackOverflow]


Results and Discussion

In the end, the framework achieved strong performance on all three datasets. It uses the labeled data to guide model training, which leads to better results.

More Than Remembering Knowledge

The paper argues that the model performs well because the D^l samples discovered in the unlabeled data (D^l(D^u, Z)) do more than prevent forgetting: they also help identify the known intents.

 

์žฅ์˜์ค€
์žฅ์˜์ค€
groomielife
์žฅ์˜์ค€
youngjangjoon
์žฅ์˜์ค€
์ „์ฒด
์˜ค๋Š˜
์–ด์ œ
  • ๋ถ„๋ฅ˜ ์ „์ฒด๋ณด๊ธฐ (35)
    • ๐Ÿ“š ๋…ผ๋ฌธ (10)
    • ๐Ÿ’ป ํ”„๋กœ์ ํŠธ (14)
      • ๐ŸŽ“ RESUMAI (6)
      • ๐Ÿงธ TOY-PROJECTS (8)
    • ๐Ÿ“š ์Šคํ„ฐ๋”” (11)
      • CS224N (6)
      • NLP (5)

์ธ๊ธฐ ๊ธ€

ํƒœ๊ทธ

  • RESUMAI
  • DEEPLOOK
  • contrastive learning
  • project
  • rag
  • NLP
  • Haar-cascade
  • dj-rest-auth
  • vectordb
  • ์ƒ์„ฑAI
  • NeuralNet
  • ์ž์†Œ์„œ์ƒ์„ฑํ”„๋กœ์ ํŠธ
  • CS224N
  • ๋…ผ๋ฌธ
  • text embedding
  • Neural Net
  • GenAI
  • DEEPALIGNED
  • MTP-CL
  • cv
  • text clustering
  • pinecone
  • gpt-1
  • story discovery
  • allauth
  • ArcFace
  • Conversational Agent
  • ์ž๊ธฐ์†Œ๊ฐœ์„œ์ƒ์„ฑ
  • Representation Training
  • ๋น„๋™๊ธฐ ์ €์žฅ
hELLO ยท Designed By ์ •์ƒ์šฐ.
์žฅ์˜์ค€
A Probabilistic Framework for Discovering New Intents
์ƒ๋‹จ์œผ๋กœ

ํ‹ฐ์Šคํ† ๋ฆฌํˆด๋ฐ”

๋‹จ์ถ•ํ‚ค

๋‚ด ๋ธ”๋กœ๊ทธ

๋‚ด ๋ธ”๋กœ๊ทธ - ๊ด€๋ฆฌ์ž ํ™ˆ ์ „ํ™˜
Q
Q
์ƒˆ ๊ธ€ ์“ฐ๊ธฐ
W
W

๋ธ”๋กœ๊ทธ ๊ฒŒ์‹œ๊ธ€

๊ธ€ ์ˆ˜์ • (๊ถŒํ•œ ์žˆ๋Š” ๊ฒฝ์šฐ)
E
E
๋Œ“๊ธ€ ์˜์—ญ์œผ๋กœ ์ด๋™
C
C

๋ชจ๋“  ์˜์—ญ

์ด ํŽ˜์ด์ง€์˜ URL ๋ณต์‚ฌ
S
S
๋งจ ์œ„๋กœ ์ด๋™
T
T
ํ‹ฐ์Šคํ† ๋ฆฌ ํ™ˆ ์ด๋™
H
H
๋‹จ์ถ•ํ‚ค ์•ˆ๋‚ด
Shift + /
โ‡ง + /

* ๋‹จ์ถ•ํ‚ค๋Š” ํ•œ๊ธ€/์˜๋ฌธ ๋Œ€์†Œ๋ฌธ์ž๋กœ ์ด์šฉ ๊ฐ€๋Šฅํ•˜๋ฉฐ, ํ‹ฐ์Šคํ† ๋ฆฌ ๊ธฐ๋ณธ ๋„๋ฉ”์ธ์—์„œ๋งŒ ๋™์ž‘ํ•ฉ๋‹ˆ๋‹ค.