📚 Papers

IDAS: Intent Discovery with Abstractive Summarization

์žฅ์˜์ค€ 2023. 10. 10. 18:18

Lately I've been absorbed in the New Intent Discovery (NID) task.

After reading the MTP-CLNN paper, I thought, "If you generate an intent for each utterance with a prompt and cluster based on that, it should score well in the unsupervised setting," and went looking for related papers.

Among them, I found a paper published at NLP4CONVAI @ ACL 2023 that matched the idea I had, so I gave it a read.

If only I'd thought of it a bit sooner.


Abstract

Method

  • ์ถ”์ƒ์ ์ธ summary์— ๊ธฐ๋ฐ˜ํ•œ utternace๋“ค์„ clustering ํ•˜๋Š” ๊ฒƒ์ด ๊ธฐ์กด์˜ intent discovery ๋ฐฉ๋ฒ•๋“ค๋ณด๋‹ค ๋›ฐ์–ด๋‚  ์ˆ˜ ์žˆ์Œ
  • IDAS: LLM์— prompting์„ ํ†ตํ•ด ๋ฐœํ™”์˜ label์„ ์ƒ์„ฑํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ Intent Discovery ์ง„ํ–‰
  • unsupervised, semi-supervised์™€ ๋ชจ๋‘ ๋น„๊ต

Introduction

  • Fine-tuning on (utterance, intent) pairs is time-consuming and expensive → instead of using annotated utterances, simply cluster unlabeled utterances
  • Unlike prior work, a pre-trained encoder is used without fine-tuning; utterances are summarized into descriptions → placing them closer together or farther apart in the text space
  • Here, the summaries = labels → they gather only the essential information
  • Assumption: labels built from such summaries represent the intent better and prevent non-intent-related information from affecting vector similarity

ICL

  • LLM์ด ์†Œ๋Ÿ‰์˜ (input, output) ์˜ˆ์‹œ๋“ค๊ณผ task๊ฐ€ ํ•จ๊ป˜ ์ฃผ์–ด์ง„ ํ”„๋กฌํ”„ํŠธ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ์‹
  • ๊ทธ๋Ÿฌ๋‚˜ unsupervised์—๋Š” ์ด ๋ฐœํ™”๋“ค์ด labeling์ด ์•ˆ๋ผ์žˆ์Œ

IDAS

Label generation via ICL (In-Context Learning) comes first.

  1. Initial clustering of the utterances
    • Select a few prototype utterances close to the cluster centroids via K-means clustering
    • Have the LLM generate a short label for each of them
  2. For each utterance x not among the prototypes, retrieve n prototype utterances with intents similar to it and run ICL
  3. Since the generated labels may contain errors, combine them with the utterances into single vector representations using a frozen pre-trained encoder
  4. Run K-means clustering to infer the latent intents

Methodology

Task ์ •์˜

  • Define {(xi, yi) | i = 1…N} as a dataset of N utterances.
  • Given the unlabeled utterances Dx = {xi | i = 1…N}, infer y from E(xi) using an encoder E

Overview

  1. Initial Clustering
    • Run K-means clustering to obtain prototypes
  2. Label Generation
    1. Feed the instruction and one prototype at a time into the LLM as a prompt to generate a label
      • Each generated (prototype xi, label li) pair is stored in a set M
    2. For each utterance xk not among the prototypes, compute its similarity to the items in the set M, take the top n, and prompt with (inst, (xi, li), xk) → the newly generated (xk, lk) is also added to M
  3. Label & Utterance Encoding
    • Encode each utterance and its generated label with the frozen pre-trained encoder into a single vector representation
  4. Finally, run K-means on the combined representations to infer the intents

1. Initial Clustering

๋ชฉ์ : ๋‹ค์–‘ํ•œ ํ”„๋กœํ† ํƒ€์ž… ํ™•๋ณด

  1. All unlabeled data is encoded with the encoder
  2. Partitioned into K clusters via K-means
  3. Within each cluster, the data point closest to the cluster centroid is designated as the prototype (pi)
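The prototype selection above can be sketched as follows; random vectors stand in for the encoder output E(xi), and sklearn's KMeans plays the role of the initial clustering:

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder embeddings standing in for the encoder output E(x_i).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))

K = 5  # number of clusters
km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)

# For each cluster, the data point closest to the centroid is the prototype.
prototypes = []
for k in range(K):
    members = np.where(km.labels_ == k)[0]
    dists = np.linalg.norm(X[members] - km.cluster_centers_[k], axis=1)
    prototypes.append(int(members[np.argmin(dists)]))
```

Each prototype index belongs to a distinct cluster, so the prototypes cover the data's variety rather than one dense region.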

2. Label Generation

๋ชฉ์ : ํ”„๋กœํ† ํƒ€์ž…์— ์†ํ•˜์ง€ ์•Š์€ unlabeled data์— ๋Œ€ํ•œ ์ตœ๋Œ€ํ•œ ์ •ํ™•ํ•œ label ์ƒ์„ฑ

  1. Prototype labeling
    • Each pi is fed into the LLM together with an instruction
    • instruction: "describe the question in a maximum of 5 words"
    • The generated output is used as the label li
    • The generated (xi, li) pair is added to the set M

  2. Label generation via ICL
    • Create labels for the data not among the prototypes
    • The prompt consists of the following 3 parts:
      1. instruction: "classify the question into one of the labels"
      2. n sampled (utterance, label) pairs
      3. the utterance to be labeled
    • When sampling the n (utterance, label) pairs, KATE is used (KATE is short for Knn-Augmented in-conText Example selection)
      • Compute the similarity between the utterance to be labeled and the utterances in the set M
      • Select the n most similar utterances → denoted N_n(x)
    • The generated (xk, lk) pair is added to the set M
    • A note on the instruction:
      • Even though it says "classify", the LLM can still generate a new label when an utterance with a new intent comes in
      • This minimizes variation among the labels generated for utterances with the same intent

3. Encoding Utterances and Labels

After step 2, every utterance has a label.

Goal: combine each utterance and its label into a single representation

  • The combined vector representation φ_AVG is the average of the utterance and label encodings: φ_AVG(x, l) = (E(x) + E(l)) / 2

  • SMOOTHING: a step that further refines the combined utterance-label representation
    1. Suppresses features specific to individual utterances
    2. Emphasizes features shared across utterances
  • The combined encoding of utterance x is averaged with those of its n' most similar utterances
    • This incorporates the features of utterances similar to x, yielding a more generalized representation
  • n' is determined via the silhouette score
    • The silhouette score measures how well each sample is clustered; the n' that maximizes it is selected
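A sketch of the combination and smoothing steps, under my assumptions that φ_AVG is a simple average of the utterance and label encodings and that neighbors are found by cosine similarity; random vectors stand in for the encoder output:

```python
import numpy as np

def phi_avg(utt_emb, label_emb):
    # Combine the utterance and label encodings by averaging.
    return (utt_emb + label_emb) / 2.0

def phi_smooth(phis, n_prime):
    # Replace each combined vector by the mean of itself and its n'
    # nearest neighbors (cosine similarity), suppressing features
    # specific to individual utterances.
    normed = phis / np.linalg.norm(phis, axis=1, keepdims=True)
    sims = normed @ normed.T
    nn = np.argsort(-sims, axis=1)[:, : n_prime + 1]  # each row includes itself
    return phis[nn].mean(axis=1)

rng = np.random.default_rng(0)
utts, labels = rng.normal(size=(20, 8)), rng.normal(size=(20, 8))
smoothed = phi_smooth(phi_avg(utts, labels), n_prime=3)
```

With n' = 0 the smoothing is a no-op (each vector averaged only with itself), which is a handy sanity check.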

4. Final Intent Discovery

  • Finally, combining the utterances and labels yields N vectors φ_SMOOTH(x, l).
  • K-means is then run on them to discover the intents
  • (Detail from the implementation section): n' is chosen by increasing it from 5 to 45, computing the silhouette score on the resulting φ_SMOOTH each time, and picking the n' with the highest score
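The n' search can be sketched like this; `smooth` is a hypothetical neighbor-averaging step of my own, and random vectors stand in for the combined φ_AVG encodings:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def smooth(X, n_prime):
    # Average each vector with its n' nearest neighbors (cosine similarity).
    normed = X / np.linalg.norm(X, axis=1, keepdims=True)
    nn = np.argsort(-(normed @ normed.T), axis=1)[:, : n_prime + 1]
    return X[nn].mean(axis=1)

def pick_n_prime(X, K, candidates=(5, 15, 25, 35, 45), runs=3):
    # For each candidate n', smooth, run K-means several times, and keep
    # the n' with the highest average silhouette score.
    best_n, best_score = None, -np.inf
    for n_prime in candidates:
        Xs = smooth(X, n_prime)
        score = np.mean([
            silhouette_score(
                Xs, KMeans(n_clusters=K, n_init=10, random_state=r).fit_predict(Xs))
            for r in range(runs)
        ])
        if score > best_score:
            best_n, best_score = n_prime, score
    return best_n

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 16))
n_prime = pick_n_prime(X, K=4)
```

Averaging silhouette scores over several K-means runs guards against a single unlucky initialization skewing the choice of n'.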

Experimental Setup

1. Datasets

CLINC, BANKING, StackOverflow, private dataset from a transportation company

2. Baselines

  • MTP-CLNN
    • However, since MTP-CLNN was pre-trained on the CLINC dataset, comparing on CLINC isn't ideal
  • On CLINC, the comparison is against the semi-supervised methods DAC and SCL+PLT
    • With the Known Class Ratio increased through 25%, 50%, and 75%

3. Evaluation

  • ARI, NMI, and ACC with Hungarian matching are used
  • IDAS's label generation process can yield different labels depending on the order of the data
    • The K-means result can change with the order of the utterances
    • If the utterance order changes, the initially selected prototypes and the order in which subsequent utterances are labeled can differ
  • To see how much utterance order affects the algorithm, steps 1-2 of IDAS are repeated with shuffled utterance orders → this shows how stable the method is to initial conditions and measures the variance
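The ACC metric above maps predicted clusters to ground-truth intents with the Hungarian algorithm; a minimal sketch using scipy:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    # Build a contingency matrix, then find the one-to-one cluster-to-intent
    # mapping that maximizes agreement (Hungarian algorithm).
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    D = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((D, D), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1
    row, col = linear_sum_assignment(cost, maximize=True)
    return cost[row, col].sum() / len(y_true)

# Cluster IDs are arbitrary: a relabeled-but-perfect clustering scores 1.0.
acc = clustering_accuracy([0, 0, 1, 1], [1, 1, 0, 0])
```

ARI and NMI are permutation-invariant by construction, so only ACC needs this explicit matching step.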

4. Implementation

  1. Encoder
    • For the comparison with MTP-CLNN, the MTP encoder is used
    • For the comparisons with SCL+PLT and DAC, SBERT is used
  2. Language models and prompts
    1. LM
      • GPT-3 is used
      • LM temperature = 0: the same intent shouldn't produce multiple label variations
    2. Prompts
      • For banking, chatbot, and transport, the prompt is "Describe the domain question in a maximum of 5 words"
      • Since StackOverflow is topic-oriented, the prompt is "Identify the technology in question"
  3. Nearest Neighbor retrieval
    1. Cosine similarity as the similarity function
    2. n = 8: later experiments showed no change when increasing it further
    3. n' (5 to 45): K-means is run multiple times at the end, and the value with the highest average silhouette score is chosen

Results and Discussion

1. Main results

  1. Unsupervised
    • No labels available
    • Only a test set is available for evaluating the model's clusters
    • The unlabeled test set is included in MTP / MTP-CLNN's encoder pre-training

  • IDAS vs. MTP-CLNN
    • Diamond: MTP-CLNN pre-trained the original way
    • Spade: unlabeled test data added during pre-training
    • IDAS outperforms on every dataset and metric except ACC on BANKING
  • Analysis
    • In this paper's setup, the performance of IDAS and MTP-CLNN (spade) is lower than in the original setting (diamond)
    • Because this paper's setup trains on far fewer samples

2. Semi-supervised

  • IDAS uses a KCR (Known Class Ratio) of 0%
  • IDAS outperforms DAC and SCL+PLT at both 25% and 50% KCR

2. Ablations

  1. Encoding strategies
    • All 4 datasets are used

  • From top to bottom: 1. encoding each utterance alone, 2. encoding the label alone, 3. averaging the utterance and label encodings, 4. averaging followed by smoothing
  • The latter 3 outperform utterance-only encoding on every dataset and metric → summarizing utterances improves intent discovery

2. Inferring the number of smoothing neighbors

  • n' is selected from {5,…,45} and the silhouette score computed for each
  • [Table] silhouette score by n' value

3. Random vs. Nearest Neighbor

  • IDAS uses KATE to select the n data points N_n(x) closest to x
    (KATE is short for Knn-Augmented in-conText Example selection)

  • Using KATE yields a clearly higher score than not using ICL at all
  • Small values such as n = 1, 2, 4 show little difference in score
  • Scores are best at n = 8 and 16
  • Prior work found that increasing n beyond a certain point barely changes performance → n = 8 was chosen

4. Overestimating the number of prototypes

  • Clustering with twice the ground-truth number of clusters (K × 2) did not significantly degrade performance

Conclusion & Limitations

Conclusion

IDAS: uses a frozen pre-trained encoder → summarizes utterances into intents

Labeling with an LLM works better than training an unsupervised encoder

Limitations

  1. There is no step that estimates K for K-means: it always starts from twice the ground truth
    • In practice the ground truth is unknown, so an estimation method is needed
    • Still, overestimating by 2× did not significantly hurt performance
  2. Only GPT-3 was used; it would be worth trying other models