QUICK REVIEW

[Paper Review] Response to Moffat's Comment on "Towards Meaningful Statements in IR Evaluation: Mapping Evaluation Measures to Interval Scales"

Marco Ferrante, Nicola Ferro | arXiv (Cornell University) | December 22, 2022

Advanced Text Analysis Techniques | Citations: 24

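One-Line Summary

This paper responds to Moffat's criticism by clarifying the representational theory of measurement, the concept of meaningfulness, and the intervalization approach for IR evaluation measures.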

ABSTRACT

Moffat recently commented on our previous work. Our work focused on how laying the foundations of our evaluation methodology into the theory of measurement can improve our knowledge and understanding of the evaluation measures we use in IR and how it can shed light on the different types of scales adopted by our evaluation measures; we also provided evidence, through extensive experimentation, on the impact of the different types of scales on the statistical analyses, as well as on the impact of departing from their assumptions. Moreover, we investigated, for the first time in IR, the concept of meaningfulness, i.e. the invariance of the experimental statements and inferences you draw, and proposed it as a way to ensure more valid and generalizable results. Moffat's comments build on: (i) misconceptions about the representational theory of measurement, such as what an interval scale actually is and what axioms it has to comply with; (ii) they totally miss the central concept of meaningfulness. Therefore, we reply to Moffat's comments by properly framing them in the representational theory of measurement and in the concept of meaningfulness. All in all, we can only reiterate what we said several times: the goal of this research line is to theoretically ground our evaluation methodology - and IR is a field where it is extremely challenging to perform any theoretical advances - in order to aim for more robust and generalizable inferences - something we currently lack in the field. Possibly there are other and better ways to achieve this objective and these proposals could emerge from an open discussion in the field and from the work of others. On the other hand, reducing everything to a contrast on what is (or pretends to be) an interval scale or whether all or none of the evaluation measures are interval scales may be more a barrier than a help in progressing towards this goal.

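Research Motivation and Goals

Clarify misconceptions about the representational theory of measurement in IR evaluation.
Argue for the role of meaningfulness as invariance under permissible scale transformations.
Defend the intervalization approach, proposed as a way to preserve the user's perspective in IR measures.
Discuss the connection between measurement axioms (difference structures) and interval scales in IR.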

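Proposed Method

Reviews the basic concepts of measurement and scale types (nominal, ordinal, interval, ratio).
Explains solvability (equal spacing between intervals) as a core axiom for interval scales.
Defines meaningfulness as the invariance of statements under permissible transformations.
Presents intervalization as a procedure that converts ranks into an interval scale while preserving the underlying order.
Provides an empirical evaluation of the impact of intervalization on statistical analyses across a variety of datasets.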
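The definition of meaningfulness above can be illustrated with a small sketch. It assumes two hypothetical systems with made-up per-topic scores (`system_a`, `system_b`) and a helper `mean_is_greater` that is not from the paper. The statement "system A has a higher mean than system B" should survive an affine transformation (permissible for interval scales) but may flip under a merely order-preserving one (permissible only for ordinal scales).

```python
import math
from statistics import mean

def mean_is_greater(a, b, transform=lambda x: x):
    """Return True if the mean of transformed a exceeds the mean of transformed b."""
    return mean([transform(x) for x in a]) > mean([transform(x) for x in b])

# Hypothetical per-topic scores (illustrative numbers, not from the paper).
system_a = [0.1, 0.9]
system_b = [0.45, 0.45]

def affine(x):
    # Positive affine map: a permissible transformation for an interval scale.
    return 3.0 * x + 2.0

def monotone(x):
    # Strictly increasing but non-affine: permissible only for an ordinal scale.
    return math.sqrt(x)

baseline = mean_is_greater(system_a, system_b)
after_affine = mean_is_greater(system_a, system_b, affine)
after_monotone = mean_is_greater(system_a, system_b, monotone)

print(baseline, after_affine, after_monotone)
```

In this toy setup the comparison of means keeps its truth value under the affine map and changes under the square root, which is the sense in which a mean comparison is meaningful on an interval scale but not on an ordinal one.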

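Experimental Results

Research Questions

RQ1: What constitutes a valid interval scale for IR evaluation measures under the representational theory of measurement?
RQ2: Can IR evaluation measures be meaningfully interpreted as interval scales, and what are the consequences for statistical analysis?
RQ3: Does intervalization preserve the user's view while enabling meaningful inferences?
RQ4: What implications does departing from interval-scale assumptions have for IR evaluation and inference?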

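Key Findings

An interval scale requires equally spaced intervals and affine permissible transformations; not all IR measures satisfy this.
Meaningfulness concerns the invariance of statements under permissible transformations, not subjective interpretability.
Intervalization preserves the order induced by a measure while enabling interval-scale analyses and tests.
The authors present extensive experiments showing how scale assumptions affect statistical analyses across standard IR tasks.
The response reiterates that its goal is a theoretical grounding for robust and generalizable inferences in IR, not forcing every measure onto an interval scale.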
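The order-preserving mapping described in the findings above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that intervalization replaces each attainable value of a measure with its rank position among the distinct attainable values, so that adjacent values become equally spaced. The function `intervalize` and the list `attainable` are hypothetical.

```python
def intervalize(attainable_values):
    """Map each distinct attainable value to its rank position (1, 2, 3, ...).

    Assumption: rank positions stand in for equally spaced interval-scale values.
    """
    ordered = sorted(set(attainable_values))
    return {value: rank for rank, value in enumerate(ordered, start=1)}

# Hypothetical attainable values of some measure; note the unequal gaps.
attainable = [0.0, 0.1, 0.5, 0.55, 1.0]
mapping = intervalize(attainable)

# Observed scores are re-expressed on the equally spaced scale.
scores = [0.5, 0.0, 1.0]
interval_scores = [mapping[s] for s in scores]

print(interval_scores)
```

The ordering of the original scores is unchanged, while the gaps between consecutive attainable values all become one unit, which is what lets interval-scale statistics be applied under the stated assumption.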

This review was generated by AI and reviewed by a human editor.