Abstract
Negative item sampling in offline top-n recommendation evaluation has become increasingly widespread but remains controversial. While several studies have warned against using sampled evaluation metrics because they approximate the full ranking (i.e., the ranking over all negative items) poorly, others have highlighted their improved discriminative power and potential to make evaluation more robust. Unfortunately, empirical studies on negative item sampling are based on relatively few methods (between 3 and 12) and therefore lack the statistical power to assess the impact of negative item sampling in practice. In this article, we present preliminary findings from a comprehensive benchmarking study of negative item sampling based on 52 recommendation algorithms and 3 benchmark data sets. We show how the number of sampled negative items and different sampling strategies affect the consistency and discriminative power of sampled evaluation metrics. Furthermore, we investigate the impact of sparsity bias and popularity bias on the robustness of these metrics. In brief, we show that the optimal parameterizations for negative item sampling depend on data set characteristics and the goals of the investigator, suggesting a need for greater transparency in related experimental design decisions.
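To make the distinction between full and sampled evaluation concrete, the following is a minimal sketch, not the protocol used in the study. It assumes uniform sampling of negatives (only one of several possible strategies), a single held-out positive item per user, and hit rate at k as the metric. All function names, the toy scores, and the parameter values are hypothetical.

```python
import random


def rank_of_target(scores, target, negatives):
    """Return the 1-based rank of `target` among `negatives` by descending score."""
    target_score = scores[target]
    # The rank is one plus the number of negatives scored strictly higher.
    return 1 + sum(1 for item in negatives if scores[item] > target_score)


def hit_at_k(scores, target, negatives, k):
    """Return 1 if the held-out target ranks within the top k, else 0."""
    return int(rank_of_target(scores, target, negatives) <= k)


def sample_negatives(all_items, seen, target, n, rng):
    """Uniformly sample up to n items the user has not interacted with."""
    pool = [i for i in all_items if i not in seen and i != target]
    return rng.sample(pool, min(n, len(pool)))


# Toy setup (assumption): 1000 items with random scores from some recommender.
rng = random.Random(0)
all_items = list(range(1000))
scores = {i: rng.random() for i in all_items}
seen = set(range(10))   # items the user interacted with during training
target = 500            # held-out positive item
scores[target] = 0.95   # the model scores the target fairly high

# Full evaluation ranks the target against every unseen item.
full_negatives = [i for i in all_items if i not in seen and i != target]
# Sampled evaluation ranks it against a small uniform sample only.
sampled_negatives = sample_negatives(all_items, seen, target, 100, rng)

full_rank = rank_of_target(scores, target, full_negatives)
sampled_rank = rank_of_target(scores, target, sampled_negatives)
print("full rank:", full_rank, "hit@10:", hit_at_k(scores, target, full_negatives, 10))
print("sampled rank:", sampled_rank, "hit@10:", hit_at_k(scores, target, sampled_negatives, 10))
```

Because the sampled negatives are a subset of the full negative set, the sampled rank can never be worse than the full rank. This is one reason a sampled metric may diverge from its full-ranking counterpart, and why the number of sampled negatives matters.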
Original language | English |
---|---|
Pages | 1152-1157 |
Number of pages | 6 |
DOIs | |
Status | Published - 14 Sept 2023 |
Ministry of Education and Culture (OKM) publication type | Not applicable |
Event | ACM Conference on Recommender Systems - Singapore, Singapore Duration: 18 Sept 2023 → 22 Sept 2023 Conference number: 17 |
Conference
Conference | ACM Conference on Recommender Systems |
---|---|
Abbreviated title | RecSys |
Country/Territory | Singapore |
City | Singapore |
Period | 18/09/2023 → 22/09/2023 |
Additional information
Publisher Copyright: © 2023 Owner/Author.
Fields of science
- 113 Computer and information sciences