SparseCL

Sparse Contrastive Learning for Contradiction Retrieval

MIT EECS · University of California, Los Angeles

Contradiction retrieval is an important task, but current methods suffer from either low retrieval quality (bi-encoders with similarity search) or computational inefficiency (cross-encoders).

SparseCL leverages specially trained sentence embeddings designed to preserve subtle, contradictory nuances between sentences, and uses a combined metric of cosine similarity and sparsity to efficiently identify and retrieve documents that contradict a given query.


Figure 1. Performance gains in NDCG@10 across different sentence embedding models and synthetic datasets, showcasing the effectiveness and robustness of our SparseCL compared with standard contrastive learning (CL).


Figure 2. Comparison of our SparseCL with the Cross-Encoder and the contrastive-learning-based Bi-Encoder for contradiction retrieval.

Abstract

Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query, which is important to many downstream applications like fact checking and data cleaning. To retrieve passages that contradict a query from large document corpora, existing methods such as similarity search and cross-encoder models exhibit significant limitations. The former struggles to capture the essence of contradiction because of its inherent bias toward similarity, while the latter suffers from computational inefficiency, especially when the corpus is large. To address these challenges, we introduce a novel approach, SparseCL, which leverages specially trained sentence embeddings designed to preserve subtle, contradictory nuances between sentences. Our method uses a combined metric of cosine similarity and a sparsity function to efficiently identify and retrieve documents that contradict a given query. This approach dramatically speeds up contradiction detection by reducing exhaustive document comparisons to simple vector computations. We validate our model on the ArguAna dataset, a benchmark specifically geared towards contradiction retrieval, as well as on synthetic contradictions generated from the MSMARCO and HotpotQA datasets using GPT-4. Our experiments demonstrate the efficacy of our approach, not only in contradiction retrieval, with more than 30% accuracy improvements on MSMARCO and HotpotQA across different model architectures, but also in applications such as cleaning corrupted corpora to restore high-quality QA retrieval. This paper outlines a promising direction for improving the accuracy and efficiency of contradiction retrieval in large-scale text corpora.

Problem Formulation

We consider the contradiction retrieval problem: given a passage corpus \(C=\{p_1, p_2, \ldots, p_n\}\) and a query passage \(q\), retrieve the "best" passage \(p_* \in C\) that contradicts \(q\). We assume that several similar passages supporting \(q\) might also exist in \(C\).

SparseCL

Sparsity-Enhanced Embeddings: We use "sparsity" to characterize the contradiction between two sentence embeddings. A contradiction between two passages should manifest as a difference in a small semantic subspace, rather than differing across the entire embedding space.

We use contrastive learning to fine-tune any pretrained sentence embedding model to generate the desired sparsity-enhanced embeddings. The positive example for a passage is its contradicting passage in the training set; the hard negative example is a similar passage in the training set; other random in-batch passages serve as soft negatives. The sparsity function we choose is the Hoyer sparsity function from [1]. Let \(h_1\) and \(h_2\) be two sentence embeddings of dimension \(d\). We define $$\text{Hoyer}(h_1,h_2)=\left(\sqrt{d}-\frac{\|h_1-h_2\|_1}{\|h_1-h_2\|_2}\right) /\left(\sqrt{d}-1\right)$$
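
As a concrete illustration, here is a minimal PyTorch sketch of the Hoyer sparsity function; the function name, the `eps` stabilizer, and the batching convention are our own choices rather than details fixed by the paper.

```python
import torch

def hoyer_sparsity(h1: torch.Tensor, h2: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Hoyer sparsity of the difference vector h1 - h2.

    Returns a value in [0, 1]: close to 1 when the difference is concentrated
    in a few coordinates, close to 0 when it is spread evenly across all d
    coordinates. Accepts (..., d) tensors and broadcasts over leading dims.
    """
    diff = h1 - h2
    d = diff.shape[-1]
    l1 = diff.abs().sum(dim=-1)            # ||h1 - h2||_1
    l2 = diff.norm(p=2, dim=-1)            # ||h1 - h2||_2
    return (d ** 0.5 - l1 / (l2 + eps)) / (d ** 0.5 - 1)
```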

Finally, for each training tuple \((x_i, x^+_i, x^-_i)\) with embeddings \((h_i, h^+_i, h^-_i)\), batch size \(N\), and temperature \(\tau\), its loss function is defined as $$l_i=-\log\frac{e^{\text{Hoyer}(h_i,h^+_i)/\tau}}{\sum^N_{j=1}\left(e^{\text{Hoyer}(h_i,h^+_j)/\tau}+e^{\text{Hoyer}(h_i,h^-_j)/\tau}\right)}$$
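
An in-batch implementation of this loss, reusing `hoyer_sparsity` from the sketch above, might look as follows; the tensor shapes and the cross-entropy reduction over the batch are implementation assumptions, while the denominator over all \(N\) positives and \(N\) hard negatives mirrors the formula.

```python
import torch
import torch.nn.functional as F

def sparsecl_loss(h: torch.Tensor, h_pos: torch.Tensor, h_neg: torch.Tensor,
                  tau: float = 0.05) -> torch.Tensor:
    """SparseCL loss for a batch of N (anchor, positive, hard-negative) tuples.

    h, h_pos, h_neg: (N, d) embeddings. For anchor i, the numerator uses its
    own positive; the denominator sums over all positives and hard negatives
    in the batch, so the other tuples act as soft negatives.
    """
    n = h.shape[0]
    pos = hoyer_sparsity(h.unsqueeze(1), h_pos.unsqueeze(0)) / tau  # (N, N)
    neg = hoyer_sparsity(h.unsqueeze(1), h_neg.unsqueeze(0)) / tau  # (N, N)
    logits = torch.cat([pos, neg], dim=1)                           # (N, 2N)
    labels = torch.arange(n, device=h.device)  # anchor i's target is column i
    return F.cross_entropy(logits, labels)     # averages l_i over the batch
```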

Score function for contradiction retrieval: we use a weighted sum of the standard cosine similarity and our Hoyer sparsity function. The weight is tunable to adapt to different case-dependent contradiction criteria.
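
Retrieval then reduces to scoring every corpus passage against the query with this weighted sum. The sketch below assumes two embedding sets, one from a standard model for cosine similarity and one from the SparseCL fine-tuned model for Hoyer sparsity, with a hypothetical weight `alpha`; none of these names are fixed by the paper.

```python
import torch
import torch.nn.functional as F

def retrieve_contradictions(q_cos: torch.Tensor, p_cos: torch.Tensor,
                            q_sparse: torch.Tensor, p_sparse: torch.Tensor,
                            alpha: float = 1.0, k: int = 10):
    """Rank passages by cosine similarity + alpha * Hoyer sparsity.

    q_cos: (d,) query embedding from the base model; p_cos: (n, d) corpus
    embeddings from the same model. q_sparse / p_sparse: counterparts from
    the SparseCL fine-tuned model. Returns top-k scores and passage indices.
    """
    cos = F.cosine_similarity(q_cos.unsqueeze(0), p_cos, dim=-1)  # (n,)
    hoyer = hoyer_sparsity(q_sparse.unsqueeze(0), p_sparse)       # (n,)
    return (cos + alpha * hoyer).topk(k)
```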


Figure 3. Histograms of the Hoyer sparsity of sentence-embedding differences for different passage pairs on the HotpotQA [3] test set. The left histogram is produced by a standard sentence embedding model ("bge-base-en-v1.5" [5]); the right histogram is produced by our sentence embedding model fine-tuned from "bge-base-en-v1.5" using SparseCL.

Contradiction Retrieval Experiment


Table 4: Results for different models and methods on the contradiction retrieval task. Experiments are run on the ArguAna dataset [2] and modified MSMARCO [4] and HotpotQA [3] datasets. We report the NDCG@10 score (higher is better). "UAE" stands for "UAE-Large-V1" [6], "BGE" for "bge-base-en-v1.5" [5], "GTE" for "gte-large-en-v1.5" [7], "SFR-Mistral" for "SFR-Embedding-Mistral" [8], and "VOYAGE" for "voyage-lite-02-instruct". The "Method" column denotes the score function used to retrieve contradictions; we consider two: cosine similarity, and cosine similarity plus Hoyer sparsity. "Zeroshot" denotes direct testing of the model without any fine-tuning, "CL" denotes fine-tuning with standard contrastive learning, and "SparseCL" denotes fine-tuning with Hoyer sparsity contrastive learning (our method).

Retrieval Corpus Cleaning Experiment

As an application of contradiction retrieval, we test how well our method can find inconsistencies within a corpus and clean the corpus for future training or QA retrieval. We first inject corrupted passages that contradict existing documents into the corpus and measure the resulting degradation in QA retrieval accuracy. Then, we use our contradiction retrieval method to filter out the corrupted data and measure the retrieval accuracy again.
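
As a rough illustration of the filtering step, the sketch below flags any passage whose best contradiction score against the rest of the corpus exceeds a threshold. The quadratic pairwise scan, the threshold value, and the flagging rule are all illustrative simplifications, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def filter_corrupted(p_cos: torch.Tensor, p_sparse: torch.Tensor,
                     alpha: float = 1.0, threshold: float = 1.5) -> torch.Tensor:
    """Return a boolean keep-mask over n passages.

    A passage is flagged when some other passage contradicts it with score
    cos + alpha * Hoyer above `threshold`. O(n^2) memory; a real pipeline
    would restrict comparisons to near neighbors and needs a rule for which
    member of a contradicting pair is the corrupted one.
    """
    cos = F.cosine_similarity(p_cos.unsqueeze(1), p_cos.unsqueeze(0), dim=-1)
    hoyer = hoyer_sparsity(p_sparse.unsqueeze(1), p_sparse.unsqueeze(0))
    scores = cos + alpha * hoyer
    scores.fill_diagonal_(float("-inf"))  # ignore self-comparisons
    return scores.max(dim=1).values < threshold
```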


Table 5: Experimental results for the impact of corrupted data on QA retrieval and contradiction retrieval for filtration. "Acc" represents the retrieval accuracy measured by the NDCG@10 score and "Corrupt" represents the fraction of returned passages that are corrupted, as measured by Recall@10.

References

[1] Niall Hurley and Scott Rickard. Comparing measures of sparsity. IEEE Transactions on Information Theory, 55(10):4723–4741, 2009.

[2] Henning Wachsmuth, Shahbaz Syed, and Benno Stein. Retrieval of the best counterargument without prior topic knowledge. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 241–251, 2018.

[3] Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William Cohen, Ruslan Salakhutdinov, and Christopher D. Manning. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii, editors, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2369–2380, Brussels, Belgium, October-November 2018. Association for Computational Linguistics.

[4] Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. MS MARCO: A human generated machine reading comprehension dataset. CoRR, abs/1611.09268, 2016.

[5] Shitao Xiao, Zheng Liu, Peitian Zhang, and Niklas Muennighoff. C-Pack: Packaged resources to advance general Chinese embedding, 2023.

[6] Xianming Li and Jing Li. Angle-optimized text embeddings. arXiv preprint arXiv:2309.12871, 2023.

[7] Zehan Li, Xin Zhang, Yanzhao Zhang, Dingkun Long, Pengjun Xie, and Meishan Zhang. Towards general text embeddings with multi-stage contrastive learning. arXiv preprint arXiv:2308.03281, 2023.

[8] Rui Meng, Ye Liu, Shafiq Rayhan Joty, Caiming Xiong, Yingbo Zhou, and Semih Yavuz. SFR-Embedding-Mistral: Enhance text retrieval with transfer learning. Salesforce AI Research Blog, 2024.