Information retrieval in real-time search faces different challenges from classical web search. (1) On the query side, user search intents change rapidly as breaking news such as earthquakes, elections, and wars occurs and evolves. Previous dense retrieval works focus on static semantic representations and do not capture instant search intent, which leads to poor recall of the latest event-related documents in time-sensitive scenarios. To tackle this problem, in this paper we propose to mine event information from recent large-scale user behavior logs and fuse it with the query through a cross-attention mechanism to obtain a time-context query representation. (2) On the document side, user-generated content (UGC) is usually colloquial and cluttered in its textual expression. We apply prompt-tuning to improve the UGC representation. In addition, we use a global buffer that caches recent training batches to mine hard negative examples dynamically. We conduct a set of offline experiments on a million-scale production dataset to evaluate our approach, and we also run an A/B test in a real online system to verify its performance. Extensive experimental results demonstrate that the proposed approach substantially outperforms existing state-of-the-art baseline methods.
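The query-side fusion step can be illustrated with a minimal, single-head sketch of cross-attention in which the query embedding attends over a set of event embeddings. This is not the authors' implementation: it assumes the event embeddings have already been mined from recent behavior logs, it omits the learned query/key/value projections a real model would train, and the residual mixing weight `alpha` and the function name `cross_attention_fuse` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(query: Vector, events: List[Vector], alpha: float = 0.5) -> Vector:
    """Hypothetical sketch: the query attends over event embeddings.

    Assumptions: keys = values = raw event embeddings (no learned
    projections), a single attention head, and a fixed residual weight
    `alpha` that mixes the attended event context back into the query.
    """
    if not events:
        # No recent events: fall back to the static query representation.
        return list(query)
    d = len(query)
    # Scaled dot-product scores between the query and each event.
    scores = [dot(query, e) / math.sqrt(d) for e in events]
    weights = softmax(scores)
    # Attention-weighted sum of event embeddings (the "event context").
    context = [sum(w * e[i] for w, e in zip(weights, events)) for i in range(d)]
    # Time-context query representation: residual mix of query and context.
    return [(1 - alpha) * q + alpha * c for q, c in zip(query, context)]
```

For example, `cross_attention_fuse([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` shifts the query toward the event it is most similar to while still borrowing some signal from the other event. In a trained model the projections and the mixing would be learned rather than fixed.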
marinyoung4596 / err
Event-driven Real-time Retrieval with Prompt-tuning
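The global buffer for dynamic hard-negative mining described in the abstract can be sketched as a fixed-size FIFO of recent training batches: for each query, every cached document except the positive is scored, and the highest-scoring ones are returned as hard negatives. This is an illustrative sketch under stated assumptions, not the repository's code. The class name `HardNegativeBuffer`, the `max_batches` capacity, and the use of dot-product similarity are all hypothetical choices.

```python
import heapq
from collections import deque
from typing import Deque, Dict, Iterable, List, Sequence, Tuple

Embedding = Sequence[float]


class HardNegativeBuffer:
    """Hypothetical FIFO cache of document embeddings from recent batches."""

    def __init__(self, max_batches: int) -> None:
        # deque(maxlen=...) evicts the oldest batch automatically once full.
        self._batches: Deque[List[Tuple[str, Embedding]]] = deque(maxlen=max_batches)

    def push_batch(self, docs: Iterable[Tuple[str, Embedding]]) -> None:
        """Cache the (doc_id, embedding) pairs of one training batch."""
        self._batches.append(list(docs))

    def mine(self, query: Embedding, positive_id: str, k: int) -> List[str]:
        """Return up to k cached doc ids most similar to the query, excluding the positive."""
        best: Dict[str, float] = {}
        for batch in self._batches:
            for doc_id, emb in batch:
                if doc_id == positive_id:
                    continue  # never use the query's own positive as a negative
                score = sum(q * d for q, d in zip(query, emb))
                # Keep the highest score if a document appears in several batches.
                if score > best.get(doc_id, float("-inf")):
                    best[doc_id] = score
        top = heapq.nlargest(k, best.items(), key=lambda item: item[1])
        return [doc_id for doc_id, _ in top]
```

One trade-off worth noting: embeddings cached from older batches were produced by an earlier version of the encoder, so a small buffer keeps negatives fresher while a larger one offers more candidates. A production version would also score candidates with a batched matrix multiply rather than a Python loop.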