<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>DSpace Collection</title>
  <link rel="alternate" href="http://hdl.handle.net/10174/38457" />
  <subtitle />
  <id>http://hdl.handle.net/10174/38457</id>
  <updated>2026-04-09T13:33:18Z</updated>
  <dc:date>2026-04-09T13:33:18Z</dc:date>
  <entry>
    <title>HiPC-QR: Hierarchical Prompt Chaining for Query Reformulation</title>
    <link rel="alternate" href="http://hdl.handle.net/10174/41219" />
    <author>
      <name>Yang, Hua</name>
    </author>
    <author>
      <name>Li, Hanyang</name>
    </author>
    <author>
      <name>Gonçalves, Teresa</name>
    </author>
    <id>http://hdl.handle.net/10174/41219</id>
    <updated>2026-02-16T15:15:02Z</updated>
    <published>2025-01-01T00:00:00Z</published>
    <summary type="text">Title: HiPC-QR: Hierarchical Prompt Chaining for Query Reformulation
Authors: Yang, Hua; Li, Hanyang; Gonçalves, Teresa
Abstract: Query reformulation techniques optimize user queries to better align with documents, thus improving the performance of Information Retrieval (IR) systems. Previous methods have primarily focused on query expansion using techniques such as synonym replacement to improve recall. With the rapid advancement of Large Language Models (LLMs), the knowledge embedded within these models has grown. Research in prompt engineering has introduced various methods, with prompt chaining proving particularly effective for complex tasks. Directly prompting LLMs to reformulate queries has become a viable approach. However, existing LLM-based prompt methods for query reformulation often introduce irrelevant content into reformulated queries, resulting in decreased retrieval precision and misalignment with user intent. We propose a novel approach called Hierarchical Prompt Chaining for Query Reformulation (HiPC-QR). HiPC-QR employs a two-step prompt chaining technique to extract keywords from the original query and refine its structure by filtering out non-essential keywords based on the user’s query intent. This process reduces the query’s restrictiveness while simultaneously expanding essential keywords to enhance retrieval effectiveness. We evaluated the effectiveness of HiPC-QR on two benchmark retrieval datasets, namely MS MARCO and TREC Deep Learning. The experimental results show that HiPC-QR outperforms existing query reformulation methods on large-scale datasets in terms of both recall@10 and MRR@10.</summary>
    <dc:date>2025-01-01T00:00:00Z</dc:date>
  </entry>
  <entry>
    <title>MultiLTR: Text Ranking with a Multi-Stage Learning-to-Rank Approach</title>
    <link rel="alternate" href="http://hdl.handle.net/10174/41217" />
    <author>
      <name>Yang, Hua</name>
    </author>
    <author>
      <name>Gonçalves, Teresa</name>
    </author>
    <id>http://hdl.handle.net/10174/41217</id>
    <updated>2026-02-16T15:13:45Z</updated>
    <published>2025-01-01T00:00:00Z</published>
    <summary type="text">Title: MultiLTR: Text Ranking with a Multi-Stage Learning-to-Rank Approach
Authors: Yang, Hua; Gonçalves, Teresa
Abstract: The division of retrieval into multiple stages has evolved to balance efficiency and effectiveness among various ranking models. Faster but less accurate models are used to retrieve results from the entire corpus. Slower yet more precise models refine the ranking within the top candidate list. This study proposes a multi-stage learning-to-rank (MultiLTR) method. MultiLTR applies learning-to-rank techniques across multiple stages. It incorporates text from different fields such as titles, body content, and abstracts to produce a more comprehensive and accurate ranking. MultiLTR iteratively refines ranking accuracy through sequential processing phases. It dynamically selects top-performing rankers from a diverse candidate pool at each stage. Experiments were carried out on benchmark datasets, MQ2007 and MQ2008, using three categories of learning-to-rank algorithms. The results demonstrate that MultiLTR outperforms state-of-the-art ranking approaches, particularly in field-based ranking tasks. This study improves ranking accuracy and offers new insights into enhancing multi-stage ranking strategies.</summary>
    <dc:date>2025-01-01T00:00:00Z</dc:date>
  </entry>
  <entry>
    <title>Improving Consumer Health Search with Field-Level Learning-to-Rank Techniques</title>
    <link rel="alternate" href="http://hdl.handle.net/10174/41215" />
    <author>
      <name>Yang, Hua</name>
    </author>
    <author>
      <name>Gonçalves, Teresa</name>
    </author>
    <id>http://hdl.handle.net/10174/41215</id>
    <updated>2026-02-16T15:13:02Z</updated>
    <published>2024-01-01T00:00:00Z</published>
    <summary type="text">Title: Improving Consumer Health Search with Field-Level Learning-to-Rank Techniques
Authors: Yang, Hua; Gonçalves, Teresa
Abstract: In the area of consumer health search (CHS), there is an increasing concern about returning topically relevant and understandable health information to the user. Besides being used to rank topically relevant documents, Learning to Rank (LTR) has also been used to promote understandability ranking. Traditionally, features coming from different document fields are joined together, limiting the performance of standard LTR, since field information plays an important role in promoting understandability ranking. In this paper, a novel field-level Learning-to-Rank (f-LTR) approach is proposed, and its application in CHS is investigated by conducting thorough experiments on the CLEF 2016–2018 eHealth IR data collections. An in-depth analysis of the effects of using f-LTR is provided, with experimental results suggesting that in LTR, title features are more effective than other field features in promoting understandability ranking. Moreover, the fused f-LTR model is compared to existing work, confirming the effectiveness of the methodology.</summary>
    <dc:date>2024-01-01T00:00:00Z</dc:date>
  </entry>
  <entry>
    <title>Review and Empirical Analysis of Machine Learning-Based Software Effort Estimation</title>
    <link rel="alternate" href="http://hdl.handle.net/10174/41212" />
    <author>
      <name>Rahman, Mizanur</name>
    </author>
    <author>
      <name>Sarwar, Hasan</name>
    </author>
    <author>
      <name>Kader, MD. Abdul</name>
    </author>
    <author>
      <name>Gonçalves, Teresa</name>
    </author>
    <author>
      <name>Ting Tin, Tin</name>
    </author>
    <id>http://hdl.handle.net/10174/41212</id>
    <updated>2026-02-16T15:11:32Z</updated>
    <published>2024-01-01T00:00:00Z</published>
    <summary type="text">Title: Review and Empirical Analysis of Machine Learning-Based Software Effort Estimation
Authors: Rahman, Mizanur; Sarwar, Hasan; Kader, MD. Abdul; Gonçalves, Teresa; Ting Tin, Tin
Abstract: The average software company spends a substantial share of its revenue on Research and Development (R&amp;D) aimed at delivering software on time. Accurate software effort estimation is critical for successful project planning, resource allocation, and on-time delivery within budget for sustainable software development. However, both overestimation and underestimation can pose significant challenges, highlighting the need for continuous improvement in estimation techniques. This study reviews recent machine learning approaches employed to enhance the accuracy of software effort estimation (SEE), focusing on research published between 2020 and 2023. The literature review employed a systematic approach to identify relevant research on machine learning techniques for SEE. Additionally, comparative experiments were conducted using five commonly employed Machine Learning (ML) methods: K-Nearest Neighbor, Support Vector Machine, Random Forest, Logistic Regression, and LASSO Regression. The performance of these techniques was evaluated using five widely adopted accuracy metrics: Mean Squared Error (MSE), Mean Magnitude of Relative Error (MMRE), R-squared, Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). The evaluation was carried out on seven benchmark datasets: Albrecht, Desharnais, China, Kemerer, Miyazaki94, Maxwell, and COCOMO, which are publicly available and extensively used in SEE research. By carefully reviewing study quality, analyzing results across the literature, and rigorously evaluating experimental outcomes, clear conclusions were drawn about the most promising techniques for achieving state-of-the-art accuracy in estimating software effort.
This study makes three key contributions to the field: firstly, it furnishes a thorough overview of recent machine learning research in software effort estimation (SEE); secondly, it provides data-driven guidance for researchers and practitioners to select optimal methods for accurate effort estimation; and thirdly, it demonstrates the performance of publicly available datasets through experimental analysis. Enhanced estimation supports the development of better predictive models for software project time, cost, and staffing needs. The findings aim to guide future research directions and tool development toward the most accurate machine learning approaches for modelling software development effort, costs, and delivery schedules, ultimately contributing to more efficient and cost-effective software projects.</summary>
    <dc:date>2024-01-01T00:00:00Z</dc:date>
  </entry>
</feed>