Web3 and the State: Indian state's redescription of blockchain

Abstract: The article does a close reading of a discussion paper by NITI Aayog and a strategy paper by the Ministry of Electronics and Information Technology (MeitY) advocating non-financial use cases of blockchain in India. By noting the discursive shift from transparency to trust that grounds these two documents and consequently the Indian state's redescription of blockchain, the paper foregrounds how governance by infrastructure is at the heart of new forms of governance and how blockchain systems are being designated as decentral by states to have recentralizing effects. The paper highlights how a mapping of discursive shifts of notions such as trust, transparency, (de)centralization and (dis)intermediation can be a potent site to investigate redescriptions of emerging sociotechnical systems.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 21 pages

arXiv:2404.16831 [pdf, other]

The Third Monocular Depth Estimation Challenge

Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 submissions outperforming the baseline on the test set; 10 of them submitted a report describing their approach, highlighting a widespread use of foundational models such as Depth Anything at the core of their method. The challenge winners drastically improved 3D F-Score performance, from 17.51% to 23.72%.

Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: To appear in CVPRW2024

arXiv:2404.12772 [pdf, other]

Generating Test Scenarios from NL Requirements using Retrieval-Augmented LLMs: An Industrial Study

Authors: Chetan Arora, Tomas Herda, Verena Homm

Abstract: Test scenarios are specific instances of test cases that describe actions to validate a particular software functionality. By outlining the conditions under which the software operates and the expected outcomes, test scenarios ensure that the software functionality is tested in an integrated manner. Test scenarios are crucial for systematically testing an application under various conditions, including edge cases, to identify potential issues and guarantee overall performance and reliability. Specifying test scenarios is tedious and requires a deep understanding of software functionality and the underlying domain. It further demands substantial effort and investment from already time- and budget-constrained requirements engineers and testing teams. This paper presents an automated approach (RAGTAG) for test scenario generation using Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs). RAG allows the integration of specific domain knowledge with LLMs' generation capabilities. We evaluate RAGTAG on two industrial projects from Austrian Post with bilingual requirements in German and English. Our results from an interview survey conducted with four experts on five dimensions (relevance, coverage, correctness, coherence and feasibility) affirm the potential of RAGTAG in automating test scenario generation. Specifically, our results indicate that, despite the difficult task of analyzing bilingual requirements, RAGTAG is able to produce scenarios that are well-aligned with the underlying requirements and provide coverage of different aspects of the intended functionality. The generated scenarios are easily understandable to experts and feasible for testing in the project environment. The overall correctness is deemed satisfactory; however, gaps in capturing exact action sequences and domain nuances remain, underscoring the need for domain expertise when applying LLMs.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.06371 [pdf, other]

Model Generation from Requirements with LLMs: an Exploratory Study

Authors: Alessio Ferrari, Sallam Abualhaija, Chetan Arora

Abstract: Complementing natural language (NL) requirements with graphical models can improve stakeholders' communication and provide directions for system design. However, creating models from requirements involves manual effort. The advent of generative large language models (LLMs), ChatGPT being a notable example, offers promising avenues for automated assistance in model generation. This paper investigates the capability of ChatGPT to generate a specific type of model, i.e., UML sequence diagrams, from NL requirements. We conduct a qualitative study in which we examine the sequence diagrams generated by ChatGPT for 28 requirements documents of various types and from different domains. Observations from the analysis of the generated diagrams have systematically been captured through evaluation logs, and categorized through thematic analysis. Our results indicate that, although the models generally conform to the standard and exhibit a reasonable level of understandability, their completeness and correctness with respect to the specified requirements often present challenges. This issue is particularly pronounced in the presence of requirements smells, such as ambiguity and inconsistency. The insights derived from this study can influence the practical utilization of LLMs in the RE process, and open the door to novel RE-specific prompting strategies targeting effective model generation.

Submitted 9 April, 2024; originally announced April 2024.

ACM Class: D.2; K.6.3; D.2.1; D.3.1; D.2.2; D.2.10; D.2.2; I.2; I.2.7

arXiv:2404.05442 [pdf]

Unlocking Adaptive User Experience with Generative AI

Authors: Yutan Huang, Tanjila Kanij, Anuradha Madugalla, Shruti Mahajan, Chetan Arora, John Grundy

Abstract: Developing user-centred applications that address diverse user needs requires rigorous user research. This is time-, effort- and cost-consuming. With the recent rise of generative AI techniques based on Large Language Models (LLMs), there is a possibility that these powerful tools can be used to develop adaptive interfaces. This paper presents a novel approach to develop user personas and adaptive interface candidates for a specific domain using ChatGPT. We develop user personas and adaptive interfaces using both ChatGPT and a traditional manual process and compare these outcomes. To obtain data for the personas we collected data from 37 survey participants and 4 interviews in collaboration with a not-for-profit organisation. The comparison of ChatGPT-generated content and manual content indicates promising results that encourage using LLMs in the adaptive interfaces design process.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.05425 [pdf, other]

Requirements Elicitation in Government Projects: A Preliminary Empirical Study

Authors: Anqi Ren, Lin Liu, Yi Wang, Xiao Liu, Hailong Wang, Kaijia Xu, Xishuo Zhang, Chetan Arora

Abstract: Government development projects vary significantly from private sector initiatives in scope, stakeholder complexity, and regulatory requirements. There is a lack of empirical studies focusing on requirements engineering (RE) activities specifically for government projects. We addressed this gap by conducting a series of semi-structured interviews with 12 professional software practitioners working on government projects. These interviewees are employed by two types of companies, each serving different government departments. Our findings uncover differences in the requirements elicitation phase between government projects, particularly for data visualization aspects, and other software projects, such as stakeholders and policy requirements. Additionally, we explore the coverage of human and social aspects in requirements elicitation, finding that culture, team dynamics, and policy implications are critical considerations. Our findings also pinpoint the main challenges encountered during the requirements elicitation phase for government projects. Our findings highlight future research work that is important to bridge the gap in RE activities for government software projects.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.04603 [pdf, ps, other]

Analyzing LLM Usage in an Advanced Computing Class in India

Authors: Chaitanya Arora, Utkarsh Venaik, Pavit Singh, Sahil Goyal, Jatin Tyagi, Shyama Goel, Ujjwal Singhal, Dhruv Kumar

Abstract: This paper investigates the usage patterns of undergraduate and graduate students when engaging with large language models (LLMs) to tackle programming assignments in the context of advanced computing courses. Existing work predominantly focuses on the influence of LLMs in introductory programming contexts. Additionally, there is a scarcity of studies analyzing actual conversations between students and LLMs. Our study provides a comprehensive quantitative and qualitative analysis of raw interactions between students and LLMs within an advanced computing course (Distributed Systems) at an Indian University. We further complement this by conducting student interviews to gain deeper insights into their usage patterns. Our study shows that students make use of LLMs in various ways: generating code or debugging code by identifying and fixing errors. They also copy and paste assignment descriptions into LLM interfaces for specific solutions, ask conceptual questions about complex programming ideas or theoretical concepts, and generate test cases to check code functionality and robustness. Our analysis covers over 4,000 prompts from 411 students, together with interviews with 10 students. Our analysis shows that LLMs excel at generating boilerplate code and assisting in debugging, while students handle the integration of components and system troubleshooting. This aligns with the learning objectives of advanced computing courses, which are oriented towards teaching students how to build systems and troubleshoot, with less emphasis on generating code from scratch. Therefore, LLM tools can be leveraged to increase student productivity, as shown by the data we collected. This study contributes to the ongoing discussion on LLM use in education, advocating for their usefulness in advanced computing courses to complement higher-level learning and productivity.

Submitted 6 April, 2024; originally announced April 2024.

Comments: Under review: 12 pages

arXiv:2404.03122 [pdf, other]

Towards Standards-Compliant Assistive Technology Product Specifications via LLMs

Authors: Chetan Arora, John Grundy, Louise Puli, Natasha Layton

Abstract: In the rapidly evolving field of assistive technology (AT), ensuring that products meet national and international standards is essential for user safety, efficacy, and accessibility. In this vision paper, we introduce CompliAT, a pioneering framework designed to streamline the compliance process of AT product specifications with these standards through the innovative use of Large Language Models (LLMs). CompliAT addresses three critical tasks: checking terminology consistency, classifying products according to standards, and tracing key product specifications to standard requirements. We tackle the challenge of terminology consistency to ensure that the language used in product specifications aligns with relevant standards, reducing misunderstandings and non-compliance risks. We propose a novel approach for product classification, leveraging a retrieval-augmented generation model to accurately categorize AT products aligning to international standards, despite the sparse availability of training data. Finally, CompliAT implements a traceability and compliance mechanism from key product specifications to standard requirements, ensuring all aspects of an AT product are thoroughly vetted against the corresponding standards. By semi-automating these processes, CompliAT aims to significantly reduce the time and effort required for AT product standards compliance and uphold quality and safety standards. We outline our planned implementation and evaluation plan for CompliAT.

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2403.18807 [pdf, other]

ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation

Authors: Suraj Patni, Aradhye Agarwal, Chetan Arora

Abstract: In the absence of parallax cues, a learning-based single image depth estimation (SIDE) model relies heavily on shading and contextual cues in the image. While this simplicity is attractive, it is necessary to train such models on large and varied datasets, which are difficult to capture. It has been shown that using embeddings from pre-trained foundational models, such as CLIP, improves zero-shot transfer in several applications. Taking inspiration from this, in our paper we explore the use of global image priors generated from a pre-trained ViT model to provide more detailed contextual information. We argue that the embedding vector from a ViT model, pre-trained on a large dataset, captures greater relevant information for SIDE than the usual route of generating pseudo image captions, followed by CLIP-based text embeddings. Based on this idea, we propose a new SIDE model using a diffusion backbone which is conditioned on ViT embeddings. Our proposed design establishes a new state-of-the-art (SOTA) for SIDE on the NYUv2 dataset, achieving an Abs Rel error of 0.059 (14% improvement) compared to 0.069 by the current SOTA (VPD), and on the KITTI dataset, achieving a Sq Rel error of 0.139 (2% improvement) compared to 0.142 by the current SOTA (GEDepth). For zero-shot transfer with a model trained on NYUv2, we report mean relative improvement of (20%, 23%, 81%, 25%) over NeWCRFs on (Sun-RGBD, iBims1, DIODE, HyperSim) datasets, compared to (16%, 18%, 45%, 9%) by ZoeDepth. The project page is available at https://ecodepth-iitd.github.io

Submitted 17 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

arXiv:2403.15917 [pdf, other]

Who Uses Personas in Requirements Engineering: The Practitioners' Perspective

Authors: Yi Wang, Chetan Arora, Xiao Liu, Thuong Hoang, Vasudha Malhotra, Ben Cheng, John Grundy

Abstract: Personas are commonly used in software projects to gain a better understanding of end-users' needs. However, there is a limited understanding of their usage and effectiveness in practice. This paper presents the results of a two-step investigation, comprising interviews with 26 software developers, UI/UX designers, business analysts and product managers and a survey of 203 practitioners, aimed at shedding light on the current practices, methods and challenges of using personas in software development. Our findings reveal variations in the frequency and effectiveness of personas across different software projects and IT companies, the challenges practitioners face when using personas and the reasons for not using them at all. Furthermore, we investigate the coverage of human aspects in personas, often assumed to be a key feature of persona descriptions. Contrary to the general perception, our study shows that human aspects are often ignored for various reasons in personas or requirements engineering in general. Our study provides actionable insights for practitioners to overcome challenges in using personas during requirements engineering stages, and we identify areas for future research.

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.08848 [pdf, other]

FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders

Authors: Soumen Basu, Mayuna Gupta, Chetan Madan, Pankaj Gupta, Chetan Arora

Abstract: In recent years, automated Gallbladder Cancer (GBC) detection has gained the attention of researchers. Current state-of-the-art (SOTA) methodologies relying on ultrasound sonography (US) images exhibit limited generalization, emphasizing the need for transformative approaches. We observe that individual US frames may lack sufficient information to capture disease manifestation. This study advocates for a paradigm shift towards video-based GBC detection, leveraging the inherent advantages of spatiotemporal representations. Employing the Masked Autoencoder (MAE) for representation learning, we address shortcomings in conventional image-based methods. We propose a novel design called FocusMAE to systematically bias the selection of masking tokens from high-information regions, fostering a more refined representation of malignancy. Additionally, we contribute the most extensive US video dataset for GBC detection. We also note that this is the first study on US video-based GBC detection. We validate the proposed methods on the curated dataset, and report a new SOTA accuracy of 96.4% for the GBC detection problem, against an accuracy of 84% by the current image-based SOTA methods (GBCNet and RadFormer) and 94.7% by the video-based SOTA (AdaMAE). We further demonstrate the generality of the proposed FocusMAE on a public CT-based Covid detection dataset, reporting an improvement in accuracy by 3.3% over current baselines. The source code and pretrained models are available at: https://gbc-iitd.github.io/focusmae

Submitted 29 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: To Appear at CVPR 2024

arXiv:2402.11910 [pdf, other]

Enhancing Large Language Models for Text-to-Testcase Generation

Authors: Saranya Alagarsamy, Chakkrit Tantithamthavorn, Chetan Arora, Aldeida Aleti

Abstract: Context: Test-driven development (TDD) is a widely employed software development practice that involves developing test cases based on requirements prior to writing the code. Although various methods for automated test case generation have been proposed, they are not specifically tailored for TDD, where requirements instead of code serve as input. Objective: In this paper, we introduce a text-to-testcase generation approach based on a large language model (GPT-3.5) that is fine-tuned on our curated dataset with an effective prompt design. Method: Our approach enhances the capabilities of basic GPT-3.5 for the text-to-testcase generation task by fine-tuning it on our curated dataset with an effective prompting design. We evaluated the effectiveness of our approach using a span of five large-scale open-source software projects. Results: Our approach generated 7k test cases for open source projects, achieving 78.5% syntactic correctness, 67.09% requirement alignment, and 61.7% code coverage, which substantially outperforms all other LLMs (basic GPT-3.5, Bloom, and CodeT5). In addition, our ablation study demonstrates the substantial performance improvement of the fine-tuning and prompting components of the GPT-3.5 model. Conclusions: These findings lead us to conclude that fine-tuning and prompting should be considered in the future when building a language model for the text-to-testcase generation task.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.02726 [pdf, other]

How do software practitioners perceive human-centric defects?

Authors: Vedant Chauhan, Chetan Arora, Hourieh Khalajzadeh, John Grundy

Abstract: Context: Human-centric software design and development focuses on how users want to carry out their tasks rather than making users accommodate their software. Software users can have different genders, ages, cultures, languages, disabilities, socioeconomic statuses, and educational backgrounds, among many other differences. Due to the inherently varied nature of these differences and their impact on software usage, preferences and issues of users can vary, resulting in user-specific defects that we term 'human-centric defects' (HCDs). Objective: This research aims to understand the perception and current management practices of such human-centric defects by software practitioners, identify key challenges in reporting, understanding and fixing them, and provide recommendations to improve HCD management in software engineering. Method: We conducted a survey and interviews with software engineering practitioners to gauge their knowledge and experience on HCDs and the defect tracking process. Results: We analysed fifty (50) survey and ten (10) interview responses from SE practitioners and identified that there are multiple gaps in the current management of HCDs in software engineering practice. There is a lack of awareness regarding human-centric aspects, causing them to be lost or under-appreciated during software development. Our results revealed that handling HCDs could be improved by following a better feedback process with end-users, a more descriptive taxonomy, and suitable automation. Conclusion: HCDs present a major challenge to software practitioners, given their diverse end-user base. In the software engineering domain, research on HCDs has been limited and requires effort from the research and practice communities to create better awareness and support regarding human-centric aspects.

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2401.08097 [pdf, other]

A Study of Fairness Concerns in AI-based Mobile App Reviews

Authors: Ali Rezaei Nasab, Maedeh Dashti, Mojtaba Shahin, Mansooreh Zahedi, Hourieh Khalajzadeh, Chetan Arora, Peng Liang

Abstract: Fairness is one of the socio-technical concerns that must be addressed in AI-based systems. Unfair AI-based systems, particularly unfair AI-based mobile apps, can pose difficulties for a significant proportion of the global population. This paper aims to analyze fairness concerns in AI-based app reviews. We first manually constructed a ground-truth dataset, including a statistical sample of fairness and non-fairness reviews. Leveraging the ground-truth dataset, we developed and evaluated a set of machine learning and deep learning classifiers that distinguish fairness reviews from non-fairness reviews. Our experiments show that our best-performing classifier can detect fairness reviews with a precision of 94%. We then applied the best-performing classifier on approximately 9.5M reviews collected from 108 AI-based apps and identified around 92K fairness reviews. Next, applying the K-means clustering technique to the 92K fairness reviews, followed by manual analysis, led to the identification of six distinct types of fairness concerns (e.g., 'receiving different quality of features and services in different platforms and devices' and 'lack of transparency and fairness in dealing with user-generated content'). Finally, the manual analysis of 2,248 app owners' responses to the fairness reviews identified six root causes (e.g., 'copyright issues') that app owners report to justify fairness concerns.

Submitted 13 February, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

Comments: 25 pages, 4 images, 2 tables, Manuscript submitted to a Journal (2024)

arXiv:2401.01508 [pdf, other]

Practical Guidelines for the Selection and Evaluation of Natural Language Processing Techniques in Requirements Engineering

Authors: Mehrdad Sabetzadeh, Chetan Arora

Abstract: [Context and Motivation] Natural Language Processing (NLP) is now a cornerstone of requirements automation. One compelling factor behind the growing adoption of NLP in Requirements Engineering (RE) is the prevalent use of natural language (NL) for specifying requirements in industry. NLP techniques are commonly used for automatically classifying requirements, extracting important information, e.g., domain models and glossary terms, and performing quality assurance tasks, such as ambiguity handling and completeness checking. With so many different NLP solution strategies available and the possibility of applying machine learning alongside, it can be challenging to choose the right strategy for a specific RE task and to evaluate the resulting solution in an empirically rigorous manner. [Content] In this chapter, we present guidelines for the selection of NLP techniques as well as for their evaluation in the context of RE. In particular, we discuss how to choose among different strategies such as traditional NLP, feature-based machine learning, and language-model-based methods. [Contribution] Our ultimate hope for this chapter is to serve as a stepping stone, assisting newcomers to NLP4RE in quickly initiating themselves into the NLP technologies most pertinent to the RE field.

Submitted 2 May, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

Comments: This article will appear as Chapter 15 in a book titled "Handbook of Natural Language Processing for Requirements Engineering", to be published by Springer

arXiv:2311.09086 [pdf, other]

The Uli Dataset: An Exercise in Experience Led Annotation of oGBV

Authors: Arnav Arora, Maha Jinadoss, Cheshta Arora, Denny George, Brindaalakshmi, Haseena Dawood Khan, Kirti Rawat, Div, Ritash, Seema Mathur, Shivani Yadav, Shehla Rashid Shora, Rie Raut, Sumit Pawar, Apurva Paithane, Sonia, Vivek, Dharini Priscilla, Khairunnisha, Grace Banu, Ambika Tandon, Rishav Thakker, Rahul Dev Korra, Aatman Vaidya, Tarunima Prabhakar

Abstract: Online gender based violence has grown concomitantly with adoption of the internet and social media. Its effects are worse in the Global majority where many users use social media in languages other than English. The scale and volume of conversations on the internet have necessitated automated detection of hate speech, and more specifically gendered abuse. There is, however, a lack of language specific and contextual data to build such automated tools. In this paper we present a dataset on gendered abuse in three languages: Hindi, Tamil and Indian English. The dataset comprises tweets annotated along three questions pertaining to the experience of gender abuse, by experts who identify as women or a member of the LGBTQIA community in South Asia. Through this dataset we demonstrate a participatory approach to creating datasets that drive AI systems.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.04588 [pdf, other]

Army of Thieves: Enhancing Black-Box Model Extraction via Ensemble based sample selection

Authors: Akshit Jindal, Vikram Goyal, Saket Anand, Chetan Arora

Abstract: Machine Learning (ML) models become vulnerable to Model Stealing Attacks (MSA) when they are deployed as a service. In such attacks, the deployed model is queried repeatedly to build a labelled dataset. This dataset allows the attacker to train a thief model that mimics the original model. To maximize query efficiency, the attacker has to select the most informative subset of data points from the pool of available data. Existing attack strategies utilize approaches like Active Learning and Semi-Supervised learning to minimize costs. However, in the black-box setting, these approaches may select sub-optimal samples as they train only one thief model. Depending on the thief model's capacity and the data it was pretrained on, the model might even select noisy samples that harm the learning process. In this work, we explore the usage of an ensemble of deep learning models as our thief model. We call our attack Army of Thieves (AOT) as we train multiple models with varying complexities to leverage the crowd's wisdom. Based on the ensemble's collective decision, uncertain samples are selected for querying, while the most confident samples are directly included in the training data. Our approach is the first one to utilize an ensemble of thief models to perform model extraction. We outperform the base approaches of existing state-of-the-art methods by at least 3% and achieve a 21% higher adversarial sample transferability than previous work for models trained on the CIFAR-10 dataset.

Submitted 8 November, 2023; originally announced November 2023.

Comments: 10 pages, 5 figures, paper accepted to WACV 2024
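
Illustrative note on the entry above: the abstract describes splitting an unlabelled pool by the ensemble's collective decision, sending uncertain samples to the victim model and keeping confident ones with the ensemble's own label. The following is only a minimal sketch of that idea under stated assumptions, not the authors' implementation. The function names, the use of mean-probability entropy as the uncertainty score, and the `query_budget` and `confident_threshold` parameters are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def mean_probs(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors predicted by each ensemble member."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def split_pool(
    pool_probs: Sequence[Sequence[Sequence[float]]],
    query_budget: int,
    confident_threshold: float,
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Split a pool using ensemble predictions (hypothetical selection rule).

    pool_probs[i][m] is ensemble member m's probability vector for sample i.
    Returns (indices to query, [(index, pseudo-label), ...]).
    """
    # Score every sample by the entropy of the ensemble-averaged prediction.
    scored = [(entropy(mean_probs(mp)), i) for i, mp in enumerate(pool_probs)]
    # Most uncertain first; ties are broken by index for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    to_query = [i for _, i in scored[:query_budget]]
    queried = set(to_query)

    # Remaining samples the ensemble is confident about keep its own label.
    pseudo_labelled = []
    for i, mp in enumerate(pool_probs):
        if i in queried:
            continue
        avg = mean_probs(mp)
        best = max(range(len(avg)), key=avg.__getitem__)
        if avg[best] >= confident_threshold:
            pseudo_labelled.append((i, best))
    return to_query, pseudo_labelled
```

In this sketch, samples that are neither queried nor confident are simply left in the pool for a later round; other uncertainty measures (for example, vote disagreement) could be substituted for the entropy score.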

arXiv:2311.03550 [pdf, other]

United We Stand, Divided We Fall: UnityGraph for Unsupervised Procedure Learning from Videos

Authors: Siddhant Bansal, Chetan Arora, C. V. Jawahar

Abstract: Given multiple videos of the same task, procedure learning addresses identifying the key-steps and determining their order to perform the task. For this purpose, existing approaches use the signal generated from a pair of videos. This makes key-steps discovery challenging as the algorithms lack inter-videos perspective. Instead, we propose an unsupervised Graph-based Procedure Learning (GPL) framework. GPL consists of the novel UnityGraph that represents all the videos of a task as a graph to obtain both intra-video and inter-videos context. Further, to obtain similar embeddings for the same key-steps, the embeddings of UnityGraph are updated in an unsupervised manner using the Node2Vec algorithm. Finally, to identify the key-steps, we cluster the embeddings using KMeans. We test GPL on benchmark ProceL, CrossTask, and EgoProceL datasets and achieve an average improvement of 2% on third-person datasets and 3.6% on EgoProceL over the state-of-the-art.

Submitted 6 November, 2023; originally announced November 2023.

Comments: 13 pages, 6 figures, Accepted in Winter Conference on Applications of Computer Vision (WACV), 2024

arXiv:2311.00284 [pdf, other]

Model-driven Engineering for Machine Learning Components: A Systematic Literature Review

Authors: Hira Naveed, Chetan Arora, Hourieh Khalajzadeh, John Grundy, Omar Haggag

Abstract: Context: Machine Learning (ML) has become widely adopted as a component in many modern software applications. Due to the large volumes of data available, organizations want to increasingly leverage their data to extract meaningful insights and enhance business profitability. ML components enable predictive capabilities, anomaly detection, recommendation, accurate image and text processing, and informed decision-making. However, developing systems with ML components is not trivial; it requires time, effort, knowledge, and expertise in ML, data processing, and software engineering. There have been several studies on the use of model-driven engineering (MDE) techniques to address these challenges when developing traditional software and cyber-physical systems. Recently, there has been a growing interest in applying MDE for systems with ML components. Objective: The goal of this study is to further explore the promising intersection of MDE with ML (MDE4ML) through a systematic literature review (SLR). Through this SLR, we wanted to analyze existing studies, including their motivations, MDE solutions, evaluation techniques, key benefits and limitations. Results: We analyzed selected studies with respect to several areas of interest and identified the following: 1) the key motivations behind using MDE4ML; 2) a variety of MDE solutions applied, such as modeling languages, model transformations, tool support, targeted ML aspects, contributions and more; 3) the evaluation techniques and metrics used; and 4) the limitations and directions for future work. We also discuss the gaps in existing literature and provide recommendations for future research. Conclusion: This SLR highlights current trends, gaps and future research directions in the field of MDE4ML, benefiting both researchers and practitioners.

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.18648 [pdf, other]

Generative Artificial Intelligence for Software Engineering -- A Research Agenda

Authors: Anh Nguyen-Duc, Beatriz Cabrero-Daniel, Adam Przybylek, Chetan Arora, Dron Khanna, Tomas Herda, Usman Rafiq, Jorge Melegati, Eduardo Guerra, Kai-Kristian Kemell, Mika Saari, Zheying Zhang, Huy Le, Tho Quan, Pekka Abrahamsson

Abstract: Generative Artificial Intelligence (GenAI) tools have become increasingly prevalent in software development, offering assistance to various managerial and technical project activities. Notable examples of these tools include OpenAI's ChatGPT, GitHub Copilot, and Amazon CodeWhisperer. Although many recent publications have explored and evaluated the application of GenAI, a comprehensive understanding of the current development, applications, limitations, and open challenges remains unclear to many. Particularly, we do not have an overall picture of the current state of GenAI technology in practical software engineering usage scenarios. We conducted a literature review and focus groups for a duration of five months to develop a research agenda on GenAI for Software Engineering. We identified 78 open Research Questions (RQs) in 11 areas of Software Engineering. Our results show that it is possible to explore the adoption of GenAI in partial automation and support decision-making in all software development activities. While the current literature is skewed toward software implementation, quality assurance and software maintenance, other areas, such as requirements engineering, software design, and software engineering education, would need further research attention. Common considerations when implementing GenAI include industry-level assessment, dependability and accuracy, data accessibility, transparency, and sustainability aspects associated with the technology. GenAI is bringing significant changes to the field of software engineering. Nevertheless, the state of research on the topic still remains immature. We believe that this research agenda holds significance and practical value for informing both researchers and practitioners about current applications and guiding future research.

Submitted 28 October, 2023; originally announced October 2023.

arXiv:2310.13976 [pdf, other]

Advancing Requirements Engineering through Generative AI: Assessing the Role of LLMs

Authors: Chetan Arora, John Grundy, Mohamed Abdelrazek

Abstract: Requirements Engineering (RE) is a critical phase in software development including the elicitation, analysis, specification, and validation of software requirements. Despite the importance of RE, it remains a challenging process due to the complexities of communication, uncertainty in the early stages and inadequate automation support. In recent years, large language models (LLMs) have shown significant promise in diverse domains, including natural language processing, code generation, and program understanding. This chapter explores the potential of LLMs in driving RE processes, aiming to improve the efficiency and accuracy of requirements-related tasks. We propose key directions and SWOT analysis for research and development in using LLMs for RE, focusing on the potential for requirements elicitation, analysis, specification, and validation. We further present the results from a preliminary evaluation in this context.

Submitted 1 November, 2023; v1 submitted 21 October, 2023; originally announced October 2023.

arXiv:2309.06227 [pdf]

On the Injunction of XAIxArt

Authors: Cheshta Arora, Debarun Sarkar

Abstract: The position paper highlights the range of concerns that are engulfed in the injunction of explainable artificial intelligence in art (XAIxArt). Through a series of quick sub-questions, it points towards the ambiguities concerning 'explanation' and the postpositivist tradition of 'relevant explanation'. Rejecting both 'explanation' and 'relevant explanation', the paper takes a stance that XAIxArt is a symptom of insecurity of the anthropocentric notion of art and a nostalgic desire to return to outmoded notions of authorship and human agency. To justify this stance, the paper makes a distinction between an ornamentation model of explanation and a model of explanation as sense-making.

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2309.05261 [pdf, other]

Gall Bladder Cancer Detection from US Images with Only Image Level Labels

Authors: Soumen Basu, Ashish Papanai, Mayank Gupta, Pankaj Gupta, Chetan Arora

Abstract: Automated detection of Gallbladder Cancer (GBC) from Ultrasound (US) images is an important problem, which has drawn increased interest from researchers. However, most of these works use difficult-to-acquire information such as bounding box annotations or additional US videos. In this paper, we focus on GBC detection using only image-level labels. Such annotation is usually available based on the diagnostic report of a patient, and does not require additional annotation effort from the physicians. However, our analysis reveals that it is difficult to train a standard image classification model for GBC detection. This is due to the low inter-class variance (a malignant region usually occupies only a small portion of a US image), high intra-class variance (due to the US sensor capturing a 2D slice of a 3D object leading to large viewpoint variations), and low training data availability. We posit that even when we have only the image level label, still formulating the problem as object detection (with bounding box output) helps a deep neural network (DNN) model focus on the relevant region of interest. Since no bounding box annotations are available for training, we pose the problem as weakly supervised object detection (WSOD). Motivated by the recent success of transformer models in object detection, we train one such model, DETR, using multi-instance-learning (MIL) with self-supervised instance selection to suit the WSOD task. Our proposed method demonstrates an improvement of AP and detection sensitivity over the SOTA transformer-based and CNN-based WSOD methods. Project page is at https://gbc-iitd.github.io/wsod-gbc

Submitted 11 September, 2023; originally announced September 2023.

Comments: Accepted at MICCAI 2023

arXiv:2307.00390 [pdf, other]

PersonaGen: A Tool for Generating Personas from User Feedback

Authors: Xishuo Zhang, Lin Liu, Yi Wang, Xiao Liu, Hailong Wang, Anqi Ren, Chetan Arora

Abstract: Personas are crucial in software development processes, particularly in agile settings. However, no effective tools are available for generating personas from user feedback in agile software development processes. To fill this gap, we propose a novel tool that uses the GPT-4 model and knowledge graph to generate persona templates from well-processed user feedback, facilitating requirement analysis in agile software development processes. We developed a tool called PersonaGen. We evaluated PersonaGen using qualitative feedback from a small-scale user study involving student software projects. The results were mixed, highlighting challenges in persona-based educational practice and addressing non-functional requirements.

Submitted 6 July, 2023; v1 submitted 1 July, 2023; originally announced July 2023.

arXiv:2306.15782 [pdf, other]

doi 10.1007/978-3-031-41734-4_19

UTRNet: High-Resolution Urdu Text Recognition In Printed Documents

Authors: Abdur Rahman, Arjun Ghosh, Chetan Arora

Abstract: In this paper, we propose a novel approach to address the challenges of printed Urdu text recognition using high-resolution, multi-scale semantic feature extraction. Our proposed UTRNet architecture, a hybrid CNN-RNN model, demonstrates state-of-the-art performance on benchmark datasets. To address the limitations of previous works, which struggle to generalize to the intricacies of the Urdu script and the lack of sufficient annotated real-world data, we have introduced UTRSet-Real, a large-scale annotated real-world dataset comprising over 11,000 lines, and UTRSet-Synth, a synthetic dataset with 20,000 lines closely resembling real-world data, and made corrections to the ground truth of the existing IIITH dataset, making it a more reliable resource for future research. We also provide UrduDoc, a benchmark dataset for Urdu text line detection in scanned documents. Additionally, we have developed an online tool for end-to-end Urdu OCR from printed documents by integrating UTRNet with a text detection model. Our work not only addresses the current limitations of Urdu OCR but also paves the way for future research in this area and facilitates the continued advancement of Urdu OCR technology. The project page with source code, datasets, annotations, trained models, and online tool is available at abdur75648.github.io/UTRNet.

Submitted 23 August, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

Comments: Accepted at The 17th International Conference on Document Analysis and Recognition (ICDAR 2023)

Journal ref: Document Analysis and Recognition - ICDAR 2023 (2023) 305-324

arXiv:2306.01492 [pdf, other]

Multi-Modal Emotion Recognition for Enhanced Requirements Engineering: A Novel Approach

Authors: Ben Cheng, Chetan Arora, Xiao Liu, Thuong Hoang, Yi Wang, John Grundy

Abstract: Requirements engineering (RE) plays a crucial role in developing software systems by bridging the gap between stakeholders' needs and system specifications. However, effective communication and elicitation of stakeholder requirements can be challenging, as traditional RE methods often overlook emotional cues. This paper introduces a multi-modal emotion recognition platform (MEmoRE) to enhance the requirements engineering process by capturing and analyzing the emotional cues of stakeholders in real-time. MEmoRE leverages state-of-the-art emotion recognition techniques, integrating facial expression, vocal intonation, and textual sentiment analysis to comprehensively understand stakeholder emotions. This multi-modal approach ensures the accurate and timely detection of emotional cues, enabling requirements engineers to tailor their elicitation strategies and improve overall communication with stakeholders. We further intend to employ our platform for later RE stages, such as requirements reviews and usability testing. By integrating multi-modal emotion recognition into requirements engineering, we aim to pave the way for more empathetic, effective, and successful software development processes. We performed a preliminary evaluation of our platform. This paper reports on the platform design, preliminary evaluation, and future development plan as an ongoing project.

Submitted 2 June, 2023; originally announced June 2023.

arXiv:2305.01082 [pdf, other]

doi 10.1145/3539618.3591861

Contextual Multilingual Spellchecker for User Queries

Authors: Sanat Sharma, Josep Valls-Vargas, Tracy Holloway King, Francois Guerin, Chirag Arora

Abstract: Spellchecking is one of the most fundamental and widely used search features. Correcting incorrectly spelled user queries not only enhances the user experience but is expected by the user. However, most widely available spellchecking solutions either have lower accuracy than state-of-the-art solutions or are too slow to be used for search use cases where latency is a key requirement. Furthermore, most innovative recent architectures focus on English, are not trained in a multilingual fashion, and are trained for spell correction in longer text, which is a different paradigm from spell correction for user queries, where context is sparse (most queries are 1-2 words long). Finally, since most enterprises have unique vocabularies such as product names, off-the-shelf spelling solutions fall short of users' needs. In this work, we build a multilingual spellchecker that is extremely fast and scalable and that adapts its vocabulary and hence speller output based on a specific product's needs. Furthermore, our speller outperforms general purpose spellers by a wide margin on in-domain datasets. Our multilingual speller is used in search in Adobe products, powering autocomplete in various applications.

Submitted 14 June, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

Comments: 5 pages, In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23)

arXiv:2304.01074 [pdf, other]

FinderNet: A Data Augmentation Free Canonicalization aided Loop Detection and Closure technique for Point clouds in 6-DOF separation

Authors: Sudarshan S Harithas, Gurkirat Singh, Aneesh Chavan, Sarthak Sharma, Suraj Patni, Chetan Arora, K. Madhava Krishna

Abstract: We focus on the problem of LiDAR point cloud based loop detection (or Finding) and closure (LDC) in a multi-agent setting. State-of-the-art (SOTA) techniques directly generate learned embeddings of a given point cloud, require large data transfers, and are not robust to wide variations in 6 Degrees-of-Freedom (DOF) viewpoint. Moreover, absence of strong priors in an unstructured point cloud leads to highly inaccurate LDC. In this original approach, we propose independent roll and pitch canonicalization of the point clouds using a common dominant ground plane. Discretization of the canonicalized point cloud along the axis perpendicular to the ground plane leads to an image similar to Digital Elevation Maps (DEMs), which exposes strong spatial priors in the scene. Our experiments show that LDC based on learnt embeddings of such DEMs is not only data efficient but also significantly more robust and generalizable than the current SOTA. We report significant performance gain in terms of Average Precision for loop detection and absolute translation/rotation error for relative pose estimation (or loop closure) on KITTI, GPR and Oxford Robot Car over multiple SOTA LDC methods. Our encoder technique allows compressing the original point cloud by over 830 times. To further test the robustness of our technique, we create and open-source a custom dataset called Lidar-UrbanFly Dataset (LUF), which consists of point clouds obtained from a LiDAR mounted on a quadrotor.

Submitted 3 April, 2023; originally announced April 2023.

arXiv:2303.10439 [pdf, other]

Stop Words for Processing Software Engineering Documents: Do they Matter?

Authors: Yaohou Fan, Chetan Arora, Christoph Treude

Abstract: Stop words, which are considered non-predictive, are often eliminated in natural language processing tasks. However, the definition of uninformative vocabulary is vague, so most algorithms use general knowledge-based stop lists to remove stop words. There is an ongoing debate among academics about the usefulness of stop word elimination, especially in domain-specific settings. In this work, we investigate the usefulness of stop word removal in a software engineering context. To do this, we replicate and experiment with three software engineering research tools from related work. Additionally, we construct a corpus of software engineering domain-related text from 10,000 Stack Overflow questions and identify 200 domain-specific stop words using traditional information-theoretic methods. Our results show that the use of domain-specific stop words significantly improved the performance of research tools compared to the use of a general stop list and that 17 out of 19 evaluation measures showed better performance. Online appendix: https://zenodo.org/record/7865748

Submitted 12 June, 2023; v1 submitted 18 March, 2023; originally announced March 2023.

Comments: Accepted for publication at the 2nd Intl. Workshop on NL-based Software Engineering (NLBSE 2023)
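
Illustrative note on the entry above: the abstract mentions identifying domain-specific stop words with "traditional information-theoretic methods" but does not say which ones. As a minimal sketch only, one common option is to rank terms by the entropy of their distribution across documents, since a term spread evenly over the corpus carries little discriminative information. The entropy criterion, the function names, and the toy corpus below are assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def term_entropy(docs: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Entropy (in nats) of each term's occurrence distribution over documents."""
    per_doc = defaultdict(Counter)  # term -> Counter mapping doc index to count
    for doc_index, tokens in enumerate(docs):
        for token in tokens:
            per_doc[token][doc_index] += 1

    scores = {}
    for term, counts in per_doc.items():
        total = sum(counts.values())
        scores[term] = -sum(
            (c / total) * math.log(c / total) for c in counts.values()
        )
    return scores


def stop_word_candidates(docs: Sequence[Sequence[str]], k: int) -> List[str]:
    """Return the k terms with the highest cross-document entropy."""
    scores = term_entropy(docs)
    # Highest entropy first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


# Toy corpus standing in for tokenized Stack Overflow questions (hypothetical).
corpus = [
    ["the", "error", "the"],
    ["the", "code"],
    ["the", "code", "null"],
]
candidates = stop_word_candidates(corpus, k=2)
```

On this toy corpus the two candidates are "the" and "code", the only terms that occur in more than one document. A real pipeline would also need tokenization, casing, and a frequency floor, none of which are shown here.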

arXiv:2303.02920 [pdf, other]

Requirements Engineering Framework for Human-centered Artificial Intelligence Software Systems

Authors: Khlood Ahmad, Mohamed Abdelrazek, Chetan Arora, Arbind Agrahari Baniya, Muneera Bano, John Grundy

Abstract: [Context] Artificial intelligence (AI) components used in building software solutions have substantially increased in recent years. However, many of these solutions focus on technical aspects and ignore critical human-centered aspects. [Objective] Including human-centered aspects during requirements engineering (RE) when building AI-based software can help achieve more responsible, unbiased, and inclusive AI-based software solutions. [Method] In this paper, we present a new framework developed based on human-centered AI guidelines and a user survey to aid in collecting requirements for human-centered AI-based software. We provide a catalog to elicit these requirements and a conceptual model to present them visually. [Results] The framework is applied to a case study to elicit and model requirements for enhancing the quality of 360-degree videos intended for virtual reality (VR) users. [Conclusion] We found that our proposed approach helped the project team fully understand the human-centered needs of the project to deliver. Furthermore, the framework helped to understand what requirements need to be captured at the initial stages against later stages in the engineering process of AI-based software.

Submitted 18 May, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

arXiv:2302.06034 [pdf, other]

Requirements Elicitation and Modelling of Artificial Intelligence Systems: An Empirical Study

Authors: Khlood Ahmad, Mohamed Abdelrazek, Chetan Arora, John Grundy, Muneera Bano

Abstract: Artificial Intelligence (AI) systems have gained significant traction in the recent past, creating new challenges in requirements engineering (RE) when building AI software systems. RE practices for AI have not been studied much, and empirical studies are scarce. Additionally, many AI software solutions tend to focus on the technical aspects and ignore human-centered values. In this paper, we report on a case study for eliciting and modeling requirements using our framework and a supporting tool for human-centred RE for AI systems. Our case study is a mobile health application for encouraging type-2 diabetic people to reduce their sedentary behavior. We conducted our study with three experts from the app team -- a software engineer, a project manager and a data scientist. We found in our study that most human-centered aspects were not originally considered when developing the first version of the application. We also report on other insights and challenges faced in RE for the health application, e.g., frequently changing requirements.

Submitted 12 February, 2023; originally announced February 2023.

arXiv:2302.05618 [pdf, other]

Persona-based Assessment of Software Engineering Student Research Projects: An Experience Report

Authors: Chetan Arora, Laura Tubino, Andrew Cain, Kevin Lee, Vasudha Malhotra

Abstract: Students enrolled in software engineering degrees are generally required to undertake a research project in their final year through which they demonstrate the ability to conduct research, communicate outcomes, and build in-depth expertise in an area. Assessment in these projects typically involves evaluating the product of their research via a thesis or a similar artifact. However, this misses a range of other factors that go into producing successful software engineers and researchers. Incorporating aspects such as process, attitudes, project complexity, and supervision support into the assessment can provide a more holistic evaluation of the performance likely to better align with the intended learning outcomes. In this paper, we report on our experience of adopting an innovative assessment approach to enhance learning outcomes and research performance in our software engineering research projects. Our approach adopted a task-oriented approach to portfolio assessment that incorporates student personas, frequent formative feedback, delayed summative grading, and standards-aligned outcomes-based assessment. We report upon our continuous improvement journey in adapting tasks and criteria to address the challenges of assessing student research projects. Our lessons learnt demonstrate the value of personas to guide the development of holistic rubrics, giving meaning to grades and focusing staff and student attention on attitudes and skills rather than a product only.

Submitted 11 February, 2023; originally announced February 2023.

arXiv:2302.05617 [pdf, other]

Towards Human-Centred Crowd Computing: Software for Better Use of Computational Resources

Authors: Niroshinie Fernando, Chetan Arora, Seng W. Loke, Lubna Alam, Stephen La Macchia, Helen Graesser

Abstract: Internet-connected smart devices are increasing at an exponential rate. These powerful devices have created a yet-untapped pool of idle resources that can be utilised, among other things, for processing data in resource-depleted environments. The idea of bringing together a pool of smart devices for "crowd computing" (CC) has been studied in the recent past from an infrastructural feasibility perspective. However, for the CC paradigm to be successful, numerous socio-technical and software engineering (SE) factors, specifically those related to requirements engineering (RE), are at play and have not been investigated in the literature. In this paper, we motivate the SE-related aspects of CC and the ideas for implementing mobile apps required for CC scenarios. We present the results of a preliminary study on understanding the human aspects, incentives that motivate users, and CC app requirements, and present our future development plan in this relatively new field of research for SE applications.

Submitted 11 February, 2023; originally announced February 2023.

arXiv:2302.04793 [pdf, other]

AI-based Question Answering Assistance for Analyzing Natural-language Requirements

Authors: Saad Ezzini, Sallam Abualhaija, Chetan Arora, Mehrdad Sabetzadeh

Abstract: By virtue of being prevalently written in natural language (NL), requirements are prone to various defects, e.g., inconsistency and incompleteness. As such, requirements are frequently subject to quality assurance processes. These processes, when carried out entirely manually, are tedious and may further overlook important quality issues due to time and budget pressures. In this paper, we propose QAssist -- a question-answering (QA) approach that provides automated assistance to stakeholders, including requirements engineers, during the analysis of NL requirements. Posing a question and getting an instant answer is beneficial in various quality-assurance scenarios, e.g., incompleteness detection. Answering requirements-related questions automatically is challenging since the scope of the search for answers can go beyond the given requirements specification. To that end, QAssist provides support for mining external domain-knowledge resources. Our work is one of the first initiatives to bring together QA and external domain knowledge for addressing requirements engineering challenges. We evaluate QAssist on a dataset covering three application domains and containing a total of 387 question-answer pairs. We experiment with state-of-the-art QA methods, based primarily on recent large-scale language models. In our empirical study, QAssist localizes the answer to a question to three passages within the requirements specification and within the external domain-knowledge resource with an average recall of 90.1% and 96.5%, respectively. QAssist extracts the actual answer to the posed question with an average accuracy of 84.2%. Keywords: Natural-language Requirements, Question Answering (QA), Language Models, Natural Language Processing (NLP), Natural Language Generation (NLG), BERT, T5.

Submitted 9 February, 2023; originally announced February 2023.

Comments: This paper has been accepted at the 45th International Conference on Software Engineering (ICSE 2023)

arXiv:2301.10404 [pdf, other]

Requirements Practices and Gaps When Engineering Human-Centered Artificial Intelligence Systems

Authors: Khlood Ahmad, Mohamed Abdelrazek, Chetan Arora, Muneera Bano, John Grundy

Abstract: [Context] Engineering Artificial Intelligence (AI) software is a relatively new area with many challenges, unknowns, and limited proven best practices. Big companies such as Google, Microsoft, and Apple have provided a suite of recent guidelines to assist engineering teams in building human-centered AI systems. [Objective] The practices currently adopted by practitioners for developing such systems, especially during Requirements Engineering (RE), are little studied and reported to date. [Method] This paper presents the results of a survey conducted to understand current industry practices in RE for AI (RE4AI) and to determine which key human-centered AI guidelines should be followed. Our survey is based on mapping existing industrial guidelines, best practices, and efforts in the literature. [Results] We surveyed 29 professionals and found most participants agreed that all the human-centered aspects we mapped should be addressed in RE. Further, we found that most participants were using UML or Microsoft Office to present requirements. [Conclusion] We identify that most of the tools currently used are not equipped to manage AI-based software, and the use of UML and Office may pose issues to the quality of requirements captured for AI. Also, all human-centered practices mapped from the guidelines should be included in RE.

Submitted 24 January, 2023; originally announced January 2023.

arXiv:2212.10693 [pdf, other]

Requirements Engineering for Artificial Intelligence Systems: A Systematic Mapping Study

Authors: Khlood Ahmad, Mohamed Abdelrazek, Chetan Arora, Muneera Bano, John Grundy

Abstract: [Context] In traditional software systems, Requirements Engineering (RE) activities are well-established and researched. However, building Artificial Intelligence (AI) based software with limited or no insight into the system's inner workings poses significant new challenges to RE. Existing literature has focused on using AI to manage RE activities, with limited research on RE for AI (RE4AI). [Objective] This paper investigates current approaches for specifying requirements for AI systems, identifies available frameworks, methodologies, tools, and techniques used to model requirements, and finds existing challenges and limitations. [Method] We performed a systematic mapping study to find papers on current RE4AI approaches. We identified 43 primary studies and analysed the existing methodologies, models, tools, and techniques used to specify and model requirements in real-world scenarios. [Results] We found several challenges and limitations of existing RE4AI practices. The findings highlighted that current RE applications were not adequately adaptable for building AI systems and emphasised the need to provide new techniques and tools to support RE4AI. [Conclusion] Our results showed that most of the empirical studies on RE4AI focused on autonomous, self-driving vehicles and managing data requirements, and areas such as ethics, trust, and explainability need further research.

Submitted 20 December, 2022; originally announced December 2022.

arXiv:2211.16200 [pdf, other]

From Forks to Forceps: A New Framework for Instance Segmentation of Surgical Instruments

Authors: Britty Baby, Daksh Thapar, Mustafa Chasmai, Tamajit Banerjee, Kunal Dargan, Ashish Suri, Subhashis Banerjee, Chetan Arora

Abstract: Minimally invasive surgeries and related applications demand surgical tool classification and segmentation at the instance level. Surgical tools are similar in appearance and are long, thin, and handled at an angle. The fine-tuning of state-of-the-art (SOTA) instance segmentation models trained on natural images for instrument segmentation has difficulty discriminating instrument classes. Our research demonstrates that while the bounding box and segmentation mask are often accurate, the classification head mis-classifies the class label of the surgical instrument. We present a new neural network framework that adds a classification module as a new stage to existing instance segmentation models. This module specializes in improving the classification of instrument masks generated by the existing model. The module comprises multi-scale mask attention, which attends to the instrument region and masks the distracting background features. We propose training our classifier module using metric learning with arc loss to handle low inter-class variance of surgical instruments. We conduct exhaustive experiments on the benchmark datasets EndoVis2017 and EndoVis2018. We demonstrate that our method outperforms all (more than 18) SOTA methods compared with, and improves the SOTA performance by at least 12 points (20%) on the EndoVis2017 benchmark challenge and generalizes effectively across the datasets.

Submitted 11 March, 2023; v1 submitted 26 November, 2022; originally announced November 2022.

Comments: WACV 2023

arXiv:2211.04793 [pdf, other]

RadFormer: Transformers with Global-Local Attention for Interpretable and Accurate Gallbladder Cancer Detection

Authors: Soumen Basu, Mayank Gupta, Pratyaksha Rana, Pankaj Gupta, Chetan Arora

Abstract: We propose a novel deep neural network architecture to learn interpretable representation for medical image analysis. Our architecture generates a global attention for the region of interest, and then learns bag-of-words style deep feature embeddings with local attention. The global and local feature maps are combined using a contemporary transformer architecture for highly accurate Gallbladder Cancer (GBC) detection from Ultrasound (USG) images. Our experiments indicate that the detection accuracy of our model beats even human radiologists, and advocate its use as the second reader for GBC diagnosis. Bag-of-words embeddings allow our model to be probed for generating interpretable explanations for GBC detection consistent with the ones reported in medical literature. We show that the proposed model not only helps understand decisions of neural network models but also aids in discovery of new visual features relevant to the diagnosis of GBC. Source-code and model will be available at https://github.com/sbasu276/RadFormer

Submitted 9 November, 2022; originally announced November 2022.

Comments: To Appear in Elsevier Medical Image Analysis

arXiv:2210.09071 [pdf, other]

Attention Attention Everywhere: Monocular Depth Prediction with Skip Attention

Authors: Ashutosh Agarwal, Chetan Arora

Abstract: Monocular Depth Estimation (MDE) aims to predict pixel-wise depth given a single RGB image. For both convolutional and recent attention-based models, encoder-decoder-based architectures have been found to be useful due to the simultaneous requirement of global context and pixel-level resolution. Typically, a skip connection module is used to fuse the encoder and decoder features, which comprises feature map concatenation followed by a convolution operation. Inspired by the demonstrated benefits of attention in a multitude of computer vision problems, we propose an attention-based fusion of encoder and decoder features. We pose MDE as a pixel query refinement problem, where coarsest-level encoder features are used to initialize pixel-level queries, which are then refined to higher resolutions by the proposed Skip Attention Module (SAM). We formulate the prediction problem as ordinal regression over the bin centers that discretize the continuous depth range and introduce a Bin Center Predictor (BCP) module that predicts bins at the coarsest level using pixel queries. Apart from the benefit of image adaptive depth binning, the proposed design helps learn improved depth embedding in initial pixel queries via direct supervision from the ground truth. Extensive experiments on the two canonical datasets, NYUV2 and KITTI, show that our architecture outperforms the state-of-the-art by 5.3% and 3.9%, respectively, along with an improved generalization performance by 9.4% on the SUNRGBD dataset. Code is available at https://github.com/ashutosh1807/PixelFormer.git.

Submitted 17 October, 2022; originally announced October 2022.

Comments: Accepted at IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023

arXiv:2210.06749 [pdf, other]

Reducing Annotation Effort by Identifying and Labeling Contextually Diverse Classes for Semantic Segmentation Under Domain Shift

Authors: Sharat Agarwal, Saket Anand, Chetan Arora

Abstract: In Active Domain Adaptation (ADA), one uses Active Learning (AL) to select a subset of images from the target domain, which are then annotated and used for supervised domain adaptation (DA). Given the large performance gap between supervised and unsupervised DA techniques, ADA allows for an excellent trade-off between annotation cost and performance. Prior art makes use of measures of uncertainty or disagreement of models to identify 'regions' to be annotated by the human oracle. However, these regions frequently comprise pixels at object boundaries which are hard and tedious to annotate. Hence, even if the fraction of image pixels annotated reduces, the overall annotation time and the resulting cost still remain high. In this work, we propose an ADA strategy which, given a frame, identifies a set of classes that are hardest for the model to predict accurately, thereby recommending semantically meaningful regions to be annotated in a selected frame. We show that this set of 'hard' classes is context-dependent and typically varies across frames, and when annotated helps the model generalize better. We propose two ADA techniques: the Anchor-based and Augmentation-based approaches to select complementary and diverse regions in the context of the current training set. Our approach achieves 66.6 mIoU on GTA to Cityscapes dataset with an annotation budget of 4.7% in comparison to 64.9 mIoU by MADA using 5% of annotations. Our technique can also be used as a decorator for any existing frame-based AL technique, e.g., we report 1.5% performance improvement for CDAL on Cityscapes using our approach.

Submitted 13 October, 2022; originally announced October 2022.

Comments: Accepted WACV2023

arXiv:2207.13916 [pdf, other]

A Novel Data Augmentation Technique for Out-of-Distribution Sample Detection using Compounded Corruptions

Authors: Ramya S. Hebbalaguppe, Soumya Suvra Goshal, Jatin Prakash, Harshad Khadilkar, Chetan Arora

Abstract: Modern deep neural network models are known to erroneously classify out-of-distribution (OOD) test data into one of the in-distribution (ID) training classes with high confidence. This can have disastrous consequences for safety-critical applications. A popular mitigation strategy is to train a separate classifier that can detect such OOD samples at the test time. In most practical settings OOD examples are not known at the train time, and hence a key question is: how to augment the ID data with synthetic OOD samples for training such an OOD detector? In this paper, we propose a novel Compounded Corruption technique for the OOD data augmentation termed CnC. One of the major advantages of CnC is that it does not require any hold-out data apart from the training set. Further, unlike current state-of-the-art (SOTA) techniques, CnC does not require backpropagation or ensembling at the test time, making our method much faster at inference. Our extensive comparison with 20 methods from the major conferences in the last 4 years shows that a model trained using CnC-based data augmentation significantly outperforms SOTA, both in terms of OOD detection accuracy as well as inference time. We include a detailed post-hoc analysis to investigate the reasons for the success of our method and identify higher relative entropy and diversity of CnC samples as probable causes. We also provide theoretical insights via a piece-wise decomposition analysis on a two-dimensional dataset to reveal (visually and quantitatively) that our approach leads to a tighter boundary around ID classes, leading to better detection of OOD samples. Source code link: https://github.com/cnc-ood

Submitted 21 September, 2022; v1 submitted 28 July, 2022; originally announced July 2022.

Comments: 16 pages of the main text, and supplemental material. Accepted in Research Track ECML'22. Project webpage: https://cnc-ood.github.io/

arXiv:2207.13148 [pdf, other]

Unsupervised Contrastive Learning of Image Representations from Ultrasound Videos with Hard Negative Mining

Authors: Soumen Basu, Somanshu Singla, Mayank Gupta, Pratyaksha Rana, Pankaj Gupta, Chetan Arora

Abstract: Rich temporal information and variations in viewpoints make video data an attractive choice for learning image representations using unsupervised contrastive learning (UCL) techniques. State-of-the-art (SOTA) contrastive learning techniques consider frames within a video as positives in the embedding space, whereas the frames from other videos are considered negatives. We observe that unlike multiple views of an object in natural scene videos, an Ultrasound (US) video captures different 2D slices of an organ. Hence, there is almost no similarity between the temporally distant frames of even the same US video. In this paper we propose to instead utilize such frames as hard negatives. We advocate mining both intra-video and cross-video negatives in a hardness-sensitive negative mining curriculum in a UCL framework to learn rich image representations. We deploy our framework to learn the representations of Gallbladder (GB) malignancy from US videos. We also construct the first large-scale US video dataset containing 64 videos and 15,800 frames for learning GB representations. We show that the standard ResNet50 backbone trained with our framework improves the accuracy of models pretrained with SOTA UCL techniques as well as supervised pretrained models on ImageNet for the GB malignancy detection task by 2-6%. We further validate the generalizability of our method on a publicly available lung US image dataset of COVID-19 pathologies and show an improvement of 1.5% compared to SOTA. Source code, dataset, and models are available at https://gbc-iitd.github.io/usucl.

Submitted 26 July, 2022; originally announced July 2022.

Comments: ACCEPTED for publication at MICCAI 2022
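
Illustrative note on the entry above: the abstract proposes treating temporally distant frames of the same ultrasound video as hard negatives. The sketch below shows only that selection idea for a single video. The function name `split_frames`, the parameters `pos_window` and `neg_gap`, and the choice to treat nearby frames as positives are assumptions made for illustration; they are not taken from the paper or its released code, and the paper's hardness-sensitive curriculum is not modelled here.

```python
def split_frames(anchor, num_frames, pos_window, neg_gap):
    """Partition frame indices of one video relative to an anchor frame.

    Hypothetical rule: frames within `pos_window` of the anchor count as
    positives; frames at least `neg_gap` away count as intra-video hard
    negatives. Frames in between are left unused.
    """
    positives, hard_negatives = [], []
    for i in range(num_frames):
        if i == anchor:
            continue  # the anchor is neither a positive nor a negative
        dist = abs(i - anchor)
        if dist <= pos_window:
            positives.append(i)
        elif dist >= neg_gap:
            hard_negatives.append(i)
    return positives, hard_negatives


if __name__ == "__main__":
    pos, neg = split_frames(anchor=5, num_frames=12, pos_window=1, neg_gap=4)
    print("positives:", pos)
    print("intra-video hard negatives:", neg)
```

Cross-video negatives would simply be frames drawn from other videos; in a curriculum, `neg_gap` could plausibly be varied during training, though that schedule is also an assumption here.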

arXiv:2207.10883 [pdf, other]

My View is the Best View: Procedure Learning from Egocentric Videos

Authors: Siddhant Bansal, Chetan Arora, C. V. Jawahar

Abstract: Procedure learning involves identifying the key-steps and determining their logical order to perform a task. Existing approaches commonly use third-person videos for learning the procedure, making the manipulated object small in appearance and often occluded by the actor, leading to significant errors. In contrast, we observe that videos obtained from first-person (egocentric) wearable cameras provide an unobstructed and clear view of the action. However, procedure learning from egocentric videos is challenging because (a) the camera view undergoes extreme changes due to the wearer's head motion, and (b) the presence of unrelated frames due to the unconstrained nature of the videos. Due to this, current state-of-the-art methods' assumptions that the actions occur at approximately the same time and are of the same duration, do not hold. Instead, we propose to use the signal provided by the temporal correspondences between key-steps across videos. To this end, we present a novel self-supervised Correspond and Cut (CnC) framework for procedure learning. CnC identifies and utilizes the temporal correspondences between the key-steps across multiple videos to learn the procedure. Our experiments show that CnC outperforms the state-of-the-art on the benchmark ProceL and CrossTask datasets by 5.2% and 6.3%, respectively. Furthermore, for procedure learning using egocentric videos, we propose the EgoProceL dataset consisting of 62 hours of videos captured by 130 subjects performing 16 tasks. The source code and the dataset are available on the project page https://sid2697.github.io/egoprocel/.

Submitted 22 July, 2022; originally announced July 2022.

Comments: 25 pages, 6 figures, Accepted in European Conference on Computer Vision (ECCV) 2022

arXiv:2207.04535 [pdf, other]

Depthformer : Multiscale Vision Transformer For Monocular Depth Estimation With Local Global Information Fusion

Authors: Ashutosh Agarwal, Chetan Arora

Abstract: Attention-based models such as transformers have shown outstanding performance on dense prediction tasks, such as semantic segmentation, owing to their capability of capturing long-range dependency in an image. However, the benefit of transformers for monocular depth prediction has seldom been explored so far. This paper benchmarks various transformer-based models for the depth estimation task on an indoor NYUV2 dataset and an outdoor KITTI dataset. We propose a novel attention-based architecture, Depthformer, for monocular depth estimation that uses multi-head self-attention to produce the multiscale feature maps, which are effectively combined by our proposed decoder network. We also propose a Transbins module that divides the depth range into bins whose center value is estimated adaptively per image. The final depth estimated is a linear combination of bin centers for each pixel. The Transbins module takes advantage of the global receptive field using the transformer module in the encoding stage. Experimental results on the NYUV2 and KITTI depth estimation benchmarks demonstrate that our proposed method improves the state-of-the-art by 3.3% and 3.3%, respectively, in terms of Root Mean Squared Error (RMSE). Code is available at https://github.com/ashutosh1807/Depthformer.git.

Submitted 12 July, 2022; v1 submitted 10 July, 2022; originally announced July 2022.

Journal ref: International Conference on Image Processing (ICIP), 2022
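
Illustrative note on the entry above: the abstract states that the final depth for each pixel is a linear combination of adaptively estimated bin centers. A minimal pure-Python sketch of that arithmetic follows. The helper names (`bin_centers`, `softmax`, `pixel_depth`), the use of normalized bin widths to place the centers, and the softmax weighting are assumptions chosen for illustration and are not confirmed by the abstract or taken from the Depthformer code.

```python
import math


def bin_centers(widths, d_min, d_max):
    """Turn positive bin widths into bin centers covering [d_min, d_max].

    Assumption: widths are normalized to sum to 1, then scaled to the
    depth range; each center is the midpoint of its bin.
    """
    total = sum(widths)
    centers, left = [], d_min
    for w in widths:
        span = (d_max - d_min) * w / total
        centers.append(left + span / 2.0)
        left += span
    return centers


def softmax(scores):
    """Numerically stable softmax over a list of per-bin scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def pixel_depth(scores, centers):
    """Depth for one pixel: probability-weighted sum of bin centers."""
    probs = softmax(scores)
    return sum(p * c for p, c in zip(probs, centers))


if __name__ == "__main__":
    centers = bin_centers([1, 1, 2], d_min=0.0, d_max=8.0)
    print("bin centers:", centers)
    print("depth with uniform scores:", pixel_depth([0.0, 0.0, 0.0], centers))
```

With uniform scores the predicted depth is the plain average of the centers; a sharply peaked score vector pulls the depth toward a single bin center.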

arXiv:2206.10233 [pdf, other]

COREQQA -- A COmpliance REQuirements Understanding using Question Answering Tool

Authors: Sallam Abualhaija, Chetan Arora, Lionel Briand

Abstract: We introduce COREQQA, a tool for assisting requirements engineers in acquiring a better understanding of compliance requirements by means of automated Question Answering. Extracting compliance-related requirements by manually navigating through a legal document is both time-consuming and error-prone. COREQQA enables requirements engineers to pose questions in natural language about a compliance-related topic given some legal document, e.g., asking about data breach. The tool then automatically navigates through the legal document and returns to the requirements engineer a list of text passages containing the possible answers to the input question. For better readability, the tool also highlights the likely answers in these passages. The engineer can then use this output for specifying compliance requirements. COREQQA is developed using advanced large-scale language models from BERT's family. COREQQA has been evaluated on four legal documents. The results of this evaluation are briefly presented in the paper. The tool is publicly available on Zenodo (DOI: 10.5281/zenodo.6653514).

Submitted 21 June, 2022; originally announced June 2022.

arXiv:2206.10227 [pdf, other]

TAPHSIR: Towards AnaPHoric Ambiguity Detection and ReSolution In Requirements

Authors: Saad Ezzini, Sallam Abualhaija, Chetan Arora, Mehrdad Sabetzadeh

Abstract: We introduce TAPHSIR, a tool for anaphoric ambiguity detection and anaphora resolution in requirements. TAPHSIR facilitates reviewing the use of pronouns in a requirements specification and revising those pronouns that can lead to misunderstandings during the development process. To this end, TAPHSIR detects the requirements which have potential anaphoric ambiguity and further attempts interpreting anaphora occurrences automatically. TAPHSIR employs a hybrid solution composed of an ambiguity detection solution based on machine learning and an anaphora resolution solution based on a variant of the BERT language model. Given a requirements specification, TAPHSIR decides for each pronoun occurrence in the specification whether the pronoun is ambiguous or unambiguous, and further provides an automatic interpretation for the pronoun. The output generated by TAPHSIR can be easily reviewed and validated by requirements engineers. TAPHSIR is publicly available on Zenodo (DOI: 10.5281/zenodo.5902117).

Submitted 21 June, 2022; originally announced June 2022.

arXiv:2204.11433 [pdf, other]

Surpassing the Human Accuracy: Detecting Gallbladder Cancer from USG Images with Curriculum Learning

Authors: Soumen Basu, Mayank Gupta, Pratyaksha Rana, Pankaj Gupta, Chetan Arora

Abstract: We explore the potential of CNN-based models for gallbladder cancer (GBC) detection from ultrasound (USG) images as no prior study is known. USG is the most common diagnostic modality for GB diseases due to its low cost and accessibility. However, USG images are challenging to analyze due to low image quality, noise, and varying viewpoints due to the handheld nature of the sensor. Our exhaustive study of state-of-the-art (SOTA) image classification techniques for the problem reveals that they often fail to learn the salient GB region due to the presence of shadows in the USG images. SOTA object detection techniques also achieve low accuracy because of spurious textures due to noise or adjacent organs. We propose GBCNet to tackle the challenges in our problem. GBCNet first extracts the regions of interest (ROIs) by detecting the GB (and not the cancer), and then uses a new multi-scale, second-order pooling architecture specializing in classifying GBC. To effectively handle spurious textures, we propose a curriculum inspired by human visual acuity, which reduces the texture biases in GBCNet. Experimental results demonstrate that GBCNet significantly outperforms SOTA CNN models, as well as the expert radiologists. Our technical innovations are generic to other USG image analysis tasks as well. Hence, as a validation, we also show the efficacy of GBCNet in detecting breast cancer from USG images. Project page with source code, trained models, and data is available at https://gbc-iitd.github.io/gbcnet

Submitted 25 April, 2022; originally announced April 2022.

Comments: Accepted in IEEE/CVF Computer Vision and Pattern Recognition (CVPR) 2022

arXiv:2203.13834 [pdf, other]

A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration

Authors: Ramya Hebbalaguppe, Jatin Prakash, Neelabh Madan, Chetan Arora

Abstract: Deep Neural Networks (DNNs) are known to make overconfident mistakes, which makes their use problematic in safety-critical applications. State-of-the-art (SOTA) calibration techniques improve on the confidence of predicted labels alone and leave the confidence of non-max classes (e.g. top-2, top-5) uncalibrated. Such calibration is not suitable for label refinement using post-processing. Further, most SOTA techniques learn a few hyper-parameters post-hoc, leaving out the scope for image- or pixel-specific calibration. This makes them unsuitable for calibration under domain shift, or for dense prediction tasks like semantic segmentation. In this paper, we argue for intervening at the train time itself, so as to directly produce calibrated DNN models. We propose a novel auxiliary loss function, Multi-class Difference in Confidence and Accuracy (MDCA), to achieve the same. MDCA can be used in conjunction with other application/task-specific loss functions. We show that training with MDCA leads to better-calibrated models in terms of Expected Calibration Error (ECE) and Static Calibration Error (SCE) on image classification and segmentation tasks. We report ECE (SCE) score of 0.72 (1.60) on the CIFAR 100 dataset, in comparison to 1.90 (1.71) by the SOTA. Under domain shift, a ResNet-18 model trained on PACS dataset using MDCA gives an average ECE (SCE) score of 19.7 (9.7) across all domains, compared to 24.2 (11.8) by the SOTA. For the segmentation task, we report a 2X reduction in calibration error on PASCAL-VOC dataset in comparison to Focal Loss. Finally, MDCA training improves calibration even on imbalanced data, and for natural language classification tasks. The code is available at https://github.com/mdca-loss
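For readers unfamiliar with the ECE metric named in the abstract above, the following is a minimal sketch of the commonly used binned estimate: predictions are grouped by confidence, and the gap between accuracy and mean confidence in each bin is averaged with weights proportional to bin size. The function name, the choice of 10 equal-width bins, and the toy inputs are assumptions made for illustration; this is not the authors' evaluation code, and it does not reproduce the MDCA loss itself.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE estimate (illustrative; equal-width bins are an assumption).

    confidences: predicted-class probabilities in [0, 1], one per sample.
    correct:     booleans, True where the predicted class matched the label.
    Returns a value in [0, 1]; 0 means confidence matches accuracy in every bin.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    conf_sum = [0.0] * n_bins  # sum of confidences per bin
    hits = [0] * n_bins        # number of correct predictions per bin
    count = [0] * n_bins       # number of samples per bin

    for c, ok in zip(confidences, correct):
        # Map confidence to a bin index; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hits[idx] += 1 if ok else 0
        count[idx] += 1

    ece = 0.0
    for s, h, k in zip(conf_sum, hits, count):
        if k:
            accuracy = h / k
            mean_conf = s / k
            ece += (k / n) * abs(accuracy - mean_conf)
    return ece


# Toy example: two confident correct predictions, two less confident mixed ones.
toy_ece = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [True, True, True, False])
```

In this toy case each of the two occupied bins has a gap of about 0.1 between accuracy and mean confidence, so the weighted average is also about 0.1.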

Submitted 25 March, 2022; originally announced March 2022.

Comments: Accepted in IEEE Computer Vision and Pattern Recognition 2022

arXiv:2202.10594 [pdf, other]

Adversarial Attacks on Speech Recognition Systems for Mission-Critical Applications: A Survey

Authors: Ngoc Dung Huynh, Mohamed Reda Bouadjenek, Imran Razzak, Kevin Lee, Chetan Arora, Ali Hassani, Arkady Zaslavsky

Abstract: A Machine-Critical Application is a system that is fundamentally necessary to the success of specific and sensitive operations such as search and recovery, rescue, military, and emergency management actions. Recent advances in Machine Learning, Natural Language Processing, voice recognition, and speech processing technologies have naturally allowed the development and deployment of speech-based conversational interfaces to interact with various machine-critical applications. While these conversational interfaces have allowed users to give voice commands to carry out strategic and critical activities, their robustness to adversarial attacks remains uncertain and unclear. Indeed, Adversarial Artificial Intelligence (AI), which refers to a set of techniques that attempt to fool machine learning models with deceptive data, is a growing threat in the AI and machine learning research community, in particular for machine-critical applications. The most common reason for adversarial attacks is to cause a malfunction in a machine learning model. An adversarial attack might entail presenting a model with inaccurate or fabricated samples as its training data, or introducing maliciously designed data to deceive an already trained model. While focusing on speech recognition for machine-critical applications, in this paper, we first review existing speech recognition techniques, then, we investigate the effectiveness of adversarial attacks and defenses against these systems, before outlining research challenges, defense recommendations, and future work. This paper is expected to serve researchers and practitioners as a reference to help them in understanding the challenges, position themselves and, ultimately, help them to improve existing models of speech recognition for mission-critical applications. Keywords: Mission-Critical Applications, Adversarial AI, Speech Recognition Systems.

Submitted 21 February, 2022; originally announced February 2022.

arXiv:2111.06639 [pdf, other]

Attention Guided Cosine Margin For Overcoming Class-Imbalance in Few-Shot Road Object Detection

Authors: Ashutosh Agarwal, Anay Majee, Anbumani Subramanian, Chetan Arora

Abstract: Few-shot object detection (FSOD) localizes and classifies objects in an image given only a few data samples. Recent trends in FSOD research show the adoption of metric and meta-learning techniques, which are prone to catastrophic forgetting and class confusion. To overcome these pitfalls in metric learning based FSOD techniques, we introduce Attention Guided Cosine Margin (AGCM) that facilitates the creation of tighter and well separated class-specific feature clusters in the classification head of the object detector. Our novel Attentive Proposal Fusion (APF) module minimizes catastrophic forgetting by reducing the intra-class variance among co-occurring classes. At the same time, the proposed Cosine Margin Cross-Entropy loss increases the angular margin between confusing classes to overcome the challenge of class confusion between already learned (base) and newly added (novel) classes. We conduct our experiments on the challenging India Driving Dataset (IDD), which presents a real-world class-imbalanced setting alongside popular FSOD benchmark PASCAL-VOC. Our method outperforms State-of-the-Art (SoTA) approaches by up to 6.4 mAP points on the IDD-OS and up to 2.0 mAP points on the IDD-10 splits for the 10-shot setting. On the PASCAL-VOC dataset, we outperform existing SoTA approaches by up to 4.9 mAP points.

Submitted 12 November, 2021; originally announced November 2021.

Comments: 8 pages, 4 figures

Showing 1–50 of 68 results for author: Arora, C