Showing 1–50 of 200 results for author: Gupta, N

Searching in archive cs.
  1. arXiv:2506.18020  [pdf, other]

    cs.LG cs.CR stat.ML

    Generalization under Byzantine & Poisoning Attacks: Tight Stability Bounds in Robust Distributed Learning

    Authors: Thomas Boudou, Batiste Le Bars, Nirupam Gupta, Aurélien Bellet

    Abstract: Robust distributed learning algorithms aim to maintain good performance in distributed and federated settings, even in the presence of misbehaving workers. Two primary threat models have been studied: Byzantine attacks, where misbehaving workers can send arbitrarily corrupted updates, and data poisoning attacks, where misbehavior is limited to manipulation of local training data. While prior work…

    Submitted 22 June, 2025; originally announced June 2025.

  2. arXiv:2506.12103  [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  3. arXiv:2506.10910  [pdf, ps, other]

    cs.CL

    Magistral

    Authors: Mistral-AI, Abhinav Rastogi, Albert Q. Jiang, Andy Lo, Gabrielle Berrada, Guillaume Lample, Jason Rute, Joep Barmentlo, Karmesh Yadav, Kartik Khandelwal, Khyathi Raghavi Chandu, Léonard Blier, Lucile Saulnier, Matthieu Dinot, Maxime Darrin, Neha Gupta, Roman Soletskyi, Sagar Vaze, Teven Le Scao, Yihan Wang, Adam Yang, Alexander H. Liu, Alexandre Sablayrolles, Amélie Héliou, et al. (76 additional authors not shown)

    Abstract: We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior models, we follow a ground up approach, relying solely on our own models and infrastructure. Notably, we demonstrate a stack that enabled us to explore the limits of pure RL training of LLMs, present a s…

    Submitted 12 June, 2025; originally announced June 2025.

  4. arXiv:2506.07400  [pdf, ps, other]

    cs.MA cs.AI cs.CV cs.LG

    MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models

    Authors: Philip R. Liu, Sparsh Bansal, Jimmy Dinh, Aditya Pawar, Ramani Satishkumar, Shail Desai, Neeraj Gupta, Xin Wang, Shu Hu

    Abstract: The integration of deep learning-based glaucoma detection with large language models (LLMs) presents an automated strategy to mitigate ophthalmologist shortages and improve clinical reporting efficiency. However, applying general LLMs to medical imaging remains challenging due to hallucinations, limited interpretability, and insufficient domain-specific medical knowledge, which can potentially red…

    Submitted 11 June, 2025; v1 submitted 8 June, 2025; originally announced June 2025.

    Comments: 7 pages, 6 figures. Accepted to the 2025 IEEE 8th International Conference on Multimedia Information Processing and Retrieval (MIPR)

  5. arXiv:2505.24258  [pdf, ps, other]

    cs.AI

    FABLE: A Novel Data-Flow Analysis Benchmark on Procedural Text for Large Language Model Evaluation

    Authors: Vishal Pallagani, Nitin Gupta, John Aydin, Biplav Srivastava

    Abstract: Understanding how data moves, transforms, and persists, known as data flow, is fundamental to reasoning in procedural tasks. Despite their fluency in natural and programming languages, large language models (LLMs), although increasingly being applied to decisions with procedural tasks, have not been systematically evaluated for their ability to perform data-flow reasoning. We introduce FABLE, an e…

    Submitted 30 May, 2025; originally announced May 2025.

  6. arXiv:2505.19479  [pdf]

    cs.CV cs.LG

    Revolutionizing Wildfire Detection with Convolutional Neural Networks: A VGG16 Model Approach

    Authors: Lakshmi Aishwarya Malladi, Navarun Gupta, Ahmed El-Sayed, Xingguo Xiong

    Abstract: Over 8,024 wildfire incidents have been documented in 2024 alone, affecting thousands of fatalities and significant damage to infrastructure and ecosystems. Wildfires in the United States have inflicted devastating losses. Wildfires are becoming more frequent and intense, which highlights how urgently efficient warning systems are needed to avoid disastrous outcomes. The goal of this study is to e…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Conference at ASEE 2025

  7. arXiv:2505.17395  [pdf]

    cs.CV cs.AI

    Wildfire Detection Using Vision Transformer with the Wildfire Dataset

    Authors: Gowtham Raj Vuppari, Navarun Gupta, Ahmed El-Sayed, Xingguo Xiong

    Abstract: The critical need for sophisticated detection techniques has been highlighted by the rising frequency and intensity of wildfires in the US, especially in California. In 2023, wildfires caused 130 deaths nationwide, the highest since 1990. In January 2025, Los Angeles wildfires which included the Palisades and Eaton fires burnt approximately 40,000 acres and 12,000 buildings, and caused loss of hum…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: Published at ASEE NE 2025

  8. arXiv:2505.09970  [pdf, ps, other]

    cs.AI

    Pre-Act: Multi-Step Planning and Reasoning Improves Acting in LLM Agents

    Authors: Mrinal Rawat, Ambuje Gupta, Rushil Goomer, Alessandro Di Bari, Neha Gupta, Roberto Pieraccini

    Abstract: The ReAct (Reasoning + Action) capability in large language models (LLMs) has become the foundation of modern agentic systems. Recent LLMs, such as DeepSeek-R1 and OpenAI o1/o3, exemplify this by emphasizing reasoning through the generation of ample intermediate tokens, which help build a strong premise before producing the final output tokens. In this paper, we introduce Pre-Act, a novel approach…

    Submitted 18 May, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

  9. arXiv:2504.12803  [pdf, other]

    cs.LG cs.AI

    Enhancing Explainability and Reliable Decision-Making in Particle Swarm Optimization through Communication Topologies

    Authors: Nitin Gupta, Indu Bala, Bapi Dutta, Luis Martínez, Anupam Yadav

    Abstract: Swarm intelligence effectively optimizes complex systems across fields like engineering and healthcare, yet algorithm solutions often suffer from low reliability due to unclear configurations and hyperparameters. This study analyzes Particle Swarm Optimization (PSO), focusing on how different communication topologies (Ring, Star, and Von Neumann) affect convergence and search behaviors. Using an ada…

    Submitted 17 April, 2025; originally announced April 2025.

  10. arXiv:2504.09877  [pdf]

    cs.IR cs.AI

    Constructing Micro Knowledge Graphs from Technical Support Documents

    Authors: Atul Kumar, Nisha Gupta, Saswati Dana

    Abstract: Short technical support pages such as IBM Technotes are quite common in the technical support domain. These pages can be very useful as knowledge sources for technical support applications such as chatbots, search engines and question-answering (QA) systems. Information extracted from documents to drive technical support applications is often stored in the form of a Knowledge Graph (KG). Building KG…

    Submitted 14 April, 2025; originally announced April 2025.

  11. arXiv:2504.07995  [pdf, other]

    cs.CL

    SafeChat: A Framework for Building Trustworthy Collaborative Assistants and a Case Study of its Usefulness

    Authors: Biplav Srivastava, Kausik Lakkaraju, Nitin Gupta, Vansh Nagpal, Bharath C. Muppasani, Sara E. Jones

    Abstract: Collaborative assistants, or chatbots, are data-driven decision support systems that enable natural interaction for task completion. While they can meet critical needs in modern society, concerns about their reliability and trustworthiness persist. In particular, Large Language Model (LLM)-based chatbots like ChatGPT, Gemini, and DeepSeek are becoming more accessible. However, such chatbots have l…

    Submitted 15 April, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

  12. arXiv:2504.03869  [pdf]

    cond-mat.soft cs.LG

    CREASE-2D Analysis of Small Angle X-ray Scattering Data from Supramolecular Dipeptide Systems

    Authors: Nitant Gupta, Sri V. V. R. Akepati, Simona Bianco, Jay Shah, Dave J. Adams, Arthi Jayaraman

    Abstract: In this paper, we extend a recently developed machine-learning (ML) based CREASE-2D method to analyze the entire two-dimensional (2D) scattering pattern obtained from small angle X-ray scattering measurements of supramolecular dipeptide micellar systems. Traditional analysis of such scattering data would involve use of approximate or incorrect analytical models to fit to azimuthally-averaged 1D sc…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: 30 Pages, 9 figures

  13. arXiv:2504.01331  [pdf, other]

    cs.AI cs.NE

    An Explainable Reconfiguration-Based Optimization Algorithm for Industrial and Reliability-Redundancy Allocation Problems

    Authors: Dikshit Chauhan, Nitin Gupta, Anupam Yadav

    Abstract: Industrial and reliability optimization problems often involve complex constraints and require efficient, interpretable solutions. This paper presents AI-AEFA, an advanced parameter reconfiguration-based metaheuristic algorithm designed to address large-scale industrial and reliability-redundancy allocation problems. AI-AEFA enhances search space exploration and convergence efficiency through a no…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 38 pages, 12 figures

  14. arXiv:2503.07891  [pdf, other]

    cs.CL cs.AI

    Gemini Embedding: Generalizable Embeddings from Gemini

    Authors: Jinhyuk Lee, Feiyang Chen, Sahil Dua, Daniel Cer, Madhuri Shanbhogue, Iftekhar Naim, Gustavo Hernández Ábrego, Zhe Li, Kaifeng Chen, Henrique Schechter Vera, Xiaoqi Ren, Shanfeng Zhang, Daniel Salz, Michael Boratko, Jay Han, Blair Chen, Shuo Huang, Vikram Rao, Paul Suganthan, Feng Han, Andreas Doumanoglou, Nithi Gupta, Fedor Moiseev, Cathy Yip, Aashi Jain, et al. (22 additional authors not shown)

    Abstract: In this report, we introduce Gemini Embedding, a state-of-the-art embedding model leveraging the power of Gemini, Google's most capable large language model. Capitalizing on Gemini's inherent multilingual and code understanding capabilities, Gemini Embedding produces highly generalizable embeddings for text spanning numerous languages and textual modalities. The representations generated by Gemini…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: 19 pages

  15. arXiv:2502.19067  [pdf, other]

    cs.SE cs.CL

    IndicEval-XL: Bridging Linguistic Diversity in Code Generation Across Indic Languages

    Authors: Ujjwal Singh, Aditi Sharma, Nikhil Gupta, Deepakshi, Vivek Kumar Jha

    Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation from natural language prompts, revolutionizing software development workflows. As we advance towards agent-based development paradigms, these models form the cornerstone of next-generation software development lifecycles. However, current benchmarks for evaluating multilingual code generation capabilities are…

    Submitted 26 February, 2025; originally announced February 2025.

  16. arXiv:2502.18394  [pdf, other]

    cs.LG

    SPECTRE: An FFT-Based Efficient Drop-In Replacement to Self-Attention for Long Contexts

    Authors: Jacob Fein-Ashley, Neelesh Gupta, Rajgopal Kannan, Viktor Prasanna

    Abstract: Long-context transformers face significant efficiency challenges due to the quadratic cost of self-attention. However, many modern applications, from multi-turn dialogue to high-resolution vision, require contexts spanning tens of thousands of tokens. We introduce SPECTRE, a method that replaces each attention head with a fast real FFT, a content-adaptive spectral gate, and an inverse FFT, reducing…

    Submitted 17 May, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

  17. arXiv:2502.04028  [pdf, other]

    cs.LG

    Deep Meta Coordination Graphs for Multi-agent Reinforcement Learning

    Authors: Nikunj Gupta, James Zachary Hare, Rajgopal Kannan, Viktor Prasanna

    Abstract: This paper presents deep meta coordination graphs (DMCG) for learning cooperative policies in multi-agent reinforcement learning (MARL). Coordination graph formulations encode local interactions and accordingly factorize the joint value function of all agents to improve efficiency in MARL. However, existing approaches rely solely on pairwise relations between agents, which potentially oversimplifi…

    Submitted 6 February, 2025; originally announced February 2025.

  18. arXiv:2412.17910  [pdf, other]

    cs.LG cs.AI

    A Novel Approach to Balance Convenience and Nutrition in Meals With Long-Term Group Recommendations and Reasoning on Multimodal Recipes and its Implementation in BEACON

    Authors: Vansh Nagpal, Siva Likitha Valluru, Kausik Lakkaraju, Nitin Gupta, Zach Abdulrahman, Andrew Davison, Biplav Srivastava

    Abstract: "A common decision made by people, whether healthy or with health conditions, is choosing meals like breakfast, lunch, and dinner, comprising combinations of foods for appetizer, main course, side dishes, desserts, and beverages. Often, this decision involves tradeoffs between nutritious choices (e.g., salt and sugar levels, nutrition content) and convenience (e.g., cost and accessibility, cuisine…

    Submitted 23 December, 2024; originally announced December 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2406.13714

  19. arXiv:2412.05453  [pdf, ps, other]

    cs.CL

    Knowledge Graphs are all you need: Leveraging KGs in Physics Question Answering

    Authors: Krishnasai Addala, Kabir Dev Paul Baghel, Dhruv Jain, Navya Gupta, Rishitej Reddy Vyalla, Chhavi Kirtani, Avinash Anand, Rajiv Ratn Shah

    Abstract: This study explores the effectiveness of using knowledge graphs generated by large language models to decompose high school-level physics questions into sub-questions. We introduce a pipeline aimed at enhancing model response quality for Question Answering tasks. By employing LLMs to construct knowledge graphs that capture the internal logic of the questions, these graphs then guide the generation…

    Submitted 11 June, 2025; v1 submitted 6 December, 2024; originally announced December 2024.

  20. arXiv:2412.05023  [pdf, ps, other]

    cs.CL

    Steps are all you need: Rethinking STEM Education with Prompt Engineering

    Authors: Krishnasai Addala, Kabir Dev Paul Baghel, Navya Gupta, Rishitej Reddy Vyalla, Chhavi Kirtani, Avinash Anand, Rajiv Ratn Shah

    Abstract: Few shot and Chain-of-Thought prompting have shown promise when applied to Physics Question Answering Tasks, but are limited by the lack of mathematical ability inherent to LLMs, and are prone to hallucination. By utilizing a Mixture of Experts (MoE) Model, along with analogical prompting, we are able to show improved model performance when compared to the baseline on standard LLMs. We also survey…

    Submitted 11 June, 2025; v1 submitted 6 December, 2024; originally announced December 2024.

  21. arXiv:2411.07182  [pdf, other]

    cs.LG cs.DC

    Revisiting Ensembling in One-Shot Federated Learning

    Authors: Youssef Allouah, Akash Dhasade, Rachid Guerraoui, Nirupam Gupta, Anne-Marie Kermarrec, Rafael Pinot, Rafael Pires, Rishi Sharma

    Abstract: Federated learning (FL) is an appealing approach to training machine learning models without sharing raw data. However, standard FL algorithms are iterative and thus induce a significant communication cost. One-shot federated learning (OFL) trades the iterative exchange of models between clients and the server with a single round of communication, thereby saving substantially on communication cost…

    Submitted 11 November, 2024; originally announced November 2024.

    Comments: Accepted at NeurIPS 2024

  22. arXiv:2410.11247  [pdf, other]

    cs.LG math-ph physics.geo-ph

    A Unified Framework for Forward and Inverse Problems in Subsurface Imaging using Latent Space Translations

    Authors: Naveen Gupta, Medha Sawhney, Arka Daw, Youzuo Lin, Anuj Karpatne

    Abstract: In subsurface imaging, learning the mapping from velocity maps to seismic waveforms (forward problem) and waveforms to velocity (inverse problem) is important for several applications. While traditional techniques for solving forward and inverse problems are computationally prohibitive, there is a growing interest in leveraging recent advances in deep learning to learn the mapping between velocity…

    Submitted 1 April, 2025; v1 submitted 15 October, 2024; originally announced October 2024.

    Comments: Accepted at ICLR 2025

  23. arXiv:2410.10584  [pdf, other]

    cs.AI cs.LG cs.MA

    STACKFEED: Structured Textual Actor-Critic Knowledge Base Editing with FeedBack

    Authors: Naman Gupta, Shashank Kirtania, Priyanshu Gupta, Krishna Kariya, Sumit Gulwani, Arun Iyer, Suresh Parthasarathy, Arjun Radhakrishna, Sriram K. Rajamani, Gustavo Soares

    Abstract: Large Language Models (LLMs) often generate incorrect or outdated information, especially in low-resource settings or when dealing with private data. To address this, Retrieval-Augmented Generation (RAG) uses external knowledge bases (KBs), but these can also suffer from inaccuracies. We introduce STACKFEED, a novel Structured Textual Actor-Critic Knowledge base editing with FEEDback approach that…

    Submitted 14 October, 2024; originally announced October 2024.

  24. arXiv:2409.20329  [pdf, other]

    cs.LG cs.CR

    Fine-Tuning Personalization in Federated Learning to Mitigate Adversarial Clients

    Authors: Youssef Allouah, Abdellah El Mrini, Rachid Guerraoui, Nirupam Gupta, Rafael Pinot

    Abstract: Federated learning (FL) is an appealing paradigm that allows a group of machines (a.k.a. clients) to learn collectively while keeping their data local. However, due to the heterogeneity between the clients' data distributions, the model obtained through the use of FL algorithms may perform poorly on some client's data. Personalization addresses this issue by enabling each client to have a differen…

    Submitted 30 September, 2024; originally announced September 2024.

  25. arXiv:2409.14803  [pdf, other]

    cs.AI

    Benchmarking Edge AI Platforms for High-Performance ML Inference

    Authors: Rakshith Jayanth, Neelesh Gupta, Viktor Prasanna

    Abstract: Edge computing's growing prominence, due to its ability to reduce communication latency and enable real-time processing, is promoting the rise of high-performance, heterogeneous System-on-Chip solutions. While current approaches often involve scaling down modern hardware, the performance characteristics of neural network workloads on these platforms can vary significantly, especially when it comes…

    Submitted 23 September, 2024; originally announced September 2024.

  26. arXiv:2409.13949  [pdf]

    cs.CL

    Mufu: Multilingual Fused Learning for Low-Resource Translation with LLM

    Authors: Zheng Wei Lim, Nitish Gupta, Honglin Yu, Trevor Cohn

    Abstract: Multilingual large language models (LLMs) are great translators, but this is largely limited to high-resource languages. For many LLMs, translating in and out of low-resource languages remains a challenging task. To maximize data efficiency in this low-resource setting, we introduce Mufu, which includes a selection of automatically generated multilingual candidates and an instruction to correct in…

    Submitted 16 February, 2025; v1 submitted 20 September, 2024; originally announced September 2024.

    Comments: 29 pages

  27. arXiv:2408.10239  [pdf, ps, other]

    cs.CY cs.AI cs.LG cs.SE

    A Conceptual Framework for Ethical Evaluation of Machine Learning Systems

    Authors: Neha R. Gupta, Jessica Hullman, Hari Subramonyam

    Abstract: Research in Responsible AI has developed a range of principles and practices to ensure that machine learning systems are used in a manner that is ethical and aligned with human values. However, a critical yet often neglected aspect of ethical ML is the ethical implications that appear when designing evaluations of ML systems. For instance, teams may have to balance a trade-off between highly infor…

    Submitted 4 August, 2024; originally announced August 2024.

  28. Dimensionality Reduction and Nearest Neighbors for Improving Out-of-Distribution Detection in Medical Image Segmentation

    Authors: McKell Woodland, Nihil Patel, Austin Castelo, Mais Al Taie, Mohamed Eltaher, Joshua P. Yung, Tucker J. Netherton, Tiffany L. Calderone, Jessica I. Sanchez, Darrel W. Cleere, Ahmed Elsaiey, Nakul Gupta, David Victor, Laura Beretta, Ankit B. Patel, Kristy K. Brock

    Abstract: Clinically deployed deep learning-based segmentation models are known to fail on data outside of their training distributions. While clinicians review the segmentations, these models tend to perform well in most instances, which could exacerbate automation bias. Therefore, detecting out-of-distribution images at inference is critical to warn the clinicians that the model likely failed. This work a…

    Submitted 2 October, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2024:020. Expansion of "Dimensionality Reduction for Improving Out-of-Distribution Detection in Medical Image Segmentation" arXiv:2308.03723. Code available at https://github.com/mckellwoodland/dimen_reduce_mahal (https://zenodo.org/records/13881989)

    Journal ref: Machine.Learning.for.Biomedical.Imaging. 2 (2024) 2006

  29. arXiv:2407.16805  [pdf, other]

    cs.HC cs.CY

    TAMIGO: Empowering Teaching Assistants using LLM-assisted viva and code assessment in an Advanced Computing Class

    Authors: Anishka IIITD, Diksha Sethi, Nipun Gupta, Shikhar Sharma, Srishti Jain, Ujjwal Singhal, Dhruv Kumar

    Abstract: Large Language Models (LLMs) have significantly transformed the educational landscape, offering new tools for students, instructors, and teaching assistants. This paper investigates the application of LLMs in assisting teaching assistants (TAs) with viva and code assessments in an advanced computing class on distributed systems in an Indian University. We develop TAMIGO, an LLM-based system for TA…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Under review

  30. arXiv:2407.13597  [pdf, other]

    cs.CL cs.AI

    PLANTS: A Novel Problem and Dataset for Summarization of Planning-Like (PL) Tasks

    Authors: Vishal Pallagani, Biplav Srivastava, Nitin Gupta

    Abstract: Text summarization is a well-studied problem that deals with deriving insights from unstructured text consumed by humans, and it has found extensive business applications. However, many real-life tasks involve generating a series of actions to achieve specific goals, such as workflows, recipes, dialogs, and travel plans. We refer to them as planning-like (PL) tasks noting that the main commonality…

    Submitted 18 July, 2024; originally announced July 2024.

  31. arXiv:2407.05887  [pdf, other]

    cs.CL cs.AI cs.LG

    Generation and De-Identification of Indian Clinical Discharge Summaries using LLMs

    Authors: Sanjeet Singh, Shreya Gupta, Niralee Gupta, Naimish Sharma, Lokesh Srivastava, Vibhu Agarwal, Ashutosh Modi

    Abstract: The consequences of a healthcare data breach can be devastating for the patients, providers, and payers. The average financial impact of a data breach in recent months has been estimated to be close to USD 10 million. This is especially significant for healthcare organizations in India that are managing rapid digitization while still establishing data governance procedures that align with the lett…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted at BioNLP Workshop at ACL 2024; 21 pages (9 pages main content)

  32. arXiv:2406.14670  [pdf, other]

    cs.CL cs.AI cs.LG

    Exploring Design Choices for Building Language-Specific LLMs

    Authors: Atula Tejaswi, Nilesh Gupta, Eunsol Choi

    Abstract: Despite rapid progress in large language models (LLMs), their performance on a vast majority of languages remains unsatisfactory. In this paper, we study building language-specific LLMs by adapting monolingual and multilingual LLMs. We conduct systematic experiments on how design choices (base model selection, vocabulary extension, and continued pretraining) impact the adapted LLM, both in terms o…

    Submitted 30 October, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted to EMNLP 2024 Findings

  33. arXiv:2405.19261  [pdf, other]

    cs.CL cs.AI cs.LG

    Faster Cascades via Speculative Decoding

    Authors: Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Seungyeon Kim, Neha Gupta, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: Cascades and speculative decoding are two common approaches to improving language models' inference efficiency. Both approaches involve interleaving models of different sizes, but via fundamentally distinct mechanisms: cascades employ a deferral rule that invokes the larger model only for "hard" inputs, while speculative decoding uses speculative execution to primarily invoke the larger model in p…

    Submitted 21 October, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  34. arXiv:2405.14432  [pdf, other]

    cs.LG

    Adaptive Gradient Clipping for Robust Federated Learning

    Authors: Youssef Allouah, Rachid Guerraoui, Nirupam Gupta, Ahmed Jellouli, Geovani Rizk, John Stephan

    Abstract: Robust federated learning aims to maintain reliable performance despite the presence of adversarial or misbehaving workers. While state-of-the-art (SOTA) robust distributed gradient descent (Robust-DGD) methods were proven theoretically optimal, their empirical success has often relied on pre-aggregation gradient clipping. However, existing static clipping strategies yield inconsistent results: en…

    Submitted 9 May, 2025; v1 submitted 23 May, 2024; originally announced May 2024.

  35. arXiv:2405.00491  [pdf, ps, other]

    cs.LG

    On the Relevance of Byzantine Robust Optimization Against Data Poisoning

    Authors: Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Rafael Pinot

    Abstract: The success of machine learning (ML) has been intimately linked with the availability of large amounts of data, typically collected from heterogeneous sources and processed on vast networks of computing devices (also called workers). Beyond accuracy, the use of ML in critical domains such as healthcare and autonomous driving calls for robustness against data poisoning and some faul…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 38 pages

  36. arXiv:2404.16816  [pdf, other]

    cs.CL

    IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages

    Authors: Harman Singh, Nitish Gupta, Shikhar Bharadwaj, Dinesh Tewari, Partha Talukdar

    Abstract: As large language models (LLMs) see increasing adoption across the globe, it is imperative for LLMs to be representative of the linguistic diversity of the world. India is a linguistically diverse country of 1.4 Billion people. To facilitate research on multilingual LLM evaluation, we release IndicGenBench - the largest benchmark for evaluating LLMs on user-facing generation tasks across a diverse…

    Submitted 7 August, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: ACL 2024

  37. arXiv:2404.10136  [pdf, other]

    cs.CL cs.AI cs.LG

    Language Model Cascades: Token-level uncertainty and beyond

    Authors: Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: Recent advances in language models (LMs) have led to significant improvements in quality on complex NLP tasks, but at the expense of increased inference costs. Cascading offers a simple strategy to achieve more favorable cost-quality tradeoffs: here, a small model is invoked for most "easy" instances, while a few "hard" instances are deferred to the large model. While the principles underpinning c…

    Submitted 15 April, 2024; originally announced April 2024.

  38. arXiv:2404.05872  [pdf, other]

    cs.CV cs.LG cs.NE

    TabConv: Low-Computation CNN Inference via Table Lookups

    Authors: Neelesh Gupta, Narayanan Kannan, Pengmiao Zhang, Viktor Prasanna

    Abstract: Convolutional Neural Networks (CNNs) have demonstrated remarkable ability throughout the field of computer vision. However, CNN inference requires a large number of arithmetic operations, making them expensive to deploy in hardware. Current approaches alleviate this issue by developing hardware-supported, algorithmic processes to simplify spatial convolution functions. However, these methods still…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 8 pages, Accepted at CF '24

    ACM Class: I.5.1

  39. arXiv:2404.00665  [pdf, ps, other]

    cs.IT

    On cumulative and relative cumulative past information generating function

    Authors: Santosh Kumar Chaudhary, Nitin Gupta, Achintya Roy

    Abstract: In this paper, we introduce the cumulative past information generating function (CPIG) and relative cumulative past information generating function (RCPIG). We study its properties. We establish its relation with generalized cumulative past entropy (GCPE). We defined CPIG stochastic order and its relation with dispersive order. We provide the results for the CPIG measure of the convoluted random v…

    Submitted 22 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

  40. arXiv:2403.20327  [pdf, other]

    cs.CL cs.AI

    Gecko: Versatile Text Embeddings Distilled from Large Language Models

    Authors: Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Blair Chen, Daniel Cer, Jeremy R. Cole, Kai Hui, Michael Boratko, Rajvi Kapadia, Wen Ding, Yi Luan, Sai Meher Karthik Duddu, Gustavo Hernandez Abrego, Weiqiang Shi, Nithi Gupta, Aditya Kusupati, Prateek Jain, Siddhartha Reddy Jonnalagadda, Ming-Wei Chang, Iftekhar Naim

    Abstract: We present Gecko, a compact and versatile text embedding model. Gecko achieves strong retrieval performance by leveraging a key idea: distilling knowledge from large language models (LLMs) into a retriever. Our two-step distillation process begins with generating diverse, synthetic paired data using an LLM. Next, we further refine the data quality by retrieving a set of candidate passages for each…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 18 pages

  41. arXiv:2403.14235  [pdf, other]

    astro-ph.GA astro-ph.CO astro-ph.IM cs.CV cs.LG

    RG-CAT: Detection Pipeline and Catalogue of Radio Galaxies in the EMU Pilot Survey

    Authors: Nikhel Gupta, Ray P. Norris, Zeeshan Hayder, Minh Huynh, Lars Petersson, X. Rosalind Wang, Andrew M. Hopkins, Heinz Andernach, Yjan Gordon, Simone Riggi, Miranda Yew, Evan J. Crawford, Bärbel Koribalski, Miroslav D. Filipović, Anna D. Kapińska, Stanislav Shabala, Tessa Vernstrom, Joshua R. Marvil

    Abstract: We present source detection and catalogue construction pipelines to build the first catalogue of radio galaxies from the 270 $\rm deg^2$ pilot survey of the Evolutionary Map of the Universe (EMU-PS) conducted with the Australian Square Kilometre Array Pathfinder (ASKAP) telescope. The detection pipeline uses Gal-DINO computer-vision networks (Gupta et al., 2024) to predict the categories of radio…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted for publication in PASA. The paper has 22 pages, 12 figures and 5 tables

  42. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1112 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 16 December, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  43. PaCKD: Pattern-Clustered Knowledge Distillation for Compressing Memory Access Prediction Models

    Authors: Neelesh Gupta, Pengmiao Zhang, Rajgopal Kannan, Viktor Prasanna

    Abstract: Deep neural networks (DNNs) have proven to be effective models for accurate Memory Access Prediction (MAP), a critical task in mitigating memory latency through data prefetching. However, existing DNN-based MAP models suffer from the challenges such as significant physical storage space and poor inference latency, primarily due to their large number of parameters. These limitations render them imp…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 6 pages, 2 figures, HPEC '23

    Journal ref: 2023 IEEE High Performance Extreme Computing Conference (HPEC), 2023, pp. 1-7

  44. arXiv:2402.12780  [pdf, other]

    cs.LG

    Byzantine-Robust Federated Learning: Impact of Client Subsampling and Local Updates

    Authors: Youssef Allouah, Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Rafael Pinot, Geovani Rizk, Sasha Voitovych

    Abstract: The possibility of adversarial (a.k.a., {\em Byzantine}) clients makes federated learning (FL) prone to arbitrary manipulation. The natural approach to robustify FL against adversarial clients is to replace the simple averaging operation at the server in the standard $\mathsf{FedAvg}$ algorithm by a \emph{robust averaging rule}. While a significant amount of work has been devoted to studying the c…

    Submitted 10 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  45. arXiv:2402.07411  [pdf, other]

    cs.LG

    Potential-Based Reward Shaping For Intrinsic Motivation

    Authors: Grant C. Forbes, Nitish Gupta, Leonardo Villalobos-Arias, Colin M. Potts, Arnav Jhala, David L. Roberts

    Abstract: Recently there has been a proliferation of intrinsic motivation (IM) reward-shaping methods to learn in complex and sparse-reward environments. These methods can often inadvertently change the set of optimal policies in an environment, leading to suboptimal behavior. Previous work on mitigating the risks of reward shaping, particularly through potential-based reward shaping (PBRS), has not been ap…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: Extended version of paper appearing in AAMAS 2024

    ACM Class: I.2.6

  46. arXiv:2402.00045  [pdf, ps, other]

    cs.MM cs.AI cs.LG

    Detecting Multimedia Generated by Large AI Models: A Survey

    Authors: Li Lin, Neeraj Gupta, Yue Zhang, Hainan Ren, Chun-Hao Liu, Feng Ding, Xin Wang, Xin Li, Luisa Verdoliva, Shu Hu

    Abstract: The rapid advancement of Large AI Models (LAIMs), particularly diffusion models and large language models, has marked a new era where AI-generated multimedia is increasingly integrated into various aspects of daily life. Although beneficial in numerous fields, this content presents significant risks, including potential misuse, societal disruptions, and ethical concerns. Consequently, detecting mu…

    Submitted 1 June, 2025; v1 submitted 22 January, 2024; originally announced February 2024.

  47. arXiv:2401.06362  [pdf, other]

    cs.NE cs.AR cs.LG cs.OS

    Attention, Distillation, and Tabularization: Towards Practical Neural Network-Based Prefetching

    Authors: Pengmiao Zhang, Neelesh Gupta, Rajgopal Kannan, Viktor K. Prasanna

    Abstract: Attention-based Neural Networks (NN) have demonstrated their effectiveness in accurate memory access prediction, an essential step in data prefetching. However, the substantial computational overheads associated with these models result in high inference latency, limiting their feasibility as practical prefetchers. To close the gap, we propose a new approach based on tabularization that significan…

    Submitted 21 February, 2024; v1 submitted 23 December, 2023; originally announced January 2024.

  48. arXiv:2401.02412  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    LLM Augmented LLMs: Expanding Capabilities through Composition

    Authors: Rachit Bansal, Bidisha Samanta, Siddharth Dalmia, Nitish Gupta, Shikhar Vashishth, Sriram Ganapathy, Abhishek Bapna, Prateek Jain, Partha Talukdar

    Abstract: Foundational models with billions of parameters which have been trained on large corpora of data have demonstrated non-trivial skills in a variety of domains. However, due to their monolithic structure, it is challenging and expensive to augment them or impart new skills. On the other hand, due to their adaptation abilities, several new instances of these models are being trained towards new domai…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: 17 pages, 2 figures, 8 tables

  49. arXiv:2312.07343  [pdf, ps, other]

    cs.HC cs.AI

    Can ChatGPT Play the Role of a Teaching Assistant in an Introductory Programming Course?

    Authors: Anishka, Atharva Mehta, Nipun Gupta, Aarav Balachandran, Dhruv Kumar, Pankaj Jalote

    Abstract: The emergence of Large language models (LLMs) is expected to have a major impact on education. This paper explores the potential of using ChatGPT, an LLM, as a virtual Teaching Assistant (TA) in an Introductory Programming Course. We evaluate ChatGPT's capabilities by comparing its performance with that of human TAs in some of the important TA functions. The TA functions which we focus on include…

    Submitted 22 January, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Under review

  50. arXiv:2312.06728  [pdf, other]

    cs.CV astro-ph.CO astro-ph.GA astro-ph.IM

    A Multimodal Dataset and Benchmark for Radio Galaxy and Infrared Host Detection

    Authors: Nikhel Gupta, Zeeshan Hayder, Ray P. Norris, Minh Huynh, Lars Petersson

    Abstract: We present a novel multimodal dataset developed by expert astronomers to automate the detection and localisation of multi-component extended radio galaxies and their corresponding infrared hosts. The dataset comprises 4,155 instances of galaxies in 2,800 images with both radio and infrared modalities. Each instance contains information on the extended radio galaxy class, its corresponding bounding…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted at the NeurIPS 2023 ML4PS workshop (https://nips.cc/). The full version, accepted in PASA, is available at https://doi.org/10.1017/pasa.2023.64