Showing 1–15 of 15 results for author: Gulzar, M A

  1. arXiv:2506.19287

    cs.SE

    Generating and Understanding Tests via Path-Aware Symbolic Execution with LLMs

    Authors: Yaoxuan Wu, Xiaojie Zhou, Ahmad Humayun, Muhammad Ali Gulzar, Miryung Kim

    Abstract: Symbolic execution is a widely used technique for test generation, offering systematic exploration of program paths through constraint solving. However, it is fundamentally constrained by the capability to model the target code including library functions in terms of symbolic constraint and the capability of underlying constraint solvers. As a result, many paths involving complex features remain u…

    Submitted 23 June, 2025; originally announced June 2025.

  2. arXiv:2504.04372

    cs.SE cs.AI cs.LG

    How Accurately Do Large Language Models Understand Code?

    Authors: Sabaat Haroon, Ahmad Faraz Khan, Ahmad Humayun, Waris Gill, Abdul Haddi Amjad, Ali R. Butt, Mohammad Taha Khan, Muhammad Ali Gulzar

    Abstract: Large Language Models (LLMs) are increasingly used in post-development tasks such as code repair and testing. A key factor in these tasks' success is the model's deep understanding of code. However, the extent to which LLMs truly understand code remains largely unevaluated. Quantifying code comprehension is challenging due to its abstract nature and the lack of a standardized metric. Previously, t…

    Submitted 9 April, 2025; v1 submitted 6 April, 2025; originally announced April 2025.

    Comments: This paper is currently under review. It consists of 11 pages, 12 figures, and 5 tables

  3. arXiv:2504.02268

    cs.LG cs.CL

    Advancing Semantic Caching for LLMs with Domain-Specific Embeddings and Synthetic Data

    Authors: Waris Gill, Justin Cechmanek, Tyler Hutcherson, Srijith Rajamohan, Jen Agarwal, Muhammad Ali Gulzar, Manvinder Singh, Benoit Dion

    Abstract: This report investigates enhancing semantic caching effectiveness by employing specialized, fine-tuned embedding models. Semantic caching relies on embedding similarity rather than exact key matching, presenting unique challenges in balancing precision, query latency, and computational efficiency. We propose leveraging smaller, domain-specific embedding models, fine-tuned with targeted real-world…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: Initial study on embedding fine-tuning for semantic caching. It also explores synthetic data. Total pages are 12, including references

  4. arXiv:2502.04184

    cs.SE

    Are the Majority of Public Computational Notebooks Pathologically Non-Executable?

    Authors: Tien Nguyen, Waris Gill, Muhammad Ali Gulzar

    Abstract: Computational notebooks are the de facto platforms for exploratory data science, offering an interactive programming environment where users can create, modify, and execute code cells in any sequence. However, this flexibility often introduces code quality issues, with prior studies showing that approximately 76% of public notebooks are non-executable, raising significant concerns about reusabilit…

    Submitted 6 February, 2025; v1 submitted 6 February, 2025; originally announced February 2025.

    Comments: 12 pages, 10 figures, 3 tables, the 22nd International Conference on Mining Software Repositories (MSR 2025)

  5. arXiv:2409.18590

    cs.SE

    Accessibility Issues in Ad-Driven Web Applications

    Authors: Abdul Haddi Amjad, Muhammad Danish, Bless Jah, Muhammad Ali Gulzar

    Abstract: Website accessibility is essential for inclusiveness and regulatory compliance. Although third-party advertisements (ads) are a vital revenue source for free web services, they introduce significant accessibility challenges. Leasing a website's space to ad-serving technologies like DoubleClick results in developers losing control over ad content accessibility. Even on highly accessible websites, th…

    Submitted 27 September, 2024; originally announced September 2024.

  6. arXiv:2405.18385

    cs.CR

    Blocking Tracking JavaScript at the Function Granularity

    Authors: Abdul Haddi Amjad, Shaoor Munir, Zubair Shafiq, Muhammad Ali Gulzar

    Abstract: Modern websites extensively rely on JavaScript to implement both functionality and tracking. Existing privacy enhancing content blocking tools struggle against mixed scripts, which simultaneously implement both functionality and tracking, because blocking the script would break functionality and not blocking it would allow tracking. We propose Not.js, a fine grained JavaScript blocking tool that o…

    Submitted 28 May, 2024; originally announced May 2024.

  7. arXiv:2404.18881

    cs.HC cs.LG cs.SE

    Human-in-the-Loop Synthetic Text Data Inspection with Provenance Tracking

    Authors: Hong Jin Kang, Fabrice Harel-Canada, Muhammad Ali Gulzar, Violet Peng, Miryung Kim

    Abstract: Data augmentation techniques apply transformations to existing texts to generate additional data. The transformations may produce low-quality texts, where the meaning of the text is changed and the text may even be mangled beyond human comprehension. Analyzing the synthetically generated texts and their corresponding labels is slow and demanding. To winnow out texts with incorrect labels, we devel…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: NAACL 2024 Findings

  8. arXiv:2403.02694

    cs.LG cs.AI cs.CL cs.CR cs.DC

    MeanCache: User-Centric Semantic Caching for LLM Web Services

    Authors: Waris Gill, Mohamed Elidrisi, Pallavi Kalapatapu, Ammar Ahmed, Ali Anwar, Muhammad Ali Gulzar

    Abstract: Large Language Models (LLMs) like ChatGPT and Llama have revolutionized natural language processing and search engine dynamics. However, these models incur exceptionally high computational costs. For instance, GPT-3 consists of 175 billion parameters, where inference demands billions of floating-point operations. Caching is a natural solution to reduce LLM inference costs on repeated queries, whic…

    Submitted 7 March, 2025; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted at 2025 IEEE 39th International Parallel and Distributed Processing Symposium (IPDPS)

    ACM Class: I.2.7

  9. arXiv:2312.13632

    cs.LG cs.AI cs.CV cs.DC cs.SE

    TraceFL: Interpretability-Driven Debugging in Federated Learning via Neuron Provenance

    Authors: Waris Gill, Ali Anwar, Muhammad Ali Gulzar

    Abstract: In Federated Learning, clients train models on local data and send updates to a central server, which aggregates them into a global model using a fusion algorithm. This collaborative yet privacy-preserving training comes at a cost. FL developers face significant challenges in attributing global model predictions to specific clients. Localizing responsible clients is a crucial step towards (a) excl…

    Submitted 17 January, 2025; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted at 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE)

  10. arXiv:2307.08672

    cs.CR cs.AI cs.CV cs.LG

    FedDefender: Backdoor Attack Defense in Federated Learning

    Authors: Waris Gill, Ali Anwar, Muhammad Ali Gulzar

    Abstract: Federated Learning (FL) is a privacy-preserving distributed machine learning technique that enables individual clients (e.g., user participants, edge devices, or organizations) to train a model on their local data in a secure environment and then share the trained model with an aggregator to build a global model collaboratively. In this work, we propose FedDefender, a defense mechanism against tar…

    Submitted 22 February, 2024; v1 submitted 1 July, 2023; originally announced July 2023.

    Comments: Published in SE4SafeML 2023 (co-located with FSE 2023). See https://dl.acm.org/doi/abs/10.1145/3617574.3617858

  11. arXiv:2302.01182

    cs.CR cs.SE

    Blocking JavaScript without Breaking the Web: An Empirical Investigation

    Authors: Abdul Haddi Amjad, Zubair Shafiq, Muhammad Ali Gulzar

    Abstract: Modern websites heavily rely on JavaScript (JS) to implement legitimate functionality as well as privacy-invasive advertising and tracking. Browser extensions such as NoScript block any script not loaded by a trusted list of endpoints, thus hoping to block privacy-invasive scripts while avoiding breaking legitimate website functionality. In this paper, we investigate whether blocking JS on the web…

    Submitted 23 March, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Journal ref: petsymposium 2023

  12. arXiv:2301.03553

    cs.SE cs.CV cs.DC cs.LG

    FedDebug: Systematic Debugging for Federated Learning Applications

    Authors: Waris Gill, Ali Anwar, Muhammad Ali Gulzar

    Abstract: In Federated Learning (FL), clients independently train local models and share them with a central aggregator to build a global model. Impermissibility to access clients' data and collaborative training make FL appealing for applications with data-privacy concerns, such as medical imaging. However, these FL characteristics pose unprecedented challenges for debugging. When a global model's performa…

    Submitted 22 February, 2024; v1 submitted 9 January, 2023; originally announced January 2023.

    Comments: Published at ICSE 2023. Link https://ieeexplore.ieee.org/document/10172839

    Journal ref: In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) (pp. 456-789). IEEE (2023)

  13. arXiv:2205.05137

    cs.CL cs.LG

    Sibylvariant Transformations for Robust Text Classification

    Authors: Fabrice Harel-Canada, Muhammad Ali Gulzar, Nanyun Peng, Miryung Kim

    Abstract: The vast majority of text transformation techniques in NLP are inherently limited in their ability to expand input space coverage due to an implicit constraint to preserve the original class label. In this work, we propose the notion of sibylvariance (SIB) to describe the broader set of transforms that relax the label-preserving constraint, knowably vary the expected class, and lead to significant…

    Submitted 10 May, 2022; originally announced May 2022.

    Comments: 9 pages, Findings of ACL 2022

  14. arXiv:2108.13923

    cs.NI

    TrackerSift: Untangling Mixed Tracking and Functional Web Resources

    Authors: Abdul Haddi Amjad, Danial Saleem, Fareed Zaffar, Muhammad Ali Gulzar, Zubair Shafiq

    Abstract: Trackers have recently started to mix tracking and functional resources to circumvent privacy-enhancing content blocking tools. Such mixed web resources put content blockers in a bind: risk breaking legitimate functionality if they act and risk missing privacy-invasive advertising and tracking if they do not. In this paper, we propose TrackerSift to progressively classify and untangle mixed web re…

    Submitted 29 September, 2021; v1 submitted 28 August, 2021; originally announced August 2021.

  15. arXiv:2103.05118

    cs.SE

    Efficient Fuzz Testing for Apache Spark Using Framework Abstraction

    Authors: Qian Zhang, Jiyuan Wang, Muhammad Ali Gulzar, Rohan Padhye, Miryung Kim

    Abstract: The emerging data-intensive applications are increasingly dependent on data-intensive scalable computing (DISC) systems, such as Apache Spark, to process large data. Despite their popularity, DISC applications are hard to test. In recent years, fuzz testing has been remarkably successful; however, it is nontrivial to apply such traditional fuzzing to big data analytics directly because: (1) the lo…

    Submitted 8 March, 2021; originally announced March 2021.