Search | arXiv e-print repository

A Partial Replication of MaskFormer in TensorFlow on TPUs for the TensorFlow Model Garden

Authors: Vishal Purohit, Wenxin Jiang, Akshath R. Ravikiran, James C. Davis

Abstract: This paper undertakes the task of replicating the MaskFormer model, a universal image segmentation model originally developed using the PyTorch framework, within the TensorFlow ecosystem, specifically optimized for execution on Tensor Processing Units (TPUs). Our implementation exploits the modular constructs available within the TensorFlow Model Garden (TFMG), encompassing elements such as the data loader, training orchestrator, and various architectural components, tailored and adapted to meet the specifications of the MaskFormer model. We address key challenges encountered during the replication: non-convergence issues, slow training, adaptation of loss functions, and the integration of TPU-specific functionalities. We verify our reproduced implementation and present qualitative results on the COCO dataset. Although our implementation meets some of the objectives for end-to-end reproducibility, we encountered challenges in replicating the PyTorch version of MaskFormer in TensorFlow. This replication process is not straightforward and requires substantial engineering effort. Specifically, it necessitates the customization of various components within the TFMG, alongside thorough verification and hyperparameter tuning. The replication is available at: https://github.com/PurdueDualityLab/tf-maskformer/tree/main/official/projects/maskformer

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.16688 [pdf, other]

Reusing Deep Learning Models: Challenges and Directions in Software Engineering

Authors: James C. Davis, Purvish Jajal, Wenxin Jiang, Taylor R. Schorlemmer, Nicholas Synovic, George K. Thiruvathukal

Abstract: Deep neural networks (DNNs) achieve state-of-the-art performance in many areas, including computer vision, system configuration, and question-answering. However, DNNs are expensive to develop, both in intellectual effort (e.g., devising new architectures) and computational costs (e.g., training). Reusing DNNs is a promising direction to amortize costs within a company and across the computing industry. As with any new technology, however, there are many challenges in reusing DNNs. These challenges include both missing technical capabilities and missing engineering practices. This vision paper describes challenges in current approaches to DNN reuse. We summarize studies of reuse failures across the spectrum of reuse techniques, including conceptual (e.g., reusing based on a research paper), adaptation (e.g., reusing by building on an existing implementation), and deployment (e.g., direct reuse on a new device). We outline possible advances that would improve each kind of reuse.

Submitted 25 April, 2024; originally announced April 2024.

Comments: Proceedings of the IEEE John Vincent Atanasoff Symposium on Modern Computing (JVA'23) 2023

arXiv:2404.16632 [pdf]

Introducing Systems Thinking as a Framework for Teaching and Assessing Threat Modeling Competency

Authors: Siddhant S. Joshi, Preeti Mukherjee, Kirsten A. Davis, James C. Davis

Abstract: Computing systems face diverse and substantial cybersecurity threats. To mitigate these cybersecurity threats, software engineers need to be competent in the skill of threat modeling. In industry and academia, there are many frameworks for teaching threat modeling, but our analysis of these frameworks suggests that (1) these approaches tend to focus on component-level analysis rather than educating students to reason holistically about a system's cybersecurity, and (2) there is no rubric for assessing a student's threat modeling competency. To address these concerns, we propose using systems thinking in conjunction with popular and industry-standard threat modeling frameworks like STRIDE for teaching and assessing threat modeling competency. Prior studies suggest that a holistic approach, like systems thinking, can help in understanding and mitigating cybersecurity threats. Thus, we developed and piloted two novel rubrics: one for assessing STRIDE threat modeling performance and the other for assessing systems thinking performance while conducting STRIDE. To conduct this study, we piloted the two rubrics mentioned above to assess threat model artifacts of students enrolled in an upper-level software engineering course at Purdue University in Fall 2021, Spring 2023, and Fall 2023. Students who had both systems thinking and STRIDE instruction identified and attempted to mitigate component-level as well as systems-level threats.
Students with only STRIDE instruction tended to focus on identifying and mitigating component-level threats and discounted system-level threats. We contribute to engineering education by: (1) describing a new rubric for assessing threat modeling based on systems thinking; (2) identifying trends and blind spots in students' threat modeling approach; and (3) envisioning the benefits of integrating systems thinking in threat modeling teaching and assessment.

Submitted 25 April, 2024; originally announced April 2024.

Comments: Presented at the Annual Conference of the American Society for Engineering Education (ASEE'24) 2024

arXiv:2403.18679 [pdf]

An Exploratory Study on Upper-Level Computing Students' Use of Large Language Models as Tools in a Semester-Long Project

Authors: Ben Arie Tanay, Lexy Arinze, Siddhant S. Joshi, Kirsten A. Davis, James C. Davis

Abstract: Background: Large Language Models (LLMs) such as ChatGPT and Copilot are influencing software engineering practice. Software engineering educators must teach future software engineers how to use such tools well. As of yet, there have been few studies that report on the use of LLMs in the classroom. It is, therefore, important to evaluate students' perception of LLMs and possible ways of adapting the computing curriculum to these shifting paradigms. Purpose: The purpose of this study is to explore computing students' experiences and approaches to using LLMs during a semester-long software engineering project. Design/Method: We collected data from a senior-level software engineering course at Purdue University. This course uses a project-based learning (PBL) design. The students used LLMs such as ChatGPT and Copilot in their projects. A sample of these student teams was interviewed to understand (1) how they used LLMs in their projects; and (2) whether and how their perspectives on LLMs changed over the course of the semester. We analyzed the data to identify themes related to students' usage patterns and learning outcomes. Results/Discussion: When computing students utilize LLMs within a project, their use cases cover both technical and professional applications. In addition, these students perceive LLMs to be efficient tools for obtaining information and completing tasks. However, there were concerns about how to use LLMs responsibly without harming their own learning outcomes.
Based on our findings, we recommend that future research investigate the usage of LLMs in lower-level computer engineering courses to understand whether and how LLMs can be integrated as a learning aid without hurting the learning outcomes.

Submitted 16 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted to the 2024 General Conference of the American Society for Engineering Education (ASEE)

arXiv:2402.12252 [pdf, other]

An Interview Study on Third-Party Cyber Threat Hunting Processes in the U.S. Department of Homeland Security

Authors: William P. Maxam III, James C. Davis

Abstract: Cybersecurity is a major challenge for large organizations. Traditional cybersecurity defense is reactive. Cybersecurity operations centers keep out adversaries, and incident response teams clean up after break-ins. Recently a proactive stage has been introduced: Cyber Threat Hunting (TH) looks for potential compromises missed by other cyber defenses. TH is mandated for federal executive agencies and government contractors. As threat hunting is a new cybersecurity discipline, most TH teams operate without a defined process. The practices and challenges of TH have not yet been documented. To address this gap, this paper describes the first interview study of threat hunt practitioners. We obtained access to, and conducted hour-long interviews with, 11 threat hunters associated with the U.S. government's Department of Homeland Security. We analyzed the transcripts with process and thematic coding. We describe the diversity among their processes, show that their processes differ from the TH processes reported in the literature, and unify our subjects' descriptions into a single TH process. We enumerate common TH challenges and solutions according to the subjects. The two most common challenges were difficulty in assessing a threat hunter's expertise and difficulty in developing and maintaining automation.
We conclude with recommendations for TH teams (improve planning, focus on automation, and apprentice new members) and highlight directions for future work (finding a TH process that balances flexibility and formalism, and identifying assessments for TH team performance).

Submitted 19 February, 2024; originally announced February 2024.

Comments: Technical report accompanying a paper at USENIX Security 2024

arXiv:2402.00699 [pdf, other]

PeaTMOSS: A Dataset and Initial Analysis of Pre-Trained Models in Open-Source Software

Authors: Wenxin Jiang, Jerin Yasmin, Jason Jones, Nicholas Synovic, Jiashen Kuo, Nathaniel Bielanski, Yuan Tian, George K. Thiruvathukal, James C. Davis

Abstract: The development and training of deep learning models have become increasingly costly and complex. Consequently, software engineers are adopting pre-trained models (PTMs) for their downstream applications. The dynamics of the PTM supply chain remain largely unexplored, signaling a clear need for structured datasets that document not only the metadata but also the subsequent applications of these models. Without such data, the MSR community cannot comprehensively understand the impact of PTM adoption and reuse. This paper presents the PeaTMOSS dataset, which comprises metadata for 281,638 PTMs and detailed snapshots for all PTMs with over 50 monthly downloads (14,296 PTMs), along with 28,575 open-source software repositories from GitHub that utilize these models. Additionally, the dataset includes 44,337 mappings from 15,129 downstream GitHub repositories to the 2,530 PTMs they use. To enhance the dataset's comprehensiveness, we developed prompts for a large language model to automatically extract model metadata, including the model's training datasets, parameters, and evaluation metrics. Our analysis of this dataset provides the first summary statistics for the PTM supply chain, showing the trend of PTM development and common shortcomings of PTM package documentation. Our example application reveals inconsistencies in software licenses across PTMs and their dependent projects. PeaTMOSS lays the foundation for future research, offering rich opportunities to investigate the PTM supply chain.
We outline mining opportunities on PTMs, their downstream usage, and cross-cutting questions.

Submitted 1 February, 2024; originally announced February 2024.

Comments: Accepted at MSR'24

arXiv:2401.14635 [pdf, other]

Signing in Four Public Software Package Registries: Quantity, Quality, and Influencing Factors

Authors: Taylor R Schorlemmer, Kelechi G Kalu, Luke Chigges, Kyung Myung Ko, Eman Abu Isghair, Saurabh Bagchi, Santiago Torres-Arias, James C Davis

Abstract: Many software applications incorporate open-source third-party packages distributed by public package registries. Guaranteeing authorship along this supply chain is a challenge. Package maintainers can guarantee package authorship through software signing. However, it is unclear how common this practice is, and whether the resulting signatures are created properly. Prior work has provided raw data on registry signing practices, but only measured single platforms, did not consider quality, did not consider time, and did not assess factors that may influence signing. We lack up-to-date measurements of signing practices, and we do not know the quality of existing signatures. Furthermore, we lack a comprehensive understanding of factors that influence signing adoption. This study addresses this gap. We provide measurements across three kinds of package registries: traditional software (Maven, PyPI), container images (DockerHub), and machine learning models (Hugging Face). For each registry, we describe the nature of the signed artifacts as well as the current quantity and quality of signatures. Then, we examine longitudinal trends in signing practices. Finally, we use a quasi-experiment to estimate the effect that various factors had on software signing practices.
To summarize our findings: (1) mandating signature adoption improves the quantity of signatures; (2) providing dedicated tooling improves the quality of signing; (3) getting started is the hard part: once a maintainer begins to sign, they tend to continue doing so; and (4) although many supply chain attacks are mitigable via signing, signing adoption is primarily affected by registry policy rather than by public knowledge of attacks, new engineering standards, etc. These findings highlight the importance of software package registry managers and signing infrastructure.

Submitted 14 April, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: Accepted at IEEE Security & Privacy 2024 (S&P'24)

arXiv:2401.14629 [pdf, ps, other]

A First Look at the General Data Protection Regulation (GDPR) in Open-Source Software

Authors: Lucas Franke, Huayu Liang, Aaron Brantly, James C Davis, Chris Brown

Abstract: This poster describes work on the General Data Protection Regulation (GDPR) in open-source software. Although open-source software is commonly integrated into regulated software, and thus must be engineered or adapted for compliance, we do not know how such laws impact open-source software development. We surveyed open-source developers (N=47) to understand their experiences and perceptions of GDPR. We learned of many engineering challenges, primarily regarding the management of users' data and assessments of compliance. We call for improved policy-related resources, especially tools to support data privacy regulation implementation and compliance in open-source software.

Submitted 25 January, 2024; originally announced January 2024.

Comments: 2-page extended abstract for ICSE-Poster 2024

arXiv:2310.14117 [pdf, other]

ZTD$_{JAVA}$: Mitigating Software Supply Chain Vulnerabilities via Zero-Trust Dependencies

Authors: Paschal C. Amusuo, Kyle A. Robinson, Tanmay Singla, Huiyun Peng, Aravind Machiry, Santiago Torres-Arias, Laurent Simon, James C. Davis

Abstract: Third-party software components like Log4J accelerate software application development but introduce substantial risk. These components have led to many software supply chain attacks. These attacks succeed because third-party software components are implicitly trusted in an application. Although several security defenses exist to reduce the risks from third-party software components, none of them fulfills the full set of requirements needed to defend against common attacks. No individual solution prevents malicious access to operating system resources, is dependency-aware, and enables the discovery of least privileges, all with low runtime costs. Consequently, existing defenses cannot prevent software supply chain attacks. This paper proposes applying the NIST Zero Trust Architecture (ZTA) to software applications. Our Zero Trust Dependencies concept applies the NIST ZTA principles to an application's dependencies. First, we assess the expected effectiveness and feasibility of Zero Trust Dependencies using a study of third-party software components and their vulnerabilities. Then, we present a system design, ZTD$_{SYS}$, that enables the application of Zero Trust Dependencies to software applications and a prototype, ZTD$_{JAVA}$, for Java applications. Finally, with evaluations on recreated vulnerabilities and realistic applications, we show that ZTD$_{JAVA}$ can defend against prevalent vulnerability classes, introduces negligible cost, and is easy to configure and use.

Submitted 25 April, 2024; v1 submitted 21 October, 2023; originally announced October 2023.

Comments: 15 pages, 5 figures, 5 tables

ACM Class: K.6.5; D.4.6

arXiv:2310.03620 [pdf, other]

PeaTMOSS: Mining Pre-Trained Models in Open-Source Software

Authors: Wenxin Jiang, Jason Jones, Jerin Yasmin, Nicholas Synovic, Rajeev Sashti, Sophie Chen, George K. Thiruvathukal, Yuan Tian, James C. Davis

Abstract: Developing and training deep learning models is expensive, so software engineers have begun to reuse pre-trained deep learning models (PTMs) and fine-tune them for downstream tasks. Despite the widespread use of PTMs, we know little about the corresponding software engineering behaviors and challenges. To enable the study of software engineering with PTMs, we present the PeaTMOSS dataset: Pre-Trained Models in Open-Source Software. PeaTMOSS has three parts: a snapshot of (1) 281,638 PTMs, (2) 27,270 open-source software repositories that use PTMs, and (3) a mapping between PTMs and the projects that use them. We challenge PeaTMOSS miners to discover software engineering practices around PTMs. A demo and link to the full dataset are available at: https://github.com/PurdueDualityLab/PeaTMOSS-Demos.

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2310.01653 [pdf]

A Unified Taxonomy and Evaluation of IoT Security Guidelines

Authors: Jesse Chen, Dharun Anandayuvaraj, James C Davis, Sazzadur Rahaman

Abstract: Cybersecurity concerns about Internet of Things (IoT) devices and infrastructure are growing each year. In response, organizations worldwide have published IoT cybersecurity guidelines to protect their citizens and customers. These guidelines constrain the development of IoT systems, which include substantial software components both on-device and in the Cloud. While these guidelines are being widely adopted, e.g., by US federal contractors, their content and merits have not been critically examined. Two notable gaps are: (1) We do not know how these guidelines differ by the topics and details of their recommendations; and (2) We do not know how effective they are at mitigating real-world IoT failures. In this paper, we address these questions through an exploratory sequential mixed-method study of IoT cybersecurity guidelines. We collected a corpus of 142 general IoT cybersecurity guidelines, sampling them for recommendations until saturation was reached. From the resulting 958 unique recommendations, we iteratively developed a hierarchical taxonomy following grounded theory coding principles. We measured the guidelines' usefulness by asking novice engineers about the actionability of each recommendation, and by matching cybersecurity recommendations to the root causes of failures (CVEs and news stories).
We report that: (1) Compared to one another, each guideline has gaps in its topic coverage and comprehensiveness; and (2) Although 87.2% of recommendations are actionable and the union of the guidelines mitigates all 17 of the failures from news stories, 21% of the CVEs apparently evade the guidelines. In summary, we report shortcomings in every guideline's depth and breadth, but as a whole they are capable of preventing security issues. Our results will help software engineers determine which and how many guidelines to study as they implement IoT systems.

Submitted 3 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

arXiv:2310.01642 [pdf, other]

Naming Practices of Pre-Trained Models in Hugging Face

Authors: Wenxin Jiang, Chingwo Cheung, Mingyu Kim, Heesoo Kim, George K. Thiruvathukal, James C. Davis

Abstract: As innovation in deep learning continues, many engineers seek to adopt Pre-Trained Models (PTMs) as components in computer systems. Researchers publish PTMs, which engineers adapt for quality or performance prior to deployment. PTM authors should choose appropriate names for their PTMs, which would facilitate model discovery and reuse. However, prior research has reported that model names are not always well chosen, and are sometimes erroneous. The naming of PTM packages has not been systematically studied. In this paper, we frame and conduct the first empirical investigation of PTM naming practices in the Hugging Face PTM registry. We began our study with a survey of 108 Hugging Face users to understand the practices in PTM naming. From our survey analysis, we highlight discrepancies from traditional software package naming, and present findings on naming practices. Our findings indicate a substantial mismatch between engineers' preferences and actual PTM naming practices. We also present practices for detecting naming anomalies and introduce a novel automated DNN ARchitecture Assessment technique (DARA), capable of detecting PTM naming anomalies. We envision future work on leveraging meta-features of PTMs to improve model reuse and trustworthiness.

Submitted 28 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: 21 pages

arXiv:2310.00205 [pdf, other]

An Empirical Study on the Use of Static Analysis Tools in Open Source Embedded Software

Authors: Mingjie Shen, Akul Pillai, Brian A. Yuan, James C. Davis, Aravind Machiry

Abstract: This paper presents the first study of the prevalence, challenges, and effectiveness of using Static Application Security Testing (SAST) tools on Open-Source Embedded Software (EMBOSS) repositories. We collect a corpus of 258 of the most popular EMBOSS projects, representing 13 distinct categories such as real-time operating systems, network stacks, and applications. To understand the current use of SAST tools on EMBOSS, we measured this corpus and surveyed developers. To understand the challenges and effectiveness of using SAST tools on EMBOSS projects, we applied these tools to the projects in our corpus. We report that almost none of these projects (just 3%) use SAST tools beyond those baked into the compiler, and developers give rationales such as ineffectiveness and false positives. In applying SAST tools ourselves, we show that minimal engineering effort and project expertise are needed to apply many tools to a given EMBOSS project. GitHub's CodeQL was the most effective SAST tool: using its built-in security checks, we found a total of 540 defects (with a false positive rate of 23%) across the 258 projects, with 399 (74%) likely security vulnerabilities, including in projects maintained by Microsoft, Amazon, and the Apache Foundation. EMBOSS engineers have confirmed 273 (51%) of these defects, mainly by accepting our pull requests. Two CVEs were issued. In summary, we urge EMBOSS engineers to adopt the current generation of SAST tools, which offer low false positive rates and are effective at finding security-relevant defects.

Submitted 29 September, 2023; originally announced October 2023.

arXiv:2308.12387 [pdf, other]

Reflecting on the Use of the Policy-Process-Product Theory in Empirical Software Engineering

Authors: Kelechi G. Kalu, Taylor R. Schorlemmer, Sophie Chen, Kyle Robinson, Erik Kocinare, James C. Davis

Abstract: The primary theory of software engineering is that an organization's Policies and Processes influence the quality of its Products. We call this the PPP Theory. Although empirical software engineering research has grown common, it is unclear whether researchers are trying to evaluate the PPP Theory. To assess this, we analyzed half (33) of the empirical works published over the last two years in three prominent software engineering conferences. In this sample, 70% focus on policies/processes or products, not both. Only 33% provided measurements relating policy/process and products. We make four recommendations: (1) Use PPP Theory in study design; (2) Study feedback relationships; (3) Diversify the studied feedforward relationships; and (4) Disentangle policy and process. Let us remember that research results are in the context of, and with respect to, the relationship between software products, processes, and policies.

Submitted 23 August, 2023; originally announced August 2023.

Comments: 5 pages, published in the proceedings of the 2023 ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering in the Ideas-Visions-Reflections track (ESEC/FSE-IVR'23)

arXiv:2308.10965 [pdf, other]

Systematically Detecting Packet Validation Vulnerabilities in Embedded Network Stacks

Authors: Paschal C. Amusuo, Ricardo Andrés Calvo Méndez, Zhongwei Xu, Aravind Machiry, James C. Davis

Abstract: Embedded Network Stacks (ENS) enable low-resource devices to communicate with the outside world, facilitating the development of the Internet of Things and Cyber-Physical Systems. Some defects in ENS are thus high-severity cybersecurity vulnerabilities: they are remotely triggerable and can impact the physical world. While prior research has shed light on the characteristics of defects in many classes of software systems, no study has described the properties of ENS defects or identified a systematic technique to expose them. The most common automated approach to detecting ENS defects is feedback-driven randomized dynamic analysis ("fuzzing"), a costly and unpredictable technique. This paper provides the first systematic characterization of cybersecurity vulnerabilities in ENS. We analyzed 61 vulnerabilities across 6 open-source ENS. Most of these ENS defects are concentrated in the transport and network layers of the network stack, require reaching different states in the network protocol, and can be triggered by only 1-2 modifications to a single packet. We therefore propose a novel systematic testing framework that focuses on the transport and network layers, uses seeds that cover a network protocol's states, and systematically modifies packet fields. We evaluated this framework on 4 ENS and replicated 12 of the 14 reported IP/TCP/UDP vulnerabilities. On recent versions of these ENSs, it discovered 7 novel defects (6 assigned CVEs) during a bounded systematic test that covered all protocol states and made up to 3 modifications per packet.
We found defects in 3 of the 4 ENS we tested that had not been found by prior fuzzing research. Our results suggest that fuzzing should be deferred until after systematic testing is employed.

Submitted 21 August, 2023; originally announced August 2023.

Comments: 12 pages, 3 figures, to be published in the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023)

ACM Class: D.2.5

arXiv:2308.04898

An Empirical Study on Using Large Language Models to Analyze Software Supply Chain Security Failures

Authors: Tanmay Singla, Dharun Anandayuvaraj, Kelechi G. Kalu, Taylor R. Schorlemmer, James C. Davis

Abstract: As we increasingly depend on software systems, the consequences of breaches in the software supply chain become more severe. High-profile cyber attacks like those on SolarWinds and ShadowHammer have resulted in significant financial and data losses, underlining the need for stronger cybersecurity. One way to prevent future breaches is by studying past failures. However, traditional methods of analyzing these failures require manually reading and summarizing reports about them. Automated support could reduce costs and allow analysis of more failures. Natural Language Processing (NLP) techniques such as Large Language Models (LLMs) could be leveraged to assist the analysis of failures. In this study, we assessed the ability of LLMs to analyze historical software supply chain breaches. We used LLMs to replicate the manual analysis of 69 software supply chain security failures performed by members of the Cloud Native Computing Foundation (CNCF). We developed prompts for LLMs to categorize these failures along four dimensions: type of compromise, intent, nature, and impact. GPT-3.5's categorizations had an average accuracy of 68% and Bard's had an accuracy of 58% over these dimensions. We report that LLMs effectively characterize software supply chain failures when the source articles are detailed enough for consensus among manual analysts, but cannot yet replace human analysts. Future work can improve LLM performance in this context, and study a broader range of articles and failures.

Submitted 9 August, 2023; originally announced August 2023.

Comments: 22 pages, 9 figures

arXiv:2303.17708

Analysis of Failures and Risks in Deep Learning Model Converters: A Case Study in the ONNX Ecosystem

Authors: Purvish Jajal, Wenxin Jiang, Arav Tewari, Erik Kocinare, Joseph Woo, Anusha Sarraf, Yung-Hsiang Lu, George K. Thiruvathukal, James C. Davis

Abstract: Software engineers develop, fine-tune, and deploy deep learning (DL) models using a variety of development frameworks and runtime environments. DL model converters move models between frameworks and to runtime environments. Conversion errors compromise model quality and disrupt deployment. However, the failure characteristics of DL model converters are unknown, adding risk when using DL interoperability technologies. This paper analyzes failures in DL model converters. We survey software engineers about DL interoperability tools, use cases, and pain points (N=92). Then, we characterize failures in model converters associated with the main interoperability tool, ONNX (N=200 issues in PyTorch and TensorFlow). Finally, we formulate and test two hypotheses about structural causes for the failures we studied. We find that the node conversion stage of a model converter accounts for ~75% of the defects and 33% of reported failures are related to semantically incorrect models. The cause of semantically incorrect models is elusive, but models with behaviour inconsistencies share operator sequences. Our results motivate future research on making DL interoperability software simpler to maintain, extend, and validate. Research into behavioural tolerances and architectural coverage metrics could be fruitful.

Submitted 24 April, 2024; v1 submitted 30 March, 2023; originally announced March 2023.

arXiv:2303.08934

PTMTorrent: A Dataset for Mining Open-source Pre-trained Model Packages

Authors: Wenxin Jiang, Nicholas Synovic, Purvish Jajal, Taylor R. Schorlemmer, Arav Tewari, Bhavesh Pareek, George K. Thiruvathukal, James C. Davis

Abstract: Due to the cost of developing and training deep learning models from scratch, machine learning engineers have begun to reuse pre-trained models (PTMs) and fine-tune them for downstream tasks. PTM registries known as "model hubs" support engineers in distributing and reusing deep learning models. PTM packages include pre-trained weights, documentation, model architectures, datasets, and metadata. Mining the information in PTM packages will enable the discovery of engineering phenomena and tools to support software engineers. However, accessing this information is difficult: there are many PTM registries, and both the registries and the individual packages may rate-limit access to the data. We present an open-source dataset, PTMTorrent, to facilitate the evaluation and understanding of PTM packages. This paper describes the creation, structure, usage, and limitations of the dataset. The dataset includes a snapshot of 5 model hubs and a total of 15,913 PTM packages. These packages are represented in a uniform data schema for cross-hub mining. We describe prior uses of this data and suggest research opportunities for mining using our dataset. The PTMTorrent dataset (v1) is available at: https://app.globus.org/file-manager?origin_id=55e17a6e-9d8f-11ed-a2a2-8383522b48d9&origin_path=%2F~%2F. Our dataset generation tools are available on GitHub: https://doi.org/10.5281/zenodo.7570357.

Submitted 15 March, 2023; originally announced March 2023.

Comments: 5 pages, 2 figures, Accepted to MSR'23

arXiv:2303.07476

Challenges and Practices of Deep Learning Model Reengineering: A Case Study on Computer Vision

Authors: Wenxin Jiang, Vishnu Banna, Naveen Vivek, Abhinav Goel, Nicholas Synovic, George K. Thiruvathukal, James C. Davis

Abstract: Many engineering organizations are reimplementing and extending deep neural networks from the research community. We describe this process as deep learning model reengineering. Deep learning model reengineering (reusing, reproducing, adapting, and enhancing state-of-the-art deep learning approaches) is challenging for reasons including under-documented reference models, changing requirements, and the cost of implementation and testing. In addition, individual engineers may lack expertise in software engineering, yet teams must apply knowledge of software engineering and deep learning to succeed. Prior work has examined DL systems from a "product" view, examining defects from projects regardless of the engineers' purpose. Our study examines reengineering activities from a "process" view, focusing on engineers specifically engaged in the reengineering process. Our goal is to understand the characteristics and challenges of deep learning model reengineering. We conducted a case study of this phenomenon, focusing on the context of computer vision. Our results draw from two data sources: defects reported in open-source reengineering projects, and interviews conducted with open-source project contributors and the leaders of a reengineering team. Our results describe how deep learning-based computer vision techniques are reengineered, analyze the distribution of defects in this process, and discuss challenges and practices. Integrating our quantitative and qualitative data, we propose a novel reengineering workflow.
Our findings inform several future directions, including: measuring additional unknown aspects of model reengineering; standardizing engineering practices to facilitate reengineering; and developing tools to support model reengineering and model reuse.

Submitted 25 August, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: Under submission to EMSE

arXiv:2303.02555

Regexes are Hard: Decision-making, Difficulties, and Risks in Programming Regular Expressions

Authors: Louis G. Michael IV, James Donohue, James C. Davis, Dongyoon Lee, Francisco Servant

Abstract: Regular expressions (regexes) are a powerful mechanism for solving string-matching problems. They are supported by all modern programming languages, and have been estimated to appear in more than a third of Python and JavaScript projects. Yet existing studies have focused mostly on one aspect of regex programming: readability. We know little about how developers perceive and program regexes, or about the difficulties that they face. In this paper, we provide the first study of the regex development cycle, with a focus on (1) how developers make decisions throughout the process, (2) what difficulties they face, and (3) how aware they are of serious risks involved in programming regexes. We took a mixed-methods approach, surveying 279 professional developers from a diversity of backgrounds (including top tech firms) for a high-level perspective, and interviewing 17 developers to learn the details about the difficulties that they face and the solutions that they prefer. In brief, regexes are hard. Not only are they hard to read; our participants said that they are also hard to search for, hard to validate, and hard to document. They are also hard to master: the majority of our studied developers were unaware of critical security risks that can occur when using regexes, and those who knew of the risks did not deal with them effectively. Our findings provide multiple implications for future work, including semantic regex search engines for regex reuse and improved input generators for regex validation.

Submitted 4 March, 2023; originally announced March 2023.

Comments: Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE) 2019

arXiv:2303.02552

An Empirical Study of Pre-Trained Model Reuse in the Hugging Face Deep Learning Model Registry

Authors: Wenxin Jiang, Nicholas Synovic, Matt Hyatt, Taylor R. Schorlemmer, Rohan Sethi, Yung-Hsiang Lu, George K. Thiruvathukal, James C. Davis

Abstract: Deep Neural Networks (DNNs) are being adopted as components in software systems. Creating and specializing DNNs from scratch has grown increasingly difficult as state-of-the-art architectures grow more complex. Following the path of traditional software engineering, machine learning engineers have begun to reuse large-scale pre-trained models (PTMs) and fine-tune these models for downstream tasks. Prior works have studied reuse practices for traditional software packages to guide software engineers towards better package maintenance and dependency management. We lack a similar foundation of knowledge to guide behaviors in pre-trained model ecosystems. In this work, we present the first empirical investigation of PTM reuse. We interviewed 12 practitioners from the most popular PTM ecosystem, Hugging Face, to learn the practices and challenges of PTM reuse. From this data, we model the decision-making process for PTM reuse. Based on the identified practices, we describe useful attributes for model reuse, including provenance, reproducibility, and portability. Three challenges for PTM reuse are missing attributes, discrepancies between claimed and actual performance, and model risks. We substantiate these identified challenges with systematic measurements in the Hugging Face ecosystem. Our work informs future directions on optimizing deep learning ecosystems by automatically measuring useful attributes and potential attacks, and envisions future research on infrastructure and standardization for model registries.

Submitted 4 March, 2023; originally announced March 2023.

Comments: Proceedings of the ACM/IEEE 45th International Conference on Software Engineering (ICSE) 2023

arXiv:2303.02551

Discrepancies among Pre-trained Deep Neural Networks: A New Threat to Model Zoo Reliability

Authors: Diego Montes, Pongpatapee Peerapatanapokin, Jeff Schultz, Chengjun Gun, Wenxin Jiang, James C. Davis

Abstract: Training deep neural networks (DNNs) takes significant time and resources. A practice for expedited deployment is to use pre-trained deep neural networks (PTNNs), often from model zoos (collections of PTNNs); yet, the reliability of model zoos remains unexamined. In the absence of an industry standard for the implementation and performance of PTNNs, engineers cannot confidently incorporate them into production systems. As a first step, discovering potential discrepancies between PTNNs across model zoos would reveal a threat to model zoo reliability. Prior works have shown variance in the accuracy of deep learning systems. However, broader measures of reliability for PTNNs from model zoos are unexplored. This work measures notable discrepancies in the accuracy, latency, and architecture of 36 PTNNs across four model zoos. Among the top 10 discrepancies, we find differences of 1.23%-2.62% in accuracy and 9%-131% in latency. We also find mismatches in architecture for well-known DNN architectures (e.g., ResNet and AlexNet). Our findings call for future works on empirical validation, automated tools for measurement, and best practices for implementation.

Submitted 4 March, 2023; originally announced March 2023.

Comments: Proceedings of the 30th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering: Ideas, Visions, and Reflections track (ESEC/FSE-IVR) 2022

arXiv:2303.01996

Exploiting Input Sanitization for Regex Denial of Service

Authors: Efe Barlas, Xin Du, James C. Davis

Abstract: Web services use server-side input sanitization to guard against harmful input. Some web services publish their sanitization logic to make their client interface more usable, e.g., allowing clients to debug invalid requests locally. However, this usability practice poses a security risk. Specifically, services may share the regexes they use to sanitize input strings, and regex-based denial of service (ReDoS) is an emerging threat. Although prominent service outages caused by ReDoS have spurred interest in this topic, we know little about the degree to which live web services are vulnerable to ReDoS. In this paper, we conduct the first black-box study measuring the extent of ReDoS vulnerabilities in live web services. We apply the Consistent Sanitization Assumption: that client-side sanitization logic, including regexes, is consistent with the sanitization logic on the server side. We identify a service's regex-based input sanitization in its HTML forms or its API, find vulnerable regexes among these regexes, craft ReDoS probes, and pinpoint vulnerabilities. We analyzed the HTML forms of 1,000 services and the APIs of 475 services. Of these, 355 services publish regexes; 17 services publish unsafe regexes; and 6 services are vulnerable to ReDoS through their APIs (6 domains; 15 subdomains). Both Microsoft and Amazon Web Services patched their web services as a result of our disclosure. Since these vulnerabilities were from API specifications, not HTML forms, we proposed a ReDoS defense for a popular API validation library, and our patch has been merged.
To summarize: in client-visible sanitization logic, some web services advertise ReDoS vulnerabilities in plain sight. Our results motivate short-term patches and long-term fundamental solutions.

Submitted 3 March, 2023; originally announced March 2023.

Comments: Proceedings of the ACM/IEEE 44th International Conference on Software Engineering (ICSE) 2022

arXiv:2212.07979

Improving Developers' Understanding of Regex Denial of Service Tools through Anti-Patterns and Fix Strategies

Authors: Sk Adnan Hassan, Zainab Aamir, Dongyoon Lee, James C. Davis, Francisco Servant

Abstract: Regular expressions are used for diverse purposes, including input validation and firewalls. Unfortunately, they can also lead to a security vulnerability called ReDoS (Regular Expression Denial of Service), caused by a super-linear worst-case execution time during regex matching. Due to the severity and prevalence of ReDoS, past work proposed automatic tools to detect and fix regexes. Although these tools were evaluated in automatic experiments, their usability has not yet been studied; usability has not been a focus of prior work. Our insight is that the usability of existing tools to detect and fix regexes will improve if we complement them with anti-patterns and fix strategies for vulnerable regexes. We developed novel anti-patterns for vulnerable regexes, and a collection of fix strategies to fix them. We derived our anti-patterns and fix strategies from a novel theory of regex infinite ambiguity, a necessary condition for regexes vulnerable to ReDoS. We proved the soundness and completeness of our theory. We evaluated the effectiveness of our anti-patterns, both in an automatic experiment and when applied manually. Then, we evaluated how much our anti-patterns and fix strategies improve developers' understanding of the outcome of detection and fixing tools. Our evaluation found that our anti-patterns were effective over a large dataset of regexes (N=209,188): 100% precision and 99% recall, improving on the state of the art's 50% precision and 87% recall.
Our anti-patterns were also more effective than the state of the art when applied manually (N=20): 100% of developers applied them effectively vs. 50% for the state of the art. Finally, our anti-patterns and fix strategies increased developers' understanding when using automatic tools (N=9): from median "Very weakly" to median "Strongly" when detecting vulnerabilities, and from median "Very weakly" to median "Very strongly" when fixing them.

Submitted 15 December, 2022; originally announced December 2022.

Comments: IEEE Security & Privacy 2023

arXiv:2209.02930

doi 10.1145/3540250.3560879

Reflections on Software Failure Analysis

Authors: Paschal C. Amusuo, Aishwarya Sharma, Siddharth R. Rao, Abbey Vincent, James C. Davis

Abstract: Failure studies are important in revealing the root causes, behaviors, and life cycle of defects in software systems. These studies focus either on the characteristics of defects in specific classes of systems or on the characteristics of a specific type of defect in the systems where it manifests. Failure studies have influenced various software engineering research directions, especially in the areas of software evolution, defect detection, and program repair. In this paper, we reflect on the conduct of failure studies in software engineering. We reviewed a sample of 52 failure study papers. We identified several recurring problems in these studies, some of which hinder the ability of the engineering community to trust or replicate the results. Based on our findings, we suggest future research directions, including identifying and analyzing failure causal chains, standardizing the conduct of failure studies, and tool support for faster defect analysis.

Submitted 21 September, 2022; v1 submitted 7 September, 2022; originally announced September 2022.

Comments: 6 pages, 4 figures. To be published in: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '22)

arXiv:2207.11767

Snapshot Metrics Are Not Enough: Analyzing Software Repositories with Longitudinal Metrics

Authors: Nicholas Synovic, Matt Hyatt, Rohan Sethi, Sohini Thota, Shilpika, Allan J. Miller, Wenxin Jiang, Emmanuel S. Amobi, Austin Pinderski, Konstantin Läufer, Nicholas J. Hayward, Neil Klingensmith, James C. Davis, George K. Thiruvathukal

Abstract: Software metrics capture information about software development processes and products. These metrics support decision-making, e.g., in team management or dependency selection. However, existing metrics tools measure only a snapshot of a software project. Little attention has been given to enabling engineers to reason about metric trends over time: longitudinal metrics that give insight about process, not just product. In this work, we present PRiME (PRocess MEtrics), a tool for computing and visualizing process metrics. The currently supported metrics include productivity, issue density, issue spoilage, and bus factor. We illustrate the value of longitudinal data and conclude with a research agenda. The tool's demo video can be watched at https://youtu.be/YigEHy3_JCo. The source code can be found at https://github.com/SoftwareSystemsLaboratory/prime.

Submitted 24 July, 2022; originally announced July 2022.

Comments: Accepted at ASE 2022 Tool Demonstrations

arXiv:2206.13562

Incorporating Failure Knowledge into Design Decisions for IoT Systems: A Controlled Experiment on Novices

Authors: Dharun Anandayuvaraj, Pujita Thulluri, Justin Figueroa, Harshit Shandilya, James C. Davis

Abstract: Internet of Things (IoT) systems allow software to directly interact with the physical world. Recent IoT failures can be attributed to recurring software design flaws, suggesting IoT software engineers may not be learning from past failures. We examine the use of failure stories to improve IoT system designs. We conducted an experiment to evaluate the influence of failure-related learning treatments on design decisions. Our experiment used a between-subjects comparison of novices (computer engineering students) completing a design questionnaire. There were three treatments: a control group (N=7); a group considering a set of design guidelines (N=8); and a group considering failure stories (proposed treatment, N=6). We measured their design decisions and their design rationales. All subjects made comparable decisions. Their rationales varied by treatment: subjects treated with guidelines and failure stories made greater use of criticality as a rationale, while subjects exposed to failure stories more frequently used safety as a rationale. Building on these findings, we suggest several research directions toward a failure-aware IoT engineering process.

Submitted 20 March, 2023; v1 submitted 27 June, 2022; originally announced June 2022.

Comments: Accepted at the Software Engineering Research & Practices for the Internet of Things (SERP4IoT) workshop at The International Conference on Software Engineering (ICSE) 2023

arXiv:2206.13560

doi 10.1145/3551349.3559545

Reflecting on Recurring Failures in IoT Development

Authors: Dharun Anandayuvaraj, James C. Davis

Abstract: As IoT systems are given more responsibility and autonomy, they offer greater benefits, but also carry greater risks. We believe this trend invigorates an old challenge of software engineering: how can we develop high-risk software-intensive systems safely and securely under market pressures? As a first step, we conducted a systematic analysis of recent IoT failures to identify engineering challenges. We collected and analyzed 22 news reports and studied the sources, impacts, and repair strategies of failures in IoT systems. We observed failure trends both within and across application domains. We also observed that failure themes have persisted over time. To alleviate these trends, we outline a research agenda toward a Failure-Aware Software Development Life Cycle for IoT development. We propose an encyclopedia of failures and an empirical basis for system postmortems, complemented by appropriate automated tools.

Submitted 19 September, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

Comments: Accepted at the New Ideas and Emerging Results Track (NIER) at The 37th IEEE/ACM International Conference on Automated Software Engineering (ASE 2022)

arXiv:2109.13356

Efficient Computer Vision on Edge Devices with Pipeline-Parallel Hierarchical Neural Networks

Authors: Abhinav Goel, Caleb Tung, Xiao Hu, George K. Thiruvathukal, James C. Davis, Yung-Hsiang Lu

Abstract: Computer vision on low-power edge devices enables applications including search-and-rescue and security. State-of-the-art computer vision algorithms, such as Deep Neural Networks (DNNs), are too large for inference on low-power edge devices. To improve efficiency, some existing approaches parallelize DNN inference across multiple edge devices. However, these techniques introduce significant communication and synchronization overheads or are unable to balance workloads across devices. This paper demonstrates that the hierarchical DNN architecture is well suited for parallel processing on multiple edge devices. We design a novel method that creates a parallel inference pipeline for computer vision problems that use hierarchical DNNs. The method balances loads across the collaborating devices and reduces communication costs to facilitate the processing of multiple video frames simultaneously with higher throughput. Our experiments consider a representative computer vision problem where image recognition is performed on each video frame, running on multiple Raspberry Pi 4Bs. With four collaborating low-power edge devices, our approach achieves 3.21X higher throughput, 68% less energy consumption per device per frame, and a 58% decrease in memory when compared with existing single-device hierarchical DNNs.

Submitted 4 November, 2021; v1 submitted 27 September, 2021; originally announced September 2021.

Comments: Accepted for publication in ASPDAC 2022

arXiv:2107.00821

An Experience Report on Machine Learning Reproducibility: Guidance for Practitioners and TensorFlow Model Garden Contributors

Authors: Vishnu Banna, Akhil Chinnakotla, Zhengxin Yan, Anirudh Vegesana, Naveen Vivek, Kruthi Krishnappa, Wenxin Jiang, Yung-Hsiang Lu, George K. Thiruvathukal, James C. Davis

Abstract: Machine learning techniques are becoming a fundamental tool for scientific and engineering progress. These techniques are applied in contexts as diverse as astronomy and spam filtering. However, correctly applying these techniques requires careful engineering. Much attention has been paid to the technical potential; relatively little attention has been paid to the software engineering process required to bring research-based machine learning techniques into practical utility. Technology companies have supported the engineering community through machine learning frameworks such as TensorFlow and PyTorch, but the details of how to engineer complex machine learning models in these frameworks have remained hidden. To promote best practices within the engineering community, academic institutions and Google have partnered to launch a Special Interest Group on Machine Learning Models (SIGMODELS) whose goal is to develop exemplary implementations of prominent machine learning models in community locations such as the TensorFlow Model Garden (TFMG). The purpose of this report is to define a process for reproducing a state-of-the-art machine learning model at a level of quality suitable for inclusion in the TFMG. We define the engineering process and elaborate on each step, from paper analysis to model release. We report on our experiences implementing the YOLO model family with a team of 26 student researchers, share the tools we developed, and describe the lessons we learned along the way.

Submitted 29 July, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

Comments: Technical Report

arXiv:2106.10588 [pdf, other]

Low-Power Multi-Camera Object Re-Identification using Hierarchical Neural Networks

Authors: Abhinav Goel, Caleb Tung, Xiao Hu, Haobo Wang, James C. Davis, George K. Thiruvathukal, Yung-Hsiang Lu

Abstract: Low-power computer vision on embedded devices has many applications. This paper describes a low-power technique for the object re-identification (reID) problem: matching a query image against a gallery of previously seen images. State-of-the-art techniques rely on large, computationally-intensive Deep Neural Networks (DNNs). We propose a novel hierarchical DNN architecture that uses attribute labels in the training dataset to perform efficient object reID. At each node in the hierarchy, a small DNN identifies a different attribute of the query image. The small DNN at each leaf node is specialized to re-identify a subset of the gallery: only the images with the attributes identified along the path from the root to a leaf. Thus, a query image is re-identified accurately after processing with a few small DNNs. We compare our method with state-of-the-art object reID techniques. With a 4% loss in accuracy, our approach realizes significant resource savings: 74% less memory, 72% fewer operations, and 67% lower query latency, yielding 65% less energy consumption.

Submitted 19 June, 2021; originally announced June 2021.

Comments: Accepted to ISLPED 2021

arXiv:2105.04397 [pdf, other]

Why Aren't Regular Expressions a Lingua Franca? An Empirical Study on the Re-use and Portability of Regular Expressions

Authors: James C. Davis, Louis G. Michael IV, Christy A. Coghlan, Francisco Servant, Dongyoon Lee

Abstract: This paper explores the extent to which regular expressions (regexes) are portable across programming languages. Many languages offer similar regex syntaxes, and it would be natural to assume that regexes can be ported across language boundaries. But can regexes be copy/pasted across language boundaries while retaining their semantic and performance characteristics? In our survey of 158 professional software developers, most indicated that they re-use regexes across language boundaries and about half reported that they believe regexes are a universal language. We experimentally evaluated the riskiness of this practice using a novel regex corpus -- 537,806 regexes from 193,524 projects written in JavaScript, Java, PHP, Python, Ruby, Go, Perl, and Rust. Using our polyglot regex corpus, we explored the hitherto-unstudied regex portability problems: logic errors due to semantic differences, and security vulnerabilities due to performance differences. We report that developers' belief in a regex lingua franca is understandable but unfounded. Though most regexes compile across language boundaries, 15% exhibit semantic differences across languages and 10% exhibit performance differences across languages. We explain these differences using regex documentation and further illuminate our findings by investigating regex engine implementations. Along the way we found bugs in the regex engines of JavaScript-V8, Python, Ruby, and Rust, and potential semantic and performance regex bugs in thousands of modules.

Submitted 10 May, 2021; originally announced May 2021.

Comments: ESEC/FSE 2019

arXiv:2009.12156 [pdf, other]

An Empirical Study on the Impact of Deep Parameters on Mobile App Energy Usage

Authors: Qiang Xu, James C. Davis, Y. Charlie Hu, Abhilash Jindal

Abstract: Improving software performance through configuration parameter tuning is a common activity during software maintenance. Beyond traditional performance metrics like latency, mobile app developers are interested in reducing app energy usage. Some mobile apps have centralized locations for parameter tuning, similar to databases and operating systems, but it is common for mobile apps to have hundreds of parameters scattered around the source code. The correlation between these "deep" parameters and app energy usage is unclear. Researchers have studied the energy effects of deep parameters in specific modules, but we lack a systematic understanding of the energy impact of mobile deep parameters. In this paper we empirically investigate this topic, combining a developer survey with systematic energy measurements. Our motivational survey of 25 Android developers suggests that developers do not understand, and largely ignore, the energy impact of deep parameters. To assess the potential implications of this practice, we propose a deep parameter energy profiling framework that can analyze the energy impact of deep parameters in an app. Our framework identifies deep parameters, mutates them based on our parameter value selection scheme, and performs reliable energy impact analysis. Applying the framework to 16 popular Android apps, we discovered that deep parameter-induced energy inefficiency is rare. We found only 2 out of 1644 deep parameters for which a different value would significantly improve its app's energy efficiency.
A detailed analysis found that most deep parameters have either no energy impact, limited energy impact, or an energy impact only under extreme values. Our study suggests that it is generally safe for developers to ignore the energy impact when choosing deep parameter values in mobile apps.

Submitted 16 January, 2022; v1 submitted 22 September, 2020; originally announced September 2020.

Comments: 12 pages, 9 figures, to be published in SANER 2022, camera-ready

arXiv:2009.05632 [pdf, other]

A Principled Approach to GraphQL Query Cost Analysis

Authors: Alan Cha, Erik Wittern, Guillaume Baudart, James C. Davis, Louis Mandel, Jim A. Laredo

Abstract: The landscape of web APIs is evolving to meet new client requirements and to facilitate how providers fulfill them. A recent web API model is GraphQL, which is both a query language and a runtime. Using GraphQL, client queries express the data they want to retrieve or mutate, and servers respond with exactly those data or changes. GraphQL's expressiveness is risky for service providers because clients can succinctly request stupendous amounts of data, and responding to overly complex queries can be costly or disrupt service availability. Recent empirical work has shown that many service providers are at risk. Using traditional API management methods is not sufficient, and practitioners lack principled means of estimating and measuring the cost of the GraphQL queries they receive. In this work, we present a linear-time GraphQL query analysis that can measure the cost of a query without executing it. Our approach can be applied in a separate API management layer and used with arbitrary GraphQL backends. In contrast to existing static approaches, our analysis supports common GraphQL conventions that affect query cost, and our analysis is provably correct based on our formal specification of GraphQL semantics. We demonstrate the potential of our approach using a novel GraphQL query-response corpus for two commercial GraphQL APIs. Our query analysis consistently obtains upper cost bounds, tight enough relative to the true response sizes to be actionable for service providers.
In contrast, existing static GraphQL query analyses exhibit over-estimates and under-estimates because they fail to support GraphQL conventions.

Submitted 11 September, 2020; originally announced September 2020.

Comments: Published at the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) 2020

arXiv:1907.13012 [pdf, other]

An Empirical Study of GraphQL Schemas

Authors: Erik Wittern, Alan Cha, James C. Davis, Guillaume Baudart, Louis Mandel

Abstract: GraphQL is a query language for APIs and a runtime to execute queries. Using GraphQL queries, clients define precisely what data they wish to retrieve or mutate on a server, leading to fewer round trips and reduced response sizes. Although interest in GraphQL is on the rise, with increasing adoption at major organizations, little is known about what GraphQL interfaces look like in practice. This lack of knowledge makes it hard for providers to understand what practices promote idiomatic, easy-to-use APIs, and what pitfalls to avoid. To address this gap, we study the design of GraphQL interfaces in practice by analyzing their schemas (the descriptions of their exposed data types and the possible operations on the underlying data). We base our study on two novel corpuses of GraphQL schemas, one of 16 commercial GraphQL schemas and the other of 8,399 GraphQL schemas mined from GitHub projects. We make both corpuses available to other researchers. Using these corpuses, we characterize the size of schemas and their use of GraphQL features and assess the use of both prescribed and organic naming conventions. We also report that a majority of APIs are susceptible to denial of service through complex queries, posing real security risks previously discussed only in theory. We also assess ways in which GraphQL APIs attempt to address these concerns.

Submitted 30 July, 2019; originally announced July 2019.

arXiv:1311.5904 [pdf, ps, other]

doi 10.1016/j.jpdc.2014.08.001

The IceProd Framework: Distributed Data Processing for the IceCube Neutrino Observatory

Authors: M. G. Aartsen, R. Abbasi, M. Ackermann, J. Adams, J. A. Aguilar, M. Ahlers, D. Altmann, C. Arguelles, J. Auffenberg, X. Bai, M. Baker, S. W. Barwick, V. Baum, R. Bay, J. J. Beatty, J. Becker Tjus, K. -H. Becker, S. BenZvi, P. Berghaus, D. Berley, E. Bernardini, A. Bernhard, D. Z. Besson, G. Binder, D. Bindig, et al. (262 additional authors not shown)

Abstract: IceCube is a one-gigaton instrument located at the geographic South Pole, designed to detect cosmic neutrinos, identify the particle nature of dark matter, and study high-energy neutrinos themselves. Simulation of the IceCube detector and processing of data require a significant amount of computational resources. IceProd is a distributed management system based on Python, XML-RPC and GridFTP. It is driven by a central database in order to coordinate and administer production of simulations and processing of data produced by the IceCube detector. IceProd runs as a separate layer on top of other middleware and can take advantage of a variety of computing resources, including grids and batch systems such as CREAM, Condor, and PBS. This is accomplished by a set of dedicated daemons that process job submission in a coordinated fashion through the use of middleware plugins that serve to abstract the details of job submission and job management from the framework.

Submitted 22 August, 2014; v1 submitted 22 November, 2013; originally announced November 2013.

Journal ref: Journal of Parallel & Distributed Computing 75:198,2015

Showing 1–36 of 36 results for author: Davis, J C