-
Efficient LLM-Jailbreaking by Introducing Visual Modality
Authors:
Zhenxing Niu,
Yuyao Sun,
Haodong Ren,
Haoxuan Ji,
Quan Wang,
Xiaoke Ma,
Gang Hua,
Rong Jin
Abstract:
This paper focuses on jailbreaking attacks against large language models (LLMs), eliciting them to generate objectionable content in response to harmful user queries. Unlike previous LLM-jailbreaks that target LLMs directly, our approach begins by constructing a multimodal large language model (MLLM) through the incorporation of a visual module into the target LLM. Subsequently, we conduct an efficient MLLM-jailbreak to generate jailbreaking embeddings embJS. Finally, we convert the embJS into text space to facilitate the jailbreaking of the target LLM. Compared to direct LLM-jailbreaking, our approach is more efficient, as MLLMs are more vulnerable to jailbreaking than pure LLMs. Additionally, to improve the attack success rate (ASR) of jailbreaking, we propose an image-text semantic matching scheme to identify a suitable initial input. Extensive experiments demonstrate that our approach surpasses current state-of-the-art methods in terms of both efficiency and effectiveness. Moreover, our approach exhibits superior cross-class jailbreaking capabilities.
Submitted 30 May, 2024;
originally announced May 2024.
-
High-Resolution Observation and Magnetic Modeling of a Solar Minifilament: the Formation, Eruption and Failing Mechanisms
Authors:
Weilin Teng,
Yingna Su,
Rui Liu,
Jialin Chen,
Yanjie Liu,
Jun Dai,
Wenda Cao,
Jinhua Shen,
Haisheng Ji
Abstract:
Minifilaments are widespread small-scale structures in the solar atmosphere. To better understand their formation and eruption mechanisms, we investigate the entire life of a sigmoidal minifilament located below a large quiescent filament observed by BBSO/GST on 2015 August 3. The Hα structure initially appears as a group of arched threads, then transforms into two J-shaped arcades, and finally forms a sigmoidal shape. SDO/AIA observations in 171Å show that two coronal jets occur around the southern footpoint of the minifilament before the minifilament eruption. The minifilament eruption starts from the southern footpoint, then interacts with the overlying filament and fails. The aforementioned observational changes correspond to three episodes of flux cancellation observed by SDO/HMI. Unlike previous studies, the flux cancellation occurs between the polarity in which the southern footpoint of the minifilament is rooted and an external polarity. We construct two magnetic field models before the eruption using the flux rope insertion method, and find a hyperbolic flux tube (HFT) above the flux cancellation site. The observation and modeling results suggest that the eruption is triggered by the external magnetic reconnection between the core field of the minifilament and the external fields due to flux cancellations. This study reveals a new triggering mechanism for minifilament eruptions and a new relationship between minifilament eruptions and coronal jets.
Submitted 27 May, 2024;
originally announced May 2024.
-
AGRaME: Any-Granularity Ranking with Multi-Vector Embeddings
Authors:
Revanth Gangi Reddy,
Omar Attia,
Yunyao Li,
Heng Ji,
Saloni Potdar
Abstract:
Ranking is a fundamental and popular problem in search. However, existing ranking algorithms usually restrict the granularity of ranking to full passages or require a specific dense index for each desired level of granularity. Such lack of flexibility in granularity negatively affects many applications that can benefit from more granular ranking, such as sentence-level ranking for open-domain question-answering, or proposition-level ranking for attribution. In this work, we introduce the idea of any-granularity ranking, which leverages multi-vector embeddings to rank at varying levels of granularity while maintaining encoding at a single (coarser) level of granularity. We propose a multi-granular contrastive loss for training multi-vector approaches, and validate its utility with both sentences and propositions as ranking units. Finally, we demonstrate the application of proposition-level ranking to post-hoc citation addition in retrieval-augmented generation, surpassing the performance of prompt-driven citation generation.
Submitted 23 May, 2024;
originally announced May 2024.
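The abstract does not specify how sentences or propositions are scored. The following is a minimal sketch that assumes ColBERT-style late interaction (MaxSim) over token vectors produced by a single passage-level encoding. Each passage token is tagged with the sentence it came from, and sentences are ranked by restricting MaxSim to their own tokens, so no per-sentence index is needed. The function names and toy vectors are hypothetical, and this is not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late interaction: each query token contributes its best-matching doc token score."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank_sentences(
    query_vecs: List[Vector],
    passage_vecs: List[Vector],
    sentence_ids: List[int],
) -> List[Tuple[int, float]]:
    """Rank sentences of one passage using token vectors encoded once at passage level."""
    # Group the passage's token vectors by the sentence each token belongs to.
    groups: Dict[int, List[Vector]] = {}
    for vec, sid in zip(passage_vecs, sentence_ids):
        groups.setdefault(sid, []).append(vec)
    scores = [(sid, maxsim(query_vecs, vecs)) for sid, vecs in groups.items()]
    # Highest-scoring sentence first.
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    passage = [[0.9, 0.1], [0.2, 0.1], [0.1, 0.8], [0.0, 0.3]]
    print(rank_sentences(query, passage, sentence_ids=[0, 0, 1, 1]))
```

The same grouping works for any finer unit, such as propositions, as long as each token can be mapped to the unit it belongs to.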
-
GLaD: Synergizing Molecular Graphs and Language Descriptors for Enhanced Power Conversion Efficiency Prediction in Organic Photovoltaic Devices
Authors:
Thao Nguyen,
Tiara Torres-Flores,
Changhyun Hwang,
Carl Edwards,
Ying Diao,
Heng Ji
Abstract:
This paper presents a novel approach for predicting Power Conversion Efficiency (PCE) of Organic Photovoltaic (OPV) devices, called GLaD: synergizing molecular Graphs and Language Descriptors for enhanced PCE prediction. Due to the lack of high-quality experimental data, we collect a dataset consisting of 500 pairs of OPV donor and acceptor molecules along with their corresponding PCE values, which we use as the training data for our predictive model. In this low-data regime, GLaD leverages properties learned from large language models (LLMs) pretrained on extensive scientific literature to enrich molecular structural representations, allowing for a multimodal representation of molecules. GLaD achieves precise predictions of PCE, thereby facilitating the synthesis of new OPV molecules with improved efficiency. Furthermore, GLaD showcases versatility, as it applies to a range of molecular property prediction tasks (BBBP, BACE, ClinTox, and SIDER), not limited to those concerning OPV materials. In particular, GLaD proves valuable for tasks in low-data regimes within the chemical space, as it enriches molecular representations by incorporating molecular property descriptions learned from large-scale pretraining. This capability is significant in real-world scientific endeavors like drug and material discovery, where access to comprehensive data is crucial for informed decision-making and efficient exploration of the chemical space.
Submitted 23 May, 2024;
originally announced May 2024.
-
RAG-RLRC-LaySum at BioLaySumm: Integrating Retrieval-Augmented Generation and Readability Control for Layman Summarization of Biomedical Texts
Authors:
Yuelyu Ji,
Zhuochun Li,
Rui Meng,
Sonish Sivarajkumar,
Yanshan Wang,
Zeshui Yu,
Hui Ji,
Yushui Han,
Hanyu Zeng,
Daqing He
Abstract:
This paper introduces the RAG-RLRC-LaySum framework, designed to make complex biomedical research understandable to laymen through advanced Natural Language Processing (NLP) techniques. Our Retrieval Augmented Generation (RAG) solution, enhanced by a reranking method, utilizes multiple knowledge sources to ensure the precision and pertinence of lay summaries. Additionally, our Reinforcement Learning for Readability Control (RLRC) strategy improves readability, making scientific content comprehensible to non-specialists. Evaluations using the publicly accessible PLOS and eLife datasets show that our methods surpass the plain Gemini model, demonstrating a 20% increase in readability scores, a 15% improvement in ROUGE-2 relevance scores, and a 10% enhancement in factual accuracy. The RAG-RLRC-LaySum framework effectively democratizes scientific knowledge, enhancing public engagement with biomedical discoveries.
Submitted 27 May, 2024; v1 submitted 21 May, 2024;
originally announced May 2024.
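The abstract does not say which readability measure drives the RLRC reward. The classic Flesch Reading Ease formula, FRE = 206.835 - 1.015 × (words/sentences) - 84.6 × (syllables/words), is one score that could serve as such a signal. Higher scores mean easier text. This is an assumption made for illustration only. The syllable counter below is a crude vowel-group heuristic, not a dictionary-based one.

```python
import re


def count_syllables(word: str) -> int:
    # Crude heuristic: each run of consecutive vowels counts as one syllable,
    # with a minimum of one syllable per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier-to-read text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    lay = "The cat sat on the mat."
    technical = "Immunohistochemical characterization revealed heterogeneous granulomatous infiltration."
    print(flesch_reading_ease(lay), flesch_reading_ease(technical))
```

A reinforcement learning reward could then be derived from this score, for example by rewarding summaries whose score falls in a target range. That design choice is likewise hypothetical.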
-
Understanding the Rare Inflammatory Disease Using Large Language Models and Social Media Data
Authors:
Nan Miles Xi,
Hong-Long Ji,
Lin Wang
Abstract:
Sarcoidosis is a rare inflammatory disease characterized by the formation of granulomas in various organs. The disease presents diagnostic and treatment challenges due to its diverse manifestations and unpredictable nature. In this study, we employed a Large Language Model (LLM) to analyze sarcoidosis-related discussions on the social media platform Reddit. Our findings underscore the efficacy of LLMs in accurately identifying sarcoidosis-related content. We discovered a wide array of symptoms reported by patients, with fatigue, swollen lymph nodes, and shortness of breath as the most prevalent. Prednisone was the most prescribed medication, while infliximab showed the highest effectiveness in improving prognoses. Notably, our analysis revealed disparities in prognosis based on age and gender, with women and younger patients experiencing good and polarized outcomes, respectively. Furthermore, unsupervised clustering identified three distinct patient subgroups (phenotypes) with unique symptom profiles, prognostic outcomes, and demographic distributions. Finally, sentiment analysis revealed a moderate negative impact on patients' mental health post-diagnosis, particularly among women and younger individuals. Our study represents the first application of LLMs to understand sarcoidosis through social media data. It contributes to understanding the disease by providing data-driven insights into its manifestations, treatments, prognoses, and impact on patients' lives. Our findings have direct implications for improving personalized treatment strategies and enhancing the quality of care for individuals living with sarcoidosis.
Submitted 12 May, 2024;
originally announced May 2024.
-
Achieving millisecond coherence fluxonium through overlap Josephson junctions
Authors:
Fei Wang,
Kannan Lu,
Huijuan Zhan,
Lu Ma,
Feng Wu,
Hantao Sun,
Hao Deng,
Yang Bai,
Feng Bao,
Xu Chang,
Ran Gao,
Xun Gao,
Guicheng Gong,
Lijuan Hu,
Ruizi Hu,
Honghong Ji,
Xizheng Ma,
Liyong Mao,
Zhijun Song,
Chengchun Tang,
Hongcheng Wang,
Tenghui Wang,
Ziang Wang,
Tian Xia,
Hongxin Xu, et al. (10 additional authors not shown)
Abstract:
Fluxonium qubits are recognized for their high coherence times and high operation fidelities, attributed to their unique design incorporating over 100 Josephson junctions per superconducting loop. However, this complexity poses significant fabrication challenges, particularly in achieving high yield and junction uniformity with traditional methods. Here, we introduce an overlap process for Josephson junction fabrication that achieves nearly 100% yield and maintains uniformity across a 2-inch wafer with less than 5% variation for the phase slip junction and less than 2% for the junction array. Our compact junction array design facilitates fluxonium qubits with energy relaxation times exceeding 1 millisecond at the flux frustration point, demonstrating consistency with state-of-the-art dielectric loss tangents and flux noise across multiple devices. This work suggests the scalability of high coherence fluxonium processors using CMOS-compatible processes, marking a significant step towards practical quantum computing.
Submitted 8 May, 2024;
originally announced May 2024.
-
An Empirical Study of Kotlin-Java Interactions
Authors:
Qiong Feng,
Huan Ji,
Xiaotian Ma,
Peng Liang
Abstract:
Background: Since Google introduced Kotlin as an official programming language for developing Android apps in 2017, Kotlin has gained widespread adoption in Android development. Because Java and Kotlin are interoperable by design, they can coexist and interact smoothly within a project. Aims: However, there is limited research on how Java and Kotlin interact with each other in real-world projects and what challenges are faced during these interactions. The answers to these questions are key to understanding these kinds of cross-language software systems. Methods: In this paper, we implemented a tool named DependExtractor, which can extract 11 kinds of Kotlin-Java dependencies, and conducted an empirical study of 23 Kotlin-Java real-world projects with 3,227 Java and 8,630 Kotlin source files. Results: Our findings revealed that Java and Kotlin frequently interact with each other in these cross-language projects, with access and call dependency types being the most dominant. Compared to files that interact with files in the same language, Java/Kotlin source files that participate in cross-language interactions undergo more commits. Additionally, among all Kotlin-Java problematic interactions, we identified seven common mistakes, along with their fixing strategies. Conclusions: The findings of this study can help developers understand and address the challenges in Kotlin-Java projects.
Submitted 7 May, 2024;
originally announced May 2024.
-
SEvenLLM: Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence
Authors:
Hangyuan Ji,
Jian Yang,
Linzheng Chai,
Chaoren Wei,
Liqun Yang,
Yunlong Duan,
Yunli Wang,
Tianzhen Sun,
Hongcheng Guo,
Tongliang Li,
Changyu Ren,
Zhoujun Li
Abstract:
To address the increasing complexity and frequency of cybersecurity incidents, highlighted by recent cybersecurity threat reports covering over 10 billion instances, cyber threat intelligence (CTI) plays a critical role in the modern cybersecurity landscape by offering the insights required to understand and combat the constantly evolving nature of cyber threats. Inspired by the powerful capability of large language models (LLMs) in handling complex tasks, in this paper, we introduce a framework to benchmark, elicit, and improve cybersecurity incident analysis and response abilities in LLMs for Security Events (SEvenLLM). Specifically, we create a high-quality bilingual instruction corpus by crawling cybersecurity raw text from cybersecurity websites to overcome the lack of effective data for information extraction. Then, we design a pipeline to auto-select tasks from the task pool and convert the raw text into supervised corpora consisting of question-response pairs. The instruction dataset SEvenLLM-Instruct is used to train cybersecurity LLMs with a multi-task learning objective (27 well-designed tasks) for augmenting the analysis of cybersecurity events. Extensive experiments on our curated benchmark (SEvenLLM-bench) demonstrate that SEvenLLM performs more sophisticated threat analysis and fortifies defenses against the evolving landscape of cyber threats.
Submitted 6 May, 2024;
originally announced May 2024.
-
On the spectral edge of non-Hermitian random matrices
Authors:
Andrew Campbell,
Giorgio Cipolloni,
László Erdős,
Hong Chang Ji
Abstract:
For general non-Hermitian random matrices $X$ and deterministic deformation matrices $A$, we prove that the local eigenvalue statistics of $A+X$ close to the typical edge points of its spectrum are universal. Furthermore, we show that under natural assumptions on $A$ the spectrum of $A+X$ does not have outliers at a distance larger than the natural fluctuation scale of the eigenvalues. As a consequence, the number of eigenvalues in each component of $\mathrm{Spec}(A+X)$ is deterministic.
Submitted 6 May, 2024; v1 submitted 26 April, 2024;
originally announced April 2024.
-
Weak-to-Strong Extrapolation Expedites Alignment
Authors:
Chujie Zheng,
Ziqi Wang,
Heng Ji,
Minlie Huang,
Nanyun Peng
Abstract:
The open-source community is experiencing a surge in the release of large language models (LLMs) that are trained to follow instructions and align with human preference. However, further training to improve them still requires expensive computational resources and data annotations. Is it possible to bypass additional training and cost-effectively acquire better-aligned models? Inspired by the literature on model interpolation, we propose a simple method called ExPO to boost LLMs' alignment with human preference. Utilizing a model that has undergone alignment training (e.g., via DPO or RLHF) and its initial SFT checkpoint, ExPO directly obtains a better-aligned model by extrapolating from the weights of the initial and the aligned models, which implicitly optimizes the alignment objective via first-order approximation. Through experiments with twelve open-source LLMs on HuggingFace, we demonstrate that ExPO consistently improves off-the-shelf DPO/RLHF models, as evaluated on the mainstream LLM benchmarks AlpacaEval 2.0 and MT-Bench. Moreover, ExPO exhibits remarkable scalability across various model sizes (from 1.8B to 70B) and capabilities. Through controlled experiments and further empirical analyses, we shed light on the essence of ExPO amplifying the reward signal learned during alignment training. Our work demonstrates the efficacy of model extrapolation in expediting the alignment of LLMs with human preference, suggesting a promising direction for future research.
Submitted 22 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
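The abstract describes ExPO as extrapolating from the weights of the initial SFT checkpoint and the aligned model. The sketch below is minimal and assumes the linear form θ_expo = θ_aligned + α(θ_aligned - θ_sft) with a user-chosen coefficient α. The abstract does not state the exact form or how α is selected. Plain Python lists stand in for parameter tensors, and the parameter names are hypothetical.

```python
from typing import Dict, List

# Hypothetical checkpoint format: parameter name -> flattened values.
Weights = Dict[str, List[float]]


def extrapolate(sft: Weights, aligned: Weights, alpha: float) -> Weights:
    """Return aligned + alpha * (aligned - sft), parameter by parameter."""
    if sft.keys() != aligned.keys():
        raise ValueError("checkpoints must have the same parameter names")
    out: Weights = {}
    for name, a_vals in aligned.items():
        s_vals = sft[name]
        if len(s_vals) != len(a_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Step further along the direction that alignment training moved the weights.
        out[name] = [a + alpha * (a - s) for s, a in zip(s_vals, a_vals)]
    return out


if __name__ == "__main__":
    sft_ckpt = {"layer.weight": [0.0, 1.0]}
    aligned_ckpt = {"layer.weight": [0.5, 1.5]}
    print(extrapolate(sft_ckpt, aligned_ckpt, alpha=0.5))
```

With α = 0 the aligned model is returned unchanged, and α = -1 recovers the SFT checkpoint. A positive α moves further along the direction that alignment training took. With real checkpoints, the same per-parameter update would be applied tensor-wise.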
-
A Survey on Federated Analytics: Taxonomy, Enabling Techniques, Applications and Open Issues
Authors:
Zibo Wang,
Haichao Ji,
Yifei Zhu,
Dan Wang,
Zhu Han
Abstract:
The escalating influx of data generated by networked edge devices, coupled with the growing awareness of data privacy, has promoted a transformative shift in computing paradigms from centralized data processing to privacy-preserved distributed data processing. Federated analytics (FA) is an emerging technique to support collaborative data analytics among diverse data owners without centralizing the raw data. Despite the wide applications of FA in industry and academia, a comprehensive examination of existing research efforts in FA has been notably absent. This survey aims to bridge this gap by first providing an overview of FA, elucidating key concepts, and discussing its relationship with similar concepts. We then conduct a thorough examination of FA, including its taxonomy, key challenges, and enabling techniques. Diverse FA applications, including statistical metrics, set computation, frequency-related applications, database query operations, model-based applications, FL-assisting FA tasks, and other wireless network applications are then carefully reviewed. We complete the survey with several open research issues and future directions. This survey intends to provide a holistic understanding of the emerging FA techniques and foster the continued evolution of privacy-preserving distributed data processing in the emerging networked society.
Submitted 19 April, 2024;
originally announced April 2024.
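The following is a generic illustration of the "statistical metrics" application category, not an algorithm taken from the survey. A federated mean can be computed from per-client sums and counts, so raw values never leave the clients. This sketch deliberately omits the privacy mechanisms, such as secure aggregation or differential privacy, that a deployed FA system would typically add.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class LocalSummary:
    total: float
    count: int


def summarize(values: List[float]) -> LocalSummary:
    # Runs on the client: only this aggregate is sent to the server.
    return LocalSummary(total=sum(values), count=len(values))


def federated_mean(summaries: Iterable[LocalSummary]) -> float:
    # Runs on the server: combines client aggregates without seeing raw data.
    total = 0.0
    count = 0
    for s in summaries:
        total += s.total
        count += s.count
    if count == 0:
        raise ValueError("no data reported by any client")
    return total / count


if __name__ == "__main__":
    clients = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
    print(federated_mean(summarize(v) for v in clients))
```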
-
mABC: multi-Agent Blockchain-Inspired Collaboration for root cause analysis in micro-services architecture
Authors:
Wei Zhang,
Hongcheng Guo,
Jian Yang,
Yi Zhang,
Chaoran Yan,
Zhoujin Tian,
Hangyuan Ji,
Zhoujun Li,
Tongliang Li,
Tieqiao Zheng,
Chao Chen,
Yi Liang,
Xu Shi,
Liangfan Zheng,
Bo Zhang
Abstract:
The escalating complexity of micro-services architecture in cloud-native technologies poses significant challenges for maintaining system stability and efficiency. To conduct root cause analysis (RCA) and resolution of alert events, we propose a pioneering framework, multi-Agent Blockchain-inspired Collaboration for root cause analysis in micro-services architecture (mABC), to revolutionize the AI for IT operations (AIOps) domain, where multiple agents based on the powerful large language models (LLMs) perform blockchain-inspired voting to reach a final agreement following a standardized process for processing tasks and queries provided by Agent Workflow. Specifically, seven specialized agents derived from Agent Workflow each provide valuable insights towards root cause analysis based on their expertise and the intrinsic software knowledge of LLMs collaborating within a decentralized chain. To avoid potential instability issues in LLMs and fully leverage the transparent and egalitarian advantages inherent in a decentralized structure, mABC adopts a decision-making process inspired by blockchain governance principles while considering the contribution index and expertise index of each agent. Experimental results on the public benchmark AIOps challenge dataset and our created train-ticket dataset demonstrate superior performance in accurately identifying root causes and formulating effective solutions, compared to previous strong baselines. The ablation study further highlights the significance of each component within mABC, with Agent Workflow, multi-agent, and blockchain-inspired voting being crucial for achieving optimal performance. mABC offers a comprehensive automated root cause analysis and resolution in micro-services architecture and achieves a significant improvement in the AIOps domain compared to existing baselines.
Submitted 3 May, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
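The abstract says the agents vote while considering each agent's contribution index and expertise index, but it does not give the rule. The sketch below is purely a hypothetical illustration. It weights each vote by the product of the two indices and accepts the top candidate root cause only if its share of the total weight exceeds a threshold. The weighting, the threshold, and all names are assumptions, not mABC's actual governance mechanism.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Vote:
    agent: str
    candidate: str       # proposed root cause, e.g. a service name
    contribution: float  # hypothetical contribution index (>= 0)
    expertise: float     # hypothetical expertise index (>= 0)


def weighted_decision(votes: List[Vote], threshold: float = 0.5) -> Optional[str]:
    """Accept the top candidate only if its share of total vote weight exceeds threshold."""
    weights: Dict[str, float] = defaultdict(float)
    for vote in votes:
        # Hypothetical weighting: multiply the two indices.
        weights[vote.candidate] += vote.contribution * vote.expertise
    total = sum(weights.values())
    if total <= 0:
        return None  # no vote carries weight, so there is no agreement
    best = max(weights, key=lambda c: weights[c])
    return best if weights[best] / total > threshold else None


if __name__ == "__main__":
    ballots = [
        Vote("agent-1", "db", contribution=1.0, expertise=0.9),
        Vote("agent-2", "network", contribution=0.5, expertise=0.4),
        Vote("agent-3", "db", contribution=0.8, expertise=0.5),
    ]
    print(weighted_decision(ballots))
```

Returning `None` when no candidate clears the threshold models a failed round in which the agents would need to deliberate further.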
-
Text-Based Reasoning About Vector Graphics
Authors:
Zhenhailong Wang,
Joy Hsu,
Xingyao Wang,
Kuan-Hao Huang,
Manling Li,
Jiajun Wu,
Heng Ji
Abstract:
While large multimodal models excel in broad vision-language benchmarks, they often struggle with tasks requiring precise perception of low-level visual details, such as comparing line lengths or solving simple mazes. In particular, this failure mode persists in question-answering tasks about vector graphics -- images composed purely of 2D objects and shapes. To address this challenge, we propose the Visually Descriptive Language Model (VDLM), which performs text-based reasoning about vector graphics. VDLM leverages Scalable Vector Graphics (SVG) for a more precise visual description and first uses an off-the-shelf raster-to-SVG algorithm for encoding. Since existing language models cannot understand raw SVGs in a zero-shot setting, VDLM then bridges SVG with pretrained language models through a newly introduced intermediate symbolic representation, Primal Visual Description (PVD), comprising primitive attributes (e.g., shape, position, measurement) with their corresponding predicted values. PVD is task-agnostic and represents visual primitives that are universal across all vector graphics. It can be learned with procedurally generated (SVG, PVD) pairs and also enables the direct use of LLMs for generalization to complex reasoning tasks. By casting an image to a text-based representation, we can leverage the power of language models to learn alignment from SVG to visual primitives and generalize to unseen question-answering tasks. Empirical results show that VDLM achieves stronger zero-shot performance compared to state-of-the-art LMMs, such as GPT-4V, in various low-level multimodal perception and reasoning tasks on vector graphics. We additionally present extensive analyses on VDLM's performance, demonstrating that our framework offers better interpretability due to its disentangled perception and reasoning processes. Project page: https://mikewangwzhl.github.io/VDLM/
Submitted 24 May, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
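The following hypothetical sketch illustrates why a text-based intermediate representation helps with low-level perception. It encodes line segments with explicit shape, position, and measurement attributes, loosely mirroring the kinds of attributes the abstract lists for PVD. The schema here is invented and is not the paper's. A line-length comparison is then answered by reading those values rather than pixels.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class LineSegment:
    name: str
    start: Point
    end: Point

    def length(self) -> float:
        return math.dist(self.start, self.end)


def describe(segments: List[LineSegment]) -> str:
    """Render primitives as plain text that a language model could reason over."""
    return "\n".join(
        f"{s.name}: line segment from {s.start} to {s.end}, length {s.length():.2f}"
        for s in segments
    )


def longest(segments: List[LineSegment]) -> str:
    # Once lengths are explicit numbers, "which line is longer?" is a simple comparison.
    return max(segments, key=lambda s: s.length()).name


if __name__ == "__main__":
    segs = [
        LineSegment("a", (0.0, 0.0), (3.0, 4.0)),
        LineSegment("b", (1.0, 1.0), (1.0, 3.0)),
    ]
    print(describe(segs))
    print("longest:", longest(segs))
```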
-
Towards Better Generalization in Open-Domain Question Answering by Mitigating Context Memorization
Authors:
Zixuan Zhang,
Revanth Gangi Reddy,
Kevin Small,
Tong Zhang,
Heng Ji
Abstract:
Open-domain Question Answering (OpenQA) aims at answering factual questions with an external large-scale knowledge corpus. However, real-world knowledge is not static; it updates and evolves continually. Such a dynamic characteristic of knowledge poses a vital challenge for these models, as the trained models need to constantly adapt to the latest information to make sure that the answers remain accurate. In addition, it is still unclear how well an OpenQA model can transfer to completely new knowledge domains. In this paper, we investigate the generalization performance of a retrieval-augmented QA model in two specific scenarios: 1) adapting to updated versions of the same knowledge corpus; 2) switching to completely different knowledge domains. We observe that the generalization challenges of OpenQA models stem from the reader's over-reliance on memorizing the knowledge from the external corpus, which hinders the model from generalizing to a new knowledge corpus. We introduce Corpus-Invariant Tuning (CIT), a simple but effective training strategy, to mitigate the knowledge over-memorization by controlling the likelihood of retrieved contexts during training. Extensive experimental results on multiple OpenQA benchmarks show that CIT achieves significantly better generalizability without compromising the model's performance in its original corpus and domain.
Submitted 2 April, 2024;
originally announced April 2024.
-
Surface variation analysis of freeform optical systems over surface frequency bands for prescribed wavefront errors
Authors:
Rundong Fan,
Shili Wei,
Huiru Ji,
Zhuang Qian,
Hao Tan,
Yan Mo,
Donglin Ma
Abstract:
The surface errors of freeform surfaces reflect the manufacturing complexities and significantly impact the feasibility of processing designed optical systems. Because of their many degrees of freedom, freeform surfaces make surface tolerance analysis challenging. However, current research has neglected the influence of surface slopes on the directions of ray propagation. A sudden change in the surface slope leads to a corresponding abrupt shift in the wavefront, even when the change in surface sag is minimal. Moreover, in freeform surface manufacturing, variations in surface slope across different frequency bands may give rise to distinct surface variations. In this study, we propose a tolerance analysis method that analyzes surface variation in freeform surfaces by considering surface slopes within frequency bands, based on real ray data. This approach uses real ray data to rapidly evaluate surface variation within a specified frequency band of surface slopes. Crucially, unlike previous methods, the proposed method can obtain the system surface variation even when the prescribed wavefront aberration is significant. The feasibility and advantages of this framework are assessed by analyzing a single-mirror system with a single field and an off-axis two-mirror system. We expect to integrate the proposed methodology with freeform surface design and manufacturing, thereby expanding the scope of freeform optics.
Submitted 27 March, 2024;
originally announced March 2024.
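The abstract's point that surface slope, rather than sag, controls ray direction can be illustrated with a single sinusoidal surface-error component. This is not the authors' real-ray tolerance method. It is a minimal sketch that assumes a mirror surface, an error profile z(x) = A sin(2πx/Λ), and the small-angle law of reflection, under which tilting the surface normal by α deviates the reflected ray by 2α. The three periods stand in for low-, mid-, and high-frequency bands and are arbitrary choices.

```python
import math


def sinusoidal_error_stats(amplitude_m: float, period_m: float) -> dict:
    """Peak sag, peak slope, and peak reflected-ray deviation for a
    sinusoidal surface error z(x) = A * sin(2*pi*x / period) on a mirror."""
    peak_slope = 2 * math.pi * amplitude_m / period_m  # max |dz/dx|, radians (small-angle)
    return {
        "peak_sag_m": amplitude_m,
        "peak_slope_rad": peak_slope,
        # On reflection, tilting the surface normal by alpha tilts the ray by 2 * alpha.
        "peak_ray_deviation_rad": 2 * peak_slope,
    }


# Same 10 nm sag amplitude in three spatial-frequency bands.
for period in (10e-3, 1e-3, 0.1e-3):
    s = sinusoidal_error_stats(10e-9, period)
    print(f"period {period * 1e3:5.1f} mm: slope {s['peak_slope_rad']:.2e} rad, "
          f"ray deviation {s['peak_ray_deviation_rad']:.2e} rad")
```

With the sag amplitude fixed, shrinking the period tenfold raises the slope, and therefore the ray deviation, tenfold, which is why a band-limited slope budget can matter more than a sag budget.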
-
Fact Checking Beyond Training Set
Authors:
Payam Karisani,
Heng Ji
Abstract:
Evaluating the veracity of everyday claims is time consuming and in some cases requires domain expertise. We empirically demonstrate that the commonly used fact checking pipeline, known as the retriever-reader, suffers from performance deterioration when it is trained on the labeled data from one domain and used in another domain. Afterwards, we delve into each component of the pipeline and propose novel algorithms to address this problem. We propose an adversarial algorithm to make the retriever component robust against distribution shift. Our core idea is to initially train a bi-encoder on the labeled source data, and then, to adversarially train two separate document and claim encoders using unlabeled target data. We then focus on the reader component and propose to train it such that it is insensitive towards the order of claims and evidence documents. Our empirical evaluations support the hypothesis that such a reader shows a higher robustness against distribution shift. To our knowledge, there is no publicly available multi-topic fact checking dataset. Thus, we propose a simple automatic method to re-purpose two well-known fact checking datasets. We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models, including recent domain adaptation models that use GPT4 for generating synthetic data.
Submitted 27 March, 2024;
originally announced March 2024.
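One simple way to make a reader's output independent of segment order is to average its score over every ordering of the claim and evidence. This is a generic permutation-averaging sketch, not the training procedure proposed in the paper; `toy_reader` is a hypothetical, deliberately order-sensitive scorer standing in for a real model.

```python
from itertools import permutations
from typing import Callable, Sequence

# A reader maps an ordered list of text segments to a veracity score.
Scorer = Callable[[Sequence[str]], float]


def order_invariant_score(scorer: Scorer, claim: str, evidence: Sequence[str]) -> float:
    """Average the reader's score over every ordering of claim + evidence segments."""
    segments = [claim, *evidence]
    scores = [scorer(list(order)) for order in permutations(segments)]
    return sum(scores) / len(scores)


def toy_reader(segments: Sequence[str]) -> float:
    """Hypothetical order-sensitive scorer: earlier segments get larger weights."""
    return sum(len(s) / (i + 1) for i, s in enumerate(segments))


a = order_invariant_score(toy_reader, "claim", ["doc one", "document two"])
b = order_invariant_score(toy_reader, "claim", ["document two", "doc one"])
print(a, b)  # equal up to floating-point rounding
```

Exhaustive averaging costs k! reader calls for k segments, so in practice one would sample a few orderings, or use shuffled orderings as training-time augmentation.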
-
Resource and Mobility Management in Hybrid LiFi and WiFi Networks: A User-Centric Learning Approach
Authors:
Han Ji,
Xiping Wu
Abstract:
Hybrid light fidelity (LiFi) and wireless fidelity (WiFi) networks (HLWNets) are an emerging indoor wireless communication paradigm, which combines the capacious optical spectrum of LiFi with the ubiquitous coverage of WiFi. Meanwhile, load balancing (LB) has become a key challenge in resource management for such hybrid networks. Existing LB methods are mostly network-centric, relying on a central unit to compute a solution for all users at once. Consequently, the solution needs to be updated for all users at the same pace, regardless of their mobility. This affects network performance in two ways: i) when the update frequency is low, it compromises the connectivity of fast-moving users; ii) when the update frequency is high, it causes unnecessary handovers as well as hefty feedback costs for slow-moving users. Motivated by this, we investigate user-centric LB, which allows users to update their solutions at different paces. The research builds on our previous work on the adaptive target-condition neural network (ATCNN), which can conduct LB for individual users in quasi-static channels. In this paper, a deep neural network (DNN) model is designed to enable an adaptive update interval for each individual user. This new model is termed the mobility-supporting neural network (MSNN). Associating MSNN with ATCNN, a user-centric LB framework named mobility-supporting ATCNN (MS-ATCNN) is proposed to handle resource management and mobility management simultaneously. Results show that at the same level of average update interval, MS-ATCNN can achieve a network throughput up to 215% higher than conventional LB methods such as game theory, especially for a larger number of users. In addition, MS-ATCNN has an ultra-low runtime on the order of hundreds of $\mu$s, which is two to three orders of magnitude lower than game theory.
Submitted 25 March, 2024;
originally announced March 2024.
-
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models
Authors:
Kung-Hsiang Huang,
Hou Pong Chan,
Yi R. Fung,
Haoyi Qiu,
Mingyang Zhou,
Shafiq Joty,
Shih-Fu Chang,
Heng Ji
Abstract:
Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making. Automatic chart understanding has witnessed significant advancements with the rise of large foundation models in recent years. Foundation models, such as large language models, have revolutionized various natural language processing tasks and are increasingly being applied to chart understanding tasks. This survey paper provides a comprehensive overview of the recent developments, challenges, and future directions in chart understanding within the context of these foundation models. We review fundamental building blocks crucial for studying chart understanding tasks. Additionally, we explore various tasks and their evaluation metrics and sources of both charts and textual inputs. Various modeling strategies are then examined, encompassing both classification-based and generation-based approaches, along with tool augmentation techniques that enhance chart understanding performance. Furthermore, we review the state-of-the-art performance on each task and discuss how it can be improved. Challenges and future directions are addressed, highlighting the importance of several topics, such as domain-specific charts, lack of efforts in developing evaluation metrics, and agent-oriented settings. This survey paper serves as a comprehensive resource for researchers and practitioners in the fields of natural language processing, computer vision, and data analysis, providing valuable insights and directions for future research in chart understanding leveraging large foundation models. The studies mentioned in this paper, along with emerging new research, will be continually updated at: https://github.com/khuangaf/Awesome-Chart-Understanding.
Submitted 25 March, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
Noncentrosymmetric Triangular Magnet CaMnTeO$_6$: Strong Quantum Fluctuations and Role of s$^0$ vs. s$^2$ Electronic States in Competing Exchange Interactions
Authors:
Xudong Huai,
Emmanuel Acheampong,
Erich Delles,
Michał J. Winiarski,
Maurice Sorolla II,
Lila Nassar,
Mingli Liang,
Caleb Ramette,
Huiwen Ji,
Allen Scheie,
Stuart Calder,
Martin Mourigal,
Thao T. Tran
Abstract:
Noncentrosymmetric triangular magnets offer a unique platform for realizing strong quantum fluctuations. However, designing these quantum materials remains an open challenge attributable to a knowledge gap in the tunability of competing exchange interactions at the atomic level. Here, we create a new noncentrosymmetric triangular S = 3/2 magnet CaMnTeO$_6$ based on careful chemical and physical considerations. The model material displays competing magnetic interactions and features nonlinear optical responses with the capability of generating coherent photons. The incommensurate magnetic ground state of CaMnTeO$_6$ with an unusually large spin rotation angle of 127(1)° indicates that the anisotropic interlayer exchange is strong and competing with the isotropic interlayer Heisenberg interaction. The moment of 1.39(1) $\mu_B$, extracted from low-temperature heat capacity and neutron diffraction measurements, is only 46% of the expected value of the static moment 3 $\mu_B$. This reduction indicates the presence of strong quantum fluctuations in the half-integer spin S = 3/2 CaMnTeO$_6$ magnet, which is rare. By comparing the spin-polarized band structure, chemical bonding, and physical properties of AMnTeO$_6$ (A = Ca, Sr, Pb), we demonstrate how quantum-chemical interpretation can illuminate insights into the fundamentals of magnetic exchange interactions, providing a powerful tool for modulating spin dynamics with atomically precise control.
Submitted 12 March, 2024;
originally announced March 2024.
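The 46% figure follows from comparing the ordered moment with the full static moment gSμ_B. A short check, assuming the spin-only value g = 2, which reproduces the abstract's expected 3 μ_B for S = 3/2:

```python
def moment_fraction(measured_mu_b: float, spin: float, g: float = 2.0) -> float:
    """Ratio of a measured ordered moment to the static moment g*S (both in Bohr magnetons)."""
    return measured_mu_b / (g * spin)


expected = 2.0 * 1.5  # g*S = 3 mu_B for S = 3/2, assuming g = 2
print(f"expected static moment: {expected} mu_B")
print(f"reduction: {moment_fraction(1.39, 1.5):.1%} of the static value")  # 46.3%
```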
-
Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors
Authors:
Haoxuanye Ji,
Pengpeng Liang,
Erkang Cheng
Abstract:
Multi-camera-based 3D object detection has made notable progress in the past several years. However, we observe that there are cases (e.g. faraway regions) in which popular 2D object detectors are more reliable than state-of-the-art 3D detectors. In this paper, to improve the performance of query-based 3D object detectors, we present a novel query generating approach termed QAF2D, which infers 3D query anchors from 2D detection results. A 2D bounding box of an object in an image is lifted to a set of 3D anchors by associating each sampled point within the box with depth, yaw angle, and size candidates. Then, the validity of each 3D anchor is verified by comparing its projection in the image with its corresponding 2D box, and only valid anchors are kept and used to construct queries. The class information of the 2D bounding box associated with each query is also utilized to match the predicted boxes with ground truth for the set-based loss. The image feature extraction backbone is shared between the 3D detector and 2D detector by adding a small number of prompt parameters. We integrate QAF2D into three popular query-based 3D object detectors and carry out comprehensive evaluations on the nuScenes dataset. The largest improvement that QAF2D can bring about on the nuScenes validation subset is $2.3\%$ NDS and $2.7\%$ mAP. Code is available at https://github.com/nullmax-vision/QAF2D.
Submitted 9 March, 2024;
originally announced March 2024.
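The lifting-and-validation step described in the QAF2D abstract can be sketched as follows. This is not the released implementation: it assumes a pinhole camera with known intrinsics, a uniform grid of sample points inside the 2D box, and an IoU threshold as the validity test (the abstract only says the projection is "compared" with the 2D box). All names, the 3×3 grid, and the 0.5 threshold are illustrative.

```python
import itertools
import math
from typing import List, Optional, Tuple

Box2D = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max), pixels
# (x, y, z, width, height, length, yaw) in the camera frame: x right, y down, z forward
Anchor = Tuple[float, float, float, float, float, float, float]
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy)


def iou(a: Box2D, b: Box2D) -> float:
    """Intersection-over-union of two axis-aligned image boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def project_anchor(anchor: Anchor, k: Intrinsics) -> Optional[Box2D]:
    """Project the 8 corners of a 3D box and return their enclosing 2D box."""
    fx, fy, cx, cy = k
    x, y, z, width, height, length, yaw = anchor
    c, s = math.cos(yaw), math.sin(yaw)
    us, vs = [], []
    for dx, dy, dz in itertools.product((-length / 2, length / 2),
                                        (-height / 2, height / 2),
                                        (-width / 2, width / 2)):
        # Rotate the corner offset about the camera y-axis by yaw, then translate.
        px, py, pz = x + c * dx + s * dz, y + dy, z - s * dx + c * dz
        if pz <= 0:  # a corner behind the camera: reject the anchor
            return None
        us.append(fx * px / pz + cx)
        vs.append(fy * py / pz + cy)
    return (min(us), min(vs), max(us), max(vs))


def lift_2d_box(box: Box2D, k: Intrinsics, depths, yaws, sizes,
                grid: int = 3, iou_thr: float = 0.5) -> List[Anchor]:
    """Lift a 2D detection to 3D anchors; keep those whose reprojection matches the box."""
    fx, fy, cx, cy = k
    u0, v0, u1, v1 = box
    kept: List[Anchor] = []
    for i, j in itertools.product(range(grid), range(grid)):
        u = u0 + (i + 0.5) * (u1 - u0) / grid  # sample point inside the 2D box
        v = v0 + (j + 0.5) * (v1 - v0) / grid
        for d, yaw, (width, height, length) in itertools.product(depths, yaws, sizes):
            # Back-project the pixel to depth d and attach a size/yaw candidate.
            anchor = ((u - cx) * d / fx, (v - cy) * d / fy, d, width, height, length, yaw)
            proj = project_anchor(anchor, k)
            if proj is not None and iou(proj, box) >= iou_thr:
                kept.append(anchor)
    return kept
```

Feeding in the projection of a known 3D box, anchors at the correct depth survive while anchors at depths far too near or far produce projections much larger or smaller than the box and are discarded.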
-
LVIC: Multi-modality segmentation by Lifting Visual Info as Cue
Authors:
Zichao Dong,
Bowen Pang,
Xufeng Huang,
Hang Ji,
Xin Zhan,
Junbo Chen
Abstract:
Multi-modality fusion has proven to be an effective method for 3D perception in autonomous driving. However, most current multi-modality fusion pipelines for LiDAR semantic segmentation have complicated fusion mechanisms. Point painting is a straightforward method that directly binds LiDAR points to visual information. Unfortunately, previous point-painting-like methods suffer from projection error between the camera and the LiDAR. In our experiments, we find that this projection error is the devil in point painting. As a result, we propose a depth-aware point painting mechanism, which significantly boosts multi-modality fusion. Apart from that, we take a deeper look at the desired visual feature for LiDAR to perform semantic segmentation. By Lifting Visual Information as Cue, LVIC ranks 1st on the nuScenes LiDAR semantic segmentation benchmark. Our experiments show its robustness and effectiveness. Code will be made publicly available soon.
Submitted 8 March, 2024;
originally announced March 2024.
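The LVIC abstract identifies camera-LiDAR projection error as the main weakness of point painting but does not spell out its depth-aware mechanism. The sketch below shows one common way to make painting depth-aware, a z-buffer occlusion check: a point that projects onto a pixel already claimed by a much nearer point does not inherit that pixel's feature. It is an assumption for illustration, not LVIC's method; points are assumed to be already transformed into the camera frame, and `depth_tol` is a hypothetical parameter.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the camera frame, z forward, metres


def paint_points(
    points: Sequence[Point],
    feature_map: Sequence[Sequence[List[float]]],  # feature_map[v][u] -> per-pixel feature
    fx: float, fy: float, cx: float, cy: float,
    depth_tol: float = 0.5,
) -> List[Optional[List[float]]]:
    """Attach image features to LiDAR points, skipping points hidden behind a nearer point."""
    height, width = len(feature_map), len(feature_map[0])
    pixels: List[Optional[Tuple[int, int]]] = []
    zbuf: Dict[Tuple[int, int], float] = {}  # nearest depth seen at each pixel
    for x, y, z in points:
        if z <= 0:  # behind the camera
            pixels.append(None)
            continue
        u, v = math.floor(fx * x / z + cx), math.floor(fy * y / z + cy)
        if not (0 <= u < width and 0 <= v < height):  # outside the image
            pixels.append(None)
            continue
        pixels.append((u, v))
        zbuf[(u, v)] = min(z, zbuf.get((u, v), math.inf))
    painted: List[Optional[List[float]]] = []
    for (_, _, z), pix in zip(points, pixels):
        # Only the visible (nearest) surface at a pixel receives that pixel's feature.
        if pix is not None and z - zbuf[pix] <= depth_tol:
            u, v = pix
            painted.append(feature_map[v][u])
        else:
            painted.append(None)
    return painted
```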
-
$\textit{L+M-24}$: Building a Dataset for Language + Molecules @ ACL 2024
Authors:
Carl Edwards,
Qingyun Wang,
Lawrence Zhao,
Heng Ji
Abstract:
Language-molecule models have emerged as an exciting direction for molecular discovery and understanding. However, training these models is challenging due to the scarcity of molecule-language pair datasets. At this point, datasets have been released which are 1) small and scraped from existing databases, 2) large but noisy and constructed by performing entity linking on the scientific literature, and 3) built by converting property prediction datasets to natural language using templates. In this document, we detail the $\textit{L+M-24}$ dataset, which has been created for the Language + Molecules Workshop shared task at ACL 2024. In particular, $\textit{L+M-24}$ is designed to focus on three key benefits of natural language in molecule design: compositionality, functionality, and abstraction.
Submitted 22 February, 2024;
originally announced March 2024.
-
Adaptive Testing Environment Generation for Connected and Automated Vehicles with Dense Reinforcement Learning
Authors:
Jingxuan Yang,
Ruoxuan Bai,
Haoyuan Ji,
Yi Zhang,
Jianming Hu,
Shuo Feng
Abstract:
The assessment of safety performance plays a pivotal role in the development and deployment of connected and automated vehicles (CAVs). A common approach involves designing testing scenarios based on prior knowledge of CAVs (e.g., surrogate models), conducting tests in these scenarios, and subsequently evaluating CAVs' safety performances. However, substantial differences between CAVs and the prior knowledge can significantly diminish the evaluation efficiency. In response to this issue, existing studies predominantly concentrate on the adaptive design of testing scenarios during the CAV testing process. Yet, these methods have limitations in their applicability to high-dimensional scenarios. To overcome this challenge, we develop an adaptive testing environment that bolsters evaluation robustness by incorporating multiple surrogate models and optimizing the combination coefficients of these surrogate models to enhance evaluation efficiency. We formulate the optimization problem as a regression task utilizing quadratic programming. To efficiently obtain the regression target via reinforcement learning, we propose the dense reinforcement learning method and devise a new adaptive policy with high sample efficiency. Essentially, our approach centers on learning the values of critical scenes displaying substantial surrogate-to-real gaps. The effectiveness of our method is validated in high-dimensional overtaking scenarios, demonstrating that our approach achieves notable evaluation efficiency.
Submitted 29 February, 2024;
originally announced February 2024.
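The coefficient-fitting step can be pictured as a constrained least-squares problem, which is a quadratic program. In this sketch the constraint set (non-negative weights summing to one), the solver (projected gradient descent with a sort-based Euclidean projection onto the probability simplex), and all names are assumptions: the abstract does not state the constraints, and in the paper the regression targets come from the dense reinforcement learning step rather than being supplied directly.

```python
from typing import List, Optional, Sequence


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection onto {w : w_i >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:  # holds for a prefix of the sorted values; keep the last one
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def fit_combination_weights(
    preds: Sequence[Sequence[float]],  # preds[n][k]: surrogate k's value on scene n
    target: Sequence[float],           # regression target for each scene
    steps: int = 2000,
    lr: Optional[float] = None,
) -> List[float]:
    """Minimise sum_n (sum_k w_k * preds[n][k] - target[n])^2 over the simplex."""
    n_scenes, n_models = len(preds), len(preds[0])
    if lr is None:
        # 1 / L with L = 2 * ||preds||_F^2, an upper bound on the gradient's Lipschitz constant
        lr = 1.0 / (2.0 * sum(x * x for row in preds for x in row))
    w = [1.0 / n_models] * n_models
    for _ in range(steps):
        resid = [sum(wk * pk for wk, pk in zip(w, row)) - y for row, y in zip(preds, target)]
        grad = [2.0 * sum(resid[i] * preds[i][k] for i in range(n_scenes))
                for k in range(n_models)]
        w = project_to_simplex([wk - lr * gk for wk, gk in zip(w, grad)])
    return w
```

When the target is exactly a convex combination of the surrogates' outputs, the fitted weights recover that combination; an off-the-shelf QP solver would do the same job on larger problems.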
-
Locating heating channels of the solar corona in a plage region with the aid of high-resolution 10830 Å filtergrams
Authors:
Parida Hashim,
Fangyu Xu,
Ya Wang,
Weijie Men,
Jinhua Shen,
Yingna Su,
Jianping Li,
Zhenyu Jin,
Haisheng Ji
Abstract:
In this paper, with a set of high-resolution He I 10830 Å filtergrams, we select an area in a plage, very likely an EUV moss area, as an interface layer to follow the clues of coronal heating channels down to the photosphere. The filtergrams are obtained from the 1-meter aperture New Vacuum Solar Telescope (NVST). We make a distinction between the darker and the brighter regions in the selected area and name the two regions enhanced absorption patches (EAPs) and low absorption patches (LAPs). With well-aligned, nearly simultaneous data from multiple channels of the AIA and the continuum of the HMI on board SDO, we compare the EUV/UV emissions, emission measure, mean temperature, and continuum intensity in the two kinds of regions. The following progress is made: 1) The mean EUV emissions over EAPs are mostly stronger than the corresponding emissions over LAPs except for the emission at 335 Å. The UV emissions at 1600 and 1700 Å fail to capture the difference between the two regions. 2) In the logarithmic temperature range of 5.6-6.2, EAPs have higher EUV emission measure than LAPs, but they have lower mean coronal temperature. 3) The mean continuum intensity over EAPs is lower. Based on the above progress, we suggest that the energy for coronal heating in the moss region can be traced down to some areas in intergranular lanes with enhanced density of both cool and hot material. The lower temperature over the EAPs is due to the greater fraction of cool material over there.
Submitted 28 February, 2024;
originally announced February 2024.
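The EAP/LAP comparison amounts to masking a co-aligned field by 10830 Å intensity and averaging another quantity within each mask. A minimal sketch, assuming the filtergram and the comparison map (for example an AIA channel) are already co-aligned on the same pixel grid and using the median 10830 intensity as a hypothetical default threshold; the paper's actual selection criterion is not given in the abstract.

```python
from statistics import mean, median
from typing import Dict, Optional, Sequence


def compare_patches(
    he_10830: Sequence[Sequence[float]],
    other: Sequence[Sequence[float]],
    threshold: Optional[float] = None,
) -> Dict[str, float]:
    """Split pixels into darker (EAP) and brighter (LAP) 10830 groups and compare
    the mean of another co-aligned quantity over the two groups."""
    values = [v for row in he_10830 for v in row]
    thr = median(values) if threshold is None else threshold
    eap, lap = [], []
    for h_row, o_row in zip(he_10830, other):
        for h, o in zip(h_row, o_row):
            # Darker 10830 intensity means stronger He I absorption.
            (eap if h < thr else lap).append(o)
    # statistics.mean raises StatisticsError if either group ends up empty.
    return {"threshold": thr, "eap_mean": mean(eap), "lap_mean": mean(lap),
            "eap_pixels": len(eap), "lap_pixels": len(lap)}
```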
-
Finer: Investigating and Enhancing Fine-Grained Visual Concept Recognition in Large Vision Language Models
Authors:
Jeonghwan Kim,
Heng Ji
Abstract:
Recent advances in instruction-tuned Large Vision-Language Models (LVLMs) have imbued the models with the ability to generate high-level, image-grounded explanations with ease. While such capability is largely attributed to the rich world knowledge contained within the Large Language Models (LLMs), our work reveals their shortcomings in fine-grained visual categorization (FGVC) across six different benchmark settings. Most recent state-of-the-art LVLMs, such as LLaVA-1.5, InstructBLIP, and GPT-4V, not only deteriorate severely in classification performance (e.g., an average drop of 65.58 in EM on Stanford Dogs for LLaVA-1.5) but also struggle to generate an accurate explanation with detailed attributes based on the concept that appears within an input image, despite their capability to generate holistic image-level descriptions. In-depth analyses show that instruction-tuned LVLMs exhibit a modality gap, showing discrepancies when given textual and visual inputs that correspond to the same concept, which prevents the image modality from leveraging the rich parametric knowledge within the LLMs. In an effort to further the community's endeavor in this direction, we propose a multiple granularity attribute-centric evaluation benchmark, Finer, which aims to establish a ground to evaluate LVLMs' fine-grained visual comprehension ability and provide significantly improved explainability.
Submitted 11 March, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Construction and application of artificial intelligence crowdsourcing map based on multi-track GPS data
Authors:
Yong Wang,
Yanlin Zhou,
Huan Ji,
Zheng He,
Xinyu Shen
Abstract:
In recent years, the rapid development of high-precision map technology combined with artificial intelligence has ushered in new development opportunities in the field of intelligent vehicles. High-precision map technology is an important guarantee for intelligent vehicles to achieve autonomous driving. However, due to the lack of research on high-precision map technology, it is difficult to use this technology rationally in the field of intelligent vehicles. Researchers have therefore studied fast and effective algorithms that fuse large amounts of low-precision GPS trajectory data into high-precision GPS data and generate a small number of key data points to simplify the description of GPS trajectories; on this basis, the "crowdsourced update" model, which collects map data from a large number of social vehicles, came into being. Such algorithms are important for improving data accuracy, reducing measurement costs, and reducing data storage space. On this basis, this paper analyzes the implementation form of crowdsourcing maps, so as to improve the various information data in high-precision maps according to the actual situation and to promote the reasonable application of high-precision maps in intelligent vehicles.
Submitted 24 February, 2024;
originally announced February 2024.
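The abstract does not name its fusion or key-point algorithm. As an illustration only, the sketch below substitutes two textbook stand-ins: point-wise averaging of several noisy trajectories (assuming they have already been map-matched and resampled to a common length in a local metric frame), followed by Ramer-Douglas-Peucker simplification to keep only the key points. `eps` is a hypothetical tolerance in metres.

```python
import math
from typing import List, Sequence, Tuple

Pt = Tuple[float, float]  # planar coordinates in metres (e.g. a local east-north frame)


def fuse(trajectories: Sequence[Sequence[Pt]]) -> List[Pt]:
    """Point-wise mean of trajectories already resampled to the same length."""
    n = len(trajectories[0])
    if any(len(t) != n for t in trajectories):
        raise ValueError("resample trajectories to a common length first")
    k = len(trajectories)
    return [(sum(t[i][0] for t in trajectories) / k, sum(t[i][1] for t in trajectories) / k)
            for i in range(n)]


def _dist_to_segment(p: Pt, a: Pt, b: Pt) -> float:
    """Distance from p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg2 = dx * dx + dy * dy
    if seg2 == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg2))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def douglas_peucker(points: Sequence[Pt], eps: float) -> List[Pt]:
    """Keep only the key points needed to stay within eps of the original polyline."""
    if len(points) < 3:
        return list(points)
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _dist_to_segment(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right  # drop the duplicated split point
```

For an L-shaped road sampled at seven points, averaging three noisy passes and simplifying with a 0.1 m tolerance leaves just the start, the corner, and the end.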
-
The Jiao Tong University Spectroscopic Telescope Project
Authors:
JUST Team,
Chengze Liu,
Ying Zu,
Fabo Feng,
Zhaoyu Li,
Yu Yu,
Hua Bai,
Xiangqun Cui,
Bozhong Gu,
Yizhou Gu,
Jiaxin Han,
Yonghui Hou,
Zhongwen Hu,
Hangxin Ji,
Yipeng Jing,
Wei Li,
Zhaoxiang Qi,
Xianyu Tan,
Cairang Tian,
Dehua Yang,
Xiangyan Yuan,
Chao Zhai,
Congcong Zhang,
Jun Zhang,
Haotong Zhang
, et al. (6 additional authors not shown)
Abstract:
The Jiao Tong University Spectroscopic Telescope (JUST) is a 4.4-meter f/6.0 segmented-mirror telescope dedicated to spectroscopic observations. The JUST primary mirror is composed of 18 hexagonal segments, each with a diameter of 1.1 m. JUST provides two Nasmyth platforms for placing science instruments. One Nasmyth focus fits a field of view of 10 arcmin and the other has an extended field of view of 1.2 deg with correction optics. A tertiary mirror is used to switch between the two Nasmyth foci. JUST will be installed at a site at Lenghu in Qinghai Province, China, and will conduct spectroscopic observations with three types of instruments to explore the dark universe, trace the dynamic universe, and search for exoplanets: (1) a multi-fiber (2000 fibers) medium-resolution spectrometer (R = 4000-5000) to spectroscopically map galaxies and large-scale structure; (2) an integral field unit (IFU) array of 500 optical fibers and/or a long-slit spectrograph dedicated to fast follow-ups of transient sources for multimessenger astronomy; (3) a high-resolution spectrometer (R ~ 100,000) designed to identify Jupiter analogs and Earth-like planets, with the capability to characterize the atmospheres of hot exoplanets.
Submitted 29 February, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
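A few numbers implied by the quoted JUST specifications can be computed with standard first-order optics: focal length f = D × N, plate scale 206265″/f, and spectral resolution element Δλ = λ/R. This sketch assumes the f/6.0 ratio applies at the focus in question (the wide-field focus with correction optics may differ) and uses 500 nm as an arbitrary reference wavelength.

```python
import math

ARCSEC_PER_RAD = math.degrees(1.0) * 3600.0  # about 206264.8


def plate_scale_arcsec_per_mm(aperture_m: float, f_ratio: float) -> float:
    """Image-plane scale of a telescope with the given aperture and focal ratio."""
    focal_length_mm = aperture_m * f_ratio * 1000.0
    return ARCSEC_PER_RAD / focal_length_mm


def resolution_element_nm(wavelength_nm: float, resolving_power: float) -> float:
    """Smallest resolved wavelength interval, delta_lambda = lambda / R."""
    return wavelength_nm / resolving_power


def velocity_resolution_km_s(resolving_power: float) -> float:
    """Velocity resolution c / R."""
    return 299792.458 / resolving_power


scale = plate_scale_arcsec_per_mm(4.4, 6.0)  # focal length 26.4 m
print(f"plate scale: {scale:.2f} arcsec/mm")  # 7.81
print(f"10 arcmin field spans about {600 / scale:.1f} mm at the focus")
for R in (4000, 5000, 100000):
    print(f"R = {R}: {resolution_element_nm(500.0, R):.4f} nm at 500 nm, "
          f"{velocity_resolution_km_s(R):.1f} km/s")
```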
-
Towards singular optimality in the presence of local initial knowledge
Authors:
Hongyan Ji,
Sriram V. Pemmaraju
Abstract:
The Knowledge Till $\rho$ (KT-$\rho$) CONGEST model is a variant of the classical CONGEST model of distributed computing in which each vertex $v$ has initial knowledge of the radius-$\rho$ ball centered at $v$. The most commonly studied variants of the CONGEST model are KT0 CONGEST, in which nodes initially know nothing about their neighbors, and KT1 CONGEST, in which nodes initially know the IDs of all their neighbors. It has been shown that having access to neighbors' IDs (as in the KT1 CONGEST model) can substantially reduce the message complexity of algorithms for fundamental problems such as BROADCAST and MST. For example, King, Kutten, and Thorup (PODC 2015) show how to construct an MST using just $\tilde{O}(n)$ messages in the KT1 CONGEST model, whereas there is an $\Omega(m)$ message lower bound for MST in the KT0 CONGEST model. Building on this result, Gmyr and Pandurangan (DISC 2018) present a family of distributed randomized algorithms for various global problems that exhibit a trade-off between message and round complexity. These algorithms are based on constructing a sparse, spanning subgraph called a danner. Specifically, given a graph $G$ and any $\delta \in [0,1]$, their algorithm constructs (with high probability) a danner that has diameter $\tilde{O}(D + n^{1-\delta})$ and $\tilde{O}(\min\{m, n^{1+\delta}\})$ edges in $\tilde{O}(n^{1-\delta})$ rounds while using $\tilde{O}(\min\{m, n^{1+\delta}\})$ messages, where $n$, $m$, and $D$ are the number of nodes, edges, and the diameter of $G$, respectively. In the main result of this paper, we show that if we assume the KT2 CONGEST model, it is possible to substantially improve the time-message trade-off in constructing a danner. Specifically, we show how, in the KT2 CONGEST model, to construct a danner that has diameter $\tilde{O}(D + n^{1-2\delta})$ and $\tilde{O}(\min\{m, n^{1+\delta}\})$ edges in $\tilde{O}(n^{1-2\delta})$ rounds while using $\tilde{O}(\min\{m, n^{1+\delta}\})$ messages for any $\delta \in [0,1/2]$.
Submitted 22 February, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
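To see the shape of the KT1 versus KT2 improvement, the danner bounds quoted in the abstract can be evaluated numerically. This ignores the polylogarithmic factors hidden in Õ(·) and all constants, so the outputs indicate growth rates only, not actual round or message counts; the example values of n, m, D, and δ are arbitrary.

```python
def danner_bounds(n: int, m: int, diameter: int, delta: float, kt2: bool) -> dict:
    """Evaluate the abstract's danner bounds with polylog factors and constants dropped.

    KT1 (Gmyr and Pandurangan): rounds n^(1 - delta), delta in [0, 1].
    KT2 (this paper):           rounds n^(1 - 2*delta), delta in [0, 1/2].
    Edges and messages are min(m, n^(1 + delta)) in both models.
    """
    upper = 0.5 if kt2 else 1.0
    if not 0.0 <= delta <= upper:
        raise ValueError(f"delta must lie in [0, {upper}]")
    rounds = n ** (1.0 - (2.0 if kt2 else 1.0) * delta)
    edges = min(m, n ** (1.0 + delta))
    return {"rounds": rounds, "danner_diameter": diameter + rounds,
            "edges": edges, "messages": edges}


n, m, D, delta = 10**6, 10**8, 10, 0.25
kt1 = danner_bounds(n, m, D, delta, kt2=False)
kt2 = danner_bounds(n, m, D, delta, kt2=True)
print(f"KT1 rounds ~ {kt1['rounds']:.0f}, KT2 rounds ~ {kt2['rounds']:.0f}, "
      f"messages ~ {kt1['messages']:.2e} in both")
```

At the same message budget, the extra hop of initial knowledge squares the savings in the round exponent: with δ = 0.25 the KT1 bound is n^0.75 rounds and the KT2 bound is n^0.5.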
-
LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation
Authors:
Keyang Xuan,
Li Yi,
Fan Yang,
Ruochen Wu,
Yi R. Fung,
Heng Ji
Abstract:
The rise of multimodal misinformation on social platforms poses significant challenges for individuals and societies. Its increased credibility and broader impact compared to textual misinformation make detection complex, requiring robust reasoning across diverse media types and profound knowledge for accurate verification. The emergence of Large Vision Language Models (LVLMs) offers a potential solution to this problem. Leveraging their proficiency in processing visual and textual information, LVLMs demonstrate promising capabilities in recognizing complex information and exhibit strong reasoning skills. In this paper, we first investigate the potential of LVLMs for multimodal misinformation detection. We find that even though LVLMs perform better than LLMs, their profound reasoning may offer limited power when evidence is lacking. Based on these observations, we propose LEMMA: LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation. LEMMA leverages LVLM intuition and reasoning capabilities while augmenting them with external knowledge to enhance the accuracy of misinformation detection. Our method improves the accuracy over the top baseline LVLM by 7% and 13% on the Twitter and Fakeddit datasets, respectively.
Submitted 19 February, 2024;
originally announced February 2024.
-
EVEDIT: Event-based Knowledge Editing with Deductive Editing Boundaries
Authors:
Jiateng Liu,
Pengfei Yu,
Yuji Zhang,
Sha Li,
Zixuan Zhang,
Heng Ji
Abstract:
The dynamic nature of real-world information necessitates efficient knowledge editing (KE) in large language models (LLMs) for knowledge updating. However, current KE approaches, which typically operate on (subject, relation, object) triples, ignore the contextual information and the relation among different knowledge. Such editing methods could thus encounter an uncertain editing boundary, leaving a lot of relevant knowledge in ambiguity: queries that could be answered pre-edit cannot be reliably answered afterward. In this work, we analyze this issue by introducing a theoretical framework for KE that highlights an overlooked set of knowledge that remains unchanged and aids in knowledge deduction during editing, which we name the deduction anchor. We further address this issue by proposing a novel task of event-based knowledge editing that pairs facts with event descriptions. This task manifests not only a closer simulation of real-world editing scenarios but also a more logically sound setting, implicitly defining the deduction anchor to address the issue of indeterminate editing boundaries. We empirically demonstrate the superiority of event-based editing over the existing setting on resolving uncertainty in edited models, and curate a new benchmark dataset EvEdit derived from the CounterFact dataset. Moreover, while we observe that the event-based setting is significantly challenging for existing approaches, we propose a novel approach Self-Edit that showcases stronger performance, achieving 55.6% consistency improvement while maintaining the naturalness of generation.
Submitted 17 February, 2024;
originally announced February 2024.
-
Persona-DB: Efficient Large Language Model Personalization for Response Prediction with Collaborative Data Refinement
Authors:
Chenkai Sun,
Ke Yang,
Revanth Gangi Reddy,
Yi R. Fung,
Hou Pong Chan,
ChengXiang Zhai,
Heng Ji
Abstract:
The increasing demand for personalized interactions with large language models (LLMs) calls for the development of methodologies capable of accurately and efficiently identifying user opinions and preferences. Retrieval augmentation emerges as an effective strategy, as it can accommodate a vast number of users without the cost of fine-tuning. Existing research, however, has largely focused on enhancing the retrieval stage and has devoted limited exploration to optimizing the representation of the database, a crucial aspect for tasks such as personalization. In this work, we examine the problem from a novel angle, focusing on how data can be better represented for more efficient retrieval in the context of LLM customization. To tackle this challenge, we introduce Persona-DB, a simple yet effective framework consisting of a hierarchical construction process to improve generalization across task contexts and collaborative refinement to effectively bridge knowledge gaps among users. In the task of response forecasting, Persona-DB demonstrates superior efficiency in maintaining accuracy with a significantly reduced retrieval size, a critical advantage in scenarios with extensive histories or limited context windows. Our experiments also indicate a marked improvement of over 15% under cold-start scenarios, when users have extremely sparse data. Furthermore, our analysis reveals the increasing importance of collaborative knowledge as the retrieval capacity expands.
Submitted 16 February, 2024;
originally announced February 2024.
-
ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback
Authors:
Henry W. Sprueill,
Carl Edwards,
Khushbu Agarwal,
Mariefel V. Olarte,
Udishnu Sanyal,
Conrad Johnston,
Hongbin Liu,
Heng Ji,
Sutanay Choudhury
Abstract:
The discovery of new catalysts is essential for the design of new and more efficient chemical processes in order to transition to a sustainable future. We introduce an AI-guided computational screening framework unifying linguistic reasoning with quantum-chemistry based feedback from 3D atomistic representations. Our approach formulates catalyst discovery as an uncertain environment where an agent actively searches for highly effective catalysts via the iterative combination of large language model (LLM)-derived hypotheses and atomistic graph neural network (GNN)-derived feedback. Identified catalysts in intermediate search steps undergo structural evaluation based on spatial orientation, reaction pathways, and stability. Scoring functions based on adsorption energies and barriers steer the exploration in the LLM's knowledge space toward energetically favorable, high-efficiency catalysts. We introduce planning methods that automatically guide the exploration without human input, providing competitive performance against expert-enumerated chemical descriptor-based implementations. By integrating language-guided reasoning with computational chemistry feedback, our work pioneers AI-accelerated, trustworthy catalyst discovery.
Submitted 6 May, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
Multi-Center Fetal Brain Tissue Annotation (FeTA) Challenge 2022 Results
Authors:
Kelly Payette,
Céline Steger,
Roxane Licandro,
Priscille de Dumast,
Hongwei Bran Li,
Matthew Barkovich,
Liu Li,
Maik Dannecker,
Chen Chen,
Cheng Ouyang,
Niccolò McConnell,
Alina Miron,
Yongmin Li,
Alena Uus,
Irina Grigorescu,
Paula Ramirez Gilliland,
Md Mahfuzur Rahman Siddiquee,
Daguang Xu,
Andriy Myronenko,
Haoyu Wang,
Ziyan Huang,
Jin Ye,
Mireia Alenyà,
Valentin Comte,
Oscar Camara,
et al. (42 additional authors)
Abstract:
Segmentation is a critical step in analyzing the developing human fetal brain. Automatic segmentation methods have improved vastly over the past several years, and the Fetal Brain Tissue Annotation (FeTA) Challenge 2021 helped to establish an excellent standard of fetal brain segmentation. However, FeTA 2021 was a single-center study, and the generalizability of algorithms across different imaging centers remains unsolved, limiting real-world clinical applicability. The multi-center FeTA Challenge 2022 focuses on advancing the generalizability of fetal brain segmentation algorithms for magnetic resonance imaging (MRI). In FeTA 2022, the training dataset contained images and corresponding manually annotated multi-class labels from two imaging centers, and the testing data contained images from these two imaging centers as well as two additional unseen centers. The data from different centers varied in many aspects, including the scanners used, imaging parameters, and the fetal brain super-resolution algorithms applied. Sixteen teams participated in the challenge, and 17 algorithms were evaluated. Here, a detailed overview and analysis of the challenge results are provided, focusing on the generalizability of the submissions. Both in and out of domain, the white matter and ventricles were segmented with the highest accuracy, while the most challenging structure remains the cerebral cortex due to its anatomical complexity. The FeTA Challenge 2022 successfully evaluated and advanced the generalizability of multi-class fetal brain tissue segmentation algorithms for MRI, and it continues to benchmark new algorithms. The resulting new methods contribute to improving the analysis of brain development in utero.
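Per-structure segmentation accuracy in challenges like this one is commonly reported as an overlap score. The sketch below computes the Dice similarity coefficient for each label of a flattened label map; it is an illustration of the general metric, not the FeTA evaluation code, and the toy label maps and the empty-label convention are assumptions.

```python
from typing import Dict, Sequence


def dice_per_label(pred: Sequence[int], truth: Sequence[int],
                   labels: Sequence[int]) -> Dict[int, float]:
    """Dice similarity coefficient 2|P intersect T| / (|P| + |T|) for each label."""
    if len(pred) != len(truth):
        raise ValueError("prediction and ground truth must have the same number of voxels")
    scores: Dict[int, float] = {}
    for label in labels:
        in_pred = [v == label for v in pred]
        in_truth = [v == label for v in truth]
        overlap = sum(1 for p, t in zip(in_pred, in_truth) if p and t)
        size_sum = sum(in_pred) + sum(in_truth)
        # Assumed convention: a label absent from both maps counts as a perfect match.
        scores[label] = 1.0 if size_sum == 0 else 2.0 * overlap / size_sum
    return scores


# Flattened toy label maps; 0 = background, 1 and 2 = hypothetical tissue classes.
truth = [0, 1, 1, 1, 2, 2, 0, 0]
pred = [0, 1, 1, 2, 2, 2, 0, 0]
print(dice_per_label(pred, truth, labels=[1, 2]))
```

A real 3D volume would be flattened (or handled with an array library) before scoring; other challenge metrics such as surface distances are not covered by this sketch.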
Submitted 8 February, 2024;
originally announced February 2024.
-
Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking
Authors:
Yi Fung,
Ruining Zhao,
Jae Doo,
Chenkai Sun,
Heng Ji
Abstract:
Pretrained large language models have revolutionized many applications but still face challenges related to cultural bias and a lack of cultural commonsense knowledge crucial for guiding cross-culture communication and interactions. Recognizing the shortcomings of existing methods in capturing the diverse and rich cultures across the world, this paper introduces a novel approach for massively multicultural knowledge acquisition. Specifically, our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages. Leveraging this valuable data source, we construct the CultureAtlas dataset, which covers a wide range of sub-country-level geographical regions and ethnolinguistic groups, with data cleaning and preprocessing to ensure that each textual assertion sentence is self-contained, as well as fine-grained extraction of cultural profile information. Our dataset not only facilitates the evaluation of language model performance in culturally diverse contexts but also serves as a foundational tool for the development of culturally sensitive and aware language models. Our work marks an important step towards a deeper understanding of cultural disparities in AI and towards bridging them, to promote a more inclusive and balanced representation of global cultures in the digital domain.
Submitted 14 February, 2024;
originally announced February 2024.
-
Can LLMs Produce Faithful Explanations For Fact-checking? Towards Faithful Explainable Fact-Checking via Multi-Agent Debate
Authors:
Kyungha Kim,
Sangyun Lee,
Kung-Hsiang Huang,
Hou Pong Chan,
Manling Li,
Heng Ji
Abstract:
Fact-checking research has extensively explored verification but less so the generation of natural-language explanations, crucial for user trust. While Large Language Models (LLMs) excel in text generation, their capability for producing faithful explanations in fact-checking remains underexamined. Our study investigates LLMs' ability to generate such explanations, finding that zero-shot prompts often result in unfaithfulness. To address these challenges, we propose the Multi-Agent Debate Refinement (MADR) framework, leveraging multiple LLMs as agents with diverse roles in an iterative refining process aimed at enhancing faithfulness in generated explanations. MADR ensures that the final explanation undergoes rigorous validation, significantly reducing the likelihood of unfaithful elements and aligning closely with the provided evidence. Experimental results demonstrate that MADR significantly improves the faithfulness of LLM-generated explanations to the evidence, advancing the credibility and trustworthiness of these explanations.
Submitted 11 February, 2024;
originally announced February 2024.
-
REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models
Authors:
Yinghao Zhu,
Changyu Ren,
Shiyun Xie,
Shukai Liu,
Hangyuan Ji,
Zixiang Wang,
Tao Sun,
Long He,
Zhoujun Li,
Xi Zhu,
Chengwei Pan
Abstract:
The integration of multimodal Electronic Health Records (EHR) data has significantly improved clinical predictive capabilities. Existing models that leverage clinical notes and multivariate time-series EHR often lack the medical context relevant to clinical tasks, prompting the incorporation of external knowledge, particularly from knowledge graphs (KGs). Previous approaches that use KG knowledge have primarily focused on structured knowledge extraction, neglecting unstructured data modalities and semantic, high-dimensional medical knowledge. In response, we propose REALM, a Retrieval-Augmented Generation (RAG) driven framework to enhance multimodal EHR representations that addresses these limitations. First, we apply a Large Language Model (LLM) to encode long-context clinical notes and a GRU model to encode time-series EHR data. Second, we prompt the LLM to extract task-relevant medical entities and match them to entities in a professionally labeled external knowledge graph (PrimeKG) with corresponding medical knowledge. By matching and aligning with clinical standards, our framework eliminates hallucinations and ensures consistency. Finally, we propose an adaptive multimodal fusion network to integrate the extracted knowledge with multimodal EHR data. Our extensive experiments on MIMIC-III mortality and readmission tasks show the superior performance of our REALM framework over baselines, emphasizing the effectiveness of each module. The REALM framework contributes to refining the use of multimodal EHR data in healthcare and bridging the gap with the nuanced medical context essential for informed clinical predictions.
Submitted 10 February, 2024;
originally announced February 2024.
-
Experimental study of Alfvén wave reflection from an Alfvén-speed gradient relevant to the solar coronal holes
Authors:
Sayak Bose,
Jason M. TenBarge,
Troy Carter,
Michael Hahn,
Hantao Ji,
James Juno,
Daniel Wolf Savin,
Shreekrishna Tripathi,
Stephen Vincena
Abstract:
We report the first experimental detection of a reflected Alfvén wave from an Alfvén-speed gradient under conditions similar to those in coronal holes. The experiments were conducted in the Large Plasma Device at the University of California, Los Angeles. We present the experimentally measured dependence of the coefficient of reflection on the wave inhomogeneity parameter, i.e., the ratio of the wavelength of the incident wave to the length scale of the gradient. Two-fluid simulations using the Gkeyll code qualitatively agree with and support the experimental findings. Our experimental results support models of wave heating that rely on wave reflection at low heights from a smooth Alfvén-speed gradient to drive turbulence.
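The inhomogeneity parameter above is the incident wavelength divided by the gradient length scale. A rough estimate of it is sketched below, assuming the ideal-MHD dispersion relation ω = k v_A (which neglects the finite-frequency corrections that matter near the ion cyclotron frequency) and a single ion species; the field, density, frequency and length values are hypothetical, not the Large Plasma Device parameters.

```python
import math

MU_0 = 4e-7 * math.pi         # vacuum permeability [H/m] (classical value)
PROTON_MASS = 1.67262192e-27  # [kg]


def alfven_speed(b_tesla: float, n_per_m3: float,
                 ion_mass_kg: float = PROTON_MASS) -> float:
    """Alfven speed v_A = B / sqrt(mu_0 * n * m_i) for a single ion species."""
    return b_tesla / math.sqrt(MU_0 * n_per_m3 * ion_mass_kg)


def inhomogeneity_parameter(freq_hz: float, v_alfven: float,
                            gradient_length_m: float) -> float:
    """Ratio of the wavelength (lambda = v_A / f, ideal MHD) to the gradient length L."""
    wavelength = v_alfven / freq_hz
    return wavelength / gradient_length_m


# Hypothetical inputs, chosen only to show the arithmetic.
v_a = alfven_speed(b_tesla=0.1, n_per_m3=1e18)
ratio = inhomogeneity_parameter(freq_hz=100e3, v_alfven=v_a, gradient_length_m=5.0)
print(f"v_A = {v_a:.3e} m/s, lambda/L = {ratio:.2f}")
```

A heavier working gas can be modeled by passing its ion mass through `ion_mass_kg`.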
Submitted 9 February, 2024;
originally announced February 2024.
-
Masked LoGoNet: Fast and Accurate 3D Image Analysis for Medical Domain
Authors:
Amin Karimi Monsefi,
Payam Karisani,
Mengxi Zhou,
Stacey Choi,
Nathan Doble,
Heng Ji,
Srinivasan Parthasarathy,
Rajiv Ramnath
Abstract:
Standard modern machine-learning-based imaging methods have faced challenges in medical applications due to the high cost of dataset construction and, thereby, the limited labeled training data available. Additionally, upon deployment, these methods are usually used to process a large volume of data on a daily basis, imposing a high maintenance cost on medical facilities. In this paper, we introduce a new neural network architecture, termed LoGoNet, with a tailored self-supervised learning (SSL) method to mitigate such challenges. LoGoNet integrates a novel feature extractor within a U-shaped architecture, leveraging Large Kernel Attention (LKA) and a dual encoding strategy to capture both long-range and short-range feature dependencies. This is in contrast to existing methods that rely on increasing network capacity to enhance feature extraction. This combination of novel techniques in our model is especially beneficial in medical image segmentation, given the difficulty of learning intricate and often irregular body organ shapes, such as the spleen. Complementarily, we propose a novel SSL method tailored for 3D images to compensate for the lack of large labeled datasets. The method combines masking and contrastive learning techniques within a multi-task learning framework and is compatible with both Vision Transformer (ViT) and CNN-based models. We demonstrate the efficacy of our methods in numerous tasks across two standard datasets (i.e., BTCV and MSD). Benchmark comparisons with eight state-of-the-art models highlight LoGoNet's superior performance in both inference time and accuracy.
Submitted 9 February, 2024;
originally announced February 2024.
-
Mean field control of droplet dynamics with high order finite element computations
Authors:
Guosheng Fu,
Hangjie Ji,
Will Pazner,
Wuchen Li
Abstract:
Liquid droplet dynamics arise widely in biological and engineering applications and involve complex interfacial instabilities and pattern formation, such as droplet merging, splitting, and transport. This paper studies a class of mean field control formulations for these droplet dynamics, which can be used to control and maintain the manipulation of droplets in applications. We first formulate the droplet dynamics as gradient flows of free energies in modified optimal transport metrics with nonlinear mobilities. We then design an optimal control problem for these gradient flows. Finally, we apply the primal-dual hybrid gradient algorithm with high-order finite element methods to simulate the proposed mean field control problems. Numerical examples, including droplet formation, bead-up/spreading, transport, and merging/splitting on a two-dimensional spatial domain, demonstrate the effectiveness of the proposed mean field control mechanism.
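To show the structure of a primal-dual hybrid gradient (PDHG) iteration (dual proximal step, primal proximal step, extrapolation), the sketch below applies the Chambolle-Pock form of PDHG to a deliberately simple stand-in problem, 1D total-variation denoising with a finite-difference operator. This toy problem is an assumption chosen for illustration; it is not the paper's mean field control formulation, and no finite element discretization is involved.

```python
from typing import List


def forward_diff(x: List[float]) -> List[float]:
    """K x: forward differences x[i+1] - x[i]."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def forward_diff_adjoint(y: List[float], n: int) -> List[float]:
    """K^T y, the adjoint of forward_diff on vectors of length n."""
    out = [0.0] * n
    for i, yi in enumerate(y):
        out[i] -= yi
        out[i + 1] += yi
    return out


def pdhg_tv_denoise(b: List[float], lam: float, tau: float = 0.45,
                    sigma: float = 0.45, iters: int = 2000) -> List[float]:
    """PDHG (Chambolle-Pock) for min_x 0.5*||x - b||^2 + lam*||K x||_1.

    Convergence needs tau * sigma * ||K||^2 < 1; here ||K||^2 <= 4, so 0.45*0.45*4 = 0.81.
    """
    n = len(b)
    x = [0.0] * n
    x_bar = list(x)
    y = [0.0] * (n - 1)
    for _ in range(iters):
        # Dual step: the prox of sigma * g* is the projection onto the box [-lam, lam].
        y = [min(lam, max(-lam, yi + sigma * ki)) for yi, ki in zip(y, forward_diff(x_bar))]
        # Primal step: closed-form prox of tau * 0.5*||x - b||^2.
        kty = forward_diff_adjoint(y, n)
        x_new = [(xi - tau * gi + tau * bi) / (1.0 + tau) for xi, gi, bi in zip(x, kty, b)]
        # Extrapolation with theta = 1.
        x_bar = [2.0 * xn - xo for xn, xo in zip(x_new, x)]
        x = x_new
    return x


# With a large penalty the minimizer flattens the step to its mean value.
print([round(v, 3) for v in pdhg_tv_denoise([0.0, 0.0, 2.0, 2.0], lam=10.0)])
```

In a control setting the same pattern applies, with the primal and dual variables, operators and proximal maps replaced by those of the discretized control problem.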
Submitted 8 February, 2024;
originally announced February 2024.
-
Topological metal and high-order Dirac point in cubic Rashba model
Authors:
Haijiao Ji,
Ning Zhang,
Noah F. Q. Yuan
Abstract:
We investigate the properties of the two-dimensional model with Rashba-type spin-orbit coupling cubic in electron momentum. In the normal phase, edge states emerge on open boundaries. In the superconducting phase, edge states could evolve into gapped fermionic edge states. Applications to realistic materials of interface superconductors are also discussed.
Submitted 5 February, 2024;
originally announced February 2024.
-
Quantum teleportation based on the elegant joint measurement
Authors:
Dong Ding,
Ming-Xing Yu,
Ying-Qiu He,
Hao-Sen Ji,
Ting Gao,
Feng-Li Yan
Abstract:
As a generalization of the well-known Bell state measurement (BSM), the elegant joint measurement (EJM) is a novel kind of two-qubit joint measurement, parameterized by a subtle phase factor $θ\in [0,π/2]$. We explore quantum teleportation based on the EJM, inspired by Gisin's idea that quantum entanglement provides not only the quantum channel but also the quantum joint measurement for quantum teleportation. It is a probabilistic teleportation caused by undesired nonunitary quantum evolution. The present scenario has two interesting features. First, it goes beyond the conventional teleportation scenario, which can be included in the present scenario. Second, unlike the BSM, which has a single input and four outcomes, it can provide an adjustable input setting or even multiple measurement settings for the sender (or the controller). Moreover, we show in detail feasible quantum circuits to realize the present scenario, using a few unitary operations and a nonunitary quantum gate.
Submitted 4 February, 2024;
originally announced February 2024.
-
Executable Code Actions Elicit Better LLM Agents
Authors:
Xingyao Wang,
Yangyi Chen,
Lifan Yuan,
Yizhe Zhang,
Yunzhu Li,
Hao Peng,
Heng Ji
Abstract:
Large Language Model (LLM) agents, capable of performing a broad range of actions, such as invoking tools and controlling robots, show great potential in tackling real-world challenges. LLM agents are typically prompted to produce actions by generating JSON or text in a pre-defined format, which is usually limited by a constrained action space (e.g., the scope of pre-defined tools) and restricted flexibility (e.g., inability to compose multiple tools). This work proposes to use executable Python code to consolidate LLM agents' actions into a unified action space (CodeAct). Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations through multi-turn interactions. Our extensive analysis of 17 LLMs on API-Bank and a newly curated benchmark shows that CodeAct outperforms widely used alternatives (up to 20% higher success rate). The encouraging performance of CodeAct motivates us to build an open-source LLM agent that interacts with environments by executing interpretable code and collaborates with users using natural language. To this end, we collect an instruction-tuning dataset, CodeActInstruct, that consists of 7k multi-turn interactions using CodeAct. We show that it can be used with existing data to improve models in agent-oriented tasks without compromising their general capability. CodeActAgent, finetuned from Llama2 and Mistral, is integrated with a Python interpreter and uniquely tailored to perform sophisticated tasks (e.g., model training) using existing libraries and to autonomously self-debug.
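The execute-then-observe cycle described above can be sketched with the standard library alone: each code action runs in a namespace that persists across turns, and either its printed output or its traceback is returned as the observation for the next turn. The class and method names here are hypothetical and this is not the CodeAct implementation; in particular, it does no sandboxing, which a real agent running model-generated code would need.

```python
import contextlib
import io
import traceback
from typing import Any, Dict


class CodeActionExecutor:
    """Hypothetical sketch: run code actions in a persistent namespace, return observations."""

    def __init__(self) -> None:
        # One namespace shared across turns, so later actions can reuse earlier variables.
        self.namespace: Dict[str, Any] = {}

    def step(self, code_action: str) -> str:
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code_action, self.namespace)
        except Exception:
            # The traceback becomes part of the observation, so the agent can self-debug.
            return buffer.getvalue() + traceback.format_exc()
        return buffer.getvalue()


executor = CodeActionExecutor()
executor.step("prices = [3, 5, 8]")
print(executor.step("print(sum(prices) / len(prices))"))
print(executor.step("print(undefined_name)").splitlines()[-1])
```

In a full agent loop, the returned string would be appended to the conversation before the model generates its next code action.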
Submitted 23 May, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Towards Efficient and Exact Optimization of Language Model Alignment
Authors:
Haozhe Ji,
Cheng Lu,
Yilin Niu,
Pei Ke,
Hongning Wang,
Jun Zhu,
Jie Tang,
Minlie Huang
Abstract:
The alignment of language models with human preferences is vital for their application in real-world tasks. The problem is formulated as optimizing the model's policy to maximize the expected reward that reflects human preferences with minimal deviation from the initial policy. Although reinforcement learning (RL) is considered a straightforward solution, it suffers from high variance in policy updates, which impedes efficient policy improvement. Recently, direct preference optimization (DPO) was proposed to optimize the policy directly from preference data. Though simple to implement, DPO is derived from the optimal policy, which is not guaranteed to be achieved in practice; this undermines its convergence to the intended solution.
In this paper, we propose efficient exact optimization (EXO) of the alignment objective. We prove that EXO is guaranteed to optimize in the same direction as RL algorithms asymptotically for arbitrary parametrizations of the policy, while enabling efficient optimization by circumventing the complexities associated with RL algorithms. We compare our method to DPO with both theoretical and empirical analyses, and further demonstrate the advantages of our method over existing approaches on realistic human preference data. Code is available at https://github.com/haozheji/exact-optimization.
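The objective described above is the standard KL-regularized reward maximization, max over π of E_π[r] − β·KL(π ‖ π_ref), whose optimum over a finite response set is π*(y) ∝ π_ref(y)·exp(r(y)/β). The sketch below computes that closed form and the per-pair DPO loss, which is derived by assuming the policy can reach π*. It is a toy illustration of the baseline the abstract discusses; the EXO objective itself is not given in the abstract and is not shown here, and the example probabilities and rewards are hypothetical.

```python
import math
from typing import Dict


def kl_regularized_optimal_policy(ref_probs: Dict[str, float], rewards: Dict[str, float],
                                  beta: float) -> Dict[str, float]:
    """Maximizer of E_pi[r] - beta * KL(pi || pi_ref) over a finite set of responses:
    pi*(y) is proportional to pi_ref(y) * exp(r(y) / beta)."""
    weights = {y: p * math.exp(rewards[y] / beta) for y, p in ref_probs.items()}
    z = sum(weights.values())
    return {y: w / z for y, w in weights.items()}


def softplus(v: float) -> float:
    """Numerically stable log(1 + exp(v))."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float, beta: float) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return softplus(-margin)  # -log sigmoid(m) == softplus(-m)


policy = kl_regularized_optimal_policy({"a": 0.5, "b": 0.5}, {"a": 1.0, "b": 0.0}, beta=1.0)
print(policy)
print(dpo_loss(-4.0, -6.0, -5.0, -5.0, beta=0.1))
```

When the policy and reference assign the same log-ratios to both responses, the margin is zero and the DPO loss equals log 2.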
Submitted 23 February, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Radiatively Cooled Magnetic Reconnection Experiments Driven by Pulsed Power
Authors:
R Datta,
K Chandler,
C E Myers,
J P Chittenden,
A J Crilly,
C Aragon,
D J Ampleford,
J T Banasek,
A Edens,
W R Fox,
S B Hansen,
E C Harding,
C A Jennings,
H Ji,
C C Kuranz,
S V Lebedev,
Q Looker,
S G Patel,
A J Porwitzky,
G A Shipley,
D A Uzdensky,
D A Yager-Elorriaga,
J D Hare
Abstract:
We present evidence for strong radiative cooling in a pulsed-power-driven magnetic reconnection experiment. Two aluminum exploding wire arrays, driven by a 20 MA peak current, 300 ns rise time pulse from the Z machine (Sandia National Laboratories), generate strongly-driven plasma flows ($M_A \approx 7$) with anti-parallel magnetic fields, which form a reconnection layer ($S_L \approx 120$) at the mid-plane. The net cooling rate far exceeds the Alfvénic transit rate ($τ_{\text{cool}}^{-1}/τ_{\text{A}}^{-1} > 100$), leading to strong cooling of the reconnection layer. We determine the advected magnetic field and flow velocity using inductive probes positioned in the inflow to the layer, and inflow ion density and temperature from analysis of visible emission spectroscopy. A sharp decrease in X-ray emission from the reconnection layer, measured using filtered diodes and time-gated X-ray imaging, provides evidence for strong cooling of the reconnection layer after its initial formation. X-ray images also show localized hotspots, regions of strong X-ray emission, with velocities comparable to the expected outflow velocity from the reconnection layer. These hotspots are consistent with plasmoids observed in 3D radiative resistive magnetohydrodynamic simulations of the experiment. X-ray spectroscopy further indicates that the hotspots have a temperature (170 eV) much higher than the bulk layer ($\leq$ 75 eV) and inflow temperatures (about 2 eV), and that these hotspots generate the majority of the high-energy (> 1 keV) emission.
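The dimensionless numbers quoted above can be computed from measured quantities as sketched below, using textbook definitions: Alfvén Mach number M_A = V_in / v_A, Lundquist number S = L·v_A / D_η with magnetic diffusivity D_η = η/μ0, and cooling-to-transit ratio τ_A / τ_cool with τ_A = L / v_A. The abstract does not say which length scale or velocities enter each quantity, so these definitions and the example inputs are assumptions.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability [H/m]


def alfven_mach(inflow_speed: float, v_alfven: float) -> float:
    """M_A = V_in / v_A."""
    return inflow_speed / v_alfven


def lundquist_number(length_m: float, v_alfven: float, resistivity_ohm_m: float) -> float:
    """S = L v_A / D_eta, with magnetic diffusivity D_eta = resistivity / mu_0."""
    return length_m * v_alfven * MU_0 / resistivity_ohm_m


def cooling_to_transit_rate(tau_cool_s: float, length_m: float, v_alfven: float) -> float:
    """tau_cool^-1 / tau_A^-1 = tau_A / tau_cool, with Alfven transit time tau_A = L / v_A."""
    tau_a = length_m / v_alfven
    return tau_a / tau_cool_s


# Hypothetical inputs in SI units, for illustration only.
print(alfven_mach(inflow_speed=7.0e4, v_alfven=1.0e4))
print(lundquist_number(length_m=1.0e-2, v_alfven=1.0e4, resistivity_ohm_m=1.0e-3))
print(cooling_to_transit_rate(tau_cool_s=1.0e-9, length_m=1.0e-2, v_alfven=1.0e4))
```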
Submitted 31 January, 2024;
originally announced January 2024.
-
Depends-Kotlin: A Cross-Language Kotlin Dependency Extractor
Authors:
Qiong Feng,
Xiaotian Ma,
Huan Ji,
Peng Liang
Abstract:
Since Google introduced Kotlin as an official programming language for developing Android apps in 2017, Kotlin has gained widespread adoption in Android development. However, compared to Java, there is limited support for Kotlin code dependency analysis, which is the foundation of software analysis. To bridge this gap, we developed Depends-Kotlin to extract entities and their dependencies in Kotlin source code. Depends-Kotlin not only extracts entities' dependencies in Kotlin code but also extracts dependency relations between Kotlin and Java. Extracting such cross-language dependencies can help developers understand the migration process from Java to Kotlin. Additionally, we used a Java project with confirmed dependencies as a benchmark and converted it into two projects: a Kotlin-only project and a mixed Kotlin and Java project. The dependencies in these two projects were then extracted using our tool. The consistency observed among dependency relations in all three projects confirms the accuracy of Depends-Kotlin. Furthermore, the performance of Depends-Kotlin was assessed on three additional projects of varying sizes. The source code of Depends-Kotlin and the dataset used in this demo paper have been uploaded to https://github.com/XYZboom/depends-kotlin. A screencast presenting Depends-Kotlin is available at https://youtu.be/daZuXOwn1Ls.
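The consistency check described above (comparing dependencies extracted from the converted projects against the confirmed Java benchmark) can be sketched as plain set comparison. The (source, target, kind) tuple representation and the entity names are hypothetical and do not reflect Depends-Kotlin's actual output format; in practice, entity names would first need to be normalized across the Java and Kotlin versions.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: (source entity, target entity, dependency kind).
Dependency = Tuple[str, str, str]


def consistency_report(reference: Set[Dependency],
                       extracted: Set[Dependency]) -> Dict[str, object]:
    """Compare extracted dependencies against a reference set of confirmed ones."""
    shared = reference & extracted
    return {
        "missing": reference - extracted,  # confirmed, but not extracted
        "extra": extracted - reference,    # extracted, but not confirmed
        "precision": len(shared) / len(extracted) if extracted else 1.0,
        "recall": len(shared) / len(reference) if reference else 1.0,
    }


java_reference = {
    ("app.Main", "app.UserRepo", "Call"),
    ("app.UserRepo", "app.Database", "Import"),
}
kotlin_extracted = {
    ("app.Main", "app.UserRepo", "Call"),
    ("app.UserRepo", "app.Database", "Import"),
}
report = consistency_report(java_reference, kotlin_extracted)
print(report["precision"], report["recall"], report["missing"], report["extra"])
```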
Submitted 30 January, 2024;
originally announced January 2024.
-
Transverse oscillation of prominence and filament induced by an EUV wave from the farside of the Sun
Authors:
Yanjie Zhang,
Qingmin Zhang,
De-chao Song,
Haisheng Ji
Abstract:
In this paper, we report multi-angle observations of the transverse oscillation of a prominence and a filament induced by an EUV wave originating from the farside of the Sun on 2014 September 1. The prominence oscillation was simultaneously observed by the Atmospheric Imaging Assembly (AIA) onboard the Solar Dynamics Observatory (SDO) spacecraft and the Extreme-UltraViolet Imager (EUVI) onboard the Behind spacecraft of the Solar Terrestrial Relations Observatory (STEREO). The speed of the shock travelling in interplanetary space exceeds that of the EUV wave, and the coronal dimming area experiences minimal growth. This indicates that the shock wave is driven by the CME, while the EUV wave propagates freely after the lateral motion of the CME flanks has stopped. The observed oscillation direction of the prominence, determined through three-dimensional reconstruction, further supports this point. Moreover, a detailed investigation of the oscillations in the prominence and filament induced by the EUV wave reveals initial amplitudes of 16.08 and 2.15 Mm, periods of 1769 and 1863 s, damping time scales of 2640 and 1259 s, and damping ratios of 1.49 and 0.68, respectively. The radial component of the magnetic field, derived from the prominence and filament oscillation measurements, is estimated to be 5.4 G and 4.1 G, respectively. In turn, using the onset times of the prominence and filament oscillations, the average speeds of the EUV wave are determined to be 498 km s$^{-1}$ and 451 km s$^{-1}$, respectively.
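The reported damping ratios follow from the reported damping times and periods if the damping ratio is taken as τ/P. That definition is an assumption, but it reproduces both quoted values, as this short check shows.

```python
def damping_ratio(damping_time_s: float, period_s: float) -> float:
    """Damping ratio tau / P (assumed definition; it reproduces the reported values)."""
    return damping_time_s / period_s


# (period P, damping time tau) in seconds, as reported in the abstract.
measurements = {"prominence": (1769.0, 2640.0), "filament": (1863.0, 1259.0)}
for name, (period_s, damping_time_s) in measurements.items():
    # Prints 1.49 for the prominence and 0.68 for the filament.
    print(f"{name}: tau/P = {damping_ratio(damping_time_s, period_s):.2f}")
```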
Submitted 28 January, 2024;
originally announced January 2024.
-
Label-free detection of exosomes from different cellular sources based on surface-enhanced Raman spectroscopy combined with machine learning models
Authors:
Yang Li,
Xiaoming Lyu,
Kuo Zhan,
Haoyu Ji,
Lei Qin,
JianAn Huang
Abstract:
Exosomes are significant facilitators of intercellular communication that can reveal cell-cell interactions, signaling pathways, and regulatory mechanisms, and can aid disease diagnostics. Nonetheless, current analyses require large amounts of data for exosome identification, which hampers efficient and timely mechanistic studies and diagnostics. Here, we used a machine-learning-assisted surface-enhanced Raman spectroscopy (SERS) method to detect exosomes derived from six distinct cell lines (HepG2, HeLa, 143B, LO-2, BMSC, and H8) using a small amount of data. Using sodium borohydride-reduced silver nanoparticles, with sodium borohydride solution as the aggregating agent, we collected 100 SERS spectra of each type of exosome and subjected them to multivariate and machine learning analysis. By integrating Principal Component Analysis with Support Vector Machine (PCA-SVM) models, our analysis achieved an accuracy of 94.4% in predicting the cellular source of exosomes. Compared with other machine learning analyses, our method uses a small amount of SERS data to allow simple and rapid exosome detection, enabling timely subsequent studies of cell-cell interactions, communication mechanisms, and disease mechanisms in the life sciences.
Submitted 26 January, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Named Entity Recognition Under Domain Shift via Metric Learning for Life Sciences
Authors:
Hongyi Liu,
Qingyun Wang,
Payam Karisani,
Heng Ji
Abstract:
Named entity recognition is a key component of information extraction (IE), particularly in scientific domains such as biomedicine and chemistry, where large language models (LLMs), e.g., ChatGPT, fall short. We investigate the applicability of transfer learning for adapting a named entity recognition model trained in the biomedical domain (the source domain) to the chemical domain (the target domain). A common practice for training such a model in a few-shot setting is to pretrain it on labeled source data and then fine-tune it on a handful of labeled target examples. In our experiments, we observed that such a model is prone to mislabeling source entities, which often appear in the text, as target entities. To alleviate this problem, we propose a model that transfers knowledge from the source domain to the target domain while projecting source entities and target entities into separate regions of the feature space. This reduces the risk of mislabeling source entities as target entities. Our model consists of two stages: 1) entity grouping in the source domain, which incorporates knowledge from annotated events to establish relations between entities, and 2) entity discrimination in the target domain, which relies on pseudo-labeling and contrastive learning to enhance discrimination between the entities in the two domains. We conduct extensive experiments across three source and three target datasets, demonstrating that our method outperforms the baselines by up to 5% in absolute terms.
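The idea of pulling same-domain entities together while pushing source and target entities apart can be illustrated with a generic supervised contrastive loss. This is a minimal pure-Python sketch under that assumption, not the paper's exact objective; the toy two-dimensional embeddings, the temperature value and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def supervised_contrastive_loss(embeddings: Sequence[Sequence[float]],
                                labels: Sequence[str],
                                temperature: float = 0.1) -> float:
    """Average over anchors of -mean log-softmax similarity to same-label samples."""
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        logits = [_dot(z[i], z[j]) / temperature for j in range(n)]
        # Softmax denominator runs over every sample except the anchor itself.
        log_denom = math.log(sum(math.exp(logits[j]) for j in range(n) if j != i))
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    labels = ["source", "source", "target", "target"]
    separated = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
    print("separated:", supervised_contrastive_loss(separated, labels))
    print("mixed:    ", supervised_contrastive_loss(mixed, labels))
```

The loss is low when source entities cluster together away from target entities and high when the two domains are interleaved, which is the separation the second stage aims for.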
Submitted 31 March, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
Chem-FINESE: Validating Fine-Grained Few-shot Entity Extraction through Text Reconstruction
Authors:
Qingyun Wang,
Zixuan Zhang,
Hongxiang Li,
Xuan Liu,
Jiawei Han,
Huimin Zhao,
Heng Ji
Abstract:
Fine-grained few-shot entity extraction in the chemical domain faces two unique challenges. First, compared with entity extraction tasks in the general domain, sentences from chemical papers usually contain more entities. Second, entity extraction models usually have difficulty extracting entities of long-tailed types. In this paper, we propose Chem-FINESE, a novel sequence-to-sequence (seq2seq) based few-shot entity extraction approach, to address these two challenges. Chem-FINESE has two components: a seq2seq entity extractor that extracts named entities from the input sentence and a seq2seq self-validation module that reconstructs the original input sentence from the extracted entities. Motivated by the observation that a good entity extraction system should extract entities faithfully, the self-validation module uses the extraction results to reconstruct the original input sentence. In addition, we design a new contrastive loss to reduce excessive copying during the extraction process. Finally, we release ChemNER+, a new fine-grained chemical entity extraction dataset annotated by domain experts with the ChemNER schema. Experiments in few-shot settings on the ChemNER+ and CHEMET datasets show that our framework yields absolute F1-score gains of up to 8.26% and 6.84%, respectively.
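The gains above are absolute differences in F1. A minimal sketch of entity-level micro F1 follows, assuming exact matching on (mention, type) pairs; the matching criterion the authors use is not stated in the abstract, and the example annotations and type names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Entity = Tuple[str, str]  # (mention text, entity type)


def micro_f1(gold_sets: Iterable[Set[Entity]],
             pred_sets: Iterable[Set[Entity]]) -> float:
    """Micro-averaged F1 over sentences, counting exact (mention, type) matches."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sets, pred_sets):
        tp += len(gold & pred)  # a prediction counts only if mention and type match
        n_gold += len(gold)
        n_pred += len(pred)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [{("ethanol", "solvent"), ("Pd/C", "catalyst"), ("benzaldehyde", "reactant")}]
    pred = [{("ethanol", "solvent"), ("Pd/C", "reactant")}]
    print(f"F1 = {micro_f1(gold, pred):.2f}")  # precision 1/2, recall 1/3 -> F1 = 0.40
```

Under this scheme a correct mention with the wrong fine-grained type (here "Pd/C") counts as both a false positive and a false negative, which is why long-tailed types weigh on the score.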
Submitted 29 May, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.