Human pose estimation in videos has long been a compelling yet challenging task within the realm of computer vision. Nevertheless, this task remains difficult because of the complex video scenes, such as video defocus and self-occlusion. Recent methods strive to integrate multi-frame visual features generated by a backbone network for pose estimation. However, they often ignore the useful joint information encoded in the initial heatmap, which is a by-product of the backbone generation. Comparatively, methods that attempt to refine the initial heatmap fail to consider any spatio-temporal motion features. As a result, the performance of existing methods for pose estimation falls short due to the lack of ability to leverage both local joint (heatmap) information and global motion (feature) dynamics.
To address this problem, we propose a novel joint-motion mutual learning framework for pose estimation, which effectively concentrates on both local joint dependency and global pixel-level motion dynamics. Specifically, we introduce a context-aware joint learner that adaptively leverages initial heatmaps and motion flow to retrieve robust local joint feature. Given that local joint feature and global motion flow are complementary, we further propose a progressive joint-motion mutual learning that synergistically exchanges information and interactively learns between joint feature and motion flow to improve the capability of the model. More importantly, to capture more diverse joint and motion cues, we theoretically analyze and propose an information orthogonality objective to avoid learning redundant information from multi-cues. Empirical experiments show our method outperforms prior arts on three challenging benchmarks.
Deepfake technology has given rise to a spectrum of novel and compelling applications. Unfortunately, the widespread proliferation of high-fidelity fake videos has led to pervasive confusion and deception, shattering our faith that seeing is believing. One aspect that has been overlooked so far is that current deepfake detection approaches may easily fall into the trap of overfitting, focusing only on forgery clues within one or a few local regions. Moreover, existing works heavily rely on neural networks to extract forgery features, lacking theoretical constraints guaranteeing that sufficient forgery clues are extracted and superfluous features are eliminated. These deficiencies culminate in unsatisfactory accuracy and limited generalizability in real-life scenarios. In this paper, we try to tackle these challenges through three designs: (1) We present a novel framework to capture broader forgery clues by extracting multiple non-overlapping local representations and fusing them into a global semantic-rich feature. (2) Based on the information bottleneck theory, we derive Local Information Loss to guarantee the orthogonality of local representations while preserving comprehensive task-relevant information. (3) Further, to fuse the local representations and remove task-irrelevant information, we arrive at a Global Information Loss through the theoretical analysis of mutual information. Empirically, our method achieves state-of-the-art performance on five benchmark datasets. Our code is available at https://github.com/QingyuLiu/Exposing-the-Deception, hoping to inspire researchers.
LinkVisual retrieval aims to search for the most relevant visual items, e.g., images and videos, from a candidate gallery with a given query item. Accuracy and efficiency are two competing objectives in retrieval tasks. Instead of crafting a new method pursuing further improvement on accuracy, in this paper we propose a multi-teacher distillation framework Whiten-MTD, which is able to transfer knowledge from off-the-shelf pre-trained retrieval models to a lightweight student model for efficient visual retrieval. Furthermore, we discover that the similarities obtained by different retrieval models are diversified and incommensurable, which makes it challenging to jointly distill knowledge from multiple models. Therefore, we propose to whiten the output of teacher models before fusion, which enables effective multi-teacher distillation for retrieval models. Whiten-MTD is conceptually simple and practically effective. Extensive experiments on two landmark image retrieval datasets and one video retrieval dataset demonstrate the effectiveness of our proposed method, and its good balance of retrieval performance and efficiency. Our source code is released at https://github.com/Maryeon/whiten_mtd.
LinkModern operating system kernels, typically written in low-level languages such as C and C++, are tasked with managing extensive memory resources. Memory-related errors, such as memory leak and memory corruption, are common occurrences and constantly introduced. Traditional detection methods often rely on taint analysis, which suffers from scalability issue (i.e., path explosion) when applied to complex OS kernels. Recent research has pivoted towards leveraging techniques like function pairing or similarity analysis to overcome this challenge. These approaches identify memory errors by referencing code that is either frequently used or semantically similar. However, these techniques have limitations when applied to customized code, which may lack a sufficient corpus of code snippets to facilitate effective function pairing or similarity analysis. This deficiency hinders their applicability in kernel analysis where unique or proprietary code is prevalent.
In this paper, we propose a novel methodology for detecting memory bugs based on inconsistent memory management intentions (IMMI). Our insight is that many memory bugs, despite their varied manifestations, stem from a common underlying issue: the ambiguity in ownership and lifecycle management of memory objects, especially when these objects are passed across various functions. Memory bugs emerge when the mem- ory management strategies of the caller and callee functions misalign for a given memory object. IMMI aims to model and clarify these inconsistent intentions, thereby mitigating the prevalence of such bugs. Our methodology offers two primary advantages over existing techniques: (1) It utilizes a fine-grained memory management model that obviates the need for extensive data-flow tracking, and (2) it does not rely on similarity analysis or the identification of function pairs, making it highly effective in the context of customized code. To enhance the capabilities of IMMI, we have integrated a large language model (LLM) to assist in the interpretation of implicit kernel resource management mechanisms. We have implemented IMMI and evaluated it against the Linux kernel. IMMI effectively found 80 new memory bugs (including 23 memory corruptions and 57 memory leaks) with 35% false positive rate. Most of them are missed by the state-of-the-art memory bug detection tools.
Indirect function calls are widely used in building system software like OS kernels for their high flexibility and performance. Statically resolving indirect-call targets has been known to be a hard problem, which is a fundamental requirement for various program analysis and protection tasks. The state-of-the-art techniques, which use type analysis, are still imprecise. In this paper, we present a new approach, TFA, that precisely identifies indirect-call targets. The intuition behind TFA is that type-based analysis and data-flow analysis are inherently complementary in resolving indirect-call targets. TFA incorporates a co-analysis system that makes the best use of both type information and data-flow information. The co-analysis keeps refining the global call graph iteratively, allowing us to achieve an optimal indirect call analysis. We have implemented TFA in LLVM and evaluated it against five famous large-scale programs. The experimental results show that TFA eliminates additional 24% to 59% of indirect-call targets compared with the state-of-the-art approaches, without introducing new false negatives. With the precise indirect-call analysis, we further develop a strengthened fine-grained forward-edge control-flow integrity scheme and apply it to the Linux kernel. We have also used the refined indirect-call analysis results in bug detection, where we have found 8 deep bugs in the Linux kernel. As a generic technique, the precise indirect-call analysis of TFA can also benefit other applications such as compiler optimization and software debloating.
As blockchain smart contracts become more widespread and carry more valuable digital assets, they become an increasingly attractive target for attackers. Over the past few years, smart contracts have been subject to a plethora of devastating attacks, resulting in billions of dollars in financial losses. There has been a notable surge of research interest in identifying defects in smart contracts. However, existing smart contract fuzzing tools are still unsatisfactory. They struggle to screen out meaningful transaction sequences and specify critical inputs for each transaction. As a result, they can only trigger a limited range of contract states, making it difficult to unveil complicated vulnerabilities hidden in the deep state space. In this paper, we shed light on smart contract fuzzing by employing a sequence-aware mutation and seed mask guidance strategy. In particular, we first utilize data-flow-based feedback to determine transaction orders in a meaningful way and further introduce a sequence-aware mutation technique to explore deeper states. Thereafter, we design a mask-guided seed mutation strategy that biases the generated transaction inputs to hit target branches. In addition, we develop a dynamic-adaptive energy adjustment paradigm that balances the fuzzing resource allocation during a fuzzing campaign. We implement our designs into a new smart contract fuzzer named MuFuzz, and extensively evaluate it on three benchmarks. Empirical results demonstrate that MuFuzz outperforms existing tools in terms of both branch coverage and bug finding. Overall, MuFuzz achieves higher branch coverage than state-of-the-art fuzzers (up to 25%) and detects 30 % more bugs than existing bug detectors.
LinkBitcoin is one of the decentralized cryptocurrencies powered by a peer-to-peer blockchain network. Parties who trade in the bitcoin network are not required to disclose any personal information. Such property of anonymity, however, precipitates potential malicious transactions to a certain extent. Indeed, various illegal activities such as money laundering, dark network trading, and gambling in the bitcoin network are nothing new now. While a proliferation of work has been developed to identify malicious bitcoin transactions, the behavior analysis and classification of bitcoin addresses are largely overlooked by existing tools. In this paper, we propose BAClassifier, a tool that can automatically classify bitcoin addresses based on their behaviors. Technically, we come up with the following three key designs. First, we consider casting the transactions of the bitcoin address into an address graph structure, of which we introduce a graph node compression technique and a graph structure augmentation method to characterize a unified graph representation. Furthermore, we leverage a graph feature network to learn the graph representations of each address and generate the graph embeddings. Finally, we aggregate all graph embeddings of an address into the address-level representation, and engage in a classification model to give the address behavior classification. As a side contribution, we construct and release a large-scale annotated dataset that consists of over 2 million real-world bitcoin addresses and concerns 4 types of address behaviors. Experimental results demonstrate that our proposed framework outperforms state-of-the-art bitcoin address classifiers and existing classification models, where the precision and F1-score are 96% and 95%, respectively. Our implementation and dataset are released, hoping to inspire others.
LinkOver the past couple of years, smart contracts have been plagued by multifarious vulnerabilities, which have led to catastrophic financial losses. Their security issues, therefore, have drawn intense attention. As countermeasures, a family of tools has been developed to identify vulnerabilities in smart contracts at the source-code level. Unfortunately, only a small fraction of smart contracts is currently open-sourced. Another spectrum of work is presented to deal with pure bytecode, but most such efforts still suffer from relatively low performance due to the inherent difficulty in restoring abundant semantics in the source code from the bytecode.
This paper proposes a novel cross-modality mutual learning framework for enhancing smart contract vulnerability detection on bytecode. Specifically, we engage in two networks, a student network as the primary network and a teacher network as the auxiliary network. takes two modalities, i.e., source code and its corresponding bytecode as inputs, while is fed with only bytecode. By learning from, is trained to infer the missed source code embeddings and combine both modalities to approach precise vulnerability detection. To further facilitate mutual learning between and, we present a cross-modality mutual learning loss and two transfer losses. As a side contribution, we construct and release a labeled smart contract dataset that concerns four types of common vulnerabilities. Experimental results show that our method significantly surpasses state-of-the-art approaches.
Blockchain smart contracts have given rise to a variety of interesting and compelling applications and emerged as a revolutionary force for the Internet. Smart contracts from various fields now hold over one trillion dollars worth of virtual coins, attracting numerous attacks. Quite a few practitioners have devoted themselves to developing tools for detecting bugs in smart contracts. One line of efforts revolve around static analysis techniques, which heavily suffer from high false positive rates. Another line of works concentrate on fuzzing techniques. Unfortunately, current fuzzing approaches for smart contracts tend to conduct fuzzing starting from the initial state of the contract, which expends too much energy revolving around the initial state of the contract and thus is usually unable to unearth bugs triggered by other states. Moreover, most existing methods treat each branch equally, failing to take care of the branches that are rare or more likely to possess bugs. This might lead to resources wasted on normal branches. In this paper, we try to tackle these challenges from three aspects: 1) generating function invocation sequences, we explicitly consider data dependencies between functions to facilitate exploring richer states. We further prolong a function invocation sequence $\mathcal {S}_{1}$ by appending a new sequence $\mathcal {S}_{2}$ , so that the appended sequence $\mathcal {S}_{2}$ can start fuzzing from states that are different from the initial state; 2) we incorporate a branch distance-based measure to evolve test cases iteratively towards a target branch; 3) we engage a branch search algorithm to discover rare and vulnerable branches, and design an energy allocation mechanism to take care of exercising these crucial branches. We implement IR-Fuzz and extensively evaluate it over 12K real-world contracts. Empirical results show that: (i) IR-Fuzz achieves 28% higher branch coverage than state-of-the-art fuzzing approaches, (ii) IR-Fuzz detects more vulnerabilities and increases the average accuracy of vulnerability detection by 7% over current methods, and (iii) IR-Fuzz is fast, generating an average of 350 test cases per second. Our implementation and dataset are released at https://github.com/Messi-Q/IR-Fuzz , hoping to facilitate future research.
Link