diff --git a/README.md b/README.md
index 79da64b..638b91b 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # Awesome Binary Similarity
 
-| Title | Venue | Year | Paper | Slide | Video | Github |
+| Title | Venue | Year | Paper | Slide | Video | Github |
 | :----------------------------------------------------------: | :----------: | :--: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
 |Cross-Inlining Binary Function Similarity Detection| ICSE | 2024 | [Link](https://dl.acm.org/doi/abs/10.1145/3597503.3639080) | | | [link](https://github.com/island255/cross-inlining_binary_function_similarity)|
 | Improving ML-based Binary Function Similarity Detection by Assessing and Deprioritizing Control Flow Graph Features | Usenix | 2024 | [link](https://www.usenix.org/system/files/usenixsecurity24-wang-jialai.pdf) | | | |
@@ -23,7 +23,7 @@
 |Practical Binary Code Similarity Detection with BERT-based Transferable Similarity Learning | ACSAC | 2022 | [link](https://dl.acm.org/doi/abs/10.1145/3564625.3567975)| [link](https://www.acsac.org/2022/program/papers/76-Ahn-Software_Security_I.pdf)| | [link](https://github.com/asw0316/binshot)|
 |Improving cross-platform binary analysis using representation learning via graph alignment | ISSTA | 2022 | [link](https://dl.acm.org/doi/pdf/10.1145/3533767.3534383)| | [link](https://www.youtube.com/watch?v=rK1CDMauaZU&t=89s) | [link](https://github.com/yonsei-cysec/XBA)|
 |jTrans: Jump-Aware Transformer for Binary Code Similarity | ISSTA | 2022 | [link](https://arxiv.org/pdf/2205.12713.pdf)| | [link](https://www.youtube.com/watch?v=rAirmnUsC1k) | [link](https://github.com/vul337/jTrans/)|
-|COBRA-GCN: Contrastive Learning to Optimize Binary Representation Analysis with Graph Convolutional Networks | DIMVA | 2022 | [link](https://dl.acm.org/doi/abs/10.1007/978-3-031-09484-2_4)| | | |
+|COBRA-GCN: Contrastive Learning to Optimize Binary Representation Analysis with Graph Convolutional Networks | DIMVA | 2022 | [link](https://dl.acm.org/doi/abs/10.1007/978-3-031-09484-2_4)| | | |
 |A Large-Scale Empirical Analysis of the Vulnerabilities Introduced by Third-Party Components in IoT Firmware | ISSTA | 2022 | [link](https://doi.org/10.1145/3533767.3534366)| | [link](https://www.youtube.com/watch?v=H2o45YRguMM) | [link](https://github.com/BBge/FirmSecDataset)|
 |How Machine Learning Is Solving the Binary Function Similarity Problem | Usenix | 2022 | [link](https://www.s3.eurecom.fr/docs/usenixsec22_marcelli.pdf)| |[link](https://www.youtube.com/watch?v=e9bab7GpwnI) | [link](https://github.com/Cisco-Talos/binary_function_similarity)|
 |Enhancing DNN-Based Binary Code Function Search With Low-Cost Equivalence Checking | TSE | 2022 | [link](https://ieeexplore.ieee.org/document/9707874)| | | [link](https://github.com/computer-analysis/BinUSE)|
@@ -37,185 +37,184 @@
 |EnBinDiff: Identifying Data-Only Patches for Binaries | TDSC | 2021 | [link](https://ieeexplore.ieee.org/document/9645381)| | | |
 |BinDiffNN: Learning Distributed Representation of Assembly for Robust Binary Diffing Against Semantic Differences | TSE | 2021 | [link](https://ieeexplore.ieee.org/document/9470904)| | | [link](https://github.com/sami2316/bindiff_NN)|
 | Codee: A Tensor Embedding Scheme for Binary Code Search | TSE | 2021 |[link](https://ieeexplore.ieee.org/document/9345532) | | | [link](https://github.com/ycachy/Codee)|
-| Revisiting Binary Code Similarity Analysis using Interpretable Feature Engineering and Lessons Learned | TSE(revision) | 2021 | [link](https://arxiv.org/pdf/2011.10749.pdf) | | | [link](https://github.com/SoftSec-KAIST/TikNib) |
-| How could Neural Networks understand Programs? | ICML 2021 | 2021 | [link](https://arxiv.org/pdf/2105.04297.pdf) | | [link](https://github.com/pdlan/OSCAR) ||
-| Multi-threshold token-based code clone detection | SANER 2021 | 2021 | [link](https://arxiv.org/pdf/2002.05204.pdf) | | ||
-| FastSpec: Scalable Generation and Detection of Spectre Gadgets Using Neural Embeddings | IEEE Euro S&P 2021 | 2021 | [link](https://arxiv.org/pdf/2006.14147.pdf) | | [link](https://www.youtube.com/watch?v=WskRnEY7oCs) | [link](https://github.com/vernamlab/FastSpec) |
-| TREX: Learning Execution Semantics from Micro-Traces for Binary Similarity | | 2020 | [link](https://arxiv.org/pdf/2012.08680.pdf) | | | [link](https://github.com/CUMLSec/trex) |
-| Similarity of Binaries Across Optimization Levels and Obfuscation | ESORICS 2020 | 2020 | [link](https://books.google.com.hk/books?id=sqT8DwAAQBAJ&pg=PA295&lpg=PA295&dq=Similarity+of+Binaries+Across+Optimization+Levels+and+Obfuscation&source=bl&ots=OFw-NpBFEJ&sig=ACfU3U2DFjxq5lFEM2smLXvWRNf8dyX-TQ&hl=en&sa=X&ved=2ahUKEwiZvuKSk93yAhXCB94KHYNeA_YQ6AF6BAgPEAM#v=onepage&q=Similarity%20of%20Binaries%20Across%20Optimization%20Levels%20and%20Obfuscation&f=false) | | [link](https://www.youtube.com/watch?v=Pi7wsCvfBa8) | |
-| Open-source tools and benchmarks for code-clone detection: past, present, and future trends | | 2020 | [link](https://dl.acm.org/doi/abs/10.1145/3381307.3381310) | | | |
-| Semantically Find Similar Binary Codes with Mixed Key Instruction Sequence | | 2020 | | | | |
-| LibDX: A Cross-Platform and Accurate System to Detect Third-Party Libraries in Binary Code | | 2020 | [link](https://ieeexplore.ieee.org/document/9054845) | | | |
-| Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree | SANER | 2020 | [link](https://arxiv.org/pdf/2002.08653.pdf) | | | |
-| What You See is What it Means! Semantic Representation Learning of Code based on Visualization and Transfer Learning | | 2020 | [link](https://arxiv.org/pdf/2002.02650.pdf) | | | |
-| Clone Detection on Large Scala Codebases | | 2020 | [link](https://ieeexplore.ieee.org/document/9047640) | | | |
-| CloneCompass: Visualizations for Code Clone Analysis | | 2020 | [link](https://dspace.library.uvic.ca/bitstream/handle/1828/11729/Ying_Wang_MSc_2020.pdf?sequence=1&isAllowed=y) | | | |
-| DEEPBINDIFF: Learning Program-Wide Code Representations for Binary Diffing | NDSS | 2020 | [link](https://www.ndss-symposium.org/wp-content/uploads/2020/02/24311.pdf) | | [link](https://www.youtube.com/watch?v=TB50csOprMs) | [link](https://github.com/yueduan/DeepBinDiff) |
-| VGraph: A Robust Vulnerable Code Clone Detection System Using Code Property Triplets | EuroS&P | 2020 | [link](https://www2.seas.gwu.edu/~howie/publications/VGraph-EuroSP20.pdf) | | | |
-| Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection | AAAI | 2020 | [link](https://keenlab.tencent.com/en/whitepapers/Ordermatters.pdf) | | | |
-| Similarity Metric Method for Binary Basic Blocks of Cross-Instruction Set Architecture | NDSS | 2020 | [link](https://www.ndss-symposium.org/wp-content/uploads/bar2020-23002.pdf) | | | [link](https://github.com/zhangxiaochuan/MIRROR) |
+| Revisiting Binary Code Similarity Analysis using Interpretable Feature Engineering and Lessons Learned | TSE(revision) | 2021 | [link](https://arxiv.org/pdf/2011.10749.pdf) | | | [link](https://github.com/SoftSec-KAIST/TikNib) |
+| How could Neural Networks understand Programs? | ICML 2021 | 2021 | [link](https://arxiv.org/pdf/2105.04297.pdf) | | [link](https://github.com/pdlan/OSCAR) ||
+| Multi-threshold token-based code clone detection | SANER 2021 | 2021 | [link](https://arxiv.org/pdf/2002.05204.pdf) | | ||
+| FastSpec: Scalable Generation and Detection of Spectre Gadgets Using Neural Embeddings | IEEE Euro S&P 2021 | 2021 | [link](https://arxiv.org/pdf/2006.14147.pdf) | | [link](https://www.youtube.com/watch?v=WskRnEY7oCs) | [link](https://github.com/vernamlab/FastSpec) |
+| TREX: Learning Execution Semantics from Micro-Traces for Binary Similarity | | 2020 | [link](https://arxiv.org/pdf/2012.08680.pdf) | | | [link](https://github.com/CUMLSec/trex) |
+| Similarity of Binaries Across Optimization Levels and Obfuscation | ESORICS 2020 | 2020 | [link](https://books.google.com.hk/books?id=sqT8DwAAQBAJ&pg=PA295&lpg=PA295&dq=Similarity+of+Binaries+Across+Optimization+Levels+and+Obfuscation&source=bl&ots=OFw-NpBFEJ&sig=ACfU3U2DFjxq5lFEM2smLXvWRNf8dyX-TQ&hl=en&sa=X&ved=2ahUKEwiZvuKSk93yAhXCB94KHYNeA_YQ6AF6BAgPEAM#v=onepage&q=Similarity%20of%20Binaries%20Across%20Optimization%20Levels%20and%20Obfuscation&f=false) | | [link](https://www.youtube.com/watch?v=Pi7wsCvfBa8) | |
+| Open-source tools and benchmarks for code-clone detection: past, present, and future trends | | 2020 | [link](https://dl.acm.org/doi/abs/10.1145/3381307.3381310) | | | |
+| Semantically Find Similar Binary Codes with Mixed Key Instruction Sequence | | 2020 | [link](https://www.sciencedirect.com/science/article/abs/pii/S0950584920300732) | | | |
+| LibDX: A Cross-Platform and Accurate System to Detect Third-Party Libraries in Binary Code | | 2020 | [link](https://ieeexplore.ieee.org/document/9054845) | | | |
+| Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree | SANER | 2020 | [link](https://arxiv.org/pdf/2002.08653.pdf) | | | |
+| What You See is What it Means! Semantic Representation Learning of Code based on Visualization and Transfer Learning | | 2020 | [link](https://arxiv.org/pdf/2002.02650.pdf) | | | |
+| Clone Detection on Large Scala Codebases | | 2020 | [link](https://ieeexplore.ieee.org/document/9047640) | | | |
+| CloneCompass: Visualizations for Code Clone Analysis | | 2020 | [link](https://dspace.library.uvic.ca/bitstream/handle/1828/11729/Ying_Wang_MSc_2020.pdf?sequence=1&isAllowed=y) | | | |
+| DEEPBINDIFF: Learning Program-Wide Code Representations for Binary Diffing | NDSS | 2020 | [link](https://www.ndss-symposium.org/wp-content/uploads/2020/02/24311.pdf) | | [link](https://www.youtube.com/watch?v=TB50csOprMs) | [link](https://github.com/yueduan/DeepBinDiff) |
+| VGraph: A Robust Vulnerable Code Clone Detection System Using Code Property Triplets | EuroS&P | 2020 | [link](https://www2.seas.gwu.edu/~howie/publications/VGraph-EuroSP20.pdf) | | | |
+| Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection | AAAI | 2020 | [link](https://keenlab.tencent.com/en/whitepapers/Ordermatters.pdf) | | | |
+| Similarity Metric Method for Binary Basic Blocks of Cross-Instruction Set Architecture | NDSS | 2020 | [link](https://www.ndss-symposium.org/wp-content/uploads/bar2020-23002.pdf) | | | [link](https://github.com/zhangxiaochuan/MIRROR) |
 | Investigating Graph Embedding Neural Networks with Unsupervised Features Extraction for Binary Analysis | NDSS Workshop on Binary Analysis Research (BAR) | 2019 | [link](https://www.ndss-symposium.org/wp-content/uploads/bar2019_20_Massarelli_paper.pdf) | | | [link](https://github.com/lucamassarelli/Unsupervised-Features-Learning-For-Binary-Similarity) |
-| Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization | IEEE S&P | 2019 | [link](https://www.computer.org/csdl/proceedings-article/sp/2019/666000a038/19skfc3ZfKo) | [link](https://pdfs.semanticscholar.org/38ae/cd9be307867e375b17597499e3e8be2d4930.pdf) | [link](https://www.youtube.com/watch?v=6ethsho5uJA&feature=emb_title) | |
-| Semantic-Based Representation Binary Clone Detection for Cross-Architectures in the Internet of Things | MDPI | 2019 | [link](https://www.mdpi.com/2076-3417/9/16/3283/pdf) | | | |
-| A Survey of Binary Code Similarity | CSUR | 2019 | [link](https://arxiv.org/pdf/1909.11424.pdf) | | | |
-| 代码克隆检测研究进展 | 软件学报 | 2019 | [link](https://xin-xia.github.io/publication/rjxb181.pdf) | | | |
-| A Systematic Review on Code Clone Detection | | 2019 | [link](https://ieeexplore.ieee.org/document/8719895) | | | |
-| A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis | NDSS | 2019 | [link](https://arxiv.org/pdf/1812.09652.pdf) | | | [link](https://github.com/nlp-code-analysis/cross-arch-instr-model) |
-| Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs | NDSS | 2019 | [link](https://www.ndss-symposium.org/wp-content/uploads/2019/02/ndss2019_11-4_Zuo_paper.pdf) | [link](https://www.ndss-symposium.org/wp-content/uploads/ndss2019_11-4_Zuo_slides.pdf) | [link](https://www.youtube.com/watch?v=-BeqwMPQNrw&list=PLfUWWM-POgQvnPOa9Bo1AyKplMkOGfHUT&index=5&t=1s) | [model](https://nmt4binaries.github.io/) |
-| SAFE: Self-Attentive Function Embeddings for Binary Similarity | | 2019 | [link](https://arxiv.org/pdf/1811.05296.pdf) | [link](https://www.dimva2019.org/wp-content/uploads/sites/31/2019/06/DIMVA19-Slides-22.pdf) | | [link](https://github.com/gadiluna/SAFE) |
-| Learning-Based Recursive Aggregation of Abstract Syntax Trees for Code Clone Detection | SANER | 2019 | [link](https://ieeexplore.ieee.org/document/8668039) | | | |
-| 基于深度学习的跨平台二进制代码关联分析 | | 2019 | [link](https://kns.cnki.net/KCMS/detail/detail.aspx?dbname=CMFD202001&filename=1019646524.nh) | | | |
-| CVSkSA: cross-architecture vulnerability search in firmware based on kNN-SVM and attributed control flow graph | | 2019 | [link](https://link.springer.com/article/10.1007/s11219-018-9435-5) | | | |
+| Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization | IEEE S&P | 2019 | [link](https://www.computer.org/csdl/proceedings-article/sp/2019/666000a038/19skfc3ZfKo) | [link](https://pdfs.semanticscholar.org/38ae/cd9be307867e375b17597499e3e8be2d4930.pdf) | [link](https://www.youtube.com/watch?v=6ethsho5uJA&feature=emb_title) | |
+| Semantic-Based Representation Binary Clone Detection for Cross-Architectures in the Internet of Things | MDPI | 2019 | [link](https://www.mdpi.com/2076-3417/9/16/3283/pdf) | | | |
+| A Survey of Binary Code Similarity | CSUR | 2019 | [link](https://arxiv.org/pdf/1909.11424.pdf) | | | |
+| 代码克隆检测研究进展 | 软件学报 | 2019 | [link](https://xin-xia.github.io/publication/rjxb181.pdf) | | | |
+| A Systematic Review on Code Clone Detection | | 2019 | [link](https://ieeexplore.ieee.org/document/8719895) | | | |
+| A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis | NDSS | 2019 | [link](https://arxiv.org/pdf/1812.09652.pdf) | | | [link](https://github.com/nlp-code-analysis/cross-arch-instr-model) |
+| Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs | NDSS | 2019 | [link](https://www.ndss-symposium.org/wp-content/uploads/2019/02/ndss2019_11-4_Zuo_paper.pdf) | [link](https://www.ndss-symposium.org/wp-content/uploads/ndss2019_11-4_Zuo_slides.pdf) | [link](https://www.youtube.com/watch?v=-BeqwMPQNrw&list=PLfUWWM-POgQvnPOa9Bo1AyKplMkOGfHUT&index=5&t=1s) | [model](https://nmt4binaries.github.io/) |
+| SAFE: Self-Attentive Function Embeddings for Binary Similarity | | 2019 | [link](https://arxiv.org/pdf/1811.05296.pdf) | [link](https://www.dimva2019.org/wp-content/uploads/sites/31/2019/06/DIMVA19-Slides-22.pdf) | | [link](https://github.com/gadiluna/SAFE) |
+| Learning-Based Recursive Aggregation of Abstract Syntax Trees for Code Clone Detection | SANER | 2019 | [link](https://ieeexplore.ieee.org/document/8668039) | | | |
+| 基于深度学习的跨平台二进制代码关联分析 | | 2019 | [link](https://kns.cnki.net/KCMS/detail/detail.aspx?dbname=CMFD202001&filename=1019646524.nh) | | | |
+| CVSkSA: cross-architecture vulnerability search in firmware based on kNN-SVM and attributed control flow graph | | 2019 | [link](https://link.springer.com/article/10.1007/s11219-018-9435-5) | | | |
 | Function matching between binary executables: efficient algorithms and features | JCVHT | 2019 | [link](https://users.auth.gr/kehagiat/Papers/journal/2019JCVHuku.pdf) | | | |
-| BinMatch: A Semantics-based Hybrid Approach on Binary Code Clone Analysis | ICSME | 2018 | [link](https://loccs.sjtu.edu.cn/~romangol/publications/icsme18.pdf) | | | |
-| αDiff: Cross-Version Binary Code Similarity Detection with DNN | ASE | 2018 | [link](https://dl.acm.org/doi/pdf/10.1145/3238147.3238199?download=true) | | | [dataset](https://github.com/twelveand0/alphadiff-dataset) |
-| Binary Similarity Detection Using Machine Learning | PLDI | 2018 | [link](https://dl.acm.org/doi/10.1145/3264820.3264821) | | | |
-| CCAligner: A Token Based Large-Gap Clone Detector | ICSE | 2018 | [link](http://home.ustc.edu.cn/~wpc520/papers/CCAligner.pdf) | | | |
-| Oreo: Detection of Clones in the Twilight Zone | FSE | 2018 | [link](https://arxiv.org/pdf/1806.05837.pdf) | | | |
-| VulSeeker: A Semantic Learning Based Vulnerability Seeker for Cross-platform Binary | ASE | 2018 | [link](https://dl.acm.org/doi/10.1145/3238147.3240480) | | | [link](https://github.com/buptsseGJ/VulSeeker) |
-| VulSeeker-pro: enhanced semantic learning based binary vulnerability seeker with emulation | | 2018 | [link](https://dl.acm.org/doi/10.1145/3236024.3275524) | | | |
-| FirmUp: Precise Static Detection of Common Vulnerabilities in Firmware | | 2018 | [link](https://dl.acm.org/doi/10.1145/3296957.3177157) | | | |
-| BINARM: Scalable and Efficient Detection of Vulnerabilities in Firmware Images of Intelligent Electronic Devices | | 2018 | [link](https://users.encs.concordia.ca/~wang/papers/dimva18paria.pdf) | | | |
-| A Resilient and Efficient System for Identifying FOSS Functions in Malware Binaries | | 2018 | [link](https://dl.acm.org/doi/10.1145/3175492) | | | |
-| Beyond Precision and Recall: Understanding Uses (and Misuses) of Similarity Hashes in Binary Analysis | | 2018 | [link](https://dl.acm.org/doi/10.1145/3176258.3176306) | [link](https://pagabuc.me/slides/codaspy18_pagani.slides.pdf) | | |
-| BCD: Decomposing Binary Code Into Components Using Graph-Based Clustering | ASIA CCS | 2018 | [link](https://dl.acm.org/doi/10.1145/3196494.3196504) | | | |
-| A Deep Learning Approach to Program Similarity | MASES | 2018 | [link](https://dl.acm.org/doi/10.1145/3243127.3243131) | | | |
-| Recurrent Neural Network for Code Clone Detection | SEIM | 2018 | [link](https://seim-conf.org/media/materials/2018/proceedings/SEIM-2018_Short_Papers.pdf#page=48) | | | |
-| The Adverse Effects of Code Duplication in Machine Learning Models of Code | | 2018 | [link](https://dl.acm.org/doi/10.1145/3359591.3359735) | | [link](https://www.youtube.com/watch?v=uvWfpE2LhOo) | |
-| Benchmarks for software clone detection: A ten-year retrospective | SANER | 2018 | [link](https://ieeexplore.ieee.org/document/8330194) | | | |
-| Binary Code Clone Detection across Architectures and Compiling Configurations | ICPC | 2017 | [link](https://dl.acm.org/doi/10.1109/ICPC.2017.22) | | | |
-| Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection | ACM CCS | 2017 | [link](https://arxiv.org/pdf/1708.06525.pdf) | | | [link](https://github.com/Yunlongs/Genimi) |
-| BinSequence: Fast, Accurate and Scalable Binary Code Reuse Detection | ASIA CCS | 2017 | [link](https://dl.acm.org/doi/10.1145/3052973.3052974) | | | |
-| BinShape: Scalable and Robust Binary Library Function Identification Using Function Shape | DIMVA | 2017 | [link](https://link.springer.com/chapter/10.1007/978-3-319-60876-1_14) | | | |
-| Compiler-agnostic function detection in binaries | IEEE EuroS&P | 2017 | [link](https://ieeexplore.ieee.org/document/7961979) | | | [link](https://github.com/uxmal/nucleus) |
-| BinSign: Fingerprinting binary functions to support automated analysis of code executables | | 2017 | [link](https://spectrum.library.concordia.ca/982206/1/Nouh_MASc_S2017.pdf) | | | |
-| Similarity of binaries through re-optimization | PLDI | 2017 | [link](https://dl.acm.org/doi/10.1145/3062341.3062387) | [link](https://nimrodpar.github.io/assets/presentations/gitz-pldi17.pdf) | | |
-| Transferring code-clone detection and analysis to practice | ICSE-SEIP | 2017 | [link](https://dl.acm.org/doi/10.1109/ICSE-SEIP.2017.6) | | | |
-| Cryptographic Function Detection in Obfuscated Binaries via Bit-Precise Symbolic Loop Mapping | IEEE S&P | 2017 | [link](https://ieeexplore.ieee.org/document/7958617) | | | |
-| Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code | IJCAI | 2017 | [link](https://www.ijcai.org/Proceedings/2017/0423.pdf) | | | |
-| Extracting Conditional Formulas for Cross-Platform Bug Search | ASIA CCS | 2017 | [link](https://dl.acm.org/doi/10.1145/3052973.3052995) | | | |
-| SPAIN: Security Patch Analysis for Binaries Towards Understanding the Pain and Pills | ICSE | 2017 | [link](https://ieeexplore.ieee.org/document/7985685) | | | |
-| CCLearner: A Deep Learning-Based Clone Detection Approach | | 2017 | [link](http://people.cs.vt.edu/nm8247/publications/icsme-research-118-camera-ready.pdf) | | | [link](https://github.com/liuqingli/CCLearner) |
-| BinSim: Trace-based Semantic Binary Diffing via System Call Sliced Segment Equivalence Checking | USENIX | 2017 | [link](https://www.usenix.org/system/files/conference/usenixsecurity17/sec17-ming.pdf) | [link](https://www.usenix.org/sites/default/files/conference/protected-files/usenixsecurity17_slides_jiang_ming.pdf) | [link](https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/ming) | |
-| In-memory Fuzzing for Binary Code Similarity Analysis | ASE | 2017 | [link](https://dl.acm.org/doi/10.5555/3155562.3155606) | | | |
-| DéjàVu: a map of code duplicates on GitHub | OOPSLA | 2017 | [link](https://dl.acm.org/doi/10.1145/3133908) | | | |
-| Some from Here, Some from There: Cross-project Code Reuse in GitHub | MSR | 2017 | [link](https://dl.acm.org/doi/10.1109/MSR.2017.15) | | | |
-| CVSSA: Cross-Architecture Vulnerability Search in Firmware Based on Support Vector Machine and Attributed Control Flow Graph | | 2017 | [link](https://link.springer.com/article/10.1007/s11219-018-9435-5) | | | |
+| BinMatch: A Semantics-based Hybrid Approach on Binary Code Clone Analysis | ICSME | 2018 | [link](https://loccs.sjtu.edu.cn/~romangol/publications/icsme18.pdf) | | | |
+| αDiff: Cross-Version Binary Code Similarity Detection with DNN | ASE | 2018 | [link](https://dl.acm.org/doi/pdf/10.1145/3238147.3238199?download=true) | | | [dataset](https://github.com/twelveand0/alphadiff-dataset) |
+| Binary Similarity Detection Using Machine Learning | PLDI | 2018 | [link](https://dl.acm.org/doi/10.1145/3264820.3264821) | | | |
+| CCAligner: A Token Based Large-Gap Clone Detector | ICSE | 2018 | [link](http://home.ustc.edu.cn/~wpc520/papers/CCAligner.pdf) | | | |
+| Oreo: Detection of Clones in the Twilight Zone | FSE | 2018 | [link](https://arxiv.org/pdf/1806.05837.pdf) | | | |
+| VulSeeker: A Semantic Learning Based Vulnerability Seeker for Cross-platform Binary | ASE | 2018 | [link](https://dl.acm.org/doi/10.1145/3238147.3240480) | | | [link](https://github.com/buptsseGJ/VulSeeker) |
+| VulSeeker-pro: enhanced semantic learning based binary vulnerability seeker with emulation | | 2018 | [link](https://dl.acm.org/doi/10.1145/3236024.3275524) | | | |
+| FirmUp: Precise Static Detection of Common Vulnerabilities in Firmware | | 2018 | [link](https://dl.acm.org/doi/10.1145/3296957.3177157) | | | |
+| BINARM: Scalable and Efficient Detection of Vulnerabilities in Firmware Images of Intelligent Electronic Devices | | 2018 | [link](https://users.encs.concordia.ca/~wang/papers/dimva18paria.pdf) | | | |
+| A Resilient and Efficient System for Identifying FOSS Functions in Malware Binaries | | 2018 | [link](https://dl.acm.org/doi/10.1145/3175492) | | | |
+| Beyond Precision and Recall: Understanding Uses (and Misuses) of Similarity Hashes in Binary Analysis | | 2018 | [link](https://dl.acm.org/doi/10.1145/3176258.3176306) | [link](https://pagabuc.me/slides/codaspy18_pagani.slides.pdf) | | |
+| BCD: Decomposing Binary Code Into Components Using Graph-Based Clustering | ASIA CCS | 2018 | [link](https://dl.acm.org/doi/10.1145/3196494.3196504) | | | |
+| A Deep Learning Approach to Program Similarity | MASES | 2018 | [link](https://dl.acm.org/doi/10.1145/3243127.3243131) | | | |
+| Recurrent Neural Network for Code Clone Detection | SEIM | 2018 | [link](https://seim-conf.org/media/materials/2018/proceedings/SEIM-2018_Short_Papers.pdf#page=48) | | | |
+| The Adverse Effects of Code Duplication in Machine Learning Models of Code | | 2018 | [link](https://dl.acm.org/doi/10.1145/3359591.3359735) | | [link](https://www.youtube.com/watch?v=uvWfpE2LhOo) | |
+| Benchmarks for software clone detection: A ten-year retrospective | SANER | 2018 | [link](https://ieeexplore.ieee.org/document/8330194) | | | |
+| Binary Code Clone Detection across Architectures and Compiling Configurations | ICPC | 2017 | [link](https://dl.acm.org/doi/10.1109/ICPC.2017.22) | | | |
+| Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection | ACM CCS | 2017 | [link](https://arxiv.org/pdf/1708.06525.pdf) | | | [link](https://github.com/Yunlongs/Genimi) |
+| BinSequence: Fast, Accurate and Scalable Binary Code Reuse Detection | ASIA CCS | 2017 | [link](https://dl.acm.org/doi/10.1145/3052973.3052974) | | | |
+| BinShape: Scalable and Robust Binary Library Function Identification Using Function Shape | DIMVA | 2017 | [link](https://link.springer.com/chapter/10.1007/978-3-319-60876-1_14) | | | |
+| Compiler-agnostic function detection in binaries | IEEE EuroS&P | 2017 | [link](https://ieeexplore.ieee.org/document/7961979) | | | [link](https://github.com/uxmal/nucleus) |
+| BinSign: Fingerprinting binary functions to support automated analysis of code executables | | 2017 | [link](https://spectrum.library.concordia.ca/982206/1/Nouh_MASc_S2017.pdf) | | | |
+| Similarity of binaries through re-optimization | PLDI | 2017 | [link](https://dl.acm.org/doi/10.1145/3062341.3062387) | [link](https://nimrodpar.github.io/assets/presentations/gitz-pldi17.pdf) | | |
+| Transferring code-clone detection and analysis to practice | ICSE-SEIP | 2017 | [link](https://dl.acm.org/doi/10.1109/ICSE-SEIP.2017.6) | | | |
+| Cryptographic Function Detection in Obfuscated Binaries via Bit-Precise Symbolic Loop Mapping | IEEE S&P | 2017 | [link](https://ieeexplore.ieee.org/document/7958617) | | | |
+| Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code | IJCAI | 2017 | [link](https://www.ijcai.org/Proceedings/2017/0423.pdf) | | | |
+| Extracting Conditional Formulas for Cross-Platform Bug Search | ASIA CCS | 2017 | [link](https://dl.acm.org/doi/10.1145/3052973.3052995) | | | |
+| SPAIN: Security Patch Analysis for Binaries Towards Understanding the Pain and Pills | ICSE | 2017 | [link](https://ieeexplore.ieee.org/document/7985685) | | | |
+| CCLearner: A Deep Learning-Based Clone Detection Approach | | 2017 | [link](http://people.cs.vt.edu/nm8247/publications/icsme-research-118-camera-ready.pdf) | | | [link](https://github.com/liuqingli/CCLearner) |
+| BinSim: Trace-based Semantic Binary Diffing via System Call Sliced Segment Equivalence Checking | USENIX | 2017 | [link](https://www.usenix.org/system/files/conference/usenixsecurity17/sec17-ming.pdf) | [link](https://www.usenix.org/sites/default/files/conference/protected-files/usenixsecurity17_slides_jiang_ming.pdf) | [link](https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/ming) | |
+| In-memory Fuzzing for Binary Code Similarity Analysis | ASE | 2017 | [link](https://dl.acm.org/doi/10.5555/3155562.3155606) | | | |
+| DéjàVu: a map of code duplicates on GitHub | OOPSLA | 2017 | [link](https://dl.acm.org/doi/10.1145/3133908) | | | |
+| Some from Here, Some from There: Cross-project Code Reuse in GitHub | MSR | 2017 | [link](https://dl.acm.org/doi/10.1109/MSR.2017.15) | | | |
+| CVSSA: Cross-Architecture Vulnerability Search in Firmware Based on Support Vector Machine and Attributed Control Flow Graph | | 2017 | [link](https://link.springer.com/article/10.1007/s11219-018-9435-5) | | | |
 | Identifying Functionally Similar Code in Complex Codebases | ICPC | 2016 | [link](http://www.cs.columbia.edu/~simha/preprint_icpc16.pdf) | | | [link](https://github.com/Programming-Systems-Lab/ioclones) |
-| Scalable graph-based bug search for firmware images (Genius) | ASM CCS | 2016 | [link](https://www.cs.ucr.edu/~heng/pubs/genius-ccs16.pdf) | | [link](https://www.youtube.com/watch?v=R9TPqflLGNs) | [link](https://github.com/qian-feng/Gencoding) |
-| Cross-Architecture Binary Semantics Understanding via Similar Code Comparison | IEEE SANER | 2016 | [link](https://loccs.sjtu.edu.cn/~romangol/publications/saner16.pdf) | | | |
-| discovRE: Efficient cross-architecture identification of bugs in binary code | NDSS | 2016 | [link](https://net.cs.uni-bonn.de/fileadmin/ag/martini/Staff/yakdan/discovre_ndss2016.pdf) | | | |
-| BinGo: Cross-architecture cross-OS Binary Search | FSE | 2016 | [link](https://dl.acm.org/doi/10.1145/2950290.2950350) | | | |
-| Kam1n0: Mapreduce-based assembly clone search for reverse engineering | KDD | 2016 | [link](https://dl.acm.org/doi/pdf/10.1145/2939672.2939719) | | | [link](https://github.com/McGill-DMaS/Kam1n0-Community) |
-| Statistical similarity of binaries | PLDI | 2016 | [link](https://dl.acm.org/doi/10.1145/2980983.2908126) | [link](https://nimrodpar.github.io/assets/presentations/esh-pldi16.pdf) | | [link](https://github.com/tech-srl/esh) |
-| Deep learning code fragments for code clone detection | ASE | 2016 | [link](https://ieeexplore.ieee.org/document/7582748) | | | |
-| A Survey of Software Clone Detection Techniques | | 2016 | [link](https://pdfs.semanticscholar.org/8df3/d10963233aca0e7686b2818b0c47add5466d.pdf) | | | |
-| SourcererCC: Scaling Code Clone Detection to Big Code | ICSE | 2016 | [link](https://arxiv.org/pdf/1512.06448.pdf) | | | |
-| Binary executable file similarity calculation using function matching | | 2016 | [link](https://link.springer.com/article/10.1007/s11227-016-1941-2) | | | |
-| Matching Similar Functions in Different Versions of a Malware | | 2016 | [link](https://ieeexplore.ieee.org/document/7846954) | | | |
-| BinDNN: Resilient Function Matching Using Deep Learning | | 2016 | [link](http://patrickmcdaniel.org/pubs/securecomm16.pdf) | | | |
-| VulPecker: An Automated Vulnerability Detection System Based on Code Similarity Analysis | ACSAC | 2016 | [link](https://dl.acm.org/doi/10.1145/2991079.2991102) | | | [link](https://github.com/vulpecker/Vulpecker) |
-| BigCloneEval: A Clone Detection Tool Evaluation Framework with BigCloneBench | | 2016 | [link](https://ieeexplore.ieee.org/document/7816515) | | | [link](https://github.com/jeffsvajlenko/BigCloneEval) |
-| Cross-architecture bug search in binary executables | IEEE S&P | 2015 | [link](https://ieeexplore.ieee.org/document/7163056) | | | |
-| Library functions identification in binary code by using graph isomorphism testings | | 2015 | [link](https://ieeexplore.ieee.org/document/7081836) | | | |
-| Evaluating clone detection tools with BigCloneBench | | 2015 | [link](https://ieeexplore.ieee.org/document/7332459) | | | [link](https://github.com/clonebench/BigCloneBench) |
-| Memoized semantics-based binary diffing with application to malware lineage inference | | 2015 | [link](https://faculty.ist.psu.edu/wu/papers/memoized-IFIP_SEC_2015.pdf) | | | |
-| Sigma: A semantic integrated graph matching approach for identifying reused functions in binary code | | 2015 | [link](https://www.dfrws.org/sites/default/files/session-files/paper-sigma_a_semantic_integrated_graph_matching_approach_for_identifying_reused_functions_in_binary_code.pdf) | [link](https://pdfs.semanticscholar.org/a036/ff11b1a675550ac57949bc540f400e8fa695.pdf) | | |
-| BYTEWEIGHT: Learning to Recognize Functions in Binary Code | USENIX | 2014 | [link](https://www.usenix.org/system/files/conference/usenixsecurity14/sec14-paper-bao.pdf) | [link](https://www.usenix.org/sites/default/files/conference/protected-files/sec14_slides_bao.pdf) | [link](https://www.usenix.org/node/184522) | |
-| Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection | FSE | 2014 | [link](https://dl.acm.org/doi/10.1145/2635868.2635900) | | | |
-| Binclone: Detecting code clones in malware | SERE | 2014 | [link](https://cradpdf.drdc-rddc.gc.ca/PDFS/unc194/p800686_A1b.pdf) | | | [link](https://github.com/BinSigma/BinClone) |
-| Detecting fine-grained similarity in binaries | | 2014 | [link](https://web.cs.ucdavis.edu/~su/theses/AS-dissertation.pdf) | | | |
-| Leveraging semantic signatures for bug search in binary programs | ACSAC | 2014 | [link](https://dl.acm.org/doi/10.1145/2664243.2664269) | | | |
-| How Accurate Is Coarse-grained Clone Detection?: Comparision with Fine-grained Detectors | | 2014 | [link](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.685.7674&rep=rep1&type=pdf) | | | |
-| Tracelet-based code search in executables | PLDI | 2014 | [link](https://dl.acm.org/doi/10.1145/2594291.2594343) | | | |
-| Control Flow-Based Malware Variant Detection | | 2014 | [link](https://ieeexplore.ieee.org/document/6601601) | | | |
-| Hashing for Similarity Search: A Survey | | 2014 | [link](https://arxiv.org/pdf/1408.2927.pdf) | | | |
-| Achieving accuracy and scalability simultaneously in detecting application clones on android markets | ICSE | 2014 | [link](https://dl.acm.org/doi/10.1145/2568225.2568286) | | | |
-| Identifying Shared Software Components to Support Malware Forensics | | 2014 | [link](https://link.springer.com/chapter/10.1007/978-3-319-08509-8_2) | | | |
-| Evaluating Modern Clone Detection Tools | | 2014 | [link](https://ieeexplore.ieee.org/document/6976098) | | | |
-| Rendezvous: a search engine for binary code | MSR | 2013 | [link](https://dl.acm.org/doi/10.5555/2487085.2487147) | | | |
-| Binslayer: accurate comparison of binary executables | PPREW | 2013 | [link](https://dl.acm.org/doi/10.1145/2430553.2430557) | | | [link](https://github.com/MartialB/BinSlayer) |
-| Software clone detection: A systematic review | | 2013 | [link](https://romisatriawahono.net/lecture/rm/survey/software%20engineering/Software%20Construction/Rattan%20-%20Software%20Clone%20Detection%20-%202013.pdf) | | | |
-| How to extract differences from similar programs? A cohesion metric approach | | 2013 | [link](https://ieeexplore.ieee.org/document/6613038) | | | |
-| Software clone detection and refactoring | | 2013 | [link](https://www.researchgate.net/publication/258389603_Software_Clone_Detection_and_Refactoring) | | | |
-| An Emerging Approach towards Code Clone Detection: Metric Based Approach on Byte Code | | 2013 | [link](http://ijarcsse.com/Before_August_2017/docs/papers/Volume_3/5_May2013/V3I5-0355.pdf) | | | |
-| A hybrid-token and textual based approach to find similar code segments | | 2013 | [link](https://ieeexplore.ieee.org/document/6726700) | | | |
-| Gapped code clone detection with lightweight source code analysis | | 2013 | [link](https://ieeexplore.ieee.org/abstract/document/6613837) | | | |
-| MutantX-S: Scalable Malware Clustering Based on Static Features | USENIX | 2013 | [link](https://www.usenix.org/system/files/conference/atc13/atc13-hu.pdf) | | [link](https://www.usenix.org/node/174525) | |
-| Binjuice: Fast Location of Similar Code Fragments Using Semantic Juice | PPREW | 2013 | [link](https://dl.acm.org/doi/10.1145/2430553.2430558) | | | |
-| Towards Automatic Software Lineage Inference | USENIX | 2013 | [link](https://www.usenix.org/system/files/conference/usenixsecurity13/sec13-paper_jang.pdf) | | [link](https://www.usenix.org/conference/usenixsecurity13/technical-sessions/papers/jang) | |
-| AnDarwin: Scalable Detection of Semantically Similar Android Applications | | 2013 | [link](https://ieeexplore.ieee.org/document/6985631) | | | |
-| Expose: Discovering potential binary code re-use | | 2013 | [link](https://ieeexplore.ieee.org/document/6649873) | | | |
-| Function Matching-based Binary level Software Similarity Calculation | RACS | 2013 | [link](https://dl.acm.org/doi/10.1145/2513228.2513300) | | | |
-| FIRMA: Malware Clustering and Network Signature Generation with Mixed Network Behaviors | RAID | 2013 | [link](https://software.imdea.org/~juanca/papers/firma_raid13.pdf) | | | |
-| A study of
repetitiveness of code changes in software evolution | ASE | 2013 | [link](https://dl.acm.org/doi/10.1109/ASE.2013.6693078) | | | | -| ibinhunt: Binary hunting with interprocedural control flow | | 2012 | [link](https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=2699&context=sis_research) | [link](https://slideplayer.com/slide/4168742/) | | | -| ReDeBug: Finding Unpatched Code Clones in Entire OS Distributions | USENIX | 2012 | [link](https://users.ece.cmu.edu/~jiyongj/papers/oakland12.pdf) | | | | -| Boreas: an accurate and scalable token-based approach to code clone detection | ASE | 2012 | [link](https://dl.acm.org/doi/10.1145/2351676.2351725) | | | | -| Folding Repeated Instructions for Improving Token-Based Code Clone Detection | | 2012 | [link](https://ieeexplore.ieee.org/document/6392103) | | | | -| A metrics-based data mining approach for software clone detection | | 2012 | [link](https://ieeexplore.ieee.org/document/6340252) | | | | -| Comparison of Clone Detection Techniques | | 2012 | | | | | -| Malware Classification Method via Binary Content Comparison | RACS | 2012 | [link](https://dl.acm.org/doi/10.1145/2401603.2401672) | | | | -| Binary function clustering using semantic hashes | ICMLA | 2012 | [link](https://ieeexplore.ieee.org/document/6406693) | | | | -| Value-based program characterization and its application to software plagiarism detection | | 2011 | [link](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.370.9508&rep=rep1&type=pdf) | | | | -| CMCD: Count Matrix Based Code Clone Detection | | 2011 | [link](https://ieeexplore.ieee.org/iel5/6129717/6130641/06130694.pdf) | | | | -| Incremental code clone detection: A pdg-based approach | | 2011 | [link](https://ieeexplore.ieee.org/document/6079769) | | | | -| Anywhere, Any-Time Binary Instrumentation | | 2011 | [link](https://dl.acm.org/doi/10.1145/2024569.2024572) | | | | -| Code reuse in open source software development: Quantitative evidence, drivers, and impediments | | 2010 | 
| | | | -| Index-based code clone detection: incremental, distributed, scalable | | 2010 | | | | | -| Detection of Type-1 and Type-2 Code Clones Using Textual Analysis and Metrics | | 2010 | | | | | -| Ghezzi, A hybrid approach (syntactic and textual) to clone detection | | 2010 | | | | | -| Evaluating code clone genealogies at release level: An empirical study | | 2010 | | | | | -| A survey of Binary similarity and distance measures | | 2010 | | | | | -| Idea: Opcode-Sequence-Based Malware Detection | | 2010 | [link](https://tarjomefa.com/wp-content/uploads/2015/12/4215-English.pdf) | | | | -| Behavioral Clustering of HTTP-Based Malware and Signature Generation Using Malicious Network Traces | USENIX | 2010 | | | | | -| Data fingerprinting with similarity digests | | 2010 | | | | | -| Automatic mining of functionally equivalent code fragments via random testing | | 2009 | | | | | -| A mutation/injection-based automatic framework for evaluating code clone detection tools | | 2009 | | | | | -| Problematic code clones identification using multiple detection results | | 2009 | | | | | -| Incremental clone detection | | 2009 | | | | | -| Scalable and incremental clone detection for evolving software | | 2009 | | | | | -| Large-scale Malware Indexing Using Function-call Graphs | | 2009 | | | | | -| Scalable, Behavior-Based Malware Clustering | | 2009 | | | | | -| peHash: A Novel Approach to Fast Malware Clustering | USENIX | 2009 | | | | | -| Detecting Code Clones in Binary Executables | | 2009 | | | | | -| Binhunt: Automatically finding semantic differences in binary programs | | 2008 | [link](https://people.eecs.berkeley.edu/~dawnsong/papers/2008%20binhunt_icics08.pdf) | | | | -| Scalable detection of semantic clones | | 2008 | [link](https://hiper.cis.udel.edu/lp/lib/exe/fetch.php/courses/icse08-gabel-detectclones.pdf) | | | | -| Deckard: Scalable and accurate tree-based detection of code clones | | 2007 | | | | | -| Large-scale code reuse in open source software | | 
2007 | | | | | -| A survey on software clone detection research | | 2007 | [link](http://research.cs.queensu.ca/TechReports/Reports/2007-541.pdf) | | | | -| A study of consistent and inconsistent changes to code clones | | 2007 | | | | | -| Comparison and evaluation of clone detection tools | | 2007 | | | | | -| Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions | | 2007 | | | | | -| A Static Birthmark of Binary Executables Based on API Call Structure | | 2007 | | | | | -| CP-Miner: Finding copy-paste and related bugs in large-scale software code | | 2006 | | | | | -| Survey of research on software clones | | 2006 | [link](https://www.researchgate.net/publication/30815553_Survey_of_Research_on_Software_Clones) | | | | -| "Cloning considered harmful" considered harmful: patterns of cloning in software | | 2006 | [link](https://ieeexplore.ieee.org/document/4023973) | | | | -| GPLAG: detection of software plagiarism by program dependence graph analysis | | 2006 | | | | | -| Detecting Self-mutating Malware Using Control-flow Graph Matching | | 2006 | | | | | -| Identifying Almost Identical Files Using Context Triggered Piecewise Hashing | | 2006 | | | | | -| Hamsa: Fast signature generation for zero-day polymorphic worms with provable attack resilience | IEEE S&P | 2006 | | | | | -| Graph-based comparison of executable objects | | 2005 | | | | | -| SDD: high performance code clone detection system for large scale source code | | 2005 | [link](http://www.cs.cmu.edu/~seunghak/sdd_slee_2005.pdf) | | | | -| Polygraph: Automatically generating signatures for polymorphic worms | | 2005 | | | | | -| K-gram Based Software Birthmarks | | 2005 | | | | | -| Insights into System-Wide Code Duplication | IEEE | 2004 | [link](https://rmod.inria.fr/archives/papers/Rieg04bWCRE2004ClonesVisualization.pdf) | | | | -| Clone detection in source code by frequent itemset techniques | | 2004 | | | | | -| Evaluating clone detection techniques from a 
refactoring perspective | | 2004 | | | | | -| Structural comparison of executable objects | | 2004 | | | | | -| Code compaction of matching single-entry multiple-exit regions | | 2003 | [link](http://web.cs.ucla.edu/~palsberg/course/cs239/S04/papers/ChenLiGupta03.pdf) | | | | -| CloSpan: Mining: Closed sequential patterns in large datasets | | 2003 | | | | | -| Ccfinder: a multilinguistic token-based code clone detection system for large scale source code | | 2002 | | | | | -| Identifying similar code with program dependence graphs | | 2001 | | | | | -| Using slicing to identify duplication in source code | | 2001 | | | | | -| BMAT – A Binary Matching Tool for Stale Profile Propagation | | 2000 | | | | | -| A language independent approach for detecting duplicated code | | 1999 | | | | | -| Compressing Differences of Executable Code | | 1999 | | | | | -| Similarity search in high dimensions via hashing | | 1999 | | | | | -| Clone detection using abstract syntax trees | | 1998 | | | | | -| Experiment on the Automatic Detection of Function Clones in a Software System Using Metrics | | 1996 | | | | | -| Pattern matching for clone and concept detection | | 1996 | | | | | -| On finding duplication and near-duplication in large software systems | | 1995 | [link](https://ieeexplore.ieee.org/document/514697) | | | | -| Detecting code similarity using patterns | | 1995 | | | | | -| A Cross-platform Binary Diff | | 1995 | | | | | - +| Scalable graph-based bug search for firmware images (Genius) | ACM CCS | 2016 | [link](https://www.cs.ucr.edu/~heng/pubs/genius-ccs16.pdf) | | [link](https://www.youtube.com/watch?v=R9TPqflLGNs) | [link](https://github.com/qian-feng/Gencoding) | +| Cross-Architecture Binary Semantics Understanding via Similar Code Comparison | IEEE SANER | 2016 | [link](https://loccs.sjtu.edu.cn/~romangol/publications/saner16.pdf) | | | | +| discovRE: Efficient cross-architecture identification of bugs in binary code | NDSS | 2016 | 
[link](https://net.cs.uni-bonn.de/fileadmin/ag/martini/Staff/yakdan/discovre_ndss2016.pdf) | | | | +| BinGo: Cross-architecture cross-OS Binary Search | FSE | 2016 | [link](https://dl.acm.org/doi/10.1145/2950290.2950350) | | | | +| Kam1n0: Mapreduce-based assembly clone search for reverse engineering | KDD | 2016 | [link](https://dl.acm.org/doi/pdf/10.1145/2939672.2939719) | | | [link](https://github.com/McGill-DMaS/Kam1n0-Community) | +| Statistical similarity of binaries | PLDI | 2016 | [link](https://dl.acm.org/doi/10.1145/2980983.2908126) | [link](https://nimrodpar.github.io/assets/presentations/esh-pldi16.pdf) | | [link](https://github.com/tech-srl/esh) | +| Deep learning code fragments for code clone detection | ASE | 2016 | [link](https://ieeexplore.ieee.org/document/7582748) | | | | +| A Survey of Software Clone Detection Techniques | | 2016 | [link](https://pdfs.semanticscholar.org/8df3/d10963233aca0e7686b2818b0c47add5466d.pdf) | | | | +| SourcererCC: Scaling Code Clone Detection to Big Code | ICSE | 2016 | [link](https://arxiv.org/pdf/1512.06448.pdf) | | | | +| Binary executable file similarity calculation using function matching | | 2016 | [link](https://link.springer.com/article/10.1007/s11227-016-1941-2) | | | | +| Matching Similar Functions in Different Versions of a Malware | | 2016 | [link](https://ieeexplore.ieee.org/document/7846954) | | | | +| BinDNN: Resilient Function Matching Using Deep Learning | | 2016 | [link](http://patrickmcdaniel.org/pubs/securecomm16.pdf) | | | | +| VulPecker: An Automated Vulnerability Detection System Based on Code Similarity Analysis | ACSAC | 2016 | [link](https://dl.acm.org/doi/10.1145/2991079.2991102) | | | [link](https://github.com/vulpecker/Vulpecker) | +| BigCloneEval: A Clone Detection Tool Evaluation Framework with BigCloneBench | | 2016 | [link](https://ieeexplore.ieee.org/document/7816515) | | | [link](https://github.com/jeffsvajlenko/BigCloneEval) | +| Cross-architecture bug search in binary executables | 
IEEE S&P | 2015 | [link](https://ieeexplore.ieee.org/document/7163056) | | | | +| Library functions identification in binary code by using graph isomorphism testings | | 2015 | [link](https://ieeexplore.ieee.org/document/7081836) | | | | +| Evaluating clone detection tools with BigCloneBench | | 2015 | [link](https://ieeexplore.ieee.org/document/7332459) | | | [link](https://github.com/clonebench/BigCloneBench) | +| Memoized semantics-based binary diffing with application to malware lineage inference | | 2015 | [link](https://faculty.ist.psu.edu/wu/papers/memoized-IFIP_SEC_2015.pdf) | | | | +| Sigma: A semantic integrated graph matching approach for identifying reused functions in binary code | | 2015 | [link](https://www.dfrws.org/sites/default/files/session-files/paper-sigma_a_semantic_integrated_graph_matching_approach_for_identifying_reused_functions_in_binary_code.pdf) | [link](https://pdfs.semanticscholar.org/a036/ff11b1a675550ac57949bc540f400e8fa695.pdf) | | | +| BYTEWEIGHT: Learning to Recognize Functions in Binary Code | USENIX | 2014 | [link](https://www.usenix.org/system/files/conference/usenixsecurity14/sec14-paper-bao.pdf) | [link](https://www.usenix.org/sites/default/files/conference/protected-files/sec14_slides_bao.pdf) | [link](https://www.usenix.org/node/184522) | | +| Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection | FSE | 2014 | [link](https://dl.acm.org/doi/10.1145/2635868.2635900) | | | | +| Binclone: Detecting code clones in malware | SERE | 2014 | [link](https://cradpdf.drdc-rddc.gc.ca/PDFS/unc194/p800686_A1b.pdf) | | | [link](https://github.com/BinSigma/BinClone) | +| Detecting fine-grained similarity in binaries | | 2014 | [link](https://web.cs.ucdavis.edu/~su/theses/AS-dissertation.pdf) | | | | +| Leveraging semantic signatures for bug search in binary programs | ACSAC | 2014 | [link](https://dl.acm.org/doi/10.1145/2664243.2664269) | | | | +| How Accurate Is 
Coarse-grained Clone Detection?: Comparision with Fine-grained Detectors | | 2014 | [link](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.685.7674&rep=rep1&type=pdf) | | | | +| Tracelet-based code search in executables | PLDI | 2014 | [link](https://dl.acm.org/doi/10.1145/2594291.2594343) | | | | +| Control Flow-Based Malware Variant Detection | | 2014 | [link](https://ieeexplore.ieee.org/document/6601601) | | | | +| Hashing for Similarity Search: A Survey | | 2014 | [link](https://arxiv.org/pdf/1408.2927.pdf) | | | | +| Achieving accuracy and scalability simultaneously in detecting application clones on android markets | ICSE | 2014 | [link](https://dl.acm.org/doi/10.1145/2568225.2568286) | | | | +| Identifying Shared Software Components to Support Malware Forensics | | 2014 | [link](https://link.springer.com/chapter/10.1007/978-3-319-08509-8_2) | | | | +| Evaluating Modern Clone Detection Tools | | 2014 | [link](https://ieeexplore.ieee.org/document/6976098) | | | | +| Rendezvous: a search engine for binary code | MSR | 2013 | [link](https://dl.acm.org/doi/10.5555/2487085.2487147) | | | | +| Binslayer: accurate comparison of binary executables | PPREW | 2013 | [link](https://dl.acm.org/doi/10.1145/2430553.2430557) | | | [link](https://github.com/MartialB/BinSlayer) | +| Software clone detection: A systematic review | | 2013 | [link](https://romisatriawahono.net/lecture/rm/survey/software%20engineering/Software%20Construction/Rattan%20-%20Software%20Clone%20Detection%20-%202013.pdf) | | | | +| How to extract differences from similar programs? 
A cohesion metric approach | | 2013 | [link](https://ieeexplore.ieee.org/document/6613038) | | | | +| Software clone detection and refactoring | | 2013 | [link](https://www.researchgate.net/publication/258389603_Software_Clone_Detection_and_Refactoring) | | | | +| An Emerging Approach towards Code Clone Detection: Metric Based Approach on Byte Code | | 2013 | [link](http://ijarcsse.com/Before_August_2017/docs/papers/Volume_3/5_May2013/V3I5-0355.pdf) | | | | +| A hybrid-token and textual based approach to find similar code segments | | 2013 | [link](https://ieeexplore.ieee.org/document/6726700) | | | | +| Gapped code clone detection with lightweight source code analysis | | 2013 | [link](https://ieeexplore.ieee.org/abstract/document/6613837) | | | | +| MutantX-S: Scalable Malware Clustering Based on Static Features | USENIX | 2013 | [link](https://www.usenix.org/system/files/conference/atc13/atc13-hu.pdf) | | [link](https://www.usenix.org/node/174525) | | +| Binjuice: Fast Location of Similar Code Fragments Using Semantic Juice | PPREW | 2013 | [link](https://dl.acm.org/doi/10.1145/2430553.2430558) | | | | +| Towards Automatic Software Lineage Inference | USENIX | 2013 | [link](https://www.usenix.org/system/files/conference/usenixsecurity13/sec13-paper_jang.pdf) | | [link](https://www.usenix.org/conference/usenixsecurity13/technical-sessions/papers/jang) | | +| AnDarwin: Scalable Detection of Semantically Similar Android Applications | | 2013 | [link](https://ieeexplore.ieee.org/document/6985631) | | | | +| Expose: Discovering potential binary code re-use | | 2013 | [link](https://ieeexplore.ieee.org/document/6649873) | | | | +| Function Matching-based Binary level Software Similarity Calculation | RACS | 2013 | [link](https://dl.acm.org/doi/10.1145/2513228.2513300) | | | | +| FIRMA: Malware Clustering and Network Signature Generation with Mixed Network Behaviors | RAID | 2013 | [link](https://software.imdea.org/~juanca/papers/firma_raid13.pdf) | | | | +| A study of 
repetitiveness of code changes in software evolution | ASE | 2013 | [link](https://dl.acm.org/doi/10.1109/ASE.2013.6693078) | | | | +| ibinhunt: Binary hunting with interprocedural control flow | | 2012 | [link](https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=2699&context=sis_research) | [link](https://slideplayer.com/slide/4168742/) | | | +| ReDeBug: Finding Unpatched Code Clones in Entire OS Distributions | IEEE S&P | 2012 | [link](https://users.ece.cmu.edu/~jiyongj/papers/oakland12.pdf) | | | | +| Boreas: an accurate and scalable token-based approach to code clone detection | ASE | 2012 | [link](https://dl.acm.org/doi/10.1145/2351676.2351725) | | | | +| Folding Repeated Instructions for Improving Token-Based Code Clone Detection | | 2012 | [link](https://ieeexplore.ieee.org/document/6392103) | | | | +| A metrics-based data mining approach for software clone detection | | 2012 | [link](https://ieeexplore.ieee.org/document/6340252) | | | | +| Comparison of Clone Detection Techniques | | 2012 | [link](https://www.academia.edu/34795889/Comparison_of_Clone_Detection_Techniques) | | | | +| Malware Classification Method via Binary Content Comparison | RACS | 2012 | [link](https://dl.acm.org/doi/10.1145/2401603.2401672) | | | | +| Binary function clustering using semantic hashes | ICMLA | 2012 | [link](https://ieeexplore.ieee.org/document/6406693) | | | | +| Value-based program characterization and its application to software plagiarism detection | | 2011 | [link](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.370.9508&rep=rep1&type=pdf) | | | | +| CMCD: Count Matrix Based Code Clone Detection | | 2011 | [link](https://ieeexplore.ieee.org/iel5/6129717/6130641/06130694.pdf) | | | | +| Incremental code clone detection: A pdg-based approach | | 2011 | [link](https://ieeexplore.ieee.org/document/6079769) | | | | +| Anywhere, Any-Time Binary Instrumentation | | 2011 | [link](https://dl.acm.org/doi/10.1145/2024569.2024572) | | | | +| Code reuse in open source 
software development: Quantitative evidence, drivers, and impediments | | 2010 | [link](https://aisel.aisnet.org/cgi/viewcontent.cgi?article=1562&context=jais) | | | | +| Index-based code clone detection: incremental, distributed, scalable | | 2010 | [link](https://dl.acm.org/doi/10.1109/ICSM.2010.5609665) | | | | +| Detection of Type-1 and Type-2 Code Clones Using Textual Analysis and Metrics | | 2010 | [link](https://ieeexplore.ieee.org/document/5460547) | | | | +| Ghezzi, A hybrid approach (syntactic and textual) to clone detection | | 2010 | [link](https://dl.acm.org/doi/10.1145/1808901.1808914) | | | | +| Evaluating code clone genealogies at release level: An empirical study | | 2010 | [link](https://www2.cose.isu.edu/~minhazzibran/resources/MyPapers/GenealogyStudy_SCAM2010.pdf) | | | | +| A survey of Binary similarity and distance measures | | 2010 | [link](https://www.iiisci.org/journal/pdv/sci/pdfs/gs315jg.pdf) | | | | +| Idea: Opcode-Sequence-Based Malware Detection | | 2010 | [link](https://tarjomefa.com/wp-content/uploads/2015/12/4215-English.pdf) | | | | +| Behavioral Clustering of HTTP-Based Malware and Signature Generation Using Malicious Network Traces | USENIX | 2010 | [link](https://www.usenix.org/legacy/event/nsdi10/tech/full_papers/perdisci.pdf) | | | | +| Data fingerprinting with similarity digests | | 2010 | | | | | +| Automatic mining of functionally equivalent code fragments via random testing | | 2009 | | | | | +| A mutation/injection-based automatic framework for evaluating code clone detection tools | | 2009 | | | | | +| Problematic code clones identification using multiple detection results | | 2009 | | | | | +| Incremental clone detection | | 2009 | | | | | +| Scalable and incremental clone detection for evolving software | | 2009 | | | | | +| Large-scale Malware Indexing Using Function-call Graphs | | 2009 | | | | | +| Scalable, Behavior-Based Malware Clustering | | 2009 | | | | | +| peHash: A Novel Approach to Fast Malware Clustering 
| USENIX | 2009 | | | | | +| Detecting Code Clones in Binary Executables | | 2009 | | | | | +| Binhunt: Automatically finding semantic differences in binary programs | | 2008 | [link](https://people.eecs.berkeley.edu/~dawnsong/papers/2008%20binhunt_icics08.pdf) | | | | +| Scalable detection of semantic clones | | 2008 | [link](https://hiper.cis.udel.edu/lp/lib/exe/fetch.php/courses/icse08-gabel-detectclones.pdf) | | | | +| Deckard: Scalable and accurate tree-based detection of code clones | | 2007 | | | | | +| Large-scale code reuse in open source software | | 2007 | | | | | +| A survey on software clone detection research | | 2007 | [link](http://research.cs.queensu.ca/TechReports/Reports/2007-541.pdf) | | | | +| A study of consistent and inconsistent changes to code clones | | 2007 | | | | | +| Comparison and evaluation of clone detection tools | | 2007 | | | | | +| Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions | | 2007 | | | | | +| A Static Birthmark of Binary Executables Based on API Call Structure | | 2007 | | | | | +| CP-Miner: Finding copy-paste and related bugs in large-scale software code | | 2006 | | | | | +| Survey of research on software clones | | 2006 | [link](https://www.researchgate.net/publication/30815553_Survey_of_Research_on_Software_Clones) | | | | +| "Cloning considered harmful" considered harmful: patterns of cloning in software | | 2006 | [link](https://ieeexplore.ieee.org/document/4023973) | | | | +| GPLAG: detection of software plagiarism by program dependence graph analysis | | 2006 | | | | | +| Detecting Self-mutating Malware Using Control-flow Graph Matching | | 2006 | | | | | +| Identifying Almost Identical Files Using Context Triggered Piecewise Hashing | | 2006 | | | | | +| Hamsa: Fast signature generation for zero-day polymorphic worms with provable attack resilience | IEEE S&P | 2006 | | | | | +| Graph-based comparison of executable objects | | 2005 | | | | | +| SDD: high performance 
code clone detection system for large scale source code | | 2005 | [link](http://www.cs.cmu.edu/~seunghak/sdd_slee_2005.pdf) | | | | +| Polygraph: Automatically generating signatures for polymorphic worms | | 2005 | | | | | +| K-gram Based Software Birthmarks | | 2005 | | | | | +| Insights into System-Wide Code Duplication | IEEE | 2004 | [link](https://rmod.inria.fr/archives/papers/Rieg04bWCRE2004ClonesVisualization.pdf) | | | | +| Clone detection in source code by frequent itemset techniques | | 2004 | | | | | +| Evaluating clone detection techniques from a refactoring perspective | | 2004 | | | | | +| Structural comparison of executable objects | | 2004 | | | | | +| Code compaction of matching single-entry multiple-exit regions | | 2003 | [link](http://web.cs.ucla.edu/~palsberg/course/cs239/S04/papers/ChenLiGupta03.pdf) | | | | +| CloSpan: Mining Closed sequential patterns in large datasets | | 2003 | | | | | +| Ccfinder: a multilinguistic token-based code clone detection system for large scale source code | | 2002 | | | | | +| Identifying similar code with program dependence graphs | | 2001 | | | | | +| Using slicing to identify duplication in source code | | 2001 | | | | | +| BMAT – A Binary Matching Tool for Stale Profile Propagation | | 2000 | | | | | +| A language independent approach for detecting duplicated code | | 1999 | | | | | +| Compressing Differences of Executable Code | | 1999 | | | | | +| Similarity search in high dimensions via hashing | | 1999 | | | | | +| Clone detection using abstract syntax trees | | 1998 | | | | | +| Experiment on the Automatic Detection of Function Clones in a Software System Using Metrics | | 1996 | | | | | +| Pattern matching for clone and concept detection | | 1996 | | | | | +| On finding duplication and near-duplication in large software systems | | 1995 | [link](https://ieeexplore.ieee.org/document/514697) | | | | +| Detecting code similarity using patterns | | 1995 | | | | | +| A Cross-platform Binary Diff | | 1995 
| | | | |