FDDetector: A Tool for Deduplicating Features in Software Product Lines

  • Amal Khtira IMS Team, ADMIR Laboratory, Rabat IT Center, ENSIAS, Mohammed V University, Rabat, Morocco
  • Anissa Benlarabi IMS Team, ADMIR Laboratory, Rabat IT Center, ENSIAS, Mohammed V University, Rabat, Morocco
  • Bouchra El Asri IMS Team, ADMIR Laboratory, Rabat IT Center, ENSIAS, Mohammed V University, Rabat, Morocco
Keywords: Software Product Line, Feature Models, Duplication, Natural Language Processing, Tool Support


Duplication is one of the model defects that affect software product lines during their evolution. Many approaches have been proposed to deal with duplication at the code level, whereas duplication in features has received little attention in the literature. With the aim of reducing maintenance cost and improving product quality at an early stage of a product line, we proposed in previous work a tool based on a conceptual framework. The main objective of this tool, called FDDetector, is to detect and correct duplication in product line models. In this paper, we recall the motivation behind creating a solution for feature deduplication and present the progress made in the design and implementation of FDDetector.

