News
06-22-2013 - Version 1.0 of ForkSim has been released.
About
ForkSim generates collections of artificial software forks for testing and evaluating cross-project similarity analysis and visualization tools. It begins with a base subject system, forks it, and simulates continued development activities on each of the forks. The result is a dataset of forks with known similarities and differences.
ForkSim simulates the following development activities, where 'new source' is post-fork development, and 'existing source' is pre-fork development:
- New source is added.
- Existing source is removed.
- Existing source code is modified and/or evolved.
- Existing source code is moved.
- Source code is copied from another fork. It may be copied into a different position than in the source fork, and it may be modified and/or evolved independently of the source.
- The fork is itself forked.
The development activities are carried out by injecting software artifacts into a subset of the forks. The forks that receive an artifact are chosen at random. The injection locations are, at random, either uniform across the chosen forks or non-uniform. Artifacts may be randomly mutated before injection to simulate inconsistent development of similar features across forks. File-system-level artifacts may be renamed before injection. The simulator considers the following artifacts: source directories, source files, and methods. ForkSim can create Java, C, and C# datasets.
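The injection process described above can be illustrated with a minimal sketch. This is not ForkSim's actual implementation: the function `plan_injection`, the `Injection` record, and the probabilities `p_uniform`, `p_mutate`, and `p_rename` are all hypothetical names chosen for illustration. The sketch also assumes that mutation and renaming are decided independently for each fork, and that "file-system-level" artifacts means directories and files.

```python
import random
from dataclasses import dataclass


@dataclass
class Injection:
    fork: str       # fork receiving the artifact
    location: str   # where in that fork the artifact is placed
    mutated: bool   # whether the artifact is mutated before injection
    renamed: bool   # whether the artifact is renamed before injection


def plan_injection(forks, locations, kind, rng,
                   p_uniform=0.5, p_mutate=0.5, p_rename=0.5):
    """Plan one artifact injection (hypothetical helper, illustrative defaults)."""
    # Choose a random, non-empty subset of the forks.
    chosen = rng.sample(forks, rng.randint(1, len(forks)))

    # Decide once whether every chosen fork gets the same location.
    uniform = rng.random() < p_uniform
    shared = rng.choice(locations)

    # Assumption: only directories and files can be renamed, not methods.
    renamable = kind in ("directory", "file")

    plan = []
    for fork in chosen:
        # Non-uniform: each fork draws its own location (draws may coincide).
        location = shared if uniform else rng.choice(locations)
        mutated = rng.random() < p_mutate
        renamed = renamable and rng.random() < p_rename
        plan.append(Injection(fork, location, mutated, renamed))
    return plan


if __name__ == "__main__":
    rng = random.Random(2013)  # fixed seed so the plan is reproducible
    for step in plan_injection(["fork1", "fork2", "fork3"],
                               ["src/util", "src/core"], "file", rng):
        print(step)
```

Recording each decision in the plan, rather than applying it immediately, mirrors the goal stated above: the generated dataset is useful because the similarities and differences between forks are known.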
For more information on the generation process, and how it correlates to the six development activities, please refer to our paper.