Usage
./forksim myProperties myOutputDirectory
myProperties - Properties file specifying your desired generation parameters.
myOutputDirectory - Where to output the generated dataset and generation log. Directory must not already exist.
Directions
- Navigate to ForkSim/.
- Make a copy of 'properties' and edit it to set your desired generation parameters. This includes specifying your subject system and source repository.
- Execute forksim: "./forksim myProperties myOutputDirectory"
- Wait while forksim runs; this may take some time.
- Your forks, and the generation log, are located in "myOutputDirectory".
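The steps above can also be scripted. The following is a minimal Python sketch, not part of ForkSim: the helper names `build_forksim_command` and `run_forksim` are hypothetical, and it assumes the script is started from the ForkSim/ directory so that `./forksim` resolves. It checks up front that the output directory does not already exist, as the usage notes require.

```python
import subprocess
from pathlib import Path


def build_forksim_command(properties, output_dir, forksim="./forksim"):
    """Return the argument list for a forksim run after basic checks."""
    properties = Path(properties)
    output_dir = Path(output_dir)
    if not properties.is_file():
        raise FileNotFoundError(f"properties file not found: {properties}")
    # ForkSim requires that the output directory does not already exist.
    if output_dir.exists():
        raise FileExistsError(f"output directory already exists: {output_dir}")
    return [forksim, str(properties), str(output_dir)]


def run_forksim(properties, output_dir):
    """Run forksim and wait for it to finish (this may take some time)."""
    command = build_forksim_command(properties, output_dir)
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(command, check=True)
```

Calling `run_forksim("myProperties", "myOutputDirectory")` would then correspond to the command line shown in the directions.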
Generation Parameters
- system - Path to subject system to use as a base of the generated forks.
- repository - Path to a collection of systems to extract source artifacts from for injection.
- language - The language of the dataset to generate. Can be Java, C or C#.
- numForks - The number of forks to generate.
- numFiles - The number of files to inject into the forks. Must be >= 0.
- numDirectories - The number of directories to inject into the forks. Must be >= 0.
- numFragments - The number of functions to inject into the forks. Must be >= 0.
- functionFragmentMinSize - Minimum size of functions to inject. Must be >= 1.
- functionFragmentMaxSize - Maximum size of functions to inject. Must be >= functionFragmentMinSize.
- maxInjectNum - The maximum number of forks to inject a particular function/file/directory into. Must be >= 1, but no greater than 'numForks'.
- injectionReptitionRate - Probability that the same injection location is used for all forks a particular function/file/directory is injected into. Specified as % (0-100).
- fragmentMutationRate - Probability that a function is mutated before injection.
- fileMutationRate - Probability that a file is mutated before injection.
- dirMutationRate - Probability that a directory is mutated (the files it contains) before injection.
- fileRenameRate - Probability that a file is renamed before injection (including files within an injected directory).
- dirRenameRate - Probability that a directory is renamed before injection.
- maxFileEdit - Maximum number of edits a file mutation will perform. Expressed as a ratio of the size of the file to mutate. Specify as a % (0-100). 0% is interpreted as a maximum of 1 edit.
- maxFunctionEdit - Maximum number of edits a function mutation will perform. Expressed as a ratio of the size of the file to mutate. Specify as a % (0-100). 0% is interpreted as a maximum of 1 edit.
- mutationAttempts - The number of times a mutation is attempted (and fails) before giving up. Must be >= 1. Best to leave as default (10).
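The constraints listed above can be checked before starting a long run. The following Python sketch is an illustration only, not ForkSim's own validation: the function name `check_parameters` is hypothetical, and it assumes the parameters have already been read into a dictionary with numeric values converted to numbers. Only the ranges stated explicitly in the list are checked.

```python
def check_parameters(params):
    """Return a list of violated constraints (empty if none were found)."""
    problems = []
    for name in ("numFiles", "numDirectories", "numFragments"):
        if params[name] < 0:
            problems.append(f"{name} must be >= 0")
    if params["functionFragmentMinSize"] < 1:
        problems.append("functionFragmentMinSize must be >= 1")
    if params["functionFragmentMaxSize"] < params["functionFragmentMinSize"]:
        problems.append("functionFragmentMaxSize must be >= functionFragmentMinSize")
    if not 1 <= params["maxInjectNum"] <= params["numForks"]:
        problems.append("maxInjectNum must be >= 1 and <= numForks")
    # Parameters the list above describes as percentages in the range 0-100.
    for name in ("injectionReptitionRate", "maxFileEdit", "maxFunctionEdit"):
        if not 0 <= params[name] <= 100:
            problems.append(f"{name} must be between 0 and 100")
    if params["mutationAttempts"] < 1:
        problems.append("mutationAttempts must be >= 1")
    return problems
```

An empty result means none of the checked constraints were violated; it does not guarantee that ForkSim will accept the file.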
Output
ForkSim outputs a directory containing the generated dataset. A generated dataset of 5 forks would have the following structure:
output/
    0/ ## Generated Fork 0
    1/ ## Generated Fork 1
    2/ ## Generated Fork 2
    3/ ## Generated Fork 3
    4/ ## Generated Fork 4
    dirs/ ## A copy of the directories injected into the forks, and their mutants.
    files/ ## A copy of the files injected into the forks, and their mutants.
    function_fragments/ ## A copy of the functions injected into the forks, and their mutants.
    log ## The generation log.
    originalSystem/ ## A copy of the subject system used.
    sourceRepository/ ## A copy of the source repository used.
The dirs, files, and function_fragments directories each contain one folder per directory/file/function chosen for injection. Each such folder contains a file or folder named "original", which holds the original state of the source artefact, plus a copy of each version (after any mutation or renaming) that was injected into a fork, named after that fork.
For example, the 5th file injection is stored in output/files/5. It contains a file "original", which is a copy of the original file. If this file was injected into forks 0, 2, and 3 (possibly with mutations), then files "0", "2" and "3" will also be in this folder. Their content is the version of the file injected into their respective forks.
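The layout described above can be traversed with a few lines of Python. This is a minimal sketch under the stated layout only: the function name `injected_versions` is hypothetical, and it assumes that every entry other than "original" in an injection folder is named after a fork id.

```python
from pathlib import Path


def injected_versions(output_dir, kind, number):
    """Locate the original and per-fork copies of one injection.

    kind is "files", "dirs" or "function_fragments"; number is the
    injection number, e.g. 5 for output/files/5.
    """
    folder = Path(output_dir) / kind / str(number)
    original = folder / "original"
    # Entries named after a fork id hold the version injected into that fork.
    forks = sorted(int(p.name) for p in folder.iterdir() if p.name.isdigit())
    return original, {fork: folder / str(fork) for fork in forks}
```

For the example above, `injected_versions("output", "files", 5)` would map forks 0, 2 and 3 to output/files/5/0, output/files/5/2 and output/files/5/3.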
Generation Log Format
Linked here is a sample log. It reports the generation parameters, then lists the injected files, directories, and functions in injection order.
File Injection Logging
The following is an example of a logged file injection:
5 U 3 /home/jeff/git/ForkingSimulator/output/sourceRepository/java6/com/sun/source/tree/TypeCastTree.java
	0 R O /home/jeff/git/ForkingSimulator/output/0/CH/ifa/draw/application/pMoYoEhPnuKx8ZKo2En.java
	3 O M mCW_A 2 1 /home/jeff/git/ForkingSimulator/output/3/CH/ifa/draw/application/TypeCastTree.java
	4 R M mARI 3 2 /home/jeff/git/ForkingSimulator/output/4/CH/ifa/draw/application/JYzlEy.java
The first line is the header, which describes the file selected from the source repository and the general injection parameters chosen. The tabbed lines describe each injection into the generated forks.
The header has the following generic format:
- #FileInjection {U | V} #Injections OriginalFile
Where:
- #FileInjection - The number of the file injection.
- U or V - If the injection locations in the individual forks are uniform or varied.
- #Injections - The number of forks this file was injected into.
- OriginalFile - Path to the original file.
Each injection has the following generic format:
- #Fork {O | R} {O | M mutator #edits #type} injectedFile
Where:
- #Fork - The id of the fork injected into.
- O or R - If the file kept its original name, or was renamed before injection.
- O or M - If the file was kept in its original state or mutated before injection.
- mutator - The mutator used for the mutation.
- #edits - The number of times the mutator was applied.
- #type - The clone type of the mutator (1, 2 or 3).
- injectedFile - Path to the injected file.
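The two formats above can be read back with a small parser. The following Python sketch is an illustration, not code shipped with ForkSim: the class and function names are hypothetical, and it assumes fields are separated by whitespace, with the path as the final field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInjectionHeader:
    number: int
    uniform: bool          # True for U, False for V
    num_injections: int
    original_file: str


@dataclass
class FileInjection:
    fork: int
    renamed: bool          # True for R, False for O
    mutator: Optional[str]  # None when the file was not mutated
    edits: Optional[int]
    clone_type: Optional[int]
    injected_file: str


def parse_file_header(line):
    number, placement, count, path = line.strip().split(None, 3)
    if placement not in ("U", "V"):
        raise ValueError(f"expected U or V, got {placement!r}")
    return FileInjectionHeader(int(number), placement == "U", int(count), path)


def parse_file_injection(line):
    fork, name_flag, state_flag, rest = line.strip().split(None, 3)
    if name_flag not in ("O", "R") or state_flag not in ("O", "M"):
        raise ValueError(f"unexpected flags in line: {line!r}")
    if state_flag == "M":
        # Mutated files carry three extra fields before the path.
        mutator, edits, clone_type, path = rest.split(None, 3)
        return FileInjection(int(fork), name_flag == "R", mutator,
                             int(edits), int(clone_type), path)
    return FileInjection(int(fork), name_flag == "R", None, None, None, rest)
```

Applied to the example above, the header line yields file injection 5 with uniform locations and 3 injections, and the line for fork 3 yields mutator mCW_A with 2 edits and clone type 1.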
Directory Injection Logging
See the sample log linked above for an example (it is too long to include here). A logged directory injection has the following tabbed format:
Header For Selected Directory for Injection
	Header For Particular Injection of this Directory
		Description of Each File Injection as Part of the Directory Injection
The first line is the header, which describes the directory selected from the source repository and the general injection parameters chosen. The tabbed lines are headers for each injection of the directory. The double-tabbed lines describe each file injection performed as part of each directory injection.
The header has the following generic format:
- #DirectoryInjection {U | V} #injections OriginalDirectory
Where:
- #DirectoryInjection - The number of the directory injection.
- U or V - If the injection locations in the individual forks are uniform or varied.
- #injections - The number of forks this directory was injected into.
- OriginalDirectory - Path to the original directory chosen from the source repository.
The directory injection headers have the following generic format:
- #Fork {O | R} InjectedDirectory
Where:
- #Fork - The id of the fork injected into.
- O or R - If the directory kept its original name, or was renamed before injection.
- InjectedDirectory - Path to the injected directory.
Each file injection has the following generic format:
- {O | R} {O | M mutator #edits #type} originalFile;injectedFile
Where:
- O or R - If the file kept its original name or was renamed before injection.
- O or M - If the file was kept in its original state or mutated before injection.
- mutator - The mutator used for the mutation.
- #edits - The number of times the mutator was applied.
- #type - The clone type of the mutator (1, 2 or 3).
- originalFile - Path of the original file (from the original directory).
- injectedFile - Path to the injected file.
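The three levels of a directory injection entry can be told apart by their indentation, and the file injection lines split on the semicolon. The Python sketch below is illustrative only: the function names are hypothetical, it assumes each level is indented with literal tab characters and that fields are separated by whitespace, and it assumes paths do not themselves contain a semicolon.

```python
LEVELS = ("directory_header", "injection_header", "file_injection")


def line_level(line):
    """Classify a log line of a directory injection by its leading tabs."""
    depth = len(line) - len(line.lstrip("\t"))
    if depth >= len(LEVELS):
        raise ValueError(f"unexpected indentation depth {depth}")
    return LEVELS[depth]


def parse_directory_file_injection(line):
    """Parse one double-tabbed file injection line into a dictionary."""
    name_flag, state_flag, rest = line.strip().split(None, 2)
    mutation = None
    if state_flag == "M":
        # Mutated files carry mutator, #edits and #type before the paths.
        mutator, edits, clone_type, rest = rest.split(None, 3)
        mutation = (mutator, int(edits), int(clone_type))
    original_file, injected_file = rest.split(";", 1)
    return {
        "renamed": name_flag == "R",
        "mutation": mutation,
        "original_file": original_file,
        "injected_file": injected_file,
    }
```

The result keeps the original and injected paths side by side, so each file inside an injected directory can be traced back to its source.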
Function Injection Logging
The following is an example of a logged function injection:
50 V 2 446 472 /home/jeff/git/ForkingSimulator/output/sourceRepository/java6/com/sun/corba/se/impl/javax/rmi/CORBA/Util.java
	0 M mSIL 2 3 38 64 /home/jeff/git/ForkingSimulator/output/0/CH/ifa/draw/util/CommandChoice.java
	3 O 41 67 /home/jeff/git/ForkingSimulator/output/3/CH/ifa/draw/util/UndoableHandle.java
The first line is the header, which describes the function selected from the source repository and the general injection parameters chosen. The tabbed lines describe each injection into the generated forks.
The header has the following generic format:
- #FunctionInjection {U | V} #Injections OriginalStartLine OriginalEndLine OriginalFile
Where:
- #FunctionInjection - The number of the function injection.
- U or V - If the injection locations in the individual forks are uniform or varied.
- #Injections - The number of forks this function was injected into.
- OriginalStartLine - Start line of the original function.
- OriginalEndLine - End line of the original function (inclusive).
- OriginalFile - The file containing the original function.
Each injection has the following generic format:
- #Fork {O | M mutator #edits #type} injectedStartLine injectedEndLine injectedFile
Where:
- #Fork - The id of the fork injected into.
- O or M - If the function was kept in its original state or mutated before injection.
- mutator - The mutator used for the mutation.
- #edits - The number of times the mutator was applied.
- #type - The clone type of the mutator (1, 2, or 3).
- injectedStartLine - Start line of the injected function.
- injectedEndLine - End line of the injected function (inclusive).
- injectedFile - The file the function was injected into.
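A function injection line can be parsed in the same spirit. The sketch below is a hedged illustration rather than part of ForkSim: the function names are hypothetical, and it assumes whitespace-separated fields with the file path last. Because both line numbers are inclusive, the length of a function is end - start + 1.

```python
def parse_function_header(line):
    """Parse the header line of a logged function injection."""
    number, placement, count, start, end, path = line.strip().split(None, 5)
    return {
        "number": int(number),
        "uniform": placement == "U",
        "num_injections": int(count),
        "start_line": int(start),
        "end_line": int(end),
        "original_file": path,
    }


def parse_function_injection(line):
    """Parse one tabbed injection line of a logged function injection."""
    fork, state_flag, rest = line.strip().split(None, 2)
    mutation = None
    if state_flag == "M":
        # Mutated functions carry mutator, #edits and #type before the lines.
        mutator, edits, clone_type, rest = rest.split(None, 3)
        mutation = (mutator, int(edits), int(clone_type))
    start, end, path = rest.split(None, 2)
    return {
        "fork": int(fork),
        "mutation": mutation,
        "start_line": int(start),
        "end_line": int(end),
        "injected_file": path,
    }


def line_count(record):
    """Number of lines covered by a record; both bounds are inclusive."""
    return record["end_line"] - record["start_line"] + 1
```

In the example above, the original function spans lines 446 to 472 and the copies span lines 38 to 64 and 41 to 67, so `line_count` gives 27 for all three.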