Semantic profiler and report generation module integration
Added the modules for generating the report based on the syntactic and semantic features present in the code.

Signed-off-by: Pankaj Thorat <thorat.pankaj9@gmail.com>
pankajskku committed Nov 28, 2024
1 parent 995bfc6 commit b8fc734
Showing 50 changed files with 70,142 additions and 2,763 deletions.
2 changes: 2 additions & 0 deletions transforms/code/code_profiler/README.md
@@ -61,3 +61,5 @@ The high-level system design is as follows:
For each new target language, the offline phase is utilized to create deterministic rules by harnessing the capabilities of LLMs and working with exemplar code samples from the target language. In this process, Workflow W1 facilitates the creation of rules around syntactic structures based on exemplar code samples, while Workflow W2 is used to establish semantic dimensions for profiling. Subsequently, we derive rules that connect syntactic constructs to the predefined semantic concepts. These rules are then stored in a rule database, ready to be employed during the online phase.

In the online phase, the system dynamically generates profiling outputs for any incoming code snippets. This is achieved by extracting concepts from the snippets using the rules in the database and storing these extractions in a tabular format. The structured tabular format allows for generating additional concept columns, which are then utilized to create comprehensive profiling reports.
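
The online phase described above can be sketched roughly as follows. This is a minimal illustration, not the profiler's actual implementation: the `RULES` dictionary, the regex-based rule format, the concept names `imported_package` and `defined_function`, and the derived `package_count` column are all hypothetical stand-ins for the rule database and concept columns the README mentions.

```python
import re
from typing import Dict, List

# Hypothetical rule database: concept name -> regex with one capture group.
# The real profiler derives its rules offline and stores them in its own format.
RULES: Dict[str, str] = {
    "imported_package": r"^\s*(?:import|from)\s+([A-Za-z_][\w.]*)",
    "defined_function": r"^\s*def\s+([A-Za-z_]\w*)",
}


def extract_concepts(snippet: str) -> Dict[str, List[str]]:
    """Apply every rule to every line and collect the captured names."""
    row: Dict[str, List[str]] = {concept: [] for concept in RULES}
    for line in snippet.splitlines():
        for concept, pattern in RULES.items():
            match = re.match(pattern, line)
            if match:
                row[concept].append(match.group(1))
    return row


def profile(snippets: List[str]) -> List[Dict[str, object]]:
    """Build one table row per snippet, then derive an extra concept column."""
    table: List[Dict[str, object]] = []
    for snippet in snippets:
        row: Dict[str, object] = dict(extract_concepts(snippet))
        # Derived column computed from an extracted column (hypothetical).
        row["package_count"] = len(row["imported_package"])
        table.append(row)
    return table


rows = profile(["import os\nfrom sys import path\n\ndef main():\n    pass\n"])
print(rows)
```

Keeping the extractions as rows of a table is what makes the later step cheap: a new concept column is just a function of columns that already exist.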


5 changes: 3 additions & 2 deletions transforms/code/code_profiler/input/data_profiler_params.json
@@ -1,5 +1,6 @@
{
"input": "multi-package.parquet",
"contents": "Contents",
"language": "Language"
"dynamic_schema_mapping": "True",
"contents": "contents",
"language": "language"
}
Binary file modified transforms/code/code_profiler/input/multi-package.parquet
Binary file not shown.
1,542 changes: 555 additions & 987 deletions transforms/code/code_profiler/notebook_example/code-profiler.ipynb

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions transforms/code/code_profiler/python/Makefile
@@ -35,7 +35,7 @@ setup:: .transforms.setup
set-versions:
$(MAKE) TRANSFORM_PYTHON_VERSION=$(CODE_PROFILER_PYTHON_VERSION) TOML_VERSION=$(CODE_PROFILER_PYTHON_VERSION) .transforms.set-versions

build-dist:: .defaults.build-dist
build-dist:: .defaults.build-dist

publish-dist:: .defaults.publish-dist

@@ -51,5 +51,5 @@ run-local-sample: .transforms.run-local-sample

run-local-python-sample:
$(MAKE) RUN_FILE=code_profiler_local_python.py \
RUN_ARGS="--content 'Contents' --language 'Language'" \
RUN_ARGS="--content 'contents' --language 'language'" \
.transforms.run-local-python-sample
11 changes: 11 additions & 0 deletions transforms/code/code_profiler/python/README.md
@@ -17,6 +17,17 @@ the options provided by
the [python launcher](../../../../data-processing-lib/doc/python-launcher-options.md).

### Running the samples

The code profiler can be run on mach-arm64 and x86_64 host architectures.
Depending on your host architecture, set `RUNTIME_HOST_ARCH` in the Makefile accordingly.
```
# values possible mach-arm64, x86_64
export RUNTIME_HOST_ARCH=x86_64
```
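
If you are unsure which value applies to your machine, a small helper along these lines can suggest one. This is only a sketch: the function name `suggest_runtime_host_arch` is hypothetical, and the mapping assumes that `platform.machine()` reports `arm64` on Apple-silicon Macs and `x86_64` on 64-bit Intel/AMD hosts. Any other value is reported as unsupported rather than guessed.

```python
import platform


def suggest_runtime_host_arch(machine: str) -> str:
    """Map a machine identifier to one of the two values listed above."""
    normalized = machine.lower()
    if normalized == "arm64":
        # Assumption: arm64 as reported on Apple-silicon Macs.
        return "mach-arm64"
    if normalized == "x86_64":
        return "x86_64"
    raise ValueError(f"Unsupported host architecture: {machine}")


try:
    arch = suggest_runtime_host_arch(platform.machine())
    print(f"export RUNTIME_HOST_ARCH={arch}")
except ValueError as err:
    print(err)
```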
If you are using a Mac, you may need to allow it to load the .so file from the security settings. The pop-up generally appears under the Security tab while the transform is running.

![alt text](image.png)

To run the samples, use the following `make` targets:

* `run-local-sample` - runs src/code_profiler_local.py
33 changes: 31 additions & 2 deletions transforms/code/code_profiler/python/src/UAST_parser.py
@@ -228,8 +228,9 @@ def _add_user_defined(self, node):
return

# Traversing through the AST to create nodes recursively.
def _dfs(self, AST_node, parent) :
if (AST_node.type in self.rules) :
def _dfs(self, AST_node, parent):

if (AST_node.type in self.rules):
ast_snippet = AST_node.text.decode("utf8")
node_type = self.rules[AST_node.type]["uast_node_type"]
exec_string = self.rules[AST_node.type]["extractor"]
@@ -269,3 +270,31 @@ def _extract(self, ast_snippet, node_type, exec_string):
return self.grammar[node_type]["keyword"] + " " + self.extracted
except Exception as e:
print(e)

def uast_read(jsonstring):
    """
    Reads an input json string into UAST class object
    """
    uast = UAST()
    if jsonstring is not None and jsonstring != 'null':
        uast.load_from_json_string(jsonstring)
        return uast
    return None

def extract_ccr(uast):
    """
    Calculates the code-to-comment ratio given a UAST object as input
    """
    if uast is not None:
        total_comment_loc = 0
        # Default so the ratio is defined even if no uast_root node is present
        loc_snippet = 0
        for node_idx in uast.nodes:
            node = uast.get_node(node_idx)
            if node.node_type == 'uast_comment':
                total_comment_loc += node.metadata.get("loc_original_code", 0)
            elif node.node_type == 'uast_root':
                loc_snippet = node.metadata.get("loc_snippet", 0)
        if total_comment_loc > 0:
            return loc_snippet / total_comment_loc
        else:
            return None
    return None
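
To see how `extract_ccr` behaves, the following sketch runs it against stand-in objects. `StubNode` and `StubUAST` are hypothetical and model only the attributes the function touches (`nodes`, `get_node`, `node_type`, `metadata`); they are not the real `UAST` classes. The function is restated here, with `loc_snippet` defaulted to 0, so the block runs on its own.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StubNode:
    """Hypothetical stand-in for a UAST node."""
    node_type: str
    metadata: Dict[str, int] = field(default_factory=dict)


class StubUAST:
    """Hypothetical stand-in exposing only `nodes` and `get_node`."""

    def __init__(self, nodes: List[StubNode]) -> None:
        self.nodes = dict(enumerate(nodes))

    def get_node(self, idx: int) -> StubNode:
        return self.nodes[idx]


def extract_ccr(uast: Optional[StubUAST]) -> Optional[float]:
    """Code-to-comment ratio: snippet LOC divided by total comment LOC."""
    if uast is None:
        return None
    total_comment_loc = 0
    loc_snippet = 0  # default in case no uast_root node exists
    for node_idx in uast.nodes:
        node = uast.get_node(node_idx)
        if node.node_type == "uast_comment":
            total_comment_loc += node.metadata.get("loc_original_code", 0)
        elif node.node_type == "uast_root":
            loc_snippet = node.metadata.get("loc_snippet", 0)
    if total_comment_loc > 0:
        return loc_snippet / total_comment_loc
    return None


# A 12-line snippet containing one 2-line comment and one 1-line comment.
sample = StubUAST([
    StubNode("uast_root", {"loc_snippet": 12}),
    StubNode("uast_comment", {"loc_original_code": 2}),
    StubNode("uast_comment", {"loc_original_code": 1}),
])
ratio = extract_ccr(sample)
print(ratio)
```

A snippet with no comments yields `None` rather than a division by zero, so callers need to handle the missing value when aggregating the ratio into a report.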
Original file line number Diff line number Diff line change
@@ -24,8 +24,8 @@
local_conf = {
"input_folder": input_folder,
"output_folder": output_folder,
"contents": "Contents",
"language": "Language"
"contents": "contents",
"language": "language"
}
params = {
# Data access. Only required parameters are specified