TDL-13692: Missing parent objects from projection expression causes key error and TDL-16140: Fix key error and index error while applying projection expression #35

hpatel41 · 2021-11-10T12:30:24Z

Description of change

TDL-13692: Missing parent objects from projection expression causes key error

Updated the code to read from the dictionary with get(), defaulting to {} when the data is not found.

TDL-16140: Fix key error and index error while applying projection expression

Updated the code to prepare the output only if the data list contains an item at the requested index.
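A minimal sketch of the two guards described above, for illustration only. It is not the tap's actual `_apply_projection` code: `project_child` and `project_index` are hypothetical helpers, and the sample record is borrowed from the test data discussed later in this PR.

```python
def project_child(record, parent, child):
    """Return {parent: {child: value}}, tolerating a missing parent."""
    # .get() with a {} default avoids the KeyError that record[parent] raises
    parent_data = record.get(parent, {})
    output = {parent: {}}
    if child in parent_data:
        output[parent][child] = parent_data[child]
    return output


def project_index(record, key, index):
    """Return {key: [item]} only if the list has an item at `index`."""
    data = record.get(key, [])
    output = {}
    # Checking the length avoids the IndexError that data[index] raises
    if index < len(data):
        output[key] = [data[index]]
    return output


record = {"int_id": 1, "test_list_2": ["list_2_data"]}
print(project_child(record, "map_field", "map_entry_1"))  # {'map_field': {}}
print(project_index(record, "test_list_2", 1))            # {}
print(project_index(record, "test_list_2", 0))            # {'test_list_2': ['list_2_data']}
```

With plain subscripting, the first call would fail with a KeyError and the second with an IndexError, which are the two failures named in the ticket titles.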

Manual QA steps

Run the tap in discovery mode.
Select the LOG_BASED replication method.
Add parent and child fields to the projection for fields whose parent data is not present.

Risks

Rollback steps

revert this branch

tap_dynamodb/deserialize.py

dbshah1212 · 2022-01-05T13:40:29Z

tests/test_dynamodb_log_based_parent_child_data.py

+            }
+        ]
+
+    def generate_items(self, num_items, start_key=0):


Add function comments

karanpanchal-crest · 2022-01-06T13:40:21Z

tap_dynamodb/deserialize.py

            else:
                if output.get(breadcrumb[0]) is None:
                    output[breadcrumb[0]] = {}
-                self._apply_projection(record[breadcrumb[0]], breadcrumb[1:], output[breadcrumb[0]])


Explain what this code will input in the Talend-Stitch

Added comment.

karanpanchal-crest · 2022-01-06T13:40:28Z

tap_dynamodb/deserialize.py

                    output[breadcrumb_key] = [{}]
-                self._apply_projection(record[breadcrumb_key][index], breadcrumb[1:], output[breadcrumb_key][0])


Explain what this code will input in the Talend-Stitch

Added comment.

karanpanchal-crest · 2022-01-06T13:40:33Z

tap_dynamodb/deserialize.py

                else:
-                    output[breadcrumb_key] = [record[breadcrumb_key][index]]
-


Explain what this code will input in the Talend-Stitch

Added comment

karanpanchal-crest · 2022-01-06T13:40:41Z

tap_dynamodb/deserialize.py

@@ -58,23 +58,31 @@ def _apply_projection(self, record, breadcrumb, output):
                breadcrumb_key = breadcrumb[0].split('[')[0]
                index = int(breadcrumb[0].split('[')[1].split(']')[0])
                if output.get(breadcrumb_key):
-                    output[breadcrumb_key].append(record[breadcrumb_key][index])


Explain what this code will input in the Talend-Stitch

Added comment.

kspeer825

Requesting test changes.

kspeer825 · 2022-01-12T13:19:06Z

tests/test_dynamodb_log_based_parent_child_data.py

+        # get data
+        messages_by_stream = runner.get_records_from_target_output()
+
+        for stream in expected_streams:


I think there should be an assertion that proves there are upsert messages present. The way this is written it will pass even if messages = [] and then the test would never hit these assertions below.

Updated the code to collect records from messages and loop over every record and assert.

Personally I think an explicit assertion that we replicated records would give more confidence. But I think this functionally accomplishes the same thing with the double get in record.get('map_field').get('map_entry_1').
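To illustrate the concern in this thread: a loop over an empty list never runs its body, so per-record assertions pass vacuously. The sketch below uses hypothetical helper names, and the expected value of `None` for `map_entry_1` is an assumption rather than the real test's assertion.

```python
def check_records(records):
    """Assert on every record; passes vacuously when `records` is empty."""
    for record in records:
        assert record.get("map_field").get("map_entry_1") is None


def check_records_strict(records):
    """Same check, but first prove that at least one record was replicated."""
    assert len(records) > 0, "no records were replicated"
    check_records(records)


# An empty list slips through the loop-only version without any check running.
check_records([])

# The strict version rejects it.
try:
    check_records_strict([])
    strict_rejected_empty = False
except AssertionError:
    strict_rejected_empty = True

print(strict_rejected_empty)  # True

# Both versions accept a non-empty list of well-formed records.
check_records_strict([{"map_field": {}}])
```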

kspeer825 · 2022-01-12T13:21:03Z

tests/test_dynamodb_log_based_parent_child_data.py

+        state = menagerie.get_state(conn_id)
+        state_version = menagerie.get_state_version(conn_id)
+
+        # delete 'finished_shards' for every streams from the state file as we want to run 2nd sync


I am not familiar with dynamodb's bookmarking strategy, but I do not think it makes sense to inject a simulated state at this point in the test. Could you please explain why this bookmark key is being removed.

For the LOG_BASED replication method, the tap forces a FULL_TABLE sync on the first run and syncs the stream in a LOG_BASED manner from the next sync onward. In this test the first sync therefore ran as FULL_TABLE, so we removed the finished shards from the state to re-sync the data, then ran the second sync with that state file to validate the projection change for the LOG_BASED replication method.

I see. If you are confident that this mimics reality then I think this is a fine implementation. However, in the database taps we generally prefer to insert data between syncs rather than inject a simulated sync as this gives us a guaranteed copy of what an end-user would do. In the SaaS taps this is much more difficult since there is no local instance of the source to interact with. That is why you see these state injects to simulate behavior elsewhere.
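A rough sketch of the state edit discussed in this thread. The state layout (a `bookmarks` dict keyed by stream, each holding a `finished_shards` list) and the sample values are assumptions for illustration, and `drop_finished_shards` is a hypothetical helper. How the edited state is written back before the second sync is not shown.

```python
import copy


def drop_finished_shards(state):
    """Return a copy of `state` without 'finished_shards' in any bookmark."""
    new_state = copy.deepcopy(state)  # leave the caller's state untouched
    for bookmark in new_state.get("bookmarks", {}).values():
        # pop() with a default is a no-op when the key is absent
        bookmark.pop("finished_shards", None)
    return new_state


# Assumed shape of the state after the first (FULL_TABLE) sync
state = {
    "bookmarks": {
        "stream_a": {"version": 1, "finished_shards": ["shard-0001"]},
        "stream_b": {"version": 1},
    }
}

new_state = drop_finished_shards(state)
print(new_state)
```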

kspeer825 · 2022-01-12T13:43:46Z

.circleci/config.yml

@@ -19,6 +19,17 @@ jobs:
          command: |
            source /usr/local/share/virtualenvs/tap-dynamodb/bin/activate
            make lint
+      - run:


Please update the config to the standards: tap-tester image, env vars and slack notification. I have recently documented the requirements for a circleci config file. See https://github.com/stitchdata/tap-tester/blob/master/reference/circleci_configs.md

Updated the config.yml file.

kspeer825 · 2022-01-12T13:44:58Z

tests/test_dynamodb_log_based_parent_child_data.py

+        - The tap does not break when the parent data is not found and the user is requesting for child data
+        - The tap does not break when the data a specific position is not found in the record


I don't see that both cases are being tested. There should be multiple logical syncs taking place if we are testing multiple cases like this.

Both conditions are covered by the test, because every record has the format:
{ 'int_id': i, 'string_field': "test string", 'test_list_2': ['list_2_data'] }
and the projection at line no. 25 requests map_field.map_entry_1 (The tap does not break when the parent data is not found and the user is requesting for child data) and test_list_2[1] (The tap does not break when the data a specific position is not found in the record)
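As a small illustration of why the record format quoted above exercises both cases, the sketch below walks each projection path against a sample record. `parse_breadcrumb` mirrors the split-based index parsing visible in the diff; `is_resolvable` is a hypothetical helper and is not part of the tap.

```python
def parse_breadcrumb(part):
    """Split 'name[3]' into ('name', 3); plain names return (name, None)."""
    if "[" not in part:
        return part, None
    key = part.split("[")[0]
    index = int(part.split("[")[1].split("]")[0])
    return key, index


def is_resolvable(record, path):
    """Report whether a dotted projection path exists in `record`."""
    current = record
    for part in path.split("."):
        key, index = parse_breadcrumb(part)
        if not isinstance(current, dict) or key not in current:
            return False  # missing parent or field
        current = current[key]
        if index is not None:
            if not isinstance(current, list) or index >= len(current):
                return False  # nothing at the requested position
            current = current[index]
    return True


record = {"int_id": 1, "string_field": "test string", "test_list_2": ["list_2_data"]}
for path in ("map_field.map_entry_1", "test_list_2[1]", "test_list_2[0]"):
    print(path, is_resolvable(record, path))
```

The first path fails on the missing parent and the second on the missing list position, so a single sync over such records reaches both code paths.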

cosimon · 2022-01-12T21:19:06Z

tap_dynamodb/deserialize.py

+                    # main breadcrumb = [['metadata[0]'], ['metadata[1]']]
+                    # current breadcrumb = ['metadata[1]']
+                    # current output = {'metadata': ['test1']}
+                    # expected output = {'metadata': ['test1']}


Can we clean up these comments?

Removed the comments

cosimon · 2022-01-12T21:47:52Z

tap_dynamodb/deserialize.py

            else:
                output[breadcrumb[0]] = record.get(breadcrumb[0])
        else:
            if '[' in breadcrumb[0]:
                breadcrumb_key = breadcrumb[0].split('[')[0]
                index = int(breadcrumb[0].split('[')[1].split(']')[0])
-                if output.get(breadcrumb_key) is None:
+                if not output.get(breadcrumb_key):


I think this should be

Suggested change

if not output.get(breadcrumb_key):

if breadcrumb_key not in output:

@cosimon
As seen in the figure, this check was kept to handle the scenario where the dictionary contains an empty list for the key. In that case the condition would evaluate to false if we changed it to if breadcrumb_key not in output:
Hence, we have not updated the code.
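The difference between the two conditions can be seen in a short sketch. The `metadata` key and the sample values are illustrative only, and both helper functions are hypothetical.

```python
def needs_init_truthiness(output, key):
    """Condition kept in the PR: true for a missing key AND for an empty value."""
    return not output.get(key)


def needs_init_membership(output, key):
    """Suggested condition: true only when the key is absent."""
    return key not in output


cases = {
    "missing key": {},
    "empty list": {"metadata": []},
    "non-empty list": {"metadata": ["test1"]},
}

for name, output in cases.items():
    print(name,
          needs_init_truthiness(output, "metadata"),
          needs_init_membership(output, "metadata"))
# missing key True True
# empty list True False
# non-empty list False False
```

The two conditions disagree only for the empty-list case, which is the scenario the reply above refers to.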

cosimon · 2022-01-12T21:48:57Z

tap_dynamodb/deserialize.py

+                #       as "metadata" is not present and the current breadcrumb is expecting it as a parent
+
+                # keep empty dict if the data is not found in the record
+                if record.get(breadcrumb[0]):


Suggested change

if record.get(breadcrumb[0]):

if breadcrumb[0] in record:

The same comment mentioned above applies to the change suggested here too.

kspeer825

One more minor change to the config. Tests are good 👍

kspeer825 · 2022-01-13T13:16:39Z

.circleci/config.yml

-                     tests
+            run-test --tap=tap-dynamodb tests
+      - slack/notify-on-failure:
+          only_for_branches: master


Please also include the tap-tester-user context after the circleci-user context in both workflow definitions. That context contains the desired slack webhook. If this is unclear and the tap-tester docs do not clarify, please let me know so I can update them.

Added tap-tester-user context.

* Tdl 16328 implement request timeout (#38)
* added request timeout
* added test cases for the backoff
* resolved pylint errors
* updated pylint
* fixd pylint
* updated config.yml
* resolved comments
* resolved comments
* resolved pylint errors
* resolved comments
* resolved comments
* resolved comments
* resolved slack-on-notify failure
* removed scenarios
* added tap-tester-user in config
* TDL-16291 added code comments (#36)
* added code comments
* fixed typo
* TDL-13692: Missing parent objects from projection expression causes key error and TDL-16140: Fix key error and index error while applying projection expression (#35)
* added code change for handling key error and index error
* resolve pylint error
* updated the unittests
* updated config.yml to run unittest to run on CCi
* added unittests
* updated the code to handle child dict data
* added comments with example for projection
* updated config.yml file and resolved PR review comments
* resolve slack failure error in config file
* updated version 2 to 2.1
* remoed SCENARIO from the test cases
* resolved comments
* reverted changes
* added tap-tester-venv, tap_tester_sandbox and check for verifying that we replicated records

Co-authored-by: namrata270998 <namrata.brahmbhatt@crestdatasys.com>

* Tdl 15171 implement expression attributes (#37)
* Initial commit for expression attributes
* Updated circleci and full_table.py
* resolved pylint error
* Updated circleci
* Updated unittest case path in circleci
* Updated expression attribute
* resolved pylint error
* Updated prepare exprssion logic
* Handled empty projection
* added logic for nested expressions
* added in log_based test and comments
* added code comments
* Updated comments
* Resolved review comments
* added comment
* resolved comments
* resolved pylint errors
* updated config.yml and a typo
* resolved slack-on-notify failure
* removed scenarios
* Changed the expressionattribute logic as suggested
* resolved pylint error
* fixed typo
* added test case for '.' in fields
* added testcase for prepare_projection()
* resolved comments
* added check for empty expression
* handled corner case
* resolved comments and updated unittests
* removed unused import
* resolved comments
* fixed pylint failure
* updated comment
* changed the name of catalog's expression to expression-attributes
* added tap-tester-user to the config, updated to nosetests
* updated to nosetests
* Changed expression-attributes to expression-attribute
* Changed expression-attributes to expression-attribute
* added tests for primary and replication key reserved words
* renamed the file
* added funciton comments
* fixed pylint
* changed the parameter name to attribute
* changed the attribute name in new files added
* expression-attribute -> expression-attributes in full table
* test fixes for pk and hash key as expression-attributes
* expression-attribute -> expression-attributes in one more test
* fix log sync ref and log projection test name
* Changed from attribute to attributes

Co-authored-by: namrata270998 <namrata.brahmbhatt@crestdatasys.com>
Co-authored-by: KrishnanG <kgurusamy@talend.com>
Co-authored-by: kspeer <kspeer@stitchdata.com>
Co-authored-by: Harsh <80324346+harshpatel4crest@users.noreply.github.com>
Co-authored-by: Prijen Khokhani <88327452+prijendev@users.noreply.github.com>
Co-authored-by: KrishnanG <kgurusamy@talend.com>
Co-authored-by: kspeer <kspeer@stitchdata.com>

harshpatel4_crest added 3 commits November 10, 2021 17:38

added code change for handling key error and index error

de10f90

resolve pylint error

059ba5e

updated the unittests

4380c3a

hpatel41 requested review from dbshah1212, karanpanchal-crest and savan-chovatiya November 11, 2021 05:35

harshpatel4_crest added 3 commits November 12, 2021 10:37

updated config.yml to run unittest to run on CCi

b84a0bc

added unittests

14c019c

updated the code to handle child dict data

401f9bd

hpatel41 marked this pull request as ready for review December 1, 2021 08:05

dbshah1212 reviewed Jan 5, 2022

dbshah1212 reviewed Jan 5, 2022

karanpanchal-crest suggested changes Jan 6, 2022

karanpanchal-crest mentioned this pull request Jan 6, 2022

Apply projection expression failure cases #34

Closed

added comments with example for projection

dd40139

dbshah1212 requested review from dbshah1212 and karanpanchal-crest January 7, 2022 09:03

karanpanchal-crest approved these changes Jan 10, 2022

hpatel41 requested review from cosimon, KrisPersonal, kspeer825 and manand31 January 10, 2022 13:33

kspeer825 suggested changes Jan 12, 2022

This was referenced Jan 12, 2022

Tdl 15171 implement expression attributes #37

Merged

Tdl 16328 implement request timeout #38

Merged

harshpatel4_crest added 4 commits January 12, 2022 21:48

updated config.yml file and resolved PR review comments

0164798

resolve slack failure error in config file

f4570ed

updated version 2 to 2.1

3449052

remoed SCENARIO from the test cases

a59cba9

hpatel41 requested a review from kspeer825 January 12, 2022 16:47

cosimon requested changes Jan 12, 2022

dbshah1212 changed the base branch from master to TDL-16126-Crest-Master January 13, 2022 05:38

namrata270998 added 2 commits January 13, 2022 14:13

resolved comments

5254703

reverted changes

0fbc820

singer-io deleted a comment from cosimon Jan 13, 2022

kspeer825 suggested changes Jan 13, 2022

namrata270998 requested a review from cosimon January 13, 2022 13:31

added tap-tester-venv, tap_tester_sandbox and check for verifying that we replicated records

3432aa3

hpatel41 requested a review from kspeer825 January 17, 2022 07:52

kspeer825 approved these changes Jan 18, 2022

namrata270998 approved these changes Feb 11, 2022

namrata270998 added 2 commits February 11, 2022 12:17

Merge branch 'TDL-16126-Crest-Master' of https://github.com/singer-io/tap-dynamodb into fix-key-error-and-index-error

632853c

Merge branch 'fix-key-error-and-index-error' of https://github.com/singer-io/tap-dynamodb into fix-key-error-and-index-error

ff89e51

KrisPersonal approved these changes Feb 11, 2022

cosimon approved these changes Feb 11, 2022

Merge branch 'TDL-16126-Crest-Master' of https://github.com/singer-io/tap-dynamodb into fix-key-error-and-index-error

2044d84

namrata270998 merged commit 3d49bbf into TDL-16126-Crest-Master Feb 14, 2022

namrata270998 mentioned this pull request Feb 14, 2022

Tdl 16126 crest master #39

Merged

luandy64 deleted the fix-key-error-and-index-error branch July 18, 2023 16:05
