I searched the existing feature requests and found no similar request.
Description
I pull incremental data from MongoDB into Hive every day. At present, the MongoDB source can only extract the full collection, and the incremental data is then filtered out in the transform. This is wasteful. For example, if a collection holds 1 billion records and grows by 5,000 records a day, the source has to pull all 1 billion records so that the transform can keep just 5,000 of them. Our DBA objects to this approach: it consumes a lot of resources, it has to run once a day, and with too many such jobs the CPU of the production database would be overloaded. I would therefore like the source to accept a filter condition, so that it queries and exports only the 5,000 incremental records.
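To make the request concrete, here is a minimal sketch of the kind of filter condition I would like to write on the source side. It assumes each document carries a last-modified timestamp; the field name `update_time` and the helper `incremental_filter` are hypothetical and only illustrate the idea.

```python
from datetime import datetime, timedelta, timezone


def incremental_filter(day, field="update_time"):
    """Build a MongoDB query filter for documents changed during `day` (UTC).

    `field` is an assumed last-modified timestamp field, not a real schema.
    """
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    # Half-open range [start, end) so consecutive daily runs never overlap.
    return {field: {"$gte": start, "$lt": end}}


# Filter for one day's increment; the date is only an example.
query = incremental_filter(datetime(2023, 1, 15))
print(query)
```

With a client such as PyMongo, a filter like this would be passed as `collection.find(query)`, so the server returns only the matching documents. An index on the timestamp field would presumably be needed so that the database does not scan the whole collection anyway.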
Search before asking
Usage Scenario
No response
Related issues
No response
Are you willing to submit a PR?
Code of Conduct