This program extracts data from an API endpoint and saves it to a local file in CSV format. If the local file already exists, the program compares the hash values of the API resource and the local file to determine whether the data has changed. If it has, the program pre-processes the new data and updates the local file. The data comes from the CoinDesk Bitcoin Price Index API, which is updated in real time.
A cryptographic hash function is designed so that it is computationally infeasible to find two different inputs with the same hash value. Hash functions are commonly used with digital signatures and for data integrity checks. A hash value acts as a practically unique fingerprint of the content of a file (or stream). Metadata such as the file name, extension, timestamps and permissions has no influence on the hash. However, changing even a single character in the contents of a file (or stream) changes its hash value.
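The property described above can be checked with a short sketch. `Get-StringHash` is a hypothetical helper written for this illustration (it is not part of the program), and the sample strings are made up; only `Get-FileHash` and its `-InputStream` and `-Algorithm` parameters are standard PowerShell.

```powershell
# Hypothetical helper (not part of the original program): hash the UTF-8 bytes of a string.
function Get-StringHash {
    param(
        [Parameter(Mandatory)] [string] $Text,
        [string] $Algorithm = 'SHA256'
    )
    # Wrap the bytes in a MemoryStream so Get-FileHash can read them like a file.
    $bytes  = [System.Text.Encoding]::UTF8.GetBytes($Text)
    $stream = [System.IO.MemoryStream]::new($bytes)
    try {
        (Get-FileHash -InputStream $stream -Algorithm $Algorithm).Hash
    }
    finally {
        $stream.Dispose()
    }
}

$a = Get-StringHash -Text 'USD,16641.507'
$b = Get-StringHash -Text 'USD,16641.507'   # identical content
$c = Get-StringHash -Text 'USD,16641.508'   # one character changed

$sameContentSameHash = $a -eq $b   # identical content gives an identical hash
$oneCharChangesHash  = $a -ne $c   # a single-character change gives a different hash
```

Because only the bytes of the content are hashed, the comparison does not depend on where the data is stored or what the file is called.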
- PowerShell 7 (`pwsh.exe`).
- Check the script execution policy on your machine by running `Get-ExecutionPolicy` in a terminal. If the policy is `Restricted`, it has to be changed before you can run PowerShell scripts. Run `Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser` in a terminal.
- The script requires the `Invoke-RestMethod` cmdlet to retrieve data from the API endpoint, the `Get-FileHash` cmdlet to calculate the hash values of the data and the `Compare-Object` cmdlet to return the changed data. These cmdlets ship with PowerShell and should be available in any modern version.
- Authentication is not required to access the API resource.
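A quick way to confirm these prerequisites from a terminal is sketched below. It uses only standard cmdlets; the variable names are illustrative.

```powershell
# Show the effective execution policy and the policy at each scope.
$effectivePolicy = Get-ExecutionPolicy
$policyByScope   = Get-ExecutionPolicy -List

# Confirm that the three cmdlets the scripts rely on can be resolved.
$required = Get-Command -Name Invoke-RestMethod, Get-FileHash, Compare-Object

$effectivePolicy
$policyByScope
$required | Select-Object Name, Source
```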
- `$algorithm`: A string that specifies the cryptographic hashing function to use for computing the hash value of the contents of the specified file or stream. The following algorithms are supported: MD5, SHA1, SHA256, SHA384, SHA512. This program uses `SHA256` by default.
- `cwd`: A string value representing the current working directory. This is used to locate the script file, `Calculate-DataHash.ps1`, that contains the `Get-DataHash` function. The batch program, `Executor.bat`, defaults to the current working directory; change this to the directory where the program was unbundled.
- `BitcoinPriceIndex.csv`: A CSV file containing the extracted data in a tabular format. The file will be saved in the current working directory.
- `BitcoinPriceIndexHash.json`: A JSON file containing the hash value of the local CSV file. The file will be saved in the current working directory.
# Execute or create a job to run the batch program
cmd /c Executor.bat
When the program runs for the first time, it executes the script `Extract-FullDatav1.2.2.ps1` to perform a full batch extraction of the data from the API resource. On subsequent runs, it executes the script `Extract-IncrementalDatav1.2.2.ps1` to perform an incremental extraction if the data has changed, that is, if the hash value is different.
Full batch data extraction is a process of extracting all the data from a data source in one go. It involves retrieving all the data from the source and saving it locally. This is useful in situations where the data is needed for the first time or when the data needs to be updated completely.
On the other hand, incremental data extraction is a process of extracting only the data that has changed since the last extraction. It involves comparing the data from the source with the locally saved data and extracting only the changed data. This is useful in situations where the data changes frequently and only the changes need to be processed.
The program performs both full batch and incremental data extraction from an API endpoint. The `Get-DataHash` function calculates the hash value of the data from the API endpoint or a local file and returns that hash together with the data.
The full batch data extraction script starts by setting the variables for the API endpoint and the local file name. It then starts a job that uses the `Get-DataHash` function to calculate the hash value and return the API endpoint resource. The script waits for the job to complete and collects its results. It then checks that the result count is 2, which indicates that both the hash value and the data were returned. If so, the script writes the data to a CSV file and the hash value to a JSON file, then displays a message indicating that the CSV and JSON files were created.
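The full-extraction flow can be sketched roughly as follows. This is not the program's actual script: the sample payload, the temporary output directory and the `$work` script block are assumptions made for illustration, and the script block is invoked directly with `&` rather than through a background job to keep the sketch short and offline.

```powershell
# Assumption: a flattened sample payload standing in for the API response (no network call).
$sampleJson = '{"chartName":"Bitcoin","EUR":16211.2575,"GBP":13905.5101,"USD":16641.507}'

# Hypothetical output location; the real program writes to its working directory.
$outDir   = Join-Path ([System.IO.Path]::GetTempPath()) 'bpi-full-sketch'
$csvPath  = Join-Path $outDir 'BitcoinPriceIndex.csv'
$hashPath = Join-Path $outDir 'BitcoinPriceIndexHash.json'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

# Emits two objects: the hash string first, then the parsed data.
$work = {
    param([string] $Json, [string] $Algorithm)
    $stream = [System.IO.MemoryStream]::new([System.Text.Encoding]::UTF8.GetBytes($Json))
    try { (Get-FileHash -InputStream $stream -Algorithm $Algorithm).Hash }
    finally { $stream.Dispose() }
    $Json | ConvertFrom-Json
}

$results = @(& $work $sampleJson 'SHA256')

# Two results mean that both the hash and the data came back.
if ($results.Count -eq 2) {
    $hash, $data = $results
    $data | Export-Csv -Path $csvPath -NoTypeInformation
    [pscustomobject]@{ Algorithm = 'SHA256'; Hash = $hash } |
        ConvertTo-Json |
        Set-Content -Path $hashPath
    Write-Host "Created $csvPath and $hashPath"
}
```

If the same script block is run through `Start-Job`, the objects returned by `Receive-Job` are deserialized copies that carry extra properties such as `PSComputerName`, so selecting the wanted columns before exporting is a sensible precaution.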
The incremental data extraction script starts by setting the variables for the API endpoint and the local file name. It then starts a job that uses the `Get-DataHash` function to calculate the hash value and return the API endpoint resource. The script waits for the job to complete and collects its results. Next, it reads the JSON file containing the hash value of the previously saved data. If the hash values of the API endpoint resource and the locally saved data are not equal, the script retrieves the changed data using the `Compare-Object` cmdlet. It then writes the changed data to the CSV file and the updated hash value to the JSON file, and displays a message indicating that the CSV and JSON files were updated.
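The comparison step might look like the sketch below. The rows are made-up sample data, `Get-RowsHash` is a hypothetical helper, and hashing the CSV text of the rows is an assumption about how a comparable hash could be derived; the actual script may do this differently.

```powershell
# Assumption: rows saved on a previous run, and rows just returned by the API.
$saved = @(
    [pscustomobject]@{ chartName = 'Bitcoin'; USD = '16641.507';  updatedtimeISO = '04-Jan-23 1:49:00 AM' }
    [pscustomobject]@{ chartName = 'Bitcoin'; USD = '16645.9468'; updatedtimeISO = '04-Jan-23 2:02:00 AM' }
)
$latest = @($saved) + @(
    [pscustomobject]@{ chartName = 'Bitcoin'; USD = '16677.7624'; updatedtimeISO = '04-Jan-23 2:16:00 AM' }
)

# Hypothetical helper: hash the CSV text representation of a set of rows.
function Get-RowsHash {
    param([object[]] $Rows, [string] $Algorithm = 'SHA256')
    $text   = ($Rows | ConvertTo-Csv -NoTypeInformation) -join "`n"
    $stream = [System.IO.MemoryStream]::new([System.Text.Encoding]::UTF8.GetBytes($text))
    try { (Get-FileHash -InputStream $stream -Algorithm $Algorithm).Hash }
    finally { $stream.Dispose() }
}

$props   = 'chartName', 'USD', 'updatedtimeISO'
$changed = @()

# Only look for changed rows when the hashes differ.
if ((Get-RowsHash -Rows $saved) -ne (Get-RowsHash -Rows $latest)) {
    # '=>' marks rows that exist only in the difference (latest) set.
    $changed = @(
        Compare-Object -ReferenceObject $saved -DifferenceObject $latest -Property $props -PassThru |
            Where-Object SideIndicator -eq '=>'
    )
}

$changed | Select-Object $props   # the rows that would be written to the CSV file
```

Comparing hashes first keeps the common case cheap: when nothing has changed, the row-by-row comparison is skipped entirely.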
To use the program, the user needs to provide the algorithm to be used for calculating the hash value and the current working directory as arguments when running the script. The script will then handle the rest of the data extraction process.
This program could easily become part of either a data integration or a data management system, functioning as a microservice. The pre-processed data could be used by other services within the data management system to perform various tasks, such as data analysis, reporting, app development and more.
The data is structured in a tabular format and is easy to consume. The data schema (column headers) is described in `docschema.json`.
| chartName | EUR | GBP | USD | updatedtimeISO |
|---|---|---|---|---|
| Bitcoin | 16211.2575 | 13905.5101 | 16641.507 | 04-Jan-23 1:49:00 AM |
| Bitcoin | 16215.5825 | 13909.2199 | 16645.9468 | 04-Jan-23 2:02:00 AM |
| Bitcoin | 16246.5755 | 13935.8048 | 16677.7624 | 04-Jan-23 2:16:00 AM |
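As a sketch of how a downstream consumer might read this data, the example below parses CSV text shaped like the table above. The inline CSV is an assumption about the exact file layout (quoting and column order); for the real file, `Import-Csv -Path BitcoinPriceIndex.csv` would take the place of `ConvertFrom-Csv`.

```powershell
# Assumption: CSV text mirroring the sample rows shown in the table.
$csv = @'
"chartName","EUR","GBP","USD","updatedtimeISO"
"Bitcoin","16211.2575","13905.5101","16641.507","04-Jan-23 1:49:00 AM"
"Bitcoin","16215.5825","13909.2199","16645.9468","04-Jan-23 2:02:00 AM"
"Bitcoin","16246.5755","13935.8048","16677.7624","04-Jan-23 2:16:00 AM"
'@

$rows = @($csv | ConvertFrom-Csv)

# Column headers are exposed as the property names of each row.
$headers = $rows[0].psobject.Properties.Name

# CSV values are read as strings, so cast before doing arithmetic.
$maxUsd = ($rows | ForEach-Object { [double] $_.USD } | Measure-Object -Maximum).Maximum

$headers
$maxUsd
```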
- CoinDesk Bitcoin Price Index API
- SHA-256 Cryptographic Hash Algorithm
- MD5 vs SHA-1 vs SHA-2 - Which is the Most Secure Encryption Hash and How to Check Them
- The `Get-FileHash` cmdlet
- about_Script_Blocks
- How To Get The Value Of Header In CSV
- Passing an array of bytes to system.IO.MemoryStream
- PowerShell Notes for Professionals
- PowerShell for Beginners, by Alex Rodrick