Simplify extractors output #566

Merged
122 changes: 122 additions & 0 deletions UPGRADE.md
@@ -0,0 +1,122 @@
# Upgrade Guide

This document provides guidelines for upgrading between versions of Flow PHP.
Please follow the instructions for your specific version to ensure a smooth upgrade process.

---

## Upgrading from 0.3.x to 0.4.x

### 1) `ref` expression nullability

`ref("entry_name")` no longer returns `null` when the entry is not found; it throws an exception instead.
The previous behavior can be restored by wrapping the reference in the newly introduced `optional` expression:

Before:
```php
<?php

use function Flow\ETL\DSL\ref;

ref('non_existing_column')->cast('string');
```

After:
```php
<?php

use function Flow\ETL\DSL\optional;
use function Flow\ETL\DSL\ref;

optional(ref('non_existing_column'))->cast('string');
// or
optional(ref('non_existing_column')->cast('string'));
```
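
The difference between the two behaviors can be modelled in plain PHP. The sketch below does not use Flow at all: `strict_entry()` and `optional_entry()` are hypothetical stand-ins for `ref()` and `optional(ref())`, and the exception type is an assumption, since this guide does not name the exception Flow throws.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the 0.4.x `ref()` lookup:
 * a missing entry is an error, not a null value.
 *
 * @param array<string, mixed> $row
 */
function strict_entry(array $row, string $name): mixed
{
    if (!\array_key_exists($name, $row)) {
        // Assumed exception type, chosen only for this illustration.
        throw new \InvalidArgumentException("Entry \"{$name}\" not found");
    }

    return $row[$name];
}

/**
 * Hypothetical stand-in for `optional(ref())`:
 * a missing entry is turned back into null, as in 0.3.x.
 *
 * @param array<string, mixed> $row
 */
function optional_entry(array $row, string $name): mixed
{
    try {
        return strict_entry($row, $name);
    } catch (\InvalidArgumentException) {
        return null;
    }
}
```

In other words, pipelines that relied on a silent `null` for missing entries now have to opt in to that behavior explicitly.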

### 2) Extractors output

Affected extractors:

* CSV
* JSON
* Avro
* DBAL
* GoogleSheet
* Parquet
* Text
* XML

Extractors no longer return data under an array entry called `row`, so unpacking `row` is now redundant.

As a result, DSL functions no longer accept the `$entry_row_name` parameter. If you pass it anywhere,
please remove it.

Before:
```php
<?php

(new Flow())
->read(From::array([['id' => 1, 'array' => ['a' => 1, 'b' => 2, 'c' => 3]]]))
->withEntry('row', ref('row')->unpack())
->renameAll('row.', '')
->drop('row')
->withEntry('array', ref('array')->arrayMerge(lit(['d' => 4])))
->write(To::memory($memory = new ArrayMemory()))
->run();
```

After:

```php
<?php

(new Flow())
->read(From::array([['id' => 1, 'array' => ['a' => 1, 'b' => 2, 'c' => 3]]]))
->withEntry('array', ref('array')->arrayMerge(lit(['d' => 4])))
->write(To::memory($memory = new ArrayMemory()))
->run();
```
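
To make the shape change concrete, the following plain-PHP sketch contrasts the two row layouts. It does not use any Flow classes; the sample data and the helper name `flatten_legacy_row()` are hypothetical and only illustrate what the removed `unpack()` / `renameAll()` / `drop()` steps used to accomplish. It is not how Flow implements extractors.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, for illustration only: turns a 0.3.x-style row
 * (all columns nested under the 'row' key) into the flat 0.4.x shape.
 *
 * @param array<string, mixed> $row
 *
 * @return array<string, mixed>
 */
function flatten_legacy_row(array $row): array
{
    // Rows that are already flat are returned unchanged.
    if (!isset($row['row']) || !\is_array($row['row'])) {
        return $row;
    }

    $nested = $row['row'];
    unset($row['row']);

    // Nested columns are placed next to the remaining entries;
    // on a name clash the nested column wins.
    return \array_merge($row, $nested);
}

$legacy = ['row' => ['id' => 1, 'name' => 'Jane']]; // shape produced by 0.3.x extractors
$current = ['id' => 1, 'name' => 'Jane'];           // shape produced by 0.4.x extractors
```

Here `flatten_legacy_row($legacy)` yields the same array as `$current`, which is why the explicit unpacking steps can simply be deleted from existing pipelines.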

### 3) `ConfigBuilder::putInputIntoRows()` output is now prefixed with `_` (underscore)

To avoid collisions with dataset columns, the additional columns created by `putInputIntoRows()`
are now prefixed with the `_` (underscore) symbol.

Before:
```php
<?php

$rows = (new Flow(Config::builder()->putInputIntoRows()))
->read(Json::from(__DIR__ . '/../Fixtures/timezones.json', 5))
->fetch();

foreach ($rows as $row) {
$this->assertSame(
[
...
'input_file_uri',
],
\array_keys($row->toArray())
);
}
```

After:
```php
<?php

$rows = (new Flow(Config::builder()->putInputIntoRows()))
->read(Json::from(__DIR__ . '/../Fixtures/timezones.json', 5))
->fetch();

foreach ($rows as $row) {
$this->assertSame(
[
...
'_input_file_uri',
],
\array_keys($row->toArray())
);
}
```
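
Code that consumes these rows outside of Flow may still look for the old column name. A minimal sketch of a transitional lookup is shown below; `input_file_uri()` is a hypothetical helper that is not part of Flow, and the unprefixed name `input_file_uri` is assumed to be the 0.3.x column name.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of Flow: reads the input file URI from a
 * row array regardless of which naming scheme produced it.
 *
 * @param array<string, mixed> $row e.g. the result of $row->toArray()
 */
function input_file_uri(array $row): ?string
{
    // Prefer the 0.4.x name, fall back to the assumed 0.3.x name.
    $uri = $row['_input_file_uri'] ?? $row['input_file_uri'] ?? null;

    return $uri === null ? null : (string) $uri;
}
```

Once every consumer has been moved to `_input_file_uri`, the fallback can be dropped.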
2 changes: 0 additions & 2 deletions examples/topics/aggregations/power_plant.php
@@ -15,8 +15,6 @@

$flow = (new Flow())
->read(CSV::from(__FLOW_DATA__ . '/power-plant-daily.csv', 10, delimiter: ';'))
- ->withEntry('unpacked', ref('row')->unpack())
- ->renameAll('unpacked.', '')
->withEntry('production_kwh', ref('Produkcja(kWh)'))
->withEntry('consumption_kwh', ref('Zużycie(kWh)'))
->withEntry('date', ref('Zaktualizowany czas')->toDate('Y/m/d')->dateFormat('Y/m'))
2 changes: 0 additions & 2 deletions examples/topics/aggregations/power_plant_bar_chart.php
@@ -15,8 +15,6 @@

$flow = (new Flow)
->read(CSV::from(__FLOW_DATA__ . '/power-plant-daily.csv', 10, delimiter: ';'))
- ->withEntry('unpacked', ref('row')->unpack())
- ->renameAll('unpacked.', '')
->withEntry('production_kwh', ref('Produkcja(kWh)'))
->withEntry('consumption_kwh', ref('Zużycie(kWh)'))
->withEntry('date', ref('Zaktualizowany czas')->toDate('Y/m/d')->dateFormat('Y/m'))
3 changes: 0 additions & 3 deletions examples/topics/async/csv_to_db_async_amp.php
@@ -41,9 +41,6 @@
$workers = 8
)
)
- ->withEntry('unpacked', ref('row')->unpack())
- ->renameAll('unpacked.', '')
- ->drop('row')
->withEntry('id', ref('id')->cast('int'))
->withEntry('name', concat(ref('name'), lit(' '), ref('last name')))
->drop('last name')
3 changes: 0 additions & 3 deletions examples/topics/async/csv_to_db_async_react.php
@@ -41,9 +41,6 @@
$workers = 8
)
)
- ->withEntry('unpacked', ref('row')->unpack())
- ->renameAll('unpacked.', '')
- ->drop('row')
->withEntry('id', ref('id')->cast('int'))
->withEntry('name', concat(ref('name'), lit(' '), ref('last name')))
->drop('last name')
3 changes: 0 additions & 3 deletions examples/topics/async/csv_to_db_sync.php
@@ -27,9 +27,6 @@

(new Flow())
->read(CSV::from($path = __FLOW_OUTPUT__ . '/dataset.csv', 10_000))
- ->withEntry('unpacked', ref('row')->unpack())
- ->renameAll('unpacked.', '')
- ->drop('row')
->withEntry('id', ref('id')->cast('int'))
->withEntry('name', concat(ref('name'), lit(' '), ref('last name')))
->drop('last name')
3 changes: 0 additions & 3 deletions examples/topics/async/csv_to_json_async_react.php
@@ -42,9 +42,6 @@
$workers = 8
)
)
- ->withEntry('unpacked', ref('row')->unpack())
- ->renameAll('unpacked.', '')
- ->drop('row')
->withEntry('id', ref('id')->cast('int'))
->withEntry('name', concat(ref('name'), lit(' '), ref('last name')))
->drop('last name')
4 changes: 0 additions & 4 deletions examples/topics/db/db_source.php
@@ -7,7 +7,6 @@
exit(1);
}

- use function Flow\ETL\DSL\ref;
use Doctrine\DBAL\DriverManager;
use Doctrine\DBAL\Schema\Column;
use Doctrine\DBAL\Schema\Table;
@@ -38,9 +37,6 @@

(new Flow())
->read(CSV::from($path = __FLOW_OUTPUT__ . '/dataset.csv', 10_000))
- ->withEntry('unpacked', ref('row')->unpack())
- ->renameAll('unpacked.', '')
- ->drop('row')
->rename('last name', 'last_name')
->limit(1_000_000)
->load(DbalLoader::fromConnection($dbConnection, 'source_dataset_table', 1000))
3 changes: 0 additions & 3 deletions examples/topics/db/db_to_db_sync.php
@@ -37,9 +37,6 @@
new OrderBy('id', Order::DESC)
)
)
- ->withEntry('unpacked', ref('row')->unpack())
- ->renameAll('unpacked.', '')
- ->drop('row')
->withEntry('id', ref('id')->cast('int'))
->withEntry('name', concat(ref('name'), lit(' '), ref('last name')))
->drop('last_name')
7 changes: 3 additions & 4 deletions examples/topics/fs/remote/json_remote_stream.php
@@ -46,9 +46,8 @@

(new Flow())
->read(Json::from(new Path('flow-aws-s3://dataset.json', $s3_client_option), 10))
- ->withEntry('row', ref('row')->unpack())
- ->withEntry('row.id', ref('row.id')->cast('integer'))
- ->withEntry('name', concat(ref('row.name'), lit(' '), ref('row.last name')))
- ->drop('row.last name')
+ ->withEntry('id', ref('id')->cast('integer'))
+ ->withEntry('name', concat(ref('name'), lit(' '), ref('last name')))
+ ->drop('last name')
->write(Json::to(new Path('flow-azure-blob://dataset_test.json', $azure_blob_connection_string)))
->run();
3 changes: 0 additions & 3 deletions examples/topics/fs/remote/json_remote_stream_glob.php
@@ -46,9 +46,6 @@

(new Flow())
->read(CSV::from(new Path('flow-aws-s3://nested/**/*.csv', $s3_client_option), 10))
- ->withEntry('unpacked', ref('row')->unpack())
- ->renameAll('unpacked.', '')
- ->drop('row')
->withEntry('id', ref('id')->cast('int'))
->withEntry('name', concat(ref('name'), lit(' '), ref('last name')))
->drop('last name')
8 changes: 1 addition & 7 deletions examples/topics/types/csv/csv_read.php
@@ -2,20 +2,14 @@

declare(strict_types=1);

- use function Flow\ETL\DSL\col;
- use function Flow\ETL\DSL\ref;
use Aeon\Calendar\Stopwatch;
use Flow\ETL\DSL\CSV;
use Flow\ETL\Flow;

require __DIR__ . '/../../../bootstrap.php';

$flow = (new Flow())
- ->read(CSV::from(__FLOW_OUTPUT__ . '/dataset.csv', 1000))
- ->withEntry('unpacked', ref('row')->unpack())
- ->renameAll('unpacked.', '')
- ->drop(col('row'))
- ->limit(10_000);
+ ->read(CSV::from(__FLOW_OUTPUT__ . '/dataset.csv', 1000));

if ($_ENV['FLOW_PHAR_APP'] ?? false) {
return $flow;
4 changes: 0 additions & 4 deletions examples/topics/types/csv/csv_read_partitioned.php
@@ -2,7 +2,6 @@

declare(strict_types=1);

- use function Flow\ETL\DSL\col;
use function Flow\ETL\DSL\ref;
use Aeon\Calendar\Stopwatch;
use Flow\ETL\DSL\CSV;
@@ -13,9 +12,6 @@

$flow = (new Flow())
->read(CSV::from(__FLOW_DATA__ . '/partitioned'))
- ->withEntry('unpacked', ref('row')->unpack())
- ->renameAll('unpacked.', '')
- ->drop(col('row'))
->collect()
->sortBy(ref('id'))
->write(To::output());
3 changes: 0 additions & 3 deletions examples/topics/types/csv/csv_read_partitioned_filter.php
@@ -13,9 +13,6 @@

$flow = (new Flow())
->read(CSV::from(__FLOW_DATA__ . '/partitioned'))
- ->withEntry('unpacked', ref('row')->unpack())
- ->renameAll('unpacked.', '')
- ->drop('row')
->collect()
->filterPartitions(Partitions::only('t_shirt_color', 'green'))
->sortBy(ref('id'))
5 changes: 0 additions & 5 deletions examples/topics/types/csv/csv_to_avro.php
@@ -2,8 +2,6 @@

declare(strict_types=1);

- use function Flow\ETL\DSL\col;
- use function Flow\ETL\DSL\ref;
use Aeon\Calendar\Stopwatch;
use Flow\ETL\DSL\Avro;
use Flow\ETL\DSL\CSV;
@@ -13,9 +11,6 @@

$flow = (new Flow())
->read(CSV::from(__FLOW_OUTPUT__ . '/dataset.csv', 10_000))
- ->withEntry('unpacked', ref('row')->unpack())
- ->renameAll('unpacked.', '')
- ->drop(col('row'))
->rename('last name', 'last_name')
->write(Avro::to(__FLOW_OUTPUT__ . '/dataset.avro'));

5 changes: 0 additions & 5 deletions examples/topics/types/csv/csv_to_json.php
@@ -1,7 +1,5 @@
<?php declare(strict_types=1);

- use function Flow\ETL\DSL\col;
- use function Flow\ETL\DSL\ref;
use Flow\ETL\DSL\CSV;
use Flow\ETL\DSL\Json;
use Flow\ETL\Flow;
@@ -14,9 +12,6 @@

$flow = (new Flow())
->read(CSV::from(__FLOW_OUTPUT__ . '/dataset.csv'))
- ->withEntry('unpacked', ref('row')->unpack())
- ->renameAll('unpacked.', '')
- ->drop(col('row'))
->write(Json::to(__FLOW_OUTPUT__ . '/dataset.json'));

if ($_ENV['FLOW_PHAR_APP'] ?? false) {
5 changes: 0 additions & 5 deletions examples/topics/types/csv/csv_to_parquet_100k.php
@@ -2,8 +2,6 @@

declare(strict_types=1);

- use function Flow\ETL\DSL\col;
- use function Flow\ETL\DSL\ref;
use Aeon\Calendar\Stopwatch;
use Flow\ETL\DSL\CSV;
use Flow\ETL\DSL\Parquet;
@@ -13,9 +11,6 @@

$flow = (new Flow())
->read(CSV::from(__FLOW_OUTPUT__ . '/dataset.csv', 10_000))
- ->withEntry('unpacked', ref('row')->unpack())
- ->renameAll('unpacked.', '')
- ->drop(col('row'))
->write(Parquet::to(__FLOW_OUTPUT__ . '/dataset_100k.parquet', 100_000));

if ($_ENV['FLOW_PHAR_APP'] ?? false) {
5 changes: 0 additions & 5 deletions examples/topics/types/csv/csv_to_parquet_10k.php
@@ -2,8 +2,6 @@

declare(strict_types=1);

- use function Flow\ETL\DSL\col;
- use function Flow\ETL\DSL\ref;
use Aeon\Calendar\Stopwatch;
use Flow\ETL\DSL\CSV;
use Flow\ETL\DSL\Parquet;
@@ -13,9 +11,6 @@

$flow = (new Flow())
->read(CSV::from(__FLOW_OUTPUT__ . '/dataset.csv', 10_000))
- ->withEntry('unpacked', ref('row')->unpack())
- ->renameAll('unpacked.', '')
- ->drop(col('row'))
->write(Parquet::to(__FLOW_OUTPUT__ . '/dataset_10k.parquet', 10_000));

if ($_ENV['FLOW_PHAR_APP'] ?? false) {
5 changes: 0 additions & 5 deletions examples/topics/types/csv/php_to_csv.php
@@ -2,8 +2,6 @@

declare(strict_types=1);

- use function Flow\ETL\DSL\col;
- use function Flow\ETL\DSL\ref;
use Aeon\Calendar\Stopwatch;
use Flow\ETL\DSL\CSV;
use Flow\ETL\Flow;
@@ -15,9 +13,6 @@

$flow = (new Flow())
->read($extractor)
- ->withEntry('unpacked', ref('row')->unpack())
- ->renameAll('unpacked.', '')
- ->drop(col('row'))
->write(CSV::to(__FLOW_OUTPUT__ . '/dataset.csv'));

if ($_ENV['FLOW_PHAR_APP'] ?? false) {
5 changes: 0 additions & 5 deletions examples/topics/types/csv/php_to_csv_and_json.php
@@ -2,8 +2,6 @@

declare(strict_types=1);

- use function Flow\ETL\DSL\col;
- use function Flow\ETL\DSL\ref;
use Aeon\Calendar\Stopwatch;
use Flow\ETL\DSL\CSV;
use Flow\ETL\DSL\Json;
@@ -16,9 +14,6 @@

$flow = (new Flow())
->read($extractor)
- ->withEntry('unpacked', ref('row')->unpack())
- ->renameAll('unpacked.', '')
- ->drop(col('row'))
->write(CSV::to(__FLOW_OUTPUT__ . '/dataset.csv'))
->write(Json::to(__FLOW_OUTPUT__ . '/dataset.json'));
