Add config for which column header to use on filtered csv downloads (#4261)

* Allow config for which column header to use on filtered csv downloads

* set default config for csv_headers_mode

* Move getTableColsAndComments to where it is used

* Adds test

* Add documentation for CSV Headers Mode (#4264)

---------

Co-authored-by: Paul Mitchum <paul@mile23.com>
janette and paul-m committed Aug 22, 2024
1 parent b668ef3 commit f0fad05
Showing 12 changed files with 244 additions and 18 deletions.
23 changes: 20 additions & 3 deletions docs/source/user-guide/guide_data_dictionaries.rst
Expand Up @@ -31,7 +31,7 @@ The structure of your data dictionary should follow `Frictionless Standards tabl
{
"data": {
"title": "A human readable label",
"title": "A human readable label for the dictionary",
"fields": [
{
"name": "(REQUIRED) machine name of the field that matches the datastore column header.",
Expand All @@ -50,7 +50,7 @@ The "name" should match the datastore column name. These are derived from the co

title
^^^^^
This is usually the column header from the data file, but if the data file uses abbreviated column headings, this is where you can supply a more human readable and clear display title. This value will also be used for column headings when users export a filtered subset of results as a csv file.
This is usually the column header from the data file, but if the data file uses abbreviated column headings, this is where you can supply a more human readable and clear display title. Depending on your :ref:`configuration <guide_data_dictionary_config>`, this value may also be used for column headings when users export a filtered subset of results as a csv file.

type
^^^^
Expand Down Expand Up @@ -209,7 +209,7 @@ Adding indexes
Data dictionaries can be used to describe indexes that should be applied when importing to a database.
Learn more about this on :doc:`guide_indexes`

How to set the data dictionary mode
How to set the Dictionary Mode
-----------------------------------

In the section above we created a data dictionary
Expand Down Expand Up @@ -303,3 +303,20 @@ request the dataset back from the API, it would show us the absolute URL as well
If you have set the dictionary mode to *distribution reference*, any time you update the data file in the distribution, the datastore will be dropped, re-imported, and any data typing defined in the data dictionary will be applied to the table.

If you have set the dictionary mode to *sitewide*, when any dataset is updated, and the machine name of the column header from the source data matches the name value in the sitewide data dictionary, the data typing will also be applied to the datastore table.


.. _guide_data_dictionary_config:

How to set the CSV Headers Mode
-------------------------------

Users can run queries against the datastore API and download the results to a CSV file. The **CSV Headers Mode** will determine what values to use for the column headers when the CSV file is generated. The default setting will simply use the same column headings that exist in the original resource file. If your site is using data dictionaries, you could change this setting to use the titles defined in the data dictionary. And there is a third option to use the converted machine name headers that are used in the datastore table.

Visit `/admin/dkan/data-dictionary/settings` to make a selection.

- Use the column names from the resource file
- Use data dictionary titles
- Use the datastore machine names

.. NOTE::
If you are changing this setting after data has been imported, you will need to re-import the data for the change to take effect.
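
As a rough illustration of the three options listed above, the standalone PHP sketch below picks a header row by mode. The mode strings `resource_headers`, `dictionary_titles` and `machine_names` are the values used elsewhere in this commit; the function name `chooseCsvHeaders()` and the per-column keys `resource`, `title` and `machine` are hypothetical and are not part of DKAN.

```php
<?php

declare(strict_types=1);

/**
 * Pick CSV column headers for a given mode (illustrative only).
 *
 * @param string $mode
 *   One of 'resource_headers', 'dictionary_titles' or 'machine_names'.
 * @param array $columns
 *   Hypothetical column records with 'resource', 'title' and 'machine' keys.
 *
 * @return string[]
 *   One header per column.
 */
function chooseCsvHeaders(string $mode, array $columns): array {
  // Map each mode to the (hypothetical) key that holds its header value.
  $key_by_mode = [
    'resource_headers' => 'resource',
    'dictionary_titles' => 'title',
    'machine_names' => 'machine',
  ];
  // Unknown modes fall back to the resource headers, the documented default.
  $key = $key_by_mode[$mode] ?? 'resource';

  // A missing value falls back to the machine name so no header is empty.
  return array_map(fn (array $col) => $col[$key] ?? $col['machine'], $columns);
}

$columns = [
  ['resource' => 'Emp ID', 'title' => 'Employee identifier', 'machine' => 'emp_id'],
  ['resource' => 'DOB', 'title' => 'Date of birth', 'machine' => 'dob'],
];

echo implode(',', chooseCsvHeaders('dictionary_titles', $columns)), PHP_EOL;
```

The fallback to the machine name for a missing value is a choice made for this sketch; the commit itself stores the chosen header as a column comment at import time, which is why the note above asks for a re-import after changing the setting.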
1 change: 1 addition & 0 deletions modules/datastore/datastore.services.yml
Expand Up @@ -72,6 +72,7 @@ services:
- '@dkan.datastore.database_connection_factory'
- '@pdlt.converter.strptime_to_mysql'
- '@uuid'
- '@config.factory'

dkan.datastore.database:
class: \Drupal\Core\Database\Connection
Expand Down
8 changes: 7 additions & 1 deletion modules/datastore/src/Controller/AbstractQueryController.php
Expand Up @@ -352,6 +352,7 @@ public static function fixTypes($json, $schema) {
* Array of strings for a CSV header row.
*/
protected function getHeaderRow(DatastoreQuery $datastoreQuery, RootedJsonData &$result) {
$config = $this->configFactory->get('metastore.settings')->get('csv_headers_mode');
$schema_fields = $result->{'$.schema..fields'}[0] ?? [];
if (empty($schema_fields)) {
throw new \DomainException("Could not generate header for CSV.");
Expand All @@ -363,7 +364,12 @@ protected function getHeaderRow(DatastoreQuery $datastoreQuery, RootedJsonData &
$header_row = [];
foreach ($datastoreQuery->{'$.properties'} ?? [] as $property) {
$normalized_prop = $this->propToString($property, $datastoreQuery);
$header_row[] = $schema_fields[$normalized_prop]['description'] ?? $normalized_prop;
if ($config == "machine_names") {
$header_row[] = $normalized_prop ?? ($schema_fields[$normalized_prop]['description'] ?? FALSE);
}
else {
$header_row[] = $schema_fields[$normalized_prop]['description'] ?? $normalized_prop;
}
}

return $header_row;
Expand Down
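
The header selection in the hunk above can be sketched outside Drupal as follows. This is a simplified reading, not the controller itself: `buildHeaderRow()` and its parameters are hypothetical names, and the `description` key only mirrors the one read from the schema in the hunk. Note that PHP evaluates the right-hand side of `??` only when the left-hand side is null or unset, so in the `machine_names` branch of the hunk the description fallback is reached only if the normalized property is null; the sketch therefore uses the property name directly.

```php
<?php

declare(strict_types=1);

/**
 * Build a CSV header row from query properties (illustrative only).
 *
 * @param string[] $properties
 *   Machine names of the columns requested by the query.
 * @param array $schema_fields
 *   Field definitions keyed by machine name.
 * @param string|null $mode
 *   Value of the csv_headers_mode setting, or NULL when unset.
 *
 * @return string[]
 *   Header row.
 */
function buildHeaderRow(array $properties, array $schema_fields, ?string $mode): array {
  $header_row = [];
  foreach ($properties as $prop) {
    if ($mode === 'machine_names') {
      // Machine-name mode: use the datastore column name as-is.
      $header_row[] = $prop;
    }
    else {
      // Other modes: prefer the field description, else the machine name.
      $header_row[] = $schema_fields[$prop]['description'] ?? $prop;
    }
  }
  return $header_row;
}

$schema_fields = [
  'emp_id' => ['description' => 'Emp ID'],
  'dob' => [],
];

print_r(buildHeaderRow(['emp_id', 'dob'], $schema_fields, 'machine_names'));
print_r(buildHeaderRow(['emp_id', 'dob'], $schema_fields, NULL));
```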
41 changes: 30 additions & 11 deletions modules/datastore/src/DataDictionary/AlterTableQuery/MySQLQuery.php
Expand Up @@ -3,7 +3,6 @@
namespace Drupal\datastore\DataDictionary\AlterTableQuery;

use Drupal\Core\Database\StatementInterface;

use Drupal\datastore\Plugin\QueueWorker\ImportJob;
use Drupal\datastore\DataDictionary\AlterTableQueryBase;
use Drupal\datastore\DataDictionary\AlterTableQueryInterface;
Expand Down Expand Up @@ -92,6 +91,20 @@ class MySQLQuery extends AlterTableQueryBase implements AlterTableQueryInterface
'duration' => 'TINYTEXT',
];

/**
* Config for which column heading values to use for csv downloads.
*
* @var string
*/
protected string $csvHeadersMode = 'resource_headers';

/**
* Assign the csvHeaderMode based on the config setting.
*/
public function setCsvHeaderMode($mode) {
$this->csvHeadersMode = $mode ?? 'resource_headers';
}

/**
* {@inheritdoc}
*/
Expand All @@ -100,7 +113,6 @@ public function doExecute(): void {
$this->fields = $this->sanitizeFields($this->fields);
// Filter out fields which are not present in the database table.
$this->fields = $this->mergeFields($this->fields, $this->table);

// Sanitize index field names to match database field names.
$this->indexes = $this->sanitizeIndexes($this->indexes);
// Filter out indexes with fields which are not present in the table.
Expand Down Expand Up @@ -177,7 +189,6 @@ protected function sanitizeIndexes(array $indexes): array {
protected function mergeFields(array $fields, string $table): array {
$table_cols = $this->getTableColsAndComments($table);
$column_names = array_keys($table_cols);

// Filter out un-applicable query fields.
$filtered_fields = array_filter($fields, fn ($fields) => in_array($fields['name'], $column_names, TRUE));
// Fill missing field titles.
Expand Down Expand Up @@ -405,7 +416,7 @@ protected function buildAlterCommand(string $table, array $fields, array $indexe
$mysql_type_map = $this->buildDatabaseTypeMap($fields, $table);
// Build alter options.
$alter_options = array_merge(
$this->buildModifyColumnOptions($fields, $mysql_type_map),
$this->buildModifyColumnOptions($fields, $mysql_type_map, $table),
$this->buildAddIndexOptions($indexes, $table, $mysql_type_map)
);

Expand Down Expand Up @@ -441,21 +452,29 @@ protected function buildDatabaseTypeMap(array $fields, string $table): array {
* Query fields.
* @param string[] $type_map
* Field -> MySQL type map.
* @param string $table
* Mysql table name.
*
* @return string[]
* Modify column options.
*/
protected function buildModifyColumnOptions(array $fields, array $type_map): array {
protected function buildModifyColumnOptions(array $fields, array $type_map, string $table): array {
$modify_column_options = [];

foreach ($fields as ['name' => $field, 'title' => $title]) {
$column_type = $type_map[$field];
// Escape characters in column title in preparation for it being used as
// a MySQL comment.
$comment = addslashes($title);
// Build modify line for alter command and add the appropriate arguments
// to the args list.
$modify_column_options[] = "MODIFY COLUMN {$field} {$column_type} COMMENT '{$comment}'";
if ($this->csvHeadersMode == 'dictionary_titles') {
// Escape characters in column title in preparation for it being used as
// a MySQL comment.
$comment = addslashes($title);
// Build modify line for alter command and add the appropriate arguments
// to the args list.
$modify_column_options[] = "MODIFY COLUMN {$field} {$column_type} COMMENT '{$comment}'";
}
else {
$table_cols = $this->getTableColsAndComments($table);
$modify_column_options[] = "MODIFY COLUMN {$field} {$column_type} COMMENT '{$table_cols[$field]}'";
}
}

return $modify_column_options;
Expand Down
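
A minimal standalone sketch of the comment selection in `buildModifyColumnOptions()` follows. The function name `buildModifyColumnClauses()` and the `$existing` parameter (standing in for the result of `getTableColsAndComments()`) are hypothetical. One deliberate difference is labeled in the code: the sketch applies `addslashes()` in both branches, whereas the hunk above escapes only the dictionary title.

```php
<?php

declare(strict_types=1);

/**
 * Build MODIFY COLUMN clauses whose comment depends on the headers mode.
 *
 * @param array $fields
 *   Dictionary fields, each with 'name' and 'title'.
 * @param string[] $type_map
 *   Column name -> MySQL type.
 * @param string[] $existing
 *   Column name -> current column comment (hypothetical stand-in).
 * @param string $mode
 *   Value of the csv_headers_mode setting.
 *
 * @return string[]
 *   One clause per field.
 */
function buildModifyColumnClauses(array $fields, array $type_map, array $existing, string $mode): array {
  $clauses = [];
  foreach ($fields as ['name' => $name, 'title' => $title]) {
    // Dictionary mode writes the title; other modes keep the current comment.
    $comment = $mode === 'dictionary_titles' ? $title : ($existing[$name] ?? '');
    // Sketch-only choice: escape quotes and backslashes in both branches.
    $comment = addslashes($comment);
    $clauses[] = "MODIFY COLUMN {$name} {$type_map[$name]} COMMENT '{$comment}'";
  }
  return $clauses;
}

$fields = [['name' => 'dob', 'title' => "Employee's date of birth"]];
$type_map = ['dob' => 'DATE'];
$existing = ['dob' => 'DOB'];

echo buildModifyColumnClauses($fields, $type_map, $existing, 'dictionary_titles')[0], PHP_EOL;
echo buildModifyColumnClauses($fields, $type_map, $existing, 'resource_headers')[0], PHP_EOL;
```

Passing the existing comments in as an argument also avoids looking them up once per field inside the loop, which may be worth considering for wide tables.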
Original file line number Diff line number Diff line change
Expand Up @@ -16,13 +16,17 @@ class MySQLQueryBuilder extends AlterTableQueryBuilderBase implements AlterTable
* {@inheritdoc}
*/
public function getQuery(): AlterTableQueryInterface {
return new MySQLQuery(

$query = new MySQLQuery(
$this->databaseConnectionFactory->getConnection(),
$this->dateFormatConverter,
$this->table,
$this->fields,
$this->indexes,
);

$query->setCsvHeaderMode($this->configFactory->get('metastore.settings')->get('csv_headers_mode'));
return $query;
}

}
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
use Drupal\Component\Uuid\UuidInterface;

use Drupal\common\Storage\DatabaseConnectionFactoryInterface;
use Drupal\Core\Config\ConfigFactory;
use Drupal\datastore\DataDictionary\AlterTableQueryInterface;

use PDLT\ConverterInterface;
Expand Down Expand Up @@ -96,6 +97,13 @@ abstract class AlterTableQueryBuilderBase implements AlterTableQueryBuilderInter
*/
protected array $indexes = [];

/**
* ConfigFactory object.
*
* @var \Drupal\Core\Config\ConfigFactory
*/
protected $configFactory;

/**
* Create an alter table query factory.
*
Expand All @@ -105,15 +113,19 @@ abstract class AlterTableQueryBuilderBase implements AlterTableQueryBuilderInter
* PHP Date Language Tool Converter.
* @param \Drupal\Component\Uuid\UuidInterface $uuid
* Uuid generator service.
* @param \Drupal\Core\Config\ConfigFactory $configFactory
* ConfigFactory service.
*/
public function __construct(
DatabaseConnectionFactoryInterface $database_connection_factory,
ConverterInterface $date_format_converter,
UuidInterface $uuid
UuidInterface $uuid,
ConfigFactory $configFactory
) {
$this->databaseConnectionFactory = $database_connection_factory;
$this->dateFormatConverter = $date_format_converter;
$this->uuid = $uuid;
$this->configFactory = $configFactory;
}

/**
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,142 @@
<?php

namespace Drupal\Tests\datastore\Functional\Controller;

use Drupal\Core\File\FileSystemInterface;
use Drupal\Tests\BrowserTestBase;
use Drupal\Tests\common\Traits\GetDataTrait;
use Drupal\Tests\common\Traits\QueueRunnerTrait;
use RootedData\RootedJsonData;

/**
* @coversDefaultClass \Drupal\datastore\Controller\QueryDownloadController
*
* @group dkan
* @group datastore
* @group functional
* @group btb
*/
class QueryDownloadControllerTest extends BrowserTestBase {

use GetDataTrait, QueueRunnerTrait;

/**
* Uploaded resource file destination.
*
* @var string
*/
protected const UPLOAD_LOCATION = 'public://uploaded_resources/';

/**
* Test data file path.
*
* @var string
*/
protected const TEST_DATA_PATH = __DIR__ . '/../../../data/';

/**
* Resource file name.
*
* @var string
*/
protected const RESOURCE_FILE = 'longcolumn.csv';


protected static $modules = [
'datastore',
'node',
];

protected $defaultTheme = 'stark';

/**
* Test application of data dictionary schema to CSV generated for download.
*/
public function testDownloadWithMachineName() {
// Dependencies.
$uuid = $this->container->get('uuid');
/** @var \Drupal\metastore\ValidMetadataFactory $validMetadataFactory */
$validMetadataFactory = $this->container->get('dkan.metastore.valid_metadata');
/** @var \Drupal\metastore\MetastoreService $metastoreService */
$metastoreService = $this->container->get('dkan.metastore.service');
// Copy resource file to uploads directory.
/** @var \Drupal\Core\File\FileSystemInterface $file_system */
$file_system = $this->container->get('file_system');
$upload_path = $file_system->realpath(self::UPLOAD_LOCATION);
$file_system->prepareDirectory($upload_path, FileSystemInterface::CREATE_DIRECTORY);
$file_system->copy(self::TEST_DATA_PATH . self::RESOURCE_FILE, $upload_path, FileSystemInterface::EXISTS_REPLACE);
$resourceUrl = $this->container->get('stream_wrapper_manager')
->getViaUri(self::UPLOAD_LOCATION . self::RESOURCE_FILE)
->getExternalUrl();

// Set up dataset.
$dataset_id = $uuid->generate();
$this->assertInstanceOf(
RootedJsonData::class,
$dataset = $validMetadataFactory->get(
$this->getDataset(
$dataset_id,
'Test ' . $dataset_id,
[$resourceUrl],
TRUE
),
'dataset'
)
);
// Create dataset.
$this->assertEquals(
$dataset_id,
$metastoreService->post('dataset', $dataset)
);
// Publish should return FALSE, because the node was already published.
$this->assertFalse($metastoreService->publish('dataset', $dataset_id));

// Retrieve dataset.
$this->assertInstanceOf(
RootedJsonData::class,
$dataset = $metastoreService->get('dataset', $dataset_id)
);

// Run queue items to perform the import.
$this->runQueues(['localize_import', 'datastore_import', 'post_import']);

// Explicitly configure for the CSV's headers.
$this->config('metastore.settings')
->set('csv_headers_mode', 'resource_headers')
->save();

// Query for the dataset, as a streaming CSV.
$client = $this->getHttpClient();
$response = $client->request(
'GET',
$this->baseUrl . '/api/1/datastore/query/' . $dataset_id . '/0/download',
['query' => ['format' => 'csv']]
);

$lines = explode("\n", $response->getBody()->getContents());
$this->assertEquals(
'id,name,extra_long_column_name_with_tons_of_characters_that_will_need_to_be_truncated_in_order_to_work,extra_long_column_name_with_tons_of_characters_that_will_need_to_be_truncated_in_order_to_work2',
$lines[0]
);

// Re-request, but with machine name headers.
$this->config('metastore.settings')
->set('csv_headers_mode', 'machine_names')
->save();

$client = $this->getHttpClient();
$response = $client->request(
'GET',
$this->baseUrl . '/api/1/datastore/query/' . $dataset_id . '/0/download',
['query' => ['format' => 'csv']]
);

$lines = explode("\n", $response->getBody()->getContents());
// Truncated headers from the datastore.
$this->assertEquals(
'id,name,extra_long_column_name_with_tons_of_characters_that_will_ne_e872,extra_long_column_name_with_tons_of_characters_that_will_ne_5127',
$lines[0]
);
}

}
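
The test above compares the first line of the response as a raw string. Where a header may itself contain a comma, comparing parsed cells can be more robust. The sketch below shows one way to do that in plain PHP; `csvHeaderCells()` is a hypothetical helper, not something this commit or DKAN provides, and it assumes headers do not contain line breaks.

```php
<?php

declare(strict_types=1);

/**
 * Return the header cells from the first line of a CSV body.
 *
 * @param string $body
 *   Full CSV text, for example an HTTP response body.
 *
 * @return string[]
 *   Header cells, or an empty array when the body has no content.
 */
function csvHeaderCells(string $body): array {
  // Take everything up to the first line break (handles \n and \r\n).
  $first_line = strtok($body, "\r\n");
  if ($first_line === FALSE) {
    return [];
  }
  // Parse as CSV so quoted headers containing commas stay in one cell.
  return str_getcsv($first_line, ',', '"', '');
}

$body = "id,name,\"city, state\"\n1,Ada,\"London, UK\"\n";
print_r(csvHeaderCells($body));
```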
Original file line number Diff line number Diff line change
Expand Up @@ -186,6 +186,7 @@ public function testDictionaryEnforcement(): void {
$metastore_config = $this->config('metastore.settings');
$metastore_config->set('data_dictionary_mode', DataDictionaryDiscovery::MODE_SITEWIDE)
->set('data_dictionary_sitewide', $dict_id)
->set('csv_headers_mode', 'dictionary_titles')
->save();

// Build dataset.
Expand Down