topological concatenation of files
topcat
is a simple tool to concatenate files in a topological order. It is useful when you have a set of files that
depend on each other and you want to concatenate them in the right order.
For my use case this is SQL files.
I like to treat my SQL files as a set of functions and views that depend on each other. I like to keep them in separate files and concatenate them in the right order to create a single file that I can run in my database.
pip:
pip install topcat
poetry:
poetry add topcat
topcat -i /path/to/input -o /path/to/output.sql
Where /path/to/input
is the directory containing the files to concatenate and /path/to/output.sql
will be where the
concatenated file will be written.
USAGE:
topcat [FLAGS] [OPTIONS] --output <FILE>
FLAGS:
--dry Only print the output, do not write to file.
-h, --help Prints help information
-V, --version Prints version information
-v, --verbose Print debug information
OPTIONS:
--comment-str <comment-str>
The string used to denote a comment. eg '--' [default: --]
--ensure-each-file-ends-with <ensure-each-file-ends-with-str>
Add this string to the end of files if it does not exist. eg ';' [default: ;]
-x, --exclude <PATTERN>... Exclude files matching given glob pattern
--file-separator-str <file-separator-str>
Add this between each concatenated file in the output. eg '---' [default:
------------------------------------------------------------------------------------------------------------------------]
-n, --include <PATTERN>... Only include files matching glob pattern
-i, --input_dir <DIR>...
Path to directory containing files to be concatenated
-o, --output <FILE> Path to generate combined output file
Some quirks here:
-i
is the input directory. You can have multiple input directories. This is useful if you have a set of files in different directories that depend on each other.-o
is the output file. This is where the concatenated file will be written.-x
and-n
are used to exclude and include files respectively. These are glob patterns. For example-x **/tests/*
will exclude all files in anytests
directory.-n **/functions/*
will only include files in thefunctions
directory. You can use these together to include and exclude files as you need. You can use these multiple times.--comment-str
is the string used to denote a comment. This is used to find thename
,requires
,dropped_by
andexists
comments in the files. The default is--
. In SQL this is--
but in other languages it might be//
or#
.--ensure-each-file-ends-with
is the string to add to the end of each file if it doesn't exist. This is useful for SQL files where you might want to ensure each file ends with a;
. The default is;
.--file-separator-str
is the string to add between each concatenated file in the output. The default is a long line of dashes. This is just visually useful to see where one file ends and the next begins.--dry
will only print the output, it will not write to the output file.-v
will print debug information and a.dot
format of the dependency graph.
The only requirement for a file to be included in the concatenation is that it needs to have a name
comment at the top
of the file.
This can be anything you want, but it needs to be unique. This is used to define a node in the dependency graph.
For example:
-- name: my_schema
If a file requires another file to be concatenated before it, you can add a requires
comment to the file.
An alias for requires
is dropped_by
. I use dropped_by
in SQL files for clarity to show that the DDL in the file
gets dropped so I don't need to use CREATE OR REPLACE FUNCTION
or the like.
For example:
-- name: my_schema.b
-- dropped_by: my_schema
-- requires: my_schema.a
exists
is for soft dependencies. For example in plpgsql functions, the body isn't parsed until the function is called.
So any dependent objects you can't use requires
for, you can use exists
to ensure the file is included in the
concatenated file but order of creation doesn't matter.
For example:
-- name: my_schema.b
-- dropped_by: my_schema
-- requires: my_schema.a
-- exists: my_schema.c
Lets say you have a directory with the following files:
sql
├── my_other_schema
│ ├── functions
│ │ ├── a.sql
│ │ ├── b.sql
│ │ └── c.sql
│ └── schema.sql
└── my_schema
├── functions
│ └── a.sql
└── schema.sql
And the content of the files is:
sql/my_schema/schema.sql
:
-- name: my_schema
DROP SCHEMA IF EXISTS my_schema CASCADE;
CREATE SCHEMA IF NOT EXISTS my_schema;
sql/my_schema/functions/a.sql
:
-- name: my_schema.a
-- dropped_by: my_schema
CREATE FUNCTION my_schema.a() RETURNS INT AS
$$
SELECT 1;
$$ LANGUAGE SQL IMMUTABLE
PARALLEL SAFE;
sql/my_schema/functions/b.sql
:
-- name: my_schema.b
-- dropped_by: my_schema
-- requires: my_schema.a
CREATE FUNCTION my_schema.b() RETURNS INT AS
$$
SELECT my_schema.a() + 1
$$ LANGUAGE SQL;
sql/my_schema/functions/c.sql
:
-- name: my_schema.c
-- dropped_by: my_schema
-- requires: my_schema.b
CREATE FUNCTION my_schema.c() RETURNS INT AS
$$
SELECT my_schema.b() + 1
$$ LANGUAGE SQL IMMUTABLE
PARALLEL SAFE;
sql/my_other_schema/schema.sql
:
-- name: my_other_schema
DROP SCHEMA IF EXISTS my_schema CASCADE;
CREATE SCHEMA IF NOT EXISTS my_schema;
sql/my_other_schema/functions/a.sql
:
-- name: my_other_schema.a
-- dropped_by: my_other_schema
-- requires: my_schema.b
CREATE FUNCTION my_other_schema.a() RETURNS INT AS
$$
SELECT my_schema.b() + 1
$$ LANGUAGE SQL IMMUTABLE
PARALLEL SAFE;
So the dependency graph looks like:
Now you can run topcat
to concatenate the files in the right order:
topcat -i tests/input/sql -o tests/output/sql/output.sql
The content of output.sql
will be:
-- This file was generated by topcat. To regenerate run:
--
-- topcat -i tests/input/sql -o tests/output/sql/output.sql -v
------------------------------------------------------------------------------------------------------------------------
-- tests/input/sql/my_other_schema/schema.sql
-- name: my_schema
DROP SCHEMA IF EXISTS my_schema CASCADE;
CREATE SCHEMA IF NOT EXISTS my_schema;
------------------------------------------------------------------------------------------------------------------------
-- tests/input/sql/my_other_schema/functions/a.sql
-- name: my_schema.a
-- dropped_by: my_schema
CREATE FUNCTION my_schema.a() RETURNS INT AS
$$
SELECT 1;
$$ LANGUAGE SQL;
------------------------------------------------------------------------------------------------------------------------
-- tests/input/sql/my_other_schema/functions/b.sql
-- name: my_schema.b
-- dropped_by: my_schema
-- requires: my_schema.a
CREATE FUNCTION my_schema.b() RETURNS INT AS
$$
SELECT my_schema.a() + 1
$$ LANGUAGE SQL;
------------------------------------------------------------------------------------------------------------------------
-- tests/input/sql/my_schema/schema.sql
-- name: my_other_schema
DROP SCHEMA IF EXISTS my_other_schema CASCADE;
CREATE SCHEMA IF NOT EXISTS my_other_schema;
------------------------------------------------------------------------------------------------------------------------
-- tests/input/sql/my_schema/functions/a.sql
-- name: my_other_schema.a
-- dropped_by: my_other_schema
-- requires: my_schema.b
CREATE FUNCTION my_other_schema.a() RETURNS INT AS
$$
SELECT my_schema.b() + 1
$$ LANGUAGE SQL IMMUTABLE
PARALLEL SAFE;
------------------------------------------------------------------------------------------------------------------------
-- tests/input/sql/my_other_schema/functions/c.sql
-- name: my_schema.c
-- dropped_by: my_schema
-- requires: my_schema.b
-- requires: my_other_schema.a
CREATE FUNCTION my_schema.c() RETURNS INT AS
$$
SELECT my_schema.b() + my_other_schema.a() + 1
$$ LANGUAGE SQL;