- Centralize schema files (Avro, Protobuf, JSON) in a single Git repository.
- Integrate schema deployment to the Schema Registry and automate model generation and distribution in a centralized CI/CD pipeline.
- Automate schema transformations.
- Resolve schema dependencies in JSON configurations to eliminate redundancy and improve maintainability.
Sample Repo Demo: Try the example here in just a few minutes.
Guide: Scenario Example
In modern microservices architectures, separating concerns is essential for achieving scalability and maintainability. Managing schema files (e.g., Avro, Protobuf, JSON) across multiple services can quickly become complex and error-prone, especially when each microservice maintains its own schemas and handles their publication to a schema registry (e.g., Confluent Schema Registry).
Schema Manager addresses these challenges by centralizing schema management and delegation. It maintains schema files in a centralized Git repository and handles deployment to the target Schema Registry (typically through a CI/CD pipeline).
By enforcing this separation of concerns, Schema Manager simplifies schema management across services, allowing microservices to remain lightweight and focused.
For a more comprehensive explanation of how Schema Manager integrates into modern Kafka-oriented applications involving multiple microservices and languages, refer to the Example of Integration with Schema Manager section. This section covers schema publication, as well as the generation and distribution of models across services.
The Schema Manager is distributed via NPM:
```bash
npm install @charlescol/schema-manager
```
The minimal requirement is to initialize an npm project in your repository and install the package. Schema Manager's capabilities can then be used programmatically from a JavaScript or TypeScript script (example here).
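For instance, a minimal script only needs a registry and a manager instance. The sketch below assumes a local registry at `http://localhost:8081`; the full build-and-register example appears later in this guide:

```typescript
import { ConfigType, ConfluentRegistry, Manager } from '@charlescol/schema-manager';

// Point the manager at a Confluent Schema Registry instance
const registry = new ConfluentRegistry({ schemaRegistryUrl: 'http://localhost:8081' });
const manager = new Manager({
  schemaRegistry: registry,
  configType: ConfigType.PROTOBUF, // Use the Protobuf parser
});
```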
| Config | Supported Formats | Description |
|---|---|---|
| Avro | `.avro`, `.avsc` | Parses Avro schema files; supports extracting dependencies from `.avro` and `.avsc` files. |
| Protobuf | `.proto` | Parses Protobuf schema files; supports extracting dependencies from `.proto` files. |
Please refer to the Parser Documentation for more details on how to create a parser.
| Registry | Class Name | Supported Registries |
|---|---|---|
| Confluent Schema Registry | `ConfluentRegistry` | Confluent Kafka |
Please refer to the Registry Documentation for more details on how to create a registry.
- **Organize Your Schemas:** Place your files in versioned directories (e.g., `v1`, `v2`) and map them in `versions.json`. Refer to the versions file structure for details on how to structure the `versions.json` file.
- **Build Schemas:** Schema Manager produces a `build` directory with all the schemas grouped into folders. Each folder contains the transformed schemas with the appropriate namespaces and dependencies.
- **Automated Registration:** Schema Manager processes the `build` folder to register the schemas in the target Schema Registry, in the correct order based on dependencies.
Consider a system managing a set of Protobuf schemas for an event-driven architecture. Each schema has multiple versions and dependencies.
If you don’t already have a project set up, you can create one with the following commands:
```bash
mkdir schema-manager-example
cd schema-manager-example
npm init -y && npm install @charlescol/schema-manager
```
In this example, we have two topics named `topic1` and `topic2`, as well as a `common` folder used for shared schemas. Each topic can contain multiple major versions of its schema files, managed per topic by a JSON file named `versions.json`. In our case, the schemas for `topic1` have two major versions, due to breaking changes in a file called `data.proto`:
```
example-schemas/
├── topic1/
│   ├── v1/
│   │   ├── data.proto     # Schema for v1 data of topic1
│   │   └── model.proto    # Schema for v1 model (depends on topic1/v1/data.proto)
│   ├── v2/
│   │   └── data.proto     # Schema for v2 data (depends on ./common/v1/entity.proto)
│   └── versions.json      # Version mapping for topic1 (v1 and v2)
├── topic2/
│   ├── v1/
│   │   └── data2.proto    # Schema for v1 data2 of topic2 (depends on ./common/v1/entity.proto)
│   └── versions.json      # Version mapping for topic2 (v1)
└── common/
    └── v1/
        └── entity.proto   # Schema for shared entity
```
In the above structure, the folder `topic1/v2` contains only one file, even though there are three schemas in v2 (`model.proto` → `data.proto` → `entity.proto`). This is because `versions.json` is used to resolve dependencies: if a file is identical across versions, it doesn't need to be duplicated. Schema Manager handles this implicit import mechanism to avoid schema duplication.
`topic1/versions.json`:

```json
{
  "v1": {
    "data": "v1/data.proto",
    "model": "v1/model.proto"
  },
  "v2": {
    "data": "v2/data.proto",
    "model": "v1/model.proto",
    "entity": "../common/v1/entity.proto"
  }
}
```
`topic2/versions.json`:

```json
{
  "v1": {
    "data2": "v1/data2.proto",
    "entity": "../common/v1/entity.proto"
  }
}
```
Adding a `versions.json` file to the `common` directory is unnecessary, because that directory is not meant to be registered independently.
- Schema Manager supports any string for version names (e.g., `v1`, `v1.0`, `alpha`).
- Schema names must be unique within a given version of a topic.
- Alternatively, a single `versions.json` file could be used to manage all the topics centrally.
Note: Multiple examples, including those mentioned here with different schema types (Protobuf, Avro, etc.), can be found in the repository at `./examples`. Explore these to better understand schema management workflows.
Here is an example of how `model.proto` should be structured. For more details on how to structure your schema files by type, refer to the Schema Content section.
```protobuf
// Information regarding the package, namespace, and import path will be completed automatically during the build process
syntax = "proto3";

// Don't need to create a package.
// Don't need to import the full path, only the filename is enough.
import "data.proto";

message Model {
  Data data = 1; // Don't need to reference the namespace, only the message name is enough.
  string description = 2;
  repeated string tags = 3;
}
```
The code below will generate a `build` directory with all the schemas grouped into folders, one per topic and version.
```typescript
import { ConfigType, ConfluentRegistry, Manager } from '@charlescol/schema-manager';
import * as path from 'path';

(async () => {
  const SCHEMA_REGISTRY_URL = process.env.SCHEMA_REGISTRY_URL || 'http://localhost:8081';
  const SCHEMA_DIR = path.resolve(__dirname, '../schemas');

  // Create the schema registry
  const registry = new ConfluentRegistry({
    schemaRegistryUrl: SCHEMA_REGISTRY_URL,
  });

  // Create the manager
  const manager = new Manager({
    schemaRegistry: registry,
    configType: ConfigType.PROTOBUF, // Specify the config type
  });

  await manager.build(SCHEMA_DIR); // Build the schemas
})();
```
Build Directory Structure:
```
build/
├── topic1/
│   ├── v1/
│   │   ├── data.proto    # No dependencies
│   │   └── model.proto   # Depends on data.proto
│   └── v2/
│       ├── data.proto    # Depends on entity.proto
│       ├── model.proto   # Depends on data.proto
│       └── entity.proto
└── topic2/
    └── v1/
        ├── data2.proto   # Depends on entity.proto
        └── entity.proto
```
Each generated file is now ready to be registered in the schema registry. For example, here is the built `topic1/v1/model.proto`:
```protobuf
syntax = "proto3";

package topic1.v1; // Resolved automatically

import "topic1/v1/data.proto"; // Resolved automatically

message Model {
  topic1.v1.Data data = 1; // Resolved automatically
  string description = 2;
  repeated string tags = 3;
}
```
By default, the namespace is generated by replacing `/` with `.` in the path. This can be customized by providing a custom namespace builder function to the manager, as sketched below.
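For illustration only, a custom builder might look like this. Note that `namespaceBuilder` is an assumed option name, so check the Manager configuration for the exact signature:

```typescript
import { ConfigType, ConfluentRegistry, Manager } from '@charlescol/schema-manager';
import * as path from 'path';

const registry = new ConfluentRegistry({ schemaRegistryUrl: 'http://localhost:8081' });

// Hypothetical sketch: "namespaceBuilder" is an assumed option name.
// Maps e.g. "topic1/v1/model.proto" to "com.example.topic1.v1"
// instead of the default "topic1.v1".
const manager = new Manager({
  schemaRegistry: registry,
  configType: ConfigType.PROTOBUF,
  namespaceBuilder: (filepath: string) =>
    'com.example.' + path.dirname(filepath).split('/').join('.'),
});
```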
Once the schemas are built, they can be deployed in the registry.
```typescript
function subjectBuilder(filepath: string): string {
  const [topic, version, filename] = filepath.split('/'); // Extract topic, version, and filename
  const filenameWithoutExt = filename?.split('.')[0] || ''; // Strip the file extension
  return `${topic}.${filenameWithoutExt}.${version}`; // Return the constructed subject
}

await manager.register(subjectBuilder); // Register the schemas
```
The `subjectBuilder` function generates the subject name for each schema registered in the schema registry. The subject is a unique identifier used by the registry to track a schema. The `register` function builds a dependency graph from the schema dependencies and registers them in the correct order, using a topological sorting algorithm (a sketch of the idea follows the lists below).
For example, the `subjectBuilder` above produces the following subjects:

- `topic1/v1/data.proto` → `topic1.data.v1`
- `topic1/v1/model.proto` → `topic1.model.v1`
- `topic1/v2/data.proto` → `topic1.data.v2`

Based on the dependency graph, the schemas are registered in the following order (dependencies first):

1. `topic1/v1/data.proto`
2. `topic1/v2/entity.proto`
3. `topic2/v1/entity.proto`
4. `topic1/v2/data.proto`
5. `topic2/v1/data2.proto`
6. `topic1/v1/model.proto`
7. `topic1/v2/model.proto`
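To illustrate the ordering above, here is a minimal topological sort over a dependency map. This is only a sketch of the idea (no cycle detection), not Schema Manager's actual implementation:

```typescript
// Returns files ordered so that every file appears after its dependencies.
function topologicalOrder(dependencies: Map<string, string[]>): string[] {
  const visited = new Set<string>();
  const order: string[] = [];
  const visit = (file: string): void => {
    if (visited.has(file)) return;
    visited.add(file);
    for (const dep of dependencies.get(file) ?? []) visit(dep); // Visit dependencies first
    order.push(file);
  };
  for (const file of dependencies.keys()) visit(file);
  return order;
}

// Example with part of the build output above:
console.log(
  topologicalOrder(
    new Map([
      ['topic1/v2/model.proto', ['topic1/v2/data.proto']],
      ['topic1/v2/data.proto', ['topic1/v2/entity.proto']],
      ['topic1/v2/entity.proto', []],
    ]),
  ),
); // ['topic1/v2/entity.proto', 'topic1/v2/data.proto', 'topic1/v2/model.proto']
```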
Below is the complete code example for the steps above.
```typescript
import { ConfigType, ConfluentRegistry, Manager } from '@charlescol/schema-manager';
import * as path from 'path';

(async () => {
  const SCHEMA_REGISTRY_URL = process.env.SCHEMA_REGISTRY_URL || 'http://localhost:8081';
  const SCHEMA_DIR = path.resolve(__dirname, '../schemas');

  // Create the schema registry
  const registry = new ConfluentRegistry({
    schemaRegistryUrl: SCHEMA_REGISTRY_URL,
  });

  // Create the manager
  const manager = new Manager({
    schemaRegistry: registry,
    configType: ConfigType.PROTOBUF,
  });

  await manager.build(SCHEMA_DIR); // Build the schemas
  await manager.register(subjectBuilder); // Register the schemas
})();

// Function to provide, used to build the subject for each schema file
function subjectBuilder(filepath: string): string {
  const [topic, version, filename] = filepath.split('/');
  const filenameWithoutExt = filename?.split('.')[0] || '';
  return `${topic}.${filenameWithoutExt}.${version}`; // Return the constructed subject
}
```
If the file above is saved as `publish-schemas.ts`, you can compile and execute it with the following command:

```bash
tsc && node dist/publish-schemas.js
```
Many details are dynamically resolved during the build process using the `versions.json` file. This setup avoids redundancy and enables dynamic schema transformations.
- **Package Handling:** The package is automatically included during the build process and does not need to be specified in the schema.
- **Imports:** Import statements should reference only the file name, without the full path (e.g., `import "entity.proto";` ✅, not `import "common/v1/entity.proto";` ❌).
- **External Imports:** External imports (e.g., `google/protobuf/timestamp.proto`) remain unchanged.
- **Object Names:** Internal object references do not need to include the package name (e.g., `Entity` ✅, not `common.v1.Entity` ❌).

For example:
```protobuf
// topic1/v2/data.proto
syntax = "proto3";

// No need to create a package.
import "entity.proto"; // Use only the file name
import "google/protobuf/timestamp.proto"; // External imports are unchanged

message Data {
  string additional_info = 1;
  google.protobuf.Timestamp created_at = 2; // External imports are unchanged
  repeated Entity entities = 3; // Internal object names exclude the package name
}
```
- **Namespace Handling:** Namespaces are automatically included during the build process and do not need to be specified in the schema.
- **Object Names:** Internal object references do not need to include the namespace name (e.g., `Entity` ✅, not `common.v1.Entity` ❌).

For example, an Avro record can reference `Entity` directly by name:
```json
{
  "type": "record",
  "name": "Model",
  "fields": [
    {
      "name": "description",
      "type": "string"
    },
    {
      "name": "entities",
      "type": {
        "type": "array",
        "items": "Entity"
      }
    }
  ]
}
```
We aim to extend Schema Manager with the following features:
- **Support for Additional Schema Registries:** Expanding beyond Confluent Schema Registry to support other platforms.
- **Expanded Format Support:** Adding parsers for more schema types beyond Protobuf, Avro, and JSON.
- **Complex Transformations:** Enabling advanced schema transformations, such as adding options in Protobuf files to control future code generation.
A typical use case for Schema Manager is managing schemas in a Kafka-oriented application with multiple microservices. By centralizing schemas in a single repository, you can automate schema management tasks through a CI/CD pipeline.
- **Build Schemas:** Generate schema files from source.
- **Register Schemas:** Deploy schemas to the schema registry.
- **Code Generation:** Generate code for schemas in target programming languages.
- **Package Distribution:** Distribute generated code as packages (e.g., Maven, NPM, PyPI).
- **Centralized Schema Management:** Store all schemas in a version-controlled Git repository. Updates to `versions.json` trigger builds and registrations in the Schema Registry.
- **Microservice Integration:** Each microservice uses the versioned, packaged models generated by the central repository. These models ensure consistency across services.
- **Serialization:** Use tools like `kafka-protobuf-serializer` (Java) or `@kafkajs/confluent-schema-registry` (JavaScript) to serialize data with the latest schema version retrieved from the registry, as sketched below.
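For example, a minimal sketch with `@kafkajs/confluent-schema-registry`, assuming a subject named `topic1.model.v1` as produced by the `subjectBuilder` earlier in this guide:

```typescript
import { SchemaRegistry } from '@kafkajs/confluent-schema-registry';

const registry = new SchemaRegistry({ host: 'http://localhost:8081' });

// Encode a payload with the latest schema registered under the subject.
async function encodeModel(payload: object): Promise<Buffer> {
  const id = await registry.getLatestSchemaId('topic1.model.v1'); // Look up the latest schema id
  return registry.encode(id, payload); // Serialize in the Confluent wire format
}
```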
- **Simplified Management:** Centralized schemas reduce duplication and inconsistencies.
- **Automation:** CI/CD integration ensures schema updates are handled automatically.
- **Compatibility:** Versioned models guarantee consistency across services.
We welcome contributions! If you have suggestions or improvements, please open an issue or submit a pull request.
Refer to the Contributing Guide and How-To Documentation for details on contributing.
This project is licensed under the MIT License. See the LICENSE file for details.