submitted to FoIKS 2024
stefjoosten committed Dec 11, 2023
1 parent b842962 commit bf94f68
Showing 1 changed file with 35 additions and 39 deletions.
74 changes: 35 additions & 39 deletions 2022Migration/articleMigrationFoIKS.tex
@@ -32,10 +32,6 @@

% Ampersand -----------------------------------------------------------

%\def\id#1{\text{\it #1\/}}
\newcommand{\ourtheory}{approach}
\newcommand{\foundtheory}{foundational approach}

\newcommand{\xrightarrowdbl}[2][]{%
\xrightarrow[#1]{#2}\mathrel{\mkern-14mu}\rightarrow
}
@@ -184,13 +180,13 @@
This can inhibit a steady pace of releases, especially for increments that alter the system's schema in production.
Consequently, schema-changing data migrations often face challenges, leading developers to resort to manual migration or employ workarounds.

To address this issue, this paper proposes a \foundtheory{} for data migration,
To address this issue, this paper proposes a foundational approach for data migration,
aiming to generate migration scripts for automating the migration process.
The overarching challenge is preserving the business semantics of data amidst schema changes.
Specifically, this paper tackles the task of generating a migration script based on the schemas of both the existing and the desired system,
under the condition of zero downtime.
The proposed solution was validated by a prototype demonstrating its efficacy.
Notably, the \ourtheory{} is technology-independent, articulating systems in terms of invariants, thereby ensuring applicability across various scenarios.
Notably, the approach is technology-independent, articulating systems in terms of invariants, thereby ensuring applicability across various scenarios.
The migration script generator will be implemented in a software generator named Ampersand.
\keywords{Generative software \and Incremental software deployment \and Data migration \and Relation algebra \and Ampersand \and Schema change \and Invariants \and Zero downtime}
\end{abstract}
@@ -245,20 +241,20 @@ \section{Introduction}
Some forms of data pollution are not automatable, however.
An example is when a person has deliberately specified a false name without violating any constraint in the system.

The complexity of data migration has prompted us to develop an \ourtheory{} first,
The complexity of data migration has prompted us to develop an approach first,
which we present in this contribution.
We have validated the \ourtheory{} by prototyping because a formal proof of correctness is currently beyond our reach.
This \ourtheory{} perceives an information system as a data set with constraints,
We have validated the approach by prototyping because a formal proof of correctness is currently beyond our reach.
This approach perceives an information system as a data set with constraints,
so we can represent invariants (and thus the business semantics) directly as constraints.

The next section analyzes SCDMs with an eye on zero downtime and data quality.
It sketches the outline of a procedure for SCDMs.
Section~\ref{sct:Definitions} formalizes the concepts that we need to define the procedure.
Section~\ref{sct:Generating} defines the algorithm for generating a migration system, to automate SCDMs.
Section~\ref{sct:PoC} demonstrates the prototype of a migration system, which we used to verify our \ourtheory{} experimentally.
Section~\ref{sct:Generating} defines the migration system that automates SCDMs.
Section~\ref{sct:PoC} demonstrates the prototype of a migration system, which we used to verify our approach experimentally.
We have used the language Ampersand for this purpose,
because its syntax and semantics correspond directly to the definitions in section~\ref{sct:Definitions}.
Finally, section~\ref{sct:Validation} discusses the validation of our \ourtheory{} by showing that all requirements are met.
Finally, section~\ref{sct:Validation} discusses the validation of our approach by showing that all requirements are met.

\section{Analysis}
\label{sct:Analysis}
@@ -283,8 +279,8 @@ \subsection{Information Systems}
Actors (both users and computers) continually change the data in a system.
The state of the system is represented by a data set, which is typically kept in some form of persistent store, such as a database.
Events that the system detects may cause the state to change.
To keep our \ourtheory{} technology independent, we assume that data sets contain triples.
This makes our \ourtheory{} valid for every kind of database that triples can represent,
To keep our approach technology-independent, we assume that data sets contain triples.
This makes our approach valid for every kind of database that triples can represent,
including SQL databases, object-oriented databases, graph databases, triple stores, and other NoSQL databases.

We assume that constraints implement the business semantics of the data.
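The triple-based view of a data set can be made concrete with a minimal Python sketch. It assumes a simplified encoding that is not taken from the paper or from Ampersand: a relation is identified by its name and signature, and a row of a hypothetical SQL table becomes one triple per non-key, non-NULL column. The names `Relation`, `Triple`, `row_to_triples`, and `population` are illustrative only.

```python
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    """A relation name together with its signature (source and target concept)."""
    name: str
    source: str
    target: str


class Triple(NamedTuple):
    """A fact: atom `src` is related to atom `tgt` through relation `rel`."""
    src: str
    rel: Relation
    tgt: str


def row_to_triples(concept: str, key: str, row: dict[str, Optional[str]],
                   targets: dict[str, str]) -> set[Triple]:
    """Encode one row of a hypothetical SQL table as triples.

    The key column identifies the atom; every other column becomes a relation
    from `concept` to the concept given in `targets`. A NULL value (None)
    simply yields no triple.
    """
    atom = row[key]
    if atom is None:
        raise ValueError("the key column must not be NULL")
    return {
        Triple(atom, Relation(col, concept, targets[col]), value)
        for col, value in row.items()
        if col != key and value is not None
    }


def population(data: set[Triple], rel: Relation) -> set[tuple[str, str]]:
    """All pairs (a, b) such that the triple (a, rel, b) is in the data set."""
    return {(t.src, t.tgt) for t in data if t.rel == rel}


data = row_to_triples("Account", "id",
                      {"id": "acc1", "owner": "alice", "iban": None},
                      {"owner": "Person", "iban": "IBAN"})
owner = Relation("owner", "Account", "Person")
print(population(data, owner))  # {('acc1', 'alice')}
```

Encoding a missing value as an absent triple, rather than as a special NULL atom, keeps the data set a plain set of facts, which is what lets the same view cover relational, graph, and document stores.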
@@ -312,14 +308,9 @@ \subsection{Information Systems}
Of the three types of constraint, only two are invariants.

\subsection{Ampersand}
We employ Ampersand as a prototyping language to demonstrate our \ourtheory{}.
%More significantly, our intention is to augment the Ampersand compiler with the \ourtheory{} outlined in this paper
%to generate migration systems automatically.
% Remark by SJC: I think this intention is best left out here,
% as it immediately leads the reader to ask: then why has this not been implemented in Ampersand yet? Is this unfinished?
%
We employ Ampersand as a prototyping language to demonstrate our approach.
Ampersand serves as a language for specifying information systems through a framework of concepts, relations, and constraints.
It comprises the three types of constraints discussed in this paper, making it an ideal platform for practical testing of our \ourtheory{}.
It comprises the three types of constraints discussed in this paper, making it an ideal platform for practical testing of our approach.
In Ampersand, developers articulate constraints using heterogeneous relation algebra~\cite{Hattensperger1999,Alloy2006}.
The systems that Ampersand generates keep invariants satisfied and alert users to violations of business constraints.
The absence of imperative code in Ampersand scripts enhances reasoning about the system,
@@ -352,7 +343,7 @@ \subsection{Ampersand}
To avoid downtime, we must initially implement new blocking invariants as business constraints,
so that users can satisfy them.
The moment the last violation of $u$ is fixed, the business constraint can be removed and $u$ can be implemented as a blocking invariant.
This is the core idea of our \ourtheory{}.
This is the core idea of our approach.

The \define{migration system} to be generated is an intermediate system,
which contains all concepts, relations, and constraints of both the existing and the desired system.
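The core idea (enforce a new invariant as a business constraint until its last violation is fixed, then make it blocking) can be sketched in Python. This is a minimal, hypothetical model, not the migration system the paper generates: a data set is a frozenset of (source, relation, target) triples, a rule is a function returning its violations, and we assume that a transaction is rejected if it would add a new violation of a rule that is still a business constraint, so the violations the business must fix can only decrease. `MigrationSystem`, `transact`, and `ready_for_moc` are illustrative names; `ready_for_moc` mirrors the condition that no new invariant has violations left.

```python
from typing import Callable, NamedTuple

Triple = tuple[str, str, str]        # (source atom, relation name, target atom)
DataSet = frozenset                  # a frozenset of Triple
Violations = set[tuple[str, str]]


class Rule(NamedTuple):
    name: str
    violations: Callable[[DataSet], Violations]


class MigrationSystem:
    """Hypothetical sketch: each new invariant starts out as a business constraint."""

    def __init__(self, data: DataSet, new_rules: list[Rule]) -> None:
        self.data = data
        self.pending = list(new_rules)   # enforced as business constraints
        self.blocking: list[Rule] = []   # promoted to blocking invariants
        self._promote()

    def _promote(self) -> None:
        # A rule becomes a blocking invariant once its last violation is fixed.
        for rule in list(self.pending):
            if not rule.violations(self.data):
                self.pending.remove(rule)
                self.blocking.append(rule)

    def transact(self, insert: set, delete: set) -> bool:
        new = (self.data - delete) | insert
        # Blocking invariants must hold after every transaction.
        if any(r.violations(new) for r in self.blocking):
            return False
        # Assumption: a business constraint may lose violations but never gain new ones.
        if any(r.violations(new) - r.violations(self.data) for r in self.pending):
            return False
        self.data = new
        self._promote()
        return True

    def ready_for_moc(self) -> bool:
        # No new invariant has violations left, so the desired system can take over.
        return not self.pending


def account_has_owner(d: DataSet) -> Violations:
    """Hypothetical example rule: every atom with an iban also has an owner."""
    owned = {s for (s, r, _) in d if r == "owner"}
    return {(s, s) for (s, r, _) in d if r == "iban" and s not in owned}


ms = MigrationSystem(frozenset({("acc1", "iban", "NL01"),
                                ("acc2", "iban", "NL02"),
                                ("acc2", "owner", "bob")}),
                     [Rule("account has owner", account_has_owner)])
print(ms.ready_for_moc())                                # False
print(ms.transact({("acc3", "iban", "NL03")}, set()))    # False: new violation
print(ms.transact({("acc1", "owner", "alice")}, set()))  # True: last violation fixed
print(ms.ready_for_moc())                                # True
```

Because new violations are refused, the set of violations known when the migration system goes live is an upper bound on the repair work, which matches the claim that the number of violations the business needs to fix is finite.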
@@ -536,7 +527,7 @@ \subsection{Constraints}
(a,b)\in \viol{u}{\pair{\triples}{\instance}}\, \Longrightarrow \\
\quad \viol{u}{\pair{\triples \cup \{\triple{a}{\declare{n}{A}{B}}{b}\}}{\instance}} = \viol{u}{\pair{\triples}{\instance}} - \{(a,b)\}
\end{array}
% SJC's remark: \instance \cup \{\pair{a}{A},\pair{b}{B}\} = \instance because of eqn:wellTypedViolation
\label{eqn:transaction}
\end{equation}
It is obvious that not every conceivable constraint can satisfy this equation.
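The equation labelled eqn:transaction requires that inserting a triple (a, n, b) removes exactly the violation (a, b) and nothing else. A small Python check makes this tangible. Atoms are untyped here, and the two rules are hypothetical examples rather than rules from the paper: "r is contained in n", whose violations are the pairs of r missing from n, satisfies the equation; "n is univalent" does not, because inserting n-triples can only add violations. `pop`, `viol_subset`, `viol_univalent`, and `fixes_exactly` are illustrative names.

```python
Triple = tuple[str, str, str]   # (source atom, relation name, target atom)
Pair = tuple[str, str]


def pop(data: set[Triple], rel: str) -> set[Pair]:
    """The population of relation `rel`: all pairs (a, b) with (a, rel, b) in data."""
    return {(a, b) for (a, r, b) in data if r == rel}


def viol_subset(data: set[Triple]) -> set[Pair]:
    """Hypothetical rule 'r is contained in n': pairs in r that are missing from n."""
    return pop(data, "r") - pop(data, "n")


def viol_univalent(data: set[Triple]) -> set[Pair]:
    """Hypothetical rule 'n is univalent': distinct targets b, c that share a source."""
    p = pop(data, "n")
    return {(b, c) for (a, b) in p for (a2, c) in p if a == a2 and b != c}


def fixes_exactly(viol, data: set[Triple], rel: str, a: str, b: str) -> bool:
    """Check the transaction equation for one violation (a, b) and relation `rel`."""
    before = viol(data)
    if (a, b) not in before:
        raise ValueError("(a, b) must be a violation in `data`")
    after = viol(data | {(a, rel, b)})
    return after == before - {(a, b)}


d1 = {("x", "r", "y"), ("x", "r", "z"), ("x", "n", "z")}
print(fixes_exactly(viol_subset, d1, "n", "x", "y"))     # True

d2 = {("x", "n", "y"), ("x", "n", "z")}
print(fixes_exactly(viol_univalent, d2, "n", "y", "z"))  # False
```

In the univalence case, inserting (y, n, z) leaves both (y, z) and (z, y) in place, since the violations come from x and not from the inserted triple.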
@@ -702,7 +693,7 @@ \section{Generating a Migration Script}
\begin{array}[t]{l}
\text{\bf with}\label{eqn:Bfix}\\
\sign{v}=\sign{u}\\
\viol{v}{\dataset}=\viol{u}{\dataset} % -{\tt fixed}_u % remark by SJC: the - fixed is superfluous because eqn:blockRule holds
\viol{v}{\dataset}=\viol{u}{\dataset}
\end{array}\\
&\mid u\in\overrightarrow{\rules_{\schema'}-\rules_{\schema}}\}\notag
\end{align}
@@ -715,12 +706,12 @@ \section{Generating a Migration Script}
replaces $\rules_\text{block}$ in the migration system by the blocking invariants of the desired system.
This moment arrives when:
\begin{align}
\forall u\in\overrightarrow{\rules_{\schema'}-\rules_{\schema}}.~{\viol{u}{\dataset}}\subseteq \pop{{\tt fixed}_u}{\dataset}
\forall u\in\overrightarrow{\rules_{\schema'}-\rules_{\schema}}.~{\viol{u}{\dataset}} = \emptyset
\label{eqn:readyForMoC}
\end{align}
Equivalently, $\forall u\in\overrightarrow{\rules_{\schema'}-\rules_{\schema}}.~\viol{u}{\dataset} = \emptyset$. After this, the migration engineer can remove the migration system and the old system.
After this, the migration engineer can remove the migration system and the old system.

\item Let us combine the above into a single migration schema:
\item\label{step6} Let us combine the above into a single migration schema:
\begin{align}
\schema_\migrsys=\langle{}&\concepts_\dataset\cup\concepts_{\dataset'},\label{eqn:schema migrsys}\\
&\overleftarrow{\rels_{\schema}}\cup\overrightarrow{\rels_{\schema'}}\cup\rels_1\cup\rels_2,\notag\\
@@ -834,27 +825,32 @@ \section{Validation}

\section{Conclusion}
\label{sct:Conclusions}
In this paper, we describe data migration as the transition from an existing system to a desired one with a different schema.
As Ampersand generates information systems, creating a new system can be a small task, allowing for incremental deployment of new features.
We describe the parts of a system that affect data pollution.
We assume that the existing system does not violate any constraints of its own schema, but we do address another form of data pollution:
constraints that occur in the desired schema but not in the existing one are initially relaxed so that the business can start using the migration system, after which the violations of these constraints must be fixed by human intervention.
We propose a migration method that requires only a finite amount of human intervention.
Our method allows a system similar to the desired system to be used while the intervention takes place.
We have shown that it is possible to generate a migration system from the schemas of an existing system and a desired system.
We have defined a migration system such that:
\begin{itemize}
\item after deploying the migration system, data from the existing system is copied automatically,
\item the migration system behaves as a relaxed version of the desired system,
\item the migration engineer can introduce transactional invariants in the migration system to address data pollution,
\item there is no downtime in switching from the existing system to the migration system at the MoT,
\item the number of violations the business needs to fix is finite and known at the MoT,
\item there is no downtime in switching from the migration system to the desired system at the MoC,
\item the migration system is equivalent to the desired system at the MoC.
\end{itemize}

The data migration we propose is certainly not the only approach one could think of.
However, we have not come across other approaches that allow changing the schema in the presence of constraints.
As such, we cannot compare our approach against other approaches.
As such, we cannot compare our approach against other approaches yet.
We envision that one day there will be multiple approaches for migration under a changing schema to choose from.
For now, our next step is to implement the approach shown here into Ampersand.

This work does not consider what to do about (user) interfaces.
This work does not consider what to do about (user) interfaces yet.
Instead, it models events by assuming that any change to the data set can be achieved.
In practice, such changes need to be achieved through interfaces.
Most Ampersand systems indeed allow the users of the system to edit the data set quite freely through the interfaces.
However, some interfaces may require certain constraints to be satisfied, which means that interfaces of the desired system might break.
In the spirit of the approach outlined here, we hope to generate migration interfaces that can replace any broken interfaces until the Moment of Transition.
How to do this is future work.
This has allowed us to set aside interfaces in this paper for now.
Nevertheless, we expect that an analysis of interfaces may yield insights about event streams,
which are further building blocks to a foundational theory of information systems.
But that is future work.

%\section{Bibliography}
\bibliographystyle{splncs04}