Suggestion: Regex-validated string type #6579
Comments
Yeah, I've seen this combing through DefinitelyTyped, . Even we could use something like this with The main problems are:
|
Huge +1 on this, ZipCode, SSN, ONet, many other use cases for this. |
I faced the same problem, and I see that it is not implemented yet, maybe this workaround will be helpful: |
As @mhegazy suggested I will put my suggestion (#8665) here. What about allowing simple validation functions in type declarations? Something like this:
type Integer(n:number) => String(n).match(/^[0-9]+$/)
let x:Integer = 3 //OK
let y:Integer = 3.6 //wrong
type ColorLevel(n:number) => n>=0 && n<= 255
type RGB = {red:ColorLevel, green:ColorLevel, blue:ColorLevel};
let redColor:RGB = {red:255, green:0, blue:0} //OK
let wrongColor:RGB = {red:255, green:900, blue:0} //wrong
type Hex(n:string) => n.match(/^([0-9]|[A-F])+$/)
let hexValue:Hex = "F6A5" //OK
let wrongHexValue:Hex = "F6AZ5" //wrong The value that the type can accept would be determined by the function parameter type and by the function evaluation itself. That would solve #7982 also. |
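The `type Name(param) => predicate` syntax above is not valid TypeScript, but a rough approximation is possible with a branded type plus an assertion function (assertion signatures exist since TypeScript 3.7). The following is only a sketch under assumptions: `Brand`, `ColorLevel`, `assertColorLevel` and `rgb` are hypothetical names, and the range is taken as 0 through 255 inclusive to match the RGB example in the comment.

```typescript
// Hypothetical helper: tags a base type with a compile-time-only label.
type Brand<T, Name extends string> = T & { readonly __brand: Name };

type ColorLevel = Brand<number, "ColorLevel">;

// Assertion function: throws at runtime and narrows `n` at compile time.
function assertColorLevel(n: number): asserts n is ColorLevel {
  // Written as a negated range check so that NaN is rejected as well.
  if (!(n >= 0 && n <= 255)) {
    throw new RangeError(`${n} is not a color level in 0..255`);
  }
}

interface RGB {
  red: ColorLevel;
  green: ColorLevel;
  blue: ColorLevel;
}

function rgb(red: number, green: number, blue: number): RGB {
  assertColorLevel(red);
  assertColorLevel(green);
  assertColorLevel(blue);
  // After the three assertions, all parameters are narrowed to ColorLevel.
  return { red, green, blue };
}
```

Unlike the suggestion in the comment, the check here runs at runtime, so `rgb(255, 900, 0)` compiles and then throws instead of being rejected by the compiler.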
@rylphs +1 this would make TypeScript extremely powerful |
How does subtyping work with regex-validated string types? let a: RegExType_1
let b: RegExType_2
a = b // Is this allowed? Is RegExType_2 subtype of RegExType_1?
b = a // Is this allowed? Is RegExType_1 subtype of RegExType_2? where Edit: It looks like this problem is solvable in polynomial time (see The Inclusion Problem for Regular Expressions). |
Would also help with TypeStyle : typestyle/typestyle#5 🌹 |
In JSX, @RyanCavanaugh and I've seen people add interface IntrinsicElements {
// ....
[attributeName: /aria-\w+/]: number | string | boolean;
} |
Design Proposal
There are a lot of cases where developers need a more specific value than just a string but can't enumerate the values as a union of simple string literals, e.g. CSS colors, emails, phone numbers, ZipCode, swagger extensions etc. Even the JSON Schema specification, which is commonly used for describing the schema of a JSON object, has pattern and patternProperties that in terms of the TS type system could be called
Goals
Provide developers with a type system that is one step closer to JSON Schema, which is commonly used by them, and also prevent them from forgetting about string validation checks when needed.
Syntactic overview
Implementation of this feature consists of 4 parts:
Regex validated type
type CssColor = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
type Email = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i;
type Gmail = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;
Regex-validated variable type
let fontColor: /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
and the same, but more readable
let fontColor: CssColor;
Regex-validated variable type of index
interface UsersCollection {
[email: /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i]: User;
}
and the same, but more readable
interface UsersCollection {
[email: Email]: User;
}
Type guard for variable type
setFontColorFromString(color: string) {
fontColor = color;// compile time error
if (/^#([0-9a-f]{3}|[0-9a-f]{6})$/i.test(color)) {
fontColor = color;// correct
}
}
and the same
setFontColorFromString(color: string) {
fontColor = color;// compile time error
if (!(/^#([0-9a-f]{3}|[0-9a-f]{6})$/i.test(color))) return;
fontColor = color;// correct
}
and using a defined type for better readability
setFontColorFromString(color: string) {
fontColor = color;// compile time error
if (CssColor.test(color)) {
fontColor = color;// correct
}
}
same as
setFontColorFromString(color: string) {
fontColor = color;// compile time error
if (!(CssColor.test(color))) return;
fontColor = color;// correct
}
Type guard for index type
let collection: UsersCollection;
getUserByEmail(email: string) {
collection[email];// type is any
if (/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i.test(email)) {
collection[email];// type is User
}
}
same as
let collection: UsersCollection;
getUserByEmail(email: string) {
collection[email];// type is any
if (!(/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i.test(email))) return;
collection[email];// type is User
}
and using a defined type for better readability
let collection: UsersCollection;
getUserByEmail(email: string) {
collection[email];// type is any
if (Email.test(email)) {
collection[email];// type is User
}
}
same as
let collection: UsersCollection;
getUserByEmail(email: string) {
collection[email];// type is any
if (!(Email.test(email))) return;
collection[email];// type is User
}
Semantic overview
Assignments
let email: Email;
let gmail: Gmail;
email = 'test@example.com';// correct
email = 'test@gmail.com';// correct
gmail = 'test@example.com';// compile time error
gmail = 'test@gmail.com';// correct
gmail = email;// obviously compile time error
email = gmail;// unfortunately compile time error too
Unfortunately, we can't check whether one regex is a subtype of another without a severe performance impact, according to this article. So it should be restricted. But there are the following workarounds:
// explicit cast
gmail = <Gmail>email;// correct
// type guard
if (Gmail.test(email)) {
gmail = email;// correct
}
// another regex subtype declaration
type Gmail = Email & /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;
gmail = email;// correct
Unfortunately assigning of
let someEmail = 'test@example.com';
let someGmail = 'test@gmail.com';
email = someEmail;// compile time error
gmail = someGmail;// compile time error
But we are able to use an explicit cast or type guards as shown here. The second is recommended.
let someEmail: 'test@example.com' = 'test@example.com';
let someGmail: 'test@gmail.com' = 'test@gmail.com';
email = someEmail;// correct
gmail = someGmail;// correct
Type narrowing for indexes
For simple cases of
type Email = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i;
type Gmail = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;
interface UsersCollection {
[email: Email]: User;
[gmail: Gmail]: GmailUser;
}
let collection: UsersCollection;
let someEmail = 'test@example.com';
let someGmail = 'test@gmail.com';
collection['test@example.com'];// type is User
collection['test@gmail.com'];// type is User & GmailUser
collection[someEmail];// unfortunately type is any
collection[someGmail];// unfortunately type is any
// explicit cast is still an unsafe workaround
collection[<Email> someEmail];// type is User
collection[<Gmail> someGmail];// type is GmailUser
collection[<Email & Gmail> someGmail];// type is User & GmailUser
Literals don't have this problem:
let collection: UsersCollection;
let someEmail: 'test@example.com' = 'test@example.com';
let someGmail: 'test@gmail.com' = 'test@gmail.com';
collection[someEmail];// type is User
collection[someGmail];// type is User & GmailUser
But for variables the best option is to use type guards, as in the following more realistic examples:
getUserByEmail(email: string) {
collection[email];// type is any
if (Email.test(email)) {
collection[email];// type is User
if (Gmail.test(email)) {
collection[email];// type is User & GmailUser
}
}
if (Gmail.test(email)) {
collection[email];// type is GmailUser
}
}
But if we use a better definition for
type Gmail = Email & /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;
getUserByEmail(email: string) {
collection[email];// type is any
if (Email.test(email)) {
collection[email];// type is User
if (Gmail.test(email)) {
collection[email];// type is User & GmailUser
}
}
if (Gmail.test(email)) {
collection[email];// type is User & GmailUser
}
}
Unions and intersections
Actually common types and
type Regex_1 = / ... /;
type Regex_2 = / ... /;
type NonRegex = { ... };
type test_1 = Regex_1 | Regex_2;// correct
type test_2 = Regex_1 & Regex_2;// correct
type test_3 = Regex_1 | NonRegex;// correct
type test_4 = Regex_1 & NonRegex;// compile time error
if (test_1.test(something)) {
something;// type is test_1
// something matches Regex_1 OR Regex_2
}
if (test_2.test(something)) {
something;// type is test_2
// something matches Regex_1 AND Regex_2
}
if (test_3.test(something)) {
something;// type is Regex_1
} else {
something;// type is NonRegex
}
Generics
There are no special cases for generics, so
class Something<T extends String> { ... }
let something = new Something<Email>();// correct
Emit overview
Unlike usual types,
type Regex_1 = / ... /;
type Regex_2 = / ... /;
type NonRegex = { ... };
type test_1 = Regex_1 | Regex_2;
type test_2 = Regex_1 & Regex_2;
type test_3 = Regex_1 | NonRegex;
type test_4 = Regex_1 & NonRegex;
if (test_1.test(something)) {
/* ... */
}
if (test_2.test(something)) {
/* ... */
}
if (test_3.test(something)) {
/* ... */
} else {
/* ... */
}
will compile to:
var Regex_1 = / ... /;
var Regex_2 = / ... /;
if (Regex_1.test(something) || Regex_2.test(something)) {
/* ... */
}
if (Regex_1.test(something) && Regex_2.test(something)) {
/* ... */
}
if (Regex_1.test(something)) {
/* ... */
} else {
/* ... */
}
Compatibility overview
This feature has no compatibility issues, because there is only one case that could break it, and it is related to that
type someType = { ... };
var someType = { ... };
while the code below is not:
type someRegex = / ... /;
var someRegex = { ... };
But the second WAS already invalid, though for another reason (the type declaration was wrong).
P.S. Feel free to point out things that I have probably missed. If you like this proposal, I could try to create tests that cover it and add them as a PR. |
I've forgotten to point to some cases for intersections and unions of |
@Igmat, question about your design proposal: Could you elaborate on the emit overview? Why would regex-validated types need to be emitted? As far as I can tell, other types don't support runtime checks... am I missing something? |
@alexanderbird, yes, no other type has any impact on emit. At first, I thought that
let fontColor: /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
fontColor = "#000";
and this:
type CssColor = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
let fontColor: CssColor;
fontColor = "#000";
It's ok and has no need for emit changes, because
let someString: string;
if (/^#([0-9a-f]{3}|[0-9a-f]{6})$/i.test(someString)) {
fontColor = someString; // Ok
}
fontColor = someString; // compile time error
So it also has no impact on emit and looks ok, except that the regex isn't very readable and has to be copied to every place it is used, so a user could easily make a mistake. But in this particular case it still seems to be better than changing how
let someString: string;
let email: /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i;
if (/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i.test(someString)) {
email = someString; // Ok
}
email = someString; // compile time error
is a nightmare. And that's even without intersections and unions. So to avoid things like this, we have to slightly change |
@DanielRosenwasser, could you, please, provide some feedback for this proposal? And also for tests referenced here, if possible? |
Hey @Igmat, I think there are a few things I should have initially asked about. To start, I still don't understand why you need any sort of change to emit, and I don't think any sort of emit based on types would be acceptable. Check out our non-goals here. Another issue I should have brought up is the problem of regular expressions that use backreferences. My understanding (and experience) is that backreferences in a regular expression can force a test to run in time exponential to its input. Is this a corner case? Perhaps, but it's something I'd prefer to avoid in general. This is especially important given that in editor scenarios, a type-check at a location should take a minimal amount of time. Another issue is that we'd need to either rely on the engine that the TypeScript compiler runs on, or build a custom regular expression engine to execute these things. For instance, TC39 is moving to include a new |
@Igmat - there is no question in my mind that having regexes emitted at runtime would be useful. However, I don't think they're necessary for this feature to be useful (and from the sounds of what @DanielRosenwasser has said, it probably wouldn't get approved anyway). You said
I think this is only the case if we are to narrow from a dynamic string to a regex-validated type. This gets very complicated. Even in this simple case:
We can't be sure that the types will match - what if the number is negative? And as the regexes get more complicated, it just gets messier and messier. If we really wanted this, maybe we allow "type interpolation: Instead, we could get partway to the goal if we only allowed string literals to be assigned to regex-validated types. Consider the following:
Do you think that's a workable alternative? |
@DanielRosenwasser, I've read the Design Goals carefully and, if I understand you correctly, the problem is a violation of Non-goals#5.
const emailRegex = /.../;
/**
* assign it only with values tested to emailRegex
*/
let email: string;
let userInput: string;
// somehow get user input
if (emailRegex.test(userInput)) {
email = userInput;
} else {
console.log('User provided invalid email. Showing validation error');
// Some code for validation error
}
With this proposal implemented it would look like:
type Email = /.../;
let email: Email;
let userInput: string;
// somehow get user input
if (Email.test(userInput)) {
email = userInput;
} else {
console.log('User provided invalid email. Showing validation error');
// Some code for validation error
}
As you can see, the code is almost the same: it's a simple, common usage of a regex. But the second case is much more expressive and will prevent accidental mistakes, like forgetting to check a string before assigning it to a variable that is meant to be regex-validated. |
@alexanderbird, I don't suggest making this code valid or adding hidden checks at either runtime or compile time.
function foo(bar: number) {
let baz: /prefix:\d+/ = 'prefix:' + bar;
}
Under my proposal this code has to produce an error. But this:
function foo(bar: number) {
let baz: /prefix:\d+/ = ('prefix:' + bar) as /prefix:\d+/;
}
or this:
function foo(bar: number) {
let baz: /prefix:\d+/;
let possibleBaz: string = 'prefix:' + bar;
if (/prefix:\d+/.test(possibleBaz)) {
baz = possibleBaz;
}
}
would be correct and would have no impact on the emitted code. And as I showed in my previous comment, literals would definitely not be enough even for common use cases, because we often have to work with strings from user input or other sources. Without this emit impact, users would have to work with this type in the following way:
export type Email = /.../;
export const Email = /.../;
let email: Email;
let userInput: string;
// somehow get user input
if (Email.test(userInput)) {
email = <Email>userInput;
} else {
console.log('User provided invalid email. Showing validation error');
// Some code for validation error
}
or for intersections:
export type Email = /email-regex/;
export const Email = /email-regex/;
export type Gmail = Email & /gmail-regex/;
export const Gmail = {
test: (input: string) => Email.test(input) && /gmail-regex/.test(input)
};
let gmail: Gmail;
let userInput: string;
// somehow get user input
if (Gmail.test(userInput)) {
gmail = <Gmail>userInput;
} else {
console.log('User provided invalid gmail. Showing validation error');
// Some code for validation error
}
I don't think that forcing users to duplicate code and use an explicit cast, when the compiler could easily handle it, is a good way to go. The emit impact is really very small and predictable. I'm sure that it won't surprise users or lead to misunderstanding of the feature or hard-to-locate bugs, while implementing this feature without emit changes definitely WILL. In conclusion I want to say that in simple terms |
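For comparison, the explicit cast in the snippets above can be avoided in current TypeScript with a user-defined type guard, at the cost of keeping a runtime regex next to the type. This is only a sketch under assumptions: regex-validated types do not exist, so `Email` is modelled as a branded string, and `emailPattern`, `isEmail` and `register` are hypothetical names. The pattern is deliberately simplified and is not the regex from the proposal.

```typescript
// Hypothetical stand-in for a regex-validated type: a branded string.
type Email = string & { readonly __brand: "Email" };

// Simplified pattern for illustration only, not the proposal's email regex.
const emailPattern = /^[^@\s]+@[^@\s]+\.[a-z]{2,}$/i;

// Type guard: the regex lives in one place and call sites need no cast.
function isEmail(input: string): input is Email {
  return emailPattern.test(input);
}

function register(userInput: string): Email | undefined {
  if (isEmail(userInput)) {
    const email: Email = userInput; // narrowed by the guard, no <Email> cast
    return email;
  }
  console.log('User provided invalid email. Showing validation error');
  return undefined;
}
```

The trade-off is that nothing ties the guard to the brand: the compiler trusts whatever `isEmail` returns, whereas the proposal would have checked the regex itself.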
@DanielRosenwasser and @alexanderbird ok, I have one more idea for that. What about syntax like this: const type Email = /email-regex/; In this case user have to explicitly define that he/she want this as both const Email = /email-regex/; This seems to be even bigger than just improvement for this proposal, because this probably could allow something like this (example is from project with Redux): export type SOME_ACTION = 'SOME_ACTION';
export const SOME_ACTION = 'SOME_ACTION' as SOME_ACTION; being converted to export const type SOME_ACTION = 'SOME_ACTION'; I've tried to find a similar suggestion but wasn't successful. If this could be a workaround and if you like the idea, I can prepare a Design Proposal and tests for it. |
@DanielRosenwasser, about your second issue: I don't think that it would ever happen, because in my suggestion the compiler runs the regex only for literals, and it doesn't seem likely that someone will do something like this:
let something: /some-regex-with-backreferences/ = `
long enough string to make regex.test significantly affect performance
` Anyway, we could test how long a literal has to be to affect real-time performance and create a heuristic that warns the user when we are unable to check it in editor scenarios, while still checking it when the project is compiled. Or there could be some other workarounds. About the third question, I'm not sure that I understand everything correctly, but it seems that the regex engine should be selected depending on |
@DanielRosenwasser are there any thoughts? 😄 About the initial proposal and about the last one. Maybe I should write a more detailed overview of the second one? |
I know this has been beaten to death and some good proposals have been given already. But I just wanted to add extra stuff that some might find mildly interesting.
Back references can make a regexp describe a context-sensitive grammar, a superset of context-free grammars. And language equality for CFGs is undecidable. So it's even worse for CSGs, which are equivalent to linear-bounded automatons. Assuming just all the regular expressions that can be converted to a DFA are used in a regexp (concat, union, star, intersection, complement, etc.), converting a regexp to an NFA is O(n), getting the product of two NFAs is O(m*n), then traversing the resulting graph for accept states is O(m*n). So, checking the language equality/subset of two regular regexps is also O(m*n). The problem is that the alphabet is really large here. Textbooks restrict themselves to alphabets of size 1-5 usually, when talking about DFAs/NFAs/regular expressions. But with JS regexps, we have all of unicode as our alphabet. Granted, there can be efficient ways of representing transition functions using sparse arrays and other clever hacks and optimizations for equality/subset testing... I'm confident it's possible to do type checking for regular-to-regular assignment somewhat efficiently. Then, all non-regular assignments can just require explicit type assertions. I've recently worked on a small finite automaton project, so the info is still fresh in my mind =x |
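The product-construction check described in the comment above can be sketched in a few lines. This is an illustration under assumptions, not part of any proposal: both automata are taken to be complete DFAs over the same small explicit alphabet (real JS regexes range over all of Unicode), and `SimpleDfa` and `isSubset` are hypothetical names.

```typescript
// A complete DFA: every state has a transition for every alphabet symbol.
interface SimpleDfa {
  alphabet: string[];
  start: string;
  accept: string[];
  transitions: Record<string, Record<string, string>>;
}

// True when every string accepted by `a` is also accepted by `b`.
// Walks the reachable part of the product automaton looking for a pair
// (p, q) where `a` accepts in p but `b` rejects in q.
function isSubset(a: SimpleDfa, b: SimpleDfa): boolean {
  const seen: Record<string, boolean> = {};
  const pending: Array<[string, string]> = [[a.start, b.start]];
  while (pending.length > 0) {
    const [p, q] = pending.pop()!;
    const key = p + "|" + q; // assumes state names contain no "|"
    if (seen[key]) continue;
    seen[key] = true;
    if (a.accept.indexOf(p) >= 0 && b.accept.indexOf(q) < 0) return false;
    for (const symbol of a.alphabet) {
      pending.push([a.transitions[p][symbol], b.transitions[q][symbol]]);
    }
  }
  return true;
}

// /^(ab)+$/ over {a, b}
const abPlus: SimpleDfa = {
  alphabet: ["a", "b"],
  start: "s0",
  accept: ["s2"],
  transitions: {
    s0: { a: "s1", b: "dead" },
    s1: { a: "dead", b: "s2" },
    s2: { a: "s1", b: "dead" },
    dead: { a: "dead", b: "dead" },
  },
};

// /^[ab]*b$/ over {a, b}
const endsWithB: SimpleDfa = {
  alphabet: ["a", "b"],
  start: "e",
  accept: ["f"],
  transitions: {
    e: { a: "e", b: "f" },
    f: { a: "e", b: "f" },
  },
};
```

With these two automata, `isSubset(abPlus, endsWithB)` holds while the reverse does not, which is the asymmetric assignability the earlier subtyping question asks about. The number of visited pairs is bounded by the product of the two state counts, matching the O(m*n) estimate in the comment for DFAs that are already built.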
Funnily enough, this is exactly what's possible with the new template string literal types. This case is avoided by having a threshold for union types, it seems. |
@AnyhowStep JS backreferences are the only context-sensitive production (and a fairly simple and limited one at that - only up to 9 groups can be referenced like that), and the rest of the regexp grammar is regular, so that's why I suspect it is decidable. But regardless, I think we can agree it's not practical in any sense of the word. 🙂 Edit: accuracy |
I confirmed this comment from @rozzzly works with TS 4.1.0 nightly! type TLD = 'com' | 'net' | 'org';
type Domain = `${string}.${TLD}`;
type Url = `${'http'|'https'}://${Domain}`;
const success: Url = 'https://example.com';
const fail: Url = 'example.com';
const domain: Domain = 'example.com'; Try it in the playground and see that Update: after playing with this feature a bit, it will not cover many use cases. For example, it doesn't work for a hex color string. type HexChar = '0' | '1' | '2' | '3' | '4' | '5' | '6'| '7' | '8' | '9' | 'A' | 'B' | 'C' | 'D' | 'E' | 'F';
type HexColor = `#${HexChar}${HexChar}${HexChar}${HexChar}${HexChar}${HexChar}`;
let color: HexColor = '#123456'; Today, that fails with "Expression produces a union type that is too complex to represent.(2590)" |
This would solve the data- or aria- problem that most of us face in UX libraries if it can be applied to indexes. |
Basically this but obviously that doesn't work because TS only allows string | number. Since this is essentially a string can it be enabled? |
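For the prefix case specifically, later TypeScript releases (4.4 and up) accept template-literal patterns as index signature keys. That covers `data-` and `aria-` style attributes, though not arbitrary regexes. A minimal sketch, where `ElementProps` and its members are hypothetical names chosen for illustration:

```typescript
// Template-literal pattern index signatures require TypeScript 4.4 or later.
interface ElementProps {
  id?: string;
  [attr: `data-${string}`]: string | number | boolean;
  [attr: `aria-${string}`]: string | number | boolean;
}

const props: ElementProps = {
  id: "save-button",
  "data-testid": "save",
  "aria-pressed": false,
};

// Rejected by the excess property check, since the key matches neither pattern:
// const bad: ElementProps = { "dat-testid": "save" };
```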
There was some reference to this limitation in the release notes. It creates a list of all the possible valid combinations, in this case it would create a union with 16,777,216 (i.e., 16^6) members. |
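One way around the union explosion is to check the characters of a given literal with a conditional type instead of enumerating all 16^6 combinations. The sketch below uses the same parameter trick as the `takesOnlyHex` example elsewhere in this thread. `IsHexColor` and `hexColor` are hypothetical names, and only uppercase digits are accepted, matching the `HexChar` union above.

```typescript
type HexChar = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
  | 'A' | 'B' | 'C' | 'D' | 'E' | 'F';

// Each of the first five placeholders captures one character and the last
// captures the remainder, so strings longer than six digits fail the check.
type IsHexColor<S extends string> =
  S extends `#${infer A}${infer B}${infer C}${infer D}${infer E}${infer F}`
    ? [A, B, C, D, E, F] extends [HexChar, HexChar, HexChar, HexChar, HexChar, HexChar]
      ? true
      : false
    : false;

// The argument type collapses to `never` when the literal is not a hex color.
function hexColor<S extends string>(value: IsHexColor<S> extends true ? S : never): S {
  return value;
}

const color = hexColor('#123456'); // accepted
// hexColor('#12345G');  // rejected at compile time
// hexColor('#1234567'); // rejected: the last placeholder captures "67"
```

This only validates string literals at call sites; it does not give a reusable `HexColor` type that a variable can be annotated with.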
This is a great idea... Igmat made some incredible posts back in 2016 that look good on paper anyway. I found this because I wanted to make sure the keys of an object literal passed into my function were valid css class names. I can easily check at runtime... but to me it seems so obvious that typescript should be able to do this at compile time, especially in situations where I am just hard-coding object literals and typescript shouldn't have to figure out if MyUnionExtendedExotictype satisfies SomeArbitraryRegexType. Maybe one day I will be knowledgeable enough to make a more productive contribution :/ |
Wow. I honestly did not expect to see this get implemented, not anytime soon at least.
I'd be curious to see how large that union could get before it became a problem performance wise. @styfle's example shows how easy it is to hit that ceiling. There's obviously going to be some degree of diminishing returns of usefulness of complex types vs performance.
I'm fairly confident in saying that it's not possible with the current implementation. If there was support for quantifiers and ranges you would probably get validation for BEM style class names. The standard js regex for that isn't too terrible: |
I'm sorry. I had to do this. I implemented regular languages in TypeScript.
More accurately, I implemented a simple deterministic finite automaton using TS 4.1 I mean, we can already implement Turing machines in TS. So, DFAs and PDAs are "easy", compared to that. And template strings make this more usable. The core types are actually simple and fit in < 30 LOC, type Head<StrT extends string> = StrT extends `${infer HeadT}${string}` ? HeadT : never;
type Tail<StrT extends string> = StrT extends `${string}${infer TailT}` ? TailT : never;
interface Dfa {
startState : string,
acceptStates : string,
transitions : Record<string, Record<string, string>>,
}
type AcceptsImpl<
DfaT extends Dfa,
StateT extends string,
InputT extends string
> =
InputT extends "" ?
(StateT extends DfaT["acceptStates"] ? true : false) :
AcceptsImpl<
DfaT,
DfaT["transitions"][StateT][Head<InputT>],
Tail<InputT>
>;
type Accepts<DfaT extends Dfa, InputT extends string> = AcceptsImpl<DfaT, DfaT["startState"], InputT>; It's specifying the automatons that's the hard part. But I'm pretty sure someone can make a regex to TypeScript DFA™ generator... I'd also like to highlight that the "hex string of length 6" example shows you can make function parameters only accept strings matching the regex using ugly hackery, declare function takesOnlyHex<StrT extends string> (
hexString : Accepts<HexStringLen6, StrT> extends true ? StrT : {__err : `${StrT} is not a hex-string of length 6`}
) : void;
//OK
takesOnlyHex("DEADBE")
//Error: Argument of type 'string' is not assignable to parameter of type '{ __err: "DEADBEEF is not a hex-string of length 6"; }'.
takesOnlyHex("DEADBEEF")
//OK
takesOnlyHex("01A34B")
//Error: Argument of type 'string' is not assignable to parameter of type '{ __err: "01AZ4B is not a hex-string of length 6"; }'.
takesOnlyHex("01AZ4B") Here's a bonus Playground; it implements the regex And another Playground; it implements the regex One final example, Playground; this is a floating point string regex! |
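Since the comment above notes that specifying the automaton is the hard part, here is what one hand-written specification could look like. The core types are copied from the comment. `AbPlus` is a hypothetical DFA for `/^(ab)+$/` over the alphabet {a, b}, with an explicit `dead` state so that every transition is defined. Inputs containing other characters are outside what this sketch handles.

```typescript
// Core types as given in the comment above.
type Head<StrT extends string> = StrT extends `${infer HeadT}${string}` ? HeadT : never;
type Tail<StrT extends string> = StrT extends `${string}${infer TailT}` ? TailT : never;
interface Dfa {
  startState: string;
  acceptStates: string;
  transitions: Record<string, Record<string, string>>;
}
type AcceptsImpl<DfaT extends Dfa, StateT extends string, InputT extends string> =
  InputT extends ""
    ? (StateT extends DfaT["acceptStates"] ? true : false)
    : AcceptsImpl<DfaT, DfaT["transitions"][StateT][Head<InputT>], Tail<InputT>>;
type Accepts<DfaT extends Dfa, InputT extends string> =
  AcceptsImpl<DfaT, DfaT["startState"], InputT>;

// Hypothetical example automaton for /^(ab)+$/ over {a, b}.
type AbPlus = {
  startState: "s0";
  acceptStates: "s2";
  transitions: {
    s0: { a: "s1"; b: "dead" };
    s1: { a: "dead"; b: "s2" };
    s2: { a: "s1"; b: "dead" };
    dead: { a: "dead"; b: "dead" };
  };
};

// These annotations only compile if the DFA evaluates as expected.
const accepted: Accepts<AbPlus, "abab"> = true;
const rejected: Accepts<AbPlus, "aba"> = false;
const emptyRejected: Accepts<AbPlus, ""> = false;
```

The three constants double as compile-time tests: swapping `true` and `false` on any of them should produce a type error.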
@AnyhowStep Well i used your DFA idea to implement a simple regex |
https://cyberzhg.github.io/toolbox/min_dfa?regex=ZCgoYmQqYiopKmMpKg== https://github.com/CyberZHG/toolbox If I had more willpower, I'd grab something like the above and use it to turn regexes into TS DFAs™ lol |
Okay, I just threw together a prototype, https://glitch.com/~sassy-valiant-heath [Edit] https://glitch.com/~efficacious-valley-repair <-- This produces way better output for more complicated regexes [Edit] It seems like Glitch will archive free projects that are inactive for too long. So, here's a git repo with the files, Step 1, key in your regex here, Step 3, click the generated TS playground URL, Step 4, scroll down till Step 5, play with input values, Shoutout to @kpdyer , author of https://www.npmjs.com/package/regex2dfa , for doing the heavy lifting of the conversion |
In case someone needs something a little more powerful, here's a Turing machine 😆 |
This thread has gotten too long to read and many of the comments are either addressed by template literal types or are off-topic. I've created a new issue #41160 for discussion of what remaining use cases might be enabled by this feature. Feel free to continue discussing type system parsers here 😀 |
Here is a workaround :) interface $A_MAP {
a: "a";
b: "b";
c: "c";
d: "d";
e: "e";
f: "f";
}
type $a = keyof $A_MAP;
type $aa = "a";
type $ab = $aa | "b";
type $ac = $ab | "c";
type $ad = $ac | "d";
type $ae = $ad | "e";
type $af = $ae | "f";
interface $A_UMAP {
a: $aa;
b: $ab;
c: $ac;
d: $ad;
e: $ae;
f: $af;
}
interface $D_MAP {
0: "0";
1: "1";
2: "2";
3: "3";
4: "4";
5: "5";
6: "6";
7: "7";
8: "8";
9: "9";
}
type $d = $D_MAP[keyof $D_MAP];
type $d0 = "0";
type $d1 = "0" | "1";
type $d2 = $d1 | "2";
type $d3 = $d2 | "3";
type $d4 = $d3 | "4";
type $d5 = $d4 | "5";
type $d6 = $d5 | "6";
type $d7 = $d6 | "7";
type $d8 = $d7 | "8";
type $d9 = $d8 | "9";
interface $D_UMAP {
0: $d0;
1: $d1;
2: $d2;
3: $d3;
4: $d4;
5: $d5;
6: $d6;
7: $d7;
8: $d8;
9: $d9;
}
type $Max_1<T extends string> = "" | `${T}`;
type $Max_2<T extends string> = $Max_1<T> | `${$Max_1<T>}${T}`;
type $Max_3<T extends string> = $Max_2<T> | `${$Max_2<T>}${T}`;
type $Max_4<T extends string> = $Max_3<T> | `${$Max_3<T>}${T}`;
type $Max_5<T extends string> = $Max_4<T> | `${$Max_4<T>}${T}`;
type $Max_6<T extends string> = $Max_5<T> | `${$Max_5<T>}${T}`;
type $Max_7<T extends string> = $Max_6<T> | `${$Max_6<T>}${T}`;
type $Max_8<T extends string> = $Max_7<T> | `${$Max_7<T>}${T}`;
type $Max_9<T extends string> = $Max_8<T> | `${$Max_8<T>}${T}`;
interface $Max_Map<T extends string> {
1: $Max_1<T>;
2: $Max_2<T>;
3: $Max_3<T>;
4: $Max_4<T>;
5: $Max_5<T>;
6: $Max_6<T>;
7: $Max_7<T>;
8: $Max_8<T>;
9: $Max_9<T>;
}
type $Repeat_1<T extends string> = `${T}`;
type $Repeat_2<T extends string> = `${$Repeat_1<T>}${T}`;
type $Repeat_3<T extends string> = `${$Repeat_2<T>}${T}`;
type $Repeat_4<T extends string> = `${$Repeat_3<T>}${T}`;
type $Repeat_5<T extends string> = `${$Repeat_4<T>}${T}`;
type $Repeat_6<T extends string> = `${$Repeat_5<T>}${T}`;
type $Repeat_7<T extends string> = `${$Repeat_6<T>}${T}`;
type $Repeat_8<T extends string> = `${$Repeat_7<T>}${T}`;
type $Repeat_9<T extends string> = `${$Repeat_8<T>}${T}`;
interface $Repeat_Map<T extends string> {
1: $Repeat_1<T>;
2: $Repeat_2<T>;
3: $Repeat_3<T>;
4: $Repeat_4<T>;
5: $Repeat_5<T>;
6: $Repeat_6<T>;
7: $Repeat_7<T>;
8: $Repeat_8<T>;
9: $Repeat_9<T>;
}
// regexp: /[a-f]/
type $arg<From extends $a, To extends $a> =
| Exclude<$A_UMAP[To], $A_UMAP[From]>
| $A_MAP[From];
// regexp: /[0-9]/
type $drg<From extends keyof $D_UMAP, To extends keyof $D_UMAP> =
| Exclude<$D_UMAP[To], $D_UMAP[From]>
| $D_MAP[From];
// regexp: /T{From,To}/
type $rp<
T extends string,
From extends keyof $Max_Map<T>,
To extends keyof $Max_Map<T>
> = $Repeat_Map<T>[From] | Exclude<$Max_Map<T>[To], $Max_Map<T>[From]>;
// examples:
// regexp: /[5-9]/
const reg0: $drg<5, 9> = "7";
// regexp: /[b-e]/
const reg: $arg<"b", "e"> = "d";
// regexp: /a{2,6}/
const reg1: $rp<"a", 2, 6> = "aa";
// regexp: /\d{1,3}/
const reg2: $rp<$d, 1, 3> = "22";
// regexp: /[3-5]{1,3}/
const reg4: $rp<$drg<3, 5>, 1, 3> = "334"; |
For those who need type-safety on "predefined" attributes, here is what helped me: // global.d.ts
declare module "react" {
interface HTMLAttributes<T> {
"data-testid"?: string;
"data-element-name"?: "first" | "second";
}
} In case you need to type every // global.d.ts
declare module "react" {
type DataAttributeValue = number;
interface HTMLAttributes<T> {
[key: `data-${string}`]: DataAttributeValue;
}
} |
There are cases, where a property can not just be any string (or a set of strings), but needs to match a pattern.
It's common practice in JavaScript to store color values in css notation, such as in the css style reflection of DOM nodes or various 3rd party libraries.
What do you think?