Create a system for determining whether a given symbol contains a diacritic (i.e., an accent), covering all Germanic characters. The system should also translate an accented symbol into its unaccented base form, so that documents and inputs can be read in a more standard way.
The `DeAccenter` class is an object that, once initialized, takes in strings, determines whether they contain diacritics, and replaces each accented symbol with its unaccented base form. The task is awkward for two reasons: diacritics are implemented differently between operating systems, and an accented symbol occupies a pair of `char`s rather than a single one. For example, a string holding the single symbol 'À' returns 2 from `size()`, because in UTF-8 the precomposed 'À' (U+00C0) is encoded as two bytes. To accept two-`char` symbols while keeping the interface simple, `DeAccenter` methods take and return `std::string`s, and some of them assert that the string holds exactly two characters.
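
To make the two-byte behaviour concrete, here is a minimal sketch assuming UTF-8 input. The helper `startsTwoByteSequence` and the constant `kCapitalAGrave` are hypothetical names used for illustration; they are not part of `DeAccenter`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of DeAccenter): true if the byte at `index`
// is the lead byte of a two-byte UTF-8 sequence (bit pattern 110xxxxx).
bool startsTwoByteSequence(const std::string& text, std::size_t index) {
    assert(index < text.size());
    const unsigned char lead = static_cast<unsigned char>(text[index]);
    return (lead & 0xE0) == 0xC0;
}

// 'À' is U+00C0, which UTF-8 encodes as the two bytes 0xC3 0x80.
// Hex escapes keep the example independent of the source file's encoding.
const std::string kCapitalAGrave = "\xC3\x80";
```

With these definitions, `kCapitalAGrave.size()` is 2 even though the string displays as one symbol, which is why index-based methods have to look at two bytes at a time.
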
Upon initialization, the class builds a hash map whose keys are accented symbols and whose mapped values are the corresponding unaccented base characters.

variable | description |
---|---|
`std::unordered_map<std::string, char32_t> hash;` | a container that maps every accented Germanic symbol to its unaccented base character |
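
As a rough illustration of how such a map could be populated, the sketch below fills it with a small subset of symbols, assuming UTF-8 keys. The functions `buildAccentMap` and `baseOf` are hypothetical, and the entries shown are only examples; the actual table built by the constructor may differ.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical builder: maps the UTF-8 bytes of an accented symbol to its
// unaccented base character. Only a small illustrative subset is listed.
std::unordered_map<std::string, char32_t> buildAccentMap() {
    return {
        {"\xC3\x80", U'A'},  // À (U+00C0)
        {"\xC3\x84", U'A'},  // Ä (U+00C4)
        {"\xC3\x96", U'O'},  // Ö (U+00D6)
        {"\xC3\x9C", U'U'},  // Ü (U+00DC)
        {"\xC3\xA0", U'a'},  // à (U+00E0)
        {"\xC3\xA4", U'a'},  // ä (U+00E4)
        {"\xC3\xB6", U'o'},  // ö (U+00F6)
        {"\xC3\xBC", U'u'},  // ü (U+00FC)
    };
}

// Hypothetical lookup: returns the base character of a known accented symbol.
char32_t baseOf(const std::unordered_map<std::string, char32_t>& map,
                const std::string& symbol) {
    const auto it = map.find(symbol);
    assert(it != map.end());  // caller must pass a symbol present in the map
    return it->second;
}
```

Keying the map on `std::string` rather than `char` lets a two-byte symbol be looked up directly from a two-byte slice of the input.
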

signature | description |
---|---|
`explicit DeAccenter();` | default constructor |
`bool isAccented(const std::string& string, uint32_t index) const;` | determines whether the symbol at `index` is an accented Germanic letter, capital or lowercase |
`bool containsAccent(const std::string& string) const;` | determines whether the string contains any accented Germanic symbol, capital or lowercase |
`std::string removeAccent(const std::string& string, uint32_t index) const;` | if the symbol at `index` is an accented Germanic character, returns the same character without the accent, capital or lowercase |
`std::string removeAllAccents(const std::string& string) const;` | replaces every accented Germanic symbol with its unaccented base character |
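
The following is a minimal sketch of how the four methods could fit together, not the actual implementation. It assumes UTF-8 input in which every accented symbol is exactly two bytes, uses a deliberately tiny map, and is named `DeAccenterSketch` to make clear it is hypothetical. What `removeAccent` returns for an unaccented symbol is not specified above; returning the byte at `index` unchanged is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

class DeAccenterSketch {
public:
    DeAccenterSketch()
        : hash_{
              {"\xC3\x80", U'A'},  // À
              {"\xC3\xA9", U'e'},  // é
              {"\xC3\xB6", U'o'},  // ö
              {"\xC3\xBC", U'u'},  // ü
          } {}

    // True if the two bytes starting at `index` form a known accented symbol.
    bool isAccented(const std::string& text, std::uint32_t index) const {
        if (text.size() < 2 || index > text.size() - 2) return false;
        return hash_.count(text.substr(index, 2)) != 0;
    }

    // True if any position in the string starts a known accented symbol.
    bool containsAccent(const std::string& text) const {
        for (std::uint32_t i = 0; i < text.size(); ++i) {
            if (isAccented(text, i)) return true;
        }
        return false;
    }

    // Returns the base letter if the symbol at `index` is accented.
    // Assumption: otherwise the single byte at `index` is returned unchanged.
    std::string removeAccent(const std::string& text, std::uint32_t index) const {
        assert(index < text.size());
        if (isAccented(text, index)) {
            return std::string(1, static_cast<char>(hash_.at(text.substr(index, 2))));
        }
        return std::string(1, text[index]);
    }

    // Copies the string, replacing each accented symbol with its base letter.
    std::string removeAllAccents(const std::string& text) const {
        std::string result;
        result.reserve(text.size());
        std::uint32_t i = 0;
        while (i < text.size()) {
            if (isAccented(text, i)) {
                result += static_cast<char>(hash_.at(text.substr(i, 2)));
                i += 2;  // skip both bytes of the accented symbol
            } else {
                result += text[i];
                ++i;
            }
        }
        return result;
    }

private:
    std::unordered_map<std::string, char32_t> hash_;
};
```

Because every key begins with a UTF-8 lead byte, a two-byte window that starts on a continuation byte can never match, so scanning byte by byte does not produce false positives in this sketch.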