
What I'd want to see in version 5.0

Travis Parks edited this page Jul 18, 2021 · 10 revisions

This project is old. Maybe it just feels old. It's been around, in some form or another, since 2013!

I still do not see any other projects out there trying to achieve the same explicit schema definition that FlatFiles pushes for so hard.

For the most part, I am really happy with where this project is. Maintenance is minimal and overall it feels feature complete. There are a few big-ticket items that I feel would improve the performance of the tool and make it more reusable, plus a list of smaller cleanups.

  1. Switch to .NET pipelines to replace the circular queue implementation.
  2. Rework the type mapping code emission logic.
  3. Expose an unopinionated separated value parser that just returns string[].
  4. Rename classes containing SeparatedValue to use Delimited instead. So, for example, SeparatedValueReader would become DelimitedReader.
  5. Make sure schema/configuration objects are immutable or make sure changes don't affect reader/writer behavior after creation.
  6. Reorganize unit tests.
  7. Update to the minimal .NET Framework/Standard version supporting needed dependencies.
  8. Remove obsolete code.
  9. Enable checks for nullable reference types and eliminate all warnings.
  10. Make physical and logical record numbers long instead of int.
  11. Make all EventArgs classes sealed.
  12. Create an interface/abstract base class for numeric column definitions.
  13. Create an interface/abstract base class for temporal column definitions.
  14. Make Window sealed.
  15. Create generic and non-generic IPropertyMapping interface.
  16. Create an interface/abstract base class for numeric property mappings.
  17. Create an interface/abstract base class for temporal property mappings.

For the first item, this blog post gives an idea of what pipelines are: https://devblogs.microsoft.com/dotnet/system-io-pipelines-high-performance-io-in-net/

Currently, I use a simple circular queue class that lets me read chunks of text from a stream. However, it is home-grown, so it is probably not as efficient as it could be.

For the second item, I would like to change how I am using the System.Reflection.Emit classes. I have received a handful of useful suggestions about other ways to generate this code that don't require knowing so much about the internals of the .NET virtual machine. I also might be able to come up with a better, more efficient way of setting properties in deeply nested class hierarchies.
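One alternative that avoids hand-written IL is System.Linq.Expressions. The sketch below is only an assumption about what such a replacement could look like, not how FlatFiles works today. `SetterFactory`, `Person`, and `Address` are hypothetical names made up for illustration. It compiles a setter for a dotted property path, which is the nested-property case mentioned above. It assumes every intermediate object is a non-null class instance.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper; not part of FlatFiles.
public static class SetterFactory
{
    // Builds and compiles (root, value) => root.A.B.C = value for a dotted path such as "A.B.C".
    public static Action<TRoot, TValue> Create<TRoot, TValue>(string path)
    {
        ParameterExpression root = Expression.Parameter(typeof(TRoot), "root");
        ParameterExpression value = Expression.Parameter(typeof(TValue), "value");

        // Walk the path one property at a time: root -> root.A -> root.A.B -> ...
        Expression target = root;
        foreach (string name in path.Split('.'))
        {
            target = Expression.Property(target, name);
        }

        // The final property access is the assignment target.
        BinaryExpression assign = Expression.Assign(target, value);
        return Expression.Lambda<Action<TRoot, TValue>>(assign, root, value).Compile();
    }
}

// Hypothetical model types used only to exercise the sketch.
public sealed class Address
{
    public string City { get; set; }
}

public sealed class Person
{
    public Address Address { get; set; } = new Address();
}
```

Calling `SetterFactory.Create<Person, string>("Address.City")` once and caching the delegate keeps the compilation cost out of the per-record path. Creating missing intermediate objects would need extra expressions that this sketch leaves out.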

For the third item, I already allow parsing delimited files without specifying a schema. However, the GetValues method still returns object[], which may be annoying for users who don't need type conversion/mapping. I have been one of those people, so I get it. It would be nice if there were a true separation between a parser that returns string[] and a reader that converts those strings into other types, returning object[]. In fact, I could see the schema definition and mapping code I wrote in this project being reusable on its own.
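As a rough sketch of that separation, the two layers might look like the following. All names here (`DelimitedParser`, `TypedReader`, `Parse`, `Convert`) are hypothetical and do not exist in FlatFiles. The quoting rules are deliberately simplified: quoted fields and doubled quotes are handled, but quoted fields that span lines are not.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Layer 1 (hypothetical): splits text into raw string[] records. No schema, no type conversion.
public static class DelimitedParser
{
    public static IEnumerable<string[]> Parse(TextReader reader, char separator = ',')
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return SplitLine(line, separator);
        }
    }

    // Splits one line, honoring double-quoted fields and "" as an escaped quote.
    private static string[] SplitLine(string line, char separator)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote inside a quoted field
                    i++;
                }
                else if (c == '"') { inQuotes = false; }
                else { current.Append(c); }
            }
            else if (c == '"') { inQuotes = true; }
            else if (c == separator)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else { current.Append(c); }
        }
        fields.Add(current.ToString());
        return fields.ToArray();
    }
}

// Layer 2 (hypothetical): turns raw fields into object[] using one converter per column.
public static class TypedReader
{
    public static IEnumerable<object[]> Convert(
        IEnumerable<string[]> records,
        IReadOnlyList<Func<string, object>> converters)
    {
        foreach (string[] record in records)
        {
            yield return record.Select((field, i) => converters[i](field)).ToArray();
        }
    }
}
```

Callers who only want strings would stop at `DelimitedParser.Parse`. Callers who want typed values would pass its output, along with something schema-like, to the second layer.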

I must not have known the term "delimited" when I first started this project. The "SeparatedValue" prefix is soooo long. I feel it's worth a total backward compatibility break just to get away from this horrible name.

Someone recently pointed out I might have allowed configuration options to be changed after the fact. I call Options.Clone() to avoid this, but it's not fool-proof. For example, anywhere someone gets their hands on an IOption object, they have the potential to downcast it to the actual option type and call the setter. This problem is even more prominent regarding schema objects. Yuck!
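One way to close that hole, sketched here with hypothetical types (`IOptions`, `MutableOptions`, `FrozenOptions`, and `Reader` are illustrative names, not FlatFiles' actual classes), is to copy the caller's options into a setter-free snapshot when the reader is created. A cast on the snapshot then exposes nothing that can be changed.

```csharp
using System;

// Hypothetical read-only view of the options.
public interface IOptions
{
    string Separator { get; }
    bool IsFirstRecordSchema { get; }
}

// Mutable options: convenient for callers to configure, unsafe to share with a live reader.
public sealed class MutableOptions : IOptions
{
    public string Separator { get; set; } = ",";
    public bool IsFirstRecordSchema { get; set; }
}

// Immutable snapshot: get-only properties assigned once in the constructor.
public sealed class FrozenOptions : IOptions
{
    public FrozenOptions(IOptions source)
    {
        Separator = source.Separator;
        IsFirstRecordSchema = source.IsFirstRecordSchema;
    }

    public string Separator { get; }
    public bool IsFirstRecordSchema { get; }
}

// Hypothetical reader that snapshots its options at construction time.
public sealed class Reader
{
    public Reader(IOptions options)
    {
        // Later changes to the caller's object no longer reach this reader.
        Options = new FrozenOptions(options);
    }

    public IOptions Options { get; }
}
```

Unlike cloning into the same mutable type, the snapshot type has no setters at all, so there is nothing to reach through a cast. The same idea would apply to schema objects, at the cost of one copy per reader or writer.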

I was just writing some new unit tests the other day. To put it plainly, I have no idea where tests belong. The test names don't correspond with the names of what they are testing, or sub-features, at all.

The last time I put serious effort into this project, .NET Standard was coming along, but it was still missing some critical functionality. It'd be nice to drop support for .NET Framework 4.5.1 if .NET Standard now includes the necessary classes. I will need to see if Span<T> and Memory<T> are available yet.