abhiglobalistic/hams_ml

MACHINE LEARNING ENGINEER / DATA SCIENTIST

ML Challenge

Here is a small example data set (CSV, sample.zip: 2 MB zipped, 40 MB unzipped) containing 66k records (rows) and 295 features (columns). The target variable is the last column (the 296th), with classes A, B, C, D and E. We challenge you to analyze the data and build an initial ML model that predicts the class.

Steps:

1. Make an initial data analysis.

Visualize the main characteristics of the dataset and try to highlight potentially helpful structures in the data.
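As a starting point for this step, here is a minimal sketch of a summary function. It assumes pandas and NumPy are available; the function name `summarize`, the column names, and the synthetic `demo` table are hypothetical stand-ins, not part of the challenge data. It reports the class distribution, missing values, and constant columns, which are typical things to check before plotting.

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame, target_col: str) -> dict:
    """Collect a few basic characteristics of a labelled table."""
    features = df.drop(columns=[target_col])
    return {
        "shape": df.shape,
        # Share of each class, largest first; reveals class imbalance.
        "class_share": df[target_col].value_counts(normalize=True),
        # Fraction of missing values per feature, only for affected columns.
        "missing_share": features.isna().mean().loc[lambda s: s > 0],
        # Columns with a single distinct value carry no information.
        "constant_cols": [
            c for c in features.columns if features[c].nunique(dropna=False) <= 1
        ],
    }


# Synthetic stand-in for the real data (hypothetical); for the challenge,
# load the unzipped CSV with pd.read_csv instead.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 4)), columns=["f0", "f1", "f2", "f3"])
demo["f3"] = 1.0             # a constant column
demo.loc[:9, "f1"] = np.nan  # 10 missing values in the first rows
demo["target"] = rng.choice(list("ABCDE"), size=200, p=[0.5, 0.25, 0.15, 0.07, 0.03])

summary = summarize(demo, "target")
print(summary["class_share"])
print(summary["missing_share"])
print(summary["constant_cols"])
```

The class shares are a natural first plot (for example a bar chart), since the task statement stresses the underrepresented classes.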

2. Fit some ML model(s).

Train and evaluate different models, and briefly explain your model choices and their pros and cons.
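One possible way to compare candidate models is sketched below, assuming scikit-learn is installed. The two models (a scaled logistic regression as a linear baseline and a random forest as a non-linear alternative), the synthetic imbalanced data from `make_classification`, and the choice of macro-averaged F1 are illustrative assumptions, not a prescribed solution.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 5-class, imbalanced stand-in for the real data (hypothetical).
X, y = make_classification(
    n_samples=1500,
    n_features=20,
    n_informative=8,
    n_classes=5,
    weights=[0.5, 0.25, 0.15, 0.07, 0.03],
    random_state=0,
)

# Stratify so every class appears in both splits with the same proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

candidates = {
    # Linear baseline: fast and interpretable, but limited to linear boundaries.
    "logreg": make_pipeline(
        StandardScaler(),
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    ),
    # Tree ensemble: handles non-linear structure, needs no feature scaling.
    "random_forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    ),
}

scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    # Macro F1 weights every class equally, so rare classes are not drowned out.
    scores[name] = f1_score(y_test, model.predict(X_test), average="macro")

for name, score in scores.items():
    print(f"{name}: macro F1 = {score:.3f}")
```

Here `class_weight="balanced"` is one simple way to account for class imbalance; resampling would be another option.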

3. Use cross-validation to show the power of your model and comment on the results.

We are of course interested in the overall performance, but much more in the performance per class, especially for the underrepresented classes.
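Per-class performance under cross-validation could be measured roughly as follows. This is a sketch assuming scikit-learn; the random forest, the 5-fold setup, and the synthetic data are placeholders. Stratified folds keep the class proportions the same in every fold, which matters for the rare classes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic imbalanced data with letter labels A..E (hypothetical stand-in).
X, y_int = make_classification(
    n_samples=1500,
    n_features=20,
    n_informative=8,
    n_classes=5,
    weights=[0.5, 0.25, 0.15, 0.07, 0.03],
    random_state=0,
)
classes = list("ABCDE")
y = np.array(classes)[y_int]

model = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each sample is predicted exactly once, by a model that never saw it in training.
y_pred = cross_val_predict(model, X, y, cv=cv)

report = classification_report(y, y_pred, output_dict=True, zero_division=0)
cm = confusion_matrix(y, y_pred, labels=classes)

for label in classes:
    row = report[label]
    print(
        f"{label}: precision={row['precision']:.2f} "
        f"recall={row['recall']:.2f} support={int(row['support'])}"
    )
print(cm)  # rows are true classes, columns are predicted classes
```

The per-class recall and the confusion matrix show which underrepresented classes the model misses and which classes they are confused with.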

4. Present us the results and the steps you have taken.

If possible, also add some critical thinking and possible next steps. But mainly, explain why your results are good and what insights we can obtain from them.

Deliverable & Remarks:

  • well-commented and easy-to-follow code
  • send us plain .py Python files (no IPython notebooks)
  • work with classes and functions, show a bit of your programming skills ;-)
  • PDF (max 3-4 pages) with brief steps taken, some plots and results

Please use only Python for your solution!

We don't expect you to build THE solution here.

Our goal here is:

  • see how you approach such a problem
  • get an idea of your programming skills and ML knowledge
  • see how you can summarize and present results