Data science for HR managers | 2021 | Part 1

 


The following types and examples of questions are not all considered pass/fail, but together they give us a reasonably comprehensive picture of the candidate's depth in this area.

In general, pick one or two topics (that the candidate is good at) and keep going deeper and deeper, rather than going horizontally through some checklist. Going deep is far more indicative of the candidate's real understanding.


General mastery: when you really understand something (e.g., you've gone through the cycle of learning-doing-teaching-doing), you can express seemingly complex concepts in simple ways. Or you develop insightful views on things at a broader level and can explain them to others. E.g.,

  1. Discuss your views on the relationship between machine learning and statistics.
  2. Talk about how Deep Learning (or XYZ method) fits (or not?) within the field.
  3. Isn't it all just curve fitting? Talk about that.
  4. How are kernel methods different?
  5. Why do we need/want the bias term?
  6. Why do we call it GLM when it's clearly non-linear? (A somewhat tricky question, to be asked somewhat humorously, but extremely revealing.)
  7. How are neural nets related to Fourier transforms? What are Fourier transforms, for that matter?
  8. Etc.
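
As one illustration of the kind of simple explanation to look for on the bias-term question, here is a minimal Python sketch. It is not a model answer, and the function names (`fit_without_bias`, `fit_with_bias`, `sse`) and the toy data are assumptions made up for this example. It fits a one-variable least-squares line with and without an intercept, and shows that without the bias term the line is forced through the origin and cannot match data that is offset from it.

```python
def fit_without_bias(xs, ys):
    """Least-squares slope for y = w*x (line forced through the origin)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_with_bias(xs, ys):
    """Least-squares slope and intercept for y = w*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    w = covariance / variance
    b = mean_y - w * mean_x
    return w, b


def sse(xs, ys, w, b=0.0):
    """Sum of squared errors of the line y = w*x + b on the data."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))


# Toy data (hypothetical) generated from y = 2x + 5, i.e. offset from the origin.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [5.0, 7.0, 9.0, 11.0]

w_only = fit_without_bias(xs, ys)
w, b = fit_with_bias(xs, ys)
print("no bias  :", w_only, "SSE =", sse(xs, ys, w_only))
print("with bias:", w, b, "SSE =", sse(xs, ys, w, b))
```

A candidate with real depth would likely go further than this, for instance relating the bias to the mean of the response or to centering the features.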

ML-specific skills: E.g.,

  1. Pick an algorithm you like and walk me through the math and then the implementation of it, in pseudo-code.
  2. OK now let's pick another one, maybe more advanced.
  3. Discuss the meaning of the ROC curve, and write pseudo-code to generate the data for such a curve.
  4. Discuss how you go about feature engineering (look for both intuition and specific evaluation techniques).
  5. Etc.
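
For the ROC question, an interviewer may want a reference point for what a reasonable answer looks like. The following is a minimal Python sketch, assuming binary 0/1 labels and scores where higher means "more likely positive". The names `roc_points` and `auc` are hypothetical helpers for this example, not part of any library. The idea is to sort the examples by descending score, sweep the threshold down through them, and record the (false positive rate, true positive rate) pair at each distinct score.

```python
def roc_points(labels, scores):
    """Return a list of (false_positive_rate, true_positive_rate) points.

    labels: sequence of 0/1 ground-truth values (1 = positive).
    scores: sequence of classifier scores, higher = more likely positive.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all examples tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy data for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
print(curve)
print("AUC =", auc(curve))
```

Good follow-up probes include how ties in the scores are handled (grouped here into a single point) and what the area under the curve means as a ranking probability.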

Distributed systems (our needs): E.g.,

  1. Discuss MapReduce (or your favorite parallelization abstraction). Why is MapReduce referred to as a "shared-nothing" architecture (clearly the nodes have to share something, no)? What are the advantages/disadvantages of "shared-nothing"?
  2. Pick an algorithm. Write the pseudo-code for its parallel version.
  3. What are the trade-offs between closed-form and iterative implementations of an algorithm, in the context of distributed systems?
  4. Etc.
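
To make the parallel-version and closed-form questions concrete, here is a small Python sketch of one possible answer, run in a single process for illustration. It assumes the data is already split into partitions, and the names `map_partition`, `combine`, and `fit_line` are hypothetical. Each "mapper" reduces its own partition to a handful of sums without looking at any other partition, the "reducer" simply adds those tuples together, and the closed-form least-squares line is computed from the combined sums in a single pass over the data.

```python
from functools import reduce


def map_partition(rows):
    """Mapper: summarise one partition as (n, sum_x, sum_y, sum_xx, sum_xy)."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reducer: element-wise addition, which is associative and commutative."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(partitions):
    """Closed-form least-squares fit of y = slope*x + intercept."""
    n, sx, sy, sxx, sxy = reduce(combine, map(map_partition, partitions))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values are all identical; slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical data from y = 2x + 1, split across three "nodes".
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]
print(fit_line(partitions))
```

The contrast to draw out in discussion is that an iterative method such as gradient descent would need one such map-and-combine round per iteration, whereas the closed form needs only one round but requires the combined statistics to be small enough to solve on a single node.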

Other (hands-on experience, past accomplishments, etc.):

  1. Do you have experience with R (or Weka, Scikit-learn, SAS, Spark, etc.)? Tell me what you've done with that. Write some example data pipelines in that environment.
  2. Tell me about a time when you ... { worked on a project involving ML ; optimized an algorithm for performance/accuracy/etc. }
  3. Estimate the amount of time in your past project spent on each segment of your data mining/machine learning work.
  4. Etc.