Sebastian Raschka

@rasbt

Python, ML & open source! Lead AI Educator | ⚡️. Statistics Prof . Author of "Machine Learning with PyTorch and Scikit-Learn"

Madison, Wisconsin
Joined October 2012

Tweets

  1. Pinned Tweet
    Feb 25

    If you are looking for something for this weekend, my new book *Machine Learning with PyTorch and Scikit-Learn* just came out today:

  2. 11 hours ago

    Just discovered this RNN gem: "LSTM: A Search Space Odyssey". Basically, it's a large-scale study looking into LSTM variants. While things can be simplified, the forget gate and the output activation function are the most critical parts of the LSTM cell

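The study's conclusion can be made concrete with a minimal NumPy sketch of one LSTM time step; the names, shapes, and random weights are illustrative, not taken from the paper:

```python
# One LSTM time step for a single example, highlighting the forget gate
# and the tanh output activation that the study found most critical.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,)."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2*H])        # forget gate -- critical per the study
    g = np.tanh(z[2*H:3*H])      # candidate cell state
    o = sigmoid(z[3*H:4*H])      # output gate
    c = f * c_prev + i * g       # cell-state update
    h = o * np.tanh(c)           # tanh output activation -- also critical
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4
x = rng.normal(size=D)
h0 = np.zeros(H)
c0 = np.zeros(H)
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h1, c1 = lstm_step(x, h0, c0, W, U, b)
```

Setting the forget gate to 1 and dropping the output tanh are exactly the kinds of simplifications the paper evaluates.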
  3. 12 hours ago

    Personally, I think that class projects are really worthwhile. They are a lot of work but are very rewarding for both the students (practical experience & sth for the CV) and the teacher (it's always sth unique & interesting vs the standard HW assignments)

  4. 12 hours ago

    The proceedings of the 2nd Teaching Machine Learning and Artificial Intelligence Workshop are now online: It was a fun workshop catching up with colleagues and exchanging tips & tricks about teaching 🤓

  5. Mar 20

    PS: credits go to for this excellent article!

  6. Mar 20

    9/9 Python can perform 32 million additions per sec; PyTorch, however, can only do 280 thousand operations per sec. Interesting, but again not an issue in practice, because we usually use a few large array operations when we work in PyTorch

  7. Mar 20

    8/ Interesting tidbit about overhead. Python can perform 32 million additions per sec. But in the time Python requires for 1 FLOP, an A100 GPU can perform 9.75 million FLOPs (though in practice, Python overhead is not an issue because the percentage of time spent there is so small)

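The 9.75 million figure follows from simple arithmetic, assuming the A100's roughly 312 TFLOPS peak from its spec sheet:

```python
# Back-of-the-envelope check of the tweet's numbers: at 32 million
# additions per second, one Python addition takes ~31.25 ns. How many
# FLOPs can an A100 do in that window?
python_adds_per_sec = 32e6
a100_flops_per_sec = 312e12                    # assumed A100 peak (BF16 tensor cores)

time_per_python_add = 1 / python_adds_per_sec  # ~3.125e-8 s
gpu_flops_meanwhile = a100_flops_per_sec * time_per_python_add
print(f"{gpu_flops_meanwhile:,.0f}")           # ~9,750,000 -- the 9.75M in the tweet
```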
  8. Mar 20

    7/ In the middle panel, we can see that the FLOPS ramp up, and in the right panel, we see that the memory bandwidth goes down as we reach 128ish repeats, meaning the code switches from being memory-bound to being compute-bound.

  9. Mar 20

    6/ From the following plot, we can see that the runtime (left panel) doesn't increase until we reach 32 repeats. So the code is probably mostly memory-bandwidth- and overhead-bound.

  10. Mar 20

    5/ Anyway, in general, one of the main takeaways is to find out whether your (GPU) code is (1) compute-, (2) memory-, or (3) overhead-bound. There's an interesting analysis using the following snippet: def f(x: Tensor[N]): for _ in range(repeat): x = x * 2; return x

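A runnable version of that snippet; the tensor size, repeat counts, and timing harness are illustrative, and it falls back to CPU when no GPU is available:

```python
# Time f(x) for growing `repeat` counts. In the overhead/memory-bound
# regime the runtime stays roughly flat; once compute dominates, it
# grows with the amount of work.
import time
import torch

def f(x: torch.Tensor, repeat: int) -> torch.Tensor:
    for _ in range(repeat):
        x = x * 2
    return x

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(2**16, device=device)

for repeat in (1, 8, 64, 512):
    if device == "cuda":
        torch.cuda.synchronize()   # GPU kernels run async; sync before timing
    t0 = time.perf_counter()
    f(x, repeat)
    if device == "cuda":
        torch.cuda.synchronize()
    print(f"repeat={repeat:4d}: {time.perf_counter() - t0:.6f}s")
```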
  11. Mar 20

    4/ There is the concept of operator fusion -- eg via custom CUDA kernels. I know many mathematically inclined folks prefer the elegance of composition: things like torch.log(F.softmax(a)). But F.log_softmax(a) is not only better for numeric stability but also efficiency.

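A small demonstration of the stability point; the extreme logit value is chosen just to force the underflow:

```python
# With large-magnitude logits, softmax underflows to exactly 0 and
# log(0) gives -inf, while the fused log_softmax stays finite (and
# avoids materializing the intermediate softmax).
import torch
import torch.nn.functional as F

a = torch.tensor([0.0, -1000.0])           # extreme logits

composed = torch.log(F.softmax(a, dim=0))  # second entry underflows -> -inf
fused = F.log_softmax(a, dim=0)            # computed stably in one pass

print(composed)  # second entry is -inf
print(fused)     # second entry is ~-1000, finite
```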
  12. Mar 20

    3/ Because of the memory reads & writes, running successive operations can be expensive, even if your data has already been transferred to the GPU. E.g., 4 reads & writes: x1 = x.cos(); x2 = x1.cos() vs. 2: x2 = x.cos().cos(). The latter is still inefficient (unless compiled), though.

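The two spellings can be checked for equivalence directly; the comments restate the tweet's memory-traffic point, and the `torch.compile` line is a sketch assuming PyTorch >= 2.0:

```python
# Both versions compute the same values. In eager mode each .cos() call
# reads its input from memory and writes its output back, so chaining
# alone doesn't remove the intermediate; only a fused/compiled kernel
# gets the traffic down to one read and one write.
import torch

x = torch.randn(1_000)

x1 = x.cos()
x2_twostep = x1.cos()        # two kernels, intermediate x1 materialized

x2_chained = x.cos().cos()   # still two kernels in eager mode

# Sketch: torch.compile (PyTorch >= 2.0) can fuse the chain into one kernel:
# f = torch.compile(lambda t: t.cos().cos())

assert torch.equal(x2_twostep, x2_chained)
```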
  13. Mar 20

    2/ Different things to consider:
    1. Compute: Time the GPU spends on floating point operations (FLOPS)
    2. Memory: Time spent transferring tensors within a GPU
    3. Overhead: Everything else (like Python and the PyTorch API)

  14. Mar 20

    Ever had a colleague who wants to apply linear regression to thousands of small datasets and asked you about a PyTorch GPU implementation to speed things up? This would be a great article to share, illustrating the different performance bottlenecks: 1/

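A hedged sketch of that setup, using batched `torch.linalg.lstsq`; the shapes and noise level are made up for illustration:

```python
# Solve B independent least-squares problems in one batched call
# instead of a Python loop -- large array operations amortize the
# per-call overhead discussed in the thread.
import torch

B, n, d = 1000, 50, 3                      # 1000 datasets, 50 points, 3 features
X = torch.randn(B, n, d)
true_w = torch.randn(B, d, 1)
y = X @ true_w + 0.01 * torch.randn(B, n, 1)

# One batched solve for all B problems (runs on GPU if tensors are moved there):
w_hat = torch.linalg.lstsq(X, y).solution  # shape (B, d, 1)
print(w_hat.shape)
```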
  15. Mar 19

    Is there a better way to spend the weekend 😆? Will join the stream to check in around 4 pm CST (2 pm PST). Help me prepare some tricky questions 😁 (just kidding ... or maybe not? 😇)

  16. Retweeted
    Mar 18

    ML Twitter: What is the current best practice for the following setting? Problem: Image classification. Setting: I'm given an initial supervised training set of labeled images drawn from P(x,y), and I train a net. Then I'm given a second set of labeled images also from P(x,y). 1/

  17. Mar 18

    Kudos to the authors for providing all the code to reproduce the results on GitHub:

  18. Mar 18

    3/3 The second one is based on periodic activation functions (inspired by positional encodings). Besides having performance competitive with gradient boosting, it is also interesting to note that with these embeddings, the MLPs are also competitive with transformer architectures.

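A minimal sketch of such a periodic embedding for a scalar feature; the frequencies here are random placeholders, whereas in the paper they are trained parameters:

```python
# Project a scalar feature onto a set of frequencies and take sin/cos,
# as with positional encodings -- each value maps to a 2k-dim embedding.
import torch

def periodic_embed(x: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    """x: (batch,), freqs: (k,) -> embedding of shape (batch, 2k)."""
    angles = 2 * torch.pi * x[:, None] * freqs[None, :]
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

x = torch.randn(8)
freqs = torch.randn(4)      # placeholder; trained in the paper
emb = periodic_embed(x, freqs)
print(emb.shape)            # (8, 8)
```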
  19. Mar 18

    2/ The authors employ two embedding schemes that are competitive with gradient boosting and apparently result in SOTA performance. The first embedding scheme is a piecewise-linear encoding involving feature binning.

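The piecewise-linear idea can be sketched as follows; the bin edges here are hard-coded for illustration (the paper derives them from feature quantiles):

```python
# Encode a scalar as "fraction of each bin filled": 1 for bins entirely
# below x, 0 for bins above, and a linear fraction for the bin
# containing x.
import numpy as np

def piecewise_linear_encode(x: float, edges: np.ndarray) -> np.ndarray:
    """edges: sorted bin boundaries of length k+1 -> encoding of length k."""
    lo, hi = edges[:-1], edges[1:]
    enc = (x - lo) / (hi - lo)     # position of x inside each bin
    return np.clip(enc, 0.0, 1.0)  # saturate bins below/above x

edges = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
print(piecewise_linear_encode(2.5, edges))  # [1.  1.  0.5 0. ]
```

Unlike one-hot binning, nearby values get nearby encodings, which is presumably what makes it work well as an MLP input.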
  20. Mar 18

    *No confidence intervals in the table above 🥲, but I think results are averaged over 15 random seeds at least ☺️

  21. Mar 18

    Let the "Deep Learning for Tabular Data" saga continue -- it's been a while! "On Embeddings for Numerical Features in Tabular Deep Learning" () This time, the focus is on the embeddings for numerical features rather than proposing a new architecture.*

