Tweets
-
Pinned Tweet
If you are looking for something for this weekend, my new book *Machine Learning with PyTorch and Scikit-Learn* just came out today: https://www.amazon.com/Machine-Learning-PyTorch-Scikit-Learn-scikit-learn-ebook-dp-B09NW48MR1/dp/B09NW48MR1/
-
Just discovered this RNN gem: "LSTM: A Search Space Odyssey" (https://arxiv.org/abs/1503.04069). Basically it's a large-scale study looking into LSTM variants. While things can be simplified, the forget gate and the output activation function are the most critical parts of the LSTM cell.
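(Not from the paper -- a minimal PyTorch-style sketch of a single LSTM step, assuming a packed 4*hidden weight layout, just to show where the forget gate and the output activation sit:)

import torch

def lstm_cell_step(x, h, c, W_x, W_h, b):
    # x: (batch, input_dim), h/c: (batch, hidden)
    # W_x: (input_dim, 4*hidden), W_h: (hidden, 4*hidden), b: (4*hidden,)
    gates = x @ W_x + h @ W_h + b
    i, f, g, o = gates.chunk(4, dim=-1)
    i = torch.sigmoid(i)           # input gate
    f = torch.sigmoid(f)           # forget gate -- one of the critical parts per the study
    g = torch.tanh(g)              # candidate cell state
    o = torch.sigmoid(o)           # output gate
    c_new = f * c + i * g
    h_new = o * torch.tanh(c_new)  # output activation (tanh) -- the other critical part
    return h_new, c_new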
-
Personally, I think that class projects are really worthwhile. They are a lot of work but very rewarding for both the students (practical experience & something for the CV) and the teacher (it's always something unique & interesting vs. the standard HW assignments): https://proceedings.mlr.press/v170/raschka22a.html
-
The proceedings of the 2nd Teaching Machine Learning and Artificial Intelligence Workshop are now online (https://proceedings.mlr.press/v170/). It was a fun workshop, catching up with colleagues and exchanging tips & tricks about teaching.
-
PS: credits go to @cHHillee for this excellent article!
-
9/9 Python can perform 32 million additions per sec; PyTorch, however, can only do 280 thousand operations per sec. Interesting, but again not an issue in practice, because we usually use a few large array operations when working in PyTorch.
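(A rough sketch of the kind of micro-benchmark behind these numbers; the exact figures depend heavily on hardware and library versions:)

import time
import torch

n = 1_000_000
start = time.perf_counter()
x = 0
for _ in range(n):
    x = x + 1
print(f"pure Python: {n / (time.perf_counter() - start):,.0f} adds/sec")

t = torch.tensor(0.0)
m = 100_000
start = time.perf_counter()
for _ in range(m):
    t = t + 1  # every call pays PyTorch's per-op dispatch overhead
print(f"PyTorch scalar ops: {m / (time.perf_counter() - start):,.0f} ops/sec")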
-
8/ Interesting tidbit about overhead. Python can perform 32 million additions per sec. But in the time Python requires for 1 FLOP, an A100 GPU can perform 9.75 million FLOPs (though in practice Python overhead is not an issue, because the percentage of time spent there is so small).
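(The back-of-the-envelope arithmetic, assuming the A100's advertised ~312 TFLOPS tensor-core peak:)

a100_flops_per_sec = 312e12       # assumed BF16 tensor-core peak of an A100
python_adds_per_sec = 32e6        # rough pure-Python addition rate from above
print(a100_flops_per_sec / python_adds_per_sec)  # ~9.75e6 FLOPs per Python addition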
-
7/ In the middle panel we can see that the FLOPS ramp up, and in the right panel we see that the memory bandwidth goes down as we reach ~128 repeats, meaning the code switches from being memory-bound to being compute-bound. pic.twitter.com/cipIwoj283
-
6/ In the following plot we can see that the runtime (left panel) doesn't increase until we reach 32 repeats, so the code is probably mostly memory-bandwidth- and overhead-bound. pic.twitter.com/RglKfkNUCv
-
5/ Anyway, in general, one of the main takeaways is to find out whether your (GPU) code is (1) compute-, (2) memory-, or (3) overhead-bound. There's an interesting analysis using the following code:

def f(x: Tensor[N]):
    for _ in range(repeat):
        x = x * 2
    return x
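(A runnable version of that experiment -- a sketch assuming a CUDA device; the tensor size and repeat counts are my choices, not the article's:)

import torch
from torch.utils.benchmark import Timer

def f(x, repeat):
    for _ in range(repeat):
        x = x * 2
    return x

x = torch.randn(2**20, device="cuda")
for repeat in [1, 2, 4, 8, 16, 32, 64, 128, 256]:
    t = Timer("f(x, repeat)", globals={"f": f, "x": x, "repeat": repeat}).timeit(100)
    # Runtime stays nearly flat while overhead dominates, then grows with `repeat`
    # once the multiply kernels dominate.
    print(f"repeat={repeat:4d}  {t.median * 1e6:9.1f} us")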
-
4/ There is the concept of operator fusion -- e.g., via custom CUDA kernels. I know many mathematically inclined folks prefer the elegance of composition: things like torch.log(F.softmax(a)). But F.log_softmax(a) is not only better for numerical stability but also for efficiency.
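(A tiny illustration of the numerical-stability point; values are contrived to force underflow in the composed version:)

import torch
import torch.nn.functional as F

a = torch.tensor([[100.0, 0.0, -100.0]])

composed = torch.log(F.softmax(a, dim=-1))  # softmax underflows to 0 -> log gives -inf
fused = F.log_softmax(a, dim=-1)            # single fused op, numerically stable

print(composed)  # the underflowed entry becomes -inf
print(fused)     # finite: tensor([[   0., -100., -200.]])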
-
3/ Because of the memory reads & writes, running successive operations can be expensive, even if your data has already been transferred to the GPU. E.g., 4 reads & writes:

x1 = x.cos()
x2 = x1.cos()

vs. 2:

x2 = x.cos().cos()

The latter is still inefficient (unless compiled), though.
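(A small sketch of the same point; torch.compile -- assuming PyTorch 2.x and a CUDA device -- can fuse the pointwise chain into a single kernel with one read and one write:)

import torch

x = torch.randn(2**24, device="cuda")

# Eager: two kernels, four global-memory accesses (read x, write x1, read x1, write x2)
x1 = x.cos()
x2 = x1.cos()

# Compiled: the chain can be fused into one kernel
fused_cos = torch.compile(lambda t: t.cos().cos())
x2_fused = fused_cos(x)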
-
2/ Different things to consider:
1. Compute: time the GPU spends on floating-point operations (FLOPs)
2. Memory: time spent transferring tensors within a GPU
3. Overhead: everything else (like Python and the PyTorch API)
pic.twitter.com/MNQkm61QSI
-
Ever had a colleague who wants to apply linear regression to thousands of small datasets and asked you about a PyTorch GPU implementation to speed things up? This would be a great article to share, illustrating the different performance bottlenecks: https://horace.io/brrr_intro.html 1/
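(A minimal sketch of the kind of batched GPU solve this motivates; the dataset shapes and the use of torch.linalg.lstsq are my choices, not the article's:)

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
n_datasets, n_samples, n_features = 10_000, 50, 4

X = torch.randn(n_datasets, n_samples, n_features, device=device)
y = torch.randn(n_datasets, n_samples, 1, device=device)

# One batched least-squares call instead of 10,000 tiny Python-level fits,
# so per-op overhead stays negligible relative to the actual compute.
beta = torch.linalg.lstsq(X, y).solution  # (n_datasets, n_features, 1)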
-
Is there a better way to spend the weekend? Will join the stream to check in on @marksaroufim around 4 pm CST (2 pm PST). Help me prepare some tricky questions (just kidding ... or maybe not?) https://twitter.com/PyTorchLightnin/status/1505189852584853507
-
Sebastian Raschka Retweeted
ML Twitter: What is the current best practice for the following setting? Problem: Image classification. Setting: I'm given an initial supervised training set of labeled images drawn from P(x,y), and I train a net. Then I'm given a second set of labeled images also from P(x,y). 1/
-
Kudos to the authors for providing all the code to reproduce the results on GitHub: https://github.com/Yura52/tabular-dl-num-embeddings#how-to-reproduce-results pic.twitter.com/qVTsgX2W3S
-
3/3 The second one is based on periodic activation functions (inspired by positional encodings). Besides being competitive with gradient boosting, it is also interesting to note that with these embeddings, the MLPs are competitive with transformer architectures.
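(A rough sketch of the periodic-embedding idea for numerical features -- dimensions and initialization are simplified relative to the paper:)

import math
import torch
import torch.nn as nn

class PeriodicEmbedding(nn.Module):
    # Maps each scalar feature to [sin(2*pi*c*x), cos(2*pi*c*x)] with learnable frequencies c.
    def __init__(self, n_features, k=8, sigma=1.0):
        super().__init__()
        self.c = nn.Parameter(torch.randn(n_features, k) * sigma)

    def forward(self, x):
        # x: (batch, n_features) -> (batch, n_features, 2*k)
        v = 2 * math.pi * self.c * x.unsqueeze(-1)
        return torch.cat([torch.sin(v), torch.cos(v)], dim=-1)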
-
2/ The authors employ two embedding schemes that are competitive with gradient boosting and apparently result in SOTA performance. The first embedding scheme is a piecewise-linear encoding involving feature binning. pic.twitter.com/z80lB2bDqB
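(A rough sketch of piecewise-linear encoding for a single numerical feature; in the paper the bin edges come from training-data quantiles or a target-aware tree -- the helper below is illustrative:)

import torch

def piecewise_linear_encode(x, bin_edges):
    # x: (batch,), bin_edges: (n_bins + 1,) increasing -> (batch, n_bins)
    # Bins fully below x encode as 1, bins above x as 0, the bin containing x as a fraction.
    left, right = bin_edges[:-1], bin_edges[1:]
    frac = (x.unsqueeze(-1) - left) / (right - left)
    return frac.clamp(0.0, 1.0)

x = torch.tensor([0.05, 0.4, 0.9])
edges = torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
print(piecewise_linear_encode(x, edges))
# approximately:
# tensor([[0.2000, 0.0000, 0.0000, 0.0000],
#         [1.0000, 0.6000, 0.0000, 0.0000],
#         [1.0000, 1.0000, 1.0000, 0.6000]])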
-
*No confidence intervals in the table above, but I think the results are averaged over at least 15 random seeds.
-
Let the "Deep Learning for Tabular Data" saga continue -- it's been a while! "On Embeddings for Numerical Features in Tabular Deep Learning" (https://arxiv.org/abs/2203.05556) This time, the focus is on the embeddings for numerical features rather than proposing a new architecture.* pic.twitter.com/2NcpYvnYlS