CUDA
![cuda logo](https://webcf.waybackmachine.org/web/20220104031453im_/https://raw.githubusercontent.com/github/explore/a1b6b508cca4e45f4d4102623957b552f872da89/topics/cuda/cuda.png)
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.
Here are 3,276 public repositories matching this topic...
I am working on creating a `WandbCallback` for Weights and Biases. I am glad that CatBoost has a callback system in place, but it would be great if we could extend the interface. The current callback only supports `after_iteration`, which takes `info`. Taking inspiration from the XGBoost callback system, it would be great if we could have `before_iteration` (which takes `info`), `before_training`, and `after
Description
Change the signature of `cupy.{percentile,quantile}` to provide exactly the same API as NumPy. I think it's OK to implement `overwrite_input` as a no-op (just ignore the option).
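A minimal sketch of what the NumPy-matching signature could look like. The wrapper below is hypothetical and simply delegates to NumPy for illustration; in CuPy the computation would run on GPU kernels instead:

```python
import numpy as np

def percentile(a, q, axis=None, out=None, overwrite_input=False,
               method="linear", keepdims=False):
    """Hypothetical NumPy-compatible signature for cupy.percentile.

    `overwrite_input` is accepted for API compatibility but ignored
    (a no-op), as suggested in the issue.
    """
    # Delegating to NumPy here purely for illustration; CuPy would
    # dispatch to its own device-side implementation.
    return np.percentile(a, q, axis=axis, out=out,
                         method=method, keepdims=keepdims)

print(percentile(np.arange(11), 50))  # 5.0
```

Because `overwrite_input` is ignored, passing it changes nothing about the result; it only keeps call sites written against NumPy working unmodified.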
Based on @karthikeyann's work in rapidsai/cudf#9767, I'm wondering whether it makes sense to consider removing the defaults for the `stream` parameters in various detail functions. It is pretty surprising how often these get missed.
The most common case seems to be in factory functions and various `::create` functions. Maybe just do it for those?
Can the tmfile be trained directly? Because `tengine-convert-tool` convert reports an error:
tengine-lite library version: 1.4-dev
Get input tensor failed
Or is there an example of how to train the tmfile shown below?
![Screenshot from 2021-05-27 07-01-46](https://user-images.githubusercontent.com/40915044/11
The current implementation of `join` can be improved by performing the operation in a single call to the backend kernel instead of multiple calls.
This is a fairly easy kernel and may be a good issue for someone getting to know CUDA/ArrayFire internals. Ping me if you want additional info.
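The idea, illustrated in NumPy terms (ArrayFire's own API differs; this is only a sketch of why one fused call beats repeated pairwise joins):

```python
import numpy as np

parts = [np.arange(3), np.arange(3, 6), np.arange(6, 9)]

# Pairwise joins: each step allocates an intermediate result and, on a
# GPU backend, launches a separate kernel.
pairwise = np.concatenate([np.concatenate([parts[0], parts[1]]), parts[2]])

# Single fused call: one output allocation and (on a backend like
# ArrayFire) a single kernel launch over all inputs.
fused = np.concatenate(parts)

assert np.array_equal(pairwise, fused)
print(fused)  # [0 1 2 3 4 5 6 7 8]
```

The results are identical; the win is purely in avoiding the intermediate allocations and extra kernel launches.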
Report needed documentation
While the estimator guide offers a great breakdown of how to use many of the tools in `api_context_managers.py`, it would be helpful to have information right in the docstrings during development, to more easily understand what is actually going on in each of the provided functions/classes/methods. This is particularly important for
In order to test manually altered IR, it would be nice to have a `--skip-compilation` flag for `futhark test`, just like we do for `futhark bench`.
Created by Nvidia
Released June 23, 2007
- Website
- developer.nvidia.com/cuda-zone
- Wikipedia
I see comments suggesting adding this to understand how loops are being handled by Numba, and in their own FAQ (https://numba.pydata.org/numba-doc/latest/user/faq.html). You would then create your njit function and run it, and I believe the idea is that it prints debug information about whether
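A minimal sketch of that workflow, assuming the intent is Numba's loop/typing introspection. The no-op fallback decorator is only there so the snippet also runs where Numba is not installed:

```python
# Sketch: compile a simple loop with @njit, then inspect how Numba
# handled it. Hedged: the fallback below is a no-op stand-in.
try:
    from numba import njit
except ImportError:
    def njit(func):
        return func  # no-op stand-in when Numba is unavailable

@njit
def sum_loop(n):
    total = 0
    for i in range(n):
        total += i
    return total

print(sum_loop(10))  # 45
# With Numba installed, sum_loop.inspect_types() prints the typed,
# annotated source, showing how the loop was lowered.
```

Running the function once first matters because Numba compiles lazily on the first call; introspection only has something to show after that.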