Dev support tensor pin memory #8073

Merged: 70 commits, May 9, 2022

Commits
4465cc2
empty support pin_memory
Flowingsun007 Apr 21, 2022
6ba0752
refine
Flowingsun007 Apr 21, 2022
3003e96
tensor.pin_memory()
Flowingsun007 Apr 21, 2022
2ae45df
refine
Flowingsun007 Apr 21, 2022
b5b0e3f
add docs
Flowingsun007 Apr 21, 2022
1486c60
Merge branch 'master' into dev_support_tensor_pin_memory
Flowingsun007 Apr 21, 2022
2491a88
refine
Flowingsun007 Apr 21, 2022
e902246
refine
Flowingsun007 Apr 21, 2022
510a3a4
support grad && add test case
Flowingsun007 Apr 21, 2022
a09aa46
fix comments
Flowingsun007 Apr 21, 2022
240b4fd
Merge branch 'master' into dev_support_tensor_pin_memory
Flowingsun007 Apr 21, 2022
7703f53
Merge branch 'master' into dev_support_tensor_pin_memory
Flowingsun007 Apr 22, 2022
2c47e1d
fix comments
Flowingsun007 Apr 22, 2022
9070072
refine
Flowingsun007 Apr 22, 2022
bfb3c4b
restruct
Flowingsun007 Apr 22, 2022
a476b0f
Merge branch 'master' into dev_support_tensor_pin_memory
Flowingsun007 Apr 22, 2022
ef6e07d
refine
Flowingsun007 Apr 22, 2022
2de827e
refactor
Flowingsun007 Apr 22, 2022
1982ca4
refine
Flowingsun007 Apr 22, 2022
68442ad
refine
Flowingsun007 Apr 22, 2022
5ce8f43
refine
Flowingsun007 Apr 22, 2022
eb4479d
refine
Flowingsun007 Apr 22, 2022
969724c
Merge branch 'master' into dev_support_tensor_pin_memory
Flowingsun007 Apr 24, 2022
b413722
rm assign.cpp
Flowingsun007 Apr 24, 2022
353ce03
Merge branch 'master' into dev_support_tensor_pin_memory
Flowingsun007 Apr 24, 2022
43d189c
refine
Flowingsun007 Apr 24, 2022
9130bda
0-dim tesnor support
Flowingsun007 Apr 24, 2022
0cd0669
merge
Flowingsun007 Apr 24, 2022
bf82033
resolve conflict
Flowingsun007 Apr 25, 2022
47d4ec4
Merge branch 'master' into dev_support_tensor_pin_memory
Flowingsun007 Apr 25, 2022
b4ce88e
Merge branch 'master' into dev_support_tensor_pin_memory
Flowingsun007 Apr 27, 2022
9f587b1
merge master
Flowingsun007 Apr 27, 2022
67b86e6
refactor
Flowingsun007 Apr 27, 2022
b8ad42a
Merge branch 'master' into dev_support_tensor_pin_memory
Flowingsun007 Apr 27, 2022
f85c2bf
refine
Flowingsun007 May 5, 2022
5d75e1b
Merge branch 'master' into dev_support_tensor_pin_memory
Flowingsun007 May 5, 2022
94df435
auto format by CI
oneflow-ci-bot May 5, 2022
9088962
refine
Flowingsun007 May 6, 2022
002871d
merge
Flowingsun007 May 6, 2022
aabf53c
auto format by CI
oneflow-ci-bot May 6, 2022
cb25313
refinee
Flowingsun007 May 6, 2022
07608d4
refine
Flowingsun007 May 6, 2022
c0f5f8b
auto format by CI
oneflow-ci-bot May 6, 2022
f35bf46
refactor
Flowingsun007 May 6, 2022
3d245fa
Merge branch 'dev_support_tensor_pin_memory' of github.com:Oneflow-In…
Flowingsun007 May 6, 2022
1e5b8bb
auto format by CI
oneflow-ci-bot May 6, 2022
8a82be4
refine
Flowingsun007 May 6, 2022
0d9c558
Merge branch 'master' into dev_support_tensor_pin_memory
Flowingsun007 May 6, 2022
5c9bf5a
refine
Flowingsun007 May 6, 2022
93dd0bc
Merge branch 'master' into dev_support_tensor_pin_memory
Flowingsun007 May 6, 2022
e1b9956
Merge branch 'master' into dev_support_tensor_pin_memory
mergify[bot] May 6, 2022
24094e7
Merge branch 'master' into dev_support_tensor_pin_memory
mergify[bot] May 6, 2022
0c68be5
fix
Flowingsun007 May 7, 2022
9f956e9
Merge branch 'master' into dev_support_tensor_pin_memory
Flowingsun007 May 7, 2022
7c57136
Merge branch 'master' into dev_support_tensor_pin_memory
Flowingsun007 May 7, 2022
6eda8bc
Merge branch 'master' into dev_support_tensor_pin_memory
Flowingsun007 May 7, 2022
eec969f
Merge branch 'master' into dev_support_tensor_pin_memory
Flowingsun007 May 7, 2022
ad470f2
Merge branch 'master' into dev_support_tensor_pin_memory
mergify[bot] May 7, 2022
9d45481
Merge branch 'master' into dev_support_tensor_pin_memory
mergify[bot] May 7, 2022
d7ab3a1
Merge branch 'master' into dev_support_tensor_pin_memory
mergify[bot] May 8, 2022
3231730
refine
Flowingsun007 May 8, 2022
932b203
refine
Flowingsun007 May 8, 2022
99d3123
Merge branch 'master' into dev_support_tensor_pin_memory
mergify[bot] May 8, 2022
c7add3d
refine
Flowingsun007 May 8, 2022
074770b
Merge branch 'dev_support_tensor_pin_memory' of github.com:Oneflow-In…
Flowingsun007 May 8, 2022
8fdb5f3
auto format by CI
oneflow-ci-bot May 8, 2022
bd3d8dc
Merge branch 'master' into dev_support_tensor_pin_memory
mergify[bot] May 8, 2022
45067bc
Merge branch 'master' into dev_support_tensor_pin_memory
mergify[bot] May 8, 2022
9a5499c
Merge branch 'master' into dev_support_tensor_pin_memory
mergify[bot] May 8, 2022
3749a7f
Merge branch 'master' into dev_support_tensor_pin_memory
mergify[bot] May 9, 2022
2 changes: 2 additions & 0 deletions docs/source/tensor.rst
@@ -180,3 +180,5 @@ OneFlow Tensor Class
where,
zero_,
nms,
pin_memory,
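The entry above adds Tensor.pin_memory to the documented tensor API. A minimal usage sketch of the new method (a sketch only: flow.randn is standard OneFlow API, and the behavior described follows the PinMemoryFunctor added later in this diff):

import oneflow as flow

x = flow.randn(2, 3)   # ordinary pageable CPU tensor
y = x.pin_memory()     # returns a copy of x in page-locked (pinned) host memory
z = y.pin_memory()     # no-op: an already-pinned tensor is returned as-is

Only local dense CPU tensors can be pinned; calling pin_memory() on a CUDA tensor or a consistent tensor raises an error (see the checks in PinMemoryFunctor below).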

4 changes: 2 additions & 2 deletions oneflow/api/cpp/framework/graph.cpp
@@ -323,7 +323,7 @@ of::Maybe<void> Graph::GraphImpl::BuildGraph() {
variable_op_name_to_tensor_[op_conf.name()] = JUST(of::one::functional::Empty(
of::Shape(variable_conf.shape()),
JUST(of::DType::Get(static_cast<of::DataType>(variable_conf.data_type()))),
*device_.device_));
*device_.device_, /**pin_memory=*/false));
}
return of::Maybe<void>::Ok();
});
@@ -346,7 +346,7 @@ of::Maybe<void> Graph::GraphImpl::BuildGraph() {
output_name_to_tensor_[op_conf.name()] = JUST(of::one::functional::Empty(
of::Shape(blob_conf.shape()),
JUST(of::DType::Get(static_cast<of::DataType>(blob_conf.data_type()))),
*device_.device_));
*device_.device_, /**pin_memory=*/false));
}
return of::Maybe<void>::Ok();
});
2 changes: 1 addition & 1 deletion oneflow/api/cpp/framework/tensor.cpp
@@ -36,7 +36,7 @@ Tensor::Tensor(const Shape& shape, const Device& device, const DType& dtype) {
of::LazyMode::Guard lazy_mode_disabled_guard(/*is_enabled*/ false);
tensor_ = functional::Empty(*shape.shape_,
of::DType::Get(static_cast<of::DataType>(dtype)).GetOrThrow(),
*device.device_)
*device.device_, /**pin_memory=*/false)
.GetPtrOrThrow();
}
Tensor::Tensor(const std::shared_ptr<oneflow::one::Tensor>& tensor) : tensor_(tensor) {}
2 changes: 1 addition & 1 deletion oneflow/api/python/functional/indexing.cpp
@@ -173,7 +173,7 @@ Maybe<Tensor> ConvertToIndexingTensor(PyObject* object) {
if (PyArray_Check(object)) { return TensorWithData(object, NullOpt, device, false); }

const auto& sizes = JUST(InferArraySizes(object));
const auto& tensor = JUST(functional::Empty(*sizes, CHECK_JUST(DType::Get(dtype)), device));
const auto& tensor = JUST(functional::Empty(*sizes, CHECK_JUST(DType::Get(dtype)), device, /**pin_memory=*/false));
// Prevent the python object release until the callback is complete.
Py_INCREF(object);
auto handle = std::shared_ptr<PyObject>(PyObjectPtr(object));
33 changes: 30 additions & 3 deletions oneflow/api/python/functional/tensor_api.cpp
@@ -185,7 +185,7 @@ class TensorWithShapeCtorFunctor {
} else {
device_ = JUST(Device::New("cpu"));
}
return functional::Empty(shape, DType::Float(), device_);
return functional::Empty(shape, DType::Float(), device_, /**pin_memory=*/false);
}
};

@@ -277,7 +277,7 @@ class LocalTensorSharedNumpyDataFunctor {
/*ls_leaf=*/true);

// Init blob
JUST(tensor_impl->InitEagerBlobObject(NewLocalDepObject()));
JUST(tensor_impl->InitEagerBlobObject(NewLocalDepObject(), /*pin_memory=*/false));
const auto& stream = GetDefaultStreamByDevice(device);
JUST(tensor_impl->eager_blob_object())->set_last_used_stream(stream);
JUST(JUST(tensor_impl->eager_blob_object())->TryInitBlob());
@@ -287,6 +287,32 @@
}
};

class PinMemoryFunctor {
public:
PinMemoryFunctor() {
assign_op_ = CHECK_JUST(one::OpBuilder("assign").Input("ref").Input("value").Build());
}
Maybe<Tensor> operator()(const std::shared_ptr<one::Tensor>& input) const {
// if tensor already pinned, then just return
CHECK_OR_RETURN(input->is_local()) << "Tensor.pin_memory() only support local tensor for now!";
if(JUST(JUST(input->AsMirroredTensor())->eager_blob_object())->pin_memory()){
return input;
}
const bool pin_memory = true;
auto shape = input->shape();
auto device = JUST(input->device());

CHECK_EQ_OR_RETURN(device->enum_type(), DeviceType::kCPU) << "cannot pin tensor with device: " << device->ToString() << ", only dense CPU tensors can be pinned.";
Review comment (Contributor): Error::RuntimeError()
auto output = JUST(functional::Empty(*shape.get(), input->dtype(), device, /**pin_memory=*/pin_memory));
JUST(OpInterpUtil::Dispatch<TensorTuple>(*assign_op_, {output, input}));
return output;
}

private:
std::shared_ptr<OpExpr> assign_op_;
};
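Restating the functor's control flow as a Python-level sketch (a sketch only: is_pinned is a hypothetical predicate standing in for the eager-blob-object pin_memory() check, and both the pin_memory keyword on flow.empty and the flow._C.assign_local_tensor path are assumptions based on the yaml entries in this PR):

import oneflow as flow

def pin_memory(input):
    # Only local (non-consistent) tensors are supported for now.
    if not input.is_local:
        raise RuntimeError("Tensor.pin_memory() only supports local tensors")
    # Already pinned: return the same tensor, no copy.
    if is_pinned(input):  # hypothetical predicate
        return input
    # Only dense CPU tensors can be pinned.
    if input.device.type != "cpu":
        raise RuntimeError("only dense CPU tensors can be pinned")
    # Allocate a pinned buffer of the same shape/dtype, then copy the
    # data over with the assign op (see assign_local_tensor in the yaml below).
    output = flow.empty(*input.shape, dtype=input.dtype, pin_memory=True)
    flow._C.assign_local_tensor(output, input)
    return output

The assign-into-empty pattern is also why the functor builds assign_op_ once in its constructor: the op is stateless and reused across calls.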


} // namespace impl

ONEFLOW_FUNCTION_LIBRARY(m) {
@@ -299,8 +325,9 @@ ONEFLOW_FUNCTION_LIBRARY(m) {
m.add_functor<impl::ConsistentTensorWithDataCtorFunctor>("ConsistentTensorWithDataCtor");
m.add_functor<impl::TensorWithShapeCtorFunctor>("TensorWithShapeCtor");
m.add_functor<impl::ConsistentTensorWithShapeCtorFunctor>("ConsistentTensorWithShapeCtor");
m.add_functor<impl::AssignLocalTensorFunctor>("AssignLocalTensorFunctor");
m.add_functor<impl::AssignLocalTensorFunctor>("AssignLocalTensor");
m.add_functor<impl::LocalTensorSharedNumpyDataFunctor>("LocalTensorSharedNumpyData");
m.add_functor<impl::PinMemoryFunctor>("PinMemory");
}

} // namespace functional
6 changes: 5 additions & 1 deletion oneflow/api/python/functional/tensor_api.yaml
@@ -35,9 +35,13 @@
bind_python: True

- name: "assign_local_tensor"
signature: "Void (Tensor ref, Tensor value) => AssignLocalTensorFunctor"
signature: "Void (Tensor ref, Tensor value) => AssignLocalTensor"
bind_python: True

- name: "from_numpy"
signature: "Tensor (PyObject* obj) => LocalTensorSharedNumpyData"
bind_python: True

- name: "pin_memory"
signature: "Tensor (Tensor input) => PinMemory"
bind_python: True
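
With these entries, both the functional form and the Tensor method resolve to the same PinMemory functor. A quick sanity check (the flow._C path is an assumption based on how bind_python: True entries are typically exposed):

import oneflow as flow

x = flow.randn(4)
y1 = flow._C.pin_memory(x)  # functional binding declared above (assumed path)
y2 = x.pin_memory()         # Tensor method backed by the same functor

Note that the rename from AssignLocalTensorFunctor to AssignLocalTensor only changes the registered functor name on the C++ side; the Python-facing name assign_local_tensor is unchanged.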
4 changes: 2 additions & 2 deletions oneflow/api/python/utils/tensor_utils.cpp
@@ -164,7 +164,7 @@ Maybe<Tensor> MakeLocalTensorFromData(PyObject* data, const Optional<Symbol<DTyp
device_ = JUST(Device::New("cpu"));
}
std::shared_ptr<Tensor> tensor =
JUST(functional::Empty(shape, JUST(DType::Get(data_type)), device_));
JUST(functional::Empty(shape, JUST(DType::Get(data_type)), device_, /**pin_memory=*/false));
JUST(SwitchCopyMirroredTensorFromUntypedArray(SwitchCase(data_type), tensor, array));

Py_DECREF(array);
@@ -229,7 +229,7 @@ Maybe<Tensor> MakeConsistentTensorFromData(PyObject* data, const Optional<Symbol
device = JUST(Device::New("cuda"));
}
std::shared_ptr<Tensor> local_tensor =
JUST(functional::Empty(shape, JUST(DType::Get(data_type)), device));
JUST(functional::Empty(shape, JUST(DType::Get(data_type)), device, /**pin_memory=*/false));
JUST(SwitchCopyMirroredTensorFromUntypedArray(SwitchCase(data_type), local_tensor, array));

Py_DECREF(array);
2 changes: 1 addition & 1 deletion oneflow/core/autograd/gradient_funcs/narrow.cpp
@@ -67,7 +67,7 @@ class Narrow : public OpExprGradFunction<NarrowCaptureState> {
if (LazyMode::is_enabled()) {
like = ctx->SavedTensors().at(0);
} else if (dy->is_local()) {
like = JUST(functional::Empty(ctx->shape, dy->dtype(), JUST(dy->device())));
like = JUST(functional::Empty(ctx->shape, dy->dtype(), JUST(dy->device()), /**pin_memory=*/false));
} else {
like = JUST(
functional::ConsistentEmpty(ctx->shape, dy->dtype(), JUST(dy->parallel_desc()),
2 changes: 1 addition & 1 deletion oneflow/core/boxing/asymmetric_broadcast.cpp
@@ -99,7 +99,7 @@ Maybe<one::Tensor> AsymmetricBroadcast(const std::shared_ptr<one::Tensor>& tenso
if (!in_parallel_id->has_value()) {
const std::string& device_type = in_placement->device_tag();
local_tensor = JUST(one::functional::Empty(*tensor->shape(), tensor->dtype(),
JUST(Device::New(device_type))));
JUST(Device::New(device_type)), /**pin_memory=*/false));
}
const auto& broadcast_group = JUST(GetBroadcastGroup(in_placement, out_placement));

2 changes: 1 addition & 1 deletion oneflow/core/boxing/cuda_copy_boxing_interpreter.cpp
@@ -62,7 +62,7 @@ Maybe<one::Tensor> CudaCopy(const std::shared_ptr<one::Tensor>& tensor, Symbol<P
const std::string& device_type = tensor_placement->device_tag();
local_tensor = JUST(one::functional::Empty(
*JUST(GetPhysicalShape(*tensor->shape(), *tensor_nd_sbp, *tensor_placement, 0)),
tensor->dtype(), JUST(Device::New(device_type))));
tensor->dtype(), JUST(Device::New(device_type)), /**pin_memory=*/false));
}
const auto& sbp_list = JUST(GetSbpList(out->nd_sbp()));
return JUST(one::functional::LocalToConsistent(local_tensor, out->placement(), *sbp_list,
15 changes: 14 additions & 1 deletion oneflow/core/device/cpu_device_context.h
@@ -19,6 +19,7 @@ limitations under the License.
#include "oneflow/core/kernel/kernel_context.h"
#include "oneflow/core/device/event_record.h"
#include "oneflow/core/vm/cpu_allocator.h"
#include "oneflow/core/vm/cuda_host_allocator.h"
#include "oneflow/core/ep/cpu/cpu_stream.h"
#include "oneflow/core/ep/cpu/cpu_device.h"
#include "oneflow/core/ep/include/device_manager_registry.h"
@@ -31,12 +32,24 @@ class CpuDeviceCtx final : public DeviceCtx {
CpuDeviceCtx() {
device_ = Global<ep::DeviceManagerRegistry>::Get()->GetDevice(DeviceType::kCPU, 0);
stream_ = device_->CreateStream();
set_pin_memory(false);
}
CpuDeviceCtx(const bool pin_memory) {
device_ = Global<ep::DeviceManagerRegistry>::Get()->GetDevice(DeviceType::kCPU, 0);
stream_ = device_->CreateStream();
set_pin_memory(pin_memory);
}
~CpuDeviceCtx() { device_->DestroyStream(stream_); }

std::unique_ptr<DeviceCtx> Copy() const { return std::unique_ptr<DeviceCtx>(new CpuDeviceCtx()); }

vm::Allocator* mut_allocator() override { return Global<vm::CpuAllocator>::Get(); }
vm::Allocator* mut_allocator(const bool pin_memory) override {
if(this->pin_memory()){
return Global<vm::CudaHostAllocator>::Get();
}else{
return Global<vm::CpuAllocator>::Get();
}
}

DeviceType device_type() const override { return DeviceType::kCPU; }

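The mut_allocator override above is where pinning takes effect: when the context's pin_memory() flag is set, allocation is routed to the global CudaHostAllocator (page-locked host memory, presumably via cudaMallocHost) instead of the ordinary CpuAllocator; note that the override consults the stored flag rather than its pin_memory argument. Pinned memory matters because CUDA can DMA from it asynchronously, which typically speeds up host-to-device copies. A rough way to observe this (illustrative only; flow.cuda.synchronize is assumed to exist, mirroring the PyTorch helper):

import time
import oneflow as flow

x = flow.randn(4096, 4096)   # pageable host tensor
xp = x.pin_memory()          # page-locked host tensor

def timed_copy(t):
    start = time.perf_counter()
    y = t.to("cuda")
    flow.cuda.synchronize()  # assumed helper; copies are async without it
    return time.perf_counter() - start

print("pageable:", timed_copy(x))
print("pinned:  ", timed_copy(xp))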
9 changes: 8 additions & 1 deletion oneflow/core/device/device_context.h
@@ -47,10 +47,16 @@ class DeviceCtx {
return nullptr;
}
#endif
void set_pin_memory(const bool pin_memory){
pin_memory_ = pin_memory;
}
bool pin_memory() {
return pin_memory_;
}

virtual ep::Stream* stream() = 0;

virtual vm::Allocator* mut_allocator() {
virtual vm::Allocator* mut_allocator(const bool pin_memory) {
UNIMPLEMENTED();
return nullptr;
}
@@ -61,6 +67,7 @@
DeviceCtx() = default;

private:
bool pin_memory_;
};

} // namespace oneflow
2 changes: 1 addition & 1 deletion oneflow/core/eager/eager_blob_object.cpp
@@ -54,7 +54,7 @@ Maybe<void> EagerBlobObject::InitBlobWithOffset(const int64_t offset) {
}

Maybe<void> EagerBlobObject::TryAllocateBlobBodyMemory(DeviceCtx* device_ctx) {
vm::Allocator* allocator = device_ctx->mut_allocator();
vm::Allocator* allocator = device_ctx->mut_allocator(pin_memory());
CHECK_NOTNULL_OR_RETURN(allocator);
Blob* blob = mut_blob();
CHECK_NOTNULL_OR_RETURN(blob);
9 changes: 9 additions & 0 deletions oneflow/core/eager/eager_blob_object.h
@@ -143,11 +143,20 @@ class EagerBlobObject final : public BlobObject {
tensor_storage_->set_last_used_stream(last_used_stream);
}

void set_pin_memory(const bool pin_memory){
pin_memory_ = pin_memory;
}

const bool pin_memory(){
return pin_memory_;
}

private:
std::unique_ptr<Blob> blob_;
std::unique_ptr<char[]> header_buffer_;
std::shared_ptr<TensorStorage> tensor_storage_;
std::atomic<bool> is_shape_synced_;
bool pin_memory_;
intrusive::shared_ptr<LocalDepObject> compute_local_dep_object_;
};

2 changes: 1 addition & 1 deletion oneflow/core/eager/lazy_job_device_context.h
@@ -45,7 +45,7 @@ class LazyJobDeviceCtx final : public DeviceCtx {
}
#endif

vm::Allocator* mut_allocator() override {
vm::Allocator* mut_allocator(const bool pin_memory) override {
UNIMPLEMENTED();
return (vm::Allocator*)nullptr;
}
@@ -138,12 +138,17 @@ Maybe<void> NaiveInterpret(const UserOpExpr& user_op_expr, const TensorTuple& in
return output_tensor_metas->at(i);
}));

bool pin_memory = false;
if(user_op_expr.op_type_name()=="empty"){
pin_memory = JUST(attrs.GetAttr<bool>("pin_memory"));
}

for (int i = 0; i < output_eager_blob_objects->size(); i++) {
auto* tensor_impl = JUST(TensorImpl4Tensor(outputs->at(i)));
if (!output_eager_blob_objects->at(i)) {
tensor_impl->mut_tensor_meta()->set_stride(std::make_shared<Stride>(*tensor_impl->shape()));
const auto& dep_object = NewLocalDepObject();
JUST(tensor_impl->InitEagerBlobObject(dep_object));
JUST(tensor_impl->InitEagerBlobObject(dep_object, pin_memory));
output_eager_blob_objects->at(i) = JUST(tensor_impl->eager_blob_object());
} else {
// output i is inplaced.
@@ -158,25 +163,17 @@

for (int64_t index : kernel->output_tuple_indexes4mut2_obns()) {
output_eager_blob_objects->at(index)->set_is_shape_synced(false);
output_eager_blob_objects->at(index)->set_pin_memory(pin_memory);
}


JUST(PhysicalRun([&](InstructionsBuilder* builder) -> Maybe<void> {
return builder->LocalCallOpKernel(kernel, input_eager_blob_objects, output_eager_blob_objects,
ctx, stream);
}));
return Maybe<void>::Ok();
}

Maybe<void> RunEmptyOp(TensorTuple* outputs) {
CHECK_EQ_OR_RETURN(outputs->size(), 1);
auto* tensor_impl = JUST(TensorImpl4Tensor(outputs->at(0)));
const auto& shape = tensor_impl->tensor_meta()->shape_ptr();
const auto& data_type = tensor_impl->dtype();
const auto& device = tensor_impl->device();
outputs->at(0) = JUST(functional::Empty(*shape, DType(data_type), device));
return Maybe<void>::Ok();
}

static Maybe<void> NaiveInterpret(const UserOpExpr& user_op_expr, const TensorTuple& inputs,
TensorTuple* outputs, const OpExprInterpContext& ctx) {
CHECK_EQ_OR_RETURN(outputs->size(), user_op_expr.output_size());
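Two things happen in the interpreter hunks above: the pin_memory attribute is read only when the op is "empty" (the only user op that carries that attribute), and the flag is propagated into each freshly initialized eager blob object so that allocation later picks the pinned allocator. The now-redundant RunEmptyOp helper is deleted in the same change, since functional::Empty covers it with the explicit pin_memory argument. At the Python level, pinned-ness therefore originates from empty, and every other tensor becomes pinned by being copied into such a buffer (sketch, assuming the binding exposes the attribute as a keyword argument):

import oneflow as flow

# "empty" is the only op with a pin_memory attr; Tensor.pin_memory()
# pins by allocating a pinned empty buffer and assigning into it.
buf = flow.empty(8, 16, dtype=flow.float32, pin_memory=True)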
@@ -25,7 +25,6 @@ namespace one {

class Tensor;

Maybe<void> RunEmptyOp(TensorTuple* outputs);
Maybe<Tensor> Broadcast(const std::shared_ptr<Tensor>& tensor, int64_t src_rank,
Symbol<ParallelDesc> parallel_desc, bool inplace);

8 changes: 4 additions & 4 deletions oneflow/core/framework/random_generator_impl.cpp
@@ -60,7 +60,7 @@ Maybe<Tensor> CPUGeneratorImpl::GetState() const {
JUST(CPUSynchronize());
CPUGeneratorState state;
const auto& device = JUST(Device::New("cpu"));
const auto& tensor_state = JUST(functional::Empty(Shape{sizeof(state)}, DType::UInt8(), device));
const auto& tensor_state = JUST(functional::Empty(Shape{sizeof(state)}, DType::UInt8(), device, /**pin_memory=*/false));

std::stringstream ss;
ss << engine_;
@@ -181,7 +181,7 @@ Maybe<Tensor> CUDAGeneratorImpl::GetState() const {
int64_t state_size = max_block_num_ * max_thread_num_ * sizeof(curandState);
int64_t total_size = state_size + sizeof(int64_t);
const auto& device = JUST(Device::New("cpu"));
const auto& tensor_state = JUST(functional::Empty(Shape{total_size}, DType::UInt8(), device));
const auto& tensor_state = JUST(functional::Empty(Shape{total_size}, DType::UInt8(), device, /**pin_memory=*/false));

const auto& callback = [&](uint64_t of_blob_ptr) {
auto* of_blob = reinterpret_cast<OfBlob*>(of_blob_ptr);
@@ -293,7 +293,7 @@ Maybe<Tensor> AutoGeneratorImpl::GetState() const {
}
}
const auto& device = JUST(Device::New("cpu"));
const auto& tensor_state = JUST(functional::Empty(Shape{total_size}, DType::UInt8(), device));
const auto& tensor_state = JUST(functional::Empty(Shape{total_size}, DType::UInt8(), device, /**pin_memory=*/false));
const auto& callback = [&buffer, &total_size](uint64_t of_blob_ptr) {
auto* of_blob = reinterpret_cast<OfBlob*>(of_blob_ptr);
memcpy(of_blob->mut_blob()->mut_dptr<uint8_t>(), buffer.data(), total_size);
@@ -337,7 +337,7 @@ Maybe<void> AutoGeneratorImpl::SetState(const std::shared_ptr<Tensor>& tensor_st
std::vector<std::shared_ptr<Tensor>> tensor_states(state.num);
for (int i = 0; i < state.num; ++i) {
int64_t state_size = state_sizes.at(i);
tensor_states[i] = JUST(functional::Empty(Shape{state_size}, DType::UInt8(), device));
tensor_states[i] = JUST(functional::Empty(Shape{state_size}, DType::UInt8(), device, /**pin_memory=*/false));
const auto& callback = [&data, &state_size](uint64_t of_blob_ptr) {
auto* of_blob = reinterpret_cast<OfBlob*>(of_blob_ptr);
memcpy(of_blob->mut_blob()->mut_dptr<uint8_t>(), data, state_size);
8 changes: 5 additions & 3 deletions oneflow/core/framework/tensor_impl.cpp
@@ -101,7 +101,7 @@ Maybe<LocalDepObject*> EagerMirroredTensorImpl::compute_local_dep_object() const
}

Maybe<void> EagerMirroredTensorImpl::InitEagerBlobObject(
const intrusive::shared_ptr<LocalDepObject>& dep_object) {
const intrusive::shared_ptr<LocalDepObject>& dep_object, const bool pin_memory) {
CHECK_OR_RETURN(static_cast<bool>(device()));
const auto& mem_case = device()->mem_case();
const auto& mut_shape = std::const_pointer_cast<Shape>(tensor_meta()->shape_ptr());
@@ -110,9 +110,11 @@
auto tensor_storage = tensor_storage_->storage();
eager_blob_object_ = std::make_shared<vm::EagerBlobObject>(mem_case, mut_shape, dtype(),
tensor_storage, dep_object);
eager_blob_object_->set_pin_memory(pin_memory);
} else {
const auto& eager_blob_object = std::make_shared<vm::EagerBlobObject>(
mem_case, mut_shape, dtype(), std::make_shared<vm::TensorStorage>(), dep_object);
eager_blob_object->set_pin_memory(pin_memory);
JUST(set_eager_blob_object(eager_blob_object));
}
return Maybe<void>::Ok();
@@ -214,11 +216,11 @@ Maybe<Shape> GetPhysicalShape(const Shape& logical_shape, const NdSbp& nd_sbp,
auto cur_rank_phy_tensor_impl =
std::make_shared<EagerMirroredTensorImpl>(cur_rank_phy_tensor_meta, requires_grad, is_leaf);
const auto& dep_object = NewLocalDepObject();
JUST(cur_rank_phy_tensor_impl->InitEagerBlobObject(dep_object));
JUST(cur_rank_phy_tensor_impl->InitEagerBlobObject(dep_object, /**pin_memory=*/false));
cur_rank_phy_tensor = std::make_shared<MirroredTensor>(cur_rank_phy_tensor_impl);
} else {
const auto& dtype_symbol = JUST(DType::Get(dtype));
const auto& empty = JUST(functional::Empty(*cur_rank_phy_shape, dtype_symbol, device));
const auto& empty = JUST(functional::Empty(*cur_rank_phy_shape, dtype_symbol, device, /**pin_memory=*/false));
cur_rank_phy_tensor = JUST(empty->AsMirroredTensor());
JUST(cur_rank_phy_tensor->set_requires_grad(requires_grad));
cur_rank_phy_tensor->set_is_leaf(is_leaf);
1 change: 1 addition & 0 deletions oneflow/core/framework/tensor_impl.h
@@ -251,6 +251,7 @@ class EagerMirroredTensorImpl final : public MirroredTensorImpl {
TensorStorage* mut_tensor_storage() { return tensor_storage_.get(); }

Maybe<void> InitEagerBlobObject(const intrusive::shared_ptr<LocalDepObject>& dep_object);
Maybe<void> InitEagerBlobObject(const intrusive::shared_ptr<LocalDepObject>& dep_object, const bool pin_memory);
Maybe<EagerMirroredTensorImpl*> mut_eager_mirrored_tensor_impl() override { return this; }

Maybe<void> RegisterStorageDeleteHook(const std::function<void()>& hook) override;