ram-disk example and direct IO #324

Open
gurugio opened this issue Aug 29, 2022 · 2 comments
Comments

@gurugio

gurugio commented Aug 29, 2022

Hi,

First, thank you very much for the great documentation.

I have a question about the ram-disk example.
When a user application opens a block device file, I think the buffer cache sits between the filesystem and the driver.

So I ran the ram-disk-test program without solving TODO 3. The output is below.
What I guess:

  1. A READ happens only when sector 0, 8, 16, ... is accessed.
    So I guess the unit size of the buffer-cache layer is 8 * 512 = 4096 bytes.
  2. The application reads and writes every sector.
    But apart from those reads, the driver sees only writes. So I guess the application reads only the cached data,
    and the buffer-cache layer sends data to the driver only when the application writes.
/ # ./ramdisktest
insmod ./ram-disk.ko
mknod /dev/myblock b 240 0
mknod: /dev/myblock: File exists
[   29.347210] my_block_open
use normal io
[   29.349346] req: start=0 tsize=4096 dsize=4096 dir=read
[   29.352516] bio: sector=0 offset=0, len=4096 dir=read
[   29.354854] req: start=0 tsize=4096 dsize=4096 dir=write
[   29.355560] bio: sector=0 offset=0, len=4096 dir=write
[   29.356199] req: start=0 tsize=4096 dsize=4096 dir=write
test sector   0 [   29.356864] bio: sector=0 offset=0, len=4096 dir=write
... passed
test sector   1 [   29.357781] req: start=0 tsize=4096 dsize=4096 dir=write
... passed
[   29.358652] bio: sector=0 offset=0, len=4096 dir=write
[   29.359513] req: start=0 tsize=4096 dsize=4096 dir=write
test sector   2 [   29.360191] bio: sector=0 offset=0, len=4096 dir=write
... passed
test sector   3 [   29.361177] req: start=0 tsize=4096 dsize=4096 dir=write
... passed
[   29.362016] bio: sector=0 offset=0, len=4096 dir=write
[   29.362833] req: start=0 tsize=4096 dsize=4096 dir=write
test sector   4 [   29.363529] bio: sector=0 offset=0, len=4096 dir=write
... passed
test sector   5 [   29.364518] req: start=0 tsize=4096 dsize=4096 dir=write
... passed
[   29.365369] bio: sector=0 offset=0, len=4096 dir=write
test sector   6 [   29.366223] req: start=0 tsize=4096 dsize=4096 dir=write
... passed
[   29.367067] bio: sector=0 offset=0, len=4096 dir=write
test sector   7 [   29.367891] req: start=8 tsize=4096 dsize=4096 dir=read
... passed
[   29.368737] bio: sector=8 offset=0, len=4096 dir=read
[   29.369566] req: start=8 tsize=4096 dsize=4096 dir=write
[   29.370235] bio: sector=8 offset=0, len=4096 dir=write
[   29.370911] req: start=8 tsize=4096 dsize=4096 dir=write
test sector   8 [   29.371590] bio: sector=8 offset=0, len=4096 dir=write
... passed
test sector   9 [   29.372584] req: start=8 tsize=4096 dsize=4096 dir=write
... passed
[   29.373435] bio: sector=8 offset=0, len=4096 dir=write
[   29.374365] req: start=8 tsize=4096 dsize=4096 dir=write
test sector  10 [   29.375054] bio: sector=8 offset=0, len=4096 dir=write
... passed
test sector  11 [   29.376050] req: start=8 tsize=4096 dsize=4096 dir=write
... passed
[   29.376917] bio: sector=8 offset=0, len=4096 dir=write
[   29.377724] req: start=8 tsize=4096 dsize=4096 dir=write
test sector  12 [   29.378405] bio: sector=8 offset=0, len=4096 dir=write
... passed
test sector  13 [   29.379435] req: start=8 tsize=4096 dsize=4096 dir=write
... passed
[   29.380260] bio: sector=8 offset=0, len=4096 dir=write
[   29.381078] req: start=8 tsize=4096 dsize=4096 dir=write
test sector  14 [   29.381764] bio: sector=8 offset=0, len=4096 dir=write
... passed
test sector  15 [   29.382768] req: start=16 tsize=4096 dsize=4096 dir=read
... passed
.............................. skip..................
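
The pattern in the log above is consistent with the cache working on whole pages: with 512-byte sectors and a 4096-byte page (an assumption, since the page size depends on the architecture), eight sectors share one page, so a request for any of sectors 0 to 7 starts at sector 0. A minimal sketch of that arithmetic follows; `page_start_sector` and `print_page_starts` are hypothetical helpers for illustration, not part of the lab code.

```c
#include <assert.h>
#include <stdio.h>

#define SECTOR_SIZE 512u
#define PAGE_BYTES 4096u /* assumed page size; matches tsize=4096 in the log */
#define SECTORS_PER_PAGE (PAGE_BYTES / SECTOR_SIZE)

/* First sector of the page-sized block that contains `sector`. */
unsigned long page_start_sector(unsigned long sector)
{
	return sector - (sector % SECTORS_PER_PAGE);
}

/* Print the request start expected for each of the first `count` sectors. */
void print_page_starts(unsigned long count)
{
	unsigned long s;

	/* The arithmetic assumes a page holds a whole number of sectors. */
	assert(PAGE_BYTES % SECTOR_SIZE == 0);

	for (s = 0; s < count; s++)
		printf("sector %3lu -> request start=%lu\n", s,
		       page_start_sector(s));
}
```

Calling `print_page_starts(16)` lists start=0 for sectors 0 to 7 and start=8 for sectors 8 to 15, which is the grouping seen in the `req:` lines.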

Therefore I guess the ram-disk-test program needs to use the O_DIRECT flag.
So I rebuilt it with the O_DIRECT flag.
Below is what I got.

  1. Every access failed.
  2. The device driver did nothing.
/ # ./ramdisktest d
insmod ./ram-disk.ko
[    4.929450] ram_disk: loading out-of-tree module taints kernel.
mknod /dev/myblock b 240 0
[    6.985266] my_block_open
use direct io
test sector   0 ... failed
test sector   1 ... failed
test sector   2 ... failed
test sector   3 ... failed
test sector   4 ... failed
test sector   5 ... failed
test sector   6 ... failed
test sector   7 ... failed
test sector   8 ... failed
..........................................skip.......................

I don't understand why the driver did nothing when the O_DIRECT flag is used.

I have attached the ram-disk-test program code and the ram-disk driver code below.
Could you please tell me what I did wrong?
Please also tell me whether the user application needs the O_DIRECT flag or not.


#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <fcntl.h>
#include <time.h>
#include <errno.h>

#define NR_SECTORS 128
#define SECTOR_SIZE 512

#define DEVICE_NAME "/dev/myblock"
#define MODULE_NAME "ram-disk"
#define MY_BLOCK_MAJOR "240"
#define MY_BLOCK_MINOR "0"

#define max_elem_value(elem) (1 << 8 * sizeof(elem))

static unsigned char buffer[SECTOR_SIZE];
static unsigned char buffer_copy[SECTOR_SIZE];

static void test_sector(int fd, size_t sector)
{
	int i;

	for (i = 0; i < sizeof(buffer) / sizeof(buffer[0]); i++)
		buffer[i] = rand() % max_elem_value(buffer[0]);

	printf("test sector %3zu ... ", sector);
	lseek(fd, sector * SECTOR_SIZE, SEEK_SET);
	write(fd, buffer, sizeof(buffer));

	fsync(fd);

	lseek(fd, sector * SECTOR_SIZE, SEEK_SET);
	read(fd, buffer_copy, sizeof(buffer_copy));

	if (memcmp(buffer, buffer_copy, sizeof(buffer_copy)) == 0)
		printf("passed\n");
	else
		printf("failed\n");
}

int main(int argc, char **argv)
{
	int fd;
	size_t i;
	int back_errno;

	printf("insmod ./" MODULE_NAME ".ko\n");
	if (system("insmod ./" MODULE_NAME ".ko\n")) {
		fprintf(stderr, "insmod failed\n");
		exit(EXIT_FAILURE);
	}

	sleep(1);

	printf("mknod " DEVICE_NAME " b " MY_BLOCK_MAJOR " " MY_BLOCK_MINOR
	       "\n");
	system("mknod " DEVICE_NAME " b " MY_BLOCK_MAJOR " " MY_BLOCK_MINOR
	       "\n");
	sleep(1);

	if (argc == 2 && argv[1][0] == 'd') {
		printf("use direct io\n");
		fd = open(DEVICE_NAME, O_RDWR | O_DIRECT | O_SYNC);
	} else {
		printf("use normal io\n");
		fd = open(DEVICE_NAME, O_RDWR);
	}
	if (fd < 0) {
		back_errno = errno;
		perror("open");
		fprintf(stderr, "errno is %d\n", back_errno);
		exit(EXIT_FAILURE);
	}

	srand(time(NULL));
	for (i = 0; i < NR_SECTORS; i++)
		test_sector(fd, i);

	close(fd);

	sleep(1);
	printf("rmmod " MODULE_NAME "\n");
	system("rmmod " MODULE_NAME "\n");

	return 0;
}

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>

#include <linux/genhd.h>
#include <linux/fs.h>
#include <linux/blkdev.h>
#include <linux/blk_types.h>
#include <linux/blk-mq.h>
#include <linux/bio.h>
#include <linux/vmalloc.h>

MODULE_DESCRIPTION("Simple RAM Disk");
MODULE_AUTHOR("SO2");
MODULE_LICENSE("GPL");

#define KERN_LOG_LEVEL KERN_ALERT

#define MY_BLOCK_MAJOR 240
#define MY_BLKDEV_NAME "mybdev"
#define MY_BLOCK_MINORS 1
#define NR_SECTORS 128

#define KERNEL_SECTOR_SIZE 512

/* TODO 6: use bios for read/write requests */
#define USE_BIO_TRANSFER 0

static struct my_block_dev {
	struct blk_mq_tag_set tag_set;
	struct request_queue *queue;
	struct gendisk *gd;
	u8 *data;
	size_t size;
} g_dev;

static int my_block_open(struct block_device *bdev, fmode_t mode)
{
	pr_info("my_block_open\n");
	return 0;
}

static void my_block_release(struct gendisk *gd, fmode_t mode)
{
	pr_info("my_block_release\n");
}

static const struct block_device_operations my_block_ops = {
	.owner = THIS_MODULE,
	.open = my_block_open,
	.release = my_block_release
};

static void my_block_transfer(struct my_block_dev *dev, sector_t sector,
			      unsigned long len, char *buffer, int dir)
{
	unsigned long offset = sector * KERNEL_SECTOR_SIZE;

	/* check for read/write beyond end of block device */
	if ((offset + len) > dev->size)
		return;

	/* TODO 3: read/write to dev buffer depending on dir */
	if (dir == WRITE) {
		memcpy(dev->data + offset, buffer, len);
	} else {
		memcpy(buffer, dev->data + offset, len);
	}
}

/* to transfer data using bio structures enable USE_BIO_TRANSFER */
#if USE_BIO_TRANSFER == 1
static void my_xfer_request(struct my_block_dev *dev, struct request *req)
{
	/* TODO 6: iterate segments */

	/* TODO 6: copy bio data to device buffer */
}
#endif

static blk_status_t my_block_request(struct blk_mq_hw_ctx *hctx,
				     const struct blk_mq_queue_data *bd)
{
	struct request *rq;
	struct my_block_dev *dev = hctx->queue->queuedata;
	struct bio_vec bvec;
	struct req_iterator iter;

	/* TODO 2: get pointer to request */
	rq = bd->rq;

	/* TODO 2: start request processing. */
	blk_mq_start_request(rq);

	/* TODO 2: check fs request. Return if passthrough. */
	if (blk_rq_is_passthrough(rq)) {
		pr_info("passthrough request\n");
		blk_mq_end_request(rq, BLK_STS_IOERR);
		goto out;
	}
	/* TODO 2: print request information */
	pr_info("req: start=%llu tsize=%u dsize=%u dir=%s\n", blk_rq_pos(rq),
		blk_rq_bytes(rq), blk_rq_cur_bytes(rq),
		rq_data_dir(rq) == WRITE ? "write" : "read");
#if USE_BIO_TRANSFER == 1
	/* TODO 6: process the request by calling my_xfer_request */
#else
	/* TODO 3: process the request by calling my_block_transfer */
	rq_for_each_segment (bvec, rq, iter) {
		sector_t sector = iter.iter.bi_sector;
		char *buf = kmap_atomic(bvec.bv_page);
		unsigned long offset = bvec.bv_offset;
		size_t len = bvec.bv_len;
		int dir = bio_data_dir(iter.bio);

		pr_info("bio: sector=%d offset=%d, len=%d dir=%s\n",
			(int)sector, (int)offset, (int)len,
			dir == WRITE ? "write" : "read");

		//my_block_transfer(dev, sector, len, buf + offset, dir);
		kunmap_atomic(buf);
	}
#endif

	/* TODO 2: end request successfully */
	blk_mq_end_request(rq, BLK_STS_OK);
out:
	return BLK_STS_OK;
}

static struct blk_mq_ops my_queue_ops = {
	.queue_rq = my_block_request,
};

static int create_block_device(struct my_block_dev *dev)
{
	int err;

	dev->size = NR_SECTORS * KERNEL_SECTOR_SIZE;
	dev->data = vmalloc(dev->size);
	if (dev->data == NULL) {
		printk(KERN_ERR "vmalloc: out of memory\n");
		err = -ENOMEM;
		goto out_vmalloc;
	}

	/* Initialize tag set. */
	dev->tag_set.ops = &my_queue_ops;
	dev->tag_set.nr_hw_queues = 1;
	dev->tag_set.queue_depth = 128;
	dev->tag_set.numa_node = NUMA_NO_NODE;
	dev->tag_set.cmd_size = 0;
	dev->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
	err = blk_mq_alloc_tag_set(&dev->tag_set);
	if (err) {
		printk(KERN_ERR
		       "blk_mq_alloc_tag_set: can't allocate tag set\n");
		goto out_alloc_tag_set;
	}

	/* Allocate queue. */
	dev->queue = blk_mq_init_queue(&dev->tag_set);
	if (IS_ERR(dev->queue)) {
		printk(KERN_ERR "blk_mq_init_queue: out of memory\n");
		err = -ENOMEM;
		goto out_blk_init;
	}
	blk_queue_logical_block_size(dev->queue, KERNEL_SECTOR_SIZE);
	dev->queue->queuedata = dev;

	/* initialize the gendisk structure */
	dev->gd = alloc_disk(MY_BLOCK_MINORS);
	if (!dev->gd) {
		printk(KERN_ERR "alloc_disk: failure\n");
		err = -ENOMEM;
		goto out_alloc_disk;
	}

	dev->gd->major = MY_BLOCK_MAJOR;
	dev->gd->first_minor = 0;
	dev->gd->fops = &my_block_ops;
	dev->gd->queue = dev->queue;
	dev->gd->private_data = dev;
	snprintf(dev->gd->disk_name, DISK_NAME_LEN, "myblock");
	set_capacity(dev->gd, NR_SECTORS);

	add_disk(dev->gd);

	return 0;

out_alloc_disk:
	blk_cleanup_queue(dev->queue);
out_blk_init:
	blk_mq_free_tag_set(&dev->tag_set);
out_alloc_tag_set:
	vfree(dev->data);
out_vmalloc:
	return err;
}

static int __init my_block_init(void)
{
	int err = 0;

	/* TODO 1: register block device */
	err = register_blkdev(MY_BLOCK_MAJOR, MY_BLKDEV_NAME);
	if (err < 0) {
		pr_err("fail register\n");
		return -EINVAL;
	}

	/* TODO 2: create block device using create_block_device */
	err = create_block_device(&g_dev);
	if (err < 0)
		goto out;

	return 0;

out:
	/* TODO 2: unregister block device in case of an error */
	unregister_blkdev(MY_BLOCK_MAJOR, MY_BLKDEV_NAME);
	return err;
}

static void delete_block_device(struct my_block_dev *dev)
{
	if (dev->gd) {
		del_gendisk(dev->gd);
		put_disk(dev->gd);
	}

	if (dev->queue)
		blk_cleanup_queue(dev->queue);
	if (dev->tag_set.tags)
		blk_mq_free_tag_set(&dev->tag_set);
	if (dev->data)
		vfree(dev->data);
}

static void __exit my_block_exit(void)
{
	/* TODO 2: cleanup block device using delete_block_device */
	delete_block_device(&g_dev);

	/* TODO 1: unregister block device */
	unregister_blkdev(MY_BLOCK_MAJOR, MY_BLKDEV_NAME);
}

module_init(my_block_init);
module_exit(my_block_exit);

@rogerpease

rogerpease commented Aug 18, 2023

Are you still seeing this (I realize it's nearly a year later)? I tried today and hit exactly the same problem. I came to the same conclusion you did: maybe a buffer layer caches the data, so read() never actually issues a block request. I also saw the 4096-byte requests.

I will dig deeper into the open() call options.
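
One way to test the caching hypothesis without O_DIRECT is to drop the cached pages between the write and the read, so that the next read() has to go back to the driver. A minimal sketch, assuming the kernel honours `POSIX_FADV_DONTNEED` for the opened file (it is only advice, so pages may stay cached); `drop_cached_pages` is a hypothetical helper, not part of the lab code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write dirty pages back, then ask the kernel to drop the now-clean
 * cached pages of fd. Returns 0 on success or an errno value on failure.
 */
int drop_cached_pages(int fd)
{
	assert(fd >= 0);

	/* Dirty pages cannot be dropped, so flush them first. */
	if (fsync(fd) < 0)
		return errno;

	/* Offset 0 with length 0 means "the whole file". */
	return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

In `test_sector` this call would take the place of the bare `fsync(fd)`. If a `dir=read` request then shows up for every sector, the reads were being served from the cache before.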

@gurugio

gurugio commented Aug 18, 2023

  1. Why does it generate only read requests?
    When a new device /dev/myblock appears, udev generates an event and systemd (or something else) reads some kilobytes of the block device. There is also a read_ahead_kb file in /sys/block/myblock/queue. I guess that's why. If you create a bigger block device and read its last sector, you should be able to see a read request.
  2. Why does the driver fail with direct IO?
    I haven't figured it out yet. I guess the driver has to do something extra for direct IO. I expected to find the relevant code in the kernel's brd driver, but I didn't check because I stopped reading the document.
    Please let me know if you find something.
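
A likely cause of the direct IO failure, offered as an assumption rather than a confirmed diagnosis: O_DIRECT requires the user buffer address, the transfer length and the file offset to be suitably aligned (see open(2)). The static `buffer` arrays in the test program carry no 512-byte alignment guarantee, and the return values of write() and read() are never checked, so an early EINVAL would look exactly like "every sector failed, driver never called". A minimal sketch with an aligned buffer and checked calls follows; `alloc_sector_buffer` and `write_read_sector` are hypothetical helpers, and 512 is assumed to be the device's logical block size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512 /* assumed logical block size of the device */

/* Allocate one zeroed sector whose address is a multiple of SECTOR_SIZE. */
unsigned char *alloc_sector_buffer(void)
{
	void *p = NULL;

	if (posix_memalign(&p, SECTOR_SIZE, SECTOR_SIZE) != 0)
		return NULL;
	memset(p, 0, SECTOR_SIZE);
	return p;
}

/*
 * Write one sector from `out` and read it back into `in`, checking every
 * call. Returns 0 if the data round-trips, or a negative errno value.
 */
int write_read_sector(int fd, size_t sector, const unsigned char *out,
		      unsigned char *in)
{
	off_t pos = (off_t)sector * SECTOR_SIZE;
	ssize_t n;

	/* With O_DIRECT, misaligned buffers are typically rejected (EINVAL). */
	assert((uintptr_t)out % SECTOR_SIZE == 0);
	assert((uintptr_t)in % SECTOR_SIZE == 0);

	n = pwrite(fd, out, SECTOR_SIZE, pos);
	if (n != SECTOR_SIZE)
		return n < 0 ? -errno : -EIO;

	n = pread(fd, in, SECTOR_SIZE, pos);
	if (n != SECTOR_SIZE)
		return n < 0 ? -errno : -EIO;

	return memcmp(out, in, SECTOR_SIZE) == 0 ? 0 : -EIO;
}
```

In the test program these buffers would replace the static arrays. Printing `strerror(-ret)` when `write_read_sector` fails shows whether the kernel rejected the call before any request reached the driver.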
