Sunday, November 29, 2009

A little C trick, taken from minix fs

The trick I mean is the statement near the end: err |= minix_sync_inode(inode);

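Lazy in the best way. Linus really treats every line of code as if it were gold: the error from the earlier call and the error from the later call are checked together in one place. There is one precondition, though: even if sync_mapping_buffers fails, going on to call minix_sync_inode must not make things any worse. The trick also relies on the last line only testing whether err is non-zero. OR can only set bits, so err stays non-zero if either call failed, and because the combined bits are no longer a meaningful error code, the function simply reports -EIO.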

Something worth learning from...




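int minix_sync_file(struct file *file,
                    struct dentry *dentry, int datasync)
{
    struct inode *inode = dentry->d_inode;
    int err;

    /* Write out the file's dirty data buffers first. */
    err = sync_mapping_buffers(inode->i_mapping);
    /* The inode itself is clean: nothing more to write. */
    if (!(inode->i_state & I_DIRTY))
        return err;
    /* fdatasync() only needs the inode if a data-relevant change is pending. */
    if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
        return err;

    /* Sync the inode even if the data sync failed; either failure counts. */
    err |= minix_sync_inode(inode);
    return err ? -EIO : 0;
}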
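To see the trick in isolation, here is a small user-space sketch with the same shape as minix_sync_file(). The helpers flush_data() and flush_inode() are hypothetical stand-ins for sync_mapping_buffers() and minix_sync_inode(), and the global variables exist only so the failure paths can be exercised; this is an illustration, not the kernel code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for sync_mapping_buffers() and minix_sync_inode():
 * each returns 0 on success or a negative errno on failure. The results
 * are set by the caller so that both paths can be exercised. */
static int data_result;
static int inode_result;
static int inode_flush_ran;

static int flush_data(void)
{
    return data_result;
}

static int flush_inode(void)
{
    inode_flush_ran = 1;
    return inode_result;
}

/* Run both steps, OR the results together, and report a single -EIO
 * if either step failed. */
static int sync_both(void)
{
    int err = flush_data();

    /* No early return on failure: flush_inode() still runs, which is only
     * acceptable because a failed data flush cannot make it do harm. */
    err |= flush_inode();

    /* Both inputs are 0 or negative, so the OR is too; it is non-zero
     * exactly when at least one step failed. */
    assert(err <= 0);
    return err ? -EIO : 0;
}
```

With data_result = -ENOSPC and inode_result = 0, sync_both() still runs flush_inode() and returns -EIO.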

About File system logging

These are my notes from MIT OCW 6.824, Lecture 7:

The main point of a log is to make complex operations atomic.

I.e. operations that involve many individual writes. You want all of the writes or none of them, even if there is a crash in the middle.

A "transaction" is a multi-write operation that should be atomic. The logging system needs to know which set of writes forms a transaction.
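A minimal sketch of how a log can mark which writes belong to one transaction, assuming an in-memory array stands in for the on-disk log and each block holds a single int. The record layout (REC_BEGIN / REC_WRITE / REC_COMMIT) and every name here are hypothetical, not taken from the lecture or from FSD.

```c
#include <assert.h>
#include <stddef.h>

enum rec_type { REC_BEGIN, REC_WRITE, REC_COMMIT };

/* Hypothetical log record. */
struct log_rec {
    enum rec_type type;
    int txid;    /* transaction this record belongs to */
    int block;   /* REC_WRITE only: target block number */
    int value;   /* REC_WRITE only: new block contents */
};

#define LOG_CAP 64
static struct log_rec log_buf[LOG_CAP];   /* stands in for the on-disk log */
static size_t log_len;

static void log_append(struct log_rec r)
{
    assert(log_len < LOG_CAP);
    log_buf[log_len++] = r;
}

/* Log one multi-write transaction: BEGIN, its writes, then COMMIT.
 * The markers tell recovery which writes belong together. */
static void log_transaction(int txid, const int *blocks, const int *values,
                            size_t n)
{
    log_append((struct log_rec){ REC_BEGIN, txid, 0, 0 });
    for (size_t i = 0; i < n; i++)
        log_append((struct log_rec){ REC_WRITE, txid, blocks[i], values[i] });
    log_append((struct log_rec){ REC_COMMIT, txid, 0, 0 });
}

/* A transaction counts only if its COMMIT is among the first `len`
 * records; a crash before that means none of its writes apply. */
static int is_committed(int txid, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (log_buf[i].type == REC_COMMIT && log_buf[i].txid == txid)
            return 1;
    return 0;
}
```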

Re-do logging with checkpoints:

Most logs work like this, e.g. FSD.
Allows much faster recovery: it can start from the on-disk data and only re-do the log after the checkpoint.
Write-ahead rule:

delay flushing a dirty block from the in-memory data cache until the corresponding commit record is on disk

Checkpoint rules:

all data writes before the checkpoint must be stable on disk
the checkpoint may not advance beyond the first uncommitted Begin
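The two checkpoint rules can be sketched as a single scan over the log. This assumes hypothetical per-transaction flags for "committed" and "data on disk"; a real system would track these differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum rec_type { REC_BEGIN, REC_WRITE, REC_COMMIT };

/* Hypothetical log record: only the type and owning transaction matter. */
struct log_rec {
    enum rec_type type;
    int txid;
};

/* Hypothetical per-transaction status, indexed by txid. */
struct tx_status {
    bool committed;      /* its COMMIT record is in the log */
    bool data_on_disk;   /* all of its data writes reached their home blocks */
};

/* Return how far the checkpoint may advance: records before the returned
 * index may be discarded. Stop at the first BEGIN whose transaction is
 * uncommitted or still has data writes that are not stable on disk. */
static size_t max_checkpoint(const struct log_rec *recs, size_t len,
                             const struct tx_status *st)
{
    assert(len == 0 || (recs != NULL && st != NULL));
    for (size_t i = 0; i < len; i++) {
        if (recs[i].type != REC_BEGIN)
            continue;
        const struct tx_status *t = &st[recs[i].txid];
        if (!t->committed || !t->data_on_disk)
            return i;
    }
    return len;
}
```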

Recovery:

for each block mentioned in the log
find the last transaction that wrote that block
if committed: re-do
if not committed: un-do
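A sketch of the recovery loop above, assuming each WRITE record stores both the old and the new contents of its block (so un-do is possible) and that the disk is a small int array. The names and record layout are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 8

enum rec_type { REC_BEGIN, REC_WRITE, REC_COMMIT };

/* Hypothetical log record; WRITE records carry old and new contents. */
struct log_rec {
    enum rec_type type;
    int txid;
    int block;
    int old_val;
    int new_val;
};

static bool committed(const struct log_rec *recs, size_t len, int txid)
{
    for (size_t i = 0; i < len; i++)
        if (recs[i].type == REC_COMMIT && recs[i].txid == txid)
            return true;
    return false;
}

/* For each block mentioned in the log, find the last transaction that
 * wrote it; re-do that write if the transaction committed, otherwise
 * un-do it by restoring the old contents. */
static void recover(int disk[NBLOCKS], const struct log_rec *recs, size_t len)
{
    assert(len == 0 || recs != NULL);
    for (int b = 0; b < NBLOCKS; b++) {
        const struct log_rec *last = NULL;
        for (size_t i = 0; i < len; i++)
            if (recs[i].type == REC_WRITE && recs[i].block == b)
                last = &recs[i];
        if (last == NULL)
            continue;                    /* block not mentioned in the log */
        if (committed(recs, len, last->txid))
            disk[b] = last->new_val;     /* re-do */
        else
            disk[b] = last->old_val;     /* un-do */
    }
}
```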

Why is logging fast:

group commit -- batched log writes.
could delay flushing the log -- may lose committed transactions, but at least you keep a prefix.
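A toy model of group commit, assuming commit records are buffered in memory and written to the log in one sequential write. It also shows the trade-off noted above: a crash loses the buffered commits, but what is durable is always a prefix of the commit order. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8
#define MAX_DURABLE 256

/* Hypothetical group-commit state: commits collect in memory and are
 * written to the on-disk log together. */
struct group_log {
    int pending[MAX_PENDING];   /* txids committed in memory only */
    size_t npending;
    int durable[MAX_DURABLE];   /* txids whose commit records are on disk */
    size_t ndurable;
    int log_writes;             /* number of log writes issued */
};

/* One sequential log write covers every pending commit. */
static void flush_log(struct group_log *g)
{
    if (g->npending == 0)
        return;
    assert(g->ndurable + g->npending <= MAX_DURABLE);
    for (size_t i = 0; i < g->npending; i++)
        g->durable[g->ndurable++] = g->pending[i];
    g->npending = 0;
    g->log_writes++;
}

/* Commit a transaction; the log write is delayed until the batch is full
 * or flush_log() is called. */
static void commit_tx(struct group_log *g, int txid)
{
    g->pending[g->npending++] = txid;
    if (g->npending == MAX_PENDING)
        flush_log(g);
}

/* Simulated crash: buffered commits are lost, but the durable part is
 * always a prefix of the commit order. */
static void crash(struct group_log *g)
{
    g->npending = 0;
}
```

Committing five transactions and then calling flush_log() costs one log write instead of five.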

Single seek to implement a transaction.
maybe less than one, if there is no intervening disk activity, or with group commit

Write-behind of data allows batched/scheduled writes.
one data block may reflect many transactions, e.g. creating many files in a directory.
don't have to be so careful, since the log holds the real information.
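A sketch of write-behind absorbing many transactions into one data-block write, using the "create many files in a directory" example. The logging of each create is not shown, and the directory block layout and names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DIR_SLOTS 16
#define NAME_LEN 16

/* Hypothetical cached directory block: many small transactions update it
 * in memory, and write-behind sends it to its home location later. */
struct dir_block {
    char names[DIR_SLOTS][NAME_LEN];
    int nused;
    bool dirty;
};

static int home_writes;   /* writes of the block to its home location */

/* Each create is its own logged transaction (logging not shown); the
 * directory block itself is only modified in the cache. */
static bool create_entry(struct dir_block *d, const char *name)
{
    assert(name != NULL);
    if (d->nused == DIR_SLOTS || strlen(name) >= NAME_LEN)
        return false;
    strcpy(d->names[d->nused++], name);
    d->dirty = true;
    return true;
}

/* Write-behind: one write covers every create since the last flush. It can
 * wait because the log already holds the real information. */
static void write_behind(struct dir_block *d)
{
    if (!d->dirty)
        return;
    home_writes++;
    d->dirty = false;
}
```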

How can we avoid delete/create inconsistency?

These are file system notes from MIT 6.824 (2006), Lecture 6:

Consider this situation:

unlink("f1");
create("f2");
Create happens to re-use the i-node freed by the unlink.
Suppose only create's writes go to disk, but none of unlink's writes.

Crash.

After re-start, what does recovery see?

The file system looks correct! Nothing to fix!
But file f1 actually has file f2's contents!

Serious *undetected* inconsistency.

This is *not* a state the file system could have been in if the crash had occurred slightly earlier or later. And fsck did not notify the user that there was an unfixable problem!
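The scenario can be reproduced with a toy on-disk state. This assumes one directory with fixed slots, i-nodes that hold a short string as their contents, and a hypothetical looks_consistent() check that only verifies that directory entries point at in-use i-nodes. It is not how fsck works in detail; it only shows why a purely structural check passes.

```c
#include <assert.h>
#include <string.h>

#define NINODES 4
#define NSLOTS 4

/* Hypothetical, heavily simplified on-disk state. */
struct inode { int in_use; char data[16]; };
struct dir_entry { char name[8]; int inum; };   /* inum < 0: empty slot */
struct disk {
    struct inode inodes[NINODES];
    struct dir_entry dir[NSLOTS];
};

/* Before the crash: f1 lives in slot 0 and uses i-node 1. */
static void initial_state(struct disk *d)
{
    memset(d, 0, sizeof *d);
    for (int i = 0; i < NSLOTS; i++)
        d->dir[i].inum = -1;
    strcpy(d->dir[0].name, "f1");
    d->dir[0].inum = 1;
    d->inodes[1].in_use = 1;
    strcpy(d->inodes[1].data, "f1 contents");
}

/* The disk writes of unlink(): clear the entry, free its i-node. */
static void unlink_writes(struct disk *d, int slot)
{
    int inum = d->dir[slot].inum;
    assert(inum >= 0);   /* the slot must hold an entry */
    d->dir[slot].inum = -1;
    d->inodes[inum].in_use = 0;
}

/* The disk writes of create(), re-using i-node `inum`. */
static void create_writes(struct disk *d, int slot, int inum,
                          const char *name, const char *data)
{
    d->inodes[inum].in_use = 1;
    strcpy(d->inodes[inum].data, data);
    strcpy(d->dir[slot].name, name);
    d->dir[slot].inum = inum;
}

/* Naive structural check: every entry points at an in-use i-node. */
static int looks_consistent(const struct disk *d)
{
    for (int i = 0; i < NSLOTS; i++)
        if (d->dir[i].inum >= 0 && !d->inodes[d->dir[i].inum].in_use)
            return 0;
    return 1;
}
```

If only create_writes() reaches the disk, looks_consistent() still returns 1, yet the entry for f1 now leads to "f2 contents".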

How can we avoid this delete/create inconsistency?

Observation: We only care about what's visible in the file system tree.

Goal: on-disk directory entry must always point to correct on-disk i-node.

Unlink Rule: remove dirent *on disk* before freeing i-node.

Create Rule: initialize new i-node *on disk* before creating directory entry.

In general, directory entry writes should be commit points.
Crash just before leaves us with an unused allocated i-node.
Crash just after is fine.
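A sketch that applies the two rules to the unlink("f1") / create("f2") example and checks every possible crash point. It assumes each of the four disk writes is atomic and that an i-node's owner field stands in for its contents; all names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define NINODES 4
#define NSLOTS 4

/* Hypothetical on-disk state; `owner` stands in for the file's contents. */
struct inode { int in_use; char owner[8]; };
struct dir_entry { char name[8]; int inum; };   /* inum < 0: empty slot */
struct disk {
    struct inode inodes[NINODES];
    struct dir_entry dir[NSLOTS];
};

/* Before: f1 in slot 0, using i-node 1. */
static void initial_state(struct disk *d)
{
    memset(d, 0, sizeof *d);
    for (int i = 0; i < NSLOTS; i++)
        d->dir[i].inum = -1;
    strcpy(d->dir[0].name, "f1");
    d->dir[0].inum = 1;
    d->inodes[1].in_use = 1;
    strcpy(d->inodes[1].owner, "f1");
}

/* unlink("f1") then create("f2") as four disk writes in rule order.
 * Only the first `steps` writes reach the disk before the crash. */
static void run(struct disk *d, int steps)
{
    int done = 0;
    if (done++ < steps)             /* unlink rule: remove entry first */
        d->dir[0].inum = -1;
    if (done++ < steps)             /* ... then free the i-node */
        d->inodes[1].in_use = 0;
    if (done++ < steps) {           /* create rule: init i-node first */
        d->inodes[1].in_use = 1;
        strcpy(d->inodes[1].owner, "f2");
    }
    if (done++ < steps) {           /* ... then the entry (commit point) */
        strcpy(d->dir[1].name, "f2");
        d->dir[1].inum = 1;
    }
}

/* Goal: every on-disk entry points to the right on-disk i-node. */
static int dirents_ok(const struct disk *d)
{
    for (int i = 0; i < NSLOTS; i++) {
        const struct dir_entry *e = &d->dir[i];
        if (e->inum < 0)
            continue;
        assert(e->inum < NINODES);
        const struct inode *ino = &d->inodes[e->inum];
        if (!ino->in_use || strcmp(ino->owner, e->name) != 0)
            return 0;
    }
    return 1;
}

/* Allocated i-nodes that no entry refers to: leaked, but not wrong. */
static int orphans(const struct disk *d)
{
    int n = 0;
    for (int i = 0; i < NINODES; i++) {
        if (!d->inodes[i].in_use)
            continue;
        int referenced = 0;
        for (int j = 0; j < NSLOTS; j++)
            if (d->dir[j].inum == i)
                referenced = 1;
        n += !referenced;
    }
    return n;
}
```

Crashing after three writes is the "crash just before" case: the re-used i-node is allocated but unreferenced, which is a leak rather than a file with the wrong contents.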

Sunday, November 15, 2009

Caught a cold, got over the Great Firewall

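The temperature drops fast in winter. Even Guangdong, which usually isn't that cold in winter, has turned really, really cold this time. I've even caught a cold, though not swine flu.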

The other thing is that I got over the firewall with an SSH tunnel. Feels great.

Kudos to OpenSSH.