Episode 11 Filesystems

We investigate the standard filesystem hierarchy and some tools for managing filesystems.

13 October 2015

[Rhythmic, dark electronic intro music]

League

Welcome back to Command Line TV, this is episode 11. Today we’re going to talk about file systems.

But first, do we have any follow-up from last time?

Lopes

We did have a follow-up question regarding shell scripts, or just scripts in general.

After writing a script can they be saved anywhere or is there some general

practice as to where to store these files?

League

Uh, yes to both – they can be saved anywhere and there is also sort of a practice.

What determines it is whether the script you’re writing pertains to a

particular set of files in one directory. So if it’s something that is

always run from the same place, then I like to just keep it in that place.

Then you can run the script using ./ like we did in the example last time.

However, if it’s a script that you might want to run on files in a bunch of different directories,

the simplest thing to do is to make that script – is to put it into a

directory that’s on your path. So the path is something that’s used to

determine where to find executable files, right? So I’m going to do echo $PATH

echo $PATH

PATH has to be all caps like that.

And this is a bunch of directories separated by the colon : character,

where the shell will look for programs when you type them.

When I type ls, it is finding the ls executable in one of these directories,

and we can tell that it’s actually in /usr/bin which is the first one.

which ls

So what we could do is write our custom scripts and then move them into one

of these directories so that the shell will automatically find it when we want to run it.

But all of these directories are protected – they’re only accessible for writing by the super-user,

by the administrator. So you could if you have administrative privileges

copy your script into there. But what’s more typical practice is to create

a corresponding directory within your own account for your own personal scripts.

For example, you might want to name it after these,

which are all called bin – that’s just traditional –

it stands for binaries, which is like executable files.

But traditionally that’s the name of a directory that goes into your PATH.

So I’m going to make my own bin but put it underneath my home directory, like that.

mkdir ~/bin

And then I could put my scripts into there. I wrote these scripts last time, like hello.sh.

So if I move hello.sh into ~/bin, then I’ll be able to find it there,

mv hello.sh ~/bin

but that’s not on my PATH yet. My PATH doesn’t contain ~/bin.

echo $PATH

So what I would have to do is – in my .bashrc, set up that my PATH should contain –

nane ~/.bashrc

in some cases writing ~ doesn’t expand properly to your home directory,

so you can use $HOME instead – I just out of habit think that’s a safer way to do this.

PATH=$HOME/bin:$PATH

What this does is, I’m just taking the existing PATH and adding my ~/bin onto the front of it.

So when I save that and then if you log in again or just reload that .bashrc,

source ~/.bashrc

now you see /home/cltv/bin on the front of my PATH,

echo $PATH

and that means that the hello.sh can be found there no matter where I am.

which hello.sh

So I’m in Downloads now, but if I just type hello.sh on the command

hello.sh

line then it’s going to run that script. And remember, we left an error in that script.

nano ~/bin/hello.sh

So let’s maybe go fix that

So this should say echo – save, uh-huh, oops – permission denied, interesting.

What did I do there?

ls -l ~/bin

Okay, so this is from when I was playing with the permissions, yeah.

So I wanted to illustrate some of those octal codes for permissions and I

made it so it’s not writable by me, which is kind of annoying.

Let’s review that – I can do user u+w to say I should be able to write that. Okay, that’s better.

chmod u+w ~/bin/hello.sh

Now I can retry the nanoecho, save, exit. And now hello.sh will

give me those messages and then run the command I told it to run.

I can run that from anywhere because it’s on my PATH.

So if I go into pics then I can do hello.sh and see what’s in the pics folder.

cd pics
hello.sh
Lopes

So the term filesystem can refer to many different things.

I guess where we should start off is, what exactly does filesystem mean to us as a user?

League

Sure, just like a lot of other terms on Unix systems,

‘filesystem’ gets overloaded to mean multiple things.

The simplest thing that it’s used for is just the files that are accessible to you on your system.

So if you start at the top of your filesystem, which we call the root

directory (another term that’s overloaded a lot is ‘root’ –

ls /

so the root directory is just called slash ‘/’, it’s the very top of our filesystem.

And within there are a bunch of directories that go deeper into other subdirectories and so forth.

And we can use ‘filesystem’ to refer to this entire tree structure.

So you could say like “Are there any files on your filesystem that have the

extension .bak” or something like that. There are commands you could use to figure that out.

But that’s referring to filesystem as the set of files that are available.

There are other ways to use ‘filesystem’ as a term.

One way is that it refers specifically to the format of –

the way the files and directories are actually represented on the device.

So how are they represented as bits, how do we represent things like the

filename and permissions and other attributes, like modification time.

Different filesystem formats could have different capabilities so there

might be a format that supports very long filenames and another format that

has a limitation on the filenames.

There could be file formats that support a journal,

which means that when your system goes down or loses power or something,

you can recover what was going on from the journal and it helps prevent

corruption and things like that. So lots of different features that can be

built in to the way that files and directories are represented as bits on the device.

There are also virtual filesystems which we’ll take a look at.

One kind of virtual filesystem is that it can give you access to some kind

of database or data structure that’s part of the operating system.

But it shows it to you by exposing it as if it were directories and files.

So you can use the regular tools like cd and cat and head and grep

to search through and browse these data structures.

Another type of virtual filesystem is when you’ve got a disk image –

and this is something that’s a little familiar to other computer users too.

You can download an image file that represents a filesystem as it might

appear on a CD or something like that. Then you can actually take that and

mount it as if it were a CD. So if you’ve used things like virtualbox or

other virtual machines you probably interacted with those disk images.

Lopes

So in regards to filesystems like we just mentioned working from root in the terminal,

we’ve done things in this directory before – we’ve accessed the /etc

folder to change our .bashrc configurations. We’ve also accessed the /usr directory.

What are the other files and directories within the root?

League

Yeah these are some cryptic looking directories up here.

ls /

They’re all kind of somewhat loosely standardized by a thing called the

Filesystem Hierarchy Standard, or FHS. This is a document that different

Linux distributions use to manage where things should go on a system.

A lot of it is just common practice that gets standardized over decades, really.

But I can describe some of these high-level directories.

Yeah, we’ve looked at /etc this is a lot of configuration files.

So one thing that we got out of there was the passwd file,

so the users that are on your system are defined – oh that’s interesting –

cat etc/passwd

what did I do wrong there? I’m in my home directory so there’s no etc/ subdirectory from home.

I had to put the full, absolute path, like that. And then I could see the passwd file.

cat /etc/passwd

So there’s lots of other stuff like that in /etc that has to do with

different configurations of the software on the system.

The /home folder is where your user accounts go. So we have /home/cltv

or /home/USERNAME in general. The /boot is where we’ve got the kernel images,

ls /boot

so when your OS first boots it’s going to look into that directory and find these image files –

that’s what it loads into memory and starts executing in order to boot the system.

Let’s see – /root is just the home directory of the administrative user,

just like regular users have like /home/cltv, root has /root which

ls /root

we’re not allowed to access unless we are root. But that’s another way

that ‘root’ is confusing because I just told you that / is the root directory,

but then there’s also /root which is – I don’t know – ‘root root’ or something.

Okay, ‘/usr’ is a big one. A lot of the installed software goes into these different trees,

these different hierarchies. Within each tree you’ve got a standardized set

of directories like ‘bin’, ‘lib’, ‘share’, and a couple of others depending on the situation.

So I’ve got a few of those here at the very top level –

so ‘/’ (root) itself is one of those hierarchies – I’ve got bin and lib here.

And the programs that are in /bin are considered to be the really,

ls /bin

really essential programs that you need to boot up the system,

to get things running, to explore the system if maybe something goes wrong.

So they’re generally considered like the most important applications that are available.

Now, /usr/bin – I’m seeing something interesting here on this system,

ls /usr/bin

which is – yeah, so [laughs] – this is going to contradict what I just said.

ls -ld /bin

On this particular system, with ArchLinux, they’ve decided that /bin should just be an alias –

this is a symbolic link, which is a filesystem alias – to /usr/bin.

So in other words, those are the same directory on this system.

Some systems, /bin will be just the essential stuff and then /usr/bin

can get mounted later and it contains many many more programs. But here they’re the same.

So I’m not going to get into that distinction any more.

Within /usr you’ve got the bin and a separate etc, lib, share and so forth.

And the idea with this sub-hierarchy is that your executable stuff goes in

bin and that’s what people put in their $PATH. So traditionally those

were compiled machine code but they also can be scripts of various kinds.

lib is for libraries that get linked in to other programs.

So when you’re doing programming you might link to some library and include those –

you import that stuff, and then – I could show you some examples in there.

ls /usr/lib

So *.so – the .so is one of the extensions for libraries.

ls /usr/lib/*.so

So for example if you have a program that uses the Magick++ – this is the ImageMagick library.

And then maybe you use jpegutils or libz is for compression or stuff like that.

So your executables can refer to libraries in the lib folder.

But they’re not themselves directly executable.

And then share is for stuff that is not binary, so it’s like data files,

ls /usr/share

could be documentation files, licenses, things like that.

So there’s usually a usr/share/doc which contains –

ls /usr/share/doc

for just about everything that’s installed, it contains the documentation.

So I think we looked previously at ImageMagick documentation in there.

ls /usr/share/doc/ImageMagick-6

So that kind of gets to the hierarchy underneath usr but then that same

structure gets repeated in other places. Like, a lot of systems will have

usr/local and the idea of usr/local is stuff that you personally,

ls /usr/local

as the administrator of this machine would install.

So I might have a few things there – not very much usually.

ls /usr/local/bin

But these are things that I have installed without using the package manager.

So the package manager takes care of the /usr hierarchy, like /usr/bin,

but if I want to install something manually it can go in /usr/local/bin

and then it won’t conflict with the packages that the package manager installs.

So the couple of remaining directories at this top,

root level that I want to focus on are: var – this is where things like

log files and temporary files, caches go. We can look into the first level of that –

so it could be, there’s cache and log, and email you send and stuff like that.

So occasionally you might have to go in there, if you’re debugging a serious system problem,

to take a look at log files. But it’s going to depend on exactly what you’re trying to do,

so there’s not much use in exploring that in great detail right now.

Let’s look at /proc though – this is a pretty interesting one.

cd /proc
ls

/proc is one of those virtual filesystems, so it’s giving us access to

data structures inside the operating system. These numbers here are

directories which correspond to information about each process that’s running on the system.

So it’s got a process ID and within there you can see some information about that process.

But there are some other things here – one I like to look at is filesystems.

So if I cat filesystems virtual file, the operating system is going to

cat filesystems

report to me what filesystem formats it understands.

And a lot of these that start with nodev are the virtual ones –

including /proc itself is in that list – just up here.

But then the ones without nodev are the actual physical formats for disks that it can use.

So vfat is a somewhat older format used by Windows systems.

It’s still used today on lots of USB drives and so forth.

The native system for most Linux devices is called ext and the current

one is ext4 but there are older versions of that available.

So this tells us what formats the operating system can understand.

Another interesting file in here that I’ve had to set on multiple occasions –

let’s go down to /proc/sys/fs/inotify – okay. So inotify is a service

cd /proc/sys/fs/inotify

that allows a program to get notifications from the operating system when files change.

So for example if you have a backup program that may be making backups of

all of your files and shipping them off to a server somewhere, encrypting them and so forth.

That backup program will want to know when files change because then it

should make a new backup of that file. So there is a limitation to how many

files one of those programs can be watching – so that’s in max_user_watches.

cat max_user_watches

This appears to be a file that just contains this one number.

But actually that is a setting within the operating system and I am just

reading that setting by using cat, but if I want to change that setting I can also redirect to it.

So I can do something like echo – let’s say 1048700 so I’ll add a couple of extra watches.

echo 1048700 > max_user_watches

And then I would redirect into that file. So that’s how I could set a new setting for that variable.

Now the problem with that is that you need to be the administrator to write to that file.

So if we take a look at its permissions, it’s owned by root and writable

ls -l

by root but not writable by anyone else. So that explains that.

You would think that you could just do sudo to fix that, but you can’t.

sudo echo 1048700 > max_user_watches

The reason that doesn’t work is a little subtle, but when you do sudo

it’s running the echo as the administrator, but redirections don’t become part of the sudo

the redirection is still done by your local user – the current user. So that’s not enough.

What we actually would have to do is get a shell owned by the superuser,

so sudo supports -s and I’ve got to type a password here.

sudo -s

And now I can tell by this pound sign # that I the administrator,

so I’m going to do this same echo with the redirection.

echo 1048700 > max_user_watches

And now if I cat that file, it took on the new value.

cat max_user_watches

So I’m actually using filesystem tools like cd and cat and redirection

to tweak parameters within the operating system itself.

Now my backup program will be able to watch even more files at the same time.

Lopes

So now that we’ve looked at some of the filesystem hierarchy in terms of

the different directories it contains, what if we want to do something with a USB thumb drive.

How can we do things with this? How do we add it to the system, locate it, format it if necessary?

League

Good example, so I’m going to take this and plug it right into my laptop

and what will happen on many Linux systems that are preconfigured to be friendly,

so Ubuntu and those sort that have a desktop environment on them –

a lot of times that will just pop up a folder just like on windows,

so it has been mounted automatically and you can start using it right away.

But we want to learn about what happens underneath,

so I don’t have my system configured to do any of that.

What we’ve got to figure out is, what is the device name on the system that

corresponds to this drive that we just plugged in. So there’s another

directory from that top level hierarchy that I didn’t introduce yet, called /dev.

ls /dev

Inside /dev you’ve got a bunch of stuff that represents different sorts of devices on the system.

So they could be some kind of input/output device, storage device,

sound cards, all sorts of things in there. But the ones that we’re mostly

interested in are the ones that start with sd. On some systems it could

be hd but these are like hard drives of various kinds.

ls /dev/sd*

So if I look for sd* and this yellow – I’m sorry – is a little hard to see!

[Laughs] I could tell ls not to color that. So I’ve got –

ls --color=no /dev/sd*

sda is my main disk, and then sda1, sda2 – these are partitions of that disk.

I’m not going to play with sda because that’s my real actual disk

[laughs] and I don’t want to mess anything up. But now we’ve got sdb and

this only showed up when I plugged in that drive. There wasn’t an sdb here before.

And sdb1 is a partition on that. So generally speaking you can just use sdb directly,

but a lot of times what you’ll do is create like a single partition that takes up the entire drive,

and that would be called sdb1.

In a way it’s not really a partition because it’s still the whole drive,

right – partition you think of as breaking it into pieces.

But you’re using the partition table on that drive to still have one partition.

Okay, I just want to prove that when I unplug this device now,

and I do that ls again – the sdb has disappeared.

ls --color=no /dev/sd*

So the /dev filesystem is one of those virtual filesystems that

automatically updates based on which devices are accessible or not.

So that device is sdb. Another thing that I can do to kind of investigate

the size of the disk or the partition structure is a simple command called

fdisk and -l will give me detailed listing of partitions, just like ls -l.

Then I give /dev/sdb. But to open a disk in this way,

fdisk -l /dev/sdb

to be able to look at the partition information you need to be the administrator. So I’ll do that.

sudo fdisk -l /dev/sdb

And now we see that this drive is about 960 MB and here is the one

partition that starts and ends at particular places.

This partition is formatted as FAT16, so one of the old Windows or even DOS formats.

That makes it readable on lots of different machines, which is good,

but it doesn’t have a lot of the features that we might expect of a modern Linux filesystem.

So first I’m going to mount that filesystem so that I can see the files that are there.

To do that we do a command called mount and first you give the device name,

so that’s sdb1 and then you give a directory on the system where those

files will appear and basically it could be any directory –

sudo mount /dev/sdb1 /mnt

usually it’s empty – but there is a built-in directory I’ve got here called /mnt or ‘mount’,

which is specifically for these types of temporary mounts. So I’ll put it there.

Now if I look into /mnt I’ve got files here that correspond to –

ls /mnt

or that are the files on that drive. Then unmount is actually spelled umount

for some reason they thought saving that one character would be helpful [laughs].

sudo umount /mnt

When I unmount I can either give the device name or the directory name, either one works.

Now if I look back in that directory, it’s empty again – there’s nothing there.

ls /mnt

So it’s unmounted and now it’s safe to remove the device.

Let’s say I want to reformat that. I’m not going to repartition at this stage,

but I just want to reformat using a Linux filesystem.

There’s a command called mkfs – this is the format command.

And there are a lot of variants of it, so I’m hitting ‘tab’ here to see the different variations.

You can just do mkfs and it’ll use some default format, I think it’ll be ext2.

But if I want one of these other ones I can specify that, so mkfs.ext4 for example.

And then there are different options you can specify here about how to

layout the system or how much space you want to reserve for different things.

But generally you don’t have to say anything else, you just give the device name, so sdb1.

And you know, reformatting a disk is dangerous – you’re going to lose all

the files on it currently. So you want to make especially sure I didn’t

type a here because that’s my real disk! So sdb1mkfs.

mkfs.ext4 /dev/sdb1

Oh, permission denied of course, so sudo that. And it’s got a little

sudo mkfs.ext4 /dev/sdb1

protection here that it already seems to contain a filesystem so are you sure you want to reformat?

Yeah, let’s go ahead.

It goes through and creates the format on that disk,

and now if I try to mount it again, let’s say mount /dev/sdb1 into /mnt – that worked.

sudo mount /dev/sdb1 /mnt

And now the files that were there before are gone. There’s this directory called lost+found

ls /mnt

this is a feature of ext filesystems where basically if there are fragments of –

like if the filesystem gets corrupted, which is pretty rare these days –

but if it gets corrupted and there are some fragments of files that it

doesn’t know where they belong, it puts them in the lost+found folder and

maybe you can make sense of them at a later time. But it’s not usually useful,

it’s just always there.

So I’ve got this new filesystem mounted and I can go there, I can create directories.

mkdir hello

I can create a file – “this is on my new drive”, okay. Write that, exit.

nano readme.txt

Oh I’m still in ~ I didn’t mean that. Okay, /mnt – oh yeah,

ls -l /mnt

so [laughs] – I did all of that in ~, that wasn’t the right thing.

ls -l

So I’m going to just mv readme.txt let’s say into /mnt – permission denied, interesting.

mv readme.txt /mnt

So the way it mounted this, because I mounted it as root and I didn’t give

ls -l /mnt

permission to other users to access it, then my regular user can’t do that.

What I could do there is do a chmod, and say that everyone is allowed to read,

sudo chmod +rwx /mnt

write, and execute that folder, okay. And if I do that – did we do ls -a before?

ls -la /mnt

This is what shows the dot files

Lopes

hidden files?

League

hidden files, yeah. And one of those is . so this is a representation of

that mounted directory itself. And it did not add write permission for group and others,

so I’m going to be more explicit about that. Let’s see if that works.

sudo chmod go+w /mnt

Okay, so now everybody is allowed to write to that mounted disk.

Which means that I should be able to repeat this command to move readme over to /mnt.

mv readme.txt /mnt

And now that exists and is owned by this user – so one thing that having

ls -l /mnt

this ext filesystem on the USB drive means is that I can have file

ownerships that make sense to this system. I can do things like symbolic

links and other filesystem features that Linux filesystems support but FAT systems do not.

Okay, so I’ve got the readme file there – let’s unmount that. So now it’s empty.

sudo umount /mnt
ls /mnt

And then we want to remount – just lost it – that appears exactly as it did before,

sudo mount /dev/sdb1 /mnt
ls /mnt -la

with the same permissions here and with the owner kept track of there.

So one thing it does mean, having that drive formatted with ext4 is that

now this will be pretty useless on Windows and Mac systems.

The ext system really only works on Linux, so if you want a drive that

can be transferred between different operating systems,

you need to format it with a filesystem that works on all of them.

Thanks for joining us today. Next time I think we’ll cover more about

searching through filesystems using commands like find and locate and then xargs,

which also gets used quite a lot with find. So see you then.

[Dark electronic beat]

[Captions by Christopher League]

[End]