Episode 11 Filesystems

We investigate the standard filesystem hierarchy and some tools for managing filesystems.

13 October 2015

	• [Rhythmic, dark electronic intro music]
League	• Welcome back to Command Line TV, this is episode 11. Today we’re going to talk about filesystems. • But first, do we have any follow-up from last time?
Lopes	• We did have a follow-up question regarding shell scripts, or just scripts in general. • After writing a script can they be saved anywhere or is there some general • practice as to where to store these files?
League	• Uh, yes to both – they can be saved anywhere and there is also sort of a practice. • What determines it is whether the script you’re writing pertains to a • particular set of files in one directory. So if it’s something that is • always run from the same place, then I like to just keep it in that place. • Then you can run the script using `./` like we did in the example last time. • However, if it’s a script that you might want to run on files in a bunch of different directories, • the simplest thing to do is to make that script – is to put it into a • directory that’s on your path. So the path is something that’s used to • determine where to find executable files, right? So I’m going to do `echo $PATH` – echo $PATH • `PATH` has to be all caps like that. • And this is a bunch of directories separated by the colon `:` character, • where the shell will look for programs when you type them. • When I type `ls`, it is finding the `ls` executable in one of these directories, • and we can tell that it’s actually in `/usr/bin` which is the first one. which ls • So what we could do is write our custom scripts and then move them into one • of these directories so that the shell will automatically find it when we want to run it. • But all of these directories are protected – they’re only accessible for writing by the super-user, • by the administrator. So you could if you have administrative privileges • copy your script into there. But what’s more typical practice is to create • a corresponding directory within your own account for your own personal scripts. • For example, you might want to name it after these, • which are all called `bin` – that’s just traditional – • it stands for binaries, which is like executable files. • But traditionally that’s the name of a directory that goes into your `PATH`. • So I’m going to make my own `bin` but put it underneath my home directory, like that. mkdir ~/bin • And then I could put my scripts into there. 
I wrote these scripts last time, like `hello.sh`. • So if I move `hello.sh` into `~/bin`, then I’ll be able to find it there, mv hello.sh ~/bin • but that’s not on my `PATH` yet. My `PATH` doesn’t contain `~/bin`. echo $PATH • So what I would have to do is – in my `.bashrc`, set up that my `PATH` should contain – nano ~/.bashrc • in some cases writing `~` doesn’t expand properly to your home directory, • so you can use `$HOME` instead – I just out of habit think that’s a safer way to do this. PATH=$HOME/bin:$PATH • What this does is, I’m just taking the existing `PATH` and adding my `~/bin` onto the front of it. • So when I save that and then if you log in again or just reload that `.bashrc`, source ~/.bashrc • now you see `/home/cltv/bin` on the front of my `PATH`, echo $PATH • and that means that the `hello.sh` can be found there no matter where I am. which hello.sh • So I’m in `Downloads` now, but if I just type `hello.sh` on the command hello.sh • line then it’s going to run that script. And remember, we left an error in that script. nano ~/bin/hello.sh • So let’s maybe go fix that. • So this should say `echo` – save, uh-huh, oops – permission denied, interesting. • What did I do there? ls -l ~/bin • Okay, so this is from when I was playing with the permissions, yeah. • So I wanted to illustrate some of those octal codes for permissions and I • made it so it’s not writable by me, which is kind of annoying. • Let’s review that – I can do user `u+w` to say I should be able to write that. Okay, that’s better. chmod u+w ~/bin/hello.sh • Now I can retry the `nano` – `echo`, save, exit. And now `hello.sh` will • give me those messages and then run the command I told it to run. • I can run that from anywhere because it’s on my `PATH`. • So if I go into `pics` then I can do `hello.sh` and see what’s in the `pics` folder. cd pics hello.sh
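The personal `bin` setup described above can be tried safely with a small sketch. It is only an illustration: it uses a throwaway directory (the hypothetical `demo_home`) in place of the real home directory, and the script body is a stand-in rather than the actual `hello.sh` from last episode.

```shell
# Sketch: put a personal script directory on PATH.
# demo_home is a hypothetical stand-in for $HOME so nothing real is touched.
demo_home=$(mktemp -d)
mkdir -p "$demo_home/bin"

# A tiny stand-in script (not the episode's actual hello.sh).
cat > "$demo_home/bin/hello.sh" <<'EOF'
#!/bin/sh
echo "Hello from $(basename "$0")"
EOF
chmod u+x "$demo_home/bin/hello.sh"

# Prepend the directory, as PATH=$HOME/bin:$PATH does in ~/.bashrc.
PATH="$demo_home/bin:$PATH"
export PATH

found=$(command -v hello.sh)   # where the shell now finds the script
output=$(hello.sh)             # run it by name, from any directory
echo "$found"
echo "$output"
```

Because the new directory goes on the front of `PATH`, a script there would take priority over a system program with the same name, which is worth keeping in mind when naming scripts.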
Lopes	• So the term filesystem can refer to many different things. • I guess where we should start off is, what exactly does filesystem mean to us as a user?
League	• Sure, just like a lot of other terms on Unix systems, • ‘filesystem’ gets overloaded to mean multiple things. • The simplest thing that it’s used for is just the files that are accessible to you on your system. • So if you start at the top of your filesystem, which we call the root • directory (another term that’s overloaded a lot is ‘root’) – ls / • so the root directory is just called slash ‘/’, it’s the very top of our filesystem. • And within there are a bunch of directories that go deeper into other subdirectories and so forth. • And we can use ‘filesystem’ to refer to this entire tree structure. • So you could say like “Are there any files on your filesystem that have the • extension `.bak`” or something like that. There are commands you could use to figure that out. • But that’s referring to filesystem as the set of files that are available. • There are other ways to use ‘filesystem’ as a term. • One way is that it refers specifically to the format of – • the way the files and directories are actually represented on the device. • So how are they represented as bits, how do we represent things like the • filename and permissions and other attributes, like modification time. • Different filesystem formats could have different capabilities so there • might be a format that supports very long filenames and another format that • has a limitation on the filenames. • There could be filesystem formats that support a journal, • which means that when your system goes down or loses power or something, • you can recover what was going on from the journal and it helps prevent • corruption and things like that. So lots of different features that can be • built in to the way that files and directories are represented as bits on the device. • There are also virtual filesystems which we’ll take a look at. • One kind of virtual filesystem gives you access to some kind • of database or data structure that’s part of the operating system. 
• But it shows it to you by exposing it as if it were directories and files. • So you can use the regular tools like `cd` and `cat` and `head` and `grep` • to search through and browse these data structures. • Another type of virtual filesystem is when you’ve got a disk image – • and this is something that’s a little familiar to other computer users too. • You can download an image file that represents a filesystem as it might • appear on a CD or something like that. Then you can actually take that and • mount it as if it were a CD. So if you’ve used things like virtualbox or • other virtual machines you probably interacted with those disk images.
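League mentions that there are commands to answer a question like "are there any `.bak` files on the filesystem". One such command is `find`. The sketch below builds a small throwaway tree so it is safe to run; the file names in it are made up for illustration.

```shell
# Sketch: search a directory tree for files ending in .bak using find.
tree=$(mktemp -d)                 # throwaway tree with made-up file names
mkdir -p "$tree/docs/old"
touch "$tree/notes.txt" "$tree/docs/report.bak" "$tree/docs/old/draft.bak"

# -type f limits matches to regular files; -name matches on the file name.
bak_files=$(find "$tree" -type f -name '*.bak' | sort)
bak_count=$(printf '%s\n' "$bak_files" | wc -l | tr -d ' ')
echo "$bak_files"
echo "found $bak_count .bak files"
```

To search the whole filesystem instead, the starting point would be `/`, typically with `2>/dev/null` appended to hide errors from directories a regular user cannot read.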
Lopes	• So in regards to filesystems like we just mentioned working from root in the terminal, • we’ve done things in this directory before – we’ve accessed the `/etc` • folder to change our `.bashrc` configurations. We’ve also accessed the `/usr` directory. • What are the other files and directories within the root?
League	• Yeah these are some cryptic looking directories up here. ls / • They’re all kind of somewhat loosely standardized by a thing called the • Filesystem Hierarchy Standard, or FHS. This is a document that different • Linux distributions use to manage where things should go on a system. • A lot of it is just common practice that gets standardized over decades, really. • But I can describe some of these high-level directories. • Yeah, we’ve looked at `/etc` this is a lot of configuration files. • So one thing that we got out of there was the `passwd` file, • so the users that are on your system are defined – oh that’s interesting – cat etc/passwd • what did I do wrong there? I’m in my home directory so there’s no `etc/` subdirectory from home. • I had to put the full, absolute path, like that. And then I could see the `passwd` file. cat /etc/passwd • So there’s lots of other stuff like that in `/etc` that has to do with • different configurations of the software on the system. • The `/home` folder is where your user accounts go. So we have `/home/cltv` • or `/home/USERNAME` in general. The `/boot` is where we’ve got the kernel images, ls /boot • so when your OS first boots it’s going to look into that directory and find these image files – • that’s what it loads into memory and starts executing in order to boot the system. • Let’s see – `/root` is just the home directory of the administrative user, • just like regular users have like `/home/cltv`, root has `/root` which ls /root • we’re not allowed to access unless we are root. But that’s another way • that ‘root’ is confusing because I just told you that `/` is the root directory, • but then there’s also `/root` which is – I don’t know – ‘root root’ or something. • Okay, ‘/usr’ is a big one. A lot of the installed software goes into these different trees, • these different hierarchies. 
Within each tree you’ve got a standardized set • of directories like `bin`, `lib`, `share`, and a couple of others depending on the situation. • So I’ve got a few of those here at the very top level – • so `/` (root) itself is one of those hierarchies – I’ve got `bin` and `lib` here. • And the programs that are in `/bin` are considered to be the really, ls /bin • really essential programs that you need to boot up the system, • to get things running, to explore the system if maybe something goes wrong. • So they’re generally considered like the most important applications that are available. • Now, `/usr/bin` – I’m seeing something interesting here on this system, ls /usr/bin • which is – yeah, so [laughs] – this is going to contradict what I just said. ls -ld /bin • On this particular system, with ArchLinux, they’ve decided that `/bin` should just be an alias – • this is a symbolic link, which is a filesystem alias – to `/usr/bin`. • So in other words, those are the same directory on this system. • On some systems, `/bin` will be just the essential stuff and then `/usr/bin` • can get mounted later and it contains many many more programs. But here they’re the same. • So I’m not going to get into that distinction any more. • Within `/usr` you’ve got the `bin` and a separate `etc`, `lib`, `share` and so forth. • And the idea with this sub-hierarchy is that your executable stuff goes in • `bin` and that’s what people put in their `$PATH`. So traditionally those • were compiled machine code but they also can be scripts of various kinds. • `lib` is for libraries that get linked in to other programs. • So when you’re doing programming you might link to some library and include those – • you import that stuff, and then – I could show you some examples in there. ls /usr/lib • So `.so` – the `.so` is one of the extensions for libraries. ls /usr/lib/*.so • So for example if you have a program that uses the `Magick++` – this is the ImageMagick library. 
• And then maybe you use `jpegutils` or `libz` is for compression or stuff like that. • So your executables can refer to libraries in the `lib` folder. • But they’re not themselves directly executable. • And then `share` is for stuff that is not binary, so it’s like data files, ls /usr/share • could be documentation files, licenses, things like that. • So there’s usually a `/usr/share/doc` which contains – ls /usr/share/doc • for just about everything that’s installed, it contains the documentation. • So I think we looked previously at ImageMagick documentation in there. ls /usr/share/doc/ImageMagick-6 • So that kind of gets to the hierarchy underneath `/usr` but then that same • structure gets repeated in other places. Like, a lot of systems will have • `/usr/local` and the idea of `/usr/local` is stuff that you personally, ls /usr/local • as the administrator of this machine would install. • So I might have a few things there – not very much usually. ls /usr/local/bin • But these are things that I have installed without using the package manager. • So the package manager takes care of the `/usr` hierarchy, like `/usr/bin`, • but if I want to install something manually it can go in `/usr/local/bin` • and then it won’t conflict with the packages that the package manager installs. • So the couple of remaining directories at this top, • root level that I want to focus on are: `/var` – this is where things like • log files and temporary files, caches go. We can look into the first level of that – • so it could be, there’s `cache` and `log`, and email you send and stuff like that. • So occasionally you might have to go in there, if you’re debugging a serious system problem, • to take a look at log files. But it’s going to depend on exactly what you’re trying to do, • so there’s not much use in exploring that in great detail right now. • Let’s look at `/proc` though – this is a pretty interesting one. 
cd /proc ls • `/proc` is one of those virtual filesystems, so it’s giving us access to • data structures inside the operating system. These numbers here are • directories which correspond to information about each process that’s running on the system. • So it’s got a process ID and within there you can see some information about that process. • But there are some other things here – one I like to look at is `filesystems`. • So if I `cat filesystems` virtual file, the operating system is going to cat filesystems • report to me what filesystem formats it understands. • And a lot of these that start with `nodev` are the virtual ones – • including `/proc` itself is in that list – just up here. • But then the ones without `nodev` are the actual physical formats for disks that it can use. • So `vfat` is a somewhat older format used by Windows systems. • It’s still used today on lots of USB drives and so forth. • The native system for most Linux devices is called `ext` and the current • one is `ext4` but there are older versions of that available. • So this tells us what formats the operating system can understand. • Another interesting file in here that I’ve had to set on multiple occasions – • let’s go down to `/proc/sys/fs/inotify` – okay. So `inotify` is a service cd /proc/sys/fs/inotify • that allows a program to get notifications from the operating system when files change. • So for example if you have a backup program that may be making backups of • all of your files and shipping them off to a server somewhere, encrypting them and so forth. • That backup program will want to know when files change because then it • should make a new backup of that file. So there is a limitation to how many • files one of those programs can be watching – so that’s in `max_user_watches`. cat max_user_watches • This appears to be a file that just contains this one number. 
• But actually that is a setting within the operating system and I am just • reading that setting by using `cat`, but if I want to change that setting I can also redirect to it. • So I can do something like `echo` – let’s say `1048700` so I’ll add a couple of extra watches. echo 1048700 > max_user_watches • And then I would redirect into that file. So that’s how I could set a new setting for that variable. • Now the problem with that is that you need to be the administrator to write to that file. • So if we take a look at its permissions, it’s owned by `root` and writable ls -l • by `root` but not writable by anyone else. So that explains that. • You would think that you could just do `sudo` to fix that, but you can’t. sudo echo 1048700 > max_user_watches • The reason that doesn’t work is a little subtle, but when you do `sudo` • it’s running the `echo` as the administrator, but redirections don’t become part of the `sudo` – • the redirection is still done by your local user – the current user. So that’s not enough. • What we actually would have to do is get a shell owned by the superuser, • so `sudo` supports `-s` and I’ve got to type a password here. sudo -s • And now I can tell by this pound sign `#` that I am the administrator, • so I’m going to do this same `echo` with the redirection. echo 1048700 > max_user_watches • And now if I `cat` that file, it took on the new value. cat max_user_watches • So I’m actually using filesystem tools like `cd` and `cat` and redirection • to tweak parameters within the operating system itself. • Now my backup program will be able to watch even more files at the same time.
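Besides opening a root shell with `sudo -s`, two other common ways around the redirection problem are to pipe into a privileged `tee`, or to run a whole `sh -c` command as root so that the privileged shell performs the redirection itself. The sketch below is an illustration under assumptions: `target` is a throwaway file standing in for `max_user_watches`, and the hypothetical `run_as` variable is left empty so the example runs without a password. Setting `run_as=sudo` would apply the same commands to a root-owned file.

```shell
# Sketch: two ways to write a value to a file that only root may write.
# target is a throwaway stand-in; on a real system it would be
# /proc/sys/fs/inotify/max_user_watches with run_as="sudo".
target=$(mktemp)
run_as=""            # hypothetical switch: set to "sudo" for a root-owned file

# 1. A privileged tee opens the file; the data arrives through a pipe,
#    so no redirection into the protected file is done by the current user.
echo 1048700 | $run_as tee "$target" > /dev/null
via_tee=$(cat "$target")

# 2. Run a whole shell as the privileged user; that shell does the redirection.
#    524288 is an arbitrary second value so the two writes are distinguishable.
$run_as sh -c 'echo 524288 > "$1"' sh "$target"
via_sh=$(cat "$target")

echo "$via_tee $via_sh"
```

Both forms work for the same reason `sudo -s` does: the process that opens the file for writing is the one running with administrator privileges.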
Lopes	• So now that we’ve looked at some of the filesystem hierarchy in terms of • the different directories it contains, what if we want to do something with a USB thumb drive. • How can we do things with this? How do we add it to the system, locate it, format it if necessary?
League	• Good example, so I’m going to take this and plug it right into my laptop • and what will happen on many Linux systems that are preconfigured to be friendly, • so Ubuntu and those sort that have a desktop environment on them – • a lot of times that will just pop up a folder just like on Windows, • so it has been mounted automatically and you can start using it right away. • But we want to learn about what happens underneath, • so I don’t have my system configured to do any of that. • What we’ve got to figure out is, what is the device name on the system that • corresponds to this drive that we just plugged in. So there’s another • directory from that top level hierarchy that I didn’t introduce yet, called `/dev`. ls /dev • Inside `/dev` you’ve got a bunch of stuff that represents different sorts of devices on the system. • So they could be some kind of input/output device, storage device, • sound cards, all sorts of things in there. But the ones that we’re mostly • interested in are the ones that start with `sd`. On some systems it could • be `hd` but these are like hard drives of various kinds. ls /dev/sd* • So if I look for `sd` and this yellow – I’m sorry – is a little hard to see! • [Laughs] I could tell `ls` not to color that. So I’ve got – ls --color=no /dev/sd* • `sda` is my main disk, and then `sda1`, `sda2` – these are partitions of that disk. • I’m not going to play with `sda` because that’s my real actual disk • [laughs] and I don’t want to mess anything up. But now we’ve got `sdb` and • this only showed up when I plugged in that drive. There wasn’t an `sdb` here before. • And `sdb1` is a partition on that. So generally speaking you can just use `sdb` directly, • but a lot of times what you’ll do is create like a single partition that takes up the entire drive, • and that would be called `sdb1`. • In a way it’s not really a partition because it’s still the whole drive, • right – partition you think of as breaking it into pieces. 
• But you’re using the partition table on that drive to still have one partition. • Okay, I just want to prove that when I unplug this device now, • and I do that `ls` again – the `sdb` has disappeared. ls --color=no /dev/sd* • So the `/dev` filesystem is one of those virtual filesystems that • automatically updates based on which devices are accessible or not. • So that device is `sdb`. Another thing that I can do to kind of investigate • the size of the disk or the partition structure is a simple command called • `fdisk` and `-l` will give me detailed listing of partitions, just like `ls -l`. • Then I give `/dev/sdb`. But to open a disk in this way, fdisk -l /dev/sdb • to be able to look at the partition information you need to be the administrator. So I’ll do that. sudo fdisk -l /dev/sdb • And now we see that this drive is about 960 MB and here is the one • partition that starts and ends at particular places. • This partition is formatted as FAT16, so one of the old Windows or even DOS formats. • That makes it readable on lots of different machines, which is good, • but it doesn’t have a lot of the features that we might expect of a modern Linux filesystem. • So first I’m going to mount that filesystem so that I can see the files that are there. • To do that we do a command called `mount` and first you give the device name, • so that’s `sdb1` and then you give a directory on the system where those • files will appear and basically it could be any directory – sudo mount /dev/sdb1 /mnt • usually it’s empty – but there is a built-in directory I’ve got here called `/mnt` or ‘mount’, • which is specifically for these types of temporary mounts. So I’ll put it there. • Now if I look into `/mnt` I’ve got files here that correspond to – ls /mnt • or that are the files on that drive. Then unmount is actually spelled `umount` – • for some reason they thought saving that one character would be helpful [laughs]. 
sudo umount /mnt • When I unmount I can either give the device name or the directory name, either one works. • Now if I look back in that directory, it’s empty again – there’s nothing there. ls /mnt • So it’s unmounted and now it’s safe to remove the device. • Let’s say I want to reformat that. I’m not going to repartition at this stage, • but I just want to reformat using a Linux filesystem. • There’s a command called `mkfs` – this is the format command. • And there are a lot of variants of it, so I’m hitting ‘tab’ here to see the different variations. • You can just do `mkfs` and it’ll use some default format, I think it’ll be `ext2`. • But if I want one of these other ones I can specify that, so `mkfs.ext4` for example. • And then there are different options you can specify here about how to • layout the system or how much space you want to reserve for different things. • But generally you don’t have to say anything else, you just give the device name, so `sdb1`. • And you know, reformatting a disk is dangerous – you’re going to lose all • the files on it currently. So you want to make especially sure I didn’t • type `a` here because that’s my real disk! So `sdb1` – `mkfs`. mkfs.ext4 /dev/sdb1 • Oh, permission denied of course, so `sudo` that. And it’s got a little sudo mkfs.ext4 /dev/sdb1 • protection here that it already seems to contain a filesystem so are you sure you want to reformat? • Yeah, let’s go ahead. • It goes through and creates the format on that disk, • and now if I try to mount it again, let’s say `mount /dev/sdb1` into `/mnt` – that worked. sudo mount /dev/sdb1 /mnt • And now the files that were there before are gone. 
There’s this directory called `lost+found` – ls /mnt • this is a feature of `ext` filesystems where basically if there are fragments of – • like if the filesystem gets corrupted, which is pretty rare these days – • but if it gets corrupted and there are some fragments of files that it • doesn’t know where they belong, it puts them in the `lost+found` folder and • maybe you can make sense of them at a later time. But it’s not usually useful, • it’s just always there. • So I’ve got this new filesystem mounted and I can go there, I can create directories. mkdir hello • I can create a file – “this is on my new drive”, okay. Write that, exit. nano readme.txt • Oh I’m still in `~` I didn’t mean that. Okay, `/mnt` – oh yeah, ls -l /mnt • so [laughs] – I did all of that in `~`, that wasn’t the right thing. ls -l • So I’m going to just `mv readme.txt` let’s say into `/mnt` – permission denied, interesting. mv readme.txt /mnt • So the way it mounted this, because I mounted it as root and I didn’t give ls -l /mnt • permission to other users to access it, then my regular user can’t do that. • What I could do there is do a `chmod`, and say that everyone is allowed to read, sudo chmod +rwx /mnt • write, and execute that folder, okay. And if I do that – did we do `ls -a` before? ls -la /mnt • This is what shows the dot files
Lopes	• hidden files?
League	• hidden files, yeah. And one of those is `.` so this is a representation of • that mounted directory itself. And it did not add write permission for group and others, • so I’m going to be more explicit about that. Let’s see if that works. sudo chmod go+w /mnt • Okay, so now everybody is allowed to write to that mounted disk. • Which means that I should be able to repeat this command to move `readme` over to `/mnt`. mv readme.txt /mnt • And now that exists and is owned by this user – so one thing that having ls -l /mnt • this `ext` filesystem on the USB drive means is that I can have file • ownerships that make sense to this system. I can do things like symbolic • links and other filesystem features that Linux filesystems support but FAT systems do not. • Okay, so I’ve got the `readme` file there – let’s unmount that. So now it’s empty. sudo umount /mnt ls /mnt • And then we want to remount – just lost it – that appears exactly as it did before, sudo mount /dev/sdb1 /mnt ls /mnt -la • with the same permissions here and with the owner kept track of there. • So one thing it does mean, having that drive formatted with `ext4` is that • now this will be pretty useless on Windows and Mac systems. • The `ext` system really only works on Linux, so if you want a drive that • can be transferred between different operating systems, • you need to format it with a filesystem that works on all of them. • Thanks for joining us today. Next time I think we’ll cover more about • searching through filesystems using commands like `find` and `locate` and then `xargs`, • which also gets used quite a lot with `find`. So see you then.
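A note on the `chmod` step in the walkthrough above: when no `u`, `g`, `o` or `a` is given, as in `chmod +rwx`, the bits set in the current umask are left alone, which is why write permission was not added for group and others until `go+w` was used explicitly. The sketch below reproduces that on a throwaway directory rather than a mounted drive; the umask value of `022` is an assumption, though it is a common default.

```shell
# Sketch: why "chmod +rwx" did not add write permission for group and others.
umask 022                      # assumed umask: masks write for group and others
d=$(mktemp -d)                 # throwaway directory standing in for /mnt
chmod 700 "$d"                 # start with access for the owner only

chmod +rwx "$d"                # no "who" given, so bits in the umask are skipped
after_plus=$(ls -ld "$d" | cut -c1-10)

chmod go+w "$d"                # an explicit "who" is not filtered by the umask
after_gow=$(ls -ld "$d" | cut -c1-10)

echo "$after_plus"
echo "$after_gow"
```

Writing `chmod a+rwx` would have the same effect as the two steps combined, since `a` names user, group and others explicitly.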
	• [Dark electronic beat] • [Captions by Christopher League] • [End]