Episode 12: Find and locate

We use find and locate to dig up lists of files on our system that match certain criteria. We also look at xargs for executing commands on a selected set of files.

20 October 2015

	• [Rhythmic, dark electronic intro music]
League	• Welcome back to Command Line TV. Today we’re going to talk about finding • files using a command called `find` and also an alternative called `locate`. • And do we have any follow-up from last time?
Lopes	• Last time at the end of the episode, we learned about formatting and modifying SD cards or, • sorry, external drives. How can we load a drive so it’s read-only?
League	• Sure, so if you want to make sure that programs can’t access – • or can’t write to the drive, there is an option for that when you mount it. • If I type `mount` and remember we put the device name – I guess `sdb1` – • and then the path where it would be mounted, the directory where it would mount. • You can specify some other options here using a `-o` and one of the most • common options is just saying read-only `ro`. If I do that, `mount -o ro /dev/sdb1 /mnt`, • then the disk will be mounted read-only and it means you can do things like `cat` and `ls` on it, • but if you tried to actually edit a file or copy a file to it or something like that, • it would stop you right away and say “read-only filesystem.” So that prevents it from being written.
Lopes	• Since we’ll be using the `find` command today, I’m assuming that it’s as • simple as the command name sounds. • We just use it to find files and other things on our filesystem, correct?
League	• Yeah, you use it to find things – what’s interesting about it is it’s got • this syntax that’s available as options for specifying a query – • it’s really like querying a database, but about files. • So you can find them by name but you can also find them by modification • times or permissions or combine all these things together into a big query. • I’m going to start with the simplest case, which is finding them by name. • Let’s say I want to find – the first thing that I give is the directory to start looking in, • and then it will look in any subdirectory of that too. • If I want to look across this entire system I could put `find /` – • that would be the top-level directory. Or the current directory `.` or my • home directory `~` (which happens to be the current directory). • But you could use any of those as your starting point for the `find`. • Then we put the query as options. So `-name` is a way to search by filename • and this takes wildcards so I could say something like `*.png` – • but there is a little bit of a catch there. When you use a wildcard like this, • the shell expands it before it actually gives it to the `find` command. • So `find` is going to get the names of all the PNGs in the current • directory and that’s not what we want. We want that star `*` actually to be passed to `find` as is. • So I don’t want my shell to expand the wildcard, I want `find` to match the • wildcard with the files that it comes across. So I have to quote it, • just like when you’ve got spaces in a filename, or any special characters: `find ~ -name '*.png'`. • You put quotes around it and then it won’t expand but `find` can still interpret it. • So there’s a simple example of a `find` command. And if I run that it’s • going to just dump out a list of all of these PNG files that exist in my home directory. • So I’m going to pipe that into `less` so we can see it a page at a time.
`find ~ -name '*.png' | less` • You see some of them are in these cache folders, • so this is a `.cache` which is one of those hidden files, right? Starts with a dot. • And so it’s got little thumbnails in there of images that I didn’t even know about, • but I can find them with `find`. • The program called `inkscape` – this is a drawing program. • It has put some of its icons into that cache, and so on. • There are lots of PNGs here that you might not have even thought of before.
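A minimal sketch of the quoting issue, assuming a POSIX shell and a throwaway directory from `mktemp -d` (the file names are invented for illustration). The quoted pattern reaches `find` intact; the unquoted one is expanded by the shell first, so `find` only sees the single name that happened to match in the current directory.

```shell
#!/bin/sh
# Throwaway sandbox with PNGs at two depths (hypothetical file names).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir -p .cache/thumbnails
touch top.png .cache/thumbnails/thumb.png

# Quoted: find gets the literal pattern *.png and tests it against every
# name it visits, including names inside hidden directories.
quoted=$(find . -name '*.png' | LC_ALL=C sort)
echo "$quoted"

# Unquoted: the shell expands *.png first (to "top.png" here), so this is
# really "find . -name top.png" and the thumbnail is missed.
unquoted=$(find . -name *.png | LC_ALL=C sort)
echo "$unquoted"
```

With two or more PNGs in the current directory, the unquoted form typically fails with an error instead, because `find` then receives several bare names after `-name`.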
Lopes	• So when we ran `find` just now it showed all the images that we had using • the wildcard, but then it also descended into subdirectories. • What if, for example, you wanted to locate some of the playing cards that we worked with, • but the originals, not the ones that we changed the geometry on?
League	• Yep, so down here in `Downloads/Playing\ Cards` and then the PNG folder (`cd Downloads/Playing\ Cards`, `cd PNG-cards-1.3`, `ls`) • I had these, and last time we created the subdirectories. • So when I do a `find` here, so let’s say I want to find – • by the way if the current directory is where you want to start from then • you don’t actually need to specify the `.` there. But if I want to find all • of the filenames that have `hearts` in them, I could do that: `find . -name '*hearts*'`. • But that is getting me the ones in subdirectories as well. So there’s `cards33`, `cards25`. • If I want to limit it to either the current directory or maybe I just don’t • want to search too deep – there’s an option called `-maxdepth`. • I put `-maxdepth` there, and if I were to say `-maxdepth 1` (`find -maxdepth 1 -name '*hearts*'`) then we’re only • seeing the files that have `hearts` in the name that are one level deep, • so basically in the current directory. • And if I went to `2` (`find -maxdepth 2 -name '*hearts*'`) that would be enough to get me these other directories as well. • Now there are some other queries that I can add to this. • When you have multiple queries on a `find` command they are joined together • using a Boolean AND operator. So in other words all of them have to be true • in order for the file to match. One that I like to use sometimes is – • if you want to find files that have been modified since a certain time, • that’s an option called `-newer`. • So I want to show files that are newer than some other file. • Let’s pick one of those `hearts` files, I guess `7_of_hearts.png`. • First let’s get back all of the hearts in the current directory, • and then I’m going to just do the ones that are newer than the • `7_of_hearts`, and you see that it’s a subset of those.
`find -maxdepth 1 -name '*hearts*' -newer 7_of_hearts.png` • If I were to look at these by modification time – so like `ls -ltr`, with the • most recent ones at the bottom – you’re going to see queen, king, 2, 6, 4, • 5, jack, 3 as being newer than the 7. So let’s see – • oh but I’m seeing stuff that isn’t hearts, so let’s do it this way: `ls -ltr *hearts*`. • King, queen, jack, 2, 3, 4, 6 – I believe that’s what we had before. • So these files below here are newer than the `7_of_hearts`. • And the order those come in – so it’s showing me the modification times here – • they all say 2011 because that is the timestamp that was in the zip file. • But they could have been zipped in a particular order, and there are seconds • and milliseconds there that it’s not showing me because the date is so far in the past. • But the timestamp is actually more detailed than what it’s showing.
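Here is a small sketch of `-maxdepth` and `-newer`, assuming a sandbox directory with invented file names; `touch -t` fakes the old 2011 timestamps that came out of the zip file. It passes `.` explicitly because GNU `find` defaults to the current directory when no path is given, while BSD and macOS `find` require one.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir cards25

# touch -t sets an explicit modification time as [[CC]YY]MMDDhhmm.
touch -t 201012310000 2_of_hearts.png
touch -t 201101010000 7_of_hearts.png
touch -t 201101010005 king_of_hearts.png
touch cards25/7_of_hearts.png            # current time, so much newer

# Depth 1 means "entries directly inside ." and nothing below.
shallow=$(find . -maxdepth 1 -name '*hearts*' | LC_ALL=C sort)

# The tests are ANDed: in ., named *hearts*, modified after 7_of_hearts.png.
newer_here=$(find . -maxdepth 1 -name '*hearts*' -newer 7_of_hearts.png)

# Without -maxdepth, the freshly modified copy in cards25 matches as well.
newer_all=$(find . -name '*hearts*' -newer 7_of_hearts.png | LC_ALL=C sort)

printf '%s\n\n' "$shallow" "$newer_here" "$newer_all"
```

`-newer` is a strict comparison, so `7_of_hearts.png` never counts as newer than itself.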
Lopes	• Since these files were modified elsewhere, I guess a way to represent this • or show a better explanation of it would be to `cd` into one of the directories • that we modified ourselves, like the `cards33` or `cards25`, right?
League	• Yeah so the `cards25` – these were all done when we recorded that episode on September 8th. • And again they all show `14:01` as the time but there is more precision than that. • So if I did that newer command – but let’s get rid of the `-maxdepth` – • right, so newer than the `7_of_hearts` in the current directory: `find -name '*hearts*' -newer 7_of_hearts.png`. • You’re going to get everything in those subdirectories because those were • modified much more recently. These also don’t seem to be coming • out in any particular order. If you wanted these to appear in some more • meaningful order you could sort them, right? So pipe it into `sort` (`find -name '*hearts*' -newer 7_of_hearts.png | sort`) and • now they’re a little more nicely organized. All the `33`s will be together, stuff like that. • So there are a few other queries we could use. One thing that’s useful • besides `-name` and `-newer` is matching on the type of the file. • So if I do `-type` you can specify type as either `f`, which is a regular file, • or `d`, which is a directory. Or there are a few more options for more exotic • types of files like device files and so forth, which we haven’t really learned much about. • So `f` or `d` are the most common ones. • If I wanted to find everything that’s a directory that contains the name `hearts` (`find -type d -name '*hearts*'`), • there’s nothing that matches. So there were lots of things that have the • name `hearts` but they’re all regular files. By ANDing both of these together, • my result set becomes empty. If I just did `-type d` (`find -type d`) then I get a list • of all the directories that I’ve got. • Another one that I think is useful is `-empty`. So `-empty` refers to • whether the file has zero bytes – it’s a completely empty file (`find -empty`). • And sometimes there are a surprising number of empty files on your filesystem. • Some of them are there for good reason even though they’re completely empty.
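A sketch of `-type`, `-empty`, and sorting, again in a throwaway directory with made-up names. One detail worth knowing: `-empty` matches empty directories as well as zero-byte files, which is why `emptydir` shows up in the last result.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir cards25 cards33 emptydir
touch cards25/king_of_hearts.png cards33/king_of_hearts.png placeholder
printf 'some data\n' > notes.txt

# ANDed tests: a directory AND a name containing "hearts" (no such entry).
hearts_dirs=$(find . -type d -name '*hearts*')

# Every directory, including the starting point ".", in a stable order.
all_dirs=$(find . -type d | LC_ALL=C sort)

# Regular files whose names contain "hearts".
hearts_files=$(find . -type f -name '*hearts*' | LC_ALL=C sort)

# Zero-byte files and directories with no entries.
empties=$(find . -empty | LC_ALL=C sort)

printf '%s\n\n' "$all_dirs" "$hearts_files" "$empties"
```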
• So these are just some of the queries you can use. Do you want to guess how • we could find out about more queries that are available with find?
Lopes	• We could do `--version` – sorry, I mean `-h`.
League	• `--help`, or yeah I think `-h` is the same – nope! (`find -h`) Has to be `--help` (`find --help`). • So pipe that into `less` (`find --help | less`) and there’s a very brief summary here of some of • the queries that you can do. There’s `-empty` and `-type` and so on in here. • But then for more detail there’s this ‘manual’ command, • so `man` gives us a manual page – like a reference page – for any command. • So `man find` will tell us in a much more friendly (but not too friendly!) • way about the capabilities of `find`. So you can browse that to get some other ideas.
Lopes	• So when we pulled up the `man` for `find` just now, there was a list of – • I guess – options called “actions”. What can we do with those?
League	• Right, so it kind of carves up these options into these three categories, • and “actions” are something you would put at the end of your query. • The default action if you don’t specify one is just to print filenames (`-print`), • but `find` can do things other than print filenames. • There are lots of ways to specify how it prints – that’s what these formats are about. • I’m not going to get into those. But it can also execute arbitrary other commands. • And it’s got a built-in one here called `-delete`, if you want to remove • a bunch of files according to your query. • Let’s try some of those. I’m going to do a `find` for the name `jack` (`find -name '*jack*' | less`) – • and whenever I’ve got an action besides `-print` I always want to test it • out first by doing just a print, right? So I want to see all of the files that it is producing. • Maybe I will simplify it a little – it’s finding these with a dot-underscore `._` – • so if I just take filenames that start with `jack` and then have anything afterwards (`find -name 'jack*' | less`), • there ought to be fewer of them. • Okay, so those are all of the Jack cards. And then on my `find` command if I • add a `-delete` at the end (`find -name 'jack*' -delete`), then they’re all gone! [Laughs] That was pretty fast – • but now if I do `find` on `jack` (`find -name 'jack*'`) there aren’t any, so all of my Jacks have disappeared. • So that’s something obviously you want to use with great care. • There are ways to specify other arbitrary commands you could do as well. • So let’s say I am looking at the queens. Here are all of my queens (`find -name 'queen*'`). • And I want to change permissions on those, so if I look at – • let’s go down into `Downloads`, `Playing\ Cards`, PNG (`cd Downloads/Playing\ Cards/PNG-cards-1.3/`, then `ls -l`). • So if I look at these cards here, they all have permissions `rw` and then `r`, `r`, right? • Let’s say that my queens are private and they want to turn off the read • permission for anyone but the user.
We’re going to do a `find` with • `queen` and then it’s called `-exec`. And now you put the command you want to run, so `chmod`. • And with `chmod` we specify the permission change we want to make, • so for group and others let’s turn off read permission. • The user can keep read permission but turn it off for the others. • And then you would put the filename normally – well, • the filename is going to come from the `find` command – • `find` is going to generate all these filenames and then execute `chmod` on all of them. • So I’ve got a very special way to plug in the filename at this point in my `chmod` command. • And that is I put quote and curly braces `'{}'`, just open and close curly braces. • That’s the signal to the `find` command that that’s where it wants to plug • in the filename that it finds. • Finally I have to say when I’m done with the `chmod` command, • so that `find` knows that I’m all done. The way to do that is almost as weird: • you do backslash semicolon `\;`. So the whole command is `find -name 'queen*' -exec chmod go-r '{}' \;`. • You end up using these quotes and backslashes and stuff with `find` quite a lot, because of the way its command system works: • you need to pass these special characters in explicitly. And normally the curly • braces would be something the shell interprets, so you put quotes around them. • Semicolon means something in the shell, so you quote that with a backslash, • so that `find` sees all that stuff as it is. • Alright, so I’m going to run that. It was very fast, • and what we will find in the current directory (`ls -l`) is that all of our queens • now do not have read permission for those other two, but all the other cards still do. • So that hints at the power of this `find` command, • of doing very complex queries and then allowing us to hook that in to some • other command like `chmod` or `chown` or delete – • in order to execute a command on lots of different files.
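A minimal sketch of the print-first habit, `-delete`, and `-exec` with `'{}'` and `\;`, assuming a sandbox with invented card names and a `umask` of 022 so new files start out as `rw-r--r--`. The `-perm` checks at the end only confirm the result without parsing `ls -l`.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
umask 022                      # new files get mode 644 (rw-r--r--)
touch jack_of_spades.png ._jack_of_spades.png \
      queen_of_hearts.png queen_of_clubs.png 10_of_clubs.png

# Dry run: with no action given, find just prints (-print) the matches.
find . -name 'jack*'

# Same query with -delete; "._jack..." survives because it does not
# start with "jack".
find . -name 'jack*' -delete

# Run chmod once per match: find substitutes each path for '{}', and the
# escaped \; marks the end of the chmod command.
find . -name 'queen*' -exec chmod go-r '{}' \;

# Queens are now exactly 600 (rw-------); other cards are untouched.
find . -name 'queen*' -perm 600
find . -name '10_of_clubs.png' -perm 644
```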
Lopes	• Can `chmod` be used the same way using numerical values like we’ve done in the past?
League	• Yeah, anything that the `chmod` command supports you could do in here. • So one of those octal numerical values was 662 or something – just make up a weird one. • So if I do that on all of the queens (`find -name 'queen*' -exec chmod 662 '{}' \;`), then we see here – this is the result of 662. • Yeah, so any command can be put in there. It could even be something like `echo`, • or some script that you wrote (`find -name 'queen*' -exec ./resize '{}' \;`) – any command you could normally execute and put a filename into, • `find` can execute for you and then just put in the filenames that match your query.
Lopes	• So when we combined that `chmod` with the `find` just now, • it seemed sort of like when we would use a pipe.
League	• Yes, there are lots of ways to combine commands together – pipes, • and we also did the command substitution with those backticks – • and `find` has a lot in common with those. You’re right, • when I do `chmod` here with an `-exec` (`find -name 'queen*' -exec chmod go-r '{}' \;`), I’m combining the `find` command with the `chmod` command, • so you might imagine another way to think of that. Let’s say I do `find` for the – • I did queens and jacks, let’s do a king – and I’m going to use `-maxdepth` • to cut down the results a bit, right. So there are my kings (`find -maxdepth 1 -name 'king*'`), • and if I wanted to run `chmod` on all of those, another way to do it • actually is to put `chmod` at the beginning, okay, and then my code for the permissions that I want – • so maybe I want 722 – twos don’t make a lot of sense, • like giving other write permission; four is read permission. • Let’s say I want 744, and then normally you would put a filename here, but • you can put multiple filenames on `chmod`. So why don’t I do that command substitution, • either with the backquotes or `$()`. And so what I’ve done here (`chmod 744 $(find -maxdepth 1 -name 'king*')`) • will run the `find`, and any files that the `find` produces will then • get plugged into the `chmod` command here. And that actually works – • that’s pretty much the same thing. So you see my kings turned green because I made them executable. • So that is very similar to using `find` as the outer command and then `-exec` with `chmod`. • But there are some subtle differences. • First of all, there is often a limit on how big a command line can get. • So if this `find` were to generate a hundred thousand files or something, • then I might exceed the limit on the size of the `chmod` command line. • So this form with the command substitution has that limitation.
• Whereas if I do `find` with `-exec`, it can execute `chmod` multiple times • rather than build up an enormous single `chmod` command. • So that is one difference in the limitations, even though it looks like it • does pretty much the same thing. • But another one – you said it’s like a pipe, and there is another way to use a pipe, • which is a command called `xargs`. `xargs` is sort of like a bridge between • piping and command substitution. So let’s bring back the command substitution form (`chmod 744 $(find -maxdepth 1 -name 'king*')`). • What is more or less equivalent to this is – let’s do the `find`, • so I’ll copy that out and paste it out there (`find -maxdepth 1 -name 'king*'`) – so that’s going to generate those – • and then I pipe it into `xargs`, and following the `xargs` I can put a command like `chmod 744`. • Well, let’s do something different so I recognize the change. • Normally it’s `chmod`, then the permission, and then you put the filename. • But what `xargs` will do is take its standard input – • so the result of that `find` that gets piped into it – • and put all of the files from there on the `chmod` command line: `find -maxdepth 1 -name 'king*' | xargs chmod 644`. • So `xargs` is like building the `chmod` command line based on the standard input. • That allows me to turn what was a command substitution into a pipe. And that works fine. • Now all of the kings have 644 as their permissions (`ls -l`). • So I said these are three different ways to do the same thing, right? • We did the `find` with `-exec`, we did command substitution where `chmod` • is on the outside and `find` is inside, and then we can do `xargs`, where • `find` comes first and the `chmod` appears after `xargs`. • They’re all more or less the same, but the caveats and limitations are where • things get a little weird. And one of those limitations is • when filenames have spaces in them.
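The three forms can be compared side by side in a sandbox; this is a sketch with invented file names. It also shows `-exec ... '{}' +`, a terminator that POSIX `find` supports alongside `\;`: instead of one `chmod` per file, it packs as many names into each run as the argument-length limit allows (`getconf ARG_MAX` reports that limit on Linux and macOS).

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
umask 022                      # new files start as 644
touch king_of_hearts.png king_of_spades.png queen_of_hearts.png

# 1. find runs chmod itself, once per matching file.
find . -maxdepth 1 -name 'king*' -exec chmod 744 '{}' \;
step1=$(find . -maxdepth 1 -type f -perm 744 | LC_ALL=C sort)

# 2. Command substitution: the shell splices find's output into ONE chmod
#    command line. Unquoted on purpose, so names are split on whitespace.
chmod 700 $(find . -maxdepth 1 -name 'king*')
step2=$(find . -maxdepth 1 -type f -perm 700 | LC_ALL=C sort)

# 3. xargs reads names from standard input and builds chmod command lines,
#    running chmod more than once if the list is too long for one.
find . -maxdepth 1 -name 'king*' | xargs chmod 644
step3=$(find . -maxdepth 1 -type f -perm 644 | LC_ALL=C sort)

# 4. The "+" terminator batches names into as few chmod runs as possible.
find . -maxdepth 1 -name 'king*' -exec chmod 640 '{}' +
step4=$(find . -maxdepth 1 -type f -perm 640 | LC_ALL=C sort)

printf '%s\n\n' "$step1" "$step2" "$step3" "$step4"
```

After step 3 the queen also shows up, simply because it still has its original 644 mode.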
• I might have said before that you should be very careful about naming things with spaces in them. • And this is one of the reasons – it makes it very hard for commands to distinguish between files. • Let’s take a little example here. I know up here (`cd ~`), • outside of `Downloads`, I’ve got a directory which has spaces in it, right? • So this directory is called `Command Line TV`. So if I did `find -maxdepth 1 -type d`, • I get all of the directories in the current directory. So that includes that one. • But now if I pipe that into `xargs`, and I want to do something like `chmod +x` on them (`find -maxdepth 1 -type d | xargs chmod +x`) – • so I want to make all of those directories executable, which is a reasonable thing to do. • [Sigh] What happened? That “Command Line TV” was one line of my output, • but when I pipe it into `xargs`, it gets split into three parts because it’s got spaces in it. • And so my `chmod` fails there because it’s trying to treat those as three separate things. • So that’s a risk with filenames with spaces in them. • It’s also a risk with how `find` works and `xargs` works. • There is a solution to it though, and it’s a way in which `find` and • `xargs` are actually meant to work together. One of those actions we saw (`find --help`) – • I said that `-print` is the default action, but there’s also one called `-print0`. • What `-print0` does is it prints these directories out (`find -maxdepth 1 -type d -print0`), • but instead of separating them with newlines, • it separates them with the ‘zero’ character, or the null character – • the character with the value zero. And when I see them printed here on • the command line it looks like they’re all bunched together – • that’s because that null character doesn’t show up.
• But if I take that and then pipe it into `xargs`, and I also tell `xargs` • it should look for the zero character to split them up – so I’ve got to – • let me verify with `xargs --help` how it does that. Yeah, it’s just `-0`, okay. • So we’re going to do the `find` with `-print0` and then pipe into `xargs -0` and now try `chmod +x`: • `find -maxdepth 1 -type d -print0 | xargs -0 chmod +x`. • And now it works again. • So it’s able to keep the filename with spaces in it together because it • knows that spaces or newlines are not what splits up the multiple directories. • It’s actually this null character. So if you do that on both sides – • the `find` says I’m going to send nulls, and the `xargs` says I’m going to split on nulls – • then they cooperate and this problem goes away.
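A sketch of the spaces problem and the `-print0` / `xargs -0` fix, assuming a sandbox that contains a `Command Line TV` directory. It swaps `chmod +x` for `printf '[%s]\n'`, which prints each argument it receives in brackets, so you can see exactly how `xargs` carved up its input. Both GNU and BSD versions of `find` and `xargs` accept `-print0` and `-0`.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir "Command Line TV"

# Newline-separated output: plain xargs also splits on blanks, so the one
# directory arrives as three separate arguments.
split=$(find . -maxdepth 1 -type d -name '* *' | xargs printf '[%s]\n')
echo "$split"

# NUL-separated output on one side, NUL splitting on the other: the name
# stays in one piece.
intact=$(find . -maxdepth 1 -type d -name '* *' -print0 | xargs -0 printf '[%s]\n')
echo "$intact"
```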
Lopes	• So just to backtrack, would the null character be considered that `./` or that `/.` up top?
League	• No, you literally can’t see it here; it just doesn’t print out. • So what we’re seeing is `./.dbus` – that’s one entry – that’s one result of my `find`. • And then the next result is `./.thumbnails`. But I only know that because I saw them previously. • The null character just doesn’t appear in printed output. • But it will appear when you pipe it into something that is expecting it. • Another way that we could see it show up actually – • this introduces another command, but one that’s pretty easy – • there’s a command called “octal dump” (`od`), so this takes input data (`find -maxdepth 1 -type d | od`) and • just shows it to you as a series of octal numbers. And you can specify that • they should be like one byte big instead of – so basically – hmm (`find -maxdepth 1 -type d | od -t o1`), • I don’t like octal so I’m going to do hexadecimal (`find -maxdepth 1 -type d | od -t x1`) – that’s better. • What we’re seeing here is basically – this is the – • so `0a` is a newline I believe, and – oh it’s ‘dot’, newline, • dot-slash-dot something, newline, okay. So it’s separating – • `find` by default is separating all of its results by newline. • But if I do that same thing with `-print0` on it (`find -maxdepth 1 -type d -print0 | od -t x1`), you can see the difference. • Those `0a`s became `00`. That separates each of these. • So I don’t see that when it’s just output onto the terminal, • but it is sent on to the next command in the pipe.
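To see the separator bytes for yourself, a small sketch like this pipes both forms into `od`, using `-A n` to drop the offset column and `-t x1` for one hex byte at a time. It assumes a sandbox containing just a `.dbus` directory, so each listing is a single entry.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir .dbus

# Default -print: the bytes of "./.dbus" followed by a newline (0a).
nl_dump=$(find . -maxdepth 1 -type d -name .dbus | od -A n -t x1)
echo "$nl_dump"

# -print0: the same bytes, but terminated by a NUL byte (00) instead.
nul_dump=$(find . -maxdepth 1 -type d -name .dbus -print0 | od -A n -t x1)
echo "$nul_dump"
```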
Lopes	• So `find` seems like a really useful command to use. • I would say that it works especially well when you’re working with a • confined or constrained search area. Is that command the best to use when • you’re searching your entire system?
League	• Yeah, so you can – you know, when you specify `find` you can put a `/` • here (`find /`) to say search the entire system. And sometimes you might need to do that. • But it’s very time-consuming if it has to do that. • If you’re not the administrative user, you’re a regular user, • it’s going to encounter lots of directories you’re not even allowed to read. • So it will give you some error messages about stuff like that. • There is a better command for searching the entire disk for a pattern. • The trade-off – well, there are a couple of trade-offs. • One trade-off is that it doesn’t support as many queries as `find` does. • You know, `find` had `-newer` and by file type and things like that. • The command that I’m going to introduce now, `locate`, is basically searching by filename. • So it can do pattern matching on filenames, but that’s pretty much it. • So let’s say I do `locate` and I want to do – I don’t know, • anything that has to do with the password file, right? So this (`locate passwd`) will respond pretty quickly. • And the reason it can respond pretty quickly and still find stuff all over • your disk is that it uses a database. There’s a database that only gets updated periodically, • that basically indexes all of the files on your system. • And then `locate` can read that database and give you results that match. • So we can use it like that. Or let’s say I want anything that has to do with ImageMagick (`locate Magick`). • So it gives me anything across the whole system, and it’s pretty fast. • If I wanted to find stuff that’s very recent, very new – that’s more of a problem. • So let’s go into my `Downloads` folder (`cd Downloads`) and I’ve got here some files called `weblog`, right? • So suppose I want to locate files which have this pattern (`locate weblog-2015`). • And I believe `locate` sort of implicitly puts a star `*` on each side of your query?
• So I don’t really need to do that, although if you wanted to put stars in there somewhere, • you do again have to put quotes, for the same reason as with `find`. • So we can search for that. And these are all of the files that say `weblog`. • But it’s able to find those because it’s got this database, • so if I add a new file right now – let’s create `weblog-2015` September-something (`touch weblog-2015-09-17.txt`, then `ls`). • So now that file exists, and it wasn’t there before. • When I do `locate weblog-2015` it still doesn’t find September 17th, okay? • That’s because the database is now out of date. • If I wanted to update the database manually, I can. • Normally what happens is it’s scheduled as like a periodic job – • like once a day or every couple of hours or something – it will run a command. • The command is called `updatedb` and it has to be run as the administrator, so we want to do `sudo updatedb`. • When I do this, it’s going to reindex the entire disk, so it can take a little while. • But then it updates the database and then we’ll be able to see the result. • Actually that was pretty fast – I’m not sure it did what I thought it was going to do! • But we’ll try it – let’s try `locate` again. And indeed, • `locate weblog-2015` now finds September 17th, so somehow `updatedb` did a really fast job of that – great.
Lopes	• So like most commands on the terminal that we’ve been running, a lot of it is case-sensitive. • What would we do, or what option would we have to set to turn off case-sensitivity.
League	• Yeah, both `locate` and `find`, when you do those patterns on the filename, • assume a case-sensitive match. So if I did `locate` and maybe I • falsely remembered that my weblog files were capitalized like that (`locate WEBLOG-2015`) – it’s not going to find them. • But there is an option, `locate -i`, for insensitive – • it’s the same one that `grep` uses for insensitive matching – `-i` (`locate -i WEBLOG-2015`). • And then it will ignore the difference in case between your pattern and the filename itself. • That helps me find more things. • `find` is very similar – the file matching feature in `find` was called `-name` and I did – • first of all let’s verify that `find` does not implicitly put the `*` before and after (`find -name weblog-2015`), • like `locate` does. So `locate` with that `weblog-2015` and no asterisks • still found these files, but `find` will not. So it wants the stars there: `find -name '*weblog-2015*'`. • But same thing – `find` with `-name` is case-sensitive, so if I do capitals (`find -name '*WEBLOG-2015*'`), • it’s going to look for capitals. So the fix there is that there’s just a • separate query operator called `-iname`, and that’s an insensitive match (`find -iname '*WEBLOG-2015*'`). • And then it’ll find those files.
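A sketch of case-insensitive matching with `find`, assuming two invented weblog files in a sandbox. It also confirms that `-name` matches the whole filename, with no implicit stars the way `locate` adds them. `locate -i` is left out because its results depend on the system’s `updatedb` database.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
touch weblog-2015-09-17.txt weblog-2015-10-01.txt

# -name matches the whole filename, so a pattern without stars finds nothing.
no_stars=$(find . -name 'weblog-2015')

# -name is case-sensitive: capital letters match nothing here.
upper=$(find . -name '*WEBLOG-2015*')

# -iname ignores case and finds both files.
any_case=$(find . -iname '*WEBLOG-2015*' | LC_ALL=C sort)
echo "$any_case"
```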
Lopes	• So in today’s episode we touched on finding files using three commands: • `find`, `locate`, and `xargs`. That wraps up today’s episode. • That also wraps up Season 1 of Command Line TV.
League	• Yes, we hope you found this useful – we covered lots of things since we started this – • so, navigating through files with `cd`, creating pipelines to do some text processing, • searching for stuff, image processing, package management, redirection, • shell scripts – so we did lots of things. And if you found this useful, • I hope you’ll get in touch with us.
Lopes	• You can reach us at `heychris@commandline.tv` or follow us on Twitter `@commandlinetv`.
League	• And if we have good feedback from you and you found this useful then we’ll try to do more!
	• [Dark electronic beat] • [Captions by Christopher League] • [End]