Episode 12 Find and locate

We use find and locate to dig up lists of files on our system that match certain criteria. We also look at xargs for executing commands on a selected set of files.

20 October 2015

[Rhythmic, dark electronic intro music]

League

Welcome back to Command Line TV. Today we’re going to talk about finding

files using a command called find and also an alternative called locate.

And do we have any follow-up from last time?

Lopes

Last time at the end of the episode, we learned about formatting and modifying SD cards or,

sorry, external drives. How can we load a drive so it’s read-only?

League

Sure, so if you want to make sure that programs can’t access –

or can’t write to the drive, there is an option for that when you mount it.

If I type mount and remember we put the device name – I guess sdb1

and then the path where it would be mounted, the directory where it would mount.

You can specify some other options here using a -o and one of the most

common options is just saying read-only ro. If I do that,

mount -o ro /dev/sdb1 /mnt

then the disk will be mounted read-only and it means you can do things like cat and ls on it,

but if you tried to actually edit a file or copy a file to it or something like that,

it would stop you right away and say “read-only filesystem.” So that prevents it from being written.

Lopes

Since we’ll be using the find command today, I’m assuming that it’s as

simple as the command name sounds.

We just use it to find files and other things on our filesystem, correct?

League

Yeah, you use it to find things – what’s interesting about it is it’s got

this syntax that’s available as options for specifying a query –

it’s really like querying a database, but about files.

So you can find them by name but you can also find them by modification

times or permissions or combine all these things together into a big query.

I’m going to start with the simplest case, which is finding them by name.

Let’s say I want to find – the first thing that I give is the directory to start looking in,

and then it will look in any sub-directory of that too.

If I want to look across this entire system I could put find /

that would be the top level directory. Or the current directory . or my

home directory ~ (which happens to be the current directory).

But you could do any of those as your starting point for the find.

Then we put the query as options. So -name is a way to search by filename

and this takes wildcards so I could say something like *.png

but there is a little bit of a catch there. When you use a wildcard like this,

the shell expands it before it actually gives it to the find command.

So find is going to get the names of all the PNGs in the current

directory and that’s not what we want. We want that star * actually to be passed to find as is.

So I don’t want my shell to expand the wildcard, I want find to match the

wildcard with the files that it comes across. So I have to quote it,

just like when you’ve got spaces in a filename, or any special characters.

find ~ -name '*.png'

You put quotes around it and then it won’t expand but find can still interpret it.

So there’s a simple example of a find command. And if I run that it’s

going to just dump out a list of all of these PNG files that exist in my home directory.

So I’m going to pipe that into less so we can see it a page at a time.

find ~ -name '*.png' | less

You see some of them are in this – these cache folders,

so this is a .cache which is one of those hidden files, right? Starts with a dot.

And so it’s got little thumbnails in there of images that I didn’t even know about,

but I can find them with find.

The program called inkscape – this is a drawing program.

It has put some of its icons into that cache, and so on.

There are lots of PNGs here that you might not have even thought of before.

Lopes

So when we ran find just now it showed all the images that we had using

the wildcard but then it also expanded into subdirectories.

What if for example you wanted to locate some of the playing cards that we worked with,

but the originals not the ones that we changed the geometry on.

League

Yep, so down here in the Downloads/Playing\ Cards and PNG,

cd Downloads/Playing\ Cards
cd PNG-cards-1.3
ls

I had these and last time we created the subdirectories.

So when I do a find here, so let’s say I want to find –

by the way if the current directory is where you want to start from then

you don’t actually need to specify the . there. But if I want to find all

of the filenames that have hearts in them, I could do that.

find . -name '*hearts*'

But that is getting me the ones in subdirectories as well. So there’s cards33, cards25.

If I want to limit it to either the current directory or maybe I just don’t

want to search too deep – there’s an option called -maxdepth.

I put maxdepth there and if I were to say -maxdepth 1 then we’re only

find -maxdepth 1 -name '*hearts*'

seeing the files that have hearts in the name that are one level deep,

so basically in the current directory.

And if I went to 2 that would be enough to get me these other directories as well.

find -maxdepth 2 -name '*hearts*'

Now there are some other queries that I can add to this.

When you have multiple queries on a find command they are joined together

using a Boolean AND operator. So in other words all of them have to be true

in order for the file to match. One that I like to use sometimes is –

if you want to find files that have been modified since a certain time

that’s an option called -newer.

So I want to show files that are newer than some other file.

Let’s pick one of those hearts files, I guess 7_of_hearts.png.

First let’s get back all of the hearts in the current directory,

and then I’m going to just do the ones that are newer than the

7_of_hearts and you see that it’s a subset of those.

find -maxdepth 1 -name '*hearts*' -newer 7_of_hearts.png

If I were to look at these by modification time – so like -ltr for the

ls -ltr

most recent ones at the bottom – you’re going to see queen, king, 2, 6, 4,

5, jack, 3 as being newer than the 7. So let’s see –

oh but I’m seeing stuff that isn’t hearts so let’s do it this way.

ls -ltr *hearts*

King, queen, jack, 2, 3, 4, 6 – I believe that’s what we had before.

So these files below here are newer than the 7_of_hearts.

And the order those come in – so it’s showing me the modification times here –

all say 2011 because that is the time-stamp that was in the zip file.

But they could have been zipped in a particular order and there are seconds

and milliseconds there that it’s not showing me because the date is so far in the past.

But it is actually more detailed than what it’s showing.

Lopes

Since these files were modified elsewhere, I guess a way to represent this

or show a better explanation of it would be to cd into one of the files

that we modified ourselves, like the size33 or size25, right?

League

Yeah so the cards25 – these were all done when we recorded that episode on September 8th.

And again they all show 14:01 as the time but there is more precision than that.

So if I did that newer command – but let’s get rid of the -maxdepth

right so newer than the 7_of_hearts in the current directory.

find -name '*hearts*' -newer 7_of_hearts.png

You’re going to get everything in those subdirectories because those were

modified much more recently. These also are not – don’t seem to be coming

out in any particular order. If you wanted these to appear in some more

significant order you could sort them, right? So pipe it into sort and

find -name '*hearts*' -newer 7_of_hearts.png |sort

now they’re a little more nicely organized. All the 33s will be together, stuff like that.

So there are a few other queries we could use. One thing that’s useful

besides -name and -newer is matching on the type of the file.

So if I do -type you can specify type as either f which is a regular file,

or d which is directory. Or there are a few more options for more exotic

types of files like device files and so forth, which we haven’t really learned much about.

So f or d are the most common ones.

If I wanted to find everything that’s a directory that contains the name hearts,

find -type d -name '*hearts*'

there’s nothing that matches. So there were lots of things that have the

name hearts but they’re all regular files. By ANDing both of these together,

then my result set becomes empty. If I just did -type d then I get a list

find -type d

of all the directories that I’ve got.

Another one that I think is useful is -empty. So -empty refers to

whether the file has zero bytes – it’s a completely empty file.

find -empty

And sometimes there are a surprising number of empty files on your filesystem.

Some of them are there for good reason even though they’re completely empty.

So these are just some of the queries you can use. Do you want to guess how

we could find out about more queries that are available with find?

Lopes

We could do --version – sorry, I mean -h.

League

--help or yeah I think -h is the same – nope! Has to be --help.

find -h
find --help

So pipe that into less and there’s a very brief summary here of some of

find --help | less

the queries that you can do. There’s -empty and -type and so on in here.

But then for more detail there’s this ‘manual’ command,

so man gives us a manual page for – like a reference page for any command.

man find

So man find will tell us in a much more friendly (but not too friendly!)

way about the capabilities of find. So you can browse that to get some other ideas.

Lopes

So when we pulled up the man for find just now, there was a list of –

I guess – options called “actions”. What can we do with those?

League

Right, so it kind of carves up these options into these three categories

and “actions” are something you would put at the end of your query.

The default action if you don’t specify one is just to print filenames -print,

but find can do things other than print filenames.

There are lots of ways to specify how it prints – that’s what these formats are about.

I’m not going to get into those. But it can also execute arbitrary other commands.

And it’s got a built-in one here called -delete, so if you want to remove

a bunch of files according to your query.

Let’s try some of those. I’m going to do a find for the name *jack*

and whenever I’ve got an action besides -print I always want to test it

find -name '*jack*' |less

out first by doing just a print, right? So I want to see all of the files that it is producing.

Maybe I will simplify it a little by – like it’s finding these with a dot-underscore ._

so if I just take filenames that start with jack and then have anything afterwards,

find -name 'jack*' |less

there ought to be fewer of them.

Okay so those are all of the Jack cards. And then on my find command if I

add a -delete at the end then they’re all gone! [Laughs] That was pretty fast –

find -name 'jack*' -delete

but now if I do find on jack* there aren’t any, so all of my Jacks have disappeared.

find -name 'jack*'

So that’s something obviously you want to use with great care.

There are ways to specify other arbitrary commands you could do as well.

So let’s say I am looking at the queens. Here are all of my queens.

find -name 'queen*'

And I want to change permissions on those so if I look at –

let’s go down into Downloads, Playing\ Cards, PNG.

cd Downloads/Playing\ Cards/PNG-cards-1.3/
ls -l

So if I look at these cards here, they all have permissions rw and then r, r, right?

Let’s say that my queens are private and they want to turn off the read

permission for anyone but the user. We’re going to do a find with

queen* and then it’s called -exec. And now you put the command you want to run, so chmod.

And with chmod we specify the permission change we want to make,

so for group and others let’s turn off read permission.

The user can keep read permission but turn it off for the others.

And then you would put the filename normally – well,

the filename is going to come from the find command –

find is going to generate all these filenames and then execute chmod on all of them.

So I’ve got a very special way to plug in the filename at this point in my chmod command.

And that is I put quote and curly braces '{}' like just open and close curly braces.

That’s the signal to the find command that that’s where it wants to plug

in the filename that it finds.

Finally I have to say when I’m done with the chmod command,

so that find knows that I’m all done. The way to do that is almost as weird,

you do backslash semi-colon \;. You end up using these quotes and

find -name 'queen*' -exec chmod go-r '{}' \;

backslashes and stuff with find quite a lot, because of the way its command system works,

you need to pass these wildcards in explicitly. And normally the curly

braces would be a wildcard that the shell interprets, so you put quotes around that.

Semi-colon means something in the shell so you quote that with a backslash,

so that find sees all that stuff as it is.

Alright, so I’m going to run that. It was very fast,

and what we will find in the current directory is that all of our queens

ls -l

now do not have read permission for those other two, but everybody else does.

So that’s sort of – that hints at the power of this find command,

of doing very complex queries and then allowing us to hook that in to some

other command like chmod or change-owner or delete –

in order to execute a command on lots of different files.

Lopes

Can chmod be used the same way using numerical values like we’ve done in the past?

League

Yeah, anything that the chmod command supports you could do in here.

So one of those octal numerical values was 662 or something – just make up a weird one.

So if I do that on all of the queens then we see here – this is the result of 662.

find -name 'queen*' -exec chmod 662 '{}' \;

Yeah, so any command can be put in there, it could even be something like echo,

some script that you wrote – any command you could normally execute and put a filename into,

find -name 'queen*' -exec ./resize '{}' \;

find can execute for you and then just put in the filenames that match your query.

Lopes

So when we combined that chmod with the find just now,

it seemed sort of like when we would use a pipe.

League

Yes, there are lots of ways to combine commands together – pipe,

and we also did the command substitution with those back-ticks –

and find has a lot in common with those. You’re right,

when I do chmod here with an -exec, I’m combining the find command with the chmod command –

find -name 'queen*' -exec chmod go-r '{}' \;

so you might imagine another way to think of that. Let’s say I do find for the –

I did queens and jacks, let’s do a king – and I’m going to do -maxdepth

to make this a little bit fewer, right. So there are my kings,

find -maxdepth 1 -name 'king*'

and if I wanted to run chmod on all of those, another way to do it

actually is put chmod at the beginning okay, and then my code for the permissions that I want –

so maybe I want 722 – twos don’t make a lot of sense,

like giving other write permission, four is read permission.

Let’s say I want 744 and then normally you would put a filename here but

you can put multiple filenames on chmod. So why don’t I do that command substitution,

either with the back-quotes or $(). And so what I’ve done here is –

chmod 744 $(find -maxdepth 1 -name 'king*')

this will run the find, and any files that the find produces will then

get plugged into the chmod command here. And that actually works –

that’s pretty much the same thing. So you see my kings turned green because I made them executable.

So that is very similar to using find as the outer command and then -exec with chmod.

But there are some subtle differences. One of the differences has to do with –

first of all, there is often a limit on how big a command line can get.

So if this find were to generate a hundred thousand files or something,

then I might exhaust the limitation on the size of the chmod command line.

So this form with the command substitution has that limitation.

Whereas if I do find with -exec, it can execute chmod multiple times

rather than build up an enormous single chmod command.

So that is one difference in the limitations, even though it looks like it

does pretty much the same thing.

But another one – you said it’s like a pipe, and there is another way to use a pipe –

which is a command called xargs. xargs is sort of like a bridge between

piping and command substitution. So let’s bring back the command substitution form.

chmod 744 $(find -maxdepth 1 -name 'king*')

What is more or less equivalent to this is – let’s do the find,

so I’ll copy that out and paste it out there – so that’s going to generate those –

find -maxdepth 1 -name 'king*'

and then if I pipe it into xargs and following the xargs I can put a command like chmod 744.

Well let’s do something different so I recognize the change.

Now normally chmod and then the permission and then you put the filename.

But what xargs will do is it’ll take its standard input –

so the result of that find that gets piped into it –

it’ll take all of the files from there and put them on the chmod command line.

find -maxdepth 1 -name 'king*' | xargs chmod 644

So xargs is like building the chmod command line based on the standard input.

That allows me to turn what was a command substitution into a pipe. And that works fine.

Now all of the kings have 644 as their permissions.

ls -l

So one of the – I said like these are three different ways to do the same thing, right?

We did the find with -exec, we did command substitution where chmod

is on the outside and find is inside, and then we can do xargs where

find comes first and the chmod appears after xargs.

They’re all more or less the same but the caveats and limitations are where

things get a little weird. And one of those limitations is when spaces –

when filenames have spaces in them.

I might have said before that you should be very careful about naming things with spaces in them.

And this is one of the reasons – it makes it very hard for commands to distinguish between files,

like – let’s take a little example here. I know up here I’ve got –

cd ~

outside of Downloads – I’ve got a directory which has spaces in it, right?

So this directory is called Command Line TV. So if I did find -maxdepth 1 -type d,

find -maxdepth 1 -type d

I get all of the directories in the current directory. So that includes that.

But now if I pipe that into xargs, and I want to do something like chmod +x on them –

find -maxdepth 1 -type d | xargs chmod +x

so I want to make all of those directories executable, which is a reasonable thing to do.

[Sigh] What happened? That “Command Line TV” was one line of my output,

but when I pipe it into xargs, it gets split into three parts because it’s got spaces in it.

And so my chmod fails there because it’s trying to treat those as three separate things.

So that’s a risk with filenames with spaces in them.

It’s also a risk with how find works and xargs works.

There is a solution to it though, and it’s a way in which find and

xargs are actually meant to work together. One of those actions we saw –

find --help

I said that -print is the default action, there’s also one called -print0.

What -print0 does is it prints these directories out,

find -maxdepth 1 -type d -print0

but instead of separating them with newlines or spaces,

it separates them with the ‘zero’ character, or the null character.

So the character with the value zero. And when I see them printed here on

the command line it looks like they’re all bunched together –

that’s because that null character doesn’t show up.

But if I take that and then pipe it into xargs, and I also tell xargs

it should look for the zero character to split them up – so I’ve got to –

let me verify with xargs how it does that. Yeah, it’s just -0, okay.

xargs --help

So we’re going to do the find with -print0 and then pipe into xargs -0 and now try chmod +x.

find -maxdepth 1 -type d -print0 | xargs -0 chmod +x

And now it works again.

So it’s able to keep the filename with spaces in it together because it

knows that spaces or newlines are not what splits up the multiple directories.

It’s actually this null character. So if you do that on both sides –

the find says I’m going to send nulls, and the xargs says I’m going to split on nulls –

then they cooperate and this problem goes away.

Lopes

So just to backtrack, would the null character be considered that ./ or that /. up top?

League

No you literally can’t see it here, it just doesn’t print out.

So what we’re seeing is ../.dbus – that’s one entry – that’s one result of my find.

And then the next result is ./.thumbnails. But I only know that because I saw them previously.

The null character just doesn’t appear in printed output.

But it will appear when you pipe it into something that is expecting it.

Another way that we could see it show up actually –

this introduces another command, but one that’s pretty easy –

there’s a command called “octal dump” (od) so this takes input data and

find -maxdepth 1 -type d | od

just shows it to you as a series of octal numbers. And you can specify that

they should be like one byte big instead of – so basically – hmm,

find -maxdepth 1 -type d | od -t o1

I don’t like octal so I’m going to do hexadecimal – that’s better.

find -maxdepth 1 -type d | od -t x1

What we’re seeing here is basically – this is the –

so 0a is a newline I believe, and – oh it’s ‘dot’, newline,

dot-slash-dot something, newline, okay. So it’s separating –

find by default is separating all of its results by newline.

But if I do that same thing with -print0 on it, you can see the difference.

find -maxdepth 1 -type d -print0 | od -t x1

Those 0as became 00. That separates each of these.

So I don’t see that when it’s just output onto the terminal,

but it is sent on to the next command in the pipe.

Lopes

So find seems like a really useful command to use.

I would say that it works especially well when you’re working with a

confined or constrained search area. Is that command the best to use when

you’re searching your entire system?

League

Yeah, so you can – you know, when you specify find you can put a /

find /

here to say search the entire system. And sometimes you might need to do that.

But it’s very time-consuming, if it has to do that.

If you’re not the administrative user, you’re a regular user,

it’s going to encounter lots of directories you’re not even allowed to read.

So it will give you some error messages about stuff like that.

There is a better command for searching the entire disk for a pattern.

The trade-off – well there are a couple of trade-offs.

One trade-off is that it support as many queries as find does.

You know, find had -newer and by file type and things like that.

The command that I’m going to introduce now – locate, it’s basically searching by filename.

So it can do pattern matching on filenames, but that’s pretty much it.

So let’s say I do locate and I want to do – I don’t know,

anything that has to do with the password file, right? So this will respond pretty quickly.

locate passwd

And the reason it can respond pretty quickly and still find stuff all over

your disk is that it uses a database. There’s a database that only gets updated periodically,

that basically indexes all of the files on your system.

And then locate can read that database and give you results that match.

So we can use it like that. Or lets say I want anything that has to do with ImageMagick.

locate Magick

So it gives me anything across the whole system, and it’s pretty fast.

If I wanted to find stuff that’s very recent, very new – that’s more of a problem.

So let’s go into my Downloads folder and I’ve got here some files called weblog, right?

cd Downloads

So suppose I want to locate files which have this pattern.

locate weblog-2015

And I believe locate sort of implicitly puts a star * on each side of your query?

So I don’t really need to do that, although if you wanted to put stars in there somewhere,

you do again just like – for the same reason as with find you do have to put quotes.

So we can search for that. And these are all of the files that say weblog.

But it’s able to find those because it’s got this database,

so if I add a new file right now – let’s create weblog-2015 September-something.

touch weblog-2015-09-17.txt
ls

So now that file exists, and it wasn’t there before.

When I do locate it still doesn’t find September 17th, okay?

locate weblog-2015

That’s because the database is now out of date.

If I wanted to update the database manually, I can.

Normally what happens is it’s scheduled as like a periodic job –

like once a day or every couple of hours or something – it will run a command.

The command is called updatedb and it has to be run as the administrator, so we want to do sudo.

When I do this, it’s going to reindex the entire disk, so it can take a little while.

But then it updates the database and then we’ll be able to see the result.

sudo updatedb

Actually that was pretty fast – I’m not sure it did what I thought it was going to do!

But we’ll try it – let’s try locate again. And indeed,

locate weblog-2015

locate now finds September 17th, so somehow updatedb did a really fast job of that – great.

Lopes

So like most commands on the terminal that we’ve been running, a lot of it is case-sensitive.

What would we do, or what option would we have to set to turn off case-sensitivity.

League

Yeah, both locate and find when you do those patterns on the filename,

it’s assuming a case-sensitive match. So if I did locate and maybe I

falsely remembered that my weblog files were capitalized like that – it’s not going to find them.

locate WEBLOG-2015

But there is an option locate -i for insensitive –

it’s the same that grep uses for insensitive matching – -i.

locate -i WEBLOG-2015

And then it will ignore the difference in case between your pattern and the filename itself.

That helps me find more things.

find is very similar – the file matching feature in find was called -name and I did –

first of all let’s verify that find does not implicitly put the * before and after,

find -name weblog-2015

like locate does. So locate with that weblog-2015 and no asterisks

still found these files but find will not. So it wants the star there.

find -name 'weblog-2015*'

But same thing – find with -name is case-sensitive so if I do capital,

find -name 'WEBLOG-2015*'

it’s going to look for capitals. So the fix there is that there’s just a

separate query operator called -iname and that’s an insensitive match.

find -iname 'WEBLOG-2015*'

And then it’ll find those files.

Lopes

So in today’s episode we touched base on locating files using three commands,

find, locate, and xargs. That wraps up today’s episode.

That also wraps up Season 1 of Command Line TV.

League

Yes, we hope you found this useful – we covered lots of things since we started this –

so, navigating through files with cd, creating pipelines to do some text processing,

searching for stuff, image processing, package management, redirection,

shell scripts – so we did lots of things. And if you found this useful,

I hope you’ll get in touch with us.

Lopes

You can reach us at heychris@commandline.tv or follow us on Twitter @commandlinetv.

League

And if we have good feedback from you and you found this useful then we’ll try to do more!

[Dark electronic beat]

[Captions by Christopher League]

[End]