Episode 3 Wildcards and grep
In this episode, we use basic wildcards to select files, and then explore how the ‘grep’ command can search for words or phrases across multiple files. As always, you can follow along using the same directory structure by downloading it from https://github.com/commandlinetv/sample-files.
10 August 2015
•
[Rhythmic, dark electronic intro music] | |
League |
•
Welcome back to Command Line TV. This is Episode 3. •Today we’re going to talk about wildcards and text processing using pipelines. •First of all, do we have any follow-up from last time? |
Lopes |
•
I did have a question about accessing files, especially when it comes to their extensions. •We did access a file that had two extensions, and I was curious as to which one was the actual extension type? |
League |
•
Yeah, so first of all, extensions in UNIX don’t mean quite as much as they do on other systems. •They are primarily there for humans, and the system can work – •most commands at least can work perfectly well with whatever extension you want to give it. •So when you have a file like this, it is a tar file that has been compressed. •But what’s interesting about tar is that it’s not by itself compressed. •All it does is it packages up a bunch of files, so that it creates one file, •and then you can compress that separately. •So that’s why it gets 2 extensions: •one for the tar packaging and one for the compression. •And so they go in that order. •But extensions are really not as meaningful as on other systems. •So for example, if I wanted to rename that as something else, I can still use it as a compressed file. •Or another pretty shocking example – last time I think we looked at a PNG, and we used an external viewer to open it up.
That was in thinkjava/figs. cd thinkjava/figs/ ls *.png• And I have this PNG, so we did xdg-open gridworld.png• And it pops up in a separate viewer here. •But that viewer doesn’t rely on the extension – I can even rename that file. •So, to rename a file is mv, and we’re going to learn a lot more about mv later. •And then I rename it so it has a .txt extension. •So it looks like that would be a text file.
But when I do xdg-open on it, it still opens up the image viewer. •It still knows that it’s an image. •So that’s a little bit odd. •The way that it knows that is that it actually looks at the content of the file, rather than the extension. •So there’s a command that does that too, called file. file gridworld.txt• When you run file, it looks into the file content and identifies what’s there to tell you what it is. •And so it says this is PNG image data, and everything else about the image. So that’s pretty useful. •For that compressed tar file, it only reports the compression. •It doesn’t actually decompress to say what is there behind the compressed data. |
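A minimal sketch of the renaming point, assuming a scratch directory and hypothetical file names (notes.txt, bundle.tar.gz, bundle.dat) that are not from the episode. It packages one file with tar, compresses it with gzip, renames the result to a misleading extension, and checks that tar can still list it.

```shell
# Sketch only: all file names here are made up for illustration.
demo=$(mktemp -d)                  # scratch directory
cd "$demo" || exit 1
echo "hello" > notes.txt
tar -czf bundle.tar.gz notes.txt   # package with tar, compress with gzip
mv bundle.tar.gz bundle.dat        # rename to an unrelated extension
listing=$(tar -tzf bundle.dat)     # tar reads the content, not the name
echo "$listing"
# If the file utility is installed, it also identifies the data by content.
if command -v file >/dev/null 2>&1; then file bundle.dat; fi
cd / && rm -rf "$demo"
```

The listing still shows notes.txt, which suggests the extension was only ever a hint for humans.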
Lopes |
•
So you just did ls *.png. What are other things that this star can do? |
League |
•
Right, so the star is a wildcard. •And it matches zero or more of any characters, •including characters that seem like they would be special, like dots. •So a very common way to use it is with an extension, so ls *.png• And we know that works. •But you can also use it in some other interesting ways. •So if I want to see every filename here that has an a in it: ls *a*• And this means that any characters come before an a, and any characters after it. •And both of those stars could match emptiness, so it could start with an a or end with an a. •And we get a subset of the files that were listed – just the ones that have an a. •Or if it’s just the ones that start with an a: ls a*• So you don’t have to do it just along the lines of extensions. •You might think it only works with things like ls card.*• Those work fine, but you can also use it in more flexible ways. |
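A small sketch of the star patterns discussed here, using hypothetical file names in a scratch directory. It uses echo rather than ls only so the expanded names are easy to capture in variables.

```shell
# Sketch only: apple.txt, banana.txt and cherry.md are made-up names.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch apple.txt banana.txt cherry.md
has_a=$(echo *a*)       # an "a" anywhere in the name
starts_a=$(echo a*)     # names that begin with "a"
all_txt=$(echo *.txt)   # the familiar extension form
echo "contains a: $has_a"
echo "starts with a: $starts_a"
echo "txt files: $all_txt"
cd / && rm -rf "$demo"
```

Here cherry.md is left out of the first pattern because its name has no "a" at all.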
Lopes |
•
Besides star, are there any other wildcards that exist? |
League |
•
Sure, some of it depends on what shell you’re using and how it’s configured. •But I’ll give you 2 more of the basic ones that are always available. •One is the question mark. So, like a star, a question mark matches characters, but it only matches •exactly one character – not zero or more. So you can use it to substitute a missing character. •And a great example of that is – I have files here
that have numbers in them. •If I wanted to match all the list files with any number after them, •I could say ls list?.fig• Or it could be ls list?.*• So that question mark matching one character is useful in lots of situations. •You can also pile them together a little bit. •So if I wanted to match multiple characters, but a specific number of them, like 3 characters, •then I could put 3 question marks.
So let’s say for example ls list*.???• So the star will match anything – that’s going to match my numbers – •and then the question mark matches one character, but it does that 3 times. •So that will get any file that starts with list and has a 3 character extension. •But it would not work if there was a one or 2 character extension. |
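The question-mark patterns can be tried with a few empty files. This is a sketch under assumed names (list1.fig and friends are hypothetical, chosen to resemble the files mentioned in the episode).

```shell
# Sketch only: file names are invented for illustration.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch list1.fig list2.fig list1.png list2.c notes.fig
one_char=$(echo list?.fig)     # exactly one character after "list"
any_ext=$(echo list?.*)        # one character, then any extension
three_ext=$(echo list*.???)    # any suffix, then a 3-character extension
echo "$one_char"
echo "$any_ext"
echo "$three_ext"
cd / && rm -rf "$demo"
```

Note that list2.c drops out of the last pattern because its extension is only one character long.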
Lopes |
•
So the star could be used to search for things that you have unknown lengths for, •and the question mark is used for more precise queries? |
League |
•
Yeah, I think so. If you know exactly that it’s one character, or how many chars it needs to be, then that’s useful. •One great example is when you’re doing C – so I’m going to go over here to a little C program. •So C programs usually use the extensions .c and .h. •So you can do something like ls *.?• It turns out every file in this folder has a single-character extension, so it matched all of them. •But if these were interspersed with other files, it would allow me to select just those. •That leads me to the third kind of wildcard we can do today, which is the square brackets. •So when you put square brackets into a file expression like this, then you can put •individual characters that would match. •So if I only want to match files that end with a c or an h: ls *.[ch]• And that matches only one character, like the question mark, •but the character has to be one of the specified characters. •So this would match the .c and .h files. •You can switch the order of these, but that won’t make a difference – it’s just any character from the set. ls *.[hc]• So the order within that set doesn’t matter. |
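A short sketch of the bracket wildcard, assuming a scratch directory with made-up C-style file names. It compares the single-character question mark against a bracketed set, and checks that the order inside the brackets makes no difference.

```shell
# Sketch only: main.c, util.c, util.h, main.o and README.md are hypothetical.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch main.c util.c util.h main.o README.md
single=$(echo *.?)      # any one-character extension
c_or_h=$(echo *.[ch])   # one character, but it must be c or h
h_or_c=$(echo *.[hc])   # same set, written in the other order
echo "$single"
echo "$c_or_h"
echo "$h_or_c"
cd / && rm -rf "$demo"
```

The question mark picks up main.o as well, while the bracket set narrows the match to the .c and .h files.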
Lopes |
•
I know that besides the square brackets, we also used the squiggly braces as a wildcard. •What’s the purpose of that? |
League |
•
It’s a little bit of overlap with what we’ve already seen, but it works a little differently. •So it’s important to understand the difference. •If I use squiggly braces here, I can specify different possible extensions. •They can be more than one character, so we’re going to separate them with commas. •So if I did something like ls *.{c,h}• that is the same as with square brackets, no more power. •But let’s look at some other files up here. •So I’ve got a bunch of files that start with config, right? •And if I only wanted the .h and the .log files, I could say ls *.{h,log}• and it would only match those 2. |
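One way to see how braces differ from brackets is sketched below, with hypothetical config.* files. Brace expansion is a feature of shells such as bash and zsh rather than of every sh, so this sketch assumes bash is installed and calls it explicitly. Braces generate each alternative as text, whether or not a matching file exists.

```shell
# Sketch only: assumes bash is available; file names are made up.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch config.h config.log config.status
# Multi-character alternatives, separated by commas.
braces=$(bash -c 'echo config.{h,log}')
# config.c does not exist, yet the braces still produce the word.
generated=$(bash -c 'echo config.{c,h}')
echo "$braces"
echo "$generated"
cd / && rm -rf "$demo"
```

So a brace list passed to ls can name a file that is not there, in which case ls would complain, whereas a bracket pattern only ever expands to existing files.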
Lopes |
•
So far we’ve only used wildcards with the ls command. •Can wildcards be used with the other commands we’ve used so far? |
League |
•
Yeah, definitely. So wildcards can be used with any command. •In fact, wildcards are expanded by your shell program – the program that is interpreting all your commands. •That means they can work with commands that aren’t even necessarily programmed to use them. •So let’s try it with a couple other commands.
So cat, for example. •So if I do cat config.h• it dumps the contents of that file. •If I give it multiple files, using for example curly braces, it will just dump the contents of both files. cat config.{h,log}• And of course that scrolled off the screen, but then I can pipe it into less. •So I’m first getting the contents of both files, and then paging through them. •A command we learned last week that does something especially useful with multiple files is head. •So if I do head config.h |
Lopes |
•
It just shows the intro to that file? |
League |
•
Yeah, the first 8 or 10 lines, whatever that is. •And we can specify an option here to make it shorter or longer. head -3 config.h• But you can also give that multiple files. •So if I said head -3 config.*• it prints the name of the first file and gives me 3 lines from that file, a blank line, and then the next file. •3 lines from that file, blank line, and so on. •So it’s showing me the top couple of lines from each of multiple files. •And the multiple files are just based on the wildcard. |
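A sketch of head with a wildcard, using two small hypothetical files. The episode types head -3; the sketch uses the equivalent -n 3 spelling, which is the form POSIX specifies.

```shell
# Sketch only: file names and contents are invented.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'h1\nh2\nh3\nh4\n' > config.h
printf 'l1\nl2\nl3\nl4\n' > config.log
out=$(head -n 3 config.*)   # first 3 lines of every matching file
echo "$out"
cd / && rm -rf "$demo"
```

With more than one file, head labels each section with a header line of the form ==> config.h <== so the pieces can be told apart.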
Lopes |
•
Using head, we can see the start of each of those files. •What if instead we wanted to search throughout those files for particular words or phrases? |
League |
•
Great, there’s a perfect command for that, that you’re going to love. [Laughs] •This is one of the most powerful Unix commands that is accessible to a beginner. •And it’s called grep. •What you do with grep is give it the text to search for first. •So let’s say I want to search for a word like Copyright. •And the files I want to search in are what I put next. •So you could list multiple files here – like that – or you could use your wildcards to specify which files. •What if I just put star, all by itself? grep Copyright *• That will match any file at all in the current directory. •So this command says I want to see occurrences of the word Copyright in any file in this directory. •And what this output does – it shows us a filename, •and then in that file, it’s showing me only the lines that match the word Copyright. •So one file has several lines that match, then there’s another which has one line which matches, and so on. •So that’s the basic structure of grep. |
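The basic shape of a grep search across several files can be sketched as follows. The three files and their contents are hypothetical stand-ins for the sample directory.

```shell
# Sketch only: files and contents are made up for illustration.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'Copyright 2015 Example\nint main(void);\n' > main.c
printf 'no notice here\n' > notes.txt
printf '/* Copyright 2014 */\n/* Copyright 2015 */\n' > util.h
matches=$(grep Copyright *)   # pattern first, then the files to search
echo "$matches"
cd / && rm -rf "$demo"
```

Each output line is the filename, a colon, and the matching line; notes.txt does not appear because nothing in it matches.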
Lopes |
•
Now the output is pretty long. •Is there a better way to organize or view what we’re trying to see? |
League |
•
Yeah, one thing is that it paged off the screen, so we have to scroll up to see some of it. •And of course we know how to do a pager,
so we could pipe it to less, grep Copyright * | less• and see only one screen at a time. That’s part of it. •But something else really cool you can do is color the output – at least with the version of grep from the GNU project (which we mentioned last time as well). •So if you say double-dash color grep --color Copyright *• the filenames are in purple, the text you’re looking for is in red, and then the rest of that line is black. •And that just makes it a whole lot easier to see the different matches. |
Lopes |
•
So like most of the commands we’ve learned, it seems that grep is case-sensitive. •Is there a way to work around that? |
League |
•
Yeah, you notice here that I typed Copyright with a capital C. •And all of the matches it’s giving are a capital C. •If I searched for – oh let’s also keep the color – there are ways to specify we always want color output, •but I’m not going to get into them right now. •So I’ll just remember to put that --color. •So when I use lowercase copyright, I get the lowercase matches, •and those are different than the upper-case ones. So yeah it is case-sensitive. •But I can put a -i, which means ignore case. grep --color -i copyright *• So now it will give me every match of copyright, and some of them are lowercase and some of them are uppercase. And I think there are even ones, •if I search up a little bit, that are all caps. •Yeah here it appears in all caps, which we didn’t get by doing either of the case-sensitive searches. |
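A minimal sketch of the case-sensitivity point, assuming one hypothetical file with the word in three different capitalizations.

```shell
# Sketch only: notice.txt and its contents are invented.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'Copyright 2015\ncopyright notice\nCOPYRIGHT HOLDERS\n' > notice.txt
sensitive=$(grep copyright notice.txt)       # lowercase matches only
insensitive=$(grep -i copyright notice.txt)  # -i ignores case
echo "case-sensitive:"
echo "$sensitive"
echo "ignoring case:"
echo "$insensitive"
cd / && rm -rf "$demo"
```

The plain search finds one line, while the -i search finds all three, including the all-caps one.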
Lopes |
•
I notice that the last 2 lines that your terminal put out •didn’t seem to put out anything in regards to copyright. |
League |
•
Yeah these are errors, or warning messages. •The last one here says that a binary file matches – but it’s not useful to show you the lines of a binary file, because they won’t be understandable. •So it just says that it matches, without showing me the line where it matches. •So that explains that one. •These other ones also appeared sprinkled throughout up here. •When I specified star, that means everything in the current directory, but that includes other directories. •So grep can’t search a directory the way it searches a file. •So it just gives me a warning that one of the filenames that I included here, by typing star, is a directory, which it’s not going to look at. •There are 2 things I can do about that. •One is to just silence those types of messages. •So there’s an -s option, and you can merge that in with another switch. •So it can be part of the one dash: -is. •And if I do it that way, it doesn’t say anything about those directories, just silently ignores them. grep --color -is copyright *• So that is a little bit of a cleaner output. •The other option is you can actually ask grep to go into those directories and search all of the files within them. •So when you do that, you specify -r, for recursive. grep --color -ir copyright *• And now there are matches we didn’t see before. •And some of them have filenames with slash in them, which means it’s in a sub-directory. •So previously it just ignored the sub-directories. •But now it’s going and looking at all the files in there, also searching for copyright. •So that allows you to search many more files, very quickly. |
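The two ways of dealing with directories can be sketched like this, assuming a scratch directory with one hypothetical sub-directory. The first search merges -i and -s to stay quiet about the directory; the second uses -r to descend into it.

```shell
# Sketch only: top.txt, sub/ and inner.txt are made-up names.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir sub
echo "copyright top" > top.txt
echo "Copyright nested" > sub/inner.txt
# -s suppresses the warning about sub being a directory;
# 2>&1 captures stderr too, to show that nothing was printed there.
quiet=$(grep -is copyright * 2>&1)
# -r searches the files inside sub as well.
recursive=$(grep -ir copyright *)
echo "$quiet"
echo "$recursive"
cd / && rm -rf "$demo"
```

In the recursive output, the match from the nested file is reported with its path, sub/inner.txt, which is how a slash ends up in the filename column.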
Lopes |
•
Well now that we used grep to find the lines that match, is there any way to tell exactly where within the file that line is? |
League |
•
Yeah, that’s a great question, and very useful. •There’s a very simple option we can add to grep, which is -n. •So again, I can keep it as part of this same block or make a separate one. grep --color -ir -n copyright *• And what this will do is it adds a number after each filename here. •That number tells me what line it appeared on in that file. •So you can see that in this one file, it didn’t appear just once – it also appeared down on 167, 180, and 183. •So it gives you a sense of whether all of the instances appear in the same place, •or are they spread out more – stuff like that. •There are a couple more options related to changing the output style. •One of them is – let’s say that I only want to see what files matched. •I don’t really care to see the text of the line that matched, just which files. •So that is an option called -l. •And I’m not going to do recursive anymore, but I’ll turn the -s back on, grep --color -ils copyright *• which means suppress the error messages. •So it’s just going to give me a list of filenames that contain that word copyright. •It doesn’t show me where it matches, and the file only appears once in this list •even if it has multiple occurrences of that text. •So let’s try to search for a different word, and we’ll see different files that match. •Let’s try printf. grep --color -ils printf *• If you want to go back to the style we had before, just delete the l, grep --color -is printf *• and now we will see where it matches. •A lot of these are matching capitalized versions of printf. •So if I wanted to see only lowercase printf, I’ll get rid of the -i. grep --color -ls printf *• And here are some of those. grep --color -s printf *• So in addition to that there’s one other option that’s really cool, which is -c. •And that means to print a count of how many matches within the file. grep --color -cs printf *• But again, it doesn’t show us the lines that match – it just shows us the counts. •So that looks like this. •What’s happening here is it has a filename, and following that filename it puts a number, •which is the number of times that the match – I believe it’s actually the number of lines that match. •So if the word appears twice on the same line, that only counts once. |
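The three output styles can be compared side by side in a sketch like the one below. The files are hypothetical; main.c deliberately has one line with two occurrences of printf to show that -c counts matching lines rather than individual matches.

```shell
# Sketch only: main.c and notes.txt are invented for illustration.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'int x;\nprintf("a"); printf("b");\nreturn 0;\nprintf("c");\n' > main.c
printf 'nothing to see\n' > notes.txt
numbered=$(grep -n printf *)   # filename:line-number:text
names=$(grep -l printf *)      # only the names of files that match
counts=$(grep -c printf *)     # filename:number of matching lines
echo "$numbered"
echo "$names"
echo "$counts"
cd / && rm -rf "$demo"
```

Here main.c is counted as 2 even though printf occurs three times in it, and notes.txt is listed with a count of 0.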
Lopes |
•
I guess this is a good place to mention that, like all the other commands, •you can use the --help option to see what else it can do. |
League |
•
Definitely. So grep --help works. •And if we do grep --help | less• we can page through all of the options. •Another small tip is for when you have a phrase you want to search for rather than just a single word. •Remember that spaces are significant in command lines. •So if I put spaces – let’s say I want to search for a two-word phrase. •The problem with that is it interprets the first parameter as what you’re searching for, •and the rest as filenames that you’re searching in. •So there is no file named after the second word of the phrase. •So to do that, I can use quotes. •The same way that I quoted spaces in filenames. •So I can use quotes there to group together the whole phrase. •And then, wherever that appears will show up. •But I need the quotes to group it together. |
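A sketch of the quoting tip, using the phrase "rights reserved" purely as a hypothetical example (the phrase searched for in the episode is not shown in the transcript). Without quotes, grep treats the second word as a filename.

```shell
# Sketch only: the phrase and both files are made up.
demo=$(mktemp -d)
cd "$demo" || exit 1
echo "All rights reserved." > license.txt
echo "rights and wrongs" > essay.txt
# Unquoted: searches for "rights" in a file named "reserved" plus all others.
unquoted=$(grep rights reserved * 2>&1)
# Quoted: the whole phrase is the pattern.
quoted=$(grep "rights reserved" *)
echo "$unquoted"
echo "---"
echo "$quoted"
cd / && rm -rf "$demo"
```

The unquoted run complains about the missing file and also matches essay.txt, which only contains the first word; the quoted run matches just the line with the full phrase.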
Lopes |
•
So today we went over the wildcards as well as a lot of features that grep has. |
League |
•
Yeah, and next time I think we will look at a few more of the text processing commands. •A lot of data in Unix systems is kept in plain text files. •And these commands will allow us to process them and search them in particular ways. •And they all interact with each other very nicely. •We may also look a little bit at renaming files using the move (mv) command. •So we’ll go into some of the features of that as well. •So join us next time! |
•
[Dark electronic beat] •[Captions by Christopher League] •[End] |