Look at the various ways in which you can create arrays within "num-pie", or "num-pee" (there I go again); how you can slice arrays to extract just the rows, columns or block of data you want; how you can aggregate along an axis using mean, max, min or sum; and how you can multiply arrays together. There are two different ways to do that, and I'll explain the differences between them, and then we'll do a practical example of multiplication using Premier League football data, as it happens.
We're then going to look at how you can sort arrays and how you can join arrays together (again, there are two different ways to do that too), and finally I'll look at a whole raft of other array techniques you can use to do things like transposing arrays or filtering them, and many other things besides.
At the top right of the screen you'll be able to click on the link, which should appear about now, and that will give you access to any files or exercises to do with this tutorial. But that's enough of me: I'm going to vanish, and Sven will guide you through, as ever, the rest of the tutorial.
So let's get started. Before we begin looking at what "num-pie" is, or "num-pee" is, let's look at one of the most controversial topics, which is how you pronounce the name of the module. There are two schools of thought. Americans will probably be inclined to pronounce it "num-pie" (it is, after all, called Python). Europeans, particularly perhaps those wretched Brits, tend to pronounce it "num-pee", to rhyme with "lumpy". I can see merit in both approaches. I've probably already pronounced it in two different ways and will continue to be inconsistent, but I'm leaning towards "num-pie" because most of the modules have a "pie" sound in them.
But let's get on to more important things. The example we're going to be doing during this tutorial will be primarily around the Premier League, which is English football, basically. Hopefully people are familiar with this around the world, although you don't particularly need to be. So these are the results at the end of the 2020-21 season.
And these are the results at the end of the 2019-20 season. By happy coincidence, the same five teams finished in the top five positions, which means we can create arrays pretending that these are the only teams in the league, and then we can manipulate them to show exciting results.
So I want to look now at how you can specify arrays: dimensions, shapes, data types and axes, some of the main terms that "num-pie", or "num-pee" (there I go again), uses. So we might create an array to hold the
But some of the other things are numbered from one. Or you could create another array to hold the actual data: how many games were won, drawn and lost, goals for, goals against and goal difference. In this case this would have two dimensions. The data type would be integer (although again it would actually probably be int32), and the shape would be five by eight. And the axes: the first axis, number zero, would be going down and the second axis, number one, would be going across.
What you couldn't do is combine the two arrays together. One of the main symptoms (that's the wrong word), one of the main things you can notice about an array in NumPy is that it can only have a single data type within it, so you can't mix data types, and that's to allow them to work more quickly.
You can have arrays with more than two dimensions if you want. So this is an array which takes, at the top there, the 2020-21 results and, at the bottom, the 2019-20 results, and it combines them together into a single array. This array would have three dimensions, the data type would probably be integer or some variant of that, and the shape would be two by five by seven. And there's nothing to stop you having four, five, six, n dimensions, although you'll find it very difficult to represent on a two-dimensional sheet of paper.
So let's look at some advantages of "num-pie" (I still can't decide). One of the advantages is it's quick: it's largely written in another language, C, and compiled, so unlike some of the other modules it should run very quickly. A second advantage is there are loads of built-in functions for doing things like transposing arrays and multiplying them together, and many other things besides, many of which we'll look at during this tutorial. And a third advantage is it's used as a basis for other modules such as, for example, pandas. Now you shouldn't necessarily worry too much about this: for example, if you're learning pandas you can perfectly well use it without really understanding many of the things I'm going to tell you in this NumPy tutorial. So you might like to consider that when you're scratching or tearing your hair out trying to understand some of the more complicated examples we're going to come to.
So in order to be able to look at NumPy you need to import the module. I can type in import, then the name of the module, numpy, then as, and universally it's called np. You don't have to give it an alias like that, but everybody does. Now you'll notice it doesn't recognize numpy, and that's because I haven't yet
imported it, or rather installed it (or if I have, I then uninstalled it). So what I need to do is open up a new terminal window, as I've just done then, and type pip install numpy (and I do apologize to American viewers: I seem to be standardizing on calling it "num-pee"). If I press Return then you can see it's installing numpy. And that's finished, and the underlining and the error message have gone away, so that's good: that means I'll be able to create arrays.
But before I do that, I just want to show you the difference between a NumPy array and a normal list. I'll begin in a rather strange place by creating a single variable. What I'm going to do is create a variable called spiceboy (we'll have some Spice Girls soon) and I'll set that to be my own name, Sven. So that's a single variable. You can combine variables to create lists or tuples, which are called sequences in general. So for example I might have a variable called spice girls list, because that's what it's going to be (I've got slightly too many square brackets there), and that could contain a list of all the different Spice Girls, and then I could print it out. I don't think I'm telling you anything you didn't know already, so I'll just print out that list, and if I run that you can see it gives me the list I've just typed in.
But the critical thing about this is you can put anything you like into a list, including strings of text, numbers, dates, other lists and tuples. So just to prove this, let's add a ridiculous tuple: 1, 2 and 3. It doesn't belong in the list, but when I run the program you can see I get the elements I've added to it. So there's no concept of a list having the same data type for each element, and that's where arrays differ. So finally, what I'm going to do is create an array of Spice Girls.
They deserve to be capitalized. So to do this I can create a new variable (just like lists and tuples, arrays are held in variables). I can take my numpy module, which I've chosen to refer to as np, and then I can create an array based on that. I type in an open bracket, because this is a function which needs to take an argument, and the argument it takes, the main one, is a sequence of objects, so I've created a list there.
What I could then do is print out some information about this. Firstly, I would like to know what shape this array I've created is (I've talked a bit about shape already). Secondly, I would like to know what data type each of the elements contains. And finally, I'm going to print out the array itself. I'll just comment out that line so it doesn't interfere with anything.
So if I run this program you can see that I get the shape of it: a single dimension with five elements in. The data type, I did say it was going to be weird, is called <U6, because what it does is look at all the entries, work out that the longest one is six characters long (I think Ginger and Sporty share that distinction) and allocate six characters to each of the elements. When you create an array, it will look down the list of objects to be stored in it and always allocate the number of bytes needed by the most memory-intensive element. So in this case it's using six characters not just to store Sporty and Ginger, but also to store Posh, Scary and Baby too. So that's something to think about when you're creating an array.
And the third thing it does is print out the array itself, and you can see it's an array because there are no commas between the elements in it; otherwise it would have been a list. So that's a basic array. Let's create another one now: let's create our Premier League array.
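As a rough sketch of the comparison just described: a list happily holds mixed types, while an array settles on one data type for every element. The variable names and the exact spellings of the list items are my own assumptions rather than what was typed on screen.

```python
import numpy as np

# A list can mix types: five strings plus a tuple that doesn't belong
spice_girls_list = ["Posh", "Scary", "Sporty", "Baby", "Ginger", (1, 2, 3)]
print(spice_girls_list)

# An array holds a single data type for every element
spice_girls = np.array(["Posh", "Scary", "Sporty", "Baby", "Ginger"])

print(spice_girls.shape)   # one axis with five elements
print(spice_girls.dtype)   # Unicode strings sized for the longest name
print(spice_girls)         # printed without commas between the elements
```

The U in the data type stands for Unicode text, and the 6 is the character count of the longest entry, which every element is then allotted.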
To do this I'm going to create a variable called prem table and I'm going to set it to be another array: np.array, with some brackets, and then within the brackets I'm going to specify that I want to start a list. This array is going to be two-dimensional, so we'll have a list of lists, and if I go to my clipboard I'm hoping I will find what I'm looking for there.
So that array contains the team number, the number of games they won, drew and lost, how many goals they scored, how many goals were scored against them, and the total number of points won, although there's nothing telling you that: it's up to you to remember that information. What I could do again is print that information out, but I'm going to cheat slightly and copy these lines from above, and then select the word spice girls and type in prem table instead. I could probably do this more efficiently, but nothing springs to mind, because I want to avoid editing every occurrence of it.
If I run this program, having commented out my information on the Spice Girls, you can see that it gives me the information I wanted. The shape of this array is five by seven, so it's five rows and seven columns, although in many ways thinking of these as rows and columns can be a mistake with arrays: they're really more just axes. The data type is int32, so that's good: it means every single number I'm storing will fit into an int32 variable. There's no need to use int64 for any of these; none of the numbers are so big. And then it prints out the array itself, and that's a visual representation of a two-dimensional array. So that's how you create arrays in NumPy.
I just wanted to say a quick word about data types. I've created a file called b datatypes.py and copied, or pasted rather, my Premier League table array into that.
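For reference, here is a minimal sketch of a two-dimensional table like the one described, plus the three-dimensional version from the earlier slide that puts two seasons together. The figures are made-up placeholders, not the real league results, and the names season_a, season_b, prem_table and both_seasons are my own assumptions.

```python
import numpy as np

# Columns, by convention only (the array itself doesn't record this):
# team number, won, drawn, lost, goals for, goals against, points
# NOTE: placeholder numbers for illustration, not real results
season_a = [
    [1, 10, 2, 0, 30, 10, 32],
    [2, 9, 2, 1, 28, 12, 29],
    [3, 8, 3, 1, 25, 14, 27],
    [4, 7, 3, 2, 20, 15, 24],
    [5, 6, 4, 2, 18, 16, 22],
]
season_b = [
    [3, 9, 3, 0, 27, 11, 30],
    [1, 9, 1, 2, 26, 13, 28],
    [2, 7, 4, 1, 22, 12, 25],
    [4, 6, 4, 2, 19, 14, 22],
    [5, 6, 2, 4, 17, 18, 20],
]

# A list of lists gives a two-dimensional array
prem_table = np.array(season_a)
print(prem_table.ndim)    # number of dimensions
print(prem_table.shape)   # length of each axis
print(prem_table.dtype)   # an integer type; the width depends on platform and NumPy version

# A list of two such tables gives a three-dimensional array
both_seasons = np.array([season_a, season_b])
print(both_seasons.shape)
```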
And if I just print that out, you will see that it gives me my array. Now when you're creating an array like this, you've got the choice of a second argument. So if I just manage to position my mouse pointer just before the end of the function call and type a comma in, you can see it comes up with the IntelliSense, and you can see the second argument specifies what the data type should be. You can use this either as a positional or a named argument, so I could either type dtype equals or just put the data type in. What I'm going to do is type in np.int64, and that means it's a larger integer (more on the different data types in a second). So when I run that, you can see that it should work, but just to prove what's going on I'm also going to print out the array's data type as well, and I think we'll have a blank line between them as a bit of a treat. So if I now run that program, you can see it gives me the data type, which has now changed to int64 from what it would be by default, int32, and the array itself.
So what would happen if you tried putting in a data type which wasn't wholly compatible? Well, it would still probably work. I'm going to use bool8, which is a Boolean in NumPy (again, more on this in a second), and if I run that you'll see I get an array of Trues, because unless any of these values is zero it will be treated as True. There's one exception there, as you can see, and sure enough it shows up as a False in my array.
So what are the different data types that you can use? Well, there's a file included within this tutorial called datatypes.png, and if you go to that you can see a summary of them. I've only included integer, floating point and Boolean, because I can't see why you'd ever have an array of strings, but you can, and you can see the main data types you can use in this column. Now that's not the full story, that's just a simplification of it, so if you want the full story, in the useful links.txt file I've included a link to the full McCoy.
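To make the data type argument concrete, here is a small sketch on a made-up array (the values are placeholders chosen to include a zero, and the variable names are hypothetical). One assumption to flag: it uses np.bool_ rather than bool8, because the bool8 alias is not available in recent NumPy releases.

```python
import numpy as np

# Placeholder values, deliberately including one zero
values = [[1, 27, 0], [2, 21, 6]]

# The data type can be given as a named argument...
as_int64 = np.array(values, dtype=np.int64)
# ...or as the second positional argument
as_int64_positional = np.array(values, np.int64)

print(as_int64.dtype)
print()
print(as_int64)

# Converting to Boolean: non-zero values become True, zero becomes False
as_bool = np.array(values, dtype=np.bool_)
print(as_bool)
```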
If you go to that, you can see on the NumPy website a full list of how the different aliases work and what they actually mean underneath the scenes. But I think by this stage I've told you way more than you'd probably need to know to use NumPy, so I think it's time to move on to something else.
So we're going to look now at ways in which you can create arrays in NumPy; there are many of them. We'll look at using arange to act as the equivalent of the normal range function, which will generate a sequence of numbers. We'll look at using linspace to fill up a space with a given number of data points from a start point to an end point. We'll look at generating empty arrays, and then we'll look at generating arrays of zeros and ones, which isn't quite the same thing, as we'll see. We'll look at generating arrays of random numbers (a couple of ways of doing that, out of the many available), and we'll look at filling arrays from sequences or iterations.
So to get started, I've created a file called c ways to create arrays, and I've imported the numpy module and renamed it as np. The first thing we're going to do is look at arange, which is the array equivalent of the range function and works in exactly the same way. So what I'll do is create an array called test array and set it equal to np.arange. Now whenever I type this in, I type in "arrange", and I'm astonished to find it doesn't exist in IntelliSense, and that's because it should be arange, with one r; I guess it's the array equivalent of just a range function. You can see that the arguments are the start, the stop and the step value, so this is exactly the same as range. For example, I could create an array going from 1 up to, but not including, 11, with a step value of 2, and when I print this out you can see it gives me 1, 3, 5, 7, 9, so exactly the same as range.
The next thing we're going to do is look at linspace, which fills (well, actually, "fills a range" would be a better way of describing it). I'm going to keep reusing the same variable. This is grossly inefficient, because it means that when I run the program it keeps creating lots of different arrays and putting them in the same variable, but it saves me having to comment things out. So to do this I can use linspace. I'm going to start at the number 1 and go on to the number 10, but in between I want to create (I'm going to choose this number at random) seven data points. So now when I print this out you can see it creates seven data points: 1, 2.5, 4, 5.5, 7, 8.5 and 10. linspace will always generate floating point numbers, because it has to, to fit in the gaps.
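The two calls just described can be sketched as follows. The narration reuses a single variable; the separate names stepped and spaced are my own, used here only to keep both results visible.

```python
import numpy as np

# arange(start, stop, step): the stop value is excluded, just like range
stepped = np.arange(1, 11, 2)
print(stepped)   # [1 3 5 7 9]

# linspace(start, stop, num): both end points are included
# and the result is floating point
spaced = np.linspace(1, 10, 7)
print(spaced)
print(spaced.dtype)
```

A rule of thumb that follows from this: reach for arange when you know the step you want, and linspace when you know how many points you want.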
So, moving swiftly on, the third thing we're going to do is generate an empty array. To do that I'll use my trusty variable again and set it equal to np.empty, and the argument to the empty function, as for so many array ones, is a shape. So I can specify in brackets any shape I like; I'm going to go for 2 by 3, let's say. If I run that program you can see it generates a fairly strange-looking array.
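The call just described looks roughly like this (a sketch; test_array is my guess at how the variable is spelled):

```python
import numpy as np

# Reserve space for a 2 x 3 array without initializing its contents
test_array = np.empty((2, 3))

print(test_array.shape)
print(test_array)   # arbitrary values; don't rely on them
```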
So the idea behind this is it will create a space in memory ready for use, but it won't initialize any of the values. So if it's not initializing any of the values, if it's saving time, why on earth has it got values in it? The answer is that it's displaying whatever was left in memory. It's basically commandeered a bit of the memory in your computer, and whatever was there already, it's just showing those values as if they were numbers. So the empty function has the advantage that it's much, much quicker than creating an array full of zeros, for example, because if you do the latter you have to actually populate or initialize all the values. So that's an empty array.
Let's do some zeros and ones, none of which will surprise anyone. To do this I can put np.zeros, and again I'll specify the shape of it. This time let's have a single dimension: let's have 10 followed by a comma, where the comma signifies that this is just one dimension. If I run that you can see it gives me an array of ten zeros. There's nothing very exciting about that, and I'm afraid that ones is going to be equally unexciting: it's just going to have ones instead of zeros. But let's have a second dimension to it, a second axis rather, and if I run that you'll see my second array just contains ones. I presume the main use of these is working with things like determinants with matrices; that's all I can think of anyway. Moving swiftly on.
So let's now create some random numbers, and to do that we'll use our trusty variable to create an array. Type np.random, and when you type the dot after that you get some idea of just how many random number functions there are. We're going to use two which will cover most eventualities, I think, which are rand and randint. Let's start with rand. This is an odd function, because it's not obvious from the IntelliSense what the arguments should be, but all you do is specify the shape, not enclosed in brackets. So let's say I want something like 2 by 7. When I run that, you'll see it generates an array of random numbers, and the random numbers are between zero and one; that's what's returned by default with the rand function.
If you want integers, by contrast, then what you can do is use the randint function, so random.randint, and you can specify three arguments. The first one is where you want to start; let's say the lowest number is 1. The second one is where you want to stop, or actually the numbers stop just before it, so we'll have 10 as the highest integer. And the third one is a shape. We haven't had many three-dimensional arrays, and I think it's time to rectify that: that's 3 by 4 by 5. If I run that, you'll see I get a fairly large array, 3 by 4 by 5 (hopefully you'll agree with me about that), containing random numbers between 1 and 10, or strictly between 0 and 11: starting at 1 but not including the end point, 11. So that's random numbers.
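Pulling the last few calls together, a sketch of zeros, ones, rand and randint might look like this. The variable names are mine, and the length of the second axis for ones is an assumption, since the narration doesn't say what was typed.

```python
import numpy as np

# One axis of ten zeros; the trailing comma makes (10,) a one-element tuple
zeros = np.zeros((10,))

# Ones with a second axis (the length 2 here is an assumed value)
ones = np.ones((10, 2))

# rand takes the shape as separate arguments, not a tuple,
# and returns floats from 0 up to (but not including) 1
uniform = np.random.rand(2, 7)

# randint(low, high, size): low is included, high is excluded,
# so this gives integers from 1 to 10 in a 3 x 4 x 5 array
integers = np.random.randint(1, 11, (3, 4, 5))

print(zeros)
print(ones)
print(uniform)
print(integers)
```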
And then the last thing I was going to do in this comprehensive section on the different ways in which you can generate arrays is to generate an array from a sequence, and to do that I need a sequence. So I'm going to create one called squares. I'll use a list comprehension to generate a list of the squares, so I'll take n times n for n in range, and I'll go from 1 up to, but not including, 11, so that'll be the first ten squares. If you've no idea what that is, it's a list comprehension; there's an excellent tutorial on it, so go back to it and watch it, it's well worth watching.
What I can now do is set my variable called testarray equal to np dot, and then I can use the built-in function fromiter (well, that's what I thought it was; it actually stands for "from iteration", and it took me a worryingly long time to realize that). So I'll type in the open bracket, and then all I need to specify, you would think, is the iterable object, the sequence in this case. You'd think that would be enough, but when I run it I actually get an error message, because it's missing a required argument, the data type. Quite why it needs a data type I'm not sure (it already knows these are integers), but hey, who am I to argue? So I'll tell it it's using int32 data, and now if I run that again you should see I get a list of all the squares as an array. So you can't say that there aren't lots of ways of creating arrays using NumPy.
So we're going to look at how you can slice arrays now, and we'll start with a quick bit of revision of how you can slice sequences, because this is exactly the same principle. If you have a sequence you can specify up to three arguments, or three parts I guess I should say; they're not strictly speaking arguments.
The first part is where you want to start in your sequence. The second, optional, part is where you want to stop: it will run up to, but not including, that point. And the third, optional, part is the step, that is, how many items you want to move on by each time. So you can specify one, two or three of those parts, and that's how it works for a sequence.
Now the difference with an array is that it's possibly even more complicated, because what you can do for an array is have an additional slice after a comma, saying how you want to treat the second, and then the third, and then the fourth axis in the array. So you can slice each axis of the array independently, using exactly the same syntax.
So let's see how this works in practice. I've created a file called d slicingarrays.py, I've imported my numpy module and I've created a prem table array. What I'm going to do is three or four examples of how you can slice this.
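Before the Premier League examples, here is a hedged sketch of the slicing syntax on neutral data. It reuses the list of squares and the fromiter call from the previous section; the 5 by 7 stand-in table built with arange and reshape is my own convenience (reshape has not been covered in the narration), and all the variable names are assumptions.

```python
import numpy as np

# First ten squares, built with a list comprehension
squares = [n * n for n in range(1, 11)]

# Sequence slicing: [start:stop:step], with the stop position excluded
print(squares[0:3])   # [1, 4, 9]
print(squares[::2])   # [1, 9, 25, 49, 81]

# The same values as an array; fromiter needs the data type spelled out
square_array = np.fromiter(squares, dtype=np.int32)
print(square_array[2:5])

# A stand-in 5 x 7 table holding the values 0 to 34 (not league data)
table = np.arange(35).reshape(5, 7)

# One slice per axis, separated by commas
first_three_rows = table[0:3]        # second axis not mentioned: all of it
block = table[1:4, 2:5]              # rows 1 to 3, columns 2 to 4
every_other_column = table[:, ::2]   # all rows, columns 0, 2, 4 and 6

print(first_three_rows)
print(block)
print(every_other_column)
```

In each case the number of items you get back along an axis is the stop minus the start (divided by the step, rounded up), which is a handy check when the indexes start to blur.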
So I'll start with printing the first three teams. I'll put in a heading to that effect, with a line break before it just to make it easier to read, and then I'm going to print out the Premier League table, but not all of it. I'll put a square bracket in to show that I want to slice this, and I'm going to start at the beginning, so I'll start at zero, and I'll stop just before the fourth row, and the fourth row, because they're numbered from zero, is denoted as number three. But as I explained in my section on sequence slicing, the best thing to do is take the difference between those two numbers: three minus zero is three, so I'll get three rows out. I don't actually need to put the zero in; it's the default anyway, so it will start at the beginning. If I try running that now, you'll see I get the first three teams. I didn't specify any slicing for my horizontal, my second, axis, and so it assumes I want everything in that.
So let's try another example. This time let's print out all the key info; I'll explain what I mean by that in a second, just after typing in a comment. So what do I mean by key info? Well, I don't want the first column, because that just contains a team number, which is nothing to do with how the teams did, and I don't want the last column giving their points, because I can infer that from how many games they won.