Commit 528d73a1 authored by Raphael Bacher's avatar Raphael Bacher
Browse files

add broadcast

parent 97dd052b
......@@ -74,7 +74,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
......@@ -1141,14 +1141,216 @@
"\n",
"For more info, see the related FAQ entry: https://www.scipy.org/scipylib/faq.html#why-both-numpy-linalg-and-scipy-linalg-what-s-the-difference."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Broadcasting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Broadcasting is the name given to the method that NumPy uses to allow array arithmetic between arrays with a different shape or size.\n",
"\n",
"Although the technique was developed for NumPy, it has also been adopted more broadly in other numerical computational libraries, such as Theano, TensorFlow, and Octave.\n",
"\n",
"Broadcasting solves the problem of arithmetic between arrays of differing shapes by in effect replicating the smaller array along the last mismatched dimension.\n",
"\n",
" The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.\n",
"\n",
"— Broadcasting, SciPy.org\n",
"\n",
"See : https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1 2 3]\n",
"2\n",
"[3 4 5]\n"
]
}
],
"source": [
"# scalar and one-dimensional\n",
"a = np.array([1, 2, 3])\n",
"print(a)\n",
"b = 2\n",
"print(b)\n",
"c = a + b\n",
"print(c)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1 2 3]\n",
" [1 2 3]]\n",
"2\n",
"[[3 4 5]\n",
" [3 4 5]]\n"
]
}
],
"source": [
"# scalar and two-dimensional\n",
"A = np.array([[1, 2, 3], [1, 2, 3]])\n",
"print(A)\n",
"b = 2\n",
"print(b)\n",
"C = A + b\n",
"print(C)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1 2 3]\n",
" [1 2 3]]\n",
"[1 2 3]\n",
"[[2 4 6]\n",
" [2 4 6]]\n"
]
}
],
"source": [
"# one-dimensional and two-dimensional\n",
"A = np.array([[1, 2, 3], [1, 2, 3]])\n",
"print(A)\n",
"b = np.array([1, 2, 3])\n",
"print(b)\n",
"C = A + b\n",
"print(C)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Limitations of Broadcasting\n",
"\n",
"Broadcasting is a handy shortcut that proves very useful in practice when working with NumPy arrays.\n",
"\n",
"That being said, it does not work for all cases, and in fact imposes a strict rule that must be satisfied for broadcasting to be performed.\n",
"\n",
"Arithmetic, including broadcasting, can only be performed when the shape of each dimension in the arrays are equal or one has the dimension size of 1. The dimensions are considered in reverse order, starting with the trailing dimension; for example, looking at columns before rows in a two-dimensional case."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Vectorization\n",
"\n",
"- The vectorize function : https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html\n",
"- The art of rewriting problems"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([3, 4, 1, 2])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Vetorize a function\n",
"def myfunc(a, b):\n",
" \"Return a-b if a>b, otherwise return a+b\"\n",
" if a > b:\n",
" return a - b\n",
" else:\n",
" return a + b\n",
"\n",
"vfunc = np.vectorize(myfunc)\n",
"vfunc([1, 2, 3, 4], 2)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 95 µs, sys: 0 ns, total: 95 µs\n",
"Wall time: 78.7 µs\n",
"CPU times: user 619 µs, sys: 0 ns, total: 619 µs\n",
"Wall time: 608 µs\n"
]
}
],
"source": [
"# Vectorize a problem\n",
"\n",
"def dist_loop(X,Y):\n",
" \"\"\"\n",
" Naive loop\n",
" \"\"\"\n",
" res=0\n",
" for x,y in zip(X,Y):\n",
" res += (x-y)**2\n",
" \n",
" return np.sqrt(res)\n",
"\n",
" \n",
"def dist(x,y): \n",
" return np.sqrt(np.sum((x-y)**2))\n",
"\n",
"a = np.random.random(1000)\n",
"b = np.random.random(1000)\n",
"%time dist_a_b = dist(a,b)\n",
"%time dist_a_b_loop = dist_loop(a,b)\n",
"assert np.isclose(dist_a_b, dist_a_b_loop)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"celltoolbar": "Diaporama",
"kernelspec": {
"display_name": "Python 3",
"display_name": "jnb-rise",
"language": "python",
"name": "python3"
"name": "jnb-rise"
},
"language_info": {
"codemirror_mode": {
......@@ -1160,7 +1362,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
"version": "3.7.3"
}
},
"nbformat": 4,
......
%% Cell type:markdown id: tags:
# Python training UGA 2017
**A training to acquire strong basis in Python to use it efficiently**
Pierre Augier (LEGI), Cyrille Bonamy (LEGI), Eric Maldonado (Irstea), Franck Thollard (ISTerre), Oliver Henriot (GRICAD), Christophe Picard (LJK), Loïc Huder (ISTerre)
# Python scientific ecosystem
# A short introduction to Numpy, Scipy and Pandas
%% Cell type:markdown id: tags:
## Python scientific ecosystem
There are a lot of very good Python packages for sciences. The fundamental packages are in particular:
- [numpy](http://www.numpy.org/): numerical computing with powerful numerical arrays objects, and routines to manipulate them.
- [scipy](http://www.scipy.org/): high-level numerical routines. Optimization, regression, interpolation, etc.
- [matplotlib](http://matplotlib.org/): 2D-3D visualization, “publication-ready” plots.
With `IPython` and `Spyder`, Python plus these fundamental scientific packages constitutes a very good alternative to Matlab, that is technically very similar (using the libraries Blas and Lapack). Matlab has a JustInTime (JIT) compiler so that Matlab code is generally faster than Python. However, we will see that Numpy is already quite efficient for standard operations and other Python tools (for example `pypy`, `cython`, `numba`, `pythran`, `theano`...) can be used to optimize the code to reach the performance of optimized Matlab code.
The advantage of Python over Matlab is its high polyvalency (and nicer syntax) and there are notably several other scientific Python packages (see our notebook `pres13_doc_applications.ipynb`):
%% Cell type:markdown id: tags:
- [sympy](http://www.sympy.org) for symbolic computing,
- [pandas](http://pandas.pydata.org/), [statsmodels](http://www.statsmodels.org), [seaborn](http://seaborn.pydata.org/) for statistics,
- [h5py](http://www.h5py.org/), [h5netcdf](https://pypi.python.org/pypi/h5netcdf) for hdf5 and netcdf files,
- [mpi4py](https://pypi.python.org/pypi/mpi4py) for MPI communications,
- [opencv](https://pypi.python.org/pypi/opencv-python), [scikit-image](http://scikit-image.org/) for image processing,
- [pyopencl](https://pypi.python.org/pypi/pyopencl), [pycuda](https://mathema.tician.de/software/pycuda/), [theano](http://deeplearning.net/software/theano/), [tensorflow](https://www.tensorflow.org/) for speed and GPU computing,
- [scikit-learn](http://scikit-learn.org), [keras](https://keras.io/), [mxnet](http://mxnet.io/) for machine learning,
- [bokeh](http://bokeh.pydata.org) for display data efficiently,
- [mayavi](http://docs.enthought.com/mayavi/mayavi/) for 3D visualization,
- [qtpy](https://pypi.python.org/pypi/QtPy), [kivy](https://kivy.org) for GUI frameworks
- ...
%% Cell type:markdown id: tags:
## A short introduction on NumPy
Code using `numpy` usually starts with the import statement
%% Cell type:code id: tags:
``` python
import numpy as np
```
%% Cell type:markdown id: tags:
NumPy provides the type `np.ndarray`. Such array are multidimensionnal sequences of homogeneous elements. They can be created for example with the commands:
%% Cell type:code id: tags:
``` python
# from a list
l = [10.0, 12.5, 15.0, 17.5, 20.0]
np.array(l)
```
%%%% Output: execute_result
array([10. , 12.5, 15. , 17.5, 20. ])
%% Cell type:code id: tags:
``` python
# fast but the values can be anything
np.empty(4)
```
%%%% Output: execute_result
array([1.27880790e-316, 0.00000000e+000, 6.91986808e-310, 1.57378525e-316])
%% Cell type:code id: tags:
``` python
# slower than np.empty but the values are all 0.
np.zeros([2, 6])
```
%%%% Output: execute_result
array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]])
%% Cell type:code id: tags:
``` python
# multidimensional array
a = np.ones([2, 3, 4])
print(a.shape, a.size, a.dtype)
a
```
%%%% Output: stream
(2, 3, 4) 24 float64
%%%% Output: execute_result
array([[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]],
[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]]])
%% Cell type:code id: tags:
``` python
# like range but produce 1D numpy array
np.arange(4)
```
%%%% Output: execute_result
array([0, 1, 2, 3])
%% Cell type:code id: tags:
``` python
# np.arange can produce arrays of floats
np.arange(4.)
```
%%%% Output: execute_result
array([0., 1., 2., 3.])
%% Cell type:code id: tags:
``` python
# another convenient function to generate 1D arrays
np.linspace(10, 20, 5)
```
%%%% Output: execute_result
array([10. , 12.5, 15. , 17.5, 20. ])
%% Cell type:markdown id: tags:
A NumPy array can be easily converted to a Python list.
%% Cell type:code id: tags:
``` python
a = np.linspace(10, 20 ,5)
list(a)
```
%%%% Output: execute_result
[10.0, 12.5, 15.0, 17.5, 20.0]
%% Cell type:code id: tags:
``` python
# Or even better
a.tolist()
```
%%%% Output: execute_result
[10.0, 12.5, 15.0, 17.5, 20.0]
%% Cell type:markdown id: tags:
# NumPy efficiency
Beside some convenient functions for the manipulation of data in arrays of arbritrary dimensions, `numpy` can be much more efficient than pure Python.
%% Cell type:code id: tags:
``` python
n = 1000
# we use the ipython magic command %timeit
%timeit list(range(n))
```
%%%% Output: stream
15.6 µs ± 1.59 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%% Cell type:code id: tags:
``` python
%%capture timeit_python
# to capture the result of the command timeit in the variable timeit_python
# Pure Python
%timeit list(range(n))
```
%% Cell type:code id: tags:
``` python
%%capture timeit_numpy
# numpy
%timeit np.arange(n)
```
%% Cell type:code id: tags:
``` python
def compute_time_in_second(timeit_result):
string = timeit_result.stdout
print(string)
for line in string.split('\n'):
words = line.split(' ')
if len(words) > 1:
time = float(words[0])
unit = words[1]
if unit == 'ms':
time *= 1e-3
elif unit == 'us':
time *= 1e-6
elif unit == 'ns':
time *= 1e-9
return time
def compare_times(string, timeit_python, timeit_numpy):
time_python = compute_time_in_second(timeit_python)
time_numpy = compute_time_in_second(timeit_numpy)
print(string + ': ratio times (Python / NumPy): ',
time_python/time_numpy)
```
%% Cell type:code id: tags:
``` python
compare_times('Creation of object', timeit_python, timeit_numpy)
```
%%%% Output: stream
12.7 us +- 1.14 us per loop (mean +- std. dev. of 7 runs, 100000 loops each)
1.31 us +- 112 ns per loop (mean +- std. dev. of 7 runs, 1000000 loops each)
Creation of object: ratio times (Python / NumPy): 9.694656488549617
%% Cell type:code id: tags:
``` python
n = 200000
python_r_1 = range(n)
python_r_2 = range(n)
numpy_a_1 = np.arange(n)
numpy_a_2 = np.arange(n)
```
%% Cell type:code id: tags:
``` python
%%capture timeit_python
%%timeit
# Regular Python
[(x + y) for x, y in zip(python_r_1, python_r_2)]
```
%% Cell type:code id: tags:
``` python
print(timeit_python)
```
%%%% Output: stream
16.6 ms +- 220 us per loop (mean +- std. dev. of 7 runs, 100 loops each)
%% Cell type:code id: tags:
``` python
%%capture timeit_numpy
%%timeit
#Numpy
numpy_a_1 + numpy_a_2
```
%% Cell type:code id: tags:
``` python
print(timeit_numpy)
```
%%%% Output: stream
246 us +- 16.7 us per loop (mean +- std. dev. of 7 runs, 1000 loops each)
%% Cell type:code id: tags:
``` python
compare_times('Additions', timeit_python, timeit_numpy)
```
%%%% Output: stream
16.6 ms +- 220 us per loop (mean +- std. dev. of 7 runs, 100 loops each)
246 us +- 16.7 us per loop (mean +- std. dev. of 7 runs, 1000 loops each)
Additions: ratio times (Python / NumPy): 67.47967479674797
%% Cell type:markdown id: tags:
This shows that when you need to perform mathematical operations on a lot of homogeneous numbers, it is more efficient to use `numpy` arrays.
%% Cell type:markdown id: tags:
# Manipulating NumPy arrays
%% Cell type:markdown id: tags:
## Access elements
Elements in a `numpy` array can be accessed using indexing and slicing in any dimension. It also offers the same functionalities available in Fortan or Matlab.
### Indexes and slices
For example, we can create an array `A` and perform any kind of selection operations on it.
%% Cell type:code id: tags:
``` python
A = np.random.random([4, 5])
A
```
%%%% Output: execute_result
array([[0.89925962, 0.31519992, 0.17170063, 0.06102236, 0.6055506 ],
[0.43365108, 0.67461267, 0.34962124, 0.75648088, 0.53096922],
[0.65643503, 0.4723704 , 0.77202087, 0.50192904, 0.14067726],
[0.80709755, 0.2314217 , 0.65465368, 0.28459125, 0.54727527]])
%% Cell type:code id: tags:
``` python
# Get the element from second line, first column
A[1, 0]
```
%%%% Output: execute_result
0.4336510750584107
%% Cell type:code id: tags:
``` python
# Get the first two lines
A[:2]
```
%%%% Output: execute_result
array([[0.89925962, 0.31519992, 0.17170063, 0.06102236, 0.6055506 ],
[0.43365108, 0.67461267, 0.34962124, 0.75648088, 0.53096922]])
%% Cell type:code id: tags:
``` python
# Get the last column
A[:, -1]
```
%%%% Output: execute_result
array([0.6055506 , 0.53096922, 0.14067726, 0.54727527])
%% Cell type:code id: tags:
``` python
# Get the first two lines and the columns with an even index
A[:2, ::2]
```
%%%% Output: execute_result
array([[0.89925962, 0.17170063, 0.6055506 ],
[0.43365108, 0.34962124, 0.53096922]])
%% Cell type:markdown id: tags:
### Using a mask to select elements validating a condition:
%% Cell type:code id: tags:
``` python
cond = A > 0.5
print(cond)
print(A[cond])
```
%%%% Output: stream
[[ True False False False True]
[False True False True True]
[ True False True True False]
[ True False True False True]]
[0.89925962 0.6055506 0.67461267 0.75648088 0.53096922 0.65643503
0.77202087 0.50192904 0.80709755 0.65465368 0.54727527]
%% Cell type:markdown id: tags:
The mask is in fact a particular case of the advanced indexing capabilities provided by NumPy. For example, it is even possible to use lists for indexing:
%% Cell type:code id: tags:
``` python
# Selecting only particular columns
print(A)
A[:, [0, 1, 4]]
```
%%%% Output: stream
[[0.89925962 0.31519992 0.17170063 0.06102236 0.6055506 ]
[0.43365108 0.67461267 0.34962124 0.75648088 0.53096922]
[0.65643503 0.4723704 0.77202087 0.50192904 0.14067726]
[0.80709755 0.2314217 0.65465368 0.28459125 0.54727527]]
%%%% Output: execute_result
array([[0.89925962, 0.31519992, 0.6055506 ],
[0.43365108, 0.67461267, 0.53096922],
[0.65643503, 0.4723704 , 0.14067726],
[0.80709755, 0.2314217 , 0.54727527]])
%% Cell type:markdown id: tags:
## Perform array manipulations
### Apply arithmetic operations to whole arrays (element-wise):
%% Cell type:code id: tags:
``` python
(A+5)**2
```
%%%% Output: execute_result
array([[34.80126403, 28.25135024, 26.7464874 , 25.61394735, 31.42219749],
[29.52456401, 32.20122896, 28.61844741, 33.13707212, 30.59162046],
[31.99525724, 29.94683782, 33.31622493, 30.27122313, 26.42656267],
[33.72238198, 27.36777304, 31.97510827, 27.92690466, 30.77226288]])
%% Cell type:markdown id: tags:
### Apply functions element-wise:
%% Cell type:code id: tags:
``` python
np.exp(A) # With numpy arrays, use the functions from numpy !
```
%%%% Output: execute_result
array([[2.45778274, 1.37053329, 1.18732233, 1.06292268, 1.83226077],
[1.54288042, 1.9632724 , 1.41853016, 2.13076459, 1.70057974],
[1.92790714, 1.60379132, 2.16413527, 1.65190478, 1.15105309],
[2.24139301, 1.26039064, 1.92447592, 1.3292186 , 1.72853679]])
%% Cell type:markdown id: tags:
### Setting parts of arrays
%% Cell type:code id: tags:
``` python
A[:, 0] = 0.
print(A)
```
%%%% Output: stream
[[0. 0.31519992 0.17170063 0.06102236 0.6055506 ]
[0. 0.67461267 0.34962124 0.75648088 0.53096922]
[0. 0.4723704 0.77202087 0.50192904 0.14067726]
[0. 0.2314217 0.65465368 0.28459125 0.54727527]]
%% Cell type:code id: tags:
``` python
# BONUS: Safe element-wise inverse with masks
cond = (A != 0)
A[cond] = 1./A[cond]
print(A)
```
%%%% Output: stream
[[ 0. 3.17258959 5.82409047 16.387435 1.65138967]
[ 0. 1.48233207 2.86023812 1.32191048 1.88334836]
[ 0. 2.11698277 1.29530177 1.99231351 7.10846954]
[ 0. 4.32111589 1.5275252 3.51381149 1.82723405]]
%% Cell type:markdown id: tags:
### Attributes and methods of `np.ndarray` (see the [doc](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray))
%% Cell type:code id: tags:
``` python
print([s for s in dir(A) if not s.startswith('__')])
```
%%%% Output: stream
['T', 'all', 'any', 'argmax', 'argmin', 'argpartition', 'argsort', 'astype', 'base', 'byteswap', 'choose', 'clip', 'compress', 'conj', 'conjugate', 'copy', 'ctypes', 'cumprod', 'cumsum', 'data', 'diagonal', 'dot', 'dtype', 'dump', 'dumps', 'fill', 'flags', 'flat', 'flatten', 'getfield', 'imag', 'item', 'itemset', 'itemsize', 'max', 'mean', 'min', 'nbytes', 'ndim', 'newbyteorder', 'nonzero', 'partition', 'prod', 'ptp', 'put', 'ravel', 'real', 'repeat', 'reshape', 'resize', 'round', 'searchsorted', 'setfield', 'setflags', 'shape', 'size', 'sort', 'squeeze', 'std', 'strides', 'sum', 'swapaxes', 'take', 'tobytes', 'tofile', 'tolist', 'tostring', 'trace', 'transpose', 'var', 'view']
%% Cell type:code id: tags:
``` python
# Ex1: Get the mean through different dimensions
print(A)
print('Mean value', A.mean())
print('Mean line', A.mean(axis=0))
print('Mean column', A.mean(axis=1))
```
%%%% Output: stream
[[ 0. 3.17258959 5.82409047 16.387435 1.65138967]
[ 0. 1.48233207 2.86023812 1.32191048 1.88334836]
[ 0. 2.11698277 1.29530177 1.99231351 7.10846954]
[ 0. 4.32111589 1.5275252 3.51381149 1.82723405]]
Mean value 2.9143043986324475
Mean line [0. 2.77325508 2.87678889 5.80386762 3.1176104 ]
Mean column [5.40710095 1.50956581 2.50261352 2.23793733]
%% Cell type:code id: tags:
``` python
# Ex2: Convert a 2D array in 1D keeping all elements
print(A, A.shape)
A_flat = A.flatten()
print(A_flat, A_flat.shape)
```
%%%% Output: stream
[[ 0. 3.17258959 5.82409047 16.387435 1.65138967]
[ 0. 1.48233207 2.86023812 1.32191048 1.88334836]
[ 0. 2.11698277 1.29530177 1.99231351 7.10846954]
[ 0. 4.32111589 1.5275252 3.51381149 1.82723405]] (4, 5)
[ 0. 3.17258959 5.82409047 16.387435 1.65138967 0.
1.48233207 2.86023812 1.32191048 1.88334836 0. 2.11698277
1.29530177 1.99231351 7.10846954 0. 4.32111589 1.5275252
3.51381149 1.82723405] (20,)
%% Cell type:markdown id: tags:
### Remark: dot product
%% Cell type:code id: tags:
``` python
b = np.linspace(0, 10, 11)
c = b @ b
# before 3.5:
# c = b.dot(b)
print(b)
print(c)
```
%%%% Output: stream
[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
385.0
%% Cell type:markdown id: tags:
#### For Matlab users
| ` ` | Matlab | Numpy |
| ------------- | ------ | ----- |
| element wise | `.*` | `*` |
| dot product | `*` | `@` |
%% Cell type:markdown id: tags:
### To finish: `dtypes` and sub-packages
%% Cell type:markdown id: tags:
`numpy` arrays can also be sorted, even when they are composed of complex data if the type of the columns are explicitly stated with `dtypes`.
%% Cell type:code id: tags:
``` python
dtypes = np.dtype([('country', 'S20'), ('density', 'i4'),
('area', 'i4'), ('population', 'i4')])
x = np.array([('Netherlands', 393, 41526, 16928800),
('Belgium', 337, 30510, 11007020),
('United Kingdom', 256, 243610, 62262000),
('Germany', 233, 357021, 81799600)],
dtype=dtypes)
arr = np.array(x, dtype=dtypes)
arr.sort(order='density')
print(arr)
```
%%%% Output: stream
[(b'Germany', 233, 357021, 81799600)
(b'United Kingdom', 256, 243610, 62262000)
(b'Belgium', 337, 30510, 11007020)
(b'Netherlands', 393, 41526, 16928800)]
%% Cell type:markdown id: tags:
In the previous example, we manipulated a one dimensional array containing quadruplets of data. This functionality can be used to load images into arrays and save arrays to images.
It can also be used when loading data of different types from a file with `np.genfromtxt`.
%% Cell type:markdown id: tags:
#### NumPy and SciPy sub-packages:
We already saw `numpy.random` to generate `numpy` arrays filled with random values. This submodule also provides functions related to distributions (Poisson, gaussian, etc.) and permutations.
%% Cell type:markdown id: tags:
To perform linear algebra with dense matrices, we can use the submodule `numpy.linalg`. For instance, in order to compute the determinant of a random matrix, we use the method `det`
%% Cell type:code id: tags:
``` python
A = np.random.random([5,5])
print(A)
np.linalg.det(A)
```
%%%% Output: stream
[[0.47138506 0.41353868 0.09441948 0.225147 0.82335198]
[0.04490952 0.14682972 0.31792846 0.22918746 0.73823443]
[0.50485749 0.99705961 0.51896582 0.93318595 0.11375617]
[0.37148317 0.0477689 0.29061475 0.41826056 0.47950005]
[0.70324502 0.82838271 0.92172528 0.79532669 0.56698101]]
%%%% Output: execute_result
0.06968780805887545
%% Cell type:code id: tags:
``` python
squared_subA = A[1:3, 1:3]
print(squared_subA)
np.linalg.inv(squared_subA)
```
%%%% Output: stream
[[0.14682972 0.31792846]
[0.99705961 0.51896582]]
%%%% Output: execute_result
array([[-2.15522717, 1.32033369],
[ 4.14071576, -0.6097731 ]])
%% Cell type:markdown id: tags:
If the data are sparse matrices, instead of using `numpy`, it is recommended to use `scipy`.
%% Cell type:code id: tags:
``` python
from scipy.sparse import csr_matrix
print(csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]]))
```
%%%% Output: stream
(0, 0) 1
(0, 1) 2
(1, 2) 3
(2, 0) 4
(2, 2) 5
%% Cell type:markdown id: tags:
#### SciPy or NumPy ?
`scipy` also provides a submodule for linear algebra `scipy.linalg`. It provides an extension of `numpy.linalg`.
For more info, see the related FAQ entry: https://www.scipy.org/scipylib/faq.html#why-both-numpy-linalg-and-scipy-linalg-what-s-the-difference.
%% Cell type:markdown id: tags:
### Broadcasting
%% Cell type:markdown id: tags:
Broadcasting is the name given to the method that NumPy uses to allow array arithmetic between arrays with a different shape or size.
Although the technique was developed for NumPy, it has also been adopted more broadly in other numerical computational libraries, such as Theano, TensorFlow, and Octave.
Broadcasting solves the problem of arithmetic between arrays of differing shapes by in effect replicating the smaller array along the last mismatched dimension.
The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.
— Broadcasting, SciPy.org
See : https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html
%% Cell type:code id: tags:
``` python
# scalar and one-dimensional
a = np.array([1, 2, 3])
print(a)
b = 2
print(b)
c = a + b
print(c)
```
%%%% Output: stream
[1 2 3]
2
[3 4 5]
%% Cell type:code id: tags:
``` python
# scalar and two-dimensional
A = np.array([[1, 2, 3], [1, 2, 3]])
print(A)
b = 2
print(b)
C = A + b
print(C)
```
%%%% Output: stream
[[1 2 3]
[1 2 3]]
2
[[3 4 5]
[3 4 5]]
%% Cell type:code id: tags:
``` python
# one-dimensional and two-dimensional
A = np.array([[1, 2, 3], [1, 2, 3]])
print(A)
b = np.array([1, 2, 3])
print(b)
C = A + b
print(C)
```
%%%% Output: stream
[[1 2 3]
[1 2 3]]
[1 2 3]
[[2 4 6]
[2 4 6]]
%% Cell type:markdown id: tags:
#### Limitations of Broadcasting
Broadcasting is a handy shortcut that proves very useful in practice when working with NumPy arrays.
That being said, it does not work for all cases, and in fact imposes a strict rule that must be satisfied for broadcasting to be performed.
Arithmetic, including broadcasting, can only be performed when the shape of each dimension in the arrays are equal or one has the dimension size of 1. The dimensions are considered in reverse order, starting with the trailing dimension; for example, looking at columns before rows in a two-dimensional case.
%% Cell type:markdown id: tags:
### Vectorization
- The vectorize function : https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html
- The art of rewriting problems
%% Cell type:code id: tags:
``` python
# Vetorize a function
def myfunc(a, b):
"Return a-b if a>b, otherwise return a+b"
if a > b:
return a - b
else:
return a + b
vfunc = np.vectorize(myfunc)
vfunc([1, 2, 3, 4], 2)
```
%%%% Output: execute_result
array([3, 4, 1, 2])
%% Cell type:code id: tags:
``` python
# Vectorize a problem
def dist_loop(X,Y):
"""
Naive loop
"""
res=0
for x,y in zip(X,Y):
res += (x-y)**2
return np.sqrt(res)
def dist(x,y):
return np.sqrt(np.sum((x-y)**2))
a = np.random.random(1000)
b = np.random.random(1000)
%time dist_a_b = dist(a,b)
%time dist_a_b_loop = dist_loop(a,b)
assert np.isclose(dist_a_b, dist_a_b_loop)
```
%%%% Output: stream
CPU times: user 95 µs, sys: 0 ns, total: 95 µs
Wall time: 78.7 µs
CPU times: user 619 µs, sys: 0 ns, total: 619 µs
Wall time: 608 µs
%% Cell type:code id: tags:
``` python
```
......
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment