Commit 61b8b54c authored by Loic Huder

Changed example of dict comprehension

parent 3f7d1271
%% Cell type:markdown id: tags:
# Python training UGA 2017
**A training to acquire a strong basis in Python and use it efficiently**
Pierre Augier (LEGI), Cyrille Bonamy (LEGI), Eric Maldonado (Irstea), Franck Thollard (ISTerre), Christophe Picard (LJK), Loïc Huder (ISTerre)
# [Data structures](https://docs.python.org/3/tutorial/datastructures.html)
4 built-in containers: list, tuple, set and dict...
For more containers: see [collections](https://docs.python.org/3/library/collections.html)...
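As a brief illustration of what `collections` offers, here is a minimal sketch using two of its containers, `namedtuple` and `deque`. The names `Point`, `p` and `queue` are only chosen for this example.

```python
from collections import deque, namedtuple

# namedtuple: a tuple whose fields can also be accessed by name
Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
print(p.x, p[1])

# deque: a sequence with fast appends and pops at both ends
queue = deque([1, 2, 3])
queue.appendleft(0)
queue.append(4)
print(queue.popleft(), queue.pop(), list(queue))
```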
%% Cell type:markdown id: tags:
### list: mutable sequence
Lists are mutable, ordered sequences that can hold objects of different types. They can be viewed as an array of references (nearly pointers) to objects.
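A minimal sketch of the "array of references" view: binding a list to a second name copies the reference, not the elements. The names `a`, `b` and `c` are only illustrative.

```python
# Two names bound to the same list object
a = [1, 2, 3]
b = a            # no copy: b refers to the same object as a
b.append(4)
print(a)         # the change is visible through both names
print(a is b)

# A shallow copy creates a new list holding the same references
c = a.copy()
c.append(5)
print(a, c)
print(a is c)
```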
%% Cell type:code id: tags:
``` python
# 2 equivalent ways to define an empty list
l0 = []
l1 = list()
assert l0 == l1
# not empty lists
l2 = ['a', 2]
l3 = list(range(3))
print(l2, l3, l2 + l3)
print(3 * l2)
```
%% Output
['a', 2] [0, 1, 2] ['a', 2, 0, 1, 2]
['a', 2, 'a', 2, 'a', 2]
%% Cell type:markdown id: tags:
The [`itertools`](https://docs.python.org/3/library/itertools.html) module provides other ways of iterating over a list or over several lists (e.g. cartesian product, permutations, filtering, ...).
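For instance, a short sketch with three functions of `itertools` (`product`, `permutations` and `chain`). The variable names are only chosen for this example.

```python
from itertools import chain, permutations, product

letters = ['a', 'b']
numbers = [1, 2]

# Cartesian product of two lists
pairs = list(product(letters, numbers))
print(pairs)

# All orderings of a list
orderings = list(permutations([1, 2, 3]))
print(len(orderings))

# Iterate over several lists as if they were one
merged = list(chain(letters, numbers))
print(merged)
```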
%% Cell type:markdown id: tags:
### list: mutable sequence
The built-in function `dir` returns the list of the names of an object's attributes. For a list, these are Python special attributes (with double underscores) and 11 public methods:
%% Cell type:code id: tags:
``` python
print(dir(l3))
```
%% Output
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
%% Cell type:code id: tags:
``` python
l3.append(10)
print(l3)
l3.reverse()
print(l3)
```
%% Output
[0, 1, 2, 10]
[10, 2, 1, 0]
%% Cell type:code id: tags:
``` python
# Built-in functions applied on lists
# return the smallest value
print(min(l3))
# return the largest value
print(max(l3))
# return a new sorted list
print(sorted([5, 2, 10, 0]))
```
%% Output
0
10
[0, 2, 5, 10]
%% Cell type:code id: tags:
``` python
# "pasting" two lists can be done using zip
l1 = [1, 2, 3]
s = 'abc'
print(list(zip(l1, s)))
print(list(zip('abc', 'defg')))
```
%% Output
[(1, 'a'), (2, 'b'), (3, 'c')]
[('a', 'd'), ('b', 'e'), ('c', 'f')]
%% Cell type:markdown id: tags:
### `list`: list comprehension
Lists are iterable, so they are often used in loops. We have already seen how to use the keyword `for`, for example to build a new list (side note: `x**2` computes `x^2`):
%% Cell type:code id: tags:
``` python
l0 = [1, 4, 10]
l1 = []
for number in l0:
    l1.append(number**2)
print(l1)
```
%% Output
[1, 16, 100]
%% Cell type:markdown id: tags:
There is a more readable (and slightly more efficient) method to do such things, the "list comprehension":
%% Cell type:code id: tags:
``` python
l1 = [number**2 for number in l0]
print(l1)
```
%% Output
[1, 16, 100]
%% Cell type:code id: tags:
``` python
# list comprehension with a condition
[s for s in ['a', 'bbb', 'e'] if len(s) == 1]
```
%% Output
['a', 'e']
%% Cell type:code id: tags:
``` python
# list comprehensions can be nested
[(x, y) for x in [1, 2] for y in ['a', 'b']]
```
%% Output
[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
%% Cell type:markdown id: tags:
### Do it yourself (advanced)
- Write a function `extract_patterns(text, n=3)` extracting the list of patterns of size `n=3` from a long string (e.g. if `text = "basically"`, patterns would be the list `['bas', 'asi', 'sic', ..., 'lly']`). Use list comprehension, range, slicing. Use a sliding window.
- You can apply your function to a long "lorem ipsum" string (ask your favorite web search engine).
%% Cell type:markdown id: tags:
#### A possible solution
%% Cell type:code id: tags:
``` python
text = "basically"
def extract_patterns(text, n=3):
    # sliding window of width n over the string
    pat = [text[i:i+n] for i in range(len(text)-n+1)]
    return pat
print("patterns=", extract_patterns(text))
print("patterns=", extract_patterns(text, n=5))
```
%% Output
patterns= ['bas', 'asi', 'sic', 'ica', 'cal', 'all', 'lly']
patterns= ['basic', 'asica', 'sical', 'icall', 'cally']
%% Cell type:markdown id: tags:
### `tuple`: immutable sequence
Tuples are very similar to lists but they are immutable (they cannot be modified).
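A small sketch of what immutability means in practice: item assignment raises a `TypeError`. Note that a tuple can still contain a mutable object, which can itself be modified. The names `t` and `t_mixed` are only illustrative.

```python
t = (1, 2, 'a')
try:
    t[0] = 10          # item assignment is not supported by tuples
except TypeError as error:
    print('TypeError:', error)

# A tuple can still hold a mutable object, which can itself change
t_mixed = (1, [2, 3])
t_mixed[1].append(4)
print(t_mixed)
```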
%% Cell type:code id: tags:
``` python
# 2 equivalent notations to define an empty tuple (not very useful...)
t0 = ()
t1 = tuple()
assert t0 == t1
# not empty tuple
t2 = (1, 2, 'a') # with the parentheses
t2 = 1, 2, 'a' # it also works without parentheses
t3 = tuple(l3) # from a list
```
%% Cell type:code id: tags:
``` python
# tuples only have 2 public methods (with a list comprehension)
[name for name in dir(t3) if not name.startswith('__')]
```
%% Output
['count', 'index']
%% Cell type:code id: tags:
``` python
# assignment of multiple variables in 1 line
a, b = 1, 2
print(a, b)
# exchange of values
b, a = a, b
print(a, b)
```
%% Output
1 2
2 1
%% Cell type:markdown id: tags:
### `tuple`: immutable sequence
Tuples are used *a lot* with the keyword `return` in functions:
%% Cell type:code id: tags:
``` python
def myfunc():
    return 1, 2, 3
t = myfunc()
print(type(t), t)
# Directly unpacking the tuple
a, b, c = myfunc()
print(a, b, c)
```
%% Output
<class 'tuple'> (1, 2, 3)
1 2 3
%% Cell type:markdown id: tags:
### `set`: a hashtable
Unordered collections of unique elements (a hashtable). Sets are mutable. The elements of a set must be [hashable](https://docs.python.org/3/glossary.html#term-hashable).
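To illustrate the hashability requirement, here is a minimal sketch: a tuple of hashable items or a `str` can be added to a set, while a list cannot. The name `hashable_items` is only chosen for this example.

```python
hashable_items = set()
hashable_items.add((1, 2))     # a tuple of hashable items is hashable
hashable_items.add('abc')      # so is a str

try:
    hashable_items.add([1, 2]) # a list is mutable, hence not hashable
except TypeError as error:
    print('TypeError:', error)

print(len(hashable_items))
```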
%% Cell type:code id: tags:
``` python
s0 = set()
```
%% Cell type:code id: tags:
``` python
{1, 1, 1, 3}
```
%% Output
{1, 3}
%% Cell type:code id: tags:
``` python
set([1, 1, 1, 3])
```
%% Output
{1, 3}
%% Cell type:code id: tags:
``` python
s1 = {1, 2}
s2 = {2, 3}
print(s1.intersection(s2))
print(s1.union(s2))
```
%% Output
{2}
{1, 2, 3}
%% Cell type:markdown id: tags:
### `set`: lookup
Hashtable lookup (for example `1 in s1`) is algorithmically efficient (average complexity O(1)), i.e. theoretically faster than a lookup in a list or a tuple (complexity O(n) for n elements).
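The difference can be observed with a rough timing sketch based on `timeit`. The size and the number of repetitions are arbitrary choices, and the measured times depend on the machine, so no particular values should be expected.

```python
from timeit import timeit

size = 100_000
as_list = list(range(size))
as_set = set(as_list)
target = size - 1      # worst case for the list: the last element

# Both containers give the same answer...
print(target in as_list, target in as_set)

# ...but the list has to be scanned, while the set uses the hash
t_list = timeit(lambda: target in as_list, number=100)
t_set = timeit(lambda: target in as_set, number=100)
print(f'list: {t_list:.2e} s, set: {t_set:.2e} s')
```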
%% Cell type:code id: tags:
``` python
print(1 in s1, 1 in s2)
```
%% Output
True False
%% Cell type:markdown id: tags:
### What is a hashtable?
https://en.wikipedia.org/wiki/Hash_table
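To give an idea of the principle, here is a toy sketch of a hash table with buckets. The class `ToyHashSet` is hypothetical and written only for illustration; it is not how CPython implements `set` or `dict`.

```python
class ToyHashSet:
    """Toy hash table with a fixed number of buckets (illustration only)."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, item):
        # hash() maps the item to an integer; the modulo picks a bucket
        return self.buckets[hash(item) % len(self.buckets)]

    def add(self, item):
        bucket = self._bucket(item)
        if item not in bucket:
            bucket.append(item)

    def __contains__(self, item):
        # Only one (short) bucket is scanned, not the whole table
        return item in self._bucket(item)


toy = ToyHashSet()
for word in ['list', 'tuple', 'set', 'dict', 'set']:
    toy.add(word)

print('set' in toy, 'array' in toy)
print(sum(len(bucket) for bucket in toy.buckets))
```

The lookup cost depends on the length of one bucket rather than on the total number of elements, which is why it stays close to constant when the buckets are kept short.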
%% Cell type:code id: tags:
``` python
from random import shuffle, randint
n = 20
i = randint(0, n-1)
print('integer remove from the list:', i)
l = list(range(n))
l.remove(i)
shuffle(l)
print('shuffled list: ', l)
```
%% Output
integer remove from the list: 12
shuffled list: [6, 15, 7, 1, 0, 8, 2, 19, 10, 17, 11, 3, 9, 14, 16, 4, 18, 13, 5]
%% Cell type:markdown id: tags:
## DIY: back to the "find the removed element" problem
- Could the problem be solved using a set?
- What is the complexity of this solution?
%% Cell type:markdown id: tags:
## A possible solution:
%% Cell type:code id: tags:
``` python
full_set = set(range(n))
changed_set = set(l)
ns = full_set - changed_set
ns.pop()
```
%% Output
12
%% Cell type:markdown id: tags:
## Complexity:
- line 1: n insertions --> O(n)
- line 2: n insertions --> O(n)
- line 3: one traversal, O(n), with one O(1) lookup at each step --> O(n)

-> Complexity of the whole algorithm: O(n)
## Complexity of the "sum" solution:
- One traversal to compute the sum, O(n), with one O(1) addition at each step --> O(n)
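The "sum" solution is not shown in this section. Assuming it refers to comparing the sum of the list with the sum of all integers from 0 to n-1, it could be sketched as follows, next to the set-based solution. The names `removed`, `numbers`, `found_by_sum` and `found_by_set` are only chosen for this example.

```python
from random import randint, shuffle

n = 20
removed = randint(0, n - 1)
numbers = list(range(n))
numbers.remove(removed)
shuffle(numbers)

# Sum of 0, 1, ..., n-1 is n*(n-1)/2; the difference is the missing integer
found_by_sum = n * (n - 1) // 2 - sum(numbers)

# Set-based solution, for comparison
found_by_set = (set(range(n)) - set(numbers)).pop()

print(removed, found_by_sum, found_by_set)
```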
%% Cell type:markdown id: tags:
### `dict`: collection of key: value pairs
The dictionary (`dict`) is a very important data structure in Python. All namespaces are (nearly) dictionaries and "Namespaces are one honking great idea -- let's do more of those!" (The zen of Python).
A dict is a hashtable (a set) + associated values.
%% Cell type:code id: tags:
``` python
d = {}
d['b'] = 2
d['a'] = 1
print(d)
```
%% Output
{'b': 2, 'a': 1}
%% Cell type:code id: tags:
``` python
d = {'a': 1, 'b': 2, 0: False, 1: True}
print(d)
```
%% Output
{'a': 1, 'b': 2, 0: False, 1: True}
%% Cell type:markdown id: tags:
### Tip: parallel between `dict` and `list`
You can first think about `dict` as a super `list` which can be indexed with other objects than integers (and in particular with `str`).
%% Cell type:code id: tags:
``` python
l = ["value0", "value1"]
l.append("value2")
print(l)
```
%% Output
['value0', 'value1', 'value2']
%% Cell type:code id: tags:
``` python
l[1]
```
%% Output
'value1'
%% Cell type:code id: tags:
``` python
d = {"key0": "value0", "key1": "value1"}
d["key2"] = "value2"
print(d)
```
%% Output
{'key0': 'value0', 'key1': 'value1', 'key2': 'value2'}
%% Cell type:code id: tags:
``` python
d["key1"]
```
%% Output
'value1'
%% Cell type:markdown id: tags:
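But be careful: before Python 3.7, `dict` did not guarantee any order (it is based on a hashtable). Since Python 3.7, a `dict` preserves the insertion order of its keys, but it still cannot be indexed by position.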
%% Cell type:markdown id: tags:
### `dict`: public methods
%% Cell type:code id: tags:
``` python
# dicts have 11 public methods (with a list comprehension)
[name for name in dir(d) if not name.startswith('__')]
```
%% Output
['clear',
'copy',
'fromkeys',
'get',
'items',
'keys',
'pop',
'popitem',
'setdefault',
'update',
'values']
%% Cell type:markdown id: tags:
### `dict`: different ways to loop over a dictionary
%% Cell type:code id: tags:
``` python
# loop with items
for key, value in d.items():
    if isinstance(key, str):
        print(key, value)
```
%% Output
key0 value0
key1 value1
key2 value2
%% Cell type:code id: tags:
``` python
# loop with values
for value in d.values():
    print(value)
```
%% Output
value0
value1
value2
%% Cell type:code id: tags:
``` python
# loop with keys
for key in d.keys():
    print(key)
```
%% Output
key0
key1
key2
%% Cell type:code id: tags:
``` python
# dict comprehension (here to change the case of values)
print(d)
d1 = {k: v.upper() for k, v in d.items()}
print(d1)
```
%% Output
{'key0': 'value0', 'key1': 'value1', 'key2': 'value2'}
{'key0': 'VALUE0', 'key1': 'VALUE1', 'key2': 'VALUE2'}
%% Cell type:markdown id: tags:
## Do it yourself:
Write a function that returns a dictionary containing the number of occurrences of letters in a text.
%% Cell type:code id: tags:
``` python
text = 'abbbcc'
```
%% Cell type:markdown id: tags:
#### A possible solution:
%% Cell type:code id: tags:
``` python