" - One traversal for the computation of the sum O(n) with sum at each step O(1) -> O(n) "
]
},
{
...
...
@@ -1010,6 +1012,96 @@
"- Given a query pattern of size 2, propose the pattern of size 3 with the same prefix that has the highest frequency. Filter the keys of the previous dictionary so that they start with the query pattern."
"s = \"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam tristique at velit in varius. Cras ut ultricies orci. Fusce vel consequat ante, vitae luctus tortor. Sed condimentum faucibus enim, sit amet pulvinar ligula feugiat ac. Sed interdum id risus id rhoncus. Nullam nisi justo, ultrices eu est nec, hendrerit maximus lorem. Nam urna eros, accumsan nec magna eu, elementum semper diam. Nulla tempus, nibh id elementum dapibus, ex diam lacinia est, sit amet suscipit nulla nibh eu sapien. Aliquam orci enim, malesuada in facilisis vitae, pharetra sit amet mi. Pellentesque mi tortor, sagittis quis odio quis, fermentum faucibus ex. Aenean sagittis nisl orci. Maecenas tristique velit sed leo facilisis porttitor. \"\n",
Lists are mutable ordered tables of inhomogeneous objects. They can be viewed as an array of references (nearly pointers) to objects.
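The "array of references" view can be illustrated with a minimal sketch. The names `inner` and `outer` are only illustrative and do not come from the notebook. Because a list stores references, the same object can appear several times, and a change to it is visible through every reference.

```python
# A list stores references to objects, not copies of them.
inner = [1, 2]
outer = [inner, inner, "text", 3.5]  # inhomogeneous content is allowed

# Mutating the object referenced by `inner` is visible through `outer`
inner.append(3)
print(outer)

# Both entries refer to the very same object
same_object = outer[0] is outer[1]
print(same_object)
```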
%% Cell type:code id: tags:
``` python
# 2 equivalent ways to define an empty list
l0 = []
l1 = list()
assert l0 == l1
# non-empty lists
l2 = ['a', 2]
l3 = list(range(3))
print(l2, l3, l2 + l3)
print(3 * l2)
```
%%%% Output: stream
['a', 2] [0, 1, 2] ['a', 2, 0, 1, 2]
['a', 2, 'a', 2, 'a', 2]
%% Cell type:markdown id: tags:
The `itertools` module provides other ways of iterating over lists or sets of lists (e.g. Cartesian product, permutations, filter, ...): https://docs.python.org/3/library/itertools.html
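As a small illustration, here is a sketch using three functions that exist in the standard `itertools` module (`product`, `permutations` and `chain`). The input values are arbitrary examples.

```python
import itertools

# Cartesian product: same pairs as two cascaded `for` loops
pairs = list(itertools.product([1, 2], ["a", "b"]))
print(pairs)

# All orderings of a small list
perms = list(itertools.permutations([1, 2, 3]))
print(len(perms))

# Iterate over several lists as if they were a single one
chained = list(itertools.chain([1, 2], [3], [4, 5]))
print(chained)
```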
%% Cell type:markdown id: tags:
### `list`: mutable sequence
The builtin function `dir` returns a list of the names of an object's attributes. For a list, these attributes are Python system attributes (with double underscores) and 11 public methods.
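The public method names can be listed with a short sketch like the following, which filters out the names starting with a double underscore (the variable name `public_methods` is only illustrative):

```python
# Keep only the attribute names that do not start with a double underscore
public_methods = [name for name in dir(list) if not name.startswith("__")]
print(public_methods)
print(len(public_methods))
```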
Lists are iterable, so they are often used in loops. We have already seen how to use the keyword `for`, for example to build a new list (side note: `x**2` computes `x^2`):
%% Cell type:code id: tags:
``` python
l0 = [1, 4, 10]
l1 = []
for number in l0:
    l1.append(number**2)
print(l1)
```
%%%% Output: stream
[1, 16, 100]
%% Cell type:markdown id: tags:
There is a more readable (and slightly more efficient) method to do such things, the "list comprehension":
%% Cell type:code id: tags:
``` python
l1 = [number**2 for number in l0]
print(l1)
```
%%%% Output: stream
[1, 16, 100]
%% Cell type:code id: tags:
``` python
# list comprehension with a condition
[s for s in ['a', 'bbb', 'e'] if len(s) == 1]
```
%%%% Output: execute_result
['a', 'e']
%% Cell type:code id: tags:
``` python
# list comprehensions can be cascaded
[(x, y) for x in [1, 2] for y in ['a', 'b']]
```
%%%% Output: execute_result
[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
%% Cell type:markdown id: tags:
### Do it yourself (advanced)
- Write a function `extract_patterns(text, n=3)` that extracts the list of patterns of size `n=3` from a long string (e.g. if `text = "basically"`, the patterns would be the list `['bas', 'asi', 'sic', ..., 'lly']`). Use a list comprehension, `range` and slicing to implement a sliding window.
- You can apply your function to a long "lorem ipsum" string (ask your favorite web search engine).
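One possible sketch of such a function is given here, as a suggestion rather than the reference solution. It assumes that a text shorter than `n` should give an empty list.

```python
def extract_patterns(text, n=3):
    """Return all substrings of length n, using a sliding window."""
    # The last window starts at index len(text) - n
    return [text[i:i + n] for i in range(len(text) - n + 1)]


patterns = extract_patterns("basically")
print(patterns)
```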
Tuples are very similar to lists, but they are immutable (they cannot be modified).
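A minimal sketch of what immutability means in practice (the variable names are only illustrative): item assignment raises a `TypeError`, so a "modified" tuple has to be built as a new object.

```python
t = (1, 2, "a")
try:
    t[0] = 10  # item assignment is not supported by tuples
except TypeError as error:
    message = str(error)
    print("TypeError:", message)

# A "modified" tuple has to be built as a new object
t_new = (10,) + t[1:]
print(t_new)
```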
%% Cell type:code id: tags:
``` python
# 2 equivalent notations to define an empty tuple (not very useful...)
t0 = ()
t1 = tuple()
assert t0 == t1
# non-empty tuple
t2 = (1, 2, 'a')  # with parentheses
t2 = 1, 2, 'a'  # it also works without parentheses
t3 = tuple(l3)  # from a list
```
%% Cell type:code id: tags:
``` python
# tuples only have 2 public methods (listed here with a list comprehension)
[name for name in dir(t3) if not name.startswith('__')]
```
%%%% Output: execute_result
['count', 'index']
%% Cell type:code id: tags:
``` python
# assignment of multiple variables in 1 line
a, b = 1, 2
print(a, b)
# exchange of values
b, a = a, b
print(a, b)
```
%%%% Output: stream
1 2
2 1
%% Cell type:markdown id: tags:
### `tuple`: immutable sequence
Tuples are used *a lot* with the keyword `return` in functions:
%% Cell type:code id: tags:
``` python
def myfunc():
    return 1, 2, 3
t = myfunc()
print(type(t), t)
# Directly unpacking the tuple
a, b, c = myfunc()
print(a, b, c)
```
%%%% Output: stream
<class 'tuple'> (1, 2, 3)
1 2 3
%% Cell type:markdown id: tags:
### `set`: a hashtable
Unordered collections of unique elements (a hashtable). Sets are mutable. The elements of a set must be [hashable](https://docs.python.org/3/glossary.html#term-hashable).
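The hashability requirement can be checked with a small sketch (the names `elements` and `added` are only illustrative): immutable builtin objects can be stored in a set, whereas a list cannot.

```python
# Immutable builtin objects (int, str, tuple of hashables) are hashable
elements = {1, "a", (2, 3)}
print(len(elements))

# A list is mutable, hence not hashable: it cannot be a set element
try:
    elements.add([4, 5])
    added = True
except TypeError as error:
    added = False
    print("TypeError:", error)
```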
%% Cell type:code id: tags:
``` python
s0=set()
```
%% Cell type:code id: tags:
``` python
{1,1,1,3}
```
%%%% Output: execute_result
{1, 3}
%% Cell type:code id: tags:
``` python
set([1,1,1,3])
```
%%%% Output: execute_result
{1, 3}
%% Cell type:code id: tags:
``` python
s1 = {1, 2}
s2 = {2, 3}
print(s1.intersection(s2))
print(s1.union(s2))
```
%%%% Output: stream
{2}
{1, 2, 3}
%% Cell type:markdown id: tags:
### `set`: lookup
Hashtable lookup (for example `1 in s1`) is algorithmically efficient (average complexity O(1)), i.e. theoretically faster than a lookup in a list or a tuple (complexity O(n) for a sequence of size n).
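The difference can be observed with a rough timing sketch based on the standard `timeit` module. The container size and the number of repetitions are arbitrary choices, and the measured times depend on the machine.

```python
import timeit

size = 100_000
as_list = list(range(size))
as_set = set(as_list)
missing = -1  # absent value: the list lookup must scan every element

# Both containers give the same answer...
print(missing in as_list, missing in as_set)

# ...but the cost differs (timings depend on the machine)
t_list = timeit.timeit(lambda: missing in as_list, number=100)
t_set = timeit.timeit(lambda: missing in as_set, number=100)
print(f"list: {t_list:.5f} s, set: {t_set:.5f} s")
```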
## DIY: back to the "find the removed element" problem
- Could the problem be solved using a set?
- What is the complexity of this solution?
%% Cell type:markdown id: tags:
## A possible solution:
%% Cell type:code id: tags:
``` python
full_set = set(range(n))
changed_set = set(l)
full_set - changed_set
ns = full_set - changed_set
ns.pop()
```
%%%% Output: execute_result
{5}
3
%% Cell type:markdown id: tags:
## Complexity:
- line 1: n insertions --> O(n)
- line 2: n insertions --> O(n)
- line 3: one traversal, O(n), with one lookup at each step, O(1) --> O(n)
-> Complexity of the whole algorithm: O(n)
# Note
# Complexity of the "sum" solution:
- One traversal for the computation of the sum, O(n), with one addition at each step, O(1) --> O(n)
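Both approaches can be compared on a small self-contained sketch. The setup is an assumption: the values of `n` and `removed` are hypothetical, and `l` is assumed to contain the integers `0` to `n-1` in random order with one of them removed.

```python
import random

n = 10       # hypothetical size of the full range
removed = 5  # hypothetical removed value
l = [x for x in range(n) if x != removed]
random.shuffle(l)

# Set-based solution: O(n) to build both sets, O(n) for the difference
missing_from_set = (set(range(n)) - set(l)).pop()

# Sum-based solution: the sum of 0..n-1 is n*(n-1)/2, minus one O(n) traversal
missing_from_sum = n * (n - 1) // 2 - sum(l)

print(missing_from_set, missing_from_sum)
```

Both solutions are O(n) in time, but the sum-based one only needs O(1) extra memory, whereas the set-based one builds two sets of size about n.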
%% Cell type:markdown id: tags:
### `dict`: unordered set of key: value pairs
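The dictionary (`dict`) is a very important data structure in Python. All namespaces are (nearly) dictionaries and "Namespaces are one honking great idea -- let's do more of those!" (The Zen of Python).
A dict is a hashtable (like a set) with a value associated with each key. Note that, although a dict is best thought of as an unordered mapping, since Python 3.7 it preserves the insertion order of its keys.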
%% Cell type:code id: tags:
``` python
d = {}
d['b'] = 2
d['a'] = 1
print(d)
```
%%%% Output: stream
{'b': 2, 'a': 1}
%% Cell type:code id: tags:
``` python
d = {'a': 1, 'b': 2, 0: False, 1: True}
print(d)
```
%%%% Output: stream
{'a': 1, 'b': 2, 0: False, 1: True}
%% Cell type:markdown id: tags:
### Tip: parallel between `dict` and `list`
You can first think about `dict` as a super `list` which can be indexed with other objects than integers (and in particular with `str`).
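The parallel can be made concrete with a minimal sketch (the names `ages_list` and `ages_dict` and their values are made up for the illustration):

```python
# A list is indexed by consecutive integers starting at 0
ages_list = [31, 25]
print(ages_list[0])

# A dict can be indexed by any hashable object, for example a str
ages_dict = {"alice": 31, "bob": 25}
print(ages_dict["alice"])

# New keys can be added by simple assignment
ages_dict["carol"] = 40
print(len(ages_dict))

# With a list, assigning to a non-existing index raises an IndexError
try:
    ages_list[2] = 40
    assigned = True
except IndexError:
    assigned = False
print(assigned)
```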
- For a text, count the occurrences of each pattern (using a dictionary).
- Given a query pattern of size 2, propose the pattern of size 3 with the same prefix that has the highest frequency. Filter the keys of the previous dictionary so that they start with the query pattern.
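For the second item, one possible sketch is given here. The function name `suggest` and the sample text are hypothetical, and `collections.Counter` is used for the counting step only to keep the example short.

```python
import collections


def suggest(query, counts):
    """Return the most frequent key of `counts` starting with `query`, or None."""
    # Keep only the patterns sharing the query prefix
    candidates = {k: v for k, v in counts.items() if k.startswith(query)}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


text = "basically a basic base"
# Patterns of size 3 obtained with a sliding window
patterns = [text[i:i + 3] for i in range(len(text) - 2)]
counts = collections.Counter(patterns)
print(suggest("ba", counts))
```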
%% Cell type:code id: tags:
``` python
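import collections


def build_count_base(t):
    # explicit membership test before each increment
    d = {}
    for s in t:
        if s in d:
            d[s] += 1
        else:
            d[s] = 1
    return d


def build_count_set(t):
    # initialize every key to 0, then count in a second pass
    d = {k: 0 for k in set(t)}
    for s in t:
        d[s] += 1
    return d


def build_count_count(t):
    # one call to `count` (a full traversal) per distinct element
    d = {k: t.count(k) for k in set(t)}
    return d


def build_count_excpt(t):
    # "ask forgiveness": a missing key raises a KeyError
    d = {}
    for s in t:
        try:
            d[s] += 1
        except KeyError:
            d[s] = 1
    return d


def build_count_counter(t):
    return collections.Counter(t)


def build_count_defaultdict(t):
    # missing keys are created with the default value int() == 0
    d = collections.defaultdict(int)
    for k in t:
        d[k] += 1
    return d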
s="Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam tristique at velit in varius. Cras ut ultricies orci. Fusce vel consequat ante, vitae luctus tortor. Sed condimentum faucibus enim, sit amet pulvinar ligula feugiat ac. Sed interdum id risus id rhoncus. Nullam nisi justo, ultrices eu est nec, hendrerit maximus lorem. Nam urna eros, accumsan nec magna eu, elementum semper diam. Nulla tempus, nibh id elementum dapibus, ex diam lacinia est, sit amet suscipit nulla nibh eu sapien. Aliquam orci enim, malesuada in facilisis vitae, pharetra sit amet mi. Pellentesque mi tortor, sagittis quis odio quis, fermentum faucibus ex. Aenean sagittis nisl orci. Maecenas tristique velit sed leo facilisis porttitor. "