"By predicting the class with the most observations in the dataset (M or mines) the Zero Rule Algorithm can achieve an accuracy of 53%.\n",
"\n",
"You can learn more about this dataset at the [UCI Machine Learning repository documentation](http://archive.ics.uci.edu/ml/datasets/connectionist+bench+(sonar,+mines+vs.+rocks)). You can download the dataset for free and place it in your working directory with the filename `sonar.all-data.csv` (also available in the GitLab repo of the course)."
]
},
{
...
...
%% Cell type:markdown id: tags:
This notebook can be run on [mybinder](https://mybinder.org/v2/git/https%3A%2F%2Fgricad-gitlab.univ-grenoble-alpes.fr%2Fchatelaf%2Fml-sicom3a/master?urlpath=lab/tree/notebooks/8_Trees_Boosting/N1_Classif_tree.ipynb/N3_c_Random_forests_Sonar_Data.ipynb).
%% Cell type:markdown id: tags:
## SONAR DATA
This dataset describes sonar chirp returns bouncing off different surfaces. The 60 input variables are the strengths of the returns at different angles. It is a binary classification problem: the model must differentiate rocks from metal cylinders.
It is a well-understood dataset. All of the variables are continuous and generally in the range 0 to 1, so we do not need to normalize the input data (a step that is often good practice, for example with the Perceptron algorithm). The output variable is a string, “M” for mine and “R” for rock, which needs to be converted to integer class labels (0 and 1).
By always predicting the class with the most observations in the dataset (M, for mines), the Zero Rule algorithm achieves an accuracy of about 53%.
You can learn more about this dataset at the [UCI Machine Learning repository documentation](http://archive.ics.uci.edu/ml/datasets/connectionist+bench+(sonar,+mines+vs.+rocks)). You can download the dataset for free and place it in your working directory with the filename `sonar.all-data.csv` (also available in the GitLab repo of the course).
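The Zero Rule baseline mentioned above can be sketched in a few lines. This is only an illustration, not part of the original notebook: the helper names `zero_rule_predict` and `accuracy` are hypothetical, and the labels are synthetic, built from the class counts reported for this dataset (111 mines, 97 rocks) rather than read from the file.

```python
from collections import Counter


def zero_rule_predict(train_labels, n_predictions):
    """Predict the most frequent training label for every new observation."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return [most_common_label] * n_predictions


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Synthetic labels (assumption): 111 mines ("M") and 97 rocks ("R")
labels = ["M"] * 111 + ["R"] * 97
predictions = zero_rule_predict(labels, len(labels))
baseline_accuracy = accuracy(labels, predictions)
print("Zero Rule accuracy = {:.1%}".format(baseline_accuracy))  # 111 / 208
```

Any classifier trained on this data should beat this baseline to be considered useful.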
%% Cell type:code id: tags:
``` python
from csv import reader

import numpy as np


# Load a CSV file into a list of rows, skipping empty lines
def load_csv(filename):
    dataset = list()
    with open(filename, "r") as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset


# Convert a string column to float, in place
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())


# Convert a string column to integer codes, in place, and return the mapping
def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    # Sort the unique values so the label-to-integer mapping is the same on every run
    unique = sorted(set(class_values))
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup


# Load and prepare the data: every column except the last holds a feature
filename = "sonar.all-data.csv"
dataset = load_csv(filename)
for i in range(len(dataset[0]) - 1):
    str_column_to_float(dataset, i)
# Convert the string class labels (last column) to integers
str_column_to_int(dataset, len(dataset[0]) - 1)
print("size of sonar dataset = {}".format(np.asarray(dataset).shape))

# Split each row into its features (X) and its class label (y)
dataset_X = list()
dataset_y = list()
for i in range(len(dataset)):
    dataset_X.append(dataset[i][:-1])
    dataset_y.append(dataset[i][-1])
```
%%%% Output: stream
size of sonar dataset = (208, 61)
%% Cell type:markdown id: tags:
### About the Sonar data file
The file contains 111 patterns obtained by bouncing sonar signals off a metal cylinder at various angles and under various conditions. The file also contains 97 patterns obtained from rocks under similar conditions. The transmitted sonar signal is a frequency-modulated chirp, rising in frequency. The data set contains signals obtained from a variety of different aspect angles, spanning 90 degrees for the cylinder and 180 degrees for the rock.
Each pattern is a set of 60 numbers in the range 0.0 to 1.0. Each number represents the energy within a particular frequency band, integrated over a certain period of time. The integration aperture for higher frequencies occurs later in time, since these frequencies are transmitted later during the chirp.
These experiments were conducted to evaluate the feasibility of detecting mines or pipes on the sea floor.
See the [UCI Machine Learning repository documentation](http://archive.ics.uci.edu/ml/datasets/connectionist+bench+(sonar,+mines+vs.+rocks)).
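
The figures quoted in this description (class counts, and values in the range 0.0 to 1.0) can be checked once the file is loaded. The sketch below is a hedged illustration only: `class_counts` and `feature_range` are hypothetical helpers, and `example_rows` is made-up data in the same layout as the file (features first, class label last).

```python
from collections import Counter


def class_counts(rows):
    """Count how often each class label (last column of every row) occurs."""
    return Counter(row[-1] for row in rows)


def feature_range(rows):
    """Return the smallest and largest feature value over all rows."""
    values = [float(value) for row in rows for value in row[:-1]]
    return min(values), max(values)


# Made-up rows (assumption) standing in for the real file
example_rows = [
    [0.02, 0.37, "R"],
    [0.45, 0.11, "M"],
    [0.26, 0.58, "M"],
]
counts = class_counts(example_rows)
low, high = feature_range(example_rows)
print(counts, low, high)
```

Called on the loaded `dataset`, `class_counts` should report 111 and 97 if the file matches the description above; the keys are integer codes if it is called after `str_column_to_int`.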