Commit f34e4f1c authored by Tien

Fixed typos

parent c24b0927
......@@ -26,12 +26,14 @@
\par\end{flushleft}
\vskip 20em
}
\preauthor{\begin{flushright}}
\postauthor{\end{flushright}}
\predate{
\begin{flushleft}
\rule{\linewidth}{0.5mm}
\end{flushleft}
\begin{flushright}\large\scshape}
\postdate{\par\end{flushright}}
%%%%%
\begin{document}
......@@ -57,8 +59,20 @@ Cyril Labb\'e \\
\begin{table}[ht]
\caption*{Revision History}
\begin{tabular}{|c|c|c|c|}
\hline
\hline
Version & Date & Author & Comment \\ [0.5ex]
\hline
1.4 & 13-02-2015 & MT & Initial deployment \\
1.41 & 17-02-2015 & MT & Added support for XML and XTX \\
2.0 & 25-02-2015 & MT & Added multiple configurable parameters \\
\hline
\end{tabular}
\label{table:nonlin}
\end{table}
\tableofcontents
......@@ -125,16 +139,16 @@ New subdirectories can be added. This can be done for two purpose:
Threshold_Scigen 0.48 0.56
\end{lstlisting}
A line starting with {\tt Threshold\_Dirname} defines the thresholds used to decide whether a tested text is assigned to the class whose examples are found in the directory {\tt Dirname}. There should be one line (i.e. two thresholds) per class. These values are two real numbers between 0 and 1. The smaller one is used to decide that the tested paper (almost certainly) belongs to the class. The larger one is used as a threshold for suspecting that the paper contains parts of generated text.
The previous example (concerning the Scigen class) has the following meaning. Given the distance from the tested text to its nearest neighbour in the set of samples (i.e. texts found in the Scigen dir):
\begin{itemize}
\item If the distance is greater than 0.56, then it is reasonable to believe that this is a genuine article.
\item Between 0.48 and 0.56, there is a chance that this article, or part of it, is Scigen generated.
\item If the distance is less than 0.48, there is a very high chance that this is an automatically generated Scigen article.
\end{itemize}
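
The three cases above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the function name {\tt classify\_distance} and the verdict labels are hypothetical, and treating a distance exactly equal to a threshold as suspicious is an assumption, since the text does not say which side the boundaries fall on.

```python
def classify_distance(distance, low=0.48, high=0.56):
    """Map a nearest-neighbour distance to one of three verdicts.

    low and high are the two values of a Threshold_Dirname line.
    Boundary handling (a distance equal to a threshold counts as
    "suspicious") is an assumption, not taken from the manual.
    """
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= low <= high <= 1")
    if distance < low:
        return "generated"    # very likely produced by the generator
    if distance <= high:
        return "suspicious"   # may contain generated parts
    return "genuine"          # reasonably believed to be a real article
```

With the Scigen thresholds from the example, a distance of 0.30 would be reported as generated, 0.50 as suspicious and 0.70 as genuine.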
If new samples are added to the sample folder, the corresponding threshold configuration should also be added; otherwise the default threshold values are used (0.48 and 0.56).
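
The fallback to default values could look like the following Python sketch. It assumes the configuration file consists of whitespace-separated lines such as {\tt Threshold\_Scigen 0.48 0.56}, as in the listing above. The names {\tt parse\_thresholds}, {\tt thresholds\_for} and {\tt DEFAULT\_THRESHOLDS} are hypothetical and do not come from the tool itself.

```python
# Default pair used when a class has no Threshold_ line (values from the manual).
DEFAULT_THRESHOLDS = (0.48, 0.56)


def parse_thresholds(config_text):
    """Collect Threshold_<Dirname> lines into {dirname: (low, high)}."""
    thresholds = {}
    for line in config_text.splitlines():
        parts = line.split()
        # Only lines of the form "Threshold_<Dirname> <value> <value>" are used.
        if len(parts) == 3 and parts[0].startswith("Threshold_"):
            name = parts[0][len("Threshold_"):]
            a, b = float(parts[1]), float(parts[2])
            thresholds[name] = (min(a, b), max(a, b))
    return thresholds


def thresholds_for(class_name, thresholds):
    """Return the configured pair, or the defaults if none was given."""
    return thresholds.get(class_name, DEFAULT_THRESHOLDS)
```

A class whose directory was added without a matching {\tt Threshold\_} line then simply receives the default pair.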
\subsection{Path for log files}
\begin{lstlisting}[language=bash]
......@@ -149,16 +163,16 @@ These lines are use to set the default log folder and a default detail log folde
\begin{lstlisting}[language=bash]
Max_length 30000
\end{lstlisting}
This sets the maximum length in characters (including whitespace characters) for a text to be eligible for classification. This parameter is used to avoid misclassification: when an article is too long, its characteristics become too generic, and a very long paper may be misclassified (misclassification rate without splitting: ??).
The default value is set to 30000 characters (about 15 pages). A longer text will be split into several parts, which are tested individually.
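
As an illustration of the splitting step, the Python sketch below cuts a text into consecutive parts of at most {\tt Max\_length} characters. It is a deliberately naive assumption about how the split might work: the function name {\tt split\_text} is hypothetical, and the actual tool may cut at different positions (for example at sentence or paragraph boundaries).

```python
def split_text(text, max_length=30000):
    """Cut text into consecutive parts of at most max_length characters.

    Whitespace counts towards the length, as stated for Max_length.
    Fixed-size cutting is an assumption made for this sketch.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    return [text[i:i + max_length] for i in range(0, len(text), max_length)]
```

A text of 70000 characters would thus be tested as three parts of 30000, 30000 and 10000 characters.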
\section{Make use of detail logging}
The detail log (parameter -d) stores all the distances from the text under test to all other samples in the sample set (i.e. all texts in all directories found at {\tt /data/sample}).
This can be used to get a more detailed look at the results.
For example, an article may be returned with a distance to its nearest neighbour that barely passes the threshold. Turning on the detail log for that article and checking the results may help with the decision.
%if it is just a rare incident or the distances to other samples are also suspicious and have a better estimation.
\end{document}