Cyril Labbe / scidetect
Commit 05c78f85 authored Dec 01, 2015 by Tien
removed some debug functions
parent 7d4318f2
Changes 2
src/fr/imag/forge/scidetect/Checker/Reader.java
...
...
@@ -119,6 +119,7 @@ public class Reader {
                readtests(listOfFile[j].getPath(), Samplecorpus, savedetaillog);
            } else if (listOfFile[j].getName().endsWith(".pdf")
                    || listOfFile[j].getName().endsWith(".xml")
                    || listOfFile[j].getName().endsWith(".xtx")) {
                ArrayList<Text> text = new ArrayList<Text>();
                //System.out.println(listOfFile[j].getName());
                TextProcessor textprocessor = new TextProcessor();
                text = textprocessor.newtext(listOfFile[j], listOfFile);
                for (int i = 0; i < text.size(); i++) {
...
...
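The extension test in the hunk above repeats `listOfFile[j].getName().endsWith(...)` three times. A minimal sketch of the same check factored into a helper follows; the class `ExtensionFilter`, the method `isSupported`, and the constant `EXTENSIONS` are hypothetical names that do not exist in scidetect, and the three extensions are copied verbatim from the condition.

```java
import java.util.Arrays;
import java.util.List;

public class ExtensionFilter {
    // Extensions copied verbatim from the condition in the hunk.
    static final List<String> EXTENSIONS = Arrays.asList(".pdf", ".xml", ".xtx");

    // Returns true when the file name ends with one of the listed extensions.
    // Like the original condition, the comparison is case-sensitive.
    static boolean isSupported(String fileName) {
        for (String ext : EXTENSIONS) {
            if (fileName.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSupported("paper.pdf"));
        System.out.println(isSupported("notes.txt"));
    }
}
```

Under this assumption the condition in `Reader` would read `else if (ExtensionFilter.isSupported(listOfFile[j].getName()))`.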
src/fr/imag/forge/scidetect/TextExtractor/normalizer.java
...
...
@@ -51,14 +51,14 @@ public class normalizer {
            }
            br.close();
            content = content.toUpperCase();
//          content = content.replaceAll("-", " ");// parenthesis
//          content = content.replaceAll("[^A-Z ]", "");// non A to Z
//
//          content = content.replaceAll("\n", " ");//prob not nessesary :D
//          content = content.replaceAll("\\s+", " ");// remove extra spaces\
            content = content.replaceAll("[-\r\n\\s+]", " ");// parenthesis
            //content = content.replaceAll("\r", " "); // make a new line
            content = content.replaceAll("[^A-Z ]", "");// remove non A to Z
            content = content.replaceAll("-", " ");// parenthesis
            content = content.replaceAll("[^A-Z ]", "");// non A to Z
            content = content.replaceAll("\r", " ");// make a new line
            content = content.replaceAll("\n", " ");//prob not nessesary :D
            content = content.replaceAll("\\s+", " ");// remove extra spaces\
//          content = content.replaceAll("[-\r\n\\s+]", " ");// parenthesis
//          content = content.replaceAll("[^A-Z ]", "");// remove non A to Z
            PrintWriter out = new PrintWriter(txt);
            out.println(content);
            out.close();
...
...
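The normalizer hunk above contains two forms of the same cleanup: one built on the single character class `"[-\r\n\\s+]"` and one built on a chain of separate `replaceAll` calls. They are not equivalent. Inside a character class `+` is a literal plus sign rather than a quantifier, so the single-class form does not collapse runs of spaces. In the chained form, `"[^A-Z ]"` runs before the `\r` and `\n` replacements, so line breaks are deleted rather than turned into spaces. The sketch below reproduces both sequences as they appear in the hunk; the class `NormalizeDemo`, the method names, and the sample string are hypothetical and only serve to show the difference.

```java
public class NormalizeDemo {
    // Sequence built on the single character class seen in the hunk.
    static String singleClass(String content) {
        content = content.toUpperCase();
        // Replaces each '-', CR, LF, whitespace character, or literal '+' with one space.
        content = content.replaceAll("[-\r\n\\s+]", " ");
        content = content.replaceAll("[^A-Z ]", "");
        return content;
    }

    // Sequence built on the separate replaceAll calls seen in the hunk, same order.
    static String stepByStep(String content) {
        content = content.toUpperCase();
        content = content.replaceAll("-", " ");
        // Line breaks are not in [A-Z ], so this step already deletes them.
        content = content.replaceAll("[^A-Z ]", "");
        content = content.replaceAll("\r", " ");
        content = content.replaceAll("\n", " ");
        content = content.replaceAll("\\s+", " ");
        return content;
    }

    public static void main(String[] args) {
        String sample = "well-known\nfact,  twice";
        System.out.println("[" + singleClass(sample) + "]"); // [WELL KNOWN FACT  TWICE]
        System.out.println("[" + stepByStep(sample) + "]");  // [WELL KNOWNFACT TWICE]
    }
}
```

If the intent is to keep word boundaries at line breaks and also collapse repeated spaces, one option would be to replace line breaks and hyphens with spaces first, then strip `[^A-Z ]`, then apply `"\\s+"`; that ordering is a suggestion, not something the commit does.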