Commit 452dea3a authored Feb 26, 2021 by Roque
erp5_wendelin_data_lake_ingestion: fix check md5 script
parent c2cf0c14
Showing 1 changed file with 19 additions and 6 deletions
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/DataSet_checkMd5DataStreamList.py
 """
-Script to check that a filesystem md5sum of a folder (uploaded to file_system_checksum File)
-is properly uploaded to Wendelin Data Lake.
+Script to check that a data set is properly uploaded
+to Wendelin Data Lake.
+How to use it: create a file_system_checksum file containing md5sum
+values of all dataset files uploaded with the following format:
 Format of is the same as md5sum's output:
 <md5_sum> <filename.extension>
+It can be generated in the original data set folder outside wendelin by doing md5sum * > output.txt
 """
+import os.path
 data = str(context.file_system_checksum).strip()
 lines = data.split("\n")
 print "Total files = ", len(lines)
+print
+check_result = True
 for line in lines[:]:
   md5_checksum = line[:32].strip()
   full_filename = line[32:].strip()
-  # check Data stream for this hash exists
-  filename, extension = full_filename.split(".")
+  filename, extension = os.path.splitext(full_filename)
+  extension = extension[1:]
   reference = "%s/%s/%s" %(data_set_reference, filename, extension)
   catalog_kw = {"portal_type": "Data Stream",
                 "reference": reference}
   data_stream = context.portal_catalog.getResultValue(**catalog_kw)
   if data_stream is None:
     print "[NOT FOUND]", reference
+    check_result = False
   else:
     is_upload_ok = (data_stream.getVersion() == md5_checksum)
     print md5_checksum, filename, data_stream is not None, is_upload_ok
+    if not is_upload_ok:
+      check_result = False
+print
+if check_result:
+  print "[OK] Data set correctly uploaded"
+else:
+  print "[ERROR] Data set was not correctly uploaded"
 return printed
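The filename handling changed by this commit can be tried outside ERP5 with a small standalone sketch. It is not part of the commit: the helper names `parse_md5sum_line` and `naive_split` are hypothetical, the sample checksum line is made up for illustration, and the code targets plain Python 3 rather than the restricted Python 2 environment the script runs in. It suggests why `os.path.splitext` is likely the safer choice: `full_filename.split(".")` only unpacks into two names when the filename contains exactly one dot.

```python
import os.path


def parse_md5sum_line(line):
    """Split one line of md5sum output into (md5, filename, extension).

    Mirrors the slicing used in the script: the first 32 characters are
    the hex digest, the remainder is the file name.
    """
    md5_checksum = line[:32].strip()
    full_filename = line[32:].strip()
    # splitext splits at the last dot only and returns an empty
    # extension when the name has no dot, so unpacking never fails.
    filename, extension = os.path.splitext(full_filename)
    # Drop the leading dot, as the patched script does.
    return md5_checksum, filename, extension[1:]


def naive_split(full_filename):
    """Pre-fix approach: works only for names with exactly one dot."""
    filename, extension = full_filename.split(".")
    return filename, extension


if __name__ == "__main__":
    sample = "d41d8cd98f00b204e9800998ecf8427e  archive.tar.gz"
    print(parse_md5sum_line(sample))
    try:
        naive_split("archive.tar.gz")
    except ValueError as error:
        print("naive split failed:", error)
```

With the naive split, a name such as `archive.tar.gz` or `README` raises `ValueError` before any Data Stream lookup happens, whereas the `splitext` variant yields `archive.tar` / `gz` and `README` / an empty extension.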