Klaus Wölfel / embulk-input-filename

Commit e20b5ac0, authored Aug 08, 2017 by yu

fix the error in README.md

Parent: 6253b2d8

Showing 3 changed files with 115 additions and 34 deletions:
- README.md: +107, -23
- build.gradle: +6, -7
- src/main/java/org/embulk/input/filename/FilenameInputPlugin.java: +2, -4
README.md

@@ -4,7 +4,7 @@
 ## Overview
 * **Plugin type**: input
-* **Resume supported**: not yet
+* **Resume supported**: no
 * **Cleanup supported**: yes
 * **Guess supported**: no
@@ -12,10 +12,10 @@
 - **multi_dir**: description (ArrayList<String>, required)
 - **mulit_tag**: description (ArrayList<String>, default: `[]`)
-- **order**: description (String, default: `ALPHABETICAL`)
+- **load_order**: description (String, default: `ALPHABETICAL`)
 - **chunk_size**: description (int, default: `10485760(10M)`)
-for the order option. There are many alternative:
+Attention: For the order option. There are many alternative:
 ALPHABETICAL (default value)
 ASCEND_MODIFIED
 DESCEND_MODIFIED
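The options list gives `load_order` (String, default `ALPHABETICAL`) with the alternatives ALPHABETICAL, ASCEND_MODIFIED and DESCEND_MODIFIED, but the README does not define how each one compares files. The following Java sketch assumes ALPHABETICAL compares path strings and the two MODIFIED modes sort by last-modified time, falling back to the path on ties. `LoadOrderSketch`, `LoadOrder`, `FileEntry` and `sortFiles` are hypothetical names, not the plugin's API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the three load_order modes; not the plugin's actual implementation.
class LoadOrderSketch {
    enum LoadOrder { ALPHABETICAL, ASCEND_MODIFIED, DESCEND_MODIFIED }

    // A file path plus its last-modified time in milliseconds.
    static final class FileEntry {
        final String path;
        final long modifiedMillis;

        FileEntry(String path, long modifiedMillis) {
            this.path = path;
            this.modifiedMillis = modifiedMillis;
        }

        // Builds an entry from a real file using Files.getLastModifiedTime.
        static FileEntry of(Path p) throws IOException {
            return new FileEntry(p.toString(), Files.getLastModifiedTime(p).toMillis());
        }
    }

    // Returns a sorted copy; ties on modification time fall back to the path string.
    static List<FileEntry> sortFiles(List<FileEntry> files, LoadOrder order) {
        Comparator<FileEntry> byPath = Comparator.comparing(e -> e.path);
        Comparator<FileEntry> byModified = Comparator.comparingLong(e -> e.modifiedMillis);
        Comparator<FileEntry> cmp;
        switch (order) {
            case ASCEND_MODIFIED:
                cmp = byModified.thenComparing(byPath);
                break;
            case DESCEND_MODIFIED:
                cmp = byModified.reversed().thenComparing(byPath);
                break;
            default:
                cmp = byPath;
        }
        List<FileEntry> sorted = new ArrayList<>(files);
        sorted.sort(cmp);
        return sorted;
    }

    // Extracts just the paths, for printing and comparison.
    static List<String> paths(List<FileEntry> entries) {
        List<String> out = new ArrayList<>();
        for (FileEntry e : entries) {
            out.add(e.path);
        }
        return out;
    }

    public static void main(String[] args) {
        // Modification times deliberately disagree with alphabetical order.
        List<FileEntry> files = Arrays.asList(
                new FileEntry("sample_01.txt", 3000L),
                new FileEntry("sample_02.txt", 1000L),
                new FileEntry("sample_03.txt", 2000L));
        for (LoadOrder order : LoadOrder.values()) {
            System.out.println(order + " -> " + paths(sortFiles(files, order)));
        }
    }
}
```

Keeping the time-based modes tie-broken by path makes the result deterministic when several files share a timestamp, which a resume check against a stored last path would likely need.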
@@ -29,34 +29,36 @@ exec:
   min_output_tasks: 1
 in:
   type: filename
-  mulit_dir: ["../sample/sample_", "../example/example_"]
+  multi_dir: ["../sample/sample_", "../example/example_"]
   multi_tag: ["tag1", "tag2"]
-  order: ASCEND_MODIFIED
+  load_order: ASCEND_MODIFIED
   chunk_size: 1000
 ```
-Attention:
-exec:
-min_output_tasks: 1
-is necessary!
-Embulk will optimize the task according the core number which means that, it will re-distribute the task, which will cause
-errors.
-If the multi_dir contains more than one directory. each directory will be treated as a task. the embulk will distribute those tasks to multi
-thread. As each task will one consistently, the files in each directory will be uploading in order.
+**exec:**
+**min_output_tasks: 1**
+This configuration is oblige!
+Embulk will optimize the task according the core number of the PC which means that, it will re-distribute the task and cause errors.
+If the multi_dir contains more than one directory, each directory will be treated as a task. the Embulk will distribute those tasks to multi
+thread. Each task will run consistently, the files in each directory will be uploading in order.
 For example the upload order maybe:
-example1.txt
-sample1.txt
-sample2.txt
-example2.txt
-sample3.txt
+- example1.txt
+- sample1.txt
+- sample2.txt
+- example2.txt
+- sample3.txt
-If you want to upload the directory one by one, you need to configure the **max_thread: 1**
+To also upload the directory one by one, you need to configure the max_thread: 1
 then you will get
-example1.txt
-example2.txt
-sample1.txt
-sample2.txt
-sample3.txt
+- example1.txt
+- example2.txt
+- sample1.txt
+- sample2.txt
+- sample3.txt
 ## Build
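The README text in this hunk says that, with `min_output_tasks: 1`, each `multi_dir` directory is treated as one task, Embulk spreads those tasks over several threads while the files inside each directory keep their order, and limiting the threads to one uploads the directories one after another. The README writes the thread limit as `max_thread`; Embulk's local executor documents its options as `exec: max_threads` and `exec: min_output_tasks`. The Java sketch below only simulates that scheduling with a standard `ExecutorService`; `PerDirectoryTasksSketch`, `upload` and `keepsOrder` are hypothetical names, and appending a name to a list stands in for uploading a file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation of "one task per directory"; it does not use Embulk itself.
class PerDirectoryTasksSketch {
    // Submits one task per directory to a pool of maxThreads threads and returns the
    // order in which files were "uploaded". Each task walks its own list sequentially.
    static List<String> upload(List<List<String>> directories, int maxThreads) {
        List<String> uploaded = Collections.synchronizedList(new ArrayList<String>());
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        for (List<String> files : directories) {
            pool.submit(() -> {
                for (String file : files) {
                    uploaded.add(file); // stand-in for uploading one file
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(uploaded);
    }

    // True if the files of one directory appear in the same relative order in the log.
    static boolean keepsOrder(List<String> log, List<String> directoryFiles) {
        int last = -1;
        for (String file : directoryFiles) {
            int idx = log.indexOf(file);
            if (idx <= last) {
                return false; // missing, or out of order
            }
            last = idx;
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<String>> dirs = Arrays.asList(
                Arrays.asList("example1.txt", "example2.txt"),
                Arrays.asList("sample1.txt", "sample2.txt", "sample3.txt"));
        // Several threads: directories may interleave, but each keeps its own order.
        System.out.println("4 threads: " + upload(dirs, 4));
        // One thread: tasks run in submission order, one directory after another.
        System.out.println("1 thread:  " + upload(dirs, 1));
    }
}
```

With a single worker thread the pool drains its queue first-in first-out, so the result matches the second listing in the README when the example directory is submitted first; with more threads only the per-directory order is guaranteed.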
@@ -65,3 +67,85 @@ java 1.8 is required.
 ```
 $ ./gradlew gem # -t to watch change of files and rebuild continuously
 ```
+## Usage
+If you are a new user of embulk.
+Here are some tips can help you use this plugin quickly.
+First of all, you need to have a java8 environment in a linux system.
+And you need a erp5 isntance.
+[if not, follow the tutorial to have one](https://nexedi.erp5.net/web_page_module/7056/WebPage_view?ignore_layout:int=1&selection_index=0&portal_status_message=Status%20changed.&selection_name=web_page_module_view_web_page_list_selection&editable_mode:int=1)
+Then, you need a embulk on your PC, now there is a bug to load the plugin with the newest embulk. I recommand you use the embulk_v.8.27 instead
+of the newest version.
+To install the embulk
+```
+curl --create-dirs -o ~/.embulk/bin/embulk -L "https://dl.bintray.com/embulk/maven/embulk-0.8.27.jar"
+chmod +x ~/.embulk/bin/embulk
+echo 'export PATH="$HOME/.embulk/bin:$PATH"' >> ~/.bashrc
+source ~/.bashrc
+```
+After installing the embulk. You need to build this filename-input-plugin on your PC.
+```
+git clone https://lab.nexedi.com/caiyu/embulk-input-filename/tree/multiThread
+cd embulk-input-filename
+./gradlew package
+```
+If you want to test the Plugin. Just run ./gradlew test
+In fact this input should be used with the wendelin-output-plugin. build it on your PC too.
+```
+git https://lab.nexedi.com/caiyu/embulk-output-wendelin/tree/java-output
+cd embulk-output-plugin
+./gradlew package
+```
+Now you can use the embulk with these two plugin to upload the data.
+In your workplace, create a yml file. Say that we create a config.yml, and fill in the configuration.
+```yaml
+exec:
+  min_output_tasks: 1
+in:
+  type: filename
+  mulit_dir: ["../sample/sample_", "../example/example_"]
+  multi_tag: ["tag1", "tag2"]
+  load_order: ASCEND_MODIFIED
+  chunk_size: 1000
+out:
+  type: wendelin
+  tag: "weather-cc"
+  streamtool_uri: https://softinstxxxxx.host.vifib.net/erp5/portal_ingestion_policies/weather-cc
+  user: zope
+  password: yourpassword
+```
+Prepare the sample data and example data to upload.
+```
+mkdir ../sample
+vim ../sample/sample_01.txt
+vim ../sample/sample_02.txt
+vim ../sample/sample_03.txt
+mkdir ../example
+vim ../example/example_01.txt
+vim ../example/example_02.txt
+```
+Then run the embulk
+```
+embulk run -L path/to/embulk-input-filename -L path/to/embulk-output-wendelin config.yml
+```
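The options list documents `chunk_size` as an int with default `10485760(10M)`, and the sample configurations set it to 1000. The README does not say which unit it uses or where the plugin applies it. Assuming it is a number of bytes per read (10485760 is exactly 10 * 1024 * 1024), the following Java sketch splits a stream into chunks of at most that size. `ChunkSketch` and `readChunks` are hypothetical names, not the plugin's reader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, assuming chunk_size is a byte count; not the plugin's actual reader.
class ChunkSketch {
    // Reads the whole stream and returns consecutive chunks of at most chunkSize bytes.
    // Only the final chunk may be shorter than chunkSize.
    static List<byte[]> readChunks(InputStream in, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive: " + chunkSize);
        }
        List<byte[]> chunks = new ArrayList<>();
        byte[] buffer = new byte[chunkSize];
        int filled = 0;
        int n;
        // read() may return fewer bytes than requested, so keep filling the same buffer.
        while ((n = in.read(buffer, filled, chunkSize - filled)) != -1) {
            filled += n;
            if (filled == chunkSize) {
                chunks.add(buffer.clone()); // copy so later reads do not overwrite it
                filled = 0;
            }
        }
        if (filled > 0) {
            chunks.add(Arrays.copyOf(buffer, filled));
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[2500];
        Arrays.fill(data, (byte) 'x');
        List<byte[]> chunks = readChunks(new ByteArrayInputStream(data), 1000);
        for (byte[] chunk : chunks) {
            System.out.println(chunk.length); // prints 1000, 1000, 500
        }
    }
}
```

Filling a full buffer before emitting it keeps every chunk except the last at exactly `chunkSize`, even when the underlying stream returns short reads.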
build.gradle

@@ -19,15 +19,14 @@ sourceCompatibility = 1.8
 targetCompatibility = 1.8
 dependencies {
-    compile "org.embulk:embulk-core:0.8.27"
-    provided "org.embulk:embulk-core:0.8.27"
-    compile "org.embulk:embulk-standards:0.8.27"
-    provided "org.embulk:embulk-standards:0.8.27"
-    compile "commons-codec:commons-codec:1.9"
+    compile "org.embulk:embulk-core:0.8.29"
+    provided "org.embulk:embulk-core:0.8.29"
+    compile "org.embulk:embulk-standards:0.8.29"
+    provided "org.embulk:embulk-standards:0.8.29"
     // compile "YOUR_JAR_DEPENDENCY_GROUP:YOUR_JAR_DEPENDENCY_MODULE:YOUR_JAR_DEPENDENCY_VERSION"
     testCompile "junit:junit:4.+"
-    testCompile "org.embulk:embulk-core:0.8.27:tests"
-    testCompile 'org.embulk:embulk-test:0.8.27'
+    testCompile "org.embulk:embulk-core:0.8.29:tests"
+    testCompile 'org.embulk:embulk-test:0.8.29'
 }
 test {
src/main/java/org/embulk/input/filename/FilenameInputPlugin.java

@@ -305,7 +305,7 @@ public class FilenameInputPlugin
     // This method is for walk through the directory and record the files in the directory. It will compare the filename with the lastPath
     // In we want to upload the files in ALPHABETICAL order. than the filename "smaller than" the lastPath will be abandonned.
     // Be careful, that since we have alternative for the order. You should be careful what "smaller than" means!
-    public ArrayList<String> listFiles(PluginTask task, Path pathPrefix, String lastPath, String order)
+    public ArrayList<String> listFiles(PluginTask task, Path pathPrefix, final String lastPath, final String order)
     {
         //Path pathPrefix = Paths.get(task.getPathPrefix()).normalize();
         final Path directory;
@@ -319,7 +319,6 @@ public class FilenameInputPlugin
             directory = (d == null ? CURRENT_DIR : d);
         }
         //final ImmutableList.Builder<String> builder = ImmutableList.builder();
         final ArrayList<String> filesArray = new ArrayList<String>();
         try {
             log.info("Listing local files at directory '{}' filtering filename by prefix '{}'", directory.equals(CURRENT_DIR) ? "." : directory.toString(), fileNamePrefix);
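The `listFiles` comments say that, in ALPHABETICAL order, files "smaller than" `lastPath` are abandoned, and the hunks show the directory falling back to `CURRENT_DIR` when the prefix has no parent, with the log printing "." for that case. A minimal Java sketch of the ALPHABETICAL case, assuming a `multi_dir` entry such as `../sample/sample_` means "directory `../sample`, file-name prefix `sample_`" (similar to how Embulk's bundled `file` input treats `path_prefix`) and that `lastPath` is compared as a full path string. `ListFilesSketch` and its methods are hypothetical; the MODIFIED orders would need a different notion of "after".

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of prefix-based listing with an ALPHABETICAL lastPath filter.
class ListFilesSketch {
    // Paths.get(".").normalize() is the empty path, which resolves to the working directory;
    // its toString() is "", which is why a log message would print "." instead.
    static final Path CURRENT_DIR = Paths.get(".").normalize();

    // "../sample/sample_" -> "../sample"; a prefix that is itself a directory is scanned as-is.
    static Path directoryOf(Path pathPrefix) {
        if (Files.isDirectory(pathPrefix)) {
            return pathPrefix;
        }
        Path d = pathPrefix.getParent();
        return d == null ? CURRENT_DIR : d;
    }

    // "../sample/sample_" -> "sample_"; a directory prefix matches every file name.
    static String fileNamePrefixOf(Path pathPrefix) {
        return Files.isDirectory(pathPrefix) ? "" : pathPrefix.getFileName().toString();
    }

    // Keeps paths strictly greater than lastPath (null means "start from scratch"),
    // returned in ascending string order.
    static List<String> afterLastPath(List<String> paths, String lastPath) {
        List<String> kept = new ArrayList<>();
        for (String p : paths) {
            if (lastPath == null || p.compareTo(lastPath) > 0) {
                kept.add(p);
            }
        }
        Collections.sort(kept);
        return kept;
    }

    // Lists regular files in the prefix's directory whose names start with the prefix.
    static List<String> listFiles(Path pathPrefix, String lastPath) throws IOException {
        Path directory = directoryOf(pathPrefix);
        String prefix = fileNamePrefixOf(pathPrefix);
        List<String> candidates = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(directory)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && p.getFileName().toString().startsWith(prefix)) {
                    candidates.add(p.toString());
                }
            }
        }
        return afterLastPath(candidates, lastPath);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("list-files");
        for (String name : new String[] {"sample_01.txt", "sample_02.txt", "sample_03.txt", "other.txt"}) {
            Files.createFile(dir.resolve(name));
        }
        Path prefix = dir.resolve("sample_");
        System.out.println(listFiles(prefix, null));
        System.out.println(listFiles(prefix, dir.resolve("sample_01.txt").toString()));
    }
}
```

Comparing full path strings only works as a resume point because ALPHABETICAL order sorts the same strings; for the time-based orders the stored position would have to include the timestamp as well.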
@@ -380,8 +379,7 @@ public class FilenameInputPlugin
     // End
     // Static method to return a FileTime of a file
     public static FileTime getCreationTime(String filename) throws IOException
     {
         File file = new File(filename);
         Path p = Paths.get(file.getAbsolutePath());
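The hunk shows `getCreationTime` turning a file name into an absolute `Path`, but the rest of its body is cut off in this view. One standard-library way to obtain a creation `FileTime` is `Files.readAttributes(path, BasicFileAttributes.class).creationTime()`; the Java sketch below uses it, and is not necessarily what the plugin does. `CreationTimeSketch` is a hypothetical name. The JDK documents that when a file system does not record creation time, `creationTime()` returns an implementation-specific value, typically the last-modified time or the epoch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;

// Hypothetical standalone version of a creation-time helper using java.nio.file only.
class CreationTimeSketch {
    // Returns the creation time reported by the file system. Where creation time is not
    // supported, the value is implementation-specific (often the last-modified time).
    // Throws NoSuchFileException (an IOException) if the file does not exist.
    static FileTime getCreationTime(String filename) throws IOException {
        Path p = Paths.get(filename).toAbsolutePath();
        BasicFileAttributes attrs = Files.readAttributes(p, BasicFileAttributes.class);
        return attrs.creationTime();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("creation-time", ".txt");
        try {
            System.out.println(tmp + " -> " + getCreationTime(tmp.toString()));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Reading `BasicFileAttributes` in one call avoids the separate `File` object and fetches all basic timestamps at once, should a caller also need `lastModifiedTime()`.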