1. 08 Mar, 2019 5 commits
    • . · 8b0816bc (Kirill Smelkov authored)
    • . · 424beee4 (Kirill Smelkov authored)
    • . · 501b7b27 (Kirill Smelkov authored)
    • X Don't keep ZBigFile activated during whole current transaction · 89ad3a79
      Kirill Smelkov authored
      If we keep it activated, there can be a problem at Resync time: if
      Resync sees that the old zhead.At is outside of DB.δtail coverage, it
      will invalidate all zhead.cache objects, not only the changed ones. That
      means even a non-changed ZBigFile that is kept active will be
      invalidated, and PInvalidate will panic.
      
      We could avoid this by deactivating all ZBigFiles on each transaction
      update, but that can be too expensive if there are many ZBigFiles.
      
      We could avoid the problem another way: extend the ZODB/go API with a
      way to request that DB.δtail covers a particular connection. That in
      turn would mean also extending the ZODB/go API to release a connection
      from imposing such a constraint on the DB. Even if the first step could
      be done via e.g. another flag, the second step - release - is not very
      clear: we already have connection "release" on transaction completion,
      and adding e.g. conn.Close() on top of that would be ambiguous for
      users. Also, if wcfs is slow to process invalidations for some reason,
      such a constraint would mean DB.δtail would grow indefinitely.
      
      -> We can solve the problem in another way: don't keep ZBigFile always
      activated, and instead do activation/deactivation as when working with
      regular ZODB objects. This does not complicate the code flow, and from
      the performance point of view we can practically avoid the slowdown by
      teaching zodbCacheControl to also pin ZBigFile in the live cache.
    • . · 5f7c757e (Kirill Smelkov authored)
  2. 07 Mar, 2019 1 commit
  3. 06 Mar, 2019 1 commit
  4. 05 Mar, 2019 1 commit
  5. 04 Mar, 2019 3 commits
    • . · e1ad2ae3 (Kirill Smelkov authored)
    • X wcfs: Care to disable OS polling on us · 59552328
      Kirill Smelkov authored
      If we don't disable it early, we can get into a situation where, while
      handling invalidations, wcfs calls open("@revX/bigfile/...") to upload
      cache data there, the go runtime tries to use epoll on that fd, and it
      gets stuck as described in the commit referenced in the comments.
      
      In particular, the deadlock was easy to trigger in an nproc=1
      environment (either a VM with 1 CPU, or e.g. under `taskset -c 0`).
      
      Bug reported by @romain.
    • . · e8c3499d (Kirill Smelkov authored)
  6. 01 Mar, 2019 1 commit
  7. 28 Feb, 2019 1 commit
    • Merge branch 'master' into t · 8b60658b
      Kirill Smelkov authored
      * master:
        t/qemu-runlinux: Mount bpf and fusectl filesystems
        t/qemu-runlinux: Issue terminal resize before running program
        t/qemu-runlinux: Don't propagate $TERM in graphics mode
  8. 27 Feb, 2019 3 commits
  9. 25 Feb, 2019 2 commits
  10. 22 Feb, 2019 4 commits
    • X Draft demo that reading data through wcfs works · 01916f09
      Kirill Smelkov authored
      and gives exactly the same data as non-wcfs Wendelin.core:
      
      	---- 8< ----
      	(neo) (z-dev) (g.env) kirr@deco:~/src/wendelin/wendelin.core$ free -h
      	              total        used        free      shared  buff/cache   available
      	Mem:          7,5Gi       931Mi       613Mi       194Mi       6,0Gi       6,1Gi
      	Swap:            0B          0B          0B
      
      	(neo) (z-dev) (g.env) kirr@deco:~/src/wendelin/wendelin.core$ time ./demo/demo_zbigarray.py gen 1.fs
      	I: RAM:  7.47GB
      	I: WORK: 14.94GB
      	gen signal t=0...2.00e+09  float64  (= 14.94GB)
      	gen signal blk [0:4194304]  (0.2%)
      	gen signal blk [4194304:8388608]  (0.4%)
      	gen signal blk [8388608:12582912]  (0.6%)
      	gen signal blk [12582912:16777216]  (0.8%)
      	gen signal blk [16777216:20971520]  (1.0%)
      	gen signal blk [20971520:25165824]  (1.3%)
      	...
      	gen signal blk [1988100096:1992294400]  (99.4%)
      	gen signal blk [1992294400:1996488704]  (99.6%)
      	gen signal blk [1996488704:2000683008]  (99.8%)
      	gen signal blk [2000683008:2004649984]  (100.0%)
      	VIRT: 457 MB    RSS: 259MB
      
      	real    7m51,814s
      	user    2m19,001s
      	sys     0m42,615s
      
      	(neo) (z-dev) (g.env) kirr@deco:~/src/wendelin/wendelin.core$ time ./demo/demo_zbigarray.py read 1.fs
      	I: RAM:  7.47GB
      	sig: 2004649984 float64 (= 14.94GB)
      	<sig>:  2.3794727102747662e-08
      	S(sig): 47.70009930580747
      	VIRT: 245 MB    RSS: 49MB
      
      	real    2m36,006s
      	user    0m14,773s
      	sys     0m59,467s
      
      	(neo) (z-dev) (g.env) kirr@deco:~/src/wendelin/wendelin.core$ time WENDELIN_CORE_VIRTMEM=r:wcfs+w:uvmm ./demo/demo_zbigarray.py read 1.fs
      	I: RAM:  7.47GB
      	sig: 2004649984 float64 (= 14.94GB)
      	wcfs: 2019/02/22 21:33:48 zodb: FIXME: open file:///home/kirr/src/wendelin/wendelin.core/1.fs: cache is not ready for invalidations -> NoCache forced
      	db.openx 03cdd855e0d73622 nopool=true   ; δtail (03cdd855e0d73622, 03cdd855e0d73622]
      	db.openx 03cdd855e0d73622 nopool=false  ; δtail (03cdd855e0d73622, 03cdd855e0d73622]
      	W0222 21:35:20.282163    6697 misc.go:84] /: lookup "lib": invalid argument: not @rev
      	W0222 21:35:20.334896    6697 misc.go:84] /: lookup "libX11.so": invalid argument: not @rev
      	W0222 21:35:20.340128    6697 misc.go:84] /: lookup "libX11.so.so": invalid argument: not @rev
      	W0222 21:35:20.342492    6697 misc.go:84] /: lookup "libX11.so.la": invalid argument: not @rev
      	<sig>:  2.3794727102747662e-08
      	S(sig): 47.70009930580747
      	VIRT: 371 MB    RSS: 37MB
      
      	real    6m8,611s
      	user    0m10,167s
      	sys     0m21,964s
      	---- 8< ----
      
      Wcfs was not optimized at all yet. Offhand, `perf top` was showing lots
      of time being spent in the garbage collector, but there may also be
      FUSE-level latencies to debug. In any case, the FileStorage case is not
      very representative: with ZEO/NEO there are network requests to be
      made, and they start to dominate the latency of accessing one
      page/object.
    • . · 13ee0416 (Kirill Smelkov authored)
    • . · a4d63fbb (Kirill Smelkov authored)
    • . · bc041be8 (Kirill Smelkov authored)
  11. 21 Feb, 2019 3 commits
  12. 18 Feb, 2019 6 commits
  13. 13 Feb, 2019 2 commits
    • t/qemu-runlinux: Update · fe541453
      Kirill Smelkov authored
      Continuing 76d8f76d (Script to run compiled linux kernel with root fs
      mounted from host), update the script to run/debug linux inside QEMU:

      - teach it to run a specified program + args, instead of hardcoded /bin/sh;
      - before handing control to the user program, the builtin init mounts
        /proc, /sys, ... inside; previously /proc, /sys from the host were
        seen on those mountpoints, which was very misleading - e.g. ps was
        showing processes from the host, not from inside, etc.;
      - the builtin init also cares to run the specified program with the same
        current directory that was current on the host, and environment
        variables such as $HOME, $PATH, $TERM, ... are also propagated;
      - allow to optionally run QEMU with graphics, instead of stdout only;
      - increase available RAM from 128M to 512M (with 128M, running py.test
        inside fails with "fork: not enough memory").
      
      This updated version was useful to debug WCFS(*) & FUSE issues by running
      
      	kirr@deco:~/src/wendelin/wendelin.core/wcfs$ ../t/qemu-runlinux ~/src/linux/linux/arch/x86_64/boot/bzImage py.test -vsx -k test_wcfs
      
      See https://marc.info/?l=linux-fsdevel&m=155000277921155&w=2 for details.
      
      (*) WCFS is still a draft and is being worked on in the t branch.
    • . · e23c133e (Kirill Smelkov authored)
  14. 12 Feb, 2019 1 commit
  15. 11 Feb, 2019 5 commits
  16. 08 Feb, 2019 1 commit
    • X found that 2 read requests from wcfs are indeed pending · 3dd755dd
      Kirill Smelkov authored
      As it should be.
      
      grep -w -e '<- qread\>' y.log |awk '{print $6}' |sort >qread.txt
      grep -w -e '-> read\>' y.log |awk '{print $6}' |sort >read.txt
      
      root@deco:/home/kirr/src/wendelin/wendelin.core/wcfs# xdiff qread.txt read.txt
      diff --git a/qread.txt b/read.txt
      index 4ab50d7..fdd2be1 100644
      --- a/qread.txt
      +++ b/read.txt
      @@ -53,7 +53,5 @@ wcfs/11399_1_r:
       wcfs/11399_2_r:
       wcfs/11399_3_r:
       wcfs/11399_4_r:
      -wcfs/11399_5_r:
       wcfs/11400_0_r:
       wcfs/11401_0_r:
      -wcfs/11401_1_r:
      
      root@deco:/home/kirr/src/wendelin/wendelin.core/wcfs# tail -80 y.log
              github.com/hanwen/go-fuse/fuse.handleEINTR+39
              github.com/hanwen/go-fuse/fuse.(*Server).readRequest+355
              github.com/hanwen/go-fuse/fuse.(*Server).loop+107
              runtime.goexit+1
      
      P2 2.215810 /dev/fuse -> read       wcfs/11399_4_r:
              .56  RELEASE i8 ...             (ret=64)
      
      P2 2.215859 /dev/fuse <- write      wcfs/11399_5_w:
              .56 (0) ...
      
              syscall.Syscall+48
              syscall.Write+73
              github.com/hanwen/go-fuse/fuse.(*Server).systemWrite.func1+76
              github.com/hanwen/go-fuse/fuse.handleEINTR+39
              github.com/hanwen/go-fuse/fuse.(*Server).systemWrite+931
              github.com/hanwen/go-fuse/fuse.(*Server).write+194
              github.com/hanwen/go-fuse/fuse.(*Server).handleRequest+179
              github.com/hanwen/go-fuse/fuse.(*Server).loop+399
              runtime.goexit+1
      
      P2 2.215871 /dev/fuse -> write_ack  wcfs/11399_5_w (ret=16)
      
      P2 2.215876 /dev/fuse <- qread      wcfs/11399_5_r:				<-- NOTE
      
              syscall.Syscall+48
              syscall.Read+73
              github.com/hanwen/go-fuse/fuse.(*Server).readRequest.func1+85
              github.com/hanwen/go-fuse/fuse.handleEINTR+39
              github.com/hanwen/go-fuse/fuse.(*Server).readRequest+355
              github.com/hanwen/go-fuse/fuse.(*Server).loop+107
              runtime.goexit+1
      
      P0 2.221527 /dev/fuse <- qread      wcfs/11401_1_r:				<-- NOTE
      
              syscall.Syscall+48
              syscall.Read+73
              github.com/hanwen/go-fuse/fuse.(*Server).readRequest.func1+85
              github.com/hanwen/go-fuse/fuse.handleEINTR+39
              github.com/hanwen/go-fuse/fuse.(*Server).readRequest+355
              github.com/hanwen/go-fuse/fuse.(*Server).loop+107
              runtime.goexit+1
      
      P1 2.239384 /dev/fuse -> read       wcfs/11398_6_r:
              .57  READ i5 ...                (ret=80)
      
      P0 2.239626 /dev/fuse <- write      wcfs/11397_0_w:
              NOTIFY_RETRIEVE ...
      
              syscall.Syscall+48
              syscall.Write+73
              github.com/hanwen/go-fuse/fuse.(*Server).systemWrite.func1+76
              github.com/hanwen/go-fuse/fuse.handleEINTR+39
              github.com/hanwen/go-fuse/fuse.(*Server).systemWrite+931
              github.com/hanwen/go-fuse/fuse.(*Server).write+194
              github.com/hanwen/go-fuse/fuse.(*Server).InodeRetrieveCache+764
              github.com/hanwen/go-fuse/fuse/nodefs.(*FileSystemConnector).FileRetrieveCache+157
              main.(*BigFile).invalidateBlk+232
              main.(*Root).zδhandle1.func1+72
              golang.org/x/sync/errgroup.(*Group).Go.func1+87
              runtime.goexit+1
      
      P0 2.239660 /dev/fuse -> write_ack  wcfs/11397_0_w (ret=48)
      
      pending read/write:
      
      ...