1. 21 Mar, 2013 10 commits
    • Jan Ziak's avatar
    • go/format: fix documentation · e95c41f3
      Robert Griesemer authored
      R=r
      CC=golang-dev
      https://golang.org/cl/7920048
    • crypto/sha1: faster amd64, 386 implementations · 2f32138a
      Russ Cox authored
      -- amd64 --
      
      On a MacBookPro10,2 (Core i5):
      
      benchmark              old ns/op    new ns/op    delta
      BenchmarkHash8Bytes          785          592  -24.59%
      BenchmarkHash1K             8727         3014  -65.46%
      BenchmarkHash8K            64926        20723  -68.08%
      
      benchmark               old MB/s     new MB/s  speedup
      BenchmarkHash8Bytes        10.19        13.50    1.32x
      BenchmarkHash1K           117.34       339.71    2.90x
      BenchmarkHash8K           126.17       395.31    3.13x
      
      For comparison, on the same machine, openssl 0.9.8r reports
      its sha1 speed as 341 MB/s for 1K and 404 MB/s for 8K.
      
      On an Intel Xeon E5520:
      
      benchmark              old ns/op    new ns/op    delta
      BenchmarkHash8Bytes          984          707  -28.15%
      BenchmarkHash1K            11141         3466  -68.89%
      BenchmarkHash8K            82435        23411  -71.60%
      
      benchmark               old MB/s     new MB/s  speedup
      BenchmarkHash8Bytes         8.13        11.31    1.39x
      BenchmarkHash1K            91.91       295.36    3.21x
      BenchmarkHash8K            99.37       349.91    3.52x
      
      For comparison, on the same machine, openssl 1.0.1 reports
      its sha1 speed as 286 MB/s for 1K and 394 MB/s for 8K.
      
      -- 386 --
      
      On a MacBookPro10,2 (Core i5):
      
      benchmark              old ns/op    new ns/op    delta
      BenchmarkHash8Bytes         1041          713  -31.51%
      BenchmarkHash1K            15612         3382  -78.34%
      BenchmarkHash8K           110152        22733  -79.36%
      
      benchmark               old MB/s     new MB/s  speedup
      BenchmarkHash8Bytes         7.68        11.21    1.46x
      BenchmarkHash1K            65.59       302.76    4.62x
      BenchmarkHash8K            74.37       360.36    4.85x
      
      On an Intel Xeon E5520:
      
      benchmark              old ns/op    new ns/op    delta
      BenchmarkHash8Bytes         1221          842  -31.04%
      BenchmarkHash1K            14643         4137  -71.75%
      BenchmarkHash8K           108722        27394  -74.80%
      
      benchmark               old MB/s     new MB/s  speedup
      BenchmarkHash8Bytes         6.55         9.49    1.45x
      BenchmarkHash1K            69.93       247.51    3.54x
      BenchmarkHash8K            75.35       299.04    3.97x
      
      R=agl, dave
      CC=golang-dev
      https://golang.org/cl/7763049
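The delta and speedup columns in the benchmark tables of these commits are consistent with the usual convention: delta is the relative change in ns/op, and speedup is the ratio of new to old throughput. A minimal sketch of that arithmetic, checked against the amd64 BenchmarkHash1K row above (the function names `delta` and `speedup` are illustrative, not taken from any tool):

```go
package main

import "fmt"

// delta returns the relative change in ns/op as a percentage.
// Negative values mean the new code is faster.
func delta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

// speedup returns the ratio of new to old throughput in MB/s.
func speedup(oldMBs, newMBs float64) float64 {
	return newMBs / oldMBs
}

func main() {
	// Values from the crypto/sha1 amd64 Core i5 BenchmarkHash1K row.
	fmt.Printf("%+.2f%%\n", delta(8727, 3014))     // -65.46%
	fmt.Printf("%.2fx\n", speedup(117.34, 339.71)) // 2.90x
}
```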
    • crypto/md5: faster amd64, 386 implementations · 25cbd534
      Russ Cox authored
      -- amd64 --
      
      On a MacBookPro10,2 (Core i5):
      
      benchmark                       old ns/op    new ns/op    delta
      BenchmarkHash8Bytes                   471          524  +11.25%
      BenchmarkHash1K                      3018         2220  -26.44%
      BenchmarkHash8K                     20634        14604  -29.22%
      BenchmarkHash8BytesUnaligned          468          523  +11.75%
      BenchmarkHash1KUnaligned             3006         2212  -26.41%
      BenchmarkHash8KUnaligned            20820        14652  -29.63%
      
      benchmark                        old MB/s     new MB/s  speedup
      BenchmarkHash8Bytes                 16.98        15.26    0.90x
      BenchmarkHash1K                    339.26       461.19    1.36x
      BenchmarkHash8K                    397.00       560.92    1.41x
      BenchmarkHash8BytesUnaligned        17.08        15.27    0.89x
      BenchmarkHash1KUnaligned           340.65       462.75    1.36x
      BenchmarkHash8KUnaligned           393.45       559.08    1.42x
      
      For comparison, on the same machine, openssl 0.9.8r reports
      its md5 speed as 350 MB/s for 1K and 410 MB/s for 8K.
      
      On an Intel Xeon E5520:
      
      benchmark                       old ns/op    new ns/op    delta
      BenchmarkHash8Bytes                   565          607   +7.43%
      BenchmarkHash1K                      3753         2475  -34.05%
      BenchmarkHash8K                     25945        16250  -37.37%
      BenchmarkHash8BytesUnaligned          559          594   +6.26%
      BenchmarkHash1KUnaligned             3754         2474  -34.10%
      BenchmarkHash8KUnaligned            26011        16359  -37.11%
      
      benchmark                        old MB/s     new MB/s  speedup
      BenchmarkHash8Bytes                 14.15        13.17    0.93x
      BenchmarkHash1K                    272.83       413.58    1.52x
      BenchmarkHash8K                    315.74       504.11    1.60x
      BenchmarkHash8BytesUnaligned        14.31        13.46    0.94x
      BenchmarkHash1KUnaligned           272.73       413.78    1.52x
      BenchmarkHash8KUnaligned           314.93       500.73    1.59x
      
      For comparison, on the same machine, openssl 1.0.1 reports
      its md5 speed as 443 MB/s for 1K and 513 MB/s for 8K.
      
      -- 386 --
      
      On a MacBookPro10,2 (Core i5):
      
      benchmark                       old ns/op    new ns/op    delta
      BenchmarkHash8Bytes                   602          670  +11.30%
      BenchmarkHash1K                      4038         2549  -36.87%
      BenchmarkHash8K                     27879        16690  -40.13%
      BenchmarkHash8BytesUnaligned          602          670  +11.30%
      BenchmarkHash1KUnaligned             4025         2546  -36.75%
      BenchmarkHash8KUnaligned            27844        16692  -40.05%
      
      benchmark                        old MB/s     new MB/s  speedup
      BenchmarkHash8Bytes                 13.28        11.93    0.90x
      BenchmarkHash1K                    253.58       401.69    1.58x
      BenchmarkHash8K                    293.83       490.81    1.67x
      BenchmarkHash8BytesUnaligned        13.27        11.94    0.90x
      BenchmarkHash1KUnaligned           254.40       402.05    1.58x
      BenchmarkHash8KUnaligned           294.21       490.77    1.67x
      
      On an Intel Xeon E5520:
      
      benchmark                       old ns/op    new ns/op    delta
      BenchmarkHash8Bytes                   752          716   -4.79%
      BenchmarkHash1K                      5307         2799  -47.26%
      BenchmarkHash8K                     36993        18042  -51.23%
      BenchmarkHash8BytesUnaligned          748          730   -2.41%
      BenchmarkHash1KUnaligned             5301         2795  -47.27%
      BenchmarkHash8KUnaligned            36983        18085  -51.10%
      
      benchmark                        old MB/s     new MB/s  speedup
      BenchmarkHash8Bytes                 10.64        11.16    1.05x
      BenchmarkHash1K                    192.93       365.80    1.90x
      BenchmarkHash8K                    221.44       454.03    2.05x
      BenchmarkHash8BytesUnaligned        10.69        10.95    1.02x
      BenchmarkHash1KUnaligned           193.15       366.36    1.90x
      BenchmarkHash8KUnaligned           221.51       452.96    2.04x
      
      R=agl
      CC=golang-dev
      https://golang.org/cl/7621049
    • crypto/rc4: faster amd64, 386 implementations · 1af96080
      Russ Cox authored
      -- amd64 --
      
      On a MacBookPro10,2 (Core i5):
      
      benchmark           old ns/op    new ns/op    delta
      BenchmarkRC4_128          470          421  -10.43%
      BenchmarkRC4_1K          3123         3275   +4.87%
      BenchmarkRC4_8K         26351        25866   -1.84%
      
      benchmark            old MB/s     new MB/s  speedup
      BenchmarkRC4_128       272.22       303.40    1.11x
      BenchmarkRC4_1K        327.80       312.58    0.95x
      BenchmarkRC4_8K        307.24       313.00    1.02x
      
      For comparison, on the same machine, openssl 0.9.8r reports
      its rc4 speed as somewhat under 350 MB/s for both 1K and 8K.
      The Core i5 performance can be boosted another 20%, but only
      by making the Xeon performance significantly slower.
      
      On an Intel Xeon E5520:
      
      benchmark           old ns/op    new ns/op    delta
      BenchmarkRC4_128          774          417  -46.12%
      BenchmarkRC4_1K          6121         3200  -47.72%
      BenchmarkRC4_8K         48394        25151  -48.03%
      
      benchmark            old MB/s     new MB/s  speedup
      BenchmarkRC4_128       165.18       306.84    1.86x
      BenchmarkRC4_1K        167.28       319.92    1.91x
      BenchmarkRC4_8K        167.29       321.89    1.92x
      
      For comparison, on the same machine, openssl 1.0.1
      (which uses a different implementation than 0.9.8r)
      reports its rc4 speed as 587 MB/s for 1K and 601 MB/s for 8K.
      It is using SIMD instructions to do more in parallel.
      
      So there's still some improvement to be had, but even so,
      this is almost 2x faster than what it replaced.
      
      -- 386 --
      
      On a MacBookPro10,2 (Core i5):
      
      benchmark           old ns/op    new ns/op    delta
      BenchmarkRC4_128         3491          421  -87.94%
      BenchmarkRC4_1K         28063         3205  -88.58%
      BenchmarkRC4_8K        220392        25228  -88.55%
      
      benchmark            old MB/s     new MB/s  speedup
      BenchmarkRC4_128        36.66       303.81    8.29x
      BenchmarkRC4_1K         36.49       319.42    8.75x
      BenchmarkRC4_8K         36.73       320.90    8.74x
      
      On an Intel Xeon E5520:
      
      benchmark           old ns/op    new ns/op    delta
      BenchmarkRC4_128         2268          524  -76.90%
      BenchmarkRC4_1K         18161         4137  -77.22%
      BenchmarkRC4_8K        142396        32350  -77.28%
      
      benchmark            old MB/s     new MB/s  speedup
      BenchmarkRC4_128        56.42       244.13    4.33x
      BenchmarkRC4_1K         56.38       247.46    4.39x
      BenchmarkRC4_8K         56.86       250.26    4.40x
      
      R=agl
      CC=golang-dev
      https://golang.org/cl/7547050
    • runtime: explicitly remove fd's from epoll waitset before close() · 44840786
      Dmitriy Vyukov authored
      Fixes #5061.
      
      The current code assumes that fd's are automatically removed from the epoll set when they are closed. That is not true: the underlying file description is removed from the epoll set only when *all* fd's referring to it are closed.
      
      There are 2 bad consequences:
      1. The kernel delivers notifications for already closed fd's.
      2. The following sequence of events leads to an error:
         - add fd1 to epoll
         - dup fd1 = fd2
         - close fd1 (not removed from epoll since we've dup'ed the fd)
         - dup fd2 = fd1 (get the same fd as fd1)
         - add fd1 to epoll = EEXIST
      
      So, if an fd can potentially be dup'ed or fork'ed, it is necessary to explicitly remove the fd from the epoll set before closing it.
      
      R=golang-dev, bradfitz, dave
      CC=golang-dev
      https://golang.org/cl/7870043
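The EEXIST sequence described in this commit message can be reproduced from user space. The following is a Linux-only sketch using Go's `syscall` package, not the runtime change itself. The function name `reAdd` and its `removeFirst` flag are hypothetical, and `Dup3` is used in place of a plain dup so that the original fd number is reclaimed deterministically.

```go
// Linux-only sketch of the sequence from the commit message.
package main

import (
	"fmt"
	"syscall"
)

// reAdd adds one end of a pipe to an epoll set, duplicates it, closes the
// original, recreates the original fd number from the duplicate, and then
// tries to add it again. If removeFirst is true, the fd is explicitly
// removed from the epoll set before it is closed.
func reAdd(removeFirst bool) error {
	epfd, err := syscall.EpollCreate1(0)
	if err != nil {
		return err
	}
	defer syscall.Close(epfd)

	var p [2]int
	if err := syscall.Pipe(p[:]); err != nil {
		return err
	}
	fd1 := p[0]
	defer syscall.Close(p[1])

	// add fd1 to epoll
	ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(fd1)}
	if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, fd1, &ev); err != nil {
		return err
	}

	// dup fd1 = fd2: both now refer to the same file description
	fd2, err := syscall.Dup(fd1)
	if err != nil {
		return err
	}
	defer syscall.Close(fd2)

	if removeFirst {
		// The explicit removal that the commit title describes.
		if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_DEL, fd1, &ev); err != nil {
			return err
		}
	}

	// close fd1: the file description stays open through fd2
	if err := syscall.Close(fd1); err != nil {
		return err
	}

	// dup fd2 = fd1: reclaim the same fd number
	if err := syscall.Dup3(fd2, fd1, 0); err != nil {
		return err
	}
	defer syscall.Close(fd1)

	// add fd1 to epoll again
	return syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, fd1, &ev)
}

func main() {
	fmt.Println("without EPOLL_CTL_DEL:", reAdd(false))
	fmt.Println("with EPOLL_CTL_DEL:   ", reAdd(true))
}
```

On Linux, the first call is expected to fail with EEXIST because the stale registration survives the close, while the second call should succeed.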
    • runtime: faster parallel GC · d4c80d19
      Dmitriy Vyukov authored
      Use per-thread work buffers instead of a global mutex-protected pool. This eliminates contention in the parallel scan phase.
      
      benchmark                             old ns/op    new ns/op    delta
      garbage.BenchmarkTree2-8               97100768     71417553  -26.45%
      garbage.BenchmarkTree2LastPause-8     970931485    714103692  -26.45%
      garbage.BenchmarkTree2Pause-8         469127802    345029253  -26.45%
      garbage.BenchmarkParser-8            2880950854   2715456901   -5.74%
      garbage.BenchmarkParserLastPause-8    137047399    103336476  -24.60%
      garbage.BenchmarkParserPause-8         80686028     58922680  -26.97%
      
      R=golang-dev, 0xe2.0x9a.0x9b, dave, adg, rsc, iant
      CC=golang-dev
      https://golang.org/cl/7816044
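The general idea behind this change, replacing a shared mutex-protected buffer with per-worker buffers, can be illustrated outside the runtime. The sketch below is an analogy using goroutines, not the garbage collector's actual code; `collectShared`, `collectLocal`, and `sum` are hypothetical names, and doubling each item stands in for real work.

```go
package main

import (
	"fmt"
	"sync"
)

// collectShared has every worker append results to one slice guarded by a
// mutex, so all workers contend on the same lock.
func collectShared(items []int, workers int) []int {
	var (
		mu  sync.Mutex
		out []int
		wg  sync.WaitGroup
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := w; i < len(items); i += workers {
				mu.Lock()
				out = append(out, items[i]*2)
				mu.Unlock()
			}
		}(w)
	}
	wg.Wait()
	return out
}

// collectLocal gives each worker a private buffer and merges the buffers
// once at the end, so no lock is taken while work is in progress.
func collectLocal(items []int, workers int) []int {
	bufs := make([][]int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			var local []int
			for i := w; i < len(items); i += workers {
				local = append(local, items[i]*2)
			}
			bufs[w] = local // each worker writes only its own slot
		}(w)
	}
	wg.Wait()
	var out []int
	for _, b := range bufs {
		out = append(out, b...)
	}
	return out
}

// sum adds up all values in xs.
func sum(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	items := make([]int, 1000)
	for i := range items {
		items[i] = i
	}
	fmt.Println(sum(collectShared(items, 8)) == sum(collectLocal(items, 8)))
}
```

Both variants produce the same set of results; only the order and the amount of lock traffic differ.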
    • cmd/gc: implement more cases in racewalk. · 656bc3eb
      Rémy Oudompheng authored
      Add missing CLOSUREVAR in switch.
      Mark MAKE, string conversion nodes as impossible.
      Control statements do not need instrumentation.
      Instrument COM and LROT nodes.
      Instrument map length.
      
      Update #4228
      
      R=dvyukov, golang-dev
      CC=golang-dev
      https://golang.org/cl/7504047
    • crypto/tls: use method values · 76d5e2ce
      Brad Fitzpatrick authored
      Currently fails with a compiler error, though.
      
      R=golang-dev, agl, rsc
      CC=golang-dev
      https://golang.org/cl/7933043
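For readers unfamiliar with the term, a method value is an expression such as `x.M` used without calling it: it yields a function with the receiver already bound. The sketch below illustrates the language feature in isolation; the `counter` type and `total` function are hypothetical and unrelated to the crypto/tls code this commit touches.

```go
package main

import "fmt"

// counter is a hypothetical type used only to demonstrate method values.
type counter struct{ n int }

// add increases the counter by d.
func (c *counter) add(d int) { c.n += d }

// total feeds each value through a method value bound to one counter.
func total(ds ...int) int {
	c := &counter{}
	add := c.add // method value: a func(int) with c bound as the receiver
	for _, d := range ds {
		add(d)
	}
	return c.n
}

func main() {
	fmt.Println(total(2, 3)) // 5
}
```

Without method values, the same binding would need an explicit closure such as `func(d int) { c.add(d) }`.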
    • cmd/gc: fix escape analysis of method values · 38e9b077
      Russ Cox authored
      R=ken2
      CC=golang-dev
      https://golang.org/cl/7518050
  2. 20 Mar, 2013 17 commits
  3. 19 Mar, 2013 13 commits