- 25 Mar, 2014 4 commits
Keith Randall authored
See http://golang.org/s/go13heapdump for the file format.

LGTM=rsc
R=rsc, bradfitz, dvyukov, khr
CC=golang-codereviews
https://golang.org/cl/37540043

Keith Randall authored
Change two-bit stack map entries to encode:
0 = dead
1 = scalar
2 = pointer
3 = multiword
If multiword, the two-bit entry for the following word encodes:
0 = string
1 = slice
2 = iface
3 = eface

That way, during stack scanning we can check if a string is zero length or a slice has zero capacity. We can avoid following the contained pointer in those cases. It is safe to do so because it can never be dereferenced, and it is desirable to do so because it may cause false retention of the following block in memory.

Slice feature turned off until issue 7564 is fixed.

Update #7549

LGTM=rsc
R=golang-codereviews, bradfitz, rsc
CC=golang-codereviews
https://golang.org/cl/76380043

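Read literally, the encoding above is a small lookup over packed two-bit entries. A minimal decoding sketch (constants and function names are invented for illustration, not the runtime's):

```go
package main

import "fmt"

// Two-bit codes from the commit message.
const (
	bitsDead      = 0 // dead word
	bitsScalar    = 1 // scalar, not scanned
	bitsPointer   = 2 // pointer, scanned
	bitsMultiWord = 3 // first word of a multiword object; next entry gives the kind
)

// Kinds encoded in the entry that follows a multiword entry.
const (
	kindString = 0
	kindSlice  = 1
	kindIface  = 2
	kindEface  = 3
)

// entry returns the two-bit entry for stack word i from a packed bitmap
// (four entries per byte, least-significant bits first; layout assumed).
func entry(bitmap []byte, i int) uint8 {
	return (bitmap[i/4] >> (uint(i%4) * 2)) & 3
}

func main() {
	// Four words: scalar, pointer, then a string occupying two entries.
	bm := []byte{bitsScalar | bitsPointer<<2 | bitsMultiWord<<4 | kindString<<6}
	for i := 0; i < 4; i++ {
		fmt.Printf("word %d: entry %d\n", i, entry(bm, i))
	}
}
```
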
Ian Lance Taylor authored
The existing code did not have a clear notion of whether memory had actually been reserved. It checked based on whether it was running in 32-bit or 64-bit mode and (on GNU/Linux) on the requested address, but it confused the requested address with the returned address.

LGTM=rsc
R=rsc, dvyukov
CC=golang-codereviews, michael.hudson
https://golang.org/cl/79610043

Ian Lance Taylor authored
The nproc and ndone fields are uint32. This makes the type consistent.

LGTM=minux.ma
R=golang-codereviews, minux.ma
CC=golang-codereviews
https://golang.org/cl/79340044

- 14 Mar, 2014 3 commits
Dmitriy Vyukov authored
It's possible that bgsweep constantly fails to catch up for some reason; in that case runfinq was never woken at all.

R=rsc
CC=golang-codereviews
https://golang.org/cl/75940043

Dmitriy Vyukov authored
The problem was that spans ended up in the wrong lists after a split (e.g. in h->busy instead of h->central->empty). The span can also be unswept before the split; I don't know what that can cause, but it's safer to operate on swept spans.

Fixes #7544.

R=golang-codereviews, rsc
CC=golang-codereviews, khr
https://golang.org/cl/76160043

Dmitriy Vyukov authored
See the comment for description.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/75670044

- 12 Mar, 2014 1 commit
Russ Cox authored
The garbage collector uses type information to guide the traversal of the heap. If it sees a field that should be a string, it marks the object pointed at by the string data pointer as visited but does not bother to look at the data, because strings contain bytes, not pointers.

If you save s[len(s):] somewhere, though, the string data pointer actually points just beyond the string data; if the string data were exactly the size of an allocated block, the string data pointer would actually point at the next block. It is incorrect to mark that next block as visited and not bother to look at the data, because the next block may be some other type entirely.

The fix is to ignore strings with zero length during collection: they are empty and can never become non-empty: the base pointer will never be used again. The handling of slices already does this (but using cap instead of len).

This was not a bug in Go 1.2, because until January all string allocations included a trailing NUL byte not included in the length, so s[len(s):] still pointed inside the string allocation (at the NUL).

This bug was causing the crashes in test/run.go. Specifically, the parsing of a regexp in package regexp/syntax allocated a []syntax.Inst with rounded size 1152 bytes. In fact it allocated many such slices, because during the processing of test/index2.go it creates thousands of regexps that are all approximately the same complexity. That takes a long time, and test/run works on other tests in other goroutines. One such other test is chan/perm.go, which uses an 1152-byte source file. test/run reads that file into a []byte and then calls strings.Split(string(src), "\n"). The string(src) creates an 1152-byte string - and there's a very good chance of it landing next to one of the many many regexp slices already allocated - and then because the file ends in a \n, strings.Split records the tail empty string as the final element in the slice. A garbage collection happens at this point, the collection finds that string before encountering the []syntax.Inst data it now inadvertently points to, and the []syntax.Inst data is not scanned for the pointers that it contains. Each syntax.Inst contains a []rune, those are missed, and the backing rune arrays are freed for reuse. When the regexp is later executed, the runes being searched for are no longer runes at all, and there is no match, even on text that should match.

On 64-bit machines the pointer in the []rune inside the syntax.Inst is larger (along with a few other pointers), pushing the []syntax.Inst backing array into a larger size class, avoiding the collision with chan/perm.go's inadvertently sized file.

I expect this was more prevalent on OS X than on Linux or Windows because those managed to run faster or slower and didn't overlap index2.go with chan/perm.go as often. On the ARM systems, we only run one errorcheck test at a time, so index2 and chan/perm would never overlap.

It is possible that this bug is the root cause of other crashes as well. For now we only know it is the cause of the test/run crash.

Many thanks to Dmitriy for help debugging.

Fixes #7344.
Fixes #7455.

LGTM=r, dvyukov, dave, iant
R=golang-codereviews, dave, r, dvyukov, delpontej, iant
CC=golang-codereviews, khr
https://golang.org/cl/74250043

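The hazard is easy to see in miniature. This sketch (not from the CL) inspects the string headers directly; on typical gc toolchains the tail's data pointer is the base pointer plus len(s), i.e. one byte past the string's data:

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

func main() {
	s := "hello"
	tail := s[len(s):] // zero-length, but its data pointer need not be nil

	sh := (*reflect.StringHeader)(unsafe.Pointer(&s))
	th := (*reflect.StringHeader)(unsafe.Pointer(&tail))

	fmt.Printf("s:    data=%#x len=%d\n", sh.Data, sh.Len)
	fmt.Printf("tail: data=%#x len=%d\n", th.Data, th.Len)

	// If s exactly filled its heap block, tail's pointer would be the start
	// of the *next* block - the situation the collector must not mark.
	fmt.Println("tail points one past s:", th.Data == sh.Data+uintptr(sh.Len))
}
```
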
- 11 Mar, 2014 2 commits
Dmitriy Vyukov authored
Spans are now private to threads, and the loop is removed from all other functions. Remove it from marknogc for consistency.

LGTM=khr, rsc
R=golang-codereviews, bradfitz, khr
CC=golang-codereviews, khr, rsc
https://golang.org/cl/72520043

Dmitriy Vyukov authored
LGTM=khr, rsc
R=golang-codereviews, bradfitz, khr
CC=golang-codereviews, khr, rsc
https://golang.org/cl/72480044

- 10 Mar, 2014 1 commit
Keith Randall authored
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/73720044

- 07 Mar, 2014 1 commit
Russ Cox authored
One reason the sync.Pool finalizer test can fail is that this function's ef1 contains uninitialized data that just happens to point at some of the old pool. I've seen this cause retention of a single pool cache line (32 elements) on arm.

Really we need liveness information for C functions, but for now we can be more careful about data in long-lived C functions that block.

LGTM=bradfitz, dvyukov
R=golang-codereviews, bradfitz, dvyukov
CC=golang-codereviews, iant, khr
https://golang.org/cl/72490043

- 06 Mar, 2014 3 commits
Russ Cox authored
Two memory allocator bug fixes.

- efence is not maintaining the proper heap metadata to make eventual memory reuse safe, so use SysFault.
- now that our heap PageSize is 8k but most hardware uses 4k pages, SysAlloc and SysReserve results must be explicitly aligned. Do that in a few more call sites and document this fact in malloc.h.

Fixes #7448.

LGTM=iant
R=golang-codereviews, josharian, iant
CC=dvyukov, golang-codereviews
https://golang.org/cl/71750048

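The alignment fix amounts to the usual power-of-two round-up applied to addresses the OS hands back. A minimal sketch (names assumed; the runtime's actual call sites differ):

```go
package main

import "fmt"

// roundUp rounds p up to a multiple of align, which must be a power of two.
// This is the adjustment a caller applies when the OS returns 4K-aligned
// memory but the heap wants PageSize (8K) alignment.
func roundUp(p, align uintptr) uintptr {
	return (p + align - 1) &^ (align - 1)
}

func main() {
	const pageSize = 8 << 10 // heap page size from the commit
	for _, p := range []uintptr{0x1000, 0x2000, 0x2abc} {
		fmt.Printf("%#x -> %#x\n", p, roundUp(p, pageSize))
	}
}
```
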
David du Colombier authored
warning: pkg/runtime/mgc0.c:2352 format mismatch p UVLONG, arg 2
warning: pkg/runtime/mgc0.c:2352 format mismatch p UVLONG, arg 3

LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/71950044

Dmitriy Vyukov authored
It was caused by mstats.heap_alloc skew.

Fixes #7430.

LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rsc
https://golang.org/cl/69870055

- 27 Feb, 2014 2 commits
Keith Randall authored
Before GC, we flush all the per-P allocation caches. Doing stack shrinking mid-GC causes these caches to fill up. At the end of gc, the sweepgen is incremented which causes all of the data in these caches to be in a bad state (cached but not yet swept).

Move the stack shrinking until after sweepgen is incremented, so any caching that happens as part of shrinking is done with already-swept data.

Reenable stack copying.

LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/69620043

Keith Randall authored
On stack overflow, if all frames on the stack are copyable, we copy the frames to a new stack twice as large as the old one. During GC, if a G is using less than 1/4 of its stack, copy the stack to a stack half its size.

TODO
- Do something about C frames. When a C frame is in the stack segment, it isn't copyable. We allocate a new segment in this case.
  - For idempotent C code, we can abort it, copy the stack, then retry. I'm working on a separate CL for this.
  - For other C code, we can raise the stackguard to the lowest Go frame so the next call that Go frame makes triggers a copy, which will then succeed.
- Pick a starting stack size?

The plan is that eventually we reach a point where the stack contains only copyable frames.

LGTM=rsc
R=dvyukov, rsc
CC=golang-codereviews
https://golang.org/cl/54650044

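The grow/shrink policy reduces to a small pure function. A sketch under the thresholds stated above (the names and the minimum size are assumptions, not the runtime's):

```go
package main

import "fmt"

const minStack = 8 << 10 // assumed floor so shrinking terminates

// newStackSize applies the policy above: double on overflow; during GC,
// halve when less than a quarter of the stack is in use.
func newStackSize(size, used uintptr, overflow bool) uintptr {
	if overflow {
		return size * 2
	}
	if used < size/4 && size/2 >= minStack {
		return size / 2
	}
	return size
}

func main() {
	fmt.Println(newStackSize(16<<10, 16<<10, true)) // overflow: 32768
	fmt.Println(newStackSize(64<<10, 4<<10, false)) // mostly idle: 32768
	fmt.Println(newStackSize(16<<10, 8<<10, false)) // busy enough: 16384
}
```
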
- 26 Feb, 2014 1 commit
Keith Randall authored
MCaches now hold a MSpan for each sizeclass which they have exclusive access to allocate from, so no lock is needed. Modifying the heap bitmaps also no longer requires a cas.

runtime.free gets more expensive. But we don't use it much any more.

It's not much faster on 1 processor, but it's a lot faster on multiple processors.

benchmark               old ns/op  new ns/op  delta
BenchmarkSetTypeNoPtr1  24         23         -0.42%
BenchmarkSetTypeNoPtr2  33         34         +0.89%
BenchmarkSetTypePtr1    51         49         -3.72%
BenchmarkSetTypePtr2    55         54         -1.98%

benchmark               old ns/op  new ns/op  delta
BenchmarkAllocation     52739      50770      -3.73%
BenchmarkAllocation-2   33957      34141      +0.54%
BenchmarkAllocation-3   33326      29015      -12.94%
BenchmarkAllocation-4   38105      25795      -32.31%
BenchmarkAllocation-5   68055      24409      -64.13%
BenchmarkAllocation-6   71544      23488      -67.17%
BenchmarkAllocation-7   68374      23041      -66.30%
BenchmarkAllocation-8   70117      20758      -70.40%

LGTM=rsc, dvyukov
R=dvyukov, bradfitz, khr, rsc
CC=golang-codereviews
https://golang.org/cl/46810043

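Why exclusive ownership removes the lock: the fast path only touches state one P owns. A toy model (types and the refill helper are invented for illustration, not the runtime's):

```go
package main

import "fmt"

const numSizeClasses = 67 // size-class count of that era's runtime; assumed here

// span models a run of pages carved into equal-size objects.
type span struct {
	freelist []uintptr // free object addresses
}

// mcache is per-P: it owns one span per size class outright, so alloc
// needs no locks or atomic operations on the fast path.
type mcache struct {
	spans [numSizeClasses]*span
}

// refillFromCentral stands in for the slow path, which takes the central
// list's lock; here it just fabricates a fresh span.
func refillFromCentral(sizeclass int) *span {
	s := &span{}
	base := uintptr(0x1000 * (sizeclass + 1))
	for i := uintptr(0); i < 4; i++ {
		s.freelist = append(s.freelist, base+i*64)
	}
	return s
}

func (c *mcache) alloc(sizeclass int) uintptr {
	s := c.spans[sizeclass]
	if s == nil || len(s.freelist) == 0 {
		s = refillFromCentral(sizeclass)
		c.spans[sizeclass] = s
	}
	p := s.freelist[len(s.freelist)-1]
	s.freelist = s.freelist[:len(s.freelist)-1]
	return p
}

func main() {
	var c mcache
	fmt.Printf("%#x %#x\n", c.alloc(3), c.alloc(3))
}
```
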
- 25 Feb, 2014 1 commit
Dave Cheney authored
See golang.org/s/go13nacl for design overview.

This CL is the mostly mechanical changes from rsc's Go 1.2 based NaCl branch, specifically 39cb35750369 to 500771b477cf from https://code.google.com/r/rsc-go13nacl. This CL does not include working NaCl support, there are probably two or three more large merges to come.

CL 15750044 is not included as it involves more invasive changes to the linker which will need to be merged separately.

The exact change lists included are

15050047: syscall: support for Native Client
15360044: syscall: unzip implementation for Native Client
15370044: syscall: Native Client SRPC implementation
15400047: cmd/dist, cmd/go, go/build, test: support for Native Client
15410048: runtime: support for Native Client
15410049: syscall: file descriptor table for Native Client
15410050: syscall: in-memory file system for Native Client
15440048: all: update +build lines for Native Client port
15540045: cmd/6g, cmd/8g, cmd/gc: support for Native Client
15570045: os: support for Native Client
15680044: crypto/..., hash/crc32, reflect, sync/atomic: support for amd64p32
15690044: net: support for Native Client
15690048: runtime: support for fake time like on Go Playground
15690051: build: disable various tests on Native Client

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/68150047

- 24 Feb, 2014 3 commits
Dmitriy Vyukov authored
With concurrent sweeping, finq is modified by runfinq and queuefinalizer concurrently.

Fixes crashes like this one: http://build.golang.org/log/6ad7b59ef2e93e3c9347eabfb4c4bd66df58fd5a

Fixes #7324.
Update #7396

LGTM=rsc
R=golang-codereviews, minux.ma, rsc
CC=golang-codereviews, khr
https://golang.org/cl/67980043

Dmitriy Vyukov authored
Reinforce the guarantee that MSpan_EnsureSwept actually ensures that the span is swept. I have not observed crashes related to this, but I do not see why it can't crash as well.

LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, khr, rsc
https://golang.org/cl/67990043

Dmitriy Vyukov authored
runfinqv is already defined the same way on line 271. There may also be something to fix in compiler/linker wrt diagnostics.

Fixes #7375.

LGTM=bradfitz
R=golang-codereviews, dave, bradfitz
CC=golang-codereviews
https://golang.org/cl/67850044

- 20 Feb, 2014 1 commit
Russ Cox authored
Package runtime's C functions written to be called from Go started out written in C using carefully constructed argument lists and the FLUSH macro to write a result back to memory.

For some functions, the appropriate parameter list ended up being architecture-dependent due to differences in alignment, so we added 'goc2c', which takes a .goc file containing Go func declarations but C bodies, rewrites the Go func declaration to equivalent C declarations for the target architecture, adds the needed FLUSH statements, and writes out an equivalent C file. That C file is compiled as part of package runtime.

Native Client's x86-64 support introduces the most complex alignment rules yet, breaking many functions that could until now be portably written in C. Using goc2c for those avoids the breakage.

Separately, Keith's work on emitting stack information from the C compiler would require the hand-written functions to add #pragmas specifying how many arguments are result parameters. Using goc2c for those avoids maintaining #pragmas.

For both reasons, use goc2c for as many Go-called C functions as possible.

This CL is a replay of the bulk of CL 15400047 and CL 15790043, both of which were reviewed as part of the NaCl port and are checked in to the NaCl branch. This CL is part of bringing the NaCl code into the main tree. No new code here, just reformatting and occasional movement into .h files.

LGTM=r
R=dave, alex.brainman, r
CC=golang-codereviews
https://golang.org/cl/65220044

- 13 Feb, 2014 2 commits
Russ Cox authored
This cleans up the code significantly, and it avoids any possible problems with madvise zeroing out some but not all of the data.

Fixes #6400.

LGTM=dave
R=dvyukov, dave
CC=golang-codereviews
https://golang.org/cl/57680046

Dmitriy Vyukov authored
The issue was that one of the MSpan_Sweep callers was doing sweep with preemption enabled. Additional checks are added.

LGTM=rsc
R=rsc, dave
CC=golang-codereviews
https://golang.org/cl/62990043

- 12 Feb, 2014 3 commits
Russ Cox authored
State of the world:
CL 46430043 introduced a new concurrent sweep but is broken.
CL 62360043 made the new sweep non-concurrent to try to fix the world while we understand what's wrong with the concurrent version.
This CL fixes the non-concurrent form to run finalizers.

This CL is just a band-aid to get the build green again. Dmitriy is working on understanding and then fixing what's wrong with the concurrent sweep.

TBR=dvyukov
CC=golang-codereviews
https://golang.org/cl/62370043

Dmitriy Vyukov authored
We see failures on builders, e.g.:
http://build.golang.org/log/70bb28cd6bcf8c4f49810a011bb4337a61977bf4

LGTM=rsc, dave
R=rsc, dave
CC=golang-codereviews
https://golang.org/cl/62360043

Dmitriy Vyukov authored
Moves sweep phase out of stoptheworld by adding background sweeper goroutine and lazy on-demand sweeping.

It turned out to be somewhat trickier than I expected, because there is no point in time when we know the size of the live heap nor a consistent number of mallocs and frees. So everything related to next_gc, mprof, memstats, etc becomes trickier.

At the end of GC next_gc is conservatively set to heap_alloc*GOGC, which is much larger than the real value. But after every sweep next_gc is decremented by freed*GOGC. So when everything is swept next_gc becomes what it should be.

For mprof I had to introduce a 3-generation scheme (allocs, recent_allocs, prev_allocs), because by the end of GC we know the number of frees only for the *previous* GC.

Significant caution is required to not cross the yet-unknown real value of next_gc. This is achieved by 2 means:
1. Whenever I allocate a span from MCentral, I sweep a span in that MCentral.
2. Whenever I allocate N pages from MHeap, I sweep until at least N pages are returned to the heap.
This provides quite strong guarantees that the heap does not grow when it should not.

http-1
allocated      7036       7033       -0.04%
allocs         60         60         +0.00%
cputime        51050      46700      -8.52%
gc-pause-one   34060569   1777993    -94.78%
gc-pause-total 2554       133        -94.79%
latency-50     178448     170926     -4.22%
latency-95     284350     198294     -30.26%
latency-99     345191     220652     -36.08%
rss            101564416  101007360  -0.55%
sys-gc         6606832    6541296    -0.99%
sys-heap       88801280   87752704   -1.18%
sys-other      7334208    7405928    +0.98%
sys-stack      524288     524288     +0.00%
sys-total      103266608  102224216  -1.01%
time           50339      46533      -7.56%
virtual-mem    292990976  293728256  +0.25%

garbage-1
allocated      2983818    2990889    +0.24%
allocs         62880      62902      +0.03%
cputime        16480000   16190000   -1.76%
gc-pause-one   828462467  487875135  -41.11%
gc-pause-total 4142312    2439375    -41.11%
rss            1151709184 1153712128 +0.17%
sys-gc         66068352   66068352   +0.00%
sys-heap       1039728640 1039728640 +0.00%
sys-other      37776064   40770176   +7.93%
sys-stack      8781824    8781824    +0.00%
sys-total      1152354880 1155348992 +0.26%
time           16496998   16199876   -1.80%
virtual-mem    1409564672 1402281984 -0.52%

LGTM=rsc
R=golang-codereviews, sameer, rsc, iant, jeremyjackins, gobot
CC=golang-codereviews, khr
https://golang.org/cl/46430043

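The next_gc bookkeeping described above can be sketched as two operations: overestimate at the end of GC, then refund as lazy sweeping discovers freed bytes. Names here are invented; this models the accounting, not the runtime's code:

```go
package main

import "fmt"

// gcState models the lazy-sweep accounting: nextGC starts as a conservative
// overestimate (heap_alloc*GOGC in the commit's terms) and each swept span
// refunds freed*GOGC, converging on the value a stop-the-world sweep would
// have computed.
type gcState struct {
	heapAlloc uintptr // bytes not yet known to be free
	nextGC    uintptr // heap size that triggers the next GC
	gogc      uintptr // growth multiplier standing in for GOGC/100
}

func (s *gcState) finishGC() {
	// The live-heap size is unknown until sweeping finishes, so overestimate.
	s.nextGC = s.heapAlloc * s.gogc
}

func (s *gcState) sweptSpan(freed uintptr) {
	s.heapAlloc -= freed
	s.nextGC -= freed * s.gogc
}

func main() {
	s := gcState{heapAlloc: 100 << 20, gogc: 2}
	s.finishGC()
	fmt.Println("conservative next_gc:", s.nextGC>>20, "MB") // 200 MB
	s.sweptSpan(30 << 20)                                    // sweeping finds 30 MB dead
	fmt.Println("after sweep next_gc:", s.nextGC>>20, "MB")  // 140 MB == live*2
}
```
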
- 30 Jan, 2014 1 commit
Dmitriy Vyukov authored
Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages. Only Chromium uses 4K pages today (in "slow but small" configuration). The general tendency is to increase page size, because it reduces metadata size and DTLB pressure. This change reduces GC pause by ~10% and slightly improves other metrics.

json-1
allocated      8037492    8038689    +0.01%
allocs         105762     105573     -0.18%
cputime        158400000  155800000  -1.64%
gc-pause-one   4412234    4135702    -6.27%
gc-pause-total 2647340    2398707    -9.39%
rss            54923264   54525952   -0.72%
sys-gc         3952624    3928048    -0.62%
sys-heap       46399488   46006272   -0.85%
sys-other      5597504    5290304    -5.49%
sys-stack      393216     393216     +0.00%
sys-total      56342832   55617840   -1.29%
time           158478890  156046916  -1.53%
virtual-mem    256548864  256593920  +0.02%

garbage-1
allocated      2991113    2986259    -0.16%
allocs         62844      62652      -0.31%
cputime        16330000   15860000   -2.88%
gc-pause-one   789108229  725555211  -8.05%
gc-pause-total 3945541    3627776    -8.05%
rss            1143660544 1132253184 -1.00%
sys-gc         65609600   65806208   +0.30%
sys-heap       1032388608 1035599872 +0.31%
sys-other      37501632   22777664   -39.26%
sys-stack      8650752    8781824    +1.52%
sys-total      1144150592 1132965568 -0.98%
time           16364602   15891994   -2.89%
virtual-mem    1327296512 1313746944 -1.02%

This is the exact reincarnation of already LGTMed:
https://golang.org/cl/45770044
which must not break darwin/freebsd after:
https://golang.org/cl/56630043

TBR=iant
LGTM=khr, iant
R=iant, khr
CC=golang-codereviews
https://golang.org/cl/58230043

- 27 Jan, 2014 1 commit
Dmitriy Vyukov authored
Currently windows crashes because early allocs in schedinit try to allocate tiny memory blocks, but m->p is not yet set up. I've considered calling procresize(1) earlier in schedinit, but this refactoring is better and must fix the issue as well.

Fixes #7218.

R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/54570045

- 24 Jan, 2014 3 commits
Dmitriy Vyukov authored
Combine NoScan allocations < 16 bytes into a single memory block. Reduces number of allocations on json/garbage benchmarks by 10+%.

json-1
allocated      8039872    7949194    -1.13%
allocs         105774     93776      -11.34%
cputime        156200000  100700000  -35.53%
gc-pause-one   4908873    3814853    -22.29%
gc-pause-total 2748969    2899288    +5.47%
rss            52674560   43560960   -17.30%
sys-gc         3796976    3256304    -14.24%
sys-heap       43843584   35192832   -19.73%
sys-other      5589312    5310784    -4.98%
sys-stack      393216     393216     +0.00%
sys-total      53623088   44153136   -17.66%
time           156193436  100886714  -35.41%
virtual-mem    256548864  256540672  -0.00%

garbage-1
allocated      2996885    2932982    -2.13%
allocs         62904      55200      -12.25%
cputime        17470000   17400000   -0.40%
gc-pause-one   932757485  925806143  -0.75%
gc-pause-total 4663787    4629030    -0.75%
rss            1151074304 1133670400 -1.51%
sys-gc         66068352   65085312   -1.49%
sys-heap       1039728640 1024065536 -1.51%
sys-other      38038208   37485248   -1.45%
sys-stack      8650752    8781824    +1.52%
sys-total      1152485952 1135417920 -1.48%
time           17478088   17418005   -0.34%
virtual-mem    1343709184 1324204032 -1.45%

LGTM=iant, bradfitz
R=golang-codereviews, dave, iant, rsc, bradfitz
CC=golang-codereviews, khr
https://golang.org/cl/38750047

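The idea is a bump allocator shared by small pointer-free objects. A toy model (names assumed; the real allocator works on the per-P cache and respects alignment):

```go
package main

import "fmt"

const tinyBlockSize = 16 // combining threshold from the commit

// tinyAlloc packs NoScan allocations smaller than 16 bytes into one shared
// block with a bump pointer, starting a fresh block when the current one
// can't fit the request.
type tinyAlloc struct {
	block  []byte
	offset int
}

func (t *tinyAlloc) alloc(size int) []byte {
	if size <= 0 || size >= tinyBlockSize {
		panic("tinyAlloc: only for sizes in (0, 16)")
	}
	if t.block == nil || t.offset+size > tinyBlockSize {
		t.block = make([]byte, tinyBlockSize)
		t.offset = 0
	}
	p := t.block[t.offset : t.offset+size : t.offset+size]
	t.offset += size
	return p
}

func main() {
	var t tinyAlloc
	a, b := t.alloc(8), t.alloc(7) // share one 16-byte block
	c := t.alloc(4)                // doesn't fit; new block
	fmt.Println(len(a), len(b), len(c))
}
```
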
Dmitriy Vyukov authored
Introduce fixed-size P-local caches. When local caches overflow/underflow a batch of items is transferred to/from global mutex-protected cache.

benchmark                       old ns/op  new ns/op  delta
BenchmarkPool                   50554      22423      -55.65%
BenchmarkPool-4                 400359     5904       -98.53%
BenchmarkPool-16                403311     1598       -99.60%
BenchmarkPool-32                367310     1526       -99.58%
BenchmarkPoolOverlflow          5214       3633       -30.32%
BenchmarkPoolOverlflow-4        42663      9539       -77.64%
BenchmarkPoolOverlflow-8        46919      11385      -75.73%
BenchmarkPoolOverlflow-16       39454      13048      -66.93%
BenchmarkSprintfEmpty           84         63         -25.68%
BenchmarkSprintfEmpty-2         371        32         -91.13%
BenchmarkSprintfEmpty-4         465        22         -95.25%
BenchmarkSprintfEmpty-8         565        12         -97.77%
BenchmarkSprintfEmpty-16        498        5          -98.87%
BenchmarkSprintfEmpty-32        492        4          -99.04%
BenchmarkSprintfString          259        229        -11.58%
BenchmarkSprintfString-2        574        144        -74.91%
BenchmarkSprintfString-4        651        77         -88.05%
BenchmarkSprintfString-8        868        47         -94.48%
BenchmarkSprintfString-16       825        33         -95.96%
BenchmarkSprintfString-32       825        30         -96.28%
BenchmarkSprintfInt             213        188        -11.74%
BenchmarkSprintfInt-2           448        138        -69.20%
BenchmarkSprintfInt-4           624        52         -91.63%
BenchmarkSprintfInt-8           691        31         -95.43%
BenchmarkSprintfInt-16          724        18         -97.46%
BenchmarkSprintfInt-32          718        16         -97.70%
BenchmarkSprintfIntInt          311        282        -9.32%
BenchmarkSprintfIntInt-2        333        145        -56.46%
BenchmarkSprintfIntInt-4        642        110        -82.87%
BenchmarkSprintfIntInt-8        832        42         -94.90%
BenchmarkSprintfIntInt-16       817        24         -97.00%
BenchmarkSprintfIntInt-32       805        22         -97.17%
BenchmarkSprintfPrefixedInt     309        269        -12.94%
BenchmarkSprintfPrefixedInt-2   245        168        -31.43%
BenchmarkSprintfPrefixedInt-4   598        99         -83.36%
BenchmarkSprintfPrefixedInt-8   770        67         -91.23%
BenchmarkSprintfPrefixedInt-16  829        54         -93.49%
BenchmarkSprintfPrefixedInt-32  824        50         -93.83%
BenchmarkSprintfFloat           418        398        -4.78%
BenchmarkSprintfFloat-2         295        203        -31.19%
BenchmarkSprintfFloat-4         585        128        -78.12%
BenchmarkSprintfFloat-8         873        60         -93.13%
BenchmarkSprintfFloat-16        884        33         -96.24%
BenchmarkSprintfFloat-32        881        29         -96.62%
BenchmarkManyArgs               1097       1069       -2.55%
BenchmarkManyArgs-2             705        567        -19.57%
BenchmarkManyArgs-4             792        319        -59.72%
BenchmarkManyArgs-8             963        172        -82.14%
BenchmarkManyArgs-16            1115       103        -90.76%
BenchmarkManyArgs-32            1133       90         -92.03%

LGTM=rsc
R=golang-codereviews, bradfitz, minux.ma, gobot, rsc
CC=golang-codereviews
https://golang.org/cl/46010043

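For context, this is the usage pattern those Sprintf numbers exercise: fmt keeps its print buffers in a sync.Pool, so the Get/Put pair below is what the P-local cache makes nearly free. Standard-library API only; the pool and function names are ours:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// A pool of reusable buffers. Get hits the P-local cache on the fast path,
// falling back to the global mutex-protected list (and finally New).
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

func format(n int) string {
	b := bufPool.Get().(*bytes.Buffer)
	defer bufPool.Put(b)
	b.Reset()
	fmt.Fprintf(b, "value=%d", n)
	return b.String()
}

func main() {
	fmt.Println(format(42))
}
```
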
Russ Cox authored
There is more zeroing than I would like right now - temporaries used for the new map and channel runtime calls need to be eliminated - but it will do for now.

This CL only has an effect if you are building with GOEXPERIMENT=precisestack ./all.bash (or make.bash). It costs about 5% in the overall time spent in all.bash. That number will come down before we make it on by default, but this should be enough for Keith to try using the precise maps for copying stacks.

amd64 only (and it's not really great generated code).

TBR=khr, iant
CC=golang-codereviews
https://golang.org/cl/56430043

- 23 Jan, 2014 2 commits
Dmitriy Vyukov authored
Breaks darwin and freebsd.

««« original CL description
runtime: increase page size to 8K

Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages. Only Chromium uses 4K pages today (in "slow but small" configuration). The general tendency is to increase page size, because it reduces metadata size and DTLB pressure. This change reduces GC pause by ~10% and slightly improves other metrics.

json-1
allocated      8037492    8038689    +0.01%
allocs         105762     105573     -0.18%
cputime        158400000  155800000  -1.64%
gc-pause-one   4412234    4135702    -6.27%
gc-pause-total 2647340    2398707    -9.39%
rss            54923264   54525952   -0.72%
sys-gc         3952624    3928048    -0.62%
sys-heap       46399488   46006272   -0.85%
sys-other      5597504    5290304    -5.49%
sys-stack      393216     393216     +0.00%
sys-total      56342832   55617840   -1.29%
time           158478890  156046916  -1.53%
virtual-mem    256548864  256593920  +0.02%

garbage-1
allocated      2991113    2986259    -0.16%
allocs         62844      62652      -0.31%
cputime        16330000   15860000   -2.88%
gc-pause-one   789108229  725555211  -8.05%
gc-pause-total 3945541    3627776    -8.05%
rss            1143660544 1132253184 -1.00%
sys-gc         65609600   65806208   +0.30%
sys-heap       1032388608 1035599872 +0.31%
sys-other      37501632   22777664   -39.26%
sys-stack      8650752    8781824    +1.52%
sys-total      1144150592 1132965568 -0.98%
time           16364602   15891994   -2.89%
virtual-mem    1327296512 1313746944 -1.02%

R=golang-codereviews, dave, khr, rsc, khr
CC=golang-codereviews
https://golang.org/cl/45770044
»»»

R=golang-codereviews
CC=golang-codereviews
https://golang.org/cl/56060043

Dmitriy Vyukov authored
Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages. Only Chromium uses 4K pages today (in "slow but small" configuration). The general tendency is to increase page size, because it reduces metadata size and DTLB pressure. This change reduces GC pause by ~10% and slightly improves other metrics.

json-1
allocated      8037492    8038689    +0.01%
allocs         105762     105573     -0.18%
cputime        158400000  155800000  -1.64%
gc-pause-one   4412234    4135702    -6.27%
gc-pause-total 2647340    2398707    -9.39%
rss            54923264   54525952   -0.72%
sys-gc         3952624    3928048    -0.62%
sys-heap       46399488   46006272   -0.85%
sys-other      5597504    5290304    -5.49%
sys-stack      393216     393216     +0.00%
sys-total      56342832   55617840   -1.29%
time           158478890  156046916  -1.53%
virtual-mem    256548864  256593920  +0.02%

garbage-1
allocated      2991113    2986259    -0.16%
allocs         62844      62652      -0.31%
cputime        16330000   15860000   -2.88%
gc-pause-one   789108229  725555211  -8.05%
gc-pause-total 3945541    3627776    -8.05%
rss            1143660544 1132253184 -1.00%
sys-gc         65609600   65806208   +0.30%
sys-heap       1032388608 1035599872 +0.31%
sys-other      37501632   22777664   -39.26%
sys-stack      8650752    8781824    +1.52%
sys-total      1144150592 1132965568 -0.98%
time           16364602   15891994   -2.89%
virtual-mem    1327296512 1313746944 -1.02%

R=golang-codereviews, dave, khr, rsc, khr
CC=golang-codereviews
https://golang.org/cl/45770044

- 22 Jan, 2014 1 commit
Dmitriy Vyukov authored
Introduces two-phase goroutine parking mechanism -- prepare to park, commit park. This mechanism does not require backing mutex to protect wait predicate. Use it in netpoll. See comment in netpoll.goc for details.

This slightly reduces contention between reader, writer and read/write io notifications; and just eliminates a bunch of mutex operations from hotpaths, thus making them faster.

benchmark                           old ns/op  new ns/op  delta
BenchmarkTCP4ConcurrentReadWrite    2109       1945       -7.78%
BenchmarkTCP4ConcurrentReadWrite-2  1162       1113       -4.22%
BenchmarkTCP4ConcurrentReadWrite-4  798        755        -5.39%
BenchmarkTCP4ConcurrentReadWrite-8  803        748        -6.85%
BenchmarkTCP4Persistent             9411       9240       -1.82%
BenchmarkTCP4Persistent-2           5888       5813       -1.27%
BenchmarkTCP4Persistent-4           4016       3968       -1.20%
BenchmarkTCP4Persistent-8           3943       3857       -2.18%

R=golang-codereviews, mikioh.mikioh, gobot, iant, rsc
CC=golang-codereviews, khr
https://golang.org/cl/45700043

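A minimal model of the two-phase idea (publish intent with a CAS, then commit to sleeping), for one waiter and one notification; a buffered channel stands in for the runtime's park/unpark, all names are invented, and netpoll's real protocol is considerably more involved:

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

const (
	stateEmpty   int32 = iota // no event, no waiter
	stateReady                // event arrived before a waiter showed up
	stateWaiting              // a waiter has committed to parking
)

// event lets a waiter block for a notification without a mutex around the
// wait predicate: the state word itself is the predicate.
type event struct {
	state int32
	sema  chan struct{} // stand-in for goroutine park/unpark
}

func newEvent() *event { return &event{sema: make(chan struct{}, 1)} }

// wait: phase one publishes the intent to park with a CAS; if the
// notification already happened the CAS fails and we never sleep.
// Phase two commits to parking.
func (e *event) wait() {
	if atomic.CompareAndSwapInt32(&e.state, stateEmpty, stateWaiting) {
		<-e.sema // commit park; notify wakes us
	}
	atomic.StoreInt32(&e.state, stateEmpty) // consume the notification
}

// notify: wake a committed waiter if there is one, else record the event.
func (e *event) notify() {
	if atomic.CompareAndSwapInt32(&e.state, stateWaiting, stateEmpty) {
		e.sema <- struct{}{}
		return
	}
	atomic.CompareAndSwapInt32(&e.state, stateEmpty, stateReady)
}

func main() {
	e := newEvent()
	go func() {
		time.Sleep(10 * time.Millisecond)
		e.notify()
	}()
	e.wait()
	fmt.Println("woken without a mutex")
}
```
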
- 21 Jan, 2014 3 commits
Dmitriy Vyukov authored
Currently we collect (add) all roots into a global array in a single-threaded GC phase. This hinders parallelism.

With this change we just kick off a parallel for over number_of_goroutines+5 iterations. The parallel for callback then decides whether it needs to scan the stack of a goroutine, scan the data segment, scan finalizers, etc. This eliminates the single-threaded phase entirely. It requires storing all goroutines in an array instead of a linked list (to allow direct indexing).

This CL also removes the DebugScan functionality. It is broken because it uses unbounded stack, so it can not run on g0. When it was working, I found it unhelpful for debugging issues because the two algorithms are too different now. This change would require updating DebugScan, so it's simpler to just delete it.

With 8 threads this change reduces GC pause by ~6%, while keeping cputime roughly the same.

garbage-8
allocated      2987886    2989221    +0.04%
allocs         62885      62887      +0.00%
cputime        21286000   21272000   -0.07%
gc-pause-one   26633247   24885421   -6.56%
gc-pause-total 873570     811264     -7.13%
rss            242089984  242515968  +0.18%
sys-gc         13934336   13869056   -0.47%
sys-heap       205062144  205062144  +0.00%
sys-other      12628288   12628288   +0.00%
sys-stack      11534336   11927552   +3.41%
sys-total      243159104  243487040  +0.13%
time           2809477    2740795    -2.44%

R=golang-codereviews, rsc
CC=cshapiro, golang-codereviews, khr
https://golang.org/cl/46860043

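The shape of the change, as a sketch: a parallel for over root indices, where the first few indices name global roots and the rest map to goroutine stacks. Everything here (counts, names, the print statements) is illustrative, not the runtime's code:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// parallelFor doles out iteration indices to workers via an atomic counter,
// the same basic shape as the runtime's parallel-for used for root scanning.
func parallelFor(n, workers int, body func(i int)) {
	var next int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				i := int(atomic.AddInt64(&next, 1)) - 1
				if i >= n {
					return
				}
				body(i)
			}
		}()
	}
	wg.Wait()
}

func main() {
	const fixedRoots = 5 // the "+5" above: data, bss, finalizers, etc. (assumed split)
	numGoroutines := 3
	parallelFor(fixedRoots+numGoroutines, 4, func(i int) {
		if i < fixedRoots {
			fmt.Println("scan global root", i)
		} else {
			fmt.Println("scan stack of goroutine", i-fixedRoots)
		}
	})
}
```
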
Dmitriy Vyukov authored
Instead of a per-goroutine stack of defers for all sizes, introduce per-P defer pool for argument sizes 8, 24, 40, 56, 72 bytes.

For a program that starts 1e6 goroutines and then joins them:
old: rss=6.6g virtmem=10.2g time=4.85s
new: rss=4.5g virtmem= 8.2g time=3.48s

R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/42750044

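The five pool sizes form an arithmetic ladder, so picking a pool is a single rounding step. A sketch of the bucketing (the formula and names are our illustration, not the runtime's):

```go
package main

import "fmt"

// deferClass maps a defer record size to one of the five per-P pools
// holding 8, 24, 40, 56 and 72-byte records: class i serves sizes up to
// 8+16*i. Sizes above 72 fall back to the ordinary allocator.
func deferClass(size uintptr) (class int, ok bool) {
	if size > 72 {
		return 0, false
	}
	return int((size + 7) / 16), true
}

func main() {
	for _, sz := range []uintptr{8, 9, 24, 70, 100} {
		if c, ok := deferClass(sz); ok {
			fmt.Printf("size %3d -> pool of %d-byte records\n", sz, 8+16*c)
		} else {
			fmt.Printf("size %3d -> heap\n", sz)
		}
	}
}
```
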
Dmitriy Vyukov authored
Currently for 2-word blocks we set the flag to clear the flag. That makes no sense: in particular, on 32-bit we always call memclr.

R=golang-codereviews, dave, iant
CC=golang-codereviews, khr, rsc
https://golang.org/cl/41170044

- 16 Jan, 2014 1 commit
Dmitriy Vyukov authored
Example of output:

goroutine 4 [sleep for 3 min]:
time.Sleep(0x34630b8a000)
	src/pkg/runtime/time.goc:31 +0x31
main.func·002()
	block.go:16 +0x2c
created by main.main
	block.go:17 +0x33

Full program and output are here:
http://play.golang.org/p/NEZdADI3Td

Fixes #6809.

R=golang-codereviews, khr, kamil.kisiel, bradfitz, rsc
CC=golang-codereviews
https://golang.org/cl/50420043