iOS Source Analysis: From a Crash in Production to the GCD Source Code

It all started with analyzing a crash reported on Fabric.

Crash Log

Fabric suddenly started reporting crashes triggered by our download module's use of GCD groups:

#0. Crashed: com.apple.main-thread
0 libdispatch.dylib 0x192759b3c dispatch_group_leave.cold.1 + 36
1 libdispatch.dylib 0x19272ad84 _dispatch_group_wake + 114
2 MTXX 0x103be1af8 __38-[xxxxxx downloadCompletion]_block_invoke + 108 (xxxxxx.m:108)
3 libdispatch.dylib 0x192728b7c _dispatch_call_block_and_release + 32
4 libdispatch.dylib 0x192729fd8 _dispatch_client_callout + 20
5 libdispatch.dylib 0x192735cc8 _dispatch_main_queue_callback_4CF + 968
6 CoreFoundation 0x1929ffcc8 __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 16
7 CoreFoundation 0x1929faa24 __CFRunLoopRun + 1980
8 CoreFoundation 0x1929f9f40 CFRunLoopRunSpecific + 480
9 GraphicsServices 0x19cc8a534 GSEventRunModal + 108
10 UIKitCore 0x196b85580 UIApplicationMain + 1940
11 MTXX 0x105c6af10 main + 16 (main.m:16)
12 libdyld.dylib 0x192878e18 start + 4

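From past experience, this was clearly caused by unbalanced GCD group enter/leave calls. The documentation of dispatch_group_enter states explicitly that it must be paired with dispatch_group_leave.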

/*!
* @function dispatch_group_enter
*
* @abstract
* Manually indicate a block has entered the group
*
* @discussion
* Calling this function indicates another block has joined the group through
* a means other than dispatch_group_async(). Calls to this function must be
* balanced with dispatch_group_leave().
*
* @param group
* The dispatch group to update.
* The result of passing NULL in this parameter is undefined.
*/
API_AVAILABLE(macos(10.6), ios(4.0))
DISPATCH_EXPORT DISPATCH_NONNULL_ALL DISPATCH_NOTHROW
void
dispatch_group_enter(dispatch_group_t group);

First Analysis

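After a careful review, we did find a code path that could skip dispatch_group_leave. The logic is roughly as follows (pseudocode showing only the parts relevant to this article):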

- (void)downloadURLs:(NSArray<NSURL *> *)urls finishCompletion:(void(^)(void))finishCompletion {
    dispatch_group_t dispatchGroup = dispatch_group_create();
    for (NSURL *url in urls) {
        dispatch_group_enter(dispatchGroup);
        [self downloadURL:url finishCompletion:^(NSURL *url, BOOL isSuccess) {
            // Handle download success or failure
            // xxxxx
            dispatch_group_leave(dispatchGroup);
        }];
    }
    dispatch_group_notify(dispatchGroup, dispatch_get_main_queue(), ^{
        if (finishCompletion) {
            finishCompletion();
        }
    });
}

- (void)downloadURL:(NSURL *)url finishCompletion:(void(^)(NSURL *URL, BOOL isSuccess))finishCompletion {
    // Various logic and if-else checks... (this is legacy code)

    // One of them checks for a paused task, roughly like this:
    DownloadItem *downloadItem = [self downloadItemForURL:url];
    if (downloadItem.isPaused) { // pseudocode: "the item is currently paused"
        // Resume the download
        return;
    }

    // xxxxxx
    // Start the actual download
}

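Note that, because this is old code, the resume path did not pass finishCompletion along, so finishCompletion never got a chance to run. That leaves the group's enter/leave unbalanced, so we changed the code as follows: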

if (downloadItem.isPaused) { // pseudocode: "the item is currently paused"
    // Resume the download
    downloadItem.finishCompletion = finishCompletion;
    return;
}

Some Exploration

Even after the change, I still felt uneasy. Was this really the right fix?

Let's think carefully about the documentation:

dispatch_group_enter: Calling this function indicates another block has joined the group through a means other than dispatch_group_async(). Calls to this function must be balanced with dispatch_group_leave().

It doesn't actually say that a missing dispatch_group_leave causes a crash. Let's try it out in code:

Experiments

Missing dispatch_group_leave

- (void)group_leave_not_crash_1 {
dispatch_group_t group = dispatch_group_create();

dispatch_group_enter(group);
dispatch_async(dispatch_get_global_queue(0, 0), ^{
NSLog(@"global_queue block 1");
});

dispatch_group_enter(group);
dispatch_async(dispatch_get_global_queue(0, 0), ^{
NSLog(@"global_queue block 2");
});

dispatch_group_notify(group, dispatch_get_main_queue(), ^{
NSLog(@"dispatch_group_notify");
});

NSLog(@"done");
}

Output:

done
global_queue block 1
global_queue block 2

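No crash. The only sound was that of my own assumptions being slapped down. Note, however, that "dispatch_group_notify" is never printed: the notify block never runs.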

Missing dispatch_group_enter

- (void)group_leave_crash {
dispatch_group_t group = dispatch_group_create();

dispatch_async(dispatch_get_global_queue(0, 0), ^{
NSLog(@"dispatch_group_notify main_queue block 1");
dispatch_group_leave(group);
});

dispatch_async(dispatch_get_global_queue(0, 0), ^{
NSLog(@"dispatch_group_notify main_queue block 2");
dispatch_group_leave(group);
});

dispatch_group_notify(group, dispatch_get_main_queue(), ^{
NSLog(@"dispatch_group_notify");
});

NSLog(@"done");
}

Either of these two unbalanced dispatch_group_leave calls crashes the app; whichever block runs first triggers the crash.

dispatch_group_enter and dispatch_group_leave not strictly paired

- (void)group_leave_not_crash_2 {
dispatch_group_t group = dispatch_group_create();

dispatch_group_enter(group);
dispatch_async(dispatch_get_global_queue(0, 0), ^{
NSLog(@"global_queue block 1");
});

dispatch_async(dispatch_get_global_queue(0, 0), ^{
NSLog(@"global_queue block 2");
dispatch_group_leave(group);
});

dispatch_group_notify(group, dispatch_get_main_queue(), ^{
NSLog(@"dispatch_group_notify");
});

NSLog(@"done");
}

- (void)group_leave_not_crash_3 {
dispatch_group_t group = dispatch_group_create();

dispatch_async(dispatch_get_global_queue(0, 0), ^{
NSLog(@"global_queue block 1");
dispatch_group_leave(group);
});

dispatch_group_enter(group);
dispatch_async(dispatch_get_global_queue(0, 0), ^{
NSLog(@"global_queue block 2");
});

dispatch_group_notify(group, dispatch_get_main_queue(), ^{
NSLog(@"dispatch_group_notify");
});

NSLog(@"done");
}

The output in both cases is:

done
global_queue block 1
global_queue block 2
dispatch_group_notify

The dispatch_group_enter and dispatch_group_leave calls are not paired one to one, yet the dispatch_group_notify block still ran. That's a bit strange...

Conclusions

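  1. Calling only dispatch_group_enter without a matching dispatch_group_leave does not crash (but the notify block never runs).
  2. Calling dispatch_group_leave without a preceding dispatch_group_enter crashes immediately.
  3. If dispatch_group_enter and dispatch_group_leave are not paired call for call but their counts match, there is no problem.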
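To make these three conclusions concrete, here is a minimal single-threaded model of a group's counter in C. It is not libdispatch code: `group_model`, `model_enter`, `model_leave`, and `model_notify` are hypothetical names, and the `crashed` flag stands in for the trap the real library raises. It only encodes the rules observed above: enter adds an outstanding unit, leave removes one (and "crashes" if none is outstanding), and the notify block fires when the count is, or returns to, zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded model of a dispatch group's counter. Not libdispatch code:
 * it only encodes the rules observed in the three experiments. */
typedef struct {
    unsigned outstanding;  /* enters not yet matched by a leave */
    bool has_notify;       /* a notify block has been registered */
    bool notify_fired;     /* the notify block has been submitted */
    bool crashed;          /* stands in for libdispatch's client crash */
} group_model;

static void model_enter(group_model *g) {
    g->outstanding++;
}

static void model_leave(group_model *g) {
    if (g->outstanding == 0) {
        g->crashed = true;            /* the real library traps here */
        return;
    }
    g->outstanding--;
    if (g->outstanding == 0 && g->has_notify) {
        g->notify_fired = true;       /* count returned to 0: wake notify */
    }
}

static void model_notify(group_model *g) {
    g->has_notify = true;
    if (g->outstanding == 0) {
        g->notify_fired = true;       /* nothing outstanding: fire at once */
    }
}
```

Replaying experiment 1 (two enters, then notify) leaves `outstanding` at 2 and `notify_fired` false; experiment 2 sets `crashed` on the very first leave; a count-matched but unpaired sequence (enter, notify, leave) fires the notify block.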

Analyzing the libdispatch Source

Analyzing the Crash Stack

In the demo that is missing dispatch_group_enter, the crash happens right at the dispatch_group_leave(group); line: Thread 1: EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0). Printing the group object gives *<OS_dispatch_group: group[0x600001088190] = { xref = 1, ref = 1, count = 1073741823, gen = 0, waiters = 0, notifs = 0 }>*

Let's look at the disassembly, first of the calling method, then of dispatch_group_leave and its cold path:

0x108e8fb30 <+0>:  pushq  %rbp
0x108e8fb31 <+1>: movq %rsp, %rbp
0x108e8fb34 <+4>: subq $0x20, %rsp
0x108e8fb38 <+8>: movq %rdi, -0x8(%rbp)
0x108e8fb3c <+12>: movq %rsi, -0x10(%rbp)
0x108e8fb40 <+16>: callq 0x108e900a8 ; symbol stub for: dispatch_group_create
0x108e8fb45 <+21>: movq %rax, -0x18(%rbp)
0x108e8fb49 <+25>: movq -0x18(%rbp), %rdi
0x108e8fb4d <+29>: callq 0x108e900b4 ; symbol stub for: dispatch_group_leave
-> 0x108e8fb52 <+34>: xorl %ecx, %ecx
0x108e8fb54 <+36>: movl %ecx, %esi
0x108e8fb56 <+38>: leaq -0x18(%rbp), %rax
0x108e8fb5a <+42>: movq %rax, %rdi
0x108e8fb5d <+45>: callq 0x108e900f6 ; symbol stub for: objc_storeStrong
0x108e8fb62 <+50>: addq $0x20, %rsp
0x108e8fb66 <+54>: popq %rbp
0x108e8fb67 <+55>: retq
libdispatch.dylib`dispatch_group_leave:
0x10f528955 <+0>: movl $0x4, %eax
0x10f52895a <+5>: lock
0x10f52895b <+6>: xaddq %rax, 0x30(%rdi)
0x10f528960 <+11>: cmpl $-0x4, %eax
0x10f528963 <+14>: jae 0x10f52896d ; <+24>
0x10f528965 <+16>: andl $-0x4, %eax
0x10f528968 <+19>: testl %eax, %eax
0x10f52896a <+21>: je 0x10f5289a3 ; <+78>
0x10f52896c <+23>: retq
0x10f52896d <+24>: addq $0x4, %rax
0x10f528971 <+28>: movq %rax, %rsi
0x10f528974 <+31>: movq %rax, %rcx
0x10f528977 <+34>: andq $-0x4, %rcx
0x10f52897b <+38>: testl $0xfffffffc, %esi ; imm = 0xFFFFFFFC
0x10f528981 <+44>: cmovneq %rax, %rcx
0x10f528985 <+48>: andq $-0x3, %rcx
0x10f528989 <+52>: cmpq %rcx, %rax
0x10f52898c <+55>: je 0x10f528999 ; <+68>
0x10f52898e <+57>: movq %rsi, %rax
0x10f528991 <+60>: lock
0x10f528992 <+61>: cmpxchgq %rcx, 0x30(%rdi)
0x10f528997 <+66>: jne 0x10f528971 ; <+28>
0x10f528999 <+68>: movl $0x1, %edx
0x10f52899e <+73>: jmp 0x10f5289af ; _dispatch_group_wake
0x10f5289a3 <+78>: pushq %rbp
0x10f5289a4 <+79>: movq %rsp, %rbp
0x10f5289a7 <+82>: movq %rax, %rdi
0x10f5289aa <+85>: callq 0x10f55a66d ; dispatch_group_leave.cold.1
libdispatch.dylib`dispatch_group_leave.cold.1:
0x10f55a66d <+0>: movq %rdi, %rax
0x10f55a670 <+3>: leaq 0x5bd6(%rip), %rcx ; "BUG IN CLIENT OF LIBDISPATCH: Unbalanced call to dispatch_group_leave()"
0x10f55a677 <+10>: movq %rcx, 0x27ad2(%rip) ; gCRAnnotations + 8
0x10f55a67e <+17>: movq %rax, 0x27afb(%rip) ; gCRAnnotations + 56
-> 0x10f55a685 <+24>: ud2

The key crash information is below. It confirms an Unbalanced call, and it matches the crash log from Fabric (dispatch_group_leave.cold.1).

Thread 1: EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)

dispatch_group_leave(group);
callq 0x10f55a66d ; dispatch_group_leave.cold.1
"BUG IN CLIENT OF LIBDISPATCH: Unbalanced call to dispatch_group_leave()"

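So we can be sure the Fabric crash was likewise caused by calling dispatch_group_leave too many times. A missing leave cannot produce this crash, so the first analysis, and the fix based on it, did not address the real problem.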

Calling dispatch_group_leave too many times does crash, but why exactly? To understand all of the above, we have to dig into the GCD source.

dispatch_group_leave

Here is the source of dispatch_group_leave:

void
dispatch_group_leave(dispatch_group_t dg)
{
// The value is incremented on a 64bits wide atomic so that the carry for
// the -1 -> 0 transition increments the generation atomically.
uint64_t new_state, old_state = os_atomic_add_orig2o(dg, dg_state,
DISPATCH_GROUP_VALUE_INTERVAL, release);
uint32_t old_value = (uint32_t)(old_state & DISPATCH_GROUP_VALUE_MASK);

if (unlikely(old_value == DISPATCH_GROUP_VALUE_1)) {
old_state += DISPATCH_GROUP_VALUE_INTERVAL;
do {
new_state = old_state;
if ((old_state & DISPATCH_GROUP_VALUE_MASK) == 0) {
new_state &= ~DISPATCH_GROUP_HAS_WAITERS;
new_state &= ~DISPATCH_GROUP_HAS_NOTIFS;
} else {
// If the group was entered again since the atomic_add above,
// we can't clear the waiters bit anymore as we don't know for
// which generation the waiters are for
new_state &= ~DISPATCH_GROUP_HAS_NOTIFS;
}
if (old_state == new_state) break;
} while (unlikely(!os_atomic_cmpxchgv2o(dg, dg_state,
old_state, new_state, &old_state, relaxed)));
return _dispatch_group_wake(dg, old_state, true);
}

if (unlikely(old_value == 0)) {
DISPATCH_CLIENT_CRASH((uintptr_t)old_value,
"Unbalanced call to dispatch_group_leave()");
}
}

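The Unbalanced call crash fires when old_value is 0. os_atomic_add_orig2o atomically adds DISPATCH_GROUP_VALUE_INTERVAL to the group's 64-bit dg_state field and returns the value it held before the add; old_value is the low 32 bits of that previous state, masked with DISPATCH_GROUP_VALUE_MASK.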

So if old_value is already 0 when dispatch_group_leave is called, the Unbalanced call crash is triggered.
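The check can be reproduced with plain integers. The sketch below is a non-atomic model of dispatch_group_leave; the constants are the values I believe libdispatch's semaphore_internal.h defines (DISPATCH_GROUP_VALUE_MASK = 0xfffffffc and DISPATCH_GROUP_VALUE_INTERVAL = 4, leaving the low two bits for the waiters/notifs flags), so treat them as assumptions. `debug_count` is likewise an assumed formula, chosen because it reproduces the `count` values printed in this article, not code quoted from libdispatch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, as I recall them from libdispatch's semaphore_internal.h. */
#define GROUP_VALUE_MASK      0x00000000fffffffcULL
#define GROUP_VALUE_INTERVAL  0x0000000000000004ULL
#define GROUP_VALUE_1         GROUP_VALUE_MASK   /* one outstanding enter */

typedef enum { LEAVE_OK, LEAVE_WAKE, LEAVE_UNBALANCED } leave_result;

/* Non-atomic model of the checks in dispatch_group_leave. */
static leave_result model_leave(uint64_t *state) {
    uint64_t old_state = *state;
    *state = old_state + GROUP_VALUE_INTERVAL;  /* the 64-bit add */
    uint32_t old_value = (uint32_t)(old_state & GROUP_VALUE_MASK);
    if (old_value == GROUP_VALUE_1) {
        return LEAVE_WAKE;        /* 1 -> 0: real code goes on to wake */
    }
    if (old_value == 0) {
        return LEAVE_UNBALANCED;  /* real code: DISPATCH_CLIENT_CRASH */
    }
    return LEAVE_OK;
}

/* Assumed formula for the "count = ..." field of the group's debug output. */
static uint32_t debug_count(uint64_t state) {
    uint32_t value = (uint32_t)(state & GROUP_VALUE_MASK);
    return (uint32_t)(0u - value) >> 2;
}
```

Starting from a fresh group (state 0), `model_leave` reports `LEAVE_UNBALANCED` and leaves the state at 4, whose `debug_count` is 1073741823, the same `count = 1073741823` printed for the crashing group earlier. A leave after exactly one enter (state 0xfffffffc) takes the `LEAVE_WAKE` path, and its carry lands in the upper 32 bits.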

dispatch_group_enter

A dispatch_group_enter without a matching leave does not crash. So what happens when dispatch_group_enter is the unbalanced side?

void
dispatch_group_enter(dispatch_group_t dg)
{
// The value is decremented on a 32bits wide atomic so that the carry
// for the 0 -> -1 transition is not propagated to the upper 32bits.
uint32_t old_bits = os_atomic_sub_orig2o(dg, dg_bits,
DISPATCH_GROUP_VALUE_INTERVAL, acquire);
uint32_t old_value = old_bits & DISPATCH_GROUP_VALUE_MASK;
if (unlikely(old_value == 0)) {
_dispatch_retain(dg); // <rdar://problem/22318411>
}
if (unlikely(old_value == DISPATCH_GROUP_VALUE_MAX)) {
DISPATCH_CLIENT_CRASH(old_bits,
"Too many nested calls to dispatch_group_enter()");
}
}

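dispatch_group_enter is easy to follow. os_atomic_sub_orig2o atomically subtracts DISPATCH_GROUP_VALUE_INTERVAL from the 32-bit dg_bits field and returns the value it held before the subtraction; old_value is that value masked with DISPATCH_GROUP_VALUE_MASK. When old_value is 0 (the first outstanding enter), the group retains itself. When old_value equals DISPATCH_GROUP_VALUE_MAX, one more dispatch_group_enter crashes, this time with "Too many nested calls to dispatch_group_enter()" rather than an Unbalanced call.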
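Under the same assumed constants (DISPATCH_GROUP_VALUE_MAX equal to DISPATCH_GROUP_VALUE_INTERVAL, i.e. 4), a plain-integer model shows how far the counter can go before the "Too many nested calls" check fires. `model_enter` and `enters_before_crash` are hypothetical helpers, not libdispatch functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values (libdispatch: DISPATCH_GROUP_VALUE_MAX == ..._INTERVAL). */
#define GROUP_VALUE_MASK      0xfffffffcu
#define GROUP_VALUE_INTERVAL  0x4u
#define GROUP_VALUE_MAX       GROUP_VALUE_INTERVAL

typedef enum { ENTER_OK, ENTER_FIRST, ENTER_TOO_MANY } enter_result;

/* Non-atomic model of the checks in dispatch_group_enter on 32-bit dg_bits. */
static enter_result model_enter(uint32_t *bits) {
    uint32_t old_value = *bits & GROUP_VALUE_MASK;
    *bits -= GROUP_VALUE_INTERVAL;       /* wraps modulo 2^32 */
    if (old_value == 0) {
        return ENTER_FIRST;              /* real code retains the group */
    }
    if (old_value == GROUP_VALUE_MAX) {
        return ENTER_TOO_MANY;           /* real code: "Too many nested calls" */
    }
    return ENTER_OK;
}

/* Successful enters before the MAX check fires: old_value takes the values
 * 0, 0xfffffffc, 0xfffffff8, ..., 8, i.e. every multiple of 4 except 4. */
static uint64_t enters_before_crash(void) {
    uint64_t multiples_of_interval =
        ((uint64_t)GROUP_VALUE_MASK + GROUP_VALUE_INTERVAL) / GROUP_VALUE_INTERVAL;
    return multiples_of_interval - 1;    /* 2^30 - 1 */
}
```

`enters_before_crash()` is 1073741823 (2^30 - 1), so the 2^30-th call is the one that traps, which is why the loop in the test takes several seconds. After that final subtraction dg_bits is 0, consistent with the `count = 0` in that test's debug output.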

Let's test it:

- (void)group_enter_crash_1 {
    dispatch_group_t group = dispatch_group_create();

    while (YES) {
        dispatch_group_enter(group); // takes quite a while, then crashes in dispatch_group_enter.cold.2
        // <OS_dispatch_group: group[0x600003c73a70] = { xref = 1, ref = 2, count = 0, gen = 0, waiters = 0, notifs = 0 }>
    }
}

It does crash, but only after several seconds: os_atomic_sub_orig2o has to run an enormous number of times before old_value reaches DISPATCH_GROUP_VALUE_MAX. The key stack information at that point is:

Thread 1: EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)

dispatch_group_enter
callq 0x10e155687 ; dispatch_group_enter.cold.1
movl %eax, %edi
callq 0x10e155697 ; dispatch_group_enter.cold.2
"BUG IN CLIENT OF LIBDISPATCH: Too many nested calls to dispatch_group_enter()"

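So GCD group enter/leave simply subtract from and add to a counter field, and the way to avoid an Unbalanced call is to keep them paired. This also explains why loosely paired enter/leave calls did not crash: only the running count matters. Note, though, that group_leave_not_crash_3 only works because the dispatch_group_enter on the calling thread normally runs before the global-queue block; if that block happened to run first, its dispatch_group_leave would see a count of 0 and crash.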

dispatch_group_create

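The source of dispatch_group_create is below. It is just initialization: dispatch_group_create calls _dispatch_group_create_with_count(0), and because n is 0 the store to dg_bits is skipped, so the counter that enter/leave operate on keeps its initial value of 0.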

DISPATCH_ALWAYS_INLINE
static inline dispatch_group_t
_dispatch_group_create_with_count(uint32_t n)
{
dispatch_group_t dg = _dispatch_object_alloc(DISPATCH_VTABLE(group),
sizeof(struct dispatch_group_s));
dg->do_next = DISPATCH_OBJECT_LISTLESS;
dg->do_targetq = _dispatch_get_default_queue(false);
if (n) {
os_atomic_store2o(dg, dg_bits,
-n * DISPATCH_GROUP_VALUE_INTERVAL, relaxed);
os_atomic_store2o(dg, do_ref_cnt, 1, relaxed); // <rdar://22318411>
}
return dg;
}

dispatch_group_t
dispatch_group_create(void)
{
return _dispatch_group_create_with_count(0);
}
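The `n` parameter suggests that a group created with a count behaves like a fresh group that has already been entered `n` times. The small C sketch below (hypothetical helper names, assumed interval of 4) checks that the stored value `-n * DISPATCH_GROUP_VALUE_INTERVAL` equals the result of `n` enter-style subtractions; the extra `do_ref_cnt` store appears to mirror the retain that the first dispatch_group_enter would otherwise take (both cite the same rdar number).

```c
#include <assert.h>
#include <stdint.h>

#define GROUP_VALUE_INTERVAL 0x4u   /* assumed DISPATCH_GROUP_VALUE_INTERVAL */

/* dg_bits as stored by _dispatch_group_create_with_count(n). libdispatch
 * computes -n * INTERVAL in 64 bits; truncated to the 32-bit dg_bits, it
 * equals this 32-bit computation. */
static uint32_t bits_for_count(uint32_t n) {
    return 0u - n * GROUP_VALUE_INTERVAL;
}

/* dg_bits after n enters on a group created with n == 0. */
static uint32_t bits_after_enters(uint32_t n) {
    uint32_t bits = 0;
    for (uint32_t i = 0; i < n; i++) {
        bits -= GROUP_VALUE_INTERVAL;
    }
    return bits;
}
```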

How dispatch group enter/leave works

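At this point the mechanism behind dispatch_group_enter and dispatch_group_leave is clear. dispatch_group_enter subtracts one unit (DISPATCH_GROUP_VALUE_INTERVAL) from the dg_bits field of the dispatch_group_t object, and dispatch_group_leave adds one unit back (to the 64-bit dg_state, whose low 32 bits are dg_bits). When dispatch_group_leave runs, a dispatch_group_enter must already be outstanding, i.e. the counter must be non-zero; that is what a "balanced call" means.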

Second Analysis

With the analysis above, it is clear that the first analysis was wrong.

Let's look again at when finishCompletion is actually called in downloadURL:

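- (void)downloadURL:(NSURL *)url finishCompletion:(void(^)(NSURL *URL, BOOL isSuccess))finishCompletion {
    // Various logic and if-else checks... (this is legacy code)

    if ([self isDownloadingURL:url]) { // pseudocode: decided from url and downloadItems
        // Corresponding handling
        return;
    }

    // One of them checks for a paused task, roughly like this:
    DownloadItem *downloadItem = [self downloadItemForURL:url];
    if (downloadItem.isPaused) { // pseudocode: "the item is currently paused"
        // Resume the download
        return;
    }

    // xxxxxx
    // Start the actual download:
    // 1. Build an NSURLRequest from url, then an AFDownloadRequestOperation (task).
    // 2. Build a downloadItem for url with the completion callback finishCompletion, and store it in the downloadItems dictionary.
    // 3. Set the completion block, which looks up the downloadItem by url and, depending on conditions, calls its finishCompletion.
    // 4. Add the operation to the queue to start the request.
    [task setCompletionBlockWithSuccess:^(AFHTTPRequestOperation *operation, id responseObject) {
        DownloadItem *downloadItem = [self getDownloadItem:operation.request.URL];
        // Call downloadItem's finishCompletion according to the download status
    } failure:^(AFHTTPRequestOperation *operation, NSError *error) {
        // Failure handling (omitted)
    }];
}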

The code stores each download wrapper object, downloadItem, in the downloadItems dictionary; finishCompletion is the completion callback passed in from outside.

self.downloadItems[url.absoluteString] = downloadItem;

- (DownloadItem *)getDownloadItem:(NSURL *)url
{
    return self.downloadItems[url.absoluteString];
}

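The code is old, but the flow seemed fine... Then, on reflection, the logic around the downloadItems dictionary looked like the most likely place for a hidden trap, and after some thought it finally clicked.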

The problem was indeed in the downloadItems dictionary:

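  1. Suppose a URL is passed in: downloadItem A is created with finishCompletion A and stored in downloadItems, and download A starts. So far everything is normal.
  2. Downloading the same URL again is normally fine, but what about multiple threads? The code does no thread synchronization around the downloadItems dictionary.
  3. With multiple threads, two calls with the same URL may both pass the filter checks and both start a real download. That is, downloadItem B is created with finishCompletion B, stored in downloadItems, and download B starts.
  4. The downloadItem stored for that URL in downloadItems has now changed from downloadItem A to downloadItem B.
  5. When both downloads finish, each looks up the downloadItem by URL and gets downloadItem B, so finishCompletion B, which contains a dispatch_group_leave, runs for both download A and download B. B's group receives an extra dispatch_group_leave, an Unbalanced call, and the app crashes.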

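Once the root cause is known, the fix is simple: in the task's setCompletionBlockWithSuccess callback, don't look up downloadItem in the downloadItems dictionary; capture the local downloadItem variable instead, so the block always gets the right item.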
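The race can be replayed deterministically. In the C toy model below, all names are hypothetical: a "group" only counts enters and leaves, and a single registry slot plays the unsynchronized downloadItems dictionary. It runs one bad interleaving: A starts, B starts and overwrites A's entry, then both complete. Looking the item up by URL delivers both leaves to B's group; using the item captured at start delivers one leave to each.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy model: a group only counts enters and leaves, and a
 * single registry slot plays the unsynchronized downloadItems dictionary. */
typedef struct { int entered; int left; } toy_group;
typedef struct { const char *url; toy_group *group; } toy_item;

static toy_item *registry_slot;   /* downloadItems[url] for one URL */

/* Start a download: enter the group and register the item by URL. */
static toy_item *start_download(toy_item *item) {
    item->group->entered++;       /* dispatch_group_enter */
    registry_slot = item;         /* last writer wins: B replaces A */
    return item;                  /* what a completion block could capture */
}

/* Buggy completion: look the item up by URL when the download finishes. */
static void complete_by_lookup(const char *url) {
    if (registry_slot != NULL && strcmp(registry_slot->url, url) == 0) {
        registry_slot->group->left++;   /* dispatch_group_leave */
    }
}

/* Fixed completion: use the item captured when the download started. */
static void complete_captured(toy_item *captured) {
    captured->group->left++;            /* dispatch_group_leave */
}
```

In the lookup version, B's group ends with 1 enter and 2 leaves (the real code traps on the second leave), and A's group never leaves, so its notify block would never run. The captured version keeps both groups balanced. Capturing removes the double leave; the dictionary itself likely still needs synchronization if other code reads and writes it from several threads.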

Summary

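Most of Apple's official iOS documentation is very well written, but a few pieces, such as the GCD group docs, are too terse and easy to half-understand. That's when it's time to say: show me the code.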

One More Thing

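Now that we understand how GCD group enter/leave works, we shouldn't make this kind of mistake again. One last question remains: how exactly is the notification block passed to dispatch_group_notify triggered?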

dispatch_group_t

We haven't looked at the dispatch_group_t structure itself yet.

typedef struct dispatch_group_s *dispatch_group_t;

struct dispatch_group_s {
DISPATCH_OBJECT_HEADER(group);
DISPATCH_UNION_LE(uint64_t volatile dg_state,
uint32_t dg_bits,
uint32_t dg_gen
) DISPATCH_ATOMIC64_ALIGN;
struct dispatch_continuation_s *volatile dg_notify_head;
struct dispatch_continuation_s *volatile dg_notify_tail;
};

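dg_bits is the counter that enter/leave operate on (the low 32 bits of dg_state, with dg_gen as the high 32 bits), and other GCD APIs use it as well. The two dispatch_continuation_s pointers, dg_notify_head and dg_notify_tail, belong to the group's notification blocks: the objects wrapping notification blocks are stored in the group as a linked list.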
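The comments in dispatch_group_enter ("decremented on a 32bits wide atomic") and dispatch_group_leave ("incremented on a 64bits wide atomic") make more sense with this layout. Assuming a little-endian target, where DISPATCH_UNION_LE places dg_bits in the low half and dg_gen in the high half of dg_state, the C sketch below (hypothetical helpers, assumed interval of 4) shows that enter's 32-bit subtraction never touches dg_gen, while leave's 64-bit addition carries into dg_gen on the transition back to zero.

```c
#include <assert.h>
#include <stdint.h>

#define GROUP_VALUE_INTERVAL 0x4u   /* assumed DISPATCH_GROUP_VALUE_INTERVAL */

/* dg_state viewed as (dg_gen << 32) | dg_bits, the little-endian layout
 * that DISPATCH_UNION_LE describes. */
static uint32_t bits_of(uint64_t state) { return (uint32_t)state; }
static uint32_t gen_of(uint64_t state)  { return (uint32_t)(state >> 32); }

/* enter: subtract on the 32-bit dg_bits only, so no borrow from dg_gen. */
static uint64_t model_enter(uint64_t state) {
    uint32_t bits = bits_of(state) - GROUP_VALUE_INTERVAL;
    return (state & 0xffffffff00000000ull) | bits;
}

/* leave: add on the whole 64-bit dg_state, so the last leave carries into dg_gen. */
static uint64_t model_leave(uint64_t state) {
    return state + GROUP_VALUE_INTERVAL;
}
```

Each completed enter/leave cycle bumps the generation by one. _dispatch_group_wake later wakes waiters on the address of dg_gen, which is presumably how dispatch_group_wait notices that a generation has finished.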

typedef struct dispatch_continuation_s {
DISPATCH_CONTINUATION_HEADER(continuation);
} *dispatch_continuation_t;

// If dc_flags is less than 0x1000, then the object is a continuation.
// Otherwise, the object has a private layout and memory management rules. The
// layout until after 'do_next' must align with normal objects.
#define DISPATCH_CONTINUATION_HEADER(x) \
union { \
const void *do_vtable; \
uintptr_t dc_flags; \
}; \
union { \
pthread_priority_t dc_priority; \
int dc_cache_cnt; \
uintptr_t dc_pad; \
}; \
struct voucher_s *dc_voucher; \
struct dispatch_##x##_s *volatile do_next; \
dispatch_function_t dc_func; \
void *dc_ctxt; \
void *dc_data; \
void *dc_other

The dispatch_continuation_t struct doesn't contain much, but with almost no comments it's hard to tell what each field is for.

dispatch_group_notify

Here is the source of dispatch_group_notify:

DISPATCH_ALWAYS_INLINE
static inline void
_dispatch_group_notify(dispatch_group_t dg, dispatch_queue_t dq,
dispatch_continuation_t dsn)
{
uint64_t old_state, new_state;
dispatch_continuation_t prev;

dsn->dc_data = dq;
_dispatch_retain(dq);

prev = os_mpsc_push_update_tail(os_mpsc(dg, dg_notify), dsn, do_next);
if (os_mpsc_push_was_empty(prev)) _dispatch_retain(dg);
os_mpsc_push_update_prev(os_mpsc(dg, dg_notify), prev, dsn, do_next);
if (os_mpsc_push_was_empty(prev)) {
os_atomic_rmw_loop2o(dg, dg_state, old_state, new_state, release, {
new_state = old_state | DISPATCH_GROUP_HAS_NOTIFS;
if ((uint32_t)old_state == 0) {
os_atomic_rmw_loop_give_up({
return _dispatch_group_wake(dg, new_state, false);
});
}
});
}
}

DISPATCH_NOINLINE
void
dispatch_group_notify_f(dispatch_group_t dg, dispatch_queue_t dq, void *ctxt,
dispatch_function_t func)
{
dispatch_continuation_t dsn = _dispatch_continuation_alloc();
_dispatch_continuation_init_f(dsn, dq, ctxt, func, 0, DC_FLAG_CONSUME);
_dispatch_group_notify(dg, dq, dsn);
}

#ifdef __BLOCKS__
void
dispatch_group_notify(dispatch_group_t dg, dispatch_queue_t dq,
dispatch_block_t db)
{
dispatch_continuation_t dsn = _dispatch_continuation_alloc();
_dispatch_continuation_init(dsn, dq, db, 0, DC_FLAG_CONSUME);
_dispatch_group_notify(dg, dq, dsn);
}
#endif

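The notification block is clearly run by _dispatch_group_wake. If the notify list was empty and the group's count is already 0 when dispatch_group_notify is called (for example, because dispatch_group_enter was never called), _dispatch_group_wake is triggered immediately.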

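dispatch_group_notify uses _dispatch_continuation_init to wrap the dispatch_block_t db in a dispatch_continuation_t, which _dispatch_group_notify then appends to the notify list of the dispatch_group_t dg.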
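The control flow of _dispatch_group_notify and _dispatch_group_wake can be summarized with a single-threaded C sketch. It drops the atomics, the MPSC push protocol, the flag bits, and the hop to the target queue; `notify_group`, `toy_notify`, and the other names are hypothetical. A notify block is appended to the head/tail list; if nothing is outstanding it is "submitted" immediately, otherwise the list is drained in FIFO order when the count drops from 1 to 0.

```c
#include <assert.h>
#include <stddef.h>

/* Single-threaded sketch with hypothetical names: no atomics, no flag bits,
 * and "submitting" a notify block just records its id. */
typedef struct notify_node {
    struct notify_node *do_next;   /* link field, as in dispatch_continuation_s */
    int id;
} notify_node;

typedef struct {
    unsigned outstanding;          /* stands in for the dg_bits counter */
    notify_node *head, *tail;      /* dg_notify_head / dg_notify_tail */
    int fired[8];                  /* ids submitted so far, in order */
    int nfired;
} notify_group;

/* _dispatch_group_wake analogue: detach the list, submit every node. */
static void toy_wake(notify_group *g) {
    notify_node *dc = g->head;
    g->head = g->tail = NULL;
    while (dc != NULL) {
        notify_node *next = dc->do_next;
        g->fired[g->nfired++] = dc->id;   /* real code: push onto dc_data queue */
        dc = next;
    }
}

/* _dispatch_group_notify analogue: append, then wake if nothing is pending. */
static void toy_notify(notify_group *g, notify_node *dsn) {
    dsn->do_next = NULL;
    if (g->tail != NULL) {
        g->tail->do_next = dsn;
    } else {
        g->head = dsn;
    }
    g->tail = dsn;
    if (g->outstanding == 0) {
        toy_wake(g);
    }
}

static void toy_enter(notify_group *g) { g->outstanding++; }

static void toy_leave(notify_group *g) {
    if (g->outstanding == 0) {
        return;                    /* real libdispatch crashes here */
    }
    if (--g->outstanding == 0) {
        toy_wake(g);               /* the 1 -> 0 transition */
    }
}
```

Registering on an idle group fires at once; two notifies registered while one enter is outstanding fire together, in registration order, on the matching leave.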

_dispatch_continuation_init

_dispatch_continuation_init performs the various initialization steps for the dispatch_continuation_t object.

DISPATCH_ALWAYS_INLINE
static inline dispatch_qos_t
_dispatch_continuation_init_f(dispatch_continuation_t dc,
dispatch_queue_class_t dqu, void *ctxt, dispatch_function_t f,
dispatch_block_flags_t flags, uintptr_t dc_flags)
{
pthread_priority_t pp = 0;
dc->dc_flags = dc_flags | DC_FLAG_ALLOCATED;
dc->dc_func = f;
dc->dc_ctxt = ctxt;
// in this context DISPATCH_BLOCK_HAS_PRIORITY means that the priority
// should not be propagated, only taken from the handler if it has one
if (!(flags & DISPATCH_BLOCK_HAS_PRIORITY)) {
pp = _dispatch_priority_propagate();
}
_dispatch_continuation_voucher_set(dc, flags);
return _dispatch_continuation_priority_set(dc, dqu, pp, flags);
}

DISPATCH_ALWAYS_INLINE
static inline dispatch_qos_t
_dispatch_continuation_init(dispatch_continuation_t dc,
dispatch_queue_class_t dqu, dispatch_block_t work,
dispatch_block_flags_t flags, uintptr_t dc_flags)
{
void *ctxt = _dispatch_Block_copy(work);

dc_flags |= DC_FLAG_BLOCK | DC_FLAG_ALLOCATED;
if (unlikely(_dispatch_block_has_private_data(work))) {
dc->dc_flags = dc_flags;
dc->dc_ctxt = ctxt;
// will initialize all fields but requires dc_flags & dc_ctxt to be set
return _dispatch_continuation_init_slow(dc, dqu, flags);
}

dispatch_function_t func = _dispatch_Block_invoke(work);
if (dc_flags & DC_FLAG_CONSUME) {
func = _dispatch_call_block_and_release;
}
return _dispatch_continuation_init_f(dc, dqu, ctxt, func, flags, dc_flags);
}

DISPATCH_ALWAYS_INLINE
static inline dispatch_qos_t
_dispatch_continuation_priority_set(dispatch_continuation_t dc,
dispatch_queue_class_t dqu,
pthread_priority_t pp, dispatch_block_flags_t flags)
{
dispatch_qos_t qos = DISPATCH_QOS_UNSPECIFIED;
#if HAVE_PTHREAD_WORKQUEUE_QOS
dispatch_queue_t dq = dqu._dq;

if (likely(pp)) {
bool enforce = (flags & DISPATCH_BLOCK_ENFORCE_QOS_CLASS);
bool is_floor = (dq->dq_priority & DISPATCH_PRIORITY_FLAG_FLOOR);
bool dq_has_qos = (dq->dq_priority & DISPATCH_PRIORITY_REQUESTED_MASK);
if (enforce) {
pp |= _PTHREAD_PRIORITY_ENFORCE_FLAG;
qos = _dispatch_qos_from_pp_unsafe(pp);
} else if (!is_floor && dq_has_qos) {
pp = 0;
} else {
qos = _dispatch_qos_from_pp_unsafe(pp);
}
}
dc->dc_priority = pp;
#else
(void)dc; (void)dqu; (void)pp; (void)flags;
#endif
return qos;
}

Note that in _dispatch_continuation_init, the parameter dispatch_block_t work is the notification block that was passed in.

void *ctxt = _dispatch_Block_copy(work);
// xxxxxx
dc->dc_ctxt = ctxt;

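So a copy of the notification block is stored in the dc_ctxt field of the dispatch_continuation_t object dc. Because dispatch_group_notify passes DC_FLAG_CONSUME, dc_func is set to _dispatch_call_block_and_release, which invokes the block and then releases the copy.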
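Plain C has no blocks, but the dc_func/dc_ctxt pairing can still be sketched with a function pointer and a heap-allocated context standing in for the copied block. In this hypothetical model, `call_block_and_release` plays the role of _dispatch_call_block_and_release (run the block, then release the copy). That pairing is consistent with frames 3 and 4 of the Fabric stack, where _dispatch_client_callout calls _dispatch_call_block_and_release, which then runs the block body.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*function_t)(void *);

/* Hypothetical, trimmed continuation: only the two fields discussed here. */
typedef struct {
    function_t dc_func;
    void *dc_ctxt;
} toy_continuation;

/* Heap "closure" standing in for the copied notification block. */
typedef struct {
    int *counter;                  /* a captured variable */
} toy_block;

static int releases;               /* how many block copies were freed */

/* Plays the role of _dispatch_call_block_and_release: run, then release. */
static void call_block_and_release(void *ctxt) {
    toy_block *block = ctxt;
    (*block->counter)++;           /* the block body */
    free(block);
    releases++;
}

/* Plays the role of _dispatch_continuation_init with DC_FLAG_CONSUME. */
static void init_continuation(toy_continuation *dc, int *counter) {
    toy_block *copy = malloc(sizeof *copy);   /* like _dispatch_Block_copy */
    if (copy == NULL) abort();
    copy->counter = counter;
    dc->dc_ctxt = copy;                       /* the block lives in dc_ctxt */
    dc->dc_func = call_block_and_release;
}

/* What the target queue eventually does with the continuation. */
static void invoke_continuation(toy_continuation *dc) {
    dc->dc_func(dc->dc_ctxt);
}
```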

_dispatch_group_wake

DISPATCH_NOINLINE
static void
_dispatch_group_wake(dispatch_group_t dg, uint64_t dg_state, bool needs_release)
{
uint16_t refs = needs_release ? 1 : 0; // <rdar://problem/22318411>

if (dg_state & DISPATCH_GROUP_HAS_NOTIFS) {
dispatch_continuation_t dc, next_dc, tail;

// Snapshot before anything is notified/woken <rdar://problem/8554546>
dc = os_mpsc_capture_snapshot(os_mpsc(dg, dg_notify), &tail);
do {
dispatch_queue_t dsn_queue = (dispatch_queue_t)dc->dc_data;
next_dc = os_mpsc_pop_snapshot_head(dc, tail, do_next);
_dispatch_continuation_async(dsn_queue, dc,
_dispatch_qos_from_pp(dc->dc_priority), dc->dc_flags);
_dispatch_release(dsn_queue);
} while ((dc = next_dc));

refs++;
}

if (dg_state & DISPATCH_GROUP_HAS_WAITERS) {
_dispatch_wake_by_address(&dg->dg_gen);
}

if (refs) _dispatch_release_n(dg, refs);
}

#define os_mpsc_capture_snapshot(Q, tail) ({ \
os_mpsc_node_type(Q) _head = os_mpsc_get_head(Q); \
os_atomic_store(_os_mpsc_head Q, NULL, relaxed); \
/* 22708742: set tail to NULL with release, so that NULL write */ \
/* to head above doesn't clobber head from concurrent enqueuer */ \
*(tail) = os_atomic_xchg(_os_mpsc_tail Q, NULL, release); \
_head; \
})

#define os_mpsc_pop_snapshot_head(head, tail, _o_next) ({ \
typeof(head) _head = (head), _tail = (tail), _n = NULL; \
if (_head != _tail) _n = os_mpsc_get_next(_head, _o_next); \
_n; \
})

From the definition of os_mpsc_pop_snapshot_head and the line next_dc = os_mpsc_pop_snapshot_head(dc, tail, do_next);, we can see that the main job of _dispatch_group_wake is to walk the snapshot of the notify linked list: it takes each dispatch_continuation_t dc in turn and calls _dispatch_continuation_async on it. That call is what actually triggers execution of the notification block.

_dispatch_continuation_async(dsn_queue, dc,
_dispatch_qos_from_pp(dc->dc_priority), dc->dc_flags);
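The snapshot walk itself can be sketched single-threaded in C. `capture_snapshot` and `pop_snapshot_head` are hypothetical stand-ins for the two macros above, and `drain` copies the do/while shape of _dispatch_group_wake, recording ids instead of calling _dispatch_continuation_async. The walk stops at the captured tail rather than at a NULL do_next, so anything linked after the snapshot is not visited.

```c
#include <assert.h>
#include <stddef.h>

typedef struct snode { struct snode *do_next; int id; } snode;
typedef struct { snode *head, *tail; } mpsc_list;

/* Like os_mpsc_capture_snapshot: detach the list, report head and tail. */
static snode *capture_snapshot(mpsc_list *q, snode **tail) {
    snode *head = q->head;
    q->head = NULL;
    *tail = q->tail;
    q->tail = NULL;
    return head;
}

/* Like os_mpsc_pop_snapshot_head: next node, or NULL once the tail is reached. */
static snode *pop_snapshot_head(snode *head, snode *tail) {
    return head != tail ? head->do_next : NULL;
}

/* The do/while shape of _dispatch_group_wake; assumes a non-empty list,
 * which DISPATCH_GROUP_HAS_NOTIFS guarantees in the real code. */
static int drain(mpsc_list *q, int *out) {
    snode *tail;
    snode *dc = capture_snapshot(q, &tail);
    snode *next;
    int n = 0;
    do {
        next = pop_snapshot_head(dc, tail);
        out[n++] = dc->id;          /* real code: _dispatch_continuation_async */
        dc = next;
    } while (dc != NULL);
    return n;
}
```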

_dispatch_continuation_async

DISPATCH_ALWAYS_INLINE
static inline void
_dispatch_continuation_async(dispatch_queue_class_t dqu,
dispatch_continuation_t dc, dispatch_qos_t qos, uintptr_t dc_flags)
{
#if DISPATCH_INTROSPECTION
if (!(dc_flags & DC_FLAG_NO_INTROSPECTION)) {
_dispatch_trace_item_push(dqu, dc);
}
#else
(void)dc_flags;
#endif
return dx_push(dqu._dq, dc, qos);
}

Now look at the call dx_push(dqu._dq, dc, qos);:

#define dx_push(x, y, z) dx_vtable(x)->dq_push(x, y, z)
#define dx_vtable(x) (&(x)->do_vtable->_os_obj_vtable)

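What is do_vtable? Note that dx_vtable(x) is applied to x, i.e. the target queue dqu._dq, not to the continuation, so it is the queue's vtable that gets consulted. Every dispatch object starts with a do_vtable pointer, and the DISPATCH_CONTINUATION_HEADER macro we saw earlier places a do_vtable field (in a union with dc_flags) at the same offset, so continuations and objects share the same leading layout: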

union { \
const void *do_vtable; \
uintptr_t dc_flags; \
}; \

The vtable header of a queue class, which declares dq_push, is:

#define DISPATCH_QUEUE_VTABLE_HEADER(x); \
DISPATCH_OBJECT_VTABLE_HEADER(x); \
void (*const dq_activate)(dispatch_queue_class_t, bool *allow_resume); \
void (*const dq_wakeup)(dispatch_queue_class_t, dispatch_qos_t, \
dispatch_wakeup_flags_t); \
void (*const dq_push)(dispatch_queue_class_t, dispatch_object_t, \
dispatch_qos_t)

So when _dispatch_group_wake runs, it pushes (dx_push) each notification continuation onto the queue that was passed to dispatch_group_notify, which completes the GCD group flow.
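The dx_push indirection is ordinary C virtual dispatch: the first field of the object points at a per-class table of function pointers. The sketch below uses hypothetical names throughout (`toy_queue`, `serial_push`, `workloop_push`); only the shape `dx_vtable(x)->dq_push(x, ...)` mirrors libdispatch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names throughout; only the dx_vtable(x)->dq_push(x, ...)
 * shape mirrors libdispatch. */
typedef struct toy_queue toy_queue;

typedef struct {
    void (*dq_push)(toy_queue *dq, int item);
} toy_queue_vtable;

struct toy_queue {
    const toy_queue_vtable *do_vtable;   /* first field, like dispatch objects */
    int items[4];
    int count;
    const char *pushed_by;
};

#define toy_dx_vtable(x)    ((x)->do_vtable)
#define toy_dx_push(x, y)   toy_dx_vtable(x)->dq_push(x, y)

static void serial_push(toy_queue *dq, int item) {
    dq->items[dq->count++] = item;
    dq->pushed_by = "serial_push";
}

static void workloop_push(toy_queue *dq, int item) {
    dq->items[dq->count++] = item;
    dq->pushed_by = "workloop_push";
}

/* One table per queue class, in the spirit of DISPATCH_VTABLE_INSTANCE. */
static const toy_queue_vtable serial_vtable   = { serial_push };
static const toy_queue_vtable workloop_vtable = { workloop_push };
```

The same call site, `toy_dx_push`, reaches a different push function depending only on which vtable the queue object was created with.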

_dispatch_workloop_push

As for #define dx_push(x, y, z) dx_vtable(x)->dq_push(x, y, z): each queue class fills in dq_push in its own DISPATCH_VTABLE_INSTANCE. The workloop class, for example, binds dq_push to _dispatch_workloop_push:

DISPATCH_VTABLE_INSTANCE(workloop,
.do_type = DISPATCH_WORKLOOP_TYPE,
.do_dispose = _dispatch_workloop_dispose,
.do_debug = _dispatch_queue_debug,
.do_invoke = _dispatch_workloop_invoke,

.dq_activate = _dispatch_queue_no_activate,
.dq_wakeup = _dispatch_workloop_wakeup,
.dq_push = _dispatch_workloop_push,
);

The implementation of _dispatch_workloop_push is as follows:

void
_dispatch_workloop_push(dispatch_workloop_t dwl, dispatch_object_t dou,
dispatch_qos_t qos)
{
struct dispatch_object_s *prev;

if (unlikely(_dispatch_object_is_waiter(dou))) {
return _dispatch_workloop_push_waiter(dwl, dou._dsc, qos);
}

if (qos < _dispatch_priority_qos(dwl->dq_priority)) {
qos = _dispatch_priority_qos(dwl->dq_priority);
}
if (qos == DISPATCH_QOS_UNSPECIFIED) {
qos = _dispatch_priority_fallback_qos(dwl->dq_priority);
}
prev = _dispatch_workloop_push_update_tail(dwl, qos, dou._do);
if (unlikely(os_mpsc_push_was_empty(prev))) {
_dispatch_retain_2_unsafe(dwl);
}
_dispatch_workloop_push_update_prev(dwl, qos, prev, dou._do);
if (unlikely(os_mpsc_push_was_empty(prev))) {
return _dispatch_workloop_wakeup(dwl, qos, DISPATCH_WAKEUP_CONSUME_2 |
DISPATCH_WAKEUP_MAKE_DIRTY);
}
}

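So when the target queue is a workloop, the earlier dx_push(dqu._dq, dc, qos); is equivalent to _dispatch_workloop_push(dqu._dq, dc, qos);. Other queue classes, such as the main queue that our notify block targets, go through the dq_push registered in their own vtable instance.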

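Calling _dispatch_workloop_push enqueues the dispatch_continuation_t dc onto the queue dqu._dq along with the qos parameter, after raising qos to the workloop's QoS floor, or replacing it with the fallback QoS if it is still unspecified.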
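The two QoS adjustments at the top of _dispatch_workloop_push amount to a clamp plus a default. The sketch uses an illustrative QoS scale where 0 means unspecified and larger values mean higher priority (the ordering the `<` comparison relies on); the numeric values and the helper name are assumptions, and the real code derives the floor and the fallback from dwl->dq_priority.

```c
#include <assert.h>

/* Illustrative QoS scale (assumed values): 0 = unspecified, larger = higher. */
enum { QOS_UNSPECIFIED = 0, QOS_UTILITY = 3, QOS_DEFAULT = 4, QOS_USER_INITIATED = 5 };

/* Mirrors the two adjustments in _dispatch_workloop_push: raise qos to the
 * workloop's floor, then substitute the fallback if it is still unspecified. */
static int effective_push_qos(int qos, int floor_qos, int fallback_qos) {
    if (qos < floor_qos) {
        qos = floor_qos;
    }
    if (qos == QOS_UNSPECIFIED) {
        qos = fallback_qos;
    }
    return qos;
}
```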

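As for the code that actually executes the blocks in a queue, the answer lies elsewhere in the GCD source. I'll leave that as an open question for now and come back to it in a later post!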
