I had been wondering how Apple manages to run inference at Google while still protecting user privacy, which is essentially its unique selling point.
The answer: the heaviest requests run on Blackwell B200s inside Google Cloud, with NVIDIA's Confidential Computing encrypting the data while it's processed, so neither Google nor Apple can see it.
"NVIDIA Confidential Computing provides a hardware-based security layer for accelerated AI workloads. The technology protects data while it’s being processed by isolating workloads in trusted execution environments and enabling systems to cryptographically verify that the infrastructure has not been tampered with before any sensitive data is sent to the server."
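
To make the "verify before sending" idea in that quote concrete, here is a toy sketch of the attest-then-send pattern. It is not NVIDIA's API or Apple's implementation: every name in it (`HARDWARE_KEY`, `make_report`, `verify_report`, `may_send`) is hypothetical, and an HMAC with a shared key stands in for the hardware-rooted asymmetric signatures and certificate chains that real attestation relies on.

```python
import hashlib
import hmac
import secrets

# Hypothetical stand-ins for illustration only. Real attestation uses
# hardware-rooted asymmetric keys and vendor certificates, not a shared HMAC key.
HARDWARE_KEY = secrets.token_bytes(32)
EXPECTED_MEASUREMENT = hashlib.sha256(b"trusted-inference-stack-v1").hexdigest()


def make_report(software_image: bytes, nonce: bytes) -> dict:
    """Server side: hash (measure) the loaded software and sign measurement + nonce."""
    measurement = hashlib.sha256(software_image).hexdigest()
    signature = hmac.new(
        HARDWARE_KEY, measurement.encode() + nonce, hashlib.sha256
    ).hexdigest()
    return {"measurement": measurement, "nonce": nonce, "signature": signature}


def verify_report(report: dict, nonce: bytes) -> bool:
    """Client side: accept only an authentic, fresh report with the expected measurement."""
    expected_sig = hmac.new(
        HARDWARE_KEY, report["measurement"].encode() + nonce, hashlib.sha256
    ).hexdigest()
    return (
        hmac.compare_digest(report["signature"], expected_sig)  # signature is valid
        and report["nonce"] == nonce                            # not a replayed report
        and report["measurement"] == EXPECTED_MEASUREMENT       # software is untampered
    )


def may_send(software_image: bytes) -> bool:
    """Client decides whether sensitive data may leave the device."""
    nonce = secrets.token_bytes(16)  # fresh challenge for each request
    report = make_report(software_image, nonce)
    # Real code would transmit the encrypted payload only when this is True.
    return verify_report(report, nonce)
```

The point of the sketch is the ordering: the client checks the signed measurement first, and a server running modified software never receives the request at all.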



