Relative to other operations, accessing files on disk is one of the slowest operations a computer can perform. Depending on the size and number of files, it can take anywhere from a few milliseconds to several minutes to read files from a disk-based hard drive. Make sure your code performs as efficiently as possible under even light to moderate workloads.
If your app slows down or becomes less responsive when it starts working with files, use the Instruments app to gather some baseline metrics. Instruments shows you how much time your app spends operating on files and helps you monitor other file-related activity. As you fix each problem, run your code in Instruments again and record the results so that you can verify whether your changes worked.
Potential Problem Areas and Fixes
Look for these possible problem areas:
- Code that reads lots of files (of any type) from disk. Remember to check where you load resource files, too. Are you actually using the data from all of those files right away? If not, you might want to load some of the files more lazily.
- Code that uses older file-system calls. Most calls should use Swift or Objective-C APIs. You can use BSD-level calls too, but don’t use older Carbon-based functions that operate on FSRef or FSSpec data structures. Xcode generates warnings when it detects deprecated methods and functions in your code, so make sure you check those warnings. Also see Use Modern File System Interfaces.
- Code that uses callback functions or methods to process file data. If a newer API is available that takes a block object, update your code to use that API instead.
- Code with many small read or write operations performed on the same file. Can you group those operations together and perform them all at once? For the same amount of data, one large read or write operation is usually more efficient than many small operations.
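The first item above suggests loading resource files lazily. Here is a minimal Swift sketch of one way to do that: a lazy stored property, so the file is read the first time the data is accessed rather than when the object is created. The HelpContent type, its loadCount counter, and the help.txt file are all hypothetical. The counter exists only to make the deferral visible.

```swift
import Foundation

/// Hypothetical wrapper around a help file that the app may never display.
final class HelpContent {
    let url: URL
    /// Counts actual disk reads; exists only to show when loading happens.
    private(set) var loadCount = 0

    init(url: URL) {
        self.url = url  // no file I/O happens here
    }

    /// Read from disk the first time `text` is accessed, then kept in memory.
    lazy var text: String = {
        self.loadCount += 1
        return (try? String(contentsOf: self.url, encoding: .utf8)) ?? ""
    }()
}

// Usage: create the object early, but touch the disk only when needed.
let helpURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("help.txt")
try "Press Command-? for help.".write(to: helpURL, atomically: true, encoding: .utf8)

let help = HelpContent(url: helpURL)
let before = help.loadCount      // nothing has been read yet
let firstRead = help.text        // triggers the single disk read
let secondRead = help.text       // served from memory
```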
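For the last item, here is a minimal Swift sketch that collects many small records in memory and hands them to the file system in one write, instead of issuing one write per record. The record format and file name are hypothetical. The sketch also assumes the combined data fits comfortably in memory.

```swift
import Foundation

// Hypothetical records; in a real app they would come from your model layer.
let records = (1...1_000).map { "record \($0)\n" }

let batchURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("batched-records.txt")

// Slow pattern (not shown): writing each record separately would issue
// 1,000 small write operations against the same file.

// Batched pattern: build the whole payload in memory first...
var payload = Data()
for record in records {
    payload.append(contentsOf: record.utf8)
}

// ...then perform a single write.
try payload.write(to: batchURL)
```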
General Recommendations
These recommendations can help improve your app’s file-system performance. As with all tips, measure performance before and after each change so that you can verify that your optimizations worked.
- Minimize the number of file operations. Moving data from a local file system into memory takes a significant amount of time. And if the target file system is located on a server halfway around the world, network latency increases the delay in retrieving the data.
- Reuse path objects. When you fetch resource values for an NSURL object using the resourceValuesForKeys(_:) method, those values are cached. If you reuse the NSURL object, later requests for the same resource values may be answered without accessing the file system. Also, because NSURL objects can be expensive to construct, prefer reusing them over creating new instances each time you reference a file.
- Don’t process large amounts of data in small chunks. Buffer size dramatically affects how quickly data moves from disk into a local buffer. If you’re working with relatively large files, create a large buffer (say, 128 KB to 256 KB) and read much or all of the data into memory before processing it. The same rules apply to writing data to disk: write data as sequentially as you can, using a single file-system call.
- Read data sequentially instead of jumping around in a file. The kernel transparently clusters I/O operations, which makes sequential reads much faster.
- Avoid skipping ahead in an empty file before writing data. The system might have to write zeroes into the intervening space to fill the gap. Including “holes” in your files at write time might incur a performance penalty, so don’t do it without good reason. For more information, see Zero-Fill Delays Provide Security at a Cost.
- Defer I/O operations until your app needs the data. The golden rule of being lazy applies to disk performance as well as many other types of performance.
- Don’t use the preferences system to store data that can be inexpensively recomputed. Use the preferences system only for user preferences (such as window positions, view settings, and user-provided preferences). Recomputing simple values is significantly faster than reading the same value from disk.
- Don’t assume that caching files in memory will speed up your app. Caching files in memory increases memory usage, which can decrease performance in other ways. Plus, the system may cache some file data for you automatically, so creating your own caches might make things even worse; see The System Has its Own File Caching Mechanism.
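The buffer-size and sequential-read recommendations above can be sketched in Swift with FileHandle. This version reads a file front to back in 256 KB chunks, with no seeking. The chunk size is taken from the 128 KB to 256 KB range mentioned above, so measure before tuning it. The readSequentially function and the test file are hypothetical. The sketch assumes macOS 10.15.4 or later, where FileHandle provides the throwing read(upToCount:) and close() methods.

```swift
import Foundation

/// Reads `url` from start to finish in large chunks, passing each chunk to
/// `body`, and returns the total number of bytes read.
func readSequentially(_ url: URL,
                      chunkSize: Int = 256 * 1024,
                      _ body: (Data) -> Void) throws -> Int {
    let handle = try FileHandle(forReadingFrom: url)
    defer { try? handle.close() }

    var total = 0
    // Each read continues where the previous one stopped, so the file is
    // consumed strictly in order.
    while let chunk = try handle.read(upToCount: chunkSize), !chunk.isEmpty {
        body(chunk)
        total += chunk.count
    }
    return total
}

// Usage: a ~600 KB test file is processed in a few large reads.
let bigURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("sequential.bin")
try Data(repeating: 0xAB, count: 600_000).write(to: bigURL)

var matching = 0
let totalBytes = try readSequentially(bigURL) { chunk in
    matching += chunk.filter { $0 == 0xAB }.count
}
```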
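The advice to reuse path objects can be shown in Swift, where URL bridges to NSURL and resourceValues(forKeys:) corresponds to resourceValuesForKeys(_:). In this minimal sketch, a URL is built once and queried twice. Because the same instance is reused rather than rebuilt from a path string for every lookup, the second query can draw on values cached on that object. Foundation manages the cache itself, so this sketch does not inspect it. The report.dat file is hypothetical.

```swift
import Foundation

// Build the URL once and keep it; "report.dat" is a hypothetical file.
let reportURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("report.dat")
try Data(count: 4_096).write(to: reportURL)

let keys: Set<URLResourceKey> = [.fileSizeKey]

// The first request fetches the value from the file system...
let firstLookup = try reportURL.resourceValues(forKeys: keys)

// ...and later requests through the same URL instance can use the value
// cached on it instead of triggering another file-system access.
let secondLookup = try reportURL.resourceValues(forKeys: keys)
```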
Deciding When to Use File System Caching or Mapped I/O
Disk caching can be a good way to accelerate access to file data, but it is not appropriate in every situation. Caching increases the memory footprint of your app, and if used inappropriately, it can be more expensive than simply reloading data from disk.
Caching is most appropriate for files you plan to access multiple times. If you have files that you intend to use only once, either disable the caches or map the file into memory.
Disabling File System Caching
When reading data that you won’t need again soon, such as when streaming a large multimedia file, tell the file system not to add that data to the file-system caches. By default, the system maintains a buffer cache with the data most recently read from disk. This disk cache is most effective when it contains frequently used data. If you leave file caching enabled while streaming a large multimedia file, you can quickly fill up the disk cache with data you won’t use again. Even worse, this process is likely to push other data out of the cache, even though that data might have benefited from staying there. To disable file-system caching for files that are read once and then discarded, pass the DataReadingUncached option to init(contentsOfURL:options:).
Note: For reading uncached data, it is recommended that you use 4K-aligned buffers. This gives the system more flexibility in how it loads the data into memory and can result in faster load times.
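In current Swift, the DataReadingUncached option is spelled Data.ReadingOptions.uncached, and Foundation documents it as a hint. This minimal sketch reads a file intended for one-time use with that option. It does not cover the 4K-aligned buffer advice, which applies to lower-level reads. The sumOfOneShotFile function and its byte sum are hypothetical placeholders for real one-pass processing.

```swift
import Foundation

/// Reads a file that will be used once and then discarded, asking the
/// system not to keep its contents in the file-system cache.
func sumOfOneShotFile(at url: URL) throws -> UInt64 {
    let data = try Data(contentsOf: url, options: .uncached)
    // Placeholder one-pass processing: add up every byte.
    return data.reduce(UInt64(0)) { $0 &+ UInt64($1) }
}

let clipURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("one-shot.bin")
try Data([1, 2, 3, 250]).write(to: clipURL)
let sum = try sumOfOneShotFile(at: clipURL)
```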
Using Mapped I/O Instead of Caching
For data read randomly from a file, you can sometimes improve performance by mapping that file directly into your app’s virtual memory space. File mapping is a programming convenience for files you want to access with read-only permissions. It lets the kernel take advantage of the virtual memory paging mechanism to read the file data only when it is needed. You can also use file mapping to overwrite existing bytes in a file; however, you cannot extend the size of the file using this technique. Mapped files bypass the system disk caches, so only one copy of the file is stored in memory.
Important: If you map a file into memory and the file becomes inaccessible (because the disk containing the file was ejected or the network server containing the file was unmounted), your app will crash with a SIGBUS error. Your app can also crash if you map a file into memory, that file gets truncated, and you then attempt to access data in a range that no longer exists.
For more information about mapping files into memory, see File System Advanced Programming Topics.
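For random access from Swift, one option is to let Foundation do the mapping, rather than calling mmap directly. With Data.ReadingOptions.mappedIfSafe, Foundation maps the file into virtual memory if it judges that possible and safe, and otherwise reads it normally. Either way, the indexing code is the same. If the file is actually mapped, the SIGBUS caveat above still applies. This minimal sketch assumes the requested offsets are within the file. The sampleBytes function and the lookup-table file are hypothetical.

```swift
import Foundation

/// Returns the bytes at the given offsets, preferring a memory-mapped view
/// of the file over reading the whole file into memory up front.
func sampleBytes(of url: URL, at offsets: [Int]) throws -> [UInt8] {
    let data = try Data(contentsOf: url, options: .mappedIfSafe)
    // Random access: with a mapping, only the touched pages are paged in.
    return offsets.map { data[data.startIndex + $0] }
}

let tableURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("lookup-table.bin")
try Data((0...255).map { UInt8($0) }).write(to: tableURL)
let sampled = try sampleBytes(of: tableURL, at: [0, 7, 255])
```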
Working with Zero-Filling
For security reasons, file systems are supposed to zero out areas on disk when those areas are allocated to a file. This behavior prevents data left over from a previously deleted file from being included in the new file.
For both reading and writing operations, the system delays writing zeroes until the last possible moment. When you close a file after writing to it, the system writes zeroes to any portions of the file your code did not touch. When reading from a file, the system writes zeroes to new areas only when your code attempts to read from those areas or when it closes the file. This delayed-write behavior avoids redundant I/O operations to the same area of a file.
If you notice a delay when closing your files, it is likely because of this zero-fill behavior. Make sure you do the following when working with files:
- Write data to files sequentially. Gaps in writing must be filled with zeros when the file is saved.
- Don’t move the file pointer past the end of the file and then close the file.
- Truncate files to match the length of the data you wrote. For scratch files you plan to delete, truncate the file to zero length.
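The last two items can be sketched in Swift with FileHandle. In this minimal sketch, the first function overwrites a file sequentially from the start and then truncates it at the end of the new data. The second truncates a scratch file to zero length before deleting it. The function names and file are hypothetical. The sketch assumes macOS 10.15.4 or later for the throwing seek(toOffset:), write(contentsOf:), and truncate(atOffset:) methods. On APFS, as the note below explains, sparse-file support reduces the zero-fill cost these steps guard against.

```swift
import Foundation

/// Overwrites `url` from the beginning with `payload`, then truncates the
/// file so nothing remains past the bytes actually written.
func overwrite(_ url: URL, with payload: Data) throws {
    let handle = try FileHandle(forWritingTo: url)
    defer { try? handle.close() }
    try handle.seek(toOffset: 0)
    try handle.write(contentsOf: payload)              // one sequential write
    try handle.truncate(atOffset: UInt64(payload.count))
}

/// Truncates a scratch file to zero length, then deletes it.
func discardScratchFile(_ url: URL) throws {
    let handle = try FileHandle(forWritingTo: url)
    try handle.truncate(atOffset: 0)
    try handle.close()
    try FileManager.default.removeItem(at: url)
}

let scratchURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("scratch.bin")
try Data(repeating: 0xFF, count: 100).write(to: scratchURL)
try overwrite(scratchURL, with: Data("short".utf8))
let afterOverwrite = try Data(contentsOf: scratchURL)
try discardScratchFile(scratchURL)
```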
Note: Whereas the HFS Plus file system implements zero-fill behavior, APFS solves the zero-filling problem for you by supporting sparse files. In APFS, empty parts of a file that span one or more blocks are not physically stored, making it unnecessary to zero-fill entire blocks on disk.
Use Modern File System Interfaces
Prefer routines that let you specify paths as NSURL objects over those that take paths as strings. Most URL-based routines are supported in macOS 10.6 and later and are designed to take advantage of technologies like Grand Central Dispatch. This gives your code an immediate advantage on multicore computers without requiring much work from you.
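As a minimal Swift sketch of the URL-based style, the function below lists a directory with FileManager.contentsOfDirectory(at:includingPropertiesForKeys:options:) rather than the string-based contentsOfDirectory(atPath:). The keys passed in are ones Foundation can pre-fetch for each item while it reads the directory. The regularFileSizes function and the scratch directory are hypothetical.

```swift
import Foundation

/// Returns the sizes of the regular files directly inside `directory`.
func regularFileSizes(in directory: URL) throws -> [String: Int] {
    let keys: [URLResourceKey] = [.isRegularFileKey, .fileSizeKey]
    let items = try FileManager.default.contentsOfDirectory(
        at: directory,
        includingPropertiesForKeys: keys,   // values to pre-fetch per item
        options: [.skipsHiddenFiles])

    var sizes: [String: Int] = [:]
    for item in items {
        let values = try item.resourceValues(forKeys: Set(keys))
        if values.isRegularFile == true {
            sizes[item.lastPathComponent] = values.fileSize ?? 0
        }
    }
    return sizes
}

// Usage with a hypothetical scratch directory.
let fm = FileManager.default
let dir = fm.temporaryDirectory.appendingPathComponent(UUID().uuidString, isDirectory: true)
try fm.createDirectory(at: dir, withIntermediateDirectories: true)
try Data(count: 10).write(to: dir.appendingPathComponent("a.bin"))
try Data(count: 20).write(to: dir.appendingPathComponent("b.bin"))
try fm.createDirectory(at: dir.appendingPathComponent("sub", isDirectory: true),
                       withIntermediateDirectories: true)
let sizes = try regularFileSizes(in: dir)
```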
Prefer routines that accept block objects over those that accept callback functions or methods. Blocks are a convenient and more efficient way to implement callback-type behaviors. Blocks often require much less code to implement because they don’t require you to define and manage a context data structure for passing data. Some routines might also execute your block by scheduling it in a GCD queue, which can also improve performance.
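In Swift, closures play the role of Objective-C block objects. In this minimal sketch, FileManager.enumerator(at:includingPropertiesForKeys:options:errorHandler:) takes its error handling as a closure. That closure updates a local counter directly, so no separate callback function or context structure is needed. The countRegularFiles function and the directory layout are hypothetical.

```swift
import Foundation

/// Recursively counts regular files under `root`, handling enumeration
/// errors in a closure that captures local state directly.
func countRegularFiles(under root: URL) -> (files: Int, errors: Int) {
    var errorCount = 0
    guard let enumerator = FileManager.default.enumerator(
        at: root,
        includingPropertiesForKeys: [.isRegularFileKey],
        options: [],
        errorHandler: { _, _ in
            errorCount += 1   // captured variable, no context pointer needed
            return true       // keep enumerating after an error
        }) else {
        return (0, 0)
    }

    var fileCount = 0
    for case let url as URL in enumerator {
        let values = try? url.resourceValues(forKeys: [.isRegularFileKey])
        if values?.isRegularFile == true {
            fileCount += 1
        }
    }
    return (fileCount, errorCount)
}

// Usage: two files at the top level and one in a nested folder.
let fm = FileManager.default
let root = fm.temporaryDirectory.appendingPathComponent(UUID().uuidString, isDirectory: true)
let nested = root.appendingPathComponent("nested", isDirectory: true)
try fm.createDirectory(at: nested, withIntermediateDirectories: true)
for name in ["one.txt", "two.txt"] {
    try Data("x".utf8).write(to: root.appendingPathComponent(name))
}
try Data("y".utf8).write(to: nested.appendingPathComponent("three.txt"))
let result = countRegularFiles(under: root)
```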