SMB sudden loss of connection: macOS broken for Catalina, Big Sur, Monterey and Ventura?

Hello!


Once again, it's about the tiresome topic of SMB.


We have a file server and 6-8 macOS workstations that process their files on the server.


The server was very old (running Yosemite) and was replaced by a new Mac mini M1 running Monterey. Since that day, SMB connections to the workstations have failed several times a day. Restarting the server or the shares gets things working again, but the failure recurs after a few hours. The SMB connection is simply lost.


I have already tested various combinations of Mac mini (Intel or M1) with Catalina, Big Sur, Monterey and Ventura; the same error occurs with all of them. You can work normally until the workstations suddenly lose the connection to the shares.


When we work with only a few workstations, this error seems to occur less often. Then we can work for 3-4 days without restarting the server, but if all workstations are used, the error occurs at some point and then it repeats itself every 1-3 hours.


The error does not seem to occur with older macOS versions on the server, but those unfortunately cause other errors in combination with Monterey on the workstations (Spotlight search, tags, etc.).


I want to run the server with Monterey because Ventura is more troublesome with SMB.


What I have already tried:

1. Installed a fresh operating system

2. Tested macOS 10.15, 11, 12 and 13

3. Tested a Mac mini Intel (2012), a Mac mini M1 and a Mac Studio

4. Tested an old and a new RAID system

5. Replaced the network switch

6. Checked the network lines

7. Forced SMB2

... automatic restart every morning ...

Nothing helped. For about 9 months I have been struggling with this.


Any hint is highly welcome. Thanks.

eweri


Posted on Jul 18, 2023 02:18 AM

Question marked as Top-ranking reply

Posted on Jul 20, 2023 11:53 PM

While I was looking through logs to find a reason for, or a solution to, my problem, I tracked the smbd connections with lsof.


A few Macs and a single Windows PC connect to the server, and the Windows PC has a lot of connections in state "closed" - maybe this PC kills smbd?


Here is the output of lsof just before the connections drop:


Then something happened and the connections started to drop:


And then every client with x.x.31.x lost its connection to the shares:


So it looks to me as if smbd went deaf - a restart of the sharing service or of the server is required to get it working again.
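
For anyone who wants to repeat this kind of tracking, here is a minimal Python sketch that condenses lsof output into a per-client count of TCP states. The function names `summarize_states` and `capture_lsof` are made up for illustration, and it assumes lsof prints the state in parentheses at the end of each line, as in `TCP 10.0.31.2:445->10.0.31.15:49152 (ESTABLISHED)`.

```python
import re
import subprocess
from collections import Counter, defaultdict

# Matches the NAME column of an lsof TCP line, for example:
#   TCP 10.0.31.2:445->10.0.31.15:49152 (ESTABLISHED)
CONN_RE = re.compile(r"TCP\s+(\S+)->(\S+)\s+\((\w+)\)")


def summarize_states(lsof_text):
    """Return {remote_host: Counter({state: count})} for connected sockets."""
    per_client = defaultdict(Counter)
    for line in lsof_text.splitlines():
        match = CONN_RE.search(line)
        if not match:
            continue  # header line or a listening socket without a peer
        _local, remote, state = match.groups()
        host = remote.rsplit(":", 1)[0]  # drop the client port
        per_client[host][state] += 1
    return dict(per_client)


def capture_lsof(port=445):
    """Run lsof for one TCP port and return its raw text output."""
    # -n and -P keep addresses and ports numeric; -iTCP:<port> filters by port.
    result = subprocess.run(
        ["lsof", "-nP", f"-iTCP:{port}"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling `summarize_states(capture_lsof())` every minute or so and writing the result to a file with a timestamp should show which client piles up closed connections before the drop. Seeing smbd's sockets will most likely require running it with sudo.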


10 replies

Jul 20, 2023 11:12 PM in response to eweri

I've checked the console once again. The only problematic log entry it shows for smbd is this:



But there are only three groups, and four users in a group at maximum. I have no groups in groups. I think this message is a false warning.

There is no share with more than 4 ACEs (maximum three group-ACLs and one for _spotlight).


The users are configured as Standard.

Jul 19, 2023 03:22 PM in response to eweri

Sounds like it might be electrical issues or radio-frequency interference issues, or a combination of both.


Electrical Issues:

Are all electrical circuits properly wired?

First, check for grounding issues.

An inexpensive plug-in LED "outlet checker" plugged into each outlet can quickly check for bad or missing earth ground, reversed leads, and similar conditions.

Is every machine plugged directly into the same power circuit?

Are all problematic machine(s) plugged into the same circuit/same breaker?

Are any computers plugged into extension cords, surge suppressors, or power strips?

Is the server power cord connected directly to a properly-grounded (ideally true sine-wave) Uninterruptible Power Supply?

If you plug a second machine into the same UPS, does the problem go away for those two machines?

If the UPS has continuous power monitoring and logging software, examine the power log to see if the SMB disconnections coincide with power surges or dips.
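
If both the UPS events and the disconnect times can be written down as timestamps, the comparison can be done mechanically. This is only a rough Python sketch: the function name `coinciding_events` and the 60-second window are assumptions, and how you get timestamps out of your UPS software depends entirely on that software.

```python
from datetime import datetime, timedelta


def coinciding_events(disconnects, power_events, window_seconds=60):
    """Pair each disconnect with the power events that fall close to it.

    Both arguments are lists of datetime objects. Returns a list of
    (disconnect, [nearby power events]) tuples; disconnects with no
    power event inside the window are left out.
    """
    window = timedelta(seconds=window_seconds)
    matches = []
    for drop in disconnects:
        nearby = [event for event in power_events if abs(event - drop) <= window]
        if nearby:
            matches.append((drop, nearby))
    return matches
```

If most disconnects end up with a matching power event, power quality becomes a serious suspect; if none do, it can probably be ruled out.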


Are there frequent building power disruptions? Frequent nearby lightning strikes?

A near direct-hit can easily overwhelm most inexpensive surge protectors, causing cumulative damage to computer components that weakens them over time.

Is this an old(er) building, or a new building in an area with lots of new construction nearby?

Is the building HVAC cycling frequently, or are there other high-draw devices (manufacturing equipment, microwaves, ovens, refrigerators or refrigeration equipment, window air conditioners) in occasional use that would cause occasional voltage sags throughout the day?


How long are the network cable runs between the switch and the computers? There are still limits on network cable lengths. Are all network cables between devices of a similar recent spec (Cat 5, Cat 6, Cat 7)? All high quality cables and connectors? Were the network cables run by IT staff, building maintenance, electricians, or network specialists? Any network cables run directly over old-style fluorescent tube lighting fixtures?

Any damaged network connection cables? Do custodians use vacuums or floor maintenance equipment near network connecting cables on a regular basis?


Radio Frequency Interference Issues:

Any nearby television broadcast towers, military, airport, or weather radar, satellite dishes, microwave communications towers, cellular towers, new 5G cellular installations?

Do users have cell phones turned on and located next to these networked computers? (Intentionally listed last, because it's probably the least likely possibility.)


Are there any other 'mysterious' non-computer issues affecting the building - flaky lighting, HVAC, alarm systems, or communications equipment? Taken together, they might help point you to the ultimate cause(s).

Jul 20, 2023 12:47 AM in response to xnav

This is a reply to xnav (could not reply from my iPad last night):

Yes - but smbd logs nearly nothing and the other 100k log entries are a little overwhelming.


I tried to find something that gives a hint about the problem, but found nothing.




I found out that when it happens, the network connection list shows all SMB connections in state "CLOSE_WAIT" (from my memory). As far as I know, this state means the client side has already closed the connection and the server process has not yet closed its own socket.




Other network connections, such as ssh or ARD, are not interrupted.
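
To check this from a workstation at the moment of failure, a small probe can test whether the server still accepts TCP connections on the SMB port compared with ssh. A minimal Python sketch, where the function names and the default port list (445 for SMB, 22 for ssh) are my own choices:

```python
import socket


def port_accepts_connections(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe_services(host, ports=None, timeout=3.0):
    """Probe several named ports and return {name: reachable}."""
    if ports is None:
        ports = {"smb": 445, "ssh": 22}  # assumed defaults, adjust as needed
    return {
        name: port_accepts_connections(host, port, timeout)
        for name, port in ports.items()
    }
```

Note that a successful connect only shows that the TCP handshake completed; it does not prove that smbd is actually answering SMB requests. A refused or timed-out connection on 445 while 22 still works would, however, point at the SMB service and not at the network.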




Bye

Jul 20, 2023 01:01 AM in response to kostby

Hello!


The server sits in a server rack with lots of other servers and other network equipment. We also moved the server to a different server rack in another location, 100 m from the original one.


While we were searching for a solution we found an old HP switch acting weird - this one was replaced by a new one.


Currently all workstations and the server are connected to the same brand-new switch, and we set up a router to isolate our network traffic from the rest of the network. Before this there were 3-4 switches with VLANs and fibre lines involved.


I still think it is a bug in macOS, as the same hardware with Yosemite did not show these connection losses.

The problem occurs from macOS 10.15 onward - Intel or M1 does not matter.


Workstations are all Mac Studios running Monterey.
