It was one of those days when we discovered that something was wrong with data traffic. Our customers complained that some web sites opened only partially or didn’t open at all. It wasn’t hard to guess that this was probably an MTU issue, since in my experience symptoms like these come down to MTU in about 90% of cases. Between the client’s site and our aggregation switch we were running Q-in-Q. Q-in-Q adds a second 4-byte VLAN tag to each frame, so every switch along the path must support an MTU of at least 1504 bytes. After some research and debugging we confirmed that the issue was indeed the MTU size. To be more specific, some device in the cloud between SITE #1 and SITE #2 didn’t allow packets bigger than 1500 bytes to go through. As a temporary solution we decided to fragment packets on the gateway. If you are interested in how to do it, follow the link: MTU on Cisco Routers. This was not desirable because fragmentation increases CPU usage, but there was no other choice.
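To make the 1504 versus 1500 arithmetic concrete, here is a minimal sketch in Python. It assumes, as the text above does, that each extra VLAN tag counts as 4 bytes against the transit MTU. The function names are made up for illustration and do not come from any tool.

```python
# Sizes are in bytes. VLAN_TAG is the 4-byte 802.1Q / 802.1ad tag.
# Assumption: the transit MTU is counted so that each extra tag
# costs 4 bytes of it, which is how the 1504 figure arises.
VLAN_TAG = 4


def required_transit_mtu(ip_mtu: int, extra_tags: int = 1) -> int:
    """MTU a transit switch needs to carry ip_mtu-sized packets plus extra tags."""
    return ip_mtu + extra_tags * VLAN_TAG


def max_ip_packet(transit_mtu: int, extra_tags: int = 1) -> int:
    """Largest IP packet that still fits once the extra tags are added."""
    return transit_mtu - extra_tags * VLAN_TAG


def fits(ip_packet: int, transit_mtu: int, extra_tags: int = 1) -> bool:
    """True if the tagged packet passes a device limited to transit_mtu."""
    return ip_packet + extra_tags * VLAN_TAG <= transit_mtu


print(required_transit_mtu(1500))  # 1504
print(max_ip_packet(1500))         # 1496
print(fits(1500, 1500))            # False
```

With a device in the path limited to 1500 bytes, only IP packets up to 1496 bytes get through under these assumptions. That is consistent with small pages loading and larger ones stalling, and it is why fragmenting on the gateway worked as a stopgap.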
Meanwhile, I prepared a diagram for the technical support staff of the higher-tier ISP from which we rented the connection between the sites, to prove that the problem was on their side. Eventually the issue was solved: they found that an optical transceiver in the path didn’t pass packets bigger than 1500 bytes, and they replaced it.
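A size limit like this can also be narrowed down from an end host by sending pings with the Don't Fragment bit set and searching for the largest size that still gets a reply. The sketch below is one possible way to do that, not the procedure we actually used. It assumes a Linux host with iputils `ping` (options `-M do`, `-s`, `-c`, `-W`). The helper names and the example address 192.0.2.10 are hypothetical.

```python
import subprocess
from typing import Callable


def largest_passing_size(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Binary-search the largest size in [low, high] for which probe(size) is True.

    Assumes every size up to the limit passes and every size above it fails.
    """
    if not probe(low):
        raise ValueError(f"even the smallest size ({low}) does not pass")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def ping_probe(host: str) -> Callable[[int], bool]:
    """Build a probe that pings host with DF set (assumes Linux iputils ping)."""
    def probe(ip_packet_size: int) -> bool:
        # -s takes the ICMP payload size: subtract the 20-byte IPv4 header
        # and the 8-byte ICMP header from the desired IP packet size.
        payload = ip_packet_size - 28
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "2", "-M", "do", "-s", str(payload), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0  # 0 means a reply was received
    return probe


# Example call (hypothetical host on the far side of the link):
#   largest_passing_size(ping_probe("192.0.2.10"), 576, 1500)
```

The search is kept separate from the ping call so that it can be checked with a fake probe, or reused with another way of sending test packets. A single lost reply looks like a failure here, so it is worth repeating a run before drawing conclusions.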
I didn’t want to throw the diagram away, so I am publishing it here. Maybe someone will find it useful for MTU troubleshooting.