strawman text to show how unverified media would work#1026

Closed
wants to merge 1 commit into from

Conversation

fluffy
Contributor

@fluffy fluffy commented Feb 12, 2017

Fixes #849

@fluffy
Contributor Author

fluffy commented Feb 12, 2017

More work is needed on this, but I want to check with people that the basic content is correct before we bother to cross the t's and dot the i's.

@stefhak
Contributor

stefhak commented Feb 12, 2017

Thanks Cullen. Let's see what people think. Should we reach out to Ekr for an opinion on whether this is OK from a security perspective?

@rshpount

I do not think a 5 sec limit is needed here. I think all we need to say is that media must be terminated if an SDP with a fingerprint is received and this fingerprint does not match the certificate used in the established DTLS session.
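
A minimal sketch of the rule proposed above, for illustration only. It assumes the DTLS peer certificate is available as DER bytes and that the SDP carries a `sha-256` fingerprint; the function names `cert_fingerprint` and `must_terminate_media` are hypothetical and not part of any WebRTC API.

```python
import hashlib
import hmac
from typing import Optional


def cert_fingerprint(cert_der: bytes) -> str:
    """Colon-separated, upper-case SHA-256 hash of a DER-encoded certificate."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def must_terminate_media(sdp_fingerprint: Optional[str], cert_der: bytes) -> bool:
    """Return True when a fingerprint has arrived and does not match the
    certificate used in the established DTLS session.

    sdp_fingerprint is the value of an a=fingerprint line, for example
    "sha-256 AB:CD:...", or None if no SDP with a fingerprint has arrived yet.
    """
    if sdp_fingerprint is None:
        # Nothing to compare against yet; this rule alone does not stop media.
        return False
    algorithm, _, value = sdp_fingerprint.strip().partition(" ")
    if algorithm.lower() != "sha-256":
        # Assumption of this sketch: only sha-256 is handled.
        raise ValueError("unsupported fingerprint algorithm: " + algorithm)
    expected = cert_fingerprint(cert_der)
    return not hmac.compare_digest(value.strip().upper(), expected)
```

Note that under this rule nothing bounds how long media flows while no fingerprint has arrived, which is the point the timeout discussion below turns on.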

@stefhak
Contributor

stefhak commented Feb 14, 2017

@rshpount to not have a timeout seems very strange. Would not an app that wants to do something bad simply just skip applying the answer, and the unverified media could flow for a long time?

@rshpount

@stefhak I understand your concern about the timeout, but then we need to answer two questions:

  1. What happens if answer arrives after 5 sec and it matches the certificate?
  2. Why 5 seconds? Why not 32 seconds or some other number? I can certainly see how an answer can be delayed for more than 5 seconds, especially when signaling servers are recovering after a fail-over.

@stefhak
Contributor

stefhak commented Feb 14, 2017

@rshpount as I've understood it the use case is when the media set-up outraces the SDP answer propagation back to the offerer when the answerer has a legacy device (that can't send a pranswer). That to me says the timeout should be something more like 1 second (in what normal scenarios would the answer take more than 1 second more than the media?). The use case is not about handling signaling servers that go down.

A fundamental question to me is if

   NOT assume that the data transmitted over the TLS connection is valid
   until it has received a matching fingerprint in an SDP answer.

from RFC 4572 allows playing the media at all before the answer is received.

(BTW the PR looks fine to me, except that perhaps the timeout should be closer to 1 sec than 5 IMO, given this is allowed by 4572)

@rshpount

I do not think RFC 4572 allows media playback before the SDP answer. The ideal behavior is to process and decode media, but not to pass it on for playback, so that media can be played as soon as the answer is received without waiting for a new I-frame. From what I understand, this is the proposed behavior when the allowUnverifiedMedia flag is not set.

This being said, if you set the allowUnverifiedMedia flag and do allow media playback before the answer is received, then you need to take into account that signaling normally goes over some sort of centralized infrastructure while media goes directly between the two end points. Signaling often traverses multiple servers, which are non-local and can become temporarily overloaded or fail over. Things that result in user-detectable issues (and starting playback, then stopping, then starting again is such an issue) should be avoided. In the services that we operate, in case of fail-over, signaling can be delayed by up to about 16 sec. The 5 sec delay will not be sufficient for me, since it would require much more frequent heartbeat monitoring between signaling servers. Even when two end points on the same network communicate with a remote signaling server, a simple internet connection disruption can cause a TCP packet delay higher than 5 sec, which will in turn delay the answer.

@stefhak
Contributor

stefhak commented Feb 15, 2017

@rshpount what makes me feel uneasy is adding an API that allows skipping one part of the security solution we've agreed on without much discussion (and no input from security experts like @ekr so far). This may be totally fine, but I am personally not able to judge.

@fluffy
Contributor Author

fluffy commented Feb 15, 2017

I will ping EKR - be good to have his input.

The 5 seconds was very arbitrary. I was trying to pick a number that was high enough that a reasonable signaling system could deliver the answer in that time. I don't really think a timer is needed at all, but I don't mind adding it. The reason I don't think a timer is needed is that the app can implement whatever timer it wants in the app. Keep in mind that if the app is "bad" it can do whatever it wants with this media.

The key thing about data received before a fingerprint is not that "it's not valid". It is valid data; it's that you don't know who it is from. Some applications clearly won't want to play media when they don't know who it is from; some are fine with that, and they will not display who the media is from until they get the fingerprint. Keep in mind the default behavior would be to not use this data; we are just adding the option for applications that want to play it. One of the reasons we need this is for the 1-800-gofedex case to work, which is in our original requirements.
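
A rough sketch of what an application-chosen timer could look like, assuming the application learns when unverified media first arrives and when the fingerprint has been verified. The class and its callbacks are hypothetical; nothing here corresponds to an actual WebRTC API surface.

```python
import time
from typing import Callable, Optional


class UnverifiedMediaPolicy:
    """Application-side limit on how long unverified media may be played."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock  # injectable so the policy can be tested
        self._started_at: Optional[float] = None
        self._verified = False

    def on_first_unverified_media(self) -> None:
        # Start the timer only once, at the first unverified packet.
        if self._started_at is None:
            self._started_at = self._clock()

    def on_fingerprint_verified(self) -> None:
        self._verified = True

    def may_play(self) -> bool:
        if self._verified:
            return True
        if self._started_at is None:
            return False  # no unverified media has arrived yet
        return self._clock() - self._started_at < self._timeout_s
```

The timeout is a constructor argument precisely because the thread does not agree on a value: 1, 5 and 16 seconds have all been suggested for different deployments.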

@stefhak
Contributor

stefhak commented Feb 15, 2017

Thanks @fluffy. Just to be sure: is there no way that this can be exploited e.g. by some MITM for the period up to when the fingerprint is received?

@martinthomson
Member

martinthomson commented Feb 15, 2017

I was initially concerned that this could be incompatible with media isolation and could not be implemented in Firefox without also removing or significantly altering the semantics of identity. However, I think that it's OK from that angle - the requirement here is that the DTLS handshake is complete, which should make that available.

That said, this definitely could be exploited by an attacker. This would end-run any protections we might gain through something like draft-thomson-avtcore-sdp-uks. FWIW, that's a pretty lame attack, so you might reasonably conclude that the risks are acceptable.

@aboba
Contributor

aboba commented Feb 15, 2017

@fluffy Some clarifying questions:

"The RTCRtpReceiver MAY discard this media or MAY buffer this media so that video key frames are not
lost. Once the SDP fingerprint is received, and the DTLS connection verified, any buffered media and media received after is made available to the application."

[BA] How does the application developer know which of the above approaches an implementation has taken (e.g. whether it discards unverified media or buffers it)?

"If the allowUnverifiedMedia attribute on the RTCRtpReceiver has been set to true, then up to 5 seconds worth of media is made available to the applications even if that media is received before the SDP fingerprint."

[BA] Is the above trying to say that unverified media is buffered for up to 5 seconds and then made available once verification is completed? Or is it trying to say that up to 5 seconds of unverified media is made available as it comes in?

s/applications/application/
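
To make the two readings of the 5-second text concrete, here is a small model of each, offered as a sketch rather than as what the PR intends. Frames are assumed to be sorted by arrival time, and in reading B the choice to keep the most recent 5 seconds (rather than the first 5) is an assumption of this sketch; all names are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = Tuple[float, str]    # (arrival time in seconds, label)
Playout = Tuple[float, str]  # (time handed to the application, label)


def play_as_received(frames: List[Frame], verified_at: Optional[float],
                     limit_s: float = 5.0) -> List[Playout]:
    """Reading A: unverified media is handed over as it arrives, but only
    during the first limit_s seconds after the first frame."""
    if not frames:
        return []
    first_arrival = frames[0][0]
    played: List[Playout] = []
    for arrival, label in frames:
        verified = verified_at is not None and arrival >= verified_at
        if verified or arrival - first_arrival < limit_s:
            played.append((arrival, label))
    return played


def buffer_until_verified(frames: List[Frame], verified_at: Optional[float],
                          limit_s: float = 5.0) -> List[Playout]:
    """Reading B: unverified media is buffered (at most limit_s seconds of it)
    and released only once verification completes."""
    if verified_at is None:
        return []  # nothing is released without verification
    played: List[Playout] = []
    for arrival, label in frames:
        if arrival >= verified_at:
            played.append((arrival, label))
        elif verified_at - arrival <= limit_s:
            played.append((verified_at, label))  # released at verification
    return played
```

With frames arriving at 0, 2, 4, 6 and 8 seconds and verification at 7 seconds, reading A plays the frames from 0, 2, 4 and 8 as they arrive and drops the one at 6, while reading B drops the frame from 0 and releases those from 2, 4 and 6 together at 7.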

@aboba aboba changed the title strawman text to show how unverfieid media would work strawman text to show how unverified media would work Feb 15, 2017
@pthatcherg
Contributor

As I pointed out in #849 just now, I think it's actually impossible for this to work with ICE+DTLS. Here's my reasoning, copied from #849:

  1. You can receive DTLS from the remote side before receiving the remote description (and thus fingerprint). This happens if the remote side sends an ICE connectivity check and the local side sends a response and then the remote side sends a DTLS packet.

  2. You cannot send DTLS from the local side before receiving the remote description (and thus fingerprint). This is because you can't send an ICE connectivity check until you have the remote ICE ufrag and pwd, and thus can't get an ICE connectivity check response, and thus can't send DTLS. This is because you can't send anything other than ICE until you get an ICE connectivity check response.

  3. Since you can't send DTLS, you can't complete the handshake, and thus can't extract the SRTP key.
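
The three steps above form a dependency chain, which can be written down as a toy model. This is only a sketch of the argument for a full ICE endpoint, not of any implementation, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FullIceEndpoint:
    """Local state that matters for the argument above."""
    remote_description_applied: bool = False  # gives remote ufrag, pwd, fingerprint
    check_response_received: bool = False     # response to our own connectivity check
    remote_dtls_received: bool = False        # the remote side may send DTLS early

    def can_send_ice_check(self) -> bool:
        # Step 2: a connectivity check needs the remote ICE ufrag and pwd.
        return self.remote_description_applied

    def can_send_dtls(self) -> bool:
        # Step 2: nothing other than ICE may be sent before a check response.
        return self.can_send_ice_check() and self.check_response_received

    def can_extract_srtp_key(self) -> bool:
        # Step 3: the handshake needs DTLS flowing in both directions.
        return self.remote_dtls_received and self.can_send_dtls()
```

In this model, an endpoint that has received remote DTLS (step 1) but has not applied the remote description can never reach `can_extract_srtp_key()`, which is the claim being made.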

@adamroach

On their face, @pthatcherg's assertions appear to be true. Can someone who thinks this can happen draw a ladder diagram demonstrating how the situation under discussion arises? @fluffy?

@adamroach

adamroach commented Apr 3, 2017

The MMUSIC discussion seems, at the moment, to conclude that the only way the situation under discussion can arise is:

  1. When using ORTC rather than WebRTC -- which clearly requires no text in the WebRTC document, or
  2. When an ICE lite endpoint is in use, the ICE lite endpoint itself (but not the full ICE endpoint it is talking to) can get early media. Since browsers cannot be ICE lite endpoints, this situation also requires no text in the WebRTC document.

The only thing that I've seen mentioned is some interaction between PRANSWER and trickle ICE in which a fingerprint has been received by the offerer, but that fingerprint is incorrect. Leaving aside for a moment that this sounds like it goes beyond "unverified media" and clear into "indistinguishable from forged media", it's still not clear how this can happen.

Minimally, I think the working group needs to understand the circumstances leading to such a situation (hence my request for a ladder diagram); and, ideally, such a situation should be clearly described in the text around unverified media. It does no good to give webdevs an affordance to control the behavior in this situation if they don't have any way to understand what the situation actually is. And if it's not obvious to us, then they have no hope whatsoever.

@rshpount

rshpount commented Apr 3, 2017 via email

@adamroach

The executive summary of @rshpount's comment as I read it is "this can't happen to a browser." Any counterpoints? @fluffy?

@fluffy
Contributor Author

fluffy commented Apr 4, 2017

Imagine a browser sends an offer to an SBC-like thing. The SBC sends a PR answer that sets up just a data connection. At this point ICE comes up and a TLS session comes up. Now the SBC forwards the offer as a SIP invite to a PSTN GW calling 1-800-go-fedex, which sets up a second TLS connection, but the media packets for this are relayed via the SBC. So the first PR answer set up the ICE. The second TLS connection is happening within that TLS context. The PSTN GW completes the ICE handshake, and then starts sending the one-way media with IVR prompts from fedex as ringback tone. At the point that it needs to go two-way, the PSTN GW sends a 200 with an answer, which the SBC translates into an answer with the fingerprint of the PSTN GW to send to the browser. If the browser discards this media, it loses the initial prompt. Note that the browser gets the fingerprint and knows who it is talking to before it needs to send any media or DTMF. This case would likely work if you buffered all the media and only played it once the fingerprint arrived, because the IVR would just wait at the prompt until the person had listened to the buffered media and pressed 1. But the browser would need to speed up the playback or it could never catch up.
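
The catch-up remark at the end can be put in numbers. A back-of-the-envelope sketch, assuming media keeps arriving in real time while the buffer drains and that playback runs at a constant speed factor; the function name is hypothetical.

```python
import math


def catch_up_seconds(backlog_s: float, speed: float) -> float:
    """Wall-clock time needed to drain a backlog of buffered media.

    While playing for t seconds at the given speed, speed * t seconds of
    media are consumed and t seconds of new media arrive, so the backlog
    is gone when speed * t == backlog_s + t.
    """
    if backlog_s <= 0:
        return 0.0
    if speed <= 1.0:
        return math.inf  # at normal speed the backlog never shrinks
    return backlog_s / (speed - 1.0)
```

For example, 10 seconds of buffered prompt played at 1.25x takes 40 seconds to catch up, and at 1.0x it never does.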

@rshpount

rshpount commented Apr 4, 2017 via email

@pthatcherg
Contributor

pthatcherg commented Apr 4, 2017 via email

@taylor-b
Contributor

taylor-b commented Apr 5, 2017

I think I understand Cullen's use case. I just didn't consider there being a second DTLS handshake in the picture. Let me break down the important steps that occur here (if I understand correctly):

  1. Offer and provisional answer exchanged between browser and gateway.
  2. DTLS handshake completes using the fingerprint in the provisional answer.
  3. Before receiving a final answer, the WebRTC endpoint receives a Client Hello on a new candidate pair with a new ufrag (addressing @rshpount's point about needing a new 5-tuple).
  4. The WebRTC endpoint completes this second handshake, maintaining the two DTLS associations in parallel?
  5. At this point, early media can be received on the second DTLS association.
  6. Eventually the answer is received, the first DTLS association is discarded and the fingerprint of the second one can be verified.

However, this goes into extreme pranswer edge case territory, which JSEP doesn't currently define. This is what a really robust implementation might do, but I don't see anything preventing it from just ignoring the second Client Hello until it gets the final answer and discards the first DTLS association.

A very related issue is rtcweb-wg/jsep#600 (scroll down to find stuff about maintaining N DTLS associations in parallel...). In the PR I wrote to address this issue, I didn't end up adding any requirements related to early media, since @juberti said "implementations should be allowed to only handle one remote username at a time", which seemed reasonable to me.

EDIT: Never mind. At step 4, you still can't complete the second DTLS handshake, because you don't have the second remote ICE password yet, so you can't get ICE connected. Which is back to the original problem. The best an implementation could do is cache the second Client Hello.

@rshpount

rshpount commented Apr 5, 2017 via email

@taylor-b
Contributor

taylor-b commented Apr 5, 2017

Right, that's what I'm saying. See edit.

@fluffy
Contributor Author

fluffy commented Apr 5, 2017

Sorry - I messed up the original explanation because I had the data and audio and video muxed ... but to do it in the muxed case here ... take the steps Taylor has above, but with the SBC never setting up any DTLS session. It only does the ICE and never initiates the TLS. Later the PSTN GW initiates the DTLS. So in step 2, only ICE would be completed, not DTLS.

I have a vague memory there was some flow that used a secondary PR answer with a=dtls-connection:new too.

@taylor-b
Contributor

taylor-b commented Apr 5, 2017

Ok; so what fingerprint does the SBC put in the PR answer? If the answer is "a bogus fingerprint", then the WebRTC implementation will fail the DTLS handshake with "bad certificate". The alternative would be that the SBC creates its own certificate, but shares it with the PSTN GW?

@rshpount

rshpount commented Apr 5, 2017

First of all, there is no longer a=dtls-connection:new. This got edited out of dtls-sdp draft.

Second, SBC cannot just establish ICE without starting DTLS association with WebRTC end point. Any session description sent to WebRTC end point must have both ICE and DTLS attributes. This means DTLS association is always started after step 2. It is possible to start new DTLS association in the final answer from the same original offer, but this will also require a new ICE session with new ufrag and candidates allocated by SBC. Until ICE password used for these new candidates is delivered to WebRTC end point, WebRTC end point cannot send the consent check to SBC, which prevents ServerHello from being sent to SBC and DTLS association from being established.

All of this being said, there is a problem on the SBC side. There are plenty of scenarios where it will receive media from the SIP gateway with no way to complete the ICE session and DTLS association setup with the WebRTC end point. The only way early media works right now is if the SBC sits on the media path, doing transcoding if necessary, until the final answer is received. To make the SBC work with early media without transcoding, you need functionality to set up ICE/DTLS before doing codec negotiation, which is not coming until 2.0.

@fluffy
Contributor Author

fluffy commented Apr 26, 2017

There does not need to be a dtls-connection:new; it just comes in an answer that in SIP terms would have a different ufrag. In WebRTC terms it would just be a different answer that arrived after the first PR answer. To answer Taylor's question, it can put in a fingerprint the SBC can terminate, or it can put in a bogus fingerprint but make sure to never negotiate DTLS. It does not need to share a fingerprint with the SBC.

@rshpount

Once again, dtls-connection:new no longer exists at all. It was replaced by tls-id.

Second, from what everyone else sees, it is impossible to receive unverified media with full ICE end points and consent to send. If you think otherwise, please provide a complete scenario.

@taylor-b
Contributor

it can put in a fingerprint the SBC can terminate, or it can put in a bogus fingerprint but make sure to never negotiate DTLS. It does not need to share a fingerprint with the SBC.

So, it can use either a fingerprint for an association terminated by the SBC, or it can use a bogus fingerprint? I thought we already talked about why this wouldn't work. Maybe I'm still not understanding the network topology. Here's what I think you're describing; can you point out where I went wrong?

[Image: screen shot 2017-04-26 at 10 01 20 pm]

[Image: screen shot 2017-04-26 at 10 03 36 pm]

@stefhak
Contributor

stefhak commented May 2, 2017

Closing, see comment #849 (comment).

@stefhak stefhak closed this May 2, 2017
9 participants