Strawman text to show how unverified media would work #1026
More work is needed on this, but I want to check with people that the basic content is correct before we bother to cross the t's and dot the i's. |
Thanks Cullen. Let's see what people think. Should we reach out to Ekr for an opinion on whether this is OK from a security perspective? |
I do not think the 5 sec limit is needed here. I think all we need to say is that media must be terminated if SDP with a fingerprint is received and this fingerprint does not match the certificate used in the established DTLS session. |
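The rule proposed above (terminate media when the signalled fingerprint does not match the certificate from the DTLS handshake) can be sketched roughly as follows. This is a minimal illustration, not text from the PR: the function names `sdp_fingerprint` and `must_terminate_media` are hypothetical, and the certificate is assumed to be available as raw DER bytes.

```python
import hashlib
import hmac


def sdp_fingerprint(cert_der: bytes, hash_name: str = "sha-256") -> str:
    """Format a certificate digest as colon-separated uppercase hex,
    the form used in an SDP a=fingerprint value."""
    digest = hashlib.new(hash_name.replace("-", ""), cert_der).digest()
    return ":".join(f"{b:02X}" for b in digest)


def must_terminate_media(cert_der: bytes, fingerprint_value: str) -> bool:
    """Return True when the signalled fingerprint (e.g. 'sha-256 AB:CD:...')
    does not match the certificate seen in the DTLS handshake."""
    hash_name, _, signalled = fingerprint_value.strip().partition(" ")
    expected = sdp_fingerprint(cert_der, hash_name.lower())
    return not hmac.compare_digest(expected, signalled.strip().upper())


# Hypothetical usage: placeholder bytes stand in for a real DER certificate.
cert = b"placeholder certificate bytes"
answer_value = "sha-256 " + sdp_fingerprint(cert)
print(must_terminate_media(cert, answer_value))
print(must_terminate_media(b"a different certificate", answer_value))
```

The check only says what to do once a fingerprint arrives; it does not address how long media may flow before that point, which is the timeout question raised in this thread.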
@rshpount to not have a timeout seems very strange. Would not an app that wants to do something bad simply just skip applying the answer, and the unverified media could flow for a long time? |
@stefhak I understand your concern about the timeout, but then we need to answer two questions:
|
@rshpount as I've understood it the use case is when the media set-up outraces the SDP answer propagation back to the offerer when the answerer has a legacy device (that can't send a A fundamental question to me is if
from RFC 4572 allows playing the media at all before the answer is received. (BTW the PR looks fine to me, except for perhaps the timeout should be more 1 sec than 5 IMO, given this is allowed by 4572) |
I do not think RFC 4572 allows media playback before the SDP answer. The ideal behavior is to process and decode media, but not to pass it for playback, so that media can be played as soon as the answer is received without waiting for a new I-frame. From what I understand, this is the proposed behavior when the allowUnverifiedMedia flag is not set. This being said, if you set the allowUnverifiedMedia flag and do allow media playback before the answer is received, then you need to take into account that signaling normally goes over some sort of centralized infrastructure while media goes directly between two end points. Signaling often traverses multiple servers, which are non-local and can become temporarily overloaded or fail over. Things which result in user-detectable issues (and starting playback, then stopping, then starting again is such an issue) should be avoided. In the services that we operate, signaling can be delayed up to about 16 sec in case of fail-over. The 5 sec delay will not be sufficient for me, since it will require much more frequent heartbeat monitoring between signaling servers. Even in the case of two end points on the same network communicating with a remote signaling server, a simple internet connection disruption can cause a TCP packet delay higher than 5 sec, which will in turn delay the answer. |
I will ping EKR - be good to have his input. The 5 seconds was very arbitrary. I was trying to pick a number that was high enough that a reasonable signaling system could deliver the answer in that time. I don't really think a timer is needed at all, but I don't mind adding it. The reason I don't think a timer is needed is that the app can implement whatever timer it wants. Keep in mind that if the app is "bad" it can do whatever it wants with this media. The key thing about data received before a fingerprint is not that "it's not valid" (it is valid data); it's that you don't know who it is from. Some applications clearly won't want to play media when they don't know who it is from; some are fine with that, and they will not display who the media is from until they get the fingerprint. Keep in mind the default behavior would be to not use this data; we are just adding the option for applications that want to use it to play it. One of the reasons we need this is for the 1-800-gofedex case to work, which is in our original requirements. |
Thanks @fluffy. Just to be sure: is there no way that this can be exploited e.g. by some MITM for the period up to when the fingerprint is received? |
I was initially concerned that this could be incompatible with media isolation and could not be implemented in Firefox without also removing or significantly altering the semantics of identity. However, I think that it's OK from that angle - the requirement here is that the DTLS handshake is complete, which should make that available. That said, this definitely could be exploited by an attacker. This would end-run any protections we might gain through something like draft-thomson-avtcore-sdp-uks. FWIW, that's a pretty lame attack, so you might reasonably conclude that the risks are acceptable. |
@fluffy Some clarifying questions:
"The RTCRtpReceiver MAY discard this media or MAY buffer this media so that video key frames are not
[BA] How does the application developer know which of the above approaches an implementation has taken (e.g. whether it discards unverified media or buffers it)?
"If the allowUnverifiedMedia attribute on the RTCRtpReceiver has been set to true, then up to 5 seconds worth of media is made available to the applications even if that media is received before the SDP fingerprint."
[BA] Is the above trying to say that unverified media is buffered for up to 5 seconds and then made available once verification is completed? Or is it trying to say that up to 5 seconds of unverified media is made available as it comes in?
s/applications/application/ |
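To make the two readings in the question above concrete, here is a small model of one possible interpretation: with the flag set, unverified media is released as it arrives for up to 5 seconds, and anything after that is held until the fingerprint is verified. The class name `UnverifiedMediaGate` and its methods are hypothetical and are not the API proposed in the PR; the choice to buffer rather than discard is also an assumption, since the quoted text allows either.

```python
from typing import List, Optional


class UnverifiedMediaGate:
    """Toy model of a receiver deciding what to hand to the application."""

    def __init__(self, allow_unverified_media: bool = False, limit_s: float = 5.0) -> None:
        self.allow_unverified_media = allow_unverified_media
        self.limit_s = limit_s
        self.verified = False
        self.first_unverified_at: Optional[float] = None
        self.held: List[str] = []

    def on_media(self, frame: str, now: float) -> List[str]:
        """Return the frames released to the application for this input."""
        if self.verified:
            return [frame]
        if self.first_unverified_at is None:
            self.first_unverified_at = now
        within_limit = (now - self.first_unverified_at) < self.limit_s
        if self.allow_unverified_media and within_limit:
            return [frame]
        # Held for later; an implementation MAY discard instead of buffering.
        self.held.append(frame)
        return []

    def on_fingerprint_verified(self) -> List[str]:
        """Mark the peer as verified and release anything that was held."""
        self.verified = True
        released, self.held = self.held, []
        return released


gate = UnverifiedMediaGate(allow_unverified_media=True)
print(gate.on_media("frame-0", now=0.0))   # released while unverified
print(gate.on_media("frame-1", now=5.0))   # past the limit, held
print(gate.on_fingerprint_verified())      # held frames released
```

Under the other reading (buffer for up to 5 seconds, release only after verification), `on_media` would never return frames before `on_fingerprint_verified` is called, regardless of the flag.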
As I pointed out in #849 just now, I think it's actually impossible for this to work with ICE+DTLS. Here's my reasoning, copied from #849:
|
On their face, @pthatcherg's assertions appear to be true. Can someone who thinks this can happen draw a ladder diagram demonstrating how the situation under discussion arises? @fluffy? |
The MMUSIC discussion seems, at the moment, to conclude that the only way the situation under discussion can arise is:
The only thing that I've seen mentioned is some interaction between PRANSWER and trickle ICE in which a fingerprint has been received by the offerer, but that fingerprint is incorrect. Leaving aside for a moment that this sounds like it goes beyond "unverified media" and clear into "indistinguishable from forged media", it's still not clear how this can happen. Minimally, I think the working group needs to understand the circumstances leading to such a situation (hence my request for a ladder diagram); and, ideally, such a situation should be clearly described in the text around unverified media. It does no good to give webdevs an affordance to control the behavior in this situation if they don't have any way to understand what the situation actually is. And if it's not obvious to us, then they have no hope whatsoever. |
I see 3 cases here:
1. Two full ICE end points communicating -- this issue is impossible since
answering end point will not send ServerHello until it runs the
connectivity check, which cannot complete before answer SDP with ICE ufrag
and fingerprints is received
2. An ICE-lite end point sends an offer to the full ICE WebRTC end point.
The WebRTC end point runs the ICE connectivity check and sends ClientHello.
The ICE-lite end point can receive ClientHello, send back ServerHello and
establish a DTLS association before the answer is received from the WebRTC
end point. This means the ICE-lite (non-WebRTC) end point can receive data
before it receives the answer SDP and fingerprints. I think the right
solution here is for the ICE-lite end point to cache ClientHello but not to
send ServerHello until the answer is received. This will prevent the DTLS
association from being established before fingerprints are received.
3. Infamous 1-800 FedEx problem. This has nothing to do with the question,
but does create the real problem. Imagine SBC which is a full ICE/DTLS end
point on one side communicating with WebRTC end points and SIP/AVP (no DTLS
or ICE) on the other side. Imagine then, that WebRTC end point sends an
offer to SBC. SBC strips ICE and DTLS information from that offer and
sends it to SIP. SIP end point immediately starts sending data to SBC
before even sending the answer. SBC did not receive the answer from the SIP
end point and did not send the answer to WebRTC end point. There is no ICE
session or DTLS association established between the WebRTC end point and
SBC, but SBC is receiving data. The question is what to do with this data?
Right now the only options are:
a: discard
b: establish a media session between the SBC and the WebRTC end point before
sending the offer to SIP, and run transcoding during the early media session.
Once the session is fully established, do a 3pcc session update to remove
transcoding.
I believe problem 3 cannot be fixed until ortc/webrtc 2.0 is ready.
In any of those cases I do not see how media can flow to the WebRTC end
point before fingerprints are available.
Roman Shpount
In reply to adamroach's comment of Mon, Apr 3, 2017 at 12:07 PM.
|
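The fix suggested for case 2 above (an ICE-lite end point caching ClientHello and holding back ServerHello until the answer arrives) could look roughly like this. It is a sketch under stated assumptions: `IceLiteDtlsServer` and its methods are hypothetical names, and the DTLS messages are reduced to opaque bytes and strings rather than real handshake records.

```python
from typing import List, Optional


class IceLiteDtlsServer:
    """Toy DTLS server side that defers ServerHello until the SDP answer,
    and therefore the remote fingerprint, has been received."""

    def __init__(self) -> None:
        self.remote_fingerprint: Optional[str] = None
        self.cached_client_hello: Optional[bytes] = None
        self.sent: List[str] = []

    def on_client_hello(self, hello: bytes) -> None:
        if self.remote_fingerprint is None:
            # No answer yet: remember the latest ClientHello, send nothing.
            self.cached_client_hello = hello
            return
        self._send_server_hello(hello)

    def on_answer(self, fingerprint: str) -> None:
        self.remote_fingerprint = fingerprint
        if self.cached_client_hello is not None:
            hello, self.cached_client_hello = self.cached_client_hello, None
            self._send_server_hello(hello)

    def _send_server_hello(self, hello: bytes) -> None:
        # Placeholder for building and transmitting a real ServerHello.
        self.sent.append("ServerHello")


server = IceLiteDtlsServer()
server.on_client_hello(b"client-hello")
print(server.sent)    # nothing sent before the answer
server.on_answer("sha-256 AB:CD")
print(server.sent)    # handshake continues once the fingerprint is known
```

With this ordering no DTLS association exists before the fingerprint is known, so no unverified media can be received on it.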
Imagine a browser sends an offer to an SBC-like thing. The SBC sends a PR answer that sets up just a data connection. At this point ICE comes up and a TLS session comes up. Now the SBC forwards the offer as a SIP INVITE to a PSTN GW calling 1-800-go-fedex, which sets up a second TLS connection, but the media packets for this are relayed via the SBC. So the first PR answer set up the ICE. The second TLS connection is happening within that TLS context. The PSTN GW completes the ICE handshake, and then starts sending the one-way media with IVR prompts from FedEx as ringback tone. At the point that it needs to go two-way, the PSTN GW sends a 200 with an answer, which the SBC translates into an answer with the fingerprint of the PSTN GW to send to the browser. If the browser discards this media, it loses the initial prompt. Note that the browser gets the fingerprint and knows who it is talking to before it needs to send any media or DTMF. This case would likely work if you buffered all the media and only played it once the fingerprint arrived, because the IVR would just wait at the prompt till the person had listened to the buffered media and pressed 1. But the browser would need to speed up the playback or it could never catch up. |
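The closing remark about speeding up playback can be made concrete with a little arithmetic. Assuming media keeps arriving in real time while the buffered backlog is played at a constant speed factor (both assumptions, not something stated in the PR), the backlog drains at `speed - 1` seconds of media per second of wall-clock time. The helper name `catch_up_seconds` is hypothetical.

```python
def catch_up_seconds(backlog_s: float, speed: float) -> float:
    """Wall-clock time needed to drain `backlog_s` seconds of buffered media
    when playing at `speed` times real time while new media keeps arriving."""
    if speed <= 1.0:
        raise ValueError("playback at or below real-time speed never catches up")
    return backlog_s / (speed - 1.0)


# Example: 3 seconds buffered before the fingerprint arrived, played at 1.25x.
print(catch_up_seconds(3.0, 1.25))
```

At normal speed the denominator is zero, which is the "could never catch up" case described above.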
Cullen,
Can you please explain what "second TLS connection is
happening within that TLS context" means?
As far as I know, to establish a second or any new DTLS association you need
to do an ICE restart. You cannot have two DTLS associations over the same
underlying transport (5-tuple) since you cannot de-mux the DTLS packets. If
you do the ICE restart and get consent for a new 5-tuple, you need a complete
offer/answer exchange before you can send data in both directions, which
means fingerprints for both end points are available before the DTLS
association is established.
I do think your problem is real, but the early media gets stuck on the SBC
with no way to send it to the WebRTC end point since the DTLS session is not
running yet. So, this is an SBC problem, not a WebRTC end point problem.
Regards,
Roman Shpount
In reply to Cullen Jennings's comment of Tue, Apr 4, 2017 at 10:51 AM.
|
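The de-mux point above can be illustrated with the first-byte classification from RFC 7983: a receiver can tell DTLS from STUN or RTP by the first byte, but nothing in that scheme selects between two DTLS associations on the same 5-tuple. The `Transport` class and its method names below are hypothetical, and the addresses are documentation examples.

```python
from typing import Dict, Tuple

# (protocol, local ip, local port, remote ip, remote port)
FiveTuple = Tuple[str, str, int, str, int]


def classify(first_byte: int) -> str:
    """Classify a packet by its first byte, using the RFC 7983 ranges."""
    if 0 <= first_byte <= 3:
        return "stun"
    if 16 <= first_byte <= 19:
        return "zrtp"
    if 20 <= first_byte <= 63:
        return "dtls"
    if 64 <= first_byte <= 79:
        return "turn-channel"
    if 128 <= first_byte <= 191:
        return "rtp/rtcp"
    return "unknown"


class Transport:
    """Toy demultiplexer: at most one DTLS association per 5-tuple."""

    def __init__(self) -> None:
        self.dtls: Dict[FiveTuple, str] = {}

    def add_dtls_association(self, five_tuple: FiveTuple, name: str) -> None:
        if five_tuple in self.dtls:
            # There is no field to tell two associations apart on one 5-tuple.
            raise ValueError("a DTLS association already owns this 5-tuple")
        self.dtls[five_tuple] = name

    def route(self, five_tuple: FiveTuple, packet: bytes) -> str:
        kind = classify(packet[0])
        if kind == "dtls":
            return self.dtls.get(five_tuple, "unbound-dtls")
        return kind


transport = Transport()
pair = ("udp", "192.0.2.1", 5000, "198.51.100.7", 6000)
transport.add_dtls_association(pair, "first")
print(transport.route(pair, bytes([22, 254, 253])))  # 22 = DTLS handshake record
```

A second association therefore needs its own 5-tuple, which in this discussion means a new candidate pair with a new ufrag.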
On Tue, Apr 4, 2017 at 7:51 AM Cullen Jennings ***@***.***> wrote:
Imagine a browser sends offer to a SBC like thing. The SBC sends PR answer
that sets up just a data connection. At this point ICE comes up and a TLS
session comes up. Now the SBC forwards the offer as a SIP invite to a PSTN
GW calling 1-800-go-fedex which sets up a second TLS connection but that
media packets for this are relayed via the SBC. So the first PR answer set
up the ICE. The second TLS connection is happening within that TLS context.
It's not clear to me what DTLS handshakes are taking place and what
fingerprints the browser sees. Perhaps you could be more specific about
what remote descriptions come into PeerConnection and what DTLS fingerprints
they have, along with when the DTLS handshakes take place.
|
I think I understand Cullen's use case. I just didn't consider there being a second DTLS handshake in the picture. Let me break down the important steps that occur here (if I understand correctly):
EDIT: Nevermind. At step 4, you still can't complete the second DTLS handshake, because you don't have the second remote ICE password yet, so you can't get ICE connected. Which is back to the original problem. The best an implementation could do is cache the second Client Hello. |
At step 3, the WebRTC end point should not send Server Hello on this
candidate pair until it sends its own consent check on this pair. The WebRTC
end point cannot send this check until the new remote ice-pwd is received in
the final answer. Because of this, the second DTLS association is not
established until the final answer is received.
Am I missing something here?
Roman Shpount
On Tue, Apr 4, 2017 at 8:36 PM, Taylor Brandstetter wrote:
I think I understand Cullen's use case. I just didn't consider there being
a second DTLS handshake in the picture. Let me break down the important
steps that occur here (if I understand correctly):
1. Offer and provisional answer exchanged between browser and gateway.
2. DTLS handshake completes using the fingerprint in the provisional
answer.
3. Before receiving a final answer, the WebRTC endpoint receives a
Client Hello on a new candidate pair with a new ufrag (addressing
@rshpount's point about needing a new 5-tuple).
4. The WebRTC endpoint completes this second handshake, maintaining
the two DTLS associations in parallel?
5. At this point, early media can be received on the second DTLS
association.
6. Eventually the answer is received, the first DTLS association is
discarded and the fingerprint of the second one can be verified.
However, this goes into extreme pranswer edge case territory, which JSEP
doesn't currently define. This is what a really robust implementation
*might* do, but I don't see anything preventing it from just ignoring the
second Client Hello until it gets the final answer and discards the first
DTLS association.
A very related issue is rtcweb-wg/jsep#600 (scroll down to find stuff
about maintaining N DTLS associations in parallel...). In the PR I wrote to
address this issue, I didn't end up adding any requirements related to
early media, since @juberti said
"implementations should be allowed to only handle one remote username at a
time", which seemed reasonable to me.
|
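The argument above rests on the fact that an ICE connectivity check sent to the peer carries a MESSAGE-INTEGRITY computed with the peer's ICE password, so the check cannot be built before the new ice-pwd arrives in the final answer. The sketch below is deliberately simplified: a real STUN message computes the HMAC over the encoded message, whereas this hypothetical `build_check` helper just hashes the username and a placeholder body to show the dependency.

```python
import hashlib
import hmac
from typing import Optional, Tuple


def build_check(local_ufrag: str, remote_ufrag: str,
                remote_pwd: Optional[str], body: bytes) -> Optional[Tuple[str, bytes]]:
    """Return (USERNAME, integrity tag) for an outgoing check, or None when
    the remote password is not yet known and no valid check can be sent."""
    if remote_pwd is None:
        return None
    # For a request, USERNAME is "<remote ufrag>:<local ufrag>" and the
    # integrity key is the remote side's ICE password.
    username = f"{remote_ufrag}:{local_ufrag}"
    tag = hmac.new(remote_pwd.encode(), username.encode() + body, hashlib.sha1).digest()
    return username, tag


# Before the final answer: the new ufrag may be visible, the password is not.
print(build_check("localUfrag", "newRemoteUfrag", None, b"binding-request"))

# After the final answer delivers the new ice-pwd, the check can be built.
result = build_check("localUfrag", "newRemoteUfrag", "examplePassword", b"binding-request")
print(result[0], len(result[1]))
```

Until that check succeeds the end point has not verified the pair, so under the reasoning above it should hold back Server Hello, and the second DTLS association cannot complete early.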
Right, that's what I'm saying. See edit. |
Sorry - I messed up the original explanation because I had the data and audio and video muxed ... but to do it in the muxed case here ... take the steps Taylor has above, but with the SBC never setting up any DTLS session. It only does the ICE and never initiates the TLS. Later the PSTN GW initiates the DTLS. So in step 2, only ICE would be completed, not DTLS. I have a vague memory there was some flow that used a secondary PR answer with a=dtls-connection:new too. |
Ok; so what fingerprint does the SBC put in the PR answer? If the answer is "a bogus fingerprint", then the WebRTC implementation will fail the DTLS handshake with "bad certificate". The alternative would be that the SBC creates its own certificate, but shares it with the PSTN GW? |
First of all, there is no longer an a=dtls-connection:new. This got edited out of the dtls-sdp draft. Second, the SBC cannot just establish ICE without starting a DTLS association with the WebRTC end point. Any session description sent to a WebRTC end point must have both ICE and DTLS attributes. This means a DTLS association is always started after step 2. It is possible to start a new DTLS association in the final answer from the same original offer, but this will also require a new ICE session with a new ufrag and candidates allocated by the SBC. Until the ICE password used for these new candidates is delivered to the WebRTC end point, the WebRTC end point cannot send the consent check to the SBC, which prevents ServerHello from being sent to the SBC and the DTLS association from being established. All of this being said, there is a problem on the SBC side. There are plenty of scenarios in which it will receive media from the SIP gateway with no way to complete the ICE session and DTLS association setup with the WebRTC end point. The only way early media works right now is if the SBC sits on the media path, doing transcoding if necessary, until the final answer is received. To make the SBC work with early media without transcoding, you need functionality to set up ICE/DTLS before doing codec negotiation, which is not coming until 2.0. |
There does not need to be a dtls-connection:new; it just comes in an answer that in SIP terms would have a different ufrag. In WebRTC terms it would just be a different answer that arrived after the first PR answer. To answer Taylor's question, it can put in a fingerprint the SBC can terminate, or it can put in a bogus fingerprint but make sure to never negotiate DTLS. It does not need to share a fingerprint with the SBC. |
Once again, dtls-connection:new no longer exists at all. It was replaced by tls-id. Second, from what everyone else sees, it is impossible to receive unverified media with full ICE end points and consent to send. If you think otherwise, please provide a complete scenario. |
So, it can use either a fingerprint for an association terminated by the SBC, or it can use a bogus fingerprint? I thought we already talked about why this wouldn't work. Maybe I'm still not understanding the network topology. Here's what I think you're describing; can you point out where I went wrong? |
Closing, see comment #849 (comment). |
Fixes #849