From: "Nathan Shipley" Received: from mail-pf0-f174.google.com ([209.85.192.174] verified) by media-motion.tv (CommuniGate Pro SMTP 6.1.0) with ESMTPS id 6467850 for AE-List@media-motion.tv; Wed, 20 Jun 2018 20:35:34 +0200 Received: by mail-pf0-f174.google.com with SMTP id r11-v6so230482pfl.6 for ; Wed, 20 Jun 2018 11:43:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=62AcpqZ3/fUz6WsWKy4U/Y2Dl+7nq3VX0S5N+R0UqPY=; b=nk9p+8pr+oArDqmNeM8sAgqatmuo0HbqqMJqo4QxGI9LyEqJaRB47xfDIxEJWtFzzi LTZnX4OpIIuK+x23YfLl1YjCvJBiZlRTSHnpAs224Sx+YTmh82gEu6RYMyUZgauAn6MX amAUCYaS47BhEJbhCmRzrBVuEwQPvYG8fKJ7h+YNx7T19721j9FOYUakcmBBzB9syBbR cKglVQfj9ZN+9jJ2PP2M1AoacgPYMa/7mpCdY2aKjn0gPS9voWLx6rSb4PYqWIwZo6ZC 5Z4Rc0Ude08Ql7PBmR61alBG7F6rUxJqT3DCQOaS45wZDV3TkZx9kY6ET6OnCJLrfsun UiaQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=62AcpqZ3/fUz6WsWKy4U/Y2Dl+7nq3VX0S5N+R0UqPY=; b=rWUc7gGiV0dVRLIZG58gKA3FeruIMV1oatMLKSqirDYAiJV7RA7XAnBVMCRIlbAStJ MSA4RcFfmaopKb3UETq/F69MrpfX3ebMRgIV71gTWAO/tefDGEHiHjGrpGFWbqMPQHO7 MWBwFWh6Yio1WRZC+rIsSMO8+h9kR9aaqWMV9P6V43EnlDn5NgSLAvo3g+DkoeWS7/TI mQ5JLl8ev1W9tn20tixImv0M4xfgEooJL3UIndckmNbPYoH3pQGqs7GWmA5UiwkFFPWV iwV4YuKflYasuis5iPkVN6a6Gy+92ZHimKDCDp4XflsOp/Op+mVWSepZ6H0kvXMvXvdB u1Sg== X-Gm-Message-State: APt69E37bFKSOx7JtIj0pCQM1w1Bc/gwNcPwYB8Zf7CTSz9P5iIdatwb mNBvlpTdEIjit6dJ1kO2E6pGJY2qW6eayS9JO3DKScfP X-Google-Smtp-Source: ADUXVKKgb8QnakoTqzJ9vujl6ohJzr3ob7hcjUDQ5G8f1uguUF7AGE9lzkT52yAGgP7cE+0xku2ghbXS3syYnvrMUSE= X-Received: by 2002:a63:5f0c:: with SMTP id t12-v6mr19264500pgb.95.1529520195851; Wed, 20 Jun 2018 11:43:15 -0700 (PDT) MIME-Version: 1.0 Received: by 2002:a17:90a:7146:0:0:0:0 with HTTP; Wed, 20 Jun 2018 11:42:55 -0700 (PDT) In-Reply-To: References: Date: Wed, 20 Jun 2018 11:42:55 -0700 Message-ID: Subject: Re: [AE] nVidias Slo mo demo To: After Effects Mail List Content-Type: multipart/alternative; boundary="000000000000f359d9056f172bec" --000000000000f359d9056f172bec Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable It sounds like the Nvidia tech is generating those mattes automatically and using them to blend optical flow retimed versions of the footage, but that it's helped by a trained neural net. Here's the summary of the white paper they released; I bolded some key parts: Given two consecutive frames, video interpolation aims at generating intermediate frame(s) to form both spatially and temporally coherent video sequences. While most existing methods focus on single-frame interpolation, we propose an end-to-end convolutional neural network for variable-length multi-frame video interpolation, where the motion interpretation and occlusion reasoning are jointly modeled. *We start by computing bi-directional optical flow between the input images* using a U-Net architecture. These flows are then linearly combined at each time step to approximate the intermediate bi-directional optical flows. These approximate flows, however, only work well in locally smooth regions and *produce artifacts around motion boundaries*. To address this shortcoming, we employ another U-Net to *refine the approximated flow and also predict soft visibility maps*. *Finally, the two input images are warped and linearly fused to form each intermediate frame. 
So, yeah, David - it sounds like they've used machine learning to improve the video analysis part so it can deal with objects that move separately, which current optical-flow-based tech doesn't do, to the best of my knowledge.

Looks quite cool! It'd be nice to see some samples of how it does on footage that isn't already slow motion. There are some still frames at the end of the above-linked white paper, but no motion.

- Nathan

On Wed, Jun 20, 2018 at 11:31 AM, David Baud wrote:

> My understanding is that in order to get good results with any of the
> optical flow solutions, the system needs to be able to define the contours
> of your moving "objects" (i.e. a person, a ball, a car, etc…) in your frame.
> The better the system is at recognizing these objects, the better the
> results you will get. The Pro version of Twixtor will let you "help" the
> system define these contours by providing a mask for your object. As we
> know, rotoscoping can be time consuming. Where I think these systems can
> improve is in the automatic recognition of the moving objects in your frame,
> i.e. recognizing "the person walking" and "the picket fence" as two
> different objects. I am not familiar with the technology nVidia is
> proposing, but maybe they have improved the video analysis so that the
> system can automatically calculate the displacement of all the objects in a
> frame separately?
>
> Maybe Peter from RE:Vision will chime in on this discussion and correct me
> if I am wrong 😉… and maybe give us a better understanding of optical flow
> technology in general… without revealing his secret sauce for Twixtor!
>
> David Baud
> Colorist & Finishing Editor
> david at kosmos-productions.com
>
> On Jun 20, 2018, at 11:59, Jim Curtis wrote:
>
> Optical Flow and Twixtor have limitations. Try a slo-mo of a person
> walking next to a picket fence, and see how wacky the pickets become with
> any method besides frame blending. There have been occasions where I've
> stitched together the different methods with masking and editing, as there
> seems not to be a silver bullet so far. If this is it, I'm interested!
> Thanks for the heads-up.