An in-depth analysis of “ffmpeg and non-monotone timestamps”

Having built the latest version of ffmpeg, I ran this command line on an FLV file grabbed with flvstreamer and was presented with this error:

[scharles]vrec $ ffmpeg -i VREC_2.flv -acodec copy -vcodec copy -f flv test.flv
FFmpeg version git-svn-r21326, Copyright (c) 2000-2010 Fabrice Bellard, et al.
  built on Jan 19 2010 16:35:26 with gcc 4.0.1 (Apple Inc. build 5493)
  configuration: 
  libavutil     50. 7. 0 / 50. 7. 0
  libavcodec    52.48. 0 / 52.48. 0
  libavformat   52.47. 0 / 52.47. 0
  libavdevice   52. 2. 0 / 52. 2. 0
  libswscale     0. 8. 0 /  0. 8. 0
[flv @ 0x1002600]Estimating duration from bitrate, this may be inaccurate
 
Seems stream 0 codec frame rate differs from container frame rate: 1000.00 (1000/1) -> 24.83 (149/6)
Input #0, flv, from 'VREC_2.flv':
  Duration: 00:24:28.79, start: 0.000000, bitrate: 128 kb/s
    Stream #0.0: Video: vp6f, yuv420p, 480x360, 24.83 tbr, 1k tbn, 1k tbc
    Stream #0.1: Audio: mp3, 44100 Hz, 2 channels, s16, 128 kb/s
Output #0, flv, to 'test.flv':
    Stream #0.0: Video: 0x0004, yuv420p, 480x360, q=2-31, 1k tbn, 1k tbc
    Stream #0.1: Audio: 0x0002, 44100 Hz, 2 channels, 128 kb/s
Stream mapping:
  Stream #0.0 -> #0.0
  Stream #0.1 -> #0.1
Press [q] to stop encoding
[flv @ 0x1007600]st:0 error, non monotone timestamps 1169932 >= 1201
av_interleaved_write_frame(): Error while opening file

In the ffmpeg source, in the file ./libavformat/utils.c at line 2633 we have the following piece of code, which is responsible for the above error and subsequent bail-out of the operation:

    if(st->cur_dts && st->cur_dts != AV_NOPTS_VALUE && st->cur_dts >= pkt->dts){
        av_log(s, AV_LOG_ERROR,
               "st:%d error, non monotone timestamps %"PRId64" >= %"PRId64"\n",
               st->index, st->cur_dts, pkt->dts);
        return -1;
    }

Also worth pointing out here is the comment in the definition of the AVPacket structure, which is what pkt points to at this point. The file is ./libavcodec/avcodec.h, and this is the full definition of AVPacket for the same build shown above (FFmpeg version git-svn-r21326, built on Jan 19 2010 16:35:26 with gcc 4.0.1):

typedef struct AVPacket {
    /**                                                                                                    
     * Presentation timestamp in AVStream->time_base units; the time at which                              
     * the decompressed packet will be presented to the user.                                              
     * Can be AV_NOPTS_VALUE if it is not stored in the file.                                              
     * pts MUST be larger or equal to dts as presentation cannot happen before                             
     * decompression, unless one wants to view hex dumps. Some formats misuse                              
     * the terms dts and pts/cts to mean something different. Such timestamps                              
     * must be converted to true pts/dts before they are stored in AVPacket.                               
     */
    int64_t pts;
    /**                                                                                                    
     * Decompression timestamp in AVStream->time_base units; the time at which                             
     * the packet is decompressed.                                                                         
     * Can be AV_NOPTS_VALUE if it is not stored in the file.                                              
     */
    int64_t dts;
    uint8_t *data;
    int   size;
    int   stream_index;
    int   flags;
    /**                                                                                                    
     * Duration of this packet in AVStream->time_base units, 0 if unknown.                                 
     * Equals next_pts - this_pts in presentation order.                                                   
     */
    int   duration;
    void  (*destruct)(struct AVPacket *);
    void  *priv;
    int64_t pos;                            ///< byte position in stream, -1 if unknown                    
 
    /**                                                                                                    
     * Time difference in AVStream->time_base units from the pts of this                                   
     * packet to the point at which the output from the decoder has converged                              
     * independent from the availability of previous frames. That is, the                                  
     * frames are virtually identical no matter if decoding started from                                   
     * the very first frame or from this keyframe.                                                         
     * Is AV_NOPTS_VALUE if unknown.                                                                       
     * This field is not the display duration of the current packet.                                       
     *                                                                                                     
     * The purpose of this field is to allow seeking in streams that have no                               
     * keyframes in the conventional sense. It corresponds to the                                          
     * recovery point SEI in H.264 and match_time_delta in NUT. It is also                                 
     * essential for some types of subtitle streams to ensure that all                                     
     * subtitles are correctly displayed after seeking.                                                    
     */
    int64_t convergence_duration;
} AVPacket;

At first sight (and somebody who knows ffmpeg better, please correct me) it would appear that we have two timestamps being maintained: one is the presentation timestamp (pts) and one is the decompression timestamp (dts). With that in mind, let's go back to the C code that blew the chunks…

    if(st->cur_dts && st->cur_dts != AV_NOPTS_VALUE && st->cur_dts >= pkt->dts){
        av_log(s, AV_LOG_ERROR,
               "st:%d error, non monotone timestamps %"PRId64" >= %"PRId64"\n",
               st->index, st->cur_dts, pkt->dts);
        return -1;
    }

As always, it helps to translate the code into English first and then try to see what it thinks it is doing. My real issue is with the English: “non monotone timestamps” may mean something to the gurus who maintain ffmpeg, but to us users it doesn’t really mean anything; a classic case of writing error messages that mean nothing to the average user. The everyday definition of monotone would appear to have absolutely nothing to do with time at all, and everything to do with sound. Interesting error message!

[UPDATE: 24/1/2010: Whilst reading "A Mathematical Theory of Communication" by C.E.Shannon I saw the term "monotonic function" on the very first page and a connection was made. Not being a working mathematician I looked it up and then realised that my ultimate conclusion to this problem was at least correct! So, I will concede that the still somewhat obtuse error is kind of on track but I would suggest that it be changed to "non monotonic" or even better, something a little more verbose that leaves no guessing as to the cause of the error... a few more bytes of RAM is not going to kill ffmpeg methinks.]

So, with all that in our heads and knowing that “st” is a pointer to an AVStream structure and “pkt” is a pointer to an AVPacket structure let’s go…

1] if the current stream decompression timestamp value is not zero
2] and the current stream decompression timestamp value is not AV_NOPTS_VALUE
3] and the current stream decompression timestamp value is greater than or equal to
4] the current packet decompression timestamp value
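
To make the four conditions easier to poke at, here is a minimal standalone sketch of the same test. It is not the ffmpeg code: the function name dts_is_non_monotone and the macro SKETCH_NOPTS_VALUE are my own hypothetical stand-ins, and I am assuming AV_NOPTS_VALUE is the most negative 64-bit value so the sketch compiles without the ffmpeg headers.

```c
#include <stdint.h>

/* Hypothetical stand-in for AV_NOPTS_VALUE, assumed here to be the most
 * negative int64_t, so the sketch builds without the ffmpeg headers. */
#define SKETCH_NOPTS_VALUE INT64_MIN

/* Returns 1 when the quoted check would reject the packet, 0 otherwise.
 * cur_dts: the dts currently recorded on the stream (st->cur_dts).
 * pkt_dts: the dts of the packet being written (pkt->dts). */
int dts_is_non_monotone(int64_t cur_dts, int64_t pkt_dts)
{
    if (cur_dts == 0)                   /* [1] zero: the check is skipped   */
        return 0;
    if (cur_dts == SKETCH_NOPTS_VALUE)  /* [2] unknown: the check is skipped */
        return 0;
    return cur_dts >= pkt_dts;          /* [3]+[4] packet is not strictly later */
}
```

Feeding it the two numbers from my error message, dts_is_non_monotone(1169932, 1201), returns 1, which matches the failure. Note that an equal dts is rejected too, so the test really demands strictly increasing values.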

[1] => on the principle of early bailing (the && operator short-circuits), this test will ensure that the remainder of the expression is NOT evaluated if the current value is 0, i.e. presumably no packet has been processed for this stream yet.

[2] assuming [1] is true (we have data), we then check that the stream's ‘dts’ value has not been set to AV_NOPTS_VALUE, which immunises the code from failing at this point. Question: where, when and why would this value be set? A quick look:

find . -name "*c" -exec grep -iH " = AV_NOPTS_VALUE" {} \; | wc -l
66

This shows 66 assignments, so it looks like it gets used to initialise a lot of things in a lot of places… but when does it get replaced with something else? Sigh. Another unknown.

[3] assuming that [1] and [2] are both true (ignoring the reasons why…), we get to the real crux of the test for generating the error condition: “Has the accumulated stream time reached or exceeded the packet time, i.e. is this packet ‘late’ with respect to the stream timestamp?”
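
It helps to see how far apart the two numbers in my error message really are. The sketch below assumes both values are in the 1k tbn time base that the log reports for the output stream, i.e. one tick is one millisecond; the type clock_fields and the function split_millis are hypothetical names of my own.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a tick count split into clock fields. */
struct clock_fields {
    int64_t hours;
    int minutes;
    int seconds;
    int millis;
};

/* Splits a non-negative tick count into h/m/s/ms,
 * assuming a 1/1000 time base (1 tick = 1 ms). */
struct clock_fields split_millis(int64_t ticks)
{
    struct clock_fields c;

    assert(ticks >= 0);                     /* sketch only handles non-negative values */
    c.millis  = (int)(ticks % 1000);
    c.seconds = (int)(ticks / 1000 % 60);
    c.minutes = (int)(ticks / 60000 % 60);
    c.hours   = ticks / 3600000;
    return c;
}
```

Under that assumption, 1169932 works out to 00:19:29.932 and 1201 to 00:00:01.201, so the packet that tripped the check claims to belong about nineteen and a half minutes before the point the stream had already reached.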

Given that we now know that timestamps are expected to be monotonic in the positive direction, the error would indeed appear to indicate that the previous packet(s) have pushed the current stream decompression time to or past the time at which this packet thinks it was due. Why? I haven’t a clue. But it means that at some point during the processing of the packets in my errant FLV file, ffmpeg sees a packet whose decompression time has fallen behind that of the stream.

The FLV was recorded from a live RTMP stream, which may be part of the reason: the start time values seem to be the actual time of day rather than based at 00:00:00. Maybe that is the problem. Maybe I should try to ‘re-base’ or ‘re-stripe’ the internal time code before trying to add the GOP data? Anybody, help!
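
For what it's worth, this is roughly what I have in mind by ‘re-basing’. It is only a sketch over a plain array of dts values, not a fix for ffmpeg or for the FLV file itself: the function name rebase_dts is hypothetical, and nudging a backwards jump to exactly one tick after the previous value is an assumption I picked for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Rewrites dts[0..n-1] in place so the values are strictly increasing.
 * Whenever a value fails to advance past the previous output value, every
 * value from that point on is shifted by a running offset so that it lands
 * one tick after the previous one (the one-tick gap is an arbitrary choice).
 * Returns the number of discontinuities that were patched. */
size_t rebase_dts(int64_t *dts, size_t n)
{
    int64_t offset = 0;   /* shift applied to all remaining values */
    int64_t last = 0;     /* previous output value, valid once i > 0 */
    size_t fixes = 0;

    for (size_t i = 0; i < n; i++) {
        if (i > 0 && dts[i] + offset <= last) {
            offset = last + 1 - dts[i];
            fixes++;
        }
        dts[i] += offset;
        last = dts[i];
    }
    return fixes;
}
```

With a made-up sequence such as 1169900, 1169932, 1201, 1241 (only the middle two numbers come from my error message), the jump back to 1201 is patched once and the sequence becomes 1169900, 1169932, 1169933, 1169973, while the 40 tick spacing between the last two packets is preserved.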

I think. (The cause of all my problems.)

If anybody cares to expand / enlighten / correct me then please do as I still need to be able to deal with this file!

Cheers! :)

Published: January 21st, 2010 at 12:33
Categories: C, ffmpeg, video